Re: [PATCH 0/15] KVM: optimize for MMIO handled
On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:

>> Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
>> network connect to the netperf server, the bandwidth of our network
>> is 100M.
>
> I see the reason, thank you!
>
> I used virtio-net and you used e1000.
> You are using e1000 to see the MMIO performance change, right?

Hi Takuya,

Now, I have done the performance test for virtio-net, the performance
is improved very little, and it is not *regression* ;-)

The reason is, MMIO generated by virtio-net is very very little.

ept = 1:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    972.21
16384  87380

16384  87380  1        1       60.00    971.01
16384  87380

16384  87380  1        1       60.00    974.44
16384  87380

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    973.45
16384  87380

16384  87380  1        1       60.00    973.63
16384  87380

16384  87380  1        1       60.00    976.25
16384  87380

ept = 0, bypass_guest_pf=0:

Before patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    975.16
16384  87380

16384  87380  1        1       60.00    979.95
16384  87380

16384  87380  1        1       60.00    984.03
16384  87380

After patch:
--
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00    974.30
16384  87380

16384  87380  1        1       60.00    976.33
16384  87380

16384  87380  1        1       60.00    981.45
16384  87380

--
To unsubscribe from this list: send the line
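To compare the before/after transaction rates in the message above at a glance, the runs can be averaged. The following is only a minimal sketch: the helper names (mean, pct_change) are made up for illustration, and the sample arrays simply copy the "Trans. Rate per sec" figures quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative change of `after` versus `before`, in percent. */
double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Transaction rates (per second) copied from the netperf runs above. */
const double ept1_before[] = { 972.21, 971.01, 974.44 };
const double ept1_after[]  = { 973.45, 973.63, 976.25 };
const double ept0_before[] = { 975.16, 979.95, 984.03 };
const double ept0_after[]  = { 974.30, 976.33, 981.45 };
```

With three runs per configuration, the means differ by well under one percent in either direction, which is no larger than the spread between individual runs, so these numbers neither show a clear gain nor a clear loss for virtio-net.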
assigned EHCI USB headset not working
Hi,

I am using the latest clone from qemu-kvm git with kernel 2.6.35.7.
Since assigning PCI soundcards did not yield any usable results, I
assigned a USB headset to a Windows7 VM. I used the following two
command line options to enable the EHCI controller inside the VM and
to assign the device to it:

...
-device usb-ehci,id=ehci \
-device usb-host,vendorid=046d,productid=0a01,bus=ehci.0 \
...

Right after starting the VM I see the following output:

...
Booting from Hard Disk...
Booting from :7c00
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 3 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 3 interfaces claimed for configuration 1
husb: config #1 need 1
husb: 3 interfaces claimed for configuration 1
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
USB stall
husb: config #1 need 1
husb: 2 interfaces claimed for configuration 1

"info usb" on the monitor looks like this:

  Device 0.1, Port 1, Speed 1.5 Mb/s, Product Microsoft Wireless Desktop Rece
  Device 1.1, Port 1, Speed 480 Mb/s, Product Logitech USB Headset

The sound device shows up under Windows7 and drivers are installed
automatically. Unfortunately it does not work. None of the players I
tried even started playing the sound file, although they detected the
DirectSound device. When connected to a natively running Windows7,
the USB headset works the way it's supposed to.

Any help is greatly appreciated.

Regards
André

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pci-assign: Do not reset the device unless the kernel supports it
On 2011-06-07 20:46, Alex Williamson wrote:
> On Tue, 2011-06-07 at 10:14 +0200, Jan Kiszka wrote:
>> On 2011-06-07 10:06, Avi Kivity wrote:
>>> On 06/07/2011 01:04 AM, Jan Kiszka wrote:
>>>> On 2011-06-06 23:48, Alex Williamson wrote:
>>>>> On Mon, 2011-06-06 at 23:30 +0200, Jan Kiszka wrote:
>>>>>> From: Jan Kiszka <jan.kis...@siemens.com>
>>>>>>
>>>>>> At least kernels 2.6.38 and 2.6.39 do not properly support
>>>>>> issuing a reset on an assigned device and corrupt its config
>>>>>> space. Prevent this by checking for a host kernel with the
>>>>>> required support, tagged by the to-be-introduced
>>>>>> KVM_CAP_DEVICE_RESET.
>>>>>
>>>>> Wouldn't it be easier just to revert ed78661f in 2.6.39 stable?
>>>>> I guess we don't have an option to do that for .38 since stable
>>>>> is done there, but there are also some intel-iommu breakages
>>>>> that won't make stable for that release. It seems like the
>>>>> userspace invoked reset resolves known, demonstrable issues of
>>>>> devices continuing to DMA into guest memory while ed78661f is
>>>>> mostly a theoretical change.
>>>>
>>>> Easier would be this patch. But I don't mind reverting the
>>>> problematic commit in 39, whatever is preferred. We should just
>>>> resolve the issue finally.
>>>
>>> Kernel problems should be solved in the kernel (with exceptions of
>>> course, but don't see the need here).
>>
>> Then please file a revert for stable ASAP.
>
> How's this? For stable only of course. Thanks,
>
> Alex
>
> Revert "KVM: Save/restore state of assigned PCI device"
>
> From: Alex Williamson <alex.william...@redhat.com>
>
> This reverts ed78661f2614d3c9f69c23e280db3bafdabdf5bb as it assumes
> the saved PCI state will remain valid for the entire length of time
> that it is attached to a guest. This fails when userspace makes use
> of the pci-sysfs reset interface, which invalidates the saved device
> state, leaving nothing to be restored after the device is reset on
> de-assignment. This leaves the device in an unusable state.
>
> 3.0.0 will add an interface for KVM to save the PCI state in a

[ It will be called 3.0. :) ]

> buffer unaffected by other callers of pci_reset_function(), but the
> most appropriate stable fix seems to be reverting this change since
> the original assumption about the device saved state persisting is
> incorrect.
>
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> ---
>
>  virt/kvm/assigned-dev.c |    5 +----
>  1 files changed, 1 insertions(+), 4 deletions(-)
>
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index ae72ae6..e3f1235 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -197,8 +197,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
>  {
>  	kvm_free_assigned_irq(kvm, assigned_dev);
>
> -	__pci_reset_function(assigned_dev->dev);
> -	pci_restore_state(assigned_dev->dev);
> +	pci_reset_function(assigned_dev->dev);
>
>  	pci_release_regions(assigned_dev->dev);
>  	pci_disable_device(assigned_dev->dev);
> @@ -515,7 +514,6 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
>  	}
>
>  	pci_reset_function(dev);
> -	pci_save_state(dev);
>
>  	match->assigned_dev_id = assigned_dev->assigned_dev_id;
>  	match->host_segnr = assigned_dev->segnr;
> @@ -546,7 +544,6 @@ out:
>  	mutex_unlock(&kvm->lock);
>  	return r;
>  out_list_del:
> -	pci_restore_state(dev);
>  	list_del(&match->list);
>  	pci_release_regions(dev);
>  out_disable:

Acked-by: Jan Kiszka <jan.kis...@siemens.com>

Jan
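The failure mode described in the revert message can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not kernel code, every toy_* name is hypothetical, and it assumes a single saved-state slot per device that a restore consumes, which is how the commit message characterizes the problem ("invalidates the saved device state, leaving nothing to be restored").

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RESET_VALUE 0   /* what a raw reset leaves in "config space" */

/* Toy device with one config word and one saved-state slot (hypothetical). */
struct toy_dev {
    int  config;       /* live config value */
    int  saved;        /* the single saved copy */
    bool state_saved;  /* true while `saved` holds a valid copy */
};

void toy_save_state(struct toy_dev *d)
{
    d->saved = d->config;
    d->state_saved = true;
}

/* Restoring consumes the saved copy; with nothing saved it does nothing. */
void toy_restore_state(struct toy_dev *d)
{
    if (!d->state_saved)
        return;
    d->config = d->saved;
    d->state_saved = false;
}

/* Raw reset: wipes the config value. */
void toy_raw_reset(struct toy_dev *d)
{
    d->config = TOY_RESET_VALUE;
}

/* Self-contained reset, as any independent caller would use it. */
void toy_reset_function(struct toy_dev *d)
{
    toy_save_state(d);
    toy_raw_reset(d);
    toy_restore_state(d);
}

/* De-assignment before the revert: relies on state saved at assignment. */
void toy_deassign_old(struct toy_dev *d)
{
    toy_raw_reset(d);
    toy_restore_state(d);
}

/* De-assignment after the revert: does not rely on earlier saved state. */
void toy_deassign_new(struct toy_dev *d)
{
    toy_reset_function(d);
}
```

In this model, an intervening toy_reset_function() call (standing in for a userspace-triggered reset) uses up the slot that was filled at assignment time, so toy_deassign_old() has nothing left to restore, while toy_deassign_new() saves and restores around its own reset.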
Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path
On 07.06.2011, at 15:00, Xiao Guangrong wrote:

> If the page fault is caused by mmio, we can cache the mmio info, later, we do
> not need to walk guest page table and quickly know it is a mmio fault while we
> emulate the mmio instruction
>
> Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    5 +++
>  arch/x86/kvm/mmu.c              |   21 +--
>  arch/x86/kvm/mmu.h              |   23 +
>  arch/x86/kvm/paging_tmpl.h      |   21 ++-
>  arch/x86/kvm/x86.c              |   52 ++
>  arch/x86/kvm/x86.h              |   36 +++
>  6 files changed, 126 insertions(+), 32 deletions(-)
>

[...]

> +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
> +                           gpa_t *gpa, struct x86_exception *exception,
> +                           bool write)
> +{
> +        u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +
> +        if (vcpu_match_mmio_gva(vcpu, gva) &&
> +              check_write_user_access(vcpu, write, access,
> +              vcpu->arch.access)) {
> +                *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
> +                                        (gva & (PAGE_SIZE - 1));
> +                return 1;

Hrm. Let me try to understand what you're doing.

Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then
we walk either the NPT or the guest PT to resolve the GPA to the fault
and send off an MMIO.
Within that path, you remember the GVA->GPA mapping for the last MMIO
request. If the next MMIO request is on the same GVA and kernel/user
permissions still apply, you simply bypass the resolution. So far so
good.

Now, what happens when the GVA is not identical to the GVA it was
before? It's probably a purely theoretic case, but imagine the
following:

  1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
  2) guest remaps page 0x1000 to GPA 0x2000
  3) guest issues MMIO on GVA 0x1000

That would break with your current implementation, right? It sounds
pretty theoretic, but imagine the following:

  1) guest user space 1 maps MMIO region A to 0x1000
  2) guest user space 2 maps MMIO region B to 0x1000
  3) guest user space 1 issues MMIO on 0x1000
  4) context switch; going to user space 2
  5) user space 2 issues MMIO on 0x1000

That case could at least be identified by also comparing the guest's
cr3 value during this hack. And considering things like UIO or
microkernels, it's not too unlikely :).

Alex
Re: [PATCH 0/15] KVM: optimize for MMIO handled
On Wed, 08 Jun 2011 14:22:36 +0800
Xiao Guangrong <xiaoguangr...@cn.fujitsu.com> wrote:

> On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:
>
> >> Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
> >> network connect to the netperf server, the bandwidth of our
> >> network is 100M.
> >
> > I see the reason, thank you!
> >
> > I used virtio-net and you used e1000.
> > You are using e1000 to see the MMIO performance change, right?
>
> Hi Takuya,
>
> Now, I have done the performance test for virtio-net, the performance
> is improved very little, and it is not *regression* ;-)
>
> The reason is, MMIO generated by virtio-net is very very little.

Yes, so I thought you had chosen e1000 for this test :)

Thanks,
  Takuya
Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path
On 06/08/2011 04:22 PM, Alexander Graf wrote:

>> +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>> +                           gpa_t *gpa, struct x86_exception *exception,
>> +                           bool write)
>> +{
>> +        u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
>> +
>> +        if (vcpu_match_mmio_gva(vcpu, gva) &&
>> +              check_write_user_access(vcpu, write, access,
>> +              vcpu->arch.access)) {
>> +                *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
>> +                                        (gva & (PAGE_SIZE - 1));
>> +                return 1;

Hi Alexander,

Thanks for your review!

> Hrm. Let me try to understand what you're doing.
>
> Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then
> we walk either the NPT or the guest PT to resolve the GPA to the
> fault and send off an MMIO.
> Within that path, you remember the GVA->GPA mapping for the last MMIO
> request. If the next MMIO request is on the same GVA and kernel/user
> permissions still apply, you simply bypass the resolution. So far so
> good.

In this patch, we also introduced vcpu_clear_mmio_info() that clears
mmio cache info on the vcpu, and it is called when guest flush tlb
(reload CR3 or INVLPG).

> Now, what happens when the GVA is not identical to the GVA it was
> before? It's probably a purely theoretic case, but imagine the
> following:
>
>   1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
>   2) guest remaps page 0x1000 to GPA 0x2000
>   3) guest issues MMIO on GVA 0x1000

If guest modify the page structure, base on x86 tlb rules, we should
flush tlb to ensure the cpu use the new mapping.

When you remap GVA 0x1000 to 0x2000, you should flush tlb, then mmio
cache info is cleared, so the later access is right.

> That would break with your current implementation, right? It sounds
> pretty theoretic, but imagine the following:
>
>   1) guest user space 1 maps MMIO region A to 0x1000
>   2) guest user space 2 maps MMIO region B to 0x1000
>   3) guest user space 1 issues MMIO on 0x1000
>   4) context switch; going to user space 2
>   5) user space 2 issues MMIO on 0x1000

Also, when context switched, CR3 is reloaded, mmio cache info can be
cleared too. right? :-)
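The invalidation rule described in the reply above (drop the cached MMIO translation whenever the guest flushes its TLB) can be sketched as a small stand-alone model. This is only an illustration under stated assumptions, not the patch itself: all toy_* names and the single-entry layout are hypothetical, a zero mmio_gva is taken to mean "cache empty" (so a mapping at GVA page 0 is simply never cached here), and INVLPG is modelled as clearing the whole entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (UINT64_C(1) << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Hypothetical single-entry MMIO translation cache. */
struct toy_vcpu {
    uint64_t mmio_gva;  /* page-aligned GVA of the last MMIO fault, 0 = empty */
    uint64_t mmio_gfn;  /* guest frame number that GVA resolved to */
};

/* Remember the translation found by the slow path (page table walk). */
void toy_cache_mmio(struct toy_vcpu *v, uint64_t gva, uint64_t gpa)
{
    v->mmio_gva = gva & TOY_PAGE_MASK;
    v->mmio_gfn = gpa >> TOY_PAGE_SHIFT;
}

void toy_clear_mmio(struct toy_vcpu *v)
{
    v->mmio_gva = 0;
    v->mmio_gfn = 0;
}

/* Guest TLB flush events drop the cached entry. */
void toy_guest_reload_cr3(struct toy_vcpu *v)
{
    toy_clear_mmio(v);
}

void toy_guest_invlpg(struct toy_vcpu *v, uint64_t gva)
{
    (void)gva;          /* this model clears the entry unconditionally */
    toy_clear_mmio(v);
}

/* Fast path: returns true and fills *gpa on a cache hit. */
bool toy_gva_to_gpa_cached(const struct toy_vcpu *v, uint64_t gva, uint64_t *gpa)
{
    if (v->mmio_gva == 0 || v->mmio_gva != (gva & TOY_PAGE_MASK))
        return false;
    *gpa = (v->mmio_gfn << TOY_PAGE_SHIFT) | (gva & (TOY_PAGE_SIZE - 1));
    return true;
}
```

In this model, both scenarios raised in the review end in a cache miss after the flush, so the slow path would re-walk the page tables and pick up the new mapping. It relies on the guest actually flushing after changing a mapping, which is the argument made in the reply.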
Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path
On 08.06.2011, at 10:58, Xiao Guangrong wrote:

> On 06/08/2011 04:22 PM, Alexander Graf wrote:
>
>>> +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>>> +                           gpa_t *gpa, struct x86_exception *exception,
>>> +                           bool write)
>>> +{
>>> +        u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
>>> +
>>> +        if (vcpu_match_mmio_gva(vcpu, gva) &&
>>> +              check_write_user_access(vcpu, write, access,
>>> +              vcpu->arch.access)) {
>>> +                *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
>>> +                                        (gva & (PAGE_SIZE - 1));
>>> +                return 1;
>
> Hi Alexander,
>
> Thanks for your review!
>
>> Hrm. Let me try to understand what you're doing.
>>
>> Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then
>> we walk either the NPT or the guest PT to resolve the GPA to the
>> fault and send off an MMIO.
>> Within that path, you remember the GVA->GPA mapping for the last
>> MMIO request. If the next MMIO request is on the same GVA and
>> kernel/user permissions still apply, you simply bypass the
>> resolution. So far so good.
>
> In this patch, we also introduced vcpu_clear_mmio_info() that clears
> mmio cache info on the vcpu, and it is called when guest flush tlb
> (reload CR3 or INVLPG).

Ah, that one solved the SPT case then of course.

>> Now, what happens when the GVA is not identical to the GVA it was
>> before? It's probably a purely theoretic case, but imagine the
>> following:
>>
>>   1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
>>   2) guest remaps page 0x1000 to GPA 0x2000
>>   3) guest issues MMIO on GVA 0x1000
>
> If guest modify the page structure, base on x86 tlb rules, we should
> flush tlb to ensure the cpu use the new mapping.
>
> When you remap GVA 0x1000 to 0x2000, you should flush tlb, then mmio
> cache info is cleared, so the later access is right.
>
>> That would break with your current implementation, right? It sounds
>> pretty theoretic, but imagine the following:
>>
>>   1) guest user space 1 maps MMIO region A to 0x1000
>>   2) guest user space 2 maps MMIO region B to 0x1000
>>   3) guest user space 1 issues MMIO on 0x1000
>>   4) context switch; going to user space 2
>>   5) user space 2 issues MMIO on 0x1000
>
> Also, when context switched, CR3 is reloaded, mmio cache info can be
> cleared too. right? :-)

Only when using SPT. In the NPT case, you will never see cr3 getting
reloaded or INVLPG :).

Alex
Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path
On 06/08/2011 05:18 PM, Alexander Graf wrote:

>> Also, when context switched, CR3 is reloaded, mmio cache info can be
>> cleared too. right? :-)
>
> Only when using SPT. In the NPT case, you will never see cr3 getting
> reloaded or INVLPG :).

In the NPT case, we only cache the GPA, GVA is not cached
(vcpu.arch.mmio_gva is always 0) ;-)
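The point made in the reply above (with nested paging only the guest-physical side is cached, so guest address-space switches cannot make the entry stale) can be sketched as follows. This is a hypothetical model, not the code from the patch: the toy_* names are invented, and a zero field is assumed to mean "nothing cached".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPT_PAGE_SHIFT 12
#define TOY_NPT_PAGE_MASK  (~((UINT64_C(1) << TOY_NPT_PAGE_SHIFT) - 1))

/* Hypothetical cache entry; a zero field means "not cached". */
struct toy_mmio_cache {
    uint64_t mmio_gva;  /* stays 0 when nested paging is in use */
    uint64_t mmio_gfn;  /* guest frame number of the last MMIO access */
};

/* With nested paging the fault already reports a GPA, so only it is stored. */
void toy_cache_mmio_npt(struct toy_mmio_cache *c, uint64_t gpa)
{
    c->mmio_gva = 0;
    c->mmio_gfn = gpa >> TOY_NPT_PAGE_SHIFT;
}

/* GVA lookups can never hit while mmio_gva is 0. */
bool toy_match_mmio_gva(const struct toy_mmio_cache *c, uint64_t gva)
{
    return c->mmio_gva != 0 && c->mmio_gva == (gva & TOY_NPT_PAGE_MASK);
}

/* GPA lookups hit regardless of which guest address space is active. */
bool toy_match_mmio_gpa(const struct toy_mmio_cache *c, uint64_t gpa)
{
    return c->mmio_gfn != 0 && c->mmio_gfn == (gpa >> TOY_NPT_PAGE_SHIFT);
}
```

Because no guest-virtual address is stored, there is nothing in the entry that a CR3 reload or INVLPG inside the guest could invalidate; a different process mapping the same GVA to another device would present a different GPA and simply miss.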
Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path
On 08.06.2011, at 11:33, Xiao Guangrong wrote:

> On 06/08/2011 05:18 PM, Alexander Graf wrote:
>
>>> Also, when context switched, CR3 is reloaded, mmio cache info can
>>> be cleared too. right? :-)
>>
>> Only when using SPT. In the NPT case, you will never see cr3 getting
>> reloaded or INVLPG :).
>
> In the NPT case, we only cache the GPA, GVA is not cached
> (vcpu.arch.mmio_gva is always 0) ;-)

Ah, very nice! :)

Alex
Re: assigned EHCI USB headset not working
  Hi,

> The sound device shows up under Windows7 and drivers are installed
> automatically. Unfortunately it does not work. All the players I
> tried, did not even start playing the sound file, although they
> detected the DirectSound Device.

iso xfer's from usb-linux via ehci are flaky for reasons not yet
tracked down.

Any reason why you don't just plug in a virtual sound card? The HDA
emulation should work fine with win7.

cheers,
  Gerd
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On 06/01/2011 07:57 PM, Rusty Russell wrote:
> On Wed, 1 Jun 2011 03:24:29 -0400, Mark Wu <d...@redhat.com> wrote:
>> Current index allocation in virtio-blk is based on a monotonically
>> increasing variable "index". It could cause some confusion about disk
>> name in the case of hot-plugging disks. And it's impossible to find
>> the lowest available index by just maintaining a simple index. So
>> it's changed to use ida to allocate index via referring to the index
>> allocation in scsi disk.
>>
>> Signed-off-by: Mark Wu <d...@redhat.com>
>
> Hi Mark,
>
> I don't believe that we do disk probes in parallel, so the spinlock
> is unnecessary. Otherwise, this looks good.
>
> Thanks,
> Rusty.

Hi Rusty,

Yes, I can't figure out an instance of disk probing in parallel
either, but as per the following commit, I think we still need use
lock for safety. What's your opinion?

commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
Author: Tejun Heo <t...@kernel.org>
Date:   Sat Feb 21 11:04:45 2009 +0900

    [SCSI] sd: revive sd_index_lock

    Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd
    to use ida instead of idr incorrectly removed sd_index_lock around
    id allocation and free. idr/ida do have internal locks but they
    protect their free object lists not the allocation itself. The
    caller is responsible for that. This missing synchronization led
    to the same id being assigned to multiple devices leading to oops.
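The concern in the quoted commit (the allocator's caller must serialize allocation and free, or two devices can end up with the same index) can be shown with a small userspace sketch. This is not the kernel ida API: the toy_* names, the 64-entry bitmap, and the C11 atomic_flag spinlock are all stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_MAX_IDS 64

static uint64_t toy_id_bitmap;                      /* bit i set => index i in use */
static atomic_flag toy_id_lock = ATOMIC_FLAG_INIT;  /* stand-in for a spinlock */

static void toy_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&toy_id_lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void toy_unlock(void)
{
    atomic_flag_clear_explicit(&toy_id_lock, memory_order_release);
}

/* Returns the lowest free index, or -1 if all indices are taken. */
int toy_id_get(void)
{
    int id = -1;

    toy_lock();
    for (int i = 0; i < TOY_MAX_IDS; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(toy_id_bitmap & bit)) {
            /* Find-and-mark happens under the lock, so two callers
             * cannot both claim the same bit. */
            toy_id_bitmap |= bit;
            id = i;
            break;
        }
    }
    toy_unlock();
    return id;
}

/* Releases an index so a later toy_id_get() can reuse it. */
void toy_id_put(int id)
{
    if (id < 0 || id >= TOY_MAX_IDS)
        return;
    toy_lock();
    toy_id_bitmap &= ~(UINT64_C(1) << id);
    toy_unlock();
}
```

Without the lock, two concurrent callers could both observe the same bit as free before either sets it, which corresponds to the duplicate-id failure the sd commit describes. Whether virtio-blk probes can actually run concurrently is the open question in the thread; the sketch only shows why the lock is cheap insurance.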
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On 06/02/2011 06:34 AM, Michael S. Tsirkin wrote:
> On Wed, Jun 01, 2011 at 04:25:48AM -0400, Mark Wu wrote:
>> On 06/01/2011 03:24 AM, Mark Wu wrote:
>>> -	if (index_to_minor(index) >= 1 << MINORBITS)
>>> -		return -ENOSPC;
>>> +	do {
>>> +		if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
>>> +			return err;
>>> +
>> There's a problem in above code: err is not initialized before
>> using, so change it to return -1;
>>
>> +	do {
>> +		if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
>> +			return -1;
>
> Not -1. Pls return -ENOMEM.

Hi Michael,
Thanks for pointing out that. This is the revised patch.

From ffe49efd20938952a09d5a87fe694a6f62937756 Mon Sep 17 00:00:00 2001
From: Mark Wu <d...@redhat.com>
Date: Wed, 8 Jun 2011 08:25:53 -0400
Subject: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". It could cause some confusion about disk
name in the case of hot-plugging disks. And it's impossible to find the
lowest available index by just maintaining a simple index. So it's
changed to use ida to allocate index via referring to the index
allocation in scsi disk.

Signed-off-by: Mark Wu <d...@redhat.com>
---
 drivers/block/virtio_blk.c |   37 ++++++++++++++++++++++++++++++++-----
 1 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 079c088..f13b758 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -8,10 +8,14 @@
 #include <linux/scatterlist.h>
 #include <linux/string_helpers.h>
 #include <scsi/scsi_cmnd.h>
+#include <linux/idr.h>
 
 #define PART_BITS 4
 
-static int major, index;
+static int major;
+static DEFINE_SPINLOCK(vd_index_lock);
+static DEFINE_IDA(vd_index_ida);
+
 struct workqueue_struct *virtblk_wq;
 
 struct virtio_blk
@@ -23,6 +27,7 @@ struct virtio_blk
 
 	/* The disk structure for the kernel. */
 	struct gendisk *disk;
+	u32 index;
 
 	/* Request tracking. */
 	struct list_head reqs;
@@ -343,12 +348,26 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	struct request_queue *q;
 	int err;
 	u64 cap;
-	u32 v, blk_size, sg_elems, opt_io_size;
+	u32 v, blk_size, sg_elems, opt_io_size, index;
 	u16 min_io_size;
 	u8 physical_block_exp, alignment_offset;
 
-	if (index_to_minor(index) >= 1 << MINORBITS)
-		return -ENOSPC;
+	do {
+		if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
+			return -ENOMEM;
+
+		spin_lock(&vd_index_lock);
+		err = ida_get_new(&vd_index_ida, &index);
+		spin_unlock(&vd_index_lock);
+	} while (err == -EAGAIN);
+
+	if (err)
+		return err;
+
+	if (index_to_minor(index) >= 1 << MINORBITS) {
+		err = -ENOSPC;
+		goto out_free_index;
+	}
 
 	/* We need to know how many segments before we allocate. */
 	err = virtio_config_val(vdev, VIRTIO_BLK_F_SEG_MAX,
@@ -421,7 +440,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	vblk->disk->private_data = vblk;
 	vblk->disk->fops = &virtblk_fops;
 	vblk->disk->driverfs_dev = &vdev->dev;
-	index++;
+	vblk->index = index;
 
 	/* configure queue flush support */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
@@ -516,6 +535,10 @@ out_free_vq:
 	vdev->config->del_vqs(vdev);
 out_free_vblk:
 	kfree(vblk);
+out_free_index:
+	spin_lock(&vd_index_lock);
+	ida_remove(&vd_index_ida, index);
+	spin_unlock(&vd_index_lock);
 out:
 	return err;
 }
@@ -538,6 +561,10 @@ static void __devexit virtblk_remove(struct virtio_device *vdev)
 	mempool_destroy(vblk->pool);
 	vdev->config->del_vqs(vdev);
 	kfree(vblk);
+
+	spin_lock(&vd_index_lock);
+	ida_remove(&vd_index_ida, vblk->index);
+	spin_unlock(&vd_index_lock);
 }
 
 static const struct virtio_device_id id_table[] = {
-- 
1.7.1
Implementing Virtio Net driver for Solaris
Hi,

I'm in the middle of writing a network driver for Solaris 10 to use a
VirtIO backend. I've gotten the basics working, and throughput between
two VMs on the same host is ~4x faster than when using the rtls
interface.

What I'm looking for is some guidance as to which of the features
(CSUM, MRG_RXBUF, HOST_TSO, GUEST_TSO) give the most bang for the
buck, i.e. which I should look at implementing first.

Thanks,
Conor
[PATCH 10/12] kvm: Clean up stubs
No one references kvm_check_extension, kvm_has_vcpu_events, and
kvm_has_robust_singlestep outside KVM code. kvm_update_guest_debug is
never called, thus has no job besides returning an error.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-stub.c |   18 +-----------------
 1 files changed, 1 insertions(+), 17 deletions(-)

diff --git a/kvm-stub.c b/kvm-stub.c
index 1c95452..1e835c6 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -42,11 +42,6 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
     return -ENOSYS;
 }
 
-int kvm_check_extension(KVMState *s, unsigned int extension)
-{
-    return 0;
-}
-
 int kvm_init(void)
 {
     return -ENOSYS;
@@ -78,16 +73,6 @@ int kvm_has_sync_mmu(void)
     return 0;
 }
 
-int kvm_has_vcpu_events(void)
-{
-    return 0;
-}
-
-int kvm_has_robust_singlestep(void)
-{
-    return 0;
-}
-
 int kvm_has_many_ioeventfds(void)
 {
     return 0;
@@ -99,8 +84,7 @@ void kvm_setup_guest_memory(void *start, size_t size)
 
 int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap)
 {
-    tb_flush(env);
-    return 0;
+    return -ENOSYS;
 }
 
 int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
-- 
1.7.1
[PATCH 12/12] Remove unneeded kvm.h from cpu-exec.c
This was obsoleted by 6792a57bf1.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 cpu-exec.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 6ddd8dd..9bb6405 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -20,7 +20,6 @@
 #include "exec.h"
 #include "disas.h"
 #include "tcg.h"
-#include "kvm.h"
 #include "qemu-barrier.h"
 
 #if defined(__sparc__) && !defined(CONFIG_SOLARIS)
-- 
1.7.1
[PATCH 06/12] kvm: Drop useless zero-initializations
Backing KVMState is already zero-initialized.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 kvm-all.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 106eb3a..4a9910a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -764,28 +764,23 @@ int kvm_init(void)
     }
 #endif
 
-    s->vcpu_events = 0;
 #ifdef KVM_CAP_VCPU_EVENTS
     s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif
 
-    s->robust_singlestep = 0;
 #ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
     s->robust_singlestep =
         kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
 #endif
 
-    s->debugregs = 0;
 #ifdef KVM_CAP_DEBUGREGS
     s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
 #endif
 
-    s->xsave = 0;
 #ifdef KVM_CAP_XSAVE
     s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
 #endif
 
-    s->xcrs = 0;
 #ifdef KVM_CAP_XCRS
     s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
 #endif
-- 
1.7.1
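The reasoning behind the patch above is that a structure allocated zero-filled needs no explicit "= 0" before a conditionally compiled assignment. A minimal stand-alone sketch of that pattern follows; the toy_* names, capability numbers, and the use of calloc() are assumptions for illustration and do not reproduce how QEMU allocates KVMState.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical capability numbers, for illustration only. */
#define TOY_CAP_VCPU_EVENTS 1
#define TOY_CAP_DEBUGREGS   2

/* Pretend the build-time headers only know about DEBUGREGS. */
#define TOY_HAVE_CAP_DEBUGREGS

struct toy_state {
    int vcpu_events;
    int debugregs;
};

/* Stand-in for a capability query; every known capability reports 1. */
int toy_check_extension(int cap)
{
    return (cap == TOY_CAP_VCPU_EVENTS || cap == TOY_CAP_DEBUGREGS) ? 1 : 0;
}

struct toy_state *toy_state_new(void)
{
    /* calloc() returns zero-filled memory, so every int field starts at 0. */
    struct toy_state *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;

    /* No "s->vcpu_events = 0;" is needed before the conditional probe. */
#ifdef TOY_HAVE_CAP_VCPU_EVENTS
    s->vcpu_events = toy_check_extension(TOY_CAP_VCPU_EVENTS);
#endif
#ifdef TOY_HAVE_CAP_DEBUGREGS
    s->debugregs = toy_check_extension(TOY_CAP_DEBUGREGS);
#endif
    return s;
}
```

Here the vcpu_events probe is compiled out, yet the field still reads as 0 because of the zero-filled allocation, which is the property the patch relies on.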
[PATCH 01/12] Add kernel header update script
This helper pulls the required kernel headers for KVM and vhost into a
specified directory. The update is triggered via

scripts/update-linux-headers.sh LINUX_PATH

and will place the output under linux-headers/linux and
linux-headers/asm-*. It also imports the COPYING to care for headers
without an explicit license.

CC: Alexander Graf <ag...@suse.de>
CC: Christoph Hellwig <h...@lst.de>
CC: Peter Maydell <peter.mayd...@linaro.org>
CC: Andreas Färber <andreas.faer...@web.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 linux-headers/README            |    2 +
 scripts/update-linux-headers.sh |   55 +++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/README
 create mode 100755 scripts/update-linux-headers.sh

diff --git a/linux-headers/README b/linux-headers/README
new file mode 100644
index 000..5c9026b
--- /dev/null
+++ b/linux-headers/README
@@ -0,0 +1,2 @@
+Automatically imported Linux kernel headers.
+Only use scripts/update-linux-headers.sh to update!
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
new file mode 100755
index 000..e5f45b2
--- /dev/null
+++ b/scripts/update-linux-headers.sh
@@ -0,0 +1,55 @@
+#!/bin/sh -e
+#
+# Update Linux kernel headers QEMU requires from a specified kernel tree.
+#
+# Copyright (C) 2011 Siemens AG
+#
+# Authors:
+#  Jan Kiszka        <jan.kis...@siemens.com>
+#
+# This work is licensed under the terms of the GNU GPL version 2.
+# See the COPYING file in the top-level directory.
+
+tmpdir="$TMPDIR/.tmp-hdrs-$$"
+linux="$1"
+output="$2"
+
+if [ -z "$linux" -o ! -d "$linux" ]; then
+    cat << EOF
+usage: update-kernel-headers.sh LINUX_PATH [OUTPUT_PATH]
+
+LINUX_PATH      Linux kernel directory to obtain the headers from
+OUTPUT_PATH     output directory, usually the qemu source tree (default: $PWD)
+EOF
+    exit 1
+fi
+
+if [ -z "$output" ]; then
+    output="$PWD"
+fi
+
+for arch in x86 powerpc s390; do
+    make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch headers_install
+
+    rm -rf "$output/linux-headers/asm-$arch"
+    mkdir -p "$output/linux-headers/asm-$arch"
+    for header in kvm.h kvm_para.h; do
+        cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
+    done
+    if [ $arch == x86 ]; then
+        cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
+    fi
+done
+
+rm -rf "$output/linux-headers/linux"
+mkdir -p "$output/linux-headers/linux"
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+    cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
+done
+if [ -L "$linux/source" ]; then
+    cp "$linux/source/COPYING" "$output/linux-headers"
+else
+    cp "$linux/COPYING" "$output/linux-headers"
+fi
+
+rm -rf "$tmpdir"
-- 
1.7.1
[PATCH 00/12] [uq/master] Import linux headers and some cleanups
Licensing of the virtio headers is now clarified. So we can finally
resolve the clumsy and constantly buggy #ifdef'ery around old KVM and
virtio headers. Recent example: current qemu-kvm does not build against
2.6.32 headers.

This series introduces an import mechanism for all required Linux
headers so that the appropriate versions can be kept safely inside the
QEMU tree. I've incorporated all the valuable review comments on the
first version and rebased the result over current uq/master after
rebasing that one over current QEMU master.

Please note that I had no chance to test-build PPC or s390.

Besides the header topic, this series also includes a few assorted KVM
cleanup patches so that my queue is empty again.

CC: Alexander Graf <ag...@suse.de>
CC: Andreas Färber <andreas.faer...@web.de>
CC: Christoph Hellwig <h...@lst.de>
CC: Eduardo Habkost <ehabk...@redhat.com>
CC: Peter Maydell <peter.mayd...@linaro.org>

Jan Kiszka (12):
  Add kernel header update script
  Import kernel headers
  Switch build system to accompanied kernel headers
  kvm: Drop CONFIG_KVM_PARA
  kvm: ppc: Drop CONFIG_KVM_PPC_PVR
  kvm: Drop useless zero-initializations
  kvm: Drop KVM_CAP build dependencies
  kvm: x86: Drop KVM_CAP build dependencies
  kvm: ppc: Drop KVM_CAP build dependencies
  kvm: Clean up stubs
  kvm: x86: Pass KVMState to kvm_arch_get_supported_cpuid
  Remove unneeded kvm.h from cpu-exec.c

 Makefile.target                      |    4 +-
 configure                            |  149 +--
 cpu-exec.c                           |    1 -
 hw/kvmclock.c                        |    9 -
 kvm-all.c                            |   13 -
 kvm-stub.c                           |   18 +-
 kvm.h                                |    2 +-
 linux-headers/COPYING                |  356 +++
 linux-headers/README                 |    2 +
 linux-headers/asm-powerpc/kvm.h      |  275
 linux-headers/asm-powerpc/kvm_para.h |   53 +++
 linux-headers/asm-s390/kvm.h         |   44 ++
 linux-headers/asm-s390/kvm_para.h    |   17 +
 linux-headers/asm-x86/hyperv.h       |  193
 linux-headers/asm-x86/kvm.h          |  324 ++
 linux-headers/asm-x86/kvm_para.h     |   79
 linux-headers/linux/kvm.h            |  804 ++
 linux-headers/linux/kvm_para.h       |   29 ++
 linux-headers/linux/vhost.h          |  130 ++
 linux-headers/linux/virtio_config.h  |   54 +++
 linux-headers/linux/virtio_ring.h    |  163 +++
 scripts/update-linux-headers.sh      |   55 +++
 target-i386/cpuid.c                  |   20 +-
 target-i386/kvm.c                    |  123 +-
 target-ppc/kvm.c                     |   23 -
 target-s390x/cpu.h                   |   10 -
 target-s390x/op_helper.c             |    1 +
 27 files changed, 2630 insertions(+), 321 deletions(-)
 create mode 100644 linux-headers/COPYING
 create mode 100644 linux-headers/README
 create mode 100644 linux-headers/asm-powerpc/kvm.h
 create mode 100644 linux-headers/asm-powerpc/kvm_para.h
 create mode 100644 linux-headers/asm-s390/kvm.h
 create mode 100644 linux-headers/asm-s390/kvm_para.h
 create mode 100644 linux-headers/asm-x86/hyperv.h
 create mode 100644 linux-headers/asm-x86/kvm.h
 create mode 100644 linux-headers/asm-x86/kvm_para.h
 create mode 100644 linux-headers/linux/kvm.h
 create mode 100644 linux-headers/linux/kvm_para.h
 create mode 100644 linux-headers/linux/vhost.h
 create mode 100644 linux-headers/linux/virtio_config.h
 create mode 100644 linux-headers/linux/virtio_ring.h
 create mode 100755 scripts/update-linux-headers.sh
[PATCH 04/12] kvm: Drop CONFIG_KVM_PARA
The kvm_para.h header is now always available. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- configure |1 - hw/kvmclock.c |9 - target-i386/kvm.c | 26 +- 3 files changed, 1 insertions(+), 35 deletions(-) diff --git a/configure b/configure index 0e1dc46..ed54db9 100755 --- a/configure +++ b/configure @@ -3218,7 +3218,6 @@ case $target_arch2 in \( $target_arch2 = x86_64 -a $cpu = i386 \) -o \ \( $target_arch2 = i386 -a $cpu = x86_64 \) \) ; then echo CONFIG_KVM=y $config_target_mak - echo CONFIG_KVM_PARA=y $config_target_mak if test $vhost_net = yes ; then echo CONFIG_VHOST_NET=y $config_target_mak fi diff --git a/hw/kvmclock.c b/hw/kvmclock.c index 004c4ad..692ad18 100644 --- a/hw/kvmclock.c +++ b/hw/kvmclock.c @@ -17,8 +17,6 @@ #include kvm.h #include kvmclock.h -#if defined(CONFIG_KVM_PARA) defined(KVM_CAP_ADJUST_CLOCK) - #include linux/kvm.h #include linux/kvm_para.h @@ -120,10 +118,3 @@ static void kvmclock_register_device(void) } device_init(kvmclock_register_device); - -#else /* !(CONFIG_KVM_PARA KVM_CAP_ADJUST_CLOCK) */ - -void kvmclock_create(void) -{ -} -#endif /* !(CONFIG_KVM_PARA KVM_CAP_ADJUST_CLOCK) */ diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 1ae2d61..0efcf97 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -18,6 +18,7 @@ #include sys/utsname.h #include linux/kvm.h +#include linux/kvm_para.h #include qemu-common.h #include sysemu.h @@ -29,10 +30,6 @@ #include hw/apic.h #include ioport.h -#ifdef CONFIG_KVM_PARA -#include linux/kvm_para.h -#endif -// //#define DEBUG_KVM #ifdef DEBUG_KVM @@ -62,9 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { static bool has_msr_star; static bool has_msr_hsave_pa; -#if defined(CONFIG_KVM_PARA) defined(KVM_CAP_ASYNC_PF) static bool has_msr_async_pf_en; -#endif static int lm_capable_kernel; static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max) @@ -92,7 +87,6 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max) return cpuid; } -#ifdef CONFIG_KVM_PARA 
struct kvm_para_features { int cap; int feature; @@ -118,7 +112,6 @@ static int get_para_features(CPUState *env) return features; } -#endif uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, @@ -128,9 +121,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int i, max; uint32_t ret = 0; uint32_t cpuid_1_edx; -#ifdef CONFIG_KVM_PARA int has_kvm_features = 0; -#endif max = 1; while ((cpuid = try_get_cpuid(env-kvm_state, max)) == NULL) { @@ -140,11 +131,9 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, for (i = 0; i cpuid-nent; ++i) { if (cpuid-entries[i].function == function cpuid-entries[i].index == index) { -#ifdef CONFIG_KVM_PARA if (cpuid-entries[i].function == KVM_CPUID_FEATURES) { has_kvm_features = 1; } -#endif switch (reg) { case R_EAX: ret = cpuid-entries[i].eax; @@ -177,12 +166,10 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, qemu_free(cpuid); -#ifdef CONFIG_KVM_PARA /* fallback for older kernels */ if (!has_kvm_features (function == KVM_CPUID_FEATURES)) { ret = get_para_features(env); } -#endif return ret; } @@ -377,9 +364,7 @@ int kvm_arch_init_vcpu(CPUState *env) uint32_t limit, i, j, cpuid_i; uint32_t unused; struct kvm_cpuid_entry2 *c; -#ifdef CONFIG_KVM_PARA uint32_t signature[3]; -#endif env-cpuid_features = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX); @@ -397,7 +382,6 @@ int kvm_arch_init_vcpu(CPUState *env) cpuid_i = 0; -#ifdef CONFIG_KVM_PARA /* Paravirtualization CPUIDs */ memcpy(signature, KVMKVMKVM\0\0\0, 12); c = cpuid_data.entries[cpuid_i++]; @@ -418,8 +402,6 @@ int kvm_arch_init_vcpu(CPUState *env) has_msr_async_pf_en = c-eax (1 KVM_FEATURE_ASYNC_PF); #endif -#endif - cpu_x86_cpuid(env, 0, 0, limit, unused, unused, unused); for (i = 0; i = limit; i++) { @@ -931,12 +913,10 @@ static int kvm_put_msrs(CPUState *env, int level) kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME, env-system_time_msr); kvm_msr_entry_set(msrs[n++], 
MSR_KVM_WALL_CLOCK, env-wall_clock_msr); -#if defined(CONFIG_KVM_PARA) defined(KVM_CAP_ASYNC_PF) if (has_msr_async_pf_en) { kvm_msr_entry_set(msrs[n++], MSR_KVM_ASYNC_PF_EN, env-async_pf_en_msr); } -#endif } #ifdef KVM_CAP_MCE if (env-mcg_cap) { @@ -1172,11 +1152,9 @@ static int kvm_get_msrs(CPUState *env) #endif msrs[n++].index = MSR_KVM_SYSTEM_TIME; msrs[n++].index = MSR_KVM_WALL_CLOCK; -#if defined(CONFIG_KVM_PARA)
[PATCH 08/12] kvm: x86: Drop KVM_CAP build dependencies
No longer needed with accompanied kernel headers. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- target-i386/kvm.c | 67 ++-- 1 files changed, 3 insertions(+), 64 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 0efcf97..1c2d32c 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -94,9 +94,7 @@ struct kvm_para_features { { KVM_CAP_CLOCKSOURCE, KVM_FEATURE_CLOCKSOURCE }, { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY }, { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP }, -#ifdef KVM_CAP_ASYNC_PF { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF }, -#endif { -1, -1 } }; @@ -193,7 +191,6 @@ static void kvm_unpoison_all(void *param) } } -#ifdef KVM_CAP_MCE static void kvm_hwpoison_page_add(ram_addr_t ram_addr) { HWPoisonPage *page; @@ -239,7 +236,6 @@ static void kvm_mce_inject(CPUState *env, target_phys_addr_t paddr, int code) cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0); } -#endif /* KVM_CAP_MCE */ static void hardware_memory_error(void) { @@ -249,7 +245,6 @@ static void hardware_memory_error(void) int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr) { -#ifdef KVM_CAP_MCE ram_addr_t ram_addr; target_phys_addr_t paddr; @@ -269,9 +264,7 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr) } kvm_hwpoison_page_add(ram_addr); kvm_mce_inject(env, paddr, code); -} else -#endif /* KVM_CAP_MCE */ -{ +} else { if (code == BUS_MCEERR_AO) { return 0; } else if (code == BUS_MCEERR_AR) { @@ -285,7 +278,6 @@ int kvm_arch_on_sigbus_vcpu(CPUState *env, int code, void *addr) int kvm_arch_on_sigbus(int code, void *addr) { -#ifdef KVM_CAP_MCE if ((first_cpu-mcg_cap MCG_SER_P) addr code == BUS_MCEERR_AO) { ram_addr_t ram_addr; target_phys_addr_t paddr; @@ -300,9 +292,7 @@ int kvm_arch_on_sigbus(int code, void *addr) } kvm_hwpoison_page_add(ram_addr); kvm_mce_inject(first_cpu, paddr, code); -} else -#endif /* KVM_CAP_MCE */ -{ +} else { if (code == BUS_MCEERR_AO) { return 0; } else if (code == BUS_MCEERR_AR) { @@ -316,7 +306,6 @@ 
int kvm_arch_on_sigbus(int code, void *addr) static int kvm_inject_mce_oldstyle(CPUState *env) { -#ifdef KVM_CAP_MCE if (!kvm_has_vcpu_events() env-exception_injected == EXCP12_MCHK) { unsigned int bank, bank_num = env-mcg_cap 0xff; struct kvm_x86_mce mce; @@ -342,7 +331,6 @@ static int kvm_inject_mce_oldstyle(CPUState *env) return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, mce); } -#endif /* KVM_CAP_MCE */ return 0; } @@ -398,9 +386,7 @@ int kvm_arch_init_vcpu(CPUState *env) c-eax = env-cpuid_kvm_features kvm_arch_get_supported_cpuid(env, KVM_CPUID_FEATURES, 0, R_EAX); -#ifdef KVM_CAP_ASYNC_PF has_msr_async_pf_en = c-eax (1 KVM_FEATURE_ASYNC_PF); -#endif cpu_x86_cpuid(env, 0, 0, limit, unused, unused, unused); @@ -481,7 +467,6 @@ int kvm_arch_init_vcpu(CPUState *env) cpuid_data.cpuid.nent = cpuid_i; -#ifdef KVM_CAP_MCE if (((env-cpuid_version 8)0xF) = 6 (env-cpuid_features(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA) kvm_check_extension(env-kvm_state, KVM_CAP_MCE) 0) { @@ -508,7 +493,6 @@ int kvm_arch_init_vcpu(CPUState *env) env-mcg_cap = mcg_cap; } -#endif qemu_add_vm_change_state_handler(cpu_update_state, env); @@ -600,7 +584,6 @@ int kvm_arch_init(KVMState *s) * that case we need to stick with the default, i.e. a 256K maximum BIOS * size. */ -#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR if (kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) { /* Allows up to 16M BIOSes. */ identity_base = 0xfeffc000; @@ -610,7 +593,7 @@ int kvm_arch_init(KVMState *s) return ret; } } -#endif + /* Set TSS base one page after EPT identity map. 
*/ ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, identity_base + 0x1000); if (ret 0) { @@ -745,7 +728,6 @@ static int kvm_put_fpu(CPUState *env) return kvm_vcpu_ioctl(env, KVM_SET_FPU, fpu); } -#ifdef KVM_CAP_XSAVE #define XSAVE_CWD_RIP 2 #define XSAVE_CWD_RDP 4 #define XSAVE_MXCSR 6 @@ -753,11 +735,9 @@ static int kvm_put_fpu(CPUState *env) #define XSAVE_XMM_SPACE 40 #define XSAVE_XSTATE_BV 128 #define XSAVE_YMMH_SPACE 144 -#endif static int kvm_put_xsave(CPUState *env) { -#ifdef KVM_CAP_XSAVE int i, r; struct kvm_xsave* xsave; uint16_t cwd, swd, twd, fop; @@ -788,14 +768,10 @@ static int kvm_put_xsave(CPUState *env) r = kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave); qemu_free(xsave); return r; -#else -return kvm_put_fpu(env); -#endif } static int
[PATCH 11/12] kvm: x86: Pass KVMState to kvm_arch_get_supported_cpuid
kvm_arch_get_supported_cpuid checks for global cpuid restrictions, it does not require any CPUState reference. Changing its interface allows to call it before any VCPU is initialized. CC: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- kvm.h |2 +- target-i386/cpuid.c | 20 target-i386/kvm.c | 30 +++--- 3 files changed, 28 insertions(+), 24 deletions(-) diff --git a/kvm.h b/kvm.h index d565dba..243b063 100644 --- a/kvm.h +++ b/kvm.h @@ -157,7 +157,7 @@ bool kvm_arch_stop_on_emulation_error(CPUState *env); int kvm_check_extension(KVMState *s, unsigned int extension); -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, +uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function, uint32_t index, int reg); void kvm_cpu_synchronize_state(CPUState *env); void kvm_cpu_synchronize_post_reset(CPUState *env); diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c index 79e7580..e1ae3af 100644 --- a/target-i386/cpuid.c +++ b/target-i386/cpuid.c @@ -1144,10 +1144,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, break; case 7: if (kvm_enabled()) { -*eax = kvm_arch_get_supported_cpuid(env, 0x7, count, R_EAX); -*ebx = kvm_arch_get_supported_cpuid(env, 0x7, count, R_EBX); -*ecx = kvm_arch_get_supported_cpuid(env, 0x7, count, R_ECX); -*edx = kvm_arch_get_supported_cpuid(env, 0x7, count, R_EDX); +KVMState *s = env-kvm_state; + +*eax = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EAX); +*ebx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EBX); +*ecx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_ECX); +*edx = kvm_arch_get_supported_cpuid(s, 0x7, count, R_EDX); } else { *eax = 0; *ebx = 0; @@ -1179,10 +1181,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, break; } if (kvm_enabled()) { -*eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX); -*ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX); -*ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, 
R_ECX); -*edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX); +KVMState *s = env-kvm_state; + +*eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX); +*ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX); +*ecx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_ECX); +*edx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EDX); } else { *eax = 0; *ebx = 0; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 1c2d32c..5ebb054 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -98,12 +98,12 @@ struct kvm_para_features { { -1, -1 } }; -static int get_para_features(CPUState *env) +static int get_para_features(KVMState *s) { int i, features = 0; for (i = 0; i ARRAY_SIZE(para_features) - 1; i++) { -if (kvm_check_extension(env-kvm_state, para_features[i].cap)) { +if (kvm_check_extension(s, para_features[i].cap)) { features |= (1 para_features[i].feature); } } @@ -112,7 +112,7 @@ static int get_para_features(CPUState *env) } -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, +uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function, uint32_t index, int reg) { struct kvm_cpuid2 *cpuid; @@ -122,7 +122,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int has_kvm_features = 0; max = 1; -while ((cpuid = try_get_cpuid(env-kvm_state, max)) == NULL) { +while ((cpuid = try_get_cpuid(s, max)) == NULL) { max *= 2; } @@ -153,7 +153,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, /* On Intel, kvm returns cpuid according to the Intel spec, * so add missing bits according to the AMD spec: */ -cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX); +cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX); ret |= cpuid_1_edx 0x183f7ff; break; } @@ -166,7 +166,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, /* fallback for older kernels */ if (!has_kvm_features (function == KVM_CPUID_FEATURES)) { -ret = get_para_features(env); +ret = 
get_para_features(s); } return ret; @@ -349,25 +349,25 @@ int kvm_arch_init_vcpu(CPUState *env) struct kvm_cpuid2 cpuid; struct kvm_cpuid_entry2 entries[100];
[PATCH 03/12] Switch build system to accompanied kernel headers
This helps reducing our build-time checks for feature support in the available Linux kernel headers. And it helps users that do not have sufficiently recent headers installed on their build machine. Consequently, the patch removes and build-time checks for kvm and vhost in configure, the --kerneldir switch, and KVM_CFLAGS. Kernel headers are supposed to be provided by QEMU only. s390 needs some extra love as it carries redefinitions from kernel headers. CC: Alexander Graf ag...@suse.de Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.target |4 +- configure| 151 ++ target-s390x/cpu.h | 10 --- target-s390x/op_helper.c |1 + 4 files changed, 21 insertions(+), 145 deletions(-) diff --git a/Makefile.target b/Makefile.target index 5c22df8..be9c0e8 100644 --- a/Makefile.target +++ b/Makefile.target @@ -14,7 +14,7 @@ endif TARGET_PATH=$(SRC_PATH)/target-$(TARGET_BASE_ARCH) $(call set-vpath, $(SRC_PATH):$(TARGET_PATH):$(SRC_PATH)/hw) -QEMU_CFLAGS+= -I.. -I$(TARGET_PATH) -DNEED_CPU_H +QEMU_CFLAGS+= -I.. 
-I../linux-headers -I$(TARGET_PATH) -DNEED_CPU_H include $(SRC_PATH)/Makefile.objs @@ -37,8 +37,6 @@ ifndef CONFIG_HAIKU LIBS+=-lm endif -kvm.o kvm-all.o vhost.o vhost_net.o kvmclock.o: QEMU_CFLAGS+=$(KVM_CFLAGS) - config-target.h: config-target.h-timestamp config-target.h-timestamp: config-target.mak diff --git a/configure b/configure index d38b952..0e1dc46 100755 --- a/configure +++ b/configure @@ -113,8 +113,7 @@ curl= curses= docs= fdt= -kvm= -kvm_para= +kvm=yes nptl= sdl= vnc=yes @@ -130,7 +129,7 @@ xen= xen_ctrl_version= linux_aio= attr= -vhost_net= +vhost_net=yes xfs= gprof=no @@ -165,7 +164,6 @@ guest_base= uname_release= io_thread=no mixemu=no -kerneldir= aix=no blobs=yes pkgversion= @@ -712,8 +710,6 @@ for opt do ;; --disable-blobs) blobs=no ;; - --kerneldir=*) kerneldir=$optarg - ;; --with-pkgversion=*) pkgversion= ($optarg) ;; --disable-docs) docs=no @@ -1001,7 +997,6 @@ echo --disable-attr disables attr and xattr support echo --enable-attrenable attr and xattr support echo --enable-io-thread enable IO thread echo --disable-blobs disable installing provided firmware blobs -echo --kerneldir=PATH look for kernel includes in PATH echo --enable-docsenable documentation build echo --disable-docs disable documentation build echo --disable-vhost-net disable vhost-net acceleration support @@ -1766,124 +1761,6 @@ EOF fi ## -# kvm probe -if test $kvm != no ; then -cat $TMPC EOF -#include linux/kvm.h -#if !defined(KVM_API_VERSION) || KVM_API_VERSION 12 || KVM_API_VERSION 12 -#error Invalid KVM version -#endif -EOF -must_have_caps=KVM_CAP_USER_MEMORY \ -KVM_CAP_DESTROY_MEMORY_REGION_WORKS \ -KVM_CAP_COALESCED_MMIO \ -KVM_CAP_SYNC_MMU \ - -if test \( $cpu = i386 -o $cpu = x86_64 \) ; then - must_have_caps=$caps \ - KVM_CAP_SET_TSS_ADDR \ - KVM_CAP_EXT_CPUID \ - KVM_CAP_CLOCKSOURCE \ - KVM_CAP_NOP_IO_DELAY \ - KVM_CAP_PV_MMU \ - KVM_CAP_MP_STATE \ - KVM_CAP_USER_NMI \ - -fi -for c in $must_have_caps ; do - cat $TMPC EOF -#if !defined($c) -#error Missing KVM 
capability $c -#endif -EOF -done -cat $TMPC EOF -int main(void) { return 0; } -EOF - if test $kerneldir != ; then - kvm_cflags=-I$kerneldir/include - if test \( $cpu = i386 -o $cpu = x86_64 \) \ - -a -d $kerneldir/arch/x86/include ; then -kvm_cflags=$kvm_cflags -I$kerneldir/arch/x86/include - elif test $cpu = ppc -a -d $kerneldir/arch/powerpc/include ; then - kvm_cflags=$kvm_cflags -I$kerneldir/arch/powerpc/include - elif test $cpu = s390x -a -d $kerneldir/arch/s390/include ; then - kvm_cflags=$kvm_cflags -I$kerneldir/arch/s390/include -elif test -d $kerneldir/arch/$cpu/include ; then -kvm_cflags=$kvm_cflags -I$kerneldir/arch/$cpu/include - fi - else -kvm_cflags=`$pkg_config --cflags kvm-kmod 2/dev/null` - fi - if compile_prog $kvm_cflags ; then -kvm=yes -cat $TMPC EOF -#include linux/kvm_para.h -int main(void) { return 0; } -EOF -if compile_prog $kvm_cflags ; then - kvm_para=yes -fi - else -if test $kvm = yes ; then - if has awk has grep; then -kvmerr=`LANG=C $cc $QEMU_CFLAGS -o $TMPE $kvm_cflags $TMPC 21 \ - | grep error: \ - | awk -F error: '{if (NR1) printf(, ); printf(%s,$2);}'` -if test $kvmerr != ; then - echo -e ${kvmerr}\n\ -NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \ -recent
[PATCH 09/12] kvm: ppc: Drop KVM_CAP build dependencies
No longer needed with accompanied kernel headers. CC: Alexander Graf ag...@suse.de Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- target-ppc/kvm.c | 14 -- 1 files changed, 0 insertions(+), 14 deletions(-) diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 0500e3f..21f35af 100644 --- a/target-ppc/kvm.c +++ b/target-ppc/kvm.c @@ -65,18 +65,10 @@ static void kvm_kick_env(void *env) int kvm_arch_init(KVMState *s) { -#ifdef KVM_CAP_PPC_UNSET_IRQ cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ); -#endif -#ifdef KVM_CAP_PPC_IRQ_LEVEL cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL); -#endif -#ifdef KVM_CAP_PPC_SEGSTATE cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE); -#endif -#ifdef KVM_CAP_PPC_BOOKE_SREGS cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS); -#endif if (!cap_interrupt_level) { fprintf(stderr, KVM: Couldn't find level irq capability. Expect the @@ -217,7 +209,6 @@ int kvm_arch_get_registers(CPUState *env) return ret; } -#ifdef KVM_CAP_PPC_BOOKE_SREGS if (sregs.u.e.features KVM_SREGS_E_BASE) { env-spr[SPR_BOOKE_CSRR0] = sregs.u.e.csrr0; env-spr[SPR_BOOKE_CSRR1] = sregs.u.e.csrr1; @@ -314,7 +305,6 @@ int kvm_arch_get_registers(CPUState *env) env-spr[SPR_BOOKE_PID2] = sregs.u.e.impl.fsl.pid2; } } -#endif } if (cap_segstate) { @@ -323,7 +313,6 @@ int kvm_arch_get_registers(CPUState *env) return ret; } -#ifdef KVM_CAP_PPC_SEGSTATE ppc_store_sdr1(env, sregs.u.s.sdr1); /* Sync SLB */ @@ -346,7 +335,6 @@ int kvm_arch_get_registers(CPUState *env) env-IBAT[0][i] = sregs.u.s.ppc32.ibat[i] 0x; env-IBAT[1][i] = sregs.u.s.ppc32.ibat[i] 32; } -#endif } return 0; @@ -525,7 +513,6 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) { uint32_t *hc = (uint32_t*)buf; -#ifdef KVM_CAP_PPC_GET_PVINFO struct kvm_ppc_pvinfo pvinfo; if (kvm_check_extension(env-kvm_state, KVM_CAP_PPC_GET_PVINFO) @@ -534,7 +521,6 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len) return 0; } 
-#endif /* * Fallback to always fail hypercalls: -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/12] kvm: Drop KVM_CAP build dependencies
No longer needed with accompanied kernel headers. We are only left with build dependencies that are controlled by kvm arch headers.

CC: Alexander Graf ag...@suse.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 kvm-all.c |    8
 1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 4a9910a..cbc2532 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -757,21 +757,17 @@ int kvm_init(void)
     s->coalesced_mmio = kvm_check_extension(s, KVM_CAP_COALESCED_MMIO);

     s->broken_set_mem_region = 1;
-#ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
     ret = kvm_check_extension(s, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
     if (ret > 0) {
         s->broken_set_mem_region = 0;
     }
-#endif

 #ifdef KVM_CAP_VCPU_EVENTS
     s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
 #endif

-#ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
     s->robust_singlestep =
         kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
-#endif

 #ifdef KVM_CAP_DEBUGREGS
     s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
@@ -850,7 +846,6 @@ static void kvm_handle_io(uint16_t port, void *data, int direction, int size,
     }
 }

-#ifdef KVM_CAP_INTERNAL_ERROR_DATA
 static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 {
     fprintf(stderr, "KVM internal error.");
@@ -877,7 +872,6 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
      */
     return -1;
 }
-#endif

 void kvm_flush_coalesced_mmio_buffer(void)
 {
@@ -1008,11 +1002,9 @@ int kvm_cpu_exec(CPUState *env)
                     (uint64_t)run->hw.hardware_exit_reason);
             ret = -1;
             break;
-#ifdef KVM_CAP_INTERNAL_ERROR_DATA
         case KVM_EXIT_INTERNAL_ERROR:
             ret = kvm_handle_internal_error(env, run);
             break;
-#endif
         default:
             DPRINTF("kvm_arch_handle_exit\n");
             ret = kvm_arch_handle_exit(env, run);
--
1.7.1
[PATCH 05/12] kvm: ppc: Drop CONFIG_KVM_PPC_PVR
Required header support is now unconditionally available.

CC: Alexander Graf ag...@suse.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 configure        |    1 -
 target-ppc/kvm.c |    9 -
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/configure b/configure
index ed54db9..0947f98 100755
--- a/configure
+++ b/configure
@@ -3221,7 +3221,6 @@ case "$target_arch2" in
       if test "$vhost_net" = "yes" ; then
         echo "CONFIG_VHOST_NET=y" >> $config_target_mak
       fi
-      echo "CONFIG_KVM_PPC_PVR=y" >> $config_target_mak
     fi
 esac
 if test "$target_bigendian" = "yes" ; then
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index e7b1b10..0500e3f 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -104,21 +104,12 @@ static int kvm_arch_sync_sregs(CPUState *cenv)
         }
     }

-#if !defined(CONFIG_KVM_PPC_PVR)
-    if (1) {
-        fprintf(stderr, "kvm error: missing PVR setting capability\n");
-        return -ENOSYS;
-    }
-#endif
-
     ret = kvm_vcpu_ioctl(cenv, KVM_GET_SREGS, &sregs);
     if (ret) {
         return ret;
     }

-#ifdef CONFIG_KVM_PPC_PVR
     sregs.pvr = cenv->spr[SPR_PVR];
-#endif
     return kvm_vcpu_ioctl(cenv, KVM_SET_SREGS, &sregs);
 }

--
1.7.1
Re: [PATCH 01/12] Add kernel header update script
On 8 June 2011 15:10, Jan Kiszka jan.kis...@siemens.com wrote:
> --- /dev/null
> +++ b/scripts/update-linux-headers.sh
> @@ -0,0 +1,55 @@
> +#!/bin/sh -e
> +#

> +if [ -z $output ]; then
> +    output=$PWD
> +fi

> +    mkdir -p $output/linux-headers/asm-$arch

This script is rather lacking in quoting throughout. As a random example, this looks like it will break if you run the script from a directory with a space in the path.

> +tmpdir=$TMPDIR/.tmp-hdrs-$$

Better (safer) to use mktemp, I think.

> if [ -z $linux -o ! -d $linux ]; then

test -o is obsolescent in POSIX; use
    if [ -z $linux ] || ! [ -d $linux ] ; then
instead.

-- PMM
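The quoting and `test -o` remarks in this review can be illustrated with a minimal sketch. The helper names `is_kernel_tree` and `make_asm_dir` are hypothetical and do not appear in the posted script; only the quoting of expansions and the `||` form come from the review itself.

```shell
# Hypothetical helpers for illustration; not part of update-linux-headers.sh.

# Succeeds only if $1 is a non-empty path to an existing directory.
is_kernel_tree() {
    # Two simple tests joined by the shell's "||" replace "test ... -o ...".
    if [ -z "$1" ] || ! [ -d "$1" ]; then
        return 1
    fi
}

# Creates the per-architecture header directory below $1 and prints its path.
make_asm_dir() {
    output=$1
    arch=$2
    # Quoting keeps a path that contains spaces as a single argument.
    mkdir -p "$output/linux-headers/asm-$arch"
    printf '%s\n' "$output/linux-headers/asm-$arch"
}
```

Without the quotes, an output path such as `/home/user/my qemu` would be split into two words and `mkdir -p` would create two unrelated directories.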
Re: [PATCH 01/12] Add kernel header update script
On 2011-06-08 16:33, Peter Maydell wrote:
> On 8 June 2011 15:10, Jan Kiszka jan.kis...@siemens.com wrote:
>> --- /dev/null
>> +++ b/scripts/update-linux-headers.sh
>> @@ -0,0 +1,55 @@
>> +#!/bin/sh -e
>> +#
>
>> +if [ -z $output ]; then
>> +output=$PWD
>> +fi
>
>> +mkdir -p $output/linux-headers/asm-$arch
>
> This script is rather lacking in quoting throughout. As a random example, this looks like it will break if you run the script from a directory with a space in the path.

True.

>> +tmpdir=$TMPDIR/.tmp-hdrs-$$
>
> Better (safer) to use mktemp, I think.

Is that portable? I don't think so.

>> if [ -z $linux -o ! -d $linux ]; then
>
> test -o is obsolescent in POSIX; use
>     if [ -z $linux ] || ! [ -d $linux ] ; then
> instead.

OK.

Thanks,
Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [PATCH 01/12] Add kernel header update script
2011/6/8 Jan Kiszka jan.kis...@siemens.com:
> On 2011-06-08 16:33, Peter Maydell wrote:
>> On 8 June 2011 15:10, Jan Kiszka jan.kis...@siemens.com wrote:
>>> +tmpdir=$TMPDIR/.tmp-hdrs-$$
>>
>> Better (safer) to use mktemp, I think.
>
> Is that portable? I don't think so.

We don't expect every random end user to run this script, though, right? We already use mktemp in scripts/refresh-pxe-roms.sh, for instance.

-- PMM
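One way to reconcile the two positions in this exchange is sketched below, as an assumption rather than anything proposed in the thread: use `mktemp -d` where the host provides it, and otherwise fall back to the PID-based name from the first version of the script. The function name `make_tmpdir` is hypothetical.

```shell
# Hypothetical helper, not from the thread: prints the path of a fresh,
# private temporary directory.
make_tmpdir() {
    if command -v mktemp >/dev/null 2>&1; then
        # mktemp -d chooses a hard-to-guess name and creates the directory itself.
        mktemp -d
    else
        # Assumed fallback: the PID-based name used by v1 of the script.
        # mkdir without -p fails if the name already exists, so a directory
        # or symlink planted in advance is not silently reused.
        dir="${TMPDIR:-/tmp}/.tmp-hdrs-$$"
        (umask 077 && mkdir "$dir") || return 1
        printf '%s\n' "$dir"
    fi
}
```

A caller would write `tmpdir=$(make_tmpdir) || exit 1`. The series as posted simply switched to `mktemp -d` in v2, which is the simpler choice for a maintainer-only script.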
[PATCH v2 01/12] Add kernel header update script
This helper pulls the required kernel headers for KVM and vhost into a specified directory. The update is triggered via

    scripts/update-linux-headers.sh LINUX_PATH

and will place the output under linux-headers/linux and linux-headers/asm-*. It also imports the COPYING to care for headers without an explicit license.

CC: Alexander Graf ag...@suse.de
CC: Christoph Hellwig h...@lst.de
CC: Peter Maydell peter.mayd...@linaro.org
CC: Andreas Färber andreas.faer...@web.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
Changes in v2:
 - add quoting
 - use mktemp
 - avoid -o for test

 linux-headers/README            |    2 +
 scripts/update-linux-headers.sh |   55 +++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/README
 create mode 100755 scripts/update-linux-headers.sh

diff --git a/linux-headers/README b/linux-headers/README
new file mode 100644
index 000..5c9026b
--- /dev/null
+++ b/linux-headers/README
@@ -0,0 +1,2 @@
+Automatically imported Linux kernel headers.
+Only use scripts/update-linux-headers.sh to update!
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
new file mode 100755
index 000..e43f385
--- /dev/null
+++ b/scripts/update-linux-headers.sh
@@ -0,0 +1,55 @@
+#!/bin/sh -e
+#
+# Update Linux kernel headers QEMU requires from a specified kernel tree.
+#
+# Copyright (C) 2011 Siemens AG
+#
+# Authors:
+#  Jan Kiszka jan.kis...@siemens.com
+#
+# This work is licensed under the terms of the GNU GPL version 2.
+# See the COPYING file in the top-level directory.
+
+tmpdir=`mktemp -d`
+linux="$1"
+output="$2"
+
+if [ -z "$linux" ] || ! [ -d "$linux" ]; then
+    cat << EOF
+usage: update-kernel-headers.sh LINUX_PATH [OUTPUT_PATH]
+
+LINUX_PATH      Linux kernel directory to obtain the headers from
+OUTPUT_PATH     output directory, usually the qemu source tree (default: $PWD)
+EOF
+    exit 1
+fi
+
+if [ -z "$output" ]; then
+    output="$PWD"
+fi
+
+for arch in x86 powerpc s390; do
+    make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch headers_install
+
+    rm -rf "$output/linux-headers/asm-$arch"
+    mkdir -p "$output/linux-headers/asm-$arch"
+    for header in kvm.h kvm_para.h; do
+        cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
+    done
+    if [ $arch == x86 ]; then
+        cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
+    fi
+done
+
+rm -rf "$output/linux-headers/linux"
+mkdir -p "$output/linux-headers/linux"
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+    cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
+done
+if [ -L "$linux/source" ]; then
+    cp "$linux/source/COPYING" "$output/linux-headers"
+else
+    cp "$linux/COPYING" "$output/linux-headers"
+fi
+
+rm -rf "$tmpdir"
--
1.7.1
Re: [PATCH v2 01/12] Add kernel header update script
On 8 June 2011 16:06, Jan Kiszka jan.kis...@siemens.com wrote:
> +    if [ $arch == x86 ]; then

This should be a single '=' -- '==' is a bashism. The 'checkbashisms' script (available in 'devscripts' package on debian and ubuntu) catches this:

cam-vm-266:maverick:testing$ checkbashisms scripts/update-linux-headers.sh
possible bashism in /home/petmay01/linaro/qemu-from-laptop/qemu/scripts/update-linux-headers.sh line 39 (should be 'b = a'):
    if [ $arch == x86 ]; then

Otherwise looks good.

-- PMM
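The portable form of the comparison flagged here can be sketched as follows. The function name `extra_headers_for` is hypothetical; the x86-only hyperv.h header is taken from the script under review.

```shell
# Hypothetical helper for illustration: prints the extra header that the
# script copies for a given architecture, if any.
extra_headers_for() {
    # A single "=" is the string comparison that POSIX specifies for test/[.
    # "==" is accepted by bash but may be rejected by other /bin/sh
    # implementations, which matters for a script starting with "#!/bin/sh".
    if [ "$1" = x86 ]; then
        echo hyperv.h
    fi
}
```

Calling `extra_headers_for x86` prints `hyperv.h`, while any other architecture prints nothing.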
[PATCH v3 01/12] Add kernel header update script
This helper pulls the required kernel headers for KVM and vhost into a specified directory. The update is triggered via

    scripts/update-linux-headers.sh LINUX_PATH

and will place the output under linux-headers/linux and linux-headers/asm-*. It also imports the COPYING to care for headers without an explicit license.

CC: Alexander Graf ag...@suse.de
CC: Christoph Hellwig h...@lst.de
CC: Peter Maydell peter.mayd...@linaro.org
CC: Andreas Färber andreas.faer...@web.de
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
Changes in v3:
 - remove bashism

Changes in v2:
 - add quoting
 - use mktemp
 - avoid -o for test

 linux-headers/README            |    2 +
 scripts/update-linux-headers.sh |   55 +++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/README
 create mode 100755 scripts/update-linux-headers.sh

diff --git a/linux-headers/README b/linux-headers/README
new file mode 100644
index 000..5c9026b
--- /dev/null
+++ b/linux-headers/README
@@ -0,0 +1,2 @@
+Automatically imported Linux kernel headers.
+Only use scripts/update-linux-headers.sh to update!
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
new file mode 100755
index 000..9d2a4bc
--- /dev/null
+++ b/scripts/update-linux-headers.sh
@@ -0,0 +1,55 @@
+#!/bin/sh -e
+#
+# Update Linux kernel headers QEMU requires from a specified kernel tree.
+#
+# Copyright (C) 2011 Siemens AG
+#
+# Authors:
+#  Jan Kiszka jan.kis...@siemens.com
+#
+# This work is licensed under the terms of the GNU GPL version 2.
+# See the COPYING file in the top-level directory.
+
+tmpdir=`mktemp -d`
+linux="$1"
+output="$2"
+
+if [ -z "$linux" ] || ! [ -d "$linux" ]; then
+    cat << EOF
+usage: update-kernel-headers.sh LINUX_PATH [OUTPUT_PATH]
+
+LINUX_PATH      Linux kernel directory to obtain the headers from
+OUTPUT_PATH     output directory, usually the qemu source tree (default: $PWD)
+EOF
+    exit 1
+fi
+
+if [ -z "$output" ]; then
+    output="$PWD"
+fi
+
+for arch in x86 powerpc s390; do
+    make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch headers_install
+
+    rm -rf "$output/linux-headers/asm-$arch"
+    mkdir -p "$output/linux-headers/asm-$arch"
+    for header in kvm.h kvm_para.h; do
+        cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
+    done
+    if [ $arch = x86 ]; then
+        cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
+    fi
+done
+
+rm -rf "$output/linux-headers/linux"
+mkdir -p "$output/linux-headers/linux"
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+    cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
+done
+if [ -L "$linux/source" ]; then
+    cp "$linux/source/COPYING" "$output/linux-headers"
+else
+    cp "$linux/COPYING" "$output/linux-headers"
+fi
+
+rm -rf "$tmpdir"
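As a reading aid for the layout this patch produces, the sketch below lists the directories the script repopulates under the output root. The function `list_header_dirs` is hypothetical and not part of the patch; the architecture list and directory names are taken from the script.

```shell
# Hypothetical helper for illustration: prints the directories that
# update-linux-headers.sh removes and recreates below the output root.
list_header_dirs() {
    # Like the script, default to the current directory when no root is given.
    output=${1:-$PWD}
    for arch in x86 powerpc s390; do
        printf '%s\n' "$output/linux-headers/asm-$arch"
    done
    printf '%s\n' "$output/linux-headers/linux"
}
```

Each listed directory is deleted with `rm -rf` before being refilled, so local edits to the imported headers do not survive an update, which matches the warning in linux-headers/README.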
Re: [PATCH v3 01/12] Add kernel header update script
On 8 June 2011 17:22, Jan Kiszka <jan.kis...@siemens.com> wrote:
> This helper pulls the required kernel headers for KVM and vhost into a
> specified directory. The update is triggered via
>
>     scripts/update-linux-headers.sh LINUX_PATH

> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
> ---
>
> Changes in v3:
>  - remove bashism

Thanks; can't see any problems in this version.

Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>

-- PMM
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On 08/06/11 11:59, Eric Dumazet wrote:
> Well, a bisection definitely should help, but needs a lot of time in
> your case.

Yes. compile, test, crash, walk out to the other building to press
reset, lather, rinse, repeat.

I need a reset button on the end of a 50M wire, or a hardware watchdog!

Actually it's not so bad. If I turn off slub debugging the kernel panics
and reboots itself.

This.. :

[2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
[2.913066] netconsole: device eth0 not up yet, forcing it
[3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
[3.660118] Switching to clocksource tsc
[ 63.200273] r8169 :03:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168e-1.fw (-2)
[ 63.223513] r8169 :03:00.0: eth0: link down
[ 63.223556] r8169 :03:00.0: eth0: link down

..is slowing down reboots considerably. 3.0-rc does _not_ like some
timing hardware in my machine. Having said that, at least it does not
randomly panic on SCSI like 2.6.39 does.

Ok, I've ruled out TCPMSS. Found out where it was being set and neutered
it. I've replicated it with only the single DNAT rule.

> Could you try following patch, because this is the 'usual suspect' I had
> yesterday :
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 46cbd28..9f548f9 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
>  	}
>
> +#if 0
>  	if (fastpath &&
>  	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
>  		memmove(skb->head + size, skb_shinfo(skb),
> @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		off = nhead;
>  		goto adjust_others;
>  	}
> -
> +#endif
>  	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
>  	if (!data)
>  		goto nodata;

Nope.. that's not it. <sigh>

That might have changed the characteristic of the fault slightly, but
unfortunately I got caught with a couple of fsck's, so I only got to
test it 3 times tonight.

It's unfortunate that this is a production system, so I can only take it
down between about 9pm and 1am. That would normally be pretty
productive, except that an fsck of a 14TB ext4 can take 30 minutes if it
panics at the wrong time.

I'm out of time tonight, but I'll have a crack at some bisection
tomorrow night. Now I just have to go back far enough that it works, and
be near enough not to have to futz around with /proc /sys or drivers.

I really, really, really appreciate you guys helping me with this. It
has been driving me absolutely bonkers. If I'm ever in the same town as
any of you, dinner and drinks are on me.
restricting users to only power control of VMs
Hi,

As the subject suggests, we are wondering whether there is any way to
restrict certain classes of users from performing any action other than
powering a VM up and down, and resetting it?

If this can't be done with KVM, does anybody have suggestions on how
this can be accomplished? The only way I can think of is with a setuid
binary that can only start VMs and send reset and shutdown commands to
its monitor socket. However, this does seem hackish and can be insecure
if it's not written perfectly.

Cheers,
Iordan
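
The whitelist idea in the message above could be sketched roughly as follows. This is only an illustration, not an existing tool: the function names `is_allowed_command` and `send_power_command` are hypothetical, and it assumes the guest was started with a human monitor on a UNIX socket (for example `-monitor unix:/path/to/sock,server,nowait`) so that the HMP commands `system_powerdown` and `system_reset` can be written to it as text lines. Starting a VM and the setuid handling the poster worries about are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical whitelist: the only monitor commands a restricted user may send. */
static const char *const allowed_cmds[] = { "system_powerdown", "system_reset" };

/* Return 1 only if cmd matches a whitelisted command exactly, else 0. */
int is_allowed_command(const char *cmd)
{
    size_t i;

    if (cmd == NULL)
        return 0;
    for (i = 0; i < sizeof(allowed_cmds) / sizeof(allowed_cmds[0]); i++)
        if (strcmp(cmd, allowed_cmds[i]) == 0)
            return 1;
    return 0;
}

/*
 * Send one whitelisted command to a monitor listening on a UNIX socket.
 * Returns 0 on success, -1 if the command is rejected or on any I/O error.
 */
int send_power_command(const char *sock_path, const char *cmd)
{
    struct sockaddr_un addr;
    char line[64];
    int fd, len, rc;

    if (!is_allowed_command(cmd))
        return -1;                      /* refuse anything not on the list */
    if (sock_path == NULL || strlen(sock_path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, sock_path);   /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    len = snprintf(line, sizeof(line), "%s\n", cmd);
    rc = (write(fd, line, (size_t)len) == (ssize_t)len) ? 0 : -1;
    close(fd);
    return rc;
}
```

Matching whole strings, rather than prefixes, means an input such as "system_reset; quit" is rejected before anything is written to the socket. Access to the socket itself would still have to be limited by file permissions.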
differencing disks support
Does KVM support or plan to support differencing disks (where there is a
read-only source disk, and each person running a virtual machine can
save block-level changes that their virtual machine is making to the
disk in a separate differencing image)?

If so, can somebody suggest how I may make use of this feature (i.e.
building the newest version from source, and any other requirements).

Thanks!
Iordan
Re: differencing disks support
On Wed, Jun 8, 2011 at 4:09 PM, Iordan Iordanov <ior...@cdf.toronto.edu> wrote:
> Does KVM support or plan to support differencing disks (where there is a
> read-only source disk, and each person running a virtual machine can
> save block-level changes that their virtual machine is making to the
> disk in a separate differencing image)?
>
> If so, can somebody suggest how I may make use of this feature (i.e.
> building the newest version from source, and any other requirements).
>
> Thanks!
> Iordan

I believe you could accomplish this with LVM2 snapshots. You would
create an LVM volume with the base install or set of data or whatever.
Then create snapshots of the original volume. Have your guests use the
snapshot volumes.

This page mentions doing it with Xen in the last paragraph:
http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html

If you need support in qemu specifically for some reason, that's out of
my realm. Hope this helps though.

Dan VerWeire
[PATCH 1/2] kvm tools: Fix some SDL keyboard translations
This patch adds unmapped '<', '>', '|', '-', '+' and '=' which are quite
useful in linux.

Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/ui/sdl.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index 2e7c395..30fd511 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -20,7 +20,8 @@ static u8 keymap[255] = {
 	[17]		= 0x3e, /* 8 */
 	[18]		= 0x46, /* 9 */
 	[19]		= 0x45, /* 9 */
-
+	[20]		= 0x4e, /* - */
+	[21]		= 0x55, /* + */
 	[22]		= 0x66, /* backspace */
 
 	[24]		= 0x15, /* q */
@@ -47,6 +48,8 @@ static u8 keymap[255] = {
 	[46]		= 0x4b, /* l */
 
 	[50]		= 0x12, /* left shift */
+	[51]		= 0x5d, /* | */
+
 
 	[52]		= 0x1a, /* z */
 	[53]		= 0x22, /* x */
@@ -55,8 +58,9 @@ static u8 keymap[255] = {
 	[56]		= 0x32, /* b */
 	[57]		= 0x31, /* n */
 	[58]		= 0x3a, /* m */
-
-	[61]		= 0x4e, /* - */
+	[59]		= 0x41, /* < */
+	[60]		= 0x49, /* > */
+	[61]		= 0x4a, /* / */
 	[62]		= 0x59, /* right shift */
 	[65]		= 0x29, /* space */
 };
--
1.7.5.3
[PATCH 2/2] kvm tools: Use double buffering with SDL
Page flip every time we copy the buffer over instead of invalidating
rects. This should improve performance by letting hardware do the page
flipping.

Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/ui/sdl.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index 30fd511..59a6aa6 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -91,7 +91,7 @@ static void *sdl__thread(void *p)
 
 	if (!guest_screen)
 		die("Unable to create SDL RBG surface");
 
-	flags = SDL_HWSURFACE | SDL_ASYNCBLIT | SDL_HWACCEL;
+	flags = SDL_HWSURFACE | SDL_ASYNCBLIT | SDL_HWACCEL | SDL_DOUBLEBUF;
 	screen = SDL_SetVideoMode(fb->width, fb->height, fb->depth, flags);
 	if (!screen)
@@ -99,7 +99,7 @@ static void *sdl__thread(void *p)
 
 	for (;;) {
 		SDL_BlitSurface(guest_screen, NULL, screen, NULL);
-		SDL_UpdateRect(screen, 0, 0, 0, 0);
+		SDL_Flip(screen);
 
 		while (SDL_PollEvent(&ev)) {
 			switch (ev.type) {
--
1.7.5.3
Re: KVM induced panic on 2.6.38[2367] 2.6.39
On Thursday, 9 June 2011 at 01:02 +0800, Brad Campbell wrote:
> On 08/06/11 11:59, Eric Dumazet wrote:
> > Well, a bisection definitely should help, but needs a lot of time in
> > your case.
>
> Yes. compile, test, crash, walk out to the other building to press
> reset, lather, rinse, repeat.
>
> I need a reset button on the end of a 50M wire, or a hardware watchdog!
>
> Actually it's not so bad. If I turn off slub debugging the kernel panics
> and reboots itself.
>
> This.. :
>
> [2.913034] netconsole: remote ethernet address 00:16:cb:a7:dd:d1
> [2.913066] netconsole: device eth0 not up yet, forcing it
> [3.660062] Refined TSC clocksource calibration: 3213.422 MHz.
> [3.660118] Switching to clocksource tsc
> [ 63.200273] r8169 :03:00.0: eth0: unable to load firmware patch rtl_nic/rtl8168e-1.fw (-2)
> [ 63.223513] r8169 :03:00.0: eth0: link down
> [ 63.223556] r8169 :03:00.0: eth0: link down
>
> ..is slowing down reboots considerably. 3.0-rc does _not_ like some
> timing hardware in my machine. Having said that, at least it does not
> randomly panic on SCSI like 2.6.39 does.
>
> Ok, I've ruled out TCPMSS. Found out where it was being set and neutered
> it. I've replicated it with only the single DNAT rule.
>
> > Could you try following patch, because this is the 'usual suspect' I had
> > yesterday :
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 46cbd28..9f548f9 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -792,6 +792,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
> >  	}
> >
> > +#if 0
> >  	if (fastpath &&
> >  	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
> >  		memmove(skb->head + size, skb_shinfo(skb),
> > @@ -802,7 +803,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> >  		off = nhead;
> >  		goto adjust_others;
> >  	}
> > -
> > +#endif
> >  	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
> >  	if (!data)
> >  		goto nodata;
>
> Nope.. that's not it. <sigh>
>
> That might have changed the characteristic of the fault slightly, but
> unfortunately I got caught with a couple of fsck's, so I only got to
> test it 3 times tonight.
>
> It's unfortunate that this is a production system, so I can only take it
> down between about 9pm and 1am. That would normally be pretty
> productive, except that an fsck of a 14TB ext4 can take 30 minutes if it
> panics at the wrong time.
>
> I'm out of time tonight, but I'll have a crack at some bisection
> tomorrow night. Now I just have to go back far enough that it works, and
> be near enough not to have to futz around with /proc /sys or drivers.
>
> I really, really, really appreciate you guys helping me with this. It
> has been driving me absolutely bonkers. If I'm ever in the same town as
> any of you, dinner and drinks are on me.

Hmm, I wonder if kmemcheck could help you, but it's slow as hell, so not
appropriate for production :(
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu <d...@redhat.com> wrote:
> Hi Rusty,
> Yes, I can't figure out an instance of disk probing in parallel either,
> but as per the following commit, I think we still need use lock for
> safety. What's your opinion?
>
> commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
> Author: Tejun Heo <t...@kernel.org>
> Date:   Sat Feb 21 11:04:45 2009 +0900
>
>     [SCSI] sd: revive sd_index_lock
>
>     Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
>     use ida instead of idr incorrectly removed sd_index_lock around id
>     allocation and free. idr/ida do have internal locks but they protect
>     their free object lists not the allocation itself. The caller is
>     responsible for that. This missing synchronization led to the same id
>     being assigned to multiple devices leading to oops.

I'm confused. Tejun, Greg, anyone can probes happen in parallel?

If so, I'll have to review all my drivers.

Thanks,
Rusty.
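
The rule in the quoted commit message (the allocator does not serialize allocation, so the caller has to) can be illustrated with a small userspace model. This is only a sketch: `toy_ida`, `alloc_disk_index` and `free_disk_index` are hypothetical names, the bitmap is a stand-in rather than the kernel's ida, and a pthread mutex plays the role that a lock such as sd_index_lock plays in a driver.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_IDS 64

/* Toy stand-in for an ida: a bitmap of used ids with NO locking of its own. */
struct toy_ida {
    unsigned long long used;
};

/* Return the lowest free id and mark it used, or -1 if none is left.
 * Not safe to call concurrently: the test and the set are not atomic. */
int toy_ida_get(struct toy_ida *ida)
{
    int id;

    for (id = 0; id < MAX_IDS; id++) {
        if (!(ida->used & (1ULL << id))) {
            ida->used |= 1ULL << id;
            return id;
        }
    }
    return -1;
}

/* Mark an id as free again. Also unlocked. */
void toy_ida_put(struct toy_ida *ida, int id)
{
    ida->used &= ~(1ULL << id);
}

static struct toy_ida disk_index_ida;
static pthread_mutex_t disk_index_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller-side locking: every get and put happens under disk_index_lock,
 * so two parallel probes can never observe the same bit as free. */
int alloc_disk_index(void)
{
    int id;

    pthread_mutex_lock(&disk_index_lock);
    id = toy_ida_get(&disk_index_ida);
    pthread_mutex_unlock(&disk_index_lock);
    return id;
}

void free_disk_index(int id)
{
    pthread_mutex_lock(&disk_index_lock);
    toy_ida_put(&disk_index_ida, id);
    pthread_mutex_unlock(&disk_index_lock);
}
```

Without the mutex, two threads could both see the same bit clear before either sets it, which is the duplicate-id failure the commit message describes.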
Re: virtio scsi host draft specification, v3
On Tue, 07 Jun 2011 15:43:49 +0200, Paolo Bonzini <pbonz...@redhat.com> wrote:
> Hi all,
>
> after some preliminary discussion on the QEMU mailing list, I present a
> draft specification for a virtio-based SCSI host (controller, HBA, you
> name it).

OK, I'm impressed. This is very well written and it doesn't make any of
the obvious mistakes wrt. virtio.

Unfortunately, I know almost nothing of SCSI, so I have to leave it to
others to decide if this is actually useful and sufficient. I assume you
have an implementation, as well?

Thanks,
Rusty.
[PATCH] tun: do not put self in waitq if doing a nonblock read
Perf shows relatively high contention (about 8%) in spin_lock_irqsave()
when running netperf between an external host and a guest. It is mainly
because of the lock contention between tun_do_read() and
tun_xmit_skb(), so this patch does not put the reader on the waitqueue
when doing a nonblocking read, to reduce this kind of contention. After
this patch, it drops to 4%.

Signed-off-by: Jason Wang <jasow...@redhat.com>
Signed-off-by: Amos Kong <ak...@redhat.com>
---
 drivers/net/tun.c |    6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 74e9405..95dbff4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -817,7 +817,8 @@ static ssize_t tun_do_read(struct tun_struct *tun,
 
 	tun_debug(KERN_INFO, tun, "tun_chr_read\n");
 
-	add_wait_queue(&tun->wq.wait, &wait);
+	if (unlikely(!noblock))
+		add_wait_queue(&tun->wq.wait, &wait);
 	while (len) {
 		current->state = TASK_INTERRUPTIBLE;
 
@@ -848,7 +849,8 @@ static ssize_t tun_do_read(struct tun_struct *tun,
 	}
 
 	current->state = TASK_RUNNING;
-	remove_wait_queue(&tun->wq.wait, &wait);
+	if (unlikely(!noblock))
+		remove_wait_queue(&tun->wq.wait, &wait);
 
 	return ret;
 }
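
To make the effect of the patch above concrete, here is a small single-threaded userspace model, not driver code. All names (`toy_waitq`, `toy_dev`, `toy_do_read`) are hypothetical. Each add or remove on the toy waitqueue counts one lock acquisition, standing in for the spin_lock_irqsave() taken inside add_wait_queue() and remove_wait_queue(). A nonblocking reader never sleeps, so it has no reason to be on the queue and can skip both acquisitions.

```c
#include <assert.h>
#include <errno.h>

/* Toy waitqueue: counts how often its (imaginary) spinlock is taken. */
struct toy_waitq {
    int lock_acquisitions;
    int waiters;
};

/* Toy device: a waitqueue plus a count of packets ready to be read. */
struct toy_dev {
    struct toy_waitq wq;
    int pending;
};

void toy_add_wait(struct toy_waitq *q)
{
    q->lock_acquisitions++;     /* models the lock taken to link the waiter */
    q->waiters++;
}

void toy_remove_wait(struct toy_waitq *q)
{
    q->lock_acquisitions++;     /* models the lock taken to unlink the waiter */
    q->waiters--;
}

/* Returns 1 if a packet was consumed, -EAGAIN for an empty nonblocking
 * read, and -EINTR for an empty blocking read (the model cannot sleep,
 * so it behaves as if a signal arrived immediately). */
int toy_do_read(struct toy_dev *dev, int noblock)
{
    int ret;

    if (!noblock)
        toy_add_wait(&dev->wq);     /* only blocking readers may sleep */

    if (dev->pending > 0) {
        dev->pending--;
        ret = 1;
    } else if (noblock) {
        ret = -EAGAIN;
    } else {
        ret = -EINTR;
    }

    if (!noblock)
        toy_remove_wait(&dev->wq);

    return ret;
}
```

In this model a nonblocking read costs zero waitqueue lock acquisitions and a blocking read costs two, which is the saving the patch is after on a path where reads are mostly nonblocking.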
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote:
> On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu <d...@redhat.com> wrote:
> > Hi Rusty,
> > Yes, I can't figure out an instance of disk probing in parallel either,
> > but as per the following commit, I think we still need use lock for
> > safety. What's your opinion?
> >
> > commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
> > Author: Tejun Heo <t...@kernel.org>
> > Date:   Sat Feb 21 11:04:45 2009 +0900
> >
> >     [SCSI] sd: revive sd_index_lock
> >
> >     Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
> >     use ida instead of idr incorrectly removed sd_index_lock around id
> >     allocation and free. idr/ida do have internal locks but they protect
> >     their free object lists not the allocation itself. The caller is
> >     responsible for that. This missing synchronization led to the same id
> >     being assigned to multiple devices leading to oops.
>
> I'm confused. Tejun, Greg, anyone can probes happen in parallel?
>
> If so, I'll have to review all my drivers.

I know we've tried it in the past, at the PCI device level, and ran into
some issues, but I don't remember if that code ever made it into the
mainline kernel or not.

greg k-h
RE: [PATCH] KVM: Adjust shadow paging to work when SMEP=1 and CR0.WP=0
Do we have test cases with guest.wp=0 in KVM test suite?

Thanks!
-Xin

-----Original Message-----
From: Avi Kivity [mailto:a...@redhat.com]
Sent: Monday, June 06, 2011 9:19 PM
To: Marcelo Tosatti; kvm@vger.kernel.org; Yang, Wei Y; Shan, Haitao; Li, Xin
Subject: [PATCH] KVM: Adjust shadow paging to work when SMEP=1 and CR0.WP=0

When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them). Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.

Adjust for this by also setting NX on the spte in these circumstances.

Signed-off-by: Avi Kivity <a...@redhat.com>
---

Turned out a little more complicated than I thought.

 Documentation/virtual/kvm/mmu.txt |   18 ++
 arch/x86/include/asm/kvm_host.h   |    1 +
 arch/x86/kvm/mmu.c                |   14 +-
 3 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index f46aa58..5dc972c 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -165,6 +165,10 @@ Shadow pages contain the following information:
     Contains the value of efer.nxe for which the page is valid.
   role.cr0_wp:
     Contains the value of cr0.wp for which the page is valid.
+  role.smep_andnot_wp:
+    Contains the value of cr4.smep && !cr0.wp for which the page is valid
+    (pages for which this is true are different from other pages; see the
+    treatment of cr0.wp=0 below).
   gfn:
     Either the guest page table containing the translations shadowed by this
     page, or the base page frame for linear translations. See role.direct.
@@ -317,6 +321,20 @@ on fault type:
 
 (user write faults generate a #PF)
 
+In the first case there is an additional complication if CR4.SMEP is
+enabled: since we've turned the page into a kernel page, the kernel may now
+execute it. We handle this by also setting spte.nx. If we get a user
+fetch or read fault, we'll change spte.u=1 and spte.nx=gpte.nx back.
+
+To prevent an spte that was converted into a kernel page with cr0.wp=0
+from being written by the kernel after cr0.wp has changed to 1, we make
+the value of cr0.wp part of the page role. This means that an spte created
+with one value of cr0.wp cannot be used when cr0.wp has a different value -
+it will simply be missed by the shadow page lookup code. A similar issue
+exists when an spte created with cr0.wp=0 and cr4.smep=0 is used after
+changing cr4.smep to 1. To avoid this, the value of !cr0.wp && cr4.smep
+is also made a part of the page role.
+
 Large pages
 ===========
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fc38eca..c7e7f53 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -205,6 +205,7 @@ union kvm_mmu_page_role {
 		unsigned invalid:1;
 		unsigned nxe:1;
 		unsigned cr0_wp:1;
+		unsigned smep_andnot_wp:1;
 	};
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2d14434..823f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1985,8 +1985,17 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		spte |= PT_WRITABLE_MASK;
 
 		if (!vcpu->arch.mmu.direct_map
-		    && !(pte_access & ACC_WRITE_MASK))
+		    && !(pte_access & ACC_WRITE_MASK)) {
 			spte &= ~PT_USER_MASK;
+			/*
+			 * If we converted a user page to a kernel page,
+			 * so that the kernel can write to it when cr0.wp=0,
+			 * then we should prevent the kernel from executing it
+			 * if SMEP is enabled.
+			 */
+			if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
+				spte |= PT64_NX_MASK;
+		}
 
 		/*
 		 * Optimization: for pte sync, if spte was writable the hash
@@ -2955,6 +2964,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
 	int r;
+	bool smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
 	ASSERT(vcpu);
 	ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
@@ -2969,6 +2979,8 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 
 	vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
 	vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
+	vcpu->arch.mmu.base_role.smep_andnot_wp
+		= smep && !is_write_protection(vcpu);
 
 	return r;
 }
--
1.7.5.3
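
The rule described in the patch above can be restated as a small standalone function. This is a simplified sketch, not the kernel's set_spte(): the names `fixup_kernel_write_wp0` and `smep_andnot_wp` (as a function) are hypothetical, the bit positions follow the x86 page-table layout (bit 1 writable, bit 2 user, bit 63 NX), and only the single case from the patch is modelled, a kernel write fault on a guest pte that is not writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative shadow-pte bits, following the x86 page-table layout. */
#define SPTE_WRITABLE (1ULL << 1)
#define SPTE_USER     (1ULL << 2)
#define SPTE_NX       (1ULL << 63)

/*
 * Model of the cr0.wp=0 case: the guest kernel writes to a page whose
 * guest pte is read-only. With cr0.wp=0 the write must succeed, so the
 * spte becomes a writable kernel page. If cr4.smep=1, NX is also set so
 * that the kernel cannot fetch instructions from what is really a user
 * page. In every other case the spte is returned unchanged.
 */
uint64_t fixup_kernel_write_wp0(uint64_t spte, bool gpte_writable,
                                bool cr0_wp, bool cr4_smep)
{
    if (gpte_writable || cr0_wp)
        return spte;            /* not the special case modelled here */

    spte |= SPTE_WRITABLE;      /* let the kernel write succeed */
    spte &= ~SPTE_USER;         /* turn it into a kernel page */
    if (cr4_smep)
        spte |= SPTE_NX;        /* SMEP on: forbid kernel execution */
    return spte;
}

/* The extra page-role bit: true only when SMEP is on and WP is off. */
bool smep_andnot_wp(bool cr4_smep, bool cr0_wp)
{
    return cr4_smep && !cr0_wp;
}
```

Because the role bit differs between SMEP on and SMEP off while cr0.wp=0, shadow pages built under one setting are not found by a lookup under the other, which is the second half of what the documentation hunk explains.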