Re: Guest floppy regression hits qemu-kvm, qemu is not affected
On 11/18/2011 04:06 PM, Lucas Meneghel Rodrigues wrote: Hi guys, Today during the last 'sanity' qemu-kvm testing, we've noticed a recurring problem: guest OS does not see the floppy, making the windows installs time out. This problem has been extensively discussed on qemu and qemu is fine here, problem is specific with qemu-kvm. http://comments.gmane.org/gmane.comp.emulators.qemu/123928 So, this is something to go on your radars. What was the last qemu-kvm commit hash that showed the problem? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM updates for 3.2-rc2
Linus, please pull from: git://git.kernel.org/pub/scm/virt/kvm/kvm.git kvm-updates/3.2 This includes a ppc ABI breakage fix, s390 fixes, a tracing/kvmclock conflict fix, and the implementation of guest-only/host-only profiling for Intel. The latter is not strictly a fix, and missed the merge window due to a combination of a dependency on tip.git, kernel.org being offline, and me forgetting all about it. However it would be good to have the feature implemented fully instead of just on AMD as it is at present. Alexander Graf (1): Revert KVM: PPC: Add support for explicit HIOR setting Avi Kivity (1): KVM guest: prevent tracing recursion with kvmclock Christian Borntraeger (2): KVM: s390: Fix tprot locking KVM: s390: announce SYNC_MMU Cornelia Huck (2): KVM: s390: Fix RUNNING flag misinterpretation KVM: s390: handle SIGP sense running intercepts Gleb Natapov (3): KVM: VMX: add support for switching of PERF_GLOBAL_CTRL KVM: VMX: Add support for guest/host-only profiling KVM: VMX: Check for automatic switch msr table overflow arch/powerpc/include/asm/kvm.h|8 -- arch/powerpc/include/asm/kvm_book3s.h |2 - arch/powerpc/kvm/book3s_pr.c | 14 +--- arch/powerpc/kvm/powerpc.c|1 - arch/s390/include/asm/kvm_host.h |3 +- arch/s390/kvm/diag.c |2 +- arch/s390/kvm/intercept.c |3 +- arch/s390/kvm/interrupt.c |1 + arch/s390/kvm/kvm-s390.c | 12 ++- arch/s390/kvm/priv.c | 10 ++- arch/s390/kvm/sigp.c | 45 +++- arch/x86/kernel/kvmclock.c|5 +- arch/x86/kvm/vmx.c| 131 ++--- include/linux/kvm.h |1 - 14 files changed, 189 insertions(+), 49 deletions(-) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] KVM updates for 3.2-rc2
(now with [GIT PULL] for filters) Linus, please pull from: git://git.kernel.org/pub/scm/virt/kvm/kvm.git kvm-updates/3.2 This includes a ppc ABI breakage fix, s390 fixes, a tracing/kvmclock conflict fix, and the implementation of guest-only/host-only profiling for Intel. The latter is not strictly a fix, and missed the merge window due to a combination of a dependency on tip.git, kernel.org being offline, and me forgetting all about it. However it would be good to have the feature implemented fully instead of just on AMD as it is at present. Alexander Graf (1): Revert KVM: PPC: Add support for explicit HIOR setting Avi Kivity (1): KVM guest: prevent tracing recursion with kvmclock Christian Borntraeger (2): KVM: s390: Fix tprot locking KVM: s390: announce SYNC_MMU Cornelia Huck (2): KVM: s390: Fix RUNNING flag misinterpretation KVM: s390: handle SIGP sense running intercepts Gleb Natapov (3): KVM: VMX: add support for switching of PERF_GLOBAL_CTRL KVM: VMX: Add support for guest/host-only profiling KVM: VMX: Check for automatic switch msr table overflow arch/powerpc/include/asm/kvm.h|8 -- arch/powerpc/include/asm/kvm_book3s.h |2 - arch/powerpc/kvm/book3s_pr.c | 14 +--- arch/powerpc/kvm/powerpc.c|1 - arch/s390/include/asm/kvm_host.h |3 +- arch/s390/kvm/diag.c |2 +- arch/s390/kvm/intercept.c |3 +- arch/s390/kvm/interrupt.c |1 + arch/s390/kvm/kvm-s390.c | 12 ++- arch/s390/kvm/priv.c | 10 ++- arch/s390/kvm/sigp.c | 45 +++- arch/x86/kernel/kvmclock.c|5 +- arch/x86/kvm/vmx.c| 131 ++--- include/linux/kvm.h |1 - 14 files changed, 189 insertions(+), 49 deletions(-) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2] ivshmem: add a new PIO BAR3(Doorbell) besides MMIO BAR0 to reduce notification time
On 11/18/2011 07:50 AM, zanghongy...@huawei.com wrote: From: Hongyong Zang zanghongy...@huawei.com This patch, adds a PIO BAR3 for guest notifying qemu. And we find the new notification way of PIO BAR3 reduces 30% time in comparison with the original MMIO BAR0 way. Please update the spec, and split the patch into an infrastructure patch (for long eventfds in the memory API) and an ivshmem patch. //#define DEBUG_IVSHMEM #ifdef DEBUG_IVSHMEM @@ -57,8 +58,10 @@ typedef struct IVShmemState { CharDriverState **eventfd_chr; CharDriverState *server_chr; MemoryRegion ivshmem_mmio; +MemoryRegion ivshmem_pio; pcibus_t mmio_addr; +pcibus_t pio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -234,7 +237,7 @@ static uint64_t ivshmem_io_read(void *opaque, target_phys_addr_t addr, return ret; } -static const MemoryRegionOps ivshmem_mmio_ops = { +static const MemoryRegionOps ivshmem_io_ops = { .read = ivshmem_io_read, .write = ivshmem_io_write, .endianness = DEVICE_NATIVE_ENDIAN, @@ -348,6 +351,8 @@ static void close_guest_eventfds(IVShmemState *s, int posn) for (i = 0; i guest_curr_max; i++) { kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], s-mmio_addr + DOORBELL, (posn 16) | i, 0); +kvm_set_ioeventfd_pio_long(s-peers[posn].eventfds[i], +s-pio_addr + DOORBELL, (posn 16) | i, 0); This really shouldn't be needed - the memory API should take care of it. close(s-peers[posn].eventfds[i]); } @@ -367,6 +372,12 @@ static void setup_ioeventfds(IVShmemState *s) { true, (i 16) | j, s-peers[i].eventfds[j]); +memory_region_add_eventfd(s-ivshmem_pio, + DOORBELL, + 4, + true, + (i 16) | j, + s-peers[i].eventfds[j]); } } Where is the memory API support for this? } @@ -495,6 +506,10 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) (incoming_posn 16) | guest_max_eventfd, 1) 0) { fprintf(stderr, ivshmem: ioeventfd not available\n); } +if (kvm_set_ioeventfd_pio_long(incoming_fd, s-pio_addr + DOORBELL, +(incoming_posn 16) | guest_max_eventfd, 1) 0) { +fprintf(stderr, ivshmem: ioeventfd not available\n); +} } Nor should this be needed. Please make BAR 3 disappear if started with -M pc-1.0. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC v2 PATCH] kvm tools, qcow: Add the support for copy-on-write clusters
On Sun, 2011-11-20 at 14:14 +0800, Lan, Tianyu wrote: OK. Thx. But fsync is too slow. I try to find a way to sync a range of file. Are there any solutions to meet my purpose? On Sun, 2011-11-20 at 08:23 +0200, Sasha Levin wrote: fdatasync() is as good as it'll get. tbh, maybe we should just consider opening QCOW images with O_SYNC and just get it over with? No, lets not do that. It's easier to improve the performance of correct code that doesn't use O_SYNC. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/6] KVM: introduce kvm_for_each_memslot macro
On 11/18/2011 11:18 AM, Xiao Guangrong wrote: index bb8728e..10524c0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -307,6 +307,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \ idx++) +#define kvm_for_each_memslot(slots, memslot, i) \ + for (i = 0; i (slots)-nmemslots\ + ({ memslot = (slots)-memslots[i]; 1; }); i++) + Statement expression not needed, you can use the comma operator: i (slots)-nmemslots (memslot = @(slots)-memslots[i], true) or even memslot = (slots)-memslots[i], i (slots)-nmemslots or just kill i and make memslot the loop variable. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 5/6] KVM: sort memslots by its size and use line search
On 11/18/2011 11:19 AM, Xiao Guangrong wrote: Sort memslots base on its size and use line search to find it, so the larger memslots have better fit The idea is from Avi Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- include/linux/kvm_host.h | 22 +--- virt/kvm/kvm_main.c | 82 - 2 files changed, 75 insertions(+), 29 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a0e4d63..83b396a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -230,8 +230,12 @@ struct kvm_irq_routing_table {}; #define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS) #endif +/* + * Note: + * memslots are not sorted by id anymore, please use id_to_memslot() + * to get the memslot by its id. + */ struct kvm_memslots { - int nmemslots; u64 generation; struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM]; }; @@ -307,9 +311,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \ idx++) -#define kvm_for_each_memslot(slots, memslot, i) \ - for (i = 0; i (slots)-nmemslots\ - ({ memslot = (slots)-memslots[i]; 1; }); i++) +#define kvm_for_each_memslot(slots, memslot, i) \ + for (i = 0; i KVM_MEM_SLOTS_NUM \ + ({ memslot = (slots)-memslots[i]; 1; }) \ + memslot-npages != 0; i++) You might allocate an always-empty memslot at the end and simplify the termination condition. int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id); void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); @@ -335,7 +340,14 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) static inline struct kvm_memory_slot * id_to_memslot(struct kvm_memslots *slots, int id) { - return slots-memslots[id]; + int i; + + for (i = 0; i KVM_MEM_SLOTS_NUM; i++) + if (slots-memslots[i].id == id) + return slots-memslots[i]; + Is that in any hot path? we could make an array for doing this translation. Alex wants to increase the memslot count, so this could be a long loop. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 5/6] KVM: sort memslots by its size and use line search
On 11/20/2011 01:26 PM, Avi Kivity wrote: int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id); void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); @@ -335,7 +340,14 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) static inline struct kvm_memory_slot * id_to_memslot(struct kvm_memslots *slots, int id) { - return slots-memslots[id]; + int i; + + for (i = 0; i KVM_MEM_SLOTS_NUM; i++) + if (slots-memslots[i].id == id) + return slots-memslots[i]; + Is that in any hot path? we could make an array for doing this translation. Never mind, I see patch 6. I should have known you wouldn't leave it like that. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/6] KVM: optimize memslots searching
On 11/18/2011 11:16 AM, Xiao Guangrong wrote: This is the more work base on my v1 patchset which is posted some months ago, it can be found at: https://lkml.org/lkml/2011/2/22/68 Change log: - sort memslots base on its size and do the line search instead of binary search base on gfn, it is from Avi's idea. - in order to reduce cache footprint, memslots are sorted in the array of kvm-memslots-memslots[] and introduce a table to map slot id to index in the array There is the performance result: autotest for RHEL.6.1 setup/boot/reboot/shutdown(average): ept=1:before: 449.5 after: 447.8 ept=0:before: 532.7 after: 529.8 kernbench(average): ept=1:before: 127.94 after: 126.98 ept=0:before: 196.85 after: 189.66 Looks good, had a couple of comments but they're really minor. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
On 11/17/2011 01:15 PM, Martin Schwidefsky wrote: On Thu, 17 Nov 2011 12:27:41 +0200 Avi Kivity a...@redhat.com wrote: On 11/17/2011 12:00 PM, Carsten Otte wrote: From: Christian Borntraeger borntrae...@de.ibm.com There is a potential host deadlock in the tprot intercept handling. We must not hold the mmap semaphore while resolving the guest address. If userspace is remapping, then the memory detection in the guest is broken anyway so we can safely separate the address translation from walking the vmas. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Carsten Otte co...@de.ibm.com --- arch/s390/kvm/priv.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c --- linux-2.6/arch/s390/kvm/priv.c2011-10-24 09:10:05.0 +0200 +++ linux-2.6-patched/arch/s390/kvm/priv.c2011-11-17 10:03:53.0 +0100 @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu u64 address1 = disp1 + base1 ? vcpu-arch.guest_gprs[base1] : 0; u64 address2 = disp2 + base2 ? vcpu-arch.guest_gprs[base2] : 0; struct vm_area_struct *vma; + unsigned long user_address; vcpu-stat.instruction_tprot++; @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu return -EOPNOTSUPP; + /* we must resolve the address without holding the mmap semaphore. + * This is ok since the userspace hypervisor is not supposed to change + * the mapping while the guest queries the memory. Otherwise the guest + * might crash or get wrong info anyway. */ + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1); + down_read(current-mm-mmap_sem); - vma = find_vma(current-mm, - (unsigned long) __guestaddr_to_user(vcpu, address1)); + vma = find_vma(current-mm, user_address); if (!vma) { up_read(current-mm-mmap_sem); return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); Unrelated to the patch, but I'm curious: it looks like __gmap_fault() dereferences the guest page table? How can it assume that it is mapped? The gmap code does not assume that the code is mapped. If the individual MB has not been mapped in the guest address space or the target memory is not mapped in the process address space __gmap_fault() returns -EFAULT. I'm probably misreading the code. I did misread it - I assumed it was guest page table, whereas those are host page tables mapping guest memory (here, guest page table mean guest-managed virt to phys translation). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
On 11/17/2011 01:32 PM, Martin Schwidefsky wrote: On Thu, 17 Nov 2011 12:15:52 +0100 Martin Schwidefsky schwidef...@de.ibm.com wrote: On Thu, 17 Nov 2011 12:27:41 +0200 Avi Kivity a...@redhat.com wrote: On 11/17/2011 12:00 PM, Carsten Otte wrote: From: Christian Borntraeger borntrae...@de.ibm.com There is a potential host deadlock in the tprot intercept handling. We must not hold the mmap semaphore while resolving the guest address. If userspace is remapping, then the memory detection in the guest is broken anyway so we can safely separate the address translation from walking the vmas. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Carsten Otte co...@de.ibm.com --- arch/s390/kvm/priv.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c --- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.0 +0200 +++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.0 +0100 @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu u64 address1 = disp1 + base1 ? vcpu-arch.guest_gprs[base1] : 0; u64 address2 = disp2 + base2 ? vcpu-arch.guest_gprs[base2] : 0; struct vm_area_struct *vma; + unsigned long user_address; vcpu-stat.instruction_tprot++; @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu return -EOPNOTSUPP; + /* we must resolve the address without holding the mmap semaphore. +* This is ok since the userspace hypervisor is not supposed to change +* the mapping while the guest queries the memory. Otherwise the guest +* might crash or get wrong info anyway. */ + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1); + down_read(current-mm-mmap_sem); - vma = find_vma(current-mm, - (unsigned long) __guestaddr_to_user(vcpu, address1)); + vma = find_vma(current-mm, user_address); if (!vma) { up_read(current-mm-mmap_sem); return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING); Unrelated to the patch, but I'm curious: it looks like __gmap_fault() dereferences the guest page table? How can it assume that it is mapped? The gmap code does not assume that the code is mapped. If the individual MB has not been mapped in the guest address space or the target memory is not mapped in the process address space __gmap_fault() returns -EFAULT. I'm probably misreading the code. A little closer to the patch, x86 handles the same issue by calling get_user_pages_fast(). This should be more scalable than bouncing mmap_sem, something to consider. I don't think that the frequency of asynchronous page faults will make it necessary to use get_user_pages_fast(). We are talking about the case where I/O is necessary to provide the page that the guest accessed. The advantage of the way s390 does things is that after __gmap_fault translated the guest address to a user space address we can just do a standard page fault for the user space process. Only if that requires I/O we go the long way. Makes sense? Hmm, Carsten just made me aware that your question is not about pfault, it is about the standard case of a guest fault. For normal guest faults we use a cool trick that the s390 hardware allows us. We have the paging table for the kvm process and we have the guest page table for execution in the virtualized context. The trick is that the guest page table reuses the lowest level of the process page table. A fault that sets a pte in the process page table will automatically make that pte visible in the guest page table as well if the memory region has been mapped in the higher order page tables. Even the invalidation of a pte will automatically (!!) remove the referenced page from the guest page table as well, including the TLB entries on all cpus. The IPTE instruction is your friend :-) That is why we don't need mm notifiers. Yes, that explains it perfectly. I congratulate you on having such friendly hardware... -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/6] KVM: optimize memslots searching
On 11/20/2011 01:29 PM, Avi Kivity wrote: On 11/18/2011 11:16 AM, Xiao Guangrong wrote: This is the more work base on my v1 patchset which is posted some months ago, it can be found at: https://lkml.org/lkml/2011/2/22/68 Change log: - sort memslots base on its size and do the line search instead of binary search base on gfn, it is from Avi's idea. - in order to reduce cache footprint, memslots are sorted in the array of kvm-memslots-memslots[] and introduce a table to map slot id to index in the array There is the performance result: autotest for RHEL.6.1 setup/boot/reboot/shutdown(average): ept=1: before: 449.5 after: 447.8 ept=0: before: 532.7 after: 529.8 kernbench(average): ept=1: before: 127.94 after: 126.98 ept=0: before: 196.85 after: 189.66 Looks good, had a couple of comments but they're really minor. btw, this patchset touches a lot of common code. Did you crossbuild for other kvm archs? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM device assignment and user privileges
Hi all, I've been working on adding device assignment to KVM tools, and started with the basics of just getting a device assigned using the KVM_ASSIGN_PCI_DEVICE ioctl. What I've figured is that unprivileged users can request any PCI device to be assigned to him, including devices which he shouldn't be touching. In my case, it happened with the VGA card, where an unprivileged user simply called KVM_ASSIGN_PCI_DEVICE with the bus, seg and fn of the VGA card and caused the display on the host to go apeshit. Was it supposed to work this way? I couldn't find any security checks in the code paths of KVM_ASSIGN_PCI_DEVICE and it looks like any user can invoke it with any parameters he'd want - enabling him to kill the host. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM device assignment and user privileges
On 11/20/2011 04:58 PM, Sasha Levin wrote: Hi all, I've been working on adding device assignment to KVM tools, and started with the basics of just getting a device assigned using the KVM_ASSIGN_PCI_DEVICE ioctl. What I've figured is that unprivileged users can request any PCI device to be assigned to him, including devices which he shouldn't be touching. In my case, it happened with the VGA card, where an unprivileged user simply called KVM_ASSIGN_PCI_DEVICE with the bus, seg and fn of the VGA card and caused the display on the host to go apeshit. Was it supposed to work this way? No, of course not. I couldn't find any security checks in the code paths of KVM_ASSIGN_PCI_DEVICE and it looks like any user can invoke it with any parameters he'd want - enabling him to kill the host. Alex, Chris? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
On 11/17/11 4:15 PM, Ben Hutchings bhutchi...@solarflare.com wrote: Sorry to come to this rather late. On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote: [...] v2 - v3 - Moved set and get filter ops from rtnl_link_ops to netdev_ops - Support for SRIOV VFs. [Note: The get filters msg (in the way current get rtnetlink handles it) might get too big for SRIOV vfs. This patch follows existing sriov vf get code and tries to accomodate filters for all VF's in a PF. And for the SRIOV case I have only tested the fact that the VF arguments are getting delivered to rtnetlink correctly. The code follows existing sriov vf handling code so rest of it should work fine] [...] This is already broken for large numbers of VFs, and increasing the amount of information per VF is going to make the situation worse. I am no netlink expert but I think that the current approach of bundling all information about an interface in a single message may not be sustainable. Yes agreed. I have the same concern. Also, I'm unclear on why this interface is to be used to set filtering for the (PF) net device as well as for related VFs. Doesn't that duplicate the functionality of ndo_set_rx_mode and ndo_vlan_rx_{add,kill}_vid? Yes..I have thought about this. But the reason the final version is the way it is because its trying to accommodate sriov and non sriov cases because I was just trying to make the netlink interface available to as many use cases that might need it. I just wanted to bring up the original intent of this patch. Which was to add support for TUNSETTXFILTER to macvtap so that it can do filtering instead of putting the lower dev (physical dev) in promiscuous mode (This part really does not care if the lowerdev is an SRIOV VF or not). And the focus was on macvlan passthru mode because it is the simplest case to solve (you have to just pass the filters to lowerdev device/driver). Now this may seem like It can be done with existing set_rx_mode/add_vlan_id etc (which are essentially the mechanisms I am using in the macvlan driver to send the filters to lowerdev for passthru mode), but it might not be the case for other macvlan modes. Macvlan device might need to maintain and do a filter lookup like the tap driver does today. And the only reason SRIOV came up in the original patch was because PASSTHRU mode of macvlan was added for SRIOV use case, though it really does not care if the lowerdev is an SRIOV VF or not. Instead of implementing TUNSETTXFILTER, michael had suggested netlink interface instead. When implementing the netlink interface, I did go back and forth in deciding whether this should be on every netdev or only virtual devices that support rtnl link ops. And the existence of set_rx_mode and add_vlan_id netdev ops Was the reason for confusion. So the next version implemented it as rtnl link ops because all I really want is a mechanism like TUNSETTXFILTER which can set/get filters for virtual devices that need to do filtering by themselves. But restricting this interface for only virtual devices did not make great sense so when greg pointed it out that he will need it for VF netdevs, I was happy to move it to netdev ops. And the only reason this patch works on both PF and VF if the final version is because, its trying to accommodate both SRIOV and non-SRIOV devices. So by saying PF and VF, for me it really means SRIOV VF and any other netdev devices. So I intentionally did not put PF or VF in the name of the op. Thanks, Roopa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 0/3] acpi: DSDT/SSDT runtime patching
Here's an updated revision of acpi runtime patching patchset. Lightly tested. changes in v4: - split PCI hotplug handling out to a separate SSDT Changes in v3: - change ssdt generation code to get rid of hardcoded offsets - enhancements to acpi_extract: add more extract methods ACPI_EXTRACT_NAME_WORD_CONST - extract a Word Const object from Name() ACPI_EXTRACT_NAME_BYTE_CONST - extract a Byte Const object from Name() ACPI_EXTRACT_PROCESSOR_START - start of Processor() block ACPI_EXTRACT_PROCESSOR_STRING - extract a NameString from Processor() ACPI_EXTRACT_PROCESSOR_END - offset at last byte of Processor() + 1 Changes in v2: - tools rewritten in python - Original ASL retains _EJ0 methods, BIOS patches that to EJ0_ - generic ACP_EXTRACT infrastructure that can match Method and Name Operators - instead of matching specific method name, insert tags in original DSL source and match that to AML - Here's a bug: guest thinks it can eject VGA device and ISA bridge. [root@dhcp74-172 ~]#lspci 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) 00:02.0 VGA compatible controller: Cirrus Logic GD 5446 00:03.0 PCI bridge: Red Hat, Inc. Device 0001 00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device 00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device 01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) [root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/ adapter address attention latch module power [root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/ adapter address attention latch module power [root@dhcp74-172 ~]# echo 0 /sys/bus/pci/slots/2/power [root@dhcp74-172 ~]# lspci 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) 00:03.0 PCI bridge: Red Hat, Inc. Device 0001 00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device 00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device 01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) This is wrong because slots 1 and 2 are marked as not hotpluggable in qemu. The reason is that our acpi tables declare both _RMV with value 0, and _EJ0 method for these slots. What happens in this case is undocumented by ACPI spec, so linux ignores _RMV, and windows seems to ignore _EJ0. The correct way to suppress hotplug is not to have _EJ0, so this is what this patch does: it probes PIIX and modifies DSDT to match. With these patches applied, we get: [root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/ address [root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/ address Michael S. Tsirkin (3): acpi: add ssdt for pci hotplug acpi: EJ0 method name patching acpi: remove _RMV Makefile |2 +- src/acpi-dsdt.dsl | 132 +-- src/acpi.c | 40 src/ssdt-pcihp.dsl | 98 ++ 4 files changed, 142 insertions(+), 130 deletions(-) create mode 100644 src/ssdt-pcihp.dsl -- 1.7.5.53.gc233e -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 1/3] acpi: add ssdt for pci hotplug
The point of this split is to make runtime patching easier. DSDT is required to supply: PCI0 - PCI root device object; PCEJ - Method object to eject a PCI slot. Additionally, SSDT is required to supply PCNT - Method object to notify OSPM of a PCI slot event. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile |2 +- src/acpi-dsdt.dsl | 117 + src/acpi.c | 12 + src/ssdt-pcihp.dsl | 132 4 files changed, 148 insertions(+), 115 deletions(-) create mode 100644 src/ssdt-pcihp.dsl diff --git a/Makefile b/Makefile index 540f1ea..ef8e15d 100644 --- a/Makefile +++ b/Makefile @@ -200,7 +200,7 @@ src/%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py ./tools/acpi_extract.py $(Q)./tools/acpi_extract.py $(OUT)$*.lst $(OUT)$*.off $(Q)cat $(OUT)$*.off $@ -$(OUT)ccode32flat.o: src/acpi-dsdt.hex src/ssdt-proc.hex +$(OUT)ccode32flat.o: src/acpi-dsdt.hex src/ssdt-proc.hex src/ssdt-pcihp.hex ### Kconfig rules export HOSTCC := $(CC) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index b9b06f2..f97d463 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -474,90 +474,15 @@ DefinitionBlock ( Return (0x0) } -#define gen_pci_device(slot)\ -Device(SL##slot) { \ -Name (_ADR, 0x##slot##) \ -Method (_RMV) { Return (PRMV(0x##slot)) } \ -Name (_SUN, 0x##slot) \ -} - -/* VGA (slot 1) and ISA bus (slot 2) defined above */ -gen_pci_device(03) -gen_pci_device(04) -gen_pci_device(05) -gen_pci_device(06) -gen_pci_device(07) -gen_pci_device(08) -gen_pci_device(09) -gen_pci_device(0a) -gen_pci_device(0b) -gen_pci_device(0c) -gen_pci_device(0d) -gen_pci_device(0e) -gen_pci_device(0f) -gen_pci_device(10) -gen_pci_device(11) -gen_pci_device(12) -gen_pci_device(13) -gen_pci_device(14) -gen_pci_device(15) -gen_pci_device(16) -gen_pci_device(17) -gen_pci_device(18) -gen_pci_device(19) -gen_pci_device(1a) -gen_pci_device(1b) -gen_pci_device(1c) -gen_pci_device(1d) -gen_pci_device(1e) -gen_pci_device(1f) - -/* Methods called by bulk generated hotplug devices below */ +/* Methods called by hotplug devices */ Method (PCEJ, 1, NotSerialized) { // _EJ0 method - eject callback Store(ShiftLeft(1, Arg0), B0EJ) Return (0x0) } -/* Bulk generated PCI hotplug devices */ -#define hotplug_slot(slot) \ -Device (S##slot) { \ - Name (_ADR, 0x##slot##) \ - Method (_EJ0, 1) { Return(PCEJ(0x##slot)) } \ - Name (_SUN, 0x##slot)\ -} - -hotplug_slot(01) -hotplug_slot(02) -hotplug_slot(03) -hotplug_slot(04) -hotplug_slot(05) -hotplug_slot(06) -hotplug_slot(07) -hotplug_slot(08) -hotplug_slot(09) -hotplug_slot(0a) -hotplug_slot(0b) -hotplug_slot(0c) -hotplug_slot(0d) -hotplug_slot(0e) -hotplug_slot(0f) -hotplug_slot(10) -hotplug_slot(11) -hotplug_slot(12) -hotplug_slot(13) -hotplug_slot(14) -hotplug_slot(15) -hotplug_slot(16) -hotplug_slot(17) -hotplug_slot(18) -hotplug_slot(19) -hotplug_slot(1a) -hotplug_slot(1b) -hotplug_slot(1c) -hotplug_slot(1d) -hotplug_slot(1e) -hotplug_slot(1f) + /* Hotplug notification method supplied by SSDT */ + External (\_SB.PCI0.PCNT, MethodObj) /* PCI hotplug notify method */ Method(PCNF, 0) { @@ -575,42 +500,6 @@ DefinitionBlock ( Return(One) } -#define gen_pci_hotplug(slot) \ -If (LEqual(Arg0, 0x##slot)) { Notify(S##slot, Arg1) } - -Method(PCNT, 2) { -gen_pci_hotplug(01) -gen_pci_hotplug(02) -gen_pci_hotplug(03) -gen_pci_hotplug(04) -gen_pci_hotplug(05) -gen_pci_hotplug(06) -gen_pci_hotplug(07) -gen_pci_hotplug(08) -gen_pci_hotplug(09) -gen_pci_hotplug(0a) -gen_pci_hotplug(0b) -gen_pci_hotplug(0c) -gen_pci_hotplug(0d) -gen_pci_hotplug(0e) -gen_pci_hotplug(0f) -gen_pci_hotplug(10) -gen_pci_hotplug(11) -gen_pci_hotplug(12) -
[PATCHv4 2/3] acpi: EJ0 method name patching
Modify ACPI to only supply _EJ0 methods for PCI slots that support hotplug. This is done by runtime patching: - Instrument SSDT ASL code with ACPI_EXTRACT directives tagging _EJ0 and _ADR fields. - At compile time, tools/acpi_extract.py looks for these methods in ASL source finds the matching AML, and stores the offsets of these methods in tables named aml_ej0_name and aml_adr_dword. - At run time, go over aml_ej0_name, use aml_adr_dword to get slot information and check which slots support hotplug. If hotplug is disabled, we patch the _EJ0 NameString in ACPI table, replacing _EJ0 with EJ0_. Note that this has the same checksum, but is ignored by OSPM. Note: the method used is robust in that we don't need to change any offsets manually in case of ASL code changes. As all parsing is done at compile time, any unexpected input causes build failure, not a runtime failure. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- src/acpi.c | 44 src/ssdt-pcihp.dsl |6 ++ 2 files changed, 42 insertions(+), 8 deletions(-) diff --git a/src/acpi.c b/src/acpi.c index c5147d9..107469f 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -484,12 +484,42 @@ build_ssdt(void) return ssdt; } -static void* -copy_table(void *table, int size) +#include ssdt-pcihp.hex + +#define PCI_RMV_BASE 0xae0c + +extern void link_time_assertion(void); + +static void* build_pcihp(void) { -u8 *copy = malloc_high(size); -memcpy(copy, table, size); -return copy; +u32 rmvc_pcrm; +int i; + +u8 *ssdt = malloc_high(sizeof ssdp_pcihp_aml); +memcpy(ssdt, ssdp_pcihp_aml, sizeof ssdp_pcihp_aml); + +/* Runtime patching of EJ0: to disable hotplug for a slot, + * replace the method name: _EJ0 by EJ0_. */ +if (ARRAY_SIZE(aml_ej0_name) != ARRAY_SIZE(aml_adr_dword)) { +link_time_assertion(); +} + +rmvc_pcrm = inl(PCI_RMV_BASE); +for (i = 0; i ARRAY_SIZE(aml_ej0_name); ++i) { +/* Slot is in byte 2 in _ADR */ +u8 slot = ssdp_pcihp_aml[aml_adr_dword[i] + 2] 0x1F; + /* Sanity check */ +if (memcmp(ssdp_pcihp_aml + aml_ej0_name[i], _EJ0, 4)) { +warn_internalerror(); +free(ssdt); +return NULL; +} +if (!(rmvc_pcrm (0x1 slot))) { +memcpy(ssdt + aml_ej0_name[i], EJ0_, 4); +} +} + +return ssdt; } #define HPET_SIGNATURE 0x54455048 // HPET @@ -642,8 +672,6 @@ static const struct pci_device_id acpi_find_tbl[] = { struct rsdp_descriptor *RsdpAddr; -#include ssdt-pcihp.hex - #define MAX_ACPI_TABLES 20 void acpi_bios_init(void) @@ -675,7 +703,7 @@ acpi_bios_init(void) ACPI_INIT_TABLE(build_madt()); ACPI_INIT_TABLE(build_hpet()); ACPI_INIT_TABLE(build_srat()); -ACPI_INIT_TABLE(copy_table(ssdp_pcihp_aml, sizeof ssdp_pcihp_aml)); +ACPI_INIT_TABLE(build_pcihp()); u16 i, external_tables = qemu_cfg_acpi_additional_tables(); diff --git a/src/ssdt-pcihp.dsl b/src/ssdt-pcihp.dsl index 72d1bb7..cc96ffc 100644 --- a/src/ssdt-pcihp.dsl +++ b/src/ssdt-pcihp.dsl @@ -53,9 +53,15 @@ DefinitionBlock (ssdt-pcihp.aml, SSDT, 0x01, BXPC, BXSSDTPCIHP, 0x1) gen_pci_device(1f) /* Bulk generated PCI hotplug devices */ +// Method _EJ0 can be patched by BIOS to EJ0_ +// at runtime, if the slot is detected to not support hotplug. +// Extract the offset of the address dword and the +// _EJ0 name to allow this patching. #define hotplug_slot(slot) \ Device (S##slot) { \ + ACPI_EXTRACT_NAME_DWORD_CONST aml_adr_dword \ Name (_ADR, 0x##slot##) \ + ACPI_EXTRACT_METHOD_STRING aml_ej0_name \ Method (_EJ0, 1) { Return(PCEJ(0x##slot)) } \ Name (_SUN, 0x##slot)\ } -- 1.7.5.53.gc233e -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv4 3/3] acpi: remove _RMV
The macro gen_pci_device is used to add _RMV method to a slot device so it is no longer needed: presence of _EJ0 now indicates that the slot is ejectable. It is also placing two devices with the same _ADR on the same bus, which isn't defined by the ACPI spec. So let's remove it. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- src/acpi-dsdt.dsl | 15 --- src/ssdt-pcihp.dsl | 40 2 files changed, 0 insertions(+), 55 deletions(-) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index f97d463..aff3f48 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -132,12 +132,6 @@ DefinitionBlock ( B0EJ, 32, } -OperationRegion(RMVC, SystemIO, 0xae0c, 0x04) -Field(RMVC, DWordAcc, NoLock, WriteAsZeros) -{ -PCRM, 32, -} - Name (_CRS, ResourceTemplate () { WordBusNumber (ResourceProducer, MinFixed, MaxFixed, PosDecode, @@ -239,7 +233,6 @@ DefinitionBlock ( Return (0x00) } } - Method(_RMV) { Return (0x00) } } } @@ -251,7 +244,6 @@ DefinitionBlock ( Scope(\_SB.PCI0) { Device (ISA) { Name (_ADR, 0x0001) -Method(_RMV) { Return (0x00) } /* PIIX PCI to ISA irq remapping */ OperationRegion (P40C, PCI_Config, 0x60, 0x04) @@ -466,13 +458,6 @@ DefinitionBlock ( Scope(\_SB.PCI0) { /* Methods called by bulk generated PCI devices below */ -Method (PRMV, 1, NotSerialized) { -// _RMV method - check if device can be removed -If (And(\_SB.PCI0.PCRM, ShiftLeft(1, Arg0))) { -Return (0x1) -} -Return (0x0) -} /* Methods called by hotplug devices */ Method (PCEJ, 1, NotSerialized) { diff --git a/src/ssdt-pcihp.dsl b/src/ssdt-pcihp.dsl index cc96ffc..4b435b8 100644 --- a/src/ssdt-pcihp.dsl +++ b/src/ssdt-pcihp.dsl @@ -9,49 +9,9 @@ DefinitionBlock (ssdt-pcihp.aml, SSDT, 0x01, BXPC, BXSSDTPCIHP, 0x1) /* Objects supplied by DSDT */ External (\_SB.PCI0, DeviceObj) -External (\_SB.PCI0.PRMV, MethodObj) External (\_SB.PCI0.PCEJ, MethodObj) Scope(\_SB.PCI0) { - -#define gen_pci_device(slot)\ -Device(SL##slot) { \ -Name (_ADR, 0x##slot##) \ -Method (_RMV) { Return (PRMV(0x##slot)) } \ -Name (_SUN, 0x##slot) \ -} - -/* VGA (slot 1) and ISA bus (slot 2) defined in DSDT */ -gen_pci_device(03) -gen_pci_device(04) -gen_pci_device(05) -gen_pci_device(06) -gen_pci_device(07) -gen_pci_device(08) -gen_pci_device(09) -gen_pci_device(0a) -gen_pci_device(0b) -gen_pci_device(0c) -gen_pci_device(0d) -gen_pci_device(0e) -gen_pci_device(0f) -gen_pci_device(10) -gen_pci_device(11) -gen_pci_device(12) -gen_pci_device(13) -gen_pci_device(14) -gen_pci_device(15) -gen_pci_device(16) -gen_pci_device(17) -gen_pci_device(18) -gen_pci_device(19) -gen_pci_device(1a) -gen_pci_device(1b) -gen_pci_device(1c) -gen_pci_device(1d) -gen_pci_device(1e) -gen_pci_device(1f) - /* Bulk generated PCI hotplug devices */ // Method _EJ0 can be patched by BIOS to EJ0_ // at runtime, if the slot is detected to not support hotplug. -- 1.7.5.53.gc233e -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Failure to compile qemu-kvm - In file included from /home/ykaul/qemu-kvm/hw/pci.c:36:0: ./qmp-commands.h:3:1: error: expected identifier or ‘(’ before ‘{’ token
Latest git hash 1d1c8a498b7ce5c5636f1014f7ad18aa4e1acc0a (though I think it happened before that). ./configure command line: --target-list=x86_64-softmmu --enable-vnc-thread --audio-card-list=hda --disable-nptl --disable-guest-base --enable-spice --disable-werror --disable-usb-redir --disable-smartcard-nss On F16/x64. qemu latest compiles fine with those parameters. ... CCx86_64-softmmu/pci.o In file included from /home/ykaul/qemu-kvm/hw/pci.c:36:0: ./qmp-commands.h:3:1: error: expected identifier or ‘(’ before ‘{’ token ./qmp-commands.h:10:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:20:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:30:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:40:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:50:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:60:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:70:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:80:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:90:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:100:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:110:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:120:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:130:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:140:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:154:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:164:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:174:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:184:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:194:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:204:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:214:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:224:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:233:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:244:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:254:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:264:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:274:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:284:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:294:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:304:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:314:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:323:2: error: expected identifier or ‘(’ before ‘,’ token ./qmp-commands.h:333:2: error: expected identifier or ‘(’ before ‘,’ token /home/ykaul/qemu-kvm/hw/pci.c:1431:14: warning: no previous prototype for ‘qmp_query_pci’ [-Wmissing-prototypes] Thanks, Y. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv4 0/3] acpi: DSDT/SSDT runtime patching
On Sun, Nov 20, 2011 at 07:56:43PM +0200, Michael S. Tsirkin wrote: Here's an updated revision of acpi runtime patching patchset. Lightly tested. It looks good to me. -Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/6] KVM: introduce kvm_for_each_memslot macro
(2011/11/20 20:21), Avi Kivity wrote: On 11/18/2011 11:18 AM, Xiao Guangrong wrote: index bb8728e..10524c0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -307,6 +307,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \ idx++) +#define kvm_for_each_memslot(slots, memslot, i)\ + for (i = 0; i (slots)-nmemslots \ + ({ memslot =(slots)-memslots[i]; 1; }); i++) + Statement expression not needed, you can use the comma operator: i (slots)-nmemslots (memslot = @(slots)-memslots[i], true) or even memslot =(slots)-memslots[i], i (slots)-nmemslots or just kill i and make memslot the loop variable. Do you have any preference for the arguments ordering? I think placing the target one, memslot in this case, first is conventional in the kernel code, except when we want to place kvm or something like that. But in kvm code, there seems to be some difference. Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] vfio: VFIO Driver core framework
On Fri, Nov 18, 2011 at 01:32:56PM -0700, Alex Williamson wrote: On Thu, 2011-11-17 at 11:02 +1100, David Gibson wrote: On Tue, Nov 15, 2011 at 11:01:28AM -0700, Alex Williamson wrote: On Tue, 2011-11-15 at 17:34 +1100, David Gibson wrote: On Thu, Nov 03, 2011 at 02:12:24PM -0600, Alex Williamson wrote: snip +Groups, Devices, IOMMUs, oh my +--- + +A fundamental component of VFIO is the notion of IOMMU groups. IOMMUs +can't always distinguish transactions from each individual device in +the system. Sometimes this is because of the IOMMU design, such as with +PEs, other times it's caused by the I/O topology, for instance a +PCIe-to-PCI bridge masking all devices behind it. We call the sets of +devices created by these restictions IOMMU groups (or just groups for +this document). + +The IOMMU cannot distiguish transactions between the individual devices +within the group, therefore the group is the basic unit of ownership for +a userspace process. Because of this, groups are also the primary +interface to both devices and IOMMU domains in VFIO. + +The VFIO representation of groups is created as devices are added into +the framework by a VFIO bus driver. The vfio-pci module is an example +of a bus driver. This module registers devices along with a set of bus +specific callbacks with the VFIO core. These callbacks provide the +interfaces later used for device access. As each new group is created, +as determined by iommu_device_group(), VFIO creates a /dev/vfio/$GROUP +character device. Ok.. so, the fact that it's called vfio-pci suggests that the VFIO bus driver is per bus type, not per bus instance. But grouping constraints could be per bus instance, if you have a couple of different models of PCI host bridge with IOMMUs of different capabilities built in, for example. Yes, vfio-pci manages devices on the pci_bus_type; per type, not per bus instance. Ok, how can that work. vfio-pci is responsible for generating the groupings, yes? For which it needs to know the iommu/host bridge's isolation capabilities, which vary depending on the type of host bridge. No, grouping is done at the iommu driver level. vfio gets groupings via iomm_device_group(), which uses the iommu_ops for the bus_type of the requested device. I'll attempt to clarify where groups come from in the documentation. Hrm, but still per bus type, not bus instance. Hrm. Yeah, I need to look at the earlier iommu patches in more detail. [snip] Yeah, I'm starting to get my head around the model, but the current description of it doesn't help very much. In particular the terms merge and unmerge lead one to the wrong mental model, I think. The semantics of merge and unmerge under those names are really non-obvious. Merge kind of has to merge two whole metagroups, but it's unclear if unmerge reverses one merge, or just takes out one (atom) group. These operations need better names, at least. Christian suggested a change to UNMERGE that we do not need to specify a group to unmerge from. This makes it more like a list implementation except there's no defined list_head. Any member of the list can pull in a new entry. Calling UNMERGE on any member extracts that member. I think that's a good idea, but unmerge is not a good word for it. I can't think of better, if you can, please suggest. Well, I think addgroup and removegroup would be better than merge and unmerge, although they have their own problems. Then it's unclear what order you can do various operations, and which order you can open and close various things. You can kind of figure it out but it takes far more thinking than it should. So at the _very_ least, we need to invent new terminology and find a much better way of describing this API's semantics. I still think an entirely different interface, where metagroups are created from outside with a lifetime that's not tied to an fd would be a better idea. As we've discussed previously, configfs provides part of this, but has no ioctl support. It doesn't make sense to me to go play with groups in configfs, but then still interact with them via a char dev. Why not? You configure, say, loopback devices with losetup, then use them as a block device. Similar with nbd. You can configure serial devices with setserial, then use them as a char dev. It also splits the ownership model I'm not even sure what that means. and makes it harder to enforce who gets to interact with the devices vs who gets to manipulate groups. How so. Let's map out what a configfs interface would
Re: [PATCH v2 3/6] KVM: introduce kvm_for_each_memslot macro
On 11/20/2011 07:21 PM, Avi Kivity wrote: On 11/18/2011 11:18 AM, Xiao Guangrong wrote: index bb8728e..10524c0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -307,6 +307,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \ idx++) +#define kvm_for_each_memslot(slots, memslot, i) \ +for (i = 0; i (slots)-nmemslots\ + ({ memslot = (slots)-memslots[i]; 1; }); i++) + Statement expression not needed, you can use the comma operator: i (slots)-nmemslots (memslot = @(slots)-memslots[i], true) or even memslot = (slots)-memslots[i], i (slots)-nmemslots or just kill i and make memslot the loop variable. Good, will fix it in the next version, thanks! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
On Wed, 16 Nov 2011 09:23:17 +0200, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Nov 16, 2011 at 10:34:42AM +1030, Rusty Russell wrote: On Mon, 14 Nov 2011 15:05:07 +0200, Michael S. Tsirkin m...@redhat.com wrote: On Mon, Nov 14, 2011 at 02:25:17PM +0200, Pekka Enberg wrote: On Mon, Nov 14, 2011 at 4:04 AM, Asias He asias.he...@gmail.com wrote: Why both the bandwidth and latency performance are dropping so dramatically with multiple VQ? What's the expected benefit from multiple VQs Heh, the original patchset didn't mention this :) It really should. They are supposed to speed up networking for high smp guests. If we have one queue per guest CPU, does this allow us to run lockless? Thanks, Rusty. LLTX? It's supposed to be deprecated, isn't it? I was referring back to Subject: virtio net lockless ring which Stephen sent back in June, nothing more specific. I assumed from his query that this was still an active area of exploration... Stephen? Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/5] virtio-pci: flexible configuration layout
On Tue, 15 Nov 2011 01:43:13 +0200, Sasha Levin levinsasha...@gmail.com wrote: From: Michael S. Tsirkin m...@redhat.com Add a flexible mechanism to specify virtio configuration layout, using pci vendor-specific capability. A separate capability is used for each of common, device specific and data-path accesses. OK, a couple of minor suggestions: +static int virtio_pci_iomap(struct virtio_pci_device *vp_dev) +{ + vp_dev-isr_map = virtio_pci_map_cfg(vp_dev, + VIRTIO_PCI_CAP_ISR_CFG, + sizeof(u8)); + vp_dev-notify_map = virtio_pci_map_cfg(vp_dev, + VIRTIO_PCI_CAP_NOTIFY_CFG, + sizeof(u16)); + vp_dev-common_map = virtio_pci_map_cfg(vp_dev, + VIRTIO_PCI_CAP_COMMON_CFG, + sizeof(u32)); + vp_dev-device_map = virtio_pci_map_cfg(vp_dev, + VIRTIO_PCI_CAP_DEVICE_CFG, + sizeof(u8)); + + if (!vp_dev-notify_map || !vp_dev-common_map || + !vp_dev-device_map) { + /* + * If not all capabilities present, map legacy PIO. + * Legacy access is at BAR 0. We never need to map + * more than 256 bytes there, since legacy config space + * used PIO which has this size limit. + * */ + vp_dev-legacy_map = pci_iomap(vp_dev-pci_dev, 0, 256); + if (!vp_dev-legacy_map) { + dev_err(vp_dev-vdev.dev, Unable to map legacy PIO); + goto err; + } + } Please don't mix the two. Ever, under any circumstances. If it helps, put CONFIG_VIRTIO_PCI_LEGACY #ifdefs in there: vp_dev-common_map = virtio_pci_map_cfg(vp_dev, VIRTIO_PCI_CAP_COMMON_CFG, sizeof(u32)); /* Something horribly wrong? */ if (IS_ERR(vp_dev-common_map)) { err = PTR_ERR(vp_dev-common_map); goto fail; } /* Not found? */ if (!vp_dev-common_map) return legacy_setup(vp_dev, ...); ... Practice has shown that if we allow something, it'll be done. We should break horribly on malformed devices, and really really only fall back when we have a well-formed, but old, device. And various other legacy paths can happily use: #ifdef CONFIG_VIRTIO_PCI_LEGACY #define is_legacy(vp_dev) ((vp_dev)-legacy_map != NULL) #else #define is_legacy(vp_dev) false #endif Here's suggested Kconfig entry: config VIRTIO_PCI_LEGACY bool default y depends on VIRTIO_PCI ---help--- Look out into your driveway. Do you have a flying car? If so, you can happily disable this option and virtio will not break. Otherwise, leave it set. Unless you're testing what life will be like in The Future. Cheers, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5 of 5] virtio: expose added descriptors immediately
On Wed, 16 Nov 2011 09:18:38 +0200, Michael S. Tsirkin m...@redhat.com wrote: My unlocked kick patches will trip this warning: they make virtio-net do add + get without kick. Heh, it's a good sign if they do, since that means you're running really well :) I think block with unlocked kick can trip it too: add, lock is dropped and then an interrupt can get. We also don't need a kick each num - each 2^15 is enough. Why don't we do this at start of add_buf: if (vq-num_added = 0x7fff) return -ENOSPC; The warning was there in case a driver is never doing a kick, and getting away with it (mostly) because the device is polling. Let's not penalize good drivers to catch bad ones. How about we do this properly, like so: From: Rusty Russell ru...@rustcorp.com.au Subject: virtio: add debugging if driver doesn't kick. Under the existing #ifdef DEBUG, check that they don't have more than 1/10 of a second between an add_buf() and a virtqueue_notify()/virtqueue_kick_prepare() call. We could get false positives on a really busy system, but good for development. Signed-off-by: Rusty Russell ru...@rustcorp.com.au --- drivers/virtio/virtio_ring.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -22,6 +22,7 @@ #include linux/device.h #include linux/slab.h #include linux/module.h +#include linux/hrtimer.h /* virtio guest is communicating with a virtual device that actually runs on * a host processor. Memory barriers are used to control SMP effects. */ @@ -102,6 +103,10 @@ struct vring_virtqueue #ifdef DEBUG /* They're supposed to lock for us. */ unsigned int in_use; + + /* Figure out if their kicks are too delayed. */ + bool last_add_time_valid; + ktime_t last_add_time; #endif /* Tokens for callbacks. */ @@ -192,6 +197,19 @@ int virtqueue_add_buf(struct virtqueue * BUG_ON(data == NULL); +#ifdef DEBUG + { + ktime_t now = ktime_get(); + + /* No kick or get, with .1 second between? Warn. */ + if (vq-last_add_time_valid) + WARN_ON(ktime_to_ms(ktime_sub(now, vq-last_add_time)) +100); + vq-last_add_time = now; + vq-last_add_time_valid = true; + } +#endif + /* If the host supports indirect descriptor tables, and we have multiple * buffers, then go indirect. FIXME: tune this threshold */ if (vq-indirect (out + in) 1 vq-num_free) { @@ -291,6 +309,14 @@ bool virtqueue_kick_prepare(struct virtq new = vq-vring.avail-idx; vq-num_added = 0; +#ifdef DEBUG + if (vq-last_add_time_valid) { + WARN_ON(ktime_to_ms(ktime_sub(ktime_get(), + vq-last_add_time)) 100); + } + vq-last_add_time_valid = false; +#endif + if (vq-event) { needs_kick = vring_need_event(vring_avail_event(vq-vring), new, old); @@ -428,6 +454,10 @@ void *virtqueue_get_buf(struct virtqueue virtio_mb(); } +#ifdef DEBUG + vq-last_add_time_valid = false; +#endif + END_USE(vq); return ret; } @@ -611,6 +641,7 @@ struct virtqueue *vring_new_virtqueue(un list_add_tail(vq-vq.list, vdev-vqs); #ifdef DEBUG vq-in_use = false; + vq-last_add_time_valid = false; #endif vq-indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 5/6] KVM: sort memslots by its size and use line search
On 11/20/2011 07:26 PM, Avi Kivity wrote: On 11/18/2011 11:19 AM, Xiao Guangrong wrote: Sort memslots base on its size and use line search to find it, so the larger memslots have better fit The idea is from Avi Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- include/linux/kvm_host.h | 22 +--- virt/kvm/kvm_main.c | 82 - 2 files changed, 75 insertions(+), 29 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a0e4d63..83b396a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -230,8 +230,12 @@ struct kvm_irq_routing_table {}; #define KVM_MEM_SLOTS_NUM (KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS) #endif +/* + * Note: + * memslots are not sorted by id anymore, please use id_to_memslot() + * to get the memslot by its id. + */ struct kvm_memslots { -int nmemslots; u64 generation; struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM]; }; @@ -307,9 +311,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \ idx++) -#define kvm_for_each_memslot(slots, memslot, i) \ -for (i = 0; i (slots)-nmemslots\ - ({ memslot = (slots)-memslots[i]; 1; }); i++) +#define kvm_for_each_memslot(slots, memslot, i) \ +for (i = 0; i KVM_MEM_SLOTS_NUM \ + ({ memslot = (slots)-memslots[i]; 1; }) \ + memslot-npages != 0; i++) You might allocate an always-empty memslot at the end and simplify the termination condition. The slots are sorted by its size, all empty slots are at the end of the array, so we can simplify break the loop if meet a empty slot. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/6] KVM: optimize memslots searching
On 11/20/2011 08:12 PM, Avi Kivity wrote: On 11/20/2011 01:29 PM, Avi Kivity wrote: On 11/18/2011 11:16 AM, Xiao Guangrong wrote: This is the more work base on my v1 patchset which is posted some months ago, it can be found at: https://lkml.org/lkml/2011/2/22/68 Change log: - sort memslots base on its size and do the line search instead of binary search base on gfn, it is from Avi's idea. - in order to reduce cache footprint, memslots are sorted in the array of kvm-memslots-memslots[] and introduce a table to map slot id to index in the array There is the performance result: autotest for RHEL.6.1 setup/boot/reboot/shutdown(average): ept=1: before: 449.5 after: 447.8 ept=0: before: 532.7 after: 529.8 kernbench(average): ept=1: before: 127.94 after: 126.98 ept=0: before: 196.85 after: 189.66 Looks good, had a couple of comments but they're really minor. btw, this patchset touches a lot of common code. Did you crossbuild for other kvm archs? No :-), but i will build cross compilation environment and test it next time, thanks! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM device assignment and user privileges
* Avi Kivity (a...@redhat.com) wrote: On 11/20/2011 04:58 PM, Sasha Levin wrote: Hi all, I've been working on adding device assignment to KVM tools, and started with the basics of just getting a device assigned using the KVM_ASSIGN_PCI_DEVICE ioctl. What I've figured is that unprivileged users can request any PCI device to be assigned to him, including devices which he shouldn't be touching. In my case, it happened with the VGA card, where an unprivileged user simply called KVM_ASSIGN_PCI_DEVICE with the bus, seg and fn of the VGA card and caused the display on the host to go apeshit. Was it supposed to work this way? No, of course not. Indeed. A device is typically owned by a host OS driver which precludes device assignment from working. If it's not, the unprivilged guest will not have access to the device's config space or resource bars as they are only rw for a privileged user. And similarly, /dev/kvm was typically left as 0644. As you can see, it's fragile. I couldn't find any security checks in the code paths of KVM_ASSIGN_PCI_DEVICE and it looks like any user can invoke it with any parameters he'd want - enabling him to kill the host. Alex, Chris? The security checks were removed some time back. The expectation was that there was nothing an unprivleged user could usefully do w/ the assign device ioctl, and the assign irq ioctl only works after assign device. It's built on an overly fragile set of assumptions, however. Avi, the simplest short term thing to do now might be simply revert: 48bb09e KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irq While it's a regression for existing unprivileged users it's better than a hole. And in the meantime, we can come up w/ something better to replace with. thanks, -chris -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm tools, qcow: Add the support for copy-on-write clusters
When meeting request to write the cluster without copied flag, allocate a new cluster and write original data with modification to the new cluster. This also adds support for the writing operation of the qcow2 compressed image. After testing, image file can pass through qemu-img check. The performance is needed to be improved. Signed-off-by: Lan Tianyu tianyu@intel.com --- tools/kvm/disk/qcow.c| 411 +- tools/kvm/include/kvm/qcow.h |2 + 2 files changed, 285 insertions(+), 128 deletions(-) diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c index 680b37d..723faaa 100644 --- a/tools/kvm/disk/qcow.c +++ b/tools/kvm/disk/qcow.c @@ -20,6 +20,16 @@ #include linux/kernel.h #include linux/types.h + +static inline int qcow_pwrite_sync(int fd, + void *buf, size_t count, off_t offset) +{ + if (pwrite_in_full(fd, buf, count, offset) 0) + return -1; + + return fdatasync(fd); +} + static int l2_table_insert(struct rb_root *root, struct qcow_l2_table *new) { struct rb_node **link = (root-rb_node), *parent = NULL; @@ -101,7 +111,8 @@ static int qcow_l2_cache_write(struct qcow *q, struct qcow_l2_table *c) size = 1 header-l2_bits; - if (pwrite_in_full(q-fd, c-table, size * sizeof(u64), c-offset) 0) + if (qcow_pwrite_sync(q-fd, c-table, + size * sizeof(u64), c-offset) 0) return -1; c-dirty = 0; @@ -122,9 +133,6 @@ static int cache_table(struct qcow *q, struct qcow_l2_table *c) */ lru = list_first_entry(l1t-lru_list, struct qcow_l2_table, list); - if (qcow_l2_cache_write(q, lru) 0) - goto error; - /* Remove the node from the cache */ rb_erase(lru-node, r); list_del_init(lru-list); @@ -518,14 +526,6 @@ static inline u64 file_size(int fd) return st.st_size; } -static inline int qcow_pwrite_sync(int fd, void *buf, size_t count, off_t offset) -{ - if (pwrite_in_full(fd, buf, count, offset) 0) - return -1; - - return fdatasync(fd); -} - /* Writes a level 2 table at the end of the file. */ static u64 qcow_write_l2_table(struct qcow *q, u64 *table) { @@ -601,7 +601,8 @@ static int write_refcount_block(struct qcow *q, struct qcow_refcount_block *rfb) if (!rfb-dirty) return 0; - if (pwrite_in_full(q-fd, rfb-entries, rfb-size * sizeof(u16), rfb-offset) 0) + if (qcow_pwrite_sync(q-fd, rfb-entries, + rfb-size * sizeof(u16), rfb-offset) 0) return -1; rfb-dirty = 0; @@ -618,9 +619,6 @@ static int cache_refcount_block(struct qcow *q, struct qcow_refcount_block *c) if (rft-nr_cached == MAX_CACHE_NODES) { lru = list_first_entry(rft-lru_list, struct qcow_refcount_block, list); - if (write_refcount_block(q, lru) 0) - goto error; - rb_erase(lru-node, r); list_del_init(lru-list); rft-nr_cached--; @@ -706,6 +704,11 @@ static struct qcow_refcount_block *qcow_read_refcount_block(struct qcow *q, u64 rfb_offset = be64_to_cpu(rft-rf_table[rft_idx]); + if (!rfb_offset) { + pr_warning(Don't support to grow refcount table); + return NULL; + } + rfb = refcount_block_search(q, rfb_offset); if (rfb) return rfb; @@ -728,35 +731,138 @@ error_free_rfb: return NULL; } -/* - * QCOW file might grow during a write operation. Not only data but metadata is - * also written at the end of the file. Therefore it is necessary to ensure - * every write is committed to disk. Hence we use uses qcow_pwrite_sync() to - * synchronize the in-core state of QCOW image to disk. - * - * We also try to restore the image to a consistent state if the metdata - * operation fails. The two metadat operations are: level 1 and level 2 table - * update. If either of them fails the image is truncated to a consistent state. +static u16 qcow_get_refcount(struct qcow *q, u64 clust_idx) +{ + struct qcow_refcount_block *rfb = NULL; + struct qcow_header *header = q-header; + u64 rfb_idx; + + rfb = qcow_read_refcount_block(q, clust_idx); + if (!rfb) { + pr_warning(Error while reading refcount table); + return -1; + } + + rfb_idx = clust_idx (((1ULL + (header-cluster_bits - QCOW_REFCOUNT_BLOCK_SHIFT)) - 1)); + + if (rfb_idx = rfb-size) { + pr_warning(L1: refcount block index out of bounds); + return -1; + } + + return be16_to_cpu(rfb-entries[rfb_idx]); +} + +static int update_cluster_refcount(struct qcow *q, u64 clust_idx, u16 append) +{ + struct qcow_refcount_block *rfb = NULL; + struct qcow_header *header = q-header; + u16
Re: [PATCH 01/11] KVM: PPC: Add memory-mapping support for PCI passthrough and emulation
On 11/17/2011 12:52 AM, Paul Mackerras wrote: From: Benjamin Herrenschmidt b...@kernel.crashing.org This adds support for adding PCI device I/O regions to the guest memory map, and for trapping guest accesses to emulated MMIO regions and delivering them to qemu for MMIO emulation. To trap guest accesses to emulated MMIO regions, we reserve key 31 for the hypervisor's use and set the VPM1 bit in LPCR, which sends all page faults to the host. Any page fault that is not a key fault gets reflected immediately to the guest. We set HPTEs for emulated MMIO regions to have key = 31, and don't allow the guest to create HPTEs with key = 31. Any page fault that is a key fault with key = 31 is then a candidate for MMIO emulation and thus gets sent up to qemu. We also load the instruction that caused the fault for use later when qemu has done the emulation. [pau...@samba.org: Cleaned up, moved kvmppc_book3s_hv_emulate_mmio() to book3s_64_mmu_hv.c] + /* + * XXX WARNING: We do not know for sure whether the instruction we just + * read from memory is the same that caused the fault in the first + * place. We don't have a problem with the guest shooting itself in + * the foot that way, however we must be careful that we enforce + * the write permission based on the instruction we are actually + * emulating, not based on dsisr. Unfortunately, the KVM code for + * instruction emulation isn't smart enough for that to work + * so right now we just do it badly and racily, but that will need + * fixing + */ + Ouch, I assume this will be fixed before merging? } int kvmppc_core_prepare_memory_region(struct kvm *kvm, - struct kvm_userspace_memory_region *mem) + struct kvm_memory_slot *memslot, + struct kvm_userspace_memory_region *mem) { unsigned long psize, porder; unsigned long i, npages, totalpages; unsigned long pg_ix; struct kvmppc_pginfo *pginfo; - unsigned long hva; struct kvmppc_rma_info *ri = NULL; + struct vm_area_struct *vma; struct page *page; + unsigned long hva; + + /* + * This could be an attempt at adding memory or it could be MMIO + * pass-through. We need to treat them differently but the only + * way for us to know what it is is to look at the VMA and play + * guess work so let's just do that + */ There is no the VMA. There could be multiple VMAs, or none (with the mmap() coming afterwards). You could do all the checks you want here, only to have host userspace remap it under your feet. This needs to be done on a per-page basis at fault time. Please see the corresponding x86 code (warning: convoluted), which has a similar problem (though I have no idea if you can use a similar solution). + + /* + * We require read write permission as we cannot yet + * enforce guest read-only protection or no access. + */ + if ((vma-vm_flags (VM_READ | VM_WRITE)) != + (VM_READ | VM_WRITE)) + goto err_unlock; This, too, must be done at get_user_pages() time. What happens if mmu notifiers tell you to write protect a page? void kvm_arch_commit_memory_region(struct kvm *kvm, diff --git a/include/linux/kvm.h b/include/linux/kvm.h index c107fae..774b04d 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -105,6 +105,9 @@ struct kvm_userspace_memory_region { #define KVM_MEM_LOG_DIRTY_PAGES 1UL #define KVM_MEMSLOT_INVALID (1UL 1) +/* Kernel internal use */ +#define KVM_MEMSLOT_IO(1UL 31) + Please define it internally then (and leave a comment so we don't overlap it). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 10/11] KVM: PPC: Implement MMU notifiers
On 11/17/2011 01:52 AM, Paul Mackerras wrote: This implements the low-level functions called by the MMU notifiers in the generic KVM code, and defines KVM_ARCH_WANT_MMU_NOTIFIER if CONFIG_KVM_BOOK3S_64_HV so that the generic KVM MMU notifiers get included. That means we also have to take notice of when PTE invalidations are in progress, as indicated by mmu_notifier_retry(). In kvmppc_h_enter, if any invalidation is in progress we just install a non-present HPTE. In kvmppc_book3s_hv_page_fault, if an invalidation is in progress we just return without resolving the guest, causing it to encounter another page fault immediately. This is better than spinning inside kvmppc_book3s_hv_page_fault because this way the guest can get preempted by a hypervisor decrementer interrupt without us having to do any special checks. We currently maintain a referenced bit in the rmap array, and when we clear it, we make all the HPTEs that map the corresponding page be non-present, as if the page were invalidated. In future we could use the hardware reference bit in the guest HPT instead. The kvm_set_spte_hva function is implemented as kvm_unmap_hva. The former appears to be unused anyway. This is mostly used for COW (after ksm, not fork). So if you want to use ksm, this avoids an exit. This all means that on processors that support virtual partition memory (POWER7), we can claim support for the KVM_CAP_SYNC_MMU capability, and we no longer have to pin all the guest memory. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html