Re: XP smp using a lot of CPU
Ross Boylan wrote: I just installed XP into a new VM, specifying -smp 2 for the machine. According to top, it's using nearly 200% of a cpu even when I'm not doing anything. Is this real CPU usage, or just a reporting problem (just as my disk image is big according to ls, but isn't really)? If it's real, is there anything I can do about it? kvm 0.7.2 on Debian Lenny (but 2.6.29 kernel), amd64. Xeon chips; 32 bit version of XP pro installed, now fully patched (including the Windows Genuine Advantage stuff, though I cancelled it when it wanted to run). Task manager in XP shows virtually no CPU usage. Please cc me on responses. I'm guessing Windows uses a pio port to sleep, which kvm doesn't support. Can you provide kvm_stat output? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Enable dirty logging for all regions during migration
Glauber Costa wrote: From: Glauber de Oliveira Costa glom...@redhat.com In current calculations, we are not activating dirty logging for all regions, causing migration to fail. This problem was already raised by Yaniv Kamay a while ago. The proposed solution at the time (not merged) was a calculation to convert from target_phys_addr_t to ram_addr_t, which the dirty logging code expects. Avi noticed that enabling dirty logging for the region 0 to -1ULL would do the trick. As I hit the problem, I can confirm it does. This patch, therefore, goes with this simpler approach. Before this patch, migration fails. With this patch, simple migration tests succeed. Applied, thanks.
Re: [PATCH v3] kvm: Use a bitmap for tracking used GSIs
On Tue, May 12, 2009 at 04:07:15PM -0600, Alex Williamson wrote: We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. BTW, which driver does that? Any idea why? -- MST
Re: [PATCH v3] kvm: Use a bitmap for tracking used GSIs
On Tue, May 12, 2009 at 04:07:15PM -0600, Alex Williamson wrote: We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap.
Signed-off-by: Alex Williamson alex.william...@hp.com
---
v2: Added mutex to protect gsi bitmap
v3: Updated for comments from Michael Tsirkin
No longer depends on [PATCH] kvm: device-assignment: Catch GSI overflow
 hw/device-assignment.c  |  4 ++
 kvm/libkvm/kvm-common.h |  4 ++
 kvm/libkvm/libkvm.c     | 83 +--
 kvm/libkvm/libkvm.h     | 10 ++
 4 files changed, 88 insertions(+), 13 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index a7365c8..a6cc9b9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -561,8 +561,10 @@ static void free_dev_irq_entries(AssignedDevice *dev)
 {
     int i;
 
-    for (i = 0; i < dev->irq_entries_nr; i++)
+    for (i = 0; i < dev->irq_entries_nr; i++) {
         kvm_del_routing_entry(kvm_context, &dev->entry[i]);
+        kvm_free_irq_route_gsi(kvm_context, dev->entry[i].gsi);
+    }
     free(dev->entry);
     dev->entry = NULL;
     dev->irq_entries_nr = 0;
diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 591fb53..4b3cb51 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -66,8 +66,10 @@ struct kvm_context {
 #ifdef KVM_CAP_IRQ_ROUTING
 	struct kvm_irq_routing *irq_routes;
 	int nr_allocated_irq_routes;
+	void *used_gsi_bitmap;
+	int max_gsi;
+	pthread_mutex_t gsi_mutex;
 #endif
-	int max_used_gsi;
 };
 
 int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory,
diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index ba0a5d1..3d7ab75 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -35,6 +35,7 @@
 #include <errno.h>
 #include <sys/ioctl.h>
 #include <inttypes.h>
+#include <pthread.h>
 #include "libkvm.h"
 
 #if defined(__x86_64__) || defined(__i386__)
@@ -65,6 +66,8 @@ int kvm_abi = EXPECTED_KVM_API_VERSION;
 int kvm_page_size;
 
+static inline void set_bit(uint32_t *buf, unsigned int bit);
+
 struct slot_info {
 	unsigned long phys_addr;
 	unsigned long len;
@@ -286,6 +289,9 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 	int fd;
 	kvm_context_t kvm;
 	int r;
+#ifdef KVM_CAP_IRQ_ROUTING
Let's kill all these ifdefs. Or at least, let's not add them.
-- MST
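The patch describes a bitmap of GSIs guarded by a mutex. Here is a minimal user-space sketch of that idea. Allocation takes the lowest clear bit and freeing clears it again, so a driver that keeps toggling MSI reuses the same GSI instead of exhausting a counter that only ever grows. The names `gsi_map`, `gsi_alloc` and `gsi_free`, and the `MAX_GSI` value, are hypothetical and are not libkvm's API. The patch keeps its own `max_gsi` bound in `struct kvm_context`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define MAX_GSI 1024 /* hypothetical bound for this sketch only */

struct gsi_map {
    uint32_t bits[MAX_GSI / 32]; /* bit i set => GSI i is in use */
    pthread_mutex_t lock;        /* serializes alloc/free, like the v2 gsi_mutex */
};

static void gsi_map_init(struct gsi_map *m)
{
    memset(m->bits, 0, sizeof(m->bits));
    pthread_mutex_init(&m->lock, NULL);
}

/* Hand out the lowest free GSI, or -1 if every GSI is in use. */
static int gsi_alloc(struct gsi_map *m)
{
    int gsi = -1;

    pthread_mutex_lock(&m->lock);
    for (int i = 0; i < MAX_GSI; i++) {
        uint32_t mask = 1u << (i % 32);
        if (!(m->bits[i / 32] & mask)) {
            m->bits[i / 32] |= mask;
            gsi = i;
            break;
        }
    }
    pthread_mutex_unlock(&m->lock);
    return gsi;
}

/* Return a GSI previously obtained from gsi_alloc() to the pool. */
static void gsi_free(struct gsi_map *m, int gsi)
{
    assert(gsi >= 0 && gsi < MAX_GSI);
    pthread_mutex_lock(&m->lock);
    m->bits[gsi / 32] &= ~(1u << (gsi % 32));
    pthread_mutex_unlock(&m->lock);
}
```

Calling `gsi_alloc()` and then `gsi_free()` in a loop keeps returning the same number. A plain counter lacks this property.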
Re: Best choice for copy/clone/snapshot
Ross Boylan wrote: First, I have a feeling this might be a question I could ask on a qemu list. It is. Is there a way for me to tell which questions should go where? If the question is equally valid for qemu and qemu-kvm, then qemu-devel is the correct forum. Is it OK to ask here? Sure, we aren't sticklers for this sort of thing. As I install software onto a system I want to preserve its state (just the disk state) at various points so I can go back. What is the best way to do this? LVM snapshots. Read up on the 'lvcreate -s' command and option. First, I think I could just make a copy of the virtual disk, although I haven't seen this suggested anywhere. I assume this will work if the VM is off; Yes. are there other circumstances in which it is safe? You could suspend the guest, either by having it sleep, or externally using ctrl-Z. Since my original virtual disk file isn't really occupying its nominal space, I assume this will be true of the copy too. Second, kvm-img could create a copy on write image. There are several things I don't understand about this. Suppose I go kvm-img -b A.img B.img If I then go on and use A.img as I did before, changing what is on disk, have I screwed up B.img? Yes. If you use an image as a backing store, you promise not to change it. Use B.img instead. Do A.img or B.img have to be qcow2 format? I created a raw image for portability. Only B.img, though it works better if both are qcow2s. Suppose I work for a while installing new stuff on B.img, and then want to preserve the state. Is kvm-img -b B.img C.img sensible, or is this kind of recursive operation (B.img is already the copy on write version of A.img) not OK? Should work. Does ‘commit [-f fmt] filename’, documented as "Commit the changes recorded in filename in its base image.", mean commit the recorded changes TO its base image? Yes. It was broken until recently, so use with caution. Here are some other things I think I don't want to do. Please let me know if I'm mistaken.
-snapshot on the kvm command line: nothing persistent comes of this (maybe if you commit you update the original image, but you don't get 2). Right. snapshot in the monitor: this snapshots the non-disk state of the VM; further, that state is not guaranteed to work if you later change what is on the disk. I think kvm-img snapshot also accesses these facilities. It snapshots both the disk and non-disk state. You have to use qcow2 for this.
Re: [PATCH][KVM-AUTOTEST] Add custom install option for kvm_install
Michael Goldish wrote: That is, assuming you want to
- install KVM from F10 branch
- run all tests
- install KVM from F11 branch
- run all tests
- install KVM from devel branch
- run all tests
If you meant something different please correct me. Note kvm is moving to split userspace/kernel packaging, so this can be useful to test version compatibility, i.e. test kvm-kmod A, B, C vs qemu-kvm X, Y, Z.
Re: [PATCH RFC 0/2] qemu-kvm: MSI-X support
Michael S. Tsirkin wrote: It seems that if I just call apic_deliver_irq each time I want to send MSI, things will work. However, a large part of the msix code is managing IRQs versus the kernel, and I'm not sure it's a wise investment of effort to rip it all out. So IMHO, what's missing is an API that abstracts managing irq routes in kvm, specifically abstract this stuff in some way:
kvm_get_irq_route_gsi
kvm_add_routing_entry
kvm_del_routing_entry
kvm_commit_irq_routes
All these are just games with qemu_irq objects. Should be a lot simpler in userspace. kvm_set_irq qemu_set_irq(). How hard is that? Should be pretty easy, once you get the hang of qemu_irq. For now, this API could be a stub that just stores the routes somewhere, and set_irq would call the local apic emulation, along the lines of:
uint8_t dest = (addr_lo & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
uint8_t vector = (addr_hi & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
uint8_t dest_mode = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
uint8_t delivery_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
apic_deliver_irq(dest, dest_mode, delivery_mode, vector, 0, trigger_mode);
qemu_set_irq() eventually calls a callback that you specify; just set it to look up the entry and call apic_deliver_irq.
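This sketch shows the decoding step of such a stub. It assumes the x86 MSI layout from the Intel SDM:

- In the address word, the destination ID is in bits 19:12 and the destination mode is bit 2.
- In the data word, the vector is in bits 7:0, the delivery mode in bits 10:8 and the trigger mode is bit 15.

The constant values and the `msi_fields`/`msi_decode` names are assumptions for illustration, not qemu's definitions. Here the vector is taken from the data word. A `qemu_set_irq()` callback could look up the stored route, decode it this way and pass the fields to `apic_deliver_irq()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions, following the x86 MSI address/data format. */
#define MSI_ADDR_DEST_ID_SHIFT       12
#define MSI_ADDR_DEST_ID_MASK        0x000ff000u
#define MSI_ADDR_DEST_MODE_SHIFT     2
#define MSI_DATA_VECTOR_SHIFT        0
#define MSI_DATA_VECTOR_MASK         0x000000ffu
#define MSI_DATA_DELIVERY_MODE_SHIFT 8
#define MSI_DATA_TRIGGER_SHIFT       15

struct msi_fields {
    uint8_t dest, dest_mode, vector, delivery_mode, trigger_mode;
};

/* Split an MSI message into the arguments apic_deliver_irq() takes. */
static struct msi_fields msi_decode(uint32_t addr_lo, uint32_t data)
{
    struct msi_fields f;

    f.dest          = (addr_lo & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
    f.dest_mode     = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
    f.vector        = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
    f.delivery_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
    f.trigger_mode  = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
    return f;
}
```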
Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations
On Wed, May 13, 2009 at 10:47:08AM +0930, Rusty Russell wrote: On Wed, 13 May 2009 01:03:30 am Michael S. Tsirkin wrote: On Wed, May 13, 2009 at 12:00:02AM +0930, Rusty Russell wrote: and perhaps consider varargs for the callbacks (or would that be too horrible at the implementation end?) Thanks, Rusty. Ugh ... I think it will be. And AFAIK gcc generates a lot of code for varargs - not something we want to do in each interrupt handler. Err, no, I mean for find_vqs: eg.
(block device)
err = vdev->config->find_vqs(vdev, 1, &vblk->vq, blk_done);
(net device)
err = vdev->config->find_vqs(vdev, 3, vqs, skb_recv_done, skb_xmit_done, NULL);
A bit neater for the single-queue case. Cheers, Rusty. Oh. I see. But it becomes messy now that we also need to pass in the names, and we lose type safety. Let's just add a helper function for the single vq case?
static inline struct virtqueue *virtio_find_vq(struct virtio_device *vdev,
					       vq_callback_t *c,
					       const char *n)
{
	vq_callback_t *callbacks[] = { c };
	const char *names[] = { n };
	struct virtqueue *vq;
	int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names);
	if (err < 0)
		return ERR_PTR(err);
	return vq;
}
-- MST
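The proposed helper follows a common pattern. A single-item wrapper builds one-element arrays, forwards them to the array-based call and reports failure through a kernel-style error pointer. Below is a self-contained user-space sketch of that pattern. Everything except the helper body is a simplified mock written for illustration, not the kernel's definitions: the struct layouts, the toy `mock_find_vqs()` transport and the `ERR_PTR`/`IS_ERR`/`PTR_ERR` stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins for the kernel types (mocks). */
struct virtqueue { const char *name; void (*callback)(struct virtqueue *); };
typedef void vq_callback_t(struct virtqueue *);
struct virtio_device;
struct virtio_config_ops {
    int (*find_vqs)(struct virtio_device *vdev, unsigned nvqs,
                    struct virtqueue *vqs[], vq_callback_t *callbacks[],
                    const char *names[]);
};
struct virtio_device { const struct virtio_config_ops *config; };

/* Mock of the kernel convention: small negative errno values stored in a pointer. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }

/* Toy transport: hands out queues from a static pool, rejects unnamed queues. */
static struct virtqueue mock_pool[4];

static int mock_find_vqs(struct virtio_device *vdev, unsigned nvqs,
                         struct virtqueue *vqs[], vq_callback_t *callbacks[],
                         const char *names[])
{
    (void)vdev;
    if (nvqs > 4)
        return -ENOSPC;
    for (unsigned i = 0; i < nvqs; i++) {
        if (!names[i])
            return -EINVAL;
        mock_pool[i].name = names[i];
        mock_pool[i].callback = callbacks[i];
        vqs[i] = &mock_pool[i];
    }
    return 0;
}

static const struct virtio_config_ops mock_ops = { mock_find_vqs };

/* Single-queue convenience wrapper over the array-based find_vqs(). */
static inline struct virtqueue *virtio_find_vq(struct virtio_device *vdev,
                                               vq_callback_t *c, const char *n)
{
    vq_callback_t *callbacks[] = { c };
    const char *names[] = { n };
    struct virtqueue *vq;
    int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names);

    if (err < 0)
        return ERR_PTR(err);
    return vq;
}
```

Callers with one queue get a plain pointer back and check it with `IS_ERR()`. Multi-queue drivers still call `find_vqs()` directly, so no varargs are needed.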
Re: Network I/O performance
Fischer, Anna wrote: I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use the tun/tap device model and the Linux bridge kernel module to connect my VM to the network. I have 2 10G Intel 82598 network devices (with the ixgbe driver) attached to my machine and I want to do packet routing in my VM (the VM has two virtual network interfaces configured). Analysing the network performance of the standard QEMU emulated NICs, I get less than 1G of throughput on those 10G links. Surprisingly though, I don't really see CPU utilization being maxed out. This is a dual core machine, and mpstat shows me that both CPUs are about 40% idle. My VM is more or less unresponsive due to the high network processing load while the host OS still seems to be in good shape. How can I best tune this setup to achieve the best possible performance with KVM? I know there is virtIO and I know there is PCI pass-through, but those models are not an option for me right now. How many cpus are assigned to the guest? If only one, then 40% idle equates to 100% of a core for the guest and 20% for housekeeping. If this is the case, you could try pinning the vcpu thread (info cpus from the monitor) to one core. You should then see 100%/20% cpu load distribution. wrt emulated NIC performance, I'm guessing you're not doing tcp? If you were we might do something with TSO.
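First, the arithmetic behind the 40% figure. With two host cores each 40% idle, total busy time is 2 x 60% = 120% of one core. That fits one fully busy vcpu thread plus about 20% of housekeeping.

From the shell, pinning is usually done with `taskset -pc <cpu> <tid>`. The sketch below does the same from C with the Linux `sched_setaffinity()` call. It assumes you already have the vcpu thread's kernel thread id, for example from the monitor's `info cpus` output if your version reports thread ids. The function name `pin_thread_to_cpu` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Restrict the thread with kernel thread id `tid` (0 = calling thread)
 * to a single host CPU. Returns 0 on success, -1 with errno set on error. */
static int pin_thread_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(tid, sizeof(set), &set);
}
```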
Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations
Michael S. Tsirkin wrote: On Wed, May 13, 2009 at 10:47:08AM +0930, Rusty Russell wrote: On Wed, 13 May 2009 01:03:30 am Michael S. Tsirkin wrote: On Wed, May 13, 2009 at 12:00:02AM +0930, Rusty Russell wrote: and perhaps consider varargs for the callbacks (or would that be too horrible at the implementation end?) Thanks, Rusty. Ugh ... I think it will be. And AFAIK gcc generates a lot of code for varargs - not something we want to do in each interrupt handler. Err, no, I mean for find_vqs: eg.
(block device)
err = vdev->config->find_vqs(vdev, 1, &vblk->vq, blk_done);
(net device)
err = vdev->config->find_vqs(vdev, 3, vqs, skb_recv_done, skb_xmit_done, NULL);
A bit neater for the single-queue case. Cheers, Rusty. Oh. I see. But it becomes messy now that we also need to pass in the names, and we lose type safety. Let's just add a helper function for the single vq case?
static inline struct virtqueue *virtio_find_vq(struct virtio_device *vdev,
					       vq_callback_t *c,
					       const char *n)
{
	vq_callback_t *callbacks[] = { c };
	const char *names[] = { n };
	struct virtqueue *vq;
	int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names);
	if (err < 0)
		return ERR_PTR(err);
	return vq;
}
Much saner.
Re: [PATCH 0/2] Deal with shadow interrupts after emulated instructions
Glauber Costa wrote: Same story, more of Avi's comments merged. Applied, thanks.
Re: [patch 0/3] locking fixes / cr3 validation v3
mtosa...@redhat.com wrote: Addressing comments. Applied all. But please fix your From: header.
Re: [patch 1/1] kvm: expand on help info to specify kvm intel and amd module names
a...@linux-foundation.org wrote: From: Robert P. J. Day rpj...@crashcourse.ca Signed-off-by: Robert P. J. Day rpj...@crashcourse.ca Cc: Avi Kivity a...@redhat.com Signed-off-by: Andrew Morton a...@linux-foundation.org Applied, thanks.
Re: [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes - v3
ehrha...@linux.vnet.ibm.com wrote: From: Christian Ehrhardt ehrha...@de.ibm.com
*updates in v3*
- fix memory slot vs. run uses trylock to avoid a potential livelock
- fix memory slot vs. run checks if it is the first and only memslot registered
*updates in v2*
- hrtimer wakeup uses a more accurate calculation
- unlink vcpu uses smp_mb so the pointer is really zero when the page is freed
This is a collection of fixes for kvm-s390 that originate from several tests made in the last few months. They have now been tested for a while and should be ready to be merged. Applied all, thanks.
Re: [PATCH 4/4] Userspace changes for KVM HPET (v3)
Beth Kon wrote: Beth Kon wrote: Avi Kivity wrote: Beth Kon wrote: Signed-off-by: Beth Kon e...@us.ibm.com
diff --git a/hw/hpet.c b/hw/hpet.c
index c7945ec..100abf5 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -30,6 +30,7 @@
 #include "console.h"
 #include "qemu-timer.h"
 #include "hpet_emul.h"
+#include "qemu-kvm.h"
 
 //#define HPET_DEBUG
 #ifdef HPET_DEBUG
@@ -48,6 +49,28 @@ uint32_t hpet_in_legacy_mode(void)
     return 0;
 }
 
+static void hpet_legacy_enable(void)
+{
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_kpit_disable();
+        dprintf("qemu: hpet disabled kernel pit\n");
+    } else {
+        hpet_pit_disable();
+        dprintf("qemu: hpet disabled userspace pit\n");
+    }
+}
+
+static void hpet_legacy_disable(void)
+{
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_kpit_enable();
+        dprintf("qemu: hpet enabled kernel pit\n");
+    } else {
+        hpet_pit_enable();
+        dprintf("qemu: hpet enabled userspace pit\n");
+    }
+}
I think it's better to move these into hpet_pit_enable() and hpet_pit_disable(). This avoids changing the calls below, and puts pit stuff in i8254.c instead of hpet.c. Might also need to be called from hpet_load(); probably a problem in upstream as well. My assumption about hpet_load was that the correct pit state would be established via pit_load (since all saves/loads are done together). But when I wrote this, I was thinking only about the userspace pit (for qemu). I'm not sure how the load concept applies to kernel state. Do I need to explicitly re-enable or disable the kernel pit during load? Looking further at the code, it looks like kvm_pit_load should take care of this. Agree? It doesn't save/load the enabled bit, does it?
Re: [PATCH 4/4] Userspace changes for KVM HPET (v3)
Avi Kivity wrote: My assumption about hpet_load was that the correct pit state would be established via pit_load (since all saves/loads are done together). But when I wrote this, I was thinking only about the userspace pit (for qemu). I'm not sure how the load concept applies to kernel state. Do I need to explicitly re-enable or disable the kernel pit during load? Looking further at the code, it looks like kvm_pit_load should take care of this. Agree? It doesn't save/load the enabled bit, does it? Also, we might migrate between a host with pit-in-kernel and a host with pit-in-userspace, so this should be handled at the pit level, not kvm.
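Here is a minimal sketch of keeping the enabled bit in the PIT's own saved state, assuming a much-simplified state struct. The flag travels with the device state rather than through a kvm-specific path. It therefore survives migration whether the destination runs the PIT in the kernel or in userspace. The `pit_state` layout and the `pit_save`/`pit_load` helpers are hypothetical and far simpler than qemu's i8254 savevm format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PIT state; the real i8254 keeps more per channel. */
struct pit_state {
    uint16_t count[3];
    uint8_t  mode[3];
    uint8_t  enabled; /* cleared while the HPET is in legacy-replacement mode */
};

/* Serialize into buf (at least 10 bytes); returns bytes written. */
static size_t pit_save(const struct pit_state *s, uint8_t *buf)
{
    size_t off = 0;

    for (int i = 0; i < 3; i++) {
        buf[off++] = s->count[i] & 0xff;
        buf[off++] = s->count[i] >> 8;
        buf[off++] = s->mode[i];
    }
    buf[off++] = s->enabled; /* saved independently of the PIT backend */
    return off;
}

/* Restore from buf; returns bytes consumed. */
static size_t pit_load(struct pit_state *s, const uint8_t *buf)
{
    size_t off = 0;

    for (int i = 0; i < 3; i++) {
        s->count[i] = buf[off] | (uint16_t)(buf[off + 1] << 8);
        off += 2;
        s->mode[i] = buf[off++];
    }
    s->enabled = buf[off++];
    return off;
}
```

After `pit_load()`, whichever PIT backend is active on the destination can read `enabled` and start or stop itself accordingly.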
Re: [GIT PULL] KVM fixes for 2.6.30rc3
Avi Kivity wrote: Linus, please pull repo and branch at Typo in $subject, branch is against recent git.
RE: Enable IRQ windows after exception injection if there are pending virq
Gleb Natapov wrote: On Tue, May 12, 2009 at 11:06:39PM +0800, Dong, Eddie wrote: I didn't run many tests since our PTS system stopped working due to KVM userspace build changes. But since the logic is pretty simple, I want to post it here for comments. Thx, eddie If there is a pending irq after a virtual exception is injected, KVM needs to enable the IRQ window to trap back earlier once the exception is handled. I already posted a patch to do that http://patchwork.kernel.org/patch/21830/ Is your patch different? Is it based on the idea I mentioned to you in private mail (April 27), or a novel one?
Re: [PATCH -tip v5 1/7] x86: instruction decoder API
On Fri, May 08, 2009 at 08:48:42PM -0400, Masami Hiramatsu wrote: +++ b/arch/x86/scripts/gen-insn-attr-x86.awk @@ -0,0 +1,314 @@ +#!/bin/awk -f On some distributions (debian) it is /usr/bin/awk. -- Gleb.
[PATCH] kvm: user: include arch specific headers from $(KERNELDIR)
Currently we only include $(KERNELDIR)/include in CFLAGS, but we also need $(KERNELDIR)/arch/$(arch)/include, or else we'll get mismatched headers.
Signed-off-by: Mark McLoughlin mar...@redhat.com
---
 kvm/user/config-i386.mak       | 1 -
 kvm/user/config-ia64.mak       | 1 +
 kvm/user/config-powerpc.mak    | 1 +
 kvm/user/config-x86-common.mak | 2 ++
 kvm/user/config-x86_64.mak     | 1 -
 5 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/kvm/user/config-i386.mak b/kvm/user/config-i386.mak
index 09175d5..eebb9de 100644
--- a/kvm/user/config-i386.mak
+++ b/kvm/user/config-i386.mak
@@ -3,7 +3,6 @@ cstart.o = $(TEST_DIR)/cstart.o
 bits = 32
 ldarch = elf32-i386
 CFLAGS += -D__i386__
-CFLAGS += -I $(KERNELDIR)/include
 
 tests=
 
diff --git a/kvm/user/config-ia64.mak b/kvm/user/config-ia64.mak
index c4c639e..e8803a0 100644
--- a/kvm/user/config-ia64.mak
+++ b/kvm/user/config-ia64.mak
@@ -2,6 +2,7 @@ bits = 64
 CFLAGS += -m64
 CFLAGS += -D__ia64__
 CFLAGS += -I $(KERNELDIR)/include
+CFLAGS += -I $(KERNELDIR)/arch/ia64/include
 
 all:
 
diff --git a/kvm/user/config-powerpc.mak b/kvm/user/config-powerpc.mak
index dd7ef54..589aa61 100644
--- a/kvm/user/config-powerpc.mak
+++ b/kvm/user/config-powerpc.mak
@@ -1,4 +1,5 @@
 CFLAGS += -I $(KERNELDIR)/include
+CFLAGS += -I $(KERNELDIR)/arch/powerpc/include
 CFLAGS += -Wa,-mregnames -I test/lib
 CFLAGS += -ffreestanding
 
diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index e789fd4..8d8fadf 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -12,6 +12,8 @@ cflatobjs += \
 $(libcflat): LDFLAGS += -nostdlib
 $(libcflat): CFLAGS += -ffreestanding -I test/lib
 
+CFLAGS += -I $(KERNELDIR)/include
+CFLAGS += -I $(KERNELDIR)/arch/x86/include
 CFLAGS += -m$(bits)
 
 FLATLIBS = test/lib/libcflat.a $(libgcc)
diff --git a/kvm/user/config-x86_64.mak b/kvm/user/config-x86_64.mak
index b50b540..d88f54c 100644
--- a/kvm/user/config-x86_64.mak
+++ b/kvm/user/config-x86_64.mak
@@ -3,7 +3,6 @@ cstart.o = $(TEST_DIR)/cstart64.o
 bits = 64
 ldarch = elf64-x86-64
 CFLAGS += -D__x86_64__
-CFLAGS += -I $(KERNELDIR)/include
 
 tests = $(TEST_DIR)/access.flat $(TEST_DIR)/irq.flat $(TEST_DIR)/sieve.flat \
 	$(TEST_DIR)/simple.flat $(TEST_DIR)/stringio.flat \
-- 
1.6.0.6
[PATCH][Resend] Fix Warning in arch/x86/kvm/vmx.c
Hi Avi/Yaniv, With gcc --version 4.4.1 20090429 (prerelease) I get the following warning:
arch/x86/kvm/vmx.c: In function ‘vmx_intr_assist’:
arch/x86/kvm/vmx.c:3233: warning: ‘max_irr’ may be used uninitialized in this function
arch/x86/kvm/vmx.c:3233: note: ‘max_irr’ was declared here
Investigation found that:
3231 static void update_tpr_threshold(struct kvm_vcpu *vcpu)
3232 {
3233         int max_irr, tpr;
3234
3235         if (!vm_need_tpr_shadow(vcpu->kvm))
3236                 return;
3237
3238         if (!kvm_lapic_enabled(vcpu) ||
3239             ((max_irr = kvm_lapic_find_highest_irr(vcpu)) == -1)) {
(max_irr = kvm_lapic_find_highest_irr(vcpu)) == -1 may not get a chance to be evaluated if !kvm_lapic_enabled(vcpu) evaluates to true (as the expressions are OR-ed).
3240                 vmcs_write32(TPR_THRESHOLD, 0);
3241                 return;
3242         }
3243
3244         tpr = (kvm_lapic_get_cr8(vcpu) & 0x0f) << 4;
3245         vmcs_write32(TPR_THRESHOLD, (max_irr > tpr) ? tpr >> 4 : max_irr >> 4);
Using (max_irr > tpr) and max_irr >> 4 without max_irr being initialized can cause trouble.
3246 }
I would like to propose a small fix for this by interchanging the operands of ||, so that max_irr is initialized in all cases and the warning goes away, without compromising the conditional evaluation inside if().
Signed-Off-By: Subrata Modak subr...@linux.vnet.ibm.com
To: Avi Kivity a...@qumranet.com
To: Yaniv Kamay ya...@qumranet.com
To: kvm@vger.kernel.org
Cc: Balbir Singh bal...@linux.vnet.ibm.com
Cc: Sachin P Sant sach...@linux.vnet.ibm.com
Subject: [PATCH][Resend] Fix Warning in arch/x86/kvm/vmx.c
---
--- a/arch/x86/kvm/vmx.c	2009-05-12 15:28:42.0 +0530
+++ b/arch/x86/kvm/vmx.c	2009-05-12 15:51:02.0 +0530
@@ -3235,8 +3235,8 @@ static void update_tpr_threshold(struct
 	if (!vm_need_tpr_shadow(vcpu->kvm))
 		return;
 
-	if (!kvm_lapic_enabled(vcpu) ||
-	    ((max_irr = kvm_lapic_find_highest_irr(vcpu)) == -1)) {
+	if (((max_irr = kvm_lapic_find_highest_irr(vcpu)) == -1) ||
+	    !kvm_lapic_enabled(vcpu)) {
 		vmcs_write32(TPR_THRESHOLD, 0);
 		return;
 	}
---
Regards-- Subrata
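In the quoted code, the `!kvm_lapic_enabled(vcpu)` branch writes 0 and returns. So `max_irr` is never actually read uninitialized, and the warning appears to be a false positive that gcc's flow analysis does not see through. Swapping the operands changes the evaluation order, so the IRR lookup would run even when the lapic is disabled.

Here is a minimal sketch of the alternative: keep the original order and simply initialize the variable. `find_highest_irr()` and `tpr_threshold()` are hypothetical stand-ins that only mirror the shape of `update_tpr_threshold()`.

```c
#include <assert.h>

static int calls; /* how many times the (stand-in) IRR lookup ran */

/* Hypothetical stand-in for kvm_lapic_find_highest_irr(). */
static int find_highest_irr(void)
{
    calls++;
    return 0x51;
}

/* Mirrors update_tpr_threshold(): returns the value that would be
 * written to TPR_THRESHOLD. */
static int tpr_threshold(int lapic_enabled, int cr8)
{
    int max_irr = -1; /* explicit init silences the may-be-uninitialized warning */
    int tpr;

    /* Short-circuit: the lookup is skipped entirely when the lapic is off. */
    if (!lapic_enabled || (max_irr = find_highest_irr()) == -1)
        return 0;

    tpr = (cr8 & 0x0f) << 4;
    return (max_irr > tpr) ? tpr >> 4 : max_irr >> 4;
}
```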
Re: [PATCH -tip v5 1/7] x86: instruction decoder API
On Wed, May 13, 2009 at 10:23, Gleb Natapov g...@redhat.com wrote: On Fri, May 08, 2009 at 08:48:42PM -0400, Masami Hiramatsu wrote: +++ b/arch/x86/scripts/gen-insn-attr-x86.awk @@ -0,0 +1,314 @@ +#!/bin/awk -f On some distributions (debian) it is /usr/bin/awk. True, but on most of them (all?) there is also an appropriate link in /bin. If a shebang line could take more than one argument, then '/usr/bin/env awk -f' would be the best solution I think. -- Przemysław Pawełczyk
Re: [PATCH][Resend] Fix Warning in arch/x86/kvm/vmx.c
Subrata Modak wrote: Hi Avi/Yaniv, With gcc --version 4.4.1 20090429 (prerelease) I get the following warning:
arch/x86/kvm/vmx.c: In function ‘vmx_intr_assist’:
arch/x86/kvm/vmx.c:3233: warning: ‘max_irr’ may be used uninitialized in this function
arch/x86/kvm/vmx.c:3233: note: ‘max_irr’ was declared here
Investigation found that:
3231 static void update_tpr_threshold(struct kvm_vcpu *vcpu)
3232 {
3233         int max_irr, tpr;
3234
3235         if (!vm_need_tpr_shadow(vcpu->kvm))
3236                 return;
3237
3238         if (!kvm_lapic_enabled(vcpu) ||
3239             ((max_irr = kvm_lapic_find_highest_irr(vcpu)) == -1)) {
This function no longer exists; can you check if the current code is susceptible?
(max_irr = kvm_lapic_find_highest_irr(vcpu)) == -1 may not get a chance to evaluate if: !kvm_lapic_enabled(vcpu) evaluates to true (as the expressions are Or-ed).
3240                 vmcs_write32(TPR_THRESHOLD, 0);
3241                 return;
3242         }
3243
3244         tpr = (kvm_lapic_get_cr8(vcpu) & 0x0f) << 4;
3245         vmcs_write32(TPR_THRESHOLD, (max_irr > tpr) ? tpr >> 4 : max_irr >> 4);
Using (max_irr > tpr) and max_irr >> 4, without max_irr getting initialized can cause trouble.
With !kvm_lapic_enabled(), TPR_THRESHOLD is meaningless.
Re: [PATCH -tip v5 1/7] x86: instruction decoder API
On Wed, May 13, 2009 at 11:35:16AM +0200, Przemysław Pawełczyk wrote: On Wed, May 13, 2009 at 10:23, Gleb Natapov g...@redhat.com wrote: On Fri, May 08, 2009 at 08:48:42PM -0400, Masami Hiramatsu wrote: +++ b/arch/x86/scripts/gen-insn-attr-x86.awk @@ -0,0 +1,314 @@ +#!/bin/awk -f On some distributions (debian) it is /usr/bin/awk. True, but on most of them (all?) there is also an appropriate link in /bin. Nope, not on debian testing. Although I assume that if kernel compilation starts to fail, it will appear :) If shebang could have more than one argument, then '/usr/bin/env awk -f' would be the best solution I think. -- Gleb.
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
Alex Williamson wrote: We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap. Signed-off-by: Alex Williamson alex.william...@hp.com --- v2: Added mutex to protect gsi bitmap Why is the mutex needed? We already have mutex protection in qemu. How often does the driver enable/disable the MSI (and do you know why)? If it's often enough it may justify kernel support. (We'll need this patch in any case for kernels without this new support).
Re: event injection MACROs
Dong, Eddie wrote: I noticed the MACROs for SVM vmcb->control.event_inj and VMX VM_EXIT_INTR_INFO are almost the same. I need to query the event injection situation in common code, so I plan to expose this register read/write to x86.c. Should we define a new format for evtinj/VM_EXIT_INTR_INFO as a common KVM format, or just move those original MACROs to kvm_host.h? This is dangerous if additional bits or field values are defined by either architecture. Better to use accessors.
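This is a sketch of the accessor approach, based on the current vendor manuals. SVM's EVENTINJ and VMX's interruption-information fields use the same positions for these fields:

- the vector in bits 7:0
- the event type in bits 10:8
- an error-code-valid bit at 11
- the valid bit at 31

The type values already differ, though. VMX defines types 5 and 6 (privileged and other software exceptions), which EVENTINJ does not. Decoding each vendor's word into a neutral struct keeps common code independent of either raw layout. `kvm_event`, `svm_decode_event` and `vmx_decode_event` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Vendor-neutral view of an injected/pending event (hypothetical). */
struct kvm_event {
    bool    valid;
    uint8_t vector;
    uint8_t type;           /* raw bits 10:8; value meanings are vendor-specific */
    bool    has_error_code;
};

/* SVM VMCB EVENTINJ layout. */
static struct kvm_event svm_decode_event(uint32_t evtinj)
{
    struct kvm_event e = {
        .valid          = (evtinj >> 31) & 1,
        .vector         = evtinj & 0xff,
        .type           = (evtinj >> 8) & 0x7,
        .has_error_code = (evtinj >> 11) & 1,
    };
    return e;
}

/* VMX interruption-information layout: same positions today, but kept
 * separate so either side can grow without touching common code. */
static struct kvm_event vmx_decode_event(uint32_t intr_info)
{
    struct kvm_event e = {
        .valid          = (intr_info >> 31) & 1,
        .vector         = intr_info & 0xff,
        .type           = (intr_info >> 8) & 0x7,
        .has_error_code = (intr_info >> 11) & 1,
    };
    return e;
}
```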
Re: [ANNOUNCE] qemu-kvm-0.10.4
Mark McLoughlin wrote: On Tue, 2009-05-12 at 22:30 +0200, Farkas Levente wrote: Avi Kivity wrote: qemu-kvm-0.10.4 is now available. This is the first release of the 0.10 stable branch of qemu-kvm. The qemu-kvm 0.10.4 includes all of the features and fixes of qemu-0.10.4, plus adaptations for improved kvm support. Note that qemu-kvm releases do not include the kvm external modules (kvm*.ko); you can use the modules provided by your distribution, modules from the development releases (kvm-xx), or from the kvm-kmod stable branch releases once they become available. As this is the first release of this branch there is no changelog; qemu-kvm-0.10.4 is roughly equivalent (but is not identical) to qemu from kvm-84. Is this the plan? i.e. the stable userspace will be about kvm-84? What's the plan for the kvm-kmod release date, and will it also be somewhere around 84? AIUI, the plan is:
- There will be stable releases of qemu-kvm in sync with qemu upstream releases - e.g. you can expect a qemu-kvm-0.11.0 release shortly after qemu-0.11.0 is released
- There will be no stable releases, as such, of the kernel module. You should use upstream linux releases instead - e.g. the latest stable release is 2.6.29.2
- The kvm-XX releases are development snapshots of the kvm.git and qemu-kvm.git code
For example, in Fedora, our plan is that we will ship the kvm.ko included in upstream linux releases and the qemu-kvm stable releases[1]. We may include qemu-kvm from kvm-XX releases during the development of the next Fedora release, but only as a preview of the next qemu-kvm stable release. And what's the plan for rhel-5.4? In this case the latest stable kernel can't be used, since the 5.x series is always 2.6.18 based, and if there is no stable kvm-kmod branch then...? -- Levente Si vis pacem para bellum!
Re: Enable IRQ windows after exception injection if there are pending virq
On Wed, May 13, 2009 at 03:45:37PM +0800, Dong, Eddie wrote: Gleb Natapov wrote: On Tue, May 12, 2009 at 11:06:39PM +0800, Dong, Eddie wrote: I didn't run many tests since our PTS system stopped working due to KVM userspace build changes. But since the logic is pretty simple, I want to post it here for comments. Thx, eddie If there is a pending IRQ after a virtual exception is injected, KVM needs to enable the IRQ window so that it traps back as soon as the exception is handled. I already posted a patch to do that: http://patchwork.kernel.org/patch/21830/ Is your patch different? Is it based on the idea I mentioned to you in private mail (April 27), or a novel one? Yes. It fixes the bug you pointed out. -- Gleb.
Re: [RFC PATCH 0/3] generic hypercall support
Anthony Liguori wrote: Gregory Haskins wrote: So, yes, the delta from PIO to HC is 350ns. Yes, this is a ~1.4% improvement. So what? It's still an improvement. If that improvement were for free, would you object? And we all know that this change isn't free because we have to change some code (+128/-0, to be exact). But what is it specifically you are objecting to in the first place? Adding hypercall support as a pv_ops primitive isn't exactly hard or complex, or even very much code. Where does 25us come from? The numbers you post below are 33us and 66us. [snip] The 25us is approximately the max from an in-kernel harness strapped directly to the driver, gathered informally during testing. The 33us is from formally averaging multiple runs of a userspace socket app in preparation for publishing. I consider the 25us the target goal since there is obviously overhead that a socket application deals with that a guest theoretically bypasses with the tap device. Note that the socket application itself often sees 30us...this was just a particularly slow set of runs that day. Note that this is why I express the impact as approximately (e.g. ~4%). Sorry for the confusion. -Greg
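The percentages traded above are simply the 350 ns PIO-to-hypercall delta divided by the round-trip baseline. This short Python sketch does that division for the two baselines quoted (the 25 us in-kernel target and the 33 us averaged socket runs); it uses no numbers beyond those in the thread.

```python
def relative_saving(delta_ns, baseline_us):
    """Return delta_ns as a percentage of a baseline given in microseconds."""
    return 100.0 * delta_ns / (baseline_us * 1000.0)


PIO_TO_HC_DELTA_NS = 350.0  # PIO -> hypercall delta quoted in the thread

for label, baseline_us in [("in-kernel harness (target)", 25.0),
                           ("averaged userspace socket runs", 33.0)]:
    pct = relative_saving(PIO_TO_HC_DELTA_NS, baseline_us)
    print("%s: %.0f us baseline -> %.2f%% saving" % (label, baseline_us, pct))
```

Against 25 us the saving is about 1.4%, which is where the ~1.4% figure comes from; against the 33 us average it is closer to 1.1%.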
[PATCH][KVM-AUTOTEST] TAP network support in kvm-autotest
Hi All: This patch tries to add tap network support in kvm-autotest. Multiple NICs connected to different bridges can be set up through this script. A public bridge is important for testing real network traffic and migration. The patch assigns each NIC a randomly generated MAC address. The IP address required in the test can be dynamically probed through nmap/arp. Only the IP address of the first NIC is used throughout the test.

Example:
nics = nic1 nic2
network = bridge
bridge = switch
ifup = /etc/qemu-ifup-switch
ifdown = /etc/qemu-ifdown-switch
This would make the virtual machine have two NICs, both of which are connected to a bridge named 'switch'. Ifup/ifdown scripts are also specified.

Another example:
nics = nic1 nic2
network = bridge
bridge = switch
bridge_nic2 = virbr0
ifup = /etc/qemu-ifup-switch
ifup_nic2 = /etc/qemu-ifup-virbr0
This would make the virtual machine have two NICs: nic1 is connected to bridge 'switch' and nic2 is connected to bridge 'virbr0'.

Public-mode and user-mode NICs can also be mixed:
nics = nic1 nic2
network = bridge
network_nic2 = user

Looking forward to comments and suggestions.

From: jason jasow...@redhat.com
Date: Wed, 13 May 2009 16:15:28 +0800
Subject: [PATCH] Add tap networking support.
---
 client/tests/kvm_runtest_2/kvm_utils.py |    7 +++
 client/tests/kvm_runtest_2/kvm_vm.py    |   74 ++-
 2 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/client/tests/kvm_runtest_2/kvm_utils.py b/client/tests/kvm_runtest_2/kvm_utils.py
index be8ad95..0d1f7f8 100644
--- a/client/tests/kvm_runtest_2/kvm_utils.py
+++ b/client/tests/kvm_runtest_2/kvm_utils.py
@@ -773,3 +773,10 @@ def md5sum_file(filename, size=None):
         size -= len(data)
     f.close()
     return o.hexdigest()
+
+def random_mac():
+    mac=[0x00,0x16,0x30,
+         random.randint(0x00,0x09),
+         random.randint(0x00,0x09),
+         random.randint(0x00,0x09)]
+    return ':'.join(map(lambda x: "%02x" %x,mac))
diff --git a/client/tests/kvm_runtest_2/kvm_vm.py b/client/tests/kvm_runtest_2/kvm_vm.py
index fab839f..ea7dab6 100644
--- a/client/tests/kvm_runtest_2/kvm_vm.py
+++ b/client/tests/kvm_runtest_2/kvm_vm.py
@@ -105,6 +105,10 @@ class VM:
         self.qemu_path = qemu_path
         self.image_dir = image_dir
         self.iso_dir = iso_dir
+        self.macaddr = []
+        for nic_name in kvm_utils.get_sub_dict_names(params,"nics"):
+            macaddr = kvm_utils.random_mac()
+            self.macaddr.append(macaddr)
 
     def verify_process_identity(self):
         """Make sure .pid really points to the original qemu process.
@@ -189,9 +193,25 @@ class VM:
         for nic_name in kvm_utils.get_sub_dict_names(params, "nics"):
             nic_params = kvm_utils.get_sub_dict(params, nic_name)
             qemu_cmd += " -net nic,vlan=%d" % vlan
+            net = nic_params.get("network")
+            if net == "bridge":
+                qemu_cmd += ",macaddr=%s" % self.macaddr[vlan]
             if nic_params.get("nic_model"):
                 qemu_cmd += ",model=%s" % nic_params.get("nic_model")
-            qemu_cmd += " -net user,vlan=%d" % vlan
+            if net == "bridge":
+                qemu_cmd += " -net tap,vlan=%d" % vlan
+                ifup = nic_params.get("ifup")
+                if ifup:
+                    qemu_cmd += ",script=%s" % ifup
+                else:
+                    qemu_cmd += ",script=/etc/qemu-ifup"
+                ifdown = nic_params.get("ifdown")
+                if ifdown:
+                    qemu_cmd += ",downscript=%s" % ifdown
+                else:
+                    qemu_cmd += ",downscript=no"
+            else:
+                qemu_cmd += " -net user,vlan=%d" % vlan
             vlan += 1
 
         mem = params.get("mem")
@@ -206,11 +226,11 @@ class VM:
         extra_params = params.get("extra_params")
         if extra_params:
             qemu_cmd += " %s" % extra_params
-        
+
         for redir_name in kvm_utils.get_sub_dict_names(params, "redirs"):
             redir_params = kvm_utils.get_sub_dict(params, redir_name)
             guest_port = int(redir_params.get("guest_port"))
-            host_port = self.get_port(guest_port)
+            host_port = self.get_port(guest_port,True)
             qemu_cmd += " -redir tcp:%s::%s" % (host_port, guest_port)
 
         if params.get("display") == "vnc":
@@ -467,27 +487,57 @@ class VM:
         If port redirection is used, return 'localhost' (the guest has no IP
         address of its own).  Otherwise return the guest's IP address.
         """
-        # Currently redirection is always used, so return 'localhost'
-        return "localhost"
+        if self.params.get("network") == "bridge":
+            # probing ip address through arp
+            bridge_name = self.params['bridge']
+            macaddr = self.macaddr[0]
+            lines = os.popen("arp -a").readlines()
+            for line in lines:
+                if macaddr in line:
+                    return
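A standalone Python sketch of the three pieces this patch adds: MAC generation, finding the guest's IP in `arp -a` output, and the `-net nic`/`-net tap` arguments for a bridged NIC. It is not the kvm-autotest code. The function names are hypothetical, the regular expression assumes the Linux net-tools `arp -a` line format, and unlike the patch (whose last three octets come from 0x00-0x09) it draws each random octet from the full 0x00-0xff range. Comparing the parsed MAC field case-insensitively, instead of a substring test, avoids missing an entry because of letter case. An ARP entry only exists once host and guest have exchanged traffic, which is presumably why the description mentions probing with nmap as well.

```python
import random
import re


def random_mac(prefix=(0x00, 0x16, 0x30), rng=random):
    """Return 'xx:xx:xx:xx:xx:xx' with a fixed 3-octet prefix and 3 random octets."""
    octets = list(prefix) + [rng.randint(0x00, 0xFF) for _ in range(3)]
    return ":".join("%02x" % octet for octet in octets)


# Assumed Linux net-tools format, e.g.
#   "? (192.168.122.15) at 00:16:30:0a:1b:2c [ether] on switch"
_ARP_LINE = re.compile(r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9A-Fa-f:]{17})")


def ip_from_arp(arp_output, macaddr):
    """Return the IPv4 address paired with macaddr in `arp -a` output, or None."""
    for line in arp_output.splitlines():
        match = _ARP_LINE.search(line)
        if match and match.group("mac").lower() == macaddr.lower():
            return match.group("ip")
    return None  # no usable entry yet (e.g. "<incomplete>" or never contacted)


def bridged_nic_args(vlan, macaddr, ifup=None, ifdown=None):
    """Build the -net nic/-net tap pair for one bridged NIC, using the same
    defaults as the patch: script=/etc/qemu-ifup and downscript=no."""
    args = " -net nic,vlan=%d,macaddr=%s" % (vlan, macaddr)
    args += " -net tap,vlan=%d" % vlan
    args += ",script=%s" % (ifup or "/etc/qemu-ifup")
    args += ",downscript=%s" % (ifdown or "no")
    return args


if __name__ == "__main__":
    sample = "? (192.168.122.15) at 00:16:30:0a:1b:2c [ether] on switch\n"
    print(ip_from_arp(sample, "00:16:30:0A:1B:2C"))  # 192.168.122.15
    print(bridged_nic_args(0, random_mac(), ifup="/etc/qemu-ifup-switch"))
```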
Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations
On Wed, 13 May 2009 04:48:34 pm Michael S. Tsirkin wrote: Let's just add a helper function for the single vq case? static inline struct virtqueue *virtio_find_vq(struct virtio_device *vdev, vq_callback_t *c, const char *n) virtio_find_single_vq() to emphasize the singular nature, and it looks good. Thanks! Rusty.
Re: [PATCH 1/1] KVM: Fix potentially recursively get kvm lock
On Tue, May 12, 2009 at 11:30:21AM -0300, Marcelo Tosatti wrote: On Tue, May 12, 2009 at 10:13:36PM +0800, Yang, Sheng wrote: + mutex_unlock(&kvm->lock); The assigned_dev list is protected by kvm->lock, so you could have another ioctl adding to it at the same time you're searching. Oh, yes... My fault... You could either have a separate kvm->assigned_devs_lock to protect kvm->arch.assigned_dev_head (its users are the ioctls that manipulate it), or change the IRQ injection to use a separate spinlock, kill the workqueue and call kvm_set_irq from the assigned device interrupt handler. I'd prefer the latter, though it needs more work. But the only reason for putting a workqueue here is that kvm->lock is a mutex? I can't believe it... If so, I think we made a big mistake: we had to fix all kinds of racy problems caused by this, only to find it's unnecessary... One issue is that kvm_set_irq can take too long while interrupts are blocked, and you'd have to disable interrupts in other contexts that inject interrupts (say qemu->ioctl(SET_INTERRUPT)->...->), so all I can see is a tradeoff. <guess mode on> But the interrupt injection path seems to be short and efficient enough to run in host interrupt context. <guess mode off> Avi, Gleb? The interrupt injection path also uses the IRQ routing data structures, so access to them should be protected by the same lock. And of course in-kernel device (apic/ioapic/pic) MMIO is done holding this lock, so interrupt injection cannot happen in parallel with device reconfiguration. Maybe we want more parallelism here. Maybe another reason is kvm_kick_vcpu(), but you have already fixed that. Note you tested the spinlock_irq patch with GigE and there was no significant performance regression, right? I'll continue checking the code... -- regards Yang, Sheng -- Gleb.
Re: [PATCH 01/10] Unprotect a page if #PF happens during NMI injection.
Gleb Natapov wrote: It is done for exception and interrupt already. Applied, thanks.
Re: [PATCH v3] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 10:03 +0300, Michael S. Tsirkin wrote: On Tue, May 12, 2009 at 04:07:15PM -0600, Alex Williamson wrote: We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. BTW, which driver does that? Any idea why? I've seen it from both e1000e and qla2xxx. I assumed it was some kind of interrupt mitigation since the devices seem to work fine otherwise. Alex
Re: [PATCH v3] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 10:04 +0300, Michael S. Tsirkin wrote: On Tue, May 12, 2009 at 04:07:15PM -0600, Alex Williamson wrote:
@@ -286,6 +289,9 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 	int fd;
 	kvm_context_t kvm;
 	int r;
+#ifdef KVM_CAP_IRQ_ROUTING
Let's kill all these ifdefs. Or at least, let's not add them. AFAICT, they're still used both for builds against older kernels and for architectures that don't support it. Hollis just added the one around kvm_get_irq_route_gsi() 10 days ago to fix the ppc build. Has it since been deprecated? Thanks, Alex
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 12:47 +0300, Avi Kivity wrote: Alex Williamson wrote: We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap. Signed-off-by: Alex Williamson alex.william...@hp.com --- v2: Added mutex to protect gsi bitmap Why is the mutex needed? We already have mutex protection in qemu. If it's unneeded, I'll happily remove it. I was assuming in a guest with multiple devices these could come in parallel, but maybe the guest is already serialized for config space accesses via cfc/cf8. How often does the driver enable/disable the MSI (and, do you know why)? If it's often enough it may justify kernel support. (We'll need this patch in any case for kernels without this new support). Seems like multiple times per second. I don't know why. Now I'm starting to get curious why nobody else seems to be hitting this. I'm seeing it on an e1000e NIC and Qlogic fibre channel. Is everyone else using MSI-X or regular interrupts vs MSI? Alex
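A Python model of why the bitmap fixes the exhaustion: with a forward-only counter every MSI enable burns a new GSI, whereas a bitmap lets a released GSI be handed out again. The class and function names are hypothetical, the counter class is a simplification of the old behaviour rather than the libkvm code, and a pool of 1024 is only an illustrative size. `alloc()` returns the lowest clear bit, mirroring the ffs()-style search the patch mentions.

```python
class GsiBitmap:
    """Track used GSIs in a bitmap (bit n set => GSI n in use)."""

    def __init__(self, gsi_count):
        self.gsi_count = gsi_count
        self.used = 0

    def alloc(self):
        """Hand out the lowest free GSI."""
        free = ~self.used & ((1 << self.gsi_count) - 1)
        if not free:
            raise RuntimeError("all %d GSIs in use" % self.gsi_count)
        gsi = (free & -free).bit_length() - 1  # index of the lowest free bit
        self.used |= 1 << gsi
        return gsi

    def release(self, gsi):
        self.used &= ~(1 << gsi)


class GsiCounter:
    """Simplified old scheme: a counter that only ever moves forward."""

    def __init__(self, gsi_count):
        self.gsi_count = gsi_count
        self.next_gsi = 0

    def alloc(self):
        if self.next_gsi >= self.gsi_count:
            raise RuntimeError("all %d GSIs in use" % self.gsi_count)
        self.next_gsi += 1
        return self.next_gsi - 1

    def release(self, gsi):
        pass  # nothing is given back; the number is simply lost


def survives_toggles(allocator, toggles):
    """Each MSI enable allocates a GSI and each disable releases it."""
    try:
        for _ in range(toggles):
            allocator.release(allocator.alloc())
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    print(survives_toggles(GsiCounter(1024), 10000))  # False: pool exhausted
    print(survives_toggles(GsiBitmap(1024), 10000))   # True: GSI 0 is reused
```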
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
Alex Williamson wrote: On Wed, 2009-05-13 at 12:47 +0300, Avi Kivity wrote: Alex Williamson wrote: We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap. Signed-off-by: Alex Williamson alex.william...@hp.com --- v2: Added mutex to protect gsi bitmap Why is the mutex needed? We already have mutex protection in qemu. If it's unneeded, I'll happily remove it. I was assuming in a guest with multiple devices these could come in parallel, but maybe the guest is already serialized for config space accesses via cfc/cf8. The guest may or may not be serialized; we can't rely on that. But qemu is, and we can. How often does the driver enable/disable the MSI (and, do you know why)? If it's often enough it may justify kernel support. (We'll need this patch in any case for kernels without this new support). Seems like multiple times per second. I don't know why. Now I'm starting to get curious why nobody else seems to be hitting this. I'm seeing it on an e1000e NIC and Qlogic fibre channel. Is everyone else using MSI-X or regular interrupts vs MSI? When you say multiple times, is it several, or a lot more? Maybe it is NAPI?
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 15:35 +0300, Avi Kivity wrote: Alex Williamson wrote: On Wed, 2009-05-13 at 12:47 +0300, Avi Kivity wrote: Alex Williamson wrote: We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap. Signed-off-by: Alex Williamson alex.william...@hp.com --- v2: Added mutex to protect gsi bitmap Why is the mutex needed? We already have mutex protection in qemu. If it's unneeded, I'll happily remove it. I was assuming in a guest with multiple devices these could come in parallel, but maybe the guest is already serialized for config space accesses via cfc/cf8. The guest may or may not be serialized; we can't rely on that. But qemu is, and we can. Ok, I'll drop the mutex here. How often does the driver enable/disable the MSI (and, do you know why)? If it's often enough it may justify kernel support. (We'll need this patch in any case for kernels without this new support). Seems like multiple times per second. I don't know why. Now I'm starting to get curious why nobody else seems to be hitting this. I'm seeing it on an e1000e NIC and Qlogic fibre channel. Is everyone else using MSI-X or regular interrupts vs MSI? When you say multiple times, is it several, or a lot more? Maybe it is NAPI? The system would run out of the ~1000 available GSIs in a minute or two with just an e1000e available to the guest. So that's something on the order of 10/s. This also causes a printk in the host every time the interrupt is enabled, which can't help performance and gets pretty annoying for syslog. I was guessing some kind of interrupt mitigation, such as NAPI, but a qlogic FC card seems to do it too (seemingly at a slower rate).
Alex
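The "on the order of 10/s" estimate follows from the pool size and the time to exhaust it, assuming every MSI enable consumed a fresh GSI, as it did with the old counter. A trivial Python check using the ~1000 GSIs and the one-to-two-minute window quoted above:

```python
def implied_rate(gsis_available, minutes_to_exhaust):
    """GSI allocations per second needed to use up the pool in the given time,
    assuming each MSI enable consumes a fresh GSI (the old counter behaviour)."""
    return gsis_available / (minutes_to_exhaust * 60.0)


for minutes in (1, 2):
    print("%d min -> %.1f enables/s" % (minutes, implied_rate(1000, minutes)))
```

That gives roughly 8 to 17 enables per second, consistent with the estimate.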
Re: virtio net regression
Re-sending as this does not seem to have made it to the list.

Antoine Martin wrote:
Hi,
Here is another one, any ideas? These oopses do look quite deep. Is it normal to end up in tcp_send_ack from pdflush??
Cheers
Antoine

[929492.154634] pdflush: page allocation failure. order:0, mode:0x20
[929492.154637] Pid: 291, comm: pdflush Not tainted 2.6.29.2 #5
[929492.154639] Call Trace:
[929492.154641] <IRQ>  [8027e8bc] __alloc_pages_internal+0x3e1/0x401
[929492.154649] [8055b5ea] try_fill_recv+0xa1/0x182
[929492.154652] [8055c1fc] virtnet_poll+0x533/0x5ab
[929492.154655] [80632bba] net_rx_action+0x70/0x143
[929492.154658] [8023f18c] __do_softirq+0x83/0x123
[929492.154661] [8020d35c] call_softirq+0x1c/0x28
[929492.154664] [8020e2c0] do_softirq+0x3c/0x85
[929492.154666] [8023eea3] irq_exit+0x3f/0x7a
[929492.154668] [8020e59c] do_IRQ+0x12b/0x14f
[929492.154670] [8020cad3] ret_from_intr+0x0/0x29
[929492.154672] <EOI>  [802c22b1] __set_page_dirty_buffers+0x0/0x8f
[929492.154677] [8031702b] bget_one+0x0/0xb
[929492.154680] [80316fa2] walk_page_buffers+0x2/0x8b
[929492.154682] [803185bc] ext3_ordered_writepage+0xae/0x134
[929492.154685] [8027ea46] __writepage+0xa/0x25
[929492.154687] [8027f19f] write_cache_pages+0x206/0x322
[929492.154689] [8027ea3c] __writepage+0x0/0x25
[929492.154691] [8027f2fe] do_writepages+0x27/0x2d
[929492.154694] [802bd3f6] __writeback_single_inode+0x1a7/0x3b5
[929492.154696] [8020a68c] __switch_to+0xb4/0x38c
[929492.154698] [802bda76] generic_sync_sb_inodes+0x2a7/0x458
[929492.154701] [802bde00] writeback_inodes+0x8d/0xe6
[929492.154704] [807296e2] _spin_lock+0x5/0x7
[929492.155056] [8027f432] wb_kupdate+0x9f/0x116
[929492.155058] [80280095] pdflush+0x14b/0x202
[929492.155061] [8027f393] wb_kupdate+0x0/0x116
[929492.155063] [8027ff4a] pdflush+0x0/0x202
[929492.155065] [8027ff4a] pdflush+0x0/0x202
[929492.155068] [8024c127] kthread+0x47/0x73
[929492.155070] [8020d25a] child_rip+0xa/0x20
[929492.155072] [8024c0e0] kthread+0x0/0x73
[929492.183142] [8020d250] child_rip+0x0/0x20
[929492.183145] Mem-Info:
[929492.183147] DMA per-cpu:
[929492.183149] CPU0: hi:0, btch: 1 usd: 0
[929492.183151] DMA32 per-cpu:
[929492.183154] CPU0: hi: 186, btch: 31 usd: 184
[929492.183158] Active_anon:2755 active_file:39849 inactive_anon:2972
[929492.183159] inactive_file:70353 unevictable:0 dirty:4172 writeback:1580 unstable:0
[929492.183161] free:734 slab:5619 mapped:15047 pagetables:927 bounce:0
[929492.183166] DMA free:1968kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:40kB active_file:2116kB inactive_file:1880kB unevictable:0kB present:5448kB pages_scanned:0 all_unreclaimable? no
[929492.183169] lowmem_reserve[]: 0 489 489 489
[929492.183176] DMA32 free:968kB min:2812kB low:3512kB high:4216kB active_anon:11020kB inactive_anon:11848kB active_file:157280kB inactive_file:279532kB unevictable:0kB present:500896kB pages_scanned:0 all_unreclaimable? no
[929492.183180] lowmem_reserve[]: 0 0 0 0
[929492.183183] DMA: 6*4kB 2*8kB 3*16kB 1*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1976kB
[929492.183235] DMA32: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 3*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 968kB
[929492.183244] 110992 total pagecache pages
[929492.183246] 739 pages in swap cache
[929492.183248] Swap cache stats: add 8996, delete 8257, find 92604/93191
[929492.183250] Free swap = 1040016kB
[929492.183252] Total swap = 1048568kB
[929492.186003] 131056 pages RAM
[929492.186006] 4799 pages reserved
[929492.186007] 44697 pages shared
[929492.186008] 90516 pages non-shared
[930274.380075] eth0: no IPv6 routers present

Antoine Martin wrote:
Hi,
Still getting (some, but fewer) network issues with a 2.6.28.9 host. Found quite a few of these call traces in the 2.6.29.1 guests. The guest has 512MB of memory and was not all that busy (just network traffic), so I don't understand why it would fail to allocate a page...

[701453.834571] kjournald: page allocation failure. order:0, mode:0x4020
[701453.834574] Pid: 4806, comm: kjournald Not tainted 2.6.29.1 #4
[701453.834576] Call Trace:
[701453.834578] <IRQ>  [8027fa48] __alloc_pages_internal+0x3e1/0x401
[701453.834586] [802a1ad4] __slab_alloc+0x17f/0x4ca
[701453.834590] [8067e322] tcp_send_ack+0x23/0x105
[701453.834592] [8067e322] tcp_send_ack+0x23/0x105
[701453.834595] [802a2e66] __kmalloc_track_caller+0xac/0xe1
[701453.834598] [8062f97e] __alloc_skb+0x61/0x11e
[701453.834600] [8067e322] tcp_send_ack+0x23/0x105
[701453.834603] [8067c374] tcp_rcv_established+0x6c7/0x9e6
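The `mode:` field in these "page allocation failure" reports is the allocation's gfp mask. The Python sketch below splits it into flag names using a subset of the bit values as I understand them for 2.6.29-era include/linux/gfp.h; treat the table as an assumption and check it against the exact kernel headers. Both reports decode to __GFP_HIGH (what GFP_ATOMIC expanded to in kernels of that era) with no __GFP_WAIT, plus __GFP_COMP in the second. That fits the `<IRQ>` frames: allocations from interrupt/softirq context cannot sleep to reclaim memory, so they fail once the zone is short, and the Mem-Info dump shows DMA32 at free:968kB, below its min:2812kB watermark.

```python
# Subset of gfp_t bits for 2.6.29-era kernels (assumed values; verify against
# include/linux/gfp.h of the kernel that produced the report).
GFP_BITS = [
    (0x01, "__GFP_DMA"),
    (0x02, "__GFP_HIGHMEM"),
    (0x04, "__GFP_DMA32"),
    (0x10, "__GFP_WAIT"),
    (0x20, "__GFP_HIGH"),
    (0x40, "__GFP_IO"),
    (0x80, "__GFP_FS"),
    (0x100, "__GFP_COLD"),
    (0x200, "__GFP_NOWARN"),
    (0x400, "__GFP_REPEAT"),
    (0x800, "__GFP_NOFAIL"),
    (0x1000, "__GFP_NORETRY"),
    (0x4000, "__GFP_COMP"),
    (0x8000, "__GFP_ZERO"),
]
KNOWN_MASK = sum(bit for bit, _ in GFP_BITS)


def decode_gfp(mode):
    """Split a 'mode:0x...' value from an allocation failure into flag names."""
    names = [name for bit, name in GFP_BITS if mode & bit]
    if mode & ~KNOWN_MASK:
        names.append("unknown(0x%x)" % (mode & ~KNOWN_MASK))
    return names


for mode in (0x20, 0x4020):
    flags = decode_gfp(mode)
    print("mode:0x%x -> %s, may sleep/reclaim: %s"
          % (mode, " | ".join(flags), "__GFP_WAIT" in flags))
```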
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
Alex Williamson wrote: When you say multiple times, is it several, or a lot more? Maybe it is NAPI? The system would run out of the ~1000 available GSIs in a minute or two with just an e1000e available to the guest. So that's something on the order of 10/s. This also causes a printk in the host every time the interrupt is enabled, which can't help performance and gets pretty annoying for syslog. I was guessing some kind of interrupt mitigation, such as NAPI, but a qlogic FC card seems to do it too (seemingly at a slower rate). I see. And what is the path by which it is disabled? The mask bit in the MSI entry? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 16:00 +0300, Avi Kivity wrote: Alex Williamson wrote: When you say multiple times, is it several, or a lot more? Maybe it is NAPI? The system would run out of the ~1000 available GSIs in a minute or two with just an e1000e available to the guest. So that's something on the order of 10/s. This also causes a printk in the host every time the interrupt is enabled, which can't help performance and gets pretty annoying for syslog. I was guessing some kind of interrupt mitigation, such as NAPI, but a qlogic FC card seems to do it too (seemingly at a slower rate). I see. And what is the path by which it is disabled? The mask bit in the MSI entry? Yes, I believe the only path is via a write to the MSI capability in the PCI config space. Alex
Re: [PATCH 1/1] KVM: Fix potentially recursively get kvm lock
On Wed, May 13, 2009 at 10:07:54AM +0800, Yang, Sheng wrote: KVM: workaround workqueue / deassign_host_irq deadlock I think I'm running into the following deadlock in the kvm kernel module when trying to use device assignment:

CPU A                                  CPU B
kvm_vm_ioctl_deassign_dev_irq()
mutex_lock(&kvm->lock);                worker_thread()
-> kvm_deassign_irq()                  -> kvm_assigned_dev_interrupt_work_handler()
-> deassign_host_irq()                 mutex_lock(&kvm->lock);
-> cancel_work_sync() [blocked]

Work around the issue by dropping kvm->lock for cancel_work_sync(). Reported-by: Alex Williamson alex.william...@hp.com From: Sheng Yang sheng.y...@intel.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Another calling path (kvm_free_all_assigned_devices()) doesn't hold kvm->lock... It seems it needs the lock to traverse the assigned dev list? Sheng, the task executing the deassign irq ioctl has a reference to the vm instance. This solution is just temporary though, until the locks can be split; then dropping kvm->lock around cancel_work_sync will not be necessary anymore.
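A Python threading model of the deadlock in the commit message and of the workaround. `threading.Lock` stands in for the kvm->lock mutex and `Thread.join()` for cancel_work_sync(); the real cancel_work_sync() has no timeout, and the timeout here exists only so the model reports the deadlock instead of hanging. The function name is hypothetical. Dropping and retaking a lock lets other lock holders run in between, so any state read before the drop has to be revalidated afterwards.

```python
import threading


def deassign_irq(drop_lock_before_cancel, timeout=1.0):
    """Return True if 'cancel_work_sync' completes, False if it would deadlock."""
    kvm_lock = threading.Lock()                 # stands in for kvm->lock

    def interrupt_work_handler():               # the queued work item (CPU B)
        with kvm_lock:                          # it needs kvm->lock as well
            pass

    with kvm_lock:                              # ioctl path (CPU A) takes kvm->lock
        worker = threading.Thread(target=interrupt_work_handler, daemon=True)
        worker.start()
        if drop_lock_before_cancel:
            kvm_lock.release()                  # workaround: drop the lock ...
        worker.join(timeout)                    # ... around "cancel_work_sync()"
        finished = not worker.is_alive()
        if drop_lock_before_cancel:
            kvm_lock.acquire()                  # retake it before carrying on
    return finished


if __name__ == "__main__":
    print(deassign_irq(drop_lock_before_cancel=False, timeout=0.2))  # False
    print(deassign_irq(drop_lock_before_cancel=True))                # True
```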
Re: [PATCH -tip v5 0/7] tracing: kprobe-based event tracer and x86 instruction decoder
* Masami Hiramatsu mhira...@redhat.com wrote: Ingo Molnar wrote: * Masami Hiramatsu mhira...@redhat.com wrote: Hi, Here are the patches of the kprobe-based event tracer for x86, version 5, which allows you to probe various kernel events through the ftrace interface. This version supports only x86(-32/-64) (but porting it to other archs just needs kprobes/kretprobes and register and stack access APIs). This patchset also includes an x86(-64) instruction decoder which supports non-SSE/FP opcodes and includes an x86 opcode map. I think it will be possible to share this opcode map with KVM's decoder. This series can be applied on the latest linux-2.6-tip tree. This patchset includes the following changes:
- Add x86 instruction decoder [1/7]
- Check insertion point safety in kprobe [2/7]
- Cleanup fix_riprel() with insn decoder [3/7]
- Add kprobe-tracer plugin [4/7]
- Fix kernel_trap_sp() on x86 according to systemtap runtime. [5/7]
- Add arch-dep register and stack fetching functions [6/7]
- Support fetching various status (register/stack/memory/etc.) [7/7]
Future items:
- .init function tracing support.
- Support primitive types (long, ulong, int, uint, etc) for args.
Ok, this looks pretty complete already. Two high-level comments:
- There's no self-test - would it be possible to add one? See trace_selftest* in kernel/trace/
- No generic integration.
Hmm, Ingo, could you tell me what I can do for the integration? Do you mean that I should use filters? Yeah, that - and for the tracepoints to show up under /debug/tracing/events/. They'd in essence be 'flexible', dynamic event tracepoints that extend upon existing, built-in tracepoints. To user-space tools the two would show up in a very similar way and with a similar usage (once they are injected). Ingo
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Wed, May 13, 2009 at 07:11:16AM -0600, Alex Williamson wrote: On Wed, 2009-05-13 at 16:00 +0300, Avi Kivity wrote: Alex Williamson wrote: When you say multiple times, is it several, or a lot more? Maybe it is NAPI? The system would run out of the ~1000 available GSIs in a minute or two with just an e1000e available to the guest. So that's something on the order of 10/s. This also causes a printk in the host every time the interrupt is enabled, which can't help performance and gets pretty annoying for syslog. I was guessing some kind of interrupt mitigation, such as NAPI, but a qlogic FC card seems to do it too (seemingly at a slower rate). I see. And what is the path by which it is disabled? The mask bit in the MSI entry? Yes, I believe the only path is via a write to the MSI capability in the PCI config space. Alex Very surprising: I haven't seen any driver disable MSI except on the device destructor path. Is this a Linux guest? -- MST
RE: Implement generic double fault generation mechanism
That is OK. You can send two patches. The first one will WARN_ON and overwrite the exception like the current code does. And the second one will remove the WARN_ON, explaining that this case is actually possible to trigger from a guest. It sounds like you'd rather not provide this additional one, so here it is, to remove the blocking issue. My basic position is still the same as mentioned in my previous mail, but I am neutral either way. Thx, eddie

Signed-off-by: Eddie Dong eddie.d...@intel.com

Overwriting the former event may help forward progress when multiple exceptions/interrupts happen serially.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d0e75a2..b3de5d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -183,11 +183,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 	int class1, class2;
 
 	if (!vcpu->arch.exception.pending) {
-		vcpu->arch.exception.pending = true;
-		vcpu->arch.exception.has_error_code = has_error;
-		vcpu->arch.exception.nr = nr;
-		vcpu->arch.exception.error_code = error_code;
-		return;
+		goto out;
 	}
 
 	/* to check exception */
@@ -208,9 +204,15 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		vcpu->arch.exception.has_error_code = true;
 		vcpu->arch.exception.nr = DF_VECTOR;
 		vcpu->arch.exception.error_code = 0;
+		return;
 	} else
 		printk(KERN_ERR "Exception 0x%x on 0x%x happens serially\n",
 		       prev_nr, nr);
+out:
+	vcpu->arch.exception.pending = true;
+	vcpu->arch.exception.has_error_code = has_error;
+	vcpu->arch.exception.nr = nr;
+	vcpu->arch.exception.error_code = error_code;
 }
 
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
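A Python model of the decision kvm_multiple_exception() makes when a second exception arrives while one is still pending, including the "serial" case that this patch changes to overwrite the older event. The classification (contributory = #DE, #TS, #NP, #SS, #GP; #PF on its own; everything else benign) and the rule that a fault while #DF is pending means shutdown follow the Intel SDM double-fault conditions. They are not visible in the hunks, so treat this as a sketch of the intended behaviour rather than the exact KVM code. Pending state is modelled as a (vector, error_code) tuple or None; the real code uses the vcpu->arch.exception fields.

```python
DE_VECTOR, BP_VECTOR, DF_VECTOR = 0, 3, 8
TS_VECTOR, NP_VECTOR, SS_VECTOR, GP_VECTOR, PF_VECTOR = 10, 11, 12, 13, 14

CONTRIBUTORY = {DE_VECTOR, TS_VECTOR, NP_VECTOR, SS_VECTOR, GP_VECTOR}
TRIPLE_FAULT = "triple_fault"


def exception_class(vector):
    if vector in CONTRIBUTORY:
        return "contributory"
    if vector == PF_VECTOR:
        return "page_fault"
    return "benign"


def queue_exception(pending, nr, error_code=None):
    """Return the event left pending after queuing exception `nr`.

    `pending` is None or a (vector, error_code) tuple.
    """
    if pending is None:
        return (nr, error_code)          # nothing pending: just queue it
    prev_nr = pending[0]
    if prev_nr == DF_VECTOR:
        return TRIPLE_FAULT              # fault while delivering #DF: shutdown
    class1, class2 = exception_class(prev_nr), exception_class(nr)
    if (class1 == "contributory" and class2 == "contributory") or \
            (class1 == "page_fault" and class2 != "benign"):
        return (DF_VECTOR, 0)            # promote to #DF with error code 0
    # Handled serially: with this patch the newer exception overwrites the old.
    return (nr, error_code)


if __name__ == "__main__":
    print(queue_exception((GP_VECTOR, 0), NP_VECTOR, 0))  # (8, 0)
    print(queue_exception((GP_VECTOR, 0), PF_VECTOR, 2))  # (14, 2)
```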
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 16:55 +0300, Michael S. Tsirkin wrote: On Wed, May 13, 2009 at 07:11:16AM -0600, Alex Williamson wrote: On Wed, 2009-05-13 at 16:00 +0300, Avi Kivity wrote: Alex Williamson wrote: When you say multiple times, is it several, or a lot more? Maybe it is NAPI? The system would run out of the ~1000 available GSIs in a minute or two with just an e1000e available to the guest. So that's something on the order of 10/s. This also causes a printk in the host every time the interrupt is enabled, which can't help performance and gets pretty annoying for syslog. I was guessing some kind of interrupt mitigation, such as NAPI, but a qlogic FC card seems to do it too (seemingly at a slower rate). I see. And what is the path by which it is disabled? The mask bit in the MSI entry? Yes, I believe the only path is via a write to the MSI capability in the PCI config space. Alex Very surprising: I haven't seen any driver disable MSI except on the device destructor path. Is this a Linux guest? Yes, Debian 2.6.26 kernel. I'll check if it behaves the same on newer upstream kernels and try to figure out why it's doing it. Alex
RE: event injection MACROs
Avi Kivity wrote: Dong, Eddie wrote: I noticed the macros for SVM vmcb->control.event_inj and VMX VM_EXIT_INTR_INFO are almost the same. I need to query the event injection state in common code, so I plan to expose this register read/write to x86.c. Should we define a new format for evtinj/VM_EXIT_INTR_INFO as a common KVM format, or just move those original macros to kvm_host.h? This is dangerous if additional bits or field values are defined by either architecture. Better to use accessors. OK. Also, back to Gleb's question: the reason I want to do this is to simplify the event generation mechanism in current KVM. Today KVM uses an additional layer of exception/NMI/interrupt state such as vcpu->arch.exception.pending, vcpu->arch.interrupt.pending and vcpu->arch.nmi_injected. All these additional layers exist because of competition for the VM_ENTRY_INTR_INFO_FIELD write that injects the event. Both SVM and VMX have only one resource to inject a virtual event, but KVM generates 3 categories of events in parallel, which further requires additional logic to arbitrate among them. One example is that exceptions have higher priority than NMI/IRQ injection in the current code, which is not true in reality. Another issue is that a failed event from a previous injection, say an IRQ or NMI, may be discarded if a virtual exception happens in the exit handling now. With the generic double fault handling patch, this case should be handled normally. Will post an RFC soon. Thx, eddie
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Tue, May 12, 2009 at 10:41:29PM -0600, Alex Williamson wrote: + gsi_count = kvm_get_gsi_count(kvm); + /* Round up so we can search ints using ffs */ + gsi_bytes = ((gsi_count + 31) / 32) * 4; + kvm->used_gsi_bitmap = malloc(gsi_bytes); What happens on error in kvm_get_gsi_count? gsi_count will be negative... -- MST
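One way to answer this question is to make the size computation reject non-positive counts up front, which is the kind of guard v5 of the patch adds ("Fix negative gsi_count path per Michael"). A minimal sketch, using a hypothetical helper gsi_bitmap_bytes() that is not part of libkvm: it returns 0 for an error code or a zero count, so the caller can skip allocating and searching the bitmap entirely.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for a GSI bitmap built from 32-bit words.
 * A non-positive gsi_count (an error code or no routing support) yields 0,
 * meaning "do not allocate a bitmap at all". */
static size_t gsi_bitmap_bytes(int gsi_count)
{
    if (gsi_count <= 0)
        return 0;
    /* Round up to whole 32-bit words so ffs() can scan one int at a time. */
    return (size_t)((gsi_count + 31) / 32) * 4;
}
```

Without the guard, a sufficiently negative count goes wrong silently: for -100, ((-100 + 31) / 32) * 4 evaluates to -8 in C, which malloc() would receive as an enormous size_t.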
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 08:15 -0600, Alex Williamson wrote: On Wed, 2009-05-13 at 16:55 +0300, Michael S. Tsirkin wrote: On Wed, May 13, 2009 at 07:11:16AM -0600, Alex Williamson wrote: On Wed, 2009-05-13 at 16:00 +0300, Avi Kivity wrote: Alex Williamson wrote: When you say multiple times, is it several, or a lot more? Maybe it is NAPI? The system would run out of the ~1000 available GSIs in a minute or two with just an e1000e available to the guest. So that's something on the order of 10/s. This also causes a printk in the host every time the interrupt is enabled, which can't help performance and gets pretty annoying for syslog. I was guessing some kind of interrupt mitigation, such as NAPI, but a qlogic FC card seems to do it too (seemingly at a slower rate). I see. And what is the path by which it is disabled? The mask bit in the MSI entry? Yes, I believe the only path is via a write to the MSI capability in the PCI config space. Alex Very surprising: I haven't seen any driver disable MSI except on the device destructor path. Is this a Linux guest? Yes, Debian 2.6.26 kernel. I'll check if it behaves the same on newer upstream kernels and try to figure out why it's doing it. Updating the guest to 2.6.29 seems to fix the interrupt toggling. So it's either something in older kernels or something Debian introduced, but that seems unlikely. Alex
Re: [PATCH -tip v5 1/7] x86: instruction decorder API
Gleb Natapov wrote: On Wed, May 13, 2009 at 11:35:16AM +0200, Przemysław Pawełczyk wrote: On Wed, May 13, 2009 at 10:23, Gleb Natapov g...@redhat.com wrote: On Fri, May 08, 2009 at 08:48:42PM -0400, Masami Hiramatsu wrote: +++ b/arch/x86/scripts/gen-insn-attr-x86.awk @@ -0,0 +1,314 @@ +#!/bin/awk -f On some distributions (Debian) it is /usr/bin/awk. True, but on most of them (all?) there is also an appropriate link in /bin. Nope, not on Debian testing. Although I assume that if kernel compilation starts to fail, it will appear :) If a shebang could have more than one argument, then '/usr/bin/env awk -f' would be the best solution, I think. Ah, I see. Actually, it will be executed from the Makefile with 'awk -f'. --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -2,12 +2,21 @@ # Makefile for x86 specific library files. # +quiet_cmd_inat_tables = GEN $@ + cmd_inat_tables = awk -f $(srctree)/arch/x86/scripts/gen-insn-attr-x86.awk $(srctree)/arch/x86/lib/x86-opcode-map.txt > $@ + So, if awk is on the PATH, it will pass. Maybe I need to add a 'HOSTAWK = awk' line in the Makefile. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America) Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: [RFC PATCH 0/3] generic hypercall support
Anthony Liguori wrote: Gregory Haskins wrote: I specifically generalized my statement above because #1 I assume everyone here is smart enough to convert that nice round unit into the relevant figure. And #2, there are multiple potential latency sources at play which we need to factor in when looking at the big picture. For instance, the difference between a PF exit and an IO exit (2.58us on x86, to be precise). Or whether you need to take a heavy-weight exit. Or a context switch to qemu, to the kernel, back to qemu, and back to the vcpu. Or acquiring a mutex. Or getting head-of-lined on the VGA model's IO. I know you wish that this whole discussion would just go away, but these little 300ns here, 1600ns there really add up in aggregate despite your dismissive attitude towards them. And it doesn't take much to affect the results in a measurable way. As stated, each 1us costs ~4%. My motivation is to reduce as many of these sources as possible. So, yes, the delta from PIO to HC is 350ns. Yes, this is a ~1.4% improvement. So what? It's still an improvement. If that improvement were free, would you object? And we all know that this change isn't free because we have to change some code (+128/-0, to be exact). But what is it specifically you are objecting to in the first place? Adding hypercall support as a pv_ops primitive isn't exactly hard or complex, or even very much code. Where does 25us come from? The numbers you post below are 33us and 66us. This is part of what's frustrating me in this thread. Things are way too theoretical. Saying that if packet latency were 25us, then it would be a 1.4% improvement is close to misleading. [ answered in the last reply ] The numbers you've posted are also measuring on-box speeds. What really matters are off-box latencies and that's just going to exaggerate. I'm not 100% clear on what you mean by on-box vs off-box. These figures were gathered between two real machines connected via a 10GE cross-over cable.
The 5.8Gb/s and 33us (25us) values were gathered sending real data between these hosts. This sounds off-box to me, but I am not sure I truly understand your assertion. IIUC, if you switched vbus to using PIO today, you would go from 66us to 65.65us, which you'd round to 66us for on-box latencies. Even if you didn't round, it's a 0.5% improvement in latency. I think part of what you are missing is that in order to create vbus, I needed to _create_ an in-kernel hook from scratch since there were no existing methods. Since I measured HC to be superior in performance (if only by a little), I wasn't going to choose the slower way if there wasn't a reason, and at the time I didn't see one. Now after community review, perhaps we do have a reason, but that is the point of the review process. So now we can push something like iofd as a PIO hook instead. But either way, something needed to be created. Adding hypercall support as a pv_ops primitive is adding a fair bit of complexity. You need a hypercall fd mechanism to plumb this down to userspace; otherwise, you can't support migration from an in-kernel backend to a non-in-kernel backend. I respectfully disagree. This is orthogonal to the simple issue of the IO type for the exit. Where you *do* have a point is that the bigger benefit comes from in-kernel termination (like the iofd stuff I posted yesterday). However, in-kernel termination is not strictly necessary to exploit some reduction in overhead in the IO latency. In either case we know we can shave off about 2.56us from an MMIO. Since I formally measured MMIO rtt to userspace yesterday, we now know that we can do qemu-mmio in about 110k IOPS, 9.09us rtt. Switching to pv_io_ops->mmio() alone would be a boost to approximately 153k IOPS, 6.53us rtt. This would have a tangible benefit to all models without any hypercall plumbing screwing up migration. Therefore I still stand by the assertion that the hypercall discussion alone doesn't add very much complexity.
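The figures traded in this exchange follow from simple arithmetic: a saving expressed as a fraction of end-to-end latency, and a round-trip time converted to operations per second with one request outstanding (IOPS = 1 / rtt). A small sketch that reproduces them; the function names are illustrative only.

```c
/* Share of a total latency removed by saving delta_us, as a percentage. */
static double percent_saved(double delta_us, double total_us)
{
    return 100.0 * delta_us / total_us;
}

/* Operations per second with a single request in flight at the given rtt. */
static double iops_from_rtt_us(double rtt_us)
{
    return 1e6 / rtt_us;
}
```

With these, percent_saved(1.0, 25.0) gives the "each 1us costs ~4%" figure, percent_saved(0.35, 25.0) the ~1.4% and percent_saved(0.35, 66.0) the ~0.5%, while iops_from_rtt_us(9.09) is about 110k and iops_from_rtt_us(9.09 - 2.56) about 153k.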
You need some way to allocate hypercalls to particular devices, which so far has been completely ignored. I'm sorry, but that's not true. Vbus already handles this mapping. I've already mentioned why hypercalls are also unfortunate from a guest perspective. They require kernel patching and this is almost certainly going to break at least Vista as a guest. Certainly Windows 7. Yes, you have a point here. So it's not at all fair to trivialize the complexity introduced here. I'm simply asking for justification to introduce this complexity. I don't see why it is unfair for me to ask. In summary, I don't think there is really much complexity being added, because this stuff really doesn't depend on the hypercallfd (iofd) interface in order to have some benefit, as you assert above. The hypercall page is a good point for attestation, but that issue exists already today and is not a newly created issue by this proposal. As
[PATCH v5] kvm: Use a bitmap for tracking used GSIs
We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap.

Signed-off-by: Alex Williamson <alex.william...@hp.com>
---

 v2: Added mutex to protect gsi bitmap
 v3: Updated for comments from Michael Tsirkin
     No longer depends on "[PATCH] kvm: device-assignment: Catch GSI overflow"
 v4: Fix gsi_bytes calculation noted by Sheng Yang
 v5: Remove mutex per Avi
     Fix negative gsi_count path per Michael
     Remove KVM_CAP_IRQ_ROUTING per Michael, ppc should still be protected
     by the KVM_IOAPIC_NUM_PINS check

 hw/device-assignment.c  |    4 ++-
 kvm/libkvm/kvm-common.h |    3 +-
 kvm/libkvm/libkvm.c     |   74 ++-
 kvm/libkvm/libkvm.h     |   10 ++
 4 files changed, 75 insertions(+), 16 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index a7365c8..a6cc9b9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -561,8 +561,10 @@ static void free_dev_irq_entries(AssignedDevice *dev)
 {
     int i;
 
-    for (i = 0; i < dev->irq_entries_nr; i++)
+    for (i = 0; i < dev->irq_entries_nr; i++) {
         kvm_del_routing_entry(kvm_context, &dev->entry[i]);
+        kvm_free_irq_route_gsi(kvm_context, dev->entry[i].gsi);
+    }
     free(dev->entry);
     dev->entry = NULL;
     dev->irq_entries_nr = 0;
diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 591fb53..c95c591 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -67,7 +67,8 @@ struct kvm_context {
 	struct kvm_irq_routing *irq_routes;
 	int nr_allocated_irq_routes;
 #endif
-	int max_used_gsi;
+	void *used_gsi_bitmap;
+	int max_gsi;
 };
 
 int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory,
diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index ba0a5d1..74fb59b 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -61,10 +61,13 @@
 #define DPRINTF(fmt, args...) \
 	do {} while (0)
 #endif
+#define min(x,y) ((x) < (y) ? (x) : (y))
 
 int kvm_abi = EXPECTED_KVM_API_VERSION;
 int kvm_page_size;
 
+static inline void set_bit(uint32_t *buf, unsigned int bit);
+
 struct slot_info {
 	unsigned long phys_addr;
 	unsigned long len;
@@ -285,7 +288,7 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 {
 	int fd;
 	kvm_context_t kvm;
-	int r;
+	int r, gsi_count;
 
 	fd = open("/dev/kvm", O_RDWR);
 	if (fd == -1) {
@@ -323,6 +326,28 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 	kvm->no_irqchip_creation = 0;
 	kvm->no_pit_creation = 0;
 
+	gsi_count = kvm_get_gsi_count(kvm);
+	if (gsi_count > 0) {
+		int gsi_bytes, i;
+
+		/* Round up so we can search ints using ffs */
+		gsi_bytes = ((gsi_count + 31) / 32) * 4;
+		kvm->used_gsi_bitmap = malloc(gsi_bytes);
+		if (!kvm->used_gsi_bitmap)
+			goto out_close;
+		memset(kvm->used_gsi_bitmap, 0, gsi_bytes);
+		kvm->max_gsi = gsi_bytes * 8;
+
+		/* Mark all the IOAPIC pin GSIs and any over-allocated
+		 * GSIs as already in use. */
+#ifdef KVM_IOAPIC_NUM_PINS
+		for (i = 0; i < min(KVM_IOAPIC_NUM_PINS, gsi_count); i++)
+			set_bit(kvm->used_gsi_bitmap, i);
+#endif
+		for (i = gsi_count; i < kvm->max_gsi; i++)
+			set_bit(kvm->used_gsi_bitmap, i);
+	}
+
 	return kvm;
  out_close:
 	close(fd);
@@ -1298,8 +1323,6 @@ int kvm_add_routing_entry(kvm_context_t kvm,
 	new->flags = entry->flags;
 	new->u = entry->u;
 
-	if (entry->gsi > kvm->max_used_gsi)
-		kvm->max_used_gsi = entry->gsi;
 	return 0;
 #else
 	return -ENOSYS;
@@ -1404,19 +1427,42 @@ int kvm_commit_irq_routes(kvm_context_t kvm)
 #endif
 }
 
+static inline void set_bit(uint32_t *buf, unsigned int bit)
+{
+	buf[bit / 32] |= 1U << (bit % 32);
+}
+
+static inline void clear_bit(uint32_t *buf, unsigned int bit)
+{
+	buf[bit / 32] &= ~(1U << (bit % 32));
+}
+
+static int kvm_find_free_gsi(kvm_context_t kvm)
+{
+	int i, bit, gsi;
+	uint32_t *buf = kvm->used_gsi_bitmap;
+
+	for (i = 0; i < kvm->max_gsi / 32; i++) {
+		bit = ffs(~buf[i]);
+		if (!bit)
+			continue;
+
+		gsi = bit - 1 + i * 32;
+		set_bit(buf, gsi);
+		return gsi;
+	}
+
+	return -ENOSPC;
+}
+
 int kvm_get_irq_route_gsi(kvm_context_t kvm)
 {
-#ifdef KVM_CAP_IRQ_ROUTING
-	if (kvm->max_used_gsi >= KVM_IOAPIC_NUM_PINS) {
-		if (kvm->max_used_gsi <= kvm_get_gsi_count(kvm))
-			return kvm->max_used_gsi + 1;
-		else
-
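The allocator in this patch can be exercised outside libkvm. Below is a minimal standalone sketch of the same idea, assuming a fixed 64-GSI bitmap and hypothetical names (gsi_set, gsi_alloc, gsi_free). Like the patch, it applies ffs() to the inverted word to find the lowest clear bit, and, unlike the old counter, a freed GSI becomes reusable.

```c
#include <errno.h>
#include <stdint.h>
#include <strings.h>

#define NR_GSI 64   /* hypothetical bitmap size; must be a multiple of 32 */

static uint32_t gsi_bitmap[NR_GSI / 32];

static void gsi_set(unsigned int bit)   { gsi_bitmap[bit / 32] |= 1U << (bit % 32); }
static void gsi_clear(unsigned int bit) { gsi_bitmap[bit / 32] &= ~(1U << (bit % 32)); }

/* Return the lowest free GSI and mark it used, or -ENOSPC when all are taken. */
static int gsi_alloc(void)
{
    for (unsigned int i = 0; i < NR_GSI / 32; i++) {
        /* ffs() is 1-based and returns 0 when no bit is set (word full);
         * the cast reinterprets the inverted word's bit pattern as an int. */
        int bit = ffs((int)~gsi_bitmap[i]);
        if (!bit)
            continue;
        int gsi = (int)(i * 32) + bit - 1;
        gsi_set((unsigned int)gsi);
        return gsi;
    }
    return -ENOSPC;
}

/* Hand a GSI back so a later gsi_alloc() can reuse it. */
static void gsi_free(int gsi)
{
    if (gsi >= 0 && gsi < NR_GSI)
        gsi_clear((unsigned int)gsi);
}
```

Reserving the first 24 GSIs with gsi_set() (standing in for IOAPIC pins) makes the next gsi_alloc() return 24; calling gsi_free(24) and allocating again hands 24 back out instead of marching towards the routing limit.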
Re: [PATCH -tip v5 1/7] x86: instruction decorder API
On Wed, May 13, 2009 at 10:35:55AM -0400, Masami Hiramatsu wrote: Gleb Natapov wrote: On Wed, May 13, 2009 at 11:35:16AM +0200, Przemysław Pawełczyk wrote: On Wed, May 13, 2009 at 10:23, Gleb Natapov g...@redhat.com wrote: On Fri, May 08, 2009 at 08:48:42PM -0400, Masami Hiramatsu wrote: +++ b/arch/x86/scripts/gen-insn-attr-x86.awk @@ -0,0 +1,314 @@ +#!/bin/awk -f On some distributions (Debian) it is /usr/bin/awk. True, but on most of them (all?) there is also an appropriate link in /bin. Nope, not on Debian testing. Although I assume that if kernel compilation starts to fail, it will appear :) If a shebang could have more than one argument, then '/usr/bin/env awk -f' would be the best solution, I think. Ah, I see. Actually, it will be executed from the Makefile with 'awk -f'. --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -2,12 +2,21 @@ # Makefile for x86 specific library files. # +quiet_cmd_inat_tables = GEN $@ + cmd_inat_tables = awk -f $(srctree)/arch/x86/scripts/gen-insn-attr-x86.awk $(srctree)/arch/x86/lib/x86-opcode-map.txt > $@ + So, if awk is on the PATH, it will pass. Ah, that is good enough, I think. I tried to run the script manually. Maybe I need to add a 'HOSTAWK = awk' line in the Makefile. -- Gleb.
Re: Best choice for copy/clone/snapshot
Thanks for all the info. I have one follow up. On Wed, 2009-05-13 at 10:07 +0300, Avi Kivity wrote: As I install software onto a system I want to preserve its state--just the disk state---at various points so I can go back. What is the best way to do this? LVM snapshots. Read up on the 'lvcreate -s' command and option. I may have been unclear. I meant as I install software on the VM. Since some of them are running Windows, they can't do LVM. I am running LVM on my host Linux system. Or are you suggesting that I put the image files on a snapshottable partition? Over time the snapshot seems likely to accumulate a lot of original sectors that don't involve the disk image I care about. Or do you mean I should back each virtual disk with an LVM volume? That does seem cleaner; I've just been following the docs and they use regular files. They say I can't just use a raw partition, but maybe kvm-img -f qcow2 /dev/MyVolumeGroup/Volume10 ? Does that give better performance? The one drawback I see is that I'd have to really take the space I wanted, rather than having it only notionally reserved for a file. I'm not sure how growing the logical volume would interact with qcow... Ross
RE: Network I/O performance
Subject: Re: Network I/O performance Fischer, Anna wrote: I am running KVM with Fedora Core 8 on a 2.6.23 32-bit kernel. I use the tun/tap device model and the Linux bridge kernel module to connect my VM to the network. I have 2 10G Intel 82598 network devices (with the ixgbe driver) attached to my machine and I want to do packet routing in my VM (the VM has two virtual network interfaces configured). Analysing the network performance of the standard QEMU emulated NICs, I get less than 1 Gb/s of throughput on those 10G links. Surprisingly though, I don't really see CPU utilization being maxed out. This is a dual core machine, and mpstat shows me that both CPUs are about 40% idle. My VM is more or less unresponsive due to the high network processing load while the host OS still seems to be in good shape. How can I best tune this setup to achieve the best possible performance with KVM? I know there is virtio and I know there is PCI pass-through, but those models are not an option for me right now. How many cpus are assigned to the guest? If only one, then 40% idle equates to 100% of a core for the guest and 20% for housekeeping. No, the machine has a dual core CPU and I have configured the guest with 2 CPUs. So I would want to see KVM using up to 200% of CPU, ideally. There is nothing else running on that machine. If this is the case, you could try pinning the vcpu thread (info cpus from the monitor) to one core. You should then see 100%/20% cpu load distribution. wrt emulated NIC performance, I'm guessing you're not doing tcp? If you were we might do something with TSO. No, I am measuring UDP throughput performance. I have now tried using a different NIC model, and the e1000 model seems to achieve slightly better performance (CPU goes up to 110% only though). I have also been running virtio now, and while its performance with 2.6.20 was very poor too, when changing the guest kernel to 2.6.30, I get reasonable performance and higher CPU utilization (e.g. it goes up to 180-190%). I have to throttle the incoming bandwidth though, because as soon as I go over a certain threshold, CPU goes back down to 90% and throughput goes down too. I have not seen this with Xen/VMware, where I mostly managed to max out the CPU completely before throughput stopped increasing. I have also realized that when using the tun/tap configuration with a bridge, packets are replicated on all tap devices when QEMU writes packets to the tun interface. I guess this is a limitation of tun/tap, as it does not know which tap device the packet has to go to. The tap device then eventually drops packets when the destination MAC is not its own, but it still receives the packet, which causes more overhead in the system overall. I have not yet experimented much with pinning VCPU threads to cores. I will do that as well.
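The pinning suggested above is usually done with taskset, but programmatically it is a sched_setaffinity() call on the vcpu thread's ID (the thread IDs listed by info cpus in the monitor). A minimal Linux-only sketch, with pin_thread_to_cpu() and first_allowed_cpu() as hypothetical helper names; a tid of 0 means the calling thread.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Restrict thread `tid` (0 = calling thread) to a single CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_thread_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(tid, sizeof(set), &set);
}

/* Lowest CPU the thread is currently allowed to run on, or -1 on error. */
static int first_allowed_cpu(pid_t tid)
{
    cpu_set_t set;

    if (sched_getaffinity(tid, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

For the setup described here, each vcpu thread ID reported by the monitor would be passed to pin_thread_to_cpu() with a different core, leaving the I/O thread free to float.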
Re: [PATCH v5] kvm: Use a bitmap for tracking used GSIs
On Wed, May 13, 2009 at 09:13:38AM -0600, Alex Williamson wrote: @@ -323,6 +326,28 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks, kvm->no_irqchip_creation = 0; kvm->no_pit_creation = 0; + gsi_count = kvm_get_gsi_count(kvm); + if (gsi_count > 0) { + int gsi_bytes, i; + + /* Round up so we can search ints using ffs */ + gsi_bytes = ((gsi_count + 31) / 32) * 4; Let's take the ALIGN macro from linux/kernel.h? + kvm->used_gsi_bitmap = malloc(gsi_bytes); + if (!kvm->used_gsi_bitmap) + goto out_close; + memset(kvm->used_gsi_bitmap, 0, gsi_bytes); + kvm->max_gsi = gsi_bytes * 8; + + /* Mark all the IOAPIC pin GSIs and any over-allocated + * GSIs as already in use. */ Align '*'s please. +#ifdef KVM_IOAPIC_NUM_PINS I think we should just export #define KVM_IOAPIC_NUM_PINS 0 for ppc in kernel headers (or in libkvm), and get rid of this ifdef completely. Avi, agree? + for (i = 0; i < min(KVM_IOAPIC_NUM_PINS, gsi_count); i++) + set_bit(kvm->used_gsi_bitmap, i); +#endif + for (i = gsi_count; i < kvm->max_gsi; i++) + set_bit(kvm->used_gsi_bitmap, i); + } + return kvm; out_close: close(fd); -- MST
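The ALIGN suggestion replaces the open-coded rounding with a power-of-two alignment macro. A sketch assuming a local definition modeled on the kernel's (the real linux/kernel.h version goes through __ALIGN_MASK and typeof, and libkvm's own copy may be written differently); for positive counts it yields the same byte count as ((gsi_count + 31) / 32) * 4.

```c
/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Bytes for a bitmap of gsi_count bits, rounded up to whole 32-bit words. */
static int gsi_bytes_aligned(int gsi_count)
{
    return ALIGN(gsi_count, 32) / 8;
}

/* The open-coded form from the patch, kept for comparison. */
static int gsi_bytes_open_coded(int gsi_count)
{
    return ((gsi_count + 31) / 32) * 4;
}
```

Both give 4 bytes for 25 GSIs and 128 bytes for 1024; the macro version simply states the intent (round up to 32 bits) directly.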
Re: [RFC + PATCHES] Work to get KVM autotest upstream
The patches look good, but I haven't tested them yet to make sure they leave everything in a functional state (I will test them and let you know). I have a somewhat related question: how is KVM-Autotest development going to proceed after the upstream merge? Currently I have comfortable access to our repository at TLV, and on good days I push as many as 20 patches per day. Should I submit all patches to the Autotest mailing list after the merge, or are we going to work with pull requests, or some other way? Will we work with git or svn? Thanks, Michael - Original Message - From: Lucas Meneghel Rodrigues mrodr...@redhat.com To: kvm@vger.kernel.org Sent: Wednesday, May 13, 2009 4:37:40 PM (GMT+0200) Auto-Detected Subject: [RFC + PATCHES] Work to get KVM autotest upstream These are the patches I have so far related to the work to get kvm autotest in shape for upstream merge. Please note that once the patches are applied, the kvm_runtest_2 directory should be placed on a fresh svn trunk checkout to work, so there's a little bit of tweaking to get them working. That said, these haven't had enough testing. I am posting them here only in case someone wants to take a look at them. Cheers, -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies
Re: [PATCH v5] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 19:05 +0300, Michael S. Tsirkin wrote: On Wed, May 13, 2009 at 09:13:38AM -0600, Alex Williamson wrote: @@ -323,6 +326,28 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks, kvm->no_irqchip_creation = 0; kvm->no_pit_creation = 0; + gsi_count = kvm_get_gsi_count(kvm); + if (gsi_count > 0) { + int gsi_bytes, i; + + /* Round up so we can search ints using ffs */ + gsi_bytes = ((gsi_count + 31) / 32) * 4; Let's take the ALIGN macro from linux/kernel.h? It's already defined in libkvm.c; I'll just move it up in the file. There's also a BITMAP_SIZE macro next to it that looks like it can be nuked. + kvm->used_gsi_bitmap = malloc(gsi_bytes); + if (!kvm->used_gsi_bitmap) + goto out_close; + memset(kvm->used_gsi_bitmap, 0, gsi_bytes); + kvm->max_gsi = gsi_bytes * 8; + + /* Mark all the IOAPIC pin GSIs and any over-allocated + * GSIs as already in use. */ Align '*'s please. Argh, fixed. +#ifdef KVM_IOAPIC_NUM_PINS I think we should just export #define KVM_IOAPIC_NUM_PINS 0 for ppc in kernel headers (or in libkvm), and get rid of this ifdef completely. Ok, I'll add an #ifndef and make it zero in libkvm.c. It can be cleaned out further from there. Thanks, Alex
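The plan above amounts to a default definition, so that the reservation loop compiles on every architecture and simply does nothing where there are no IOAPIC pins. A sketch of that pattern, assuming a hypothetical reserve_ioapic_gsis() wrapper over a bare uint32_t bitmap; with the fallback value of 0, min(0, gsi_count) is 0 and only the padding bits past gsi_count are marked.

```c
#include <stdint.h>

#ifndef KVM_IOAPIC_NUM_PINS
#define KVM_IOAPIC_NUM_PINS 0   /* e.g. ppc: no IOAPIC pin GSIs to reserve */
#endif

#define min(x, y) ((x) < (y) ? (x) : (y))

static inline void set_bit(uint32_t *buf, unsigned int bit)
{
    buf[bit / 32] |= 1U << (bit % 32);
}

/* Mark the IOAPIC pin GSIs, and the padding beyond gsi_count, as in use.
 * max_gsi is the bitmap size in bits (a multiple of 32). */
static void reserve_ioapic_gsis(uint32_t *bitmap, int gsi_count, int max_gsi)
{
    int i;

    for (i = 0; i < min(KVM_IOAPIC_NUM_PINS, gsi_count); i++)
        set_bit(bitmap, (unsigned int)i);
    for (i = gsi_count; i < max_gsi; i++)
        set_bit(bitmap, (unsigned int)i);
}
```

On x86, where the headers supply a nonzero KVM_IOAPIC_NUM_PINS, the same function also reserves the low pin GSIs, so no #ifdef is needed at the call site.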
[PATCH 0/2] Add serial number support for virtio_blk, V2
[Resend of earlier patch: 1/2 rebased to qemu-kvm, 2/2 minor tweak] This patch allows passing of a virtio_blk drive serial number from qemu into a guest's virtio_blk driver, and provides a means to access the serial number from a guest's userspace. Equivalent functionality currently exists for IDE and SCSI; however, it is not yet implemented for virtio. Scenarios exist where guest code relies on a unique drive serial number to correctly identify the machine environment in which it exists. The following two patches implement the above: qemu-vblk-serial-2.patch, which provides the missing qemu bits to interpret a '-drive .. serial=XYZ ..' flag, and virtio_blk-serial-2.patch, which extracts this information and makes it available to guest userspace via ioctl. Attached to this patch header is a trivial example program which retrieves the serial number from guest userspace. The above patches are relative to qemu-kvm.git and 2.6.29.3 respectively. -john -- john.coo...@redhat.com

/* example: retrieve serial number from virtio block device */
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/virtio_blk.h>

#define iswhite(c) (!('!' <= (c) && (c) <= '~'))

#ifndef VBLK_GET_SN
#define VBLK_GET_SN ((unsigned int)('V' << 24 | 'B' << 16 | 'L' << 8 | 'K'))
#endif

/* get virtblk drive serial# */
int main(int ac, char **av)
{
	int fd, nb, i;
	unsigned char sn[30];
	unsigned char *p;

	sn[0] = sizeof (sn);
	if ((fd = open("/dev/vda", O_RDONLY)) < 0)
		perror("can't open device"), exit(1);
	else if ((nb = ioctl(fd, VBLK_GET_SN, sn)) < 0)
		perror("can't ioctl device"), exit(1);
	printf("returned %d bytes:\n", nb);
	for (p = sn, i = nb; 0 <= --i; ++p)
		printf("%02x%c", *p, i ? ' ' : '\t');
	for (p = sn, i = nb; 0 <= --i; ++p)
		printf("%c%s", iswhite(*p) ? '.' : *p, i ? "" : "\n");
	return (0);
}
Re: [RFC + PATCHES] Work to get KVM autotest upstream
On Wed, 2009-05-13 at 12:23 -0400, Michael Goldish wrote: The patches look good, but I haven't tested them yet to make sure they leave everything in a functional state (I will test them and let you know). Thanks Michael! I will start testing them more thoroughly today, since we finally got 0.10 in shape. I have a somewhat related question: how is KVM-Autotest development going to proceed after the upstream merge? Currently I have comfortable access to our repository at TLV, and on good days I push as many as 20 patches per day. Should I submit all patches to the Autotest mailing list after the merge, or are we going to work with pull requests, or some other way? Will we work with git or svn? Here is my plan: for people inside our team with access to the git tree, we can just pull stuff to the git tree, and periodically I can pick up the patches and send them all together to the KVM and autotest mailing lists, wait for reviews and then check them in. If you are already used to sending all your changes to the KVM mailing list, though, this would mean little or no change for you; just send an additional cc to the autotest mailing list. What do you think? Thanks, Michael - Original Message - From: Lucas Meneghel Rodrigues mrodr...@redhat.com To: kvm@vger.kernel.org Sent: Wednesday, May 13, 2009 4:37:40 PM (GMT+0200) Auto-Detected Subject: [RFC + PATCHES] Work to get KVM autotest upstream These are the patches I have so far related to the work to get kvm autotest in shape for upstream merge. Please note that once the patches are applied, the kvm_runtest_2 directory should be placed on a fresh svn trunk checkout to work, so there's a little bit of tweaking to get them working. That said, these haven't had enough testing. I am posting them here only in case someone wants to take a look at them.
Cheers, -- Lucas Meneghel Rodrigues Software Engineer (QE) Red Hat - Emerging Technologies
[PATCH 1/2] Add serial number support for virtio_blk, V2
-- john.coo...@redhat.com

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index dad4ef0..90825a8 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -25,6 +25,7 @@ typedef struct VirtIOBlock
     BlockDriverState *bs;
     VirtQueue *vq;
     void *rq;
+    char serial_str[BLOCK_SERIAL_STRLEN + 1];
 } VirtIOBlock;
 
 static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev)
@@ -285,6 +286,8 @@ static void virtio_blk_reset(VirtIODevice *vdev)
     qemu_aio_flush();
 }
 
+/* coalesce internal state, copy to pci i/o region 0
+ */
 static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIOBlock *s = to_virtio_blk(vdev);
@@ -299,11 +302,13 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
     stw_raw(&blkcfg.cylinders, cylinders);
     blkcfg.heads = heads;
     blkcfg.sectors = secs;
+    memcpy(blkcfg.serial, s->serial_str, sizeof (blkcfg.serial));
     memcpy(config, &blkcfg, sizeof(blkcfg));
 }
 
 static uint32_t virtio_blk_get_features(VirtIODevice *vdev)
 {
+    VirtIOBlock *s = to_virtio_blk(vdev);
     uint32_t features = 0;
 
     features |= (1 << VIRTIO_BLK_F_SEG_MAX);
@@ -311,6 +316,8 @@ static uint32_t virtio_blk_get_features(VirtIODevice *vdev)
 #ifdef __linux__
     features |= (1 << VIRTIO_BLK_F_SCSI);
 #endif
+    if (strcmp(s->serial_str, "0"))
+        features |= 1 << VIRTIO_BLK_F_SN;
 
     return features;
 }
@@ -353,6 +360,7 @@ void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs)
     VirtIOBlock *s;
     int cylinders, heads, secs;
     static int virtio_blk_id;
+    char *ps = drive_get_serial(bs);
 
     s = (VirtIOBlock *)virtio_init_pci(bus, "virtio-blk",
                                        PCI_VENDOR_ID_REDHAT_QUMRANET,
@@ -369,6 +377,10 @@ void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs)
     s->vdev.reset = virtio_blk_reset;
     s->bs = bs;
     s->rq = NULL;
+    if (strlen(ps))
+        strncpy(s->serial_str, ps, sizeof (s->serial_str));
+    else
+        snprintf(s->serial_str, sizeof (s->serial_str), "0");
     bs->private = &s->vdev.pci_dev;
     bdrv_guess_geometry(s->bs, &cylinders, &heads, &secs);
     bdrv_set_geometry_hint(s->bs, cylinders, heads, secs);
diff --git a/hw/virtio-blk.h b/hw/virtio-blk.h
index 5ef6c36..3229394 100644
--- a/hw/virtio-blk.h
+++ b/hw/virtio-blk.h
@@ -31,6 +31,7 @@
 #define VIRTIO_BLK_F_RO         5       /* Disk is read-only */
 #define VIRTIO_BLK_F_BLK_SIZE   6       /* Block size of disk is available*/
 #define VIRTIO_BLK_F_SCSI       7       /* Supports scsi command passthru */
+#define VIRTIO_BLK_F_SN         8       /* serial number supported */
 
 struct virtio_blk_config
 {
@@ -40,6 +41,8 @@ struct virtio_blk_config
     uint16_t cylinders;
     uint8_t heads;
     uint8_t sectors;
+    uint32_t _blk_size;    /* structure pad, currently unused */
+    uint8_t serial[BLOCK_SERIAL_STRLEN];
 } __attribute__((packed));
 
 /* These two define direction. */
diff --git a/sysemu.h b/sysemu.h
index 1f45fd6..185b4e3 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -141,6 +141,8 @@ typedef enum {
     BLOCK_ERR_STOP_ANY
 } BlockInterfaceErrorAction;
 
+#define BLOCK_SERIAL_STRLEN 20
+
 typedef struct DriveInfo {
     BlockDriverState *bdrv;
     BlockInterfaceType type;
@@ -149,7 +151,7 @@ typedef struct DriveInfo {
     int used;
     int drive_opt_idx;
     BlockInterfaceErrorAction onerror;
-    char serial[21];
+    char serial[BLOCK_SERIAL_STRLEN + 1];
 } DriveInfo;
 
 #define MAX_IDE_DEVS 2
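The qemu half of the series boils down to two decisions: copy the -drive serial= string into a fixed 20-byte config field, and advertise VIRTIO_BLK_F_SN only when a serial was given, with the string "0" acting as the "none" sentinel. A standalone sketch of that logic with hypothetical function names; unlike the patch, it zero-fills and explicitly terminates the buffer as a precaution instead of relying on zeroed device state. One consequence of the sentinel design is that a drive whose serial really is "0" would not advertise the feature.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SERIAL_STRLEN 20
#define VIRTIO_BLK_F_SN     8

/* Store the drive serial, or the "0" sentinel when none was configured. */
static void set_serial(char dst[BLOCK_SERIAL_STRLEN + 1], const char *ps)
{
    memset(dst, 0, BLOCK_SERIAL_STRLEN + 1);     /* NUL padding for the config copy */
    if (ps && strlen(ps)) {
        strncpy(dst, ps, BLOCK_SERIAL_STRLEN);
        dst[BLOCK_SERIAL_STRLEN] = '\0';         /* strncpy may not terminate */
    } else {
        snprintf(dst, BLOCK_SERIAL_STRLEN + 1, "0");
    }
}

/* Feature bits contributed by the serial: F_SN only for a real serial. */
static uint32_t serial_features(const char *serial_str)
{
    return strcmp(serial_str, "0") ? (1U << VIRTIO_BLK_F_SN) : 0;
}

/* Copy into the fixed-size config field, which need not be NUL-terminated. */
static void fill_config_serial(uint8_t cfg[BLOCK_SERIAL_STRLEN],
                               const char serial_str[BLOCK_SERIAL_STRLEN + 1])
{
    memcpy(cfg, serial_str, BLOCK_SERIAL_STRLEN);
}
```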
[PATCH 2/2] Add serial number support for virtio_blk, V2
--
john.coo...@redhat.com

 drivers/block/virtio_blk.c |   35
 include/linux/virtio_blk.h |   10
 2 files changed, 42 insertions(+), 3 deletions(-)

--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -146,12 +146,40 @@ static void do_virtblk_request(struct re
 		vblk->vq->vq_ops->kick(vblk->vq);
 }
 
+/* user passes the address of a char[] for serial# return, and has set char[0]
+ * to the array size.  copy serial# to this char[] and return number of
+ * characters copied excluding any trailing '\0' pad chars in buffer.
+ */
+static int get_virtblk_sn(struct block_device *bdev, void *buf)
+{
+	struct virtio_blk *vblk = bdev->bd_disk->private_data;
+	unsigned char serial[BLOCK_SERIAL_STRLEN];
+	unsigned char snlen;
+	int rv;
+
+	if (copy_from_user(&snlen, buf, sizeof (snlen)))
+		rv = -EFAULT;
+	else if ((rv = virtio_config_val(vblk->vdev, VIRTIO_BLK_F_SN,
+		offsetof(struct virtio_blk_config, serial), &serial)))
+		;
+	else if (copy_to_user(buf, serial,
+		snlen = min(snlen, (unsigned char)sizeof (serial))))
+		rv = -EFAULT;
+	else
+		for (rv = 0; rv < snlen; ++rv)
+			if (!serial[rv])
+				break;
+	return (rv);
+}
+
 static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
 			 unsigned cmd, unsigned long data)
 {
-	return scsi_cmd_ioctl(bdev->bd_disk->queue,
-			      bdev->bd_disk, mode, cmd,
-			      (void __user *)data);
+	if (cmd == VBLK_GET_SN)
+		return (get_virtblk_sn(bdev, (void __user *)data));
+	else
+		return scsi_cmd_ioctl(bdev->bd_disk->queue, bdev->bd_disk,
+				      mode, cmd, (void __user *)data);
 }
 
 /* We provide getgeo only to please some old bootloader/partitioning tools */
@@ -356,6 +384,7 @@ static struct virtio_device_id id_table[
 static unsigned int features[] = {
 	VIRTIO_BLK_F_BARRIER, VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX,
 	VIRTIO_BLK_F_GEOMETRY, VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
+	VIRTIO_BLK_F_SN
 };
 
 static struct virtio_driver virtio_blk = {
--- a/include/linux/virtio_blk.h
+++ b/include/linux/virtio_blk.h
@@ -15,7 +15,16 @@
 #define VIRTIO_BLK_F_GEOMETRY	4	/* Legacy geometry available  */
 #define VIRTIO_BLK_F_RO		5	/* Disk is read-only */
 #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available*/
+#define VIRTIO_BLK_F_SN		8	/* serial number supported */
 
+/* ioctl cmd to retrieve serial#
+ */
+#define VBLK_GET_SN ((unsigned int)('V' << 24 | 'B' << 16 | 'L' << 8 | 'K'))
+
+#define BLOCK_SERIAL_STRLEN	20
+
+/* mapped into pci i/o region 0
+ */
 struct virtio_blk_config
 {
 	/* The capacity (in 512-byte sectors). */
@@ -32,6 +41,7 @@ struct virtio_blk_config
 	} geometry;
 	/* block size of device (if VIRTIO_BLK_F_BLK_SIZE) */
 	__u32 blk_size;
+	__u8 serial[BLOCK_SERIAL_STRLEN];
 } __attribute__((packed));
 
 /* These two define direction. */
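To make the buffer convention of get_virtblk_sn() concrete, here is a minimal userspace sketch of the same copy-and-count logic. It is a model, not driver code: model_get_sn() is a hypothetical name, the copy_from_user()/copy_to_user() and virtio_config_val() steps are replaced by plain memory accesses, and only BLOCK_SERIAL_STRLEN and VBLK_GET_SN are taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Values copied from the patch's include/linux/virtio_blk.h hunk. */
#define VBLK_GET_SN ((unsigned int)('V' << 24 | 'B' << 16 | 'L' << 8 | 'K'))
#define BLOCK_SERIAL_STRLEN 20

/*
 * Hypothetical userspace model of get_virtblk_sn()'s buffer convention
 * (error paths and the user-copy helpers are omitted):
 *  - on entry, buf[0] holds the size of the caller's buffer;
 *  - min(buf[0], BLOCK_SERIAL_STRLEN) bytes of the serial are copied,
 *    overwriting buf[0];
 *  - the result counts the copied bytes that precede the first '\0' pad.
 */
static int model_get_sn(const unsigned char serial[BLOCK_SERIAL_STRLEN],
                        unsigned char *buf)
{
    unsigned char snlen = buf[0];
    int rv;

    if (snlen > BLOCK_SERIAL_STRLEN)
        snlen = BLOCK_SERIAL_STRLEN;
    memcpy(buf, serial, snlen);

    /* Count characters up to the first NUL pad byte within snlen. */
    for (rv = 0; rv < snlen; ++rv)
        if (!serial[rv])
            break;
    return rv;
}
```

With a hypothetical 20-byte serial "QM00001" padded with '\0', a caller announcing a 32-byte buffer gets 7 back, while a caller announcing only 4 bytes receives a truncated 4-byte copy and a return value of 4. The ioctl number packs the ASCII bytes 'V', 'B', 'L', 'K' into 0x56424C4B.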
Re: Best choice for copy/clone/snapshot
Ross Boylan wrote:
> Or do you mean I should back each virtual disk with an LVM volume?

Yes, this option is what was meant.

> That does seem cleaner; I've just been following the docs and they use regular files. They say I can't just use a raw partition, but maybe
> kvm-img -f qcow2 /dev/MyVolumeGroup/Volume10
> ?

While new versions of qcow2 have some extensions that let the last-written sector be tracked for use on device-backed partitions, the expectation is that you'll (really) just use the raw partition; qcow2 more than takes back the performance gain you get from taking your host filesystem out of the loop.

> I'm not sure how growing the logical volume would interact with qcow...

Right -- folks going this route use raw rather than qcow, so it's just a matter of resizing the partitions / filesystems within the guest.
RE: Problem doing pci passthrough of the network card without VT-d
Are you expecting this to work using the 1:1 mapping for direct device assignment? I use a similar setup (i.e. dma=none and no VT-d) but a different NIC (Intel 82598 10G) and a different driver (ixgbe). I see the same messages, but I also don't get the device to work in the guest (while it does work in the host OS). In fact I don't get any errors on the guest side, so it is hard to track down what is wrong. No I/O is happening: the guest cannot transmit or receive any packets to/from those NICs, and the interface packet counters stay at 0. I see an error in QEMU saying "invalid memtype", and it also seems to have trouble assigning IRQs. assigned_dev_enabled_msix() fails with "Invalid argument", but on the guest side I can see that MSI-X is configured properly under /proc/interrupts. I use the latest KVM 2.6.30 tree in both the host OS and the guest OS.

-----Original Message-----
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Passera, Pablo R
Sent: 12 May 2009 11:22
To: kvm@vger.kernel.org
Subject: RE: Problem doing pci passthrough of the network card without VT-d

One update on this. I disabled VT-d in the BIOS and now I am not getting the DMAR error messages in dmesg, but the board still does not work in the guest. Any help is welcome.

e1000e 0000:00:19.0: PCI INT A disabled
pci-stub 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X

Regards,
Pablo

-----Original Message-----
From: Passera, Pablo R
Sent: Tuesday, May 12, 2009 12:14 PM
To: kvm@vger.kernel.org
Subject: Problem doing pci passthrough of the network card without VT-d

Hi List,

I am having problems doing PCI passthrough of a network card without using VT-d. The card is present in the guest, but as a different model (Intel Corporation 82801I Gigabit Ethernet Controller (rev 2)), and it does not work.

The qemu command line that I used is:

./devel/bin/qemu-system-x86_64 -hda ./dm.img -m 256 -pcidevice host=00:19.0,dma=none -net none

Before running qemu I did:

echo 8086 294c > /sys/bus/pci/drivers/pci-stub/new_id
echo 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
echo 0000:00:19.0 > /sys/bus/pci/drivers/pci-stub/bind

This is the lspci -tv output:

-[0000:00]-+-00.0  Intel Corporation 82X38/X48 Express DRAM Controller
           +-01.0-[0000:01]----00.0  nVidia Corporation G80 [GeForce 8800 GTX]
           +-19.0  Intel Corporation 82566DC-2 Gigabit Network Connection
           +-1a.0  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4
           +-1a.1  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5
           +-1a.2  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6
           +-1a.7  Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2
           +-1b.0  Intel Corporation 82801I (ICH9 Family) HD Audio Controller
           +-1c.0-[0000:02]--
           +-1c.4-[0000:03]----00.0  Marvell Technology Group Ltd. 88SE6121 SATA II Controller
           +-1d.0  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1
           +-1d.1  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2
           +-1d.2  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3
           +-1d.7  Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1
           +-1e.0-[0000:04]----03.0  Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
           +-1f.0  Intel Corporation 82801IR (ICH9R) LPC Interface Controller
           +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller
           +-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller
           \-1f.5  Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller

I am getting the following error in the host dmesg:

e1000e 0000:00:19.0: PCI INT A disabled
pci-stub 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
DMAR:[DMA Read] Request device [00:19.0] fault addr baee000
DMAR:[fault reason 02] Present bit in context entry is clear
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
pci-stub 0000:00:19.0: irq 29 for MSI/MSI-X
DMAR:[DMA Read] Request device [00:19.0] fault
[PATCH v6] kvm: Use a bitmap for tracking used GSIs
We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap.

Signed-off-by: Alex Williamson <alex.william...@hp.com>
---

 v2: Added mutex to protect gsi bitmap
 v3: Updated for comments from Michael Tsirkin
     No longer depends on "[PATCH] kvm: device-assignment: Catch GSI overflow"
 v4: Fix gsi_bytes calculation noted by Sheng Yang
 v5: Remove mutex per Avi
     Fix negative gsi_count path per Michael
     Remove KVM_CAP_IRQ_ROUTING per Michael, ppc should still be protected
     by the KVM_IOAPIC_NUM_PINS check
 v6: Make use of ALIGN macro, per Michael
     Define KVM_IOAPIC_NUM_PINS if not already, per Michael
     Fix comment indent, per Michael
     Remove unused BITMAP_SIZE macro

 hw/device-assignment.c  |    4
 kvm/libkvm/kvm-common.h |    3
 kvm/libkvm/libkvm.c     |   80
 kvm/libkvm/libkvm.h     |   10
 4 files changed, 78 insertions(+), 19 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index a7365c8..a6cc9b9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -561,8 +561,10 @@ static void free_dev_irq_entries(AssignedDevice *dev)
 {
     int i;
 
-    for (i = 0; i < dev->irq_entries_nr; i++)
+    for (i = 0; i < dev->irq_entries_nr; i++) {
         kvm_del_routing_entry(kvm_context, &dev->entry[i]);
+        kvm_free_irq_route_gsi(kvm_context, dev->entry[i].gsi);
+    }
     free(dev->entry);
     dev->entry = NULL;
     dev->irq_entries_nr = 0;
diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 591fb53..c95c591 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -67,7 +67,8 @@ struct kvm_context {
 	struct kvm_irq_routing *irq_routes;
 	int nr_allocated_irq_routes;
 #endif
-	int max_used_gsi;
+	void *used_gsi_bitmap;
+	int max_gsi;
 };
 
 int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory,
diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index ba0a5d1..70857c7 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -61,10 +61,18 @@
 #define DPRINTF(fmt, args...) do {} while (0)
 #endif
 
+#define MIN(x,y) ((x) < (y) ? (x) : (y))
+#define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
+
+#ifndef KVM_IOAPIC_NUM_PINS
+#define KVM_IOAPIC_NUM_PINS 0
+#endif
 
 int kvm_abi = EXPECTED_KVM_API_VERSION;
 int kvm_page_size;
 
+static inline void set_bit(uint32_t *buf, unsigned int bit);
+
 struct slot_info {
 	unsigned long phys_addr;
 	unsigned long len;
@@ -285,7 +293,7 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 {
 	int fd;
 	kvm_context_t kvm;
-	int r;
+	int r, gsi_count;
 
 	fd = open("/dev/kvm", O_RDWR);
 	if (fd == -1) {
@@ -323,6 +331,26 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 	kvm->no_irqchip_creation = 0;
 	kvm->no_pit_creation = 0;
 
+	gsi_count = kvm_get_gsi_count(kvm);
+	if (gsi_count > 0) {
+		int gsi_bits, i;
+
+		/* Round up so we can search ints using ffs */
+		gsi_bits = ALIGN(gsi_count, 32);
+		kvm->used_gsi_bitmap = malloc(gsi_bits / 8);
+		if (!kvm->used_gsi_bitmap)
+			goto out_close;
+		memset(kvm->used_gsi_bitmap, 0, gsi_bits / 8);
+		kvm->max_gsi = gsi_bits;
+
+		/* Mark all the IOAPIC pin GSIs and any over-allocated
+		 * GSIs as already in use. */
+		for (i = 0; i < MIN(KVM_IOAPIC_NUM_PINS, gsi_count); i++)
+			set_bit(kvm->used_gsi_bitmap, i);
+		for (i = gsi_count; i < gsi_bits; i++)
+			set_bit(kvm->used_gsi_bitmap, i);
+	}
+
 	return kvm;
  out_close:
 	close(fd);
@@ -626,9 +654,6 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf)
 	return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf);
 }
 
-#define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
-#define BITMAP_SIZE(m) (ALIGN(((m)/PAGE_SIZE), sizeof(long) * 8) / 8)
-
 int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
 			      unsigned long len, void *buf, void *opaque,
 			      int (*cb)(unsigned long start, unsigned long len,
@@ -1298,8 +1323,6 @@ int kvm_add_routing_entry(kvm_context_t kvm,
 	new->flags = entry->flags;
 	new->u = entry->u;
 
-	if (entry->gsi > kvm->max_used_gsi)
-		kvm->max_used_gsi = entry->gsi;
 	return 0;
 #else
 	return -ENOSYS;
@@ -1404,19 +1427,42 @@ int kvm_commit_irq_routes(kvm_context_t kvm)
 #endif
 }
 
+static inline void set_bit(uint32_t *buf, unsigned int bit)
+{
+	buf[bit / 32] |= 1U << (bit % 32);
+}
+
+static inline void clear_bit(uint32_t
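The patch rounds the bitmap up to whole 32-bit words "so we can search ints using ffs", but the allocation routine itself is cut off above. A minimal standalone sketch of that kind of word-at-a-time search follows. The struct and function names (gsi_map, gsi_map_init, gsi_alloc, gsi_free) are hypothetical and this is not the libkvm code; only set_bit(), ALIGN() and the init-time marking of IOAPIC pins and padding mirror what the patch shows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>   /* ffs() */

#define ALIGN(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* Hypothetical standalone used-GSI bitmap (names are illustrative). */
struct gsi_map {
    uint32_t *bits;
    int max_gsi;   /* gsi_count rounded up to a multiple of 32 */
};

static void set_bit(uint32_t *buf, unsigned int bit)
{
    buf[bit / 32] |= 1U << (bit % 32);
}

static void clear_bit(uint32_t *buf, unsigned int bit)
{
    buf[bit / 32] &= ~(1U << (bit % 32));
}

/* Mark the IOAPIC pins and the padding past gsi_count as used, as the
 * patch does in kvm_init(). Returns 0 on success, -1 on failure. */
static int gsi_map_init(struct gsi_map *m, int gsi_count, int ioapic_pins)
{
    int i;

    if (gsi_count <= 0)
        return -1;
    m->max_gsi = ALIGN(gsi_count, 32);
    m->bits = calloc(m->max_gsi / 32, sizeof(uint32_t));
    if (!m->bits)
        return -1;
    for (i = 0; i < ioapic_pins && i < gsi_count; i++)
        set_bit(m->bits, i);
    for (i = gsi_count; i < m->max_gsi; i++)
        set_bit(m->bits, i);
    return 0;
}

/* Lowest free GSI: ffs() on the inverted word gives the 1-based index
 * of its lowest clear bit, or 0 if the word is full. Returns -1 when
 * every GSI is in use. */
static int gsi_alloc(struct gsi_map *m)
{
    int w;

    for (w = 0; w < m->max_gsi / 32; w++) {
        int bit = ffs((int)~m->bits[w]);
        if (bit) {
            int gsi = w * 32 + bit - 1;
            set_bit(m->bits, gsi);
            return gsi;
        }
    }
    return -1;
}

static void gsi_free(struct gsi_map *m, int gsi)
{
    assert(gsi >= 0 && gsi < m->max_gsi);
    clear_bit(m->bits, gsi);
}
```

For example, with 40 GSIs and 24 IOAPIC pins the map is padded to 64 bits, the first allocation returns GSI 24, and a freed GSI becomes available again, which a monotonically increasing max_used_gsi counter cannot offer.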
RE: Problem doing pci passthrough of the network card without VT-d
Hi Anna,

> Are you expecting this to work using the 1:1 mapping for direct device assignment?

Actually, I want to use the current qemu implementation for this. AFAIK from the code, it seems that qemu mmaps the device memory into the qemu PCI subsystem memory space. Is this correct?

> In fact I don't get any errors on the guest side, so it is hard to track what is wrong.

In the guest I am getting an error in dmesg saying "Detected Tx Unit Hang".

> I see an error in QEMU saying invalid memtype, and it also seems to have trouble assigning IRQs.

The only error I am seeing in qemu is:

assigned_dev_iomem_map: e_phys=f202 r_virt=0x7f95bca9a000 type=0 len=0002 region_num=0
BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)

Regards,
Pablo

-----Original Message-----
From: Fischer, Anna [mailto:anna.fisc...@hp.com]
Sent: Wednesday, May 13, 2009 2:22 PM
To: Passera, Pablo R
Cc: kvm@vger.kernel.org
Subject: RE: Problem doing pci passthrough of the network card without VT-d
kvm-autotest: The automation plans?
Hi Uri/Lucas,

Do you have any plans for enhancing kvm-autotest? I was looking mainly at the following 2 aspects:

(1) We have standalone migration only. Are there any plans to enhance kvm-autotest so that we can trigger migration while a workload is running? Something like this:
- Start a workload (maybe n instances of it).
- Let the test execute for some time.
- Trigger migration.
- Log into the target.
- Check if the migration is successful.
- Check if the test results are consistent.

(2) How can we run N parallel instances of a test? Will the current configuration be easily able to support it?

Please provide your thoughts on the above features.

--
Sudhir Kumar
Re: kvm-autotest: The automation plans?
sudhir kumar <smalik...@gmail.com> wrote:
> Hi Uri/Lucas,
>
> Do you have any plans for enhancing kvm-autotest? I was looking mainly at the following 2 aspects:
>
> (1) We have standalone migration only. Are there any plans to enhance kvm-autotest so that we can trigger migration while a workload is running? Something like this:
> - Start a workload (maybe n instances of it).
> - Let the test execute for some time.
> - Trigger migration.
> - Log into the target.
> - Check if the migration is successful.
> - Check if the test results are consistent.

Yes, we have plans to implement such functionality. It shouldn't be hard, but we need to give it some thought in order to implement it as elegantly as possible.

> (2) How can we run N parallel instances of a test? Will the current configuration be easily able to support it?

I currently have some experimental patches that allow running several parallel queues of tests. But what exactly do you mean by "N parallel instances of a test"? Do you mean N queues? Please provide an example so I can get a better idea.

Thanks,
Michael
Re: changing guest CD
On Mon, 2009-05-11 at 17:13 -0500, Anthony Liguori wrote:
> Stuart Jansen wrote:
>> Does KVM support changing the CD in a running guest's disc drive? I've tried to do it using the qemu monitor, but so far haven't been able to. I've seen rumor and innuendo that KVM can't change the disc in a running system, but no official confirmation yet. If KVM doesn't support changing the disc in a running system, what would be required to support it?
>
> It does via the change command. What did you try and how did it fail?

I've been using both libvirt and the raw qemu monitor with the Fedora 11 KVM RPMs. After further testing: F11 doesn't work, but F10 does. I guess it's time to see if Fedora bugzilla already has a report.

--
XML is like violence: if it doesn't solve your problem, you aren't using enough of it. - Chris Maden
Re: [KVM-AUTOTEST][PATCH] timedrift support
On Tue, 2009-05-12 at 21:07 +0800, Bear Yang wrote:
> Sorry, I forgot to attach my new patch.
>
> Bear Yang wrote:
> Hi Lucas:
>
> First, I really want to thank you for your kind and careful words and suggestions. I have now modified my scripts following your advice.
>
> 1. Added genload to timedrift, but I am not sure whether it is right to include the CVS-related information. If it is not necessary, I will remove it next time.

Yes, we can remove the CVS-related info; it just got mailed to you because I got the code from a fresh LTP CVS checkout!

> 2. Replaced os.system with utils.system
> 3. Replaced os.environ.get('HOSTNAME') with socket.gethostname()
> 4. For the snippet of code below:
>
> +if utils.system(ntp_cmd, ignore_status=True) != 0:
> +    raise error.TestFail, "NTP server has not starting correctly..."
>
> Your suggestion is "Instead of the if clause we'd put a try/except block", but I am not clear how to do it. Would you please give me some guidance on this? Sorry.

You could rewrite the above if statement in this form:

try:
    utils.system(ntp_cmd)
except:
    raise error.TestFail("NTP server has not started correctly")

Some comments:

1) The try/except block works because utils.system already throws an exception when the exit code is different from 0.

2) The form

raise error.TestFail("NTP server has not started correctly")

is preferred on the upstream project over the equivalent

raise error.TestFail, "NTP server has not started correctly"

But on kvm autotest we are adopting the latter, so don't worry and keep all the raises the way they are in your original patch. This was just a side comment.

> Another thing, about the clauses below that get the vm handle:
>
> +# get vm handle
> +vm = kvm_utils.env_get_vm(env,params.get("main_vm"))
> +if not vm:
> +    raise error.TestError, "VM object not found in environment"
> +if not vm.is_alive():
> +    raise error.TestError, "VM seems to be dead; Test requires a living VM"
>
> I agree with you on this point. I remember that somebody tried to do this before, but it seems upstream did not accept his modification.

Ok, will take a look at this. By the way, when you have an updated patch please let us know!

Thank you very much,

--
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies
Re: kvm-autotest: The automation plans?
On Wed, 2009-05-13 at 23:21 +0530, sudhir kumar wrote:
> Hi Uri/Lucas,
>
> Do you have any plans for enhancing kvm-autotest? I was looking mainly at the following 2 aspects:

Hi Sudhir, Michael has answered your two questions a lot better than I possibly could. So please keep in touch and send your ideas so we can consider implementing them in our tests!

Thank you very much,

--
Lucas Meneghel Rodrigues
Software Engineer (QE)
Red Hat - Emerging Technologies
Re: [PATCH v6] kvm: Use a bitmap for tracking used GSIs
On Wed, May 13, 2009 at 11:28:16AM -0600, Alex Williamson wrote:
> We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit. This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap.
>
> Signed-off-by: Alex Williamson <alex.william...@hp.com>

Acked-by: Michael S. Tsirkin <m...@redhat.com>

--
MST
[PATCHv5 0/3] virtio: MSI-X support
Here's the latest draft of virtio patches. This is on top of Rusty's recent virtqueue list + name patch.

Michael S. Tsirkin (3):
  virtio: find_vqs/del_vqs virtio operations
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

 drivers/block/virtio_blk.c          |    6
 drivers/char/hw_random/virtio-rng.c |    6
 drivers/char/virtio_console.c       |   26
 drivers/lguest/lguest_device.c      |   36
 drivers/net/virtio_net.c            |   45
 drivers/s390/kvm/kvm_virtio.c       |   36
 drivers/virtio/virtio_balloon.c     |   27
 drivers/virtio/virtio_pci.c         |  301
 include/linux/virtio_config.h       |   46
 include/linux/virtio_pci.h          |   10
 net/9p/trans_virtio.c               |    2
 11 files changed, 423 insertions(+), 118 deletions(-)
[PATCHv5 1/3] virtio: find_vqs/del_vqs virtio operations
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations, and updates all drivers. This is needed for MSI support, because MSI needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/block/virtio_blk.c          |    6
 drivers/char/hw_random/virtio-rng.c |    6
 drivers/char/virtio_console.c       |   26
 drivers/lguest/lguest_device.c      |   36
 drivers/net/virtio_net.c            |   45
 drivers/s390/kvm/kvm_virtio.c       |   36
 drivers/virtio/virtio_balloon.c     |   27
 drivers/virtio/virtio_pci.c         |   37
 include/linux/virtio_config.h       |   46
 net/9p/trans_virtio.c               |    2
 10 files changed, 180 insertions(+), 87 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 8f7c956..c9f5627 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -224,7 +224,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 	sg_init_table(vblk->sg, vblk->sg_elems);
 
 	/* We expect one virtqueue, for output. */
-	vblk->vq = vdev->config->find_vq(vdev, 0, blk_done, "requests");
+	vblk->vq = virtio_find_single_vq(vdev, blk_done, "requests");
 	if (IS_ERR(vblk->vq)) {
 		err = PTR_ERR(vblk->vq);
 		goto out_free_vblk;
@@ -323,7 +323,7 @@ out_put_disk:
 out_mempool:
 	mempool_destroy(vblk->pool);
 out_free_vq:
-	vdev->config->del_vq(vblk->vq);
+	vdev->config->del_vqs(vdev);
 out_free_vblk:
 	kfree(vblk);
 out:
@@ -344,7 +344,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 	blk_cleanup_queue(vblk->disk->queue);
 	put_disk(vblk->disk);
 	mempool_destroy(vblk->pool);
-	vdev->config->del_vq(vblk->vq);
+	vdev->config->del_vqs(vdev);
 	kfree(vblk);
 }
 
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 2aeafce..f2041fe 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -94,13 +94,13 @@ static int virtrng_probe(struct virtio_device *vdev)
 	int err;
 
 	/* We expect a single virtqueue. */
-	vq = vdev->config->find_vq(vdev, 0, random_recv_done, "input");
+	vq = virtio_find_single_vq(vdev, random_recv_done, "input");
 	if (IS_ERR(vq))
 		return PTR_ERR(vq);
 
 	err = hwrng_register(&virtio_hwrng);
 	if (err) {
-		vdev->config->del_vq(vq);
+		vdev->config->del_vqs(vdev);
 		return err;
 	}
 
@@ -112,7 +112,7 @@ static void virtrng_remove(struct virtio_device *vdev)
 {
 	vdev->config->reset(vdev);
 	hwrng_unregister(&virtio_hwrng);
-	vdev->config->del_vq(vq);
+	vdev->config->del_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 58684e4..c74dacf 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -188,6 +188,9 @@ static void hvc_handle_input(struct virtqueue *vq)
  * Finally we put our input buffer in the input queue, ready to receive. */
 static int __devinit virtcons_probe(struct virtio_device *dev)
 {
+	vq_callback_t *callbacks[] = { hvc_handle_input, NULL};
+	const char *names[] = { "input", "output" };
+	struct virtqueue *vqs[2];
 	int err;
 
 	vdev = dev;
@@ -199,20 +202,15 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
 		goto fail;
 	}
 
-	/* Find the input queue. */
+	/* Find the queues. */
 	/* FIXME: This is why we want to wean off hvc: we do nothing
 	 * when input comes in. */
-	in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input, "input");
-	if (IS_ERR(in_vq)) {
-		err = PTR_ERR(in_vq);
+	err = vdev->config->find_vqs(vdev, 2, vqs, callbacks, names);
+	if (err)
 		goto free;
-	}
 
-	out_vq = vdev->config->find_vq(vdev, 1, NULL, "output");
-	if (IS_ERR(out_vq)) {
-		err = PTR_ERR(out_vq);
-		goto free_in_vq;
-	}
+	in_vq = vqs[0];
+	out_vq = vqs[1];
 
 	/* Start using the new console output. */
 	virtio_cons.get_chars = get_chars;
@@ -233,17 +231,15 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
 	hvc = hvc_alloc(0, 0, &virtio_cons, PAGE_SIZE);
 	if (IS_ERR(hvc)) {
 		err = PTR_ERR(hvc);
-		goto free_out_vq;
+		goto free_vqs;
 	}
 
 	/* Register the input buffer the first time. */
 	add_inbuf();
 	return 0;
 
-free_out_vq:
-	vdev->config->del_vq(out_vq);
-free_in_vq:
-	vdev->config->del_vq(in_vq);
+free_vqs:
+	vdev->config->del_vqs(vdev);
 free:
[PATCHv5 2/3] virtio_pci: split up vp_interrupt
This reorganizes virtio-pci code in vp_interrupt slightly, so that it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_pci.c |   53
 1 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 027f13f..951e673 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
 	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_driver *drv;
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	if (drv && drv->config_changed)
+		drv->config_changed(&vp_dev->vdev);
+	return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_pci_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
 	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_pci_vq_info *info;
-	irqreturn_t ret = IRQ_NONE;
-	unsigned long flags;
 	u8 isr;
 
 	/* reading the ISR has the effect of also clearing it so it's very
@@ -187,23 +215,10 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 		return IRQ_NONE;
 
 	/* Configuration change?  Tell driver if it wants to know. */
-	if (isr & VIRTIO_PCI_ISR_CONFIG) {
-		struct virtio_driver *drv;
-		drv = container_of(vp_dev->vdev.dev.driver,
-				   struct virtio_driver, driver);
-
-		if (drv && drv->config_changed)
-			drv->config_changed(&vp_dev->vdev);
-	}
+	if (isr & VIRTIO_PCI_ISR_CONFIG)
+		vp_config_changed(irq, opaque);
 
-	spin_lock_irqsave(&vp_dev->lock, flags);
-	list_for_each_entry(info, &vp_dev->virtqueues, node) {
-		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-			ret = IRQ_HANDLED;
-	}
-	spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-	return ret;
+	return vp_vring_interrupt(irq, opaque);
 }
 
 /* the config->find_vq() implementation */
--
1.6.3.rc3.1.g830204
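After the split, vp_interrupt() remains the shared-line entry point that reads (and thereby clears) the ISR, while vp_config_changed() and vp_vring_interrupt() become standalone handlers, presumably so the follow-up MSI-X patch can attach them to separate vectors. The control flow can be modelled in userspace as below. The model_dev structure and all helper names are hypothetical stand-ins for the PCI device; only the value of VIRTIO_PCI_ISR_CONFIG (0x2) is taken from the virtio PCI header.

```c
#include <assert.h>

/* Local stand-in for the kernel's irqreturn_t values. */
enum irqreturn { IRQ_NONE = 0, IRQ_HANDLED = 1 };

#define VIRTIO_PCI_ISR_CONFIG 0x2  /* config-change bit in the ISR */

/* Hypothetical device model: an ISR byte, a "vring has work" flag and
 * counters recording which handler ran. */
struct model_dev {
    unsigned char isr;
    int vring_pending;
    int config_calls;
    int vring_calls;
};

/* Reading the ISR also clears it, as on the real device. */
static unsigned char read_isr(struct model_dev *d)
{
    unsigned char v = d->isr;
    d->isr = 0;
    return v;
}

/* Counterpart of vp_config_changed(): always reports handled. */
static enum irqreturn config_changed(struct model_dev *d)
{
    d->config_calls++;
    return IRQ_HANDLED;
}

/* Counterpart of vp_vring_interrupt(): handled only if there was work. */
static enum irqreturn vring_interrupt(struct model_dev *d)
{
    if (!d->vring_pending)
        return IRQ_NONE;
    d->vring_pending = 0;
    d->vring_calls++;
    return IRQ_HANDLED;
}

/* Same shape as vp_interrupt() after the patch. */
static enum irqreturn shared_interrupt(struct model_dev *d)
{
    unsigned char isr = read_isr(d);

    if (!isr)
        return IRQ_NONE;          /* not our interrupt */
    if (isr & VIRTIO_PCI_ISR_CONFIG)
        config_changed(d);
    return vring_interrupt(d);
}
```

As in the patch, an interrupt that only signals a configuration change still returns whatever the vring pass returns, so in this model it reports IRQ_NONE even though the config handler ran.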
[PATCHv5 3/3] virtio_pci: optional MSI-X support
This implements optional MSI-X support in virtio_pci. MSI-X is used whenever the host supports at least 2 MSI-X vectors: 1 for configuration changes and 1 for virtqueues. Per-virtqueue vectors are allocated if enough vectors are available.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_pci.c |  227
 include/linux/virtio_pci.h  |   10
 2 files changed, 217 insertions(+), 20 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 951e673..65627a4 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,6 +42,26 @@ struct virtio_pci_device
 	/* a list of queues so we can dispatch IRQs */
 	spinlock_t lock;
 	struct list_head virtqueues;
+
+	/* MSI-X support */
+	int msix_enabled;
+	int intx_enabled;
+	struct msix_entry *msix_entries;
+	/* Name strings for interrupts. This size should be enough,
+	 * and I'm too lazy to allocate each name separately. */
+	char (*msix_names)[256];
+	/* Number of available vectors */
+	unsigned msix_vectors;
+	/* Vectors allocated */
+	unsigned msix_used_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues Thus, we need at least 2 vectors for MSI. */
+enum {
+	VP_MSIX_CONFIG_VECTOR = 0,
+	VP_MSIX_VQ_VECTOR = 1,
 };
 
 struct virtio_pci_vq_info
@@ -60,6 +80,9 @@ struct virtio_pci_vq_info
 
 	/* the list node for the virtqueues list */
 	struct list_head node;
+
+	/* MSI-X vector (or none) */
+	unsigned vector;
 };
 
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
@@ -109,7 +132,8 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
 		   void *buf, unsigned len)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+	void __iomem *ioaddr = vp_dev->ioaddr +
+				VIRTIO_PCI_CONFIG(vp_dev) + offset;
 	u8 *ptr = buf;
 	int i;
 
@@ -123,7 +147,8 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
 		   const void *buf, unsigned len)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-	void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+	void __iomem *ioaddr = vp_dev->ioaddr +
+				VIRTIO_PCI_CONFIG(vp_dev) + offset;
 	const u8 *ptr = buf;
 	int i;
 
@@ -221,7 +246,121 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 	return vp_vring_interrupt(irq, opaque);
 }
 
-/* the config->find_vq() implementation */
+static void vp_free_vectors(struct virtio_device *vdev) {
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	if (vp_dev->intx_enabled) {
+		free_irq(vp_dev->pci_dev->irq, vp_dev);
+		vp_dev->intx_enabled = 0;
+	}
+
+	for (i = 0; i < vp_dev->msix_used_vectors; ++i)
+		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+	vp_dev->msix_used_vectors = 0;
+
+	if (vp_dev->msix_enabled) {
+		/* Disable the vector used for configuration */
+		iowrite16(VIRTIO_MSI_NO_VECTOR,
+			  vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+		/* Flush the write out to device */
+		ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+
+		vp_dev->msix_enabled = 0;
+		pci_disable_msix(vp_dev->pci_dev);
+	}
+}
+
+static int vp_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
+			  int *options, int noptions)
+{
+	int i;
+	for (i = 0; i < noptions; ++i)
+		if (!pci_enable_msix(dev, entries, options[i]))
+			return options[i];
+	return -EBUSY;
+}
+
+static int vp_request_vectors(struct virtio_device *vdev, unsigned max_vqs)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	const char *name = dev_name(&vp_dev->vdev.dev);
+	unsigned i, v;
+	int err = -ENOMEM;
+	/* We want at most one vector per queue and one for config changes.
+	 * Fallback to separate vectors for config and a shared for queues.
+	 * Finally fall back to regular interrupts. */
+	int options[] = { max_vqs + 1, 2 };
+	int nvectors = max(options[0], options[1]);
+
+	vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
+				       GFP_KERNEL);
+	if (!vp_dev->msix_entries)
+		goto error_entries;
+	vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
+				     GFP_KERNEL);
+	if (!vp_dev->msix_names)
+		goto error_names;
+
+	for (i = 0; i < nvectors; ++i)
kvm-85 sometimes not starting on 2.6.30-rc5
Hi,

sometimes trying to start kvm on 2.6.30-rc5 (with kvm module v85, userspace v85) fails with:

kvm_create_vm: Interrupted system call
Could not create KVM context

and the following backtrace appears in dmesg:

[ 309.546138] BUG: MAX_LOCK_DEPTH too low!
[ 309.549964] turning off the locking correctness validator.
[ 309.549964] Pid: 2833, comm: qemu-kvm Not tainted 2.6.30lb.00_01_PRE08 #1
[ 309.549964] Call Trace:
[ 309.549964] [80269aa9] __lock_acquire+0x4a9/0xb70
[ 309.549964] [802c54ef] ? mm_take_all_locks+0x2f/0x130
[ 309.549964] [8026b825] lock_acquire+0xa5/0x150
[ 309.549964] [802c55ac] ? mm_take_all_locks+0xec/0x130
[ 309.549964] [80505c96] _spin_lock_nest_lock+0x36/0x50
[ 309.549964] [802c55ac] ? mm_take_all_locks+0xec/0x130
[ 309.549964] [802c55ac] mm_take_all_locks+0xec/0x130
[ 309.549964] [802d43ab] do_mmu_notifier_register+0x7b/0x1d0
[ 309.549964] [802d451e] mmu_notifier_register+0xe/0x10
[ 309.549964] [a02a8dd9] kvm_dev_ioctl+0x189/0x2f0 [kvm]
[ 309.549964] [802f0171] vfs_ioctl+0x31/0x90
[ 309.549964] [802f03fb] do_vfs_ioctl+0x22b/0x550
[ 309.549964] [802f07a2] sys_ioctl+0x82/0xa0
[ 309.549964] [8020b442] system_call_fastpath+0x16/0x1b

It happened to me when I didn't have storage with kernel mounted. Further attempts are usually successful.

BR

nik

-- 
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava
tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz
mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
[PATCH] don't use a 32-bit bit type as offset argument.
In the call path of kvm_get_dirty_pages_log_range(), its caller kvm_get_dirty_bitmap_cb() passes the target_phys_addr_t both as start_addr and the offset. So, using int will make dirty tracking over 4G fail completely.

Of course we should be using qemu types in here, so please don't get me started on this. The whole file is wrong already ;)

Signed-off-by: Glauber Costa glom...@redhat.com
---
 qemu-kvm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index f55cee8..27c37b5 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1201,7 +1201,7 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
 /* get kvm's dirty pages bitmap and update qemu's */
 static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
                                          unsigned char *bitmap,
-                                         unsigned int offset,
+                                         unsigned long offset,
                                          unsigned long mem_size)
 {
     unsigned int i, j, n=0;
-- 
1.5.6.6
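The following is a minimal userspace sketch of why the parameter type matters. It assumes an LP64 host, where unsigned long is 64 bits. An offset just above 4 GiB loses its upper 32 bits when it passes through an unsigned int parameter. The helper names and the 4 KiB page shift are hypothetical and do not come from qemu-kvm.c.

```c
#include <assert.h>

/* Hypothetical page shift for the illustration (4 KiB pages). */
#define TOY_PAGE_SHIFT 12

/* What the callee sees with the old prototype: the caller's unsigned long
 * is converted to unsigned int, i.e. reduced modulo 2^32. */
static unsigned long offset_seen_by_old(unsigned int offset)
{
    return offset;
}

/* With the fixed prototype the full value survives (assumes a 64-bit
 * unsigned long, as on x86-64 Linux). */
static unsigned long offset_seen_by_new(unsigned long offset)
{
    return offset;
}

/* Hypothetical helper: the page number an address falls in, as it would
 * be if the offset is used as the base address of the dirty range. */
static unsigned long page_index(unsigned long addr)
{
    return addr >> TOY_PAGE_SHIFT;
}
```

Take an offset of 4 GiB + 8 KiB (0x100002000). The old prototype delivers it as 0x2000, so a page number derived from it is 2 instead of 0x100002. Every page above 4G would then be attributed to a page 4 GiB lower.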
Re: [RFC + PATCHES] Work to get KVM autotest upstream
Lucas Meneghel Rodrigues mrodr...@redhat.com wrote:
> On Wed, 2009-05-13 at 12:23 -0400, Michael Goldish wrote:
> > The patches look good, but I haven't tested them yet to make sure they leave everything in a functional state (will test them and let you know).
>
> Thanks Michael! I will start giving this more thorough testing today, since we finally got 0.10 in shape.
>
> > I have a somewhat related question: how is KVM-Autotest development going to proceed after the upstream merge? Currently I have comfortable access to our repository at TLV, and on good days I push as many as 20 patches per day. Should I submit all patches to the Autotest mailing list after the merge, or are we going to work with pull requests, or some other way? Will we work with git or svn?
>
> Here is my plan: for people inside our team, with access to the git tree, we can just pull stuff to the git tree, and on a regular basis I can pick up the patches and send them all together to the KVM and autotest mailing lists, wait for reviews and then check them in. I think it would be nice to have a 'fast' development channel like directly pulling from a git tree. If you are already used to sending all your changes to the KVM mailing list, though, this would mean little or no change for you; just add a cc to the autotest mailing list. What do you think?

So far we've kept development mostly internal in TLV, so I'm not quite used to passing my commits through the mailing list. Will this be necessary? I'm worried it might slow development to a grinding halt.

Thanks,
Michael
Re: XP smp using a lot of CPU
Hi all,

very, very interesting. I have a similar problem, but the other way round. If my XP runs up to 100% CPU usage, top on the Linux host reports only 33% CPU usage. I would expect around 50%, because I only provide one core for the guest. I already increased the process priority of qemu and the I/O priority; nothing helped. The rest of the CPU is nearly idle, and there is no excessive disk access this time :-)

Any idea what this could be?

Best regards,
Erik

Ross Boylan wrote:
> I just installed XP into a new VM, specifying -smp 2 for the machine. According to top, it's using nearly 200% of a cpu even when I'm not doing anything.
> [...]
Re: [PATCH] don't use a 32-bit bit type as offset argument.
* Glauber Costa glom...@redhat.com [2009-05-13 14:22]:
> In the call path of kvm_get_dirty_pages_log_range(), its caller kvm_get_dirty_bitmap_cb() passes the target_phys_addr_t both as start_addr and the offset. So, using int will make dirty tracking over 4G fail completely.

Does this patch fix something like 32-bit migration with 4G? Seems like it might.

> Of course we should be using qemu types in here, so please don't get me started on this. The whole file is wrong already ;)
> [...]

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com
Re: [PATCH] don't use a 32-bit bit type as offset argument.
On Wed, May 13, 2009 at 5:23 PM, Ryan Harper ry...@us.ibm.com wrote:
> * Glauber Costa glom...@redhat.com [2009-05-13 14:22]:
>> In the call path of kvm_get_dirty_pages_log_range(), its caller kvm_get_dirty_bitmap_cb() passes the target_phys_addr_t both as start_addr and the offset. So, using int will make dirty tracking over 4G fail completely.
>
> Does this patch fix something like 32-bit migration with 4G? Seems like it might.

It fixes general 4G migration. I tested a 64-bit guest on a 64-bit host, and it does not work prior to this patch.

-- 
Glauber Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.
Re: [PATCH] kvm: user: include arch specific headers from $(KERNELDIR)
On Wednesday 13 May 2009 08:32:21 Mark McLoughlin wrote:
> Currently we only include $(KERNELDIR)/include in CFLAGS, but we also have $(KERNELDIR)/arch/$(arch)/include or else we'll get mis-matched headers.

I think this is fundamentally wrong. User files should never directly access kernel headers, because those headers are post-processed in various ways to produce files that are valid in user space; for example, __user annotations are removed.

The three possible sources for kernel headers are:

/usr/include - system-provided headers, may be older than the running kernel
/lib/modules/$(uname -r)/build/usr/include - user space headers for the currently running kernel
$(KERNELDIR)/usr/include - user space headers from a configured kernel tree after 'make headers_install'

Arnd
Re: [PATCH] don't use a 32-bit bit type as offset argument.
Glauber Costa wrote:
> [...]
> Of course we should be using qemu types in here, so please don't get me started on this. The whole file is wrong already ;)

:-)

> Signed-off-by: Glauber Costa glom...@redhat.com

Good candidate for stable too.

Regards,

Anthony Liguori
Re: [PATCH v4] kvm: Use a bitmap for tracking used GSIs
On Wed, 2009-05-13 at 08:33 -0600, Alex Williamson wrote:
> On Wed, 2009-05-13 at 08:15 -0600, Alex Williamson wrote:
> > On Wed, 2009-05-13 at 16:55 +0300, Michael S. Tsirkin wrote:
> > > Very surprising: I haven't seen any driver disable MSI except on the device destructor path. Is this a linux guest?
> >
> > Yes, Debian 2.6.26 kernel. I'll check if it behaves the same on newer upstream kernels and try to figure out why it's doing it.
>
> Updating the guest to 2.6.29 seems to fix the interrupt toggling. So it's either something in older kernels or something Debian introduced, but the latter seems unlikely.

For the curious, this was fixed prior to 2.6.27-rc1 by this:

commit ce6fce4295ba727b36fdc73040e444bd1aae64cd
Author: Matthew Wilcox
Date:   Fri Jul 25 15:42:58 2008 -0600

    PCI MSI: Don't disable MSIs if the mask bit isn't supported

    David Vrabel has a device which generates an interrupt storm on the INTx pin if we disable MSI interrupts altogether. Masking interrupts is only a performance optimisation, so we can ignore the request to mask the interrupt.

It looks like, without the maskbit attribute on MSI, the default way to mask an MSI interrupt was to toggle the MSI enable bit. This was introduced in 58e0543e8f355b32f0778a18858b255adb7402ae, so its lifespan was probably 2.6.21 - 2.6.26.

Alex
Re: [PATCH 5/5] add ksm kernel shared memory driver.
On Mon, 20 Apr 2009 04:36:06 +0300 Izik Eidus iei...@redhat.com wrote:

> Ksm is a driver that allows merging identical pages between one or more applications in a way invisible to the applications that use it. Pages that are merged are marked as read-only and are COWed when any application tries to change them.
>
> Ksm is used for cases where using fork() is not suitable. One of these cases is where the pages of the application keep changing dynamically and the application cannot know in advance which pages are going to be identical.
>
> Ksm works by walking over the memory pages of the applications it scans in order to find identical pages. It uses two sorted data structures, called the stable and unstable trees, to find the identical pages efficiently.
>
> When ksm finds two identical pages, it marks them as read-only and merges them into a single page. After the pages are marked as read-only and merged into one page, Linux treats these pages as normal copy-on-write pages and will fork them when a write access happens.
>
> Ksm scans only the memory areas that were registered to be scanned by it.
>
> ...
> +	copy_user_highpage(kpage, page1, addr1, vma);
> ...

Breaks ppc64 allmodconfig because that architecture doesn't export its copy_user_page() to modules.

Architectures are inconsistent about this. x86 _does_ export it, because it bounces it to the exported copy_page().

So can I ask that you sit down and work out upon which architectures it really makes sense to offer KSM? Disallow the others in Kconfig and arrange for copy_user_highpage() to be available on the allowed architectures?

Thanks.
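The following toy userspace sketch only illustrates what "finding identical pages" involves. It sorts the pages by content and maps each duplicate to the lowest-numbered copy. It is not KSM's stable/unstable tree algorithm. It also leaves out everything that makes merging safe in the kernel: write-protecting the shared page and breaking the sharing with copy-on-write on the next store. TOY_PAGE_SIZE and all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64  /* hypothetical tiny "page" to keep the example small */

struct page_ref {
    const unsigned char *data;  /* start of the page's contents */
    size_t index;               /* original position of the page */
};

/* Order pages by content; equal pages are ordered by index so the
 * lowest-numbered copy leads each run of duplicates. */
static int cmp_page(const void *a, const void *b)
{
    const struct page_ref *pa = a;
    const struct page_ref *pb = b;
    int c = memcmp(pa->data, pb->data, TOY_PAGE_SIZE);

    if (c != 0)
        return c;
    return (pa->index > pb->index) - (pa->index < pb->index);
}

/* For each of 'npages' pages stored back to back in 'buf', set canon[i] to
 * the index of the page it could share (itself if unique). Returns the
 * number of pages that would become redundant after merging. */
static size_t find_mergeable(const unsigned char *buf, size_t npages, size_t *canon)
{
    struct page_ref *refs;
    size_t i, redundant = 0;

    if (npages == 0)
        return 0;
    refs = malloc(npages * sizeof *refs);
    assert(refs != NULL);

    for (i = 0; i < npages; i++) {
        refs[i].data = buf + i * TOY_PAGE_SIZE;
        refs[i].index = i;
    }
    qsort(refs, npages, sizeof *refs, cmp_page);

    /* Walk the sorted array: a page equal to its predecessor joins that
     * predecessor's canonical page. */
    for (i = 0; i < npages; i++) {
        if (i > 0 && memcmp(refs[i].data, refs[i - 1].data, TOY_PAGE_SIZE) == 0) {
            canon[refs[i].index] = canon[refs[i - 1].index];
            redundant++;
        } else {
            canon[refs[i].index] = refs[i].index;
        }
    }
    free(refs);
    return redundant;
}
```

Take four pages where pages 0, 2 and 3 are all zeros. Pages 2 and 3 map to page 0 and the function reports two redundant pages. In the real driver, those two pages would be freed after the mappings are switched to a single read-only copy.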
Re: [PATCH 5/5] add ksm kernel shared memory driver.
* Andrew Morton (a...@linux-foundation.org) wrote:
> Breaks ppc64 allmodconfig because that architecture doesn't export its copy_user_page() to modules.

Things like this and updating to use madvise() I think all point towards s/tristate/bool/. I don't think CONFIG_KSM=M has huge benefit.

thanks,
-chris
Re: [RFC + PATCHES] Work to get KVM autotest upstream
On Wed, May 13, 2009 at 5:21 PM, Ryan Harper ry...@us.ibm.com wrote:
> * Michael Goldish mgold...@redhat.com [2009-05-13 14:54]:
>> [...]
>> So far we've kept development mostly internal in TLV, so I'm not quite used to passing my commits through the mailing list. Will this be necessary? I'm worried it might slow development to a grinding halt.
>
> I'd definitely like to see patches to the list before committing; we do the same for kvm, qemu etc, not sure why kvm-autotest should be any different.
>
> On the other hand, it's not currently being done that way and I'm not losing any sleep over it; it's easy enough to git log and email the list if you break something or think something should be done differently.

If you have, or can have, a publicly visible git tree with your changes, you can generate pull requests from time to time. Then the maintainer's job will only be to sanitize your tree, make sure it is in overall good shape, and merge it into the main stream.
Re: [PATCH 5/5] add ksm kernel shared memory driver.
Anthony Liguori wrote:
> Chris Wright wrote:
>> * Andrew Morton (a...@linux-foundation.org) wrote:
>>> Breaks ppc64 allmodconfig because that architecture doesn't export its copy_user_page() to modules.
>>
>> Things like this and updating to use madvise() I think all point towards s/tristate/bool/. I don't think CONFIG_KSM=M has huge benefit.
>
> I agree.

I am sending the madvise patch in a second; it will move KSM away from being a module anyway...

> Regards,
>
> Anthony Liguori
Re: [PATCH 5/5] add ksm kernel shared memory driver.
Andrew Morton wrote:
> On Mon, 20 Apr 2009 04:36:06 +0300 Izik Eidus iei...@redhat.com wrote:
> [...]
> Breaks ppc64 allmodconfig because that architecture doesn't export its copy_user_page() to modules.
>
> Architectures are inconsistent about this. x86 _does_ export it, because it bounces it to the exported copy_page().
>
> So can I ask that you sit down and work out upon which architectures it really makes sense to offer KSM? Disallow the others in Kconfig and arrange for copy_user_highpage() to be available on the allowed architectures?

Hi,

Is there some way (a script) that I can run that will allow compiling this code for every possible arch?

(I don't mind allowing it just for archs that support virtualization - x86, ia64, powerpc, s390 - but is that the right thing to do?)

Thanks.
[PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable
External ACPI tables are counted twice for the RSDT size, and the load address for the first external table is in the MADT (interrupt override entries are overwritten).

Signed-off-by: Vincent Minet vinc...@vincent-minet.net
---
 kvm/bios/rombios32.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index cbd5f15..289361b 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1626,7 +1626,7 @@ void acpi_bios_init(void)
     addr = base_addr = ram_size - ACPI_DATA_SIZE;
     rsdt_addr = addr;
     rsdt = (void *)(addr);
-    rsdt_size = sizeof(*rsdt) + external_tables * 4;
+    rsdt_size = sizeof(*rsdt);
     addr += rsdt_size;
 
     fadt_addr = addr;
@@ -1787,6 +1787,7 @@ void acpi_bios_init(void)
             }
             int_override++;
             madt_size += sizeof(struct madt_int_override);
+            addr += sizeof(struct madt_int_override);
         }
         acpi_build_table_header((struct acpi_table_header *)madt,
                                 APIC, madt_size, 1);
-- 
1.6.3
Re: [PATCH] bios: Fix MADT corruption and RSDT size when using -acpitable
Vincent Minet wrote:
> External ACPI tables are counted twice for the RSDT size and the load address for the first external table is in the MADT (interrupt override entries are overwritten).
>
> Signed-off-by: Vincent Minet vinc...@vincent-minet.net

Beth, I think you had a patch attempting to address the same issue. It was a bit more involved, though. Which is the proper fix, and do they both address the same problem?

Regards,

Anthony Liguori
Re: [RFC + PATCHES] Work to get KVM autotest upstream
On Wed, May 13, 2009 at 9:19 PM, Anthony Liguori anth...@codemonkey.ws wrote:
> Glauber Costa wrote:
>> On Wed, May 13, 2009 at 5:21 PM, Ryan Harper ry...@us.ibm.com wrote:
>>> I'd definitely like to see patches to the list before committing; we do the same for kvm, qemu etc, not sure why kvm-autotest should be any different.
>>> [...]
>>
>> If you have, or can have, a publicly visible git tree with your changes, you can generate pull requests from time to time. [...]
>
> The advantage to posting non-trivial patches (beyond review) is that it helps people learn about how things are being developed and makes it easier for others to get involved. It forces a lot of the design discussions to happen on the mailing list.

+5

Note that I'm not against it by any means. I'm all for posting to mailing lists.
Re: [PATCH 5/5] add ksm kernel shared memory driver.
On Thu, May 14, 2009 at 03:15:05AM +0300, Izik Eidus wrote:
> Hi,
> Is there some way (a script) that I can run that will allow compiling this code for every possible arch?

Segher Boessenkool has a tool for building cross toolchains and the kernel at git://git.infradead.org/users/segher/buildall.git

You can save yourself some time (and pain) and use the built toolchains at: http://bakeyournoodle.com/cross

If there is any interest, I can get these toolchains hosted on a faster machine (say kernel.org).

Yours Tony
Re: [PATCHv5 3/3] virtio_pci: optional MSI-X support
Michael S. Tsirkin wrote:
> This implements optional MSI-X support in virtio_pci. MSI-X is used whenever the host supports at least 2 MSI-X vectors: 1 for configuration changes and 1 for virtqueues. Per-virtqueue vectors are allocated if enough vectors are available.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com

Acked-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori
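The fallback order described above can be sketched in userspace. The order is: one vector per virtqueue plus one for configuration changes, then two vectors (config and a shared queue vector), then the legacy INTx line. The list { max_vqs + 1, 2 } mirrors options[] in vp_request_vectors() from the patch. fake_enable_msix() is a hypothetical stand-in for pci_enable_msix() that simply succeeds when the host offers enough vectors.

```c
#include <assert.h>

/* Interrupt layouts the fallback can end up with. */
enum irq_mode {
    MODE_MSIX_PER_VQ,   /* one vector per virtqueue + one for config */
    MODE_MSIX_SHARED,   /* one vector for config, one shared by all queues */
    MODE_INTX           /* legacy line interrupt */
};

/* Hypothetical stand-in for pci_enable_msix(): returns 0 (success) only
 * if the host can provide the requested number of vectors. */
static int fake_enable_msix(int host_vectors, int requested)
{
    return requested <= host_vectors ? 0 : -1;
}

/* Try the richest layout first, then fall back, in the same order as the
 * options[] array in the patch. */
static enum irq_mode choose_irq_mode(int host_vectors, unsigned max_vqs)
{
    int options[] = { (int)max_vqs + 1, 2 };
    int i;

    assert(host_vectors >= 0);
    for (i = 0; i < (int)(sizeof options / sizeof options[0]); i++) {
        if (fake_enable_msix(host_vectors, options[i]) == 0)
            return i == 0 ? MODE_MSIX_PER_VQ : MODE_MSIX_SHARED;
    }
    return MODE_INTX;
}
```

With three virtqueues, a host offering 8 vectors gets per-queue vectors. A host offering exactly 2 gets the shared layout, and a host offering fewer falls back to INTx.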
[PATCH v4 resend 5/6] VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
Make iommu_flush_iotlb_psi() and flush_unmaps() more readable.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/intel-iommu.c |   46 +---
 1 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index 001b328..a2cbc01 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -968,30 +968,27 @@ static int __iommu_flush_iotlb(struct intel_iommu *iommu, u16 did,
 static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
 	u64 addr, unsigned int pages, int non_present_entry_flush)
 {
-	unsigned int mask;
+	int rc;
+	unsigned int mask = ilog2(__roundup_pow_of_two(pages));
 
 	BUG_ON(addr & (~VTD_PAGE_MASK));
 	BUG_ON(pages == 0);
 
-	/* Fallback to domain selective flush if no PSI support */
-	if (!cap_pgsel_inv(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-						DMA_TLB_DSI_FLUSH,
-						non_present_entry_flush);
-
 	/*
+	 * Fallback to domain selective flush if no PSI support or the size is
+	 * too big.
 	 * PSI requires page size to be 2 ^ x, and the base address is naturally
 	 * aligned to the size
 	 */
-	mask = ilog2(__roundup_pow_of_two(pages));
-	/* Fallback to domain selective flush if size is too big */
-	if (mask > cap_max_amask_val(iommu->cap))
-		return iommu->flush.flush_iotlb(iommu, did, 0, 0,
-			DMA_TLB_DSI_FLUSH, non_present_entry_flush);
-
-	return iommu->flush.flush_iotlb(iommu, did, addr, mask,
-					DMA_TLB_PSI_FLUSH,
-					non_present_entry_flush);
+	if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+		rc = iommu->flush.flush_iotlb(iommu, did, 0, 0,
+						DMA_TLB_DSI_FLUSH,
+						non_present_entry_flush);
+	else
+		rc = iommu->flush.flush_iotlb(iommu, did, addr, mask,
+						DMA_TLB_PSI_FLUSH,
+						non_present_entry_flush);
+	return rc;
 }
 
 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -2214,15 +2211,16 @@ static void flush_unmaps(void)
 		if (!iommu)
 			continue;
 
-		if (deferred_flush[i].next) {
-			iommu->flush.flush_iotlb(iommu, 0, 0, 0,
-						 DMA_TLB_GLOBAL_FLUSH, 0);
-			for (j = 0; j < deferred_flush[i].next; j++) {
-				__free_iova(&deferred_flush[i].domain[j]->iovad,
-						deferred_flush[i].iova[j]);
-			}
-			deferred_flush[i].next = 0;
+		if (!deferred_flush[i].next)
+			continue;
+
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0,
+					 DMA_TLB_GLOBAL_FLUSH, 0);
+		for (j = 0; j < deferred_flush[i].next; j++) {
+			__free_iova(&deferred_flush[i].domain[j]->iovad,
+					deferred_flush[i].iova[j]);
 		}
+		deferred_flush[i].next = 0;
 	}
 
 	list_size = 0;
-- 
1.5.6.4
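The combined condition in iommu_flush_iotlb_psi() is easiest to follow with numbers. The mask is the base-2 order of the page count rounded up to a power of two. A page-selective flush is used only if PSI is supported and that mask does not exceed the hardware's maximum address mask; otherwise the code falls back to a domain-selective flush. The sketch below is a userspace model. The two arithmetic helpers stand in for the kernel's __roundup_pow_of_two() and ilog2(), and the booleans psi_supported and max_amask model cap_pgsel_inv() and cap_max_amask_val().

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest power of two >= n (n > 0); stand-in for __roundup_pow_of_two(). */
static unsigned long roundup_pow_of_two_ul(unsigned long n)
{
    unsigned long p = 1;

    assert(n > 0);
    while (p < n)
        p <<= 1;
    return p;
}

/* floor(log2(n)) for n > 0; stand-in for ilog2(). */
static unsigned int ilog2_ul(unsigned long n)
{
    unsigned int r = 0;

    assert(n > 0);
    while (n >>= 1)
        r++;
    return r;
}

/* Address-mask order for a page-selective flush of 'pages' pages. */
static unsigned int psi_mask(unsigned int pages)
{
    return ilog2_ul(roundup_pow_of_two_ul(pages));
}

/* True if a page-selective invalidation can be used; false means a
 * domain-selective flush, mirroring the single if/else in the patch. */
static bool use_psi(bool psi_supported, unsigned int max_amask, unsigned int pages)
{
    return psi_supported && psi_mask(pages) <= max_amask;
}
```

Flushing 9 pages needs a mask of 4, because 9 rounds up to 16. With a maximum address mask of 6, up to 64 pages can be flushed selectively, while 65 pages (mask 7) fall back to a domain-selective flush.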
[PATCH v4 resend 0/6] ATS capability support for Intel IOMMU
This patch series implements Address Translation Service support for the Intel IOMMU.

A PCIe Endpoint that supports the ATS capability can request DMA address translation from the IOMMU and cache the translation itself. This can alleviate IOMMU TLB pressure and improve hardware performance in the I/O virtualization environment.

ATS is one of the PCI-SIG I/O Virtualization (IOV) Specifications. The spec can be found at: http://www.pcisig.com/specifications/iov/ats/ (it requires membership).

Changelog:
v3 -> v4
  1, coding style fixes (Grant Grundler)
  2, support the Virtual Function ATS capability

v2 -> v3
  1, throw error message if VT-d hardware detects invalid descriptor on Queued Invalidation interface (David Woodhouse)
  2, avoid using pci_find_ext_capability every time when reading ATS Invalidate Queue Depth (Matthew Wilcox)

v1 -> v2
  added 'static' prefix to a local LIST_HEAD (Andrew Morton)

Yu Zhao (6):
  PCI: support the ATS capability
  PCI: handle Virtual Function ATS enabling
  VT-d: parse ATSR in DMA Remapping Reporting Structure
  VT-d: add device IOTLB invalidation support
  VT-d: cleanup iommu_flush_iotlb_psi and flush_unmaps
  VT-d: support the device IOTLB

 drivers/pci/dmar.c          |  189 +++---
 drivers/pci/intel-iommu.c   |  140 ++--
 drivers/pci/iov.c           |  155 ++--
 drivers/pci/pci.h           |   39 +
 include/linux/dmar.h        |    9 ++
 include/linux/intel-iommu.h |   16 -
 include/linux/pci.h         |    2 +
 include/linux/pci_regs.h    |   10 +++
 8 files changed, 515 insertions(+), 45 deletions(-)
[PATCH v4 resend 1/6] PCI: support the ATS capability
The PCIe ATS capability lets an Endpoint request DMA address translations from the IOMMU and cache them on the device side, which alleviates IOMMU pressure and improves hardware performance in I/O virtualization environments.

Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/iov.c        |  105
 drivers/pci/pci.h        |   37
 include/linux/pci.h      |    2
 include/linux/pci_regs.h |   10
 4 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index b497daa..0a7a1b4 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -5,6 +5,7 @@
  *
  * PCI Express I/O Virtualization (IOV) support.
  *   Single Root IOV 1.0
+ *   Address Translation Service 1.0
  */
 
 #include <linux/pci.h>
@@ -679,3 +680,107 @@ irqreturn_t pci_sriov_migration(struct pci_dev *dev)
 	return sriov_migration(dev) ? IRQ_HANDLED : IRQ_NONE;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_migration);
+
+static int ats_alloc_one(struct pci_dev *dev, int ps)
+{
+	int pos;
+	u16 cap;
+	struct pci_ats *ats;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return -ENODEV;
+
+	ats = kzalloc(sizeof(*ats), GFP_KERNEL);
+	if (!ats)
+		return -ENOMEM;
+
+	ats->pos = pos;
+	ats->stu = ps;
+	pci_read_config_word(dev, pos + PCI_ATS_CAP, &cap);
+	ats->qdep = PCI_ATS_CAP_QDEP(cap) ? PCI_ATS_CAP_QDEP(cap) :
+					    PCI_ATS_MAX_QDEP;
+	dev->ats = ats;
+
+	return 0;
+}
+
+static void ats_free_one(struct pci_dev *dev)
+{
+	kfree(dev->ats);
+	dev->ats = NULL;
+}
+
+/**
+ * pci_enable_ats - enable the ATS capability
+ * @dev: the PCI device
+ * @ps: the IOMMU page shift
+ *
+ * Returns 0 on success, or negative on failure.
+ */
+int pci_enable_ats(struct pci_dev *dev, int ps)
+{
+	int rc;
+	u16 ctrl;
+
+	BUG_ON(dev->ats);
+
+	if (ps < PCI_ATS_MIN_STU)
+		return -EINVAL;
+
+	rc = ats_alloc_one(dev, ps);
+	if (rc)
+		return rc;
+
+	ctrl = PCI_ATS_CTRL_ENABLE;
+	ctrl |= PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU);
+	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
+
+	return 0;
+}
+
+/**
+ * pci_disable_ats - disable the ATS capability
+ * @dev: the PCI device
+ */
+void pci_disable_ats(struct pci_dev *dev)
+{
+	u16 ctrl;
+
+	BUG_ON(!dev->ats);
+
+	pci_read_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, &ctrl);
+	ctrl &= ~PCI_ATS_CTRL_ENABLE;
+	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
+
+	ats_free_one(dev);
+}
+
+/**
+ * pci_ats_queue_depth - query the ATS Invalidate Queue Depth
+ * @dev: the PCI device
+ *
+ * Returns the queue depth on success, or negative on failure.
+ *
+ * The ATS spec uses 0 in the Invalidate Queue Depth field to
+ * indicate that the function can accept 32 Invalidate Request.
+ * But here we use the `real' values (i.e. 1~32) for the Queue
+ * Depth.
+ */
+int pci_ats_queue_depth(struct pci_dev *dev)
+{
+	int pos;
+	u16 cap;
+
+	if (dev->ats)
+		return dev->ats->qdep;
+
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ATS);
+	if (!pos)
+		return -ENODEV;
+
+	pci_read_config_word(dev, pos + PCI_ATS_CAP, &cap);
+
+	return PCI_ATS_CAP_QDEP(cap) ? PCI_ATS_CAP_QDEP(cap) :
+				       PCI_ATS_MAX_QDEP;
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d03f6b9..3c2ec64 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -229,6 +229,13 @@ struct pci_sriov {
 	u8 __iomem *mstate;	/* VF Migration State Array */
 };
 
+/* Address Translation Service */
+struct pci_ats {
+	int pos;	/* capability position */
+	int stu;	/* Smallest Translation Unit */
+	int qdep;	/* Invalidate Queue Depth */
+};
+
 #ifdef CONFIG_PCI_IOV
 extern int pci_iov_init(struct pci_dev *dev);
 extern void pci_iov_release(struct pci_dev *dev);
@@ -236,6 +243,20 @@ extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
 				enum pci_bar_type *type);
 extern void pci_restore_iov_state(struct pci_dev *dev);
 extern int pci_iov_bus_range(struct pci_bus *bus);
+
+extern int pci_enable_ats(struct pci_dev *dev, int ps);
+extern void pci_disable_ats(struct pci_dev *dev);
+extern int pci_ats_queue_depth(struct pci_dev *dev);
+/**
+ * pci_ats_enabled - query the ATS status
+ * @dev: the PCI device
+ *
+ * Returns 1 if ATS capability is enabled, or 0 if not.
+ */
+static inline int pci_ats_enabled(struct pci_dev *dev)
+{
+	return !!dev->ats;
+}
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
 {
@@ -257,6 +278,22 @@ static inline int pci_iov_bus_range(struct pci_bus *bus)
 {
 	return 0;
 }
+
+static inline int
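To make the two field encodings in this patch concrete, here is a small userspace sketch (not kernel code) of how pci_ats_queue_depth() decodes the Invalidate Queue Depth and how pci_enable_ats() builds the ATS Control value from the IOMMU page shift. The macro values are assumptions based on the PCIe ATS register layout, since the include/linux/pci_regs.h hunk is not reproduced in this archive; ats_qdep() and ats_ctrl() are hypothetical helper names.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed encodings, following the PCIe ATS capability layout; the
 * patch's pci_regs.h hunk is not shown here. */
#define PCI_ATS_CAP_QDEP(x)	((x) & 0x1f)	/* Invalidate Queue Depth, bits 4:0 */
#define PCI_ATS_MAX_QDEP	32		/* what a raw 0 stands for */
#define PCI_ATS_CTRL_ENABLE	0x8000		/* Enable, bit 15 */
#define PCI_ATS_CTRL_STU(x)	((x) & 0x1f)	/* Smallest Translation Unit, bits 4:0 */
#define PCI_ATS_MIN_STU		12		/* STU 0 means 4 KiB, i.e. 1 << 12 */

/* Hypothetical helper: decode the ATS Capability register the way
 * pci_ats_queue_depth() does, mapping a raw 0 to 32. */
static int ats_qdep(uint16_t cap)
{
	return PCI_ATS_CAP_QDEP(cap) ? PCI_ATS_CAP_QDEP(cap) : PCI_ATS_MAX_QDEP;
}

/* Hypothetical helper: compute the ATS Control value pci_enable_ats()
 * writes for IOMMU page shift ps, or -EINVAL if ps is below 4 KiB. */
static int ats_ctrl(int ps)
{
	if (ps < PCI_ATS_MIN_STU)
		return -EINVAL;
	return PCI_ATS_CTRL_ENABLE | PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU);
}
```

Under these assumptions, a raw queue-depth field of 0 decodes to 32, and a 2 MiB IOMMU page (ps = 21) is programmed as Enable plus STU 9, i.e. 0x8009.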
[PATCH v4 resend 3/6] VT-d: parse ATSR in DMA Remapping Reporting Structure
Parse the Root Port ATS Capability Reporting Structure in the DMA Remapping Reporting Structure ACPI table.

Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/dmar.c          |  112
 include/linux/dmar.h        |    9
 include/linux/intel-iommu.h |    1
 3 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index fa3a113..eaa405f 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -267,6 +267,84 @@ rmrr_parse_dev(struct dmar_rmrr_unit *rmrru)
 	}
 	return ret;
 }
+
+static LIST_HEAD(dmar_atsr_units);
+
+static int __init dmar_parse_one_atsr(struct acpi_dmar_header *hdr)
+{
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	atsr = container_of(hdr, struct acpi_dmar_atsr, header);
+	atsru = kzalloc(sizeof(*atsru), GFP_KERNEL);
+	if (!atsru)
+		return -ENOMEM;
+
+	atsru->hdr = hdr;
+	atsru->include_all = atsr->flags & 0x1;
+
+	list_add(&atsru->list, &dmar_atsr_units);
+
+	return 0;
+}
+
+static int __init atsr_parse_dev(struct dmar_atsr_unit *atsru)
+{
+	int rc;
+	struct acpi_dmar_atsr *atsr;
+
+	if (atsru->include_all)
+		return 0;
+
+	atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+	rc = dmar_parse_dev_scope((void *)(atsr + 1),
+				(void *)atsr + atsr->header.length,
+				&atsru->devices_cnt, &atsru->devices,
+				atsr->segment);
+	if (rc || !atsru->devices_cnt) {
+		list_del(&atsru->list);
+		kfree(atsru);
+	}
+
+	return rc;
+}
+
+int dmar_find_matched_atsr_unit(struct pci_dev *dev)
+{
+	int i;
+	struct pci_bus *bus;
+	struct acpi_dmar_atsr *atsr;
+	struct dmar_atsr_unit *atsru;
+
+	list_for_each_entry(atsru, &dmar_atsr_units, list) {
+		atsr = container_of(atsru->hdr, struct acpi_dmar_atsr, header);
+		if (atsr->segment == pci_domain_nr(dev->bus))
+			goto found;
+	}
+
+	return 0;
+
+found:
+	for (bus = dev->bus; bus; bus = bus->parent) {
+		struct pci_dev *bridge = bus->self;
+
+		if (!bridge || !bridge->is_pcie ||
+		    bridge->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+			return 0;
+
+		if (bridge->pcie_type == PCI_EXP_TYPE_ROOT_PORT) {
+			for (i = 0; i < atsru->devices_cnt; i++)
+				if (atsru->devices[i] == bridge)
+					return 1;
+			break;
+		}
+	}
+
+	if (atsru->include_all)
+		return 1;
+
+	return 0;
+}
 #endif
 
 static void __init
@@ -274,22 +352,28 @@ dmar_table_print_dmar_entry(struct acpi_dmar_header *header)
 {
 	struct acpi_dmar_hardware_unit *drhd;
 	struct acpi_dmar_reserved_memory *rmrr;
+	struct acpi_dmar_atsr *atsr;
 
 	switch (header->type) {
 	case ACPI_DMAR_TYPE_HARDWARE_UNIT:
-		drhd = (struct acpi_dmar_hardware_unit *)header;
+		drhd = container_of(header, struct acpi_dmar_hardware_unit,
+				    header);
 		printk (KERN_INFO PREFIX
-			"DRHD (flags: 0x%08x)base: 0x%016Lx\n",
-			drhd->flags, (unsigned long long)drhd->address);
+			"DRHD base: %#016Lx flags: %#x\n",
+			(unsigned long long)drhd->address, drhd->flags);
 		break;
 	case ACPI_DMAR_TYPE_RESERVED_MEMORY:
-		rmrr = (struct acpi_dmar_reserved_memory *)header;
-
+		rmrr = container_of(header, struct acpi_dmar_reserved_memory,
+				    header);
 		printk (KERN_INFO PREFIX
-			"RMRR base: 0x%016Lx end: 0x%016Lx\n",
+			"RMRR base: %#016Lx end: %#016Lx\n",
 			(unsigned long long)rmrr->base_address,
 			(unsigned long long)rmrr->end_address);
 		break;
+	case ACPI_DMAR_TYPE_ATSR:
+		atsr = container_of(header, struct acpi_dmar_atsr, header);
+		printk(KERN_INFO PREFIX "ATSR flags: %#x\n", atsr->flags);
+		break;
 	}
 }
 
@@ -363,6 +447,11 @@ parse_dmar_table(void)
 			ret = dmar_parse_one_rmrr(entry_header);
 #endif
 			break;
+		case ACPI_DMAR_TYPE_ATSR:
+#ifdef CONFIG_DMAR
+			ret = dmar_parse_one_atsr(entry_header);
+#endif
+			break;
 		default:
 			printk(KERN_WARNING PREFIX
 				"Unknown DMAR structure type\n");
@@ -431,11 +520,19 @@ int __init dmar_dev_scope_init(void)
 #ifdef CONFIG_DMAR
 	{
 		struct
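The root-port walk in dmar_find_matched_atsr_unit() can be modeled outside the kernel as below. This is a sketch under simplifying assumptions: toy_dev and toy_atsr are hypothetical stand-ins for struct pci_dev and struct dmar_atsr_unit, each device points directly at the bridge above it (what bus->self gives in the patch), and the PCI segment lookup that precedes the walk is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct pci_dev / dmar_atsr_unit. */
enum toy_type { TOY_ENDPOINT, TOY_SWITCH_PORT, TOY_PCIE_TO_PCI_BRIDGE, TOY_ROOT_PORT };

struct toy_dev {
	enum toy_type type;
	bool is_pcie;
	struct toy_dev *upstream;	/* bridge above this device; NULL on a root bus */
};

struct toy_atsr {
	bool include_all;		/* ATSR flags bit 0 */
	struct toy_dev **ports;		/* Root Ports listed in the device scope */
	int nports;
};

/* Returns true if this ATSR (assumed to match dev's segment) allows ATS for dev. */
static bool toy_atsr_covers(const struct toy_atsr *u, const struct toy_dev *dev)
{
	const struct toy_dev *b;
	int i;

	for (b = dev->upstream; ; b = b->upstream) {
		/* Root bus reached without a Root Port, a non-PCIe bridge,
		 * or a PCIe-to-PCI bridge on the path: no ATS. */
		if (!b || !b->is_pcie || b->type == TOY_PCIE_TO_PCI_BRIDGE)
			return false;
		if (b->type == TOY_ROOT_PORT)
			break;
	}

	/* Root Port found: it must be listed, unless the ATSR covers all ports. */
	for (i = 0; i < u->nports; i++)
		if (u->ports[i] == b)
			return true;

	return u->include_all;
}
```

The sketch keeps the asymmetry of the patch: a conventional PCI bridge on the path, or a device reachable without passing through a Root Port, always yields "no", while a Root Port missing from the device scope still qualifies when flags bit 0 (include_all) is set.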
[PATCH v4 resend 2/6] PCI: handle Virtual Function ATS enabling
The SR-IOV spec requires that the Smallest Translation Unit and the Invalidate Queue Depth fields in the Virtual Function ATS capability be hardwired to 0. So if a function is a Virtual Function, set its Physical Function's STU before enabling ATS on it.

Signed-off-by: Yu Zhao <yu.z...@intel.com>
---
 drivers/pci/iov.c |   66
 drivers/pci/pci.h |    4
 2 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 0a7a1b4..4151404 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -491,10 +491,10 @@ found:
 
 	if (pdev)
 		iov->dev = pci_dev_get(pdev);
-	else {
+	else
 		iov->dev = dev;
-		mutex_init(&iov->lock);
-	}
+
+	mutex_init(&iov->lock);
 
 	dev->sriov = iov;
 	dev->is_physfn = 1;
@@ -514,11 +514,11 @@ static void sriov_release(struct pci_dev *dev)
 {
 	BUG_ON(dev->sriov->nr_virtfn);
 
-	if (dev == dev->sriov->dev)
-		mutex_destroy(&dev->sriov->lock);
-	else
+	if (dev != dev->sriov->dev)
 		pci_dev_put(dev->sriov->dev);
 
+	mutex_destroy(&dev->sriov->lock);
+
 	kfree(dev->sriov);
 	dev->sriov = NULL;
 }
@@ -723,19 +723,40 @@ int pci_enable_ats(struct pci_dev *dev, int ps)
 	int rc;
 	u16 ctrl;
 
-	BUG_ON(dev->ats);
+	BUG_ON(dev->ats && dev->ats->is_enabled);
 
 	if (ps < PCI_ATS_MIN_STU)
 		return -EINVAL;
 
-	rc = ats_alloc_one(dev, ps);
-	if (rc)
-		return rc;
+	if (dev->is_physfn || dev->is_virtfn) {
+		struct pci_dev *pdev = dev->is_physfn ? dev : dev->physfn;
+
+		mutex_lock(&pdev->sriov->lock);
+		if (pdev->ats)
+			rc = pdev->ats->stu == ps ? 0 : -EINVAL;
+		else
+			rc = ats_alloc_one(pdev, ps);
+
+		if (!rc)
+			pdev->ats->ref_cnt++;
+		mutex_unlock(&pdev->sriov->lock);
+		if (rc)
+			return rc;
+	}
+
+	if (!dev->is_physfn) {
+		rc = ats_alloc_one(dev, ps);
+		if (rc)
+			return rc;
+	}
 
 	ctrl = PCI_ATS_CTRL_ENABLE;
-	ctrl |= PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU);
+	if (!dev->is_virtfn)
+		ctrl |= PCI_ATS_CTRL_STU(ps - PCI_ATS_MIN_STU);
 	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
 
+	dev->ats->is_enabled = 1;
+
 	return 0;
 }
 
@@ -747,13 +768,26 @@ void pci_disable_ats(struct pci_dev *dev)
 {
 	u16 ctrl;
 
-	BUG_ON(!dev->ats);
+	BUG_ON(!dev->ats || !dev->ats->is_enabled);
 
 	pci_read_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, &ctrl);
 	ctrl &= ~PCI_ATS_CTRL_ENABLE;
 	pci_write_config_word(dev, dev->ats->pos + PCI_ATS_CTRL, ctrl);
 
-	ats_free_one(dev);
+	dev->ats->is_enabled = 0;
+
+	if (dev->is_physfn || dev->is_virtfn) {
+		struct pci_dev *pdev = dev->is_physfn ? dev : dev->physfn;
+
+		mutex_lock(&pdev->sriov->lock);
+		pdev->ats->ref_cnt--;
+		if (!pdev->ats->ref_cnt)
+			ats_free_one(pdev);
+		mutex_unlock(&pdev->sriov->lock);
+	}
+
+	if (!dev->is_physfn)
+		ats_free_one(dev);
 }
 
 /**
@@ -765,13 +799,17 @@ void pci_disable_ats(struct pci_dev *dev)
  * The ATS spec uses 0 in the Invalidate Queue Depth field to
  * indicate that the function can accept 32 Invalidate Request.
  * But here we use the `real' values (i.e. 1~32) for the Queue
- * Depth.
+ * Depth; and 0 indicates the function shares the Queue with
+ * other functions (doesn't exclusively own a Queue).
  */
 int pci_ats_queue_depth(struct pci_dev *dev)
 {
 	int pos;
 	u16 cap;
 
+	if (dev->is_virtfn)
+		return 0;
+
 	if (dev->ats)
 		return dev->ats->qdep;
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3c2ec64..f73bcbe 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -234,6 +234,8 @@ struct pci_ats {
 	int pos;	/* capability position */
 	int stu;	/* Smallest Translation Unit */
 	int qdep;	/* Invalidate Queue Depth */
+	int ref_cnt;	/* Physical Function reference count */
+	int is_enabled:1;	/* Enable bit is set */
 };
 
 #ifdef CONFIG_PCI_IOV
@@ -255,7 +257,7 @@ extern int pci_ats_queue_depth(struct pci_dev *dev);
  */
 static inline int pci_ats_enabled(struct pci_dev *dev)
 {
-	return !!dev->ats;
+	return dev->ats && dev->ats->is_enabled;
 }
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
-- 
1.5.6.4
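The Physical Function bookkeeping this patch introduces (a shared struct pci_ats on the PF, an STU recorded by whichever function enables first, and a reference count protected by the PF's sriov->lock) can be sketched in userspace as follows. toy_pf, toy_ats, pf_ats_get() and pf_ats_put() are hypothetical names; a pthread mutex stands in for the kernel mutex, and the per-VF struct pci_ats and the is_enabled bit are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model: only the fields the PF refcounting needs. */
struct toy_ats {
	int stu;		/* page shift recorded for the PF */
	int ref_cnt;		/* PF and VFs currently holding ATS enabled */
};

struct toy_pf {
	pthread_mutex_t lock;	/* stands in for pdev->sriov->lock */
	struct toy_ats *ats;	/* shared state, NULL until the first enable */
};

/* Take a reference for the PF itself or one of its VFs. */
static int pf_ats_get(struct toy_pf *pf, int ps)
{
	int rc = 0;

	pthread_mutex_lock(&pf->lock);
	if (pf->ats) {
		/* VF STU is hardwired to 0, so every user must match the PF. */
		if (pf->ats->stu != ps)
			rc = -EINVAL;
	} else {
		pf->ats = calloc(1, sizeof(*pf->ats));
		if (!pf->ats)
			rc = -ENOMEM;
		else
			pf->ats->stu = ps;
	}
	if (!rc)
		pf->ats->ref_cnt++;
	pthread_mutex_unlock(&pf->lock);
	return rc;
}

/* Drop a reference; the last user frees the shared state. */
static void pf_ats_put(struct toy_pf *pf)
{
	pthread_mutex_lock(&pf->lock);
	assert(pf->ats && pf->ats->ref_cnt > 0);	/* mirrors the BUG_ON checks */
	if (--pf->ats->ref_cnt == 0) {
		free(pf->ats);
		pf->ats = NULL;
	}
	pthread_mutex_unlock(&pf->lock);
}
```

Rejecting a mismatched page shift with -EINVAL, instead of changing the recorded value, follows from the VF STU field being hardwired to 0: every VF relies on its PF's STU, so all functions sharing it have to agree on one value.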