[COMMIT maint/2.6.30] Merge commit 'qemu-svn/stable_0_10' into maint/2.6.30
From: Avi Kivity <a...@redhat.com>

* commit 'qemu-svn/stable_0_10':
  Fix savevm after BDRV_FILE size enforcement
  stop dirty tracking just at the end of migration (Glauber Costa)
  create qemu_file_set_error (Glauber Costa)
  propagate error on failed completion (Glauber Costa)
  qcow2: fix image creation for large, ~2TB, images (Chris Wright)
  pci_add storage: fix error handling for 'if' parameter (Eduardo Habkost)
  Fix (at least one cause of) qcow2 corruption. (Nolan Leake)
  Fix oops on 2.6.25 guest (Rusty Russell)
  SH4: Add support for kernel cmdline

--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT maint/2.6.30] kvm: libkvm: Compile with correct kernel include directory
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/configure b/configure
index dfbff37..df4d686 100755
--- a/configure
+++ b/configure
@@ -1679,7 +1679,7 @@ configure_kvm() {
           \( $cpu = i386 -o $cpu = x86_64 -o $cpu = ia64 -o $cpu = powerpc \); then
     echo "#define USE_KVM 1" >> $config_h
     echo "USE_KVM=1" >> $config_mak
-    echo "CONFIG_KVM_KERNEL_INC=$kerneldir/include" >> $config_mak
+    echo "CONFIG_KVM_KERNEL_INC=$kerneldir/include" >> config-host.mak
     if test $kvm_cap_pit = yes ; then
       echo "USE_KVM_PIT=1" >> $config_mak
       echo "#define USE_KVM_PIT 1" >> $config_h
diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index a8f1917..8811d84 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -12,6 +12,7 @@ cc-option = $(shell if $(CC) $(1) -S -o /dev/null -xc /dev/null \
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, )
 CFLAGS += $(call cc-option, -fno-stack-protector-all, )
+CFLAGS += -I$(CONFIG_KVM_KERNEL_INC)
 
 LDFLAGS += $(CFLAGS)
[COMMIT maint/2.6.30] Build libkvm from libkvm directory in new layout
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/Makefile.target b/Makefile.target
index 9ec7068..440857e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -605,7 +605,7 @@ endif
 ifdef CONFIG_KVM_KERNEL_INC
 CFLAGS += -I $(CONFIG_KVM_KERNEL_INC)
 LIBS += -lkvm
-DEPLIBS += ../libkvm/libkvm.a
+DEPLIBS += libkvm.a
 endif
 
 ifdef CONFIG_VNC_TLS
@@ -814,6 +814,14 @@ $(QEMU_PROG): LIBS += $(SDL_LIBS) $(COCOA_LIBS) $(CURSES_LIBS) $(BRLAPI_LIBS) $(
 $(QEMU_PROG): $(OBJS) ../libqemu_common.a libqemu.a $(DEPLIBS)
 	$(LINK)
 
+FORCE:
+
+libkvm.a: FORCE
+	$(MAKE) -C ../kvm/libkvm
+	if ! cmp -s libkvm.a ../kvm/libkvm/libkvm.a; then \
+		cp ../kvm/libkvm/libkvm.a . ; \
+	fi
+
 endif # !CONFIG_USER_ONLY
 
 gdbstub-xml.c: $(TARGET_XML_FILES) feature_to_c.sh
[COMMIT maint/2.6.30] configure: Include libkvm files
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/configure b/configure
index f526bd5..7a97cc4 100755
--- a/configure
+++ b/configure
@@ -489,6 +489,8 @@ if test $werror = yes ; then
     CFLAGS="$CFLAGS -Werror"
 fi
 
+CFLAGS="$CFLAGS -I$(readlink -f kvm/libkvm)"
+
 if test $solaris = no ; then
     if ld --version 2>/dev/null | grep "GNU ld" >/dev/null 2>/dev/null ; then
         LDFLAGS="$LDFLAGS -Wl,--warn-common"
[COMMIT maint/2.6.30] kvm: libkvm: adjust Makefile to new layout
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index 72c36a6..62bc45f 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -1,4 +1,4 @@
-include ../config.mak
+include ../../config-host.mak
 include config-$(ARCH).mak
 
 # cc-option
@@ -9,7 +9,6 @@ cc-option = $(shell if $(CC) $(1) -S -o /dev/null -xc /dev/null \
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, )
 CFLAGS += $(call cc-option, -fno-stack-protector-all, )
-CFLAGS += -I $(LIBKVM_KERNELDIR)/include
 
 LDFLAGS += $(CFLAGS)
[COMMIT maint/2.6.30] Default to building x86_64-softmmu for kvm
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/configure b/configure
index 7a97cc4..dfbff37 100755
--- a/configure
+++ b/configure
@@ -98,7 +98,7 @@ else
     cpu=`uname -m`
 fi
 
-target_list=
+target_list=x86_64-softmmu
 case $cpu in
   i386|i486|i586|i686|i86pc|BePC)
     cpu=i386
[COMMIT maint/2.6.30] configure: Use USE_KVM instead of CONFIG_KVM
From: Avi Kivity <a...@redhat.com>

We use USE_KVM instead of CONFIG_KVM to distinguish between our version of
the kvm/qemu glue and the official one. Change ./configure to reflect that.

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/configure b/configure
index d31ca0d..f526bd5 100755
--- a/configure
+++ b/configure
@@ -1716,9 +1716,9 @@ case $target_cpu in
         echo "#define USE_KQEMU 1" >> $config_h
     fi
     if test $kvm = yes ; then
-      echo "CONFIG_KVM=yes" >> $config_mak
+      echo "USE_KVM=yes" >> $config_mak
       echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
-      echo "#define CONFIG_KVM 1" >> $config_h
+      echo "#define USE_KVM 1" >> $config_h
     fi
     configure_kvm
   ;;
@@ -1740,9 +1740,9 @@ case $target_cpu in
     configure_kvm
     if [ use_upstream_kvm = yes ]; then
       if test $kvm = yes ; then
-        echo "CONFIG_KVM=yes" >> $config_mak
+        echo "USE_KVM=yes" >> $config_mak
         echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
-        echo "#define CONFIG_KVM 1" >> $config_h
+        echo "#define USE_KVM 1" >> $config_h
       fi
     fi
   ;;
@@ -1804,9 +1804,9 @@ case $target_cpu in
     echo "#define TARGET_PPC 1" >> $config_h
     echo "#define TARGET_PPCEMB 1" >> $config_h
     if test $kvm = yes ; then
-      echo "CONFIG_KVM=yes" >> $config_mak
+      echo "USE_KVM=yes" >> $config_mak
       echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
-      echo "#define CONFIG_KVM 1" >> $config_h
+      echo "#define USE_KVM 1" >> $config_h
     fi
     gdb_xml_files="power-core.xml power-fpu.xml power-altivec.xml power-spe.xml"
   ;;
[COMMIT maint/2.6.30] kvm: use new header layout
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/Makefile.target b/Makefile.target
index 4fbda65..be97e6b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -140,6 +140,9 @@ endif
 kvm.o: CFLAGS+=$(KVM_CFLAGS)
 kvm-all.o: CFLAGS+=$(KVM_CFLAGS)
 
+qemu-kvm.o qemu-kvm-x86.o device-assignment.o ioapic.o i8259.o i8254-kvm.o \
+  kvm-tpr-opt.o apic.o: CFLAGS+=$(KVM_CFLAGS)
+
 all: $(PROGS)
 
 #
@@ -602,8 +605,7 @@ ifdef CONFIG_CS4231A
 SOUND_HW += cs4231a.o
 endif
 
-ifdef CONFIG_KVM_KERNEL_INC
-CFLAGS += -I $(CONFIG_KVM_KERNEL_INC)
+ifdef USE_KVM
 DEPLIBS += libkvm.a
 endif
 
@@ -816,7 +818,7 @@ $(QEMU_PROG): $(OBJS) ../libqemu_common.a libqemu.a $(DEPLIBS)
 FORCE:
 
 libkvm.a: FORCE
-	$(MAKE) -C ../kvm/libkvm
+	$(MAKE) -C ../kvm/libkvm KVM_CFLAGS="$(KVM_CFLAGS)"
 	if ! cmp -s libkvm.a ../kvm/libkvm/libkvm.a; then \
 		cp ../kvm/libkvm/libkvm.a . ; \
 	fi
diff --git a/configure b/configure
index df4d686..913bcb8 100755
--- a/configure
+++ b/configure
@@ -194,9 +194,6 @@ signalfd=no
 eventfd=no
 cpu_emulation=yes
 
-# qemu-kvm: use local kerneldir
-kerneldir="$(readlink -f kvm/kernel)"
-
 # OS specific
 if check_define __linux__ ; then
   targetos=Linux
@@ -769,8 +766,22 @@ fi
 ##
 # KVM probe
 
+case $cpu in
+i386 | x86_64)
+    kvm_arch=x86
+    ;;
+*)
+    kvm_arch=$cpu
+    ;;
+esac
+
+kvm_cflags=""
+
 if test $kvm = yes ; then
 
+kvm_cflags="-I$source_path/kvm/kernel/include"
+kvm_cflags="$kvm_cflags -I$source_path/kvm/kernel/arch/$kvm_arch/include"
+
 # test for KVM_CAP_PIT
 
 cat > $TMPC <<EOF
@@ -780,7 +791,7 @@ cat > $TMPC <<EOF
 #endif
 int main(void) { return 0; }
 EOF
-if $cc $ARCH_CFLAGS $CFLAGS -I$kerneldir/include -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
+if $cc $ARCH_CFLAGS $CFLAGS $kvm_cflags -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
   kvm_cap_pit=yes
 fi
 
@@ -793,7 +804,7 @@ cat > $TMPC <<EOF
 #endif
 int main(void) { return 0; }
 EOF
-if $cc $ARCH_CFLAGS $CFLAGS -I$kerneldir/include -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
+if $cc $ARCH_CFLAGS $CFLAGS $kvm_cflags -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
   kvm_cap_device_assignment=yes
 fi
 fi
@@ -1049,19 +1060,6 @@ if test $kvm = yes ; then
 #endif
 int main(void) { return 0; }
 EOF
-  if test "$kerneldir" != "" ; then
-      kvm_cflags=-I$kerneldir/include
-      if test \( $cpu = i386 -o $cpu = x86_64 \) \
-          -a -d $kerneldir/arch/x86/include ; then
-          kvm_cflags="$kvm_cflags -I$kerneldir/arch/x86/include"
-      elif test $cpu = ppc -a -d $kerneldir/arch/powerpc/include ; then
-          kvm_cflags="$kvm_cflags -I$kerneldir/arch/powerpc/include"
-      elif test -d $kerneldir/arch/$cpu/include ; then
-          kvm_cflags="$kvm_cflags -I$kerneldir/arch/$cpu/include"
-      fi
-  else
-      kvm_cflags=""
-  fi
   if $cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $kvm_cflags $TMPC \
       > /dev/null 2>/dev/null ; then
     :
@@ -1679,7 +1677,7 @@ configure_kvm() {
           \( $cpu = i386 -o $cpu = x86_64 -o $cpu = ia64 -o $cpu = powerpc \); then
     echo "#define USE_KVM 1" >> $config_h
     echo "USE_KVM=1" >> $config_mak
-    echo "CONFIG_KVM_KERNEL_INC=$kerneldir/include" >> config-host.mak
+    echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
     if test $kvm_cap_pit = yes ; then
       echo "USE_KVM_PIT=1" >> $config_mak
       echo "#define USE_KVM_PIT 1" >> $config_h
diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index 8811d84..727ce48 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -12,7 +12,7 @@ cc-option = $(shell if $(CC) $(1) -S -o /dev/null -xc /dev/null \
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, )
 CFLAGS += $(call cc-option, -fno-stack-protector-all, )
-CFLAGS += -I$(CONFIG_KVM_KERNEL_INC)
+CFLAGS += $(KVM_CFLAGS)
 
 LDFLAGS += $(CFLAGS)
[COMMIT maint/2.6.30] Merge branch 'stable_0_10' of git://git.sv.gnu.org/qemu into maint/2.6.30
From: Avi Kivity <a...@redhat.com>

* 'stable_0_10' of git://git.sv.gnu.org/qemu: (21 commits)
  block-vpc: Don't silently create smaller image than requested (Kevin Wolf)
  Regenerate BIOS for stable branch
  Fix non-ACPI Timer Interrupt Routing (Beth Kon)
  hpet: Fix emulation of HPET_TN_SETVAL (Jan Kiszka)
  kvm: Fix cpuid initialization (Jan Kiszka)
  qcow2 corruption: Fix alloc_cluster_link_l2 (Kevin Wolf)
  Free VLANClientState using qemu_free() (Mark McLoughlin)
  Introduce VLANClientState::cleanup() (Mark McLoughlin)
  Use NICInfo::model for eepro100 savevm ID string (Mark McLoughlin)
  Add unregister_savevm() (Mark McLoughlin)
  Remove NICInfo from e1000 and mipsnet state (Mark McLoughlin)
  Remove some useless malloc() checking (Mark McLoughlin)
  Don't fail PCI hotplug if no NIC model is supplied (Mark McLoughlin)
  Fix error handling in net_client_init() (Mark McLoughlin)
  struct iovec is now universally available (Mark McLoughlin)
  Remove stray GSO code from virtio_net (Mark McLoughlin)
  Recognise evdev(xx)_aliases(yy) and xfree86(xx)_aliases(yy) as keymap names.
  Make PCI config status register read-only
  Fix crash on resolution change - screen dump - vga redraw (Avi Kivity)
  Update version for release
  ...

Conflicts:
	hw/eepro100.c
	hw/pcnet.c
	hw/rtl8139.c
	net.c
	pc-bios/bios.bin

Signed-off-by: Avi Kivity <a...@redhat.com>
Re: kvm-77 Excessive Disk Access causes real time clock hang!
Erik Rull wrote:
>> Are you using qcow2? In some cases qcow2 will stall the guest cpu.
>>
>> Note that defragmenting the guest drive may cause the qcow2 file to
>> fragment even more, and will certainly increase its size. I recommend
>> only defragmenting when using raw storage.
>
> I don't think so. I created a partition on my host real harddrive and
> provided this partition to my windows guest.
>
> If you have an idea, which virtualized drive system could be the fastest
> (except giving a complete disk to the guest), your comments are welcome :-)

interface: virtio
cache: none
format: raw, using a partition or logical volume

What are you using?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm.git now live
Cameron Macdonell wrote:
> Great. They work as expected. One thing I noticed is that make clean
> doesn't remove libkvm.{o,a} and .libkvm.d. Should it?

Should, but doesn't. Another thing to fix.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] Assign the correct pci id range to virtio_pci
On 04/24/09 15:19, Anthony Liguori wrote:
> Pantelis Koukousoulas wrote:
>> According to the file pci-ids.txt in qemu sources, the range of PCI
>> device IDs assigned to virtio_pci is 0x1000 to 0x10ff, with a few
>> subranges that have different rules regarding who can get an ID
>> there and how.
>>
>> Nevertheless, the full range should be assigned to the generic
>> virtio_pci driver, so that all corresponding devices, including
>> the experimental/unreleased ones "just work".
>
> Gerd, since you appear to be managing the PCI ID range, what do you think?

I'd suggest to exclude the experimental range by default (enable via
module parameter) to make clear it isn't for regular use.

cheers
  Gerd
kvm-85 won't boot guests with virtio block device
Added the sacred signoff at numerous request :)

Met vriendelijke groet,
    Pauline Middelink
-- 
GPG Key fingerprint = 2D5B 87A7 DDA6 0378 5DEA  BD3B 9A50 B416 E2D0 C3C2
For more details look at my website http://www.polyware.nl/~middelink

Small regression. libvirt determines the use of the boot= flag by looking
at the helptext. This flag is required for booting off virtio-blk devices.

Signed-off-by: Pauline Middelink <middelink at polyware.nl>

diff -ur kvm-85/qemu/qemu-options.hx kvm-85a/qemu/qemu-options.hx
--- kvm-85/qemu/qemu-options.hx	2009-04-21 11:57:31.0 +0200
+++ kvm-85a/qemu/qemu-options.hx	2009-04-24 19:04:50.0 +0200
@@ -77,6 +77,7 @@
     "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none][,format=f][,serial=s]\n"
+    "       [,boot=on|off]\n"
     "                use 'file' as a drive image\n")
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
[PATCH] qemu-kvm: make clean should propagate into libkvm directory
Make the 'clean' target propagate into libkvm if it's enabled.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 Makefile.target |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index d890220..28dce60 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -846,6 +846,9 @@ qemu-options.h: $(SRC_PATH)/qemu-options.hx
 clean:
 	rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o qemu-options.h gdbstub-xml.c
 	rm -f *.d */*.d tcg/*.o
+ifdef USE_KVM
+	$(MAKE) -C ../kvm/libkvm clean
+endif
 
 install: all
 ifneq ($(PROGS),)
-- 
1.6.0.6
[PATCH] qemu-kvm: fix compiler warning
kvm-common.h:25:7: warning: "__ia64__" is not defined

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 kvm/libkvm/kvm-common.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 96361e8..591fb53 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -22,7 +22,7 @@
 #define KVM_MAX_NUM_MEM_REGIONS 1u
 #define MAX_VCPUS 64
 #define LIBKVM_S390_ORIGIN (0UL)
-#elif __ia64__
+#elif defined(__ia64__)
 #define KVM_MAX_NUM_MEM_REGIONS 32u
 #define MAX_VCPUS 256
 #else
-- 
1.6.0.6
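For readers wondering why the one-word change silences the warning: inside `#if`/`#elif`, an identifier that is not defined as a macro is evaluated as 0, which is legal but is exactly what GCC's -Wundef reports. Wrapping the name in `defined()` asks the question explicitly. A minimal sketch follows; the `DEMO_*` macros and the fallback value 16 are made up for illustration and are not taken from kvm-common.h.

```c
/*
 * Illustrative only: DEMO_ARCH_A and DEMO_ARCH_B stand in for the real
 * architecture macros. Neither is defined here, so the #else branch wins.
 */
#if defined(DEMO_ARCH_A)
#define DEMO_MAX_VCPUS 64
#elif defined(DEMO_ARCH_B)   /* well-formed whether or not the macro exists */
#define DEMO_MAX_VCPUS 256
#else
#define DEMO_MAX_VCPUS 16    /* placeholder fallback, an assumption */
#endif

/* Return the limit selected at preprocessing time. */
int demo_max_vcpus(void)
{
    return DEMO_MAX_VCPUS;
}
```

Writing `#elif DEMO_ARCH_B` instead would select the same branch here, but it would warn under -Wundef because the bare name is silently treated as 0.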
[PATCH] qemu-kvm: make kvm_create_pit static
libkvm-x86.c:55: warning: no previous prototype for ‘kvm_create_pit’

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 kvm/libkvm/libkvm-x86.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/libkvm/libkvm-x86.c b/kvm/libkvm/libkvm-x86.c
index 2fc4fce..df8cc81 100644
--- a/kvm/libkvm/libkvm-x86.c
+++ b/kvm/libkvm/libkvm-x86.c
@@ -52,7 +52,7 @@ static int kvm_init_tss(kvm_context_t kvm)
 	return 0;
 }
 
-int kvm_create_pit(kvm_context_t kvm)
+static int kvm_create_pit(kvm_context_t kvm)
 {
 #ifdef KVM_CAP_PIT
 	int r;
-- 
1.6.0.6
[PATCH] virtio_blk: SG_IO passthru support
From: Hannes Reinecke <h...@suse.de>

Add support for SG_IO passthru to virtio_blk. We add the scsi command
block after the normal outhdr, and the scsi inhdr with full status
information as well as the sense buffer before the regular inhdr.

[hch: forward ported, added the VIRTIO_BLK_F_SCSI flags, some comments
and tested the whole beast]

Signed-off-by: Hannes Reinecke <h...@suse.de>
Signed-off-by: Christoph Hellwig <h...@lst.de>

Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c	2009-04-26 23:01:41.330960470 +0200
+++ linux-2.6/drivers/block/virtio_blk.c	2009-04-27 00:18:04.708076895 +0200
@@ -37,6 +37,7 @@ struct virtblk_req
 	struct list_head list;
 	struct request *req;
 	struct virtio_blk_outhdr out_hdr;
+	struct virtio_scsi_inhdr in_hdr;
 	u8 status;
 };
 
@@ -49,7 +50,9 @@ static void blk_done(struct virtqueue *v
 
 	spin_lock_irqsave(&vblk->lock, flags);
 	while ((vbr = vblk->vq->vq_ops->get_buf(vblk->vq, &len)) != NULL) {
+		unsigned int nr_bytes;
 		int error;
+
 		switch (vbr->status) {
 		case VIRTIO_BLK_S_OK:
 			error = 0;
@@ -62,7 +65,15 @@ static void blk_done(struct virtqueue *v
 			break;
 		}
 
-		__blk_end_request(vbr->req, error, blk_rq_bytes(vbr->req));
+		if (blk_pc_request(vbr->req)) {
+			vbr->req->data_len = vbr->in_hdr.residual;
+			nr_bytes = vbr->in_hdr.data_len;
+			vbr->req->sense_len = vbr->in_hdr.sense_len;
+			vbr->req->errors = vbr->in_hdr.errors;
+		} else
+			nr_bytes = blk_rq_bytes(vbr->req);
+
+		__blk_end_request(vbr->req, error, nr_bytes);
 		list_del(&vbr->list);
 		mempool_free(vbr, vblk->pool);
 	}
@@ -74,7 +85,7 @@ static void blk_done(struct virtqueue *v
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		   struct request *req)
 {
-	unsigned long num, out, in;
+	unsigned long num, out = 0, in = 0;
 	struct virtblk_req *vbr;
 
 	vbr = mempool_alloc(vblk->pool, GFP_ATOMIC);
@@ -99,18 +110,36 @@ static bool do_req(struct request_queue
 	if (blk_barrier_rq(vbr->req))
 		vbr->out_hdr.type |= VIRTIO_BLK_T_BARRIER;
 
-	sg_set_buf(&vblk->sg[0], &vbr->out_hdr, sizeof(vbr->out_hdr));
-	num = blk_rq_map_sg(q, vbr->req, vblk->sg+1);
-	sg_set_buf(&vblk->sg[num+1], &vbr->status, sizeof(vbr->status));
-
-	if (rq_data_dir(vbr->req) == WRITE) {
-		vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
-		out = 1 + num;
-		in = 1;
-	} else {
-		vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
-		out = 1;
-		in = 1 + num;
+	sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
+
+	/*
+	 * If this is a packet command we need a couple of additional headers.
+	 * Behind the normal outhdr we put a segment with the scsi command
+	 * block, and before the normal inhdr we put the sense data and the
+	 * inhdr with additional status information before the normal inhdr.
+	 */
+	if (blk_pc_request(vbr->req))
+		sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
+
+	num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
+
+	if (blk_pc_request(vbr->req)) {
+		sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, 96);
+		sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
+			   sizeof(vbr->in_hdr));
+	}
+
+	sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
+		   sizeof(vbr->status));
+
+	if (num) {
+		if (rq_data_dir(vbr->req) == WRITE) {
+			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
+			out += num;
+		} else {
+			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
+			in += num;
+		}
 	}
 
 	if (vblk->vq->vq_ops->add_buf(vblk->vq, vblk->sg, out, in, vbr)) {
@@ -149,8 +178,16 @@ static void do_virtblk_request(struct re
 static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
 			 unsigned cmd, unsigned long data)
 {
-	return scsi_cmd_ioctl(bdev->bd_disk->queue,
-			      bdev->bd_disk, mode, cmd,
+	struct gendisk *disk = bdev->bd_disk;
+	struct virtio_blk *vblk = disk->private_data;
+
+	/*
+	 * Only allow the generic SCSI ioctls if the host can support it.
+	 */
+	if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
+		return -ENOIOCTLCMD;
+
+	return scsi_cmd_ioctl(disk->queue, disk, mode, cmd,
 			      (void __user *)data);
 }
 
@@ -356,6 +393,7 @@ static
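The segment bookkeeping in the patched do_req() is easier to follow when pulled out of the kernel context. The sketch below only mirrors the counting of "out" (host-readable) and "in" (host-writable) scatterlist entries as described in the hunk above. `demo_sg_layout` and `demo_count_segments` are hypothetical names invented for this note, and `num` stands for whatever blk_rq_map_sg() would have returned.

```c
#include <stdbool.h>

/* Hypothetical summary of one request's scatterlist split (not a kernel type). */
struct demo_sg_layout {
    unsigned long out;  /* segments the host reads from  */
    unsigned long in;   /* segments the host writes into */
};

/*
 * Mirror the counting in the patched do_req():
 *   out: outhdr [, scsi command block] [, write payload]
 *   in:  [read payload,] [sense buffer, scsi inhdr,] status byte
 */
struct demo_sg_layout demo_count_segments(bool packet_cmd, bool is_write,
                                          unsigned long num)
{
    struct demo_sg_layout l = { 0, 0 };

    l.out++;                 /* virtio_blk_outhdr is always first */
    if (packet_cmd)
        l.out++;             /* SCSI command block follows the outhdr */

    if (packet_cmd)
        l.in += 2;           /* sense buffer + virtio_scsi_inhdr */
    l.in++;                  /* status byte is always last */

    if (num) {               /* data segments go to exactly one side */
        if (is_write)
            l.out += num;
        else
            l.in += num;
    }
    return l;
}
```

Under these assumptions a plain read with four data segments yields out=1, in=5, while a packet command with no payload yields out=2, in=3, which is the minimum the host-side patch later in this digest checks for.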
[PATCH] virtio-blk: add SG_IO passthru support
Add support for SG_IO passthru (packet commands) to the virtio-blk
backend. Conceptually based on an older patch from Hannes Reinecke
but largely rewritten to match the code structure and layering in
virtio-blk.

Note that currently we issue the host SG_IO synchronously. We could
easily switch to async I/O, but that would require either bloating
the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
memory allocation for each SG_IO request.

Signed-off-by: Christoph Hellwig <h...@lst.de>

Index: qemu/hw/virtio-blk.h
===================================================================
--- qemu.orig/hw/virtio-blk.h	2009-04-26 16:50:38.154074532 +0200
+++ qemu/hw/virtio-blk.h	2009-04-26 22:51:16.838076869 +0200
@@ -28,6 +28,9 @@
 #define VIRTIO_BLK_F_SIZE_MAX   1       /* Indicates maximum segment size */
 #define VIRTIO_BLK_F_SEG_MAX    2       /* Indicates maximum # of segments */
 #define VIRTIO_BLK_F_GEOMETRY   4       /* Indicates support of legacy geometry */
+#define VIRTIO_BLK_F_RO         5       /* Disk is read-only */
+#define VIRTIO_BLK_F_BLK_SIZE   6       /* Block size of disk is available*/
+#define VIRTIO_BLK_F_SCSI       7       /* Supports scsi command passthru */
 
 struct virtio_blk_config
 {
@@ -70,6 +73,15 @@ struct virtio_blk_inhdr
     unsigned char status;
 };
 
+/* SCSI pass-through header */
+struct virtio_scsi_inhdr
+{
+    uint32_t errors;
+    uint32_t data_len;
+    uint32_t sense_len;
+    uint32_t residual;
+};
+
 void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs);
 
 #endif
Index: qemu/hw/virtio-blk.c
===================================================================
--- qemu.orig/hw/virtio-blk.c	2009-04-26 16:50:38.160074667 +0200
+++ qemu/hw/virtio-blk.c	2009-04-27 10:25:19.278074514 +0200
@@ -15,6 +15,9 @@
 #include <sysemu.h>
 #include "virtio-blk.h"
 #include "block_int.h"
+#ifdef __linux__
+# include <scsi/sg.h>
+#endif
 
 typedef struct VirtIOBlock
 {
@@ -35,6 +38,7 @@ typedef struct VirtIOBlockReq
     VirtQueueElement elem;
     struct virtio_blk_inhdr *in;
     struct virtio_blk_outhdr *out;
+    struct virtio_scsi_inhdr *scsi;
     QEMUIOVector qiov;
     struct VirtIOBlockReq *next;
 } VirtIOBlockReq;
@@ -103,6 +107,108 @@ static VirtIOBlockReq *virtio_blk_get_re
     return req;
 }
 
+#ifdef __linux__
+static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
+{
+    struct sg_io_hdr hdr;
+    int ret, size = 0;
+    int status;
+    int i;
+
+    /*
+     * We require at least one output segment each for the virtio_blk_outhdr
+     * and the SCSI command block.
+     *
+     * We also at least require the virtio_blk_inhdr, the virtio_scsi_inhdr
+     * and the sense buffer pointer in the input segments.
+     */
+    if (req->elem.out_num < 2 || req->elem.in_num < 3) {
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+        return;
+    }
+
+    /*
+     * No support for bidirection commands yet.
+     */
+    if (req->elem.out_num > 2 && req->elem.in_num > 3) {
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+        return;
+    }
+
+    /*
+     * The scsi inhdr is placed in the second-to-last input segment, just
+     * before the regular inhdr.
+     */
+    req->scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
+    size = sizeof(*req->in) + sizeof(*req->scsi);
+
+    memset(&hdr, 0, sizeof(struct sg_io_hdr));
+    hdr.interface_id = 'S';
+    hdr.cmd_len = req->elem.out_sg[1].iov_len;
+    hdr.cmdp = req->elem.out_sg[1].iov_base;
+    hdr.dxfer_len = 0;
+
+    if (req->elem.out_num > 2) {
+        /*
+         * If there are more than the minimally required 2 output segments
+         * there is write payload starting from the third iovec.
+         */
+        hdr.dxfer_direction = SG_DXFER_TO_DEV;
+        hdr.iovec_count = req->elem.out_num - 2;
+
+        for (i = 0; i < hdr.iovec_count; i++)
+            hdr.dxfer_len += req->elem.out_sg[i + 2].iov_len;
+
+        hdr.dxferp = req->elem.out_sg + 2;
+
+    } else if (req->elem.in_num > 3) {
+        /*
+         * If we have more than 3 input segments the guest wants to actually
+         * read data.
+         */
+        hdr.dxfer_direction = SG_DXFER_FROM_DEV;
+        hdr.iovec_count = req->elem.in_num - 3;
+        for (i = 0; i < hdr.iovec_count; i++)
+            hdr.dxfer_len += req->elem.in_sg[i].iov_len;
+
+        hdr.dxferp = req->elem.in_sg;
+        size += hdr.dxfer_len;
+    } else {
+        /*
+         * Some SCSI commands don't actually transfer any data.
+         */
+        hdr.dxfer_direction = SG_DXFER_NONE;
+    }
+
+    hdr.sbp = req->elem.in_sg[req->elem.in_num - 3].iov_base;
+    hdr.mx_sb_len = req->elem.in_sg[req->elem.in_num - 3].iov_len;
+    size += hdr.mx_sb_len;
+
+    ret = bdrv_ioctl(req->dev->bs, SG_IO, &hdr);
+    if (ret) {
+        status = VIRTIO_BLK_S_UNSUPP;
+        hdr.status = ret;
+        hdr.resid = hdr.dxfer_len;
+    } else if (hdr.status) {
+        status = VIRTIO_BLK_S_IOERR;
+    } else {
+        status =
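The segment-count checks at the top of virtio_blk_handle_scsi() amount to a small decision table. As a rough sketch, with invented names (`demo_sg_dir`, `demo_classify`) that do not appear in qemu, the same rules can be written as a pure function and exercised in isolation:

```c
/* Hypothetical outcome codes; qemu itself uses VIRTIO_BLK_S_* and SG_DXFER_*. */
enum demo_sg_dir {
    DEMO_INVALID,       /* too few segments: completed with an I/O error */
    DEMO_UNSUPP_BIDI,   /* payload in both directions: not supported yet */
    DEMO_TO_DEV,        /* write payload after outhdr + command block    */
    DEMO_FROM_DEV,      /* read payload before sense, scsi inhdr, inhdr  */
    DEMO_NONE           /* command transfers no data                     */
};

/* Classify a request from its output and input segment counts. */
enum demo_sg_dir demo_classify(unsigned int out_num, unsigned int in_num)
{
    /* Need outhdr + command block out; sense + scsi inhdr + inhdr in. */
    if (out_num < 2 || in_num < 3)
        return DEMO_INVALID;
    if (out_num > 2 && in_num > 3)
        return DEMO_UNSUPP_BIDI;
    if (out_num > 2)
        return DEMO_TO_DEV;
    if (in_num > 3)
        return DEMO_FROM_DEV;
    return DEMO_NONE;
}
```

The ordering matters: the bidirectional case has to be rejected before the single-direction tests, otherwise it would be treated as a write.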
Re: [PATCH] Assign the correct pci id range to virtio_pci
> I'd suggest to exclude the experimental range by default (enable via
> module parameter) to make clear it isn't for regular use.

Module parameter on what? The module parameter parsing code is afaict
provided by the end-driver (e.g., virtio-net) which only speaks "virtio"
and has no idea there is an actual PCI device in the backend.

Isn't it easier to just make it clear that a PCI id within the
0x1000-0x10ef range is a prerequisite for inclusion in mainline
linux / qemu and leave it at that?

Btw, including just a subset of the experimental range like e.g.,
0x10f0-0x10f3 would be fine with me, if there is a desire to be
compatible with the "allow for breaking the ABI" rationale for the
0x1000-0x103f range.

Thanks,
Pantelis
Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
Gregory Haskins wrote:
> This allows an eventfd to be registered as an irq source with a guest. Any
> signaling operation on the eventfd (via userspace or kernel) will inject
> the registered GSI at the next available window.
>
> +struct kvm_irqfd {
> +	__u32 fd;
> +	__u32 gsi;
> +};
> +

I think it's better to have ioctl create and return the fd. This way we
aren't tied to eventfd (though it makes a lot of sense to use it).

Also, please add a flags field and some padding so we can extend it later.

> +
> +#include <linux/kvm_host.h>
> +#include <linux/eventfd.h>
> +#include <linux/workqueue.h>
> +#include <linux/wait.h>
> +#include <linux/poll.h>
> +#include <linux/file.h>
> +#include <linux/list.h>
> +
> +struct _irqfd {
> +	struct kvm               *kvm;
> +	int                       gsi;
> +	struct file              *file;
> +	struct list_head          list;
> +	poll_table                pt;
> +	wait_queue_head_t        *wqh;
> +	wait_queue_t              wait;
> +	struct work_struct        work;
> +};
> +
> +static void
> +irqfd_inject(struct work_struct *work)
> +{
> +	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
> +	struct kvm *kvm = irqfd->kvm;
> +
> +	mutex_lock(&kvm->lock);
> +	kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);

Need to lower the irq too (though irqfd only supports edge triggered
interrupts).

> +	mutex_unlock(&kvm->lock);
> +}
> +
> +static int
> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +
> +	/*
> +	 * The eventfd calls its wake_up with interrupts disabled,
> +	 * so we need to defer the IRQ injection until later since we need
> +	 * to acquire the kvm->lock to do so.
> +	 */
> +	schedule_work(&irqfd->work);
> +
> +	return 0;
> +}

One day we'll have lockless injection and we'll want to drop this. I
guess if we create the fd ourselves we can make it work, but I don't see
how we can do this with eventfd.

> +int
> +kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> +{
> +	struct _irqfd *irqfd;
> +	struct file *file;
> +	int ret;
> +
> +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +	if (!irqfd)
> +		return -ENOMEM;
> +
> +	irqfd->kvm = kvm;
> +	irqfd->gsi = gsi;
> +	INIT_LIST_HEAD(&irqfd->list);
> +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +	INIT_WORK(&irqfd->work, irqfd_inject);
> +
> +	file = eventfd_fget(fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	ret = file->f_op->poll(file, &irqfd->pt);
> +	/* do we need to look for errors in ret? */

Do we?

> +
> +	irqfd->file = file;
> +
> +	mutex_lock(&kvm->lock);
> +	if (kvm->irqfd.src == -1) {
> +		ret = kvm_request_irq_source_id(kvm);
> +		BUG_ON(ret < 0);

I think you can reuse the userspace irq source (since it's just another
way for userspace to inject an interrupt). It isn't really needed since
the irq source stuff is only needed to support level triggered interrupts.

-- 
error compiling committee.c: too many arguments to function
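The request for "a flags field and some padding" is the usual way to keep an ioctl argument extensible: the kernel rejects flag bits and reserved bytes it does not understand, so they can be given meaning later without breaking old binaries. The sketch below is one way that could look in plain C. The struct name, the 20-byte pad and the validation helper are assumptions made for illustration, not the layout proposed or merged in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extensible argument block; sizes are an assumption. */
struct demo_irqfd {
    uint32_t fd;
    uint32_t gsi;
    uint32_t flags;     /* no flags defined yet: must be 0          */
    uint8_t  pad[20];   /* reserved for future fields: must be zero */
};

/* Flag bits this (imaginary) version understands. */
#define DEMO_IRQFD_KNOWN_FLAGS 0u

/*
 * Return 0 if the request uses only fields this version understands,
 * -1 otherwise. Rejecting unknown bits now is what makes it safe to
 * assign them a meaning in a later version.
 */
int demo_irqfd_validate(const struct demo_irqfd *req)
{
    size_t i;

    if (req->flags & ~DEMO_IRQFD_KNOWN_FLAGS)
        return -1;
    for (i = 0; i < sizeof(req->pad); i++)
        if (req->pad[i] != 0)
            return -1;
    return 0;
}
```

A caller that zero-initialises the struct and fills in only fd and gsi passes validation; one that sets any flag bit or leaves garbage in the padding is refused.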
Re: [PATCH] Assign the correct pci id range to virtio_pci
On 04/27/09 10:39, Pantelis Koukousoulas wrote:
>> I'd suggest to exclude the experimental range by default (enable via
>> module parameter) to make clear it isn't for regular use.
>
> Module parameter on what?

virtio_pci.ko

Then you'll do something like this ...

  echo "options virtio_pci experimental=1" >> /etc/modprobe.d/virtio.conf

... to enable the experimental IDs.

> The module parameter parsing code is afaict
> provided by the end-driver (e.g., virtio-net) which only speaks "virtio"
> and has no idea there is an actual PCI device in the backend.

virtio_pci is a separate module and can have parameters too.

cheers,
  Gerd
Re: [PATCH] Assign the correct pci id range to virtio_pci
On 04/27/09 11:11, Pantelis Koukousoulas wrote:
>>> Module parameter on what?
>>
>> virtio_pci.ko
>>
>> Then you'll do something like this ...
>>
>>   echo "options virtio_pci experimental=1" >> /etc/modprobe.d/virtio.conf
>>
>> ... to enable the experimental IDs.
>
> Ok, I can sure live with such an arrangement :)
>
> Should the entire 0x10f0-0x10ff range be enabled by this option
> or just a subset? (and if a subset, how large, 4 IDs, more ?)

That roughly matches the number of non-experimental IDs (1/4 of the
complete range), so 4 IDs looks good to me.

cheers,
  Gerd
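To make the arrangement discussed so far concrete, here is a userspace sketch of the ID check it implies: IDs 0x1000-0x10ef are always claimed, and the first four experimental IDs (0x10f0-0x10f3) are claimed only when an "experimental" switch is on. This is one reading of the thread, not code from virtio_pci; every `DEMO_*` name and the function itself are hypothetical, and a real driver would take the switch from a module parameter rather than a function argument.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VIRTIO_ID_FIRST     0x1000u  /* start of the virtio_pci range  */
#define DEMO_VIRTIO_ID_LAST      0x10ffu  /* end of the virtio_pci range    */
#define DEMO_VIRTIO_EXP_FIRST    0x10f0u  /* start of the experimental IDs  */
#define DEMO_VIRTIO_EXP_ENABLED  0x10f3u  /* last of the 4 opt-in IDs       */

/* Decide whether the driver should bind to a given PCI device ID. */
bool demo_virtio_pci_claims(uint16_t device_id, bool experimental)
{
    if (device_id < DEMO_VIRTIO_ID_FIRST || device_id > DEMO_VIRTIO_ID_LAST)
        return false;                     /* not a virtio ID at all */
    if (device_id < DEMO_VIRTIO_EXP_FIRST)
        return true;                      /* regular range, always claimed */
    /* experimental range: opt-in only, and only the first four IDs */
    return experimental && device_id <= DEMO_VIRTIO_EXP_ENABLED;
}
```

With the switch off, 0x10f0 is ignored; with it on, 0x10f0 through 0x10f3 bind and 0x10f4 and above still do not.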
Re: [PATCH] Assign the correct pci id range to virtio_pci
Gerd Hoffmann wrote:
> On 04/27/09 11:11, Pantelis Koukousoulas wrote:
>>>> Module parameter on what?
>>>
>>> virtio_pci.ko
>>>
>>> Then you'll do something like this ...
>>>
>>>   echo "options virtio_pci experimental=1" >> /etc/modprobe.d/virtio.conf
>>>
>>> ... to enable the experimental IDs.
>>
>> Ok, I can sure live with such an arrangement :)
>>
>> Should the entire 0x10f0-0x10ff range be enabled by this option
>> or just a subset? (and if a subset, how large, 4 IDs, more ?)
>
> That roughly matches the number of non-experimental IDs (1/4 of the
> complete range), so 4 IDs looks good to me.

Or maybe

  modprobe virtio-pci claim=0x10f2 claim=0x10f7

-- 
error compiling committee.c: too many arguments to function
[PATCH 2/2] KVM: Enable snooping control for supported hardware
Memory aliases with different memory types are a problem for the guest. For
a guest without an assigned device, the memory type of guest memory is
always the same as the host's (WB); but with an assigned device, some part
of memory may be used for DMA and then set to an uncacheable memory type
(UC/WC), which conflicts with the host memory type and is a potential issue.

Snooping control can guarantee cache correctness for memory that goes
through the DMA engine of VT-d.

Signed-off-by: Sheng Yang <sh...@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |    5 -
 arch/x86/kvm/mmu.c              |   16 +---
 arch/x86/kvm/svm.c              |    4 ++--
 arch/x86/kvm/vmx.c              |   29 ++---
 virt/kvm/iommu.c                |   27 ---
 5 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ba6906f..b972889 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -395,6 +395,8 @@ struct kvm_arch{
 	struct list_head active_mmu_pages;
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
+#define KVM_IOMMU_CACHE_COHERENCY	0x1
+	int iommu_flags;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
@@ -523,7 +525,7 @@ struct kvm_x86_ops {
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
-	int (*get_mt_mask_shift)(void);
+	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
@@ -551,6 +553,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 			const void *val, int bytes);
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 		  gpa_t addr, unsigned long *ret);
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 extern bool tdp_enabled;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 55c6923..ea1c2aa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1606,7 +1606,7 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
 	return mtrr_state->def_type;
 }
 
-static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	u8 mtrr;
 
@@ -1616,6 +1616,7 @@ static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 		mtrr = MTRR_TYPE_WRBACK;
 	return mtrr;
 }
+EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
 
 static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -1688,16 +1689,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		spte |= shadow_user_mask;
 	if (largepage)
 		spte |= PT_PAGE_SIZE_MASK;
-	if (tdp_enabled) {
-		if (!kvm_is_mmio_pfn(pfn)) {
-			mt_mask = get_memory_type(vcpu, gfn) <<
-				kvm_x86_ops->get_mt_mask_shift();
-			mt_mask |= VMX_EPT_IGMT_BIT;
-		} else
-			mt_mask = MTRR_TYPE_UNCACHABLE <<
-				kvm_x86_ops->get_mt_mask_shift();
-		spte |= mt_mask;
-	}
+	if (tdp_enabled)
+		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
+			kvm_is_mmio_pfn(pfn));
 
 	spte |= (u64)pfn << PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 053f3c5..2bbe1de 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2617,7 +2617,7 @@ static int get_npt_level(void)
 #endif
 }
 
-static int svm_get_mt_mask_shift(void)
+static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	return 0;
 }
@@ -2678,7 +2678,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
-	.get_mt_mask_shift = svm_get_mt_mask_shift,
+	.get_mt_mask = svm_get_mt_mask,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85db8b2..db853b6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3678,9 +3678,32 @@ static int get_ept_level(void)
 	return VMX_EPT_DEFAULT_GAW + 1;
 }
 
-static int vmx_get_mt_mask_shift(void)
+static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
-	return VMX_EPT_MT_EPTE_SHIFT;
+	int ret;
+
+	/* For VT-d and EPT combination
+	 * 1. MMIO: always map as UC
+	 * 2. EPT with VT-d:
+	 *   a. VT-d with snooping control feature: snooping control feature of
+	 *	VT-d engine can guarantee the cache correctness. Just set it
+	 *	to WB to keep consistent with host. So the same as item 3.
+	 *   b. VT-d without snooping control feature: can't
Re: Unable to boot guest on kernel 2.6.29.1 with kvm-84 or kvm-85
Eino Malinen wrote:
>> Can you try a bisect?
>
> I confirm having the same trouble with kvm. Bisect with the vanilla
> kernel gave the following result:
>
> (f438349...|BISECTING)$ git bisect log
> # bad: [8d7bff2d72660d9d60aa371ae3d1356bbf329a09] Linux 2.6.29.1
> # good: [8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84] Linux 2.6.29
> git bisect start 'v2.6.29.1' 'v2.6.29' '--' 'arch/x86/kvm/' 'virt/kvm/'
> # bad: [f438349efb8247cd0c1d453a4131b1f801bf5691] KVM: VMX: Don't allow uninhibited access to EFER on i386
> git bisect bad f438349efb8247cd0c1d453a4131b1f801bf5691
>
> Are the commits split more precisely in kvm.git?

No (and this is pretty fine granularity -- that commit is a one liner).

So 2.6.29 worked and 2.6.29.1 did not?

What is your host cpu type? Are you running an i386 or x86_64 host?

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] virtio-blk: add SG_IO passthru support
Christoph Hellwig wrote:
> [had the qemu list address wrong the first time, reply to this message,
> not the previous if you were on Cc]
>
> Add support for SG_IO passthru (packet commands) to the virtio-blk
> backend. Conceptually based on an older patch from Hannes Reinecke
> but largely rewritten to match the code structure and layering in
> virtio-blk.
>
> Note that currently we issue the host SG_IO synchronously. We could
> easily switch to async I/O, but that would require either bloating
> the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
> memory allocation for each SG_IO request.

I think that's worthwhile. The extra bloat is trivial (especially as the
number of inflight virtio requests is tightly bounded), and stalling the
vcpu for requests is a pain.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] virtio_blk: SG_IO passthru support
On Monday 27 April 2009 10:21:21 you wrote:
> From: Hannes Reinecke <h...@suse.de>
>
> Add support for SG_IO passthru to virtio_blk. We add the scsi command
> block after the normal outhdr, and the scsi inhdr with full status
> information as well as the sense buffer before the regular inhdr.
>
> [hch: forward ported, added the VIRTIO_BLK_F_SCSI flags, some comments
> and tested the whole beast]

I like the VIRTIO_BLK_F_SCSI flag - it avoids some problems we had with an
earlier version. (I believe we found a solution, but this explicit check
looks much more reliable)

I gave this patch a short test with the kuli loader (s390 userspace).
You can consider the VIRTIO_BLK_F_SCSI not available case
Tested-by: Christian Borntraeger <borntrae...@de.ibm.com>

Christian
Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
Gregory Haskins wrote:
> Avi Kivity wrote:
>> One day we'll have lockless injection and we'll want to drop this. I
>> guess if we create the fd ourselves we can make it work, but I don't
>> see how we can do this with eventfd.
>
> Hmm...this is a good point. There probably is no way to use eventfd
> off the shelf in a way that doesn't cause this callback to be in a
> critical section. Should we just worry about switching away from
> eventfd when this occurs, or should I implement a custom anon-fd now?

Something else to consider: eventfd supports calling eventfd_signal()
from interrupt context, so even if the callback were not invoked from a
preempt/irq off CS due to the wqh->lock, the context may still be a CS
anyway. Now, userspace and vbus based injection do not need to worry
about this, but I wonder if this is a desirable attribute for some other
source (device-assignment perhaps)?

If so, is there a way to design the lockless-injection code such that
it's still friendly to irq-context, or is this a mode that we would never
want to support anyway? I think this would be a good thing to have at
least to maintain compatibility with the existing eventfd interface,
which has its own advantages.

-Greg
Re: [PATCH v2 00/16] interrupt injection rework
Gleb Natapov wrote:
> Hi,
>
> This patch series aims to consolidate IRQ injection code for in kernel
> IRQ chip and userspace one. Also to move IRQ injection logic from
> SVM/VMX specific code to x86.c.

Applied all, thanks for this excellent patchset.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH v2] qemu-kvm: add irqfd support
Michael S. Tsirkin wrote:
> On Fri, Apr 24, 2009 at 12:30:17AM -0400, Gregory Haskins wrote:
>> +int kvm_irqfd(kvm_context_t kvm, int gsi)
>> +{
>> +    int fd, r;
>> +
>> +    if (!kvm_check_extension(kvm, KVM_CAP_IRQFD))
>> +        return -ENOENT;
>> +
>> +    fd = eventfd(0, 0);
>> +    if (fd < 0)
>> +        return fd;
>> +
>> +    r = assign_irqfd(kvm, fd, gsi);
>> +    if (r < 0)
>> +        return r;
>
> Do we need to close fd on error?

Good catch. This will be changing in v3 to do the allocation in kernel,
but I will be sure to properly cleanup this type of leak, however that
ends up looking. Thanks Michael.

-Greg

>> +
>> +    return fd;
>> +}
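The leak Michael points out is that the eventfd stays open when the assignment step fails. A minimal userspace sketch of the cleanup follows. It is not the v3 code: irqfd_create(), the assign callback, assign_ok(), assign_fail(), last_fd_seen and fd_is_open() are hypothetical names used only for illustration, with the callback standing in for assign_irqfd(). The sketch also returns -errno when eventfd() fails, which is a choice made here and differs from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-ins for assign_irqfd(): they only record the fd. */
static int last_fd_seen = -1;

static int assign_ok(int fd, int gsi)
{
    (void)gsi;
    last_fd_seen = fd;
    return 0;
}

static int assign_fail(int fd, int gsi)
{
    (void)gsi;
    last_fd_seen = fd;
    return -EINVAL;
}

/* Create an eventfd and hand it to assign(); close it again on failure. */
static int irqfd_create(int (*assign)(int fd, int gsi), int gsi)
{
    int fd, r;

    assert(assign != NULL);

    fd = eventfd(0, 0);
    if (fd < 0)
        return -errno;

    r = assign(fd, gsi);
    if (r < 0) {
        close(fd);      /* do not leak the eventfd on the error path */
        return r;
    }

    return fd;
}

/* Helper for checking the result: non-zero if fd is an open descriptor. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

On success the caller owns the returned descriptor and is expected to close it later.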
Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
Gregory Haskins wrote:
> Something else to consider: eventfd supports calling eventfd_signal()
> from interrupt context, so even if the callback were not invoked from a
> preempt/irq off CS due to the wqh->lock, the context may still be a CS
> anyway. Now, userspace and vbus based injection do not need to worry
> about this, but I wonder if this is a desirable attribute for some other
> source (device-assignment perhaps)?

I'm no networking expert, but device assignment certainly wants to queue
an interrupt from an interrupt (basically it forwards the interrupt from
host to guest). Block is also simple enough to trigger the interrupt
from the completion context. For networking, we'd eventually want to do
the host-guest copy using a dma engine, and we'd want to inject the
interrupt from the dma engine's completion handler.

> If so, is there a way to design the lockless-injection code such that
> it's still friendly to irq-context, or is this a mode that we would never
> want to support anyway? I think this would be a good thing to have at
> least to maintain compatibility with the existing eventfd interface,
> which has its own advantages.

This has RCU painted all over it in a 1200-point font.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] Assign the correct pci id range to virtio_pci
> Or maybe
>
>   modprobe virtio-pci claim=0x10f2 claim=0x10f7

How about claim=0x10f2,0x10f7 instead so that it can be implemented
as a standard module array parameter?
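In the kernel, a comma-separated list like this is what module_param_array() handles, so the driver would not parse the string by hand. As a rough userspace sketch of the semantics only (parse_claim_ids(), claim_ids and MAX_CLAIM_IDS are hypothetical names, and the 16-entry bound is an assumption), the parsing amounts to:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CLAIM_IDS 16    /* assumed bound, for illustration only */

static unsigned short claim_ids[MAX_CLAIM_IDS];

/*
 * Parse a list such as "0x10f2,0x10f7" into ids[].
 * Returns the number of IDs stored, or -EINVAL on malformed input,
 * on a value that does not fit in 16 bits, or on more than max entries.
 */
static int parse_claim_ids(const char *arg, unsigned short *ids, size_t max)
{
    size_t n = 0;

    assert(arg != NULL && ids != NULL);

    if (*arg == '\0')
        return 0;

    for (;;) {
        char *end;
        unsigned long v;

        if (n == max)
            return -EINVAL;         /* more entries than the array holds */

        errno = 0;
        v = strtoul(arg, &end, 0);  /* base 0 accepts the 0x prefix */
        if (end == arg || errno != 0 || v > 0xffff)
            return -EINVAL;
        ids[n++] = (unsigned short)v;

        if (*end == '\0')
            break;
        if (*end != ',')
            return -EINVAL;
        arg = end + 1;
    }

    return (int)n;
}
```

A fixed upper bound on the number of entries is the main practical difference from repeating claim= on the command line.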
Re: [PATCH] Assign the correct pci id range to virtio_pci
Pantelis Koukousoulas wrote:
>> Or maybe
>>
>>   modprobe virtio-pci claim=0x10f2 claim=0x10f7
>
> How about claim=0x10f2,0x10f7 instead so that it can be implemented
> as a standard module array parameter?

Even better.

-- 
error compiling committee.c: too many arguments to function
kvm-kmod.git
This is a small git repository for the kvm external module kit. It
includes a submodule link to the kernel tree, so when you check out the
repository, it also checks out a kernel tree for a specific revision.
This way the various scripts and the kernel source are always synchronized.

  $ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
  $ cd kvm-kmod
  $ git submodule init
  $ git submodule update
  $ ./configure
  $ make sync
  $ make

Todo:
- add maint branches
- automatic 'make sync'? perhaps replace by symlinks?
- add release script

-- 
error compiling committee.c: too many arguments to function
[PATCH RFC 0/8] virtio: add guest MSI-X support
Add optional MSI-X support: use a vector per virtqueue with
fallback to a common vector and finally to regular interrupt.
Teach all drivers to use it.

I added 2 new virtio operations: request_vqs/free_vqs because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

---

Here's a draft set of patches for MSI-X support in the guest. It still
needs to be tested properly, and performance impact measured, but I
thought I'd share it here in the hope of getting some very early
feedback/flames.

Michael S. Tsirkin (8):
  virtio: add request_vqs/free_vqs virtio operations
  virtio_blk: add request_vqs/free_vqs calls
  virtio-rng: add request_vqs/free_vqs calls
  virtio_console: add request_vqs/free_vqs calls
  virtio_net: add request_vqs/free_vqs calls
  virtio_balloon: add request_vqs/free_vqs calls
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

 drivers/block/virtio_blk.c          |    9 ++-
 drivers/char/hw_random/virtio-rng.c |   10 ++-
 drivers/char/virtio_console.c       |    8 ++-
 drivers/net/virtio_net.c            |   16 +++-
 drivers/virtio/virtio_balloon.c     |    9 ++-
 drivers/virtio/virtio_pci.c         |  192 ++-
 include/linux/virtio_config.h       |   35 +++
 7 files changed, 247 insertions(+), 32 deletions(-)
[PATCH 1/8] virtio: add request_vqs/free_vqs operations
This adds 2 new optional virtio operations: request_vqs/free_vqs. They will
be used for MSI support, because MSI needs to know the total number of
vectors upfront.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 include/linux/virtio_config.h |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bf8ec28..e935670 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -49,6 +49,15 @@
  * @set_status: write the status byte
  *	vdev: the virtio_device
  *	status: the new status byte
+ * @request_vqs: request the specified number of virtqueues
+ *	vdev: the virtio_device
+ *	max_vqs: the max number of virtqueues we want
+ *      If supplied, must call before any virtqueues are instantiated.
+ *      To modify the max number of virtqueues after request_vqs has been
+ *      called, call free_vqs and then request_vqs with a new value.
+ * @free_vqs: cleanup resources allocated by request_vqs
+ *	vdev: the virtio_device
+ *      If supplied, must call after all virtqueues have been deleted.
  * @reset: reset the device
  *	vdev: the virtio device
  *	After this, status and feature negotiation must be done again
@@ -75,6 +84,8 @@ struct virtio_config_ops
 	u8 (*get_status)(struct virtio_device *vdev);
 	void (*set_status)(struct virtio_device *vdev, u8 status);
 	void (*reset)(struct virtio_device *vdev);
+	int (*request_vqs)(struct virtio_device *vdev, unsigned max_vqs);
+	void (*free_vqs)(struct virtio_device *vdev);
 	struct virtqueue *(*find_vq)(struct virtio_device *vdev,
 				     unsigned index,
 				     void (*callback)(struct virtqueue *));
@@ -126,5 +137,29 @@ static inline int virtio_config_buf(struct virtio_device *vdev,
 	vdev->config->get(vdev, offset, buf, len);
 	return 0;
 }
+
+/**
+ * virtio_request_vqs: request the specified number of virtqueues
+ * @vdev: the virtio_device
+ * @max_vqs: the max number of virtqueues we want
+ *
+ * For details, see documentation for request_vqs above. */
+static inline int virtio_request_vqs(struct virtio_device *vdev,
+				     unsigned max_vqs)
+{
+	return vdev->config->request_vqs ?
+		vdev->config->request_vqs(vdev, max_vqs) : 0;
+}
+
+/**
+ * free_vqs: cleanup resources allocated by virtio_request_vqs
+ * @vdev: the virtio_device
+ *
+ * For details, see documentation for free_vqs above. */
+static inline void virtio_free_vqs(struct virtio_device *vdev)
+{
+	if (vdev->config->free_vqs)
+		vdev->config->free_vqs(vdev);
+}
 #endif /* __KERNEL__ */
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.6.0.6
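The two wrappers make the new operations optional: a transport that leaves the pointers NULL gets a successful no-op. A small userspace sketch of that pattern is below. All demo_* and pci_like_* names are hypothetical, and the structures are stand-ins that only mirror the shape of virtio_config_ops, not its real contents.

```c
#include <assert.h>
#include <stddef.h>

struct demo_device;

/* Stand-in for the config ops table; both operations are optional. */
struct demo_config_ops {
    int  (*request_vqs)(struct demo_device *dev, unsigned max_vqs);
    void (*free_vqs)(struct demo_device *dev);
};

struct demo_device {
    const struct demo_config_ops *config;
    unsigned reserved_vqs;      /* bookkeeping for the example only */
};

/* Call the op if the transport provides it, otherwise report success. */
static int demo_request_vqs(struct demo_device *dev, unsigned max_vqs)
{
    assert(dev != NULL && dev->config != NULL);
    return dev->config->request_vqs ?
        dev->config->request_vqs(dev, max_vqs) : 0;
}

static void demo_free_vqs(struct demo_device *dev)
{
    assert(dev != NULL && dev->config != NULL);
    if (dev->config->free_vqs)
        dev->config->free_vqs(dev);
}

/* A transport that implements both operations. */
static int pci_like_request(struct demo_device *dev, unsigned max_vqs)
{
    dev->reserved_vqs = max_vqs;
    return 0;
}

static void pci_like_free(struct demo_device *dev)
{
    dev->reserved_vqs = 0;
}

static const struct demo_config_ops pci_like_ops = {
    pci_like_request, pci_like_free
};

/* A transport that implements neither. */
static const struct demo_config_ops legacy_ops = { NULL, NULL };

static struct demo_device pci_like_dev = { &pci_like_ops, 0 };
static struct demo_device legacy_dev   = { &legacy_ops, 0 };
```

Drivers can then call the wrappers unconditionally, which is what the per-driver patches in this series do.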
[PATCH 2/8] virtio_blk: add request_vqs/free_vqs calls
Add request_vqs/free_vqs calls to virtio_blk. These will be required for MSI
support.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/block/virtio_blk.c |    9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5d34764..523eddc 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -224,10 +224,14 @@ static int virtblk_probe(struct virtio_device *vdev)
 	sg_init_table(vblk->sg, vblk->sg_elems);
 
 	/* We expect one virtqueue, for output. */
+	err = virtio_request_vqs(vdev, 1);
+	if (err)
+		goto out_free_vblk;
+
 	vblk->vq = vdev->config->find_vq(vdev, 0, blk_done);
 	if (IS_ERR(vblk->vq)) {
 		err = PTR_ERR(vblk->vq);
-		goto out_free_vblk;
+		goto out_free_vqs;
 	}
 
 	vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
@@ -324,6 +328,8 @@ out_mempool:
 	mempool_destroy(vblk->pool);
 out_free_vq:
 	vdev->config->del_vq(vblk->vq);
+out_free_vqs:
+	virtio_free_vqs(vdev);
 out_free_vblk:
 	kfree(vblk);
 out:
@@ -345,6 +351,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 	put_disk(vblk->disk);
 	mempool_destroy(vblk->pool);
 	vdev->config->del_vq(vblk->vq);
+	virtio_free_vqs(vdev);
 	kfree(vblk);
 }
 
-- 
1.6.0.6
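The patch slots a new label into the existing goto-based unwind so that each failure undoes only the steps that already succeeded, in reverse order. A self-contained userspace sketch of that ordering follows. demo_probe(), struct demo_blk, demo_state and the fail_at knob are hypothetical, and the fields merely mark which steps have run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Marks which setup steps have completed; illustration only. */
struct demo_blk {
    int vqs_reserved;   /* stands in for virtio_request_vqs() */
    int vq_found;       /* stands in for find_vq() */
    void *pool;         /* stands in for the mempool */
};

static struct demo_blk demo_state;

/*
 * fail_at selects which step fails (0 = none), so that each unwind
 * path can be exercised.
 */
static int demo_probe(struct demo_blk *b, int fail_at)
{
    int err;

    assert(b != NULL);

    if (fail_at == 1) {
        err = -1;
        goto out;
    }
    b->vqs_reserved = 1;

    if (fail_at == 2) {
        err = -2;
        goto out_free_vqs;
    }
    b->vq_found = 1;

    b->pool = (fail_at == 3) ? NULL : malloc(64);
    if (!b->pool) {
        err = -3;
        goto out_free_vq;
    }

    return 0;

    /* Labels run top to bottom, undoing steps in reverse order. */
out_free_vq:
    b->vq_found = 0;
out_free_vqs:
    b->vqs_reserved = 0;
out:
    return err;
}
```

The label for the newest resource sits just below the label of the step acquired after it, which is where out_free_vqs lands in the patch.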
[PATCH 3/8] virtio-rng: add request_vqs/free_vqs calls
Add request_vqs/free_vqs calls to virtio-rng. These will be required for MSI
support.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/char/hw_random/virtio-rng.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index d0e563e..d43a6cd 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -94,13 +94,20 @@ static int virtrng_probe(struct virtio_device *vdev)
 	int err;
 
 	/* We expect a single virtqueue. */
+	err = virtio_request_vqs(vdev, 1);
+	if (err)
+		return err;
+
 	vq = vdev->config->find_vq(vdev, 0, random_recv_done);
-	if (IS_ERR(vq))
+	if (IS_ERR(vq)) {
+		virtio_free_vqs(vdev);
 		return PTR_ERR(vq);
+	}
 
 	err = hwrng_register(&virtio_hwrng);
 	if (err) {
 		vdev->config->del_vq(vq);
+		virtio_free_vqs(vdev);
 		return err;
 	}
 
@@ -113,6 +120,7 @@ static void virtrng_remove(struct virtio_device *vdev)
 	vdev->config->reset(vdev);
 	hwrng_unregister(&virtio_hwrng);
 	vdev->config->del_vq(vq);
+	virtio_free_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
-- 
1.6.0.6
[PATCH 7/8] virtio_pci: split up vp_interrupt
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_pci.c |   45 +-
 1 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 330aacb..151538c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
 	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_driver *drv;
+	drv = container_of(vp_dev->vdev.dev.driver,
+			   struct virtio_driver, driver);
+
+	if (drv && drv->config_changed)
+		drv->config_changed(&vp_dev->vdev);
+	return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+	struct virtio_pci_device *vp_dev = opaque;
+	struct virtio_pci_vq_info *info;
+	irqreturn_t ret = IRQ_NONE;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vp_dev->lock, flags);
+	list_for_each_entry(info, &vp_dev->virtqueues, node) {
+		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+	spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+	return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
 	struct virtio_pci_device *vp_dev = opaque;
-	struct virtio_pci_vq_info *info;
-	irqreturn_t ret = IRQ_NONE;
-	unsigned long flags;
 	u8 isr;
 
 	/* reading the ISR has the effect of also clearing it so it's very
@@ -187,14 +215,11 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 		return IRQ_NONE;
 
 	/* Configuration change?  Tell driver if it wants to know. */
-	if (isr & VIRTIO_PCI_ISR_CONFIG) {
-		struct virtio_driver *drv;
-		drv = container_of(vp_dev->vdev.dev.driver,
-				   struct virtio_driver, driver);
+	if (isr & VIRTIO_PCI_ISR_CONFIG)
+		vp_config_changed(irq, opaque);
 
-		if (drv && drv->config_changed)
-			drv->config_changed(&vp_dev->vdev);
-	}
+	return vp_vring_interrupt(irq, opaque);
+}
 
 	spin_lock_irqsave(&vp_dev->lock, flags);
 	list_for_each_entry(info, &vp_dev->virtqueues, node) {
-- 
1.6.0.6
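The split leaves vp_interrupt as a thin dispatcher: read and clear the ISR, run the config handler if its bit is set, then scan the queues. A userspace model of that control flow is sketched below. Every demo_* name and the bit values are assumptions made for the example, and the device is a plain struct rather than real hardware.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ISR_QUEUE   0x1    /* assumed bit layout, illustration only */
#define DEMO_ISR_CONFIG  0x2
#define DEMO_NUM_QUEUES  4

enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

struct demo_dev {
    unsigned char isr;                      /* cleared when read */
    int queue_pending[DEMO_NUM_QUEUES];     /* non-zero: queue has work */
    int config_events;
    int queue_events;
};

static struct demo_dev demo;

static unsigned char demo_read_isr(struct demo_dev *d)
{
    unsigned char v = d->isr;

    d->isr = 0;
    return v;
}

/* Config-change handler, usable on its own as a dedicated vector. */
static enum demo_irqreturn demo_config_changed(struct demo_dev *d)
{
    d->config_events++;
    return DEMO_IRQ_HANDLED;
}

/* Queue handler: scans every queue and reports whether any had work. */
static enum demo_irqreturn demo_vring_interrupt(struct demo_dev *d)
{
    enum demo_irqreturn ret = DEMO_IRQ_NONE;
    size_t i;

    for (i = 0; i < DEMO_NUM_QUEUES; i++) {
        if (d->queue_pending[i]) {
            d->queue_pending[i] = 0;
            d->queue_events++;
            ret = DEMO_IRQ_HANDLED;
        }
    }
    return ret;
}

/* Shared-interrupt path: dispatch to the two handlers above. */
static enum demo_irqreturn demo_interrupt(struct demo_dev *d)
{
    unsigned char isr;

    assert(d != NULL);

    isr = demo_read_isr(d);
    if (!isr)
        return DEMO_IRQ_NONE;   /* not our interrupt */

    if (isr & DEMO_ISR_CONFIG)
        demo_config_changed(d);

    return demo_vring_interrupt(d);
}
```

As in the patch, the dispatcher returns the result of the queue scan, so a config-only interrupt is reported through the scan's return value.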
[PATCH 8/8] virtio_pci: optional MSI-X support
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors are available.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/virtio/virtio_pci.c |  147 ++-
 1 files changed, 132 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 151538c..20bdc8c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,8 +42,33 @@ struct virtio_pci_device
 	/* a list of queues so we can dispatch IRQs */
 	spinlock_t lock;
 	struct list_head virtqueues;
+
+	/* MSI-X support */
+	struct msix_entry *msix_entries;
+	/* Name strings for interrupts. This size should be enough,
+	 * and I'm too lazy to allocate each name separately. */
+	char (*msix_names)[256];
+	/* Number of vectors configured at startup (excludes per-virtqueue
+	 * vectors if any) */
+	unsigned msix_preset_vectors;
+	/* Number of per-virtqueue vectors if any. */
+	unsigned msix_per_vq_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues Thus, we need at least 2 vectors for MSI. */
+enum {
+	VP_MSIX_CONFIG_VECTOR = 0,
+	VP_MSIX_VQ_VECTOR = 1,
+	VP_MSIX_MIN_VECTORS = 2
 };
 
+static inline int vq_vector(int index)
+{
+	return index + VP_MSIX_VQ_VECTOR;
+}
+
 struct virtio_pci_vq_info
 {
 	/* the actual virtqueue */
@@ -221,14 +246,92 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
 	return vp_vring_interrupt(irq, opaque);
 }
 
-	spin_lock_irqsave(&vp_dev->lock, flags);
-	list_for_each_entry(info, &vp_dev->virtqueues, node) {
-		if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-			ret = IRQ_HANDLED;
+/* the config->free_vqs() implementation */
+static void vp_free_vqs(struct virtio_device *vdev) {
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int i;
+
+	for (i = 0; i < vp_dev->msix_preset_vectors; ++i)
+		free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+
+	if (!vp_dev->msix_preset_vectors)
+		free_irq(vp_dev->pci_dev->irq, vp_dev);
+}
+
+/* the config->request_vqs() implementation */
+static int vp_request_vqs(struct virtio_device *vdev, unsigned max_vqs) {
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	const char *name = dev_name(&vp_dev->vdev.dev);
+	unsigned i, vectors;
+	int err = -ENOMEM;
+
+	/* We need at most one vector per queue and one for config changes */
+	vectors = vq_vector(max_vqs);
+	vp_dev->msix_entries = kmalloc(vectors * sizeof *vp_dev->msix_entries,
+				       GFP_KERNEL);
+	if (!vp_dev->msix_entries)
+		goto error_entries;
+	vp_dev->msix_names = kmalloc(vectors * sizeof *vp_dev->msix_names,
+				     GFP_KERNEL);
+	if (!vp_dev->msix_names)
+		goto error_names;
+
+	snprintf(vp_dev->msix_names[VP_MSIX_CONFIG_VECTOR],
+		 sizeof *vp_dev->msix_names, "%s-config", name);
+	for (i = 0; i < max_vqs; ++i)
+		snprintf(vp_dev->msix_names[vq_vector(i)],
+			 sizeof *vp_dev->msix_names, "%s-vq-%d", name, i);
+
+	vp_dev->msix_preset_vectors = 1;
+	vp_dev->msix_per_vq_vectors = max_vqs;
+	for (;;) {
+		err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries,
+				      vectors);
+		/* Error out if not enough vectors */
+		if (err > 0 && err < VP_MSIX_MIN_VECTORS)
+			err = -EBUSY;
+		if (err <= 0)
+			break;
+		/* Not enough vectors for all queues. Retry, disabling
+		 * per-queue interrupts */
+		vectors = VP_MSIX_MIN_VECTORS;
+		vp_dev->msix_preset_vectors = VP_MSIX_MIN_VECTORS;
+		vp_dev->msix_per_vq_vectors = 0;
+		snprintf(vp_dev->msix_names[VP_MSIX_VQ_VECTOR],
+			 sizeof *vp_dev->msix_names, "%s-vq", name);
 	}
-	spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-	return ret;
+	if (err) {
+		/* Can't allocate enough MSI-X vectors, use regular interrupt */
+		vp_dev->msix_preset_vectors = 0;
+		vp_dev->msix_per_vq_vectors = 0;
+		/* Register a handler for the queue with the PCI device's
+		 * interrupt */
+		err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
+				  IRQF_SHARED, name, vp_dev);
+		if (err)
+			goto error_irq;
+	}
+	for (i = 0; i < vp_dev->msix_preset_vectors; ++i) {
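The retry loop in vp_request_vqs() picks one of three interrupt modes depending on how many vectors the host grants. The sketch below models only that decision in userspace. demo_enable_msix() is a hypothetical stand-in that follows the return convention the loop appears to rely on (0 on success, a positive count of available vectors when the request is too large, negative on error), and all demo_* names and enum values are invented for the example.

```c
#include <assert.h>

#define DEMO_MSIX_MIN_VECTORS 2     /* one for config, one for all queues */

enum demo_irq_mode {
    DEMO_MSIX_PER_VQ,   /* config vector plus one vector per virtqueue */
    DEMO_MSIX_SHARED,   /* config vector plus one vector for all queues */
    DEMO_LEGACY_IRQ     /* regular shared PCI interrupt */
};

/*
 * Hypothetical model of the enable call: 0 on success, the number of
 * available vectors if fewer than wanted, negative if none at all.
 */
static int demo_enable_msix(int available, int wanted)
{
    if (available <= 0)
        return -1;
    if (wanted <= available)
        return 0;
    return available;
}

static enum demo_irq_mode demo_pick_irq_mode(int available, unsigned max_vqs)
{
    int vectors = (int)max_vqs + 1;     /* per-queue vectors plus config */
    enum demo_irq_mode mode = DEMO_MSIX_PER_VQ;
    int err;

    assert(available >= 0);

    for (;;) {
        err = demo_enable_msix(available, vectors);
        /* Fewer than the minimum is treated as a hard failure. */
        if (err > 0 && err < DEMO_MSIX_MIN_VECTORS)
            err = -1;
        if (err <= 0)
            break;
        /* Not enough for every queue: retry with the minimum. */
        vectors = DEMO_MSIX_MIN_VECTORS;
        mode = DEMO_MSIX_SHARED;
    }

    return err ? DEMO_LEGACY_IRQ : mode;
}
```

The loop runs at most twice, because the second attempt asks for exactly the minimum that the first attempt reported as available.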
[PATCH 5/8] virtio_net: add request_vqs/free_vqs calls
Add request_vqs/free_vqs calls to virtio_net. These will be required for MSI
support.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/net/virtio_net.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9c82a39..fbe8a70 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -841,6 +841,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	int err;
 	struct net_device *dev;
 	struct virtnet_info *vi;
+	bool control_vq;
 
 	/* Allocate ourselves a network device with room for our info */
 	dev = alloc_etherdev(sizeof(struct virtnet_info));
@@ -901,11 +902,17 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
 		vi->mergeable_rx_bufs = true;
 
-	/* We expect two virtqueues, receive then send. */
+	/* We expect either two or three virtqueues: receive, send and
+	 * optionally control. */
+	control_vq = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
+	err = virtio_request_vqs(vdev, control_vq ? 3 : 2);
+	if (err)
+		goto free;
+
 	vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done);
 	if (IS_ERR(vi->rvq)) {
 		err = PTR_ERR(vi->rvq);
-		goto free;
+		goto free_vqs;
 	}
 
 	vi->svq = vdev->config->find_vq(vdev, 1, skb_xmit_done);
@@ -914,7 +921,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_recv;
 	}
 
-	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
+	if (control_vq) {
 		vi->cvq = vdev->config->find_vq(vdev, 2, NULL);
 		if (IS_ERR(vi->cvq)) {
 			err = PTR_ERR(vi->svq);
@@ -965,6 +972,8 @@ free_send:
 	vdev->config->del_vq(vi->svq);
 free_recv:
 	vdev->config->del_vq(vi->rvq);
+free_vqs:
+	virtio_free_vqs(vi->vdev);
 free:
 	free_netdev(dev);
 	return err;
@@ -994,6 +1003,7 @@ static void virtnet_remove(struct virtio_device *vdev)
 	vdev->config->del_vq(vi->rvq);
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
 		vdev->config->del_vq(vi->cvq);
+	virtio_free_vqs(vi->vdev);
 	unregister_netdev(vi->dev);
 
 	while (vi->pages)
-- 
1.6.0.6
[PATCH 2/2] KVM: Enable snooping control for supported hardware
Memory aliases with different memory types are a problem for the guest. For a guest without an assigned device, the memory type of guest memory is always the same as the host's (WB). With an assigned device, however, some part of memory may be used for DMA and then set to an uncacheable memory type (UC/WC), which conflicts with the host memory type and is a potential issue.

Snooping control can guarantee the cache correctness of memory that goes through the DMA engine of VT-d.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |    2 ++
 arch/x86/kvm/vmx.c              |   19 +++++++++++++++++--
 virt/kvm/iommu.c                |   27 ++++++++++++++++++++++++---
 3 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b6d01a4..b972889 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -395,6 +395,8 @@ struct kvm_arch{
 	struct list_head active_mmu_pages;
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
+#define KVM_IOMMU_CACHE_COHERENCY	0x1
+	int iommu_flags;
 	struct kvm_pic *vpic;
 	struct kvm_ioapic *vioapic;
 	struct kvm_pit *vpit;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7c4f5a3..79fc401 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3682,11 +3682,26 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	u64 ret;
 
+	/* For VT-d and EPT combination
+	 * 1. MMIO: always map as UC
+	 * 2. EPT with VT-d:
+	 *   a. VT-d without snooping control feature: can't guarantee the
+	 *	result, try to trust guest.
+	 *   b. VT-d with snooping control feature: snooping control feature of
+	 *	VT-d engine can guarantee the cache correctness. Just set it
+	 *	to WB to keep consistent with host. So the same as item 3.
+	 * 3. EPT without VT-d: always map as WB and set IGMT=1 to keep
+	 *    consistent with host MTRR
+	 */
 	if (is_mmio)
 		ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
+	else if (vcpu->kvm->arch.iommu_domain &&
+		!(vcpu->kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY))
+		ret = kvm_get_guest_memory_type(vcpu, gfn) <<
+		      VMX_EPT_MT_EPTE_SHIFT;
 	else
-		ret = (kvm_get_guest_memory_type(vcpu, gfn) <<
-			VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IGMT_BIT;
+		ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
+			| VMX_EPT_IGMT_BIT;
 
 	return ret;
 }
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 4c40375..1514758 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -39,11 +39,16 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 	pfn_t pfn;
 	int i, r = 0;
 	struct iommu_domain *domain = kvm->arch.iommu_domain;
+	int flags;
 
 	/* check if iommu exists and in use */
 	if (!domain)
 		return 0;
 
+	flags = IOMMU_READ | IOMMU_WRITE;
+	if (kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY)
+		flags |= IOMMU_CACHE;
+
 	for (i = 0; i < npages; i++) {
 		/* check if already mapped */
 		if (iommu_iova_to_phys(domain, gfn_to_gpa(gfn)))
@@ -53,8 +58,7 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 		r = iommu_map_range(domain,
 				    gfn_to_gpa(gfn),
 				    pfn_to_hpa(pfn),
-				    PAGE_SIZE,
-				    IOMMU_READ | IOMMU_WRITE);
+				    PAGE_SIZE, flags);
 		if (r) {
 			printk(KERN_ERR "kvm_iommu_map_address:"
 			       "iommu failed to map pfn=%lx\n", pfn);
@@ -88,7 +92,7 @@ int kvm_assign_device(struct kvm *kvm,
 {
 	struct pci_dev *pdev = NULL;
 	struct iommu_domain *domain = kvm->arch.iommu_domain;
-	int r;
+	int r, last_flags;
 
 	/* check if iommu exists and in use */
 	if (!domain)
@@ -107,12 +111,29 @@ int kvm_assign_device(struct kvm *kvm,
 		return r;
 	}
 
+	last_flags = kvm->arch.iommu_flags;
+	if (iommu_domain_has_cap(kvm->arch.iommu_domain,
+				 IOMMU_CAP_CACHE_COHERENCY))
+		kvm->arch.iommu_flags |= KVM_IOMMU_CACHE_COHERENCY;
+
+	/* Check if need to update IOMMU page table for guest memory */
+	if ((last_flags ^ kvm->arch.iommu_flags) ==
+			KVM_IOMMU_CACHE_COHERENCY) {
+		kvm_iommu_unmap_memslots(kvm);
+		r = kvm_iommu_map_memslots(kvm);
+		if (r)
+			goto out_unmap;
+	}
+
 	printk(KERN_DEBUG "assign device: host bdf = %x:%x:%x\n",
 		assigned_dev->host_busnr,
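The three-way decision described in the vmx_get_mt_mask() comment can be tried outside the kernel. The following is only a sketch of that decision table, not the kernel code: pick_ept_mt() and the MT_*/EPT_* names are made up for illustration, and the constant values are assumed to match the usual x86 encodings (UC = 0, WB = 6, memory type field at bit 3, IGMT at bit 6).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define MT_UNCACHABLE 0x0u
#define MT_WRBACK     0x6u
#define EPT_MT_SHIFT  3
#define EPT_IGMT_BIT  (1ull << 6)

/*
 * Hypothetical stand-alone version of the memory type choice:
 *  - MMIO is always UC;
 *  - with an IOMMU domain but no snooping control, trust the guest's type;
 *  - otherwise force WB and tell EPT to ignore the guest memory type.
 */
static uint64_t pick_ept_mt(bool is_mmio, bool has_iommu_domain,
                            bool snoop_control, uint8_t guest_type)
{
    if (is_mmio)
        return (uint64_t)MT_UNCACHABLE << EPT_MT_SHIFT;
    if (has_iommu_domain && !snoop_control)
        return (uint64_t)guest_type << EPT_MT_SHIFT;
    return ((uint64_t)MT_WRBACK << EPT_MT_SHIFT) | EPT_IGMT_BIT;
}
```

With these assumptions, a guest without an assigned device and a guest whose IOMMU supports snooping control both end up with the same WB + IGMT value, which is the "same as item 3" remark in the comment.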
[PATCH 1/2] KVM: Replace get_mt_mask_shift with get_mt_mask
shadow_mt_mask is out of date; it is now used only as a flag to indicate whether TDP is enabled. Get rid of it and use tdp_enabled instead.

Also put the memory type logic in kvm_x86_ops->get_mt_mask().

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |    5 +++--
 arch/x86/kvm/mmu.c              |   21 ++++++---------------
 arch/x86/kvm/svm.c              |    4 ++--
 arch/x86/kvm/vmx.c              |   17 ++++++++++++-----
 arch/x86/kvm/x86.c              |    2 +-
 5 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cb306cf..b6d01a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -523,7 +523,7 @@ struct kvm_x86_ops {
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
-	int (*get_mt_mask_shift)(void);
+	u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
@@ -537,7 +537,7 @@ int kvm_mmu_setup(struct kvm_vcpu *vcpu);
 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
 void kvm_mmu_set_base_ptes(u64 base_pte);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask);
+		u64 dirty_mask, u64 nx_mask, u64 x_mask);
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
@@ -551,6 +551,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 			  const void *val, int bytes);
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 		  gpa_t addr, unsigned long *ret);
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 extern bool tdp_enabled;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b79afa..a875e02 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -178,7 +178,6 @@ static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
-static u64 __read_mostly shadow_mt_mask;
 
 static inline u64 rsvd_bits(int s, int e)
 {
@@ -199,14 +198,13 @@ void kvm_mmu_set_base_ptes(u64 base_pte)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask)
+		u64 dirty_mask, u64 nx_mask, u64 x_mask)
 {
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
 	shadow_dirty_mask = dirty_mask;
 	shadow_nx_mask = nx_mask;
 	shadow_x_mask = x_mask;
-	shadow_mt_mask = mt_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -1608,7 +1606,7 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
 	return mtrr_state->def_type;
 }
 
-static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
 	u8 mtrr;
 
@@ -1618,6 +1616,7 @@ static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 		mtrr = MTRR_TYPE_WRBACK;
 	return mtrr;
 }
+EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
 
 static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -1670,7 +1669,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 {
 	u64 spte;
 	int ret = 0;
-	u64 mt_mask = shadow_mt_mask;
 
 	/*
 	 * We don't set the accessed bit, since we sometimes want to see
@@ -1690,16 +1688,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		spte |= shadow_user_mask;
 	if (largepage)
 		spte |= PT_PAGE_SIZE_MASK;
-	if (mt_mask) {
-		if (!kvm_is_mmio_pfn(pfn)) {
-			mt_mask = get_memory_type(vcpu, gfn) <<
-				kvm_x86_ops->get_mt_mask_shift();
-			mt_mask |= VMX_EPT_IGMT_BIT;
-		} else
-			mt_mask = MTRR_TYPE_UNCACHABLE <<
-				kvm_x86_ops->get_mt_mask_shift();
-		spte |= mt_mask;
-	}
+	if (tdp_enabled)
+		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
+			kvm_is_mmio_pfn(pfn));
 
 	spte |= (u64)pfn << PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 053f3c5..2bbe1de 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2617,7 +2617,7 @@ static int get_npt_level(void)
 #endif
 }
 
-static int svm_get_mt_mask_shift(void)
+static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	return 0;
 }
@@ -2678,7 +2678,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.set_tss_addr =
Re: kvm-userspace broken?
Oliver Rath wrote:
> Hi List,
>
> maybe I missed some announcements, but
> git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
> gives no response:
>
> kvm-userspace # git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
> Initialized empty Git repository in /home/oliver/kvm-userspace/kvm-userspace/.git/
> fatal: The remote end hung up unexpectedly
>
> What's up there?

kvm-userspace.git has been retired; it's now playing golf in git://git.kernel.org/pub/scm/virt/kvm/retired/kvm-userspace.git. Use git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git instead.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] kvm-userspace: Make PC speaker emulation aware of in-kernel PIT
On Sunday 26 April 2009 03:59:11 Anthony Liguori wrote:
> Jan Kiszka wrote:
> > Anthony Liguori wrote:
> > > Marcelo Tosatti wrote:
> > > > Jan,
> > > >
> > > > While the patch itself looks fine, IMO it would be better to move all of the timer handling to userspace, except the performance critical parts, since most of it is generic. Either periodic or one-shot timer, with:
> > >
> > > The reason for having the PIT in-kernel is not performance. The PIT is not performance sensitive.
> >
> > I think that depends. Some OSes (in some configurations) use the PIT counter as clock source and/or program it regularly in one-shot mode. An aging use case, but still a valid one.
>
> I can't find the thread, but this has been discussed at length before. The justification has always been for time drift correction. If you crunch the numbers, even at 1024 Hz, there just aren't enough exits to really make a difference from a performance perspective.

I agree too. When I moved the PIT into the kernel, the direct reason was that the timer in KVM was crappy at the time, mainly due to interrupt handling. The most obvious problem I remember is that the userspace PIT injected one interrupt after another, regardless of whether the previous interrupt had already been delivered to the guest, so some interrupts were lost and the guest's timer became slower and slower. We decided to depend on an in-kernel PIT to provide a stable time source, so we moved the whole PIT into the kernel (rather than trying to provide an interface to fix it, as Xen did at the time, which seemed much more complex).

Now the KVM timer is much more mature and stable than it was then, so I think it's OK to try to separate the timer interrupt logic from the I/O logic now (though I also think it will still take some time to get an elegant interface...).

--
regards
Yang, Sheng

> Just to state it more clearly, if you assume an additional 5us to drop to userspace (which is absurdly high, but let's stick with it), 1024 exits per second comes out to about 5ms which is only 0.5% in terms of CPU consumption.
>
> The APIC is quite a bit more understandable because especially with SMP, you can generate a very high number of interrupts per second and taking a drop to userspace for every EOI can start to matter with exit rates in the hundreds of thousands.
>
> > > It's because it was easier to do interrupt catch-up by pushing the PIT into the kernel which IMHO was the wrong path to go down.
> >
> > Pushing the emulation of port 0x61 into the kernel was a mistake we now have to deal with. I'm not that sure about the PIT itself.
>
> I agree re: port 0x61. I'm just saying that there is no point in moving just the non performance critical components to userspace as Marcelo suggests because the whole thing is non performance critical.
>
> Regards,
>
> Anthony Liguori
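The back-of-the-envelope figure quoted in this thread (1024 exits per second at 5 us each) is easy to reproduce. The helper below is a hypothetical name used only to restate that arithmetic: overhead as a fraction of one CPU is the exit rate multiplied by the per-exit cost.

```c
/*
 * Fraction of one CPU consumed by exits, given an exit rate (per second)
 * and an assumed extra cost per exit in microseconds.
 */
static double exit_overhead_fraction(double exits_per_sec, double cost_us)
{
    return exits_per_sec * cost_us / 1e6;
}
```

For 1024 exits per second at 5 us this gives 0.00512, i.e. roughly 5 ms per second or about 0.5% of a CPU. Plugging in an exit rate in the hundreds of thousands, say 200000 per second as an illustrative number, gives 1.0 at the same assumed cost, which is why the EOI path is a different story from the PIT.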
qemu/hw/device-assignment: questions about msix_table_page
Sheng, Marcelo,
I've been reading code in qemu/hw/device-assignment.c, and I have a couple of questions about the MSI-X implementation:

1. What is the reason that msix_table_page is allocated with mmap and not with e.g. malloc?

2. msix_table_page has the guest view of the MSI-X table for the device. However, even this memory isn't mapped into the guest directly; instead msix_mmio_read/msix_mmio_write perform the write in qemu. Won't it be possible to map this page directly into guest memory, reducing the overhead for table writes?

Could you shed light on this for me please?

Thanks,

(Resending with a sane subject/reply-to address. Sorry about multiple copies.)

--
MST
Re:
On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> Sheng, Marcelo,
> I've been reading code in qemu/hw/device-assignment.c, and I have a couple of questions about the MSI-X implementation:

Hi Michael

> 1. What is the reason that msix_table_page is allocated with mmap and not with e.g. malloc?

msix_table_page is a page, and mmap allocates memory on a page boundary, so I use it.

> 2. msix_table_page has the guest view of the MSI-X table for the device. However, even this memory isn't mapped into the guest directly; instead msix_mmio_read/msix_mmio_write perform the write in qemu. Won't it be possible to map this page directly into guest memory, reducing the overhead for table writes?

First, Linux configures the real MSI-X table in the device, which is out of our scope. KVM accepts the interrupt from Linux, then injects it into the guest according to the guest's MSI-X table settings. So KVM needs to know about modifications to the page. For example, the MSI-X table has a mask bit which can be written by the guest at any time (this bit hasn't been implemented yet, but should be soon), and then we should mask the corresponding vector in the real MSI-X table; the guest may also modify the MSI address/data, which likewise has to be intercepted by KVM and used to update our knowledge of the guest. So we can't pass the modification through.

If the guest could write to the real device MSI-X table directly, it would cause chaos in interrupt delivery, because what the guest sees is totally different from what the host sees...

--
regards
Yang, Sheng

> Could you shed light on this for me please?
>
> Thanks,
Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface
Avi Kivity wrote:
> Gregory Haskins wrote:
> > > > This allows an eventfd to be registered as an irq source with a guest. Any signaling operation on the eventfd (via userspace or kernel) will inject the registered GSI at the next available window.
> > > >
> > > > +struct kvm_irqfd {
> > > > +	__u32 fd;
> > > > +	__u32 gsi;
> > > > +};
> > > > +
> > >
> > > I think it's better to have ioctl create and return the fd. This way we aren't tied to eventfd (though it makes a lot of sense to use it).
> >
> > I don't mind either way, but I am not sure it buys us much as the one driving the fd would need to understand if the interface is eventfd-esque or something else anyway. Let me know if you still want to see this changed.
>
> Sure, the interface remains the same (write 8 bytes), but the implementation can change. For example, we can implement it to work from interrupt context, once we hack the locking appropriately.

I was thinking more along the lines of eventfd_signal(). AIO and vbus currently use this interface, as opposed to the more polymorphic f_ops->write().

> > > > +static void
> > > > +irqfd_inject(struct work_struct *work)
> > > > +{
> > > > +	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
> > > > +	struct kvm *kvm = irqfd->kvm;
> > > > +
> > > > +	mutex_lock(&kvm->lock);
> > > > +	kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
> > >
> > > Need to lower the irq too (though irqfd only supports edge triggered interrupts).
> >
> > Should I just do back-to-back 1+0 inside the same lock?
>
> Yes. Might be nice to add a kvm_toggle_irq(), but let's leave that until later.

Ok.

> > > One day we'll have lockless injection and we'll want to drop this. I guess if we create the fd ourselves we can make it work, but I don't see how we can do this with eventfd.
> >
> > Hmm...this is a good point. There probably is no way to use eventfd off the shelf in a way that doesn't cause this callback to be in a critical section. Should we just worry about switching away from eventfd when this occurs, or should I implement a custom anon-fd now?
>
> I'd just go with eventfd, and switch when it becomes relevant. As long as the kernel allocates the fd, we're free to do as we like.

Sounds good.

-Greg
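For readers unfamiliar with the "write 8 bytes" interface mentioned in this thread, the userspace side of an eventfd looks roughly like the sketch below. It shows only the generic eventfd(2) semantics; it does not show the proposed KVM irqfd ioctl, and signal_eventfd() and drain_eventfd() are hypothetical helper names.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal an eventfd by adding `count` to its 64-bit counter; 0 on success. */
static int signal_eventfd(int fd, uint64_t count)
{
    return write(fd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 0 : -1;
}

/* Read the accumulated counter (which resets it to zero); 0 on success. */
static int drain_eventfd(int fd, uint64_t *count)
{
    return read(fd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A descriptor created with eventfd(0, 0) accumulates signals: two writes of 1 and 2 followed by one read yield 3. That coalescing fits the remark in the thread that irqfd only supports edge-triggered interrupts.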
Re: qemu/hw/device-assignment: questions about msix_table_page
On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > Sheng, Marcelo,
> > I've been reading code in qemu/hw/device-assignment.c, and I have a couple of questions about the MSI-X implementation:
>
> Hi Michael
>
> > 1. What is the reason that msix_table_page is allocated with mmap and not with e.g. malloc?
>
> msix_table_page is a page, and mmap allocates memory on a page boundary, so I use it.

Just wondering, would e.g. posix_memalign work here as well?

> > 2. msix_table_page has the guest view of the MSI-X table for the device. However, even this memory isn't mapped into the guest directly; instead msix_mmio_read/msix_mmio_write perform the write in qemu. Won't it be possible to map this page directly into guest memory, reducing the overhead for table writes?
>
> First, Linux configures the real MSI-X table in the device, which is out of our scope. KVM accepts the interrupt from Linux, then injects it into the guest according to the guest's MSI-X table settings. So KVM needs to know about modifications to the page. For example, the MSI-X table has a mask bit which can be written by the guest at any time (this bit hasn't been implemented yet, but should be soon), and then we should mask the corresponding vector in the real MSI-X table; the guest may also modify the MSI address/data, which likewise has to be intercepted by KVM and used to update our knowledge of the guest. So we can't pass the modification through.

Right, I see that. However all msix_mmio_write does is a memcpy. So what I don't understand yet, what causes the real MSI-X table to be modified? Where's that code?

> If the guest could write to the real device MSI-X table directly, it would cause chaos in interrupt delivery, because what the guest sees is totally different from what the host sees...

Obviously.

Thanks,

--
MST
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
On Monday 27 April 2009 14:31:36, Michael S. Tsirkin wrote:
> Add optional MSI-X support: use a vector per virtqueue with fallback to a common vector and finally to regular interrupt. Teach all drivers to use it.
>
> I added 2 new virtio operations: request_vqs/free_vqs because MSI needs to know the total number of vectors upfront.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com

I don't know if that is feasible for MSI, but the transport (virtio_pci) should already know the number of virtqueues, which should match the number of vectors, no?

In fact, the transport has to have a way of getting the number of virtqueues because find_vq returns ENOENT on invalid index numbers.

Christian
Re: qemu/hw/device-assignment: questions about msix_table_page
On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> > On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > > Sheng, Marcelo,
> > > I've been reading code in qemu/hw/device-assignment.c, and I have a couple of questions about the MSI-X implementation:
> >
> > Hi Michael
> >
> > > 1. What is the reason that msix_table_page is allocated with mmap and not with e.g. malloc?
> >
> > msix_table_page is a page, and mmap allocates memory on a page boundary, so I use it.
>
> Just wondering, would e.g. posix_memalign work here as well?

Um, I think it should work too.

> > > 2. msix_table_page has the guest view of the MSI-X table for the device. However, even this memory isn't mapped into the guest directly; instead msix_mmio_read/msix_mmio_write perform the write in qemu. Won't it be possible to map this page directly into guest memory, reducing the overhead for table writes?
> >
> > First, Linux configures the real MSI-X table in the device, which is out of our scope. KVM accepts the interrupt from Linux, then injects it into the guest according to the guest's MSI-X table settings. So KVM needs to know about modifications to the page. For example, the MSI-X table has a mask bit which can be written by the guest at any time (this bit hasn't been implemented yet, but should be soon), and then we should mask the corresponding vector in the real MSI-X table; the guest may also modify the MSI address/data, which likewise has to be intercepted by KVM and used to update our knowledge of the guest. So we can't pass the modification through.
>
> Right, I see that. However all msix_mmio_write does is a memcpy. So what I don't understand yet, what causes the real MSI-X table to be modified? Where's that code?

Dynamic changes aren't allowed yet... :(

For now, please refer to assigned_device_update_msix_mmio in qemu/hw/device-assignment.c. It scans the MSI-X page and transfers it to KVM through an ioctl. For the kernel part, please refer to pci_enable_msix().

Both userspace and the kernel still need changes to support the mask/unmask feature. That's still on the TODO list (and I hope it can make the 2.6.31 merge window).

--
regards
Yang, Sheng

> > If the guest could write to the real device MSI-X table directly, it would cause chaos in interrupt delivery, because what the guest sees is totally different from what the host sees...
>
> Obviously.
>
> Thanks,
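Since posix_memalign came up as an alternative for getting a page-aligned buffer, here is a minimal sketch of that variant. alloc_aligned_page() is a hypothetical helper, not a function in qemu. Unlike an anonymous mmap, the memory is not zeroed by the allocator, so the sketch clears it explicitly.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate one zeroed page aligned to the page size; release with free(). */
static void *alloc_aligned_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (page <= 0 || posix_memalign(&p, (size_t)page, (size_t)page) != 0)
        return NULL;
    memset(p, 0, (size_t)page);
    return p;
}
```

Either approach yields a page-aligned buffer; the practical difference is that this one comes from the heap and is released with free() rather than munmap().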
Re: kvm-kmod.git
Avi Kivity wrote:
> This is a small git repository for the kvm external module kit. It includes a submodule link to the kernel tree, so when you check out the repository, it also checks out a kernel tree for a specific revision. This way the various scripts and the kernel source are always synchronized.
>
> $ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
> $ git submodule init
> $ git submodule update
> $ ./configure
> $ make sync
> $ make
>
> Todo:
> - add maint branches
> - automatic 'make sync'? perhaps replace by symlinks?
> - add release script

Hi,
I see this big reorganization of the sources (which is good), but does it mean that there will be separate tarball releases for kvm-kmod and kvm userspace (which can be compiled without the other tarball and without git commands)? If yes, does it mean there will be one kvm-kmod tarball, or a different tarball for each kernel?
Thanks.

--
Levente
Si vis pacem para bellum!
Re: qemu/hw/device-assignment: questions about msix_table_page
On Mon, Apr 27, 2009 at 10:03:59PM +0800, Sheng Yang wrote:
> On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
> > On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> > > On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > > > Sheng, Marcelo,
> > > > I've been reading code in qemu/hw/device-assignment.c, and I have a couple of questions about the MSI-X implementation:
> > >
> > > Hi Michael
> > >
> > > > 1. What is the reason that msix_table_page is allocated with mmap and not with e.g. malloc?
> > >
> > > msix_table_page is a page, and mmap allocates memory on a page boundary, so I use it.
> >
> > Just wondering, would e.g. posix_memalign work here as well?
>
> Um, I think it should work too.
>
> > > > 2. msix_table_page has the guest view of the MSI-X table for the device. However, even this memory isn't mapped into the guest directly; instead msix_mmio_read/msix_mmio_write perform the write in qemu. Won't it be possible to map this page directly into guest memory, reducing the overhead for table writes?
> > >
> > > First, Linux configures the real MSI-X table in the device, which is out of our scope. KVM accepts the interrupt from Linux, then injects it into the guest according to the guest's MSI-X table settings. So KVM needs to know about modifications to the page. For example, the MSI-X table has a mask bit which can be written by the guest at any time (this bit hasn't been implemented yet, but should be soon), and then we should mask the corresponding vector in the real MSI-X table; the guest may also modify the MSI address/data, which likewise has to be intercepted by KVM and used to update our knowledge of the guest. So we can't pass the modification through.
> >
> > Right, I see that. However all msix_mmio_write does is a memcpy. So what I don't understand yet, what causes the real MSI-X table to be modified? Where's that code?
>
> Dynamic changes aren't allowed yet... :(
>
> For now, please refer to assigned_device_update_msix_mmio in qemu/hw/device-assignment.c. It scans the MSI-X page and transfers it to KVM through an ioctl. For the kernel part, please refer to pci_enable_msix().

Right, I noticed that. So msix_mmio_read/write are placeholders, right? For existing functionality we conceivably could let the guest write into msix_table_page.

> Both userspace and the kernel still need changes to support the mask/unmask feature. That's still on the TODO list (and I hope it can make the 2.6.31 merge window).
>
> --
> regards
> Yang, Sheng
>
> > > If the guest could write to the real device MSI-X table directly, it would cause chaos in interrupt delivery, because what the guest sees is totally different from what the host sees...
> >
> > Obviously.
> >
> > Thanks,
Re: qemu/hw/device-assignment: questions about msix_table_page
On Monday 27 April 2009 22:15:04 Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2009 at 10:03:59PM +0800, Sheng Yang wrote:
> > On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
> > > On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> > > > On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > > > > Sheng, Marcelo,
> > > > > I've been reading code in qemu/hw/device-assignment.c, and I have a couple of questions about the MSI-X implementation:
> > > >
> > > > Hi Michael
> > > >
> > > > > 1. What is the reason that msix_table_page is allocated with mmap and not with e.g. malloc?
> > > >
> > > > msix_table_page is a page, and mmap allocates memory on a page boundary, so I use it.
> > >
> > > Just wondering, would e.g. posix_memalign work here as well?
> >
> > Um, I think it should work too.
> >
> > > > > 2. msix_table_page has the guest view of the MSI-X table for the device. However, even this memory isn't mapped into the guest directly; instead msix_mmio_read/msix_mmio_write perform the write in qemu. Won't it be possible to map this page directly into guest memory, reducing the overhead for table writes?
> > > >
> > > > First, Linux configures the real MSI-X table in the device, which is out of our scope. KVM accepts the interrupt from Linux, then injects it into the guest according to the guest's MSI-X table settings. So KVM needs to know about modifications to the page. For example, the MSI-X table has a mask bit which can be written by the guest at any time (this bit hasn't been implemented yet, but should be soon), and then we should mask the corresponding vector in the real MSI-X table; the guest may also modify the MSI address/data, which likewise has to be intercepted by KVM and used to update our knowledge of the guest. So we can't pass the modification through.
> > >
> > > Right, I see that. However all msix_mmio_write does is a memcpy. So what I don't understand yet, what causes the real MSI-X table to be modified? Where's that code?
> >
> > Dynamic changes aren't allowed yet... :(
> >
> > For now, please refer to assigned_device_update_msix_mmio in qemu/hw/device-assignment.c. It scans the MSI-X page and transfers it to KVM through an ioctl. For the kernel part, please refer to pci_enable_msix().
>
> Right, I noticed that. So msix_mmio_read/write are placeholders, right? For existing functionality we conceivably could let the guest write into msix_table_page.

Yeah, in a sense. But letting the guest write into msix_table_page isn't recommended.

1. Drivers that don't set/unset the mask bit won't access the page frequently, so it shouldn't be performance critical.

2. With the mask bit, some drivers will access the page more frequently, but you have to intercept those accesses...

My suggestion is to get mask bit support first, then consider optimization. The mask bit affects performance on large-scale systems, so it should be done. So the current MSI-X status is indeed temporary...

I guess the frequent accesses you see now may be due to the mask bit, which we ignore for now, so they could be optimized. But I really think this is a temporary status, so I think it's better not to spend much effort on optimization before mask bit support is done...

--
regards
Yang, Sheng

> > Both userspace and the kernel still need changes to support the mask/unmask feature. That's still on the TODO list (and I hope it can make the 2.6.31 merge window).
> >
> > --
> > regards
> > Yang, Sheng
> >
> > > > If the guest could write to the real device MSI-X table directly, it would cause chaos in interrupt delivery, because what the guest sees is totally different from what the host sees...
> > >
> > > Obviously.
> > >
> > > Thanks,
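To make the "you have to intercept it" point concrete, here is a rough sketch of what trapping a guest write to a shadow MSI-X table could look like. This is not the qemu or KVM implementation: shadow_msix_write() and struct msix_write_effect are hypothetical, and the only facts relied on are the standard MSI-X entry layout (16 bytes per entry: address low, address high, data, vector control) and that bit 0 of vector control is the mask bit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE         16u  /* addr lo, addr hi, data, vector control */
#define MSIX_VECTOR_CTRL_OFFSET 12u
#define MSIX_CTRL_MASKBIT       0x1u

/* What the host side would have to do after a trapped guest write. */
struct msix_write_effect {
    unsigned int entry;   /* table entry that was touched */
    bool mask_changed;    /* mask bit flipped: real vector must be (un)masked */
    bool masked;          /* new mask state, meaningful for vector control writes */
    bool msg_changed;     /* address/data changed: routing must be refreshed */
};

/* Apply a 32-bit guest write to the shadow table and classify it. */
static struct msix_write_effect
shadow_msix_write(uint8_t *table, size_t table_len, uint32_t offset, uint32_t val)
{
    struct msix_write_effect fx = {0};
    uint32_t old;

    /* Ignore misaligned or out-of-range writes. */
    if ((offset & 3u) || table_len < 4 || offset > table_len - 4)
        return fx;

    memcpy(&old, table + offset, sizeof(old));
    memcpy(table + offset, &val, sizeof(val));

    fx.entry = offset / MSIX_ENTRY_SIZE;
    if (offset % MSIX_ENTRY_SIZE == MSIX_VECTOR_CTRL_OFFSET) {
        fx.mask_changed = ((old ^ val) & MSIX_CTRL_MASKBIT) != 0;
        fx.masked = (val & MSIX_CTRL_MASKBIT) != 0;
    } else {
        fx.msg_changed = old != val;
    }
    return fx;
}
```

If the page were mapped straight into the guest, none of these transitions would be visible to the host, which is the argument made above for keeping the MMIO trap until mask bit support exists.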
[PATCH] kvm: qemu: framebuffer: build fix for target-arm
Include qemu-kvm.h for non-KVM_UPSTREAM building and surround the kvm code with USE_KVM guards.

Fixes target-arm:

qemu/hw/framebuffer.c: In function 'framebuffer_update_display':
qemu/hw/framebuffer.c:53: warning: implicit declaration of function 'kvm_enabled'
qemu/hw/framebuffer.c:54: warning: implicit declaration of function 'kvm_physical_sync_dirty_bitmap'

Signed-off-by: Mark McLoughlin mar...@redhat.com
---
 hw/framebuffer.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/framebuffer.c b/hw/framebuffer.c
index 1086ba9..e2d7604 100644
--- a/hw/framebuffer.c
+++ b/hw/framebuffer.c
@@ -18,6 +18,7 @@
 #include "console.h"
 #include "framebuffer.h"
 #include "kvm.h"
+#include "qemu-kvm.h"
 
 /* Render an image from a shared memory framebuffer.  */
 
@@ -50,9 +51,11 @@ void framebuffer_update_display(
     *first_row = -1;
     src_len = src_width * rows;
 
+#ifdef USE_KVM
     if (kvm_enabled()) {
         kvm_physical_sync_dirty_bitmap(base, src_len);
     }
+#endif
     pd = cpu_get_physical_page_desc(base);
     pd2 = cpu_get_physical_page_desc(base + src_len - 1);
     /* We should reall check that this is a continuous ram region.
--
1.6.0.6
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
On Mon, Apr 27, 2009 at 04:00:30PM +0200, Christian Borntraeger wrote:
> On Monday 27 April 2009 14:31:36, Michael S. Tsirkin wrote:
> > Add optional MSI-X support: use a vector per virtqueue with fallback to a common vector and finally to regular interrupt. Teach all drivers to use it.
> >
> > I added 2 new virtio operations: request_vqs/free_vqs because MSI needs to know the total number of vectors upfront.
> >
> > Signed-off-by: Michael S. Tsirkin m...@redhat.com
>
> I don't know if that is feasible for MSI, but the transport (virtio_pci) should already know the number of virtqueues, which should match the number of vectors, no?

I think no, the transport can find out the max number of vectors the device supports (pci_msix_table_size), but not how many virtqueues are needed by the driver.

As number of virtqueues <= number of vectors, we could pre-allocate all vectors that the host supports, but this seems a bit drastic as an MSI-X device could support up to 2K vectors.

> In fact, the transport has to have a way of getting the number of virtqueues because find_vq returns ENOENT on invalid index numbers.
>
> Christian

So again, I think this is an upper bound supported by the host. Right?

--
MST
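The fallback order described in the patch summary (one vector per virtqueue, else a common vector, else a regular interrupt) can be written down as a small policy function. This is only an illustration of that ordering; enum irq_mode, pick_irq_mode() and vector_for_vq() are hypothetical names and do not come from the patch series.

```c
/* Interrupt delivery modes, in order of preference. */
enum irq_mode { IRQ_PER_VQ, IRQ_SHARED_VECTOR, IRQ_LEGACY };

/*
 * Choose a mode given how many virtqueues the driver needs and how many
 * MSI-X vectors could actually be obtained (0 if MSI-X is unavailable).
 */
static enum irq_mode pick_irq_mode(unsigned int nvqs, unsigned int vectors_granted)
{
    if (nvqs > 0 && vectors_granted >= nvqs)
        return IRQ_PER_VQ;
    if (vectors_granted >= 1)
        return IRQ_SHARED_VECTOR;
    return IRQ_LEGACY;
}

/* Vector index used by a given virtqueue, or -1 for the legacy interrupt. */
static int vector_for_vq(enum irq_mode mode, unsigned int vq_index)
{
    switch (mode) {
    case IRQ_PER_VQ:
        return (int)vq_index;
    case IRQ_SHARED_VECTOR:
        return 0;
    default:
        return -1;
    }
}
```

The sketch also shows why the driver's virtqueue count has to be known before vectors are requested: the per-virtqueue mode is only chosen when enough vectors were granted for all of them.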
Re: qemu/hw/device-assignment: questions about msix_table_page
On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> My suggestion is get mask bit support first, then consider optimization.

Yea. Thanks for the explanations!

--
MST
Re: [PATCH] virtio-blk: add SG_IO passthru support
Christoph Hellwig wrote:
> [had the qemu list address wrong the first time, reply to this message, not the previous if you were on Cc]
>
> Add support for SG_IO passthru (packet commands) to the virtio-blk backend. Conceptually based on an older patch from Hannes Reinecke but largely rewritten to match the code structure and layering in virtio-blk.
>
> Note that currently we issue the host SG_IO synchronously. We could easily switch to async I/O, but that would require either bloating the VirtIOBlockReq by the size of struct sg_io_hdr or an additional memory allocation for each SG_IO request.
>
> Signed-off-by: Christoph Hellwig h...@lst.de

So practically speaking, what can you do with this? Should we be handling some SCSI cmds internally to QEMU (like eject operations) and supporting media=cdrom in -drive for if=virtio?

On a related topic, should we switch /dev/vdX to be /dev/sdX?

Regards,

Anthony Liguori
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
Michael S. Tsirkin wrote:
> Add optional MSI-X support: use a vector per virtqueue with fallback to a common vector and finally to regular interrupt. Teach all drivers to use it.
>
> I added 2 new virtio operations: request_vqs/free_vqs because MSI needs to know the total number of vectors upfront.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com

Do you have userspace patches for testing?

Regards,

Anthony Liguori
Re: [ANNOUNCE] kvm-85 release
Hi Avi,

On Tue, 2009-04-21 at 12:32 +0300, Avi Kivity wrote:
> Again a long delayed release; I'll try to reduce the lag between releases.
>
> kvm-85 comes in two flavors: the traditional kernel+qemu package (kvm-85.tar.gz), as well as separate qemu (qemu-kvm-devel-85.tar.gz)

I packaged this in Fedora 12 as qemu-0.10.50 and noticed a couple of small problems with the qemu-kvm-devel tarball:

- No kernel/include/asm symlink, so it was building with the system headers and not finding e.g. KVM_FEATURE_CLOCKSOURCE
- An empty kvm-85 dir

And for anyone else doing similar packaging, I've also:

- Disabled pwritev() support until glibc stops writing random junk. See https://bugzilla.redhat.com/497429
- Added Pauline's boot=on -help string patch
- Fixed the arm target build with the patch I just sent

Cheers,
Mark.
Re: [ANNOUNCE] kvm-85 release
On 27.04.2009, at 16:41, Mark McLoughlin wrote:
> Hi Avi,
>
> On Tue, 2009-04-21 at 12:32 +0300, Avi Kivity wrote:
>> Again a long delayed release; I'll try to reduce the lag between releases.
>>
>> kvm-85 comes in two flavors: the traditional kernel+qemu package (kvm-85.tar.gz), as well as separate qemu (qemu-kvm-devel-85.tar.gz)
>
> I packaged this in Fedora 12 as qemu-0.10.50 and noticed a couple of small problems with the qemu-kvm-devel tarball:
>
> - No kernel/include/asm symlink, so it was building with the system headers and not finding e.g. KVM_FEATURE_CLOCKSOURCE
> - An empty kvm-85 dir
>
> And for anyone else doing similar packaging, I've also:
>
> - Disabled pwritev() support until glibc stops writing random junk. See https://bugzilla.redhat.com/497429

Wouldn't it be useful to have it disabled upstream then?

Alex

> - Added Pauline's boot=on -help string patch
> - Fixed the arm target build with the patch I just sent
>
> Cheers,
> Mark.
Re: [ANNOUNCE] kvm-85 release
On Mon, Apr 27, 2009 at 04:45:23PM +0200, Alexander Graf wrote:
>> - Disabled pwritev() support until glibc stops writing random junk. See https://bugzilla.redhat.com/497429
>
> Wouldn't it be useful to have it disabled upstream then?

No glibc has been released with the broken one yet, I guess Mark must have been running some sort of snapshot.
CPU Limitations
Hi,

I have the following problem: I have an AMD Opteron server with 32 (4*8) CPUs. Since kvm is only able to create a virtual machine with 16 CPUs, I changed KVM_MAX_VCPUS, MAX_VCPUS and MAX_CPUS as suggested in http://www.linux-kvm.org/wiki/images/b/be/KvmForum2008$kdf2008_6.pdf

After compiling everything, it shows the right maximum number of CPUs in virt-manager, but it is still not working. If I start, for example, my virtual machine with

kvm -m 4000 -smp 30 disk0.img

I get a segmentation fault. In the logfile it looks like this:

[ 4440.424512] __ratelimit: 18 callbacks suppressed
[ 4440.424523] kvm[7054]: segfault at 1ae5cf6b0 ip 004092ba sp 7fff3dbb8dc0 error 4 in kvm[40+1e1000]

Any idea what is missing or what is wrong?

cheers
cornelius
[ kvm-Bugs-2768533 ] SCSI/VirtIO: VM unable to reboot from non-IDE controller
Bugs item #2768533, was opened at 2009-04-16 16:01
Message generated for change (Settings changed) made by technologov
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2768533&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 7
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: SCSI/VirtIO: VM unable to reboot from non-IDE controller

Initial Comment:
All Red Hat-based systems (RHEL, Fedora) have failed during automated tests on SCSI disks on KVM.

It turned out to be a problem of Qemu/KVM, where after a software reboot (i.e. initiated by guest OS), the system is unable to boot from SCSI controller.

To reboot successfully you need either soft reboot+IDE or Hard reboot (Qemu cold boot)+SCSI.

The Command sent to Qemu/KVM:
/usr/local/bin/qemu-system-x86_64 -m 512 -monitor tcp:localhost:4602,server,nowait -cdrom /isos/linux/Fedora-8-i386-DVD.iso -drive file=/vm/fedora8-32.qcow2,if=scsi,boot=on -name fedora8-32

Host: RHEL 5/x64, KVM-85rc6. (tried both Intel and AMD)
Guest: RHEL 5, Fedora 8, Fedora 9 (tried both 32 and 64-bit).

-Alexey, 16.4.2009.

--
Comment By: Technologov (technologov)
Date: 2009-04-27 17:56
Message:
Bug fixed in KVM-85. Closing.

--
Comment By: Technologov (technologov)
Date: 2009-04-21 12:41
Message:
Ryan Harper provided a patch that fixes this. I need to make sure it makes its way into a KVM release. Not closing bug yet.

--
Comment By: Technologov (technologov)
Date: 2009-04-19 17:25
Message:
Bug was also opened in Ubuntu Launchpad.
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/363743

Two reasons:
1. Qemu project does not have bugzilla - so Ubuntu Launchpad became main bugzilla for Qemu
2. This bug affects Ubuntu-8.10

--
Comment By: Technologov (technologov)
Date: 2009-04-16 17:12
Message:
This problem also exists for VirtIO block device.
Re: kvm-kmod.git
Farkas Levente wrote:
> hi,
> i see this big reorganization of the sources (which is good), but does it mean that there will be separate tarball releases for kvm-kmod and kvm userspace (which can be compiled without other tarball and git commands)?
> if yes, does it mean there'll be one kvm-kmod tarball, or will there be a different tarball for each kernel?
> thanks

There will be a qemu-kvm tarball (qemu only), a kvm-kmod tarball (kernel module only, suitable for any supported host kernel), and a kvm tarball (contains both of the above).

--
error compiling committee.c: too many arguments to function
Re: [ANNOUNCE] kvm-85 release
Mark McLoughlin wrote:
> Hi Avi,
>
> On Tue, 2009-04-21 at 12:32 +0300, Avi Kivity wrote:
>> Again a long delayed release; I'll try to reduce the lag between releases.
>>
>> kvm-85 comes in two flavors: the traditional kernel+qemu package (kvm-85.tar.gz), as well as separate qemu (qemu-kvm-devel-85.tar.gz)
>
> I packaged this in Fedora 12 as qemu-0.10.50 and noticed a couple of small problems with the qemu-kvm-devel tarball:
>
> - No kernel/include/asm symlink, so it was building with the system headers and not finding e.g. KVM_FEATURE_CLOCKSOURCE

This is already fixed in the new repository.

> - An empty kvm-85 dir

I need to rewrite my release script anyway.

> And for anyone else doing similar packaging, I've also:
>
> - Disabled pwritev() support until glibc stops writing random junk. See https://bugzilla.redhat.com/497429
> - Added Pauline's boot=on -help string patch
> - Fixed the arm target build with the patch I just sent

I hope to release kvm-86 soon.

--
error compiling committee.c: too many arguments to function
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
Christian Borntraeger wrote:
> Am Monday 27 April 2009 14:31:36 schrieb Michael S. Tsirkin:
>> Add optional MSI-X support: use a vector per virtqueue with fallback to a common vector and finally to regular interrupt. Teach all drivers to use it.
>>
>> I added 2 new virtio operations: request_vqs/free_vqs because MSI needs to know the total number of vectors upfront.
>>
>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>
> I don't know if that is feasible for MSI, but the transport (virtio_pci) should already know the number of virtqueues, which should match the number of vectors, no?
>
> In fact, the transport has to have a way of getting the number of virtqueues because find_vq returns ENOENT on invalid index numbers.

One thing I thought was to decouple the virtqueue count from the MSI entry count, and let the guest choose how many MSI entries it wants to use. This is useful if it wants several queues to share an interrupt, perhaps it wants both rx and tx rings on one cpu to share a single vector.

So the device could expose a large, constant number of MSI entries, and in addition expose a table mapping ring number to msi entry number. The guest could point all rings to one entry, or have each ring use a private entry, or anything in between.

That saves us the new API (at the expense of a lot more code, but with added flexibility).

--
error compiling committee.c: too many arguments to function
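A minimal sketch of the ring-to-entry table described above, written as plain userspace C rather than kernel code. The struct name, the helper names, and the sizes NUM_RINGS and NUM_ENTRIES are all made up for illustration; nothing here is an existing virtio or PCI interface. It only shows the three configurations mentioned: all rings on one entry, one private entry per ring, and something in between.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RINGS   3	/* hypothetical: e.g. rx, tx and one more ring */
#define NUM_ENTRIES 8	/* hypothetical: constant number of MSI entries the device exposes */

#if NUM_RINGS > NUM_ENTRIES
#error "private mapping needs at least one entry per ring"
#endif

/* Hypothetical per-device table: ring number -> MSI entry number. */
struct ring_msi_map {
	unsigned int entry[NUM_RINGS];
};

/* Point every ring at one shared entry. */
static void map_all_shared(struct ring_msi_map *m, unsigned int shared)
{
	assert(shared < NUM_ENTRIES);
	for (size_t ring = 0; ring < NUM_RINGS; ring++)
		m->entry[ring] = shared;
}

/* Give each ring a private entry: ring i uses entry i. */
static void map_all_private(struct ring_msi_map *m)
{
	for (size_t ring = 0; ring < NUM_RINGS; ring++)
		m->entry[ring] = (unsigned int)ring;
}

/* Repoint a single ring; returns 0 on success, -1 on an out-of-range index. */
static int map_ring(struct ring_msi_map *m, size_t ring, unsigned int entry)
{
	if (ring >= NUM_RINGS || entry >= NUM_ENTRIES)
		return -1;
	m->entry[ring] = entry;
	return 0;
}

/* Count how many distinct MSI entries the rings currently point at. */
static unsigned int entries_in_use(const struct ring_msi_map *m)
{
	unsigned char used[NUM_ENTRIES] = {0};
	unsigned int count = 0;

	for (size_t ring = 0; ring < NUM_RINGS; ring++) {
		if (!used[m->entry[ring]]) {
			used[m->entry[ring]] = 1;
			count++;
		}
	}
	return count;
}
```

With this layout the guest's choice is just a matter of filling the table: map_all_shared() for a single interrupt, map_all_private() for one per ring, and map_ring() to let, say, rx and tx share while a third ring keeps its own entry.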
Re: [ANNOUNCE] kvm-85 release
On Mon, 2009-04-27 at 10:50 -0400, Christoph Hellwig wrote:
> On Mon, Apr 27, 2009 at 04:45:23PM +0200, Alexander Graf wrote:
>>> - Disabled pwritev() support until glibc stops writing random junk. See https://bugzilla.redhat.com/497429
>>
>> Wouldn't it be useful to have it disabled upstream then?
>
> No glibc has been released with the broken one yet, I guess Mark must have been running some sort of snapshot.

Yeah, it's a pre-release snapshot of 2.10 which will be included in Fedora 11. Looks like it's fixed upstream already and we'll have a new snapshot very soon.

Cheers,
Mark.
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
Am Monday 27 April 2009 16:32:56 schrieb Michael S. Tsirkin:
>> I don't know if that is feasible for MSI, but the transport (virtio_pci) should already know the number of virtqueues, which should match the number of vectors, no?
>
> I think no, the transport can find out the max number of vectors the device supports (pci_msix_table_size), but not how many virtqueues are needed by the driver.

I was not talking about the number of vectors, but the number of real existing virtqueues. I just read virtio_pci again and yes, it would be a bit hard to get the number of available virtqueues, because we need to loop over all possible vqs (something like):

	for (i = 0; i < SOMEMAX; i++) {
		iowrite16(i, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
		if (!ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM))
			break;
	}
	return i;

which is obviously not very nice. On the other hand, at the moment we have up to 3 virtqueues per device (network), and we have a current maximum of 16 virtqueues in qemu (VIRTIO_PCI_QUEUE_MAX in virtio.c), which would reduce the number.

> As number of virtqueues = number of vectors, we could pre-allocate all vectors that host supports, but this seems a bit drastic as an MSI-X device could support up to 2K vectors.
>
>> In fact, the transport has to have a way of getting the number of virtqueues because find_vq returns ENOENT on invalid index numbers.
>>
>> Christian
>
> So again, I think this is an upper bound supported by host. Right?

Not the upper bound, but the real available virtqueues. (With current qemu 3 for virtio-net, 2 for virtio-console etc.)

Since I use a different transport (drivers/s390/kvm/kvm_virtio.c), my motivation is to keep the virtio interface as generic as possible. I don't really like the new interface, but I cannot give you silver bullet technical reasons - it's more a gut feeling. The interface would work with lguest and s390.

Anyway. Avi's suggestion to decouple MSI count and virtqueue count looks like a promising approach.

Christian
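The probing loop above can be tried out against a mock device in userspace. This is only a sketch: struct mock_dev and the mock_* helpers are invented stand-ins for the iowrite16/ioread16 accesses to VIRTIO_PCI_QUEUE_SEL and VIRTIO_PCI_QUEUE_NUM, and it assumes, as the loop does, that a queue that does not exist reads back a size of 0.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_QUEUE_MAX 16	/* mirrors the 16-queue limit mentioned above */

/* Invented stand-in for the device: ring size per queue, 0 means "no such queue". */
struct mock_dev {
	uint16_t queue_sel;			/* last value written to the select register */
	uint16_t queue_num[MOCK_QUEUE_MAX];	/* ring size reported for each queue */
};

/* Stand-in for iowrite16(i, ioaddr + VIRTIO_PCI_QUEUE_SEL). */
static void mock_write_queue_sel(struct mock_dev *d, uint16_t sel)
{
	d->queue_sel = sel;
}

/* Stand-in for ioread16(ioaddr + VIRTIO_PCI_QUEUE_NUM). */
static uint16_t mock_read_queue_num(const struct mock_dev *d)
{
	return d->queue_sel < MOCK_QUEUE_MAX ? d->queue_num[d->queue_sel] : 0;
}

/* Select each queue in turn and stop at the first one that reports size 0. */
static unsigned int count_virtqueues(struct mock_dev *d)
{
	unsigned int i;

	for (i = 0; i < MOCK_QUEUE_MAX; i++) {
		mock_write_queue_sel(d, (uint16_t)i);
		if (!mock_read_queue_num(d))
			break;
	}
	return i;
}
```

The cost is one register write and one read per queue, bounded by the maximum; the result is the number of queues the device offers, which is not necessarily the number the driver will end up using.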
bad virtio disk performance
Hi,

I'm experiencing bad disk I/O performance using virtio disks.

I'm using Linux 2.6.29 (host and guest), kvm 84 userspace. On the host, and in a non-virtio guest, I get ~120 MB/s when writing with dd (the disks are fast RAID0 SAS disks).

In a guest with a virtio disk, I get at most ~32 MB/s.

The rest of the setup is the same. For reference, I'm running kvm -drive file=/tmp/debian-amd64.img,if=virtio.

Is such performance expected? What should I check?

Thank you,
--
| Lucas Nussbaum
| lu...@lucas-nussbaum.net http://www.lucas-nussbaum.net/ |
| jabber: lu...@nussbaum.fr GPG: 1024D/023B3F4F |
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
On Mon, Apr 27, 2009 at 06:06:48PM +0300, Avi Kivity wrote:
> Christian Borntraeger wrote:
>> Am Monday 27 April 2009 14:31:36 schrieb Michael S. Tsirkin:
>>> Add optional MSI-X support: use a vector per virtqueue with fallback to a common vector and finally to regular interrupt. Teach all drivers to use it.
>>>
>>> I added 2 new virtio operations: request_vqs/free_vqs because MSI needs to know the total number of vectors upfront.
>>>
>>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>>
>> I don't know if that is feasible for MSI, but the transport (virtio_pci) should already know the number of virtqueues, which should match the number of vectors, no?
>>
>> In fact, the transport has to have a way of getting the number of virtqueues because find_vq returns ENOENT on invalid index numbers.
>
> One thing I thought was to decouple the virtqueue count from the MSI entry count, and let the guest choose how many MSI entries it wants to use. This is useful if it wants several queues to share an interrupt, perhaps it wants both rx and tx rings on one cpu to share a single vector.
>
> So the device could expose a large, constant number of MSI entries, and in addition expose a table mapping ring number to msi entry number. The guest could point all rings to one entry, or have each ring use a private entry, or anything in between.

Sounds good ... and it might not be a lot of code I think.

> That saves us the new API (at the expense of a lot more code, but with added flexibility).

So we'll probably need to rename request_vqs to request_vectors, but we probably still need the driver to pass the number of vectors it wants to the transport. Right?

--
MST
Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware
Andrea,

We are working with embedded hardware that does not have VT-d and we need 1-1 mapping. I wonder what the status of this patch is. Have you continued updating it with the latest KVM version?

Regards,
Pablo

> On Wed, Jul 30, 2008 at 05:16:06PM +0300, Dor Laor wrote:
>> In addition KVM is used in embedded too and things are slower there, we know of a specific use case (production) that demands 1:1 mapping and can't use VT-d
>
> Since you mentioned this ;), I take opportunity to add that those embedded usages are the ones that are totally fine with the compile time passthrough-guest-ram decision, instead of a boot time decision. Those host kernels will likely have RT patches (KVM works great with preempt-RT indeed) and in turn the compile time ram selection is the least of their problems as you can imagine ;).
>
> So you can see my patch as an embedded-build option, similar to "Configure standard kernel features (for small systems)" and no distro is shipping new kernels with that feature on either.
>
> Then if we decide 1:1 should have larger userbase instead of only the people that know what they're doing (i.e. 1:1 guest can destroy linux-hypervisor) we can always add a bit of strtol parsing to 16bit kernelloader.
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
Am Monday 27 April 2009 17:39:36 schrieb Michael S. Tsirkin:
> So we'll probably need to rename request_vqs to request_vectors, but we probably still need the driver to pass the number of vectors it wants to the transport. Right?

This might be a stupid idea, but would something like the following be sufficient for you?

Index: linux-2.6/drivers/net/virtio_net.c
===================================================================
--- linux-2.6.orig/drivers/net/virtio_net.c
+++ linux-2.6/drivers/net/virtio_net.c
@@ -1021,6 +1021,7 @@ static unsigned int features[] = {
 static struct virtio_driver virtio_net = {
 	.feature_table = features,
 	.feature_table_size = ARRAY_SIZE(features),
+	.num_vq_max = 3,
 	.driver.name = KBUILD_MODNAME,
 	.driver.owner = THIS_MODULE,
 	.id_table = id_table,
Index: linux-2.6/include/linux/virtio.h
===================================================================
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -110,6 +110,7 @@ struct virtio_driver {
 	const struct virtio_device_id *id_table;
 	const unsigned int *feature_table;
 	unsigned int feature_table_size;
+	unsigned int num_vq_max;
 	int (*probe)(struct virtio_device *dev);
 	void (*remove)(struct virtio_device *dev);
 	void (*config_changed)(struct virtio_device *dev);
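One way a transport might consume such a num_vq_max hint, sketched as plain C under stated assumptions. The fallback order (a vector per virtqueue, then a common vector, then the regular interrupt) is taken from the cover letter of the series; the function name plan_vectors, the irq_mode enum and the table_size parameter are hypothetical and not part of the posted patches.

```c
#include <assert.h>

/* Hypothetical interrupt modes, in the fallback order from the cover letter. */
enum irq_mode {
	IRQ_PER_VQ,	/* one MSI-X vector per virtqueue */
	IRQ_SHARED,	/* all virtqueues share one MSI-X vector */
	IRQ_LEGACY	/* regular (non-MSI) interrupt */
};

struct irq_plan {
	enum irq_mode mode;
	unsigned int nvectors;	/* MSI-X vectors to request; 0 for IRQ_LEGACY */
};

/*
 * num_vq_max: what the driver would advertise via the proposed field.
 * table_size: how many vectors the device's MSI-X table offers
 *             (0 when the device has no MSI-X at all).
 */
static struct irq_plan plan_vectors(unsigned int num_vq_max,
				    unsigned int table_size)
{
	struct irq_plan plan;

	if (num_vq_max > 0 && table_size >= num_vq_max) {
		/* Enough vectors for every queue the driver may use. */
		plan.mode = IRQ_PER_VQ;
		plan.nvectors = num_vq_max;
	} else if (table_size >= 1) {
		/* Not enough for one each: fall back to a common vector. */
		plan.mode = IRQ_SHARED;
		plan.nvectors = 1;
	} else {
		/* No MSI-X available: use the regular interrupt. */
		plan.mode = IRQ_LEGACY;
		plan.nvectors = 0;
	}
	return plan;
}
```

The point of the static field is that the transport can make this decision before any virtqueue is set up, which is what the request_vqs/free_vqs operations were introduced for.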
Re: [PATCH 3/7] kvm/mmu: rename is_largepage_backed to mapping_level
Joerg,

On Fri, Apr 24, 2009 at 01:58:43PM +0200, Joerg Roedel wrote:
> With the new name and the corresponding backend changes this function can now support multiple hugepage sizes.
>
> Signed-off-by: Joerg Roedel joerg.roe...@amd.com
> ---
>  arch/x86/kvm/mmu.c         |  100 +--
>  arch/x86/kvm/paging_tmpl.h |    4 +-
>  2 files changed, 69 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e3421d8..56cd7c2 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -382,37 +382,52 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
>   * Return the pointer to the largepage write count for a given
>   * gfn, handling slots that are not large page aligned.
>   */
> -static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
> +static int *slot_largepage_idx(gfn_t gfn,
> +			       struct kvm_memory_slot *slot,
> +			       int level)
>  {
>  	unsigned long idx;
>
> -	idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
> -	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
> -	return &slot->lpage_info[0][idx].write_count;
> +	idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
> +	      (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
> +	return &slot->lpage_info[level - 2][idx].write_count;
>  }
>
>  static void account_shadowed(struct kvm *kvm, gfn_t gfn)
>  {
> +	struct kvm_memory_slot *slot;
>  	int *write_count;
> +	int i;
>
>  	gfn = unalias_gfn(kvm, gfn);
> -	write_count = slot_largepage_idx(gfn,
> -					 gfn_to_memslot_unaliased(kvm, gfn));
> -	*write_count += 1;
> +
> +	for (i = PT_DIRECTORY_LEVEL;
> +	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> +		slot = gfn_to_memslot_unaliased(kvm, gfn);
> +		write_count = slot_largepage_idx(gfn, slot, i);
> +		*write_count += 1;
> +	}
>  }
>
>  static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
>  {
> +	struct kvm_memory_slot *slot;
>  	int *write_count;
> +	int i;
>
>  	gfn = unalias_gfn(kvm, gfn);
> -	write_count = slot_largepage_idx(gfn,
> -					 gfn_to_memslot_unaliased(kvm, gfn));
> -	*write_count -= 1;
> -	WARN_ON(*write_count < 0);
> +	for (i = PT_DIRECTORY_LEVEL;
> +	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> +		slot = gfn_to_memslot_unaliased(kvm, gfn);
> +		write_count = slot_largepage_idx(gfn, slot, i);
> +		*write_count -= 1;
> +		WARN_ON(*write_count < 0);
> +	}
>  }
>
> -static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
> +static int has_wrprotected_page(struct kvm *kvm,
> +				gfn_t gfn,
> +				int level)
>  {
>  	struct kvm_memory_slot *slot;
>  	int *largepage_idx;
> @@ -420,47 +435,67 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
>  	gfn = unalias_gfn(kvm, gfn);
>  	slot = gfn_to_memslot_unaliased(kvm, gfn);
>  	if (slot) {
> -		largepage_idx = slot_largepage_idx(gfn, slot);
> +		largepage_idx = slot_largepage_idx(gfn, slot, level);
>  		return *largepage_idx;
>  	}
>
>  	return 1;
>  }
>
> -static int host_largepage_backed(struct kvm *kvm, gfn_t gfn)
> +static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
>  {
> +	unsigned long page_size = PAGE_SIZE;
>  	struct vm_area_struct *vma;
>  	unsigned long addr;
> -	int ret = 0;
> +	int i, ret = 0;
>
>  	addr = gfn_to_hva(kvm, gfn);
>  	if (kvm_is_error_hva(addr))
> -		return ret;
> +		return page_size;
>
>  	down_read(&current->mm->mmap_sem);
>  	vma = find_vma(current->mm, addr);
> -	if (vma && is_vm_hugetlb_page(vma))
> -		ret = 1;
> +	if (!vma)
> +		goto out;
> +
> +	page_size = vma_kernel_pagesize(vma);
> +
> +out:
>  	up_read(&current->mm->mmap_sem);
>
> +	for (i = PT_PAGE_TABLE_LEVEL;
> +	     i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
> +		if (page_size >= KVM_HPAGE_SIZE(i))
> +			ret = i;
> +		else
> +			break;
> +	}
> +
>  	return ret;
>  }
>
> -static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
> +static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>  {
>  	struct kvm_memory_slot *slot;
> -
> -	if (has_wrprotected_page(vcpu->kvm, large_gfn))
> -		return 0;
> -
> -	if (!host_largepage_backed(vcpu->kvm, large_gfn))
> -		return 0;
> +	int host_level;
> +	int level = PT_PAGE_TABLE_LEVEL;
>
>  	slot = gfn_to_memslot(vcpu->kvm, large_gfn);
>  	if (slot && slot->dirty_bitmap)
> -		return 0;
> +		return PT_PAGE_TABLE_LEVEL;
>
> -	return 1;
> +	host_level = host_mapping_level(vcpu->kvm, large_gfn);
> +
> +	if (host_level == PT_PAGE_TABLE_LEVEL)
> +
Re: kvm-kmod.git
Avi Kivity wrote:
> This is a small git repository for the kvm external module kit. It includes a submodule link to the kernel tree, so when you check out the repository, it also checks out a kernel tree for a specific revision. This way the various scripts and the kernel source are always synchronized.
>
> $ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
> $ git submodule init
> $ git submodule update
> $ ./configure
> $ make sync
> $ make
>
> Todo:
> - add maint branches
> - automatic 'make sync'? perhaps replace by symlinks?
> - add release script

Configure spits out an error about include/asm not existing. I think that make sync has to be run before configure to create the include directory and copy asm-x86 inside so that ln -sf asm-$karch include/asm will work.

Cam
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
On Mon, Apr 27, 2009 at 05:37:25PM +0200, Christian Borntraeger wrote:
>> As number of virtqueues = number of vectors, we could pre-allocate all vectors that host supports, but this seems a bit drastic as an MSI-X device could support up to 2K vectors.
>>
>>> In fact, the transport has to have a way of getting the number of virtqueues because find_vq returns ENOENT on invalid index numbers.
>>>
>>> Christian
>>
>> So again, I think this is an upper bound supported by host. Right?
>
> Not the upper bound, but the real available virtqueues. (With current qemu 3 for virtio-net, 2 for virtio-console etc.)

Here's what I mean by upper bound: guest and host number of virtqueues might be different: the host might support control virtqueue like in virtio-net, but guest might be an older version and not use it.

> Since I use a different transport (drivers/s390/kvm/kvm_virtio.c), my motivation is to keep the virtio interface as generic as possible. I don't really like the new interface, but I cannot give you silver bullet technical reasons - it's more a gut feeling. The interface would work with lguest and s390.
>
> Anyway. Avi's suggestion to decouple MSI count and virtqueue count looks like a promising approach.

So generally, we add request_vectors which gives us the number of available MSI vectors and then find_vq gets a vector #? Does this sound better?

--
MST
Re: kvm-77 Excessive Disk Access causes real time clock hang!
Hi Avi,

Avi Kivity wrote:
> interface: virtio
> cache: none
> format: raw, using a partition or logical volume
>
> What are you using?

uhm, I'm not sure, I call qemu with:

qemu-system-x86_64 -usb -hda /dev/hda2 -m 1536 -net nic,macaddr=$MACADDR -net tap,script=/etc/qemu-ifup -no-acpi -monitor stdio -usbdevice tablet -boot c

/dev/hda2 is NTFS formatted. Does this make sense, given that you wrote something about raw? Maybe another file system would be faster? Or is it ignored by the guest system?

Best regards,
Erik
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
On Mon, Apr 27, 2009 at 06:59:52PM +0200, Christian Borntraeger wrote:
> Am Monday 27 April 2009 17:39:36 schrieb Michael S. Tsirkin:
>> So we'll probably need to rename request_vqs to request_vectors, but we probably still need the driver to pass the number of vectors it wants to the transport. Right?
>
> This might be a stupid idea, but would something like the following be sufficient for you?

Yes.

> Index: linux-2.6/drivers/net/virtio_net.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/virtio_net.c
> +++ linux-2.6/drivers/net/virtio_net.c
> @@ -1021,6 +1021,7 @@ static unsigned int features[] = {
>  static struct virtio_driver virtio_net = {
>  	.feature_table = features,
>  	.feature_table_size = ARRAY_SIZE(features),
> +	.num_vq_max = 3,
>  	.driver.name = KBUILD_MODNAME,
>  	.driver.owner = THIS_MODULE,
>  	.id_table = id_table,
> Index: linux-2.6/include/linux/virtio.h
> ===================================================================
> --- linux-2.6.orig/include/linux/virtio.h
> +++ linux-2.6/include/linux/virtio.h
> @@ -110,6 +110,7 @@ struct virtio_driver {
>  	const struct virtio_device_id *id_table;
>  	const unsigned int *feature_table;
>  	unsigned int feature_table_size;
> +	unsigned int num_vq_max;
>  	int (*probe)(struct virtio_device *dev);
>  	void (*remove)(struct virtio_device *dev);
>  	void (*config_changed)(struct virtio_device *dev);

Good idea.

--
MST
Boot problems with qemu-kvm
Hi,

I've tried booting two VMs (Ubuntu and Fedora 11) that previously worked with the latest kvm-userspace. At start, I get the following warning:

Could not open '/dev/kqemu' - QEMU acceleration layer not activated: No such file or directory

But booting proceeds quickly, which seems to indicate that KVM is being used. However, the network device is not started and the graphics don't start properly. CPU usage also spikes to 100% and stays there, and I can't interact at all with the VM.

I'm running the 2.6.29 KVM kernel in the guests.

Cam
Re: bad virtio disk performance
Lucas Nussbaum wrote:
> Hi,
>
> I'm experiencing bad disk I/O performance using virtio disks.
>
> I'm using Linux 2.6.29 (host and guest), kvm 84 userspace. On the host, and in a non-virtio guest, I get ~120 MB/s when writing with dd (the disks are fast RAID0 SAS disks).

Could you provide detail of the exact type and size of i/o load you were creating with dd?

Also the full qemu cmd line invocation in both cases would be useful.

> In a guest with a virtio disk, I get at most ~32 MB/s.

Which non-virtio interface was used for the comparison?

> The rest of the setup is the same. For reference, I'm running kvm -drive file=/tmp/debian-amd64.img,if=virtio.
>
> Is such performance expected? What should I check?

Not expected, something is awry.

blktrace(8) run on the host will shed some light on the type of i/o requests issued by qemu in both cases.

-john

--
john.coo...@third-harmonic.com
Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware
Hello Pablo,

On Mon, Apr 27, 2009 at 11:00:51AM -0600, Passera, Pablo R wrote:
> Andrea,
>
> We are working with embedded hardware that does not have VT-d and we need 1-1 mapping. I wonder which is the status of this patch. Have you continued updating it with the latest KVM version?

Sorry to say but it isn't updated to latest KVM and latest mainline. Porting normally should be easy. I attached last versions.

>> Since you mentioned this ;), I take opportunity to add that those embedded usages are the ones that are totally fine with the compile time passthrough-guest-ram decision, instead of a boot time decision. Those host kernels will likely have RT patches (KVM works great with preempt-RT indeed) and in turn the compile time ram selection is the least of their problems as you can imagine ;).
>>
>> So you can see my patch as an embedded-build option, similar to "Configure standard kernel features (for small systems)" and no distro is shipping new kernels with that feature on either.
>>
>> Then if we decide 1:1 should have larger userbase instead of only the people that know what they're doing (i.e. 1:1 guest can destroy linux-hypervisor) we can always add a bit of strtol parsing to 16bit kernelloader.

Agreed!

From: Andrea Arcangeli aarca...@redhat.com

The reserved RAM can be mapped by virtualization software with /dev/mem to create a 1:1 mapping between guest physical (bus) address and host physical (bus) address. This will allow pci passthrough with DMA for the guest using the ram with the 1:1 mapping.

The only detail to take care of is the ram marked "reserved RAM failed". The virtualization software must create for the guest an e820 map that only includes the "reserved RAM" regions, but if the guest touches memory with guest physical address in the "reserved RAM failed" ranges (linux guest will do that even if the ram isn't present in the e820 map), it should provide that as ram and map it with a non linear mapping. This should allow any linux kernel to run fine and hopefully any other OS too.

svm ~ # cat /proc/iomem |head -n 20
-0fff : reserved RAM failed
1000-5fff : reserved RAM
6000-7fff : reserved RAM failed
8000-0009efff : reserved RAM
0009f000-0009 : reserved
000cd600-000c : pnp 00:0d
000f-000f : reserved
0010-0fff : reserved RAM
1000-3ded : System RAM
  1000-10329ab2 : Kernel code
  10329ab3-104933e7 : Kernel data
  104f5000-10558e67 : Kernel bss
3dee-3dee2fff : ACPI Non-volatile Storage
3dee3000-3dee : ACPI Tables
3def-3def : reserved
3dff-3ffe : pnp 00:0d
e000-efff : reserved
fa00-fbff : PCI Bus #01
  fa00-fbff : :01:05.0
fda0-fdbf : PCI Bus #01
svm ~ # hexdump /dev/mem | grep -C2 ' '
7e0 * 0001000 * 0006000 a5a5 a5a5 8ec8 8ed8 8ec0 66d0 06c7 -- * 0007ff0 3063 1000 0008000 * 009f000 0002 -- 00fffe0 6000 3c03 45e7 0184 0500 0082 01c0 0223 000 5bea 00e0 31f0 2f32 3931 302f 0037 12fc 010 * 1000 8d48 f92d 48ff ed81 1000 8948
^C
svm ~ #

Signed-off-by: Andrea Arcangeli aarca...@redhat.com
---

This is a port to current linux-2.6.git of the previous reserved-ram patch. Let me know if there's a chance to get this acked and included. Anything that isn't at compile time would require much bigger changes just to parse the command line at 16bit realmode time to know where to relocate the kernel dynamically. Because 1:1 is a corner case feature required only by some users, this is the minimal intrusive approach. This also has some limits as it can't reserve more than 1g, and with a few more changes 2g, but this is ok for a long time as the virtualized 1:1 guest doesn't need to be huge, just a desktop.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1276,8 +1276,36 @@ config CRASH_DUMP
 	  (CONFIG_RELOCATABLE=y).
 	  For more details see Documentation/kdump/kdump.txt
 
+config RESERVE_PHYSICAL_START
+	bool "Reserve all RAM below PHYSICAL_START (EXPERIMENTAL)"
+	depends on !RELOCATABLE && X86_64
+	help
+	  This makes the kernel use only RAM above __PHYSICAL_START.
+	  All memory below __PHYSICAL_START will be left unused and
+	  marked as "reserved RAM" in /proc/iomem. The few special
+	  pages that can't be relocated at addresses above
+	  __PHYSICAL_START and that can't be guaranteed to be unused
+	  by the running kernel will be marked "reserved RAM failed"
+	  in /proc/iomem. Those may or may be not used by the kernel
+	  (for example SMP trampoline pages would only be used if
+	  CPU hotplug is enabled).
+
+	  The reserved RAM can be mapped by virtualization software
+
Re: [PATCH RFC 0/8] virtio: add guest MSI-X support
On Mon, Apr 27, 2009 at 09:39:06AM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> Add optional MSI-X support: use a vector per virtqueue with
>> fallback to a common vector and finally to regular interrupt.
>> Teach all drivers to use it.
>>
>> I added 2 new virtio operations: request_vqs/free_vqs because MSI
>> needs to know the total number of vectors upfront.
>>
>> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>
> Do you have userspace patches for testing?
>
> Regards,
>
> Anthony Liguori

Not ready yet, sorry.

-- 
MST
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad virtio disk performance
On 27/04/09 at 13:36 -0400, john cooper wrote:
> Lucas Nussbaum wrote:
>> Hi,
>>
>> I'm experiencing bad disk I/O performance using virtio disks.
>>
>> I'm using Linux 2.6.29 (host and guest), kvm 84 userspace.
>> On the host, and in a non-virtio guest, I get ~120 MB/s when writing
>> with dd (the disks are fast RAID0 SAS disks).
>
> Could you provide detail of the exact type and size of i/o load you
> were creating with dd?

I tried with various block sizes. An example invocation would be:
dd if=/dev/zero of=foo bs=4096 count=262144 conv=fsync
(126 MB/s without virtio, 32 MB/s with virtio).

> Also the full qemu cmd line invocation in both cases would be useful.

non-virtio:
kvm -drive file=/tmp/debian-amd64.img,if=scsi,cache=writethrough -net nic,macaddr=00:16:3e:5a:28:1,model=e1000 -net tap -nographic -kernel /boot/vmlinuz-2.6.29 -initrd /boot/initrd.img-2.6.29 -append "root=/dev/sda1 ro console=tty0 console=ttyS0,9600,8n1"

virtio:
kvm -drive file=/tmp/debian-amd64.img,if=virtio,cache=writethrough -net nic,macaddr=00:16:3e:5a:28:1,model=e1000 -net tap -nographic -kernel /boot/vmlinuz-2.6.29 -initrd /boot/initrd.img-2.6.29 -append "root=/dev/vda1 ro console=tty0 console=ttyS0,9600,8n1"

>> In a guest with a virtio disk, I get at most ~32 MB/s.
>
> Which non-virtio interface was used for the comparison?

if=ide
I got the same performance with if=scsi

>> The rest of the setup is the same. For reference, I'm running kvm -drive
>> file=/tmp/debian-amd64.img,if=virtio.
>>
>> Is such performance expected? What should I check?
>
> Not expected, something is awry.
>
> blktrace(8) run on the host will shed some light on the type of i/o
> requests issued by qemu in both cases.

Ah, I found something interesting. btrace summary after writing a 1 GB
file:

--- without virtio:
Total (8,5):
 Reads Queued:           0,        0KiB   Writes Queued:      272259,     1089MiB
 Read Dispatches:        0,        0KiB   Write Dispatches:     9769,     1089MiB
 Reads Requeued:         0                Writes Requeued:         0
 Reads Completed:        0,        0KiB   Writes Completed:     9769,     1089MiB
 Read Merges:            0,        0KiB   Write Merges:       262490,     1049MiB
 IO unplugs:         45973                Timer unplugs:          30

--- with virtio:
Total (8,5):
 Reads Queued:           1,        4KiB   Writes Queued:      430734,     1776MiB
 Read Dispatches:        1,        4KiB   Write Dispatches:   196143,     1776MiB
 Reads Requeued:         0                Writes Requeued:         0
 Reads Completed:        1,        4KiB   Writes Completed:   196143,     1776MiB
 Read Merges:            0,        0KiB   Write Merges:       234578,   938488KiB
 IO unplugs:        301311                Timer unplugs:          25

(I re-ran the test twice, got similar results)

So, apparently, with virtio, there's a lot more data being written to
disk. The underlying filesystem is ext3, and is mounted as /tmp. It only
contains the VM image file. Another difference is that, with virtio, the
data was shared equally over all 4 CPUs, whereas without virtio, CPU0 and
CPU1 did all the work.
In the virtio log, I also see a (null) process doing a lot of writes.

I uploaded the logs to http://blop.info/bazaar/virtio/, if you want to
take a look.

Thank you,
- Lucas
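One way to read the two summaries (a back-of-the-envelope sketch, not something btrace prints itself) is to divide the dispatched write volume by the number of write dispatches, which gives the average size of each request that reached the device. The helper name below is made up for illustration; the figures are copied from the summaries quoted in this message.

```c
#include <stdio.h>

/* Hypothetical helper: average size, in KiB, of one dispatched write,
 * given the total dispatched volume in MiB and the dispatch count. */
double avg_dispatch_kib(double total_mib, unsigned long dispatches)
{
	return total_mib * 1024.0 / (double)dispatches;
}

/* Print the averages for the two runs quoted in this thread. */
void print_dispatch_summary(void)
{
	printf("without virtio: %.1f KiB per write dispatch\n",
	       avg_dispatch_kib(1089.0, 9769UL));
	printf("with virtio:    %.1f KiB per write dispatch\n",
	       avg_dispatch_kib(1776.0, 196143UL));
}
```

With these figures the average comes out at roughly 114 KiB per dispatch without virtio and roughly 9 KiB with virtio, so the virtio run issued about twenty times as many write dispatches for about 1.6 times the data. That only restates the numbers above; it does not by itself say where the extra writes come from.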
[KVM PATCH v3 1/2] eventfd: export fget and signal interfaces for module use
We will use this later in the series

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 fs/eventfd.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..faf9e60 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/module.h>

 struct eventfd_ctx {
 	wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)

 	return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);

 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)

 	return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);

 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {
@@ -228,6 +231,7 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 	kfree(ctx);
 	return fd;
 }
+EXPORT_SYMBOL_GPL(sys_eventfd2);

 SYSCALL_DEFINE1(eventfd, unsigned int, count)
 {
[KVM PATCH v3 0/2] irqfd
(Applies to kvm.git 41b76d8d0487c26d6d4d3fe53c1ff59b3236f096)

This series implements a mechanism called irqfd. It lets you create
an eventfd based file-descriptor to inject interrupts to a kvm guest. We
associate one gsi per fd for fine-grained routing.

[ Changelog:

   v3:
      *) The kernel now allocates the eventfd (need to export sys_eventfd2)
      *) Added a flags field for future expansion to kvm_irqfd()
      *) We properly toggle the irq level 1+0.
      *) We re-use the USERSPACE_SRC_ID instead of creating our own
      *) Properly check for failures establishing a poll-table with eventfd
      *) Fixed fd/file leaks on failure
      *) Rebased to latest kvm.git::41b76d8d04

   v2:
      *) Dropped notifier_chain based callbacks in favor of
         wait_queue_t::func and file::poll based callbacks (Thanks to
         Davide for the suggestion)

   v1:
      *) Initial release
]

We do not have a user of this interface in this series, though note that
future versions of virtual-bus (v4 and above) will be based on this.

The first patch will require mainline buy-in, particularly from Davide
(cc'd). The last patch is kvm specific.

qemu-kvm.git patch to follow.

-Greg

---

Gregory Haskins (2):
      kvm: add support for irqfd via eventfd-notification interface
      eventfd: export fget and signal interfaces for module use

 arch/x86/kvm/Makefile    |    2
 arch/x86/kvm/x86.c       |    1
 fs/eventfd.c             |    4
 include/linux/kvm.h      |    7
 include/linux/kvm_host.h |    4
 virt/kvm/irqfd.c         |  158
 virt/kvm/kvm_main.c      |   11
 7 files changed, 186 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/irqfd.c
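For readers who have not used eventfd before, the signaling side that irqfd builds on can be tried from plain userspace. The sketch below uses only the standard eventfd(2) interface and does not touch KVM at all; the function name is made up for illustration. With irqfd, the same kind of write() on the fd handed back by the kernel is what would request injection of the associated gsi.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Signal an eventfd `times` times, then read it once.
 * Returns the counter value read back, or -1 on error.
 * Illustration only: no KVM involved, the name is hypothetical. */
long eventfd_signal_and_drain(unsigned int times)
{
	uint64_t one = 1, total = 0;
	unsigned int i;
	int fd;

	if (times == 0)
		return 0;	/* a blocking read on a zero counter would hang */

	fd = eventfd(0, 0);	/* counter starts at 0, default flags */
	if (fd < 0)
		return -1;

	for (i = 0; i < times; i++) {
		/* each 8-byte write adds its value to the counter */
		if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
			close(fd);
			return -1;
		}
	}

	/* one read returns the accumulated count and resets it to zero */
	if (read(fd, &total, sizeof(total)) != (ssize_t)sizeof(total)) {
		close(fd);
		return -1;
	}

	close(fd);
	return (long)total;
}
```

Several signals that arrive before the reader runs are folded into a single counter value, which is why a consumer sees "at least one event happened" rather than a queue of individual events.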
[KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface
This allows an eventfd to be registered as an irq source with a guest. Any signaling operation on the eventfd (via userspace or kernel) will inject the registered GSI at the next available window. Signed-off-by: Gregory Haskins ghask...@novell.com --- arch/x86/kvm/Makefile|2 - arch/x86/kvm/x86.c |1 include/linux/kvm.h |7 ++ include/linux/kvm_host.h |4 + virt/kvm/irqfd.c | 158 ++ virt/kvm/kvm_main.c | 11 +++ 6 files changed, 182 insertions(+), 1 deletions(-) create mode 100644 virt/kvm/irqfd.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index b43c4ef..d5fff51 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -3,7 +3,7 @@ # common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ -coalesced_mmio.o irq_comm.o) +coalesced_mmio.o irq_comm.o irqfd.o) ifeq ($(CONFIG_KVM_TRACE),y) common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o) endif diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ab1fdac..0773433 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1027,6 +1027,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_REINJECT_CONTROL: case KVM_CAP_IRQ_INJECT_STATUS: case KVM_CAP_ASSIGN_DEV_IRQ: + case KVM_CAP_IRQFD: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 3db5d8d..0e594f2 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -415,6 +415,7 @@ struct kvm_trace_rec { #define KVM_CAP_ASSIGN_DEV_IRQ 29 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */ #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30 +#define KVM_CAP_IRQFD 31 #ifdef KVM_CAP_IRQ_ROUTING @@ -454,6 +455,11 @@ struct kvm_irq_routing { #endif +struct kvm_irqfd { + __u32 gsi; + __u32 flags; +}; + /* * ioctls for VM fds */ @@ -498,6 +504,7 @@ struct kvm_irq_routing { #define KVM_ASSIGN_SET_MSIX_ENTRY \ _IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry) #define KVM_DEASSIGN_DEV_IRQ _IOW(KVMIO, 0x75, struct kvm_assigned_irq) +#define KVM_IRQFD _IOW(KVMIO, 0x76, struct 
kvm_irqfd) /* * ioctls for vcpu fds diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 095ebb6..6a8d1c1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -134,6 +134,7 @@ struct kvm { struct list_head vm_list; struct kvm_io_bus mmio_bus; struct kvm_io_bus pio_bus; + struct list_head irqfds; struct kvm_vm_stat stat; struct kvm_arch arch; atomic_t users_count; @@ -524,4 +525,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {} #endif +int kvm_irqfd(struct kvm *kvm, int gsi, int flags); +void kvm_irqfd_release(struct kvm *kvm); + #endif diff --git a/virt/kvm/irqfd.c b/virt/kvm/irqfd.c new file mode 100644 index 000..4fd2bea --- /dev/null +++ b/virt/kvm/irqfd.c @@ -0,0 +1,158 @@ +/* + * irqfd: Allows an eventfd to be used to inject an interrupt to the guest + * + * Credit goes to Avi Kivity for the original idea. + * + * Copyright 2009 Novell. All Rights Reserved. + * + * Author: + * Gregory Haskins ghask...@novell.com + * + * This file is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software Foundation, + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. 
+ */ + +#include linux/kvm_host.h +#include linux/eventfd.h +#include linux/workqueue.h +#include linux/syscalls.h +#include linux/wait.h +#include linux/poll.h +#include linux/file.h +#include linux/list.h + +struct _irqfd { + struct kvm *kvm; + int gsi; + struct file *file; + struct list_head list; + poll_tablept; + wait_queue_head_t*wqh; + wait_queue_t wait; + struct work_structwork; +}; + +static void +irqfd_inject(struct work_struct *work) +{ + struct _irqfd *irqfd = container_of(work, struct _irqfd, work); + struct kvm *kvm = irqfd-kvm; + + mutex_lock(kvm-lock); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0); + mutex_unlock(kvm-lock); +} + +static int +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int
[PATCH v3] qemu-kvm: add irqfd support
(Applies to qemu-kvm.git 8f7a30dbc40a1d4c09275566f9ed9647ed1ee50f)

irqfd lets you create an eventfd based file-descriptor to inject interrupts
to a kvm guest. We associate one gsi per fd for fine-grained routing.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 kvm/libkvm/libkvm.c |   18 ++++++++++++++++++
 kvm/libkvm/libkvm.h |   14 ++++++++++++++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index 0610e3f..dc9a951 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -1440,3 +1440,21 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
 	return ret;
 }
 #endif
+
+int kvm_irqfd(kvm_context_t kvm, int gsi, int flags)
+{
+	int r;
+	struct kvm_irqfd data = {
+		.gsi   = gsi,
+		.flags = flags,
+	};
+
+	if (!kvm_check_extension(kvm, KVM_CAP_IRQFD))
+		return -ENOENT;
+
+	r = ioctl(kvm->vm_fd, KVM_IRQFD, &data);
+	if (r == -1)
+		r = -errno;
+	return r;
+}
+
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index ce6f054..89a8893 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -856,6 +856,20 @@ int kvm_commit_irq_routes(kvm_context_t kvm);
  */
 int kvm_get_irq_route_gsi(kvm_context_t kvm);

+/*!
+ * \brief Create a file descriptor for injecting interrupts
+ *
+ * Creates an eventfd based file-descriptor that maps to a specific GSI
+ * in the guest.  eventfd compliant signaling (write() from userspace, or
+ * eventfd_signal() from kernelspace) will cause the GSI to inject
+ * itself into the guest at the next available window.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param gsi GSI to assign to this fd
+ * \param flags reserved, must be zero
+ */
+int kvm_irqfd(kvm_context_t kvm, int gsi, int flags);
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(kvm_context_t kvm,
 			   struct kvm_assigned_msix_nr *msix_nr);
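The return-value handling in kvm_irqfd() follows a common pattern: libc calls report failure as -1 with the reason in errno, and the wrapper turns that into a single negative errno value. A minimal standalone sketch of that conversion is below; both function names are made up for illustration, and close() on an invalid descriptor stands in for the ioctl() so the example runs without /dev/kvm.

```c
#include <errno.h>
#include <unistd.h>

/* Turn the libc "-1 plus errno" convention into a negative errno
 * value; any other return value is passed through unchanged.
 * Hypothetical helper, not part of libkvm. */
int neg_errno(int ret)
{
	if (ret == -1)
		return -errno;
	return ret;
}

/* Example use: closing an invalid descriptor fails with EBADF,
 * so this returns -EBADF for fd == -1. */
int close_checked(int fd)
{
	return neg_errno(close(fd));
}
```

A caller can then test for `r < 0` and still get a descriptor or other non-negative result on success from the same return value.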
[patch 1/4] KVM: MMU: protect kvm_mmu_change_mmu_pages with mmu_lock
kvm_handle_hva, called by MMU notifiers, manipulates mmu data only with
the protection of mmu_lock.

Update kvm_mmu_change_mmu_pages callers to take mmu_lock, thus protecting
against kvm_handle_hva.

CC: Andrea Arcangeli aarca...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2732,7 +2732,6 @@ void kvm_mmu_slot_remove_write_access(st
 {
 	struct kvm_mmu_page *sp;

-	spin_lock(&kvm->mmu_lock);
 	list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
 		int i;
 		u64 *pt;
@@ -2747,7 +2746,6 @@ void kvm_mmu_slot_remove_write_access(st
 				pt[i] &= ~PT_WRITABLE_MASK;
 	}
 	kvm_flush_remote_tlbs(kvm);
-	spin_unlock(&kvm->mmu_lock);
 }

 void kvm_mmu_zap_all(struct kvm *kvm)
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -1647,10 +1647,12 @@ static int kvm_vm_ioctl_set_nr_mmu_pages
 		return -EINVAL;

 	down_write(&kvm->slots_lock);
+	spin_lock(&kvm->mmu_lock);

 	kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages);
 	kvm->arch.n_requested_mmu_pages = kvm_nr_mmu_pages;

+	spin_unlock(&kvm->mmu_lock);
 	up_write(&kvm->slots_lock);
 	return 0;
 }
@@ -1826,7 +1828,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kv

 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
+		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
+		spin_unlock(&kvm->mmu_lock);
 		kvm_flush_remote_tlbs(kvm);
 		memslot = &kvm->memslots[log->slot];
 		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
@@ -4510,12 +4514,14 @@ int kvm_arch_set_memory_region(struct kv
 		}
 	}

+	spin_lock(&kvm->mmu_lock);
 	if (!kvm->arch.n_requested_mmu_pages) {
 		unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
 		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
 	}

 	kvm_mmu_slot_remove_write_access(kvm, mem->slot);
+	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);

 	return 0;
[patch 4/4] KVM: x86: disallow changing a slots size
Support for shrinking aliases complicates kernel code unnecessarily,
while userspace can do the same with two operations: delete an alias,
then create a new alias.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -1707,6 +1707,7 @@ static int kvm_vm_ioctl_set_memory_alias
 {
 	int r, n;
 	struct kvm_mem_alias *p;
+	unsigned long npages;

 	r = -EINVAL;
 	/* General sanity checks */
@@ -1728,9 +1729,12 @@ static int kvm_vm_ioctl_set_memory_alias

 	p = &kvm->arch.aliases[alias->slot];

-	/* FIXME: either disallow shrinking alias slots or disable
-	 * size changes as done with memslots
-	 */
+	/* Disallow changing an alias slot's size. */
+	npages = alias->memory_size >> PAGE_SHIFT;
+	r = -EINVAL;
+	if (npages && p->npages && npages != p->npages)
+		goto out_unlock;
+
 	if (!alias->memory_size) {
 		r = -EBUSY;
 		if (kvm_root_gfn_in_range(kvm, p->base_gfn,
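The effect of the new size check can be modelled in isolation. The sketch below is a toy userspace function, not the kernel code, and it assumes the condition reads as "both the old and the new page count are non-zero and they differ"; the function name is made up for illustration.

```c
#include <stdbool.h>

/* Toy model of the check: a slot may be created (old size 0),
 * deleted (new size 0) or set again to the same size, but an
 * existing slot may not be resized in place. */
bool alias_size_change_allowed(unsigned long old_npages,
			       unsigned long new_npages)
{
	if (new_npages && old_npages && new_npages != old_npages)
		return false;
	return true;
}
```

Under that reading, userspace that wants a smaller alias first sets the slot to size zero and then creates it again with the new size, which is the two-step sequence the changelog describes.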
[patch 2/4] KVM: take mmu_lock when updating a deleted slot
kvm_handle_hva relies on mmu_lock protection to safely access
the memslot structures.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1199,8 +1199,10 @@ int __kvm_set_memory_region(struct kvm *

 	kvm_free_physmem_slot(&old, npages ? &new : NULL);
 	/* Slot deletion case: we have to update the current slot */
+	spin_lock(&kvm->mmu_lock);
 	if (!npages)
 		*memslot = old;
+	spin_unlock(&kvm->mmu_lock);
 #ifdef CONFIG_DMAR
 	/* map the pages in iommu page table */
 	r = kvm_iommu_map_pages(kvm, base_gfn, npages);
[patch 0/4] set_memory_region locking fixes / vcpu-arch.cr3 + removal of memslots
[patch 3/4] KVM: introduce kvm_arch_can_free_memslot, disallow slot deletion if cached cr3
Disallow the deletion of memory slots (and aliases, for x86 case), if a vcpu contains a cr3 that points to such slot/alias. This complements commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/ia64/kvm/kvm-ia64.c === --- kvm.orig/arch/ia64/kvm/kvm-ia64.c +++ kvm/arch/ia64/kvm/kvm-ia64.c @@ -1633,6 +1633,11 @@ void kvm_arch_flush_shadow(struct kvm *k kvm_flush_remote_tlbs(kvm); } +int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + return 1; +} + long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { Index: kvm/arch/powerpc/kvm/powerpc.c === --- kvm.orig/arch/powerpc/kvm/powerpc.c +++ kvm/arch/powerpc/kvm/powerpc.c @@ -176,6 +176,11 @@ void kvm_arch_flush_shadow(struct kvm *k { } +int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + return 1; +} + struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id) { struct kvm_vcpu *vcpu; Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -691,6 +691,11 @@ void kvm_arch_flush_shadow(struct kvm *k { } +int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + return 1; +} + gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) { return gfn; Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -1676,6 +1676,27 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t return gfn; } +static int kvm_root_gfn_in_range(struct kvm *kvm, gfn_t base_gfn, +gfn_t end_gfn, bool unalias) +{ + struct kvm_vcpu *vcpu; + gfn_t root_gfn; + int i; + + for (i = 0; i KVM_MAX_VCPUS; ++i) { + vcpu = kvm-vcpus[i]; + if (!vcpu) + continue; + root_gfn = vcpu-arch.cr3 PAGE_SHIFT; + if (unalias) + root_gfn = unalias_gfn(kvm, root_gfn); + if (root_gfn = base_gfn root_gfn = end_gfn) + return 1; + } + + return 0; +} + /* * Set a new alias region. 
Aliases map a portion of physical memory into * another portion. This is useful for memory windows, for example the PC @@ -1706,6 +1727,19 @@ static int kvm_vm_ioctl_set_memory_alias spin_lock(kvm-mmu_lock); p = kvm-arch.aliases[alias-slot]; + + /* FIXME: either disallow shrinking alias slots or disable +* size changes as done with memslots +*/ + if (!alias-memory_size) { + r = -EBUSY; + if (kvm_root_gfn_in_range(kvm, p-base_gfn, + p-base_gfn + p-npages - 1, + false)) + goto out_unlock; + } + + p-base_gfn = alias-guest_phys_addr PAGE_SHIFT; p-npages = alias-memory_size PAGE_SHIFT; p-target_gfn = alias-target_phys_addr PAGE_SHIFT; @@ -1722,6 +1756,9 @@ static int kvm_vm_ioctl_set_memory_alias return 0; +out_unlock: + spin_unlock(kvm-mmu_lock); + up_write(kvm-slots_lock); out: return r; } @@ -4532,6 +4569,15 @@ void kvm_arch_flush_shadow(struct kvm *k kvm_mmu_zap_all(kvm); } +int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + int ret; + + ret = kvm_root_gfn_in_range(kvm, slot-base_gfn, + slot-base_gfn + slot-npages - 1, true); + return !ret; +} + int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) { return vcpu-arch.mp_state == KVM_MP_STATE_RUNNABLE Index: kvm/include/linux/kvm_host.h === --- kvm.orig/include/linux/kvm_host.h +++ kvm/include/linux/kvm_host.h @@ -200,6 +200,7 @@ int kvm_arch_set_memory_region(struct kv struct kvm_memory_slot old, int user_alloc); void kvm_arch_flush_shadow(struct kvm *kvm); +int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot); gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn); struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn); unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn); Index: kvm/virt/kvm/kvm_main.c === --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -1179,8 +1179,13 @@ int __kvm_set_memory_region(struct kvm * } #endif /* not defined CONFIG_S390 */ - if (!npages) + if (!npages) {
[patch 1/4] qemu: external module: smp_send_reschedule compat
smp_send_reschedule was exported (via smp_ops) in v2.6.24. Create a compat function which schedules the IPI to keventd context, in case interrupts are disabled, for kernels 2.6.24. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/kernel/external-module-compat-comm.h b/kvm/kernel/external-module-compat-comm.h index c955927..852a8c1 100644 --- a/kvm/kernel/external-module-compat-comm.h +++ b/kvm/kernel/external-module-compat-comm.h @@ -116,6 +116,10 @@ int kvm_smp_call_function_single(int cpu, void (*func)(void *info), #endif +#if LINUX_VERSION_CODE KERNEL_VERSION(2,6,28) +#define nr_cpu_ids NR_CPUS +#endif + #include linux/miscdevice.h #ifndef KVM_MINOR #define KVM_MINOR 232 @@ -220,6 +224,10 @@ int kvm_smp_call_function_mask(cpumask_t mask, void (*func) (void *info), #define smp_call_function_mask kvm_smp_call_function_mask +void kvm_smp_send_reschedule(int cpu); + +#define smp_send_reschedule kvm_smp_send_reschedule + #endif /* empty_zero_page isn't exported in all kernels */ diff --git a/kvm/kernel/external-module-compat.c b/kvm/kernel/external-module-compat.c index 0d858be..ca269cc 100644 --- a/kvm/kernel/external-module-compat.c +++ b/kvm/kernel/external-module-compat.c @@ -221,6 +221,58 @@ out: return 0; } +#include linux/workqueue.h + +static void vcpu_kick_intr(void *info) +{ +} + +struct kvm_kick { + int cpu; + struct work_struct work; +}; + +#if LINUX_VERSION_CODE KERNEL_VERSION(2,6,20) +static void kvm_do_smp_call_function(void *data) +{ + int me; + struct kvm_kick *kvm_kick = data; +#else +static void kvm_do_smp_call_function(struct work_struct *work) +{ + int me; + struct kvm_kick *kvm_kick = container_of(work, struct kvm_kick, work); +#endif + me = get_cpu(); + + if (kvm_kick-cpu != me) + smp_call_function_single(kvm_kick-cpu, vcpu_kick_intr, +NULL, 0); + kfree(kvm_kick); + put_cpu(); +} + +void kvm_queue_smp_call_function(int cpu) +{ + struct kvm_kick *kvm_kick = kmalloc(sizeof(struct kvm_kick), GFP_ATOMIC); + +#if 
LINUX_VERSION_CODE KERNEL_VERSION(2,6,20) + INIT_WORK(kvm_kick-work, kvm_do_smp_call_function, kvm_kick); +#else + INIT_WORK(kvm_kick-work, kvm_do_smp_call_function); +#endif + + schedule_work(kvm_kick-work); +} + +void kvm_smp_send_reschedule(int cpu) +{ + if (irqs_disabled()) { + kvm_queue_smp_call_function(cpu); + return; + } + smp_call_function_single(cpu, vcpu_kick_intr, NULL, 0); +} #endif /* manually export hrtimer_init/start/cancel */ diff --git a/kvm/kernel/x86/hack-module.awk b/kvm/kernel/x86/hack-module.awk index 260eeef..f4d14da 100644 --- a/kvm/kernel/x86/hack-module.awk +++ b/kvm/kernel/x86/hack-module.awk @@ -35,6 +35,8 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64 desc_ptr \ { sub(/match-dev-msi_enabled/, kvm_pcidev_msi_enabled(match-dev)) } +{ sub(/smp_send_reschedule/, kvm_smp_send_reschedule) } + /^static void __vmx_load_host_state/ { vmx_load_host_state = 1 } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 3/4] KVM: use smp_send_reschedule in kvm_vcpu_kick
KVM uses a function call IPI to cause the exit of a guest running on a physical cpu. For virtual interrupt notification there is no need to wait on IPI receival, or to execute any function. This is exactly what the reschedule IPI does, without the overhead of function IPI. So use it instead of smp_call_function_single in kvm_vcpu_kick. Also change the guest_mode variable to a bit in vcpu-requests, and use that to collapse multiple IPI's that would be issued between the first one and zeroing of guest mode. This allows kvm_vcpu_kick to called with interrupts disabled. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/ia64/kernel/irq_ia64.c === --- kvm.orig/arch/ia64/kernel/irq_ia64.c +++ kvm/arch/ia64/kernel/irq_ia64.c @@ -610,6 +610,9 @@ static struct irqaction ipi_irqaction = .name = IPI }; +/* + * KVM uses this interrupt to force a cpu out of guest mode + */ static struct irqaction resched_irqaction = { .handler = dummy_handler, .flags =IRQF_DISABLED, Index: kvm/arch/ia64/kvm/kvm-ia64.c === --- kvm.orig/arch/ia64/kvm/kvm-ia64.c +++ kvm/arch/ia64/kvm/kvm-ia64.c @@ -668,7 +668,7 @@ again: host_ctx = kvm_get_host_context(vcpu); guest_ctx = kvm_get_guest_context(vcpu); - vcpu-guest_mode = 1; + clear_bit(KVM_REQ_KICK, vcpu-requests); r = kvm_vcpu_pre_transition(vcpu); if (r 0) @@ -685,7 +685,7 @@ again: kvm_vcpu_post_transition(vcpu); vcpu-arch.launched = 1; - vcpu-guest_mode = 0; + set_bit(KVM_REQ_KICK, vcpu-requests); local_irq_enable(); /* @@ -1884,24 +1884,18 @@ void kvm_arch_hardware_unsetup(void) { } -static void vcpu_kick_intr(void *info) -{ -#ifdef DEBUG - struct kvm_vcpu *vcpu = (struct kvm_vcpu *)info; - printk(KERN_DEBUGvcpu_kick_intr %p \n, vcpu); -#endif -} - void kvm_vcpu_kick(struct kvm_vcpu *vcpu) { - int ipi_pcpu = vcpu-cpu; - int cpu = get_cpu(); + int me; + int cpu = vcpu-cpu; if (waitqueue_active(vcpu-wq)) wake_up_interruptible(vcpu-wq); - if (vcpu-guest_mode cpu != ipi_pcpu) - smp_call_function_single(ipi_pcpu, vcpu_kick_intr, 
vcpu, 0); + me = get_cpu(); + if (cpu != me (unsigned) cpu nr_cpu_ids cpu_online(cpu)) + if (!test_and_set_bit(KVM_REQ_KICK, vcpu-requests)) + smp_send_reschedule(cpu); put_cpu(); } Index: kvm/arch/x86/kernel/smp.c === --- kvm.orig/arch/x86/kernel/smp.c +++ kvm/arch/x86/kernel/smp.c @@ -172,6 +172,9 @@ void smp_reschedule_interrupt(struct pt_ { ack_APIC_irq(); inc_irq_stat(irq_resched_count); + /* +* KVM uses this interrupt to force a cpu out of guest mode +*/ } void smp_call_function_interrupt(struct pt_regs *regs) Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -3197,6 +3197,9 @@ static int vcpu_enter_guest(struct kvm_v local_irq_disable(); + clear_bit(KVM_REQ_KICK, vcpu-requests); + smp_mb__after_clear_bit(); + if (vcpu-requests || need_resched() || signal_pending(current)) { local_irq_enable(); preempt_enable(); @@ -3204,13 +3207,6 @@ static int vcpu_enter_guest(struct kvm_v goto out; } - vcpu-guest_mode = 1; - /* -* Make sure that guest_mode assignment won't happen after -* testing the pending IRQ vector bitmap. 
-*/ - smp_wmb(); - if (vcpu-arch.exception.pending) __queue_exception(vcpu); else if (irqchip_in_kernel(vcpu-kvm)) @@ -3252,7 +3248,7 @@ static int vcpu_enter_guest(struct kvm_v set_debugreg(vcpu-arch.host_dr6, 6); set_debugreg(vcpu-arch.host_dr7, 7); - vcpu-guest_mode = 0; + set_bit(KVM_REQ_KICK, vcpu-requests); local_irq_enable(); ++vcpu-stat.exits; @@ -4550,30 +4546,20 @@ int kvm_arch_vcpu_runnable(struct kvm_vc || vcpu-arch.nmi_pending; } -static void vcpu_kick_intr(void *info) -{ -#ifdef DEBUG - struct kvm_vcpu *vcpu = (struct kvm_vcpu *)info; - printk(KERN_DEBUG vcpu_kick_intr %p \n, vcpu); -#endif -} - void kvm_vcpu_kick(struct kvm_vcpu *vcpu) { - int ipi_pcpu = vcpu-cpu; - int cpu; + int me; + int cpu = vcpu-cpu; if (waitqueue_active(vcpu-wq)) { wake_up_interruptible(vcpu-wq); ++vcpu-stat.halt_wakeup; } - /* -* We may be called synchronously with irqs disabled in guest mode, -* So need not to call smp_call_function_single() in that case. -*/ -
[patch 4/4] KVM: protect assigned dev workqueue, int handler and irq acker
kvm_assigned_dev_ack_irq is vulnerable to a race condition with the
interrupt handler function. It does:

	if (dev->host_irq_disabled) {
		enable_irq(dev->host_irq);
		dev->host_irq_disabled = false;
	}

If an interrupt triggers before the dev->host_irq_disabled assignment,
it will disable the interrupt and set dev->host_irq_disabled to true.

On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to
false, and the next kvm_assigned_dev_ack_irq call will fail to reenable
it.

Other than that, having the interrupt handler and work handlers run in
parallel sounds like asking for trouble (could not spot any obvious
problem, but better not have to, it's fragile).

CC: sheng.y...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -345,6 +345,7 @@ struct kvm_assigned_dev_kernel {
 	int flags;
 	struct pci_dev *dev;
 	struct kvm *kvm;
+	spinlock_t assigned_dev_lock;
 };

 struct kvm_irq_mask_notifier {
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -42,6 +42,7 @@
 #include <linux/mman.h>
 #include <linux/swap.h>
 #include <linux/bitops.h>
+#include <linux/spinlock.h>

 #include <asm/processor.h>
 #include <asm/io.h>
@@ -130,6 +131,7 @@ static void kvm_assigned_dev_interrupt_w
 	 * finer-grained lock, update this
 	 */
 	mutex_lock(&kvm->lock);
+	spin_lock_irq(&assigned_dev->assigned_dev_lock);
 	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
 		struct kvm_guest_msix_entry *guest_entries =
 			assigned_dev->guest_msix_entries;
@@ -156,18 +158,21 @@ static void kvm_assigned_dev_interrupt_w
 		}
 	}

+	spin_unlock_irq(&assigned_dev->assigned_dev_lock);
 	mutex_unlock(&assigned_dev->kvm->lock);
 }

 static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
 {
+	unsigned long flags;
 	struct kvm_assigned_dev_kernel *assigned_dev =
 		(struct kvm_assigned_dev_kernel *) dev_id;

+	spin_lock_irqsave(&assigned_dev->assigned_dev_lock, flags);
 	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
 		int index = find_index_from_host_irq(assigned_dev, irq);
 		if (index < 0)
-			return IRQ_HANDLED;
+			goto out;
 		assigned_dev->guest_msix_entries[index].flags |=
 			KVM_ASSIGNED_MSIX_PENDING;
 	}
@@ -177,6 +182,8 @@ static irqreturn_t kvm_assigned_dev_intr

 	disable_irq_nosync(irq);
 	assigned_dev->host_irq_disabled = true;
+out:
+	spin_unlock_irqrestore(&assigned_dev->assigned_dev_lock, flags);
 	return IRQ_HANDLED;
 }

@@ -184,6 +191,7 @@ static irqreturn_t kvm_assigned_dev_intr
 static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
 	struct kvm_assigned_dev_kernel *dev;
+	unsigned long flags;

 	if (kian->gsi == -1)
 		return;
@@ -196,10 +204,12 @@ static void kvm_assigned_dev_ack_irq(str
 	/* The guest irq may be shared so this ack may be
 	 * from another device.
 	 */
+	spin_lock_irqsave(&dev->assigned_dev_lock, flags);
 	if (dev->host_irq_disabled) {
 		enable_irq(dev->host_irq);
 		dev->host_irq_disabled = false;
 	}
+	spin_unlock_irqrestore(&dev->assigned_dev_lock, flags);
 }

 static void deassign_guest_irq(struct kvm *kvm,
@@ -615,6 +625,7 @@ static int kvm_vm_ioctl_assign_device(st
 	match->host_devfn = assigned_dev->devfn;
 	match->flags = assigned_dev->flags;
 	match->dev = dev;
+	spin_lock_init(&match->assigned_dev_lock);
 	match->irq_source_id = -1;
 	match->kvm = kvm;
 	match->ack_notifier.irq_acked = kvm_assigned_dev_ack_irq;
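The interleaving described in the changelog can be replayed deterministically in a small single-threaded model. This is only a sketch of the sequence of state changes, not kernel code: the struct and function names are made up, the "interrupt" is an ordinary function call placed in the window between enable_irq() and the flag update, and no real locking or interrupts are involved.

```c
#include <stdbool.h>

/* Toy model of an assigned device: whether the host irq line is
 * enabled, and the flag the ack path uses to remember a disable. */
struct toy_dev {
	bool line_enabled;
	bool host_irq_disabled;
};

/* What the interrupt handler does: mask the line and record it. */
void toy_intr(struct toy_dev *d)
{
	d->line_enabled = false;	/* stands in for disable_irq_nosync() */
	d->host_irq_disabled = true;
}

/* The unlocked ack path, with an optional interrupt arriving between
 * the enable and the flag update. */
void toy_ack(struct toy_dev *d, bool irq_in_window)
{
	if (d->host_irq_disabled) {
		d->line_enabled = true;	/* stands in for enable_irq() */
		if (irq_in_window)
			toy_intr(d);
		d->host_irq_disabled = false;
	}
}

/* Returns true if the line ends up masked while the flag says it is
 * not, so that a later ack does not re-enable it. */
bool toy_line_stuck(bool irq_in_window)
{
	struct toy_dev d = { .line_enabled = false, .host_irq_disabled = true };

	toy_ack(&d, irq_in_window);
	toy_ack(&d, false);	/* the next ack sees the flag cleared */
	return !d.line_enabled && !d.host_irq_disabled;
}
```

In this model `toy_line_stuck(true)` ends with the line masked and the flag cleared, which is the stuck state the changelog warns about, while `toy_line_stuck(false)` leaves the line enabled. Holding the same spinlock with interrupts disabled around both the handler body and the ack sequence removes the window in which the handler could run.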
[patch 2/4] KVM: x86: wake up waitqueue before calling get_cpu()
From: Jan Blunck jblu...@suse.de

This moves the get_cpu() call down so that it is called after we wake up the waiters. Therefore the waitqueue locks can safely be rt mutexes.

Signed-off-by: Jan Blunck jblu...@suse.de
Signed-off-by: Sven-Thorsten Dietrich s...@thebigcorporation.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -4544,7 +4544,7 @@ static void vcpu_kick_intr(void *info)
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
 	int ipi_pcpu = vcpu->cpu;
-	int cpu = get_cpu();
+	int cpu;

 	if (waitqueue_active(&vcpu->wq)) {
 		wake_up_interruptible(&vcpu->wq);
@@ -4554,6 +4554,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu
 	 * We may be called synchronously with irqs disabled in guest mode,
 	 * So need not to call smp_call_function_single() in that case.
 	 */
+	cpu = get_cpu();
 	if (vcpu->guest_mode && vcpu->cpu != cpu)
 		smp_call_function_single(ipi_pcpu, vcpu_kick_intr, vcpu, 0);
 	put_cpu();
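The ordering argument in this patch can be illustrated with a tiny userspace model. This is a sketch under stated assumptions, not kernel code: all `model_*` names are hypothetical. It relies only on two facts: get_cpu() disables preemption until put_cpu(), and a lock that may sleep (as an rt mutex can) must not be taken while preemption is disabled. The model counts how often the wake-up path would take such a lock inside a preemption-disabled region.

```c
#include <assert.h>

/* Hypothetical model state; the real kernel tracks this per task. */
static int model_preempt_count;
static int model_violations;

/* Models get_cpu(): disable preemption and return the current CPU id. */
static int model_get_cpu(void)
{
	model_preempt_count++;
	return 0;
}

/* Models put_cpu(): reenable preemption. */
static void model_put_cpu(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models waking the waiters, which takes a waitqueue lock that may sleep. */
static void model_wake_up(void)
{
	if (model_preempt_count > 0)
		model_violations++; /* sleeping lock taken with preemption off */
}

/* Ordering before the patch: get_cpu() first, then wake the waiters. */
int model_kick_old(void)
{
	int cpu;

	model_violations = 0;
	cpu = model_get_cpu();
	model_wake_up();
	(void)cpu; /* the real code compares this against vcpu->cpu */
	model_put_cpu();
	return model_violations;
}

/* Ordering after the patch: wake the waiters, then call get_cpu(). */
int model_kick_new(void)
{
	int cpu;

	model_violations = 0;
	model_wake_up();
	cpu = model_get_cpu();
	(void)cpu;
	model_put_cpu();
	return model_violations;
}
```

In this model the old ordering records one violation and the new ordering records none, which is the property the patch description is after.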
boot problems with if=virtio
I know there have been a couple other threads here about booting with if=virtio, but I think this might be a different problem, not sure:

I am using kvm.git (41b76d8d0487c26d6d4d3fe53c1ff59b3236f096) and qemu-kvm.git (8f7a30dbc40a1d4c09275566f9ed9647ed1ee50f) and linux 2.6.20-rc3

It appears to build fine. I am trying to run the following command:

name=newcastle-xmailt01
dev1=/dev/disk/by-id/scsi-3600a0b8f1eb1069748d8c230
dev2=/dev/disk/by-id/scsi-3600a0b8f1eb106dc48f45432
macaddr=00:50:56:00:00:06
tap=tap6
cpus=1
mem=1024
/usr/local/bin/qemu-system-x86_64 -name $name \
    -drive file=$dev1,if=virtio,boot=on,cache=none \
    -drive file=$dev2,if=virtio,boot=off,cache=none \
    -m $mem -net nic,model=virtio,vlan=0,macaddr=$macaddr \
    -net tap,vlan=0,ifname=$tap,script=/etc/qemu-ifup -vnc 127.0.0.1:6 -smp $cpus -daemonize

...and I get

Boot failed: could not read the boot disk

This did work with the kvm-userspace.git (kvm-85rc6). I can get this to work with a windows vm, using ide. Was there a recent change to the -drive options that I am missing?

Thanks,
-Andrew
Re: [PATCH 1/2] KVM: Discard shadow_mt_mask
On Mon, Apr 27, 2009 at 05:47:44PM +0800, Sheng Yang wrote:

> mt_mask is out of date; it is now only used as a flag to indicate whether
> TDP is enabled. Get rid of it and use tdp_enabled instead.
>
> Signed-off-by: Sheng Yang sh...@linux.intel.com
> ---
>  arch/x86/include/asm/kvm_host.h |    2 +-
>  arch/x86/kvm/mmu.c              |    8 +++-----
>  arch/x86/kvm/vmx.c              |    3 +--
>  arch/x86/kvm/x86.c              |    2 +-
>  4 files changed, 6 insertions(+), 9 deletions(-)

ACK both patches.
Re: [PATCH] kvm-userspace: Make PC speaker emulation aware of in-kernel PIT
On Thu, Apr 23, 2009 at 10:24:05PM +0200, Jan Kiszka wrote:

> When using the in-kernel PIT the speaker emulation has to synchronize the
> PIT state with KVM. Enhance the existing speaker sound device and allow it
> to take over port 0x61 by using KVM_CREATE_PIT_NOSPKR when available. This
> unbreaks -soundhw pcspk in KVM mode.

ACK both patches.
Re: [PATCH 5/5] add ksm kernel shared memory driver.
On Mon, 20 Apr 2009 04:36:06 +0300 Izik Eidus iei...@redhat.com wrote:

> Ksm is a driver that allows merging identical pages between one or more
> applications in a way invisible to the applications that use it. Pages that
> are merged are marked as readonly and are COWed when any application tries
> to change them.

Breaks sparc64 and probably lots of other architectures:

mm/ksm.c: In function `try_to_merge_two_pages_alloc':
mm/ksm.c:697: error: `_PAGE_RW' undeclared (first use in this function)

there should be an official arch-independent way of manipulating vma->vm_page_prot, but I'm not immediately finding it.

An alternative (and quite inferior) fix would be to disable ksm on architectures which don't implement _PAGE_RW. That's most of them.
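As background for the quoted description (merge identical pages, mark them read-only, copy on write), here is a toy userspace sketch of the idea. It is not how ksm is implemented: the real driver works on page tables and write-protects PTEs, which is exactly where the `_PAGE_RW` portability problem arises. Here a reference count stands in for the read-only marking, and `toy_page`, `toy_merge` and `toy_write` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical stand-in for a physical page. */
struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	int refcount; /* > 1 means shared, so it must be treated as read-only */
};

/* Allocate a private page initialised from src (TOY_PAGE_SIZE bytes). */
struct toy_page *toy_page_new(const unsigned char *src)
{
	struct toy_page *p = malloc(sizeof(*p));

	assert(p != NULL);
	memcpy(p->data, src, TOY_PAGE_SIZE);
	p->refcount = 1;
	return p;
}

/* Drop one reference and free the page when nobody uses it any more. */
static void toy_page_put(struct toy_page *p)
{
	if (--p->refcount == 0)
		free(p);
}

/* If both slots hold identical contents, point *b at *a. Returns 1 on merge. */
int toy_merge(struct toy_page **a, struct toy_page **b)
{
	if (*a == *b || memcmp((*a)->data, (*b)->data, TOY_PAGE_SIZE) != 0)
		return 0;
	toy_page_put(*b);
	*b = *a;
	(*a)->refcount++;
	return 1;
}

/* Write one byte through a slot, copying first if the page is shared. */
void toy_write(struct toy_page **slot, size_t off, unsigned char value)
{
	struct toy_page *p = *slot;

	assert(off < TOY_PAGE_SIZE);
	if (p->refcount > 1) {
		struct toy_page *copy = toy_page_new(p->data); /* the "COW" step */

		p->refcount--;
		*slot = copy;
		p = copy;
	}
	p->data[off] = value;
}
```

Two slots with equal contents end up sharing one page after toy_merge(), and the first toy_write() through either slot gives that slot a private copy again while the other slot keeps the original contents.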