[COMMIT maint/2.6.30] Merge commit 'qemu-svn/stable_0_10' into maint/2.6.30

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* commit 'qemu-svn/stable_0_10':
  Fix savevm after BDRV_FILE size enforcement
  stop dirty tracking just at the end of migration (Glauber Costa)
  create qemu_file_set_error (Glauber Costa)
  propagate error on failed completion (Glauber Costa)
  qcow2: fix image creation for large, > ~2TB, images (Chris Wright)
  pci_add storage: fix error handling for 'if' parameter (Eduardo Habkost)
  Fix (at least one cause of) qcow2 corruption. (Nolan Leake)
  Fix oops on 2.6.25 guest (Rusty Russell)
  SH4: Add support for kernel cmdline
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT maint/2.6.30] kvm: libkvm: Compile with correct kernel include directory

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index dfbff37..df4d686 100755
--- a/configure
+++ b/configure
@@ -1679,7 +1679,7 @@ configure_kvm() {
           \( $cpu = i386 -o $cpu = x86_64 -o $cpu = ia64 -o $cpu = powerpc \); then
     echo "#define USE_KVM 1" >> $config_h
     echo "USE_KVM=1" >> $config_mak
-    echo "CONFIG_KVM_KERNEL_INC=$kerneldir/include" >> $config_mak
+    echo "CONFIG_KVM_KERNEL_INC=$kerneldir/include" >> config-host.mak
     if test $kvm_cap_pit = yes ; then
       echo "USE_KVM_PIT=1" >> $config_mak
       echo "#define USE_KVM_PIT 1" >> $config_h
diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index a8f1917..8811d84 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -12,6 +12,7 @@ cc-option = $(shell if $(CC) $(1) -S -o /dev/null -xc /dev/null \
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, )
 CFLAGS += $(call cc-option, -fno-stack-protector-all, )
+CFLAGS += -I$(CONFIG_KVM_KERNEL_INC)
 
 LDFLAGS += $(CFLAGS)
 


[COMMIT maint/2.6.30] Build libkvm from libkvm directory in new layout

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/Makefile.target b/Makefile.target
index 9ec7068..440857e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -605,7 +605,7 @@ endif
 ifdef CONFIG_KVM_KERNEL_INC
 CFLAGS += -I $(CONFIG_KVM_KERNEL_INC)
 LIBS += -lkvm
-DEPLIBS += ../libkvm/libkvm.a
+DEPLIBS += libkvm.a
 endif
 
 ifdef CONFIG_VNC_TLS
@@ -814,6 +814,14 @@ $(QEMU_PROG): LIBS += $(SDL_LIBS) $(COCOA_LIBS) $(CURSES_LIBS) $(BRLAPI_LIBS) $(
 $(QEMU_PROG): $(OBJS) ../libqemu_common.a libqemu.a $(DEPLIBS)
 	$(LINK)
 
+FORCE:
+
+libkvm.a: FORCE
+	$(MAKE) -C ../kvm/libkvm
+	if ! cmp -s libkvm.a ../kvm/libkvm/libkvm.a; then \
+	   cp ../kvm/libkvm/libkvm.a . ; \
+	fi
+
 endif # !CONFIG_USER_ONLY
 
 gdbstub-xml.c: $(TARGET_XML_FILES) feature_to_c.sh


[COMMIT maint/2.6.30] configure: Include libkvm files

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index f526bd5..7a97cc4 100755
--- a/configure
+++ b/configure
@@ -489,6 +489,8 @@ if test $werror = yes ; then
     CFLAGS="$CFLAGS -Werror"
 fi
 
+CFLAGS="$CFLAGS -I$(readlink -f kvm/libkvm)"
+
 if test $solaris = no ; then
     if ld --version 2>/dev/null | grep "GNU ld" >/dev/null 2>/dev/null ; then
         LDFLAGS="$LDFLAGS -Wl,--warn-common"


[COMMIT maint/2.6.30] kvm: libkvm: adjust Makefile to new layout

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index 72c36a6..62bc45f 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -1,4 +1,4 @@
-include ../config.mak
+include ../../config-host.mak
 include config-$(ARCH).mak
 
 # cc-option
@@ -9,7 +9,6 @@ cc-option = $(shell if $(CC) $(1) -S -o /dev/null -xc /dev/null \
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, )
 CFLAGS += $(call cc-option, -fno-stack-protector-all, )
-CFLAGS += -I $(LIBKVM_KERNELDIR)/include
 
 LDFLAGS += $(CFLAGS)
 


[COMMIT maint/2.6.30] Default to building x86_64-softmmu for kvm

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index 7a97cc4..dfbff37 100755
--- a/configure
+++ b/configure
@@ -98,7 +98,7 @@ else
   cpu=`uname -m`
 fi
 
-target_list=
+target_list=x86_64-softmmu
 case $cpu in
   i386|i486|i586|i686|i86pc|BePC)
 cpu=i386


[COMMIT maint/2.6.30] configure: Use USE_KVM instead of CONFIG_KVM

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

We use USE_KVM instead of CONFIG_KVM to distinguish between our version
of the kvm/qemu glue and the official one.  Change ./configure to reflect that.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index d31ca0d..f526bd5 100755
--- a/configure
+++ b/configure
@@ -1716,9 +1716,9 @@ case $target_cpu in
       echo "#define USE_KQEMU 1" >> $config_h
     fi
     if test $kvm = yes ; then
-      echo "CONFIG_KVM=yes" >> $config_mak
+      echo "USE_KVM=yes" >> $config_mak
       echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
-      echo "#define CONFIG_KVM 1" >> $config_h
+      echo "#define USE_KVM 1" >> $config_h
     fi
     configure_kvm
   ;;
@@ -1740,9 +1740,9 @@ case $target_cpu in
     configure_kvm
     if [ use_upstream_kvm = yes ]; then
     if test $kvm = yes ; then
-      echo "CONFIG_KVM=yes" >> $config_mak
+      echo "USE_KVM=yes" >> $config_mak
       echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
-      echo "#define CONFIG_KVM 1" >> $config_h
+      echo "#define USE_KVM 1" >> $config_h
     fi
     fi
   ;;
@@ -1804,9 +1804,9 @@ case $target_cpu in
     echo "#define TARGET_PPC 1" >> $config_h
     echo "#define TARGET_PPCEMB 1" >> $config_h
     if test $kvm = yes ; then
-      echo "CONFIG_KVM=yes" >> $config_mak
+      echo "USE_KVM=yes" >> $config_mak
       echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
-      echo "#define CONFIG_KVM 1" >> $config_h
+      echo "#define USE_KVM 1" >> $config_h
     fi
     gdb_xml_files="power-core.xml power-fpu.xml power-altivec.xml power-spe.xml"
   ;;


[COMMIT maint/2.6.30] kvm: use new header layout

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/Makefile.target b/Makefile.target
index 4fbda65..be97e6b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -140,6 +140,9 @@ endif
 kvm.o: CFLAGS+=$(KVM_CFLAGS)
 kvm-all.o: CFLAGS+=$(KVM_CFLAGS)
 
+qemu-kvm.o qemu-kvm-x86.o device-assignment.o ioapic.o i8259.o i8254-kvm.o \
+   kvm-tpr-opt.o apic.o:  CFLAGS+=$(KVM_CFLAGS)
+
 all: $(PROGS)
 
 #
@@ -602,8 +605,7 @@ ifdef CONFIG_CS4231A
 SOUND_HW += cs4231a.o
 endif
 
-ifdef CONFIG_KVM_KERNEL_INC
-CFLAGS += -I $(CONFIG_KVM_KERNEL_INC)
+ifdef USE_KVM
 DEPLIBS += libkvm.a
 endif
 
@@ -816,7 +818,7 @@ $(QEMU_PROG): $(OBJS) ../libqemu_common.a libqemu.a $(DEPLIBS)
 FORCE:
 
 libkvm.a: FORCE
-	$(MAKE) -C ../kvm/libkvm
+	$(MAKE) -C ../kvm/libkvm KVM_CFLAGS="$(KVM_CFLAGS)"
 	if ! cmp -s libkvm.a ../kvm/libkvm/libkvm.a; then \
 	   cp ../kvm/libkvm/libkvm.a . ; \
 	fi
diff --git a/configure b/configure
index df4d686..913bcb8 100755
--- a/configure
+++ b/configure
@@ -194,9 +194,6 @@ signalfd=no
 eventfd=no
 cpu_emulation=yes
 
-# qemu-kvm: use local kerneldir
-kerneldir=$(readlink -f kvm/kernel)
-
 # OS specific
 if check_define __linux__ ; then
   targetos=Linux
@@ -769,8 +766,22 @@ fi
 ##
 # KVM probe
 
+case $cpu in
+i386 | x86_64)
+   kvm_arch=x86
+   ;;
+*)
+   kvm_arch=$cpu
+   ;;
+esac
+
+kvm_cflags=
+
 if test $kvm = yes ; then
 
+kvm_cflags="-I$source_path/kvm/kernel/include"
+kvm_cflags="$kvm_cflags -I$source_path/kvm/kernel/arch/$kvm_arch/include"
+
 # test for KVM_CAP_PIT
 
 cat > $TMPC <<EOF
@@ -780,7 +791,7 @@ cat > $TMPC <<EOF
 #endif
 int main(void) { return 0; }
 EOF
-if $cc $ARCH_CFLAGS $CFLAGS -I$kerneldir/include -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
+if $cc $ARCH_CFLAGS $CFLAGS $kvm_cflags -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
     kvm_cap_pit=yes
 fi
 
@@ -793,7 +804,7 @@ cat > $TMPC <<EOF
 #endif
 int main(void) { return 0; }
 EOF
-if $cc $ARCH_CFLAGS $CFLAGS -I$kerneldir/include -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
+if $cc $ARCH_CFLAGS $CFLAGS $kvm_cflags -o $TMPE ${OS_CFLAGS} $TMPC 2> /dev/null ; then
     kvm_cap_device_assignment=yes
 fi
 fi
@@ -1049,19 +1060,6 @@ if test $kvm = yes ; then
 #endif
 int main(void) { return 0; }
 EOF
-  if test "$kerneldir" != "" ; then
-      kvm_cflags=-I$kerneldir/include
-      if test \( $cpu = i386 -o $cpu = x86_64 \) \
-         -a -d $kerneldir/arch/x86/include ; then
-            kvm_cflags="$kvm_cflags -I$kerneldir/arch/x86/include"
-       elif test $cpu = ppc -a -d $kerneldir/arch/powerpc/include ; then
-           kvm_cflags="$kvm_cflags -I$kerneldir/arch/powerpc/include"
-        elif test -d $kerneldir/arch/$cpu/include ; then
-            kvm_cflags="$kvm_cflags -I$kerneldir/arch/$cpu/include"
-      fi
-  else
-      kvm_cflags=""
-  fi
   if $cc $ARCH_CFLAGS -o $TMPE ${OS_CFLAGS} $kvm_cflags $TMPC \
      > /dev/null 2>/dev/null ; then
 :
@@ -1679,7 +1677,7 @@ configure_kvm() {
           \( $cpu = i386 -o $cpu = x86_64 -o $cpu = ia64 -o $cpu = powerpc \); then
     echo "#define USE_KVM 1" >> $config_h
     echo "USE_KVM=1" >> $config_mak
-    echo "CONFIG_KVM_KERNEL_INC=$kerneldir/include" >> config-host.mak
+    echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
     if test $kvm_cap_pit = yes ; then
       echo "USE_KVM_PIT=1" >> $config_mak
       echo "#define USE_KVM_PIT 1" >> $config_h
diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index 8811d84..727ce48 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -12,7 +12,7 @@ cc-option = $(shell if $(CC) $(1) -S -o /dev/null -xc /dev/null \
 CFLAGS += $(autodepend-flags) -g -fomit-frame-pointer -Wall
 CFLAGS += $(call cc-option, -fno-stack-protector, )
 CFLAGS += $(call cc-option, -fno-stack-protector-all, )
-CFLAGS += -I$(CONFIG_KVM_KERNEL_INC)
+CFLAGS += $(KVM_CFLAGS)
 
 LDFLAGS += $(CFLAGS)
 


[COMMIT maint/2.6.30] Merge branch 'stable_0_10' of git://git.sv.gnu.org/qemu into maint/2.6.30

2009-04-27 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* 'stable_0_10' of git://git.sv.gnu.org/qemu: (21 commits)
  block-vpc: Don't silently create smaller image than requested (Kevin Wolf)
  Regenerate BIOS for stable branch
  Fix non-ACPI Timer Interrupt Routing (Beth Kon)
  hpet: Fix emulation of HPET_TN_SETVAL (Jan Kiszka)
  kvm: Fix cpuid initialization (Jan Kiszka)
  qcow2 corruption: Fix alloc_cluster_link_l2 (Kevin Wolf)
  Free VLANClientState using qemu_free() (Mark McLoughlin)
  Introduce VLANClientState::cleanup() (Mark McLoughlin)
  Use NICInfo::model for eepro100 savevm ID string (Mark McLoughlin)
  Add unregister_savevm() (Mark McLoughlin)
  Remove NICInfo from e1000 and mipsnet state (Mark McLoughlin)
  Remove some useless malloc() checking (Mark McLoughlin)
  Don't fail PCI hotplug if no NIC model is supplied (Mark McLoughlin)
  Fix error handling in net_client_init() (Mark McLoughlin)
  struct iovec is now universally available (Mark McLoughlin)
  Remove stray GSO code from virtio_net (Mark McLoughlin)
  Recognise evdev(xx)_aliases(yy) and xfree86(xx)_aliases(yy) as keymap names.
  Make PCI config status register read-only
  Fix crash on resolution change - screen dump - vga redraw (Avi Kivity)
  Update version for release
  ...

Conflicts:
hw/eepro100.c
hw/pcnet.c
hw/rtl8139.c
net.c
pc-bios/bios.bin

Signed-off-by: Avi Kivity a...@redhat.com


Re: kvm-77 Excessive Disk Access causes real time clock hang!

2009-04-27 Thread Avi Kivity

Erik Rull wrote:

>> Are you using qcow2?  In some cases qcow2 will stall the guest cpu.
>>
>> Note that defragmenting the guest drive may cause the qcow2 file to
>> fragment even more, and will certainly increase its size.  I
>> recommend only defragmenting when using raw storage.
>
> I don't think so. I created a partition on my host real harddrive
> and provided this partition to my windows guest.
>
> If you have an idea, which virtualized drive system could be the
> fastest (except giving a complete disk to the guest), your comments
> are welcome :-)

interface: virtio
cache: none
format: raw, using a partition or logical volume

What are you using?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qemu-kvm.git now live

2009-04-27 Thread Avi Kivity

Cameron Macdonell wrote:
> Great.  They work as expected.  One thing I noticed is that make clean
> doesn't remove libkvm.{o,a} and .libkvm.d.  Should it?

Should, but doesn't.  Another thing to fix.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-04-27 Thread Gerd Hoffmann

On 04/24/09 15:19, Anthony Liguori wrote:

> Pantelis Koukousoulas wrote:
>> According to the file pci-ids.txt in qemu sources, the range of PCI
>> device IDs assigned to virtio_pci is 0x1000 to 0x10ff, with a few
>> subranges that have different rules regarding who can get an ID
>> there and how.
>>
>> Nevertheless, the full range should be assigned to the generic
>> virtio_pci driver, so that all corresponding devices, including
>> the experimental/unreleased ones just work.
>
> Gerd, since you appear to be managing the PCI ID range, what do you think?

I'd suggest to exclude the experimental range by default (enable via 
module parameter) to make clear it isn't for regular use.


cheers
  Gerd



kvm-85 won't boot guests with virtio block device

2009-04-27 Thread Pauline Middelink
Added the sacred signoff at numerous requests :)

Met vriendelijke groet,
Pauline Middelink
-- 
GPG Key fingerprint = 2D5B 87A7 DDA6 0378 5DEA  BD3B 9A50 B416 E2D0 C3C2
For more details look at my website http://www.polyware.nl/~middelink
Small regression. libvirt determines the use of the boot= flag by looking
at the helptext. This flag is required for booting off virtio-blk devices.

Signed-off-by: Pauline Middelink middelink at polyware.nl

diff -ur kvm-85/qemu/qemu-options.hx kvm-85a/qemu/qemu-options.hx
--- kvm-85/qemu/qemu-options.hx 2009-04-21 11:57:31.0 +0200
+++ kvm-85a/qemu/qemu-options.hx2009-04-24 19:04:50.0 +0200
@@ -77,6 +77,7 @@
     "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none][,format=f][,serial=s]\n"
+    "       [,boot=on|off]\n"
     "                use 'file' as a drive image\n")
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
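
The dependency described in this report (a tool deciding whether it may pass boot= by scanning the emulator's help text) can be pictured as a plain substring probe. This is only a sketch under assumptions: the helper name `help_advertises_boot_flag` and the needle string "boot=on" are made up for illustration and are not taken from libvirt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when the emulator's help text
 * advertises a boot= suboption for -drive.  The needle "boot=on" is an
 * assumption for illustration, not necessarily what libvirt looks for.
 */
static bool help_advertises_boot_flag(const char *help_text)
{
    return help_text != NULL && strstr(help_text, "boot=on") != NULL;
}
```

With a probe like this, dropping the `[,boot=on|off]` line from the help text silently disables the flag for every caller, which would explain why a documentation-only change shows up as a boot regression.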


[PATCH] qemu-kvm: make clean should propagate into libkvm directory

2009-04-27 Thread Michael S. Tsirkin
Make the 'clean' target propagate into libkvm if it's enabled.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 Makefile.target |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index d890220..28dce60 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -846,6 +846,9 @@ qemu-options.h: $(SRC_PATH)/qemu-options.hx
 clean:
 	rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o qemu-options.h gdbstub-xml.c
 	rm -f *.d */*.d tcg/*.o
+ifdef USE_KVM
+	$(MAKE) -C ../kvm/libkvm clean
+endif
 
 install: all
 ifneq ($(PROGS),)
-- 
1.6.0.6



[PATCH] qemu-kvm: fix compiler warning

2009-04-27 Thread Michael S. Tsirkin
kvm-common.h:25:7: warning: __ia64__ is not defined

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 kvm/libkvm/kvm-common.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 96361e8..591fb53 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -22,7 +22,7 @@
 #define KVM_MAX_NUM_MEM_REGIONS 1u
 #define MAX_VCPUS 64
 #define LIBKVM_S390_ORIGIN (0UL)
-#elif __ia64__
+#elif defined(__ia64__)
 #define KVM_MAX_NUM_MEM_REGIONS 32u
 #define MAX_VCPUS 256
 #else
-- 
1.6.0.6
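
For readers unfamiliar with the warning: inside `#if`/`#elif`, an identifier that is not defined as a macro is replaced by 0, and GCC's `-Wundef` reports that substitution. Testing with `defined()` asks the question explicitly, so the result is the same but the warning goes away. A minimal sketch, where the `SKETCH_` names and the value 16 in the fallback branch are placeholders rather than values from kvm-common.h:

```c
/*
 * "#elif __ia64__" would trigger -Wundef on non-ia64 hosts because the
 * preprocessor silently substitutes 0 for the undefined identifier.
 * "#elif defined(__ia64__)" selects the same branch without the warning.
 */
#if defined(__s390__)
#define SKETCH_MAX_VCPUS 64
#elif defined(__ia64__)
#define SKETCH_MAX_VCPUS 256
#else
#define SKETCH_MAX_VCPUS 16   /* placeholder value for illustration */
#endif

/* Expose the selected constant so it can be checked at run time. */
static int sketch_max_vcpus(void)
{
    return SKETCH_MAX_VCPUS;
}
```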



[PATCH] qemu-kvm: make kvm_create_pit static

2009-04-27 Thread Michael S. Tsirkin
libkvm-x86.c:55: warning: no previous prototype for ‘kvm_create_pit’

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 kvm/libkvm/libkvm-x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/libkvm/libkvm-x86.c b/kvm/libkvm/libkvm-x86.c
index 2fc4fce..df8cc81 100644
--- a/kvm/libkvm/libkvm-x86.c
+++ b/kvm/libkvm/libkvm-x86.c
@@ -52,7 +52,7 @@ static int kvm_init_tss(kvm_context_t kvm)
return 0;
 }
 
-int kvm_create_pit(kvm_context_t kvm)
+static int kvm_create_pit(kvm_context_t kvm)
 {
 #ifdef KVM_CAP_PIT
int r;
-- 
1.6.0.6
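
Background on the warning, as a sketch: `-Wmissing-prototypes` fires when a function with external linkage is defined without a prior declaration. Marking a file-local helper `static` is one fix; declaring an exported function before defining it is the other. The function names below are hypothetical and do not reproduce libkvm code.

```c
/*
 * File-local helper: 'static' gives it internal linkage, so no prior
 * prototype is expected and -Wmissing-prototypes stays quiet.
 */
static int sketch_create_pit(int have_pit_cap)
{
    return have_pit_cap ? 0 : -1;
}

/*
 * Exported function: declare it first (normally in a header) so that the
 * definition below has a previous prototype.
 */
int sketch_create(int have_pit_cap);

int sketch_create(int have_pit_cap)
{
    return sketch_create_pit(have_pit_cap);
}
```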


[PATCH] virtio_blk: SG_IO passthru support

2009-04-27 Thread Christoph Hellwig
From: Hannes Reinecke h...@suse.de

Add support for SG_IO passthru to virtio_blk.  We add the scsi command
block after the normal outhdr, and the scsi inhdr with full status
information as well as the sense buffer before the regular inhdr.

[hch: forward ported, added the VIRTIO_BLK_F_SCSI flags, some comments
 and tested the whole beast]


Signed-off-by: Hannes Reinecke h...@suse.de
Signed-off-by: Christoph Hellwig h...@lst.de

Index: linux-2.6/drivers/block/virtio_blk.c
===
--- linux-2.6.orig/drivers/block/virtio_blk.c	2009-04-26 23:01:41.330960470 +0200
+++ linux-2.6/drivers/block/virtio_blk.c	2009-04-27 00:18:04.708076895 +0200
@@ -37,6 +37,7 @@ struct virtblk_req
 	struct list_head list;
 	struct request *req;
 	struct virtio_blk_outhdr out_hdr;
+	struct virtio_scsi_inhdr in_hdr;
 	u8 status;
 };
 
@@ -49,7 +50,9 @@ static void blk_done(struct virtqueue *v
 
 	spin_lock_irqsave(&vblk->lock, flags);
 	while ((vbr = vblk->vq->vq_ops->get_buf(vblk->vq, &len)) != NULL) {
+		unsigned int nr_bytes;
 		int error;
+
 		switch (vbr->status) {
 		case VIRTIO_BLK_S_OK:
 			error = 0;
@@ -62,7 +65,15 @@ static void blk_done(struct virtqueue *v
 			break;
 		}
 
-		__blk_end_request(vbr->req, error, blk_rq_bytes(vbr->req));
+		if (blk_pc_request(vbr->req)) {
+			vbr->req->data_len = vbr->in_hdr.residual;
+			nr_bytes = vbr->in_hdr.data_len;
+			vbr->req->sense_len = vbr->in_hdr.sense_len;
+			vbr->req->errors = vbr->in_hdr.errors;
+		} else
+			nr_bytes = blk_rq_bytes(vbr->req);
+
+		__blk_end_request(vbr->req, error, nr_bytes);
 		list_del(&vbr->list);
 		mempool_free(vbr, vblk->pool);
 	}
@@ -74,7 +85,7 @@ static void blk_done(struct virtqueue *v
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		   struct request *req)
 {
-	unsigned long num, out, in;
+	unsigned long num, out = 0, in = 0;
 	struct virtblk_req *vbr;
 
 	vbr = mempool_alloc(vblk->pool, GFP_ATOMIC);
@@ -99,18 +110,36 @@ static bool do_req(struct request_queue
 	if (blk_barrier_rq(vbr->req))
 		vbr->out_hdr.type |= VIRTIO_BLK_T_BARRIER;
 
-	sg_set_buf(&vblk->sg[0], &vbr->out_hdr, sizeof(vbr->out_hdr));
-	num = blk_rq_map_sg(q, vbr->req, vblk->sg+1);
-	sg_set_buf(&vblk->sg[num+1], &vbr->status, sizeof(vbr->status));
-
-	if (rq_data_dir(vbr->req) == WRITE) {
-		vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
-		out = 1 + num;
-		in = 1;
-	} else {
-		vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
-		out = 1;
-		in = 1 + num;
+	sg_set_buf(&vblk->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
+
+	/*
+	 * If this is a packet command we need a couple of additional headers.
+	 * Behind the normal outhdr we put a segment with the scsi command
+	 * block, and before the normal inhdr we put the sense data and the
+	 * inhdr with additional status information before the normal inhdr.
+	 */
+	if (blk_pc_request(vbr->req))
+		sg_set_buf(&vblk->sg[out++], vbr->req->cmd, vbr->req->cmd_len);
+
+	num = blk_rq_map_sg(q, vbr->req, vblk->sg + out);
+
+	if (blk_pc_request(vbr->req)) {
+		sg_set_buf(&vblk->sg[num + out + in++], vbr->req->sense, 96);
+		sg_set_buf(&vblk->sg[num + out + in++], &vbr->in_hdr,
+			   sizeof(vbr->in_hdr));
+	}
+
+	sg_set_buf(&vblk->sg[num + out + in++], &vbr->status,
+		   sizeof(vbr->status));
+
+	if (num) {
+		if (rq_data_dir(vbr->req) == WRITE) {
+			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
+			out += num;
+		} else {
+			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
+			in += num;
+		}
 	}
 
 	if (vblk->vq->vq_ops->add_buf(vblk->vq, vblk->sg, out, in, vbr)) {
@@ -149,8 +178,16 @@ static void do_virtblk_request(struct re
 static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
 			 unsigned cmd, unsigned long data)
 {
-	return scsi_cmd_ioctl(bdev->bd_disk->queue,
-			      bdev->bd_disk, mode, cmd,
+	struct gendisk *disk = bdev->bd_disk;
+	struct virtio_blk *vblk = disk->private_data;
+
+	/*
+	 * Only allow the generic SCSI ioctls if the host can support it.
+	 */
+	if (!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_SCSI))
+		return -ENOIOCTLCMD;
+
+	return scsi_cmd_ioctl(disk->queue, disk, mode, cmd,
 			      (void __user *)data);
 }
 
@@ -356,6 +393,7 @@ static 
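
The segment bookkeeping in do_req() above can be summarised outside the kernel. The following is a sketch, not driver code: `struct sketch_sg_counts` and `sketch_sg_counts()` are invented names, and the function only reproduces how many device-readable ("out") and device-writable ("in") segments a request ends up with.

```c
#include <stdbool.h>

/* Number of out (driver -> device) and in (device -> driver) segments. */
struct sketch_sg_counts {
    unsigned out;
    unsigned in;
};

/*
 * Mirror of the counting done in do_req(): every request carries the
 * outhdr and the status byte; a packet (SG_IO) command adds the SCSI
 * command block on the out side and the sense buffer plus the scsi inhdr
 * on the in side.  Data segments go to whichever side matches the
 * transfer direction.
 */
static struct sketch_sg_counts sketch_sg_counts(bool packet_cmd, bool is_write,
                                                unsigned data_segs)
{
    struct sketch_sg_counts c = { 0, 0 };

    c.out++;                 /* virtio_blk_outhdr */
    if (packet_cmd)
        c.out++;             /* SCSI command block */
    if (packet_cmd)
        c.in += 2;           /* sense buffer + virtio_scsi_inhdr */
    c.in++;                  /* status byte */

    if (is_write)
        c.out += data_segs;
    else
        c.in += data_segs;

    return c;
}
```

A packet command without payload therefore yields 2 out and 3 in segments, which matches the minimum the host-side patch checks for.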

[PATCH] virtio-blk: add SG_IO passthru support

2009-04-27 Thread Christoph Hellwig
Add support for SG_IO passthru (packet commands) to the virtio-blk
backend.  Conceptually based on an older patch from Hannes Reinecke
but largely rewritten to match the code structure and layering in
virtio-blk.

Note that currently we issue the host SG_IO synchronously.  We could
easily switch to async I/O, but that would require either bloating
the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
memory allocation for each SG_IO request.


Signed-off-by: Christoph Hellwig h...@lst.de

Index: qemu/hw/virtio-blk.h
===
--- qemu.orig/hw/virtio-blk.h   2009-04-26 16:50:38.154074532 +0200
+++ qemu/hw/virtio-blk.h2009-04-26 22:51:16.838076869 +0200
@@ -28,6 +28,9 @@
 #define VIRTIO_BLK_F_SIZE_MAX   1   /* Indicates maximum segment size */
 #define VIRTIO_BLK_F_SEG_MAX    2   /* Indicates maximum # of segments */
 #define VIRTIO_BLK_F_GEOMETRY   4   /* Indicates support of legacy geometry */
+#define VIRTIO_BLK_F_RO         5   /* Disk is read-only */
+#define VIRTIO_BLK_F_BLK_SIZE   6   /* Block size of disk is available*/
+#define VIRTIO_BLK_F_SCSI       7   /* Supports scsi command passthru */
 
 struct virtio_blk_config
 {
@@ -70,6 +73,15 @@ struct virtio_blk_inhdr
 unsigned char status;
 };
 
+/* SCSI pass-through header */
+struct virtio_scsi_inhdr
+{
+uint32_t errors;
+uint32_t data_len;
+uint32_t sense_len;
+uint32_t residual;
+};
+
 void *virtio_blk_init(PCIBus *bus, BlockDriverState *bs);
 
 #endif
Index: qemu/hw/virtio-blk.c
===
--- qemu.orig/hw/virtio-blk.c   2009-04-26 16:50:38.160074667 +0200
+++ qemu/hw/virtio-blk.c2009-04-27 10:25:19.278074514 +0200
@@ -15,6 +15,9 @@
 #include "sysemu.h"
 #include "virtio-blk.h"
 #include "block_int.h"
+#ifdef __linux__
+# include <scsi/sg.h>
+#endif
 
 typedef struct VirtIOBlock
 {
@@ -35,6 +38,7 @@ typedef struct VirtIOBlockReq
 VirtQueueElement elem;
 struct virtio_blk_inhdr *in;
 struct virtio_blk_outhdr *out;
+struct virtio_scsi_inhdr *scsi;
 QEMUIOVector qiov;
 struct VirtIOBlockReq *next;
 } VirtIOBlockReq;
@@ -103,6 +107,108 @@ static VirtIOBlockReq *virtio_blk_get_re
     return req;
 }
 
+#ifdef __linux__
+static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
+{
+    struct sg_io_hdr hdr;
+    int ret, size = 0;
+    int status;
+    int i;
+
+    /*
+     * We require at least one output segment each for the virtio_blk_outhdr
+     * and the SCSI command block.
+     *
+     * We also at least require the virtio_blk_inhdr, the virtio_scsi_inhdr
+     * and the sense buffer pointer in the input segments.
+     */
+    if (req->elem.out_num < 2 || req->elem.in_num < 3) {
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+        return;
+    }
+
+    /*
+     * No support for bidirection commands yet.
+     */
+    if (req->elem.out_num > 2 && req->elem.in_num > 3) {
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+        return;
+    }
+
+    /*
+     * The scsi inhdr is placed in the second-to-last input segment, just
+     * before the regular inhdr.
+     */
+    req->scsi = (void *)req->elem.in_sg[req->elem.in_num - 2].iov_base;
+    size = sizeof(*req->in) + sizeof(*req->scsi);
+
+    memset(&hdr, 0, sizeof(struct sg_io_hdr));
+    hdr.interface_id = 'S';
+    hdr.cmd_len = req->elem.out_sg[1].iov_len;
+    hdr.cmdp = req->elem.out_sg[1].iov_base;
+    hdr.dxfer_len = 0;
+
+    if (req->elem.out_num > 2) {
+        /*
+         * If there are more than the minimally required 2 output segments
+         * there is write payload starting from the third iovec.
+         */
+        hdr.dxfer_direction = SG_DXFER_TO_DEV;
+        hdr.iovec_count = req->elem.out_num - 2;
+
+        for (i = 0; i < hdr.iovec_count; i++)
+            hdr.dxfer_len += req->elem.out_sg[i + 2].iov_len;
+
+        hdr.dxferp = req->elem.out_sg + 2;
+
+    } else if (req->elem.in_num > 3) {
+        /*
+         * If we have more than 3 input segments the guest wants to actually
+         * read data.
+         */
+        hdr.dxfer_direction = SG_DXFER_FROM_DEV;
+        hdr.iovec_count = req->elem.in_num - 3;
+        for (i = 0; i < hdr.iovec_count; i++)
+            hdr.dxfer_len += req->elem.in_sg[i].iov_len;
+
+        hdr.dxferp = req->elem.in_sg;
+        size += hdr.dxfer_len;
+    } else {
+        /*
+         * Some SCSI commands don't actually transfer any data.
+         */
+        hdr.dxfer_direction = SG_DXFER_NONE;
+    }
+
+    hdr.sbp = req->elem.in_sg[req->elem.in_num - 3].iov_base;
+    hdr.mx_sb_len = req->elem.in_sg[req->elem.in_num - 3].iov_len;
+    size += hdr.mx_sb_len;
+
+    ret = bdrv_ioctl(req->dev->bs, SG_IO, &hdr);
+    if (ret) {
+        status = VIRTIO_BLK_S_UNSUPP;
+        hdr.status = ret;
+        hdr.resid = hdr.dxfer_len;
+    } else if (hdr.status) {
+        status = VIRTIO_BLK_S_IOERR;
+    } else {
+status = 
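
The decision logic at the top of virtio_blk_handle_scsi() depends only on the two segment counts, so it can be sketched in isolation. The enum and function names below are hypothetical; only the thresholds (2 output segments, 3 input segments) come from the patch.

```c
/* Possible outcomes of inspecting the segment counts of a SCSI request. */
enum sketch_scsi_kind {
    SKETCH_INVALID,      /* too few segments: completes with an I/O error */
    SKETCH_UNSUPPORTED,  /* payload in both directions: not supported yet */
    SKETCH_TO_DEV,       /* write payload follows the command block */
    SKETCH_FROM_DEV,     /* read payload precedes sense/inhdr/status */
    SKETCH_NONE          /* command transfers no data */
};

/*
 * out_num must cover the outhdr and the SCSI command block (2 segments);
 * in_num must cover the sense buffer, the scsi inhdr and the regular
 * inhdr (3 segments).  Anything beyond those minimums is payload.
 */
static enum sketch_scsi_kind sketch_classify(unsigned out_num, unsigned in_num)
{
    if (out_num < 2 || in_num < 3)
        return SKETCH_INVALID;
    if (out_num > 2 && in_num > 3)
        return SKETCH_UNSUPPORTED;
    if (out_num > 2)
        return SKETCH_TO_DEV;
    if (in_num > 3)
        return SKETCH_FROM_DEV;
    return SKETCH_NONE;
}
```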

Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-04-27 Thread Pantelis Koukousoulas
> I'd suggest to exclude the experimental range by default (enable via module
> parameter) to make clear it isn't for regular use.

Module parameter on what? The module parameter parsing code is afaict
provided by the end-driver (e.g., virtio-net) which only speaks virtio and has
no idea there is an actual PCI device in the backend.

Isn't it easier to just make it clear that a PCI id within the 0x1000-10ef range
is a prerequisite for inclusion in mainline linux / qemu and leave it at that?

Btw, including just a subset of the experimental range, e.g.
0x10f0-0x10f3, would be fine with me, if there is a desire to be
compatible with the "allow for breaking the ABI" rationale for the
0x1000-0x103f range.

Thanks,
Pantelis


Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-27 Thread Avi Kivity

Gregory Haskins wrote:

> This allows an eventfd to be registered as an irq source with a guest.  Any
> signaling operation on the eventfd (via userspace or kernel) will inject
> the registered GSI at the next available window.
>
> +struct kvm_irqfd {
> +   __u32 fd;
> +   __u32 gsi;
> +};
> +

I think it's better to have ioctl create and return the fd.  This way we 
aren't tied to eventfd (though it makes a lot of sense to use it).


Also, please add a flags field and some padding so we can extend it later.
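
To illustrate the request for a flags field and padding: a fixed-size structure with reserved space lets later versions assign new bits or fields without changing the ioctl argument size, provided unknown flags and non-zero padding are rejected today. The layout below is a sketch only; the `sketch_` names, the 20-byte pad and the check function are assumptions, not the structure that was eventually merged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extensible variant of the proposed structure (32 bytes). */
struct sketch_kvm_irqfd {
    uint32_t fd;
    uint32_t gsi;
    uint32_t flags;      /* no flag bits defined in this sketch */
    uint8_t  pad[20];    /* reserved, must be zero */
};

#define SKETCH_IRQFD_KNOWN_FLAGS 0u

/*
 * Reject anything this version does not understand, so that the reserved
 * bits and bytes can be given a meaning later without breaking callers.
 * Returns 0 if the request is acceptable, -1 otherwise.
 */
static int sketch_irqfd_check(const struct sketch_kvm_irqfd *req)
{
    size_t i;

    if (req->flags & ~SKETCH_IRQFD_KNOWN_FLAGS)
        return -1;
    for (i = 0; i < sizeof(req->pad); i++)
        if (req->pad[i] != 0)
            return -1;
    return 0;
}
```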


> +
> +#include <linux/kvm_host.h>
> +#include <linux/eventfd.h>
> +#include <linux/workqueue.h>
> +#include <linux/wait.h>
> +#include <linux/poll.h>
> +#include <linux/file.h>
> +#include <linux/list.h>
> +
> +struct _irqfd {
> +   struct kvm               *kvm;
> +   int                       gsi;
> +   struct file              *file;
> +   struct list_head          list;
> +   poll_table                pt;
> +   wait_queue_head_t        *wqh;
> +   wait_queue_t              wait;
> +   struct work_struct        work;
> +};
> +
> +static void
> +irqfd_inject(struct work_struct *work)
> +{
> +   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
> +   struct kvm *kvm = irqfd->kvm;
> +
> +   mutex_lock(&kvm->lock);
> +   kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
  


Need to lower the irq too (though irqfd only supports edge triggered 
interrupts).



> +   mutex_unlock(&kvm->lock);
> +}
> +
> +static int
> +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> +{
> +   struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> +
> +   /*
> +    * The eventfd calls its wake_up with interrupts disabled,
> +    * so we need to defer the IRQ injection until later since we need
> +    * to acquire the kvm->lock to do so.
> +    */
> +   schedule_work(&irqfd->work);
> +
> +   return 0;
> +}
  


One day we'll have lockless injection and we'll want to drop this.  I 
guess if we create the fd ourselves we can make it work, but I don't see 
how we can do this with eventfd.



> +int
> +kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> +{
> +   struct _irqfd *irqfd;
> +   struct file *file;
> +   int ret;
> +
> +   irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> +   if (!irqfd)
> +           return -ENOMEM;
> +
> +   irqfd->kvm = kvm;
> +   irqfd->gsi = gsi;
> +   INIT_LIST_HEAD(&irqfd->list);
> +   init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> +   init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> +   INIT_WORK(&irqfd->work, irqfd_inject);
> +
> +   file = eventfd_fget(fd);
> +   if (IS_ERR(file)) {
> +           ret = PTR_ERR(file);
> +           goto fail;
> +   }
> +
> +   ret = file->f_op->poll(file, &irqfd->pt);
> +   /* do we need to look for errors in ret? */
  


Do we?


> +
> +   irqfd->file = file;
> +
> +   mutex_lock(&kvm->lock);
> +   if (kvm->irqfd.src == -1) {
> +           ret = kvm_request_irq_source_id(kvm);
> +           BUG_ON(ret < 0);
  


I think you can reuse the userspace irq source (since it's just another 
way for userspace to inject an interrupt).  It isn't really needed since 
the irq source stuff is only needed to support level triggered interrupts.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-04-27 Thread Gerd Hoffmann

On 04/27/09 10:39, Pantelis Koukousoulas wrote:

I'd suggest to exclude the experimental range by default (enable via module
parameter) to make clear it isn't for regular use.


Module parameter on what?


virtio_pci.ko

Then you'll do something like this ...

echo options virtio_pci experimental=1 > /etc/modprobe.d/virtio.conf

... to enable the experimental IDs.


The module parameter parsing code is afaict
provided by the end-driver (e.g., virtio-net) which only speaks virtio and has
no idea there is an actual PCI device in the backend.


virtio_pci is a separate module and can have parameters too.

cheers,
  Gerd



Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-04-27 Thread Gerd Hoffmann

On 04/27/09 11:11, Pantelis Koukousoulas wrote:

Module parameter on what?

virtio_pci.ko

Then you'll do something like this ...

echo options virtio_pci experimental=1 > /etc/modprobe.d/virtio.conf

... to enable the experimental IDs.


Ok, I can sure live with such an arrangement :)
Should the entire 0x10f0-0x10ff range be enabled by this option or
just a subset?
(and if a subset, how large, 4 IDs, more ?)


That roughly matches the number of non-experimental IDs (1/4 of the 
complete range), so 4 IDs looks good to me.


cheers,
  Gerd



Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-04-27 Thread Avi Kivity

Gerd Hoffmann wrote:

On 04/27/09 11:11, Pantelis Koukousoulas wrote:

Module parameter on what?

virtio_pci.ko

Then you'll do something like this ...

echo options virtio_pci experimental=1 > /etc/modprobe.d/virtio.conf

... to enable the experimental IDs.


Ok, I can sure live with such an arrangement :)
Should the entire 0x10f0-0x10ff range be enabled by this option or
just a subset?
(and if a subset, how large, 4 IDs, more ?)


That roughly matches the number of non-experimental IDs (1/4 of the 
complete range), so 4 IDs looks good to me.


Or maybe

  modprobe virtio-pci claim=0x10f2 claim=0x10f7

--
error compiling committee.c: too many arguments to function



[PATCH 2/2] KVM: Enable snooping control for supported hardware

2009-04-27 Thread Sheng Yang
Memory aliases with different memory types are a problem for the guest. For a
guest without an assigned device, the memory type of guest memory is always the
same as the host's (WB); but with an assigned device, some part of memory may be
used for DMA and then set to an uncacheable memory type (UC/WC), which conflicts
with the host memory type and is a potential issue.

Snooping control can guarantee the cache correctness of memory that goes
through the DMA engine of VT-d.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |5 -
 arch/x86/kvm/mmu.c  |   16 +---
 arch/x86/kvm/svm.c  |4 ++--
 arch/x86/kvm/vmx.c  |   29 ++---
 virt/kvm/iommu.c|   27 ---
 5 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ba6906f..b972889 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -395,6 +395,8 @@ struct kvm_arch{
struct list_head active_mmu_pages;
struct list_head assigned_dev_head;
struct iommu_domain *iommu_domain;
+#define KVM_IOMMU_CACHE_COHERENCY  0x1
+   int iommu_flags;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
@@ -523,7 +525,7 @@ struct kvm_x86_ops {
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
-   int (*get_mt_mask_shift)(void);
+   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
@@ -551,6 +553,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
  gpa_t addr, unsigned long *ret);
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 extern bool tdp_enabled;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 55c6923..ea1c2aa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1606,7 +1606,7 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
return mtrr_state->def_type;
 }
 
-static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
u8 mtrr;
 
@@ -1616,6 +1616,7 @@ static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
mtrr = MTRR_TYPE_WRBACK;
return mtrr;
 }
+EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
 
 static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -1688,16 +1689,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
spte |= shadow_user_mask;
if (largepage)
spte |= PT_PAGE_SIZE_MASK;
-   if (tdp_enabled) {
-   if (!kvm_is_mmio_pfn(pfn)) {
-   mt_mask = get_memory_type(vcpu, gfn) <<
-   kvm_x86_ops->get_mt_mask_shift();
-   mt_mask |= VMX_EPT_IGMT_BIT;
-   } else
-   mt_mask = MTRR_TYPE_UNCACHABLE <<
-   kvm_x86_ops->get_mt_mask_shift();
-   spte |= mt_mask;
-   }
+   if (tdp_enabled)
+   spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
+   kvm_is_mmio_pfn(pfn));
 
spte |= (u64)pfn << PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 053f3c5..2bbe1de 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2617,7 +2617,7 @@ static int get_npt_level(void)
 #endif
 }
 
-static int svm_get_mt_mask_shift(void)
+static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
return 0;
 }
@@ -2678,7 +2678,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
-   .get_mt_mask_shift = svm_get_mt_mask_shift,
+   .get_mt_mask = svm_get_mt_mask,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85db8b2..db853b6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3678,9 +3678,32 @@ static int get_ept_level(void)
return VMX_EPT_DEFAULT_GAW + 1;
 }
 
-static int vmx_get_mt_mask_shift(void)
+static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
-   return VMX_EPT_MT_EPTE_SHIFT;
+   int ret;
+
+   /* For VT-d and EPT combination
+* 1. MMIO: always map as UC
+* 2. EPT with VT-d:
+*   a. VT-d with snooping control feature: snooping control feature of
+*  VT-d engine can guarantee the cache correctness. Just set it
+*  to WB to keep consistent with host. So the same as item 3.
+*   b. VT-d without snooping control feature: can't 

Re: Unable to boot guest on kernel 2.6.29.1 with kvm-84 or kvm-85

2009-04-27 Thread Avi Kivity

Eino Malinen wrote:

Can you try a bisect?



I confirm having the same trouble with kvm. Bisect with the vanilla kernel 
gave the following result:


(f438349...|BISECTING)$ git bisect log
# bad: [8d7bff2d72660d9d60aa371ae3d1356bbf329a09] Linux 2.6.29.1
# good: [8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84] Linux 2.6.29
git bisect start 'v2.6.29.1' 'v2.6.29' '--' 'arch/x86/kvm/' 'virt/kvm/'
# bad: [f438349efb8247cd0c1d453a4131b1f801bf5691] KVM: VMX: Don't allow 
uninhibited access to EFER on i386

git bisect bad f438349efb8247cd0c1d453a4131b1f801bf5691

Are the commits split more precisely in kvm.git?
  

No (and this is pretty fine granularity -- that commit is a one liner).

So 2.6.29 worked and 2.6.29.1 did not?  What is your host cpu type?  Are 
you running an i386 or x86_64 host?


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH] virtio-blk: add SG_IO passthru support

2009-04-27 Thread Avi Kivity

Christoph Hellwig wrote:

[had the qemu list address wrong the first time, reply to this message,
 not the previous if you were on Cc]


Add support for SG_IO passthru (packet commands) to the virtio-blk
backend.  Conceptually based on an older patch from Hannes Reinecke
but largely rewritten to match the code structure and layering in
virtio-blk.

Note that currently we issue the host SG_IO synchronously.  We could
easily switch to async I/O, but that would require either bloating
the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
memory allocation for each SG_IO request.
  


I think that's worthwhile.  The extra bloat is trivial (especially as 
the number of inflight virtio requests is tightly bounded), and stalling 
the vcpu for requests is a pain.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] virtio_blk: SG_IO passthru support

2009-04-27 Thread Christian Borntraeger
On Monday 27 April 2009 10:21:21 you wrote:
 From: Hannes Reinecke h...@suse.de
 
 Add support for SG_IO passthru to virtio_blk.  We add the scsi command
 block after the normal outhdr, and the scsi inhdr with full status
 information as well as the sense buffer before the regular inhdr.
 
 [hch: forward ported, added the VIRTIO_BLK_F_SCSI flags, some comments
  and tested the whole beast]

I like the VIRTIO_BLK_F_SCSI flag - it avoids some problems we had with an
earlier version. (I believe we found a solution, but this explicit check
looks much more reliable)
I gave this patch a short test with the kuli loader (s390 userspace). You can
consider the VIRTIO_BLK_F_SCSI not available case
Tested-by: Christian Borntraeger borntrae...@de.ibm.com

Christian




Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-27 Thread Gregory Haskins
Gregory Haskins wrote:
 Avi Kivity wrote:
   

 One day we'll have lockless injection and we'll want to drop this.  I
 guess if we create the fd ourselves we can make it work, but I don't
 see how we can do this with eventfd.

 

 Hmm...this is a good point.  There probably is no way to use eventfd
 off the shelf in a way that doesn't cause this callback to be in a
 critical section.  Should we just worry about switching away from
 eventfd when this occurs, or should I implement a custom anon-fd now?
   

Something else to consider:  eventfd supports calling eventfd_signal()
from interrupt context, so even if the callback were not invoked from a
preempt/irq-off critical section (CS) due to the wqh->lock, the context
may still be a CS anyway.  Now, userspace and vbus based injection do not
need to worry about this, but I wonder if this is a desirable attribute
for some other source (device-assignment perhaps)?

If so, is there a way to design the lockless-injection code such that
it's still friendly to irq-context, or is this a mode that we would never
want to support anyway?  I think this would be a good thing to have at
least to maintain compatibility with the existing eventfd interface,
which has its own advantages.

-Greg






Re: [PATCH v2 00/16] interrupt injection rework

2009-04-27 Thread Avi Kivity

Gleb Natapov wrote:

Hi,

This patch series aims to consolidate IRQ injection code for in kernel
IRQ chip and userspace one. Also to move IRQ injection logic from
SVM/VMX specific code to x86.c.
  


Applied all, thanks for this excellent patchset.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] qemu-kvm: add irqfd support

2009-04-27 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 On Fri, Apr 24, 2009 at 12:30:17AM -0400, Gregory Haskins wrote:
   
 +int kvm_irqfd(kvm_context_t kvm, int gsi)
 +{
 +int fd, r;
 +
 +if (!kvm_check_extension(kvm, KVM_CAP_IRQFD))
 +return -ENOENT;
 +
 +fd = eventfd(0, 0);
 +if (fd < 0)
 +return fd;
 +
 +r = assign_irqfd(kvm, fd, gsi);
 +if (r < 0)
 +return r;
 

 Do we need to close fd on error?
   
Good catch.  This will be changing in v3 to do the allocation in kernel,
but I will be sure to properly clean up this type of leak, however that
ends up looking.

Thanks Michael.
-Greg

   
 +
 +return fd;
 +}
 






Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-27 Thread Avi Kivity

Gregory Haskins wrote:

Something else to consider:  eventfd supports calling eventfd_signal()
from interrupt context, so even if the callback were not invoked from a
preempt/irq-off critical section (CS) due to the wqh->lock, the context
may still be a CS anyway.  Now, userspace and vbus based injection do not
need to worry about this, but I wonder if this is a desirable attribute
for some other source (device-assignment perhaps)?
  


I'm no networking expert, but device assignment certainly wants to queue 
an interrupt from an interrupt (basically it forwards the interrupt from 
host to guest).  Block is also simple enough to trigger the interrupt 
from the completion context.


For networking, we'd eventually want to do the host-guest copy using a 
dma engine, and we'd want to inject the interrupt from the dma engine's 
completion handler.




If so, is there a way to design the lockless-injection code such that
it's still friendly to irq-context, or is this a mode that we would never
want to support anyway?  I think this would be a good thing to have at
least to maintain compatibility with the existing eventfd interface,
which has its own advantages.
  


This has RCU painted all over it in a 1200-point font.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-04-27 Thread Pantelis Koukousoulas
 Or maybe

  modprobe virtio-pci claim=0x10f2 claim=0x10f7

How about claim=0x10f2,0x10f7 instead so that it can be implemented as
a standard module array parameter?
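
In the kernel this would most likely be declared with module_param_array(), which does the comma splitting itself. To illustrate what the comma-separated form means, here is a rough userspace sketch of the parsing; parse_claims() is a hypothetical helper written for this example and is not part of any posted patch.

```c
#include <stdlib.h>

/*
 * Parse a list such as "0x10f2,0x10f7" into ids[].
 * Returns the number of IDs stored, or -1 on malformed input,
 * an ID that does not fit in 16 bits, or more than max entries.
 */
static int parse_claims(const char *arg, unsigned ids[], int max)
{
    int n = 0;

    while (*arg) {
        char *end;
        unsigned long v;

        if (n == max)
            return -1;
        v = strtoul(arg, &end, 0);   /* base 0 accepts the 0x prefix */
        if (end == arg || v > 0xffff)
            return -1;
        ids[n++] = (unsigned)v;
        if (*end == ',')
            end++;                   /* skip the separator */
        else if (*end != '\0')
            return -1;               /* trailing junk after a number */
        arg = end;
    }
    return n;
}
```

With this, claim=0x10f2,0x10f7 yields two IDs, which the driver could then match against the experimental range.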


Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-04-27 Thread Avi Kivity

Pantelis Koukousoulas wrote:

Or maybe

 modprobe virtio-pci claim=0x10f2 claim=0x10f7



How about claim=0x10f2,0x10f7 instead so that it can be implemented as
a standard module array parameter?
  


Even better.

--
error compiling committee.c: too many arguments to function



kvm-kmod.git

2009-04-27 Thread Avi Kivity
This is a small git repository for the kvm external module kit.  It 
includes a submodule link to the kernel tree, so when you check out the 
repository, it also checks out a kernel tree for a specific revision.  
This way the various scripts and the kernel source are always synchronized.


$ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
$ cd kvm-kmod
$ git submodule init
$ git submodule update
$ ./configure
$ make sync
$ make

Todo:

- add maint branches
- automatic 'make sync'? perhaps replace by symlinks?
- add release script

--
error compiling committee.c: too many arguments to function



[PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
Add optional MSI-X support: use a vector per virtqueue with
fallback to a common vector and finally to regular interrupt.
Teach all drivers to use it.

I added 2 new virtio operations: request_vqs/free_vqs because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

---

Here's a draft set of patches for MSI-X support in the guest.  It still
needs to be tested properly, and performance impact measured, but I
thought I'd share it here in the hope of getting some very early
feedback/flames.

Michael S. Tsirkin (8):
  virtio: add request_vqs/free_vqs virtio operations
  virtio_blk: add request_vqs/free_vqs calls
  virtio-rng: add request_vqs/free_vqs calls
  virtio_console: add request_vqs/free_vqs calls
  virtio_net: add request_vqs/free_vqs calls
  virtio_balloon: add request_vqs/free_vqs calls
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

 drivers/block/virtio_blk.c  |9 ++-
 drivers/char/hw_random/virtio-rng.c |   10 ++-
 drivers/char/virtio_console.c   |8 ++-
 drivers/net/virtio_net.c|   16 +++-
 drivers/virtio/virtio_balloon.c |9 ++-
 drivers/virtio/virtio_pci.c |  192 ++-
 include/linux/virtio_config.h   |   35 +++
 7 files changed, 247 insertions(+), 32 deletions(-)


[PATCH 1/8] virtio: add request_vqs/free_vqs operations

2009-04-27 Thread Michael S. Tsirkin
This adds 2 new optional virtio operations: request_vqs/free_vqs. They will be
used for MSI support, because MSI needs to know the total number of vectors
upfront.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/linux/virtio_config.h |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bf8ec28..e935670 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -49,6 +49,15 @@
  * @set_status: write the status byte
  * vdev: the virtio_device
  * status: the new status byte
+ * @request_vqs: request the specified number of virtqueues
+ * vdev: the virtio_device
+ * max_vqs: the max number of virtqueues we want
+ *  If supplied, must call before any virtqueues are instantiated.
+ *  To modify the max number of virtqueues after request_vqs has been
+ *  called, call free_vqs and then request_vqs with a new value.
+ * @free_vqs: cleanup resources allocated by request_vqs
+ * vdev: the virtio_device
+ *  If supplied, must call after all virtqueues have been deleted.
  * @reset: reset the device
  * vdev: the virtio device
  * After this, status and feature negotiation must be done again
@@ -75,6 +84,8 @@ struct virtio_config_ops
u8 (*get_status)(struct virtio_device *vdev);
void (*set_status)(struct virtio_device *vdev, u8 status);
void (*reset)(struct virtio_device *vdev);
+   int (*request_vqs)(struct virtio_device *vdev, unsigned max_vqs);
+   void (*free_vqs)(struct virtio_device *vdev);
struct virtqueue *(*find_vq)(struct virtio_device *vdev,
 unsigned index,
 void (*callback)(struct virtqueue *));
@@ -126,5 +137,29 @@ static inline int virtio_config_buf(struct virtio_device *vdev,
vdev->config->get(vdev, offset, buf, len);
return 0;
 }
+
+/**
+ * virtio_request_vqs: request the specified number of virtqueues
+ * @vdev: the virtio_device
+ * @max_vqs: the max number of virtqueues we want
+ *
+ * For details, see documentation for request_vqs above. */
+static inline int virtio_request_vqs(struct virtio_device *vdev,
+unsigned max_vqs)
+{
+   return vdev->config->request_vqs ?
+   vdev->config->request_vqs(vdev, max_vqs) : 0;
+}
+
+/**
+ * free_vqs: cleanup resources allocated by virtio_request_vqs
+ * @vdev: the virtio_device
+ *
+ * For details, see documentation for free_vqs above. */
+static inline void virtio_free_vqs(struct virtio_device *vdev)
+{
+   if (vdev->config->free_vqs)
+   vdev->config->free_vqs(vdev);
+}
 #endif /* __KERNEL__ */
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.6.0.6
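
The point of the two wrappers is that a transport may leave the new hooks NULL and drivers can still call them unconditionally. A reduced userspace sketch of that pattern follows; struct device, struct config_ops, the dev_* wrappers and the two ops tables are hypothetical stand-ins made up for this example, not the kernel types.

```c
#include <stddef.h>

struct device;

/* Reduced, hypothetical version of the config ops table. */
struct config_ops {
    int (*request_vqs)(struct device *dev, unsigned max_vqs);  /* optional */
    void (*free_vqs)(struct device *dev);                      /* optional */
};

struct device {
    const struct config_ops *config;
    unsigned reserved_vqs;
};

/* Call the hook if the transport supplies it; otherwise report success. */
static inline int dev_request_vqs(struct device *dev, unsigned max_vqs)
{
    return dev->config->request_vqs ?
        dev->config->request_vqs(dev, max_vqs) : 0;
}

static inline void dev_free_vqs(struct device *dev)
{
    if (dev->config->free_vqs)
        dev->config->free_vqs(dev);
}

/* A transport that implements both hooks. */
static int pci_request_vqs(struct device *dev, unsigned max_vqs)
{
    dev->reserved_vqs = max_vqs;
    return 0;
}

static void pci_free_vqs(struct device *dev)
{
    dev->reserved_vqs = 0;
}

static const struct config_ops pci_ops = {
    .request_vqs = pci_request_vqs,
    .free_vqs = pci_free_vqs,
};

/* A transport that predates the hooks and leaves them NULL. */
static const struct config_ops legacy_ops = {
    .request_vqs = NULL,
    .free_vqs = NULL,
};
```

A driver written against the wrappers behaves the same on both tables; only the transport with the hooks sees the virtqueue count up front.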



[PATCH 2/8] virtio_blk: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio_blk.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/block/virtio_blk.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5d34764..523eddc 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -224,10 +224,14 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
+   err = virtio_request_vqs(vdev, 1);
+   if (err)
+   goto out_free_vblk;
+
vblk->vq = vdev->config->find_vq(vdev, 0, blk_done);
if (IS_ERR(vblk->vq)) {
err = PTR_ERR(vblk->vq);
-   goto out_free_vblk;
+   goto out_free_vqs;
}
 
vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
@@ -324,6 +328,8 @@ out_mempool:
mempool_destroy(vblk->pool);
 out_free_vq:
vdev->config->del_vq(vblk->vq);
+out_free_vqs:
+   virtio_free_vqs(vdev);
 out_free_vblk:
kfree(vblk);
 out:
@@ -345,6 +351,7 @@ static void virtblk_remove(struct virtio_device *vdev)
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
vdev->config->del_vq(vblk->vq);
+   virtio_free_vqs(vdev);
kfree(vblk);
 }
 
-- 
1.6.0.6



[PATCH 3/8] virtio-rng: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio-rng.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/char/hw_random/virtio-rng.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index d0e563e..d43a6cd 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -94,13 +94,20 @@ static int virtrng_probe(struct virtio_device *vdev)
int err;
 
/* We expect a single virtqueue. */
+   err = virtio_request_vqs(vdev, 1);
+   if (err)
+   return err;
+
vq = vdev->config->find_vq(vdev, 0, random_recv_done);
-   if (IS_ERR(vq))
+   if (IS_ERR(vq)) {
+   virtio_free_vqs(vdev);
return PTR_ERR(vq);
+   }
 
err = hwrng_register(&virtio_hwrng);
if (err) {
vdev->config->del_vq(vq);
+   virtio_free_vqs(vdev);
return err;
}
 
@@ -113,6 +120,7 @@ static void virtrng_remove(struct virtio_device *vdev)
vdev->config->reset(vdev);
hwrng_unregister(&virtio_hwrng);
vdev->config->del_vq(vq);
+   virtio_free_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
-- 
1.6.0.6



[PATCH 7/8] virtio_pci: split up vp_interrupt

2009-04-27 Thread Michael S. Tsirkin
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_pci.c |   45 +-
 1 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 330aacb..151538c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_driver *drv;
+   drv = container_of(vp_dev->vdev.dev.driver,
+  struct virtio_driver, driver);
+
+   if (drv && drv->config_changed)
+   drv->config_changed(&vp_dev->vdev);
+   return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_pci_vq_info *info;
+   irqreturn_t ret = IRQ_NONE;
+   unsigned long flags;
+
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_for_each_entry(info, &vp_dev->virtqueues, node) {
+   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+   return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
struct virtio_pci_device *vp_dev = opaque;
-   struct virtio_pci_vq_info *info;
-   irqreturn_t ret = IRQ_NONE;
-   unsigned long flags;
u8 isr;
 
/* reading the ISR has the effect of also clearing it so it's very
@@ -187,14 +215,11 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return IRQ_NONE;
 
/* Configuration change?  Tell driver if it wants to know. */
-   if (isr & VIRTIO_PCI_ISR_CONFIG) {
-   struct virtio_driver *drv;
-   drv = container_of(vp_dev->vdev.dev.driver,
-  struct virtio_driver, driver);
+   if (isr & VIRTIO_PCI_ISR_CONFIG)
+   vp_config_changed(irq, opaque);
 
-   if (drv && drv->config_changed)
-   drv->config_changed(&vp_dev->vdev);
-   }
+   return vp_vring_interrupt(irq, opaque);
+}
 
spin_lock_irqsave(&vp_dev->lock, flags);
list_for_each_entry(info, &vp_dev->virtqueues, node) {
-- 
1.6.0.6
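
The shape of the split is: the shared-interrupt handler reads and clears the ISR, then delegates to one handler for configuration changes and one for the rings, so that each can later be attached to its own vector. Below is a simplified userspace model of that control flow only; struct fake_dev, the ISR_* bit values, the enum and all function names are assumptions made for this sketch and do not come from the driver.

```c
#define ISR_QUEUE  0x1  /* assumed bit: a virtqueue needs service */
#define ISR_CONFIG 0x2  /* assumed bit: device configuration changed */

enum irq_result { IRQ_NOT_MINE = 0, IRQ_DONE = 1 };

/* Toy device state standing in for the real PCI device. */
struct fake_dev {
    unsigned char isr;      /* models the read-to-clear ISR register */
    int config_events;      /* config-change callbacks delivered */
    int pending_queues;     /* queues with unprocessed buffers */
};

/* Handler for configuration changes; usable on its own vector. */
static enum irq_result on_config_changed(struct fake_dev *d)
{
    d->config_events++;
    return IRQ_DONE;
}

/* Handler that services every queue; usable on its own vector. */
static enum irq_result on_vring(struct fake_dev *d)
{
    if (!d->pending_queues)
        return IRQ_NOT_MINE;
    d->pending_queues = 0;
    return IRQ_DONE;
}

/* Shared-interrupt path: read the ISR once, then delegate. */
static enum irq_result on_legacy_irq(struct fake_dev *d)
{
    unsigned char isr = d->isr;

    d->isr = 0;                     /* reading the ISR clears it */
    if (!isr)
        return IRQ_NOT_MINE;        /* interrupt came from another device */
    if (isr & ISR_CONFIG)
        on_config_changed(d);
    return on_vring(d);
}
```

Once the two inner handlers exist as separate functions, a per-vector setup can register them directly and skip the ISR read.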



[PATCH 8/8] virtio_pci: optional MSI-X support

2009-04-27 Thread Michael S. Tsirkin
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
available.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_pci.c |  147 ++-
 1 files changed, 132 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 151538c..20bdc8c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,8 +42,33 @@ struct virtio_pci_device
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+   /* MSI-X support */
+   struct msix_entry *msix_entries;
+   /* Name strings for interrupts. This size should be enough,
+* and I'm too lazy to allocate each name separately. */
+   char (*msix_names)[256];
+   /* Number of vectors configured at startup (excludes per-virtqueue
+* vectors if any) */
+   unsigned msix_preset_vectors;
+   /* Number of per-virtqueue vectors if any. */
+   unsigned msix_per_vq_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues Thus, we need at least 2 vectors for MSI. */
+enum {
+   VP_MSIX_CONFIG_VECTOR = 0,
+   VP_MSIX_VQ_VECTOR = 1,
+   VP_MSIX_MIN_VECTORS = 2
 };
 
+static inline int vq_vector(int index)
+{
+   return index + VP_MSIX_VQ_VECTOR;
+}
+
 struct virtio_pci_vq_info
 {
/* the actual virtqueue */
@@ -221,14 +246,92 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return vp_vring_interrupt(irq, opaque);
 }
 
-   spin_lock_irqsave(&vp_dev->lock, flags);
-   list_for_each_entry(info, &vp_dev->virtqueues, node) {
-   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-   ret = IRQ_HANDLED;
+/* the config->free_vqs() implementation */
+static void vp_free_vqs(struct virtio_device *vdev) {
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   for (i = 0; i < vp_dev->msix_preset_vectors; ++i)
+   free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+
+   if (!vp_dev->msix_preset_vectors)
+   free_irq(vp_dev->pci_dev->irq, vp_dev);
+}
+
+/* the config->request_vqs() implementation */
+static int vp_request_vqs(struct virtio_device *vdev, unsigned max_vqs) {
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   const char *name = dev_name(&vp_dev->vdev.dev);
+   unsigned i, vectors;
+   int err = -ENOMEM;
+
+   /* We need at most one vector per queue and one for config changes */
+   vectors = vq_vector(max_vqs);
+   vp_dev->msix_entries = kmalloc(vectors * sizeof *vp_dev->msix_entries,
+  GFP_KERNEL);
+   if (!vp_dev->msix_entries)
+   goto error_entries;
+   vp_dev->msix_names = kmalloc(vectors * sizeof *vp_dev->msix_names,
+GFP_KERNEL);
+   if (!vp_dev->msix_names)
+   goto error_names;
+
+   snprintf(vp_dev->msix_names[VP_MSIX_CONFIG_VECTOR],
+sizeof *vp_dev->msix_names, "%s-config", name);
+   for (i = 0; i < max_vqs; ++i)
+   snprintf(vp_dev->msix_names[vq_vector(i)],
+sizeof *vp_dev->msix_names, "%s-vq-%d", name, i);
+
+   vp_dev->msix_preset_vectors = 1;
+   vp_dev->msix_per_vq_vectors = max_vqs;
+   for (;;) {
+   err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries,
+ vectors);
+   /* Error out if not enough vectors */
+   if (err > 0 && err < VP_MSIX_MIN_VECTORS)
+   err = -EBUSY;
+   if (err <= 0)
+   break;
+   /* Not enough vectors for all queues. Retry, disabling
+* per-queue interrupts */
+   vectors = VP_MSIX_MIN_VECTORS;
+   vp_dev->msix_preset_vectors = VP_MSIX_MIN_VECTORS;
+   vp_dev->msix_per_vq_vectors = 0;
+   snprintf(vp_dev->msix_names[VP_MSIX_VQ_VECTOR],
+sizeof *vp_dev->msix_names, "%s-vq", name);
}
-   spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-   return ret;
+   if (err) {
+   /* Can't allocate enough MSI-X vectors, use regular interrupt */
+   vp_dev->msix_preset_vectors = 0;
+   vp_dev->msix_per_vq_vectors = 0;
+   /* Register a handler for the queue with the PCI device's
+* interrupt */
+   err = request_irq(vp_dev->pci_dev->irq, vp_interrupt,
+ IRQF_SHARED, name, vp_dev);
+   if (err)
+   goto error_irq;
+   }
+   for (i = 0; i < vp_dev->msix_preset_vectors; ++i) {
+  
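
The fallback order described in the commit message (a vector per virtqueue, else one shared virtqueue vector plus one for config, else the regular interrupt) can be summarised as a small decision function. This is only a model of the policy under stated assumptions: plan_vectors(), struct irq_plan and the "available" argument are invented for the sketch, whereas the patch learns how many vectors exist from the return value of pci_enable_msix().

```c
enum {
    CONFIG_VECTOR = 0,  /* first vector: configuration changes */
    VQ_VECTOR = 1,      /* second and later vectors: virtqueues */
    MIN_VECTORS = 2     /* fewer than this means no MSI-X at all */
};

struct irq_plan {
    unsigned preset_vectors;   /* vectors requested at startup */
    unsigned per_vq_vectors;   /* vectors dedicated to single queues */
};

/* Decide how interrupts are laid out given the vectors the host offers. */
static struct irq_plan plan_vectors(unsigned available, unsigned max_vqs)
{
    struct irq_plan plan;

    if (available >= max_vqs + VQ_VECTOR) {
        /* One vector per queue, plus the config vector. */
        plan.preset_vectors = 1;
        plan.per_vq_vectors = max_vqs;
    } else if (available >= MIN_VECTORS) {
        /* Config vector plus a single vector shared by all queues. */
        plan.preset_vectors = MIN_VECTORS;
        plan.per_vq_vectors = 0;
    } else {
        /* No usable MSI-X: fall back to the regular shared interrupt. */
        plan.preset_vectors = 0;
        plan.per_vq_vectors = 0;
    }
    return plan;
}
```

The two counters carry the same meaning as msix_preset_vectors and msix_per_vq_vectors in the patch, so a zero preset count marks the regular-interrupt case.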

[PATCH 5/8] virtio_net: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio_net.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9c82a39..fbe8a70 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -841,6 +841,7 @@ static int virtnet_probe(struct virtio_device *vdev)
int err;
struct net_device *dev;
struct virtnet_info *vi;
+   bool control_vq;
 
/* Allocate ourselves a network device with room for our info */
dev = alloc_etherdev(sizeof(struct virtnet_info));
@@ -901,11 +902,17 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
 
-   /* We expect two virtqueues, receive then send. */
+   /* We expect either two or three virtqueues: receive, send and
+* optionally control. */
+   control_vq = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
+   err = virtio_request_vqs(vdev, control_vq ? 3 : 2);
+   if (err)
+   goto free;
+
vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done);
if (IS_ERR(vi->rvq)) {
err = PTR_ERR(vi->rvq);
-   goto free;
+   goto free_vqs;
}
 
vi->svq = vdev->config->find_vq(vdev, 1, skb_xmit_done);
@@ -914,7 +921,7 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free_recv;
}
 
-   if (virtio_has_feature(vi-vdev, VIRTIO_NET_F_CTRL_VQ)) {
+   if (control_vq) {
vi-cvq = vdev-config-find_vq(vdev, 2, NULL);
if (IS_ERR(vi-cvq)) {
err = PTR_ERR(vi-svq);
@@ -965,6 +972,8 @@ free_send:
vdev-config-del_vq(vi-svq);
 free_recv:
vdev-config-del_vq(vi-rvq);
+free_vqs:
+   virtio_free_vqs(vi-vdev);
 free:
free_netdev(dev);
return err;
@@ -994,6 +1003,7 @@ static void virtnet_remove(struct virtio_device *vdev)
vdev-config-del_vq(vi-rvq);
if (virtio_has_feature(vi-vdev, VIRTIO_NET_F_CTRL_VQ))
vdev-config-del_vq(vi-cvq);
+   virtio_free_vqs(vi-vdev);
unregister_netdev(vi-dev);
 
while (vi-pages)
-- 
1.6.0.6



[PATCH 2/2] KVM: Enable snooping control for supported hardware

2009-04-27 Thread Sheng Yang
Memory aliases with different memory types are a problem for the guest. For a
guest without an assigned device, the memory type of guest memory is always the
same as the host's (WB); but with an assigned device, some part of memory may
be used for DMA and then set to an uncacheable memory type (UC/WC), which
conflicts with the host memory type and is a potential issue.

Snooping control can guarantee the cache correctness of memory that goes
through the DMA engine of VT-d.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/vmx.c  |   19 +--
 virt/kvm/iommu.c|   27 ---
 3 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b6d01a4..b972889 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -395,6 +395,8 @@ struct kvm_arch{
struct list_head active_mmu_pages;
struct list_head assigned_dev_head;
struct iommu_domain *iommu_domain;
+#define KVM_IOMMU_CACHE_COHERENCY  0x1
+   int iommu_flags;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7c4f5a3..79fc401 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3682,11 +3682,26 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
u64 ret;
 
+   /* For VT-d and EPT combination
+* 1. MMIO: always map as UC
+* 2. EPT with VT-d:
+*   a. VT-d without snooping control feature: can't guarantee the
+*  result, try to trust guest.
+*   b. VT-d with snooping control feature: snooping control feature of
+*  VT-d engine can guarantee the cache correctness. Just set it
+*  to WB to keep consistent with host. So the same as item 3.
+* 3. EPT without VT-d: always map as WB and set IGMT=1 to keep
+*consistent with host MTRR
+*/
if (is_mmio)
ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
+   else if (vcpu->kvm->arch.iommu_domain &&
+   !(vcpu->kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY))
+   ret = kvm_get_guest_memory_type(vcpu, gfn) <<
+ VMX_EPT_MT_EPTE_SHIFT;
else
-   ret = (kvm_get_guest_memory_type(vcpu, gfn) <<
-   VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IGMT_BIT;
+   ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
+   | VMX_EPT_IGMT_BIT;
 
return ret;
 }
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 4c40375..1514758 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -39,11 +39,16 @@ int kvm_iommu_map_pages(struct kvm *kvm,
pfn_t pfn;
int i, r = 0;
struct iommu_domain *domain = kvm->arch.iommu_domain;
+   int flags;
 
/* check if iommu exists and in use */
if (!domain)
return 0;
 
+   flags = IOMMU_READ | IOMMU_WRITE;
+   if (kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY)
+   flags |= IOMMU_CACHE;
+
for (i = 0; i < npages; i++) {
/* check if already mapped */
if (iommu_iova_to_phys(domain, gfn_to_gpa(gfn)))
@@ -53,8 +58,7 @@ int kvm_iommu_map_pages(struct kvm *kvm,
r = iommu_map_range(domain,
gfn_to_gpa(gfn),
pfn_to_hpa(pfn),
-   PAGE_SIZE,
-   IOMMU_READ | IOMMU_WRITE);
+   PAGE_SIZE, flags);
if (r) {
printk(KERN_ERR "kvm_iommu_map_address:"
   "iommu failed to map pfn=%lx\n", pfn);
@@ -88,7 +92,7 @@ int kvm_assign_device(struct kvm *kvm,
 {
struct pci_dev *pdev = NULL;
struct iommu_domain *domain = kvm->arch.iommu_domain;
-   int r;
+   int r, last_flags;
 
/* check if iommu exists and in use */
if (!domain)
@@ -107,12 +111,29 @@ int kvm_assign_device(struct kvm *kvm,
return r;
}
 
+   last_flags = kvm->arch.iommu_flags;
+   if (iommu_domain_has_cap(kvm->arch.iommu_domain,
+IOMMU_CAP_CACHE_COHERENCY))
+   kvm->arch.iommu_flags |= KVM_IOMMU_CACHE_COHERENCY;
+
+   /* Check if need to update IOMMU page table for guest memory */
+   if ((last_flags ^ kvm->arch.iommu_flags) ==
+   KVM_IOMMU_CACHE_COHERENCY) {
+   kvm_iommu_unmap_memslots(kvm);
+   r = kvm_iommu_map_memslots(kvm);
+   if (r)
+   goto out_unmap;
+   }
+
printk(KERN_DEBUG "assign device: host bdf = %x:%x:%x\n",
assigned_dev->host_busnr,

[PATCH 1/2] KVM: Replace get_mt_mask_shift with get_mt_mask

2009-04-27 Thread Sheng Yang
shadow_mt_mask is out of date; it is now only used as a flag to indicate
whether TDP is enabled. Get rid of it and use tdp_enabled instead.

Also put the memory type logic in kvm_x86_ops->get_mt_mask().

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |5 +++--
 arch/x86/kvm/mmu.c  |   21 ++---
 arch/x86/kvm/svm.c  |4 ++--
 arch/x86/kvm/vmx.c  |   17 -
 arch/x86/kvm/x86.c  |2 +-
 5 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cb306cf..b6d01a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -523,7 +523,7 @@ struct kvm_x86_ops {
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
-   int (*get_mt_mask_shift)(void);
+   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
@@ -537,7 +537,7 @@ int kvm_mmu_setup(struct kvm_vcpu *vcpu);
 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
 void kvm_mmu_set_base_ptes(u64 base_pte);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-   u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask);
+   u64 dirty_mask, u64 nx_mask, u64 x_mask);
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
@@ -551,6 +551,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
  gpa_t addr, unsigned long *ret);
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 extern bool tdp_enabled;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b79afa..a875e02 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -178,7 +178,6 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
-static u64 __read_mostly shadow_mt_mask;
 
 static inline u64 rsvd_bits(int s, int e)
 {
@@ -199,14 +198,13 @@ void kvm_mmu_set_base_ptes(u64 base_pte)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-   u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask)
+   u64 dirty_mask, u64 nx_mask, u64 x_mask)
 {
shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask;
shadow_dirty_mask = dirty_mask;
shadow_nx_mask = nx_mask;
shadow_x_mask = x_mask;
-   shadow_mt_mask = mt_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -1608,7 +1606,7 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
return mtrr_state->def_type;
 }
 
-static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
+u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
u8 mtrr;
 
@@ -1618,6 +1616,7 @@ static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
mtrr = MTRR_TYPE_WRBACK;
return mtrr;
 }
+EXPORT_SYMBOL_GPL(kvm_get_guest_memory_type);
 
 static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -1670,7 +1669,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 {
u64 spte;
int ret = 0;
-   u64 mt_mask = shadow_mt_mask;
 
/*
 * We don't set the accessed bit, since we sometimes want to see
@@ -1690,16 +1688,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
spte |= shadow_user_mask;
if (largepage)
spte |= PT_PAGE_SIZE_MASK;
-   if (mt_mask) {
-   if (!kvm_is_mmio_pfn(pfn)) {
-   mt_mask = get_memory_type(vcpu, gfn) <<
-   kvm_x86_ops->get_mt_mask_shift();
-   mt_mask |= VMX_EPT_IGMT_BIT;
-   } else
-   mt_mask = MTRR_TYPE_UNCACHABLE <<
-   kvm_x86_ops->get_mt_mask_shift();
-   spte |= mt_mask;
-   }
+   if (tdp_enabled)
+   spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
+   kvm_is_mmio_pfn(pfn));
 
spte |= (u64)pfn << PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 053f3c5..2bbe1de 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2617,7 +2617,7 @@ static int get_npt_level(void)
 #endif
 }
 
-static int svm_get_mt_mask_shift(void)
+static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
return 0;
 }
@@ -2678,7 +2678,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.set_tss_addr = 

Re: kvm-userspace broken?

2009-04-27 Thread Avi Kivity

Oliver Rath wrote:

Hi List,

maybe i missed some announcements, but

git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git


gives no response:

kvm-userspace # git clone 
git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
Initialized empty Git repository in 
/home/oliver/kvm-userspace/kvm-userspace/.git/
fatal: The remote end hung up unexpectedly


What's up there?
  


kvm-userspace.git has been retired; it's now playing golf in 
git://git.kernel.org/pub/scm/virt/kvm/retired/kvm-userspace.git.  Use 
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git instead.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm-userspace: Make PC speaker emulation aware of in-kernel PIT

2009-04-27 Thread Sheng Yang
On Sunday 26 April 2009 03:59:11 Anthony Liguori wrote:
 Jan Kiszka wrote:
  Anthony Liguori wrote:
  Marcelo Tosatti wrote:
  Jan,
 
  While the patch itself looks fine, IMO it would be better to move all
  of the timer handling to userspace, except the performance critical
  parts,
  since most of it is generic. Either periodic or one-shot timer, with:
 
  The reason for having the PIT in-kernel is not performance.  The PIT is
  not performance sensitive.
 
  I think that depends. Some OSes (in some configurations) use the PIT
  counter as clock source and/or program it regularly in one-shot mode. An
  aging use case, but still a valid one.

 I can't find the thread, but this has been discussed at length before.
 The justification has always been for time drift correction.  If you
 crunch the numbers, even at a 1024HZ, there just aren't enough exits to
 really make a difference from a performance perspective.

I agree too.

When I moved the PIT into the kernel, the direct reason was that, at that time,
the timer in KVM was crappy, mainly due to interrupt handling. I remember the
most obvious problem was that the userspace PIT injected one interrupt after
another, regardless of whether the previous interrupt had already been
delivered to the guest, so some interrupts were lost and the guest's timer
became slower and slower. We decided to depend on an in-kernel PIT to provide a
stable time source, so we moved the whole PIT into the kernel (rather than
trying to provide an interface to fix it, as Xen did at the time, which seemed
much more complex).

Now the KVM timer is much more mature and stable than it was then, so I think
it's OK to try to separate the timer interrupt logic and the I/O logic now.
(Though I also think it will still take some time to get an elegant
interface...)
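
The lost-interrupt problem described above can be illustrated with a toy
model. This is only a sketch under stated assumptions: the structure and
function names (toy_timer, toy_timer_tick, toy_timer_ack) are hypothetical
and are not taken from KVM or QEMU. With catch_up off, a tick that arrives
while the previous one is still undelivered is dropped, so guest time falls
behind; with catch_up on, the tick is queued and re-injected on the next ack.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a periodic timer interrupt line (all names hypothetical). */
struct toy_timer {
    bool in_flight;          /* previous interrupt not yet acked by the guest */
    unsigned long pending;   /* ticks that still have to be delivered */
    unsigned long delivered; /* ticks the guest has actually seen */
    bool catch_up;           /* false: drop colliding ticks; true: queue them */
};

/* The timer period elapsed: try to raise the interrupt. */
void toy_timer_tick(struct toy_timer *t)
{
    if (!t->in_flight) {
        t->in_flight = true;
    } else if (t->catch_up) {
        t->pending++;        /* remember the tick for later re-injection */
    }
    /* otherwise the tick collides with an undelivered one and is lost */
}

/* The guest acknowledged the interrupt. */
void toy_timer_ack(struct toy_timer *t)
{
    assert(t->in_flight);
    t->delivered++;
    t->in_flight = false;
    if (t->pending > 0) {    /* re-inject one queued tick right away */
        t->pending--;
        t->in_flight = true;
    }
}
```

With three ticks arriving before the guest acks anything, the naive variant
delivers one tick and the catch-up variant eventually delivers all three.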

-- 
regards
Yang, Sheng


 Just to state it more clearly, if you assume an additional 5us to drop
 to userspace (which is absurdly high, but let's stick with it), 1024
 exits per second comes out to about 5ms which is only 0.5% in terms of
 CPU consumption.
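
The arithmetic above can be checked with a few lines of C. This is only a
sketch: exit_overhead_fraction is a hypothetical helper name, and the 5 us
per-exit cost is the deliberately pessimistic figure assumed in the paragraph
above, not a measurement.

```c
#include <assert.h>

/* Fraction of one CPU spent on exits, given an exit rate (per second)
 * and an assumed extra cost per exit in microseconds. */
double exit_overhead_fraction(double exits_per_sec, double cost_us)
{
    assert(exits_per_sec >= 0.0 && cost_us >= 0.0);
    return exits_per_sec * cost_us / 1e6;
}
```

For 1024 exits per second at 5 us each this gives 0.00512, i.e. about 0.5% of
one CPU. At 200000 exits per second the same per-exit cost would consume a
whole CPU, which is why the exit rate matters far more for the APIC.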

 The APIC is quite a bit more understandable because especially with SMP,
 you can generate a very high number of interrupts per second and taking
 a drop to userspace for every EOI can be start to matter with exit rates
 in the hundreds of thousands.

  It's because it was easier to do interrupt catch-up by pushing the PIT
  into the kernel which IMHO was the wrong path to go down.
 
  Pushing the emulation of port 0x61 into the kernel was a mistake we now
  have to deal with. I'm not that sure about the PIT itself.

 I agree re: port 0x61.  I'm just saying that there is no point in moving
 just the non performance critical components to userspace as Marcelo
 suggests because the whole thing is non performance critical.

 Regards,

 Anthony Liguori



qemu/hw/device-assignment: questions about msix_table_page

2009-04-27 Thread Michael S. Tsirkin
Sheng, Marcelo,
I've been reading code in qemu/hw/device-assignment.c, and
I have a couple of questions about msi-x implementation:
1. What is the reason that msix_table_page is allocated
   with mmap and not with e.g. malloc?
2. msix_table_page has the guest view of the msix table for the device.
   However, even this memory isn't mapped into guest directly, instead
   msix_mmio_read/msix_mmio_write perform the write in qemu.
   Won't it be possible to map this page directly into
   guest memory, reducing the overhead for table writes?

Could you shed light on this for me please?
Thanks,

( --)
( Resending with a sane subject/reply-to address. )
( Sorry about multiple copies.)

-- 
MST


Re:

2009-04-27 Thread Sheng Yang
On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
 Sheng, Marcelo,
 I've been reading code in qemu/hw/device-assignment.c, and
 I have a couple of questions about msi-x implementation:

Hi Michael

 1. What is the reason that msix_table_page is allocated
with mmap and not with e.g. malloc?

msix_table_page is a page, and mmap allocates memory on a page boundary. So I
use it.

 2. msix_table_page has the guest view of the msix table for the device.
However, even this memory isn't mapped into guest directly, instead
msix_mmio_read/msix_mmio_write perform the write in qemu.
Won't it be possible to map this page directly into
guest memory, reducing the overhead for table writes?

First, Linux configures the real MSI-X table in the device, which is out of our
scope. KVM accepts the interrupt from Linux, then injects it into the guest
according to the guest's MSI-X table settings. So KVM needs to know about
modifications to the page. For example, the MSI-X table has a mask bit which
can be written by the guest at any time (this bit hasn't been implemented yet,
but should be soon), and then we should mask the corresponding vector in the
real MSI-X table; the guest may also modify the MSI address/data, which should
likewise be intercepted by KVM and used to update our knowledge of the guest.
So we can't pass the modification through.

If the guest could write to the real device MSI-X table directly, it would
cause chaos in interrupt delivery, because what the guest sees is totally
different from what the host sees...
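
A minimal sketch of why the writes have to be trapped, assuming a shadow copy
of the guest-visible table. The entry layout (16 bytes per entry, vector
control at offset 12, mask in bit 0) follows the MSI-X table format; the
structure, enum and function names are hypothetical and do not come from
qemu/hw/device-assignment.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE      16u   /* addr lo, addr hi, data, vector control */
#define MSIX_VECTOR_CTRL_OFF 12u   /* offset of vector control in an entry */
#define MSIX_CTRL_MASKBIT    0x1u  /* bit 0 of vector control masks the vector */
#define SHADOW_ENTRIES       4u    /* small table, for illustration only */

/* Guest-visible copy of the table (hypothetical structure). */
struct msix_shadow {
    uint8_t page[SHADOW_ENTRIES * MSIX_ENTRY_SIZE];
};

/* What the emulator has to do after a trapped 32-bit guest write. */
enum msix_action {
    MSIX_NONE,          /* nothing the host side needs to react to */
    MSIX_UPDATE_ROUTE,  /* address/data changed: refresh guest routing */
    MSIX_UPDATE_MASK,   /* mask bit flipped: mask/unmask the host vector */
};

enum msix_action msix_shadow_write(struct msix_shadow *s, size_t off,
                                   uint32_t val)
{
    uint32_t old;

    assert(off % 4 == 0 && off + 4 <= sizeof(s->page));
    memcpy(&old, s->page + off, sizeof(old));
    memcpy(s->page + off, &val, sizeof(val));

    if (old == val)
        return MSIX_NONE;
    if (off % MSIX_ENTRY_SIZE == MSIX_VECTOR_CTRL_OFF)
        return ((old ^ val) & MSIX_CTRL_MASKBIT) ? MSIX_UPDATE_MASK
                                                 : MSIX_NONE;
    return MSIX_UPDATE_ROUTE;
}
```

If the page were mapped straight into the guest, the memcpy would still
happen, but nothing would compute the returned action, so the host would never
learn that a vector was masked or rerouted.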

-- 
regards
Yang, Sheng


 Could you shed light on this for me please?
 Thanks,




Re: [KVM PATCH v2 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-27 Thread Gregory Haskins
Avi Kivity wrote:
 Gregory Haskins wrote:
 This allows an eventfd to be registered as an irq source with a
 guest.  Any
 signaling operation on the eventfd (via userspace or kernel) will
 inject
 the registered GSI at the next available window.

  
 +struct kvm_irqfd {
 +__u32 fd;
 +__u32 gsi;
 +};
 +
 
 I think it's better to have ioctl create and return the fd.  This way
 we aren't tied to eventfd (though it makes a lot of sense to use it).
 

 I dont mind either way, but I am not sure it buys us much as the one
 driving the fd would need to understand if the interface is
 eventfd-esque or something else anyway.  Let me know if you still want
 to see this changed.
   

 Sure, the interface remains the same (write 8 bytes), but the
 implementation can change.  For example, we can implement it to work
 from interrupt context, once we hack the locking appropriately.

I was thinking more along the lines of eventfd_signal().  AIO and vbus
currently use this interface, as opposed to the more polymorphic
f_ops->write().
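
For reference, the userspace side of the "write 8 bytes" interface mentioned
above looks roughly like this with a plain eventfd. This is a sketch only: it
uses the standard eventfd(2) read/write semantics, the helper names
signal_eventfd and drain_eventfd are made up for illustration, and it says
nothing about the in-kernel eventfd_signal() path.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal an eventfd the way a userspace producer would: write 8 bytes.
 * Returns 0 on success, -1 on failure. */
int signal_eventfd(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Consume pending signals; returns the accumulated count, or 0 if the
 * counter is empty (non-blocking fd) or the read failed. */
uint64_t drain_eventfd(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

Note that two signals sent before the consumer runs are seen as a single read
returning 2, so a consumer that injects one interrupt per wakeup coalesces
them.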



 +static void
 +irqfd_inject(struct work_struct *work)
 +{
 +struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
 +struct kvm *kvm = irqfd->kvm;
 +
 +mutex_lock(&kvm->lock);
 +kvm_set_irq(kvm, kvm->irqfd.src, irqfd->gsi, 1);
 
 Need to lower the irq too (though irqfd only supports edge triggered
 interrupts).

 
 Should I just do back-to-back 1+0 inside the same lock?

   

 Yes.  Might be nice to add a kvm_toggle_irq(), but let's leave that
 until later.

Ok.
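
The "back-to-back 1+0 inside the same lock" idea can be sketched with a toy
model. Everything here is a stand-in: toy_vm, toy_set_irq and toy_toggle_irq
are hypothetical names, the boolean replaces the real mutex, and the edge
counter only models an edge-triggered line.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the VM lock and an edge-triggered line (hypothetical). */
struct toy_vm {
    bool locked;         /* models kvm->lock being held */
    int level;           /* current line level */
    unsigned int edges;  /* rising edges seen by the "guest" */
};

void toy_set_irq(struct toy_vm *vm, int level)
{
    assert(vm->locked);              /* callers must hold the lock */
    if (level && !vm->level)
        vm->edges++;                 /* a 0 -> 1 transition is one interrupt */
    vm->level = level;
}

/* Raise and lower the line inside one critical section. */
void toy_toggle_irq(struct toy_vm *vm)
{
    assert(!vm->locked);
    vm->locked = true;               /* mutex_lock() in the real code */
    toy_set_irq(vm, 1);
    toy_set_irq(vm, 0);
    vm->locked = false;              /* mutex_unlock() in the real code */
}
```

Without the second call the line would stay high, and a later signal would
produce no new rising edge in this model.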



  

 One day we'll have lockless injection and we'll want to drop this.  I
 guess if we create the fd ourselves we can make it work, but I don't
 see how we can do this with eventfd.

 

 Hmm...this is a good point.  There probably is no way to use eventfd
 off the shelf in a way that doesn't cause this callback to be in a
 critical section.  Should we just worry about switching away from
 eventfd when this occurs, or should I implement a custom anon-fd now?
   

 I'd just go with eventfd, and switch when it becomes relevant.  As
 long as the kernel allocates the fd, we're free to do as we like.

Sounds good.

-Greg






Re: qemu/hw/device-assignment: questions about msix_table_page

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
 On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
  Sheng, Marcelo,
  I've been reading code in qemu/hw/device-assignment.c, and
  I have a couple of questions about msi-x implementation:
 
 Hi Michael

  1. What is the reason that msix_table_page is allocated
 with mmap and not with e.g. malloc?
 
 msix_table_page is a page, and mmap allocate memory on page boundary. So I 
 use 
 it.

Just wondering, would e.g. posix_memalign work here as well?

  2. msix_table_page has the guest view of the msix table for the device.
 However, even this memory isn't mapped into guest directly, instead
 msix_mmio_read/msix_mmio_write perform the write in qemu.
 Won't it be possible to map this page directly into
 guest memory, reducing the overhead for table writes?
 
 First, Linux configured the real MSI-X table in device, which is out of our 
 scope. KVM accepted the interrupt from Linux, then inject it to the guest 
 according to the MSI-X table setting of guest. So KVM should know about the 
 page modification. For example, MSI-X table got mask bit which can be written 
 by guest at any time(this bit haven't been implement yet, but should be 
 soon), 
 then we should mask the correlated vector of real MSI-X table; then guest may 
 modified the MSI address/data, that also should be intercepted by KVM and 
 used 
 to update our knowledge of guest. So we can't passthrough the modification.

Right, I see that. However all msix_mmio_write does is a memcpy.
So what I don't understand yet, what causes the real MSI-X table to be modified?
Where's that code?

 If guest can write to the real device MSI-X table directly, it would cause 
 chaos on interrupt delivery, for what guest see is totally different with 
 what's host see...

Obviously.

Thanks,
-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Christian Borntraeger
Am Monday 27 April 2009 14:31:36 schrieb Michael S. Tsirkin:
 Add optional MSI-X support: use a vector per virtqueue with
 fallback to a common vector and finally to regular interrupt.
 Teach all drivers to use it.
 
 I added 2 new virtio operations: request_vqs/free_vqs because MSI
 needs to know the total number of vectors upfront.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

I don't know if that is feasible for MSI, but the transport (virtio_pci) should
already know the number of virtqueues, which should match the number of
vectors, no?

In fact, the transport has to have a way of getting the number of virtqueues
because find_vq returns ENOENT on invalid index numbers.
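
The fallback order from the cover letter (a vector per virtqueue, then a
common vector, then a regular interrupt) can be sketched as a pure decision
function. This is an illustration under assumptions: TOY_MIN_VECTORS, the enum
and pick_irq_mode are hypothetical, and the exact vector accounting in the
actual patch may differ.

```c
#include <assert.h>

/* Interrupt modes, from most to least preferred. */
enum vq_irq_mode {
    IRQ_MSIX_PER_VQ,   /* one vector per virtqueue */
    IRQ_MSIX_SHARED,   /* all virtqueues share one vector */
    IRQ_LEGACY,        /* regular interrupt */
};

/* Assumed for illustration: vectors needed besides the per-queue ones,
 * e.g. one for config changes and one shared by all queues. */
#define TOY_MIN_VECTORS 2

enum vq_irq_mode pick_irq_mode(int nvqs, int vectors_available)
{
    assert(nvqs >= 0 && vectors_available >= 0);
    if (vectors_available >= TOY_MIN_VECTORS + nvqs)
        return IRQ_MSIX_PER_VQ;
    if (vectors_available >= TOY_MIN_VECTORS)
        return IRQ_MSIX_SHARED;
    return IRQ_LEGACY;
}
```

The first comparison is the reason the total number of queues has to be known
before any vector is requested, whichever layer ends up supplying that number.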

Christian


Re: qemu/hw/device-assignment: questions about msix_table_page

2009-04-27 Thread Sheng Yang
On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
 On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
  On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
   Sheng, Marcelo,
   I've been reading code in qemu/hw/device-assignment.c, and
   I have a couple of questions about msi-x implementation:
 
  Hi Michael
 
   1. What is the reason that msix_table_page is allocated
  with mmap and not with e.g. malloc?
 
  msix_table_page is a page, and mmap allocate memory on page boundary. So
  I use it.

 Just wondering, would e.g. posix_memalign work here as well?

Um, I think it should work too.
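
For comparison, here is a minimal sketch of both ways to get one page-aligned
page in userspace. It uses only standard calls (sysconf, mmap,
posix_memalign); the helper names are hypothetical and this is not the code in
qemu/hw/device-assignment.c.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS on some libcs */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

size_t host_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    assert(sz > 0);
    return (size_t)sz;
}

int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % host_page_size()) == 0;
}

/* One zero-filled, page-aligned page from an anonymous mapping. */
void *page_alloc_mmap(void)
{
    void *p = mmap(NULL, host_page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* One page-aligned page from the heap; contents are uninitialized. */
void *page_alloc_memalign(void)
{
    void *p = NULL;

    if (posix_memalign(&p, host_page_size(), host_page_size()) != 0)
        return NULL;
    return p;
}
```

The practical differences are that the mmap page comes back zero-filled and is
released with munmap, while the posix_memalign page is uninitialized and is
released with free.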

   2. msix_table_page has the guest view of the msix table for the device.
  However, even this memory isn't mapped into guest directly, instead
  msix_mmio_read/msix_mmio_write perform the write in qemu.
  Won't it be possible to map this page directly into
  guest memory, reducing the overhead for table writes?
 
  First, Linux configured the real MSI-X table in device, which is out of
  our scope. KVM accepted the interrupt from Linux, then inject it to the
  guest according to the MSI-X table setting of guest. So KVM should know
  about the page modification. For example, MSI-X table got mask bit which
  can be written by guest at any time(this bit haven't been implement yet,
  but should be soon), then we should mask the correlated vector of real
  MSI-X table; then guest may modified the MSI address/data, that also
  should be intercepted by KVM and used to update our knowledge of guest.
  So we can't passthrough the modification.

 Right, I see that. However all msix_mmio_write does is a memcpy.
 So what I don't understand yet, what causes the real MSI-X table to be
 modified? Where's that code?

Dynamic changes aren't allowed yet... :( For now, please refer to
assigned_device_update_msix_mmio in qemu/hw/device-assignment.c. It scans the
MSI-X page and passes the entries to KVM through an ioctl. For the kernel
part, please refer to pci_enable_msix().

Both userspace and the kernel still need changes to support the mask/unmask
feature. That's still on the TODO list (and I hope it can make the 2.6.31
merge window).

-- 
regards
Yang, Sheng


  If guest can write to the real device MSI-X table directly, it would
  cause chaos on interrupt delivery, for what guest see is totally
  different with what's host see...

 Obviously.

 Thanks,




Re: kvm-kmod.git

2009-04-27 Thread Farkas Levente
Avi Kivity wrote:
 This is a small git repository for the kvm external module kit.  It
 includes a submodule link to the kernel tree, so when you check out the
 repository, it also checks out a kernel tree for a specific revision. 
 This way the various scripts and the kernel source are always synchronized.
 
 $ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
 $ git submodule init
 $ git submodule update
 $ ./configure
 $ make sync
 $ make
 
 Todo:
 
 - add maint branches
 - automatic 'make sync'? perhaps replace by symlinks?
 - add release script
 

hi,
I see this big reorganization of the sources (which is good), but does
it mean that there will be separate tarball releases for kvm-kmod and
kvm userspace (which can be compiled without the other tarball and
without git commands)?
If yes, will there be one kvm-kmod tarball, or a different tarball for
each kernel?
thanks.

-- 
  Levente   Si vis pacem para bellum!


Re: qemu/hw/device-assignment: questions about msix_table_page

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 10:03:59PM +0800, Sheng Yang wrote:
 On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
  On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
   On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
Sheng, Marcelo,
I've been reading code in qemu/hw/device-assignment.c, and
I have a couple of questions about msi-x implementation:
  
   Hi Michael
  
1. What is the reason that msix_table_page is allocated
   with mmap and not with e.g. malloc?
  
   msix_table_page is a page, and mmap allocate memory on page boundary. So
   I use it.
 
  Just wondering, would e.g. posix_memalign work here as well?
 
 Um, I think it should work too.
 
2. msix_table_page has the guest view of the msix table for the device.
   However, even this memory isn't mapped into guest directly, instead
   msix_mmio_read/msix_mmio_write perform the write in qemu.
   Won't it be possible to map this page directly into
   guest memory, reducing the overhead for table writes?
  
   First, Linux configured the real MSI-X table in device, which is out of
   our scope. KVM accepted the interrupt from Linux, then inject it to the
   guest according to the MSI-X table setting of guest. So KVM should know
   about the page modification. For example, MSI-X table got mask bit which
   can be written by guest at any time(this bit haven't been implement yet,
   but should be soon), then we should mask the correlated vector of real
   MSI-X table; then guest may modified the MSI address/data, that also
   should be intercepted by KVM and used to update our knowledge of guest.
   So we can't passthrough the modification.
 
  Right, I see that. However all msix_mmio_write does is a memcpy.
  So what I don't understand yet, what causes the real MSI-X table to be
  modified? Where's that code?
 
 Now it haven't been allowed to do dynamically change...:( For now, please 
 refer to assigned_device_update_msix_mmio in qemu/hw/device-assignment.c. It 
 would scan the MSI-X page and transfer to KVM through ioctl. And for the 
 kernel part, please refer to pci_enable_msix().

Right, I noticed that. So msix_mmio_read/write are placeholders, right?
For the existing functionality we could conceivably let the guest write into
msix_table_page.

 The userspace/kernel both still need change to support mask/unmask feature. 
 That's still in TODO list(and I hope it can catch up with 2.6.31 merge 
 window).
 
 -- 
 regards
 Yang, Sheng
 
 
   If guest can write to the real device MSI-X table directly, it would
   cause chaos on interrupt delivery, for what guest see is totally
   different with what's host see...
 
  Obviously.
 
  Thanks,
 


Re: qemu/hw/device-assignment: questions about msix_table_page

2009-04-27 Thread Sheng Yang
On Monday 27 April 2009 22:15:04 Michael S. Tsirkin wrote:
 On Mon, Apr 27, 2009 at 10:03:59PM +0800, Sheng Yang wrote:
  On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
   On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
 Sheng, Marcelo,
 I've been reading code in qemu/hw/device-assignment.c, and
 I have a couple of questions about msi-x implementation:
   
Hi Michael
   
 1. What is the reason that msix_table_page is allocated
with mmap and not with e.g. malloc?
   
msix_table_page is a page, and mmap allocate memory on page boundary.
So I use it.
  
   Just wondering, would e.g. posix_memalign work here as well?
 
  Um, I think it should work too.
 
 2. msix_table_page has the guest view of the msix table for the
 device. However, even this memory isn't mapped into guest directly,
 instead msix_mmio_read/msix_mmio_write perform the write in qemu.
 Won't it be possible to map this page directly into
guest memory, reducing the overhead for table writes?
   
First, Linux configured the real MSI-X table in device, which is out
of our scope. KVM accepted the interrupt from Linux, then inject it
to the guest according to the MSI-X table setting of guest. So KVM
should know about the page modification. For example, MSI-X table got
mask bit which can be written by guest at any time(this bit haven't
been implement yet, but should be soon), then we should mask the
correlated vector of real MSI-X table; then guest may modified the
MSI address/data, that also should be intercepted by KVM and used to
update our knowledge of guest. So we can't passthrough the
modification.
  
   Right, I see that. However all msix_mmio_write does is a memcpy.
   So what I don't understand yet, what causes the real MSI-X table to be
   modified? Where's that code?
 
  Now it haven't been allowed to do dynamically change...:( For now, please
  refer to assigned_device_update_msix_mmio in qemu/hw/device-assignment.c.
  It would scan the MSI-X page and transfer to KVM through ioctl. And for
  the kernel part, please refer to pci_enable_msix().

 Right, I noticed that. So msix_mmio_read/write are placeholders, right?
 For exiting functionality we thinkably could let the guest write into
 msix_table_page.

Yeah, in a sense. But letting the guest write into msix_table_page isn't
recommended.

1. Drivers that don't set/unset the mask bit won't access the page
frequently, so it shouldn't be performance critical.
2. Drivers that use the mask bit access the page more frequently, but you
have to intercept those accesses...

My suggestion is to get mask bit support done first, then consider
optimization. The mask bit affects performance on large-scale systems, so it
should be done. So the current MSI-X status is indeed temporary...

I guess the frequent accesses you see now may be due to the mask bit, which we
ignore for now, so it could be optimized. But I really think this is a
temporary state, so it's better not to spend much effort on optimization
before mask bit support is done...

-- 
regards
Yang, Sheng


  Both userspace and the kernel still need changes to support the mask/unmask
  feature. That's still on the TODO list (and I hope it can make the 2.6.31
  merge window).
 
  --
  regards
  Yang, Sheng
 
If the guest could write to the real device's MSI-X table directly, it would
cause chaos in interrupt delivery, because what the guest sees is totally
different from what the host sees...
  
   Obviously.
  
   Thanks,


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: qemu: framebuffer: build fix for target-arm

2009-04-27 Thread Mark McLoughlin
Include qemu-kvm.h for non-KVM_UPSTREAM building and surround the
kvm code with USE_KVM guards.

Fixes target-arm:

 qemu/hw/framebuffer.c: In function 'framebuffer_update_display':
 qemu/hw/framebuffer.c:53: warning: implicit declaration of function 
'kvm_enabled'
 qemu/hw/framebuffer.c:54: warning: implicit declaration of function 
'kvm_physical_sync_dirty_bitmap'

Signed-off-by: Mark McLoughlin mar...@redhat.com
---
 hw/framebuffer.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/framebuffer.c b/hw/framebuffer.c
index 1086ba9..e2d7604 100644
--- a/hw/framebuffer.c
+++ b/hw/framebuffer.c
@@ -18,6 +18,7 @@
 #include "console.h"
 #include "framebuffer.h"
 #include "kvm.h"
+#include "qemu-kvm.h"
 
 /* Render an image from a shared memory framebuffer.  */

@@ -50,9 +51,11 @@ void framebuffer_update_display(
     *first_row = -1;
     src_len = src_width * rows;
 
+#ifdef USE_KVM
     if (kvm_enabled()) {
         kvm_physical_sync_dirty_bitmap(base, src_len);
     }
+#endif
     pd = cpu_get_physical_page_desc(base);
     pd2 = cpu_get_physical_page_desc(base + src_len - 1);
     /* We should reall check that this is a continuous ram region.
-- 
1.6.0.6



Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 04:00:30PM +0200, Christian Borntraeger wrote:
 Am Monday 27 April 2009 14:31:36 schrieb Michael S. Tsirkin:
  Add optional MSI-X support: use a vector per virtqueue with
  fallback to a common vector and finally to regular interrupt.
  Teach all drivers to use it.
  
  I added 2 new virtio operations: request_vqs/free_vqs because MSI
  needs to know the total number of vectors upfront.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 I don't know if that is feasible for MSI, but the transport (virtio_pci) should
 already know the number of virtqueues, which should match the number of
 vectors, no?

I think no, the transport can find out the max number of vectors the
device supports (pci_msix_table_size), but not how many virtqueues are
needed by the driver.
As  number of virtqueues = number of vectors,
we could pre-allocate all vectors that host supports, but this seems
a bit drastic as an MSI-X device could support up to 2K vectors.

 In fact, the transport has to have a way of getting the number of virtqueues
 because find_vq returns ENOENT on invalid index numbers.
 
 Christian

So again, I think this is an upper bound supported by host. Right?

-- 
MST


Re: qemu/hw/device-assignment: questions about msix_table_page

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
 My suggestion is get mask bit support first, then consider optimization.

Yea.
Thanks for the explanations!

-- 
MST


Re: [PATCH] virtio-blk: add SGI_IO passthru support

2009-04-27 Thread Anthony Liguori

Christoph Hellwig wrote:

[had the qemu list address wrong the first time, reply to this message,
 not the previous if you were on Cc]


Add support for SG_IO passthru (packet commands) to the virtio-blk
backend.  Conceptually based on an older patch from Hannes Reinecke
but largely rewritten to match the code structure and layering in
virtio-blk.

Note that currently we issue the host SG_IO synchronously.  We could
easily switch to async I/O, but that would require either bloating
the VirtIOBlockReq by the size of struct sg_io_hdr or an additional
memory allocation for each SG_IO request.


Signed-off-by: Christoph Hellwig h...@lst.de
  


So practically speaking, what can you do with this?  Should we be 
handling some SCSI cmds internally to QEMU (like eject operations) and 
supporting media=cdrom in -drive for if=virtio?


On a related topic, should we switch /dev/vdX to be /dev/sdX?

Regards,

Anthony Liguori



Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Anthony Liguori

Michael S. Tsirkin wrote:

Add optional MSI-X support: use a vector per virtqueue with
fallback to a common vector and finally to regular interrupt.
Teach all drivers to use it.

I added 2 new virtio operations: request_vqs/free_vqs because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
  


Do you have userspace patches for testing?

Regards,

Anthony Liguori



Re: [ANNOUNCE] kvm-85 release

2009-04-27 Thread Mark McLoughlin
Hi Avi,

On Tue, 2009-04-21 at 12:32 +0300, Avi Kivity wrote:

 Again a long delayed release; I'll try to reduce the lag between
 releases.
 
 kvm-85 comes in two flavors: the traditional kernel+qemu package 
 (kvm-85.tar.gz), as well as separate qemu (qemu-kvm-devel-85.tar.gz) 

I packaged this in Fedora 12 as qemu-0.10.50 and noticed a couple of
small problems with the qemu-kvm-devel tarball:

  - No kernel/include/asm symlink, so it was building with the system 
headers and not finding e.g. KVM_FEATURE_CLOCKSOURCE

  - An empty kvm-85 dir

And for anyone else doing similar packaging, I've also:

  - Disabled pwritev() support until glibc stops writing random junk. 
See https://bugzilla.redhat.com/497429

  - Added Pauline's boot=on -help string patch

  - Fixed the arm target build with the patch I just sent

Cheers,
Mark.



Re: [ANNOUNCE] kvm-85 release

2009-04-27 Thread Alexander Graf


On 27.04.2009, at 16:41, Mark McLoughlin wrote:


Hi Avi,

On Tue, 2009-04-21 at 12:32 +0300, Avi Kivity wrote:


Again a long delayed release; I'll try to reduce the lag between
releases.

kvm-85 comes in two flavors: the traditional kernel+qemu package
(kvm-85.tar.gz), as well as separate qemu (qemu-kvm-devel-85.tar.gz)


I packaged this in Fedora 12 as qemu-0.10.50 and noticed a couple of
small problems with the qemu-kvm-devel tarball:

 - No kernel/include/asm symlink, so it was building with the system
   headers and not finding e.g. KVM_FEATURE_CLOCKSOURCE

 - An empty kvm-85 dir

And for anyone else doing similar packaging, I've also:

 - Disabled pwritev() support until glibc stops writing random junk.
   See https://bugzilla.redhat.com/497429


Wouldn't it be useful to have it disabled upstream then?

Alex




 - Added Pauline's boot=on -help string patch

 - Fixed the arm target build with the patch I just sent

Cheers,
Mark.



Re: [ANNOUNCE] kvm-85 release

2009-04-27 Thread Christoph Hellwig
On Mon, Apr 27, 2009 at 04:45:23PM +0200, Alexander Graf wrote:
  - Disabled pwritev() support until glibc stops writing random junk.
See https://bugzilla.redhat.com/497429

 Wouldn't it be useful to have it disabled upstream then?

No glibc has been released with the broken one yet; I guess Mark must
have been running some sort of snapshot.



CPU Limitations

2009-04-27 Thread Cornelius Wefelscheid

Hi,
I have the following problem:
I have an AMD Opteron server with 32 (4*8) CPUs.

Since kvm is only able to create a virtual machine with 16 CPUs, I
changed KVM_MAX_VCPUS, MAX_VCPUS and MAX_CPUS
as suggested in
http://www.linux-kvm.org/wiki/images/b/be/KvmForum2008$kdf2008_6.pdf

After compiling everything, it shows the right maximum number of CPUs in
virt-manager,
but it is still not working.

If I start my virtual machine with, for example,

kvm -m 4000 -smp 30 disk0.img

I get a segmentation fault.
The log file shows:

[ 4440.424512] __ratelimit: 18 callbacks suppressed
[ 4440.424523] kvm[7054]: segfault at 1ae5cf6b0 ip 004092ba sp
7fff3dbb8dc0 error 4 in kvm[40+1e1000]

Any idea what is missing or what is wrong?

cheers
cornelius




[ kvm-Bugs-2768533 ] SCSI/VirtIO: VM unable to reboot from non-IDE controller

2009-04-27 Thread SourceForge.net
Bugs item #2768533, was opened at 2009-04-16 16:01
Message generated for change (Settings changed) made by technologov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2768533&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 7
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: SCSI/VirtIO: VM unable to reboot from non-IDE controller

Initial Comment:
All Red Hat-based systems (RHEL, Fedora) have failed during automated tests on 
SCSI disks on KVM.
It turned out to be a problem of Qemu/KVM, where after a software reboot (i.e. 
initiated by guest OS), the system is unable to boot from SCSI controller.

To reboot successfully you need either soft reboot+IDE or Hard reboot (Qemu 
cold boot)+SCSI.

The Command sent to Qemu/KVM: 
/usr/local/bin/qemu-system-x86_64 -m 512 -monitor 
tcp:localhost:4602,server,nowait -cdrom /isos/linux/Fedora-8-i386-DVD.iso  
-drive file=/vm/fedora8-32.qcow2,if=scsi,boot=on -name fedora8-32

Host: RHEL 5/x64, KVM-85rc6. (tried both Intel and AMD)
Guest: RHEL 5, Fedora 8, Fedora 9 (tried both 32 and 64-bit).

-Alexey, 16.4.2009.

--

Comment By: Technologov (technologov)
Date: 2009-04-27 17:56

Message:
Bug fixed in KVM-85. Closing.

--

Comment By: Technologov (technologov)
Date: 2009-04-21 12:41

Message:
Ryan Harper provided a patch that fixes this.

I need to make sure it makes way into KVM release. Not closing bug yet.

--

Comment By: Technologov (technologov)
Date: 2009-04-19 17:25

Message:
Bug was also opened in Ubuntu Launchpad.
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/363743

Two reasons: 
1. Qemu project does not have bugzilla - so Ubuntu Launchpad became main
bugzilla for Qemu
2. This bug affects Ubuntu-8.10

--

Comment By: Technologov (technologov)
Date: 2009-04-16 17:12

Message:
This problem also exists for VirtIO block device.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2768533&group_id=180599


Re: kvm-kmod.git

2009-04-27 Thread Avi Kivity

Farkas Levente wrote:

Hi,
I see this big reorganization of the sources (which is good), but does
it mean that there will be separate tarball releases for kvm-kmod and
kvm userspace (which can be compiled without the other tarball and without
git commands)?
If yes, does it mean there will be one kvm-kmod tarball, or a different
tarball for each kernel?
Thanks


There will be a qemu-kvm tarball (qemu only), kvm-kmod tarball (kernel 
module only, suitable for any supported host kernel), and kvm tarball 
(contains both of the above).


--
error compiling committee.c: too many arguments to function



Re: [ANNOUNCE] kvm-85 release

2009-04-27 Thread Avi Kivity

Mark McLoughlin wrote:

Hi Avi,

On Tue, 2009-04-21 at 12:32 +0300, Avi Kivity wrote:

  

Again a long delayed release; I'll try to reduce the lag between
releases.

kvm-85 comes in two flavors: the traditional kernel+qemu package 
(kvm-85.tar.gz), as well as separate qemu (qemu-kvm-devel-85.tar.gz) 



I packaged this in Fedora 12 as qemu-0.10.50 and noticed a couple of
small problems with the qemu-kvm-devel tarball:

  - No kernel/include/asm symlink, so it was building with the system 
headers and not finding e.g. KVM_FEATURE_CLOCKSOURCE
  


This is already fixed in the new repository.


  - An empty kvm-85 dir
  


I need to rewrite my release script anyway.


And for anyone else doing similar packaging, I've also:

  - Disabled pwritev() support until glibc stops writing random junk. 
See https://bugzilla.redhat.com/497429


  - Added Pauline's boot=on -help string patch

  - Fixed the arm target build with the patch I just sent
  


I hope to release kvm-86 soon.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Avi Kivity

Christian Borntraeger wrote:

Am Monday 27 April 2009 14:31:36 schrieb Michael S. Tsirkin:
  

Add optional MSI-X support: use a vector per virtqueue with
fallback to a common vector and finally to regular interrupt.
Teach all drivers to use it.

I added 2 new virtio operations: request_vqs/free_vqs because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin m...@redhat.com



I don't know if that is feasible for MSI, but the transport (virtio_pci) should
already know the number of virtqueues, which should match the number of
vectors, no?

In fact, the transport has to have a way of getting the number of virtqueues
because find_vq returns ENOENT on invalid index numbers.
  


One thing I thought was to decouple the virtqueue count from the MSI 
entry count, and let the guest choose how many MSI entries it wants to 
use.  This is useful if it wants several queues to share an interrupt, 
perhaps it wants both rx and tx rings on one cpu to share a single vector.


So the device could expose a large, constant number of MSI entries, and 
in addition expose a table mapping ring number to msi entry number.  The 
guest could point all rings to one entry, or have each ring use a 
private entry, or anything in between.


That saves us the new API (at the expense of a lot more code, but with 
added flexibility).
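
The decoupling described above can be sketched as a small table that maps
each ring to an MSI entry chosen by the guest. This is only an illustration of
the idea, not an existing virtio interface: the demo_* names, the three
queues, and the eight entries are all assumptions made for the example.

```c
#include <stddef.h>

#define DEMO_NR_QUEUES   3       /* e.g. rx, tx, control (illustrative) */
#define DEMO_NR_VECTORS  8       /* fixed number of MSI entries exposed */
#define DEMO_NO_VECTOR   0xffffu /* queue has no MSI entry assigned */

/* Hypothetical device state: one table slot per ring. */
struct demo_dev {
    unsigned short queue_vector[DEMO_NR_QUEUES];
};

/* Start with no ring bound to any MSI entry. */
void demo_init(struct demo_dev *d)
{
    size_t q;

    for (q = 0; q < DEMO_NR_QUEUES; q++)
        d->queue_vector[q] = DEMO_NO_VECTOR;
}

/* Guest points ring `queue` at MSI entry `vector`; -1 on a bad index. */
int demo_set_queue_vector(struct demo_dev *d, size_t queue,
                          unsigned short vector)
{
    if (queue >= DEMO_NR_QUEUES || vector >= DEMO_NR_VECTORS)
        return -1;
    d->queue_vector[queue] = vector;
    return 0;
}

/* Number of distinct MSI entries currently referenced by any ring. */
int demo_vectors_in_use(const struct demo_dev *d)
{
    int used[DEMO_NR_VECTORS] = {0};
    int count = 0;
    size_t q;

    for (q = 0; q < DEMO_NR_QUEUES; q++) {
        unsigned short v = d->queue_vector[q];

        if (v != DEMO_NO_VECTOR && !used[v]) {
            used[v] = 1;
            count++;
        }
    }
    return count;
}
```

Pointing every ring at entry 0 gives the shared-interrupt case, and giving
each ring its own entry gives the private-vector case; anything in between is
just a different set of table values.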


--
error compiling committee.c: too many arguments to function



Re: [ANNOUNCE] kvm-85 release

2009-04-27 Thread Mark McLoughlin
On Mon, 2009-04-27 at 10:50 -0400, Christoph Hellwig wrote:
 On Mon, Apr 27, 2009 at 04:45:23PM +0200, Alexander Graf wrote:
   - Disabled pwritev() support until glibc stops writing random junk.
 See https://bugzilla.redhat.com/497429
 
  Wouldn't it be useful to have it disabled upstream then?
 
 No glibc has been released with the broken one yet; I guess Mark must
 have been running some sort of snapshot.

Yeah, it's a pre-release snapshot of 2.10 which will be included in
Fedora 11. 

Looks like it's fixed upstream already and we'll have a new snapshot
very soon.

Cheers,
Mark.



Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Christian Borntraeger
Am Monday 27 April 2009 16:32:56 schrieb Michael S. Tsirkin:

  I don't know if that is feasible for MSI, but the transport (virtio_pci) should
  already know the number of virtqueues, which should match the number of
  vectors, no?
 
 I think no, the transport can find out the max number of vectors the
 device supports (pci_msix_table_size), but not how many virtqueues are
 needed by the driver.

I was not talking about the number of vectors, but the number of virtqueues
that really exist.
I just read virtio_pci again and yes, it would be a bit hard to get the number
of available virtqueues, because we need to loop over all possible vqs
(something like):

for (i = 0; i < SOMEMAX; i++) {
        iowrite16(i, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
        if (!ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM))
                break;
}
return i;

which is obviously not very nice.
On the other hand, at the moment we have up to 3 virtqueues per device
(network), and we have a current maximum of 16 virtqueues in qemu
(VIRTIO_PCI_QUEUE_MAX in virtio.c), which would reduce the number.

 As  number of virtqueues = number of vectors,
 we could pre-allocate all vectors that host supports, but this seems
 a bit drastic as an MSI-X device could support up to 2K vectors.
 
  In fact, the transport has to have a way of getting the number of virtqueues
  because find_vq returns ENOENT on invalid index numbers.
  
  Christian
 
 So again, I think this is an upper bound supported by host. Right?

Not the upper bound, but the virtqueues that are really available. (With
current qemu, 3 for virtio-net, 2 for virtio-console, etc.)

Since I use a different transport (drivers/s390/kvm/kvm_virtio.c), my
motivation is to keep the virtio interface as generic as possible. I don't
really like the new interface, but I cannot give you silver-bullet technical
reasons; it's more a gut feeling. The interface would work with lguest and s390.

Anyway, Avi's suggestion to decouple the MSI count and the virtqueue count
looks like a promising approach.

Christian


bad virtio disk performance

2009-04-27 Thread Lucas Nussbaum
Hi,

I'm experiencing bad disk I/O performance using virtio disks.

I'm using Linux 2.6.29 (host & guest), kvm 84 userspace.
On the host, and in a non-virtio guest, I get ~120 MB/s when writing
with dd (the disks are fast RAID0 SAS disks).

In a guest with a virtio disk, I get at most ~32 MB/s.

The rest of the setup is the same. For reference, I'm running kvm -drive
file=/tmp/debian-amd64.img,if=virtio.

Is such performance expected? What should I check?

Thank you,
-- 
| Lucas Nussbaum
| lu...@lucas-nussbaum.net   http://www.lucas-nussbaum.net/ |
| jabber: lu...@nussbaum.fr GPG: 1024D/023B3F4F |


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 06:06:48PM +0300, Avi Kivity wrote:
 Christian Borntraeger wrote:
 Am Monday 27 April 2009 14:31:36 schrieb Michael S. Tsirkin:
   
 Add optional MSI-X support: use a vector per virtqueue with
 fallback to a common vector and finally to regular interrupt.
 Teach all drivers to use it.

 I added 2 new virtio operations: request_vqs/free_vqs because MSI
 needs to know the total number of vectors upfront.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 

 I don't know if that is feasible for MSI, but the transport (virtio_pci) should
 already know the number of virtqueues, which should match the number of
 vectors, no?

 In fact, the transport has to have a way of getting the number of virtqueues
 because find_vq returns ENOENT on invalid index numbers.
   

 One thing I thought was to decouple the virtqueue count from the MSI  
 entry count, and let the guest choose how many MSI entries it wants to  
 use.  This is useful if it wants several queues to share an interrupt,  
 perhaps it wants both rx and tx rings on one cpu to share a single 
 vector.

 So the device could expose a large, constant number of MSI entries, and  
 in addition expose a table mapping ring number to msi entry number.  The  
 guest could point all rings to one entry, or have each ring use a  
 private entry, or anything in between.

Sounds good ... and it might not be a lot of code I think.

 That saves us the new API (at the expense of a lot more code, but with  
 added flexibility).

So we'll probably need to rename request_vqs to request_vectors,
but we probably still need the driver to pass the number of
vectors it wants to the transport. Right?
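
The fallback order from the cover letter (a vector per virtqueue, then a
common vector, then a regular interrupt) shows why the transport needs the
driver's count up front. The sketch below is an assumption-laden illustration,
not the patch's code: demo_pick_irq_mode and the enum are hypothetical names,
and it ignores any extra vector a transport might reserve for configuration
changes.

```c
/* Interrupt delivery modes, in order of preference. */
enum demo_irq_mode {
    DEMO_IRQ_PER_VQ,      /* one MSI-X vector per virtqueue */
    DEMO_IRQ_SHARED_MSIX, /* all virtqueues share one MSI-X vector */
    DEMO_IRQ_INTX         /* regular (legacy) interrupt */
};

/* Pick a mode from the number of virtqueues the driver asks for and the
 * number of MSI-X vectors the transport managed to obtain. */
enum demo_irq_mode demo_pick_irq_mode(unsigned int nvqs,
                                      unsigned int nvectors_available)
{
    if (nvqs > 0 && nvectors_available >= nvqs)
        return DEMO_IRQ_PER_VQ;
    if (nvectors_available >= 1)
        return DEMO_IRQ_SHARED_MSIX;
    return DEMO_IRQ_INTX;
}
```

Without nvqs the first branch cannot be evaluated, which is the reason the
driver has to tell the transport how many vectors it wants before any
virtqueue is set up.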

-- 
MST


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2009-04-27 Thread Passera, Pablo R
Andrea,
We are working with embedded hardware that does not have VT-d, and we
need 1-1 mapping. I wonder what the status of this patch is. Have you
continued updating it for the latest KVM version?

Regards,
Pablo

On Wed, Jul 30, 2008 at 05:16:06PM +0300, Dor Laor wrote:
 In addition KVM is used in embedded too and things are slower there, we
 know of a specific use case (production) that demands
 1:1 mapping and can't use VT-d

Since you mentioned this ;), I take the opportunity to add that those
embedded usages are the ones that are totally fine with the compile
time passthrough-guest-ram decision, instead of a boot time
decision. Those host kernels will likely have RT patches (KVM works
great with preempt-RT indeed) and in turn the compile time ram
selection is the least of their problems as you can imagine ;). So you
can see my patch as an embedded-build option, similar to "Configure
standard kernel features (for small systems)", and no distro is
shipping new kernels with that feature on either.

Then if we decide 1:1 should have a larger userbase instead of only the
people who know what they're doing (i.e. a 1:1 guest can destroy the
linux-hypervisor) we can always add a bit of strtol parsing to the 16bit
kernelloader.




Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Christian Borntraeger
Am Monday 27 April 2009 17:39:36 schrieb Michael S. Tsirkin:
 So we'll probably need to rename request_vqs to request_vectors,
 but we probably still need the driver to pass the number of
 vectors it wants to the transport. Right?

This might be a stupid idea, but would something like the following
be sufficient for you?

Index: linux-2.6/drivers/net/virtio_net.c
===
--- linux-2.6.orig/drivers/net/virtio_net.c
+++ linux-2.6/drivers/net/virtio_net.c
@@ -1021,6 +1021,7 @@ static unsigned int features[] = {
 static struct virtio_driver virtio_net = {
.feature_table = features,
.feature_table_size = ARRAY_SIZE(features),
+   .num_vq_max = 3,
.driver.name =  KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = id_table,
Index: linux-2.6/include/linux/virtio.h
===
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -110,6 +110,7 @@ struct virtio_driver {
const struct virtio_device_id *id_table;
const unsigned int *feature_table;
unsigned int feature_table_size;
+   unsigned int num_vq_max;
int (*probe)(struct virtio_device *dev);
void (*remove)(struct virtio_device *dev);
void (*config_changed)(struct virtio_device *dev);


Re: [PATCH 3/7] kvm/mmu: rename is_largepage_backed to mapping_level

2009-04-27 Thread Marcelo Tosatti
Joerg,

On Fri, Apr 24, 2009 at 01:58:43PM +0200, Joerg Roedel wrote:
 With the new name and the corresponding backend changes this function
 can now support multiple hugepage sizes.
 
 Signed-off-by: Joerg Roedel joerg.roe...@amd.com
 ---
  arch/x86/kvm/mmu.c |  100 +--
  arch/x86/kvm/paging_tmpl.h |4 +-
  2 files changed, 69 insertions(+), 35 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index e3421d8..56cd7c2 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -382,37 +382,52 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
   * Return the pointer to the largepage write count for a given
   * gfn, handling slots that are not large page aligned.
   */
 -static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
 +static int *slot_largepage_idx(gfn_t gfn,
 +struct kvm_memory_slot *slot,
 +int level)
  {
   unsigned long idx;
  
 - idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
 -   (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
 - return &slot->lpage_info[0][idx].write_count;
 + idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
 +   (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
 + return &slot->lpage_info[level - 2][idx].write_count;
  }
  
  static void account_shadowed(struct kvm *kvm, gfn_t gfn)
  {
 + struct kvm_memory_slot *slot;
   int *write_count;
 + int i;
  
   gfn = unalias_gfn(kvm, gfn);
 - write_count = slot_largepage_idx(gfn,
 -  gfn_to_memslot_unaliased(kvm, gfn));
 - *write_count += 1;
 +
 + for (i = PT_DIRECTORY_LEVEL;
 +  i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
 + slot  = gfn_to_memslot_unaliased(kvm, gfn);
 + write_count   = slot_largepage_idx(gfn, slot, i);
 + *write_count += 1;
 + }
  }
  
  static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
  {
 + struct kvm_memory_slot *slot;
   int *write_count;
 + int i;
  
   gfn = unalias_gfn(kvm, gfn);
 - write_count = slot_largepage_idx(gfn,
 -  gfn_to_memslot_unaliased(kvm, gfn));
 - *write_count -= 1;
 - WARN_ON(*write_count < 0);
 + for (i = PT_DIRECTORY_LEVEL;
 +  i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
 + slot  = gfn_to_memslot_unaliased(kvm, gfn);
 + write_count   = slot_largepage_idx(gfn, slot, i);
 + *write_count -= 1;
 + WARN_ON(*write_count < 0);
 + }
  }
  
 -static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
 +static int has_wrprotected_page(struct kvm *kvm,
 + gfn_t gfn,
 + int level)
  {
   struct kvm_memory_slot *slot;
   int *largepage_idx;
 @@ -420,47 +435,67 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t 
 gfn)
   gfn = unalias_gfn(kvm, gfn);
   slot = gfn_to_memslot_unaliased(kvm, gfn);
   if (slot) {
 - largepage_idx = slot_largepage_idx(gfn, slot);
 + largepage_idx = slot_largepage_idx(gfn, slot, level);
   return *largepage_idx;
   }
  
   return 1;
  }
  
 -static int host_largepage_backed(struct kvm *kvm, gfn_t gfn)
 +static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
  {
 + unsigned long page_size = PAGE_SIZE;
   struct vm_area_struct *vma;
   unsigned long addr;
 - int ret = 0;
 + int i, ret = 0;
  
   addr = gfn_to_hva(kvm, gfn);
   if (kvm_is_error_hva(addr))
 - return ret;
 + return page_size;
  
   down_read(&current->mm->mmap_sem);
   vma = find_vma(current->mm, addr);
 - if (vma && is_vm_hugetlb_page(vma))
 - ret = 1;
 + if (!vma)
 + goto out;
 +
 + page_size = vma_kernel_pagesize(vma);
 +
 +out:
   up_read(&current->mm->mmap_sem);
  
 + for (i = PT_PAGE_TABLE_LEVEL;
 +  i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
 + if (page_size >= KVM_HPAGE_SIZE(i))
 + ret = i;
 + else
 + break;
 + }
 +
   return ret;
  }
  
 -static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
 +static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
  {
   struct kvm_memory_slot *slot;
 -
 - if (has_wrprotected_page(vcpu->kvm, large_gfn))
 - return 0;
 -
 - if (!host_largepage_backed(vcpu->kvm, large_gfn))
 - return 0;
 + int host_level;
 + int level = PT_PAGE_TABLE_LEVEL;
  
   slot = gfn_to_memslot(vcpu->kvm, large_gfn);
   if (slot && slot->dirty_bitmap)
 - return 0;
 + return PT_PAGE_TABLE_LEVEL;
  
 - return 1;
 + host_level = host_mapping_level(vcpu->kvm, large_gfn);
 +
 + if (host_level == PT_PAGE_TABLE_LEVEL)
 +   
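
(The quoted patch is cut off here in the archive.) As a reading aid for the
level-selection loop in host_mapping_level above, the following standalone
sketch reproduces just that loop. The DEMO_* constants are assumptions for
illustration: a 4 KiB base page, three page sizes, and 9 address bits per
level as on x86-64, which gives 4 KiB, 2 MiB and 1 GiB for levels 1 to 3. The
kernel's own macros are not used.

```c
#define DEMO_PAGE_SHIFT            12 /* assumed 4 KiB base page */
#define DEMO_PT_PAGE_TABLE_LEVEL   1  /* level of a normal 4 KiB mapping */
#define DEMO_NR_PAGE_SIZES         3  /* assumed: 4 KiB, 2 MiB, 1 GiB */

/* Size in bytes of a page mapped at `level` (assumed 9 bits per level). */
#define DEMO_HPAGE_SIZE(level) \
    (1ULL << (DEMO_PAGE_SHIFT + ((level) - 1) * 9))

/* Return the largest level whose page size does not exceed page_size,
 * or 0 if page_size is smaller than the base page. */
int demo_level_for_page_size(unsigned long long page_size)
{
    int i, ret = 0;

    for (i = DEMO_PT_PAGE_TABLE_LEVEL;
         i < DEMO_PT_PAGE_TABLE_LEVEL + DEMO_NR_PAGE_SIZES; ++i) {
        if (page_size >= DEMO_HPAGE_SIZE(i))
            ret = i;
        else
            break;
    }
    return ret;
}
```

A host page size between two supported sizes, for example 1 MiB, maps at the
smaller level, because the loop stops at the first level that does not fit.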

Re: kvm-kmod.git

2009-04-27 Thread Cam Macdonell

Avi Kivity wrote:
This is a small git repository for the kvm external module kit.  It 
includes a submodule link to the kernel tree, so when you check out the 
repository, it also checks out a kernel tree for a specific revision.  
This way the various scripts and the kernel source are always synchronized.


$ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
$ git submodule init
$ git submodule update
$ ./configure
$ make sync
$ make

Todo:

- add maint branches
- automatic 'make sync'? perhaps replace by symlinks?
- add release script


Configure spits out an error about include/asm not existing.  I think 
that make sync has to be run before configure to create the include 
directory and copy asm-x86 inside so that ln -sf asm-$karch 
include/asm will work.


Cam


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 05:37:25PM +0200, Christian Borntraeger wrote:
  As  number of virtqueues = number of vectors,
  we could pre-allocate all vectors that host supports, but this seems
  a bit drastic as an MSI-X device could support up to 2K vectors.
  
   In fact, the transport has to have a way of getting the number of virtqueues
   because find_vq returns ENOENT on invalid index numbers.
   
   Christian
  
  So again, I think this is an upper bound supported by host. Right?
 
 Not the upper bound, but the real available virtqueues. (With current qemu
 3 for virtio-net, 2 for virtio-console etc.)

Here's what I mean by upper bound: guest and host number of
virtqueues might be different: the host might support
control virtqueue like in virtio-net, but guest might be an older
version and not use it.

 Since I use a different transport (drivers/s390/kvm/kvm_virtio.c), my
 motivation is to keep the virtio interface as generic as possible. I
 don't really like the new interface, but I cannot give you silver-bullet
 technical reasons; it's more a gut feeling. The interface would
 work with lguest and s390.
 
 Anyway, Avi's suggestion to decouple the MSI count and the virtqueue count
 looks like a promising approach.

So generally, we add request_vectors which gives us the number of
available MSI vectors and then find_vq gets a vector #?
Does this sound better?

-- 
MST


Re: kvm-77 Excessive Disk Access causes real time clock hang!

2009-04-27 Thread Erik Rull

Hi Avi,

Avi Kivity wrote:

interface: virtio
cache: none
format: raw, using a partition or logical volume

What are you using?


uhm, I'm not sure, I call qemu with:

qemu-system-x86_64 -usb -hda /dev/hda2 -m 1536 -net nic,macaddr=$MACADDR 
-net tap,script=/etc/qemu-ifup -no-acpi -monitor stdio -usbdevice tablet 
-boot c


/dev/hda2 is NTFS formatted - does this make sense, given that you wrote 
something about raw?
Maybe another file system would be faster? Or is it ignored by the guest 
system?


Best regards,

Erik


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 06:59:52PM +0200, Christian Borntraeger wrote:
 On Monday 27 April 2009 17:39:36, Michael S. Tsirkin wrote:
  So we'll probably need to rename request_vqs to request_vectors,
  but we probably still need the driver to pass the number of
  vectors it wants to the transport. Right?
 
 This might be a stupid idea, but would something like the following
 be sufficient for you?

Yes.

 Index: linux-2.6/drivers/net/virtio_net.c
 ===
 --- linux-2.6.orig/drivers/net/virtio_net.c
 +++ linux-2.6/drivers/net/virtio_net.c
 @@ -1021,6 +1021,7 @@ static unsigned int features[] = {
  static struct virtio_driver virtio_net = {
 .feature_table = features,
 .feature_table_size = ARRAY_SIZE(features),
 +   .num_vq_max = 3,
 .driver.name =  KBUILD_MODNAME,
 .driver.owner = THIS_MODULE,
 .id_table = id_table,
 Index: linux-2.6/include/linux/virtio.h
 ===
 --- linux-2.6.orig/include/linux/virtio.h
 +++ linux-2.6/include/linux/virtio.h
 @@ -110,6 +110,7 @@ struct virtio_driver {
 const struct virtio_device_id *id_table;
 const unsigned int *feature_table;
 unsigned int feature_table_size;
 +   unsigned int num_vq_max;
 int (*probe)(struct virtio_device *dev);
 void (*remove)(struct virtio_device *dev);
 void (*config_changed)(struct virtio_device *dev);

Good idea.
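
One way to read the num_vq_max idea: the driver declares an upper bound on the virtqueues it may use, and the transport caps its vector request by whatever the host offers. A minimal sketch in plain C, assuming a hypothetical helper `vectors_to_request` and a hypothetical `host_vectors` count, neither of which appears in the patch:

```c
/* Hypothetical helper, not part of the patch above: number of MSI-X
 * vectors a transport might request for one device. */
unsigned int vectors_to_request(unsigned int num_vq_max,
                                unsigned int host_vectors)
{
	/* A driver that leaves num_vq_max at 0 asks for nothing up front. */
	if (num_vq_max == 0)
		return 0;
	/* Never ask for more than the host exposes. */
	return num_vq_max < host_vectors ? num_vq_max : host_vectors;
}
```

With the value from the virtio_net hunk (3) and a host offering 2048 vectors, this would request 3, which avoids pre-allocating everything the host supports.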

-- 
MST


Boot problems with qemu-kvm

2009-04-27 Thread Cam Macdonell


Hi,

I've tried booting two VMs (Ubuntu and Fedora 11) that previously worked 
with the latest kvm-userspace.  At start, I get the following warning:


Could not open '/dev/kqemu' - QEMU acceleration layer not activated: No 
such file or directory


But, booting proceeds quickly which seems to indicate that KVM is being 
used.  However, the network device is not started and the graphics don't 
start properly.  CPU usage also spikes to 100% and stays there and I 
can't interact at all with the VM.  I'm running the 2.6.29 KVM kernel in 
the guests.


Cam


Re: bad virtio disk performance

2009-04-27 Thread john cooper

Lucas Nussbaum wrote:

Hi,

I'm experiencing bad disk I/O performance using virtio disks.

I'm using Linux 2.6.29 (host & guest), kvm 84 userspace.
On the host, and in a non-virtio guest, I get ~120 MB/s when writing
with dd (the disks are fast RAID0 SAS disks).


Could you provide detail of the exact type and size
of i/o load you were creating with dd?

Also the full qemu cmd line invocation in both
cases would be useful.


In a guest with a virtio disk, I get at most ~32 MB/s.


Which non-virtio interface was used for the
comparison?


The rest of the setup is the same. For reference, I'm running kvm -drive
file=/tmp/debian-amd64.img,if=virtio.

Is such performance expected? What should I check?


Not expected, something is awry.

blktrace(8) run on the host will shed some light
on the type of i/o requests issued by qemu in both
cases.

-john


--
john.coo...@third-harmonic.com


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2009-04-27 Thread Andrea Arcangeli
Hello Pablo,

On Mon, Apr 27, 2009 at 11:00:51AM -0600, Passera, Pablo R wrote:
 Andrea,
 We are working with embedded hardware that does not have
VT-d and we need 1-1 mapping. I wonder what the status of this
patch is. Have you continued updating it with the latest KVM version?

Sorry to say, but it isn't updated to the latest KVM and latest
mainline. Porting should normally be easy. I attached the last versions.

 Since you mentioned this ;), I take opportunity to add that those
 embedded usages are the ones that are totally fine with the compile
 time passthrough-guest-ram decision, instead of a boot time
 decision. Those host kernels will likely have RT patches (KVM works
 great with preempt-RT indeed) and in turn the compile time ram
 selection is the least of their problems as you can imagine ;). So you
 can see my patch as an embedded-build option, similar to Configure
 standard kernel features (for small systems) and no distro is
 shipping new kernels with that feature on either.
 
 Then if we decide 1:1 should have a larger userbase instead of only the
 people who know what they're doing (i.e. 1:1 guest can destroy
 linux-hypervisor) we can always add a bit of strtol parsing to 16bit
 kernelloader.

Agreed!
From: Andrea Arcangeli aarca...@redhat.com

The reserved RAM can be mapped by virtualization software with
/dev/mem to create a 1:1 mapping between guest physical (bus) address
and host physical (bus) address. This will allow pci passthrough with
DMA for the guest using the ram with the 1:1 mapping. The only detail
to take care of is the ram marked "reserved RAM failed". The
virtualization software must create for the guest an e820 map that
only includes the "reserved RAM" regions but if the guest touches
memory with guest physical address in the "reserved RAM failed" ranges
(linux guest will do that even if the ram isn't present in the e820
map), it should provide that as ram and map it with a non linear
mapping. This should allow any linux kernel to run fine and hopefully
any other OS too.

svm ~ # cat /proc/iomem |head -n 20
-0fff : reserved RAM failed
1000-5fff : reserved RAM
6000-7fff : reserved RAM failed
8000-0009efff : reserved RAM
0009f000-0009 : reserved
000cd600-000c : pnp 00:0d
000f-000f : reserved
0010-0fff : reserved RAM
1000-3ded : System RAM
  1000-10329ab2 : Kernel code
  10329ab3-104933e7 : Kernel data
  104f5000-10558e67 : Kernel bss
3dee-3dee2fff : ACPI Non-volatile Storage
3dee3000-3dee : ACPI Tables
3def-3def : reserved
3dff-3ffe : pnp 00:0d
e000-efff : reserved
fa00-fbff : PCI Bus #01
  fa00-fbff : :01:05.0
fda0-fdbf : PCI Bus #01
svm ~ # hexdump /dev/mem | grep -C2 '   '
7e0        
*
0001000        
*
0006000 a5a5 a5a5 8ec8 8ed8 8ec0 66d0 06c7 
--
*
0007ff0     3063 1000  
0008000        
*
009f000 0002       
--
00fffe0 6000 3c03 45e7 0184 0500 0082 01c0 0223
000 5bea 00e0 31f0 2f32 3931 302f 0037 12fc
010        
*
1000 8d48 f92d  48ff ed81  1000 8948
^C
svm ~ #

Signed-off-by: From: Andrea Arcangeli aarca...@redhat.com
---

This is a port to current linux-2.6.git of the previous reserved-ram
patch. Let me know if there's a chance to get this acked and
included. Anything that isn't at compile time would require much
bigger changes just to parse the command line at 16bit realmode time
to know where to relocate the kernel dynamically. Because 1:1 is a
corner case feature required only by some users, this is the minimal
intrusive approach. This also has some limits as it can't reserve more
than 1g, and with a few more changes 2g but this is ok for a long time
as the virtualized 1:1 guest doesn't need to be huge, just a desktop.

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1276,8 +1276,36 @@ config CRASH_DUMP
  (CONFIG_RELOCATABLE=y).
  For more details see Documentation/kdump/kdump.txt
 
+config RESERVE_PHYSICAL_START
+   bool "Reserve all RAM below PHYSICAL_START (EXPERIMENTAL)"
+   depends on !RELOCATABLE && X86_64
+   help
+ This makes the kernel use only RAM above __PHYSICAL_START.
+ All memory below __PHYSICAL_START will be left unused and
+ marked as "reserved RAM" in /proc/iomem. The few special
+ pages that can't be relocated at addresses above
+ __PHYSICAL_START and that can't be guaranteed to be unused
+ by the running kernel will be marked "reserved RAM failed"
+ in /proc/iomem. Those may or may not be used by the kernel
+ (for example SMP trampoline pages would only be used if
+ CPU hotplug is enabled).
+
+ The reserved RAM can be mapped by virtualization software
+  

Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 09:39:06AM -0500, Anthony Liguori wrote:
 Michael S. Tsirkin wrote:
 Add optional MSI-X support: use a vector per virtqueue with
 fallback to a common vector and finally to regular interrupt.
 Teach all drivers to use it.

 I added 2 new virtio operations: request_vqs/free_vqs because MSI
 needs to know the total number of vectors upfront.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
   

 Do you have userspace patches for testing?

 Regards,

 Anthony Liguori

Not ready yet, sorry.

-- 
MST
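
The fallback order described in the quoted cover letter (a vector per virtqueue, then one common vector, then a regular interrupt) can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum, the function name `pick_irq_mode`, and the idea that the choice depends solely on two counts are hypothetical, and the sketch ignores any vector a transport might reserve for configuration changes.

```c
/* Hypothetical interrupt modes, named for illustration only. */
enum irq_mode {
	IRQ_MODE_PER_VQ,   /* one MSI-X vector per virtqueue */
	IRQ_MODE_SHARED,   /* one MSI-X vector shared by all virtqueues */
	IRQ_MODE_LEGACY    /* regular interrupt, no MSI-X */
};

/* Pick the best mode given how many virtqueues the driver uses and
 * how many vectors could actually be allocated. */
enum irq_mode pick_irq_mode(unsigned int nvqs, unsigned int nvectors)
{
	if (nvqs > 0 && nvectors >= nvqs)
		return IRQ_MODE_PER_VQ;
	if (nvectors >= 1)
		return IRQ_MODE_SHARED;
	return IRQ_MODE_LEGACY;
}
```

Knowing the total up front matters because the first branch can only be evaluated once the number of virtqueues is known, which is the motivation given for request_vqs.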


Re: bad virtio disk performance

2009-04-27 Thread Lucas Nussbaum
On 27/04/09 at 13:36 -0400, john cooper wrote:
 Lucas Nussbaum wrote:
 Hi,

 I'm experiencing bad disk I/O performance using virtio disks.

 I'm using Linux 2.6.29 (host & guest), kvm 84 userspace.
 On the host, and in a non-virtio guest, I get ~120 MB/s when writing
 with dd (the disks are fast RAID0 SAS disks).

 Could you provide detail of the exact type and size
 of i/o load you were creating with dd?

I tried with various block sizes. An example invocation would be:
dd if=/dev/zero of=foo bs=4096 count=262144 conv=fsync (126 MB/s without
virtio, 32 MB/s with virtio).

 Also the full qemu cmd line invocation in both
 cases would be useful.

non-virtio:
kvm -drive file=/tmp/debian-amd64.img,if=scsi,cache=writethrough -net
nic,macaddr=00:16:3e:5a:28:1,model=e1000 -net tap -nographic -kernel
/boot/vmlinuz-2.6.29 -initrd /boot/initrd.img-2.6.29 -append
root=/dev/sda1 ro console=tty0 console=ttyS0,9600,8n1

virtio:
kvm -drive file=/tmp/debian-amd64.img,if=virtio,cache=writethrough -net
nic,macaddr=00:16:3e:5a:28:1,model=e1000 -net tap -nographic -kernel
/boot/vmlinuz-2.6.29 -initrd /boot/initrd.img-2.6.29 -append
root=/dev/vda1 ro console=tty0 console=ttyS0,9600,8n1


 In a guest with a virtio disk, I get at most ~32 MB/s.

 Which non-virtio interface was used for the
 comparison?

if=ide
I got the same performance with if=scsi

 The rest of the setup is the same. For reference, I'm running kvm -drive
 file=/tmp/debian-amd64.img,if=virtio.

 Is such performance expected? What should I check?

 Not expected, something is awry.

 blktrace(8) run on the host will shed some light
 on the type of i/o requests issued by qemu in both
 cases.

Ah, I found something interesting. btrace summary after writing a 1 GB
file:
--- without virtio:
Total (8,5):
 Reads Queued:           0,       0KiB   Writes Queued:      272259,     1089MiB
 Read Dispatches:        0,       0KiB   Write Dispatches:     9769,     1089MiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,       0KiB   Writes Completed:     9769,     1089MiB
 Read Merges:            0,       0KiB   Write Merges:       262490,     1049MiB
 IO unplugs:         45973               Timer unplugs:          30
--- with virtio:
Total (8,5):
 Reads Queued:           1,       4KiB   Writes Queued:      430734,     1776MiB
 Read Dispatches:        1,       4KiB   Write Dispatches:   196143,     1776MiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        1,       4KiB   Writes Completed:   196143,     1776MiB
 Read Merges:            0,       0KiB   Write Merges:       234578,   938488KiB
 IO unplugs:        301311               Timer unplugs:          25
(I re-ran the test twice, got similar results)

So, apparently, with virtio, there's a lot more data being written to
disk. The underlying filesystem is ext3, and is mounted as /tmp. It only
contains the VM image file. Another difference is that, with virtio, the
data was shared equally over all 4 CPUs, while without virtio, CPU0 and
CPU1 did all the work.
In the virtio log, I also see a (null) process doing a lot of writes.
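
To put numbers on the two summaries, the average size of a dispatched write and the ratio of data written can be computed directly from the figures above. A small sketch in C, with hypothetical helper names; the inputs are the "Write Dispatches" counts and totals from the btrace output:

```c
/* Average size of one dispatched write, in KiB. */
double avg_dispatch_kib(double total_mib, double dispatches)
{
	return total_mib * 1024.0 / dispatches;
}

/* Ratio of data written with virtio to data written without it. */
double write_ratio(double virtio_mib, double baseline_mib)
{
	return virtio_mib / baseline_mib;
}
```

With the values reported here, avg_dispatch_kib(1089, 9769) is roughly 114 KiB per request without virtio, avg_dispatch_kib(1776, 196143) is roughly 9.3 KiB with virtio, and write_ratio(1776, 1089) is about 1.63. So the virtio run issues far more, far smaller requests and also writes about 60% more data for the same 1 GB file.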

I uploaded the logs to http://blop.info/bazaar/virtio/, if you want to
take a look.

Thank you,

- Lucas


[KVM PATCH v3 1/2] eventfd: export fget and signal interfaces for module use

2009-04-27 Thread Gregory Haskins
We will use this later in the series

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 fs/eventfd.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 2a701d5..faf9e60 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -16,6 +16,7 @@
 #include <linux/anon_inodes.h>
 #include <linux/eventfd.h>
 #include <linux/syscalls.h>
+#include <linux/module.h>
 
 struct eventfd_ctx {
wait_queue_head_t wqh;
@@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n)
 
return n;
 }
+EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
@@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd)
 
return file;
 }
+EXPORT_SYMBOL_GPL(eventfd_fget);
 
 SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
 {
@@ -228,6 +231,7 @@ SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags)
kfree(ctx);
return fd;
 }
+EXPORT_SYMBOL_GPL(sys_eventfd2);
 
 SYSCALL_DEFINE1(eventfd, unsigned int, count)
 {



[KVM PATCH v3 0/2] irqfd

2009-04-27 Thread Gregory Haskins
(Applies to kvm.git 41b76d8d0487c26d6d4d3fe53c1ff59b3236f096)

This series implements a mechanism called irqfd.  It lets you create
an eventfd based file-descriptor to inject interrupts to a kvm guest.  We
associate one gsi per fd for fine-grained routing.
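
For readers unfamiliar with eventfd, the userspace side of signaling is just an 8-byte write that adds to a kernel-held counter. The sketch below shows only plain eventfd(2) semantics on Linux; it does not call KVM_IRQFD, and the helper names `event_signal` and `event_drain` are hypothetical. In this series the fd would instead come from kvm_irqfd(), and the kernel would react to the signal by injecting the GSI.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add 1 to the eventfd counter; returns 0 on success, -1 on error. */
int event_signal(int fd)
{
	uint64_t one = 1;

	return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Read and reset the counter; returns the number of signals seen,
 * or 0 if none are pending (non-blocking fd) or the read failed. */
uint64_t event_drain(int fd)
{
	uint64_t count = 0;

	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

A caller would create the descriptor with eventfd(0, EFD_NONBLOCK), call event_signal() once per event, and a consumer would see the accumulated count from event_drain().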

[ Changelog:

   v3:
*) The kernel now allocates the eventfd (need to export sys_eventfd2)
*) Added a flags field for future expansion to kvm_irqfd()
*) We properly toggle the irq level 1+0.
*) We re-use the USERSPACE_SRC_ID instead of creating our own
*) Properly check for failures establishing a poll-table with eventfd
*) Fixed fd/file leaks on failure
*) Rebased to latest kvm.git::41b76d8d04

   v2:
*) Dropped notifier_chain based callbacks in favor of
   wait_queue_t::func and file::poll based callbacks (Thanks to
   Davide for the suggestion)

   v1:
*) Initial release



We do not have a user of this interface in this series, though note that a
future version of virtual-bus (v4 and above) will be based on this.

The first patch will require mainline buy-in, particularly from Davide
(cc'd).  The last patch is kvm specific.

qemu-kvm.git patch to follow.

-Greg

---

Gregory Haskins (2):
  kvm: add support for irqfd via eventfd-notification interface
  eventfd: export fget and signal interfaces for module use


 arch/x86/kvm/Makefile|2 -
 arch/x86/kvm/x86.c   |1 
 fs/eventfd.c |4 +
 include/linux/kvm.h  |7 ++
 include/linux/kvm_host.h |4 +
 virt/kvm/irqfd.c |  158 ++
 virt/kvm/kvm_main.c  |   11 +++
 7 files changed, 186 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/irqfd.c

-- 
Signature


[KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-04-27 Thread Gregory Haskins
This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 arch/x86/kvm/Makefile|2 -
 arch/x86/kvm/x86.c   |1 
 include/linux/kvm.h  |7 ++
 include/linux/kvm_host.h |4 +
 virt/kvm/irqfd.c |  158 ++
 virt/kvm/kvm_main.c  |   11 +++
 6 files changed, 182 insertions(+), 1 deletions(-)
 create mode 100644 virt/kvm/irqfd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b43c4ef..d5fff51 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -3,7 +3,7 @@
 #
 
 common-objs = $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
-coalesced_mmio.o irq_comm.o)
+coalesced_mmio.o irq_comm.o irqfd.o)
 ifeq ($(CONFIG_KVM_TRACE),y)
 common-objs += $(addprefix ../../../virt/kvm/, kvm_trace.o)
 endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab1fdac..0773433 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1027,6 +1027,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_REINJECT_CONTROL:
case KVM_CAP_IRQ_INJECT_STATUS:
case KVM_CAP_ASSIGN_DEV_IRQ:
+   case KVM_CAP_IRQFD:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 3db5d8d..0e594f2 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -415,6 +415,7 @@ struct kvm_trace_rec {
 #define KVM_CAP_ASSIGN_DEV_IRQ 29
 /* Another bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_JOIN_MEMORY_REGIONS_WORKS 30
+#define KVM_CAP_IRQFD 31
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -454,6 +455,11 @@ struct kvm_irq_routing {
 
 #endif
 
+struct kvm_irqfd {
+   __u32 gsi;
+   __u32 flags;
+};
+
 /*
  * ioctls for VM fds
  */
@@ -498,6 +504,7 @@ struct kvm_irq_routing {
 #define KVM_ASSIGN_SET_MSIX_ENTRY \
_IOW(KVMIO, 0x74, struct kvm_assigned_msix_entry)
 #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
+#define KVM_IRQFD   _IOW(KVMIO, 0x76, struct kvm_irqfd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 095ebb6..6a8d1c1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -134,6 +134,7 @@ struct kvm {
struct list_head vm_list;
struct kvm_io_bus mmio_bus;
struct kvm_io_bus pio_bus;
+   struct list_head irqfds;
struct kvm_vm_stat stat;
struct kvm_arch arch;
atomic_t users_count;
@@ -524,4 +525,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 
 #endif
 
+int kvm_irqfd(struct kvm *kvm, int gsi, int flags);
+void kvm_irqfd_release(struct kvm *kvm);
+
 #endif
diff --git a/virt/kvm/irqfd.c b/virt/kvm/irqfd.c
new file mode 100644
index 000..4fd2bea
--- /dev/null
+++ b/virt/kvm/irqfd.c
@@ -0,0 +1,158 @@
+/*
+ * irqfd: Allows an eventfd to be used to inject an interrupt to the guest
+ *
+ * Credit goes to Avi Kivity for the original idea.
+ *
+ * Copyright 2009 Novell.  All Rights Reserved.
+ *
+ * Author:
+ * Gregory Haskins ghask...@novell.com
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/eventfd.h>
+#include <linux/workqueue.h>
+#include <linux/syscalls.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
+#include <linux/file.h>
+#include <linux/list.h>
+
+struct _irqfd {
+   struct kvm               *kvm;
+   int                       gsi;
+   struct file              *file;
+   struct list_head          list;
+   poll_table                pt;
+   wait_queue_head_t        *wqh;
+   wait_queue_t              wait;
+   struct work_struct        work;
+};
+
+static void
+irqfd_inject(struct work_struct *work)
+{
+   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
+   struct kvm *kvm = irqfd->kvm;
+
+   mutex_lock(&kvm->lock);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+   mutex_unlock(&kvm->lock);
+}
+
+static int
+irqfd_wakeup(wait_queue_t *wait, unsigned mode, int 

[PATCH v3] qemu-kvm: add irqfd support

2009-04-27 Thread Gregory Haskins
(Applies to qemu-kvm.git 8f7a30dbc40a1d4c09275566f9ed9647ed1ee50f)

irqfd lets you create an eventfd based file-descriptor to inject interrupts
to a kvm guest.  We associate one gsi per fd for fine-grained routing.

Signed-off-by: Gregory Haskins ghask...@novell.com
---

 kvm/libkvm/libkvm.c |   18 ++
 kvm/libkvm/libkvm.h |   14 ++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index 0610e3f..dc9a951 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -1440,3 +1440,21 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
 return ret;
 }
 #endif
+
+int kvm_irqfd(kvm_context_t kvm, int gsi, int flags)
+{
+   int r;
+   struct kvm_irqfd data = {
+   .gsi   = gsi,
+   .flags = flags,
+   };
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IRQFD))
+   return -ENOENT;
+
+   r = ioctl(kvm->vm_fd, KVM_IRQFD, &data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}  
+
diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index ce6f054..89a8893 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -856,6 +856,20 @@ int kvm_commit_irq_routes(kvm_context_t kvm);
  */
 int kvm_get_irq_route_gsi(kvm_context_t kvm);
 
+/*!
+ * \brief Create a file descriptor for injecting interrupts
+ *
+ * Creates an eventfd based file-descriptor that maps to a specific GSI
+ * in the guest.  eventfd compliant signaling (write() from userspace, or
+ * eventfd_signal() from kernelspace) will cause the GSI to inject
+ * itself into the guest at the next available window.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param gsi GSI to assign to this fd
+ * \param flags reserved, must be zero
+ */
+int kvm_irqfd(kvm_context_t kvm, int gsi, int flags);
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(kvm_context_t kvm,
   struct kvm_assigned_msix_nr *msix_nr);



[patch 1/4] KVM: MMU: protect kvm_mmu_change_mmu_pages with mmu_lock

2009-04-27 Thread mtosatti
kvm_handle_hva, called by MMU notifiers, manipulates mmu data only with
the protection of mmu_lock.

Update kvm_mmu_change_mmu_pages callers to take mmu_lock, thus protecting
against kvm_handle_hva.

CC: Andrea Arcangeli aarca...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
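
The change in this patch is a move of lock ownership: the helper stops taking mmu_lock itself and each caller takes it around the whole update. A userspace sketch of that pattern, assuming a pthread mutex in place of the kernel spinlock and hypothetical names (`mmu_state`, `change_pages_locked`, `set_nr_pages`):

```c
#include <pthread.h>

/* Hypothetical stand-in for the MMU state guarded by mmu_lock. */
struct mmu_state {
	pthread_mutex_t lock;
	unsigned int n_pages;
};

/* Callee: assumes the caller already holds s->lock. */
void change_pages_locked(struct mmu_state *s, unsigned int n)
{
	s->n_pages = n;
}

/* Caller: takes the lock around the whole update, mirroring what the
 * patch does in kvm_vm_ioctl_set_nr_mmu_pages(). */
void set_nr_pages(struct mmu_state *s, unsigned int n)
{
	pthread_mutex_lock(&s->lock);
	change_pages_locked(s, n);
	pthread_mutex_unlock(&s->lock);
}
```

Pushing the lock out to the callers lets one critical section cover several updates, as in the kvm_arch_set_memory_region hunk, where both kvm_mmu_change_mmu_pages and kvm_mmu_slot_remove_write_access run under a single lock hold.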

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -2732,7 +2732,6 @@ void kvm_mmu_slot_remove_write_access(st
 {
    struct kvm_mmu_page *sp;
 
-   spin_lock(&kvm->mmu_lock);
    list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
        int i;
        u64 *pt;
@@ -2747,7 +2746,6 @@ void kvm_mmu_slot_remove_write_access(st
                    pt[i] &= ~PT_WRITABLE_MASK;
    }
    kvm_flush_remote_tlbs(kvm);
-   spin_unlock(&kvm->mmu_lock);
 }
 
 void kvm_mmu_zap_all(struct kvm *kvm)
Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -1647,10 +1647,12 @@ static int kvm_vm_ioctl_set_nr_mmu_pages
        return -EINVAL;
 
    down_write(&kvm->slots_lock);
+   spin_lock(&kvm->mmu_lock);
 
    kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages);
    kvm->arch.n_requested_mmu_pages = kvm_nr_mmu_pages;
 
+   spin_unlock(&kvm->mmu_lock);
    up_write(&kvm->slots_lock);
    return 0;
 }
@@ -1826,7 +1828,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kv
 
    /* If nothing is dirty, don't bother messing with page tables. */
    if (is_dirty) {
+       spin_lock(&kvm->mmu_lock);
        kvm_mmu_slot_remove_write_access(kvm, log->slot);
+       spin_unlock(&kvm->mmu_lock);
        kvm_flush_remote_tlbs(kvm);
        memslot = &kvm->memslots[log->slot];
        n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
@@ -4510,12 +4514,14 @@ int kvm_arch_set_memory_region(struct kv
}
}
 
+   spin_lock(&kvm->mmu_lock);
    if (!kvm->arch.n_requested_mmu_pages) {
        unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
        kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
    }
 
    kvm_mmu_slot_remove_write_access(kvm, mem->slot);
+   spin_unlock(&kvm->mmu_lock);
    kvm_flush_remote_tlbs(kvm);
 
    return 0;




[patch 4/4] KVM: x86: disallow changing a slots size

2009-04-27 Thread mtosatti
Support for shrinking aliases complicates kernel code unnecessarily,
while userspace can do the same with two operations, delete an alias,
and create a new alias.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
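
The rule this patch enforces can be stated as a pure function: creating an alias in an empty slot and deleting an existing alias stay legal, while any other size change is rejected. A minimal sketch, assuming a hypothetical helper name `check_alias_resize` and page counts passed in directly:

```c
#include <errno.h>

/* Returns 0 if the request is allowed, -EINVAL if it would resize an
 * existing alias. cur_npages == 0 means the slot is unused;
 * new_npages == 0 means the alias is being deleted. */
int check_alias_resize(unsigned long cur_npages, unsigned long new_npages)
{
	if (new_npages && cur_npages && new_npages != cur_npages)
		return -EINVAL;
	return 0;
}
```

Under this rule userspace shrinks an alias in two steps, as the description says: first a request with size 0 to delete it, then a request with the new size.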

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -1707,6 +1707,7 @@ static int kvm_vm_ioctl_set_memory_alias
 {
int r, n;
struct kvm_mem_alias *p;
+   unsigned long npages;
 
r = -EINVAL;
/* General sanity checks */
@@ -1728,9 +1729,12 @@ static int kvm_vm_ioctl_set_memory_alias
 
    p = &kvm->arch.aliases[alias->slot];
 
-   /* FIXME: either disallow shrinking alias slots or disable
-    * size changes as done with memslots
-    */
+   /* Disallow changing an alias slot's size. */
+   npages = alias->memory_size >> PAGE_SHIFT;
+   r = -EINVAL;
+   if (npages && p->npages && npages != p->npages)
+       goto out_unlock;
+
    if (!alias->memory_size) {
        r = -EBUSY;
        if (kvm_root_gfn_in_range(kvm, p->base_gfn,




[patch 2/4] KVM: take mmu_lock when updating a deleted slot

2009-04-27 Thread mtosatti
kvm_handle_hva relies on mmu_lock protection to safely access 
the memslot structures.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1199,8 +1199,10 @@ int __kvm_set_memory_region(struct kvm *
 
    kvm_free_physmem_slot(&old, npages ? &new : NULL);
    /* Slot deletion case: we have to update the current slot */
+   spin_lock(&kvm->mmu_lock);
    if (!npages)
        *memslot = old;
+   spin_unlock(&kvm->mmu_lock);
 #ifdef CONFIG_DMAR
/* map the pages in iommu page table */
r = kvm_iommu_map_pages(kvm, base_gfn, npages);




[patch 0/4] set_memory_region locking fixes / vcpu->arch.cr3 + removal of memslots

2009-04-27 Thread mtosatti




[patch 3/4] KVM: introduce kvm_arch_can_free_memslot, disallow slot deletion if cached cr3

2009-04-27 Thread mtosatti
Disallow the deletion of memory slots (and aliases, for x86 case), if a
vcpu contains a cr3 that points to such slot/alias.

This complements commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
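
The core test behind this patch is whether the page frame that a vcpu's cr3 points at falls inside the slot being removed. A standalone sketch of that check, assuming 4 KiB pages and hypothetical names (`SKETCH_PAGE_SHIFT`, `root_gfn_in_range`); it leaves out the per-vcpu loop and the unalias step of the real function:

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages, as on x86 */

/* Returns 1 if the page frame addressed by cr3 lies in the inclusive
 * range [base_gfn, end_gfn], 0 otherwise. */
int root_gfn_in_range(uint64_t cr3, uint64_t base_gfn, uint64_t end_gfn)
{
	uint64_t root_gfn = cr3 >> SKETCH_PAGE_SHIFT;

	return root_gfn >= base_gfn && root_gfn <= end_gfn;
}
```

Callers pass base_gfn + npages - 1 as the upper bound, which is why the comparison is inclusive on both ends.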

Index: kvm/arch/ia64/kvm/kvm-ia64.c
===
--- kvm.orig/arch/ia64/kvm/kvm-ia64.c
+++ kvm/arch/ia64/kvm/kvm-ia64.c
@@ -1633,6 +1633,11 @@ void kvm_arch_flush_shadow(struct kvm *k
kvm_flush_remote_tlbs(kvm);
 }
 
+int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+   return 1;
+}
+
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
 {
Index: kvm/arch/powerpc/kvm/powerpc.c
===
--- kvm.orig/arch/powerpc/kvm/powerpc.c
+++ kvm/arch/powerpc/kvm/powerpc.c
@@ -176,6 +176,11 @@ void kvm_arch_flush_shadow(struct kvm *k
 {
 }
 
+int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+   return 1;
+}
+
 struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 {
struct kvm_vcpu *vcpu;
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -691,6 +691,11 @@ void kvm_arch_flush_shadow(struct kvm *k
 {
 }
 
+int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+   return 1;
+}
+
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
 {
return gfn;
Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -1676,6 +1676,27 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t
return gfn;
 }
 
+static int kvm_root_gfn_in_range(struct kvm *kvm, gfn_t base_gfn,
+                                 gfn_t end_gfn, bool unalias)
+{
+   struct kvm_vcpu *vcpu;
+   gfn_t root_gfn;
+   int i;
+
+   for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+       vcpu = kvm->vcpus[i];
+       if (!vcpu)
+           continue;
+       root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+       if (unalias)
+           root_gfn = unalias_gfn(kvm, root_gfn);
+       if (root_gfn >= base_gfn && root_gfn <= end_gfn)
+           return 1;
+   }
+
+   return 0;
+}
+
 /*
  * Set a new alias region.  Aliases map a portion of physical memory into
  * another portion.  This is useful for memory windows, for example the PC
@@ -1706,6 +1727,19 @@ static int kvm_vm_ioctl_set_memory_alias
    spin_lock(&kvm->mmu_lock);
 
    p = &kvm->arch.aliases[alias->slot];
+
+   /* FIXME: either disallow shrinking alias slots or disable
+    * size changes as done with memslots
+    */
+   if (!alias->memory_size) {
+       r = -EBUSY;
+       if (kvm_root_gfn_in_range(kvm, p->base_gfn,
+                                 p->base_gfn + p->npages - 1,
+                                 false))
+           goto out_unlock;
+   }
+
+
    p->base_gfn = alias->guest_phys_addr >> PAGE_SHIFT;
    p->npages = alias->memory_size >> PAGE_SHIFT;
    p->target_gfn = alias->target_phys_addr >> PAGE_SHIFT;
@@ -1722,6 +1756,9 @@ static int kvm_vm_ioctl_set_memory_alias
 
return 0;
 
+out_unlock:
+   spin_unlock(&kvm->mmu_lock);
+   up_write(&kvm->slots_lock);
 out:
return r;
 }
@@ -4532,6 +4569,15 @@ void kvm_arch_flush_shadow(struct kvm *k
kvm_mmu_zap_all(kvm);
 }
 
+int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+   int ret;
+
+   ret = kvm_root_gfn_in_range(kvm, slot->base_gfn,
+                               slot->base_gfn + slot->npages - 1, true);
+   return !ret;
+}
+
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
    return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
Index: kvm/include/linux/kvm_host.h
===
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -200,6 +200,7 @@ int kvm_arch_set_memory_region(struct kv
struct kvm_memory_slot old,
int user_alloc);
 void kvm_arch_flush_shadow(struct kvm *kvm);
+int kvm_arch_can_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1179,8 +1179,13 @@ int __kvm_set_memory_region(struct kvm *
}
 #endif /* not defined CONFIG_S390 */
 
-   if (!npages)
+   if (!npages) {

[patch 1/4] qemu: external module: smp_send_reschedule compat

2009-04-27 Thread mtosatti
smp_send_reschedule was exported (via smp_ops) in v2.6.24.
For kernels < 2.6.24, create a compat function which schedules the IPI
to keventd context in case interrupts are disabled.
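
The deferral described above can be sketched in userspace C. This is only an
illustration under stated assumptions: kick_ctx, kick_work, send_kick,
queue_kick and run_pending are hypothetical names, a linked list stands in
for the keventd workqueue, and a plain flag stands in for irqs_disabled().
Unlike kvm_queue_smp_call_function as posted, which does not appear to store
cpu in the work item or check the kmalloc() result, the sketch does both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical work item: remembers which cpu should be kicked later. */
struct kick_work {
	int cpu;
	struct kick_work *next;
};

/* Hypothetical context replacing kernel state for the illustration. */
struct kick_ctx {
	bool irqs_disabled;		/* stand-in for irqs_disabled() */
	struct kick_work *pending;	/* stand-in for the keventd queue */
	int direct_kicks;		/* kicks sent immediately */
	int deferred_kicks;		/* kicks sent from worker context */
};

/* Queue a kick for later; returns 0 on success, -1 if allocation fails. */
static int queue_kick(struct kick_ctx *ctx, int cpu)
{
	struct kick_work *w = malloc(sizeof(*w));

	if (!w)
		return -1;
	w->cpu = cpu;			/* record the target before queueing */
	w->next = ctx->pending;
	ctx->pending = w;
	return 0;
}

/* Send directly when allowed, otherwise defer to the worker. */
static int send_kick(struct kick_ctx *ctx, int cpu)
{
	if (ctx->irqs_disabled)
		return queue_kick(ctx, cpu);
	ctx->direct_kicks++;
	return 0;
}

/* Worker side: drain the queue; returns the last cpu kicked, or -1. */
static int run_pending(struct kick_ctx *ctx)
{
	int last_cpu = -1;

	while (ctx->pending) {
		struct kick_work *w = ctx->pending;

		ctx->pending = w->next;
		assert(w->cpu >= 0);
		last_cpu = w->cpu;
		ctx->deferred_kicks++;
		free(w);
	}
	return last_cpu;
}
```

The design point is that the caller never blocks: with interrupts disabled it
only allocates and queues, and the actual cross-cpu call happens later in a
context where it is permitted.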

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/kernel/external-module-compat-comm.h b/kvm/kernel/external-module-compat-comm.h
index c955927..852a8c1 100644
--- a/kvm/kernel/external-module-compat-comm.h
+++ b/kvm/kernel/external-module-compat-comm.h
@@ -116,6 +116,10 @@ int kvm_smp_call_function_single(int cpu, void (*func)(void *info),
 
 #endif
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
+#define nr_cpu_ids NR_CPUS
+#endif
+
 #include <linux/miscdevice.h>
 #ifndef KVM_MINOR
 #define KVM_MINOR 232
@@ -220,6 +224,10 @@ int kvm_smp_call_function_mask(cpumask_t mask, void (*func) (void *info),
 
 #define smp_call_function_mask kvm_smp_call_function_mask
 
+void kvm_smp_send_reschedule(int cpu);
+
+#define smp_send_reschedule kvm_smp_send_reschedule
+
 #endif
 
 /* empty_zero_page isn't exported in all kernels */
diff --git a/kvm/kernel/external-module-compat.c b/kvm/kernel/external-module-compat.c
index 0d858be..ca269cc 100644
--- a/kvm/kernel/external-module-compat.c
+++ b/kvm/kernel/external-module-compat.c
@@ -221,6 +221,58 @@ out:
 	return 0;
 }
 
+#include <linux/workqueue.h>
+
+static void vcpu_kick_intr(void *info)
+{
+}
+
+struct kvm_kick {
+	int cpu;
+	struct work_struct work;
+};
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,20)
+static void kvm_do_smp_call_function(void *data)
+{
+	int me;
+	struct kvm_kick *kvm_kick = data;
+#else
+static void kvm_do_smp_call_function(struct work_struct *work)
+{
+	int me;
+	struct kvm_kick *kvm_kick = container_of(work, struct kvm_kick, work);
+#endif
+	me = get_cpu();
+
+	if (kvm_kick->cpu != me)
+		smp_call_function_single(kvm_kick->cpu, vcpu_kick_intr,
+					 NULL, 0);
+	kfree(kvm_kick);
+	put_cpu();
+}
+
+void kvm_queue_smp_call_function(int cpu)
+{
+	struct kvm_kick *kvm_kick = kmalloc(sizeof(struct kvm_kick), GFP_ATOMIC);
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,20)
+	INIT_WORK(&kvm_kick->work, kvm_do_smp_call_function, kvm_kick);
+#else
+	INIT_WORK(&kvm_kick->work, kvm_do_smp_call_function);
+#endif
+
+	schedule_work(&kvm_kick->work);
+}
+
+void kvm_smp_send_reschedule(int cpu)
+{
+	if (irqs_disabled()) {
+		kvm_queue_smp_call_function(cpu);
+		return;
+	}
+	smp_call_function_single(cpu, vcpu_kick_intr, NULL, 0);
+}
 #endif
 
 /* manually export hrtimer_init/start/cancel */
diff --git a/kvm/kernel/x86/hack-module.awk b/kvm/kernel/x86/hack-module.awk
index 260eeef..f4d14da 100644
--- a/kvm/kernel/x86/hack-module.awk
+++ b/kvm/kernel/x86/hack-module.awk
@@ -35,6 +35,8 @@ BEGIN { split("INIT_WORK desc_struct ldttss_desc64 desc_ptr " \
 
 { sub(/match->dev->msi_enabled/, "kvm_pcidev_msi_enabled(match->dev)") }
 
+{ sub(/smp_send_reschedule/, "kvm_smp_send_reschedule") }
+
 /^static void __vmx_load_host_state/ {
 vmx_load_host_state = 1
 }


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 3/4] KVM: use smp_send_reschedule in kvm_vcpu_kick

2009-04-27 Thread mtosatti
KVM uses a function call IPI to cause the exit of a guest running on a
physical cpu. For virtual interrupt notification there is no need to
wait for IPI receipt, or to execute any function.

This is exactly what the reschedule IPI does, without the overhead
of a function call IPI. So use it instead of smp_call_function_single in
kvm_vcpu_kick.

Also change the guest_mode variable to a bit in vcpu->requests, and
use that to collapse multiple IPIs that would be issued between the
first one and zeroing of guest mode.

This allows kvm_vcpu_kick to be called with interrupts disabled.
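
The IPI collapsing described above can be sketched in userspace with C11
atomics. This is a minimal illustration, not the kernel code: fake_vcpu,
REQ_KICK_BIT and the fake_* helpers are hypothetical names, atomic_fetch_or
stands in for test_and_set_bit, and a counter stands in for
smp_send_reschedule. The bit is clear only while the vcpu is (about to be) in
guest mode, so the first kicker sets it and sends the IPI, and later kickers
see it already set and send nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_KICK_BIT 0x1u	/* hypothetical stand-in for KVM_REQ_KICK */

struct fake_vcpu {
	atomic_uint requests;	/* stand-in for vcpu->requests */
	unsigned ipis_sent;	/* simulated IPIs; not atomic, demo is single-threaded */
};

/* Returns true if this call actually "sent" an IPI. */
static bool fake_vcpu_kick(struct fake_vcpu *v)
{
	unsigned old;

	assert(v != 0);
	/* test_and_set_bit analogue: only the first setter sees the bit clear. */
	old = atomic_fetch_or(&v->requests, REQ_KICK_BIT);
	if (old & REQ_KICK_BIT)
		return false;	/* already kicked, or not in guest mode */
	v->ipis_sent++;
	return true;
}

/* Guest entry clears the bit so that a later kick sends an IPI. */
static void fake_enter_guest(struct fake_vcpu *v)
{
	atomic_fetch_and(&v->requests, ~REQ_KICK_BIT);
}

/* Guest exit sets the bit so that further kicks are suppressed. */
static void fake_exit_guest(struct fake_vcpu *v)
{
	atomic_fetch_or(&v->requests, REQ_KICK_BIT);
}
```

Because the decision is a single atomic read-modify-write, the kicker needs
no lock and no separate guest_mode flag.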

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/ia64/kernel/irq_ia64.c
===
--- kvm.orig/arch/ia64/kernel/irq_ia64.c
+++ kvm/arch/ia64/kernel/irq_ia64.c
@@ -610,6 +610,9 @@ static struct irqaction ipi_irqaction = 
 	.name =		"IPI"
 };
 
+/*
+ * KVM uses this interrupt to force a cpu out of guest mode
+ */
 static struct irqaction resched_irqaction = {
.handler =  dummy_handler,
.flags =IRQF_DISABLED,
Index: kvm/arch/ia64/kvm/kvm-ia64.c
===
--- kvm.orig/arch/ia64/kvm/kvm-ia64.c
+++ kvm/arch/ia64/kvm/kvm-ia64.c
@@ -668,7 +668,7 @@ again:
 	host_ctx = kvm_get_host_context(vcpu);
 	guest_ctx = kvm_get_guest_context(vcpu);
 
-	vcpu->guest_mode = 1;
+	clear_bit(KVM_REQ_KICK, &vcpu->requests);
 
 	r = kvm_vcpu_pre_transition(vcpu);
 	if (r < 0)
@@ -685,7 +685,7 @@ again:
 	kvm_vcpu_post_transition(vcpu);
 
 	vcpu->arch.launched = 1;
-	vcpu->guest_mode = 0;
+	set_bit(KVM_REQ_KICK, &vcpu->requests);
 	local_irq_enable();
 
 	/*
@@ -1884,24 +1884,18 @@ void kvm_arch_hardware_unsetup(void)
 {
 }
 
-static void vcpu_kick_intr(void *info)
-{
-#ifdef DEBUG
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)info;
-	printk(KERN_DEBUG"vcpu_kick_intr %p \n", vcpu);
-#endif
-}
-
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
-	int ipi_pcpu = vcpu->cpu;
-	int cpu = get_cpu();
+	int me;
+	int cpu = vcpu->cpu;
 
 	if (waitqueue_active(&vcpu->wq))
 		wake_up_interruptible(&vcpu->wq);
 
-	if (vcpu->guest_mode && cpu != ipi_pcpu)
-		smp_call_function_single(ipi_pcpu, vcpu_kick_intr, vcpu, 0);
+	me = get_cpu();
+	if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu))
+		if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
+			smp_send_reschedule(cpu);
 	put_cpu();
 }
 
Index: kvm/arch/x86/kernel/smp.c
===
--- kvm.orig/arch/x86/kernel/smp.c
+++ kvm/arch/x86/kernel/smp.c
@@ -172,6 +172,9 @@ void smp_reschedule_interrupt(struct pt_
 {
ack_APIC_irq();
inc_irq_stat(irq_resched_count);
+   /*
+* KVM uses this interrupt to force a cpu out of guest mode
+*/
 }
 
 void smp_call_function_interrupt(struct pt_regs *regs)
Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -3197,6 +3197,9 @@ static int vcpu_enter_guest(struct kvm_v
 
 	local_irq_disable();
 
+	clear_bit(KVM_REQ_KICK, &vcpu->requests);
+	smp_mb__after_clear_bit();
+
 	if (vcpu->requests || need_resched() || signal_pending(current)) {
 		local_irq_enable();
 		preempt_enable();
@@ -3204,13 +3207,6 @@ static int vcpu_enter_guest(struct kvm_v
 		goto out;
 	}
 
-	vcpu->guest_mode = 1;
-	/*
-	 * Make sure that guest_mode assignment won't happen after
-	 * testing the pending IRQ vector bitmap.
-	 */
-	smp_wmb();
-
 	if (vcpu->arch.exception.pending)
 		__queue_exception(vcpu);
 	else if (irqchip_in_kernel(vcpu->kvm))
@@ -3252,7 +3248,7 @@ static int vcpu_enter_guest(struct kvm_v
 	set_debugreg(vcpu->arch.host_dr6, 6);
 	set_debugreg(vcpu->arch.host_dr7, 7);
 
-	vcpu->guest_mode = 0;
+	set_bit(KVM_REQ_KICK, &vcpu->requests);
 	local_irq_enable();
 
 	++vcpu->stat.exits;
@@ -4550,30 +4546,20 @@ int kvm_arch_vcpu_runnable(struct kvm_vc
 	       || vcpu->arch.nmi_pending;
 }
 
-static void vcpu_kick_intr(void *info)
-{
-#ifdef DEBUG
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)info;
-	printk(KERN_DEBUG "vcpu_kick_intr %p \n", vcpu);
-#endif
-}
-
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
-	int ipi_pcpu = vcpu->cpu;
-	int cpu;
+	int me;
+	int cpu = vcpu->cpu;
 
 	if (waitqueue_active(&vcpu->wq)) {
 		wake_up_interruptible(&vcpu->wq);
 		++vcpu->stat.halt_wakeup;
 	}
-	/*
-	 * We may be called synchronously with irqs disabled in guest mode,
-	 * So need not to call smp_call_function_single() in that case.
-	 */
-  

[patch 4/4] KVM: protect assigned dev workqueue, int handler and irq acker

2009-04-27 Thread mtosatti
kvm_assigned_dev_ack_irq is vulnerable to a race condition with the
interrupt handler function. It does:

	if (dev->host_irq_disabled) {
		enable_irq(dev->host_irq);
		dev->host_irq_disabled = false;
	}

If an interrupt triggers before the dev->host_irq_disabled assignment,
the handler will disable the interrupt and set dev->host_irq_disabled
to true.

On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to
false, and the next kvm_assigned_dev_ack_irq call will fail to reenable
the interrupt.
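
The interleaving above can be replayed deterministically in userspace. The
sketch below is an illustration only: fake_dev, fake_intr, racy_ack and stuck
are hypothetical names, line_enabled models the state that enable_irq() and
disable_irq_nosync() change, and the irq_in_window argument forces the
interrupt to fire between the enable and the flag assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two pieces of state involved in the race. */
struct fake_dev {
	bool line_enabled;	/* modelled state of the host irq line */
	bool host_irq_disabled;	/* flag tracked by the driver */
};

/* Interrupt handler analogue: disable the line and record that fact. */
static void fake_intr(struct fake_dev *d)
{
	d->line_enabled = false;	/* disable_irq_nosync() */
	d->host_irq_disabled = true;
}

/* Unlocked ack path; irq_in_window injects the interrupt mid-sequence. */
static void racy_ack(struct fake_dev *d, bool irq_in_window)
{
	assert(d != 0);
	if (d->host_irq_disabled) {
		d->line_enabled = true;		/* enable_irq() */
		if (irq_in_window)
			fake_intr(d);		/* handler runs here */
		d->host_irq_disabled = false;	/* overwrites the handler's update */
	}
}

/* The line is off but the flag claims otherwise: no later ack reenables it. */
static bool stuck(const struct fake_dev *d)
{
	return !d->line_enabled && !d->host_irq_disabled;
}
```

Once stuck() holds, every further racy_ack() call skips the if body, which is
the lost reenable the description refers to. Holding one lock around both the
handler body and the ack sequence removes the window.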

Other than that, having the interrupt handler and work handlers run in
parallel sounds like asking for trouble (I could not spot any obvious
problem, but better not to have to; it's fragile).

CC: sheng.y...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/include/linux/kvm_host.h
===
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -345,6 +345,7 @@ struct kvm_assigned_dev_kernel {
int flags;
struct pci_dev *dev;
struct kvm *kvm;
+   spinlock_t assigned_dev_lock;
 };
 
 struct kvm_irq_mask_notifier {
Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -42,6 +42,7 @@
 #include <linux/mman.h>
 #include <linux/swap.h>
 #include <linux/bitops.h>
+#include <linux/spinlock.h>
 
 #include <asm/processor.h>
 #include <asm/io.h>
@@ -130,6 +131,7 @@ static void kvm_assigned_dev_interrupt_w
 	 * finer-grained lock, update this
 	 */
 	mutex_lock(&kvm->lock);
+	spin_lock_irq(&assigned_dev->assigned_dev_lock);
 	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
 		struct kvm_guest_msix_entry *guest_entries =
 			assigned_dev->guest_msix_entries;
@@ -156,18 +158,21 @@ static void kvm_assigned_dev_interrupt_w
 		}
 	}
 
+	spin_unlock_irq(&assigned_dev->assigned_dev_lock);
 	mutex_unlock(&assigned_dev->kvm->lock);
 }
 
 static irqreturn_t kvm_assigned_dev_intr(int irq, void *dev_id)
 {
+	unsigned long flags;
 	struct kvm_assigned_dev_kernel *assigned_dev =
 		(struct kvm_assigned_dev_kernel *) dev_id;
 
+	spin_lock_irqsave(&assigned_dev->assigned_dev_lock, flags);
 	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
 		int index = find_index_from_host_irq(assigned_dev, irq);
 		if (index < 0)
-			return IRQ_HANDLED;
+			goto out;
 		assigned_dev->guest_msix_entries[index].flags |=
 			KVM_ASSIGNED_MSIX_PENDING;
 	}
@@ -177,6 +182,8 @@ static irqreturn_t kvm_assigned_dev_intr
 	disable_irq_nosync(irq);
 	assigned_dev->host_irq_disabled = true;
 
+out:
+	spin_unlock_irqrestore(&assigned_dev->assigned_dev_lock, flags);
 	return IRQ_HANDLED;
 }
 
@@ -184,6 +191,7 @@ static irqreturn_t kvm_assigned_dev_intr
 static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
 	struct kvm_assigned_dev_kernel *dev;
+	unsigned long flags;
 
 	if (kian->gsi == -1)
 		return;
@@ -196,10 +204,12 @@ static void kvm_assigned_dev_ack_irq(str
 	/* The guest irq may be shared so this ack may be
 	 * from another device.
 	 */
+	spin_lock_irqsave(&dev->assigned_dev_lock, flags);
 	if (dev->host_irq_disabled) {
 		enable_irq(dev->host_irq);
 		dev->host_irq_disabled = false;
 	}
+	spin_unlock_irqrestore(&dev->assigned_dev_lock, flags);
 }
 
 static void deassign_guest_irq(struct kvm *kvm,
@@ -615,6 +625,7 @@ static int kvm_vm_ioctl_assign_device(st
 	match->host_devfn = assigned_dev->devfn;
 	match->flags = assigned_dev->flags;
 	match->dev = dev;
+	spin_lock_init(&match->assigned_dev_lock);
 	match->irq_source_id = -1;
 	match->kvm = kvm;
 	match->ack_notifier.irq_acked = kvm_assigned_dev_ack_irq;




[patch 2/4] KVM: x86: wake up waitqueue before calling get_cpu()

2009-04-27 Thread mtosatti
From: Jan Blunck jblu...@suse.de
 
This moves the get_cpu() call down to be called after we wake up the
waiters. Therefore the waitqueue locks can safely be rt mutexes.
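
Why the ordering matters can be shown with a small userspace model. It is a
sketch under stated assumptions: fake_preempt_count, fake_get_cpu,
fake_put_cpu and fake_wake_up are hypothetical stand-ins, and the only rule
modelled is that a sleeping lock (such as an rt mutex) must not be taken
while preemption is disabled, which is the state get_cpu() leaves the caller
in until put_cpu().

```c
#include <assert.h>
#include <stdbool.h>

static int fake_preempt_count;	/* stand-in for the kernel preempt count */

/* get_cpu() analogue: disables preemption and returns the current cpu. */
static int fake_get_cpu(void)
{
	fake_preempt_count++;
	return 0;
}

/* put_cpu() analogue: reenables preemption. */
static void fake_put_cpu(void)
{
	assert(fake_preempt_count > 0);
	fake_preempt_count--;
}

/* Wakeup that takes a sleeping lock: legal only with preemption enabled. */
static bool fake_wake_up(void)
{
	return fake_preempt_count == 0;
}

/* Old ordering: get_cpu() first, so the wakeup runs non-preemptible. */
static bool kick_get_cpu_first(void)
{
	int cpu = fake_get_cpu();
	bool ok = fake_wake_up();

	(void)cpu;
	fake_put_cpu();
	return ok;
}

/* New ordering: wake the waiters first, then pin the cpu. */
static bool kick_wake_first(void)
{
	bool ok = fake_wake_up();
	int cpu = fake_get_cpu();

	(void)cpu;
	fake_put_cpu();
	return ok;
}
```

In this model only the second ordering performs the wakeup in a context where
a sleeping lock would be allowed.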
 
Signed-off-by: Jan Blunck jblu...@suse.de
Signed-off-by: Sven-Thorsten Dietrich s...@thebigcorporation.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -4544,7 +4544,7 @@ static void vcpu_kick_intr(void *info)
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
 	int ipi_pcpu = vcpu->cpu;
-	int cpu = get_cpu();
+	int cpu;
 
 	if (waitqueue_active(&vcpu->wq)) {
 		wake_up_interruptible(&vcpu->wq);
@@ -4554,6 +4554,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu
 	 * We may be called synchronously with irqs disabled in guest mode,
 	 * So need not to call smp_call_function_single() in that case.
 	 */
+	cpu = get_cpu();
 	if (vcpu->guest_mode && vcpu->cpu != cpu)
 		smp_call_function_single(ipi_pcpu, vcpu_kick_intr, vcpu, 0);
 	put_cpu();




boot problems with if=virtio

2009-04-27 Thread Andrew Theurer
I know there have been a couple other threads here about booting with 
if=virtio, but I think this might be a different problem, not sure:


I am using kvm.git (41b76d8d0487c26d6d4d3fe53c1ff59b3236f096)
and qemu-kvm.git (8f7a30dbc40a1d4c09275566f9ed9647ed1ee50f)
and linux 2.6.20-rc3

It appears to build fine.  I am trying to run the following command:

name=newcastle-xmailt01
dev1=/dev/disk/by-id/scsi-3600a0b8f1eb1069748d8c230
dev2=/dev/disk/by-id/scsi-3600a0b8f1eb106dc48f45432
macaddr=00:50:56:00:00:06
tap=tap6
cpus=1
mem=1024
/usr/local/bin/qemu-system-x86_64 -name $name\
   -drive file=$dev1,if=virtio,boot=on,cache=none\
   -drive file=$dev2,if=virtio,boot=off,cache=none\
   -m $mem  -net nic,model=virtio,vlan=0,macaddr=$macaddr\
   -net tap,vlan=0,ifname=$tap,script=/etc/qemu-ifup -vnc 127.0.0.1:6\
   -smp $cpus -daemonize



...and I get "Boot failed: could not read the boot disk"

This did work with the kvm-userspace.git (kvm-85rc6).  I can get this to 
work with a windows vm, using ide.  Was there a recent change to the 
-drive options that I am missing?


Thanks,

-Andrew



Re: [PATCH 1/2] KVM: Discard shadow_mt_mask

2009-04-27 Thread Marcelo Tosatti
On Mon, Apr 27, 2009 at 05:47:44PM +0800, Sheng Yang wrote:
> mt_mask is out of date, now it have only been used as a flag to indicate if TDP
> enabled. Get rid of it and use tdp_enabled instead.
> 
> Signed-off-by: Sheng Yang sh...@linux.intel.com
> ---
>  arch/x86/include/asm/kvm_host.h |2 +-
>  arch/x86/kvm/mmu.c  |8 +++-
>  arch/x86/kvm/vmx.c  |3 +--
>  arch/x86/kvm/x86.c  |2 +-
>  4 files changed, 6 insertions(+), 9 deletions(-)

ACK both patches.



Re: [PATCH] kvm-userspace: Make PC speaker emulation aware of in-kernel PIT

2009-04-27 Thread Marcelo Tosatti
On Thu, Apr 23, 2009 at 10:24:05PM +0200, Jan Kiszka wrote:
> When using the in-kernel PIT the speaker emulation has to synchronize
> the PIT state with KVM. Enhance the existing speaker sound device and
> allow it to take over port 0x61 by using KVM_CREATE_PIT_NOSPKR when
> available. This unbreaks -soundhw pcspk in KVM mode.

ACK both patches.



Re: [PATCH 5/5] add ksm kernel shared memory driver.

2009-04-27 Thread Andrew Morton
On Mon, 20 Apr 2009 04:36:06 +0300
Izik Eidus iei...@redhat.com wrote:

> Ksm is driver that allow merging identical pages between one or more
> applications in way unvisible to the application that use it.
> Pages that are merged are marked as readonly and are COWed when any
> application try to change them.

Breaks sparc64 and probably lots of other architectures:

mm/ksm.c: In function `try_to_merge_two_pages_alloc':
mm/ksm.c:697: error: `_PAGE_RW' undeclared (first use in this function)

There should be an official arch-independent way of manipulating
vma->vm_page_prot, but I'm not immediately finding it.

An alternative (and quite inferior) fix would be to disable ksm on
architectures which don't implement _PAGE_RW.  That's most of them.
