Re: [PATCH 0/11] RFC: PCI using capabilitities

2011-12-11 Thread Avi Kivity
On 12/08/2011 05:37 PM, Sasha Levin wrote:
 On Thu, 2011-12-08 at 20:52 +1030, Rusty Russell wrote:
  Here's the patch series I ended up with.  I haven't coded up the QEMU
  side yet, so no idea if the new driver works.
  
  Questions:
  (1) Do we win from separating ISR, NOTIFY and COMMON?
  (2) I used a u8 bar; should I use a bir and pack it instead?  BIR
  seems a little obscure (no one else in the kernel source seems to
  refer to it).

 I started implementing it for KVM tools, when I noticed a strange thing:
 my vq creation was failing because the driver was reading a value other
 than 0 from the address field of a new vq.

 I've added simple prints in the usermode code, and saw the following
 ordering:

 1. queue select vq 0
 2. queue read address (returns 0 - new vq)
 3. queue write address (good address of vq)
 4. queue read address (returns !=0, fails)
 5. queue select vq 1

 From that I understood that the ordering is wrong, the driver was trying
 to read address before selecting the correct vq.

 At that point, I've added simple prints to the driver. Initially it
 looked as follows:

   iowrite16(index, &vp_dev->common->queue_select);

   switch (ioread64(&vp_dev->common->queue_address)) {
   [...]
   };

 So I added prints before the iowrite16() and after the ioread64(), and
 saw that while the driver prints were ordered, the device ones weren't:

   [1.264052] before iowrite index=1
   kvmtool: net returning pfn (vq=0): 310706176
   kvmtool: queue selected: 1
   [1.264890] after ioread index=1

 Suspecting that something was wrong with ordering, I've added a print
 between the iowrite and the ioread, and it finally started working well.

 Which leads me to the question: Are MMIO vs MMIO reads/writes not
 ordered?

mmios are strictly ordered.

Perhaps your printfs are reordered by buffering?  Are they from
different threads?  Are you using coalesced mmio (which is still
strictly ordered, if used correctly)?
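
For reference, the select-then-access pattern under discussion can be
modelled in a few lines of plain C. This is only a sketch of the register
semantics, not the virtio driver or the kvmtool device model; every sk_*
name is hypothetical and the two-queue layout is an assumption made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NUM_QUEUES 2	/* assumption: two virtqueues, as in the log above */

/* Hypothetical model of the "common" region: one selector, per-queue state. */
struct sk_common_cfg {
	uint16_t queue_select;
	uint64_t queue_address[SK_NUM_QUEUES];
};

/* Device side of a write to queue_select. */
static void sk_write_select(struct sk_common_cfg *c, uint16_t index)
{
	assert(index < SK_NUM_QUEUES);
	c->queue_select = index;
}

/* Device side of a read of queue_address: always uses the current selector. */
static uint64_t sk_read_address(const struct sk_common_cfg *c)
{
	return c->queue_address[c->queue_select];
}

/* Device side of a write to queue_address. */
static void sk_write_address(struct sk_common_cfg *c, uint64_t addr)
{
	c->queue_address[c->queue_select] = addr;
}

/*
 * Driver-side sequence: select, check the queue is unused, then program it.
 * Returns 0 on success, -1 if the selected queue already has an address.
 */
static int sk_setup_vq(struct sk_common_cfg *c, uint16_t index, uint64_t addr)
{
	sk_write_select(c, index);
	if (sk_read_address(c) != 0)
		return -1;
	sk_write_address(c, addr);
	return 0;
}
```

With strictly ordered accesses, setting up queue 0 and then queue 1 succeeds
for both. If the read of queue_address were handled before the write to
queue_select, the read for queue 1 would return queue 0's address, which
matches the symptom in the log.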

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/11] RFC: PCI using capabilitities

2011-12-11 Thread Sasha Levin
On Sun, 2011-12-11 at 11:05 +0200, Avi Kivity wrote:
 mmios are strictly ordered.
 
 Perhaps your printfs are reordered by buffering?  Are they from
 different threads?  Are you using coalesced mmio (which is still
 strictly ordered, if used correctly)? 

I print the queue_selector and queue_address in the printfs. Even if the
printfs were reordered, they would still print the right data, which they
currently do not. It's the data in the printfs that matters, not their order.

Same vcpu thread with both accesses.

Not using coalesced mmio.

-- 

Sasha.



[PATCH v5 00/13] KVM/ARM Implementation

2011-12-11 Thread Christoffer Dall
The following series implements KVM support for ARM processors,
specifically on the Cortex-A15 platform.

The patch series applies to commit 0ec4044a029b5ba9ed6dc7c52390c25da717e184
on Catalin Marinas' linux-arm-arch tree.

This is Version 5 of the patch series, but the first two versions
were reviewed outside of the KVM mailing list. Changes can also be
pulled from:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v5

The implementation is broken up into a logical set of patches, the first
one containing a skeleton of files, makefile changes, the basic user
space interface and KVM architecture specific stubs.  Subsequent patches
implement parts of the system as listed:
 1.  Skeleton
 2.  Identity Mapping for Hyp mode
 3.  Hypervisor initialization
 4.  Memory virtualization setup (hyp mode mappings and 2nd stage)
 5.  Inject IRQs and FIQs from userspace
 6.  World-switch implementation and Hyp exception vectors
 7.  Emulation framework and CP15 emulation
 8.  Handle guest user memory aborts
 9.  Handle guest MMIO aborts
 10. Support guest wait-for-interrupt instructions.
 11. Initial SMP host support (incomplete!)
 12. Fix guest view of MPIDR
 13. Initial SMP guest support (incomplete!)

Testing:
Limited testing, but have run GCC inside guest, which compiled a small
hello-world program, which was successfully run. Hardware still
unavailable, so all testing has been done on ARM Fast Models.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf
 https://wiki.linaro.org/PeterMaydell/A15OnFastModels

Still on the to-do list:
 - Reuse VMIDs
 - Fix SMP host support
 - Fix SMP guest support
 - Support guest Thumb mode for MMIO emulation
 - Further testing
 - Performance improvements

Changes since v4:
 - Addressed reviewer comments from v4
    * cleanup debug and trace code
    * remove printks
    * fixup kvm_arch_vcpu_ioctl_run
    * add trace details to mmio emulation
 - Fix from Marc Zyngier: Move kvm_guest_enter/exit into non-preemptible
   section (squashed into world-switch patch)
 - Cleanup create_hyp_mappings/remove_hyp_mappings from Marc Zyngier
   (squashed into hypervisor initialization patch)
 - Removed the remove_hyp_mappings feature. Removing hypervisor mappings
   could potentially unmap other important data shared in the same page.
 - Removed the arm_ prefix from the arch-specific files.
 - Initial SMP host/guest support

Changes since v3:
 - v4 actually works, fully boots a guest
 - Support compiling as a module
 - Use static inlines instead of macros for vcpu_reg and friends
 - Optimize kvm_vcpu_reg function
 - Use Ftrace for trace capabilities
 - Updated documentation and commenting
 - Use KVM_IRQ_LINE instead of KVM_INTERRUPT
 - Emulates load/store instructions not supported through HSR
  syndrome information.
 - Frees 2nd stage translation tables on VM teardown
 - Handles IRQ/FIQ instructions
 - Handles more CP15 accesses
 - Support guest WFI calls
 - Uses debugfs instead of /proc
 - Support compiling in Thumb mode

Changes since v2:
 - Performs world-switch code
 - Maps guest memory using 2nd stage translation
 - Emulates co-processor 15 instructions
 - Forwards I/O faults to QEMU.

---

Christoffer Dall (12):
  ARM: KVM: Initial skeleton to compile KVM support
  ARM: KVM: Hypervisor identity mapping
  ARM: KVM: Add hypervisor inititalization
  ARM: KVM: Memory virtualization setup
  ARM: KVM: Inject IRQs and FIQs from userspace
  ARM: KVM: World-switch implementation
  ARM: KVM: Emulation framework and CP15 emulation
  ARM: KVM: Handle guest faults in KVM
  ARM: KVM: Handle I/O aborts
  ARM: KVM: Guest wait-for-interrupts (WFI) support
  ARM: KVM: Support SMP hosts
  ARM: KVM: Support SMP guests

Marc Zyngier (1):
  ARM: KVM: Fix guest view of MPIDR


 Documentation/virtual/kvm/api.txt           |   10
 arch/arm/Kconfig                            |    2
 arch/arm/Makefile                           |    1
 arch/arm/include/asm/kvm.h                  |   75 +++
 arch/arm/include/asm/kvm_arm.h              |  130 +
 arch/arm/include/asm/kvm_asm.h              |   51 ++
 arch/arm/include/asm/kvm_emulate.h          |  100
 arch/arm/include/asm/kvm_host.h             |  112
 arch/arm/include/asm/kvm_mmu.h              |   42 ++
 arch/arm/include/asm/kvm_para.h             |    9
 arch/arm/include/asm/pgtable-3level-hwdef.h |    5
 arch/arm/include/asm/pgtable-3level.h       |   12
 arch/arm/include/asm/pgtable.h              |   11
 arch/arm/include/asm/unified.h              |   12
 arch/arm/kernel/armksyms.c                  |    7
 arch/arm/kernel/asm-offsets.c               |   34 +
 arch/arm/kernel/entry-armv.S                |    1
 arch/arm/kvm/Kconfig                        |   44 ++
 arch/arm/kvm/Makefile                       |   17 +
 arch/arm/kvm/arm.c                          |  716

[PATCH v5 01/13] ARM: KVM: Initial skeleton to compile KVM support

2011-12-11 Thread Christoffer Dall
Targets KVM support for Cortex-A15 processors.

Contains no real functionality but all the framework components,
make files, header files and some tracing functionality.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/Kconfig                   |    2
 arch/arm/Makefile                  |    1
 arch/arm/include/asm/kvm.h         |   66 +
 arch/arm/include/asm/kvm_asm.h     |   28
 arch/arm/include/asm/kvm_emulate.h |   91
 arch/arm/include/asm/kvm_host.h    |   93
 arch/arm/include/asm/kvm_para.h    |    9 +
 arch/arm/include/asm/unified.h     |   12 ++
 arch/arm/kvm/Kconfig               |   44 ++
 arch/arm/kvm/Makefile              |   17 ++
 arch/arm/kvm/arm.c                 |  279
 arch/arm/kvm/debug.h               |   48 ++
 arch/arm/kvm/emulate.c             |  121
 arch/arm/kvm/exports.c             |   16 ++
 arch/arm/kvm/guest.c               |  148 +++
 arch/arm/kvm/init.S                |   17 ++
 arch/arm/kvm/interrupts.S          |   17 ++
 arch/arm/kvm/mmu.c                 |   15 ++
 arch/arm/kvm/trace.h               |   52 +++
 arch/arm/mach-vexpress/Kconfig     |    1
 arch/arm/mm/Kconfig                |    8 +
 21 files changed, 1085 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/asm/kvm_para.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/debug.h
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 00e908b..2a65d7b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2248,3 +2248,5 @@ source security/Kconfig
 source crypto/Kconfig
 
 source lib/Kconfig
+
+source arch/arm/kvm/Kconfig
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index dfcf3b0..621fb8d 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -255,6 +255,7 @@ core-$(CONFIG_VFP)  += arch/arm/vfp/
 
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/arm/kernel/ arch/arm/mm/ arch/arm/common/
+core-y += arch/arm/kvm/
 core-y += $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)  += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 000..87dc33b
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,66 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include <asm/types.h>
+
+/*
+ * Modes used for short-hand mode determination in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+#define MODE_FIQ   0
+#define MODE_IRQ   1
+#define MODE_SVC   2
+#define MODE_ABT   3
+#define MODE_UND   4
+#define MODE_USR   5
+#define MODE_SYS   6
+
+struct kvm_regs {
+	__u32 regs0_7[8];	/* Unbanked regs. (r0 - r7) */
+	__u32 fiq_regs8_12[5];	/* Banked fiq regs. (r8 - r12) */
+	__u32 usr_regs8_12[5];	/* Banked usr registers (r8 - r12) */
+	__u32 reg13[6];		/* Banked r13, indexed by MODE_ */
+	__u32 reg14[6];		/* Banked r14, indexed by MODE_ */
+	__u32 reg15;
+	__u32 cpsr;
+	__u32 spsr[5];		/* Banked SPSR, indexed by MODE_ */
+   struct {
+   __u32 c1_sys;
+   __u32 c2_base0;
+   __u32 c2_base1;
+   __u32 c3_dacr;
+   } cp15;
+
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+};
+
+struct kvm_guest_debug_arch {
+};
+
+struct kvm_debug_exit_arch {
+};
+
+#endif 
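
As a reading aid for the register layout above, the following is a minimal
sketch of how a banked register could be looked up from a mode index, and
how CPSR mode bits could be translated to the MODE_* indices. It is not the
kvm_vcpu_reg() implementation from the series. The sk_* names are
hypothetical, treating SYS as sharing the USR bank is an assumption of the
sketch (the arrays have six slots for seven mode indices), and the CPSR
mode encodings are the architectural ones (USR 0x10, FIQ 0x11, IRQ 0x12,
SVC 0x13, ABT 0x17, UND 0x1b, SYS 0x1f).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same index values as the MODE_* defines in the patch. */
enum sk_mode {
	SK_MODE_FIQ, SK_MODE_IRQ, SK_MODE_SVC, SK_MODE_ABT,
	SK_MODE_UND, SK_MODE_USR, SK_MODE_SYS
};

/* Hypothetical mirror of the general-purpose part of struct kvm_regs. */
struct sk_regs {
	uint32_t regs0_7[8];
	uint32_t fiq_regs8_12[5];
	uint32_t usr_regs8_12[5];
	uint32_t reg13[6];
	uint32_t reg14[6];
	uint32_t reg15;
};

/* Translate CPSR[4:0] to a mode index, or -1 for an unknown encoding. */
static int sk_cpsr_mode_to_index(uint32_t cpsr)
{
	switch (cpsr & 0x1f) {
	case 0x10: return SK_MODE_USR;
	case 0x11: return SK_MODE_FIQ;
	case 0x12: return SK_MODE_IRQ;
	case 0x13: return SK_MODE_SVC;
	case 0x17: return SK_MODE_ABT;
	case 0x1b: return SK_MODE_UND;
	case 0x1f: return SK_MODE_SYS;
	default:   return -1;
	}
}

/* Return a pointer to register reg_num as seen from the given mode. */
static uint32_t *sk_reg(struct sk_regs *r, unsigned int reg_num,
			unsigned int mode)
{
	assert(r != NULL);
	if (mode == SK_MODE_SYS)	/* assumption: SYS uses the USR bank */
		mode = SK_MODE_USR;
	if (reg_num > 15 || mode > SK_MODE_USR)
		return NULL;
	if (reg_num < 8)
		return &r->regs0_7[reg_num];
	if (reg_num < 13)		/* only FIQ banks r8 - r12 */
		return mode == SK_MODE_FIQ ? &r->fiq_regs8_12[reg_num - 8]
					   : &r->usr_regs8_12[reg_num - 8];
	if (reg_num == 13)
		return &r->reg13[mode];
	if (reg_num == 14)
		return &r->reg14[mode];
	return &r->reg15;
}
```

Keeping the indices dense (0..6) rather than reusing the sparse CPSR
encodings is what lets reg13/reg14 be small arrays.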

[PATCH v5 02/13] ARM: KVM: Hypervisor identity mapping

2011-12-11 Thread Christoffer Dall
From: Christoffer Dall cd...@cs.columbia.edu

Adds support to the identity mapping feature that allows KVM to set up
an identity mapping for Hyp mode with the AP[1] bit set, as required by
the specification. It also supports freeing the created sub-pmds once
they are no longer in use.

These two functions:
 - hyp_identity_mapping_add(pgd, addr, end);
 - hyp_identity_mapping_del(pgd, addr, end);
are essentially calls to the same function as the non-hyp versions but
with a different argument value. KVM calls these functions to set up
and tear down the identity mapping used to initialize the hypervisor.

Note, the hyp-version of the _del function actually frees the pmd's
pointed to by the pgd as opposed to the non-hyp version which just
clears them.
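
The "same function, different argument" structure described above can be
sketched in standalone C as follows. This is an illustration only, not the
idmap.c code: the sk_* names are hypothetical, the flat array stands in for
real page tables, and the section size and the TYPE_SECT/AF bit positions
are assumptions. Only the AP[1] position (bit 6) is taken from the
PMD_SECT_AP1 define in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PMD_TYPE_SECT	(1ULL << 0)	/* assumed bit position */
#define SK_PMD_SECT_AP1		(1ULL << 6)	/* as PMD_SECT_AP1 in the patch */
#define SK_PMD_SECT_AF		(1ULL << 10)	/* assumed bit position */
#define SK_SECTION_SIZE		(1ULL << 21)	/* assumed 2 MiB sections */

/* Shared worker: fill n section entries mapping addr onto itself. */
static void sk_idmap_fill(uint64_t *pmd, size_t n, uint64_t addr,
			  bool hyp_mapping)
{
	uint64_t prot = SK_PMD_TYPE_SECT | SK_PMD_SECT_AF;
	size_t i;

	assert(pmd != NULL);
	if (hyp_mapping)
		prot |= SK_PMD_SECT_AP1;	/* required for Hyp mode */

	for (i = 0; i < n; i++, addr += SK_SECTION_SIZE)
		pmd[i] = addr | prot;	/* output address == input address */
}

/* Thin wrappers, mirroring the non-hyp and hyp entry points. */
static void sk_identity_mapping_add(uint64_t *pmd, size_t n, uint64_t addr)
{
	sk_idmap_fill(pmd, n, addr, false);
}

static void sk_hyp_identity_mapping_add(uint64_t *pmd, size_t n, uint64_t addr)
{
	sk_idmap_fill(pmd, n, addr, true);
}
```

The two wrappers produce entries that differ only in the AP[1] bit.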

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/pgtable-3level-hwdef.h |1 +
 arch/arm/include/asm/pgtable.h  |6 +++
 arch/arm/mm/idmap.c |   54 +++
 3 files changed, 60 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
+#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)		(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index aec18ab..19456f4 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -318,6 +318,12 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 void identity_mapping_add(pgd_t *, unsigned long, unsigned long);
 void identity_mapping_del(pgd_t *, unsigned long, unsigned long);
 
+#ifdef CONFIG_KVM_ARM_HOST
+void hyp_identity_mapping_add(pgd_t *, unsigned long, unsigned long);
+void hyp_identity_mapping_del(pgd_t *pgd, unsigned long addr,
+ unsigned long end);
+#endif
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* CONFIG_MMU */
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index 267db72..e29903a 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,3 +1,4 @@
+#include <linux/module.h>
 #include <linux/kernel.h>
 
 #include <asm/cputype.h>
@@ -54,11 +55,18 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 	} while (pud++, addr = next, addr != end);
 }
 
-void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void __identity_mapping_add(pgd_t *pgd, unsigned long addr,
+  unsigned long end, bool hyp_mapping)
 {
unsigned long prot, next;
 
prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
+
+#ifdef CONFIG_ARM_LPAE
+   if (hyp_mapping)
+   prot |= PMD_SECT_AP1;
+#endif
+
 	if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
prot |= PMD_BIT4;
 
@@ -69,6 +77,12 @@ void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
} while (pgd++, addr = next, addr != end);
 }
 
+void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+   __identity_mapping_add(pgd, addr, end, false);
+}
+
+
 #ifdef CONFIG_SMP
 static void idmap_del_pmd(pud_t *pud, unsigned long addr, unsigned long end)
 {
@@ -103,6 +117,44 @@ void identity_mapping_del(pgd_t *pgd, unsigned long addr, unsigned long end)
 }
 #endif
 
+#ifdef CONFIG_KVM_ARM_HOST
+void hyp_identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+   __identity_mapping_add(pgd, addr, end, true);
+}
+EXPORT_SYMBOL_GPL(hyp_identity_mapping_add);
+
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+   pud_t *pud;
+   pmd_t *pmd;
+
+   pud = pud_offset(pgd, addr);
+   pmd = pmd_offset(pud, addr);
+   pmd_free(NULL, pmd);
+}
+
+/*
+ * This version actually frees the underlying pmds for all pgds in range and
+ * clears the pgds themselves afterwards.
+ */
+void hyp_identity_mapping_del(pgd_t *pgd, unsigned long addr, unsigned long end)
+{
+   unsigned long next;
+   pgd_t *next_pgd;
+
+   do {
+   next = pgd_addr_end(addr, end);
+   next_pgd = pgd + pgd_index(addr);
+   if (!pgd_none_or_clear_bad(next_pgd)) {
+   hyp_idmap_del_pmd(next_pgd, addr);
+   pgd_clear(next_pgd);
+   }
+	} while (addr = next, addr < end);
+}
+EXPORT_SYMBOL_GPL(hyp_identity_mapping_del);
+#endif
+
 /*
  * In order to soft-boot, we need to insert a 1:1 mapping in place of
  * the user-mode pages.  This will then ensure that we have predictable


[PATCH v5 03/13] ARM: KVM: Add hypervisor inititalization

2011-12-11 Thread Christoffer Dall
Sets up the required registers to run code in HYP-mode from the kernel.
No major controversies, but we should consider how to deal with SMP
support for hypervisor stack page.

By setting the HVBAR the kernel can execute code in Hyp-mode with
the MMU disabled. The HVBAR initially points to initialization code,
which initializes other Hyp-mode registers and enables the MMU
for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.

Also provides memory mapping code to map required code pages and data
structures accessed in Hyp mode at the same virtual address as the
host kernel virtual addresses, but which conforms to the architectural
requirements for translations in Hyp mode. This interface is added in
arch/arm/kvm/mmu.c and consists of:
 - create_hyp_mappings(hyp_pgd, start, end);
 - free_hyp_pmds(pgd_hyp);

See the implementation for more details.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h              |  103 +
 arch/arm/include/asm/kvm_asm.h              |   23
 arch/arm/include/asm/kvm_host.h             |    1
 arch/arm/include/asm/kvm_mmu.h              |   35 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |    4 +
 arch/arm/include/asm/pgtable-3level.h       |    4 +
 arch/arm/include/asm/pgtable.h              |    1
 arch/arm/kvm/arm.c                          |  166 +++
 arch/arm/kvm/exports.c                      |   10 ++
 arch/arm/kvm/init.S                         |   98
 arch/arm/kvm/interrupts.S                   |   30 +
 arch/arm/kvm/mmu.c                          |  152 +
 mm/memory.c                                 |    1
 13 files changed, 628 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
new file mode 100644
index 000..835abd1
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -0,0 +1,103 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ */
+
+#ifndef __KVM_ARM_H__
+#define __KVM_ARM_H__
+
+#include <asm/types.h>
+
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE         (1 << 27)
+#define HCR_TVM         (1 << 26)
+#define HCR_TTLB        (1 << 25)
+#define HCR_TPU         (1 << 24)
+#define HCR_TPC         (1 << 23)
+#define HCR_TSW         (1 << 22)
+#define HCR_TAC         (1 << 21)
+#define HCR_TIDCP       (1 << 20)
+#define HCR_TSC         (1 << 19)
+#define HCR_TID3        (1 << 18)
+#define HCR_TID2        (1 << 17)
+#define HCR_TID1        (1 << 16)
+#define HCR_TID0        (1 << 15)
+#define HCR_TWE         (1 << 14)
+#define HCR_TWI         (1 << 13)
+#define HCR_DC          (1 << 12)
+#define HCR_BSU         (3 << 10)
+#define HCR_FB          (1 << 9)
+#define HCR_VA          (1 << 8)
+#define HCR_VI          (1 << 7)
+#define HCR_VF          (1 << 6)
+#define HCR_AMO         (1 << 5)
+#define HCR_IMO         (1 << 4)
+#define HCR_FMO         (1 << 3)
+#define HCR_PTW         (1 << 2)
+#define HCR_SWIO        (1 << 1)
+#define HCR_VM          1
+#define HCR_GUEST_MASK  (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
+                         HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE       (1 << 30)
+#define HSCTLR_EE       (1 << 25)
+#define HSCTLR_FI       (1 << 21)
+#define HSCTLR_WXN      (1 << 19)
+#define HSCTLR_I        (1 << 12)
+#define HSCTLR_C        (1 << 2)
+#define HSCTLR_A        (1 << 1)
+#define HSCTLR_M        1
+#define HSCTLR_MASK     (HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+                         HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE       (1 << 31)
+#define TTBCR_IMP       (1 << 30)
+#define TTBCR_SH1       (3 << 28)
+#define TTBCR_ORGN1     (3 << 26)
+#define TTBCR_IRGN1     (3 << 24)
+#define TTBCR_EPD1      (1 << 23)
+#define TTBCR_A1        (1 << 22)
+#define TTBCR_T1SZ      (3 << 16)
+#define TTBCR_SH0       (3 << 12)
+#define TTBCR_ORGN0     (3 << 10)
+#define TTBCR_IRGN0     (3 << 8)
+#define TTBCR_EPD0      (1 << 7)
+#define TTBCR_T0SZ      3
+#define HTCR_MASK  

[PATCH v5 04/13] ARM: KVM: Memory virtualization setup

2011-12-11 Thread Christoffer Dall
This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Further, each entry in the TLBs and caches is tagged with a VMID
identifier in addition to ASIDs. The VMIDs are managed using
a bitmap and assigned when creating the VM in kvm_arch_init_vm()
where the 2nd stage pgd is also allocated. The table is freed in
kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.
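
To make the VMID bookkeeping and the VTTBR composition concrete, here is a
small standalone sketch. It is not the code from arch/arm/kvm/arm.c: the
sk_* names are hypothetical, a byte array stands in for the kernel bitmap
helpers, no locking is shown, and SK_VMID_SIZE = 256 assumes 8-bit VMIDs.
The masking mirrors the expression in kvm_arch_init_vm(): keep the low 40
bits of the pgd physical address, clear the alignment bits given by x
(VTTBR_X in the patch), and place the VMID at bit 48.

```c
#include <assert.h>
#include <stdint.h>

#define SK_VMID_SIZE 256	/* assumption: 8-bit VMIDs */

/* One byte per VMID for clarity; the patch uses a real bitmap. */
static uint8_t sk_vmids[SK_VMID_SIZE];

/* Claim the lowest free VMID, or return -1 if all are in use. */
static int sk_vmid_alloc(void)
{
	int i;

	for (i = 0; i < SK_VMID_SIZE; i++) {
		if (!sk_vmids[i]) {
			sk_vmids[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Release a VMID previously returned by sk_vmid_alloc(). */
static void sk_vmid_free(int vmid)
{
	assert(vmid >= 0 && vmid < SK_VMID_SIZE);
	sk_vmids[vmid] = 0;
}

/* Combine the stage-2 pgd physical address and the VMID into one value. */
static uint64_t sk_make_vttbr(uint64_t pgd_phys, uint64_t vmid, unsigned int x)
{
	uint64_t base = pgd_phys & ((1ULL << 40) - 1) & ~((2ULL << x) - 1);

	return base | (vmid << 48);
}
```

For example, with x = 5 the low six bits of the table address are cleared,
so a pgd at 0x12345678FF with VMID 3 yields 0x00030012345678C0.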

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_host.h |    4 ++
 arch/arm/include/asm/kvm_mmu.h  |    5 +++
 arch/arm/kvm/arm.c              |   59 +++--
 arch/arm/kvm/mmu.c              |   69 +++
 4 files changed, 132 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6a10467..06d1263 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -31,7 +31,9 @@ struct kvm_vcpu;
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 
 struct kvm_arch {
-	pgd_t *pgd;	/* 1-level 2nd stage table */
+	u32    vmid;	/* The VMID used for the virt. memory system */
+	pgd_t *pgd;	/* 1-level 2nd stage table */
+	u64    vttbr;	/* VTTBR value associated with above pgd and vmid */
 };
 
 #define EXCEPTION_NONE  0
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 13fd8dc..9d7440c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -32,4 +32,9 @@ extern pgd_t *kvm_hyp_pgd;
 int create_hyp_mappings(pgd_t *hyp_pgd, void *from, void *to);
 void free_hyp_pmds(pgd_t *hyp_pgd);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e6bdf50..89ba18d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -94,15 +94,62 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 }
 
+/**
+ * kvm_arch_init_vm - initializes a VM data structure
+ * @kvm:   pointer to the KVM struct
+ */
 int kvm_arch_init_vm(struct kvm *kvm)
 {
-   return 0;
+	int ret = 0;
+	phys_addr_t pgd_phys;
+	unsigned long vmid;
+
+	mutex_lock(&kvm_vmids_mutex);
+	vmid = find_first_zero_bit(kvm_vmids, VMID_SIZE);
+	if (vmid >= VMID_SIZE) {
+		mutex_unlock(&kvm_vmids_mutex);
+		return -EBUSY;
+	}
+	__set_bit(vmid, kvm_vmids);
+	kvm->arch.vmid = vmid;
+	mutex_unlock(&kvm_vmids_mutex);
+
+	ret = kvm_alloc_stage2_pgd(kvm);
+	if (ret)
+		goto out_fail_alloc;
+
+	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1) & ~((2 << VTTBR_X) - 1);
+	kvm->arch.vttbr |= ((u64)vmid << 48);
+
+	ret = create_hyp_mappings(kvm_hyp_pgd, kvm, kvm + 1);
+	if (ret)
+		goto out_free_stage2_pgd;
+
+	return ret;
+out_free_stage2_pgd:
+	kvm_free_stage2_pgd(kvm);
+out_fail_alloc:
+	clear_bit(vmid, kvm_vmids);
+	return ret;
 }
 
+/**
+ * kvm_arch_destroy_vm - destroy the VM data structure
+ * @kvm:   pointer to the KVM struct
+ */
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
int i;
 
+	kvm_free_stage2_pgd(kvm);
+
+	if (kvm->arch.vmid != 0) {
+		mutex_lock(&kvm_vmids_mutex);
+		clear_bit(kvm->arch.vmid, kvm_vmids);
+		mutex_unlock(&kvm_vmids_mutex);
+	}
+
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -178,6 +225,10 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
if (err)
goto free_vcpu;
 
+   err = create_hyp_mappings(kvm_hyp_pgd, vcpu, vcpu + 1);
+   if (err)
+   goto free_vcpu;
+
return vcpu;
 free_vcpu:
kmem_cache_free(kvm_vcpu_cache, vcpu);
@@ -187,7 +238,7 @@ out:
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
-   KVMARM_NOT_IMPLEMENTED();
+   kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
@@ -293,8 +344,8 @@ static int init_hyp_mode(void)
 
hyp_stack_ptr = (unsigned long)kvm_arm_hyp_stack_page + PAGE_SIZE;
 
-   init_phys_addr = virt_to_phys((void *)__kvm_hyp_init);
-   init_end_phys_addr = virt_to_phys((void *)__kvm_hyp_init_end);
+   init_phys_addr 

[PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Christoffer Dall
Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
This ioctl is used since the semantics are in fact two lines that can be
either raised or lowered on the VCPU - the IRQ and FIQ lines.

KVM needs to know which VCPU it must operate on and whether the FIQ or
IRQ line is raised/lowered. Hence both pieces of information are packed
in the kvm_irq_level->irq field. The irq field value will be:
  IRQ: vcpu_index * 2
  FIQ: (vcpu_index * 2) + 1

This is documented in Documentation/virtual/kvm/api.txt.

The effect of the ioctl is simply to raise/lower the
corresponding virt_irq field on the VCPU struct, which will cause the
world-switch code to raise/lower virtual interrupts when running the
guest on next switch. The wait_for_interrupt flag is also cleared for
raised IRQs causing an idle VCPU to become active again.
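
The irq field encoding and its effect on virt_irq can be sketched in a few
lines of standalone C. This is a sketch under stated assumptions, not the
kernel code: the sk_* names are hypothetical, the bit values mirror HCR_VI
(bit 7) and HCR_VF (bit 6) from kvm_arm.h in this series, and clearing the
bit when the level is 0 is assumed from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define SK_HCR_VI (1U << 7)	/* virtual IRQ pending, as HCR_VI */
#define SK_HCR_VF (1U << 6)	/* virtual FIQ pending, as HCR_VF */

enum sk_line { SK_IRQ_LINE = 0, SK_FIQ_LINE = 1 };

/* Pack a VCPU index and a line type into the irq field value. */
static uint32_t sk_irq_encode(uint32_t vcpu_index, enum sk_line line)
{
	return vcpu_index * 2 + (uint32_t)line;
}

/* Recover the VCPU index from the irq field value. */
static uint32_t sk_irq_vcpu(uint32_t irq)
{
	return irq / 2;
}

/* Recover the line type from the irq field value. */
static enum sk_line sk_irq_line(uint32_t irq)
{
	return (enum sk_line)(irq % 2);
}

/* Return the new virt_irq value after raising (level != 0) or lowering. */
static uint32_t sk_apply_level(uint32_t virt_irq, uint32_t irq, int level)
{
	uint32_t mask = (sk_irq_line(irq) == SK_IRQ_LINE) ? SK_HCR_VI
							   : SK_HCR_VF;

	assert(mask != 0);
	return level ? (virt_irq | mask) : (virt_irq & ~mask);
}
```

Userspace would pass the result of sk_irq_encode() in the irq field; for
example, the FIQ line of VCPU 3 is irq 7.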

Note: The custom trace_kvm_irq_line is used despite a generic definition of
trace_kvm_set_irq, since trace_kvm_set_irq depends on the x86-specific
define of __HAVE_IOAPIC. Either the trace event should be created
regardless of this define or it should depend on another ifdef clause,
common for both x86 and ARM. However, since the arguments don't really
match those used in ARM, I am yet to be convinced why this is necessary.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   10 ++-
 arch/arm/include/asm/kvm.h        |    8 ++
 arch/arm/include/asm/kvm_arm.h    |    1 +
 arch/arm/kvm/arm.c                |   53 -
 arch/arm/kvm/trace.h              |   21 +++
 include/linux/kvm.h               |    1 +
 6 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 7945b0b..4abaa67 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -572,7 +572,7 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
@@ -582,6 +582,14 @@ Requires that an interrupt controller model has been previously created with
 KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
 to be set to 1 and then back to 0.
 
+KVM_CREATE_IRQCHIP (except for ARM).  Note that edge-triggered interrupts
+require the level to be set to 1 and then back to 0.
+
+ARM uses two types of interrupt lines per CPU, i.e. IRQ and FIQ. The value of the
+irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
+FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for
+convenience macros.
+
 struct kvm_irq_level {
union {
__u32 irq; /* GSI */
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 87dc33b..8935062 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -20,6 +20,14 @@
 #include asm/types.h
 
 /*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+   KVM_ARM_IRQ_LINE = 0,
+   KVM_ARM_FIQ_LINE = 1,
+};
+
+/*
  * Modes used for short-hand mode determination in the world-switch code and
  * in emulation code.
  *
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 835abd1..e378a37 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -49,6 +49,7 @@
 #define HCR_VM 1
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE  (1  30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 89ba18d..fc0bd6b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -299,6 +299,43 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
return -EINVAL;
 }
 
+static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
+ struct kvm_irq_level *irq_level)
+{
+   u32 mask;
+   unsigned int vcpu_idx;
+   struct kvm_vcpu *vcpu;
+
+	vcpu_idx = irq_level->irq / 2;
+	if (vcpu_idx >= KVM_MAX_VCPUS)
+		return -EINVAL;
+
+	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+	if (!vcpu)
+		return -EINVAL;
+
+	switch (irq_level->irq % 2) {
+	case KVM_ARM_IRQ_LINE:
+		mask = HCR_VI;
+		break;
+	case KVM_ARM_FIQ_LINE:
+		mask = HCR_VF;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	trace_kvm_irq_line(irq_level->irq % 2, irq_level->level, vcpu_idx);
+
+	if (irq_level->level) {
+		vcpu->arch.virt_irq |= mask;
+		vcpu->arch.wait_for_interrupts = 

[PATCH v5 06/13] ARM: KVM: World-switch implementation

2011-12-11 Thread Christoffer Dall
Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and store the information on
the VCPU and KVM structures.

Switching to Hyp mode is done through a simple HVC instruction. The
exception vector code will check that the HVC comes from VMID==0 and if
so will store the necessary state on the Hyp stack, which will look like
this (see hyp_hvc):
  ...
  Hyp_Sp + 4: lr_usr
  Hyp_Sp: spsr (Host-SVC cpsr)

When returning from Hyp mode to SVC mode, another HVC instruction is
executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
stack pointer should be where it was left from the above initial call,
since the values on the stack will be used to restore state (see
hyp_svc).

Otherwise, the world-switch is pretty straightforward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values are loaded onto the hardware. State that is not loaded, but
is theoretically modifiable by the guest, is protected through the
virtualization features so that accesses trap and are emulated in software.
Upon guest return, all state is saved from the hardware onto the VCPU
struct and the original state is restored from the Hyp stack onto the
hardware.
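
As a rough illustration of the save/load/restore order described above,
here is a minimal user-space sketch in C. It is not the Hyp-mode assembly
from this patch: struct cpu_state, world_switch_model() and sample_guest()
are hypothetical names, a local variable stands in for the Hyp stack, and a
four-word array stands in for the real register set.

```c
#include <stdint.h>

#define NREGS 4

/* Hypothetical stand-in for the hardware/VCPU register state. */
struct cpu_state {
	uint32_t regs[NREGS];
};

typedef void (*guest_fn)(struct cpu_state *hw);

/* Hypothetical guest: modifies the state while it "runs". */
static void sample_guest(struct cpu_state *hw)
{
	hw->regs[0] += 1;
}

/*
 * Model of one world switch:
 *  1. back up the host state (the patch uses the Hyp stack for this),
 *  2. load the VCPU values onto the "hardware",
 *  3. run the guest until it exits,
 *  4. save the hardware state back onto the VCPU struct,
 *  5. restore the original host state.
 */
static void world_switch_model(struct cpu_state *hw, struct cpu_state *vcpu,
			       guest_fn run_guest)
{
	struct cpu_state host_saved = *hw;

	*hw = *vcpu;
	run_guest(hw);
	*vcpu = *hw;
	*hw = host_saved;
}
```

After a call such as world_switch_model(&hw, &vcpu, sample_guest), the
guest's change is visible only in vcpu, while hw holds the host values it
had before the call.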

One potentially controversial point is the back-door call to __irq_svc
(the host kernel's own physical IRQ handler), which is made when a
physical IRQ exception is taken in Hyp mode while running in the guest.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm.h  |1 
 arch/arm/include/asm/kvm_arm.h  |   26 ++
 arch/arm/include/asm/kvm_host.h |8 +
 arch/arm/kernel/armksyms.c  |7 +
 arch/arm/kernel/asm-offsets.c   |   33 +++
 arch/arm/kernel/entry-armv.S|1 
 arch/arm/kvm/arm.c  |   45 
 arch/arm/kvm/guest.c|2 
 arch/arm/kvm/interrupts.S   |  443 +++
 9 files changed, 562 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index 8935062..ff88ca0 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -51,6 +51,7 @@ struct kvm_regs {
__u32 cpsr;
__u32 spsr[5];  /* Banked SPSR,  indexed by MODE_  */
struct {
+   __u32 c0_midr;
__u32 c1_sys;
__u32 c2_base0;
__u32 c2_base1;
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index e378a37..1769187 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -100,5 +100,31 @@
 #define VTTBR_X		(5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT	(26)
+#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
+#define HSR_IL		(1U << 25)
+#define HSR_ISS		(HSR_IL - 1)
+#define HSR_ISV_SHIFT	(24)
+#define HSR_ISV		(1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN	(0x00)
+#define HSR_EC_WFI	(0x01)
+#define HSR_EC_CP15_32	(0x03)
+#define HSR_EC_CP15_64	(0x04)
+#define HSR_EC_CP14_MR	(0x05)
+#define HSR_EC_CP14_LS	(0x06)
+#define HSR_EC_CP_0_13	(0x07)
+#define HSR_EC_CP10_ID	(0x08)
+#define HSR_EC_JAZELLE	(0x09)
+#define HSR_EC_BXJ	(0x0A)
+#define HSR_EC_CP14_64	(0x0C)
+#define HSR_EC_SVC_HYP	(0x11)
+#define HSR_EC_HVC	(0x12)
+#define HSR_EC_SMC	(0x13)
+#define HSR_EC_IABT	(0x20)
+#define HSR_EC_IABT_HYP	(0x21)
+#define HSR_EC_DABT	(0x24)
+#define HSR_EC_DABT_HYP	(0x25)
 
 #endif /* __KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 06d1263..59fcd15 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -62,6 +62,7 @@ struct kvm_vcpu_arch {
 
/* System control coprocessor (cp15) */
struct {
+		u32 c0_MIDR;		/* Main ID Register */
 		u32 c1_SCTLR;		/* System Control Register */
 		u32 c1_ACTLR;		/* Auxilliary Control Register */
 		u32 c1_CPACR;		/* Coprocessor Access Control */
@@ -69,6 +70,12 @@ struct kvm_vcpu_arch {
 		u64 c2_TTBR1;		/* Translation Table Base Register 1 */
 		u32 c2_TTBCR;		/* Translation Table Base Control R. */
 		u32 c3_DACR;		/* Domain Access Control Register */
+		u32 c10_PRRR;		/* Primary Region Remap Register */
+		u32 c10_NMRR;		/* Normal Memory Remap Register */
+		u32 c13_CID;		/* Context ID Register */
+		u32 c13_TID_URW;	/* Thread ID, User R/W */
+		u32 c13_TID_URO;	/* Thread ID, User R/O */
+		u32 c13_TID_PRIV;	/* Thread ID, Priveleged */
} cp15;
 
u32 virt_irq;   /* HCR exception mask */
@@ -78,6 +85,7 

[PATCH v5 07/13] ARM: KVM: Emulation framework and CP15 emulation

2011-12-11 Thread Christoffer Dall
From: Christoffer Dall cd...@cs.columbia.edu

Adds a new important function in the main KVM/ARM code called
handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are
not allowed from the guest. This commit handles these exits by
emulating the intended operation in software and skipping the guest
instruction.
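
To make the HSR decoding step concrete, here is a small stand-alone sketch
of how the exception class can be extracted and dispatched on. The bit
layout is copied from the HSR_* defines added in patch 06; the helper names
hsr_exception_class() and hsr_class_name() are hypothetical and only cover
three classes for illustration.

```c
#include <stdint.h>

/* Bit layout taken from the HSR_* defines in patch 06 of this series. */
#define HSR_EC_SHIFT	26
#define HSR_EC		(0x3fU << HSR_EC_SHIFT)
#define HSR_EC_WFI	0x01
#define HSR_EC_CP15_32	0x03
#define HSR_EC_DABT	0x24

/* Hypothetical helper: extract the exception class from a raw HSR value. */
static uint32_t hsr_exception_class(uint32_t hsr)
{
	return (hsr & HSR_EC) >> HSR_EC_SHIFT;
}

/* Hypothetical helper: name a few exception classes, for illustration. */
static const char *hsr_class_name(uint32_t hsr)
{
	switch (hsr_exception_class(hsr)) {
	case HSR_EC_WFI:
		return "wfi";
	case HSR_EC_CP15_32:
		return "cp15-32";
	case HSR_EC_DABT:
		return "data-abort";
	default:
		return "other";
	}
}
```

The real handle_exit() dispatches to the kvm_handle_*() functions instead of
returning a string, but the mask-and-shift step is the same.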

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_emulate.h |7 +
 arch/arm/kvm/arm.c |   77 ++
 arch/arm/kvm/emulate.c |  195 
 arch/arm/kvm/trace.h   |   28 +
 4 files changed, 307 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 91d461a..af21fd5 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -40,6 +40,13 @@ static inline unsigned char vcpu_mode(struct kvm_vcpu *vcpu)
 	return modes_table[vcpu->arch.regs.cpsr & 0xf];
 }
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f5d..a6e1763 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -35,6 +35,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_emulate.h>
 
 #include "debug.h"
 
@@ -306,6 +307,62 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
return 0;
 }
 
+static inline int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			      int exception_index)
+{
+	unsigned long hsr_ec;
+
+	if (exception_index == ARM_EXCEPTION_IRQ)
+		return 0;
+
+	if (exception_index != ARM_EXCEPTION_HVC) {
+		kvm_err(-EINVAL, "Unsupported exception type");
+		return -EINVAL;
+	}
+
+	hsr_ec = (vcpu->arch.hsr & HSR_EC) >> HSR_EC_SHIFT;
+	switch (hsr_ec) {
+	case HSR_EC_WFI:
+		return kvm_handle_wfi(vcpu, run);
+	case HSR_EC_CP15_32:
+	case HSR_EC_CP15_64:
+		return kvm_handle_cp15_access(vcpu, run);
+	case HSR_EC_CP14_MR:
+		return kvm_handle_cp14_access(vcpu, run);
+	case HSR_EC_CP14_LS:
+		return kvm_handle_cp14_load_store(vcpu, run);
+	case HSR_EC_CP14_64:
+		return kvm_handle_cp14_access(vcpu, run);
+	case HSR_EC_CP_0_13:
+		return kvm_handle_cp_0_13_access(vcpu, run);
+	case HSR_EC_CP10_ID:
+		return kvm_handle_cp10_id(vcpu, run);
+	case HSR_EC_SVC_HYP:
+		/* SVC called from Hyp mode should never get here */
+		kvm_msg("SVC called from Hyp mode shouldn't go here");
+		BUG();
+	case HSR_EC_HVC:
+		kvm_msg("hvc: %x (at %08x)", vcpu->arch.hsr & ((1 << 16) - 1),
+					     vcpu->arch.regs.pc);
+		kvm_msg("         HSR: %8x", vcpu->arch.hsr);
+		break;
+	case HSR_EC_IABT:
+	case HSR_EC_DABT:
+		return kvm_handle_guest_abort(vcpu, run);
+	case HSR_EC_IABT_HYP:
+	case HSR_EC_DABT_HYP:
+		/* The hypervisor should never cause aborts */
+		kvm_msg("The hypervisor itself shouldn't cause aborts");
+		BUG();
+	default:
+		kvm_msg("Unkown exception class: %08x (%08x)", hsr_ec,
+				vcpu->arch.hsr);
+		BUG();
+	}
+
+	return 0;
+}
+
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:  The VCPU pointer
@@ -333,6 +390,26 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		local_irq_enable();
 
 		trace_kvm_exit(vcpu->arch.regs.pc);
+
+		ret = handle_exit(vcpu, run, ret);
+		if (ret) {
+			kvm_err(ret, "Error in handle_exit");
+			break;
+		}
+
+		if (run->exit_reason == KVM_EXIT_MMIO)
+			break;
+
+		if (need_resched()) {
+			vcpu_put(vcpu);
+			schedule();
+			vcpu_load(vcpu);
+		}
+
+		if (signal_pending(current) && !(run->exit_reason)) {
+

[PATCH v5 08/13] ARM: KVM: Handle guest faults in KVM

2011-12-11 Thread Christoffer Dall
From: Christoffer Dall cd...@cs.columbia.edu

Handles guest faults in KVM by mapping in the corresponding user pages
in the 2nd stage page tables.

Introduces a new ARM-specific kernel memory type, PAGE_KVM_GUEST, and the
pgprot_guest variable, both used to map 2nd stage memory for KVM guests.
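
To illustrate how a faulting intermediate physical address (IPA) selects
an entry at each of the three 2nd stage levels, here is a small sketch. It
assumes the usual LPAE layout with 4 KiB pages and 9 index bits at levels
2 and 3, plus a 40-bit IPA so that level 1 is indexed by bits [39:30]. The
S2_* constants, struct s2_walk and s2_split_ipa() are hypothetical names
and not the kernel's pgd_index()/pmd_index()/pte_index() macros.

```c
#include <stdint.h>

/* Assumed LPAE geometry: 4 KiB pages, 512 entries at levels 2 and 3. */
#define S2_PAGE_SHIFT	12
#define S2_PMD_SHIFT	21
#define S2_PGD_SHIFT	30
#define S2_PTRS_L23	512u
#define S2_PTRS_L1	1024u	/* assumes a 40-bit IPA: bits [39:30] */

/* Hypothetical result type: one table index per level plus page offset. */
struct s2_walk {
	unsigned int pgd;	/* level 1 index */
	unsigned int pmd;	/* level 2 index */
	unsigned int pte;	/* level 3 index */
	uint64_t offset;	/* byte offset inside the 4 KiB page */
};

static struct s2_walk s2_split_ipa(uint64_t ipa)
{
	struct s2_walk w;

	w.pgd = (unsigned int)((ipa >> S2_PGD_SHIFT) & (S2_PTRS_L1 - 1));
	w.pmd = (unsigned int)((ipa >> S2_PMD_SHIFT) & (S2_PTRS_L23 - 1));
	w.pte = (unsigned int)((ipa >> S2_PAGE_SHIFT) & (S2_PTRS_L23 - 1));
	w.offset = ipa & ((1ull << S2_PAGE_SHIFT) - 1);
	return w;
}
```

For example, the IPA 0x80201234 splits into level 1 index 2, level 2
index 1, level 3 index 1 and page offset 0x234.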

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/pgtable-3level.h |8 ++
 arch/arm/include/asm/pgtable.h|4 +
 arch/arm/kvm/mmu.c|  107 -
 arch/arm/mm/mmu.c |3 +
 4 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index edc3cb9..6dc5331 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -104,6 +104,14 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_READ		(_AT(pteval_t, 1) << 6)	/* HAP[0] */
+#define L_PTE2_WRITE		(_AT(pteval_t, 1) << 7)	/* HAP[1] */
+#define L_PTE2_NORM_WB		(_AT(pteval_t, 3) << 4)	/* MemAttr[3:2] */
+#define L_PTE2_INNER_WB		(_AT(pteval_t, 3) << 2)	/* MemAttr[1:0] */
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)  (!pud_val(pud))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 20025cc..778856b 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -76,6 +76,7 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_guest;
 
 #define _MOD_PROT(p, b)	__pgprot(pgprot_val(p) | (b))
 
@@ -89,6 +90,9 @@ extern pgprot_t		pgprot_kernel;
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
 #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_USER)
+#define PAGE_KVM_GUEST		_MOD_PROT(pgprot_guest, L_PTE2_READ | \
+					  L_PTE2_WRITE | L_PTE2_NORM_WB | \
+					  L_PTE2_INNER_WB)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index f7a7b17..d468238 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -229,8 +229,111 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 	kvm->arch.pgd = NULL;
 }
 
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			  gfn_t gfn, struct kvm_memory_slot *memslot)
+{
+	pfn_t pfn;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte, new_pte;
+
+	pfn = gfn_to_pfn(vcpu->kvm, gfn);
+
+	if (is_error_pfn(pfn)) {
+		kvm_err(-EFAULT, "Guest gfn %u (0x%08lx) does not have "
+				"corresponding host mapping",
+				gfn, gfn << PAGE_SHIFT);
+		return -EFAULT;
+	}
+
+	/* Create 2nd stage page table mapping - Level 1 */
+	pgd = vcpu->kvm->arch.pgd + pgd_index(fault_ipa);
+	pud = pud_offset(pgd, fault_ipa);
+	if (pud_none(*pud)) {
+		pmd = pmd_alloc_one(NULL, fault_ipa);
+		if (!pmd) {
+			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pmd");
+			return -ENOMEM;
+		}
+		pud_populate(NULL, pud, pmd);
+		pmd += pmd_index(fault_ipa);
+	} else
+		pmd = pmd_offset(pud, fault_ipa);
+
+	/* Create 2nd stage page table mapping - Level 2 */
+	if (pmd_none(*pmd)) {
+		pte = pte_alloc_one_kernel(NULL, fault_ipa);
+		if (!pte) {
+			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pte");
+			return -ENOMEM;
+		}
+		pmd_populate_kernel(NULL, pmd, pte);
+		pte += pte_index(fault_ipa);
+	} else
+		pte = pte_offset_kernel(pmd, fault_ipa);
+
+	/* Create 2nd stage page table mapping - Level 3 */
+	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
+	set_pte_ext(pte, new_pte, 0);
+
+	return 0;
+}
+
+#define HSR_ABT_FS	(0x3f)
+#define HPFAR_MASK	(~0xf)
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:  the VCPU pointer
+ * @run:   the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or it
+ * can mean that the guest tried to access I/O memory, which is emulated by user
+ * space. The 

[PATCH v5 09/13] ARM: KVM: Handle I/O aborts

2011-12-11 Thread Christoffer Dall
From: Christoffer Dall cd...@cs.columbia.edu

When the guest accesses I/O memory, the access raises a data abort
exception, which is handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding the read
or write to QEMU, which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

This requires changing the general flow somewhat, since new calls to run
the VCPU must check if there's a pending MMIO load and write the loaded
value to the guest register after userspace has made the data available.
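
As an illustration of the HSR decoding step, the sketch below extracts the
access description from a data abort syndrome and shows the sign extension
that mmio_sign_extend refers to for byte/halfword loads. The field
positions (ISV bit 24, SAS bits [23:22], SSE bit 21, SRT bits [19:16], WnR
bit 6) are taken from the ARMv7 Virtualization Extensions description of
the HSR, not from this patch, and struct mmio_access, decode_dabt_iss() and
mmio_load_result() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one decoded MMIO access. */
struct mmio_access {
	bool valid;		/* ISV: syndrome fields below are valid */
	bool is_write;		/* WnR: write, not read */
	bool sign_extend;	/* SSE: sign-extend a byte/halfword load */
	unsigned int len;	/* access size in bytes, from SAS */
	unsigned int rt;	/* SRT: guest register number */
};

static struct mmio_access decode_dabt_iss(uint32_t hsr)
{
	struct mmio_access a;

	a.valid = (hsr >> 24) & 1;
	a.len = 1u << ((hsr >> 22) & 3);	/* 0: byte, 1: halfword, 2: word */
	a.sign_extend = (hsr >> 21) & 1;
	a.rt = (hsr >> 16) & 0xf;
	a.is_write = (hsr >> 6) & 1;
	return a;
}

/* Value to place in the guest register once userspace supplied the data. */
static uint32_t mmio_load_result(uint32_t data, unsigned int len,
				 bool sign_extend)
{
	uint32_t mask, sign;

	if (len >= 4)
		return data;

	mask = (1u << (len * 8)) - 1;
	sign = 1u << (len * 8 - 1);
	data &= mask;
	if (sign_extend && (data & sign))
		data |= ~mask;
	return data;
}
```

When the valid flag is clear, the syndrome cannot be used and the
instruction has to be fetched and decoded manually, as described above.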

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_emulate.h |2 
 arch/arm/include/asm/kvm_host.h|1 
 arch/arm/include/asm/kvm_mmu.h |1 
 arch/arm/kvm/arm.c |8 +
 arch/arm/kvm/emulate.c |  288 
 arch/arm/kvm/mmu.c |  155 +++
 arch/arm/kvm/trace.h   |   22 +++
 7 files changed, 470 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index af21fd5..9899474 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -46,6 +46,8 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+   unsigned long instr);
 
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 59fcd15..86f6cf1 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -88,6 +88,7 @@ struct kvm_vcpu_arch {
u64 pc_ipa; /* IPA for the current PC (VA to PA result) */
 
/* IO related fields */
+   bool mmio_sign_extend;  /* for byte/halfword loads */
u32 mmio_rd;
 
/* Misc. fields */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 9d7440c..e82eae9 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -35,6 +35,7 @@ void free_hyp_pmds(pgd_t *hyp_pgd);
 int kvm_alloc_stage2_pgd(struct kvm *kvm);
 void kvm_free_stage2_pgd(struct kvm *kvm);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a6e1763..e5348a7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -379,6 +379,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	int ret;
 
 	for (;;) {
+		if (run->exit_reason == KVM_EXIT_MMIO) {
+			ret = kvm_handle_mmio_return(vcpu, vcpu->run);
+			if (ret)
+				break;
+		}
+
+		run->exit_reason = KVM_EXIT_UNKNOWN;
+
 		trace_kvm_entry(vcpu->arch.regs.pc);
 
 		local_irq_disable();
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index fded8c7..4fb5a7d 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -20,6 +20,7 @@
 #include <asm/kvm_emulate.h>
 #include <trace/events/kvm.h>
 
+#include "trace.h"
 #include "debug.h"
 #include "trace.h"
 
@@ -128,8 +129,30 @@ u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode)
 }
 
 /**
- * Co-processor emulation
+ * Utility functions common for all emulation code
+ */
+
+/*
+ * This one accepts a matrix where the first element is the
+ * bits as they must be, and the second element is the bitmask.
  */
+#define INSTR_NONE -1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+	int i;
+	u32 mask;
+
+	for (i = 0; i < table_entries; i++) {
+		mask = table[i][1];
+		if ((table[i][0] & mask) == (instr & mask))
+			return i;
+	}
+	return INSTR_NONE;
+}
+
+/**
+ * Co-processor emulation
+ */
 
 struct coproc_params {
unsigned long CRm;
@@ -228,9 +251,11 @@ static int emulate_cp15_c10_access(struct kvm_vcpu *vcpu,
  * @vcpu: The VCPU pointer
  * @p:The coprocessor parameters struct pointer holding trap inst. details
  *
- 

[PATCH v5 10/13] ARM: KVM: Guest wait-for-interrupts (WFI) support

2011-12-11 Thread Christoffer Dall
From: Christoffer Dall cd...@cs.columbia.edu

When the guest executes a WFI instruction the operation is trapped to
KVM, which emulates the instruction in software. There is no correlation
between a guest executing a WFI instruction and actually putting the
hardware into a low-power mode, since a KVM guest is essentially a
process and the WFI instruction can be seen as a 'sleep' call from this
process. Therefore, we flag the VCPU to be in wait_for_interrupts mode
and call the main KVM function kvm_vcpu_block(). This function
will put the thread on a wait-queue and call schedule.

When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
signal the VCPU thread and unflag the VCPU to no longer wait for
interrupts. All calls to kvm_arch_vcpu_ioctl_run() result in a call to
kvm_vcpu_block() as long as the VCPU is in wfi-mode.
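
The flag handling described above can be modelled in a few lines. This is
only a sketch of the state transitions, with no wait-queue or scheduling:
struct vcpu_model and the model_*() functions are hypothetical names, and
MODEL_VI/MODEL_VF are placeholder mask values standing in for HCR_VI and
HCR_VF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder masks standing in for HCR_VI and HCR_VF. */
#define MODEL_VI	(1u << 7)
#define MODEL_VF	(1u << 6)

/* Hypothetical reduction of the VCPU state relevant to WFI. */
struct vcpu_model {
	uint32_t virt_irq;		/* pending virtual IRQ/FIQ mask */
	bool wait_for_interrupts;	/* set while the guest sits in WFI */
};

/* Guest executed WFI: only start waiting if nothing is pending. */
static void model_handle_wfi(struct vcpu_model *v)
{
	if (!v->virt_irq)
		v->wait_for_interrupts = true;
}

/* Interrupt line changed: raising a line always ends the wait. */
static void model_irq_line(struct vcpu_model *v, uint32_t mask, bool level)
{
	if (level) {
		v->virt_irq |= mask;
		v->wait_for_interrupts = false;
	} else {
		v->virt_irq &= ~mask;
	}
}

/* Mirrors kvm_arch_vcpu_runnable(): runnable unless waiting. */
static bool model_runnable(const struct vcpu_model *v)
{
	return !v->wait_for_interrupts;
}
```

In the real code the thread additionally sleeps in kvm_vcpu_block() while
the flag is set and is woken from the KVM_IRQ_LINE path.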


Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/kvm/arm.c |   33 -
 arch/arm/kvm/emulate.c |   12 
 arch/arm/kvm/trace.h   |   15 +++
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e5348a7..00215a1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -302,9 +302,16 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v: The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts then it is by definition
+ * runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-   return 0;
+	return !v->arch.wait_for_interrupts;
 }
 
 static inline int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
@@ -379,6 +386,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	int ret;
 
 	for (;;) {
+		if (vcpu->arch.wait_for_interrupts)
+			goto wait_for_interrupts;
+
 		if (run->exit_reason == KVM_EXIT_MMIO) {
 			ret = kvm_handle_mmio_return(vcpu, vcpu->run);
 			if (ret)
@@ -408,16 +418,19 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (run->exit_reason == KVM_EXIT_MMIO)
 			break;
 
-		if (need_resched()) {
-			vcpu_put(vcpu);
-			schedule();
-			vcpu_load(vcpu);
-		}
-
-		if (signal_pending(current) && !(run->exit_reason)) {
-			run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
+		if (need_resched())
+			kvm_resched(vcpu);
+wait_for_interrupts:
+		if (signal_pending(current)) {
+			if (!run->exit_reason) {
+				ret = -EINTR;
+				run->exit_reason = KVM_EXIT_INTR;
+			}
 			break;
 		}
+
+		if (vcpu->arch.wait_for_interrupts)
+			kvm_vcpu_block(vcpu);
 	}
 
 	return ret;
@@ -454,6 +467,8 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 	if (irq_level->level) {
 		vcpu->arch.virt_irq |= mask;
 		vcpu->arch.wait_for_interrupts = 0;
+		if (waitqueue_active(&vcpu->wq))
+			wake_up_interruptible(&vcpu->wq);
 	} else
 		vcpu->arch.virt_irq &= ~mask;
 
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 4fb5a7d..f60c75a 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -335,8 +335,20 @@ unsupp_err_out:
return -EINVAL;
 }
 
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a guest
+ * @vcpu:	the vcpu pointer
+ * @run:	the kvm_run structure pointer
+ *
+ * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
+ * halt execution of world-switches and schedule other host processes until
+ * there is an incoming IRQ or FIQ to the VM.
+ */
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	trace_kvm_wfi(vcpu->arch.regs.pc);
+	if (!vcpu->arch.virt_irq)
+		vcpu->arch.wait_for_interrupts = 1;
 	return 0;
 }
 
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 8ba3db9..693da82 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -111,6 +111,21 @@ TRACE_EVENT(kvm_irq_line,
 		  __entry->level, __entry->vcpu_idx)
 );
 
+TRACE_EVENT(kvm_wfi,
+	TP_PROTO(unsigned long vcpu_pc),
+	TP_ARGS(vcpu_pc),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	vcpu_pc		)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_pc		= vcpu_pc;
+	),
+
+	TP_printk("guest executed wfi at: 0x%08lx", __entry->vcpu_pc)
+);
+
 
 #endif /* _TRACE_KVM_H */
 


[PATCH v5 11/13] ARM: KVM: Support SMP hosts

2011-12-11 Thread Christoffer Dall
In order to support KVM on an SMP host, it is necessary to initialize the
hypervisor on all CPUs, mostly by making sure each CPU gets its own
hypervisor stack and runs the HYP init code.

We also take care of some missing locking of modifications to the
hypervisor page tables and ensure synchronized consistency between
virtual IRQ masks and wait_for_interrupt flags on the VCPUs.

Note that this code doesn't handle CPU hotplug yet.
Note that this code doesn't support SMP guests.

WARNING: This code is in development and guests do not fully boot on SMP
hosts yet.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_host.h |4 -
 arch/arm/include/asm/kvm_mmu.h  |1 
 arch/arm/kvm/arm.c  |  175 +++
 arch/arm/kvm/emulate.c  |2 
 arch/arm/kvm/mmu.c  |9 ++
 5 files changed, 114 insertions(+), 77 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 86f6cf1..a0ffbe8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -78,8 +78,6 @@ struct kvm_vcpu_arch {
u32 c13_TID_PRIV;   /* Thread ID, Priveleged */
} cp15;
 
-   u32 virt_irq;   /* HCR exception mask */
-
/* Exception Information */
 	u32 hsr;		/* Hyp Syndrom Register */
 	u32 hdfar;		/* Hyp Data Fault Address Register */
@@ -92,6 +90,8 @@ struct kvm_vcpu_arch {
u32 mmio_rd;
 
/* Misc. fields */
+   spinlock_t irq_lock;
+   u32 virt_irq;   /* HCR exception mask */
u32 wait_for_interrupts;
 };
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e82eae9..917edd7 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -28,6 +28,7 @@
 #define PGD2_ORDER get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
 
 extern pgd_t *kvm_hyp_pgd;
+extern struct mutex kvm_hyp_pgd_mutex;
 
 int create_hyp_mappings(pgd_t *hyp_pgd, void *from, void *to);
 void free_hyp_pmds(pgd_t *hyp_pgd);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 00215a1..6e384e2 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -61,7 +61,7 @@ void __kvm_print_msg(char *fmt, ...)
 	spin_unlock(&__tmp_log_lock);
 }
 
-static void *kvm_arm_hyp_stack_page;
+static DEFINE_PER_CPU(void *, kvm_arm_hyp_stack_page);
 
 /* The VMID used in the VTTBR */
 #define VMID_SIZE (18)
@@ -257,6 +257,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
unsigned long cpsr;
unsigned long sctlr;
 
+	spin_lock_init(&vcpu->arch.irq_lock);
+
 	/* Init execution CPSR */
 	asm volatile ("mrs	%[cpsr], cpsr" :
 			[cpsr] "=r" (cpsr));
@@ -464,13 +466,27 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 
 	trace_kvm_irq_line(irq_level->irq % 2, irq_level->level, vcpu_idx);
 
+	spin_lock(&vcpu->arch.irq_lock);
 	if (irq_level->level) {
 		vcpu->arch.virt_irq |= mask;
+
+		/*
+		 * Note that we grab the wq.lock before clearing the wfi flag
+		 * since this ensures that a concurrent call to kvm_vcpu_block
+		 * will either sleep before we grab the lock, in which case we
+		 * wake it up, or will never sleep due to
+		 * kvm_arch_vcpu_runnable being true (iow. this avoids having
+		 * to grab the irq_lock in kvm_arch_vcpu_runnable).
+		 */
+		spin_lock(&vcpu->wq.lock);
 		vcpu->arch.wait_for_interrupts = 0;
+
 		if (waitqueue_active(&vcpu->wq))
-			wake_up_interruptible(&vcpu->wq);
+			__wake_up_locked(&vcpu->wq, TASK_INTERRUPTIBLE);
+		spin_unlock(&vcpu->wq.lock);
 	} else
 		vcpu->arch.virt_irq &= ~mask;
+	spin_unlock(&vcpu->arch.irq_lock);
 
return 0;
 }
@@ -505,14 +521,49 @@ long kvm_arch_vm_ioctl(struct file *filp,
}
 }
 
+static void cpu_set_vector(void *vector)
+{
+	/*
+	 * Set the HVBAR
+	 */
+	asm volatile (
+		"mov	r0, %[vector_ptr]\n\t"
+		"ldr	r7, =SMCHYP_HVBAR_W\n\t"
+		"smc	#0\n\t" : :
+		[vector_ptr] "r" (vector) :
+		"r0", "r7");
+}
+
+static void cpu_init_hyp_mode(void *vector)
+{
+	unsigned long hyp_stack_ptr;
+	void *stack_page;
+
+	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
+	hyp_stack_ptr = (unsigned long)stack_page + PAGE_SIZE;
+
+	cpu_set_vector(vector);
+
+	/*
+	 * Call initialization code
+	 */
+	asm volatile (
+		"mov	r0, %[pgd_ptr]\n\t"
+		"mov	r1, %[stack_ptr]\n\t"
+		"hvc	#0\n\t" : :
+		[pgd_ptr] "r" (virt_to_phys(kvm_hyp_pgd)),
+		[stack_ptr] "r" 

[PATCH v5 12/13] ARM: KVM: Fix guest view of MPIDR

2011-12-11 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

A guest may need to know which CPU it has booted on (and Linux does).
Now that we can run KVM on an SMP host, QEMU may be running on any
CPU. In that case, directly reading MPIDR will give an inconsistent
view of the guest CPU number (among other problems).

The solution is to use the VMPIDR register, which is computed by
using the host MPIDR and overriding the low bits with the KVM vcpu_id.
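
The computation amounts to a single mask-and-or, sketched below outside the
kernel. guest_mpidr() is a hypothetical helper; unlike the patch, it also
masks vcpu_id to the low 8 bits, which is an assumption added here so that
an out-of-range id cannot spill into the upper fields.

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep the upper 24 bits of the host MPIDR and
 * replace the low 8 bits with the VCPU id.
 */
static uint32_t guest_mpidr(uint32_t host_mpidr, uint32_t vcpu_id)
{
	return (host_mpidr & ~0xffu) | (vcpu_id & 0xffu);
}
```

With a host MPIDR of 0x80000003 and vcpu_id 1, the guest would see
0x80000001 regardless of which physical CPU the VCPU thread runs on.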

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 arch/arm/include/asm/kvm_host.h |1 +
 arch/arm/kernel/asm-offsets.c   |1 +
 arch/arm/kvm/arm.c  |4 
 arch/arm/kvm/interrupts.S   |8 
 4 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index a0ffbe8..7fcc412 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -63,6 +63,7 @@ struct kvm_vcpu_arch {
/* System control coprocessor (cp15) */
struct {
 		u32 c0_MIDR;		/* Main ID Register */
+   u32 c0_MPIDR;   /* MultiProcessor ID Register */
u32 c1_SCTLR;   /* System Control Register */
u32 c1_ACTLR;   /* Auxilliary Control Register */
u32 c1_CPACR;   /* Coprocessor Access Control */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index c126cfb..1c6e2ee 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -148,6 +148,7 @@ int main(void)
 #ifdef CONFIG_KVM_ARM_HOST
   DEFINE(VCPU_KVM, offsetof(struct kvm_vcpu, kvm));
   DEFINE(VCPU_MIDR,offsetof(struct kvm_vcpu, arch.cp15.c0_MIDR));
+  DEFINE(VCPU_MPIDR,   offsetof(struct kvm_vcpu, arch.cp15.c0_MPIDR));
   DEFINE(VCPU_SCTLR,   offsetof(struct kvm_vcpu, arch.cp15.c1_SCTLR));
   DEFINE(VCPU_CPACR,   offsetof(struct kvm_vcpu, arch.cp15.c1_CPACR));
   DEFINE(VCPU_TTBR0,   offsetof(struct kvm_vcpu, arch.cp15.c2_TTBR0));
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6e384e2..9c5c38e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -32,6 +32,7 @@
 #include <asm/ptrace.h>
 #include <asm/mman.h>
 #include <asm/tlbflush.h>
+#include <asm/cputype.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
@@ -270,6 +271,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 			[sctlr] "=r" (sctlr));
 	vcpu->arch.cp15.c1_SCTLR = sctlr & ~1U;
 
+	/* Compute guest MPIDR */
+	vcpu->arch.cp15.c0_MPIDR = (read_cpuid_mpidr() & ~0xff) | vcpu->vcpu_id;
+
return 0;
 }
 
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index d516bf4..fbc26ca 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -245,6 +245,10 @@ ENTRY(__kvm_vcpu_run)
ldr r1, [r0, #VCPU_MIDR]
mcr p15, 4, r1, c0, c0, 0
 
+   @ Write guest view of MPIDR into VMPIDR
+   ldr r1, [r0, #VCPU_MPIDR]
+   mcr p15, 4, r1, c0, c0, 5
+
@ Load guest registers
add r0, r0, #(VCPU_USR_SP)
load_mode_state r0, usr
@@ -291,6 +295,10 @@ __kvm_vcpu_return:
mrc p15, 0, r2, c0, c0, 0
mcr p15, 4, r2, c0, c0, 0
 
+   @ Back to hardware MPIDR
+   mrc p15, 0, r2, c0, c0, 5
+   mcr p15, 4, r2, c0, c0, 5
+
@ Set VMID == 0
mov r2, #0
mov r3, #0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 13/13] ARM: KVM: Support SMP guests

2011-12-11 Thread Christoffer Dall
This patch is a first attempt to support SMP guests. So far we only
add locking for the second stage PGD stored on the kvm_arch struct.

WARNING: This code is untested and does not yet support SMP guests.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_host.h |   12 ++--
 arch/arm/kvm/arm.c  |1 +
 arch/arm/kvm/mmu.c  |   57 +--
 3 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 7fcc412..555a6f1 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -31,9 +31,15 @@ struct kvm_vcpu;
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 
 struct kvm_arch {
-	u32    vmid;	/* The VMID used for the virt. memory system */
-	pgd_t *pgd;	/* 1-level 2nd stage table */
-	u64    vttbr;	/* VTTBR value associated with above pgd and vmid */
+	/* The VMID used for the virt. memory system */
+	u32    vmid;
+
+	/* 1-level 2nd stage table and lock */
+	struct mutex pgd_mutex;
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64    vttbr;
 };
 
 #define EXCEPTION_NONE  0
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9c5c38e..14ccc4d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -119,6 +119,7 @@ int kvm_arch_init_vm(struct kvm *kvm)
ret = kvm_alloc_stage2_pgd(kvm);
if (ret)
goto out_fail_alloc;
+	mutex_init(&kvm->arch.pgd_mutex);
 
 	pgd_phys = virt_to_phys(kvm->arch.pgd);
 	kvm->arch.vttbr = pgd_phys & ((1LLU << 40) - 1) & ~((2 << VTTBR_X) - 1);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 50c9571..baeb8a1 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -177,6 +177,9 @@ out:
  * Allocates the 1st level table only of size defined by PGD2_ORDER (can
  * support either full 40-bit input addresses or limited to 32-bit input
  * addresses). Clears the allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
  */
 int kvm_alloc_stage2_pgd(struct kvm *kvm)
 {
@@ -204,6 +207,9 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
  * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
  * underlying level-2 and level-3 tables before freeing the actual level-1 table
  * and setting the struct pointer to NULL.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * destroyed, which can only be done once.
  */
 void kvm_free_stage2_pgd(struct kvm *kvm)
 {
@@ -239,49 +245,38 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 	kvm->arch.pgd = NULL;
 }
 
-static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  gfn_t gfn, struct kvm_memory_slot *memslot)
+static int __user_mem_abort(struct kvm *kvm, phys_addr_t addr, pfn_t pfn)
 {
-	pfn_t pfn;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte, new_pte;
 
-	pfn = gfn_to_pfn(vcpu->kvm, gfn);
-
-	if (is_error_pfn(pfn)) {
-		kvm_err(-EFAULT, "Guest gfn %u (0x%08lx) does not have "
-				"corresponding host mapping",
-				gfn, gfn << PAGE_SHIFT);
-		return -EFAULT;
-	}
-
 	/* Create 2nd stage page table mapping - Level 1 */
-	pgd = vcpu->kvm->arch.pgd + pgd_index(fault_ipa);
-	pud = pud_offset(pgd, fault_ipa);
+	pgd = kvm->arch.pgd + pgd_index(addr);
+	pud = pud_offset(pgd, addr);
 	if (pud_none(*pud)) {
-		pmd = pmd_alloc_one(NULL, fault_ipa);
+		pmd = pmd_alloc_one(NULL, addr);
 		if (!pmd) {
 			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pmd");
 			return -ENOMEM;
 		}
 		pud_populate(NULL, pud, pmd);
-		pmd += pmd_index(fault_ipa);
+		pmd += pmd_index(addr);
 	} else
-		pmd = pmd_offset(pud, fault_ipa);
+		pmd = pmd_offset(pud, addr);
 
 	/* Create 2nd stage page table mapping - Level 2 */
 	if (pmd_none(*pmd)) {
-		pte = pte_alloc_one_kernel(NULL, fault_ipa);
+		pte = pte_alloc_one_kernel(NULL, addr);
 		if (!pte) {
 			kvm_err(-ENOMEM, "Cannot allocate 2nd stage pte");
 			return -ENOMEM;
 		}
 		pmd_populate_kernel(NULL, pmd, pte);
-		pte += pte_index(fault_ipa);
+		pte += pte_index(addr);
 	} else
-		pte = pte_offset_kernel(pmd, fault_ipa);
+		pte = pte_offset_kernel(pmd, addr);
 
 	/* Create 2nd stage page table mapping - Level 3 */
 	new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
@@ -290,6 +285,28 @@ static int 

[PATCH] kvm tools: Add NMI ability to 'kvm debug'

2011-12-11 Thread Sasha Levin
This allows triggering NMI on guests using 'kvm debug -m [cpu]'.

Please note that the default behaviour of 'kvm debug' dumping guest's cpu
state has been modified to require a '-d'/--dump.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-debug.c|   22 +++
 tools/kvm/builtin-run.c  |   16 +++-
 tools/kvm/include/kvm/builtin-debug.h|   11 
 tools/kvm/include/kvm/kvm-cpu.h  |1 +
 tools/kvm/kvm-cpu.c  |5 +++
 tools/kvm/x86/include/kvm/kvm-cpu-arch.h |1 +
 tools/kvm/x86/kvm-cpu.c  |   41 ++
 7 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/builtin-debug.c b/tools/kvm/builtin-debug.c
index 045dc2c..eee26c3 100644
--- a/tools/kvm/builtin-debug.c
+++ b/tools/kvm/builtin-debug.c
@@ -14,13 +14,10 @@
 
 static bool all;
 static int instance;
+static int nmi = -1;
+static bool dump;
 static const char *instance_name;
 
-struct debug_cmd {
-   u32 type;
-   u32 len;
-};
-
 static const char * const debug_usage[] = {
 	"kvm debug [--all] [-n name]",
 	NULL
@@ -30,6 +27,8 @@ static const struct option debug_options[] = {
 	OPT_GROUP("General options:"),
 	OPT_BOOLEAN('a', "all", &all, "Debug all instances"),
 	OPT_STRING('n', "name", &instance_name, "name", "Instance name"),
+	OPT_BOOLEAN('d', "dump", &dump, "Generate a debug dump from guest"),
+	OPT_INTEGER('m', "nmi", &nmi, "Generate NMI on VCPU"),
 	OPT_END()
 };
 
@@ -51,13 +50,24 @@ void kvm_debug_help(void)
 static int do_debug(const char *name, int sock)
 {
 	char buff[BUFFER_SIZE];
-	struct debug_cmd cmd = {KVM_IPC_DEBUG, 0};
+	struct debug_cmd cmd = {KVM_IPC_DEBUG, 2 * sizeof(u32)};
 	int r;
 
+	if (dump)
+		cmd.dbg_type |= KVM_DEBUG_CMD_TYPE_DUMP;
+
+	if (nmi != -1) {
+		cmd.dbg_type |= KVM_DEBUG_CMD_TYPE_NMI;
+		cmd.cpu = nmi;
+	}
+
 	r = xwrite(sock, &cmd, sizeof(cmd));
 	if (r < 0)
 		return r;
 
+	if (!dump)
+		return 0;
+
 	do {
 		r = xread(sock, buff, BUFFER_SIZE);
 		if (r < 0)
diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 7969901..7709edb 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -31,6 +31,7 @@
 #include "kvm/guest_compat.h"
 #include "kvm/pci-shmem.h"
 #include "kvm/kvm-ipc.h"
+#include "kvm/builtin-debug.h"
 
 #include <linux/types.h>
 
@@ -464,7 +465,7 @@ static void handle_sigusr1(int sig)
 	struct kvm_cpu *cpu = current_kvm_cpu;
 	int fd = kvm_cpu__get_debug_fd();
 
-	if (!cpu)
+	if (!cpu || cpu->needs_nmi)
 		return;
 
 	dprintf(fd, "\n #\n # vCPU #%ld's dump:\n #\n", cpu->cpu_id);
@@ -495,6 +496,19 @@ static void handle_pause(int fd, u32 type, u32 len, u8 *msg)
 static void handle_debug(int fd, u32 type, u32 len, u8 *msg)
 {
 	int i;
+	u32 dbg_type = *(u32 *)msg;
+	int vcpu = *(((u32 *)msg) + 1);
+
+	if (dbg_type & KVM_DEBUG_CMD_TYPE_NMI) {
+		if (vcpu >= kvm->nrcpus)
+			return;
+
+		kvm_cpus[vcpu]->needs_nmi = 1;
+		pthread_kill(kvm_cpus[vcpu]->thread, SIGUSR1);
+	}
+
+	if (!(dbg_type & KVM_DEBUG_CMD_TYPE_DUMP))
+		return;
 
 	for (i = 0; i < nrcpus; i++) {
 		struct kvm_cpu *cpu = kvm_cpus[i];
diff --git a/tools/kvm/include/kvm/builtin-debug.h b/tools/kvm/include/kvm/builtin-debug.h
index 3fc2469..b24b501 100644
--- a/tools/kvm/include/kvm/builtin-debug.h
+++ b/tools/kvm/include/kvm/builtin-debug.h
@@ -1,6 +1,17 @@
 #ifndef KVM__DEBUG_H
 #define KVM__DEBUG_H
 
+#include <linux/types.h>
+
+struct debug_cmd {
+	u32 type;
+	u32 len;
+	u32 dbg_type;
+#define KVM_DEBUG_CMD_TYPE_DUMP	(1 << 0)
+#define KVM_DEBUG_CMD_TYPE_NMI	(1 << 1)
+	u32 cpu;
+};
+
 int kvm_cmd_debug(int argc, const char **argv, const char *prefix);
 void kvm_debug_help(void);
 
diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 15618f1..d4448f6 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -19,5 +19,6 @@ void kvm_cpu__set_debug_fd(int fd);
 void kvm_cpu__show_code(struct kvm_cpu *vcpu);
 void kvm_cpu__show_registers(struct kvm_cpu *vcpu);
 void kvm_cpu__show_page_tables(struct kvm_cpu *vcpu);
+void kvm_cpu__arch_nmi(struct kvm_cpu *cpu);
 
 #endif /* KVM__KVM_CPU_H */
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 884a89f..8ec4efa 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -94,6 +94,11 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
 			cpu->paused = 0;
 		}
 
+		if (cpu->needs_nmi) {
+			kvm_cpu__arch_nmi(cpu);
+			cpu->needs_nmi = 0;
+		}
+
 		kvm_cpu__run(cpu);
 
   

Re: [PATCH v5 00/13] KVM/ARM Implementation

2011-12-11 Thread Peter Maydell
On 11 December 2011 10:24, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 The following series implements KVM support for ARM processors,
 specifically on the Cortex A-15 platform.

 Still on the to-do list:
  - Reuse VMIDs
  - Fix SMP host support
  - Fix SMP guest support
  - Support guest Thumb mode for MMIO emulation
  - Further testing
  - Performance improvements

Other items for this list:
 - Support Neon/VFP in guests (the fpu regs struct is empty ATM)
 - Support guest debugging

I couldn't see any support for the TLS registers in your cp15 emulation:
did I miss it, or do we handle it without needing to trap?

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] virtio: use mandatory barriers for remote processor vdevs

2011-12-11 Thread Michael S. Tsirkin
On Sat, Dec 03, 2011 at 03:44:36PM +1030, Rusty Russell wrote:
 On Sat, 03 Dec 2011 10:09:44 +1100, Benjamin Herrenschmidt 
 b...@kernel.crashing.org wrote:
  On Tue, 2011-11-29 at 14:31 +0200, Ohad Ben-Cohen wrote:
   A trivial, albeit sub-optimal, solution would be to simply revert
   commit d57ed95 virtio: use smp_XX barriers on SMP. Obviously, though,
   that's going to have a negative impact on performance of SMP-based
   virtualization use cases.
  
  Have you measured the impact of using normal barriers (non-SMP ones)
  like we use on normal HW drivers unconditionally ?
  
  IE. If the difference is small enough I'd say just go for it and avoid
  the bloat.
 
 Yep.  Plan is:
 1) Measure the difference.
 2) Difference unmeasurable?  Use normal barriers (ie. revert d57ed95).
 3) Difference small?  Revert d57ed95 for 3.2, revisit for 3.3.
 4) Difference large?  Runtime switch based on if you're PCI for 3.2,
revisit for 3.3.
 
 Cheers,
 Rusty.

Forwarding some results from Amos, who ran multiple netperf streams in
parallel, from an external box to the guest.  TCP_STREAM results were
noisy.  This could be due to buffering done by TCP, where packet size
varies even as message size is constant.

TCP_RR results were consistent. In this benchmark, after switching
to mandatory barriers, CPU utilization increased by up to 35% while
throughput went down by up to 14%. The normalized throughput/cpu
regressed consistently, by between 7 and 35%.
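
For anyone reading the tables below: the normalize column appears to be throughput divided by cpu, truncated to an integer, and each % row is the relative change from file 1 (old) to file 2 (fixed). A minimal sketch of that arithmetic, assuming this reading of the columns is right; the helper names are made up for illustration and are not taken from the netperf scripts:

```c
#include <assert.h>

/*
 * Normalized throughput: throughput per unit of CPU utilization.
 * Truncation to an integer is an assumption inferred from the table values.
 */
static long normalize(double throughput, double cpu)
{
	assert(cpu > 0.0);
	return (long)(throughput / cpu);
}

/* Relative change from the old value to the new one, in percent. */
static double pct_change(double old_val, double new_val)
{
	assert(old_val != 0.0);
	return (new_val - old_val) * 100.0 / old_val;
}
```

With the TCP_STREAM rows for one session and size 256, normalize(2654.52, 11.41) gives 232 and normalize(2632.22, 20.32) gives 129, and pct_change(232, 129) is roughly -44.4, which matches the % row for that pair.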

The fix applied was simply this:

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3198f2e..fdccb77 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -23,7 +23,7 @@
 
 /* virtio guest is communicating with a virtual device that actually runs on
  * a host processor.  Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
+#if 0
 /* Where possible, use SMP barriers which are more lightweight than mandatory
  * barriers, because mandatory barriers control MMIO effects on accesses
  * through relaxed memory I/O windows (which virtio does not use). */



-- 
MST
Fri Dec  9 23:57:33 2011

1 - old-exhost_guest.txt
2 - fixed-exhost_guest.txt

==
TCP_STREAM
  sessions|      size|throughput|       cpu| normalize|  #tx-pkts|  #rx-pkts| #re-trans|  #tx-intr|  #rx-intr|  #io_exit|  #irq_inj|#tpkt/#exit| #rpkt/#irq
1        1|        64|    949.64|     10.64|        89|   1170134|   1368739|         0|        17|    487392|    488820|    504716|       2.39|       2.71
2        1|        64|    946.03|     10.87|        87|   1119582|   1325851|         0|        17|    493763|    485865|    516161|       2.30|       2.57
%         |       0.0|      -0.4|      +2.2|      -2.2|      -4.3|      -3.1|         0|       0.0|      +1.3|      -0.6|      +2.3|       -3.8|       -5.2
1        2|        64|   1877.15|     15.45|       121|   2151267|   2561929|         0|        33|    923916|    971093|    969360|       2.22|       2.64
2        2|        64|   1867.63|     15.06|       124|   2212457|   2607606|         0|        33|    836160|    927721|    883964|       2.38|       2.95
%         |       0.0|      -0.5|      -2.5|      +2.5|      +2.8|      +1.8|         0|       0.0|      -9.5|      -4.5|      -8.8|       +7.2|      +11.7
1        4|        64|   3577.38|     19.62|       182|   4176151|   5036661|         0|        64|   1677417|   1412979|   1859101|       2.96|       2.71
2        4|        64|   3583.17|     20.05|       178|   4215327|   5063534|         0|        65|   1682582|   1549394|   1759033|       2.72|       2.88
%         |       0.0|      +0.2|      +2.2|      -2.2|      +0.9|      +0.5|         0|      +1.6|      +0.3|      +9.7|      -5.4|       -8.1|       +6.3
1        1|       256|   2654.52|     11.41|       232|    925787|   1029214|         0|        14|    597763|    670927|    619414|       1.38|       1.66
2        1|       256|   2632.22|     20.32|       129|    977446|   1036094|         0|        15|    742699|    715460|    764512|       1.37|       1.36
%         |       0.0|      -0.8|     +78.1|     -44.4|      +5.6|      +0.7|         0|      +7.1|     +24.2|      +6.6|     +23.4|       -0.7|      -18.1
1        2|       256|   5228.76|     16.94|       308|   1949442|   2082492|         0|        30|   1230329|   1323945|   1274262|       1.47|       1.63
2        2|       256|   5140.98|     19.58|       262|   1991090|   2093206|         0|        30|   1400232|   1271363|   1441564|       1.57|       1.45
%         |       0.0|      -1.7|     +15.6|     -14.9|      +2.1|      +0.5|         0|       0.0|     +13.8|      -4.0|     +13.1|       +6.8|      -11.0
1        4|       256|   9412.61|     24.04|       391|   2292404|   2351356|         0|        35|   1669864|    555786|   1741742|       4.12|       1.35
2        4|       256|   9408.92|     22.80|       412|   2303267|

Re: [PATCH 0/11] RFC: PCI using capabilitities

2011-12-11 Thread Michael S. Tsirkin
On Sun, Dec 11, 2011 at 12:03:52PM +0200, Sasha Levin wrote:
 On Sun, 2011-12-11 at 11:05 +0200, Avi Kivity wrote:
  mmios are strictly ordered.
  
  Perhaps your printfs are reordered by buffering?  Are they from
  different threads?  Are you using coalesced mmio (which is still
  strictly ordered, if used correctly)? 
 
 I print the queue_selector and queue_address in the printfs, even if
 printfs were reordered they would be printing the data right, unlike
 they do now. It's the data in the printfs that matters, not their order.
 
 Same vcpu thread with both accesses.
 
 Not using coalesced mmio.

Not sure why this would matter, but is the BAR a prefetchable one?
Rusty's patch uses pci_iomap which maps a prefetchable BAR
as cacheable.


 -- 
 
 Sasha.


Re: [PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host

2011-12-11 Thread Dor Laor

On 12/07/2011 04:41 PM, Avi Kivity wrote:

On 12/05/2011 10:18 PM, Eric B Munson wrote:

Changes from V4:
Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt

Changes from V3:
Include CC's on patch 3
Drop clear flag ioctl and have the watchdog clear the flag when it is reset

Changes from V2:
New kvm functions are defined in kvm_para.h; the only change to pvclock is the
initial flag definition

Changes from V1:
(Thanks Marcelo)
Host code has all been moved to arch/x86/kvm/x86.c
KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED

When a guest kernel is stopped by the host hypervisor it can look like a soft
lockup to the guest kernel.  This false warning can mask later soft lockup
warnings which may be real.  This patch series adds a method for a host
hypervisor to communicate to a guest kernel that it is being stopped.  The
final patch in the series has the watchdog check this flag when it goes to
issue a soft lockup warning and skip the warning if the guest knows it was
stopped.
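
The watchdog check described above can be sketched roughly as follows. This is only an illustration of the idea and not the code from the series: `struct guest_clock`, `GUEST_PAUSED_FLAG` and `should_warn_soft_lockup()` are hypothetical names, and the flag layout is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit the host sets when it stops the guest. */
#define GUEST_PAUSED_FLAG (1u << 0)

/* Hypothetical stand-in for the per-vcpu clock area shared with the host. */
struct guest_clock {
	uint8_t flags;
};

/*
 * Called when the watchdog is about to report a soft lockup.
 * Returns false (skip the warning) if the host marked the guest as
 * paused, and clears the flag so that a later, real lockup is still
 * reported.
 */
static bool should_warn_soft_lockup(struct guest_clock *clk)
{
	assert(clk);

	if (clk->flags & GUEST_PAUSED_FLAG) {
		clk->flags &= (uint8_t)~GUEST_PAUSED_FLAG;
		return false;
	}
	return true;
}
```

Clearing the flag in the same place that consumes it mirrors the V3 change note (the watchdog clears the flag rather than a separate ioctl doing so).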

It was attempted to solve this in Qemu, but the side effects of saving and
restoring the clock and tsc for each vcpu put the wall clock of the guest behind
by the amount of time of the pause.  This forces a guest to have ntp running
in order to keep the wall clock accurate.


Guests need to run NTP regardless: not only does the virtualization layer add
some skew, the physical world is not that perfect either.
btw: a traditional NTP client won't sync the time automatically if the
diff is > 0.5%.




Having this controlled from userspace means it doesn't work for SIGSTOP
or for long scheduling delays.  What about doing this automatically
based on preempt notifiers?




Isn't it solved by steal time?


Re: [PATCH 0/11] RFC: PCI using capabilitities

2011-12-11 Thread Michael S. Tsirkin
On Thu, Dec 08, 2011 at 05:37:37PM +0200, Sasha Levin wrote:
 On Thu, 2011-12-08 at 20:52 +1030, Rusty Russell wrote:
  Here's the patch series I ended up with.  I haven't coded up the QEMU
  side yet, so no idea if the new driver works.
  
  Questions:
  (1) Do we win from separating ISR, NOTIFY and COMMON?
  (2) I used a u8 bar; should I use a bir and pack it instead?  BIR
  seems a little obscure (noone else in the kernel source seems to
  refer to it).
 
 I started implementing it for KVM tools, when I noticed a strange thing:
 my vq creating was failing because the driver was reading a value other
 than 0 from the address field of a new vq, and failing.
 
 I've added simple prints in the usermode code, and saw the following
 ordering:
 
 1. queue select vq 0
 2. queue read address (returns 0 - new vq)
 3. queue write address (good address of vq)
 4. queue read address (returns !=0, fails)
 5. queue select vq 1
 
 From that I understood that the ordering is wrong, the driver was trying
 to read address before selecting the correct vq.
 
 At that point, I've added simple prints to the driver. Initially it
 looked as follows:
 
   iowrite16(index, &vp_dev->common->queue_select);
 
   switch (ioread64(&vp_dev->common->queue_address)) {
   [...]
   };
 
 So I added prints before the iowrite16() and after the ioread64(), and
 saw that while the driver prints were ordered, the device ones weren't:
 
   [1.264052] before iowrite index=1
   kvmtool: net returning pfn (vq=0): 310706176
   kvmtool: queue selected: 1
   [1.264890] after ioread index=1
 
 Suspecting that something was wrong with ordering, I've added a print
 between the iowrite and the ioread, and it finally started working well.
 
 Which leads me to the question: Are MMIO vs MMIO reads/writes not
 ordered?

First, I'd like to answer your questions from the PCI side.
Look for the PCI ordering rules in the PCI spec.
You will notice that a write is required to be able to
pass a read request. It might also pass a read completion.
A read request will not pass a write request.
There's more or less no ordering between different types of transactions
(memory versus io/configuration).

That's wrt the question you asked.

But this is not your setup: you have a single vcpu so
you will not initiate a write (select vq) until you get
a read completion.

So what you are really describing is this setup: guest reads a value,
gets the response, then writes out another one, and kvm tool reports the
write before the read.


 -- 
 
 Sasha.


Re: [PATCH 0/11] RFC: PCI using capabilitities

2011-12-11 Thread Sasha Levin
On Sun, 2011-12-11 at 14:30 +0200, Michael S. Tsirkin wrote:
 On Sun, Dec 11, 2011 at 12:03:52PM +0200, Sasha Levin wrote:
  On Sun, 2011-12-11 at 11:05 +0200, Avi Kivity wrote:
   mmios are strictly ordered.
   
   Perhaps your printfs are reordered by buffering?  Are they from
   different threads?  Are you using coalesced mmio (which is still
   strictly ordered, if used correctly)? 
  
  I print the queue_selector and queue_address in the printfs, even if
  printfs were reordered they would be printing the data right, unlike
  they do now. It's the data in the printfs that matters, not their order.
  
  Same vcpu thread with both accesses.
  
  Not using coalesced mmio.
 
 Not sure why this would matter, but is the BAR a prefetchable one?
 Rusty's patch uses pci_iomap which maps a prefetchable BAR
 as cacheable.

Wasn't defined as prefetchable, but I'm seeing same thing with or
without it.

-- 

Sasha.



Re: [PATCH 0/11] RFC: PCI using capabilitities

2011-12-11 Thread Sasha Levin
On Sun, 2011-12-11 at 14:47 +0200, Michael S. Tsirkin wrote:
 First, I'd like to answer your questions from the PCI side.
 Look for the PCI ordering rules in the PCI spec.
 You will notice that a write is required to be able to
 pass a read request. It might also pass a read completion.
 A read request will not pass a write request.
 There's more or less no ordering between different types of transactions
 (memory versus io/configuration).
 
 That's wrt the question you asked.
 
 But this is not your setup: you have a single vcpu so
 you will not initiate a write (select vq) until you get
 a read completion.
 
 So what you are really describing is this setup: guest reads a value,
 gets the response, then writes out another one, and kvm tool reports the
 write before the read. 

No, it's exactly the opposite. Guest writes a value first and then reads
one (writes queue_select and reads queue_address) and kvm tool is reporting
the read before the write.

I must add here that the kvm tool doesn't do anything fancy with simple
IO/MMIO. There are no thread games or anything similar there. The vcpu
thread is doing all the IO/MMIO work.
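
To make the expected ordering concrete, here is a minimal sketch of the select-then-read register pair as a device model would see it. It is not the kvm tool code: `struct vdev_model` and the `model_*()` helpers are made-up names, and the queue count is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define NR_QUEUES 2	/* arbitrary, for illustration only */

/* Hypothetical device-side state behind the common config window. */
struct vdev_model {
	uint16_t queue_select;			/* currently selected vq */
	uint64_t queue_address[NR_QUEUES];	/* 0 means "vq not set up yet" */
};

/* Guest wrote queue_select. */
static void model_write_select(struct vdev_model *d, uint16_t index)
{
	assert(index < NR_QUEUES);
	d->queue_select = index;
}

/* Guest wrote queue_address for the currently selected vq. */
static void model_write_address(struct vdev_model *d, uint64_t addr)
{
	d->queue_address[d->queue_select] = addr;
}

/* Guest read queue_address: the answer depends on the last select seen. */
static uint64_t model_read_address(const struct vdev_model *d)
{
	return d->queue_address[d->queue_select];
}
```

Once vq 0 has been set up, a read that is handled before the select of vq 1 returns vq 0's (non-zero) address, which is the failure the driver hits. Handled in program order, the same read returns 0 for the new vq.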

-- 

Sasha.



Re: [PATCHv3 00/10] KVM in-guest performance monitoring

2011-12-11 Thread Avi Kivity
On 11/10/2011 02:57 PM, Gleb Natapov wrote:
 This patchset exposes an emulated version 2 architectural performance
 monitoring unit to KVM guests.  The PMU is emulated using perf_events,
 so the host kernel can multiplex host-wide, host-user, and the
 guest on available resources.

 The patches are against next branch on kvm.git.



Thanks, applied.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 08/10] nEPT: Nested INVEPT

2011-12-11 Thread Nadav Har'El
On Thu, Nov 10, 2011, Avi Kivity wrote about "Re: [PATCH 08/10] nEPT: Nested INVEPT":
 On 11/10/2011 12:01 PM, Nadav Har'El wrote:
  If we let L1 use EPT, we should probably also support the INVEPT 
  instruction.
..
  +	if (vmcs12 && nested_cpu_has_ept(vmcs12) &&
  +	    (vmcs12->ept_pointer == operand.eptp) &&
  +	    vmx->nested.last_eptp02)
  +		ept_sync_context(vmx->nested.last_eptp02);
  +	else
  +		ept_sync_global();
 
 Are either of these needed?  Won't a write to a shadowed EPT table cause
 them anyway?

This is a very good point... You're right that as it stands, any changes
to the guest EPT table (EPT12) will cause changes to the shadow EPT
table (EPT02), and these already cause KVM to do an INVEPT, so there is no
point in doing this again when the guest asks.  So basically, I can have INVEPT
emulated by doing absolutely nothing (after performing all the checks), right?

I wonder if I am missing any reason why a hypervisor might want to do
INVEPT without changing the EPT12 table first.

-- 
Nadav Har'El|Sunday, Dec 11 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Why do programmers mix up Christmas and
http://nadav.harel.org.il   |Halloween? Because DEC 25 = OCT 31


Re: [PATCH 08/10] nEPT: Nested INVEPT

2011-12-11 Thread Avi Kivity
On 12/11/2011 04:24 PM, Nadav Har'El wrote:
 On Thu, Nov 10, 2011, Avi Kivity wrote about "Re: [PATCH 08/10] nEPT: Nested INVEPT":
  On 11/10/2011 12:01 PM, Nadav Har'El wrote:
   If we let L1 use EPT, we should probably also support the INVEPT 
   instruction.
 ..
   +	if (vmcs12 && nested_cpu_has_ept(vmcs12) &&
   +	    (vmcs12->ept_pointer == operand.eptp) &&
   +	    vmx->nested.last_eptp02)
   +		ept_sync_context(vmx->nested.last_eptp02);
   +	else
   +		ept_sync_global();
  
  Are either of these needed?  Won't a write to a shadowed EPT table cause
  them anyway?

 This is a very good point... You're right that as it stands, any changes
 to the guest EPT table (EPT12) will cause changes to the shadow EPT
 table (EPT02), and these already cause KVM to do an INVEPT, so there is no
 point in doing this again when the guest asks.  So basically, I can have INVEPT
 emulated by doing absolutely nothing (after performing all the checks), right?

Right.  This was the case for INVLPG before we added out-of-sync pages;
we didn't even intercept the instruction.

 I wonder if I am missing any reason why a hypervisor might want to do
 INVEPT without changing the EPT12 table first.

Shouldn't happen, but why do you care?  If EPT12 has not changed, any
access through EPT02 or its TLB entry is valid.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Jan Kiszka
Just found two, maybe three nits while browsing by:

On 2011-12-11 11:24, Christoffer Dall wrote:
 Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
 This ioctl is used since the semantics are in fact two lines that can be
 either raised or lowered on the VCPU - the IRQ and FIQ lines.
 
 KVM needs to know which VCPU it must operate on and whether the FIQ or
 IRQ line is raised/lowered. Hence both pieces of information are packed
 in the kvm_irq_level->irq field. The irq field value will be:
   IRQ: vcpu_index * 2
   FIQ: (vcpu_index * 2) + 1
 
 This is documented in Documentation/virtual/kvm/api.txt.
 
 The effect of the ioctl is simply to raise/lower the
 corresponding virt_irq field on the VCPU struct, which will cause the
 world-switch code to raise/lower virtual interrupts when running the
 guest on next switch. The wait_for_interrupt flag is also cleared for
 raised IRQs causing an idle VCPU to become active again.
 
 Note: The custom trace_kvm_irq_line is used despite a generic definition of
 trace_kvm_set_irq, since the trace_kvm_set_irq depends on the x86-specific
 define of __HAVE_IOAPIC. Either the trace event should be created
 regardless of this define or it should depend on another ifdef clause,
 common for both x86 and ARM. However, since the arguments don't really
 match those used in ARM, I am yet to be convinced why this is necessary.
 
 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  Documentation/virtual/kvm/api.txt |   10 ++-
  arch/arm/include/asm/kvm.h|8 ++
  arch/arm/include/asm/kvm_arm.h|1 +
  arch/arm/kvm/arm.c|   53 
 -
  arch/arm/kvm/trace.h  |   21 +++
  include/linux/kvm.h   |1 +
  6 files changed, 91 insertions(+), 3 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 7945b0b..4abaa67 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -572,7 +572,7 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
  4.25 KVM_IRQ_LINE
  
  Capability: KVM_CAP_IRQCHIP
 -Architectures: x86, ia64
 +Architectures: x86, ia64, arm
  Type: vm ioctl
  Parameters: struct kvm_irq_level
  Returns: 0 on success, -1 on error
 @@ -582,6 +582,14 @@ Requires that an interrupt controller model has been 
 previously created with
  KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
  to be set to 1 and then back to 0.
  
 +KVM_CREATE_IRQCHIP (except for ARM).  Note that edge-triggered interrupts
 +require the level to be set to 1 and then back to 0.

You probably wanted to replace the original lines with these two, no?

 +
 +ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the
 +irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
 +FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for
 +convenience macros.
 +
  struct kvm_irq_level {
   union {
   __u32 irq; /* GSI */
 diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
 index 87dc33b..8935062 100644
 --- a/arch/arm/include/asm/kvm.h
 +++ b/arch/arm/include/asm/kvm.h
 @@ -20,6 +20,14 @@
  #include <asm/types.h>
  
  /*
 + * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
 + */
 +enum KVM_ARM_IRQ_LINE_TYPE {
 + KVM_ARM_IRQ_LINE = 0,
 + KVM_ARM_FIQ_LINE = 1,
 +};
 +
 +/*
   * Modes used for short-hand mode determinition in the world-switch code and
   * in emulation code.
   *
 diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
 index 835abd1..e378a37 100644
 --- a/arch/arm/include/asm/kvm_arm.h
 +++ b/arch/arm/include/asm/kvm_arm.h
 @@ -49,6 +49,7 @@
  #define HCR_VM   1
  #define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
   HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
 +#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
  
  /* Hyp System Control Register (HSCTLR) bits */
  #define HSCTLR_TE	(1 << 30)
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 89ba18d..fc0bd6b 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -299,6 +299,43 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   return -EINVAL;
  }
  
 +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 +				      struct kvm_irq_level *irq_level)
 +{
 +	u32 mask;
 +	unsigned int vcpu_idx;
 +	struct kvm_vcpu *vcpu;
 +
 +	vcpu_idx = irq_level->irq / 2;
 +	if (vcpu_idx >= KVM_MAX_VCPUS)
 +		return -EINVAL;
 +
 +	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
 +	if (!vcpu)
 +		return -EINVAL;
 +
 +	switch (irq_level->irq % 2) {
 +	case KVM_ARM_IRQ_LINE:
 +		mask = HCR_VI;
 +		break;
 +	case KVM_ARM_FIQ_LINE:
 +		mask = HCR_VF;
 +		break;

Re: Current kernel fails to compile with KVM on PowerPC

2011-12-11 Thread Jörg Sommer
Alexander Graf wrote on Tue 22 Nov, 22:29 (+0100):
 On 22.11.2011, at 21:04, Jörg Sommer wrote:
  Jörg Sommer wrote on Mon 07 Nov, 20:48 (+0100):
  I'm trying to build the kernel with the git commit-id
  31555213f03bca37d2c02e10946296052f4ecfcd, but it fails
  
   CHK include/linux/version.h
   HOSTCC  scripts/mod/modpost.o
   CHK include/generated/utsrelease.h
   UPD include/generated/utsrelease.h
   HOSTLD  scripts/mod/modpost
   GEN include/generated/bounds.h
   CC  arch/powerpc/kernel/asm-offsets.s
  In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function 
  ‘compute_tlbie_rb’:
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: 
  ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: 
  each undeclared identifier is reported only once for each function it 
  appears in
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: 
  ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: 
  ‘HPTE_V_LARGE’ undeclared (first use in this function)
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: 
  warning: right shift count = width of type [enabled by default]
  make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Fehler 1
  make[2]: *** [prepare0] Fehler 2
  make[1]: *** [deb-pkg] Fehler 2
  make: *** [deb-pkg] Fehler 2
  
  I'm still having this problem. I can't build
  6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches to
  make the kernel build and not oops [1] on PowerPC?
 
 The failures above should be fixed by now.

I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git
(a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain
a suitable commit. Where can I find it?

Bye, Jörg.
-- 
 I hardly know OpenBSD; what are its
 advantages over Linux and iptables?
The foxtail effect is bigger. :-
Message-ID: slrnb11064.54g.hsch...@humbert.ddns.org




Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Peter Maydell
On 11 December 2011 15:18, Jan Kiszka jan.kis...@web.de wrote:
 Just found two, maybe three nits while browsing by:

 On 2011-12-11 11:24, Christoffer Dall wrote:
 +ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value of the
 +irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
 +FIQs.

This seems to me a slightly obscure way of defining the two fields
in this word (ie bits [31..1] cpu number, bit [0] irq-vs-fiq).

 +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 +				      struct kvm_irq_level *irq_level)
 +{
 +	u32 mask;
 +	unsigned int vcpu_idx;
 +	struct kvm_vcpu *vcpu;
 +
 +	vcpu_idx = irq_level->irq / 2;
 +	if (vcpu_idx >= KVM_MAX_VCPUS)
 +		return -EINVAL;
 +
 +	vcpu = kvm_get_vcpu(kvm, vcpu_idx);
 +	if (!vcpu)
 +		return -EINVAL;
 +
 +	switch (irq_level->irq % 2) {
 +	case KVM_ARM_IRQ_LINE:
 +		mask = HCR_VI;
 +		break;
 +	case KVM_ARM_FIQ_LINE:
 +		mask = HCR_VF;
 +		break;
 +	default:
 +		return -EINVAL;

 Due to % 2, default is unreachable. Remove the masking?

Removing the mask would be wrong since the irq field here
is encoding both cpu number and irq-vs-fiq. The default is
just an unreachable condition. (Why are we using % here
rather than the obvious bit operation, incidentally?)
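
For illustration, the bitfield view of the same encoding could look like the sketch below. The helper and enum names are hypothetical and not part of the patch; only the layout (bit [0] selects IRQ vs FIQ, bits [31..1] hold the cpu number) comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Line types with the same values as the patch's enum: IRQ = 0, FIQ = 1. */
enum arm_irq_line_type {
	ARM_IRQ_LINE = 0,
	ARM_FIQ_LINE = 1,
};

/* Pack the vcpu index into bits [31..1] and the line type into bit [0]. */
static uint32_t irq_field_encode(uint32_t vcpu_idx, enum arm_irq_line_type type)
{
	assert(vcpu_idx <= (UINT32_MAX >> 1));
	return (vcpu_idx << 1) | (uint32_t)type;
}

/* Extract the vcpu index (equivalent to irq / 2). */
static uint32_t irq_field_vcpu(uint32_t irq)
{
	return irq >> 1;
}

/* Extract the line type (equivalent to irq % 2). */
static enum arm_irq_line_type irq_field_type(uint32_t irq)
{
	return (enum arm_irq_line_type)(irq & 1);
}
```

For example, the FIQ line of vcpu 3 encodes to 7, which is the same value as (VCPU_INDEX * 2) + 1 in the documentation text.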

-- PMM


Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Christoffer Dall
On Sun, Dec 11, 2011 at 10:18 AM, Jan Kiszka jan.kis...@web.de wrote:
 Just found two, maybe three nits while browsing by:

 On 2011-12-11 11:24, Christoffer Dall wrote:
 Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
 This ioctl is used since the semantics are in fact two lines that can be
 either raised or lowered on the VCPU - the IRQ and FIQ lines.

 KVM needs to know which VCPU it must operate on and whether the FIQ or
 IRQ line is raised/lowered. Hence both pieces of information are packed
 in the kvm_irq_level->irq field. The irq field value will be:
   IRQ: vcpu_index * 2
   FIQ: (vcpu_index * 2) + 1

 This is documented in Documentation/virtual/kvm/api.txt.

 The effect of the ioctl is simply to raise/lower the
 corresponding virt_irq field on the VCPU struct, which will cause the
 world-switch code to raise/lower virtual interrupts when running the
 guest on next switch. The wait_for_interrupt flag is also cleared for
 raised IRQs causing an idle VCPU to become active again.

 Note: The custom trace_kvm_irq_line is used despite a generic definition of
 trace_kvm_set_irq, since the trace_kvm_set_irq depends on the x86-specific
 define of __HAVE_IOAPIC. Either the trace event should be created
 regardless of this define or it should depend on another ifdef clause,
 common for both x86 and ARM. However, since the arguments don't really
 match those used in ARM, I am yet to be convinced why this is necessary.

 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  Documentation/virtual/kvm/api.txt |   10 ++-
  arch/arm/include/asm/kvm.h        |    8 ++
  arch/arm/include/asm/kvm_arm.h    |    1 +
  arch/arm/kvm/arm.c                |   53 
 -
  arch/arm/kvm/trace.h              |   21 +++
  include/linux/kvm.h               |    1 +
  6 files changed, 91 insertions(+), 3 deletions(-)

 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 7945b0b..4abaa67 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -572,7 +572,7 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
  4.25 KVM_IRQ_LINE

  Capability: KVM_CAP_IRQCHIP
 -Architectures: x86, ia64
 +Architectures: x86, ia64, arm
  Type: vm ioctl
  Parameters: struct kvm_irq_level
  Returns: 0 on success, -1 on error
 @@ -582,6 +582,14 @@ Requires that an interrupt controller model has been 
 previously created with
  KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
  to be set to 1 and then back to 0.

 +KVM_CREATE_IRQCHIP (except for ARM).  Note that edge-triggered interrupts
 +require the level to be set to 1 and then back to 0.

 You probably wanted to replace the original lines with these two, no?


ah yes, some stgit re-ordering artifact.

 +
 +ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value 
 of the
 +irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
 +FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h 
 for
 +convenience macros.
 +
  struct kvm_irq_level {
       union {
               __u32 irq;     /* GSI */
 diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
 index 87dc33b..8935062 100644
 --- a/arch/arm/include/asm/kvm.h
 +++ b/arch/arm/include/asm/kvm.h
 @@ -20,6 +20,14 @@
  #include <asm/types.h>

  /*
 + * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
 + */
 +enum KVM_ARM_IRQ_LINE_TYPE {
 +     KVM_ARM_IRQ_LINE = 0,
 +     KVM_ARM_FIQ_LINE = 1,
 +};
 +
 +/*
   * Modes used for short-hand mode determinition in the world-switch code and
   * in emulation code.
   *
 diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
 index 835abd1..e378a37 100644
 --- a/arch/arm/include/asm/kvm_arm.h
 +++ b/arch/arm/include/asm/kvm_arm.h
 @@ -49,6 +49,7 @@
  #define HCR_VM               1
  #define HCR_GUEST_MASK (HCR_TSC | HCR_TWE | HCR_TWI | HCR_VM | HCR_AMO | \
                       HCR_AMO | HCR_IMO | HCR_FMO | HCR_SWIO)
 +#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)

  /* Hyp System Control Register (HSCTLR) bits */
  #define HSCTLR_TE    (1 << 30)
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 89ba18d..fc0bd6b 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -299,6 +299,43 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
       return -EINVAL;
  }

 +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 +                                   struct kvm_irq_level *irq_level)
 +{
 +     u32 mask;
 +     unsigned int vcpu_idx;
 +     struct kvm_vcpu *vcpu;
 +
 +     vcpu_idx = irq_level->irq / 2;
 +     if (vcpu_idx >= KVM_MAX_VCPUS)
 +             return -EINVAL;
 +
 +     vcpu = kvm_get_vcpu(kvm, vcpu_idx);
 +     if (!vcpu)
 +             return -EINVAL;
 +
 +     switch (irq_level->irq % 2) {
 +     case KVM_ARM_IRQ_LINE:
 +             mask = HCR_VI;
 +  

Re: [PATCH v5 00/13] KVM/ARM Implementation

2011-12-11 Thread Christoffer Dall
On Sun, Dec 11, 2011 at 6:32 AM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 11 December 2011 10:24, Christoffer Dall
 c.d...@virtualopensystems.com wrote:
 The following series implements KVM support for ARM processors,
 specifically on the Cortex A-15 platform.

 Still on the to-do list:
  - Reuse VMIDs
  - Fix SMP host support
  - Fix SMP guest support
  - Support guest Thumb mode for MMIO emulation
  - Further testing
  - Performance improvements

 Other items for this list:
  - Support Neon/VFP in guests (the fpu regs struct is empty ATM)
  - Support guest debugging


ok, thanks, will add these to the list. I have a feeling it will keep
growing for a while :)

 I couldn't see any support for the TLS registers in your cp15 emulation:
 did I miss it, or do we handle it without needing to trap?

by TLS you mean the cp15, c13 registers (tid and friends?) If so, I
handle these in the world-switch code (look at read_cp15_state and
write_cp15_state).

otherwise, help me out on the acronym...

-Christoffer


Re: [PATCH v5 00/13] KVM/ARM Implementation

2011-12-11 Thread Peter Maydell
On 11 December 2011 19:23, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 by TLS you mean the cp15, c13 registers (tid and friends?) If so, I
 handle these in the world-switch code (look at read_cp15_state and
 write_cp15_state).

 otherwise, help me out on the acronym...

Yes, those are the ones (TLS == thread local storage). Thanks for
the pointer.

-- PMM


Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Christoffer Dall
On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell
peter.mayd...@linaro.org wrote:
 On 11 December 2011 15:18, Jan Kiszka jan.kis...@web.de wrote:
 Just found two, maybe three nits while browsing by:

 On 2011-12-11 11:24, Christoffer Dall wrote:
 +ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The value 
 of the
 +irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) 
 for
 +FIQs.

 This seems to me a slightly obscure way of defining the two fields
 in this word (ie bits [31..1] cpu number, bit [0] irq-vs-fiq).


Isn't that just personal preference? The other scheme was suggested by
Avi, and nobody else complained then, so I'd be inclined to just leave
it as is.

 +static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
 +                                   struct kvm_irq_level *irq_level)
 +{
 +     u32 mask;
 +     unsigned int vcpu_idx;
 +     struct kvm_vcpu *vcpu;
 +
 +     vcpu_idx = irq_level->irq / 2;
 +     if (vcpu_idx >= KVM_MAX_VCPUS)
 +             return -EINVAL;
 +
 +     vcpu = kvm_get_vcpu(kvm, vcpu_idx);
 +     if (!vcpu)
 +             return -EINVAL;
 +
 +     switch (irq_level->irq % 2) {
 +     case KVM_ARM_IRQ_LINE:
 +             mask = HCR_VI;
 +             break;
 +     case KVM_ARM_FIQ_LINE:
 +             mask = HCR_VF;
 +             break;
 +     default:
 +             return -EINVAL;

 Due to % 2, default is unreachable. Remove the masking?

 Removing the mask would be wrong since the irq field here
 is encoding both cpu number and irq-vs-fiq. The default is
 just an unreachable condition. (Why are we using % here
 rather than the obvious bit operation, incidentally?)

right, I will remove the default case.

I highly doubt that the difference in using a bitop will be measurably
more efficient, but if you feel strongly about it, I can change it to
a shift and bitwise and, which I assume is what you mean by the
obvious bit operation? I think my CS background speaks for using %,
but whatever.


Re: [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Peter Maydell
On 11 December 2011 19:30, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell
 peter.mayd...@linaro.org wrote:
 Removing the mask would be wrong since the irq field here
 is encoding both cpu number and irq-vs-fiq. The default is
 just an unreachable condition. (Why are we using % here
 rather than the obvious bit operation, incidentally?)

 right, I will remove the default case.

 I highly doubt that the difference in using a bitop will be measurably
 more efficient, but if you feel strongly about it, I can change it to
 a shift and bitwise and, which I assume is what you mean by the
 obvious bit operation? I think my CS background speaks for using %,
 but whatever.

Certainly the compiler ought to be able to figure out the
two are the same thing; I just think irq & 1 is more readable
than irq % 2 (because it's being clear that it's treating the
variable as a pile of bits rather than an integer). This is
bikeshedding rather, though, and style issues in kernel code
are a matter for the kernel folk. So you can ignore me :-)

-- PMM


[PATCH] kvm tools: Clean up LINT assignment code

2011-12-11 Thread Sasha Levin
Just set delivery mode directly without going through ugly casting.

This cleans up and simplifies the code.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/x86/kvm-cpu.c |   10 ++
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index 27b7a8f..cc1f560 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -81,18 +81,12 @@ static int kvm_cpu__set_lint(struct kvm_cpu *vcpu)
 {
 	struct kvm_lapic_state klapic;
 	struct local_apic *lapic = (void *)&klapic;
-	u32 lvt;
 
 	if (ioctl(vcpu->vcpu_fd, KVM_GET_LAPIC, &klapic))
 		return -1;
 
-	lvt = *(u32 *)&lapic->lvt_lint0;
-	lvt = SET_APIC_DELIVERY_MODE(lvt, APIC_MODE_EXTINT);
-	*(u32 *)&lapic->lvt_lint0 = lvt;
-
-	lvt = *(u32 *)&lapic->lvt_lint1;
-	lvt = SET_APIC_DELIVERY_MODE(lvt, APIC_MODE_NMI);
-	*(u32 *)&lapic->lvt_lint1 = lvt;
+	lapic->lvt_lint0.delivery_mode = APIC_MODE_EXTINT;
+	lapic->lvt_lint1.delivery_mode = APIC_MODE_NMI;
 
 	return ioctl(vcpu->vcpu_fd, KVM_SET_LAPIC, &klapic);
 }
-- 
1.7.8
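
For anyone comparing the two styles this patch trades between, here is a
minimal userspace sketch: a read-modify-write through a mask on the raw
32-bit LVT word, and a direct store to a delivery_mode bit-field. The
struct lvt_entry layout and both helper names are made up for illustration
(the real struct local_apic carries more fields), and the bit-field variant
assumes the compiler allocates bit-fields starting at the least significant
bit, as GCC and clang do on x86.

```c
#include <stdint.h>
#include <string.h>

/* Delivery mode encodings for a local APIC LVT entry (bits 8..10). */
#define APIC_MODE_NMI    0x4
#define APIC_MODE_EXTINT 0x7

/* Hypothetical, simplified view of one LVT register. */
struct lvt_entry {
	uint32_t vector        : 8;
	uint32_t delivery_mode : 3;
	uint32_t rest          : 21;
};

/* Mask style: clear bits 8..10 of the raw word, then OR in the new mode. */
static uint32_t set_mode_mask(uint32_t lvt, uint32_t mode)
{
	return (lvt & ~0x700u) | ((mode & 0x7u) << 8);
}

/* Bit-field style: let the compiler do the masking and shifting. */
static uint32_t set_mode_bitfield(uint32_t lvt, uint32_t mode)
{
	struct lvt_entry e;

	memcpy(&e, &lvt, sizeof(e));	/* avoids the pointer cast */
	e.delivery_mode = mode;
	memcpy(&lvt, &e, sizeof(lvt));
	return lvt;
}
```

On a little-endian GCC or clang build both helpers should produce the same
word; the bit-field form just makes the intent visible at the call site.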



Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Christoffer Dall
On Dec 11, 2011, at 2:48 PM, Peter Maydell peter.mayd...@linaro.org wrote:

 On 11 December 2011 19:30, Christoffer Dall
 c.d...@virtualopensystems.com wrote:
 On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell
 peter.mayd...@linaro.org wrote:
 Removing the mask would be wrong since the irq field here
 is encoding both cpu number and irq-vs-fiq. The default is
 just an unreachable condition. (Why are we using % here
 rather than the obvious bit operation, incidentally?)

 right, I will remove the default case.

 I highly doubt that the difference in using a bitop will be measurably
 more efficient, but if you feel strongly about it, I can change it to
 a shift and bitwise and, which I assume is what you mean by the
 obvious bit operation? I think my CS background speaks for using %,
 but whatever.

 Certainly the compiler ought to be able to figure out the
 two are the same thing; I just think irq & 1 is more readable
 than irq % 2 (because it's being clear that it's treating the
 variable as a pile of bits rather than an integer). This is
 bikeshedding rather, though, and style issues in kernel code
 are a matter for the kernel folk. So you can ignore me :-)

Well, if it was just irq & 1, then I hear you, but it would be (irq >>
cpu_idx) & 1 which I don't think is more clear.

But yes let's see what the kernel folks say.


Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Peter Maydell
On 11 December 2011 20:07, Christoffer Dall
christofferd...@christofferdall.dk wrote:
 Well, if it was just irq & 1, then I hear you, but it would be (irq >>
 cpu_idx) & 1 which I don't think is more clear.

Er, what? The fields are [31..1] CPU index and [0] irqtype,
right? So what you have now is:
 vcpu_idx = irq_level->irq / 2;
 irqtype = irq_level->irq % 2;
and the bitshifting equivalent is:
 vcpu_idx = irq_level->irq >> 1;
 irqtype = irq_level->irq & 1;
surely?

Shifting by the cpuindex is definitely wrong.
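
A small standalone sketch of the point being argued, in case it helps:
for an unsigned irq field, the division/modulo decode and the shift/mask
decode give identical results. The helper names below are hypothetical and
exist only for this illustration; only the two enum values come from the
patch under review.

```c
#include <stdint.h>

/* Line types, mirroring the enum in the patch under review. */
enum { KVM_ARM_IRQ_LINE = 0, KVM_ARM_FIQ_LINE = 1 };

/* Hypothetical helper: pack a VCPU index and a line type into the irq field. */
static uint32_t irq_field_encode(uint32_t vcpu_idx, uint32_t line_type)
{
	return vcpu_idx * 2 + line_type;
}

/* Arithmetic decode, as written in the patch. */
static uint32_t vcpu_idx_div(uint32_t irq)  { return irq / 2; }
static uint32_t line_type_mod(uint32_t irq) { return irq % 2; }

/* Bitwise decode: bits [31..1] are the VCPU index, bit [0] the line type. */
static uint32_t vcpu_idx_shift(uint32_t irq) { return irq >> 1; }
static uint32_t line_type_mask(uint32_t irq) { return irq & 1; }

/* Return 1 if both decodes agree for every irq value below limit. */
static int decodings_agree(uint32_t limit)
{
	uint32_t irq;

	for (irq = 0; irq < limit; irq++) {
		if (vcpu_idx_div(irq) != vcpu_idx_shift(irq) ||
		    line_type_mod(irq) != line_type_mask(irq))
			return 0;
	}
	return 1;
}
```

So VCPU 3 with the FIQ line encodes to 7 either way, and 7 decodes back to
(3, FIQ) under both forms. Which spelling reads better is the style
question; the values do not change.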

(Incidentally I fixed a bug in your QEMU-side code which wasn't
feeding this field to the kernel in the way it expects:

http://git.linaro.org/gitweb?p=qemu/qemu-linaro.git;a=commitdiff;h=2502ba067e795e48d346f9816fad45177ca64bca

Sorry, I should have posted that to the list. I'll do that now.)

-- PMM


Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Christoffer Dall
On Sun, Dec 11, 2011 at 3:25 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 11 December 2011 20:07, Christoffer Dall
 christofferd...@christofferdall.dk wrote:
 Well, if it was just irq & 1, then I hear you, but it would be (irq >>
 cpu_idx) & 1 which I don't think is more clear.

 Er, what? The fields are [31..1] CPU index and [0] irqtype,
 right? So what you have now is:
     vcpu_idx = irq_level->irq / 2;
     irqtype = irq_level->irq % 2;
 and the bitshifting equivalent is:
     vcpu_idx = irq_level->irq >> 1;
     irqtype = irq_level->irq & 1;
 surely?

 Shifting by the cpuindex is definitely wrong.

actually, that's not how the irq_level field is defined. If you look
in Documentation/virtual/kvm/api.txt:

ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The
value of the
irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h for
convenience macros.

also, in the kernel code the cpu_index is achieved by a simple integer
division by 2.

as I said, this was the proposal from the last round of reviews after
a lengthy discussion, so I stuck with that.

we should definitely fix either side, and the only sane argument is
that this is an irq_line field, so an index resembling an actual line
seems more semantically in line with the field purpose rather than a
bit encoding, but I am open to arguments and not married to the
current implementation.

 (Incidentally I fixed a bug in your QEMU-side code which wasn't
 feeding this field to the kernel in the way it expects:

 http://git.linaro.org/gitweb?p=qemu/qemu-linaro.git;a=commitdiff;h=2502ba067e795e48d346f9816fad45177ca64bca

 Sorry, I should have posted that to the list. I'll do that now.)


Re: [patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-11 Thread Heiko Carstens
On Sat, Dec 10, 2011 at 01:35:39PM +0100, Carsten Otte wrote:
 This patch introduces an interface to access the guest visible
 storage keys. It supports three operations that model the behavior
 that SSKE/ISKE/RRBE instructions would have if they were issued by
 the guest. These instructions are all documented in the z architecture
 principles of operation book.
 
 Signed-off-by: Carsten Otte co...@de.ibm.com

[...]

 --- a/arch/s390/kvm/kvm-s390.c
 +++ b/arch/s390/kvm/kvm-s390.c
 @@ -112,13 +112,115 @@ void kvm_arch_exit(void)
  {
  }
 
 +static long kvm_s390_keyop(struct kvm_s390_keyop *kop)
 +{
 +	unsigned long addr = kop->user_addr;
 +	pte_t *ptep;
 +	pgste_t pgste;
 +	int r;
 +	unsigned long skey;
 +	unsigned long bits;
 +
 +	/* make sure this process is a hypervisor */
 +	r = -EINVAL;
 +	if (!mm_has_pgste(current->mm))
 +		goto out;
 +
 +	r = -EFAULT;
 +	if (addr >= PGDIR_SIZE)
 +		goto out;
 +
 +	spin_lock(&current->mm->page_table_lock);
 +	ptep = ptep_for_addr(addr);

Locking is broken; following order is possible:

kvm_s390_keyop()       -> spin_lock(&current->mm->page_table_lock)
-> ptep_for_addr()     -> down_read(&current->mm->mmap_sem)
                          --- Bug 1, we might schedule here
-> __pmdp_for_addr()
-> __pte_alloc()       -> spin_lock(&mm->page_table_lock)
                          --- Bug 2, deadlock



Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Peter Maydell
On 11 December 2011 21:36, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 On Sun, Dec 11, 2011 at 3:25 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 On 11 December 2011 20:07, Christoffer Dall
 christofferd...@christofferdall.dk wrote:
 Well, if it was just irq & 1, then I hear you, but it would be (irq >>
 cpu_idx) & 1 which I don't think is more clear.

 Er, what? The fields are [31..1] CPU index and [0] irqtype,
 right? So what you have now is:
     vcpu_idx = irq_level->irq / 2;
     irqtype = irq_level->irq % 2;
 and the bitshifting equivalent is:
     vcpu_idx = irq_level->irq >> 1;
     irqtype = irq_level->irq & 1;
 surely?

 Shifting by the cpuindex is definitely wrong.

 actually, that's not how the irq_level field is defined.

It's not clear to me which part of my comment this is aimed at. Shifting
by the cpuindex doesn't give the right answer whether you define
irq_level by bitfields or with the current phrasing you quote below.

 If you look
 in Documentation/virtual/kvm/api.txt:

 ARM uses two types of interrupt lines per CPU, ie. IRQ and FIQ. The
 value of the
 irq field should be (VCPU_INDEX * 2) for IRQs and ((VCPU_INDEX * 2) + 1) for
 FIQs. Level is used to raise/lower the line. See arch/arm/include/asm/kvm.h 
 for
 convenience macros.

That's exactly the same thing, though, right? It's just a matter
of how you choose to phrase it (in either text or in code; the values
come out identical). When I was sorting out the QEMU side, I started
out by looking at the kernel source code, deduced that we were encoding
CPU number and irq-vs-fiq as described above (and documenting it in a
slightly confusing way as a multiplication) and then wrote the qemu
code in what seemed to me the clearest way.

(Actually what would be clearest would be if the ioctl took the
(interrupt-target, interrupt-line-for-that-target, value-of-line)
tuple as three separate values rather than encoding two of them into
a single integer, but I assume there's a reason we can't have that.)

 we should definitely fix either side, and the only sane argument is
 that this is an irq_line field, so an index resembling an actual line
 seems more semantically in line with the field purpose rather than a
 bit encoding, but I am open to arguments and not married to the
 current implementation.

To be clear, I'm not attempting to suggest a change in the semantics
of this field. (The qemu patch fixes the qemu side to adhere to what
the kernel requires.)

-- PMM


[PATCH 0/4] KVM: Make mmu_shrink() scan nr_to_scan shadow pages

2011-12-11 Thread Takuya Yoshikawa
This patch set fixes mmu_shrink() as I said last week.

Though I did not change tuning parameters, we can do that in the future
on top of this: I think the batch size, 128, may be too large.

Takuya


[PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion

2011-12-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Make it clear that this is not related to virtual memory.

Remove vm_ prefix from the corresponding member of the struct kvm to
avoid kvm->vm_ redundancy alongside.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 Documentation/virtual/kvm/locking.txt |2 +-
 arch/x86/include/asm/kvm_host.h   |2 +-
 arch/x86/kvm/mmu.c|4 ++--
 arch/x86/kvm/x86.c|4 ++--
 include/linux/kvm_host.h  |2 +-
 virt/kvm/kvm_main.c   |   12 ++--
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/locking.txt 
b/Documentation/virtual/kvm/locking.txt
index 3b4cd3b..1a851be 100644
--- a/Documentation/virtual/kvm/locking.txt
+++ b/Documentation/virtual/kvm/locking.txt
@@ -12,7 +12,7 @@ KVM Lock Overview
 Name:  kvm_lock
 Type:  raw_spinlock
 Arch:  any
-Protects:  - vm_list
+Protects:  - kvm_list
- hardware virtualization enable/disable
 Comment:   'raw' because hardware enabling/disabling must be atomic /wrt
migration.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 020413a..186b2b0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -105,7 +105,7 @@
 #define ASYNC_PF_PER_VCPU 64
 
 extern raw_spinlock_t kvm_lock;
-extern struct list_head vm_list;
+extern struct list_head kvm_list;
 
 struct kvm_vcpu;
 struct kvm;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a2a9b4..590f76b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3911,7 +3911,7 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 
 	raw_spin_lock(&kvm_lock);
 
-	list_for_each_entry(kvm, &vm_list, vm_list) {
+	list_for_each_entry(kvm, &kvm_list, list) {
 		int idx;
 		LIST_HEAD(invalid_list);
 
@@ -3930,7 +3930,7 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 		srcu_read_unlock(&kvm->srcu, idx);
 	}
 	if (kvm_freed)
-		list_move_tail(&kvm_freed->vm_list, &vm_list);
+		list_move_tail(&kvm_freed->list, &kvm_list);
 
 	raw_spin_unlock(&kvm_lock);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eeeaf2e..96f118b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4566,7 +4566,7 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
 	smp_call_function_single(freq->cpu, tsc_khz_changed, freq, 1);
 
 	raw_spin_lock(&kvm_lock);
-	list_for_each_entry(kvm, &vm_list, vm_list) {
+	list_for_each_entry(kvm, &kvm_list, list) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (vcpu->cpu != freq->cpu)
 				continue;
@@ -5857,7 +5857,7 @@ int kvm_arch_hardware_enable(void *garbage)
 	int i;
 
 	kvm_shared_msr_cpu_online();
-	list_for_each_entry(kvm, &vm_list, vm_list)
+	list_for_each_entry(kvm, &kvm_list, list)
 		kvm_for_each_vcpu(i, vcpu, kvm)
 			if (vcpu->cpu == smp_processor_id())
 				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8c5c303..054b52e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -256,7 +256,7 @@ struct kvm {
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	atomic_t online_vcpus;
 	int last_boosted_vcpu;
-	struct list_head vm_list;
+	struct list_head list; /* the list of kvm instances */
 	struct mutex lock;
 	struct kvm_io_bus *buses[KVM_NR_BUSES];
 #ifdef CONFIG_HAVE_KVM_EVENTFD
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d8bac07..03ae960 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -70,8 +70,8 @@ MODULE_LICENSE("GPL");
  * kvm->lock --> kvm->slots_lock --> kvm->irq_lock
  */
 
-DEFINE_RAW_SPINLOCK(kvm_lock);
-LIST_HEAD(vm_list);
+DEFINE_RAW_SPINLOCK(kvm_lock); /* protect kvm_list */
+LIST_HEAD(kvm_list);   /* the list of kvm instances */
 
 static cpumask_var_t cpus_hardware_enabled;
 static int kvm_usage_count = 0;
@@ -498,7 +498,7 @@ static struct kvm *kvm_create_vm(void)
 		goto out_err;
 
 	raw_spin_lock(&kvm_lock);
-	list_add(&kvm->vm_list, &vm_list);
+	list_add(&kvm->list, &kvm_list);
 	raw_spin_unlock(&kvm_lock);
 
 	return kvm;
@@ -573,7 +573,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 
 	kvm_arch_sync_events(kvm);
 	raw_spin_lock(&kvm_lock);
-	list_del(&kvm->vm_list);
+	list_del(&kvm->list);
 	raw_spin_unlock(&kvm_lock);
 	kvm_free_irq_routing(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++)
@@ -2626,7 +2626,7 @@ static int vm_stat_get(void *_offset, u64 *val)
 
*val = 0;

[PATCH 2/4] KVM: MMU: Make common preparation code for zapping sp into a function

2011-12-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Use list_entry() instead of container_of() for taking a shadow page from
the active_mmu_pages list.

Note: the return value of pre_zap_one_sp() will be used later.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu.c |   45 +++--
 1 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 590f76b..b1e8270 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1930,6 +1930,26 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, 
struct kvm_mmu_page *sp,
return ret;
 }
 
+/**
+ * pre_zap_one_sp - make one shadow page ready for being freed
+ * @kvm: the kvm instance
+ * @invalid_list: the list to which we add shadow pages ready for being freed
+ *
+ * Take one shadow page from the tail of the active_mmu_pages list and make it
+ * ready for being freed, then put it into the @invalid_list.  Other pages,
+ * unsync children, may also be put into the @invalid_list.
+ *
+ * Return the number of shadow pages added to the @invalid_list this way.
+ */
+static int pre_zap_one_sp(struct kvm *kvm, struct list_head *invalid_list)
+{
+	struct kvm_mmu_page *sp;
+
+	sp = list_entry(kvm->arch.active_mmu_pages.prev,
+			struct kvm_mmu_page, link);
+	return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
+}
+
 static void kvm_mmu_isolate_pages(struct list_head *invalid_list)
 {
struct kvm_mmu_page *sp;
@@ -1999,11 +2019,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 	if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
 		while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages &&
 			!list_empty(&kvm->arch.active_mmu_pages)) {
-			struct kvm_mmu_page *page;
-
-			page = container_of(kvm->arch.active_mmu_pages.prev,
-					    struct kvm_mmu_page, link);
-			kvm_mmu_prepare_zap_page(kvm, page, &invalid_list);
+			pre_zap_one_sp(kvm, &invalid_list);
 		}
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
 		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
@@ -3719,11 +3735,7 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
 
 	while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES &&
 	       !list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
-		struct kvm_mmu_page *sp;
-
-		sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
-				  struct kvm_mmu_page, link);
-		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
+		pre_zap_one_sp(vcpu->kvm, &invalid_list);
 		++vcpu->kvm->stat.mmu_recycled;
 	}
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
@@ -3890,16 +3902,6 @@ restart:
 	spin_unlock(&kvm->mmu_lock);
 }
 
-static void kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm,
-						struct list_head *invalid_list)
-{
-	struct kvm_mmu_page *page;
-
-	page = container_of(kvm->arch.active_mmu_pages.prev,
-			    struct kvm_mmu_page, link);
-	kvm_mmu_prepare_zap_page(kvm, page, invalid_list);
-}
-
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
@@ -3919,8 +3921,7 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 		spin_lock(&kvm->mmu_lock);
 		if (!kvm_freed && nr_to_scan > 0 &&
 		    kvm->arch.n_used_mmu_pages > 0) {
-			kvm_mmu_remove_some_alloc_mmu_pages(kvm,
-							    &invalid_list);
+			pre_zap_one_sp(kvm, &invalid_list);
 			kvm_freed = kvm;
 		}
 		nr_to_scan--;
-- 
1.7.5.4
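
A note on the list_entry()/container_of() swap in this patch: in the
kernel's list.h, list_entry() is defined in terms of container_of(), so the
change is about naming the intent (an element of a list) and not about
behaviour. Below is a rough userspace sketch of the "take the entry at the
tail" pattern. The macros are simplified stand-ins without the kernel's
type checking, and struct shadow_page, list_add_head() and tail_page() are
hypothetical names used only here.

```c
#include <stddef.h>

/* Minimal list node, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Simplified stand-ins for the kernel macros (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical stand-in for struct kvm_mmu_page. */
struct shadow_page {
	int id;
	struct list_head link;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

/* Insert a node right after the head, so older entries drift to the tail. */
static void list_add_head(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Return the page embedding the tail node; the list must not be empty. */
static struct shadow_page *tail_page(struct list_head *head)
{
	return list_entry(head->prev, struct shadow_page, link);
}
```

With pages added at the head, head->prev is the oldest one, which is why
the zap helpers pick active_mmu_pages.prev.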



[PATCH 3/4] KVM: MMU: Make preparation for zapping some sp into a separate function

2011-12-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

This will be used for mmu_shrink() in the following patch.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu.c |   36 ++--
 1 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b1e8270..fcd0dd1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2003,6 +2003,28 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 
 }
 
+/**
+ * pre_zap_some_sp - make some shadow pages ready for being freed
+ * @kvm: the kvm instance
+ * @invalid_list: the list to which we add shadow pages ready for being freed
+ * @nr_to_zap: how many shadow pages we want to zap
+ *
+ * Try to make @nr_to_zap shadow pages ready for being freed, then put them
+ * into the @invalid_list.
+ *
+ * Return the number of shadow pages actually added to the @invalid_list.
+ */
+static int pre_zap_some_sp(struct kvm *kvm, struct list_head *invalid_list,
+			   int nr_to_zap)
+{
+	int nr_before = kvm->arch.n_used_mmu_pages;
+
+	while (nr_to_zap > 0 && !list_empty(&kvm->arch.active_mmu_pages))
+		nr_to_zap -= pre_zap_one_sp(kvm, invalid_list);
+
+	return nr_before - kvm->arch.n_used_mmu_pages;
+}
+
 /*
  * Changing the number of mmu pages allocated to the vm
  * Note: if goal_nr_mmu_pages is too small, you will get dead lock
@@ -2010,17 +2032,11 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
 {
LIST_HEAD(invalid_list);
-   /*
-* If we set the number of mmu pages to be smaller be than the
-* number of actived pages , we must to free some mmu pages before we
-* change the value
-*/
+   int nr_to_zap = kvm-arch.n_used_mmu_pages  goal_nr_mmu_pages;
 
-	if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
-		while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages &&
-			!list_empty(&kvm->arch.active_mmu_pages)) {
-			pre_zap_one_sp(kvm, &invalid_list);
-		}
+	if (nr_to_zap > 0) {
+		/* free some shadow pages to make the number fit the goal */
+		pre_zap_some_sp(kvm, &invalid_list, nr_to_zap);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
 		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
 	}
-- 
1.7.5.4



[PATCH 4/4] KVM: MMU: Make mmu_shrink() scan nr_to_scan shadow pages

2011-12-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Currently, mmu_shrink() tries to free a shadow page from one kvm and
does not use nr_to_scan correctly.

This patch fixes this by making it try to free some shadow pages from
each kvm.  The number of shadow pages each kvm frees becomes
proportional to the number of shadow pages it is using.

Note: an easy way to see how this code works is to do
  echo 3 > /proc/sys/vm/drop_caches
while some virtual machines are running.  Shadow pages will be zapped
as expected by this.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu.c |   23 ++-
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fcd0dd1..c6c61dd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3921,7 +3921,7 @@ restart:
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct kvm *kvm;
-	struct kvm *kvm_freed = NULL;
+	int nr_to_zap, nr_total;
 	int nr_to_scan = sc->nr_to_scan;
 
 	if (nr_to_scan == 0)
@@ -3929,25 +3929,30 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 
 	raw_spin_lock(&kvm_lock);
 
+	nr_total = percpu_counter_read_positive(&kvm_total_used_mmu_pages);
+
 	list_for_each_entry(kvm, &kvm_list, list) {
 		int idx;
 		LIST_HEAD(invalid_list);
 
+		if (nr_to_scan <= 0) {
+			/* next time from this kvm */
+			list_move_tail(&kvm_list, &kvm->list);
+			break;
+		}
+
 		idx = srcu_read_lock(&kvm->srcu);
 		spin_lock(&kvm->mmu_lock);
-		if (!kvm_freed && nr_to_scan > 0 &&
-		    kvm->arch.n_used_mmu_pages > 0) {
-			pre_zap_one_sp(kvm, &invalid_list);
-			kvm_freed = kvm;
-		}
-		nr_to_scan--;
 
+		/* proportional to how many shadow pages this kvm is using */
+		nr_to_zap = sc->nr_to_scan * kvm->arch.n_used_mmu_pages;
+		nr_to_zap /= nr_total;
+		nr_to_scan -= pre_zap_some_sp(kvm, &invalid_list, nr_to_zap);
 		kvm_mmu_commit_zap_page(kvm, &invalid_list);
+
 		spin_unlock(&kvm->mmu_lock);
 		srcu_read_unlock(&kvm->srcu, idx);
 	}
-	if (kvm_freed)
-		list_move_tail(&kvm_freed->list, &kvm_list);
 
 	raw_spin_unlock(&kvm_lock);
 
-- 
1.7.5.4



Re: [RFC] virtio: use mandatory barriers for remote processor vdevs

2011-12-11 Thread Benjamin Herrenschmidt
On Sun, 2011-12-11 at 14:25 +0200, Michael S. Tsirkin wrote:

 Forwarding some results by Amos, who run multiple netperf streams in
 parallel, from an external box to the guest.  TCP_STREAM results were
 noisy.  This could be due to buffering done by TCP, where packet size
 varies even as message size is constant.
 
 TCP_RR results were consistent. In this benchmark, after switching
 to mandatory barriers, CPU utilization increased by up to 35% while
 throughput went down by up to 14%. the normalized throughput/cpu
 regressed consistently, between 7 and 35%
 
 The fix applied was simply this:

What machine & processor was this?

Cheers,
Ben.

 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 3198f2e..fdccb77 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -23,7 +23,7 @@
  
  /* virtio guest is communicating with a virtual device that actually runs on
   * a host processor.  Memory barriers are used to control SMP effects. */
 -#ifdef CONFIG_SMP
 +#if 0
  /* Where possible, use SMP barriers which are more lightweight than mandatory
   * barriers, because mandatory barriers control MMIO effects on accesses
   * through relaxed memory I/O windows (which virtio does not use). */
 
 
 




Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Peter Maydell
On 11 December 2011 22:12, Peter Maydell peter.mayd...@linaro.org wrote:
 (Actually what would be clearest would be if the ioctl took the
 (interrupt-target, interrupt-line-for-that-target, value-of-line)
 tuple as three separate values rather than encoding two of them into
 a single integer, but I assume there's a reason we can't have that.)

Have you thought about how this encoding scheme would be extended
when we move to using the VGIC and an in-kernel interrupt controller
implementation, incidentally? I haven't really looked into that at
all, but I assume that then QEMU is going to start having to tell
the kernel it wants to deliver interrupt 35 to the GIC, and so on...

-- PMM


Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Christoffer Dall
On Sun, Dec 11, 2011 at 5:35 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 11 December 2011 22:12, Peter Maydell peter.mayd...@linaro.org wrote:
 (Actually what would be clearest would be if the ioctl took the
 (interrupt-target, interrupt-line-for-that-target, value-of-line)
 tuple as three separate values rather than encoding two of them into
 a single integer, but I assume there's a reason we can't have that.)

 Have you thought about how this encoding scheme would be extended
 when we move to using the VGIC and an in-kernel interrupt controller
 implementation, incidentally? I haven't really looked into that at
 all, but I assume that then QEMU is going to start having to tell
 the kernel it wants to deliver interrupt 35 to the GIC, and so on...


no, I haven't looked into that at all. My plan was to decipher the
common irq, ioapic stuff for x86 and see how much we can re-use and if
there will be some nice way to either use what's there or change some
bits to accommodate both existing archs and ARM. But the short answer
is, no not really, I was focusing so far on getting a stable
implementation upstream.

yes, we are going to have to have some interface with QEMU for this
and if we need new features from what's already there that should
probably be discussed in the same round as the mechanism for handing
off CP15 stuff to QEMU that we touched upon earlier.

-Christoffer


Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Jan Kiszka
On 2011-12-11 23:53, Christoffer Dall wrote:
 On Sun, Dec 11, 2011 at 5:35 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 On 11 December 2011 22:12, Peter Maydell peter.mayd...@linaro.org wrote:
 (Actually what would be clearest would be if the ioctl took the
 (interrupt-target, interrupt-line-for-that-target, value-of-line)
 tuple as three separate values rather than encoding two of them into
 a single integer, but I assume there's a reason we can't have that.)

 Have you thought about how this encoding scheme would be extended
 when we move to using the VGIC and an in-kernel interrupt controller
 implementation, incidentally? I haven't really looked into that at
 all, but I assume that then QEMU is going to start having to tell
 the kernel it wants to deliver interrupt 35 to the GIC, and so on...


 no, I haven't looked into that at all. My plan was to decipher the
 common irq, ioapic stuff for x86 and see how much we can re-use and if
 there will be some nice way to either use what's there or change some
 bits to accommodate both existing archs and ARM. But the short answer
 is, no not really, I was focusing so far on getting a stable
 implementation upstream.
 
 yes, we are going to have to have some interface with QEMU for this
 and if we need new features from what's already there that should
 probably be discussed in the same round as the mechanism for handing
 off CP15 stuff to QEMU that we touched upon earlier.

Enabling in-kernel irqchips usually means switching worlds. So the
semantics of these particular IRQ inject interface details may change
without breaking anything.

However, things might look different if there is a need to also inject
the CPU IRQs directly, not only the irqchip inputs. In that case,
it may make some sense to reserve more space for interrupt types than
just one bit and use a common encoding scheme.
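
One way to picture such a common encoding is a single 32-bit value with a
multi-bit type field.  The layout below is purely hypothetical (the field
widths, names and enum values are assumptions made for this sketch, not
anything proposed in this thread):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only:
 *   bits 31..24  interrupt type (CPU line vs. irqchip input, ...)
 *   bits 23..16  target (e.g. vcpu index)
 *   bits 15..0   line number for that target
 */
#define IRQ_TYPE_SHIFT   24
#define IRQ_TARGET_SHIFT 16
#define IRQ_TYPE_MASK    0xffu
#define IRQ_TARGET_MASK  0xffu
#define IRQ_LINE_MASK    0xffffu

enum irq_type { IRQ_TYPE_CPU = 0, IRQ_TYPE_IRQCHIP = 1 };

/* Pack (type, target, line) into one integer. */
static uint32_t irq_encode(uint32_t type, uint32_t target, uint32_t line)
{
	assert(type <= IRQ_TYPE_MASK);
	assert(target <= IRQ_TARGET_MASK);
	assert(line <= IRQ_LINE_MASK);

	return (type << IRQ_TYPE_SHIFT) | (target << IRQ_TARGET_SHIFT) | line;
}

/* Unpack the individual fields again. */
static uint32_t irq_type_of(uint32_t irq)
{
	return (irq >> IRQ_TYPE_SHIFT) & IRQ_TYPE_MASK;
}

static uint32_t irq_target_of(uint32_t irq)
{
	return (irq >> IRQ_TARGET_SHIFT) & IRQ_TARGET_MASK;
}

static uint32_t irq_line_of(uint32_t irq)
{
	return irq & IRQ_LINE_MASK;
}
```

Under this assumed layout, "interrupt 35 to the irqchip" would be
irq_encode(IRQ_TYPE_IRQCHIP, 0, 35), and a whole byte of type values remains
available for other kinds of interrupt.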

Jan





Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately

2011-12-11 Thread Matt Evans
On 09/12/11 19:29, Pekka Enberg wrote:
 On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin levinsasha...@gmail.com wrote:
 If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the
 headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM
 tools tree as well.
 
 Yup, all we need is ACKs from PPC maintainers.

Cool, I've pinged them re. __SANE_USERSPACE_TYPES__, and that would be useful to
carry in your tree.  But, IMHO, the patch I sent to Alex ([PATCH] KVM: PPC: Add
KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS) should go in via his tree.  The
KVM/PPC-HV kernel folk will get it quicker in their trees (pulled from Alex) and
it's not a build problem, only a limit of SMP CPU numbers.  That is, if you're
building a kernel for PPC KVM today you'll probably use something more similar
to Alex's tree than mainline/kvm tools tree.


Cheers,


Matt


Re: [PATCH V2 23/23] kvm tools: Create arch-specific kvm_cpu__emulate_{mm}io()

2011-12-11 Thread Matt Evans
On 09/12/11 18:53, Sasha Levin wrote:
 On Fri, 2011-12-09 at 17:56 +1100, Matt Evans wrote:
 @@ -30,4 +31,18 @@ struct kvm_cpu {
 struct kvm_coalesced_mmio_ring  *ring;
  };
  
 +/*
 + * As these are such simple wrappers, let's have them in the header so they'll
 + * be cheaper to call:
 + */
 +static inline bool kvm_cpu__emulate_io(struct kvm *kvm, u16 port, void *data, int direction, int size, u32 count)
 +{
 +	return kvm__emulate_io(kvm, port, data, direction, size, count);
 +}
 +
 +static inline bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 is_write)
 +{
 +	return kvm_cpu__emulate_mmio(kvm, phys_addr, data, len, is_write);
 
 This is probably wrong. kvm_cpu__emulate_mmio just calls itself over and
 over.

Urgh, not just probably -- C&P strikes again.  Consider it fixed.
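
For reference, the non-recursive form presumably forwards to the generic
handler, by analogy with the I/O wrapper quoted above.  The sketch below is
an assumption about that fix, not the actual follow-up patch; struct kvm and
kvm__emulate_mmio() are reduced to stand-ins so that it compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t  u8;
typedef uint32_t u32;
typedef uint64_t u64;

/* Stand-in for the real struct kvm; it only counts handler calls. */
struct kvm {
	unsigned long mmio_calls;
};

/* Stand-in for the generic MMIO handler. */
static bool kvm__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data,
			      u32 len, u8 is_write)
{
	assert(kvm && data);
	(void)phys_addr;
	(void)len;
	(void)is_write;

	kvm->mmio_calls++;
	return true;
}

static inline bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr,
					 u8 *data, u32 len, u8 is_write)
{
	/* forward to kvm__emulate_mmio(), not to kvm_cpu__emulate_mmio() */
	return kvm__emulate_mmio(kvm, phys_addr, data, len, is_write);
}
```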

Thanks!


Matt


Re: [PATCH 3/4] KVM: MMU: Make preparation for zapping some sp into a separate function

2011-12-11 Thread Takuya Yoshikawa
Takuya Yoshikawa takuya.yoshik...@gmail.com wrote:

 @@ -2010,17 +2032,11 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
  {
  	LIST_HEAD(invalid_list);
 -	/*
 -	 * If we set the number of mmu pages to be smaller be than the
 -	 * number of actived pages , we must to free some mmu pages before we
 -	 * change the value
 -	 */
 +	int nr_to_zap = kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages;

Sorry, should have been:
	int nr_to_zap = kvm->arch.n_used_mmu_pages - goal_nr_mmu_pages;

I will fix this after getting some comments.

Takuya

  
 -	if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
 -		while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages &&
 -			!list_empty(&kvm->arch.active_mmu_pages)) {
 -			pre_zap_one_sp(kvm, &invalid_list);
 -		}
 +	if (nr_to_zap > 0) {
 +		/* free some shadow pages to make the number fit the goal */
 +		pre_zap_some_sp(kvm, &invalid_list, nr_to_zap);
  		kvm_mmu_commit_zap_page(kvm, &invalid_list);
  		goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
  	}
 -- 
 1.7.5.4
 


-- 
Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp


[RFC PATCH] kvm tools, qcow: Add support for growing refcount blocks

2011-12-11 Thread Lan Tianyu
This patch enables allocating new refcount blocks, so that kvm tools
can grow a qcow2 image much larger.
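
As background, the patch locates a cluster's refcount entry by splitting the
cluster index into a refcount-table index and an index within the refcount
block.  The sketch below shows that split on its own.  It assumes 2-byte
refcount entries (a block shift of 1) and assumes the operators are `>>` and
`&`; the helper names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each refcount entry is 2 bytes, so one cluster-sized
 * refcount block holds cluster_size / 2 entries. */
#define REFCOUNT_BLOCK_SHIFT 1

/* Which refcount-table slot covers this cluster? */
static uint64_t refcount_table_index(uint64_t clust_idx,
				     unsigned int cluster_bits)
{
	assert(cluster_bits > REFCOUNT_BLOCK_SHIFT);
	return clust_idx >> (cluster_bits - REFCOUNT_BLOCK_SHIFT);
}

/* Which entry inside that refcount block? */
static uint64_t refcount_block_index(uint64_t clust_idx,
				     unsigned int cluster_bits)
{
	uint64_t entries;

	assert(cluster_bits > REFCOUNT_BLOCK_SHIFT);
	entries = 1ULL << (cluster_bits - REFCOUNT_BLOCK_SHIFT);
	return clust_idx & (entries - 1);
}
```

With 64 KB clusters (cluster_bits = 16) a block holds 32768 entries, so
cluster 40000 lands in table slot 1 at entry 7232.  A table slot that is
still zero is the case where a new refcount block has to be allocated.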

Signed-off-by: Lan Tianyu tianyu@intel.com
---
 tools/kvm/disk/qcow.c |  105 +---
 1 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index e139fa5..929ba69 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -12,6 +12,7 @@
 #include <string.h>
 #include <unistd.h>
 #include <fcntl.h>
+#include <errno.h>
 #ifdef CONFIG_HAS_ZLIB
 #include <zlib.h>
 #endif
@@ -20,6 +21,10 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 
+static int update_cluster_refcount(struct qcow *q, u64 clust_idx, u16 append);
+static int qcow_write_refcount_table(struct qcow *q);
+static u64 qcow_alloc_clusters(struct qcow *q, u64 size, int update_ref);
+static void  qcow_free_clusters(struct qcow *q, u64 clust_start, u64 size);
 
 static inline int qcow_pwrite_sync(int fd,
 	void *buf, size_t count, off_t offset)
@@ -657,6 +662,56 @@ static struct qcow_refcount_block *refcount_block_search(struct qcow *q, u64 off
 	return rfb;
 }
 
+static struct qcow_refcount_block *qcow_grow_refcount_block(struct qcow *q,
+	u64 clust_idx)
+{
+	struct qcow_header *header = q->header;
+	struct qcow_refcount_table *rft = &q->refcount_table;
+	struct qcow_refcount_block *rfb;
+	u64 new_block_offset;
+	u64 rft_idx;
+
+	rft_idx = clust_idx >> (header->cluster_bits -
+		QCOW_REFCOUNT_BLOCK_SHIFT);
+
+	if (rft_idx >= rft->rf_size) {
+		pr_warning("Don't support grow refcount block table");
+		return NULL;
+	}
+
+	new_block_offset = qcow_alloc_clusters(q, q->cluster_size, 0);
+	if (new_block_offset < 0)
+		return NULL;
+
+	rfb = new_refcount_block(q, new_block_offset);
+	if (!rfb)
+		return NULL;
+
+	memset(rfb->entries, 0x00, q->cluster_size);
+	rfb->dirty = 1;
+
+	/* write refcount block */
+	if (write_refcount_block(q, rfb) < 0)
+		goto free_rfb;
+
+	if (cache_refcount_block(q, rfb) < 0)
+		goto free_rfb;
+
+	rft->rf_table[rft_idx] = cpu_to_be64(new_block_offset);
+	if (qcow_write_refcount_table(q) < 0)
+		goto free_rfb;
+
+	if (update_cluster_refcount(q, new_block_offset >>
+		    header->cluster_bits, 1) < 0)
+		goto free_rfb;
+
+	return rfb;
+
+free_rfb:
+	free(rfb);
+	return NULL;
+}
+
 static struct qcow_refcount_block *qcow_read_refcount_block(struct qcow *q, u64 clust_idx)
 {
 	struct qcow_header *header = q->header;
@@ -667,14 +722,11 @@ static struct qcow_refcount_block *qcow_read_refcount_block(struct qcow *q, u64
 
 	rft_idx = clust_idx >> (header->cluster_bits - QCOW_REFCOUNT_BLOCK_SHIFT);
 	if (rft_idx >= rft->rf_size)
-		return NULL;
+		return (void *)-ENOSPC;
 
 	rfb_offset = be64_to_cpu(rft->rf_table[rft_idx]);
-
-	if (!rfb_offset) {
-		pr_warning("Don't support to grow refcount table");
-		return NULL;
-	}
+	if (!rfb_offset)
+		return (void *)-ENOSPC;
 
 	rfb = refcount_block_search(q, rfb_offset);
 	if (rfb)
@@ -708,7 +760,8 @@ static u16 qcow_get_refcount(struct qcow *q, u64 clust_idx)
 	if (!rfb) {
 		pr_warning("Error while reading refcount table");
 		return -1;
-	}
+	} else if ((long)rfb == -ENOSPC)
+		return 0;
 
 	rfb_idx = clust_idx & (((1ULL <<
 		(header->cluster_bits - QCOW_REFCOUNT_BLOCK_SHIFT)) - 1));
@@ -732,6 +785,12 @@ static int update_cluster_refcount(struct qcow *q, u64 clust_idx, u16 append)
 	if (!rfb) {
 		pr_warning("error while reading refcount table");
 		return -1;
+	} else if ((long)rfb == -ENOSPC) {
+		rfb = qcow_grow_refcount_block(q, clust_idx);
+		if (!rfb) {
+			pr_warning("error while growing refcount table");
+			return -1;
+		}
 	}
 
 	rfb_idx = clust_idx & (((1ULL <<
@@ -774,11 +833,11 @@ static void  qcow_free_clusters(struct qcow *q, u64 clust_start, u64 size)
  * can satisfy the size. free_clust_idx is initialized to zero and
  * Record last position.
  */
-static u64 qcow_alloc_clusters(struct qcow *q, u64 size)
+static u64 qcow_alloc_clusters(struct qcow *q, u64 size, int update_ref)
 {
 	struct qcow_header *header = q->header;
 	u16 clust_refcount;
-	u32 clust_idx, i;
+	u32 clust_idx = 0, i;
 	u64 clust_num;
 
 	clust_num = (size + (q->cluster_size - 1)) >> header->cluster_bits;
@@ -793,12 +852,15 @@ again:
 			goto again;
 	}
 
-	for (i = 0; i < clust_num; i++)
-		if (update_cluster_refcount(q,
-			q->free_clust_idx - 

New Guess OS Creation Problem

2011-12-11 Thread takizo
Hi All, 

I am running CentOS release 6.1 (Final). I have been running Linux KVM
without trouble for quite some time, but something went wrong after I ran
yum upgrade.

I created a new VM yesterday without any problem, using the exact same
installation procedure, and installed FreeBSD 8.2. Today, after the yum
upgrade, I tried to create another new VM. It detects the hard disk, but when
I commit the FreeBSD 8.2 installation it complains that it cannot write to
the disk, with the error message below.

block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)

Below is the log I captured from the VM log file

-- Log start -- 
2011-12-12 00:23:39.485: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
/usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 
1,sockets=1,cores=1,threads=1 -name database -uuid 
f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot 
-drive 
file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads
 -device 
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 -drive 
file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
 -device 
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev 
tap,fd=22,id=hostnet0 -device 
e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 -chardev 
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 
127.0.0.1:2,password -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
char device redirected to /dev/pts/6
Using CPU model cpu64-rhel6

block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
-- Log End --

Below are the software versions currently running:
gpxe-roms-qemu-0.9.7-6.7.el6.noarch
qemu-img-0.12.1.2-2.160.el6_1.8.x86_64
qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64
libvirt-client-0.8.7-18.el6_1.4.x86_64
libvirt-python-0.8.7-18.el6_1.4.x86_64
libvirt-0.8.7-18.el6_1.4.x86_64

Is anyone else having this problem?
I am planning to install CentOS as a guest to see whether it has the same
problem. Thanks.

--
Paul Ooi


[PATCH v3] kvm: make vcpu life cycle separated from kvm instance

2011-12-11 Thread Liu Ping Fan
From: Liu Ping Fan pingf...@linux.vnet.ibm.com

Currently, a vcpu can be destroyed only when the kvm instance is destroyed.
Change this so that a vcpu is destroyed when its refcnt drops to zero;
a vcpu then MUST and CAN be destroyed before the kvm instance is.
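
The life-cycle rule above ("destroy when the refcnt reaches zero") can be
sketched in plain user-space C.  This is a generic illustration only: the
vcpu_sketch names are hypothetical, the counter is a plain int, and none of
the RCU or atomic handling that kernel code needs is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted vcpu. */
struct vcpu_sketch {
	int id;
	int refcnt;
};

/* Create an object holding one initial reference. */
static struct vcpu_sketch *vcpu_sketch_create(int id)
{
	struct vcpu_sketch *v = calloc(1, sizeof(*v));

	if (!v)
		return NULL;
	v->id = id;
	v->refcnt = 1;
	return v;
}

/* Take an extra reference; the caller must already hold one. */
static void vcpu_sketch_get(struct vcpu_sketch *v)
{
	assert(v->refcnt > 0);
	v->refcnt++;
}

/* Drop a reference; returns true if this call destroyed the object. */
static bool vcpu_sketch_put(struct vcpu_sketch *v)
{
	assert(v->refcnt > 0);
	if (--v->refcnt == 0) {
		free(v);
		return true;
	}
	return false;
}
```

The owner's final put is what frees the object, so destruction no longer has
to wait for the containing instance to go away.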

Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
---
 arch/x86/kvm/i8254.c |   10 --
 arch/x86/kvm/i8259.c |   12 +--
 arch/x86/kvm/mmu.c   |7 ++--
 arch/x86/kvm/x86.c   |   54 +++
 include/linux/kvm_host.h |   71 ++
 virt/kvm/irq_comm.c  |7 +++-
 virt/kvm/kvm_main.c  |   62 +--
 7 files changed, 170 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 76e3f1c..ac79598 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -289,7 +289,7 @@ static void pit_do_work(struct work_struct *work)
 	struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
 	struct kvm *kvm = pit->kvm;
 	struct kvm_vcpu *vcpu;
-	int i;
+	struct kvm_iter it;
 	struct kvm_kpit_state *ps = &pit->pit_state;
 	int inject = 0;
 
@@ -315,9 +315,13 @@ static void pit_do_work(struct work_struct *work)
 		 * LVT0 to NMI delivery. Other PIC interrupts are just sent to
 		 * VCPU0, and only if its LVT0 is in EXTINT mode.
 		 */
-		if (kvm->arch.vapics_in_nmi_mode > 0)
-			kvm_for_each_vcpu(i, vcpu, kvm)
+		if (kvm->arch.vapics_in_nmi_mode > 0) {
+			rcu_read_lock();
+			kvm_for_each_vcpu(it, vcpu, kvm) {
 				kvm_apic_nmi_wd_deliver(vcpu);
+			}
+			rcu_read_unlock();
+		}
 	}
 }
 
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index cac4746..2186b30 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -50,25 +50,29 @@ static void pic_unlock(struct kvm_pic *s)
 {
 	bool wakeup = s->wakeup_needed;
 	struct kvm_vcpu *vcpu, *found = NULL;
-	int i;
+	struct kvm *kvm = s->kvm;
+	struct kvm_iter it;
 
 	s->wakeup_needed = false;
 
 	spin_unlock(&s->lock);
 
 	if (wakeup) {
-		kvm_for_each_vcpu(i, vcpu, s->kvm) {
+		rcu_read_lock();
+		kvm_for_each_vcpu(it, vcpu, kvm)
 			if (kvm_apic_accept_pic_intr(vcpu)) {
 				found = vcpu;
 				break;
 			}
-		}
 
-		if (!found)
+		if (!found) {
+			rcu_read_unlock();
 			return;
+		}
 
 		kvm_make_request(KVM_REQ_EVENT, found);
 		kvm_vcpu_kick(found);
+		rcu_read_unlock();
 	}
 }
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f1b36cf..c16887e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1833,11 +1833,12 @@ static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
 
 static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
 {
-	int i;
+	struct kvm_iter it;
 	struct kvm_vcpu *vcpu;
-
-	kvm_for_each_vcpu(i, vcpu, kvm)
+	rcu_read_lock();
+	kvm_for_each_vcpu(it, vcpu, kvm)
 		vcpu->arch.last_pte_updated = NULL;
+	rcu_read_unlock();
 }
 
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..a302470 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1831,10 +1831,15 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	switch (msr) {
 	case HV_X64_MSR_VP_INDEX: {
 		int r;
+		struct kvm_iter it;
 		struct kvm_vcpu *v;
-		kvm_for_each_vcpu(r, v, vcpu->kvm)
+		struct kvm *kvm =  vcpu->kvm;
+		rcu_read_lock();
+		kvm_for_each_vcpu(it, v, kvm) {
 			if (v == vcpu)
 				data = r;
+		}
+		rcu_read_unlock();
 		break;
 	}
 	case HV_X64_MSR_EOI:
@@ -4966,7 +4971,8 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
 	struct cpufreq_freqs *freq = data;
 	struct kvm *kvm;
 	struct kvm_vcpu *vcpu;
-	int i, send_ipi = 0;
+	int send_ipi = 0;
+	struct kvm_iter it;
 
 	/*
 	 * We allow guests to temporarily run on slowing clocks,
@@ -5016,13 +5022,16 @@ static int kvmclock_cpufreq_notifier(struct notifier_block *nb, unsigned long va
 
 	raw_spin_lock(&kvm_lock);
 	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_for_each_vcpu(i, vcpu, kvm) {
+
+		rcu_read_lock();
+  

Re: [RFC] virtio: use mandatory barriers for remote processor vdevs

2011-12-11 Thread Amos Kong

On 12/12/11 06:27, Benjamin Herrenschmidt wrote:

On Sun, 2011-12-11 at 14:25 +0200, Michael S. Tsirkin wrote:


Forwarding some results by Amos, who run multiple netperf streams in
parallel, from an external box to the guest.  TCP_STREAM results were
noisy.  This could be due to buffering done by TCP, where packet size
varies even as message size is constant.

TCP_RR results were consistent. In this benchmark, after switching
to mandatory barriers, CPU utilization increased by up to 35% while
throughput went down by up to 14%. the normalized throughput/cpu
regressed consistently, between 7 and 35%

The fix applied was simply this:


What machine & processor was this?


pinned guest memory to numa node 1
# numactl -m 1 qemu-kvm ...
pinned guest vcpu threads and vhost thread to single cpu of numa node 1
# taskset -p 0x10 8348 (vhost_net_thread)
# taskset -p 0x20 8353 (vcpu 1 thread)
# taskset -p 0x40 8357 (vcpu 2 thread)
pinned cpu/memory of netperf client process to node 1
# numactl --cpunodebind=1 --membind=1 netperf ...

8 cores
---
processor   : 7
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz
stepping: 2
microcode   : 0xc
cpu MHz : 1596.000
cache size  : 12288 KB
physical id : 1
siblings: 4
core id : 10
cpu cores   : 4
apicid  : 52
initial apicid  : 52
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx 
smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes 
lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid

bogomips: 4787.76
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

# cat /proc/meminfo
MemTotal:   16446616 kB
MemFree:15874092 kB
Buffers:   30404 kB
Cached:   238640 kB
SwapCached:0 kB
Active:   100204 kB
Inactive: 184312 kB
Active(anon):  15724 kB
Inactive(anon):4 kB
Active(file):  84480 kB
Inactive(file):   184308 kB
Unevictable:   0 kB
Mlocked:   0 kB
SwapTotal:   8388604 kB
SwapFree:8388604 kB
Dirty:56 kB
Writeback: 0 kB
AnonPages: 15548 kB
Mapped:11540 kB
Shmem:   256 kB
Slab:  82444 kB
SReclaimable:  19220 kB
SUnreclaim:63224 kB
KernelStack:1224 kB
PageTables: 2256 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:16611912 kB
Committed_AS: 209068 kB
VmallocTotal:   34359738367 kB
VmallocUsed:  224244 kB
VmallocChunk:   34351073668 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:9876 kB
DirectMap2M: 2070528 kB
DirectMap1G:14680064 kB

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 8175 MB
node 0 free: 7706 MB
node 1 cpus: 4 5 6 7
node 1 size: 8192 MB
node 1 free: 7796 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
# numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7
cpubind: 0 1
nodebind: 0 1
membind: 0 1



Cheers,
Ben.


diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3198f2e..fdccb77 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -23,7 +23,7 @@

  /* virtio guest is communicating with a virtual device that actually runs on
   * a host processor.  Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
+#if 0
  /* Where possible, use SMP barriers which are more lightweight than mandatory
   * barriers, because mandatory barriers control MMIO effects on accesses
   * through relaxed memory I/O windows (which virtio does not use). */








Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion

2011-12-11 Thread Xiao Guangrong
On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote:

 From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
 
 Make it clear that this is not related to virtual memory.
 


'vm' means 'virtual machine'...

 Remove vm_ prefix from the corresponding member of the struct kvm to
 avoid kvm->vm_ redundancy alongside.
 
 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp




Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion

2011-12-11 Thread Takuya Yoshikawa

(2011/12/12 12:16), Xiao Guangrong wrote:

On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote:


From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Make it clear that this is not related to virtual memory.




'vm' means 'virtual machine'...


Of course I know.  So I wrote "not related to virtual memory".

What's your point?

Takuya


Re: New Guess OS Creation Problem

2011-12-11 Thread takizo
Hi All, 

I have tried installing CentOS as a guest and it has the same problem: it
cannot read the HDD. I formatted the disk as qcow2 for Linux and raw for
FreeBSD.

--
Paul Ooi 


On Dec 12, 2011, at 10:13 AM, takizo wrote:

 Hi All, 
 
 I am running on CentOS Released 6.1 final. Been using and running Linux KVM 
 quite well for quite some time, something goes wrong after I perform yum 
 upgrade. 
 
 I created new VM yesterday without any problem, same exact installation 
 procedure, installed FreeBSD 8.2. I tried to create a new VM today after yum 
 upgrade, it's able to detect the hard disk, when I start commit FreeBSD 8.2 
 installation, it complains cannot write to disk as stated the error message 
 below.
 
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 
 Below is the log I capture from the VM log file
 
 -- Log start -- 
 2011-12-12 00:23:39.485: starting up
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 4096 -smp 
 1,sockets=1,cores=1,threads=1 -name database -uuid 
 f3e9f320-7826-7e50-94bb-1833f7fd9dfb -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/database.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-reboot 
 -drive 
 file=/opt/cibai/database,if=none,id=drive-ide0-0-0,format=raw,cache=none,aio=threads
  -device 
 ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=2 
 -drive 
 file=/opt/ISO-Download/FreeBSD-8.2-RELEASE-amd64-disc1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw,aio=threads
  -device 
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 
 -netdev tap,fd=22,id=hostnet0 -device 
 e1000,netdev=hostnet0,id=net0,mac=52:54:00:77:a5:a6,bus=pci.0,addr=0x3 
 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
 -usb -vnc 127.0.0.1:2,password -vga cirrus -device 
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
 char device redirected to /dev/pts/6
 Using CPU model cpu64-rhel6
 
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 block I/O error in device 'drive-ide0-0-0': Invalid argument (22)
 -- Log End --
 
 Below is the software version currently running;
 gpxe-roms-qemu-0.9.7-6.7.el6.noarch
 qemu-img-0.12.1.2-2.160.el6_1.8.x86_64
 qemu-kvm-0.12.1.2-2.160.el6_1.8.x86_64
 libvirt-client-0.8.7-18.el6_1.4.x86_64
 libvirt-python-0.8.7-18.el6_1.4.x86_64
 libvirt-0.8.7-18.el6_1.4.x86_64
 
 Any of you having the problem as well?
 I am planning to install CentOS as guest and see whether is has the same 
 problem as well. Thanks. 
 
 --
 Paul Ooi



Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion

2011-12-11 Thread Xiao Guangrong
On 12/12/2011 12:04 PM, Takuya Yoshikawa wrote:

 (2011/12/12 12:16), Xiao Guangrong wrote:
 On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote:

 From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

 Make it clear that this is not related to virtual memory.



 'vm' means 'virtual machine'...
 
 Of course I know.  So I wrote not related to virtual memory.
 
 What's your point?
 


In the code, we have kvm_create_vm()/kvm_destroy_vm(), which
add/delete the 'vm' to/from the vm_list. That is really clear,
so I think this name is OK. :)



buildbot failure in kvm on next-s390

2011-12-11 Thread kvm
The Buildbot has detected a new failure on builder next-s390 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-s390/builds/380

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs

2011-12-11 Thread Matt Evans
On 09/12/11 18:39, Sasha Levin wrote:
 On Fri, 2011-12-09 at 17:55 +1100, Matt Evans wrote:
 Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest
 memory (down in kvm__arch_init()).  For x86, guest memory is a normal
 ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.

 Signed-off-by: Matt Evans m...@ozlabs.org
 ---
  tools/kvm/builtin-run.c  |4 ++-
  tools/kvm/include/kvm/kvm.h  |4 +-
  tools/kvm/include/kvm/util.h |4 +++
  tools/kvm/kvm.c  |4 +-
  tools/kvm/util.c |   45 
 ++
  tools/kvm/x86/kvm.c  |   20 +++--
  6 files changed, 73 insertions(+), 8 deletions(-)

 diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
 index 7969901..0acfe81 100644
 --- a/tools/kvm/builtin-run.c
 +++ b/tools/kvm/builtin-run.c
 @@ -82,6 +82,7 @@ static const char *guest_mac;
  static const char *host_mac;
  static const char *script;
  static const char *guest_name;
 +static const char *hugetlbfs_path;
  static struct virtio_net_params *net_params;
  static bool single_step;
  static bool readonly_image[MAX_DISK_IMAGES];
 @@ -422,6 +423,7 @@ static const struct option options[] = {
 OPT_CALLBACK('\0', "tty", NULL, "tty id",
  "Remap guest TTY into a pty on the host",
  tty_parser),
 +OPT_STRING('\0', "hugetlbfs", &hugetlbfs_path, "path", "Hugetlbfs path"),
 
 OPT_GROUP("Kernel options:"),
 OPT_STRING('k', "kernel", &kernel_filename, "kernel",
 @@ -807,7 +809,7 @@ int kvm_cmd_run(int argc, const char **argv, const char 
 *prefix)
  guest_name = default_name;
  }
  
 -kvm = kvm__init(dev, ram_size, guest_name);
 +kvm = kvm__init(dev, hugetlbfs_path, ram_size, guest_name);
  
 kvm->single_step = single_step;
  
 diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
 index 5fe6e75..7159952 100644
 --- a/tools/kvm/include/kvm/kvm.h
 +++ b/tools/kvm/include/kvm/kvm.h
 @@ -30,7 +30,7 @@ struct kvm_ext {
  void kvm__set_dir(const char *fmt, ...);
  const char *kvm__get_dir(void);
  
 -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name);
 +struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 
 ram_size, const char *name);
  int kvm__recommended_cpus(struct kvm *kvm);
  int kvm__max_cpus(struct kvm *kvm);
  void kvm__init_ram(struct kvm *kvm);
 @@ -54,7 +54,7 @@ int kvm__enumerate_instances(int (*callback)(const char 
 *name, int pid));
  void kvm__remove_socket(const char *name);
  
  void kvm__arch_set_cmdline(char *cmdline, bool video);
 -void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, u64 ram_size, 
 const char *name);
 +void kvm__arch_init(struct kvm *kvm, const char *kvm_dev, const char 
 *hugetlbfs_path, u64 ram_size, const char *name);
  void kvm__arch_setup_firmware(struct kvm *kvm);
  bool kvm__arch_cpu_supports_vm(void);
  void kvm__arch_periodic_poll(struct kvm *kvm);
 diff --git a/tools/kvm/include/kvm/util.h b/tools/kvm/include/kvm/util.h
 index dc2e0b9..1f6fbbd 100644
 --- a/tools/kvm/include/kvm/util.h
 +++ b/tools/kvm/include/kvm/util.h
 @@ -20,6 +20,7 @@
 #include <limits.h>
 #include <sys/param.h>
 #include <sys/types.h>
 +#include <linux/types.h>
  
  #ifdef __GNUC__
  #define NORETURN __attribute__((__noreturn__))
 @@ -75,4 +76,7 @@ static inline void msleep(unsigned int msecs)
  {
  usleep(MSECS_TO_USECS(msecs));
  }
 +
 +void *mmap_hugetlbfs(const char *htlbfs_path, u64 size);
 +
  #endif /* KVM__UTIL_H */
 diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
 index c54f886..35ca2c5 100644
 --- a/tools/kvm/kvm.c
 +++ b/tools/kvm/kvm.c
 @@ -306,7 +306,7 @@ int kvm__max_cpus(struct kvm *kvm)
  return ret;
  }
  
 -struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, const char *name)
 +struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 
 ram_size, const char *name)
  {
  struct kvm *kvm;
  int ret;
 @@ -339,7 +339,7 @@ struct kvm *kvm__init(const char *kvm_dev, u64 ram_size, 
 const char *name)
  if (kvm__check_extensions(kvm))
 die("A required KVM extention is not supported by OS");
  
 -kvm__arch_init(kvm, kvm_dev, ram_size, name);
 +kvm__arch_init(kvm, kvm_dev, hugetlbfs_path, ram_size, name);
  
 kvm->name = name;
  
 diff --git a/tools/kvm/util.c b/tools/kvm/util.c
 index 4efbce9..90b6a3b 100644
 --- a/tools/kvm/util.c
 +++ b/tools/kvm/util.c
 @@ -4,6 +4,11 @@
  
 #include "kvm/util.h"
 
 +#include <linux/magic.h>	/* For HUGETLBFS_MAGIC */
 +#include <sys/mman.h>
 +#include <sys/stat.h>
 +#include <sys/statfs.h>
 +
  static void report(const char *prefix, const char *err, va_list params)
  {
  char msg[1024];
 @@ -99,3 +104,43 @@ size_t strlcat(char *dest, const char *src, size_t count)
  
  return res;
  }
 +
 +void *mmap_hugetlbfs(const char *htlbfs_path, u64 size)
 +{
 +char mpath[PATH_MAX];
 +int fd;
 +int r;
 +struct statfs sfs;
 +

Re: [RFC] virtio: use mandatory barriers for remote processor vdevs

2011-12-11 Thread Rusty Russell
On Mon, 12 Dec 2011 11:06:53 +0800, Amos Kong ak...@redhat.com wrote:
 On 12/12/11 06:27, Benjamin Herrenschmidt wrote:
  On Sun, 2011-12-11 at 14:25 +0200, Michael S. Tsirkin wrote:
 
  Forwarding some results by Amos, who ran multiple netperf streams in
  parallel, from an external box to the guest.  TCP_STREAM results were
  noisy.  This could be due to buffering done by TCP, where packet size
  varies even as message size is constant.
 
  TCP_RR results were consistent. In this benchmark, after switching
  to mandatory barriers, CPU utilization increased by up to 35% while
  throughput went down by up to 14%. The normalized throughput/CPU
  regressed consistently, by between 7 and 35%.
 
  The fix applied was simply this:
 
  What machine & processor was this?
 
 Pinned guest memory to NUMA node 1

Please try this patch.  How much does the branch cost us?

(Compiles, untested).

Thanks,
Rusty.

From: Rusty Russell ru...@rustcorp.com.au
Subject: virtio: harsher barriers for virtio-mmio.

We were cheating with our barriers; using the smp ones rather than the
real device ones.  That was fine, until virtio-mmio came along, which
could be talking to a real device (a non-SMP CPU).

Unfortunately, just putting back the real barriers (reverting
d57ed95d) causes a performance regression on virtio-pci.  In
particular, Amos reports netperf's TCP_RR over virtio_net CPU
utilization increased up to 35% while throughput went down by up to
14%.

By comparison, this branch costs us???

Reference: https://lkml.org/lkml/2011/12/11/22

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/lguest/lguest_device.c |   10 ++
 drivers/s390/kvm/kvm_virtio.c  |2 +-
 drivers/virtio/virtio_mmio.c   |7 ---
 drivers/virtio/virtio_pci.c|4 ++--
 drivers/virtio/virtio_ring.c   |   34 +-
 include/linux/virtio_ring.h|1 +
 tools/virtio/linux/virtio.h|1 +
 tools/virtio/virtio_test.c |3 ++-
 8 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -291,11 +291,13 @@ static struct virtqueue *lg_find_vq(stru
}
 
/*
-* OK, tell virtio_ring.c to set up a virtqueue now we know its size
-* and we've got a pointer to its pages.
+* OK, tell virtio_ring.c to set up a virtqueue now we know its size
+* and we've got a pointer to its pages.  Note that we set weak_barriers
+* to 'true': the host is just a(nother) SMP CPU, so we only need inter-cpu
+* barriers.
 */
-   vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN,
-vdev, lvq->pages, lg_notify, callback, name);
+   vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN, vdev,
+true, lvq->pages, lg_notify, callback, name);
if (!vq) {
err = -ENOMEM;
goto unmap;
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -198,7 +198,7 @@ static struct virtqueue *kvm_find_vq(str
goto out;
 
vq = vring_new_virtqueue(config->num, KVM_S390_VIRTIO_RING_ALIGN,
-vdev, (void *) config->address,
+vdev, true, (void *) config->address,
 kvm_notify, callback, name);
if (!vq) {
err = -ENOMEM;
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -309,9 +309,10 @@ static struct virtqueue *vm_setup_vq(str
writel(virt_to_phys(info->queue) >> PAGE_SHIFT,
vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
 
-   /* Create the vring */
-   vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN,
-vdev, info->queue, vm_notify, callback, name);
+   /* Create the vring: no weak barriers, the other side could
+* be an independent device. */
+   vq = vring_new_virtqueue(info->num, VIRTIO_MMIO_VRING_ALIGN, vdev,
+false, info->queue, vm_notify, callback, name);
if (!vq) {
err = -ENOMEM;
goto error_new_virtqueue;
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -414,8 +414,8 @@ static struct virtqueue *setup_vq(struct
  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
 
/* create the vring */
-   vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN,
-vdev, info->queue, vp_notify, callback, name);
+   vq = vring_new_virtqueue(info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
+   

Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs

2011-12-11 Thread Matt Evans
On 09/12/11 19:42, Pekka Enberg wrote:
 On Fri, Dec 9, 2011 at 8:55 AM, Matt Evans m...@ozlabs.org wrote:
 Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest
 memory (down in kvm__arch_init()).  For x86, guest memory is a normal
 ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.

 Signed-off-by: Matt Evans m...@ozlabs.org
 
 Btw, why don't you want to use MADV_HUGEPAGE for this? You could just
 do it unconditionally, no?

Well, I'm manually mapping from hugetlbfs as currently* PPC KVM requires
hugepages to back guest RAM and MADV_HUGEPAGE is just a hint, no?  I also wanted
things to work on kernels without transparent hugepages enabled.  I think it's
safer to do things explicitly, as if the user requests hugepages it's more
transparent (I'm thinking benchmarking, etc.) to be definitely using hugepages.
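
To make the hint-versus-explicit distinction concrete, here is a rough
sketch, not the code from the patch.  The first helper asks for transparent
hugepages with madvise(), which the kernel may ignore; the second maps a
temporary file created under a given directory, which yields huge pages (or
fails) when that directory is a hugetlbfs mount.  Both function names and
the "guestramXXXXXX" template are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Anonymous mapping plus a THP hint; the kernel is free to ignore the hint. */
static void *map_anon_with_thp_hint(size_t size)
{
	void *addr;

	assert(size > 0);
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;
#ifdef MADV_HUGEPAGE
	/* Failure is deliberately ignored: this is only a hint. */
	(void)madvise(addr, size, MADV_HUGEPAGE);
#endif
	return addr;
}

/*
 * Explicit mapping of a temporary file created under dir.  On a hugetlbfs
 * mount, size must be a multiple of the huge page size.
 */
static void *map_file_under(const char *dir, size_t size)
{
	char path[PATH_MAX];
	void *addr;
	int fd, n;

	assert(dir != NULL && size > 0);
	n = snprintf(path, sizeof(path), "%s/guestramXXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return NULL;
	fd = mkstemp(path);
	if (fd < 0)
		return NULL;
	unlink(path);		/* the open fd and mapping keep the file alive */
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}
```

With the first helper there is no way to tell from the return value whether
huge pages are in use; with the second, success on a hugetlbfs path implies
it, which is the property wanted for benchmarking.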


Cheers,


Matt


*: I know Paul's posted patches to implement smallpage support... so this will
   change in time.


buildbot failure in kvm on next-x86_64

2011-12-11 Thread kvm
The Buildbot has detected a new failure on builder next-x86_64 while building 
kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-x86_64/builds/378

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot


buildbot failure in kvm on next-i386

2011-12-11 Thread kvm
The Buildbot has detected a new failure on builder next-i386 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-i386/builds/378

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately

2011-12-11 Thread Pekka Enberg
On Mon, 2011-12-12 at 12:03 +1100, Matt Evans wrote:
 On 09/12/11 19:29, Pekka Enberg wrote:
  On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin levinsasha...@gmail.com 
  wrote:
  If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the
  headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM
  tools tree as well.
  
  Yup, all we need is ACKs from PPC maintainers.
 
 Cool, I've pinged them re. __SANE_USERSPACE_TYPES__, and that would be useful 
 to
 carry in your tree.  But, IMHO, the patch I sent to Alex ([PATCH] KVM: PPC: 
 Add
 KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS) should go in via his tree.  The
 KVM/PPC-HV kernel folk will get it quicker in their trees (pulled from Alex) 
 and
 it's not a build problem, only a limit of SMP CPU numbers.  That is, if 
 you're
 building a kernel for PPC KVM today you'll probably use something more similar
 to Alex's tree than mainline/kvm tools tree.

Definitely. The __SANE_USERSPACE_TYPES__ patch should probably go to
powerpc git tree in addition to our tree.

Pekka



Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs

2011-12-11 Thread Pekka Enberg
On Mon, Dec 12, 2011 at 7:17 AM, Matt Evans m...@ozlabs.org wrote:
 Well, I'm manually mapping from hugetlbfs as currently* PPC KVM requires
 hugepages to back guest RAM and MADV_HUGEPAGE is just a hint, no?  I also 
 wanted
 things to work on kernels without transparent hugepages enabled.  I think it's
 safer to do things explicitly, as if the user requests hugepages it's more
 transparent (I'm thinking benchmarking, etc.) to be definitely using 
 hugepages.

OK, makes sense. You should probably mention that in the changelog.


Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs

2011-12-11 Thread Matt Evans
On 09/12/11 19:38, Pekka Enberg wrote:
 On Fri, Dec 9, 2011 at 8:55 AM, Matt Evans m...@ozlabs.org wrote:
 Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest
 memory (down in kvm__arch_init()).  For x86, guest memory is a normal
 ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.

 Signed-off-by: Matt Evans m...@ozlabs.org
 
 +void *mmap_hugetlbfs(const char *htlbfs_path, u64 size)
 +{
 +   char mpath[PATH_MAX];
 +   int fd;
 +   int r;
 +   struct statfs sfs;
 +   void *addr;
 +
 +   do {
 +   /*
 +* QEMU seems to work around this returning EINTR...  Let's do
 +* that too.
 +*/
 +   r = statfs(htlbfs_path, &sfs);
 +   } while (r && errno == EINTR);
 
 Can this really happen? What about EAGAIN? The retry logic really
 wants to live in tools/kvm/read-write.c as a xstatfs() wrapper if we
 do need this.

I don't think it can.  As per the comment, I thought QEMU knew something I
didn't but I haven't seen any other reason for doing this.  I'll remove it,
thanks for the sanity jolt.


Matt



Re: [Android-virt] [PATCH v5 05/13] ARM: KVM: Inject IRQs and FIQs from userspace

2011-12-11 Thread Alexander Graf

On 11.12.2011, at 20:48, Peter Maydell peter.mayd...@linaro.org wrote:

 On 11 December 2011 19:30, Christoffer Dall
 c.d...@virtualopensystems.com wrote:
 On Sun, Dec 11, 2011 at 11:03 AM, Peter Maydell
 peter.mayd...@linaro.org wrote:
 Removing the mask would be wrong since the irq field here
 is encoding both cpu number and irq-vs-fiq. The default is
 just an unreachable condition. (Why are we using % here
 rather than the obvious bit operation, incidentally?)
 
 right, I will remove the default case.
 
 I highly doubt that the difference in using a bitop will be measurably
 more efficient, but if you feel strongly about it, I can change it to
 a shift and bitwise and, which I assume is what you mean by the
 obvious bit operation? I think my CS background speaks for using %,
 but whatever.
 
 Certainly the compiler ought to be able to figure out the
 two are the same thing; I just think irq & 1 is more readable
 than irq % 2 (because it's being clear that it's treating the
 variable as a pile of bits rather than an integer). This is
 bikeshedding rather, though, and style issues in kernel code
 are a matter for the kernel folk. So you can ignore me :-)

Yes, the general rule of thumb is to use bit operations where you can. And in 
this case it certainly makes sense :).

Plus, bit operations are an order of magnitude faster than div/mod usually.
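
To illustrate, a small sketch.  The encoding here (bit 0 selects FIQ versus
IRQ, the remaining bits carry the vcpu index) is an assumption made for the
example, not necessarily the layout used in the patch, and the helper names
are hypothetical.  For unsigned values, irq & 1 equals irq % 2 and irq >> 1
equals irq / 2, so the bit form only changes how the intent reads.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed layout: bit 0 = FIQ (1) or IRQ (0), bits 1 and up = vcpu index. */
static inline unsigned int irq_field_encode(unsigned int vcpu, bool fiq)
{
	assert(vcpu <= (UINT_MAX >> 1));	/* vcpu must fit above bit 0 */
	return (vcpu << 1) | (fiq ? 1u : 0u);
}

static inline bool irq_field_is_fiq(unsigned int irq)
{
	return irq & 1;		/* same value as irq % 2 for unsigned irq */
}

static inline unsigned int irq_field_vcpu(unsigned int irq)
{
	return irq >> 1;	/* same value as irq / 2 for unsigned irq */
}
```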


Alex



Re: [PATCH 1/4] KVM: Rename vm_list to kvm_list to avoid confusion

2011-12-11 Thread Takuya Yoshikawa

(2011/12/12 13:51), Xiao Guangrong wrote:

On 12/12/2011 12:04 PM, Takuya Yoshikawa wrote:


(2011/12/12 12:16), Xiao Guangrong wrote:

On 12/12/2011 06:24 AM, Takuya Yoshikawa wrote:


From: Takuya Yoshikawayoshikawa.tak...@oss.ntt.co.jp

Make it clear that this is not related to virtual memory.




'vm' means 'virtual machine'...


Of course I know.  So I wrote not related to virtual memory.

What's your point?




In the code, we have kvm_create_vm()/kvm_destroy_vm(), which
add/delete the 'vm' to/from the vm_list. That is really clear,
so I think this name is OK. :)



Some reasons I wanted to change this:

- The lock which protects this list is called kvm_lock, not vm_lock
- Some architectures are using vm_list for vm region member
- The list connects kvm instances (struct kvm) and we are doing
  list_for_each_entry(kvm, vm_list, vm_list), not
  list_for_each_entry(vm, vm_list, vm_list)
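
The third point is easier to see in code.  Below is a self-contained
userspace sketch with simplified stand-ins for the kernel's list helpers
(the real ones live in linux/list.h); the struct kvm here is a toy with just
an id, and count_instances() is a hypothetical helper.  Note how the same
name, vm_list, appears both as the global list head and as the member inside
each instance, while the cursor is called kvm.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified variant of the kernel macro (relies on GNU __typeof__). */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	assert(new != NULL && head != NULL);
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Toy stand-in for struct kvm: the linking member is named vm_list ... */
struct kvm {
	int id;
	struct list_head vm_list;
};

/* ... and so is the global list head. */
static struct list_head vm_list = LIST_HEAD_INIT(vm_list);

static int count_instances(void)
{
	struct kvm *kvm;
	int n = 0;

	list_for_each_entry(kvm, &vm_list, vm_list)
		n++;
	return n;
}
```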

kvm_create_vm() not only creates a kvm instance but also performs more
general virtual machine initialization, so _vm is reasonable there.
(I would not mind if it were static in kvm_main.c, but it is more widely used.)


But I do not mind dropping this patch if other people also want to keep the
name, so I will wait for some more comments.


Thanks,
Takuya


Re: Current kernel fails to compile with KVM on PowerPC

2011-12-11 Thread Alexander Graf

On 11.12.2011, at 16:16, Jörg Sommer wrote:

 Alexander Graf wrote on Tue 22. Nov, 22:29 (+0100):
 On 22.11.2011, at 21:04, Jörg Sommer wrote:
 Jörg Sommer wrote on Mon 07. Nov, 20:48 (+0100):
 I'm trying to build the kernel with the git commit-id
 31555213f03bca37d2c02e10946296052f4ecfcd, but it fails
 
 CHK include/linux/version.h
 HOSTCC  scripts/mod/modpost.o
 CHK include/generated/utsrelease.h
 UPD include/generated/utsrelease.h
 HOSTLD  scripts/mod/modpost
 GEN include/generated/bounds.h
 CC  arch/powerpc/kernel/asm-offsets.s
 In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function 
 ‘compute_tlbie_rb’:
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: 
 ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: 
 each undeclared identifier is reported only once for each function it 
 appears in
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: 
 ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: 
 ‘HPTE_V_LARGE’ undeclared (first use in this function)
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: 
 warning: right shift count >= width of type [enabled by default]
 make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1
 make[2]: *** [prepare0] Error 2
 make[1]: *** [deb-pkg] Error 2
 make: *** [deb-pkg] Error 2
 
 I'm still having this problem. I can't build
 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches that
 make the kernel build and not oops [1] on PowerPC?
 
 The failures above should be fixed by now.
 
 I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 (a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain
 a suitable commit. Where can I find it?

Please try:

  git://github.com/agraf/linux-2.6.git kvm-ppc-next

That's my WIP tree. I still have a few more patches I want to collect before 
shoving everything through automated testing and pushing it on to Avi.


Alex



Re: Current kernel fails to compile with KVM on PowerPC

2011-12-11 Thread Jörg Sommer
Alexander Graf wrote on Tue 22. Nov, 22:29 (+0100):
 On 22.11.2011, at 21:04, Jörg Sommer wrote:
  Jörg Sommer wrote on Mon 07. Nov, 20:48 (+0100):
  I'm trying to build the kernel with the git commit-id
  31555213f03bca37d2c02e10946296052f4ecfcd, but it fails
  
   CHK include/linux/version.h
   HOSTCC  scripts/mod/modpost.o
   CHK include/generated/utsrelease.h
   UPD include/generated/utsrelease.h
   HOSTLD  scripts/mod/modpost
   GEN include/generated/bounds.h
   CC  arch/powerpc/kernel/asm-offsets.s
  In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function 
  ‘compute_tlbie_rb’:
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: 
  ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: 
  each undeclared identifier is reported only once for each function it 
  appears in
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: 
  ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: 
  ‘HPTE_V_LARGE’ undeclared (first use in this function)
  /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: 
  warning: right shift count >= width of type [enabled by default]
  make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1
  make[2]: *** [prepare0] Error 2
  make[1]: *** [deb-pkg] Error 2
  make: *** [deb-pkg] Error 2
  
  I'm still having this problem. I can't build
  6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches that
  make the kernel build and not oops [1] on PowerPC?
 
 The failures above should be fixed by now.

I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git
(a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain
a suitable commit. Where can I find it?

Bye, Jörg.
-- 
 I hardly know OpenBSD at all; what are its
 advantages over Linux and iptables?
The foxtail effect is bigger. :-
Message-ID: slrnb11064.54g.hsch...@humbert.ddns.org




Re: [PATCH V2 04/23] kvm tools: Get correct 64-bit types on PPC64 and link appropriately

2011-12-11 Thread Matt Evans
On 09/12/11 19:29, Pekka Enberg wrote:
 On Fri, Dec 9, 2011 at 10:24 AM, Sasha Levin levinsasha...@gmail.com wrote:
 If you also got kernel patches that add __SANE_USERSPACE_TYPES__ to the
 headers, and KVM_CAP_NR_VCPUS to KVM PPC, we can carry them in the KVM
 tools tree as well.
 
 Yup, all we need is ACKs from PPC maintainers.

Cool, I've pinged them re. __SANE_USERSPACE_TYPES__, and that would be useful to
carry in your tree.  But, IMHO, the patch I sent to Alex ([PATCH] KVM: PPC: Add
KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS) should go in via his tree.  The
KVM/PPC-HV kernel folk will get it quicker in their trees (pulled from Alex) and
it's not a build problem, only a limit of SMP CPU numbers.  That is, if you're
building a kernel for PPC KVM today you'll probably use something more similar
to Alex's tree than mainline/kvm tools tree.


Cheers,


Matt
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 17/23] kvm tools: Add ability to map guest RAM from hugetlbfs

2011-12-11 Thread Pekka Enberg
On Mon, Dec 12, 2011 at 7:17 AM, Matt Evans m...@ozlabs.org wrote:
 Well, I'm manually mapping from hugetlbfs as currently* PPC KVM requires
 hugepages to back guest RAM and MADV_HUGEPAGE is just a hint, no?  I also
 wanted things to work on kernels without transparent hugepages enabled.  I
 think it's safer to do things explicitly, as if the user requests hugepages
 it's more transparent (I'm thinking benchmarking, etc.) to be definitely
 using hugepages.

OK, makes sense. You should probably mention that in the changelog.
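
To make the explicit route concrete, here is a rough sketch of mapping guest
RAM from a hugetlbfs mount. It is not the code from the patch: the names
map_guest_ram_hugetlbfs() and round_up_pow2() and the temp-file name are made
up for illustration. It relies on statfs() reporting the huge page size as
f_bsize on hugetlbfs. Unlike madvise(..., MADV_HUGEPAGE), which the kernel is
free to ignore, this either returns huge-page-backed memory or fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6 /* same value as in <linux/magic.h> */
#endif

/* Round size up to a multiple of blk; blk must be a power of two. */
uint64_t round_up_pow2(uint64_t size, uint64_t blk)
{
	assert(blk != 0 && (blk & (blk - 1)) == 0);
	return (size + blk - 1) & ~(blk - 1);
}

/*
 * Hypothetical helper: map `size` bytes backed by the hugetlbfs mount at
 * mount_path. Returns MAP_FAILED on any error, including the case where
 * mount_path is not a hugetlbfs mount.
 */
void *map_guest_ram_hugetlbfs(const char *mount_path, uint64_t size)
{
	struct statfs sfs;
	char tmpl[PATH_MAX];
	void *addr;
	int fd, n;

	if (statfs(mount_path, &sfs) < 0)
		return MAP_FAILED;
	if ((uint32_t)sfs.f_type != HUGETLBFS_MAGIC)
		return MAP_FAILED;
	if (sfs.f_bsize <= 0)
		return MAP_FAILED;

	/* The mapping length must be a multiple of the huge page size. */
	size = round_up_pow2(size, (uint64_t)sfs.f_bsize);

	n = snprintf(tmpl, sizeof(tmpl), "%s/guest-ram-XXXXXX", mount_path);
	if (n < 0 || (size_t)n >= sizeof(tmpl))
		return MAP_FAILED;

	fd = mkstemp(tmpl);
	if (fd < 0)
		return MAP_FAILED;
	unlink(tmpl); /* the mapping keeps the file alive */

	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return MAP_FAILED;
	}

	addr = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);
	return addr;
}
```

A caller would check the result against MAP_FAILED and then either fall back
to anonymous memory or refuse to start, depending on whether the user asked
for hugepages explicitly.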


Re: Current kernel fails to compile with KVM on PowerPC

2011-12-11 Thread Alexander Graf

On 11.12.2011, at 16:16, Jörg Sommer wrote:

 Alexander Graf wrote on Tue 22. Nov, 22:29 (+0100):
 On 22.11.2011, at 21:04, Jörg Sommer wrote:
 Jörg Sommer wrote on Mon 07. Nov, 20:48 (+0100):
 I'm trying to build the kernel with the git commit-id
 31555213f03bca37d2c02e10946296052f4ecfcd, but it fails
 
 CHK include/linux/version.h
 HOSTCC  scripts/mod/modpost.o
 CHK include/generated/utsrelease.h
 UPD include/generated/utsrelease.h
 HOSTLD  scripts/mod/modpost
 GEN include/generated/bounds.h
 CC  arch/powerpc/kernel/asm-offsets.s
 In file included from arch/powerpc/kernel/asm-offsets.c:59:0:
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h: In function ‘compute_tlbie_rb’:
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: error: ‘HPTE_V_SECONDARY’ undeclared (first use in this function)
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:393:10: note: each undeclared identifier is reported only once for each function it appears in
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:396:12: error: ‘HPTE_V_1TB_SEG’ undeclared (first use in this function)
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:401:10: error: ‘HPTE_V_LARGE’ undeclared (first use in this function)
 /home/joerg/git/linux/arch/powerpc/include/asm/kvm_book3s.h:415:2: warning: right shift count >= width of type [enabled by default]
 make[3]: *** [arch/powerpc/kernel/asm-offsets.s] Fehler 1
 make[2]: *** [prepare0] Fehler 2
 make[1]: *** [deb-pkg] Fehler 2
 make: *** [deb-pkg] Fehler 2
 
 I'm still having this problem. I can't build
 6fe4c6d466e95d31164f14b1ac4aefb51f0f4f82. Are there any patches to
 make the kernel build and not oops [1] on PowerPC?
 
 The failures above should be fixed by now.
 
 I've pulled git://git.kernel.org/pub/scm/virt/kvm/kvm.git
 (a41d08d13f903da5c633fc58ee074156f05ab3ce), but this tree doesn't contain
 a suitable commit. Where can I find it?

Please try:

  git://github.com/agraf/linux-2.6.git kvm-ppc-next

That's my WIP tree. I still have a few more patches I want to collect before 
shoving everything through automated testing and pushing it on to Avi.


Alex
