[GIT PULL] more KVM changes for 3.11

2013-07-08 Thread Gleb Natapov
Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.11-2

to receive more KVM updates for the 3.11 merge window. There is a fix for
a bug that prevents some guests from working on old Intel CPUs, and a patch
that integrates arm64 KVM (merged via the arm64 tree) into Kconfig.


Gleb Natapov (1):
  KVM: VMX: mark unusable segment as nonpresent

Marc Zyngier (1):
  arm64: KVM: Kconfig integration

 arch/arm64/Kconfig  |2 ++
 arch/arm64/kernel/asm-offsets.c |1 +
 arch/arm64/kvm/Kconfig  |   51 +++
 arch/x86/kvm/vmx.c  |   11 +++--
 4 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/Kconfig
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: fix boot of guests with more than 4gb of ram

2013-07-08 Thread Pekka Enberg
On Sun, Jul 7, 2013 at 7:00 PM, Sasha Levin sasha.le...@oracle.com wrote:
 Commit "kvm tools: virtio: remove hardcoded assumptions
 about guest page size" has introduced a bug that prevented
 guests with more than 4gb of ram from booting.

 The issue is that 'pfn' is a 32bit integer, so multiplying
 it by the page size to get the actual address causes an
 overflow if the pfn refers to a memory area above 4gb.

 Signed-off-by: Sasha Levin sasha.le...@oracle.com

Will, Michael, Asias, good to merge?


[Bug 60505] Heavy network traffic triggers vhost_net lockup

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #4 from Bart Van Assche bvanass...@acm.org ---
I have not yet tried to disable zero-copy tx. But even with the vhost-net patch
applied on kernel v3.9.9 I can still trigger this issue:

Jul  8 10:58:01 asus kernel: BUG: unable to handle kernel NULL pointer
dereference at 001c
Jul  8 10:58:01 asus kernel: IP: [810f73a9]
put_compound_page+0x89/0x170
Jul  8 10:58:01 asus kernel: PGD 0 
Jul  8 10:58:01 asus kernel: Oops:  [#1] SMP 
Jul  8 10:58:01 asus kernel: Modules linked in: dm_queue_length dm_multipath
ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net tun fuse
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables
x_tables af_packet bridge stp llc rdma_ucm rdma_cm iw_cm ib_addr ib_srp
scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib
ib_sa ib_mad ib_core dm_mod hid_generic usbhid hid acpi_cpufreq mperf kvm_intel
i2c_i801 kvm r8169 ehci_pci snd_hda_codec_hdmi qla2xxx snd_hda_codec_realtek
snd_hda_intel snd_hda_codec snd_hwdep ehci_hcd snd_pcm snd_seq mii sr_mod cdrom
sg snd_timer pcspkr snd_seq_device mlx4_core scsi_transport_fc wmi snd
soundcore snd_page_alloc crc32c_intel microcode autofs4 ext4 jbd2 mbcache crc16
raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx
raid10 raid0 raid1 sd_mod crc_t10dif i915 drm_kms_helper drm ahci libahci
intel_agp i2c_algo_bit intel_gtt agpgart xhci_hcd i2c_core video usbcore
usb_common button processor thermal_sys hwmon scsi_dh_alua scsi_dh pata_acpi
libata scsi_mod
Jul  8 10:58:01 asus kernel: CPU 3 
Jul  8 10:58:01 asus kernel: Pid: 5485, comm: vhost-5462 Not tainted 3.9.9+ #1 Gigabyte Technology Co., Ltd. Z68X-UD3H-B3/Z68X-UD3H-B3
Jul  8 10:58:01 asus kernel: RIP: 0010:[810f73a9]  [810f73a9] put_compound_page+0x89/0x170
Jul  8 10:58:01 asus kernel: RSP: 0018:8800aab13bd8  EFLAGS: 00010286
Jul  8 10:58:01 asus kernel: RAX: 880118b0b600 RBX: 880118b0b800 RCX: ea000252801c
Jul  8 10:58:01 asus kernel: RDX: 0140 RSI: 0246 RDI: 880118b0b800
Jul  8 10:58:01 asus kernel: RBP: 8800aab13bf8 R08: 8800aa8f4518 R09: 0010
Jul  8 10:58:01 asus kernel: R10:  R11: 7fa0c000 R12: 
Jul  8 10:58:01 asus kernel: R13: a078f96c R14: 91aa R15: 8800b3bb7500
Jul  8 10:58:01 asus kernel: FS:  () GS:88011fac() knlGS:
Jul  8 10:58:01 asus kernel: CS:  0010 DS:  ES:  CR0: 80050033
Jul  8 10:58:01 asus kernel: CR2: 001c CR3: aab9f000 CR4: 000427e0
Jul  8 10:58:01 asus kernel: DR0:  DR1:  DR2: 
Jul  8 10:58:01 asus kernel: DR3:  DR6: 0ff0 DR7: 0400
Jul  8 10:58:01 asus kernel: Process vhost-5462 (pid: 5485, threadinfo 8800aab12000, task 88010792)
Jul  8 10:58:01 asus kernel: Stack:
Jul  8 10:58:01 asus kernel: eaecae40 0012 8800b3bb7500 a078f96c
Jul  8 10:58:01 asus kernel: 8800aab13c08 810f77ec 8800aab13c28 8132045f
Jul  8 10:58:01 asus kernel: 8800b3bb7500 8800b3bb7500 8800aab13c48 813204fe
Jul  8 10:58:01 asus kernel: Call Trace:
Jul  8 10:58:01 asus kernel: [810f77ec] put_page+0x2c/0x40
Jul  8 10:58:01 asus kernel: [8132045f] skb_release_data+0x8f/0x110
Jul  8 10:58:01 asus kernel: [813204fe] __kfree_skb+0x1e/0xa0
Jul  8 10:58:01 asus kernel: [813205b6] kfree_skb+0x36/0xa0
Jul  8 10:58:01 asus kernel: [a078f96c] tun_get_user+0x71c/0x810 [tun]
Jul  8 10:58:01 asus kernel: [a078faba] tun_sendmsg+0x5a/0x80 [tun]
Jul  8 10:58:01 asus kernel: [a079e607] handle_tx+0x287/0x680 [vhost_net]
Jul  8 10:58:01 asus kernel: [a079ea35] handle_tx_kick+0x15/0x20 [vhost_net]
Jul  8 10:58:01 asus kernel: [a079a80a] vhost_worker+0xaa/0x1a0 [vhost_net]
Jul  8 10:58:01 asus kernel: [8105ef80] kthread+0xc0/0xd0
Jul  8 10:58:01 asus kernel: [8140395c] ret_from_fork+0x7c/0xb0
Jul  8 10:58:01 asus kernel: Code: 8b 6d f8 c9 c3 48 8b 07 f6 c4 80 75 0d f0 ff
4b 1c 0f 94 c0 84 c0 74 c9 eb bf 4c 8b 67 30 48 8b 07 f6 c4 80 74 e7 4c 39 e7
74 e2 41 8b 54 24 1c 49 8d 4c 24 1c 85 d2 74 d4 8d 72 01 89 d0 f0 0f 
Jul  8 10:58:01 asus kernel: RIP  [810f73a9]
put_compound_page+0x89/0x170
Jul  8 10:58:01 asus kernel: RSP 8800aab13bd8
Jul  8 10:58:01 asus kernel: CR2: 001c
Jul  8 10:58:01 asus kernel: ---[ end trace 481d0b283c089c9a ]---

The patch I ran this test with is as follows:

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index dfff647..98f81e6 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -857,7 +857,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 	mutex_unlock(&vq->mutex);

 

[PATCH 1/2] KVM: PPC: Book3S HV: Correct tlbie usage

2013-07-08 Thread Paul Mackerras
This corrects the usage of the tlbie (TLB invalidate entry) instruction
in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
On the PPC970, the bit to select large vs. small page is in the instruction,
not in the RB register value.  This changes the code to use the correct
form on PPC970.

On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
field of the RB value incorrectly for 64k pages.  This fixes it.

Since we now have several cases to handle for the tlbie instruction, this
factors out the code to do a sequence of tlbies into a new function,
do_tlbies(), and calls that from the various places where the code was
doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
use the same global_invalidates() function for determining whether to do
local or global TLB invalidations as is used in other places, for
consistency, and also to make sure that kvm->arch.need_tlb_flush gets
updated properly.

Signed-off-by: Paul Mackerras pau...@samba.org
Cc: sta...@vger.kernel.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 139 ++-
 2 files changed, 82 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9c1ff33..dc6b84a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -100,7 +100,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 			/* (masks depend on page size) */
 			rb |= 0x1000;		/* page encoding in LP field */
 			rb |= (va_low & 0x7f) << 16;	/* 7b of VA in AVA/LP field */
-			rb |= (va_low & 0xfe);		/* AVAL field (P7 doesn't seem to care) */
+			rb |= ((va_low << 4) & 0xf0);	/* AVAL field (P7 doesn't seem to care) */
 		}
 	} else {
 		/* 4kB page */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 6dcbb49..105b00f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -385,6 +385,80 @@ static inline int try_lock_tlbie(unsigned int *lock)
return old == 0;
 }
 
+/*
+ * tlbie/tlbiel is a bit different on the PPC970 compared to later
+ * processors such as POWER7; the large page bit is in the instruction
+ * not RB, and the top 16 bits and the bottom 12 bits of the VA
+ * in RB must be 0.
+ */
+static void do_tlbies_970(struct kvm *kvm, unsigned long *rbvalues,
+			  long npages, int global, bool need_sync)
+{
+	long i;
+
+	if (global) {
+		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+			cpu_relax();
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i) {
+			unsigned long rb = rbvalues[i];
+
+			if (rb & 1)		/* large page */
+				asm volatile("tlbie %0,1" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+			else
+				asm volatile("tlbie %0,0" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+		}
+		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+		kvm->arch.tlbie_lock = 0;
+	} else {
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i) {
+			unsigned long rb = rbvalues[i];
+
+			if (rb & 1)		/* large page */
+				asm volatile("tlbiel %0,1" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+			else
+				asm volatile("tlbiel %0,0" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+		}
+		asm volatile("ptesync" : : : "memory");
+	}
+}
+
+static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
+		      long npages, int global, bool need_sync)
+{
+	long i;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_201)) {
+		/* PPC970 tlbie instruction is a bit different */
+		do_tlbies_970(kvm, rbvalues, npages, global, need_sync);
+		return;
+	}
+	if (global) {
+		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+			cpu_relax();
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i)
+			asm volatile(PPC_TLBIE(%1,%0) : :
+				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
+		asm volatile("eieio; tlbsync; ptesync

[Bug 60505] Heavy network traffic triggers vhost_net lockup

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60505

Bart Van Assche bvanass...@acm.org changed:

   What|Removed |Added

 Regression|No  |Yes

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[PATCH 2/2] KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers

2013-07-08 Thread Paul Mackerras
The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S
can contain negative values, if some of the handlers end up before the
table in the vmlinux binary.  Thus we need to use a sign-extending load
to read the values in the table rather than a zero-extending load.
Without this, the host crashes when the guest does one of the hcalls
with negative offsets, due to jumping to a bogus address.

Signed-off-by: Paul Mackerras pau...@samba.org
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b02f91e..60dce5b 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1381,7 +1381,7 @@ hcall_try_real_mode:
cmpldi  r3,hcall_real_table_end - hcall_real_table
bge guest_exit_cont
LOAD_REG_ADDR(r4, hcall_real_table)
-	lwzx	r3,r3,r4
+	lwax	r3,r3,r4
cmpwi   r3,0
beq guest_exit_cont
add r3,r3,r4
-- 
1.8.3.1



[Bug 60505] Heavy network traffic triggers vhost_net lockup

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60505

--- Comment #5 from Bart Van Assche bvanass...@acm.org ---
The lockup does not occur with kernel 3.8.12 but occurs with at least kernel
3.9.9 and kernel 3.10. I have been able to trigger the lockup with kernel 3.10
without seeing any tasks hanging in vhost_work_flush().



Re: [PATCH v3] KVM: nVMX: Fix read/write to MSR_IA32_FEATURE_CONTROL

2013-07-08 Thread Gleb Natapov
On Sun, Jul 07, 2013 at 11:07:33PM +0800, Arthur Chunqi Li wrote:
 Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.
 
 This patch simulates this MSR in nested_vmx and the default value is
 0x0. BIOS should set it to 0x5 before VMXON. After the lock bit is
 set, writes to it will cause #GP(0).
 
 Another QEMU patch is also needed to handle emulation of reset
 and migration. Resetting the vCPU should clear this MSR and migration
 should preserve its value.
 
 This patch is based on Nadav's previous commit.
 http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478
 
 Signed-off-by: Nadav Har'El nyh at il.ibm.com

Nadav's address is n...@math.technion.ac.il. Also the first line of the
email should be From: Nadav Har'El n...@math.technion.ac.il

 Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   32 ++--
  arch/x86/kvm/x86.c |3 ++-
  2 files changed, 28 insertions(+), 7 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index a7e1855..a64efd0 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -373,6 +373,7 @@ struct nested_vmx {
* we must keep them pinned while L2 runs.
*/
   struct page *apic_access_page;
 + u64 msr_ia32_feature_control;
  };
  
  #define POSTED_INTR_ON  0
 @@ -2282,8 +2283,11 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
 msr_index, u64 *pdata)
  
   switch (msr_index) {
   case MSR_IA32_FEATURE_CONTROL:
 - *pdata = 0;
 - break;
 + if (nested_vmx_allowed(vcpu)){
Space after { here and everywhere. Use scripts/checkpatch.pl to check
your patches for style issues.

 + *pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
 + break;
 + }
 + return 0;
   case MSR_IA32_VMX_BASIC:
   /*
* This MSR reports some information about VMX support. We
 @@ -2356,14 +2360,21 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
 msr_index, u64 *pdata)
   return 1;
  }
  
 -static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 +static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
  {
 + u32 msr_index = msr_info->index;
 + u64 data = msr_info->data;
 + bool host_initialized = msr_info->host_initiated;
Leave empty line here.

   if (!nested_vmx_allowed(vcpu))
   return 0;
  
 - if (msr_index == MSR_IA32_FEATURE_CONTROL)
 - /* TODO: the right thing. */
 + if (msr_index == MSR_IA32_FEATURE_CONTROL){
 + if (!host_initialized && to_vmx(vcpu)->nested.msr_ia32_feature_control
 +  & FEATURE_CONTROL_LOCKED)
 + return 0;
 + to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
   return 1;
 + }
   /*
* No need to treat VMX capability MSRs specially: If we don't handle
* them, handle_wrmsr will #GP(0), which is correct (they are readonly)
 @@ -2494,7 +2505,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
 msr_data *msr_info)
   return 1;
   /* Otherwise falls through */
   default:
 - if (vmx_set_vmx_msr(vcpu, msr_index, data))
 + if (vmx_set_vmx_msr(vcpu, msr_info))
   break;
   msr = find_msr_entry(vmx, msr_index);
   if (msr) {
 @@ -5576,6 +5587,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
   struct kvm_segment cs;
   struct vcpu_vmx *vmx = to_vmx(vcpu);
   struct vmcs *shadow_vmcs;
 + const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED
 + | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
  
   /* The Intel VMX Instruction Reference lists a bunch of bits that
* are prerequisite to running VMXON, most notably cr4.VMXE must be
 @@ -5604,6 +5617,13 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
   skip_emulated_instruction(vcpu);
   return 1;
   }
 +
 + if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
 + != VMXON_NEEDED_FEATURES) {
 + kvm_inject_gp(vcpu, 0);
 + return 1;
 + }
 +
   if (enable_shadow_vmcs) {
   shadow_vmcs = alloc_vmcs();
   if (!shadow_vmcs)
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index d21bce5..cff77c4 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -850,7 +850,8 @@ static u32 msrs_to_save[] = {
  #ifdef CONFIG_X86_64
   MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
  #endif
 - MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
 + MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
 + MSR_IA32_FEATURE_CONTROL
  };
  
  static unsigned num_msrs_to_save;
 -- 
 1.7.9.5

--
Gleb.

Re: [PATCH 1/2] KVM: Introduce kvm_arch_memslots_updated()

2013-07-08 Thread Cornelia Huck
On Thu, 4 Jul 2013 12:53:15 +0300
Gleb Natapov g...@redhat.com wrote:

 On Thu, Jul 04, 2013 at 01:40:29PM +0900, Takuya Yoshikawa wrote:
  This is called right after the memslots are updated, i.e. when the result
  of update_memslots() gets installed in install_new_memslots().  Since
  the memslots need to be updated twice when we delete or move a memslot,
  kvm_arch_commit_memory_region() does not correspond to this exactly.
  
  In the following patch, x86 will use this new API to check if the mmio
  generation has reached its maximum value, in which case mmio sptes need
  to be flushed out.
  
  Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
  ---
   Removed the trailing space after return old_memslots; at this chance.
  
   arch/arm/kvm/arm.c |4 
   arch/ia64/kvm/kvm-ia64.c   |4 
   arch/mips/kvm/kvm_mips.c   |4 
   arch/powerpc/kvm/powerpc.c |4 
   arch/s390/kvm/kvm-s390.c   |4 
   arch/x86/kvm/x86.c |4 
   include/linux/kvm_host.h   |1 +
   virt/kvm/kvm_main.c|5 -
   8 files changed, 29 insertions(+), 1 deletion(-)

  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index e3aae6d..1c1e9de 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -498,6 +498,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
   void kvm_arch_free_memslot(struct kvm_memory_slot *free,
 struct kvm_memory_slot *dont);
   int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long 
  npages);
  +void kvm_arch_memslots_updated(struct kvm *kvm);
 We can define empty function here like this:
 #ifdef __KVM_HAVE_MEMSLOT_UPDATE
  void kvm_arch_memslots_updated(struct kvm *kvm);
 #else
  static void kvm_arch_memslots_updated(struct kvm *kvm)
  {
  }
 #endif
 
 and make x86.c define __KVM_HAVE_MEMSLOT_UPDATE.
 
 But I am fine with your approach too. Do other arch maintainers have any
 preferences here?

I don't really have a strong preference either way, so

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com

for the current approach.



Re: [PATCH v2] kvm tools: fix boot of guests with more than 4gb of ram

2013-07-08 Thread Will Deacon
Hi guys,

On Mon, Jul 08, 2013 at 09:12:26AM +0100, Pekka Enberg wrote:
 On Sun, Jul 7, 2013 at 7:00 PM, Sasha Levin sasha.le...@oracle.com wrote:
  Commit "kvm tools: virtio: remove hardcoded assumptions
  about guest page size" has introduced a bug that prevented
  guests with more than 4gb of ram from booting.
 
  The issue is that 'pfn' is a 32bit integer, so multiplying
  it by the page size to get the actual address causes an
  overflow if the pfn refers to a memory area above 4gb.
 
  Signed-off-by: Sasha Levin sasha.le...@oracle.com
 
 Will, Michael, Asias, good to merge?

I'm at a conference at the moment, so unable to test this patch, but it
looks simple and correct enough to me:

  Acked-by: Will Deacon will.dea...@arm.com

Cheers,

Will


[PATCH v4] KVM: nVMX: Fix read/write to MSR_IA32_FEATURE_CONTROL

2013-07-08 Thread Arthur Chunqi Li
From: Nadav Har'El n...@math.technion.ac.il

Fix read/write to IA32_FEATURE_CONTROL MSR in nested environment.

This patch simulates this MSR in nested_vmx and the default value is
0x0. BIOS should set it to 0x5 before VMXON. After the lock bit is
set, writes to it will cause #GP(0).

Another QEMU patch is also needed to handle emulation of reset
and migration. Resetting the vCPU should clear this MSR and migration
should preserve its value.

This patch is based on Nadav's previous commit.
http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/88478

Signed-off-by: Nadav Har'El n...@math.technion.ac.il
Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 arch/x86/kvm/vmx.c |   35 +--
 arch/x86/kvm/x86.c |3 ++-
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a7e1855..1200e4e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -373,6 +373,7 @@ struct nested_vmx {
 * we must keep them pinned while L2 runs.
 */
struct page *apic_access_page;
+   u64 msr_ia32_feature_control;
 };
 
 #define POSTED_INTR_ON  0
@@ -2282,8 +2283,11 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
 
switch (msr_index) {
case MSR_IA32_FEATURE_CONTROL:
-   *pdata = 0;
-   break;
+   if (nested_vmx_allowed(vcpu)) {
+   *pdata = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+   break;
+   }
+   return 0;
case MSR_IA32_VMX_BASIC:
/*
 * This MSR reports some information about VMX support. We
@@ -2356,14 +2360,24 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
return 1;
 }
 
-static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
+static int vmx_set_vmx_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+   u32 msr_index = msr_info->index;
+   u64 data = msr_info->data;
+   bool host_initialized = msr_info->host_initiated;
+
if (!nested_vmx_allowed(vcpu))
return 0;
 
-   if (msr_index == MSR_IA32_FEATURE_CONTROL)
-   /* TODO: the right thing. */
+   if (msr_index == MSR_IA32_FEATURE_CONTROL) {
+   if (!host_initialized &&
+       to_vmx(vcpu)->nested.msr_ia32_feature_control
+       & FEATURE_CONTROL_LOCKED)
+   return 0;
+   to_vmx(vcpu)->nested.msr_ia32_feature_control = data;
return 1;
+   }
+
/*
 * No need to treat VMX capability MSRs specially: If we don't handle
 * them, handle_wrmsr will #GP(0), which is correct (they are readonly)
@@ -2494,7 +2508,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
return 1;
/* Otherwise falls through */
default:
-   if (vmx_set_vmx_msr(vcpu, msr_index, data))
+   if (vmx_set_vmx_msr(vcpu, msr_info))
break;
msr = find_msr_entry(vmx, msr_index);
if (msr) {
@@ -5576,6 +5590,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
struct kvm_segment cs;
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs *shadow_vmcs;
+   const u64 VMXON_NEEDED_FEATURES = FEATURE_CONTROL_LOCKED
+   | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
 
/* The Intel VMX Instruction Reference lists a bunch of bits that
 * are prerequisite to running VMXON, most notably cr4.VMXE must be
@@ -5604,6 +5620,13 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
skip_emulated_instruction(vcpu);
return 1;
}
+
+   if ((vmx->nested.msr_ia32_feature_control & VMXON_NEEDED_FEATURES)
+   != VMXON_NEEDED_FEATURES) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
if (enable_shadow_vmcs) {
shadow_vmcs = alloc_vmcs();
if (!shadow_vmcs)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d21bce5..cff77c4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -850,7 +850,8 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-   MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA
+   MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
+   MSR_IA32_FEATURE_CONTROL
 };
 
 static unsigned num_msrs_to_save;
-- 
1.7.9.5



[PULL] vhost: cleanups and fixes

2013-07-08 Thread Michael S. Tsirkin
The following changes since commit 8bb495e3f02401ee6f76d1b1d77f3ac9f079e376:

  Linux 3.10 (2013-06-30 15:13:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 09a34c8404c1d4c5782de319c02e1d742c57875c:

  vhost/test: update test after vhost cleanups (2013-07-07 18:02:25 +0300)


vhost: fixes and cleanups 3.11

This includes some fixes and cleanups for vhost net and scsi drivers.
The scsi driver changes will conflict with Nicholas Bellinger's scsi
target changes, but the conflicting commit in my tree simply renames some
variables, so it's trivial to resolve.

Signed-off-by: Michael S. Tsirkin m...@redhat.com


Asias He (8):
  vhost: Simplify dev->vqs[i] access
  vhost-scsi: Remove unnecessary forward struct vhost_scsi declaration
  vhost-scsi: Rename struct vhost_scsi *s to *vs
  vhost-scsi: Make func indention more consistent
  vhost-scsi: Rename struct tcm_vhost_tpg *tv_tpg to *tpg
  vhost-scsi: Rename struct tcm_vhost_cmd *tv_cmd to *cmd
  vhost: Make vhost a separate module
  vhost: Make local function static

Michael S. Tsirkin (2):
  vhost-net: fix use-after-free in vhost_net_flush
  vhost/test: update test after vhost cleanups

 drivers/vhost/Kconfig  |   8 +
 drivers/vhost/Makefile |   3 +-
 drivers/vhost/net.c|  13 +-
 drivers/vhost/scsi.c   | 472 ++---
 drivers/vhost/test.c   |  33 ++--
 drivers/vhost/vhost.c  |  86 +++--
 drivers/vhost/vhost.h  |   2 +
 7 files changed, 356 insertions(+), 261 deletions(-)


Registers need to recover when emulating L2 vmexit

2013-07-08 Thread Arthur Chunqi Li
Hi Gleb and Paolo,
From the current KVM code, when L2 causes a VMEXIT or L1 fails to enter L2,
host VMX will execute nested_vmx_vmexit() or
nested_vmx_entry_failure(). Both of them call
load_vmcs12_host_state(), which loads vmcs12's HOST fields as vmcs01's
GUEST fields. But the HOST and GUEST fields do not correspond exactly,
e.g. GUEST_CS/ES..._BASE/LIMIT/AR. What will these fields be set to?

Thanks,
Arthur

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China


Re: [PATCH 2/4] PF: Move architecture specifics to the backends

2013-07-08 Thread Marc Zyngier

On 2013-07-05 21:55, Dominik Dingel wrote:
Current common code uses PAGE_OFFSET to indicate a bad host virtual
address.

As this check won't work on architectures that don't map kernel and
user memory into the same address space (e.g. s390), it is moved into
architecture-specific code.
code.

Signed-off-by: Dominik Dingel din...@linux.vnet.ibm.com
---
 arch/arm/include/asm/kvm_host.h |  8 
 arch/ia64/include/asm/kvm_host.h|  3 +++
 arch/mips/include/asm/kvm_host.h|  6 ++
 arch/powerpc/include/asm/kvm_host.h |  8 
 arch/s390/include/asm/kvm_host.h| 12 
 arch/x86/include/asm/kvm_host.h |  8 
 include/linux/kvm_host.h|  8 
 7 files changed, 45 insertions(+), 8 deletions(-)


[...]


diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a63d83e..210f493 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -85,14 +85,6 @@ static inline bool is_noslot_pfn(pfn_t pfn)
return pfn == KVM_PFN_NOSLOT;
 }

-#define KVM_HVA_ERR_BAD(PAGE_OFFSET)
-#define KVM_HVA_ERR_RO_BAD (PAGE_OFFSET + PAGE_SIZE)
-
-static inline bool kvm_is_error_hva(unsigned long addr)
-{
-	return addr >= PAGE_OFFSET;
-}
-
 #define KVM_ERR_PTR_BAD_PAGE   (ERR_PTR(-ENOENT))

 static inline bool is_error_page(struct page *page)


Nit: This breaks arm64. I suppose the patches have been created before 
the arm64 code got merged, so I'd expect the next version of this series 
to deal with arm64 as well.


Thanks,

M.
--
Fast, cheap, reliable. Pick two.


Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1

2013-07-08 Thread Gleb Natapov
On Thu, Jul 04, 2013 at 08:42:53AM +, Zhang, Yang Z wrote:
 Gleb Natapov wrote on 2013-07-02:
  On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote:
  On 2013-07-02 17:15, Gleb Natapov wrote:
  On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
  On 2013-07-02 15:59, Gleb Natapov wrote:
  On Tue, Jul 02, 2013 at 03:01:24AM +, Zhang, Yang Z wrote:
   Since this series has been pending on the mailing list for a long
   time, and it's really a big feature for nested VMX, and the original
   authors (Jun and Nadav) probably don't have enough time to continue
   it, I will pick it up. :)
  
  See comments below:
  
  Paolo Bonzini wrote on 2013-05-20:
  Il 19/05/2013 06:52, Jun Nakajima ha scritto:
  From: Nadav Har'El n...@il.ibm.com
  
  Recent KVM, since
  http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
  switch the EFER MSR when EPT is used and the host and guest have
  different NX bits. So if we add support for nested EPT (L1 guest
  using EPT to run L2) and want to be able to run recent KVM as L1,
  we need to allow L1 to use this EFER switching feature.
  
  To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER
  if available, and if it isn't, it uses the generic
  VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former
  (the latter is still unsupported).
  
  Nested entry and exit emulation (prepare_vmcs_02 and
  load_vmcs12_host_state, respectively) already handled
  VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do
  in this patch is to properly advertise this feature to L1.
  
  Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by
  L0, by using vmx_set_efer (which itself sets one of several
  vmcs02 fields), so we always support this feature, regardless of
  whether the host supports it.
  
  Signed-off-by: Nadav Har'El n...@il.ibm.com
  Signed-off-by: Jun Nakajima jun.nakaj...@intel.com
  Signed-off-by: Xinhao Xu xinhao...@intel.com
  ---
   arch/x86/kvm/vmx.c | 23 ---
   1 file changed, 16 insertions(+), 7 deletions(-)
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
  260a919..fb9cae5 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -2192,7 +2192,8 @@ static __init void
  nested_vmx_setup_ctls_msrs(void)  #else
   nested_vmx_exit_ctls_high = 0;  #endif
   -	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
   +	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
   +		VM_EXIT_LOAD_IA32_EFER);
   
   	/* entry controls */
   	rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
   @@ -2201,8 +2202,8 @@ static __init void nested_vmx_setup_ctls_msrs(void)
   	nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
   	nested_vmx_entry_ctls_high = VM_ENTRY_LOAD_IA32_PAT |
   		VM_ENTRY_IA32E_MODE;
   -	nested_vmx_entry_ctls_high |= VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
   -
   +	nested_vmx_entry_ctls_high |= (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR |
   +		VM_ENTRY_LOAD_IA32_EFER);
   	/* cpu-based controls */
   	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
   		nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high);
   @@ -7492,10 +7493,18 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
   	vcpu->arch.cr0_guest_owned_bits = ~vmcs12->cr0_guest_host_mask;
   	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
   
   -	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
   -	vmcs_write32(VM_EXIT_CONTROLS,
   -		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
   -	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
   +	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
   +	 * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER
   +	 * bits are further modified by vmx_set_efer() below.
   +	 */
   +	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
This is wrong. We cannot use L0's exit controls directly.
LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFER, LOAD_HOST_PAT and
ACK_INTR_ON_EXIT should use the host's exit controls, but the others
still need to use (vmcs12 | host).

I do not see why. We always intercept DR7/PAT/EFER, so the save is
emulated too. The host address-space size always comes from L0, and
the preemption timer is not supported for nested IIRC; when it is,
the host will have to save it on exit anyway for correct emulation.

The preemption timer is already supported and works fine as far as I tested.
KVM doesn't use it for L1, so we do not need to save/restore it - IIRC.

So what happens if L1 configures it to value X, after X/2 ticks an L0
exit happens, and L0 gets back to L2 directly? The counter will be X
again instead of X/2.

Likely, yes. We need to improve our emulation by setting "Save
VMX-preemption timer value", or emulate this in software if the
hardware lacks support for it (was this flag introduced

Re: [Qemu-devel] [PATCH qom-cpu v9] target-i386: Move hyperv_* static globals to X86CPU

2013-07-08 Thread Igor Mammedov
On Mon,  8 Jul 2013 03:03:54 +0200
Andreas Färber afaer...@suse.de wrote:

 From: Igor Mammedov imamm...@redhat.com
 
 - since hyperv_* helper functions are used only in target-i386/kvm.c
   move them there as static helpers
 
 Requested-by: Eduardo Habkost ehabk...@redhat.com
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 Signed-off-by: Andreas Färber afaer...@suse.de
I haven't tested it yet, but it looks good to me.

 ---
 v8 (imammedo) -> v9:
  * Use X86CPU instead of CPUX86State (only used in KVM)
  * Changed helper functions to X86CPU argument
  * Moved field initialization to QOM instance_init
  * Fixed subject (not today's CPUState)
 
  target-i386/Makefile.objs |  2 +-
  target-i386/cpu-qom.h |  4 +++
  target-i386/cpu.c | 16 
  target-i386/cpu.h |  4 +++
  target-i386/hyperv.c  | 64 
 ---
  target-i386/hyperv.h  | 45 -
  target-i386/kvm.c | 36 ++
  7 files changed, 46 insertions(+), 125 deletions(-)
  delete mode 100644 target-i386/hyperv.c
  delete mode 100644 target-i386/hyperv.h
 
 diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
 index c1d4f05..887dca7 100644
 --- a/target-i386/Makefile.objs
 +++ b/target-i386/Makefile.objs
 @@ -2,7 +2,7 @@ obj-y += translate.o helper.o cpu.o
  obj-y += excp_helper.o fpu_helper.o cc_helper.o int_helper.o svm_helper.o
  obj-y += smm_helper.o misc_helper.o mem_helper.o seg_helper.o
  obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o
 -obj-$(CONFIG_KVM) += kvm.o hyperv.o
 +obj-$(CONFIG_KVM) += kvm.o
  obj-$(CONFIG_NO_KVM) += kvm-stub.o
  obj-$(CONFIG_LINUX_USER) += ioport-user.o
  obj-$(CONFIG_BSD_USER) += ioport-user.o
 diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
 index 7e55e5f..18f08b8 100644
 --- a/target-i386/cpu-qom.h
 +++ b/target-i386/cpu-qom.h
 @@ -66,6 +66,10 @@ typedef struct X86CPU {
  
  CPUX86State env;
  
 +bool hyperv_vapic;
 +bool hyperv_relaxed_timing;
 +int hyperv_spinlock_attempts;
 +
  /* Features that were filtered out because of missing host capabilities 
 */
  uint32_t filtered_features[FEATURE_WORDS];
  } X86CPU;
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index e3f75a8..14e9c7e 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -35,8 +35,6 @@
 #include "qapi/visitor.h"
 #include "sysemu/arch_init.h"
 
-#include "hyperv.h"
-
 #include "hw/hw.h"
 #if defined(CONFIG_KVM)
 #include <linux/kvm_para.h>
 @@ -1587,12 +1585,19 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, 
 char *features, Error **errp)
 object_property_parse(OBJECT(cpu), num, "tsc-frequency", errp);
 } else if (!strcmp(featurestr, "hv-spinlocks")) {
 char *err;
+const int min = 0xFFF;
 numvalue = strtoul(val, &err, 0);
 if (!*val || *err) {
 error_setg(errp, "bad numerical value %s", val);
 goto out;
 }
-hyperv_set_spinlock_retries(numvalue);
+if (numvalue < min) {
+fprintf(stderr, "hv-spinlocks value shall always be >= 0x%x"
+", fixup will be removed in future versions\n", min);
+numvalue = min;
+}
+cpu->hyperv_spinlock_attempts = numvalue;
 } else {
 error_setg(errp, "unrecognized feature %s", featurestr);
 goto out;
@@ -1602,9 +1607,9 @@ static void cpu_x86_parse_featurestr(X86CPU *cpu, char *features, Error **errp)
 } else if (!strcmp(featurestr, "enforce")) {
 check_cpuid = enforce_cpuid = 1;
 } else if (!strcmp(featurestr, "hv_relaxed")) {
-hyperv_enable_relaxed_timing(true);
+cpu->hyperv_relaxed_timing = true;
 } else if (!strcmp(featurestr, "hv_vapic")) {
-hyperv_enable_vapic_recommended(true);
+cpu->hyperv_vapic = true;
 } else {
 error_setg(errp, "feature string `%s' not in format (+feature|"
 "-feature|feature=xyz)", featurestr);
 @@ -2479,6 +2484,7 @@ static void x86_cpu_initfn(Object *obj)
  x86_cpu_get_feature_words,
  NULL, NULL, (void *)cpu-filtered_features, NULL);
  
+cpu->hyperv_spinlock_attempts = HYPERV_SPINLOCK_NEVER_RETRY;
 env->cpuid_apic_id = x86_cpu_apic_id_from_index(cs->cpu_index);
  
  /* init various static tables used in TCG mode */
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index 2d005b3..6c3eb86 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -549,6 +549,10 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_MWAIT_IBE (1 << 1) /* Interrupts can exit capability */
 #define CPUID_MWAIT_EMX (1 << 0) /* enumeration 

[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60271

Fabian Zimmermann dev@googlemail.com changed:

   What|Removed |Added

Summary|Kernelpanic since 3.9.6 |Kernelpanic since 3.9.8
   |with qemu-kvm and   |with qemu-kvm and
   |pci-passthrough |pci-passthrough

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60271

--- Comment #6 from Fabian Zimmermann dev@googlemail.com ---
* reversed both above patches - problem still there
* disabled radeon-module (in kernel-config) - problem still there

Attached you will find the dmesg.txt (netconsole-output of panic). Don't
hesitate to ask if I can provide further information



[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60271

--- Comment #7 from Fabian Zimmermann dev@googlemail.com ---
Created attachment 106838
  --> https://bugzilla.kernel.org/attachment.cgi?id=106838&action=edit
netconsole / dmesg of panic



[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60271

Fabian Zimmermann dev@googlemail.com changed:

   What|Removed |Added

 Kernel Version|3.9.8   |3.9.8, 3.9.9

--- Comment #8 from Fabian Zimmermann dev@googlemail.com ---
3.9.9 is affected, too.



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-08 Thread Alexander Graf

On 28.06.2013, at 11:20, Mihai Caraman wrote:

 lwepx faults need to be handled by KVM and this implies additional code
 in the DO_KVM macro to identify the source of the exception originated from
 host context. This requires checking the Exception Syndrome Register
 (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
 DSI and LRAT exceptions, which is too intrusive for the host.
 
 Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
 searching for the physical address and kmap'ing it. This fixes an infinite loop

What's the difference in speed for this?

Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets 
invoked?

 caused by lwepx's data TLB miss handled in the host and the TODO for TLB
 eviction and execute-but-not-read entries.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
 Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing list
 for a while.
 
 arch/powerpc/include/asm/mmu-book3e.h |6 ++-
 arch/powerpc/kvm/booke.c  |6 +++
 arch/powerpc/kvm/booke.h  |2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c   |4 ++
 arch/powerpc/kvm/e500mc.c |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
 b/arch/powerpc/include/asm/mmu-book3e.h
 index 99d43e0..32e470e 100644
 --- a/arch/powerpc/include/asm/mmu-book3e.h
 +++ b/arch/powerpc/include/asm/mmu-book3e.h
 @@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
  -#define MAS0_TLBSEL(x)	(((x) << 28) & 0x30000000)
  +#define MAS0_TLBSEL_MASK	0x30000000
  +#define MAS0_TLBSEL_SHIFT	28
  +#define MAS0_TLBSEL(x)	(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
  +#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
  #define MAS0_ESEL_MASK	0x0FFF0000
  #define MAS0_ESEL_SHIFT	16
  #define MAS0_ESEL(x)	(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
  @@ -58,6 +61,7 @@
  #define MAS1_TSIZE_MASK	0x00000f80
  #define MAS1_TSIZE_SHIFT	7
  #define MAS1_TSIZE(x)	(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
  +#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN  (~0xFFFUL)
 #define MAS2_X0   0x0040
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 1020119..6764a8e 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
   /* update before a new last_exit_type is rewritten */
   kvmppc_update_timing_stats(vcpu);
 
 + /*
 +  * The exception type can change at this point, such as if the TLB entry
 +  * for the emulated instruction has been evicted.
 +  */
 + kvmppc_prepare_for_emulation(vcpu, exit_nr);

Please model this the same way as book3s. Check out kvmppc_get_last_inst() as a 
starting point.

 +
   /* restart interrupts if they were meant for the host */
   kvmppc_restart_interrupt(vcpu, exit_nr);
 
 diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
 index 5fd1ba6..a0d0fea 100644
 --- a/arch/powerpc/kvm/booke.h
 +++ b/arch/powerpc/kvm/booke.h
 @@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
 +void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int 
 *exit_nr);
 +
 enum int_class {
   INT_CLASS_NONCRIT,
   INT_CLASS_CRIT,
 diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
 b/arch/powerpc/kvm/bookehv_interrupts.S
 index 20c7a54..0538ab9 100644
 --- a/arch/powerpc/kvm/bookehv_interrupts.S
 +++ b/arch/powerpc/kvm/bookehv_interrupts.S
 @@ -120,37 +120,20 @@
 
   .if \flags  NEED_EMU
   /*
 -  * This assumes you have external PID support.
 -  * To support a bookehv CPU without external PID, you'll
 -  * need to look up the TLB entry and create a temporary mapping.
 -  *
 -  * FIXME: we don't currently handle if the lwepx faults.  PR-mode
 -  * booke doesn't handle it either.  Since Linux doesn't use
 -  * broadcast tlbivax anymore, the only way this should happen is
 -  * if the guest maps its memory execute-but-not-read, or if we
 -  * somehow take a TLB miss in the middle of this entry code and
 -  * evict the relevant entry.  On e500mc, all kernel lowmem is
 -  * bolted into TLB1 large page mappings, and we don't use
 -  * broadcast invalidates, so we should not take a TLB miss here.
 -  *
 -  * Later we'll need to deal with faults here.  Disallowing guest
 -  * mappings that are execute-but-not-read could be an option on
 -  * e500mc, but not on chips with an LRAT if it is used.
 

[Bug 60271] Kernelpanic since 3.9.8 with qemu-kvm and pci-passthrough

2013-07-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60271

Michael S. Tsirkin m.s.tsir...@gmail.com changed:

   What|Removed |Added

 CC||m.s.tsir...@gmail.com

--- Comment #9 from Michael S. Tsirkin m.s.tsir...@gmail.com ---
can you pls try disabling zero copy tx in vhost_net?
it's a module parameter for this module
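For reference, a sketch of how that parameter can be set persistently (the parameter name experimental_zcopytx is assumed from the vhost_net module of that era; verify against your kernel with `modinfo vhost_net`):

```
# /etc/modprobe.d/vhost-net.conf -- disable zero-copy TX at module load
options vhost_net experimental_zcopytx=0
```

After adding this, reload the module (or reboot) so the option takes effect.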



Re: [PATCH -V3 1/4] mm/cma: Move dma contiguous changes into a separate config

2013-07-08 Thread Alexander Graf

On 02.07.2013, at 07:45, Aneesh Kumar K.V wrote:

 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 We want to use CMA for allocating the hash page table and real mode area for
 PPC64. Hence move the DMA contiguous related changes into a separate config
 so that ppc64 can enable CMA without requiring DMA contiguous.
 
 Acked-by: Michal Nazarewicz min...@mina86.com
 Acked-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Thanks, applied all to kvm-ppc-queue. Please provide a cover letter next time 
:).


Alex



RE: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1

2013-07-08 Thread Zhang, Yang Z
 -Original Message-
 From: Gleb Natapov [mailto:g...@redhat.com]
 Sent: Monday, July 08, 2013 8:38 PM
 To: Zhang, Yang Z
 Cc: Jan Kiszka; Paolo Bonzini; Nakajima, Jun; kvm@vger.kernel.org
 Subject: Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit
 controls for L1
 
 On Thu, Jul 04, 2013 at 08:42:53AM +, Zhang, Yang Z wrote:
  Gleb Natapov wrote on 2013-07-02:
   On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote:
   On 2013-07-02 17:15, Gleb Natapov wrote:
   On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
   On 2013-07-02 15:59, Gleb Natapov wrote:
   On Tue, Jul 02, 2013 at 03:01:24AM +, Zhang, Yang Z wrote:
   Since this series is pending in mail list for long time. And
   it's really a big feature for Nested. Also, I doubt the
   original authors(Jun and Nahav)should not have enough time to
 continue it.
   So I will pick it up. :)
  
   See comments below:
  
   Paolo Bonzini wrote on 2013-05-20:
   Il 19/05/2013 06:52, Jun Nakajima ha scritto:
   From: Nadav Har'El n...@il.ibm.com
  
   Recent KVM, since
   http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
   switch the EFER MSR when EPT is used and the host and guest
 have
   different NX bits. So if we add support for nested EPT (L1 guest
   using EPT to run L2) and want to be able to run recent KVM as L1,
   we need to allow L1 to use this EFER switching feature.
  
   To do this EFER switching, KVM uses
 VM_ENTRY/EXIT_LOAD_IA32_EFER
   if available, and if it isn't, it uses the generic
   VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the
 former
   (the latter is still unsupported).
  
   Nested entry and exit emulation (prepare_vmcs_02 and
   load_vmcs12_host_state, respectively) already handled
   VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do
   in this patch is to properly advertise this feature to L1.
  
   Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are
 emulated by
   L0, by using vmx_set_efer (which itself sets one of several
   vmcs02 fields), so we always support this feature, regardless of
   whether the host supports it.
  
   Signed-off-by: Nadav Har'El n...@il.ibm.com
   Signed-off-by: Jun Nakajima jun.nakaj...@intel.com
   Signed-off-by: Xinhao Xu xinhao...@intel.com
   ---
arch/x86/kvm/vmx.c | 23 ---
1 file changed, 16 insertions(+), 7 deletions(-)
   diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
   260a919..fb9cae5 100644
   --- a/arch/x86/kvm/vmx.c
   +++ b/arch/x86/kvm/vmx.c
   @@ -2192,7 +2192,8 @@ static __init void
   nested_vmx_setup_ctls_msrs(void)  #else
  nested_vmx_exit_ctls_high = 0;  #endif
   -  nested_vmx_exit_ctls_high |=
 VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
   +  nested_vmx_exit_ctls_high |=
 (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR
   | +  VM_EXIT_LOAD_IA32_EFER);
  
  /* entry controls */
  rdmsr(MSR_IA32_VMX_ENTRY_CTLS, @@ -2201,8 +2202,8
   @@ static
   __init void nested_vmx_setup_ctls_msrs(void)
  nested_vmx_entry_ctls_low =
 VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
  nested_vmx_entry_ctls_high =
   VM_ENTRY_LOAD_IA32_PAT |
VM_ENTRY_IA32E_MODE;
   -  nested_vmx_entry_ctls_high |=
   VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; -
   +  nested_vmx_entry_ctls_high |=
   (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | +
   VM_ENTRY_LOAD_IA32_EFER);
  /* cpu-based controls */
  rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
  nested_vmx_procbased_ctls_low,
   nested_vmx_procbased_ctls_high); @@ -7492,10 +7493,18 @@
 static
   void prepare_vmcs02(struct kvm_vcpu *vcpu,
   struct vmcs12 *vmcs12)
  vcpu-arch.cr0_guest_owned_bits =
~vmcs12-cr0_guest_host_mask;
   vmcs_writel(CR0_GUEST_HOST_MASK,
   ~vcpu-arch.cr0_guest_owned_bits);
  
    -	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
    -	vmcs_write32(VM_EXIT_CONTROLS,
    -		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
    -	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
    +	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
    +	 * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER
    +	 * bits are further modified by vmx_set_efer() below.
    +	 */
    +	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
   This is wrong. We cannot use L0 exit control directly.
   LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFE, LOAD_HOST_PAT,
   ACK_INTR_ON_EXIT should use host's exit control. But others, still
   need use (vmcs12|host).
  
   I do not see why. We always intercept DR7/PAT/EFER, so save is
   emulated too. Host address space size always come from L0 and
   preemption timer is not supported for nested IIRC and when it
   will be host will have to save it on exit anyway for correct 
   emulation.
  
   Preemption timer is already supported and works fine as far as I 
   tested.
   KVM doesn't use it for L1, so we do not need to save/restore it - IIRC.
  
   So what 

guests not shutting down when host shuts down

2013-07-08 Thread Lentes, Bernd
Hi,

I have a SLES 11 SP2 64bit host with three guests:
- Windows XP 32
- Ubuntu 12.04 LTS 64bit
- SLES 11 SP2 64bit

The SLES guest shuts down with the host shutdown; the others do not. When I
shut down these two guests with virt-manager, they shut down fine.
ACPI is activated in virt-manager for both of them. When the host shuts down,
the two guests get a signal (excerpt from the host's log):

===
2013-07-07 16:39:51.674: starting up
LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none 
/usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1025 -smp 
1,sockets=1,cores=1,threads=1 -name greensql_2 -uuid 
2cfbac9c-dbb2-c4bf-4aba-2d18dc49d18e -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/greensql_2.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-drive 
file=/var/lib/kvm/images/greensql_2/disk0.raw,if=none,id=drive-ide0-0-0,format=raw
 -device 
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive 
if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device 
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev 
tap,fd=17,id=hostnet0,vhost=on,vhostfd=20 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:37:92:a9,bus=pci.0,addr=0x3 
-usb -vnc 127.0.0.1:2 -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
Domain id=3 is tainted: high-privileges

qemu: terminating on signal 15 from pid 24958

2013-07-08 13:58:29.651: starting up
==

I'm a bit astonished about the -no-shutdown flag in the command line, but the
SLES guest also has it in its command line, so it should not be the problem.

Thanks for any help.


Bernd





--
Bernd Lentes

Systemadministration
Institut für Entwicklungsgenetik
Gebäude 35.34 - Raum 208
HelmholtzZentrum münchen
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241
fax:   +49 89 3187 2294
http://www.helmholtz-muenchen.de/idg

Wer nichts verdient außer Geld verdient nichts außer Geld (Those who earn nothing but money earn nothing but money)

Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe
Geschäftsführer: Prof. Dr. Günther Wess Dr. Nikolaus Blum Dr. Alfons Enhsen
Registergericht: Amtsgericht München HRB 6466
USt-IdNr: DE 129521671


Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1

2013-07-08 Thread Gleb Natapov
On Mon, Jul 08, 2013 at 02:28:15PM +, Zhang, Yang Z wrote:
  -Original Message-
  From: Gleb Natapov [mailto:g...@redhat.com]
  Sent: Monday, July 08, 2013 8:38 PM
  To: Zhang, Yang Z
  Cc: Jan Kiszka; Paolo Bonzini; Nakajima, Jun; kvm@vger.kernel.org
  Subject: Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit
  controls for L1
  
  On Thu, Jul 04, 2013 at 08:42:53AM +, Zhang, Yang Z wrote:
   Gleb Natapov wrote on 2013-07-02:
On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote:
On 2013-07-02 17:15, Gleb Natapov wrote:
On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
On 2013-07-02 15:59, Gleb Natapov wrote:
On Tue, Jul 02, 2013 at 03:01:24AM +, Zhang, Yang Z wrote:
Since this series is pending in mail list for long time. And
it's really a big feature for Nested. Also, I doubt the
original authors(Jun and Nahav)should not have enough time to
  continue it.
So I will pick it up. :)
   
See comments below:
   
Paolo Bonzini wrote on 2013-05-20:
Il 19/05/2013 06:52, Jun Nakajima ha scritto:
From: Nadav Har'El n...@il.ibm.com
   
Recent KVM, since
http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
switch the EFER MSR when EPT is used and the host and guest
  have
different NX bits. So if we add support for nested EPT (L1 guest
using EPT to run L2) and want to be able to run recent KVM as L1,
we need to allow L1 to use this EFER switching feature.
   
To do this EFER switching, KVM uses
  VM_ENTRY/EXIT_LOAD_IA32_EFER
if available, and if it isn't, it uses the generic
VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the
  former
(the latter is still unsupported).
   
Nested entry and exit emulation (prepare_vmcs_02 and
load_vmcs12_host_state, respectively) already handled
VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do
in this patch is to properly advertise this feature to L1.
   
Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are
  emulated by
L0, by using vmx_set_efer (which itself sets one of several
vmcs02 fields), so we always support this feature, regardless of
whether the host supports it.
   
Signed-off-by: Nadav Har'El n...@il.ibm.com
Signed-off-by: Jun Nakajima jun.nakaj...@intel.com
Signed-off-by: Xinhao Xu xinhao...@intel.com
---
 arch/x86/kvm/vmx.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
260a919..fb9cae5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2192,7 +2192,8 @@ static __init void
nested_vmx_setup_ctls_msrs(void)  #else
 nested_vmx_exit_ctls_high = 0;  #endif
-nested_vmx_exit_ctls_high |=
  VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
+nested_vmx_exit_ctls_high |=
  (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR
| +VM_EXIT_LOAD_IA32_EFER);
   
 /* entry controls */
 rdmsr(MSR_IA32_VMX_ENTRY_CTLS, @@ -2201,8 +2202,8
@@ static
__init void nested_vmx_setup_ctls_msrs(void)
 nested_vmx_entry_ctls_low =
  VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
 nested_vmx_entry_ctls_high =
  VM_ENTRY_LOAD_IA32_PAT |
 VM_ENTRY_IA32E_MODE;
-nested_vmx_entry_ctls_high |=
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; -
+nested_vmx_entry_ctls_high |=
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | +
VM_ENTRY_LOAD_IA32_EFER);
 /* cpu-based controls */
 rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
 nested_vmx_procbased_ctls_low,
nested_vmx_procbased_ctls_high); @@ -7492,10 +7493,18 @@
  static
void prepare_vmcs02(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
 vcpu-arch.cr0_guest_owned_bits =
 ~vmcs12-cr0_guest_host_mask;
  vmcs_writel(CR0_GUEST_HOST_MASK,
~vcpu-arch.cr0_guest_owned_bits);
   
-	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by vmx_set_efer below */
-	vmcs_write32(VM_EXIT_CONTROLS,
-		vmcs12->vm_exit_controls | vmcs_config.vmexit_ctrl);
-	vmcs_write32(VM_ENTRY_CONTROLS, vmcs12->vm_entry_controls |
+	/* L2->L1 exit controls are emulated - the hardware exit is to L0 so
+	 * we should use its exit controls. Note that IA32_MODE, LOAD_IA32_EFER
+	 * bits are further modified by vmx_set_efer() below.
+	 */
+	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
This is wrong. We cannot use L0 exit control directly.
LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFE, LOAD_HOST_PAT,
ACK_INTR_ON_EXIT should use host's exit control. But others, still
need use (vmcs12|host).
   
I do not see why. We always intercept DR7/PAT/EFER, so save is
emulated too. Host address space size always come from L0 and
preemption timer is not supported for nested IIRC and 

[GIT PULL] VFIO for v3.11

2013-07-08 Thread Alex Williamson
Hi Linus,

The following changes since commit 7d132055814ef17a6c7b69f342244c410a5e000f:

  Linux 3.10-rc6 (2013-06-15 11:51:07 -1000)

are available in the git repository at:

  git://github.com/awilliam/linux-vfio.git tags/vfio-v3.11

for you to fetch changes up to 8d38ef1948bd415a5cb653a5c0ec16f3402aaca1:

  vfio/type1: Fix leak on error path (2013-07-01 08:28:58 -0600)


vfio Updates for v3.11

Largely hugepage support for vfio/type1 iommu and surrounding cleanups and 
fixes.


Alex Williamson (6):
  vfio: Convert type1 iommu to use rbtree
  vfio: hugepage support for vfio_iommu_type1
  vfio: Provide module option to disable vfio_iommu_type1 hugepage support
  vfio/type1: Fix missed frees and zero sized removes
  vfio: Limit group opens
  vfio/type1: Fix leak on error path

Alexey Kardashevskiy (1):
  vfio: fix documentation

 Documentation/vfio.txt  |   6 +-
 drivers/vfio/vfio.c |  14 +++
 drivers/vfio/vfio_iommu_type1.c | 626 +---
 include/uapi/linux/vfio.h   |   8 +-
 4 files changed, 424 insertions(+), 230 deletions(-)




Re: Registers need to recover when emulating L2 vmexit

2013-07-08 Thread Gleb Natapov
On Mon, Jul 08, 2013 at 07:50:45PM +0800, Arthur Chunqi Li wrote:
 Hi Gleb and Paolo,
 From the current KVM code, when L2 causes a VMEXIT or L1 fails to enter L2,
 host VMX will execute nested_vmx_vmexit() and
 nested_vmx_entry_failure(). Both of them call
 load_vmcs12_host_state(), which loads vmcs12's HOST fields as vmcs01's
 GUEST fields. But the HOST and GUEST fields do not correspond
 exactly, e.g. GUEST_CS/ES..._BASE/LIMIT/AR. What will these
 MSRs be set to?
 
These are not MSRs but VMCS fields. Currently they are set to whatever
value they had in vmcs01 when L1 executed VMLAUNCH, but this is
incorrect. They should be set according to section 27.5.2
Loading Host Segment and Descriptor-Table Registers of SDM.

--
Gleb.


Re: [PATCH 1/2] KVM: PPC: Fix kvm_exit_names array

2013-07-08 Thread Alexander Graf

On 03.07.2013, at 15:30, Mihai Caraman wrote:

 Some exit ids were left out from the kvm_exit_names array.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
 arch/powerpc/kvm/timing.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
 index 07b6110..c392d26 100644
 --- a/arch/powerpc/kvm/timing.c
 +++ b/arch/powerpc/kvm/timing.c
  @@ -135,7 +135,9 @@ static const char *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = {
  	[USR_PR_INST] = "USR_PR_INST",
  	[FP_UNAVAIL] =  "FP_UNAVAIL",
  	[DEBUG_EXITS] = "DEBUG",
  -	[TIMEINGUEST] = "TIMEINGUEST"
  +	[TIMEINGUEST] = "TIMEINGUEST",
  +	[DBELL_EXITS] = "DBELL",
  +	[GDBELL_EXITS] ="GDBELL"

Please add a comma at the end here, so that we don't have to uselessly touch 
the entry next time again.


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-08 Thread Alexander Graf

On 03.07.2013, at 15:30, Mihai Caraman wrote:

 Some guests are making use of the return from machine check instruction
 to do crazy things even though the 64-bit kernel doesn't handle
 this interrupt yet. Emulate the MCSRR0/1 SPRs and the rfmci instruction accordingly.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kvm/booke_emulate.c|   25 +
 arch/powerpc/kvm/timing.c   |1 +
 3 files changed, 27 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index af326cd..0466789 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -148,6 +148,7 @@ enum kvm_exit_types {
   EMULATED_TLBWE_EXITS,
   EMULATED_RFI_EXITS,
   EMULATED_RFCI_EXITS,
 + EMULATED_RFMCI_EXITS,

I would quite frankly prefer to see us abandon the whole exit timing framework 
in the kernel and instead use trace points. Then we don't have to maintain all 
of this randomly exercised code.

FWIW I think in this case however, treating RFMCI the same as RFI or random 
instruction emulation shouldn't hurt. This whole table is only about timing 
measurements. If you want to know for real what's going on, use trace points.

Otherwise looks good.


Alex

   DEC_EXITS,
   EXT_INTR_EXITS,
   HALT_WAKEUP,
 diff --git a/arch/powerpc/kvm/booke_emulate.c 
 b/arch/powerpc/kvm/booke_emulate.c
 index 27a4b28..aaff1b7 100644
 --- a/arch/powerpc/kvm/booke_emulate.c
 +++ b/arch/powerpc/kvm/booke_emulate.c
 @@ -23,6 +23,7 @@
 
 #include booke.h
 
 +#define OP_19_XOP_RFMCI   38
 #define OP_19_XOP_RFI 50
 #define OP_19_XOP_RFCI51
 
 @@ -43,6 +44,12 @@ static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)
   kvmppc_set_msr(vcpu, vcpu->arch.csrr1);
  }
  
  +static void kvmppc_emul_rfmci(struct kvm_vcpu *vcpu)
  +{
  + vcpu->arch.pc = vcpu->arch.mcsrr0;
  + kvmppc_set_msr(vcpu, vcpu->arch.mcsrr1);
  +}
 +
 int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 unsigned int inst, int *advance)
 {
 @@ -65,6 +72,12 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
   *advance = 0;
   break;
 
 + case OP_19_XOP_RFMCI:
 + kvmppc_emul_rfmci(vcpu);
 + kvmppc_set_exit_type(vcpu, EMULATED_RFMCI_EXITS);
 + *advance = 0;
 + break;
 +
   default:
   emulated = EMULATE_FAIL;
   break;
 @@ -138,6 +151,12 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, 
 int sprn, ulong spr_val)
   case SPRN_DBCR1:
   vcpu-arch.dbg_reg.dbcr1 = spr_val;
   break;
 + case SPRN_MCSRR0:
  + vcpu->arch.mcsrr0 = spr_val;
  + break;
  + case SPRN_MCSRR1:
  + vcpu->arch.mcsrr1 = spr_val;
  + break;
    case SPRN_DBSR:
    vcpu->arch.dbsr &= ~spr_val;
   break;
 @@ -284,6 +303,12 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, 
 int sprn, ulong *spr_val)
   case SPRN_DBCR1:
   *spr_val = vcpu-arch.dbg_reg.dbcr1;
   break;
 + case SPRN_MCSRR0:
  + *spr_val = vcpu->arch.mcsrr0;
  + break;
  + case SPRN_MCSRR1:
  + *spr_val = vcpu->arch.mcsrr1;
  + break;
    case SPRN_DBSR:
    *spr_val = vcpu->arch.dbsr;
   break;
 diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
 index c392d26..670f63d 100644
 --- a/arch/powerpc/kvm/timing.c
 +++ b/arch/powerpc/kvm/timing.c
 @@ -129,6 +129,7 @@ static const char 
 *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = {
   [EMULATED_TLBSX_EXITS] ="EMUL_TLBSX",
   [EMULATED_TLBWE_EXITS] ="EMUL_TLBWE",
   [EMULATED_RFI_EXITS] =  "EMUL_RFI",
 + [EMULATED_RFMCI_EXITS] ="EMUL_RFMCI",
   [DEC_EXITS] =   "DEC",
   [EXT_INTR_EXITS] =  "EXTINT",
   [HALT_WAKEUP] = "HALT",
 -- 
 1.7.3.4
 
 


Re: KVM: x86: stop IO emulation cycle if instruction pointer is modified

2013-07-08 Thread Marcelo Tosatti
On Sat, Jul 06, 2013 at 10:41:12AM +0300, Gleb Natapov wrote:
 On Fri, Jul 05, 2013 at 04:16:55PM -0300, Marcelo Tosatti wrote:
  
  MMIO/PIO emulation should be interrupted if the system is restarted.
  Otherwise in progress IO emulation continues at the instruction pointer,
  even after vcpus' IP has been modified by KVM_SET_REGS.
  
  Use IP change as an indicator to reset MMIO/PIO emulation state.
  
 Userspace has to return to the kernel to complete pending IO operation.
 This is documented in Documentation/virtual/kvm/api.txt. If this is not
 what program does it is a bug. What userspace you see the problem with?

You're right, this patch should not be necessary.



Re: [PATCH 3/8] vfio: add external user support

2013-07-08 Thread Alex Williamson
On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.
 
 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol), which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 on a host to avoid passing map/unmap requests to user space, which
 would make things pretty slow.
 
 The proposed protocol includes:
 
 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.
 
 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_get_external_user() to verify that the group is initialized
 and an IOMMU is set for it, and to increment the container user counter
 to prevent the VFIO group from being disposed of before KVM exits.
 The current TCE IOMMU driver marks the whole IOMMU table as busy when
 an IOMMU is set for a container, which prevents other DMA users from
 allocating from it, so it is safe to grant user space access to it.
 
 3. KVM calls vfio_external_user_iommu_id() to obtain an IOMMU ID which
 KVM uses to get an iommu_group struct for later use.
 
 4. When KVM is finished, it calls vfio_group_put_external_user() to
 release the VFIO group by decrementing the container user counter.
 Everything gets released.
 
 The "vfio: Limit group opens" patch is also required for consistency.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..57aa191 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,62 @@ static const struct file_operations vfio_device_fops = 
 {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + *
 + * The protocol includes:
 + *  1. do normal VFIO init operation:
 + *   - opening a new container;
 + *   - attaching group(s) to it;
 + *   - setting an IOMMU driver for a container.
 + * When IOMMU is set for a container, all groups in it are
 + * considered ready to use by an external user.
 + *
 + * 2. The user space passed a group fd which we want to accelerate in
 + * KVM. KVM uses vfio_group_get_external_user() to verify that:
 + *   - the group is initialized;
 + *   - IOMMU is set for it.
 + * Then vfio_group_get_external_user() increments the container user
 + * counter to prevent the VFIO group from disposal prior to KVM exit.
 + *
 + * 3. KVM calls vfio_external_user_iommu_id() to know an IOMMU ID which
 + * KVM uses to get an iommu_group struct for later use.
 + *
 + * 4. When KVM is finished, it calls vfio_group_put_external_user() to
 + * release the VFIO group by decrementing the container user counter.

nit, the interface is for any external user, not just kvm.

 + */
 +struct vfio_group *vfio_group_get_external_user(struct file *filep)
 +{
  + struct vfio_group *group = filep->private_data;
  +
  + if (filep->f_op != &vfio_group_fops)
 + return NULL;

ERR_PTR(-EINVAL)

There also needs to be a vfio_group_get(group) here and put in error
cases.

 +
  + if (!atomic_inc_not_zero(&group->container_users))
 + return NULL;

ERR_PTR(-EINVAL)

 +
  + if (!group->container->iommu_driver ||
  + !vfio_group_viable(group)) {
  + atomic_dec(&group->container_users);
 + return NULL;

ERR_PTR(-EINVAL)

 + }
 +
 + return group;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
 +
 +void vfio_group_put_external_user(struct vfio_group *group)
 +{
 + vfio_group_try_dissolve_container(group);

And a vfio_group_put(group) here

 +}
 +EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 +
 +int vfio_external_user_iommu_id(struct vfio_group *group)
 +{
  + return iommu_group_id(group->iommu_group);
 +}
 +EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);
 +
 +/**
   * Module/class support
   */
  static char *vfio_devnode(struct device *dev, umode_t *mode)
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index ac8d488..24579a0 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver(
   TYPE tmp;   \
   offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \
  
 +/*
 + * External user API
 + */
 +extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 +extern void vfio_group_put_external_user(struct vfio_group *group);
 +extern int vfio_external_user_iommu_id(struct vfio_group *group);
 +
  #endif /* VFIO_H */





Re: [PATCH 3/8] vfio: add external user support

2013-07-08 Thread Alexey Kardashevskiy
On 07/09/2013 07:52 AM, Alex Williamson wrote:
 On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.

 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol), which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 on a host to avoid passing map/unmap requests to user space, which
 would make things pretty slow.

 The proposed protocol includes:

 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.

 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_get_external_user() to verify that the group is initialized
 and an IOMMU is set for it, and to increment the container user counter
 to prevent the VFIO group from being disposed of before KVM exits.
 The current TCE IOMMU driver marks the whole IOMMU table as busy when
 an IOMMU is set for a container, which prevents other DMA users from
 allocating from it, so it is safe to grant user space access to it.

 3. KVM calls vfio_external_user_iommu_id() to obtain an IOMMU ID which
 KVM uses to get an iommu_group struct for later use.

 4. When KVM is finished, it calls vfio_group_put_external_user() to
 release the VFIO group by decrementing the container user counter.
 Everything gets released.

 The "vfio: Limit group opens" patch is also required for consistency.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..57aa191 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,62 @@ static const struct file_operations vfio_device_fops 
 = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + *
 + * The protocol includes:
 + *  1. do normal VFIO init operation:
 + *  - opening a new container;
 + *  - attaching group(s) to it;
 + *  - setting an IOMMU driver for a container.
 + * When IOMMU is set for a container, all groups in it are
 + * considered ready to use by an external user.
 + *
 + * 2. The user space passed a group fd which we want to accelerate in
 + * KVM. KVM uses vfio_group_get_external_user() to verify that:
 + *  - the group is initialized;
 + *  - IOMMU is set for it.
 + * Then vfio_group_get_external_user() increments the container user
 + * counter to prevent the VFIO group from disposal prior to KVM exit.
 + *
 + * 3. KVM calls vfio_external_user_iommu_id() to know an IOMMU ID which
 + * KVM uses to get an iommu_group struct for later use.
 + *
 + * 4. When KVM is finished, it calls vfio_group_put_external_user() to
 + * release the VFIO group by decrementing the container user counter.
 
 nit, the interface is for any external user, not just kvm.

s/KVM/An external user/ ?
Or add the description below uses KVM just as an example of an external user?


 + */
 +struct vfio_group *vfio_group_get_external_user(struct file *filep)
 +{
 +struct vfio_group *group = filep->private_data;
 +
 +if (filep->f_op != &vfio_group_fops)
 +return NULL;
 
 ERR_PTR(-EINVAL)
 
 There also needs to be a vfio_group_get(group) here and put in error
 cases.


Is that because I do not hold a reference to the file anymore?


 +
 +if (!atomic_inc_not_zero(&group->container_users))
 +return NULL;
 
 ERR_PTR(-EINVAL)
 
 +
 +if (!group->container->iommu_driver ||
 +!vfio_group_viable(group)) {
 +atomic_dec(&group->container_users);
 +return NULL;
 
 ERR_PTR(-EINVAL)
 
 +}
 +
 +return group;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
 +
 +void vfio_group_put_external_user(struct vfio_group *group)
 +{
 +vfio_group_try_dissolve_container(group);
 
 And a vfio_group_put(group) here
 
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 +
 +int vfio_external_user_iommu_id(struct vfio_group *group)
 +{
 +return iommu_group_id(group->iommu_group);
 +}
 +EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);
 +
 +/**
   * Module/class support
   */
  static char *vfio_devnode(struct device *dev, umode_t *mode)
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index ac8d488..24579a0 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver(
  TYPE tmp;   \
  offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \
  
 +/*
 + * External user API
 + */
 +extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 +extern void vfio_group_put_external_user(struct vfio_group *group);
 +extern int vfio_external_user_iommu_id(struct vfio_group *group);
 +
  #endif /* VFIO_H */
 
 
 


-- 
Alexey

[PATCH 1/2] KVM: PPC: Book3S HV: Correct tlbie usage

2013-07-08 Thread Paul Mackerras
This corrects the usage of the tlbie (TLB invalidate entry) instruction
in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
On the PPC970, the bit to select large vs. small page is in the instruction,
not in the RB register value.  This changes the code to use the correct
form on PPC970.

On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
field of the RB value incorrectly for 64k pages.  This fixes it.

Since we now have several cases to handle for the tlbie instruction, this
factors out the code to do a sequence of tlbies into a new function,
do_tlbies(), and calls that from the various places where the code was
doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
use the same global_invalidates() function for determining whether to do
local or global TLB invalidations as is used in other places, for
consistency, and also to make sure that kvm->arch.need_tlb_flush gets
updated properly.

Signed-off-by: Paul Mackerras pau...@samba.org
Cc: sta...@vger.kernel.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 139 ++-
 2 files changed, 82 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 9c1ff33..dc6b84a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -100,7 +100,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long 
v, unsigned long r,
/* (masks depend on page size) */
		rb |= 0x1000;		/* page encoding in LP field */
		rb |= (va_low & 0x7f) << 16; /* 7b of VA in AVA/LP field */
-		rb |= (va_low & 0xfe);	/* AVAL field (P7 doesn't seem to care) */
+		rb |= ((va_low << 4) & 0xf0); /* AVAL field (P7 doesn't seem to care) */
}
} else {
/* 4kB page */
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 6dcbb49..105b00f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -385,6 +385,80 @@ static inline int try_lock_tlbie(unsigned int *lock)
return old == 0;
 }
 
+/*
+ * tlbie/tlbiel is a bit different on the PPC970 compared to later
+ * processors such as POWER7; the large page bit is in the instruction
+ * not RB, and the top 16 bits and the bottom 12 bits of the VA
+ * in RB must be 0.
+ */
+static void do_tlbies_970(struct kvm *kvm, unsigned long *rbvalues,
+ long npages, int global, bool need_sync)
+{
+   long i;
+
+	if (global) {
+		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+			cpu_relax();
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i) {
+			unsigned long rb = rbvalues[i];
+
+			if (rb & 1)		/* large page */
+				asm volatile("tlbie %0,1" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+			else
+				asm volatile("tlbie %0,0" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+		}
+		asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+		kvm->arch.tlbie_lock = 0;
+	} else {
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i) {
+			unsigned long rb = rbvalues[i];
+
+			if (rb & 1)		/* large page */
+				asm volatile("tlbiel %0,1" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+			else
+				asm volatile("tlbiel %0,0" : :
+					     "r" (rb & 0x0000fffffffff000ul));
+		}
+		asm volatile("ptesync" : : : "memory");
+	}
+}
+
+static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
+		      long npages, int global, bool need_sync)
+{
+	long i;
+
+	if (cpu_has_feature(CPU_FTR_ARCH_201)) {
+		/* PPC970 tlbie instruction is a bit different */
+		do_tlbies_970(kvm, rbvalues, npages, global, need_sync);
+		return;
+	}
+	if (global) {
+		while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
+			cpu_relax();
+		if (need_sync)
+			asm volatile("ptesync" : : : "memory");
+		for (i = 0; i < npages; ++i)
+			asm volatile(PPC_TLBIE(%1,%0) : :
+				     "r" (rbvalues[i]), "r" (kvm->arch.lpid));
+		asm volatile("eieio; tlbsync; ptesync 

[PATCH 2/2] KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers

2013-07-08 Thread Paul Mackerras
The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S
can contain negative values, if some of the handlers end up before the
table in the vmlinux binary.  Thus we need to use a sign-extending load
to read the values in the table rather than a zero-extending load.
Without this, the host crashes when the guest does one of the hcalls
with negative offsets, due to jumping to a bogus address.

Signed-off-by: Paul Mackerras pau...@samba.org
Cc: sta...@vger.kernel.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b02f91e..60dce5b 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1381,7 +1381,7 @@ hcall_try_real_mode:
cmpldi  r3,hcall_real_table_end - hcall_real_table
bge guest_exit_cont
LOAD_REG_ADDR(r4, hcall_real_table)
-	lwzx	r3,r3,r4
+	lwax	r3,r3,r4
cmpwi   r3,0
beq guest_exit_cont
add r3,r3,r4
-- 
1.8.3.1



Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation

2013-07-08 Thread Alexander Graf

On 28.06.2013, at 11:20, Mihai Caraman wrote:

 lwepx faults need to be handled by KVM, and this implies additional code
 in the DO_KVM macro to identify the source of an exception originating from
 host context. This requires checking the Exception Syndrome Register
 (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for DTB_MISS,
 DSI and LRAT exceptions, which is too intrusive for the host.
 
 Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit() by
 searching for the physical address and kmapping it. This fixes an infinite loop

What's the difference in speed for this?

Also, could we call lwepx later in host code, when kvmppc_get_last_inst() gets 
invoked?

 caused by lwepx's data TLB miss handled in the host and the TODO for TLB
 eviction and execute-but-not-read entries.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
 Resend this patch for Alex G.; he was unsubscribed from the kvm-ppc mailing
 list for a while.
 
 arch/powerpc/include/asm/mmu-book3e.h |6 ++-
 arch/powerpc/kvm/booke.c  |6 +++
 arch/powerpc/kvm/booke.h  |2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-
 arch/powerpc/kvm/e500.c   |4 ++
 arch/powerpc/kvm/e500mc.c |   69 +
 6 files changed, 91 insertions(+), 28 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
 b/arch/powerpc/include/asm/mmu-book3e.h
 index 99d43e0..32e470e 100644
 --- a/arch/powerpc/include/asm/mmu-book3e.h
 +++ b/arch/powerpc/include/asm/mmu-book3e.h
 @@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
 -#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
 +#define MAS0_TLBSEL_MASK	0x30000000
 +#define MAS0_TLBSEL_SHIFT	28
 +#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
 +#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
  #define MAS0_ESEL_MASK		0x0FFF0000
  #define MAS0_ESEL_SHIFT	16
  #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
  @@ -58,6 +61,7 @@
  #define MAS1_TSIZE_MASK	0x00000f80
  #define MAS1_TSIZE_SHIFT	7
  #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
 +#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
  #define MAS2_EPN		(~0xFFFUL)
  #define MAS2_X0		0x00400000
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 1020119..6764a8e 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
   /* update before a new last_exit_type is rewritten */
   kvmppc_update_timing_stats(vcpu);
 
 + /*
 +  * The exception type can change at this point, such as if the TLB entry
 +  * for the emulated instruction has been evicted.
 +  */
  + kvmppc_prepare_for_emulation(vcpu, &exit_nr);

Please model this the same way as book3s. Check out kvmppc_get_last_inst() as a 
starting point.

 +
   /* restart interrupts if they were meant for the host */
   kvmppc_restart_interrupt(vcpu, exit_nr);
 
 diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
 index 5fd1ba6..a0d0fea 100644
 --- a/arch/powerpc/kvm/booke.h
 +++ b/arch/powerpc/kvm/booke.h
 @@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
 +void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int 
 *exit_nr);
 +
 enum int_class {
   INT_CLASS_NONCRIT,
   INT_CLASS_CRIT,
 diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
 b/arch/powerpc/kvm/bookehv_interrupts.S
 index 20c7a54..0538ab9 100644
 --- a/arch/powerpc/kvm/bookehv_interrupts.S
 +++ b/arch/powerpc/kvm/bookehv_interrupts.S
 @@ -120,37 +120,20 @@
 
   .if \flags  NEED_EMU
   /*
 -  * This assumes you have external PID support.
 -  * To support a bookehv CPU without external PID, you'll
 -  * need to look up the TLB entry and create a temporary mapping.
 -  *
 -  * FIXME: we don't currently handle if the lwepx faults.  PR-mode
 -  * booke doesn't handle it either.  Since Linux doesn't use
 -  * broadcast tlbivax anymore, the only way this should happen is
 -  * if the guest maps its memory execute-but-not-read, or if we
 -  * somehow take a TLB miss in the middle of this entry code and
 -  * evict the relevant entry.  On e500mc, all kernel lowmem is
 -  * bolted into TLB1 large page mappings, and we don't use
 -  * broadcast invalidates, so we should not take a TLB miss here.
 -  *
 -  * Later we'll need to deal with faults here.  Disallowing guest
 -  * mappings that are execute-but-not-read could be an option on
 -  * e500mc, but not on chips with an LRAT if it is used.
 

Re: [PATCH -V3 1/4] mm/cma: Move dma contiguous changes into a seperate config

2013-07-08 Thread Alexander Graf

On 02.07.2013, at 07:45, Aneesh Kumar K.V wrote:

 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 We want to use CMA for allocating the hash page table and real mode area for
 PPC64. Hence move the DMA contiguous related changes into a separate config
 option so that ppc64 can enable CMA without requiring DMA contiguous.
 
 Acked-by: Michal Nazarewicz min...@mina86.com
 Acked-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Thanks, applied all to kvm-ppc-queue. Please provide a cover letter next time 
:).


Alex







Re: [PATCH 3/8] vfio: add external user support

2013-07-08 Thread Alex Williamson
On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.
 
 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 on a host to avoid passing map/unmap requests to the user space which
 would made things pretty slow.
 
 The proposed protocol includes:
 
 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.
 
 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_get_external_user() to verify if the group is initialized,
 IOMMU is set for it and increment the container user counter to prevent
 the VFIO group from disposal prior to KVM exit.
 The current TCE IOMMU driver marks the whole IOMMU table as busy when
 IOMMU is set for a container what prevents other DMA users from
 allocating from it so it is safe to grant user space access to it.
 
 3. KVM calls vfio_external_user_iommu_id() to obtian an IOMMU ID which
 KVM uses to get an iommu_group struct for later use.
 
 4. When KVM is finished, it calls vfio_group_put_external_user() to
 release the VFIO group by decrementing the container user counter.
 Everything gets released.
 
 The vfio: Limit group opens patch is also required for the consistency.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..57aa191 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,62 @@ static const struct file_operations vfio_device_fops = 
 {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + *
 + * The protocol includes:
 + *  1. do normal VFIO init operation:
 + *   - opening a new container;
 + *   - attaching group(s) to it;
 + *   - setting an IOMMU driver for a container.
 + * When IOMMU is set for a container, all groups in it are
 + * considered ready to use by an external user.
 + *
 + * 2. The user space passed a group fd which we want to accelerate in
 + * KVM. KVM uses vfio_group_get_external_user() to verify that:
 + *   - the group is initialized;
 + *   - IOMMU is set for it.
 + * Then vfio_group_get_external_user() increments the container user
 + * counter to prevent the VFIO group from disposal prior to KVM exit.
 + *
 + * 3. KVM calls vfio_external_user_iommu_id() to obtain an IOMMU ID which
 + * KVM uses to get an iommu_group struct for later use.
 + *
 + * 4. When KVM is finished, it calls vfio_group_put_external_user() to
 + * release the VFIO group by decrementing the container user counter.

nit, the interface is for any external user, not just KVM.

 + */
 +struct vfio_group *vfio_group_get_external_user(struct file *filep)
 +{
 + struct vfio_group *group = filep->private_data;
 +
 + if (filep->f_op != &vfio_group_fops)
 + return NULL;

ERR_PTR(-EINVAL)

There also needs to be a vfio_group_get(group) here and put in error
cases.

 +
 + if (!atomic_inc_not_zero(&group->container_users))
 + return NULL;

ERR_PTR(-EINVAL)

 +
 + if (!group->container->iommu_driver ||
 + !vfio_group_viable(group)) {
 + atomic_dec(&group->container_users);
 + return NULL;

ERR_PTR(-EINVAL)

 + }
 +
 + return group;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
 +
 +void vfio_group_put_external_user(struct vfio_group *group)
 +{
 + vfio_group_try_dissolve_container(group);

And a vfio_group_put(group) here

 +}
 +EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 +
 +int vfio_external_user_iommu_id(struct vfio_group *group)
 +{
 + return iommu_group_id(group->iommu_group);
 +}
 +EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);
 +
 +/**
   * Module/class support
   */
  static char *vfio_devnode(struct device *dev, umode_t *mode)
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index ac8d488..24579a0 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver(
   TYPE tmp;   \
   offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \
  
 +/*
 + * External user API
 + */
 +extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 +extern void vfio_group_put_external_user(struct vfio_group *group);
 +extern int vfio_external_user_iommu_id(struct vfio_group *group);
 +
  #endif /* VFIO_H */



--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] vfio: add external user support

2013-07-08 Thread Alexey Kardashevskiy
On 07/09/2013 07:52 AM, Alex Williamson wrote:
 On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.

 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 on a host to avoid passing map/unmap requests to the user space which
 would make things pretty slow.

 The proposed protocol includes:

 1. do normal VFIO init stuff such as opening a new container, attaching
 group(s) to it, setting an IOMMU driver for a container. When IOMMU is
 set for a container, all groups in it are considered ready to use by
 an external user.

 2. pass a fd of the group we want to accelerate to KVM. KVM calls
 vfio_group_get_external_user() to verify that the group is initialized
 and an IOMMU is set for it, and to increment the container user counter
 so that the VFIO group cannot be disposed of prior to KVM exit.
 The current TCE IOMMU driver marks the whole IOMMU table as busy when
 IOMMU is set for a container, which prevents other DMA users from
 allocating from it, so it is safe to grant user space access to it.

 3. KVM calls vfio_external_user_iommu_id() to obtain an IOMMU ID which
 KVM uses to get an iommu_group struct for later use.

 4. When KVM is finished, it calls vfio_group_put_external_user() to
 release the VFIO group by decrementing the container user counter.
 Everything gets released.

 The "vfio: Limit group opens" patch is also required for consistency.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..57aa191 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,62 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + *
 + * The protocol includes:
 + *  1. do normal VFIO init operation:
 + *  - opening a new container;
 + *  - attaching group(s) to it;
 + *  - setting an IOMMU driver for a container.
 + * When IOMMU is set for a container, all groups in it are
 + * considered ready to use by an external user.
 + *
 + * 2. User space passes the fd of the group we want to accelerate in
 + * KVM. KVM uses vfio_group_get_external_user() to verify that:
 + *  - the group is initialized;
 + *  - IOMMU is set for it.
 + * Then vfio_group_get_external_user() increments the container user
 + * counter to prevent the VFIO group from disposal prior to KVM exit.
 + *
 + * 3. KVM calls vfio_external_user_iommu_id() to obtain an IOMMU ID which
 + * KVM uses to get an iommu_group struct for later use.
 + *
 + * 4. When KVM is finished, it calls vfio_group_put_external_user() to
 + * release the VFIO group by decrementing the container user counter.
 
 nit, the interface is for any external user, not just KVM.

s/KVM/An external user/ ?
Or add a note that the description below uses KVM just as an example of an external user?


 + */
 +struct vfio_group *vfio_group_get_external_user(struct file *filep)
 +{
 +struct vfio_group *group = filep->private_data;
 +
 +if (filep->f_op != &vfio_group_fops)
 +return NULL;
 
 ERR_PTR(-EINVAL)
 
 There also needs to be a vfio_group_get(group) here and put in error
 cases.


Is that because I do not hold a reference to the file anymore?


 +
 +if (!atomic_inc_not_zero(&group->container_users))
 +return NULL;
 
 ERR_PTR(-EINVAL)
 
 +
 +if (!group->container->iommu_driver ||
 +!vfio_group_viable(group)) {
 +atomic_dec(&group->container_users);
 +return NULL;
 
 ERR_PTR(-EINVAL)
 
 +}
 +
 +return group;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
 +
 +void vfio_group_put_external_user(struct vfio_group *group)
 +{
 +vfio_group_try_dissolve_container(group);
 
 And a vfio_group_put(group) here
 
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 +
 +int vfio_external_user_iommu_id(struct vfio_group *group)
 +{
 +return iommu_group_id(group->iommu_group);
 +}
 +EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);
 +
 +/**
   * Module/class support
   */
  static char *vfio_devnode(struct device *dev, umode_t *mode)
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index ac8d488..24579a0 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver(
  TYPE tmp;   \
  offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \
  
 +/*
 + * External user API
 + */
 +extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 +extern void vfio_group_put_external_user(struct vfio_group *group);
 +extern int vfio_external_user_iommu_id(struct vfio_group *group);
 +
  #endif /* VFIO_H */
 
 
 


-- 
Alexey