Re: Nested paging in nested SVM setup

2014-08-20 Thread Valentine Sinitsyn

Hi all,

Please excuse me for reviving a two-month-old thread, but I only 
recently had time to investigate the issue a bit.


On 18.06.2014 18:47, Jan Kiszka wrote:

On 2014-06-18 13:36, Valentine Sinitsyn wrote:
If we want to provide useful nested SVM support, this must be feasible.
If there is a bug, it has to be fixed.
Looks like it is a bug in KVM. I had a chance to run the same code 
bare-metal ([1]; line 310 is uncommented for the bare-metal case but present 
for nested SVM), and it seems to work as expected. However, when I trace 
it in the nested SVM setup, after some successful APIC reads and writes, I 
get the following:



 qemu-system-x86-1968  [001] 220417.681261: kvm_nested_vmexit:rip: 
0x8104f5b8 reason: npf ext_inf1: 0x0001000f ext_inf2: 
0xfee00300 ext_int:
 0x ext_int_err: 0x
 qemu-system-x86-1968  [001] 220417.681261: kvm_page_fault:   address 
fee00300 error_code f
 qemu-system-x86-1968  [001] 220417.681263: kvm_emulate_insn: 
0:8104f5b8:89 04 25 00 93 5f ff (prot64)
 qemu-system-x86-1968  [001] 220417.681268: kvm_inj_exception: (0x23c)
 qemu-system-x86-1968  [001] 220417.681269: kvm_entry:vcpu 0
 qemu-system-x86-1968  [001] 220417.681271: kvm_exit: reason  rip 
0x8104f5b8 info 0 0


You can see the problem here: the code tries to access an APIC MMIO 
register, which is trapped by KVM's MMU code (at nested page table 
walk). During MMIO access emulation, KVM decides to inject exception 
0x23c (which looks wrong, as no exception with this number is defined). 
After that, things go astray (note the empty exit reason in the last 
line; the VMCB is certainly not in a state KVM expects/supports).


I'm no KVM expert, and will be grateful for debugging suggestions (or 
maybe even assistance).


Many thanks for the help.

1. 
https://github.com/vsinitsyn/jailhouse/blob/amd-v/hypervisor/arch/x86/svm.c#L301


--
Regards,
Valentine Sinitsyn


Re: [PATCH v3] KVM: vmx: fix ept reserved bits for 1-GByte page

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 05:17, Wanpeng Li wrote:
 + else if (spte & (1ULL << 7))

You have to check level == 1 specifically here, or add...

 + /*
 +  * 1GB/2MB page, bits 29:12 or 20:12 reserved respectively,
 +  * level == 1 if the hypervisor is using the ignored bit 7.
 +  */
 + mask |= (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE;
 + else

... if (level > 1) here.

Otherwise, you're marking bits 6:3 as reserved for 4K pages.  This
should cause a WARN, because KVM puts 0110 in those bits:

ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
| VMX_EPT_IPAT_BIT;

(in vmx_get_mt_mask: writeback memory, ignore PAT memory type from the
guest's page tables)

How are you testing this patch?

Paolo

 + /* bits 6:3 reserved */
 + mask |= 0x78;



Re: Nested paging in nested SVM setup

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 08:46, Valentine Sinitsyn wrote:
 
 You can see the problem here: the code tries to access an APIC MMIO
 register, which is trapped by KVM's MMU code (at nested page table
 walk). During MMIO access emulation, KVM decides to inject exception
 0x23c (which looks wrong, as no exception with this number is
 defined). After that, things go astray (note the empty exit reason
 in the last line; the VMCB is certainly not in a state KVM
 expects/supports).
 
 I'm no KVM expert, and will be grateful for debugging suggestions (or
 maybe even assistance).

Is the 0x23c always the same?  Can you try this patch?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204422de3fed..194e9300a31b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -346,6 +346,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 
kvm_make_request(KVM_REQ_EVENT, vcpu);
 
+   WARN_ON(nr > 0x1f);
if (!vcpu->arch.exception.pending) {
queue:
vcpu->arch.exception.pending = true;

Paolo


Re: [PATCH v4] KVM: nVMX: nested TPR shadow/threshold emulation

2014-08-20 Thread Wanpeng Li
Hi Paolo,
On Tue, Aug 19, 2014 at 10:34:20AM +0200, Paolo Bonzini wrote:
On 19/08/2014 10:30, Wanpeng Li wrote:
 +if (vmx->nested.virtual_apic_page)
 +nested_release_page(vmx->nested.virtual_apic_page);
 +vmx->nested.virtual_apic_page =
 +   nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
 +if (!vmx->nested.virtual_apic_page)
 +exec_control &=
 +~CPU_BASED_TPR_SHADOW;
 +else
 +vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
 +page_to_phys(vmx->nested.virtual_apic_page));
 +
 +/*
 + * If CR8 load exits are enabled, CR8 store exits are enabled,
 + * and virtualize APIC access is disabled, the processor would
 + * never notice. Doing it unconditionally is not correct, but
 + * it is the simplest thing.
 + */
 +if (!(exec_control & CPU_BASED_TPR_SHADOW) &&
 +!((exec_control & CPU_BASED_CR8_LOAD_EXITING) &&
 +(exec_control & CPU_BASED_CR8_STORE_EXITING)))
 +nested_vmx_failValid(vcpu,
 VMXERR_ENTRY_INVALID_CONTROL_FIELD);
 +

You aren't checking virtualize APIC access here, but the comment
mentions it.

As the comment says, failing the entry unconditionally could be the
simplest thing, which means moving the nested_vmx_failValid call inside
the if (!vmx->nested.virtual_apic_page).

If you want to check all of CR8_LOAD/CR8_STORE/VIRTUALIZE_APIC_ACCESS,
please mention in the comment that failing the vm entry is _not_ what
the processor does but it's basically the only possibility we have.  In
that case, I would also place the if within the if
(!vmx->nested.virtual_apic_page): it also simplifies the condition
because you don't have to check CPU_BASED_TPR_SHADOW anymore.

You can send v5 with these changes, and I'll apply it for 3.18.  Thanks!


Do you mean this? 

+   /*
+* Failing the vm entry is _not_ what the processor does
+* but it's basically the only possibility we have.
+*/
+   if (!vmx->nested.virtual_apic_page)
+   nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);

Regards,
Wanpeng Li 

Paolo


Re: [PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl

2014-08-20 Thread Christian Borntraeger
On 20/08/14 01:22, Wanpeng Li wrote:
 On Tue, Aug 19, 2014 at 04:04:03PM +0200, Christian Borntraeger wrote:
 On 18/08/14 07:02, Wanpeng Li wrote:
 Hi Christian,
 On Tue, Aug 05, 2014 at 04:44:14PM +0200, Christian Borntraeger wrote:
 We currently track the pid of the task that runs the VCPU in
  vcpu_load. Since we call vcpu_load for all kinds of ioctls on a
  CPU, this causes hiccups due to synchronize_rcu if one CPU is
 modified by another CPU or the main thread (e.g. initialization,
 reset). We track the pid only for the purpose of yielding, so
 let's update the pid only in the KVM_RUN ioctl.

 In addition, don't do a synchronize_rcu on startup (pid == 0).

  This speeds up guest boot time on s390 noticeably for some configs, e.g.
 HZ=100, no full state tracking, 64 guest cpus 32 host cpus.

 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 CC: Rik van Riel r...@redhat.com
 CC: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 CC: Michael Mueller m...@linux.vnet.ibm.com
 ---
 virt/kvm/kvm_main.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 9ae9135..ebc8f54 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -124,14 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu)

   if (mutex_lock_killable(&vcpu->mutex))
return -EINTR;

 One question: 

  -  if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {

  When will vcpu->pid and current->pids[PIDTYPE_PID].pid be different?

  If two different threads call an ioctl on a vcpu fd. (It must be an ioctl 
  that does vcpu_load - almost all of them, except for some interrupt injections.)
 
 Thanks for your explanation. When can this happen?

In general, by using clone and doing an ioctl in the new thread on a pre-existing 
fd.
In QEMU, e.g., by issuing a kvm_ioctl on a vcpu from the main thread or another 
CPU's thread.
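
For illustration, here is a minimal userspace sketch of that scenario
(hypothetical code, for illustration only; KVM_GET_REGS is just one of
the many vcpu ioctls that go through vcpu_load):

#include <pthread.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int vcpu_fd;	/* created by the main thread via KVM_CREATE_VCPU */

static void *poke_vcpu(void *arg)
{
	struct kvm_regs regs;

	/* vcpu_load() runs here with current != the KVM_RUN thread,
	 * so vcpu->pid no longer matches the caller's pid */
	ioctl(vcpu_fd, KVM_GET_REGS, &regs);
	return NULL;
}

While the main thread sits in KVM_RUN on vcpu_fd, a second thread created
with pthread_create(&tid, NULL, poke_vcpu, NULL) takes the cross-thread
vcpu_load path described above.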




Re: [PATCH 7/9] KVM: VMX: abstract ple_window modifiers

2014-08-20 Thread Paolo Bonzini
On 19/08/2014 22:35, Radim Krčmář wrote:
 They were almost identical and thus merged with a loathable macro.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  This solution is hopefully more acceptable than function pointers.

I think a small amount of duplication is not a problem.

Paolo

  arch/x86/kvm/vmx.c | 53 +++--
  1 file changed, 19 insertions(+), 34 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index a236a9f..c6cfb71 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -5694,42 +5694,27 @@ static int handle_invalid_guest_state(struct kvm_vcpu 
 *vcpu)
  out:
   return ret;
  }
 -
  -static void grow_ple_window(struct kvm_vcpu *vcpu)
  -{
  - struct vcpu_vmx *vmx = to_vmx(vcpu);
  - int old = vmx->ple_window;
  - int new;
  -
  - if (ple_window_grow < 1)
  - new = ple_window;
  - else if (ple_window_grow < ple_window)
  - new = old * ple_window_grow;
  - else
  - new = old + ple_window_grow;
  -
  - vmx->ple_window = min(new, ple_window_max);
  -
  - trace_kvm_ple_window_grow(vcpu->vcpu_id, vmx->ple_window, old);
  +#define make_ple_window_modifier(type, oplt, opge, cmp, bound) \
  +static void type##_ple_window(struct kvm_vcpu *vcpu) \
  +{ \
  + struct vcpu_vmx *vmx = to_vmx(vcpu); \
  + int old = vmx->ple_window; \
  + int new; \
  +\
  + if (ple_window_##type < 1) \
  + new = ple_window; \
  + else if (ple_window_##type < ple_window) \
  + new = old oplt ple_window_##type; \
  + else \
  + new = old opge ple_window_##type; \
  +\
  + vmx->ple_window = cmp(new, bound); \
  +\
  + trace_kvm_ple_window_##type(vcpu->vcpu_id, vmx->ple_window, old); \
   }
   
  -static void shrink_ple_window(struct kvm_vcpu *vcpu)
  -{
  - struct vcpu_vmx *vmx = to_vmx(vcpu);
  - int old = vmx->ple_window;
  - int new;
  -
  - if (ple_window_shrink < 1)
  - new = ple_window;
  - else if (ple_window_shrink < ple_window)
  - new = old / ple_window_shrink;
  - else
  - new = old - ple_window_shrink;
  -
  - vmx->ple_window = max(new, ple_window);
  -
  - trace_kvm_ple_window_shrink(vcpu->vcpu_id, vmx->ple_window, old);
  -}
  +make_ple_window_modifier(grow,   *, +, min, ple_window_max)
  +make_ple_window_modifier(shrink, /, -, max, ple_window)
  
  /*
   * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
 

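For readers following along: instantiating the macro with the first
invocation above reproduces the deleted function verbatim, i.e.
make_ple_window_modifier(grow, *, +, min, ple_window_max) expands to:

static void grow_ple_window(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	int old = vmx->ple_window;
	int new;

	if (ple_window_grow < 1)
		new = ple_window;
	else if (ple_window_grow < ple_window)
		new = old * ple_window_grow;
	else
		new = old + ple_window_grow;

	vmx->ple_window = min(new, ple_window_max);

	trace_kvm_ple_window_grow(vcpu->vcpu_id, vmx->ple_window, old);
}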


Re: [PATCH 3/9] KVM: VMX: make PLE window per-vcpu

2014-08-20 Thread Paolo Bonzini
On 19/08/2014 22:35, Radim Krčmář wrote:
 Change the PLE window into a per-vcpu variable, seeded from the module
 parameter, to allow greater flexibility.
 
 Brings in a small overhead on every vmentry.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  I've been thinking about a general hierarchical per-vcpu variable model,
  but it's hard to have current performance and sane code.
 
  arch/x86/kvm/vmx.c | 7 +++
  1 file changed, 7 insertions(+)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 2b306f9..eaa5574 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -484,6 +484,9 @@ struct vcpu_vmx {
  
   /* Support for a guest hypervisor (nested VMX) */
   struct nested_vmx nested;
 +
 + /* Dynamic PLE window. */
 + int ple_window;
  };
  
  enum segment_cache_field {
 @@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
   if (ple_gap) {
   vmcs_write32(PLE_GAP, ple_gap);
   vmcs_write32(PLE_WINDOW, ple_window);

Is this necessary?

 + vmx->ple_window = ple_window;
   }
  
   vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
 @@ -7387,6 +7391,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
 *vcpu)
   if (vmx->emulation_required)
   return;
  
 + if (ple_gap)
 + vmcs_write32(PLE_WINDOW, vmx->ple_window);
 +
   if (vmx->nested.sync_shadow_vmcs) {
   copy_vmcs12_to_shadow(vmx);
   vmx->nested.sync_shadow_vmcs = false;
 



Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Paolo Bonzini
On 19/08/2014 22:35, Radim Krčmář wrote:
 Every increase of ple_window_grow creates potential overflows.
 They are not serious, because we clamp ple_window and userspace is
 expected to fix ple_window_max within a second.
 ---
  arch/x86/kvm/vmx.c | 34 +-
  1 file changed, 33 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index d7f58e8..6873a0b 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -138,7 +138,9 @@ module_param(ple_window, int, S_IRUGO | S_IWUSR);
  
  /* Default doubles per-vcpu window every exit. */
  static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
 -module_param(ple_window_grow, int, S_IRUGO | S_IWUSR);
 +static struct kernel_param_ops ple_window_grow_ops;
 +module_param_cb(ple_window_grow, ple_window_grow_ops,
 +ple_window_grow, S_IRUGO | S_IWUSR);
  
  /* Default resets per-vcpu window every exit to ple_window. */
  static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
 @@ -5717,6 +5719,36 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \
  make_ple_window_modifier(grow,   *, +) /* grow_ple_window */
  make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */
  
 +static void clamp_ple_window_max(void)
 +{
 + int maximum;
 +
 + if (ple_window_grow < 1)
 + return;
 +
 + if (ple_window_grow < ple_window)
 + maximum = INT_MAX / ple_window_grow;
 + else
 + maximum = INT_MAX - ple_window_grow;
 +
 + ple_window_max = clamp(ple_window_max, ple_window, maximum);
 +}

I think avoiding overflows is better.  In fact, I think you should call
this function for ple_window_max too.

You could keep the ple_window_max variable to the user-set value.
Whenever ple_window_grow or ple_window_max are changed, you can set an
internal variable (let's call it ple_window_actual_max, but I'm not wed
to this name) to the computed value, and then do:

if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
new = ple_window;
else if (ple_window_grow < ple_window)
new = max(ple_window_actual_max, old) * ple_window_grow;
else
new = max(ple_window_actual_max, old) + ple_window_grow;

(I think the || in the first if can be eliminated with some creativity
in clamp_ple_window_max).

Paolo

 +static int set_ple_window_grow(const char *arg, const struct kernel_param 
 *kp)
 +{
 + int ret;
 +
 + clamp_ple_window_max();
 + ret = param_set_int(arg, kp);
 +
 + return ret;
 +}
 +
 +static struct kernel_param_ops ple_window_grow_ops = {
 + .set = set_ple_window_grow,
 + .get = param_get_int,
 +};
 +
  /*
   * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
   * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
 

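A minimal sketch of the ple_window_actual_max idea (hypothetical names,
not a posted patch): whenever ple_window_grow or ple_window_max changes,
precompute the largest starting value from which one growth step can
neither overflow an int nor exceed the user-set maximum:

static int ple_window_actual_max;

static void update_ple_window_actual_max(void)
{
	if (ple_window_grow < 1)
		ple_window_actual_max = ple_window;	/* growing disabled */
	else if (ple_window_grow < ple_window)
		/* multiplicative growth */
		ple_window_actual_max = ple_window_max / ple_window_grow;
	else
		/* additive growth */
		ple_window_actual_max = ple_window_max - ple_window_grow;
}

The grow path then needs no division or overflow check at run time:
new = min(ple_window_actual_max, old) * ple_window_grow (min rather than
max, per the correction in the follow-up mail below).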


Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 09:16, Paolo Bonzini wrote:
 On 19/08/2014 22:35, Radim Krčmář wrote:
 Every increase of ple_window_grow creates potential overflows.
 They are not serious, because we clamp ple_window and userspace is
 expected to fix ple_window_max within a second.
 ---
  arch/x86/kvm/vmx.c | 34 +-
  1 file changed, 33 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index d7f58e8..6873a0b 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -138,7 +138,9 @@ module_param(ple_window, int, S_IRUGO | S_IWUSR);
  
  /* Default doubles per-vcpu window every exit. */
  static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
 -module_param(ple_window_grow, int, S_IRUGO | S_IWUSR);
 +static struct kernel_param_ops ple_window_grow_ops;
 +module_param_cb(ple_window_grow, ple_window_grow_ops,
 +ple_window_grow, S_IRUGO | S_IWUSR);
  
  /* Default resets per-vcpu window every exit to ple_window. */
  static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
 @@ -5717,6 +5719,36 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \
  make_ple_window_modifier(grow,   *, +) /* grow_ple_window */
  make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */
  
 +static void clamp_ple_window_max(void)
 +{
 +int maximum;
 +
 +if (ple_window_grow < 1)
 +return;
 +
 +if (ple_window_grow < ple_window)
 +maximum = INT_MAX / ple_window_grow;
 +else
 +maximum = INT_MAX - ple_window_grow;
 +
 +ple_window_max = clamp(ple_window_max, ple_window, maximum);
 +}
 
 I think avoiding overflows is better.  In fact, I think you should call
 this function for ple_window_max too.
 
 You could keep the ple_window_max variable to the user-set value.
 Whenever ple_window_grow or ple_window_max are changed, you can set an
 internal variable (let's call it ple_window_actual_max, but I'm not wed
 to this name) to the computed value, and then do:
 
   if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
   new = ple_window;
   else if (ple_window_grow < ple_window)
   new = max(ple_window_actual_max, old) * ple_window_grow;
   else
   new = max(ple_window_actual_max, old) + ple_window_grow;

Ehm, this should of course be min.

Paolo

 (I think the || in the first if can be eliminated with some creativity
 in clamp_ple_window_max).
 
 Paolo
 
 +static int set_ple_window_grow(const char *arg, const struct kernel_param 
 *kp)
 +{
 +int ret;
 +
 +clamp_ple_window_max();
 +ret = param_set_int(arg, kp);
 +
 +return ret;
 +}
 +
 +static struct kernel_param_ops ple_window_grow_ops = {
 +.set = set_ple_window_grow,
 +.get = param_get_int,
 +};
 +
  /*
   * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
   * exiting, so only get here on cpu with PAUSE-Loop-Exiting.

 


Re: [PATCH 5/9] KVM: VMX: clamp PLE window

2014-08-20 Thread Paolo Bonzini
On 19/08/2014 22:35, Radim Krčmář wrote:
 Modifications could produce unwanted values of the PLE window (low or negative).
 Use ple_window and the maximal value that cannot overflow as bounds.
 
 ple_window_max defaults to a very high value, but it would make sense to
 set it to some fraction of the scheduler tick.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  arch/x86/kvm/vmx.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 66259fd..e1192fb 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -144,6 +144,10 @@ module_param(ple_window_grow, int, S_IRUGO);
  static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
  module_param(ple_window_shrink, int, S_IRUGO);
  
 +/* Default is to compute the maximum so we can never overflow. */
 +static int ple_window_max = INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
 +module_param(ple_window_max, int, S_IRUGO);
 +
  extern const ulong vmx_return;
  
  #define NR_AUTOLOAD_MSRS 8
 @@ -5704,7 +5708,7 @@ static void grow_ple_window(struct kvm_vcpu *vcpu)
   else
   new = old + ple_window_grow;
  
 - vmx->ple_window = new;
 + vmx->ple_window = min(new, ple_window_max);
  }

Please introduce a dynamic overflow-avoiding ple_window_max (like what
you have in patch 9) already in patch 4...

  static void shrink_ple_window(struct kvm_vcpu *vcpu)
 @@ -5720,7 +5724,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
   else
   new = old - ple_window_shrink;
  
 - vmx->ple_window = new;
 + vmx->ple_window = max(new, ple_window);

... and also squash this in patch 4.

This patch can then introduce the ple_window_max module parameter (using
module_param_cb to avoid overflows).

Paolo

  }
  
  /*
 



[PATCH v4] KVM: vmx: fix ept reserved bits for 1-GByte page

2014-08-20 Thread Wanpeng Li
The EPT misconfig handler in KVM checks which reason led to the EPT
misconfiguration after a vmexit. One of the reasons is that an EPT
paging-structure entry is configured with settings reserved for
future functionality. However, the handler can't identify whether
reserved bits are set in a paging-structure entry that maps a 1-GByte
page, since a PDPTE that points to a 1-GByte page reserves bits 29:12,
instead of bits 7:3 which are reserved for a PDPTE that references an
EPT page directory. This patch fixes it by treating bits 29:12 as
reserved for 1-GByte pages.

Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
v3 - v4:
 * don't mask bits 6:3 as reserved for 4K pages
v2 - v3:
 * return 0xf8 for level == 4
 * check spte & (1ULL << 7) if level == 1
 * (rsvd_mask & 0x38) == 0 for large page or leaf page
v1 - v2:
 * same if statement cover both 2MB and 1GB pages
 * return 0xf8 for level == 4
 * get the level by checking the return value of ept_rsvd_mask

---
 arch/x86/kvm/vmx.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cad37d5..286c283 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5521,17 +5521,18 @@ static u64 ept_rsvd_mask(u64 spte, int level)
for (i = 51; i > boot_cpu_data.x86_phys_bits; i--)
mask |= (1ULL << i);
 
-   if (level > 2)
+   if (level == 4)
/* bits 7:3 reserved */
mask |= 0xf8;
-   else if (level == 2) {
-   if (spte & (1ULL << 7))
-   /* 2MB ref, bits 20:12 reserved */
-   mask |= 0x1ff000;
-   else
-   /* bits 6:3 reserved */
-   mask |= 0x78;
-   }
+   else if (spte & (1ULL << 7))
+   /*
+* 1GB/2MB page, bits 29:12 or 20:12 reserved respectively,
+* level == 1 if the hypervisor is using the ignored bit 7.
+*/
+   mask |= (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE;
+   else if (level > 1)
+   /* bits 6:3 reserved */
+   mask |= 0x78;
 
return mask;
 }
@@ -5561,7 +5562,8 @@ static void ept_misconfig_inspect_spte(struct kvm_vcpu 
*vcpu, u64 spte,
WARN_ON(1);
}
 
-   if (level == 1 || (level == 2 && (spte & (1ULL << 7)))) {
+   /* bits 5:3 are _not_ reserved for large page or leaf page */
+   if ((rsvd_bits & 0x38) == 0) {
u64 ept_mem_type = (spte & 0x38) >> 3;
 
if (ept_mem_type == 2 || ept_mem_type == 3 ||
-- 
1.9.1
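
As a standalone sanity check of the new mask expression (not part of the
patch itself), the shift arithmetic indeed yields the documented ranges:

#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* reserved bits below the page frame for a large mapping at the given
 * EPT level: level 2 maps a 2MB page, level 3 maps a 1GB page */
static uint64_t large_page_rsvd(int level)
{
	return (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE;
}

int main(void)
{
	assert(large_page_rsvd(2) == 0x1ff000);	  /* bits 20:12 */
	assert(large_page_rsvd(3) == 0x3ffff000); /* bits 29:12 */
	return 0;
}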



Re: [PATCH v3] KVM: vmx: fix ept reserved bits for 1-GByte page

2014-08-20 Thread Wanpeng Li
On Wed, Aug 20, 2014 at 08:51:38AM +0200, Paolo Bonzini wrote:
On 20/08/2014 05:17, Wanpeng Li wrote:
 +else if (spte & (1ULL << 7))

You have to check level == 1 specifically here, or add...

 +/*
 + * 1GB/2MB page, bits 29:12 or 20:12 reserved respectively,
 + * level == 1 if the hypervisor is using the ignored bit 7.
 + */
 +mask |= (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE;
 +else

... if (level > 1) here.

Otherwise, you're marking bits 6:3 as reserved for 4K pages.  This
should cause a WARN, because KVM puts 0110 in those bits:

ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
| VMX_EPT_IPAT_BIT;

(in vmx_get_mt_mask: writeback memory, ignore PAT memory type from the
guest's page tables)


Got it.

Regards,
Wanpeng Li 

How are you testing this patch?

Paolo

 +/* bits 6:3 reserved */
 +mask |= 0x78;


Re: Nested paging in nested SVM setup

2014-08-20 Thread Valentine Sinitsyn

Hi Paolo,

On 20.08.2014 12:55, Paolo Bonzini wrote:

Is the 0x23c always the same?

No, it's just garbage - I've seen other values as well (0x80 last time).


 Can you try this patch?

Sure. It does print a warning:

[ 2176.722098] [ cut here ]
[ 2176.722118] WARNING: CPU: 0 PID: 1488 at 
/home/val/kvm-kmod/x86/x86.c:368 kvm_multiple_exception+0x121/0x130 [kvm]()
[ 2176.722121] Modules linked in: kvm_amd(O) kvm(O) amd_freq_sensitivity 
snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic 
crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel 
snd_hda_intel aesni_intel snd_hda_controller radeon snd_hda_codec 
ipmi_si aes_x86_64 ipmi_msghandler snd_hwdep ttm r8169 ppdev mii lrw 
gf128mul snd_pcm glue_helper drm_kms_helper snd_timer fam15h_power evdev 
drm shpchp snd ablk_helper cryptd microcode mac_hid soundcore serio_raw 
pcspkr i2c_algo_bit k10temp i2c_piix4 i2c_core parport_pc parport hwmon 
edac_core tpm_tis edac_mce_amd tpm video button acpi_cpufreq processor 
ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common atkbd libps2 
ahci libahci ohci_pci ohci_hcd ehci_pci xhci_hcd libata ehci_hcd usbcore 
scsi_mod usb_common i8042 serio [last unloaded: kvm]


[ 2176.722217] CPU: 0 PID: 1488 Comm: qemu-system-x86 Tainted: G 
W  O  3.16.1-1-ARCH #1
[ 2176.71] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./IMB-A180, BIOS L0.17 05/24/2013
[ 2176.74]   25350f51 8800919fbbc0 
8152ae6c
[ 2176.79]   8800919fbbf8 8106e45d 
880037f68000
[ 2176.722234]  0080 0001 81a4 


[ 2176.722239] Call Trace:
[ 2176.722250]  [8152ae6c] dump_stack+0x4d/0x6f
[ 2176.722257]  [8106e45d] warn_slowpath_common+0x7d/0xa0
[ 2176.722262]  [8106e58a] warn_slowpath_null+0x1a/0x20
[ 2176.722275]  [a0651e41] kvm_multiple_exception+0x121/0x130 
[kvm]
[ 2176.722288]  [a06594f8] x86_emulate_instruction+0x548/0x640 
[kvm]

[ 2176.722303]  [a06653e1] kvm_mmu_page_fault+0x91/0xf0 [kvm]
[ 2176.722310]  [a04eb6a7] pf_interception+0xd7/0x180 [kvm_amd]
[ 2176.722317]  [8104e876] ? native_apic_mem_write+0x6/0x10
[ 2176.722323]  [a04ef261] handle_exit+0x141/0x9d0 [kvm_amd]
[ 2176.722335]  [a065512c] ? kvm_set_cr8+0x1c/0x20 [kvm]
[ 2176.722341]  [a04ea3e0] ? nested_svm_get_tdp_cr3+0x20/0x20 
[kvm_amd]
[ 2176.722355]  [a065adc7] 
kvm_arch_vcpu_ioctl_run+0x597/0x1210 [kvm]

[ 2176.722368]  [a065705b] ? kvm_arch_vcpu_load+0xbb/0x200 [kvm]
[ 2176.722378]  [a064a152] kvm_vcpu_ioctl+0x2b2/0x5c0 [kvm]
[ 2176.722384]  [810b66b4] ? __wake_up+0x44/0x50
[ 2176.722390]  [81200dcc] ? fsnotify+0x28c/0x370
[ 2176.722397]  [811d4a70] do_vfs_ioctl+0x2d0/0x4b0
[ 2176.722403]  [811df18e] ? __fget+0x6e/0xb0
[ 2176.722408]  [811d4cd1] SyS_ioctl+0x81/0xa0
[ 2176.722414]  [81530be9] system_call_fastpath+0x16/0x1b
[ 2176.722418] ---[ end trace b0f81744c5a5ea4a ]---

Thanks,
Valentine


Re: [PATCH v4] KVM: nVMX: nested TPR shadow/threshold emulation

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 08:59, Wanpeng Li wrote:
 
 + /*
 +  * Failing the vm entry is _not_ what the processor does
 +  * but it's basically the only possibility we have.

 * We could still enter the guest if CR8 load exits are 
 * enabled, CR8 store exits are enabled, and virtualize APIC
 * access is disabled; in this case the processor would never
 * use the TPR shadow and we could simply clear the bit from
 * the execution control.  But such a configuration is useless,
 * so let's keep the code simple.

 +  */
 + if (!vmx->nested.virtual_apic_page)
 + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);

I thought so, but I'm afraid it's too late to do nested_vmx_failValid 
here.

Without a test case, I'd be more confident if you moved the 
nested_release_page/nested_get_page to a separate function, that 
nested_vmx_run calls before enter_guest_mode.  The same function can 
map apic_access_page too, for cleanliness.  Something like this:

 if (cpu_has_secondary_exec_ctrls() &&
     nested_cpu_has(vmcs12,
		    CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) &&
     (vmcs12->secondary_vm_exec_control &
      SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
	if (vmx->nested.apic_access_page) /* shouldn't happen */
		nested_release_page(vmx->nested.apic_access_page);
	vmx->nested.apic_access_page =
		nested_get_page(vcpu, vmcs12->apic_access_addr);
 }
if (...) {
 /* do the same for virtual_apic_page if CPU_BASED_TPR_SHADOW
is set... */

/*
 * Failing the vm entry is _not_ what the processor does
 * but it's basically the only possibility we have.
 * We could still enter the guest if CR8 load exits are 
 * enabled, CR8 store exits are enabled, and virtualize APIC
 * access is disabled; in this case the processor would never
 * use the TPR shadow and we could simply clear the bit from
 * the execution control.  But such a configuration is useless,
 * so let's keep the code simple.
 */
if (!vmx->nested.virtual_apic_page)
return -EFAULT;
}

return 0;

...

Then nested_vmx_run can do the nested_vmx_failValid if the function returns
an error.

Paolo


Re: [PATCH 1/9] KVM: add kvm_arch_sched_in

2014-08-20 Thread Christian Borntraeger
On 19/08/14 22:35, Radim Krčmář wrote:

 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, 
 int cpu)
   if (vcpu->preempted)
   vcpu->preempted = false;
 
 + kvm_arch_sched_in(vcpu, cpu);
 +
   kvm_arch_vcpu_load(vcpu, cpu);
  }
 

Why can't you reuse kvm_arch_vcpu_load? It's also called on each sched_in and is 
architecture-specific.

Christian



[Bug 82761] DMAR:[fault reason 06] PTE Read access is not set

2014-08-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=82761

--- Comment #8 from Ansa89 ansalonistef...@gmail.com ---
(In reply to Alex Williamson from comment #6)
 Are these 3 separate NICs plugged into PCI slots on the motherboard or is
 this a single triple-port card with embedded PCIe-to-PCI bridge?

They are 3 separate NICs plugged into 3 separate PCI slots.


 You might be able to run the IOMMU in passthrough mode with iommu=pt
 r8169.use_dac=1, but note the warning in modinfo use_dac:Enable PCI DAC.
 Unsafe on 32 bit PCI slot.  Unfortunately if you don't enable use_dac, then
 intel_iommu will ignore the passthrough option for these devices.

I tried using intel_iommu=pt, but it didn't work (resulted in vt-d disabled).
However with intel_iommu=on iommu=pt the errors remain (probably because I
didn't add r8169.use_dac=1).
I'm on a 64 bit system, but I think it has nothing to do with the 32 bit PCI slot.


 Also note that this problem has nothing to do with Virtualization/KVM. 
 Drivers/Network or perhaps Drivers/PCI would be a more appropriate
 classification.

I searched for an IOMMU section but it doesn't exist.
I will probably change the classification to Drivers/PCI.



(In reply to Alex Williamson from comment #7)
 I'm guessing this might be the motherboard here: MSI ZH77A-G43

Yes, that is my motherboard.


 Since you're apparently trying to use VT-d on this system for KVM and
 therefore presumably device assignment, I'll note that you will never be
 able to successfully assign the conventional PCI devices separately between
 guests or between host and guests.  The IOMMU does not have the granularity
 to create separate IOMMU domains per PCI slot in this topology.  Also, some
 (all?) Realtek NICs have some strange backdoors to PCI configuration space
 that make them poor targets for PCI device assignment:

Yes, I'm trying to do device assignment, but not with those NICs: I want to
pass only the nVidia PCIe VGA card to guest; while all NICs (and the integrated
VGA card) will remain available to host.
It would be nice if there were a way to bypass the IOMMU for these NICs (or
something like that).

SIDE NOTE: in the qemu commit they talk about the RTL8168, but I have real RTL8169
devices (the only RTL8168 device is the integrated NIC, and for that device I'm
using the r8168 driver from Realtek, compiled by hand).



Re: Nested paging in nested SVM setup

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 09:37, Valentine Sinitsyn wrote:
 Hi Paolo,
 
 On 20.08.2014 12:55, Paolo Bonzini wrote:
 Is the 0x23c always the same?
 No, it's just garbage - I've seen other values as well (0x80 last time).
 
  Can you try this patch?
 Sure. It does print a warning:
 
 [ 2176.722098] [ cut here ]
 [ 2176.722118] WARNING: CPU: 0 PID: 1488 at
 /home/val/kvm-kmod/x86/x86.c:368 kvm_multiple_exception+0x121/0x130 [kvm]()
 [ 2176.722121] Modules linked in: kvm_amd(O) kvm(O) amd_freq_sensitivity
 snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic
 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
 snd_hda_intel aesni_intel snd_hda_controller radeon snd_hda_codec
 ipmi_si aes_x86_64 ipmi_msghandler snd_hwdep ttm r8169 ppdev mii lrw
 gf128mul snd_pcm glue_helper drm_kms_helper snd_timer fam15h_power evdev
 drm shpchp snd ablk_helper cryptd microcode mac_hid soundcore serio_raw
 pcspkr i2c_algo_bit k10temp i2c_piix4 i2c_core parport_pc parport hwmon
 edac_core tpm_tis edac_mce_amd tpm video button acpi_cpufreq processor
 ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common atkbd libps2
 ahci libahci ohci_pci ohci_hcd ehci_pci xhci_hcd libata ehci_hcd usbcore
 scsi_mod usb_common i8042 serio [last unloaded: kvm]
 
 [ 2176.722217] CPU: 0 PID: 1488 Comm: qemu-system-x86 Tainted: G W  O 
 3.16.1-1-ARCH #1
 [ 2176.71] Hardware name: To Be Filled By O.E.M. To Be Filled By
 O.E.M./IMB-A180, BIOS L0.17 05/24/2013
 [ 2176.74]   25350f51 8800919fbbc0
 8152ae6c
 [ 2176.79]   8800919fbbf8 8106e45d
 880037f68000
 [ 2176.722234]  0080 0001 81a4
 
 [ 2176.722239] Call Trace:
 [ 2176.722250]  [8152ae6c] dump_stack+0x4d/0x6f
 [ 2176.722257]  [8106e45d] warn_slowpath_common+0x7d/0xa0
 [ 2176.722262]  [8106e58a] warn_slowpath_null+0x1a/0x20
 [ 2176.722275]  [a0651e41] kvm_multiple_exception+0x121/0x130
 [kvm]
 [ 2176.722288]  [a06594f8] x86_emulate_instruction+0x548/0x640
 [kvm]
 [ 2176.722303]  [a06653e1] kvm_mmu_page_fault+0x91/0xf0 [kvm]
 [ 2176.722310]  [a04eb6a7] pf_interception+0xd7/0x180 [kvm_amd]
 [ 2176.722317]  [8104e876] ? native_apic_mem_write+0x6/0x10
 [ 2176.722323]  [a04ef261] handle_exit+0x141/0x9d0 [kvm_amd]
 [ 2176.722335]  [a065512c] ? kvm_set_cr8+0x1c/0x20 [kvm]
 [ 2176.722341]  [a04ea3e0] ? nested_svm_get_tdp_cr3+0x20/0x20
 [kvm_amd]
 [ 2176.722355]  [a065adc7]
 kvm_arch_vcpu_ioctl_run+0x597/0x1210 [kvm]
 [ 2176.722368]  [a065705b] ? kvm_arch_vcpu_load+0xbb/0x200 [kvm]
 [ 2176.722378]  [a064a152] kvm_vcpu_ioctl+0x2b2/0x5c0 [kvm]
 [ 2176.722384]  [810b66b4] ? __wake_up+0x44/0x50
 [ 2176.722390]  [81200dcc] ? fsnotify+0x28c/0x370
 [ 2176.722397]  [811d4a70] do_vfs_ioctl+0x2d0/0x4b0
 [ 2176.722403]  [811df18e] ? __fget+0x6e/0xb0
 [ 2176.722408]  [811d4cd1] SyS_ioctl+0x81/0xa0
 [ 2176.722414]  [81530be9] system_call_fastpath+0x16/0x1b
 [ 2176.722418] ---[ end trace b0f81744c5a5ea4a ]---
 
 Thanks,
 Valentine

I audited the various places that return X86EMUL_PROPAGATE_FAULT and
I think the culprit is this code in paging_tmpl.h.

real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access);
if (real_gpa == UNMAPPED_GVA)
return 0;

It returns zero without setting fault.vector.

Another patch...  I will post parts of it separately. If I am right,
you should get 0xfe as the vector and a WARN from the gva_to_gpa function.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef297919a691..e5bf13003cd2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -527,6 +527,7 @@ static unsigned long seg_base(struct x86_emulate_ctxt 
*ctxt, int seg)
 static int emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
 u32 error, bool valid)
 {
+   WARN_ON(vec > 0x1f);
ctxt->exception.vector = vec;
ctxt->exception.error_code = error;
ctxt->exception.error_code_valid = valid;
@@ -3016,7 +3015,7 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
ctxt->dst.val = swab64(ctxt->src.val);
break;
default:
-   return X86EMUL_PROPAGATE_FAULT;
+   BUG();
}
return X86EMUL_CONTINUE;
 }
@@ -4829,8 +4828,10 @@ writeback:
ctxt->eip = ctxt->_eip;
 
 done:
-   if (rc == X86EMUL_PROPAGATE_FAULT)
+   if (rc == X86EMUL_PROPAGATE_FAULT) {
+   WARN_ON(ctxt->exception.vector > 0x1f);
ctxt->have_exception = true;
+   }
if (rc == X86EMUL_INTERCEPTED)
return 

Re: [PATCH v4] KVM: vmx: fix ept reserved bits for 1-GByte page

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 09:31, Wanpeng Li wrote:
 The EPT misconfig handler in KVM checks which reason led to the EPT
 misconfiguration after a vmexit. One of the reasons is that an EPT
 paging-structure entry is configured with settings reserved for
 future functionality. However, the handler can't identify whether
 reserved bits are set in a paging-structure entry that maps a 1-GByte
 page, since a PDPTE that points to a 1-GByte page reserves bits 29:12,
 instead of bits 7:3 which are reserved for a PDPTE that references an
 EPT page directory. This patch fixes it by treating bits 29:12 as
 reserved for 1-GByte pages.

Thanks, the patch looks good.

Can you describe how you detected the problem and how you're testing for it?

Paolo


Re: [PATCH v4] KVM: vmx: fix ept reserved bits for 1-GByte page

2014-08-20 Thread Wanpeng Li
On Wed, Aug 20, 2014 at 10:13:07AM +0200, Paolo Bonzini wrote:
On 20/08/2014 09:31, Wanpeng Li wrote:
 The EPT misconfig handler in KVM checks which reason led to the EPT
 misconfiguration after a vmexit. One of the reasons is that an EPT
 paging-structure entry is configured with settings reserved for
 future functionality. However, the handler can't identify whether
 reserved bits are set in a paging-structure entry that maps a 1-GByte
 page, since a PDPTE that points to a 1-GByte page reserves bits 29:12,
 instead of bits 7:3 which are reserved for a PDPTE that references an
 EPT page directory. This patch fixes it by treating bits 29:12 as
 reserved for 1-GByte pages.

Thanks, the patch looks good.

Can you describe how you detected the problem and how you're testing for it?


I found the issue by reviewing the code.

Regards,
Wanpeng Li 

Paolo


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 03:03, David Matlack wrote:
 On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong
 xiaoguangr...@linux.vnet.ibm.com wrote:
 On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
 On 19/08/2014 10:50, Xiao Guangrong wrote:
 Okay, what confused me is that it seems the single-line patch
 is OK to you. :)

 No, it was late and I was confused. :)

 Now, do we really need to care about case 2? Like David said:
 Sorry I didn't explain myself very well: Since we can get a single wrong
 mmio exit no matter what, it has to be handled in userspace. So my point
 was, it doesn't really help to fix that one very specific way that it can
 happen, because it can just happen in other ways. (E.g. update memslots
 occurs after is_noslot_pfn() and before mmio exit).

 What's your idea?

 I think if you always treat the low bit as zero in mmio sptes, you can
 do that without losing a bit of the generation.

 What you did is avoid caching an invalid generation number into the spte, but
 actually if we can figure it out when we check the mmio access, it's OK. The
 updated patch I posted should fix it; that way avoids doubly increasing the
 number.

 Yes.

 Okay, if you're interested in increasing the number twice, there is a
 simpler
 one:

 This wastes a bit in the mmio spte though.  My idea is to increase the
 memslots generation twice, but drop the low bit in the mmio spte.

 Yeah, really smart idea. :)

 Paolo/David, would you mind making a patch for this (+ the comments in 
 David's
 patch)?
 
 Paolo, since it was your idea would you like to write it? I don't mind either
 way.

Sure, I'll post the patch for review.

Paolo
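
A rough sketch of the idea being agreed on here (hypothetical names, not
the patch Paolo will post): bump the memslots generation once when an
update starts and once when it is installed, so in-flight accesses see an
odd generation, and mask that low bit off before caching the generation
in an MMIO spte:

#define MMIO_GEN_UPDATE_IN_PROGRESS	1ULL

static u64 mmio_spte_generation(u64 memslots_generation)
{
	/* a generation cached while an update is in progress (odd) can
	 * never equal the final (even) generation, so such a stale
	 * MMIO spte is rejected on the next access */
	return memslots_generation & ~MMIO_GEN_UPDATE_IN_PROGRESS;
}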



Re: [PATCH] vhost: Add polling mode

2014-08-20 Thread Christian Borntraeger
On 10/08/14 10:30, Razya Ladelsky wrote:
 From: Razya Ladelsky ra...@il.ibm.com
 Date: Thu, 31 Jul 2014 09:47:20 +0300
 Subject: [PATCH] vhost: Add polling mode
 
 When vhost is waiting for buffers from the guest driver (e.g., more packets to
 send in vhost-net's transmit queue), it normally goes to sleep and waits for 
 the
 guest to kick it. This kick involves a PIO in the guest, and therefore an 
 exit
 (and possibly userspace involvement in translating this PIO exit into a file
 descriptor event), all of which hurts performance.
 
 If the system is under-utilized (has cpu time to spare), vhost can 
 continuously
 poll the virtqueues for new buffers, and avoid asking the guest to kick us.
 This patch adds an optional polling mode to vhost, that can be enabled via a
 kernel module parameter, poll_start_rate.
 
 When polling is active for a virtqueue, the guest is asked to disable
 notification (kicks), and the worker thread continuously checks for new 
 buffers.
 When it does discover new buffers, it simulates a kick by invoking the
 underlying backend driver (such as vhost-net), which thinks it got a real kick
 from the guest, and acts accordingly. If the underlying driver asks not to be
 kicked, we disable polling on this virtqueue.
 
 We start polling on a virtqueue when we notice it has work to do. Polling on
 this virtqueue is later disabled after 3 seconds of polling turning up no new
 work, as in this case we are better off returning to the exit-based 
 notification
 mechanism. The default timeout of 3 seconds can be changed with the
 poll_stop_idle kernel module parameter.
 
 This polling approach makes a lot of sense for new HW with posted-interrupts for
 which we have exitless host-to-guest notifications. But even with support for
 posted interrupts, guest-to-host communication still causes exits. Polling 
 adds
 the missing part.
 
 When systems are overloaded, there won't be enough cpu time for the various
 vhost threads to poll their guests' devices. For these scenarios, we plan to 
 add
 support for vhost threads that can be shared by multiple devices, even of
 multiple vms.
 Our ultimate goal is to implement the I/O acceleration features described in:
 KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
 https://www.youtube.com/watch?v=9EyweibHfEs
 and
 https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
 
 I ran some experiments with TCP stream netperf and filebench (having 2 threads
 performing random reads) benchmarks on an IBM System x3650 M4.
 I have two machines, A and B. A hosts the vms, B runs the netserver.
 The vms (on A) run netperf, its destination server is running on B.
 All runs loaded the guests in a way that they were (cpu) saturated. For 
 example,
 I ran netperf with 64B messages, which is heavily loading the vm (which is why
 its throughput is low).
 The idea was to get it 100% loaded, so we can see that the polling is getting 
 it
 to produce higher throughput.
 
 The system had two cores per guest, so as to allow for both the vcpu and the 
 vhost
 thread to run concurrently for maximum throughput (but I didn't pin the 
 threads
 to specific cores).
 My experiments were fair in a sense that for both cases, with or without
 polling, I run both threads, vcpu and vhost, on 2 cores (set their affinity 
 that
 way). The only difference was whether polling was enabled/disabled.
 
 Results:
 
 Netperf, 1 vm:
 The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec).
 Number of exits/sec decreased 6x.
 The same improvement was shown when I tested with 3 vms running netperf
 (4086 MB/sec -> 5545 MB/sec).
 
 filebench, 1 vm:
 ops/sec improved by 13% with the polling patch. Number of exits was reduced by
 31%.
 The same experiment with 3 vms running filebench showed similar numbers.
 
 Signed-off-by: Razya Ladelsky ra...@il.ibm.com
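
A rough sketch of the polling cycle described above (hypothetical helper
names, not the actual vhost code from the patch):

struct vhost_virtqueue_poll {
	struct vhost_virtqueue *vq;
	unsigned long jiffies_last_work;
};

/* one iteration of the vhost worker while a virtqueue is being polled */
static void vhost_poll_iteration(struct vhost_virtqueue_poll *vqp)
{
	if (virtqueue_has_new_buffers(vqp->vq)) {
		vqp->jiffies_last_work = jiffies;
		handle_simulated_kick(vqp->vq);	/* as if the guest kicked */
	} else if (time_after(jiffies,
			      vqp->jiffies_last_work + poll_stop_idle)) {
		/* no work for poll_stop_idle jiffies: re-enable guest
		 * notifications and fall back to the kick-based path */
		vhost_poll_stop(vqp);
	}
}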

Gave it a quick try on s390/kvm. As expected, it makes no difference for a big 
streaming workload like iperf.
uperf with a 1-1 round robin indeed got faster by about 30%.
The high CPU consumption is something that bothers me though, as virtualized 
systems tend to be full.


 +static int poll_start_rate = 0;
 +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
 +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue 
 when rate of events is at least this number per jiffy. If 0, never start 
 polling.");
 +
 +static int poll_stop_idle = 3*HZ; /* 3 seconds */
 +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
 +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after 
 this many jiffies of no work.");

This seems ridiculously high. Even one jiffy is an eternity, so setting it to 1 
as a default would reduce the CPU overhead for most cases.
If we don't have a packet in one millisecond, we can surely go back to the kick 
approach, I think.
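
(Side note: since both parameters are declared with S_IWUSR, they can
presumably be tuned at runtime through the usual module-parameter sysfs
files, e.g. echo 1 > /sys/module/vhost/parameters/poll_stop_idle -
assuming the code lands in the vhost module; the exact path is an
assumption based on the standard sysfs layout, not something stated in
the patch.)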

Christian


Re: [PATCH] KVM: x86: Warn on APIC base relocation

2014-08-20 Thread Nadav Amit
CC'ing the KVM mailing list, which I forgot.

On Aug 20, 2014, at 11:12 AM, Nadav Amit na...@cs.technion.ac.il wrote:

 APIC base relocation is unsupported by KVM. If anyone uses it, the least
 we can do is report a warning in the hypervisor. Note that kvm-unit-tests
 performs
 APIC base relocation, and causes the warning to be printed.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
 arch/x86/kvm/lapic.c | 5 +
 1 file changed, 5 insertions(+)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 08e8a89..6655e20 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1416,6 +1416,11 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 
 value)
  apic->base_address = apic->vcpu->arch.apic_base &
MSR_IA32_APICBASE_BASE;
 
 + if ((value & MSR_IA32_APICBASE_ENABLE) &&
 +  apic->base_address != APIC_DEFAULT_PHYS_BASE)
 + printk_once(KERN_WARNING
 + "APIC base relocation is unsupported by KVM\n");
 +
   /* with FSB delivery interrupt, we can restart APIC functionality */
   apic_debug("apic base msr is 0x%016" PRIx64 ", and base address is "
  "0x%lx.\n", apic->vcpu->arch.apic_base, apic->base_address);
 -- 
 1.9.1
 





[PATCH v2 1/2] KVM: Introduce gfn_to_hva_memslot_prot

2014-08-20 Thread Christoffer Dall
To support read-only memory regions on arm and arm64, we need to
resolve a gfn to an hva given a pointer to a memslot, to avoid looping
through the memslots twice and to reuse the hva error checking of
gfn_to_hva_prot(). Add a new gfn_to_hva_memslot_prot() function and
refactor gfn_to_hva_prot() to use it.

Acked-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
Changelog[v2]:
 - Fix typo in patch title

 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c  | 11 +--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..85875e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -528,6 +528,8 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
 unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
+unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
+ bool *writable);
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
 void kvm_set_page_accessed(struct page *page);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..36b887d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1075,9 +1075,9 @@ EXPORT_SYMBOL_GPL(gfn_to_hva);
  * If writable is set to false, the hva returned by this function is only
  * allowed to be read.
  */
-unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable)
+unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot,
+ gfn_t gfn, bool *writable)
 {
-   struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
unsigned long hva = __gfn_to_hva_many(slot, gfn, NULL, false);
 
if (!kvm_is_error_hva(hva) && writable)
@@ -1086,6 +1086,13 @@ unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t 
gfn, bool *writable)
return hva;
 }
 
+unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable)
+{
+   struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+
+   return gfn_to_hva_memslot_prot(slot, gfn, writable);
+}
+
 static int kvm_read_hva(void *data, void __user *hva, int len)
 {
return __copy_from_user(data, hva, len);
-- 
2.0.0
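
For context, the call pattern this enables (roughly what patch 2/2 does
in the arm fault handler; handle_as_mmio() is a hypothetical placeholder)
looks like:

	struct kvm_memory_slot *memslot;
	unsigned long hva;
	bool writable;

	/* resolve the memslot once, then reuse it for the hva lookup */
	memslot = gfn_to_memslot(vcpu->kvm, gfn);
	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
	if (kvm_is_error_hva(hva) || (write_fault && !writable))
		/* not backed by a memslot, or a write to a read-only slot */
		return handle_as_mmio();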



[PATCH v2 2/2] arm/arm64: KVM: Support KVM_CAP_READONLY_MEM

2014-08-20 Thread Christoffer Dall
When userspace loads code and data in read-only memory regions, KVM
needs to be able to handle this on arm and arm64.  Specifically this is
used when running code directly from a read-only flash device; the
common scenario is a UEFI blob loaded with the -bios option in QEMU.

Note that the MMIO exit on writes to read-only memory is ABI and can
be used to emulate block-erase style flash devices.

Acked-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
 arch/arm/include/uapi/asm/kvm.h   |  1 +
 arch/arm/kvm/arm.c|  1 +
 arch/arm/kvm/mmu.c| 15 ---
 arch/arm64/include/uapi/asm/kvm.h |  1 +
 4 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index e6ebdd3..51257fd 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -25,6 +25,7 @@
 
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
+#define __KVM_HAVE_READONLY_MEM
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a99e0cd..3ab3e60 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -188,6 +188,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ONE_REG:
case KVM_CAP_ARM_PSCI:
case KVM_CAP_ARM_PSCI_0_2:
+   case KVM_CAP_READONLY_MEM:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 16e7994..dcbe01e 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -747,14 +747,13 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
phys_addr_t *ipap)
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
- struct kvm_memory_slot *memslot,
+ struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
 {
int ret;
bool write_fault, writable, hugetlb = false, force_pte = false;
unsigned long mmu_seq;
gfn_t gfn = fault_ipa >> PAGE_SHIFT;
-   unsigned long hva = gfn_to_hva(vcpu->kvm, gfn);
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
struct vm_area_struct *vma;
@@ -863,7 +862,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
unsigned long fault_status;
phys_addr_t fault_ipa;
struct kvm_memory_slot *memslot;
-   bool is_iabt;
+   unsigned long hva;
+   bool is_iabt, write_fault, writable;
gfn_t gfn;
int ret, idx;
 
@@ -884,7 +884,10 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
idx = srcu_read_lock(&vcpu->kvm->srcu);
 
gfn = fault_ipa >> PAGE_SHIFT;
-   if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) {
+   memslot = gfn_to_memslot(vcpu->kvm, gfn);
+   hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
+   write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
+   if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
if (is_iabt) {
/* Prefetch Abort on I/O address */
kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
@@ -910,9 +913,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
goto out_unlock;
}
 
-   memslot = gfn_to_memslot(vcpu->kvm, gfn);
-
-   ret = user_mem_abort(vcpu, fault_ipa, memslot, fault_status);
+   ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
if (ret == 0)
ret = 1;
 out_unlock:
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index e633ff8..f4ec5a6 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -37,6 +37,7 @@
 
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
+#define __KVM_HAVE_READONLY_MEM
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
-- 
2.0.0



[Questions]

2014-08-20 Thread Zhangjie (HZ)
Hi MST,

I see "reduce networking latency"
in the Networking Todo list; the idea is to allow handling short packets from softirq 
or VCPU context.
If from softirq context, how could the softirq copy an skb to guest memory? If the 
method is to use the mm_struct of QEMU,
would that be expensive?
If from VCPU context, maybe the internal operation of the virtual machine would 
see a significant delay.

Thanks!
-- 
Best Wishes!
Zhang Jie



[PATCH v5 2/2] KVM: nVMX: introduce apic_access_and_virtual_page_valid

2014-08-20 Thread Wanpeng Li
Introduce apic_access_and_virtual_page_valid() to check the validity 
of the nested APIC access page and virtual APIC page earlier.

Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
 arch/x86/kvm/vmx.c | 82 ++
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index caf239d..02bc07d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7838,6 +7838,50 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu 
*vcpu,
kvm_inject_page_fault(vcpu, fault);
 }
 
+static bool apic_access_and_virtual_page_valid(struct kvm_vcpu *vcpu,
+   struct vmcs12 *vmcs12)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
+   if (!PAGE_ALIGNED(vmcs12->apic_access_addr))
+   /*TODO: Also verify bits beyond physical address width 
are 0*/
+   return false;
+
+   /*
+* Translate L1 physical address to host physical
+* address for vmcs02. Keep the page pinned, so this
+* physical address remains valid. We keep a reference
+* to it so we can release it later.
+*/
+   if (vmx->nested.apic_access_page) /* shouldn't happen */
+   nested_release_page(vmx->nested.apic_access_page);
+   vmx->nested.apic_access_page =
+   nested_get_page(vcpu, vmcs12->apic_access_addr);
+   }
+
+   if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
+   if (vmx->nested.virtual_apic_page) /* shouldn't happen */
+   nested_release_page(vmx->nested.virtual_apic_page);
+   vmx->nested.virtual_apic_page =
+   nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
+
+   /*
+* Failing the vm entry is _not_ what the processor does
+* but it's basically the only possibility we have.
+* We could still enter the guest if CR8 load exits are
+* enabled, CR8 store exits are enabled, and virtualize APIC
+* access is disabled; in this case the processor would never
+* use the TPR shadow and we could simply clear the bit from
+* the execution control.  But such a configuration is useless,
+* so let's keep the code simple.
+*/
+   if (!vmx->nested.virtual_apic_page)
+   return false;
+   }
+   return true;
+}
+
 static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu)
 {
u64 preemption_timeout = get_vmcs12(vcpu)->vmx_preemption_timer_value;
@@ -7984,16 +8028,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
 
if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
/*
-* Translate L1 physical address to host physical
-* address for vmcs02. Keep the page pinned, so this
-* physical address remains valid. We keep a reference
-* to it so we can release it later.
-*/
-   if (vmx->nested.apic_access_page) /* shouldn't happen */
-   nested_release_page(vmx->nested.apic_access_page);
-   vmx->nested.apic_access_page =
-   nested_get_page(vcpu, vmcs12->apic_access_addr);
-   /*
 * If translation failed, no matter: This feature asks
 * to exit when accessing the given address, and if it
 * can never be accessed, this feature won't do
@@ -8040,30 +8074,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
exec_control |= vmcs12->cpu_based_vm_exec_control;
 
if (exec_control & CPU_BASED_TPR_SHADOW) {
-   if (vmx->nested.virtual_apic_page)
-   nested_release_page(vmx->nested.virtual_apic_page);
-   vmx->nested.virtual_apic_page =
-  nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
-   if (!vmx->nested.virtual_apic_page)
-   exec_control &=
-   ~CPU_BASED_TPR_SHADOW;
-   else
-   vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+   vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
page_to_phys(vmx->nested.virtual_apic_page));
-
-   /*
-* Failing the vm entry is _not_ what the processor does
-* but it's basically the only possibility we have.
-* We could still enter the guest if CR8 load exits are
-* enabled, CR8 

[PATCH v5 1/2] KVM: nVMX: nested TPR shadow/threshold emulation

2014-08-20 Thread Wanpeng Li
This patch fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=61411

The TPR shadow/threshold feature is important for speeding up Windows guests.
Besides, it is a must-have feature for certain VMMs.

We map the virtual APIC page address and TPR threshold from the L1 VMCS. If
a TPR_BELOW_THRESHOLD VM exit is triggered by the L2 guest and L1 is
interested in it, we inject it into the L1 VMM for handling.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
v4 - v5:
 * moving the nested_vmx_failValid call inside the if 
(!vmx-nested.virtual_apic_page)
v3 - v4:
 * add Paolo's Reviewed-by
 * unconditionally fail the vmentry, with a comment
 * setup the TPR_SHADOW/virtual_apic_page of vmcs02 based on vmcs01 if L2 owns 
the APIC
v2 - v3:
 * nested vm entry failure if both tpr shadow and cr8 exiting bits are not set
v1 - v2:
 * don't take L0's virtualize APIC accesses setting into account
 * virtual_apic_page do exactly the same thing that is done for apic_access_page
 * add the tpr threshold field to the read-write fields for shadow VMCS

 arch/x86/kvm/vmx.c | 51 +--
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 286c283..caf239d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,7 @@ struct nested_vmx {
 * we must keep them pinned while L2 runs.
 */
struct page *apic_access_page;
+   struct page *virtual_apic_page;
u64 msr_ia32_feature_control;
 
struct hrtimer preemption_timer;
@@ -533,6 +534,7 @@ static int max_shadow_read_only_fields =
ARRAY_SIZE(shadow_read_only_fields);
 
 static unsigned long shadow_read_write_fields[] = {
+   TPR_THRESHOLD,
GUEST_RIP,
GUEST_RSP,
GUEST_CR0,
@@ -2330,7 +2332,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
-   CPU_BASED_PAUSE_EXITING |
+   CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW |
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
/*
 * We can allow some features even when not supported by the
@@ -6150,6 +6152,10 @@ static void free_nested(struct vcpu_vmx *vmx)
nested_release_page(vmx->nested.apic_access_page);
vmx->nested.apic_access_page = 0;
}
+   if (vmx->nested.virtual_apic_page) {
+   nested_release_page(vmx->nested.virtual_apic_page);
+   vmx->nested.virtual_apic_page = 0;
+   }
 
nested_free_all_saved_vmcss(vmx);
 }
@@ -6938,7 +6944,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_MCE_DURING_VMENTRY:
return 0;
case EXIT_REASON_TPR_BELOW_THRESHOLD:
-   return 1;
+   return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW);
case EXIT_REASON_APIC_ACCESS:
return nested_cpu_has2(vmcs12,
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
@@ -7059,6 +7065,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
+   struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+
+   if (is_guest_mode(vcpu) &&
+   nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
+   return;
+
if (irr == -1 || tpr < irr) {
vmcs_write32(TPR_THRESHOLD, 0);
return;
@@ -8026,6 +8038,37 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
exec_control &= ~CPU_BASED_TPR_SHADOW;
exec_control |= vmcs12->cpu_based_vm_exec_control;
+
+   if (exec_control & CPU_BASED_TPR_SHADOW) {
+   if (vmx->nested.virtual_apic_page)
+   nested_release_page(vmx->nested.virtual_apic_page);
+   vmx->nested.virtual_apic_page =
+  nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
+   if (!vmx->nested.virtual_apic_page)
+   exec_control &=
+   ~CPU_BASED_TPR_SHADOW;
+   else
+   vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+   page_to_phys(vmx->nested.virtual_apic_page));
+
+   /*
+* Failing the vm entry is _not_ what the processor does
+* but it's basically the only possibility we have.
+* We could still enter the guest if CR8 load exits are
+* enabled, CR8 store exits are enabled, and virtualize APIC
+* access is disabled; in this case the processor would never
+* use the TPR shadow and we could simply clear the bit from
+* the 

Re: Nested paging in nested SVM setup

2014-08-20 Thread Valentine Sinitsyn

On 20.08.2014 14:11, Paolo Bonzini wrote:

> Another patch...  I will post parts of it separately, if I am right
> you should get 0xfe as the vector and a WARN from the gva_to_gpa function.
I confirm the vector is 0xfe, however I see no warnings from 
gva_to_gpa() - only from emulate_exception():



[ 3417.251967] [ cut here ]
[ 3417.251983] WARNING: CPU: 1 PID: 1584 at 
/home/val/kvm-kmod/x86/emulate.c:4839 x86_emulate_insn+0xb33/0xb70 [kvm]()


I can see both warnings, if I move 'WARN(walker.fault.vector > 0x1f)' 
from gva_to_gpa() to gva_to_gpa_nested(), however:



[ 3841.420019] WARNING: CPU: 0 PID: 1945 at 
/home/val/kvm-kmod/x86/paging_tmpl.h:903 paging64_gva_to_gpa_nested+0xd1/0xe0 
[kvm]()
[ 3841.420457] WARNING: CPU: 0 PID: 1945 at 
/home/val/kvm-kmod/x86/emulate.c:4839 x86_emulate_insn+0xb33/0xb70 [kvm]()


Thanks,
Valentine


Re: [Questions]

2014-08-20 Thread Michael S. Tsirkin
On Wed, Aug 20, 2014 at 05:37:01PM +0800, Zhangjie (HZ) wrote:
 Hi MST,
 
 I see "reduce networking latency"
 in the Networking Todo; the idea is to allow handling short packets from 
 softirq or VCPU context.
 If from softirq context, how could the softirq copy an skb to guest memory? If 
 the method is to use the mm_struct of QEMU,
 would it be expensive?

I have some very rough patches to explain this part of the idea.
Will dig them out for you.

 If from VCPU context, maybe the internal operations of the virtual machine 
 will see a significant delay.

We'd have to find a good heuristic here.
Maybe for a small number of very short packets the delay won't be significant.

 Thanks!
 -- 
 Best Wishes!
 Zhang Jie
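
For reference, vhost's worker thread already does the mm-borrowing part of
this today: it adopts the device owner's mm with use_mm() and then copies
into guest memory through the ordinary user-space accessors. A minimal
sketch of that pattern (illustrative only, not from the rough patches
mentioned above; the helper name is made up):

#include <linux/mmu_context.h>
#include <linux/uaccess.h>

/* Hypothetical helper: must run in a kernel thread (like the vhost
 * worker), not in softirq context -- use_mm() switches the thread's
 * address space to the owner's (QEMU's) and cannot be used in atomic
 * context, which is exactly why the softirq case is an open question.
 */
static int copy_to_guest(struct mm_struct *mm, void __user *dst,
			 const void *src, size_t len)
{
	int ret = 0;

	use_mm(mm);			/* adopt QEMU's address space */
	if (copy_to_user(dst, src, len))
		ret = -EFAULT;		/* guest page not mapped/writable */
	unuse_mm(mm);			/* restore the kernel thread's mm */
	return ret;
}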


[PATCH 1/2] KVM: x86: Clarify PMU related features bit manipulation

2014-08-20 Thread Nadav Amit
kvm_pmu_cpuid_update performs a lot of bit-manipulation operations, when in fact
there are already unions that can be used instead. Change the bit
manipulation to use the unions, for clarity. This patch does not change the
functionality.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/pmu.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3dd6acc..8e6b7d8 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -15,6 +15,7 @@
 #include linux/types.h
 #include linux/kvm_host.h
 #include linux/perf_event.h
+#include asm/perf_event.h
 #include x86.h
 #include cpuid.h
 #include lapic.h
@@ -463,7 +464,8 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
 {
struct kvm_pmu *pmu = &vcpu->arch.pmu;
struct kvm_cpuid_entry2 *entry;
-   unsigned bitmap_len;
+   union cpuid10_eax eax;
+   union cpuid10_edx edx;
 
pmu->nr_arch_gp_counters = 0;
pmu->nr_arch_fixed_counters = 0;
@@ -475,25 +477,27 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry)
return;
+   eax.full = entry->eax;
+   edx.full = entry->edx;
 
-   pmu->version = entry->eax & 0xff;
+   pmu->version = eax.split.version_id;
if (!pmu->version)
return;
 
-   pmu->nr_arch_gp_counters = min((int)(entry->eax >> 8) & 0xff,
-   INTEL_PMC_MAX_GENERIC);
-   pmu->counter_bitmask[KVM_PMC_GP] =
-   ((u64)1 << ((entry->eax >> 16) & 0xff)) - 1;
-   bitmap_len = (entry->eax >> 24) & 0xff;
-   pmu->available_event_types = ~entry->ebx & ((1ull << bitmap_len) - 1);
+   pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
+   INTEL_PMC_MAX_GENERIC);
+   pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
+   pmu->available_event_types = ~entry->ebx &
+   ((1ull << eax.split.mask_length) - 1);
 
if (pmu->version == 1) {
pmu->nr_arch_fixed_counters = 0;
} else {
-   pmu->nr_arch_fixed_counters = min((int)(entry->edx & 0x1f),
+   pmu->nr_arch_fixed_counters =
+   min_t(int, edx.split.num_counters_fixed,
INTEL_PMC_MAX_FIXED);
pmu->counter_bitmask[KVM_PMC_FIXED] =
-   ((u64)1 << ((entry->edx >> 5) & 0xff)) - 1;
+   ((u64)1 << edx.split.bit_width_fixed) - 1;
}
 
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
-- 
1.9.1
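
For reference, the unions decode the CPUID.0AH output field by field; a
worked example with an invented leaf value (a sketch, assuming only
<asm/perf_event.h>):

#include <asm/perf_event.h>

static void decode_leaf_0xa_example(void)
{
	union cpuid10_eax eax;

	eax.full = 0x07300403;	/* hypothetical CPUID.0AH:EAX value */

	/* eax.split.version_id   == 0x03 (bits  7:0)  - PMU version 3   */
	/* eax.split.num_counters == 0x04 (bits 15:8)  - 4 GP counters   */
	/* eax.split.bit_width    == 0x30 (bits 23:16) - 48-bit counters */
	/* eax.split.mask_length  == 0x07 (bits 31:24) - 7 event flags   */
}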



[PATCH 2/2] KVM: x86: pmu: Enabling PMU v3

2014-08-20 Thread Nadav Amit
Currently the guest PMU version number 3 is not supported (versions up to 2 are
reported).  The features PMU v3 presents are: AnyThread, extended reporting of
capabilities in CPUID leaf 0AH, and a variable number of performance counters.
While most of the support is already present, the version reported is still 2,
since dealing with AnyThread is complicated. Nonetheless, OSes may assume that
features other than AnyThread are not supported, since the reported version is 2.

This patch checks if the guest vCPU uses SMT. If not, it reports PMU v3.  When
PMU v3 is used, the AnyThread bit is ignored, but does not trigger faults.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c|  2 +-
 arch/x86/kvm/pmu.c  | 11 +--
 arch/x86/kvm/svm.c  | 15 +++
 arch/x86/kvm/vmx.c  | 16 
 5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4bda61b..8c8401b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -327,6 +327,7 @@ struct kvm_pmu {
u64 counter_bitmask[2];
u64 global_ctrl_mask;
u64 reserved_bits;
+   u64 fixed_ctrl_reserved_bits;
u8 version;
struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..0d7b729 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -406,7 +406,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
if (!cap.version)
memset(cap, 0, sizeof(cap));
 
-   eax.split.version_id = min(cap.version, 2);
+   eax.split.version_id = min(cap.version, 3);
eax.split.num_counters = cap.num_counters_gp;
eax.split.bit_width = cap.bit_width_gp;
eax.split.mask_length = cap.events_mask_len;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 8e6b7d8..2ad7101 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -383,7 +383,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data 
*msr_info)
case MSR_CORE_PERF_FIXED_CTR_CTRL:
if (pmu->fixed_ctr_ctrl == data)
return 0;
-   if (!(data & 0xfffffffffffff444ull)) {
+   if (!(data & pmu->fixed_ctrl_reserved_bits)) {
reprogram_fixed_counters(pmu, data);
return 0;
}
@@ -472,7 +472,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
pmu->counter_bitmask[KVM_PMC_GP] = 0;
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
pmu->version = 0;
-   pmu->reserved_bits = 0xffffffff00200000ull;
+   pmu->reserved_bits = 0xffffffff00000000ull;
 
entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
if (!entry)
@@ -504,6 +504,13 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
(((1ull << pmu->nr_arch_fixed_counters) - 1) <<
INTEL_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
 
+   pmu->fixed_ctrl_reserved_bits =
+   ~((1ull << pmu->nr_arch_fixed_counters * 4) - 1);
+   if (pmu->version == 2) {
+   /* No support for anythread */
+   pmu->reserved_bits |= 0x200000;
+   pmu->fixed_ctrl_reserved_bits |= 0x4444444444444444ull;
+   }
entry = kvm_find_cpuid_entry(vcpu, 7, 0);
if (entry &&
(boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f49c86..963a9c0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4057,6 +4057,21 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t 
gfn, bool is_mmio)
 
 static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 {
+   struct kvm_cpuid_entry2 *best;
+
+   /* If SMT, then PMU v3 is unsupported because of the anythread bit */
+   best = kvm_find_cpuid_entry(vcpu, 0x8000001e, 0);
+   if (best && ((best->ebx >> 8) & 3) > 0) {
+   best = kvm_find_cpuid_entry(vcpu, 0xa, 0);
+   if (best) {
+   union cpuid10_eax eax;
+
+   eax.full = best->eax;
+   eax.split.version_id =
+   min_t(int, eax.split.version_id, 2);
+   best->eax = eax.full;
+   }
+   }
 }
 
 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cad37d5..437b131 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7726,6 +7726,7 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *best;
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exec_control;
+   bool smt = false;
 

[PATCH 0/2] KVM: x86: Enabling PMU v3 on non-SMT VMs

2014-08-20 Thread Nadav Amit
This patch-set enables PMU v3 on non-SMT VMs. All the PMU v3 features are
already in KVM except the AnyThread support.  However, AnyThread is only
important on SMT machines, and can be ignored otherwise. Reporting PMU v3 can
be useful for OSes that rely on the version, and not on other CPUID fields.

Thanks for reviewing the code. Note that it was not tested on an AMD machine.

Nadav Amit (2):
  KVM: x86: Clarify PMU related features bit manipulation
  KVM: x86: pmu: Enabling PMU v3

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c|  2 +-
 arch/x86/kvm/pmu.c  | 35 +++
 arch/x86/kvm/svm.c  | 15 +++
 arch/x86/kvm/vmx.c  | 16 
 5 files changed, 56 insertions(+), 13 deletions(-)

-- 
1.9.1



Re: [PATCH] vhost: Add polling mode

2014-08-20 Thread Michael S. Tsirkin
On Wed, Aug 20, 2014 at 10:41:32AM +0200, Christian Borntraeger wrote:
 On 10/08/14 10:30, Razya Ladelsky wrote:
  From: Razya Ladelsky ra...@il.ibm.com
  Date: Thu, 31 Jul 2014 09:47:20 +0300
  Subject: [PATCH] vhost: Add polling mode
  
  When vhost is waiting for buffers from the guest driver (e.g., more packets 
  to
  send in vhost-net's transmit queue), it normally goes to sleep and waits 
  for the
  guest to kick it. This kick involves a PIO in the guest, and therefore an 
  exit
  (and possibly userspace involvement in translating this PIO exit into a file
  descriptor event), all of which hurts performance.
  
  If the system is under-utilized (has cpu time to spare), vhost can 
  continuously
  poll the virtqueues for new buffers, and avoid asking the guest to kick us.
  This patch adds an optional polling mode to vhost, that can be enabled via a
  kernel module parameter, poll_start_rate.
  
  When polling is active for a virtqueue, the guest is asked to disable
  notification (kicks), and the worker thread continuously checks for new 
  buffers.
  When it does discover new buffers, it simulates a kick by invoking the
  underlying backend driver (such as vhost-net), which thinks it got a real 
  kick
  from the guest, and acts accordingly. If the underlying driver asks not to 
  be
  kicked, we disable polling on this virtqueue.
  
  We start polling on a virtqueue when we notice it has work to do. Polling on
  this virtqueue is later disabled after 3 seconds of polling turning up no 
  new
  work, as in this case we are better off returning to the exit-based 
  notification
  mechanism. The default timeout of 3 seconds can be changed with the
  poll_stop_idle kernel module parameter.
  
  This polling approach makes lot of sense for new HW with posted-interrupts 
  for
  which we have exitless host-to-guest notifications. But even with support 
  for
  posted interrupts, guest-to-host communication still causes exits. Polling 
  adds
  the missing part.
  
  When systems are overloaded, there won't be enough cpu time for the various
  vhost threads to poll their guests' devices. For these scenarios, we plan 
  to add
  support for vhost threads that can be shared by multiple devices, even of
  multiple vms.
  Our ultimate goal is to implement the I/O acceleration features described 
  in:
  KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
  https://www.youtube.com/watch?v=9EyweibHfEs
  and
  https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
  
  I ran some experiments with TCP stream netperf and filebench (having 2 
  threads
  performing random reads) benchmarks on an IBM System x3650 M4.
  I have two machines, A and B. A hosts the vms, B runs the netserver.
  The vms (on A) run netperf, its destination server is running on B.
  All runs loaded the guests in a way that they were (cpu) saturated. For 
  example,
  I ran netperf with 64B messages, which is heavily loading the vm (which is 
  why
  its throughput is low).
  The idea was to get it 100% loaded, so we can see that the polling is 
  getting it
  to produce higher throughput.
  
  The system had two cores per guest, as to allow for both the vcpu and the 
  vhost
  thread to run concurrently for maximum throughput (but I didn't pin the 
  threads
  to specific cores).
  My experiments were fair in a sense that for both cases, with or without
  polling, I run both threads, vcpu and vhost, on 2 cores (set their affinity 
  that
  way). The only difference was whether polling was enabled/disabled.
  
  Results:
  
  Netperf, 1 vm:
  The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec).
  Number of exits/sec decreased 6x.
  The same improvement was shown when I tested with 3 vms running netperf
  (4086 MB/sec -> 5545 MB/sec).
  
  filebench, 1 vm:
  ops/sec improved by 13% with the polling patch. Number of exits was reduced 
  by
  31%.
  The same experiment with 3 vms running filebench showed similar numbers.
  
  Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 
 Gave it a quick try on s390/kvm. As expected it makes no difference for big 
 streaming workload like iperf.
 uperf with a 1-1 round robin got indeed faster by about 30%.
 The high CPU consumption is something that bothers me though, as virtualized 
 systems tend to be full.
 
 
  +static int poll_start_rate = 0;
  +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
  +MODULE_PARM_DESC(poll_start_rate, Start continuous polling of virtqueue 
  when rate of events is at least this number per jiffy. If 0, never start 
  polling.);
  +
  +static int poll_stop_idle = 3*HZ; /* 3 seconds */
  +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
  +MODULE_PARM_DESC(poll_stop_idle, Stop continuous polling of virtqueue 
  after this many jiffies of no work.);
 
 This seems ridiculously high. Even one jiffy is an eternity, so setting it to 
 1 as a default would reduce the CPU overhead for most cases.
 If we don't have a packet in one 

Re: [PATCH v5 2/2] KVM: nVMX: introduce apic_access_and_virtual_page_valid

2014-08-20 Thread Paolo Bonzini
Il 20/08/2014 11:45, Wanpeng Li ha scritto:
 Introduce apic_access_and_virtual_page_valid() to check the validity 
 of the nested apic access page and virtual apic page earlier.
 
 Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
 ---
  arch/x86/kvm/vmx.c | 82 
 ++
  1 file changed, 46 insertions(+), 36 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index caf239d..02bc07d 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -7838,6 +7838,50 @@ static void vmx_inject_page_fault_nested(struct 
 kvm_vcpu *vcpu,
   kvm_inject_page_fault(vcpu, fault);
  }
  
 +static bool apic_access_and_virtual_page_valid(struct kvm_vcpu *vcpu,
 + struct vmcs12 *vmcs12)
 +{
 + struct vcpu_vmx *vmx = to_vmx(vcpu);
 +
 + if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
 + if (!PAGE_ALIGNED(vmcs12->apic_access_addr))
 + /*TODO: Also verify bits beyond physical address width 
 are 0*/
 + return false;
 +
 + /*
 +  * Translate L1 physical address to host physical
 +  * address for vmcs02. Keep the page pinned, so this
 +  * physical address remains valid. We keep a reference
 +  * to it so we can release it later.
 +  */
 + if (vmx->nested.apic_access_page) /* shouldn't happen */
 + nested_release_page(vmx->nested.apic_access_page);
 + vmx->nested.apic_access_page =
 + nested_get_page(vcpu, vmcs12->apic_access_addr);
 + }
 +
 + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
 + if (vmx->nested.virtual_apic_page) /* shouldn't happen */
 + nested_release_page(vmx->nested.virtual_apic_page);
 + vmx->nested.virtual_apic_page =
 + nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
 +
 + /*
 +  * Failing the vm entry is _not_ what the processor does
 +  * but it's basically the only possibility we have.
 +  * We could still enter the guest if CR8 load exits are
 +  * enabled, CR8 store exits are enabled, and virtualize APIC
 +  * access is disabled; in this case the processor would never
 +  * use the TPR shadow and we could simply clear the bit from
 +  * the execution control.  But such a configuration is useless,
 +  * so let's keep the code simple.
 +  */
 + if (!vmx->nested.virtual_apic_page)
 + return false;
 + }
 + return true;
 +}
 +
  static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu)
  {
   u64 preemption_timeout = get_vmcs12(vcpu)->vmx_preemption_timer_value;
 @@ -7984,16 +8028,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
  
   if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
   /*
 -  * Translate L1 physical address to host physical
 -  * address for vmcs02. Keep the page pinned, so this
 -  * physical address remains valid. We keep a reference
 -  * to it so we can release it later.
 -  */
 - if (vmx->nested.apic_access_page) /* shouldn't happen */
 - nested_release_page(vmx->nested.apic_access_page);
 - vmx->nested.apic_access_page =
 - nested_get_page(vcpu, vmcs12->apic_access_addr);
 - /*
* If translation failed, no matter: This feature asks
* to exit when accessing the given address, and if it
* can never be accessed, this feature won't do
 @@ -8040,30 +8074,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
 struct vmcs12 *vmcs12)
   exec_control |= vmcs12->cpu_based_vm_exec_control;
  
   if (exec_control & CPU_BASED_TPR_SHADOW) {
 - if (vmx->nested.virtual_apic_page)
 - nested_release_page(vmx->nested.virtual_apic_page);
 - vmx->nested.virtual_apic_page =
 -nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
 - if (!vmx->nested.virtual_apic_page)
 - exec_control &=
 - ~CPU_BASED_TPR_SHADOW;
 - else
 - vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
 + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
   page_to_phys(vmx->nested.virtual_apic_page));
 -
 - /*
 -  * Failing the vm entry is _not_ what the processor does
 -  * but it's basically the only possibility we have.
 -  * We could still enter the guest if CR8 load exits are
 -  * 

Re: [PATCH] vhost: Add polling mode

2014-08-20 Thread Michael S. Tsirkin
On Sun, Aug 10, 2014 at 11:30:35AM +0300, Razya Ladelsky wrote:
 From: Razya Ladelsky ra...@il.ibm.com
 Date: Thu, 31 Jul 2014 09:47:20 +0300
 Subject: [PATCH] vhost: Add polling mode
 
 When vhost is waiting for buffers from the guest driver (e.g., more packets to
 send in vhost-net's transmit queue), it normally goes to sleep and waits for 
 the
 guest to kick it. This kick involves a PIO in the guest, and therefore an 
 exit
 (and possibly userspace involvement in translating this PIO exit into a file
 descriptor event), all of which hurts performance.
 
 If the system is under-utilized (has cpu time to spare), vhost can 
 continuously
 poll the virtqueues for new buffers, and avoid asking the guest to kick us.
 This patch adds an optional polling mode to vhost, that can be enabled via a
 kernel module parameter, poll_start_rate.
 
 When polling is active for a virtqueue, the guest is asked to disable
 notification (kicks), and the worker thread continuously checks for new 
 buffers.
 When it does discover new buffers, it simulates a kick by invoking the
 underlying backend driver (such as vhost-net), which thinks it got a real kick
 from the guest, and acts accordingly. If the underlying driver asks not to be
 kicked, we disable polling on this virtqueue.
 
 We start polling on a virtqueue when we notice it has work to do. Polling on
 this virtqueue is later disabled after 3 seconds of polling turning up no new
 work, as in this case we are better off returning to the exit-based 
 notification
 mechanism. The default timeout of 3 seconds can be changed with the
 poll_stop_idle kernel module parameter.
 
 This polling approach makes lot of sense for new HW with posted-interrupts for
 which we have exitless host-to-guest notifications. But even with support for
 posted interrupts, guest-to-host communication still causes exits. Polling 
 adds
 the missing part.
 
 When systems are overloaded, there won't be enough cpu time for the various
 vhost threads to poll their guests' devices. For these scenarios, we plan to 
 add
 support for vhost threads that can be shared by multiple devices, even of
 multiple vms.
 Our ultimate goal is to implement the I/O acceleration features described in:
 KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
 https://www.youtube.com/watch?v=9EyweibHfEs
 and
 https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
 
 I ran some experiments with TCP stream netperf and filebench (having 2 threads
 performing random reads) benchmarks on an IBM System x3650 M4.
 I have two machines, A and B. A hosts the vms, B runs the netserver.
 The vms (on A) run netperf, its destination server is running on B.
 All runs loaded the guests in a way that they were (cpu) saturated. For 
 example,
 I ran netperf with 64B messages, which is heavily loading the vm (which is why
 its throughput is low).
 The idea was to get it 100% loaded, so we can see that the polling is getting 
 it
 to produce higher throughput.
 
 The system had two cores per guest, as to allow for both the vcpu and the 
 vhost
 thread to run concurrently for maximum throughput (but I didn't pin the 
 threads
 to specific cores).
 My experiments were fair in a sense that for both cases, with or without
 polling, I run both threads, vcpu and vhost, on 2 cores (set their affinity 
 that
 way). The only difference was whether polling was enabled/disabled.
 
 Results:
 
 Netperf, 1 vm:
 The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec).
 Number of exits/sec decreased 6x.
 The same improvement was shown when I tested with 3 vms running netperf
 (4086 MB/sec -> 5545 MB/sec).
 
 filebench, 1 vm:
 ops/sec improved by 13% with the polling patch. Number of exits was reduced by
 31%.
 The same experiment with 3 vms running filebench showed similar numbers.
 
 Signed-off-by: Razya Ladelsky ra...@il.ibm.com

This really needs a more thorough benchmarking report, including
system data.  One good example for a related patch:
http://lwn.net/Articles/551179/
though for virtualization, we need data about the host as well, and if you
want to look at streaming benchmarks, you need to test different message
sizes and measure packet size.

For now, commenting on the patches assuming that will be forthcoming.

 ---
  drivers/vhost/net.c   |6 +-
  drivers/vhost/scsi.c  |6 +-
  drivers/vhost/vhost.c |  245 
 +++--
  drivers/vhost/vhost.h |   38 +++-
  4 files changed, 277 insertions(+), 18 deletions(-)
 
 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 971a760..558aecb 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct 
 file *f)
   }
   vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
  
 - vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
 - vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
 + 

Re: [PATCH 0/2] KVM: x86: Enabling PMU v3 on non-SMT VMs

2014-08-20 Thread Paolo Bonzini
Il 20/08/2014 12:25, Nadav Amit ha scritto:
 This patch-set enables PMU v3 on non-SMT VMs. All the PMU v3 features are
 already in KVM except the AnyThread support.  However, AnyThread is only
 important on SMT machines, and can be ignored otherwise. Reporting PMU v3 can
 be useful for OSes that rely on the version, and not on other CPUID fields.
 
 Thanks for reviewing the code. Note that it was not tested on AMD machine.
 
 Nadav Amit (2):
   KVM: x86: Clarify PMU related features bit manipulation
   KVM: x86: pmu: Enabling PMU v3
 
  arch/x86/include/asm/kvm_host.h |  1 +
  arch/x86/kvm/cpuid.c|  2 +-
  arch/x86/kvm/pmu.c  | 35 +++
  arch/x86/kvm/svm.c  | 15 +++
  arch/x86/kvm/vmx.c  | 16 
  5 files changed, 56 insertions(+), 13 deletions(-)
 

For now I've reviewed patch 1 and will apply that to kvm/queue.

Paolo


Re: [PATCH] vhost: Add polling mode

2014-08-20 Thread Michael S. Tsirkin
On Tue, Aug 19, 2014 at 11:36:31AM +0300, Razya Ladelsky wrote:
  That was just one example. There many other possibilities.  Either
  actually make the systems load all host CPUs equally, or divide
  throughput by host CPU.
  
 
 The polling patch adds this capability to vhost, reducing costly exit 
 overhead when the vm is loaded.
 
 In order to load the vm I ran netperf  with msg size of 256:
 
 Without polling:  2480 Mbits/sec,  utilization: vm - 100%   vhost - 64% 
 With Polling: 4160 Mbits/sec,  utilization: vm - 100%   vhost - 100% 
 
 Therefore, throughput/cpu without polling is 15.1, and 20.8 with polling.
 

Can you please present results in a form that makes
it possible to see the effect on various configurations
and workloads?

Here's one example where this was done:
https://lkml.org/lkml/2014/8/14/495

You really should also provide data about your host
configuration (missing in the above link).

 My intention was to load vhost as close as possible to 100% utilization 
 without polling, in order to compare it to the polling utilization case 
 (where vhost is always 100%). 
 The best use case, of course, would be when the shared vhost thread work 
 (TBD) is integrated and then vhost will actually be using its polling 
 cycles to handle requests of multiple devices (even from multiple vms).
 
 Thanks,
 Razya


-- 
MST


[PATCH] KVM: emulate: warn on invalid or uninitialized exception numbers

2014-08-20 Thread Paolo Bonzini
These were reported when running Jailhouse on AMD processors.

Initialize ctxt->exception.vector with an invalid exception number,
and warn if it remained invalid even though the emulator got
an X86EMUL_PROPAGATE_FAULT return code.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 5 -
 arch/x86/kvm/x86.c | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4fbf4b598f92..e5bf13003cd2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -527,6 +527,7 @@ static unsigned long seg_base(struct x86_emulate_ctxt 
*ctxt, int seg)
 static int emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
 u32 error, bool valid)
 {
+   WARN_ON(vec > 0x1f);
ctxt->exception.vector = vec;
ctxt->exception.error_code = error;
ctxt->exception.error_code_valid = valid;
@@ -4827,8 +4828,10 @@ writeback:
ctxt-eip = ctxt-_eip;
 
 done:
-   if (rc == X86EMUL_PROPAGATE_FAULT)
+   if (rc == X86EMUL_PROPAGATE_FAULT) {
+   WARN_ON(ctxt->exception.vector > 0x1f);
ctxt->have_exception = true;
+   }
if (rc == X86EMUL_INTERCEPTED)
return EMULATION_INTERCEPTED;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 737b4bdac41c..cd718c01cdf1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5248,6 +5248,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 
ctxt->interruptibility = 0;
ctxt->have_exception = false;
+   ctxt->exception.vector = -1;
ctxt->perm_ok = false;
 
ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
-- 
1.8.3.1
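
To make the interaction of the two hunks explicit, the failure mode being
caught looks roughly like this (paraphrased flow, not actual emulator code):

/* x86_emulate_instruction() now poisons the vector up front:
 *
 *   ctxt->exception.vector = -1;   // stored as 0xff in the u8 field
 *   ...                            // a buggy path returns
 *                                  // X86EMUL_PROPAGATE_FAULT without
 *                                  // ever calling emulate_exception(),
 *                                  // so the poison survives
 *   WARN_ON(ctxt->exception.vector > 0x1f);  // fires: 0x1f is the last
 *                                            // architecturally defined
 *                                            // exception vector
 */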



[PATCH] KVM: emulate: do not return X86EMUL_PROPAGATE_FAULT explicitly

2014-08-20 Thread Paolo Bonzini
Always get it through emulate_exception or emulate_ts.  This
ensures that the ctxt->exception fields have been populated.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef297919a691..4fbf4b598f92 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1549,8 +1549,7 @@ load:
ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg);
return X86EMUL_CONTINUE;
 exception:
-   emulate_exception(ctxt, err_vec, err_code, true);
-   return X86EMUL_PROPAGATE_FAULT;
+   return emulate_exception(ctxt, err_vec, err_code, true);
 }
 
 static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
@@ -2723,8 +2722,7 @@ static int emulator_do_task_switch(struct 
x86_emulate_ctxt *ctxt,
if (!next_tss_desc.p ||
((desc_limit < 0x67 && (next_tss_desc.type & 8)) ||
 desc_limit < 0x2b)) {
-   emulate_ts(ctxt, tss_selector & 0xfffc);
-   return X86EMUL_PROPAGATE_FAULT;
+   return emulate_ts(ctxt, tss_selector & 0xfffc);
}
 
if (reason == TASK_SWITCH_IRET || reason == TASK_SWITCH_JMP) {
@@ -3016,7 +3014,7 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
ctxt->dst.val = swab64(ctxt->src.val);
break;
default:
-   return X86EMUL_PROPAGATE_FAULT;
+   BUG();
}
return X86EMUL_CONTINUE;
 }
-- 
1.8.3.1



Re: [PATCH] virt/kvm/assigned-dev.c: Set 'dev-irq_source_id' to '-1' after free it

2014-08-20 Thread Chen Gang
On 08/20/2014 08:01 AM, Chen Gang wrote:
 
 By the way, at present I use QEMU as the user-mode program; is there a common
 test for both QEMU and KVM/Xen? And is a PC enough for the common test?


Oh, I find QEMU has "make check" just like gcc/binutils, so for each of
my patches I shall next run ./configure && make && make check at least.

And any additional ideas, suggestions or additions about testing for
KVM/Xen/QEMU are also welcome.


Thanks.
 
 
 On 08/20/2014 07:58 AM, Chen Gang wrote:
 On 08/19/2014 11:49 PM, Paolo Bonzini wrote:
 Il 19/08/2014 17:44, Chen Gang ha scritto:
 Hello maintainers:

 Please help check this patch, when you have time.
 Hi, it's already on its way to 3.17-rc2, but I first have to run a bunch
 of tests.

 OK, thanks. Also, let me try the test, although I am not quite
 familiar with KVM. Since I plan to focus on KVM/Xen next, I shall
 construct the related environments for its common test, at least.

 I am just constructing the gcc common-test environment on a new PC;
 is a PC also enough for the KVM/Xen common test?

 Any ideas, suggestions or additions about it are welcome (especially
 information about the KVM/Xen common test).


 Thanks.

 
 


-- 
Chen Gang

Open share and attitude like air water and life which God blessed


[PATCH] KVM: x86: do not check CS.DPL against RPL during task switch

2014-08-20 Thread Paolo Bonzini
This reverts the check added by commit 5045b468037d ("KVM: x86: check CS.DPL
against RPL during task switch", 2014-05-15).  Although the CS.DPL=CS.RPL
check is mentioned in table 7-1 of the SDM as causing a #TSS exception,
it is not mentioned in table 6-6 that lists invalid TSS conditions
which cause #TSS exceptions. In fact it causes some tests to fail, which
pass on bare-metal.

Keep the rest of the commit, since we will find new uses for it in 3.18.

Reported-by: Nadav Amit na...@cs.technion.ac.il
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef117b842334..03954f7900f5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
goto exception;
break;
case VCPU_SREG_CS:
-   if (in_task_switch && rpl != dpl)
-   goto exception;
-
if (!(seg_desc.type & 8))
goto exception;
 
-- 
1.8.3.1



[PATCH] KVM: x86: raise invalid TSS exceptions during a task switch

2014-08-20 Thread Paolo Bonzini
Conditions that would usually trigger a general protection fault should
instead raise #TS.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 03954f7900f5..ef297919a691 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1468,7 +1468,7 @@ static int __load_segment_descriptor(struct 
x86_emulate_ctxt *ctxt,
return ret;
 
err_code = selector & 0xfffc;
-   err_vec = GP_VECTOR;
+   err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR;
 
/* can't load system descriptor into segment selector */
if (seg <= VCPU_SREG_GS && !seg_desc.s)
-- 
1.8.3.1



[GIT PULL] KVM changes for 3.17-rc2

2014-08-20 Thread Paolo Bonzini
Linus,

The following changes since commit 7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9:

  Linux 3.17-rc1 (2014-08-16 10:40:26 -0600)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 30d1e0e806e5b2fadc297ba78f2d7afd6ba309cf:

  virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it 
(2014-08-19 15:12:28 +0200)


Reverting a 3.16 patch, fixing two bugs in device assignment
(one has a CVE), and fixing some problems introduced during the merge window
(the CMA bug came in via Andrew, the x86 ones via yours truly).


Alexey Kardashevskiy (1):
  PC, KVM, CMA: Fix regression caused by wrong get_order() use

Chen Gang (1):
  virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it

Michael S. Tsirkin (1):
  kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)

Nadav Amit (1):
  KVM: x86: Avoid emulating instructions on #UD mistakenly

Paolo Bonzini (2):
  KVM: x86: do not check CS.DPL against RPL during task switch
  Revert KVM: x86: Increase the number of fixed MTRR regs to 10

 arch/powerpc/kvm/book3s_hv_builtin.c |  6 +++---
 arch/x86/include/asm/kvm_host.h  |  2 +-
 arch/x86/kvm/emulate.c   | 11 ---
 virt/kvm/assigned-dev.c  |  4 +++-
 virt/kvm/iommu.c | 19 ++-
 5 files changed, 21 insertions(+), 21 deletions(-)


[PATCH] KVM: x86: Keep masked bits unmodified on kvm_set_shared_msr

2014-08-20 Thread Nadav Amit
Currently, when an msr is updated using kvm_set_shared_msr the masked bits are
zeroed.  This behavior is currently valid since the only MSR with partial mask
is EFER, in which only SCE might be unmasked. However, using the
kvm_set_shared_msr for other purposes becomes impossible.

This patch keeps the masked bits unmodified while setting a shared msr.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f5edb6..ee42410 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -236,6 +236,7 @@ void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
 
if (((value ^ smsr->values[slot].curr) & mask) == 0)
return;
+   value = (smsr->values[slot].curr & ~mask) | (value & mask);
smsr->values[slot].curr = value;
wrmsrl(shared_msrs_global.msrs[slot], value);
if (!smsr->registered) {
-- 
1.9.1
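
A worked example of the new masking line, with invented 16-bit values for
readability:

/* curr = 0xff00, value = 0x00ff, mask = 0x0f0f
 *
 * change check: ((value ^ curr) & mask) = (0xffff & 0x0f0f) = 0x0f0f != 0
 * new value:    (curr & ~mask) | (value & mask)
 *             = (0xff00 & 0xf0f0) | (0x00ff & 0x0f0f)
 *             =  0xf000           |  0x000f
 *             =  0xf00f   -- the bits outside the mask keep curr's contents
 */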



Re: [PATCH 7/9] KVM: VMX: abstract ple_window modifiers

2014-08-20 Thread Radim Krčmář
2014-08-20 09:02+0200, Paolo Bonzini:
 Il 19/08/2014 22:35, Radim Krčmář ha scritto:
  They were almost identical and thus merged with a loathable macro.
  
  Signed-off-by: Radim Krčmář rkrc...@redhat.com
  ---
   This solution is hopefully more acceptable than function pointers.
 
 I think a small amount of duplication is not a problem.

Ok, I'll drop this patch from from v2.


Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 09:16+0200, Paolo Bonzini:
 Il 19/08/2014 22:35, Radim Krčmář ha scritto:
  Every increase of ple_window_grow creates potential overflows.
  They are not serious, because we clamp ple_window and userspace is
  expected to fix ple_window_max within a second.
  ---
 I think avoiding overflows is better.  In fact, I think you should call
 this function for ple_window_max too.

(Ack, I just wanted to avoid the worst userspace error, which is why
 PW_max hasn't changed when PW_grow got smaller and we could overflow.)

 You could keep the ple_window_max variable to the user-set value.
 Whenever ple_window_grow or ple_window_max are changed, you can set an
 internal variable (let's call it ple_window_actual_max, but I'm not wed
 to this name) to the computed value, and then do:
 
   if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
   new = ple_window;
   else if (ple_window_grow < ple_window)
   new = max(ple_window_actual_max, old) * ple_window_grow;
   else
   new = max(ple_window_actual_max, old) + ple_window_grow;

Oh, I like that this can get rid of all overflows, ple_window_actual_max
(PW_effective_max?) is going to be set to
ple_window_max [/-] ple_window_grow in v2.

 (I think the || in the first if can be eliminated with some creativity
 in clamp_ple_window_max).

To do it, we'll want to intercept changes to ple_window as well.
(I disliked this patch a lot even before :)


Re: [PATCH 5/9] KVM: VMX: clamp PLE window

2014-08-20 Thread Radim Krčmář
2014-08-20 09:18+0200, Paolo Bonzini:
 Il 19/08/2014 22:35, Radim Krčmář ha scritto:
  Modifications could get unwanted (low or negative) values of the PLE window.
  Use ple_window and the maximal value that cannot overflow as bounds.
  
  ple_window_max defaults to a very high value, but it would make sense to
  set it to some fraction of the scheduler tick.
  
  Signed-off-by: Radim Krčmář rkrc...@redhat.com
  ---
 Please introduce a dynamic overflow-avoiding ple_window_max (like what
 you have in patch 9) already in patch 4...
 
   static void shrink_ple_window(struct kvm_vcpu *vcpu)
  @@ -5720,7 +5724,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu)
  else
  new = old - ple_window_shrink;
   
  -   vmx->ple_window = new;
  +   vmx->ple_window = max(new, ple_window);
 
 ... and also squash this in patch 4.
 
 This patch can then introduce the ple_window_max module parameter (using
 module_param_cb to avoid overflows).

Will do.

---
It is going to make the patches slightly harder to review;
are we doing it because git doesn't bisect on series boundaries?


Re: [PATCH 3/9] KVM: VMX: make PLE window per-vcpu

2014-08-20 Thread Radim Krčmář
2014-08-20 09:13+0200, Paolo Bonzini:
 Il 19/08/2014 22:35, Radim Krčmář ha scritto:
   enum segment_cache_field {
  @@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
  if (ple_gap) {
  vmcs_write32(PLE_GAP, ple_gap);
  vmcs_write32(PLE_WINDOW, ple_window);
 
 Is this necessary?

V2, thanks.


Re: [PATCH 1/9] KVM: add kvm_arch_sched_in

2014-08-20 Thread Radim Krčmář
2014-08-20 09:47+0200, Christian Borntraeger:
 On 19/08/14 22:35, Radim Krčmář wrote:
 
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, 
  int cpu)
  if (vcpu->preempted)
  vcpu->preempted = false;
  
  +   kvm_arch_sched_in(vcpu, cpu);
  +
  kvm_arch_vcpu_load(vcpu, cpu);
   }
  
 
 Why can't you reuse kvm_arch_vcpu_load? It's also called on each sched_in and 
 is architecture specific.

kvm_arch_vcpu_load is also called from kvm_vcpu_ioctl, so we'd be
shrinking unnecessarily.
(sched_in gives us a bit of useful information about the state of the
 system, kvm_vcpu_ioctl not that much.)
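
For context, a condensed sketch of how the new hook is then consumed on x86
later in the series (details may differ from the actual patches):

/* arch/x86/kvm/x86.c -- forward the notifier to the vendor module */
void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
{
	kvm_x86_ops->sched_in(vcpu, cpu);
}

/* arch/x86/kvm/vmx.c -- back off the PLE window on every sched_in */
static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
{
	if (ple_gap)
		shrink_ple_window(vcpu);
}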


Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Paolo Bonzini
Il 20/08/2014 14:41, Radim Krčmář ha scritto:
 if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
 new = ple_window;
 else if (ple_window_grow < ple_window)
 new = max(ple_window_actual_max, old) * ple_window_grow;
 else
 new = max(ple_window_actual_max, old) + ple_window_grow;
 Oh, I like that this can get rid of all overflows, ple_window_actual_max
 (PW_effective_max?) is going to be set to
 ple_window_max [/-] ple_window_grow in v2.
 
  (I think the || in the first if can be eliminated with some creativity
  in clamp_ple_window_max).
 To do it, we'll want to intercept changes to ple_window as well.
 (I disliked this patch a lot even before :)

What about setting ple_window_actual_max to 0 if ple_window_grow is 0
(instead of just returning)?

Then the if (ple_window_actual_max < ple_window) will always fail and
you'll go through new = ple_window.  But perhaps it's more gross and
worthless than creative. :)

Paolo
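
The precomputation being discussed would look something like this (a sketch;
the function name is invented and the exact rounding is up to v2):

/* Recomputed whenever ple_window_grow or ple_window_max changes, so
 * that growing the per-vcpu window can never overflow past the
 * user-set ple_window_max.
 */
static void update_ple_window_actual_max(void)
{
	if (ple_window_grow < 1)
		ple_window_actual_max = 0;	/* growing disabled */
	else if (ple_window_grow < ple_window)
		ple_window_actual_max = ple_window_max / ple_window_grow;
	else
		ple_window_actual_max = ple_window_max - ple_window_grow;
}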


[PATCH v4 4/6] KVM: PPC: Move ONE_REG AltiVec support to powerpc

2014-08-20 Thread Mihai Caraman
Move ONE_REG AltiVec support to the powerpc generic layer.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - split ONE_REG powerpc generic and ONE_REG AltiVec

v3:
 - make ONE_REG AltiVec support powerpc generic

v2:
 - add comment describing VCSR register representation in KVM vs kernel

 arch/powerpc/include/uapi/asm/kvm.h |  5 +
 arch/powerpc/kvm/book3s.c   | 42 -
 arch/powerpc/kvm/powerpc.c  | 42 +
 3 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 3ca357a..ab4d473 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -476,6 +476,11 @@ struct kvm_get_htab_header {
 
 /* FP and vector status/control registers */
 #define KVM_REG_PPC_FPSCR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x80)
+/*
+ * VSCR register is documented as a 32-bit register in the ISA, but it can
+ * only be accessed via a vector register. Expose VSCR as a 32-bit register
+ * even though the kernel represents it as a 128-bit vector.
+ */
 #define KVM_REG_PPC_VSCR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x81)
 
 /* Virtual processor areas */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 26868e2..1b5adda 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -558,25 +558,6 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_FPSCR:
*val = get_reg_val(id, vcpu->arch.fp.fpscr);
break;
-#ifdef CONFIG_ALTIVEC
-   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   val->vval = vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0];
-   break;
-   case KVM_REG_PPC_VSCR:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   *val = get_reg_val(id, vcpu->arch.vr.vscr.u[3]);
-   break;
-   case KVM_REG_PPC_VRSAVE:
-   *val = get_reg_val(id, vcpu->arch.vrsave);
-   break;
-#endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
if (cpu_has_feature(CPU_FTR_VSX)) {
@@ -653,29 +634,6 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_FPSCR:
vcpu->arch.fp.fpscr = set_reg_val(id, *val);
break;
-#ifdef CONFIG_ALTIVEC
-   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0] = val->vval;
-   break;
-   case KVM_REG_PPC_VSCR:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   vcpu->arch.vr.vscr.u[3] = set_reg_val(id, *val);
-   break;
-   case KVM_REG_PPC_VRSAVE:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   vcpu->arch.vrsave = set_reg_val(id, *val);
-   break;
-#endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
if (cpu_has_feature(CPU_FTR_VSX)) {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1326116..19d4755 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -941,6 +941,25 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
if (r == -EINVAL) {
r = 0;
switch (reg->id) {
+#ifdef CONFIG_ALTIVEC
+   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   val.vval = vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0];
+   break;
+   case KVM_REG_PPC_VSCR:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   val = get_reg_val(reg->id, 

[PATCH] KVM: x86: Replace X86_FEATURE_NX offset with the definition

2014-08-20 Thread Nadav Amit
Replace reference to X86_FEATURE_NX using bit shift with the defined
X86_FEATURE_NX.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/cpuid.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..f4bad87 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -112,8 +112,8 @@ static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
break;
}
}
-   if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) {
-   entry->edx &= ~(1 << 20);
+   if (entry && (entry->edx & bit(X86_FEATURE_NX)) && !is_efer_nx()) {
+   entry->edx &= ~bit(X86_FEATURE_NX);
printk(KERN_INFO "kvm: guest NX capability removed\n");
}
 }
-- 
1.9.1
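
For reference, the substitution relies on the feature-word layout (a sketch;
KVM's bit() helper lives in arch/x86/kvm/x86.h):

/* X86_FEATURE_NX is defined as (4*32 + 20): word 4, bit 20, i.e. the
 * NX bit of CPUID.80000001H:EDX.  bit() reduces the number modulo 32,
 * so bit(X86_FEATURE_NX) == (1 << 20), the same bit as before:
 */
static inline u32 bit(int bitno)
{
	return 1 << (bitno & 31);
}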



[PATCH v4 6/6] KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs

2014-08-20 Thread Mihai Caraman
Add ONE_REG support for the IVPR and IVOR registers. Implement IVPR, IVORs 0-15
and IVOR 35 in the booke common layer.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - add ONE_REG IVPR
 - use IVPR, IVOR2 and IVOR8 setters
 - add api documentation for ONE_REG IVPR and IVORs

v3:
 - new patch

 Documentation/virtual/kvm/api.txt   |   7 ++
 arch/powerpc/include/uapi/asm/kvm.h |  25 +++
 arch/powerpc/kvm/booke.c| 145 
 arch/powerpc/kvm/e500.c |  42 ++-
 arch/powerpc/kvm/e500mc.c   |  16 
 5 files changed, 233 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index beae3fd..cd7b171 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1917,6 +1917,13 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TM_VSCR   | 32
   PPC   | KVM_REG_PPC_TM_DSCR   | 64
   PPC   | KVM_REG_PPC_TM_TAR| 64
+  PPC   | KVM_REG_PPC_IVPR  | 64
+  PPC   | KVM_REG_PPC_IVOR0 | 32
+  ...
+  PPC   | KVM_REG_PPC_IVOR15| 32
+  PPC   | KVM_REG_PPC_IVOR32| 32
+  ...
+  PPC   | KVM_REG_PPC_IVOR37| 32
 |   |
   MIPS  | KVM_REG_MIPS_R0   | 64
   ...
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index ab4d473..c97f119 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -564,6 +564,31 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_SPRG9  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba)
 #define KVM_REG_PPC_DBSR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb)
 
+/* Booke IVPR  IVOR registers */
+#define KVM_REG_PPC_IVPR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc)
+#define KVM_REG_PPC_IVOR0  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbd)
+#define KVM_REG_PPC_IVOR1  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbe)
+#define KVM_REG_PPC_IVOR2  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf)
+#define KVM_REG_PPC_IVOR3  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc0)
+#define KVM_REG_PPC_IVOR4  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc1)
+#define KVM_REG_PPC_IVOR5  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc2)
+#define KVM_REG_PPC_IVOR6  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc3)
+#define KVM_REG_PPC_IVOR7  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc4)
+#define KVM_REG_PPC_IVOR8  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc5)
+#define KVM_REG_PPC_IVOR9  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc6)
+#define KVM_REG_PPC_IVOR10 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc7)
+#define KVM_REG_PPC_IVOR11 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc8)
+#define KVM_REG_PPC_IVOR12 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc9)
+#define KVM_REG_PPC_IVOR13 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xca)
+#define KVM_REG_PPC_IVOR14 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcb)
+#define KVM_REG_PPC_IVOR15 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcc)
+#define KVM_REG_PPC_IVOR32 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcd)
+#define KVM_REG_PPC_IVOR33 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xce)
+#define KVM_REG_PPC_IVOR34 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcf)
+#define KVM_REG_PPC_IVOR35 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd0)
+#define KVM_REG_PPC_IVOR36 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd1)
+#define KVM_REG_PPC_IVOR37 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd2)
+
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
  */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index d4df648..1cb2a2a 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1570,6 +1570,75 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
int r = 0;
 
switch (id) {
+   case KVM_REG_PPC_IVPR:
+   *val = get_reg_val(id, vcpu->arch.ivpr);
+   break;
+   case KVM_REG_PPC_IVOR0:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_CRITICAL]);
+   break;
+   case KVM_REG_PPC_IVOR1:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK]);
+   break;
+   case KVM_REG_PPC_IVOR2:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE]);
+   break;
+   case KVM_REG_PPC_IVOR3:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_INST_STORAGE]);
+   break;
+   case KVM_REG_PPC_IVOR4:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_EXTERNAL]);
+   break;
+   case KVM_REG_PPC_IVOR5:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_ALIGNMENT]);
+   break;
+   case 

[PATCH v4 0/6] KVM: PPC: Book3e: AltiVec support

2014-08-20 Thread Mihai Caraman
Add KVM Book3e AltiVec support.

Changes:

v4:
 - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC
 - remove SPE handlers from bookehv
 - split ONE_REG powerpc generic and ONE_REG AltiVec
 - add setters for IVPR, IVOR2 and IVOR8
 - add api documentation for ONE_REG IVPR and IVORs
 - don't enable e6500 core since hardware threads are not yet supported

v3:
 - use distinct SPE/AltiVec exception handlers
 - make ONE_REG AltiVec support powerpc generic
 - add ONE_REG IVORs support

 v2:
 - integrate Paul's FP/VMX/VSX changes that landed in kvm-ppc-queue
   in January and take into account feedback

Mihai Caraman (6):
  KVM: PPC: Book3E: Increase FPU laziness
  KVM: PPC: Book3e: Add AltiVec support
  KVM: PPC: Make ONE_REG powerpc generic
  KVM: PPC: Move ONE_REG AltiVec support to powerpc
  KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8
emulation
  KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs

 Documentation/virtual/kvm/api.txt |   7 +
 arch/powerpc/include/uapi/asm/kvm.h   |  30 +++
 arch/powerpc/kvm/book3s.c | 151 --
 arch/powerpc/kvm/booke.c  | 371 --
 arch/powerpc/kvm/booke.h  |  43 +---
 arch/powerpc/kvm/booke_emulate.c  |  15 +-
 arch/powerpc/kvm/bookehv_interrupts.S |   9 +-
 arch/powerpc/kvm/e500.c   |  42 +++-
 arch/powerpc/kvm/e500_emulate.c   |  20 ++
 arch/powerpc/kvm/e500mc.c |  18 +-
 arch/powerpc/kvm/powerpc.c|  97 +
 11 files changed, 576 insertions(+), 227 deletions(-)

-- 
1.7.11.7



[PATCH v4 3/6] KVM: PPC: Make ONE_REG powerpc generic

2014-08-20 Thread Mihai Caraman
Make ONE_REG generic for server and embedded architectures by moving
kvm_vcpu_ioctl_get_one_reg() and kvm_vcpu_ioctl_set_one_reg() functions
to powerpc layer.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - split ONE_REG powerpc generic and ONE_REG AltiVec

v3:
 - make ONE_REG AltiVec support powerpc generic

v2:
 - add comment describing VCSR register representation in KVM vs kernel

 arch/powerpc/kvm/book3s.c  | 121 +++--
 arch/powerpc/kvm/booke.c   |  91 +-
 arch/powerpc/kvm/powerpc.c |  55 +
 3 files changed, 138 insertions(+), 129 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index dd03f6b..26868e2 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -535,33 +535,28 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, 
struct kvm_fpu *fpu)
return -ENOTSUPP;
 }
 
-int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
+   union kvmppc_one_reg *val)
 {
-   int r;
-   union kvmppc_one_reg val;
-   int size;
+   int r = 0;
long int i;
 
-   size = one_reg_size(reg->id);
-   if (size > sizeof(val))
-   return -EINVAL;
-
-   r = vcpu->kvm->arch.kvm_ops->get_one_reg(vcpu, reg->id, &val);
+   r = vcpu->kvm->arch.kvm_ops->get_one_reg(vcpu, id, val);
 if (r == -EINVAL) {
 r = 0;
-   switch (reg->id) {
+   switch (id) {
 case KVM_REG_PPC_DAR:
-   val = get_reg_val(reg->id, kvmppc_get_dar(vcpu));
+   *val = get_reg_val(id, kvmppc_get_dar(vcpu));
break;
case KVM_REG_PPC_DSISR:
-   val = get_reg_val(reg->id, kvmppc_get_dsisr(vcpu));
+   *val = get_reg_val(id, kvmppc_get_dsisr(vcpu));
break;
case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
-   i = reg->id - KVM_REG_PPC_FPR0;
-   val = get_reg_val(reg->id, VCPU_FPR(vcpu, i));
+   i = id - KVM_REG_PPC_FPR0;
+   *val = get_reg_val(id, VCPU_FPR(vcpu, i));
break;
case KVM_REG_PPC_FPSCR:
-   val = get_reg_val(reg->id, vcpu->arch.fp.fpscr);
+   *val = get_reg_val(id, vcpu-arch.fp.fpscr);
break;
 #ifdef CONFIG_ALTIVEC
case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
@@ -569,110 +564,94 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
r = -ENXIO;
break;
}
-   val.vval = vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0];
+   val->vval = vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0];
break;
case KVM_REG_PPC_VSCR:
if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
r = -ENXIO;
break;
}
-   val = get_reg_val(reg->id, vcpu->arch.vr.vscr.u[3]);
+   *val = get_reg_val(id, vcpu->arch.vr.vscr.u[3]);
break;
case KVM_REG_PPC_VRSAVE:
-   val = get_reg_val(reg->id, vcpu->arch.vrsave);
+   *val = get_reg_val(id, vcpu->arch.vrsave);
break;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
if (cpu_has_feature(CPU_FTR_VSX)) {
-   long int i = reg->id - KVM_REG_PPC_VSR0;
-   val.vsxval[0] = vcpu->arch.fp.fpr[i][0];
-   val.vsxval[1] = vcpu->arch.fp.fpr[i][1];
+   i = id - KVM_REG_PPC_VSR0;
+   val->vsxval[0] = vcpu->arch.fp.fpr[i][0];
+   val->vsxval[1] = vcpu->arch.fp.fpr[i][1];
} else {
r = -ENXIO;
}
break;
 #endif /* CONFIG_VSX */
-   case KVM_REG_PPC_DEBUG_INST: {
-   u32 opcode = INS_TW;
-   r = copy_to_user((u32 __user *)(long)reg->addr,
-    &opcode, sizeof(u32));
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, INS_TW);
break;
-   }
 #ifdef CONFIG_KVM_XICS
case KVM_REG_PPC_ICP_STATE:
if (!vcpu->arch.icp) {
r = -ENXIO;
break;
}

[PATCH v4 2/6] KVM: PPC: Book3e: Add AltiVec support

2014-08-20 Thread Mihai Caraman
Add AltiVec support in KVM for Book3e. FPU support gracefully reuses host
infrastructure, so follow the same approach for AltiVec.

Book3e specification defines shared interrupt numbers for SPE and AltiVec
units. Still SPE is present in e200/e500v2 cores while AltiVec is present in
e6500 core. So we can currently decide at compile-time which of the SPE or
AltiVec units to support exclusively by using CONFIG_SPE_POSSIBLE and
CONFIG_PPC_E500MC defines. As Alexander Graf suggested, keep SPE and AltiVec
exception handlers distinct to improve code readability.

Guests have the privilege to enable AltiVec, so we always need to support
AltiVec in KVM and implicitly in host to reflect interrupts and to save/restore
the unit context. KVM will be loaded on cores with AltiVec unit only if
CONFIG_ALTIVEC is defined. Use this define to guard KVM AltiVec logic.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC
 - remove SPE handlers from bookehv
 - update commit message

v3:
 - use distinct SPE/AltiVec exception handlers

v2:
 - integrate Paul's FP/VMX/VSX changes

 arch/powerpc/kvm/booke.c  | 74 ++-
 arch/powerpc/kvm/booke.h  |  6 +++
 arch/powerpc/kvm/bookehv_interrupts.S |  9 +
 arch/powerpc/kvm/e500_emulate.c   | 20 ++
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 91e7217..8ace612 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -168,6 +168,40 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }
 
+/*
+ * Simulate AltiVec unavailable fault to load guest state
+ * from thread to AltiVec unit.
+ * It requires to be called with preemption disabled.
+ */
+static inline void kvmppc_load_guest_altivec(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   if (!(current->thread.regs->msr & MSR_VEC)) {
+   enable_kernel_altivec();
+   load_vr_state(&vcpu->arch.vr);
+   current->thread.vr_save_area = &vcpu->arch.vr;
+   current->thread.regs->msr |= MSR_VEC;
+   }
+   }
+#endif
+}
+
+/*
+ * Save guest vcpu AltiVec state into thread.
+ * It requires to be called with preemption disabled.
+ */
+static inline void kvmppc_save_guest_altivec(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   if (current->thread.regs->msr & MSR_VEC)
+   giveup_altivec(current);
+   current->thread.vr_save_area = NULL;
+   }
+#endif
+}
+
 static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
 {
/* Synchronize guest's desire to get debug interrupts into shadow MSR */
@@ -375,9 +409,15 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu 
*vcpu,
case BOOKE_IRQPRIO_ITLB_MISS:
case BOOKE_IRQPRIO_SYSCALL:
case BOOKE_IRQPRIO_FP_UNAVAIL:
+#ifdef CONFIG_SPE_POSSIBLE
case BOOKE_IRQPRIO_SPE_UNAVAIL:
case BOOKE_IRQPRIO_SPE_FP_DATA:
case BOOKE_IRQPRIO_SPE_FP_ROUND:
+#endif
+#ifdef CONFIG_ALTIVEC
+   case BOOKE_IRQPRIO_ALTIVEC_UNAVAIL:
+   case BOOKE_IRQPRIO_ALTIVEC_ASSIST:
+#endif
case BOOKE_IRQPRIO_AP_UNAVAIL:
allowed = 1;
msr_mask = MSR_CE | MSR_ME | MSR_DE;
@@ -697,6 +737,17 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
kvmppc_load_guest_fp(vcpu);
 #endif
 
+#ifdef CONFIG_ALTIVEC
+   /* Save userspace AltiVec state in stack */
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
+   enable_kernel_altivec();
+   /*
+* Since we can't trap on MSR_VEC in GS-mode, we consider the guest
+* as always using the AltiVec.
+*/
+   kvmppc_load_guest_altivec(vcpu);
+#endif
+
/* Switch to guest debug context */
debug = vcpu->arch.dbg_reg;
switch_booke_debug_regs(&debug);
@@ -719,6 +770,10 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
kvmppc_save_guest_fp(vcpu);
 #endif
 
+#ifdef CONFIG_ALTIVEC
+   kvmppc_save_guest_altivec(vcpu);
+#endif
+
 out:
vcpu->mode = OUTSIDE_GUEST_MODE;
return ret;
@@ -1025,7 +1080,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND);
r = RESUME_GUEST;
break;
-#else
+#elif defined(CONFIG_SPE_POSSIBLE)
case BOOKE_INTERRUPT_SPE_UNAVAIL:
/*
 * Guest wants SPE, but host kernel doesn't support it.  Send
@@ -1046,6 +1101,22 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
run->hw.hardware_exit_reason = exit_nr;
r = RESUME_HOST;
break;
+#endif /* CONFIG_SPE_POSSIBLE 

[PATCH v4 5/6] KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8 emulation

2014-08-20 Thread Mihai Caraman
Add setter functions for IVPR, IVOR2 and IVOR8 emulation in preparation
for ONE_REG support.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - new patch
 - add api documentation for ONE_REG IVPR and IVORs

 arch/powerpc/kvm/booke.c | 24 
 arch/powerpc/kvm/booke.h |  3 +++
 arch/powerpc/kvm/booke_emulate.c | 15 +++
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 831c1b4..d4df648 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1782,6 +1782,30 @@ void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 
tsr_bits)
update_timer_ints(vcpu);
 }
 
+void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr)
+{
+   vcpu->arch.ivpr = new_ivpr;
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GIVPR, new_ivpr);
+#endif
+}
+
+void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor)
+{
+   vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = new_ivor;
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GIVOR2, new_ivor);
+#endif
+}
+
+void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor)
+{
+   vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = new_ivor;
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GIVOR8, new_ivor);
+#endif
+}
+
 void kvmppc_decrementer_func(unsigned long data)
 {
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 22ba08e..0242530 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -80,6 +80,9 @@ void kvmppc_set_epcr(struct kvm_vcpu *vcpu, u32 new_epcr);
 void kvmppc_set_tcr(struct kvm_vcpu *vcpu, u32 new_tcr);
 void kvmppc_set_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits);
 void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits);
+void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr);
+void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor);
+void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor);
 
 int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 unsigned int inst, int *advance);
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index 92bc668..94c64e3 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -191,10 +191,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, ulong spr_val)
break;
 
case SPRN_IVPR:
-   vcpu->arch.ivpr = spr_val;
-#ifdef CONFIG_KVM_BOOKE_HV
-   mtspr(SPRN_GIVPR, spr_val);
-#endif
+   kvmppc_set_ivpr(vcpu, spr_val);
break;
case SPRN_IVOR0:
vcpu->arch.ivor[BOOKE_IRQPRIO_CRITICAL] = spr_val;
@@ -203,10 +200,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, ulong spr_val)
vcpu->arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK] = spr_val;
break;
case SPRN_IVOR2:
-   vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = spr_val;
-#ifdef CONFIG_KVM_BOOKE_HV
-   mtspr(SPRN_GIVOR2, spr_val);
-#endif
+   kvmppc_set_ivor2(vcpu, spr_val);
break;
case SPRN_IVOR3:
vcpu->arch.ivor[BOOKE_IRQPRIO_INST_STORAGE] = spr_val;
@@ -224,10 +218,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, ulong spr_val)
vcpu->arch.ivor[BOOKE_IRQPRIO_FP_UNAVAIL] = spr_val;
break;
case SPRN_IVOR8:
-   vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = spr_val;
-#ifdef CONFIG_KVM_BOOKE_HV
-   mtspr(SPRN_GIVOR8, spr_val);
-#endif
+   kvmppc_set_ivor8(vcpu, spr_val);
break;
case SPRN_IVOR9:
vcpu->arch.ivor[BOOKE_IRQPRIO_AP_UNAVAIL] = spr_val;
-- 
1.7.11.7



[PATCH v4 1/6] KVM: PPC: Book3E: Increase FPU laziness

2014-08-20 Thread Mihai Caraman
Increase FPU laziness by loading the guest state into the unit before entering
the guest, instead of doing it on each vcpu schedule. Without this improvement,
an interrupt may claim the floating point unit, corrupting guest state.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - update commit message

v3:
 - no changes

v2:
 - remove fpu_active
 - add descriptive comments

 arch/powerpc/kvm/booke.c  | 43 ---
 arch/powerpc/kvm/booke.h  | 34 --
 arch/powerpc/kvm/e500mc.c |  2 --
 3 files changed, 36 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 074b7fc..91e7217 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -124,6 +124,40 @@ static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
 }
 #endif
 
+/*
+ * Load up guest vcpu FP state if it's needed.
+ * It also set the MSR_FP in thread so that host know
+ * we're holding FPU, and then host can help to save
+ * guest vcpu FP state if other threads require to use FPU.
+ * This simulates an FP unavailable fault.
+ *
+ * It requires to be called with preemption disabled.
+ */
+static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_PPC_FPU
+   if (!(current->thread.regs->msr & MSR_FP)) {
+   enable_kernel_fp();
+   load_fp_state(&vcpu->arch.fp);
+   current->thread.fp_save_area = &vcpu->arch.fp;
+   current->thread.regs->msr |= MSR_FP;
+   }
+#endif
+}
+
+/*
+ * Save guest vcpu FP state into thread.
+ * It requires to be called with preemption disabled.
+ */
+static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_PPC_FPU
+   if (current->thread.regs->msr & MSR_FP)
+   giveup_fpu(current);
+   current->thread.fp_save_area = NULL;
+#endif
+}
+
 static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 {
 #if defined(CONFIG_PPC_FPU) && !defined(CONFIG_KVM_BOOKE_HV)
@@ -658,12 +692,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 
/*
 * Since we can't trap on MSR_FP in GS-mode, we consider the guest
-* as always using the FPU.  Kernel usage of FP (via
-* enable_kernel_fp()) in this thread must not occur while
-* vcpu->fpu_active is set.
+* as always using the FPU.
 */
-   vcpu->fpu_active = 1;
-
kvmppc_load_guest_fp(vcpu);
 #endif
 
@@ -687,8 +717,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 
 #ifdef CONFIG_PPC_FPU
kvmppc_save_guest_fp(vcpu);
-
-   vcpu->fpu_active = 0;
 #endif
 
 out:
@@ -1194,6 +1222,7 @@ out:
else {
/* interrupts now hard-disabled */
kvmppc_fix_ee_before_entry();
+   kvmppc_load_guest_fp(vcpu);
}
}
 
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index f753543..e73d513 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -116,40 +116,6 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu 
*vcpu, int sprn,
 extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn,
  ulong *spr_val);
 
-/*
- * Load up guest vcpu FP state if it's needed.
- * It also set the MSR_FP in thread so that host know
- * we're holding FPU, and then host can help to save
- * guest vcpu FP state if other threads require to use FPU.
- * This simulates an FP unavailable fault.
- *
- * It requires to be called with preemption disabled.
- */
-static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_PPC_FPU
-   if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) {
-   enable_kernel_fp();
-   load_fp_state(&vcpu->arch.fp);
-   current->thread.fp_save_area = &vcpu->arch.fp;
-   current->thread.regs->msr |= MSR_FP;
-   }
-#endif
-}
-
-/*
- * Save guest vcpu FP state into thread.
- * It requires to be called with preemption disabled.
- */
-static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_PPC_FPU
-   if (vcpu->fpu_active && (current->thread.regs->msr & MSR_FP))
-   giveup_fpu(current);
-   current->thread.fp_save_area = NULL;
-#endif
-}
-
 static inline void kvmppc_clear_dbsr(void)
 {
mtspr(SPRN_DBSR, mfspr(SPRN_DBSR));
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 000cf82..4549349 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -145,8 +145,6 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu 
*vcpu, int cpu)
kvmppc_e500_tlbil_all(vcpu_e500);
__get_cpu_var(last_vcpu_of_lpid)[vcpu->kvm->arch.lpid] = vcpu;
}
-
-   kvmppc_load_guest_fp(vcpu);
 }
 
 static void kvmppc_core_vcpu_put_e500mc(struct kvm_vcpu *vcpu)
-- 
1.7.11.7


[PATCH 2/2] KVM: vmx: Reflect misc_enables in real CPU

2014-08-20 Thread Nadav Amit
IA32_MISC_ENABLE MSR has two bits that affect the actual results which can be
observed by the guest: fast string enable, and FOPCODE compatibility.  Guests
may wish to change the default settings of these bits.

Linux usually enables fast-string by default. However, when fast string is
enabled data breakpoints are only recognized on boundaries between data-groups.
On some old CPUs enabling fast-string also resulted in single-step not
occurring upon each iteration.

FOPCODE compatibility can be used to analyze program performance by recording
the last instruction executed before FSAVE/FSTENV/FXSAVE.

This patch saves and restores these bits in IA32_MISC_ENABLE if they are
supported upon entry to guest and exit to userspace respectively.  To avoid
possible issues, fast-string can only be enabled by the guest if the host
enabled them. The physical CPU version is checked to ensure no shared bits are
reconfigured in the process.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  |  7 ++
 arch/x86/kvm/vmx.c  | 56 +
 arch/x86/kvm/x86.c  |  2 +-
 4 files changed, 65 insertions(+), 1 deletion(-)
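
The mechanism boils down to merging the guest-owned bits into the host
value around entry/exit; a rough sketch of the idea (illustrative only,
not the patch's exact code -- the helper name is made up):

/* Merge the guest's IA32_MISC_ENABLE bits selected by
 * guest_misc_enable_mask into the host value, e.g. before vmentry;
 * calling it with the roles swapped restores the host value later. */
static void misc_enable_switch(u64 host, u64 guest, u64 mask)
{
	u64 merged = (host & ~mask) | (guest & mask);

	if (merged != host)
		wrmsrl(MSR_IA32_MISC_ENABLE, merged);
}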

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4bda61b..879b930 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -699,6 +699,7 @@ struct kvm_x86_ops {
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
+   void (*set_misc_enable)(struct kvm_vcpu *vcpu, u64 data);
void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f49c86..378e50e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -480,6 +480,11 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
 }
 
+static void svm_set_misc_enable(struct kvm_vcpu *vcpu, u64 data)
+{
+   vcpu->arch.ia32_misc_enable_msr = data;
+}
+
 static int is_external_interrupt(u32 info)
 {
info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
@@ -1152,6 +1157,7 @@ static void init_vmcb(struct vcpu_svm *svm)
init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
 
svm_set_efer(&svm->vcpu, 0);
+   svm_set_misc_enable(&svm->vcpu, 0);
save->dr6 = 0x0ff0;
kvm_set_rflags(&svm->vcpu, 2);
save->rip = 0xfff0;
@@ -4338,6 +4344,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_cr3 = svm_set_cr3,
.set_cr4 = svm_set_cr4,
.set_efer = svm_set_efer,
+   .set_misc_enable = svm_set_misc_enable,
.get_idt = svm_get_idt,
.set_idt = svm_set_idt,
.get_gdt = svm_get_gdt,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 45bab55..2d2efd0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -809,6 +809,8 @@ static const struct kvm_vmx_segment_field {
 };
 
 static u64 host_efer;
+static u64 host_misc_enable;
+static u64 guest_misc_enable_mask;
 
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
 
@@ -1609,6 +1611,33 @@ static void reload_tss(void)
load_TR_desc();
 }
 
+static void __init update_guest_misc_enable_mask(void)
+{
+   /* Calculating which of the IA32_MISC_ENABLE bits should be reflected
+  in hardware */
+   struct cpuinfo_x86 *c = &boot_cpu_data;
+   u64 data;
+
+   guest_misc_enable_mask = 0;
+
+   /* Core/Atom architecture share fast-string and x86 compat */
+   if (c->x86 != 6 || c->x86_model < 0xd)
+   return;
+
+   if (rdmsrl_safe(MSR_IA32_MISC_ENABLE, &data) < 0)
+   return;
+   if (boot_cpu_has(X86_FEATURE_REP_GOOD))
+   guest_misc_enable_mask |= MSR_IA32_MISC_ENABLE_FAST_STRING;
+
+   preempt_disable();
+   if (wrmsrl_safe(MSR_IA32_MISC_ENABLE,
+   data | MSR_IA32_MISC_ENABLE_X87_COMPAT) >= 0) {
+   guest_misc_enable_mask |= MSR_IA32_MISC_ENABLE_X87_COMPAT;
+   wrmsrl(MSR_IA32_MISC_ENABLE, data);
+   }
+   preempt_enable();
+}
+
 static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 {
u64 guest_efer;
@@ -3126,6 +3155,8 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_apicv())
enable_apicv = 0;
 
+   update_guest_misc_enable_mask();
+
if (enable_apicv)
kvm_x86_ops->update_cr8_intercept = NULL;
else {
@@ -3315,6 +3346,28 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
setup_msrs(vmx);
 }
 
+static void vmx_set_misc_enable(struct kvm_vcpu *vcpu, u64 data)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   

[PATCH] x86: Test debug exceptions with disabled fast-string

2014-08-20 Thread Nadav Amit
x86 allows to enable fast strings, sacrificing the precision of debug
watchpoints.  Previously, KVM did not reflect the guest fast strings settings
in the actual MSR, resulting always in imprecise exception.

This test checks whether disabled fast strings causes the debug trap on
rep-string to occur on the precise iteration. A debug watchpoint which is not
cache-line aligned is set, and 128 bytes are set using rep-string operation.
The iteration in which the debug exception occurred is then checked.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 x86/debug.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/x86/debug.c b/x86/debug.c
index 34e56fb..eb96dbe 100644
--- a/x86/debug.c
+++ b/x86/debug.c
@@ -11,10 +11,13 @@
 
 #include "libcflat.h"
 #include "desc.h"
+#include "msr.h"
+#include "processor.h"
 
-static volatile unsigned long bp_addr[10], dr6[10];
+static volatile unsigned long bp_addr[10], dr6[10], rcx[10];
 static volatile unsigned int n;
 static volatile unsigned long value;
+static unsigned char dst[128] __attribute__ ((aligned(64)));
 
 static unsigned long get_dr6(void)
 {
@@ -43,6 +46,7 @@ static void handle_db(struct ex_regs *regs)
 {
bp_addr[n] = regs->rip;
dr6[n] = get_dr6();
+   rcx[n] = regs->rcx;
 
if (dr6[n] & 0x1)
regs->rflags |= (1 << 16);
@@ -60,7 +64,7 @@ static void handle_bp(struct ex_regs *regs)
 
 int main(int ac, char **av)
 {
-   unsigned long start;
+   unsigned long start, misc_enable;
 
setup_idt();
handle_exception(DB_VECTOR, handle_db);
@@ -109,5 +113,18 @@ hw_wp:
  n == 1 &&
  bp_addr[0] == ((unsigned long)&hw_wp) && dr6[0] == 0x4ff2);
 
+   misc_enable = rdmsr(MSR_IA32_MISC_ENABLE);
+   wrmsr(MSR_IA32_MISC_ENABLE,
+ misc_enable & ~MSR_IA32_MISC_ENABLE_FAST_STRING);
+
+   n = 0;
+   set_dr1((void *)&dst[59]);
+   set_dr7(0x0010040a);
+
+   asm volatile("rep stosb\n\t" : : "D"(dst), "c"(128) : "cc", "memory");
+
+   report("hw watchpoint with disabled fast-string", rcx[0] == 128-1-59);
+   wrmsr(MSR_IA32_MISC_ENABLE, misc_enable);
+
return report_summary();
 }
-- 
1.9.1



[PATCH 1/2] KVM: x86: update cpuid according to IA32_MISC_ENABLE

2014-08-20 Thread Nadav Amit
Virtual BIOS may use the "Limit CPUID Maxval" and "XD Bit Disable" fields in
IA32_MISC_ENABLE. These two fields update the CPUID, and in the case of XD Bit
Disable also disable NX support.

This patch reflects this behavior in CPUID, and disables NX bit accordingly.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/cpuid.c | 20 
 arch/x86/kvm/vmx.c   |  8 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)
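
The guest-visible effect is easy to check from a kvm-unit-tests-style
guest; a sketch (not part of the patch; it assumes the MSR bit constants
are available to the test):

/* After setting XD Disable, CPUID.1:EDX must stop advertising NX. */
wrmsr(MSR_IA32_MISC_ENABLE,
      rdmsr(MSR_IA32_MISC_ENABLE) | MSR_IA32_MISC_ENABLE_XD_DISABLE);
report("NX hidden by XD disable", !(cpuid(1).d & (1 << 20)));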

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..ff7f429 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -757,6 +757,25 @@ static struct kvm_cpuid_entry2* check_cpuid_limit(struct 
kvm_vcpu *vcpu,
return kvm_find_cpuid_entry(vcpu, maxlevel->eax, index);
 }
 
+static void cpuid_override(struct kvm_vcpu *vcpu, u32 function, u32 index,
+  u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
+{
+   switch (function) {
+   case 0:
+   if (vcpu->arch.ia32_misc_enable_msr &
+   MSR_IA32_MISC_ENABLE_LIMIT_CPUID)
+   *eax = min_t(u32, *eax, 3);
+   break;
+   case 1:
+   if (vcpu->arch.ia32_misc_enable_msr &
+   MSR_IA32_MISC_ENABLE_XD_DISABLE)
+   *edx &= ~bit(X86_FEATURE_NX);
+   break;
+   default:
+   break;
+   }
+}
+
 void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
 {
u32 function = *eax, index = *ecx;
@@ -774,6 +793,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, 
u32 *ecx, u32 *edx)
*edx = best->edx;
} else
*eax = *ebx = *ecx = *edx = 0;
+   cpuid_override(vcpu, function, index, eax, ebx, ecx, edx);
trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx);
 }
 EXPORT_SYMBOL_GPL(kvm_cpuid);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cad37d5..45bab55 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1633,9 +1633,13 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, 
int efer_offset)
vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
 
clear_atomic_switch_msr(vmx, MSR_EFER);
+   /* Clear NX accordingly if xd_disable is on */
+   guest_efer = vmx->vcpu.arch.efer;
+   if (vmx->vcpu.arch.ia32_misc_enable_msr &
+   MSR_IA32_MISC_ENABLE_XD_DISABLE)
+   guest_efer &= ~EFER_NX;
 /* On ept, can't emulate nx, and must switch nx atomically */
-   if (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX)) {
-   guest_efer = vmx->vcpu.arch.efer;
+   if (enable_ept && ((guest_efer ^ host_efer) & EFER_NX)) {
 if (!(guest_efer & EFER_LMA))
 guest_efer &= ~EFER_LME;
 add_atomic_switch_msr(vmx, MSR_EFER, guest_efer, host_efer);
-- 
1.9.1



Re: [PATCH] KVM: x86: Replace X86_FEATURE_NX offset with the definition

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 15:38, Nadav Amit wrote:
 Replace reference to X86_FEATURE_NX using bit shift with the defined
 X86_FEATURE_NX.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/kvm/cpuid.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
 index 38a0afe..f4bad87 100644
 --- a/arch/x86/kvm/cpuid.c
 +++ b/arch/x86/kvm/cpuid.c
 @@ -112,8 +112,8 @@ static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
   break;
   }
   }
 - if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) {
 - entry->edx &= ~(1 << 20);
 + if (entry && (entry->edx & bit(X86_FEATURE_NX)) && !is_efer_nx()) {
 + entry->edx &= ~bit(X86_FEATURE_NX);
   printk(KERN_INFO "kvm: guest NX capability removed\n");
   }
  }
 

Applying, thanks.

Paolo


[PATCH 0/2] KVM: vmx: Supporting IA32_MISC_ENABLE MSR

2014-08-20 Thread Nadav Amit
IA32_MISC_ENABLE MSR has several bits that affect the behavior of the CPU and
are visible to the guest.  This patch-set makes the vCPU behave correctly with
respect to: fast string enable, FPU FOPCODE compatibility mode, limit CPUID
max-val and XD-bit disable.

Thanks for reviewing the patch.

Nadav Amit (2):
  KVM: x86: update cpuid according to IA32_MISC_ENABLE
  KVM: vmx: Reflect misc_enables in real CPU

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c| 20 +
 arch/x86/kvm/svm.c  |  7 +
 arch/x86/kvm/vmx.c  | 64 +++--
 arch/x86/kvm/x86.c  |  2 +-
 5 files changed, 91 insertions(+), 3 deletions(-)

-- 
1.9.1



[PATCH] kvm tools: balloon: fix overflow in PFN to address conversion

2014-08-20 Thread Konstantin Khlebnikov
Fix trivial overflow of u32 value

Signed-off-by: Konstantin Khlebnikov koc...@gmail.com
---
 tools/kvm/virtio/balloon.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
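
The wrap is easy to demonstrate in isolation; a standalone sketch (not
kvmtool code) of why the cast matters for PFNs at or above 4 GiB:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t pfn = 0x100000;		/* first PFN at the 4 GiB boundary */
	uint64_t bad = pfn << 12;		/* shift stays 32-bit and wraps to 0 */
	uint64_t good = (uint64_t)pfn << 12;	/* widen first: 0x100000000 */

	printf("bad=%#llx good=%#llx\n",
	       (unsigned long long)bad, (unsigned long long)good);
	return 0;
}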

diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index f7dfb0b..84c4bb0 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -64,7 +64,7 @@ static bool virtio_bln_do_io_request(struct kvm *kvm, struct 
bln_dev *bdev, stru
for (i = 0 ; i < len ; i++) {
void *guest_ptr;
 
-   guest_ptr = guest_flat_to_host(kvm, ptrs[i] << 
VIRTIO_BALLOON_PFN_SHIFT);
+   guest_ptr = guest_flat_to_host(kvm, (u64)ptrs[i] << 
VIRTIO_BALLOON_PFN_SHIFT);
if (queue == bdev->vqs[VIRTIO_BLN_INFLATE]) {
madvise(guest_ptr, 1 << VIRTIO_BALLOON_PFN_SHIFT, 
MADV_DONTNEED);
bdev->config.actual++;



Re: [PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code

2014-08-20 Thread John Stultz
On Thu, Aug 14, 2014 at 12:43 AM, Andy Lutomirski l...@amacapital.net wrote:
 Currently, init_std_data calls ktime_get_real().  This imposes
 awkward constraints on when init_std_data can be called, and
 init_std_data is unlikely to collect the full unpredictable data
 available to the timekeeping code, especially after resume.

 Remove this code from random.c and add the appropriate
 add_device_randomness calls to timekeeping.c instead.

 Cc: John Stultz john.stu...@linaro.org
 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  drivers/char/random.c |  2 --
  kernel/time/timekeeping.c | 11 +++
  2 files changed, 11 insertions(+), 2 deletions(-)

 diff --git a/drivers/char/random.c b/drivers/char/random.c
 index 7673e60..8dc3e3a 100644
 --- a/drivers/char/random.c
 +++ b/drivers/char/random.c
 @@ -1263,12 +1263,10 @@ static void seed_entropy_store(void *ctx, u32 data)
  static void init_std_data(struct entropy_store *r)
  {
 int i;
 -   ktime_t now = ktime_get_real();
 unsigned long rv;
 char log_prefix[128];

 r->last_pulled = jiffies;
 -   mix_pool_bytes(r, &now, sizeof(now), NULL);
 for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
 rv = random_get_entropy();
 mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
 index 32d8d6a..9609db9 100644
 --- a/kernel/time/timekeeping.c
 +++ b/kernel/time/timekeeping.c
 @@ -23,6 +23,7 @@
  #include <linux/stop_machine.h>
  #include <linux/pvclock_gtod.h>
  #include <linux/compiler.h>
 +#include <linux/random.h>

  #include "tick-internal.h"
  #include "ntp_internal.h"
 @@ -835,6 +836,9 @@ void __init timekeeping_init(void)
 memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));

 write_seqcount_end(&timekeeper_seq);
 +
 +   add_device_randomness(tk, sizeof(tk));
 +


So I can't (and really don't want to) vouch for the correctness side
of this. The initial idea of using the structure instead of reading
the time worried me a bit, but we have already read the clocksource
and stored it in cycle_last, so there's a wee bit more than just the
RTC time and a bunch of zeros in the timekeeper structure.

Though on some systems the read_persistent_clock call can't access the
RTC at timekeeping_init, so I'm not sure we're really getting that
much more than the cycle_last clocksource value here. Probably should
add something like this to the RTC hctosys logic.
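
A sketch of that idea (untested, and the helper name is made up --
rtc_hctosys() already has the struct rtc_time in hand once
rtc_read_time() succeeds):

#include <linux/random.h>
#include <linux/rtc.h>

/* Mix the RTC readout into the entropy pool when hctosys sets the
 * system time; mirrors what timekeeping_init() now does for tk. */
static void rtc_hctosys_add_randomness(const struct rtc_time *tm)
{
	add_device_randomness(tm, sizeof(*tm));
}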

thanks
-john


Re: virt-install: failed to initialize KVM: Permission denied

2014-08-20 Thread Cole Robinson
On 08/19/2014 02:38 PM, arnaud gaboury wrote:
 $ uname -r
 3.16.1-1-ARCH
 -
 
 As a regular user, member of the libvirt group, I run this command to
 create a basic VM:
 
 virt-install --connect qemu:///system --name=test --ram 2048 --cpu
 host-model-only --os-variant=win7 --disk /myVM/test --boot cdrom,hd
 --virt-type kvm --graphics spice --controller scsi,model=virtio-scsi
 --cdrom=/drawer/myIso/w8.iso
 
 It returns an error :
 --
 ---
 Starting install...
 ERROR    internal error: process exited while connecting to monitor:
 Could not access KVM kernel module: Permission denied
 failed to initialize KVM: Permission denied
 -
 
 $ getfacl /dev/kvm
 
 # file: dev/kvm
 # owner: root
 # group: kvm
 user::rw-
 user:martinus:rw-
 group::rw-
 mask::rw-
 other::---
 
 The command return seems to indicate rights are correct.
 $ lsmod shows that kvm and kvm_intel are loaded.
 
 If I run the virt-install with qemu:///session, I do not have this
 issue and can create the VM.
 
 I found many entries about the KVM permission issue, but with no clear
 answer to solve it.
 

When connecting to qemu:///system, libvirt does not run VMs as your regular
user. What user libvirtd uses though is dependent on how it's configured. On
Fedora, qemu VMs are run as the 'qemu' user. If that's how it's configured on
your distro, the above permissions would block use of /dev/kvm. Here's how
permissions look on Fedora 20 for me:

$ ls -l /dev/kvm
crw-rw-rw-+ 1 root kvm 10, 232 Aug  8 09:51 /dev/kvm

$ getfacl /dev/kvm
getfacl: Removing leading '/' from absolute path names
# file: dev/kvm
# owner: root
# group: kvm
user::rw-
user:crobinso:rw-
group::rw-
mask::rw-
other::rw-

Those permissive permissions are set by a udev rule installed by 
qemu-system-x86:

$ cat /lib/udev/rules.d/80-kvm.rules
KERNEL=="kvm", GROUP="kvm", MODE="0666"

So perhaps your distro should do the same.

- Cole


Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 15:15+0200, Paolo Bonzini:
 On 20/08/2014 14:41, Radim Krčmář wrote:
   if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
   new = ple_window;
   else if (ple_window_grow < ple_window)
   new = max(ple_window_actual_max, old) * ple_window_grow;
   else
   new = max(ple_window_actual_max, old) + ple_window_grow;
  Oh, I like that this can get rid of all overflows, ple_window_actual_max
  (PW_effective_max?) is going to be set to
  ple_window_max [/-] ple_window_grow in v2.
  
   (I think the || in the first if can be eliminated with some creativity
   in clamp_ple_window_max).
  To do it, we'll want to intercept changes to ple_window as well.
  (I disliked this patch a lot even before :)
 
 What about setting ple_window_actual_max to 0 if ple_window_grow is 0
 (instead of just returning)?
 
 Then the if (ple_window_actual_max < ple_window) will always fail and
 you'll go through new = ple_window.  But perhaps it's more gross and
 worthless than creative. :)

That code can't use PW directly, because PW_actual_max needs to be one
PW_grow below PW_max, so I'd rather enforce minimal PW_actual_max.

Btw. without extra code, we are still going to overflow on races when
changing PW_grow, should they be covered as well?

(+ There is a bug in this patch -- clamp_ple_window_max() should be
 after param_set_int() ... damned unreviewed last-second changes.)


Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 17:31, Radim Krčmář wrote:
 Btw. without extra code, we are still going to overflow on races when
 changing PW_grow, should they be covered as well?

You mean because there is no spinlock or similar protecting the changes?
 I guess you could use a seqlock.

Paolo


Re: virt-install: failed to initialize KVM: Permission denied

2014-08-20 Thread arnaud gaboury
 Those permissive permissions are set by a udev rule installed by 
 qemu-system-x86:

 $ cat /lib/udev/rules.d/80-kvm.rules
 KERNEL=="kvm", GROUP="kvm", MODE="0666"

 So perhaps your distro should do the same.


I have it as /lib/udev/rules.d/65-kvm.rules.

In fact, I solved my issue by setting user:group to qemu:kvm in
/etc/libvirt/qemu.conf.
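
For reference, the relevant stanza looks roughly like this (illustrative;
the exact user depends on the distro's qemu packaging):

# /etc/libvirt/qemu.conf
user = "qemu"
group = "kvm"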


qemu-kvm process soft lockups cpu, results in server crash

2014-08-20 Thread yue longguang
hi, i have encountered several server crashes because a qemu-kvm
process locks up a cpu.

just before the lockup, conntrack drops packets, but i have no idea how
much the stall has to do with the dropped packets.

environment:
centos 6.5, kernel 2.6.32-431.11.2, x86_64, qemu-kvm-tools-0.12

thanks
--log---
Aug 18 22:07:05 localhost kernel: [4625821.185649] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.192085] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.198608] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.205021] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.211432] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.217874] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.224301] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.230764] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.237219] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.243664] nf_conntrack: table
full, dropping packet.
Aug 18 22:07:05 localhost ./gbalancer-0.5.1[19991]: dial tcp
10.200.86.30:3306: i/o timeout
Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: dial tcp
10.200.86.31:3306: i/o timeout
Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: wrangler: detected
server 10.200.86.32:3306 is down
Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: wrangler: detected
server 10.200.86.31:3306 is down
Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: wrangler: detected
server 10.200.86.30:3306 is down
Aug 18 22:07:07 localhost kernel: [4625822.875756] [ cut
here ]
Aug 18 22:07:07 localhost kernel: [4625822.881685] WARNING: at
net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Aug 18 22:07:07 localhost kernel: [4625822.892399] Hardware name: IBM
System x3550 M4: -[7914ON9]-
Aug 18 22:07:07 localhost kernel: [4625822.899346] NETDEV WATCHDOG:
eth0 (igb): transmit queue 0 timed out
Aug 18 22:07:07 localhost kernel: [4625822.907052] Modules linked in:
iptable_filter iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 ip_tables tcp_diag inet_diag netconsole
xt_CHECKSUM configfs ip6table_filter ip6_tables ebtable_nat ebtables
cpufreq_ondemand acpi_cpufreq freq_table mperf ipmi_watchdog
ipmi_poweroff ipmi_devintf bridge bonding 8021q garp stp llc ipv6
vhost_net macvtap macvlan tun kvm_intel kvm cdc_ether usbnet mii
microcode iTCO_wdt iTCO_vendor_support igb dca i2c_algo_bit ptp
pps_core sg ics932s401 i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4
jbd2 mbcache sd_mod crc_t10dif megaraid_sas wmi dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: nf_conntrack]
Aug 18 22:07:07 localhost kernel: [4625822.979601] Pid: 0, comm:
swapper Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 18 22:07:07 localhost kernel: [4625822.988093] Call Trace:
Aug 18 22:07:07 localhost kernel: [4625822.991527]  IRQ
[81071e27] ? warn_slowpath_common+0x87/0xc0
Aug 18 22:07:07 localhost kernel: [4625822.999861]
[81071f16] ? warn_slowpath_fmt+0x46/0x50
Aug 18 22:07:07 localhost kernel: [4625823.007193]
[8147bc8b] ? dev_watchdog+0x26b/0x280
Aug 18 22:07:07 localhost kernel: [4625823.014239]
[8105dd5c] ? scheduler_tick+0xcc/0x260
Aug 18 22:07:07 localhost kernel: [4625823.021369]
[8147ba20] ? dev_watchdog+0x0/0x280
Aug 18 22:07:07 localhost kernel: [4625823.028204]
[81084ae7] ? run_timer_softirq+0x197/0x340
Aug 18 22:07:07 localhost kernel: [4625823.035716]
[810ac8e5] ? tick_dev_program_event+0x65/0xc0
Aug 18 22:07:07 localhost kernel: [4625823.043526]
[8107a8e1] ? __do_softirq+0xc1/0x1e0
Aug 18 22:07:07 localhost kernel: [4625823.050447]
[810ac9ba] ? tick_program_event+0x2a/0x30
Aug 18 22:07:07 localhost kernel: [4625823.057877]
[8100c30c] ? call_softirq+0x1c/0x30
Aug 18 22:07:07 localhost kernel: [4625823.064744]
[8100fa75] ? do_softirq+0x65/0xa0
Aug 18 22:07:07 localhost kernel: [4625823.071385]
[8107a795] ? irq_exit+0x85/0x90
Aug 18 22:07:07 localhost kernel: [4625823.077831]
[815316ca] ? smp_apic_timer_interrupt+0x4a/0x60
Aug 18 22:07:07 localhost kernel: [4625823.085832]
[8100bb93] ? apic_timer_interrupt+0x13/0x20
Aug 18 22:07:07 localhost kernel: [4625823.093612]  EOI
[812e0bee] ? intel_idle+0xde/0x170
Aug 18 22:07:07 localhost kernel: [4625823.101047]
[812e0bd1] ? intel_idle+0xc1/0x170
Aug 18 22:07:07 localhost kernel: [4625823.107876]
[81426b67] ? cpuidle_idle_call+0xa7/0x140
Aug 18 22:07:07 localhost kernel: [4625823.115287]
[81009fc6] ? cpu_idle+0xb6/0x110
Aug 18 22:07:07 localhost kernel: [4625823.121822]
[8152143c] ? start_secondary+0x2ac/0x2ef
Aug 18 22:07:07 localhost 

Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 17:34+0200, Paolo Bonzini:
 On 20/08/2014 17:31, Radim Krčmář wrote:
  Btw. without extra code, we are still going to overflow on races when
  changing PW_grow, should they be covered as well?
 
 You mean because there is no spinlock or similar protecting the changes?
  I guess you could use a seqlock.

Yes, for example between a modification of ple_window
  new = min(old, PW_actual_max) * PW_grow
which gets compiled into something like this:
  1) tmp = min(old, PW_actual_max)
  2) new = tmp * PW_grow
and a write to increase PW_grow
  3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max)
  4) PW_grow = new_PW_grow
  5) PW_actual_max = PW_max / new_PW_grow

3 and 4 can execute between 1 and 2, which could overflow.

I don't think they are important enough to warrant a significant
performance hit of locking.
Or even more checks that would prevent it in a lockless way.

(I'd just see that the result is set to something legal and also drop
 line 3, because it does not help things that much.)


Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Paolo Bonzini
On 20/08/2014 18:01, Radim Krčmář wrote:
 2014-08-20 17:34+0200, Paolo Bonzini:
 On 20/08/2014 17:31, Radim Krčmář wrote:
 Btw. without extra code, we are still going to overflow on races when
 changing PW_grow, should they be covered as well?

 You mean because there is no spinlock or similar protecting the changes?
  I guess you could use a seqlock.
 
 Yes, for example between a modification of ple_window
   new = min(old, PW_actual_max) * PW_grow
 which gets compiled into something like this:
   1) tmp = min(old, PW_actual_max)
   2) new = tmp * PW_grow
 and a write to increase PW_grow
   3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max)
   4) PW_grow = new_PW_grow
   5) PW_actual_max = PW_max / new_PW_grow
 
  3 and 4 can execute between 1 and 2, which could overflow.
 
 I don't think they are important enough to warrant a significant
 performance hit of locking.

A seqlock just costs two memory accesses to the same (shared) cache line
as the PW data, and a non-taken branch.  I don't like code that is
unsafe by design...

Paolo

 Or even more checks that would prevent it in a lockless way.
 
 (I'd just see that the result is set to something legal and also drop
  line 3, because it does not help things that much.)
 



Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 18:03+0200, Paolo Bonzini:
 On 20/08/2014 18:01, Radim Krčmář wrote:
  2014-08-20 17:34+0200, Paolo Bonzini:
  On 20/08/2014 17:31, Radim Krčmář wrote:
  Btw. without extra code, we are still going to overflow on races when
  changing PW_grow, should they be covered as well?
 
  You mean because there is no spinlock or similar protecting the changes?
   I guess you could use a seqlock.
  
  Yes, for example between a modification of ple_window
new = min(old, PW_actual_max) * PW_grow
  which gets compiled into something like this:
1) tmp = min(old, PW_actual_max)
2) new = tmp * PW_grow
  and a write to increase PW_grow
3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max)
4) PW_grow = new_PW_grow
5) PW_actual_max = PW_max / new_PW_grow
  
  3 and 4 can execute between 1 and 2, which could overflow.
  
  I don't think they are important enough to warrant a significant
  performance hit of locking.
 
 A seqlock just costs two memory accesses to the same (shared) cache line
 as the PW data, and a non-taken branch.

Oh, seqlock readers do not have to write to shared memory, so it is
acceptable ...

 I don't like code that is
 unsafe by design...

I wouldn't say it is unsafe, because VCPU's PW is always greater than
module's PW. We are just going to PLE exit sooner than expected.


Re: LPC IOMMU and VFIO MicroConference - Call for Participation

2014-08-20 Thread Alex Williamson

Ok folks, it's time to submit your discussion proposals for the LPC
IOMMU and VFIO uconf.  If you added an idea to the wiki, now is the time
to formally propose it as a discussion topic.  If you have ideas how to
make the IOMMU or VFIO subsystems better, now is the time to propose it.
If you can't figure out how to make something work in the current
infrastructure, now is the time to propose a discussion.  If you're
adding new features and want to make sure we can support them, now is
the time to propose a discussion.

I don't think we've seen a formal schedule yet, but many of us have
conflicts with KVM Forum this year and I expect the LPC planning
committee to take that into account, so please submit your proposals
anyway and feel free to note your availability/conflicts in the Note to
organizers section.

LPC is full, but there is a waiting list and the sooner you can get on
it, the more likely you are to be registered.  I expect uconf discussion
leads to have an advantage in moving through the queue and we may be
able to provide discounted registration for discussion leads.  Thanks,

Alex

On Tue, 2014-08-12 at 11:20 +0200, Joerg Roedel wrote:
 LPC IOMMU and VFIO MicroConference - Call for Participation
 ===
 
 We are pleased to announce that this year there will be the first IOMMU
 and VFIO MicroConference held at Linux Plumbers Conference in
 Düsseldorf. An initial request for support of this micro conference
 generated, among others, the following possible topic ideas:
 
   * Improving generic IOMMU code and move code out of drivers
   * IOMMU device error handling
   * IOMMU Power Management
   * Virtualizing IOMMUs
   * Interface between IOMMUs an memory management
 
 More suggested topics can be found at the wiki page of the micro
 conference:
 
   http://wiki.linuxplumbersconf.org/2014:iommu_microconference
 
 We now ask for formal proposals for these discussions along with any
 other topics or problems that need to be discussed in this area.
 
 The format of the micro conference will be roughly half-hour slots for
 each topic, where the discussion lead gives a short introduction to the
 problem and maybe sketches possible solutions. The rest of the slot is
 open for discussions so that we come to an agreement how to move
 forward.
 
 Please submit your formal proposal on the Linux Plumbers website (OpenID
 login required) until August 31st at:
 
   
 http://www.linuxplumbersconf.org/2014/how-to-submit-microconference-discussions-topics/
 
 Hope to see you in Düsseldorf!
 
 
   Joerg Roedel and Alex Williamson
 





[PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU

2014-08-20 Thread Radim Krčmář
Change PLE window into per-VCPU variable, seeded from module parameter,
to allow greater flexibility.

Brings in a small overhead on every vmentry.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/vmx.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2b306f9..18e0e52 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -484,6 +484,9 @@ struct vcpu_vmx {
 
/* Support for a guest hypervisor (nested VMX) */
struct nested_vmx nested;
+
+   /* Dynamic PLE window. */
+   int ple_window;
 };
 
 enum segment_cache_field {
@@ -4402,7 +4405,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
if (ple_gap) {
vmcs_write32(PLE_GAP, ple_gap);
-   vmcs_write32(PLE_WINDOW, ple_window);
+   vmx->ple_window = ple_window;
}
 
vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
@@ -7387,6 +7390,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
if (vmx->emulation_required)
return;
 
+   if (ple_gap)
+   vmcs_write32(PLE_WINDOW, vmx->ple_window);
+
if (vmx->nested.sync_shadow_vmcs) {
copy_vmcs12_to_shadow(vmx);
vmx->nested.sync_shadow_vmcs = false;
-- 
2.0.4



[PATCH v2 6/6] KVM: VMX: runtime knobs for dynamic PLE window

2014-08-20 Thread Radim Krčmář
ple_window is updated on every vmentry, so there is no reason to have it
read-only anymore.
ple_window* weren't writable, to prevent runtime overflow races;
those races are now prevented by a seqlock.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/vmx.c | 48 +---
 1 file changed, 37 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f63ac5d..bd73fa1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -132,24 +132,29 @@ module_param(nested, bool, S_IRUGO);
 #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX\
INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW
 
+static struct kernel_param_ops param_ops_ple_t;
+#define param_check_ple_t(name, p) __param_check(name, p, int)
+
+static DEFINE_SEQLOCK(ple_window_seqlock);
+
 static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 module_param(ple_gap, int, S_IRUGO);
 
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
-module_param(ple_window, int, S_IRUGO);
+module_param(ple_window, ple_t, S_IRUGO | S_IWUSR);
 
 /* Default doubles per-vcpu window every exit. */
 static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
-module_param(ple_window_grow, int, S_IRUGO);
+module_param(ple_window_grow, ple_t, S_IRUGO | S_IWUSR);
 
 /* Default resets per-vcpu window every exit to ple_window. */
 static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
-module_param(ple_window_shrink, int, S_IRUGO);
+module_param(ple_window_shrink, int, S_IRUGO | S_IWUSR);
 
 /* Default is to compute the maximum so we can never overflow. */
 static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
 static int ple_window_max    = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
-module_param(ple_window_max, int, S_IRUGO);
+module_param(ple_window_max, ple_t, S_IRUGO | S_IWUSR);
 
 extern const ulong vmx_return;
 
@@ -5730,13 +5735,19 @@ static void modify_ple_window(struct kvm_vcpu *vcpu, 
int grow)
struct vcpu_vmx *vmx = to_vmx(vcpu);
int old = vmx->ple_window;
int new;
+   unsigned seq;
 
-   if (grow)
-   new = __grow_ple_window(old)
-   else
-   new = __shrink_ple_window(old, ple_window_shrink, ple_window);
+   do {
+   seq = read_seqbegin(&ple_window_seqlock);
 
-   vmx->ple_window = max(new, ple_window);
+   if (grow)
+   new = __grow_ple_window(old);
+   else
+   new = __shrink_ple_window(old, ple_window_shrink,
+ ple_window);
+
+   vmx->ple_window = max(new, ple_window);
+   } while (read_seqretry(&ple_window_seqlock, seq));
 
trace_kvm_ple_window(grow, vcpu-vcpu_id, vmx-ple_window, old);
 }
@@ -5750,6 +5761,23 @@ static void update_ple_window_actual_max(void)
ple_window_grow, INT_MIN);
 }
 
+static int param_set_ple_t(const char *arg, const struct kernel_param *kp)
+{
+   int ret;
+
+   write_seqlock(&ple_window_seqlock);
+   ret = param_set_int(arg, kp);
+   update_ple_window_actual_max();
+   write_sequnlock(&ple_window_seqlock);
+
+   return ret;
+}
+
+static struct kernel_param_ops param_ops_ple_t = {
+   .set = param_set_ple_t,
+   .get = param_get_int,
+};
+
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
@@ -9153,8 +9181,6 @@ static int __init vmx_init(void)
} else
kvm_disable_tdp();
 
-   update_ple_window_actual_max();
-
return 0;
 
 out7:
-- 
2.0.4



[PATCH v2 4/6] KVM: VMX: dynamise PLE window

2014-08-20 Thread Radim Krčmář
The window is increased on every PLE exit and decreased on every sched_in.
The idea is that we don't want to PLE exit if there is no preemption
going on.
We do this with sched_in() because it does not hold the rq lock.

There are two new kernel parameters for changing the window:
 ple_window_grow and ple_window_shrink
ple_window_grow affects the window on PLE exit and ple_window_shrink
does it on sched_in; depending on their value, the window is modified
like this: (ple_window is kvm_intel's global)

  ple_window_shrink/ |
  ple_window_grow    | PLE exit           | sched_in
  -------------------+--------------------+---------------------
  < 1                |  = ple_window      |  = ple_window
  < ple_window       | *= ple_window_grow | /= ple_window_shrink
  otherwise          | += ple_window_grow | -= ple_window_shrink

A third new parameter, ple_window_max, controls the maximal ple_window;
the minimum equals ple_window.
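
To make the policy concrete, here is a small standalone userspace sketch
(not KVM code; the constants simply mirror the defaults introduced below)
of how a per-vcpu window evolves with the default grow=2/shrink=0:

    #include <stdio.h>
    #include <limits.h>

    #define PLE_WINDOW      4096    /* default ple_window */
    #define PLE_WINDOW_GROW 2       /* default ple_window_grow */
    #define PLE_WINDOW_MAX  (INT_MAX / PLE_WINDOW_GROW)

    /* Grow on PLE exit: multiplicative, clamped so it cannot overflow. */
    static int grow(int val)
    {
            if (PLE_WINDOW_GROW < 1)
                    return PLE_WINDOW;
            if (val > PLE_WINDOW_MAX)
                    val = PLE_WINDOW_MAX;
            return (PLE_WINDOW_GROW < PLE_WINDOW) ? val * PLE_WINDOW_GROW
                                                  : val + PLE_WINDOW_GROW;
    }

    int main(void)
    {
            int window = PLE_WINDOW;
            int i;

            for (i = 0; i < 5; i++) {       /* five PLE exits in a row */
                    window = grow(window);
                    printf("after PLE exit %d: %d\n", i + 1, window);
            }
            window = PLE_WINDOW;            /* shrink=0 resets on sched_in */
            printf("after sched_in: %d\n", window);
            return 0;
    }

The five exits print 8192, 16384, 32768, 65536 and 131072; a single
sched_in snaps the window back to 4096.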

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/vmx.c | 80 --
 1 file changed, 78 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 18e0e52..e63d7ac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,14 +125,32 @@ module_param(nested, bool, S_IRUGO);
  * Time is measured based on a counter that runs at the same rate as the TSC,
  * refer SDM volume 3b section 21.6.13  22.1.3.
  */
-#define KVM_VMX_DEFAULT_PLE_GAP128
-#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+#define KVM_VMX_DEFAULT_PLE_GAP   128
+#define KVM_VMX_DEFAULT_PLE_WINDOW4096
+#define KVM_VMX_DEFAULT_PLE_WINDOW_GROW   2
+#define KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK 0
+#define KVM_VMX_DEFAULT_PLE_WINDOW_MAX \
+   INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW
+
 static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 module_param(ple_gap, int, S_IRUGO);
 
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
 module_param(ple_window, int, S_IRUGO);
 
+/* Default doubles per-vcpu window every exit. */
+static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
+module_param(ple_window_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu window every exit to ple_window. */
+static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
+module_param(ple_window_shrink, int, S_IRUGO);
+
+/* Default is to compute the maximum so we can never overflow. */
+static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
+static int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
+module_param(ple_window_max, int, S_IRUGO);
+
 extern const ulong vmx_return;
 
 #define NR_AUTOLOAD_MSRS 8
@@ -5679,12 +5697,66 @@ out:
return ret;
 }
 
+static int __grow_ple_window(int val)
+{
+   if (ple_window_grow < 1)
+   return ple_window;
+
+   val = min(val, ple_window_actual_max);
+
+   if (ple_window_grow < ple_window)
+   val *= ple_window_grow;
+   else
+   val += ple_window_grow;
+
+   return val;
+}
+
+static int __shrink_ple_window(int val, int shrinker, int minimum)
+{
+   if (shrinker < 1)
+   return ple_window;
+
+   if (shrinker < ple_window)
+   val /= shrinker;
+   else
+   val -= shrinker;
+
+   return max(val, minimum);
+}
+
+static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+   int new;
+
+   if (grow)
+   new = __grow_ple_window(vmx->ple_window);
+   else
+   new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
+ ple_window);
+
+   vmx->ple_window = max(new, ple_window);
+}
+#define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
+#define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
+
+static void update_ple_window_actual_max(void)
+{
+   ple_window_actual_max =
+   __shrink_ple_window(max(ple_window_max, ple_window),
+   ple_window_grow, INT_MIN);
+}
+
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
+   if (ple_gap)
+   grow_ple_window(vcpu);
+
skip_emulated_instruction(vcpu);
kvm_vcpu_on_spin(vcpu);
 
@@ -8854,6 +8926,8 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 
 void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+   if (ple_gap)
+   shrink_ple_window(vcpu);
 }
 
 static struct kvm_x86_ops vmx_x86_ops = {
@@ -9077,6 +9151,8 @@ static int __init vmx_init(void)
} else
kvm_disable_tdp();
 
+   update_ple_window_actual_max();
+
return 0;
 
 out7:
-- 
2.0.4


[PATCH v2 5/6] KVM: trace kvm_ple_window

2014-08-20 Thread Radim Krčmář
Tracepoint for dynamic PLE window, fired on every potential change.
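
(Usage note: once kvm_intel is loaded, the tracepoint can be enabled like
any other KVM event; for example, assuming debugfs is mounted in the usual
place: echo 1 > /sys/kernel/debug/tracing/events/kvm/kvm_ple_window/enable.)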

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/trace.h | 25 +
 arch/x86/kvm/vmx.c   |  8 +---
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e850a7d..4b8e6cb 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -848,6 +848,31 @@ TRACE_EVENT(kvm_track_tsc,
  __print_symbolic(__entry-host_clock, host_clocks))
 );
 
+TRACE_EVENT(kvm_ple_window,
+   TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),
+   TP_ARGS(grow, vcpu_id, new, old),
+
+   TP_STRUCT__entry(
+   __field( int,  grow )
+   __field(unsigned int,   vcpu_id )
+   __field( int,   new )
+   __field( int,   old )
+   ),
+
+   TP_fast_assign(
+   __entry->grow    = grow;
+   __entry->vcpu_id = vcpu_id;
+   __entry->new     = new;
+   __entry->old     = old;
+   ),
+
+   TP_printk("vcpu %u: ple_window %d %s %d",
+ __entry->vcpu_id,
+ __entry->new,
+ __entry->grow ? "+" : "-",
+ __entry->old)
+);
+
 #endif /* CONFIG_X86_64 */
 
 #endif /* _TRACE_KVM_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e63d7ac..f63ac5d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5728,15 +5728,17 @@ static int __shrink_ple_window(int val, int shrinker, int minimum)
 static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
+   int old = vmx->ple_window;
int new;
 
if (grow)
-   new = __grow_ple_window(vmx->ple_window);
+   new = __grow_ple_window(old);
	else
-   new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
- ple_window);
+   new = __shrink_ple_window(old, ple_window_shrink, ple_window);
 
	vmx->ple_window = max(new, ple_window);
+
+   trace_kvm_ple_window(grow, vcpu->vcpu_id, vmx->ple_window, old);
 }
 #define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
 #define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5696ee7..814b20c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7648,3 +7648,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
-- 
2.0.4



[PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops

2014-08-20 Thread Radim Krčmář
The sched_in preempt notifier is available for x86; allow its use in
specific virtualization technologies as well.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/svm.c  | 6 ++
 arch/x86/kvm/vmx.c  | 6 ++
 arch/x86/kvm/x86.c  | 1 +
 4 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..358e2f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -772,6 +772,8 @@ struct kvm_x86_ops {
bool (*mpx_supported)(void);
 
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
+
+   void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..4baf1bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4305,6 +4305,10 @@ static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
local_irq_enable();
 }
 
+static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -4406,6 +4410,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.check_intercept = svm_check_intercept,
.handle_external_intr = svm_handle_external_intr,
+
+   .sched_in = svm_sched_in,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..2b306f9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8846,6 +8846,10 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
return X86EMUL_CONTINUE;
 }
 
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -8951,6 +8955,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.mpx_supported = vmx_mpx_supported,
 
.check_nested_events = vmx_check_nested_events,
+
+   .sched_in = vmx_sched_in,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7c214f..5696ee7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7148,6 +7148,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+   kvm_x86_ops->sched_in(vcpu, cpu);
 }
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
-- 
2.0.4



[PATCH v2 0/6] Dynamic Pause Loop Exiting window.

2014-08-20 Thread Radim Krčmář
v1 - v2:
 * squashed [v1 4/9] and [v1 5/9] (clamping)
 * dropped [v1 7/9] (CPP abstractions)
 * merged core of [v1 9/9] into [v1 4/9] (automatic maximum)
 * reworked kernel_param_ops: closer to pure int [v2 6/6]
 * introduced ple_window_actual_max  reworked clamping [v2 4/6]
 * added seqlock for parameter modifications [v2 6/6]

---
PLE does not scale in its current form.  When increasing VCPU count
above 150, one can hit soft lockups because of runqueue lock contention.
(Which says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs.
Replacing it with a scalable solution would be ideal, but it has already
been well optimized for various workloads, so this series instead tries to
alleviate a different major problem while minimizing the chance of
regressions: we have too many useless PLE exits.

Just increasing the PLE window would help some cases, but it still spirals
out of control.  By increasing the window after every PLE exit, we can
limit the number of useless exits, so we don't reach the state where CPUs
spend 99% of the time waiting for a lock.
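
(For a sense of scale: the PLE window is measured in TSC-rate cycles, so
the default of 4096 is only about 2 microseconds on a 2 GHz part; doubling
it on every exit quickly stops vcpus from exiting on every short spin.)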

HP confirmed that this series prevents soft lockups and TSC sync errors
on large guests.

Radim Krčmář (6):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-VCPU
  KVM: VMX: dynamise PLE window
  KVM: trace kvm_ple_window
  KVM: VMX: runtime knobs for dynamic PLE window

 arch/arm/kvm/arm.c  |   4 ++
 arch/mips/kvm/mips.c|   4 ++
 arch/powerpc/kvm/powerpc.c  |   4 ++
 arch/s390/kvm/kvm-s390.c|   4 ++
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c  |   6 ++
 arch/x86/kvm/trace.h|  25 
 arch/x86/kvm/vmx.c  | 124 ++--
 arch/x86/kvm/x86.c  |   6 ++
 include/linux/kvm_host.h|   2 +
 virt/kvm/kvm_main.c |   2 +
 11 files changed, 179 insertions(+), 4 deletions(-)

-- 
2.0.4



[PATCH v2 1/6] KVM: add kvm_arch_sched_in

2014-08-20 Thread Radim Krčmář
Introduce preempt notifiers for architecture specific code.
The advantage over creating a new notifier in every arch is slightly
simpler code and a guaranteed call order with respect to kvm_sched_in.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/arm/kvm/arm.c | 4 
 arch/mips/kvm/mips.c   | 4 
 arch/powerpc/kvm/powerpc.c | 4 
 arch/s390/kvm/kvm-s390.c   | 4 
 arch/x86/kvm/x86.c | 4 
 include/linux/kvm_host.h   | 2 ++
 virt/kvm/kvm_main.c| 2 ++
 7 files changed, 24 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a99e0cd..9f788eb 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -288,6 +288,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
	vcpu->cpu = cpu;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index cd71141..2362df2 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1002,6 +1002,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
  struct kvm_translation *tr)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4c79284..cbc432f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -720,6 +720,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
kvmppc_subarch_vcpu_uninit(vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 #ifdef CONFIG_BOOKE
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ce81eb2..a3c324e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -555,6 +555,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
/* Nothing todo */
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
	save_fp_ctl(&vcpu->arch.host_fpregs.fpc);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..d7c214f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7146,6 +7146,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
static_key_slow_dec(kvm_no_apic_vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
if (type)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..ebd7236 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -624,6 +624,8 @@ void kvm_arch_exit(void);
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..d3c3ed0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
	if (vcpu->preempted)
		vcpu->preempted = false;
 
+   kvm_arch_sched_in(vcpu, cpu);
+
kvm_arch_vcpu_load(vcpu, cpu);
 }
 
-- 
2.0.4



[RESEND][PATCH] kvm-unit-tests: x86: pmu: call measure for every counter in check_counters_many

2014-08-20 Thread Chris J Arges
In the check_counters_many function measure was only being called on the last
counter, causing the pmu test to fail. This ensures that measure is called for
each counter in the array before calling verify_counter.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 x86/pmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 5c85146..3402d1e 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -287,11 +287,11 @@ static void check_counters_many(void)
n++;
}
 
-   measure(cnt, n);
-
-   for (i = 0; i  n; i++)
+   for (i = 0; i  n; i++) {
+   measure(&cnt[i], 1);
	if (!verify_counter(&cnt[i]))
break;
+   }
 
	report("all counters", i == n);
 }
-- 
1.9.1



Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-20 Thread Chai Wen
On 08/19/2014 09:36 AM, Chai Wen wrote:

 On 08/19/2014 04:38 AM, Don Zickus wrote:
 
 On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote:

 * Don Zickus dzic...@redhat.com wrote:

 So I agree with the motivation of this improvement, but 
 is this implementation namespace-safe?

 What namespace are you worried about colliding with?  I 
 thought softlockup_ would provide the safety??  Maybe I 
 am missing something obvious. :-(

 I meant PID namespaces - a PID in itself isn't guaranteed 
 to be unique across the system.

 Ah, I don't think we thought about that.  Is there a better 
 way to do this?  Is there a domain id or something that can 
 be OR'd with the pid?

 What is always unique is the task pointer itself. We use pids 
 when we interface with user-space - but we don't really do that 
 here, right?

 No, I don't believe so.  Ok, so saving 'current' and comparing that should
 be enough, correct?

 
 
 I am not sure of the safety about using pid here with namespace.
 But as to the pointer of process, is there a chance that we got a 'historical'
 address saved in the 'softlockup_warn_pid(or address)_saved' and the current
 hogging process happened to get the same task pointer address?
 If it never happens, I think the comparing of address is ok.
 


Hi Ingo

what do you think of Don's solution - comparing task pointers?
Anyway, this is just an additional check for some very special cases,
so I think the issue I was concerned about above is not a problem at all.
And after learning some concepts about PID namespaces, I think comparing
task pointers handles PID namespaces reliably here.
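
A minimal sketch of that approach against kernel/watchdog.c (the variable
name softlockup_task_ptr_saved is hypothetical here, just to illustrate
the comparison; soft_watchdog_warn is the existing rate-limit flag):

    /* remember which task we last warned about, per CPU */
    static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);

    /* in watchdog_timer_fn(), where repeated warnings are suppressed: */
    if (__this_cpu_read(soft_watchdog_warn) == true) {
            /*
             * A different task is hogging the CPU now: drop the
             * suppression so the new offender gets reported too.
             */
            if (__this_cpu_read(softlockup_task_ptr_saved) != current)
                    __this_cpu_write(soft_watchdog_warn, false);
            return HRTIMER_RESTART;
    }
    /* ... normal warning path follows ... */
    __this_cpu_write(softlockup_task_ptr_saved, current); /* about to warn */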

And Don, if you want me to re-post this patch, please let me know.

thanks
chai wen

 thanks
 chai wen
 
 Cheers,
 Don
 .

 
 
 



-- 
Regards

Chai Wen


Re: LPC IOMMU and VFIO MicroConference - Call for Participation

2014-08-20 Thread Jiang Liu
Hi Alex and Joerg,
I have my travel request approved but missed the registration window.
Hope I will be lucky:)
Regards!
Gerry

On 2014/8/21 1:10, Alex Williamson wrote:
 
 Ok folks, it's time to submit your discussion proposals for the LPC
 IOMMU and VFIO uconf.  If you added an idea to the wiki, now is the time
 to formally propose it as a discussion topic.  If you have ideas how to
 make the IOMMU or VFIO subsystems better, now is the time to propose it.
 If you can't figure out how to make something work in the current
 infrastructure, now is the time to propose a discussion.  If you're
 adding new features and want to make sure we can support them, now is
 the time to propose a discussion.
 
 I don't think we've seen a formal schedule yet, but many of us have
 conflicts with KVM Forum this year and I expect the LPC planning
 committee to take that into account, so please submit your proposals
 anyway and feel free to note your availability/conflicts in the Note to
 organizers section.
 
 LPC is full, but there is a waiting list and the sooner you can get on
 it, the more likely you are to be registered.  I expect uconf discussion
 leads to have an advantage in moving through the queue and we may be
 able to provide discounted registration for discussion leads.  Thanks,
 
 Alex
 
 On Tue, 2014-08-12 at 11:20 +0200, Joerg Roedel wrote:
 LPC IOMMU and VFIO MicroConference - Call for Participation
 ===========================================================

 We are pleased to announce that this year there will be the first IOMMU
 and VFIO MicroConference held at Linux Plumbers Conference in
 Düsseldorf. An initial request for support of this micro conference
 generated, among others, the following possible topic ideas:

  * Improving generic IOMMU code and move code out of drivers
  * IOMMU device error handling
  * IOMMU Power Management
  * Virtualizing IOMMUs
  * Interface between IOMMUs and memory management

 More suggested topics can be found at the wiki page of the micro
 conference:

  http://wiki.linuxplumbersconf.org/2014:iommu_microconference

 We now ask for formal proposals for these discussions along with any
 other topics or problems that need to be discussed in this area.

 The format of the micro conference will be roughly half-hour slots for
 each topic, where the discussion lead gives a short introduction to the
 problem and maybe sketches possible solutions. The rest of the slot is
 open for discussions so that we come to an agreement how to move
 forward.

 Please submit your formal proposal on the Linux Plumbers website (OpenID
 login required) until August 31st at:

  
 http://www.linuxplumbersconf.org/2014/how-to-submit-microconference-discussions-topics/

 Hope to see you in Düsseldorf!


  Joerg Roedel and Alex Williamson

 
 
 


Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-20 Thread Don Zickus
On Thu, Aug 21, 2014 at 09:37:04AM +0800, Chai Wen wrote:
 On 08/19/2014 09:36 AM, Chai Wen wrote:
 
  On 08/19/2014 04:38 AM, Don Zickus wrote:
  
  On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote:
 
  * Don Zickus dzic...@redhat.com wrote:
 
  So I agree with the motivation of this improvement, but 
  is this implementation namespace-safe?
 
  What namespace are you worried about colliding with?  I 
  thought softlockup_ would provide the safety??  Maybe I 
  am missing something obvious. :-(
 
  I meant PID namespaces - a PID in itself isn't guaranteed 
  to be unique across the system.
 
  Ah, I don't think we thought about that.  Is there a better 
  way to do this?  Is there a domain id or something that can 
  be OR'd with the pid?
 
  What is always unique is the task pointer itself. We use pids 
  when we interface with user-space - but we don't really do that 
  here, right?
 
  No, I don't believe so.  Ok, so saving 'current' and comparing that should
  be enough, correct?
 
  
  
  I am not sure of the safety about using pid here with namespace.
  But as to the pointer of process, is there a chance that we got a 
  'historical'
  address saved in the 'softlockup_warn_pid(or address)_saved' and the current
  hogging process happened to get the same task pointer address?
  If it never happens, I think the comparing of address is ok.
  
 
 
 Hi Ingo
 
 what do you think of Don's solution - comparing task pointers?
 Anyway, this is just an additional check for some very special cases,
 so I think the issue I was concerned about above is not a problem at all.
 And after learning some concepts about PID namespaces, I think comparing
 task pointers handles PID namespaces reliably here.
 
 And Don, if you want me to re-post this patch, please let me know.

Sure, just quickly test with the task pointer to make sure it still works
and then re-post.

Cheers,
Don


[PATCH v2 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores

2014-08-20 Thread Mihai Caraman
SPE exception handlers are now defined for 32-bit e500mc cores even though
the SPE unit is not present and CONFIG_SPE is undefined.

Restrict SPE exception handlers to e200/e500 cores by adding
CONFIG_SPE_POSSIBLE, and consequently guard the __setup_ivors and
__setup_cpu functions.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Cc: Scott Wood scottw...@freescale.com
Cc: Alexander Graf ag...@suse.de
---
v2:
 - use CONFIG_PPC_E500MC without CONFIG_E500
 - use elif defined()

 arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++-
 arch/powerpc/kernel/cputable.c|  5 +
 arch/powerpc/kernel/head_fsl_booke.S  | 18 +-
 arch/powerpc/platforms/Kconfig.cputype|  6 +-
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index 4f1393d..dddba3e 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -91,6 +91,7 @@ _GLOBAL(setup_altivec_idle)
 
blr
 
+#ifdef CONFIG_PPC_E500MC
 _GLOBAL(__setup_cpu_e6500)
	mflr    r6
 #ifdef CONFIG_PPC64
@@ -107,14 +108,20 @@ _GLOBAL(__setup_cpu_e6500)
bl  __setup_cpu_e5500
	mtlr    r6
blr
+#endif /* CONFIG_PPC_E500MC */
 
 #ifdef CONFIG_PPC32
+#ifdef CONFIG_E200
 _GLOBAL(__setup_cpu_e200)
/* enable dedicated debug exception handling resources (Debug APU) */
mfspr   r3,SPRN_HID0
ori r3,r3,HID0_DAPUEN@l
mtspr   SPRN_HID0,r3
b   __setup_e200_ivors
+#endif /* CONFIG_E200 */
+
+#ifdef CONFIG_E500
+#ifndef CONFIG_PPC_E500MC
 _GLOBAL(__setup_cpu_e500v1)
 _GLOBAL(__setup_cpu_e500v2)
	mflr    r4
@@ -129,6 +136,7 @@ _GLOBAL(__setup_cpu_e500v2)
 #endif
	mtlr    r4
blr
+#else /* CONFIG_PPC_E500MC */
 _GLOBAL(__setup_cpu_e500mc)
 _GLOBAL(__setup_cpu_e5500)
	mflr    r5
@@ -159,7 +167,9 @@ _GLOBAL(__setup_cpu_e5500)
 2:
	mtlr    r5
blr
-#endif
+#endif /* CONFIG_PPC_E500MC */
+#endif /* CONFIG_E500 */
+#endif /* CONFIG_PPC32 */
 
 #ifdef CONFIG_PPC_BOOK3E_64
 _GLOBAL(__restore_cpu_e6500)
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 0c15764..df979c5f 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2051,6 +2051,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 #endif /* CONFIG_PPC32 */
 #ifdef CONFIG_E500
 #ifdef CONFIG_PPC32
+#ifndef CONFIG_PPC_E500MC
{   /* e500 */
		.pvr_mask   = 0xffff0000,
		.pvr_value  = 0x80200000,
@@ -2090,6 +2091,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check  = machine_check_e500,
.platform   = ppc8548,
},
+#else
{   /* e500mc */
		.pvr_mask   = 0xffff0000,
		.pvr_value  = 0x80230000,
@@ -2108,7 +2110,9 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check  = machine_check_e500mc,
.platform   = ppce500mc,
},
+#endif /* CONFIG_PPC_E500MC */
 #endif /* CONFIG_PPC32 */
+#ifdef CONFIG_PPC_E500MC
{   /* e5500 */
		.pvr_mask   = 0xffff0000,
		.pvr_value  = 0x80240000,
@@ -2152,6 +2156,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
.machine_check  = machine_check_e500mc,
.platform   = ppce6500,
},
+#endif /* CONFIG_PPC_E500MC */
 #ifdef CONFIG_PPC32
{   /* default match */
		.pvr_mask   = 0x00000000,
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index b497188..90f487f 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -613,6 +613,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
mfspr   r10, SPRN_SPRG_RSCRATCH0
b   InstructionStorage
 
+/* Define SPE handlers for e200 and e500v2 */
 #ifdef CONFIG_SPE
/* SPE Unavailable */
START_EXCEPTION(SPEUnavailable)
@@ -622,10 +623,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
b   fast_exception_return
 1: addir3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x2010, KernelSPE)
-#else
+#elif defined(CONFIG_SPE_POSSIBLE)
EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \
  unknown_exception, EXC_XFER_EE)
-#endif /* CONFIG_SPE */
+#endif /* CONFIG_SPE_POSSIBLE */
 
/* SPE Floating Point Data */
 #ifdef CONFIG_SPE
@@ -635,12 +636,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
/* SPE Floating Point Round */
EXCEPTION(0x2050, SPE_FP_ROUND, SPEFloatingPointRound, \
  SPEFloatingPointRoundException, EXC_XFER_EE)
-#else
+#elif defined(CONFIG_SPE_POSSIBLE)
EXCEPTION(0x2040, 

[PATCH v2 2/2] powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers

2014-08-20 Thread Mihai Caraman
The Book3E specification defines shared interrupt numbers for the SPE and
AltiVec units. Still, SPE is present in e200/e500v2 cores while AltiVec is
present in the e6500 core, so we can currently decide at compile time which
unit to support exclusively. As Alexander Graf suggested, this will improve
code readability, especially in KVM.

Use distinct defines to identify SPE/AltiVec interrupt numbers, reverting
c58ce397 and 6b310fc5 patches that added common defines.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Cc: Scott Wood scottw...@freescale.com
Cc: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/exceptions-64e.S | 4 ++--
 arch/powerpc/kernel/head_fsl_booke.S | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index bb9cac6..3e68d1c 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -635,7 +635,7 @@ interrupt_end_book3e:
 
 /* Altivec Unavailable Interrupt */
START_EXCEPTION(altivec_unavailable);
-   NORMAL_EXCEPTION_PROLOG(0x200, BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL,
+   NORMAL_EXCEPTION_PROLOG(0x200, BOOKE_INTERRUPT_ALTIVEC_UNAVAIL,
PROLOG_ADDITION_NONE)
/* we can probably do a shorter exception entry for that one... */
EXCEPTION_COMMON(0x200)
@@ -658,7 +658,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 /* AltiVec Assist */
START_EXCEPTION(altivec_assist);
NORMAL_EXCEPTION_PROLOG(0x220,
-   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST,
+   BOOKE_INTERRUPT_ALTIVEC_ASSIST,
PROLOG_ADDITION_NONE)
EXCEPTION_COMMON(0x220)
INTS_DISABLE
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 90f487f..fffd1f9 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -617,27 +617,27 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 #ifdef CONFIG_SPE
/* SPE Unavailable */
START_EXCEPTION(SPEUnavailable)
-   NORMAL_EXCEPTION_PROLOG(SPE_ALTIVEC_UNAVAIL)
+   NORMAL_EXCEPTION_PROLOG(SPE_UNAVAIL)
beq 1f
bl  load_up_spe
b   fast_exception_return
 1: addir3,r1,STACK_FRAME_OVERHEAD
EXC_XFER_EE_LITE(0x2010, KernelSPE)
 #elif defined(CONFIG_SPE_POSSIBLE)
-   EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \
+   EXCEPTION(0x2020, SPE_UNAVAIL, SPEUnavailable, \
  unknown_exception, EXC_XFER_EE)
 #endif /* CONFIG_SPE_POSSIBLE */
 
/* SPE Floating Point Data */
 #ifdef CONFIG_SPE
-   EXCEPTION(0x2030, SPE_FP_DATA_ALTIVEC_ASSIST, SPEFloatingPointData,
+   EXCEPTION(0x2030, SPE_FP_DATA, SPEFloatingPointData,
  SPEFloatingPointException, EXC_XFER_EE)
 
/* SPE Floating Point Round */
EXCEPTION(0x2050, SPE_FP_ROUND, SPEFloatingPointRound, \
  SPEFloatingPointRoundException, EXC_XFER_EE)
 #elif defined(CONFIG_SPE_POSSIBLE)
-   EXCEPTION(0x2040, SPE_FP_DATA_ALTIVEC_ASSIST, SPEFloatingPointData,
+   EXCEPTION(0x2040, SPE_FP_DATA, SPEFloatingPointData,
  unknown_exception, EXC_XFER_EE)
EXCEPTION(0x2050, SPE_FP_ROUND, SPEFloatingPointRound, \
  unknown_exception, EXC_XFER_EE)
-- 
1.7.11.7



[PATCH v4 0/6] KVM: PPC: Book3e: AltiVec support

2014-08-20 Thread Mihai Caraman
Add KVM Book3e AltiVec support.

Changes:

v4:
 - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC
 - remove SPE handlers from bookehv
 - split ONE_REG powerpc generic and ONE_REG AltiVec
 - add setters for IVPR, IVOR2 and IVOR8
 - add api documentation for ONE_REG IVPR and IVORs
 - don't enable e6500 core since hardware threads are not yet supported

v3:
 - use distinct SPE/AltiVec exception handlers
 - make ONE_REG AltiVec support powerpc generic
 - add ONE_REG IVORs support

 v2:
 - integrate Paul's FP/VMX/VSX changes that landed in kvm-ppc-queue
   in January and take into account feedback

Mihai Caraman (6):
  KVM: PPC: Book3E: Increase FPU laziness
  KVM: PPC: Book3e: Add AltiVec support
  KVM: PPC: Make ONE_REG powerpc generic
  KVM: PPC: Move ONE_REG AltiVec support to powerpc
  KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8
emulation
  KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs

 Documentation/virtual/kvm/api.txt |   7 +
 arch/powerpc/include/uapi/asm/kvm.h   |  30 +++
 arch/powerpc/kvm/book3s.c | 151 --
 arch/powerpc/kvm/booke.c  | 371 --
 arch/powerpc/kvm/booke.h  |  43 +---
 arch/powerpc/kvm/booke_emulate.c  |  15 +-
 arch/powerpc/kvm/bookehv_interrupts.S |   9 +-
 arch/powerpc/kvm/e500.c   |  42 +++-
 arch/powerpc/kvm/e500_emulate.c   |  20 ++
 arch/powerpc/kvm/e500mc.c |  18 +-
 arch/powerpc/kvm/powerpc.c|  97 +
 11 files changed, 576 insertions(+), 227 deletions(-)

-- 
1.7.11.7



[PATCH v4 6/6] KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs

2014-08-20 Thread Mihai Caraman
Add ONE_REG support for IVPR and IVORs registers. Implement IVPR, IVORs 0-15
and 35 in booke common layer.
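
For reference, userspace reaches these through the generic ONE_REG ioctls;
a minimal sketch (vcpu_fd is assumed to be a vcpu file descriptor obtained
via KVM_CREATE_VCPU, and KVM_REG_PPC_IVOR0 comes from the uapi header
below):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Read IVOR0 from a vcpu; returns the ioctl result (0 on success). */
    static int get_ivor0(int vcpu_fd, uint32_t *ivor0)
    {
            struct kvm_one_reg reg = {
                    .id   = KVM_REG_PPC_IVOR0,
                    .addr = (uint64_t)(uintptr_t)ivor0,
            };

            return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
    }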

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - add ONE_REG IVPR
 - use IVPR, IVOR2 and IVOR8 setters
 - add api documentation for ONE_REG IVPR and IVORs

v3:
 - new patch

 Documentation/virtual/kvm/api.txt   |   7 ++
 arch/powerpc/include/uapi/asm/kvm.h |  25 +++
 arch/powerpc/kvm/booke.c| 145 
 arch/powerpc/kvm/e500.c |  42 ++-
 arch/powerpc/kvm/e500mc.c   |  16 
 5 files changed, 233 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index beae3fd..cd7b171 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1917,6 +1917,13 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TM_VSCR   | 32
   PPC   | KVM_REG_PPC_TM_DSCR   | 64
   PPC   | KVM_REG_PPC_TM_TAR| 64
+  PPC   | KVM_REG_PPC_IVPR  | 64
+  PPC   | KVM_REG_PPC_IVOR0 | 32
+  ...
+  PPC   | KVM_REG_PPC_IVOR15| 32
+  PPC   | KVM_REG_PPC_IVOR32| 32
+  ...
+  PPC   | KVM_REG_PPC_IVOR37| 32
 |   |
   MIPS  | KVM_REG_MIPS_R0   | 64
   ...
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index ab4d473..c97f119 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -564,6 +564,31 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_SPRG9  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba)
 #define KVM_REG_PPC_DBSR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb)
 
+/* Booke IVPR & IVOR registers */
+#define KVM_REG_PPC_IVPR   (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc)
+#define KVM_REG_PPC_IVOR0  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbd)
+#define KVM_REG_PPC_IVOR1  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbe)
+#define KVM_REG_PPC_IVOR2  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf)
+#define KVM_REG_PPC_IVOR3  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc0)
+#define KVM_REG_PPC_IVOR4  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc1)
+#define KVM_REG_PPC_IVOR5  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc2)
+#define KVM_REG_PPC_IVOR6  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc3)
+#define KVM_REG_PPC_IVOR7  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc4)
+#define KVM_REG_PPC_IVOR8  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc5)
+#define KVM_REG_PPC_IVOR9  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc6)
+#define KVM_REG_PPC_IVOR10 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc7)
+#define KVM_REG_PPC_IVOR11 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc8)
+#define KVM_REG_PPC_IVOR12 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc9)
+#define KVM_REG_PPC_IVOR13 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xca)
+#define KVM_REG_PPC_IVOR14 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcb)
+#define KVM_REG_PPC_IVOR15 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcc)
+#define KVM_REG_PPC_IVOR32 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcd)
+#define KVM_REG_PPC_IVOR33 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xce)
+#define KVM_REG_PPC_IVOR34 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcf)
+#define KVM_REG_PPC_IVOR35 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd0)
+#define KVM_REG_PPC_IVOR36 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd1)
+#define KVM_REG_PPC_IVOR37 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd2)
+
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
  */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index d4df648..1cb2a2a 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1570,6 +1570,75 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
int r = 0;
 
switch (id) {
+   case KVM_REG_PPC_IVPR:
+   *val = get_reg_val(id, vcpu->arch.ivpr);
+   break;
+   case KVM_REG_PPC_IVOR0:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_CRITICAL]);
+   break;
+   case KVM_REG_PPC_IVOR1:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK]);
+   break;
+   case KVM_REG_PPC_IVOR2:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE]);
+   break;
+   case KVM_REG_PPC_IVOR3:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_INST_STORAGE]);
+   break;
+   case KVM_REG_PPC_IVOR4:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_EXTERNAL]);
+   break;
+   case KVM_REG_PPC_IVOR5:
+   *val = get_reg_val(id,
+   vcpu->arch.ivor[BOOKE_IRQPRIO_ALIGNMENT]);
+   break;
+   case 

[PATCH v4 3/6] KVM: PPC: Make ONE_REG powerpc generic

2014-08-20 Thread Mihai Caraman
Make ONE_REG generic for server and embedded architectures by moving the
kvm_vcpu_ioctl_get_one_reg() and kvm_vcpu_ioctl_set_one_reg() functions
to the powerpc layer.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - split ONE_REG powerpc generic and ONE_REG AltiVec

v3:
 - make ONE_REG AltiVec support powerpc generic

v2:
 - add comment describing VCSR register representation in KVM vs kernel

 arch/powerpc/kvm/book3s.c  | 121 +++--
 arch/powerpc/kvm/booke.c   |  91 +-
 arch/powerpc/kvm/powerpc.c |  55 +
 3 files changed, 138 insertions(+), 129 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index dd03f6b..26868e2 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -535,33 +535,28 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
return -ENOTSUPP;
 }
 
-int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
+int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
+   union kvmppc_one_reg *val)
 {
-   int r;
-   union kvmppc_one_reg val;
-   int size;
+   int r = 0;
long int i;
 
-   size = one_reg_size(reg->id);
-   if (size > sizeof(val))
-   return -EINVAL;
-
-   r = vcpu->kvm->arch.kvm_ops->get_one_reg(vcpu, reg->id, &val);
+   r = vcpu->kvm->arch.kvm_ops->get_one_reg(vcpu, id, val);
	if (r == -EINVAL) {
	r = 0;
-   switch (reg->id) {
+   switch (id) {
	case KVM_REG_PPC_DAR:
-   val = get_reg_val(reg->id, kvmppc_get_dar(vcpu));
+   *val = get_reg_val(id, kvmppc_get_dar(vcpu));
	break;
	case KVM_REG_PPC_DSISR:
-   val = get_reg_val(reg->id, kvmppc_get_dsisr(vcpu));
+   *val = get_reg_val(id, kvmppc_get_dsisr(vcpu));
	break;
	case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
-   i = reg->id - KVM_REG_PPC_FPR0;
-   val = get_reg_val(reg->id, VCPU_FPR(vcpu, i));
+   i = id - KVM_REG_PPC_FPR0;
+   *val = get_reg_val(id, VCPU_FPR(vcpu, i));
	break;
	case KVM_REG_PPC_FPSCR:
-   val = get_reg_val(reg->id, vcpu->arch.fp.fpscr);
+   *val = get_reg_val(id, vcpu->arch.fp.fpscr);
break;
 #ifdef CONFIG_ALTIVEC
case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
@@ -569,110 +564,94 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
r = -ENXIO;
break;
}
-   val.vval = vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0];
+   val->vval = vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0];
	break;
	case KVM_REG_PPC_VSCR:
	if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
	r = -ENXIO;
	break;
	}
-   val = get_reg_val(reg->id, vcpu->arch.vr.vscr.u[3]);
+   *val = get_reg_val(id, vcpu->arch.vr.vscr.u[3]);
	break;
	case KVM_REG_PPC_VRSAVE:
-   val = get_reg_val(reg->id, vcpu->arch.vrsave);
+   *val = get_reg_val(id, vcpu->arch.vrsave);
break;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
if (cpu_has_feature(CPU_FTR_VSX)) {
-   long int i = reg->id - KVM_REG_PPC_VSR0;
-   val.vsxval[0] = vcpu->arch.fp.fpr[i][0];
-   val.vsxval[1] = vcpu->arch.fp.fpr[i][1];
+   i = id - KVM_REG_PPC_VSR0;
+   val->vsxval[0] = vcpu->arch.fp.fpr[i][0];
+   val->vsxval[1] = vcpu->arch.fp.fpr[i][1];
} else {
r = -ENXIO;
}
break;
 #endif /* CONFIG_VSX */
-   case KVM_REG_PPC_DEBUG_INST: {
-   u32 opcode = INS_TW;
-   r = copy_to_user((u32 __user *)(long)reg->addr,
- &opcode, sizeof(u32));
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, INS_TW);
break;
-   }
 #ifdef CONFIG_KVM_XICS
case KVM_REG_PPC_ICP_STATE:
	if (!vcpu->arch.icp) {
r = -ENXIO;
break;
}

[PATCH v4 4/6] KVM: PPC: Move ONE_REG AltiVec support to powerpc

2014-08-20 Thread Mihai Caraman
Move ONE_REG AltiVec support to powerpc generic layer.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - split ONE_REG powerpc generic and ONE_REG AltiVec

v3:
 - make ONE_REG AltiVec support powerpc generic

v2:
 - add comment describing VCSR register representation in KVM vs kernel

 arch/powerpc/include/uapi/asm/kvm.h |  5 +
 arch/powerpc/kvm/book3s.c   | 42 -
 arch/powerpc/kvm/powerpc.c  | 42 +
 3 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 3ca357a..ab4d473 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -476,6 +476,11 @@ struct kvm_get_htab_header {
 
 /* FP and vector status/control registers */
 #define KVM_REG_PPC_FPSCR  (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x80)
+/*
+ * VSCR register is documented as a 32-bit register in the ISA, but it can
+ * only be accessed via a vector register. Expose VSCR as a 32-bit register
+ * even though the kernel represents it as a 128-bit vector.
+ */
 #define KVM_REG_PPC_VSCR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x81)
 
 /* Virtual processor areas */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 26868e2..1b5adda 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -558,25 +558,6 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_FPSCR:
	*val = get_reg_val(id, vcpu->arch.fp.fpscr);
break;
-#ifdef CONFIG_ALTIVEC
-   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   val->vval = vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0];
-   break;
-   case KVM_REG_PPC_VSCR:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   *val = get_reg_val(id, vcpu->arch.vr.vscr.u[3]);
-   break;
-   case KVM_REG_PPC_VRSAVE:
-   *val = get_reg_val(id, vcpu->arch.vrsave);
-   break;
-#endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
if (cpu_has_feature(CPU_FTR_VSX)) {
@@ -653,29 +634,6 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_FPSCR:
	vcpu->arch.fp.fpscr = set_reg_val(id, *val);
break;
-#ifdef CONFIG_ALTIVEC
-   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0] = val->vval;
-   break;
-   case KVM_REG_PPC_VSCR:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   vcpu->arch.vr.vscr.u[3] = set_reg_val(id, *val);
-   break;
-   case KVM_REG_PPC_VRSAVE:
-   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
-   r = -ENXIO;
-   break;
-   }
-   vcpu->arch.vrsave = set_reg_val(id, *val);
-   break;
-#endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
if (cpu_has_feature(CPU_FTR_VSX)) {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1326116..19d4755 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -941,6 +941,25 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
if (r == -EINVAL) {
r = 0;
		switch (reg->id) {
+#ifdef CONFIG_ALTIVEC
+   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   val.vval = vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0];
+   break;
+   case KVM_REG_PPC_VSCR:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   val = get_reg_val(reg->id, 

[PATCH v4 2/6] KVM: PPC: Book3e: Add AltiVec support

2014-08-20 Thread Mihai Caraman
Add AltiVec support in KVM for Book3e. FPU support gracefully reuses the
host infrastructure, so follow the same approach for AltiVec.

Book3e specification defines shared interrupt numbers for SPE and AltiVec
units. Still SPE is present in e200/e500v2 cores while AltiVec is present in
e6500 core. So we can currently decide at compile-time which of the SPE or
AltiVec units to support exclusively by using CONFIG_SPE_POSSIBLE and
CONFIG_PPC_E500MC defines. As Alexander Graf suggested, keep SPE and AltiVec
exception handlers distinct to improve code readability.

Guests have the privilege to enable AltiVec, so we always need to support
AltiVec in KVM, and implicitly in the host, to reflect interrupts and to
save/restore the unit context. KVM will be loaded on cores with an AltiVec
unit only if CONFIG_ALTIVEC is defined. Use this define to guard KVM's
AltiVec logic.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC
 - remove SPE handlers from bookehv
 - update commit message

v3:
 - use distinct SPE/AltiVec exception handlers

v2:
 - integrate Paul's FP/VMX/VSX changes

 arch/powerpc/kvm/booke.c  | 74 ++-
 arch/powerpc/kvm/booke.h  |  6 +++
 arch/powerpc/kvm/bookehv_interrupts.S |  9 +
 arch/powerpc/kvm/e500_emulate.c   | 20 ++
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 91e7217..8ace612 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -168,6 +168,40 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }
 
+/*
+ * Simulate an AltiVec unavailable fault to load guest state
+ * from the thread into the AltiVec unit.
+ * It must be called with preemption disabled.
+ */
+static inline void kvmppc_load_guest_altivec(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   if (!(current->thread.regs->msr & MSR_VEC)) {
+   enable_kernel_altivec();
+   load_vr_state(&vcpu->arch.vr);
+   current->thread.vr_save_area = &vcpu->arch.vr;
+   current->thread.regs->msr |= MSR_VEC;
+   }
+   }
+#endif
+}
+
+/*
+ * Save guest vcpu AltiVec state into the thread.
+ * It must be called with preemption disabled.
+ */
+static inline void kvmppc_save_guest_altivec(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   if (current->thread.regs->msr & MSR_VEC)
+   giveup_altivec(current);
+   current->thread.vr_save_area = NULL;
+   }
+#endif
+}
+
 static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
 {
/* Synchronize guest's desire to get debug interrupts into shadow MSR */
@@ -375,9 +409,15 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
case BOOKE_IRQPRIO_ITLB_MISS:
case BOOKE_IRQPRIO_SYSCALL:
case BOOKE_IRQPRIO_FP_UNAVAIL:
+#ifdef CONFIG_SPE_POSSIBLE
case BOOKE_IRQPRIO_SPE_UNAVAIL:
case BOOKE_IRQPRIO_SPE_FP_DATA:
case BOOKE_IRQPRIO_SPE_FP_ROUND:
+#endif
+#ifdef CONFIG_ALTIVEC
+   case BOOKE_IRQPRIO_ALTIVEC_UNAVAIL:
+   case BOOKE_IRQPRIO_ALTIVEC_ASSIST:
+#endif
case BOOKE_IRQPRIO_AP_UNAVAIL:
allowed = 1;
msr_mask = MSR_CE | MSR_ME | MSR_DE;
@@ -697,6 +737,17 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
kvmppc_load_guest_fp(vcpu);
 #endif
 
+#ifdef CONFIG_ALTIVEC
+   /* Save userspace AltiVec state on the stack */
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
+   enable_kernel_altivec();
+   /*
+* Since we can't trap on MSR_VEC in GS-mode, we consider the guest
+* as always using AltiVec.
+*/
+   kvmppc_load_guest_altivec(vcpu);
+#endif
+
/* Switch to guest debug context */
	debug = vcpu->arch.dbg_reg;
	switch_booke_debug_regs(&debug);
@@ -719,6 +770,10 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
kvmppc_save_guest_fp(vcpu);
 #endif
 
+#ifdef CONFIG_ALTIVEC
+   kvmppc_save_guest_altivec(vcpu);
+#endif
+
 out:
	vcpu->mode = OUTSIDE_GUEST_MODE;
return ret;
@@ -1025,7 +1080,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND);
r = RESUME_GUEST;
break;
-#else
+#elif defined(CONFIG_SPE_POSSIBLE)
case BOOKE_INTERRUPT_SPE_UNAVAIL:
/*
 * Guest wants SPE, but host kernel doesn't support it.  Send
@@ -1046,6 +1101,22 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
run-hw.hardware_exit_reason = exit_nr;
r = RESUME_HOST;
break;
+#endif /* CONFIG_SPE_POSSIBLE 

[PATCH v4 1/6] KVM: PPC: Book3E: Increase FPU laziness

2014-08-20 Thread Mihai Caraman
Increase FPU laziness by loading the guest state into the unit before
entering the guest, instead of doing it on each vcpu schedule. Without this
improvement, an interrupt may claim the floating point unit, corrupting
guest state.
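
(To make the race concrete: previously the guest FP registers were loaded
at vcpu_load() time, so any host code that grabbed the FPU between then and
the actual guest entry, e.g. via enable_kernel_fp(), could clobber the
loaded values. Loading right before entry, after interrupts are
hard-disabled, closes that window.)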

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - update commit message

v3:
 - no changes

v2:
 - remove fpu_active
 - add descriptive comments

 arch/powerpc/kvm/booke.c  | 43 ---
 arch/powerpc/kvm/booke.h  | 34 --
 arch/powerpc/kvm/e500mc.c |  2 --
 3 files changed, 36 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 074b7fc..91e7217 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -124,6 +124,40 @@ static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
 }
 #endif
 
+/*
+ * Load up guest vcpu FP state if it's needed.
+ * It also sets MSR_FP in the thread so that the host knows
+ * we're holding the FPU, and the host can then help to save
+ * the guest vcpu FP state if other threads require the FPU.
+ * This simulates an FP unavailable fault.
+ *
+ * It must be called with preemption disabled.
+ */
+static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_PPC_FPU
+   if (!(current->thread.regs->msr & MSR_FP)) {
+   enable_kernel_fp();
+   load_fp_state(&vcpu->arch.fp);
+   current->thread.fp_save_area = &vcpu->arch.fp;
+   current->thread.regs->msr |= MSR_FP;
+   }
+#endif
+}
+
+/*
+ * Save guest vcpu FP state into the thread.
+ * It must be called with preemption disabled.
+ */
+static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_PPC_FPU
+   if (current->thread.regs->msr & MSR_FP)
+   giveup_fpu(current);
+   current->thread.fp_save_area = NULL;
+#endif
+}
+
 static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 {
 #if defined(CONFIG_PPC_FPU) && !defined(CONFIG_KVM_BOOKE_HV)
@@ -658,12 +692,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 
/*
 * Since we can't trap on MSR_FP in GS-mode, we consider the guest
-* as always using the FPU.  Kernel usage of FP (via
-* enable_kernel_fp()) in this thread must not occur while
-* vcpu->fpu_active is set.
+* as always using the FPU.
 */
-   vcpu->fpu_active = 1;
-
kvmppc_load_guest_fp(vcpu);
 #endif
 
@@ -687,8 +717,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 
 #ifdef CONFIG_PPC_FPU
kvmppc_save_guest_fp(vcpu);
-
-   vcpu->fpu_active = 0;
 #endif
 
 out:
@@ -1194,6 +1222,7 @@ out:
else {
/* interrupts now hard-disabled */
kvmppc_fix_ee_before_entry();
+   kvmppc_load_guest_fp(vcpu);
}
}
 
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index f753543..e73d513 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -116,40 +116,6 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn,
 extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn,
  ulong *spr_val);
 
-/*
- * Load up guest vcpu FP state if it's needed.
- * It also set the MSR_FP in thread so that host know
- * we're holding FPU, and then host can help to save
- * guest vcpu FP state if other threads require to use FPU.
- * This simulates an FP unavailable fault.
- *
- * It requires to be called with preemption disabled.
- */
-static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_PPC_FPU
-   if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) {
-   enable_kernel_fp();
-   load_fp_state(&vcpu->arch.fp);
-   current->thread.fp_save_area = &vcpu->arch.fp;
-   current->thread.regs->msr |= MSR_FP;
-   }
-#endif
-}
-
-/*
- * Save guest vcpu FP state into thread.
- * It requires to be called with preemption disabled.
- */
-static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_PPC_FPU
-   if (vcpu->fpu_active && (current->thread.regs->msr & MSR_FP))
-   giveup_fpu(current);
-   current->thread.fp_save_area = NULL;
-#endif
-}
-
 static inline void kvmppc_clear_dbsr(void)
 {
mtspr(SPRN_DBSR, mfspr(SPRN_DBSR));
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 000cf82..4549349 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -145,8 +145,6 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu)
kvmppc_e500_tlbil_all(vcpu_e500);
	__get_cpu_var(last_vcpu_of_lpid)[vcpu->kvm->arch.lpid] = vcpu;
}
-
-   kvmppc_load_guest_fp(vcpu);
 }
 
 static void kvmppc_core_vcpu_put_e500mc(struct kvm_vcpu *vcpu)
-- 
1.7.11.7


[PATCH v4 5/6] KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8 emulation

2014-08-20 Thread Mihai Caraman
Add setter functions for IVPR, IVOR2 and IVOR8 emulation in preparation
for ONE_REG support.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v4:
 - new patch
 - add api documentation for ONE_REG IVPR and IVORs

 arch/powerpc/kvm/booke.c | 24 
 arch/powerpc/kvm/booke.h |  3 +++
 arch/powerpc/kvm/booke_emulate.c | 15 +++
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 831c1b4..d4df648 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1782,6 +1782,30 @@ void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits)
update_timer_ints(vcpu);
 }
 
+void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr)
+{
+   vcpu->arch.ivpr = new_ivpr;
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GIVPR, new_ivpr);
+#endif
+}
+
+void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor)
+{
+   vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = new_ivor;
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GIVOR2, new_ivor);
+#endif
+}
+
+void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor)
+{
+   vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = new_ivor;
+#ifdef CONFIG_KVM_BOOKE_HV
+   mtspr(SPRN_GIVOR8, new_ivor);
+#endif
+}
+
 void kvmppc_decrementer_func(unsigned long data)
 {
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 22ba08e..0242530 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -80,6 +80,9 @@ void kvmppc_set_epcr(struct kvm_vcpu *vcpu, u32 new_epcr);
 void kvmppc_set_tcr(struct kvm_vcpu *vcpu, u32 new_tcr);
 void kvmppc_set_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits);
 void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits);
+void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr);
+void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor);
+void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor);
 
 int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 unsigned int inst, int *advance);
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index 92bc668..94c64e3 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -191,10 +191,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
break;
 
case SPRN_IVPR:
-   vcpu->arch.ivpr = spr_val;
-#ifdef CONFIG_KVM_BOOKE_HV
-   mtspr(SPRN_GIVPR, spr_val);
-#endif
+   kvmppc_set_ivpr(vcpu, spr_val);
break;
case SPRN_IVOR0:
	vcpu->arch.ivor[BOOKE_IRQPRIO_CRITICAL] = spr_val;
@@ -203,10 +200,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
	vcpu->arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK] = spr_val;
break;
case SPRN_IVOR2:
-   vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = spr_val;
-#ifdef CONFIG_KVM_BOOKE_HV
-   mtspr(SPRN_GIVOR2, spr_val);
-#endif
+   kvmppc_set_ivor2(vcpu, spr_val);
break;
case SPRN_IVOR3:
	vcpu->arch.ivor[BOOKE_IRQPRIO_INST_STORAGE] = spr_val;
@@ -224,10 +218,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
	vcpu->arch.ivor[BOOKE_IRQPRIO_FP_UNAVAIL] = spr_val;
break;
case SPRN_IVOR8:
-   vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = spr_val;
-#ifdef CONFIG_KVM_BOOKE_HV
-   mtspr(SPRN_GIVOR8, spr_val);
-#endif
+   kvmppc_set_ivor8(vcpu, spr_val);
break;
case SPRN_IVOR9:
	vcpu->arch.ivor[BOOKE_IRQPRIO_AP_UNAVAIL] = spr_val;
-- 
1.7.11.7
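
Since these setters exist to prepare for ONE_REG, it is worth recalling how
userspace will eventually reach them: through the KVM_GET_ONE_REG and
KVM_SET_ONE_REG vcpu ioctls, which take a register ID plus a pointer to the
value. A minimal userspace sketch; the IVPR register ID below is a
placeholder, since the real uapi constant is added by the ONE_REG patch in
this series:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholder only: the real register ID lands in the uapi headers
 * with the ONE_REG patch; this value is not meaningful. */
#define EXAMPLE_REG_PPC_IVPR 0x0ULL

static int set_ivpr(int vcpu_fd, uint64_t ivpr)
{
	struct kvm_one_reg reg = {
		.id   = EXAMPLE_REG_PPC_IVPR,
		.addr = (uint64_t)(uintptr_t)&ivpr, /* kernel reads the value here */
	};

	/* On the kernel side this path ends in kvmppc_set_ivpr(), which
	 * also mirrors the value into GIVPR on BOOKE_HV cores. */
	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}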




Re: [PATCH v2 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores

2014-08-20 Thread Scott Wood
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote:
 SPE exception handlers are now defined for 32-bit e500mc cores even though
 the SPE unit is not present and CONFIG_SPE is undefined.
 
 Restrict the SPE exception handlers to e200/e500 cores by adding
 CONFIG_SPE_POSSIBLE, and consequently guard the __setup_ivors and
 __setup_cpu functions.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 Cc: Scott Wood scottw...@freescale.com
 Cc: Alexander Graf ag...@suse.de
 ---
 v2:
  - use CONFIG_PPC_E500MC without CONFIG_E500
  - use elif defined()
 
  arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++-
  arch/powerpc/kernel/cputable.c|  5 +
  arch/powerpc/kernel/head_fsl_booke.S  | 18 +-
  arch/powerpc/platforms/Kconfig.cputype|  6 +-
  4 files changed, 34 insertions(+), 7 deletions(-)

Acked-by: Scott Wood scottw...@freescale.com

-Scott
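
The mechanics here are simple: CONFIG_SPE_POSSIBLE is true exactly on cores
that can carry an SPE unit, so SPE-only code compiles away on e500mc and
later. The patch applies the guard to the assembly exception handlers and the
__setup_ivors/__setup_cpu hooks; the following C sketch is illustrative only,
showing the compile-time split rather than the patch's actual code:

/* Illustrative stub pattern under CONFIG_SPE_POSSIBLE; the function
 * name is made up for this sketch. */
#ifdef CONFIG_SPE_POSSIBLE
static void setup_spe_ivors(void)
{
	/* e200/e500v2: program the SPE unavailable/FP data/FP round IVORs */
}
#else
static inline void setup_spe_ivors(void) { }  /* no SPE unit: no-op */
#endif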




Re: [PATCH v2 2/2] powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers

2014-08-20 Thread Scott Wood
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote:
 The Book3E specification defines shared interrupt numbers for the SPE and
 AltiVec units. However, SPE is present in e200/e500v2 cores while AltiVec is
 present in the e6500 core, so we can currently decide at compile time which
 unit to support exclusively. As Alexander Graf suggested, this will improve
 code readability, especially in KVM.
 
 Use distinct defines to identify the SPE/AltiVec interrupt numbers, reverting
 commits c58ce397 and 6b310fc5, which added the common defines.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 Cc: Scott Wood scottw...@freescale.com
 Cc: Alexander Graf ag...@suse.de
 ---
  arch/powerpc/kernel/exceptions-64e.S | 4 ++--
  arch/powerpc/kernel/head_fsl_booke.S | 8 
  2 files changed, 6 insertions(+), 6 deletions(-)

Acked-by: Scott Wood scottw...@freescale.com

-Scott
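
To make the readability argument concrete: both units use the same Book3E
interrupt numbers, so a shared define tells the reader nothing about which
unit a given exit path handles, whereas one define per unit makes the intent
explicit at the use site. Since a core has at most one of the two units, only
one branch is compiled in practice. A sketch of the idea; the numeric values
and the helper are illustrative, and the real constants live in the headers
touched by the patch:

/* Distinct names for the shared Book3E interrupt number (values
 * illustrative; see the patched headers for the real ones). */
#define BOOKE_INTERRUPT_SPE_UNAVAIL      32  /* e200/e500v2 cores */
#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL  32  /* e6500 core, same number */

static const char *unavail_unit(unsigned int intr)
{
#ifdef CONFIG_SPE_POSSIBLE
	if (intr == BOOKE_INTERRUPT_SPE_UNAVAIL)
		return "SPE";
#endif
#ifdef CONFIG_ALTIVEC
	if (intr == BOOKE_INTERRUPT_ALTIVEC_UNAVAIL)
		return "AltiVec";
#endif
	return "none";
}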

