date:20141210

Re: [PATCH] KVM: x86: nVMX: support for MSR loading/storing

2014-12-10 Thread Eugene Korenevsky

 GCC doesn't warn that "((u32)e->index >> 24) == 0x800" is always false?
 I think SDM says '(e->index >> 8) == 0x8'.

Missed that. Thank you.
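
For readers following the thread, here is a minimal standalone sketch of why the posted comparison can never be true. The helper names are hypothetical and are not taken from the patch; only the two expressions come from the review comment. A 32-bit value shifted right by 24 keeps just its top 8 bits, so it cannot exceed 0xff, whereas shifting by 8 and comparing with 0x8 matches indexes 0x800 through 0x8ff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not code from the patch under review.
 * As posted: (index >> 24) is at most 0xff, so it never equals 0x800.
 */
static bool msr_check_as_posted(uint32_t index)
{
	return (index >> 24) == 0x800;	/* always false */
}

/* As suggested in the review: true exactly for 0x800..0x8ff. */
static bool msr_check_as_suggested(uint32_t index)
{
	return (index >> 8) == 0x8;
}
```

With these two helpers, index 0x800 is rejected by the first form and accepted by the second.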


-- 
Eugene
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: nVMX: support for MSR loading/storing

2014-12-10 Thread Eugene Korenevsky

 Hi, Eugene, is it okay to split my part up?

I think the patch is atomic. I have no idea how it could be split
without breaking its integrity.
You are a co-author of the patch, since your ideas make up a significant part of it.


-- 
Eugene

Re: [PATCH] KVM: x86: nVMX: support for MSR loading/storing

2014-12-10 Thread Eugene Korenevsky

Will send fixed patch this evening.

-- 
Eugene

Re: Windows 7 VM BSOD

2014-12-10 Thread Vadim Rozenfeld

On Wed, 2014-12-10 at 15:42 +0800, Thomas Lau wrote:
 I briefly tested Penryn and Westmere. The bug can still be reproduced.
 

There should be four parameters printed on the screen, right below
the error code string. Could you please post them?

 How can I set level, model and enforce in libvirt? I could also
 test it if you could tell me how to add those options in libvirtd.

Sorry, have no idea how to deal with libvirt.

 
 On Wed, Dec 10, 2014 at 2:19 PM, Vadim Rozenfeld vroze...@redhat.com wrote:
  On Wed, 2014-12-10 at 08:51 +0800, t...@tetrioncapital.com wrote:
  Hi,
 
  Anything you want me to try on my side?
 
  There is an open bug in bugzilla which looks
  pretty similar to your problem
  https://bugzilla.redhat.com/show_bug.cgi?id=1139928
 
  Please take a look at comment #18 posted by Eduardo
   https://bugzilla.redhat.com/show_bug.cgi?id=1139928#c18
 
  Best regards,
  Vadim.
 
  Sent from my BlackBerry 10 smartphone.
 
  Thomas Lau
  Director of Infrastructure
  Tetrion Capital Limited
 
  Direct: +852-3976-8903
  Mobile: +852-9323-9670
  Address: 20/F, IFC 1, Central district, Hong Kong
Original Message
  From: Thomas Lau
  Sent: Tuesday, 9 December, 2014 4:24 PM
  To: Vadim Rozenfeld
  Cc: Zhang Haoyu; kvm; imammedo
  Subject: Re: Windows 7 VM BSOD
 
  Hi Vadim,
 
  Now turning on is OK somehow, shutdown still stuck.
 
  On Tue, Dec 9, 2014 at 4:03 PM, Vadim Rozenfeld vroze...@redhat.com 
  wrote:
   On Tue, 2014-12-09 at 15:54 +0800, Thomas Lau wrote:
   I changed the CPU type to Westmere; it boots up with a 0x5C BSOD.
  
   There should be four parameters printed on the screen, right below
   the error code string. Could you please post them?
  
   Vadim.
  
  
   On Tue, Dec 9, 2014 at 3:10 PM, Vadim Rozenfeld vroze...@redhat.com 
   wrote:
On Tue, 2014-12-09 at 11:54 +0800, Thomas Lau wrote:
Hi Vadim,
   
I want to quote back to your original post back in early 2014:
https://www.mail-archive.com/kvm@vger.kernel.org/msg99782.html
   
   
According to 
http://msdn.microsoft.com/en-us/library/windows/hardware/ff559069(v=vs.85).aspx
the 0x5C means HAL_INITIALIZATION_FAILED
   
The problem matches exactly: I am using an IvyBridge-EP CPU and I get the
same BSOD as well.
   
Some CPU flags (feature bits) should be missing.
Can you try changing cpu type?
   
Best regards,
Vadim.
   
   
Are we missing some hyperv feature?
   
On Wed, Dec 3, 2014 at 7:29 PM, Vadim Rozenfeld 
vroze...@redhat.com wrote:
 If you run WS2008(R2) or Win7, always turn on relaxed timing. Otherwise
 it's just a matter of time before you hit a 101 BSOD.
 Bugcheck 78 is quite a rare one. What is your setup, and how easily can
 it be reproduced?

 Best regards,
 Vadim.

 On Wed, 2014-12-03 at 19:13 +0800, Thomas Lau wrote:
 "It works on your side" meaning that you had this issue, but afterwards
 it was fixed by applying hv_relaxed?

 On Wed, Dec 3, 2014 at 7:08 PM, Zhang Haoyu zhan...@sangfor.com 
 wrote:
  https://bugzilla.redhat.com/show_bug.cgi?id=893857
 
  In fact I am testing it now, but are we fixing one problem and
  introducing another?!
 
  I'm not sure about this, but it works on my side,
  I think BSOD(error:0x0078) has been fixed,
  please show your environment.
 
  Thanks,
  Zhang Haoyu
  On Wed, Dec 3, 2014 at 6:36 PM, Thomas Lau 
  t...@tetrioncapital.com wrote:
   Hi,
  
   How do I know if my qemu-kvm version supports this?
  
   On Wed, Dec 3, 2014 at 6:25 PM, Zhang Haoyu 
   zhan...@sangfor.com wrote:
   Hi All,
  
   I am running the 3.13.0-24-generic kernel on Ubuntu 14. The Windows 7 VM
   installation was fine, but the VM randomly reboots by itself with error
   code 0x0101. Does anyone know how to fix this?
   Could you try hv_relaxed, like -cpu kvm64,hv_relaxed.
  
   Thanks,
   Zhang Haoyu
 

[PATCH 0/2] KVM: x86: Emulator fixes for VM86

2014-12-10 Thread Nadav Amit

Two minor fixes for emulation of instructions on VM86.

Thanks for reviewing them.

Nadav Amit (2):
  KVM: x86: Do not push eflags.vm on pushf
  KVM: x86: Emulate should check #UD before #GP

 arch/x86/kvm/emulate.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

-- 
1.9.1


[PATCH 2/2] KVM: x86: Emulate should check #UD before #GP

2014-12-10 Thread Nadav Amit

Intel SDM table 6-2 (Priority Among Simultaneous Exceptions and Interrupts)
shows that faults from decoding the next instruction have higher priority than
general protection faults.  Move the protected-mode check before the CPL check to
avoid raising the wrong exception in VM86 mode.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5cd5401..0d42aca 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4803,6 +4803,12 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 				goto done;
 		}
 
+		/* Instruction can only be executed in protected mode */
+		if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
+			rc = emulate_ud(ctxt);
+			goto done;
+		}
+
 		/* Privileged instruction can be executed only in CPL=0 */
 		if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
 			if (ctxt->d & PrivUD)
@@ -4812,12 +4818,6 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
 			goto done;
 		}
 
-		/* Instruction can only be executed in protected mode */
-		if ((ctxt->d & Prot) && ctxt->mode < X86EMUL_MODE_PROT16) {
-			rc = emulate_ud(ctxt);
-			goto done;
-		}
-
 		/* Do instruction specific permission checks */
 		if (ctxt->d & CheckPerm) {
 			rc = ctxt->check_perm(ctxt);
-- 
1.9.1
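
The effect of the reordering in this patch can be illustrated with a simplified standalone model. This is only a sketch under stated assumptions: the type and function names are invented for illustration, the model ignores the PrivUD case, and it is not the emulator's code. It shows that an instruction which is both protected-mode-only and privileged gets #UD rather than #GP when executed outside protected mode at CPL 3, which is the VM86 situation.

```c
#include <stdbool.h>

/* Simplified model; all names are illustrative, not the emulator's own. */
enum fault { FAULT_NONE, FAULT_UD, FAULT_GP };

struct insn_flags {
	bool prot;	/* valid in protected mode only */
	bool priv;	/* requires CPL 0 */
};

/*
 * Check order after the patch: the mode check (#UD) runs before the
 * privilege check (#GP).
 */
static enum fault check_insn(struct insn_flags f, bool prot_mode, int cpl)
{
	if (f.prot && !prot_mode)
		return FAULT_UD;
	if (f.priv && cpl != 0)
		return FAULT_GP;
	return FAULT_NONE;
}
```

Swapping the two `if` statements in the model gives the pre-patch behaviour, where the same instruction would report #GP first.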


[PATCH 1/2] KVM: x86: Do not push eflags.vm on pushf

2014-12-10 Thread Nadav Amit

The pushf instruction does not push eflags.VM, so emulation should not do so
either.  Although eflags.RF should not be pushed either, it is already cleared
by the time pushf is executed.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 33ecfcf..5cd5401 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1863,7 +1863,7 @@ static int em_pusha(struct x86_emulate_ctxt *ctxt)
 
 static int em_pushf(struct x86_emulate_ctxt *ctxt)
 {
-	ctxt->src.val =  (unsigned long)ctxt->eflags;
+	ctxt->src.val = (unsigned long)ctxt->eflags & ~EFLG_VM;
 	return em_push(ctxt);
 }
 
-- 
1.9.1
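
As a small standalone illustration of the masking in this patch, the sketch below clears the VM bit from a flags value before it would be pushed. The macro and function names are made up for this example (the kernel uses EFLG_VM); the only hardware fact assumed is that EFLAGS.VM is bit 17.

```c
#include <stdint.h>

/* EFLAGS.VM is bit 17 of the flags register; the name is local to this sketch. */
#define SKETCH_EFLAGS_VM	(UINT32_C(1) << 17)

/* Value that an emulated pushf would place on the stack. */
static uint32_t pushf_image(uint32_t eflags)
{
	return eflags & ~SKETCH_EFLAGS_VM;
}
```

A flags value with VM set loses only that bit, and a value without VM set passes through unchanged.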


Re: [PATCH] KVM: x86: nVMX: support for MSR loading/storing

2014-12-10 Thread Wanpeng Li

Hi all,
On Wed, Dec 10, 2014 at 08:07:45AM -0100, Eugene Korenevsky wrote:
 Hi, Eugene, is it okay to split my part up?

I think the patch is atomic. I have no idea how it could be split
without breaking its integrity.
You are a co-author of the patch, since your ideas make up a significant part of it.


Since Wincy sent out his patch before you, I would prefer that he send out a newer
version which fixes the issues in his own patch; then you can send out another
enhanced one based on Wincy's work.

Regards,
Wanpeng Li 


-- 
Eugene
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: nVMX: support for MSR loading/storing

2014-12-10 Thread Wanpeng Li

On Wed, Dec 10, 2014 at 08:13:58AM -0100, Eugene Korenevsky wrote:
Will send fixed patch this evening.

Please see my reply to another thread.


-- 
Eugene

Re: [PATCH] KVM: x86: nVMX: support for MSR loading/storing

2014-12-10 Thread Wincy Van

2014-12-10 17:01 GMT+08:00 Wanpeng Li wanpeng...@linux.intel.com:
 Hi all,
 On Wed, Dec 10, 2014 at 08:07:45AM -0100, Eugene Korenevsky wrote:
 Hi, Eugene, is it okay to split my part up?

I think the patch is atomic. I have no idea how it could be split
without breaking its integrity.
You are a co-author of the patch, since your ideas make up a significant part of it.


 Since Wincy sent out his patch before you, I would prefer that he send out a newer
 version which fixes the issues in his own patch; then you can send out another
 enhanced one based on Wincy's work.

 Ok, I will send out the version two ASAP, thanks.

Wincy

Re: [PATCH] KVM: x86: nVMX: support for MSR loading/storing

2014-12-10 Thread Bandan Das

Eugene Korenevsky ekorenev...@gmail.com writes:

 Hi, Eugene, is it okay to split my part up?

 I think the patch is atomic. I have no idea how it could be split
 without breaking its integrity.
 You are a co-author of the patch, since your ideas make up a significant part of it.

I was suggesting adding the interfaces you introduced in the first patch
and then using these interfaces in the second patch to make reviewing easier.
It's ok to mention that the second depends on the first.

If Wincy has code contributions to the patch, he should sign it off too,
else maybe add a Suggested-by to give him credit for his ideas.

Also, please include a v3 in the Subject when you submit your next version.



Re: [PATCH 1/5] arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps()

2014-12-10 Thread Eric Auger

Hi Christoffer,
Reviewed-by: Eric Auger eric.au...@linaro.org
See a few comments below.
On 12/09/2014 04:44 PM, Christoffer Dall wrote:
 From: Peter Maydell peter.mayd...@linaro.org
 
 VGIC initialization currently happens in three phases:
  (1) kvm_vgic_create() (triggered by userspace GIC creation)
  (2) vgic_init_maps() (triggered by userspace GIC register read/write
  requests, or from kvm_vgic_init() if not already run)
  (3) kvm_vgic_init() (triggered by first VM run)
 
 We were doing initialization of some state to correspond with the
 state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
 since it will overwrite changes made by userspace using the
 register access APIs before the VM is run. Move this initialization
 earlier, into the vgic_init_maps() phase.
 
 This fixes a bug where QEMU could successfully restore a saved
 VM state snapshot into a VM that had already been run, but could
 not restore it from cold using the -loadvm command line option
 (the symptoms being that the restored VM would run but interrupts
 were ignored).
 
 Finally rename vgic_init_maps to vgic_init and rename kvm_vgic_init to
 kvm_vgic_map_resources.
 
   [ This patch is originally written by Peter Maydell, but I have
 modified it somewhat heavily, renaming various bits and moving code
 around.  If something is broken, I am to be blamed. - Christoffer ]
 
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch was originally named vgic: move reset initialization into
 vgic_init_maps() but I renamed it slightly to match the other vgic
 patches in the kernel.  I also did the additional changes since the
 original patch:
  - Renamed kvm_vgic_init to kvm_vgic_map_resources
  - Renamed vgic_init_maps to vgic_init
  - Moved vgic_enable call into existing vcpu loop in vgic_init
  - Moved ITARGETSRn initializtion above vcpu loop in vgic_init (the idea
typo
is to init global state first, then vcpu state).

kvm_vgic_vcpu_init has also disappeared, and the PPI settings of
dist->irq_enabled and dist->irq_cfg are now in the former vgic_init_maps.

Maybe it would be simpler to review if there were 2 patches: one for
moving the initialization from kvm_vgic_init to vgic_init_maps and one for the
renaming.

kvm_vgic_map_resources: from the name it is difficult to understand that it
also initializes the internal state. Wouldn't kvm_vgic_set_ready be better
aligned with the "ready" terminology?

Best Regards

Eric

  - Added comment in kvm_vgic_map_resources




 
  arch/arm/kvm/arm.c |  6 ++--
  include/kvm/arm_vgic.h |  4 +--
  virt/kvm/arm/vgic.c| 77 
 +-
  3 files changed, 37 insertions(+), 50 deletions(-)
 
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index 9e193c8..a56cbb5 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -427,11 +427,11 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu 
 *vcpu)
   vcpu->arch.has_run_once = true;
  
   /*
 -  * Initialize the VGIC before running a vcpu the first time on
 -  * this VM.
 +  * Map the VGIC hardware resources before running a vcpu the first
 +  * time on this VM.
*/
   if (unlikely(!vgic_initialized(vcpu->kvm))) {
 - ret = kvm_vgic_init(vcpu->kvm);
 + ret = kvm_vgic_map_resources(vcpu->kvm);
   if (ret)
   return ret;
   }
 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 206dcc3..fe9783b 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -274,7 +274,7 @@ struct kvm_exit_mmio;
  #ifdef CONFIG_KVM_ARM_VGIC
  int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool 
 write);
  int kvm_vgic_hyp_init(void);
 -int kvm_vgic_init(struct kvm *kvm);
 +int kvm_vgic_map_resources(struct kvm *kvm);
  int kvm_vgic_create(struct kvm *kvm);
  void kvm_vgic_destroy(struct kvm *kvm);
  void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu);
 @@ -321,7 +321,7 @@ static inline int kvm_vgic_addr(struct kvm *kvm, unsigned 
 long type, u64 *addr,
   return -ENXIO;
  }
  
 -static inline int kvm_vgic_init(struct kvm *kvm)
 +static inline int kvm_vgic_map_resources(struct kvm *kvm)
  {
   return 0;
  }
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index aacdb59..91e6bfc 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -91,6 +91,7 @@
  #define ACCESS_WRITE_VALUE   (3 << 1)
  #define ACCESS_WRITE_MASK(x) ((x) & (3 << 1))
  
 +static int vgic_init(struct kvm *kvm);
  static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
  static void vgic_update_state(struct kvm *kvm);
 @@ -1726,39 +1727,14 @@ static int vgic_vcpu_init_maps(struct kvm_vcpu *vcpu, 
 int nr_irqs)
  
   int sz = (nr_irqs - VGIC_NR_PRIVATE_IRQS) / 8;
   vgic_cpu->pending_shared = kzalloc(sz, GFP_KERNEL);
 - vgic_cpu->vgic_irq_lr_map = kzalloc(nr_irqs, GFP_KERNEL);
 +

Re: [PATCH 3/5] arm/arm64: KVM: Add (new) vgic_initialized macro

2014-12-10 Thread Eric Auger


On 12/09/2014 04:44 PM, Christoffer Dall wrote:
 Some code paths will need to check to see if the internal state of the
 vgic has been initialized (such as when creating new VCPUs), so
 introduce such a macro that checks the nr_cpus field which is set when
 the vgic has been initialized.
 
 Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
 vgic_init() will call this function, and code should never erroneously
 assume the vgic to be properly initialized after an error.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
  include/kvm/arm_vgic.h | 6 ++
  virt/kvm/arm/vgic.c| 1 +
  2 files changed, 7 insertions(+)
 
 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index 3e262b9..ac4888d 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -287,6 +287,7 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct 
 kvm_run *run,
 struct kvm_exit_mmio *mmio);
  
  #define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
 +#define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
  #define vgic_ready(k)        ((k)->arch.vgic.ready)
  
  int vgic_v2_probe(struct device_node *vgic_node,
 @@ -369,6 +370,11 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
   return 0;
  }
  
 +static inline bool vgic_initialized(struct kvm *kvm)
 +{
 + return true;
 +}
 +
  static inline bool vgic_ready(struct kvm *kvm)
  {
   return true;
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 6293349..c98cc6b 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -1774,6 +1774,7 @@ void kvm_vgic_destroy(struct kvm *kvm)
   dist->irq_spi_cpu = NULL;
   dist->irq_spi_target = NULL;
   dist->irq_pending_on_cpu = NULL;
 + dist->nr_cpus = 0;
Reviewed-by: Eric Auger eric.au...@linaro.org
We could use the new vgic_initialized at the entry of vgic_init instead
of testing dist->nr_cpus, hence introducing one user.

Eric
  }
  
  /*
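
The idea in the commit message above (treat a non-zero nr_cpus as "initialized", and reset it on teardown so an init failure never leaves a stale "initialized" state) can be sketched in isolation. The struct and function names below are stand-ins invented for this example; they are not the kernel's types.

```c
#include <stdbool.h>

/* Stand-in for the distributor state; only the field used here. */
struct dist_sketch {
	int nr_cpus;	/* non-zero once initialization has completed */
};

/* Mirrors the idea of the vgic_initialized() macro. */
static bool dist_initialized(const struct dist_sketch *d)
{
	return d->nr_cpus != 0;
}

/* Teardown, which the init error path would also reach. */
static void dist_destroy(struct dist_sketch *d)
{
	d->nr_cpus = 0;
}
```

After `dist_destroy()` the predicate reports false again, which is the property the patch relies on.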
 


Re: [PATCH] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit

2014-12-10 Thread Paolo Bonzini



On 06/12/2014 04:03, Andy Lutomirski wrote:
 paravirt_enabled has the following effects:
 
  - Disables the F00F bug workaround warning.  There is no F00F bug
workaround any more because Linux's standard IDT handling already
works around the F00F bug, but the warning still exists.  This
is only cosmetic, and, in any event, there is no such thing as
KVM on a CPU with the F00F bug.
 
  - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
there should be no APM BIOS anyway.
 
  - Disables tboot.  I think that the tboot code should check the
CPUID hypervisor bit directly if it matters.
 
  - paravirt_enabled disables espfix32.  espfix32 should *not* be
disabled under KVM paravirt.
 
 The last point is the purpose of this patch.  It fixes a leak of the
 high 16 bits of the kernel stack address on 32-bit KVM paravirt
 guests.
 
 While I'm at it, this removes pv_info setup from kvmclock.  That
 code seems to serve no purpose.

kvmclock_init runs before kvm_guest_init, and this is a stable@ patch so
for the sake of extra safety I've left the pv_info.name assignment in.
Applied (locally for now), will be in 3.19.

Paolo

 Cc: sta...@vger.kernel.org
 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  arch/x86/kernel/kvm.c  | 9 -
  arch/x86/kernel/kvmclock.c | 2 --
  2 files changed, 8 insertions(+), 3 deletions(-)
 
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index f6945bef2cd1..94f643484300 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -283,7 +283,14 @@ NOKPROBE_SYMBOL(do_async_page_fault);
  static void __init paravirt_ops_setup(void)
  {
   pv_info.name = "KVM";
 - pv_info.paravirt_enabled = 1;
 +
 + /*
 +  * KVM isn't paravirt in the sense of paravirt_enabled.  A KVM
 +  * guest kernel works like a bare metal kernel with additional
 +  * features, and paravirt_enabled is about features that are
 +  * missing.
 +  */
 + pv_info.paravirt_enabled = 0;
  
   if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
   pv_cpu_ops.io_delay = kvm_io_delay;
 diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
 index d9156ceecdff..d4d9a8ad7893 100644
 --- a/arch/x86/kernel/kvmclock.c
 +++ b/arch/x86/kernel/kvmclock.c
 @@ -263,8 +263,6 @@ void __init kvmclock_init(void)
  #endif
   kvm_get_preset_lpj();
   clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 - pv_info.paravirt_enabled = 1;
 - pv_info.name = "KVM";
  
   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
   pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 

Re: [PATCH] KVM: x86: Remove prefix flag when GP macro is used

2014-12-10 Thread Paolo Bonzini



On 07/12/2014 10:49, Nadav Amit wrote:
 The macro GP already sets the flag Prefix. Remove the redundant flag for
 0f_38_f0 and 0f_38_f1 opcodes.
 
 Signed-off-by: Nadav Amit na...@cs.technion.ac.il
 ---
  arch/x86/kvm/emulate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 3817334..b4f4201 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -4172,8 +4172,8 @@ static const struct opcode opcode_map_0f_38[256] = {
   /* 0x80 - 0xef */
   X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
   /* 0xf0 - 0xf1 */
 - GP(EmulateOnUD | ModRM | Prefix, three_byte_0f_38_f0),
 - GP(EmulateOnUD | ModRM | Prefix, three_byte_0f_38_f1),
 + GP(EmulateOnUD | ModRM, three_byte_0f_38_f0),
 + GP(EmulateOnUD | ModRM, three_byte_0f_38_f1),
   /* 0xf2 - 0xff */
   N, N, X4(N), X8(N)
  };
 

Applied, thanks.

Paolo

Re: [PATCH 0/2] KVM: x86: Emulator fixes for VM86

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 10:19, Nadav Amit wrote:
 Two minor fixes for emulation of instructions on VM86.
 
 Thanks for reviewing them.
 
 Nadav Amit (2):
   KVM: x86: Do not push eflags.vm on pushf
   KVM: x86: Emulate should check #UD before #GP
 
  arch/x86/kvm/emulate.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)
 

Applied, thanks.

Paolo

Re: [PATCH] KVM: nVMX: Disable unrestricted mode if ept=0

2014-12-10 Thread Paolo Bonzini



On 06/12/2014 16:02, Bandan Das wrote:
 
 If L0 has disabled EPT, don't advertise unrestricted
 mode at all since it depends on EPT to run real mode code.
 
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  arch/x86/kvm/vmx.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 3e556c6..ed70394 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2377,12 +2377,12 @@ static __init void nested_vmx_setup_ctls_msrs(void)
   nested_vmx_secondary_ctls_low = 0;
   nested_vmx_secondary_ctls_high =
   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 - SECONDARY_EXEC_UNRESTRICTED_GUEST |
   SECONDARY_EXEC_WBINVD_EXITING;
  
   if (enable_ept) {
   /* nested EPT: emulate EPT also to L1 */
 - nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT;
 + nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT |
 + SECONDARY_EXEC_UNRESTRICTED_GUEST;
   nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
VMX_EPT_INVEPT_BIT;
 

Thanks, applied with

Fixes: 92fbc7b195b824e201d9f06f2b93105f72384d65
Cc: sta...@vger.kernel.org

Paolo
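
For illustration, the capability logic after this patch can be modelled as a small standalone function. This is a sketch, not the vmx.c code: the macro names are local to the example, and the bit values are assumed to follow the secondary processor-based control layout (virtualize APIC accesses = bit 0, enable EPT = bit 1, WBINVD exiting = bit 6, unrestricted guest = bit 7).

```c
#include <stdbool.h>
#include <stdint.h>

/* Names are local to this sketch; values assumed from the control layout. */
#define CTL_VIRT_APIC_ACCESSES	UINT32_C(0x01)
#define CTL_ENABLE_EPT		UINT32_C(0x02)
#define CTL_WBINVD_EXITING	UINT32_C(0x40)
#define CTL_UNRESTRICTED_GUEST	UINT32_C(0x80)

/* Controls advertised to L1: unrestricted guest only together with EPT. */
static uint32_t nested_secondary_ctls_high(bool enable_ept)
{
	uint32_t high = CTL_VIRT_APIC_ACCESSES | CTL_WBINVD_EXITING;

	if (enable_ept)
		high |= CTL_ENABLE_EPT | CTL_UNRESTRICTED_GUEST;
	return high;
}
```

With EPT disabled, neither the EPT bit nor the unrestricted-guest bit appears in the returned mask.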

Re: [PATCH 4/5] arm/arm64: KVM: Don't allow creating VCPUs after vgic_initialized

2014-12-10 Thread Eric Auger

On 12/09/2014 04:44 PM, Christoffer Dall wrote:
 When the vgic initializes its internal state it does so based on the
 number of VCPUs available at the time.  If we allow KVM to create more
 VCPUs after the VGIC has been initialized, we are likely to error out in
 unfortunate ways later, perform buffer overflows etc.
 
 Cc: Eric Auger eric.au...@linaro.org
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This replaces Eric Auger's previous patch
 (https://lists.cs.columbia.edu/pipermail/kvmarm/2014-December/012646.html),
 because it fits better with testing to include it in this series and I
 realized that we need to add a check against irqchip_in_kernel() as
 well.
 
  arch/arm/kvm/arm.c | 5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index a9d005f..d4da244 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -213,6 +213,11 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 
 unsigned int id)
   int err;
   struct kvm_vcpu *vcpu;
  
 + if (irqchip_in_kernel(kvm) && vgic_initialized(kvm)) {
Reviewed-by: Eric Auger eric.au...@linaro.org
A question about that irqchip_in_kernel(kvm):
kvm->arch.vgic.in_kernel is set in kvm_vgic_create but nobody resets it,
especially in destroy, or am I wrong?
If the vgic is initialized, shouldn't it also have been created? Shouldn't we
test irqchip_in_kernel in vgic_init instead?
Also, in case we need irqchip_in_kernel(kvm) here, we might need it also
in kvm_vgic_inject_irq, because dist->lock is grabbed in
vgic_update_irq_pending.

Eric
 + err = -EBUSY;
 + goto out;
 + }
 +
   vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
   if (!vcpu) {
   err = -ENOMEM;
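
The guard added by this patch can be sketched as a standalone model. The struct and function below are hypothetical stand-ins, not the kernel's kvm_arch_vcpu_create(); the sketch only shows the intended behaviour, namely that VCPU creation is refused with -EBUSY once an in-kernel vgic has been sized for the existing VCPUs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state relevant to the check. */
struct vm_sketch {
	bool irqchip_in_kernel;
	int vgic_nr_cpus;	/* non-zero once the vgic is initialized */
	int nr_vcpus;
};

/* Returns 0 on success, -EBUSY when the vgic has already been initialized. */
static int vcpu_create(struct vm_sketch *vm)
{
	if (vm->irqchip_in_kernel && vm->vgic_nr_cpus != 0)
		return -EBUSY;
	vm->nr_vcpus++;
	return 0;
}
```

In this model a VM without an in-kernel irqchip is never blocked, which matches the combined condition in the quoted hunk.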
 


Re: [PATCH 5/5] arm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs

2014-12-10 Thread Eric Auger

On 12/09/2014 04:44 PM, Christoffer Dall wrote:
 Userspace assumes that it can wire up IRQ injections after having
 created all VCPUs and after having created the VGIC, but potentially
 before starting the first VCPU.  This can currently lead to lost IRQs
 because the state of that IRQ injection is not stored anywhere and we
 don't return an error to userspace.
 
 We haven't seen this problem manifest itself yet, 
Actually we did with VFIO signaling setup before VGIC init!
presumably because
 guests reset the devices on boot, but this could cause issues with
 migration and other non-standard startup configurations.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
  virt/kvm/arm/vgic.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index c98cc6b..feef015 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -1693,8 +1693,13 @@ out:
  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
   bool level)
  {
 - if (likely(vgic_ready(kvm)) &&
 - vgic_update_irq_pending(kvm, cpuid, irq_num, level))
 + if (unlikely(!vgic_initialized(kvm))) {
 + mutex_lock(&kvm->lock);
 + vgic_init(kvm);
 + mutex_unlock(&kvm->lock);
 + }
I was previously encouraged to test the virtual interrupt controller
readiness when setting up irqfd (proposal made in
https://lkml.org/lkml/2014/12/3/601). I guess this becomes unnecessary now,
correct? Reviewed-by on the whole series.

Eric
 +
 + if (vgic_update_irq_pending(kvm, cpuid, irq_num, level))
   vgic_kick_vcpus(kvm);
  
   return 0;
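
The on-demand initialization described in this patch can be illustrated with a small standalone model. All names here are invented for the sketch, and locking is left out (the quoted hunk takes kvm->lock around the init call). The point is only that an injection arriving before initialization triggers the init and is then recorded instead of being silently dropped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the interrupt controller state. */
struct vgic_sketch {
	bool initialized;
	int pending;	/* number of injections recorded so far */
};

static void vgic_init_sketch(struct vgic_sketch *v)
{
	v->initialized = true;
	v->pending = 0;
}

/* Initialize lazily so that an early injection is not lost. */
static int inject_irq(struct vgic_sketch *v)
{
	if (!v->initialized)
		vgic_init_sketch(v);
	v->pending++;
	return 0;
}
```

The first call both initializes the model and records the interrupt; later calls only record.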
 


Re: [PATCH] x86: Remove Fix Mes in emulate.c from needing fault addresses

2014-12-10 Thread Paolo Bonzini



On 08/12/2014 04:18, nick wrote:
 Paolo,
 Not to be annoying, but I am wondering whether my patch has been merged, as I have
 yet to see it in the mainline kernel.

It will be sent to Linus during the merge window.

Paolo

[QEMU patch 1/2] kvm: sync kernel headers

2014-12-10 Thread Marcelo Tosatti

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

---
 linux-headers/asm-x86/kvm.h |5 +
 linux-headers/linux/kvm.h   |   14 +++---
 2 files changed, 8 insertions(+), 11 deletions(-)

Index: qemu/linux-headers/asm-x86/kvm.h
===
--- qemu.orig/linux-headers/asm-x86/kvm.h   2014-12-08 17:54:33.647488264 -0200
+++ qemu/linux-headers/asm-x86/kvm.h2014-12-09 13:27:20.749752962 -0200
@@ -277,6 +277,11 @@
__u8 reserved[31];
 };
 
+struct kvm_tscdeadline_advance {
+   __u32 timer_advance;
+   __u32 reserved[3];
+};
+
 /* When set in flags, include corresponding fields on KVM_SET_VCPU_EVENTS */
 #define KVM_VCPUEVENT_VALID_NMI_PENDING0x0001
 #define KVM_VCPUEVENT_VALID_SIPI_VECTOR0x0002
Index: qemu/linux-headers/linux/kvm.h
===
--- qemu.orig/linux-headers/linux/kvm.h 2014-12-08 17:54:33.647488264 -0200
+++ qemu/linux-headers/linux/kvm.h  2014-12-09 13:27:20.750752961 -0200
@@ -761,6 +753,7 @@
 #define KVM_CAP_PPC_FIXUP_HCALL 103
 #define KVM_CAP_PPC_ENABLE_HCALL 104
 #define KVM_CAP_CHECK_EXTENSION_VM 105
+#define KVM_CAP_TSCDEADLINE_ADVANCE 106
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1061,6 +1054,8 @@
 #define KVM_GET_DEVICE_ATTR  _IOW(KVMIO,  0xe2, struct kvm_device_attr)
 #define KVM_HAS_DEVICE_ATTR  _IOW(KVMIO,  0xe3, struct kvm_device_attr)
 
+#define KVM_SET_TSCDEADLINE_ADVANCE  _IOW(KVMIO,  0xe4, struct kvm_tscdeadline_advance)
+
 /*
  * ioctls for vcpu fds
  */



[QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Marcelo Tosatti

Add machine option and QMP commands to configure TSC deadline
timer advancement.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

---
 monitor.c |   15 ++
 qapi-schema.json  |   29 +++
 qmp-commands.hx   |   48 
 target-i386/kvm.c |   80 ++
 vl.c  |4 ++
 5 files changed, 176 insertions(+)

Index: qemu.tscdeadline/qapi-schema.json
===
--- qemu.tscdeadline.orig/qapi-schema.json
+++ qemu.tscdeadline/qapi-schema.json
@@ -3515,3 +3515,32 @@
 # Since: 2.1
 ##
 { 'command': 'rtc-reset-reinjection' }
+
+##
+# @set-lapic-tscdeadline-advance
+#
+# This command sets the TSC deadline timer advancement.
+# This value will be subtracted from the expiration time
+# of the high resolution timer which emulates
+# TSC deadline timer.
+#
+# Useful to achieve low timer latencies.
+#
+# Only supported by KVM acceleration.
+#
+# Since: 2.3
+##
+{ 'command': 'set-lapic-tscdeadline-advance',
+  'data': { 'advance':'int' }
+}
+
+##
+# @get-lapic-tscdeadline-advance
+#
+# This command gets the TSC deadline timer advancement.
+#
+# Only supported by KVM acceleration.
+#
+# Since: 2.3
+##
+{ 'command': 'get-lapic-tscdeadline-advance', 'returns': 'int' }
Index: qemu.tscdeadline/qmp-commands.hx
===
--- qemu.tscdeadline.orig/qmp-commands.hx
+++ qemu.tscdeadline/qmp-commands.hx
@@ -3854,3 +3854,51 @@ Move mouse pointer to absolute coordinat
 - { return: {} }
 
 EQMP
+
+{
+.name   = "set-lapic-tscdeadline-advance",
+.args_type  = "advance:i",
+.mhandler.cmd_new = qmp_marshal_input_set_lapic_tscdeadline_advance,
+},
+
+SQMP
+set-lapic-tscdeadline-advance
+-
+
+Set LAPIC tscdeadline timer advancement, in nanoseconds.
+
+Arguments:
+
+- "advance": LAPIC tscdeadline timer advancement (json-int)
+
+Example:
+
+-> { "execute": "set-lapic-tscdeadline-advance", "arguments": { "advance": 1000 } }
+<- { "return": {} }
+
+EQMP
+
+{
+.name   = "get-lapic-tscdeadline-advance",
+.args_type  = "",
+.mhandler.cmd_new = qmp_marshal_input_get_lapic_tscdeadline_advance,
+},
+
+SQMP
+get-lapic-tscdeadline-advance
+-
+
+Get LAPIC tscdeadline timer advancement, in nanoseconds.
+
+Arguments: None.
+
+returns a json-object with the following information:
+- value : json-int
+
+Example:
+
+-> { "execute": "get-lapic-tscdeadline-advance" }
+<- { "return": 1000 }
+
+EQMP
+
Index: qemu.tscdeadline/vl.c
===
--- qemu.tscdeadline.orig/vl.c
+++ qemu.tscdeadline/vl.c
@@ -387,6 +387,10 @@ static QemuOptsList qemu_machine_opts =
 .name = "iommu",
 .type = QEMU_OPT_BOOL,
 .help = "Set on/off to enable/disable Intel IOMMU (VT-d)",
+},{
+.name = "lapic-tscdeadline-advance",
+.type = QEMU_OPT_NUMBER,
+.help = "Set lapic tscdeadline timer advance",
 },
 { /* End of list */ }
 },
Index: qemu.tscdeadline/target-i386/kvm.c
===
--- qemu.tscdeadline.orig/target-i386/kvm.c
+++ qemu.tscdeadline/target-i386/kvm.c
@@ -37,6 +37,7 @@
 #include "hw/pci/pci.h"
 #include "migration/migration.h"
 #include "qapi/qmp/qerror.h"
+#include "qmp-commands.h"
 
 //#define DEBUG_KVM
 
@@ -84,6 +85,10 @@ static bool has_msr_mtrr;
 static bool has_msr_architectural_pmu;
 static uint32_t num_architectural_pmu_counters;
 
+static struct lapic_tscdeadline_advance {
+unsigned int advance_ns;
+} lapic_tscdeadline_advance;
+
 bool kvm_allows_irq0_override(void)
 {
 return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
@@ -835,12 +840,32 @@ static int kvm_get_supported_msrs(KVMSta
 return ret;
 }
 
+static int kvm_set_lapic_tscdeadline(KVMState *s, uint32_t advance)
+{
+struct kvm_tscdeadline_advance adv;
+int ret = 0;
+
+memset(&adv, 0, sizeof(adv));
+
+adv.timer_advance = advance;
+
+ret = kvm_vm_ioctl(s, KVM_SET_TSCDEADLINE_ADVANCE, &adv);
+if (ret < 0) {
+return ret;
+}
+
+lapic_tscdeadline_advance.advance_ns = advance;
+
+return ret;
+}
+
 int kvm_arch_init(KVMState *s)
 {
 uint64_t identity_base = 0xfffbc000;
 uint64_t shadow_mem;
 int ret;
 struct utsname utsname;
+uint32_t lapic_advance_ns;
 
 ret = kvm_get_supported_msrs(s);
 if (ret < 0) {
@@ -894,9 +919,40 @@ int kvm_arch_init(KVMState *s)
 return ret;
 }
 }
+
+lapic_advance_ns = qemu_opt_get_number(qemu_get_machine_opts(),
+   "lapic-tscdeadline-advance",
+   0);
+if (lapic_advance_ns) {
+ret = kvm_set_lapic_tscdeadline(s, lapic_advance_ns);
+if (ret) {
+fprintf(stderr, Set tscdeadline

[QEMU patch 0/2] QEMU lapic tsc deadline advancement

2014-12-10 Thread Marcelo Tosatti

Add command to set TSC deadline timer advancement.
This value will be subtracted from the expiration time
of the high resolution timer which emulates
TSC deadline timer.




Re: [Intel-gfx] [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-10 Thread Paolo Bonzini



On 09/12/2014 03:49, Tian, Kevin wrote:
> - Now we have XenGT/KVMGT separately maintained, and KVMGT lags
> behind XenGT regarding to features and qualities. Likely you'll continue
> see stale code (like Xen inst decoder) for some time. In the future we
> plan to maintain a single kernel repo for both, so KVMGT can share
> same quality as XenGT once KVM in-kernel dm framework is stable.
>
> - Regarding to Qemu hacks, KVMGT really doesn't have any different
> requirements as what have been discussed for GPU pass-through, e.g.
> about ISA bridge. Our implementation is based on an old Qemu repo,
> and honestly speaking not cleanly developed, because we know we
> can leverage from GPU pass-through support once it's in Qemu. At
> that time we'll leverage the same logic with minimal changes to
> hook KVMGT mgmt. APIs (e.g. create/destroy a vGPU instance). So
> we can ignore this area for now. :-)

Could the virtual device model introduce new registers in order to avoid
poking at the ISA bridge?  I'm not sure that you can leverage from GPU
pass-through support once it's in Qemu, since the Xen IGD passthrough
support is being added to a separate machine that is specific to Xen IGD
passthrough; no ISA bridge hacking will probably be allowed on the -M
pc and -M q35 machine types.

Paolo

[patch 1/2] KVM: x86: add method to test PIR bitmap vector

2014-12-10 Thread Marcelo . Tosatti

kvm_x86_ops->test_posted_interrupt() returns true/false depending
on whether 'vector' is set in the posted-interrupt request (PIR) bitmap.
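
For readers unfamiliar with the PIR, the check amounts to testing one bit
in a 256-bit bitmap indexed by interrupt vector. A minimal userspace
sketch follows; the names pir_bitmap, pir_set and pir_test are
illustrative only and are not the kernel's pi_desc or test_bit(), and
the sketch is not atomic.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PIR_VECTORS   256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the 256-bit posted-interrupt request bitmap. */
struct pir_bitmap {
    unsigned long bits[PIR_VECTORS / (sizeof(unsigned long) * CHAR_BIT)];
};

/* Mark 'vector' as pending (non-atomic, illustration only). */
static void pir_set(struct pir_bitmap *pir, int vector)
{
    assert(vector >= 0 && vector < PIR_VECTORS);
    pir->bits[vector / BITS_PER_WORD] |= 1UL << (vector % BITS_PER_WORD);
}

/* Return true if 'vector' is pending; the bitmap is left unchanged. */
static bool pir_test(const struct pir_bitmap *pir, int vector)
{
    assert(vector >= 0 && vector < PIR_VECTORS);
    return (pir->bits[vector / BITS_PER_WORD] >> (vector % BITS_PER_WORD)) & 1UL;
}
```

Unlike a test-and-set helper, the test leaves the bit in place, so a
caller can ask whether the timer vector is pending without consuming it.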

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/include/asm/kvm_host.h
===
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -743,6 +743,7 @@ struct kvm_x86_ops {
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+   bool (*test_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -435,6 +435,11 @@ static int pi_test_and_set_pir(int vecto
return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static int pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+   return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
unsigned long host_rsp;
@@ -5939,6 +5944,7 @@ static __init int hardware_setup(void)
else {
kvm_x86_ops->hwapic_irr_update = NULL;
kvm_x86_ops->deliver_posted_interrupt = NULL;
+   kvm_x86_ops->test_posted_interrupt = NULL;
kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
}
 
@@ -6960,6 +6966,13 @@ static int handle_invvpid(struct kvm_vcp
return 1;
 }
 
+static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   return pi_test_pir(vector, &vmx->pi_desc);
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -9374,6 +9387,7 @@ static struct kvm_x86_ops vmx_x86_ops =
.hwapic_isr_update = vmx_hwapic_isr_update,
.sync_pir_to_irr = vmx_sync_pir_to_irr,
.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+   .test_posted_interrupt = vmx_test_pir,
 
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,



[patch 0/2] KVM: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Marcelo . Tosatti

See patches for details.



Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 17:23, Marcelo Tosatti wrote:
> Add machine option and QMP commands to configure TSC deadline
> timer advancement.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
>
> ---
>  monitor.c |   15 ++
>  qapi-schema.json  |   29 +++
>  qmp-commands.hx   |   48 
>  target-i386/kvm.c |   80 ++
>  vl.c  |4 ++
>  5 files changed, 176 insertions(+)
>
> Index: qemu.tscdeadline/qapi-schema.json
> ===
> --- qemu.tscdeadline.orig/qapi-schema.json
> +++ qemu.tscdeadline/qapi-schema.json
> @@ -3515,3 +3515,32 @@
>  # Since: 2.1
>  ##
>  { 'command': 'rtc-reset-reinjection' }
> +
> +##
> +# @set-lapic-tscdeadline-advance
> +#
> +# This command sets the TSC deadline timer advancement.
> +# This value will be subtracted from the expiration time
> +# of the high resolution timer which emulates
> +# TSC deadline timer.
> +#
> +# Useful to achieve low timer latencies.
> +#
> +# Only supported by KVM acceleration.
> +#
> +# Since: 2.3
> +##
> +{ 'command': 'set-lapic-tscdeadline-advance',
> +  'data': { 'advance':'int' }
> +}
> +
> +##
> +# @get-lapic-tscdeadline-advance
> +#
> +# This command gets the TSC deadline timer advancement.
> +#
> +# Only supported by KVM acceleration.
> +#
> +# Since: 2.3
> +##
> +{ 'command': 'get-lapic-tscdeadline-advance', 'returns': 'int' }

Please add an object property to the x86 CPU object.  It can then be
configured with -global on the command line.

> +ret = kvm_vm_ioctl(s, KVM_SET_TSCDEADLINE_ADVANCE, &adv);
> +if (ret < 0) {
> +return ret;
> +}

Please use KVM_GET/SET_ONE_REG instead of introducing a new set of ioctls.

Paolo

> +lapic_tscdeadline_advance.advance_ns = advance;
> +
> +return ret;
> +}
> +
>  int kvm_arch_init(KVMState *s)
>  {
>  uint64_t identity_base = 0xfffbc000;
>  uint64_t shadow_mem;
>  int ret;
>  struct utsname utsname;
> +uint32_t lapic_advance_ns;
>
>  ret = kvm_get_supported_msrs(s);
>  if (ret < 0) {
> @@ -894,9 +919,40 @@ int kvm_arch_init(KVMState *s)
>  return ret;
>  }
>  }
> +
> +lapic_advance_ns = qemu_opt_get_number(qemu_get_machine_opts(),
> +   "lapic-tscdeadline-advance",
> +   0);
> +if (lapic_advance_ns) {
> +ret = kvm_set_lapic_tscdeadline(s, lapic_advance_ns);
> +if (ret) {
> +fprintf(stderr, "Set tscdeadline advance failed: %s\n",
> +strerror(-ret));
> +return ret;
> +}
> +}
> +
> +
>  return 0;
>  }
>
> +int64_t qmp_get_lapic_tscdeadline_advance(Error **errp)
> +{
> +return lapic_tscdeadline_advance.advance_ns;
> +}
> +
> +void qmp_set_lapic_tscdeadline_advance(int64_t advance, Error **errp)
> +{
> +KVMState *s = kvm_state;
> +int ret;
> +
> +ret = kvm_set_lapic_tscdeadline(s, advance);
> +if (ret) {
> +error_setg_errno(errp, ret, "set lapic tscdeadline failed");
> +return;
> +}
> +}
> +
>  static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
>  {
>  lhs->selector = rhs->selector;
> Index: qemu.tscdeadline/monitor.c
> ===
> --- qemu.tscdeadline.orig/monitor.c
> +++ qemu.tscdeadline/monitor.c
> @@ -5447,3 +5447,18 @@ void qmp_rtc_reset_reinjection(Error **e
>  error_set(errp, QERR_FEATURE_DISABLED, "rtc-reset-reinjection");
>  }
>  #endif
> +
> +#if !defined (TARGET_I386) || !defined (CONFIG_KVM)
> +int64_t qmp_get_lapic_tscdeadline_advance(Error **errp)
> +{
> +error_set(errp, QERR_FEATURE_DISABLED, "get-lapic-tscdeadline-advance");
> +
> +return 0;
> +}
> +
> +void qmp_set_lapic_tscdeadline_advance(int64_t advance, Error **errp)
> +{
> +error_set(errp, QERR_FEATURE_DISABLED, "set-lapic-tscdeadline-advance");
> +}
> +#endif
> +
> Index: qemu.tscdeadline/include/hw/boards.h
> ===
> --- qemu.tscdeadline.orig/include/hw/boards.h
> +++ qemu.tscdeadline/include/hw/boards.h
> @@ -133,6 +133,7 @@ struct MachineState {
>  bool usb;
>  char *firmware;
>  bool iommu;
> +int lapi_tscdeadline_advance;
>
>  ram_addr_t ram_size;
>  ram_addr_t maxram_size;
> Index: qemu.tscdeadline/qemu-options.hx
> ===
> --- qemu.tscdeadline.orig/qemu-options.hx
> +++ qemu.tscdeadline/qemu-options.hx
> @@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_mach
>  "kvm_shadow_mem=size of KVM shadow MMU\n"
>  "dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>  "mem-merge=on|off controls memory merge support (default: on)\n"
> -"iommu=on|off controls emulated Intel IOMMU (VT-d) support (default=off)\n",
> +"iommu=on|off controls emulated Intel IOMMU (VT-d) support

[patch 0/2] KVM: add option to advance tscdeadline hrtimer expiration (v2)

2014-12-10 Thread Marcelo Tosatti

See patches for details.




[patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Marcelo Tosatti

For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.
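
The arithmetic behind "advance expiration, then spin" can be sketched in
plain C as below. This is an illustration under stated assumptions, not
the patch itself: the helper names are made up, times are plain
nanosecond counters rather than ktime_t, and the clamp to "now" is an
addition of this sketch (the patch simply subtracts the advance).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC tick delta to nanoseconds, given the TSC rate in kHz. */
static uint64_t tsc_delta_to_ns(uint64_t delta_tsc, uint64_t tsc_khz)
{
    assert(tsc_khz != 0);
    /* ticks / (ticks per ms) gives ms; scale by 1e6 to get ns.
     * Very large deltas could overflow 64 bits; ignored in this sketch. */
    return delta_tsc * 1000000ULL / tsc_khz;
}

/* Host expiry (ns) for the emulating timer: fire advance_ns before the
 * guest deadline, but never earlier than now_ns (sketch-only clamp). */
static uint64_t advanced_expiry_ns(uint64_t now_ns, uint64_t guest_tsc,
                                   uint64_t tsc_deadline, uint64_t tsc_khz,
                                   uint64_t advance_ns)
{
    uint64_t ns;

    if (tsc_deadline <= guest_tsc)
        return now_ns;                      /* deadline already passed */
    ns = tsc_delta_to_ns(tsc_deadline - guest_tsc, tsc_khz);
    return now_ns + (ns > advance_ns ? ns - advance_ns : 0);
}

/* TSC ticks still to be burned by the busy-wait at VM-entry. */
static uint64_t ticks_left_to_spin(uint64_t guest_tsc, uint64_t tsc_deadline)
{
    return guest_tsc < tsc_deadline ? tsc_deadline - guest_tsc : 0;
}
```

With a 2 GHz TSC, a deadline 2,000,000 ticks away is 1 ms out; a 7000 ns
advance makes the host timer fire 993 us from now, and whatever remains
at VM-entry is spent spinning.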

This allows achieving low latencies in cyclictest (or any scenario 
which requires strict timing regarding timer expiration).

Reduces cyclictest avg latency by 50%.

Note: this option requires tuning to find the appropriate value 
for a particular hardware/guest combination. One method is to measure the 
average delay between apic_timer_fn and VM-entry. 
Another method is to start with 1000ns, and increase the value
in say 500ns increments until avg cyclictest numbers stop decreasing.
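
The second tuning method can be sketched as a simple search loop. The
sketch assumes a caller-supplied measurement callback (for example, one
that applies the value and returns the average cyclictest latency); the
callback type, tune_advance_ns and the toy latency model are all
hypothetical. The toy model uses the 7 us exit-to-entry delay reported
later in this thread and assumes extra advance beyond that is absorbed
by the spin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: run the workload with the given advance and
 * return the observed average latency in microseconds. */
typedef double (*measure_fn)(uint32_t advance_ns);

/* Toy model for illustration only: 7000 ns between timer callback and
 * VM-entry, plus 1 us of latency that no advance can remove. */
static double toy_latency_us(uint32_t advance_ns)
{
    const uint32_t exit_to_entry_ns = 7000;
    uint32_t uncovered = advance_ns < exit_to_entry_ns
                             ? exit_to_entry_ns - advance_ns : 0;

    return 1.0 + uncovered / 1000.0;
}

/* Start at start_ns and step up by step_ns until the measured average
 * stops decreasing (or max_ns is reached); return the best value seen. */
static uint32_t tune_advance_ns(measure_fn measure, uint32_t start_ns,
                                uint32_t step_ns, uint32_t max_ns)
{
    uint32_t best = start_ns;
    double best_lat;
    uint64_t adv;

    assert(measure != 0 && step_ns > 0);
    best_lat = measure(start_ns);
    for (adv = (uint64_t)start_ns + step_ns; adv <= max_ns; adv += step_ns) {
        double lat = measure((uint32_t)adv);

        if (lat >= best_lat)
            break;                          /* no further improvement */
        best = (uint32_t)adv;
        best_lat = lat;
    }
    return best;
}
```

Real measurements are noisy, so in practice each step would need enough
samples for the averages to be comparable.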

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/lapic.c
===
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -33,6 +33,7 @@
 #include <asm/page.h>
 #include <asm/current.h>
 #include <asm/apicdef.h>
+#include <asm/delay.h>
 #include <linux/atomic.h>
 #include <linux/jump_label.h>
 #include "kvm_cache_regs.h"
@@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
 {
struct kvm_vcpu *vcpu = apic->vcpu;
wait_queue_head_t *q = &vcpu->wq;
+   struct kvm_timer *ktimer = &apic->lapic_timer;
 
/*
 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
@@ -1087,11 +1089,59 @@ static void apic_timer_expired(struct kv
 
if (waitqueue_active(q))
wake_up_interruptible(q);
+
+   if (ktimer->timer_mode_mask == APIC_LVT_TIMER_TSCDEADLINE)
+   ktimer->expired_tscdeadline = ktimer->tscdeadline;
+}
+
+static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
+
+   if (kvm_apic_hw_enabled(apic)) {
+   int vec = reg & APIC_VECTOR_MASK;
+
+   if (kvm_x86_ops->test_posted_interrupt)
+   return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
+   else {
+   if (apic_test_vector(vec, apic->regs + APIC_ISR))
+   return true;
+   }
+   }
+   return false;
+}
+
+void wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   u64 guest_tsc, tsc_deadline;
+
+   if (!kvm_vcpu_has_lapic(vcpu))
+   return;
+
+   if (!apic_lvtt_tscdeadline(apic))
+   return;
+
+   if (!lapic_timer_int_injected(vcpu))
+   return;
+
+   tsc_deadline = apic->lapic_timer.expired_tscdeadline;
+   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+
+   while (guest_tsc < tsc_deadline) {
+   int delay = min(tsc_deadline - guest_tsc, 1000ULL);
+
+   ndelay(delay);
+   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+   }
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)
 {
ktime_t now;
+   struct kvm_arch *kvm_arch = &apic->vcpu->kvm->arch;
+
atomic_set(&apic->lapic_timer.pending, 0);
 
if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
@@ -1137,6 +1187,7 @@ static void start_apic_timer(struct kvm_
/* lapic timer in tsc deadline mode */
u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
u64 ns = 0;
+   ktime_t expire;
struct kvm_vcpu *vcpu = apic->vcpu;
unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
unsigned long flags;
@@ -1149,10 +1200,14 @@ static void start_apic_timer(struct kvm_
now = apic->lapic_timer.timer.base->get_time();
guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
if (likely(tscdeadline > guest_tsc)) {
+   u32 advance = kvm_arch->lapic_tscdeadline_advance_ns;
+
ns = (tscdeadline - guest_tsc) * 1000000ULL;
do_div(ns, this_tsc_khz);
+   expire = ktime_add_ns(now, ns);
+   expire = ktime_sub_ns(expire, advance);
hrtimer_start(&apic->lapic_timer.timer,
-   ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+ expire, HRTIMER_MODE_ABS);
} else
apic_timer_expired(apic);
 
Index: kvm/arch/x86/kvm/lapic.h
===
--- kvm.orig/arch/x86/kvm/lapic.h
+++ kvm/arch/x86/kvm/lapic.h
@@ -14,6 +14,7 @@ struct kvm_timer {
u32 timer_mode;
u32 timer_mode_mask;
u64 tscdeadline;
+   u64 expired_tscdeadline;
atomic_t pending;   /* accumulated triggered timers */
 };
 
@@ -170,4 +171,6 @@ static inline bool kvm_apic_has_events(s
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
+void

Re: [patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Rik van Riel

On 12/10/2014 12:06 PM, Marcelo Tosatti wrote:
> For the hrtimer which emulates the tscdeadline timer in the guest,
> add an option to advance expiration, and busy spin on VM-entry waiting
> for the actual expiration time to elapse.
>
> This allows achieving low latencies in cyclictest (or any scenario
> which requires strict timing regarding timer expiration).
>
> Reduces cyclictest avg latency by 50%.
>
> Note: this option requires tuning to find the appropriate value
> for a particular hardware/guest combination. One method is to measure the
> average delay between apic_timer_fn and VM-entry.
> Another method is to start with 1000ns, and increase the value
> in say 500ns increments until avg cyclictest numbers stop decreasing.

It would be good to document how this is used, in the changelog.


Re: [patch 1/2] KVM: x86: add method to test PIR bitmap vector

2014-12-10 Thread Rik van Riel

On 12/10/2014 12:06 PM, Marcelo Tosatti wrote:
> kvm_x86_ops->test_posted_interrupt() returns true/false depending
> whether 'vector' is set.

Is that good? Bad? How does this patch address the issue?


Re: [patch 1/2] KVM: x86: add method to test PIR bitmap vector

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 12:10:04PM -0500, Rik van Riel wrote:
> On 12/10/2014 12:06 PM, Marcelo Tosatti wrote:
> > kvm_x86_ops->test_posted_interrupt() returns true/false depending
> > whether 'vector' is set.
>
> Is that good? Bad? How does this patch address the issue?

What issue?


Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 06:09:19PM +0100, Paolo Bonzini wrote:
 
 
> On 10/12/2014 18:04, Marcelo Tosatti wrote:
> >> Please add an object property to the x86 CPU object.  It can then be
> >> configured with -global on the command line.
> >
> > Don't want to allow individual values for different CPUs.
> > It is a per-VM property.
>
> Why?  It can cause busy waiting, it would make sense to make it stricter
> for realtime CPUs and leave 0 for non-realtime CPUs.
>
> Paolo

HW timer behaviour should be consistent across CPUs, IMO.


Re: [patch 0/2] KVM: add option to advance tscdeadline hrtimer expiration (v2)

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 06:10:19PM +0100, Paolo Bonzini wrote:
 
 
> On 10/12/2014 18:06, Marcelo Tosatti wrote:
> > See patches for details.
>
> Difference between v1 and v2?  Please fix your workflow.
>
> Paolo

Wrong sender email address, that is all.


[Bug 87611] Duplicate interrupt abbreviation [patch proposal included]

2014-12-10 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=87611

Alan a...@lxorguk.ukuu.org.uk changed:

       What |Removed                              |Added
------------+-------------------------------------+----------------------------------------
         CC |                                     |a...@lxorguk.ukuu.org.uk
  Component |x86-64                               |kvm
   Assignee |platform_x86_64@kernel-bugs.osdl.org |virtualization_kvm@kernel-bugs.osdl.org
    Product |Platform Specific/Hardware           |Virtualization

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 06:01:21PM +0100, Paolo Bonzini wrote:
 
 
> On 10/12/2014 17:23, Marcelo Tosatti wrote:
> > Add machine option and QMP commands to configure TSC deadline
> > timer advancement.
> >
> > Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
> >
> > ---
> >  monitor.c |   15 ++
> >  qapi-schema.json  |   29 +++
> >  qmp-commands.hx   |   48 
> >  target-i386/kvm.c |   80 ++
> >  vl.c  |4 ++
> >  5 files changed, 176 insertions(+)
> >
> > Index: qemu.tscdeadline/qapi-schema.json
> > ===
> > --- qemu.tscdeadline.orig/qapi-schema.json
> > +++ qemu.tscdeadline/qapi-schema.json
> > @@ -3515,3 +3515,32 @@
> >  # Since: 2.1
> >  ##
> >  { 'command': 'rtc-reset-reinjection' }
> > +
> > +##
> > +# @set-lapic-tscdeadline-advance
> > +#
> > +# This command sets the TSC deadline timer advancement.
> > +# This value will be subtracted from the expiration time
> > +# of the high resolution timer which emulates
> > +# TSC deadline timer.
> > +#
> > +# Useful to achieve low timer latencies.
> > +#
> > +# Only supported by KVM acceleration.
> > +#
> > +# Since: 2.3
> > +##
> > +{ 'command': 'set-lapic-tscdeadline-advance',
> > +  'data': { 'advance':'int' }
> > +}
> > +
> > +##
> > +# @get-lapic-tscdeadline-advance
> > +#
> > +# This command gets the TSC deadline timer advancement.
> > +#
> > +# Only supported by KVM acceleration.
> > +#
> > +# Since: 2.3
> > +##
> > +{ 'command': 'get-lapic-tscdeadline-advance', 'returns': 'int' }
>
> Please add an object property to the x86 CPU object.  It can then be
> configured with -global on the command line.

Don't want to allow individual values for different CPUs. 
It is a per-VM property.

Is it still appropriate to use an object property of the 
CPU object?

 
> > +ret = kvm_vm_ioctl(s, KVM_SET_TSCDEADLINE_ADVANCE, &adv);
> > +if (ret < 0) {
> > +return ret;
> > +}
>
> Please use KVM_GET/SET_ONE_REG instead of introducing a new set of ioctls.
>
> Paolo


Re: [patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 17:53, marcelo.tosa...@amt.cnet wrote:
> For the hrtimer which emulates the tscdeadline timer in the guest,
> add an option to advance expiration, and busy spin on VM-entry waiting
> for the actual expiration time to elapse.
>
> This allows achieving low latencies in cyclictest (or any scenario
> which requires strict timing regarding timer expiration).
>
> Reduces cyclictest avg latency by 50%.
>
> Note: this option requires tuning to find the appropriate value
> for a particular hardware/guest combination. One method is to measure the
> average delay between apic_timer_fn and VM-entry.
> Another method is to start with 1000ns, and increase the value
> in say 500ns increments until avg cyclictest numbers stop decreasing.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

What is the latency value that you find in practice, for both
apic_timer_fn to vmentry?  Or for apic_timer_fn to just before vmrun?
Let's start with a kvm-unit-tests patch to measure this value.

We can then decide whether to hardcode a small default value (e.g.
1000-3000) and make it a module parameter?  Or perhaps start with a
higher value (twice what you find in practice?) and adjust it towards a
target every time wait_lapic_expire is called.  But in order to judge
the correct approach, I need to see the numbers.
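
One possible shape for the "adjust it towards a target" idea is a small
proportional controller, sketched below. Everything in it is an
assumption made for illustration: the function name, the notion of
"slack" (how early the vCPU reached the wait relative to the deadline,
negative when late), the target, and the 1/8 gain. It is not taken from
either patch.

```c
#include <assert.h>
#include <stdint.h>

/* Nudge the advance toward the point where the vCPU reaches the wait
 * target_slack_ns before the deadline.
 *   slack_ns > 0: arrived early and had to spin for slack_ns
 *   slack_ns < 0: arrived late, the deadline had already passed
 * Only 1/8 of the error is applied per call to damp noise. */
static uint32_t adjust_advance_ns(uint32_t advance_ns, int64_t slack_ns,
                                  int64_t target_slack_ns, uint32_t max_ns)
{
    int64_t error = slack_ns - target_slack_ns;
    int64_t next = (int64_t)advance_ns - error / 8;

    assert(max_ns > 0);
    if (next < 0)
        next = 0;
    if (next > (int64_t)max_ns)
        next = max_ns;                      /* bound the busy-wait cost */
    return (uint32_t)next;
}
```

Starting high and decaying keeps early samples on the spinning side
rather than the late side, at the cost of extra busy-waiting until the
value converges.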

Paolo

Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 18:04, Marcelo Tosatti wrote:
>> Please add an object property to the x86 CPU object.  It can then be
>> configured with -global on the command line.
>
> Don't want to allow individual values for different CPUs.
> It is a per-VM property.

Why?  It can cause busy waiting, it would make sense to make it stricter
for realtime CPUs and leave 0 for non-realtime CPUs.

Paolo

Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 18:27, Marcelo Tosatti wrote:
> On Wed, Dec 10, 2014 at 06:09:19PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 10/12/2014 18:04, Marcelo Tosatti wrote:
>>>> Please add an object property to the x86 CPU object.  It can then be
>>>> configured with -global on the command line.
>>>
>>> Don't want to allow individual values for different CPUs.
>>> It is a per-VM property.
>>
>> Why?  It can cause busy waiting, it would make sense to make it stricter
>> for realtime CPUs and leave 0 for non-realtime CPUs.
>>
>> Paolo
>
> HW timer behaviour should be consistent across CPUs, IMO.

It's not going to be anyway.  Cache line bounces, frequency scaling,
presence of higher-priority RT tasks, etc. can cause different response
for one CPU over the others.

Paolo

Re: [patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 06:08:14PM +0100, Paolo Bonzini wrote:
 
 
> On 10/12/2014 17:53, marcelo.tosa...@amt.cnet wrote:
> > For the hrtimer which emulates the tscdeadline timer in the guest,
> > add an option to advance expiration, and busy spin on VM-entry waiting
> > for the actual expiration time to elapse.
> >
> > This allows achieving low latencies in cyclictest (or any scenario
> > which requires strict timing regarding timer expiration).
> >
> > Reduces cyclictest avg latency by 50%.
> >
> > Note: this option requires tuning to find the appropriate value
> > for a particular hardware/guest combination. One method is to measure the
> > average delay between apic_timer_fn and VM-entry.
> > Another method is to start with 1000ns, and increase the value
> > in say 500ns increments until avg cyclictest numbers stop decreasing.
> >
> > Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
>
> What is the latency value that you find in practice, for both
> apic_timer_fn to vmentry?  Or for apic_timer_fn to just before vmrun?

7us between apic_timer_fn and kvm_entry tracepoint.

> Let's start with a kvm-unit-tests patch to measure this value.

I can, but kvm-unit-test register state will not be similar to
actual guest state (think host/guest state loading).

What is the advantage of using a kvm-unit-test test rather
than cyclictest in the guest ?

> We can then decide whether to hardcode a small default value (e.g.
> 1000-3000) and make it a module parameter?  Or perhaps start with a
> higher value (twice what you find in practice?) and adjust it towards a
> target every time wait_lapic_expire is called.  But in order to judge
> the correct approach, I need to see the numbers.

Problem with automatic adjustment is: what is the correct target?

You want faster instances of apic_timer_fn->vm-entry to spin a bit,
and allow slow instances of apic_timer_fn->vm-entry to have
an effective advancement.




Re: [Qemu-devel] [PATCH RFC v5 05/19] virtio: support more feature bits

2014-12-10 Thread Cornelia Huck

On Tue,  2 Dec 2014 14:00:13 +0100
Cornelia Huck cornelia.h...@de.ibm.com wrote:


> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index 070006c..23d713b 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -51,6 +51,17 @@ extern PropertyInfo qdev_prop_arraylen;
>  .defval= (bool)_defval,  \
>  }
>
> +#define DEFINE_PROP_BIT64(_name, _state, _field, _bit, _defval) {  \
> +.name  = (_name),\
> +.info  = &(qdev_prop_bit),   \
> +.bitnr= (_bit),  \
> +.offset= offsetof(_state, _field)\
> ++ type_check(uint64_t,typeof_field(_state, _field)), \
> +.qtype = QTYPE_QBOOL,\
> +.defval= (bool)_defval,  \
> +}
> +
> +
>  #define DEFINE_PROP_BOOL(_name, _state, _field, _defval) {   \
>  .name  = (_name),\
>  .info  = &(qdev_prop_bool),  \
This one is of course broken. I'll send an updated patch tomorrow.


Re: [Qemu-devel] [PATCH RFC v5 18/19] virtio: support revision-specific features

2014-12-10 Thread Cornelia Huck

On Tue,  2 Dec 2014 14:00:26 +0100
Cornelia Huck cornelia.h...@de.ibm.com wrote:

> Devices may support different sets of feature bits depending on which
> revision they're operating at. Let's give the transport a way to
> re-query the device about its features when the revision has been
> changed.
>
> Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
> ---
>  hw/s390x/virtio-ccw.c  |   12 ++--
>  hw/virtio/virtio-bus.c |   14 --
>  include/hw/virtio/virtio-bus.h |3 +++
>  include/hw/virtio/virtio.h |3 +++
>  4 files changed, 28 insertions(+), 4 deletions(-)

There seems to be something wrong with this patch - I noticed when I
fixed prop_bit64. Needs debugging.


[Bug 87611] Duplicate interrupt abbreviation [patch proposal included]

2014-12-10 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=87611

--- Comment #1 from Antti Tönkyrä daeda...@pingtimeout.net ---
I also posted the patch to LKML but I think no-one picked it up and I didn't
have time to follow up on this after that. ( https://lkml.org/lkml/2014/11/9/57 )

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [Qemu-devel] [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Eric Blake

On 12/10/2014 09:23 AM, Marcelo Tosatti wrote:
> Add machine option and QMP commands to configure TSC deadline
> timer advancement.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
>
> ---

> +##
> +# @get-lapic-tscdeadline-advance
> +#
> +# This command gets the TSC deadline timer advancement.
> +#
> +# Only supported by KVM acceleration.
> +#
> +# Since: 2.3
> +##
> +{ 'command': 'get-lapic-tscdeadline-advance', 'returns': 'int' }

Please don't return a bare int.  It is not extensible, if we ever need
multiple named values associated with lapic in the future.  Return a
dictionary instead.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [patch 0/2] KVM: add option to advance tscdeadline hrtimer expiration (v2)

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 18:06, Marcelo Tosatti wrote:
> See patches for details.

Difference between v1 and v2?  Please fix your workflow.

Paolo

[patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Marcelo Tosatti

For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario 
which requires strict timing regarding timer expiration).

Reduces cyclictest avg latency by 50%.

Note: this option requires tuning to find the appropriate value
for a particular hardware/guest combination. One method is to measure the
average delay between apic_timer_fn and VM-entry.
Another method is to start with 1000ns and increase the value
in, say, 500ns increments until the average cyclictest numbers stop decreasing.
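
The idea described above (fire the timer early, then busy spin until the
real deadline) can be sketched in userspace C. This is only an illustration
under stated assumptions, not the kernel code: now_ns(), early_expiry_ns()
and spin_until_ns() are hypothetical helper names, CLOCK_MONOTONIC stands in
for the guest TSC, and the clamp to 'now' is an addition of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds (userspace stand-in for the guest TSC). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Time at which to arm the timer: the deadline pulled in by advance_ns.
 * Assumption of this sketch: never arm the timer in the past.
 */
static uint64_t early_expiry_ns(uint64_t now, uint64_t deadline,
				uint64_t advance_ns)
{
	if (deadline <= now || deadline - now <= advance_ns)
		return now;
	return deadline - advance_ns;
}

/* Busy-wait out the remainder; returns the nanoseconds spent spinning. */
static uint64_t spin_until_ns(uint64_t deadline)
{
	uint64_t start = now_ns();

	while (now_ns() < deadline)
		;
	return now_ns() - start;
}
```

A larger advance_ns makes a late wakeup less likely but increases the time
burned in spin_until_ns(), which is the trade-off the tuning note refers to.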

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/lapic.c
===
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -33,6 +33,7 @@
 #include <asm/page.h>
 #include <asm/current.h>
 #include <asm/apicdef.h>
+#include <asm/delay.h>
 #include <linux/atomic.h>
 #include <linux/jump_label.h>
 #include "kvm_cache_regs.h"
@@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	wait_queue_head_t *q = &vcpu->wq;
+	struct kvm_timer *ktimer = &apic->lapic_timer;
 
 	/*
 	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
@@ -1087,11 +1089,59 @@ static void apic_timer_expired(struct kv
 
 	if (waitqueue_active(q))
 		wake_up_interruptible(q);
+
+	if (ktimer->timer_mode_mask == APIC_LVT_TIMER_TSCDEADLINE)
+		ktimer->expired_tscdeadline = ktimer->tscdeadline;
+}
+
+static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
+
+	if (kvm_apic_hw_enabled(apic)) {
+		int vec = reg & APIC_VECTOR_MASK;
+
+		if (kvm_x86_ops->test_posted_interrupt)
+			return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
+		else {
+			if (apic_test_vector(vec, apic->regs + APIC_ISR))
+				return true;
+		}
+	}
+	return false;
+}
+
+void wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u64 guest_tsc, tsc_deadline;
+
+	if (!kvm_vcpu_has_lapic(vcpu))
+		return;
+
+	if (!apic_lvtt_tscdeadline(apic))
+		return;
+
+	if (!lapic_timer_int_injected(vcpu))
+		return;
+
+	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
+	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+
+	while (guest_tsc < tsc_deadline) {
+		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
+
+		ndelay(delay);
+		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+	}
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)
 {
 	ktime_t now;
+	struct kvm_arch *kvm_arch = &apic->vcpu->kvm->arch;
+
 	atomic_set(&apic->lapic_timer.pending, 0);
 
 	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
@@ -1137,6 +1187,7 @@ static void start_apic_timer(struct kvm_
 		/* lapic timer in tsc deadline mode */
 		u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
 		u64 ns = 0;
+		ktime_t expire;
 		struct kvm_vcpu *vcpu = apic->vcpu;
 		unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
 		unsigned long flags;
@@ -1149,10 +1200,14 @@ static void start_apic_timer(struct kvm_
 		now = apic->lapic_timer.timer.base->get_time();
 		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
 		if (likely(tscdeadline > guest_tsc)) {
+			u32 advance = kvm_arch->lapic_tscdeadline_advance_ns;
+
 			ns = (tscdeadline - guest_tsc) * 1000000ULL;
 			do_div(ns, this_tsc_khz);
+			expire = ktime_add_ns(now, ns);
+			expire = ktime_sub_ns(expire, advance);
 			hrtimer_start(&apic->lapic_timer.timer,
-				ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+				      expire, HRTIMER_MODE_ABS);
 		} else
 			apic_timer_expired(apic);
 
Index: kvm/arch/x86/kvm/lapic.h
===
--- kvm.orig/arch/x86/kvm/lapic.h
+++ kvm/arch/x86/kvm/lapic.h
@@ -14,6 +14,7 @@ struct kvm_timer {
u32 timer_mode;
u32 timer_mode_mask;
u64 tscdeadline;
+   u64 expired_tscdeadline;
 	atomic_t pending;			/* accumulated triggered timers */
 };
 
@@ -170,4 +171,6 @@ static inline bool kvm_apic_has_events(s
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
+void

[patch 1/2] KVM: x86: add method to test PIR bitmap vector

2014-12-10 Thread Marcelo Tosatti

kvm_x86_ops->test_posted_interrupt() returns true/false depending
on whether 'vector' is set in the PIR bitmap.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/include/asm/kvm_host.h
===
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -743,6 +743,7 @@ struct kvm_x86_ops {
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+   bool (*test_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -435,6 +435,11 @@ static int pi_test_and_set_pir(int vecto
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static int pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+	return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
unsigned long host_rsp;
@@ -5939,6 +5944,7 @@ static __init int hardware_setup(void)
 	else {
 		kvm_x86_ops->hwapic_irr_update = NULL;
 		kvm_x86_ops->deliver_posted_interrupt = NULL;
+		kvm_x86_ops->test_posted_interrupt = NULL;
 		kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
 	}
 
@@ -6960,6 +6966,13 @@ static int handle_invvpid(struct kvm_vcp
return 1;
 }
 
+static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	return pi_test_pir(vector, &vmx->pi_desc);
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -9374,6 +9387,7 @@ static struct kvm_x86_ops vmx_x86_ops =
.hwapic_isr_update = vmx_hwapic_isr_update,
.sync_pir_to_irr = vmx_sync_pir_to_irr,
.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+   .test_posted_interrupt = vmx_test_pir,
 
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,
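
For readers unfamiliar with the PIR, the test added by this patch boils down
to checking one bit in a 256-bit bitmap indexed by vector number. A minimal
userspace sketch follows; struct pir_bitmap, pir_set() and pir_test() are
hypothetical names for illustration, and the plain (non-atomic) bit
operations are a simplification compared with the kernel's test_bit().

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS	256u
#define BITS_PER_WORD	((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* One bit per interrupt vector, as in the posted-interrupt request bitmap. */
struct pir_bitmap {
	unsigned long bits[NR_VECTORS / (sizeof(unsigned long) * CHAR_BIT)];
};

/* Mark 'vector' as pending (non-atomic; illustration only). */
static void pir_set(struct pir_bitmap *p, unsigned int vector)
{
	p->bits[vector / BITS_PER_WORD] |= 1UL << (vector % BITS_PER_WORD);
}

/* Return true if 'vector' is pending, without modifying the bitmap. */
static bool pir_test(const struct pir_bitmap *p, unsigned int vector)
{
	return (p->bits[vector / BITS_PER_WORD] >> (vector % BITS_PER_WORD)) & 1UL;
}
```

Unlike a test-and-set helper, pir_test() leaves the bitmap untouched, so it
can be used purely as a query.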



Re: [patch 1/2] KVM: x86: add method to test PIR bitmap vector

2014-12-10 Thread Rik van Riel

On 12/10/2014 12:27 PM, Marcelo Tosatti wrote:
> On Wed, Dec 10, 2014 at 12:10:04PM -0500, Rik van Riel wrote:
>> On 12/10/2014 12:06 PM, Marcelo Tosatti wrote:
>>> kvm_x86_ops->test_posted_interrupt() returns true/false depending
>>> whether 'vector' is set.
>>
>> Is that good? Bad? How does this patch address the issue?
> 
> What issue?

Why is this change being made?

Re: [patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 18:34, Marcelo Tosatti wrote:
>> Let's start with a kvm-unit-tests patch to measure this value.
> I can, but kvm-unit-test register state will not be similar to
> actual guest state (think host/guest state loading).

7us is about 20000 clock cycles.  A lightweight vmexit is an order of
magnitude less expensive, and half of the vmexit overhead is the VMRUN
instruction itself.  All in all, the host/guest state loading should not
matter (or should matter little).
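
As a rough check of the cycle estimate, the conversion between nanoseconds
and TSC cycles can be written out in C. This is a sketch under an assumed
3 GHz TSC (3000000 kHz); the helper names ns_to_cycles() and
tsc_delta_to_ns() are made up for illustration.

```c
#include <stdint.h>

/* Cycles elapsed in 'ns' nanoseconds on a TSC running at tsc_khz kHz. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t tsc_khz)
{
	return ns * tsc_khz / 1000000ULL;
}

/* Inverse conversion: nanoseconds covered by 'tsc_delta' cycles. */
static uint64_t tsc_delta_to_ns(uint64_t tsc_delta, uint64_t tsc_khz)
{
	return tsc_delta * 1000000ULL / tsc_khz;
}
```

With the assumed 3000000 kHz clock, ns_to_cycles(7000, 3000000) gives 21000
cycles, which is in line with the figure above.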

> What is the advantage of using a kvm-unit-test test rather
> than cyclictest in the guest ?

That it starts in 3 seconds, and that you can vary the timer frequency
in order to measure jitter in addition to latency.

>> We can then decide whether to hardcode a small default value (e.g.
>> 1000-3000) and make it a module parameter?  Or perhaps start with a
>> higher value (twice what you find in practice?) and adjust it towards a
>> target every time wait_lapic_expire is called.  But in order to judge
>> the correct approach, I need to see the numbers.
> 
> Problem with automatic adjustment is: what is the correct target?

We cannot say without seeing the numbers, particularly the jitter.  This
is why I want to see numbers for varying frequencies (from 100us to 10ms
per tick, say).

> You want faster instances of apic_timer_fn->vm-entry to spin a bit,
> and allow slow instances of apic_timer_fn->vm-entry to have
> an effective advancement.

If it is small enough, you can make the timer a little early (increase
advance) by a small amount on every delivered interrupt.  This prepares
for a slow instance.

And you can make the timer less early (decrease advance) by some
percentage of what you had to wait on every wait_lapic_expire, if you
had to wait more than a given threshold.  This avoids that you wait too
much on consecutive fast instances.
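
The two rules above can be sketched as a small state machine in C. This is
a sketch only: struct advance_tuner, its field names, and the cap on the
advancement are assumptions made for illustration, and the actual constants
would have to come from measurements.

```c
#include <stdint.h>

/* Hypothetical tuning state; every field here is an assumption. */
struct advance_tuner {
	uint32_t advance_ns;		/* current timer advancement */
	uint32_t step_ns;		/* added on every delivered interrupt */
	uint32_t wait_threshold_ns;	/* waits above this trigger a back-off */
	uint32_t backoff_percent;	/* share of the wait to subtract */
	uint32_t max_advance_ns;	/* upper bound on the advancement */
};

/* Rule 1: make the timer slightly earlier on every delivered interrupt. */
static void tuner_on_interrupt_delivered(struct advance_tuner *t)
{
	uint64_t next = (uint64_t)t->advance_ns + t->step_ns;

	t->advance_ns = next > t->max_advance_ns ? t->max_advance_ns
						 : (uint32_t)next;
}

/* Rule 2: if we had to spin too long, give back part of the advancement. */
static void tuner_on_wait(struct advance_tuner *t, uint32_t waited_ns)
{
	uint32_t cut;

	if (waited_ns <= t->wait_threshold_ns)
		return;

	cut = (uint32_t)((uint64_t)waited_ns * t->backoff_percent / 100);
	t->advance_ns = cut >= t->advance_ns ? 0 : t->advance_ns - cut;
}
```

The small additive increase prepares for a slow instance, while the
proportional decrease keeps consecutive fast instances from spinning for
long.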

Paolo

Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 18:35, Marcelo Tosatti wrote:
> On Wed, Dec 10, 2014 at 06:29:43PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 10/12/2014 18:27, Marcelo Tosatti wrote:
>>> On Wed, Dec 10, 2014 at 06:09:19PM +0100, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 10/12/2014 18:04, Marcelo Tosatti wrote:
>>>>>> Please add an object property to the x86 CPU object.  It can then be
>>>>>> configured with -global on the command line.
>>>>>
>>>>> Don't want to allow individual values for different CPUs.
>>>>> It is a per-VM property.
>>>>
>>>> Why?  It can cause busy waiting, it would make sense to make it stricter
>>>> for realtime CPUs and leave 0 for non-realtime CPUs.
>>>
>>> HW timer behaviour should be consistent across CPUs, IMO.
>>
>> It's not going to be anyway.  Cache line bounces, frequency scaling,
>> presence of higher-priority RT tasks, etc. can cause different response
>> for one CPU over the others.
> 
> OK i'll change it to per-CPU.

Well, my preferred choice would be automatic adjustment with a module
parameter.  If we need manual tuning, per-CPU would be my choice, but
automatic is nicer anyway. :)

Paolo

Re: [patch 1/2] KVM: x86: add method to test PIR bitmap vector

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 12:50:29PM -0500, Rik van Riel wrote:
> On 12/10/2014 12:27 PM, Marcelo Tosatti wrote:
>> On Wed, Dec 10, 2014 at 12:10:04PM -0500, Rik van Riel wrote:
>>> On 12/10/2014 12:06 PM, Marcelo Tosatti wrote:
>>>> kvm_x86_ops->test_posted_interrupt() returns true/false depending
>>>> whether 'vector' is set.
>>>
>>> Is that good? Bad? How does this patch address the issue?
>>
>> What issue?
> 
> Why is this change being made?

Next patch in the series.


Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 06:29:43PM +0100, Paolo Bonzini wrote:
> 
> 
> On 10/12/2014 18:27, Marcelo Tosatti wrote:
>> On Wed, Dec 10, 2014 at 06:09:19PM +0100, Paolo Bonzini wrote:
>>>
>>>
>>> On 10/12/2014 18:04, Marcelo Tosatti wrote:
>>>>> Please add an object property to the x86 CPU object.  It can then be
>>>>> configured with -global on the command line.
>>>>
>>>> Don't want to allow individual values for different CPUs.
>>>> It is a per-VM property.
>>>
>>> Why?  It can cause busy waiting, it would make sense to make it stricter
>>> for realtime CPUs and leave 0 for non-realtime CPUs.
>>>
>>> Paolo
>> 
>> HW timer behaviour should be consistent across CPUs, IMO.
> 
> It's not going to be anyway.  Cache line bounces, frequency scaling,
> presence of higher-priority RT tasks, etc. can cause different response
> for one CPU over the others.
> 
> Paolo

OK, I'll change it to per-CPU.


Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Radim Krčmář

2014-12-10 18:55+0100, Paolo Bonzini:
> Well, my preferred choice would be automatic adjustment with a module
> parameter.  If we need manual tuning, per-CPU would be my choice, but
> automatic is nicer anyway. :)

I agree with Paolo, and think it would be better not to touch QEMU ...
it makes little sense to migrate this value and it is probably going to
be quite similar on every CPU, so a writeable module parameter is a
better starting point.  (We can always turn it into a nightmare later.)

If you measure the difference between the TSC you wanted and got on VM
entry, you can use it to automatically guess a delta for the next timer.
(That is IMO exactly what you would do with a manual tuning.
 The algorithm should probably prefer being a bit late than early too.)
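
One way to read the suggestion above is as a feedback step that takes the
wanted and observed times and returns the advancement for the next timer.
The sketch below is only one possible interpretation: next_advance_ns() is a
hypothetical name, times are in nanoseconds rather than TSC ticks, and the
1/8 gain for late arrivals is an arbitrary assumption chosen so that the
estimate errs towards being late rather than early.

```c
#include <stdint.h>

/*
 * wanted_ns: time the guest asked for.
 * got_ns:    time observed at VM entry, before any spinning.
 * Returns the advancement to use for the next timer.
 */
static uint32_t next_advance_ns(uint32_t advance_ns, int64_t wanted_ns,
				int64_t got_ns)
{
	int64_t error = got_ns - wanted_ns;	/* > 0: late, < 0: early */
	int64_t next = advance_ns;

	if (error > 0)
		next += error / 8;	/* late: creep forward slowly */
	else
		next += error;		/* early: back off by the full amount */

	if (next < 0)
		next = 0;
	if (next > (int64_t)UINT32_MAX)
		next = (int64_t)UINT32_MAX;
	return (uint32_t)next;
}
```

Because early arrivals are corrected in full and late arrivals only
partially, the advancement settles slightly below the measured delay.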

Re: [QEMU patch 2/2] kvm: allow configuration of tsc deadline timer advancement

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 19:39, Radim Krčmář wrote:
> 2014-12-10 18:55+0100, Paolo Bonzini:
>> Well, my preferred choice would be automatic adjustment with a module
>> parameter.  If we need manual tuning, per-CPU would be my choice, but
>> automatic is nicer anyway. :)
> 
> I agree with Paolo, and think it would be better not to touch QEMU ...
> it makes little sense to migrate this value and it is probably going to
> be quite similar on every CPU, so a writeable module parameter is a
> better starting point.  (We can always turn it into a nightmare later.)

Ok, let's start with a simple module parameter, similar to what PLE used
to have.  We can use that to play with kvm-unit-tests.

Paolo

[PATCH kvm-unit-tests 00/15] arm64: initial drop

2014-12-10 Thread Andrew Jones

This series adds support for aarch64 to the kvm-unit-tests framework,
bringing it to the same level as the arm support. In the process a
few tweaks to the arm support were made, as one of the main goals
was to share as much code as possible between the two.

Patches
01   : A fix for the script runner. We need this one for arm
   regardless of the aarch64 support.
02-03: Fixes to the arm support. The bugs fixed weren't visible
   until running on aarch64.
04-07: Prep the arm framework for the bare minimal initial drop
08   : The bare minimal initial drop
09   : Add vector support to the minimal drop
10-12: Prep the arm framework for enabling the mmu on aarch64
13-14: Prep the aarch64 framework for enabling the mmu
15   : Enables the mmu on aarch64

These patches are also available here
https://github.com/rhdrjones/kvm-unit-tests/tree/arm64/initial-drop

Thanks,
drew


Andrew Jones (15):
  arm: fix run script testdev probing
  virtio: don't use size_t
  arm: setup: fix type mismatch
  Makefile: cscope may need to look in lib/$ARCH
  arm: use absolute headers
  arm: setup: drop unused arguments
  arm: selftest: rename svc mode to kernel mode
  arm64: initial drop
  arm64: vectors support
  arm: get PHYS_MASK from pgtable-hwdef.h
  arm: import more linux page table api
  arm: prepare mmu code for arm64
  arm64: import some Linux page table API
  arm64: prepare for 64k pages
  arm64: enable mmu

 Makefile  |   4 +-
 arm/cstart.S  |  18 ++-
 arm/cstart64.S| 252 ++
 arm/flat.lds  |  11 +-
 arm/run   |  12 +-
 arm/selftest.c| 141 +--
 arm/unittests.cfg |  12 +-
 config/config-arm-common.mak  |  69 
 config/config-arm.mak |  74 ++---
 config/config-arm64.mak   |  21 
 configure |  12 +-
 lib/arm/asm-offsets.c |  11 +-
 lib/arm/asm/asm-offsets.h |   2 +-
 lib/arm/asm/io.h  |   8 +-
 lib/arm/asm/mmu-api.h |  14 +++
 lib/arm/asm/mmu.h |  27 ++---
 lib/arm/asm/page.h|   7 +-
 lib/arm/asm/pgtable-hwdef.h   |  44 +++-
 lib/arm/asm/pgtable.h |  91 +++
 lib/arm/asm/processor.h   |   2 +-
 lib/arm/asm/ptrace.h  |   2 +-
 lib/arm/asm/setup.h   |  11 +-
 lib/arm/eabi_compat.c |   2 +-
 lib/arm/io.c  |  10 +-
 lib/arm/mmu.c |  82 ++
 lib/arm/processor.c   |   6 +-
 lib/arm/setup.c   |  19 ++--
 lib/arm/spinlock.c|   8 +-
 lib/arm64/.gitignore  |   1 +
 lib/arm64/asm-offsets.c   |  30 +
 lib/arm64/asm/asm-offsets.h   |   1 +
 lib/arm64/asm/barrier.h   |  17 +++
 lib/arm64/asm/esr.h   |  43 +++
 lib/arm64/asm/io.h|  84 ++
 lib/arm64/asm/mmu-api.h   |   1 +
 lib/arm64/asm/mmu.h   |  24 
 lib/arm64/asm/page.h  |  65 +++
 lib/arm64/asm/pgtable-hwdef.h | 136 +++
 lib/arm64/asm/pgtable.h   |  69 
 lib/arm64/asm/processor.h |  66 +++
 lib/arm64/asm/ptrace.h|  95 
 lib/arm64/asm/setup.h |   1 +
 lib/arm64/asm/spinlock.h  |  15 +++
 lib/arm64/processor.c | 192 
 lib/chr-testdev.c |   4 +-
 lib/kbuild.h  |   8 ++
 lib/virtio.c  |   2 +-
 lib/virtio.h  |   3 +-
 48 files changed, 1638 insertions(+), 191 deletions(-)
 create mode 100644 arm/cstart64.S
 create mode 100644 config/config-arm-common.mak
 create mode 100644 config/config-arm64.mak
 create mode 100644 lib/arm/asm/mmu-api.h
 create mode 100644 lib/arm/asm/pgtable.h
 create mode 100644 lib/arm64/.gitignore
 create mode 100644 lib/arm64/asm-offsets.c
 create mode 100644 lib/arm64/asm/asm-offsets.h
 create mode 100644 lib/arm64/asm/barrier.h
 create mode 100644 lib/arm64/asm/esr.h
 create mode 100644 lib/arm64/asm/io.h
 create mode 100644 lib/arm64/asm/mmu-api.h
 create mode 100644 lib/arm64/asm/mmu.h
 create mode 100644 lib/arm64/asm/page.h
 create mode 100644 lib/arm64/asm/pgtable-hwdef.h
 create mode 100644 lib/arm64/asm/pgtable.h
 create mode 100644 lib/arm64/asm/processor.h
 create mode 100644 lib/arm64/asm/ptrace.h
 create mode 100644 lib/arm64/asm/setup.h
 create mode 100644 lib/arm64/asm/spinlock.h
 create mode 100644 lib/arm64/processor.c
 create mode 100644 lib/kbuild.h

-- 
1.9.3


[PATCH 15/15] arm64: enable mmu

2014-12-10 Thread Andrew Jones

Implement asm_mmu_enable and flush_tlb_all, and then make a final
change to mmu.c in order to link it into arm64. The final change
is to map the code read-only. This is necessary because armv8
forces all writable code shared between EL1 and EL0 to be PXN.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/cstart64.S   | 64 
 arm/flat.lds |  1 +
 config/config-arm-common.mak |  1 +
 config/config-arm.mak|  1 -
 lib/arm/asm/mmu.h|  1 +
 lib/arm/mmu.c| 10 ++-
 lib/arm64/asm/mmu-api.h  |  1 +
 lib/arm64/asm/mmu.h  | 16 +++
 lib/arm64/asm/processor.h| 14 ++
 lib/arm64/processor.c| 26 +-
 10 files changed, 127 insertions(+), 8 deletions(-)
 create mode 100644 lib/arm64/asm/mmu-api.h

diff --git a/arm/cstart64.S b/arm/cstart64.S
index d1860a94fb2d3..5151f4c77d745 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -8,6 +8,9 @@
 #define __ASSEMBLY__
 #include <asm/asm-offsets.h>
 #include <asm/ptrace.h>
+#include <asm/processor.h>
+#include <asm/page.h>
+#include <asm/pgtable-hwdef.h>
 
 .section .init
 
@@ -55,6 +58,67 @@ halt:
b   1b
 
 /*
+ * asm_mmu_enable
+ *   Inputs:
+ * x0 is the base address of the translation table
+ *   Outputs: none
+ *
+ * Adapted from
+ *   arch/arm64/kernel/head.S
+ *   arch/arm64/mm/proc.S
+ */
+
+/*
+ * Memory region attributes for LPAE:
+ *
+ *   n = AttrIndx[2:0]
+ *                      n       MAIR
+ *   DEVICE_nGnRnE      000     00000000
+ *   DEVICE_nGnRE       001     00000100
+ *   DEVICE_GRE         010     00001100
+ *   NORMAL_NC          011     01000100
+ *   NORMAL             100     11111111
+ */
+#define MAIR(attr, mt) ((attr) << ((mt) * 8))
+
+.globl asm_mmu_enable
+asm_mmu_enable:
+   ic      iallu           // I+BTB cache invalidate
+   tlbi    vmalle1is       // invalidate I + D TLBs
+   dsb ish
+
+   /* TCR */
+   ldr x1, =TCR_TxSZ(VA_BITS) |\
+TCR_TG0_64K | TCR_TG1_64K |\
+TCR_IRGN_WBWA | TCR_ORGN_WBWA |\
+TCR_SHARED
+   mov x2, #3  // 011 is 42 bits
+   bfi x1, x2, #32, #3
+   msr tcr_el1, x1
+
+   /* MAIR */
+   ldr x1, =MAIR(0x00, MT_DEVICE_nGnRnE) | \
+MAIR(0x04, MT_DEVICE_nGnRE) |  \
+MAIR(0x0c, MT_DEVICE_GRE) |\
+MAIR(0x44, MT_NORMAL_NC) | \
+MAIR(0xff, MT_NORMAL)
+   msr mair_el1, x1
+
+   /* TTBR0 */
+   msr ttbr0_el1, x0
+   isb
+
+   /* SCTLR */
+   mrs x1, sctlr_el1
+   orr x1, x1, SCTLR_EL1_C
+   orr x1, x1, SCTLR_EL1_I
+   orr x1, x1, SCTLR_EL1_M
+   msr sctlr_el1, x1
+   isb
+
+   ret
+
+/*
  * Vectors
  * Adapted from arch/arm64/kernel/entry.S
  */
diff --git a/arm/flat.lds b/arm/flat.lds
index 89a55720d728f..a8849ee0939a8 100644
--- a/arm/flat.lds
+++ b/arm/flat.lds
@@ -3,6 +3,7 @@ SECTIONS
 {
 .text : { *(.init) *(.text) *(.text.*) }
 . = ALIGN(64K);
+etext = .;
 .data : {
 exception_stacks = .;
 . += 64K;
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index b61a2a6044ab2..b01e9ab836b2d 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -33,6 +33,7 @@ cflatobjs += lib/virtio-mmio.o
 cflatobjs += lib/chr-testdev.o
 cflatobjs += lib/arm/io.o
 cflatobjs += lib/arm/setup.o
+cflatobjs += lib/arm/mmu.o
 
 libeabi = lib/arm/libeabi.a
 eabiobjs = lib/arm/eabi_compat.o
diff --git a/config/config-arm.mak b/config/config-arm.mak
index 96686fb639d2d..16e2cb5c103a3 100644
--- a/config/config-arm.mak
+++ b/config/config-arm.mak
@@ -15,7 +15,6 @@ CFLAGS += -mcpu=$(PROCESSOR)
 cstart.o = $(TEST_DIR)/cstart.o
 cflatobjs += lib/arm/spinlock.o
 cflatobjs += lib/arm/processor.o
-cflatobjs += lib/arm/mmu.o
 
 # arm specific tests
 tests =
diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index 5ec7a6ce5886b..c1bd01c9ee1b9 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -9,6 +9,7 @@
 #include <asm/barrier.h>
 
 #define PTE_USER   L_PTE_USER
+#define PTE_RDONLY PTE_AP2
 #define PTE_SHARED L_PTE_SHARED
 #define PTE_AF PTE_EXT_AF
 #define PTE_WBWA   L_PTE_MT_WRITEALLOC
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 55d18a10e1ebd..1c024538663ce 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -8,6 +8,8 @@
 #include <asm/setup.h>
 #include <asm/mmu.h>
 
+extern unsigned long etext;
+
 pgd_t *mmu_idmap;
 
 static bool mmu_on;
@@ -72,13 +74,19 @@ void mmu_enable_idmap(void)
 {
 	unsigned long phys_end = sizeof(long) == 8 || !(PHYS_END >> 32)
 						? PHYS_END : 0xfffff000;
+	unsigned long code_end = (unsigned long)&etext;
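
To make the MAIR encoding used by asm_mmu_enable concrete, the register
value can be computed in C. This is a sketch for illustration: the MT_*
index values are taken from the "n" column of the comment table in the
patch, and mair_field() and mair_el1_value() are hypothetical helper names.

```c
#include <stdint.h>

/* Attribute indexes (AttrIndx[2:0]), per the comment table in the patch. */
enum {
	MT_DEVICE_nGnRnE = 0,
	MT_DEVICE_nGnRE  = 1,
	MT_DEVICE_GRE    = 2,
	MT_NORMAL_NC     = 3,
	MT_NORMAL        = 4,
};

/* Each memory type owns one byte of MAIR_EL1, selected by its index. */
static uint64_t mair_field(uint64_t attr, unsigned int mt)
{
	return attr << (mt * 8);
}

/* Combine the five attribute bytes the same way the assembly does. */
static uint64_t mair_el1_value(void)
{
	return mair_field(0x00, MT_DEVICE_nGnRnE) |
	       mair_field(0x04, MT_DEVICE_nGnRE) |
	       mair_field(0x0c, MT_DEVICE_GRE) |
	       mair_field(0x44, MT_NORMAL_NC) |
	       mair_field(0xff, MT_NORMAL);
}
```

A page table entry then selects one of these bytes through its AttrIndx
field rather than carrying the attribute bits itself.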

[PATCH 04/15] Makefile: cscope may need to look in lib/$ARCH

2014-12-10 Thread Andrew Jones

When $ARCH != $TEST_DIR we should look there too. This patch cheats
though and makes cscope always look there, but then gets rid of the
duplicates generated when $ARCH == $TEST_DIR.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index dd7e6e94bfe7b..4f28f072ae3d7 100644
--- a/Makefile
+++ b/Makefile
@@ -82,6 +82,6 @@ distclean: clean libfdt_clean
 cscope: common_dirs = lib lib/libfdt lib/asm lib/asm-generic
 cscope:
$(RM) ./cscope.*
-	find -L $(TEST_DIR) lib/$(TEST_DIR) $(common_dirs) -maxdepth 1 \
-		-name '*.[chsS]' -print | sed 's,^\./,,' > ./cscope.files
+	find -L $(TEST_DIR) lib/$(TEST_DIR) lib/$(ARCH) $(common_dirs) -maxdepth 1 \
+		-name '*.[chsS]' -print | sed 's,^\./,,' | sort -u > ./cscope.files
cscope -bk
-- 
1.9.3


[PATCH 07/15] arm: selftest: rename svc mode to kernel mode

2014-12-10 Thread Andrew Jones

Separate the concepts of an 'svc', the syscall instruction present
on both arm and arm64, and 'svc mode', which is arm's kernel mode,
and doesn't exist on arm64. Kernel mode on arm64 is modeled with
exception level 1 (EL1).

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/selftest.c|  4 ++--
 arm/unittests.cfg | 12 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arm/selftest.c b/arm/selftest.c
index 0de794ea7d696..885a54fee0e4a 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -195,11 +195,11 @@ int main(int argc, char **argv)
 
check_setup(argc-1, argv[1]);
 
-	} else if (strcmp(argv[0], "vectors-svc") == 0) {
+	} else if (strcmp(argv[0], "vectors-kernel") == 0) {
 
 		check_vectors(NULL);
 
-	} else if (strcmp(argv[0], "vectors-usr") == 0) {
+	} else if (strcmp(argv[0], "vectors-user") == 0) {
 
void *sp = memalign(PAGE_SIZE, PAGE_SIZE);
memset(sp, 0, PAGE_SIZE);
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 57f5f90f3e808..efcca6bf24af6 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -17,14 +17,14 @@ smp  = 1
 extra_params = -m 256 -append 'setup smp=1 mem=256'
 groups = selftest
 
-# Test vector setup and exception handling (svc mode).
-[selftest::vectors-svc]
+# Test vector setup and exception handling (kernel mode).
+[selftest::vectors-kernel]
 file = selftest.flat
-extra_params = -append 'vectors-svc'
+extra_params = -append 'vectors-kernel'
 groups = selftest
 
-# Test vector setup and exception handling (usr mode).
-[selftest::vectors-usr]
+# Test vector setup and exception handling (user mode).
+[selftest::vectors-user]
 file = selftest.flat
-extra_params = -append 'vectors-usr'
+extra_params = -append 'vectors-user'
 groups = selftest
-- 
1.9.3


[PATCH 09/15] arm64: vectors support

2014-12-10 Thread Andrew Jones

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/cstart64.S| 142 ++-
 arm/selftest.c| 129 ---
 arm/unittests.cfg |   2 -
 config/config-arm64.mak   |   1 +
 lib/arm64/asm-offsets.c   |  16 +
 lib/arm64/asm/esr.h   |  43 
 lib/arm64/asm/processor.h |  52 ++
 lib/arm64/asm/ptrace.h|  95 ++
 lib/arm64/processor.c | 168 ++
 9 files changed, 637 insertions(+), 11 deletions(-)
 create mode 100644 lib/arm64/asm/esr.h
 create mode 100644 lib/arm64/asm/processor.h
 create mode 100644 lib/arm64/asm/ptrace.h
 create mode 100644 lib/arm64/processor.c

diff --git a/arm/cstart64.S b/arm/cstart64.S
index 1d98066d0e187..d1860a94fb2d3 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -7,6 +7,7 @@
  */
 #define __ASSEMBLY__
 #include <asm/asm-offsets.h>
+#include <asm/ptrace.h>
 
 .section .init
 
@@ -26,7 +27,7 @@ start:
msr cpacr_el1, x0
 
/* set up exception handling */
-// bl  exceptions_init
+   bl  exceptions_init
 
/* complete setup */
ldp x0, x1, [sp], #16
@@ -40,9 +41,148 @@ start:
bl  exit
b   halt
 
+exceptions_init:
+   adr x0, vector_table
+   msr vbar_el1, x0
+   isb
+   ret
+
 .text
 
 .globl halt
 halt:
 1: wfi
b   1b
+
+/*
+ * Vectors
+ * Adapted from arch/arm64/kernel/entry.S
+ */
+.macro vector_stub, name, vec
+\name:
+   stp  x0,  x1, [sp, #-S_FRAME_SIZE]!
+   stp  x2,  x3, [sp,  #16]
+   stp  x4,  x5, [sp,  #32]
+   stp  x6,  x7, [sp,  #48]
+   stp  x8,  x9, [sp,  #64]
+   stp x10, x11, [sp,  #80]
+   stp x12, x13, [sp,  #96]
+   stp x14, x15, [sp, #112]
+   stp x16, x17, [sp, #128]
+   stp x18, x19, [sp, #144]
+   stp x20, x21, [sp, #160]
+   stp x22, x23, [sp, #176]
+   stp x24, x25, [sp, #192]
+   stp x26, x27, [sp, #208]
+   stp x28, x29, [sp, #224]
+
+   str x30, [sp, #S_LR]
+
+   .if \vec >= 8
+   mrs x1, sp_el0
+   .else
+   add x1, sp, #S_FRAME_SIZE
+   .endif
+   str x1, [sp, #S_SP]
+
+   mrs x1, elr_el1
+   mrs x2, spsr_el1
+   stp x1, x2, [sp, #S_PC]
+
+   and x2, x2, #PSR_MODE_MASK
+   cmp x2, #PSR_MODE_EL0t
+   b.ne    1f
+   adr x2, user_mode
+   str xzr, [x2]   /* we're in kernel mode now */
+
+1: mov x0, \vec
+   mov x1, sp
+   mrs x2, esr_el1
+   bl  do_handle_exception
+
+   ldp x1, x2, [sp, #S_PC]
+   msr spsr_el1, x2
+   msr elr_el1, x1
+
+   and x2, x2, #PSR_MODE_MASK
+   cmp x2, #PSR_MODE_EL0t
+   b.ne    1f
+   adr x2, user_mode
+   mov x1, #1
+   str x1, [x2]/* we're going back to user mode */
+
+1:
+   .if \vec >= 8
+   ldr x1, [sp, #S_SP]
+   msr sp_el0, x1
+   .endif
+
+   ldr x30, [sp, #S_LR]
+
+   ldp x28, x29, [sp, #224]
+   ldp x26, x27, [sp, #208]
+   ldp x24, x25, [sp, #192]
+   ldp x22, x23, [sp, #176]
+   ldp x20, x21, [sp, #160]
+   ldp x18, x19, [sp, #144]
+   ldp x16, x17, [sp, #128]
+   ldp x14, x15, [sp, #112]
+   ldp x12, x13, [sp,  #96]
+   ldp x10, x11, [sp,  #80]
+   ldp  x8,  x9, [sp,  #64]
+   ldp  x6,  x7, [sp,  #48]
+   ldp  x4,  x5, [sp,  #32]
+   ldp  x2,  x3, [sp,  #16]
+   ldp  x0,  x1, [sp], #S_FRAME_SIZE
+
+   eret
+.endm
+
+vector_stub    el1t_sync,     0
+vector_stub    el1t_irq,      1
+vector_stub    el1t_fiq,      2
+vector_stub    el1t_error,    3
+
+vector_stub    el1h_sync,     4
+vector_stub    el1h_irq,      5
+vector_stub    el1h_fiq,      6
+vector_stub    el1h_error,    7
+
+vector_stub    el0_sync_64,   8
+vector_stub    el0_irq_64,    9
+vector_stub    el0_fiq_64,   10
+vector_stub    el0_error_64, 11
+
+vector_stub    el0_sync_32,  12
+vector_stub    el0_irq_32,   13
+vector_stub    el0_fiq_32,   14
+vector_stub    el0_error_32, 15
+
+.section .text.ex
+
+.macro ventry, label
+.align 7
+   b   \label
+.endm
+
+.align 11
+vector_table:
+   ventry  el1t_sync   // Synchronous EL1t
+   ventry  el1t_irq// IRQ EL1t
+   ventry  el1t_fiq// FIQ EL1t
+   ventry  el1t_error  // Error EL1t
+
+   ventry  el1h_sync   // Synchronous EL1h
+   ventry  el1h_irq// IRQ EL1h
+   ventry  el1h_fiq// FIQ EL1h
+   ventry  el1h_error  // Error EL1h
+
+   ventry  el0_sync_64 // Synchronous 64-bit EL0
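
The layout implied by the .align directives in the vector table above
(".align 7" per entry, ".align 11" for the table) can be checked with a few
lines of C. This is an illustrative sketch; the macro and function names are
made up, and the vector numbers follow the vector_stub list in this patch.

```c
#include <stdint.h>

#define VECTOR_ENTRY_SIZE	128u	/* .align 7  -> 2^7 bytes per entry */
#define VECTOR_TABLE_ALIGN	2048u	/* .align 11 -> 2^11 byte alignment */
#define NR_VECTOR_ENTRIES	16u

/* Address of the entry for vector 'vec', given the table base in VBAR_EL1. */
static uint64_t vector_entry_addr(uint64_t vbar, unsigned int vec)
{
	return vbar + (uint64_t)vec * VECTOR_ENTRY_SIZE;
}

/* The table base must be aligned to the full table size. */
static int vbar_is_aligned(uint64_t vbar)
{
	return (vbar & (VECTOR_TABLE_ALIGN - 1)) == 0;
}
```

For example, vector 8 (el0_sync_64) sits at offset 0x400 from the table
base, and the 16 entries together fill exactly one 2048-byte table.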
+

[PATCH 13/15] arm64: import some Linux page table API

2014-12-10 Thread Andrew Jones

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/arm64/asm/page.h  |  66 +++-
 lib/arm64/asm/pgtable-hwdef.h | 136 ++
 lib/arm64/asm/pgtable.h   |  69 +
 3 files changed, 270 insertions(+), 1 deletion(-)
 create mode 100644 lib/arm64/asm/pgtable-hwdef.h
 create mode 100644 lib/arm64/asm/pgtable.h

diff --git a/lib/arm64/asm/page.h b/lib/arm64/asm/page.h
index 395760cad5f82..29ad1f1f720c4 100644
--- a/lib/arm64/asm/page.h
+++ b/lib/arm64/asm/page.h
@@ -1 +1,65 @@
-#include "../../arm/asm/page.h"
+#ifndef _ASMARM64_PAGE_H_
+#define _ASMARM64_PAGE_H_
+/*
+ * Adapted from
+ *   arch/arm64/include/asm/pgtable-types.h
+ *   include/asm-generic/pgtable-nopud.h
+ *   include/asm-generic/pgtable-nopmd.h
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones drjo...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+
+#include <const.h>
+
+#define PGTABLE_LEVELS	2
+#define VA_BITS		42
+
+#define PAGE_SHIFT		16
+#define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
+#define PAGE_MASK		(~(PAGE_SIZE-1))
+
+#ifndef __ASSEMBLY__
+
+#define PAGE_ALIGN(addr)   ALIGN(addr, PAGE_SIZE)
+
+#include <alloc.h>
+
+typedef u64 pteval_t;
+typedef u64 pmdval_t;
+typedef u64 pudval_t;
+typedef u64 pgdval_t;
+typedef struct { pteval_t pte; } pte_t;
+typedef struct { pgdval_t pgd; } pgd_t;
+typedef struct { pteval_t pgprot; } pgprot_t;
+
+#define pte_val(x) ((x).pte)
+#define pgd_val(x) ((x).pgd)
+#define pgprot_val(x)  ((x).pgprot)
+
+#define __pte(x)   ((pte_t) { (x) } )
+#define __pgd(x)   ((pgd_t) { (x) } )
+#define __pgprot(x)	((pgprot_t) { (x) } )
+
+typedef struct { pgd_t pgd; } pud_t;
+#define pud_val(x) (pgd_val((x).pgd))
+#define __pud(x)   ((pud_t) { __pgd(x) } )
+
+typedef struct { pud_t pud; } pmd_t;
+#define pmd_val(x) (pud_val((x).pud))
+#define __pmd(x)   ((pmd_t) { __pud(x) } )
+
+#ifndef __virt_to_phys
+#define __phys_to_virt(x)  ((unsigned long) (x))
+#define __virt_to_phys(x)  (x)
+#endif
+
+#define __va(x)			((void *)__phys_to_virt((phys_addr_t)(x)))
+#define __pa(x)			__virt_to_phys((unsigned long)(x))
+
+#define virt_to_pfn(kaddr)	(__pa(kaddr) >> PAGE_SHIFT)
+#define pfn_to_virt(pfn)	__va((pfn) << PAGE_SHIFT)
+
+#endif /* !__ASSEMBLY__ */
+#endif /* _ASMARM64_PAGE_H_ */
diff --git a/lib/arm64/asm/pgtable-hwdef.h b/lib/arm64/asm/pgtable-hwdef.h
new file mode 100644
index 0..20ac9fa402987
--- /dev/null
+++ b/lib/arm64/asm/pgtable-hwdef.h
@@ -0,0 +1,136 @@
+#ifndef _ASMARM64_PGTABLE_HWDEF_H_
+#define _ASMARM64_PGTABLE_HWDEF_H_
+/*
+ * From arch/arm64/include/asm/pgtable-hwdef.h
+ *  arch/arm64/include/asm/memory.h
+ */
+#define UL(x) _AC(x, UL)
+
+#define PTRS_PER_PTE		(1 << (PAGE_SHIFT - 3))
+
+/*
+ * PGDIR_SHIFT determines the size a top-level page table entry can map
+ * (depending on the configuration, this level can be 0, 1 or 2).
+ */
+#define PGDIR_SHIFT		((PAGE_SHIFT - 3) * PGTABLE_LEVELS + 3)
+#define PGDIR_SIZE		(_AC(1, UL) << PGDIR_SHIFT)
+#define PGDIR_MASK		(~(PGDIR_SIZE-1))
+#define PTRS_PER_PGD		(1 << (VA_BITS - PGDIR_SHIFT))
+
+/* From include/asm-generic/pgtable-nopud.h */
+#define PUD_SHIFT		PGDIR_SHIFT
+#define PTRS_PER_PUD		1
+#define PUD_SIZE		(UL(1) << PUD_SHIFT)
+#define PUD_MASK		(~(PUD_SIZE-1))
+/* From include/asm-generic/pgtable-nopmd.h */
+#define PMD_SHIFT		PUD_SHIFT
+#define PTRS_PER_PMD		1
+#define PMD_SIZE		(UL(1) << PMD_SHIFT)
+#define PMD_MASK		(~(PMD_SIZE-1))
+
+/*
+ * Section address mask and size definitions.
+ */
+#define SECTION_SHIFT		PMD_SHIFT
+#define SECTION_SIZE		(_AC(1, UL) << SECTION_SHIFT)
+#define SECTION_MASK		(~(SECTION_SIZE-1))
+
+/*
+ * Hardware page table definitions.
+ *
+ * Level 1 descriptor (PUD).
+ */
+#define PUD_TYPE_TABLE		(_AT(pudval_t, 3) << 0)
+#define PUD_TABLE_BIT		(_AT(pgdval_t, 1) << 1)
+#define PUD_TYPE_MASK		(_AT(pgdval_t, 3) << 0)
+#define PUD_TYPE_SECT		(_AT(pgdval_t, 1) << 0)
+
+/*
+ * Level 2 descriptor (PMD).
+ */
+#define PMD_TYPE_MASK		(_AT(pmdval_t, 3) << 0)
+#define PMD_TYPE_FAULT		(_AT(pmdval_t, 0) << 0)
+#define PMD_TYPE_TABLE		(_AT(pmdval_t, 3) << 0)
+#define PMD_TYPE_SECT		(_AT(pmdval_t, 1) << 0)
+#define PMD_TABLE_BIT		(_AT(pmdval_t, 1) << 1)
+
+/*
+ * Section
+ */
+#define PMD_SECT_VALID		(_AT(pmdval_t, 1) << 0)
+#define PMD_SECT_PROT_NONE	(_AT(pmdval_t, 1) << 58)
+#define PMD_SECT_USER		(_AT(pmdval_t, 1) << 6)		/* AP[1] */
+#define PMD_SECT_RDONLY		(_AT(pmdval_t, 1) << 7)		/* AP[2] */
+#define PMD_SECT_S
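
As a rough illustration of the geometry these definitions imply (a sketch only, not part of the patch): with PAGE_SHIFT = 16, PGTABLE_LEVELS = 2 and VA_BITS = 42, a 64k table holds 8192 eight-byte entries, PGDIR_SHIFT works out to 29, and the pgd has 8192 entries. The helpers example_pgd_index() and example_pte_index() are hypothetical names, and the macros below use plain 1UL where the patch uses _AC().

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch's lib/arm64/asm/page.h. */
#define PAGE_SHIFT	16
#define PGTABLE_LEVELS	2
#define VA_BITS		42

/* Same formulas as the patch's pgtable-hwdef.h, written with 1UL. */
#define PTRS_PER_PTE	(1UL << (PAGE_SHIFT - 3))
#define PGDIR_SHIFT	((PAGE_SHIFT - 3) * PGTABLE_LEVELS + 3)
#define PTRS_PER_PGD	(1UL << (VA_BITS - PGDIR_SHIFT))

/* 13 index bits per level, two levels, plus 3 bits for 8-byte entries. */
static_assert(PGDIR_SHIFT == 29, "one pgd entry maps 512M with 64k pages");
static_assert(PTRS_PER_PGD == 8192, "42 - 29 = 13 bits of pgd index");

/* Hypothetical helper: index of the top-level entry covering vaddr. */
static inline unsigned long example_pgd_index(uint64_t vaddr)
{
	return (vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Hypothetical helper: index of the page-level entry covering vaddr. */
static inline unsigned long example_pte_index(uint64_t vaddr)
{
	return (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}
```

Under these assumptions, an address of 1G (0x40000000) lands in pgd slot 2, since each slot covers 512M.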

[PATCH 11/15] arm: import more linux page table api

2014-12-10 Thread Andrew Jones

To use page level descriptors we need some pgd/pud/pmd/pte
methods, and a few more flags defined.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/arm/asm/mmu.h   | 16 +---
 lib/arm/asm/pgtable-hwdef.h | 38 ++-
 lib/arm/asm/pgtable.h   | 91 +
 3 files changed, 129 insertions(+), 16 deletions(-)
 create mode 100644 lib/arm/asm/pgtable.h

diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index 8090a1b554820..254c29f84fe6f 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -5,22 +5,8 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include <asm/setup.h>
+#include <asm/pgtable.h>
 #include <asm/barrier.h>
-#include <alloc.h>
-
-#define PTRS_PER_PGD	4
-#define PGDIR_SHIFT	30
-#define PGDIR_SIZE	(1UL << PGDIR_SHIFT)
-#define PGDIR_MASK	(~((1 << PGDIR_SHIFT) - 1))
-
-#define pgd_free(pgd) free(pgd)
-static inline pgd_t *pgd_alloc(void)
-{
-   pgd_t *pgd = memalign(L1_CACHE_BYTES, PTRS_PER_PGD * sizeof(pgd_t));
-   memset(pgd, 0, PTRS_PER_PGD * sizeof(pgd_t));
-   return pgd;
-}
 
 static inline void local_flush_tlb_all(void)
 {
diff --git a/lib/arm/asm/pgtable-hwdef.h b/lib/arm/asm/pgtable-hwdef.h
index b6850f64b0f52..13a273d36e8fe 100644
--- a/lib/arm/asm/pgtable-hwdef.h
+++ b/lib/arm/asm/pgtable-hwdef.h
@@ -1,9 +1,45 @@
 #ifndef _ASMARM_PGTABLE_HWDEF_H_
 #define _ASMARM_PGTABLE_HWDEF_H_
 /*
- * From arch/arm/include/asm/pgtable-3level-hwdef.h
+ * From arch/arm/include/asm/pgtable-3level.h
+ *  arch/arm/include/asm/pgtable-3level-hwdef.h
  */
 
+#define PTRS_PER_PGD		4
+#define PGDIR_SHIFT		30
+#define PGDIR_SIZE		(_AC(1,UL) << PGDIR_SHIFT)
+#define PGDIR_MASK		(~((1 << PGDIR_SHIFT) - 1))
+
+#define PTRS_PER_PTE		512
+#define PTRS_PER_PMD		512
+
+#define PMD_SHIFT		21
+#define PMD_SIZE		(_AC(1,UL) << PMD_SHIFT)
+#define PMD_MASK		(~((1 << PMD_SHIFT) - 1))
+
+#define L_PMD_SECT_VALID	(_AT(pmdval_t, 1) << 0)
+
+#define L_PTE_VALID		(_AT(pteval_t, 1) << 0)		/* Valid */
+#define L_PTE_PRESENT		(_AT(pteval_t, 3) << 0)		/* Present */
+#define L_PTE_USER		(_AT(pteval_t, 1) << 6)		/* AP[1] */
+#define L_PTE_SHARED		(_AT(pteval_t, 3) << 8)		/* SH[1:0], inner shareable */
+#define L_PTE_YOUNG		(_AT(pteval_t, 1) << 10)	/* AF */
+#define L_PTE_XN		(_AT(pteval_t, 1) << 54)	/* XN */
+
+/*
+ * AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
+ */
+#define L_PTE_MT_UNCACHED	(_AT(pteval_t, 0) << 2)	/* strongly ordered */
+#define L_PTE_MT_BUFFERABLE	(_AT(pteval_t, 1) << 2)	/* normal non-cacheable */
+#define L_PTE_MT_WRITETHROUGH	(_AT(pteval_t, 2) << 2)	/* normal inner write-through */
+#define L_PTE_MT_WRITEBACK	(_AT(pteval_t, 3) << 2)	/* normal inner write-back */
+#define L_PTE_MT_WRITEALLOC	(_AT(pteval_t, 7) << 2)	/* normal inner write-alloc */
+#define L_PTE_MT_DEV_SHARED	(_AT(pteval_t, 4) << 2)	/* device */
+#define L_PTE_MT_DEV_NONSHARED	(_AT(pteval_t, 4) << 2)	/* device */
+#define L_PTE_MT_DEV_WC		(_AT(pteval_t, 1) << 2)	/* normal non-cacheable */
+#define L_PTE_MT_DEV_CACHED	(_AT(pteval_t, 3) << 2)	/* normal inner write-back */
+#define L_PTE_MT_MASK		(_AT(pteval_t, 7) << 2)
+
+
 /*
  * Hardware page table definitions.
  *
diff --git a/lib/arm/asm/pgtable.h b/lib/arm/asm/pgtable.h
new file mode 100644
index 0..8a730f44e537b
--- /dev/null
+++ b/lib/arm/asm/pgtable.h
@@ -0,0 +1,91 @@
+#ifndef _ASMARM_PGTABLE_H_
+#define _ASMARM_PGTABLE_H_
+/*
+ * Adapted from arch/arm/include/asm/pgtable.h
+ *  arch/arm/include/asm/pgtable-3level.h
+ *  arch/arm/include/asm/pgalloc.h
+ *  include/asm-generic/pgtable-nopud.h
+ *
+ * Note: some Linux function APIs have been modified. Nothing crazy,
+ *   but if a function took, for example, an mm_struct, then
+ *   that was either removed or replaced.
+ */
+#include <alloc.h>
+#include <asm/setup.h>
+#include <asm/page.h>
+#include <asm/pgtable-hwdef.h>
+
+#define pgd_none(pgd)  (!pgd_val(pgd))
+#define pud_none(pud)  (!pud_val(pud))
+#define pmd_none(pmd)  (!pmd_val(pmd))
+#define pte_none(pte)  (!pte_val(pte))
+
+#define pgd_index(addr) \
+	(((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
+#define pgd_offset(pgtable, addr) ((pgtable) + pgd_index(addr))
+
+#define pgd_free(pgd) free(pgd)
+static inline pgd_t *pgd_alloc(void)
+{
+   pgd_t *pgd = memalign(L1_CACHE_BYTES, PTRS_PER_PGD * sizeof(pgd_t));
+   memset(pgd, 0, PTRS_PER_PGD * sizeof(pgd_t));
+   return pgd;
+}
+
+#define pud_offset(pgd, addr)  ((pud_t *)pgd)
+#define pud_free(pud)
+#define pud_alloc(pgd, addr)   pud_offset(pgd, addr)
+
+static inline pmd_t *pud_page_vaddr(pud_t pud)
+{
+	return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
+}
+
+#define
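
To make the index arithmetic above concrete, here is a minimal sketch (not from the patch) of how a 32-bit virtual address splits under these constants: 2 bits of pgd index, 9 bits of pmd index, 9 bits of pte index and a 12-bit page offset. The example_*_index() helpers are hypothetical; only pgd_index() appears in the visible part of the patch, and PAGE_SHIFT = 12 is the value already in lib/arm/asm/page.h.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as defined for arm in this series. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PGDIR_SHIFT	30
#define PTRS_PER_PTE	512
#define PTRS_PER_PMD	512
#define PTRS_PER_PGD	4

/* 2 pgd bits + 9 pmd bits + 9 pte bits + 12 offset bits = 32 bits. */
static_assert(2 + 9 + 9 + PAGE_SHIFT == 32, "index widths must cover 32 bits");

/* Same expression as the patch's pgd_index() macro. */
static inline unsigned int example_pgd_index(uint32_t addr)
{
	return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Hypothetical counterparts for the lower two levels. */
static inline unsigned int example_pmd_index(uint32_t addr)
{
	return (addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline unsigned int example_pte_index(uint32_t addr)
{
	return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}
```

For instance, 0x80201000 would resolve to pgd slot 2, pmd slot 1 and pte slot 1 under this split.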

[PATCH 12/15] arm: prepare mmu code for arm64

2014-12-10 Thread Andrew Jones

* don't assume 1G PGDIR_SIZE or L1_CACHE_BYTES pgd alignment
* use page level descriptors for non-I/O memory
* apply new pgd/pud/pmd/pte methods
* split mmu.h to share function declarations
* use more generic flag names in mmu.c

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/arm/asm/mmu-api.h | 14 +++
 lib/arm/asm/mmu.h | 10 +---
 lib/arm/asm/setup.h   |  3 +++
 lib/arm/mmu.c | 69 +--
 4 files changed, 74 insertions(+), 22 deletions(-)
 create mode 100644 lib/arm/asm/mmu-api.h

diff --git a/lib/arm/asm/mmu-api.h b/lib/arm/asm/mmu-api.h
new file mode 100644
index 0..f2511e3dc7dee
--- /dev/null
+++ b/lib/arm/asm/mmu-api.h
@@ -0,0 +1,14 @@
+#ifndef __ASMARM_MMU_API_H_
+#define __ASMARM_MMU_API_H_
+extern pgd_t *mmu_idmap;
+extern bool mmu_enabled(void);
+extern void mmu_enable(pgd_t *pgtable);
+extern void mmu_enable_idmap(void);
+extern void mmu_init_io_sect(pgd_t *pgtable, unsigned long virt_offset);
+extern void mmu_set_range_sect(pgd_t *pgtable, unsigned long virt_offset,
+  unsigned long phys_start, unsigned long phys_end,
+  pgprot_t prot);
+extern void mmu_set_range_ptes(pgd_t *pgtable, unsigned long virt_offset,
+  unsigned long phys_start, unsigned long phys_end,
+  pgprot_t prot);
+#endif
diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index 254c29f84fe6f..5ec7a6ce5886b 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -8,6 +8,11 @@
 #include <asm/pgtable.h>
 #include <asm/barrier.h>
 
+#define PTE_USER   L_PTE_USER
+#define PTE_SHARED L_PTE_SHARED
+#define PTE_AF PTE_EXT_AF
+#define PTE_WBWA   L_PTE_MT_WRITEALLOC
+
 static inline void local_flush_tlb_all(void)
 {
 	asm volatile("mcr p15, 0, %0, c8, c7, 0" :: "r" (0));
@@ -21,9 +26,6 @@ static inline void flush_tlb_all(void)
local_flush_tlb_all();
 }
 
-extern bool mmu_enabled(void);
-extern void mmu_enable(pgd_t *pgtable);
-extern void mmu_enable_idmap(void);
-extern void mmu_init_io_sect(pgd_t *pgtable);
+#include <asm/mmu-api.h>
 
 #endif /* __ASMARM_MMU_H_ */
diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
index 450501cc6e8e3..02b668672fca4 100644
--- a/lib/arm/asm/setup.h
+++ b/lib/arm/asm/setup.h
@@ -17,6 +17,9 @@ extern phys_addr_t __phys_offset, __phys_end;
 
 #define PHYS_OFFSET		(__phys_offset)
 #define PHYS_END		(__phys_end)
+/* mach-virt reserves the first 1G section for I/O */
+#define PHYS_IO_OFFSET		(0UL)
+#define PHYS_IO_END		(1UL << 30)
 
 #define L1_CACHE_SHIFT		6
 #define L1_CACHE_BYTES		(1 << L1_CACHE_SHIFT)
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 7a975c6708de4..55d18a10e1ebd 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -8,9 +8,9 @@
 #include <asm/setup.h>
 #include <asm/mmu.h>
 
-static bool mmu_on;
-static pgd_t idmap[PTRS_PER_PGD] __attribute__((aligned(L1_CACHE_BYTES)));
+pgd_t *mmu_idmap;
 
+static bool mmu_on;
 bool mmu_enabled(void)
 {
return mmu_on;
@@ -24,29 +24,62 @@ void mmu_enable(pgd_t *pgtable)
mmu_on = true;
 }
 
-void mmu_init_io_sect(pgd_t *pgtable)
+void mmu_set_range_ptes(pgd_t *pgtable, unsigned long virt_offset,
+   unsigned long phys_start, unsigned long phys_end,
+   pgprot_t prot)
 {
-   /*
-* mach-virt reserves the first 1G section for I/O
-*/
-   pgd_val(pgtable[0]) = PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_USER;
-   pgd_val(pgtable[0]) |= PMD_SECT_UNCACHED;
+	unsigned long vaddr = virt_offset & PAGE_MASK;
+	unsigned long paddr = phys_start & PAGE_MASK;
+	unsigned long virt_end = phys_end - paddr + vaddr;
+
+	for (; vaddr < virt_end; vaddr += PAGE_SIZE, paddr += PAGE_SIZE) {
+   pgd_t *pgd = pgd_offset(pgtable, vaddr);
+   pud_t *pud = pud_alloc(pgd, vaddr);
+   pmd_t *pmd = pmd_alloc(pud, vaddr);
+   pte_t *pte = pte_alloc(pmd, vaddr);
+
+   pte_val(*pte) = paddr;
+   pte_val(*pte) |= PTE_TYPE_PAGE | PTE_AF | PTE_SHARED;
+   pte_val(*pte) |= pgprot_val(prot);
+   }
+}
+
+void mmu_set_range_sect(pgd_t *pgtable, unsigned long virt_offset,
+   unsigned long phys_start, unsigned long phys_end,
+   pgprot_t prot)
+{
+	unsigned long vaddr = virt_offset & PGDIR_MASK;
+	unsigned long paddr = phys_start & PGDIR_MASK;
+	unsigned long virt_end = phys_end - paddr + vaddr;
+
+	for (; vaddr < virt_end; vaddr += PGDIR_SIZE, paddr += PGDIR_SIZE) {
+   pgd_t *pgd = pgd_offset(pgtable, vaddr);
+   pgd_val(*pgd) = paddr;
+   pgd_val(*pgd) |= PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S;
+   pgd_val(*pgd) |= pgprot_val(prot);
+   }
+}
+
+
+void mmu_init_io_sect(pgd_t *pgtable, unsigned long

[PATCH 01/15] arm: fix run script testdev probing

2014-12-10 Thread Andrew Jones

Using -kernel doesn't force qemu to exit immediately,
and thus we hang when trying to run arm/run. Using
-initrd works though.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/run | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arm/run b/arm/run
index a714350225597..4c5e52525d687 100755
--- a/arm/run
+++ b/arm/run
@@ -26,7 +26,7 @@ if ! $qemu $M -device '?' 2>&1 | grep virtconsole > /dev/null; then
exit 2
 fi
 
-if $qemu $M -chardev testdev,id=id -kernel . 2>&1 \
+if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
 		| grep backend > /dev/null; then
 	echo "$qpath doesn't support chr-testdev. Exiting."
exit 2
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/15] arm: setup: fix type mismatch

2014-12-10 Thread Andrew Jones

Correct a type mismatch in the cpus initialization.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/arm/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 5fa37ca35f383..50ca4cb9ff99e 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -22,7 +22,7 @@ extern unsigned long stacktop;
 extern void io_init(void);
 extern void setup_args(const char *args);
 
-u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0UL) };
+u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0U) };
 int nr_cpus;
 
 phys_addr_t __phys_offset, __phys_end;
-- 
1.9.3
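
A short sketch of why the constant's type can matter once the same code is built for a 64-bit target (an illustration under assumptions, not taken from the patch): on an LP64 platform ~0UL is 64 bits wide, so it does not fit a u32 element without truncation, and a later comparison of an element against ~0UL would not match. The helper names below are hypothetical, and the `[0 ... N]` range initializer is a GNU C extension, as in the original source.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

#define NR_CPUS 8

/* Initialized as in the patched lib/arm/setup.c (GNU range initializer). */
static u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0U) };

/* The sentinel and the array element must have the same width. */
static_assert(sizeof(~0U) == sizeof(u32), "unsigned int is expected to be 32-bit");

/* Hypothetical helper: does slot i still hold the "unset" sentinel? */
static inline int example_cpu_slot_unset(int i)
{
	assert(i >= 0 && i < NR_CPUS);
	return cpus[i] == ~0U;
}

/*
 * Hypothetical helper showing the mismatch: v is widened to unsigned long,
 * so this only matches when unsigned long is itself 32 bits wide.
 */
static inline int example_matches_ulong_sentinel(u32 v)
{
	return v == ~0UL;
}
```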


[PATCH 14/15] arm64: prepare for 64k pages

2014-12-10 Thread Andrew Jones

This changes the layout for arm too, but that's fine. The only
thing to keep in mind is that while arm64 will have a single
64k page for its stack, arm will have 16 4k pages. If the number
of stack pages matters, then unit tests that want to work for
both arm and arm64, may need to avoid using more than one page,
even though the memory is there.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/flat.lds | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arm/flat.lds b/arm/flat.lds
index ee9fc0ab79abc..89a55720d728f 100644
--- a/arm/flat.lds
+++ b/arm/flat.lds
@@ -2,10 +2,10 @@
 SECTIONS
 {
 .text : { *(.init) *(.text) *(.text.*) }
-. = ALIGN(4K);
+. = ALIGN(64K);
 .data : {
 exception_stacks = .;
-. += 4K;
+. += 64K;
 exception_stacks_end = .;
 *(.data)
 }
@@ -13,10 +13,10 @@ SECTIONS
 .rodata : { *(.rodata) }
 . = ALIGN(16);
 .bss : { *(.bss) }
-. = ALIGN(4K);
+. = ALIGN(64K);
 edata = .;
-. += 8K;
-. = ALIGN(4K);
+. += 64K;
+. = ALIGN(64K);
 stacktop = .;
 }
 
-- 
1.9.3
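
The page counts mentioned in the commit message follow from simple arithmetic; the sketch below (hypothetical helper name, not part of the patch) just divides the 64k stack region by the page size of each architecture.

```c
#include <assert.h>

/* Size of the stack region reserved by the updated linker script. */
#define STACK_REGION_SIZE	(64UL * 1024)

/* Hypothetical helper: number of pages of size (1 << page_shift) in the region. */
static inline unsigned long example_stack_pages(unsigned int page_shift)
{
	/* Page sizes larger than the region itself are out of scope here. */
	assert(page_shift <= 16);
	return STACK_REGION_SIZE >> page_shift;
}
```

With a page shift of 12 (arm, 4k pages) this gives 16 pages; with 16 (arm64, 64k pages) it gives a single page.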


[PATCH 10/15] arm: get PHYS_MASK from pgtable-hwdef.h

2014-12-10 Thread Andrew Jones

This allows it to be different for arm64, even with setup.h
shared.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/arm/asm/mmu.h   | 2 +-
 lib/arm/asm/page.h  | 5 ++---
 lib/arm/asm/pgtable-hwdef.h | 6 ++
 lib/arm/asm/setup.h | 6 ++
 lib/arm/mmu.c   | 1 -
 5 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index 1117aeaf06a57..8090a1b554820 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -5,7 +5,7 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include <asm/page.h>
+#include <asm/setup.h>
 #include <asm/barrier.h>
 #include <alloc.h>
 
diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
index 304c80b9ddfd7..039e2ddfb8e0f 100644
--- a/lib/arm/asm/page.h
+++ b/lib/arm/asm/page.h
@@ -16,7 +16,7 @@
 
 #define PAGE_ALIGN(addr)   ALIGN(addr, PAGE_SIZE)
 
-#include <asm/setup.h>
+#include <alloc.h>
 
 typedef u64 pteval_t;
 typedef u64 pmdval_t;
@@ -51,6 +51,5 @@ typedef struct { pgd_t pgd; } pud_t;
 #define virt_to_pfn(kaddr)	(__pa(kaddr) >> PAGE_SHIFT)
 #define pfn_to_virt(pfn)	__va((pfn) << PAGE_SHIFT)
 
-#endif /* __ASSEMBLY__ */
-
+#endif /* !__ASSEMBLY__ */
 #endif /* _ASMARM_PAGE_H_ */
diff --git a/lib/arm/asm/pgtable-hwdef.h b/lib/arm/asm/pgtable-hwdef.h
index a2564aaca05a3..b6850f64b0f52 100644
--- a/lib/arm/asm/pgtable-hwdef.h
+++ b/lib/arm/asm/pgtable-hwdef.h
@@ -62,4 +62,10 @@
 #define PTE_EXT_NG		(_AT(pteval_t, 1) << 11)	/* nG */
 #define PTE_EXT_XN		(_AT(pteval_t, 1) << 54)	/* XN */
 
+/*
+ * 40-bit physical address supported.
+ */
+#define PHYS_MASK_SHIFT		(40)
+#define PHYS_MASK		((_AC(1, ULL) << PHYS_MASK_SHIFT) - 1)
+
 #endif /* _ASMARM_PGTABLE_HWDEF_H_ */
diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
index 3ef3b2c99a9de..450501cc6e8e3 100644
--- a/lib/arm/asm/setup.h
+++ b/lib/arm/asm/setup.h
@@ -6,7 +6,8 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 #include <libcflat.h>
-#include <alloc.h>
+#include <asm/page.h>
+#include <asm/pgtable-hwdef.h>
 
 #define NR_CPUS			8
 extern u32 cpus[NR_CPUS];
@@ -16,9 +17,6 @@ extern phys_addr_t __phys_offset, __phys_end;
 
 #define PHYS_OFFSET		(__phys_offset)
 #define PHYS_END		(__phys_end)
-#define PHYS_SHIFT		40
-#define PHYS_SIZE		(1ULL << PHYS_SHIFT)
-#define PHYS_MASK		(PHYS_SIZE - 1ULL)
 
 #define L1_CACHE_SHIFT		6
 #define L1_CACHE_BYTES		(1 << L1_CACHE_SHIFT)
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index a42ae84bcec1f..7a975c6708de4 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -7,7 +7,6 @@
  */
 #include <asm/setup.h>
 #include <asm/mmu.h>
-#include <asm/pgtable-hwdef.h>
 
 static bool mmu_on;
 static pgd_t idmap[PTRS_PER_PGD] __attribute__((aligned(L1_CACHE_BYTES)));
-- 
1.9.3
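
For reference, a small sketch of what the relocated PHYS_MASK evaluates to and how such a mask is typically applied (an assumption-laden illustration, not code from the series): the macro is written here with 1ULL instead of _AC(1, ULL), and example_desc_to_phys() is a hypothetical helper that strips attribute bits from a descriptor value.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the patch, using a plain 1ULL instead of _AC(1, ULL). */
#define PHYS_MASK_SHIFT	(40)
#define PHYS_MASK	((1ULL << PHYS_MASK_SHIFT) - 1)

/* A 40-bit physical address space means the low 40 bits are set. */
static_assert(PHYS_MASK == 0xffffffffffULL, "PHYS_MASK covers bits 0..39");

/*
 * Hypothetical helper: keep only the output-address bits of a descriptor,
 * dropping upper attribute bits (via PHYS_MASK) and the low bits below
 * the page boundary (via a page mask built from page_shift).
 */
static inline uint64_t example_desc_to_phys(uint64_t desc, unsigned int page_shift)
{
	uint64_t page_mask = ~((1ULL << page_shift) - 1);

	return desc & PHYS_MASK & page_mask;
}
```

Because the mask now lives in pgtable-hwdef.h, an arm64 build could supply a different PHYS_MASK_SHIFT without touching setup.h, which is the point the commit message makes.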


[PATCH 05/15] arm: use absolute headers

2014-12-10 Thread Andrew Jones

Files in lib/arm including "asm/someheader.h" will get
lib/arm/asm/someheader.h, not lib/asm/someheader.h. So we
need to use <> instead of "" in order to prepare for headers
of the same name, but for a different arch. We change all
'#include's of all arm files, as consistency looks better.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/cstart.S  |  6 +++---
 arm/selftest.c| 14 +++---
 lib/arm/asm-offsets.c |  4 ++--
 lib/arm/asm/asm-offsets.h |  2 +-
 lib/arm/asm/io.h  |  8 
 lib/arm/asm/mmu.h |  6 +++---
 lib/arm/asm/page.h|  2 +-
 lib/arm/asm/processor.h   |  2 +-
 lib/arm/asm/ptrace.h  |  2 +-
 lib/arm/asm/setup.h   |  4 ++--
 lib/arm/eabi_compat.c |  2 +-
 lib/arm/io.c  | 10 +-
 lib/arm/mmu.c |  6 +++---
 lib/arm/processor.c   |  6 +++---
 lib/arm/setup.c   | 14 +++---
 lib/arm/spinlock.c|  8 
 16 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index a1ccfb24bb4e0..1e3c3a32375fd 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -6,9 +6,9 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 #define __ASSEMBLY__
-#include "asm/asm-offsets.h"
-#include "asm/ptrace.h"
-#include "asm/cp15.h"
+#include <asm/asm-offsets.h>
+#include <asm/ptrace.h>
+#include <asm/cp15.h>
 
 .arm
 
diff --git a/arm/selftest.c b/arm/selftest.c
index 0f70e1dcb3b0e..0de794ea7d696 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -5,13 +5,13 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include "libcflat.h"
-#include "alloc.h"
-#include "asm/setup.h"
-#include "asm/ptrace.h"
-#include "asm/asm-offsets.h"
-#include "asm/processor.h"
-#include "asm/page.h"
+#include <libcflat.h>
+#include <alloc.h>
+#include <asm/setup.h>
+#include <asm/ptrace.h>
+#include <asm/asm-offsets.h>
+#include <asm/processor.h>
+#include <asm/page.h>
 
 #define TESTGRP selftest
 
diff --git a/lib/arm/asm-offsets.c b/lib/arm/asm-offsets.c
index a9c349d2d427c..76380dfa15ab8 100644
--- a/lib/arm/asm-offsets.c
+++ b/lib/arm/asm-offsets.c
@@ -5,8 +5,8 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include "libcflat.h"
-#include "asm/ptrace.h"
+#include <libcflat.h>
+#include <asm/ptrace.h>
 
 #define DEFINE(sym, val) \
 	asm volatile("\n->" #sym " %0 " #val : : "i" (val))
diff --git a/lib/arm/asm/asm-offsets.h b/lib/arm/asm/asm-offsets.h
index c2ff2ba6ec417..d370ee36a182b 100644
--- a/lib/arm/asm/asm-offsets.h
+++ b/lib/arm/asm/asm-offsets.h
@@ -1 +1 @@
-#include "generated/asm-offsets.h"
+#include <generated/asm-offsets.h>
diff --git a/lib/arm/asm/io.h b/lib/arm/asm/io.h
index bbcbcd0542490..ba3b0b2412adb 100644
--- a/lib/arm/asm/io.h
+++ b/lib/arm/asm/io.h
@@ -1,8 +1,8 @@
 #ifndef _ASMARM_IO_H_
 #define _ASMARM_IO_H_
-#include "libcflat.h"
-#include "asm/barrier.h"
-#include "asm/page.h"
+#include <libcflat.h>
+#include <asm/barrier.h>
+#include <asm/page.h>
 
 #define __iomem
 #define __force
@@ -89,6 +89,6 @@ static inline void *phys_to_virt(phys_addr_t x)
return (void *)__phys_to_virt(x);
 }
 
-#include "asm-generic/io.h"
+#include <asm-generic/io.h>
 
 #endif /* _ASMARM_IO_H_ */
diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index 451c7493c2aba..1117aeaf06a57 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -5,9 +5,9 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include "asm/page.h"
-#include "asm/barrier.h"
-#include "alloc.h"
+#include <asm/page.h>
+#include <asm/barrier.h>
+#include <alloc.h>
 
 #define PTRS_PER_PGD	4
 #define PGDIR_SHIFT	30
diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
index 6ff849a0c0e3b..304c80b9ddfd7 100644
--- a/lib/arm/asm/page.h
+++ b/lib/arm/asm/page.h
@@ -6,7 +6,7 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 
-#include "const.h"
+#include <const.h>
 
 #define PAGE_SHIFT		12
 #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
index 883cab89622f7..a56f8d1fc9797 100644
--- a/lib/arm/asm/processor.h
+++ b/lib/arm/asm/processor.h
@@ -5,7 +5,7 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include "ptrace.h"
+#include <asm/ptrace.h>
 
 enum vector {
EXCPTN_RST,
diff --git a/lib/arm/asm/ptrace.h b/lib/arm/asm/ptrace.h
index 3a4c7532258f6..9ee71c760d22f 100644
--- a/lib/arm/asm/ptrace.h
+++ b/lib/arm/asm/ptrace.h
@@ -49,7 +49,7 @@
 #define PSR_ENDIAN_MASK0x0200  /* Endianness state mask */
 
 #ifndef __ASSEMBLY__
-#include "libcflat.h"
+#include <libcflat.h>
 
 struct pt_regs {
unsigned long uregs[18];
diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
index 21445ef2085fc..3ef3b2c99a9de 100644
--- a/lib/arm/asm/setup.h
+++ b/lib/arm/asm/setup.h
@@ -5,8 +5,8 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include "libcflat.h"
-#include "alloc.h"
+#include

[PATCH 06/15] arm: setup: drop unused arguments

2014-12-10 Thread Andrew Jones

Drop the unused arguments from setup(), passing only the fdt.
This allows setup() to be more easily shared with arm64.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/cstart.S| 12 ++--
 lib/arm/setup.c |  3 +--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index 1e3c3a32375fd..da496e9eae7e0 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -19,15 +19,23 @@ start:
/*
 * bootloader params are in r0-r2
 * See the kernel doc Documentation/arm/Booting
+*   r0 = 0
+*   r1 = machine type number
+*   r2 = physical address of the dtb
+*
+* As we have no need for r0's nor r1's value, then
+* put the dtb in r0. This allows setup to be consistent
+* with arm64.
 */
ldr sp, =stacktop
-   push    {r0-r3}
+   mov     r0, r2
+   push    {r0-r1}
 
/* set up vector table and mode stacks */
bl  exceptions_init
 
/* complete setup */
-   pop {r0-r3}
+   pop {r0-r1}
bl  setup
 
/* run the test */
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 9d2094da8a29c..8f58802e958ac 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -62,8 +62,7 @@ static void mem_init(phys_addr_t freemem_start)
mmu_enable_idmap();
 }
 
-void setup(unsigned long arg __unused, unsigned long id __unused,
-  const void *fdt)
+void setup(const void *fdt)
 {
const char *bootargs;
u32 fdt_size;
-- 
1.9.3


[PATCH 02/15] virtio: don't use size_t

2014-12-10 Thread Andrew Jones

A size_t can have a different size when compiled as
64-bit vs. 32-bit. When unsigned int is what we want,
then make sure unsigned int is what we use.

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 lib/chr-testdev.c | 4 ++--
 lib/virtio.c  | 2 +-
 lib/virtio.h  | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/chr-testdev.c b/lib/chr-testdev.c
index 0c9a173a04886..c19424fd44b20 100644
--- a/lib/chr-testdev.c
+++ b/lib/chr-testdev.c
@@ -13,7 +13,7 @@ static struct virtio_device *vcon;
 static struct virtqueue *in_vq, *out_vq;
 static struct spinlock lock;
 
-static void __testdev_send(char *buf, size_t len)
+static void __testdev_send(char *buf, unsigned int len)
 {
int ret;
 
@@ -29,8 +29,8 @@ static void __testdev_send(char *buf, size_t len)
 
 void chr_testdev_exit(int code)
 {
+   unsigned int len;
char buf[8];
-   int len;
 
 	snprintf(buf, sizeof(buf), "%dq", code);
len = strlen(buf);
diff --git a/lib/virtio.c b/lib/virtio.c
index cb496ff2eabd5..9532d1aeb1707 100644
--- a/lib/virtio.c
+++ b/lib/virtio.c
@@ -47,7 +47,7 @@ void vring_init_virtqueue(struct vring_virtqueue *vq, unsigned index,
 		vq->data[i] = NULL;
 }
 
-int virtqueue_add_outbuf(struct virtqueue *_vq, char *buf, size_t len)
+int virtqueue_add_outbuf(struct virtqueue *_vq, char *buf, unsigned int len)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
unsigned avail;
diff --git a/lib/virtio.h b/lib/virtio.h
index b51899ab998b6..4801e204a469d 100644
--- a/lib/virtio.h
+++ b/lib/virtio.h
@@ -139,7 +139,8 @@ extern void vring_init_virtqueue(struct vring_virtqueue *vq, unsigned index,
 bool (*notify)(struct virtqueue *),
 void (*callback)(struct virtqueue *),
 const char *name);
-extern int virtqueue_add_outbuf(struct virtqueue *vq, char *buf, size_t len);
+extern int virtqueue_add_outbuf(struct virtqueue *vq, char *buf,
+   unsigned int len);
 extern bool virtqueue_kick(struct virtqueue *vq);
 extern void detach_buf(struct vring_virtqueue *vq, unsigned head);
 extern void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len);
-- 
1.9.3
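
The width difference the commit message refers to can be made explicit with a small sketch (hypothetical helper names; not code from the patch). It narrows a size_t length, such as the result of strlen(), to unsigned int and checks that the value actually fits, which is the conversion the patched chr_testdev_exit() relies on implicitly.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* size_t is at least as wide as unsigned int on the targets discussed here. */
static_assert(sizeof(unsigned int) <= sizeof(size_t), "size_t narrower than unsigned int");

/* Hypothetical helper: narrow a size_t to unsigned int, checking the range. */
static inline unsigned int example_len_to_uint(size_t len)
{
	assert(len <= UINT_MAX);
	return (unsigned int)len;
}

/* Hypothetical helper: string length as the fixed-width type the API now takes. */
static inline unsigned int example_strlen_uint(const char *s)
{
	return example_len_to_uint(strlen(s));
}
```

For the short exit strings built by chr_testdev_exit() (at most 8 bytes), the range check is trivially satisfied; the sketch only spells out what the type change assumes.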


kvm-unit-tests: add tscdeadline-latency test

2014-12-10 Thread Marcelo Tosatti


Add a test that measures the latency of TSC deadline timer
interrupt injection.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm-unit-tests/config/config-x86-common.mak
===
--- kvm-unit-tests.orig/config/config-x86-common.mak	2014-06-27 13:43:43.694257143 -0300
+++ kvm-unit-tests/config/config-x86-common.mak	2014-12-10 16:10:41.715339378 -0200
@@ -69,6 +69,8 @@
 
 $(TEST_DIR)/apic.elf: $(cstart.o) $(TEST_DIR)/apic.o
 
+$(TEST_DIR)/tscdeadline-latency.elf: $(cstart.o) $(TEST_DIR)/tscdeadline-latency.o
+
 $(TEST_DIR)/init.elf: $(cstart.o) $(TEST_DIR)/init.o
 
 $(TEST_DIR)/realmode.elf: $(TEST_DIR)/realmode.o
Index: kvm-unit-tests/config/config-x86_64.mak
===
--- kvm-unit-tests.orig/config/config-x86_64.mak	2014-12-10 16:03:20.609681443 -0200
+++ kvm-unit-tests/config/config-x86_64.mak	2014-12-10 16:10:25.172352577 -0200
@@ -9,5 +9,6 @@
  $(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat
 tests += $(TEST_DIR)/svm.flat
 tests += $(TEST_DIR)/vmx.flat
+tests += $(TEST_DIR)/tscdeadline-latency.flat
 
 include config/config-x86-common.mak
Index: kvm-unit-tests/x86/tscdeadline-latency.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ kvm-unit-tests/x86/tscdeadline-latency.c	2014-12-10 18:21:38.151253344 -0200
@@ -0,0 +1,110 @@
+/*
+ * qemu command line | grep latency | cut -f 2 -d ":" > latency
+ *
+ * In octave:
+ * load latency
+ * min(list)
+ * max(list)
+ * mean(list)
+ * hist(latency, 50)
+ */
+
+#include "libcflat.h"
+#include "apic.h"
+#include "vm.h"
+#include "smp.h"
+#include "desc.h"
+#include "isr.h"
+#include "msr.h"
+
+static void test_lapic_existence(void)
+{
+    u32 lvr;
+
+    lvr = apic_read(APIC_LVR);
+    printf("apic version: %x\n", lvr);
+    report("apic existence", (u16)lvr == 0x14);
+}
+
+#define TSC_DEADLINE_TIMER_MODE (2 << 17)
+#define TSC_DEADLINE_TIMER_VECTOR 0xef
+#define MSR_IA32_TSC            0x0010
+#define MSR_IA32_TSCDEADLINE    0x06e0
+
+static int tdt_count;
+u64 exptime;
+int delta;
+#define TABLE_SIZE 1
+u64 table[TABLE_SIZE];
+volatile int table_idx;
+
+static void tsc_deadline_timer_isr(isr_regs_t *regs)
+{
+    u64 now = rdtsc();
+    ++tdt_count;
+
+    if (table_idx < TABLE_SIZE && tdt_count > 1)
+        table[table_idx++] = now - exptime;
+
+    exptime = now+delta;
+    wrmsr(MSR_IA32_TSCDEADLINE, now+delta);
+    apic_write(APIC_EOI, 0);
+}
+
+static void start_tsc_deadline_timer(void)
+{
+    handle_irq(TSC_DEADLINE_TIMER_VECTOR, tsc_deadline_timer_isr);
+    irq_enable();
+
+    wrmsr(MSR_IA32_TSCDEADLINE, rdmsr(MSR_IA32_TSC)+delta);
+    asm volatile ("nop");
+}
+
+static int enable_tsc_deadline_timer(void)
+{
+    uint32_t lvtt;
+
+    if (cpuid(1).c & (1 << 24)) {
+        lvtt = TSC_DEADLINE_TIMER_MODE | TSC_DEADLINE_TIMER_VECTOR;
+        apic_write(APIC_LVTT, lvtt);
+        start_tsc_deadline_timer();
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static void test_tsc_deadline_timer(void)
+{
+    if(enable_tsc_deadline_timer()) {
+        printf("tsc deadline timer enabled\n");
+    } else {
+        printf("tsc deadline timer not detected\n");
+    }
+}
+
+int main()
+{
+    int i;
+
+    setup_vm();
+    smp_init();
+    setup_idt();
+
+    test_lapic_existence();
+
+    mask_pic_interrupts();
+
+    delta = 20;
+    test_tsc_deadline_timer();
+    irq_enable();
+
+    do {
+        asm volatile("hlt");
+    } while (table_idx < TABLE_SIZE);
+
+    for (i = 0; i < TABLE_SIZE; i++)
+        printf("latency: %d\n", table[i]);
+
+    return report_summary();
+}
Index: kvm-unit-tests/x86/unittests.cfg
===
--- kvm-unit-tests.orig/x86/unittests.cfg	2014-12-10 16:03:20.616681437 -0200
+++ kvm-unit-tests/x86/unittests.cfg2014-12-10 16:15:23.145114609 -0200
@@ -161,3 +161,8 @@
 [debug]
 file = debug.flat
 arch = x86_64
+
+[tscdeadline_latency]
+file = tscdeadline_latency.flat
+extra_params = -cpu qemu64,+tsc-deadline
+arch = x86_64
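
The header comment of the test suggests post-processing the printed latencies in octave (min, max, mean). As an alternative sketch under stated assumptions, the same summary could be computed in C on the host after parsing the "latency:" lines; struct latency_stats and summarize_latency() below are hypothetical names and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a set of latency samples, in TSC cycles. */
struct latency_stats {
	uint64_t min;
	uint64_t max;
	double mean;
};

/* Hypothetical helper: compute min, max and mean over n samples (n > 0). */
static struct latency_stats summarize_latency(const uint64_t *samples, size_t n)
{
	struct latency_stats s;
	uint64_t sum = 0;
	size_t i;

	assert(samples != NULL && n > 0);

	s.min = s.max = samples[0];
	for (i = 0; i < n; i++) {
		if (samples[i] < s.min)
			s.min = samples[i];
		if (samples[i] > s.max)
			s.max = samples[i];
		sum += samples[i];
	}
	s.mean = (double)sum / (double)n;
	return s;
}
```

The histogram step from the octave recipe is left out here; it would need a bin width chosen for the observed range.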

[PATCH 08/15] arm64: initial drop

2014-12-10 Thread Andrew Jones

This is the initial drop of the arm64 test framework and a first test
that just checks that setup completed (a selftest). kvm isn't needed
to run this test unless testing with smp > 1.

Try it out with
  yum install gcc-aarch64-linux-gnu
  ./configure --cross-prefix=aarch64-linux-gnu- --arch=arm64
  make
  QEMU=[qemu with aarch64, mach-virt, and chr-testdev] ./run_tests.sh

Signed-off-by: Andrew Jones drjo...@redhat.com
---
 arm/cstart64.S   | 48 +
 arm/run  | 10 --
 arm/selftest.c   |  6 
 arm/unittests.cfg|  2 ++
 config/config-arm-common.mak | 68 +++
 config/config-arm.mak| 75 +--
 config/config-arm64.mak  | 20 +++
 configure| 12 ++-
 lib/arm/asm-offsets.c|  7 +---
 lib/arm64/.gitignore |  1 +
 lib/arm64/asm-offsets.c  | 14 
 lib/arm64/asm/asm-offsets.h  |  1 +
 lib/arm64/asm/barrier.h  | 17 +
 lib/arm64/asm/io.h   | 84 
 lib/arm64/asm/mmu.h  | 18 ++
 lib/arm64/asm/page.h |  1 +
 lib/arm64/asm/setup.h|  1 +
 lib/arm64/asm/spinlock.h | 15 
 lib/kbuild.h |  8 +
 19 files changed, 333 insertions(+), 75 deletions(-)
 create mode 100644 arm/cstart64.S
 create mode 100644 config/config-arm-common.mak
 create mode 100644 config/config-arm64.mak
 create mode 100644 lib/arm64/.gitignore
 create mode 100644 lib/arm64/asm-offsets.c
 create mode 100644 lib/arm64/asm/asm-offsets.h
 create mode 100644 lib/arm64/asm/barrier.h
 create mode 100644 lib/arm64/asm/io.h
 create mode 100644 lib/arm64/asm/mmu.h
 create mode 100644 lib/arm64/asm/page.h
 create mode 100644 lib/arm64/asm/setup.h
 create mode 100644 lib/arm64/asm/spinlock.h
 create mode 100644 lib/kbuild.h

diff --git a/arm/cstart64.S b/arm/cstart64.S
new file mode 100644
index 0000000000000..1d98066d0e187
--- /dev/null
+++ b/arm/cstart64.S
@@ -0,0 +1,48 @@
+/*
+ * Boot entry point and assembler functions for aarch64 tests.
+ *
+ * Copyright (C) 2014, Red Hat Inc, Andrew Jones <drjo...@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#define __ASSEMBLY__
+#include <asm/asm-offsets.h>
+
+.section .init
+
+.globl start
+start:
+   /*
+* bootloader params are in x0-x3
+* The physical address of the dtb is in x0, x1-x3 are reserved
+* See the kernel doc Documentation/arm64/booting.txt
+*/
+   adr x4, stacktop
+   mov sp, x4
+   stp x0, x1, [sp, #-16]!
+
+   /* Enable FP/ASIMD */
+   mov x0, #(3 << 20)
+   msr cpacr_el1, x0
+
+   /* set up exception handling */
+// bl  exceptions_init
+
+   /* complete setup */
+   ldp x0, x1, [sp], #16
+   bl  setup
+
+   /* run the test */
+   adr x0, __argc
+   ldr x0, [x0]
+   adr x1, __argv
+   bl  main
+   bl  exit
+   b   halt
+
+.text
+
+.globl halt
+halt:
+1: wfi
+   b   1b
diff --git a/arm/run b/arm/run
index 4c5e52525d687..662a8564674a3 100755
--- a/arm/run
+++ b/arm/run
@@ -5,8 +5,9 @@ if [ ! -f config.mak ]; then
exit 2
 fi
 source config.mak
+processor="$PROCESSOR"
 
-qemu="${QEMU:-qemu-system-arm}"
+qemu="${QEMU:-qemu-system-$ARCH_NAME}"
 qpath=$(which $qemu 2>/dev/null)
 
 if [ -z "$qpath" ]; then
@@ -36,7 +37,12 @@ M='-machine virt,accel=kvm:tcg'
 chr_testdev='-device virtio-serial-device'
 chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
 
-command="$qemu $M -cpu $PROCESSOR $chr_testdev"
+# arm64 must use '-cpu host' with kvm
+if [ "$(arch)" = "aarch64" ] && [ "$ARCH" = "arm64" ] && [ -c /dev/kvm ]; then
+	processor="host"
+fi
+
+command="$qemu $M -cpu $processor $chr_testdev"
 command+=" -display none -serial stdio -kernel"
 
 echo $command "$@"
diff --git a/arm/selftest.c b/arm/selftest.c
index 885a54fee0e4a..30f44261d47db 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -8,10 +8,12 @@
 #include <libcflat.h>
 #include <alloc.h>
 #include <asm/setup.h>
+#ifdef __arm__
 #include <asm/ptrace.h>
 #include <asm/asm-offsets.h>
 #include <asm/processor.h>
 #include <asm/page.h>
+#endif
 
 #define TESTGRP "selftest"
 
@@ -78,6 +80,7 @@ static void check_setup(int argc, char **argv)
assert_args(nr_tests, 2);
 }
 
+#ifdef __arm__
 static struct pt_regs expected_regs;
 /*
  * Capture the current register state and execute an instruction
@@ -184,6 +187,7 @@ static void check_vectors(void *arg __unused)
 	report("%s", check_und() && check_svc(), testname);
exit(report_summary());
 }
+#endif
 
 int main(int argc, char **argv)
 {
@@ -195,6 +199,7 @@ int main(int argc, char **argv)
 
 		check_setup(argc-1, &argv[1]);
 
+#ifdef __arm__
 	} else if (strcmp(argv[0], "vectors-kernel") == 0) {
 
check_vectors(NULL);
@@ -204,6 +209,7 @@

Re: [PATCH] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit

2014-12-10 Thread Andy Lutomirski

On Wed, Dec 10, 2014 at 3:49 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>
>
> On 06/12/2014 04:03, Andy Lutomirski wrote:
>> paravirt_enabled has the following effects:
>>
>>  - Disables the F00F bug workaround warning.  There is no F00F bug
>>    workaround any more because Linux's standard IDT handling already
>>    works around the F00F bug, but the warning still exists.  This
>>    is only cosmetic, and, in any event, there is no such thing as
>>    KVM on a CPU with the F00F bug.
>>
>>  - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
>>    there should be no APM BIOS anyway.
>>
>>  - Disables tboot.  I think that the tboot code should check the
>>    CPUID hypervisor bit directly if it matters.
>>
>>  - paravirt_enabled disables espfix32.  espfix32 should *not* be
>>    disabled under KVM paravirt.
>>
>> The last point is the purpose of this patch.  It fixes a leak of the
>> high 16 bits of the kernel stack address on 32-bit KVM paravirt
>> guests.
>>
>> While I'm at it, this removes pv_info setup from kvmclock.  That
>> code seems to serve no purpose.
>
> kvmclock_init runs before kvm_guest_init, and this is a stable@ patch so
> for the sake of extra safety I've left the pv_info.name assignment in.
> Applied (locally for now), will be in 3.19.


In the interest of reduced future confusion, would it make sense to
drop the duplicate initialization for 3.20?

--Andy

[patch 0/2] KVM: add option to advance tscdeadline hrtimer expiration (v3)

2014-12-10 Thread Marcelo Tosatti

See patches for details.

v2:
- fix email address.

v3:
- use module parameter for configuration of value (Paolo/Radim)




[patch 1/2] KVM: x86: add method to test PIR bitmap vector

2014-12-10 Thread Marcelo Tosatti

kvm_x86_ops->test_posted_interrupt() returns true/false depending on
whether 'vector' is set.
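
As an illustration only, the read-only bitmap query described above can be sketched in standalone C. The type `pir_sketch` and the helpers `pir_set` and `pir_test` are hypothetical names invented for this example; they stand in for the kernel's posted-interrupt descriptor and `test_bit()`, and are not the code in this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a 256-bit posted-interrupt request bitmap. */
#define PIR_BITS 256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct pir_sketch {
    unsigned long pir[PIR_BITS / (sizeof(unsigned long) * CHAR_BIT)];
};

/* Mark a vector as pending (only used to set up the example). */
static void pir_set(struct pir_sketch *d, int vector)
{
    assert(vector >= 0 && vector < PIR_BITS);
    d->pir[vector / BITS_PER_WORD] |= 1UL << (vector % BITS_PER_WORD);
}

/* Read-only query: true if 'vector' is pending; the bitmap is not modified. */
static bool pir_test(const struct pir_sketch *d, int vector)
{
    assert(vector >= 0 && vector < PIR_BITS);
    return (d->pir[vector / BITS_PER_WORD] >> (vector % BITS_PER_WORD)) & 1UL;
}
```

Unlike a test-and-set helper, the query leaves the bit in place, so calling it repeatedly gives the same answer until something else clears the bit.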

Next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: kvm/arch/x86/include/asm/kvm_host.h
===================================================================
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -743,6 +743,7 @@ struct kvm_x86_ops {
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+	bool (*test_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -435,6 +435,11 @@ static int pi_test_and_set_pir(int vecto
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static int pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+	return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 struct vcpu_vmx {
 	struct kvm_vcpu       vcpu;
 	unsigned long         host_rsp;
@@ -5939,6 +5944,7 @@ static __init int hardware_setup(void)
 	else {
 		kvm_x86_ops->hwapic_irr_update = NULL;
 		kvm_x86_ops->deliver_posted_interrupt = NULL;
+		kvm_x86_ops->test_posted_interrupt = NULL;
 		kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
 	}
 
@@ -6960,6 +6966,13 @@ static int handle_invvpid(struct kvm_vcp
 	return 1;
 }
 
+static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	return pi_test_pir(vector, &vmx->pi_desc);
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -9374,6 +9387,7 @@ static struct kvm_x86_ops vmx_x86_ops =
 	.hwapic_isr_update = vmx_hwapic_isr_update,
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+	.test_posted_interrupt = vmx_test_pir,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,



[patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Marcelo Tosatti

For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario 
which requires strict timing regarding timer expiration).

Reduces cyclictest avg latency by 50%.

Note: this option requires tuning to find the appropriate value 
for a particular hardware/guest combination. One method is to measure the 
average delay between apic_timer_fn and VM-entry. 
Another method is to start with 1000ns, and increase the value
in say 500ns increments until avg cyclictest numbers stop decreasing.
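
The arithmetic behind the advance can be sketched in standalone C as follows. This is an illustration under stated assumptions, not the patch itself: the function names are hypothetical, and the clamp that keeps the early expiry from landing before the current time is an assumption added for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC-cycle delta to nanoseconds for a TSC running at tsc_khz. */
static uint64_t tsc_delta_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
    assert(tsc_khz != 0);
    return cycles * 1000000ULL / tsc_khz;
}

/*
 * Absolute host time (ns) at which to fire the emulation timer: the real
 * deadline minus the advance. Assumption for this sketch: the result is
 * clamped so that it never lands before 'now_ns'.
 */
static uint64_t advanced_expiry_ns(uint64_t now_ns, uint64_t deadline_tsc,
                                   uint64_t guest_tsc, uint64_t tsc_khz,
                                   uint64_t advance_ns)
{
    uint64_t ns;

    if (deadline_tsc <= guest_tsc)
        return now_ns;                /* deadline already passed */
    ns = tsc_delta_to_ns(deadline_tsc - guest_tsc, tsc_khz);
    return now_ns + (ns > advance_ns ? ns - advance_ns : 0);
}

/* Cycles left to busy-wait on VM-entry once the early timer has fired. */
static uint64_t cycles_to_spin(uint64_t deadline_tsc, uint64_t guest_tsc)
{
    return guest_tsc < deadline_tsc ? deadline_tsc - guest_tsc : 0;
}
```

With a 2 GHz TSC, a deadline 200000 cycles away is 100000 ns in the future; a 7000 ns advance fires the host timer 93000 ns from now, and whatever remains at VM-entry is covered by the busy spin.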

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -33,6 +33,7 @@
 #include <asm/page.h>
 #include <asm/current.h>
 #include <asm/apicdef.h>
+#include <asm/delay.h>
 #include <linux/atomic.h>
 #include <linux/jump_label.h>
 #include "kvm_cache_regs.h"
@@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	wait_queue_head_t *q = &vcpu->wq;
+	struct kvm_timer *ktimer = &apic->lapic_timer;
 
 	/*
 	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
@@ -1087,11 +1089,58 @@ static void apic_timer_expired(struct kv
 
 	if (waitqueue_active(q))
 		wake_up_interruptible(q);
+
+	if (ktimer->timer_mode_mask == APIC_LVT_TIMER_TSCDEADLINE)
+		ktimer->expired_tscdeadline = ktimer->tscdeadline;
+}
+
+static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
+
+	if (kvm_apic_hw_enabled(apic)) {
+		int vec = reg & APIC_VECTOR_MASK;
+
+		if (kvm_x86_ops->test_posted_interrupt)
+			return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
+		else {
+			if (apic_test_vector(vec, apic->regs + APIC_ISR))
+				return true;
+		}
+	}
+	return false;
+}
+
+void wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u64 guest_tsc, tsc_deadline;
+
+	if (!kvm_vcpu_has_lapic(vcpu))
+		return;
+
+	if (!apic_lvtt_tscdeadline(apic))
+		return;
+
+	if (!lapic_timer_int_injected(vcpu))
+		return;
+
+	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
+	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+
+	while (guest_tsc < tsc_deadline) {
+		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
+
+		ndelay(delay);
+		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+	}
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)
 {
 	ktime_t now;
+
 	atomic_set(&apic->lapic_timer.pending, 0);
 
 	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
@@ -1137,6 +1186,7 @@ static void start_apic_timer(struct kvm_
 		/* lapic timer in tsc deadline mode */
 		u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
 		u64 ns = 0;
+		ktime_t expire;
 		struct kvm_vcpu *vcpu = apic->vcpu;
 		unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
 		unsigned long flags;
@@ -1151,8 +1201,10 @@ static void start_apic_timer(struct kvm_
 		if (likely(tscdeadline > guest_tsc)) {
 			ns = (tscdeadline - guest_tsc) * 1000000ULL;
 			do_div(ns, this_tsc_khz);
+			expire = ktime_add_ns(now, ns);
+			expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
 			hrtimer_start(&apic->lapic_timer.timer,
-				ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+				      expire, HRTIMER_MODE_ABS);
 		} else
 			apic_timer_expired(apic);
 
Index: kvm/arch/x86/kvm/lapic.h
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.h
+++ kvm/arch/x86/kvm/lapic.h
@@ -14,6 +14,7 @@ struct kvm_timer {
 	u32 timer_mode;
 	u32 timer_mode_mask;
 	u64 tscdeadline;
+	u64 expired_tscdeadline;
 	atomic_t pending;			/* accumulated triggered timers */
 };
 
@@ -170,4 +171,6 @@ static inline bool kvm_apic_has_events(s
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
+void wait_lapic_expire(struct kvm_vcpu *vcpu);
+
 #endif
Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -108,6 +108,10 @@ EXPORT_SYMBOL_GPL(kvm_max_guest_tsc_khz)

Re: [PATCH] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit

2014-12-10 Thread Paolo Bonzini

> In the interest of reduced future confusion, would it make sense to
> drop the duplicate initialization for 3.20?

Yup.  It would be great if possible to even unify the two init
functions, but I haven't checked what happens in the middle.

Paolo

Re: [PATCH kvm-unit-tests] x86: test_conforming_switch misses es initialization

2014-12-10 Thread Paolo Bonzini

Applied, thanks.

Paolo

----- Original Message -----
> From: Nadav Amit <na...@cs.technion.ac.il>
> To: pbonz...@redhat.com
> Cc: kvm@vger.kernel.org, Nadav Amit <na...@cs.technion.ac.il>
> Sent: Sunday, December 7, 2014 10:39:01 AM
> Subject: [PATCH kvm-unit-tests] x86: test_conforming_switch misses es initialization
>
> test_conforming_switch in the taskswitch2 tests, miss es initialization.
> Fix it.
>
> Signed-off-by: Nadav Amit <na...@cs.technion.ac.il>
> ---
>  x86/taskswitch2.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/x86/taskswitch2.c b/x86/taskswitch2.c
> index f55843c..db3e41a 100644
> --- a/x86/taskswitch2.c
> +++ b/x86/taskswitch2.c
> @@ -271,7 +271,8 @@ void test_conforming_switch(void)
>  
>  	tss_intr.cs = CONFORM_CS_SEL | 3;
>  	tss_intr.eip = (u32)user_tss;
> -	tss_intr.ds = tss_intr.gs = tss_intr.fs = tss_intr.ss = USER_DS;
> +	tss_intr.ss = USER_DS;
> +	tss_intr.ds = tss_intr.gs = tss_intr.es = tss_intr.fs = tss_intr.ss;
>  	tss_intr.eflags |= 3 << IOPL_SHIFT;
>  	set_gdt_entry(CONFORM_CS_SEL, 0, 0xffffffff, 0x9f, 0xc0);
>  	asm volatile("lcall $" xstr(TSS_INTR) ", $0xf4f4f4f4");
> --
> 1.9.1
 
 

Re: kvm-unit-tests: add tscdeadline-latency test

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 21:23, Marcelo Tosatti wrote:
 
> To test latency between TSC deadline timer
> interrupt injection.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: kvm-unit-tests/config/config-x86-common.mak
> ===================================================================
> --- kvm-unit-tests.orig/config/config-x86-common.mak	2014-06-27 13:43:43.694257143 -0300
> +++ kvm-unit-tests/config/config-x86-common.mak	2014-12-10 16:10:41.715339378 -0200
> @@ -69,6 +69,8 @@
>  
>  $(TEST_DIR)/apic.elf: $(cstart.o) $(TEST_DIR)/apic.o
>  
> +$(TEST_DIR)/tscdeadline-latency.elf: $(cstart.o) $(TEST_DIR)/tscdeadline-latency.o
> +
>  $(TEST_DIR)/init.elf: $(cstart.o) $(TEST_DIR)/init.o
>  
>  $(TEST_DIR)/realmode.elf: $(TEST_DIR)/realmode.o
> Index: kvm-unit-tests/config/config-x86_64.mak
> ===================================================================
> --- kvm-unit-tests.orig/config/config-x86_64.mak	2014-12-10 16:03:20.609681443 -0200
> +++ kvm-unit-tests/config/config-x86_64.mak	2014-12-10 16:10:25.172352577 -0200
> @@ -9,5 +9,6 @@
>  	  $(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat
>  tests += $(TEST_DIR)/svm.flat
>  tests += $(TEST_DIR)/vmx.flat
> +tests += $(TEST_DIR)/tscdeadline-latency.flat
>  
>  include config/config-x86-common.mak
> Index: kvm-unit-tests/x86/tscdeadline-latency.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ kvm-unit-tests/x86/tscdeadline-latency.c	2014-12-10 18:21:38.151253344 -0200
> @@ -0,0 +1,110 @@
> +/*
> + * qemu command line | grep latency | cut -f 2 -d ":" > latency
> + *
> + * In octave:
> + * load latency
> + * min(list)
> + * max(list)
> + * mean(list)
> + * hist(latency, 50)
> + */
> +
> +#include "libcflat.h"
> +#include "apic.h"
> +#include "vm.h"
> +#include "smp.h"
> +#include "desc.h"
> +#include "isr.h"
> +#include "msr.h"
> +
> +static void test_lapic_existence(void)
> +{
> +    u32 lvr;
> +
> +    lvr = apic_read(APIC_LVR);
> +    printf("apic version: %x\n", lvr);
> +    report("apic existence", (u16)lvr == 0x14);
> +}
> +
> +#define TSC_DEADLINE_TIMER_MODE (2 << 17)
> +#define TSC_DEADLINE_TIMER_VECTOR 0xef
> +#define MSR_IA32_TSC            0x00000010
> +#define MSR_IA32_TSCDEADLINE    0x000006e0
> +
> +static int tdt_count;
> +u64 exptime;
> +int delta;
> +#define TABLE_SIZE 10000
> +u64 table[TABLE_SIZE];
> +volatile int table_idx;
> +
> +static void tsc_deadline_timer_isr(isr_regs_t *regs)
> +{
> +    u64 now = rdtsc();
> +    ++tdt_count;
> +
> +    if (table_idx < TABLE_SIZE && tdt_count > 1)
> +        table[table_idx++] = now - exptime;
> +
> +    exptime = now+delta;
> +    wrmsr(MSR_IA32_TSCDEADLINE, now+delta);
> +    apic_write(APIC_EOI, 0);
> +}
> +
> +static void start_tsc_deadline_timer(void)
> +{
> +    handle_irq(TSC_DEADLINE_TIMER_VECTOR, tsc_deadline_timer_isr);
> +    irq_enable();
> +
> +    wrmsr(MSR_IA32_TSCDEADLINE, rdmsr(MSR_IA32_TSC)+delta);
> +    asm volatile ("nop");
> +}
> +
> +static int enable_tsc_deadline_timer(void)
> +{
> +    uint32_t lvtt;
> +
> +    if (cpuid(1).c & (1 << 24)) {
> +        lvtt = TSC_DEADLINE_TIMER_MODE | TSC_DEADLINE_TIMER_VECTOR;
> +        apic_write(APIC_LVTT, lvtt);
> +        start_tsc_deadline_timer();
> +        return 1;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static void test_tsc_deadline_timer(void)
> +{
> +    if(enable_tsc_deadline_timer()) {
> +        printf("tsc deadline timer enabled\n");
> +    } else {
> +        printf("tsc deadline timer not detected\n");
> +    }
> +}
> +
> +int main()
> +{
> +    int i;
> +
> +    setup_vm();
> +    smp_init();
> +    setup_idt();
> +
> +    test_lapic_existence();
> +
> +    mask_pic_interrupts();
> +
> +    delta = 200000;
> +    test_tsc_deadline_timer();
> +    irq_enable();
> +
> +    do {
> +        asm volatile("hlt");
> +    } while (table_idx < TABLE_SIZE);
> +
> +    for (i = 0; i < TABLE_SIZE; i++)
> +        printf("latency: %d\n", table[i]);
> +
> +    return report_summary();
> +}
> Index: kvm-unit-tests/x86/unittests.cfg
> ===================================================================
> --- kvm-unit-tests.orig/x86/unittests.cfg	2014-12-10 16:03:20.616681437 -0200
> +++ kvm-unit-tests/x86/unittests.cfg	2014-12-10 16:15:23.145114609 -0200
> @@ -161,3 +161,8 @@
>  [debug]
>  file = debug.flat
>  arch = x86_64
> +
> +[tscdeadline_latency]
> +file = tscdeadline_latency.flat
> +extra_params = -cpu qemu64,+tsc-deadline
> +arch = x86_64
 

Applied, thanks.  Here is a script I use to run it:

#! /bin/sh
time ./x86/run x86/tscdeadline-latency.flat -cpu host | sed -n 's/^latency: //p' > l.txt
time ./x86/run x86/tscdeadline-latency.flat -append '200 4000' -cpu host | sed -n 's/^latency: //p' > l2.txt
time ./x86/run x86/tscdeadline-latency.flat -append '400 2000' -cpu host | sed -n 's/^latency: //p' > l3.txt

gnuplot << \EOF
  hist(x,width)=width*floor(x/width) + binwidth/2.0

  binwidth=500

Re: [patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Paolo Bonzini



On 10/12/2014 21:57, Marcelo Tosatti wrote:
> For the hrtimer which emulates the tscdeadline timer in the guest,
> add an option to advance expiration, and busy spin on VM-entry waiting
> for the actual expiration time to elapse.
>
> This allows achieving low latencies in cyclictest (or any scenario
> which requires strict timing regarding timer expiration).
>
> Reduces cyclictest avg latency by 50%.
>
> Note: this option requires tuning to find the appropriate value
> for a particular hardware/guest combination. One method is to measure the
> average delay between apic_timer_fn and VM-entry.
> Another method is to start with 1000ns, and increase the value
> in say 500ns increments until avg cyclictest numbers stop decreasing.

What values are you using in practice for the parameter?

> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -33,6 +33,7 @@
>  #include <asm/page.h>
>  #include <asm/current.h>
>  #include <asm/apicdef.h>
> +#include <asm/delay.h>
>  #include <linux/atomic.h>
>  #include <linux/jump_label.h>
>  #include "kvm_cache_regs.h"
> @@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
>  {
>  	struct kvm_vcpu *vcpu = apic->vcpu;
>  	wait_queue_head_t *q = &vcpu->wq;
> +	struct kvm_timer *ktimer = &apic->lapic_timer;
>  
>  	/*
>  	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
> @@ -1087,11 +1089,58 @@ static void apic_timer_expired(struct kv
>  
>  	if (waitqueue_active(q))
>  		wake_up_interruptible(q);
> +
> +	if (ktimer->timer_mode_mask == APIC_LVT_TIMER_TSCDEADLINE)
> +		ktimer->expired_tscdeadline = ktimer->tscdeadline;
> +}
> +
> +static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
> +
> +	if (kvm_apic_hw_enabled(apic)) {
> +		int vec = reg & APIC_VECTOR_MASK;
> +
> +		if (kvm_x86_ops->test_posted_interrupt)
> +			return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
> +		else {
> +			if (apic_test_vector(vec, apic->regs + APIC_ISR))
> +				return true;
> +		}

One branch here is testing IRR, the other is testing ISR.  I think
testing ISR is right; on APICv, the above test will cause a busy wait
during a higher-priority task (or during an interrupt service routine
for the timer itself), just because the timer interrupt was delivered.

So, on APICv, if the interrupt is in PIR but it has bits 7:4 <=
PPR[7:4], you have a problem. :(  There is no APICv hook that lets you
get a vmexit when the PPR becomes low enough.
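
The priority comparison mentioned above can be sketched in a few lines of standalone C. The helper names are hypothetical and the sketch only models the rule that a pending vector is delivered when its priority class (bits 7:4) is strictly greater than the class in the PPR; it says nothing about how KVM or APICv evaluates this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector or of the PPR: bits 7:4. */
static unsigned int prio_class(uint8_t v)
{
    return v >> 4;
}

/* True if a pending 'vector' can be delivered at the current 'ppr'. */
static bool vector_deliverable(uint8_t vector, uint8_t ppr)
{
    return prio_class(vector) > prio_class(ppr);
}
```

For the 0xef timer vector used by the unit test, the interrupt stays pending for as long as the PPR class is 0xe or higher, which is the window in which a busy wait would be wasted.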

> +	}
> +	return false;
> +}
> +
> +void wait_lapic_expire(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	u64 guest_tsc, tsc_deadline;
> +
> +	if (!kvm_vcpu_has_lapic(vcpu))
> +		return;
> +
> +	if (!apic_lvtt_tscdeadline(apic))
> +		return;

This test is wrong, I think.  You need to check whether the timer
interrupt was a TSC deadline interrupt.  Instead, you are checking
whether the current mode is TSC-deadline.  This can be different if the
interrupt could not be delivered immediately after it was received.
This is easy to fix: replace the first two tests with
"apic->lapic_timer.expired_tscdeadline != 0" and...

> +	if (!lapic_timer_int_injected(vcpu))
> +		return;
> +	tsc_deadline = apic->lapic_timer.expired_tscdeadline;

... set apic->lapic_timer.expired_tscdeadline to 0 here.

But I'm not sure how to solve the above problem with APICv.  That's a
pity.  Knowing what values you use in practice for the parameter, would
also make it easier to understand the problem.  Please report that
together with the graphs produced by the unit test you added.
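
A minimal standalone sketch of the record-then-consume idea suggested above, assuming hypothetical names (`timer_sketch`, `timer_expired`, `take_expired_deadline`) that do not exist in KVM: the expiry path records which deadline fired, and the VM-entry path uses that record once and clears it, independent of the timer mode in effect at VM-entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct timer_sketch {
    bool     tscdeadline_mode;     /* current LVTT mode */
    uint64_t tscdeadline;          /* armed deadline */
    uint64_t expired_tscdeadline;  /* nonzero: a deadline interrupt is pending */
};

/* Timer callback: remember which deadline fired, if in deadline mode. */
static void timer_expired(struct timer_sketch *t)
{
    if (t->tscdeadline_mode)
        t->expired_tscdeadline = t->tscdeadline;
}

/*
 * VM-entry side: return the deadline to spin towards, or 0 for "nothing
 * to wait for". The recorded value is consumed so it is used only once.
 */
static uint64_t take_expired_deadline(struct timer_sketch *t, bool injected)
{
    uint64_t d = t->expired_tscdeadline;

    if (d == 0 || !injected)
        return 0;
    t->expired_tscdeadline = 0;
    return d;
}
```

Because the decision depends on the recorded value rather than on the current mode, a guest that reprograms the LVT timer between expiry and injection does not change the outcome in this sketch.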

Paolo

> +	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> +
> +	while (guest_tsc < tsc_deadline) {
> +		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
> +
> +		ndelay(delay);
> +		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> +	}
>  }
>  
>  static void start_apic_timer(struct kvm_lapic *apic)
>  {
>  	ktime_t now;
> +
>  	atomic_set(&apic->lapic_timer.pending, 0);
>  
>  	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
> @@ -1137,6 +1186,7 @@ static void start_apic_timer(struct kvm_
>  		/* lapic timer in tsc deadline mode */
>  		u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
>  		u64 ns = 0;
> +		ktime_t expire;
>  		struct kvm_vcpu *vcpu = apic->vcpu;
>  		unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
>  		unsigned long flags;
> @@ -1151,8 +1201,10 @@ static void start_apic_timer(struct kvm_
>  		if (likely(tscdeadline > guest_tsc)) {
>  			ns = (tscdeadline -

RE: [Intel-gfx] [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-10 Thread Tian, Kevin

> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, December 11, 2014 12:59 AM
>
> On 09/12/2014 03:49, Tian, Kevin wrote:
> > - Now we have XenGT/KVMGT separately maintained, and KVMGT lags
> > behind XenGT regarding to features and qualities. Likely you'll continue
> > see stale code (like Xen inst decoder) for some time. In the future we
> > plan to maintain a single kernel repo for both, so KVMGT can share
> > same quality as XenGT once KVM in-kernel dm framework is stable.
> >
> > - Regarding to Qemu hacks, KVMGT really doesn't have any different
> > requirements as what have been discussed for GPU pass-through, e.g.
> > about ISA bridge. Our implementation is based on an old Qemu repo,
> > and honestly speaking not cleanly developed, because we know we
> > can leverage from GPU pass-through support once it's in Qemu. At
> > that time we'll leverage the same logic with minimal changes to
> > hook KVMGT mgmt. APIs (e.g. create/destroy a vGPU instance). So
> > we can ignore this area for now. :-)
>
> Could the virtual device model introduce new registers in order to avoid
> poking at the ISA bridge?  I'm not sure that you can leverage from GPU
> pass-through support once it's in Qemu, since the Xen IGD passthrough
> support is being added to a separate machine that is specific to Xen IGD
> passthrough; no ISA bridge hacking will probably be allowed on the -M
> pc and -M q35 machine types.
 

My point is that KVMGT doesn't introduce new requirements as what's
required in IGD passthrough case, because all the hacks you see now
is to satisfy guest graphics driver's expectation. I haven't follow up the
KVM IGD passthrough progress, but if it doesn't require ISA bridge hacking
the same trick can be adopted by KVMGT too. You may know Allen is
working on driver changes to avoid causing those hacks in Qemu side.
That effort will benefit us too. So I don't think this is a KVMGT specific
issue, and we need a common solution to close this gap instead of 
hacking vGPU device model alone.

Thanks
Kevin

Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-10 Thread Paolo Bonzini



On 11/12/2014 01:33, Tian, Kevin wrote:
> My point is that KVMGT doesn't introduce new requirements as what's
> required in IGD passthrough case, because all the hacks you see now
> is to satisfy guest graphics driver's expectation. I haven't follow up the
> KVM IGD passthrough progress, but if it doesn't require ISA bridge hacking
> the same trick can be adopted by KVMGT too.

Right now it did require ISA bridge hacking.

> You may know Allen is
> working on driver changes to avoid causing those hacks in Qemu side.
> That effort will benefit us too.

That's good to know, thanks!

Paolo

[PATCH] kvm: coalesced_mmio: remove one redundant check inside of coalesced_mmio_in_range()

2014-12-10 Thread Tiejun Chen

We already check 'len' above to make sure it isn't negative here,
so (addr + len < addr) should never happen.
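
For reference, the checks that remain once the line is removed can be sketched as a standalone function. `zone_sketch` and `in_range_sketch` are hypothetical names for this illustration, the field types are assumptions, and the sketch makes no claim about how addresses close to the top of the 64-bit range behave.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a coalesced MMIO zone. */
struct zone_sketch {
    uint64_t addr;
    uint32_t size;
};

/* Returns 1 if [addr, addr + len) lies inside the zone, else 0. */
static int in_range_sketch(const struct zone_sketch *z, uint64_t addr, int len)
{
    if (len < 0)
        return 0;
    if (addr < z->addr)
        return 0;
    if (addr + len > z->addr + z->size)
        return 0;
    return 1;
}
```

An access is accepted only when both its start and its end fall within the zone; a negative length is rejected before any address arithmetic takes place.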

Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
 virt/kvm/coalesced_mmio.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 00d8642..60f59cd 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -30,8 +30,6 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
 	 */
 	if (len < 0)
 		return 0;
-	if (addr + len < addr)
-		return 0;
 	if (addr < dev->zone.addr)
 		return 0;
 	if (addr + len > dev->zone.addr + dev->zone.size)
-- 
1.9.1


Re: [patch 2/2] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-10 Thread Marcelo Tosatti

On Thu, Dec 11, 2014 at 12:37:57AM +0100, Paolo Bonzini wrote:
 
 
> On 10/12/2014 21:57, Marcelo Tosatti wrote:
> > For the hrtimer which emulates the tscdeadline timer in the guest,
> > add an option to advance expiration, and busy spin on VM-entry waiting
> > for the actual expiration time to elapse.
> >
> > This allows achieving low latencies in cyclictest (or any scenario
> > which requires strict timing regarding timer expiration).
> >
> > Reduces cyclictest avg latency by 50%.
> >
> > Note: this option requires tuning to find the appropriate value
> > for a particular hardware/guest combination. One method is to measure the
> > average delay between apic_timer_fn and VM-entry.
> > Another method is to start with 1000ns, and increase the value
> > in say 500ns increments until avg cyclictest numbers stop decreasing.
>
> What values are you using in practice for the parameter?

7us.

> > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> >
> > Index: kvm/arch/x86/kvm/lapic.c
> > ===================================================================
> > --- kvm.orig/arch/x86/kvm/lapic.c
> > +++ kvm/arch/x86/kvm/lapic.c
> > @@ -33,6 +33,7 @@
> >  #include <asm/page.h>
> >  #include <asm/current.h>
> >  #include <asm/apicdef.h>
> > +#include <asm/delay.h>
> >  #include <linux/atomic.h>
> >  #include <linux/jump_label.h>
> >  #include "kvm_cache_regs.h"
> > @@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
> >  {
> >  	struct kvm_vcpu *vcpu = apic->vcpu;
> >  	wait_queue_head_t *q = &vcpu->wq;
> > +	struct kvm_timer *ktimer = &apic->lapic_timer;
> >  
> >  	/*
> >  	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
> > @@ -1087,11 +1089,58 @@ static void apic_timer_expired(struct kv
> >  
> >  	if (waitqueue_active(q))
> >  		wake_up_interruptible(q);
> > +
> > +	if (ktimer->timer_mode_mask == APIC_LVT_TIMER_TSCDEADLINE)
> > +		ktimer->expired_tscdeadline = ktimer->tscdeadline;
> > +}
> > +
> > +static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +	u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
> > +
> > +	if (kvm_apic_hw_enabled(apic)) {
> > +		int vec = reg & APIC_VECTOR_MASK;
> > +
> > +		if (kvm_x86_ops->test_posted_interrupt)
> > +			return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
> > +		else {
> > +			if (apic_test_vector(vec, apic->regs + APIC_ISR))
> > +				return true;
> > +		}
>
> One branch here is testing IRR, the other is testing ISR.  I think
> testing ISR is right; on APICv, the above test will cause a busy wait
> during a higher-priority task (or during an interrupt service routine
> for the timer itself), just because the timer interrupt was delivered.

Yes.

> So, on APICv, if the interrupt is in PIR but it has bits 7:4 <=
> PPR[7:4], you have a problem. :(  There is no APICv hook that lets you
> get a vmexit when the PPR becomes low enough.

Well, you simply exit earlier and busy spin for VM-exit
time.

For Linux guests, there is no problem.

> > +	}
> > +	return false;
> > +}
> > +
> > +void wait_lapic_expire(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_lapic *apic = vcpu->arch.apic;
> > +	u64 guest_tsc, tsc_deadline;
> > +
> > +	if (!kvm_vcpu_has_lapic(vcpu))
> > +		return;
> > +
> > +	if (!apic_lvtt_tscdeadline(apic))
> > +		return;
>
> This test is wrong, I think.  You need to check whether the timer
> interrupt was a TSC deadline interrupt.  Instead, you are checking
> whether the current mode is TSC-deadline.  This can be different if the
> interrupt could not be delivered immediately after it was received.
> This is easy to fix: replace the first two tests with
> "apic->lapic_timer.expired_tscdeadline != 0" and...

Yes.

> > +	if (!lapic_timer_int_injected(vcpu))
> > +		return;
> > +	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
>
> ... set apic->lapic_timer.expired_tscdeadline to 0 here.
>
> But I'm not sure how to solve the above problem with APICv.  That's a
> pity.  Knowing what values you use in practice for the parameter, would
> also make it easier to understand the problem.  Please report that
> together with the graphs produced by the unit test you added.


Re: kvm-unit-tests: add tscdeadline-latency test

2014-12-10 Thread Marcelo Tosatti

On Wed, Dec 10, 2014 at 10:49:52PM +0100, Paolo Bonzini wrote:
 
 
 On 10/12/2014 21:23, Marcelo Tosatti wrote:
  
  To test latency between TSC deadline timer 
  interrupt injection.
  
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  
  Index: kvm-unit-tests/config/config-x86-common.mak
  ===
  --- kvm-unit-tests.orig/config/config-x86-common.mak2014-06-27 
  13:43:43.694257143 -0300
  +++ kvm-unit-tests/config/config-x86-common.mak 2014-12-10 
  16:10:41.715339378 -0200
  @@ -69,6 +69,8 @@
   
   $(TEST_DIR)/apic.elf: $(cstart.o) $(TEST_DIR)/apic.o
   
  +$(TEST_DIR)/tscdeadline-latency.elf: $(cstart.o) 
  $(TEST_DIR)/tscdeadline-latency.o
  +
   $(TEST_DIR)/init.elf: $(cstart.o) $(TEST_DIR)/init.o
   
   $(TEST_DIR)/realmode.elf: $(TEST_DIR)/realmode.o
  Index: kvm-unit-tests/config/config-x86_64.mak
  ===
  --- kvm-unit-tests.orig/config/config-x86_64.mak2014-12-10 
  16:03:20.609681443 -0200
  +++ kvm-unit-tests/config/config-x86_64.mak 2014-12-10 16:10:25.172352577 
  -0200
  @@ -9,5 +9,6 @@
$(TEST_DIR)/pcid.flat $(TEST_DIR)/debug.flat
   tests += $(TEST_DIR)/svm.flat
   tests += $(TEST_DIR)/vmx.flat
  +tests += $(TEST_DIR)/tscdeadline-latency.flat
   
   include config/config-x86-common.mak
  Index: kvm-unit-tests/x86/tscdeadline-latency.c
  ===
  --- /dev/null   1970-01-01 00:00:00.0 +
  +++ kvm-unit-tests/x86/tscdeadline-latency.c2014-12-10 
  18:21:38.151253344 -0200
  @@ -0,0 +1,110 @@
  +/*
  + * qemu command line | grep latency | cut -f 2 -d : > latency
  + *
  + * In octave:
  + * load latency
  + * min(list)
  + * max(list)
  + * mean(list)
  + * hist(latency, 50)
  + */
  +
  +#include "libcflat.h"
  +#include "apic.h"
  +#include "vm.h"
  +#include "smp.h"
  +#include "desc.h"
  +#include "isr.h"
  +#include "msr.h"
  +
  +static void test_lapic_existence(void)
  +{
  +u32 lvr;
  +
  +lvr = apic_read(APIC_LVR);
  +printf("apic version: %x\n", lvr);
  +report("apic existence", (u16)lvr == 0x14);
  +}
  +
  +#define TSC_DEADLINE_TIMER_MODE (2 << 17)
  +#define TSC_DEADLINE_TIMER_VECTOR 0xef
  +#define MSR_IA32_TSC 0x0010
  +#define MSR_IA32_TSCDEADLINE 0x06e0
  +
  +static int tdt_count;
  +u64 exptime;
  +int delta;
  +#define TABLE_SIZE 1
  +u64 table[TABLE_SIZE];
  +volatile int table_idx;
  +
  +static void tsc_deadline_timer_isr(isr_regs_t *regs)
  +{
  +u64 now = rdtsc();
  +++tdt_count;
  +
  +if (table_idx < TABLE_SIZE && tdt_count > 1)
  +table[table_idx++] = now - exptime;
  +
  +exptime = now+delta;
  +wrmsr(MSR_IA32_TSCDEADLINE, now+delta);
  +apic_write(APIC_EOI, 0);
  +}
  +
  +static void start_tsc_deadline_timer(void)
  +{
  +handle_irq(TSC_DEADLINE_TIMER_VECTOR, tsc_deadline_timer_isr);
  +irq_enable();
  +
  +wrmsr(MSR_IA32_TSCDEADLINE, rdmsr(MSR_IA32_TSC)+delta);
  +asm volatile ("nop");
  +}
  +
  +static int enable_tsc_deadline_timer(void)
  +{
  +uint32_t lvtt;
  +
  +if (cpuid(1).c & (1 << 24)) {
  +lvtt = TSC_DEADLINE_TIMER_MODE | TSC_DEADLINE_TIMER_VECTOR;
  +apic_write(APIC_LVTT, lvtt);
  +start_tsc_deadline_timer();
  +return 1;
  +} else {
  +return 0;
  +}
  +}
  +
  +static void test_tsc_deadline_timer(void)
  +{
  +if(enable_tsc_deadline_timer()) {
  +printf("tsc deadline timer enabled\n");
  +} else {
  +printf("tsc deadline timer not detected\n");
  +}
  +}
  +
  +int main()
  +{
  +int i;
  +
  +setup_vm();
  +smp_init();
  +setup_idt();
  +
  +test_lapic_existence();
  +
  +mask_pic_interrupts();
  +
  +delta = 20;
  +test_tsc_deadline_timer();
  +irq_enable();
  +
  +do {
  +asm volatile("hlt");
  +} while (table_idx < TABLE_SIZE);
  +
  +for (i = 0; i < TABLE_SIZE; i++)
  +printf("latency: %d\n", table[i]);
  +
  +return report_summary();
  +}
  Index: kvm-unit-tests/x86/unittests.cfg
  ===
  --- kvm-unit-tests.orig/x86/unittests.cfg   2014-12-10 16:03:20.616681437 
  -0200
  +++ kvm-unit-tests/x86/unittests.cfg2014-12-10 16:15:23.145114609 
  -0200
  @@ -161,3 +161,8 @@
   [debug]
   file = debug.flat
   arch = x86_64
  +
  +[tscdeadline_latency]
  +file = tscdeadline_latency.flat
  +extra_params = -cpu qemu64,+tsc-deadline
  +arch = x86_64
  
 
 Applied, thanks.  Here is a script I use to run it:
 
 #! /bin/sh
 time ./x86/run x86/tscdeadline-latency.flat -cpu host | sed -n 's/^latency: //p' > l.txt
 time ./x86/run x86/tscdeadline-latency.flat
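
The header comment of the test summarizes the collected samples in octave
with min, max and mean. The same summary could be computed in C, as in the
minimal sketch below. struct latency_stats and summarize_latencies() are
hypothetical names used only for illustration; the samples are assumed to be
the TSC-cycle values printed on the "latency:" lines.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of a set of latency samples, in TSC cycles. */
struct latency_stats {
    uint64_t min;
    uint64_t max;
    double mean;
};

/* Compute min, max and mean over n samples; the caller guarantees n > 0. */
static struct latency_stats summarize_latencies(const uint64_t *samples, size_t n)
{
    struct latency_stats st = { samples[0], samples[0], 0.0 };
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < st.min)
            st.min = samples[i];
        if (samples[i] > st.max)
            st.max = samples[i];
        sum += (double)samples[i];
    }
    st.mean = sum / (double)n;
    return st;
}
```

A histogram like the one produced by hist(latency, 50) would need an extra
bucketing pass and is left out of this sketch.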

[PATCH v3 1/3] KVM: nVMX: Add nested msr load/restore algorithm

2014-12-10 Thread Wincy Van

Several hypervisors need the MSR auto load/restore feature.
We read MSRs from the VM-entry MSR load area specified by L1
and load them via kvm_set_msr on nested entry.
When a nested exit occurs, we get MSRs via kvm_get_msr and write
them to L1's MSR store area. After this, we read MSRs from the VM-exit
MSR load area and load them via kvm_set_msr.

Signed-off-by: Wincy Van fanwenyi0...@gmail.com
---
 arch/x86/include/uapi/asm/vmx.h |  5 +++
 arch/x86/kvm/vmx.c  | 68 +
 arch/x86/kvm/x86.c  |  1 +
 virt/kvm/kvm_main.c |  1 +
 4 files changed, 75 insertions(+)

diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index b813bf9..ff2b8e2 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -56,6 +56,7 @@
 #define EXIT_REASON_MSR_READ            31
 #define EXIT_REASON_MSR_WRITE           32
 #define EXIT_REASON_INVALID_STATE       33
+#define EXIT_REASON_MSR_LOAD_FAIL       34
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
 #define EXIT_REASON_MONITOR_INSTRUCTION 39
 #define EXIT_REASON_PAUSE_INSTRUCTION   40
@@ -116,10 +117,14 @@
{ EXIT_REASON_APIC_WRITE,            "APIC_WRITE" }, \
{ EXIT_REASON_EOI_INDUCED,           "EOI_INDUCED" }, \
{ EXIT_REASON_INVALID_STATE,         "INVALID_STATE" }, \
+   { EXIT_REASON_MSR_LOAD_FAIL,         "MSR_LOAD_FAIL" }, \
{ EXIT_REASON_INVD,                  "INVD" }, \
{ EXIT_REASON_INVVPID,               "INVVPID" }, \
{ EXIT_REASON_INVPCID,               "INVPCID" }, \
{ EXIT_REASON_XSAVES,                "XSAVES" }, \
{ EXIT_REASON_XRSTORS,               "XRSTORS" }
 
+#define VMX_ABORT_SAVE_GUEST_MSR_FAIL        1
+#define VMX_ABORT_LOAD_HOST_MSR_FAIL         4
+
 #endif /* _UAPIVMX_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9bcc871..b49d198 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6143,6 +6143,13 @@ static void nested_vmx_failValid(struct kvm_vcpu *vcpu,
 */
 }
 
+static void nested_vmx_abort(struct kvm_vcpu *vcpu, u32 indicator)
+{
+   /* TODO: not to reset guest simply here. */
+   kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+   pr_warn("kvm: nested vmx abort, indicator %d\n", indicator);
+}
+
 static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer)
 {
struct vcpu_vmx *vmx =
@@ -8286,6 +8293,67 @@ static void vmx_start_preemption_timer(struct kvm_vcpu 
*vcpu)
  ns_to_ktime(preemption_timeout), HRTIMER_MODE_REL);
 }
 
+static inline int nested_vmx_msr_check_common(struct vmx_msr_entry *e)
+{
+   if (e->index >> 8 == 0x8 || e->reserved != 0)
+   return -EINVAL;
+   return 0;
+}
+
+static inline int nested_vmx_load_msr_check(struct vmx_msr_entry *e)
+{
+   if (e->index == MSR_FS_BASE ||
+   e->index == MSR_GS_BASE ||
+   nested_vmx_msr_check_common(e))
+   return -EINVAL;
+   return 0;
+}
+
+/*
+ * Load guest's/host's msr at nested entry/exit.
+ * return 0 for success, entry index for failure.
+ */
+static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
+{
+   u32 i;
+   struct vmx_msr_entry e;
+   struct msr_data msr;
+
+   msr.host_initiated = false;
+   for (i = 0; i < count; i++) {
+   kvm_read_guest(vcpu->kvm, gpa + i * sizeof(e), &e, sizeof(e));
+   if (nested_vmx_load_msr_check(&e))
+   goto fail;
+   msr.index = e.index;
+   msr.data = e.value;
+   if (kvm_set_msr(vcpu, &msr))
+   goto fail;
+   }
+   return 0;
+fail:
+   return i + 1;
+}
+
+static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
+{
+   u32 i;
+   struct vmx_msr_entry e;
+
+   for (i = 0; i < count; i++) {
+   kvm_read_guest(vcpu->kvm, gpa + i * sizeof(e),
+   &e, 2 * sizeof(u32));
+   if (nested_vmx_msr_check_common(&e))
+   return -EINVAL;
+   if (kvm_get_msr(vcpu, e.index, &e.value))
+   return -EINVAL;
+   kvm_write_guest(vcpu->kvm,
+   gpa + i * sizeof(e) +
+   offsetof(struct vmx_msr_entry, value),
+   &e.value, sizeof(e.value));
+   }
+   return 0;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function merges it
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c259814..af9faed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2324,6 +2324,7 @@ int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 
*pdata)
 {
return kvm_x86_ops-get_msr(vcpu, msr_index, pdata);
 }
+EXPORT_SYMBOL_GPL(kvm_get_msr);
 
 static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr,
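
The load loop in this patch reports failure as a 1-based entry index, so 0
can mean success. The sketch below models that convention in plain userspace
C, under stated assumptions: struct msr_entry mirrors the 16-byte layout used
by the patch, while set_msr_fn, accept_any_msr(), entry_loadable() and
load_msr_list() are hypothetical names standing in for kvm_set_msr and the
patch's helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_FS_BASE 0xc0000100u
#define MSR_GS_BASE 0xc0000101u

/* One 16-byte entry of an MSR load/store area. */
struct msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t value;
};

/* Hypothetical callback standing in for kvm_set_msr; nonzero means failure. */
typedef int (*set_msr_fn)(uint32_t index, uint64_t value);

/* Demo stub that accepts every write. */
static int accept_any_msr(uint32_t index, uint64_t value)
{
    (void)index;
    (void)value;
    return 0;
}

/* Reject x2APIC MSRs (0x800-0x8ff), nonzero reserved bits, FS/GS base. */
static bool entry_loadable(const struct msr_entry *e)
{
    if ((e->index >> 8) == 0x8)
        return false;
    if (e->reserved != 0)
        return false;
    if (e->index == MSR_FS_BASE || e->index == MSR_GS_BASE)
        return false;
    return true;
}

/* Return 0 on success, or the 1-based index of the first failing entry. */
static uint32_t load_msr_list(const struct msr_entry *list, uint32_t count,
                              set_msr_fn set_msr)
{
    for (uint32_t i = 0; i < count; i++) {
        if (!entry_loadable(&list[i]) ||
            set_msr(list[i].index, list[i].value))
            return i + 1;
    }
    return 0;
}
```

Returning i + 1 rather than i keeps a failure on the first entry
distinguishable from success.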

[PATCH v3 2/3] KVM: nVMX: Improve nested msr switch checking

2014-12-10 Thread Eugene Korenevsky

This patch improves the checks required by the Intel Software Developer's Manual.
 - SMM MSRs are not allowed.
 - microcode MSRs are not allowed.
 - check x2apic MSRs only when LAPIC is in x2apic mode.
 - MSR switch areas must be aligned to 16 bytes.
 - address of first and last byte in MSR switch areas should not set any bits
   beyond the processor's physical-address width.

It also adds warning messages on failures during MSR switch. These messages
are useful for people who debug their VMMs in nVMX.

Signed-off-by: Eugene Korenevsky ekorenev...@gmail.com
---
 arch/x86/include/uapi/asm/msr-index.h |   3 +
 arch/x86/kvm/vmx.c| 121 ++
 2 files changed, 110 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h 
b/arch/x86/include/uapi/asm/msr-index.h
index e21331c..3c9c601 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -316,6 +316,9 @@
 #define MSR_IA32_UCODE_WRITE   0x0079
 #define MSR_IA32_UCODE_REV 0x008b
 
+#define MSR_IA32_SMM_MONITOR_CTL   0x009b
+#define MSR_IA32_SMBASE            0x009e
+
 #define MSR_IA32_PERF_STATUS       0x0198
 #define MSR_IA32_PERF_CTL          0x0199
 #define MSR_AMD_PSTATE_DEF_BASE    0xc0010064
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b49d198..9061d93 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8293,18 +8293,78 @@ static void vmx_start_preemption_timer(struct kvm_vcpu 
*vcpu)
  ns_to_ktime(preemption_timeout), HRTIMER_MODE_REL);
 }
 
-static inline int nested_vmx_msr_check_common(struct vmx_msr_entry *e)
+static int nested_vmx_check_msr_switch(struct kvm_vcpu *vcpu,
+  unsigned long count_field,
+  unsigned long addr_field,
+  int maxphyaddr)
 {
-   if (e->index >> 8 == 0x8 || e->reserved != 0)
+   u64 count, addr;
+
+   if (vmcs12_read_any(vcpu, count_field, &count) ||
+   vmcs12_read_any(vcpu, addr_field, &addr)) {
+   WARN_ON(1);
return -EINVAL;
+   }
+   if (!IS_ALIGNED(addr, 16) || addr >> maxphyaddr ||
+   (addr + count * sizeof(struct vmx_msr_entry) - 1) >> maxphyaddr) {
+   pr_warn_ratelimited(
+   "nVMX: invalid MSR switch (0x%lx, %d, %llu, 0x%08llx)",
+   addr_field, maxphyaddr, count, addr);
+   return -EINVAL;
+   }
return 0;
 }
 
-static inline int nested_vmx_load_msr_check(struct vmx_msr_entry *e)
+static int nested_vmx_check_msr_switch_controls(struct kvm_vcpu *vcpu,
+   struct vmcs12 *vmcs12)
+{
+   int maxphyaddr;
+
+   if (vmcs12->vm_exit_msr_load_count == 0 &&
+   vmcs12->vm_exit_msr_store_count == 0 &&
+   vmcs12->vm_entry_msr_load_count == 0)
+   return 0; /* Fast path */
+   maxphyaddr = cpuid_maxphyaddr(vcpu);
+   if (nested_vmx_check_msr_switch(vcpu, VM_EXIT_MSR_LOAD_COUNT,
+   VM_EXIT_MSR_LOAD_ADDR, maxphyaddr) ||
+   nested_vmx_check_msr_switch(vcpu, VM_EXIT_MSR_STORE_COUNT,
+   VM_EXIT_MSR_STORE_ADDR, maxphyaddr) ||
+   nested_vmx_check_msr_switch(vcpu, VM_ENTRY_MSR_LOAD_COUNT,
+   VM_ENTRY_MSR_LOAD_ADDR, maxphyaddr))
+   return -EINVAL;
+   return 0;
+}
+
+static int nested_vmx_msr_check_common(struct kvm_vcpu *vcpu,
+  struct vmx_msr_entry *e)
+{
+   /* x2APIC MSR accesses are not allowed */
+   if (apic_x2apic_mode(vcpu->arch.apic) && e->index >> 8 == 0x8)
+   return -EINVAL;
+   if (e->index == MSR_IA32_UCODE_WRITE || /* SDM Table 35-2 */
+   e->index == MSR_IA32_UCODE_REV)
+   return -EINVAL;
+   if (e->reserved != 0)
+   return -EINVAL;
+   return 0;
+}
+
+static int nested_vmx_load_msr_check(struct kvm_vcpu *vcpu,
+struct vmx_msr_entry *e)
 {
if (e->index == MSR_FS_BASE ||
e->index == MSR_GS_BASE ||
-   nested_vmx_msr_check_common(e))
+   e->index == MSR_IA32_SMM_MONITOR_CTL || /* SMM is not supported */
+   nested_vmx_msr_check_common(vcpu, e))
+   return -EINVAL;
+   return 0;
+}
+
+static int nested_vmx_store_msr_check(struct kvm_vcpu *vcpu,
+ struct vmx_msr_entry *e)
+{
+   if (e->index == MSR_IA32_SMBASE || /* SMM is not supported */
+   nested_vmx_msr_check_common(vcpu, e))
return -EINVAL;
return 0;
 }
@@ -8321,13 +8381,27 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, 
u64 gpa, u32 count)
 
msr.host_initiated = false;
for (i = 0; i < count; i++) {
-
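
The address checks listed in the commit message (16-byte alignment, and no
bits set beyond the physical-address width in the first and last byte of the
area) can be illustrated with the standalone sketch below. msr_area_valid()
and MSR_ENTRY_SIZE are hypothetical names, and skipping the checks when the
count is zero is an assumption made for this sketch rather than a statement
about what the patch does. maxphyaddr is assumed to be less than 64.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_ENTRY_SIZE 16u /* index (4) + reserved (4) + value (8) */

/*
 * Validate an MSR switch area that starts at guest-physical address addr
 * and holds count entries, for a CPU with maxphyaddr physical-address bits.
 */
static bool msr_area_valid(uint64_t addr, uint32_t count, unsigned int maxphyaddr)
{
    uint64_t last;

    if (count == 0)
        return true;            /* assumption: nothing to check */
    if (addr & 0xf)
        return false;           /* not aligned to 16 bytes */
    if (addr >> maxphyaddr)
        return false;           /* first byte beyond the address width */
    last = addr + (uint64_t)count * MSR_ENTRY_SIZE - 1;
    if (last >> maxphyaddr)
        return false;           /* last byte beyond the address width */
    return true;
}
```

With a 32-bit count and an address already known to fit in maxphyaddr bits,
the computation of the last byte cannot overflow 64 bits.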

[PATCH v3 3/3] KVM: nVMX: Enable nested msr load/restore feature

2014-12-10 Thread Eugene Korenevsky

On nested entry:
 - check msr switch area.
 - load L2's MSRs. If failed, terminate nested entry
   and load L1's state. If failed on loading L1's MSRs
   again, do nested vmx abort.

On nested exit:
 - restore L2's MSRs. If failed, do nested vmx abort.
 - load L1's MSRs. If failed, do nested vmx abort.

Signed-off-by: Wincy Van fanwenyi0...@gmail.com
Signed-off-by: Eugene Korenevsky ekorenev...@gmail.com
---
 arch/x86/kvm/vmx.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9061d93..0d4efaa 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8743,6 +8743,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
int cpu;
struct loaded_vmcs *vmcs02;
bool ia32e;
+   u32 msr_entry_idx;
 
if (!nested_vmx_check_permission(vcpu) ||
!nested_vmx_check_vmcs12(vcpu))
@@ -8790,11 +8791,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
return 1;
}
 
-   if (vmcs12->vm_entry_msr_load_count > 0 ||
-   vmcs12->vm_exit_msr_load_count > 0 ||
-   vmcs12->vm_exit_msr_store_count > 0) {
-   pr_warn_ratelimited("%s: VMCS MSR_{LOAD,STORE} unsupported\n",
-   __func__);
+   if (nested_vmx_check_msr_switch_controls(vcpu, vmcs12)) {
nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
return 1;
}
@@ -8900,10 +8897,21 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool 
launch)
 
vmx_segment_cache_clear(vmx);
 
-   vmcs12->launch_state = 1;
-
prepare_vmcs02(vcpu, vmcs12);
 
+   msr_entry_idx = nested_vmx_load_msr(vcpu,
+   vmcs12->vm_entry_msr_load_addr,
+   vmcs12->vm_entry_msr_load_count);
+   if (msr_entry_idx) {
+   leave_guest_mode(vcpu);
+   vmx_load_vmcs01(vcpu);
+   nested_vmx_entry_failure(vcpu, vmcs12,
+   EXIT_REASON_MSR_LOAD_FAIL, msr_entry_idx);
+   return 1;
+   }
+
+   vmcs12->launch_state = 1;
+
if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT)
return kvm_emulate_halt(vcpu);
 
@@ -9333,6 +9341,10 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 
kvm_set_dr(vcpu, 7, 0x400);
vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
+
+   if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr,
+   vmcs12->vm_exit_msr_load_count))
+   nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_MSR_FAIL);
 }
 
 /*
@@ -9354,6 +9366,10 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 
exit_reason,
prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
   exit_qualification);
 
+   if (nested_vmx_store_msr(vcpu, vmcs12->vm_exit_msr_store_addr,
+vmcs12->vm_exit_msr_store_count))
+   nested_vmx_abort(vcpu, VMX_ABORT_SAVE_GUEST_MSR_FAIL);
+
vmx_load_vmcs01(vcpu);
 
if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
-- 
2.0.4


RE: [v2 17/25] KVM: kvm-vfio: User API for VT-d Posted-Interrupts

2014-12-10 Thread Wu, Feng

 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Monday, December 08, 2014 1:21 PM
 To: Wu, Feng
 Cc: Eric Auger; t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
 x...@kernel.org; g...@kernel.org; pbonz...@redhat.com;
 dw...@infradead.org; j...@8bytes.org; jiang@linux.intel.com;
 linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org;
 kvm@vger.kernel.org
 Subject: Re: [v2 17/25] KVM: kvm-vfio: User API for VT-d Posted-Interrupts

 On Mon, 2014-12-08 at 04:58 +, Wu, Feng wrote:

   -Original Message-
   From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
 On
   Behalf Of Eric Auger
   Sent: Thursday, December 04, 2014 10:05 PM
   To: Wu, Feng; t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
   x...@kernel.org; g...@kernel.org; pbonz...@redhat.com;
   dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com;
   jiang@linux.intel.com
   Cc: linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org;
   kvm@vger.kernel.org
   Subject: Re: [v2 17/25] KVM: kvm-vfio: User API for VT-d Posted-Interrupts

   Hi Feng,
   On 12/03/2014 08:39 AM, Feng Wu wrote:
This patch adds and documents a new attribute
KVM_DEV_VFIO_DEVICE_POSTING_IRQ in KVM_DEV_VFIO_DEVICE
 group.
This new attribute is used for VT-d Posted-Interrupts.

When the guest OS changes the interrupt configuration for an
assigned device, such as the MSI/MSI-X data/address fields,
QEMU will use this IRQ attribute to tell KVM to update the
related IRTE according to the VT-d Posted-Interrupts Specification;
for example, the guest vector should be updated in the related IRTE.

Signed-off-by: Feng Wu feng...@intel.com
---
 Documentation/virtual/kvm/devices/vfio.txt |9 +
 include/uapi/linux/kvm.h   |   10 ++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt
   b/Documentation/virtual/kvm/devices/vfio.txt
index f7aff29..41e12b7 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -42,3 +42,12 @@ activated before VFIO_DEVICE_SET_IRQS has
 been
   called to trigger the IRQ
 or associate an eventfd to it. Unforwarding can only be called while 
the
 signaling has been disabled with VFIO_DEVICE_SET_IRQS. If this
 condition
   is
 not satisfied, the command returns an -EBUSY.
+
+  KVM_DEV_VFIO_DEVICE_POSTING_IRQ: Use posted interrtups
   mechanism to post
   typo
+   the IRQ to guests.
+For this attribute, kvm_device_attr.addr points to a kvm_vfio_dev_irq
   struct.
+
+When guest OS changes the interrupt configuration for an assigned
 device,
+such as, MSI/MSIx data/address fields, QEMU will use this IRQ attribute
+to tell KVM to update the related IRTE according the VT-d
   Posted-Interrrupts
+Specification, such as, the guest vector should be updated in the 
related
   IRTE.
   For my curiosity are there any restrictions about the instant at which
   the change can be done?
   I do not get here how you deactivate the posting?

  The current method is: if the hardware supports interrupt posting, we will
  use it instead of interrupt remapping, since it performs better. Why
  do I need to deactivate interrupt posting?

  Here is the reply to Alex for the same question:
  In fact, I don't think we need to stop the posted interrupts. To set up
  posted interrupts, we update the related IRTE according to the new
  format. If the guest reboots, unloads the drivers, or performs some other
  such operation, MSI/MSI-X will be disabled first; in this path, the IRQ
  will be disabled and the related IRTE is no longer used.

 Right, and I'm still not sure I agree with that reasoning.  We need to
 build the kernel interface to be generic, not tailored for a specific
 userspace.  I don't really feel comfortable having something that can't
 be disabled via a similar path to it being enabled.  For instance, what
 about a dynamic debug interface where we want to enable tracing and see
 each interrupt injected into the guest.  At that point we'd want to
 disable posted interrupts and direct KVM injection and route via QEMU.
 Thanks,

 Alex

I don't quite understand why we need to debug the software
delivery path for interrupts when PI is used; in that case, the software
injection code has no chance to execute. If we don't want to use
PI, we can disable it from the kernel command line.

Thanks,
Feng

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a269a42..7d98650 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -949,6 +949,7 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_DEVICE   2
 #define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ  1
 #define
