[GIT PULL 0/2] KVM: s390: Fix and cleanup for 4.2 (kvm/next)

2015-06-02 Thread Christian Borntraeger
Paolo,

one fix and one cleanup for 4.2.



The following changes since commit 06b36753a6466239fc945b6756e12d635621ae5f:

  KVM: s390: drop handling of interception code 12 (2015-05-08 15:51:17 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-20150602

for you to fetch changes up to ea2cdd27dce66dc498c623adacd807ea3a350443:

  KVM: s390: introduce KMSG_COMPONENT for kvm-s390 (2015-06-02 09:39:20 +0200)


KVM: s390: Fix and cleanup for 4.2 (kvm/next)

One small fix for a commit targeted for 4.2 and one cleanup
regarding our printks.


David Hildenbrand (2):
  KVM: s390: call exit_sie() directly on vcpu block/request
  KVM: s390: introduce KMSG_COMPONENT for kvm-s390

 arch/s390/kvm/kvm-s390.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)



[GIT PULL 1/2] KVM: s390: call exit_sie() directly on vcpu block/request

2015-06-02 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

Thinking about it, I can't find a real use case where we want
to block a VCPU and not kick it out of SIE. (except if we want
to do the same in batch for multiple VCPUs - but that's a micro
optimization)

So let's simply perform the exit_sie() calls directly when setting
the other magic block bits in the SIE.

Otherwise e.g. kvm_s390_set_tod_low() still has other VCPUs running
after that call, working with a wrong epoch.

Fixes: 27406cd50c ("KVM: s390: provide functions for blocking all CPUs")
Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6bc69ab..f37557b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1417,6 +1417,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 void kvm_s390_vcpu_block(struct kvm_vcpu *vcpu)
 {
	atomic_set_mask(PROG_BLOCK_SIE, &vcpu->arch.sie_block->prog20);
+   exit_sie(vcpu);
 }
 
 void kvm_s390_vcpu_unblock(struct kvm_vcpu *vcpu)
@@ -1427,6 +1428,7 @@ void kvm_s390_vcpu_unblock(struct kvm_vcpu *vcpu)
 static void kvm_s390_vcpu_request(struct kvm_vcpu *vcpu)
 {
	atomic_set_mask(PROG_REQUEST, &vcpu->arch.sie_block->prog20);
+   exit_sie(vcpu);
 }
 
 static void kvm_s390_vcpu_request_handled(struct kvm_vcpu *vcpu)
@@ -1450,7 +1452,6 @@ void kvm_s390_sync_request(int req, struct kvm_vcpu *vcpu)
 {
kvm_make_request(req, vcpu);
kvm_s390_vcpu_request(vcpu);
-   exit_sie(vcpu);
 }
 
 static void kvm_gmap_notifier(struct gmap *gmap, unsigned long address)
-- 
2.3.0



[GIT PULL 2/2] KVM: s390: introduce KMSG_COMPONENT for kvm-s390

2015-06-02 Thread Christian Borntraeger
From: David Hildenbrand d...@linux.vnet.ibm.com

Let's remove "kvm-s390" from our printk messages and make use
of pr_fmt instead.

Also replace one printk() occurrence by an equivalent pr_warn
on the way.

Suggested-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f37557b..9cb6cfaa 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -36,6 +36,10 @@
#include "kvm-s390.h"
#include "gaccess.h"
 
+#define KMSG_COMPONENT "kvm-s390"
+#undef pr_fmt
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
#define CREATE_TRACE_POINTS
#include "trace.h"
#include "trace-s390.h"
@@ -2086,7 +2090,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
	if (!kvm_s390_user_cpu_state_ctrl(vcpu->kvm)) {
		kvm_s390_vcpu_start(vcpu);
	} else if (is_vcpu_stopped(vcpu)) {
-		pr_err_ratelimited("kvm-s390: can't run stopped vcpu %d\n",
+		pr_err_ratelimited("can't run stopped vcpu %d\n",
				   vcpu->vcpu_id);
		return -EINVAL;
	}
@@ -2617,7 +2621,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
	rc = gmap_map_segment(kvm->arch.gmap, mem->userspace_addr,
		mem->guest_phys_addr, mem->memory_size);
	if (rc)
-		printk(KERN_WARNING "kvm-s390: failed to commit memory region\n");
+		pr_warn("failed to commit memory region\n");
	return;
 }
 
-- 
2.3.0
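
[Note: for readers unfamiliar with the idiom in this patch: the pr_* macros
expand pr_fmt() at compile time, so every message in the file picks up the
component prefix by string-literal concatenation. A minimal userspace sketch,
with printf standing in for printk; illustrative only, not kernel code:]

#include <stdio.h>

#define KMSG_COMPONENT "kvm-s390"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt

/* stand-in for the kernel's pr_warn(), which expands to
 * printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
 */
#define pr_warn(fmt, ...) printf(pr_fmt(fmt), ##__VA_ARGS__)

int main(void)
{
	/* prints "kvm-s390: failed to commit memory region" */
	pr_warn("failed to commit memory region\n");
	return 0;
}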



Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-02 Thread Christoffer Dall
On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
 On 05/30/2015 11:59 PM, Christoffer Dall wrote:
  Hi Mario,
  
  On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
  On 05/28/2015 11:49 AM, Christoffer Dall wrote:
  Until now we have been calling kvm_guest_exit after re-enabling
  interrupts when we come back from the guest, but this has the
  unfortunate effect that CPU time accounting done in the context of timer
  interrupts occurring while the guest is running doesn't properly notice
  that the time since the last tick was spent in the guest.
 
  Inspired by the comment in the x86 code, move the kvm_guest_exit() call
  below the local_irq_enable() call and change __kvm_guest_exit() to
  kvm_guest_exit(), because we are now calling this function with
  interrupts enabled.  We have to now explicitly disable preemption and
  not enable preemption before we've called kvm_guest_exit(), since
  otherwise we could be preempted and everything happening before we
  eventually get scheduled again would be accounted for as guest time.
 
  At the same time, move the trace_kvm_exit() call outside of the atomic
  section, since there is no reason for us to do that with interrupts
  disabled.
 
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
  This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
  rework recently posted by Christian Borntraeger.  I hope I got the logic
  of this right, there were 2 slightly worrying facts about this:
 
  First, we now enable and disable and enable interrupts on each exit
  path, but I couldn't see any performance overhead on hackbench - yes the
  only benchmark we care about.
 
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
 
  Changes since v1:
   - Tweak comment and commit text based on Marc's feedback.
   - Explicitly disable preemption and enable it only after 
  kvm_guest_exit().
 
   arch/arm/kvm/arm.c | 21 +
   1 file changed, 17 insertions(+), 4 deletions(-)
 
  diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
  index e41cb11..fe8028d 100644
  --- a/arch/arm/kvm/arm.c
  +++ b/arch/arm/kvm/arm.c
@@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
kvm_vgic_flush_hwstate(vcpu);
kvm_timer_flush_hwstate(vcpu);
   
  + preempt_disable();
local_irq_disable();
   
/*
@@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
   
	if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
  + preempt_enable();
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
continue;
@@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
   
	vcpu->mode = OUTSIDE_GUEST_MODE;
  - __kvm_guest_exit();
  - trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  + /*
  +  * Back from guest
  +  */
  +
/*
 * We may have taken a host interrupt in HYP mode (ie
 * while executing the guest). This interrupt is still
@@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
local_irq_enable();
   
/*
  -  * Back from guest
  -  */
  +  * We do local_irq_enable() before calling kvm_guest_exit() so
  +  * that if a timer interrupt hits while running the guest we
  +  * account that tick as being spent in the guest.  We enable
  +  * preemption after calling kvm_guest_exit() so that if we get
  +  * preempted we make sure ticks after that are not counted as
  +  * guest time.
  +  */
  + kvm_guest_exit();
  + trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  + preempt_enable();
  +
   
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
 
 
  Hi Christoffer,
   so currently we take a snapshot when we enter the guest
   (tsk->vtime_snap) and upon exit add the time we spent in
   the guest and update accrued time, which appears correct.
  
  not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
  am I missing something obvious here?
 I see what you mean: we can't use cycle-based accounting to accrue
 guest time.
 

See other thread, we can enable this in the config but it still only
works with NO_HZ_FULL.

  
 
  With this patch it appears that interrupts running
  in host mode are accrued to 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-02 Thread Christoffer Dall
On Mon, Jun 01, 2015 at 03:37:32PM +0200, Christian Borntraeger wrote:
 Am 01.06.2015 um 15:35 schrieb Christoffer Dall:
  On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
  Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
 
 
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
 
  Not an expert here, but I assume mips has the same logic as arm so if 
  your
  patch is right for arm it's probably also right for mips.
 
  powerpc looks similar to what s390 does (not using the tick, instead it 
  uses
  a hw-timer) so this should be fine.
 
  I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
  this for free which would avoid the need for this patch?
 
  Assuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similarly to 
  HAVE_VIRT_CPU_ACCOUNTING on s390/power with respect to not relying on ticks
  - yes it might work out. Can you give it a try?
 
  Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
  no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
  like a fix since it would be a shame to force users to use this config
  option to report CPU usage correctly.
  
  I'm not entirely sure what the history and meaning behind these configs
  are, so maybe there is an entirely different rework needed here.  It
  seems logical that you could simply sample the counter at entry/exit of
  the guest, but if there is nowhere to store this data without
  NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?
 
 Given Paolo's response that irq_disable/enable is faster than save/restore
 at least on x86, your v2 patch might actually be the right thing to do.
 
Thanks, I think so too, but we should enable
HAVE_VIRT_CPU_ACCOUNTING_GEN for arm64 as well, assuming there are no
mysterious side effects of doing so.

-Christoffer


Update on Split Irqchip Patches

2015-06-02 Thread Steve Rutherford
Hi All,

I just sent out a new version of the patches that enable a split
irqchip. I've tested them against Google's VMM, and the updated
patches boot Windows and pass the KVM unit tests. It's mostly the
same as before, with the tweaks suggested against the first version
I sent out.

A new Google Intern (Andrew Liu) is currently looking into patching
QEMU to work with this patch set, which is pretty exciting.

Are there any changes/updates to this patch set that are necessary
before it could be merged?

Cheers,
Steve


[PATCH v3 1/4] KVM: x86: Split the APIC from the rest of IRQCHIP.

2015-06-02 Thread Steve Rutherford
First patch in a series which enables the relocation of the
PIC/IOAPIC to userspace.

Adds the capability KVM_CAP_SPLIT_IRQCHIP.

KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
rest of the irqchip.

Compile tested for x86.

Signed-off-by: Steve Rutherford srutherf...@google.com
Suggested-by: Andrew Honig aho...@google.com
---
 Documentation/virtual/kvm/api.txt | 15 
 arch/powerpc/kvm/irq.h|  5 
 arch/s390/kvm/irq.h   |  4 
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/kvm/assigned-dev.c   |  4 ++--
 arch/x86/kvm/irq.c|  6 ++---
 arch/x86/kvm/irq.h| 11 +
 arch/x86/kvm/irq_comm.c   |  7 ++
 arch/x86/kvm/lapic.c  | 13 +++
 arch/x86/kvm/mmu.c|  2 +-
 arch/x86/kvm/svm.c|  4 ++--
 arch/x86/kvm/vmx.c| 12 +-
 arch/x86/kvm/x86.c| 49 +++
 include/kvm/arm_vgic.h|  1 +
 include/linux/kvm_host.h  |  1 +
 include/uapi/linux/kvm.h  |  1 +
 virt/kvm/irqchip.c|  2 +-
 17 files changed, 104 insertions(+), 35 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6955444..9a43d42 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2979,6 +2979,7 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
 and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
 which is the maximum number of possibly pending cpu-local interrupts.
 
+
 5. The kvm_run structure
 
 
@@ -3575,6 +3576,20 @@ struct {
 
 KVM handlers should exit to userspace with rc = -EREMOTE.
 
+7.5 KVM_SPLIT_IRQCHIP
+
+Capability: KVM_CAP_SPLIT_IRQCHIP
+Architectures: x86
+Type:  VM ioctl
+Parameters: None
+Returns: 0 on success, -1 on error
+
+Create a local apic for each processor in the kernel.  This differs from
+KVM_CREATE_IRQCHIP in that it only creates the local apic; it creates neither
+the ioapic nor the pic in the kernel. Also, enables in kernel routing of
+interrupt requests. Fails if VCPU has already been created, or if the irqchip is
+already in the kernel.
+
 
 8. Other capabilities.
 --
diff --git a/arch/powerpc/kvm/irq.h b/arch/powerpc/kvm/irq.h
index 5a9a10b..5e6fa06 100644
--- a/arch/powerpc/kvm/irq.h
+++ b/arch/powerpc/kvm/irq.h
@@ -17,4 +17,9 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
return ret;
 }
 
+static inline int lapic_in_kernel(struct kvm *kvm)
+{
+   return irqchip_in_kernel(kvm);
+}
+
 #endif
diff --git a/arch/s390/kvm/irq.h b/arch/s390/kvm/irq.h
index d98e415..db876c3 100644
--- a/arch/s390/kvm/irq.h
+++ b/arch/s390/kvm/irq.h
@@ -19,4 +19,8 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
return 1;
 }
 
+static inline int lapic_in_kernel(struct kvm *kvm)
+{
+   return irqchip_in_kernel(kvm);
+}
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7276107..af3225a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -639,6 +639,8 @@ struct kvm_arch {
bool boot_vcpu_runs_old_kvmclock;
 
u64 disabled_quirks;
+
+   bool irqchip_split;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/assigned-dev.c b/arch/x86/kvm/assigned-dev.c
index d090ecf..1237e92 100644
--- a/arch/x86/kvm/assigned-dev.c
+++ b/arch/x86/kvm/assigned-dev.c
@@ -291,7 +291,7 @@ static int kvm_deassign_irq(struct kvm *kvm,
 {
unsigned long guest_irq_type, host_irq_type;
 
-   if (!irqchip_in_kernel(kvm))
+   if (!lapic_in_kernel(kvm))
return -EINVAL;
/* no irq assignment to deassign */
	if (!assigned_dev->irq_requested_type)
@@ -568,7 +568,7 @@ static int kvm_vm_ioctl_assign_irq(struct kvm *kvm,
struct kvm_assigned_dev_kernel *match;
unsigned long host_irq_type, guest_irq_type;
 
-   if (!irqchip_in_kernel(kvm))
+   if (!lapic_in_kernel(kvm))
return r;
 
	mutex_lock(&kvm->lock);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index a1ec6a50..706e47a 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -57,7 +57,7 @@ static int kvm_cpu_has_extint(struct kvm_vcpu *v)
  */
 int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
 {
-	if (!irqchip_in_kernel(v->kvm))
+	if (!lapic_in_kernel(v->kvm))
		return v->arch.interrupt.pending;
 
if (kvm_cpu_has_extint(v))
@@ -75,7 +75,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
  */
 int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 {
-	if (!irqchip_in_kernel(v->kvm))
+	if (!lapic_in_kernel(v->kvm))
		return v->arch.interrupt.pending;
 
if (kvm_cpu_has_extint(v))
@@ -103,7 +103,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 {
int vector;
 
-   if 
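
[Note: for context on the VMM side, the capability is enabled through the
existing KVM_ENABLE_CAP vm ioctl before any VCPU is created. A hedged sketch
follows; KVM_CAP_SPLIT_IRQCHIP comes from this series' uapi patch, so the
fallback constant below is purely illustrative:]

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#ifndef KVM_CAP_SPLIT_IRQCHIP
#define KVM_CAP_SPLIT_IRQCHIP 113	/* illustrative value, not from the series */
#endif

/* Returns 0 on success; the kernel fails this with -EEXIST if an in-kernel
 * irqchip already exists and with -EINVAL once VCPUs have been created.
 */
static int enable_split_irqchip(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_SPLIT_IRQCHIP;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}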

[PATCH v3 3/4] KVM: x86: Add EOI exit bitmap inference

2015-06-02 Thread Steve Rutherford
In order to support a userspace IOAPIC interacting with an in kernel
APIC, the EOI exit bitmaps need to be configurable.

If the IOAPIC is in userspace (i.e. the irqchip has been split), the
EOI exit bitmaps will be set whenever the GSI routes are configured.
In particular, the low MSI routes are reservable for userspace
IOAPICs. For these MSI routes, the EOI exit bit corresponding to the
destination vector of the route will be set for the destination VCPU.

The intention is for the userspace IOAPICs to use the reservable MSI
routes to inject interrupts into the guest.

This is a slight abuse of the notion of an MSI Route, given that MSIs
classically bypass the IOAPIC. It might be worthwhile to add an
additional route type to improve clarity.

Compile tested for Intel x86.

Signed-off-by: Steve Rutherford srutherf...@google.com
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/ioapic.c   | 16 ++++++++++++++++
 arch/x86/kvm/ioapic.h   |  2 ++
 arch/x86/kvm/lapic.c|  3 +--
 arch/x86/kvm/x86.c  | 30 ++
 include/linux/kvm_host.h|  9 +
 virt/kvm/irqchip.c  | 37 +
 7 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2778d36..4f439ff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -644,6 +644,7 @@ struct kvm_arch {
u64 disabled_quirks;
 
bool irqchip_split;
+   u8 nr_reserved_ioapic_pins;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 856f791..fb5281b 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -672,3 +672,19 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
	spin_unlock(&ioapic->lock);
	return 0;
 }
+
+void kvm_arch_irq_routing_update(struct kvm *kvm)
+{
+	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+	if (ioapic)
+		return;
+	if (!lapic_in_kernel(kvm))
+		return;
+	kvm_make_scan_ioapic_request(kvm);
+}
+
+u8 kvm_arch_nr_userspace_ioapic_pins(struct kvm *kvm)
+{
+	return kvm->arch.nr_reserved_ioapic_pins;
+}
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index ca0b0b4..3af349c 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -9,6 +9,7 @@ struct kvm;
 struct kvm_vcpu;
 
 #define IOAPIC_NUM_PINS  KVM_IOAPIC_NUM_PINS
+#define MAX_NR_RESERVED_IOAPIC_PINS 48
 #define IOAPIC_VERSION_ID 0x11 /* IOAPIC version */
 #define IOAPIC_EDGE_TRIG  0
 #define IOAPIC_LEVEL_TRIG 1
@@ -123,4 +124,5 @@ int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
u32 *tmr);
 
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 28eb946..766d297 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -209,8 +209,7 @@ out:
if (old)
kfree_rcu(old, rcu);
 
-   if (!irqchip_split(kvm))
-   kvm_vcpu_request_scan_ioapic(kvm);
+   kvm_make_scan_ioapic_request(kvm);
 }
 
 static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5e01810..35d13d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3930,15 +3930,20 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
	case KVM_CAP_SPLIT_IRQCHIP: {
		mutex_lock(&kvm->lock);
		r = -EEXIST;
-		if (lapic_in_kernel(kvm))
+		if (irqchip_in_kernel(kvm))
			goto split_irqchip_unlock;
		r = -EINVAL;
-		if (atomic_read(&kvm->online_vcpus))
-			goto split_irqchip_unlock;
-		r = kvm_setup_empty_irq_routing(kvm);
-		if (r)
+		if (cap->args[0] > MAX_NR_RESERVED_IOAPIC_PINS)
			goto split_irqchip_unlock;
-		kvm->arch.irqchip_split = true;
+		if (!irqchip_split(kvm)) {
+			if (atomic_read(&kvm->online_vcpus))
+				goto split_irqchip_unlock;
+			r = kvm_setup_empty_irq_routing(kvm);
+			if (r)
+				goto split_irqchip_unlock;
+			kvm->arch.irqchip_split = true;
+		}
+		kvm->arch.nr_reserved_ioapic_pins = cap->args[0];
		r = 0;
 split_irqchip_unlock:
		mutex_unlock(&kvm->lock);
@@ -6403,8 +6408,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
goto out;
}
}
-   if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
-   

[PATCH v3 2/4] KVM: x86: Add KVM exit for IOAPIC EOIs

2015-06-02 Thread Steve Rutherford
Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI
level-triggered IOAPIC interrupts.

Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
to be informed (which is identical to the EOI_EXIT_BITMAP field used
by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
exits on older processors).

[Note: A prototype using ResampleFDs found that decoupling the EOI
from the VCPU's thread made it possible for the VCPU to not see a
recent EOI after reentering the guest. This does not match real
hardware.]

Compile tested for Intel x86.
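
[Note: the per-VCPU bitmap above is four u64 words, one bit for each of the
256 possible vectors, so vector v lives in word v/64 at bit v%64; this is the
same arithmetic test_bit()/set_bit() perform in the patch. A standalone
sketch of the layout, illustrative and not taken from the patch:]

#include <stdbool.h>
#include <stdint.h>

/* one bit per interrupt vector, 0..255 */
struct eoi_bitmap {
	uint64_t words[4];
};

static bool eoi_exit_required(const struct eoi_bitmap *b, uint8_t vector)
{
	return b->words[vector / 64] & (1ULL << (vector % 64));
}

static void set_eoi_exit(struct eoi_bitmap *b, uint8_t vector)
{
	b->words[vector / 64] |= 1ULL << (vector % 64);
}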

Signed-off-by: Steve Rutherford srutherf...@google.com
---
 Documentation/virtual/kvm/api.txt | 10 ++
 arch/x86/include/asm/kvm_host.h   |  3 +++
 arch/x86/kvm/lapic.c  |  9 +
 arch/x86/kvm/x86.c| 11 +++
 include/linux/kvm_host.h  |  1 +
 include/uapi/linux/kvm.h  |  5 +
 6 files changed, 39 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9a43d42..6ab2a3f7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3271,6 +3271,16 @@ Valid values for 'type' are:
 */
__u64 kvm_valid_regs;
__u64 kvm_dirty_regs;
+
+	/* KVM_EXIT_IOAPIC_EOI */
+	struct {
+		__u8 vector;
+	} eoi;
+
+Indicates that an eoi of a level triggered IOAPIC interrupt on vector has
+occurred, which should be handled by the userspace IOAPIC. Triggers when
+the Irqchip has been split between userspace and the kernel.
+
union {
struct kvm_sync_regs regs;
char padding[1024];
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af3225a..2778d36 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -540,6 +540,9 @@ struct kvm_vcpu_arch {
struct {
bool pv_unhalted;
} pv;
+
+   u64 eoi_exit_bitmaps[4];
+   int pending_ioapic_eoi;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 92f4c98..28eb946 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -869,6 +869,15 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 
static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 {
+	if (irqchip_split(apic->vcpu->kvm)) {
+		if (test_bit(vector,
+			     (void *) apic->vcpu->arch.eoi_exit_bitmaps)) {
+			apic->vcpu->arch.pending_ioapic_eoi = vector;
+			kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
+		}
+		return;
+	}
+
	if (kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
		int trigger_mode;
		if (apic_test_vector(vector, apic->regs + APIC_TMR))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19c8980..5e01810 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6392,6 +6392,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
			kvm_handle_pmu_event(vcpu);
		if (kvm_check_request(KVM_REQ_PMI, vcpu))
			kvm_deliver_pmi(vcpu);
+		if (kvm_check_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu)) {
+			BUG_ON(vcpu->arch.pending_ioapic_eoi > 255);
+			if (test_bit(vcpu->arch.pending_ioapic_eoi,
+				     (void *) vcpu->arch.eoi_exit_bitmaps)) {
+				vcpu->run->exit_reason = KVM_EXIT_IOAPIC_EOI;
+				vcpu->run->eoi.vector =
+						vcpu->arch.pending_ioapic_eoi;
+				r = 0;
+				goto out;
+			}
+		}
		if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
			vcpu_scan_ioapic(vcpu);
if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e2b41a..c6df36f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -134,6 +134,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_ENABLE_IBS23
 #define KVM_REQ_DISABLE_IBS   24
 #define KVM_REQ_APIC_PAGE_RELOAD  25
+#define KVM_REQ_IOAPIC_EOI_EXIT   26
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID   1
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1e6f6c3..826a08d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -183,6 +183,7 @@ struct kvm_s390_skeys {
 #define KVM_EXIT_EPR  23
 #define KVM_EXIT_SYSTEM_EVENT 24
 #define KVM_EXIT_S390_STSI25
+#define KVM_EXIT_IOAPIC_EOI   26
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -329,6 +330,10 @@ 

[PATCH v3 4/4] KVM: x86: Add support for local interrupt requests from userspace

2015-06-02 Thread Steve Rutherford
In order to enable userspace PIC support, the userspace PIC needs to
be able to inject local interrupt requests.

This adds the ioctl KVM_REQUEST_PIC_INJECTION and kvm exit
KVM_EXIT_GET_EXTINT.

The vm ioctl KVM_REQUEST_PIC_INJECTION makes a KVM_REQ_EVENT request
on the BSP, which causes the BSP to exit to userspace to fetch the
vector of the underlying external interrupt, which the BSP then
injects into the guest. This matches the PIC spec, and is necessary to
boot Windows.

Compiles for x86.

Update: Boots Windows and passes the KVM Unit Tests.
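
[Note: a hypothetical sketch of the injection-request side described above,
as a userspace PIC model might use it. KVM_REQUEST_PIC_INJECTION exists only
in this unmerged series, so the ioctl definition below is a placeholder
rather than a real uapi number:]

#include <sys/ioctl.h>
#include <linux/ioctl.h>

#ifndef KVM_REQUEST_PIC_INJECTION
#define KVM_REQUEST_PIC_INJECTION _IO(0xAE, 0xa5)	/* placeholder number */
#endif

/* Called when the PIC model's INT output goes high; the kernel then makes a
 * KVM_REQ_EVENT request on the BSP, which exits to userspace to fetch the
 * vector and injects it into the guest.
 */
static int pic_raise_intr(int vm_fd)
{
	return ioctl(vm_fd, KVM_REQUEST_PIC_INJECTION, 0);
}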

Signed-off-by: Steve Rutherford srutherf...@google.com
---
 Documentation/virtual/kvm/api.txt |  9 ++
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/kvm/irq.c| 22 +--
 arch/x86/kvm/lapic.c  |  7 +
 arch/x86/kvm/lapic.h  |  2 ++
 arch/x86/kvm/x86.c| 59 +--
 include/uapi/linux/kvm.h  |  7 +
 7 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6ab2a3f7..b5d90cb 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2979,6 +2979,15 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
 and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
 which is the maximum number of possibly pending cpu-local interrupts.
 
+4.96 KVM_REQUEST_PIC_INJECTION
+
+Capability: KVM_CAP_SPLIT_IRQCHIP
+Type: VM ioctl
+Parameters: none
+Returns: 0 on success, -1 on error.
+
+Informs the kernel that userspace has a pending external interrupt.
+
 
 5. The kvm_run structure
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f439ff..0e8b0fc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -543,6 +543,8 @@ struct kvm_vcpu_arch {
 
u64 eoi_exit_bitmaps[4];
int pending_ioapic_eoi;
+   bool userspace_extint_available;
+   int pending_external_vector;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 706e47a..1270b2a 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -38,12 +38,25 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL(kvm_cpu_has_pending_timer);
 
 /*
+ * check if there is a pending userspace external interrupt
+ */
+static int pending_userspace_extint(struct kvm_vcpu *v)
+{
+	return v->arch.userspace_extint_available ||
+	       v->arch.pending_external_vector != -1;
+}
+
+/*
  * check if there is pending interrupt from
  * non-APIC source without intack.
  */
static int kvm_cpu_has_extint(struct kvm_vcpu *v)
 {
-	if (kvm_apic_accept_pic_intr(v))
+	u8 accept = kvm_apic_accept_pic_intr(v);
+
+	if (accept && irqchip_split(v->kvm))
+		return pending_userspace_extint(v);
+	else if (accept)
		return pic_irqchip(v->kvm)->output; /* PIC */
	else
		return 0;
@@ -91,7 +104,12 @@ EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
  */
static int kvm_cpu_get_extint(struct kvm_vcpu *v)
 {
-	if (kvm_cpu_has_extint(v))
+	if (irqchip_split(v->kvm) && kvm_cpu_has_extint(v)) {
+		int vector = v->arch.pending_external_vector;
+
+		v->arch.pending_external_vector = -1;
+		return vector;
+	} else if (kvm_cpu_has_extint(v))
		return kvm_pic_read_irq(v->kvm); /* PIC */
	return -1;
 }
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 766d297..012b56ee 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2094,3 +2094,10 @@ void kvm_lapic_init(void)
jump_label_rate_limit(apic_hw_disabled, HZ);
jump_label_rate_limit(apic_sw_disabled, HZ);
 }
+
+void kvm_request_pic_injection(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.userspace_extint_available = true;
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+   kvm_vcpu_kick(vcpu);
+}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 71b150c..7831e4d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -63,6 +63,8 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
unsigned long *dest_map);
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
 
+void kvm_request_pic_injection(struct kvm_vcpu *vcpu);
+
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 35d13d4..40e7509 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -65,6 +65,8 @@
 #include <asm/pvclock.h>
 #include <asm/div64.h>
 
+#define GET_VECTOR_FROM_USERSPACE 1
+
 #define MAX_IO_MSRS 256
 #define KVM_MAX_MCE_BANKS 32
 #define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P)
@@ -4217,6 +4219,30 @@ long 

Re: [PATCH 05/15] KVM: MTRR: clean up mtrr default type

2015-06-02 Thread Xiao Guangrong


Thanks for your review, Paolo!


On 06/01/2015 05:11 PM, Paolo Bonzini wrote:


  struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 562341b..6de49dd 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -105,7 +105,6 @@ EXPORT_SYMBOL_GPL(kvm_mtrr_valid);
  static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
  {
	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
-	unsigned char mtrr_enabled = mtrr_state->enabled;
gfn_t start, end, mask;
int index;
bool is_fixed = true;
@@ -114,7 +113,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
return;

-	if (!(mtrr_enabled & 0x2) && msr != MSR_MTRRdefType)
+	if (!mtrr_state->mtrr_enabled && msr != MSR_MTRRdefType)


I know Linus doesn't like bitfields too much.  Can you change these to
inline functions, and only leave a u64 deftype in the struct?



Sure. I will introduce mtrr_is_enabled() and fixed_mtrr_is_enabled() instead
of these bitfields.
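
[Note: a sketch of what the proposed accessors could look like if the
bitfields are replaced by a raw u64 deftype; in IA32_MTRR_DEF_TYPE, bit 11 is
the MTRR enable bit (E) and bit 10 the fixed-range enable bit (FE).
Illustrative only, not the actual v2 code:]

#include <stdbool.h>
#include <stdint.h>

struct kvm_mtrr {
	uint64_t deftype;	/* raw IA32_MTRR_DEF_TYPE value */
};

static inline bool mtrr_is_enabled(const struct kvm_mtrr *m)
{
	return m->deftype & (1ULL << 11);	/* E: MTRR enable */
}

static inline bool fixed_mtrr_is_enabled(const struct kvm_mtrr *m)
{
	return m->deftype & (1ULL << 10);	/* FE: fixed-range enable */
}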



Re: [PATCH 07/15] KVM: MTRR: improve kvm_mtrr_get_guest_memory_type

2015-06-02 Thread Xiao Guangrong



On 06/01/2015 05:16 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

  - kvm_mtrr_get_guest_memory_type() only checks one page in MTRRs so that
it's unnecessary to check to see if the range is partially covered in
MTRR

  - optimize the check of overlap memory type and add some comments to explain
the precedence

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
  arch/x86/kvm/mtrr.c | 89 ++---
  1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index bc9c6da..d3c06d2 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -231,24 +231,16 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
return 0;
  }

-/*
- * The function is based on mtrr_type_lookup() in
- * arch/x86/kernel/cpu/mtrr/generic.c
- */
-static int get_mtrr_type(struct kvm_mtrr *mtrr_state,
-u64 start, u64 end)
+u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
  {
-   u64 base, mask;
-   u8 prev_match, curr_match;
-   int i, num_var_ranges = KVM_NR_VAR_MTRR;
+	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
+   u64 base, mask, start = gfn_to_gpa(gfn);
+   int i, num_var_ranges = KVM_NR_VAR_MTRR, type_mask, type = -1;


Do not mix initialized and uninitialized variables on the same line
(preexisting, I know, but let's fix it instead of making it worse :)).
Please put each initialized variable on a separate line.


Okay. Will follow this style in the future development.



Also please initialize type_mask here (more on this below).



/* MTRR is completely disabled, use UC for all of physical memory. */
	if (!mtrr_state->mtrr_enabled)
return MTRR_TYPE_UNCACHABLE;

-   /* Make end inclusive end, instead of exclusive */
-   end--;
-
/* Look in fixed ranges. Just return the type as per start */
	if (mtrr_state->fixed_mtrr_enabled && (start < 0x100000)) {
int idx;
@@ -273,9 +265,9 @@ static int get_mtrr_type(struct kvm_mtrr *mtrr_state,
 * Look of multiple ranges matching this address and pick type
 * as per MTRR precedence
 */
-   prev_match = 0xFF;
+	type_mask = (1 << MTRR_TYPE_WRBACK) | (1 << MTRR_TYPE_WRTHROUGH);
	for (i = 0; i < num_var_ranges; ++i) {
-		unsigned short start_state, end_state;
+		int curr_type;

		if (!(mtrr_state->var_ranges[i].mask & (1 << 11)))
			continue;
@@ -283,50 +275,57 @@ static int get_mtrr_type(struct kvm_mtrr *mtrr_state,
		base = mtrr_state->var_ranges[i].base & PAGE_MASK;
		mask = mtrr_state->var_ranges[i].mask & PAGE_MASK;

-		start_state = ((start & mask) == (base & mask));
-		end_state = ((end & mask) == (base & mask));
-		if (start_state != end_state)
-			return 0xFE;
-
		if ((start & mask) != (base & mask))
			continue;

-		curr_match = mtrr_state->var_ranges[i].base & 0xff;
-   if (prev_match == 0xFF) {
-   prev_match = curr_match;
+   /*
+* Please refer to Intel SDM Volume 3: 11.11.4.1 MTRR
+* Precedences.
+*/
+
+		curr_type = mtrr_state->var_ranges[i].base & 0xff;
+   if (type == -1) {
+   type = curr_type;
continue;
}

-   if (prev_match == MTRR_TYPE_UNCACHABLE ||
-   curr_match == MTRR_TYPE_UNCACHABLE)
+   /*
+* If two or more variable memory ranges match and the
+* memory types are identical, then that memory type is
+* used.
+*/
+   if (type == curr_type)
+   continue;
+
+   /*
+* If two or more variable memory ranges match and one of
+* the memory types is UC, the UC memory type used.
+*/
+   if (curr_type == MTRR_TYPE_UNCACHABLE)
return MTRR_TYPE_UNCACHABLE;

-		if ((prev_match == MTRR_TYPE_WRBACK &&
-		     curr_match == MTRR_TYPE_WRTHROUGH) ||
-		    (prev_match == MTRR_TYPE_WRTHROUGH &&
-		     curr_match == MTRR_TYPE_WRBACK)) {
-   prev_match = MTRR_TYPE_WRTHROUGH;
-   curr_match = MTRR_TYPE_WRTHROUGH;
+   /*
+* If two or more variable memory ranges match and the
+* memory types are WT and WB, the WT memory type is used.
+*/
+		if (((1 << type) & type_mask) &&
+		    ((1 << curr_type) & type_mask)) {


Please inline definition of type_mask in the if, or rename it to
wt_wb_mask and make it const.  
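
[Note: the precedence rules spelled out in the comments above condense into
one small combine step; a standalone, illustrative sketch, where the type
values are the architectural MTRR memory types and -1 means no match yet:]

#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRTHROUGH  4
#define MTRR_TYPE_WRBACK     6

/* Fold the type of one more matching variable range into the running type. */
static int combine_mtrr_types(int type, int curr_type)
{
	if (type == -1 || type == curr_type)
		return curr_type;	/* first match, or identical types */

	/* UC has the highest precedence */
	if (type == MTRR_TYPE_UNCACHABLE || curr_type == MTRR_TYPE_UNCACHABLE)
		return MTRR_TYPE_UNCACHABLE;

	/* overlapping WT and WB resolve to WT */
	if ((type == MTRR_TYPE_WRBACK && curr_type == MTRR_TYPE_WRTHROUGH) ||
	    (type == MTRR_TYPE_WRTHROUGH && curr_type == MTRR_TYPE_WRBACK))
		return MTRR_TYPE_WRTHROUGH;

	return -1;	/* other overlaps are undefined per the SDM */
}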

Re: [PATCH 10/15] KVM: MTRR: sort variable MTRRs

2015-06-02 Thread Xiao Guangrong



On 06/01/2015 05:27 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

Sort all valid variable MTRRs based on their base address; it will help us to
check a range to see if it's fully contained in variable MTRRs

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
  arch/x86/include/asm/kvm_host.h |  3 +++
  arch/x86/kvm/mtrr.c | 39 +++
  arch/x86/kvm/x86.c  |  2 +-
  arch/x86/kvm/x86.h  |  1 +
  4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f3fc152..5be8f2e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -347,6 +347,7 @@ enum {
  struct kvm_mtrr_range {
u64 base;
u64 mask;
+   struct list_head node;
  };

  struct kvm_mtrr {
@@ -362,6 +363,8 @@ struct kvm_mtrr {
unsigned mtrr_enabled:1;
};
};
+
+   struct list_head head;
  };

  struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index aeb9767..8e3b81a 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -262,6 +262,38 @@ do_zap:
	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
  }

+static bool var_mtrr_range_is_valid(struct kvm_mtrr_range *range)
+{
+   u64 start, end;
+
+	if (!(range->mask & (1 << 11)))
+		return false;
+
+	var_mtrr_range(range, &start, &end);
+	return end > start;
+}
+
+static void set_var_mtrr_start(struct kvm_mtrr *mtrr_state, int index)
+{
+	/* remove the entry if it's in the list. */
+	if (var_mtrr_range_is_valid(&mtrr_state->var_ranges[index]))
+		list_del(&mtrr_state->var_ranges[index].node);
+}
+
+static void set_var_mtrr_end(struct kvm_mtrr *mtrr_state, int index)
+{
+	struct kvm_mtrr_range *tmp, *cur = &mtrr_state->var_ranges[index];
+
+	/* add it to the list if it's valid. */
+	if (var_mtrr_range_is_valid(&mtrr_state->var_ranges[index])) {
+		list_for_each_entry(tmp, &mtrr_state->head, node)
+			if (cur->base < tmp->base)
+				return list_add_tail(&cur->node, &tmp->node);
+
+		list_add_tail(&cur->node, &mtrr_state->head);
+	}
+}
+
  int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
  {
int index;
@@ -281,10 +313,12 @@ int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data)

	index = (msr - 0x200) / 2;
	is_mtrr_mask = msr - 0x200 - 2 * index;
+	set_var_mtrr_start(&vcpu->arch.mtrr_state, index);
	if (!is_mtrr_mask)
		vcpu->arch.mtrr_state.var_ranges[index].base = data;
	else
		vcpu->arch.mtrr_state.var_ranges[index].mask = data;
+	set_var_mtrr_end(&vcpu->arch.mtrr_state, index);


Just move the whole code to a new kvm_set_var_mtrr function.


Okay.




}

update_mtrr(vcpu, msr);
@@ -331,6 +365,11 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
return 0;
  }

+void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu)
+{
+	INIT_LIST_HEAD(&vcpu->arch.mtrr_state.head);
+}
+
  u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
  {
	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64c2891..8084ac3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6991,7 +6991,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
kvm_vcpu_reset(vcpu, false);
kvm_mmu_setup(vcpu);
vcpu_put(vcpu);
-
+   kvm_vcpu_mtrr_init(vcpu);


Please move this call much earlier.


Okay, will do these in v2.
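
[Note: a sketch of the fold Paolo asks for, reusing the types and helpers
from the quoted patch; illustrative, not the actual v2 code:]

static void kvm_set_var_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
	int index = (msr - 0x200) / 2;
	int is_mtrr_mask = msr - 0x200 - 2 * index;
	struct kvm_mtrr_range *tmp, *cur = &mtrr_state->var_ranges[index];

	/* unlink the entry while its base/mask are being rewritten */
	if (var_mtrr_range_is_valid(cur))
		list_del(&cur->node);

	if (!is_mtrr_mask)
		cur->base = data;
	else
		cur->mask = data;

	/* re-insert at the position that keeps the list sorted by base */
	if (var_mtrr_range_is_valid(cur)) {
		list_for_each_entry(tmp, &mtrr_state->head, node)
			if (cur->base < tmp->base)
				break;
		/* after a full walk, &tmp->node is the list head itself */
		list_add_tail(&cur->node, &tmp->node);
	}
}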



Re: [PATCH 2/2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-02 Thread David Gibson
On Thu, May 28, 2015 at 03:17:20PM +1000, Paul Mackerras wrote:
 This builds on the ability to run more than one vcore on a physical
 core by using the micro-threading (split-core) modes of the POWER8
 chip.  Previously, only vcores from the same VM could be run together,
 and (on POWER8) only if they had just one thread per core.  With the
 ability to split the core on guest entry and unsplit it on guest exit,
 we can run up to 8 vcpu threads from up to 4 different VMs, and we can
 run multiple vcores with 2 or 4 vcpus per vcore.
 
 Dynamic micro-threading is only available if the static configuration
 of the cores is whole-core mode (unsplit), and only on POWER8.
 
 To manage this, we introduce a new kvm_split_mode struct which is
 shared across all of the subcores in the core, with a pointer in the
 paca on each thread.  In addition we extend the core_info struct to
 have information on each subcore.  When deciding whether to add a
 vcore to the set already on the core, we now have two possibilities:
 (a) piggyback the vcore onto an existing subcore, or (b) start a new
 subcore.
 
 Currently, when any vcpu needs to exit the guest and switch to host
 virtual mode, we interrupt all the threads in all subcores and switch
 the core back to whole-core mode.  It may be possible in future to
 allow some of the subcores to keep executing in the guest while
 subcore 0 switches to the host, but that is not implemented in this
 patch.
 
 This adds a module parameter called dynamic_mt_modes which controls
 which micro-threading (split-core) modes the code will consider, as a
 bitmap.  In other words, if it is 0, no micro-threading mode is
 considered; if it is 2, only 2-way micro-threading is considered; if
 it is 4, only 4-way, and if it is 6, both 2-way and 4-way
 micro-threading mode will be considered.  The default is 6.
 
 With this, we now have secondary threads which are the primary thread
 for their subcore and therefore need to do the MMU switch.  These
 threads will need to be started even if they have no vcpu to run, so
 we use the vcore pointer in the PACA rather than the vcpu pointer to
 trigger them.
 
 Signed-off-by: Paul Mackerras pau...@samba.org

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
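
[Note: a sketch of how the dynamic_mt_modes bitmap described above gates the
candidate split modes; the helper is illustrative, not taken from the patch:]

#include <stdbool.h>

/* module parameter as described above; the default 6 allows 2-way and 4-way */
static int dynamic_mt_modes = 6;

/* Is splitting the core into n_subcores (1, 2 or 4) allowed?  Whole-core
 * mode is always allowed; 2 and 4 must be set in the bitmap.
 */
static bool split_mode_allowed(int n_subcores)
{
	return n_subcores == 1 || (dynamic_mt_modes & n_subcores);
}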



Re: [PATCH 08/15] KVM: MTRR: introduce fixed_mtrr_segment table

2015-06-02 Thread Xiao Guangrong



On 06/01/2015 05:25 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

This table summarizes the information of fixed MTRRs and introduce some APIs
to abstract its operation which helps us to clean up the code and will be
used in later patches

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
  arch/x86/kvm/mtrr.c | 191 ++--
  1 file changed, 142 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index d3c06d2..888441e 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -102,12 +102,126 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
  }
  EXPORT_SYMBOL_GPL(kvm_mtrr_valid);

+struct fixed_mtrr_segment {
+	u64 start;
+	u64 end;
+
+	/*
+	 * unit corresponds to the MSR entry in this segment, the size
+	 * of unit is covered in one MSR.
+	 */
+	u64 unit_size;
+
+	/* a range is covered in one memory cache type. */
+	u64 range_size;
+
+	/* the start position in kvm_mtrr.fixed_ranges[]. */
+	int range_start;
+};
+
+static struct fixed_mtrr_segment fixed_seg_table[] = {
+	/* MSR_MTRRfix64K_00000, 1 unit. 64K fixed mtrr. */
+	{
+		.start = 0x0,
+		.end = 0x80000,
+		.unit_size = 0x80000,


Unit size is always range size * 8.


Good catch, will drop this.




+		.range_size = 1ULL << 16,


Perhaps 64 * 1024 (and so on below) is clearer, because it matches the
name of the MSR?


Yes, it's better.




+		.range_start = 0,
+	},
+
+	/*
+	 * MSR_MTRRfix16K_80000 ... MSR_MTRRfix16K_A0000, 2 units,
+	 * 16K fixed mtrr.
+	 */
+	{
+		.start = 0x80000,
+		.end = 0xc0000,
+		.unit_size = (0xc0000 - 0x80000) / 2,
+		.range_size = 1ULL << 14,
+		.range_start = 8,
+	},
+
+	/*
+	 * MSR_MTRRfix4K_C0000 ... MSR_MTRRfix4K_F8000, 8 units,
+	 * 4K fixed mtrr.
+	 */
+	{
+		.start = 0xc0000,
+		.end = 0x100000,
+		.unit_size = (0x100000 - 0xc0000) / 8,
+		.range_size = 1ULL << 12,
+		.range_start = 24,
+	}
+};
+
+static int fixed_msr_to_range_index(u32 msr)
+{
+   int seg, unit;
+
+	if (!fixed_msr_to_seg_unit(msr, &seg, &unit))
+   return -1;
+
+   fixed_mtrr_seg_unit_range_index(seg, unit);


This looks wrong.


Oops, should be return fixed_mtrr_seg_unit_range_index(seg, unit); :(




+   return 0;
+}
+
  static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
  {
	struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
gfn_t start, end, mask;
int index;
-   bool is_fixed = true;

if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
	    !kvm_arch_has_noncoherent_dma(vcpu->kvm))
@@ -116,32 +230,19 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
	if (!mtrr_state->mtrr_enabled && msr != MSR_MTRRdefType)
return;

+	if (fixed_msr_to_range(msr, &start, &end)) {
+		if (!mtrr_state->fixed_mtrr_enabled)
+			return;
+		goto do_zap;
+	}
+
switch (msr) {


Please move defType handling in an "if" above "if
(fixed_msr_to_range(msr, &start, &end))".  Then you don't need the goto.



Yes, indeed. Will do it in the next version.
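
[Note: a standalone sketch of the segment table above: given a guest physical
address below 1M, pick the segment whose span contains it and index into
fixed_ranges[]. The constants mirror the quoted patch; 8 x 64K + 16 x 16K +
64 x 4K gives the 88 fixed ranges. Illustrative only:]

#include <stdint.h>

struct fixed_seg {
	uint64_t start, end;	/* address span covered by the segment */
	uint64_t range_size;	/* bytes covered by one fixed range */
	int range_start;	/* first index into fixed_ranges[] */
};

static const struct fixed_seg segs[] = {
	{ 0x00000, 0x80000,  64 * 1024,  0 },	/* MTRRfix64K_00000 */
	{ 0x80000, 0xc0000,  16 * 1024,  8 },	/* MTRRfix16K_80000/A0000 */
	{ 0xc0000, 0x100000,  4 * 1024, 24 },	/* MTRRfix4K_C0000..F8000 */
};

static int fixed_addr_to_range_index(uint64_t addr)
{
	unsigned int i;

	for (i = 0; i < 3; i++)
		if (addr < segs[i].end)
			return segs[i].range_start +
			       (int)((addr - segs[i].start) / segs[i].range_size);
	return -1;	/* above 1M: not covered by fixed MTRRs */
}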


Re: [PATCH 12/15] KVM: MTRR: introduce mtrr_for_each_mem_type

2015-06-02 Thread Xiao Guangrong



On 06/01/2015 05:33 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

+struct mtrr_looker {
+   /* input fields. */
+   struct kvm_mtrr *mtrr_state;
+   u64 start;
+   u64 end;


s/looker/iter/ or s/looker/lookup/


Good to me.




+static void mtrr_lookup_start(struct mtrr_looker *looker)
+{
+	looker->mem_type = -1;
+
+	if (!looker->mtrr_state->mtrr_enabled) {
+		looker->partial_map = true;
+		return;
+	}
+
+	if (!mtrr_lookup_fixed_start(looker))
+		mtrr_lookup_var_start(looker);
+}
+


Separating mtrr_lookup_start and mtrr_lookup_init is weird.


Agreed, will fold mtrr_lookup_start() into mtrr_lookup_init().



There are common parts of mtrr_lookup_*_start and mtrr_lookup_*_next.
For example this:


+	looker->mem_type = looker->mtrr_state->fixed_ranges[index];
+	looker->start = fixed_mtrr_range_end_addr(seg, index);
+	return true;


in mtrr_lookup_fixed_start is the same as this:



+	end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+
+	/* switch to next segment. */
+	if (end >= eseg->end) {
+		looker->seg++;
+		looker->index = 0;
+
+		/* have looked up for all fixed MTRRs. */
+		if (looker->seg >= ARRAY_SIZE(fixed_seg_table))
+			return mtrr_lookup_var_start(looker);
+
+		end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+	}
+
+	looker->mem_type = mtrr_state->fixed_ranges[looker->index];
+	looker->start = end;


in mtrr_lookup_fixed_next.  Can you try to make them more common?


Yes, i will check them carefully and introduce common functions to do
it.



Basically you should have

+#define mtrr_for_each_mem_type(_looker_, _mtrr_, _gpa_start_, _gpa_end_) \
+	for (mtrr_lookup_init(_looker_, _mtrr_, _gpa_start_, _gpa_end_); \
+	     !mtrr_lookup_end(_looker_); mtrr_lookup_next(_looker_))


Good idea, will do it in v2.
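
[Note: the init/end/next iterator-macro idiom agreed on above, reduced to a
trivial integer range so the control flow is visible; illustrative C,
unrelated to the actual MTRR state:]

#include <stdio.h>

struct iter {
	int cur, end;
};

static void iter_init(struct iter *it, int start, int end)
{
	it->cur = start;
	it->end = end;
}

static int iter_end(struct iter *it)   { return it->cur >= it->end; }
static void iter_next(struct iter *it) { it->cur++; }

#define for_each_value(_it_, _s_, _e_) \
	for (iter_init(_it_, _s_, _e_); !iter_end(_it_); iter_next(_it_))

int main(void)
{
	struct iter it;

	for_each_value(&it, 0, 4)
		printf("%d\n", it.cur);	/* prints 0 1 2 3 */
	return 0;
}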



Re: [PATCH 12/15] KVM: MTRR: introduce mtrr_for_each_mem_type

2015-06-02 Thread Xiao Guangrong



On 06/01/2015 10:26 PM, Paolo Bonzini wrote:



On 01/06/2015 11:33, Paolo Bonzini wrote:

+	looker->mem_type = looker->mtrr_state->fixed_ranges[index];

+	looker->start = fixed_mtrr_range_end_addr(seg, index);
+	return true;

in mtrr_lookup_fixed_start is the same as this:



+	end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+
+	/* switch to next segment. */
+	if (end >= eseg->end) {
+		looker->seg++;
+		looker->index = 0;
+
+		/* have looked up for all fixed MTRRs. */
+		if (looker->seg >= ARRAY_SIZE(fixed_seg_table))
+			return mtrr_lookup_var_start(looker);
+
+		end = fixed_mtrr_range_end_addr(looker->seg, looker->index);
+	}
+
+	looker->mem_type = mtrr_state->fixed_ranges[looker->index];
+	looker->start = end;

in mtrr_lookup_fixed_next.  Can you try to make them more common?

Basically you should have

+#define mtrr_for_each_mem_type(_looker_, _mtrr_, _gpa_start_, _gpa_end_) \
+	for (mtrr_lookup_init(_looker_, _mtrr_, _gpa_start_, _gpa_end_); \
+	     !mtrr_lookup_end(_looker_); mtrr_lookup_next(_looker_))


... where the above code I quoted would be part of mtrr_lookup_end.


Okay, will do. :)


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-02 Thread Xiao Guangrong



On 06/01/2015 05:36 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

Currently guest MTRR is completely prohibited if cache snooping is supported on
the IOMMU (!noncoherent_dma) and the host does the emulation based on knowledge
from the host side; however, the host side is not a good place to know
what the purpose of the guest is. A good example is that a passed-through VGA
frame buffer is not always UC as the host expects


Can you explain how?  The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.


Yes, frame-buffer is always UC in current code, however, UC for frame-buffer
causes bad performance. It's quoted from Documentation/x86/mtrr.txt:

| This is most useful when you have a video (VGA) card on a PCI or AGP bus.
| Enabling write-combining allows bus write transfers to be combined into a
| larger transfer before bursting over the PCI/AGP bus. This can increase
| performance of image write operations 2.5 times or more.

So the guest will configure the range in its MTRRs, and this patchset follows
the guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to emulate
the guest cache type as the guest expects.


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-02 Thread Christoffer Dall
[replying to myself]

On Tue, Jun 02, 2015 at 11:27:59AM +0200, Christoffer Dall wrote:

[..]

   
   If this patch is incorrect, then how does it work on x86, where
   handle_external_intr() is called (with a barrier in between) before
   kvm_guest_exit(), and where handle_external_intr() is simply
   local_irq_enable() on SVM and something more complicated on VMX ?
   
   Finally, how exactly is preemption latency added here?   Won't IRQ
   processing run with higher priority than any task on your system, so the
   order of (1) process pending IRQs (2) call schedule if needed is still
   preserved here, but we call kvm_guest_exit() between (1) and (2) instead
   of before (1).
  
  I may be missing something, but on return from interrupt with preempt
  disabled we can't take the need resched path. And need to return
  to KVM no?
 
 preempt_enable() will call __preempt_schedule() and cause preemption
 there, so you're talking about adding these lines of latency:
 
   kvm_guest_exit();
   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 
 And these were called with interrupts disabled before, so I don't see
 the issue??
 
 However, your question is making me think whether we have a race in the
 current code on fully preemptible kernels, if we get preempted before
 calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
 could potentially schedule another vcpu on this core and lose/corrupt
 state, can we not?  We probably need to check for this in
 kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
 real issue or if I'm seeing ghosts.
 
I've thought about it and I don't think there's a race because those
functions don't access the hardware directly, but only manipulate
per-vcpu data structures.

-Christoffer


Re: [PATCH 1/2] KVM: PPC: Book3S HV: Make use of unused threads when running guests

2015-06-02 Thread David Gibson
On Thu, May 28, 2015 at 03:17:19PM +1000, Paul Mackerras wrote:
 When running a virtual core of a guest that is configured with fewer
 threads per core than the physical cores have, the extra physical
 threads are currently unused.  This makes it possible to use them to
 run one or more other virtual cores from the same guest when certain
 conditions are met.  This applies on POWER7, and on POWER8 to guests
 with one thread per virtual core.  (It doesn't apply to POWER8 guests
 with multiple threads per vcore because they require a 1-1 virtual to
 physical thread mapping in order to be able to use msgsndp and the
 TIR.)
 
 The idea is that we maintain a list of preempted vcores for each
 physical cpu (i.e. each core, since the host runs single-threaded).
 Then, when a vcore is about to run, it checks to see if there are
 any vcores on the list for its physical cpu that could be
 piggybacked onto this vcore's execution.  If so, those additional
 vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
 threads are started as well as the original vcore, which is called
 the master vcore.
 
 After the vcores have exited the guest, the extra ones are put back
 onto the preempted list if any of their VCPUs are still runnable and
 not idle.
 
 This means that vcpu-arch.ptid is no longer necessarily the same as
 the physical thread that the vcpu runs on.  In order to make it easier
 for code that wants to send an IPI to know which CPU to target, we
 now store that in a new field in struct vcpu_arch, called thread_cpu.
 
 Signed-off-by: Paul Mackerras pau...@samba.org

Reviewed-by: David Gibson da...@gibson.dropbear.id.au

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

