Re: [patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire

2014-12-16 Thread Paolo Bonzini


On 15/12/2014 23:06, Marcelo Tosatti wrote:
 Add tracepoint to wait_lapic_expire.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: kvm/arch/x86/kvm/lapic.c
 ===
 --- kvm.orig/arch/x86/kvm/lapic.c
 +++ kvm/arch/x86/kvm/lapic.c
 @@ -1121,6 +1121,7 @@ void wait_lapic_expire(struct kvm_vcpu *
  {
   struct kvm_lapic *apic = vcpu->arch.apic;
   u64 guest_tsc, tsc_deadline;
 + unsigned int total_delay = 0;
  
   if (!kvm_vcpu_has_lapic(vcpu))
   return;
 @@ -1138,9 +1139,13 @@ void wait_lapic_expire(struct kvm_vcpu *
   while (guest_tsc < tsc_deadline) {
   int delay = min(tsc_deadline - guest_tsc, 1000ULL);
  
 + total_delay += delay;
 +
   __delay(delay);
   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
   }
 +
 + trace_kvm_wait_lapic_expire(vcpu->vcpu_id, total_delay);

Let's add guest_tsc - tsc_deadline to the tracepoint.  This should
simplify the tuning of the parameter.

Paolo


  }
  
  static void start_apic_timer(struct kvm_lapic *apic)
 Index: kvm/arch/x86/kvm/trace.h
 ===
 --- kvm.orig/arch/x86/kvm/trace.h
 +++ kvm/arch/x86/kvm/trace.h
 @@ -914,6 +914,25 @@ TRACE_EVENT(kvm_pvclock_update,
 __entry->flags)
  );
  
 +TRACE_EVENT(kvm_wait_lapic_expire,
 + TP_PROTO(unsigned int vcpu_id, unsigned int total_delay),
 + TP_ARGS(vcpu_id, total_delay),
 +
 + TP_STRUCT__entry(
 + __field(unsigned int,   vcpu_id )
 + __field(unsigned int,   total_delay )
 + ),
 +
 + TP_fast_assign(
 + __entry->vcpu_id   = vcpu_id;
 + __entry->total_delay   = total_delay;
 + ),
 +
 + TP_printk("vcpu %u: total_delay %u",
 +   __entry->vcpu_id,
 +   __entry->total_delay)
 +);
 +
  #endif /* _TRACE_KVM_H */
  
  #undef TRACE_INCLUDE_PATH
 
 
 --
 To unsubscribe from this list: send the line "unsubscribe kvm" in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


RE: [v3 00/26] Add VT-d Posted-Interrupts support

2014-12-16 Thread Wu, Feng
Hi Paolo,

Could you please have a look at this series? Thanks a lot!

Thanks,
Feng

 -Original Message-
 From: Wu, Feng
 Sent: Friday, December 12, 2014 11:15 PM
 To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
 g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
 j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
 Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
 io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
 Subject: [v3 00/26] Add VT-d Posted-Interrupts support
 
 VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
 With VT-d Posted-Interrupts enabled, external interrupts from
 direct-assigned devices can be delivered to guests without VMM
 intervention when guest is running in non-root mode.
 
 You can find the VT-d Posted-Interrupts Spec. in the following URL:
 http://www.intel.com/content/www/us/en/intelligent-systems/intel-technolog
 y/vt-directed-io-spec.html
 
 v1->v2:
 * Use VFIO framework to enable this feature, the VFIO part of this series is
   based on Eric's patch [PATCH v3 0/8] KVM-VFIO IRQ forward control
 * Rebase this patchset on
 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
   then revise some irq logic based on the new hierarchy irqdomain patches
 provided
   by Jiang Liu jiang@linux.intel.com
 
 v2->v3:
 * Adjust the Posted-interrupts Descriptor updating logic when vCPU is
   preempted or blocked.
 * KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
 * __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
 * Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
   can be used to change back to remapping mode.
 * Fix typo
 
 This patch series is made of the following groups:
 1-6: Some preparation changes in iommu and irq component, this is based on
 the
  new hierarchy irqdomain logic.
 7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as feature
   detection and the command line parameter.
 10-17, 22-25: Changes related to KVM itself.
 18-20: Changes in VFIO component, this part was previously sent out as
 [RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
 Posted-Interrupts
 21: x86 irq related changes
 
 Feng Wu (26):
   genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a
 VCPU
   iommu: Add new member capability to struct irq_remap_ops
   iommu, x86: Define new irte structure for VT-d Posted-Interrupts
   iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
   x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
   iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
   iommu, x86: Add cap_pi_support() to detect VT-d PI capability
   iommu, x86: Add intel_irq_remapping_capability() for Intel
   iommu, x86: define irq_remapping_cap()
   KVM: change struct pi_desc for VT-d Posted-Interrupts
   KVM: Add some helper functions for Posted-Interrupts
   KVM: Initialize VT-d Posted-Interrupts Descriptor
   KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
   KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
   KVM: add interfaces to control PI outside vmx
   KVM: Make struct kvm_irq_routing_table accessible
   KVM: make kvm_set_msi_irq() public
   KVM: kvm-vfio: User API for VT-d Posted-Interrupts
   KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
   KVM: x86: kvm-vfio: VT-d posted-interrupts setup
   x86, irq: Define a global vector for VT-d Posted-Interrupts
   KVM: Define a wakeup worker thread for vCPU
   KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
   KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
   KVM: Suppress posted-interrupt when 'SN' is set
   iommu/vt-d: Add a command line parameter for VT-d posted-interrupts
 
  Documentation/kernel-parameters.txt|   1 +
  Documentation/virtual/kvm/devices/vfio.txt |   9 ++
  arch/x86/include/asm/entry_arch.h  |   2 +
  arch/x86/include/asm/hardirq.h |   1 +
  arch/x86/include/asm/hw_irq.h  |   2 +
  arch/x86/include/asm/irq_remapping.h   |  11 ++
  arch/x86/include/asm/irq_vectors.h |   1 +
  arch/x86/include/asm/kvm_host.h|  12 ++
  arch/x86/kernel/apic/msi.c |   1 +
  arch/x86/kernel/entry_64.S |   2 +
  arch/x86/kernel/irq.c  |  27 
  arch/x86/kernel/irqinit.c  |   2 +
  arch/x86/kvm/Makefile  |   2 +-
  arch/x86/kvm/kvm_vfio_x86.c|  77 +
  arch/x86/kvm/vmx.c | 244
 -
  arch/x86/kvm/x86.c |  22 ++-
  drivers/iommu/intel_irq_remapping.c|  68 +++-
  drivers/iommu/irq_remapping.c  |  24 ++-
  drivers/iommu/irq_remapping.h  |   8 +
  include/linux/dmar.h   |  32 
  include/linux/intel-iommu.h   

Re: [PATCH] KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit

2014-12-16 Thread Paolo Bonzini


On 15/12/2014 21:56, Eugene Korenevsky wrote:
 + u32 inequality, bit;
 +
 + bit = (vmcs12->exception_bitmap & (1u << PF_VECTOR)) ? 1u : 0;
 + inequality =
 + (error_code & vmcs12->page_fault_error_code_mask) !=
 +  vmcs12->page_fault_error_code_match ? 1u : 0;

You should either remove "? 1u : 0" (which is redundant), or flip the
bit in the exception bitmap, like

inequality = ...
  ? (1u << PF_VECTOR) : 0;
return ((vmcs12->exception_bitmap ^ inequality)
 & (1u << PF_VECTOR)) != 0;

If you choose the former, please use "!= 0" in the assignment of bit
instead of the ternary operator, and make the two variables bool.  Then
you can remove the "!= 0" in the return below.

Paolo

 + return (inequality ^ bit) != 0;
 +}
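A hedged userspace sketch of the bool-based variant Paolo suggests. The struct and function names here are invented for illustration; only the three vmcs12 field names and PF_VECTOR come from the quoted snippet:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14

/* Minimal stand-in for the three vmcs12 fields used in the quoted
 * snippet; the real struct has many more members. */
struct vmcs12_pf {
    uint32_t exception_bitmap;
    uint32_t page_fault_error_code_mask;
    uint32_t page_fault_error_code_match;
};

/* The shape Paolo suggests: bool variables, no ternaries, and no
 * trailing != 0 in the return. */
static bool nested_pf_vmexit(const struct vmcs12_pf *v, uint32_t error_code)
{
    bool bit = (v->exception_bitmap & (1u << PF_VECTOR)) != 0;
    bool inequality = (error_code & v->page_fault_error_code_mask) !=
                      v->page_fault_error_code_match;
    return inequality ^ bit;
}
```

The XOR folds the two VMX rules into one expression: when the PF bit is set, #PF exits on an error-code *match*; when it is clear, #PF exits on a *mismatch*.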


Re: Why do additional cores reduce performance?

2014-12-16 Thread Paolo Bonzini


On 16/12/2014 00:40, Oleg Ovechko wrote:
 A. Host Windows, 6 cores (no HT, turbo boost off): 6:23 (+- 10 secs)
 B. Host Windows, 1 CPU core (other are turned off in BIOS): 7:13 (+-10 secs)
 C. Host 1 core, Guest Windows 1 core: 7:15 - same as B, no degradation
 D. Host 6 cores, Guest Windows 1 core: 7:57
 E. Host 6 cores, Guest Windows 4 cores: 8:17

What is your benchmark?

Windows sometimes has scalability problems due to the way it does
timing.  Try replacing "-cpu host" with "-no-hpet -cpu
host,hv_time,hv_vapic".

 3. Also I am unsure about HT. When I specify cores=2,

I suppose you mean threads=2.

 is there any
 guarantee that the whole core with both HT parts is passed to the VM? Or can
 it be a mix of two real cores with separate caches?

It will be a mix.  Do not specify HT in the guest, unless you have HT in
the host _and_ you are pinning the two threads of each guest core to the
two threads of a host core.

Paolo



[PATCH/RFC] s390/kernel: use stnsm instead of stosm

2014-12-16 Thread Christian Borntraeger
At least on z196 stnsm is faster than stosm.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/include/asm/irqflags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/irqflags.h b/arch/s390/include/asm/irqflags.h
index 37b9091..16aa0c7 100644
--- a/arch/s390/include/asm/irqflags.h
+++ b/arch/s390/include/asm/irqflags.h
@@ -36,7 +36,7 @@ static inline notrace void __arch_local_irq_ssm(unsigned long 
flags)
 
 static inline notrace unsigned long arch_local_save_flags(void)
 {
-   return __arch_local_irq_stosm(0x00);
+   return __arch_local_irq_stnsm(0xff);
 }
 
 static inline notrace unsigned long arch_local_irq_save(void)
-- 
1.9.3



Re: [PATCH/RFC] s390/kernel: use stnsm instead of stosm

2014-12-16 Thread Christian Borntraeger
Paolo,


sorry, it should have only gone to Martin and Heiko.
Nothing to worry about from your side. :-)


Am 16.12.2014 um 10:30 schrieb Christian Borntraeger:
 At least on z196 stnsm is faster than stosm.
 
 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
 ---
  arch/s390/include/asm/irqflags.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/s390/include/asm/irqflags.h 
 b/arch/s390/include/asm/irqflags.h
 index 37b9091..16aa0c7 100644
 --- a/arch/s390/include/asm/irqflags.h
 +++ b/arch/s390/include/asm/irqflags.h
 @@ -36,7 +36,7 @@ static inline notrace void __arch_local_irq_ssm(unsigned 
 long flags)
 
  static inline notrace unsigned long arch_local_save_flags(void)
  {
 - return __arch_local_irq_stosm(0x00);
 + return __arch_local_irq_stnsm(0xff);
  }
 
  static inline notrace unsigned long arch_local_irq_save(void)
 



Re: [PATCH/RFC] s390/kernel: use stnsm instead of stosm

2014-12-16 Thread Paolo Bonzini


On 16/12/2014 10:31, Christian Borntraeger wrote:
 Paolo,
 
 
 sorry, it should have only gone to Martin and Heiko.
 Nothing to worry about from your side. :-)

No problem, it's always fun to learn new s390 instructions. :)

Paolo


Re: [patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire

2014-12-16 Thread Marcelo Tosatti
On Tue, Dec 16, 2014 at 10:03:39AM +0100, Paolo Bonzini wrote:
 
 
 On 15/12/2014 23:06, Marcelo Tosatti wrote:
  Add tracepoint to wait_lapic_expire.
  
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  
  Index: kvm/arch/x86/kvm/lapic.c
  ===
  --- kvm.orig/arch/x86/kvm/lapic.c
  +++ kvm/arch/x86/kvm/lapic.c
  @@ -1121,6 +1121,7 @@ void wait_lapic_expire(struct kvm_vcpu *
   {
  struct kvm_lapic *apic = vcpu->arch.apic;
  u64 guest_tsc, tsc_deadline;
  +   unsigned int total_delay = 0;
   
  if (!kvm_vcpu_has_lapic(vcpu))
  return;
  @@ -1138,9 +1139,13 @@ void wait_lapic_expire(struct kvm_vcpu *
  while (guest_tsc < tsc_deadline) {
  int delay = min(tsc_deadline - guest_tsc, 1000ULL);
   
  +   total_delay += delay;
  +
  __delay(delay);
  guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
  }
  +
  +   trace_kvm_wait_lapic_expire(vcpu->vcpu_id, total_delay);
 
 Let's add guest_tsc - tsc_deadline to the tracepoint.  This should
 simplify the tuning of the parameter.
 
 Paolo

total_delay is very close to that, except the summands are 

1000 + 1000 + ... + remainder

Yes?

BTW it's very easy to tune the parameter with the kvm-unit-test test
(the optimal value is clear). I'll write a document.



Re: [patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire

2014-12-16 Thread Paolo Bonzini


On 16/12/2014 13:15, Marcelo Tosatti wrote:
  Let's add guest_tsc - tsc_deadline to the tracepoint.  This should
  simplify the tuning of the parameter.
 
 total_delay is very close to that, except the summands are 
 
 1000 + 1000 + ... + remainder
 
 Yes?

Almost: guest_tsc - tsc_deadline will be negative if the vmentry
overshot the original tsc_deadline.  In that case, total_delay will be zero.
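The relationship between the two quantities can be sketched as a userspace simulation (not the kernel code: `read_tsc` and `delay` below are fake stand-ins for `kvm_x86_ops->read_l1_tsc()` and `__delay()`, and the fake TSC advances by exactly the delayed amount):

```c
#include <assert.h>
#include <stdint.h>

/* Fake guest TSC: advances by exactly the amount passed to delay(). */
static uint64_t fake_tsc;
static uint64_t read_tsc(void) { return fake_tsc; }
static void delay(uint64_t cycles) { fake_tsc += cycles; }

/* Mirrors the quoted busy-wait loop and returns the sum of the
 * __delay() arguments, i.e. Marcelo's total_delay. */
static uint64_t total_delay(uint64_t start_tsc, uint64_t deadline)
{
    uint64_t total = 0, guest_tsc;

    fake_tsc = start_tsc;
    for (guest_tsc = read_tsc(); guest_tsc < deadline;
         guest_tsc = read_tsc()) {
        /* chunks of at most 1000 cycles, as in the patch */
        uint64_t d = deadline - guest_tsc;
        if (d > 1000)
            d = 1000;
        total += d;
        delay(d);
    }
    return total;
}
```

With a deadline 2500 cycles away the summands are 1000 + 1000 + 500; when vmentry has already overshot the deadline the loop never runs and total_delay stays zero, whereas the signed difference guest_tsc - tsc_deadline would be positive.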

 BTW its very easy to tune the parameter with the kvm-unit-test test
 (the optimal value is clear). I'll write a document.

Nice benefit of the test. :)

Paolo


[patch 0/3] KVM: add option to advance tscdeadline hrtimer expiration (v5)

2014-12-16 Thread Marcelo Tosatti
See patches for details.

v2:
- fix email address.

v3:
- use module parameter for configuration of value (Paolo/Radim)

v4:
- fix check for tscdeadline mode while waiting for expiration (Paolo)
- use proper delay function (Radim)
- fix LVTT tscdeadline mode check in hrtimer interrupt handler (Radim)
- add comment regarding PPR and APICv (Paolo)

v5:
- use tscdeadline expiration and guest tsc difference in 
  tracepoint (Paolo)




[patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-16 Thread Marcelo Tosatti
For the hrtimer which emulates the tscdeadline timer in the guest,
add an option to advance expiration, and busy spin on VM-entry waiting
for the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario 
which requires strict timing regarding timer expiration).

Reduces average cyclictest latency from 12us to 8us
on Core i5 desktop.

Note: this option requires tuning to find the appropriate value 
for a particular hardware/guest combination. One method is to measure the 
average delay between apic_timer_fn and VM-entry. 
Another method is to start with 1000ns, and increase the value
in say 500ns increments until avg cyclictest numbers stop decreasing.
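The second tuning method can be sketched as a loop. This is a hypothetical helper, not part of the patch: `measure` stands in for running cyclictest in the guest at a given advance value and reading the average latency back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuning helper for the method described above: start at
 * 1000ns and grow the advance in 500ns steps while the measured
 * average cyclictest latency keeps decreasing. */
static uint64_t tune_advance_ns(uint64_t (*measure)(uint64_t advance_ns))
{
    uint64_t advance = 1000;
    uint64_t best = measure(advance);

    for (;;) {
        uint64_t next = measure(advance + 500);
        if (next >= best)       /* numbers stopped decreasing */
            return advance;
        advance += 500;
        best = next;
    }
}
```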

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/lapic.c
===
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -33,6 +33,7 @@
 #include <asm/page.h>
 #include <asm/current.h>
 #include <asm/apicdef.h>
+#include <asm/delay.h>
 #include <linux/atomic.h>
 #include <linux/jump_label.h>
 #include "kvm_cache_regs.h"
@@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
 {
   struct kvm_vcpu *vcpu = apic->vcpu;
   wait_queue_head_t *q = &vcpu->wq;
+   struct kvm_timer *ktimer = &apic->lapic_timer;
 
   /*
    * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
@@ -1087,11 +1089,64 @@ static void apic_timer_expired(struct kv
 
   if (waitqueue_active(q))
   wake_up_interruptible(q);
+
+   if (apic_lvtt_tscdeadline(apic))
+   ktimer->expired_tscdeadline = ktimer->tscdeadline;
+}
+
+/*
+ * On APICv, this test will cause a busy wait
+ * during a higher-priority task.
+ */
+
+static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
+
+   if (kvm_apic_hw_enabled(apic)) {
+   int vec = reg & APIC_VECTOR_MASK;
+
+   if (kvm_x86_ops->test_posted_interrupt)
+   return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
+   else {
+   if (apic_test_vector(vec, apic->regs + APIC_ISR))
+   return true;
+   }
+   }
+   return false;
+}
+
+void wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+   u64 guest_tsc, tsc_deadline;
+
+   if (!kvm_vcpu_has_lapic(vcpu))
+   return;
+
+   if (apic->lapic_timer.expired_tscdeadline == 0)
+   return;
+
+   if (!lapic_timer_int_injected(vcpu))
+   return;
+
+   tsc_deadline = apic->lapic_timer.expired_tscdeadline;
+   apic->lapic_timer.expired_tscdeadline = 0;
+   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+
+   while (guest_tsc < tsc_deadline) {
+   int delay = min(tsc_deadline - guest_tsc, 1000ULL);
+
+   __delay(delay);
+   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+   }
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)
 {
    ktime_t now;
+
    atomic_set(&apic->lapic_timer.pending, 0);
 
    if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
@@ -1137,6 +1192,7 @@ static void start_apic_timer(struct kvm_
    /* lapic timer in tsc deadline mode */
    u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
    u64 ns = 0;
+   ktime_t expire;
    struct kvm_vcpu *vcpu = apic->vcpu;
    unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
    unsigned long flags;
@@ -1151,8 +1207,10 @@ static void start_apic_timer(struct kvm_
    if (likely(tscdeadline > guest_tsc)) {
    ns = (tscdeadline - guest_tsc) * 1000000ULL;
    do_div(ns, this_tsc_khz);
+   expire = ktime_add_ns(now, ns);
+   expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
    hrtimer_start(&apic->lapic_timer.timer,
-   ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+   expire, HRTIMER_MODE_ABS);
    } else
    apic_timer_expired(apic);
 
Index: kvm/arch/x86/kvm/lapic.h
===
--- kvm.orig/arch/x86/kvm/lapic.h
+++ kvm/arch/x86/kvm/lapic.h
@@ -14,6 +14,7 @@ struct kvm_timer {
u32 timer_mode;
u32 timer_mode_mask;
u64 tscdeadline;
+   u64 expired_tscdeadline;
atomic_t pending;   /* accumulated triggered timers 
*/
 };
 
@@ -170,4 +171,6 @@ static inline bool kvm_apic_has_events(s
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
+void wait_lapic_expire(struct kvm_vcpu *vcpu);
+
 #endif
Index: kvm/arch/x86/kvm/x86.c

[patch 1/3] KVM: x86: add method to test PIR bitmap vector

2014-12-16 Thread Marcelo Tosatti
kvm_x86_ops->test_posted_interrupt() returns true/false depending on
whether 'vector' is set.

Next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/include/asm/kvm_host.h
===
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -743,6 +743,7 @@ struct kvm_x86_ops {
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+   bool (*test_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -435,6 +435,11 @@ static int pi_test_and_set_pir(int vecto
    return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static int pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+   return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
unsigned long host_rsp;
@@ -5939,6 +5944,7 @@ static __init int hardware_setup(void)
    else {
    kvm_x86_ops->hwapic_irr_update = NULL;
    kvm_x86_ops->deliver_posted_interrupt = NULL;
+   kvm_x86_ops->test_posted_interrupt = NULL;
    kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
    }
 
@@ -6960,6 +6966,13 @@ static int handle_invvpid(struct kvm_vcp
return 1;
 }
 
+static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   return pi_test_pir(vector, &vmx->pi_desc);
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -9374,6 +9387,7 @@ static struct kvm_x86_ops vmx_x86_ops =
.hwapic_isr_update = vmx_hwapic_isr_update,
.sync_pir_to_irr = vmx_sync_pir_to_irr,
.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+   .test_posted_interrupt = vmx_test_pir,
 
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,




[patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire

2014-12-16 Thread Marcelo Tosatti
Add tracepoint to wait_lapic_expire.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/lapic.c
===
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -1120,7 +1120,7 @@ static bool lapic_timer_int_injected(str
 void wait_lapic_expire(struct kvm_vcpu *vcpu)
 {
    struct kvm_lapic *apic = vcpu->arch.apic;
-   u64 guest_tsc, tsc_deadline;
+   u64 orig_guest_tsc, guest_tsc, tsc_deadline;
 
    if (!kvm_vcpu_has_lapic(vcpu))
    return;
@@ -1133,7 +1133,7 @@ void wait_lapic_expire(struct kvm_vcpu *
 
    tsc_deadline = apic->lapic_timer.expired_tscdeadline;
    apic->lapic_timer.expired_tscdeadline = 0;
-   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+   orig_guest_tsc = guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
 
    while (guest_tsc < tsc_deadline) {
    int delay = min(tsc_deadline - guest_tsc, 1000ULL);
@@ -1141,6 +1141,8 @@ void wait_lapic_expire(struct kvm_vcpu *
    __delay(delay);
    guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
    }
+
+   trace_kvm_wait_lapic_expire(vcpu->vcpu_id, orig_guest_tsc - tsc_deadline);
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)
Index: kvm/arch/x86/kvm/trace.h
===
--- kvm.orig/arch/x86/kvm/trace.h
+++ kvm/arch/x86/kvm/trace.h
@@ -914,6 +914,25 @@ TRACE_EVENT(kvm_pvclock_update,
  __entry->flags)
 );
 
+TRACE_EVENT(kvm_wait_lapic_expire,
+   TP_PROTO(unsigned int vcpu_id, s64 delta),
+   TP_ARGS(vcpu_id, delta),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   vcpu_id )
+   __field(s64,delta   )
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id   = vcpu_id;
+   __entry->delta = delta;
+   ),
+
+   TP_printk("vcpu %u: delta %lld",
+ __entry->vcpu_id,
+ __entry->delta)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH




Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-16 Thread Paolo Bonzini
On 15/12/2014 23:06, Marcelo Tosatti wrote:
 For the hrtimer which emulates the tscdeadline timer in the guest,
 add an option to advance expiration, and busy spin on VM-entry waiting
 for the actual expiration time to elapse.
 
 This allows achieving low latencies in cyclictest (or any scenario 
 which requires strict timing regarding timer expiration).
 
 Reduces average cyclictest latency from 12us to 8us
 on Core i5 desktop.
 
 Note: this option requires tuning to find the appropriate value 
 for a particular hardware/guest combination. One method is to measure the 
 average delay between apic_timer_fn and VM-entry. 
 Another method is to start with 1000ns, and increase the value
 in say 500ns increments until avg cyclictest numbers stop decreasing.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: kvm/arch/x86/kvm/lapic.c
 ===
 --- kvm.orig/arch/x86/kvm/lapic.c
 +++ kvm/arch/x86/kvm/lapic.c
 @@ -33,6 +33,7 @@
  #include <asm/page.h>
  #include <asm/current.h>
  #include <asm/apicdef.h>
 +#include <asm/delay.h>
  #include <linux/atomic.h>
  #include <linux/jump_label.h>
  #include "kvm_cache_regs.h"
 @@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
  {
   struct kvm_vcpu *vcpu = apic->vcpu;
   wait_queue_head_t *q = &vcpu->wq;
 + struct kvm_timer *ktimer = &apic->lapic_timer;
  
   /*
    * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
 @@ -1087,11 +1089,64 @@ static void apic_timer_expired(struct kv
  
   if (waitqueue_active(q))
   wake_up_interruptible(q);
 +
 + if (apic_lvtt_tscdeadline(apic))
 + ktimer->expired_tscdeadline = ktimer->tscdeadline;
 +}
 +
 +/*
 + * On APICv, this test will cause a busy wait
 + * during a higher-priority task.
 + */
 +
 +static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
 +{
 + struct kvm_lapic *apic = vcpu->arch.apic;
 + u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
 +
 + if (kvm_apic_hw_enabled(apic)) {
 + int vec = reg & APIC_VECTOR_MASK;
 +
 + if (kvm_x86_ops->test_posted_interrupt)
 + return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
 + else {
 + if (apic_test_vector(vec, apic->regs + APIC_ISR))
 + return true;
 + }
 + }
 + return false;
 +}
 +
 +void wait_lapic_expire(struct kvm_vcpu *vcpu)
 +{
 + struct kvm_lapic *apic = vcpu->arch.apic;
 + u64 guest_tsc, tsc_deadline;
 +
 + if (!kvm_vcpu_has_lapic(vcpu))
 + return;
 +
 + if (apic->lapic_timer.expired_tscdeadline == 0)
 + return;
 +
 + if (!lapic_timer_int_injected(vcpu))
 + return;

By the time we get here, I think, if expired_tscdeadline != 0 we're sure
that the interrupt has been injected.  It may be in IRR rather than ISR,
but at least on APICv the last test should be redundant.

So perhaps you can get rid of patch 1 and check
kvm_apic_vid_enabled(vcpu->kvm):

if (k_a_v_e(vcpu->kvm))
return true;
if (apic_test_vector(vec, apic->regs + APIC_ISR))
return true;

Does this sound correct?

Paolo



 + tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 + apic->lapic_timer.expired_tscdeadline = 0;
 + guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
 +
 + while (guest_tsc < tsc_deadline) {
 + int delay = min(tsc_deadline - guest_tsc, 1000ULL);
 +
 + __delay(delay);
 + guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
 + }
  }
  
  static void start_apic_timer(struct kvm_lapic *apic)
  {
   ktime_t now;
 +
   atomic_set(&apic->lapic_timer.pending, 0);
  
   if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
 @@ -1137,6 +1192,7 @@ static void start_apic_timer(struct kvm_
   /* lapic timer in tsc deadline mode */
   u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
   u64 ns = 0;
 + ktime_t expire;
   struct kvm_vcpu *vcpu = apic->vcpu;
   unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
   unsigned long flags;
 @@ -1151,8 +1207,10 @@ static void start_apic_timer(struct kvm_
   if (likely(tscdeadline > guest_tsc)) {
   ns = (tscdeadline - guest_tsc) * 1000000ULL;
   do_div(ns, this_tsc_khz);
 + expire = ktime_add_ns(now, ns);
 + expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
   hrtimer_start(&apic->lapic_timer.timer,
 - ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
 +   expire, HRTIMER_MODE_ABS);
   } else
   apic_timer_expired(apic);
  
 Index: kvm/arch/x86/kvm/lapic.h
 ===
 --- 

Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-16 Thread Marcelo Tosatti
On Tue, Dec 16, 2014 at 03:34:22PM +0100, Paolo Bonzini wrote:
 On 15/12/2014 23:06, Marcelo Tosatti wrote:
  For the hrtimer which emulates the tscdeadline timer in the guest,
  add an option to advance expiration, and busy spin on VM-entry waiting
  for the actual expiration time to elapse.
  
  This allows achieving low latencies in cyclictest (or any scenario 
  which requires strict timing regarding timer expiration).
  
  Reduces average cyclictest latency from 12us to 8us
  on Core i5 desktop.
  
  Note: this option requires tuning to find the appropriate value 
  for a particular hardware/guest combination. One method is to measure the 
  average delay between apic_timer_fn and VM-entry. 
  Another method is to start with 1000ns, and increase the value
  in say 500ns increments until avg cyclictest numbers stop decreasing.
  
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  
  Index: kvm/arch/x86/kvm/lapic.c
  ===
  --- kvm.orig/arch/x86/kvm/lapic.c
  +++ kvm/arch/x86/kvm/lapic.c
  @@ -33,6 +33,7 @@
   #include <asm/page.h>
   #include <asm/current.h>
   #include <asm/apicdef.h>
  +#include <asm/delay.h>
   #include <linux/atomic.h>
   #include <linux/jump_label.h>
   #include "kvm_cache_regs.h"
  @@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
   {
  struct kvm_vcpu *vcpu = apic->vcpu;
  wait_queue_head_t *q = &vcpu->wq;
  +   struct kvm_timer *ktimer = &apic->lapic_timer;
   
  /*
   * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
  @@ -1087,11 +1089,64 @@ static void apic_timer_expired(struct kv
   
  if (waitqueue_active(q))
  wake_up_interruptible(q);
  +
  +   if (apic_lvtt_tscdeadline(apic))
  +   ktimer->expired_tscdeadline = ktimer->tscdeadline;
  +}
  +
  +/*
  + * On APICv, this test will cause a busy wait
  + * during a higher-priority task.
  + */
  +
  +static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
  +{
  +   struct kvm_lapic *apic = vcpu->arch.apic;
  +   u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
  +
  +   if (kvm_apic_hw_enabled(apic)) {
  +   int vec = reg & APIC_VECTOR_MASK;
  +
  +   if (kvm_x86_ops->test_posted_interrupt)
  +   return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
  +   else {
  +   if (apic_test_vector(vec, apic->regs + APIC_ISR))
  +   return true;
  +   }
  +   }
  +   return false;
  +}
  +
  +void wait_lapic_expire(struct kvm_vcpu *vcpu)
  +{
  +   struct kvm_lapic *apic = vcpu->arch.apic;
  +   u64 guest_tsc, tsc_deadline;
  +
  +   if (!kvm_vcpu_has_lapic(vcpu))
  +   return;
  +
  +   if (apic->lapic_timer.expired_tscdeadline == 0)
  +   return;
  +
  +   if (!lapic_timer_int_injected(vcpu))
  +   return;
 
 By the time we get here, I think, if expired_tscdeadline != 0 we're sure
 that the interrupt has been injected.  It may be in IRR rather than ISR,
 but at least on APICv the last test should be redundant.
 
  So perhaps you can get rid of patch 1 and check
  kvm_apic_vid_enabled(vcpu->kvm):
  
    if (k_a_v_e(vcpu->kvm))
    return true;
    if (apic_test_vector(vec, apic->regs + APIC_ISR))
    return true;
 
 Does this sound correct?

* expired_tscdeadline != 0.
* APIC timer interrupt delivery masked at LVTT register.

Implies expired_tscdeadline != 0 and interrupt not injected.
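The counterexample rests on the LVT mask bit. A minimal sketch of the check (using the standard APIC LVT layout, where bit 16 masks delivery; the function name is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define APIC_LVT_MASKED (1u << 16)   /* mask bit of an APIC LVT entry */

/* When LVTT is masked, the tscdeadline timer can expire (leaving
 * expired_tscdeadline != 0) without any interrupt being injected, so
 * an APICv-only shortcut does not prove injection happened. */
static int lvtt_can_deliver(uint32_t lvtt_reg)
{
    return !(lvtt_reg & APIC_LVT_MASKED);
}
```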



Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration

2014-12-16 Thread Paolo Bonzini


On 16/12/2014 16:13, Marcelo Tosatti wrote:
  So perhaps you can get rid of patch 1 and check
  kvm_apic_vid_enabled(vcpu->kvm):
  
 if (k_a_v_e(vcpu->kvm))
 return true;
 if (apic_test_vector(vec, apic->regs + APIC_ISR))
 return true;
  
  Does this sound correct?
 * expired_tscdeadline != 0.
 * APIC timer interrupt delivery masked at LVTT register.
 
 Implies expired_tscdeadline != 0 and interrupt not injected.

Good point.

Paolo


[PATCH] kvm: iommu: Add cond_resched to legacy device assignment code

2014-12-16 Thread Joerg Roedel
From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (>=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.
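As a back-of-the-envelope illustration of why these loops run long enough to
trip the watchdog, here is a sketch (Python, not kernel code; it assumes the
worst case where every mapping falls back to 4 KiB pages):

```python
# Rough iteration count for kvm_iommu_map_pages() on a large guest,
# assuming every mapping uses 4 KiB pages (worst case for the IOMMU).
GiB = 1024 ** 3
PAGE_SIZE = 4096

def map_iterations(guest_mem_bytes, page_size=PAGE_SIZE):
    """How many times the gfn loop advances to cover the whole guest."""
    return guest_mem_bytes // page_size

# A 128 GiB guest means tens of millions of IOMMU map calls in one loop,
# with no scheduling point on a non-preemptible kernel.
print(map_iterations(128 * GiB))  # 33554432
```

With cond_resched() in the loop body, each iteration offers the scheduler a
chance to run, so the soft-lockup watchdog no longer fires.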

Signed-off-by: Joerg Roedel jroe...@suse.de
---
 virt/kvm/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index c1e6ae9..ac427e8 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -137,7 +137,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 	gfn += page_size >> PAGE_SHIFT;
 
-
+   cond_resched();
}
 
return 0;
@@ -311,6 +311,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
kvm_unpin_pages(kvm, pfn, unmap_pages);
 
gfn += unmap_pages;
+
+   cond_resched();
}
 }
 
-- 
1.9.1



Re: Why do additional cores reduce performance?

2014-12-16 Thread Oleg Ovechko
 What is your benchmark?

I've tried different ways (CrystalDiskMark 3.0.3 x64, ATTO Disk
Benchmark v2.47); all give the same result.
The numbers I provided in the 1st mail are for a 100G file copied over. I
simply subtract stop and start times. 50 seconds is such a huge difference
(the three-sigma rule gives 10 secs for 10 tries) that I could even use
wall clocks.

When everything is enabled in BIOS it is 6:23 on real Windows versus
9:03 on virtualized...

Phil Ehrens has sent me link
https://lists.gnu.org/archive/html/qemu-discuss/2014-10/msg00036.html
If I don't misunderstand, it means kvm/qemu simply is not designed for
multi-threading.
I guess I need to try a different hypervisor. A 50% performance loss is too
high a price, especially when VT-x and VT-d are meant to make it 0%.

 Windows sometimes has scalability problems due to the way it does
 timing.  Try replacing -cpu host with -no-hpet -cpu
 host,hv_time,hv_vapic.

Does not change results.

 It will be a mix.  Do not specify HT in the guest, unless you have HT in
 the host _and_ you are pinning the two threads of each guest core to the
 two threads of a host core.

Do you mean -smp 4,sockets=1,cores=2,threads=2 for 2 cores with HT
enabled? That gives an even worse result: 9:17


Re: Why do additional cores reduce performance?

2014-12-16 Thread Paolo Bonzini


On 16/12/2014 17:22, Oleg Ovechko wrote:
 What is your benchmark?
 
 I've tried different ways (CrystalDiskMark 3.0.3 x64, ATTO Disk
 Benchmark v2.47); all give the same result.

All are run on the AHCI passthrough disk(s), right?

 When everything is enabled in BIOS it is 6:23 on real Windows versus
 9:03 on virtualized...
 
 Phil Ehrens has sent me link
 https://lists.gnu.org/archive/html/qemu-discuss/2014-10/msg00036.html
 If I don't misunderstand, it means kvm/qemu simply is not designed for
 multi-threading.

No, it means TCG does not support multithreading.  KVM does, and you are
using it.

 I guess I need to try different hypervisor. 50% performance is too
 high price especially when VT-x and VT-d are meant to make it 0%

It is surprising to me too.

Paolo


[PATCH] [kvmtool]: Use the arch default transport method for network

2014-12-16 Thread Suzuki K. Poulose
From: Suzuki K. Poulose suzuki.poul...@arm.com

lkvm by default sets up a virtio-pci transport for the network if none is
specified. This can be a problem on arches (e.g. ARM64) where virtio-pci is
not supported yet, and causes the following warning at exit:

  # KVM compatibility warning.
virtio-net device was not detected.
While you have requested a virtio-net device, the guest kernel did not initialize it.
Please make sure that the guest kernel was compiled with CONFIG_VIRTIO_NET=y enabled in .config.

This patch changes it to make use of the default transport method for the
architecture when none is specified. This will ensure that on every arch
we get the network up by default in the VM.

Applies on top of the kvm/arm branch in Will's kvmtool tree.

Signed-off-by: Suzuki K. Poulose suzuki.poul...@arm.com
Acked-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/include/kvm/virtio.h |1 +
 tools/kvm/virtio/core.c|9 +
 tools/kvm/virtio/net.c |   21 +++--
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index 8a9eab5..768ee96 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -160,6 +160,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
struct virtio_ops *ops, enum virtio_trans trans,
int device_id, int subsys_id, int class);
 int virtio_compat_add_message(const char *device, const char *config);
+const char* virtio_trans_name(enum virtio_trans trans);
 
 static inline void *virtio_get_vq(struct kvm *kvm, u32 pfn, u32 page_size)
 {
diff --git a/tools/kvm/virtio/core.c b/tools/kvm/virtio/core.c
index 9ae7887..3b6e4d7 100644
--- a/tools/kvm/virtio/core.c
+++ b/tools/kvm/virtio/core.c
@@ -12,6 +12,15 @@
 #include "kvm/kvm.h"
 
 
+const char* virtio_trans_name(enum virtio_trans trans)
+{
+	if (trans == VIRTIO_PCI)
+		return "pci";
+	else if (trans == VIRTIO_MMIO)
+		return "mmio";
+	return "unknown";
+}
+
 struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len)
 {
struct vring_used_elem *used_elem;
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index c8af385..e9daea4 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -758,6 +758,7 @@ static int virtio_net__init_one(struct virtio_net_params *params)
int i, err;
struct net_dev *ndev;
struct virtio_ops *ops;
+	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
 
ndev = calloc(1, sizeof(struct net_dev));
if (ndev == NULL)
@@ -799,12 +800,20 @@ static int virtio_net__init_one(struct virtio_net_params *params)
}
 
*ops = net_dev_virtio_ops;
-	if (params->trans && strcmp(params->trans, "mmio") == 0)
-		virtio_init(params->kvm, ndev, &ndev->vdev, ops, VIRTIO_MMIO,
-			    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
-	else
-		virtio_init(params->kvm, ndev, &ndev->vdev, ops, VIRTIO_PCI,
-			    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
+
+
+	if (params->trans) {
+		if (strcmp(params->trans, "mmio") == 0)
+			trans = VIRTIO_MMIO;
+		else if (strcmp(params->trans, "pci") == 0)
+			trans = VIRTIO_PCI;
+		else
+			pr_warning("virtio-net: Unknown transport method : %s, "
+				   "falling back to %s.", params->trans,
+				   virtio_trans_name(trans));
+	}
+
+	virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
+		    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
 
 	if (params->vhost)
 		virtio_net__vhost_init(params->kvm, ndev);
-- 
1.7.9.5




[v2 PATCH] KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit

2014-12-16 Thread Eugene Korenevsky
When generating a #PF VM-exit, check the equality:
(PFEC & PFEC_MASK) == PFEC_MATCH
If they are equal, bit 14 of the exception bitmap is used to decide whether
to generate the #PF VM-exit; if they are not, the inverted bit 14 is used.
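The decision rule above can be sketched outside the kernel like this
(Python; only PF_VECTOR = 14 comes from the x86 headers, the rest is a
hedged illustration of the logic, not the kernel code itself):

```python
PF_VECTOR = 14  # #PF exception vector, i.e. bit 14 of the exception bitmap

def is_page_fault_vmexit(exception_bitmap, pfec_mask, pfec_match, error_code):
    """Mirror of the nested #PF VM-exit decision described above."""
    bit = (exception_bitmap >> PF_VECTOR) & 1
    inequality = (error_code & pfec_mask) != pfec_match
    # On a PFEC match, EB.PF decides directly; on a mismatch, the
    # inverted bit decides.
    return bool(inequality ^ bit)

print(is_page_fault_vmexit(1 << PF_VECTOR, 0, 0, 0))  # True: match, EB.PF set
print(is_page_fault_vmexit(1 << PF_VECTOR, 1, 1, 0))  # False: mismatch, EB.PF inverted
```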

Signed-off-by: Eugene Korenevsky ekorenev...@gmail.com
---
 arch/x86/kvm/vmx.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 09ccf6c..a8ef8265 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8206,6 +8206,18 @@ static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
 }
 
+static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
+   u16 error_code)
+{
+   bool inequality, bit;
+
+	bit = (vmcs12->exception_bitmap & (1u << PF_VECTOR)) != 0;
+	inequality =
+		(error_code & vmcs12->page_fault_error_code_mask) !=
+		 vmcs12->page_fault_error_code_match;
+   return inequality ^ bit;
+}
+
 static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
struct x86_exception *fault)
 {
@@ -8213,8 +8225,7 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 
WARN_ON(!is_guest_mode(vcpu));
 
-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
-	if (vmcs12->exception_bitmap & (1u << PF_VECTOR))
+	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code))
 		nested_vmx_vmexit(vcpu, to_vmx(vcpu)->exit_reason,
  vmcs_read32(VM_EXIT_INTR_INFO),
  vmcs_readl(EXIT_QUALIFICATION));
-- 
2.0.4



Re: [PATCH] kvm: iommu: Add cond_resched to legacy device assignment code

2014-12-16 Thread Chen, Tiejun

On 2014/12/16 23:47, Joerg Roedel wrote:

From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (>=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
  virt/kvm/iommu.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index c1e6ae9..ac427e8 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c


This file is already gone: the latest commit c274e03af705 ("kvm: 
x86: move assigned-dev.c and iommu.c to arch/x86/") removed it, so you 
need to pull your tree first :)


Tiejun



@@ -137,7 +137,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)

 	gfn += page_size >> PAGE_SHIFT;

-
+   cond_resched();
}

return 0;
@@ -311,6 +311,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
kvm_unpin_pages(kvm, pfn, unmap_pages);

gfn += unmap_pages;
+
+   cond_resched();
}
  }





Clarification for patch 7

2014-12-16 Thread Mario Smarduch
Hi Christoffer, Marc -
   in stage2_dissolve_pmd() CONFIG_SMP is
unnecessary. At the time huge page is write protected,
until it faults and is cleared any page in the range
may be dirty not just the gpa access that caused the
fault.
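To illustrate the range involved, here is a small sketch under assumed
sizes (2 MiB PMD huge pages, 4 KiB granule; not the actual stage-2 code):

```python
PMD_SIZE = 2 * 1024 * 1024   # assumed huge-page (PMD) size
PAGE_SIZE = 4096             # assumed small-page granule

def candidate_dirty_pages(fault_gpa):
    """All small-page gpas covered by the write-protected PMD: any of
    them may be dirty when the PMD dissolves, not just fault_gpa."""
    start = fault_gpa & ~(PMD_SIZE - 1)   # align down to the PMD base
    return [start + i * PAGE_SIZE for i in range(PMD_SIZE // PAGE_SIZE)]

pages = candidate_dirty_pages(0x40012345)
print(len(pages))     # 512 candidate-dirty pages per dissolved PMD
print(hex(pages[0]))  # 0x40000000
```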

The comment about "another CPU" is wrong; I
confused myself while testing. It should not be possible for
one CPU to write to the same PMD range while
another is handling its PMD fault. I ran a test with
only an initrd which exposed an issue in my
test scenario; QEMU appears fine.

It also depends on user space: if you first turn
on logging and then do pre-copy, marking just the faulting page
is enough. It's hard to interpret the API in this
case; it just says "dirty pages since the last
call".

That patch could be resent without upsetting
the rest.

- Mario


[question] Why newer QEMU may lose irq when doing migration?

2014-12-16 Thread Wincy Van
Hi, all:

The patchset (https://lkml.org/lkml/2014/3/18/309) fixed migration of
Windows guests, but commit 0bc830b05c667218d703f2026ec866c49df974fc
(KVM: ioapic: clear IRR for edge-triggered interrupts at delivery)
introduced a bug (see
https://www.mail-archive.com/kvm@vger.kernel.org/msg109813.html).

From the description: "Unlike the old qemu-kvm, which really never did
that, with new QEMU it is for some reason somewhat likely to migrate a
VM with a nonzero IRR in the ioapic."

Why could new QEMU do that? I cannot find any code explaining that
"some reason".
As we know, once an irq is set in kvm's ioapic, the ioapic will send
that irq to the lapic; this is an atomic operation.
Then kvm will inject it in inject_pending_event (or set RVI in
the apic-v case). QEMU will also save the pending irq when doing
migration.

I cannot find a point at which the guest could lose an irq, but this
scenario really exists.

Any ideas?


Thanks,

Wincy