[PATCH v2 10/12] KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded

2015-10-20 Thread Haozhong Zhang
This patch makes the kvm-intel module load the TSC scaling ratio into the
TSC multiplier field of the VMCS when a vcpu is loaded, so that the TSC
scaling ratio can take effect if VMX TSC scaling is enabled.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/vmx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a86f790..c241ff3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2062,6 +2062,12 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
+
+   /* Setup TSC multiplier */
+   if (cpu_has_vmx_tsc_scaling())
+   vmcs_write64(TSC_MULTIPLIER,
+vcpu->arch.tsc_scaling_ratio);
+
vmx->loaded_vmcs->cpu = cpu;
}
 
-- 
2.4.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 00/12] KVM: x86: add support for VMX TSC scaling

2015-10-20 Thread Haozhong Zhang
This patchset adds support for VMX TSC scaling feature which is
available on Intel Skylake CPU. The specification of VMX TSC scaling
can be found at
http://www.intel.com/content/www/us/en/processors/timestamp-counter-scaling-virtualization-white-paper.html

VMX TSC scaling allows the guest TSC, which is read by guest rdtsc(p)
instructions, to increase at a rate that is customized by the hypervisor
and can differ from the host TSC rate. Basically, VMX TSC scaling adds a
64-bit field called the TSC multiplier to the VMCS so that, if VMX TSC
scaling is enabled, the TSC read by guest rdtsc(p) instructions is
calculated by the following formula:

  guest EDX:EAX = ((Host TSC * TSC multiplier) >> 48) + VMX TSC Offset

where Host TSC = Host MSR_IA32_TSC + Host MSR_IA32_TSC_ADJUST.

This patchset, when combined with another QEMU patchset (sent in another
email, "target-i386: save/restore vcpu's TSC rate during migration"),
allows guest programs to observe a consistent TSC rate even when they are
migrated among machines with different host TSC rates.

VMX TSC scaling shares some common logic with the SVM TSC ratio feature,
which is already supported by KVM. Patches 1 ~ 8 move that common logic
from the SVM code into common code. On top of them, patches 9 ~ 12 add
VMX-specific support for VMX TSC scaling.

Changes in v2:
 * Remove the duplicated variable 'kvm_tsc_scaling_ratio_rsvd'.
 * Remove an unnecessary error check in original patch 2.
 * Do 64-bit arithmetic by functions recommended by Paolo.
 * Make kvm_set_tsc_khz() return an error number so that the ioctl
   KVM_SET_TSC_KHZ does not return 0 if errors happen.

Reviewed-by: Eric Northup 

Haozhong Zhang (12):
  KVM: x86: Collect information for setting TSC scaling ratio
  KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
  KVM: x86: Add a common TSC scaling function
  KVM: x86: Replace call-back set_tsc_khz() with a common function
  KVM: x86: Replace call-back compute_tsc_offset() with a common function
  KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
  KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
  KVM: x86: Use the correct vcpu's TSC rate to compute time scale
  KVM: VMX: Enable and initialize VMX TSC scaling
  KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
  KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
  KVM: VMX: Dump TSC multiplier in dump_vmcs()

 arch/x86/include/asm/kvm_host.h |  24 +++
 arch/x86/include/asm/vmx.h  |   3 +
 arch/x86/kvm/lapic.c|   4 +-
 arch/x86/kvm/svm.c  | 116 --
 arch/x86/kvm/vmx.c  |  64 ++-
 arch/x86/kvm/x86.c  | 134 +++-
 include/linux/kvm_host.h|  20 ++
 include/linux/math64.h  |  99 +
 8 files changed, 297 insertions(+), 167 deletions(-)

-- 
2.4.8



[PATCH v2 01/12] KVM: x86: Collect information for setting TSC scaling ratio

2015-10-20 Thread Haozhong Zhang
The number of bits in the fractional part of the 64-bit TSC scaling
ratio differs between VMX and SVM. This patch makes the architecture
code collect the number of fractional bits and other related
information into variables that can be accessed from common code.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/include/asm/kvm_host.h | 6 ++++++
 arch/x86/kvm/svm.c              | 4 ++++
 arch/x86/kvm/x86.c              | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53deb27..0540dc8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -990,6 +990,12 @@ extern bool kvm_has_tsc_control;
 extern u32  kvm_min_guest_tsc_khz;
 /* maximum supported tsc_khz for guests */
 extern u32  kvm_max_guest_tsc_khz;
+/* number of bits of the fractional part of the TSC scaling ratio */
+extern u8   kvm_tsc_scaling_ratio_frac_bits;
+/* default TSC scaling ratio (= 1.0) */
+extern u64  kvm_default_tsc_scaling_ratio;
+/* maximum allowed value of TSC scaling ratio */
+extern u64  kvm_max_tsc_scaling_ratio;
 
 enum emulation_result {
EMULATE_DONE, /* no further processing */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index cd8659c..55f5f49 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -908,7 +908,11 @@ static __init int svm_hardware_setup(void)
	max = min(0x7fffffffULL, __scale_tsc(tsc_khz, TSC_RATIO_MAX));
 
kvm_max_guest_tsc_khz = max;
+
+   kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX;
+   kvm_tsc_scaling_ratio_frac_bits = 32;
}
+   kvm_default_tsc_scaling_ratio = TSC_RATIO_DEFAULT;
 
if (nested) {
printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e9c226..79cbbb5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -109,6 +109,12 @@ bool kvm_has_tsc_control;
 EXPORT_SYMBOL_GPL(kvm_has_tsc_control);
 u32  kvm_max_guest_tsc_khz;
 EXPORT_SYMBOL_GPL(kvm_max_guest_tsc_khz);
+u8   kvm_tsc_scaling_ratio_frac_bits;
+EXPORT_SYMBOL_GPL(kvm_tsc_scaling_ratio_frac_bits);
+u64  kvm_default_tsc_scaling_ratio;
+EXPORT_SYMBOL_GPL(kvm_default_tsc_scaling_ratio);
+u64  kvm_max_tsc_scaling_ratio;
+EXPORT_SYMBOL_GPL(kvm_max_tsc_scaling_ratio);
 
 /* tsc tolerance in parts per million - default to 1/2 of the NTP threshold */
 static u32 tsc_tolerance_ppm = 250;
-- 
2.4.8



[PATCH v2 05/12] KVM: x86: Replace call-back compute_tsc_offset() with a common function

2015-10-20 Thread Haozhong Zhang
Both VMX and SVM calculate the TSC offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().

Signed-off-by: Haozhong Zhang 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/svm.c              | 10 ----------
 arch/x86/kvm/vmx.c              |  6 ------
 arch/x86/kvm/x86.c              | 15 ++++++++++---
 4 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c67469b..d5e820b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -856,7 +856,6 @@ struct kvm_x86_ops {
u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
-   u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
 
void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a1364927..481fdd3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1005,15 +1005,6 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment, bool host)
mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
-static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
-{
-   u64 tsc;
-
-   tsc = kvm_scale_tsc(vcpu, rdtsc());
-
-   return target_tsc - tsc;
-}
-
 static void init_vmcb(struct vcpu_svm *svm, bool init_event)
 {
	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4371,7 +4362,6 @@ static struct kvm_x86_ops svm_x86_ops = {
.read_tsc_offset = svm_read_tsc_offset,
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset = svm_adjust_tsc_offset,
-   .compute_tsc_offset = svm_compute_tsc_offset,
.read_l1_tsc = svm_read_l1_tsc,
 
.set_tdp_cr3 = set_tdp_cr3,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7f87cf6..7896395 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2426,11 +2426,6 @@ static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment, bool host)
   offset + adjustment);
 }
 
-static u64 vmx_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
-{
-   return target_tsc - rdtsc();
-}
-
 static bool guest_cpuid_has_vmx(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry(vcpu, 1, 0);
@@ -10815,7 +10810,6 @@ static struct kvm_x86_ops vmx_x86_ops = {
.read_tsc_offset = vmx_read_tsc_offset,
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset = vmx_adjust_tsc_offset,
-   .compute_tsc_offset = vmx_compute_tsc_offset,
.read_l1_tsc = vmx_read_l1_tsc,
 
.set_tdp_cr3 = vmx_set_cr3,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db5ef73..75129bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1397,6 +1397,15 @@ u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_scale_tsc);
 
+static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+{
+   u64 tsc;
+
+   tsc = kvm_scale_tsc(vcpu, rdtsc());
+
+   return target_tsc - tsc;
+}
+
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
struct kvm *kvm = vcpu->kvm;
@@ -1408,7 +1417,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
u64 data = msr->data;
 
	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-   offset = kvm_x86_ops->compute_tsc_offset(vcpu, data);
+   offset = kvm_compute_tsc_offset(vcpu, data);
ns = get_kernel_ns();
elapsed = ns - kvm->arch.last_tsc_nsec;
 
@@ -1465,7 +1474,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
} else {
u64 delta = nsec_to_cycles(vcpu, elapsed);
data += delta;
-   offset = kvm_x86_ops->compute_tsc_offset(vcpu, data);
+   offset = kvm_compute_tsc_offset(vcpu, data);
pr_debug("kvm: adjusted tsc offset by %llu\n", delta);
}
matched = true;
@@ -2692,7 +2701,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
if (check_tsc_unstable()) {
-   u64 offset = kvm_x86_ops->compute_tsc_offset(vcpu,
+   u64 offset = kvm_compute_tsc_offset(vcpu,
vcpu->arch.last_guest_tsc);
kvm_x86_ops->write_tsc_offset(vcpu, offset);
vcpu->arch.tsc_catchup = 1;
-- 
2.4.8


[PATCH v2 08/12] KVM: x86: Use the correct vcpu's TSC rate to compute time scale

2015-10-20 Thread Haozhong Zhang
This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as the vcpu's TSC rate to compute the time scale if TSC scaling is enabled.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/x86.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f2516bf..d5690a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1698,7 +1698,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 
 static int kvm_guest_time_update(struct kvm_vcpu *v)
 {
-   unsigned long flags, this_tsc_khz;
+   unsigned long flags, this_tsc_khz, tgt_tsc_khz;
	struct kvm_vcpu_arch *vcpu = &v->arch;
	struct kvm_arch *ka = &v->kvm->arch;
s64 kernel_ns;
@@ -1761,7 +1761,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
return 0;
 
if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
-   kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
+   tgt_tsc_khz = kvm_has_tsc_control ?
+   vcpu->virtual_tsc_khz : this_tsc_khz;
+   kvm_get_time_scale(NSEC_PER_SEC / 1000, tgt_tsc_khz,
			   &vcpu->hv_clock.tsc_shift,
			   &vcpu->hv_clock.tsc_to_system_mul);
vcpu->hw_tsc_khz = this_tsc_khz;
-- 
2.4.8



Re: [PATCH 3/3] arm/arm64: KVM: Fix disabled distributor operation

2015-10-20 Thread Eric Auger
Hi Christoffer,
On 10/17/2015 10:30 PM, Christoffer Dall wrote:
> We currently do a single update of the vgic state when the distrbutor
distributor
> enable/disable control register is accessed and then bypass updating the
> state for as long as the distributor remains disabled.
> 
> This is incorrect, because updating the state does not consider the
> distributor enable bit, and thus you can end up in a situation where an
> interrupt is marked as pending on the CPU interface, but not pending on
> the distributor, which is an impossible state to be in, and triggers a
> warning.  Consider for example the following sequence of events:
> 
> 1. An interrupt is marked as pending on the distributor
>- the interrupt is also forwarded to the CPU interface
> 2. The guest turns off the distributor (it's about to do a reboot)
>- we stop updating the CPU interface state from now on
> 3. The guest disables the pending interrupt
>- we remove the pending state from the distributor, but don't touch
>  the CPU interface, see point 2.
> 
> Since the distributor disable bit really means that no interrupts should
> be forwarded to the CPU interface, we modify the code to keep updating
> the internal VGIC state, but always set the CPU interface pending bits
> to zero when the distributor is disabled.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  virt/kvm/arm/vgic.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 58b1256..66c6616 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1012,6 +1012,12 @@ static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
>   pend_percpu = vcpu->arch.vgic_cpu.pending_percpu;
>   pend_shared = vcpu->arch.vgic_cpu.pending_shared;
>  
> + if (!dist->enabled) {
> + bitmap_zero(pend_percpu, VGIC_NR_PRIVATE_IRQS);
> + bitmap_zero(pend_shared, nr_shared);
> + return 0;
> + }
> +
>   pending = vgic_bitmap_get_cpu_map(&dist->irq_pending, vcpu_id);
>   enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
>   bitmap_and(pend_percpu, pending, enabled, VGIC_NR_PRIVATE_IRQS);
> @@ -1039,11 +1045,6 @@ void vgic_update_state(struct kvm *kvm)
>   struct kvm_vcpu *vcpu;
>   int c;
>  
> - if (!dist->enabled) {
> - set_bit(0, dist->irq_pending_on_cpu);
> - return;
I am confused. Don't you want to clear the whole bitmap?

Shouldn't we also handle interrupts programmed in the LR. Spec says any
ack should return a spurious ID. Is it what is going to happen with the
current implementation?

Eric
> - }
> -
>   kvm_for_each_vcpu(c, vcpu, kvm) {
>   if (compute_pending_for_cpu(vcpu))
>   set_bit(c, dist->irq_pending_on_cpu);
> 



PING: [PATCH v5 0/7] KVM: arm64: Implement API for vGICv3 live migration

2015-10-20 Thread Pavel Fedin
 Hello! How are things going? There was lots of activity, discussions, and then 
- silence...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
> Of Pavel Fedin
> Sent: Monday, October 12, 2015 11:30 AM
> To: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
> Cc: Christoffer Dall; Marc Zyngier; Andre Przywara
> Subject: [PATCH v5 0/7] KVM: arm64: Implement API for vGICv3 live migration
> 
> This patchset adds necessary userspace API in order to support vGICv3 live
> migration. GICv3 registers are accessed using device attribute ioctls,
> similar to GICv2.
> 
> Whoever wants to test it, please note that this version is not
> binary-compatible with the previous one; the API has been seriously changed.
> qemu patches will be posted in some time.
> 
> v4 => v5:
> - Adapted to new API by Peter Maydell, Marc Zyngier and Christoffer Dall.
>   Acked-by's on the documentation were dropped, just in case, because i
>   slightly adjusted it. Additionally, i merged all doc updates into one
>   patch.
> 
> v3 => v4:
> - Split pure refactoring from anything else
> - Documentation brought up to date
> - Cleaned up 'mmio' structure usage in vgic_attr_regs_access(),
>   use call_range_handler() for 64-bit access handling
> - Rebased on new linux-next
> 
> v2 => v3:
> - KVM_DEV_ARM_VGIC_CPUID_MASK enlarged to 20 bits, allowing more than 256
>   CPUs.
> - Bug fix: Correctly set mmio->private, necessary for redistributor access.
> - Added accessors for ICC_AP0R and ICC_AP1R registers
> - Rebased on new linux-next
> 
> v1 => v2:
> - Do not use generic register get/set API for CPU interface, use only
>   device attributes.
> - Introduce size specifier for distributor and redistributor register
>   accesses, do not assume size any more.
> - Lots of refactor and reusable code extraction.
> - Added forgotten documentation
> 
> Christoffer Dall (1):
>   KVM: arm/arm64: Update API documentation
> 
> Pavel Fedin (6):
>   KVM: arm/arm64: Move endianess conversion out of
> vgic_attr_regs_access()
>   KVM: arm/arm64: Refactor vGIC attributes handling code
>   KVM: arm64: Implement vGICv3 distributor and redistributor access from
> userspace
>   KVM: arm64: Refactor system register handlers
>   KVM: arm64: Introduce find_reg_by_id()
>   KVM: arm64: Implement vGICv3 CPU interface access
> 
>  Documentation/virtual/kvm/devices/arm-vgic-v3.txt | 117 
>  Documentation/virtual/kvm/devices/arm-vgic.txt|  41 +--
>  arch/arm64/include/uapi/asm/kvm.h |  17 +-
>  arch/arm64/kvm/sys_regs.c |  83 +++---
>  arch/arm64/kvm/sys_regs.h |   8 +-
>  arch/arm64/kvm/sys_regs_generic_v8.c  |   2 +-
>  include/linux/irqchip/arm-gic-v3.h|  19 +-
>  virt/kvm/arm/vgic-v2-emul.c   | 124 ++--
>  virt/kvm/arm/vgic-v3-emul.c   | 343 
> +-
>  virt/kvm/arm/vgic.c   |  57 
>  virt/kvm/arm/vgic.h   |   3 +
>  11 files changed, 630 insertions(+), 184 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> 
> --
> 2.4.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 3/3] arm/arm64: KVM: Fix disabled distributor operation

2015-10-20 Thread Christoffer Dall
On Tue, Oct 20, 2015 at 11:08:44AM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 10/17/2015 10:30 PM, Christoffer Dall wrote:
> > We currently do a single update of the vgic state when the distrbutor
> distributor
> > enable/disable control register is accessed and then bypass updating the
> > state for as long as the distributor remains disabled.
> > 
> > This is incorrect, because updating the state does not consider the
> > distributor enable bit, and thus you can end up in a situation where an
> > interrupt is marked as pending on the CPU interface, but not pending on
> > the distributor, which is an impossible state to be in, and triggers a
> > warning.  Consider for example the following sequence of events:
> > 
> > 1. An interrupt is marked as pending on the distributor
> >- the interrupt is also forwarded to the CPU interface
> > 2. The guest turns off the distributor (it's about to do a reboot)
> >- we stop updating the CPU interface state from now on
> > 3. The guest disables the pending interrupt
> >- we remove the pending state from the distributor, but don't touch
> >  the CPU interface, see point 2.
> > 
> > Since the distributor disable bit really means that no interrupts should
> > be forwarded to the CPU interface, we modify the code to keep updating
> > the internal VGIC state, but always set the CPU interface pending bits
> > to zero when the distributor is disabled.
> > 
> > Signed-off-by: Christoffer Dall 
> > ---
> >  virt/kvm/arm/vgic.c | 11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index 58b1256..66c6616 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1012,6 +1012,12 @@ static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
> > pend_percpu = vcpu->arch.vgic_cpu.pending_percpu;
> > pend_shared = vcpu->arch.vgic_cpu.pending_shared;
> >  
> > +   if (!dist->enabled) {
> > +   bitmap_zero(pend_percpu, VGIC_NR_PRIVATE_IRQS);
> > +   bitmap_zero(pend_shared, nr_shared);
> > +   return 0;
> > +   }
> > +
> > pending = vgic_bitmap_get_cpu_map(&dist->irq_pending, vcpu_id);
> > enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
> > bitmap_and(pend_percpu, pending, enabled, VGIC_NR_PRIVATE_IRQS);
> > @@ -1039,11 +1045,6 @@ void vgic_update_state(struct kvm *kvm)
> > struct kvm_vcpu *vcpu;
> > int c;
> >  
> > -   if (!dist->enabled) {
> > -   set_bit(0, dist->irq_pending_on_cpu);
> > -   return;
> I am confused. Don't you want to clear the whole bitmap?

So the first line used to set irq_pending_on_cpu for VCPU0 when the
distributor was disabled, which I think basically worked around the
guest kernel expecting to see a timer interrupt during boot when doing a
WFI.  Now that that's fixed, we don't need this (gigantic hack) anymore.

The return statement was also weird and buggy, because it never
prevented anything from going to the CPU interface, it just stopped
updating things.


> 
> Shouldn't we also handle interrupts programmed in the LR. Spec says any
> ack should return a spurious ID. Is it what is going to happen with the
> current implementation?
> 

yes, we really should.  We should unqueue them, but I haven't seen any
bugs from this, and I feel like we're already changing a lot on short
notice, so I'd rather not disrupt anything more right now.

Besides, when we get around to redesigning this whole thing, the concept
of unqueueing goes away.

I know it sucks reviewing fixes that only fix a subset of a bad
implementation, but I'm aiming for 'slightly better than current state'
right now :)

-Christoffer


> > -   }
> > -
> > kvm_for_each_vcpu(c, vcpu, kvm) {
> > if (compute_pending_for_cpu(vcpu))
> > set_bit(c, dist->irq_pending_on_cpu);
> > 
> 


RE: PING: [PATCH v5 0/7] KVM: arm64: Implement API for vGICv3 live migration

2015-10-20 Thread Pavel Fedin
 Hello!

> We'll get to your patches when we get to them, but seriously, you have
> to curb your eagerness a bit.

 Ok, sorry for the disturbance then. That is all I needed to know.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: PING: [PATCH v5 0/7] KVM: arm64: Implement API for vGICv3 live migration

2015-10-20 Thread Christoffer Dall
On Tue, Oct 20, 2015 at 12:30:41PM +0300, Pavel Fedin wrote:
>  Hello! How are things going? There was lots of activity, discussions, and 
> then - silence...
> 

As I told you, this is not going to make it for v4.4, and if you haven't
noticed we're getting close to the release of v4.3 and to the v4.4 merge
window opening up, so we're busy.

Also, take a look at the number of patches on the kvmarm list, they all
have to be reviewed, and then think about all the patches you reviewed
(wait, that was none), and then you know how many patches I have to
review.

You can keep sending your pings, but they're useless, as we've told you
before, because we're not ignoring your patches, we're just busy and
have to prioritize.

We'll get to your patches when we get to them, but seriously, you have
to curb your eagerness a bit.

Thanks,
-Christoffer


> 
> 
> > -Original Message-
> > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On 
> > Behalf Of Pavel Fedin
> > Sent: Monday, October 12, 2015 11:30 AM
> > To: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
> > Cc: Christoffer Dall; Marc Zyngier; Andre Przywara
> > Subject: [PATCH v5 0/7] KVM: arm64: Implement API for vGICv3 live migration
> > 
> > This patchset adds necessary userspace API in order to support vGICv3 live
> > migration. GICv3 registers are accessed using device attribute ioctls,
> > similar to GICv2.
> > 
> > Whoever wants to test it, please note that this version is not
> > binary-compatible with the previous one; the API has been seriously changed.
> > qemu patches will be posted in some time.
> > 
> > v4 => v5:
> > - Adapted to new API by Peter Maydell, Marc Zyngier and Christoffer Dall.
> >   Acked-by's on the documentation were dropped, just in case, because i
> >   slightly adjusted it. Additionally, i merged all doc updates into one
> >   patch.
> > 
> > v3 => v4:
> > - Split pure refactoring from anything else
> > - Documentation brought up to date
> > - Cleaned up 'mmio' structure usage in vgic_attr_regs_access(),
> >   use call_range_handler() for 64-bit access handling
> > - Rebased on new linux-next
> > 
> > v2 => v3:
> > - KVM_DEV_ARM_VGIC_CPUID_MASK enlarged to 20 bits, allowing more than 256
> >   CPUs.
> > - Bug fix: Correctly set mmio->private, necessary for redistributor access.
> > - Added accessors for ICC_AP0R and ICC_AP1R registers
> > - Rebased on new linux-next
> > 
> > v1 => v2:
> > - Do not use generic register get/set API for CPU interface, use only
> >   device attributes.
> > - Introduce size specifier for distributor and redistributor register
> >   accesses, do not assume size any more.
> > - Lots of refactor and reusable code extraction.
> > - Added forgotten documentation
> > 
> > Christoffer Dall (1):
> >   KVM: arm/arm64: Update API documentation
> > 
> > Pavel Fedin (6):
> >   KVM: arm/arm64: Move endianess conversion out of
> > vgic_attr_regs_access()
> >   KVM: arm/arm64: Refactor vGIC attributes handling code
> >   KVM: arm64: Implement vGICv3 distributor and redistributor access from
> > userspace
> >   KVM: arm64: Refactor system register handlers
> >   KVM: arm64: Introduce find_reg_by_id()
> >   KVM: arm64: Implement vGICv3 CPU interface access
> > 
> >  Documentation/virtual/kvm/devices/arm-vgic-v3.txt | 117 
> >  Documentation/virtual/kvm/devices/arm-vgic.txt|  41 +--
> >  arch/arm64/include/uapi/asm/kvm.h |  17 +-
> >  arch/arm64/kvm/sys_regs.c |  83 +++---
> >  arch/arm64/kvm/sys_regs.h |   8 +-
> >  arch/arm64/kvm/sys_regs_generic_v8.c  |   2 +-
> >  include/linux/irqchip/arm-gic-v3.h|  19 +-
> >  virt/kvm/arm/vgic-v2-emul.c   | 124 ++--
> >  virt/kvm/arm/vgic-v3-emul.c   | 343 
> > +-
> >  virt/kvm/arm/vgic.c   |  57 
> >  virt/kvm/arm/vgic.h   |   3 +
> >  11 files changed, 630 insertions(+), 184 deletions(-)
> >  create mode 100644 Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> > 
> > --
> > 2.4.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Difference between vcpu_load and kvm_sched_in ?

2015-10-20 Thread Yacine
Hi, I'm a student working on virtual machine introspection.

I'm trying to implement an application on top of KVM in which I need to trap
writes to CR3 (host with 8 cores and guest with one vcpu).

When I do this while handling a VM EXIT, using
vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL, CPU_BASED_CR3_LOAD_EXITING), it
works correctly and I can see the traps in my log file.

Now when I do the same thing after receiving a command from Qemu (the command
is handled in kvm_vm_ioctl by calling a function I added to kvm_x86_ops /
vmx_x86_ops), I get a vmwrite error. I found out that the problem is that
the logical processor on the host handling the ioctl command is not the
same one that is running the VM and holding its state, so I must do the
vmwrite on the one executing the VM.

To change the logical cpu executing the VM, I tried this:

vcpu_load; start cr3 trapping; vcpu_put

it worked correctly (in my logs I see that vcpu.cpu becomes equal to "cpu =
raw_smp_processor_id();") but the VM blocks for a long time due to the mutex
in vcpu_load (up to several seconds and sometimes minutes!)

I replaced vcpu_load with kvm_sched_in, and now everything works perfectly;
the VM doesn't block at all (logs here: http://pastebin.com/h5XNNMcb).

So, what I want to know is: what is the difference between vcpu_load and
kvm_sched_in? Both of these functions call kvm_arch_vcpu_load, but the
latter does it without taking the vcpu mutex.

Is there a problem with using kvm_sched_in instead of vcpu_load for my use case?




Re: Network hangs when communicating with host

2015-10-20 Thread Sasha Levin
On 10/20/2015 09:42 AM, Dmitry Vyukov wrote:
> I now have another issue. My binary fails to mmap a file within lkvm
> sandbox. The same binary works fine on host and in qemu. I've added
> strace into sandbox script, and here is the output:
> 
> [pid   837] openat(AT_FDCWD, "syzkaller-shm048878722", O_RDWR|O_CLOEXEC) = 5
> [pid   837] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 5,
> 0) = -1 EINVAL (Invalid argument)
> 
> I don't see anything that can potentially cause EINVAL here. Is it
> possible that lkvm somehow affects kernel behavior here?
> 
> I run lkvm as:
> 
> $ taskset 1 /kvmtool/lkvm sandbox --disk syz-0 --mem=2048 --cpus=2
> --kernel /arch/x86/boot/bzImage --network mode=user --sandbox
> /workdir/kvm/syz-0.sh

It's possible that something in the virtio-9p layer is broken. I'll give
it a look in the evening.


Thanks,
Sasha


[PATCH v2 12/12] KVM: VMX: Dump TSC multiplier in dump_vmcs()

2015-10-20 Thread Haozhong Zhang
This patch enhances dump_vmcs() to dump the value of the TSC multiplier
field in the VMCS.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/vmx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a02b59c..66d25be 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8013,6 +8013,9 @@ static void dump_vmcs(void)
   vmcs_read32(IDT_VECTORING_INFO_FIELD),
   vmcs_read32(IDT_VECTORING_ERROR_CODE));
pr_err("TSC Offset = 0x%016lx\n", vmcs_readl(TSC_OFFSET));
+   if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
+   pr_err("TSC Multiplier = 0x%016lx\n",
+  vmcs_readl(TSC_MULTIPLIER));
if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW)
pr_err("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD));
if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR)
-- 
2.4.8



Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-20 Thread Christoffer Dall
On Mon, Oct 19, 2015 at 03:06:59PM -0700, Mario Smarduch wrote:
> 
> 
> On 10/18/2015 2:07 PM, Christoffer Dall wrote:
> > On Mon, Oct 12, 2015 at 09:29:23AM -0700, Mario Smarduch wrote:
> >> Hi Christoffer, Marc -
> >>   I just threw this test your way without any explanation.
> > 
> > I'm confused.  Did you send me something somewhere already?
> Yes in the last patchset
> 
> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-October/016698.html
> 
> I included a simple test I put together.
> 

Sorry, I missed that change in the cover letter.

> > 
> >>
> >> The test loops, does fp arithmetic and checks the truncated result.
> >> It could be a little more dynamic: have an initial run to
> >> get the sum to compare against while looping. Different fp
> >> hardware may come up with a different sum, but truncation is
> >> to the 5th decimal point.
> >>
> >> The rationale is that if there is any fp/simd corruption
> >> one of these runs should fail. I think most likely scenario
> >> for that is a world switch in midst of fp operation. I've
> >> instrumented (basically add some tracing to vcpu_put()) and
> >> validated vcpu_put gets called thousands of time (for v7,v8)
> >> for an over night test running two guests/host crunching
> >> fp operations.
> >>
> >> Other than that, not sure how to really catch any problems
> >> with the patches applied. Obviously this is a huge issue if these
> >> patches have any problems. If you or Marc have any other ideas
> >> I'd be happy to enhance the test.
> > 
> > I think it's important to run two VMs at the same time, each with some
> > floating-point work, and then run some floating point on the host at the
> > same time.
> > 
> > You can make that even more interesting by doing 32-bit guests at the
> > same time as well.
> 
> Yes that's the test combination I've been running.

ok, cool, then I trust these patches.

> > 
> > I believe Marc was running Paranoia
> > (http://www.netlib.org/paranoia/paranoia.c) to test the last lazy
> > series.
> 
> I'll try this test and run it for several days, see if anything shows up.
> 

I actually don't know what it does, i.e. if it just uses the FP hardware
or if it actually checks the results produced.

Several days may be unnecessary, but if your machine has nothing else to
do, then why not.

Thanks,
-Christoffer


Re: [PATCH v2 2/2] KVM/arm: enable enhanced armv7 fp/simd lazy switch

2015-10-20 Thread Christoffer Dall
On Mon, Oct 19, 2015 at 04:25:04PM -0700, Mario Smarduch wrote:
> 
> 
> On 10/19/2015 3:14 AM, Christoffer Dall wrote:
> > On Sat, Sep 26, 2015 at 04:43:29PM -0700, Mario Smarduch wrote:
> >> This patch enhances current lazy vfp/simd hardware switch. In addition to
> >> current lazy switch, it tracks vfp/simd hardware state with a vcpu 
> >> lazy flag. 
> >>
> >> vcpu lazy flag is set on guest access and trap to vfp/simd hardware switch 
> >> handler. On vm-enter if lazy flag is set skip trap enable and saving 
> >> host fpexc. On vm-exit if flag is set skip hardware context switch
> >> and return to host with guest context.
> >>
> >> On vcpu_put check if vcpu lazy flag is set, and execute a hardware context 
> >> switch to restore host.
> >>
> >> Signed-off-by: Mario Smarduch 
> >> ---
> >>  arch/arm/include/asm/kvm_asm.h |  1 +
> >>  arch/arm/kvm/arm.c | 17 
> >>  arch/arm/kvm/interrupts.S  | 60 
> >> +++---
> >>  arch/arm/kvm/interrupts_head.S | 12 ++---
> >>  4 files changed, 71 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_asm.h 
> >> b/arch/arm/include/asm/kvm_asm.h
> >> index 194c91b..4b45d79 100644
> >> --- a/arch/arm/include/asm/kvm_asm.h
> >> +++ b/arch/arm/include/asm/kvm_asm.h
> >> @@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
> >>  extern void __kvm_flush_vm_context(void);
> >>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> >>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >> +extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
> >>  
> >>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >>  #endif
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index ce404a5..79f49c7 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
> >>*(int *)rtn = 0;
> >>  }
> >>  
> >> +/**
> >> + * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
> >> + * @vcpu: pointer to vcpu structure.
> >> + *
> > 
> > nit: stray blank line
> ok
> > 
> >> + */
> >> +static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
> >> +{
> >> +#ifdef CONFIG_ARM
> >> +  if (vcpu->arch.vfp_lazy == 1) {
> >> +  kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
> > 
> > why do you have to do this in HYP mode ?
>  Calling it directly works fine. I moved the function outside hyp start/end
> range in interrupts.S. Not thinking outside the box, just thought let them all
> be hyp calls.
> 
> > 
> >> +  vcpu->arch.vfp_lazy = 0;
> >> +  }
> >> +#endif
> > 
> > we've tried to put stuff like this in header files to avoid the ifdefs
> > so far.  Could that be done here as well?
> 
> That was a to do, but didn't get around to it.
> > 
> >> +}
> >>  
> >>  /**
> >>   * kvm_arch_init_vm - initializes a VM data structure
> >> @@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>  
> >>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >>  {
> >> +  /* Check if Guest accessed VFP registers */
> > 
> > misleading comment: this function does more than checking
> Yep sure does, will change.
> > 
> >> +  kvm_switch_fp_regs(vcpu);
> >> +
> >>/*
> >> * The arch-generic KVM code expects the cpu field of a vcpu to be -1
> >> * if the vcpu is no longer assigned to a cpu.  This is used for the
> >> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> >> index 900ef6d..6d98232 100644
> >> --- a/arch/arm/kvm/interrupts.S
> >> +++ b/arch/arm/kvm/interrupts.S
> >> @@ -96,6 +96,29 @@ ENTRY(__kvm_flush_vm_context)
> >>bx  lr
> >>  ENDPROC(__kvm_flush_vm_context)
> >>  
> >> +/**
> >> + * void __kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes a lazy
> >> + * fp/simd switch, saves the guest, restores host.
> >> + *
> > 
> > nit: stray blank line
> ok.
> > 
> >> + */
> >> +ENTRY(__kvm_restore_host_vfp_state)
> >> +#ifdef CONFIG_VFPv3
> >> +  push{r3-r7}
> >> +
> >> +  add r7, r0, #VCPU_VFP_GUEST
> >> +  store_vfp_state r7
> >> +
> >> +  add r7, r0, #VCPU_VFP_HOST
> >> +  ldr r7, [r7]
> >> +  restore_vfp_state r7
> >> +
> >> +  ldr r3, [vcpu, #VCPU_VFP_FPEXC]
> > 
> > either use r0 or vcpu throughout this function please
> Yeah that's bad - in the same function too
> > 
> >> +  VFPFMXR FPEXC, r3
> >> +
> >> +  pop {r3-r7}
> >> +#endif
> >> +  bx  lr
> >> +ENDPROC(__kvm_restore_host_vfp_state)
> >>  
> >>  /
> >>   *  Hypervisor world-switch code
> >> @@ -119,11 +142,15 @@ ENTRY(__kvm_vcpu_run)
> >>@ If the host kernel has not been configured with VFPv3 support,
> >>@ then it is safer if we deny guests from using it as well.
> >>  #ifdef CONFIG_VFPv3
> >> +  @ r7 must be preserved until next vfp lazy check
> > 
> > I don't understand this comment
> > 
> >> +  vfp_inlazy_mode r7, 

[PATCH v2 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-10-20 Thread Haozhong Zhang
If the vcpu's TSC rate is not specified by the cpu option 'tsc-freq', we
use the value returned by KVM_GET_TSC_KHZ; otherwise, we use the
user-specified value.

Signed-off-by: Haozhong Zhang 
---
 target-i386/kvm.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 80d1a7e..698524a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2213,6 +2213,35 @@ static int kvm_get_debugregs(X86CPU *cpu)
 return 0;
 }
 
+static int kvm_setup_tsc_khz(X86CPU *cpu, int level)
+{
+CPUState *cs = CPU(cpu);
+CPUX86State *env = &cpu->env;
+int r;
+
+if (level < KVM_PUT_FULL_STATE)
+return 0;
+
+/*
+ * Prepare the vcpu's TSC rate (ie. env->tsc_khz_incoming) to be migrated.
+ * 1. If no user-specified value is provided, we will use the value from
+ *KVM;
+ * 2. Otherwise, we just use the user-specified value.
+ */
+if (!env->tsc_khz) {
+r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
+if (r < 0) {
+fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
+return r;
+}
+env->tsc_khz_incoming = r;
+} else {
+env->tsc_khz_incoming = env->tsc_khz;
+}
+
+return 0;
+}
+
 int kvm_arch_put_registers(CPUState *cpu, int level)
 {
 X86CPU *x86_cpu = X86_CPU(cpu);
@@ -2248,6 +2277,10 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 if (ret < 0) {
 return ret;
 }
+ret = kvm_setup_tsc_khz(x86_cpu, level);
+if (ret < 0) {
+return ret;
+}
 ret = kvm_put_msrs(x86_cpu, level);
 if (ret < 0) {
 return ret;
-- 
2.4.8



[PATCH v2 1/3] target-i386: add a subsection for migrating vcpu's TSC rate

2015-10-20 Thread Haozhong Zhang
The newly added subsection 'vmstate_tsc_khz' is used by the following
patches to migrate the vcpu's TSC rate. For backward migration
compatibility, this subsection is not migrated on pc-*-2.4 and older
machine types by default. If users do want to migrate this subsection on
older machine types, they can enable it by giving the new cpu option
'save-tsc-freq'.

Signed-off-by: Haozhong Zhang 
---
 include/hw/i386/pc.h  |  5 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  2 ++
 target-i386/machine.c | 19 +++
 4 files changed, 27 insertions(+)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 0503485..7fde50f 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -300,6 +300,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 #define PC_COMPAT_2_4 \
 HW_COMPAT_2_4 \
 {\
+.driver   = TYPE_X86_CPU,\
+.property = "save-tsc-freq",\
+.value= "off",\
+},\
+{\
 .driver   = "Haswell-" TYPE_X86_CPU,\
 .property = "abm",\
 .value= "off",\
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 05d7f26..b6bb457 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3143,6 +3143,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, false),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
+DEFINE_PROP_BOOL("save-tsc-freq", X86CPU, env.save_tsc_khz, true),
 DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
 DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
 DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 54d9d50..ba1a289 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -966,6 +966,8 @@ typedef struct CPUX86State {
 uint32_t sipi_vector;
 bool tsc_valid;
 int64_t tsc_khz;
+int64_t tsc_khz_incoming;
+bool save_tsc_khz;
 void *kvm_xsave_buf;
 
 uint64_t mcg_cap;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 9fa0563..7d68d63 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -752,6 +752,24 @@ static const VMStateDescription vmstate_xss = {
 }
 };
 
+static bool tsc_khz_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+return env->tsc_khz_incoming && env->save_tsc_khz;
+}
+
+static const VMStateDescription vmstate_tsc_khz = {
+.name = "cpu/tsc_khz",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = tsc_khz_needed,
+.fields = (VMStateField[]) {
+VMSTATE_INT64(env.tsc_khz_incoming, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
 VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -871,6 +889,7 @@ VMStateDescription vmstate_x86_cpu = {
&vmstate_msr_hyperv_crash,
&vmstate_avx512,
&vmstate_xss,
+&vmstate_tsc_khz,
 NULL
 }
 };
-- 
2.4.8



[PATCH v2 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-10-20 Thread Haozhong Zhang
This patchset enables QEMU to save/restore the vcpu's TSC rate during
migration. When cooperating with KVM that supports TSC scaling, guest
programs can observe a consistent guest TSC rate even when they are
migrated among machines with different host TSC rates.

A pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' are added to
control the migration of vcpu's TSC rate.
 * By default, the migration of vcpu's TSC rate is enabled only on
   pc-*-2.5 and newer machine types. If the cpu option 'save-tsc-freq'
   is present, the vcpu's TSC rate will be migrated from older machine
   types as well.
 * Another cpu option 'load-tsc-freq' controls whether the migrated
   vcpu's TSC rate is used. By default, QEMU will not use the migrated
   TSC rate if this option is not present. Otherwise, QEMU will use
   the migrated TSC rate and override the TSC rate given by the cpu
   option 'tsc-freq'.

Changes in v2:
 * Add a pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' to
   control the migration of vcpu's TSC rate.
 * Move all logic of setting TSC rate to target-i386.
 * Remove the duplicated TSC setup in kvm_arch_init_vcpu().

Haozhong Zhang (3):
  target-i386: add a subsection for migrating vcpu's TSC rate
  target-i386: calculate vcpu's TSC rate to be migrated
  target-i386: load the migrated vcpu's TSC rate

 include/hw/i386/pc.h  |  5 +
 target-i386/cpu.c |  2 ++
 target-i386/cpu.h |  3 +++
 target-i386/kvm.c | 61 +++
 target-i386/machine.c | 19 
 5 files changed, 81 insertions(+), 9 deletions(-)

-- 
2.4.8



[PATCH v2 3/3] target-i386: load the migrated vcpu's TSC rate

2015-10-20 Thread Haozhong Zhang
Set the vcpu's TSC rate to the migrated value (if any). If KVM supports
TSC scaling, guest programs will observe the TSC increasing at the
migrated rate rather than at the host TSC rate.

The loading is controlled by a new cpu option 'load-tsc-freq'. If it is
present, then the loading will be enabled and the migrated vcpu's TSC
rate will override the value specified by the cpu option
'tsc-freq'. Otherwise, the loading will be disabled.

The setting of the vcpu's TSC rate in this patch duplicates the code in
kvm_arch_init_vcpu(), so we remove the latter.

Signed-off-by: Haozhong Zhang 
---
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  1 +
 target-i386/kvm.c | 28 +++-
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index b6bb457..763ba4b 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3144,6 +3144,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
 DEFINE_PROP_BOOL("save-tsc-freq", X86CPU, env.save_tsc_khz, true),
+DEFINE_PROP_BOOL("load-tsc-freq", X86CPU, env.load_tsc_khz, false),
 DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
 DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
 DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index ba1a289..353f5fb 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -968,6 +968,7 @@ typedef struct CPUX86State {
 int64_t tsc_khz;
 int64_t tsc_khz_incoming;
 bool save_tsc_khz;
+bool load_tsc_khz;
 void *kvm_xsave_buf;
 
 uint64_t mcg_cap;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 698524a..34616f5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -743,15 +743,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
 return r;
 }
 
-r = kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL);
-if (r && env->tsc_khz) {
-r = kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz);
-if (r < 0) {
-fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");
-return r;
-}
-}
-
 if (kvm_has_xsave()) {
 env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
 }
@@ -2223,6 +2214,25 @@ static int kvm_setup_tsc_khz(X86CPU *cpu, int level)
 return 0;
 
 /*
+ * If the cpu option 'load-tsc-freq' is present, the vcpu's TSC rate in the
+ * migrated state will be used and overrides the user-specified vcpu's
+ * TSC rate (if any).
+ */
+if (runstate_check(RUN_STATE_INMIGRATE) &&
+env->load_tsc_khz && env->tsc_khz_incoming) {
+env->tsc_khz = env->tsc_khz_incoming;
+}
+
+r = kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL);
+if (r && env->tsc_khz) {
+r = kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz);
+if (r < 0) {
+fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");
+return r;
+}
+}
+
+/*
  * Prepare the vcpu's TSC rate (ie. env->tsc_khz_incoming) to be migrated.
  * 1. If no user-specified value is provided, we will use the value from
  *KVM;
-- 
2.4.8



[PATCH v2 07/12] KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()

2015-10-20 Thread Haozhong Zhang
Both VMX and SVM scale the host TSC in the same way in the call-back
read_l1_tsc(), so this patch moves the scaling logic from the call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/lapic.c |  4 ++--
 arch/x86/kvm/svm.c   |  3 +--
 arch/x86/kvm/x86.c   | 11 ---
 include/linux/kvm_host.h |  1 +
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 168b875..355a400 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1250,7 +1250,7 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 
tsc_deadline = apic->lapic_timer.expired_tscdeadline;
apic->lapic_timer.expired_tscdeadline = 0;
-   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, rdtsc());
+   guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
 
/* __delay is delay_tsc whenever the hardware has TSC, thus always.  */
@@ -1318,7 +1318,7 @@ static void start_apic_timer(struct kvm_lapic *apic)
local_irq_save(flags);
 
now = apic->lapic_timer.timer.base->get_time();
-   guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, rdtsc());
+   guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
if (likely(tscdeadline > guest_tsc)) {
ns = (tscdeadline - guest_tsc) * 1000000ULL;
do_div(ns, this_tsc_khz);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9cfc02a..8e46be1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2985,8 +2985,7 @@ static int cr8_write_interception(struct vcpu_svm *svm)
 static u64 svm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 {
struct vmcb *vmcb = get_host_vmcb(to_svm(vcpu));
-   return vmcb->control.tsc_offset +
-   kvm_scale_tsc(vcpu, host_tsc);
+   return vmcb->control.tsc_offset + host_tsc;
 }
 
 static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 75129bd..f2516bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1406,6 +1406,12 @@ static u64 kvm_compute_tsc_offset(struct kvm_vcpu *vcpu, 
u64 target_tsc)
return target_tsc - tsc;
 }
 
+u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
+{
+   return kvm_x86_ops->read_l1_tsc(vcpu, kvm_scale_tsc(vcpu, host_tsc));
+}
+EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
+
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
struct kvm *kvm = vcpu->kvm;
@@ -1729,7 +1735,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
kernel_ns = get_kernel_ns();
}
 
-   tsc_timestamp = kvm_x86_ops->read_l1_tsc(v, host_tsc);
+   tsc_timestamp = kvm_read_l1_tsc(v, host_tsc);
 
/*
 * We may have to catch up the TSC to match elapsed wall clock
@@ -6532,8 +6538,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (hw_breakpoint_active())
hw_breakpoint_restore();
 
-   vcpu->arch.last_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu,
-  rdtsc());
+   vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 
vcpu->mode = OUTSIDE_GUEST_MODE;
smp_wmb();
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 95a6bf2..0d3fd3c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1185,6 +1185,7 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, 
unsigned int host_irq,
 #endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
 
 u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
+u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
 
 extern struct kvm_x86_ops *kvm_x86_ops;
 
-- 
2.4.8



[PATCH v2 03/12] KVM: x86: Add a common TSC scaling function

2015-10-20 Thread Haozhong Zhang
VMX and SVM calculate the TSC scaling ratio with similar logic, so this
patch generalizes it into a common TSC scaling function.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/svm.c   | 48 +++--
 arch/x86/kvm/x86.c   | 45 +++
 include/linux/kvm_host.h |  3 +++
 include/linux/math64.h   | 70 
 4 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 04b58cf..d347170 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -212,7 +212,6 @@ static int nested_svm_intercept(struct vcpu_svm *svm);
 static int nested_svm_vmexit(struct vcpu_svm *svm);
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
  bool has_error_code, u32 error_code);
-static u64 __scale_tsc(u64 ratio, u64 tsc);
 
 enum {
VMCB_INTERCEPTS, /* Intercept vectors, TSC offset,
@@ -892,21 +891,7 @@ static __init int svm_hardware_setup(void)
kvm_enable_efer_bits(EFER_FFXSR);
 
if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
-   u64 max;
-
kvm_has_tsc_control = true;
-
-   /*
-* Make sure the user can only configure tsc_khz values that
-* fit into a signed integer.
-* A min value is not calculated needed because it will always
-* be 1 on all machines and a value of 0 is used to disable
-* tsc-scaling for the vcpu.
-*/
-   max = min(0x7fffffffULL, __scale_tsc(tsc_khz, TSC_RATIO_MAX));
-
-   kvm_max_guest_tsc_khz = max;
-
kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX;
kvm_tsc_scaling_ratio_frac_bits = 32;
}
@@ -973,31 +958,6 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t 
type)
seg->base = 0;
 }
 
-static u64 __scale_tsc(u64 ratio, u64 tsc)
-{
-   u64 mult, frac, _tsc;
-
-   mult  = ratio >> 32;
-   frac  = ratio & ((1ULL << 32) - 1);
-
-   _tsc  = tsc;
-   _tsc *= mult;
-   _tsc += (tsc >> 32) * frac;
-   _tsc += ((tsc & ((1ULL << 32) - 1)) * frac) >> 32;
-
-   return _tsc;
-}
-
-static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
-{
-   u64 _tsc = tsc;
-
-   if (vcpu->arch.tsc_scaling_ratio != TSC_RATIO_DEFAULT)
-   _tsc = __scale_tsc(vcpu->arch.tsc_scaling_ratio, tsc);
-
-   return _tsc;
-}
-
 static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
scale)
 {
u64 ratio;
@@ -1066,7 +1026,7 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, 
s64 adjustment, bool ho
if (host) {
if (vcpu->arch.tsc_scaling_ratio != TSC_RATIO_DEFAULT)
WARN_ON(adjustment < 0);
-   adjustment = svm_scale_tsc(vcpu, (u64)adjustment);
+   adjustment = kvm_scale_tsc(vcpu, (u64)adjustment);
}
 
svm->vmcb->control.tsc_offset += adjustment;
@@ -1084,7 +1044,7 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, 
u64 target_tsc)
 {
u64 tsc;
 
-   tsc = svm_scale_tsc(vcpu, rdtsc());
+   tsc = kvm_scale_tsc(vcpu, rdtsc());
 
return target_tsc - tsc;
 }
@@ -3076,7 +3036,7 @@ static u64 svm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 
host_tsc)
 {
struct vmcb *vmcb = get_host_vmcb(to_svm(vcpu));
return vmcb->control.tsc_offset +
-   svm_scale_tsc(vcpu, host_tsc);
+   kvm_scale_tsc(vcpu, host_tsc);
 }
 
 static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -3086,7 +3046,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
switch (msr_info->index) {
case MSR_IA32_TSC: {
msr_info->data = svm->vmcb->control.tsc_offset +
-   svm_scale_tsc(vcpu, rdtsc());
+   kvm_scale_tsc(vcpu, rdtsc());
 
break;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8849e8b..29c5781 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1328,6 +1328,39 @@ static void update_ia32_tsc_adjust_msr(struct kvm_vcpu 
*vcpu, s64 offset)
vcpu->arch.ia32_tsc_adjust_msr += offset - curr_offset;
 }
 
+/*
+ * Multiply tsc by a fixed point number represented by ratio.
+ *
+ * The most significant 64-N bits (mult) of ratio represent the
+ * integral part of the fixed point number; the remaining N bits
+ * (frac) represent the fractional part, ie. ratio represents a fixed
+ * point number (mult + frac * 2^(-N)).
+ *
+ * N.B: we always assume not all 64 bits of ratio are used for the
+ * fractional part and the ratio has at least 1 bit for the fractional
+ * part, i.e. 0 < N < 64.
+ *
+ * N equals to kvm_tsc_scaling_ratio_frac_bits.
+ */
+static inline u64 __scale_tsc(u64 ratio, u64 tsc)
+{
+   

[PATCH v2 04/12] KVM: x86: Replace call-back set_tsc_khz() with a common function

2015-10-20 Thread Haozhong Zhang
Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/svm.c  | 36 
 arch/x86/kvm/vmx.c  | 17 ---
 arch/x86/kvm/x86.c  | 46 -
 include/linux/math64.h  | 29 ++
 5 files changed, 70 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1e08ad5..c67469b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -853,7 +853,6 @@ struct kvm_x86_ops {
 
bool (*has_wbinvd_exit)(void);
 
-   void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
scale);
u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu);
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d347170..a1364927 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -958,41 +958,6 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t 
type)
seg->base = 0;
 }
 
-static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
scale)
-{
-   u64 ratio;
-   u64 khz;
-
-   /* Guest TSC same frequency as host TSC? */
-   if (!scale) {
-   vcpu->arch.tsc_scaling_ratio = TSC_RATIO_DEFAULT;
-   return;
-   }
-
-   /* TSC scaling supported? */
-   if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
-   if (user_tsc_khz > tsc_khz) {
-   vcpu->arch.tsc_catchup = 1;
-   vcpu->arch.tsc_always_catchup = 1;
-   } else
-   WARN(1, "user requested TSC rate below hardware 
speed\n");
-   return;
-   }
-
-   khz = user_tsc_khz;
-
-   /* TSC scaling required  - calculate ratio */
-   ratio = khz << 32;
-   do_div(ratio, tsc_khz);
-
-   if (ratio == 0 || ratio & TSC_RATIO_RSVD) {
-   WARN_ONCE(1, "Invalid TSC ratio - virtual-tsc-khz=%u\n",
-   user_tsc_khz);
-   return;
-   }
-   vcpu->arch.tsc_scaling_ratio = ratio;
-}
-
 static u64 svm_read_tsc_offset(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -4403,7 +4368,6 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.has_wbinvd_exit = svm_has_wbinvd_exit,
 
-   .set_tsc_khz = svm_set_tsc_khz,
.read_tsc_offset = svm_read_tsc_offset,
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset = svm_adjust_tsc_offset,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 15bff51..7f87cf6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2382,22 +2382,6 @@ static u64 vmx_read_l1_tsc(struct kvm_vcpu *vcpu, u64 
host_tsc)
return host_tsc + tsc_offset;
 }
 
-/*
- * Engage any workarounds for mis-matched TSC rates.  Currently limited to
- * software catchup for faster rates on slower CPUs.
- */
-static void vmx_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
scale)
-{
-   if (!scale)
-   return;
-
-   if (user_tsc_khz > tsc_khz) {
-   vcpu->arch.tsc_catchup = 1;
-   vcpu->arch.tsc_always_catchup = 1;
-   } else
-   WARN(1, "user requested TSC rate below hardware speed\n");
-}
-
 static u64 vmx_read_tsc_offset(struct kvm_vcpu *vcpu)
 {
return vmcs_read64(TSC_OFFSET);
@@ -10828,7 +10812,6 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
-   .set_tsc_khz = vmx_set_tsc_khz,
.read_tsc_offset = vmx_read_tsc_offset,
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset = vmx_adjust_tsc_offset,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29c5781..db5ef73 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1252,7 +1252,43 @@ static u32 adjust_tsc_khz(u32 khz, s32 ppm)
return v;
 }
 
-static void kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 this_tsc_khz)
+static int set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
+{
+   u64 ratio;
+
+   /* Guest TSC same frequency as host TSC? */
+   if (!scale) {
+   vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
+   return 0;
+   }
+
+   /* TSC scaling supported? */
+   if (!kvm_has_tsc_control) {
+   if (user_tsc_khz > tsc_khz) {
+   vcpu->arch.tsc_catchup = 1;
+   vcpu->arch.tsc_always_catchup = 1;
+   return 0;
+   } else {
+   WARN(1, "user requested TSC rate below hardware 
speed\n");
+   return -1;
+   }
+   }
+
+   

[PATCH v2 06/12] KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()

2015-10-20 Thread Haozhong Zhang
For both VMX and SVM, if the 2nd argument of the call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and renames the call-back adjust_tsc_offset()
to adjust_tsc_offset_guest().

Signed-off-by: Haozhong Zhang 
---
 arch/x86/include/asm/kvm_host.h | 15 +--
 arch/x86/kvm/svm.c  | 10 ++
 arch/x86/kvm/vmx.c  |  4 ++--
 include/linux/kvm_host.h| 16 
 4 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d5e820b..b70cebb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -845,7 +845,7 @@ struct kvm_x86_ops {
int (*get_lpage_level)(void);
bool (*rdtscp_supported)(void);
bool (*invpcid_supported)(void);
-   void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment, bool 
host);
+   void (*adjust_tsc_offset_guest)(struct kvm_vcpu *vcpu, s64 adjustment);
 
void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
 
@@ -920,19 +920,6 @@ struct kvm_arch_async_pf {
bool direct_map;
 };
 
-extern struct kvm_x86_ops *kvm_x86_ops;
-
-static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
-  s64 adjustment)
-{
-   kvm_x86_ops->adjust_tsc_offset(vcpu, adjustment, false);
-}
-
-static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 
adjustment)
-{
-   kvm_x86_ops->adjust_tsc_offset(vcpu, adjustment, true);
-}
-
 int kvm_mmu_module_init(void);
 void kvm_mmu_module_exit(void);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 481fdd3..9cfc02a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -984,16 +984,10 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, 
u64 offset)
mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
-static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment, bool 
host)
+static void svm_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   if (host) {
-   if (vcpu->arch.tsc_scaling_ratio != TSC_RATIO_DEFAULT)
-   WARN_ON(adjustment < 0);
-   adjustment = kvm_scale_tsc(vcpu, (u64)adjustment);
-   }
-
svm->vmcb->control.tsc_offset += adjustment;
if (is_guest_mode(vcpu))
svm->nested.hsave->control.tsc_offset += adjustment;
@@ -4361,7 +4355,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.read_tsc_offset = svm_read_tsc_offset,
.write_tsc_offset = svm_write_tsc_offset,
-   .adjust_tsc_offset = svm_adjust_tsc_offset,
+   .adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest,
.read_l1_tsc = svm_read_l1_tsc,
 
.set_tdp_cr3 = set_tdp_cr3,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7896395..1f72480 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2413,7 +2413,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, 
u64 offset)
}
 }
 
-static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment, bool 
host)
+static void vmx_adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, s64 adjustment)
 {
u64 offset = vmcs_read64(TSC_OFFSET);
 
@@ -10809,7 +10809,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
.read_tsc_offset = vmx_read_tsc_offset,
.write_tsc_offset = vmx_write_tsc_offset,
-   .adjust_tsc_offset = vmx_adjust_tsc_offset,
+   .adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest,
.read_l1_tsc = vmx_read_l1_tsc,
 
.set_tdp_cr3 = vmx_set_cr3,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3556148..95a6bf2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1186,4 +1186,20 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, 
unsigned int host_irq,
 
 u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc);
 
+extern struct kvm_x86_ops *kvm_x86_ops;
+
+static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
+  s64 adjustment)
+{
+   kvm_x86_ops->adjust_tsc_offset_guest(vcpu, adjustment);
+}
+
+static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 
adjustment)
+{
+   if (vcpu->arch.tsc_scaling_ratio != kvm_default_tsc_scaling_ratio)
+   WARN_ON(adjustment < 0);
+   adjustment = kvm_scale_tsc(vcpu, (u64) adjustment);
+   kvm_x86_ops->adjust_tsc_offset_guest(vcpu, adjustment);
+}
+
 #endif
-- 
2.4.8

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 09/12] KVM: VMX: Enable and initialize VMX TSC scaling

2015-10-20 Thread Haozhong Zhang
This patch enhances the kvm-intel module to enable VMX TSC scaling and
to collect TSC scaling ratio information during initialization.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/include/asm/vmx.h |  3 +++
 arch/x86/kvm/vmx.c | 19 ++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index aa336ff..14c63c7 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -73,6 +73,7 @@
 #define SECONDARY_EXEC_ENABLE_PML   0x00020000
 #define SECONDARY_EXEC_XSAVES  0x00100000
 #define SECONDARY_EXEC_PCOMMIT 0x00200000
+#define SECONDARY_EXEC_TSC_SCALING  0x02000000
 
 #define PIN_BASED_EXT_INTR_MASK 0x00000001
 #define PIN_BASED_NMI_EXITING   0x00000008
@@ -167,6 +168,8 @@ enum vmcs_field {
VMWRITE_BITMAP  = 0x2028,
XSS_EXIT_BITMAP = 0x202C,
XSS_EXIT_BITMAP_HIGH= 0x202D,
+   TSC_MULTIPLIER  = 0x2032,
+   TSC_MULTIPLIER_HIGH = 0x2033,
GUEST_PHYSICAL_ADDRESS  = 0x2400,
GUEST_PHYSICAL_ADDRESS_HIGH = 0x2401,
VMCS_LINK_POINTER   = 0x2800,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f72480..a86f790 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -107,6 +107,9 @@ static u64 __read_mostly host_xss;
 static bool __read_mostly enable_pml = 1;
 module_param_named(pml, enable_pml, bool, S_IRUGO);
 
+#define KVM_VMX_TSC_MULTIPLIER_DEFAULT 0x0001000000000000ULL
+#define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL
+
 #define KVM_GUEST_CR0_MASK (X86_CR0_NW | X86_CR0_CD)
 #define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST (X86_CR0_WP | X86_CR0_NE)
 #define KVM_VM_CR0_ALWAYS_ON   \
@@ -1172,6 +1175,12 @@ static inline bool cpu_has_vmx_pml(void)
return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
 }
 
+static inline bool cpu_has_vmx_tsc_scaling(void)
+{
+   return vmcs_config.cpu_based_2nd_exec_ctrl &
+   SECONDARY_EXEC_TSC_SCALING;
+}
+
 static inline bool report_flexpriority(void)
 {
return flexpriority_enabled;
@@ -3133,7 +3142,8 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
SECONDARY_EXEC_SHADOW_VMCS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_ENABLE_PML |
-   SECONDARY_EXEC_PCOMMIT;
+   SECONDARY_EXEC_PCOMMIT |
+   SECONDARY_EXEC_TSC_SCALING;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
@@ -6177,6 +6187,13 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_apicv())
enable_apicv = 0;
 
+   if (cpu_has_vmx_tsc_scaling()) {
+   kvm_has_tsc_control = true;
+   kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
+   kvm_tsc_scaling_ratio_frac_bits = 48;
+   }
+   kvm_default_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_DEFAULT;
+
if (enable_apicv)
kvm_x86_ops->update_cr8_intercept = NULL;
else {
-- 
2.4.8



[PATCH v2 11/12] KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC

2015-10-20 Thread Haozhong Zhang
This patch makes kvm-intel return a scaled host TSC plus the TSC
offset when handling guest reads of MSR_IA32_TSC.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/kvm/vmx.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c241ff3..a02b59c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2372,15 +2372,16 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 
 /*
  * reads and returns guest's timestamp counter "register"
- * guest_tsc = host_tsc + tsc_offset    -- 21.3
+ * guest_tsc = (host_tsc * tsc multiplier) >> 48 + tsc_offset
+ * -- Intel TSC Scaling for Virtualization White Paper, sec 1.3
  */
-static u64 guest_read_tsc(void)
+static u64 guest_read_tsc(struct kvm_vcpu *vcpu)
 {
u64 host_tsc, tsc_offset;
 
host_tsc = rdtsc();
tsc_offset = vmcs_read64(TSC_OFFSET);
-   return host_tsc + tsc_offset;
+   return kvm_scale_tsc(vcpu, host_tsc) + tsc_offset;
 }
 
 /*
@@ -2772,7 +2773,7 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_EFER:
return kvm_get_msr_common(vcpu, msr_info);
case MSR_IA32_TSC:
-   msr_info->data = guest_read_tsc();
+   msr_info->data = guest_read_tsc(vcpu);
break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
-- 
2.4.8



[PATCH v2 02/12] KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch

2015-10-20 Thread Haozhong Zhang
This patch moves the TSC scaling ratio field from the
architecture-specific struct vcpu_svm to the common struct kvm_vcpu_arch.

Signed-off-by: Haozhong Zhang 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm.c  | 23 +--
 arch/x86/kvm/x86.c  |  5 -
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0540dc8..1e08ad5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_vcpu_arch {
u32 virtual_tsc_mult;
u32 virtual_tsc_khz;
s64 ia32_tsc_adjust_msr;
+   u64 tsc_scaling_ratio;
 
atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
unsigned nmi_pending; /* NMI queued after currently running handler */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 55f5f49..04b58cf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -158,8 +158,6 @@ struct vcpu_svm {
unsigned long int3_rip;
u32 apf_reason;
 
-   u64  tsc_ratio;
-
/* cached guest cpuid flags for faster access */
bool nrips_enabled  : 1;
 };
@@ -992,24 +990,22 @@ static u64 __scale_tsc(u64 ratio, u64 tsc)
 
 static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
 {
-   struct vcpu_svm *svm = to_svm(vcpu);
u64 _tsc = tsc;
 
-   if (svm->tsc_ratio != TSC_RATIO_DEFAULT)
-   _tsc = __scale_tsc(svm->tsc_ratio, tsc);
+   if (vcpu->arch.tsc_scaling_ratio != TSC_RATIO_DEFAULT)
+   _tsc = __scale_tsc(vcpu->arch.tsc_scaling_ratio, tsc);
 
return _tsc;
 }
 
 static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
scale)
 {
-   struct vcpu_svm *svm = to_svm(vcpu);
u64 ratio;
u64 khz;
 
/* Guest TSC same frequency as host TSC? */
if (!scale) {
-   svm->tsc_ratio = TSC_RATIO_DEFAULT;
+   vcpu->arch.tsc_scaling_ratio = TSC_RATIO_DEFAULT;
return;
}
 
@@ -1034,7 +1030,7 @@ static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 
user_tsc_khz, bool scale)
user_tsc_khz);
return;
}
-   svm->tsc_ratio = ratio;
+   vcpu->arch.tsc_scaling_ratio = ratio;
 }
 
 static u64 svm_read_tsc_offset(struct kvm_vcpu *vcpu)
@@ -1068,7 +1064,7 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, 
s64 adjustment, bool ho
struct vcpu_svm *svm = to_svm(vcpu);
 
if (host) {
-   if (svm->tsc_ratio != TSC_RATIO_DEFAULT)
+   if (vcpu->arch.tsc_scaling_ratio != TSC_RATIO_DEFAULT)
WARN_ON(adjustment < 0);
adjustment = svm_scale_tsc(vcpu, (u64)adjustment);
}
@@ -1240,8 +1236,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
goto out;
}
 
-   svm->tsc_ratio = TSC_RATIO_DEFAULT;
-
	err = kvm_vcpu_init(&svm->vcpu, kvm, id);
if (err)
goto free_svm;
@@ -1311,6 +1305,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
int i;
+   u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio;
 
if (unlikely(cpu != vcpu->cpu)) {
svm->asid_generation = 0;
@@ -1328,9 +1323,9 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 
if (static_cpu_has(X86_FEATURE_TSCRATEMSR) &&
-   svm->tsc_ratio != __this_cpu_read(current_tsc_ratio)) {
-   __this_cpu_write(current_tsc_ratio, svm->tsc_ratio);
-   wrmsrl(MSR_AMD64_TSC_RATIO, svm->tsc_ratio);
+   tsc_ratio != __this_cpu_read(current_tsc_ratio)) {
+   __this_cpu_write(current_tsc_ratio, tsc_ratio);
+   wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio);
}
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79cbbb5..8849e8b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1258,8 +1258,11 @@ static void kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 
this_tsc_khz)
int use_scaling = 0;
 
/* tsc_khz can be zero if TSC calibration fails */
-   if (this_tsc_khz == 0)
+   if (this_tsc_khz == 0) {
+   /* set tsc_scaling_ratio to a safe value */
+   vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
return;
+   }
 
/* Compute a scale to convert nanoseconds in TSC cycles */
kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
-- 
2.4.8



Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Arnaldo Carvalho de Melo
Em Fri, Oct 09, 2015 at 10:10:13PM +0200, Paolo Bonzini escreveu:
> On 01/10/2015 12:28, Paolo Bonzini wrote:
> > The format of the role word has changed through the years and the
> > plugin was never updated; some VMX exit reasons were missing too.
> > 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  tools/lib/traceevent/plugin_kvm.c | 25 +
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/lib/traceevent/plugin_kvm.c 
> > b/tools/lib/traceevent/plugin_kvm.c
> > index 88fe83dff7cd..18536f756577 100644
> > --- a/tools/lib/traceevent/plugin_kvm.c
> > +++ b/tools/lib/traceevent/plugin_kvm.c
> > @@ -124,7 +124,10 @@ static const char *disassemble(unsigned char *insn, 
> > int len, uint64_t rip,
> > _ER(WBINVD,  54)\
> > _ER(XSETBV,  55)\
> > _ER(APIC_WRITE,  56)\
> > -   _ER(INVPCID, 58)
> > +   _ER(INVPCID, 58)\
> > +   _ER(PML_FULL,62)\
> > +   _ER(XSAVES,  63)\
> > +   _ER(XRSTORS, 64)
> >  
> >  #define SVM_EXIT_REASONS \
> > _ER(EXIT_READ_CR0,  0x000)  \
> > @@ -352,15 +355,18 @@ static int kvm_nested_vmexit_handler(struct trace_seq 
> > *s, struct pevent_record *
> >  union kvm_mmu_page_role {
> > unsigned word;
> > struct {
> > -   unsigned glevels:4;
> > unsigned level:4;
> > +   unsigned cr4_pae:1;
> > unsigned quadrant:2;
> > -   unsigned pad_for_nice_hex_output:6;
> > unsigned direct:1;
> > unsigned access:3;
> > unsigned invalid:1;
> > -   unsigned cr4_pge:1;
> > unsigned nxe:1;
> > +   unsigned cr0_wp:1;
> > +   unsigned smep_and_not_wp:1;
> > +   unsigned smap_and_not_wp:1;
> > +   unsigned pad_for_nice_hex_output:8;
> > +   unsigned smm:8;
> > };
> >  };
> >  
> > @@ -385,15 +391,18 @@ static int kvm_mmu_print_role(struct trace_seq *s, 
> > struct pevent_record *record,
> > if (pevent_is_file_bigendian(event->pevent) ==
> > pevent_is_host_bigendian(event->pevent)) {
> >  
> > -   trace_seq_printf(s, "%u/%u q%u%s %s%s %spge %snxe",
> > +   trace_seq_printf(s, "%u q%u%s %s%s %spae %snxe %swp%s%s%s",
> >  role.level,
> > -role.glevels,
> >  role.quadrant,
> >  role.direct ? " direct" : "",
> >  access_str[role.access],
> >  role.invalid ? " invalid" : "",
> > -role.cr4_pge ? "" : "!",
> > -role.nxe ? "" : "!");
> > +role.cr4_pae ? "" : "!",
> > +role.nxe ? "" : "!",
> > +role.cr0_wp ? "" : "!",
> > +role.smep_and_not_wp ? " smep" : "",
> > +role.smap_and_not_wp ? " smap" : "",
> > +role.smm ? " smm" : "");
> > } else
> > trace_seq_printf(s, "WORD: %08x", role.word);
> >  
> > 
> 
> Ping?  Arnaldo, ok to include this patch in my tree?

Applying, I saw it before, but lost track, perhaps was waiting for
Steven Rostedt to chime in, but now I noticed he wasn't CCed, he is now.

- Arnaldo


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-20 Thread Laszlo Ersek
Hi,

On 10/20/15 19:27, Janusz wrote:
> W dniu 15.10.2015 o 20:46, Laszlo Ersek pisze:
>> On 10/15/15 18:53, Kinney, Michael D wrote:
>>> Laszlo,
>>>
>>> There is already a PCD for this timeout that is used by CpuMpPei.
>>>
>>> gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
>>>
>>> I noticed that CpuDxe is using a hard coded AP timeout.  I think we should 
>>> just use this same PCD for both the PEI and DXE CPU module and then set it 
>>> for OVMF to the compatible value.
>> Perfect, thank you!
>>
>> (I notice the default in the DEC file is 5, which is half of what
>> the DXE driver hardcodes.)
>>
>> Now we only need a recommended (or experimental) value for it, and an
>> explanation why 100*1000 is no longer sufficient on KVM :)
>>
>> Thanks!
>> Laszlo
>>
>>
>>
> Laszlo,
> 
> I saw that there is already some change in ovmf for MicroSecondDelay
> https://github.com/tianocore/edk2/commit/1e410eadd80c328e66868263b3006a274ce81ae0
> Is that a fix for it? Because I tried it and it still doesn't work for
> me: https://bpaste.net/show/2514b51bf41f
> I still get internal error

I think you guys are now "mature enough OVMF users" to start employing
the correct terminology.

"edk2" (also spelled as "EDK II") is: "a modern, feature-rich,
cross-platform firmware development environment for the UEFI and PI
specifications".

The source tree contains a whole bunch of modules (drivers,
applications, libraries), organized into packages.

"OVMF" usually denotes a firmware binary built from one of the
OvmfPkg/OvmfPkg*.dsc "platform description files". Think of them as "top
level makefiles". The difference between them is the target architecture
(there's Ia32, X64, and Ia32X64 -- the last one means that the SEC and
PEI phases are 32-bit, whereas the DXE and later phases are 64-bit.) In
practice you'll only care about full X64.

Now, each of OvmfPkg/OvmfPkg*.dsc builds the following three kinds of
modules into the final binary:
- platform-independent modules from various top-level packages
- platform- (ie. Ia32/X64-) dependent modules from various top-level
  packages
- modules from under OvmfPkg that are specific to QEMU/KVM (and Xen, if
  you happen to use OVMF with Xen)

Now, when you reference a commit like 1e410ead above, you can look at
the diffstat, and decide if it is OvmfPkg-specific (third category
above) or not. Here you see UefiCpuPkg, which happens to be the second
category.

The important point is: please do *not* call any and all edk2 patches
"OVMF changes", indiscriminately. That's super confusing for people who
understand the above distinctions. Which now you do too. :)

Let me add that in edk2, patches that straddle top level packages are
generally forbidden -- you can't have a patch that modifies OvmfPkg and
UefiCpuPkg at the same time, modulo *very* rare exceptions. If a feature
or bugfix needs to touch several top-level packages, the series must be
built up carefully in stages.

Knowing all of the above, you can tell that the patch you referenced had
only *enabled* OvmfPkg to customize UefiCpuPkg, via
"PcdCpuApInitTimeOutInMicroSeconds". But for that customization to occur
actually, a small patch for OvmfPkg will be necessary too, in order to
set "PcdCpuApInitTimeOutInMicroSeconds" differently from the default.

I plan to send that patch soon. If you'd like to be CC'd, that's great
(reporting back with a Tested-by is even better!), but I'll need your
real name for that. (Or any name that looks like a real name.)

Thanks!
Laszlo


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-20 Thread Janusz
W dniu 15.10.2015 o 20:46, Laszlo Ersek pisze:
> On 10/15/15 18:53, Kinney, Michael D wrote:
>> Laszlo,
>>
>> There is already a PCD for this timeout that is used by CpuMpPei.
>>
>>  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
>>
>> I noticed that CpuDxe is using a hard coded AP timeout.  I think we should 
>> just use this same PCD for both the PEI and DXE CPU module and then set it 
>> for OVMF to the compatible value.
> Perfect, thank you!
>
> (I notice the default in the DEC file is 5, which is half of what
> the DXE driver hardcodes.)
>
> Now we only need a recommended (or experimental) value for it, and an
> explanation why 100*1000 is no longer sufficient on KVM :)
>
> Thanks!
> Laszlo
>
>
>
Laszlo,

I saw that there is already some change in ovmf for MicroSecondDelay
https://github.com/tianocore/edk2/commit/1e410eadd80c328e66868263b3006a274ce81ae0
Is that a fix for it? Because I tried it and it still doesn't work for
me: https://bpaste.net/show/2514b51bf41f
I still get internal error



Re: [PATCH 3/3] arm/arm64: KVM: Fix disabled distributor operation

2015-10-20 Thread Eric Auger
On 10/20/2015 11:44 AM, Christoffer Dall wrote:
> On Tue, Oct 20, 2015 at 11:08:44AM +0200, Eric Auger wrote:
>> Hi Christoffer,
>> On 10/17/2015 10:30 PM, Christoffer Dall wrote:
>>> We currently do a single update of the vgic state when the distrbutor
>> distributor
>>> enable/disable control register is accessed and then bypass updating the
>>> state for as long as the distributor remains disabled.
>>>
>>> This is incorrect, because updating the state does not consider the
>>> distributor enable bit, and this you can end up in a situation where an
>>> interrupt is marked as pending on the CPU interface, but not pending on
>>> the distributor, which is an impossible state to be in, and triggers a
>>> warning.  Consider for example the following sequence of events:
>>>
>>> 1. An interrupt is marked as pending on the distributor
>>>- the interrupt is also forwarded to the CPU interface
>>> 2. The guest turns off the distributor (it's about to do a reboot)
>>>- we stop updating the CPU interface state from now on
>>> 3. The guest disables the pending interrupt
>>>- we remove the pending state from the distributor, but don't touch
>>>  the CPU interface, see point 2.
>>>
>>> Since the distributor disable bit really means that no interrupts should
>>> be forwarded to the CPU interface, we modify the code to keep updating
>>> the internal VGIC state, but always set the CPU interface pending bits
>>> to zero when the distributor is disabled.
>>>
>>> Signed-off-by: Christoffer Dall 
>>> ---
>>>  virt/kvm/arm/vgic.c | 11 ++-
>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index 58b1256..66c6616 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1012,6 +1012,12 @@ static int compute_pending_for_cpu(struct kvm_vcpu 
>>> *vcpu)
>>> pend_percpu = vcpu->arch.vgic_cpu.pending_percpu;
>>> pend_shared = vcpu->arch.vgic_cpu.pending_shared;
>>>  
>>> +   if (!dist->enabled) {
>>> +   bitmap_zero(pend_percpu, VGIC_NR_PRIVATE_IRQS);
>>> +   bitmap_zero(pend_shared, nr_shared);
>>> +   return 0;
>>> +   }
>>> +
>>> pending = vgic_bitmap_get_cpu_map(&dist->irq_pending, vcpu_id);
>>> enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
>>> bitmap_and(pend_percpu, pending, enabled, VGIC_NR_PRIVATE_IRQS);
>>> @@ -1039,11 +1045,6 @@ void vgic_update_state(struct kvm *kvm)
>>> struct kvm_vcpu *vcpu;
>>> int c;
>>>  
>>> -   if (!dist->enabled) {
>>> -   set_bit(0, dist->irq_pending_on_cpu);
>>> -   return;
>> I am confused. Don't you want to clear the whole bitmap?
> 
> So the first line used to set irq_pending_on_cpu for VCPU0 when the
> distributor was disabled, which I think basically worked around the
> guest kernel expecting to see a timer interrupt during boot when doing a
> WFI.  Now when that's fixed, we don't need this (gigantuous hack) anymore.
> 
> The return statement was also weird and buggy, because it never
> prevented anything from going to the CPU interface, it just stopped
> updating things.

Yeah sorry I read it as an addition so I didn't understand that code but
I definitively understand you remove it. Sorry - tired.

Best Regards

Eric
> 
> 
>>
>> Shouldn't we also handle interrupts programmed in the LR. Spec says any
>> ack should return a spurious ID. Is it what is going to happen with the
>> current implementation?
>>
> 
> yes, we really should.  We should unqueue them, but I haven't seen any
> bugs from this, and I feel like we're already changing a lot with short
> notice, so I'd rather not distrupt anything more right now.
> 
> Besides, when we get around to redesigning this whole thing, the concept
> of unqueueing goes away.
> 
> I know it sucks reviewing fixes that only fix a subset of a bad
> implementation, but I'm aiming for 'slightly better than current state'
> right now :)
> 
> -Christoffer
> 
> 
>>> -   }
>>> -
>>> kvm_for_each_vcpu(c, vcpu, kvm) {
>>> if (compute_pending_for_cpu(vcpu))
>>> set_bit(c, dist->irq_pending_on_cpu);
>>>
>>



Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Steven Rostedt
On Tue, 20 Oct 2015 17:19:12 +0200
Paolo Bonzini  wrote:

> 
> 
> On 20/10/2015 16:44, Steven Rostedt wrote:
> > What happens if you run new perf on an older kernel. Is this new plugin
> > going to be screwed up? Plugins should be backward compatible.
> 
> If you run new perf on older kernel, the new plugin will print the 
> "role" field (see kvm_mmu_print_role) slightly incorrectly.  That said, 
> the existing plugin was _also_ printing the role in a wildly wrong 
> format, like 2.6.35 vintage; the glevels field was removed by commit 
> 5b7e0102ae74, "KVM: MMU: Replace role.glevels with role.cr4_pae", in 
> April 2010.

Can we add a check if glevels field exists, and do the old format if it
does? I'm very strict on making sure event-parse works with all
incarnations of kernels.

> 
> Going forward it's really unlikely that the role will change apart from 
> adding new bits.  These can be added to the plugin while keeping it 
> backwards-compatible.  Addition to the role happen when you implement 
> new virtual MMU features such as SMEP, SMAP or SMM.  That's once per year
> or less.
> 
> > Is the plugin even still needed? I'm looking at some of the kvm events
> > and they seem to be mostly self sufficient. What ones need a plugin
> > today?
> 
> Yes, most of them are.  It's only needed for kvm_mmu_get_page, 
> kvm_mmu_prepare_zap_page and kvm_emulate_insn.  The latter is only 
> interesting if you install the disassembler library, and I wouldn't 
> really care if it went away.
> 
> kvm_mmu_get_page and kvm_mmu_prepare_zap_page, however, have output like
> 
> kvm_mmu_get_page: [FAILED TO PARSE] mmu_valid_gen=0x2 gfn=786432 
> role=1923 root_count=0 unsync=0 created=1
> 
> without the plugin vs.
> 
> kvm_mmu_get_page: new sp gfn c 3 q0 direct rwx !pae !nxe !wp root 0 
> sync
> 
> with the plugin.

Thanks for the update.

-- Steve


Fwd: Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq

2015-10-20 Thread Paolo Bonzini
Hi kvm-ppc folks,

there is a remark for you in here...

Message ID is 20151020140031.gg17...@twins.programming.kicks-ass.net

Thanks,

Paolo

 Forwarded Message 
Subject: Re: [PATCH v3 2/4] KVM: use simple waitqueue for vcpu->wq
Date: Tue, 20 Oct 2015 16:00:31 +0200
From: Peter Zijlstra 
To: Daniel Wagner 
CC: linux-ker...@vger.kernel.org, linux-rt-us...@vger.kernel.org,
Marcelo Tosatti , Paolo Bonzini
, Paul E. McKenney ,
Paul Gortmaker , Thomas Gleixner


On Tue, Oct 20, 2015 at 09:28:08AM +0200, Daniel Wagner wrote:
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 2280497..f534e15 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2560,10 +2560,9 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore 
> *vc)
>  {
>   struct kvm_vcpu *vcpu;
>   int do_sleep = 1;
> + DECLARE_SWAITQUEUE(wait);
>  
> - DEFINE_WAIT(wait);
> -
> - prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
> + prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
>  
>   /*
>* Check one last time for pending exceptions and ceded state after
> @@ -2577,7 +2576,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore 
> *vc)
>   }
>  
>   if (!do_sleep) {
> - finish_wait(&vc->wq, &wait);
> + finish_swait(&vc->wq, &wait);
>   return;
>   }
>  
> @@ -2585,7 +2584,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore 
> *vc)
>   trace_kvmppc_vcore_blocked(vc, 0);
>   spin_unlock(&vc->lock);
>   schedule();
> - finish_wait(&vc->wq, &wait);
> + finish_swait(&vc->wq, &wait);
>   spin_lock(&vc->lock);
>   vc->vcore_state = VCORE_INACTIVE;
>   trace_kvmppc_vcore_blocked(vc, 1);

This one looks buggy, one should _NOT_ assume that your blocking
condition is true after schedule().

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8db1d93..45ab55f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2019,7 +2018,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   }
>  
>   for (;;) {
> - prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> + prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>  
>   if (kvm_vcpu_check_block(vcpu) < 0)
>   break;
> @@ -2028,7 +2027,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   schedule();
>   }
>  
> - finish_wait(&vcpu->wq, &wait);
> + finish_swait(&vcpu->wq, &wait);
>   cur = ktime_get();
>  
>  out:

Should we not take this opportunity to get rid of these open-coded wait
loops?


Does this work?

---
 arch/powerpc/kvm/book3s_hv.c | 33 +
 virt/kvm/kvm_main.c  | 13 ++---
 2 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 228049786888..b5b8bcad5105 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2552,18 +2552,10 @@ static void kvmppc_wait_for_exec(struct
kvmppc_vcore *vc,
	finish_wait(&vcpu->arch.cpu_run, &wait);
 }

-/*
- * All the vcpus in this vcore are idle, so wait for a decrementer
- * or external interrupt to one of the vcpus.  vc->lock is held.
- */
-static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
+static inline bool kvmppc_vcore_should_sleep(struct kvmppc_vcore *vc)
 {
struct kvm_vcpu *vcpu;
-   int do_sleep = 1;
-
-   DEFINE_WAIT(wait);
-
-   prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
+   bool sleep = true;

/*
 * Check one last time for pending exceptions and ceded state after
@@ -2571,26 +2563,35 @@ static void kvmppc_vcore_blocked(struct
kvmppc_vcore *vc)
 */
	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
-   do_sleep = 0;
+   sleep = false;
break;
}
}

-   if (!do_sleep) {
-   finish_wait(&vc->wq, &wait);
-   return;
-   }
+   return sleep;
+}

+static inline void kvmppc_vcore_schedule(struct kvmppc_vcore *vc)
+{
vc->vcore_state = VCORE_SLEEPING;
trace_kvmppc_vcore_blocked(vc, 0);
	spin_unlock(&vc->lock);
	schedule();
-	finish_wait(&vc->wq, &wait);
	spin_lock(&vc->lock);
vc->vcore_state = VCORE_INACTIVE;
trace_kvmppc_vcore_blocked(vc, 1);
 }

+/*
+ * All the vcpus in this vcore are idle, so wait for a decrementer
+ * or external interrupt to one of the vcpus.  vc->lock is held.
+ */
+static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
+{
+   ___wait_event(vc->wq, !kvmppc_vcore_should_sleep(vc), TASK_IDLE, 0, 0,
+ kvmppc_vcore_schedule(vc));
+}
+
 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct 

Re: [PATCH 1/4] vfio: platform: add capability to register a reset function

2015-10-20 Thread Eric Auger
Hi Alex,
On 10/19/2015 09:45 PM, Alex Williamson wrote:
> On Sun, 2015-10-18 at 18:00 +0200, Eric Auger wrote:
>> In preparation for subsequent changes in reset function lookup,
>> lets introduce a dynamic list of reset combos (compat string,
>> reset module, reset function). The list can be populated/voided with
>> two new functions, vfio_platform_register/unregister_reset. Those are
>> not yet used in this patch.
>>
>> Signed-off-by: Eric Auger 
>> ---
>>  drivers/vfio/platform/vfio_platform_common.c  | 55 
>> +++
>>  drivers/vfio/platform/vfio_platform_private.h | 14 +++
>>  2 files changed, 69 insertions(+)
>>
>> diff --git a/drivers/vfio/platform/vfio_platform_common.c 
>> b/drivers/vfio/platform/vfio_platform_common.c
>> index e43efb5..d36afc9 100644
>> --- a/drivers/vfio/platform/vfio_platform_common.c
>> +++ b/drivers/vfio/platform/vfio_platform_common.c
>> @@ -23,6 +23,8 @@
>>  
>>  #include "vfio_platform_private.h"
>>  
>> +struct list_head reset_list;
> 
> What's the purpose of this one above?
useless indeed
> 
>> +LIST_HEAD(reset_list);
> 
> static
> 
> I think you also need a mutex protecting this list, otherwise nothing
> prevents concurrent list changes and walks.  A rw lock probably fits the
> usage model best, but I don't expect much contention if you just want to
> use a mutex.
yes you're right. I am going to use a standard mutex I think.
> 
>>  static DEFINE_MUTEX(driver_lock);
>>  
>>  static const struct vfio_platform_reset_combo reset_lookup_table[] = {
>> @@ -573,3 +575,56 @@ struct vfio_platform_device 
>> *vfio_platform_remove_common(struct device *dev)
>>  return vdev;
>>  }
>>  EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
>> +
>> +int vfio_platform_register_reset(struct module *reset_owner, char *compat,
> 
> const char *
ok
> 
>> + vfio_platform_reset_fn_t reset)
>> +{
>> +struct vfio_platform_reset_node *node, *iter;
>> +bool found = false;
>> +
>> +list_for_each_entry(iter, &reset_list, link) {
>> +if (!strcmp(iter->compat, compat)) {
>> +found = true;
>> +break;
>> +}
>> +}
>> +if (found)
>> +return -EINVAL;
>> +
>> +node = kmalloc(sizeof(*node), GFP_KERNEL);
>> +if (!node)
>> +return -ENOMEM;
>> +
>> +node->compat = kstrdup(compat, GFP_KERNEL);
>> +if (!node->compat)
>> +return -ENOMEM;
> 
> Leaks node
thanks
> 
>> +
>> +node->owner = reset_owner;
>> +node->reset = reset;
>> +
>> +list_add(&node->link, &reset_list);
>> +return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_platform_register_reset);
>> +
>> +int vfio_platform_unregister_reset(char *compat)
> 
> const char *
ok

Thanks for your quick review!

Best Regards

Eric
> 
>> +{
>> +struct vfio_platform_reset_node *iter;
>> +bool found = false;
>> +
>> +list_for_each_entry(iter, &reset_list, link) {
>> +if (!strcmp(iter->compat, compat)) {
>> +found = true;
>> +break;
>> +}
>> +}
>> +if (!found)
>> +return -EINVAL;
>> +
>> +list_del(&iter->link);
>> +kfree(iter->compat);
>> +kfree(iter);
>> +return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset);
>> +
>> diff --git a/drivers/vfio/platform/vfio_platform_private.h 
>> b/drivers/vfio/platform/vfio_platform_private.h
>> index 1c9b3d5..17323f0 100644
>> --- a/drivers/vfio/platform/vfio_platform_private.h
>> +++ b/drivers/vfio/platform/vfio_platform_private.h
>> @@ -76,6 +76,15 @@ struct vfio_platform_reset_combo {
>>  const char *module_name;
>>  };
>>  
>> +typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev);
>> +
>> +struct vfio_platform_reset_node {
>> +struct list_head link;
>> +char *compat;
>> +struct module *owner;
>> +vfio_platform_reset_fn_t reset;
>> +};
>> +
>>  extern int vfio_platform_probe_common(struct vfio_platform_device *vdev,
>>struct device *dev);
>>  extern struct vfio_platform_device *vfio_platform_remove_common
>> @@ -89,4 +98,9 @@ extern int vfio_platform_set_irqs_ioctl(struct 
>> vfio_platform_device *vdev,
>>  unsigned start, unsigned count,
>>  void *data);
>>  
>> +extern int vfio_platform_register_reset(struct module *owner,
>> +char *compat,
>> +vfio_platform_reset_fn_t reset);
>> +extern int vfio_platform_unregister_reset(char *compat);
>> +
>>  #endif /* VFIO_PLATFORM_PRIVATE_H */
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

2015-10-20 Thread Janusz Mocek
On 20.10.2015 at 19:44, Laszlo Ersek wrote:
> Hi,
>
> On 10/20/15 19:27, Janusz wrote:
>> On 15.10.2015 at 20:46, Laszlo Ersek wrote:
>>> On 10/15/15 18:53, Kinney, Michael D wrote:
 Laszlo,

 There is already a PCD for this timeout that is used by CpuMpPei.

gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds

 I noticed that CpuDxe is using a hard coded AP timeout.  I think we should 
 just use this same PCD for both the PEI and DXE CPU module and then set it 
 for OVMF to the compatible value.
>>> Perfect, thank you!
>>>
>>> (I notice the default in the DEC file is 5, which is half of what
>>> the DXE driver hardcodes.)
>>>
>>> Now we only need a recommended (or experimental) value for it, and an
>>> explanation why 100*1000 is no longer sufficient on KVM :)
>>>
>>> Thanks!
>>> Laszlo
>>>
>>>
>>>
>> Laszlo,
>>
>> I saw that there is already some change in ovmf for MicroSecondDelay
>> https://github.com/tianocore/edk2/commit/1e410eadd80c328e66868263b3006a274ce81ae0
>> Is that a fix for it? Because I tried it and it still doesn't work for
>> me: https://bpaste.net/show/2514b51bf41f
>> I still get internal error
> I think you guys are now "mature enough OVMF users" to start employing
> the correct terminology.

Sorry for that :)

> "edk2" (also spelled as "EDK II") is: "a modern, feature-rich,
> cross-platform firmware development environment for the UEFI and PI
> specifications".
>
> The source tree contains a whole bunch of modules (drivers,
> applications, libraries), organized into packages.
>
> "OVMF" usually denotes a firmware binary built from one of the
> OvmfPkg/OvmfPkg*.dsc "platform description files". Think of them as "top
> level makefiles". The difference between them is the target architecture
> (there's Ia32, X64, and Ia32X64 -- the last one means that the SEC and
> PEI phases are 32-bit, whereas the DXE and later phases are 64-bit.) In
> practice you'll only care about full X64.
>
> Now, each of OvmfPkg/OvmfPkg*.dsc builds the following three kinds of
> modules into the final binary:
> - platform-independent modules from various top-level packages
> - platform- (ie. Ia32/X64-) dependent modules from various top-level
>   packages
> - modules from under OvmfPkg that are specific to QEMU/KVM (and Xen, if
>   you happen to use OVMF with Xen)
>
> Now, when you reference a commit like 1e410ead above, you can look at
> the diffstat, and decide if it is OvmfPkg-specific (third category
> above) or not. Here you see UefiCpuPkg, which happens to be the second
> category.
>
> The important point is: please do *not* call any and all edk2 patches
> "OVMF changes", indiscriminately. That's super confusing for people who
> understand the above distinctions. Which now you do too. :)
>
> Let me add that in edk2, patches that straddle top level packages are
> generally forbidden -- you can't have a patch that modifies OvmfPkg and
> UefiCpuPkg at the same time, modulo *very* rare exceptions. If a feature
> or bugfix needs to touch several top-level packages, the series must be
> built up carefully in stages.
>
> Knowing all of the above, you can tell that the patch you referenced had
> only *enabled* OvmfPkg to customize UefiCpuPkg, via
> "PcdCpuApInitTimeOutInMicroSeconds". But for that customization to occur
> actually, a small patch for OvmfPkg will be necessary too, in order to
> set "PcdCpuApInitTimeOutInMicroSeconds" differently from the default.
>
> I plan to send that patch soon. If you'd like to be CC'd, that's great
> (reporting back with a Tested-by is even better!), but I'll need your
> real name for that. (Or any name that looks like a real name.)
would be great if you could add me to cc list, thanks
>
> Thanks!
> Laszlo



Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Arnaldo Carvalho de Melo
On Tue, Oct 20, 2015 at 05:49:22PM +0200, Paolo Bonzini wrote:
> 
> 
> On 20/10/2015 17:48, Steven Rostedt wrote:
> > On Tue, 20 Oct 2015 17:37:43 +0200
> > Paolo Bonzini  wrote:
> > 
> >> However, it frankly seems a bit academic.  The parsing _will_ work,
> >> apart from printing a nonsensical role just like it has always done for
> >> the past four years.
> > 
> > I'm not going to be too picky about it. But things like this may seem
> > academic, but it's also what we forget about, and people still use
> > 2.6.32 kernels, and this may come back and bite us later.
> > 
> > But it's more likely to bite you than me, so if you don't think it's
> > worth while, then I'm not going to push it.
> 
> Thanks.  The role is not going to matter except in really weird cases
> and hopefully those are just not there in 2.6.32 kernels.  Thanks also
> for educating me so I won't get it wrong in the future!

Cool, Steven, can I take that as an Acked-by for this patch?

- Arnaldo


Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Paolo Bonzini


On 20/10/2015 17:26, Steven Rostedt wrote:
> > > > What happens if you run new perf on an older kernel. Is this new plugin
> > > > going to be screwed up? Plugins should be backward compatible.
> > 
> > If you run new perf on older kernel, the new plugin will print the 
> > "role" field (see kvm_mmu_print_role) slightly incorrectly.  That said, 
> > the existing plugin was _also_ printing the role in a wildly wrong 
> > format, like 2.6.35 vintage; the glevels field was removed by commit 
> > 5b7e0102ae74, "KVM: MMU: Replace role.glevels with role.cr4_pae", in 
> > April 2010.
> 
> Can we add a check if glevels field exists, and do the old format if it
> does? I'm very strict on making sure event-parse works with all
> incarnations of kernels.

Note that this is not a glevels _tracepoint_ field.  The tracepoint
field is "role" and it is a 64-bit integer.  The plugin interprets the
bitfields of that integer, and the position/meaning of the bitfields has
changed.

I guess I could use the mmu_valid_gen field as a proxy (it was
introduced after glevels was removed), and only provide the "very old"
(up to 2.6.34) and "new" (4.2+) formats.  The "old" format (2.6.35-4.1)
was never implemented in the plugin; it may well remain that way.

However, it frankly seems a bit academic.  The parsing _will_ work,
apart from printing a nonsensical role just like it has always done for
the past four years.

Paolo


Re: Difference between vcpu_load and kvm_sched_in ?

2015-10-20 Thread Paolo Bonzini


On 20/10/2015 11:57, Yacine wrote:
> vcpu_load; start cr3 trapping; vcpu_put
> 
> it worked correctly (in my logs I see that vcpu.cpu becomes equal to "cpu =
> raw_smp_processor_id();"), but the VM blocks for a long time due to the mutex
> in vcpu_load (up to several seconds and sometimes minutes!)

Right, that's because while the CPU is running the mutex is taken.  If
the VCPU doesn't exit, the mutex is held.

> I replaced vcpu_load with kvm_sched_in, now everything works perfectly and
> the VM doesn't block at all (logs here: http://pastebin.com/h5XNNMcb).
> 
> So, what I want to know is: what is the difference between vcpu_load and
> kvm_sched_in? Both of these functions call kvm_arch_vcpu_load, but the latter
> one does it without taking the mutex

kvm_sched_out and kvm_sched_in are part of KVM's preemption hooks.  The
hooks are registered only between vcpu_load and vcpu_put, therefore they
know that the mutex is taken.  The sequence will go like this:

vcpu_load
kvm_sched_out
kvm_sched_in
kvm_sched_out
kvm_sched_in
...
vcpu_put

and it will all happen with the mutex held.

> Is there a problem in using kvm_sched_in instead of vcpu_load for my use case 
> ?

Yes, unfortunately it is a problem: you are loading the same VMCS on two
processors, which has undefined results.

To fix the problem, wrap the ioctl into a function and pass the function
to QEMU's "run_on_cpu" function.  It will send the ioctl from the right
thread, so that the kernel will not be holding the vcpu mutex.

Paolo


Re: [PATCH v4 28/33] nvdimm acpi: support DSM_FUN_IMPLEMENTED function

2015-10-20 Thread Xiao Guangrong



On 10/21/2015 12:26 AM, Xiao Guangrong wrote:



On 10/20/2015 11:51 PM, Stefan Hajnoczi wrote:

On Mon, Oct 19, 2015 at 08:54:14AM +0800, Xiao Guangrong wrote:

+exit:
+/* Write our output result to dsm memory. */
+((dsm_out *)dsm_ram_addr)->len = out->len;


Missing byteswap?

I thought you were going to remove this field because it wasn't needed
by the guest.



The @len is the size of the _DSM result buffer; for example, for the
DSM_FUN_IMPLEMENTED function the result buffer is 8 bytes, and for
DSM_DEV_FUN_NAMESPACE_LABEL_SIZE the buffer size is 4 bytes. It tells the ASL code


Sorry, s/DSM_DEV_FUN_NAMESPACE_LABEL_SIZE/DSM_DEV_FUN_SET_NAMESPACE_LABEL_DATA


how much memory we need to return to the _DSM caller.

In _DSM code, it's handled like this:

"RLEN" is @len, “OBUF” is the remaining memory in the DSM page.

 /* get @len*/
 aml_append(method, aml_store(aml_name("RLEN"), aml_local(6)));
 /* @len << 3 to get bits. */
 aml_append(method, aml_store(aml_shiftleft(aml_local(6),
aml_int(3)), aml_local(6)));

 /* get @len << 3 bits from OBUF, and return it to the caller. */
 aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
 aml_local(6) , "OBUF"));

Since @len is only used internally and is not returned to the guest, I did not
do a byteswap here.



Re: [PATCH v4 28/33] nvdimm acpi: support DSM_FUN_IMPLEMENTED function

2015-10-20 Thread Michael S. Tsirkin
On Tue, Oct 20, 2015 at 04:51:49PM +0100, Stefan Hajnoczi wrote:
> On Mon, Oct 19, 2015 at 08:54:14AM +0800, Xiao Guangrong wrote:
> > +exit:
> > +/* Write our output result to dsm memory. */
> > +((dsm_out *)dsm_ram_addr)->len = out->len;
> 
> Missing byteswap?


That's why I'd prefer no structures, using build_append_int_noprefix,
unless the structure is used outside ACPI as well.

> I thought you were going to remove this field because it wasn't needed
> by the guest.


Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Steven Rostedt
On Tue, 20 Oct 2015 12:50:26 -0300
Arnaldo Carvalho de Melo  wrote:

> Cool, Steven, can I take that as an Acked-by for this patch?
> 

Acked-by: Steven Rostedt 

-- Steve


Re: [PATCH v4 28/33] nvdimm acpi: support DSM_FUN_IMPLEMENTED function

2015-10-20 Thread Stefan Hajnoczi
On Mon, Oct 19, 2015 at 08:54:14AM +0800, Xiao Guangrong wrote:
> +exit:
> +/* Write our output result to dsm memory. */
> +((dsm_out *)dsm_ram_addr)->len = out->len;

Missing byteswap?

I thought you were going to remove this field because it wasn't needed
by the guest.


Re: [PATCH v4 28/33] nvdimm acpi: support DSM_FUN_IMPLEMENTED function

2015-10-20 Thread Xiao Guangrong



On 10/20/2015 11:51 PM, Stefan Hajnoczi wrote:

On Mon, Oct 19, 2015 at 08:54:14AM +0800, Xiao Guangrong wrote:

+exit:
+/* Write our output result to dsm memory. */
+((dsm_out *)dsm_ram_addr)->len = out->len;


Missing byteswap?

I thought you were going to remove this field because it wasn't needed
by the guest.



The @len is the size of the _DSM result buffer; for example, for the
DSM_FUN_IMPLEMENTED function the result buffer is 8 bytes, and for
DSM_DEV_FUN_NAMESPACE_LABEL_SIZE the buffer size is 4 bytes. It tells the ASL code
how much memory we need to return to the _DSM caller.

In _DSM code, it's handled like this:

"RLEN" is @len, “OBUF” is the remaining memory in the DSM page.

/* get @len*/
aml_append(method, aml_store(aml_name("RLEN"), aml_local(6)));
/* @len << 3 to get bits. */
aml_append(method, aml_store(aml_shiftleft(aml_local(6),
   aml_int(3)), aml_local(6)));

/* get @len << 3 bits from OBUF, and return it to the caller. */
aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
aml_local(6) , "OBUF"));

Since @len is only used internally and is not returned to the guest, I did not
do a byteswap here.


Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Paolo Bonzini


On 20/10/2015 16:44, Steven Rostedt wrote:
> What happens if you run new perf on an older kernel. Is this new plugin
> going to be screwed up? Plugins should be backward compatible.

If you run new perf on older kernel, the new plugin will print the 
"role" field (see kvm_mmu_print_role) slightly incorrectly.  That said, 
the existing plugin was _also_ printing the role in a wildly wrong 
format, like 2.6.35 vintage; the glevels field was removed by commit 
5b7e0102ae74, "KVM: MMU: Replace role.glevels with role.cr4_pae", in 
April 2010.

Going forward it's really unlikely that the role will change apart from 
adding new bits.  These can be added to the plugin while keeping it 
backwards-compatible.  Additions to the role happen when you implement 
new virtual MMU features such as SMEP, SMAP or SMM.  That's once per year
or less.

> Is the plugin even still needed? I'm looking at some of the kvm events
> and they seem to be mostly self sufficient. What ones need a plugin
> today?

Yes, most of them are.  It's only needed for kvm_mmu_get_page, 
kvm_mmu_prepare_zap_page and kvm_emulate_insn.  The latter is only 
interesting if you install the disassembler library, and I wouldn't 
really care if it went away.

kvm_mmu_get_page and kvm_mmu_prepare_zap_page, however, have output like

kvm_mmu_get_page: [FAILED TO PARSE] mmu_valid_gen=0x2 gfn=786432 role=1923 
root_count=0 unsync=0 created=1

without the plugin vs.

kvm_mmu_get_page: new sp gfn c 3 q0 direct rwx !pae !nxe !wp root 0 sync

with the plugin.

Paolo


[GIT PULL 3/6] KVM: arm: use GIC support unconditionally

2015-10-20 Thread Christoffer Dall
From: Arnd Bergmann 

The vgic code on ARM is built for all configurations that enable KVM,
but the parent_data field that it references is only present when
CONFIG_IRQ_DOMAIN_HIERARCHY is set:

virt/kvm/arm/vgic.c: In function 'kvm_vgic_map_phys_irq':
virt/kvm/arm/vgic.c:1781:13: error: 'struct irq_data' has no member named 
'parent_data'

This flag is implied by the GIC driver, and indeed the VGIC code only
makes sense if a GIC is present. This changes the CONFIG_KVM symbol
to always select GIC, which avoids the issue.

Fixes: 662d9715840 ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}")
Signed-off-by: Arnd Bergmann 
Acked-by: Marc Zyngier 
Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 210ecca..356970f 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
depends on MMU && OF
select PREEMPT_NOTIFIERS
select ANON_INODES
+   select ARM_GIC
select HAVE_KVM_CPU_RELAX_INTERCEPT
select HAVE_KVM_ARCH_TLB_FLUSH_ALL
select KVM_MMIO
-- 
2.1.2.330.g565301e.dirty



[GIT PULL 2/6] KVM: arm/arm64: Fix memory leak if timer initialization fails

2015-10-20 Thread Christoffer Dall
From: Pavel Fedin 

Jump to correct label and free kvm_host_cpu_state

Reviewed-by: Wei Huang 
Signed-off-by: Pavel Fedin 
Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..78b2869 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1080,7 +1080,7 @@ static int init_hyp_mode(void)
 */
err = kvm_timer_hyp_init();
if (err)
-   goto out_free_mappings;
+   goto out_free_context;
 
 #ifndef CONFIG_HOTPLUG_CPU
free_boot_hyp_pgd();
-- 
2.1.2.330.g565301e.dirty



[GIT PULL 6/6] arm/arm64: KVM: Fix disabled distributor operation

2015-10-20 Thread Christoffer Dall
We currently do a single update of the vgic state when the distributor
enable/disable control register is accessed and then bypass updating the
state for as long as the distributor remains disabled.

This is incorrect, because updating the state does not consider the
distributor enable bit, and thus you can end up in a situation where an
interrupt is marked as pending on the CPU interface, but not pending on
the distributor, which is an impossible state to be in, and triggers a
warning.  Consider for example the following sequence of events:

1. An interrupt is marked as pending on the distributor
   - the interrupt is also forwarded to the CPU interface
2. The guest turns off the distributor (it's about to do a reboot)
   - we stop updating the CPU interface state from now on
3. The guest disables the pending interrupt
   - we remove the pending state from the distributor, but don't touch
 the CPU interface, see point 2.

Since the distributor disable bit really means that no interrupts should
be forwarded to the CPU interface, we modify the code to keep updating
the internal VGIC state, but always set the CPU interface pending bits
to zero when the distributor is disabled.

Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 58b1256..66c6616 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1012,6 +1012,12 @@ static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
pend_percpu = vcpu->arch.vgic_cpu.pending_percpu;
pend_shared = vcpu->arch.vgic_cpu.pending_shared;
 
+   if (!dist->enabled) {
+   bitmap_zero(pend_percpu, VGIC_NR_PRIVATE_IRQS);
+   bitmap_zero(pend_shared, nr_shared);
+   return 0;
+   }
+
	pending = vgic_bitmap_get_cpu_map(&dist->irq_pending, vcpu_id);
	enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
bitmap_and(pend_percpu, pending, enabled, VGIC_NR_PRIVATE_IRQS);
@@ -1039,11 +1045,6 @@ void vgic_update_state(struct kvm *kvm)
struct kvm_vcpu *vcpu;
int c;
 
-   if (!dist->enabled) {
-   set_bit(0, dist->irq_pending_on_cpu);
-   return;
-   }
-
kvm_for_each_vcpu(c, vcpu, kvm) {
if (compute_pending_for_cpu(vcpu))
set_bit(c, dist->irq_pending_on_cpu);
-- 
2.1.2.330.g565301e.dirty



Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Paolo Bonzini


On 20/10/2015 17:48, Steven Rostedt wrote:
> On Tue, 20 Oct 2015 17:37:43 +0200
> Paolo Bonzini  wrote:
> 
>> However, it frankly seems a bit academic.  The parsing _will_ work,
>> apart from printing a nonsensical role just like it has always done for
>> the past four years.
> 
> I'm not going to be too picky about it. But things like this may seem
> academic, but it's also what we forget about, and people still use
> 2.6.32 kernels, and this may come back and bite us later.
> 
> But it's more likely to bite you than me, so if you don't think it's
> worth while, then I'm not going to push it.

Thanks.  The role is not going to matter except in really weird cases
and hopefully those are just not there in 2.6.32 kernels.  Thanks also
for educating me so I won't get it wrong in the future!

Paolo


Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Steven Rostedt
On Tue, 20 Oct 2015 17:37:43 +0200
Paolo Bonzini  wrote:

> However, it frankly seems a bit academic.  The parsing _will_ work,
> apart from printing a nonsensical role just like it has always done for
> the past four years.

I'm not going to be too picky about it. But things like this may seem
academic, but it's also what we forget about, and people still use
2.6.32 kernels, and this may come back and bite us later.

But it's more likely to bite you than me, so if you don't think it's
worth while, then I'm not going to push it.

-- Steve



[GIT PULL 0/6] A handful of fixes for KVM/ARM for v4.3-rc7

2015-10-20 Thread Christoffer Dall
Hi Paolo,

The following changes since commit 920552b213e3dc832a874b4e7ba29ecddbab31bc:

  KVM: disable halt_poll_ns as default for s390x (2015-09-25 10:31:30 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvm-arm-for-v4.3-rc7

for you to fetch changes up to 0d997491f814c87310a6ad7be30a9049c7150489:

  arm/arm64: KVM: Fix disabled distributor operation (2015-10-20 18:09:13 +0200)

Sorry for sending these relatively late, but we had a situation where we
found one breakage in the timer implementation changes merged for 4.3,
then fixing that issue revealed another bug, and then that happened
again, and now we have something that looks stable.

Description of the fixes is in the tag and quoted below.

Thanks,
-Christoffer


A late round of KVM/ARM fixes for v4.3-rc7, fixing:
 - A bug where level-triggered interrupts lowered from userspace
   are still routed to the guest
 - A memory leak on a failed initialization path
 - A build error under certain configurations
 - Several timer bugs introduced with moving the timer to the active
   state handling instead of the masking trick.


Arnd Bergmann (1):
  KVM: arm: use GIC support unconditionally

Christoffer Dall (3):
  arm/arm64: KVM: Fix arch timer behavior for disabled interrupts
  arm/arm64: KVM: Clear map->active on pend/active clear
  arm/arm64: KVM: Fix disabled distributor operation

Pavel Fedin (2):
  KVM: arm/arm64: Do not inject spurious interrupts
  KVM: arm/arm64: Fix memory leak if timer initialization fails

 arch/arm/kvm/Kconfig  |  1 +
 arch/arm/kvm/arm.c|  2 +-
 virt/kvm/arm/arch_timer.c | 19 ++
 virt/kvm/arm/vgic.c   | 95 +++
 4 files changed, 76 insertions(+), 41 deletions(-)


[GIT PULL 5/6] arm/arm64: KVM: Clear map->active on pend/active clear

2015-10-20 Thread Christoffer Dall
When a guest reboots or offlines/onlines CPUs, it is not uncommon for it
to clear the pending and active states of an interrupt through the
emulated VGIC distributor.  However, the architected timers are defined
by the architecture to be level triggered, and the guest rightfully
expects them to be so, yet we emulate them as edge-triggered; we
therefore have to mimic level-triggered behavior for an edge-triggered
virtual implementation.

We currently do not signal the VGIC when the map->active field is true,
because it indicates that the guest has already been signalled of the
interrupt as required.  Normally this field is set to false when the
guest deactivates the virtual interrupt through the sync path.

We also need to catch the case where the guest deactivates the interrupt
through the emulated distributor, again allowing guests to boot even if
the original virtual timer signal hit before the guest's GIC
initialization sequence is run.

Reviewed-by: Eric Auger 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index ea21bc2..58b1256 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -531,6 +531,34 @@ bool vgic_handle_set_pending_reg(struct kvm *kvm,
return false;
 }
 
+/*
+ * If a mapped interrupt's state has been modified by the guest such that it
+ * is no longer active or pending, without it having gone through the sync path,
+ * then the map->active field must be cleared so the interrupt can be taken
+ * again.
+ */
+static void vgic_handle_clear_mapped_irq(struct kvm_vcpu *vcpu)
+{
+   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   struct list_head *root;
+   struct irq_phys_map_entry *entry;
+   struct irq_phys_map *map;
+
+   rcu_read_lock();
+
+   /* Check for PPIs */
+   root = &vgic_cpu->irq_phys_map_list;
+   list_for_each_entry_rcu(entry, root, entry) {
+   map = >map;
+
+   if (!vgic_dist_irq_is_pending(vcpu, map->virt_irq) &&
+   !vgic_irq_is_active(vcpu, map->virt_irq))
+   map->active = false;
+   }
+
+   rcu_read_unlock();
+}
+
 bool vgic_handle_clear_pending_reg(struct kvm *kvm,
   struct kvm_exit_mmio *mmio,
   phys_addr_t offset, int vcpu_id)
@@ -561,6 +589,7 @@ bool vgic_handle_clear_pending_reg(struct kvm *kvm,
  vcpu_id, offset);
vgic_reg_access(mmio, reg, offset, mode);
 
+   vgic_handle_clear_mapped_irq(kvm_get_vcpu(kvm, vcpu_id));
vgic_update_state(kvm);
return true;
}
@@ -598,6 +627,7 @@ bool vgic_handle_clear_active_reg(struct kvm *kvm,
ACCESS_READ_VALUE | ACCESS_WRITE_CLEARBIT);
 
if (mmio->is_write) {
+   vgic_handle_clear_mapped_irq(kvm_get_vcpu(kvm, vcpu_id));
vgic_update_state(kvm);
return true;
}
@@ -1406,7 +1436,7 @@ static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct 
vgic_lr vlr)
return 0;
 
map = vgic_irq_map_search(vcpu, vlr.irq);
-   BUG_ON(!map || !map->active);
+   BUG_ON(!map);
 
ret = irq_get_irqchip_state(map->irq,
IRQCHIP_STATE_ACTIVE,
-- 
2.1.2.330.g565301e.dirty



[GIT PULL 1/6] KVM: arm/arm64: Do not inject spurious interrupts

2015-10-20 Thread Christoffer Dall
From: Pavel Fedin 

When lowering a level-triggered line from userspace, we forgot to lower
the pending bit on the emulated CPU interface and we also did not
re-compute the pending_on_cpu bitmap for the CPU affected by the change.

Update vgic_update_irq_pending() to fix the two issues above and also
raise a warning in vgic_queue_irq_to_lr if we encounter an interrupt
pending on a CPU which is neither marked active nor pending.

  [ Commit text reworked completely - Christoffer ]

Signed-off-by: Pavel Fedin 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 6bd1c9b..596455a 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1132,7 +1132,8 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, 
int irq,
kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state);
vgic_irq_clear_active(vcpu, irq);
vgic_update_state(vcpu->kvm);
-   } else if (vgic_dist_irq_is_pending(vcpu, irq)) {
+   } else {
+   WARN_ON(!vgic_dist_irq_is_pending(vcpu, irq));
vlr.state |= LR_STATE_PENDING;
kvm_debug("Set pending: 0x%x\n", vlr.state);
}
@@ -1607,8 +1608,12 @@ static int vgic_update_irq_pending(struct kvm *kvm, int 
cpuid,
} else {
if (level_triggered) {
vgic_dist_irq_clear_level(vcpu, irq_num);
-   if (!vgic_dist_irq_soft_pend(vcpu, irq_num))
+   if (!vgic_dist_irq_soft_pend(vcpu, irq_num)) {
vgic_dist_irq_clear_pending(vcpu, irq_num);
+   vgic_cpu_irq_clear(vcpu, irq_num);
+   if (!compute_pending_for_cpu(vcpu))
+   clear_bit(cpuid, 
dist->irq_pending_on_cpu);
+   }
}
 
ret = false;
-- 
2.1.2.330.g565301e.dirty



[GIT PULL 4/6] arm/arm64: KVM: Fix arch timer behavior for disabled interrupts

2015-10-20 Thread Christoffer Dall
We have an interesting issue when the guest disables the timer interrupt
on the VGIC, which happens when turning VCPUs off using PSCI, for
example.

The problem is that because the guest disables the virtual interrupt at
the VGIC level, we never inject interrupts to the guest and therefore
never mark the interrupt as active on the physical distributor.  The
host also never takes the timer interrupt (we only use the timer device
to trigger a guest exit and everything else is done in software), so the
interrupt does not become active through normal means.

The result is that we keep entering the guest with a programmed timer
that will always fire as soon as we context switch the hardware timer
state and run the guest, preventing forward progress for the VCPU.

Since the active state on the physical distributor is really part of the
timer logic, it is the job of our virtual arch timer driver to manage
this state.

The timer->map->active boolean field indicates whether we have signalled
this interrupt to the vgic and if that interrupt is still pending or
active.  As long as that is the case, the hardware doesn't have to
generate physical interrupts and therefore we mark the interrupt as
active on the physical distributor.

We also have to restore the pending state of an interrupt that was
queued to an LR but was later retired from that LR for some reason
while still pending, so that the pending state is not lost.

Cc: Marc Zyngier 
Reported-by: Lorenzo Pieralisi 
Signed-off-by: Christoffer Dall 
---
 virt/kvm/arm/arch_timer.c | 19 +++
 virt/kvm/arm/vgic.c   | 43 +++
 2 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 48c6e1a..b9d3a32 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -137,6 +137,8 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 {
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+   bool phys_active;
+   int ret;
 
/*
 * We're about to run this vcpu again, so there is no need to
@@ -151,6 +153,23 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 */
if (kvm_timer_should_fire(vcpu))
kvm_timer_inject_irq(vcpu);
+
+   /*
+* We keep track of whether the edge-triggered interrupt has been
+* signalled to the vgic/guest, and if so, we mask the interrupt and
+* the physical distributor to prevent the timer from raising a
+* physical interrupt whenever we run a guest, preventing forward
+* VCPU progress.
+*/
+   if (kvm_vgic_get_phys_irq_active(timer->map))
+   phys_active = true;
+   else
+   phys_active = false;
+
+   ret = irq_set_irqchip_state(timer->map->irq,
+   IRQCHIP_STATE_ACTIVE,
+   phys_active);
+   WARN_ON(ret);
 }
 
 /**
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 596455a..ea21bc2 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1092,6 +1092,15 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 
+   /*
+* We must transfer the pending state back to the distributor before
+* retiring the LR, otherwise we may lose edge-triggered interrupts.
+*/
+   if (vlr.state & LR_STATE_PENDING) {
+   vgic_dist_irq_set_pending(vcpu, irq);
+   vlr.hwirq = 0;
+   }
+
vlr.state = 0;
vgic_set_lr(vcpu, lr_nr, vlr);
clear_bit(lr_nr, vgic_cpu->lr_used);
@@ -1241,7 +1250,7 @@ static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
unsigned long *pa_percpu, *pa_shared;
-   int i, vcpu_id, lr, ret;
+   int i, vcpu_id;
int overflow = 0;
int nr_shared = vgic_nr_shared_irqs(dist);
 
@@ -1296,31 +1305,6 @@ epilog:
 */
clear_bit(vcpu_id, dist->irq_pending_on_cpu);
}
-
-   for (lr = 0; lr < vgic->nr_lr; lr++) {
-   struct vgic_lr vlr;
-
-   if (!test_bit(lr, vgic_cpu->lr_used))
-   continue;
-
-   vlr = vgic_get_lr(vcpu, lr);
-
-   /*
-* If we have a mapping, and the virtual interrupt is
-* presented to the guest (as pending or active), then we must
-* set the state to active in the physical world. See
-* Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt.
-*/
-   if (vlr.state & LR_HW) {
-   struct irq_phys_map *map;
-   map 

Re: Steal time accounting in KVM. Benchmark.

2015-10-20 Thread Alexey Makhalov
'echo NO_NONTASK_CAPACITY > /sys/kernel/debug/sched_features'  in both guests.
Results:
VM1: STA is disabled -- no changes, still a little below the expected 90%.
VM2: STA is enabled -- the result changed, but it is still bad; hard to say
whether better or worse. It tends to get stuck at quarters (100% 75% 50% 25%).
Output is attached.

Thanks,
Alexey

On Tue, Oct 20, 2015 at 5:39 AM, Wanpeng Li  wrote:
> Cc Peterz,
> 2015-10-20 5:58 GMT+08:00 Alexey Makhalov :
>>
>> Hi,
>>
>> I did benchmarking of scheduler fairness with enabled steal time
>> accounting(STA) in KVM.
>> And results are really interesting.
>>
>> Looks like STA provides worse scheduler fairness against disabled STA
>> (no-steal-acc cmdline param)
>>
>> I created benchmark, main idea is: 2 cgroups with cpu.shares
>> proportion like 1/9, run identical work in both groups, and expecting
>> to get the same proportion of work done – 1/9. Condition – CPU
>> overcommit.
>>
>> On bare metal it is fair, plus or minus a few percent of fluctuation.
>> On KVM with no STA it's less fair. With STA enabled the results are
>> ugly! Once again: in a CPU-overcommit situation.
>>
>>
>> Host: ubuntu 14.04, Intel i5 – 2x2 CPUs
>> 2 VMs (4 vCPU each) are working in parallel.   2:1 cpu overcommit.
>>
>> Each VM has running benchmark:
>> cgroups cpu.shares proportion is 128/1152 (10%/90%), work – spinning
>> in cycle, number of cycles are being counted.
>
>
> Could you try if 'echo NO_NONTASK_CAPACITY >
> /sys/kernel/debug/sched_features' in guests works?
>
> Regards,
> Wanpeng Li
90% 28539
75% 31907
76% 30348
75% 43677
75% 29765
63% 30351
74% 28535
70% 29590
65% 31281
87% 29476
75% 29505
75% 29737
75% 29534
75% 29627
89% 30411
75% 29069
74% 29786
74% 29776
74% 29633
74% 29470
74% 29406
90% 30274
90% 29615
96% 30497
100% 29263
100% 29241
100% 29748
100% 29790
99% 29769
74% 66239
52% 118436
50% 117409
50% 118167
50% 118990
50% 38479
48% 29641
50% 29757
50% 29754
50% 29688
50% 29671
50% 29745
50% 29831
50% 29596
50% 29645
50% 29661
50% 29567
50% 29511
50% 29763
50% 29919
51% 28823
51% 29944
50% 29501
50% 29406
50% 29410
49% 29519
51% 30109
49% 29586
49% 29621
49% 29579
43% 80183
49% 118185
49% 113732
45% 29698
25% 29528
24% 30078
36% 29716
32% 28976
43% 77606
19% 131592
0% 39503
12% 29709
25% 29717
24% 29720
25% 29695
25% 29703
25% 29570
25% 29492
25% 29790
25% 29677
25% 29451
25% 29536
25% 29476
25% 29613
24% 30045
24% 30227
37% 52553
43% 29986
50% 29741
49% 30003
50% 29982
35% 29907
39% 30156
48% 30590
66% 58939
--- benchmark was stopped on the first VM. VM1 was in idle from here ---  
90% 58977
89% 58952
90% 58945
89% 58945
90% 58948
89% 59001
89% 58966
90% 58966
90% 59005
89% 58998
90% 58998
90% 59010
90% 59028
90% 59070
90% 58558
90% 58948
89% 59080
89% 58601
90% 58385
90% 59003
89% 57695
90% 58843
89% 59104
90% 59006
90% 57110
--- to here ---
97% 30300
100% 29335
100% 29541
100% 29609
100% 29880
100% 29925
100% 29517
100% 29723
100% 28467
99% 29597
100% 29817
100% 29772
100% 29761
100% 29547
100% 29556
100% 29848
100% 29713
100% 29616
100% 29659
100% 29709
100% 29757
100% 29806
100% 30357
100% 29784
100% 29812
100% 29679
100% 29839
100% 29633
100% 29650
100% 29730
100% 29964
100% 29993
100% 29899
100% 29951
100% 29924
100% 29938
100% 29784
100% 30043
100% 29308
99% 30625
95% 29907
99% 29811
100% 29626
100% 30054
100% 29744
100% 29693
100% 30008
100% 29935
100% 29825
100% 29684
100% 30010
100% 30163
93% 51597
89% 58972
89% 58692
90% 57975
100% 27946
100% 29624
100% 30365
98% 30468
99% 30971
100% 30907
100% 28079
100% 30074
100% 29001
100% 30366
99% 29814
100% 30405
97% 29866
100% 29003
100% 28545
100% 29693
100% 30013
100% 28179
100% 28644
100% 30081
100% 28191
100% 31325
100% 30223
100% 29848
100% 30057
100% 29675
100% 29859
100% 28160
100% 29679
100% 29689
100% 29727
100% 29715
100% 30547
100% 29281
100% 29761
100% 29596
100% 29797
100% 29664
100% 29768
100% 29708
100% 29614
100% 29627
100% 30158
100% 29801
100% 28569
100% 29118
100% 30540
100% 29335
100% 29188
99% 30413
100% 29438
100% 28382
100% 29492
100% 28555
100% 29700
99% 30374
75% 29759
75% 29954
75% 29588
75% 29540
75% 29616
75% 29890
75% 29457
75% 29466
75% 29724
74% 29395
74% 29613
75% 29776
75% 29455
75% 29788
75% 29592
75% 29679
75% 29749
75% 29715
75% 29992
75% 29271
73% 30173
74% 29480
75% 29415
75% 28416
75% 29786
75% 29086
75% 29293
75% 29670
75% 29898
75% 29756
75% 29498
75% 29946
75% 29328
75% 29120
75% 29821
75% 29765
75% 29673
75% 29682
75% 29657
74% 30045
75% 29641
75% 29662
75% 29595
75% 29635
75% 29720
73% 30315
75% 29589
75% 29607
75% 29629
75% 29738
75% 30068
72% 28898
75% 26530
74% 30075
73% 29182
74% 28506
73% 28135
74% 29083
75% 29609
73% 27956
74% 28921
74% 29045
74% 29576
74% 29228
74% 29494
74% 29218
73% 27872
74% 29787
74% 29633
68% 32062
98% 31992
98% 32120
90% 32984
98% 32568
98% 32307
97% 32300
98% 32262
99% 30436
94% 30023
96% 32511
99% 31075
99% 30112
100% 29835
100% 30079
100% 29980
100% 29954
100% 29771
100% 29993
100% 29796
100% 29895
100% 29748
100% 29846
100% 

[PATCH 2/4] tools lib traceevent: update KVM plugin

2015-10-20 Thread Arnaldo Carvalho de Melo
From: Paolo Bonzini 

The format of the role word has changed through the years and the plugin
was never updated; some VMX exit reasons were missing too.

Signed-off-by: Paolo Bonzini 
Acked-by: Steven Rostedt 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1443695293-31127-1-git-send-email-pbonz...@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/traceevent/plugin_kvm.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/tools/lib/traceevent/plugin_kvm.c b/tools/lib/traceevent/plugin_kvm.c
index 88fe83dff7cd..18536f756577 100644
--- a/tools/lib/traceevent/plugin_kvm.c
+++ b/tools/lib/traceevent/plugin_kvm.c
@@ -124,7 +124,10 @@ static const char *disassemble(unsigned char *insn, int len, uint64_t rip,
_ER(WBINVD,  54)\
_ER(XSETBV,  55)\
_ER(APIC_WRITE,  56)\
-   _ER(INVPCID, 58)
+   _ER(INVPCID, 58)\
+   _ER(PML_FULL,62)\
+   _ER(XSAVES,  63)\
+   _ER(XRSTORS, 64)
 
 #define SVM_EXIT_REASONS \
_ER(EXIT_READ_CR0,  0x000)  \
@@ -352,15 +355,18 @@ static int kvm_nested_vmexit_handler(struct trace_seq *s, struct pevent_record *
 union kvm_mmu_page_role {
unsigned word;
struct {
-   unsigned glevels:4;
unsigned level:4;
+   unsigned cr4_pae:1;
unsigned quadrant:2;
-   unsigned pad_for_nice_hex_output:6;
unsigned direct:1;
unsigned access:3;
unsigned invalid:1;
-   unsigned cr4_pge:1;
unsigned nxe:1;
+   unsigned cr0_wp:1;
+   unsigned smep_and_not_wp:1;
+   unsigned smap_and_not_wp:1;
+   unsigned pad_for_nice_hex_output:8;
+   unsigned smm:8;
};
 };
 
@@ -385,15 +391,18 @@ static int kvm_mmu_print_role(struct trace_seq *s, struct pevent_record *record,
if (pevent_is_file_bigendian(event->pevent) ==
pevent_is_host_bigendian(event->pevent)) {
 
-   trace_seq_printf(s, "%u/%u q%u%s %s%s %spge %snxe",
+   trace_seq_printf(s, "%u q%u%s %s%s %spae %snxe %swp%s%s%s",
 role.level,
-role.glevels,
 role.quadrant,
 role.direct ? " direct" : "",
 access_str[role.access],
 role.invalid ? " invalid" : "",
-role.cr4_pge ? "" : "!",
-role.nxe ? "" : "!");
+role.cr4_pae ? "" : "!",
+role.nxe ? "" : "!",
+role.cr0_wp ? "" : "!",
+role.smep_and_not_wp ? " smep" : "",
+role.smap_and_not_wp ? " smap" : "",
+role.smm ? " smm" : "");
} else
trace_seq_printf(s, "WORD: %08x", role.word);
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Difference between vcpu_load and kvm_sched_in ?

2015-10-20 Thread Wanpeng Li

On 10/20/15 11:44 PM, Paolo Bonzini wrote:
>
> On 20/10/2015 11:57, Yacine wrote:
>> vcpu_load; start cr3 trapping; vcpu_put
>>
>> it worked correctly (in my logs I see that vcpu.cpu becomes equal to
>> "cpu = raw_smp_processor_id();") but the VM blocks for a long time due
>> to the mutex in vcpu_load (up to several seconds and sometimes minutes!)
>
> Right, that's because while the CPU is running the mutex is taken.  If
> the VCPU doesn't exit, the mutex is held.
>
>> I replaced vcpu_load with kvm_sched_in, now everything works perfectly
>> and the VM doesn't block at all (logs here: http://pastebin.com/h5XNNMcb).
>>
>> So, what I want to know is: what is the difference between vcpu_load
>> and kvm_sched_in?  Both of these functions call kvm_arch_vcpu_load, but
>> the latter does it without taking the mutex.
>
> kvm_sched_out and kvm_sched_in are part of KVM's preemption hooks.  The
> hooks are registered only between vcpu_load and vcpu_put, therefore they
> know that the mutex is taken.  The sequence will go like this:
>
>  vcpu_load
>  kvm_sched_out
>  kvm_sched_in
>  kvm_sched_out
>  kvm_sched_in
>  ...
>  vcpu_put

Should this be:

vcpu_load
kvm_sched_in
kvm_sched_out
kvm_sched_in
kvm_sched_out
...
vcpu_put

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tools lib traceevent: update KVM plugin

2015-10-20 Thread Steven Rostedt
On Tue, 20 Oct 2015 12:32:44 -0200
Arnaldo Carvalho de Melo  wrote:


> > Ping?  Arnaldo, ok to include this patch in my tree?
> 
> Applying, I saw it before, but lost track, perhaps was waiting for
> Steven Rostedt to chime in, but now I noticed he wasn't CCed, he is now.

What happens if you run new perf on an older kernel. Is this new plugin
going to be screwed up? Plugins should be backward compatible.

Is the plugin even still needed? I'm looking at some of the kvm events
and they seem to be mostly self sufficient. What ones need a plugin
today?

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] KVM/arm: enable enhanced armv7 fp/simd lazy switch

2015-10-20 Thread Mario Smarduch


On 10/20/2015 12:24 AM, Christoffer Dall wrote:
> On Mon, Oct 19, 2015 at 04:25:04PM -0700, Mario Smarduch wrote:
>>
>>
>> On 10/19/2015 3:14 AM, Christoffer Dall wrote:
>>> On Sat, Sep 26, 2015 at 04:43:29PM -0700, Mario Smarduch wrote:
 This patch enhances current lazy vfp/simd hardware switch. In addition to
 current lazy switch, it tracks vfp/simd hardware state with a vcpu 
 lazy flag. 

 vcpu lazy flag is set on guest access and trap to vfp/simd hardware switch 
 handler. On vm-enter if lazy flag is set skip trap enable and saving 
 host fpexc. On vm-exit if flag is set skip hardware context switch
 and return to host with guest context.

 On vcpu_put check if vcpu lazy flag is set, and execute a hardware context 
 switch to restore host.

 Signed-off-by: Mario Smarduch 
 ---
  arch/arm/include/asm/kvm_asm.h |  1 +
  arch/arm/kvm/arm.c | 17 
  arch/arm/kvm/interrupts.S  | 60 
 +++---
  arch/arm/kvm/interrupts_head.S | 12 ++---
  4 files changed, 71 insertions(+), 19 deletions(-)

 diff --git a/arch/arm/include/asm/kvm_asm.h 
 b/arch/arm/include/asm/kvm_asm.h
 index 194c91b..4b45d79 100644
 --- a/arch/arm/include/asm/kvm_asm.h
 +++ b/arch/arm/include/asm/kvm_asm.h
 @@ -97,6 +97,7 @@ extern char __kvm_hyp_code_end[];
  extern void __kvm_flush_vm_context(void);
  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 +extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
  
  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
  #endif
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index ce404a5..79f49c7 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -105,6 +105,20 @@ void kvm_arch_check_processor_compat(void *rtn)
*(int *)rtn = 0;
  }
  
 +/**
 + * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
 + * @vcpu: pointer to vcpu structure.
 + *
>>>
>>> nit: stray blank line
>> ok
>>>
 + */
 +static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
 +{
 +#ifdef CONFIG_ARM
 +  if (vcpu->arch.vfp_lazy == 1) {
 +  kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
>>>
>>> why do you have to do this in HYP mode ?
>>  Calling it directly works fine. I moved the function outside hyp start/end
>> range in interrupts.S. Not thinking outside the box, just thought let them 
>> all
>> be hyp calls.
>>
>>>
 +  vcpu->arch.vfp_lazy = 0;
 +  }
 +#endif
>>>
>>> we've tried to put stuff like this in header files to avoid the ifdefs
>>> so far.  Could that be done here as well?
>>
>> That was a to do, but didn't get around to it.
>>>
 +}
  
  /**
   * kvm_arch_init_vm - initializes a VM data structure
 @@ -295,6 +309,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
  
  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  {
 +  /* Check if Guest accessed VFP registers */
>>>
>>> misleading comment: this function does more than checking
>> Yep sure does, will change.
>>>
 +  kvm_switch_fp_regs(vcpu);
 +
/*
 * The arch-generic KVM code expects the cpu field of a vcpu to be -1
 * if the vcpu is no longer assigned to a cpu.  This is used for the
 diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
 index 900ef6d..6d98232 100644
 --- a/arch/arm/kvm/interrupts.S
 +++ b/arch/arm/kvm/interrupts.S
 @@ -96,6 +96,29 @@ ENTRY(__kvm_flush_vm_context)
bx  lr
  ENDPROC(__kvm_flush_vm_context)
  
 +/**
 + * void __kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes a lazy
 + * fp/simd switch, saves the guest, restores host.
 + *
>>>
>>> nit: stray blank line
>> ok.
>>>
 + */
 +ENTRY(__kvm_restore_host_vfp_state)
 +#ifdef CONFIG_VFPv3
 +  push{r3-r7}
 +
 +  add r7, r0, #VCPU_VFP_GUEST
 +  store_vfp_state r7
 +
 +  add r7, r0, #VCPU_VFP_HOST
 +  ldr r7, [r7]
 +  restore_vfp_state r7
 +
 +  ldr r3, [vcpu, #VCPU_VFP_FPEXC]
>>>
>>> either use r0 or vcpu throughout this function please
>> Yeah that's bad - in the same function to
>>>
 +  VFPFMXR FPEXC, r3
 +
 +  pop {r3-r7}
 +#endif
 +  bx  lr
 +ENDPROC(__kvm_restore_host_vfp_state)
  
  /
   *  Hypervisor world-switch code
 @@ -119,11 +142,15 @@ ENTRY(__kvm_vcpu_run)
@ If the host kernel has not been configured with VFPv3 support,
@ then it is safer if we deny guests from using it as well.
  #ifdef CONFIG_VFPv3
 +  @ r7 must be preserved until next vfp lazy check
>>>
>>> I don't understand this 

Re: Steal time accounting in KVM. Benchmark.

2015-10-20 Thread Alexey Makhalov
Yes, VM1 results are as before.

Alexey

On Tue, Oct 20, 2015 at 4:04 PM, Wanpeng Li  wrote:
> On 10/21/15 4:05 AM, Alexey Makhalov wrote:
>>
>> 'echo NO_NONTASK_CAPACITY > /sys/kernel/debug/sched_features'  in both
>> guests.
>> Results:
>> VM1: STA is disabled -- no changes, still a little below the expected 90%.
>> VM2: STA is enabled -- the result changed, but it is still bad; hard to say
>> whether better or worse. It tends to get stuck at quarters (100% 75% 50% 25%).
>> Output is attached.
>
>
> Is the output in the attachment for VM2 only?
>
> Regards,
> Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Steal time accounting in KVM. Benchmark.

2015-10-20 Thread Wanpeng Li

On 10/21/15 4:05 AM, Alexey Makhalov wrote:

> 'echo NO_NONTASK_CAPACITY > /sys/kernel/debug/sched_features' in both guests.
> Results:
> VM1: STA is disabled -- no changes, still a little below the expected 90%.
> VM2: STA is enabled -- the result changed, but it is still bad; hard to say
> whether better or worse. It tends to get stuck at quarters (100% 75% 50% 25%).
> Output is attached.


Is the output in the attachment for VM2 only?

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Network hangs when communicating with host

2015-10-20 Thread Dmitry Vyukov
I now have another issue. My binary fails to mmap a file within lkvm
sandbox. The same binary works fine on host and in qemu. I've added
strace into sandbox script, and here is the output:

[pid   837] openat(AT_FDCWD, "syzkaller-shm048878722", O_RDWR|O_CLOEXEC) = 5
[pid   837] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 5,
0) = -1 EINVAL (Invalid argument)

I don't see anything that can potentially cause EINVAL here. Is it
possible that lkvm somehow affects kernel behavior here?

I run lkvm as:

$ taskset 1 /kvmtool/lkvm sandbox --disk syz-0 --mem=2048 --cpus=2
--kernel /arch/x86/boot/bzImage --network mode=user --sandbox
/workdir/kvm/syz-0.sh








On Mon, Oct 19, 2015 at 4:20 PM, Sasha Levin  wrote:
> On 10/19/2015 05:28 AM, Dmitry Vyukov wrote:
>> On Mon, Oct 19, 2015 at 11:22 AM, Andre Przywara  
>> wrote:
>>> Hi Dmitry,
>>>
>>> On 19/10/15 10:05, Dmitry Vyukov wrote:
 On Fri, Oct 16, 2015 at 7:25 PM, Sasha Levin  
 wrote:
> On 10/15/2015 04:20 PM, Dmitry Vyukov wrote:
>> Hello,
>>
>> I am trying to run a program in lkvm sandbox so that it communicates
>> with a program on host. I run lkvm as:
>>
>> ./lkvm sandbox --disk sandbox-test --mem=2048 --cpus=4 --kernel
>> /arch/x86/boot/bzImage --network mode=user -- /my_prog
>>
>> /my_prog then connects to a program on host over a tcp socket.
>> I see that host receives some data, sends some data back, but then
>> my_prog hangs on network read.
>>
>> To localize this I wrote 2 programs (attached). ping is run on host
>> and pong is run from lkvm sandbox. They successfully establish tcp
>> connection, but after some iterations both hang on read.
>>
>> Networking code in Go runtime is there for more than 3 years, widely
>> used in production and does not have any known bugs. However, it uses
>> epoll edge-triggered readiness notifications that known to be tricky.
>> Is it possible that lkvm contains some networking bug? Can it be
>> related to the data races in lkvm I reported earlier today?
>>>
>>> Just to let you know:
>>> I think we have seen networking issues in the past - root over NFS had
>>> issues IIRC. Will spent some time on debugging this and it looked like a
>>> race condition in kvmtool's virtio implementation. I think pinning
>>> kvmtool's virtio threads to one host core made this go away. However
>>> although he tried hard (even by Will's standards!) he couldn't find a
>>> the real root cause or a fix at the time he looked at it and we found
>>> other ways to work around the issues (using virtio-blk or initrd's).
>>>
>>> So it's quite possible that there are issues. I haven't had time yet to
>>> look at your sanitizer reports, but it looks like a promising approach
>>> to find the root cause.
>>
>>
>> Thanks, Andre!
>>
>> ping/pong does not hang within at least 5 minutes when I run lkvm
>> under taskset 1.
>>
>> And, yeah, this pretty strongly suggests a data race. ThreadSanitizer
>> can point you to the bug within a minute, so you just need to say
>> "aha! it is here". Or maybe not. There are no guarantees. But if you
>> already spent significant time on this, then checking the reports
>> definitely looks like a good idea.
>
> Okay, that's good to know.
>
> I have a few busy days, but I'll definitely try to clear up these reports
> as they seem to be pointing to real issues.
>
>
> Thanks,
> Sasha
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] userfault21 update

2015-10-20 Thread Patrick Donnelly
On Mon, Oct 19, 2015 at 5:42 PM, Andrea Arcangeli  wrote:
> Hello Patrick,
>
> On Mon, Oct 12, 2015 at 11:04:11AM -0400, Patrick Donnelly wrote:
>> Hello Andrea,
>>
>> On Mon, Jun 15, 2015 at 1:22 PM, Andrea Arcangeli  
>> wrote:
>> > This is an incremental update to the userfaultfd code in -mm.
>>
>> Sorry I'm late to this party. I'm curious how a ptrace monitor might
>> use a userfaultfd to handle faults in all of its tracees. Is this
>> possible without having each (newly forked) tracee "cooperate" by
>> creating a userfaultfd and passing that to the tracer?
>
> To make the non-cooperative usage work, userfaultfd also needs more
> features to track fork() and mremap() syscalls and such, as the
> monitor needs to be aware of modifications to the address space of
> each "mm" it is managing, and of newly forked "mm"s as well. So fork()
> won't need to call userfaultfd once we add those features, but it still
> doesn't need to know about the "pid". The uffd_msg already has padding
> to add the features you need for that.
>
> Pavel invented and developed those features for the non cooperative
> usage to implement postcopy live migration of containers. He posted
> some patchset on the lists too, but it probably needs to be rebased on
> upstream.
>
> The ptrace monitor thread can also fault into the userfault area if it
> wants to (but only if it's not the userfault manager thread as well).
> I didn't expect the ptrace monitor to want to be a userfault manager
> too though.
> [...]

Okay, it's definitely tricky to make this work for a tree of
non-cooperative processes. Brainstorming some ideas:

o If we are using ptrace, then we can add a ptrace event for receiving
the userfaultfd associated with the tracee, via waitpid (!). The
ptrace monitor can deduplicate userfaultfds by looking at the inode.
It can also associate a userfaultfd with a group of threads sharing a
mm. [For my possible use-case with Parrot[1], we already track the
shared address spaces of tracees in order to implement an mmap hook.]

o The userfaultfd can have a flag for tracking a tree of processes
(which can be sent via unix sockets to the userfault handler) and use
an opaque tag (the mm pointer?) to disambiguate the faults, instead of
a pid. There would need to be some kind of message to notify about
newly cloned threads and the mm associated with them? Yes, you
wouldn't be able to know which pid (or kernel/ptrace thread) generated
a fault but at least you would know which pids the mm belongs to.

I didn't see the patchset Pavel posted in a quick search of the
archives. Only this [2].

[1] http://ccl.cse.nd.edu/software/parrot/
[2] https://lkml.org/lkml/2015/1/15/103

-- 
Patrick Donnelly
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html