Re: [PATCH 0/6] VSOCK: revert virtio-vsock until device spec is finalized

2015-12-08 Thread David Miller
From: Stefan Hajnoczi 
Date: Tue,  8 Dec 2015 19:57:30 +0800

> Please revert for now.

Please don't revert it piece by piece like this.

Instead, send me one big revert commit that undoes the whole
thing.  There is even a merge commit that you can use to
create that revert cleanly.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 14/21] KVM: ARM64: Add reset and access handlers for PMOVSSET and PMOVSCLR register

2015-12-08 Thread Marc Zyngier
On 08/12/15 12:47, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use
> reset_unknown as their reset handler. Add a new case to emulate writing
> the PMOVSSET or PMOVSCLR register.
> 
> When writing a non-zero value to PMOVSSET, pend the PMU interrupt. When
> the value written to PMOVSCLR is equal to the current value, clear the
> PMU pending interrupt.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c | 27 --
>  include/kvm/arm_pmu.h |  4 +++
>  virt/kvm/arm/pmu.c| 72 +++
>  3 files changed, 100 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index c1dffb2..c830fde 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -601,6 +601,15 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
>   vcpu_sys_reg(vcpu, r->reg) &= ~p->regval;
>   break;
>   }
> + case PMOVSSET_EL0: {
> + if (r->CRm == 14)
> + /* accessing PMOVSSET_EL0 */
> + kvm_pmu_overflow_set(vcpu, p->regval);
> + else
> + /* accessing PMOVSCLR_EL0 */
> + kvm_pmu_overflow_clear(vcpu, p->regval);
> + break;
> + }
>   case PMCR_EL0: {
>   /* Only update writeable bits of PMCR */
>   val = vcpu_sys_reg(vcpu, r->reg);
> @@ -847,7 +856,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> access_pmu_regs, reset_unknown, PMCNTENSET_EL0 },
>   /* PMOVSCLR_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
> -   trap_raz_wi },
> +   access_pmu_regs, reset_unknown, PMOVSSET_EL0 },
>   /* PMSWINC_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
> trap_raz_wi },
> @@ -874,7 +883,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> trap_raz_wi },
>   /* PMOVSSET_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
> -   trap_raz_wi },
> +   access_pmu_regs, reset_unknown, PMOVSSET_EL0 },
>  
>   /* TPIDR_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
> @@ -1184,6 +1193,15 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
>   vcpu_cp15(vcpu, r->reg) &= ~p->regval;
>   break;
>   }
> + case c9_PMOVSSET: {
> + if (r->CRm == 14)
> + /* accessing c9_PMOVSSET */
> + kvm_pmu_overflow_set(vcpu, p->regval);
> + else
> + /* accessing c9_PMOVSCLR */
> + kvm_pmu_overflow_clear(vcpu, p->regval);
> + break;
> + }
>   case c9_PMCR: {
>   /* Only update writeable bits of PMCR */
>   val = vcpu_cp15(vcpu, r->reg);
> @@ -1271,7 +1289,8 @@ static const struct sys_reg_desc cp15_regs[] = {
> NULL, c9_PMCNTENSET },
>   { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmu_cp15_regs,
> NULL, c9_PMCNTENSET },
> - { Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
> + { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmu_cp15_regs,
> +   NULL, c9_PMOVSSET },
>   { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_cp15_regs,
> NULL, c9_PMSELR },
>   { Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmu_cp15_regs,
> @@ -1287,6 +1306,8 @@ static const struct sys_reg_desc cp15_regs[] = {
> NULL, c9_PMINTENSET },
>   { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_cp15_regs,
> NULL, c9_PMINTENSET },
> + { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmu_cp15_regs,
> +   NULL, c9_PMOVSSET },
>  
>   { Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
>   { Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index e731656..a76df52 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -41,6 +41,8 @@ struct kvm_pmu {
>  u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx);
>  void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable);
> +void kvm_pmu_overflow_clear(struct kvm_vcpu *vcpu, u32 val);
> +void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>   u32 select_idx);
>  #else
> @@ -50,6 +52,8 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu 

Re: [PATCH v6 15/21] KVM: ARM64: Add reset and access handlers for PMUSERENR register

2015-12-08 Thread Marc Zyngier
On 08/12/15 12:47, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> The reset value of PMUSERENR_EL0 is UNKNOWN, use reset_unknown.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index c830fde..80b66c0 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -880,7 +880,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> access_pmu_pmxevcntr },
>   /* PMUSERENR_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
> -   trap_raz_wi },
> +   access_pmu_regs, reset_unknown, PMUSERENR_EL0 },

So while the 64bit view of the register resets as unknown, a CPU
resetting in 32bit mode resets it to 0. I suggest you reset it to zero, and
document that choice. You may have to revisit all the other registers
that do reset as unknown for 64bit as well.

>   /* PMOVSSET_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
> access_pmu_regs, reset_unknown, PMOVSSET_EL0 },
> @@ -1301,7 +1301,8 @@ static const struct sys_reg_desc cp15_regs[] = {
> NULL, c9_PMCCNTR },
>   { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_pmxevtyper },
>   { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_pmxevcntr },
> - { Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
> + { Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmu_cp15_regs,
> +   NULL,  c9_PMUSERENR, 0 },
>   { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pmu_cp15_regs,
> NULL, c9_PMINTENSET },
>   { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_cp15_regs,
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] vhost: vsock: select CONFIG_VHOST

2015-12-08 Thread Michael S. Tsirkin
On Tue, Dec 08, 2015 at 04:46:08PM +0100, Arnd Bergmann wrote:
> When building the new vsock code without vhost, we get a build error:
> 
> drivers/built-in.o: In function `vhost_vsock_flush':
> :(.text+0x24d29c): undefined reference to `vhost_poll_flush'
> 
> This adds an explicit 'select' like we have for the other vhost
> drivers.
> 
> Signed-off-by: Arnd Bergmann 

This will need to be done eventually, so

Acked-by: Michael S. Tsirkin 

but I really think the right thing for now is to revert current vsock
code, or disable building it unconditionally.  Stefan, could you please
send a patch like this?

> ---
>  drivers/vhost/Kconfig.vsock | 2 ++
>  1 file changed, 2 insertions(+)
> 
> The patch causing the problem is currently in net-next, so the fix should be
> applied on top of that.
> 
> diff --git a/drivers/vhost/Kconfig.vsock b/drivers/vhost/Kconfig.vsock
> index 3491865d3eb9..bfb9edc4b5d6 100644
> --- a/drivers/vhost/Kconfig.vsock
> +++ b/drivers/vhost/Kconfig.vsock
> @@ -2,6 +2,8 @@ config VHOST_VSOCK
>   tristate "vhost virtio-vsock driver"
>   depends on VSOCKETS && EVENTFD
>   select VIRTIO_VSOCKETS_COMMON
> + select VHOST
> + select VHOST_RING
>   default n
>   ---help---
>   Say M here to enable the vhost-vsock for virtio-vsock guests
> -- 
> 2.1.0.rc2
> 


Re: [PATCH v6 11/21] KVM: ARM64: Add access handler for PMXEVCNTR register

2015-12-08 Thread Marc Zyngier
On 08/12/15 12:47, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Accessing PMXEVCNTR register is mapped to the PMEVCNTRn or PMCCNTR which
> is selected by PMSELR.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c | 44 ++--
>  1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f7a73b5..2304937 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -516,6 +516,46 @@ out:
>   return true;
>  }
>  
> +static bool access_pmu_pmxevcntr(struct kvm_vcpu *vcpu,
> +  struct sys_reg_params *p,
> +  const struct sys_reg_desc *r)
> +{
> + u64 pmcr, idx, val;
> +
> + if (!vcpu_mode_is_32bit(vcpu)) {
> + pmcr = vcpu_sys_reg(vcpu, PMCR_EL0);
> + idx = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_COUNTER_MASK;
> +
> + if (!pmu_counter_idx_valid(pmcr, idx))
> + goto out;
> +
> + val = kvm_pmu_get_counter_value(vcpu, idx);
> + if (!p->is_write) {
> + p->regval = val;
> + goto out;
> + }
> +
> + vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + idx) += (s64)p->regval - val;
> + } else {
> + pmcr = vcpu_cp15(vcpu, c9_PMCR);
> + idx = vcpu_cp15(vcpu, c9_PMSELR) & ARMV8_COUNTER_MASK;
> +
> + if (!pmu_counter_idx_valid(pmcr, idx))
> + goto out;
> +
> + val = kvm_pmu_get_counter_value(vcpu, idx);
> + if (!p->is_write) {
> + p->regval = val;
> + goto out;
> + }
> +
> + vcpu_cp15(vcpu, c14_PMEVCNTR0 + idx) += (s64)p->regval - val;
> + }
> +
> +out:
> + return true;
> +}

There is definitely some common code with the handling of PMEVCNTRn
here. Can you please factor it out?

> +
>  /* PMU registers accessor. */
>  static bool access_pmu_regs(struct kvm_vcpu *vcpu,
>   struct sys_reg_params *p,
> @@ -804,7 +844,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> access_pmu_pmxevtyper },
>   /* PMXEVCNTR_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
> -   trap_raz_wi },
> +   access_pmu_pmxevcntr },
>   /* PMUSERENR_EL0 */
>   { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
> trap_raz_wi },
> @@ -1192,7 +1232,7 @@ static const struct sys_reg_desc cp15_regs[] = {
>   { Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_cp15_regs,
> NULL, c9_PMCCNTR },
>   { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_pmxevtyper },
> - { Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
> + { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_pmxevcntr },
>   { Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
>   { Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
>   { Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] vhost: vsock: select CONFIG_VHOST

2015-12-08 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 8 Dec 2015 18:09:44 +0200

> On Tue, Dec 08, 2015 at 04:46:08PM +0100, Arnd Bergmann wrote:
>> When building the new vsock code without vhost, we get a build error:
>> 
>> drivers/built-in.o: In function `vhost_vsock_flush':
>> :(.text+0x24d29c): undefined reference to `vhost_poll_flush'
>> 
>> This adds an explicit 'select' like we have for the other vhost
>> drivers.
>> 
>> Signed-off-by: Arnd Bergmann 
> 
> This will need to be done eventually, so
> 
> Acked-by: Michael S. Tsirkin 
> 
> but I really think the right thing for now is to revert current vsock
> code, or disable building it unconditionally.  Stefan, could you please
> send a patch like this?

I'm waiting for the full single revert patch and will apply it as
soon as I receive it.


[PATCH v10 5/6] target-arm: kvm - re-inject guest debug exceptions

2015-12-08 Thread Alex Bennée
If we can't find details for the debug exception in our debug state
then we can assume the exception is due to debugging inside the guest.
To inject the exception into the guest state we re-use the TCG exception
code (do_interrupt).

However, while guest debugging is in effect, we currently can't handle
the guest using single-step, as we will keep trapping back to userspace.
GDB makes heavy use of single-step behind the scenes, which effectively
means the guest's ability to debug itself is disabled while it is being
debugged.

Signed-off-by: Alex Bennée <alex.ben...@linaro.org>

---
v5:
  - new for v5
v10:
  - fix arm32 compile
  - add full stop at end of sentence
  - attempted to expand on limitations in commit msg
---
 target-arm/helper-a64.c | 12 ++--
 target-arm/kvm64.c  | 24 +---
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index deb8dbe..fc3ccdf 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -25,6 +25,7 @@
 #include "qemu/bitops.h"
 #include "internals.h"
 #include "qemu/crc32c.h"
+#include "sysemu/kvm.h"
 #include  /* For crc32 */
 
 /* C2.4.7 Multiply and divide */
@@ -469,7 +470,8 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
   new_el);
 if (qemu_loglevel_mask(CPU_LOG_INT)
 && !excp_is_internal(cs->exception_index)) {
-qemu_log_mask(CPU_LOG_INT, "...with ESR 0x%" PRIx32 "\n",
+qemu_log_mask(CPU_LOG_INT, "...with ESR %x/0x%" PRIx32 "\n",
+  env->exception.syndrome >> ARM_EL_EC_SHIFT,
   env->exception.syndrome);
 }
 
@@ -535,6 +537,12 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
 aarch64_restore_sp(env, new_el);
 
 env->pc = addr;
-cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+
+qemu_log_mask(CPU_LOG_INT, "...to EL%d PC 0x%" PRIx64 " PSTATE 0x%x\n",
+  new_el, env->pc, pstate_read(env));
+
+if (!kvm_enabled()) {
+cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
+}
 }
 #endif
diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index 771ecdb..8e6d044 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -871,6 +871,7 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)
 {
 int hsr_ec = debug_exit->hsr >> ARM_EL_EC_SHIFT;
 ARMCPU *cpu = ARM_CPU(cs);
+CPUClass *cc = CPU_GET_CLASS(cs);
 CPUARMState *env = &cpu->env;
 
 /* Ensure PC is synchronised */
@@ -881,7 +882,14 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)
 if (cs->singlestep_enabled) {
 return true;
 } else {
-error_report("Came out of SINGLE STEP when not enabled");
+/*
+ * The kernel should have suppressed the guest's ability to
+ * single step at this point so something has gone wrong.
+ */
+error_report("%s: guest single-step while debugging unsupported"
+ " (%"PRIx64", %"PRIx32")\n",
+ __func__, env->pc, debug_exit->hsr);
+return false;
 }
 break;
 case EC_AA64_BKPT:
@@ -908,12 +916,14 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)
      __func__, debug_exit->hsr, env->pc);
 }
 
-/* If we don't handle this it could be it really is for the
-   guest to handle */
-qemu_log_mask(LOG_UNIMP,
-  "%s: re-injecting exception not yet implemented"
-  " (0x%"PRIx32", %"PRIx64")\n",
-  __func__, hsr_ec, env->pc);
+/* If we are not handling the debug exception it must belong to
+ * the guest. Let's re-use the existing TCG interrupt code to set
+ * everything up properly.
+ */
+cs->exception_index = EXCP_BKPT;
+env->exception.syndrome = debug_exit->hsr;
+env->exception.vaddress = debug_exit->far;
+cc->do_interrupt(cs);
 
 return false;
 }
-- 
2.6.3



Re: [PATCH v6 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-08 Thread Marc Zyngier
On 08/12/15 12:47, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> When calling perf_event_create_kernel_counter to create perf_event,
> assign a overflow handler. Then when perf event overflows, call
> kvm_vcpu_kick() to sync the interrupt.

Please update the commit message, things have changed quite a bit now.

> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm/kvm/arm.c|  2 ++
>  include/kvm/arm_pmu.h |  2 ++
>  virt/kvm/arm/pmu.c| 52 ++-
>  3 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index e06fd29..cd696ef 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> @@ -569,6 +570,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>* non-preemptible context.
>*/
>   preempt_disable();
> + kvm_pmu_flush_hwstate(vcpu);
>   kvm_timer_flush_hwstate(vcpu);
>   kvm_vgic_flush_hwstate(vcpu);
>  
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index a131f76..c4041008 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -38,6 +38,7 @@ struct kvm_pmu {
>  };
>  
>  #ifdef CONFIG_KVM_ARM_PMU
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
>  u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx);
>  void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable);
> @@ -48,6 +49,7 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>   u32 select_idx);
>  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
>  #else
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
>  u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
>  {
>   return 0;
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 9b9c706..ff182d6 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /**
>   * kvm_pmu_get_counter_value - get PMU counter value
> @@ -90,6 +91,54 @@ static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
>  }
>  
>  /**
> + * kvm_pmu_flush_hwstate - flush pmu state to cpu
> + * @vcpu: The vcpu pointer
> + *
> + * Inject virtual PMU IRQ if IRQ is pending for this cpu.
> + */
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> + u32 overflow;
> +
> + if (pmu->irq_num == -1)
> + return;
> +
> + if (!vcpu_mode_is_32bit(vcpu)) {
> + if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E))
> + return;
> +
> + overflow = vcpu_sys_reg(vcpu, PMCNTENSET_EL0)
> +& vcpu_sys_reg(vcpu, PMINTENSET_EL1)
> +& vcpu_sys_reg(vcpu, PMOVSSET_EL0);
> + } else {
> + if (!(vcpu_cp15(vcpu, c9_PMCR) & ARMV8_PMCR_E))
> + return;
> +
> + overflow = vcpu_cp15(vcpu, c9_PMCNTENSET)
> +& vcpu_cp15(vcpu, c9_PMINTENSET)
> +& vcpu_cp15(vcpu, c9_PMOVSSET);
> + }
> +
> + kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, pmu->irq_num,
> + overflow ? 1 : 0);
> +}
> +
> +/**
> + * When perf event overflows, call kvm_pmu_overflow_set to set overflow status.
> + */
> +static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
> +   struct perf_sample_data *data,
> +   struct pt_regs *regs)
> +{
> + struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> + struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
> + int idx = pmc->idx;
> +
> + kvm_pmu_overflow_set(vcpu, BIT(idx));
> +}
> +
> +/**
>   * kvm_pmu_enable_counter - enable selected PMU counter
>   * @vcpu: The vcpu pointer
>   * @val: the value guest writes to PMCNTENSET register
> @@ -341,7 +390,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>   /* The initial sample period (overflow count) of an event. */
>   attr.sample_period = (-counter) & pmc->bitmask;
>  
> - event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
> + event = perf_event_create_kernel_counter(&attr, -1, current,
> +  kvm_pmu_perf_overflow, pmc);
>   if (IS_ERR(event)) {
>   printk_once("kvm: pmu event creation failed %ld\n",
>   PTR_ERR(event));
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH v6 21/21] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-08 Thread Marc Zyngier
On 08/12/15 12:47, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Add a new kvm device type KVM_DEV_TYPE_ARM_PMU_V3 for ARM PMU. Implement
> the kvm_device_ops for it.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  Documentation/virtual/kvm/devices/arm-pmu.txt | 16 +
>  arch/arm64/include/uapi/asm/kvm.h |  3 +
>  include/linux/kvm_host.h  |  1 +
>  include/uapi/linux/kvm.h  |  2 +
>  virt/kvm/arm/pmu.c| 93 +++
>  virt/kvm/kvm_main.c   |  4 ++
>  6 files changed, 119 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/devices/arm-pmu.txt
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-pmu.txt b/Documentation/virtual/kvm/devices/arm-pmu.txt
> new file mode 100644
> index 000..5121f1f
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/arm-pmu.txt
> @@ -0,0 +1,16 @@
> +ARM Virtual Performance Monitor Unit (vPMU)
> +===========================================
> +
> +Device types supported:
> +  KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3
> +
> +Instantiate one PMU instance per VCPU through this API.
> +
> +Groups:
> +  KVM_DEV_ARM_PMU_GRP_IRQ
> +  Attributes:
> +A value describing the interrupt number of PMU overflow interrupt. This
> +interrupt should be a PPI.
> +
> +  Errors:
> +-EINVAL: Value set is out of the expected range (from 16 to 31)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 2d4ca4b..568afa2 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -204,6 +204,9 @@ struct kvm_arch_memory_slot {
>  #define KVM_DEV_ARM_VGIC_GRP_CTRL 4
>  #define   KVM_DEV_ARM_VGIC_CTRL_INIT 0
>  
> +/* Device Control API: ARM PMU */
> +#define KVM_DEV_ARM_PMU_GRP_IRQ  0
> +
>  /* KVM_IRQ_LINE irq field index values */
>  #define KVM_ARM_IRQ_TYPE_SHIFT   24
>  #define KVM_ARM_IRQ_TYPE_MASK 0xff
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c923350..608dea6 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1161,6 +1161,7 @@ extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
> +extern struct kvm_device_ops kvm_arm_pmu_ops;
>  
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 03f3618..4ba6fdd 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1032,6 +1032,8 @@ enum kvm_device_type {
>  #define KVM_DEV_TYPE_FLIC KVM_DEV_TYPE_FLIC
>   KVM_DEV_TYPE_ARM_VGIC_V3,
>  #define KVM_DEV_TYPE_ARM_VGIC_V3 KVM_DEV_TYPE_ARM_VGIC_V3
> + KVM_DEV_TYPE_ARM_PMU_V3,
> +#define  KVM_DEV_TYPE_ARM_PMU_V3 KVM_DEV_TYPE_ARM_PMU_V3
>   KVM_DEV_TYPE_MAX,
>  };
>  
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index f8007c7..a84a4d7 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -19,10 +19,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
> +#include "vgic.h"
> +
>  /**
>   * kvm_pmu_get_counter_value - get PMU counter value
>   * @vcpu: The vcpu pointer
> @@ -438,3 +441,93 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>  
>   pmc->perf_event = event;
>  }
> +
> +static inline bool kvm_arm_pmu_initialized(struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.pmu.irq_num != -1;
> +}
> +
> +static int kvm_arm_pmu_set_irq(struct kvm *kvm, int irq)
> +{
> + int j;
> + struct kvm_vcpu *vcpu;
> +
> + kvm_for_each_vcpu(j, vcpu, kvm) {
> + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +
> + if (!kvm_arm_pmu_initialized(vcpu))
> + kvm_debug("Set kvm ARM PMU irq: %d\n", irq);
> + pmu->irq_num = irq;

Missing braces? Also, you should consider returning an error if it has
been initialized twice.

> + }
> +
> + return 0;
> +}
> +
> +static int kvm_arm_pmu_create(struct kvm_device *dev, u32 type)
> +{
> + int i;
> + struct kvm_vcpu *vcpu;
> + struct kvm *kvm = dev->kvm;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +
> + memset(pmu, 0, sizeof(*pmu));
> + kvm_pmu_vcpu_reset(vcpu);
> + pmu->irq_num = -1;
> + }
> +
> + return 0;
> +}
> +
> +static void kvm_arm_pmu_destroy(struct kvm_device *dev)
> +{
> + kfree(dev);
> +}
> +
> +static int kvm_arm_pmu_set_attr(struct kvm_device *dev,
> + struct kvm_device_attr *attr)
> +{
> + switch (attr->group) {
> + case KVM_DEV_ARM_PMU_GRP_IRQ: {
> + int __user *uaddr = (int 

Re: [PATCH v6 17/21] KVM: ARM64: Add helper to handle PMCR register bits

2015-12-08 Thread Marc Zyngier
On 08/12/15 12:47, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> According to ARMv8 spec, when writing 1 to PMCR.E, all counters are
> enabled by PMCNTENSET, while writing 0 to PMCR.E, all counters are
> disabled. When writing 1 to PMCR.P, reset all event counters, not
> including PMCCNTR, to zero. When writing 1 to PMCR.C, reset PMCCNTR to
> zero.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c |  2 ++
>  include/kvm/arm_pmu.h |  2 ++
>  virt/kvm/arm/pmu.c| 51 +++
>  3 files changed, 55 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 9baa654..110b288 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -620,6 +620,7 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
>   val &= ~ARMV8_PMCR_MASK;
>   val |= p->regval & ARMV8_PMCR_MASK;
>   vcpu_sys_reg(vcpu, r->reg) = val;
> + kvm_pmu_handle_pmcr(vcpu, val);
>   break;
>   }
>   case PMCEID0_EL0:
> @@ -1218,6 +1219,7 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
>   val &= ~ARMV8_PMCR_MASK;
>   val |= p->regval & ARMV8_PMCR_MASK;
>   vcpu_cp15(vcpu, r->reg) = val;
> + kvm_pmu_handle_pmcr(vcpu, val);
>   break;
>   }
>   case c9_PMCEID0:
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index d12450a..a131f76 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -46,6 +46,7 @@ void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>   u32 select_idx);
> +void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
>  #else
>  u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
>  {
> @@ -58,6 +59,7 @@ void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val) {}
>  void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 val) {}
>  void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>   u32 select_idx) {}
> +void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val) {}
>  #endif
>  
>  #endif
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 093e211..9b9c706 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -151,6 +151,57 @@ static u32 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
>  }
>  
>  /**
> + * kvm_pmu_handle_pmcr - handle PMCR register
> + * @vcpu: The vcpu pointer
> + * @val: the value guest writes to PMCR register
> + */
> +void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val)
> +{
> + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> + struct kvm_pmc *pmc;
> + u32 enable;
> + int i;
> +
> + if (val & ARMV8_PMCR_E) {
> + if (!vcpu_mode_is_32bit(vcpu))
> + enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);

This returns a 64bit quantity. Please explicitly select the low 32bits.

> + else
> + enable = vcpu_cp15(vcpu, c9_PMCNTENSET);
> +
> + kvm_pmu_enable_counter(vcpu, enable, true);

It really feels like there is some common stuff with the handling of
PMCNTENSET.

> + } else {
> + kvm_pmu_disable_counter(vcpu, 0xUL);
> + }
> +
> + if (val & ARMV8_PMCR_C) {
> + pmc = &pmu->pmc[ARMV8_MAX_COUNTERS - 1];

Nit: it would be nice to have a #define for the cycle counter.

> + if (pmc->perf_event)
> + local64_set(&pmc->perf_event->count, 0);
> + if (!vcpu_mode_is_32bit(vcpu))
> + vcpu_sys_reg(vcpu, PMCCNTR_EL0) = 0;
> + else
> + vcpu_cp15(vcpu, c9_PMCCNTR) = 0;
> + }
> +
> + if (val & ARMV8_PMCR_P) {
> + for (i = 0; i < ARMV8_MAX_COUNTERS - 1; i++) {
> + pmc = &pmu->pmc[i];
> + if (pmc->perf_event)
> + local64_set(&pmc->perf_event->count, 0);
> + if (!vcpu_mode_is_32bit(vcpu))
> + vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = 0;
> + else
> + vcpu_cp15(vcpu, c14_PMEVCNTR0 + i) = 0;
> + }
> + }
> +
> + if (val & ARMV8_PMCR_LC) {
> + pmc = &pmu->pmc[ARMV8_MAX_COUNTERS - 1];
> + pmc->bitmask = 0xUL;
> + }
> +}
> +
> +/**
>   * kvm_pmu_overflow_clear - clear PMU overflow interrupt
>   * @vcpu: The vcpu pointer
>   * @val: the value guest writes to PMOVSCLR register
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...

Re: [PATCH v6 00/21] KVM: ARM64: Add guest PMU support

2015-12-08 Thread Marc Zyngier
Shannon,

On 08/12/15 12:47, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> This patchset adds guest PMU support for KVM on ARM64. It takes
> trap-and-emulate approach. When guest wants to monitor one event, it
> will be trapped by KVM and KVM will call perf_event API to create a perf
> event and call relevant perf_event APIs to get the count value of event.

There is still some work to do. A number of bugs, some design issues,
and in general a lot of tidying up to do. If you want this in 4.5,
you're going to have to address this quickly.

But quickly doesn't mean rushing it, and I have the feeling that this is
what you've done over the past week. Also, I've reviewed this series 3
times during that time, and I feel a bit exhausted.

So please do not send another one before the end of the week. Take the
time to address all the comments, ask questions if needed, and hopefully
I can review it again next Monday.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v6 07/21] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function

2015-12-08 Thread Shannon Zhao


On 2015/12/8 23:43, Marc Zyngier wrote:
> On 08/12/15 12:47, Shannon Zhao wrote:
>> From: Shannon Zhao 
>> +/**
>> + * kvm_pmu_get_counter_value - get PMU counter value
>> + * @vcpu: The vcpu pointer
>> + * @select_idx: The counter index
>> + */
>> +u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
>> +{
>> +u64 counter, enabled, running;
>> +struct kvm_pmu *pmu = &vcpu->arch.pmu;
>> +struct kvm_pmc *pmc = &pmu->pmc[select_idx];
>> +
>> +if (!vcpu_mode_is_32bit(vcpu))
>> +counter = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + select_idx);
>> +else
>> +counter = vcpu_cp15(vcpu, c14_PMEVCNTR0 + select_idx);
>> +
>> +if (pmc->perf_event)
>> +counter += perf_event_read_value(pmc->perf_event, &enabled,
>> + &running);
>> +
>> +return counter & pmc->bitmask;
> 
> This one confused me for a while. Is it the case that you return
> whatever is in the vcpu view of the counter, plus anything that perf
> itself has counted? If so, I'd appreciate a comment here...
> 
Yes, the real counter value is the current counter value plus the value
perf event counts. I'll add a comment.

>> +}
>> +
>> +static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u32 select_idx)
>> +{
>> +if (!vcpu_mode_is_32bit(vcpu))
>> +return (vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E) &
>> +   (vcpu_sys_reg(vcpu, PMCNTENSET_EL0) >> select_idx);
> 
> This looks wrong. Shouldn't it be:
> 
> return ((vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E) &&
> (vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & (1 << select_idx)));
> 
>> +else
>> +return (vcpu_sys_reg(vcpu, c9_PMCR) & ARMV8_PMCR_E) &
>> +   (vcpu_sys_reg(vcpu, c9_PMCNTENSET) >> select_idx);
>> +}
> 
> Also, I don't really see why we need to check the 32bit version, which
> has the exact same content.
> 
>> +
>> +static inline struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
>> +{
>> +struct kvm_pmu *pmu;
>> +struct kvm_vcpu_arch *vcpu_arch;
>> +
>> +pmc -= pmc->idx;
>> +pmu = container_of(pmc, struct kvm_pmu, pmc[0]);
>> +vcpu_arch = container_of(pmu, struct kvm_vcpu_arch, pmu);
>> +return container_of(vcpu_arch, struct kvm_vcpu, arch);
>> +}
>> +
>> +/**
>> + * kvm_pmu_stop_counter - stop PMU counter
>> + * @pmc: The PMU counter pointer
>> + *
>> + * If this counter has been configured to monitor some event, release it 
>> here.
>> + */
>> +static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
>> +{
>> +struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
>> +u64 counter;
>> +
>> +if (pmc->perf_event) {
>> +counter = kvm_pmu_get_counter_value(vcpu, pmc->idx);
>> +if (!vcpu_mode_is_32bit(vcpu))
>> +vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + pmc->idx) = counter;
>> +else
>> +vcpu_cp15(vcpu, c14_PMEVCNTR0 + pmc->idx) = counter;
> 
> Same thing - we don't need to make a difference between 32 and 64bit.
> 
So it's fine to drop all the vcpu_mode_is_32bit(vcpu) checks in this
series? The only one we should take care of is the PMCCNTR, right?

>> +
>> +perf_event_release_kernel(pmc->perf_event);
>> +pmc->perf_event = NULL;
>> +}
>> +}
>> +
>> +/**
>> + * kvm_pmu_set_counter_event_type - set selected counter to monitor some 
>> event
>> + * @vcpu: The vcpu pointer
>> + * @data: The data guest writes to PMXEVTYPER_EL0
>> + * @select_idx: The number of selected counter
>> + *
>> + * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to 
>> count an
>> + * event with given hardware event number. Here we call perf_event API to
>> + * emulate this action and create a kernel perf event for it.
>> + */
>> +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
>> +u32 select_idx)
>> +{
>> +struct kvm_pmu *pmu = &vcpu->arch.pmu;
>> +struct kvm_pmc *pmc = &pmu->pmc[select_idx];
>> +struct perf_event *event;
>> +struct perf_event_attr attr;
>> +u32 eventsel;
>> +u64 counter;
>> +
>> +kvm_pmu_stop_counter(pmc);
> 
> Wait. I didn't realize this before, but you have the vcpu right here.
> Why don't you pass it as a parameter to kvm_pmu_stop_counter and avoid
> the kvm_pmc_to_vcpu thing altogether?
> 
Yeah, we could pass vcpu as a parameter to this function. But the
kvm_pmc_to_vcpu helper is also used in kvm_pmu_perf_overflow(), and there
only the pmc (and its pmc->idx) is available, so we couldn't pass vcpu as
a parameter. The helper is therefore necessary for kvm_pmu_perf_overflow().

Thanks,
-- 
Shannon



Re: [PATCH v2 0/5] Add virtio transport for AF_VSOCK

2015-12-08 Thread Stefan Hajnoczi
On Fri, Dec 04, 2015 at 09:45:04AM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2015 at 02:43:58PM +0800, Stefan Hajnoczi wrote:
> > 1. The 3-way handshake isn't necessary over a reliable transport 
> > (virtqueue).
> >Spoofing packets is also impossible so the security aspects of the 3-way
> >handshake (including syn cookie) add nothing.  The next version will 
> > have a
> >single operation to establish a connection.
> 
> It's hard to discuss without seeing the details, but we do need to
> slow down guests that are flooding host with socket creation requests.
> The handshake is a simple way for hypervisor to defer
> such requests until it has resources without
> breaking things.

I'll send an updated virtio-vsock device specification so we can discuss
it.

VMCI simply uses sk->sk_max_ack_backlog in
net/vmw_vsock/vmci_transport.c.  If backlog (from listen(2)) is maxed
out then the connection is refused.

The same would work for virtio-vsock and there is no advantage to the
3-way handshake.

> > 2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple 
> > clients
> >can transmit to the same listen socket.  There is no way for the clients 
> > to
> >coordinate buffer space with each other fairly.  The next version will 
> > drop
> >credit-based flow control for SOCK_DGRAM and only rely on best-effort
> >delivery.  SOCK_STREAM still has guaranteed delivery.
> 
> I suspect in the end we will need a measure of fairness even
> if you drop packets. And recovering from packet loss is
> hard enough that not many applications do it correctly.
> I suggest disabling SOCK_DGRAM for now.

I'm not aware of a SOCK_DGRAM user at this time.  Will disable it for
now.




Re: [PATCH 0/6] VSOCK: revert virtio-vsock until device spec is finalized

2015-12-08 Thread Stefan Hajnoczi
On Tue, Dec 08, 2015 at 11:26:55AM -0500, David Miller wrote:
> From: Stefan Hajnoczi 
> Date: Tue,  8 Dec 2015 19:57:30 +0800
> 
> > Please revert for now.
> 
> Please don't revert it piece by piece like this.
> 
> Instead, send me one big revert commit that undoes the whole
> thing.  There is even a merge commit that you can use to
> create that revert cleanly.

Okay.

Stefan




Re: [PATCH v2] Revert "Merge branch 'vsock-virtio'"

2015-12-08 Thread David Miller
From: Stefan Hajnoczi 
Date: Wed,  9 Dec 2015 10:51:12 +0800

> This reverts commit 0d76d6e8b2507983a2cae4c09880798079007421 and merge
> commit c402293bd76fbc93e52ef8c0947ab81eea3ae019, reversing changes made
> to c89359a42e2a49656451569c382eed63e781153c.
> 
> The virtio-vsock device specification is not finalized yet.  Michael
> Tsirkin voiced concerned about merging this code when the hardware
> interface (and possibly the userspace interface) could still change.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v2:
>  * Revert merge commit and coccinelle fixup in a single patch

Applied, thanks.


Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-12-08 Thread Paul Mackerras
On Wed, Nov 04, 2015 at 10:03:48AM +0100, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low, you can
> nowadays hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> x86 already increased the limit to 512 in total, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in

I agree with increasing the limit.  Is there a reason we have only 32
pluggable DIMMs in QEMU on powerpc, not more?  Should we be increasing
that limit too?  If so, maybe we should increase the number of memory
slots to 512?

> QEMU, not 256, so we likely do not as much slots as on x86. Thus

"so we likely do not need as many slots as on x86" would be better
English.

> setting the slot limit to 320 sounds like a good value for the
> time being (until we have some code in the future to resize the
> memslot array dynamically).
> And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
> 
> Signed-off-by: Thomas Huth 

Paul.




RE: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Pavel Fedin
 Hello!

> I messed up the "load into xzr" test royally in the last attached patch.
> It was quite wrong.

 Yes, because "mov %0, xzr" is not trapped.

> I have now tested
> 
>  asm volatile(
>  "str %3, [%1]\n\t"
>  "ldr wzr, [%1]\n\t"
>  "str wzr, [%2]\n\t"
>  "ldr %0, [%2]\n\t"
>  :"=r"(val):"r"(addr), "r"(addr2), "r"(0x):"memory");
> report("mmio: 'ldr wzr' check: read 0x%08lx", val != 0x, val);
> 
> which passes

I guess I forgot to mention that both addr and addr2 have to be MMIO 
registers. If they are plain memory, then of course everything
will work because they are not trapped.

> Anyway, I
> probably won't clean this test up and post it. I don't think we really
> need to add it as a regression test, unless others disagree and would
> like to see it added.

 Considering how difficult it was to find this problem, and how tricky and 
non-obvious it is, I would ask to add this test. Especially
considering you've already written it. At least it will serve as a reminder 
about the problem.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH v5 03/21] KVM: ARM64: Add offset defines for PMU registers

2015-12-07 Thread Marc Zyngier
On 07/12/15 14:31, Shannon Zhao wrote:
> 
> 
> On 2015/12/7 22:06, Marc Zyngier wrote:
>> On 03/12/15 06:11, Shannon Zhao wrote:
>>> From: Shannon Zhao 
>>>
>>> We are about to trap and emulate acccesses to each PMU register
>>
>> s/acccesses/accesses/
>>
>>> individually. This adds the context offsets for the AArch64 PMU
>>> registers and their AArch32 counterparts.
>>>
>>> Signed-off-by: Shannon Zhao 
>>> ---
>>>   arch/arm64/include/asm/kvm_asm.h | 55 
>>> 
>>>   1 file changed, 50 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_asm.h 
>>> b/arch/arm64/include/asm/kvm_asm.h
>>> index 5e37710..4f804c1 100644
>>> --- a/arch/arm64/include/asm/kvm_asm.h
>>> +++ b/arch/arm64/include/asm/kvm_asm.h
>>> @@ -48,12 +48,34 @@
>>>   #define MDSCR_EL1 22  /* Monitor Debug System Control Register */
>>>   #define MDCCINT_EL1   23  /* Monitor Debug Comms Channel 
>>> Interrupt Enable Reg */
>>>
>>
>> Coming back to this patch, it gives a clear view of where you have state
>> duplication.
>>
>>> +/* Performance Monitors Registers */
>>> +#define PMCR_EL0   24  /* Control Register */
>>> +#define PMOVSSET_EL0   25  /* Overflow Flag Status Set Register */
>>> +#define PMOVSCLR_EL0   26  /* Overflow Flag Status Clear Register 
>>> */
>>
>> This should only be a single state. You don't even have to represent it
>> in the sysreg array, to be honest.
>>
> Yeah, I could store the state in one of them and drop one of them here.

Indeed.

>>> +#define PMSELR_EL0 27  /* Event Counter Selection Register */
>>> +#define PMCEID0_EL028  /* Common Event Identification Register 
>>> 0 */
>>> +#define PMCEID1_EL029  /* Common Event Identification Register 
>>> 1 */
>>> +#define PMEVCNTR0_EL0  30  /* Event Counter Register (0-30) */
>>> +#define PMEVCNTR30_EL0 60
>>> +#define PMCCNTR_EL061  /* Cycle Counter Register */
>>> +#define PMEVTYPER0_EL0 62  /* Event Type Register (0-30) */
>>> +#define PMEVTYPER30_EL092
>>> +#define PMCCFILTR_EL0  93  /* Cycle Count Filter Register */
>>> +#define PMXEVCNTR_EL0  94  /* Selected Event Count Register */
>>> +#define PMXEVTYPER_EL0 95  /* Selected Event Type Register */
>>
>> These "select" registers aren't real ones, but just a way to pick the
>> real register. They should be removed.
>>
I think these two could be retained, since it's convenient to handle
guest accesses by using "case PMXEVCNTR_EL0".

Not really. This is just occupying pointless space in the register file,
and it would be clearer to have a couple of helpers called directly from
the sys_reg trap table.

Please get rid of it.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 03/21] KVM: ARM64: Add offset defines for PMU registers

2015-12-07 Thread Shannon Zhao



On 2015/12/7 22:06, Marc Zyngier wrote:

On 03/12/15 06:11, Shannon Zhao wrote:

From: Shannon Zhao 

We are about to trap and emulate acccesses to each PMU register


s/acccesses/accesses/


individually. This adds the context offsets for the AArch64 PMU
registers and their AArch32 counterparts.

Signed-off-by: Shannon Zhao 
---
  arch/arm64/include/asm/kvm_asm.h | 55 
  1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..4f804c1 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -48,12 +48,34 @@
  #define MDSCR_EL1 22  /* Monitor Debug System Control Register */
  #define MDCCINT_EL1   23  /* Monitor Debug Comms Channel Interrupt Enable 
Reg */



Coming back to this patch, it gives a clear view of where you have state
duplication.


+/* Performance Monitors Registers */
+#define PMCR_EL0   24  /* Control Register */
+#define PMOVSSET_EL0   25  /* Overflow Flag Status Set Register */
+#define PMOVSCLR_EL0   26  /* Overflow Flag Status Clear Register */


This should only be a single state. You don't even have to represent it
in the sysreg array, to be honest.


Yeah, I could store the state in one of them and drop one of them here.


+#define PMSELR_EL0 27  /* Event Counter Selection Register */
+#define PMCEID0_EL028  /* Common Event Identification Register 0 */
+#define PMCEID1_EL029  /* Common Event Identification Register 1 */
+#define PMEVCNTR0_EL0  30  /* Event Counter Register (0-30) */
+#define PMEVCNTR30_EL0 60
+#define PMCCNTR_EL061  /* Cycle Counter Register */
+#define PMEVTYPER0_EL0 62  /* Event Type Register (0-30) */
+#define PMEVTYPER30_EL092
+#define PMCCFILTR_EL0  93  /* Cycle Count Filter Register */
+#define PMXEVCNTR_EL0  94  /* Selected Event Count Register */
+#define PMXEVTYPER_EL0 95  /* Selected Event Type Register */


These "select" registers aren't real ones, but just a way to pick the
real register. They should be removed.

I think these two could be retained, since it's convenient to handle
guest accesses by using "case PMXEVCNTR_EL0".



+#define PMCNTENSET_EL0 96  /* Count Enable Set Register */
+#define PMCNTENCLR_EL0 97  /* Count Enable Clear Register */
+#define PMINTENSET_EL1 98  /* Interrupt Enable Set Register */
+#define PMINTENCLR_EL1 99  /* Interrupt Enable Clear Register */


Same for these. They are just convenient accessors for the HW register.


+#define PMUSERENR_EL0  100 /* User Enable Register */
+#define PMSWINC_EL0101 /* Software Increment Register */
+
  /* 32bit specific registers. Keep them at the end of the range */
-#defineDACR32_EL2  24  /* Domain Access Control Register */
-#defineIFSR32_EL2  25  /* Instruction Fault Status Register */
-#defineFPEXC32_EL2 26  /* Floating-Point Exception Control 
Register */
-#defineDBGVCR32_EL227  /* Debug Vector Catch Register */
-#defineNR_SYS_REGS 28
+#defineDACR32_EL2  102 /* Domain Access Control Register */
+#defineIFSR32_EL2  103 /* Instruction Fault Status Register */
+#defineFPEXC32_EL2 104 /* Floating-Point Exception Control 
Register */
+#defineDBGVCR32_EL2105 /* Debug Vector Catch Register */
+#defineNR_SYS_REGS 106

  /* 32bit mapping */
  #define c0_MPIDR  (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
@@ -75,6 +97,24 @@
  #define c6_IFAR   (c6_DFAR + 1)   /* Instruction Fault Address 
Register */
  #define c7_PAR(PAR_EL1 * 2)   /* Physical Address Register */
  #define c7_PAR_high   (c7_PAR + 1)/* PAR top 32 bits */
+
+/* Performance Monitors*/
+#define c9_PMCR(PMCR_EL0 * 2)
+#define c9_PMOVSSET(PMOVSSET_EL0 * 2)
+#define c9_PMOVSCLR(PMOVSCLR_EL0 * 2)
+#define c9_PMCCNTR (PMCCNTR_EL0 * 2)
+#define c9_PMSELR  (PMSELR_EL0 * 2)
+#define c9_PMCEID0 (PMCEID0_EL0 * 2)
+#define c9_PMCEID1 (PMCEID1_EL0 * 2)
+#define c9_PMXEVCNTR   (PMXEVCNTR_EL0 * 2)
+#define c9_PMXEVTYPER  (PMXEVTYPER_EL0 * 2)
+#define c9_PMCNTENSET  (PMCNTENSET_EL0 * 2)
+#define c9_PMCNTENCLR  (PMCNTENCLR_EL0 * 2)
+#define c9_PMINTENSET  (PMINTENSET_EL1 * 2)
+#define c9_PMINTENCLR  (PMINTENCLR_EL1 * 2)
+#define c9_PMUSERENR   (PMUSERENR_EL0 * 2)
+#define c9_PMSWINC (PMSWINC_EL0 * 2)
+
  #define c10_PRRR  (MAIR_EL1 * 2)  /* Primary Region Remap Register */
  #define c10_NMRR  (c10_PRRR + 1)  /* Normal Memory Remap Register */
  #define c12_VBAR  (VBAR_EL1 * 2)  /* Vector Base Address Register */
@@ -86,6 +126,11 @@
  #define c10_AMAIR1(c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
  #define c14_CNTKCTL   (CNTKCTL_EL1 * 2) /* Timer Control Register 

Re: [PATCH v5 00/21] KVM: ARM64: Add guest PMU support

2015-12-07 Thread Shannon Zhao

Hi Marc,

On 2015/12/7 22:11, Marc Zyngier wrote:

Shannon,

On 03/12/15 06:11, Shannon Zhao wrote:

From: Shannon Zhao 

This patchset adds guest PMU support for KVM on ARM64. It takes
trap-and-emulate approach. When guest wants to monitor one event, it
will be trapped by KVM and KVM will call perf_event API to create a perf
event and call relevant perf_event APIs to get the count value of event.

Use perf to test this patchset in guest. When using "perf list", it
shows the list of the hardware events and hardware cache events perf
supports. Then use "perf stat -e EVENT" to monitor some event. For
example, use "perf stat -e cycles" to count cpu cycles and
"perf stat -e cache-misses" to count cache misses.

Below are the outputs of "perf stat -r 5 sleep 5" when running in host
and guest.

Host:
  Performance counter stats for 'sleep 5' (5 runs):

   0.510276  task-clock (msec) #0.000 CPUs utilized 
   ( +-  1.57% )
  1  context-switches  #0.002 M/sec
  0  cpu-migrations#0.000 K/sec
 49  page-faults   #0.096 M/sec 
   ( +-  0.77% )
1064117  cycles#2.085 GHz   
   ( +-  1.56% )
  stalled-cycles-frontend
  stalled-cycles-backend
 529051  instructions  #0.50  insns per cycle   
   ( +-  0.55% )
  branches
   9894  branch-misses #   19.390 M/sec 
   ( +-  1.70% )

5.000853900 seconds time elapsed
  ( +-  0.00% )

Guest:
  Performance counter stats for 'sleep 5' (5 runs):

   0.642456  task-clock (msec) #0.000 CPUs utilized 
   ( +-  1.81% )
  1  context-switches  #0.002 M/sec
  0  cpu-migrations#0.000 K/sec
 49  page-faults   #0.076 M/sec 
   ( +-  1.64% )
1322717  cycles#2.059 GHz   
   ( +-  1.88% )
  stalled-cycles-frontend
  stalled-cycles-backend
 640944  instructions  #0.48  insns per cycle   
   ( +-  1.10% )
  branches
  10665  branch-misses #   16.600 M/sec 
   ( +-  2.23% )

5.001181452 seconds time elapsed
  ( +-  0.00% )

Have a cycle counter read test like below in guest and host:

static void test(void)
{
unsigned long count, count1, count2;
count1 = read_cycles();
count++;
count2 = read_cycles();
}

Host:
count1: 3046186213
count2: 3046186347
delta: 134

Guest:
count1: 5645797121
count2: 5645797270
delta: 149

The gap between guest and host is very small. One reason for this I
think is that it doesn't count the cycles in EL2 and host since we add
exclude_hv = 1. So the cycles spent to store/restore registers which
happens at EL2 are not included.

This patchset can be fetched from [1] and the relevant QEMU version for
test can be fetched from [2].

The results of 'perf test' can be found from [3][4].
The results of perf_event_tests test suite can be found from [5][6].

Also, I have tested "perf top" in two VMs and host at the same time. It
works well.


I've commented on more issues I've found. Hopefully you'll be able to
respin this quickly enough, and end-up with a simpler code base (state
duplication is a bit messy).


Ok, will try my best :)


Another thing I have noticed is that you have dropped the vgic changes
that were configuring the interrupt. It feels like they should be
included, and configure the PPI as a LEVEL interrupt.
The reason why I dropped that is that in the upstream code PPIs are LEVEL
interrupts by default, which is changed by the arch_timers patches. So is
it necessary to configure it again?



Also, looking at
your QEMU code, you seem to configure the interrupt as EDGE, which is
not how your emulated HW behaves.

Sorry, the QEMU code is not updated while the version I use for test 
locally configures the interrupt as LEVEL. I will push the newest one 
tomorrow.



Looking forward to reviewing the next version.

Thanks,

M.



--
Shannon


Re: [PATCH v5 21/21] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-07 Thread Marc Zyngier
On 07/12/15 14:37, Shannon Zhao wrote:
> 
> 
> On 2015/12/7 21:56, Marc Zyngier wrote:
>>> +static int kvm_arm_pmu_set_attr(struct kvm_device *dev,
>>> +  struct kvm_device_attr *attr)
>>> +{
>>> +  switch (attr->group) {
>>> +  case KVM_DEV_ARM_PMU_GRP_IRQ: {
>>> +  int __user *uaddr = (int __user *)(long)attr->addr;
>>> +  int reg;
>>> +
>>> +  if (get_user(reg, uaddr))
>>> +  return -EFAULT;
>>> +
>>> +  if (reg < VGIC_NR_SGIS || reg >= VGIC_NR_PRIVATE_IRQS)
>>> +  return -EINVAL;
>>> +
>>> +  return kvm_arm_pmu_set_irq(dev->kvm, reg);
>> What prevents the IRQ to be changed while the VM is already running?
>> This should probably be a one-shot thing (change it once, be denied
>> other changes).
> 
> So add a helper like vgic_initialized to check whether vPMU is initialized?

Something like that, yes.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 11/21] KVM: ARM64: Add reset and access handlers for PMCNTENSET and PMCNTENCLR register

2015-12-07 Thread Marc Zyngier
On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Since the reset value of PMCNTENSET and PMCNTENCLR is UNKNOWN, use
> reset_unknown for its reset handler. Add a new case to emulate writing
> PMCNTENSET or PMCNTENCLR register.
> 
> When writing to PMCNTENSET, call perf_event_enable to enable the perf
> event. When writing to PMCNTENCLR, call perf_event_disable to disable
> the perf event.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c | 52 
> +++
>  include/kvm/arm_pmu.h |  4 
>  virt/kvm/arm/pmu.c| 47 ++
>  3 files changed, 99 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 9e06fe8..e852e5d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -526,6 +526,27 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
>   vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + idx) = val;
>   break;
>   }
> + case PMCNTENSET_EL0: {
> + val = *vcpu_reg(vcpu, p->Rt);
> + kvm_pmu_enable_counter(vcpu, val,
> +vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E);
> + /* Value 1 of PMCNTENSET_EL0 and PMCNTENCLR_EL0 means
> +  * corresponding counter enabled.
> +  */
> + vcpu_sys_reg(vcpu, r->reg) |= val;
> + vcpu_sys_reg(vcpu, PMCNTENCLR_EL0) |= val;
> + break;
> + }
> + case PMCNTENCLR_EL0: {
> + val = *vcpu_reg(vcpu, p->Rt);
> + kvm_pmu_disable_counter(vcpu, val);
> + /* Value 0 of PMCNTENSET_EL0 and PMCNTENCLR_EL0 means
> +  * corresponding counter disabled.
> +  */
> + vcpu_sys_reg(vcpu, r->reg) &= ~val;
> + vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
> + break;
> + }

You have the exact same problem here. These registers are the two side
of the same coin. You should only have a single state describing the
state of the counters, and PMCNTEN{SET,CLR}_EL0 just being accessors for
that state.

Rule of thumb: if you have to write the same value twice, you're doing
the wrong thing.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 08/21] KVM: ARM64: Add reset and access handlers for PMXEVTYPER register

2015-12-07 Thread Marc Zyngier
On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Since the reset value of PMXEVTYPER is UNKNOWN, use reset_unknown or
> reset_unknown_cp15 for its reset handler. Add access handler which
> emulates writing and reading PMXEVTYPER register. When writing to
> PMXEVTYPER, call kvm_pmu_set_counter_event_type to create a perf_event
> for the selected event type.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c | 44 ++--
>  1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index b0a8d88..6967a49 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -473,6 +473,17 @@ static void reset_pmceid(struct kvm_vcpu *vcpu, const 
> struct sys_reg_desc *r)
>   vcpu_sys_reg(vcpu, r->reg) = pmceid;
>  }
>  
> +static bool pmu_counter_idx_valid(u64 pmcr, u64 idx)
> +{
> + u64 val;
> +
> + val = (pmcr >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
> + if (idx >= val && idx != ARMV8_COUNTER_MASK)
> + return false;
> +
> + return true;
> +}
> +
>  /* PMU registers accessor. */
>  static bool access_pmu_regs(struct kvm_vcpu *vcpu,
>   const struct sys_reg_params *p,
> @@ -482,6 +493,20 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
>  
>   if (p->is_write) {
>   switch (r->reg) {
> + case PMXEVTYPER_EL0: {
> + u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
> +   & ARMV8_COUNTER_MASK;
> +
> + if (!pmu_counter_idx_valid(vcpu_sys_reg(vcpu, PMCR_EL0),
> +idx))
> + break;
> +
> + val = *vcpu_reg(vcpu, p->Rt);
> + kvm_pmu_set_counter_event_type(vcpu, val, idx);
> + vcpu_sys_reg(vcpu, PMXEVTYPER_EL0) = val;
> + vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + idx) = val;

I'm really confused by this. PMXEVTYPER is not really a register, but
level of indirection between PMEVTYPERn_EL0 and PMCCFILTER_EL0
(depending on PMSELR_EL0.SEL). On its own, it doesn't exist.

Here, you are making PMXEVTYPER_EL0 an actual register, storing the same
value twice (and not handling PMCCFILTER_EL0). It definitely feels
wrong. You should be able to trap it and propagate the access to the
right register, without duplicating the state.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 21/21] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-07 Thread Shannon Zhao



On 2015/12/7 21:56, Marc Zyngier wrote:

>+static int kvm_arm_pmu_set_attr(struct kvm_device *dev,
>+   struct kvm_device_attr *attr)
>+{
>+   switch (attr->group) {
>+   case KVM_DEV_ARM_PMU_GRP_IRQ: {
>+   int __user *uaddr = (int __user *)(long)attr->addr;
>+   int reg;
>+
>+   if (get_user(reg, uaddr))
>+   return -EFAULT;
>+
>+   if (reg < VGIC_NR_SGIS || reg >= VGIC_NR_PRIVATE_IRQS)
>+   return -EINVAL;
>+
>+   return kvm_arm_pmu_set_irq(dev->kvm, reg);

What prevents the IRQ to be changed while the VM is already running?
This should probably be a one-shot thing (change it once, be denied
other changes).


So add a helper like vgic_initialized to check whether vPMU is initialized?

Thanks,
--
Shannon


Re: [PATCH v5 00/21] KVM: ARM64: Add guest PMU support

2015-12-07 Thread Marc Zyngier
On 07/12/15 14:47, Shannon Zhao wrote:
> Hi Marc,
> 
> On 2015/12/7 22:11, Marc Zyngier wrote:
>> Shannon,
>>
>> On 03/12/15 06:11, Shannon Zhao wrote:
>>> From: Shannon Zhao 
>>>
>>> This patchset adds guest PMU support for KVM on ARM64. It takes
>>> trap-and-emulate approach. When guest wants to monitor one event, it
>>> will be trapped by KVM and KVM will call perf_event API to create a perf
>>> event and call relevant perf_event APIs to get the count value of event.
>>>
>>> Use perf to test this patchset in guest. When using "perf list", it
>>> shows the list of the hardware events and hardware cache events perf
>>> supports. Then use "perf stat -e EVENT" to monitor some event. For
>>> example, use "perf stat -e cycles" to count cpu cycles and
>>> "perf stat -e cache-misses" to count cache misses.
>>>
>>> Below are the outputs of "perf stat -r 5 sleep 5" when running in host
>>> and guest.
>>>
>>> Host:
>>>   Performance counter stats for 'sleep 5' (5 runs):
>>>
>>>0.510276  task-clock (msec) #0.000 CPUs utilized 
>>>( +-  1.57% )
>>>   1  context-switches  #0.002 M/sec
>>>   0  cpu-migrations#0.000 K/sec
>>>  49  page-faults   #0.096 M/sec 
>>>( +-  0.77% )
>>> 1064117  cycles#2.085 GHz   
>>>( +-  1.56% )
>>>   stalled-cycles-frontend
>>>   stalled-cycles-backend
>>>  529051  instructions  #0.50  insns per 
>>> cycle  ( +-  0.55% )
>>>   branches
>>>9894  branch-misses #   19.390 M/sec 
>>>( +-  1.70% )
>>>
>>> 5.000853900 seconds time elapsed
>>>   ( +-  0.00% )
>>>
>>> Guest:
>>>   Performance counter stats for 'sleep 5' (5 runs):
>>>
>>>0.642456  task-clock (msec) #0.000 CPUs utilized 
>>>( +-  1.81% )
>>>   1  context-switches  #0.002 M/sec
>>>   0  cpu-migrations#0.000 K/sec
>>>  49  page-faults   #0.076 M/sec 
>>>( +-  1.64% )
>>> 1322717  cycles#2.059 GHz   
>>>( +-  1.88% )
>>>   stalled-cycles-frontend
>>>   stalled-cycles-backend
>>>  640944  instructions  #0.48  insns per 
>>> cycle  ( +-  1.10% )
>>>   branches
>>>   10665  branch-misses #   16.600 M/sec 
>>>( +-  2.23% )
>>>
>>> 5.001181452 seconds time elapsed
>>>   ( +-  0.00% )
>>>
>>> Have a cycle counter read test like below in guest and host:
>>>
>>> static void test(void)
>>> {
>>> unsigned long count, count1, count2;
>>> count1 = read_cycles();
>>> count++;
>>> count2 = read_cycles();
>>> }
>>>
>>> Host:
>>> count1: 3046186213
>>> count2: 3046186347
>>> delta: 134
>>>
>>> Guest:
>>> count1: 5645797121
>>> count2: 5645797270
>>> delta: 149
>>>
>>> The gap between guest and host is very small. One reason for this I
>>> think is that it doesn't count the cycles in EL2 and host since we add
>>> exclude_hv = 1. So the cycles spent to store/restore registers which
>>> happens at EL2 are not included.
>>>
>>> This patchset can be fetched from [1] and the relevant QEMU version for
>>> test can be fetched from [2].
>>>
>>> The results of 'perf test' can be found from [3][4].
>>> The results of perf_event_tests test suite can be found from [5][6].
>>>
>>> Also, I have tested "perf top" in two VMs and host at the same time. It
>>> works well.
>>
>> I've commented on more issues I've found. Hopefully you'll be able to
>> respin this quickly enough, and end-up with a simpler code base (state
>> duplication is a bit messy).
>>
> Ok, will try my best :)
> 
>> Another thing I have noticed is that you have dropped the vgic changes
>> that were configuring the interrupt. It feels like they should be
>> included, and configure the PPI as a LEVEL interrupt.
The reason why I dropped that is that in the upstream code PPIs are LEVEL 
interrupts by default, which was changed by the arch_timers patches. So is it 
necessary to configure them again?

Ah, yes. Missed that. No, that's fine.

> 
>> Also, looking at
>> your QEMU code, you seem to configure the interrupt as EDGE, which is
>> not how your emulated HW behaves.
>>
> Sorry, the QEMU code is not updated, but the version I use for local 
> testing configures the interrupt as LEVEL. I will push the newest one 
> tomorrow.

That'd be good.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...

Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-07 Thread Lan, Tianyu

On 12/5/2015 1:07 AM, Alexander Duyck wrote:


We still need to support Windows guests for migration, and this is why our
patches keep all changes in the driver, since it's impossible to change the
Windows kernel.


That is a poor argument.  I highly doubt Microsoft is interested in
having to modify all of the drivers that will support direct assignment
in order to support migration.  They would likely request something
similar to what I have in that they will want a way to do DMA tracking
with minimal modification required to the drivers.


This totally depends on the NIC or other device vendors, and they
should make the decision whether to support migration or not. If yes, they
would modify the driver.

If the only target is to call suspend/resume during migration, the feature
will be meaningless. In most cases we don't want to affect the user much
during migration, so the service downtime is vital. Our target is to apply
SRIOV NIC passthrough to cloud service and NFV (network functions
virtualization) projects, which are sensitive to network performance
and stability. In my opinion, we should give the device driver a chance to
implement its own migration job. Call the suspend and resume callbacks
in the driver if it doesn't care about performance during migration.




Following is my idea to do DMA tracking.

Inject an event to the VF driver after the memory iteration stage
and before stopping the VCPU, and then the VF driver marks dirty
all memory it is using for DMA. Newly allocated pages also need to
be marked dirty before stopping the VCPU. All memory dirtied
in this time slot will be migrated during the stop-and-copy
stage. We also need to make sure to disable the VF, by clearing the
bus master enable bit for the VF, before migrating this memory.


The ordering of your explanation here doesn't quite work.  What needs to
happen is that you have to disable DMA and then mark the pages as dirty.
What the disabling of the BME does is signal to the hypervisor that
the device is now stopped.  The ixgbevf_suspend call already supported
by the driver is almost exactly what is needed to take care of something
like this.


This is why I hope to reserve a piece of space in the DMA page to do a 
dummy write. This can help to mark a page dirty without requiring DMA to 
stop and without racing with DMA data.


If we can't do that, we have to stop DMA for a short time to mark all DMA
pages dirty and then re-enable it. I am not sure how much we can gain by
tracking all DMA memory this way with the device running during migration. I
need to do some tests and compare the results with stopping DMA directly at
the last stage of migration.




The question is how we would go about triggering it.  I really don't
think the PCI configuration space approach is the right idea. I wonder
if we couldn't get away with some sort of ACPI event instead.  We
already require ACPI support in order to shut down the system
gracefully, I wonder if we couldn't get away with something similar in
order to suspend/resume the direct assigned devices gracefully.



I don't think there are such events in the current spec.
Otherwise, there are two kinds of suspend/resume callbacks:
1) System suspend/resume, called during S2RAM and S2DISK.
2) Runtime suspend/resume, called by the PM core when the device is idle.
If you want to do what you mentioned, you have to change the PM core and
the ACPI spec.


The DMA page allocated by the VF driver also needs to reserve space
for the dummy write.


No, this will not work.  If, for example, you have a VF driver allocating
memory for a 9K receive, how will that work?  It isn't as if you can poke
a hole in the contiguous memory.



Re: [PATCH v5 02/21] KVM: ARM64: Define PMU data structure for each vcpu

2015-12-07 Thread Marc Zyngier
On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Here we plan to support virtual PMU for guest by full software
> emulation, so define some basic structs and functions preparing for
> futher steps. Define struct kvm_pmc for performance monitor counter and
> struct kvm_pmu for performance monitor unit for each vcpu. According to
> ARMv8 spec, the PMU contains at most 32(ARMV8_MAX_COUNTERS) counters.
> 
> Since this only supports ARM64 (or PMUv3), add a separate config symbol
> for it.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/include/asm/kvm_host.h |  2 ++
>  arch/arm64/kvm/Kconfig|  8 
>  include/kvm/arm_pmu.h | 41 
> +++
>  3 files changed, 51 insertions(+)
>  create mode 100644 include/kvm/arm_pmu.h
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index a35ce72..42e15bb 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -37,6 +37,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> @@ -132,6 +133,7 @@ struct kvm_vcpu_arch {
>   /* VGIC state */
>   struct vgic_cpu vgic_cpu;
>   struct arch_timer_cpu timer_cpu;
> + struct kvm_pmu pmu;
>  
>   /*
>* Anything that is not used directly from assembly code goes
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index a5272c0..66da9a2 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -36,6 +36,7 @@ config KVM
>   select HAVE_KVM_EVENTFD
>   select HAVE_KVM_IRQFD
>   select KVM_ARM_VGIC_V3
> + select KVM_ARM_PMU
>   ---help---
> Support hosting virtualized guest machines.
> We don't support KVM with 16K page tables yet, due to the multiple
> @@ -48,6 +49,13 @@ config KVM_ARM_HOST
>   ---help---
> Provides host support for ARM processors.
>  
> +config KVM_ARM_PMU
> + bool
> + depends on KVM_ARM_HOST && HW_PERF_EVENTS
> + ---help---
> +   Adds support for a virtual Performance Monitoring Unit (PMU) in
> +   virtual machines.
> +
>  source drivers/vhost/Kconfig
>  
>  endif # VIRTUALIZATION
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> new file mode 100644
> index 000..0c13470
> --- /dev/null
> +++ b/include/kvm/arm_pmu.h
> @@ -0,0 +1,41 @@
> +/*
> + * Copyright (C) 2015 Linaro Ltd.
> + * Author: Shannon Zhao 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __ASM_ARM_KVM_PMU_H
> +#define __ASM_ARM_KVM_PMU_H
> +
> +#include 
> +#ifdef CONFIG_KVM_ARM_PMU
> +#include 
> +#endif
> +
> +struct kvm_pmc {
> + u8 idx;/* index into the pmu->pmc array */
> + struct perf_event *perf_event;
> + struct kvm_vcpu *vcpu;

Why do you need this? If you have the pointer to a kvm_pmc structure, it
is very cheap to compute the address of the vcpu:

static inline struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
{
	struct kvm_pmu *pmu;

	pmc -= pmc->idx;	/* rewind to pmc[0] */
	pmu = container_of(pmc, struct kvm_pmu, pmc[0]);
	return container_of(pmu, struct kvm_vcpu, arch.pmu);
}

All of which the compiler will happily optimize for you.

> + u64 bitmask;

Looking at the code in pmu.c, there seem to be some confusion between
u64 and unsigned long (see kvm_pmu_get_counter_value). Please make sure
this is uniform.

> +};
> +
> +struct kvm_pmu {
> +#ifdef CONFIG_KVM_ARM_PMU
> + /* PMU IRQ Number per VCPU */
> + int irq_num;
> + struct kvm_pmc pmc[ARMV8_MAX_COUNTERS];
> +#endif
> +};
> +
> +#endif
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 21/21] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-07 Thread Marc Zyngier
On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Add a new kvm device type KVM_DEV_TYPE_ARM_PMU_V3 for ARM PMU. Implement
> the kvm_device_ops for it.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  Documentation/virtual/kvm/devices/arm-pmu.txt | 16 +
>  arch/arm64/include/uapi/asm/kvm.h |  3 +
>  include/linux/kvm_host.h  |  1 +
>  include/uapi/linux/kvm.h  |  2 +
>  virt/kvm/arm/pmu.c| 87 
> +++
>  virt/kvm/kvm_main.c   |  4 ++
>  6 files changed, 113 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/devices/arm-pmu.txt
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-pmu.txt 
> b/Documentation/virtual/kvm/devices/arm-pmu.txt
> new file mode 100644
> index 000..5121f1f
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/arm-pmu.txt
> @@ -0,0 +1,16 @@
> +ARM Virtual Performance Monitor Unit (vPMU)
> +===
> +
> +Device types supported:
> +  KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3
> +
> +Instantiate one PMU instance per VCPU through this API.
> +
> +Groups:
> +  KVM_DEV_ARM_PMU_GRP_IRQ
> +  Attributes:
> +A value describing the interrupt number of PMU overflow interrupt. This
> +interrupt should be a PPI.
> +
> +  Errors:
> +-EINVAL: Value set is out of the expected range (from 16 to 31)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 2d4ca4b..568afa2 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -204,6 +204,9 @@ struct kvm_arch_memory_slot {
>  #define KVM_DEV_ARM_VGIC_GRP_CTRL4
>  #define   KVM_DEV_ARM_VGIC_CTRL_INIT 0
>  
> +/* Device Control API: ARM PMU */
> +#define KVM_DEV_ARM_PMU_GRP_IRQ  0
> +
>  /* KVM_IRQ_LINE irq field index values */
>  #define KVM_ARM_IRQ_TYPE_SHIFT   24
>  #define KVM_ARM_IRQ_TYPE_MASK0xff
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index c923350..608dea6 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1161,6 +1161,7 @@ extern struct kvm_device_ops kvm_mpic_ops;
>  extern struct kvm_device_ops kvm_xics_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
> +extern struct kvm_device_ops kvm_arm_pmu_ops;
>  
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 03f3618..4ba6fdd 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1032,6 +1032,8 @@ enum kvm_device_type {
>  #define KVM_DEV_TYPE_FLIC		KVM_DEV_TYPE_FLIC
>   KVM_DEV_TYPE_ARM_VGIC_V3,
>  #define KVM_DEV_TYPE_ARM_VGIC_V3 KVM_DEV_TYPE_ARM_VGIC_V3
> + KVM_DEV_TYPE_ARM_PMU_V3,
> +#define  KVM_DEV_TYPE_ARM_PMU_V3 KVM_DEV_TYPE_ARM_PMU_V3
>   KVM_DEV_TYPE_MAX,
>  };
>  
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index bd2fece..82b90e8 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -19,10 +19,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  
> +#include "vgic.h"
> +
>  /**
>   * kvm_pmu_get_counter_value - get PMU counter value
>   * @vcpu: The vcpu pointer
> @@ -436,3 +439,87 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu 
> *vcpu, u32 data,
>  
>   pmc->perf_event = event;
>  }
> +
> +static int kvm_arm_pmu_set_irq(struct kvm *kvm, int irq)
> +{
> + int j;
> + struct kvm_vcpu *vcpu;
> +
> + kvm_for_each_vcpu(j, vcpu, kvm) {
> + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +
> + kvm_debug("Set kvm ARM PMU irq: %d\n", irq);
> + pmu->irq_num = irq;
> + }
> +
> + return 0;
> +}
> +
> +static int kvm_arm_pmu_create(struct kvm_device *dev, u32 type)
> +{
> + int i;
> + struct kvm_vcpu *vcpu;
> + struct kvm *kvm = dev->kvm;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +
> + memset(pmu, 0, sizeof(*pmu));
> + kvm_pmu_vcpu_reset(vcpu);
> + pmu->irq_num = -1;
> + }
> +
> + return 0;
> +}
> +
> +static void kvm_arm_pmu_destroy(struct kvm_device *dev)
> +{
> + kfree(dev);
> +}
> +
> +static int kvm_arm_pmu_set_attr(struct kvm_device *dev,
> + struct kvm_device_attr *attr)
> +{
> + switch (attr->group) {
> + case KVM_DEV_ARM_PMU_GRP_IRQ: {
> + int __user *uaddr = (int __user *)(long)attr->addr;
> + int reg;
> +
> + if (get_user(reg, uaddr))
> + return -EFAULT;
> +
> + if (reg < VGIC_NR_SGIS || reg >= VGIC_NR_PRIVATE_IRQS)
> + return -EINVAL;
> +
> + return 

Re: [PATCH v5 00/21] KVM: ARM64: Add guest PMU support

2015-12-07 Thread Marc Zyngier
Shannon,

On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> This patchset adds guest PMU support for KVM on ARM64. It takes
> trap-and-emulate approach. When guest wants to monitor one event, it
> will be trapped by KVM and KVM will call perf_event API to create a perf
> event and call relevant perf_event APIs to get the count value of event.
> 
> Use perf to test this patchset in guest. When using "perf list", it
> shows the list of the hardware events and hardware cache events perf
> supports. Then use "perf stat -e EVENT" to monitor some event. For
> example, use "perf stat -e cycles" to count cpu cycles and
> "perf stat -e cache-misses" to count cache misses.
> 
> Below are the outputs of "perf stat -r 5 sleep 5" when running in host
> and guest.
> 
> Host:
>  Performance counter stats for 'sleep 5' (5 runs):
> 
>   0.510276  task-clock (msec) #0.000 CPUs utilized
> ( +-  1.57% )
>  1  context-switches  #0.002 M/sec
>  0  cpu-migrations#0.000 K/sec
> 49  page-faults   #0.096 M/sec
> ( +-  0.77% )
>1064117  cycles#2.085 GHz  
> ( +-  1.56% )
>  stalled-cycles-frontend
>  stalled-cycles-backend
> 529051  instructions  #0.50  insns per cycle  
> ( +-  0.55% )
>  branches
>   9894  branch-misses #   19.390 M/sec
> ( +-  1.70% )
> 
>5.000853900 seconds time elapsed   
>( +-  0.00% )
> 
> Guest:
>  Performance counter stats for 'sleep 5' (5 runs):
> 
>   0.642456  task-clock (msec) #0.000 CPUs utilized
> ( +-  1.81% )
>  1  context-switches  #0.002 M/sec
>  0  cpu-migrations#0.000 K/sec
> 49  page-faults   #0.076 M/sec
> ( +-  1.64% )
>1322717  cycles#2.059 GHz  
> ( +-  1.88% )
>  stalled-cycles-frontend
>  stalled-cycles-backend
> 640944  instructions  #0.48  insns per cycle  
> ( +-  1.10% )
>  branches
>  10665  branch-misses #   16.600 M/sec
> ( +-  2.23% )
> 
>5.001181452 seconds time elapsed   
>( +-  0.00% )
> 
> Have a cycle counter read test like below in guest and host:
> 
> static void test(void)
> {
>   unsigned long count, count1, count2;
>   count1 = read_cycles();
>   count++;
>   count2 = read_cycles();
> }
> 
> Host:
> count1: 3046186213
> count2: 3046186347
> delta: 134
> 
> Guest:
> count1: 5645797121
> count2: 5645797270
> delta: 149
> 
> The gap between guest and host is very small. One reason for this I
> think is that it doesn't count the cycles in EL2 and host since we add
> exclude_hv = 1. So the cycles spent to store/restore registers which
> happens at EL2 are not included.
> 
> This patchset can be fetched from [1] and the relevant QEMU version for
> test can be fetched from [2].
> 
> The results of 'perf test' can be found from [3][4].
> The results of perf_event_tests test suite can be found from [5][6].
> 
> Also, I have tested "perf top" in two VMs and host at the same time. It
> works well.

I've commented on more issues I've found. Hopefully you'll be able to
respin this quickly enough, and end-up with a simpler code base (state
duplication is a bit messy).

Another thing I have noticed is that you have dropped the vgic changes
that were configuring the interrupt. It feels like they should be
included, and configure the PPI as a LEVEL interrupt. Also, looking at
your QEMU code, you seem to configure the interrupt as EDGE, which is
not how your emulated HW behaves.

Looking forward to reviewing the next version.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 04/21] KVM: ARM64: Add reset and access handlers for PMCR_EL0 register

2015-12-07 Thread Marc Zyngier
On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> Add reset handler which gets host value of PMCR_EL0 and make writable
> bits architecturally UNKNOWN except PMCR.E to zero. Add a common access
> handler for PMU registers which emulates writing and reading register
> and add emulation for PMCR.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/kvm/sys_regs.c | 97 
> ++-
>  1 file changed, 95 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 87a64e8..e020fe0 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -446,6 +447,58 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const 
> struct sys_reg_desc *r)
>   vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
>  }
>  
> +static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> +{
> + u64 pmcr, val;
> +
> + asm volatile("mrs %0, pmcr_el0\n" : "=r" (pmcr));
> + /* Writable bits of PMCR_EL0 (ARMV8_PMCR_MASK) is reset to UNKNOWN
> +  * except PMCR.E resetting to zero.
> +  */
> + val = ((pmcr & ~ARMV8_PMCR_MASK) | (ARMV8_PMCR_MASK & 0xdecafbad))
> +   & (~ARMV8_PMCR_E);
> + vcpu_sys_reg(vcpu, r->reg) = val;
> +}
> +
> +/* PMU registers accessor. */
> +static bool access_pmu_regs(struct kvm_vcpu *vcpu,
> + const struct sys_reg_params *p,

You may have noticed that this now generates a warning on 4.4-rc4, as
you cannot have a const struct sys_reg_params anymore (we've changed a
number of things there to solve another bug).

> + const struct sys_reg_desc *r)
> +{
> + u64 val;
> +
> + if (p->is_write) {
> + switch (r->reg) {
> + case PMCR_EL0: {
> + /* Only update writeable bits of PMCR */
> + val = vcpu_sys_reg(vcpu, r->reg);
> + val &= ~ARMV8_PMCR_MASK;
> + val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;

vcpu_reg and ->Rt are now gone. To ease the transition, I've pushed a
patch on top of this series to my kvm-arm64/pmu-v5 branch. Feel free to
use it as a reference.

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v5 03/21] KVM: ARM64: Add offset defines for PMU registers

2015-12-07 Thread Marc Zyngier
On 03/12/15 06:11, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> We are about to trap and emulate acccesses to each PMU register

s/acccesses/accesses/

> individually. This adds the context offsets for the AArch64 PMU
> registers and their AArch32 counterparts.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/include/asm/kvm_asm.h | 55 
> 
>  1 file changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 5e37710..4f804c1 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -48,12 +48,34 @@
>  #define MDSCR_EL1	22  /* Monitor Debug System Control Register */
>  #define MDCCINT_EL1  23  /* Monitor Debug Comms Channel Interrupt Enable 
> Reg */
>  

Coming back to this patch, it gives a clear view of where you have state
duplication.

> +/* Performance Monitors Registers */
> +#define PMCR_EL0 24  /* Control Register */
> +#define PMOVSSET_EL0 25  /* Overflow Flag Status Set Register */
> +#define PMOVSCLR_EL0 26  /* Overflow Flag Status Clear Register */

This should only be a single state. You don't even have to represent it
in the sysreg array, to be honest.

> +#define PMSELR_EL0   27  /* Event Counter Selection Register */
> +#define PMCEID0_EL0  28  /* Common Event Identification Register 0 */
> +#define PMCEID1_EL0  29  /* Common Event Identification Register 1 */
> +#define PMEVCNTR0_EL030  /* Event Counter Register (0-30) */
> +#define PMEVCNTR30_EL0   60
> +#define PMCCNTR_EL0  61  /* Cycle Counter Register */
> +#define PMEVTYPER0_EL0   62  /* Event Type Register (0-30) */
> +#define PMEVTYPER30_EL0  92
> +#define PMCCFILTR_EL093  /* Cycle Count Filter Register */
> +#define PMXEVCNTR_EL094  /* Selected Event Count Register */
> +#define PMXEVTYPER_EL0   95  /* Selected Event Type Register */

These "select" registers aren't real ones, but just a way to pick the
real register. They should be removed.

> +#define PMCNTENSET_EL0   96  /* Count Enable Set Register */
> +#define PMCNTENCLR_EL0   97  /* Count Enable Clear Register */
> +#define PMINTENSET_EL1   98  /* Interrupt Enable Set Register */
> +#define PMINTENCLR_EL1   99  /* Interrupt Enable Clear Register */

Same for these. They are just convenient accessors for the HW register.

> +#define PMUSERENR_EL0100 /* User Enable Register */
> +#define PMSWINC_EL0  101 /* Software Increment Register */
> +
>  /* 32bit specific registers. Keep them at the end of the range */
> -#define  DACR32_EL2  24  /* Domain Access Control Register */
> -#define  IFSR32_EL2  25  /* Instruction Fault Status Register */
> -#define  FPEXC32_EL2 26  /* Floating-Point Exception Control 
> Register */
> -#define  DBGVCR32_EL227  /* Debug Vector Catch Register */
> -#define  NR_SYS_REGS 28
> +#define  DACR32_EL2  102 /* Domain Access Control Register */
> +#define  IFSR32_EL2  103 /* Instruction Fault Status Register */
> +#define  FPEXC32_EL2 104 /* Floating-Point Exception Control 
> Register */
> +#define  DBGVCR32_EL2105 /* Debug Vector Catch Register */
> +#define  NR_SYS_REGS 106
>  
>  /* 32bit mapping */
>  #define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
> @@ -75,6 +97,24 @@
>  #define c6_IFAR  (c6_DFAR + 1)   /* Instruction Fault Address 
> Register */
>  #define c7_PAR   (PAR_EL1 * 2)   /* Physical Address Register */
>  #define c7_PAR_high  (c7_PAR + 1)/* PAR top 32 bits */
> +
> +/* Performance Monitors*/
> +#define c9_PMCR  (PMCR_EL0 * 2)
> +#define c9_PMOVSSET  (PMOVSSET_EL0 * 2)
> +#define c9_PMOVSCLR  (PMOVSCLR_EL0 * 2)
> +#define c9_PMCCNTR   (PMCCNTR_EL0 * 2)
> +#define c9_PMSELR(PMSELR_EL0 * 2)
> +#define c9_PMCEID0   (PMCEID0_EL0 * 2)
> +#define c9_PMCEID1   (PMCEID1_EL0 * 2)
> +#define c9_PMXEVCNTR (PMXEVCNTR_EL0 * 2)
> +#define c9_PMXEVTYPER(PMXEVTYPER_EL0 * 2)
> +#define c9_PMCNTENSET(PMCNTENSET_EL0 * 2)
> +#define c9_PMCNTENCLR(PMCNTENCLR_EL0 * 2)
> +#define c9_PMINTENSET(PMINTENSET_EL1 * 2)
> +#define c9_PMINTENCLR(PMINTENCLR_EL1 * 2)
> +#define c9_PMUSERENR (PMUSERENR_EL0 * 2)
> +#define c9_PMSWINC   (PMSWINC_EL0 * 2)
> +
>  #define c10_PRRR (MAIR_EL1 * 2)  /* Primary Region Remap Register */
>  #define c10_NMRR (c10_PRRR + 1)  /* Normal Memory Remap Register */
>  #define c12_VBAR (VBAR_EL1 * 2)  /* Vector Base Address Register */
> @@ -86,6 +126,11 @@
>  #define c10_AMAIR1   (c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
>  #define c14_CNTKCTL  (CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
>  
> +/* Performance Monitors*/
> +#define 

Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Marc Zyngier
Hi Mario,

On 07/12/15 16:40, Mario Smarduch wrote:
> Hi Marc,
> 
> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>> Implement the vgic-v3 save restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
>> 
>>  3 files changed, 230 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index d8d5968..d1e38ce 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -3,3 +3,4 @@
>>  #
>>  
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index ac63553..5759f9f 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -32,5 +32,8 @@
>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>  
>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>> +
>>  #endif /* __ARM64_KVM_HYP_H__ */
>>  
>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
>> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> new file mode 100644
>> index 000..78d05f3
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> @@ -0,0 +1,226 @@
>> +/*
>> + * Copyright (C) 2012-2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
>> +#define vtr_to_max_lr_idx(v)((v) & 0xf)
>> +#define vtr_to_nr_pri_bits(v)   (((u32)(v) >> 29) + 1)
>> +
>> +#define read_gicreg(r)  
>> \
>> +({  \
>> +u64 reg;\
>> +asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
>> +reg;\
>> +})
>> +
>> +#define write_gicreg(v,r)   \
>> +do {\
>> +u64 __val = (v);\
>> +asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>> +} while (0)
>> +
>> +/* vcpu is already in the HYP VA space */
>> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>> +{
>> +struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>> +u64 val;
>> +u32 max_lr_idx, nr_pri_bits;
>> +
>> +/*
>> + * Make sure stores to the GIC via the memory mapped interface
>> + * are now visible to the system register interface.
>> + */
>> +dsb(st);
>> +
>> +cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>> +cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>> +cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>> +cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>> +
>> +write_gicreg(0, ICH_HCR_EL2);
>> +val = read_gicreg(ICH_VTR_EL2);
>> +max_lr_idx = vtr_to_max_lr_idx(val);
>> +nr_pri_bits = vtr_to_nr_pri_bits(val);
>> +
> Can you set up a base pointer to cpu_if->vgic_lr and use an offset?

I could, but I fail to see what we'd gain by using this (aside from
slightly shorter lines). Or am I completely missing the point?

> Also, is there a way to get rid of the constants? That implicitly
> hard-codes the max number of LRs and doesn't make the code portable.

Well, it is a sad fact of life that the maximum number of LRs *is*
hardcoded to an architectural limit of 16. These are CPU registers, and
there are only so many of them (and probably a lot less in practice -
filling 4 of them has proved to be an extremely rare case).

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Marc Zyngier
On 07/12/15 17:18, Mario Smarduch wrote:
> 
> 
> On 12/7/2015 8:52 AM, Marc Zyngier wrote:
>> Hi Mario,
>>
>> On 07/12/15 16:40, Mario Smarduch wrote:
>>> Hi Marc,
>>>
>>> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
 Implement the vgic-v3 save restore as a direct translation of
 the assembly code version.

 Signed-off-by: Marc Zyngier 
 ---
  arch/arm64/kvm/hyp/Makefile |   1 +
  arch/arm64/kvm/hyp/hyp.h|   3 +
  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
 
  3 files changed, 230 insertions(+)
  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c

 diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
 index d8d5968..d1e38ce 100644
 --- a/arch/arm64/kvm/hyp/Makefile
 +++ b/arch/arm64/kvm/hyp/Makefile
 @@ -3,3 +3,4 @@
  #
  
  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
 +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
 diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
 index ac63553..5759f9f 100644
 --- a/arch/arm64/kvm/hyp/hyp.h
 +++ b/arch/arm64/kvm/hyp/hyp.h
 @@ -32,5 +32,8 @@
  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
  
 +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
 +
  #endif /* __ARM64_KVM_HYP_H__ */
  
 diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
 b/arch/arm64/kvm/hyp/vgic-v3-sr.c
 new file mode 100644
 index 000..78d05f3
 --- /dev/null
 +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
 @@ -0,0 +1,226 @@
 +/*
 + * Copyright (C) 2012-2015 - ARM Ltd
 + * Author: Marc Zyngier 
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#include 
 +#include 
 +#include 
 +
 +#include 
 +
 +#include "hyp.h"
 +
 +#define vtr_to_max_lr_idx(v)  ((v) & 0xf)
 +#define vtr_to_nr_pri_bits(v) (((u32)(v) >> 29) + 1)
 +
 +#define read_gicreg(r)
 \
 +  ({  \
 +  u64 reg;\
 +  asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
 +  reg;\
 +  })
 +
 +#define write_gicreg(v,r) \
 +  do {\
 +  u64 __val = (v);\
 +  asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
 +  } while (0)
 +
 +/* vcpu is already in the HYP VA space */
 +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 +{
 +  struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 +  u64 val;
 +  u32 max_lr_idx, nr_pri_bits;
 +
 +  /*
 +   * Make sure stores to the GIC via the memory mapped interface
 +   * are now visible to the system register interface.
 +   */
 +  dsb(st);
 +
 +  cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
 +  cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
 +  cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
 +  cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
 +
 +  write_gicreg(0, ICH_HCR_EL2);
 +  val = read_gicreg(ICH_VTR_EL2);
 +  max_lr_idx = vtr_to_max_lr_idx(val);
 +  nr_pri_bits = vtr_to_nr_pri_bits(val);
 +
>>> Can you setup a base pointer to cpu_if->vgic_lr and use an offset?
>>
>> I could, but I fail to see what we'd gain by using this (aside from
>> slightly shorter lines). Or am I completely missing the point?
> 
> Skip adding the offset of vgic_lr to cpu_if pointer.

But if we do that, we also change the layout that EL1 expect. Assume we
do something like this:

u64 *current_lr = cpu_if->vgic_lr;

switch (max_lr_idx) {
case 15:
*current_lr++ = read_gicreg(ICH_LR15_EL2);
case 14:
*current_lr++ = read_gicreg(ICH_LR14_EL2);
[...]
}

with max_lr_idx = 4 (a common case), we end up filling vgic_lr[0..3],
while the rest of the code expects each LR value at its own fixed index.

Re: [PATCH v5 02/21] KVM: ARM64: Define PMU data structure for each vcpu

2015-12-07 Thread Marc Zyngier
On 07/12/15 15:05, Marc Zyngier wrote:
> On 03/12/15 06:11, Shannon Zhao wrote:
>> From: Shannon Zhao 
>>
>> Here we plan to support virtual PMU for guest by full software
>> emulation, so define some basic structs and functions preparing for
>> futher steps. Define struct kvm_pmc for performance monitor counter and
>> struct kvm_pmu for performance monitor unit for each vcpu. According to
>> ARMv8 spec, the PMU contains at most 32(ARMV8_MAX_COUNTERS) counters.
>>
>> Since this only supports ARM64 (or PMUv3), add a separate config symbol
>> for it.
>>
>> Signed-off-by: Shannon Zhao 
>> ---
>>  arch/arm64/include/asm/kvm_host.h |  2 ++
>>  arch/arm64/kvm/Kconfig|  8 
>>  include/kvm/arm_pmu.h | 41 
>> +++
>>  3 files changed, 51 insertions(+)
>>  create mode 100644 include/kvm/arm_pmu.h
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index a35ce72..42e15bb 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -37,6 +37,7 @@
>>  
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>>  
>> @@ -132,6 +133,7 @@ struct kvm_vcpu_arch {
>>  /* VGIC state */
>>  struct vgic_cpu vgic_cpu;
>>  struct arch_timer_cpu timer_cpu;
>> +struct kvm_pmu pmu;
>>  
>>  /*
>>   * Anything that is not used directly from assembly code goes
>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>> index a5272c0..66da9a2 100644
>> --- a/arch/arm64/kvm/Kconfig
>> +++ b/arch/arm64/kvm/Kconfig
>> @@ -36,6 +36,7 @@ config KVM
>>  select HAVE_KVM_EVENTFD
>>  select HAVE_KVM_IRQFD
>>  select KVM_ARM_VGIC_V3
>> +select KVM_ARM_PMU
>>  ---help---
>>Support hosting virtualized guest machines.
>>We don't support KVM with 16K page tables yet, due to the multiple
>> @@ -48,6 +49,13 @@ config KVM_ARM_HOST
>>  ---help---
>>Provides host support for ARM processors.
>>  
>> +config KVM_ARM_PMU
>> +bool
>> +depends on KVM_ARM_HOST && HW_PERF_EVENTS
>> +---help---
>> +  Adds support for a virtual Performance Monitoring Unit (PMU) in
>> +  virtual machines.
>> +
>>  source drivers/vhost/Kconfig
>>  
>>  endif # VIRTUALIZATION
>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>> new file mode 100644
>> index 000..0c13470
>> --- /dev/null
>> +++ b/include/kvm/arm_pmu.h
>> @@ -0,0 +1,41 @@
>> +/*
>> + * Copyright (C) 2015 Linaro Ltd.
>> + * Author: Shannon Zhao 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#ifndef __ASM_ARM_KVM_PMU_H
>> +#define __ASM_ARM_KVM_PMU_H
>> +
>> +#include 
>> +#ifdef CONFIG_KVM_ARM_PMU
>> +#include 
>> +#endif
>> +
>> +struct kvm_pmc {
>> +u8 idx;/* index into the pmu->pmc array */
>> +struct perf_event *perf_event;
>> +struct kvm_vcpu *vcpu;
> 
> Why do you need this? If you have the pointer to a kvm_pmc structure, it
> is very cheap to compute the address of the vcpu:
> 
> static inline struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
> {
>   struct kvm_pmu *pmu;
>   pmc -= pmc->idx;
>   pmu = container_of(pmc, struct kvm_pmu, pmc);
>   return container_of(pmu, struct kvm_vcpu, pmu);
> }
> 
> All of which the compiler will happily optimize for you.

FWIW, actually compiling code looks something like this:

static inline struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
{
struct kvm_pmu *pmu;
struct kvm_vcpu_arch *vcpu_arch;

pmc -= pmc->idx;
pmu = container_of(pmc, struct kvm_pmu, pmc[0]);
vcpu_arch = container_of(pmu, struct kvm_vcpu_arch, pmu);
return container_of(vcpu_arch, struct kvm_vcpu, arch);
}

which amounts to 4 sub instructions.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


live migration vs device assignment (was Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC)

2015-12-07 Thread Michael S. Tsirkin
nstead?
Remove device from guest control by unmapping
it from guest PTEs, teach guest not to crash
and not to hang. Ideally reset the device instead.

This sounds like a useful thing to support even
outside virtualization context.

3.  (Presumably) faster device start
Finally, device needs to be started at destination.  Again, hotplug
will also work. Isn't it fast enough? where exactly is
the time spent?

Alternatively, some kind of hot-unplug that makes e.g.
net core save the device state so the following hotplug can restore it
back might work. This is closer to what you are trying to
do, but it is not very robust since device at source
and destination could be slightly different.

A full reset at destination sounds better.

If combining with surprise removal in 2 above, maybe pretend to linux
that there was a fatal error on the device, and have linux re-initialize
it?  To me this sounds better as it will survive across minor device
changes src/dst. Still won't survive if driver happens to change which
isn't something users always have control over.
We could teach linux about a new event that replaces
the device.

Again, might be a useful thing to support even
outside virtualization context.


-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 8:52 AM, Marc Zyngier wrote:
> Hi Mario,
> 
> On 07/12/15 16:40, Mario Smarduch wrote:
>> Hi Marc,
>>
>> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>>> Implement the vgic-v3 save restore as a direct translation of
>>> the assembly code version.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
>>> 
>>>  3 files changed, 230 insertions(+)
>>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>
>>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>>> index d8d5968..d1e38ce 100644
>>> --- a/arch/arm64/kvm/hyp/Makefile
>>> +++ b/arch/arm64/kvm/hyp/Makefile
>>> @@ -3,3 +3,4 @@
>>>  #
>>>  
>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>>> index ac63553..5759f9f 100644
>>> --- a/arch/arm64/kvm/hyp/hyp.h
>>> +++ b/arch/arm64/kvm/hyp/hyp.h
>>> @@ -32,5 +32,8 @@
>>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>>  
>>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>> +
>>>  #endif /* __ARM64_KVM_HYP_H__ */
>>>  
>>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
>>> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>> new file mode 100644
>>> index 000..78d05f3
>>> --- /dev/null
>>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>> @@ -0,0 +1,226 @@
>>> +/*
>>> + * Copyright (C) 2012-2015 - ARM Ltd
>>> + * Author: Marc Zyngier 
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see .
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +
>>> +#include "hyp.h"
>>> +
>>> +#define vtr_to_max_lr_idx(v)   ((v) & 0xf)
>>> +#define vtr_to_nr_pri_bits(v)  (((u32)(v) >> 29) + 1)
>>> +
>>> +#define read_gicreg(r) 
>>> \
>>> +   ({  \
>>> +   u64 reg;\
>>> +   asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
>>> +   reg;\
>>> +   })
>>> +
>>> +#define write_gicreg(v,r)  \
>>> +   do {\
>>> +   u64 __val = (v);\
>>> +   asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>>> +   } while (0)
>>> +
>>> +/* vcpu is already in the HYP VA space */
>>> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>> +{
>>> +   struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>>> +   u64 val;
>>> +   u32 max_lr_idx, nr_pri_bits;
>>> +
>>> +   /*
>>> +* Make sure stores to the GIC via the memory mapped interface
>>> +* are now visible to the system register interface.
>>> +*/
>>> +   dsb(st);
>>> +
>>> +   cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>>> +   cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>>> +   cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>>> +   cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>>> +
>>> +   write_gicreg(0, ICH_HCR_EL2);
>>> +   val = read_gicreg(ICH_VTR_EL2);
>>> +   max_lr_idx = vtr_to_max_lr_idx(val);
>>> +   nr_pri_bits = vtr_to_nr_pri_bits(val);
>>> +
>> Can you setup a base pointer to cpu_if->vgic_lr and use an offset?
> 
> I could, but I fail to see what we'd gain by using this (aside from
> slightly shorter lines). Or am I completely missing the point?

Skip adding the offset of vgic_lr to cpu_if pointer.
> 
>> Also is there a way to get rid of the constants, that implicitly hard codes 
>> max
>> number of LRs, doesn't make the code portable.
> 
> Well, it is a sad fact of life that the maximum number of LRs *is*
> hardcoded to an architectural limit of 16. These are CPU registers, and
> there is only so many of them (and probably a lot less in practice -
> filling 4 of them has proved to be an extremely rare case).

Yes I'm aware of that it was 64 (or maybe still is) on armv7 but specs have
changed from time to time.

> 
> Thanks,
> 
>   M.
> 

Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch
ic_lr[VGIC_V3_LR_INDEX(10)], 
> ICH_LR10_EL2);
> + case 9:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)], ICH_LR9_EL2);
> + case 8:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)], ICH_LR8_EL2);
> + case 7:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)], ICH_LR7_EL2);
> + case 6:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)], ICH_LR6_EL2);
> + case 5:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)], ICH_LR5_EL2);
> + case 4:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)], ICH_LR4_EL2);
> + case 3:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)], ICH_LR3_EL2);
> + case 2:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)], ICH_LR2_EL2);
> + case 1:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)], ICH_LR1_EL2);
> + case 0:
> + write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)], ICH_LR0_EL2);
> + }
> +
> + /*
> +  * Ensures that the above will have reached the
> +  * (re)distributors. This ensure the guest will read the
> +  * correct values from the memory-mapped interface.
> +  */
> + isb();
> + dsb(sy);
> +
> + /*
> +  * Prevent the guest from touching the GIC system registers if
> +  * SRE isn't enabled for GICv3 emulation.
> +  */
> + if (!cpu_if->vgic_sre) {
> + write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE,
> +  ICC_SRE_EL2);
> + }
> +}
> +
> +u64 __hyp_text __vgic_v3_read_ich_vtr_el2(void)
> +{
> + return read_gicreg(ICH_VTR_EL2);
> +}
> 


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-07 Thread Michael S. Tsirkin
On Mon, Dec 07, 2015 at 09:12:08AM -0800, Alexander Duyck wrote:
> On Mon, Dec 7, 2015 at 7:40 AM, Lan, Tianyu  wrote:
> > On 12/5/2015 1:07 AM, Alexander Duyck wrote:
> >>>
> >>>
> >>> We still need to support Windows guest for migration and this is why our
> >>> patches keep all changes in the driver since it's impossible to change
> >>> Windows kernel.
> >>
> >>
> >> That is a poor argument.  I highly doubt Microsoft is interested in
> >> having to modify all of the drivers that will support direct assignment
> >> in order to support migration.  They would likely request something
> >> similar to what I have in that they will want a way to do DMA tracking
> >> with minimal modification required to the drivers.
> >
> >
> > This totally depends on the NIC or other devices' vendors and they
> > should make decision to support migration or not. If yes, they would
> > modify driver.
> 
> Having to modify every driver that wants to support live migration is
> a bit much.  In addition I don't see this being limited only to NIC
> devices.  You can direct assign a number of different devices, your
> solution cannot be specific to NICs.
> 
> > If just target to call suspend/resume during migration, the feature will
> > be meaningless. Most cases don't want to affect user during migration
> > a lot and so the service down time is vital. Our target is to apply
> > SRIOV NIC passthough to cloud service and NFV(network functions
> > virtualization) projects which are sensitive to network performance
> > and stability. From my opinion, We should give a change for device
> > driver to implement itself migration job. Call suspend and resume
> > callback in the driver if it doesn't care the performance during migration.
> 
> The suspend/resume callback should be efficient in terms of time.
> After all we don't want the system to stall for a long period of time
> when it should be either running or asleep.  Having it burn cycles in
> a power state limbo doesn't do anyone any good.  If nothing else maybe
> it will help to push the vendors to speed up those functions which
> then benefit migration and the system sleep states.
> 
> Also you keep assuming you can keep the device running while you do
> the migration and you can't.  You are going to corrupt the memory if
> you do, and you have yet to provide any means to explain how you are
> going to solve that.
> 
> 
> >
> >>
> >>> Following is my idea to do DMA tracking.
> >>>
> >>> Inject event to VF driver after memory iterate stage
> >>> and before stop VCPU and then VF driver marks dirty all
> >>> using DMA memory. The new allocated pages also need to
> >>> be marked dirty before stopping VCPU. All dirty memory
> >>> in this time slot will be migrated until stop-and-copy
> >>> stage. We also need to make sure to disable VF via clearing the
> >>> bus master enable bit for VF before migrating these memory.
> >>
> >>
> >> The ordering of your explanation here doesn't quite work.  What needs to
> >> happen is that you have to disable DMA and then mark the pages as dirty.
> >>   What the disabling of the BME does is signal to the hypervisor that
> >> the device is now stopped.  The ixgbevf_suspend call already supported
> >> by the driver is almost exactly what is needed to take care of something
> >> like this.
> >
> >
> > This is why I hope to reserve a piece of space in the dma page to do dummy
> > write. This can help to mark page dirty while not require to stop DMA and
> > not race with DMA data.
> 
> You can't and it will still race.  What concerns me is that your
> patches and the document you referenced earlier show a considerable
> lack of understanding about how DMA and device drivers work.  There is
> a reason why device drivers have so many memory barriers and the like
> in them.  The fact is when you have CPU and a device both accessing
> memory things have to be done in a very specific order and you cannot
> violate that.
> 
> If you have a contiguous block of memory you expect the device to
> write into you cannot just poke a hole in it.  Such a situation is not
> supported by any hardware that I am aware of.
> 
> As far as writing to dirty the pages it only works so long as you halt
> the DMA and then mark the pages dirty.  It has to be in that order.
> Any other order will result in data corruption and I am sure the NFV
> customers definitely don't want that.
> 
> > If can't do that, we have to stop DMA in a short time to mark all dma
> > pages dirty and then reenable it. I am not sure how much we can get by
> > this way to track all DMA memory with device running during migration. I
> > need to do some tests and compare results with stop DMA diretly at last
> > stage during migration.
> 
> We have to halt the DMA before we can complete the migration.  So
> please feel free to test this.
> 
> In addition I still feel you would be better off taking this in
> smaller steps.  I still say your first step would be to come up with a
> generic solution for the dirty page tracking like the dma_mark_clean()
> approach I had mentioned earlier.

Re: [PATCH v3 01/22] arm64: Add macros to read/write system registers

2015-12-07 Thread Catalin Marinas
On Mon, Dec 07, 2015 at 10:53:17AM +, Marc Zyngier wrote:
> From: Mark Rutland 
> 
> Rather than crafting custom macros for reading/writing each system
> register provide generics accessors, read_sysreg and write_sysreg, for
> this purpose.
> 
> Unlike read_cpuid, calls to read_exception_reg are never expected
> to be optimized away or replaced with synthetic values.

What's read_exception_reg? Is it a macro somewhere?

> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index d48ab5b..c9c283a 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -20,6 +20,8 @@
>  #ifndef __ASM_SYSREG_H
>  #define __ASM_SYSREG_H
>  
> +#include 
> +
>  #include 
>  
>  /*
> @@ -208,6 +210,8 @@
>  
>  #else
>  
> +#include 
> +
>  asm(
>  ".irp
> num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n"
>  ".equ__reg_num_x\\num, \\num\n"
> @@ -232,6 +236,19 @@ static inline void config_sctlr_el1(u32 clear, u32 set)
>   val |= set;
>   asm volatile("msr sctlr_el1, %0" : : "r" (val));
>  }
> +
> +#define read_sysreg(r) ({\
> + u64 __val;  \
> + asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \
> + __val;  \
> +})

And maybe a comment here on why this is always volatile.

Otherwise:

Acked-by: Catalin Marinas 

-- 
Catalin


Re: [PATCH v3 01/22] arm64: Add macros to read/write system registers

2015-12-07 Thread Mark Rutland
On Mon, Dec 07, 2015 at 05:35:20PM +, Catalin Marinas wrote:
> On Mon, Dec 07, 2015 at 10:53:17AM +, Marc Zyngier wrote:
> > From: Mark Rutland 
> > 
> > Rather than crafting custom macros for reading/writing each system
> > register provide generics accessors, read_sysreg and write_sysreg, for
> > this purpose.
> > 
> > Unlike read_cpuid, calls to read_exception_reg are never expected
> > to be optimized away or replaced with synthetic values.
> 
> What's read_exception_reg? Is it a macro somewhere?

That was what read_sysreg used to be called on a local branch of mine. I
didn't spot that when reworking the patch.

So s/read_exception_reg/read_sysreg/ here.

> > +#define read_sysreg(r) ({  \
> > +   u64 __val;  \
> > +   asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \
> > +   __val;  \
> > +})
> 
> And maybe a comment here on why this is always volatile.

Makes sense to me.

Marc, are you happy to turn the last sentence from the commit message
into a comment here (with the substitution)?

Thanks,
Mark.


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-07 Thread Alexander Duyck
On Mon, Dec 7, 2015 at 7:40 AM, Lan, Tianyu  wrote:
> On 12/5/2015 1:07 AM, Alexander Duyck wrote:
>>>
>>>
>>> We still need to support Windows guest for migration and this is why our
>>> patches keep all changes in the driver since it's impossible to change
>>> Windows kernel.
>>
>>
>> That is a poor argument.  I highly doubt Microsoft is interested in
>> having to modify all of the drivers that will support direct assignment
>> in order to support migration.  They would likely request something
>> similar to what I have in that they will want a way to do DMA tracking
>> with minimal modification required to the drivers.
>
>
> This totally depends on the NIC or other devices' vendors and they
> should make decision to support migration or not. If yes, they would
> modify driver.

Having to modify every driver that wants to support live migration is
a bit much.  In addition I don't see this being limited only to NIC
devices.  You can direct assign a number of different devices, your
solution cannot be specific to NICs.

> If just target to call suspend/resume during migration, the feature will
> be meaningless. Most cases don't want to affect user during migration
> a lot and so the service down time is vital. Our target is to apply
> SRIOV NIC passthough to cloud service and NFV(network functions
> virtualization) projects which are sensitive to network performance
> and stability. From my opinion, We should give a change for device
> driver to implement itself migration job. Call suspend and resume
> callback in the driver if it doesn't care the performance during migration.

The suspend/resume callback should be efficient in terms of time.
After all we don't want the system to stall for a long period of time
when it should be either running or asleep.  Having it burn cycles in
a power state limbo doesn't do anyone any good.  If nothing else maybe
it will help to push the vendors to speed up those functions which
then benefit migration and the system sleep states.

Also you keep assuming you can keep the device running while you do
the migration and you can't.  You are going to corrupt the memory if
you do, and you have yet to provide any means to explain how you are
going to solve that.


>
>>
>>> Following is my idea to do DMA tracking.
>>>
>>> Inject event to VF driver after memory iterate stage
>>> and before stop VCPU and then VF driver marks dirty all
>>> using DMA memory. The new allocated pages also need to
>>> be marked dirty before stopping VCPU. All dirty memory
>>> in this time slot will be migrated until stop-and-copy
>>> stage. We also need to make sure to disable VF via clearing the
>>> bus master enable bit for VF before migrating these memory.
>>
>>
>> The ordering of your explanation here doesn't quite work.  What needs to
>> happen is that you have to disable DMA and then mark the pages as dirty.
>>   What the disabling of the BME does is signal to the hypervisor that
>> the device is now stopped.  The ixgbevf_suspend call already supported
>> by the driver is almost exactly what is needed to take care of something
>> like this.
>
>
> This is why I hope to reserve a piece of space in the dma page to do dummy
> write. This can help to mark page dirty while not require to stop DMA and
> not race with DMA data.

You can't and it will still race.  What concerns me is that your
patches and the document you referenced earlier show a considerable
lack of understanding about how DMA and device drivers work.  There is
a reason why device drivers have so many memory barriers and the like
in them.  The fact is when you have CPU and a device both accessing
memory things have to be done in a very specific order and you cannot
violate that.

If you have a contiguous block of memory you expect the device to
write into you cannot just poke a hole in it.  Such a situation is not
supported by any hardware that I am aware of.

As far as writing to dirty the pages it only works so long as you halt
the DMA and then mark the pages dirty.  It has to be in that order.
Any other order will result in data corruption and I am sure the NFV
customers definitely don't want that.

> If can't do that, we have to stop DMA in a short time to mark all dma
> pages dirty and then reenable it. I am not sure how much we can get by
> this way to track all DMA memory with device running during migration. I
> need to do some tests and compare results with stop DMA diretly at last
> stage during migration.

We have to halt the DMA before we can complete the migration.  So
please feel free to test this.

In addition I still feel you would be better off taking this in
smaller steps.  I still say your first step would be to come up with a
generic solution for the dirty page tracking like the dma_mark_clean()
approach I had mentioned earlier.  If I get time I might try to take
care of it myself later this week since you don't seem to agree with
that approach.

>>
>> The question is how we would go about 

RE: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Pavel Fedin
 Hello!

> But, if Pavel doesn't
> mind trying them out on his system, then it'd be good to know if they
> reproduce there. I'd like to find out if it's a test case problem or
> something else strange going on with environments.

Does not build, applied to master:
--- cut ---
aarch64-unknown-linux-gnu-gcc  -std=gnu99 -ffreestanding -Wextra -O2 -I lib -I 
lib/libfdt -g -MMD -MF arm/.xzr-test.d -Wall
-fomit-frame-pointer  -fno-stack-protector -c -o arm/xzr-test.o 
arm/xzr-test.c
arm/xzr-test.c: In function 'check_xzr_sysreg':
arm/xzr-test.c:13:2: warning: implicit declaration of function 'mmu_disable' 
[-Wimplicit-function-declaration]
  mmu_disable(); /* Tell KVM to set HCR_TVM for this VCPU */
  ^
aarch64-unknown-linux-gnu-gcc  -std=gnu99 -ffreestanding -Wextra -O2 -I lib -I 
lib/libfdt -g -MMD -MF arm/.xzr-test.d -Wall
-fomit-frame-pointer  -fno-stack-protector   -nostdlib -o arm/xzr-test.elf \
-Wl,-T,arm/flat.lds,--build-id=none,-Ttext=4008 \
arm/xzr-test.o arm/cstart64.o lib/libcflat.a lib/libfdt/libfdt.a 
/usr/lib/gcc/aarch64-unknown-linux-gnu/4.9.0/libgcc.a
lib/arm/libeabi.a
arm/xzr-test.o: In function `check_xzr_sysreg':
/cygdrive/d/Projects/kvm-unit-tests/arm/xzr-test.c:13: undefined reference to 
`mmu_disable'
--- cut ---

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




RE: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Pavel Fedin
 Hello!

> FYI, I tried writing test cases for this issue with kvm-unit-tests. The
> issue didn't reproduce for me. It's quite possible my test cases are
> flawed, so I'm not making any claims about the validity of the series

 This is indeed very interesting, so i'll take a look at it.
 For now i've just only took a quick glance at the code, and i have at least 
one suggestion. Could you happen to have sp == 0 in
check_xzr_sysreg()? In this case it will magically work.
 Also, you could try to write a test which tries to overwrite xzr. Something 
like:

volatile int *addr1;
volatile int *addr2;

asm volatile("str %3, [%1]\n\t"
 "ldr wzr, [%1]\n\t"
 "str wzr, [%2]\n\t"
 "ldr %0, [%2]\n\t"
 :"=r"(res):"r"(addr1), "r"(addr2), "r"(some_nonzero_val):"memory");

 Then check for res == some_nonzero_val. If they are equal, you've got the bug 
:)

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




RE: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Pavel Fedin
 Hello!

> FYI, I tried writing test cases for this issue with kvm-unit-tests. The
> issue didn't reproduce for me. It's quite possible my test cases are
> flawed

 Indeed they are, a very little thing fell through again... :)
 It's not just SP, it's SP_EL0. And you never initialize it to anything because 
your code always runs in kernel mode, so it's just
zero, so you get your zero.
 But if you add a little thing in the beginning of your main():

asm volatile("msr sp_el0, %0" : : "r" (0xDEADC0DE0BADC0DE));

 then you have it:
--- cut ---
[root@thunderx-2 kvm-unit-tests]# ./arm-run arm/xzr-test.flat -smp 2
qemu-system-aarch64 -machine virt,accel=kvm:tcg,gic-version=host -cpu host 
-device virtio-serial-device -device
virtconsole,chardev=ctd -chardev testdev,id=ctd -display none -serial stdio 
-kernel arm/xzr-test.flat -smp 2
PASS: mmio: sanity check: read 0x
FAIL: mmio: 'str wzr' check: read 0x0badc0de
vm_setup_vq: virtqueue 0 already setup! base=0xa003e00
chr_testdev_init: chr-testdev: can't init virtqueues
--- cut ---

 Here i run only MMIO test, because i could not compile sysreg one, so i simply 
commented it out.

 P.S. Could you also apply something like the following to arm/run:
--- cut ---
arm/run | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arm/run b/arm/run
index 662a856..3890c8c 100755
--- a/arm/run
+++ b/arm/run
@@ -33,7 +33,11 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
exit 2
 fi
 
-M='-machine virt,accel=kvm:tcg'
+if $qemu $M,? 2>&1 | grep gic-version > /dev/null; then
+   GIC='gic-version=host,'
+fi
+
+M="-machine virt,${GIC}accel=kvm:tcg"
 chr_testdev='-device virtio-serial-device'
 chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
--- cut ---

 Without it qemu does not work on GICv3-only hardware, like my board, because 
it defaults to gic-version=2. I don't post the patch
on the mailing lists, because in order to be able to post this 5-liner i'll 
need to go through the formal approval procedure at my
company, and i just don't want to bother for a single small fix. :) Will do as 
a "Reported-by:".

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: powerpc/64: Include KVM guest test in all interrupt vectors

2015-12-07 Thread Michael Ellerman
On Thu, 2015-12-11 at 05:44:42 UTC, Paul Mackerras wrote:
> Currently, if HV KVM is configured but PR KVM isn't, we don't include
> a test to see whether we were interrupted in KVM guest context for the
> set of interrupts which get delivered directly to the guest by hardware
> if they occur in the guest.  This includes things like program
> interrupts.
> 
> However, the recent bug where userspace could set the MSR for a VCPU
> to have an illegal value in the TS field, and thus cause a TM Bad Thing
> type of program interrupt on the hrfid that enters the guest, showed that
> we can never be completely sure that these interrupts can never occur
> in the guest entry/exit code.  If one of these interrupts does happen
> and we have HV KVM configured but not PR KVM, then we end up trying to
> run the handler in the host with the MMU set to the guest MMU context,
> which generally ends badly.
> 
> Thus, for robustness it is better to have the test in every interrupt
> vector, so that if some way is found to trigger some interrupt in the
> guest entry/exit path, we can handle it without immediately crashing
> the host.
> 
> This means that the distinction between KVMTEST and KVMTEST_PR goes
> away.  Thus we delete KVMTEST_PR and associated macros and use KVMTEST
> everywhere that we previously used either KVMTEST_PR or KVMTEST.  It
> also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR,
> so we deleted SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.
> 
> Signed-off-by: Paul Mackerras 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/31a40e2b052c0f2b80df7b56

cheers


Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-12-07 Thread Thomas Huth
On 20/11/15 09:11, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..3983b87 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  
>   /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>  2:   rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
> - rlwimi  r5, r4, 1, DAWRX_WT
> + rlwimi  r5, r4, 2, DAWRX_WT
>   clrrdi  r4, r4, 3
>   std r4, VCPU_DAWR(r3)
>   std r5, VCPU_DAWRX(r3)

Ping?



Re: [PATCH v3 01/22] arm64: Add macros to read/write system registers

2015-12-07 Thread Marc Zyngier
On 07/12/15 17:45, Mark Rutland wrote:
> On Mon, Dec 07, 2015 at 05:35:20PM +, Catalin Marinas wrote:
>> On Mon, Dec 07, 2015 at 10:53:17AM +, Marc Zyngier wrote:
>>> From: Mark Rutland 
>>>
>>> Rather than crafting custom macros for reading/writing each system
>>> register, provide generic accessors, read_sysreg and write_sysreg, for
>>> this purpose.
>>>
>>> Unlike read_cpuid, calls to read_exception_reg are never expected
>>> to be optimized away or replaced with synthetic values.
>>
>> What's read_exception_reg? Is it a macro somewhere?
> 
> That was what read_sysreg used to be called on a local branch of mine. I
> didn't spot that when reworking the patch.
> 
> So s/read_exception_reg/read_sysreg/ here.
> 
>>> +#define read_sysreg(r) ({  \
>>> +   u64 __val;  \
>>> +   asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \
>>> +   __val;  \
>>> +})
>>
>> And maybe a comment here on why this is always volatile.
> 
> Makes sense to me.
> 
> Marc, are you happy to turn the last sentence from the commit message
> into a comment here (with the substitution)?

Sure, I'll update the patch and push the branch out again, together with
Catalin's Ack.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Andrew Jones
On Mon, Dec 07, 2015 at 04:36:31PM -0600, Andrew Jones wrote:
> On Mon, Dec 07, 2015 at 11:36:28AM +0300, Pavel Fedin wrote:
> >  Hello!
> > 
> > > FYI, I tried writing test cases for this issue with kvm-unit-tests. The
> > > issue didn't reproduce for me. It's quite possible my test cases are
> > > flawed, so I'm not making any claims about the validity of the series
> > 
> >  This is indeed very interesting, so I'll take a look at it.
> >  For now I've only taken a quick glance at the code, and I have at 
> > least one suggestion. Could it be that you have sp == 0 in
> > check_xzr_sysreg()? In this case it will magically work.
> >  Also, you could try to write a test which tries to overwrite xzr. 
> > Something like:
> > 
> > volatile int *addr1;
> > volatile int *addr2;
> > 
> > asm volatile("str %3, [%1]\n\t"
> >  "ldr wzr, [%1]\n\t"
> >  "str wzr, [%2]\n\t"
> >  "ldr %0, [%2]\n\t"
> >  :"=r"(res):"r"(addr1), "r"(addr2), 
> > "r"(some_nonzero_val):"memory");
> > 
> >  Then check for res == some_nonzero_val. If they are equal, you've got the 
> > bug :)
> >
> 
> Besides the fixes mentioned in other mails, I did add this load to xzr
> tests too. For mmio we get the expected failure. mrs seems to work,
> though; maybe that's expected.
> 
> qemu-system-aarch64 -machine virt,accel=kvm -cpu host \
>   -device virtio-serial-device -device virtconsole,chardev=ctd \
>   -chardev testdev,id=ctd -display none -serial stdio \
>   -kernel arm/xzr-test.flat -smp 2
> 
> PASS: mmio: sanity check: read 0x
> FAIL: mmio: 'str wzr' check: read 0x0badc0de
> FAIL: mmio: 'ldr wzr' check: read 0x0badc0de
> PASS: sysreg: sp = 0x401affe0
> FAIL: sysreg: from xzr check: read 0xc0de0badc0de
> PASS: sysreg: to xzr check: read 0x
>

I messed up the "load into xzr" test royally in the last attached patch.
It was quite wrong. I have now tested

 asm volatile(
 "str %3, [%1]\n\t"
 "ldr wzr, [%1]\n\t"
 "str wzr, [%2]\n\t"
 "ldr %0, [%2]\n\t"
 :"=r"(val):"r"(addr), "r"(addr2), "r"(0x):"memory");
report("mmio: 'ldr wzr' check: read 0x%08lx", val != 0x, val);

which passes and

 val = readl(addr);
 printf("addr = 0x%08lx\n", val);
 val = readl(addr2);
 printf("addr2 = 0x%08lx\n", val);

gives

addr = 0x
addr2 = 0x

So it looks like we don't "change" xzr somehow with loads. Anyway, I
probably won't clean this test up and post it. I don't think we really
need to add it as a regression test, unless others disagree and would
like to see it added.

Thanks,
drew


Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 10:20 AM, Marc Zyngier wrote:
> On 07/12/15 18:05, Mario Smarduch wrote:
>>
>>
>> On 12/7/2015 9:37 AM, Marc Zyngier wrote:
[...]
>>>
>>
>> I was thinking something like 'current_lr[VGIC_V3_LR_INDEX(...)]'.
> 
> That doesn't change anything, the compiler is perfectly able to 
> optimize something like this:
> 
> [...]
> ffc0007f31ac:   38624862ldrbw2, [x3,w2,uxtw]
> ffc0007f31b0:   1063adr x3, ffc0007f31bc 
> <__vgic_v3_save_state+0x64>
> ffc0007f31b4:   8b228862add x2, x3, w2, sxtb #2
> ffc0007f31b8:   d61f0040br  x2
> ffc0007f31bc:   d53ccde2mrs x2, s3_4_c12_c13_7
> ffc0007f31c0:   f9001c02str x2, [x0,#56]
> ffc0007f31c4:   d53ccdc2mrs x2, s3_4_c12_c13_6
> ffc0007f31c8:   f9002002str x2, [x0,#64]
> ffc0007f31cc:   d53ccda2mrs x2, s3_4_c12_c13_5
> ffc0007f31d0:   f9002402str x2, [x0,#72]
> ffc0007f31d4:   d53ccd82mrs x2, s3_4_c12_c13_4
> ffc0007f31d8:   f9002802str x2, [x0,#80]
> ffc0007f31dc:   d53ccd62mrs x2, s3_4_c12_c13_3
> ffc0007f31e0:   f9002c02str x2, [x0,#88]
> ffc0007f31e4:   d53ccd42mrs x2, s3_4_c12_c13_2
> ffc0007f31e8:   f9003002str x2, [x0,#96]
> ffc0007f31ec:   d53ccd22mrs x2, s3_4_c12_c13_1
> ffc0007f31f0:   f9003402str x2, [x0,#104]
> ffc0007f31f4:   d53ccd02mrs x2, s3_4_c12_c13_0
> ffc0007f31f8:   f9003802str x2, [x0,#112]
> ffc0007f31fc:   d53ccce2mrs x2, s3_4_c12_c12_7
> ffc0007f3200:   f9003c02str x2, [x0,#120]
> ffc0007f3204:   d532mrs x2, s3_4_c12_c12_6
> ffc0007f3208:   f9004002str x2, [x0,#128]
> ffc0007f320c:   d53ccca2mrs x2, s3_4_c12_c12_5
> ffc0007f3210:   f9004402str x2, [x0,#136]
> ffc0007f3214:   d53ccc82mrs x2, s3_4_c12_c12_4
> ffc0007f3218:   f9004802str x2, [x0,#144]
> ffc0007f321c:   d53ccc62mrs x2, s3_4_c12_c12_3
> ffc0007f3220:   f9004c02str x2, [x0,#152]
> ffc0007f3224:   d53ccc42mrs x2, s3_4_c12_c12_2
> ffc0007f3228:   f9005002str x2, [x0,#160]
> ffc0007f322c:   d53ccc22mrs x2, s3_4_c12_c12_1
> ffc0007f3230:   f9005402str x2, [x0,#168]
> ffc0007f3234:   d53ccc02mrs x2, s3_4_c12_c12_0
> ffc0007f3238:   7100183fcmp w1, #0x6
> ffc0007f323c:   f9005802str x2, [x0,#176]
> 
> As you can see, this is as optimal as it gets, short of being able
> to find a nice way to use more than one register...

Interesting, thanks for the dump. I'm no expert on pipeline optimizations, but I'm
wondering whether these system register accesses can be executed out of order,
provided there are none of what I think are write-after-read dependencies.
It's only 4 registers here; there are some longer stretches in subsequent
patches.

A minor note: there is some whitespace damage in this patch.
> 
> Thanks,
> 
>   M.
> 


Re: [PATCH v3 06/22] arm64: KVM: Implement timer save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 2:53 AM, Marc Zyngier wrote:
> Implement the timer save restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile  |  1 +
>  arch/arm64/kvm/hyp/hyp.h |  3 ++
>  arch/arm64/kvm/hyp/timer-sr.c| 72 
> 
>  include/clocksource/arm_arch_timer.h |  6 +++
>  4 files changed, 82 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/timer-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index d1e38ce..455dc0a 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -4,3 +4,4 @@
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 5759f9f..f213e46 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -35,5 +35,8 @@ void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __timer_save_state(struct kvm_vcpu *vcpu);
> +void __timer_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/timer-sr.c b/arch/arm64/kvm/hyp/timer-sr.c
> new file mode 100644
> index 000..67292c0
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/timer-sr.c
> @@ -0,0 +1,72 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> + u64 val;
> +
> + if (kvm->arch.timer.enabled) {
> + timer->cntv_ctl = read_sysreg(cntv_ctl_el0);
> + isb();

Can you share the subtle insight into why the isb() is needed here?
B2.7.3 mentions changes to system registers only.

> + timer->cntv_cval = read_sysreg(cntv_cval_el0);
> + }
> +
> + /* Disable the virtual timer */
> + write_sysreg(0, cntv_ctl_el0);
> +
> + /* Allow physical timer/counter access for the host */
> + val = read_sysreg(cnthctl_el2);
> + val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> + write_sysreg(val, cnthctl_el2);
> +
> + /* Clear cntvoff for the host */
> + write_sysreg(0, cntvoff_el2);
> +}
> +
> +void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> + u64 val;
> +
> + /*
> +  * Disallow physical timer access for the guest
> +  * Physical counter access is allowed
> +  */
> + val = read_sysreg(cnthctl_el2);
> + val &= ~CNTHCTL_EL1PCEN;
> + val |= CNTHCTL_EL1PCTEN;
> + write_sysreg(val, cnthctl_el2);
> +
> + if (kvm->arch.timer.enabled) {
> + write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
> + write_sysreg(timer->cntv_cval, cntv_cval_el0);
> + isb();
> + write_sysreg(timer->cntv_ctl, cntv_ctl_el0);
> + }
> +}
> diff --git a/include/clocksource/arm_arch_timer.h 
> b/include/clocksource/arm_arch_timer.h
> index 9916d0e..25d0914 100644
> --- a/include/clocksource/arm_arch_timer.h
> +++ b/include/clocksource/arm_arch_timer.h
> @@ -23,6 +23,12 @@
>  #define ARCH_TIMER_CTRL_IT_MASK  (1 << 1)
>  #define ARCH_TIMER_CTRL_IT_STAT  (1 << 2)
>  
> +#define CNTHCTL_EL1PCTEN (1 << 0)
> +#define CNTHCTL_EL1PCEN  (1 << 1)
> +#define CNTHCTL_EVNTEN   (1 << 2)
> +#define CNTHCTL_EVNTDIR  (1 << 3)
> +#define CNTHCTL_EVNTI(0xF << 4)
> +
>  enum arch_timer_reg {
>   ARCH_TIMER_REG_CTRL,
>   ARCH_TIMER_REG_TVAL,
> 


Re: [PATCH kernel 3/9] KVM: PPC: Rework H_PUT_TCE/H_GET_TCE handlers

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:33PM +1000, Alexey Kardashevskiy wrote:
> This reworks the existing H_PUT_TCE/H_GET_TCE handlers to have one
> exit path. This allows next patch to add locks nicely.

I don't see a problem with the actual code, but it doesn't seem to
match this description: I still see multiple return statements for
h_put_tce at least.

> This moves the ioba boundaries check to a helper and adds a check for
> least bits which have to be zeros.
> 
> The patch is pretty mechanical (only check for least ioba bits is added)
> so no change in behaviour is expected.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 102 
> +++-
>  1 file changed, 66 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
> b/arch/powerpc/kvm/book3s_64_vio_hv.c
> index 89e96b3..8ae12ac 100644
> --- a/arch/powerpc/kvm/book3s_64_vio_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
> @@ -35,71 +35,101 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
>  
> +/*
> + * Finds a TCE table descriptor by LIOBN.
> + *
> + * WARNING: This will be called in real or virtual mode on HV KVM and virtual
> + *  mode on PR KVM
> + */
> +static struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu 
> *vcpu,
> + unsigned long liobn)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + struct kvmppc_spapr_tce_table *stt;
> +
> + list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables, list)
> + if (stt->liobn == liobn)
> + return stt;
> +
> + return NULL;
> +}
> +
> +/*
> + * Validates IO address.
> + *
> + * WARNING: This will be called in real-mode on HV KVM and virtual
> + *  mode on PR KVM
> + */
> +static long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
> + unsigned long ioba, unsigned long npages)
> +{
> + unsigned long mask = (1ULL << IOMMU_PAGE_SHIFT_4K) - 1;
> + unsigned long idx = ioba >> IOMMU_PAGE_SHIFT_4K;
> + unsigned long size = stt->window_size >> IOMMU_PAGE_SHIFT_4K;
> +
> + if ((ioba & mask) || (size + npages <= idx))
> + return H_PARAMETER;

Not sure if it's worth a check for overflow in (size+npages) there.

> +
> + return H_SUCCESS;
> +}
> +
>  /* WARNING: This will be called in real-mode on HV KVM and virtual
>   *  mode on PR KVM
>   */
>  long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba, unsigned long tce)
>  {
> - struct kvm *kvm = vcpu->kvm;
> - struct kvmppc_spapr_tce_table *stt;
> + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn);
> + long ret = H_TOO_HARD;
> + unsigned long idx;
> + struct page *page;
> + u64 *tbl;
>  
>   /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
>   /*  liobn, ioba, tce); */
>  
> - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
> - if (stt->liobn == liobn) {
> - unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
> - struct page *page;
> - u64 *tbl;
> + if (!stt)
> + return ret;
>  
> - /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p  
> window_size=0x%x\n", */
> - /*  liobn, stt, stt->window_size); */
> - if (ioba >= stt->window_size)
> - return H_PARAMETER;
> + ret = kvmppc_ioba_validate(stt, ioba, 1);
> + if (ret)
> + return ret;
>  
> - page = stt->pages[idx / TCES_PER_PAGE];
> - tbl = (u64 *)page_address(page);
> + idx = ioba >> SPAPR_TCE_SHIFT;
> + page = stt->pages[idx / TCES_PER_PAGE];
> + tbl = (u64 *)page_address(page);
>  
> - /* FIXME: Need to validate the TCE itself */
> - /* udbg_printf("tce @ %p\n", &tbl[idx % 
> TCES_PER_PAGE]); */
> - tbl[idx % TCES_PER_PAGE] = tce;
> - return H_SUCCESS;
> - }
> - }
> + /* FIXME: Need to validate the TCE itself */
> + /* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
> + tbl[idx % TCES_PER_PAGE] = tce;
>  
> - /* Didn't find the liobn, punt it to userspace */
> - return H_TOO_HARD;
> + return ret;

So, this relies on the fact that kvmppc_ioba_validate() would have
returned H_SUCCESS some distance above.  This seems rather fragile if
you insert anything else which alters ret in between.  Since this is
the success path, I think it's clearer to explicitly return H_SUCCESS.

>  }
>  EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
>  
>  long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
> unsigned long ioba)
>  {
> - struct kvm *kvm = vcpu->kvm;
> - struct 

Re: [PATCH kernel 1/9] rcu: Define notrace version of list_for_each_entry_rcu

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:31PM +1000, Alexey Kardashevskiy wrote:
> This defines list_for_each_entry_rcu_notrace and list_entry_rcu_notrace
> which use rcu_dereference_raw_notrace instead of rcu_dereference_raw.
> This allows using list_for_each_entry_rcu_notrace in real mode (MMU is off).
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  include/linux/rculist.h | 38 ++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 17c6b1f..439c4d7 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -253,6 +253,25 @@ static inline void list_splice_init_rcu(struct list_head 
> *list,
>  })
>  
>  /**
> + * list_entry_rcu_notrace - get the struct for this entry
> + * @ptr: the &struct list_head pointer.
> + * @type:   the type of the struct this is embedded in.
> + * @member: the name of the list_struct within the struct.
> + *
> + * This primitive may safely run concurrently with the _rcu list-mutation
> + * primitives such as list_add_rcu() as long as it's guarded by 
> rcu_read_lock().
> + *
> + * This is the same as list_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_entry_rcu_notrace(ptr, type, member) \
> +({ \
> + typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
> + container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), \
> + type, member); \
> +})
> +
> +/**
>   * Where are list_empty_rcu() and list_first_entry_rcu()?
>   *
>   * Implementing those functions following their counterparts list_empty() and
> @@ -308,6 +327,25 @@ static inline void list_splice_init_rcu(struct list_head 
> *list,
>   pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>  
>  /**
> + * list_for_each_entry_rcu_notrace - iterate over rcu list of given type
> + * @pos: the type * to use as a loop cursor.
> + * @head: the head for your list.
> + * @member:  the name of the list_struct within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently with
> + * the _rcu list-mutation primitives such as list_add_rcu()
> + * as long as the traversal is guarded by rcu_read_lock().
> + *
> + * This is the same as list_for_each_entry_rcu() except that it does
> + * not do any RCU debugging or tracing.
> + */
> +#define list_for_each_entry_rcu_notrace(pos, head, member) \
> + for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
> + &pos->member != (head); \
> + pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), \
> + member))
> +
> +/**
>   * list_for_each_entry_continue_rcu - continue iteration over list of given 
> type
>   * @pos: the type * to use as a loop cursor.
>   * @head: the head for your list.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH kernel 4/9] KVM: PPC: Use RCU for arch.spapr_tce_tables

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:34PM +1000, Alexey Kardashevskiy wrote:
> At the moment spapr_tce_tables is not protected against races. This makes
> use of RCU-variants of list helpers. As some bits are executed in real
> mode, this makes use of just introduced list_for_each_entry_rcu_notrace().
> 
> This converts release_spapr_tce_table() to a RCU scheduled handler.
> 
> Signed-off-by: Alexey Kardashevskiy 

Looks correct, to the best of my limited knowledge of RCU.

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH kernel 2/9] KVM: PPC: Make real_vmalloc_addr() public

2015-12-07 Thread David Gibson
On Tue, Sep 15, 2015 at 08:49:32PM +1000, Alexey Kardashevskiy wrote:
> This helper translates vmalloc'd addresses to linear addresses.
> It is only used by the KVM MMU code now and resides in the HV KVM code.
> We will need it further in the TCE code and the DMA memory preregistration
> code called in real mode.
> 
> This makes real_vmalloc_addr() public and moves it to the powerpc code as
> it does not do anything special for KVM.
> 
> Signed-off-by: Alexey Kardashevskiy 

Hmm, I have a couple of small concerns.

First, I'm not sure if the name is clear enough for a public function.

Second, I'm not sure if mmu-hash64.h is the right place for it.  This
is still a function with very specific and limited usage, I wonder if
we should have somewhere else for such special real mode helpers.

Paulus, thoughts?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Andrew Jones
On Mon, Dec 07, 2015 at 11:47:44AM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > But, if Pavel doesn't
> > mind trying them out on his system, then it'd be good to know if they
> > reproduce there. I'd like to find out if it's a test case problem or
> > something else strange going on with environments.
> 
> Does not build, applied to master:
> --- cut ---
> aarch64-unknown-linux-gnu-gcc  -std=gnu99 -ffreestanding -Wextra -O2 -I lib 
> -I lib/libfdt -g -MMD -MF arm/.xzr-test.d -Wall
> -fomit-frame-pointer  -fno-stack-protector -c -o arm/xzr-test.o 
> arm/xzr-test.c
> arm/xzr-test.c: In function 'check_xzr_sysreg':
> arm/xzr-test.c:13:2: warning: implicit declaration of function 'mmu_disable' 
> [-Wimplicit-function-declaration]
>   mmu_disable(); /* Tell KVM to set HCR_TVM for this VCPU */
>   ^
> aarch64-unknown-linux-gnu-gcc  -std=gnu99 -ffreestanding -Wextra -O2 -I lib 
> -I lib/libfdt -g -MMD -MF arm/.xzr-test.d -Wall
> -fomit-frame-pointer  -fno-stack-protector   -nostdlib -o arm/xzr-test.elf \
> -Wl,-T,arm/flat.lds,--build-id=none,-Ttext=4008 \
> arm/xzr-test.o arm/cstart64.o lib/libcflat.a lib/libfdt/libfdt.a 
> /usr/lib/gcc/aarch64-unknown-linux-gnu/4.9.0/libgcc.a
> lib/arm/libeabi.a
> arm/xzr-test.o: In function `check_xzr_sysreg':
> /cygdrive/d/Projects/kvm-unit-tests/arm/xzr-test.c:13: undefined reference to 
> `mmu_disable'
> --- cut ---

Have you done a git pull of your kvm-unit-tests repo lately? The patch
that introduces mmu_disable was committed a few months ago or so. Other
than your repo just not having mmu_disable(), I can't think of why
it compiles for me and not you. If you have done a recent git pull, then
maybe do a 'make distclean; ./configure; make'

Thanks,
drew

> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Andrew Jones
On Mon, Dec 07, 2015 at 12:48:12PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > FYI, I tried writing test cases for this issue with kvm-unit-tests. The
> > issue didn't reproduce for me. It's quite possible my test cases are
> > flawed
> 
>  Indeed they are, a very little thing fell through again... :)
>  It's not just SP, it's SP_EL0. And you never initialize it to anything 
> because your code always runs in kernel mode, so it's just
> zero, so you get your zero.
>  But if you add a little thing in the beginning of your main():
> 
> asm volatile("msr sp_el0, %0" : : "r" (0xDEADC0DE0BADC0DE));

Ah! Thanks for this. The mmio test does now fail for me too. The sysreg
test still doesn't fail for me (even though I'm doing the above on the
vcpu I use for that too). Maybe there's something weird with which reg
I'm using, and whether or not my attempt to get trapping enabled on it
is working the way I expected. I'll play with it some more.

> 
>  then you have it:
> --- cut ---
> [root@thunderx-2 kvm-unit-tests]# ./arm-run arm/xzr-test.flat -smp 2
> qemu-system-aarch64 -machine virt,accel=kvm:tcg,gic-version=host -cpu host 
> -device virtio-serial-device -device
> virtconsole,chardev=ctd -chardev testdev,id=ctd -display none -serial stdio 
> -kernel arm/xzr-test.flat -smp 2
> PASS: mmio: sanity check: read 0x
> FAIL: mmio: 'str wzr' check: read 0x0badc0de
> vm_setup_vq: virtqueue 0 already setup! base=0xa003e00
> chr_testdev_init: chr-testdev: can't init virtqueues
> --- cut ---
> 
>  Here I ran only the MMIO test, because I could not compile the sysreg one, so I
> simply commented it out.
> 
>  P.S. Could you also apply something like the following to arm/run:
> --- cut ---
> arm/run | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/run b/arm/run
> index 662a856..3890c8c 100755
> --- a/arm/run
> +++ b/arm/run
> @@ -33,7 +33,11 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
>   exit 2
>  fi
>  
> -M='-machine virt,accel=kvm:tcg'
> +if $qemu $M,? 2>&1 | grep gic-version > /dev/null; then
> + GIC='gic-version=host,'
> +fi
> +
> +M="-machine virt,${GIC}accel=kvm:tcg"
>  chr_testdev='-device virtio-serial-device'
>  chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
> --- cut ---

Yes, I'll send a patch for this soon. I actually have something similar to this
in my local tree already, I just hadn't bothered sending it as I didn't think
anybody else needed it yet.

> 
>  Without it qemu does not work on GICv3-only hardware, like my board, because 
> it defaults to gic-version=2. I don't post the patch
> on the mailing lists, because in order to be able to post this 5-liner I'll 
> need to go through the formal approval procedure at my
> company, and I just don't want to bother for a single small fix. :) Will do 
> as a "Reported-by:".

It'd be nice if you could go through the procedure. You've been sending
patches to KVM, and ideally we'll start trying to send kvm-unit-tests
patches along with feature patches.

Thanks,
drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Andrew Jones
On Mon, Dec 07, 2015 at 03:58:11PM -0600, Andrew Jones wrote:
> On Mon, Dec 07, 2015 at 12:48:12PM +0300, Pavel Fedin wrote:
> >  Hello!
> > 
> > > FYI, I tried writing test cases for this issue with kvm-unit-tests. The
> > > issue didn't reproduce for me. It's quite possible my test cases are
> > > flawed
> > 
> >  Indeed they are, a very little thing fell through again... :)
> >  It's not just SP, it's SP_EL0. And you never initialize it to anything 
> > because your code always runs in kernel mode, so it's just
> > zero, so you get your zero.
> >  But if you add a little thing in the beginning of your main():
> > 
> > asm volatile("msr sp_el0, %0" : : "r" (0xDEADC0DE0BADC0DE));
> 
> Ah! Thanks for this. The mmio test does now fail for me too. The sysreg
> test still doesn't fail for me (even though I'm doing the above on the
> vcpu I use for that too). Maybe there's something weird with which reg
> I'm using, and whether or not my attempt to get trapping enabled on it
> is working the way I expected. I'll play with it some more.
>

Must be the trapping thing. I switched to dbgbvr0_el1, which has
trapping enabled on it until it's touched, and was able to reproduce
the xzr issue with it.

drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-07 Thread Andrew Jones
On Mon, Dec 07, 2015 at 11:36:28AM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > FYI, I tried writing test cases for this issue with kvm-unit-tests. The
> > issue didn't reproduce for me. It's quite possible my test cases are
> > flawed, so I'm not making any claims about the validity of the series
> 
>  This is indeed very interesting, so I'll take a look at it.
>  For now I've only taken a quick glance at the code, and I have at least 
> one suggestion. Could you happen to have sp == 0 in
> check_xzr_sysreg()? In this case it will magically work.
>  Also, you could try to write a test which tries to overwrite xzr. Something 
> like:
> 
> volatile int *addr1;
> volatile int *addr2;
> 
> asm volatile("str %3, [%1]\n\t"
>  "ldr wzr, [%1]\n\t"
>  "str wzr, [%2]\n\t"
>  "ldr %0, [%2]\n\t"
>  :"=r"(res):"r"(addr1), "r"(addr2), 
> "r"(some_nonzero_val):"memory");
> 
>  Then check for res == some_nonzero_val. If they are equal, you've got the 
> bug :)
>

Besides the fixes mentioned in other mails, I did add this load to xzr
tests too. For mmio we get the expected failure. mrs seems to work
though, but maybe that's expected.

qemu-system-aarch64 -machine virt,accel=kvm -cpu host \
  -device virtio-serial-device -device virtconsole,chardev=ctd \
  -chardev testdev,id=ctd -display none -serial stdio \
  -kernel arm/xzr-test.flat -smp 2

PASS: mmio: sanity check: read 0x
FAIL: mmio: 'str wzr' check: read 0x0badc0de
FAIL: mmio: 'ldr wzr' check: read 0x0badc0de
PASS: sysreg: sp = 0x401affe0
FAIL: sysreg: from xzr check: read 0xc0de0badc0de
PASS: sysreg: to xzr check: read 0x

SUMMARY: 6 tests, 3 unexpected failures
Return value from qemu: 3

Updated test attached.

drew 
From ef5af811a72c14977e7958ee94b0c7b0fb99e6e8 Mon Sep 17 00:00:00 2001
From: Andrew Jones 
Date: Fri, 4 Dec 2015 23:55:53 +0100
Subject: [kvm-unit-tests PATCH] arm64: add xzr emulator test

---
v2:
 - added Pavel's fixes
 - changed target sysreg

 arm/xzr-test.c  | 89 +
 config/config-arm64.mak |  4 ++-
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 arm/xzr-test.c

diff --git a/arm/xzr-test.c b/arm/xzr-test.c
new file mode 100644
index 0..cf92dcc2d4e00
--- /dev/null
+++ b/arm/xzr-test.c
@@ -0,0 +1,89 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+static void check_xzr_sysreg(void)
+{
+   uint64_t val;
+
+#if 0
+   flush_tlb_all();
+   mmu_disable(); /* Tell KVM to set HCR_TVM for this VCPU */
+#endif
+
+   val = current_stack_pointer;
+   report("sysreg: sp = 0x%016lx", val != 0, val);
+
+   asm volatile("msr sp_el0, %0" : : "r" (0xdeadc0de0badc0de));
+   isb();
+
+#if 0
+   asm volatile("msr ttbr0_el1, %0" : : "r" (0x & 
PAGE_MASK));
+   isb();
+   asm volatile("mrs %0, ttbr0_el1" : "=r" (val));
+   isb();
+   report("sysreg: sanity check: read 0x%016lx", val == 
(0x & PAGE_MASK), val);
+
+   asm volatile("msr ttbr0_el1, xzr");
+   isb();
+   asm volatile("mrs %0, ttbr0_el1" : "=r" (val));
+   isb();
+   report("sysreg: xzr check: read 0x%016lx", val == 0, val);
+#endif
+   asm volatile("msr dbgbvr0_el1, xzr");
+   isb();
+   asm volatile("mrs %0, dbgbvr0_el1" : "=r" (val));
+   isb();
+   report("sysreg: from xzr check: read 0x%016lx", val == 0, val);
+   asm volatile("mrs xzr, dbgbvr0_el1");
+   isb();
+   asm volatile("mov %0, xzr" : "=r" (val));
+   report("sysreg: to xzr check: read 0x%016lx", val == 0, val);
+
+   halt();
+}
+
+static uint32_t *steal_mmio_addr(void)
+{
+   /*
+* Steal an MMIO addr from chr-testdev. Before calling exit()
+* chr-testdev must be reinit.
+*/
+   return (uint32_t *)(0x0a003e00UL /* base */ + 0x40 /* queue pfn */);
+}
+
+int main(void)
+{
+   volatile uint32_t *addr = steal_mmio_addr();
+   uint32_t val;
+   long i;
+
+   asm volatile("msr sp_el0, %0" : : "r" (0xdeadc0de0badc0de));
+   isb();
+
+   writel(0x, addr);
+   val = readl(addr);
+   report("mmio: sanity check: read 0x%08lx", val == 0x, val);
+
+   mb();
+   asm volatile("str wzr, [%0]" : : "r" (addr));
+   val = readl(addr);
+   report("mmio: 'str wzr' check: read 0x%08lx", val == 0, val);
+   mb();
+   asm volatile("ldr wzr, [%0]" : : "r" (addr));
+   report("mmio: 'ldr wzr' check: read 0x%08lx", val == 0, val);
+
+   writel(0, addr);
+   chr_testdev_init();
+
+   smp_boot_secondary(1, check_xzr_sysreg);
+   for (i = 0; i < 10; ++i)
+   cpu_relax();
+
+   return report_summary();
+}
diff --git a/config/config-arm64.mak b/config/config-arm64.mak
index d61b703c8140e..65b355175f8a0 100644
--- a/config/config-arm64.mak
+++ 

Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Mario Smarduch


On 12/7/2015 9:37 AM, Marc Zyngier wrote:
> On 07/12/15 17:18, Mario Smarduch wrote:
>>
>>
>> On 12/7/2015 8:52 AM, Marc Zyngier wrote:
>>> Hi Mario,
>>>
>>> On 07/12/15 16:40, Mario Smarduch wrote:
 Hi Marc,

 On 12/7/2015 2:53 AM, Marc Zyngier wrote:
> Implement the vgic-v3 save restore as a direct translation of
> the assembly code version.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |   1 +
>  arch/arm64/kvm/hyp/hyp.h|   3 +
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
> 
>  3 files changed, 230 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index d8d5968..d1e38ce 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -3,3 +3,4 @@
>  #
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index ac63553..5759f9f 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -32,5 +32,8 @@
>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> new file mode 100644
> index 000..78d05f3
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -0,0 +1,226 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +#define vtr_to_max_lr_idx(v) ((v) & 0xf)
> +#define vtr_to_nr_pri_bits(v)(((u32)(v) >> 29) + 1)
> +
> +#define read_gicreg(r)   
> \
> + ({  \
> + u64 reg;\
> + asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> + reg;\
> + })
> +
> +#define write_gicreg(v,r)
> \
> + do {\
> + u64 __val = (v);\
> + asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
> + } while (0)
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> + u64 val;
> + u32 max_lr_idx, nr_pri_bits;
> +
> + /*
> +  * Make sure stores to the GIC via the memory mapped interface
> +  * are now visible to the system register interface.
> +  */
> + dsb(st);
> +
> + cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> + cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> + cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> + cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
> +
> + write_gicreg(0, ICH_HCR_EL2);
> + val = read_gicreg(ICH_VTR_EL2);
> + max_lr_idx = vtr_to_max_lr_idx(val);
> + nr_pri_bits = vtr_to_nr_pri_bits(val);
> +
 Can you set up a base pointer to cpu_if->vgic_lr and use an offset?
>>>
>>> I could, but I fail to see what we'd gain by using this (aside from
>>> slightly shorter lines). Or am I completely missing the point?
>>
>> Skip adding the offset of vgic_lr to cpu_if pointer.
> 
> But if we do that, we also change the layout that EL1 expect. Assume we
> do something like this:
> 
> u64 *current_lr = cpu_if->vgic_lr;
> 
> switch (max_lr_idx) {
>   case 15:
>   current_lr++ = read_gicreg(ICH_LR15_EL2);
>   case 14:
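
Marc's layout concern can be illustrated with a small self-contained sketch. The register reads are faked, only four list registers are modeled instead of sixteen, and both function names are stand-ins — the point is that the fall-through indexed save preserves each LR's architectural slot, while the pointer-increment variant compacts values to the front of the array:

```c
#include <assert.h>
#include <stdint.h>

#define NR_LR 4	/* illustration only; the GICv3 code handles up to 16 LRs */

/* Stand-in for read_gicreg(ICH_LRn_EL2): a distinct fake value per index */
static uint64_t fake_read_lr(int n)
{
	return 0x1000 + n;
}

/* Indexed save: each live LR lands in its architectural slot, so the
 * layout EL1 expects is preserved even when max_lr_idx < NR_LR - 1. */
static void save_indexed(uint64_t *lr, int max_lr_idx)
{
	switch (max_lr_idx) {
	case 3: lr[3] = fake_read_lr(3); /* fall through */
	case 2: lr[2] = fake_read_lr(2); /* fall through */
	case 1: lr[1] = fake_read_lr(1); /* fall through */
	case 0: lr[0] = fake_read_lr(0);
	}
}

/* Pointer-increment save (the variant warned against above): values
 * are compacted to the front of the array, changing the layout. */
static void save_compacted(uint64_t *lr, int max_lr_idx)
{
	uint64_t *cur = lr;

	switch (max_lr_idx) {
	case 3: *cur++ = fake_read_lr(3); /* fall through */
	case 2: *cur++ = fake_read_lr(2); /* fall through */
	case 1: *cur++ = fake_read_lr(1); /* fall through */
	case 0: *cur++ = fake_read_lr(0);
	}
}
```

With max_lr_idx = 1, the indexed version fills slots 0 and 1 in place, while the compacted version writes LR1's value into slot 0 — a different memory layout for the same hardware state.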
>   

Re: [PATCH v3 05/22] arm64: KVM: Implement vgic-v3 save/restore

2015-12-07 Thread Marc Zyngier
On 07/12/15 18:05, Mario Smarduch wrote:
> 
> 
> On 12/7/2015 9:37 AM, Marc Zyngier wrote:
>> On 07/12/15 17:18, Mario Smarduch wrote:
>>>
>>>
>>> On 12/7/2015 8:52 AM, Marc Zyngier wrote:
 Hi Mario,

 On 07/12/15 16:40, Mario Smarduch wrote:
> Hi Marc,
>
> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>> Implement the vgic-v3 save restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 226 
>> 
>>  3 files changed, 230 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index d8d5968..d1e38ce 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -3,3 +3,4 @@
>>  #
>>  
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index ac63553..5759f9f 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -32,5 +32,8 @@
>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>  
>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>> +
>>  #endif /* __ARM64_KVM_HYP_H__ */
>>  
>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
>> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> new file mode 100644
>> index 000..78d05f3
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> @@ -0,0 +1,226 @@
>> +/*
>> + * Copyright (C) 2012-2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
>> +#define vtr_to_max_lr_idx(v)((v) & 0xf)
>> +#define vtr_to_nr_pri_bits(v)   (((u32)(v) >> 29) + 1)
>> +
>> +#define read_gicreg(r)  
>> \
>> +({  
>> \
>> +u64 reg;
>> \
>> +asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); 
>> \
>> +reg;
>> \
>> +})
>> +
>> +#define write_gicreg(v,r)   
>> \
>> +do {
>> \
>> +u64 __val = (v);
>> \
>> +asm volatile("msr_s " __stringify(r) ", %0" : : "r" 
>> (__val));\
>> +} while (0)
>> +
>> +/* vcpu is already in the HYP VA space */
>> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>> +{
>> +struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>> +u64 val;
>> +u32 max_lr_idx, nr_pri_bits;
>> +
>> +/*
>> + * Make sure stores to the GIC via the memory mapped interface
>> + * are now visible to the system register interface.
>> + */
>> +dsb(st);
>> +
>> +cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>> +cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>> +cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>> +cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>> +
>> +write_gicreg(0, ICH_HCR_EL2);
>> +val = read_gicreg(ICH_VTR_EL2);
>> +max_lr_idx = vtr_to_max_lr_idx(val);
>> +nr_pri_bits = vtr_to_nr_pri_bits(val);
>> +
> Can you set up a base pointer to cpu_if->vgic_lr and use an offset?

 I could, but I fail to see what we'd gain by using this (aside from
 

Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-07 Thread Alexander Duyck
On Mon, Dec 7, 2015 at 9:39 AM, Michael S. Tsirkin  wrote:
> On Mon, Dec 07, 2015 at 09:12:08AM -0800, Alexander Duyck wrote:
>> On Mon, Dec 7, 2015 at 7:40 AM, Lan, Tianyu  wrote:
>> > On 12/5/2015 1:07 AM, Alexander Duyck wrote:

>> > If we can't do that, we have to stop DMA for a short time to mark all dma
>> > pages dirty and then reenable it. I am not sure how much we can get by
>> > this way to track all DMA memory with device running during migration. I
>> > need to do some tests and compare results with stop DMA diretly at last
>> > stage during migration.
>>
>> We have to halt the DMA before we can complete the migration.  So
>> please feel free to test this.
>>
>> In addition I still feel you would be better off taking this in
>> smaller steps.  I still say your first step would be to come up with a
>> generic solution for the dirty page tracking like the dma_mark_clean()
>> approach I had mentioned earlier.  If I get time I might try to take
>> care of it myself later this week since you don't seem to agree with
>> that approach.
>
> Or even try to look at the dirty bit in the VT-D PTEs
> on the host. See the mail I have just sent.
> Might be slower, or might be faster, but is completely
> transparent.

I just saw it and I am looking over the VTd spec now.  It looks like
there might be some performance impacts if software is changing the
PTEs since then the VTd hardware cannot cache them.  I still have to
do some more reading though so I can fully understand the impacts.

>> >>
>> >> The question is how we would go about triggering it.  I really don't
>> >> think the PCI configuration space approach is the right idea.
>> >>  I wonder
>> >> if we couldn't get away with some sort of ACPI event instead.  We
>> >> already require ACPI support in order to shut down the system
>> >> gracefully, I wonder if we couldn't get away with something similar in
>> >> order to suspend/resume the direct assigned devices gracefully.
>> >>
>> >
>> > I don't think there is such events in the current spec.
>> > Otherwise, There are two kinds of suspend/resume callbacks.
>> > 1) System suspend/resume called during S2RAM and S2DISK.
>> > 2) Runtime suspend/resume called by pm core when device is idle.
>> > If you want to do what you mentioned, you have to change PM core and
>> > ACPI spec.
>>
>> The thought I had was to somehow try to move the direct assigned
>> devices into their own power domain and then simulate a AC power event
>> where that domain is switched off.  However I don't know if there are
>> ACPI events to support that since the power domain code currently only
>> appears to be in use for runtime power management.
>>
>> That had also given me the thought to look at something like runtime
>> power management for the VFs.  We would need to do a runtime
>> suspend/resume.  The only problem is I don't know if there is any way
>> to get the VFs to do a quick wakeup.  It might be worthwhile looking
>> at trying to check with the ACPI experts out there to see if there is
>> anything we can do as bypassing having to use the configuration space
>> mechanism to signal this would definitely be worth it.
>
> I don't much like this idea because it relies on the
> device being exactly the same across source/destination.
> After all, this is always true for suspend/resume.
> Most users do not have control over this, and you would
> often get sightly different versions of firmware,
> etc without noticing.

The original code was operating on that assumption as well.  That is
kind of why I suggested suspend/resume rather than reinventing the
wheel.

> I think we should first see how far along we can get
> by doing a full device reset, and only carrying over
> high level state such as IP, MAC, ARP cache etc.

One advantage of the suspend/resume approach is that it is compatible
with a full reset.  The suspend/resume approach assumes the device
goes through a D0->D3->D0 reset as a part of transitioning between the
system states.

I do admit though that the PCI spec says you aren't supposed to be
hot-swapping devices while the system is in a sleep state so odds are
you would encounter issues if the device changed in any significant
way.

>> >>> The dma page allocated by VF driver also needs to reserve space
>> >>> to do dummy write.
>> >>
>> >>
>> >> No, this will not work.  If for example you have a VF driver allocating
>> >> memory for a 9K receive how will that work?  It isn't as if you can poke
>> >> a hole in the contiguous memory.
>>
>> This is the bit that makes your "poke a hole" solution not portable to
>> other drivers.  I don't know if you overlooked it but for many NICs
>> jumbo frames means using large memory allocations to receive the data.
>> That is the way ixgbevf was up until about a year ago so you cannot
>> expect all the drivers that will want migration support to allow a
>> space for you to write to.  In addition some storage drivers have to
>> map an entire page, 

Re: [PATCH] VSOCK: fix returnvar.cocci warnings

2015-12-06 Thread David Miller
From: Julia Lawall 
Date: Sun, 6 Dec 2015 06:56:23 +0100 (CET)

> Remove unneeded variable used to store return value.
> 
> Generated by: scripts/coccinelle/misc/returnvar.cocci
> 
> CC: Asias He 
> Signed-off-by: Fengguang Wu 
> Signed-off-by: Julia Lawall 

Applied to net-next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VSOCK: fix returnvar.cocci warnings

2015-12-06 Thread Stefan Hajnoczi
On Sun, Dec 06, 2015 at 06:56:23AM +0100, Julia Lawall wrote:
> Remove unneeded variable used to store return value.
> 
> Generated by: scripts/coccinelle/misc/returnvar.cocci
> 
> CC: Asias He 
> Signed-off-by: Fengguang Wu 
> Signed-off-by: Julia Lawall 
> 
> ---
> 
>  vsock.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -56,8 +56,7 @@ struct vhost_vsock {
>  
>  static u32 vhost_transport_get_local_cid(void)
>  {
> - u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
> - return cid;
> + return VHOST_VSOCK_DEFAULT_HOST_CID;
>  }
>  
>  static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [v2,2/9] powerpc/smp: Add smp_muxed_ipi_set_message

2015-12-06 Thread Michael Ellerman
On Wed, 2015-25-11 at 23:44:49 UTC, "Suresh E. Warrier" wrote:
> smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which
> updates the MFFR through an ioremapped address, to cause the
> IPI. Because of this real mode callers cannot call
> smp_muxed_ipi_message_pass() for IPI messaging.

You're talking about the XICS code here but you don't mention that. Please
expand it to make it clear that you're talking about XICS.

cheers
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v2,3/9] powerpc/powernv: Add icp_native_cause_ipi_rm

2015-12-06 Thread Michael Ellerman
Subject would be better as "powerpc/xics".

On Wed, 2015-25-11 at 23:44:50 UTC, "Suresh E. Warrier" wrote:
> Function to cause an IPI. Requires kvm_hstate.xics_phys
> to be initialized with physical address of XICS.

Please expand the change log a bit, this is a bit terse.

> diff --git a/arch/powerpc/sysdev/xics/icp-native.c 
> b/arch/powerpc/sysdev/xics/icp-native.c
> index eae3265..e39b18a 100644
> --- a/arch/powerpc/sysdev/xics/icp-native.c
> +++ b/arch/powerpc/sysdev/xics/icp-native.c
> @@ -159,6 +159,25 @@ static void icp_native_cause_ipi(int cpu, unsigned long 
> data)
>   icp_native_set_qirr(cpu, IPI_PRIORITY);
>  }
>  
> +void icp_native_cause_ipi_rm(int cpu)
> +{
> + /*
> +  * Currently not used to send IPIs to another CPU
> +  * on the same core. Only caller is KVM real mode.
> +  * Need the physical address of the XICS to be
> +  * previously saved in kvm_hstate in the paca.
> +  */
> + unsigned long xics_phys;
> +
> + /*
> +  * Just like the cause_ipi functions, it is required to
> +  * include a full barrier (out8 includes a sync) before
> +  * causing the IPI.
> +  */
> + xics_phys = paca[cpu].kvm_hstate.xics_phys;
> + out_rm8((u8 *)(xics_phys + XICS_MFRR), IPI_PRIORITY);
> +}

This doesn't build without KVM:

  arch/powerpc/sysdev/xics/icp-native.c:177:23: error: 'struct paca_struct' has 
no member named 'kvm_hstate'


Probably the whole function should be #ifdef CONFIG_KVM_BOOK3S_64_HV or
something.
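
A minimal sketch of the guarded shape. The config symbol is defined by hand here (in the kernel it comes from Kconfig), the paca layout is a toy stand-in, and the XICS_MFRR offset of 0xc is an assumption based on the XICS register layout:

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: define the config symbol by hand; in the kernel
 * it would come from Kconfig. */
#define CONFIG_KVM_BOOK3S_64_HV 1

#define XICS_MFRR 0xc	/* assumed MFRR offset in the XICS layout */

struct kvm_hstate_model {
	uintptr_t xics_phys;	/* physical address of the XICS */
};

struct paca_model {
#ifdef CONFIG_KVM_BOOK3S_64_HV
	struct kvm_hstate_model kvm_hstate;	/* only present for KVM HV */
#endif
};

static struct paca_model paca[2];

#ifdef CONFIG_KVM_BOOK3S_64_HV
/* Guarded as suggested above: this function (and its kvm_hstate
 * access) is compiled out entirely when KVM HV is not configured,
 * so non-KVM builds cannot hit the missing-member error. */
static uintptr_t mfrr_addr_rm(int cpu)
{
	return paca[cpu].kvm_hstate.xics_phys + XICS_MFRR;
}
#endif
```

The build failure quoted above is exactly what the guard prevents: without CONFIG_KVM_BOOK3S_64_HV, `kvm_hstate` does not exist in the paca, so any reference to it must be conditional too.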

cheers
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] KVM: x86: track guest page access

2015-12-05 Thread Xiao Guangrong


Ping...

Paolo, any comment?



On 12/02/2015 01:00 AM, Xiao Guangrong wrote:



On 12/01/2015 06:17 PM, Paolo Bonzini wrote:



On 30/11/2015 19:26, Xiao Guangrong wrote:

This patchset introduces the feature which allows us to track page
access in guest. Currently, only write access tracking is implemented
in this version.

Four APIs are introduces:
- kvm_page_track_add_page(kvm, gfn, mode), single guest page @gfn is
   added into the track pool of the guest instance represented by @kvm,
   @mode specifies which kind of access on the @gfn is tracked

- kvm_page_track_remove_page(kvm, gfn, mode), is the opposed operation
   of kvm_page_track_add_page() which removes @gfn from the tracking pool.
   gfn is not tracked after its last user is gone

- kvm_page_track_register_notifier(kvm, n), register a notifier so that
   the event triggered by page tracking will be received, at that time,
   the callback of n->track_write() will be called

- kvm_page_track_unregister_notifier(kvm, n), does the opposed operation
   of kvm_page_track_register_notifier(), which unlinks the notifier and
   stops receiving the tracked event

The first user of page track is non-leaf shadow page tables as they are
always write protected. It also gains performance improvement because
page track speeds up page fault handler for the tracked pages. The
performance result of kernel building is as followings:

before   after
real 461.63   real 455.48
user 4529.55  user 4557.88
sys 1995.39   sys 1922.57


For KVM-GT, as far as I know Andrea Arcangeli is working on extending
userfaultfd to tracking write faults only.  Perhaps KVM-GT can do
something similar, where KVM gets the write tracking functionality for
free through the MMU notifiers.  Any thoughts on this?


Userfaultfd is excellent and has the ability to notify write event indeed,
however, it is not suitable for the use case of shadow page.

For the performance, shadow GPU is performance critical and requires
frequently being switched, it is not good to handle it in userspace. And
windows guest has many GPU tables and updates it frequently, that means,
we need to write protect huge number of pages which are single page based,
I am afraid userfaultfd can not handle this case efficiently.

For the functionality, userfaultfd can not fill the need of shadow page
because:
- the page is keeping readonly, userfaultfd can not fix the fault and let
   the vcpu progress (write access causes writeable gup).

- the access need to be emulated, however, userfaultfd/kernel does not have
   the ability to emulate the access as the access is triggered by guest, the
   instruction info is stored in VMCS so that only KVM can emulate it.

- shadow page needs to be notified after the emulation is finished as it
   should know the new data written to the page to update its page hierarchy.
   (some hardwares lack the 'retry' ability so the shadow page table need to
reflect the table in guest at any time).



Applying your technique to non-leaf shadow pages actually makes this
series quite interesting. :)  Shadow paging is still in use for nested
EPT, so it's always a good idea to speed it up.


Yes. Very glad to see you like it. :)





Re: PAX: size overflow detected in function __vhost_add_used_n drivers/vhost/vhost.c:1517

2015-12-05 Thread PaX Team
On 5 Dec 2015 at 20:51, Toralf Förster wrote:

> run into the following at a 64bit hardened stable Gentoo Linux while
> running the following command at the host (probably just the ssh login
> was it yet) : 
> 
> $ cd ~/devel/linux/; git archive --prefix linux-4.4.x/ v4.4-rc3 | (ssh 
> root@n22kvm "cd /usr/src/; sudo tar -xf-")
> 
> 
> Dec  5 20:39:26 t44 kernel: PAX: size overflow detected in function 
> __vhost_add_used_n drivers/vhost/vhost.c:1517 cicus.491_193 max, count: 7, 
> decl: last_used_idx; num: 0; context: vhost_virtqueue;

it was already reported (and is fixed):
  https://forums.grsecurity.net/viewtopic.php?f=3&t=4329



Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-04 Thread Michael S. Tsirkin
On Fri, Dec 04, 2015 at 02:42:36PM +0800, Lan, Tianyu wrote:
> 
> On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:
> >>>We hope
> >>>to find a better way to make SRIOV NIC work in these cases and this is
> >>>worth to do since SRIOV NIC provides better network performance compared
> >>>with PV NIC.
> >If this is a performance optimization as the above implies,
> >you need to include some numbers, and document how did
> >you implement the switch and how did you measure the performance.
> >
> 
> OK. Some ideas of my patches come from paper "CompSC: Live Migration with
> Pass-through Devices".
> http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf
> 
> It compared performance data between the solution of switching PV and VF and
> VF migration.(Chapter 7: Discussion)
> 

I haven't read it, but I would like to note you can't rely on research
papers.  If you propose a patch to be merged you need to measure what is
its actual effect on modern linux at the end of 2015.

> >>>Current patches have some issues. I think we can find
> >>>solution for them andimprove them step by step.
--


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-04 Thread Ard Biesheuvel
On 4 December 2015 at 02:58, Ben Hutchings  wrote:
> On Wed, 2015-12-02 at 18:41 +0100, Ard Biesheuvel wrote:
>> Hi Pavel,
>>
>> Thanks for getting to the bottom of this.
>>
>> On 1 December 2015 at 14:03, Pavel Fedin  wrote:
>> > This function takes stage-II physical addresses (A.K.A. IPA), on input, not
>> > real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> > values, depending on how much guest and host memory maps match. This
>> > results in completely broken KVM on some boards. The problem has been
>> > caught on Samsung proprietary hardware.
>> >
>> > Cc: sta...@vger.kernel.org
>> > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's 
>> > uncachedness")
>> >
>>
>> That commit is not in a release yet, so no need for cc stable
> [...]
>
> But it is cc'd to stable, so unless it is going to be nacked at review
> stage, any subsequent fixes should also be cc'd.
>

Ah yes, thanks for pointing that out.

But please, don't cc your proposed patches straight to
sta...@vger.kernel.org. I usually leave it up to the maintainer that
merges the patch to add the Cc: line to the commit log.


Re: [PATCH v2 3/4] KVM: arm64: Correctly handle zero register in system register accesses

2015-12-04 Thread Marc Zyngier
Hi Pavel,

On 04/12/15 10:25, Pavel Fedin wrote:
> System register accesses also use zero register for Rt == 31, and
> therefore using it will also result in getting SP value instead. This
> patch makes them also using new accessors, introduced by the previous
> patch. Since register value is no longer directly associated with storage
> inside vCPU context structure, we introduce a dedicated storage for it in
> struct sys_reg_params.
> 
> This refactor also gets rid of "massive hack" in kvm_handle_cp_64().
> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm64/kvm/sys_regs.c| 88 
> ++--
>  arch/arm64/kvm/sys_regs.h|  4 +-
>  arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
>  3 files changed, 46 insertions(+), 48 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index e5f024e..7c9cd64 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -97,18 +97,17 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
>  {
> - unsigned long val;
>   bool was_enabled = vcpu_has_cache_enabled(vcpu);
>  
>   BUG_ON(!p->is_write);
>  
> - val = *vcpu_reg(vcpu, p->Rt);
>   if (!p->is_aarch32) {
> - vcpu_sys_reg(vcpu, r->reg) = val;
> + vcpu_sys_reg(vcpu, r->reg) = p->regval;
>   } else {
>   if (!p->is_32bit)
> - vcpu_cp15_64_high(vcpu, r->reg) = val >> 32;
> - vcpu_cp15_64_low(vcpu, r->reg) = val & 0xffffffffUL;
> + vcpu_cp15_64_high(vcpu, r->reg) =
> + upper_32_bits(p->regval);

nit: please keep the assignment on one line. My terminal expands way
beyond 80 chars, and the checkpatch police doesn't intimidate me! ;-)

> + vcpu_cp15_64_low(vcpu, r->reg) = lower_32_bits(p->regval);
>   }
>  
>   kvm_toggle_cache(vcpu, was_enabled);
> @@ -125,13 +124,10 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
>  struct sys_reg_params *p,
>  const struct sys_reg_desc *r)
>  {
> - u64 val;
> -
>   if (!p->is_write)
>   return read_from_write_only(vcpu, p);
>  
> - val = *vcpu_reg(vcpu, p->Rt);
> - vgic_v3_dispatch_sgi(vcpu, val);
> + vgic_v3_dispatch_sgi(vcpu, p->regval);
>  
>   return true;
>  }
> @@ -153,7 +149,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
>   if (p->is_write) {
>   return ignore_write(vcpu, p);
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = (1 << 3);
> + p->regval = (1 << 3);
>   return true;
>   }
>  }
> @@ -167,7 +163,7 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
>   } else {
>   u32 val;
>   asm volatile("mrs %0, dbgauthstatus_el1" : "=r" (val));
> - *vcpu_reg(vcpu, p->Rt) = val;
> + p->regval = val;
>   return true;
>   }
>  }
> @@ -204,13 +200,13 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>   const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> - vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
> + vcpu_sys_reg(vcpu, r->reg) = p->regval;
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);
> + p->regval = vcpu_sys_reg(vcpu, r->reg);
>   }
>  
> - trace_trap_reg(__func__, r->reg, p->is_write, *vcpu_reg(vcpu, p->Rt));
> + trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
>  
>   return true;
>  }
> @@ -228,7 +224,7 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> u64 *dbg_reg)
>  {
> - u64 val = *vcpu_reg(vcpu, p->Rt);
> + u64 val = p->regval;
>  
>   if (p->is_32bit) {
>   val &= 0xffffffffUL;
> @@ -243,12 +239,9 @@ static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> u64 *dbg_reg)
>  {
> - u64 val = *dbg_reg;
> -
> + p->regval = *dbg_reg;
>   if (p->is_32bit)
> - val &= 0xffffffffUL;
> -
> - *vcpu_reg(vcpu, p->Rt) = val;
> + p->regval &= 0xffffffffUL;
>  }
>  
>  static inline bool trap_bvr(struct kvm_vcpu *vcpu,
> @@ -697,10 +690,10 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu,
>   u64 pfr = read_system_reg(SYS_ID_AA64PFR0_EL1);
>   u32 el3 = !!cpuid_feature_extract_field(pfr, 
> ID_AA64PFR0_EL3_SHIFT);
>  
> - *vcpu_reg(vcpu, p->Rt) = ((((dfr >> ID_AA64DFR0_WRPS_SHIFT) & 
> 0xf) << 28) |
> -   (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 
> 0xf) << 24) |
> -

Re: [PATCH v2 4/4] KVM: arm64: Get rid of old vcpu_reg()

2015-12-04 Thread Marc Zyngier
On 04/12/15 10:26, Pavel Fedin wrote:
> Using oldstyle vcpu_reg() accessor is proven to be inappropriate and
> unsafe on ARM64. This patch converts the rest of use cases to new
> accessors and completely removes vcpu_reg() on ARM64.
> 
> Signed-off-by: Pavel Fedin 

Reviewed-by: Marc Zyngier 

M.
-- 
Jazz is not dead. It just smells funny...


RE: [PATCH v2 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-04 Thread Pavel Fedin
 Hello!

> Thanks a lot for respining this quickly. I just had a few minor
> comments, so this is almost ready to go. If you can fix that

 Damn, the rest of reviews got stuck somewhere and arrived later, so i've just 
sent v3 without wrap fix. Will correct it.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-04 Thread Lan, Tianyu


On 12/4/2015 4:05 PM, Michael S. Tsirkin wrote:

I haven't read it, but I would like to note you can't rely on research
papers.  If you propose a patch to be merged you need to measure what is
its actual effect on modern linux at the end of 2015.


Sure. Will do that.


Re: [PATCH v2 2/4] KVM: arm64: Remove const from struct sys_reg_params

2015-12-04 Thread Marc Zyngier
Hi Pavel,

On 04/12/15 10:25, Pavel Fedin wrote:
> Further rework is going to introduce a dedicated storage for transfer
> register value in struct sys_reg_params. Before doing this we have to
> remove all 'const' modifiers from it.

I think you are being a bit overzealous here, and a few const can
legitimately be kept, see below.

> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm64/kvm/sys_regs.c| 38 
> ++--
>  arch/arm64/kvm/sys_regs.h| 12 ++--
>  arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
>  3 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 87a64e8..e5f024e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -78,7 +78,7 @@ static u32 get_ccsidr(u32 csselr)
>   * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>   */
>  static bool access_dcsw(struct kvm_vcpu *vcpu,
> - const struct sys_reg_params *p,
> + struct sys_reg_params *p,
>   const struct sys_reg_desc *r)
>  {
>   if (!p->is_write)
> @@ -94,7 +94,7 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
>   * sys_regs and leave it in complete control of the caches.
>   */
>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
> -   const struct sys_reg_params *p,
> +   struct sys_reg_params *p,
> const struct sys_reg_desc *r)
>  {
>   unsigned long val;
> @@ -122,7 +122,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>   * for both AArch64 and AArch32 accesses.
>   */
>  static bool access_gic_sgi(struct kvm_vcpu *vcpu,
> -const struct sys_reg_params *p,
> +struct sys_reg_params *p,
>  const struct sys_reg_desc *r)
>  {
>   u64 val;
> @@ -137,7 +137,7 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
>  }
>  
>  static bool trap_raz_wi(struct kvm_vcpu *vcpu,
> - const struct sys_reg_params *p,
> + struct sys_reg_params *p,
>   const struct sys_reg_desc *r)
>  {
>   if (p->is_write)
> @@ -147,7 +147,7 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
>  }
>  
>  static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
> -const struct sys_reg_params *p,
> +struct sys_reg_params *p,
>  const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> @@ -159,7 +159,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
>  }
>  
>  static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
> -const struct sys_reg_params *p,
> +struct sys_reg_params *p,
>  const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> @@ -200,7 +200,7 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
>   *   now use the debug registers.
>   */
>  static bool trap_debug_regs(struct kvm_vcpu *vcpu,
> - const struct sys_reg_params *p,
> + struct sys_reg_params *p,
>   const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> @@ -225,7 +225,7 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>   * hyp.S code switches between host and guest values in future.
>   */
>  static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
> -   const struct sys_reg_params *p,
> +   struct sys_reg_params *p,
> u64 *dbg_reg)
>  {
>   u64 val = *vcpu_reg(vcpu, p->Rt);
> @@ -240,7 +240,7 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
>  }
>  
>  static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
> -   const struct sys_reg_params *p,
> +   struct sys_reg_params *p,
> u64 *dbg_reg)
>  {
>   u64 val = *dbg_reg;
> @@ -252,7 +252,7 @@ static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
>  }
>  
>  static inline bool trap_bvr(struct kvm_vcpu *vcpu,
> - const struct sys_reg_params *p,
> + struct sys_reg_params *p,
>   const struct sys_reg_desc *rd)
>  {
>   u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg];
> @@ -294,7 +294,7 @@ static inline void reset_bvr(struct kvm_vcpu *vcpu,
>  }
>  
>  static inline bool trap_bcr(struct kvm_vcpu *vcpu,
> - const struct sys_reg_params *p,
> + struct sys_reg_params *p,
>   const struct sys_reg_desc *rd)
>  {
>   u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->reg];
> @@ -337,7 +337,7 @@ static inline void reset_bcr(struct kvm_vcpu *vcpu,
>  }
>  
>  static inline bool trap_wvr(struct kvm_vcpu *vcpu,
> -  

RE: [PATCH v2 2/4] KVM: arm64: Remove const from struct sys_reg_params

2015-12-04 Thread Pavel Fedin
 Hello!

> I think you are being a bit overzealous here, and a few const can
> legitimately be kept, see below.

 :) Yes, i've just commanded "search and replace" to the editor. Fixing...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH v2 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-04 Thread Marc Zyngier
On 04/12/15 10:25, Pavel Fedin wrote:
> ARM64 CPU has zero register which is read-only, with a value of 0.
> However, KVM currently incorrectly recognizes it being SP (because
> Rt == 31, and in struct user_pt_regs 'regs' array is followed by SP),
> resulting in invalid value being read, or even SP corruption on write.
> 
> The problem has been discovered by performing an operation
> 
>  *((volatile int *)reg) = 0;
> 
> which compiles as "str xzr, [xx]", and resulted in strange values being
> written.
> 
> v1 => v2:
> - Changed type of transfer value to u64 and store it directly in
>   struct sys_reg_params instead of a pointer
> - Use lower_32_bits()/upper_32_bits() where appropriate
> - Fixed wrong usage of 'Rt' instead of 'Rt2' in kvm_handle_cp_64(),
>   overlooked in v1
> - Do not write value back when reading

[+Christoffer]

Hi Pavel,

Thanks a lot for respining this quickly. I just had a few minor
comments, so this is almost ready to go. If you can fix that (and
assuming nobody has any further objection), we'll try to get this queued
ASAP.

Cheers,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [Qemu-devel] [PATCH v3 0/3] target-i386: add memory protection-key support

2015-12-04 Thread Eduardo Habkost
On Wed, Nov 18, 2015 at 10:20:14AM +0800, Huaitong Han wrote:
> Changes in v3:
> *Fix cpuid_7_0_ecx_feature_name error.
> 
> Changes in v2:
> *Fix memcpy error for xsave state.
> *Fix TCG_7_0_ECX_FEATURES to 0.
> *Make subjects more readable.
> 
> The protection-key feature provides an additional mechanism by which IA-32e
> paging controls access to usermode addresses.
> 
> Hardware support for protection keys for user pages is enumerated with CPUID
> feature flag CPUID.7.0.ECX[3]:PKU. Software support is CPUID.7.0.ECX[4]:OSPKE
> with the setting of CR4.PKE(bit 22).
> 
> The PKRU register is XSAVE-managed state CPUID.D.0.EAX[9], the size of XSAVE
> state component for PKRU is 8 bytes, the offset is 0xa80.
> 
> The specification of Protection Keys can be found at SDM (4.6.2, volume 3)
> http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf.

Reviewed-by: Eduardo Habkost 

The patches were squashed together and queued in x86-next branch
for 2.6.

> 
> Huaitong Han (3):
>   target-i386: add pkeys support for cpuid handling
>   target-i386: add pkeys support for xsave state handling
>   target-i386: add pkeys support for vm migration
> 
>  target-i386/cpu.c | 23 ++-
>  target-i386/cpu.h |  7 +++
>  target-i386/kvm.c |  3 +++
>  target-i386/machine.c | 23 +++
>  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> -- 
> 2.4.3
> 
> 

-- 
Eduardo


Re: [PATCH v4 1/4] KVM: arm64: Correctly handle zero register during MMIO

2015-12-04 Thread Marc Zyngier
On 04/12/15 12:03, Pavel Fedin wrote:
> On ARM64 register index of 31 corresponds to both zero register and SP.
> However, all memory access instructions, use ZR as transfer register. SP
> is used only as a base register in indirect memory addressing, or by
> register-register arithmetics, which cannot be trapped here.
> 
> Correct emulation is achieved by introducing new register accessor
> functions, which can do special handling for reg_num == 31. These new
> accessors intentionally do not rely on old vcpu_reg() on ARM64, because
> it is to be removed. Since the affected code is shared by both ARM
> flavours, implementations of these accessors are also added to ARM32 code.
> 
> This patch fixes setting MMIO register to a random value (actually SP)
> instead of zero by something like:
> 
>  *((volatile int *)reg) = 0;
> 
> compilers tend to generate "str wzr, [xx]" here
> 
> Signed-off-by: Pavel Fedin 
> Reviewed-by: Marc Zyngier 
> ---
>  arch/arm/include/asm/kvm_emulate.h   | 12 
>  arch/arm/kvm/mmio.c  |  5 +++--
>  arch/arm64/include/asm/kvm_emulate.h | 13 +
>  3 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h 
> b/arch/arm/include/asm/kvm_emulate.h
> index a9c80a2..b7ff32e 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -28,6 +28,18 @@
>  unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
>  
> +static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
> +  u8 reg_num)
> +{
> + return *vcpu_reg(vcpu, reg_num);
> +}
> +
> +static inline void vcpu_set_reg(const struct kvm_vcpu *vcpu, u8 reg_num,
> + unsigned long val)
> +{
> + *vcpu_reg(vcpu, reg_num) = val;
> +}
> +

This makes a 32bit compile scream (making these vcpu pointer const is
not a good idea).

I'll fix it locally.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 01/23] userfaultfd: linux/Documentation/vm/userfaultfd.txt

2015-12-04 Thread Michael Kerrisk (man-pages)
Hi Andrea,

On 09/11/2015 10:47 AM, Michael Kerrisk (man-pages) wrote:
> On 05/14/2015 07:30 PM, Andrea Arcangeli wrote:
>> Add documentation.
> 
> Hi Andrea,
> 
> I do not recall... Did you write a man page also for this new system call?

No response to my last mail, so I'll try again... Did you 
write any man page for this interface?

Thanks,

Michael


>> Signed-off-by: Andrea Arcangeli 
>> ---
>>  Documentation/vm/userfaultfd.txt | 140 
>> +++
>>  1 file changed, 140 insertions(+)
>>  create mode 100644 Documentation/vm/userfaultfd.txt
>>
>> diff --git a/Documentation/vm/userfaultfd.txt 
>> b/Documentation/vm/userfaultfd.txt
>> new file mode 100644
>> index 000..c2f5145
>> --- /dev/null
>> +++ b/Documentation/vm/userfaultfd.txt
>> @@ -0,0 +1,140 @@
>> += Userfaultfd =
>> +
>> +== Objective ==
>> +
>> +Userfaults allow the implementation of on-demand paging from userland
>> +and more generally they allow userland to take control of various memory
>> +page faults, something otherwise only the kernel code could do.
>> +
>> +For example userfaults allows a proper and more optimal implementation
>> +of the PROT_NONE+SIGSEGV trick.
>> +
>> +== Design ==
>> +
>> +Userfaults are delivered and resolved through the userfaultfd syscall.
>> +
>> +The userfaultfd (aside from registering and unregistering virtual
>> +memory ranges) provides two primary functionalities:
>> +
>> +1) read/POLLIN protocol to notify a userland thread of the faults
>> +   happening
>> +
>> +2) various UFFDIO_* ioctls that can manage the virtual memory regions
>> +   registered in the userfaultfd that allows userland to efficiently
>> +   resolve the userfaults it receives via 1) or to manage the virtual
>> +   memory in the background
>> +
>> +The real advantage of userfaults if compared to regular virtual memory
>> +management of mremap/mprotect is that the userfaults in all their
>> +operations never involve heavyweight structures like vmas (in fact the
>> +userfaultfd runtime load never takes the mmap_sem for writing).
>> +
>> +Vmas are not suitable for page- (or hugepage) granular fault tracking
>> +when dealing with virtual address spaces that could span
>> +Terabytes. Too many vmas would be needed for that.
>> +
>> +The userfaultfd once opened by invoking the syscall, can also be
>> +passed using unix domain sockets to a manager process, so the same
>> +manager process could handle the userfaults of a multitude of
>> +different processes without them being aware about what is going on
>> +(well of course unless they later try to use the userfaultfd
>> +themselves on the same region the manager is already tracking, which
>> +is a corner case that would currently return -EBUSY).
>> +
>> +== API ==
>> +
>> +When first opened the userfaultfd must be enabled invoking the
>> +UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or
>> +a later API version) which will specify the read/POLLIN protocol
>> +userland intends to speak on the UFFD. The UFFDIO_API ioctl if
>> +successful (i.e. if the requested uffdio_api.api is spoken also by the
>> +running kernel), will return into uffdio_api.features and
>> +uffdio_api.ioctls two 64bit bitmasks of respectively the activated
>> +feature of the read(2) protocol and the generic ioctl available.
>> +
>> +Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
>> +be invoked (if present in the returned uffdio_api.ioctls bitmask) to
>> +register a memory range in the userfaultfd by setting the
>> +uffdio_register structure accordingly. The uffdio_register.mode
>> +bitmask will specify to the kernel which kind of faults to track for
>> +the range (UFFDIO_REGISTER_MODE_MISSING would track missing
>> +pages). The UFFDIO_REGISTER ioctl will return the
>> +uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
>> +userfaults on the range registered. Not all ioctls will necessarily be
>> +supported for all memory types depending on the underlying virtual
>> +memory backend (anonymous memory vs tmpfs vs real filebacked
>> +mappings).
>> +
>> +Userland can use the uffdio_register.ioctls to manage the virtual
>> +address space in the background (to add or potentially also remove
>> +memory from the userfaultfd registered range). This means a userfault
>> +could be triggering just before userland maps in the background the
>> +user-faulted page.
>> +
>> +The primary ioctl to resolve userfaults is UFFDIO_COPY. That
>> +atomically copies a page into the userfault registered range and wakes
>> +up the blocked userfaults (unless uffdio_copy.mode &
>> +UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
>> +UFFDIO_COPY.
>> +
>> +== QEMU/KVM ==
>> +
>> +QEMU/KVM is using the userfaultfd syscall to implement postcopy live
>> +migration. Postcopy live migration is one form of memory
>> +externalization consisting of a virtual machine running with part or
>> +all of its memory residing on a different 

Re: [PATCH] kvm: Dump guest rIP when the guest tried something unsupported

2015-12-04 Thread Paolo Bonzini


On 20/11/2015 19:52, Borislav Petkov wrote:
> From: Borislav Petkov 
> 
> It looks like this in action:
> 
>   kvm [5197]: vcpu0, guest rIP: 0x810187ba unhandled rdmsr: 0xc001102
> 
> and helps to pinpoint quickly where in the guest we did the unsupported
> thing.
> 
> Signed-off-by: Borislav Petkov 
> ---
>  include/linux/kvm_host.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5706a2108f0a..597f6607c440 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -439,7 +439,8 @@ struct kvm {
>  
>  /* The guest did something we don't support. */
>  #define vcpu_unimpl(vcpu, fmt, ...)  \
> - kvm_pr_unimpl("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
> + kvm_pr_unimpl("vcpu%i, guest rIP: 0x%lx " fmt,  \
> + (vcpu)->vcpu_id, kvm_rip_read(vcpu), ## __VA_ARGS__)
>  
>  #define vcpu_debug(vcpu, fmt, ...)   \
>   kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
> 

Thanks, applying this to kvm/queue.

Paolo


Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-04 Thread Denis V. Lunev

On 12/02/2015 03:22 PM, Paolo Bonzini wrote:


On 30/11/2015 17:22, Andrey Smetanin wrote:

enum hv_message_type inside struct hv_message, hv_post_message
is not size portable. Replace enum by u32.

It's only non-portable inside structs.  Okay to apply just these:

@@ -172,7 +174,7 @@ union hv_message_flags {

  /* Define synthetic interrupt controller message header. */
  struct hv_message_header {
-   u32 message_type;
+   enum hv_message_type message_type;
u8 payload_size;
union hv_message_flags message_flags;
u8 reserved[2];
@@ -345,7 +347,7 @@ enum hv_call_code {
  struct hv_input_post_message {
union hv_connection_id connectionid;
u32 reserved;
-   u32 message_type;
+   enum hv_message_type message_type;
u32 payload_size;
u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
  };

?

Paolo

sorry for the delay.

Andrey is on vacation till the end of the week.

This could be not enough for some compilers: whether this exact
enum is signed or unsigned depends upon the implementation of the
compiler, and if it is signed we can face signed/unsigned
comparisons in ifs.

Vitaly, by the way, this code is a bit rotten. The only caller of
hv_post_message calls it with message type exactly written
as "1", which is not defined in the enum.

/*
 * vmbus_post_msg - Send a msg on the vmbus's message connection
 */
int vmbus_post_msg(void *buffer, size_t buflen)
{
...
ret = hv_post_message(conn_id, 1, buffer, buflen);

Den


Re: [PATCH] VSOCK: mark virtio_transport.ko experimental

2015-12-04 Thread Michael S. Tsirkin
On Fri, Dec 04, 2015 at 11:49:18AM +0800, Stefan Hajnoczi wrote:
> Be explicit that the virtio_transport.ko code implements a draft virtio
> specification that is still subject to change.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
> If you'd rather wait until the device specification has been finalized, feel
> free to revert the virtio-vsock code for now.  Apologies for not mentioning 
> the
> status in the Kconfig earlier.
> 
>  net/vmw_vsock/Kconfig | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
> index 74e0bc8..d8be850 100644
> --- a/net/vmw_vsock/Kconfig
> +++ b/net/vmw_vsock/Kconfig
> @@ -28,12 +28,17 @@ config VMWARE_VMCI_VSOCKETS
> will be called vmw_vsock_vmci_transport. If unsure, say N.
>  
>  config VIRTIO_VSOCKETS
> - tristate "virtio transport for Virtual Sockets"
> + tristate "virtio transport for Virtual Sockets (Experimental)"
>   depends on VSOCKETS && VIRTIO
>   select VIRTIO_VSOCKETS_COMMON
> + default n
>   help
> This module implements a virtio transport for Virtual Sockets.
>  
> +   This feature is based on a draft of the virtio-vsock device
> +   specification that is still subject to change.  It can be used
> +   to begin developing applications that use Virtual Sockets.
> +
> Enable this transport if your Virtual Machine runs on Qemu/KVM.
>  
> To compile this driver as a module, choose M here: the module

I'm pretty sure this alone is not enough.  I think depending on an entry
under drivers/staging is necessary. The issue is userspace depending on
the interface, not kernel code itself being unstable.  We can create
drivers/staging/virtio for this purpose, even if it's just to hold the
Kconfig entry.  And I'd rather add STAGING within the Kconfig names too,
so people enabling it don't get a surprise when their userspace stops
working.

But yes, revert would be cleaner and easier than all this temporary
work.

If you agree, could you send a patch to do one of these two things pls?

> -- 
> 2.5.0


Re: [PATCH v4 2/4] KVM: arm64: Remove const from struct sys_reg_params

2015-12-04 Thread Marc Zyngier
On 04/12/15 12:03, Pavel Fedin wrote:
> Further rework is going to introduce a dedicated storage for transfer
> register value in struct sys_reg_params. Before doing this we have to
> remove 'const' modifiers from it in all accessor functions and their
> callers.
> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm64/kvm/sys_regs.c| 36 ++--
>  arch/arm64/kvm/sys_regs.h|  4 ++--
>  arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
>  3 files changed, 21 insertions(+), 21 deletions(-)

Reviewed-by: Marc Zyngier 

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-04 Thread Paolo Bonzini


On 04/12/2015 15:33, Denis V. Lunev wrote:
> On 12/02/2015 03:22 PM, Paolo Bonzini wrote:
>>
>> On 30/11/2015 17:22, Andrey Smetanin wrote:
>>> enum hv_message_type inside struct hv_message, hv_post_message
>>> is not size portable. Replace enum by u32.
>> It's only non-portable inside structs.  Okay to apply just these:
>>
>> @@ -172,7 +174,7 @@ union hv_message_flags {
>>
>>   /* Define synthetic interrupt controller message header. */
>>   struct hv_message_header {
>> -u32 message_type;
>> +enum hv_message_type message_type;
>>   u8 payload_size;
>>   union hv_message_flags message_flags;
>>   u8 reserved[2];
>> @@ -345,7 +347,7 @@ enum hv_call_code {
>>   struct hv_input_post_message {
>>   union hv_connection_id connectionid;
>>   u32 reserved;
>> -u32 message_type;
>> +enum hv_message_type message_type;
>>   u32 payload_size;
>>   u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
>>   };
>>
>> ?
>>
>> Paolo
> sorry for the delay.
> 
> Andrey is on vacation till the end of the week.
> 
> This may not be enough for some compilers, as this exact
> enum could be signed or unsigned depending on the
> implementation of the compiler, and if it is signed we
> can face signed/unsigned comparisons in ifs.

But why is that a problem?  The issue is pre-existing anyway; the only
one that can cause bugs when moving code to uapi/ (i.e. which means it
can be used on non-x86 platforms) is the size of the enum, not the
signedness.

Paolo

> Vitaly, by the way, this code is a bit rotten. The only caller of
> hv_post_message calls it with message type exactly written
> as "1", which is not defined in the enum.
> 
> /*
>  * vmbus_post_msg - Send a msg on the vmbus's message connection
>  */
> int vmbus_post_msg(void *buffer, size_t buflen)
> {
> ...
> ret = hv_post_message(conn_id, 1, buffer, buflen);
> 
> Den


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-04 Thread Alexander Duyck

On 12/04/2015 08:32 AM, Lan, Tianyu wrote:
> Hi Michael & Alexander:
> Thanks a lot for your comments and suggestions.
>
> We still need to support Windows guests for migration, and this is why our
> patches keep all changes in the driver, since it's impossible to change
> the Windows kernel.

That is a poor argument.  I highly doubt Microsoft is interested in 
having to modify all of the drivers that will support direct assignment 
in order to support migration.  They would likely request something 
similar to what I have in that they will want a way to do DMA tracking 
with minimal modification required to the drivers.

> Following is my idea for doing DMA tracking.
>
> Inject an event to the VF driver after the memory iterate stage
> and before stopping the VCPU, and then the VF driver marks dirty all
> memory in use for DMA. Newly allocated pages also need to
> be marked dirty before stopping the VCPU. All memory dirtied
> in this time slot will be migrated until the stop-and-copy
> stage. We also need to make sure to disable the VF, by clearing the
> bus master enable bit for the VF, before migrating this memory.

The ordering of your explanation here doesn't quite work.  What needs to 
happen is that you have to disable DMA and then mark the pages as dirty. 
What the disabling of the BME does is signal to the hypervisor that 
the device is now stopped.  The ixgbevf_suspend call already supported 
by the driver is almost exactly what is needed to take care of something 
like this.

The question is how we would go about triggering it.  I really don't 
think the PCI configuration space approach is the right idea.  I wonder 
if we couldn't get away with some sort of ACPI event instead.  We 
already require ACPI support in order to shut down the system 
gracefully; I wonder if we couldn't get away with something similar in 
order to suspend/resume the direct assigned devices gracefully.

> The DMA pages allocated by the VF driver also need to reserve space
> to do the dummy write.

No, this will not work.  If for example you have a VF driver allocating 
memory for a 9K receive how will that work?  It isn't as if you can poke 
a hole in the contiguous memory.




Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-04 Thread Paolo Bonzini


On 04/12/2015 17:55, Denis V. Lunev wrote:
> On 12/04/2015 05:41 PM, Paolo Bonzini wrote:
>>
>> On 04/12/2015 15:33, Denis V. Lunev wrote:
>>> On 12/02/2015 03:22 PM, Paolo Bonzini wrote:
 On 30/11/2015 17:22, Andrey Smetanin wrote:
> enum hv_message_type inside struct hv_message, hv_post_message
> is not size portable. Replace enum by u32.
 It's only non-portable inside structs.  Okay to apply just these:

 @@ -172,7 +174,7 @@ union hv_message_flags {

/* Define synthetic interrupt controller message header. */
struct hv_message_header {
 -u32 message_type;
 +enum hv_message_type message_type;
u8 payload_size;
union hv_message_flags message_flags;
u8 reserved[2];
 @@ -345,7 +347,7 @@ enum hv_call_code {
struct hv_input_post_message {
union hv_connection_id connectionid;
u32 reserved;
 -u32 message_type;
 +enum hv_message_type message_type;
u32 payload_size;
u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
};

 ?

 Paolo
>>> sorry for the delay.
>>>
>>> Andrey is on vacation till the end of the week.
>>>
>>> This may not be enough for some compilers, as this exact
>>> enum could be signed or unsigned depending on the
>>> implementation of the compiler, and if it is signed we
>>> can face signed/unsigned comparisons in ifs.
>> But why is that a problem?  The issue is pre-existing anyway; the only
>> one that can cause bugs when moving code to uapi/ (i.e. which means it
>> can be used on non-x86 platforms) is the size of the enum, not the
>> signedness.
>
> we are now comparing an enum with an enum, which are the same type.
> With the change you are proposing we will compare an enum
> with a u32, which are different.

This is only an issue in C++.

> Original suggestion from Andrey was safe in this respect.

Sure, but it makes code less clear.

Paolo


Re: [GIT PULL] KVM/ARM fixes for 4.4-rc4

2015-12-04 Thread Paolo Bonzini


On 04/12/2015 18:17, Marc Zyngier wrote:
> Hi Paolo,
> 
> This pull request contains a number of fixes for 4.4-rc4 (or -rc5 if
> we already missed the boat).
> 
> The first part is a very nice catch from Pavel, who noticed that we
> were not dealing very well (if at all) with the aliasing between two
> registers.
> 
> The last patch from Ard is a fix for a fix that was merged in
> -rc3. Hopefully we got it right this time.
> 
> Please pull!
> 
>M.
> 
> The following changes since commit fbb4574ce9a37e15a9872860bf202f2be5bdf6c4:
> 
>   arm64: kvm: report original PAR_EL1 upon panic (2015-11-24 18:20:58 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvm-arm-for-v4.4-rc4
> 
> for you to fetch changes up to 0de58f852875a0f0dcfb120bb8433e4e73c7803b:
> 
>   ARM/arm64: KVM: correct PTE uncachedness check (2015-12-04 16:30:17 +)
> 
> 
> KVM/ARM fixes for v4.4-rc4
> 
> - A series of fixes to deal with the aliasing between the sp and xzr register
> - A fix for the cache flush fix that went in -rc3
> 
> 
> Ard Biesheuvel (1):
>   ARM/arm64: KVM: correct PTE uncachedness check
> 
> Pavel Fedin (4):
>   arm64: KVM: Correctly handle zero register during MMIO
>   arm64: KVM: Remove const from struct sys_reg_params
>   arm64: KVM: Correctly handle zero register in system register accesses
>   arm64: KVM: Get rid of old vcpu_reg()
> 
>  arch/arm/include/asm/kvm_emulate.h   |  12 
>  arch/arm/kvm/mmio.c  |   5 +-
>  arch/arm/kvm/mmu.c   |   4 +-
>  arch/arm/kvm/psci.c  |  20 +++---
>  arch/arm64/include/asm/kvm_emulate.h |  18 +++--
>  arch/arm64/kvm/handle_exit.c |   2 +-
>  arch/arm64/kvm/sys_regs.c| 123 +--
>  arch/arm64/kvm/sys_regs.h|   8 +--
>  arch/arm64/kvm/sys_regs_generic_v8.c |   4 +-
>  9 files changed, 107 insertions(+), 89 deletions(-)
> 

Pulled, thanks.

Paolo


Re: [PATCH 01/23] userfaultfd: linux/Documentation/vm/userfaultfd.txt

2015-12-04 Thread Andrea Arcangeli
Hello Michael,

On Fri, Dec 04, 2015 at 04:50:03PM +0100, Michael Kerrisk (man-pages) wrote:
> Hi Andrea,
> 
> On 09/11/2015 10:47 AM, Michael Kerrisk (man-pages) wrote:
> > On 05/14/2015 07:30 PM, Andrea Arcangeli wrote:
> >> Add documentation.
> > 
> > Hi Andrea,
> > 
> > I do not recall... Did you write a man page also for this new system call?
> 
> No response to my last mail, so I'll try again... Did you 
> write any man page for this interface?

I wish I could answer with the manpage itself to give a more
satisfactory answer, but the answer is still no at this time. Right now
there's the write-protection tracking feature posted to linux-mm, and
I'm currently reviewing that. It's worth documenting that part too in
the manpage, as it's going to happen sooner rather than later.

The lack of a manpage so far didn't prevent userland from using it (qemu
postcopy is already in upstream qemu and it depends on userfaultfd),
nor review of the code, nor other kernel contributors from extending the
syscall API. Other users have started testing the syscall too. This is just
to explain why unfortunately the manpage didn't get the top priority
yet; nevertheless the manpage should happen too, and it's
important. Advice on how to proceed is welcome.

Thanks,
Andrea


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-04 Thread Lan, Tianyu

Hi Michael & Alexander:
Thanks a lot for your comments and suggestions.

We still need to support Windows guests for migration, and this is why our 
patches keep all changes in the driver, since it's impossible to change 
the Windows kernel.


Following is my idea for doing DMA tracking.

Inject an event to the VF driver after the memory iterate stage
and before stopping the VCPU, and then the VF driver marks dirty all
memory in use for DMA. Newly allocated pages also need to
be marked dirty before stopping the VCPU. All memory dirtied
in this time slot will be migrated until the stop-and-copy
stage. We also need to make sure to disable the VF, by clearing the
bus master enable bit for the VF, before migrating this memory.

The DMA pages allocated by the VF driver also need to reserve space
to do the dummy write.


On 12/2/2015 7:44 PM, Michael S. Tsirkin wrote:

On Tue, Dec 01, 2015 at 10:36:33AM -0800, Alexander Duyck wrote:

On Tue, Dec 1, 2015 at 9:37 AM, Michael S. Tsirkin  wrote:

On Tue, Dec 01, 2015 at 09:04:32AM -0800, Alexander Duyck wrote:

On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin  wrote:



There are several components to this:
- dma_map_* needs to prevent page from
   being migrated while device is running.
   For example, expose some kind of bitmap from guest
   to host, set bit there while page is mapped.
   What happens if we stop the guest and some
   bits are still set? See dma_alloc_coherent below
   for some ideas.


Yeah, I could see something like this working.  Maybe we could do
something like what was done for the NX bit and make use of the upper
order bits beyond the limits of the memory range to mark pages as
non-migratable?

I'm curious.  What we have with a DMA mapped region is essentially
shared memory between the guest and the device.  How would we resolve
something like this with IVSHMEM, or are we blocked there as well in
terms of migration?


I have some ideas. Will post later.


I look forward to it.


- dma_unmap_* needs to mark page as dirty
   This can be done by writing into a page.

- dma_sync_* needs to mark page as dirty
   This is trickier as we can not change the data.
   One solution is using atomics.
   For example:
 int x = ACCESS_ONCE(*p);
 cmpxchg(p, x, x);
   Seems to do a write without changing page
   contents.


Like I said we can probably kill 2 birds with one stone by just
implementing our own dma_mark_clean() for x86 virtualized
environments.

I'd say we could take your solution one step further and just use 0
instead of bothering to read the value.  After all it won't write the
area if the value at the offset is not 0.


Really almost any atomic that has no side effect will do.
atomic or with 0
atomic and with 

It's just that cmpxchg already happens to have a portable
wrapper.


I was originally thinking maybe an atomic_add with 0 would be the way
to go.


cmpxchg with any value too.


  Either way though we still are using a locked prefix and
having to dirty a cache line per page which is going to come at some
cost.


I agree. It's likely not necessary for everyone
to be doing this: only people that both
run within the VM and want migration to work
need to do this logging.

So set some module option to have driver tell hypervisor that it
supports logging.  If bus mastering is enabled before this, migration is
blocked.  Or even pass some flag from hypervisor so
driver can detect it needs to log writes.
I guess this could be put in device config somewhere,
though in practice it's a global thing, not a per device one, so
maybe we need some new channel to
pass this flag to guest. CPUID?
Or maybe we can put some kind of agent in the initrd
and use the existing guest agent channel after all.
agent in initrd could open up a lot of new possibilities.



- dma_alloc_coherent memory (e.g. device rings)
   must be migrated after device stopped modifying it.
   Just stopping the VCPU is not enough:
   you must make sure device is not changing it.

   Or maybe the device has some kind of ring flush operation,
   if there was a reasonably portable way to do this
   (e.g. a flush capability could maybe be added to SRIOV)
   then hypervisor could do this.


This is where things start to get messy. I was suggesting the
suspend/resume to resolve this bit, but it might be possible to also
deal with this via something like this via clearing the bus master
enable bit for the VF.  If I am not mistaken that should disable MSI-X
interrupts and halt any DMA.  That should work as long as you have
some mechanism that is tracking the pages in use for DMA.


A bigger issue is recovering afterwards.


Agreed.


   In case you need to resume on source, you
   really need to follow the same path
   as on destination, preferably detecting
   device reset and restoring the device
   state.


The problem with detecting the reset is that you would likely have to
be polling to do something like that.


We could some event to guest to notify it about this event
through a new or 

Re: [Qemu-devel] [PATCH v8 0/5] implement vNVDIMM

2015-12-04 Thread Vladimir Sementsov-Ogievskiy

On 16.11.2015 13:50, Xiao Guangrong wrote:

NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform.


Hi.

One question: does this mean that your qemu-emulated nvdimm/pmem
solution will work only on an Intel host?



--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.



Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-04 Thread Denis V. Lunev

On 12/04/2015 05:41 PM, Paolo Bonzini wrote:


On 04/12/2015 15:33, Denis V. Lunev wrote:

On 12/02/2015 03:22 PM, Paolo Bonzini wrote:

On 30/11/2015 17:22, Andrey Smetanin wrote:

enum hv_message_type inside struct hv_message, hv_post_message
is not size portable. Replace enum by u32.

It's only non-portable inside structs.  Okay to apply just these:

@@ -172,7 +174,7 @@ union hv_message_flags {

   /* Define synthetic interrupt controller message header. */
   struct hv_message_header {
-u32 message_type;
+enum hv_message_type message_type;
   u8 payload_size;
   union hv_message_flags message_flags;
   u8 reserved[2];
@@ -345,7 +347,7 @@ enum hv_call_code {
   struct hv_input_post_message {
   union hv_connection_id connectionid;
   u32 reserved;
-u32 message_type;
+enum hv_message_type message_type;
   u32 payload_size;
   u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
   };

?

Paolo

sorry for the delay.

Andrey is on vacation till the end of the week.

This may not be enough for some compilers, as this exact
enum could be signed or unsigned depending on the
implementation of the compiler, and if it is signed we
can face signed/unsigned comparisons in ifs.

But why is that a problem?  The issue is pre-existing anyway; the only
one that can cause bugs when moving code to uapi/ (i.e. which means it
can be used on non-x86 platforms) is the size of the enum, not the
signedness.

Paolo

we are now comparing an enum with an enum, which are the same type.
With the change you are proposing we will compare an enum
with a u32, which are different.

Original suggestion from Andrey was safe in this respect.

Den


Re: [PATCH v2] vhost: replace % with & on data path

2015-12-04 Thread Venkatesh Srinivas
On Mon, Nov 30, 2015 at 11:15:23AM +0200, Michael S. Tsirkin wrote:
> We know vring num is a power of 2, so use &
> to mask the high bits.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---

The generated code switches from DIV to masking; the source is clearer as well.

Tested-by: Venkatesh Srinivas 

-- vs;


Re: [PATCH v4 0/4] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-04 Thread Andrew Jones
On Fri, Dec 04, 2015 at 03:03:10PM +0300, Pavel Fedin wrote:
> ARM64 CPU has zero register which is read-only, with a value of 0.
> However, KVM currently incorrectly recognizes it being SP (because
> Rt == 31, and in struct user_pt_regs 'regs' array is followed by SP),
> resulting in invalid value being read, or even SP corruption on write.
> 
> The problem has been discovered by performing an operation
> 
>  *((volatile int *)reg) = 0;
> 
> which compiles as "str xzr, [xx]", and resulted in strange values being
> written.
> 
> v3 => v4:
> - Unwrapped assignment in patch 0003
> 
> v2 => v3:
> - Brought back some const modifiers in unaffected functions
> 
> v1 => v2:
> - Changed type of transfer value to u64 and store it directly in
>   struct sys_reg_params instead of a pointer
> - Use lower_32_bits()/upper_32_bits() where appropriate
> - Fixed wrong usage of 'Rt' instead of 'Rt2' in kvm_handle_cp_64(),
>   overlooked in v1
> - Do not write value back when reading
> 
> Pavel Fedin (4):
>   KVM: arm64: Correctly handle zero register during MMIO
>   KVM: arm64: Remove const from struct sys_reg_params
>   KVM: arm64: Correctly handle zero register in system register accesses
>   KVM: arm64: Get rid of old vcpu_reg()
>

FYI, I tried writing test cases for this issue with kvm-unit-tests. The
issue didn't reproduce for me. It's quite possible my test cases are
flawed, so I'm not making any claims about the validity of the series (I
also see that it has already been acked and pulled). But, if Pavel doesn't
mind trying them out on his system, then it'd be good to know if they
reproduce there. I'd like to find out if it's a test case problem or
something else strange going on with environments.

kvm-unit-tests patch attached

Thanks,
drew
From 6576833b5e45801f0226316afae7daf0936a0aee Mon Sep 17 00:00:00 2001
From: Andrew Jones 
Date: Fri, 4 Dec 2015 23:55:53 +0100
Subject: [kvm-unit-tests PATCH] arm64: add xzr emulator test

---
 arm/xzr-test.c  | 61 +
 config/config-arm64.mak |  4 +++-
 2 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 arm/xzr-test.c

diff --git a/arm/xzr-test.c b/arm/xzr-test.c
new file mode 100644
index 0..77a11461c955c
--- /dev/null
+++ b/arm/xzr-test.c
@@ -0,0 +1,61 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void check_xzr_sysreg(void)
+{
+   uint64_t val;
+
+   flush_tlb_all();
+   mmu_disable(); /* Tell KVM to set HCR_TVM for this VCPU */
+
+   asm volatile("msr ttbr0_el1, %0" : : "r" (0x & PAGE_MASK));
+   isb();
+   asm volatile("mrs %0, ttbr0_el1" : "=r" (val));
+   isb();
+   report("sysreg: sanity check: read 0x%016lx", val == (0x & PAGE_MASK), val);
+
+   asm volatile("msr ttbr0_el1, xzr");
+   isb();
+   asm volatile("mrs %0, ttbr0_el1" : "=r" (val));
+   isb();
+   report("sysreg: xzr check: read 0x%016lx", val == 0, val);
+
+   halt();
+}
+
+static uint32_t *steal_mmio_addr(void)
+{
+   /*
+* Steal an MMIO addr from chr-testdev. Before calling exit()
+* chr-testdev must be reinit.
+*/
+   return (uint32_t *)(0x0a003e00UL /* base */ + 0x40 /* queue pfn */);
+}
+
+int main(void)
+{
+   volatile uint32_t *addr = steal_mmio_addr();
+   uint32_t val;
+   long i;
+
+   writel(0x, addr);
+   val = readl(addr);
+   report("mmio: sanity check: read 0x%08lx", val == 0x, val);
+
+   mb();
+   asm volatile("str wzr, [%0]" : : "r" (addr));
+   val = readl(addr);
+   report("mmio: 'str wzr' check: read 0x%08lx", val == 0, val);
+
+   chr_testdev_init();
+
+   smp_boot_secondary(1, check_xzr_sysreg);
+   for (i = 0; i < 10; ++i)
+   cpu_relax();
+
+   return report_summary();
+}
diff --git a/config/config-arm64.mak b/config/config-arm64.mak
index d61b703c8140e..65b355175f8a0 100644
--- a/config/config-arm64.mak
+++ b/config/config-arm64.mak
@@ -12,9 +12,11 @@ cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o
 
 # arm64 specific tests
-tests =
+tests = $(TEST_DIR)/xzr-test.flat
 
 include config/config-arm-common.mak
 
 arch_clean: arm_clean
$(RM) lib/arm64/.*.d
+
+$(TEST_DIR)/xzr-test.elf: $(cstart.o) $(TEST_DIR)/xzr-test.o
-- 
1.8.3.1



Re: [Qemu-devel] [PATCH v8 0/5] implement vNVDIMM

2015-12-04 Thread Xiao Guangrong



On 12/05/2015 12:38 AM, Vladimir Sementsov-Ogievskiy wrote:

On 16.11.2015 13:50, Xiao Guangrong wrote:

NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform.


Hi.

One question: does this mean that your qemu-emulated nvdimm/pmem solution
will work only on an Intel host?


Currently it is only enabled in the x86 emulator; however, it can be easily
enabled on other platforms which support ACPI.

BTW, you also need to check the Linux kernel driver, nfit.ko, which is
currently only working on x86 IIUC:
config ACPI_NFIT
tristate "ACPI NVDIMM Firmware Interface Table (NFIT)"
depends on PHYS_ADDR_T_64BIT
depends on BLK_DEV
depends on ARCH_HAS_MMIO_FLUSH
select LIBNVDIMM
help
  Infrastructure to probe ACPI 6 compliant platforms for
  NVDIMMs (NFIT) and register a libnvdimm device tree.  In
  addition to storage devices this also enables libnvdimm to pass
  ACPI._DSM messages for platform/dimm configuration.

  To compile this driver as a module, choose M here:
  the module will be called nfit.

$ git grep ARCH_HAS_MMIO_FLUSH
arch/x86/Kconfig:   select ARCH_HAS_MMIO_FLUSH
drivers/acpi/Kconfig:   depends on ARCH_HAS_MMIO_FLUSH
lib/Kconfig:config ARCH_HAS_MMIO_FLUSH

You should check  ARCH_HAS_MMIO_FLUSH on your platform.

Thanks!

