Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-02 Thread Christoffer Dall
On Wed, Dec 02, 2015 at 10:22:04AM +, Marc Zyngier wrote:
> On 02/12/15 09:49, Shannon Zhao wrote:
> > 
> > 
> > On 2015/12/2 16:45, Marc Zyngier wrote:
> >> On 02/12/15 02:40, Shannon Zhao wrote:
> 
> 
>  On 2015/12/2 0:57, Marc Zyngier wrote:
> >> On 01/12/15 16:26, Shannon Zhao wrote:
> 
> 
>  On 2015/12/1 23:41, Marc Zyngier wrote:
> >> The reason is that when the guest clears the overflow register, it will trap
> >> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this moment,
> >> the overflow register is still overflowed (that is, some bit is still 1).
> >> So we need to use some flag to mark that we already injected this interrupt.
> >> And if during guest handling of the overflow there is a new overflow
> >> happening, the pmu->irq_pending will be set true by
> >> kvm_pmu_perf_overflow(), then it needs to inject this new interrupt, right?
> >> I don't think so. This is a level interrupt, so the level should stay
> >> high as long as the guest hasn't cleared all possible sources for that
> >> interrupt.
> >>
> >> For your example, the guest writes to PMOVSCLR to clear the overflow
> >> caused by a given counter. If the status is now 0, the interrupt line
> >> drops. If the status is still non zero, the line stays high. And I
> >> believe that writing a 1 to PMOVSSET would actually trigger an
> >> interrupt, or keep it high if it was already high.
> >>
>  Right, writing 1 to PMOVSSET will trigger an interrupt.
> 
> >> In essence, do not try to maintain side state. I've been bitten.
> 
>  So on VM entry, it checks if PMOVSSET is zero. If not, call
>  kvm_vgic_inject_irq to set the level high. If so, set the level low.
>  On VM exit, it seems there is nothing to do.
> >>
> >> It is even simpler than that:
> >>
> >> - When you get an overflow, you inject an interrupt with the level set to 1.
> >> - When the overflow register gets cleared, you inject the same interrupt
> >>   with the level set to 0.
> >>
> >> I don't think you need to do anything else, and the world switch should
> >> be left untouched.
> >>
> 
>  On 2015/7/17 23:28, Christoffer Dall wrote:
> >> +  kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> >> +                      pmu->irq_num, 1);
> >> what context is this overflow handler function?  kvm_vgic_inject_irq
> >> grabs a mutex, so it can sleep...
> >>
> >> from a quick glance at the perf core code, it looks like this is in
> >> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
> >>
> 
>  But as Christoffer said before, it's not good to call
>  kvm_vgic_inject_irq directly in interrupt context. So if we just kick
>  the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?
> >> Possibly. I'm slightly worried that inject_irq itself is going to kick
> >> the vcpu again for no good reason. 
> > Yes, this will introduce an extra kick. What's the impact of kicking a
> > kicked vcpu?
> 
> As long as you only kick yourself, it shouldn't be much (trying to
> decipher vcpu_kick).
> 

The behavior of vcpu_kick really depends on a number of things (see the
sketch after this list):

 - If you're kicking yourself, nothing happens.
 - If you're kicking a sleeping vcpu, wake it up
 - If you're kicking a running vcpu, send it a physical IPI
 - If the vcpu is not running, and not sleeping (so still runnable)
   don't do anything, just wait until it gets scheduled.
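
In code, that decision tree looks roughly like this (a simplified sketch of
kvm_vcpu_kick() from virt/kvm/kvm_main.c of this era; stats updates and
details elided):

void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
{
	int me, cpu = vcpu->cpu;
	wait_queue_head_t *wqp = kvm_arch_vcpu_wq(vcpu);

	/* sleeping vcpu: wake it up */
	if (waitqueue_active(wqp))
		wake_up_interruptible(wqp);

	/* running on another physical cpu: send it an IPI */
	me = get_cpu();
	if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
		if (kvm_arch_vcpu_should_kick(vcpu))
			smp_send_reschedule(cpu);
	put_cpu();
	/* kicking yourself, or a runnable-but-not-running vcpu: nothing to do */
}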

-Christoffer
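
The level-triggered scheme Marc describes above reduces to two calls (a
sketch only; kvm_pmu_perf_overflow() and pmu->irq_num come from the patch
under discussion, while the PMOVSSET_EL0 accessor name is an assumption):

/* in the overflow handler: raise the line */
kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, pmu->irq_num, 1);

/* when a guest write to PMOVSCLR clears the last set bit: drop it */
if (!vcpu_sys_reg(vcpu, PMOVSSET_EL0))
	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, pmu->irq_num, 0);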


Re: [PATCH v2 19/21] arm64: KVM: Turn system register numbers to an enum

2015-12-02 Thread Marc Zyngier
On 02/12/15 11:51, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:13PM +, Marc Zyngier wrote:
>> Having the system register numbers as #defines has been a pain
>> since day one, as the ordering is pretty fragile, and moving
>> things around leads to renumbering and epic conflict resolutions.
>>
>> Now that we're mostly accessing the sysreg file in C, an enum is
>> a much better type to use, and we can clean things up a bit.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/include/asm/kvm_asm.h | 76 -
>>  arch/arm64/include/asm/kvm_emulate.h |  1 -
>>  arch/arm64/include/asm/kvm_host.h| 81 +++-
>>  arch/arm64/include/asm/kvm_mmio.h|  1 -
>>  arch/arm64/kernel/asm-offsets.c  |  1 +
>>  arch/arm64/kvm/guest.c   |  1 -
>>  arch/arm64/kvm/handle_exit.c |  1 +
>>  arch/arm64/kvm/hyp/debug-sr.c|  1 +
>>  arch/arm64/kvm/hyp/entry.S   |  3 +-
>>  arch/arm64/kvm/hyp/sysreg-sr.c   |  1 +
>>  arch/arm64/kvm/sys_regs.c|  1 +
>>  virt/kvm/arm/vgic-v3.c   |  1 +
>>  12 files changed, 87 insertions(+), 82 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>> index 5e37710..52b777b 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -20,82 +20,6 @@
>>  
>>  #include 
>>  
>> -/*
>> - * 0 is reserved as an invalid value.
>> - * Order *must* be kept in sync with the hyp switch code.
>> - */
>> -#define MPIDR_EL1       1   /* MultiProcessor Affinity Register */
>> -#define CSSELR_EL1      2   /* Cache Size Selection Register */
>> -#define SCTLR_EL1       3   /* System Control Register */
>> -#define ACTLR_EL1       4   /* Auxiliary Control Register */
>> -#define CPACR_EL1       5   /* Coprocessor Access Control */
>> -#define TTBR0_EL1       6   /* Translation Table Base Register 0 */
>> -#define TTBR1_EL1       7   /* Translation Table Base Register 1 */
>> -#define TCR_EL1         8   /* Translation Control Register */
>> -#define ESR_EL1         9   /* Exception Syndrome Register */
>> -#define AFSR0_EL1       10  /* Auxilary Fault Status Register 0 */
>> -#define AFSR1_EL1       11  /* Auxilary Fault Status Register 1 */
>> -#define FAR_EL1         12  /* Fault Address Register */
>> -#define MAIR_EL1        13  /* Memory Attribute Indirection Register */
>> -#define VBAR_EL1        14  /* Vector Base Address Register */
>> -#define CONTEXTIDR_EL1  15  /* Context ID Register */
>> -#define TPIDR_EL0       16  /* Thread ID, User R/W */
>> -#define TPIDRRO_EL0     17  /* Thread ID, User R/O */
>> -#define TPIDR_EL1       18  /* Thread ID, Privileged */
>> -#define AMAIR_EL1       19  /* Aux Memory Attribute Indirection Register */
>> -#define CNTKCTL_EL1     20  /* Timer Control Register (EL1) */
>> -#define PAR_EL1         21  /* Physical Address Register */
>> -#define MDSCR_EL1       22  /* Monitor Debug System Control Register */
>> -#define MDCCINT_EL1     23  /* Monitor Debug Comms Channel Interrupt Enable Reg */
>> -
>> -/* 32bit specific registers. Keep them at the end of the range */
>> -#define DACR32_EL2      24  /* Domain Access Control Register */
>> -#define IFSR32_EL2      25  /* Instruction Fault Status Register */
>> -#define FPEXC32_EL2     26  /* Floating-Point Exception Control Register */
>> -#define DBGVCR32_EL2    27  /* Debug Vector Catch Register */
>> -#define NR_SYS_REGS     28
>> -
>> -/* 32bit mapping */
>> -#define c0_MPIDR        (MPIDR_EL1 * 2)     /* MultiProcessor ID Register */
>> -#define c0_CSSELR       (CSSELR_EL1 * 2)    /* Cache Size Selection Register */
>> -#define c1_SCTLR        (SCTLR_EL1 * 2)     /* System Control Register */
>> -#define c1_ACTLR        (ACTLR_EL1 * 2)     /* Auxiliary Control Register */
>> -#define c1_CPACR        (CPACR_EL1 * 2)     /* Coprocessor Access Control */
>> -#define c2_TTBR0        (TTBR0_EL1 * 2)     /* Translation Table Base Register 0 */
>> -#define c2_TTBR0_high   (c2_TTBR0 + 1)      /* TTBR0 top 32 bits */
>> -#define c2_TTBR1        (TTBR1_EL1 * 2)     /* Translation Table Base Register 1 */
>> -#define c2_TTBR1_high   (c2_TTBR1 + 1)      /* TTBR1 top 32 bits */
>> -#define c2_TTBCR        (TCR_EL1 * 2)       /* Translation Table Base Control R. */
>> -#define c3_DACR         (DACR32_EL2 * 2)    /* Domain Access Control Register */
>> -#define c5_DFSR         (ESR_EL1 * 2)       /* Data Fault Status Register */
>> -#define c5_IFSR         (IFSR32_EL2 * 2)    /* Instruction Fault Status Register */
>> -#define c5_ADFSR        (AFSR0_EL1 * 2)     /* Auxiliary Data Fault Status R */
>> -#define c5_AIFSR        (AFSR1_EL1 * 2)     /* Auxiliary Instr Fault Status R */
>> -#define c6_DFAR         (FAR_EL1 * 2)
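
For reference, the replacement described in the commit message turns the
table above into an enum along these lines (an abbreviated sketch; the
member order mirrors the removed defines, and the exact spelling of the
invalid-value member is an assumption):

enum vcpu_sysreg {
	__INVALID_SYSREG__,	/* 0 is reserved as an invalid value */
	MPIDR_EL1,		/* MultiProcessor Affinity Register */
	CSSELR_EL1,		/* Cache Size Selection Register */
	SCTLR_EL1,		/* System Control Register */
	/* ... the remaining EL1/EL0 and 32bit registers, in order ... */
	DBGVCR32_EL2,		/* Debug Vector Catch Register */
	NR_SYS_REGS		/* Nothing after this line! */
};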

Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-02 Thread Christoffer Dall
On Wed, Dec 02, 2015 at 01:19:22PM +, Marc Zyngier wrote:
> On 02/12/15 11:53, Christoffer Dall wrote:
> > On Wed, Dec 02, 2015 at 09:47:43AM +, Marc Zyngier wrote:
> >> On 02/12/15 09:27, Christoffer Dall wrote:
> >>> On Tue, Dec 01, 2015 at 06:51:00PM +, Marc Zyngier wrote:
>  On 01/12/15 15:39, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
> >> KVM so far relies on code patching, and is likely to use it more
> >> in the future. The main issue is that our alternative system works
> >> at the instruction level, while we'd like to have alternatives at
> >> the function level.
> >>
> >> In order to cope with this, add the "hyp_alternate_select" macro that
> >> outputs a brief sequence of code that in turn can be patched, allowing
> >> al alternative function to be selected.
> >
> > s/al/an/ ?
> >
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm64/kvm/hyp/hyp.h | 16 
> >>  1 file changed, 16 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> >> index 7ac8e11..f0427ee 100644
> >> --- a/arch/arm64/kvm/hyp/hyp.h
> >> +++ b/arch/arm64/kvm/hyp/hyp.h
> >> @@ -27,6 +27,22 @@
> >>  
> >>  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
> >>  
> >> +/*
> >> + * Generates patchable code sequences that are used to switch between
> >> + * two implementations of a function, depending on the availability of
> >> + * a feature.
> >> + */
> >
> > This looks right to me, but I'm a bit unclear what the types of this is
> > and how to use it.
> >
> > Are orig and alt function pointers and cond is a CONFIG_FOO ?  fname is
> > a symbol, which is defined as a prototype somewhere and then implemented
> > here, or?
> >
> > Perhaps a Usage: part of the docs would be helpful.
> 
>  How about:
> 
>  @fname: a symbol name that will be defined as a function returning a
>  function pointer whose type will match @orig and @alt
>  @orig: A pointer to the default function, as returned by @fname when
>  @cond doesn't hold
>  @alt: A pointer to the alternate function, as returned by @fname when
>  @cond holds
>  @cond: a CPU feature (as described in asm/cpufeature.h)
> >>>
> >>> looks good.
> >>>
> 
> >
> >> +#define hyp_alternate_select(fname, orig, alt, cond)                  \
> >> +typeof(orig) * __hyp_text fname(void)                                 \
> >> +{                                                                     \
> >> +  typeof(alt) *val = orig;                                            \
> >> +  asm volatile(ALTERNATIVE("nop   \n",                                \
> >> +                           "mov   %0, %1  \n",                        \
> >> +                           cond)                                      \
> >> +           : "+r" (val) : "r" (alt));                                 \
> >> +  return val;                                                         \
> >> +}
> >> +
> >>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> >>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> >>  
> >> -- 
> >> 2.1.4
> >>
> >
> > I haven't thought much about how all of this is implemented, but from my
> > point of views the ideal situation would be something like:
> >
> > void foo(int a, int b)
> > {
> > ALTERNATIVE_IF_NOT CONFIG_BAR
> > foo_legacy(a, b);
> > ALTERNATIVE_ELSE
> > foo_new(a, b);
> > ALTERNATIVE_END
> > }
> >
> > I realize this may be impossible because the C code could implement all
> > sort of fun stuff around the actual function calls, but would there be
> > some way to annotate the functions and find the actual branch statement
> > and change the target?
> 
>  The main issue is that C doesn't give you any access to the branch
>  function itself, except for the asm-goto statements. It also makes it
>  very hard to preserve the return type. For your idea to work, we'd need
>  some support in the compiler itself. I'm sure that it is doable, just
>  not by me! ;-)
> >>>
> >>> Not by me either, I'm just asking stupid questions - as always.
> >>
> >> I don't find that stupid. Asking that kind of stuff is useful to put
> >> things in perspective.
> >>
> > 
> > Thanks!
> > 
> 
>  This is why I've ended up creating something that returns a function
>  *pointer*, because that's something that exists in the language (no new
>  concept). I simply made sure I could return it at minimal cost.
> 
> >>>
> >>> I 

Re: [PATCH v2 16/21] arm64: KVM: Add compatibility aliases

2015-12-02 Thread Marc Zyngier
On 02/12/15 11:49, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:10PM +, Marc Zyngier wrote:
>> So far, we've implemented the new world switch with a completely
>> different namespace, so that we could have both implementation
>> compiled in.
>>
>> Let's take things one step further by adding weak aliases that
>> have the same names as the original implementation. The weak
>> attribute allows the new implementation to be overridden by the
>> old one, and everything still works.
> 
> Do I understand correctly that the whole point of this is to keep
> everything compiling nicely while at the same time being able to split
> the patches so that you can have an isolated "remove old code" patch
> that doesn't have to change the callers?

Exactly.

> If so, I think explaining this rationale would be helpful in the commit
> message in case we have to go back and track these changes in connection
> with a regression and don't remember why we did things this way.

Fair enough. I'll update the commit message (possibly by stealing a
large part of the above text!).

> Maybe I'm being over-cautious though...
> 
> Otherwise:
> 
> Acked-by: Christoffer Dall 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
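
The weak-alias trick itself is plain GCC; a minimal userspace analogue (for
illustration only, not the kernel code under discussion):

#include <stdio.h>

static void new_impl(void)
{
	printf("new world switch\n");
}

/* Weak alias under the old name: if another object file provides a strong
 * definition of do_switch(), the linker picks that one instead. */
void do_switch(void) __attribute__((weak, alias("new_impl")));

int main(void)
{
	do_switch();
	return 0;
}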


Re: [PATCH v2 12/21] arm64: KVM: Implement fpsimd save/restore

2015-12-02 Thread Marc Zyngier
On 02/12/15 11:53, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:06PM +, Marc Zyngier wrote:
>> Implement the fpsimd save restore, keeping the lazy part in
>> assembler (as returning to C would be overkill).
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile|  1 +
>>  arch/arm64/kvm/hyp/entry.S | 32 +++-
>>  arch/arm64/kvm/hyp/fpsimd.S| 33 +
>>  arch/arm64/kvm/hyp/hyp.h   |  7 +++
>>  arch/arm64/kvm/hyp/switch.c|  8 
>>  arch/arm64/kvm/hyp/sysreg-sr.c |  2 +-
>>  6 files changed, 81 insertions(+), 2 deletions(-)
>>  create mode 100644 arch/arm64/kvm/hyp/fpsimd.S
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index 9c11b0f..56238d0 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -9,3 +9,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += entry.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += switch.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index 2c4449a..7552922 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -27,6 +27,7 @@
>>  
>>  #define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
>>  #define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
>>  
>>      .text
>>      .pushsection    .hyp.text, "ax"
>> @@ -152,4 +153,33 @@ ENTRY(__guest_exit)
>>  ret
>>  ENDPROC(__guest_exit)
>>  
>> -/* Insert fault handling here */
>> +ENTRY(__fpsimd_guest_restore)
>> +    push    x4, lr
>> +
>> +mrs x2, cptr_el2
>> +bic x2, x2, #CPTR_EL2_TFP
>> +msr cptr_el2, x2
>> +isb
>> +
>> +mrs x3, tpidr_el2
>> +
>> +ldr x0, [x3, #VCPU_HOST_CONTEXT]
>> +kern_hyp_va x0
>> +add x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> +bl  __fpsimd_save_state
>> +
>> +add x2, x3, #VCPU_CONTEXT
>> +add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
>> +bl  __fpsimd_restore_state
>> +
>> +mrs x1, hcr_el2
>> +    tbnz    x1, #HCR_RW_SHIFT, 1f
> 
> nit: Add a comment along the lines of:
> // Skip restoring fpexc32 for AArch64 guests
> 
>> +ldr x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
>> +msr fpexc32_el2, x4
>> +1:
>> +pop x4, lr
>> +pop x2, x3
>> +pop x0, x1
>> +
>> +eret
>> +ENDPROC(__fpsimd_guest_restore)
>> diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
>> new file mode 100644
>> index 000..da3f22c
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/fpsimd.S
>> @@ -0,0 +1,33 @@
>> +/*
>> + * Copyright (C) 2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +
>> +#include 
>> +
>> +.text
>> +    .pushsection    .hyp.text, "ax"
>> +
>> +ENTRY(__fpsimd_save_state)
>> +fpsimd_save x0, 1
>> +ret
>> +ENDPROC(__fpsimd_save_state)
>> +
>> +ENTRY(__fpsimd_restore_state)
>> +fpsimd_restore  x0, 1
>> +ret
>> +ENDPROC(__fpsimd_restore_state)
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index f0427ee..18365dd 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -66,6 +66,13 @@ void __debug_restore_state(struct kvm_vcpu *vcpu,
>>  void __debug_cond_save_host_state(struct kvm_vcpu *vcpu);
>>  void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);
>>  
>> +void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
>> +void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
>> +static inline bool __fpsimd_enabled(void)
>> +{
>> +return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
>> +}
>> +
>>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>>  
>>  #endif /* __ARM64_KVM_HYP_H__ */
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index d67ed9e..8affc19 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -88,6 +88,7 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>>  {
>>  struct kvm_cpu_context *host_ctxt;
>>  struct kvm_cpu_context *guest_ctxt;
>> +bool fp_enabled;
>>  
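
In C terms, the lazy switch this patch implements works roughly as follows
(a simplified sketch of the flow in __guest_run(); names follow the series,
the 32-bit fpexc32 path and error handling are omitted):

/* On entry, CPTR_EL2.TFP is left set, so the guest's first FP/SIMD
 * access traps to __fpsimd_guest_restore, which swaps host FP state
 * for guest FP state and clears the trap bit. */
exit_code = __guest_enter(vcpu, host_ctxt);

/* On exit, only bother saving guest FP state if the guest used FP */
fp_enabled = __fpsimd_enabled();	/* true iff the trap fired */
if (fp_enabled) {
	__fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
	__fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
}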

Re: [PATCH v2 21/21] arm64: KVM: Remove weak attributes

2015-12-02 Thread Marc Zyngier
On 02/12/15 11:47, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:15PM +, Marc Zyngier wrote:
>> As we've now switched to the new world switch implementation,
>> remove the weak attributes, as nobody is supposed to override
>> it anymore.
> 
> Why not remove the aliases and change the callers?

This is likely to be a bigger patch, and it would affect the 32bit as
well. So far, I'm choosing to keep things the same. Another solution
would be to completely drop the aliases, and just rename the new
function to have the old names.

I don't mind either way.

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 21/21] arm64: KVM: Remove weak attributes

2015-12-02 Thread Christoffer Dall
On Wed, Dec 02, 2015 at 03:21:49PM +, Marc Zyngier wrote:
> On 02/12/15 11:47, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:50:15PM +, Marc Zyngier wrote:
> >> As we've now switched to the new world switch implementation,
> >> remove the weak attributes, as nobody is supposed to override
> >> it anymore.
> > 
> > Why not remove the aliases and change the callers?
> 
> This is likely to be a bigger patch, and it would affect the 32bit as
> well. So far, I'm choosing to keep things the same. Another solution
> would be to completely drop the aliases, and just rename the new
> function to have the old names.
> 
> I don't mind either way.
> 
I didn't think of the 32-bit side.  I think eventually we should get rid
of the aliases and just have the functions named as they are called, but
there's no rush.  We can wait until we've done the 32-bit side if
you prefer.

-Christoffer


Re: [PATCH v2 12/21] arm64: KVM: Implement fpsimd save/restore

2015-12-02 Thread Christoffer Dall
On Wed, Dec 02, 2015 at 03:29:50PM +, Marc Zyngier wrote:
> On 02/12/15 11:53, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:50:06PM +, Marc Zyngier wrote:
> >> Implement the fpsimd save restore, keeping the lazy part in
> >> assembler (as returning to C would be overkill).
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm64/kvm/hyp/Makefile|  1 +
> >>  arch/arm64/kvm/hyp/entry.S | 32 +++-
> >>  arch/arm64/kvm/hyp/fpsimd.S| 33 +
> >>  arch/arm64/kvm/hyp/hyp.h   |  7 +++
> >>  arch/arm64/kvm/hyp/switch.c|  8 
> >>  arch/arm64/kvm/hyp/sysreg-sr.c |  2 +-
> >>  6 files changed, 81 insertions(+), 2 deletions(-)
> >>  create mode 100644 arch/arm64/kvm/hyp/fpsimd.S
> >>
> >> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> >> index 9c11b0f..56238d0 100644
> >> --- a/arch/arm64/kvm/hyp/Makefile
> >> +++ b/arch/arm64/kvm/hyp/Makefile
> >> @@ -9,3 +9,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
> >>  obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
> >>  obj-$(CONFIG_KVM_ARM_HOST) += entry.o
> >>  obj-$(CONFIG_KVM_ARM_HOST) += switch.o
> >> +obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
> >> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> >> index 2c4449a..7552922 100644
> >> --- a/arch/arm64/kvm/hyp/entry.S
> >> +++ b/arch/arm64/kvm/hyp/entry.S
> >> @@ -27,6 +27,7 @@
> >>  
> >>  #define CPU_GP_REG_OFFSET(x)    (CPU_GP_REGS + x)
> >>  #define CPU_XREG_OFFSET(x)      CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> >> +#define CPU_SYSREG_OFFSET(x)    (CPU_SYSREGS + 8*x)
> >>  
> >>.text
> >>   .pushsection    .hyp.text, "ax"
> >> @@ -152,4 +153,33 @@ ENTRY(__guest_exit)
> >>ret
> >>  ENDPROC(__guest_exit)
> >>  
> >> -  /* Insert fault handling here */
> >> +ENTRY(__fpsimd_guest_restore)
> >> +  push    x4, lr
> >> +
> >> +  mrs x2, cptr_el2
> >> +  bic x2, x2, #CPTR_EL2_TFP
> >> +  msr cptr_el2, x2
> >> +  isb
> >> +
> >> +  mrs x3, tpidr_el2
> >> +
> >> +  ldr x0, [x3, #VCPU_HOST_CONTEXT]
> >> +  kern_hyp_va x0
> >> +  add x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> +  bl  __fpsimd_save_state
> >> +
> >> +  add x2, x3, #VCPU_CONTEXT
> >> +  add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> >> +  bl  __fpsimd_restore_state
> >> +
> >> +  mrs x1, hcr_el2
> >> +  tbnz    x1, #HCR_RW_SHIFT, 1f
> > 
> > nit: Add a comment along the lines of:
> > // Skip restoring fpexc32 for AArch64 guests
> > 
> >> +  ldr x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
> >> +  msr fpexc32_el2, x4
> >> +1:
> >> +  pop x4, lr
> >> +  pop x2, x3
> >> +  pop x0, x1
> >> +
> >> +  eret
> >> +ENDPROC(__fpsimd_guest_restore)
> >> diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
> >> new file mode 100644
> >> index 000..da3f22c
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/hyp/fpsimd.S
> >> @@ -0,0 +1,33 @@
> >> +/*
> >> + * Copyright (C) 2015 - ARM Ltd
> >> + * Author: Marc Zyngier 
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see .
> >> + */
> >> +
> >> +#include 
> >> +
> >> +#include 
> >> +
> >> +  .text
> >> +  .pushsection    .hyp.text, "ax"
> >> +
> >> +ENTRY(__fpsimd_save_state)
> >> +  fpsimd_save x0, 1
> >> +  ret
> >> +ENDPROC(__fpsimd_save_state)
> >> +
> >> +ENTRY(__fpsimd_restore_state)
> >> +  fpsimd_restore  x0, 1
> >> +  ret
> >> +ENDPROC(__fpsimd_restore_state)
> >> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> >> index f0427ee..18365dd 100644
> >> --- a/arch/arm64/kvm/hyp/hyp.h
> >> +++ b/arch/arm64/kvm/hyp/hyp.h
> >> @@ -66,6 +66,13 @@ void __debug_restore_state(struct kvm_vcpu *vcpu,
> >>  void __debug_cond_save_host_state(struct kvm_vcpu *vcpu);
> >>  void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);
> >>  
> >> +void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
> >> +void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
> >> +static inline bool __fpsimd_enabled(void)
> >> +{
> >> +  return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> >> +}
> >> +
> >>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context 
> >> *host_ctxt);
> >>  
> >>  #endif /* __ARM64_KVM_HYP_H__ */
> >> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> >> index 

Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-02 Thread Michael S. Tsirkin
On Wed, Dec 02, 2015 at 10:08:25PM +0800, Lan, Tianyu wrote:
> On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:
> >>But
> >>it requires the guest OS to do specific configuration inside and rely on
> >>the bonding driver, which blocks it from working on Windows.
> >> From the performance side,
> >>putting the VF and virtio NIC under a bonded interface will affect their
> >>performance even when not doing migration. These factors block use of VF
> >>NIC passthrough in some use cases (especially in the cloud) which require
> >>migration.
> >
> >That's really up to guest. You don't need to do bonding,
> >you can just move the IP and mac from userspace, that's
> >possible on most OS-es.
> >
> >Or write something in guest kernel that is more lightweight if you are
> >so inclined. What we are discussing here is the host-guest interface,
> >not the in-guest interface.
> >
> >>Current solution we proposed changes NIC driver and Qemu. Guest Os
> >>doesn't need to do special thing for migration.
> >>It's easy to deploy
> >
> >
> >Except of course these patches don't even work properly yet.
> >
> >And when they do, even minor changes in host side NIC hardware across
> >migration will break guests in hard to predict ways.
> 
> Switching between PV and VF NIC will introduce a network stop, and the
> latency of hotplugging a VF is measurable.
> For some use cases (cloud service
> and OPNFV) which are sensitive to network stability and performance,
> these are not friendly and block SRIOV NIC usage in such cases.

I find this hard to credit. hotplug is not normally a data path
operation.

> We hope
> to find a better way to make SRIOV NIC work in these cases and this is
> worth doing since SRIOV NIC provides better network performance compared
> with PV NIC.

If this is a performance optimization as the above implies,
you need to include some numbers, and document how did
you implement the switch and how did you measure the performance.

> Current patches have some issues. I think we can find
> solutions for them and improve them step by step.

-- 
MST


Re: [PATCH v2 21/21] arm64: KVM: Remove weak attributes

2015-12-02 Thread Marc Zyngier
On 02/12/15 16:21, Christoffer Dall wrote:
> On Wed, Dec 02, 2015 at 03:21:49PM +, Marc Zyngier wrote:
>> On 02/12/15 11:47, Christoffer Dall wrote:
>>> On Fri, Nov 27, 2015 at 06:50:15PM +, Marc Zyngier wrote:
 As we've now switched to the new world switch implementation,
 remove the weak attributes, as nobody is supposed to override
 it anymore.
>>>
>>> Why not remove the aliases and change the callers?
>>
>> This is likely to be a bigger patch, and it would affect the 32bit as
>> well. So far, I'm choosing to keep things the same. Another solution
>> would be to completely drop the aliases, and just rename the new
>> function to have the old names.
>>
>> I don't mind either way.
>>
> I didn't think of the 32-bit side.  I think eventually we should get rid
> of the aliases and just have the funcitons named as they are called, but
> there's no rush.  We can wait until we've the done the 32-bit side if
> you prefer.

Probably safest. We're already changing a lot of things. Hopefully for
4.6 I'll have the 32bit side sorted.

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Ard Biesheuvel
Hi Pavel,

Thanks for getting to the bottom of this.

On 1 December 2015 at 14:03, Pavel Fedin  wrote:
> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> values, depending on how much guest and host memory maps match. This
> results in completely broken KVM on some boards. The problem has been
> caught on Samsung proprietary hardware.
>
> Cc: sta...@vger.kernel.org
> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>

That commit is not in a release yet, so no need for cc stable

> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7dace90..51ad98f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
> pte = pte_offset_kernel(pmd, addr);
> do {
> -   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> +   if (!pte_none(*pte) &&
> +   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)

I think your analysis is correct, but does that not apply to both instances?
And instead of reverting, could we fix this properly instead?

> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> --
> 2.4.4
>


Re: [PATCH] KVM: arm/arm64: vgic: make vgic_io_ops static

2015-12-02 Thread Christoffer Dall
On Thu, Nov 12, 2015 at 07:59:14PM +0800, Jisheng Zhang wrote:
> vgic_io_ops is only referenced within vgic.c, so it can be declared
> static.
> 
> Signed-off-by: Jisheng Zhang 

Applied to queue,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Ard Biesheuvel
On 2 December 2015 at 19:50, Christoffer Dall
 wrote:
> On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
>> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
>> real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> values, depending on how much guest and host memory maps match. This
>> results in completely broken KVM on some boards. The problem has been
>> caught on Samsung proprietary hardware.
>>
>> Cc: sta...@vger.kernel.org
>
> cc'ing stable doesn't make sense here as the bug was introduced in
> v4.4-rc3 and we didn't release v4.4 yet...
>
>> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's 
>> uncachedness")
>>
>> Signed-off-by: Pavel Fedin 
>> ---
>>  arch/arm/kvm/mmu.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 7dace90..51ad98f 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> *pmd,
>>
>>   pte = pte_offset_kernel(pmd, addr);
>>   do {
>> - if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> + if (!pte_none(*pte) &&
>> + (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>   kvm_flush_dcache_pte(*pte);
>>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>
> You are right that there was a bug in the fix, but your fix is not the
> right one.
>
> Either we have to apply an actual mask and then compare against the value
> (yes, I know, because of the UXN bit we get lucky so far, but that's too
> brittle), or we should do a translation of the gfn to a pfn.  Is there
> anything preventing us to do the following?
>
> if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
>

Yes, that looks better. I got confused by addr being a 'phys_addr_t'
but obviously, the address inside the PTE is the one we need to test
for device-ness, so I think we should replace both instances with this

-- 
Ard.


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Christoffer Dall
On Wed, Dec 02, 2015 at 08:04:42PM +0100, Ard Biesheuvel wrote:
> On 2 December 2015 at 19:50, Christoffer Dall
>  wrote:
> > On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
> >> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> >> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> >> values, depending on how much guest and host memory maps match. This
> >> results in completely broken KVM on some boards. The problem has been
> >> caught on Samsung proprietary hardware.
> >>
> >> Cc: sta...@vger.kernel.org
> >
> > cc'ing stable doesn't make sense here as the bug was introduced in
> > v4.4-rc3 and we didn't release v4.4 yet...
> >
> >> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's 
> >> uncachedness")
> >>
> >> Signed-off-by: Pavel Fedin 
> >> ---
> >>  arch/arm/kvm/mmu.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 7dace90..51ad98f 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
> >> *pmd,
> >>
> >>   pte = pte_offset_kernel(pmd, addr);
> >>   do {
> >> - if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> >> + if (!pte_none(*pte) &&
> >> + (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> >>   kvm_flush_dcache_pte(*pte);
> >>   } while (pte++, addr += PAGE_SIZE, addr != end);
> >>  }
> >
> > You are right that there was a bug in the fix, but your fix is not the
> > right one.
> >
> > Either we have to apply an actual mask and then compare against the value
> > (yes, I know, because of the UXN bit we get lucky so far, but that's too
> > brittle), or we should do a translation of the gfn to a pfn.  Is there
> > anything preventing us to do the following?
> >
> > if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
> >
> 
> Yes, that looks better. I got confused by addr being a 'phys_addr_t'

Yeah, that's what I thought when I saw this.  Admittedly we could have a
typedef for the IPA, but oh well...

> but obviously, the address inside the PTE is the one we need to test
> for device-ness, so I think we should replace both instances with this
> 

care to send a patch by any chance?

-Christoffer


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> values, depending on how much guest and host memory maps match. This
> results in completely broken KVM on some boards. The problem has been
> caught on Samsung proprietary hardware.
> 
> Cc: sta...@vger.kernel.org

cc'ing stable doesn't make sense here as the bug was introduced in
v4.4-rc3 and we didn't release v4.4 yet...

> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7dace90..51ad98f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>  
>   pte = pte_offset_kernel(pmd, addr);
>   do {
> - if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> + if (!pte_none(*pte) &&
> + (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>   kvm_flush_dcache_pte(*pte);
>   } while (pte++, addr += PAGE_SIZE, addr != end);
>  }

You are right that there was a bug in the fix, but your fix is not the
right one.

Either we have to apply an actual mask and then compare against the value
(yes, I know, because of the UXN bit we get lucky so far, but that's too
brittle), or we should do a translation of the gfn to a pfn.  Is there
anything preventing us to do the following?

if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))

-Christoffer
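
Applied to the loop in question, Christoffer's suggestion gives (a sketch):

pte = pte_offset_kernel(pmd, addr);
do {
	/* addr is an IPA; the host pfn to test lives in the PTE itself */
	if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
		kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);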


Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-02 Thread Marc Zyngier
On 02/12/15 02:40, Shannon Zhao wrote:
> 
> 
> On 2015/12/2 0:57, Marc Zyngier wrote:
>> On 01/12/15 16:26, Shannon Zhao wrote:
>>>
>>>
>>> On 2015/12/1 23:41, Marc Zyngier wrote:
>> The reason is that when the guest clears the overflow register, it will trap
>> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this moment,
>> the overflow register is still overflowed (that is, some bit is still 1).
>> So we need to use some flag to mark that we already injected this interrupt.
>> And if during guest handling of the overflow there is a new overflow
>> happening, the pmu->irq_pending will be set true by
>> kvm_pmu_perf_overflow(), then it needs to inject this new interrupt,
>> right?
 I don't think so. This is a level interrupt, so the level should stay
 high as long as the guest hasn't cleared all possible sources for that
 interrupt.

 For your example, the guest writes to PMOVSCLR to clear the overflow
 caused by a given counter. If the status is now 0, the interrupt line
 drops. If the status is still non zero, the line stays high. And I
 believe that writing a 1 to PMOVSSET would actually trigger an
 interrupt, or keep it high if it was already high.

>>> Right, writing 1 to PMOVSSET will trigger an interrupt.
>>>
 In essence, do not try to maintain side state. I've been bitten.
>>>
>>> So on VM entry, it checks if PMOVSSET is zero. If not, call 
>>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
>>> On VM exit, it seems there is nothing to do.
>>
>> It is even simpler than that:
>>
>> - When you get an overflow, you inject an interrupt with the level set to 1.
>> - When the overflow register gets cleared, you inject the same interrupt
>> with the level set to 0.
>>
>> I don't think you need to do anything else, and the world switch should
>> be left untouched.
>>
> 
> On 2015/7/17 23:28, Christoffer Dall wrote:
>>> +  kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>>> +                      pmu->irq_num, 1);
>> what context is this overflow handler function?  kvm_vgic_inject_irq
>> grabs a mutex, so it can sleep...
>>
>> from a quick glance at the perf core code, it looks like this is in
>> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>>
> 
> But as Christoffer said before, it's not good to call
> kvm_vgic_inject_irq directly in interrupt context. So if we just kick
> the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?

Possibly. I'm slightly worried that inject_irq itself is going to kick
the vcpu again for no good reason. I guess we'll find out (and maybe
we'll add a kvm_vgic_inject_irq_no_kick_please() helper...).

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-02 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 06:51:00PM +, Marc Zyngier wrote:
> On 01/12/15 15:39, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
> >> KVM so far relies on code patching, and is likely to use it more
> >> in the future. The main issue is that our alternative system works
> >> at the instruction level, while we'd like to have alternatives at
> >> the function level.
> >>
> >> In order to cope with this, add the "hyp_alternate_select" macro that
> >> outputs a brief sequence of code that in turn can be patched, allowing
> >> al alternative function to be selected.
> > 
> > s/al/an/ ?
> > 
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm64/kvm/hyp/hyp.h | 16 
> >>  1 file changed, 16 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> >> index 7ac8e11..f0427ee 100644
> >> --- a/arch/arm64/kvm/hyp/hyp.h
> >> +++ b/arch/arm64/kvm/hyp/hyp.h
> >> @@ -27,6 +27,22 @@
> >>  
> >>  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
> >>  
> >> +/*
> >> + * Generates patchable code sequences that are used to switch between
> >> + * two implementations of a function, depending on the availability of
> >> + * a feature.
> >> + */
> > 
> > This looks right to me, but I'm a bit unclear what the types of this is
> > and how to use it.
> > 
> > Are orig and alt function pointers and cond is a CONFIG_FOO ?  fname is
> > a symbol, which is defined as a prototype somewhere and then implemented
> > here, or?
> > 
> > Perhaps a Usage: part of the docs would be helpful.
> 
> How about:
> 
> @fname: a symbol name that will be defined as a function returning a
> function pointer whose type will match @orig and @alt
> @orig: A pointer to the default function, as returned by @fname when
> @cond doesn't hold
> @alt: A pointer to the alternate function, as returned by @fname when
> @cond holds
> @cond: a CPU feature (as described in asm/cpufeature.h)

looks good.

> 
> > 
> >> +#define hyp_alternate_select(fname, orig, alt, cond)                  \
> >> +typeof(orig) * __hyp_text fname(void)                                 \
> >> +{ \
> >> +  typeof(alt) *val = orig;\
> >> +  asm volatile(ALTERNATIVE("nop   \n",\
> >> +   "mov   %0, %1  \n",\
> >> +   cond)  \
> >> +   : "+r" (val) : "r" (alt)); \
> >> +  return val; \
> >> +}
> >> +
> >>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> >>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> >>  
> >> -- 
> >> 2.1.4
> >>
> > 
> > I haven't thought much about how all of this is implemented, but from my
> > point of views the ideal situation would be something like:
> > 
> > void foo(int a, int b)
> > {
> > ALTERNATIVE_IF_NOT CONFIG_BAR
> > foo_legacy(a, b);
> > ALTERNATIVE_ELSE
> > foo_new(a, b);
> > ALTERNATIVE_END
> > }
> > 
> > I realize this may be impossible because the C code could implement all
> > sort of fun stuff around the actual function calls, but would there be
> > some way to annotate the functions and find the actual branch statement
> > and change the target?
> 
> The main issue is that C doesn't give you any access to the branch
> function itself, except for the asm-goto statements. It also makes it
> very hard to preserve the return type. For your idea to work, we'd need
> some support in the compiler itself. I'm sure that it is doable, just
> not by me! ;-)

Not by me either, I'm just asking stupid questions - as always.

> 
> This is why I've ended up creating something that returns a function
> *pointer*, because that's something that exists in the language (no new
> concept). I simply made sure I could return it at minimal cost.
> 

I don't have a problem with this either.  I'm curious though, how much
of a performance improvement (and why) we get from doing this as opposed
to a simple if-statement?

Thanks,
-Christoffer

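
For context, the selector is used elsewhere in the series roughly like this
(a sketch; the exact feature name is an assumption taken from
asm/cpufeature.h):

/* pick the GICv2 or GICv3 save routine once, via code patching */
static hyp_alternate_select(__vgic_call_save_state,
			    __vgic_v2_save_state, __vgic_v3_save_state,
			    ARM64_HAS_SYSREG_GIC_CPUIF);

/* call site: the first () returns the patched-in function pointer,
 * the second pair of parentheses calls it */
static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
{
	__vgic_call_save_state()(vcpu);
}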


RE: BUG ALERT: ARM32 KVM does not work in 4.4-rc3

2015-12-02 Thread Pavel Fedin
 Hello!

> > My project involves ARM64, but from time to time i also test ARM32
> > KVM. I have discovered that it stopped working in 4.4-rc3. The same
> > virtual machine works perfectly under current kvmarm/next, but gets
> > stuck at random point under 4.4-rc3 from linux-stable. I'm not sure
> > that i have time to investigate this quickly, but i'll post some new
> > information as soon as i get it

[skip]

> So until you bisect it to an exact commit and configuration, I declare
> the alert over. ;-)

 Just in case, to make sure you don't miss it. I have found the problem, and 
it's just good luck that it works on some machines.
Unreliably, BTW. The problem is that it verifies the guest's physical addresses 
(IPA) against the host memory map; and the fix is here:
http://www.spinics.net/lists/kvm/msg124561.html

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-02 Thread Andrew Jones
On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
> KVM so far relies on code patching, and is likely to use it more
> in the future. The main issue is that our alternative system works
> at the instruction level, while we'd like to have alternatives at
> the function level.

How about setting static-keys at hyp init time?

Thanks,
drew
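
For comparison, drew's static-key suggestion would look something like this
(an illustrative sketch only; it leaves aside whether jump-label patching
is usable from hyp text, and the feature name is an assumption):

static DEFINE_STATIC_KEY_FALSE(has_sysreg_gic);

/* at init time, once CPU features are known: */
if (cpus_have_cap(ARM64_HAS_SYSREG_GIC_CPUIF))
	static_branch_enable(&has_sysreg_gic);

/* call site: compiles to a patchable branch, not an indirect call */
if (static_branch_unlikely(&has_sysreg_gic))
	__vgic_v3_save_state(vcpu);
else
	__vgic_v2_save_state(vcpu);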


Re: [RFC PATCH V2 02/10] Qemu/VFIO: Add new VFIO_GET_PCI_CAP_INFO ioctl cmd definition

2015-12-02 Thread Alex Williamson
On Tue, 2015-11-24 at 21:35 +0800, Lan Tianyu wrote:
> Signed-off-by: Lan Tianyu 
> ---
>  linux-headers/linux/vfio.h | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 0508d0b..732b0bd 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -495,6 +495,22 @@ struct vfio_eeh_pe_op {
>  
>  #define VFIO_EEH_PE_OP   _IO(VFIO_TYPE, VFIO_BASE + 21)
>  
> +
> +#define VFIO_FIND_FREE_PCI_CONFIG_REG   _IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +#define VFIO_GET_PCI_CAP_INFO   _IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +struct vfio_pci_cap_info {
> +__u32 argsz;
> +__u32 flags;
> +#define VFIO_PCI_CAP_GET_SIZE (1 << 0)
> +#define VFIO_PCI_CAP_GET_FREE_REGION (1 << 1)
> +__u32 index;
> +__u32 offset;
> +__u32 size;
> +__u8 cap;
> +};
> +
>  /* * */
>  
>  #endif /* VFIO_H */

I didn't see a matching kernel patch series for this, but why is the
kernel more capable of doing this than userspace is already?  These seem
like pointless ioctls, we're creating a purely virtual PCI capability,
the kernel doesn't really need to participate in that.  Also, why are we
restricting ourselves to standard capabilities?  That's often a crowded
space and we can't always know whether an area is free or not based only
on it being covered by a capability.  Some capabilities can also appear
more than once, so there's context that isn't being passed to the kernel
here.  Thanks,

Alex



Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-02 Thread Alex Williamson
On Tue, 2015-11-24 at 16:50 +0300, Pavel Fedin wrote:
> On some architectures (e.g. ARM64) if the device is behind an IOMMU, and
> is being mapped by VFIO, it is necessary to also add mappings for MSI
> translation register for interrupts to work. This series implements the
> necessary API to do this, and makes use of this API for GICv3 ITS on
> ARM64.
> 
> v1 => v2:
> - Added dependency on CONFIG_GENERIC_MSI_IRQ_DOMAIN in some parts of the
>   code, should fix build without this option
> 
> Pavel Fedin (3):
>   vfio: Introduce map and unmap operations
>   gicv3, its: Introduce VFIO map and unmap operations
>   vfio: Introduce generic MSI mapping operations
> 
>  drivers/irqchip/irq-gic-v3-its.c   |  31 ++
>  drivers/vfio/pci/vfio_pci_intrs.c  |  11 
>  drivers/vfio/vfio.c| 116 +
>  drivers/vfio/vfio_iommu_type1.c|  29 ++
>  include/linux/irqchip/arm-gic-v3.h |   2 +
>  include/linux/msi.h|  12 
>  include/linux/vfio.h   |  17 +-
>  7 files changed, 217 insertions(+), 1 deletion(-)


Some good points and bad.  I like that you're making this transparent
for the user, but at the same time, directly calling function pointers
through the msi_domain_ops is quite ugly.  There needs to be a real
interface there that isn't specific to vfio.  The down side of making it
transparent to the user is that parts of their IOVA space are being
claimed and they have no way to figure out what they are.  In fact, the
IOMMU mappings bypass the rb-tree that the type1 driver uses, so these
mappings might stomp on existing mappings for the user or the user might
stomp on these.  Neither of which would be much fun to debug.

There have been previous efforts to support MSI mapping in VFIO[1,2],
but none of them have really gone anywhere.  Whatever solution we use
needs to support everyone who needs it.  Thanks,

Alex

[1] http://www.spinics.net/lists/kvm/msg121669.html, 
http://www.spinics.net/lists/kvm/msg121662.html
[2] http://www.spinics.net/lists/kvm/msg119236.html



Re: [RFC PATCH V2 09/10] Qemu/VFIO: Add SRIOV VF migration support

2015-12-02 Thread Alex Williamson
On Tue, 2015-11-24 at 21:35 +0800, Lan Tianyu wrote:
> This patch is to add SRIOV VF migration support.
> Create a new device type "vfio-sriov" and add a fake PCI migration capability
> to that device type.
> 
> The purpose of the new capability
> 1) sync migration status with VF driver in the VM
> 2) Get mailbox irq vector to notify VF driver during migration.
> 3) Provide a way to control injecting irq or not.
> 
> Qemu will migrate the PCI config space regs and MSIX config for the VF.
> It injects the mailbox irq at the last stage of migration to notify the VF
> about the migration event and waits for the VF driver to be ready for
> migration. The VF driver writes the PCI config reg PCI_VF_MIGRATION_VF_STATUS
> in the new cap table to tell Qemu.

What makes this sr-iov specific?  Why wouldn't we simply extend vfio-pci
with a migration=on feature?  Thanks,

Alex



Re: [RFC PATCH V2 06/10] Qemu/PCI: Add macros for faked PCI migration capability

2015-12-02 Thread Alex Williamson
On Tue, 2015-11-24 at 21:35 +0800, Lan Tianyu wrote:
> This patch is to extend PCI CAP id for migration cap and
> add reg macros. The CAP ID is tentative and we may find a better one if the
> solution is feasible.
> 
> *PCI_VF_MIGRATION_CAP
> For the VF driver to control whether the mailbox irq is triggered during migration.
> 
> *PCI_VF_MIGRATION_VMM_STATUS
> Qemu stores migration status in the reg
> 
> *PCI_VF_MIGRATION_VF_STATUS
> VF driver tells Qemu ready for migration
> 
> *PCI_VF_MIGRATION_IRQ
> VF driver stores mailbox interrupt vector in the reg for Qemu to trigger 
> during migration.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  include/hw/pci/pci_regs.h | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
> index 57e8c80..0dcaf7e 100644
> --- a/include/hw/pci/pci_regs.h
> +++ b/include/hw/pci/pci_regs.h
> @@ -213,6 +213,7 @@
>  #define  PCI_CAP_ID_MSIX        0x11    /* MSI-X */
>  #define  PCI_CAP_ID_SATA        0x12    /* Serial ATA */
>  #define  PCI_CAP_ID_AF          0x13    /* PCI Advanced Features */
> +#define  PCI_CAP_ID_MIGRATION   0x14
>  #define PCI_CAP_LIST_NEXT       1       /* Next capability in the list */
>  #define PCI_CAP_FLAGS           2       /* Capability defined flags (16 bits) */
>  #define PCI_CAP_SIZEOF          4
> @@ -716,4 +717,22 @@
>  #define PCI_ACS_CTRL 0x06/* ACS Control Register */
>  #define PCI_ACS_EGRESS_CTL_V 0x08/* ACS Egress Control Vector */
>  
> +/* Migration */
> +#define PCI_VF_MIGRATION_CAP         0x04
> +#define PCI_VF_MIGRATION_VMM_STATUS  0x05
> +#define PCI_VF_MIGRATION_VF_STATUS   0x06
> +#define PCI_VF_MIGRATION_IRQ         0x07
> +
> +#define PCI_VF_MIGRATION_CAP_SIZE    0x08
> +
> +#define VMM_MIGRATION_END            0x00
> +#define VMM_MIGRATION_START          0x01
> +
> +#define PCI_VF_WAIT_FOR_MIGRATION    0x00
> +#define PCI_VF_READY_FOR_MIGRATION   0x01
> +
> +#define PCI_VF_MIGRATION_DISABLE     0x00
> +#define PCI_VF_MIGRATION_ENABLE      0x01
> +
> +
>  #endif /* LINUX_PCI_REGS_H */

This will of course break if the PCI SIG defines that capability index.
Couldn't this be done within a vendor defined capability?  Thanks,

Alex



[PATCH v5 07/21] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

When we use tools like perf on the host, perf passes the event type and
the id of this event type category to the kernel, then the kernel maps
them to a hardware event number and writes this number to the PMU
PMEVTYPER_EL0 register. When getting the event number in KVM, directly
use the raw event type to create a perf_event for it.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/pmu.h |   2 +
 arch/arm64/kvm/Makefile  |   1 +
 include/kvm/arm_pmu.h|  13 +
 virt/kvm/arm/pmu.c   | 127 +++
 4 files changed, 143 insertions(+)
 create mode 100644 virt/kvm/arm/pmu.c

diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
index 4264ea0..e3cb6b3 100644
--- a/arch/arm64/include/asm/pmu.h
+++ b/arch/arm64/include/asm/pmu.h
@@ -28,6 +28,8 @@
 #define ARMV8_PMCR_D   (1 << 3) /* CCNT counts every 64th cpu cycle */
 #define ARMV8_PMCR_X   (1 << 4) /* Export to ETM */
 #define ARMV8_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
+/* Determines which PMCCNTR_EL0 bit generates an overflow */
+#define ARMV8_PMCR_LC  (1 << 6)
#define ARMV8_PMCR_N_SHIFT  11   /* Number of counters supported */
#define ARMV8_PMCR_N_MASK   0x1f
#define ARMV8_PMCR_MASK     0x3f /* Mask for writable bits */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 1949fe5..18d56d8 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -27,3 +27,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
+kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 0c13470..59d9085 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -38,4 +38,17 @@ struct kvm_pmu {
 #endif
 };
 
+#ifdef CONFIG_KVM_ARM_PMU
+unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx);
+void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
+   u32 select_idx);
+#else
+unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
+{
+   return 0;
+}
+void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
+   u32 select_idx) {}
+#endif
+
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
new file mode 100644
index 0000000..6c50003
--- /dev/null
+++ b/virt/kvm/arm/pmu.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2015 Linaro Ltd.
+ * Author: Shannon Zhao 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/perf_event.h>
+#include <asm/kvm_emulate.h>
+#include <kvm/arm_pmu.h>
+
+/**
+ * kvm_pmu_get_counter_value - get PMU counter value
+ * @vcpu: The vcpu pointer
+ * @select_idx: The counter index
+ */
+unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
+{
+   u64 counter, enabled, running;
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+   struct kvm_pmc *pmc = &pmu->pmc[select_idx];
+
+   if (!vcpu_mode_is_32bit(vcpu))
+   counter = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + select_idx);
+   else
+   counter = vcpu_cp15(vcpu, c14_PMEVCNTR0 + select_idx);
+
+   if (pmc->perf_event)
+   counter += perf_event_read_value(pmc->perf_event, &enabled,
+&running);
+
+   return counter & pmc->bitmask;
+}
+
+static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u32 select_idx)
+{
+   if (!vcpu_mode_is_32bit(vcpu))
+   return (vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E) &
+  (vcpu_sys_reg(vcpu, PMCNTENSET_EL0) >> select_idx);
+   else
+   return (vcpu_cp15(vcpu, c9_PMCR) & ARMV8_PMCR_E) &
+  (vcpu_cp15(vcpu, c9_PMCNTENSET) >> select_idx);
+}
+
+/**
+ * kvm_pmu_stop_counter - stop PMU counter
+ * @pmc: The PMU counter pointer
+ *
+ * If this counter has been configured to monitor some event, release it here.
+ */
+static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
+{
+   struct kvm_vcpu *vcpu = pmc->vcpu;
+   u64 counter;
+
+   if (pmc->perf_event) {
+   counter = kvm_pmu_get_counter_value(vcpu, 

[PATCH v5 20/21] KVM: ARM64: Free perf event of PMU when destroying vcpu

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

When KVM frees a VCPU, it needs to free the perf_events of its PMU as well.

Signed-off-by: Shannon Zhao 
---
 arch/arm/kvm/arm.c|  1 +
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 21 +
 3 files changed, 24 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index cd696ef..cea2176 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -259,6 +259,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
kvm_mmu_free_memory_caches(vcpu);
kvm_timer_vcpu_terminate(vcpu);
kvm_vgic_vcpu_destroy(vcpu);
+   kvm_pmu_vcpu_destroy(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index fe8035a..53feebb 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -40,6 +40,7 @@ struct kvm_pmu {
 
 #ifdef CONFIG_KVM_ARM_PMU
 void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu);
 void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
@@ -52,6 +53,7 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u32 data,
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
 #else
 void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
+void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) {}
 void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
 {
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 4014831..bd2fece 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -98,6 +98,27 @@ void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
 }
 
 /**
+ * kvm_pmu_vcpu_destroy - free perf event of PMU for cpu
+ * @vcpu: The vcpu pointer
+ *
+ */
+void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+   int i;
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+
+   for (i = 0; i < ARMV8_MAX_COUNTERS; i++) {
+   struct kvm_pmc *pmc = &pmu->pmc[i];
+
+   if (pmc->perf_event) {
+   perf_event_disable(pmc->perf_event);
+   perf_event_release_kernel(pmc->perf_event);
+   pmc->perf_event = NULL;
+   }
+   }
+}
+
+/**
  * kvm_pmu_flush_hwstate - flush pmu state to cpu
  * @vcpu: The vcpu pointer
  *
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 06/21] KVM: ARM64: Add reset and access handlers for PMCEID0 and PMCEID1 register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Add a reset handler which gets the host value of PMCEID0 or PMCEID1.
Since writes to PMCEID0 and PMCEID1 are ignored, add a new case for this.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1f1f6a6..b0a8d88 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -460,6 +460,19 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *r)
vcpu_sys_reg(vcpu, r->reg) = val;
 }
 
+static void reset_pmceid(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+   u64 pmceid;
+
+   if (r->reg == PMCEID0_EL0)
+   asm volatile("mrs %0, pmceid0_el0\n" : "=r" (pmceid));
+   else
+   /* PMCEID1_EL0 */
+   asm volatile("mrs %0, pmceid1_el0\n" : "=r" (pmceid));
+
+   vcpu_sys_reg(vcpu, r->reg) = pmceid;
+}
+
 /* PMU registers accessor. */
 static bool access_pmu_regs(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
@@ -477,6 +490,9 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
vcpu_sys_reg(vcpu, r->reg) = val;
break;
}
+   case PMCEID0_EL0:
+   case PMCEID1_EL0:
+   return ignore_write(vcpu, p);
default:
vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
break;
@@ -701,10 +717,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmu_regs, reset_unknown, PMSELR_EL0 },
/* PMCEID0_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
- trap_raz_wi },
+ access_pmu_regs, reset_pmceid, PMCEID0_EL0 },
/* PMCEID1_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b111),
- trap_raz_wi },
+ access_pmu_regs, reset_pmceid, PMCEID1_EL0 },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
  trap_raz_wi },
@@ -934,6 +950,9 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
vcpu_cp15(vcpu, r->reg) = val;
break;
}
+   case c9_PMCEID0:
+   case c9_PMCEID1:
+   return ignore_write(vcpu, p);
default:
vcpu_cp15(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
break;
@@ -991,8 +1010,10 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_cp15_regs,
  NULL, c9_PMSELR },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmu_cp15_regs,
+ NULL, c9_PMCEID0 },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmu_cp15_regs,
+ NULL, c9_PMCEID1 },
{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 15/21] KVM: ARM64: Add reset and access handlers for PMSWINC register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Add an access handler which emulates writing and reading the PMSWINC
register, and add support for creating the software increment event.
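
For context, this is the guest-side operation being emulated (a sketch;
per the architecture, only counters programmed with the SW_INCR event,
number 0x0, actually advance):

/* Guest: increment all counters selected by mask that count SW_INCR */
static inline void guest_pmswinc(unsigned long mask)
{
	asm volatile("msr pmswinc_el0, %0" : : "r" (mask));
}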

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 18 +-
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 44 
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index eb4fcf9..12f4806 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -567,6 +567,11 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
kvm_pmu_overflow_clear(vcpu, *vcpu_reg(vcpu, p->Rt));
break;
}
+   case PMSWINC_EL0: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_software_increment(vcpu, val);
+   break;
+   }
case PMCR_EL0: {
/* Only update writeable bits of PMCR */
val = vcpu_sys_reg(vcpu, r->reg);
@@ -602,6 +607,8 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
*vcpu_reg(vcpu, p->Rt) = val;
break;
}
+   case PMSWINC_EL0:
+   return read_zero(vcpu, p);
case PMCR_EL0: {
/* PMCR.P & PMCR.C are RAZ */
val = vcpu_sys_reg(vcpu, r->reg)
@@ -814,7 +821,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmu_regs, reset_unknown, PMOVSCLR_EL0 },
/* PMSWINC_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMSWINC_EL0 },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
  access_pmu_regs, reset_unknown, PMSELR_EL0 },
@@ -1119,6 +1126,11 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
kvm_pmu_overflow_clear(vcpu, *vcpu_reg(vcpu, p->Rt));
break;
}
+   case c9_PMSWINC: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_software_increment(vcpu, val);
+   break;
+   }
case c9_PMCR: {
/* Only update writeable bits of PMCR */
val = vcpu_cp15(vcpu, r->reg);
@@ -1154,6 +1166,8 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
*vcpu_reg(vcpu, p->Rt) = val;
break;
}
+   case c9_PMSWINC:
+   return read_zero(vcpu, p);
case c9_PMCR: {
/* PMCR.P & PMCR.C are RAZ */
val = vcpu_cp15(vcpu, r->reg)
@@ -1206,6 +1220,8 @@ static const struct sys_reg_desc cp15_regs[] = {
  NULL, c9_PMCNTENCLR },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmu_cp15_regs,
  NULL, c9_PMOVSCLR },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 4), access_pmu_cp15_regs,
+ NULL, c9_PMSWINC },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_cp15_regs,
  NULL, c9_PMSELR },
{ Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmu_cp15_regs,
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 4f3154c..a54c391 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -44,6 +44,7 @@ void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable);
 void kvm_pmu_overflow_clear(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val);
+void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
u32 select_idx);
 #else
@@ -55,6 +56,7 @@ void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val) 
{}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable) {}
 void kvm_pmu_overflow_clear(struct kvm_vcpu *vcpu, u32 val) {}
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val) {}
+void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 val) {}
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
u32 select_idx) {}
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 296b4ad..d133909 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -206,6 +206,46 @@ void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val)
 }
 
 /**
+ * kvm_pmu_software_increment - do software increment
+ * @vcpu: The vcpu pointer
+ * @val: the value guest writes to PMSWINC register
+ */
+void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 val)

[PATCH v5 17/21] KVM: ARM64: Add helper to handle PMCR register bits

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

According to the ARMv8 spec, writing 1 to PMCR.E enables all counters
that are also set in PMCNTENSET, while writing 0 to PMCR.E disables all
counters. Writing 1 to PMCR.P resets all event counters (excluding
PMCCNTR) to zero, and writing 1 to PMCR.C resets PMCCNTR to zero.
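
For reference, the PMCR_EL0 bits this helper acts on, as defined in
asm/pmu.h earlier in the series (the comments here are a recap, not new
definitions):

#define ARMV8_PMCR_E	(1 << 0) /* Enable all counters */
#define ARMV8_PMCR_P	(1 << 1) /* Reset all event counters */
#define ARMV8_PMCR_C	(1 << 2) /* Reset cycle counter */
#define ARMV8_PMCR_LC	(1 << 6) /* Cycle counter overflows at bit 63 */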

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c |  2 ++
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 51 +++
 3 files changed, 55 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9320277..405cf70 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -578,6 +578,7 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
val &= ~ARMV8_PMCR_MASK;
val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
vcpu_sys_reg(vcpu, r->reg) = val;
+   kvm_pmu_handle_pmcr(vcpu, val);
break;
}
case PMCEID0_EL0:
@@ -1219,6 +1220,7 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
val &= ~ARMV8_PMCR_MASK;
val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
vcpu_cp15(vcpu, r->reg) = val;
+   kvm_pmu_handle_pmcr(vcpu, val);
break;
}
case c9_PMCEID0:
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index a54c391..212a3de 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -47,6 +47,7 @@ void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
u32 select_idx);
+void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
 #else
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
 {
@@ -59,6 +60,7 @@ void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val) {}
 void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u32 val) {}
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
u32 select_idx) {}
+void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val) {}
 #endif
 
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index d133909..b81e35e 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -140,6 +140,57 @@ static unsigned long kvm_pmu_valid_counter_mask(struct 
kvm_vcpu *vcpu)
 }
 
 /**
+ * kvm_pmu_handle_pmcr - handle PMCR register
+ * @vcpu: The vcpu pointer
+ * @val: the value guest writes to PMCR register
+ */
+void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val)
+{
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+   struct kvm_pmc *pmc;
+   u32 enable;
+   int i;
+
+   if (val & ARMV8_PMCR_E) {
+   if (!vcpu_mode_is_32bit(vcpu))
+   enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+   else
+   enable = vcpu_cp15(vcpu, c9_PMCNTENSET);
+
+   kvm_pmu_enable_counter(vcpu, enable, true);
+   } else {
+   kvm_pmu_disable_counter(vcpu, 0xffffffffUL);
+   }
+
+   if (val & ARMV8_PMCR_C) {
+   pmc = &pmu->pmc[ARMV8_MAX_COUNTERS - 1];
+   if (pmc->perf_event)
+   local64_set(&pmc->perf_event->count, 0);
+   if (!vcpu_mode_is_32bit(vcpu))
+   vcpu_sys_reg(vcpu, PMCCNTR_EL0) = 0;
+   else
+   vcpu_cp15(vcpu, c9_PMCCNTR) = 0;
+   }
+
+   if (val & ARMV8_PMCR_P) {
+   for (i = 0; i < ARMV8_MAX_COUNTERS - 1; i++) {
+   pmc = &pmu->pmc[i];
+   if (pmc->perf_event)
+   local64_set(&pmc->perf_event->count, 0);
+   if (!vcpu_mode_is_32bit(vcpu))
+   vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = 0;
+   else
+   vcpu_cp15(vcpu, c14_PMEVCNTR0 + i) = 0;
+   }
+   }
+
+   if (val & ARMV8_PMCR_LC) {
+   pmc = &pmu->pmc[ARMV8_MAX_COUNTERS - 1];
+   pmc->bitmask = 0xffffffffffffffffUL;
+   }
+}
+
+/**
  * kvm_pmu_overflow_clear - clear PMU overflow interrupt
  * @vcpu: The vcpu pointer
  * @val: the value guest writes to PMOVSCLR register
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 02/21] KVM: ARM64: Define PMU data structure for each vcpu

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Here we plan to support a virtual PMU for the guest by full software
emulation, so define some basic structs and functions in preparation
for further steps. Define struct kvm_pmc for a performance monitor
counter and struct kvm_pmu for the per-vcpu performance monitor unit.
According to the ARMv8 spec, the PMU contains at most 32
(ARMV8_MAX_COUNTERS) counters.

Since this only supports ARM64 (or PMUv3), add a separate config symbol
for it.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/Kconfig|  8 
 include/kvm/arm_pmu.h | 41 +++
 3 files changed, 51 insertions(+)
 create mode 100644 include/kvm/arm_pmu.h

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index a35ce72..42e15bb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -37,6 +37,7 @@
 
 #include <kvm/arm_vgic.h>
 #include <asm/arch_timer.h>
+#include <kvm/arm_pmu.h>
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
@@ -132,6 +133,7 @@ struct kvm_vcpu_arch {
/* VGIC state */
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
+   struct kvm_pmu pmu;
 
/*
 * Anything that is not used directly from assembly code goes
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a5272c0..66da9a2 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
select KVM_ARM_VGIC_V3
+   select KVM_ARM_PMU
---help---
  Support hosting virtualized guest machines.
  We don't support KVM with 16K page tables yet, due to the multiple
@@ -48,6 +49,13 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
+config KVM_ARM_PMU
+   bool
+   depends on KVM_ARM_HOST && HW_PERF_EVENTS
+   ---help---
+ Adds support for a virtual Performance Monitoring Unit (PMU) in
+ virtual machines.
+
 source drivers/vhost/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
new file mode 100644
index 0000000..0c13470
--- /dev/null
+++ b/include/kvm/arm_pmu.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2015 Linaro Ltd.
+ * Author: Shannon Zhao 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#ifndef __ASM_ARM_KVM_PMU_H
+#define __ASM_ARM_KVM_PMU_H
+
+#include <linux/types.h>
+#ifdef CONFIG_KVM_ARM_PMU
+#include <asm/pmu.h>
+#endif
+
+struct kvm_pmc {
+   u8 idx; /* index into the pmu->pmc array */
+   struct perf_event *perf_event;
+   struct kvm_vcpu *vcpu;
+   u64 bitmask;
+};
+
+struct kvm_pmu {
+#ifdef CONFIG_KVM_ARM_PMU
+   /* PMU IRQ Number per VCPU */
+   int irq_num;
+   struct kvm_pmc pmc[ARMV8_MAX_COUNTERS];
+#endif
+};
+
+#endif
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 13/21] KVM: ARM64: Add reset and access handlers for PMOVSSET and PMOVSCLR register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use
reset_unknown for their reset handlers. Add a new case to emulate
writing the PMOVSSET or PMOVSCLR register.

When a non-zero value is written to PMOVSSET, make the PMU interrupt
pending. When the value written to PMOVSCLR is equal to the current
value, clear the pending PMU interrupt.
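
A guest overflow handler would typically read PMOVSCLR to find the
overflowed counters and write the same value back to clear them, e.g.
(sketch only, not part of the patch):

static void guest_pmu_overflow_handler(void)
{
	unsigned long status;

	/* which counters have overflowed? */
	asm volatile("mrs %0, pmovsclr_el0" : "=r" (status));
	/* writing 1s back clears the overflow bits */
	asm volatile("msr pmovsclr_el0, %0" : : "r" (status));
}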

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 25 +--
 include/kvm/arm_pmu.h |  4 +++
 virt/kvm/arm/pmu.c| 80 +++
 3 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a4f9177..f5e0732 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -559,6 +559,14 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= ~val;
break;
}
+   case PMOVSSET_EL0: {
+   kvm_pmu_overflow_set(vcpu, *vcpu_reg(vcpu, p->Rt));
+   break;
+   }
+   case PMOVSCLR_EL0: {
+   kvm_pmu_overflow_clear(vcpu, *vcpu_reg(vcpu, p->Rt));
+   break;
+   }
case PMCR_EL0: {
/* Only update writeable bits of PMCR */
val = vcpu_sys_reg(vcpu, r->reg);
@@ -803,7 +811,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmu_regs, reset_unknown, PMCNTENCLR_EL0 },
/* PMOVSCLR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMOVSCLR_EL0 },
/* PMSWINC_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
  trap_raz_wi },
@@ -830,7 +838,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  trap_raz_wi },
/* PMOVSSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMOVSSET_EL0 },
 
/* TPIDR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010),
@@ -1103,6 +1111,14 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
vcpu_cp15(vcpu, c9_PMINTENSET) &= ~val;
break;
}
+   case c9_PMOVSSET: {
+   kvm_pmu_overflow_set(vcpu, *vcpu_reg(vcpu, p->Rt));
+   break;
+   }
+   case c9_PMOVSCLR: {
+   kvm_pmu_overflow_clear(vcpu, *vcpu_reg(vcpu, p->Rt));
+   break;
+   }
case c9_PMCR: {
/* Only update writeable bits of PMCR */
val = vcpu_cp15(vcpu, r->reg);
@@ -1188,7 +1204,8 @@ static const struct sys_reg_desc cp15_regs[] = {
  NULL, c9_PMCNTENSET },
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmu_cp15_regs,
  NULL, c9_PMCNTENCLR },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmu_cp15_regs,
+ NULL, c9_PMOVSCLR },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_cp15_regs,
  NULL, c9_PMSELR },
{ Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmu_cp15_regs,
@@ -1206,6 +1223,8 @@ static const struct sys_reg_desc cp15_regs[] = {
  NULL, c9_PMINTENSET },
{ Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_cp15_regs,
  NULL, c9_PMINTENCLR },
+   { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmu_cp15_regs,
+ NULL, c9_PMOVSSET },
 
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index fff8f15..4f3154c 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -42,6 +42,8 @@ struct kvm_pmu {
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable);
+void kvm_pmu_overflow_clear(struct kvm_vcpu *vcpu, u32 val);
+void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u32 data,
u32 select_idx);
 #else
@@ -51,6 +53,8 @@ unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu 
*vcpu, u32 select_idx)
 }
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val) {}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable) {}
+void kvm_pmu_overflow_clear(struct kvm_vcpu *vcpu, u32 val) {}
+void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u32 val) {}
 void 

[PATCH v5 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

When calling perf_event_create_kernel_counter to create the perf_event,
assign an overflow handler. Then when the perf event overflows, call
kvm_vcpu_kick() to sync the interrupt.

Signed-off-by: Shannon Zhao 
---
 arch/arm/kvm/arm.c|  2 ++
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 52 ++-
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e06fd29..cd696ef 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -28,6 +28,7 @@
 #include <linux/sched.h>
 #include <linux/kvm.h>
 #include <trace/events/kvm.h>
+#include <kvm/arm_pmu.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -569,6 +570,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 * non-preemptible context.
 */
preempt_disable();
+   kvm_pmu_flush_hwstate(vcpu);
kvm_timer_flush_hwstate(vcpu);
kvm_vgic_flush_hwstate(vcpu);
 
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 212a3de..4cb0ade 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -39,6 +39,7 @@ struct kvm_pmu {
 };
 
 #ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable);
@@ -49,6 +50,7 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u32 data,
u32 select_idx);
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
 #else
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
 {
return 0;
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index b81e35e..34e7386 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -21,6 +21,7 @@
 #include <linux/perf_event.h>
 #include <asm/kvm_emulate.h>
 #include <kvm/arm_pmu.h>
+#include <kvm/arm_vgic.h>
 
 /**
  * kvm_pmu_get_counter_value - get PMU counter value
@@ -79,6 +80,54 @@ static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
 }
 
 /**
+ * kvm_pmu_flush_hwstate - flush pmu state to cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Inject virtual PMU IRQ if IRQ is pending for this cpu.
+ */
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+   u32 overflow;
+
+   if (pmu->irq_num == -1)
+   return;
+
+   if (!vcpu_mode_is_32bit(vcpu)) {
+   if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E))
+   return;
+
+   overflow = vcpu_sys_reg(vcpu, PMCNTENSET_EL0)
+  & vcpu_sys_reg(vcpu, PMINTENSET_EL1)
+  & vcpu_sys_reg(vcpu, PMOVSSET_EL0);
+   } else {
+   if (!(vcpu_cp15(vcpu, c9_PMCR) & ARMV8_PMCR_E))
+   return;
+
+   overflow = vcpu_cp15(vcpu, c9_PMCNTENSET)
+  & vcpu_cp15(vcpu, c9_PMINTENSET)
+  & vcpu_cp15(vcpu, c9_PMOVSSET);
+   }
+
+   kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, pmu->irq_num,
+   overflow ? 1 : 0);
+}
+
+/**
+ * When perf event overflows, call kvm_pmu_overflow_set to set overflow status.
+ */
+static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+   struct kvm_pmc *pmc = perf_event->overflow_handler_context;
+   struct kvm_vcpu *vcpu = pmc->vcpu;
+   int idx = pmc->idx;
+
+   kvm_pmu_overflow_set(vcpu, BIT(idx));
+}
+
+/**
  * kvm_pmu_enable_counter - enable selected PMU counter
  * @vcpu: The vcpu pointer
  * @val: the value guest writes to PMCNTENSET register
@@ -338,7 +387,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u32 data,
/* The initial sample period (overflow count) of an event. */
attr.sample_period = (-counter) & pmc->bitmask;
 
-   event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
+   event = perf_event_create_kernel_counter(&attr, -1, current,
+kvm_pmu_perf_overflow, pmc);
if (IS_ERR(event)) {
printk_once("kvm: pmu event creation failed %ld\n",
PTR_ERR(event));
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 21/21] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Add a new kvm device type KVM_DEV_TYPE_ARM_PMU_V3 for ARM PMU. Implement
the kvm_device_ops for it.
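
For a sense of how userspace consumes this, a minimal sketch using the
existing KVM device control API (error handling elided; the function
name is invented, while KVM_DEV_TYPE_ARM_PMU_V3 and
KVM_DEV_ARM_PMU_GRP_IRQ come from this patch):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Create the vPMU device for a VM and set its overflow PPI (sketch). */
static int vm_create_pmu(int vm_fd, int ppi)
{
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_ARM_PMU_V3 };
	struct kvm_device_attr attr = {
		.group	= KVM_DEV_ARM_PMU_GRP_IRQ,
		.addr	= (__u64)(unsigned long)&ppi,
	};

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;
	return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
}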

Signed-off-by: Shannon Zhao 
---
 Documentation/virtual/kvm/devices/arm-pmu.txt | 16 +
 arch/arm64/include/uapi/asm/kvm.h |  3 +
 include/linux/kvm_host.h  |  1 +
 include/uapi/linux/kvm.h  |  2 +
 virt/kvm/arm/pmu.c| 87 +++
 virt/kvm/kvm_main.c   |  4 ++
 6 files changed, 113 insertions(+)
 create mode 100644 Documentation/virtual/kvm/devices/arm-pmu.txt

diff --git a/Documentation/virtual/kvm/devices/arm-pmu.txt 
b/Documentation/virtual/kvm/devices/arm-pmu.txt
new file mode 100644
index 0000000..5121f1f
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/arm-pmu.txt
@@ -0,0 +1,16 @@
+ARM Virtual Performance Monitor Unit (vPMU)
+===========================================
+
+Device types supported:
+  KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3
+
+One PMU instance is instantiated per VCPU through this API.
+
+Groups:
+  KVM_DEV_ARM_PMU_GRP_IRQ
+  Attributes:
+A value describing the interrupt number of the PMU overflow interrupt. This
+interrupt should be a PPI.
+
+  Errors:
+-EINVAL: Value set is out of the expected range (from 16 to 31)
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 2d4ca4b..568afa2 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -204,6 +204,9 @@ struct kvm_arch_memory_slot {
 #define KVM_DEV_ARM_VGIC_GRP_CTRL  4
 #define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
 
+/* Device Control API: ARM PMU */
+#define KVM_DEV_ARM_PMU_GRP_IRQ 0
+
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
 #define KVM_ARM_IRQ_TYPE_MASK  0xff
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c923350..608dea6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1161,6 +1161,7 @@ extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
+extern struct kvm_device_ops kvm_arm_pmu_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 03f3618..4ba6fdd 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1032,6 +1032,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_FLIC  KVM_DEV_TYPE_FLIC
KVM_DEV_TYPE_ARM_VGIC_V3,
 #define KVM_DEV_TYPE_ARM_VGIC_V3   KVM_DEV_TYPE_ARM_VGIC_V3
+   KVM_DEV_TYPE_ARM_PMU_V3,
+#define KVM_DEV_TYPE_ARM_PMU_V3 KVM_DEV_TYPE_ARM_PMU_V3
KVM_DEV_TYPE_MAX,
 };
 
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index bd2fece..82b90e8 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -19,10 +19,13 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <linux/perf_event.h>
+#include <linux/uaccess.h>
 #include <asm/kvm_emulate.h>
 #include <kvm/arm_pmu.h>
 #include <kvm/arm_vgic.h>
 
+#include "vgic.h"
+
 /**
  * kvm_pmu_get_counter_value - get PMU counter value
  * @vcpu: The vcpu pointer
@@ -436,3 +439,87 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u32 data,
 
pmc->perf_event = event;
 }
+
+static int kvm_arm_pmu_set_irq(struct kvm *kvm, int irq)
+{
+   int j;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(j, vcpu, kvm) {
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+
+   kvm_debug("Set kvm ARM PMU irq: %d\n", irq);
+   pmu->irq_num = irq;
+   }
+
+   return 0;
+}
+
+static int kvm_arm_pmu_create(struct kvm_device *dev, u32 type)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+   struct kvm *kvm = dev->kvm;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+
+   memset(pmu, 0, sizeof(*pmu));
+   kvm_pmu_vcpu_reset(vcpu);
+   pmu->irq_num = -1;
+   }
+
+   return 0;
+}
+
+static void kvm_arm_pmu_destroy(struct kvm_device *dev)
+{
+   kfree(dev);
+}
+
+static int kvm_arm_pmu_set_attr(struct kvm_device *dev,
+   struct kvm_device_attr *attr)
+{
+   switch (attr->group) {
+   case KVM_DEV_ARM_PMU_GRP_IRQ: {
+   int __user *uaddr = (int __user *)(long)attr->addr;
+   int reg;
+
+   if (get_user(reg, uaddr))
+   return -EFAULT;
+
+   if (reg < VGIC_NR_SGIS || reg >= VGIC_NR_PRIVATE_IRQS)
+   return -EINVAL;
+
+   return kvm_arm_pmu_set_irq(dev->kvm, reg);
+   }
+   }
+
+   return -ENXIO;
+}
+
+static int kvm_arm_pmu_get_attr(struct kvm_device *dev,
+   struct kvm_device_attr *attr)
+{
+   return 0;
+}
+
+static int kvm_arm_pmu_has_attr(struct kvm_device *dev,
+ 

[PATCH v5 16/21] KVM: ARM64: Add access handlers for PMEVCNTRn and PMEVTYPERn register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Add access handlers which emulate writing and reading the PMEVCNTRn and
PMEVTYPERn registers.
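
As a worked example of the CRm/Op2 encoding these macros implement: for
n = 13, CRm = 0b1000 | (13 >> 3) = 0b1001 and Op2 = 13 & 0x7 = 0b101,
which is the architected encoding of PMEVCNTR13_EL0.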

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 164 ++
 1 file changed, 164 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 12f4806..9320277 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -640,6 +640,20 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
{ Op0(0b10), Op1(0b000), CRn(0b0000), CRm((n)), Op2(0b111), \
  trap_wcr, reset_wcr, n, 0,  get_wcr, set_wcr }
 
+/* Macro to expand the PMEVCNTRn_EL0 register */
+#define PMU_PMEVCNTR_EL0(n)\
+   /* PMEVCNTRn_EL0 */ \
+   { Op0(0b11), Op1(0b011), CRn(0b1110),   \
+ CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_regs, reset_unknown, (PMEVCNTR0_EL0 + n), }
+
+/* Macro to expand the PMEVTYPERn_EL0 register */
+#define PMU_PMEVTYPER_EL0(n)   \
+   /* PMEVTYPERn_EL0 */\
+   { Op0(0b11), Op1(0b011), CRn(0b1110),   \
+ CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_regs, reset_unknown, (PMEVTYPER0_EL0 + n), }
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -854,6 +868,74 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b011),
  NULL, reset_unknown, TPIDRRO_EL0 },
 
+   /* PMEVCNTRn_EL0 */
+   PMU_PMEVCNTR_EL0(0),
+   PMU_PMEVCNTR_EL0(1),
+   PMU_PMEVCNTR_EL0(2),
+   PMU_PMEVCNTR_EL0(3),
+   PMU_PMEVCNTR_EL0(4),
+   PMU_PMEVCNTR_EL0(5),
+   PMU_PMEVCNTR_EL0(6),
+   PMU_PMEVCNTR_EL0(7),
+   PMU_PMEVCNTR_EL0(8),
+   PMU_PMEVCNTR_EL0(9),
+   PMU_PMEVCNTR_EL0(10),
+   PMU_PMEVCNTR_EL0(11),
+   PMU_PMEVCNTR_EL0(12),
+   PMU_PMEVCNTR_EL0(13),
+   PMU_PMEVCNTR_EL0(14),
+   PMU_PMEVCNTR_EL0(15),
+   PMU_PMEVCNTR_EL0(16),
+   PMU_PMEVCNTR_EL0(17),
+   PMU_PMEVCNTR_EL0(18),
+   PMU_PMEVCNTR_EL0(19),
+   PMU_PMEVCNTR_EL0(20),
+   PMU_PMEVCNTR_EL0(21),
+   PMU_PMEVCNTR_EL0(22),
+   PMU_PMEVCNTR_EL0(23),
+   PMU_PMEVCNTR_EL0(24),
+   PMU_PMEVCNTR_EL0(25),
+   PMU_PMEVCNTR_EL0(26),
+   PMU_PMEVCNTR_EL0(27),
+   PMU_PMEVCNTR_EL0(28),
+   PMU_PMEVCNTR_EL0(29),
+   PMU_PMEVCNTR_EL0(30),
+   /* PMEVTYPERn_EL0 */
+   PMU_PMEVTYPER_EL0(0),
+   PMU_PMEVTYPER_EL0(1),
+   PMU_PMEVTYPER_EL0(2),
+   PMU_PMEVTYPER_EL0(3),
+   PMU_PMEVTYPER_EL0(4),
+   PMU_PMEVTYPER_EL0(5),
+   PMU_PMEVTYPER_EL0(6),
+   PMU_PMEVTYPER_EL0(7),
+   PMU_PMEVTYPER_EL0(8),
+   PMU_PMEVTYPER_EL0(9),
+   PMU_PMEVTYPER_EL0(10),
+   PMU_PMEVTYPER_EL0(11),
+   PMU_PMEVTYPER_EL0(12),
+   PMU_PMEVTYPER_EL0(13),
+   PMU_PMEVTYPER_EL0(14),
+   PMU_PMEVTYPER_EL0(15),
+   PMU_PMEVTYPER_EL0(16),
+   PMU_PMEVTYPER_EL0(17),
+   PMU_PMEVTYPER_EL0(18),
+   PMU_PMEVTYPER_EL0(19),
+   PMU_PMEVTYPER_EL0(20),
+   PMU_PMEVTYPER_EL0(21),
+   PMU_PMEVTYPER_EL0(22),
+   PMU_PMEVTYPER_EL0(23),
+   PMU_PMEVTYPER_EL0(24),
+   PMU_PMEVTYPER_EL0(25),
+   PMU_PMEVTYPER_EL0(26),
+   PMU_PMEVTYPER_EL0(27),
+   PMU_PMEVTYPER_EL0(28),
+   PMU_PMEVTYPER_EL0(29),
+   PMU_PMEVTYPER_EL0(30),
+   /* PMCCFILTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
+ access_pmu_regs, reset_unknown, PMCCFILTR_EL0, },
+
/* DACR32_EL2 */
{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
  NULL, reset_unknown, DACR32_EL2 },
@@ -1184,6 +1266,20 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
return true;
 }
 
+/* Macro to expand the PMEVCNTRn register */
+#define PMU_PMEVCNTR(n)
\
+   /* PMEVCNTRn */ \
+   { Op1(0), CRn(0b1110),  \
+ CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_cp15_regs, NULL, (c14_PMEVCNTR0 + n), }
+
+/* Macro to expand the PMEVTYPERn register */
+#define PMU_PMEVTYPER(n)   \
+   /* PMEVTYPERn */\
+   { Op1(0), CRn(0b1110),  \
+ CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_cp15_regs, NULL, (c14_PMEVTYPER0 + n), }
+
 /*
  * Trapped cp15 registers. TTBR0/TTBR1 get a 

RE: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Pavel Fedin
 Hello!

> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index 7dace90..51ad98f 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
> > *pmd,
> >
> > pte = pte_offset_kernel(pmd, addr);
> > do {
> > -   if (!pte_none(*pte) && 
> > !kvm_is_device_pfn(__phys_to_pfn(addr)))
> > +   if (!pte_none(*pte) &&
> > +   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> 
> I think your analysis is correct, but does that not apply to both instances?

 No no, the other one is correct, since it operates on a real PFN (at least 
it looks so). I have verified my fix against the original problem (a crash on 
Exynos5410 without a generic timer), and it still works fine there.

> And instead of reverting, could we fix this properly instead?

 Of course, I'm not against alternate approaches, feel free to pursue them. 
I've just suggested what I could to fix things quickly. I'm indeed no expert 
in KVM memory management yet. After all, this is what mailing lists are for.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 05/21] KVM: ARM64: Add reset and access handlers for PMSELR register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMSELR_EL0 is UNKNOWN, use reset_unknown for
its reset handler. As the access needs no special handling, use the
default case to emulate writing and reading the PMSELR register.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e020fe0..1f1f6a6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -698,7 +698,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  trap_raz_wi },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMSELR_EL0 },
/* PMCEID0_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
  trap_raz_wi },
@@ -989,7 +989,8 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 5), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_cp15_regs,
+ NULL, c9_PMSELR },
{ Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 00/21] KVM: ARM64: Add guest PMU support

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

This patchset adds guest PMU support for KVM on ARM64. It takes a
trap-and-emulate approach. When the guest wants to monitor an event,
the access is trapped by KVM, which calls the perf_event API to create
a perf event and the relevant perf_event APIs to get the count value of
the event.

Use perf to test this patchset in the guest. "perf list" shows the
hardware events and hardware cache events perf supports. Then use
"perf stat -e EVENT" to monitor a given event; for example,
"perf stat -e cycles" counts cpu cycles and
"perf stat -e cache-misses" counts cache misses.

Below are the outputs of "perf stat -r 5 sleep 5" when running in host
and guest.

Host:
 Performance counter stats for 'sleep 5' (5 runs):

  0.510276  task-clock (msec) #0.000 CPUs utilized  
  ( +-  1.57% )
 1  context-switches  #0.002 M/sec
 0  cpu-migrations#0.000 K/sec
49  page-faults   #0.096 M/sec  
  ( +-  0.77% )
   1064117  cycles#2.085 GHz
  ( +-  1.56% )
    <not supported>  stalled-cycles-frontend
    <not supported>  stalled-cycles-backend
529051  instructions  #0.50  insns per cycle
  ( +-  0.55% )
    <not supported>  branches
  9894  branch-misses #   19.390 M/sec  
  ( +-  1.70% )

   5.000853900 seconds time elapsed 
 ( +-  0.00% )

Guest:
 Performance counter stats for 'sleep 5' (5 runs):

  0.642456  task-clock (msec) #0.000 CPUs utilized  
  ( +-  1.81% )
 1  context-switches  #0.002 M/sec
 0  cpu-migrations#0.000 K/sec
49  page-faults   #0.076 M/sec  
  ( +-  1.64% )
   1322717  cycles#2.059 GHz
  ( +-  1.88% )
    <not supported>  stalled-cycles-frontend
    <not supported>  stalled-cycles-backend
640944  instructions  #0.48  insns per cycle
  ( +-  1.10% )
    <not supported>  branches
 10665  branch-misses #   16.600 M/sec  
  ( +-  2.23% )

   5.001181452 seconds time elapsed 
 ( +-  0.00% )

Have a cycle counter read test like below in guest and host:

static void test(void)
{
unsigned long count = 0, count1, count2;
count1 = read_cycles();
count++;
count2 = read_cycles();
}
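
read_cycles() is not shown in the post; on arm64 it would presumably
read the cycle counter directly, e.g. (assumption):

static inline unsigned long read_cycles(void)
{
	unsigned long cycles;

	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
	return cycles;
}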

Host:
count1: 3046186213
count2: 3046186347
delta: 134

Guest:
count1: 5645797121
count2: 5645797270
delta: 149

The gap between guest and host is very small. One reason, I think, is
that the cycles spent in EL2 and in the host are not counted, since we
set exclude_hv = 1. So the cycles spent storing/restoring registers,
which happens at EL2, are not included.

This patchset can be fetched from [1] and the relevant QEMU version for
test can be fetched from [2].

The results of 'perf test' can be found from [3][4].
The results of perf_event_tests test suite can be found from [5][6].

Also, I have tested "perf top" in two VMs and host at the same time. It
works well.

Thanks,
Shannon

[1] https://git.linaro.org/people/shannon.zhao/linux-mainline.git  
KVM_ARM64_PMU_v5
[2] https://git.linaro.org/people/shannon.zhao/qemu.git  virtual_PMU
[3] http://people.linaro.org/~shannon.zhao/PMU/perf-test-host.txt
[4] http://people.linaro.org/~shannon.zhao/PMU/perf-test-guest.txt
[5] http://people.linaro.org/~shannon.zhao/PMU/perf_event_tests-host.txt
[6] http://people.linaro.org/~shannon.zhao/PMU/perf_event_tests-guest.txt

Changes since v4:
* Rebase on new linux kernel mainline 
* Drop the reset handler of CP15 registers
* Fix a compile failure on arch ARM due to lack of asm/pmu.h
* Refactor the interrupt injecting flow according to Marc's suggestion
* Check the value of PMSELR register
* Calculate the attr.disabled according to PMCR.E and PMCNTENSET/CLR
* Fix some coding style
* Document the vPMU irq range

Changes since v3:
* Rebase on new linux kernel mainline 
* Use ARMV8_MAX_COUNTERS instead of 32
* Reset PMCR.E to zero.
* Trigger overflow for software increment.
* Optimize PMU interrupt inject logic.
* Add handler for E,C,P bits of PMCR
* Fix the overflow bug found by perf_event_tests
* Run 'perf test', 'perf top' and perf_event_tests test suite
* Add exclude_hv = 1 configuration to not count in EL2

Changes since v2:
* Directly use perf raw event type to create perf_event in KVM
* Add a helper vcpu_sysreg_write
* remove unrelated header file

Changes since v1:
* Use switch...case for registers access handler instead of adding
  alone handler for each register
* Try to use the sys_regs to store the register value instead of adding
  new variables in struct kvm_pmc
* Fix the 

[PATCH v5 09/21] KVM: ARM64: Add reset and access handlers for PMXEVCNTR register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMXEVCNTR is UNKNOWN, use reset_unknown for
its reset handler. Add an access handler which emulates writing and
reading the PMXEVCNTR register. When reading PMXEVCNTR, call
perf_event_read_value to get the count value of the perf event.
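
The write emulation stores only the delta between the value the guest
writes and what the backing perf event has already counted, so that a
later read returns the guest's value plus anything counted since. In
plain integers (illustration only; the names are made up):

#include <stdio.h>

int main(void)
{
	long long hw = 1000;	/* counted so far by the perf event */
	long long stored = 0;	/* vcpu_sys_reg(PMEVCNTR0_EL0 + idx) */
	long long guest_write = 42;

	/* write path: keep only the delta from the current hw count */
	stored += guest_write - (hw + stored);

	/* read path: hw count plus stored delta gives back the value */
	printf("%lld\n", hw + stored);	/* prints 42 */
	return 0;
}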

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 53 +--
 1 file changed, 51 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6967a49..43a634c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -493,6 +493,18 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
 
if (p->is_write) {
switch (r->reg) {
+   case PMXEVCNTR_EL0: {
+   u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
+ & ARMV8_COUNTER_MASK;
+
+   if (!pmu_counter_idx_valid(vcpu_sys_reg(vcpu, PMCR_EL0),
+  idx))
+   break;
+
+   val = kvm_pmu_get_counter_value(vcpu, idx);
+   vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + idx) += 
(s64)*vcpu_reg(vcpu, p->Rt) - val;
+   break;
+   }
case PMXEVTYPER_EL0: {
u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
  & ARMV8_COUNTER_MASK;
@@ -524,6 +536,18 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
}
} else {
switch (r->reg) {
+   case PMXEVCNTR_EL0: {
+   u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
+ & ARMV8_COUNTER_MASK;
+
+   if (!pmu_counter_idx_valid(vcpu_sys_reg(vcpu, PMCR_EL0),
+  idx))
+   break;
+
+   val = kvm_pmu_get_counter_value(vcpu, idx);
+   *vcpu_reg(vcpu, p->Rt) = val;
+   break;
+   }
case PMCR_EL0: {
/* PMCR.P & PMCR.C are RAZ */
val = vcpu_sys_reg(vcpu, r->reg)
@@ -754,7 +778,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmu_regs, reset_unknown, PMXEVTYPER_EL0 },
/* PMXEVCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMXEVCNTR_EL0 },
/* PMUSERENR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
  trap_raz_wi },
@@ -967,6 +991,18 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
 
if (p->is_write) {
switch (r->reg) {
+   case c9_PMXEVCNTR: {
+   u32 idx = vcpu_cp15(vcpu, c9_PMSELR)
+ & ARMV8_COUNTER_MASK;
+
+   if (!pmu_counter_idx_valid(vcpu_cp15(vcpu, c9_PMCR),
+  idx))
+   break;
+
+   val = kvm_pmu_get_counter_value(vcpu, idx);
+   vcpu_cp15(vcpu, c14_PMEVCNTR0 + idx) += 
(s64)*vcpu_reg(vcpu, p->Rt) - val;
+   break;
+   }
case c9_PMXEVTYPER: {
u32 idx = vcpu_cp15(vcpu, c9_PMSELR)
  & ARMV8_COUNTER_MASK;
@@ -998,6 +1034,18 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
}
} else {
switch (r->reg) {
+   case c9_PMXEVCNTR: {
+   u32 idx = vcpu_cp15(vcpu, c9_PMSELR)
+ & ARMV8_COUNTER_MASK;
+
+   if (!pmu_counter_idx_valid(vcpu_cp15(vcpu, c9_PMCR),
+  idx))
+   break;
+
+   val = kvm_pmu_get_counter_value(vcpu, idx);
+   *vcpu_reg(vcpu, p->Rt) = val;
+   break;
+   }
case c9_PMCR: {
/* PMCR.P & PMCR.C are RAZ */
val = vcpu_cp15(vcpu, r->reg)
@@ -1056,7 +1104,8 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_cp15_regs,
  NULL, c9_PMXEVTYPER },
-   { Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_cp15_regs,
+ NULL, c9_PMXEVCNTR },
{ Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
-- 
2.0.4


--
To unsubscribe from this 

[PATCH v5 19/21] KVM: ARM64: Reset PMU state when resetting vcpu

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

When resetting the vcpu, the PMU state needs to be reset to its initial status as well.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/reset.c |  3 +++
 include/kvm/arm_pmu.h  |  2 ++
 virt/kvm/arm/pmu.c | 18 ++
 3 files changed, 23 insertions(+)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index f34745c..dfbce78 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -120,6 +120,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset system registers */
kvm_reset_sys_regs(vcpu);
 
+   /* Reset PMU */
+   kvm_pmu_vcpu_reset(vcpu);
+
/* Reset timer */
return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 }
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 4cb0ade..fe8035a 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -39,6 +39,7 @@ struct kvm_pmu {
 };
 
 #ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
@@ -50,6 +51,7 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u32 data,
u32 select_idx);
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
 #else
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
 void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32 select_idx)
 {
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 34e7386..4014831 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -80,6 +80,24 @@ static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
 }
 
 /**
+ * kvm_pmu_vcpu_reset - reset pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ */
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
+{
+   int i;
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+
+   for (i = 0; i < ARMV8_MAX_COUNTERS; i++) {
+   kvm_pmu_stop_counter(&pmu->pmc[i]);
+   pmu->pmc[i].idx = i;
+   pmu->pmc[i].vcpu = vcpu;
+   pmu->pmc[i].bitmask = 0xffffffffUL;
+   }
+}
+
+/**
  * kvm_pmu_flush_hwstate - flush pmu state to cpu
  * @vcpu: The vcpu pointer
  *
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 03/21] KVM: ARM64: Add offset defines for PMU registers

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

We are about to trap and emulate accesses to each PMU register
individually. This adds the context offsets for the AArch64 PMU
registers and their AArch32 counterparts.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/kvm_asm.h | 55 
 1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..4f804c1 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -48,12 +48,34 @@
 #define MDSCR_EL1  22  /* Monitor Debug System Control Register */
 #define MDCCINT_EL1    23  /* Monitor Debug Comms Channel Interrupt Enable Reg */
 
+/* Performance Monitors Registers */
+#define PMCR_EL0   24  /* Control Register */
+#define PMOVSSET_EL0   25  /* Overflow Flag Status Set Register */
+#define PMOVSCLR_EL0   26  /* Overflow Flag Status Clear Register */
+#define PMSELR_EL0 27  /* Event Counter Selection Register */
+#define PMCEID0_EL0    28  /* Common Event Identification Register 0 */
+#define PMCEID1_EL0    29  /* Common Event Identification Register 1 */
+#define PMEVCNTR0_EL0  30  /* Event Counter Register (0-30) */
+#define PMEVCNTR30_EL0 60
+#define PMCCNTR_EL0    61  /* Cycle Counter Register */
+#define PMEVTYPER0_EL0 62  /* Event Type Register (0-30) */
+#define PMEVTYPER30_EL0    92
+#define PMCCFILTR_EL0  93  /* Cycle Count Filter Register */
+#define PMXEVCNTR_EL0  94  /* Selected Event Count Register */
+#define PMXEVTYPER_EL0 95  /* Selected Event Type Register */
+#define PMCNTENSET_EL0 96  /* Count Enable Set Register */
+#define PMCNTENCLR_EL0 97  /* Count Enable Clear Register */
+#define PMINTENSET_EL1 98  /* Interrupt Enable Set Register */
+#define PMINTENCLR_EL1 99  /* Interrupt Enable Clear Register */
+#define PMUSERENR_EL0  100 /* User Enable Register */
+#define PMSWINC_EL0    101 /* Software Increment Register */
+
 /* 32bit specific registers. Keep them at the end of the range */
-#define DACR32_EL2  24  /* Domain Access Control Register */
-#define IFSR32_EL2  25  /* Instruction Fault Status Register */
-#define FPEXC32_EL2 26  /* Floating-Point Exception Control Register */
-#define DBGVCR32_EL2    27  /* Debug Vector Catch Register */
-#define NR_SYS_REGS 28
+#define DACR32_EL2  102 /* Domain Access Control Register */
+#define IFSR32_EL2  103 /* Instruction Fault Status Register */
+#define FPEXC32_EL2 104 /* Floating-Point Exception Control Register */
+#define DBGVCR32_EL2    105 /* Debug Vector Catch Register */
+#define NR_SYS_REGS 106
 
 /* 32bit mapping */
 #define c0_MPIDR   (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
@@ -75,6 +97,24 @@
 #define c6_IFAR(c6_DFAR + 1)   /* Instruction Fault Address 
Register */
 #define c7_PAR (PAR_EL1 * 2)   /* Physical Address Register */
 #define c7_PAR_high(c7_PAR + 1)/* PAR top 32 bits */
+
+/* Performance Monitors */
+#define c9_PMCR(PMCR_EL0 * 2)
+#define c9_PMOVSSET(PMOVSSET_EL0 * 2)
+#define c9_PMOVSCLR(PMOVSCLR_EL0 * 2)
+#define c9_PMCCNTR (PMCCNTR_EL0 * 2)
+#define c9_PMSELR  (PMSELR_EL0 * 2)
+#define c9_PMCEID0 (PMCEID0_EL0 * 2)
+#define c9_PMCEID1 (PMCEID1_EL0 * 2)
+#define c9_PMXEVCNTR   (PMXEVCNTR_EL0 * 2)
+#define c9_PMXEVTYPER  (PMXEVTYPER_EL0 * 2)
+#define c9_PMCNTENSET  (PMCNTENSET_EL0 * 2)
+#define c9_PMCNTENCLR  (PMCNTENCLR_EL0 * 2)
+#define c9_PMINTENSET  (PMINTENSET_EL1 * 2)
+#define c9_PMINTENCLR  (PMINTENCLR_EL1 * 2)
+#define c9_PMUSERENR   (PMUSERENR_EL0 * 2)
+#define c9_PMSWINC (PMSWINC_EL0 * 2)
+
 #define c10_PRRR   (MAIR_EL1 * 2)  /* Primary Region Remap Register */
 #define c10_NMRR   (c10_PRRR + 1)  /* Normal Memory Remap Register */
 #define c12_VBAR   (VBAR_EL1 * 2)  /* Vector Base Address Register */
@@ -86,6 +126,11 @@
 #define c10_AMAIR1 (c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
 #define c14_CNTKCTL(CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
 
+/* Performance Monitors */
+#define c14_PMEVCNTR0  (PMEVCNTR0_EL0 * 2)
+#define c14_PMEVTYPER0 (PMEVTYPER0_EL0 * 2)
+#define c14_PMCCFILTR  (PMCCFILTR_EL0 * 2)
+
 #define cp14_DBGDSCRext(MDSCR_EL1 * 2)
 #define cp14_DBGBCR0   (DBGBCR0_EL1 * 2)
 #define cp14_DBGBVR0   (DBGBVR0_EL1 * 2)
-- 
2.0.4


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: fix the writing POSTED_INTR_NV

2015-12-02 Thread roy . qing . li
From: Li RongQing  

POSTED_INTR_NV is a 16-bit field, so it must not be written with the 64-bit vmcs write function.

[ 5311.676074] vmwrite error: reg 3 value 0 (err 12)
  [ 5311.680001] CPU: 49 PID: 4240 Comm: qemu-system-i38 Tainted: G I 
4.1.13-WR8.0.0.0_standard #1
  [ 5311.689343] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS 
SE5C610.86B.01.01.0008.021120151325 02/11/2015
  [ 5311.699550]   e69a7e1c c1950de1  e69a7e38 fafcff45 
fafebd24
  [ 5311.706924] 0003  000c b6a06dfa e69a7e40 fafcff79 e69a7eb0 
fafd5f57
  [ 5311.714296] e69a7ec0 c1080600  0001 c0e18018 01be  
0b43
  [ 5311.721651] Call Trace:
  [ 5311.722942] [] dump_stack+0x4b/0x75
  [ 5311.726467] [] vmwrite_error+0x35/0x40 [kvm_intel]
  [ 5311.731444] [] vmcs_writel+0x29/0x30 [kvm_intel]
  [ 5311.736228] [] vmx_create_vcpu+0x337/0xb90 [kvm_intel]
  [ 5311.741600] [] ? dequeue_task_fair+0x2e0/0xf60
  [ 5311.746197] [] kvm_arch_vcpu_create+0x3a/0x70 [kvm]
  [ 5311.751278] [] kvm_vm_ioctl+0x14d/0x640 [kvm]
  [ 5311.755771] [] ? free_pages_prepare+0x1a4/0x2d0
  [ 5311.760455] [] ? debug_smp_processor_id+0x12/0x20
  [ 5311.765333] [] ? sched_move_task+0xbe/0x170
  [ 5311.769621] [] ? kmem_cache_free+0x213/0x230
  [ 5311.774016] [] ? kvm_set_memory_region+0x60/0x60 [kvm]
  [ 5311.779379] [] do_vfs_ioctl+0x2e2/0x500
  [ 5311.783285] [] ? kmem_cache_free+0x213/0x230
  [ 5311.787677] [] ? __mmdrop+0x63/0xd0
  [ 5311.791196] [] ? __mmdrop+0x63/0xd0
  [ 5311.794712] [] ? __mmdrop+0x63/0xd0
  [ 5311.798234] [] ? __fget+0x57/0x90
  [ 5311.801559] [] ? __fget_light+0x22/0x50
  [ 5311.805464] [] SyS_ioctl+0x80/0x90
  [ 5311.808885] [] sysenter_do_call+0x12/0x12
  [ 5312.059280] kvm: zapping shadow pages for mmio generation wraparound
  [ 5313.678415] kvm [4231]: vcpu0 disabled perfctr wrmsr: 0xc2 data 0x
  [ 5313.726518] kvm [4231]: vcpu0 unhandled rdmsr: 0x570

Signed-off-by: Li RongQing 
Cc: Yang Zhang 
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5eb56ed..418e084 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4780,7 +4780,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 
vmcs_write16(GUEST_INTR_STATUS, 0);
 
-   vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
+   vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((>pi_desc)));
}
 
@@ -9505,7 +9505,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
 */
vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
vmx->nested.pi_pending = false;
-   vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
+   vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
vmcs_write64(POSTED_INTR_DESC_ADDR,
page_to_phys(vmx->nested.pi_desc_page) +
(unsigned long)(vmcs12->posted_intr_desc_addr &
-- 
2.1.4



[PATCH v5 14/21] KVM: ARM64: Add reset and access handlers for PMUSERENR register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

The reset value of PMUSERENR_EL0 is UNKNOWN, so use reset_unknown for its
reset handler.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f5e0732..eb4fcf9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -835,7 +835,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmu_regs, reset_unknown, PMXEVCNTR_EL0 },
/* PMUSERENR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMUSERENR_EL0 },
/* PMOVSSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
  access_pmu_regs, reset_unknown, PMOVSSET_EL0 },
@@ -1218,7 +1218,8 @@ static const struct sys_reg_desc cp15_regs[] = {
  NULL, c9_PMXEVTYPER },
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_cp15_regs,
  NULL, c9_PMXEVCNTR },
-   { Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmu_cp15_regs,
+ NULL, c9_PMUSERENR, 0 },
{ Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pmu_cp15_regs,
  NULL, c9_PMINTENSET },
{ Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_cp15_regs,
-- 
2.0.4




[PATCH v5 11/21] KVM: ARM64: Add reset and access handlers for PMCNTENSET and PMCNTENCLR register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMCNTENSET and PMCNTENCLR is UNKNOWN, use
reset_unknown for their reset handlers. Add new cases to emulate writing
the PMCNTENSET and PMCNTENCLR registers.

When writing to PMCNTENSET, call perf_event_enable to enable the perf
event. When writing to PMCNTENCLR, call perf_event_disable to disable
the perf event.
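
A rough sketch of the enable path described above (illustrative only: the
real implementation lands in virt/kvm/arm/pmu.c later in this series, and
the kvm_pmu/kvm_pmc fields are assumed here):

void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool all_enable)
{
	struct kvm_pmu *pmu = &vcpu->arch.pmu;
	unsigned long mask = val;
	int i;

	if (!all_enable)
		return;		/* counters only run while PMCR.E is set */

	for_each_set_bit(i, &mask, ARMV8_MAX_COUNTERS) {
		struct kvm_pmc *pmc = &pmu->pmc[i];

		if (pmc->perf_event)
			perf_event_enable(pmc->perf_event);
	}
}

The disable path is symmetric, calling perf_event_disable() for each set
bit without looking at PMCR.E.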

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 52 +++
 include/kvm/arm_pmu.h |  4 
 virt/kvm/arm/pmu.c| 47 ++
 3 files changed, 99 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9e06fe8..e852e5d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -526,6 +526,27 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + idx) = val;
break;
}
+   case PMCNTENSET_EL0: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_enable_counter(vcpu, val,
+  vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E);
+   /* A set bit in PMCNTENSET_EL0 and PMCNTENCLR_EL0 means the
+    * corresponding counter is enabled.
+    */
+   vcpu_sys_reg(vcpu, r->reg) |= val;
+   vcpu_sys_reg(vcpu, PMCNTENCLR_EL0) |= val;
+   break;
+   }
+   case PMCNTENCLR_EL0: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_disable_counter(vcpu, val);
+   /* A clear bit in PMCNTENSET_EL0 and PMCNTENCLR_EL0 means the
+    * corresponding counter is disabled.
+    */
+   vcpu_sys_reg(vcpu, r->reg) &= ~val;
+   vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
+   break;
+   }
case PMCR_EL0: {
/* Only update writeable bits of PMCR */
val = vcpu_sys_reg(vcpu, r->reg);
@@ -764,10 +785,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmu_regs, reset_pmcr, PMCR_EL0, },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMCNTENSET_EL0 },
/* PMCNTENCLR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMCNTENCLR_EL0 },
/* PMOVSCLR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
  trap_raz_wi },
@@ -1037,6 +1058,27 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
vcpu_cp15(vcpu, c14_PMEVTYPER0 + idx) = val;
break;
}
+   case c9_PMCNTENSET: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_enable_counter(vcpu, val,
+  vcpu_cp15(vcpu, c9_PMCR) & ARMV8_PMCR_E);
+   /* A set bit in PMCNTENSET_EL0 and PMCNTENCLR_EL0 means the
+    * corresponding counter is enabled.
+    */
+   vcpu_cp15(vcpu, r->reg) |= val;
+   vcpu_cp15(vcpu, c9_PMCNTENCLR) |= val;
+   break;
+   }
+   case c9_PMCNTENCLR: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_disable_counter(vcpu, val);
+   /* A clear bit in PMCNTENSET_EL0 and PMCNTENCLR_EL0 means the
+    * corresponding counter is disabled.
+    */
+   vcpu_cp15(vcpu, r->reg) &= ~val;
+   vcpu_cp15(vcpu, c9_PMCNTENSET) &= ~val;
+   break;
+   }
case c9_PMCR: {
/* Only update writeable bits of PMCR */
val = vcpu_cp15(vcpu, r->reg);
@@ -1118,8 +1160,10 @@ static const struct sys_reg_desc cp15_regs[] = {
/* PMU */
{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmu_cp15_regs,
  NULL, c9_PMCR },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmu_cp15_regs,
+ NULL, c9_PMCNTENSET },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmu_cp15_regs,
+ NULL, c9_PMCNTENCLR },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmu_cp15_regs,
  NULL, c9_PMSELR },
diff --git a/include/kvm/arm_pmu.h 

[PATCH v5 12/21] KVM: ARM64: Add reset and access handlers for PMINTENSET and PMINTENCLR register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMINTENSET and PMINTENCLR is UNKNOWN, use
reset_unknown for their reset handlers. Add new cases to emulate writing
the PMINTENSET and PMINTENCLR registers.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e852e5d..a4f9177 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -547,6 +547,18 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val;
break;
}
+   case PMINTENSET_EL1: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   vcpu_sys_reg(vcpu, r->reg) |= val;
+   vcpu_sys_reg(vcpu, PMINTENCLR_EL1) |= val;
+   break;
+   }
+   case PMINTENCLR_EL1: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   vcpu_sys_reg(vcpu, r->reg) &= ~val;
+   vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= ~val;
+   break;
+   }
case PMCR_EL0: {
/* Only update writeable bits of PMCR */
val = vcpu_sys_reg(vcpu, r->reg);
@@ -742,10 +754,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
/* PMINTENSET_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMINTENSET_EL1 },
/* PMINTENCLR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMINTENCLR_EL1 },
 
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -1079,6 +1091,18 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
vcpu_cp15(vcpu, c9_PMCNTENSET) &= ~val;
break;
}
+   case c9_PMINTENSET: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   vcpu_cp15(vcpu, r->reg) |= val;
+   vcpu_cp15(vcpu, c9_PMINTENCLR) |= val;
+   break;
+   }
+   case c9_PMINTENCLR: {
+   val = *vcpu_reg(vcpu, p->Rt);
+   vcpu_cp15(vcpu, r->reg) &= ~val;
+   vcpu_cp15(vcpu, c9_PMINTENSET) &= ~val;
+   break;
+   }
case c9_PMCR: {
/* Only update writeable bits of PMCR */
val = vcpu_cp15(vcpu, r->reg);
@@ -1178,8 +1202,10 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_cp15_regs,
  NULL, c9_PMXEVCNTR },
{ Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pmu_cp15_regs,
+ NULL, c9_PMINTENSET },
+   { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pmu_cp15_regs,
+ NULL, c9_PMINTENCLR },
 
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
-- 
2.0.4




[PATCH v5 01/21] ARM64: Move PMU register related defines to asm/pmu.h

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

To use the ARMv8 PMU-related register defines from the KVM code,
we move the relevant definitions to the asm/pmu.h header file.

Signed-off-by: Anup Patel 
Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/pmu.h   | 64 ++
 arch/arm64/kernel/perf_event.c | 36 +---
 2 files changed, 65 insertions(+), 35 deletions(-)
 create mode 100644 arch/arm64/include/asm/pmu.h

diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
new file mode 100644
index 000..4264ea0
--- /dev/null
+++ b/arch/arm64/include/asm/pmu.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright (C) 2015 Linaro Ltd, Shannon Zhao
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PMU_H
+#define __ASM_PMU_H
+
+#define ARMV8_MAX_COUNTERS  32
+#define ARMV8_COUNTER_MASK  (ARMV8_MAX_COUNTERS - 1)
+
+/*
+ * Per-CPU PMCR: config reg
+ */
+#define ARMV8_PMCR_E   (1 << 0) /* Enable all counters */
+#define ARMV8_PMCR_P   (1 << 1) /* Reset all counters */
+#define ARMV8_PMCR_C   (1 << 2) /* Cycle counter reset */
+#define ARMV8_PMCR_D   (1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV8_PMCR_X   (1 << 4) /* Export to ETM */
+#define ARMV8_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug */
+#define ARMV8_PMCR_N_SHIFT 11   /* Number of counters supported */
+#define ARMV8_PMCR_N_MASK  0x1f
+#define ARMV8_PMCR_MASK    0x3f /* Mask for writable bits */
+
+/*
+ * PMCNTEN: counters enable reg
+ */
+#define ARMV8_CNTEN_MASK   0xffffffff  /* Mask for writable bits */
+
+/*
+ * PMINTEN: counters interrupt enable reg
+ */
+#define ARMV8_INTEN_MASK   0xffffffff  /* Mask for writable bits */
+
+/*
+ * PMOVSR: counters overflow flag status reg
+ */
+#define ARMV8_OVSR_MASK    0xffffffff  /* Mask for writable bits */
+#define ARMV8_OVERFLOWED_MASK  ARMV8_OVSR_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#define ARMV8_EVTYPE_MASK  0xc80003ff  /* Mask for writable bits */
+#define ARMV8_EVTYPE_EVENT 0x3ff       /* Mask for EVENT bits */
+
+/*
+ * Event filters for PMUv3
+ */
+#define ARMV8_EXCLUDE_EL1  (1 << 31)
+#define ARMV8_EXCLUDE_EL0  (1 << 30)
+#define ARMV8_INCLUDE_EL2  (1 << 27)
+
+#endif /* __ASM_PMU_H */
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 5b1897e..7eca5dc 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * ARMv8 PMUv3 Performance Events handling code.
@@ -187,9 +188,6 @@ static const unsigned armv8_a57_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
-#define ARMV8_MAX_COUNTERS 32
-#define ARMV8_COUNTER_MASK (ARMV8_MAX_COUNTERS - 1)
-
 /*
  * ARMv8 low level PMU access
  */
@@ -200,38 +198,6 @@ static const unsigned armv8_a57_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #define ARMV8_IDX_TO_COUNTER(x) \
 	(((x) - ARMV8_IDX_COUNTER0) & ARMV8_COUNTER_MASK)
 
-/*
- * Per-CPU PMCR: config reg
- */
-#define ARMV8_PMCR_E   (1 << 0) /* Enable all counters */
-#define ARMV8_PMCR_P   (1 << 1) /* Reset all counters */
-#define ARMV8_PMCR_C   (1 << 2) /* Cycle counter reset */
-#define ARMV8_PMCR_D   (1 << 3) /* CCNT counts every 64th cpu cycle */
-#define ARMV8_PMCR_X   (1 << 4) /* Export to ETM */
-#define ARMV8_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
-#define ARMV8_PMCR_N_SHIFT 11   /* Number of counters supported */
-#define ARMV8_PMCR_N_MASK  0x1f
-#define ARMV8_PMCR_MASK    0x3f /* Mask for writable bits */
-
-/*
- * PMOVSR: counters overflow flag status reg
- */
-#define ARMV8_OVSR_MASK    0xffffffff  /* Mask for writable bits */
-#define ARMV8_OVERFLOWED_MASK  ARMV8_OVSR_MASK
-
-/*
- * PMXEVTYPER: Event selection reg
- */
-#define ARMV8_EVTYPE_MASK  0xc80003ff  /* Mask for writable bits */
-#define ARMV8_EVTYPE_EVENT 0x3ff       /* Mask for EVENT bits */

[PATCH v5 08/21] KVM: ARM64: Add reset and access handlers for PMXEVTYPER register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMXEVTYPER is UNKNOWN, use reset_unknown or
reset_unknown_cp15 for its reset handler. Add an access handler which
emulates reading and writing the PMXEVTYPER register. When writing to
PMXEVTYPER, call kvm_pmu_set_counter_event_type to create a perf_event
for the selected event type.
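
The indirection being emulated: PMSELR_EL0.SEL picks a counter, and
PMXEVTYPER_EL0 then aliases PMEVTYPER<SEL>_EL0, with SEL == 31
(ARMV8_COUNTER_MASK) selecting the cycle counter's filter register. In
sketch form (mirroring the handler below; 'event_type' is an illustrative
name):

u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_COUNTER_MASK;

/* idx 31 is the cycle counter; other values must be < PMCR_EL0.N */
if (pmu_counter_idx_valid(vcpu_sys_reg(vcpu, PMCR_EL0), idx))
	vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + idx) = event_type;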

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 44 ++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b0a8d88..6967a49 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -473,6 +473,17 @@ static void reset_pmceid(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
vcpu_sys_reg(vcpu, r->reg) = pmceid;
 }
 
+static bool pmu_counter_idx_valid(u64 pmcr, u64 idx)
+{
+   u64 val;
+
+   val = (pmcr >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
+   if (idx >= val && idx != ARMV8_COUNTER_MASK)
+   return false;
+
+   return true;
+}
+
 /* PMU registers accessor. */
 static bool access_pmu_regs(struct kvm_vcpu *vcpu,
const struct sys_reg_params *p,
@@ -482,6 +493,20 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
 
if (p->is_write) {
switch (r->reg) {
+   case PMXEVTYPER_EL0: {
+   u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
+ & ARMV8_COUNTER_MASK;
+
+   if (!pmu_counter_idx_valid(vcpu_sys_reg(vcpu, PMCR_EL0),
+  idx))
+   break;
+
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_set_counter_event_type(vcpu, val, idx);
+   vcpu_sys_reg(vcpu, PMXEVTYPER_EL0) = val;
+   vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + idx) = val;
+   break;
+   }
case PMCR_EL0: {
/* Only update writeable bits of PMCR */
val = vcpu_sys_reg(vcpu, r->reg);
@@ -726,7 +751,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  trap_raz_wi },
/* PMXEVTYPER_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMXEVTYPER_EL0 },
/* PMXEVCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
  trap_raz_wi },
@@ -942,6 +967,20 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
 
if (p->is_write) {
switch (r->reg) {
+   case c9_PMXEVTYPER: {
+   u32 idx = vcpu_cp15(vcpu, c9_PMSELR)
+ & ARMV8_COUNTER_MASK;
+
+   if (!pmu_counter_idx_valid(vcpu_cp15(vcpu, c9_PMCR),
+  idx))
+   break;
+
+   val = *vcpu_reg(vcpu, p->Rt);
+   kvm_pmu_set_counter_event_type(vcpu, val, idx);
+   vcpu_cp15(vcpu, c9_PMXEVTYPER) = val;
+   vcpu_cp15(vcpu, c14_PMEVTYPER0 + idx) = val;
+   break;
+   }
case c9_PMCR: {
/* Only update writeable bits of PMCR */
val = vcpu_cp15(vcpu, r->reg);
@@ -1015,7 +1054,8 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmu_cp15_regs,
  NULL, c9_PMCEID1 },
{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_cp15_regs,
+ NULL, c9_PMXEVTYPER },
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
-- 
2.0.4




[PATCH v5 10/21] KVM: ARM64: Add reset and access handlers for PMCCNTR register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMCCNTR is UNKNOWN, use reset_unknown for its
reset handler. Add a new case to emulate reading and writing the PMCCNTR
register.
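
The write path below stores a delta rather than the raw value: reads
return a base register plus the running perf count, so a guest write of W
is emulated by adjusting the base so that base + count == W. Spelled out
(with 'perf_count' and 'written' as illustrative names):

/* read: the value the guest observes */
u64 count = vcpu_sys_reg(vcpu, PMCCNTR_EL0) + perf_count;

/* write: make a subsequent read return 'written' */
vcpu_sys_reg(vcpu, PMCCNTR_EL0) += (s64)written - count;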

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 43a634c..9e06fe8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -493,6 +493,13 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
 
if (p->is_write) {
switch (r->reg) {
+   case PMCCNTR_EL0: {
+   val = kvm_pmu_get_counter_value(vcpu,
+   ARMV8_MAX_COUNTERS - 1);
+   vcpu_sys_reg(vcpu, r->reg) +=
+ (s64)*vcpu_reg(vcpu, p->Rt) - val;
+   break;
+   }
case PMXEVCNTR_EL0: {
u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
  & ARMV8_COUNTER_MASK;
@@ -536,6 +543,12 @@ static bool access_pmu_regs(struct kvm_vcpu *vcpu,
}
} else {
switch (r->reg) {
+   case PMCCNTR_EL0: {
+   val = kvm_pmu_get_counter_value(vcpu,
+   ARMV8_MAX_COUNTERS - 1);
+   *vcpu_reg(vcpu, p->Rt) = val;
+   break;
+   }
case PMXEVCNTR_EL0: {
u64 idx = vcpu_sys_reg(vcpu, PMSELR_EL0)
  & ARMV8_COUNTER_MASK;
@@ -772,7 +785,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmu_regs, reset_pmceid, PMCEID1_EL0 },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
- trap_raz_wi },
+ access_pmu_regs, reset_unknown, PMCCNTR_EL0 },
/* PMXEVTYPER_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
  access_pmu_regs, reset_unknown, PMXEVTYPER_EL0 },
@@ -991,6 +1004,13 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
 
if (p->is_write) {
switch (r->reg) {
+   case c9_PMCCNTR: {
+   val = kvm_pmu_get_counter_value(vcpu,
+   ARMV8_MAX_COUNTERS - 1);
+   vcpu_cp15(vcpu, r->reg) += (s64)*vcpu_reg(vcpu, p->Rt)
+  - val;
+   break;
+   }
case c9_PMXEVCNTR: {
u32 idx = vcpu_cp15(vcpu, c9_PMSELR)
  & ARMV8_COUNTER_MASK;
@@ -1034,6 +1054,12 @@ static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
}
} else {
switch (r->reg) {
+   case c9_PMCCNTR: {
+   val = kvm_pmu_get_counter_value(vcpu,
+   ARMV8_MAX_COUNTERS - 1);
+   *vcpu_reg(vcpu, p->Rt) = val;
+   break;
+   }
case c9_PMXEVCNTR: {
u32 idx = vcpu_cp15(vcpu, c9_PMSELR)
  & ARMV8_COUNTER_MASK;
@@ -1101,7 +1127,8 @@ static const struct sys_reg_desc cp15_regs[] = {
  NULL, c9_PMCEID0 },
{ Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmu_cp15_regs,
  NULL, c9_PMCEID1 },
-   { Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_cp15_regs,
+ NULL, c9_PMCCNTR },
{ Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_cp15_regs,
  NULL, c9_PMXEVTYPER },
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_cp15_regs,
-- 
2.0.4




[PATCH v5 04/21] KVM: ARM64: Add reset and access handlers for PMCR_EL0 register

2015-12-02 Thread Shannon Zhao
From: Shannon Zhao 

Add a reset handler which reads the host value of PMCR_EL0 and resets the
writable bits to an architecturally UNKNOWN value, except PMCR.E which is
reset to zero. Add a common access handler for PMU registers which
emulates reading and writing a register, and add emulation for PMCR.
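
For reference, a decomposition of the reset value computed by reset_pmcr()
below (a sketch of the bit arithmetic, using the names from the patch):

/*
 *   pmcr & ~ARMV8_PMCR_MASK           keep the host's read-only bits (PMCR.N, ...)
 * | (ARMV8_PMCR_MASK & 0xdecafbad)    any pattern will do for the UNKNOWN bits
 * & ~ARMV8_PMCR_E                     but PMCR.E must reset to zero
 */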

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 97 ++-
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 87a64e8..e020fe0 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -446,6 +447,58 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
 }
 
+static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+   u64 pmcr, val;
+
+   asm volatile("mrs %0, pmcr_el0\n" : "=r" (pmcr));
+   /* Writable bits of PMCR_EL0 (ARMV8_PMCR_MASK) are reset to UNKNOWN,
+    * except PMCR.E which resets to zero.
+    */
+   val = ((pmcr & ~ARMV8_PMCR_MASK) | (ARMV8_PMCR_MASK & 0xdecafbad))
+ & (~ARMV8_PMCR_E);
+   vcpu_sys_reg(vcpu, r->reg) = val;
+}
+
+/* PMU registers accessor. */
+static bool access_pmu_regs(struct kvm_vcpu *vcpu,
+   const struct sys_reg_params *p,
+   const struct sys_reg_desc *r)
+{
+   u64 val;
+
+   if (p->is_write) {
+   switch (r->reg) {
+   case PMCR_EL0: {
+   /* Only update writeable bits of PMCR */
+   val = vcpu_sys_reg(vcpu, r->reg);
+   val &= ~ARMV8_PMCR_MASK;
+   val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
+   vcpu_sys_reg(vcpu, r->reg) = val;
+   break;
+   }
+   default:
+   vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
+   break;
+   }
+   } else {
+   switch (r->reg) {
+   case PMCR_EL0: {
+   /* PMCR.P & PMCR.C are RAZ */
+   val = vcpu_sys_reg(vcpu, r->reg)
+ & ~(ARMV8_PMCR_P | ARMV8_PMCR_C);
+   *vcpu_reg(vcpu, p->Rt) = val;
+   break;
+   }
+   default:
+   *vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);
+   break;
+   }
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -630,7 +683,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
/* PMCR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b000),
- trap_raz_wi },
+ access_pmu_regs, reset_pmcr, PMCR_EL0, },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
  trap_raz_wi },
@@ -864,6 +917,45 @@ static const struct sys_reg_desc cp14_64_regs[] = {
{ Op1( 0), CRm( 2), .access = trap_raz_wi },
 };
 
+/* PMU CP15 registers accessor. */
+static bool access_pmu_cp15_regs(struct kvm_vcpu *vcpu,
+const struct sys_reg_params *p,
+const struct sys_reg_desc *r)
+{
+   u32 val;
+
+   if (p->is_write) {
+   switch (r->reg) {
+   case c9_PMCR: {
+   /* Only update writeable bits of PMCR */
+   val = vcpu_cp15(vcpu, r->reg);
+   val &= ~ARMV8_PMCR_MASK;
+   val |= *vcpu_reg(vcpu, p->Rt) & ARMV8_PMCR_MASK;
+   vcpu_cp15(vcpu, r->reg) = val;
+   break;
+   }
+   default:
+   vcpu_cp15(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
+   break;
+   }
+   } else {
+   switch (r->reg) {
+   case c9_PMCR: {
+   /* PMCR.P & PMCR.C are RAZ */
+   val = vcpu_cp15(vcpu, r->reg)
+ & ~(ARMV8_PMCR_P | ARMV8_PMCR_C);
+   *vcpu_reg(vcpu, p->Rt) = val;
+   break;
+   }
+   default:
+   *vcpu_reg(vcpu, p->Rt) = vcpu_cp15(vcpu, r->reg);
+   break;
+   }
+   }
+
+   return true;
+}
+
 /*
  * Trapped cp15 registers. TTBR0/TTBR1 get a double encoding,
  * depending on the way they are accessed (as a 32bit or a 64bit
@@ -892,7 +984,8 @@ static const struct sys_reg_desc 

Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-02 Thread Shannon Zhao


On 2015/12/2 16:45, Marc Zyngier wrote:
> On 02/12/15 02:40, Shannon Zhao wrote:
>> > 
>> > 
>> > On 2015/12/2 0:57, Marc Zyngier wrote:
>>> >> On 01/12/15 16:26, Shannon Zhao wrote:
 >>>
 >>>
 >>> On 2015/12/1 23:41, Marc Zyngier wrote:
>> > The reason is that when guest clear the overflow register, it will 
>> > trap
>>> >> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this 
>>> >> moment,
>>> >> the overflow register is still overflowed(that is some bit is 
>>> >> still 1).
>>> >> So We need to use some flag to mark we already inject this 
>>> >> interrupt.
>>> >> And if during guest handling the overflow, there is a new 
>>> >> overflow
>>> >> happening, the pmu->irq_pending will be set true by
>>> >> kvm_pmu_perf_overflow(), then it needs to inject this new 
>>> >> interrupt, right?
>  I don't think so. This is a level interrupt, so the level should stay
>  high as long as the guest hasn't cleared all possible sources for 
>  that
>  interrupt.
> 
>  For your example, the guest writes to PMOVSCLR to clear the overflow
>  caused by a given counter. If the status is now 0, the interrupt line
>  drops. If the status is still non zero, the line stays high. And I
>  believe that writing a 1 to PMOVSSET would actually trigger an
>  interrupt, or keep it high if it has already high.
> 
 >>> Right, writing 1 to PMOVSSET will trigger an interrupt.
 >>>
>  In essence, do not try to maintain side state. I've been bitten.
 >>>
 >>> So on VM entry, it check if PMOVSSET is zero. If not, call 
 >>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
 >>> On VM exit, it seems there is nothing to do.
>>> >>
>>> >> It is even simpler than that:
>>> >>
>>> >> - When you get an overflow, you inject an interrupt with the level set 
>>> >> to 1.
>>> >> - When the overflow register gets cleared, you inject the same interrupt
>>> >> with the level set to 0.
>>> >>
>>> >> I don't think you need to do anything else, and the world switch should
>>> >> be left untouched.
>>> >>
>> > 
>> > On 2015/7/17 23:28, Christoffer Dall wrote:
>> > +  kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>> > +                      pmu->irq_num, 1);
>>> >> what context is this overflow handler function?  kvm_vgic_inject_irq
>>> >> grabs a mutex, so it can sleep...
>>> >>
>>> >> from a quick glance at the perf core code, it looks like this is in
>>> >> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>>> >>
>> > 
>> > But as Christoffer said before, it's not good to call
>> > kvm_vgic_inject_irq directly in interrupt context. So if we just kick
>> > the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?
> Possibly. I'm slightly worried that inject_irq itself is going to kick
> the vcpu again for no good reason. 
Yes, this will introduce an extra kick. What's the impact of kicking a
kicked vcpu?

> I guess we'll find out (and maybe
> we'll add a kvm_vgic_inject_irq_no_kick_please() helper...).
And add a parameter "bool kick" for vgic_update_irq_pending ?

-- 
Shannon
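
One possible shape of what is being proposed, purely as an illustrative
sketch (kvm_pmc's fields, pmu.irq_num and the function names are assumed):

/* perf overflow callback: runs in IRQ context, so only record state
 * and kick the VCPU; no sleeping calls here. */
static void kvm_pmu_perf_overflow(struct perf_event *event,
				  struct perf_sample_data *data,
				  struct pt_regs *regs)
{
	struct kvm_pmc *pmc = event->overflow_handler_context;
	struct kvm_vcpu *vcpu = pmc->vcpu;

	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= BIT(pmc->idx);
	kvm_vcpu_kick(vcpu);		/* safe from IRQ context */
}

/* on VM entry, in a context that may sleep: derive the line level
 * from the overflow status and inject it. */
static void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
{
	bool level = vcpu_sys_reg(vcpu, PMOVSSET_EL0) != 0;

	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
			    vcpu->arch.pmu.irq_num, level);
}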



[GIT PULL 02/23] KVM: use heuristic for fast VCPU lookup by id

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

Usually, VCPU ids match the array index. So let's try a fast
lookup first before falling back to the slow iteration.
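
Illustratively, assuming ids were handed out sequentially from 0:

/* fast path: kvm->vcpus[2] usually already has vcpu_id == 2 */
struct kvm_vcpu *vcpu = kvm_get_vcpu_by_id(kvm, 2);

/* ids assigned out of order (or with holes) still resolve through
 * the kvm_for_each_vcpu() scan below. */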

Suggested-by: Christian Borntraeger 
Reviewed-by: Dominik Dingel 
Reviewed-by: Christian Borntraeger 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 include/linux/kvm_host.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2911919..a754fc0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -472,6 +472,11 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
struct kvm_vcpu *vcpu;
int i;
 
+   if (id < 0 || id >= KVM_MAX_VCPUS)
+   return NULL;
+   vcpu = kvm_get_vcpu(kvm, id);
+   if (vcpu && vcpu->vcpu_id == id)
+   return vcpu;
kvm_for_each_vcpu(i, vcpu, kvm)
if (vcpu->vcpu_id == id)
return vcpu;
-- 
2.3.0



[GIT PULL 14/23] KVM: s390: we always have a SCA

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

Having no sca can never happen, even when something goes wrong when
switching to ESCA. Otherwise we would have a serious bug.
Let's remove this superfluous check.

Acked-by: Dominik Dingel 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 16c19fb..5c58127 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1608,13 +1608,8 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
	vcpu->arch.sie_block->itdba = (unsigned long) &sie_page->itdb;
 
vcpu->arch.sie_block->icpua = id;
-   if (!kvm_is_ucontrol(kvm)) {
-   if (!kvm->arch.sca) {
-   WARN_ON_ONCE(1);
-   goto out_free_cpu;
-   }
+   if (!kvm_is_ucontrol(kvm))
sca_add_vcpu(vcpu, kvm, id);
-   }
 
	spin_lock_init(&vcpu->arch.local_int.lock);
	vcpu->arch.local_int.float_int = &kvm->arch.float_int;
-- 
2.3.0



Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-02 Thread Marc Zyngier
On 02/12/15 09:27, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 06:51:00PM +, Marc Zyngier wrote:
>> On 01/12/15 15:39, Christoffer Dall wrote:
>>> On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
 KVM so far relies on code patching, and is likely to use it more
 in the future. The main issue is that our alternative system works
 at the instruction level, while we'd like to have alternatives at
 the function level.

 In order to cope with this, add the "hyp_alternate_select" macro that
 outputs a brief sequence of code that in turn can be patched, allowing
 al alternative function to be selected.
>>>
>>> s/al/an/ ?
>>>

 Signed-off-by: Marc Zyngier 
 ---
  arch/arm64/kvm/hyp/hyp.h | 16 
  1 file changed, 16 insertions(+)

 diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
 index 7ac8e11..f0427ee 100644
 --- a/arch/arm64/kvm/hyp/hyp.h
 +++ b/arch/arm64/kvm/hyp/hyp.h
 @@ -27,6 +27,22 @@
  
  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
  
 +/*
 + * Generates patchable code sequences that are used to switch between
 + * two implementations of a function, depending on the availability of
 + * a feature.
 + */
>>>
>>> This looks right to me, but I'm a bit unclear what the types of this is
>>> and how to use it.
>>>
>>> Are orig and alt function pointers and cond is a CONFIG_FOO ?  fname is
>>> a symbol, which is defined as a prototype somewhere and then implemented
>>> here, or?
>>>
>>> Perhaps a Usage: part of the docs would be helpful.
>>
>> How about:
>>
>> @fname: a symbol name that will be defined as a function returning a
>> function pointer whose type will match @orig and @alt
>> @orig: A pointer to the default function, as returned by @fname when
>> @cond doesn't hold
>> @alt: A pointer to the alternate function, as returned by @fname when
>> @cond holds
>> @cond: a CPU feature (as described in asm/cpufeature.h)
> 
> looks good.
> 
>>
>>>
 +#define hyp_alternate_select(fname, orig, alt, cond)			\
 +typeof(orig) * __hyp_text fname(void)					\
 +{									\
 +	typeof(alt) *val = orig;					\
 +	asm volatile(ALTERNATIVE("nop		\n",			\
 +				 "mov	%0, %1	\n",			\
 +				 cond)					\
 +		     : "+r" (val) : "r" (alt));				\
 +	return val;							\
 +}
 +
  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
  
 -- 
 2.1.4

>>>
>>> I haven't thought much about how all of this is implemented, but from my
>>> point of views the ideal situation would be something like:
>>>
>>> void foo(int a, int b)
>>> {
>>> ALTERNATIVE_IF_NOT CONFIG_BAR
>>> foo_legacy(a, b);
>>> ALTERNATIVE_ELSE
>>> foo_new(a, b);
>>> ALTERNATIVE_END
>>> }
>>>
>>> I realize this may be impossible because the C code could implement all
>>> sort of fun stuff around the actual function calls, but would there be
>>> some way to annotate the functions and find the actual branch statement
>>> and change the target?
>>
>> The main issue is that C doesn't give you any access to the branch
>> function itself, except for the asm-goto statements. It also makes it
>> very hard to preserve the return type. For your idea to work, we'd need
>> some support in the compiler itself. I'm sure that it is doable, just
>> not by me! ;-)
> 
> Not by me either, I'm just asking stupid questions - as always.

I don't find that stupid. Asking that kind of stuff is useful to put
things in perspective.

>>
>> This is why I've ended up creating something that returns a function
>> *pointer*, because that's something that exists in the language (no new
>> concept). I simply made sure I could return it at minimal cost.
>>
> 
> I don't have a problem with this either.  I'm curious though, how much
> of a performance improvement (and why) we get from doing this as opposed
> to a simple if-statement?

An if statement will involve fetching some configuration from memory.
You can do that, but you are going to waste a cache line and memory
bandwidth (both which are scarce resources) for something that never
ever changes over the life of the system. These things tend to accumulate.

There is also a small number of cases where you *have* to patch
instructions (think VHE, for example). And having two different ways to
check for things is just asking for trouble in the long run.

M.
-- 
Jazz is not dead. It just smells funny...
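
To make the usage concrete, a hypothetical example of the macro in action
(the function and feature names below are made up for illustration):

static void __hyp_text __save_state_v1(struct kvm_vcpu *vcpu) { /* ... */ }
static void __hyp_text __save_state_v2(struct kvm_vcpu *vcpu) { /* ... */ }

/* Defines __save_state(), a function returning a pointer to one of the
 * two implementations above; the nop/mov gets patched once at boot
 * depending on whether ARM64_HAS_SOME_FEATURE was detected. */
static hyp_alternate_select(__save_state,
			    __save_state_v1, __save_state_v2,
			    ARM64_HAS_SOME_FEATURE);

/* call site: note the double call, first to fetch the pointer */
__save_state()(vcpu);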

Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-02 Thread Marc Zyngier
On 02/12/15 09:49, Shannon Zhao wrote:
> 
> 
> On 2015/12/2 16:45, Marc Zyngier wrote:
>> On 02/12/15 02:40, Shannon Zhao wrote:


 On 2015/12/2 0:57, Marc Zyngier wrote:
>> On 01/12/15 16:26, Shannon Zhao wrote:


 On 2015/12/1 23:41, Marc Zyngier wrote:
 The reason is that when guest clear the overflow register, it will 
 trap
>> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this 
>> moment,
>> the overflow register is still overflowed(that is some bit is 
>> still 1).
>> So We need to use some flag to mark we already inject this 
>> interrupt.
>> And if during guest handling the overflow, there is a new 
>> overflow
>> happening, the pmu->irq_pending will be set ture by
>> kvm_pmu_perf_overflow(), then it needs to inject this new 
>> interrupt, right?
>> I don't think so. This is a level interrupt, so the level should stay
>> high as long as the guest hasn't cleared all possible sources for 
>> that
>> interrupt.
>>
>> For your example, the guest writes to PMOVSCLR to clear the overflow
>> caused by a given counter. If the status is now 0, the interrupt line
>> drops. If the status is still non zero, the line stays high. And I
>> believe that writing a 1 to PMOVSSET would actually trigger an
>> interrupt, or keep it high if it has already high.
>>
 Right, writing 1 to PMOVSSET will trigger an interrupt.

>> In essence, do not try to maintain side state. I've been bitten.

 So on VM entry, it check if PMOVSSET is zero. If not, call 
 kvm_vgic_inject_irq to set the level high. If so, set the level low.
 On VM exit, it seems there is nothing to do.
>>
>> It is even simpler than that:
>>
>> - When you get an overflow, you inject an interrupt with the level set 
>> to 1.
>> - When the overflow register gets cleared, you inject the same interrupt
>> with the level set to 0.
>>
>> I don't think you need to do anything else, and the world switch should
>> be left untouched.
>>

 On 2015/7/17 23:28, Christoffer Dall wrote:
 >> +  kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
 >> +                      pmu->irq_num, 1);
>> what context is this overflow handler function?  kvm_vgic_inject_irq
>> grabs a mutex, so it can sleep...
>>
>> from a quick glance at the perf core code, it looks like this is in
>> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>>

 But as Christoffer said before, it's not good to call
 kvm_vgic_inject_irq directly in interrupt context. So if we just kick
 the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?
>> Possibly. I'm slightly worried that inject_irq itself is going to kick
>> the vcpu again for no good reason. 
> Yes, this will introduce an extra kick. What's the impact of kicking a
> kicked vcpu?

As long as you only kick yourself, it shouldn't be much (trying to
decipher vcpu_kick).

>> I guess we'll find out (and maybe
>> we'll add a kvm_vgic_inject_irq_no_kick_please() helper...).
> And add a parameter "bool kick" for vgic_update_irq_pending ?

Given that we're completely rewriting the thing, I'd rather not add more
hacks to it if we can avoid it.

Give it a go, and we'll find out!

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


[GIT PULL 01/23] KVM: Use common function for VCPU lookup by id

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

Let's reuse the new common function for VCPU lookup by id.

Reviewed-by: Christian Borntraeger 
Reviewed-by: Dominik Dingel 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
[split out the new function into a separate patch]
---
 arch/powerpc/kvm/book3s_hv.c | 10 ++
 arch/s390/kvm/diag.c | 11 +++
 virt/kvm/kvm_main.c  | 12 +---
 3 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 54b45b7..a29da44 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -308,16 +308,10 @@ static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
 
 static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
 {
-   int r;
-   struct kvm_vcpu *v, *ret = NULL;
+   struct kvm_vcpu *ret;
 
	mutex_lock(&kvm->lock);
-   kvm_for_each_vcpu(r, v, kvm) {
-   if (v->vcpu_id == id) {
-   ret = v;
-   break;
-   }
-   }
+   ret = kvm_get_vcpu_by_id(kvm, id);
	mutex_unlock(&kvm->lock);
return ret;
 }
diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 5fbfb88..05f7de9 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -155,10 +155,8 @@ static int __diag_time_slice_end(struct kvm_vcpu *vcpu)
 
 static int __diag_time_slice_end_directed(struct kvm_vcpu *vcpu)
 {
-   struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *tcpu;
int tid;
-   int i;
 
tid = vcpu->run->s.regs.gprs[(vcpu->arch.sie_block->ipa & 0xf0) >> 4];
vcpu->stat.diagnose_9c++;
@@ -167,12 +165,9 @@ static int __diag_time_slice_end_directed(struct kvm_vcpu *vcpu)
if (tid == vcpu->vcpu_id)
return 0;
 
-   kvm_for_each_vcpu(i, tcpu, kvm)
-   if (tcpu->vcpu_id == tid) {
-   kvm_vcpu_yield_to(tcpu);
-   break;
-   }
-
+   tcpu = kvm_get_vcpu_by_id(vcpu->kvm, tid);
+   if (tcpu)
+   kvm_vcpu_yield_to(tcpu);
return 0;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 73cbb41..9649a42 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2257,7 +2257,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 {
int r;
-   struct kvm_vcpu *vcpu, *v;
+   struct kvm_vcpu *vcpu;
 
if (id >= KVM_MAX_VCPUS)
return -EINVAL;
@@ -2281,12 +2281,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
r = -EINVAL;
goto unlock_vcpu_destroy;
}
-
-   kvm_for_each_vcpu(r, v, kvm)
-   if (v->vcpu_id == id) {
-   r = -EEXIST;
-   goto unlock_vcpu_destroy;
-   }
+   if (kvm_get_vcpu_by_id(kvm, id)) {
+   r = -EEXIST;
+   goto unlock_vcpu_destroy;
+   }
 
	BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]);
 
-- 
2.3.0



[GIT PULL 10/23] KVM: s390: Make provisions for ESCA utilization

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

This patch updates the routines (sca_*) to provide transparent access
to and manipulation on the data for both Basic and Extended SCA in use.
The kvm.arch.sca is generalized to (void *) to handle BSCA/ESCA cases.
Also the kvm.arch.use_esca flag is provided.
The actual functionality is kept the same.

Signed-off-by: Eugene (jno) Dvurechenski 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm_host.h |  3 +-
 arch/s390/kvm/interrupt.c| 78 +++-
 arch/s390/kvm/kvm-s390.c | 54 +---
 3 files changed, 106 insertions(+), 29 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 923b13d..25fdbf8 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -620,7 +620,8 @@ struct kvm_s390_crypto_cb {
 };
 
 struct kvm_arch{
-   struct bsca_block *sca;
+   void *sca;
+   int use_esca;
debug_info_t *dbf;
struct kvm_s390_float_interrupt float_int;
struct kvm_device *flic;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index aa221a4..60b36b0 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -37,30 +37,60 @@
 /* handle external calls via sigp interpretation facility */
 static int sca_ext_call_pending(struct kvm_vcpu *vcpu, int *src_id)
 {
-   struct bsca_block *sca = vcpu->kvm->arch.sca;
-   union bsca_sigp_ctrl sigp_ctrl = sca->cpu[vcpu->vcpu_id].sigp_ctrl;
+   int c, scn;
+
+   if (vcpu->kvm->arch.use_esca) {
+   struct esca_block *sca = vcpu->kvm->arch.sca;
+   union esca_sigp_ctrl sigp_ctrl =
+   sca->cpu[vcpu->vcpu_id].sigp_ctrl;
+
+   c = sigp_ctrl.c;
+   scn = sigp_ctrl.scn;
+   } else {
+   struct bsca_block *sca = vcpu->kvm->arch.sca;
+   union bsca_sigp_ctrl sigp_ctrl =
+   sca->cpu[vcpu->vcpu_id].sigp_ctrl;
+
+   c = sigp_ctrl.c;
+   scn = sigp_ctrl.scn;
+   }
 
if (src_id)
-   *src_id = sigp_ctrl.scn;
+   *src_id = scn;
 
-   return sigp_ctrl.c &&
-   atomic_read(&vcpu->arch.sie_block->cpuflags) &
+   return c && atomic_read(&vcpu->arch.sie_block->cpuflags) &
CPUSTAT_ECALL_PEND;
 }
 
 static int sca_inject_ext_call(struct kvm_vcpu *vcpu, int src_id)
 {
int expect, rc;
-   struct bsca_block *sca = vcpu->kvm->arch.sca;
-   union bsca_sigp_ctrl *sigp_ctrl = &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
-   union bsca_sigp_ctrl new_val = {0}, old_val = *sigp_ctrl;
 
-   new_val.scn = src_id;
-   new_val.c = 1;
-   old_val.c = 0;
+   if (vcpu->kvm->arch.use_esca) {
+   struct esca_block *sca = vcpu->kvm->arch.sca;
+   union esca_sigp_ctrl *sigp_ctrl =
+   &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
+   union esca_sigp_ctrl new_val = {0}, old_val = *sigp_ctrl;
 
-   expect = old_val.value;
-   rc = cmpxchg(&sigp_ctrl->value, old_val.value, new_val.value);
+   new_val.scn = src_id;
+   new_val.c = 1;
+   old_val.c = 0;
+
+   expect = old_val.value;
+   rc = cmpxchg(&sigp_ctrl->value, old_val.value, new_val.value);
+   } else {
+   struct bsca_block *sca = vcpu->kvm->arch.sca;
+   union bsca_sigp_ctrl *sigp_ctrl =
+   &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
+   union bsca_sigp_ctrl new_val = {0}, old_val = *sigp_ctrl;
+
+   new_val.scn = src_id;
+   new_val.c = 1;
+   old_val.c = 0;
+
+   expect = old_val.value;
+   rc = cmpxchg(&sigp_ctrl->value, old_val.value, new_val.value);
+   }
 
if (rc != expect) {
/* another external call is pending */
@@ -72,12 +102,28 @@ static int sca_inject_ext_call(struct kvm_vcpu *vcpu, int src_id)
 
 static void sca_clear_ext_call(struct kvm_vcpu *vcpu)
 {
-   struct bsca_block *sca = vcpu->kvm->arch.sca;
	struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-   union bsca_sigp_ctrl *sigp_ctrl = &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
+   int rc, expect;
 
atomic_andnot(CPUSTAT_ECALL_PEND, li->cpuflags);
-   sigp_ctrl->value = 0;
+   if (vcpu->kvm->arch.use_esca) {
+   struct esca_block *sca = vcpu->kvm->arch.sca;
+   union esca_sigp_ctrl *sigp_ctrl =
+   &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
+   union esca_sigp_ctrl old = *sigp_ctrl;
+
+   expect = old.value;
+   rc = cmpxchg(&sigp_ctrl->value, old.value, 0);
+   } else {
+   struct bsca_block *sca = vcpu->kvm->arch.sca;
+   union bsca_sigp_ctrl 

[GIT PULL 17/23] KVM: s390: cleanup sca_add_vcpu

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

Now that we already have kvm and the VCPU id set for the VCPU, we can
convert sca_add_vcpu to look much more like sca_del_vcpu.

Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 7e0092b..d9d71bb 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1276,27 +1276,26 @@ static void sca_del_vcpu(struct kvm_vcpu *vcpu)
	read_unlock(&vcpu->kvm->arch.sca_lock);
 }
 
-static void sca_add_vcpu(struct kvm_vcpu *vcpu, struct kvm *kvm,
-   unsigned int id)
+static void sca_add_vcpu(struct kvm_vcpu *vcpu)
 {
-   read_lock(&kvm->arch.sca_lock);
-   if (kvm->arch.use_esca) {
-   struct esca_block *sca = kvm->arch.sca;
+   read_lock(&vcpu->kvm->arch.sca_lock);
+   if (vcpu->kvm->arch.use_esca) {
+   struct esca_block *sca = vcpu->kvm->arch.sca;
 
-   sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
+   sca->cpu[vcpu->vcpu_id].sda = (__u64) vcpu->arch.sie_block;
vcpu->arch.sie_block->scaoh = (__u32)(((__u64)sca) >> 32);
vcpu->arch.sie_block->scaol = (__u32)(__u64)sca & ~0x3fU;
vcpu->arch.sie_block->ecb2 |= 0x04U;
-   set_bit_inv(id, (unsigned long *) sca->mcn);
+   set_bit_inv(vcpu->vcpu_id, (unsigned long *) sca->mcn);
} else {
-   struct bsca_block *sca = kvm->arch.sca;
+   struct bsca_block *sca = vcpu->kvm->arch.sca;
 
-   sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
+   sca->cpu[vcpu->vcpu_id].sda = (__u64) vcpu->arch.sie_block;
vcpu->arch.sie_block->scaoh = (__u32)(((__u64)sca) >> 32);
vcpu->arch.sie_block->scaol = (__u32)(__u64)sca;
-   set_bit_inv(id, (unsigned long *) &sca->mcn);
+   set_bit_inv(vcpu->vcpu_id, (unsigned long *) &sca->mcn);
}
-   read_unlock(&kvm->arch.sca_lock);
+   read_unlock(&vcpu->kvm->arch.sca_lock);
 }
 
 /* Basic SCA to Extended SCA data copy routines */
@@ -1492,7 +1491,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
	mutex_unlock(&vcpu->kvm->lock);
if (!kvm_is_ucontrol(vcpu->kvm)) {
vcpu->arch.gmap = vcpu->kvm->arch.gmap;
-   sca_add_vcpu(vcpu, vcpu->kvm, vcpu->vcpu_id);
+   sca_add_vcpu(vcpu);
}
 
 }
-- 
2.3.0



[GIT PULL 19/23] s390/sclp: introduce check for SIE

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

This patch adds a way to check if the SIE with zArchitecture support is
available.

Acked-by: Martin Schwidefsky 
Acked-by: Cornelia Huck 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/sclp.h   | 6 +-
 drivers/s390/char/sclp_early.c | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 8324abb..dea883f 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -29,7 +29,10 @@ struct sclp_ipl_info {
 
 struct sclp_core_entry {
u8 core_id;
-   u8 reserved0[2];
+   u8 reserved0;
+   u8 : 4;
+   u8 sief2 : 1;
+   u8 : 3;
u8 : 3;
u8 siif : 1;
u8 sigpif : 1;
@@ -55,6 +58,7 @@ struct sclp_info {
unsigned char has_sprp : 1;
unsigned char has_hvs : 1;
unsigned char has_esca : 1;
+   unsigned char has_sief2 : 1;
unsigned int ibc;
unsigned int mtid;
unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index ff1e1bb..e0a1f4e 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -136,6 +136,7 @@ static void __init sclp_facilities_detect(struct read_info_sccb *sccb)
continue;
sclp.has_siif = cpue->siif;
sclp.has_sigpif = cpue->sigpif;
+   sclp.has_sief2 = cpue->sief2;
break;
}
 
-- 
2.3.0



[GIT PULL 23/23] KVM: s390: remove redundant assignment of error code

2015-12-02 Thread Christian Borntraeger
rc already contains -ENOMEM, no need to assign it twice.

Signed-off-by: Christian Borntraeger 
Acked-by: Cornelia Huck 
Reviewed-by: David Hildenbrand 
---
 arch/s390/kvm/kvm-s390.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 77724ce..6857262 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1618,10 +1618,8 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 */
vcpu->arch.guest_fpregs.fprs = kzalloc(sizeof(freg_t) * __NUM_FPRS,
   GFP_KERNEL);
-   if (!vcpu->arch.guest_fpregs.fprs) {
-   rc = -ENOMEM;
+   if (!vcpu->arch.guest_fpregs.fprs)
goto out_free_sie_block;
-   }
 
rc = kvm_vcpu_init(vcpu, kvm, id);
if (rc)
-- 
2.3.0



[GIT PULL 08/23] KVM: s390: Provide SCA-aware helpers for VCPU add/del

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

This patch provides SCA-aware helpers to create/delete a VCPU.
This is to prepare for upcoming introduction of Extended SCA support.

Signed-off-by: Eugene (jno) Dvurechenski 
Reviewed-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 44 +++-
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5c36c8e..8ddd488 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -283,6 +283,8 @@ static void kvm_s390_sync_dirty_log(struct kvm *kvm,
 }
 
 /* Section: vm related */
+static void sca_del_vcpu(struct kvm_vcpu *vcpu);
+
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
@@ -1189,11 +1191,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
kvm_s390_clear_local_irqs(vcpu);
kvm_clear_async_pf_completion_queue(vcpu);
if (!kvm_is_ucontrol(vcpu->kvm)) {
-   clear_bit(63 - vcpu->vcpu_id,
- (unsigned long *) &vcpu->kvm->arch.sca->mcn);
-   if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
-   (__u64) vcpu->arch.sie_block)
-   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
+   sca_del_vcpu(vcpu);
}
smp_mb();
 
@@ -1249,6 +1247,32 @@ static int __kvm_ucontrol_vcpu_init(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static void sca_del_vcpu(struct kvm_vcpu *vcpu)
+{
+   struct sca_block *sca = vcpu->kvm->arch.sca;
+
+   clear_bit_inv(vcpu->vcpu_id, (unsigned long *) &sca->mcn);
+   if (sca->cpu[vcpu->vcpu_id].sda == (__u64) vcpu->arch.sie_block)
+   sca->cpu[vcpu->vcpu_id].sda = 0;
+}
+
+static void sca_add_vcpu(struct kvm_vcpu *vcpu, struct kvm *kvm,
+   unsigned int id)
+{
+   struct sca_block *sca = kvm->arch.sca;
+
+   if (!sca->cpu[id].sda)
+   sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
+   vcpu->arch.sie_block->scaoh = (__u32)(((__u64)sca) >> 32);
+   vcpu->arch.sie_block->scaol = (__u32)(__u64)sca;
+   set_bit_inv(id, (unsigned long *) &sca->mcn);
+}
+
+static int sca_can_add_vcpu(struct kvm *kvm, unsigned int id)
+{
+   return id < KVM_MAX_VCPUS;
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
vcpu->arch.pfault_token = KVM_S390_PFAULT_TOKEN_INVALID;
@@ -1465,7 +1489,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
struct sie_page *sie_page;
int rc = -EINVAL;
 
-   if (id >= KVM_MAX_VCPUS)
+   if (!sca_can_add_vcpu(kvm, id))
goto out;
 
rc = -ENOMEM;
@@ -1487,13 +1511,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
WARN_ON_ONCE(1);
goto out_free_cpu;
}
-   if (!kvm->arch.sca->cpu[id].sda)
-   kvm->arch.sca->cpu[id].sda =
-   (__u64) vcpu->arch.sie_block;
-   vcpu->arch.sie_block->scaoh =
-   (__u32)(((__u64)kvm->arch.sca) >> 32);
-   vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
-   set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
+   sca_add_vcpu(vcpu, kvm, id);
}
 
	spin_lock_init(&vcpu->arch.local_int.lock);
-- 
2.3.0



[GIT PULL 05/23] s390/sclp: introduce checks for ESCA and HVS

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

Introduce sclp.has_hvs and sclp.has_esca to provide a way for kvm to check
whether the extended-SCA and the home-virtual-SCA facilities are available.

Signed-off-by: Eugene (jno) Dvurechenski 
Reviewed-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/sclp.h   | 2 ++
 drivers/s390/char/sclp_early.c | 7 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/sclp.h b/arch/s390/include/asm/sclp.h
index 821dde5..8324abb 100644
--- a/arch/s390/include/asm/sclp.h
+++ b/arch/s390/include/asm/sclp.h
@@ -53,6 +53,8 @@ struct sclp_info {
unsigned char has_sigpif : 1;
unsigned char has_core_type : 1;
unsigned char has_sprp : 1;
+   unsigned char has_hvs : 1;
+   unsigned char has_esca : 1;
unsigned int ibc;
unsigned int mtid;
unsigned int mtid_cp;
diff --git a/drivers/s390/char/sclp_early.c b/drivers/s390/char/sclp_early.c
index 7bc6df3..ff1e1bb 100644
--- a/drivers/s390/char/sclp_early.c
+++ b/drivers/s390/char/sclp_early.c
@@ -43,7 +43,10 @@ struct read_info_sccb {
u8  _pad_92[100 - 92];  /* 92-99 */
u32 rnsize2;/* 100-103 */
u64 rnmax2; /* 104-111 */
-   u8  _pad_112[120 - 112];/* 112-119 */
+   u8  _pad_112[116 - 112];/* 112-115 */
+   u8  fac116; /* 116 */
+   u8  _pad_117[119 - 117];/* 117-118 */
+   u8  fac119; /* 119 */
u16 hcpua;  /* 120-121 */
u8  _pad_122[4096 - 122];   /* 122-4095 */
 } __packed __aligned(PAGE_SIZE);
@@ -108,6 +111,8 @@ static void __init sclp_facilities_detect(struct read_info_sccb *sccb)
sclp.facilities = sccb->facilities;
sclp.has_sprp = !!(sccb->fac84 & 0x02);
sclp.has_core_type = !!(sccb->fac84 & 0x01);
+   sclp.has_esca = !!(sccb->fac116 & 0x08);
+   sclp.has_hvs = !!(sccb->fac119 & 0x80);
if (sccb->fac85 & 0x02)
S390_lowcore.machine_flags |= MACHINE_FLAG_ESOP;
sclp.rnmax = sccb->rnmax ? sccb->rnmax : sccb->rnmax2;
-- 
2.3.0



[GIT PULL 11/23] KVM: s390: Introduce switching code

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

This patch adds code that performs a transparent switch to the Extended
SCA on addition of the 65th VCPU in a VM. Disposal of the ESCA is added
too. The entire ESCA functionality, however, is still not enabled; the
enablement will be provided in a separate patch.

This patch also uses read/write lock protection of the SCA and its
subfields for possible disposal at the BSCA-to-ESCA transition. While only
the Basic SCA needs such protection (for the swap), any SCA access is now
guarded.
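
For orientation, a rough sketch of the swap itself under the write lock
(shape assumed; details such as per-VCPU scaoh/scaol updates and pausing
all VCPUs are elided):

static int sca_switch_to_extended(struct kvm *kvm)
{
	struct bsca_block *old_sca = kvm->arch.sca;
	struct esca_block *new_sca;

	new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL | __GFP_ZERO);
	if (!new_sca)
		return -ENOMEM;

	write_lock(&kvm->arch.sca_lock);	/* all readers are blocked */
	sca_copy_b_to_e(new_sca, old_sca);	/* copy entries, repoint VCPUs */
	kvm->arch.sca = new_sca;
	kvm->arch.use_esca = 1;
	write_unlock(&kvm->arch.sca_lock);

	free_page((unsigned long)old_sca);
	return 0;
}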

Signed-off-by: Eugene (jno) Dvurechenski 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm_host.h |  1 +
 arch/s390/kvm/gaccess.c  | 30 
 arch/s390/kvm/interrupt.c|  6 
 arch/s390/kvm/kvm-s390.c | 75 ++--
 4 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 25fdbf8..86c3386 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -622,6 +622,7 @@ struct kvm_s390_crypto_cb {
 struct kvm_arch{
void *sca;
int use_esca;
+   rwlock_t sca_lock;
debug_info_t *dbf;
struct kvm_s390_float_interrupt float_int;
struct kvm_device *flic;
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 06f7edb..d30db40 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -259,10 +259,14 @@ struct aste {
 
 int ipte_lock_held(struct kvm_vcpu *vcpu)
 {
-   union ipte_control *ic = kvm_s390_get_ipte_control(vcpu->kvm);
+   if (vcpu->arch.sie_block->eca & 1) {
+   int rc;
 
-   if (vcpu->arch.sie_block->eca & 1)
-   return ic->kh != 0;
+   read_lock(&vcpu->kvm->arch.sca_lock);
+   rc = kvm_s390_get_ipte_control(vcpu->kvm)->kh != 0;
+   read_unlock(&vcpu->kvm->arch.sca_lock);
+   return rc;
+   }
return vcpu->kvm->arch.ipte_lock_count != 0;
 }
 
@@ -274,16 +278,20 @@ static void ipte_lock_simple(struct kvm_vcpu *vcpu)
vcpu->kvm->arch.ipte_lock_count++;
if (vcpu->kvm->arch.ipte_lock_count > 1)
goto out;
+retry:
+   read_lock(&vcpu->kvm->arch.sca_lock);
ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
-   while (old.k) {
+   if (old.k) {
+   read_unlock(&vcpu->kvm->arch.sca_lock);
cond_resched();
-   old = READ_ONCE(*ic);
+   goto retry;
}
new = old;
new.k = 1;
	} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
+   read_unlock(&vcpu->kvm->arch.sca_lock);
 out:
	mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
 }
@@ -296,12 +304,14 @@ static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
vcpu->kvm->arch.ipte_lock_count--;
if (vcpu->kvm->arch.ipte_lock_count)
goto out;
+   read_lock(&vcpu->kvm->arch.sca_lock);
ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
new = old;
new.k = 0;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
+   read_unlock(&vcpu->kvm->arch.sca_lock);
wake_up(&vcpu->kvm->arch.ipte_wq);
 out:
mutex_unlock(&vcpu->kvm->arch.ipte_mutex);
@@ -311,23 +321,28 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)
 {
union ipte_control old, new, *ic;
 
+retry:
+   read_lock(&vcpu->kvm->arch.sca_lock);
ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
-   while (old.kg) {
+   if (old.kg) {
+   read_unlock(&vcpu->kvm->arch.sca_lock);
cond_resched();
-   old = READ_ONCE(*ic);
+   goto retry;
}
new = old;
new.k = 1;
new.kh++;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
+   read_unlock(&vcpu->kvm->arch.sca_lock);
 }
 
 static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
 {
union ipte_control old, new, *ic;
 
+   read_lock(&vcpu->kvm->arch.sca_lock);
ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
@@ -336,6 +351,7 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
if (!new.kh)
new.k = 0;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
+   read_unlock(&vcpu->kvm->arch.sca_lock);
if (!new.kh)
wake_up(&vcpu->kvm->arch.ipte_wq);
 }
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 60b36b0..831c9ac 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -39,6 +39,7 @@ static int sca_ext_call_pending(struct kvm_vcpu *vcpu, int *src_id)

[GIT PULL 15/23] KVM: s390: fix SCA related races and double use

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

If something goes wrong in kvm_arch_vcpu_create, the VCPU has already
been added to the sca but will never be removed. Trying to create VCPUs
with duplicate ids (e.g. after a failed attempt) is problematic.

Also, when creating multiple VCPUs in parallel, we could theoretically
forget to set the correct SCA when the switch to ESCA happens just
before the VCPU is registered.

Let's add the VCPU to the SCA in kvm_arch_vcpu_postcreate, where we can
be sure that no duplicate VCPU with the same id is around and the VCPU
has already been registered at the VM. We also have to make sure to update
ECB at that point.
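
The resulting ordering is then roughly the following (a sketch of the
common-code flow in virt/kvm/kvm_main.c, not literal code):

	kvm_vm_ioctl_create_vcpu(kvm, id)
		kvm_arch_vcpu_create(kvm, id)  /* allocate; no SCA registration */
		/* duplicate-id check; vcpu published to kvm->vcpus */
		kvm_arch_vcpu_postcreate(vcpu)
			sca_add_vcpu()         /* safe: id unique, vcpu registered */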

Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5c58127..2ba5978 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1289,6 +1289,7 @@ static void sca_add_vcpu(struct kvm_vcpu *vcpu, struct kvm *kvm,
sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
vcpu->arch.sie_block->scaoh = (__u32)(((__u64)sca) >> 32);
vcpu->arch.sie_block->scaol = (__u32)(__u64)sca & ~0x3fU;
+   vcpu->arch.sie_block->ecb2 |= 0x04U;
set_bit_inv(id, (unsigned long *) sca->mcn);
} else {
struct bsca_block *sca = kvm->arch.sca;
@@ -1493,8 +1494,11 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->epoch = vcpu->kvm->arch.epoch;
preempt_enable();
mutex_unlock(&vcpu->kvm->lock);
-   if (!kvm_is_ucontrol(vcpu->kvm))
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
vcpu->arch.gmap = vcpu->kvm->arch.gmap;
+   sca_add_vcpu(vcpu, vcpu->kvm, vcpu->vcpu_id);
+   }
+
 }
 
 static void kvm_s390_vcpu_crypto_setup(struct kvm_vcpu *vcpu)
@@ -1558,8 +1562,6 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
vcpu->arch.sie_block->ecb |= 0x10;
 
vcpu->arch.sie_block->ecb2  = 8;
-   if (vcpu->kvm->arch.use_esca)
-   vcpu->arch.sie_block->ecb2 |= 4;
vcpu->arch.sie_block->eca   = 0xC1002000U;
if (sclp.has_siif)
vcpu->arch.sie_block->eca |= 1;
@@ -1608,9 +1610,6 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
vcpu->arch.sie_block->itdba = (unsigned long) &sie_page->itdb;
 
vcpu->arch.sie_block->icpua = id;
-   if (!kvm_is_ucontrol(kvm))
-   sca_add_vcpu(vcpu, kvm, id);
-
spin_lock_init(&vcpu->arch.local_int.lock);
vcpu->arch.local_int.float_int = &kvm->arch.float_int;
vcpu->arch.local_int.wq = &vcpu->wq;
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 21/23] KVM: s390: don't load kvm without virtualization support

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

If we don't have support for virtualization (SIE), e.g. when running under
a hypervisor not supporting execution of the SIE instruction, we should
immediately abort loading the kvm module, as the SIE instruction cannot
be enabled dynamically.

Currently, the SIE instruction fails with an exception on a non-SIE
host, resulting in the guest making no progress, instead of failing hard.

Reviewed-by: Cornelia Huck 
Acked-by: Martin Schwidefsky 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 539d385..49d3319 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2859,6 +2859,11 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 
 static int __init kvm_s390_init(void)
 {
+   if (!sclp.has_sief2) {
+   pr_info("SIE not available\n");
+   return -ENODEV;
+   }
+
return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
 }
 
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 12/23] KVM: s390: Enable up to 248 VCPUs per VM

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

This patch allows s390 to have more than 64 VCPUs for a guest (up to
248 for memory usage considerations), if supported by the underlying
hardware (sclp.has_esca).

Signed-off-by: Eugene (jno) Dvurechenski 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm_host.h | 2 +-
 arch/s390/kvm/kvm-s390.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 86c3386..12e9291 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -27,7 +27,7 @@
 
 #define KVM_S390_BSCA_CPU_SLOTS 64
 #define KVM_S390_ESCA_CPU_SLOTS 248
-#define KVM_MAX_VCPUS KVM_S390_BSCA_CPU_SLOTS
+#define KVM_MAX_VCPUS KVM_S390_ESCA_CPU_SLOTS
 #define KVM_USER_MEM_SLOTS 32
 
 /*
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5e884aa..16c19fb 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -246,7 +246,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
break;
case KVM_CAP_NR_VCPUS:
case KVM_CAP_MAX_VCPUS:
-   r = KVM_MAX_VCPUS;
+   r = sclp.has_esca ? KVM_S390_ESCA_CPU_SLOTS
+ : KVM_S390_BSCA_CPU_SLOTS;
break;
case KVM_CAP_NR_MEMSLOTS:
r = KVM_USER_MEM_SLOTS;
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 03/23] KVM: Remove unnecessary debugfs dentry references

2015-12-02 Thread Christian Borntraeger
From: Janosch Frank 

KVM creates debugfs files to export VM statistics to userland. To be
able to remove them on kvm exit it tracks the files' dentries.

Since their parent directory is also tracked, and since each parent
dentry knows its children, we can easily remove them by using
debugfs_remove_recursive(kvm_debugfs_dir). Therefore we don't
need the extra tracking in kvm_stats_debugfs_item anymore.

Signed-off-by: Janosch Frank 
Reviewed-By: Sascha Silbe 
Acked-by: Christian Borntraeger 
Signed-off-by: Christian Borntraeger 
---
 include/linux/kvm_host.h |  1 -
 virt/kvm/kvm_main.c  | 18 --
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a754fc0..590c46e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1016,7 +1016,6 @@ struct kvm_stats_debugfs_item {
const char *name;
int offset;
enum kvm_stat_kind kind;
-   struct dentry *dentry;
 };
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9649a42..be3cef1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3447,10 +3447,9 @@ static int kvm_init_debug(void)
goto out;
 
for (p = debugfs_entries; p->name; ++p) {
-   p->dentry = debugfs_create_file(p->name, 0444, kvm_debugfs_dir,
-   (void *)(long)p->offset,
-   stat_fops[p->kind]);
-   if (p->dentry == NULL)
+   if (!debugfs_create_file(p->name, 0444, kvm_debugfs_dir,
+(void *)(long)p->offset,
+stat_fops[p->kind]))
goto out_dir;
}
 
@@ -3462,15 +3461,6 @@ out:
return r;
 }
 
-static void kvm_exit_debug(void)
-{
-   struct kvm_stats_debugfs_item *p;
-
-   for (p = debugfs_entries; p->name; ++p)
-   debugfs_remove(p->dentry);
-   debugfs_remove(kvm_debugfs_dir);
-}
-
 static int kvm_suspend(void)
 {
if (kvm_usage_count)
@@ -3628,7 +3618,7 @@ EXPORT_SYMBOL_GPL(kvm_init);
 
 void kvm_exit(void)
 {
-   kvm_exit_debug();
+   debugfs_remove_recursive(kvm_debugfs_dir);
misc_deregister(&kvm_dev);
kmem_cache_destroy(kvm_vcpu_cache);
kvm_async_pf_deinit();
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 06/23] KVM: s390: Generalize access to IPTE controls

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

This patch generalizes access to the IPTE controls, which is a part of SCA.
This is to prepare for upcoming introduction of Extended SCA support.

Signed-off-by: Eugene (jno) Dvurechenski 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/gaccess.c  | 10 +-
 arch/s390/kvm/kvm-s390.h |  5 +
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index a7559f7..06f7edb 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -259,7 +259,7 @@ struct aste {
 
 int ipte_lock_held(struct kvm_vcpu *vcpu)
 {
-   union ipte_control *ic = &vcpu->kvm->arch.sca->ipte_control;
+   union ipte_control *ic = kvm_s390_get_ipte_control(vcpu->kvm);
 
if (vcpu->arch.sie_block->eca & 1)
return ic->kh != 0;
@@ -274,7 +274,7 @@ static void ipte_lock_simple(struct kvm_vcpu *vcpu)
vcpu->kvm->arch.ipte_lock_count++;
if (vcpu->kvm->arch.ipte_lock_count > 1)
goto out;
-   ic = &vcpu->kvm->arch.sca->ipte_control;
+   ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
while (old.k) {
@@ -296,7 +296,7 @@ static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
vcpu->kvm->arch.ipte_lock_count--;
if (vcpu->kvm->arch.ipte_lock_count)
goto out;
-   ic = &vcpu->kvm->arch.sca->ipte_control;
+   ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
new = old;
@@ -311,7 +311,7 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)
 {
union ipte_control old, new, *ic;
 
-   ic = &vcpu->kvm->arch.sca->ipte_control;
+   ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
while (old.kg) {
@@ -328,7 +328,7 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
 {
union ipte_control old, new, *ic;
 
-   ic = &vcpu->kvm->arch.sca->ipte_control;
+   ic = kvm_s390_get_ipte_control(vcpu->kvm);
do {
old = READ_ONCE(*ic);
new = old;
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 1e70e00..844f711 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -340,4 +340,9 @@ void kvm_s390_clear_bp_data(struct kvm_vcpu *vcpu);
 void kvm_s390_prepare_debug_exit(struct kvm_vcpu *vcpu);
 void kvm_s390_handle_per_event(struct kvm_vcpu *vcpu);
 
+/* support for Basic/Extended SCA handling */
+static inline union ipte_control *kvm_s390_get_ipte_control(struct kvm *kvm)
+{
+   return &kvm->arch.sca->ipte_control;
+}
 #endif
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 16/23] KVM: s390: always set/clear the SCA sda field

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

Let's always set and clear the sda when enabling/disabling a VCPU.
Dealing with the sda being set to something else no longer makes sense,
as we now enable a VCPU in the SCA only after it has been registered
with the VM.

Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2ba5978..7e0092b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1266,14 +1266,12 @@ static void sca_del_vcpu(struct kvm_vcpu *vcpu)
struct esca_block *sca = vcpu->kvm->arch.sca;
 
clear_bit_inv(vcpu->vcpu_id, (unsigned long *) sca->mcn);
-   if (sca->cpu[vcpu->vcpu_id].sda == (__u64) vcpu->arch.sie_block)
-   sca->cpu[vcpu->vcpu_id].sda = 0;
+   sca->cpu[vcpu->vcpu_id].sda = 0;
} else {
struct bsca_block *sca = vcpu->kvm->arch.sca;
 
clear_bit_inv(vcpu->vcpu_id, (unsigned long *) &sca->mcn);
-   if (sca->cpu[vcpu->vcpu_id].sda == (__u64) vcpu->arch.sie_block)
-   sca->cpu[vcpu->vcpu_id].sda = 0;
+   sca->cpu[vcpu->vcpu_id].sda = 0;
}
read_unlock(&vcpu->kvm->arch.sca_lock);
 }
@@ -1285,8 +1283,7 @@ static void sca_add_vcpu(struct kvm_vcpu *vcpu, struct kvm *kvm,
if (kvm->arch.use_esca) {
struct esca_block *sca = kvm->arch.sca;
 
-   if (!sca->cpu[id].sda)
-   sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
+   sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
vcpu->arch.sie_block->scaoh = (__u32)(((__u64)sca) >> 32);
vcpu->arch.sie_block->scaol = (__u32)(__u64)sca & ~0x3fU;
vcpu->arch.sie_block->ecb2 |= 0x04U;
@@ -1294,8 +1291,7 @@ static void sca_add_vcpu(struct kvm_vcpu *vcpu, struct kvm *kvm,
} else {
struct bsca_block *sca = kvm->arch.sca;
 
-   if (!sca->cpu[id].sda)
-   sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
+   sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
vcpu->arch.sie_block->scaoh = (__u32)(((__u64)sca) >> 32);
vcpu->arch.sie_block->scaol = (__u32)(__u64)sca;
set_bit_inv(id, (unsigned long *) &sca->mcn);
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 09/23] KVM: s390: Introduce new structures

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

This patch adds new structures and updates some existing ones to
provide the base for Extended SCA functionality.

The old sca_* structures were renamed to bsca_* to keep things uniform.

Access to the fields of the SIGP controls was turned into bitfields
instead of hardcoded bitmasks.
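
For example, reading the source CPU number and the call-pending bit
changes roughly like this (a sketch; the first snippet targets the old
layout, the second the new one):

	/* old: raw byte plus hardcoded masks */
	uint8_t raw = sca->cpu[id].sigp_ctrl;
	int scn = raw & SIGP_CTRL_SCN_MASK;
	int pending = !!(raw & SIGP_CTRL_C);

	/* new: named bitfields via union bsca_sigp_ctrl */
	union bsca_sigp_ctrl ctrl = sca->cpu[id].sigp_ctrl;
	int scn_new = ctrl.scn;
	int pending_new = ctrl.c;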

Signed-off-by: Eugene (jno) Dvurechenski 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm_host.h | 47 +++-
 arch/s390/kvm/interrupt.c| 31 --
 arch/s390/kvm/kvm-s390.c | 14 ++--
 arch/s390/kvm/kvm-s390.h |  4 +++-
 4 files changed, 70 insertions(+), 26 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index efaac2c..923b13d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -25,7 +25,9 @@
 #include 
 #include 
 
-#define KVM_MAX_VCPUS 64
+#define KVM_S390_BSCA_CPU_SLOTS 64
+#define KVM_S390_ESCA_CPU_SLOTS 248
+#define KVM_MAX_VCPUS KVM_S390_BSCA_CPU_SLOTS
 #define KVM_USER_MEM_SLOTS 32
 
 /*
@@ -40,9 +42,34 @@
#define SIGP_CTRL_C    0x80
 #define SIGP_CTRL_SCN_MASK 0x3f
 
-struct sca_entry {
+union bsca_sigp_ctrl {
+   __u8 value;
+   struct {
+   __u8 c : 1;
+   __u8 r : 1;
+   __u8 scn : 6;
+   };
+} __packed;
+
+union esca_sigp_ctrl {
+   __u16 value;
+   struct {
+   __u8 c : 1;
+   __u8 reserved: 7;
+   __u8 scn;
+   };
+} __packed;
+
+struct esca_entry {
+   union esca_sigp_ctrl sigp_ctrl;
+   __u16   reserved1[3];
+   __u64   sda;
+   __u64   reserved2[6];
+} __packed;
+
+struct bsca_entry {
__u8reserved0;
-   __u8sigp_ctrl;
+   union bsca_sigp_ctrlsigp_ctrl;
__u16   reserved[3];
__u64   sda;
__u64   reserved2[2];
@@ -57,14 +84,22 @@ union ipte_control {
};
 };
 
-struct sca_block {
+struct bsca_block {
union ipte_control ipte_control;
__u64   reserved[5];
__u64   mcn;
__u64   reserved2;
-   struct sca_entry cpu[64];
+   struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
 } __attribute__((packed));
 
+struct esca_block {
+   union ipte_control ipte_control;
+   __u64   reserved1[7];
+   __u64   mcn[4];
+   __u64   reserved2[20];
+   struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
+} __packed;
+
#define CPUSTAT_STOPPED    0x80000000
#define CPUSTAT_WAIT       0x10000000
#define CPUSTAT_ECALL_PEND 0x08000000
@@ -585,7 +620,7 @@ struct kvm_s390_crypto_cb {
 };
 
 struct kvm_arch{
-   struct sca_block *sca;
+   struct bsca_block *sca;
debug_info_t *dbf;
struct kvm_s390_float_interrupt float_int;
struct kvm_device *flic;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 2a4718a..aa221a4 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -37,25 +37,32 @@
 /* handle external calls via sigp interpretation facility */
 static int sca_ext_call_pending(struct kvm_vcpu *vcpu, int *src_id)
 {
-   struct sca_block *sca = vcpu->kvm->arch.sca;
-   uint8_t sigp_ctrl = sca->cpu[vcpu->vcpu_id].sigp_ctrl;
+   struct bsca_block *sca = vcpu->kvm->arch.sca;
+   union bsca_sigp_ctrl sigp_ctrl = sca->cpu[vcpu->vcpu_id].sigp_ctrl;
 
if (src_id)
-   *src_id = sigp_ctrl & SIGP_CTRL_SCN_MASK;
+   *src_id = sigp_ctrl.scn;
 
-   return sigp_ctrl & SIGP_CTRL_C &&
+   return sigp_ctrl.c &&
atomic_read(&vcpu->arch.sie_block->cpuflags) &
CPUSTAT_ECALL_PEND;
 }
 
 static int sca_inject_ext_call(struct kvm_vcpu *vcpu, int src_id)
 {
-   struct sca_block *sca = vcpu->kvm->arch.sca;
-   uint8_t *sigp_ctrl = &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
-   uint8_t new_val = SIGP_CTRL_C | (src_id & SIGP_CTRL_SCN_MASK);
-   uint8_t old_val = *sigp_ctrl & ~SIGP_CTRL_C;
+   int expect, rc;
+   struct bsca_block *sca = vcpu->kvm->arch.sca;
+   union bsca_sigp_ctrl *sigp_ctrl = &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
+   union bsca_sigp_ctrl new_val = {0}, old_val = *sigp_ctrl;
 
-   if (cmpxchg(sigp_ctrl, old_val, new_val) != old_val) {
+   new_val.scn = src_id;
+   new_val.c = 1;
+   old_val.c = 0;
+
+   expect = old_val.value;
+   rc = cmpxchg(&sigp_ctrl->value, old_val.value, new_val.value);
+
+   if (rc != expect) {
/* another external call is pending */
return -EBUSY;
}
@@ -65,12 +72,12 @@ static int sca_inject_ext_call(struct kvm_vcpu *vcpu, int src_id)
 
 static void sca_clear_ext_call(struct kvm_vcpu *vcpu)
 {
-   struct sca_block *sca = vcpu->kvm->arch.sca;
+   struct bsca_block *sca = vcpu->kvm->arch.sca;
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;

[GIT PULL 18/23] KVM: s390: don't switch to ESCA for ucontrol

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

sca_add_vcpu is not called for ucontrol guests. We must also not
apply the SCA check in sca_can_add_vcpu, as ucontrol guests
do not have to follow the SCA limits.

As common code already checks that id < KVM_MAX_VCPUS all other
data structures are safe as well.

Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d9d71bb..539d385 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1588,7 +1588,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
struct sie_page *sie_page;
int rc = -EINVAL;
 
-   if (!sca_can_add_vcpu(kvm, id))
+   if (!kvm_is_ucontrol(kvm) && !sca_can_add_vcpu(kvm, id))
goto out;
 
rc = -ENOMEM;
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 04/23] KVM: s390: rewrite vcpu_post_run and drop out early

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

Let's rewrite this function to better reflect how we actually handle
exit_code. By dropping out early we can save a few cycles. This
especially speeds up sie exits caused by host irqs.

Also, let's move the special -EOPNOTSUPP for intercepts to
the place where it belongs and convert it to -EREMOTE.

Reviewed-by: Dominik Dingel 
Reviewed-by: Cornelia Huck 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/intercept.c |  7 +++---
 arch/s390/kvm/kvm-s390.c  | 59 +--
 2 files changed, 24 insertions(+), 42 deletions(-)

diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index b4a5aa1..d53c107 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -54,9 +54,6 @@ void kvm_s390_rewind_psw(struct kvm_vcpu *vcpu, int ilc)
 static int handle_noop(struct kvm_vcpu *vcpu)
 {
switch (vcpu->arch.sie_block->icptcode) {
-   case 0x0:
-   vcpu->stat.exit_null++;
-   break;
case 0x10:
vcpu->stat.exit_external_request++;
break;
@@ -338,8 +335,10 @@ static int handle_partial_execution(struct kvm_vcpu *vcpu)
 
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
+   if (kvm_is_ucontrol(vcpu->kvm))
+   return -EOPNOTSUPP;
+
switch (vcpu->arch.sie_block->icptcode) {
-   case 0x00:
case 0x10:
case 0x18:
return handle_noop(vcpu);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 8465892..5c36c8e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2071,8 +2071,6 @@ static int vcpu_post_run_fault_in_sie(struct kvm_vcpu *vcpu)
 
 static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
 {
-   int rc = -1;
-
VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
   vcpu->arch.sie_block->icptcode);
trace_kvm_s390_sie_exit(vcpu, vcpu->arch.sie_block->icptcode);
@@ -2080,40 +2078,35 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
if (guestdbg_enabled(vcpu))
kvm_s390_restore_guest_per_regs(vcpu);
 
-   if (exit_reason >= 0) {
-   rc = 0;
+   memcpy(&vcpu->run->s.regs.gprs[14], &vcpu->arch.sie_block->gg14, 16);
+
+   if (vcpu->arch.sie_block->icptcode > 0) {
+   int rc = kvm_handle_sie_intercept(vcpu);
+
+   if (rc != -EOPNOTSUPP)
+   return rc;
+   vcpu->run->exit_reason = KVM_EXIT_S390_SIEIC;
+   vcpu->run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode;
+   vcpu->run->s390_sieic.ipa = vcpu->arch.sie_block->ipa;
+   vcpu->run->s390_sieic.ipb = vcpu->arch.sie_block->ipb;
+   return -EREMOTE;
+   } else if (exit_reason != -EFAULT) {
+   vcpu->stat.exit_null++;
+   return 0;
} else if (kvm_is_ucontrol(vcpu->kvm)) {
vcpu->run->exit_reason = KVM_EXIT_S390_UCONTROL;
vcpu->run->s390_ucontrol.trans_exc_code =
current->thread.gmap_addr;
vcpu->run->s390_ucontrol.pgm_code = 0x10;
-   rc = -EREMOTE;
-
+   return -EREMOTE;
} else if (current->thread.gmap_pfault) {
trace_kvm_s390_major_guest_pfault(vcpu);
current->thread.gmap_pfault = 0;
-   if (kvm_arch_setup_async_pf(vcpu)) {
-   rc = 0;
-   } else {
-   gpa_t gpa = current->thread.gmap_addr;
-   rc = kvm_arch_fault_in_page(vcpu, gpa, 1);
-   }
+   if (kvm_arch_setup_async_pf(vcpu))
+   return 0;
+   return kvm_arch_fault_in_page(vcpu, current->thread.gmap_addr, 1);
}
-
-   if (rc == -1)
-   rc = vcpu_post_run_fault_in_sie(vcpu);
-
-   memcpy(&vcpu->run->s.regs.gprs[14], &vcpu->arch.sie_block->gg14, 16);
-
-   if (rc == 0) {
-   if (kvm_is_ucontrol(vcpu->kvm))
-   /* Don't exit for host interrupts. */
-   rc = vcpu->arch.sie_block->icptcode ? -EOPNOTSUPP : 0;
-   else
-   rc = kvm_handle_sie_intercept(vcpu);
-   }
-
-   return rc;
+   return vcpu_post_run_fault_in_sie(vcpu);
 }
 
 static int __vcpu_run(struct kvm_vcpu *vcpu)
@@ -2233,18 +2226,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
rc = 0;
}
 
-   if (rc == -EOPNOTSUPP) {
-   /* intercept cannot be handled in-kernel, prepare kvm-run */
-   kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
-   kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode;
-   

[GIT PULL 07/23] KVM: s390: Generalize access to SIGP controls

2015-12-02 Thread Christian Borntraeger
From: "Eugene (jno) Dvurechenski" 

This patch generalizes access to the SIGP controls, which is a part of SCA.
This is to prepare for upcoming introduction of Extended SCA support.

Signed-off-by: Eugene (jno) Dvurechenski 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/interrupt.c | 72 +--
 1 file changed, 45 insertions(+), 27 deletions(-)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 6a75352..2a4718a 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -34,6 +34,45 @@
 #define PFAULT_DONE 0x0680
 #define VIRTIO_PARAM 0x0d00
 
+/* handle external calls via sigp interpretation facility */
+static int sca_ext_call_pending(struct kvm_vcpu *vcpu, int *src_id)
+{
+   struct sca_block *sca = vcpu->kvm->arch.sca;
+   uint8_t sigp_ctrl = sca->cpu[vcpu->vcpu_id].sigp_ctrl;
+
+   if (src_id)
+   *src_id = sigp_ctrl & SIGP_CTRL_SCN_MASK;
+
+   return sigp_ctrl & SIGP_CTRL_C &&
+   atomic_read(&vcpu->arch.sie_block->cpuflags) &
+   CPUSTAT_ECALL_PEND;
+}
+
+static int sca_inject_ext_call(struct kvm_vcpu *vcpu, int src_id)
+{
+   struct sca_block *sca = vcpu->kvm->arch.sca;
+   uint8_t *sigp_ctrl = &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
+   uint8_t new_val = SIGP_CTRL_C | (src_id & SIGP_CTRL_SCN_MASK);
+   uint8_t old_val = *sigp_ctrl & ~SIGP_CTRL_C;
+
+   if (cmpxchg(sigp_ctrl, old_val, new_val) != old_val) {
+   /* another external call is pending */
+   return -EBUSY;
+   }
+   atomic_or(CPUSTAT_ECALL_PEND, &vcpu->arch.sie_block->cpuflags);
+   return 0;
+}
+
+static void sca_clear_ext_call(struct kvm_vcpu *vcpu)
+{
+   struct sca_block *sca = vcpu->kvm->arch.sca;
+   struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
+   uint8_t *sigp_ctrl = &(sca->cpu[vcpu->vcpu_id].sigp_ctrl);
+
+   atomic_andnot(CPUSTAT_ECALL_PEND, li->cpuflags);
+   *sigp_ctrl = 0;
+}
+
 int psw_extint_disabled(struct kvm_vcpu *vcpu)
 {
return !(vcpu->arch.sie_block->gpsw.mask & PSW_MASK_EXT);
@@ -792,13 +831,11 @@ static const deliver_irq_t deliver_irq_funcs[] = {
 int kvm_s390_ext_call_pending(struct kvm_vcpu *vcpu)
 {
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
-   uint8_t sigp_ctrl = vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sigp_ctrl;
 
if (!sclp.has_sigpif)
return test_bit(IRQ_PEND_EXT_EXTERNAL, &li->pending_irqs);
 
-   return (sigp_ctrl & SIGP_CTRL_C) &&
-  (atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_ECALL_PEND);
+   return sca_ext_call_pending(vcpu, NULL);
 }
 
 int kvm_s390_vcpu_has_irq(struct kvm_vcpu *vcpu, int exclude_stop)
@@ -909,9 +946,7 @@ void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu)
memset(&li->irq, 0, sizeof(li->irq));
spin_unlock(&li->lock);
 
-   /* clear pending external calls set by sigp interpretation facility */
-   atomic_andnot(CPUSTAT_ECALL_PEND, li->cpuflags);
-   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sigp_ctrl = 0;
+   sca_clear_ext_call(vcpu);
 }
 
 int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu)
@@ -1003,21 +1038,6 @@ static int __inject_pfault_init(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
return 0;
 }
 
-static int __inject_extcall_sigpif(struct kvm_vcpu *vcpu, uint16_t src_id)
-{
-   unsigned char new_val, old_val;
-   uint8_t *sigp_ctrl = &vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sigp_ctrl;
-
-   new_val = SIGP_CTRL_C | (src_id & SIGP_CTRL_SCN_MASK);
-   old_val = *sigp_ctrl & ~SIGP_CTRL_C;
-   if (cmpxchg(sigp_ctrl, old_val, new_val) != old_val) {
-   /* another external call is pending */
-   return -EBUSY;
-   }
-   atomic_or(CPUSTAT_ECALL_PEND, &vcpu->arch.sie_block->cpuflags);
-   return 0;
-}
-
 static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
 {
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
@@ -1034,7 +1054,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct 
kvm_s390_irq *irq)
return -EINVAL;
 
if (sclp.has_sigpif)
-   return __inject_extcall_sigpif(vcpu, src_id);
+   return sca_inject_ext_call(vcpu, src_id);
 
if (test_and_set_bit(IRQ_PEND_EXT_EXTERNAL, &li->pending_irqs))
return -EBUSY;
@@ -2203,7 +2223,7 @@ static void store_local_irq(struct kvm_s390_local_interrupt *li,
 
 int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len)
 {
-   uint8_t sigp_ctrl = vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sigp_ctrl;
+   int scn;
unsigned long sigp_emerg_pending[BITS_TO_LONGS(KVM_MAX_VCPUS)];
struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int;
unsigned long pending_irqs;
@@ -2243,14 +2263,12 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len)

[GIT PULL 00/23] KVM: s390 features, kvm_get_vcpu_by_id and stat for 4.5

2015-12-02 Thread Christian Borntraeger
Paolo,

here is the first s390 pull request for 4.5. It also contains the
remaining vcpu lookup changes and an improved cleanup of the kvm_stat
exit path.
I have deferred the kvm_stat per VM patches.

The s390 changes are:
- ESCA support (up to 248 CPUs)
- detection if KVM works (e.g. for nested virtualization)
- cleanups

The following changes since commit bb11c6c96544737aede6a2eb92e5c6bc8b46534b:

  KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() (2015-11-26 15:31:36 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-4.5-1

for you to fetch changes up to 2f8a43d45d14ad62b105ed99151b453c12df7149:

  KVM: s390: remove redundant assignment of error code (2015-11-30 12:47:13 +0100)


KVM: s390 features, kvm_get_vcpu_by_id and stat

Several features for s390
1. ESCA support (up to 248 vCPUs)
2. KVM detection: we can now detect if we support KVM (e.g. does KVM
   under KVM work?)

kvm_stat:
1. cleanup

kvm_get_vcpu_by_id:
1. Use kvm_get_vcpu_by_id where appropriate
2. Apply a heuristic to optimize for ID VCPU == No. VCPU


Christian Borntraeger (1):
  KVM: s390: remove redundant assignment of error code

David Hildenbrand (12):
  KVM: Use common function for VCPU lookup by id
  KVM: use heuristic for fast VCPU lookup by id
  KVM: s390: rewrite vcpu_post_run and drop out early
  KVM: s390: fast path for sca_ext_call_pending
  KVM: s390: we always have a SCA
  KVM: s390: fix SCA related races and double use
  KVM: s390: always set/clear the SCA sda field
  KVM: s390: cleanup sca_add_vcpu
  KVM: s390: don't switch to ESCA for ucontrol
  s390/sclp: introduce check for SIE
  s390: show virtualization support in /proc/cpuinfo
  KVM: s390: don't load kvm without virtualization support

Eugene (jno) Dvurechenski (8):
  s390/sclp: introduce checks for ESCA and HVS
  KVM: s390: Generalize access to IPTE controls
  KVM: s390: Generalize access to SIGP controls
  KVM: s390: Provide SCA-aware helpers for VCPU add/del
  KVM: s390: Introduce new structures
  KVM: s390: Make provisions for ESCA utilization
  KVM: s390: Introduce switching code
  KVM: s390: Enable up to 248 VCPUs per VM

Heiko Carstens (1):
  KVM: s390: remove pointless test_facility(2) check

Janosch Frank (1):
  KVM: Remove unnecessary debugfs dentry references

 arch/powerpc/kvm/book3s_hv.c |  10 +-
 arch/s390/include/asm/elf.h  |   7 ++
 arch/s390/include/asm/kvm_host.h |  49 +++-
 arch/s390/include/asm/sclp.h |   8 +-
 arch/s390/kernel/processor.c |   6 +
 arch/s390/kernel/setup.c |   9 ++
 arch/s390/kvm/diag.c |  11 +-
 arch/s390/kvm/gaccess.c  |  38 +--
 arch/s390/kvm/intercept.c|   7 +-
 arch/s390/kvm/interrupt.c| 133 +-
 arch/s390/kvm/kvm-s390.c | 237 +++
 arch/s390/kvm/kvm-s390.h |   7 ++
 drivers/s390/char/sclp_early.c   |   8 +-
 include/linux/kvm_host.h |   6 +-
 virt/kvm/kvm_main.c  |  30 ++---
 15 files changed, 407 insertions(+), 159 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 22/23] KVM: s390: remove pointless test_facility(2) check

2015-12-02 Thread Christian Borntraeger
From: Heiko Carstens 

This always evaluates to 'true'.

Signed-off-by: Heiko Carstens 
Reviewed-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 49d3319..77724ce 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1027,7 +1027,7 @@ static int kvm_s390_apxa_installed(void)
u8 config[128];
int cc;
 
-   if (test_facility(2) && test_facility(12)) {
+   if (test_facility(12)) {
cc = kvm_s390_query_ap_config(config);
 
if (cc)
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 13/23] KVM: s390: fast path for sca_ext_call_pending

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

If CPUSTAT_ECALL_PEND isn't set, we can't have an external call pending,
so we can directly avoid taking the lock.

Acked-by: Christian Borntraeger 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/interrupt.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 831c9ac..62ec925 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -39,6 +39,9 @@ static int sca_ext_call_pending(struct kvm_vcpu *vcpu, int *src_id)
 {
int c, scn;
 
+   if (!(atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_ECALL_PEND))
+   return 0;
+
read_lock(&vcpu->kvm->arch.sca_lock);
if (vcpu->kvm->arch.use_esca) {
struct esca_block *sca = vcpu->kvm->arch.sca;
@@ -60,8 +63,7 @@ static int sca_ext_call_pending(struct kvm_vcpu *vcpu, int *src_id)
if (src_id)
*src_id = scn;
 
-   return c && atomic_read(&vcpu->arch.sie_block->cpuflags) &
-   CPUSTAT_ECALL_PEND;
+   return c;
 }
 
 static int sca_inject_ext_call(struct kvm_vcpu *vcpu, int src_id)
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 20/23] s390: show virtualization support in /proc/cpuinfo

2015-12-02 Thread Christian Borntraeger
From: David Hildenbrand 

This patch exposes the SIE capability (aka virtualization support) via
/proc/cpuinfo -> "features" as "sie".

As we don't want to expose this hwcap via elf, let's add a second,
"internal"/non-elf capability list. The content is simply concatenated
to the existing features when printing /proc/cpuinfo.

We also add the defines to elf.h to keep the hwcap stuff at a common
place.
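
On a SIE-capable host, the features line would then read something like
this (illustrative output; the feature set depends on the machine):

	features : esan3 zarch stfle msa ldisp eimm dfp edat etf3eh highgprs te vx sie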

Acked-by: Martin Schwidefsky 
Signed-off-by: David Hildenbrand 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/elf.h  | 7 +++
 arch/s390/kernel/processor.c | 6 ++
 arch/s390/kernel/setup.c | 9 +
 3 files changed, 22 insertions(+)

diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index bab6739..08e34a5 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -104,6 +104,9 @@
 #define HWCAP_S390_TE  1024
#define HWCAP_S390_VXRS    2048
 
+/* Internal bits, not exposed via elf */
+#define HWCAP_INT_SIE  1UL
+
 /*
  * These are used to set parameters in the core dumps.
  */
@@ -169,6 +172,10 @@ extern unsigned int vdso_enabled;
 extern unsigned long elf_hwcap;
 #define ELF_HWCAP (elf_hwcap)
 
+/* Internal hardware capabilities, not exposed via elf */
+
+extern unsigned long int_hwcap;
+
 /* This yields a string that ld.so will use to load implementation
specific libraries for optimization.  This is more specific in
intent than poking at uname or /proc/cpuinfo.
diff --git a/arch/s390/kernel/processor.c b/arch/s390/kernel/processor.c
index 7ce00e7..647128d 100644
--- a/arch/s390/kernel/processor.c
+++ b/arch/s390/kernel/processor.c
@@ -61,6 +61,9 @@ static int show_cpuinfo(struct seq_file *m, void *v)
"esan3", "zarch", "stfle", "msa", "ldisp", "eimm", "dfp",
"edat", "etf3eh", "highgprs", "te", "vx"
};
+   static const char * const int_hwcap_str[] = {
+   "sie"
+   };
unsigned long n = (unsigned long) v - 1;
int i;
 
@@ -75,6 +78,9 @@ static int show_cpuinfo(struct seq_file *m, void *v)
for (i = 0; i < ARRAY_SIZE(hwcap_str); i++)
if (hwcap_str[i] && (elf_hwcap & (1UL << i)))
seq_printf(m, "%s ", hwcap_str[i]);
+   for (i = 0; i < ARRAY_SIZE(int_hwcap_str); i++)
+   if (int_hwcap_str[i] && (int_hwcap & (1UL << i)))
+   seq_printf(m, "%s ", int_hwcap_str[i]);
seq_puts(m, "\n");
show_cacheinfo(m);
}
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index c837bca..dc83ae6 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -80,6 +80,8 @@ EXPORT_SYMBOL(console_irq);
 unsigned long elf_hwcap __read_mostly = 0;
 char elf_platform[ELF_PLATFORM_SIZE];
 
+unsigned long int_hwcap = 0;
+
 int __initdata memory_end_set;
 unsigned long __initdata memory_end;
 unsigned long __initdata max_physmem_end;
@@ -793,6 +795,13 @@ static int __init setup_hwcaps(void)
strcpy(elf_platform, "z13");
break;
}
+
+   /*
+* Virtualization support HWCAP_INT_SIE is bit 0.
+*/
+   if (sclp.has_sief2)
+   int_hwcap |= HWCAP_INT_SIE;
+
return 0;
 }
 arch_initcall(setup_hwcaps);
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-02 Thread Michael S. Tsirkin
On Tue, Dec 01, 2015 at 10:36:33AM -0800, Alexander Duyck wrote:
> On Tue, Dec 1, 2015 at 9:37 AM, Michael S. Tsirkin  wrote:
> > On Tue, Dec 01, 2015 at 09:04:32AM -0800, Alexander Duyck wrote:
> >> On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin  wrote:
> 
> >> > There are several components to this:
> >> > - dma_map_* needs to prevent page from
> >> >   being migrated while device is running.
> >> >   For example, expose some kind of bitmap from guest
> >> >   to host, set bit there while page is mapped.
> >> >   What happens if we stop the guest and some
> >> >   bits are still set? See dma_alloc_coherent below
> >> >   for some ideas.
> >>
> >> Yeah, I could see something like this working.  Maybe we could do
> >> something like what was done for the NX bit and make use of the upper
> >> order bits beyond the limits of the memory range to mark pages as
> >> non-migratable?
> >>
> >> I'm curious.  What we have with a DMA mapped region is essentially
> >> shared memory between the guest and the device.  How would we resolve
> >> something like this with IVSHMEM, or are we blocked there as well in
> >> terms of migration?
> >
> > I have some ideas. Will post later.
> 
> I look forward to it.
> 
> >> > - dma_unmap_* needs to mark page as dirty
> >> >   This can be done by writing into a page.
> >> >
> >> > - dma_sync_* needs to mark page as dirty
> >> >   This is trickier as we can not change the data.
> >> >   One solution is using atomics.
> >> >   For example:
> >> > int x = ACCESS_ONCE(*p);
> >> > cmpxchg(p, x, x);
> >> >   Seems to do a write without changing page
> >> >   contents.
> >>
> >> Like I said we can probably kill 2 birds with one stone by just
> >> implementing our own dma_mark_clean() for x86 virtualized
> >> environments.
> >>
> >> I'd say we could take your solution one step further and just use 0
> >> instead of bothering to read the value.  After all it won't write the
> >> area if the value at the offset is not 0.
> >
> > Really almost any atomic that has no side effect will do.
> > atomic or with 0
> > atomic and with 
> >
> > It's just that cmpxchg already happens to have a portable
> > wrapper.
> 
> I was originally thinking maybe an atomic_add with 0 would be the way
> to go.

cmpxchg with any value too.
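
To make that concrete, the helper could look roughly like this (a sketch
for a kernel context; dma_mark_dirty is a made-up name):

	/*
	 * Dirty the page backing addr without changing its contents.
	 * A locked cmpxchg performs a write access even when the compare
	 * fails, so the page is marked dirty either way.
	 */
	static inline void dma_mark_dirty(unsigned long *addr)
	{
		unsigned long x = READ_ONCE(*addr);

		cmpxchg(addr, x, x);
	}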

>  Either way though we still are using a locked prefix and
> having to dirty a cache line per page which is going to come at some
> cost.

I agree. It's likely not necessary for everyone
to be doing this: only people that both
run within the VM and want migration to work
need to do this logging.

So set some module option to have driver tell hypervisor that it
supports logging.  If bus mastering is enabled before this, migration is
blocked.  Or even pass some flag from hypervisor so
driver can detect it needs to log writes.
I guess this could be put in device config somewhere,
though in practice it's a global thing, not a per device one, so
maybe we need some new channel to
pass this flag to guest. CPUID?
Or maybe we can put some kind of agent in the initrd
and use the existing guest agent channel after all.
An agent in the initrd could open up a lot of new possibilities.


> >> > - dma_alloc_coherent memory (e.g. device rings)
> >> >   must be migrated after device stopped modifying it.
> >> >   Just stopping the VCPU is not enough:
> >> >   you must make sure device is not changing it.
> >> >
> >> >   Or maybe the device has some kind of ring flush operation,
> >> >   if there was a reasonably portable way to do this
> >> >   (e.g. a flush capability could maybe be added to SRIOV)
> >> >   then hypervisor could do this.
> >>
> >> This is where things start to get messy. I was suggesting the
> >> suspend/resume to resolve this bit, but it might be possible to also
> >> deal with this via something like this via clearing the bus master
> >> enable bit for the VF.  If I am not mistaken that should disable MSI-X
> >> interrupts and halt any DMA.  That should work as long as you have
> >> some mechanism that is tracking the pages in use for DMA.
> >
> > A bigger issue is recovering afterwards.
> 
> Agreed.
> 
> >> >   In case you need to resume on source, you
> >> >   really need to follow the same path
> >> >   as on destination, preferably detecting
> >> >   device reset and restoring the device
> >> >   state.
> >>
> >> The problem with detecting the reset is that you would likely have to
> >> be polling to do something like that.
> >
> > We could some event to guest to notify it about this event
> > through a new or existing channel.
> >
> > Or we could make it possible for userspace to trigger this,
> > then notify guest through the guest agent.
> 
> The first thing that comes to mind would be to use something like PCIe
> Advanced Error Reporting, however I don't know if we can put a
> requirement on the system supporting the q35 machine type or not in
> order to support migration.

You mean require pci express? This 

Re: [PATCH v2 16/21] arm64: KVM: Add compatibility aliases

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:10PM +, Marc Zyngier wrote:
> So far, we've implemented the new world switch with a completely
> different namespace, so that we could have both implementations
> compiled in.
> 
> Let's take things one step further by adding weak aliases that
> have the same names as the original implementation. The weak
> attribute allows the new implementation to be overridden by the
> old one, and everything still works.

Do I understand correctly that the whole point of this is to keep
everything compiling nicely while at the same time being able to split
the patches so that you can have an isolated "remove old code" patch
that doesn't have to change the callers?

If so, I think explaining this rationale would be helpful in the commit
message in case we have to go back and track these changes in connection
with a regression and don't remember why we did things this way.

Maybe I'm being over-cautious though...
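
For readers unfamiliar with the construct, the pattern in isolation looks
something like this (a standalone sketch, not the KVM code):

	/* new implementation, under its own name */
	static int __new_impl(void)
	{
		return 1;
	}

	/*
	 * Weak alias under the legacy name: a strong (non-weak) definition
	 * of legacy_name elsewhere -- i.e. the old implementation -- wins
	 * at link time.
	 */
	__attribute__((alias("__new_impl"), weak))
	int legacy_name(void);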

Otherwise:

Acked-by: Christoffer Dall 

> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/debug-sr.c   | 3 +++
>  arch/arm64/kvm/hyp/hyp-entry.S  | 3 +++
>  arch/arm64/kvm/hyp/switch.c | 3 +++
>  arch/arm64/kvm/hyp/tlb.c| 9 +
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 3 +++
>  5 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index a0b2b99..afd0a53 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -128,3 +128,6 @@ u32 __hyp_text __debug_read_mdcr_el2(void)
>  {
>   return read_sysreg(mdcr_el2);
>  }
> +
> +__alias(__debug_read_mdcr_el2)
> +u32 __weak __kvm_get_mdcr_el2(void);
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index 39d6935..ace919b 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -184,6 +184,8 @@ ENDPROC(\label)
>  
>   .align 11
>  
> + .weak   __kvm_hyp_vector
> +ENTRY(__kvm_hyp_vector)
>  ENTRY(__hyp_vector)
>   ventry  el2t_sync_invalid   // Synchronous EL2t
>   ventry  el2t_irq_invalid// IRQ EL2t
> @@ -205,3 +207,4 @@ ENTRY(__hyp_vector)
>   ventry  el1_fiq_invalid // FIQ 32-bit EL1
>   ventry  el1_error_invalid   // Error 32-bit EL1
>  ENDPROC(__hyp_vector)
> +ENDPROC(__kvm_hyp_vector)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 7b81089..c8ba370 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -141,6 +141,9 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>   return exit_code;
>  }
>  
> +__alias(__guest_run)
> +int __weak __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +
>  static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
>  
>  void __hyp_text __noreturn __hyp_panic(void)
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index d4a07d0..2c279a8 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -47,6 +47,9 @@ void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, 
> phys_addr_t ipa)
>   write_sysreg(0, vttbr_el2);
>  }
>  
> +__alias(__tlb_flush_vmid_ipa)
> +void __weak __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +
>  void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
>  {
>   dsb(ishst);
> @@ -63,6 +66,9 @@ void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
>   write_sysreg(0, vttbr_el2);
>  }
>  
> +__alias(__tlb_flush_vmid)
> +void __weak __kvm_tlb_flush_vmid(struct kvm *kvm);
> +
>  void __hyp_text __tlb_flush_vm_context(void)
>  {
>   dsb(ishst);
> @@ -70,3 +76,6 @@ void __hyp_text __tlb_flush_vm_context(void)
>"ic ialluis  ": : );
>   dsb(ish);
>  }
> +
> +__alias(__tlb_flush_vm_context)
> +void __weak __kvm_flush_vm_context(void);
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> index b490db5..1b0eedb 100644
> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -220,3 +220,6 @@ u64 __hyp_text __vgic_v3_read_ich_vtr_el2(void)
>  {
>   return read_gicreg(ICH_VTR_EL2);
>  }
> +
> +__alias(__vgic_v3_read_ich_vtr_el2)
> +u64 __weak __vgic_v3_get_ich_vtr_el2(void);
> -- 
> 2.1.4
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 12/21] arm64: KVM: Implement fpsimd save/restore

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:06PM +, Marc Zyngier wrote:
> Implement the fpsimd save/restore, keeping the lazy part in
> assembler (as returning to C would be overkill).
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile|  1 +
>  arch/arm64/kvm/hyp/entry.S | 32 +++-
>  arch/arm64/kvm/hyp/fpsimd.S| 33 +
>  arch/arm64/kvm/hyp/hyp.h   |  7 +++
>  arch/arm64/kvm/hyp/switch.c|  8 
>  arch/arm64/kvm/hyp/sysreg-sr.c |  2 +-
>  6 files changed, 81 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/fpsimd.S
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 9c11b0f..56238d0 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += entry.o
>  obj-$(CONFIG_KVM_ARM_HOST) += switch.o
> +obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 2c4449a..7552922 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -27,6 +27,7 @@
>  
>  #define CPU_GP_REG_OFFSET(x) (CPU_GP_REGS + x)
>  #define CPU_XREG_OFFSET(x)   CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> +#define CPU_SYSREG_OFFSET(x) (CPU_SYSREGS + 8*x)
>  
>   .text
>   .pushsection .hyp.text, "ax"
> @@ -152,4 +153,33 @@ ENTRY(__guest_exit)
>   ret
>  ENDPROC(__guest_exit)
>  
> - /* Insert fault handling here */
> +ENTRY(__fpsimd_guest_restore)
> + push x4, lr
> +
> + mrs x2, cptr_el2
> + bic x2, x2, #CPTR_EL2_TFP
> + msr cptr_el2, x2
> + isb
> +
> + mrs x3, tpidr_el2
> +
> + ldr x0, [x3, #VCPU_HOST_CONTEXT]
> + kern_hyp_va x0
> + add x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> + bl  __fpsimd_save_state
> +
> + add x2, x3, #VCPU_CONTEXT
> + add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> + bl  __fpsimd_restore_state
> +
> + mrs x1, hcr_el2
> + tbnz x1, #HCR_RW_SHIFT, 1f

nit: Add a comment along the lines of:
// Skip restoring fpexc32 for AArch64 guests

> + ldr x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
> + msr fpexc32_el2, x4
> +1:
> + pop x4, lr
> + pop x2, x3
> + pop x0, x1
> +
> + eret
> +ENDPROC(__fpsimd_guest_restore)
> diff --git a/arch/arm64/kvm/hyp/fpsimd.S b/arch/arm64/kvm/hyp/fpsimd.S
> new file mode 100644
> index 000..da3f22c
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/fpsimd.S
> @@ -0,0 +1,33 @@
> +/*
> + * Copyright (C) 2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +
> +#include 
> +
> + .text
> + .pushsection .hyp.text, "ax"
> +
> +ENTRY(__fpsimd_save_state)
> + fpsimd_save x0, 1
> + ret
> +ENDPROC(__fpsimd_save_state)
> +
> +ENTRY(__fpsimd_restore_state)
> + fpsimd_restore  x0, 1
> + ret
> +ENDPROC(__fpsimd_restore_state)
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index f0427ee..18365dd 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -66,6 +66,13 @@ void __debug_restore_state(struct kvm_vcpu *vcpu,
>  void __debug_cond_save_host_state(struct kvm_vcpu *vcpu);
>  void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu);
>  
> +void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
> +void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
> +static inline bool __fpsimd_enabled(void)
> +{
> + return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP);
> +}
> +
>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>  
>  #endif /* __ARM64_KVM_HYP_H__ */
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d67ed9e..8affc19 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -88,6 +88,7 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_cpu_context *host_ctxt;
>   struct kvm_cpu_context *guest_ctxt;
> + bool fp_enabled;
>   u64 exit_code;
>  
>   vcpu = kern_hyp_va(vcpu);
> @@ -117,6 +118,8 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>   exit_code = __guest_enter(vcpu, 

Re: [PATCH v2 13/21] arm64: KVM: Implement TLB handling

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:07PM +, Marc Zyngier wrote:
> Implement the TLB handling as a direct translation of the assembly
> code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |  1 +
>  arch/arm64/kvm/hyp/tlb.c| 72 
> +
>  2 files changed, 73 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/tlb.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 56238d0..1a529f5 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -10,3 +10,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += entry.o
>  obj-$(CONFIG_KVM_ARM_HOST) += switch.o
>  obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
> +obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> new file mode 100644
> index 000..d4a07d0
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -0,0 +1,72 @@
> +/*
> + * Copyright (C) 2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include "hyp.h"
> +
> +void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> +{
> + dsb(ishst);
> +
> + /* Switch to requested VMID */
> + kvm = kern_hyp_va(kvm);
> + write_sysreg(kvm->arch.vttbr, vttbr_el2);
> + isb();
> +
> + /*
> +  * We could do so much better if we had the VA as well.
> +  * Instead, we invalidate Stage-2 for this IPA, and the
> +  * whole of Stage-1. Weep...
> +  */
> + ipa >>= 12;
> + asm volatile("tlbi ipas2e1is, %0" : : "r" (ipa));
> + dsb(ish);

nit: missing white space

> + /*
> +  * We have to ensure completion of the invalidation at Stage-2,
> +  * since a table walk on another CPU could refill a TLB with a
> +  * complete (S1 + S2) walk based on the old Stage-2 mapping if
> +  * the Stage-1 invalidation happened first.
> +  */

nit: isn't that comment targeting the dsb(ish) above as in the asm code
and should be moved above that line?

> + asm volatile("tlbi vmalle1is" : : );
> + dsb(ish);
> + isb();
> +
> + write_sysreg(0, vttbr_el2);
> +}
> +
> +void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
> +{
> + dsb(ishst);
> +
> + /* Switch to requested VMID */
> + kvm = kern_hyp_va(kvm);
> + write_sysreg(kvm->arch.vttbr, vttbr_el2);
> + isb();
> +
> + asm volatile("tlbi vmalls12e1is" : : );
> + dsb(ish);
> + isb();
> +
> + write_sysreg(0, vttbr_el2);
> +}
> +
> +void __hyp_text __tlb_flush_vm_context(void)
> +{
> + dsb(ishst);
> + asm volatile("tlbi alle1is  \n"
> +  "ic ialluis  ": : );
> + dsb(ish);
> +}
> -- 
> 2.1.4
> 

Otherwise:

Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/21] arm64: KVM: Add panic handling

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:09PM +, Marc Zyngier wrote:
> Add the panic handler, together with the small bits of assembly
> code to call the kernel's panic implementation.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/hyp-entry.S | 11 ++-
>  arch/arm64/kvm/hyp/hyp.h   |  1 +
>  arch/arm64/kvm/hyp/switch.c| 30 ++
>  3 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index 8334407..39d6935 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -150,7 +150,16 @@ el1_irq:
>   mov x1, #ARM_EXCEPTION_IRQ
>   b   __guest_exit
>  
> -.macro invalid_vector label, target = __kvm_hyp_panic
> +ENTRY(__hyp_do_panic)
> + mov lr, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> +   PSR_MODE_EL1h)
> + msr spsr_el2, lr
> + ldr lr, =panic
> + msr elr_el2, lr
> + eret
> +ENDPROC(__hyp_do_panic)
> +
> +.macro invalid_vector label, target = __hyp_panic
>   .align  2
>  \label:
>   b \target
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 18365dd..87f16fa 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -74,6 +74,7 @@ static inline bool __fpsimd_enabled(void)
>  }
>  
>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
> +void __noreturn __hyp_do_panic(unsigned long, ...);
>  
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 8affc19..7b81089 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -140,3 +140,33 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>  
>   return exit_code;
>  }
> +
> +static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
> +
> +void __hyp_text __noreturn __hyp_panic(void)
> +{
> + unsigned long str_va = (unsigned long)__hyp_panic_string;
> + u64 spsr = read_sysreg(spsr_el2);
> + u64 elr = read_sysreg(elr_el2);
> + u64 par = read_sysreg(par_el1);
> +
> + if (read_sysreg(vttbr_el2)) {
> + struct kvm_vcpu *vcpu;
> + struct kvm_cpu_context *host_ctxt;
> +
> + vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
> + host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> + __deactivate_traps(vcpu);
> + __deactivate_vm(vcpu);
> + __sysreg_restore_state(host_ctxt);
> + }
> +
> + /* Call panic for real */
> + __hyp_do_panic(str_va - HYP_PAGE_OFFSET + PAGE_OFFSET,

is the first parameter hyp_kern_va(str_va)? If so, can you add that
define instead?
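
Presumably the reverse of the existing kern_hyp_va, using the offsets from
the call below; a hypothetical sketch of such a define:

	#define hyp_kern_va(v)	(typeof(v))((unsigned long)(v) - HYP_PAGE_OFFSET + PAGE_OFFSET)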

> +spsr,  elr,
> +read_sysreg(esr_el2),   read_sysreg(far_el2),
> +read_sysreg(hpfar_el2), par,
> +(void *)read_sysreg(tpidr_el2));
> +
> + unreachable();
> +}
> -- 
> 2.1.4
> 

Otherwise:

Reviewed-by: Christoffer Dall 


Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-02 Thread Christoffer Dall
On Wed, Dec 02, 2015 at 09:47:43AM +, Marc Zyngier wrote:
> On 02/12/15 09:27, Christoffer Dall wrote:
> > On Tue, Dec 01, 2015 at 06:51:00PM +, Marc Zyngier wrote:
> >> On 01/12/15 15:39, Christoffer Dall wrote:
> >>> On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
>  KVM so far relies on code patching, and is likely to use it more
>  in the future. The main issue is that our alternative system works
>  at the instruction level, while we'd like to have alternatives at
>  the function level.
> 
>  In order to cope with this, add the "hyp_alternate_select" macro that
>  outputs a brief sequence of code that in turn can be patched, allowing
>  al alternative function to be selected.
> >>>
> >>> s/al/an/ ?
> >>>
> 
>  Signed-off-by: Marc Zyngier 
>  ---
>   arch/arm64/kvm/hyp/hyp.h | 16 
>   1 file changed, 16 insertions(+)
> 
>  diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>  index 7ac8e11..f0427ee 100644
>  --- a/arch/arm64/kvm/hyp/hyp.h
>  +++ b/arch/arm64/kvm/hyp/hyp.h
>  @@ -27,6 +27,22 @@
>   
>   #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
>   
>  +/*
>  + * Generates patchable code sequences that are used to switch between
>  + * two implementations of a function, depending on the availability of
>  + * a feature.
>  + */
> >>>
> >>> This looks right to me, but I'm a bit unclear on what the types here are
> >>> and how to use it.
> >>>
> >>> Are orig and alt function pointers and cond is a CONFIG_FOO?  fname is
> >>> a symbol, which is defined as a prototype somewhere and then implemented
> >>> here, or?
> >>>
> >>> Perhaps a Usage: part of the docs would be helpful.
> >>
> >> How about:
> >>
> >> @fname: a symbol name that will be defined as a function returning a
> >> function pointer whose type will match @orig and @alt
> >> @orig: A pointer to the default function, as returned by @fname when
> >> @cond doesn't hold
> >> @alt: A pointer to the alternate function, as returned by @fname when
> >> @cond holds
> >> @cond: a CPU feature (as described in asm/cpufeature.h)
> > 
> > looks good.
> > 
> >>
> >>>
>  +#define hyp_alternate_select(fname, orig, alt, cond)			\
>  +typeof(orig) * __hyp_text fname(void)					\
>  +{									\
>  +	typeof(alt) *val = orig;					\
>  +	asm volatile(ALTERNATIVE("nop		\n",			\
>  +				 "mov	%0, %1	\n",			\
>  +				 cond)					\
>  +		     : "+r" (val) : "r" (alt));				\
>  +	return val;							\
>  +}
>  +
>   void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>   void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>   
>  -- 
>  2.1.4
> 
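
To make the intended usage concrete, here is a sketch modelled on the vgic
hooks declared above (the cpufeature constant is an assumption from
asm/cpufeature.h):

	static hyp_alternate_select(__vgic_call_save_state,
				    __vgic_v2_save_state, __vgic_v3_save_state,
				    ARM64_HAS_SYSREG_GIC_CPUIF);

	static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
	{
		__vgic_call_save_state()(vcpu);
	}

__vgic_call_save_state() returns a pointer to either the v2 or the v3
implementation, patched at boot, and the second pair of parentheses calls
through it.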
> >>>
> >>> I haven't thought much about how all of this is implemented, but from my
> >>> point of views the ideal situation would be something like:
> >>>
> >>> void foo(int a, int b)
> >>> {
> >>>   ALTERNATIVE_IF_NOT CONFIG_BAR
> >>>   foo_legacy(a, b);
> >>>   ALTERNATIVE_ELSE
> >>>   foo_new(a, b);
> >>>   ALTERNATIVE_END
> >>> }
> >>>
> >>> I realize this may be impossible because the C code could implement all
> >>> sort of fun stuff around the actual function calls, but would there be
> >>> some way to annotate the functions and find the actual branch statement
> >>> and change the target?
> >>
> >> The main issue is that C doesn't give you any access to the branch
> >> function itself, except for the asm-goto statements. It also makes it
> >> very hard to preserve the return type. For your idea to work, we'd need
> >> some support in the compiler itself. I'm sure that it is doable, just
> >> not by me! ;-)
> > 
> > Not by me either, I'm just asking stupid questions - as always.
> 
> I don't find that stupid. Asking that kind of stuff is useful to put
> things in perspective.
> 

Thanks!

> >>
> >> This is why I've ended up creating something that returns a function
> >> *pointer*, because that's something that exists in the language (no new
> >> concept). I simply made sure I could return it at minimal cost.
> >>
> > 
> > I don't have a problem with this either.  I'm curious though, how much
> > of a performance improvement (and why) we get from doing this as opposed
> > to a simple if-statement?
> 
> An if statement will involve fetching some configuration from memory.
> You can do that, but you are going to waste a cache line and memory
bandwidth (both of which are scarce resources) for something 

Re: [PATCH v2 14/21] arm64: KVM: HYP mode entry points

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:08PM +, Marc Zyngier wrote:
> Add the entry points for HYP mode (both for hypercalls and
> exception handling).
> 
> Signed-off-by: Marc Zyngier 

Reviewed-by: Christoffer Dall 


Re: [PATCH net-next 3/3] vhost_net: basic polling support

2015-12-02 Thread Michael S. Tsirkin
On Wed, Dec 02, 2015 at 01:04:03PM +0800, Jason Wang wrote:
> 
> 
> On 12/01/2015 10:43 PM, Michael S. Tsirkin wrote:
> > On Tue, Dec 01, 2015 at 01:17:49PM +0800, Jason Wang wrote:
> >>
> >> On 11/30/2015 06:44 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Nov 25, 2015 at 03:11:29PM +0800, Jason Wang wrote:
> > This patch tries to poll for newly added tx buffers or the socket receive
> > queue for a while at the end of tx/rx processing. The maximum time
> > spent on polling is specified through a new kind of vring ioctl.
> >
> > Signed-off-by: Jason Wang 
> >>> One further enhancement would be to actually poll
> >>> the underlying device. This should be reasonably
> >>> straight-forward with macvtap (especially in the
> >>> passthrough mode).
> >>>
> >>>
> >> Yes, it is. I have some patches to do this by replacing
> >> skb_queue_empty() with sk_busy_loop() but for tap.
> > We probably don't want to do this unconditionally, though.
> >
> >> Tests do not show
> >> any improvement but some regression.
> > Did you add code to call sk_mark_napi_id on tap then?
> > sk_busy_loop won't do anything useful without.
> 
> Yes I did. Probably something wrong elsewhere.

Is this for guest-to-guest? The patch to do NAPI
for tap is still not upstream due to a minor performance
regression.  Want me to repost it?

> >
> >>  Maybe it's better to test macvtap.
> > Same thing ...
> >
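
For context, the change under discussion would be roughly of this shape (a
sketch assuming the 4.4-era busy-poll API and a receive path that has called
sk_mark_napi_id(); the helper name is hypothetical):

	#include <net/busy_poll.h>

	/* busy-poll the marked NAPI id instead of returning empty-handed */
	static struct sk_buff *tap_dequeue_or_poll(struct sock *sk, int noblock)
	{
		struct sk_buff *skb = skb_dequeue(&sk->sk_receive_queue);

		if (!skb && !noblock && sk_can_busy_loop(sk)) {
			sk_busy_loop(sk, noblock);
			skb = skb_dequeue(&sk->sk_receive_queue);
		}
		return skb;
	}

Without sk_mark_napi_id() on the socket, sk_busy_loop() has no NAPI id to
spin on and returns immediately, which matches the observation above.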


Re: [PATCH v2 21/21] arm64: KVM: Remove weak attributes

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:15PM +, Marc Zyngier wrote:
> As we've now switched to the new world switch implementation,
> remove the weak attributes, as nobody is supposed to override
> them anymore.

Why not remove the aliases and change the callers?
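
That is, something like this sketch for the mdcr_el2 accessor, keeping a
single function instead of the static/alias pair:

	u32 __hyp_text __kvm_get_mdcr_el2(void)
	{
		return read_sysreg(mdcr_el2);
	}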

-Christoffer

> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/debug-sr.c   |  5 ++---
>  arch/arm64/kvm/hyp/hyp-entry.S  |  3 ---
>  arch/arm64/kvm/hyp/switch.c |  5 ++---
>  arch/arm64/kvm/hyp/tlb.c| 16 +++-
>  arch/arm64/kvm/hyp/vgic-v3-sr.c |  5 ++---
>  5 files changed, 13 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 774a3f69..747546b 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -125,10 +125,9 @@ void __hyp_text __debug_cond_restore_host_state(struct kvm_vcpu *vcpu)
>   }
>  }
>  
> -u32 __hyp_text __debug_read_mdcr_el2(void)
> +static u32 __hyp_text __debug_read_mdcr_el2(void)
>  {
>   return read_sysreg(mdcr_el2);
>  }
>  
> -__alias(__debug_read_mdcr_el2)
> -u32 __weak __kvm_get_mdcr_el2(void);
> +__alias(__debug_read_mdcr_el2) u32 __kvm_get_mdcr_el2(void);
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index ace919b..bbc0be1 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -184,9 +184,7 @@ ENDPROC(\label)
>  
>   .align 11
>  
> - .weak   __kvm_hyp_vector
>  ENTRY(__kvm_hyp_vector)
> -ENTRY(__hyp_vector)
>   ventry  el2t_sync_invalid   // Synchronous EL2t
>   ventry  el2t_irq_invalid// IRQ EL2t
>   ventry  el2t_fiq_invalid// FIQ EL2t
> @@ -206,5 +204,4 @@ ENTRY(__hyp_vector)
>   ventry  el1_irq // IRQ 32-bit EL1
>   ventry  el1_fiq_invalid // FIQ 32-bit EL1
>   ventry  el1_error_invalid   // Error 32-bit EL1
> -ENDPROC(__hyp_vector)
>  ENDPROC(__kvm_hyp_vector)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index c8ba370..1154d66 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -84,7 +84,7 @@ static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
>   __vgic_call_restore_state()(vcpu);
>  }
>  
> -int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
> +static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_cpu_context *host_ctxt;
>   struct kvm_cpu_context *guest_ctxt;
> @@ -141,8 +141,7 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>   return exit_code;
>  }
>  
> -__alias(__guest_run)
> -int __weak __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> +__alias(__guest_run) int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
>  static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
>  
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 2c279a8..250e06c 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -17,7 +17,7 @@
>  
>  #include "hyp.h"
>  
> -void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> +static void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  {
>   dsb(ishst);
>  
> @@ -47,10 +47,10 @@ void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>   write_sysreg(0, vttbr_el2);
>  }
>  
> -__alias(__tlb_flush_vmid_ipa)
> -void __weak __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +__alias(__tlb_flush_vmid_ipa) void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm,
> + phys_addr_t ipa);
>  
> -void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
> +static void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
>  {
>   dsb(ishst);
>  
> @@ -66,10 +66,9 @@ void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
>   write_sysreg(0, vttbr_el2);
>  }
>  
> -__alias(__tlb_flush_vmid)
> -void __weak __kvm_tlb_flush_vmid(struct kvm *kvm);
> +__alias(__tlb_flush_vmid) void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  
> -void __hyp_text __tlb_flush_vm_context(void)
> +static void __hyp_text __tlb_flush_vm_context(void)
>  {
>   dsb(ishst);
>   asm volatile("tlbi alle1is  \n"
> @@ -77,5 +76,4 @@ void __hyp_text __tlb_flush_vm_context(void)
>   dsb(ish);
>  }
>  
> -__alias(__tlb_flush_vm_context)
> -void __weak __kvm_flush_vm_context(void);
> +__alias(__tlb_flush_vm_context) void __kvm_flush_vm_context(void);
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> index 1b0eedb..82a4f4b 100644
> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -216,10 +216,9 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
>   }
>  }
>  
> -u64 __hyp_text __vgic_v3_read_ich_vtr_el2(void)
> +static u64 __hyp_text 

Re: [PATCH v2 20/21] arm64: KVM: Cleanup asm-offset.c

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:14PM +, Marc Zyngier wrote:
> As we've now rewritten most of our code-base in C, most of the
> KVM-specific code in asm-offset.c is useless. Delete-time again!
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Christoffer Dall 


Re: [PATCH v2 19/21] arm64: KVM: Turn system register numbers to an enum

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:13PM +, Marc Zyngier wrote:
> Having the system register numbers as #defines has been a pain
> since day one, as the ordering is pretty fragile, and moving
> things around leads to renumbering and epic conflict resolutions.
> 
> Now that we're mostly accessing the sysreg file in C, an enum is
> a much better type to use, and we can clean things up a bit.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_asm.h | 76 -
>  arch/arm64/include/asm/kvm_emulate.h |  1 -
>  arch/arm64/include/asm/kvm_host.h| 81 +++-
>  arch/arm64/include/asm/kvm_mmio.h|  1 -
>  arch/arm64/kernel/asm-offsets.c  |  1 +
>  arch/arm64/kvm/guest.c   |  1 -
>  arch/arm64/kvm/handle_exit.c |  1 +
>  arch/arm64/kvm/hyp/debug-sr.c|  1 +
>  arch/arm64/kvm/hyp/entry.S   |  3 +-
>  arch/arm64/kvm/hyp/sysreg-sr.c   |  1 +
>  arch/arm64/kvm/sys_regs.c|  1 +
>  virt/kvm/arm/vgic-v3.c   |  1 +
>  12 files changed, 87 insertions(+), 82 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5e37710..52b777b 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -20,82 +20,6 @@
>  
>  #include 
>  
> -/*
> - * 0 is reserved as an invalid value.
> - * Order *must* be kept in sync with the hyp switch code.
> - */
> -#define	MPIDR_EL1	1	/* MultiProcessor Affinity Register */
> -#define	CSSELR_EL1	2	/* Cache Size Selection Register */
> -#define	SCTLR_EL1	3	/* System Control Register */
> -#define	ACTLR_EL1	4	/* Auxiliary Control Register */
> -#define	CPACR_EL1	5	/* Coprocessor Access Control */
> -#define	TTBR0_EL1	6	/* Translation Table Base Register 0 */
> -#define	TTBR1_EL1	7	/* Translation Table Base Register 1 */
> -#define	TCR_EL1		8	/* Translation Control Register */
> -#define	ESR_EL1		9	/* Exception Syndrome Register */
> -#define	AFSR0_EL1	10	/* Auxilary Fault Status Register 0 */
> -#define	AFSR1_EL1	11	/* Auxilary Fault Status Register 1 */
> -#define	FAR_EL1		12	/* Fault Address Register */
> -#define	MAIR_EL1	13	/* Memory Attribute Indirection Register */
> -#define	VBAR_EL1	14	/* Vector Base Address Register */
> -#define	CONTEXTIDR_EL1	15	/* Context ID Register */
> -#define	TPIDR_EL0	16	/* Thread ID, User R/W */
> -#define	TPIDRRO_EL0	17	/* Thread ID, User R/O */
> -#define	TPIDR_EL1	18	/* Thread ID, Privileged */
> -#define	AMAIR_EL1	19	/* Aux Memory Attribute Indirection Register */
> -#define	CNTKCTL_EL1	20	/* Timer Control Register (EL1) */
> -#define	PAR_EL1		21	/* Physical Address Register */
> -#define MDSCR_EL1	22	/* Monitor Debug System Control Register */
> -#define MDCCINT_EL1	23	/* Monitor Debug Comms Channel Interrupt Enable Reg */
> -
> -/* 32bit specific registers. Keep them at the end of the range */
> -#define	DACR32_EL2	24	/* Domain Access Control Register */
> -#define	IFSR32_EL2	25	/* Instruction Fault Status Register */
> -#define	FPEXC32_EL2	26	/* Floating-Point Exception Control Register */
> -#define	DBGVCR32_EL2	27	/* Debug Vector Catch Register */
> -#define	NR_SYS_REGS	28
> -
> -/* 32bit mapping */
> -#define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
> -#define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
> -#define c1_SCTLR	(SCTLR_EL1 * 2)	/* System Control Register */
> -#define c1_ACTLR	(ACTLR_EL1 * 2)	/* Auxiliary Control Register */
> -#define c1_CPACR	(CPACR_EL1 * 2)	/* Coprocessor Access Control */
> -#define c2_TTBR0	(TTBR0_EL1 * 2)	/* Translation Table Base Register 0 */
> -#define c2_TTBR0_high	(c2_TTBR0 + 1)	/* TTBR0 top 32 bits */
> -#define c2_TTBR1	(TTBR1_EL1 * 2)	/* Translation Table Base Register 1 */
> -#define c2_TTBR1_high	(c2_TTBR1 + 1)	/* TTBR1 top 32 bits */
> -#define c2_TTBCR	(TCR_EL1 * 2)	/* Translation Table Base Control R. */
> -#define c3_DACR		(DACR32_EL2 * 2)/* Domain Access Control Register */
> -#define c5_DFSR		(ESR_EL1 * 2)	/* Data Fault Status Register */
> -#define c5_IFSR		(IFSR32_EL2 * 2)/* Instruction Fault Status Register */
> -#define c5_ADFSR	(AFSR0_EL1 * 2)	/* Auxiliary Data Fault Status R */
> -#define c5_AIFSR	(AFSR1_EL1 * 2)	/* Auxiliary Instr Fault Status R */
> -#define c6_DFAR		(FAR_EL1 * 2)	/* Data Fault Address Register */
> -#define c6_IFAR  (c6_DFAR + 1)   /* 

Re: [PATCH v2 18/21] arm64: KVM: Move away from the assembly version of the world switch

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:12PM +, Marc Zyngier wrote:
> This is it. We remove all of the code that has now been rewritten.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Christoffer Dall 


Re: [PATCH v2 17/21] arm64: KVM: Map the kernel RO section into HYP

2015-12-02 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:11PM +, Marc Zyngier wrote:
> In order to run C code in HYP, we must make sure that the kernel's
> RO section is mapped into HYP (otherwise things break badly).
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Christoffer Dall 


Re: [GIT PULL 04/23] KVM: s390: rewrite vcpu_post_run and drop out early

2015-12-02 Thread Paolo Bonzini


On 02/12/2015 12:06, Christian Borntraeger wrote:
> + memcpy(&vcpu->run->s.regs.gprs[14], &vcpu->arch.sie_block->gg14, 16);

This is preexisting but... boy it's ugly. :)

Do you gain much over the simpler

vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;

?

Paolo


Re: [PATCH v2 1/9] drivers/hv: replace enum hv_message_type by u32

2015-12-02 Thread Paolo Bonzini


On 30/11/2015 17:22, Andrey Smetanin wrote:
> enum hv_message_type inside struct hv_message and hv_post_message
> is not size-portable. Replace the enum by u32.

It's only non-portable inside structs.  Okay to apply just these:

@@ -172,7 +174,7 @@ union hv_message_flags {

 /* Define synthetic interrupt controller message header. */
 struct hv_message_header {
-   u32 message_type;
+   enum hv_message_type message_type;
u8 payload_size;
union hv_message_flags message_flags;
u8 reserved[2];
@@ -345,7 +347,7 @@ enum hv_call_code {
 struct hv_input_post_message {
union hv_connection_id connectionid;
u32 reserved;
-   u32 message_type;
+   enum hv_message_type message_type;
u32 payload_size;
u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
 };

?
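
For reference, the portability issue is that the storage size of an enum is
implementation-defined, so it only bites when the enum is embedded in a
shared-memory layout; a minimal standalone illustration (not the hv headers):

	#include <stdint.h>

	enum hv_message_type { HVMSG_NONE = 0 };

	struct unportable { enum hv_message_type type; };	/* sizeof may differ across ABIs */
	struct portable   { uint32_t type; };			/* always 4 bytes */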

Paolo


Re: [GIT PULL 04/23] KVM: s390: rewrite vcpu_post_run and drop out early

2015-12-02 Thread Christian Borntraeger
On 12/02/2015 01:20 PM, Paolo Bonzini wrote:
> 
> 
> On 02/12/2015 12:06, Christian Borntraeger wrote:
> >> +memcpy(&vcpu->run->s.regs.gprs[14], &vcpu->arch.sie_block->gg14, 16);
> 
> This is preexisting but... boy it's ugly. :)
> 
> Do you gain much over the simpler
> 
>   vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
>   vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;
> 

It's just legacy code from the old days.
There is a difference, but it seems to be a missed opportunity for gcc

vcpu->arch.sie_block->gg14 = vcpu->run->s.regs.gprs[14];
839c:   e3 30 f0 b8 00 04   lg  %r3,184(%r15)
83a2:   e3 10 32 40 00 04   lg  %r1,576(%r3)
83a8:   e3 20 30 80 00 04   lg  %r2,128(%r3)
83ae:   e3 20 21 b8 00 04   lg  %r2,440(%r2)
83b4:   e3 20 10 a0 00 24   stg %r2,160(%r1)
vcpu->arch.sie_block->gg15 = vcpu->run->s.regs.gprs[15];
83ba:   e3 10 32 40 00 04   lg  %r1,576(%r3)
83c0:   e3 20 30 80 00 04   lg  %r2,128(%r3)
83c6:   e3 20 21 c0 00 04   lg  %r2,448(%r2)
83cc:   e3 20 10 a8 00 24   stg %r2,168(%r1)

gcc seems to reuse and reload %r2 and %r3, maybe register pressure.



the memcpy gives

memcpy(&vcpu->arch.sie_block->gg14, &vcpu->run->s.regs.gprs[14], 16);
839c:   e3 30 f0 b8 00 04   lg  %r3,184(%r15)
83a2:   e3 10 32 40 00 04   lg  %r1,576(%r3)
83a8:   e3 20 30 80 00 04   lg  %r2,128(%r3)
83ae:   d2 0f 10 a0 21 b8   mvc 160(16,%r1),440(%r2)

I will prepare a patch and do my usual micro benchmark. Unless
things get much worse I will schedule this for the next pull.

Christian




Re: [GIT PULL 00/23] KVM: s390 features, kvm_get_vcpu_by_id and stat for 4.5

2015-12-02 Thread Paolo Bonzini


On 02/12/2015 12:06, Christian Borntraeger wrote:
> Paolo,
> 
> here is the first s390 pull request for 4.5. It also contains the
> remaining vcpu lookup changes and an improved cleanup of the kvm_stat
> exit path.
> I have deferred the kvm_stat per VM patches.
> 
> The s390 changes are:
> - ESCA support (up to 248 CPUs)
> - detection if KVM works (e.g. for nested virtualization)
> - cleanups
> 
> The following changes since commit bb11c6c96544737aede6a2eb92e5c6bc8b46534b:
> 
>   KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() (2015-11-26 15:31:36 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-4.5-1
> 
> for you to fetch changes up to 2f8a43d45d14ad62b105ed99151b453c12df7149:
> 
>   KVM: s390: remove redundant assignment of error code (2015-11-30 12:47:13 +0100)
> 
> 
> KVM: s390 features, kvm_get_vcpu_by_id and stat
> 
> Several features for s390
> 1. ESCA support (up to 248 vCPUs)
> 2. KVM detection: we  can now detect if we support KVM (e.g. does KVM
>under KVM work?)
> 
> kvm_stat:
> 1. cleanup
> 
> kvm_get_vcpu_by_id:
> 1. Use kvm_get_vcpu_by_id where appropriate
> 2. Apply a heuristic to optimize for ID VCPU == No. VCPU
> 
> 
> Christian Borntraeger (1):
>   KVM: s390: remove redundant assignment of error code
> 
> David Hildenbrand (12):
>   KVM: Use common function for VCPU lookup by id
>   KVM: use heuristic for fast VCPU lookup by id
>   KVM: s390: rewrite vcpu_post_run and drop out early
>   KVM: s390: fast path for sca_ext_call_pending
>   KVM: s390: we always have a SCA
>   KVM: s390: fix SCA related races and double use
>   KVM: s390: always set/clear the SCA sda field
>   KVM: s390: cleanup sca_add_vcpu
>   KVM: s390: don't switch to ESCA for ucontrol
>   s390/sclp: introduce check for SIE
>   s390: show virtualization support in /proc/cpuinfo
>   KVM: s390: don't load kvm without virtualization support
> 
> Eugene (jno) Dvurechenski (8):
>   s390/sclp: introduce checks for ESCA and HVS
>   KVM: s390: Generalize access to IPTE controls
>   KVM: s390: Generalize access to SIGP controls
>   KVM: s390: Provide SCA-aware helpers for VCPU add/del
>   KVM: s390: Introduce new structures
>   KVM: s390: Make provisions for ESCA utilization
>   KVM: s390: Introduce switching code
>   KVM: s390: Enable up to 248 VCPUs per VM
> 
> Heiko Carstens (1):
>   KVM: s390: remove pointless test_facility(2) check
> 
> Janosch Frank (1):
>   KVM: Remove unnecessary debugfs dentry references
> 
>  arch/powerpc/kvm/book3s_hv.c |  10 +-
>  arch/s390/include/asm/elf.h  |   7 ++
>  arch/s390/include/asm/kvm_host.h |  49 +++-
>  arch/s390/include/asm/sclp.h |   8 +-
>  arch/s390/kernel/processor.c |   6 +
>  arch/s390/kernel/setup.c |   9 ++
>  arch/s390/kvm/diag.c |  11 +-
>  arch/s390/kvm/gaccess.c  |  38 +--
>  arch/s390/kvm/intercept.c|   7 +-
>  arch/s390/kvm/interrupt.c| 133 +-
>  arch/s390/kvm/kvm-s390.c | 237 +++
>  arch/s390/kvm/kvm-s390.h |   7 ++
>  drivers/s390/char/sclp_early.c   |   8 +-
>  include/linux/kvm_host.h |   6 +-
>  virt/kvm/kvm_main.c  |  30 ++---
>  15 files changed, 407 insertions(+), 159 deletions(-)
> 

Pulled, thanks.

Paolo


Re: [GIT PULL 04/23] KVM: s390: rewrite vcpu_post_run and drop out early

2015-12-02 Thread Paolo Bonzini


On 02/12/2015 14:04, Christian Borntraeger wrote:
>> > Do you gain much over the simpler
>> > 
>> >vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
>> >vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;
>> > 
> It's just legacy code from the old days.
> There is a difference, but it seems to be a missed opportunity for gcc
> 
> vcpu->arch.sie_block->gg14 = vcpu->run->s.regs.gprs[14];
> 839c:   e3 30 f0 b8 00 04   lg  %r3,184(%r15)
> 83a2:   e3 10 32 40 00 04   lg  %r1,576(%r3)
> 83a8:   e3 20 30 80 00 04   lg  %r2,128(%r3)
> 83ae:   e3 20 21 b8 00 04   lg  %r2,440(%r2)
> 83b4:   e3 20 10 a0 00 24   stg %r2,160(%r1)
> vcpu->arch.sie_block->gg15 = vcpu->run->s.regs.gprs[15];
> 83ba:   e3 10 32 40 00 04   lg  %r1,576(%r3)
> 83c0:   e3 20 30 80 00 04   lg  %r2,128(%r3)
> 83c6:   e3 20 21 c0 00 04   lg  %r2,448(%r2)
> 83cc:   e3 20 10 a8 00 24   stg %r2,168(%r1)
> 
> gcc seems to reuse and reload %r2 and %r3, maybe register pressure.

More likely to be -fno-strict-aliasing. :(

Paolo


Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-02 Thread Marc Zyngier
On 02/12/15 11:53, Christoffer Dall wrote:
> On Wed, Dec 02, 2015 at 09:47:43AM +, Marc Zyngier wrote:
>> On 02/12/15 09:27, Christoffer Dall wrote:
>>> On Tue, Dec 01, 2015 at 06:51:00PM +, Marc Zyngier wrote:
 On 01/12/15 15:39, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
>> KVM so far relies on code patching, and is likely to use it more
>> in the future. The main issue is that our alternative system works
>> at the instruction level, while we'd like to have alternatives at
>> the function level.
>>
>> In order to cope with this, add the "hyp_alternate_select" macro that
>> outputs a brief sequence of code that in turn can be patched, allowing
>> al alternative function to be selected.
>
> s/al/an/ ?
>
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/hyp.h | 16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index 7ac8e11..f0427ee 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -27,6 +27,22 @@
>>  
>>  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
>>  
>> +/*
>> + * Generates patchable code sequences that are used to switch between
>> + * two implementations of a function, depending on the availability of
>> + * a feature.
>> + */
>
> This looks right to me, but I'm a bit unclear on what the types here are
> and how to use it.
>
> Are orig and alt function pointers and cond is a CONFIG_FOO?  fname is
> a symbol, which is defined as a prototype somewhere and then implemented
> here, or?
>
> Perhaps a Usage: part of the docs would be helpful.

 How about:

 @fname: a symbol name that will be defined as a function returning a
 function pointer whose type will match @orig and @alt
 @orig: A pointer to the default function, as returned by @fname when
 @cond doesn't hold
 @alt: A pointer to the alternate function, as returned by @fname when
 @cond holds
 @cond: a CPU feature (as described in asm/cpufeature.h)
>>>
>>> looks good.
>>>

>
>> +#define hyp_alternate_select(fname, orig, alt, cond)			\
>> +typeof(orig) * __hyp_text fname(void)					\
>> +{									\
>> +	typeof(alt) *val = orig;					\
>> +	asm volatile(ALTERNATIVE("nop		\n",			\
>> +				 "mov	%0, %1	\n",			\
>> +				 cond)					\
>> +		     : "+r" (val) : "r" (alt));				\
>> +	return val;							\
>> +}
>> +
>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>  
>> -- 
>> 2.1.4
>>
>
> I haven't thought much about how all of this is implemented, but from my
> point of views the ideal situation would be something like:
>
> void foo(int a, int b)
> {
>   ALTERNATIVE_IF_NOT CONFIG_BAR
>   foo_legacy(a, b);
>   ALTERNATIVE_ELSE
>   foo_new(a, b);
>   ALTERNATIVE_END
> }
>
> I realize this may be impossible because the C code could implement all
> sort of fun stuff around the actual function calls, but would there be
> some way to annotate the functions and find the actual branch statement
> and change the target?

 The main issue is that C doesn't give you any access to the branch
 function itself, except for the asm-goto statements. It also makes it
 very hard to preserve the return type. For your idea to work, we'd need
 some support in the compiler itself. I'm sure that it is doable, just
 not by me! ;-)
>>>
>>> Not by me either, I'm just asking stupid questions - as always.
>>
>> I don't find that stupid. Asking that kind of stuff is useful to put
>> things in perspective.
>>
> 
> Thanks!
> 

 This is why I've ended up creating something that returns a function
 *pointer*, because that's something that exists in the language (no new
 concept). I simply made sure I could return it at minimal cost.

>>>
>>> I don't have a problem with this either.  I'm curious though, how much
>>> of a performance improvement (and why) we get from doing this as opposed
>>> to a simple if-statement?
>>
>> An if statement will involve fetching some configuration from memory.
>> You can do that, but you are going to waste a cache line and memory
>> 

Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-02 Thread Lan, Tianyu

On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:

But
it requires the guest OS to do specific configuration inside and to rely on
the bonding driver, which blocks it from working on Windows.
From the performance side,
putting the VF and virtio NICs under a bonded interface affects their
performance even when no migration is in progress. These factors block VF
NIC passthrough in some use cases (especially in the cloud) which require
migration.


That's really up to guest. You don't need to do bonding,
you can just move the IP and mac from userspace, that's
possible on most OS-es.

Or write something in guest kernel that is more lightweight if you are
so inclined. What we are discussing here is the host-guest interface,
not the in-guest interface.


The solution we propose changes the NIC driver and Qemu. The guest OS
doesn't need to do anything special for migration, so
it's easy to deploy.



Except of course these patches don't even work properly yet.

And when they do, even minor changes in host side NIC hardware across
migration will break guests in hard to predict ways.


Switching between the PV and VF NICs introduces a network stop, and the
latency of hotplugging the VF is measurable. For some use cases (cloud
service and OPNFV) which are sensitive to network stability and
performance, this is unfriendly and blocks SRIOV NIC usage. We hope
to find a better way to make SRIOV NICs work in these cases, and this is
worth doing since a SRIOV NIC provides better network performance compared
with a PV NIC. The current patches have some issues; I think we can find
solutions for them and improve them step by step.



[PATCH kvm-unit-tests] x86: use asm volatile for flags and segment register read/writes

2015-12-02 Thread Paolo Bonzini
The effects of a move from or to these registers are not entirely described
by the asm's operands.  Therefore, the compiler may end up
moving the asm around in ways that break tests.

In one case, the compiler marked read_ss() as pure and thus subjected
it to common subexpression elimination:

u16 ss = read_ss();

// check for null segment load
*mem = 0;
asm volatile("mov %0, %%ss" : : "m"(*mem));
report("mov null, %%ss", read_ss() == 0);

This caused a spurious failure of the test.
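
Concretely, without volatile the compiler is free to treat read_ss() as pure
and fold the second call into the first, effectively transforming the test
into:

	u16 ss = read_ss();			/* first read */

	*mem = 0;
	asm volatile("mov %0, %%ss" : : "m"(*mem));
	report("mov null, %%ss", ss == 0);	/* %ss never re-read: stale value */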

Reported-by: Lucas Meneguel Rodrigues 
Signed-off-by: Paolo Bonzini 
---
 lib/x86/processor.h | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/x86/processor.h b/lib/x86/processor.h
index 1816807..95cea1a 100644
--- a/lib/x86/processor.h
+++ b/lib/x86/processor.h
@@ -61,7 +61,7 @@ static inline u16 read_cs(void)
 {
 unsigned val;
 
-asm ("mov %%cs, %0" : "=mr"(val));
+asm volatile ("mov %%cs, %0" : "=mr"(val));
 return val;
 }
 
@@ -69,7 +69,7 @@ static inline u16 read_ds(void)
 {
 unsigned val;
 
-asm ("mov %%ds, %0" : "=mr"(val));
+asm volatile ("mov %%ds, %0" : "=mr"(val));
 return val;
 }
 
@@ -77,7 +77,7 @@ static inline u16 read_es(void)
 {
 unsigned val;
 
-asm ("mov %%es, %0" : "=mr"(val));
+asm volatile ("mov %%es, %0" : "=mr"(val));
 return val;
 }
 
@@ -85,7 +85,7 @@ static inline u16 read_ss(void)
 {
 unsigned val;
 
-asm ("mov %%ss, %0" : "=mr"(val));
+asm volatile ("mov %%ss, %0" : "=mr"(val));
 return val;
 }
 
@@ -93,7 +93,7 @@ static inline u16 read_fs(void)
 {
 unsigned val;
 
-asm ("mov %%fs, %0" : "=mr"(val));
+asm volatile ("mov %%fs, %0" : "=mr"(val));
 return val;
 }
 
@@ -101,45 +101,45 @@ static inline u16 read_gs(void)
 {
 unsigned val;
 
-asm ("mov %%gs, %0" : "=mr"(val));
+asm volatile ("mov %%gs, %0" : "=mr"(val));
 return val;
 }
 
 static inline unsigned long read_rflags(void)
 {
unsigned long f;
-   asm ("pushf; pop %0\n\t" : "=rm"(f));
+   asm volatile ("pushf; pop %0\n\t" : "=rm"(f));
return f;
 }
 
 static inline void write_ds(unsigned val)
 {
-asm ("mov %0, %%ds" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%ds" : : "rm"(val) : "memory");
 }
 
 static inline void write_es(unsigned val)
 {
-asm ("mov %0, %%es" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%es" : : "rm"(val) : "memory");
 }
 
 static inline void write_ss(unsigned val)
 {
-asm ("mov %0, %%ss" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%ss" : : "rm"(val) : "memory");
 }
 
 static inline void write_fs(unsigned val)
 {
-asm ("mov %0, %%fs" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%fs" : : "rm"(val) : "memory");
 }
 
 static inline void write_gs(unsigned val)
 {
-asm ("mov %0, %%gs" : : "rm"(val) : "memory");
+asm volatile ("mov %0, %%gs" : : "rm"(val) : "memory");
 }
 
 static inline void write_rflags(unsigned long f)
 {
-   asm ("push %0; popf\n\t" : : "rm"(f));
+asm volatile ("push %0; popf\n\t" : : "rm"(f));
 }
 
 static inline u64 rdmsr(u32 index)
-- 
1.8.3.1



Re: [GIT PULL 04/23] KVM: s390: rewrite vcpu_post_run and drop out early

2015-12-02 Thread Christian Borntraeger
On 12/02/2015 02:05 PM, Paolo Bonzini wrote:
> 
> 
> On 02/12/2015 14:04, Christian Borntraeger wrote:
 Do you gain much over the simpler

vcpu->run->s.regs.gprs[14] = vcpu->arch.sie_block->gg14;
vcpu->run->s.regs.gprs[15] = vcpu->arch.sie_block->gg15;

>> It's just legacy code from the old days.
>> There is a difference, but it seems to be a missed opportunity for gcc
>>
>> vcpu->arch.sie_block->gg14 = vcpu->run->s.regs.gprs[14];
>> 839c:   e3 30 f0 b8 00 04   lg  %r3,184(%r15)
>> 83a2:   e3 10 32 40 00 04   lg  %r1,576(%r3)
>> 83a8:   e3 20 30 80 00 04   lg  %r2,128(%r3)
>> 83ae:   e3 20 21 b8 00 04   lg  %r2,440(%r2)
>> 83b4:   e3 20 10 a0 00 24   stg %r2,160(%r1)
>> vcpu->arch.sie_block->gg15 = vcpu->run->s.regs.gprs[15];
>> 83ba:   e3 10 32 40 00 04   lg  %r1,576(%r3)
>> 83c0:   e3 20 30 80 00 04   lg  %r2,128(%r3)
>> 83c6:   e3 20 21 c0 00 04   lg  %r2,448(%r2)
>> 83cc:   e3 20 10 a8 00 24   stg %r2,168(%r1)
>>
>> gcc seems to reuse and reload %r2 and %r3, maybe register pressure.
> 
> More likely to be -fno-strict-aliasing. :(

Yes, it's indeed the aliasing.
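
One way around the reloads, sketched with hypothetical locals (and assuming
the s390 sie_block type): hoisting the pointers means nothing can alias a
local whose address is never taken, so gcc can keep them in registers across
the stores.

	struct kvm_s390_sie_block *sie = vcpu->arch.sie_block;
	struct kvm_run *run = vcpu->run;

	sie->gg14 = run->s.regs.gprs[14];
	sie->gg15 = run->s.regs.gprs[15];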
