Re: [PATCH v3 07/22] arm64: KVM: Implement system register save/restore

2015-12-12 Thread Mario Smarduch


On 12/11/2015 10:29 AM, Marc Zyngier wrote:
> Hi Mario,
> 
> On 11/12/15 03:24, Mario Smarduch wrote:
>> Hi Marc,
>>
>> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>>> Implement the system register save/restore as a direct translation of
>>> the assembly code version.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> Reviewed-by: Christoffer Dall 
>>> ---
>>> [...]
>>
>> I looked more closely at some other ways to get better performance out of
>> the compiler. This code sequence makes __sysreg_save_state(..) about 35%
>> faster: over 5000 exits it saves about 500 µs, or 100 ns per exit. This is
>> on Juno.
> 
> 35% faster? Really? That's pretty crazy. Was that on the A57 or the A53?

Good question. I bind kvmtool to CPU1, which I think is an A57.
> 
>>
>> register int volatile count asm("r2") = 0;

I meant x2, but this compiles with the aarch64 compiler and runs on Juno, so it
appears the compiler may have an issue.

> 
> Does this even work on arm64? We don't have an "r2" register...
> 
>>
>> do {
>> 
>> } while(count);
>>
>> I didn't test the restore function (ran out of time) but I suspect it should be
>> the same. The assembler pretty much uses all the GPRs, (a little too many, using
>> stp to push 4 pairs on the stack and restore) looking at the assembler it all
>> should execute out of order.
> 
> Are you talking about the original implementation here? or the generated
> code out of the compiler? The original implementation didn't push
> anything on the stack (apart from the prologue, but we have the same
> thing in the C implementation).

This is the code the compiler generates for the do { ... } while version.
> 
> Looking at the compiler output, we have a bunch of mrs/str, one after
> the other - pretty basic. Maybe that gives the CPU some "breathing"
> time, but I have no idea if that's more or less efficient.
> 
> But the main thing is that we can now rely on the compiler to generate
> something that is more or less optimized for a given platform if there
> is such a requirement. We go from something that was cast in stone to
> something that has some degree of flexibility.

Yes, definitely. The do {} while version does a bunch of mrs and then a bunch of
str, which probably enables out-of-order execution by removing the dependency
between each mrs and the str right after it.
Right now I don't know which compiler option leads to this optimization.
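A host-side sketch of the two instruction shapes being compared, assuming a hypothetical `mock_sysregs[]` array stands in for the system registers (on hardware each read would be an `mrs` issued through `read_sysreg()`). Both functions store the same values; only the order of reads and stores in the source differs. The C source order does not pin the emitted order, since the compiler may still reschedule the plain stores, so the disassembly is what to check.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MOCK_REGS 4

/* Hypothetical stand-ins for system registers. On hardware each read
 * would be an mrs instruction; volatile keeps every read in the code. */
static volatile uint64_t mock_sysregs[NR_MOCK_REGS] = {
	0x11, 0x22, 0x33, 0x44,
};

struct mock_ctxt {
	uint64_t regs[NR_MOCK_REGS];
};

/* Shape 1: read a register, store it, move to the next (mrs/str pairs). */
static void save_interleaved(struct mock_ctxt *ctxt)
{
	for (int i = 0; i < NR_MOCK_REGS; i++)
		ctxt->regs[i] = mock_sysregs[i];
}

static_assert(NR_MOCK_REGS == 4, "save_batched() is written for 4 registers");

/* Shape 2: issue every read first, then every store, so no store sits
 * directly behind the read that produces its value. */
static void save_batched(struct mock_ctxt *ctxt)
{
	uint64_t r0 = mock_sysregs[0];
	uint64_t r1 = mock_sysregs[1];
	uint64_t r2 = mock_sysregs[2];
	uint64_t r3 = mock_sysregs[3];

	ctxt->regs[0] = r0;
	ctxt->regs[1] = r1;
	ctxt->regs[2] = r2;
	ctxt->regs[3] = r3;
}
```

Keeping many values live at once also costs registers: once the scratch registers run out, the compiler has to use callee-saved ones (x19-x28 under the AArch64 procedure call standard) and save/restore them in the prologue and epilogue, which may account for the extra stp pairs seen in the generated code.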
> 
>>
>> FWIW I gave this a try since compilers like to optimize loops. I used
>> 'cntpct_el0' counter register to measure the intervals.
> 
> It'd be nice to have a measure in terms of cycle, but that's a good
> first approximation.
Will do that in the future. In any case, this series performs no worse than the
assembler one, and the big win is the clean C code.

Re: [PATCH v3 07/22] arm64: KVM: Implement system register save/restore

2015-12-11 Thread Marc Zyngier
Hi Mario,

On 11/12/15 03:24, Mario Smarduch wrote:
> Hi Marc,
> 
> On 12/7/2015 2:53 AM, Marc Zyngier wrote:
>> Implement the system register save/restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> Reviewed-by: Christoffer Dall 
>> ---
>> [...]
> 
> I looked more closely at some other ways to get better performance out of
> the compiler. This code sequence makes __sysreg_save_state(..) about 35%
> faster: over 5000 exits it saves about 500 µs, or 100 ns per exit. This is
> on Juno.

35% faster? Really? That's pretty crazy. Was that on the A57 or the A53?

> 
> register int volatile count asm("r2") = 0;

Does this even work on arm64? We don't have an "r2" register...

> 
> do {
> 
> } while(count);
> 
> I didn't test the restore function (ran out of time) but I suspect it should be
> the same. The assembler pretty much uses all the GPRs, (a little too many, using
> stp to push 4 pairs on the stack and restore) looking at the assembler it all
> should execute out of order.

Are you talking about the original implementation here? or the generated
code out of the compiler? The original implementation didn't push
anything on the stack (apart from the prologue, but we have the same
thing in the C implementation).

Looking at the compiler output, we have a bunch of mrs/str, one after
the other - pretty basic. Maybe that gives the CPU some "breathing"
time, but I have no idea if that's more or less efficient.

But the main thing is that we can now rely on the compiler to generate
something that is more or less optimized for a given platform if there
is such a requirement. We go from something that was cast in stone to
something that has some degree of flexibility.
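To illustrate what the C translation hands to the compiler, here is a reduced sketch of the save/restore pattern from the patch. The `read_sysreg()`/`write_sysreg()` definitions are modeled on, not copied from, the helpers the series uses; `SKETCH_RUNS_AT_EL2`, `struct sketch_ctxt`, the `SK_*` indices and the `mock_*` variables are hypothetical, and the mock branch only exists so the sketch builds and runs on an ordinary host. Each access is its own `asm volatile` statement while the stores into `ctxt` are ordinary memory writes (there is no "memory" clobber), so the compiler is free to choose the load/store schedule.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(SKETCH_RUNS_AT_EL2)
/* One mrs/msr per access. asm volatile keeps each access, but without a
 * "memory" clobber ordinary loads/stores may move around them. */
#define read_sysreg(r) ({						\
	uint64_t __val;							\
	__asm__ volatile("mrs %0, " #r : "=r"(__val));			\
	__val;								\
})
#define write_sysreg(v, r) do {						\
	uint64_t __val = (uint64_t)(v);					\
	__asm__ volatile("msr " #r ", %0" : : "r"(__val));		\
} while (0)
#else
/* Host-only mock: each "system register" is a plain variable. */
static uint64_t mock_sctlr_el1, mock_tcr_el1;
#define read_sysreg(r)		(mock_##r)
#define write_sysreg(v, r)	do { mock_##r = (uint64_t)(v); } while (0)
#endif

enum { SK_SCTLR_EL1, SK_TCR_EL1, SK_NR_REGS };	/* hypothetical indices */

struct sketch_ctxt {
	uint64_t sys_regs[SK_NR_REGS];
};

/* Same shape as __sysreg_save_state(): one read per slot. */
static void sketch_save(struct sketch_ctxt *ctxt)
{
	ctxt->sys_regs[SK_SCTLR_EL1] = read_sysreg(sctlr_el1);
	ctxt->sys_regs[SK_TCR_EL1]   = read_sysreg(tcr_el1);
}

/* Mirror image, as in __sysreg_restore_state(). */
static void sketch_restore(const struct sketch_ctxt *ctxt)
{
	write_sysreg(ctxt->sys_regs[SK_SCTLR_EL1], sctlr_el1);
	write_sysreg(ctxt->sys_regs[SK_TCR_EL1],   tcr_el1);
}
```

On the host build, a save followed by a restore round-trips the mock registers, which is the invariant the real save/restore pair relies on.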

> 
> FWIW I gave this a try since compilers like to optimize loops. I used
> 'cntpct_el0' counter register to measure the intervals.

It'd be nice to have a measure in terms of cycle, but that's a good
first approximation.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 07/22] arm64: KVM: Implement system register save/restore

2015-12-10 Thread Mario Smarduch
Hi Marc,

On 12/7/2015 2:53 AM, Marc Zyngier wrote:
> Implement the system register save/restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> Reviewed-by: Christoffer Dall 
> ---
>  arch/arm64/kvm/hyp/Makefile|  1 +
>  arch/arm64/kvm/hyp/hyp.h   |  3 ++
>  arch/arm64/kvm/hyp/sysreg-sr.c | 90 ++
>  3 files changed, 94 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/sysreg-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 455dc0a..ec94200 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -5,3 +5,4 @@
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index f213e46..778d56d 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -38,5 +38,8 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>  void __timer_save_state(struct kvm_vcpu *vcpu);
>  void __timer_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __sysreg_save_state(struct kvm_cpu_context *ctxt);
> +void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> new file mode 100644
> index 000..add8fcb
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -0,0 +1,90 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +

I looked more closely at some other ways to get better performance out of the
compiler. This code sequence makes __sysreg_save_state(..) about 35% faster:
over 5000 exits it saves about 500 µs, or 100 ns per exit. This is on Juno.

register int volatile count asm("r2") = 0;

do {

} while(count);
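Read as plain C, the loop above is a single-iteration wrapper: count starts at 0 and nothing writes it, so do { ... } while (count) runs its body exactly once, and volatile only forces the compiler to actually test it. Any speedup would therefore have to come from how GCC compiles the body inside the loop (for example the mrs/str ordering discussed in this thread), not from work being skipped. GCC also documents local register variables (register ... asm("reg")) as only supported for specifying extended-asm operands, so the register binding itself is not guaranteed here. The portable sketch below assumes the loop body was the __sysreg_save_state() sequence (the body is empty in the mail) and replaces it with a hypothetical counter so the single pass can be checked.

```c
#include <assert.h>

/* Hypothetical stand-in for the save sequence that sat inside the loop. */
static unsigned int body_runs;

static void save_sequence(void)
{
	body_runs++;
}

static void save_in_loop(void)
{
	/* The mail pins count to a register with
	 *   register int volatile count asm("r2") = 0;
	 * which is GCC/AArch64 specific (x2 was intended). A plain
	 * volatile local keeps the semantics that matter here: count is
	 * reloaded at every test but is always 0, so the body runs once. */
	volatile int count = 0;

	do {
		save_sequence();
	} while (count);
}
```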

I didn't test the restore function (ran out of time), but I suspect it should be
the same. The generated assembler uses pretty much all the GPRs (a few too many:
it uses stp to push 4 pairs onto the stack and restore them later). Looking at
the assembler, it should all execute out of order.

FWIW I gave this a try since compilers like to optimize loops. I used the
'cntpct_el0' counter register to measure the intervals.
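A sketch of the measurement side, assuming the timed code runs at EL2 like the hyp code (EL0 access to cntpct_el0 is only possible if the kernel enables it through CNTKCTL_EL1). SKETCH_RUNS_AT_EL2 and time_call_ns() are hypothetical names, and the counter frequency should be read from cntfrq_el0 on the board rather than assumed. The same arithmetic checks the per-exit figure: 100 ns saved on each of 5000 exits is 500,000 ns, i.e. 500 µs (0.5 ms).

```c
#include <assert.h>
#include <stdint.h>

/* Convert counter ticks to nanoseconds for a counter running at freq_hz.
 * Split into whole seconds and remainder so ticks * 1e9 cannot overflow
 * for realistic counter frequencies. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
	assert(freq_hz != 0);
	return (ticks / freq_hz) * 1000000000ull +
	       (ticks % freq_hz) * 1000000000ull / freq_hz;
}

/* Average cost (or saving) per exit over a run. */
static uint64_t per_exit_ns(uint64_t total_ns, uint64_t nr_exits)
{
	assert(nr_exits != 0);
	return total_ns / nr_exits;
}

#if defined(__aarch64__) && defined(SKETCH_RUNS_AT_EL2)
/* Physical count; the isb stops the read being hoisted above earlier code. */
static inline uint64_t read_cntpct(void)
{
	uint64_t val;
	__asm__ volatile("isb\n\tmrs %0, cntpct_el0" : "=r"(val) : : "memory");
	return val;
}

/* Counter frequency in Hz, as programmed by firmware. */
static inline uint64_t read_cntfrq(void)
{
	uint64_t val;
	__asm__ volatile("mrs %0, cntfrq_el0" : "=r"(val));
	return val;
}

/* Hypothetical helper: time one call of fn(arg) in nanoseconds. */
static inline uint64_t time_call_ns(void (*fn)(void *), void *arg)
{
	uint64_t t0 = read_cntpct();
	fn(arg);
	uint64_t t1 = read_cntpct();
	return ticks_to_ns(t1 - t0, read_cntfrq());
}
#endif
```

For cycle counts, as suggested, the PMU cycle counter (PMCCNTR_EL0) would take the place of cntpct_el0. The generic counter ticks at a fixed frequency that is typically far below the CPU clock, so a short sequence spans only a few ticks and needs averaging over many exits.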


> +/* ctxt is already in the HYP VA space */
> +void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> +{
> + ctxt->sys_regs[MPIDR_EL1]   = read_sysreg(vmpidr_el2);
> + ctxt->sys_regs[CSSELR_EL1]  = read_sysreg(csselr_el1);
> + ctxt->sys_regs[SCTLR_EL1]   = read_sysreg(sctlr_el1);
> + ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
> + ctxt->sys_regs[CPACR_EL1]   = read_sysreg(cpacr_el1);
> + ctxt->sys_regs[TTBR0_EL1]   = read_sysreg(ttbr0_el1);
> + ctxt->sys_regs[TTBR1_EL1]   = read_sysreg(ttbr1_el1);
> + ctxt->sys_regs[TCR_EL1] = read_sysreg(tcr_el1);
> + ctxt->sys_regs[ESR_EL1] = read_sysreg(esr_el1);
> + ctxt->sys_regs[AFSR0_EL1]   = read_sysreg(afsr0_el1);
> + ctxt->sys_regs[AFSR1_EL1]   = read_sysreg(afsr1_el1);
> + ctxt->sys_regs[FAR_EL1] = read_sysreg(far_el1);
> + ctxt->sys_regs[MAIR_EL1]= read_sysreg(mair_el1);
> + ctxt->sys_regs[VBAR_EL1]= read_sysreg(vbar_el1);
> + ctxt->sys_regs[CONTEXTIDR_EL1]  = read_sysreg(contextidr_el1);
> + ctxt->sys_regs[TPIDR_EL0]   = read_sysreg(tpidr_el0);
> + ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
> + ctxt->sys_regs[TPIDR_EL1]   = read_sysreg(tpidr_el1);
> + ctxt->sys_regs[AMAIR_EL1]   = read_sysreg(amair_el1);
> + ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
> + ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> + ctxt->sys_regs[MDSCR_EL1]   = read_sysreg(mdscr_el1);
> +
> + ctxt->gp_regs.regs.sp   = read_sysreg(sp_el0);
> + ctxt->gp_regs.regs.pc   = read_sysreg(elr_el2);
> + ctxt->gp_regs.regs.pstate   = read_sysreg(spsr_el2);
> + ctxt->gp_regs.sp_el1= read_sysreg(sp_el1);
> + ctxt->gp_regs.elr_el1   = 

[PATCH v3 07/22] arm64: KVM: Implement system register save/restore

2015-12-07 Thread Marc Zyngier
Implement the system register save/restore as a direct translation of
the assembly code version.

Signed-off-by: Marc Zyngier 
Reviewed-by: Christoffer Dall 
---
 arch/arm64/kvm/hyp/Makefile|  1 +
 arch/arm64/kvm/hyp/hyp.h   |  3 ++
 arch/arm64/kvm/hyp/sysreg-sr.c | 90 ++
 3 files changed, 94 insertions(+)
 create mode 100644 arch/arm64/kvm/hyp/sysreg-sr.c

diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 455dc0a..ec94200 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -5,3 +5,4 @@
 obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
+obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
index f213e46..778d56d 100644
--- a/arch/arm64/kvm/hyp/hyp.h
+++ b/arch/arm64/kvm/hyp/hyp.h
@@ -38,5 +38,8 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
 void __timer_save_state(struct kvm_vcpu *vcpu);
 void __timer_restore_state(struct kvm_vcpu *vcpu);
 
+void __sysreg_save_state(struct kvm_cpu_context *ctxt);
+void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
+
 #endif /* __ARM64_KVM_HYP_H__ */
 
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
new file mode 100644
index 000..add8fcb
--- /dev/null
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -0,0 +1,90 @@
+/*
+ * Copyright (C) 2012-2015 - ARM Ltd
+ * Author: Marc Zyngier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+
+#include 
+
+#include "hyp.h"
+
+/* ctxt is already in the HYP VA space */
+void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
+{
+   ctxt->sys_regs[MPIDR_EL1]   = read_sysreg(vmpidr_el2);
+   ctxt->sys_regs[CSSELR_EL1]  = read_sysreg(csselr_el1);
+   ctxt->sys_regs[SCTLR_EL1]   = read_sysreg(sctlr_el1);
+   ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
+   ctxt->sys_regs[CPACR_EL1]   = read_sysreg(cpacr_el1);
+   ctxt->sys_regs[TTBR0_EL1]   = read_sysreg(ttbr0_el1);
+   ctxt->sys_regs[TTBR1_EL1]   = read_sysreg(ttbr1_el1);
+   ctxt->sys_regs[TCR_EL1] = read_sysreg(tcr_el1);
+   ctxt->sys_regs[ESR_EL1] = read_sysreg(esr_el1);
+   ctxt->sys_regs[AFSR0_EL1]   = read_sysreg(afsr0_el1);
+   ctxt->sys_regs[AFSR1_EL1]   = read_sysreg(afsr1_el1);
+   ctxt->sys_regs[FAR_EL1] = read_sysreg(far_el1);
+   ctxt->sys_regs[MAIR_EL1]= read_sysreg(mair_el1);
+   ctxt->sys_regs[VBAR_EL1]= read_sysreg(vbar_el1);
+   ctxt->sys_regs[CONTEXTIDR_EL1]  = read_sysreg(contextidr_el1);
+   ctxt->sys_regs[TPIDR_EL0]   = read_sysreg(tpidr_el0);
+   ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
+   ctxt->sys_regs[TPIDR_EL1]   = read_sysreg(tpidr_el1);
+   ctxt->sys_regs[AMAIR_EL1]   = read_sysreg(amair_el1);
+   ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
+   ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
+   ctxt->sys_regs[MDSCR_EL1]   = read_sysreg(mdscr_el1);
+
+   ctxt->gp_regs.regs.sp   = read_sysreg(sp_el0);
+   ctxt->gp_regs.regs.pc   = read_sysreg(elr_el2);
+   ctxt->gp_regs.regs.pstate   = read_sysreg(spsr_el2);
+   ctxt->gp_regs.sp_el1= read_sysreg(sp_el1);
+   ctxt->gp_regs.elr_el1   = read_sysreg(elr_el1);
+   ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
+}
+
+void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
+{
+   write_sysreg(ctxt->sys_regs[MPIDR_EL1],   vmpidr_el2);
+   write_sysreg(ctxt->sys_regs[CSSELR_EL1],  csselr_el1);
+   write_sysreg(ctxt->sys_regs[SCTLR_EL1],   sctlr_el1);
+   write_sysreg(ctxt->sys_regs[ACTLR_EL1],   actlr_el1);
+   write_sysreg(ctxt->sys_regs[CPACR_EL1],   cpacr_el1);
+   write_sysreg(ctxt->sys_regs[TTBR0_EL1],   ttbr0_el1);
+   write_sysreg(ctxt->sys_regs[TTBR1_EL1],   ttbr1_el1);
+   write_sysreg(ctxt->sys_regs[TCR_EL1], tcr_el1);
+   write_sysreg(ctxt->sys_regs[ESR_EL1], esr_el1);
+   write_sysreg(ctxt->sys_regs[AFSR0_EL1],   afsr0_el1);
+   write_sysreg(ctxt->sys_regs[AFSR1_EL1],   afsr1_el1);
+   write_sysreg(ctxt->sys_regs[FAR_EL1],