Re: [PATCH v3 12/18] arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling

2018-02-05 Thread Christoffer Dall
On Mon, Feb 05, 2018 at 09:08:31AM +, Marc Zyngier wrote:
> On 04/02/18 18:39, Christoffer Dall wrote:
> > On Thu, Feb 01, 2018 at 11:46:51AM +, Marc Zyngier wrote:
> >> We want SMCCC_ARCH_WORKAROUND_1 to be fast. As fast as possible.
> >> So let's intercept it as early as we can by testing for the
> >> function call number as soon as we've identified an HVC call
> >> coming from the guest.
> > 
> > Hmmm.  How often is this expected to happen and what is the expected
> > extra cost of doing the early-exit handling in the C code vs. here?
> 
> Pretty often. On each context switch of a Linux guest, for example. It
> is almost as bad as if we were trapping all VM ops. Moving it to C is
> definitely visible on something like hackbench (I remember something
> like a 10-12% degradation on Seattle, but I'd need to rerun the tests to
> give you something accurate). 

If it's that easily visible (although hackbench is clearly the
pathological case here), then we should try to optimize it.  Let's hope
we don't have to add too many of these workarounds in the future.

> It is the whole GPR save/restore dance
> that costs us a lot (31 registers for the guest, 12 for the host), plus
> some extra SError synchronization that doesn't come for free either.
> 

Fair enough.

> > I think we'd be better off if we only had a single early-exit path (and
> > we should move the FP/SIMD trap to that path as well), but if there's a
> > measurable benefit of having this logic in assembly as opposed to in the
> > C code, then I'm ok with this as well.
> 
> I agree that the multiplication of "earlier than early" paths is
> becoming annoying. Moving the FP/SIMD stuff to C would be less
> problematic, as we have patches to move some of that to load/put, and
> we'd only take the trap once per time slice (as opposed to once per
> entry at the moment).

Yes, and we can even improve on that (see separate discussions around
KVM support for SVE with Dave).

> 
> Here, we're trying hard to do exactly nothing, because each instruction
> is just an extra overhead (we've already nuked the BP). I even
> considered inserting that code as part of the per-CPU-type vectors (and
> leave the rest of the KVM code alone), but it felt like a step too far.
> 

We can always look at adjusting this more in the future if we want.

Reviewed-by: Christoffer Dall 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 12/18] arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling

2018-02-05 Thread Marc Zyngier
On 04/02/18 18:39, Christoffer Dall wrote:
> On Thu, Feb 01, 2018 at 11:46:51AM +, Marc Zyngier wrote:
>> We want SMCCC_ARCH_WORKAROUND_1 to be fast. As fast as possible.
>> So let's intercept it as early as we can by testing for the
>> function call number as soon as we've identified an HVC call
>> coming from the guest.
> 
> Hmmm.  How often is this expected to happen and what is the expected
> extra cost of doing the early-exit handling in the C code vs. here?

Pretty often. On each context switch of a Linux guest, for example. It
is almost as bad as if we were trapping all VM ops. Moving it to C is
definitely visible on something like hackbench (I remember something
like a 10-12% degradation on Seattle, but I'd need to rerun the tests to
give you something accurate). It is the whole GPR save/restore dance
that costs us a lot (31 registers for the guest, 12 for the host), plus
some extra SError synchronization that doesn't come for free either.

> I think we'd be better off if we only had a single early-exit path (and
> we should move the FP/SIMD trap to that path as well), but if there's a
> measurable benefit of having this logic in assembly as opposed to in the
> C code, then I'm ok with this as well.

I agree that the multiplication of "earlier than early" paths is
becoming annoying. Moving the FP/SIMD stuff to C would be less
problematic, as we have patches to move some of that to load/put, and
we'd only take the trap once per time slice (as opposed to once per
entry at the moment).

Here, we're trying hard to do exactly nothing, because each instruction
is just an extra overhead (we've already nuked the BP). I even
considered inserting that code as part of the per-CPU-type vectors (and
leave the rest of the KVM code alone), but it felt like a step too far.

> The code in this patch looks fine otherwise.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v3 12/18] arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling

2018-02-04 Thread Christoffer Dall
On Thu, Feb 01, 2018 at 11:46:51AM +, Marc Zyngier wrote:
> We want SMCCC_ARCH_WORKAROUND_1 to be fast. As fast as possible.
> So let's intercept it as early as we can by testing for the
> function call number as soon as we've identified an HVC call
> coming from the guest.

Hmmm.  How often is this expected to happen and what is the expected
extra cost of doing the early-exit handling in the C code vs. here?

I think we'd be better off if we only had a single early-exit path (and
we should move the FP/SIMD trap to that path as well), but if there's a
measurable benefit of having this logic in assembly as opposed to in the
C code, then I'm ok with this as well.

The code in this patch looks fine otherwise.

Thanks,
-Christoffer

> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/hyp-entry.S | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index e4f37b9dd47c..f36464bd57c5 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -15,6 +15,7 @@
>   * along with this program.  If not, see .
>   */
>  
> +#include 
>  #include 
>  
>  #include 
> @@ -64,10 +65,11 @@ alternative_endif
>   lsr x0, x1, #ESR_ELx_EC_SHIFT
>  
>   cmp x0, #ESR_ELx_EC_HVC64
> + ccmp x0, #ESR_ELx_EC_HVC32, #4, ne
>   b.ne el1_trap
>  
> - mrs x1, vttbr_el2   // If vttbr is valid, the 64bit guest
> - cbnz x1, el1_trap    // called HVC
> + mrs x1, vttbr_el2   // If vttbr is valid, the guest
> + cbnz x1, el1_hvc_guest   // called HVC
>  
>   /* Here, we're pretty sure the host called HVC. */
>   ldp x0, x1, [sp], #16
> @@ -100,6 +102,20 @@ alternative_endif
>  
>   eret
>  
> +el1_hvc_guest:
> + /*
> +  * Fastest possible path for ARM_SMCCC_ARCH_WORKAROUND_1.
> +  * The workaround has already been applied on the host,
> +  * so let's quickly get back to the guest. We don't bother
> +  * restoring x1, as it can be clobbered anyway.
> +  */
> + ldr x1, [sp] // Guest's x0
> + eor w1, w1, #ARM_SMCCC_ARCH_WORKAROUND_1
> + cbnz w1, el1_trap
> + mov x0, x1
> + add sp, sp, #16
> + eret
> +
>  el1_trap:
>   /*
>    * x0: ESR_EC
> -- 
> 2.14.2
> 
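For readers following the control flow of the hunk above, here is a rough
C model of the decision the assembly makes. It is only a sketch, not code
from the patch: the function name classify_sync_exception and the enum
hyp_path are made up for illustration, and the numeric constants are the
values I believe the kernel headers define (worth checking against your
tree). It assumes ESR bits above 31 are zero, as the assembly does by
shifting without masking.

```c
#include <stdint.h>

/* Assumed values of the kernel constants used by the patch. */
#define ESR_ELx_EC_SHIFT            26
#define ESR_ELx_EC_HVC32            0x12
#define ESR_ELx_EC_HVC64            0x16
#define ARM_SMCCC_ARCH_WORKAROUND_1 0x80008000u

/* Hypothetical labels for the three exits of the assembly. */
enum hyp_path {
    PATH_EL1_TRAP,   /* full exit: save guest GPRs, go to C code  */
    PATH_HOST_HVC,   /* vttbr_el2 == 0: the host issued the HVC   */
    PATH_FAST_ERET   /* workaround call: return to guest at once  */
};

static enum hyp_path classify_sync_exception(uint64_t esr, uint64_t vttbr,
                                             uint64_t *guest_x0)
{
    /* lsr x0, x1, #ESR_ELx_EC_SHIFT */
    uint64_t ec = esr >> ESR_ELx_EC_SHIFT;

    /*
     * cmp + ccmp + b.ne: the ccmp only performs the second compare
     * when the first one failed; otherwise it forces Z=1 (#4).
     * The net effect is "trap unless EC is HVC64 or HVC32".
     */
    if (ec != ESR_ELx_EC_HVC64 && ec != ESR_ELx_EC_HVC32)
        return PATH_EL1_TRAP;

    /* mrs x1, vttbr_el2; cbnz x1, el1_hvc_guest */
    if (vttbr == 0)
        return PATH_HOST_HVC;

    /*
     * eor w1, w1, #ARM_SMCCC_ARCH_WORKAROUND_1; cbnz w1, el1_trap
     * Only the low 32 bits are compared. On a match the XOR result
     * is zero, and that zero is what gets returned in x0.
     */
    uint32_t w1 = (uint32_t)*guest_x0 ^ ARM_SMCCC_ARCH_WORKAROUND_1;
    if (w1 != 0)
        return PATH_EL1_TRAP;

    *guest_x0 = w1;  /* mov x0, x1: guest sees 0 in x0 */
    return PATH_FAST_ERET;
}
```

The point of the eor/cbnz pair is that a single instruction both tests
the function ID and produces the zero return value, so the fast path
needs no extra mov of an immediate before the eret.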