Re: [PATCH v3 13/28] arm64/sve: Signal handling support

2017-10-13 Thread Dave Martin
On Fri, Oct 13, 2017 at 12:17:19PM +0100, Catalin Marinas wrote:
> On Thu, Oct 12, 2017 at 05:11:57PM +0100, Dave P Martin wrote:
> > On Wed, Oct 11, 2017 at 05:40:52PM +0100, Catalin Marinas wrote:
> > > On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote:
> > > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > > > index aabeaee..fa4ed34 100644
> > > > --- a/arch/arm64/kernel/fpsimd.c
> > > > +++ b/arch/arm64/kernel/fpsimd.c
> > > > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task)
> > > >sizeof(fst->vregs[i]));
> > > >  }
> > > >  
> > > > +/*
> > > > + * Transfer the SVE state in task->thread.sve_state to
> > > > + * task->thread.fpsimd_state.
> > > > + *
> > > > + * Task can be a non-runnable task, or current.  In the latter case,
> > > > + * softirqs (and preemption) must be disabled.
> > > > + * task->thread.sve_state must point to at least sve_state_size(task)
> > > > + * bytes of allocated kernel memory.
> > > > + * task->thread.sve_state must be up to date before calling this 
> > > > function.
> > > > + */
> > > > +static void sve_to_fpsimd(struct task_struct *task)
> > > > +{
> > > > +   unsigned int vq;
> > > > +   void const *sst = task->thread.sve_state;
> > > > +   struct fpsimd_state *fst = >thread.fpsimd_state;
> > > > +   unsigned int i;
> > > > +
> > > > +   if (!system_supports_sve())
> > > > +   return;
> > > > +
> > > > +   vq = sve_vq_from_vl(task->thread.sve_vl);
> > > > +   for (i = 0; i < 32; ++i)
> > > > +   memcpy(>vregs[i], ZREG(sst, vq, i),
> > > > +  sizeof(fst->vregs[i]));
> > > > +}
> > > 
> > > Nit: could we actually just do an assignment with some pointer casting?
> > > It looks like we invoke memcpy for every 16 bytes (same for
> > > fpsimd_to_sve).
> > 
> > I was uneasy about what the type of ZREG(sst, vq, i) ought to be.
> > In any case, memest() is magic: my oldskool GCC (5.3.0) generates:
> > 
> > 08084c70 :
> > 08084c70:   1404b   08084c80 
> > 
> > 08084c74:   d503201fnop
> > 08084c78:   d65f03c0ret
> > 08084c7c:   d503201fnop
> > 08084c80:   f0007d61adrpx1, 09033000 
> > 
> > 08084c84:   f942a021ldr x1, [x1,#1344]
> > 08084c88:   36b001c1tbz w1, #22, 08084cc0 
> > 
> > 08084c8c:   b94ca805ldr w5, [x0,#3240]
> > 08084c90:   912a0001add x1, x0, #0xa80
> > 08084c94:   91320004add x4, x0, #0xc80
> > 08084c98:   f9465006ldr x6, [x0,#3232]
> > 08084c9c:   121c6ca5and w5, w5, #0xfff0
> > 08084ca0:   5280mov w0, #0x0
> > // #0
> > 08084ca4:   8b2040c2add x2, x6, w0, uxtw
> > 08084ca8:   0b05add w0, w0, w5
> > 08084cac:   a9400c42ldp x2, x3, [x2]
> > 08084cb0:   a8810c22stp x2, x3, [x1],#16
> > 08084cb4:   eb01009fcmp x4, x1
> > 08084cb8:   5461b.ne08084ca4 
> > 
> > 08084cbc:   d65f03c0ret
> > 08084cc0:   d65f03c0ret
> > 08084cc4:   d503201fnop
> > 
> > 
> > Without volatile, I think assigning a single object and doing a memcpy()
> > are equivalent to the compiler: which it actually uses depends solely on
> > optimisation considerations.
> > 
> > (But then I'm not a language lawyer ... not a professional one anyway).
> > 
> > Are you concerned compilers may mess this up?
> 
> That's fine, please ignore my comment then. I was worried that gcc would
> always generate a call to the memcpy implementation rather than inlining
> it.

OK.  I'll keep an eye on this, but in any case it won't impact the
FPSIMD-only case.

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 13/28] arm64/sve: Signal handling support

2017-10-13 Thread Catalin Marinas
On Thu, Oct 12, 2017 at 05:11:57PM +0100, Dave P Martin wrote:
> On Wed, Oct 11, 2017 at 05:40:52PM +0100, Catalin Marinas wrote:
> > On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote:
> > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > > index aabeaee..fa4ed34 100644
> > > --- a/arch/arm64/kernel/fpsimd.c
> > > +++ b/arch/arm64/kernel/fpsimd.c
> > > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task)
> > >  sizeof(fst->vregs[i]));
> > >  }
> > >  
> > > +/*
> > > + * Transfer the SVE state in task->thread.sve_state to
> > > + * task->thread.fpsimd_state.
> > > + *
> > > + * Task can be a non-runnable task, or current.  In the latter case,
> > > + * softirqs (and preemption) must be disabled.
> > > + * task->thread.sve_state must point to at least sve_state_size(task)
> > > + * bytes of allocated kernel memory.
> > > + * task->thread.sve_state must be up to date before calling this 
> > > function.
> > > + */
> > > +static void sve_to_fpsimd(struct task_struct *task)
> > > +{
> > > + unsigned int vq;
> > > + void const *sst = task->thread.sve_state;
> > > + struct fpsimd_state *fst = >thread.fpsimd_state;
> > > + unsigned int i;
> > > +
> > > + if (!system_supports_sve())
> > > + return;
> > > +
> > > + vq = sve_vq_from_vl(task->thread.sve_vl);
> > > + for (i = 0; i < 32; ++i)
> > > + memcpy(>vregs[i], ZREG(sst, vq, i),
> > > +sizeof(fst->vregs[i]));
> > > +}
> > 
> > Nit: could we actually just do an assignment with some pointer casting?
> > It looks like we invoke memcpy for every 16 bytes (same for
> > fpsimd_to_sve).
> 
> I was uneasy about what the type of ZREG(sst, vq, i) ought to be.
> In any case, memest() is magic: my oldskool GCC (5.3.0) generates:
> 
> 08084c70 :
> 08084c70:   1404b   08084c80 
> 
> 08084c74:   d503201fnop
> 08084c78:   d65f03c0ret
> 08084c7c:   d503201fnop
> 08084c80:   f0007d61adrpx1, 09033000 
> 
> 08084c84:   f942a021ldr x1, [x1,#1344]
> 08084c88:   36b001c1tbz w1, #22, 08084cc0 
> 
> 08084c8c:   b94ca805ldr w5, [x0,#3240]
> 08084c90:   912a0001add x1, x0, #0xa80
> 08084c94:   91320004add x4, x0, #0xc80
> 08084c98:   f9465006ldr x6, [x0,#3232]
> 08084c9c:   121c6ca5and w5, w5, #0xfff0
> 08084ca0:   5280mov w0, #0x0  
>   // #0
> 08084ca4:   8b2040c2add x2, x6, w0, uxtw
> 08084ca8:   0b05add w0, w0, w5
> 08084cac:   a9400c42ldp x2, x3, [x2]
> 08084cb0:   a8810c22stp x2, x3, [x1],#16
> 08084cb4:   eb01009fcmp x4, x1
> 08084cb8:   5461b.ne08084ca4 
> 
> 08084cbc:   d65f03c0ret
> 08084cc0:   d65f03c0ret
> 08084cc4:   d503201fnop
> 
> 
> Without volatile, I think assigning a single object and doing a memcpy()
> are equivalent to the compiler: which it actually uses depends solely on
> optimisation considerations.
> 
> (But then I'm not a language lawyer ... not a professional one anyway).
> 
> Are you concerned compilers may mess this up?

That's fine, please ignore my comment then. I was worried that gcc would
always generate a call to the memcpy implementation rather than inlining
it.

-- 
Catalin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 13/28] arm64/sve: Signal handling support

2017-10-12 Thread Dave Martin
On Wed, Oct 11, 2017 at 05:40:52PM +0100, Catalin Marinas wrote:
> On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote:
> > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > index aabeaee..fa4ed34 100644
> > --- a/arch/arm64/kernel/fpsimd.c
> > +++ b/arch/arm64/kernel/fpsimd.c
> > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task)
> >sizeof(fst->vregs[i]));
> >  }
> >  
> > +/*
> > + * Transfer the SVE state in task->thread.sve_state to
> > + * task->thread.fpsimd_state.
> > + *
> > + * Task can be a non-runnable task, or current.  In the latter case,
> > + * softirqs (and preemption) must be disabled.
> > + * task->thread.sve_state must point to at least sve_state_size(task)
> > + * bytes of allocated kernel memory.
> > + * task->thread.sve_state must be up to date before calling this function.
> > + */
> > +static void sve_to_fpsimd(struct task_struct *task)
> > +{
> > +   unsigned int vq;
> > +   void const *sst = task->thread.sve_state;
> > +   struct fpsimd_state *fst = >thread.fpsimd_state;
> > +   unsigned int i;
> > +
> > +   if (!system_supports_sve())
> > +   return;
> > +
> > +   vq = sve_vq_from_vl(task->thread.sve_vl);
> > +   for (i = 0; i < 32; ++i)
> > +   memcpy(>vregs[i], ZREG(sst, vq, i),
> > +  sizeof(fst->vregs[i]));
> > +}
> 
> Nit: could we actually just do an assignment with some pointer casting?
> It looks like we invoke memcpy for every 16 bytes (same for
> fpsimd_to_sve).

I was uneasy about what the type of ZREG(sst, vq, i) ought to be.
In any case, memest() is magic: my oldskool GCC (5.3.0) generates:

08084c70 :
08084c70:   1404b   08084c80 

08084c74:   d503201fnop
08084c78:   d65f03c0ret
08084c7c:   d503201fnop
08084c80:   f0007d61adrpx1, 09033000 

08084c84:   f942a021ldr x1, [x1,#1344]
08084c88:   36b001c1tbz w1, #22, 08084cc0 

08084c8c:   b94ca805ldr w5, [x0,#3240]
08084c90:   912a0001add x1, x0, #0xa80
08084c94:   91320004add x4, x0, #0xc80
08084c98:   f9465006ldr x6, [x0,#3232]
08084c9c:   121c6ca5and w5, w5, #0xfff0
08084ca0:   5280mov w0, #0x0
// #0
08084ca4:   8b2040c2add x2, x6, w0, uxtw
08084ca8:   0b05add w0, w0, w5
08084cac:   a9400c42ldp x2, x3, [x2]
08084cb0:   a8810c22stp x2, x3, [x1],#16
08084cb4:   eb01009fcmp x4, x1
08084cb8:   5461b.ne08084ca4 

08084cbc:   d65f03c0ret
08084cc0:   d65f03c0ret
08084cc4:   d503201fnop


Without volatile, I think assigning a single object and doing a memcpy()
are equivalent to the compiler: which it actually uses depends solely on
optimisation considerations.

(But then I'm not a language lawyer ... not a professional one anyway).


Are you concerned compilers may mess this up?

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 13/28] arm64/sve: Signal handling support

2017-10-11 Thread Catalin Marinas
On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote:
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index aabeaee..fa4ed34 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task)
>  sizeof(fst->vregs[i]));
>  }
>  
> +/*
> + * Transfer the SVE state in task->thread.sve_state to
> + * task->thread.fpsimd_state.
> + *
> + * Task can be a non-runnable task, or current.  In the latter case,
> + * softirqs (and preemption) must be disabled.
> + * task->thread.sve_state must point to at least sve_state_size(task)
> + * bytes of allocated kernel memory.
> + * task->thread.sve_state must be up to date before calling this function.
> + */
> +static void sve_to_fpsimd(struct task_struct *task)
> +{
> + unsigned int vq;
> + void const *sst = task->thread.sve_state;
> + struct fpsimd_state *fst = >thread.fpsimd_state;
> + unsigned int i;
> +
> + if (!system_supports_sve())
> + return;
> +
> + vq = sve_vq_from_vl(task->thread.sve_vl);
> + for (i = 0; i < 32; ++i)
> + memcpy(>vregs[i], ZREG(sst, vq, i),
> +sizeof(fst->vregs[i]));
> +}

Nit: could we actually just do an assignment with some pointer casting?
It looks like we invoke memcpy for every 16 bytes (same for
fpsimd_to_sve).

Otherwise:

Reviewed-by: Catalin Marinas 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm