Re: [PATCH v3 13/28] arm64/sve: Signal handling support
On Fri, Oct 13, 2017 at 12:17:19PM +0100, Catalin Marinas wrote: > On Thu, Oct 12, 2017 at 05:11:57PM +0100, Dave P Martin wrote: > > On Wed, Oct 11, 2017 at 05:40:52PM +0100, Catalin Marinas wrote: > > > On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote: > > > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > > > > index aabeaee..fa4ed34 100644 > > > > --- a/arch/arm64/kernel/fpsimd.c > > > > +++ b/arch/arm64/kernel/fpsimd.c > > > > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task) > > > >sizeof(fst->vregs[i])); > > > > } > > > > > > > > +/* > > > > + * Transfer the SVE state in task->thread.sve_state to > > > > + * task->thread.fpsimd_state. > > > > + * > > > > + * Task can be a non-runnable task, or current. In the latter case, > > > > + * softirqs (and preemption) must be disabled. > > > > + * task->thread.sve_state must point to at least sve_state_size(task) > > > > + * bytes of allocated kernel memory. > > > > + * task->thread.sve_state must be up to date before calling this > > > > function. > > > > + */ > > > > +static void sve_to_fpsimd(struct task_struct *task) > > > > +{ > > > > + unsigned int vq; > > > > + void const *sst = task->thread.sve_state; > > > > + struct fpsimd_state *fst = >thread.fpsimd_state; > > > > + unsigned int i; > > > > + > > > > + if (!system_supports_sve()) > > > > + return; > > > > + > > > > + vq = sve_vq_from_vl(task->thread.sve_vl); > > > > + for (i = 0; i < 32; ++i) > > > > + memcpy(>vregs[i], ZREG(sst, vq, i), > > > > + sizeof(fst->vregs[i])); > > > > +} > > > > > > Nit: could we actually just do an assignment with some pointer casting? > > > It looks like we invoke memcpy for every 16 bytes (same for > > > fpsimd_to_sve). > > > > I was uneasy about what the type of ZREG(sst, vq, i) ought to be. > > In any case, memest() is magic: my oldskool GCC (5.3.0) generates: > > > > 08084c70 : > > 08084c70: 1404b 08084c80 > >> > 08084c74: d503201fnop > > 08084c78: d65f03c0ret > > 08084c7c: d503201fnop > > 08084c80: f0007d61adrpx1, 09033000 > > > > 08084c84: f942a021ldr x1, [x1,#1344] > > 08084c88: 36b001c1tbz w1, #22, 08084cc0 > > > > 08084c8c: b94ca805ldr w5, [x0,#3240] > > 08084c90: 912a0001add x1, x0, #0xa80 > > 08084c94: 91320004add x4, x0, #0xc80 > > 08084c98: f9465006ldr x6, [x0,#3232] > > 08084c9c: 121c6ca5and w5, w5, #0xfff0 > > 08084ca0: 5280mov w0, #0x0 > > // #0 > > 08084ca4: 8b2040c2add x2, x6, w0, uxtw > > 08084ca8: 0b05add w0, w0, w5 > > 08084cac: a9400c42ldp x2, x3, [x2] > > 08084cb0: a8810c22stp x2, x3, [x1],#16 > > 08084cb4: eb01009fcmp x4, x1 > > 08084cb8: 5461b.ne08084ca4 > > > > 08084cbc: d65f03c0ret > > 08084cc0: d65f03c0ret > > 08084cc4: d503201fnop > > > > > > Without volatile, I think assigning a single object and doing a memcpy() > > are equivalent to the compiler: which it actually uses depends solely on > > optimisation considerations. > > > > (But then I'm not a language lawyer ... not a professional one anyway). > > > > Are you concerned compilers may mess this up? > > That's fine, please ignore my comment then. I was worried that gcc would > always generate a call to the memcpy implementation rather than inlining > it. OK. I'll keep an eye on this, but in any case it won't impact the FPSIMD-only case. Cheers ---Dave ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 13/28] arm64/sve: Signal handling support
On Thu, Oct 12, 2017 at 05:11:57PM +0100, Dave P Martin wrote: > On Wed, Oct 11, 2017 at 05:40:52PM +0100, Catalin Marinas wrote: > > On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote: > > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > > > index aabeaee..fa4ed34 100644 > > > --- a/arch/arm64/kernel/fpsimd.c > > > +++ b/arch/arm64/kernel/fpsimd.c > > > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task) > > > sizeof(fst->vregs[i])); > > > } > > > > > > +/* > > > + * Transfer the SVE state in task->thread.sve_state to > > > + * task->thread.fpsimd_state. > > > + * > > > + * Task can be a non-runnable task, or current. In the latter case, > > > + * softirqs (and preemption) must be disabled. > > > + * task->thread.sve_state must point to at least sve_state_size(task) > > > + * bytes of allocated kernel memory. > > > + * task->thread.sve_state must be up to date before calling this > > > function. > > > + */ > > > +static void sve_to_fpsimd(struct task_struct *task) > > > +{ > > > + unsigned int vq; > > > + void const *sst = task->thread.sve_state; > > > + struct fpsimd_state *fst = >thread.fpsimd_state; > > > + unsigned int i; > > > + > > > + if (!system_supports_sve()) > > > + return; > > > + > > > + vq = sve_vq_from_vl(task->thread.sve_vl); > > > + for (i = 0; i < 32; ++i) > > > + memcpy(>vregs[i], ZREG(sst, vq, i), > > > +sizeof(fst->vregs[i])); > > > +} > > > > Nit: could we actually just do an assignment with some pointer casting? > > It looks like we invoke memcpy for every 16 bytes (same for > > fpsimd_to_sve). > > I was uneasy about what the type of ZREG(sst, vq, i) ought to be. > In any case, memest() is magic: my oldskool GCC (5.3.0) generates: > > 08084c70 : > 08084c70: 1404b 08084c80 >> 08084c74: d503201fnop > 08084c78: d65f03c0ret > 08084c7c: d503201fnop > 08084c80: f0007d61adrpx1, 09033000 > > 08084c84: f942a021ldr x1, [x1,#1344] > 08084c88: 36b001c1tbz w1, #22, 08084cc0 > > 08084c8c: b94ca805ldr w5, [x0,#3240] > 08084c90: 912a0001add x1, x0, #0xa80 > 08084c94: 91320004add x4, x0, #0xc80 > 08084c98: f9465006ldr x6, [x0,#3232] > 08084c9c: 121c6ca5and w5, w5, #0xfff0 > 08084ca0: 5280mov w0, #0x0 > // #0 > 08084ca4: 8b2040c2add x2, x6, w0, uxtw > 08084ca8: 0b05add w0, w0, w5 > 08084cac: a9400c42ldp x2, x3, [x2] > 08084cb0: a8810c22stp x2, x3, [x1],#16 > 08084cb4: eb01009fcmp x4, x1 > 08084cb8: 5461b.ne08084ca4 > > 08084cbc: d65f03c0ret > 08084cc0: d65f03c0ret > 08084cc4: d503201fnop > > > Without volatile, I think assigning a single object and doing a memcpy() > are equivalent to the compiler: which it actually uses depends solely on > optimisation considerations. > > (But then I'm not a language lawyer ... not a professional one anyway). > > Are you concerned compilers may mess this up? That's fine, please ignore my comment then. I was worried that gcc would always generate a call to the memcpy implementation rather than inlining it. -- Catalin ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 13/28] arm64/sve: Signal handling support
On Wed, Oct 11, 2017 at 05:40:52PM +0100, Catalin Marinas wrote: > On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote: > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > > index aabeaee..fa4ed34 100644 > > --- a/arch/arm64/kernel/fpsimd.c > > +++ b/arch/arm64/kernel/fpsimd.c > > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task) > >sizeof(fst->vregs[i])); > > } > > > > +/* > > + * Transfer the SVE state in task->thread.sve_state to > > + * task->thread.fpsimd_state. > > + * > > + * Task can be a non-runnable task, or current. In the latter case, > > + * softirqs (and preemption) must be disabled. > > + * task->thread.sve_state must point to at least sve_state_size(task) > > + * bytes of allocated kernel memory. > > + * task->thread.sve_state must be up to date before calling this function. > > + */ > > +static void sve_to_fpsimd(struct task_struct *task) > > +{ > > + unsigned int vq; > > + void const *sst = task->thread.sve_state; > > + struct fpsimd_state *fst = >thread.fpsimd_state; > > + unsigned int i; > > + > > + if (!system_supports_sve()) > > + return; > > + > > + vq = sve_vq_from_vl(task->thread.sve_vl); > > + for (i = 0; i < 32; ++i) > > + memcpy(>vregs[i], ZREG(sst, vq, i), > > + sizeof(fst->vregs[i])); > > +} > > Nit: could we actually just do an assignment with some pointer casting? > It looks like we invoke memcpy for every 16 bytes (same for > fpsimd_to_sve). I was uneasy about what the type of ZREG(sst, vq, i) ought to be. In any case, memest() is magic: my oldskool GCC (5.3.0) generates: 08084c70 : 08084c70: 1404b 08084c8008084c74: d503201fnop 08084c78: d65f03c0ret 08084c7c: d503201fnop 08084c80: f0007d61adrpx1, 09033000 08084c84: f942a021ldr x1, [x1,#1344] 08084c88: 36b001c1tbz w1, #22, 08084cc0 08084c8c: b94ca805ldr w5, [x0,#3240] 08084c90: 912a0001add x1, x0, #0xa80 08084c94: 91320004add x4, x0, #0xc80 08084c98: f9465006ldr x6, [x0,#3232] 08084c9c: 121c6ca5and w5, w5, #0xfff0 08084ca0: 5280mov w0, #0x0 // #0 08084ca4: 8b2040c2add x2, x6, w0, uxtw 08084ca8: 0b05add w0, w0, w5 08084cac: a9400c42ldp x2, x3, [x2] 08084cb0: a8810c22stp x2, x3, [x1],#16 08084cb4: eb01009fcmp x4, x1 08084cb8: 5461b.ne08084ca4 08084cbc: d65f03c0ret 08084cc0: d65f03c0ret 08084cc4: d503201fnop Without volatile, I think assigning a single object and doing a memcpy() are equivalent to the compiler: which it actually uses depends solely on optimisation considerations. (But then I'm not a language lawyer ... not a professional one anyway). Are you concerned compilers may mess this up? Cheers ---Dave ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 13/28] arm64/sve: Signal handling support
On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote: > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > index aabeaee..fa4ed34 100644 > --- a/arch/arm64/kernel/fpsimd.c > +++ b/arch/arm64/kernel/fpsimd.c > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task) > sizeof(fst->vregs[i])); > } > > +/* > + * Transfer the SVE state in task->thread.sve_state to > + * task->thread.fpsimd_state. > + * > + * Task can be a non-runnable task, or current. In the latter case, > + * softirqs (and preemption) must be disabled. > + * task->thread.sve_state must point to at least sve_state_size(task) > + * bytes of allocated kernel memory. > + * task->thread.sve_state must be up to date before calling this function. > + */ > +static void sve_to_fpsimd(struct task_struct *task) > +{ > + unsigned int vq; > + void const *sst = task->thread.sve_state; > + struct fpsimd_state *fst = >thread.fpsimd_state; > + unsigned int i; > + > + if (!system_supports_sve()) > + return; > + > + vq = sve_vq_from_vl(task->thread.sve_vl); > + for (i = 0; i < 32; ++i) > + memcpy(>vregs[i], ZREG(sst, vq, i), > +sizeof(fst->vregs[i])); > +} Nit: could we actually just do an assignment with some pointer casting? It looks like we invoke memcpy for every 16 bytes (same for fpsimd_to_sve). Otherwise: Reviewed-by: Catalin Marinas___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm