Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Segher Boessenkool
Hi!

On Thu, Feb 24, 2022 at 09:29:55AM +0100, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 05:27:39PM -0600, Segher Boessenkool wrote:
> > On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> > > On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > > > +   /* Zero volatile regs that may contain sensitive kernel data */
> > > > +   li  r0,0
> > > > +   li  r4,0
> > > > +   li  r5,0
> > > > +   li  r6,0
> > > > +   li  r7,0
> > > > +   li  r8,0
> > > > +   li  r9,0
> > > > +   li  r10,0
> > > > +   li  r11,0
> > > > +   li  r12,0
> > > > +   mtctr   r0
> > > > +   mtxer   r0
> > > 
> > > Here, I'm almost sure that on some processors, it would be better to
> > > separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> > > flush) but I don't know what's the best ordering for the average core.
> > 
> > mtxer is cheaper than mtctr on many cores :-)
> 
> We're speaking of 32-bit here, I believe;

32-bit userland, yes.  Which runs fine on non-ancient cores, too.

> on my (admittedly old) paper
> copy of PowerPC 604 user's manual, I read in a footnote:
> 
> "The mtspr (XER) instruction causes instructions to be flushed when it
> executes." 

And the 604 has a trivially shallow pipeline anyway.

> I know there are probably very few 604 left in the field, but in this
> case mtspr(xer) looks very much like a superset of isync.

It hasn't been like that for decades.  On the 750, for example, mtxer
was already only execution-synchronised.

> I also just had a look at the documentation of a more widespread core:
> 
> https://www.nxp.com/docs/en/reference-manual/MPC7450UM.pdf
> 
> and mtspr(xer) is marked as execution and refetch serialized, actually
> it is the only instruction to have both.

This looks like a late addition (it messes up the table, for example,
being put after "mtspr (other)").  It also is different from 7400 and
750 and everything else.  A late bugfix?  Curious :-)

> Maybe there is a subtle difference between "refetch serialization" and
> "pipeline flush", but in this case please educate me.

There is a subtle difference, but it goes the other way: refetch
serialisation doesn't stop fetch or flush everything after it; only when
the instruction completes does it reject everything after it.  So it can
waste a bit more :-)

> Besides that, the back-to-back mtctr/mtspr(xer) may limit instruction
> decoding and issuing bandwidth.

It doesn't limit decode or dispatch (not issue fwiw) bandwidth on any
core I have ever heard of.

> I'd rather move one of them up by a few
> lines since they can only go to one of the execution units on some
> (or even most?) cores. This was my main point initially.

I think it is much more beneficial to *not* do these insns than to
shift them back and forth a cycle.


Segher


Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Segher Boessenkool
On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > +   /* Zero volatile regs that may contain sensitive kernel data */
> > +   li  r0,0
> > +   li  r4,0
> > +   li  r5,0
> > +   li  r6,0
> > +   li  r7,0
> > +   li  r8,0
> > +   li  r9,0
> > +   li  r10,0
> > +   li  r11,0
> > +   li  r12,0
> > +   mtctr   r0
> > +   mtxer   r0
> 
> Here, I'm almost sure that on some processors, it would be better to
> separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> flush) but I don't know what's the best ordering for the average core.

mtxer is cheaper than mtctr on many cores :-)

On p9, mtxer is cracked into two latency-3 ops (which run in parallel),
while mtctr has latency 5.

On p8 mtxer was horrible indeed (but nothing near as bad as a pipeline
flush).


Segher


Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Christophe Leroy


Le 23/02/2022 à 20:34, Kees Cook a écrit :
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
>> Commit a82adfd5c7cb ("hardening: Introduce CONFIG_ZERO_CALL_USED_REGS")
>> added zeroing of used registers at function exit.
>>
>> For the time being, PPC64 clears volatile registers on syscall exit but
>> PPC32 doesn't do it for performance reasons.
>>
>> Add that clearing in PPC32 syscall exit as well, but only when
>> CONFIG_ZERO_CALL_USED_REGS is selected.
>>
>> On an 8xx, the null_syscall selftest gives:
>> - Without CONFIG_ZERO_CALL_USED_REGS : 288 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS: 305 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS + this patch   : 319 cycles
>>
>> Note that (independent of this patch), with pmac32_defconfig,
>> vmlinux size is as follows with/without CONFIG_ZERO_CALL_USED_REGS:
>>
>> text     data     bss    dec      hex     filename
>> 9578869  2525210  194400 12298479 bba8ef  vmlinux.without
>> 10318045 2525210  194400 13037655 c6f057  vmlinux.with
>>
>> That is a 7.7% increase on text size, 6.0% on overall size.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>   arch/powerpc/kernel/entry_32.S | 15 +++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
>> index 7748c278d13c..199f23092c02 100644
>> --- a/arch/powerpc/kernel/entry_32.S
>> +++ b/arch/powerpc/kernel/entry_32.S
>> @@ -151,6 +151,21 @@ syscall_exit_finish:
>>  bne 3f
>>  mtcr    r5
>>   
>> +#ifdef CONFIG_ZERO_CALL_USED_REGS
>> +/* Zero volatile regs that may contain sensitive kernel data */
>> +li  r0,0
>> +li  r4,0
>> +li  r5,0
>> +li  r6,0
>> +li  r7,0
>> +li  r8,0
>> +li  r9,0
>> +li  r10,0
>> +li  r11,0
>> +li  r12,0
>> +mtctr   r0
>> +mtxer   r0
>> +#endif
> 
> I think this should probably be unconditional -- if this is actually
> leaking kernel pointers (or data) that's pretty bad. :|
> 
> If you really want to leave it build-time selectable, maybe add a new
> config that gets "select"ed by CONFIG_ZERO_CALL_USED_REGS?

You mean a CONFIG that is selected by CONFIG_ZERO_CALL_USED_REGS and may
also be selected by the user when CONFIG_ZERO_CALL_USED_REGS is not
selected?

At exit:
- content of r4 is loaded into LR
- content of r5 is loaded into CR
- content of r7 is where we branch after switching back to user mode
- content of r8 is loaded into MSR. Although MSR can't be read by the
user, there is nothing secret in it.
- XER contains arithmetic flags, nothing really sensitive.

So that leaves r0, r6, r9 to r12 and CTR.
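For illustration, that reduced clearing might look like this (a sketch
only, mirroring the shape of the patch's entry_32.S hunk; not a tested
change):

```asm
	/* Zero only the volatile regs whose exit values are not
	 * user-visible anyway: r0, r6, r9-r12 and CTR */
	li	r0,0
	li	r6,0
	li	r9,0
	li	r10,0
	li	r11,0
	li	r12,0
	mtctr	r0
```

This drops the clears of r4/r5/r7/r8 and the mtxer, since their contents
end up in user-visible LR/CR/MSR/XER state in any case.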

Maybe a compromise could be to only clear those when
CONFIG_ZERO_CALL_USED_REGS is not selected?

> 
> (And you may want to consider wiping all "unused" registers at syscall
> entry as well.)

How "unused"?

At syscall entry we have the syscall NR in r0 and the syscall args in r3
to r8.
The handler uses r9, r10, r11 and r12 before re-enabling the MMU and
taking any conditional branch.
r1 and r2 are also soon set and used (r1 is the stack ptr, r2 is the ptr
to the current task struct) and restored from the stack at the end.
r13-r31 are callee saved/restored.

Christophe

Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Gabriel Paubert
On Wed, Feb 23, 2022 at 05:27:39PM -0600, Segher Boessenkool wrote:
> On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> > On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > > + /* Zero volatile regs that may contain sensitive kernel data */
> > > + li  r0,0
> > > + li  r4,0
> > > + li  r5,0
> > > + li  r6,0
> > > + li  r7,0
> > > + li  r8,0
> > > + li  r9,0
> > > + li  r10,0
> > > + li  r11,0
> > > + li  r12,0
> > > + mtctr   r0
> > > + mtxer   r0
> > 
> > Here, I'm almost sure that on some processors, it would be better to
> > separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> > flush) but I don't know what's the best ordering for the average core.
> 
> mtxer is cheaper than mtctr on many cores :-)

We're speaking of 32-bit here, I believe; on my (admittedly old) paper
copy of the PowerPC 604 user's manual, I read in a footnote:

"The mtspr (XER) instruction causes instructions to be flushed when it
executes." 

Also a paragraph about "PostDispatch Serialization Mode" which reads:
"All instructions following the postdispatch serialization instruction
are flushed, refetched, and reexecuted."

Then it goes on to list the affected instructions which starts with:
mtspr(xer), mcrxr, isync, ...

I know there are probably very few 604 left in the field, but in this
case mtspr(xer) looks very much like a superset of isync.

I also just had a look at the documentation of a more widespread core:

https://www.nxp.com/docs/en/reference-manual/MPC7450UM.pdf

and mtspr(xer) is marked as execution and refetch serialized, actually
it is the only instruction to have both.

Maybe there is a subtle difference between "refetch serialization" and
"pipeline flush", but in this case please educate me.

Besides that, the back-to-back mtctr/mtspr(xer) may limit instruction
decoding and issuing bandwidth.  I'd rather move one of them up by a few
lines since they can only go to one of the execution units on some
(or even most?) cores. This was my main point initially.

Gabriel

> 
> On p9 mtxer is cracked into two latency 3 ops (which run in parallel).
> While mtctr has latency 5.
> 
> On p8 mtxer was horrible indeed (but nothing near as bad as a pipeline
> flush).
> 
> 
> Segher
 



Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-24 Thread Christophe Leroy


Le 23/02/2022 à 21:48, Gabriel Paubert a écrit :
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
>> Commit a82adfd5c7cb ("hardening: Introduce CONFIG_ZERO_CALL_USED_REGS")
>> added zeroing of used registers at function exit.
>>
>> For the time being, PPC64 clears volatile registers on syscall exit but
>> PPC32 doesn't do it for performance reasons.
>>
>> Add that clearing in PPC32 syscall exit as well, but only when
>> CONFIG_ZERO_CALL_USED_REGS is selected.
>>
>> On an 8xx, the null_syscall selftest gives:
>> - Without CONFIG_ZERO_CALL_USED_REGS : 288 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS: 305 cycles
>> - With CONFIG_ZERO_CALL_USED_REGS + this patch   : 319 cycles
>>
>> Note that (independent of this patch), with pmac32_defconfig,
>> vmlinux size is as follows with/without CONFIG_ZERO_CALL_USED_REGS:
>>
>> text     data     bss    dec      hex     filename
>> 9578869  2525210  194400 12298479 bba8ef  vmlinux.without
>> 10318045 2525210  194400 13037655 c6f057  vmlinux.with
>>
>> That is a 7.7% increase on text size, 6.0% on overall size.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>   arch/powerpc/kernel/entry_32.S | 15 +++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
>> index 7748c278d13c..199f23092c02 100644
>> --- a/arch/powerpc/kernel/entry_32.S
>> +++ b/arch/powerpc/kernel/entry_32.S
>> @@ -151,6 +151,21 @@ syscall_exit_finish:
>>  bne 3f
>>  mtcr    r5
>>   
>> +#ifdef CONFIG_ZERO_CALL_USED_REGS
>> +/* Zero volatile regs that may contain sensitive kernel data */
>> +li  r0,0
>> +li  r4,0
>> +li  r5,0
>> +li  r6,0
>> +li  r7,0
>> +li  r8,0
>> +li  r9,0
>> +li  r10,0
>> +li  r11,0
>> +li  r12,0
>> +mtctr   r0
>> +mtxer   r0
> 
> Here, I'm almost sure that on some processors, it would be better to
> separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> flush) but I don't know what's the best ordering for the average core.

In the 8xx, CTR and LR are handled by the BPU like any other reg (latency
1, blockage 1).
AFAIU, XER is serializing + 1.

> 
> And what about lr? Should it also be cleared?

LR is restored from stack.

Christophe

Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-23 Thread Gabriel Paubert
On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> Commit a82adfd5c7cb ("hardening: Introduce CONFIG_ZERO_CALL_USED_REGS")
> added zeroing of used registers at function exit.
> 
> For the time being, PPC64 clears volatile registers on syscall exit but
> PPC32 doesn't do it for performance reasons.
> 
> Add that clearing in PPC32 syscall exit as well, but only when
> CONFIG_ZERO_CALL_USED_REGS is selected.
> 
> On an 8xx, the null_syscall selftest gives:
> - Without CONFIG_ZERO_CALL_USED_REGS  : 288 cycles
> - With CONFIG_ZERO_CALL_USED_REGS : 305 cycles
> - With CONFIG_ZERO_CALL_USED_REGS + this patch: 319 cycles
> 
> Note that (independent of this patch), with pmac32_defconfig,
> vmlinux size is as follows with/without CONFIG_ZERO_CALL_USED_REGS:
> 
> text     data     bss    dec      hex     filename
> 9578869  2525210  194400 12298479 bba8ef  vmlinux.without
> 10318045 2525210  194400 13037655 c6f057  vmlinux.with
> 
> That is a 7.7% increase on text size, 6.0% on overall size.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/entry_32.S | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 7748c278d13c..199f23092c02 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -151,6 +151,21 @@ syscall_exit_finish:
>   bne 3f
>   mtcr    r5
>  
> +#ifdef CONFIG_ZERO_CALL_USED_REGS
> + /* Zero volatile regs that may contain sensitive kernel data */
> + li  r0,0
> + li  r4,0
> + li  r5,0
> + li  r6,0
> + li  r7,0
> + li  r8,0
> + li  r9,0
> + li  r10,0
> + li  r11,0
> + li  r12,0
> + mtctr   r0
> + mtxer   r0

Here, I'm almost sure that on some processors, it would be better to
separate mtctr from mtxer. mtxer is typically very expensive (pipeline
flush) but I don't know what's the best ordering for the average core.

And what about lr? Should it also be cleared?

Gabriel

> +#endif
>  1:   lwz r2,GPR2(r1)
>   lwz r1,GPR1(r1)
>   rfi
> -- 
> 2.34.1
> 
 



Re: [PATCH] powerpc/32: Clear volatile regs on syscall exit

2022-02-23 Thread Kees Cook
On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> Commit a82adfd5c7cb ("hardening: Introduce CONFIG_ZERO_CALL_USED_REGS")
> added zeroing of used registers at function exit.
> 
> For the time being, PPC64 clears volatile registers on syscall exit but
> PPC32 doesn't do it for performance reasons.
> 
> Add that clearing in PPC32 syscall exit as well, but only when
> CONFIG_ZERO_CALL_USED_REGS is selected.
> 
> On an 8xx, the null_syscall selftest gives:
> - Without CONFIG_ZERO_CALL_USED_REGS  : 288 cycles
> - With CONFIG_ZERO_CALL_USED_REGS : 305 cycles
> - With CONFIG_ZERO_CALL_USED_REGS + this patch: 319 cycles
> 
> Note that (independent of this patch), with pmac32_defconfig,
> vmlinux size is as follows with/without CONFIG_ZERO_CALL_USED_REGS:
> 
> text     data     bss    dec      hex     filename
> 9578869  2525210  194400 12298479 bba8ef  vmlinux.without
> 10318045 2525210  194400 13037655 c6f057  vmlinux.with
> 
> That is a 7.7% increase on text size, 6.0% on overall size.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/entry_32.S | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 7748c278d13c..199f23092c02 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -151,6 +151,21 @@ syscall_exit_finish:
>   bne 3f
>   mtcr    r5
>  
> +#ifdef CONFIG_ZERO_CALL_USED_REGS
> + /* Zero volatile regs that may contain sensitive kernel data */
> + li  r0,0
> + li  r4,0
> + li  r5,0
> + li  r6,0
> + li  r7,0
> + li  r8,0
> + li  r9,0
> + li  r10,0
> + li  r11,0
> + li  r12,0
> + mtctr   r0
> + mtxer   r0
> +#endif

I think this should probably be unconditional -- if this is actually
leaking kernel pointers (or data) that's pretty bad. :|

If you really want to leave it build-time selectable, maybe add a new
config that gets "select"ed by CONFIG_ZERO_CALL_USED_REGS?
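As a sketch of that arrangement (the PPC32_CLEAR_SYSCALL_REGS symbol name
is made up here for illustration, not from the patch):

```kconfig
# arch/powerpc/Kconfig -- hypothetical new symbol
config PPC32_CLEAR_SYSCALL_REGS
	bool "Clear volatile registers on syscall exit" if EXPERT
	depends on PPC32
	help
	  Zero caller-saved GPRs (and CTR/XER) when returning to
	  userspace from a syscall, so stale kernel values cannot leak.

# security/Kconfig.hardening -- existing option, only the added line shown
config ZERO_CALL_USED_REGS
	select PPC32_CLEAR_SYSCALL_REGS if PPC32
```

The #ifdef in entry_32.S would then test CONFIG_PPC32_CLEAR_SYSCALL_REGS,
letting users enable the clearing without the full cost of
CONFIG_ZERO_CALL_USED_REGS.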

(And you may want to consider wiping all "unused" registers at syscall
entry as well.)

-Kees

>  1:   lwz r2,GPR2(r1)
>   lwz r1,GPR1(r1)
>   rfi
> -- 
> 2.34.1
> 

-- 
Kees Cook