Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-14 Thread Segher Boessenkool
On Sun, Jan 14, 2018 at 10:40:36PM +0100, Arnd Bergmann wrote:
> Right. I've done some more investigation anyway, starting over with the
> analysis of the gcc options that change it. I've found now that turning
> off '-fcode-hoisting' but leaving on the other options I had suspected
> earlier (-O2 instead of -Os, -ftree-sra, -ftree-pre) also fixes the
> stack problem, and appears to result in the best performance so
> far.

Oh nice!

> I need to rerun the whole test matrix, but that seems rather
> promising, and the result may also help debug what's really happening.

-fcode-hoisting moves all expression evaluation to as early as possible;
for this AES code that means it will increase register pressure a lot,
causing a lot of spilling (well, that is my guess).  If that is so, then
we need to dial down -fcode-hoisting a bit, maybe make it aware of
register pressure.

Glad you found a smoking gun,


Segher


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-14 Thread Arnd Bergmann
On Fri, Jan 12, 2018 at 11:10 PM, Segher Boessenkool
 wrote:
> On Fri, Jan 12, 2018 at 10:45:31PM +0100, Arnd Bergmann wrote:
>> > I guess you could enable the _x routines whenever you use ubsan?  Ubsan
>> > will cause much bigger code growth than the handful of insns in those
>> > routines?
>>
>> Right, that could work, too. My patch that Herbert merged intentionally
>> used -Os also for non-UBSAN builds because it turned out to
>> be much faster (see gcc PR83651),
>
> "Much"?
>
> -Os is *slower* with 8.0, 5% faster with 7.2, 4% faster with 7.1,
> slower with 7.0 and 6.3.  Your numbers, #c1.
>
> Anf this is the generic code of course, which is slow anyway (not to
> mention insecure).

Right. I've done some more investigation anyway, starting over with the
analysis of the gcc options that change it. I've found now that turning
off '-fcode-hoisting' but leaving on the other options I had suspected
earlier (-O2 instead of -Os, -ftree-sra, -ftree-pre) also fixes the
stack problem, and appears to result in the best performance so
far. I need to rerun the whole test matrix, but that seems rather
promising, and the result may also help debug what's really happening.

   Arnd


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Segher Boessenkool
On Fri, Jan 12, 2018 at 10:45:31PM +0100, Arnd Bergmann wrote:
> > I guess you could enable the _x routines whenever you use ubsan?  Ubsan
> > will cause much bigger code growth than the handful of insns in those
> > routines?
> 
> Right, that could work, too. My patch that Herbert merged intentionally
> used -Os also for non-UBSAN builds because it turned out to
> be much faster (see gcc PR83651),

"Much"?

-Os is *slower* with 8.0, 5% faster with 7.2, 4% faster with 7.1,
slower with 7.0 and 6.3.  Your numbers, #c1.

Anf this is the generic code of course, which is slow anyway (not to
mention insecure).

> but we could revert that back
> to the default and only use the -Os for UBSAN, essentially
> addressing only PR83356 but not PR83651.


Segher


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Arnd Bergmann
On Fri, Jan 12, 2018 at 10:41 PM, Segher Boessenkool
 wrote:
> On Fri, Jan 12, 2018 at 10:29:01PM +0100, Arnd Bergmann wrote:
>> On Fri, Jan 12, 2018 at 9:41 PM, Segher Boessenkool
>>  wrote:
>> > On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote:
>> >> On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool
>>
>> >> We could theoretically work around it by turning that into
>> >> "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) ||
>> >> defined(CONFIG_CRYPTO_AES)", but that seems rather ugly.
>> >>
>> >> My earlier patch already tried to be more specific, turning very
>> >> specific optimizations off rather than moving from -O2 to -Os,
>> >> but that turned out to lead to significantly worse performance,
>> >> where -Os improved performance slightly. Is there a way
>> >> to ask powerpc compilers to use mostly -Os but not the
>> >> specific thing that makes it link to _restgpr_31_x?
>> >
>> > There is no such thing, sorry.  Would be very hard to implement, and
>> > older compilers will never get it, so it won't help you anyway :-(
>>
>> We use -Os only for gcc-7.1 and higher, where it produces faster
>> code for AES and avoids running into
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
>>
>> > Maybe for now just enable it in crtsavres.S always, with a comment?
>> > That -Os workaround is hopefully not going to live long either...
>>
>> It depends on whether or how soon someone comes up with a
>> better fix for PR83356.
>> gcc-8.0.0 is currently not affected by it, so we could limit the
>> workaround (and the hack in crtsavres.S) to gcc-7-only.
>
> I guess you could enable the _x routines whenever you use ubsan?  Ubsan
> will cause much bigger code growth than the handful of insns in those
> routines?

Right, that could work, too. My patch that Herbert merged intentionally
used -Os also for non-UBSAN builds because it turned out to
be much faster (see gcc PR83651), but we could revert that back
to the default and only use the -Os for UBSAN, essentially
addressing only PR83356 but not PR83651.

   Arnd


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Segher Boessenkool
On Fri, Jan 12, 2018 at 10:29:01PM +0100, Arnd Bergmann wrote:
> On Fri, Jan 12, 2018 at 9:41 PM, Segher Boessenkool
>  wrote:
> > On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote:
> >> On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool
> 
> >> We could theoretically work around it by turning that into
> >> "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) ||
> >> defined(CONFIG_CRYPTO_AES)", but that seems rather ugly.
> >>
> >> My earlier patch already tried to be more specific, turning very
> >> specific optimizations off rather than moving from -O2 to -Os,
> >> but that turned out to lead to significantly worse performance,
> >> where -Os improved performance slightly. Is there a way
> >> to ask powerpc compilers to use mostly -Os but not the
> >> specific thing that makes it link to _restgpr_31_x?
> >
> > There is no such thing, sorry.  Would be very hard to implement, and
> > older compilers will never get it, so it won't help you anyway :-(
> 
> We use -Os only for gcc-7.1 and higher, where it produces faster
> code for AES and avoids running into
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
> 
> > Maybe for now just enable it in crtsavres.S always, with a comment?
> > That -Os workaround is hopefully not going to live long either...
> 
> It depends on whether or how soon someone comes up with a
> better fix for PR83356.
> gcc-8.0.0 is currently not affected by it, so we could limit the
> workaround (and the hack in crtsavres.S) to gcc-7-only.

I guess you could enable the _x routines whenever you use ubsan?  Ubsan
will cause much bigger code growth than the handful of insns in those
routines?


Segher


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Arnd Bergmann
On Fri, Jan 12, 2018 at 9:41 PM, Segher Boessenkool
 wrote:
> On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote:
>> On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool

>> We could theoretically work around it by turning that into
>> "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) ||
>> defined(CONFIG_CRYPTO_AES)", but that seems rather ugly.
>>
>> My earlier patch already tried to be more specific, turning very
>> specific optimizations off rather than moving from -O2 to -Os,
>> but that turned out to lead to significantly worse performance,
>> where -Os improved performance slightly. Is there a way
>> to ask powerpc compilers to use mostly -Os but not the
>> specific thing that makes it link to _restgpr_31_x?
>
> There is no such thing, sorry.  Would be very hard to implement, and
> older compilers will never get it, so it won't help you anyway :-(

We use -Os only for gcc-7.1 and higher, where it produces faster
code for AES and avoids running into
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356

> Maybe for now just enable it in crtsavres.S always, with a comment?
> That -Os workaround is hopefully not going to live long either...

It depends on whether or how soon someone comes up with a
better fix for PR83356.
gcc-8.0.0 is currently not affected by it, so we could limit the
workaround (and the hack in crtsavres.S) to gcc-7-only.

 Arnd


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Segher Boessenkool
On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote:
> On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool
>  wrote:
> 
> >> or why the aes_generic implementation needs this on
> >> powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed
> >> to work around a possible kernel stack overflow that can happen with
> >> gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
> >
> > The _x versions are smaller but slower; that's why they are used with -Os.
> > Apparently nothing else was built with -Os (and the other needed flags)
> > before.
> 
> Ah, that explains it, the definition is in arch/powerpc/lib/crtsavres.S,
> but inside of #ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE.

Ah ok.  Right.

> We could theoretically work around it by turning that into
> "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) ||
> defined(CONFIG_CRYPTO_AES)", but that seems rather ugly.
> 
> My earlier patch already tried to be more specific, turning very
> specific optimizations off rather than moving from -O2 to -Os,
> but that turned out to lead to significantly worse performance,
> where -Os improved performance slightly. Is there a way
> to ask powerpc compilers to use mostly -Os but not the
> specific thing that makes it link to _restgpr_31_x?

There is no such thing, sorry.  Would be very hard to implement, and
older compilers will never get it, so it won't help you anyway :-(

Maybe for now just enable it in crtsavres.S always, with a comment?
That -Os workaround is hopefully not going to live long either...


Segher


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Arnd Bergmann
On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool
 wrote:

>> or why the aes_generic implementation needs this on
>> powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed
>> to work around a possible kernel stack overflow that can happen with
>> gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
>
> The _x versions are smaller but slower; that's why they are used with -Os.
> Apparently nothing else was built with -Os (and the other needed flags)
> before.

Ah, that explains it, the definition is in arch/powerpc/lib/crtsavres.S,
but inside of #ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE.

We could theoretically work around it by turning that into
"#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) ||
defined(CONFIG_CRYPTO_AES)", but that seems rather ugly.

My earlier patch already tried to be more specific, turning very
specific optimizations off rather than moving from -O2 to -Os,
but that turned out to lead to significantly worse performance,
where -Os improved performance slightly. Is there a way
to ask powerpc compilers to use mostly -Os but not the
specific thing that makes it link to _restgpr_31_x?

   Arnd


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Segher Boessenkool
Hi!

On Fri, Jan 12, 2018 at 03:55:47PM +0100, Arnd Bergmann wrote:
> >crypto/aes_generic.o: In function `crypto_aes_set_key':
> >>> aes_generic.c:(.text+0x4e0): undefined reference to `_restgpr_31_x'
> 
> adding linuxpcc-dev to Cc, maybe someone knows a way out of this.
> It appears related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43810

It is not.

> but I don't know what _restgpr_31_x actually does,

It restores GPR31 from stack, restores LR from stack, and returns
(_x is "exit").

> why it's not provided by the kernel

Because the kernel refuses to use libgcc.  Let's, uh, not start that
again?  :-)

> or why the aes_generic implementation needs this on
> powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed
> to work around a possible kernel stack overflow that can happen with
> gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356

The _x versions are smaller but slower; that's why they are used with -Os.
Apparently nothing else was built with -Os (and the other needed flags)
before.


Segher


Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'

2018-01-12 Thread Arnd Bergmann
On Fri, Jan 12, 2018 at 3:11 PM, kbuild test robot
 wrote:
> tree:   
> https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git 
> master
> head:   b40fa82cd6138350f723aa47b37e3e3e80906b40
> commit: 148b974deea927f5dbb6c468af2707b488bfa2de [130/134] crypto: 
> aes-generic - build with -Os on gcc-7+
> config: powerpc-linkstation_defconfig (attached as .config)
> compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout 148b974deea927f5dbb6c468af2707b488bfa2de
> # save the attached .config to linux build tree
> make.cross ARCH=powerpc
>
> All errors (new ones prefixed by >>):
>
>crypto/aes_generic.o: In function `crypto_aes_set_key':
>>> aes_generic.c:(.text+0x4e0): undefined reference to `_restgpr_31_x'

adding linuxpcc-dev to Cc, maybe someone knows a way out of this.
It appears related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43810
but I don't know what _restgpr_31_x actually does, why it's not provided
by the kernel or why the aes_generic implementation needs this on
powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed
to work around a possible kernel stack overflow that can happen with
gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356

 Arnd