Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Sun, Jan 14, 2018 at 10:40:36PM +0100, Arnd Bergmann wrote: > Right. I've done some more investigation anyway, starting over with the > analysis of the gcc options that change it. I've found now that turning > off '-fcode-hoisting' but leaving on the other options I had suspected > earlier (-O2 instead of -Os, -ftree-sra, -ftree-pre) also fixes the > stack problem, and appears to result in the best performance so > far. Oh nice! > I need to rerun the whole test matrix, but that seems rather > promising, and the result may also help debug what's really happening. -fcode-hoisting moves all expression evaluation to as early as possible; for this AES code that means it will increase register pressure a lot, causing a lot of spilling (well, that is my guess). If that is so, then we need to dial down -fcode-hoisting a bit, maybe make it aware of register pressure. Glad you found a smoking gun, Segher
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 11:10 PM, Segher Boessenkoolwrote: > On Fri, Jan 12, 2018 at 10:45:31PM +0100, Arnd Bergmann wrote: >> > I guess you could enable the _x routines whenever you use ubsan? Ubsan >> > will cause much bigger code growth than the handful of insns in those >> > routines? >> >> Right, that could work, too. My patch that Herbert merged intentionally >> used -Os also for non-UBSAN builds because it turned out to >> be much faster (see gcc PR83651), > > "Much"? > > -Os is *slower* with 8.0, 5% faster with 7.2, 4% faster with 7.1, > slower with 7.0 and 6.3. Your numbers, #c1. > > Anf this is the generic code of course, which is slow anyway (not to > mention insecure). Right. I've done some more investigation anyway, starting over with the analysis of the gcc options that change it. I've found now that turning off '-fcode-hoisting' but leaving on the other options I had suspected earlier (-O2 instead of -Os, -ftree-sra, -ftree-pre) also fixes the stack problem, and appears to result in the best performance so far. I need to rerun the whole test matrix, but that seems rather promising, and the result may also help debug what's really happening. Arnd
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 10:45:31PM +0100, Arnd Bergmann wrote: > > I guess you could enable the _x routines whenever you use ubsan? Ubsan > > will cause much bigger code growth than the handful of insns in those > > routines? > > Right, that could work, too. My patch that Herbert merged intentionally > used -Os also for non-UBSAN builds because it turned out to > be much faster (see gcc PR83651), "Much"? -Os is *slower* with 8.0, 5% faster with 7.2, 4% faster with 7.1, slower with 7.0 and 6.3. Your numbers, #c1. Anf this is the generic code of course, which is slow anyway (not to mention insecure). > but we could revert that back > to the default and only use the -Os for UBSAN, essentially > addressing only PR83356 but not PR83651. Segher
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 10:41 PM, Segher Boessenkoolwrote: > On Fri, Jan 12, 2018 at 10:29:01PM +0100, Arnd Bergmann wrote: >> On Fri, Jan 12, 2018 at 9:41 PM, Segher Boessenkool >> wrote: >> > On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote: >> >> On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool >> >> >> We could theoretically work around it by turning that into >> >> "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) || >> >> defined(CONFIG_CRYPTO_AES)", but that seems rather ugly. >> >> >> >> My earlier patch already tried to be more specific, turning very >> >> specific optimizations off rather than moving from -O2 to -Os, >> >> but that turned out to lead to significantly worse performance, >> >> where -Os improved performance slightly. Is there a way >> >> to ask powerpc compilers to use mostly -Os but not the >> >> specific thing that makes it link to _restgpr_31_x? >> > >> > There is no such thing, sorry. Would be very hard to implement, and >> > older compilers will never get it, so it won't help you anyway :-( >> >> We use -Os only for gcc-7.1 and higher, where it produces faster >> code for AES and avoids running into >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 >> >> > Maybe for now just enable it in crtsavres.S always, with a comment? >> > That -Os workaround is hopefully not going to live long either... >> >> It depends on whether or how soon someone comes up with a >> better fix for PR83356. >> gcc-8.0.0 is currently not affected by it, so we could limit the >> workaround (and the hack in crtsavres.S) to gcc-7-only. > > I guess you could enable the _x routines whenever you use ubsan? Ubsan > will cause much bigger code growth than the handful of insns in those > routines? Right, that could work, too. My patch that Herbert merged intentionally used -Os also for non-UBSAN builds because it turned out to be much faster (see gcc PR83651), but we could revert that back to the default and only use the -Os for UBSAN, essentially addressing only PR83356 but not PR83651. Arnd
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 10:29:01PM +0100, Arnd Bergmann wrote: > On Fri, Jan 12, 2018 at 9:41 PM, Segher Boessenkool >wrote: > > On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote: > >> On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool > > >> We could theoretically work around it by turning that into > >> "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) || > >> defined(CONFIG_CRYPTO_AES)", but that seems rather ugly. > >> > >> My earlier patch already tried to be more specific, turning very > >> specific optimizations off rather than moving from -O2 to -Os, > >> but that turned out to lead to significantly worse performance, > >> where -Os improved performance slightly. Is there a way > >> to ask powerpc compilers to use mostly -Os but not the > >> specific thing that makes it link to _restgpr_31_x? > > > > There is no such thing, sorry. Would be very hard to implement, and > > older compilers will never get it, so it won't help you anyway :-( > > We use -Os only for gcc-7.1 and higher, where it produces faster > code for AES and avoids running into > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > > > Maybe for now just enable it in crtsavres.S always, with a comment? > > That -Os workaround is hopefully not going to live long either... > > It depends on whether or how soon someone comes up with a > better fix for PR83356. > gcc-8.0.0 is currently not affected by it, so we could limit the > workaround (and the hack in crtsavres.S) to gcc-7-only. I guess you could enable the _x routines whenever you use ubsan? Ubsan will cause much bigger code growth than the handful of insns in those routines? Segher
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 9:41 PM, Segher Boessenkoolwrote: > On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote: >> On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool >> We could theoretically work around it by turning that into >> "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) || >> defined(CONFIG_CRYPTO_AES)", but that seems rather ugly. >> >> My earlier patch already tried to be more specific, turning very >> specific optimizations off rather than moving from -O2 to -Os, >> but that turned out to lead to significantly worse performance, >> where -Os improved performance slightly. Is there a way >> to ask powerpc compilers to use mostly -Os but not the >> specific thing that makes it link to _restgpr_31_x? > > There is no such thing, sorry. Would be very hard to implement, and > older compilers will never get it, so it won't help you anyway :-( We use -Os only for gcc-7.1 and higher, where it produces faster code for AES and avoids running into https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > Maybe for now just enable it in crtsavres.S always, with a comment? > That -Os workaround is hopefully not going to live long either... It depends on whether or how soon someone comes up with a better fix for PR83356. gcc-8.0.0 is currently not affected by it, so we could limit the workaround (and the hack in crtsavres.S) to gcc-7-only. Arnd
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 08:43:21PM +0100, Arnd Bergmann wrote: > On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkool >wrote: > > >> or why the aes_generic implementation needs this on > >> powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed > >> to work around a possible kernel stack overflow that can happen with > >> gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > > > > The _x versions are smaller but slower; that's why they are used with -Os. > > Apparently nothing else was built with -Os (and the other needed flags) > > before. > > Ah, that explains it, the definition is in arch/powerpc/lib/crtsavres.S, > but inside of #ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE. Ah ok. Right. > We could theoretically work around it by turning that into > "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) || > defined(CONFIG_CRYPTO_AES)", but that seems rather ugly. > > My earlier patch already tried to be more specific, turning very > specific optimizations off rather than moving from -O2 to -Os, > but that turned out to lead to significantly worse performance, > where -Os improved performance slightly. Is there a way > to ask powerpc compilers to use mostly -Os but not the > specific thing that makes it link to _restgpr_31_x? There is no such thing, sorry. Would be very hard to implement, and older compilers will never get it, so it won't help you anyway :-( Maybe for now just enable it in crtsavres.S always, with a comment? That -Os workaround is hopefully not going to live long either... Segher
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 5:39 PM, Segher Boessenkoolwrote: >> or why the aes_generic implementation needs this on >> powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed >> to work around a possible kernel stack overflow that can happen with >> gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 > > The _x versions are smaller but slower; that's why they are used with -Os. > Apparently nothing else was built with -Os (and the other needed flags) > before. Ah, that explains it, the definition is in arch/powerpc/lib/crtsavres.S, but inside of #ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE. We could theoretically work around it by turning that into "#if defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) || defined(CONFIG_CRYPTO_AES)", but that seems rather ugly. My earlier patch already tried to be more specific, turning very specific optimizations off rather than moving from -O2 to -Os, but that turned out to lead to significantly worse performance, where -Os improved performance slightly. Is there a way to ask powerpc compilers to use mostly -Os but not the specific thing that makes it link to _restgpr_31_x? Arnd
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
Hi! On Fri, Jan 12, 2018 at 03:55:47PM +0100, Arnd Bergmann wrote: > >crypto/aes_generic.o: In function `crypto_aes_set_key': > >>> aes_generic.c:(.text+0x4e0): undefined reference to `_restgpr_31_x' > > adding linuxpcc-dev to Cc, maybe someone knows a way out of this. > It appears related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43810 It is not. > but I don't know what _restgpr_31_x actually does, It restores GPR31 from stack, restores LR from stack, and returns (_x is "exit"). > why it's not provided by the kernel Because the kernel refuses to use libgcc. Let's, uh, not start that again? :-) > or why the aes_generic implementation needs this on > powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed > to work around a possible kernel stack overflow that can happen with > gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 The _x versions are smaller but slower; that's why they are used with -Os. Apparently nothing else was built with -Os (and the other needed flags) before. Segher
Re: [cryptodev:master 130/134] aes_generic.c:undefined reference to `_restgpr_31_x'
On Fri, Jan 12, 2018 at 3:11 PM, kbuild test robotwrote: > tree: > https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git > master > head: b40fa82cd6138350f723aa47b37e3e3e80906b40 > commit: 148b974deea927f5dbb6c468af2707b488bfa2de [130/134] crypto: > aes-generic - build with -Os on gcc-7+ > config: powerpc-linkstation_defconfig (attached as .config) > compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 > reproduce: > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > git checkout 148b974deea927f5dbb6c468af2707b488bfa2de > # save the attached .config to linux build tree > make.cross ARCH=powerpc > > All errors (new ones prefixed by >>): > >crypto/aes_generic.o: In function `crypto_aes_set_key': >>> aes_generic.c:(.text+0x4e0): undefined reference to `_restgpr_31_x' adding linuxpcc-dev to Cc, maybe someone knows a way out of this. It appears related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43810 but I don't know what _restgpr_31_x actually does, why it's not provided by the kernel or why the aes_generic implementation needs this on powerpc when built with 'gcc -Os'. FWIW, the -Os change was needed to work around a possible kernel stack overflow that can happen with gcc-7.2, see https://patchwork.kernel.org/patch/10143607/ and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 Arnd