Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M

2020-05-15 Thread Herbert Xu
On Tue, May 05, 2020 at 03:53:45PM +0200, Arnd Bergmann wrote:
> When building for ARMv7-M, clang-9 or higher tries to unroll some loops,
> which ends up confusing the register allocator to the point of generating
> rather bad code and using more than the warning limit for stack frames:
> 
> warning: stack frame size of 1200 bytes in function 'blake2b_compress' 
> [-Wframe-larger-than=]
> 
> Forcing it to not unroll the final loop avoids this problem.
> 
> Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
> Signed-off-by: Arnd Bergmann 
> ---
>  crypto/blake2b_generic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M

2020-05-08 Thread Nathan Chancellor
On Fri, May 08, 2020 at 11:31:07PM +0200, Arnd Bergmann wrote:
> On Wed, May 6, 2020 at 7:12 AM Nathan Chancellor
>  wrote:
> > > -
> > > +#ifdef CONFIG_CC_IS_CLANG
> >
> > Given your comment in the bug:
> >
> > "The code is written to assume no loops are unrolled"
> >
> > Does it make sense to make this unconditional and take compiler
> > heuristics out of it?
> >
> > > +#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
> > > +#endif
> > >   for (i = 0; i < 8; ++i)
> > >     S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
> 
> No, that would not work, as gcc does not support this pragma.
> 
> Arnd

Ah fair enough.

Reviewed-by: Nathan Chancellor 


Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M

2020-05-08 Thread Arnd Bergmann
On Wed, May 6, 2020 at 7:12 AM Nathan Chancellor
 wrote:
> > -
> > +#ifdef CONFIG_CC_IS_CLANG
>
> Given your comment in the bug:
>
> "The code is written to assume no loops are unrolled"
>
> Does it make sense to make this unconditional and take compiler
> heuristics out of it?
>
> > +#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
> > +#endif
> >   for (i = 0; i < 8; ++i)
> >     S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];

No, that would not work, as gcc does not support this pragma.

Arnd
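Nathan's question and Arnd's answer can be illustrated with a small userspace sketch. It is not what the patch does; the merged patch keeps the `CONFIG_CC_IS_CLANG` guard. `NOUNROLL` and `xor_fold()` are hypothetical names. The idea is to hide the compiler-specific pragma behind a macro built on C99 `_Pragma`, so the call site needs no `#ifdef`. Clang gets `#pragma nounroll`. gcc 8 and later get `#pragma GCC unroll 1`, since gcc documents an unroll factor of 0 or 1 as blocking unrolling. Any other compiler gets no pragma. Nothing in this thread says gcc needs the hint; the gcc branch only shows how the call site could stay compiler-neutral.

```c
#include <stdint.h>

/*
 * Hypothetical NOUNROLL macro: expands to whichever "do not unroll"
 * pragma the current compiler understands.
 *   clang:   #pragma nounroll
 *   gcc 8+:  #pragma GCC unroll 1  (a factor of 0 or 1 blocks unrolling)
 *   other:   nothing
 * Clang is tested first because it also defines __GNUC__.
 */
#if defined(__clang__)
#define NOUNROLL _Pragma("nounroll")
#elif defined(__GNUC__) && __GNUC__ >= 8
#define NOUNROLL _Pragma("GCC unroll 1")
#else
#define NOUNROLL
#endif

/* Same shape as the final loop of blake2b_compress(). */
static void xor_fold(uint64_t h[8], const uint64_t v[16])
{
	int i;

	NOUNROLL
	for (i = 0; i < 8; ++i)
		h[i] = h[i] ^ v[i] ^ v[i + 8];
}
```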


Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M

2020-05-05 Thread Nathan Chancellor
On Tue, May 05, 2020 at 03:53:45PM +0200, Arnd Bergmann wrote:
> When building for ARMv7-M, clang-9 or higher tries to unroll some loops,
> which ends up confusing the register allocator to the point of generating
> rather bad code and using more than the warning limit for stack frames:
> 
> warning: stack frame size of 1200 bytes in function 'blake2b_compress' 
> [-Wframe-larger-than=]
> 
> Forcing it to not unroll the final loop avoids this problem.
> 
> Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
> Signed-off-by: Arnd Bergmann 
> ---
>  crypto/blake2b_generic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c
> index 1d262374fa4e..0ffd8d92e308 100644
> --- a/crypto/blake2b_generic.c
> +++ b/crypto/blake2b_generic.c
> @@ -129,7 +129,9 @@ static void blake2b_compress(struct blake2b_state *S,
>   ROUND(9);
>   ROUND(10);
>   ROUND(11);
> -
> +#ifdef CONFIG_CC_IS_CLANG

Given your comment in the bug:

"The code is written to assume no loops are unrolled"

Does it make sense to make this unconditional and take compiler
heuristics out of it?

> +#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
> +#endif
>   for (i = 0; i < 8; ++i)
>     S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
>  }
> -- 
> 2.26.0
>
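For experimenting outside the kernel tree, the patched loop can be lifted into a standalone file. This is a sketch under assumptions. The predefined `__clang__` macro stands in for the Kconfig symbol `CONFIG_CC_IS_CLANG`. `struct blake2b_state` is cut down to its `h[8]` chaining value. `blake2b_feed_forward()` is a hypothetical name for the final loop of `blake2b_compress()`, where the 16-word working vector `v[]` is folded back into `h[]`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct blake2b_state;
 * only the chaining value h[] is used by this step. */
struct blake2b_state {
	uint64_t h[8];
};

/* Feed-forward step: h[i] ^= v[i] ^ v[i + 8] for all eight words. */
static void blake2b_feed_forward(struct blake2b_state *S, const uint64_t v[16])
{
	int i;

#ifdef __clang__	/* userspace stand-in for CONFIG_CC_IS_CLANG */
#pragma nounroll	/* ask clang to keep this loop rolled */
#endif
	for (i = 0; i < 8; ++i)
		S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
}
```

The pragma only affects code generation, so the result is the same with or without it; only the shape of the emitted loop changes.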