Re: [PATCH] compiler.h: Fix barrier_data() on clang
On 11/16/20 11:28 AM, Randy Dunlap wrote: > On 11/16/20 10:30 AM, Andreas Schwab wrote: >> On Nov 16 2020, Randy Dunlap wrote: >> >>> What kernel version are you building? >> >> 5.10-rc4 >> >> Andreas. > > OK, thanks. > > My build machine is slow, but I have a patch that I am testing: > > --- > From: Randy Dunlap > > riscv's uses barrier() so it should > #include to prevent build errors. > > Reported-by: Andreas Schwab > --- > arch/riscv/include/asm/vdso/processor.h | 2 ++ > 1 file changed, 2 insertions(+) > > --- lnx-510-rc4.orig/arch/riscv/include/asm/vdso/processor.h > +++ lnx-510-rc4/arch/riscv/include/asm/vdso/processor.h > @@ -4,6 +4,8 @@ > > #ifndef __ASSEMBLY__ > > +#include > + > static inline void cpu_relax(void) > { > #ifdef __riscv_muldiv This fixes the emulex/benet/ driver build. I'm still building allmodconfig to see if there are any other issues. -- ~Randy
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Nov 16 2020, Nick Desaulniers wrote: > A lot of VDSO's reset KBUILD_CFLAGS or use a new variable for their > compiler flags. As such, they're missing `-include` command line flag > that injects include/linux/compiler_types.h, It's not missing here. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Mon, Nov 16, 2020 at 9:47 AM Andreas Schwab wrote: > > On Okt 14 2020, Arvind Sankar wrote: > > > Commit > > 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > > exclusive") > > > > neglected to copy barrier_data() from compiler-gcc.h into > > compiler-clang.h. The definition in compiler-gcc.h was really to work > > around clang's more aggressive optimization, so this broke > > barrier_data() on clang, and consequently memzero_explicit() as well. > > > > For example, this results in at least the memzero_explicit() call in > > lib/crypto/sha256.c:sha256_transform() being optimized away by clang. > > > > Fix this by moving the definition of barrier_data() into compiler.h. > > > > Also move the gcc/clang definition of barrier() into compiler.h, > > __memory_barrier() is icc-specific (and barrier() is already defined > > using it in compiler-intel.h) and doesn't belong in compiler.h. > > > > Signed-off-by: Arvind Sankar > > Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > > exclusive") > > This breaks build on riscv: > > CC [M] drivers/net/ethernet/emulex/benet/be_main.o > In file included from ./include/vdso/processor.h:10, > from ./arch/riscv/include/asm/processor.h:11, > from ./include/linux/prefetch.h:15, > from drivers/net/ethernet/emulex/benet/be_main.c:14: > ./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax': > ./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration > of function 'barrier' [-Werror=implicit-function-declaration] >14 | barrier(); > | ^~~ > cc1: some warnings being treated as errors > make[5]: *** [scripts/Makefile.build:283: > drivers/net/ethernet/emulex/benet/be_main.o] Error 1 > make[4]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex/benet] > Error 2 > make[3]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex] Error 2 > make[2]: *** [scripts/Makefile.build:500: drivers/net/ethernet] Error 2 > make[1]: *** [scripts/Makefile.build:500: drivers/net] Error 2 > make: *** [Makefile:1799: drivers] Error 2 > > Andreas. A lot of VDSO's reset KBUILD_CFLAGS or use a new variable for their compiler flags. As such, they're missing `-include` command line flag that injects include/linux/compiler_types.h, which `#includes` numerous other headers if `__KERNEL__` is defined (`-D__KERNEL__`). So the RISCV VDSO Makefile might need to `-include $(srctree)/include/linux/compiler_types.h -D__KERNEL__`, or `#include "` directly in arch/riscv/include/asm/vdso/processor.h. -- Thanks, ~Nick Desaulniers
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On 11/16/20 10:30 AM, Andreas Schwab wrote: On Nov 16 2020, Randy Dunlap wrote: What kernel version are you building? 5.10-rc4 Andreas. OK, thanks. My build machine is slow, but I have a patch that I am testing: --- From: Randy Dunlap riscv's uses barrier() so it should #include to prevent build errors. Reported-by: Andreas Schwab --- arch/riscv/include/asm/vdso/processor.h |2 ++ 1 file changed, 2 insertions(+) --- lnx-510-rc4.orig/arch/riscv/include/asm/vdso/processor.h +++ lnx-510-rc4/arch/riscv/include/asm/vdso/processor.h @@ -4,6 +4,8 @@ #ifndef __ASSEMBLY__ +#include + static inline void cpu_relax(void) { #ifdef __riscv_muldiv
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Nov 16 2020, Randy Dunlap wrote: > What kernel version are you building? 5.10-rc4 Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On 11/16/20 9:47 AM, Andreas Schwab wrote: > On Okt 14 2020, Arvind Sankar wrote: > >> Commit >> 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually >> exclusive") >> >> neglected to copy barrier_data() from compiler-gcc.h into >> compiler-clang.h. The definition in compiler-gcc.h was really to work >> around clang's more aggressive optimization, so this broke >> barrier_data() on clang, and consequently memzero_explicit() as well. >> >> For example, this results in at least the memzero_explicit() call in >> lib/crypto/sha256.c:sha256_transform() being optimized away by clang. >> >> Fix this by moving the definition of barrier_data() into compiler.h. >> >> Also move the gcc/clang definition of barrier() into compiler.h, >> __memory_barrier() is icc-specific (and barrier() is already defined >> using it in compiler-intel.h) and doesn't belong in compiler.h. >> >> Signed-off-by: Arvind Sankar >> Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually >> exclusive") > > This breaks build on riscv: > > CC [M] drivers/net/ethernet/emulex/benet/be_main.o > In file included from ./include/vdso/processor.h:10, > from ./arch/riscv/include/asm/processor.h:11, > from ./include/linux/prefetch.h:15, > from drivers/net/ethernet/emulex/benet/be_main.c:14: > ./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax': > ./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration > of function 'barrier' [-Werror=implicit-function-declaration] >14 | barrier(); > | ^~~ > cc1: some warnings being treated as errors > make[5]: *** [scripts/Makefile.build:283: > drivers/net/ethernet/emulex/benet/be_main.o] Error 1 > make[4]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex/benet] > Error 2 > make[3]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex] Error 2 > make[2]: *** [scripts/Makefile.build:500: drivers/net/ethernet] Error 2 > make[1]: *** [scripts/Makefile.build:500: drivers/net] Error 2 > make: *** [Makefile:1799: drivers] Error 2 Hi, What kernel version are you building? This should be fixed in 5.10-rc4 by commit 3347acc6fcd4ee71ad18a9ff9d9dac176b517329; specifically this portion of it: diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h index 798027bb89be..640f09479bdf 100644 --- a/include/asm-generic/barrier.h +++ b/include/asm-generic/barrier.h @@ -13,6 +13,7 @@ #ifndef __ASSEMBLY__ +#include #include #ifndef nop -- ~Randy
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Okt 14 2020, Arvind Sankar wrote: > Commit > 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > exclusive") > > neglected to copy barrier_data() from compiler-gcc.h into > compiler-clang.h. The definition in compiler-gcc.h was really to work > around clang's more aggressive optimization, so this broke > barrier_data() on clang, and consequently memzero_explicit() as well. > > For example, this results in at least the memzero_explicit() call in > lib/crypto/sha256.c:sha256_transform() being optimized away by clang. > > Fix this by moving the definition of barrier_data() into compiler.h. > > Also move the gcc/clang definition of barrier() into compiler.h, > __memory_barrier() is icc-specific (and barrier() is already defined > using it in compiler-intel.h) and doesn't belong in compiler.h. > > Signed-off-by: Arvind Sankar > Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > exclusive") This breaks build on riscv: CC [M] drivers/net/ethernet/emulex/benet/be_main.o In file included from ./include/vdso/processor.h:10, from ./arch/riscv/include/asm/processor.h:11, from ./include/linux/prefetch.h:15, from drivers/net/ethernet/emulex/benet/be_main.c:14: ./arch/riscv/include/asm/vdso/processor.h: In function 'cpu_relax': ./arch/riscv/include/asm/vdso/processor.h:14:2: error: implicit declaration of function 'barrier' [-Werror=implicit-function-declaration] 14 | barrier(); | ^~~ cc1: some warnings being treated as errors make[5]: *** [scripts/Makefile.build:283: drivers/net/ethernet/emulex/benet/be_main.o] Error 1 make[4]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex/benet] Error 2 make[3]: *** [scripts/Makefile.build:500: drivers/net/ethernet/emulex] Error 2 make[2]: *** [scripts/Makefile.build:500: drivers/net/ethernet] Error 2 make[1]: *** [scripts/Makefile.build:500: drivers/net] Error 2 make: *** [Makefile:1799: drivers] Error 2 Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Wed, Oct 14, 2020 at 05:26:31PM -0400, Arvind Sankar wrote: > Commit > 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > exclusive") > > neglected to copy barrier_data() from compiler-gcc.h into > compiler-clang.h. The definition in compiler-gcc.h was really to work > around clang's more aggressive optimization, so this broke > barrier_data() on clang, and consequently memzero_explicit() as well. > > For example, this results in at least the memzero_explicit() call in > lib/crypto/sha256.c:sha256_transform() being optimized away by clang. > > Fix this by moving the definition of barrier_data() into compiler.h. > > Also move the gcc/clang definition of barrier() into compiler.h, > __memory_barrier() is icc-specific (and barrier() is already defined > using it in compiler-intel.h) and doesn't belong in compiler.h. > > Signed-off-by: Arvind Sankar > Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > exclusive") Yeowch. Cc: sta...@vger.kernel.org Reviewed-by: Kees Cook Nick just mentioned this to me; I hadn't had a chance to read it yet. This needs to go to Linus ASAP; memzero_explicit() under Clang in v4.19 and later isn't so explicit. :( Andrew, Linus, can one of you pick this up please? As Nick mentioned, sorting out the specifics of the comments[1] can come later. [1] https://lore.kernel.org/lkml/cakwvodklvxeyebh7kx0gw7jpktph8a4domjtiduqa0jrqtr...@mail.gmail.com/ > --- > include/linux/compiler-clang.h | 6 -- > include/linux/compiler-gcc.h | 19 --- > include/linux/compiler.h | 18 -- > 3 files changed, 16 insertions(+), 27 deletions(-) > > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h > index cee0c728d39a..04c0a5a717f7 100644 > --- a/include/linux/compiler-clang.h > +++ b/include/linux/compiler-clang.h > @@ -52,12 +52,6 @@ > #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1 > #endif > > -/* The following are for compatibility with GCC, from compiler-gcc.h, > - * and may be redefined here because they should not be shared with other > - * compilers, like ICC. > - */ > -#define barrier() __asm__ __volatile__("" : : : "memory") > - > #if __has_feature(shadow_call_stack) > # define __noscs __attribute__((__no_sanitize__("shadow-call-stack"))) > #endif > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h > index 7a3769040d7d..fda30ffb037b 100644 > --- a/include/linux/compiler-gcc.h > +++ b/include/linux/compiler-gcc.h > @@ -15,25 +15,6 @@ > # error Sorry, your compiler is too old - please upgrade it. > #endif > > -/* Optimization barrier */ > - > -/* The "volatile" is due to gcc bugs */ > -#define barrier() __asm__ __volatile__("": : :"memory") > -/* > - * This version is i.e. to prevent dead stores elimination on @ptr > - * where gcc and llvm may behave differently when otherwise using > - * normal barrier(): while gcc behavior gets along with a normal > - * barrier(), llvm needs an explicit input variable to be assumed > - * clobbered. The issue is as follows: while the inline asm might > - * access any memory it wants, the compiler could have fit all of > - * @ptr into memory registers instead, and since @ptr never escaped > - * from that, it proved that the inline asm wasn't touching any of > - * it. This version works well with both compilers, i.e. we're telling > - * the compiler that the inline asm absolutely may see the contents > - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 > - */ > -#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") > - > /* > * This macro obfuscates arithmetic on a variable address so that gcc > * shouldn't recognize the original var, and make assumptions about it. > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index 92ef163a7479..dfba70b2644f 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -80,11 +80,25 @@ void ftrace_likely_update(struct ftrace_likely_data *f, > int val, > > /* Optimization barrier */ > #ifndef barrier > -# define barrier() __memory_barrier() > +/* The "volatile" is due to gcc bugs */ > +# define barrier() __asm__ __volatile__("": : :"memory") > #endif > > #ifndef barrier_data > -# define barrier_data(ptr) barrier() > +/* > + * This version is i.e. to prevent dead stores elimination on @ptr > + * where gcc and llvm may behave differently when otherwise using > + * normal barrier(): while gcc behavior gets along with a normal > + * barrier(), llvm needs an explicit input variable to be assumed > + * clobbered. The issue is as follows: while the inline asm might > + * access any memory it wants, the compiler could have fit all of > + * @ptr into memory registers instead, and since @ptr never escaped > + * from that, it proved that the inline asm wasn't touching any of > + * it. This version works well with both compilers, i.e. we're telling > + * the compiler that the inline asm
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Thu, Oct 15, 2020 at 8:39 AM Arvind Sankar wrote: > > On Thu, Oct 15, 2020 at 03:24:09PM +, David Laight wrote: > > > I think the comment is unclear now that you bring it up, but the problem > > > it actually addresses is not that the data is held in registers: in the > > > sha256_transform() case mentioned in the commit message, for example, > > > the data is in fact in memory even before this change (it's a 256-byte > > > array), and that together with the memory clobber is enough for gcc to > > > assume that the asm might use it. But with clang, if the address of that > > > data has never escaped -- in this case the data is a local variable > > > whose address was never passed out of the function -- the compiler > > > assumes that the asm cannot possibly depend on that memory, because it > > > has no way of getting its address. > > > > Ok, slightly different from what i thought. > > But the current comment is just wrong. > > Should I fix up the comment in the same commit, or do a second one after > moving the macro? I would prefer undoing the mistake from 815f0ddb346c and getting that backported to stable, then rewriting comments or the trick to retain the memset separately. > > > > i.e. something like: > > > static inline void barrier_data(void *ptr, size_t size) > > > { > > > asm volatile("" : "+m"(*(char (*)[size])ptr)); > > > > I think it has to be a struct with an array member? > > I don't think so, this is actually an example in gcc's documentation: > > An x86 example where the string memory argument is of unknown length. > > asm("repne scasb" > : "=c" (count), "+D" (p) > : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); > > If you know the above will only be reading a ten byte array then you > could instead use a memory input like: "m" (*(const char (*)[10]) p). > > > > > > } > > > plus some magic to disable the VLA warning, otherwise it causes a build > > > error. > > > > It shouldn't if the size is a compile time constant. > > And given this is an instruction to the compiler it better be. > > Ah right. I saw the warning when playing with something else where size > could be constant or variable depending on the call. > > > > > > > With a memory clobber, the compiler has to keep x and y at different > > > addresses, since the first barrier_data() might have saved the address > > > of x. > > > > Maybe "+m"(*ptr) : "r"(ptr) would work. > > Nothing that can only modify what ptr points to could avoid this, since > that storage is dead after the barrier. > > > OTOH a "memory" clobber at the bottom of a function isn't going to > > cause bloat. > > > > The explicit ranged memory access (without "memory") probably has its > > uses - but only if the full "memory" clobber causes grief. > > > > David > > > > - > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 > > 1PT, UK > > Registration No: 1397386 (Wales) -- Thanks, ~Nick Desaulniers
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Thu, Oct 15, 2020 at 03:24:09PM +, David Laight wrote: > > I think the comment is unclear now that you bring it up, but the problem > > it actually addresses is not that the data is held in registers: in the > > sha256_transform() case mentioned in the commit message, for example, > > the data is in fact in memory even before this change (it's a 256-byte > > array), and that together with the memory clobber is enough for gcc to > > assume that the asm might use it. But with clang, if the address of that > > data has never escaped -- in this case the data is a local variable > > whose address was never passed out of the function -- the compiler > > assumes that the asm cannot possibly depend on that memory, because it > > has no way of getting its address. > > Ok, slightly different from what i thought. > But the current comment is just wrong. Should I fix up the comment in the same commit, or do a second one after moving the macro? > > i.e. something like: > > static inline void barrier_data(void *ptr, size_t size) > > { > > asm volatile("" : "+m"(*(char (*)[size])ptr)); > > I think it has to be a struct with an array member? I don't think so, this is actually an example in gcc's documentation: An x86 example where the string memory argument is of unknown length. asm("repne scasb" : "=c" (count), "+D" (p) : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); If you know the above will only be reading a ten byte array then you could instead use a memory input like: "m" (*(const char (*)[10]) p). > > > } > > plus some magic to disable the VLA warning, otherwise it causes a build > > error. > > It shouldn't if the size is a compile time constant. > And given this is an instruction to the compiler it better be. Ah right. I saw the warning when playing with something else where size could be constant or variable depending on the call. > > > > With a memory clobber, the compiler has to keep x and y at different > > addresses, since the first barrier_data() might have saved the address > > of x. > > Maybe "+m"(*ptr) : "r"(ptr) would work. Nothing that can only modify what ptr points to could avoid this, since that storage is dead after the barrier. > OTOH a "memory" clobber at the bottom of a function isn't going to > cause bloat. > > The explicit ranged memory access (without "memory") probably has its > uses - but only if the full "memory" clobber causes grief. > > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 > 1PT, UK > Registration No: 1397386 (Wales)
RE: [PATCH] compiler.h: Fix barrier_data() on clang
From: Arvind Sankar > Sent: 15 October 2020 15:45 > > On Thu, Oct 15, 2020 at 08:50:05AM +, David Laight wrote: > > From: Arvind Sankar > > > Sent: 14 October 2020 22:27 > > ... > > > +/* > > > + * This version is i.e. to prevent dead stores elimination on @ptr > > > + * where gcc and llvm may behave differently when otherwise using > > > + * normal barrier(): while gcc behavior gets along with a normal > > > + * barrier(), llvm needs an explicit input variable to be assumed > > > + * clobbered. The issue is as follows: while the inline asm might > > > + * access any memory it wants, the compiler could have fit all of > > > + * @ptr into memory registers instead, and since @ptr never escaped > > > + * from that, it proved that the inline asm wasn't touching any of > > > + * it. This version works well with both compilers, i.e. we're telling > > > + * the compiler that the inline asm absolutely may see the contents > > > + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 > > > + */ > > > +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") > > > > That comment doesn't actually match the asm statement. > > Although the asm statement probably has the desired effect. > > > > The "r"(ptr) constraint only passes the address of the buffer > > into the asm - it doesn't say anything at all about the associated > > memory. > > > > What the "r"(ptr) actually does is to force the address of the > > associated data to be taken. > > This means that on-stack space must actually be allocated. > > The "memory" clobber will then force the registers caching > > the variable be written out to stack. > > > > I think the comment is unclear now that you bring it up, but the problem > it actually addresses is not that the data is held in registers: in the > sha256_transform() case mentioned in the commit message, for example, > the data is in fact in memory even before this change (it's a 256-byte > array), and that together with the memory clobber is enough for gcc to > assume that the asm might use it. But with clang, if the address of that > data has never escaped -- in this case the data is a local variable > whose address was never passed out of the function -- the compiler > assumes that the asm cannot possibly depend on that memory, because it > has no way of getting its address. Ok, slightly different from what i thought. But the current comment is just wrong. > Passing ptr to the inline asm tells clang that the asm knows the > address, and since it also has a memory clobber, that it may use the > data. This is somewhat suboptimal, since if the data was some small > structure that the compiler was just holding in registers originally, > forcing it out to memory is a bad thing to do. > > > If you only want to force stores on a single data structure > > you actually want: > > #define barrier_data(ptr) asm volatile("" :: "m"(*ptr)) > > although it would be best then to add an explicit size > > and associated cast. > > > > i.e. something like: > static inline void barrier_data(void *ptr, size_t size) > { > asm volatile("" : "+m"(*(char (*)[size])ptr)); I think it has to be a struct with an array member? > } > plus some magic to disable the VLA warning, otherwise it causes a build > error. It shouldn't if the size is a compile time constant. And given this is an instruction to the compiler it better be. > But I think that might lead to even more subtle issues by dropping the > memory clobber. For example (and this is actually done in > sha256_transform() as well, though the zero'ing simply doesn't work with > any compiler, as it's missing the barrier_data()'s): > > unsigned long x, y; > ... do something secret with x/y ... > x = y = 0; > barrier_data(, sizeof(x)); > barrier_data(, sizeof(y)); > return; > > Since x is not used after its barrier_data(), I think the compiler would > be within its rights to turn that into: > > xorl%eax, %eax > leaq-16(%rbp), %rdx // == -16(%rbp) > movq%eax, (%rdx)// x = 0; > // inline asm for barrier_data(, sizeof(x)); > movq%eax, (%rdx)// y = 0; (!) > // inline asm for barrier_data(, sizeof(y)); > > which saves one instruction by putting y at the same location as x, once > x is dead. > > With a memory clobber, the compiler has to keep x and y at different > addresses, since the first barrier_data() might have saved the address > of x. Maybe "+m"(*ptr) : "r"(ptr) would work. OTOH a "memory" clobber at the bottom of a function isn't going to cause bloat. The explicit ranged memory access (without "memory") probably has its uses - but only if the full "memory" clobber causes grief. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Thu, Oct 15, 2020 at 08:50:05AM +, David Laight wrote: > From: Arvind Sankar > > Sent: 14 October 2020 22:27 > ... > > +/* > > + * This version is i.e. to prevent dead stores elimination on @ptr > > + * where gcc and llvm may behave differently when otherwise using > > + * normal barrier(): while gcc behavior gets along with a normal > > + * barrier(), llvm needs an explicit input variable to be assumed > > + * clobbered. The issue is as follows: while the inline asm might > > + * access any memory it wants, the compiler could have fit all of > > + * @ptr into memory registers instead, and since @ptr never escaped > > + * from that, it proved that the inline asm wasn't touching any of > > + * it. This version works well with both compilers, i.e. we're telling > > + * the compiler that the inline asm absolutely may see the contents > > + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 > > + */ > > +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") > > That comment doesn't actually match the asm statement. > Although the asm statement probably has the desired effect. > > The "r"(ptr) constraint only passes the address of the buffer > into the asm - it doesn't say anything at all about the associated > memory. > > What the "r"(ptr) actually does is to force the address of the > associated data to be taken. > This means that on-stack space must actually be allocated. > The "memory" clobber will then force the registers caching > the variable be written out to stack. > I think the comment is unclear now that you bring it up, but the problem it actually addresses is not that the data is held in registers: in the sha256_transform() case mentioned in the commit message, for example, the data is in fact in memory even before this change (it's a 256-byte array), and that together with the memory clobber is enough for gcc to assume that the asm might use it. But with clang, if the address of that data has never escaped -- in this case the data is a local variable whose address was never passed out of the function -- the compiler assumes that the asm cannot possibly depend on that memory, because it has no way of getting its address. Passing ptr to the inline asm tells clang that the asm knows the address, and since it also has a memory clobber, that it may use the data. This is somewhat suboptimal, since if the data was some small structure that the compiler was just holding in registers originally, forcing it out to memory is a bad thing to do. > If you only want to force stores on a single data structure > you actually want: > #define barrier_data(ptr) asm volatile("" :: "m"(*ptr)) > although it would be best then to add an explicit size > and associated cast. > i.e. something like: static inline void barrier_data(void *ptr, size_t size) { asm volatile("" : "+m"(*(char (*)[size])ptr)); } plus some magic to disable the VLA warning, otherwise it causes a build error. But I think that might lead to even more subtle issues by dropping the memory clobber. For example (and this is actually done in sha256_transform() as well, though the zero'ing simply doesn't work with any compiler, as it's missing the barrier_data()'s): unsigned long x, y; ... do something secret with x/y ... x = y = 0; barrier_data(, sizeof(x)); barrier_data(, sizeof(y)); return; Since x is not used after its barrier_data(), I think the compiler would be within its rights to turn that into: xorl%eax, %eax leaq-16(%rbp), %rdx // == -16(%rbp) movq%eax, (%rdx)// x = 0; // inline asm for barrier_data(, sizeof(x)); movq%eax, (%rdx)// y = 0; (!) // inline asm for barrier_data(, sizeof(y)); which saves one instruction by putting y at the same location as x, once x is dead. With a memory clobber, the compiler has to keep x and y at different addresses, since the first barrier_data() might have saved the address of x.
RE: [PATCH] compiler.h: Fix barrier_data() on clang
From: Arvind Sankar > Sent: 14 October 2020 22:27 ... > +/* > + * This version is i.e. to prevent dead stores elimination on @ptr > + * where gcc and llvm may behave differently when otherwise using > + * normal barrier(): while gcc behavior gets along with a normal > + * barrier(), llvm needs an explicit input variable to be assumed > + * clobbered. The issue is as follows: while the inline asm might > + * access any memory it wants, the compiler could have fit all of > + * @ptr into memory registers instead, and since @ptr never escaped > + * from that, it proved that the inline asm wasn't touching any of > + * it. This version works well with both compilers, i.e. we're telling > + * the compiler that the inline asm absolutely may see the contents > + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 > + */ > +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") That comment doesn't actually match the asm statement. Although the asm statement probably has the desired effect. The "r"(ptr) constraint only passes the address of the buffer into the asm - it doesn't say anything at all about the associated memory. What the "r"(ptr) actually does is to force the address of the associated data to be taken. This means that on-stack space must actually be allocated. The "memory" clobber will then force the registers caching the variable be written out to stack. If you only want to force stores on a single data structure you actually want: #define barrier_data(ptr) asm volatile("" :: "m"(*ptr)) although it would be best then to add an explicit size and associated cast. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] compiler.h: Fix barrier_data() on clang
On Wed, Oct 14, 2020 at 2:26 PM Arvind Sankar wrote: > > Commit > 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > exclusive") > > neglected to copy barrier_data() from compiler-gcc.h into > compiler-clang.h. The definition in compiler-gcc.h was really to work > around clang's more aggressive optimization, so this broke > barrier_data() on clang, and consequently memzero_explicit() as well. > > For example, this results in at least the memzero_explicit() call in > lib/crypto/sha256.c:sha256_transform() being optimized away by clang. > > Fix this by moving the definition of barrier_data() into compiler.h. > > Also move the gcc/clang definition of barrier() into compiler.h, > __memory_barrier() is icc-specific (and barrier() is already defined > using it in compiler-intel.h) and doesn't belong in compiler.h. > > Signed-off-by: Arvind Sankar > Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually > exclusive") Thanks for the patch! Curious how you spotted this? My mistake for missing it. Definite difference in the disassembly before/after. Cc: sta...@vger.kernel.org Reviewed-by: Nick Desaulniers Tested-by: Nick Desaulniers akpm@ would you mind picking this up when you have a chance? See also: commit 7829fb09a2b4 ("lib: make memzero_explicit more robust against dead store elimination") I'm pretty sure `man 3 explicit_bzero` was created in libc for this exact problem, though the manual page is an interesting read. > --- > include/linux/compiler-clang.h | 6 -- > include/linux/compiler-gcc.h | 19 --- > include/linux/compiler.h | 18 -- > 3 files changed, 16 insertions(+), 27 deletions(-) > > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h > index cee0c728d39a..04c0a5a717f7 100644 > --- a/include/linux/compiler-clang.h > +++ b/include/linux/compiler-clang.h > @@ -52,12 +52,6 @@ > #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1 > #endif > > -/* The following are for compatibility with GCC, from compiler-gcc.h, > - * and may be redefined here because they should not be shared with other > - * compilers, like ICC. > - */ > -#define barrier() __asm__ __volatile__("" : : : "memory") > - > #if __has_feature(shadow_call_stack) > # define __noscs __attribute__((__no_sanitize__("shadow-call-stack"))) > #endif > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h > index 7a3769040d7d..fda30ffb037b 100644 > --- a/include/linux/compiler-gcc.h > +++ b/include/linux/compiler-gcc.h > @@ -15,25 +15,6 @@ > # error Sorry, your compiler is too old - please upgrade it. > #endif > > -/* Optimization barrier */ > - > -/* The "volatile" is due to gcc bugs */ > -#define barrier() __asm__ __volatile__("": : :"memory") > -/* > - * This version is i.e. to prevent dead stores elimination on @ptr > - * where gcc and llvm may behave differently when otherwise using > - * normal barrier(): while gcc behavior gets along with a normal > - * barrier(), llvm needs an explicit input variable to be assumed > - * clobbered. The issue is as follows: while the inline asm might > - * access any memory it wants, the compiler could have fit all of > - * @ptr into memory registers instead, and since @ptr never escaped > - * from that, it proved that the inline asm wasn't touching any of > - * it. This version works well with both compilers, i.e. we're telling > - * the compiler that the inline asm absolutely may see the contents > - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 > - */ > -#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") > - > /* > * This macro obfuscates arithmetic on a variable address so that gcc > * shouldn't recognize the original var, and make assumptions about it. > diff --git a/include/linux/compiler.h b/include/linux/compiler.h > index 92ef163a7479..dfba70b2644f 100644 > --- a/include/linux/compiler.h > +++ b/include/linux/compiler.h > @@ -80,11 +80,25 @@ void ftrace_likely_update(struct ftrace_likely_data *f, > int val, > > /* Optimization barrier */ > #ifndef barrier > -# define barrier() __memory_barrier() > +/* The "volatile" is due to gcc bugs */ > +# define barrier() __asm__ __volatile__("": : :"memory") > #endif > > #ifndef barrier_data > -# define barrier_data(ptr) barrier() > +/* > + * This version is i.e. to prevent dead stores elimination on @ptr > + * where gcc and llvm may behave differently when otherwise using > + * normal barrier(): while gcc behavior gets along with a normal > + * barrier(), llvm needs an explicit input variable to be assumed > + * clobbered. The issue is as follows: while the inline asm might > + * access any memory it wants, the compiler could have fit all of > + * @ptr into memory registers instead, and since @ptr never escaped > + * from that, it proved that the inline asm wasn't touching any of > + * it. This version works well with both compilers, i.e. we're telling > + * the
[PATCH] compiler.h: Fix barrier_data() on clang
Commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") neglected to copy barrier_data() from compiler-gcc.h into compiler-clang.h. The definition in compiler-gcc.h was really to work around clang's more aggressive optimization, so this broke barrier_data() on clang, and consequently memzero_explicit() as well. For example, this results in at least the memzero_explicit() call in lib/crypto/sha256.c:sha256_transform() being optimized away by clang. Fix this by moving the definition of barrier_data() into compiler.h. Also move the gcc/clang definition of barrier() into compiler.h, __memory_barrier() is icc-specific (and barrier() is already defined using it in compiler-intel.h) and doesn't belong in compiler.h. Signed-off-by: Arvind Sankar Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive") --- include/linux/compiler-clang.h | 6 -- include/linux/compiler-gcc.h | 19 --- include/linux/compiler.h | 18 -- 3 files changed, 16 insertions(+), 27 deletions(-) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index cee0c728d39a..04c0a5a717f7 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -52,12 +52,6 @@ #define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1 #endif -/* The following are for compatibility with GCC, from compiler-gcc.h, - * and may be redefined here because they should not be shared with other - * compilers, like ICC. - */ -#define barrier() __asm__ __volatile__("" : : : "memory") - #if __has_feature(shadow_call_stack) # define __noscs __attribute__((__no_sanitize__("shadow-call-stack"))) #endif diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 7a3769040d7d..fda30ffb037b 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -15,25 +15,6 @@ # error Sorry, your compiler is too old - please upgrade it. #endif -/* Optimization barrier */ - -/* The "volatile" is due to gcc bugs */ -#define barrier() __asm__ __volatile__("": : :"memory") -/* - * This version is i.e. to prevent dead stores elimination on @ptr - * where gcc and llvm may behave differently when otherwise using - * normal barrier(): while gcc behavior gets along with a normal - * barrier(), llvm needs an explicit input variable to be assumed - * clobbered. The issue is as follows: while the inline asm might - * access any memory it wants, the compiler could have fit all of - * @ptr into memory registers instead, and since @ptr never escaped - * from that, it proved that the inline asm wasn't touching any of - * it. This version works well with both compilers, i.e. we're telling - * the compiler that the inline asm absolutely may see the contents - * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 - */ -#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") - /* * This macro obfuscates arithmetic on a variable address so that gcc * shouldn't recognize the original var, and make assumptions about it. diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 92ef163a7479..dfba70b2644f 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -80,11 +80,25 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, /* Optimization barrier */ #ifndef barrier -# define barrier() __memory_barrier() +/* The "volatile" is due to gcc bugs */ +# define barrier() __asm__ __volatile__("": : :"memory") #endif #ifndef barrier_data -# define barrier_data(ptr) barrier() +/* + * This version is i.e. to prevent dead stores elimination on @ptr + * where gcc and llvm may behave differently when otherwise using + * normal barrier(): while gcc behavior gets along with a normal + * barrier(), llvm needs an explicit input variable to be assumed + * clobbered. The issue is as follows: while the inline asm might + * access any memory it wants, the compiler could have fit all of + * @ptr into memory registers instead, and since @ptr never escaped + * from that, it proved that the inline asm wasn't touching any of + * it. This version works well with both compilers, i.e. we're telling + * the compiler that the inline asm absolutely may see the contents + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495 + */ +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory") #endif /* workaround for GCC PR82365 if needed */ -- 2.26.2