[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #11 from Wilco ---
(In reply to Niall Douglas from comment #10)
> (In reply to Jakub Jelinek from comment #9)
> > (In reply to Wilco from comment #8)
> > > Yes that sounds like a reasonable approach.
> >
> > I don't think so. Not all variables on which __atomic_* intrinsics are
> > used are actually _Atomic, the vars can be embedded in const aggregates
> > etc.
>
> I'd have the attribute propagate to enclosing types, like over-alignment.

Yes, a structure with a 128-bit atomic type in a subfield/union would be forced to rwdata. And arbitrary casts (e.g. from char* to an atomic type) wouldn't work due to atomics requiring strict alignment. A 128-bit atomic type might have a higher alignment than a 128-bit integer, so even casting that seems questionable.
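Wilco's alignment point can be checked directly with `_Alignof`. A minimal probe (C11 with the GCC/clang `__int128` and `_Atomic` extensions; the exact values are target-dependent):

```c
#include <stddef.h>

/* Probe the alignment difference mentioned above: an atomic 128-bit
   type may require stricter alignment than a plain __int128, which is
   why casting a char* or __int128* to an atomic type is questionable. */
size_t plain_int128_align(void)  { return _Alignof(__int128); }
size_t atomic_int128_align(void) { return _Alignof(_Atomic __int128); }
```

On current x86-64 and AArch64 both typically report 16, but the atomic alignment is never lower than the plain one, and that is exactly the property such a cast would silently assume.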
--- Comment #10 from Niall Douglas ---
(In reply to Jakub Jelinek from comment #9)
> (In reply to Wilco from comment #8)
> > Yes that sounds like a reasonable approach.
>
> I don't think so. Not all variables on which __atomic_* intrinsics are
> used are actually _Atomic, the vars can be embedded in const aggregates
> etc.

I'd have the attribute propagate to enclosing types, like over-alignment.
--- Comment #9 from Jakub Jelinek ---
(In reply to Wilco from comment #8)
> Yes that sounds like a reasonable approach.

I don't think so. Not all variables on which __atomic_* intrinsics are used are actually _Atomic; the vars can be embedded in const aggregates etc.
--- Comment #8 from Wilco ---
(In reply to Niall Douglas from comment #7)
> (In reply to Andrew Pinski from comment #4)
> > (In reply to Niall Douglas from comment #3)
> > > You may be interested in reading https://reviews.llvm.org/D110069. It
> > > wanted to have LLVM generate a 128 bit AArch64 CAS for atomics. LLVM
> > > merged that change, it'll be in the next release.
> >
> > Using CAS for atomic load is not valid thing to do ...
> > Because atomic load from constant rodata needs to work.
> > LLVM breaks this case as they don't care about it. GCC does though.
>
> I've heard that argument before, and I've always wondered why _Atomic128
> types couldn't have an attribute which applies attribute section to their
> static const variable incarnations to force them into r/w memory. That
> would also solve the LLVM issue. Said attribute is not unuseful in general
> actually, it would help avoid having to mess with mprotect to apply copy
> on write perms on regions in .rodata when you need to modify static const
> variable values.
>
> I don't think that the standard *guarantees* that static const variables
> go into read only memory, and besides, before C23 128 bit integers weren't
> supported anyway so one could argue as a proprietary extension (__int128)
> you get proprietary special casing.

Yes, that sounds like a reasonable approach. There will be language lawyers who say it must also work on mmap after mprotect, of course, but that seems even more unlikely in the real world... I believe that the vast majority of developers just want 128-bit atomics to work efficiently without locks when possible. Currently various packages are forced to create 128-bit atomics using inline assembler, and that seems a much worse hack than supporting lock-free atomics in the compiler.
--- Comment #7 from Niall Douglas ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Niall Douglas from comment #3)
> > You may be interested in reading https://reviews.llvm.org/D110069. It
> > wanted to have LLVM generate a 128 bit AArch64 CAS for atomics. LLVM
> > merged that change, it'll be in the next release.
>
> Using CAS for atomic load is not valid thing to do ...
> Because atomic load from constant rodata needs to work.
> LLVM breaks this case as they don't care about it. GCC does though.

I've heard that argument before, and I've always wondered why _Atomic128 types couldn't have an attribute which applies attribute section to their static const variable incarnations to force them into r/w memory. That would also solve the LLVM issue. Said attribute would not be unuseful in general, actually: it would help avoid having to mess with mprotect to apply copy-on-write perms on regions in .rodata when you need to modify static const variable values.

I don't think that the standard *guarantees* that static const variables go into read-only memory, and besides, before C23, 128-bit integers weren't supported anyway, so one could argue that as a proprietary extension (__int128) you get proprietary special casing.

Niall
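No such attribute exists today, but the effect being described can be approximated with GCC's existing `section` variable attribute. A hypothetical sketch for ELF/Linux targets only (the section name and the whole approach are illustrative, not a supported GCC feature; GAS may warn about changed section attributes; 64-bit is used so the example stays inline everywhere):

```c
/* Hypothetical sketch of the idea above: force an otherwise read-only
   object into the writable .data section, so that atomic sequences that
   need write access (e.g. CAS-based ones) can still operate on it.
   Writing through the const is still undefined behavior per the
   standard; only the placement changes. */
static const unsigned long long counter
    __attribute__((section(".data"))) = 42;

unsigned long long load_counter(void) {
    /* A plain seq_cst atomic load; works regardless of placement. */
    return __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```

The proposed attribute would effectively apply this placement automatically to every static const incarnation of the atomic type.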
--- Comment #6 from Andrew Pinski ---
(In reply to Wilco from comment #5)
> To me a far worse issue is that this difference for 128-bit atomics means
> that LLVM and GCC are binary incompatible. AFAIK there isn't an option to
> make them compatible either (on AArch64 GCC 13 will use a compatible
> sequence only if LSE2 is available).

Right, and that would mean LLVM is broken for valid C code, and breaks binary compatibility with GCC rather than the other way around. This is not the first time LLVM has chosen to break binary compatibility with GCC (x86_64 argument passing, for example).
Wilco changed:

           What    |Removed                     |Added
----------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #5 from Wilco ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Niall Douglas from comment #3)
> > You may be interested in reading https://reviews.llvm.org/D110069. It
> > wanted to have LLVM generate a 128 bit AArch64 CAS for atomics. LLVM
> > merged that change, it'll be in the next release.
>
> Using CAS for atomic load is not valid thing to do ...
> Because atomic load from constant rodata needs to work.
> LLVM breaks this case as they don't care about it. GCC does though.

The question is how useful is this in reality? If memory is not writeable then you can use atomic loads but no other atomic accesses. We could be pragmatic and say that using 128-bit atomic loads from non-writeable memory is a user error, just like unaligned atomic accesses.

To me a far worse issue is that this difference for 128-bit atomics means that LLVM and GCC are binary incompatible. AFAIK there isn't an option to make them compatible either (on AArch64 GCC 13 will use a compatible sequence only if LSE2 is available).
--- Comment #4 from Andrew Pinski ---
(In reply to Niall Douglas from comment #3)
> You may be interested in reading https://reviews.llvm.org/D110069. It
> wanted to have LLVM generate a 128 bit AArch64 CAS for atomics. LLVM
> merged that change, it'll be in the next release.

Using CAS for atomic load is not a valid thing to do, because an atomic load from constant rodata needs to work. LLVM breaks this case as they don't care about it. GCC does, though.
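The reason a CAS cannot stand in for a load can be sketched in portable C. The example below uses 64 bits so it stays inline and lock-free everywhere (the 128-bit version needs -mcx16 or libatomic), but the mechanism is identical for cmpxchg16b on x86-64 or casp on AArch64:

```c
#include <stdatomic.h>

/* Sketch of why a compare-exchange is not a pure load: even a
   "no-op" CAS that compares equal stores the old value back, so the
   hardware needs write access to *p -- exactly what faults for
   objects placed in read-only .rodata. */
unsigned long long load_via_cas(_Atomic unsigned long long *p) {
    unsigned long long expected = 0;
    /* On failure, `expected` is updated to the current value of *p.
       On success (the value really was 0), 0 is written back to *p --
       a store, even though the value is unchanged. */
    atomic_compare_exchange_strong(p, &expected, expected);
    return expected;
}
```

Either branch returns the observed value, so this behaves like a load on writable memory; on a read-only page the implicit store traps, which is the case GCC refuses to break.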
--- Comment #3 from Niall Douglas ---
(In reply to Jakub Jelinek from comment #1)
> AMD has guaranteed it, but there is still VIA and Zhaoxin and while we have
> some statement from the latter, I'm not sure it is enough and we don't have
> anything from VIA. See PR104688 for details.

I'm wondering if a compiler opt-out flag like -no-msseatomic16 to turn off use of SSE for 128-bit atomics wouldn't be an idea? Given the small market share of those CPU vendors, it seems a shame to hold up implementation. (Also, if you do turn it on by default and advertise that widely, I suspect those vendors will hurry up with their documentation.)

(In reply to ktkachov from comment #2)
> FWIW, the GCC codegen for aarch64 is at https://godbolt.org/z/qvx9484nY
> (arm and aarch64 are different targets). It emits a call to libatomic,
> which for GCC 13 will use a lockless implementation when possible at
> runtime, see g:d1288d850944f69a795e4ff444a427eba3fec11b

Thanks for the catch, my mistake. It would seem the codegen is similarly inferior to the codegen from clang for both aarch64 and x64.

You may be interested in reading https://reviews.llvm.org/D110069. It wanted to have LLVM generate a 128 bit AArch64 CAS for atomics. LLVM merged that change, it'll be in the next release.
ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Niall Douglas from comment #0)
> Related:
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
>
> I got bitten by this again, latest GCC still does not emit single
> instruction 128 bit atomics, even when the -march is easily new enough.
> Here is a godbolt comparing latest MSVC, latest GCC and latest clang for
> the skylake-avx512 architecture, which unquestionably supports cmpxchg16b.
> Only clang emits the single instruction atomic:
>
> https://godbolt.org/z/EnbeeW4az
>
> I'm gathering from the issue comments and from the comments at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 that you're going to
> wait for AMD to guarantee atomicity of SSE instructions before changing
> the codegen here, which makes sense. However I also wanted to raise
> potentially suboptimal 128 bit atomic codegen by GCC for AArch64 as
> compared to clang:
>
> https://godbolt.org/z/oKv4o81nv
>
> GCC emits `dmb` to force a global memory fence, whereas clang does not.
>
> I think clang is in the right here, the seq_cst atomic semantics are not
> supposed to globally memory fence.

FWIW, the GCC codegen for aarch64 is at https://godbolt.org/z/qvx9484nY (arm and aarch64 are different targets). It emits a call to libatomic, which for GCC 13 will use a lockless implementation when possible at runtime; see g:d1288d850944f69a795e4ff444a427eba3fec11b
Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek ---
(In reply to Niall Douglas from comment #0)
> I'm gathering from the issue comments and from the comments at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 that you're going to
> wait for AMD to guarantee atomicity of SSE instructions before changing
> the codegen here, which makes sense.

AMD has guaranteed it, but there is still VIA and Zhaoxin, and while we have some statement from the latter, I'm not sure it is enough and we don't have anything from VIA. See PR104688 for details.
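Whether the compiler is willing to inline 16-byte atomics for a given target/-march can be probed at compile time with `__atomic_always_lock_free`, which folds to a constant (no libatomic call involved). A small probe; results depend on target and flags such as -mcx16 or -march=skylake-avx512 on x86-64:

```c
#include <stdbool.h>

/* __atomic_always_lock_free(size, 0) is a compile-time answer: true
   only when the compiler promises inline lock-free atomics of that
   size for the current target. With default flags on x86-64 the
   16-byte answer is false, which is the codegen gap this bug is about;
   the vendor guarantees discussed above are what would let it flip. */
bool sixteen_byte_always_lock_free(void) {
    return __atomic_always_lock_free(16, 0);
}

bool eight_byte_always_lock_free(void) {
    return __atomic_always_lock_free(8, 0);
}
```

Checking the 8-byte case alongside the 16-byte one makes the gap visible: 64-bit atomics are always inline on these targets, while the 128-bit answer tracks the -march and vendor-guarantee questions in this thread.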