[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Richard Biener changed:

What             |Removed |Added
Target Milestone |14.0    |14.2

--- Comment #19 from Richard Biener ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski changed:

What     |Removed |Added
See Also |        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113986

--- Comment #18 from Andrew Pinski ---
(In reply to Khem Raj from comment #17)
> @wilco this commit is now regressing builds for musl/aarch64, where
> libatomic fails to compile. With errors like

Yes, this is already known and is recorded as PR 113986.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Khem Raj changed:

What |Removed |Added
CC   |        |raj.khem at gmail dot com

--- Comment #17 from Khem Raj ---
@wilco this commit is now regressing builds for musl/aarch64, where libatomic
fails to compile. With errors like

In file included from /home/kraj/work/gcc/libatomic/exch_n.c:25:
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:40: error: ‘export_exchange_16’ aliased to undefined symbol ‘libat_exchange_16’
  288 |         extern typeof(C2(libat_,X)) C2(export_,X) \
      |                                        ^~~
/home/kraj/work/gcc/libatomic/libatomic_i.h:40:25: note: in definition of macro ‘C2_’
   40 | #define C2_(X,Y)        X ## Y
      |                         ^
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:37: note: in expansion of macro ‘C2’
  288 |         extern typeof(C2(libat_,X)) C2(export_,X) \
      |                                     ^~
/home/kraj/work/gcc/libatomic/exch_n.c:128:1: note: in expansion of macro ‘EXPORT_ALIAS’
  128 | EXPORT_ALIAS (SIZE(exchange));
      | ^~~~
In file included from /home/kraj/work/gcc/libatomic/fop_n.c:25,
                 from /home/kraj/work/gcc/libatomic/fand_n.c:3:
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:40: error: ‘export_fetch_and_16’ aliased to undefined symbol ‘libat_fetch_and_16’
  288 |         extern typeof(C2(libat_,X)) C2(export_,X) \
      |                                        ^~~
/home/kraj/work/gcc/libatomic/libatomic_i.h:40:25: note: in definition of macro ‘C2_’
   40 | #define C2_(X,Y)        X ## Y
      |                         ^
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:37: note: in expansion of macro ‘C2’
  288 |         extern typeof(C2(libat_,X)) C2(export_,X) \
      |                                     ^~
/home/kraj/work/gcc/libatomic/fop_n.c:199:1: note: in expansion of macro ‘EXPORT_ALIAS’
  199 | EXPORT_ALIAS (SIZE(C2(fetch_,NAME)));
      | ^~~~
In file included from /home/kraj/work/gcc/libatomic/fadd_n.c:25:
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:40: error: ‘export_fetch_add_16’ aliased to undefined symbol ‘libat_fetch_add_16’
  288 |         extern typeof(C2(libat_,X)) C2(export_,X) \
      |                                        ^~~
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Wilco changed:

What             |Removed |Added
Target Milestone |---     |14.0

--- Comment #16 from Wilco ---
Fixed by
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=3fa689f6ed8387d315e58169bb9bace3bd508c0a

libatomic: Enable lock-free 128-bit atomics on AArch64

Enable lock-free 128-bit atomics on AArch64. This is backwards compatible with
existing binaries (as for these GCC always calls into libatomic, so all 128-bit
atomic uses in a process are switched), gives better performance than locking
atomics and is what most users expect.

128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the
given address is writeable (which will be true when using atomics in real code).

This doesn't yet change __atomic_is_lock_free even though all atomics are
finally lock-free on AArch64.

libatomic:
        * config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
        (libat_exchange_16): Merge RELEASE and ACQ_REL/SEQ_CST cases.
        * config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
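
The commit message above notes that __atomic_is_lock_free is not changed yet. As a rough illustration of how code can ask the compiler about lock-freedom, here is a minimal sketch using the GCC/Clang builtin __atomic_always_lock_free, which is evaluated at compile time. The helper names always_lock_free_4 and always_lock_free_16 are made up for this example and are not part of libatomic or the patch.

```c
#include <stdbool.h>

/* Hypothetical helpers, for illustration only.
   __atomic_always_lock_free(size, ptr) folds to a compile-time constant:
   true only if objects of that size are lock-free on every CPU the
   compiler is targeting. Passing 0 as the pointer means "typical
   alignment for an object of this size". */

bool always_lock_free_4(void)
{
    return __atomic_always_lock_free(4, 0);
}

bool always_lock_free_16(void)
{
    /* The result depends on the target and compiler flags; given the note
       in the commit message, it should not be assumed to be true for
       AArch64 even after this change. */
    return __atomic_always_lock_free(16, 0);
}
```

Calling always_lock_free_16() from a small test program shows what a given toolchain reports for 16-byte objects. The related builtin __atomic_is_lock_free may instead be resolved by a runtime call into libatomic when the compiler cannot decide at compile time, which is why this sketch avoids it.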
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #15 from Xi Ruoyao ---
(In reply to Wilco from comment #14)
> (In reply to Wilco from comment #13)
> > (In reply to Xi Ruoyao from comment #12)
> > > (In reply to Wilco from comment #11)
> > >
> > > > > Then the compiler (and the standard) is not what they consider. Such
> > > > > misunderstandings are everywhere and this has no difference.
> > > >
> > > > Where is int128 in "the standard"?
> > >
> > > Consider this:
> > >
> > > const _Atomic long double x = 0.1;
> > >
> > > int main()
> > > {
> > >   double y = x;
> > >   return y != 0.1;
> > > }
> > >
> > > If CAS is used here, the program will just segfault. Does the standard say
> > > this is ill-formed or not?
> >
> > I'd say this is ill formed yes. And it will crash on Atom laptops.
>
> Correction - it crashes on all AMD cpus too. Are you going to file
> bugreports for this?

PR95722.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #14 from Wilco ---
(In reply to Wilco from comment #13)
> (In reply to Xi Ruoyao from comment #12)
> > (In reply to Wilco from comment #11)
> >
> > > > Then the compiler (and the standard) is not what they consider. Such
> > > > misunderstandings are everywhere and this has no difference.
> > >
> > > Where is int128 in "the standard"?
> >
> > Consider this:
> >
> > const _Atomic long double x = 0.1;
> >
> > int main()
> > {
> >   double y = x;
> >   return y != 0.1;
> > }
> >
> > If CAS is used here, the program will just segfault. Does the standard say
> > this is ill-formed or not?
>
> I'd say this is ill formed yes. And it will crash on Atom laptops.

Correction - it crashes on all AMD cpus too. Are you going to file bugreports
for this?
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #13 from Wilco ---
(In reply to Xi Ruoyao from comment #12)
> (In reply to Wilco from comment #11)
>
> > > Then the compiler (and the standard) is not what they consider. Such
> > > misunderstandings are everywhere and this has no difference.
> >
> > Where is int128 in "the standard"?
>
> Consider this:
>
> const _Atomic long double x = 0.1;
>
> int main()
> {
>   double y = x;
>   return y != 0.1;
> }
>
> If CAS is used here, the program will just segfault. Does the standard say
> this is ill-formed or not?

I'd say this is ill formed yes. And it will crash on Atom laptops.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #12 from Xi Ruoyao ---
(In reply to Wilco from comment #11)
> > Then the compiler (and the standard) is not what they consider. Such
> > misunderstandings are everywhere and this has no difference.
>
> Where is int128 in "the standard"?

Consider this:

const _Atomic long double x = 0.1;

int main()
{
  double y = x;
  return y != 0.1;
}

If CAS is used here, the program will just segfault. Does the standard say
this is ill-formed or not?
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #11 from Wilco ---
(In reply to Xi Ruoyao from comment #10)
> (In reply to Wilco from comment #9)
> > (In reply to Xi Ruoyao from comment #8)
> > > (In reply to Wilco from comment #7)
> > > > I don't see the issue you have here. GCC for x86/x86_64 has been using
> > > > compare exchange for atomic load (which always does a write even if the
> > > > compare fails) for many years.
> > >
> > > No we don't, since r7-6454.
> >
> > Incorrect - libatomic still uses cmpxchg16b depending on the CPU.
>
> You are incorrect. It checks cmpxchg16b bit in CPUID but does not use the
> cmpxchg16b instruction.

No, it will use the cmpxchg16b instruction in the other ifunc when AVX is not
available. Libatomic will fallback to locking atomics if neither AVX nor
cmpxchg16b are available (first few generations of x86_64).

> The reason to check cmpxchg16b is both Intel and AMD guarantee that if both
> cmpxchg16b and AVX are available, then an aligned 16-byte load with vmovdqa
> is atomic. So we can use vmovdqa to do a lock-free load then. But using
> cmpxchg16b for a load is still wrong, and libatomic do NOT use it.
>
> > > > The question is, do you believe compilers should provide users with
> > > > fast and efficient atomics they need? Or do you want to force every
> > > > application to implement their own version of 128-bit atomics?
> > >
> > > But a compiler must generate correct code first. They can use the
> > > wonderful inline assembly because they know CAS is safe in their case,
> > > but the compiler does not know.
> >
> > Many developers consider locking atomics fundamentally incorrect. If we emit
> > lock-free atomics they don't need to write inline assembler.
>
> Then the compiler (and the standard) is not what they consider. Such
> misunderstandings are everywhere and this has no difference.

Where is int128 in "the standard"?
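
To make the dispatch being argued about in comments #10 and #11 concrete, here is a small sketch of a three-way choice keyed on the CPUID leaf 1 ECX feature bits (bit 13 is CMPXCHG16B, bit 28 is AVX). It only mirrors the behaviour as described in the comments: the enum, the macro names and the function choose_load16_strategy are invented for this illustration and are not taken from libatomic's sources, and whether the middle case really exists is exactly what the two comments disagree on.

```c
/* Illustrative only: names below are hypothetical, not libatomic's. */

enum load16_strategy {
    LOAD16_VMOVDQA,    /* aligned 16-byte vector load, per comment #10 */
    LOAD16_CMPXCHG16B, /* CAS-based path, as claimed in comment #11   */
    LOAD16_LOCK        /* fall back to a lock                          */
};

/* Feature bits in ECX after executing CPUID with EAX = 1. */
#define CPUID1_ECX_CMPXCHG16B (1u << 13)
#define CPUID1_ECX_AVX        (1u << 28)

/* Pick a strategy from the raw ECX value of CPUID leaf 1. */
enum load16_strategy choose_load16_strategy(unsigned ecx)
{
    unsigned both = CPUID1_ECX_CMPXCHG16B | CPUID1_ECX_AVX;

    if ((ecx & both) == both)
        return LOAD16_VMOVDQA;      /* both features present */
    if (ecx & CPUID1_ECX_CMPXCHG16B)
        return LOAD16_CMPXCHG16B;   /* cmpxchg16b without AVX */
    return LOAD16_LOCK;             /* no cmpxchg16b at all */
}
```

On x86 the ECX value could be obtained with __get_cpuid from <cpuid.h>; the sketch takes it as a parameter so that it compiles on any target. A real implementation would also have to confirm that the operating system has enabled AVX state, which this sketch does not check.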
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #10 from Xi Ruoyao ---
(In reply to Wilco from comment #9)
> (In reply to Xi Ruoyao from comment #8)
> > (In reply to Wilco from comment #7)
> > > I don't see the issue you have here. GCC for x86/x86_64 has been using
> > > compare exchange for atomic load (which always does a write even if the
> > > compare fails) for many years.
> >
> > No we don't, since r7-6454.
>
> Incorrect - libatomic still uses cmpxchg16b depending on the CPU.

You are incorrect. It checks cmpxchg16b bit in CPUID but does not use the
cmpxchg16b instruction.

The reason to check cmpxchg16b is both Intel and AMD guarantee that if both
cmpxchg16b and AVX are available, then an aligned 16-byte load with vmovdqa is
atomic. So we can use vmovdqa to do a lock-free load then. But using
cmpxchg16b for a load is still wrong, and libatomic do NOT use it.

> > > The question is, do you believe compilers should provide users with fast
> > > and efficient atomics they need? Or do you want to force every
> > > application to implement their own version of 128-bit atomics?
> >
> > But a compiler must generate correct code first. They can use the wonderful
> > inline assembly because they know CAS is safe in their case, but the
> > compiler does not know.
>
> Many developers consider locking atomics fundamentally incorrect. If we emit
> lock-free atomics they don't need to write inline assembler.

Then the compiler (and the standard) is not what they consider. Such
misunderstandings are everywhere and this has no difference.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #9 from Wilco ---
(In reply to Xi Ruoyao from comment #8)
> (In reply to Wilco from comment #7)
> > I don't see the issue you have here. GCC for x86/x86_64 has been using
> > compare exchange for atomic load (which always does a write even if the
> > compare fails) for many years.
>
> No we don't, since r7-6454.

Incorrect - libatomic still uses cmpxchg16b depending on the CPU.

> > The question is, do you believe compilers should provide users with fast and
> > efficient atomics they need? Or do you want to force every application to
> > implement their own version of 128-bit atomics?
>
> But a compiler must generate correct code first. They can use the wonderful
> inline assembly because they know CAS is safe in their case, but the
> compiler does not know.

Many developers consider locking atomics fundamentally incorrect. If we emit
lock-free atomics they don't need to write inline assembler.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Xi Ruoyao changed:

What |Removed |Added
CC   |        |xry111 at gcc dot gnu.org

--- Comment #8 from Xi Ruoyao ---
(In reply to Wilco from comment #7)
> I don't see the issue you have here. GCC for x86/x86_64 has been using
> compare exchange for atomic load (which always does a write even if the
> compare fails) for many years.

No we don't, since r7-6454.

> The question is, do you believe compilers should provide users with fast and
> efficient atomics they need? Or do you want to force every application to
> implement their own version of 128-bit atomics?

But a compiler must generate correct code first. They can use the wonderful
inline assembly because they know CAS is safe in their case, but the compiler
does not know.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Wilco changed:

What       |Removed   |Added
Resolution |DUPLICATE |---
Status     |RESOLVED  |NEW

--- Comment #7 from Wilco ---
I don't see the issue you have here. GCC for x86/x86_64 has been using compare
exchange for atomic load (which always does a write even if the compare fails)
for many years. LLVM does the same for AArch64/x86/x86_64. If you believe this
is incorrect/invalid, do you have any evidence this causes crashes in real
applications?

As a result of GCC's bad choice of using locking atomics on AArch64, many
applications are forced to implement 128-bit atomics themselves using hacky
inline assembler. Just one example for reference:

https://github.com/boostorg/atomic/blob/08bd4e20338c503d2acfdddfdaa8f5e0bcf9006c/include/boost/atomic/detail/core_arch_ops_gcc_aarch64.hpp#L1635

The question is, do you believe compilers should provide users with fast and
efficient atomics they need? Or do you want to force every application to
implement their own version of 128-bit atomics?
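
For readers unfamiliar with the technique under dispute, the following is a minimal sketch of an atomic load built from compare-and-swap. It uses a 64-bit object so that it compiles without libatomic on common 64-bit targets; the 128-bit case discussed in this bug has the same shape. The function name load_via_cas is hypothetical and the code is not taken from GCC, LLVM or libatomic.

```c
#include <stdint.h>

/* Hypothetical helper: read *p using only compare-and-swap.
   Note that p cannot be pointer-to-const: the operation needs write
   access to the memory even though the stored value never changes. */
uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;

    /* If *p == 0 the CAS succeeds and stores 0 back, leaving the value
       unchanged. Otherwise it fails and copies the current value of *p
       into 'expected'. Either way 'expected' ends up holding the value
       that was in *p. */
    __atomic_compare_exchange_n(p, &expected, 0, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Because the underlying instruction performs a write cycle (or at least requires write permission) even when the comparison fails, applying this to an object placed in read-only memory can fault. That is the case raised later in the thread with the const _Atomic long double example.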
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #6 from Andrew Pinski ---
bug 70814 comment #3 explains why this should not be done.

As I mentioned it can be improved for armv8.4-a maybe using an ifunc but it
cannot be done for before that.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |DUPLICATE

--- Comment #5 from Andrew Pinski ---
Still a dup. Reopen the old one instead if you think gcc should be broken for
loads.

*** This bug has been marked as a duplicate of bug 70814 ***
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Wilco changed:

What             |Removed                       |Added
Last reconfirmed |                              |2023-05-31
See Also         |                              |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
Assignee         |unassigned at gcc dot gnu.org |wilco at gcc dot gnu.org
Status           |RESOLVED                      |NEW
Resolution       |DUPLICATE                     |---
Ever confirmed   |0                             |1

--- Comment #4 from Wilco ---
Reopened. Please don't close bugs without allowing for discussion first. I'll
send a patch soon that shows it's possible and valid.

And if there is a better solution that results in the same benefits (fast
lock-free atomics, allowing inlining and use of latest instructions without ABI
issues) then I would love to hear ideas and suggestions.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |DUPLICATE

--- Comment #3 from Andrew Pinski ---
Dup of bug 70814. Basically 128bit atomics CANNOT be lock free with ARMv8-a.

*** This bug has been marked as a duplicate of bug 70814 ***
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski changed:

What     |Removed |Added
See Also |        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #2 from Andrew Pinski ---
LLVM making wrong choices does not mean gcc has to follow them.
[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #1 from Andrew Pinski ---
But it is not a valid thing to do.