[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2024-05-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|14.0|14.2

--- Comment #19 from Richard Biener  ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2024-03-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113986

--- Comment #18 from Andrew Pinski  ---
(In reply to Khem Raj from comment #17)
> @wilco this commit is now regressing builds for musl/aarch64, where
> libatomic fails to compile with errors like:

Yes, this is already known and is recorded as PR 113986.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2024-03-17 Thread raj.khem at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Khem Raj  changed:

   What|Removed |Added

 CC||raj.khem at gmail dot com

--- Comment #17 from Khem Raj  ---
@wilco this commit is now regressing builds for musl/aarch64, where libatomic
fails to compile with errors like:

In file included from /home/kraj/work/gcc/libatomic/exch_n.c:25:
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:40: error: ‘export_exchange_16’ aliased to undefined symbol ‘libat_exchange_16’
  288 | extern typeof(C2(libat_,X)) C2(export_,X)   \
  |^~~
/home/kraj/work/gcc/libatomic/libatomic_i.h:40:25: note: in definition of macro ‘C2_’
   40 | #define C2_(X,Y)X ## Y
  | ^
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:37: note: in expansion of macro ‘C2’
  288 | extern typeof(C2(libat_,X)) C2(export_,X)   \
  | ^~
/home/kraj/work/gcc/libatomic/exch_n.c:128:1: note: in expansion of macro ‘EXPORT_ALIAS’
  128 | EXPORT_ALIAS (SIZE(exchange));
  | ^~~~
In file included from /home/kraj/work/gcc/libatomic/fop_n.c:25,
 from /home/kraj/work/gcc/libatomic/fand_n.c:3:
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:40: error: ‘export_fetch_and_16’ aliased to undefined symbol ‘libat_fetch_and_16’
  288 | extern typeof(C2(libat_,X)) C2(export_,X)   \
  |^~~
/home/kraj/work/gcc/libatomic/libatomic_i.h:40:25: note: in definition of macro ‘C2_’
   40 | #define C2_(X,Y)X ## Y
  | ^
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:37: note: in expansion of macro ‘C2’
  288 | extern typeof(C2(libat_,X)) C2(export_,X)   \
  | ^~
/home/kraj/work/gcc/libatomic/fop_n.c:199:1: note: in expansion of macro ‘EXPORT_ALIAS’
  199 | EXPORT_ALIAS (SIZE(C2(fetch_,NAME)));
  | ^~~~
In file included from /home/kraj/work/gcc/libatomic/fadd_n.c:25:
/home/kraj/work/gcc/libatomic/libatomic_i.h:288:40: error: ‘export_fetch_add_16’ aliased to undefined symbol ‘libat_fetch_add_16’
  288 | extern typeof(C2(libat_,X)) C2(export_,X)   \
  |^~~

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-12-22 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Wilco  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

--- Comment #16 from Wilco  ---
Fixed by
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=3fa689f6ed8387d315e58169bb9bace3bd508c0a

libatomic: Enable lock-free 128-bit atomics on AArch64

Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries (as for these GCC always calls into libatomic, so all 128-bit
atomic uses in a process are switched), gives better performance than locking
atomics and is what most users expect.

128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the
given address is writeable (which will be true when using atomics in real
code).

This doesn't yet change __atomic_is_lock_free even though all atomics are
finally lock-free on AArch64.

libatomic:
* config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0
atomics.
(libat_exchange_16): Merge RELEASE and ACQ_REL/SEQ_CST cases.
* config/linux/aarch64/host-config.h: Use atomic_16.S for baseline
v8.0.
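
To illustrate the "implicit store" discussed in this thread, here is a minimal
sketch of an atomic load built from a compare-and-swap. It is not the code in
atomic_16.S: it uses a 64-bit value and the GCC/Clang __atomic builtins so that
it builds without libatomic, and the name load_via_cas is hypothetical. The
128-bit case follows the same idea with a wider type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read *p atomically using only compare-and-swap.
   The pointer is deliberately non-const, because the operation may
   store to *p, so the memory must be writable. */
uint64_t load_via_cas(uint64_t *p)
{
    /* Atomic accesses require a naturally aligned address. */
    assert(((uintptr_t)p % sizeof *p) == 0);

    uint64_t expected = 0;
    /* If *p == 0, the CAS succeeds and stores 0 again (no visible change).
       Otherwise it fails and copies the current value into expected.
       Either way, expected ends up holding the value of *p. */
    __atomic_compare_exchange_n(p, &expected, 0, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Calling load_via_cas on a writable variable returns its current value and
leaves it unchanged. Whether the same call on read-only memory faults depends
on the target and on the instruction sequence used, which is the point under
dispute in the earlier comments.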

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-04 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #15 from Xi Ruoyao  ---
(In reply to Wilco from comment #14)
> (In reply to Wilco from comment #13)
> > (In reply to Xi Ruoyao from comment #12)
> > > (In reply to Wilco from comment #11)
> > > 
> > > > > Then the compiler (and the standard) is not what they consider.  Such
> > > > > misunderstandings are everywhere and this has no difference.
> > > > 
> > > > Where is int128 in "the standard"?
> > > 
> > > Consider this:
> > > 
> > > const _Atomic long double x = 0.1;
> > > 
> > > int main()
> > > {
> > >   double y = x;
> > >   return y != 0.1;
> > > }
> > > 
> > > If CAS is used here, the program will just segfault.  Does the standard say
> > > this is ill-formed or not?
> > 
> > I'd say this is ill formed yes. And it will crash on Atom laptops.
> 
> Correction - it crashes on all AMD cpus too. Are you going to file
> bugreports for this?

PR95722.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-02 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #14 from Wilco  ---
(In reply to Wilco from comment #13)
> (In reply to Xi Ruoyao from comment #12)
> > (In reply to Wilco from comment #11)
> > 
> > > > Then the compiler (and the standard) is not what they consider.  Such
> > > > misunderstandings are everywhere and this has no difference.
> > > 
> > > Where is int128 in "the standard"?
> > 
> > Consider this:
> > 
> > const _Atomic long double x = 0.1;
> > 
> > int main()
> > {
> > double y = x;
> > return y != 0.1;
> > }
> > 
> > If CAS is used here, the program will just segfault.  Does the standard say
> > this is ill-formed or not?
> 
> I'd say this is ill formed yes. And it will crash on Atom laptops.

Correction - it crashes on all AMD cpus too. Are you going to file bugreports
for this?

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-02 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #13 from Wilco  ---
(In reply to Xi Ruoyao from comment #12)
> (In reply to Wilco from comment #11)
> 
> > > Then the compiler (and the standard) is not what they consider.  Such
> > > misunderstandings are everywhere and this has no difference.
> > 
> > Where is int128 in "the standard"?
> 
> Consider this:
> 
> const _Atomic long double x = 0.1;
> 
> int main()
> {
>   double y = x;
>   return y != 0.1;
> }
> 
> If CAS is used here, the program will just segfault.  Does the standard say
> this is ill-formed or not?

I'd say this is ill formed yes. And it will crash on Atom laptops.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-02 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #12 from Xi Ruoyao  ---
(In reply to Wilco from comment #11)

> > Then the compiler (and the standard) is not what they consider.  Such
> > misunderstandings are everywhere and this has no difference.
> 
> Where is int128 in "the standard"?

Consider this:

const _Atomic long double x = 0.1;

int main()
{
double y = x;
return y != 0.1;
}

If CAS is used here, the program will just segfault.  Does the standard say
this is ill-formed or not?

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-02 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #11 from Wilco  ---
(In reply to Xi Ruoyao from comment #10)
> (In reply to Wilco from comment #9)
> > (In reply to Xi Ruoyao from comment #8)
> > > (In reply to Wilco from comment #7)
> > > > I don't see the issue you have here. GCC for x86/x86_64 has been using
> > > > compare exchange for atomic load (which always does a write even if the
> > > > compare fails) for many years.
> > > 
> > > No we don't, since r7-6454.
> > 
> > Incorrect - libatomic still uses cmpxchg16b depending on the CPU.
> 
> You are incorrect.  It checks cmpxchg16b bit in CPUID but does not use the
> cmpxchg16b instruction.

No, it will use the cmpxchg16b instruction in the other ifunc when AVX is not
available. Libatomic will fall back to locking atomics if neither AVX nor
cmpxchg16b is available (first few generations of x86_64).

> The reason to check cmpxchg16b is both Intel and AMD guarantee that if both
> cmpxchg16b and AVX are available, then an aligned 16-byte load with vmovdqa
> is atomic.  So we can use vmovdqa to do a lock-free load then.  But using
> cmpxchg16b for a load is still wrong, and libatomic does NOT use it.
> 
> > > > The question is, do you believe compilers should provide users with fast and
> > > > efficient atomics they need? Or do you want to force every application to
> > > > implement their own version of 128-bit atomics?
> > > 
> > > But a compiler must generate correct code first.  They can use the wonderful
> > > inline assembly because they know CAS is safe in their case, but the
> > > compiler does not know.
> > 
> > Many developers consider locking atomics fundamentally incorrect. If we emit
> > lock-free atomics they don't need to write inline assembler.
> 
> Then the compiler (and the standard) is not what they consider.  Such
> misunderstandings are everywhere and this has no difference.

Where is int128 in "the standard"?
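
The selection order described in this comment can be sketched as a small
decision function. This is only an illustration of the behaviour as comment
#11 states it (comment #10 disputes that cmpxchg16b is ever used for loads).
It is not libatomic's actual ifunc resolver, and the enum and function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for how a 16-byte atomic load could be implemented. */
enum load16_strategy {
    LOAD16_VMOVDQA,     /* aligned vector load, atomic when AVX + cmpxchg16b */
    LOAD16_CMPXCHG16B,  /* load through a compare-exchange */
    LOAD16_LOCKING      /* fall back to a lock */
};

/* Hypothetical stand-in for an ifunc resolver: pick a strategy from two
   CPU feature flags, following the order given in comment #11. */
enum load16_strategy choose_load16(bool has_avx, bool has_cmpxchg16b)
{
    if (has_avx && has_cmpxchg16b)
        return LOAD16_VMOVDQA;
    if (has_cmpxchg16b)
        return LOAD16_CMPXCHG16B;
    return LOAD16_LOCKING;
}

/* Usage example: the three cases named in the comment. */
void check_load16_dispatch(void)
{
    assert(choose_load16(true, true) == LOAD16_VMOVDQA);
    assert(choose_load16(false, true) == LOAD16_CMPXCHG16B);
    assert(choose_load16(false, false) == LOAD16_LOCKING);
}
```

The AVX-without-cmpxchg16b combination is treated as locking here, which is an
assumption; the thread does not discuss that case.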

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-02 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #10 from Xi Ruoyao  ---
(In reply to Wilco from comment #9)
> (In reply to Xi Ruoyao from comment #8)
> > (In reply to Wilco from comment #7)
> > > I don't see the issue you have here. GCC for x86/x86_64 has been using
> > > compare exchange for atomic load (which always does a write even if the
> > > compare fails) for many years.
> > 
> > No we don't, since r7-6454.
> 
> Incorrect - libatomic still uses cmpxchg16b depending on the CPU.

You are incorrect.  It checks cmpxchg16b bit in CPUID but does not use the
cmpxchg16b instruction.

The reason to check cmpxchg16b is both Intel and AMD guarantee that if both
cmpxchg16b and AVX are available, then an aligned 16-byte load with vmovdqa is
atomic.  So we can use vmovdqa to do a lock-free load then.  But using
cmpxchg16b for a load is still wrong, and libatomic does NOT use it.

> > > The question is, do you believe compilers should provide users with fast and
> > > efficient atomics they need? Or do you want to force every application to
> > > implement their own version of 128-bit atomics?
> > 
> > But a compiler must generate correct code first.  They can use the wonderful
> > inline assembly because they know CAS is safe in their case, but the
> > compiler does not know.
> 
> Many developers consider locking atomics fundamentally incorrect. If we emit
> lock-free atomics they don't need to write inline assembler.

Then the compiler (and the standard) is not what they consider.  Such
misunderstandings are everywhere and this has no difference.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-02 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #9 from Wilco  ---
(In reply to Xi Ruoyao from comment #8)
> (In reply to Wilco from comment #7)
> > I don't see the issue you have here. GCC for x86/x86_64 has been using
> > compare exchange for atomic load (which always does a write even if the
> > compare fails) for many years.
> 
> No we don't, since r7-6454.

Incorrect - libatomic still uses cmpxchg16b depending on the CPU.

> > The question is, do you believe compilers should provide users with fast and
> > efficient atomics they need? Or do you want to force every application to
> > implement their own version of 128-bit atomics?
> 
> But a compiler must generate correct code first.  They can use the wonderful
> inline assembly because they know CAS is safe in their case, but the
> compiler does not know.

Many developers consider locking atomics fundamentally incorrect. If we emit
lock-free atomics they don't need to write inline assembler.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-06-02 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #8 from Xi Ruoyao  ---
(In reply to Wilco from comment #7)
> I don't see the issue you have here. GCC for x86/x86_64 has been using
> compare exchange for atomic load (which always does a write even if the
> compare fails) for many years.

No we don't, since r7-6454.

> The question is, do you believe compilers should provide users with fast and
> efficient atomics they need? Or do you want to force every application to
> implement their own version of 128-bit atomics?

But a compiler must generate correct code first.  They can use the wonderful
inline assembly because they know CAS is safe in their case, but the compiler
does not know.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-05-31 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Wilco  changed:

   What|Removed |Added

 Resolution|DUPLICATE   |---
 Status|RESOLVED|NEW

--- Comment #7 from Wilco  ---
I don't see the issue you have here. GCC for x86/x86_64 has been using compare
exchange for atomic load (which always does a write even if the compare fails)
for many years. LLVM does the same for AArch64/x86/x86_64.

If you believe this is incorrect/invalid, do you have any evidence this causes
crashes in real applications?

As a result of GCC's bad choice of using locking atomics on AArch64, many
applications are forced to implement 128-bit atomics themselves using hacky
inline assembler. Just one example for reference:

https://github.com/boostorg/atomic/blob/08bd4e20338c503d2acfdddfdaa8f5e0bcf9006c/include/boost/atomic/detail/core_arch_ops_gcc_aarch64.hpp#L1635

The question is, do you believe compilers should provide users with fast and
efficient atomics they need? Or do you want to force every application to
implement their own version of 128-bit atomics?

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-05-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #6 from Andrew Pinski  ---
Bug 70814 comment #3 explains why this should not be done. As I mentioned, it
can perhaps be improved for Armv8.4-A using an ifunc, but it cannot be done for
earlier architecture versions.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-05-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #5 from Andrew Pinski  ---
Still a dup. Reopen the old one instead if you think gcc should be broken for
loads.

*** This bug has been marked as a duplicate of bug 70814 ***

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-05-31 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Wilco  changed:

   What|Removed |Added

   Last reconfirmed||2023-05-31
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
   Assignee|unassigned at gcc dot gnu.org  |wilco at gcc dot gnu.org
 Status|RESOLVED|NEW
 Resolution|DUPLICATE   |---
 Ever confirmed|0   |1

--- Comment #4 from Wilco  ---
Reopened. Please don't close bugs without allowing for discussion first. I'll
send a patch soon that shows it's possible and valid.

And if there is a better solution that results in the same benefits (fast
lock-free atomics, allowing inlining and use of latest instructions without ABI
issues) then I would love to hear ideas and suggestions.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-05-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Andrew Pinski  ---
Dup of bug 70814. Basically, 128-bit atomics CANNOT be lock-free with ARMv8-A.

*** This bug has been marked as a duplicate of bug 70814 ***

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-05-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #2 from Andrew Pinski  ---
LLVM making wrong choices does not mean GCC has to follow them.

[Bug target/110061] libatomic: 128-bit atomics should be lock-free on AArch64

2023-05-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110061

--- Comment #1 from Andrew Pinski  ---
But it is not a valid thing to do.