[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #11 from Wilco  ---
(In reply to Niall Douglas from comment #10)
> (In reply to Jakub Jelinek from comment #9)
> > (In reply to Wilco from comment #8)
> > > Yes that sounds like a reasonable approach.
> > 
> > I don't think so.  Not all variables on which the __atomic_* intrinsics are
> > used are actually _Atomic; the variables can be embedded in const
> > aggregates, etc.
> 
> I'd have the attribute propagate to enclosing types, like over-alignment.

Yes, a structure with a 128-bit atomic type in a subfield/union would be forced
into writable data.

And arbitrary casts (e.g. from char* to an atomic type) wouldn't work, due to
atomics requiring strict alignment. A 128-bit atomic type might have a higher
alignment than a 128-bit integer, so even casting that seems questionable.
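
A quick way to check this concern on a given target (an illustrative snippet,
not from the thread) is to compare the alignment GCC gives the plain and the
atomic 128-bit types:

#include <stdio.h>

int
main (void)
{
  /* If these differ, reinterpreting a plain 128-bit integer (or a char
     buffer) as the atomic type is unsafe even when the sizes match.  */
  printf ("_Alignof(__int128)         = %zu\n", _Alignof (__int128));
  printf ("_Alignof(_Atomic __int128) = %zu\n", _Alignof (_Atomic __int128));
  return 0;
}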

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #10 from Niall Douglas  ---
(In reply to Jakub Jelinek from comment #9)
> (In reply to Wilco from comment #8)
> > Yes that sounds like a reasonable approach.
> 
> I don't think so.  Not all variables on which the __atomic_* intrinsics are
> used are actually _Atomic; the variables can be embedded in const aggregates,
> etc.

I'd have the attribute propagate to enclosing types, like over-alignment.

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #9 from Jakub Jelinek  ---
(In reply to Wilco from comment #8)
> Yes that sounds like a reasonable approach.

I don't think so.  Not all variables on which the __atomic_* intrinsics are used
are actually _Atomic; the variables can be embedded in const aggregates, etc.
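
A small illustration of that situation (made up here, not taken from the PR):
the intrinsics are routinely applied to objects that are never declared
_Atomic, and such objects can legitimately end up in read-only sections:

/* No _Atomic anywhere: __atomic_* is used on a member of a const
   aggregate, which the compiler is free to place in .rodata.  */
struct desc
{
  unsigned __int128 id;
  int flags;
};

static const struct desc table[2] = { { 1, 0 }, { 2, 0 } };

unsigned __int128
first_id (void)
{
  return __atomic_load_n (&table[0].id, __ATOMIC_ACQUIRE);
}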

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #8 from Wilco  ---
(In reply to Niall Douglas from comment #7)
> (In reply to Andrew Pinski from comment #4)
> > (In reply to Niall Douglas from comment #3) 
> > > You may be interested in reading https://reviews.llvm.org/D110069. It
> > > proposed having LLVM generate a 128-bit AArch64 CAS for atomics. LLVM
> > > merged that change; it will be in the next release.
> > 
> > Using CAS for an atomic load is not a valid thing to do, because atomic
> > loads from constant rodata need to work.
> > LLVM breaks this case as they don't care about it; GCC does, though.
> 
> I've heard that argument before, and I've always wondered why 128-bit _Atomic
> types couldn't have an attribute which applies the section attribute to their
> static const variable incarnations to force them into r/w memory. That would
> also solve the LLVM issue. Said attribute is not without uses in general,
> actually: it would help avoid having to mess with mprotect to apply
> copy-on-write permissions to regions of .rodata when you need to modify
> static const variable values.
>
> I don't think that the standard *guarantees* that static const variables go
> into read-only memory, and besides, before C23 128-bit integers weren't
> supported anyway, so one could argue that as a proprietary extension
> (__int128) you get proprietary special-casing.

Yes, that sounds like a reasonable approach. There will be language lawyers who
say it must also work on mmap'd memory after mprotect, of course, but that
seems even more unlikely in the real world...

I believe that the vast majority of developers just want 128-bit atomics to
work efficiently without locks when possible.

Currently, various packages are forced to implement 128-bit atomics using
inline assembler - and that seems a much worse hack than supporting lock-free
atomics in the compiler.
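
For illustration, that inline-assembler workaround typically looks something
like this on x86-64 (a sketch, not code from any particular package; it assumes
CMPXCHG16B is available and that the object is 16-byte aligned, and the
function name is invented here):

#include <stdbool.h>
#include <stdint.h>

typedef struct __attribute__ ((aligned (16))) { uint64_t lo, hi; } u128;

/* 16-byte compare-and-swap via LOCK CMPXCHG16B.  On failure the current
   value is written back into *expected (through the rax/rdx outputs),
   mirroring __atomic_compare_exchange.  */
static inline bool
cas128 (volatile u128 *ptr, u128 *expected, u128 desired)
{
  bool ok;
  __asm__ __volatile__ ("lock cmpxchg16b %1"
                        : "=@ccz" (ok), "+m" (*ptr),
                          "+a" (expected->lo), "+d" (expected->hi)
                        : "b" (desired.lo), "c" (desired.hi)
                        : "memory");
  return ok;
}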

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #7 from Niall Douglas  ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Niall Douglas from comment #3) 
> > You may be interested in reading https://reviews.llvm.org/D110069. It
> > proposed having LLVM generate a 128-bit AArch64 CAS for atomics. LLVM merged
> > that change; it will be in the next release.
> 
> Using CAS for an atomic load is not a valid thing to do, because atomic loads
> from constant rodata need to work.
> LLVM breaks this case as they don't care about it; GCC does, though.

I've heard that argument before, and I've always wondered why 128-bit _Atomic
types couldn't have an attribute which applies the section attribute to their
static const variable incarnations to force them into r/w memory. That would
also solve the LLVM issue. Said attribute is not without uses in general,
actually: it would help avoid having to mess with mprotect to apply
copy-on-write permissions to regions of .rodata when you need to modify static
const variable values.

I don't think that the standard *guarantees* that static const variables go
into read-only memory, and besides, before C23 128-bit integers weren't
supported anyway, so one could argue that as a proprietary extension (__int128)
you get proprietary special-casing.

Niall
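
The mprotect juggling referred to above looks roughly like this (an
illustrative POSIX-only sketch; the helper name is invented here):

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Make the page containing 'p' read/write so a nominally read-only object
   can be modified in place.  Returns 0 on success, -1 on failure.  */
static int
make_page_writable (const void *p)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  uintptr_t page = (uintptr_t) p & ~((uintptr_t) pagesize - 1);
  return mprotect ((void *) page, (size_t) pagesize, PROT_READ | PROT_WRITE);
}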

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #6 from Andrew Pinski  ---
(In reply to Wilco from comment #5)
> To me a far worse issue is that this difference for 128-bit atomics means
> that LLVM and GCC are binary incompatible. AFAIK it isn't an option to make
> them compatible either (on AArch64, GCC 13 will use a compatible sequence
> only if LSE2 is available).

Right, and that would mean LLVM is broken for valid C code, and that it is LLVM
that breaks binary compatibility with GCC rather than the other way around.
This is not the first time LLVM has chosen to break things with respect to
binary compatibility with GCC (x86_64 argument passing, for example).

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread wilco at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #5 from Wilco  ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Niall Douglas from comment #3) 
> > You may be interested in reading https://reviews.llvm.org/D110069. It
> > proposed having LLVM generate a 128-bit AArch64 CAS for atomics. LLVM merged
> > that change; it will be in the next release.
> 
> Using CAS for an atomic load is not a valid thing to do, because atomic loads
> from constant rodata need to work.
> LLVM breaks this case as they don't care about it; GCC does, though.

The question is how useful this is in reality: if memory is not writeable, you
can use atomic loads but no other atomic accesses.

We could be pragmatic and say that using 128-bit atomic loads from
non-writeable memory is a user error, just like unaligned atomic accesses.

To me a far worse issue is that this difference for 128-bit atomics means that
LLVM and GCC are binary incompatible. AFAIK it isn't an option to make them
compatible either (on AArch64, GCC 13 will use a compatible sequence only if
LSE2 is available).

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #4 from Andrew Pinski  ---
(In reply to Niall Douglas from comment #3) 
> You may be interested in reading https://reviews.llvm.org/D110069. It proposed
> having LLVM generate a 128-bit AArch64 CAS for atomics. LLVM merged that
> change; it will be in the next release.

Using CAS for an atomic load is not a valid thing to do, because atomic loads
from constant rodata need to work.
LLVM breaks this case as they don't care about it; GCC does, though.
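
A minimal example of the case in question (a reconstruction, not taken from the
PR): the load below is valid, but if it were lowered to a compare-and-swap it
would perform a store and could fault once the object is placed in .rodata:

/* A const object that the compiler may well place in .rodata.  */
static const __int128 config = 42;

__int128
read_config (void)
{
  /* Legal: __atomic_load_n only reads.  A CAS-based lowering would store
     to 'config' and trap on a read-only mapping.  */
  return __atomic_load_n (&config, __ATOMIC_SEQ_CST);
}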

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread s_gccbugzilla at nedprod dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

--- Comment #3 from Niall Douglas  ---
> AMD has guaranteed it, but there are still VIA and Zhaoxin; while we have
> some statement from the latter, I'm not sure it is enough, and we don't have
> anything from VIA.  See PR104688 for details.

I'm wondering whether a compiler opt-out flag like -no-msseatomic16, to turn
off the use of SSE for 128-bit atomics, wouldn't be an idea. Given the small
market share of those CPU vendors, it seems a shame to hold up the
implementation.

(Also, if you do turn it on by default and advertise that widely, I suspect
those vendors will hurry up with their documentation.)

> FWIW, the GCC codegen for aarch64 is at https://godbolt.org/z/qvx9484nY (arm 
> and aarch64 are different targets). It emits a call to libatomic, which for 
> GCC 13 will use a lockless implementation when possible at runtime, see 
> g:d1288d850944f69a795e4ff444a427eba3fec11b

Thanks for the catch, my mistake. It would seem the codegen is similarly
inferior to the codegen from clang for both aarch64 and x64.

You may be interested in reading https://reviews.llvm.org/D110069. It proposed
having LLVM generate a 128-bit AArch64 CAS for atomics. LLVM merged that
change; it will be in the next release.

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Niall Douglas from comment #0)
> Related:
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
> 
> I got bitten by this again: the latest GCC still does not emit
> single-instruction 128-bit atomics, even when the -march is easily new enough.
> Here is a godbolt comparing the latest MSVC, GCC and clang for the
> skylake-avx512 architecture, which unquestionably supports cmpxchg16b. Only
> clang emits the single-instruction atomic:
> 
> https://godbolt.org/z/EnbeeW4az
> 
> I'm gathering from the issue comments and from the comments at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 that you're going to
> wait for AMD to guarantee atomicity of SSE instructions before changing the
> codegen here, which makes sense. However, I also wanted to raise the
> potentially suboptimal 128-bit atomic codegen by GCC for AArch64 as compared
> to clang:
> 
> https://godbolt.org/z/oKv4o81nv
> 
> GCC emits `dmb` to force a global memory fence, whereas clang does not.
> 
> I think clang is in the right here; seq_cst atomic semantics are not supposed
> to require a global memory fence.

FWIW, the GCC codegen for aarch64 is at https://godbolt.org/z/qvx9484nY (arm
and aarch64 are different targets). It emits a call to libatomic, which for GCC
13 will use a lockless implementation when possible at runtime; see
g:d1288d850944f69a795e4ff444a427eba3fec11b
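
For reference, the kind of test case being compared is roughly the following (a
stand-in; the exact source behind the godbolt links is not reproduced in this
thread):

/* 16-byte seq_cst load.  Without LSE2 at compile time, GCC emits a call
   into libatomic (__atomic_load_16); from GCC 13, libatomic selects a
   lock-free sequence at run time when the hardware allows it.  */
__int128
load128 (const __int128 *p)
{
  return __atomic_load_n (p, __ATOMIC_SEQ_CST);
}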

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
(In reply to Niall Douglas from comment #0)
> I'm gathering from the issue comments and from the comments at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 that you're going to
> wait for AMD to guarantee atomicity of SSE instructions before changing the
> codegen here, which makes sense.

AMD has guaranteed it, but there are still VIA and Zhaoxin; while we have some
statement from the latter, I'm not sure it is enough, and we don't have
anything from VIA.  See PR104688 for details.
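
The codegen that would become possible with such a guarantee is along these
lines (illustrative only; it presumes the vendor guarantee under discussion
that aligned 16-byte SSE/AVX loads are atomic):

#include <immintrin.h>

/* With the guarantee in place, a single aligned MOVDQA/VMOVDQA could serve
   as a relaxed 128-bit atomic load; 'p' must be 16-byte aligned.  */
static inline __m128i
load128_relaxed (const __m128i *p)
{
  return _mm_load_si128 (p);
}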