[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2023-05-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #13 from Andrew Pinski  ---
*** Bug 110061 has been marked as a duplicate of this bug. ***

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2023-05-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

Andrew Pinski  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #12 from Andrew Pinski  ---
*** Bug 110061 has been marked as a duplicate of this bug. ***

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-29 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #11 from Andrew Pinski  ---
(In reply to Richard Earnshaw from comment #10)
> (In reply to Andrew Pinski from comment #9)
> > On x86, they use ifuncs for this purpose inside libatomic. Basically the
> > requirement is only one libatomic can be used at a time.
> 
> If we can guarantee that, then yes, we could do something similar if we had
> fully atomic 128-bit loads.

You have to guarantee that anyways because of the way libatomic is implemented.
 That is the lock used by the implementations has to be the same across all
uses of them.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-29 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #10 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #9)
> On x86, they use ifuncs for this purpose inside libatomic. Basically the
> requirement is only one libatomic can be used at a time.

If we can guarantee that, then yes, we could do something similar if we had
fully atomic 128-bit loads.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #9 from Andrew Pinski  ---
(In reply to Richard Earnshaw from comment #8)
> (In reply to Andrew Pinski from comment #7)
> > Closing as invalid.  Though for v8.4 or so it would be nice if there was
> > 128bit atomic loads.
> 
> That probably wouldn't help.  It would require a new ABI before you could
> use them, since all code has to swap to that mechanism.  Iff you could
> guarantee that an application had exactly one instance of
> __atomic_{load,store}_16 (ie that it isn't a private function in each DSO
> component), then you *might* be able to re-implement those routines using
> the new instructions, but otherwise you're stuck with the requirement to
> retain backwards compatibility.

On x86, they use ifuncs for this purpose inside libatomic. Basically the
requirement is only one libatomic can be used at a time.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #8 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #7)
> Closing as invalid.  Though for v8.4 or so it would be nice if there was
> 128bit atomic loads.

That probably wouldn't help.  It would require a new ABI before you could use
them, since all code has to swap to that mechanism.  Iff you could guarantee
that an application had exactly one instance of __atomic_{load,store}_16 (ie
that it isn't a private function in each DSO component), then you *might* be
able to re-implement those routines using the new instructions, but otherwise
you're stuck with the requirement to retain backwards compatibility.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #6 from Andrew Pinski  ---
(In reply to Richard Earnshaw from comment #5)
> I don't think so.  It's to do with whether the CPU is permitted to 'crack'
> the operation into multiple micro-ops that proceed independently down the
> pipeline.  LDAXP could still be cracked in this way provided that from that
> CPU's perspective other memory operations from that core were not re-ordered
> with respect to that operation in an incompatible manner.  However, external
> memory operations could still cause the separate micro-ops to see different
> values.

Makes sense (and I Know of one micro arch which will doing that for casp).

So closing as invalid.

Note the locks used in libatomic are no where near fair and can cause sometimes
not progressing on one thread (but that is a separate bug and is recorded as
bug 59305).

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #7 from Andrew Pinski  ---
Closing as invalid.  Though for v8.4 or so it would be nice if there was 128bit
atomic loads.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #5 from Richard Earnshaw  ---
I don't think so.  It's to do with whether the CPU is permitted to 'crack' the
operation into multiple micro-ops that proceed independently down the pipeline.
 LDAXP could still be cracked in this way provided that from that CPU's
perspective other memory operations from that core were not re-ordered with
respect to that operation in an incompatible manner.  However, external memory
operations could still cause the separate micro-ops to see different values.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #4 from Yichao Yu  ---
Thanks for the explanation. I didn't realize that the load is the problem. Just
curious (since I somehow can't find documentation about it), would `ldaxp`
provide the right semantics without the corresponding store?

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-06-28 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #3 from Richard Earnshaw  ---
Atomic loads and stores have to use the same policy for a single object, you
can't have loads using one model and stores another.

Atomic loads have to work even if the object is marked const.

C/C++ permit a non-const object pointer to be cast to a const object pointer. 
That means that any mechanism used for atomic loads has to work with const
objects, in particular with objects that are immutable (would trap if
modification were attempted).

The v8 ARM ARM says that the way to perform a 128-bit atomic read using ldxp is
to read the value and then attempt to store it back using a stxp; if the store
succeeds, then the read was atomic, otherwise it was not and must be retried. 
This model doesn't work if the object was immutable, since the store will
fault.

The consequence of this is that ldxp cannot be used for atomic 128-bit reads. 
And the consequence of that is that stxp cannot be used for 128-bit stores. 
Ergo, the only way to perform 128-bit atomic accesses is by using locks.  So I
think the existing code is correct.

I'll leave this open for the moment for further discussion, but I'm minded to
reject as INVALID.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-04-27 Thread yyc1992 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

--- Comment #2 from Yichao Yu  ---
As an update, it seems that the llvm trunk recently switched to using
`__atomic_*` function call for int128 on all platforms (that I can test
anyway). I'm not sure how that decision is made or if there's any communication
with GCC but IMHO it would be nice to agree on using the atomic instruction on
aarch64 as long as it provides all the necessary ones.

[Bug target/70814] atomic store of __int128 is not lock free on aarch64

2016-04-27 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||aarch64
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-04-27
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
   Severity|normal  |enhancement

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.
For reference, the testcase is:
#include 
#include 

std::atomic<__int128> a;

__attribute__((noinline)) void f(int64_t v)
{
a.store(v);
}

int main()
{
f(1);
return 0;
}

g++ at -O2 for aarch64 emits for the function 'f':
_Z1fl:
.LFB328:
.cfi_startproc
mov x2, x0
adrpx1, .LANCHOR0
add x0, x1, :lo12:.LANCHOR0
mov w4, 5
asr x3, x2, 63
b   __atomic_store_16