[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #13 from Andrew Pinski --- *** Bug 110061 has been marked as a duplicate of this bug. ***
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 Andrew Pinski changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #12 from Andrew Pinski --- *** Bug 110061 has been marked as a duplicate of this bug. ***
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #11 from Andrew Pinski --- (In reply to Richard Earnshaw from comment #10) > (In reply to Andrew Pinski from comment #9) > > On x86, they use ifuncs for this purpose inside libatomic. Basically the > > requirement is only one libatomic can be used at a time. > > If we can guarantee that, then yes, we could do something similar if we had > fully atomic 128-bit loads. You have to guarantee that anyways because of the way libatomic is implemented. That is the lock used by the implementations has to be the same across all uses of them.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #10 from Richard Earnshaw --- (In reply to Andrew Pinski from comment #9) > On x86, they use ifuncs for this purpose inside libatomic. Basically the > requirement is only one libatomic can be used at a time. If we can guarantee that, then yes, we could do something similar if we had fully atomic 128-bit loads.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #9 from Andrew Pinski --- (In reply to Richard Earnshaw from comment #8) > (In reply to Andrew Pinski from comment #7) > > Closing as invalid. Though for v8.4 or so it would be nice if there was > > 128bit atomic loads. > > That probably wouldn't help. It would require a new ABI before you could > use them, since all code has to swap to that mechanism. Iff you could > guarantee that an application had exactly one instance of > __atomic_{load,store}_16 (ie that it isn't a private function in each DSO > component), then you *might* be able to re-implement those routines using > the new instructions, but otherwise you're stuck with the requirement to > retain backwards compatibility. On x86, they use ifuncs for this purpose inside libatomic. Basically the requirement is only one libatomic can be used at a time.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #8 from Richard Earnshaw --- (In reply to Andrew Pinski from comment #7) > Closing as invalid. Though for v8.4 or so it would be nice if there was > 128bit atomic loads. That probably wouldn't help. It would require a new ABI before you could use them, since all code has to swap to that mechanism. Iff you could guarantee that an application had exactly one instance of __atomic_{load,store}_16 (ie that it isn't a private function in each DSO component), then you *might* be able to re-implement those routines using the new instructions, but otherwise you're stuck with the requirement to retain backwards compatibility.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #6 from Andrew Pinski --- (In reply to Richard Earnshaw from comment #5) > I don't think so. It's to do with whether the CPU is permitted to 'crack' > the operation into multiple micro-ops that proceed independently down the > pipeline. LDAXP could still be cracked in this way provided that from that > CPU's perspective other memory operations from that core were not re-ordered > with respect to that operation in an incompatible manner. However, external > memory operations could still cause the separate micro-ops to see different > values. Makes sense (and I Know of one micro arch which will doing that for casp). So closing as invalid. Note the locks used in libatomic are no where near fair and can cause sometimes not progressing on one thread (but that is a separate bug and is recorded as bug 59305).
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #7 from Andrew Pinski --- Closing as invalid. Though for v8.4 or so it would be nice if there was 128bit atomic loads.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #5 from Richard Earnshaw --- I don't think so. It's to do with whether the CPU is permitted to 'crack' the operation into multiple micro-ops that proceed independently down the pipeline. LDAXP could still be cracked in this way provided that from that CPU's perspective other memory operations from that core were not re-ordered with respect to that operation in an incompatible manner. However, external memory operations could still cause the separate micro-ops to see different values.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #4 from Yichao Yu --- Thanks for the explanation. I didn't realize that the load is the problem. Just curious (since I somehow can't find documentation about it), would `ldaxp` provide the right semantics without the corresponding store?
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #3 from Richard Earnshaw --- Atomic loads and stores have to use the same policy for a single object, you can't have loads using one model and stores another. Atomic loads have to work even if the object is marked const. C/C++ permit a non-const object pointer to be cast to a const object pointer. That means that any mechanism used for atomic loads has to work with const objects, in particular with objects that are immutable (would trap if modification were attempted). The v8 ARM ARM says that the way to perform a 128-bit atomic read using ldxp is to read the value and then attempt to store it back using a stxp; if the store succeeds, then the read was atomic, otherwise it was not and must be retried. This model doesn't work if the object was immutable, since the store will fault. The consequence of this is that ldxp cannot be used for atomic 128-bit reads. And the consequence of that is that stxp cannot be used for 128-bit stores. Ergo, the only way to perform 128-bit atomic accesses is by using locks. So I think the existing code is correct. I'll leave this open for the moment for further discussion, but I'm minded to reject as INVALID.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 --- Comment #2 from Yichao Yu --- As an update, it seems that the llvm trunk recently switched to using `__atomic_*` function call for int128 on all platforms (that I can test anyway). I'm not sure how that decision is made or if there's any communication with GCC but IMHO it would be nice to agree on using the atomic instruction on aarch64 as long as it provides all the necessary ones.
[Bug target/70814] atomic store of __int128 is not lock free on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70814 ktkachov at gcc dot gnu.org changed: What|Removed |Added Keywords||missed-optimization Target||aarch64 Status|UNCONFIRMED |NEW Last reconfirmed||2016-04-27 CC||ktkachov at gcc dot gnu.org Ever confirmed|0 |1 Severity|normal |enhancement --- Comment #1 from ktkachov at gcc dot gnu.org --- Confirmed. For reference, the testcase is: #include #include std::atomic<__int128> a; __attribute__((noinline)) void f(int64_t v) { a.store(v); } int main() { f(1); return 0; } g++ at -O2 for aarch64 emits for the function 'f': _Z1fl: .LFB328: .cfi_startproc mov x2, x0 adrpx1, .LANCHOR0 add x0, x1, :lo12:.LANCHOR0 mov w4, 5 asr x3, x2, 63 b __atomic_store_16