[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #6 from Andrew Pinski ---
Dup of bug 66867.

*** This bug has been marked as a duplicate of bug 66867 ***
[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #5 from ktkachov at gcc dot gnu.org ---
I've looked at RTL dse a bit, and the reason it doesn't remove the store is
that the MEM rtx used in the atomic instruction pattern is volatile and also
has the alias set ALIAS_SET_MEMORY_BARRIER associated with it. When the dse
pass sees either of those, it inserts a "wild read" into its calculations,
indicating that a memory read from potentially any location happened, so the
stack store is potentially not dead and can't be eliminated.

I've confirmed this by hacking get_builtin_sync_mem in builtins.c (which
creates that mem rtx) so that it neither sets MEM_VOLATILE nor uses the
ALIAS_SET_MEMORY_BARRIER alias set. With those changes I see the stack store
being eliminated by dse.

So the same information that's used to prevent the compiler from moving
memory instructions across these atomic operations also prevents it from
eliminating the preceding stack store.
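[Editorial note: to make the wild-read effect concrete, here is a minimal
standalone sketch, not taken from the report; the function names are
illustrative, and compiling with something like gcc -O2 -S shows the
difference between the two functions.]

#include <stdbool.h>

/* No address taken: 'expected' can live in a register, so there is no
   stack store for dse to consider in the first place.  */
int plain_cmpxchg(int *p)
{
    int expected = 23;
    if (*p == expected)
        *p = 42;
    return expected;
}

/* The builtin takes &expected, so 23 must first be stored into a stack
   slot.  That slot is dead after the call, but the atomic's MEM is
   volatile and carries ALIAS_SET_MEMORY_BARRIER, so dse conservatively
   assumes the operation may read any location and keeps the store.  */
bool atomic_cmpxchg(int *p)
{
    int expected = 23;
    return __atomic_compare_exchange_n(p, &expected, 42, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
}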
[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

ghalliday at hpccsystems dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ghalliday at hpccsystems dot com

--- Comment #4 from ghalliday at hpccsystems dot com ---
I have also hit this problem on x86 and Power8, and am adding a comment about
its significance. Although it seems a minor bug, it can have a very
significant effect on performance.

I have code which uses __sync_bool_compare_and_swap() to implement a
lock-free linked list (inside a memory manager). Replacing it with
__atomic_compare_exchange_n() should allow better performance, by avoiding
reloading the expected value (and by selecting a less restrictive memory
order); a sketch of both patterns follows. However, on one example test
query, the new code using __atomic_compare_exchange_n is over 40% *slower* on
x86. It is also slower on Power8, despite using a lwsync instead of a sync in
the generated code.
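[Editorial note: the exact list code isn't in the report; the following is a
minimal sketch of the two patterns being compared, using a hypothetical
lock-free stack push.]

#include <stdbool.h>

typedef struct node { struct node *next; } node_t;

/* Old pattern: __sync_bool_compare_and_swap implies a full barrier, and
   the expected value must be explicitly reloaded from *head on every
   failed attempt.  */
void push_sync(node_t **head, node_t *n)
{
    node_t *old;
    do {
        old = *head;            /* explicit reload each iteration */
        n->next = old;
    } while (!__sync_bool_compare_and_swap(head, old, n));
}

/* New pattern: on failure __atomic_compare_exchange_n writes the value
   it observed back into 'old', so no explicit reload is needed, and
   weaker memory orders can be requested.  */
void push_atomic(node_t **head, node_t *n)
{
    node_t *old = __atomic_load_n(head, __ATOMIC_RELAXED);
    do {
        n->next = old;
    } while (!__atomic_compare_exchange_n(head, &old, n, true,
                                          __ATOMIC_RELEASE,
                                          __ATOMIC_RELAXED));
}

Note that taking &old in the second version is exactly what forces the
expected value into a stack slot, and the redundant store to that slot is
the subject of this bug.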
[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Bill Schmidt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64, aarch64             |x86_64, aarch64, powerpc64*

--- Comment #3 from Bill Schmidt ---
Also on powerpc64:

        .file   "gorp.c"
        .abiversion 2
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl test_atomic_cmpxchg
        .type   test_atomic_cmpxchg, @function
test_atomic_cmpxchg:
        li 9,23
        stw 9,-16(1)    <-- unnecessary
        sync
.L2:
        lwarx 9,0,3
        cmpwi 0,9,23
        bne 0,.L3
        li 9,42
        stwcx. 9,0,3
        bne 0,.L2
        isync
.L3:
        blr
[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Ramana Radhakrishnan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64, aarch64
                 CC|                            |ramana at gcc dot gnu.org
          Component|target                      |rtl-optimization

--- Comment #1 from Ramana Radhakrishnan ---
There is an unnecessary store to the stack regardless of the architecture. I
suspect that's just because of a combination of the specification of the
intrinsic and DSE being unable to remove such stores.

For e.g. on aarch64 with:

#include <stdbool.h>

#define __always_inline inline __attribute__((always_inline))

typedef struct {
    int counter;
} atomic_t;

static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
    int cur = old;

    if (__atomic_compare_exchange_n(&v->counter, &cur, new, false,
                                    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        return cur;

    return cur;
}

void test_atomic_cmpxchg(atomic_t *counter)
{
    atomic_cmpxchg(counter, 23, 42);
}

we get:

        sub     sp, sp, #16
        mov     w1, 23
        mov     w2, 42
        str     w1, [sp, 12]    ---> unneeded
.L3:
        ldaxr   w3, [x0]
        cmp     w3, w1
        bne     .L4
        stlxr   w4, w2, [x0]
        cbnz    w4, .L3
.L4:
        add     sp, sp, 16
        ret
[Bug rtl-optimization/70825] x86_64: __atomic_compare_exchange_n() accesses stack unnecessarily
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70825

Ramana Radhakrishnan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-04-28
     Ever confirmed|0                           |1

--- Comment #2 from Ramana Radhakrishnan ---
Confirmed then.