[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 Andrew Pinski changed: What|Removed |Added Resolution|--- |WONTFIX Status|UNCONFIRMED |RESOLVED --- Comment #10 from Andrew Pinski --- Dropping __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE is always ok thing to do, it is only there for a hint anyways. So closing as won't fix as Intel also disable HLE on almost all cores anyways: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2955f270a84762343000f103e0640d29c7a96f3
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 --- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 19:06:02 UTC --- Here's a test case. This requires the libstdc++ HLE patch from http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00673.html g++ -std=gnu++0x #include atomic #define ACQ memory_order_acquire | __memory_order_hle_acquire #define REL memory_order_release | __memory_order_hle_release int main() { using namespace std; atomic_ulong au = ATOMIC_VAR_INIT(0); if (!au.fetch_and(1, ACQ)) au.fetch_and(-1, REL); unsigned lock = 0; __atomic_fetch_and(lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE); return 0; } The first fetch_and generates incorrectly: .L2: movq%rax, %rcx movq%rax, %rdx andl$1, %ecx lock; cmpxchgq %rcx, -24(%rsp) jne .L2 The second generates correctly: lock; .byte 0xf3 andq$-1, -24(%rsp) The __atomic_fetch_and generates correctly: lock; .byte 0xf2 andl$1, -28(%rsp) The first fetch_and should generate the same code sequence.
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 --- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 19:52:03 UTC --- Created attachment 29163 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29163 preprocessed testcase
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 Andi Kleen andi-gcc at firstfloor dot org changed: What|Removed |Added Attachment #29163|0 |1 is obsolete|| --- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 20:15:11 UTC --- Created attachment 29164 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29164 preprocessed test case Sorry earlier was the wrong file
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 --- Comment #4 from Uros Bizjak ubizjak at gmail dot com 2013-01-14 21:22:35 UTC --- The problem is, that in failed case maybe_emit_op() gets target register to return the result to, so with after=false, it expands via optab-mem_fetch_before. Unfortunately, atomic_fetch_logic or atomic_logic_fetch do not exist for x86 target, so generic expander falls back to exchange loop.
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 --- Comment #5 from Uros Bizjak ubizjak at gmail dot com 2013-01-14 21:25:13 UTC --- Following testcase will expand to a cmpxchg loop: int hle_and (int *p, int v) { return __atomic_fetch_and_4 (p, v, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE); }
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 --- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 22:05:38 UTC --- Hmm that's true. x86 doesn't have xand, x_or, x_xor, only xadd Maybe cmpxchg is the only way? For some special cases it can be done (like and single bit- btr, or single bit - btr), but it's probably complicated to implement. In this case I would prefer to forbid those for HLE. I guess more arguments for the target hook. Other ideas?
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 Andrew Macleod amacleod at redhat dot com changed: What|Removed |Added CC||amacleod at redhat dot com --- Comment #7 from Andrew Macleod amacleod at redhat dot com 2013-01-14 22:22:58 UTC --- What do you mean 'forbid' it? Isn't the HLE bit suppose to be a hint anyway? Saying if the hardware supports it, use it on this atomic instruction... Its not incorrect to ignore it, and some targets simply don't support it. so its ignored. I would expect that when we have to drop back to a compare-exchange loop we're no longer going to really care?
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 --- Comment #8 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 22:32:06 UTC --- forbid = give warning and drop bit It's a hint, but in a good implementation it should not be silently dropped or code generated that has no chance to elide. It's a quality of implementation issue. IMHO there should be at least a warning for this case, like we do for other cases where elision doesn't work.
[Bug target/55966] __atomic_fetch_* generate wrong code for HLE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966 --- Comment #9 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 22:34:16 UTC --- Also i need to look more closely, but most likely the C++ atomic code should be changed to avoid this situation. This would give much better code on x86 in any case even without elision.