[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2024-04-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|UNCONFIRMED |RESOLVED

--- Comment #10 from Andrew Pinski  ---
Dropping __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE is always ok thing to do, it
is only there for a hint anyways. So closing as won't fix as Intel also disable
HLE on almost all cores anyways:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2955f270a84762343000f103e0640d29c7a96f3

[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread andi-gcc at firstfloor dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 
19:06:02 UTC ---

Here's a test case. This requires the libstdc++ HLE patch from

http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00673.html



g++ -std=gnu++0x



#include atomic



#define ACQ memory_order_acquire | __memory_order_hle_acquire

#define REL memory_order_release | __memory_order_hle_release



int main()

{

  using namespace std;

  atomic_ulong au = ATOMIC_VAR_INIT(0);



  if (!au.fetch_and(1, ACQ))

au.fetch_and(-1, REL);



  unsigned lock = 0;

  __atomic_fetch_and(lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE);



  return 0;

}



The first fetch_and generates incorrectly:



.L2:

movq%rax, %rcx

movq%rax, %rdx

andl$1, %ecx

lock; cmpxchgq  %rcx, -24(%rsp)

jne .L2





The second generates correctly:



lock; 

.byte   0xf3

andq$-1, -24(%rsp)





The __atomic_fetch_and generates correctly:



 lock; 

.byte   0xf2

andl$1, -28(%rsp)





The first fetch_and should generate the same code sequence.


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread andi-gcc at firstfloor dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 
19:52:03 UTC ---

Created attachment 29163

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29163

preprocessed testcase


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread andi-gcc at firstfloor dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



Andi Kleen andi-gcc at firstfloor dot org changed:



   What|Removed |Added



  Attachment #29163|0   |1

is obsolete||



--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 
20:15:11 UTC ---

Created attachment 29164

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29164

preprocessed test case



Sorry earlier was the wrong file


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



--- Comment #4 from Uros Bizjak ubizjak at gmail dot com 2013-01-14 21:22:35 
UTC ---

The problem is, that in failed case maybe_emit_op() gets target register to

return the result to, so with after=false, it expands via

optab-mem_fetch_before.



Unfortunately, atomic_fetch_logic or atomic_logic_fetch do not exist for

x86 target, so generic expander falls back to exchange loop.


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



--- Comment #5 from Uros Bizjak ubizjak at gmail dot com 2013-01-14 21:25:13 
UTC ---

Following testcase will expand to a cmpxchg loop:



int hle_and (int *p, int v)

{

  return __atomic_fetch_and_4 (p, v, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE);

}


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread andi-gcc at firstfloor dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



--- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 
22:05:38 UTC ---

Hmm that's true. x86 doesn't have xand, x_or, x_xor, only xadd 

Maybe cmpxchg is the only way?



For some special cases it can be done (like and single bit- btr, or single bit

- btr), but it's probably complicated to implement.



In this case I would prefer to forbid those for HLE.

I guess more arguments for the target hook.



Other ideas?


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread amacleod at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



Andrew Macleod amacleod at redhat dot com changed:



   What|Removed |Added



 CC||amacleod at redhat dot com



--- Comment #7 from Andrew Macleod amacleod at redhat dot com 2013-01-14 
22:22:58 UTC ---

What do you mean 'forbid' it?



Isn't the HLE bit suppose to be a hint anyway?  Saying if the hardware supports

it, use it on this atomic instruction...  Its not incorrect to ignore it, and

some targets simply don't support it. so its ignored.  



I would expect that when we have to drop back to a compare-exchange loop we're

no longer going to really care?


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread andi-gcc at firstfloor dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



--- Comment #8 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 
22:32:06 UTC ---

forbid = give warning and drop bit



It's a hint, but in a good implementation it should not be silently dropped or

code generated that has no chance to elide. It's a quality of implementation

issue.



IMHO there should be at least a warning for this case, like we do for other

cases where elision doesn't work.


[Bug target/55966] __atomic_fetch_* generate wrong code for HLE

2013-01-14 Thread andi-gcc at firstfloor dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966



--- Comment #9 from Andi Kleen andi-gcc at firstfloor dot org 2013-01-14 
22:34:16 UTC ---

Also i need to look more closely, but most likely the C++ atomic code should be

changed to avoid this situation. This would give much better code on x86 in any

case even without elision.