[Bug target/103066] __sync_val_compare_and_swap/__sync_bool_compare_and_swap aren't optimized

2021-11-06 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103066

--- Comment #10 from Thiago Macieira  ---
You're right that emitting more code penalises those who have done their job and
written proper code.

The problem we're seeing is that such code appears to be the minority. Or, to
put it differently, the bad code is showing up a lot in our benchmarks,
especially on very big multi-core and multi-socket systems. "Fixing" the
compiler would deliver a broad improvement across the industry -- so long as the
code is recompiled with new compilers. Fixing the actual code would make it
better even when used with old ones.

Does anyone have a suggestion on how to get the best "bang for buck" (biggest
benefit for smallest effort)? This is a sincere question; I'm not trying to be
ironic or sarcastic. How can we help the most, the quickest, with the limited
amount of resources we can marshal?

Also, and I've been hitting this key for a few years: how can we do better at
teaching people to use the tools at their disposal in the proper way? A
very good counter-example is C++11's std::atomic_flag: you MUST NEVER use it
(at least until C++20, where it got a test() member).

2021-11-05 Thread redi at gcc dot gnu.org via Gcc-bugs

--- Comment #9 from Jonathan Wakely  ---
I don't know what the assembly code does, sorry, but it's not permitted to store
to the 'expected' value unless the atomic variable is not equal to it. It looks
like this would update the 'expected' value unconditionally?

I agree with Jakub that we don't want the additional instructions because they
will be redundant in some cases where the user has already done a load.

2021-11-05 Thread jakub at gcc dot gnu.org via Gcc-bugs

--- Comment #8 from Jakub Jelinek  ---
(In reply to H.J. Lu from comment #7)
> Instead of generating:
> 
>   movl    f(%rip), %eax
> .L2:
>   movd    %eax, %xmm0
>   addss   .LC0(%rip), %xmm0
>   movd    %xmm0, %edx
>   lock cmpxchgl   %edx, f(%rip)
>   jne     .L2
>   ret
> 
> we want
> 
>   movl    f(%rip), %eax
> .L2:
>   movd    %eax, %xmm0
>   addss   .LC0(%rip), %xmm0
>   movd    %xmm0, %edx
>   cmpl    f(%rip), %eax
>   jne     .L2
>   lock cmpxchgl   %edx, f(%rip)
>   jne     .L2
>   ret

No, certainly not.  The mov before, or the value remembered from the previous
lock cmpxchgl, already has the right value unless the atomic memory is extremely
contended, so you don't want to add the non-atomic comparison in between.  Not
to mention that the way you've written it breaks it completely: if the memory is
not equal to the expected value, you should get the current value.  With the
above code, if f is modified by another thread between the initial
movl f(%rip), %eax and the cmpl f(%rip), %eax and never after that, it will loop
forever.
I believe what the above paper is talking about should be addressed by users of
these intrinsics, if they care and if it is beneficial (e.g. depending on extra
information about how contended the lock is; in OpenMP one has the
omp_sync_hint_* constants, usable in a hint clause, to say whether the lock is
contended, uncontended, speculative, non-speculative, or unknown).

2021-11-05 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #7 from H.J. Lu  ---
Instead of generating:

	movl    f(%rip), %eax
.L2:
	movd    %eax, %xmm0
	addss   .LC0(%rip), %xmm0
	movd    %xmm0, %edx
	lock cmpxchgl   %edx, f(%rip)
	jne     .L2
	ret

we want

	movl    f(%rip), %eax
.L2:
	movd    %eax, %xmm0
	addss   .LC0(%rip), %xmm0
	movd    %xmm0, %edx
	cmpl    f(%rip), %eax
	jne     .L2
	lock cmpxchgl   %edx, f(%rip)
	jne     .L2
	ret

2021-11-05 Thread jakub at gcc dot gnu.org via Gcc-bugs

--- Comment #6 from Jakub Jelinek  ---
E.g. the builtin is often used in a loop where the user does his own atomic
load first and decides what to do based on that.
Say for
float f;

void
foo ()
{
  #pragma omp atomic
  f += 3.0f;
}
with -O2 -fopenmp we emit:
  D.2113 = &f;
  D.2115 = __atomic_load_4 (D.2113, 0);
  D.2114 = D.2115;

  <bb 3> :
  D.2112 = VIEW_CONVERT_EXPR<float>(D.2114);
  _1 = D.2112 + 3.0e+0;
  D.2116 = VIEW_CONVERT_EXPR<unsigned int>(_1);
  D.2117 = .ATOMIC_COMPARE_EXCHANGE (D.2113, D.2114, D.2116, 4, 0, 0);
  D.2118 = REALPART_EXPR <D.2117>;
  D.2119 = D.2114;
  D.2114 = D.2118;
  if (D.2118 != D.2119)
    goto <bb 3>; [0.00%]
  else
    goto <bb 4>; [100.00%]

  <bb 4> :
  return;
which is essentially
void
foo ()
{
  int x = __atomic_load_4 ((int *) &f, __ATOMIC_RELAXED), y;
  float g;
  do
    {
      __builtin_memcpy (&g, &x, 4);
      g += 3.0f;
      __builtin_memcpy (&y, &g, 4);
    }
  while (!__atomic_compare_exchange_n ((int *) &f, &x, y, false,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED));
}
Can you explain how your proposed change would improve this?  It would just
slow it down and make it larger.

2021-11-05 Thread jakub at gcc dot gnu.org via Gcc-bugs

Jakub Jelinek  changed:

   What|Removed |Added

 CC||redi at gcc dot gnu.org,
   ||rodgertq at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
I don't think so, but we'd need somebody more familiar with the C/C++ memory
model to say for sure.
In any case, even if it is, I'm not convinced it is a good idea: users should
be able to choose whether they do the separate atomic load first themselves,
rather than the compiler forcing it on them unconditionally.

2021-11-05 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #4 from H.J. Lu  ---
(In reply to Jakub Jelinek from comment #3)
> If by fail you mean that it doesn't update the memory if the memory isn't
> equal to expected, sure, but do you mean it can fail spuriously, not update
> the memory even if the memory is equal to expected?
> Neither __sync_{bool,val}_compare_and_swap nor __atomic_compare_exchange_n
> with weak set to false can fail spuriously, __atomic_compare_exchange_n with
> weak set to true can.

If we generate

movl m(%rip), %eax
cmpl  %edi, %eax
jne  .L1
movl    %edi, %eax
lock cmpxchgl   %esi, m(%rip)
.L1:
ret

is it a valid implementation of atomic_compare_exchange_strong?

2021-11-05 Thread jakub at gcc dot gnu.org via Gcc-bugs

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
If by fail you mean that it doesn't update the memory if the memory isn't equal
to expected, sure, but do you mean it can fail spuriously, not update the
memory even if the memory is equal to expected?
Neither __sync_{bool,val}_compare_and_swap nor __atomic_compare_exchange_n with
weak set to false can fail spuriously, __atomic_compare_exchange_n with weak
set to true can.

2021-11-05 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2021-11-05

--- Comment #2 from H.J. Lu  ---
(In reply to Hongyu Wang from comment #1)
> __sync_val_compare_and_swap will be expanded to
> atomic_compare_exchange_strong
> by default, should we restrict the check and return under
> atomic_compare_exchange_weak which is allowed to fail spuriously?

On x86, "lock cmpxchgl" can fail.

2021-11-04 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs

--- Comment #1 from Hongyu Wang  ---
__sync_val_compare_and_swap will be expanded to atomic_compare_exchange_strong
by default; should we restrict the check-and-return to
atomic_compare_exchange_weak, which is allowed to fail spuriously?