[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic

2023-01-18 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566

--- Comment #37 from H.J. Lu  ---
It is

if ((__atomic_fetch_xor_4 ((volatile void *) a, (unsigned int) (1 << bit), 0) 
& (unsigned int) (1 << bit)) != 0)

vs

if ((__atomic_fetch_xor_4 ((volatile void *) a, 1 << bit, 0) >> bit & 1) != 0)

Why does GCC generate the second one?
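
For reference, both forms isolate the same bit of the value returned by the fetch-xor, so they agree for every in-range bit. A minimal sketch in C++, where via_mask and via_shift are hypothetical helper names and old stands in for the value returned by __atomic_fetch_xor_4 (this only illustrates the equivalence, it does not explain GCC's choice):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: `old` stands in for the value returned by the
// atomic fetch-xor. `bit` must be below 32, since a wider shift is undefined.
bool via_mask(std::uint32_t old, unsigned bit) {
    assert(bit < 32);
    // First form: AND the old value with the single-bit mask.
    return (old & (std::uint32_t{1} << bit)) != 0;
}

bool via_shift(std::uint32_t old, unsigned bit) {
    assert(bit < 32);
    // Second form: shift the tested bit down to position 0, then AND with 1.
    return ((old >> bit) & 1u) != 0;
}
```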

2023-01-18 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #36 from H.J. Lu  ---
(1 << (x)) works, but (((unsigned int) 1) << (x)) doesn't work:

[hjl@gnu-skx-1 gcc]$ cat bar.c
void bar (void);

#define MASK1(x) (1 << (x))

void
f1 (unsigned int *a, unsigned int bit)
{
  if ((__atomic_fetch_xor (a, MASK1 (bit), __ATOMIC_RELAXED) & MASK1 (bit)))
    bar ();
}

#define MASK2(x) (((unsigned int) 1) << (x))

void
f2 (unsigned int *a, unsigned int bit)
{
  if ((__atomic_fetch_xor (a, MASK2 (bit), __ATOMIC_RELAXED) & MASK2 (bit)))
    bar ();
}
[hjl@gnu-skx-1 gcc]$ ./xgcc -B./ -S -O2 bar.c
[hjl@gnu-skx-1 gcc]$ cat bar.s
        .file   "bar.c"
        .text
        .p2align 4
        .globl  f1
        .type   f1, @function
f1:
.LFB0:
        .cfi_startproc
        lock btcl   %esi, (%rdi)
        jc      .L4
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        jmp     bar
        .cfi_endproc
.LFE0:
        .size   f1, .-f1
        .p2align 4
        .globl  f2
        .type   f2, @function
f2:
.LFB1:
        .cfi_startproc
        movl    %esi, %ecx
        movl    $1, %edx
        movl    (%rdi), %eax
        sall    %cl, %edx
.L6:
        movl    %eax, %r8d
        movl    %eax, %esi
        xorl    %edx, %r8d
        lock cmpxchgl   %r8d, (%rdi)
        jne     .L6
        btl     %ecx, %esi
        jc      .L10
        ret
        .p2align 4,,10
        .p2align 3
.L10:
        jmp     bar
        .cfi_endproc
.LFE1:
        .size   f2, .-f2
        .ident  "GCC: (GNU) 13.0.1 20230118 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-skx-1 gcc]$
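
The f2 output above is a compare-exchange retry loop followed by a bit test on the old value. In C++ terms it behaves roughly like the sketch below. The name fetch_xor_via_cas is hypothetical, and this illustrates the semantics of the fallback sequence, not GCC's own expansion code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical illustration of the cmpxchg fallback: toggle `bit` in `a`
// and report whether it was set beforehand.
bool fetch_xor_via_cas(std::atomic<std::uint32_t>& a, unsigned bit) {
    assert(bit < 32);
    const std::uint32_t mask = std::uint32_t{1} << bit;
    std::uint32_t old = a.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak stores the current value into `old`,
    // so each retry recomputes `old ^ mask` from fresh data.
    while (!a.compare_exchange_weak(old, old ^ mask,
                                    std::memory_order_relaxed)) {
    }
    // Test the bit in the value observed just before the successful swap.
    return (old & mask) != 0;
}
```

A single lock btc does the same work without the retry loop, which is why the f1 output is preferable.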

2022-11-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #35 from CVS Commits  ---
The master branch has been updated by H.J. Lu :

https://gcc.gnu.org/g:03ed4e57e3d46a61513b3d1ab1720997aec8cf71

commit r13-3760-g03ed4e57e3d46a61513b3d1ab1720997aec8cf71
Author: H.J. Lu 
Date:   Tue Nov 1 09:49:18 2022 -0700

Extend optimization for integer bit test on __atomic_fetch_[or|and]_*

Extend optimization for

_1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
_5 = (signed int) _1;
_4 = _5 >= 0;

to

_1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
_5 = (signed int) _1;
if (_5 >= 0)

gcc/

PR middle-end/102566
* tree-ssa-ccp.cc (optimize_atomic_bit_test_and): Also handle
if (_5 < 0) and if (_5 >= 0).

gcc/testsuite/

PR middle-end/102566
* g++.target/i386/pr102566-7.C
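
As background for the (signed int) _1 >= 0 pattern in this commit message: testing bit 31 of a 32-bit value is the same as testing the sign of that value reinterpreted as signed, which is why the mask test shows up as a comparison against zero. A minimal sketch with hypothetical helper names; it assumes the unsigned-to-signed conversion wraps, which GCC documents and C++20 requires:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: two ways of asking "was bit 31 clear?".
bool bit31_clear_via_mask(std::uint32_t old) {
    return (old & 0x80000000u) == 0;
}

bool bit31_clear_via_sign(std::uint32_t old) {
    // Assumes two's-complement wrap-around on conversion.
    return static_cast<std::int32_t>(old) >= 0;
}

// Spot-check boundary values; both forms must agree on each of them.
void check_sign_test_equivalence() {
    const std::uint32_t samples[] = {0u, 1u, 0x7fffffffu, 0x80000000u,
                                     0xffffffffu};
    for (std::uint32_t v : samples)
        assert(bit31_clear_via_mask(v) == bit31_clear_via_sign(v));
}
```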

2022-11-01 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #34 from H.J. Lu  ---
Created attachment 53813
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53813&action=edit
A patch to handle if (_5 < 0)

A patch to extend optimization for

_1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
_5 = (signed int) _1;
_4 = _5 >= 0;

to

_1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
_5 = (signed int) _1;
if (_5 >= 0)

2022-11-01 Thread marko.makela at mariadb dot com via Gcc-bugs

--- Comment #33 from Marko Mäkelä  ---
When it comes to toggling the most significant bit, std::atomic::fetch_xor()
could be translated to LOCK XADD which would be able to return all bits:

#include <atomic>
#include <cstdint>
uint32_t toggle_by_add(std::atomic<uint32_t>& a)
{
  return a.fetch_add(1U<<31);
}
uint32_t toggle_by_xor(std::atomic<uint32_t>& a)
{
  return a.fetch_xor(1U<<31);
}
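
The reason the two functions are interchangeable: adding 2^31 modulo 2^32 flips bit 31, the carry out of the top bit is discarded, and the low 31 bits are untouched. A small sketch that checks this on concrete values; msb_add_matches_xor is a hypothetical name used only for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical check: apply fetch_add and fetch_xor of the top bit to two
// atomics holding the same start value, then compare the outcomes.
bool msb_add_matches_xor(std::uint32_t start) {
    std::atomic<std::uint32_t> by_add{start}, by_xor{start};
    const std::uint32_t msb = std::uint32_t{1} << 31;
    const std::uint32_t old_add = by_add.fetch_add(msb);
    const std::uint32_t old_xor = by_xor.fetch_xor(msb);
    assert(old_add == start);  // fetch_add returns the previous value
    // Same returned value and same stored result means the two agree.
    return old_add == old_xor && by_add.load() == by_xor.load();
}
```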

2022-10-30 Thread crazylht at gmail dot com via Gcc-bugs

--- Comment #32 from Hongtao.liu  ---
(In reply to Marko Mäkelä from comment #31)
> Much of this seems to work in GCC 12.2.0 as well as in clang++-15. For clang
> there is a related ticket https://github.com/llvm/llvm-project/issues/37322
> 
> I noticed a missed optimization in both g++-12 and clang++-15: Some
> operations involving bit 31 degrade to loops around lock cmpxchg. I compiled

Bit 31 is the sign bit, so c = a & 1U << 31; c == 0 is optimized to
(signed int) a >= 0. The optimization in optimize_atomic_bit_test_and is
supposed to match a & 1U << 31, and it fails here. I guess it could be
extended to match (signed int) a >= 0 when the mask is 1U << 31.

 7  :
 8  :
 9  _1 = __atomic_fetch_or_4 (v, 2147483648, 0);
10  _2 = (signed int) _1;
11  if (_2 >= 0) goto ; else goto ;
12  :
13  return;
14}
15

2022-10-29 Thread marko.makela at mariadb dot com via Gcc-bugs

Marko Mäkelä  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marko.makela at mariadb dot com

--- Comment #31 from Marko Mäkelä  ---
Much of this seems to work in GCC 12.2.0 as well as in clang++-15. For clang
there is a related ticket https://github.com/llvm/llvm-project/issues/37322

I noticed a missed optimization in both g++-12 and clang++-15: Some operations
involving bit 31 degrade to loops around lock cmpxchg. I compiled it with "-c
-O2" (AMD64) or "-c -O2 -m32 -march=i686" (IA-32).

#include <atomic>
#include <cstdint>
template<uint32_t b>
void lock_bts(std::atomic<uint32_t> &a) { while (!(a.fetch_or(b) & b)); }
template<uint32_t b>
void lock_btr(std::atomic<uint32_t> &a) { while (a.fetch_and(~b) & b); }
template<uint32_t b>
void lock_btc(std::atomic<uint32_t> &a) { while (a.fetch_xor(b) & b); }
template void lock_bts<1U<<30>(std::atomic<uint32_t> &a);
template void lock_btr<1U<<30>(std::atomic<uint32_t> &a);
template void lock_btc<1U<<30>(std::atomic<uint32_t> &a);
// bug: uses lock cmpxchg
template void lock_bts<1U<<31>(std::atomic<uint32_t> &a);
template void lock_btr<1U<<31>(std::atomic<uint32_t> &a);
template void lock_btc<1U<<31>(std::atomic<uint32_t> &a);

2021-11-10 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #30 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:fb161782545224f55ba26ba663889c5e6e9a04d1

commit r12-5102-gfb161782545224f55ba26ba663889c5e6e9a04d1
Author: liuhongt 
Date:   Mon Oct 25 13:59:51 2021 +0800

Improve integer bit test on __atomic_fetch_[or|and]_* returns

commit adedd5c173388ae505470df152b9cb3947339566
Author: Jakub Jelinek 
Date:   Tue May 3 13:37:25 2016 +0200

re PR target/49244 (__sync or __atomic builtins will not emit 'lock
bts/btr/btc')

optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns
with lock bts/btr/btc by turning

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

into

  _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
  _5 = _4;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

into

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
  _4 = _11 != 0;

But it failed to optimize many equivalent, but slightly different cases:

1.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _4 = (_Bool) _1;
2.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _4 = (_Bool) _1;
3.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
4.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
5.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
6.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
7.
  _1 = __atomic_fetch_or_4 (ptr_6, 0x80000000, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
8.
  _1 = __atomic_fetch_and_4 (ptr_6, 0x7fffffff, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
9.
  _1 = 1 << bit_4(D);
  mask_5 = (unsigned int) _1;
  _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
  _3 = _2 & mask_5;
10.
  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _2 = (unsigned int) _1;
  _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
  _4 = (int) _3;
  _5 = _4 & mask_7;

We make

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

the canonical forms for this optimization and transform cases 1-8 to the
equivalent canonical form.  For cases 9 and 10, we simply remove the cast
before __atomic_fetch_or_4/__atomic_fetch_and_4 with

  _1 = 1 << bit_4(D);
  _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
  _3 = _2 & _1;

and

  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
  _6 = _3 & mask_7;
  _5 = (int) _6;
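
At the source level, the two canonical forms correspond roughly to the following C++ functions. This is a sketch with hypothetical names (test_and_set_bit, test_and_reset_bit); whether a particular GCC release turns each spelling into lock bts or lock btr depends on the mask expression it sees, as the rest of this thread shows.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper in the shape "mask = 1 << cnt; old = fetch_or (mask);
// old & mask": sets the bit and reports whether it was already set.
bool test_and_set_bit(std::atomic<std::uint32_t>& v, unsigned bit) {
    assert(bit < 32);
    const std::uint32_t mask = std::uint32_t{1} << bit;
    return (v.fetch_or(mask, std::memory_order_relaxed) & mask) != 0;
}

// Hypothetical helper in the shape "old = fetch_and (~mask); old & mask":
// clears the bit and reports whether it was set before.
bool test_and_reset_bit(std::atomic<std::uint32_t>& v, unsigned bit) {
    assert(bit < 32);
    const std::uint32_t mask = std::uint32_t{1} << bit;
    return (v.fetch_and(~mask, std::memory_order_relaxed) & mask) != 0;
}
```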

2021-11-04  H.J. Lu  
Hongtao Liu  
gcc/

PR middle-end/102566
* match.pd (nop_atomic_bit_test_and_p): New match.
* tree-ssa-ccp.c (convert_atomic_bit_not): New function.
(gimple_nop_atomic_bit_test_and_p): New prototype.
(optimize_atomic_bit_test_and): Transform equivalent, but slightly
different cases to their canonical forms.

gcc/testsuite/

PR middle-end/102566
* g++.target/i386/pr102566-1.C: New test.
* g++.target/i386/pr102566-2.C: Likewise.
* g++.target/i386/pr102566-3.C: Likewise.
* g++.target/i386/pr102566-4.C: Likewise.
* g++.target/i386/pr102566-5a.C: Likewise.
* g++.target/i386/pr102566-5b.C: Likewise.
* g++.target/i386/pr102566-6a.C: Likewise.
* g++.target/i386/pr102566-6b.C: Likewise.
* gcc.target/i386/pr102566-1a.c: Likewise.
* gcc.target/i386/pr102566-1b.c: Likewise.
* gcc.target/i386/pr102566-2.c: Likewise.
* gcc.target/i386/pr102566-3a.c: Likewise.
* gcc.target/i386/pr102566-3b.c: Likewise.
* gcc.target/i386/pr102566-4.c: Likewise.
* gcc.target/i386/pr102566-5.c: Likewise.
* gcc.target/i386/pr102566-6.c: Likewise.
* gcc.target/i386/pr102566-7.c: Likewise.
* gcc.target/i386/pr102566-8a.c: Likewise.
* gcc.target/i386/pr102566-8b.c: Likewise.
* gcc.target/i386/pr102566-9a.c: Likewise.
* gcc.target/i386/pr102566-9b.c: Likewise.
* gcc.target/i386/pr102566-10a.c: Likewise.
* gcc.target/i386/pr102566-10b.c: Likewise.
* gcc.target/i386/pr102566-11.c: Likewise.
* gcc.target/i386/pr102566-12.c: Likewise.
   

2021-11-04 Thread thiago at kde dot org via Gcc-bugs

--- Comment #29 from Thiago Macieira  ---
New suggestion in bug 103090

2021-10-21 Thread crazylht at gmail dot com via Gcc-bugs

--- Comment #28 from Hongtao.liu  ---
Can we optimize

int gomp_futex_wake = FUTEX_WAKE | FUTEX_PRIVATE_FLAG;
int gomp_futex_wait = FUTEX_WAIT | FUTEX_PRIVATE_FLAG;

void
gomp_mutex_lock_slow (gomp_mutex_t *mutex, int oldval)
{
  /* First loop spins a while.  */
  while (oldval == 1)
    {
      if (do_spin (mutex, 1))
        {
          /* Spin timeout, nothing changed.  Set waiting flag.  */
          oldval = __atomic_exchange_n (mutex, -1, MEMMODEL_ACQUIRE);
          if (oldval == 0)
            return;
          futex_wait (mutex, -1);
          break;
        }
      else
        {
          /* Something changed.  If now unlocked, we're good to go.  */
          oldval = 0;
          if (__atomic_compare_exchange_n (mutex, &oldval, 1, false,
                                           MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
            return;
        }
    }

  /* Second loop waits until mutex is unlocked.  We always exit this
     loop with wait flag set, so next unlock will awaken a thread.  */
  while ((oldval = __atomic_exchange_n (mutex, -1, MEMMODEL_ACQUIRE)))
    do_wait (mutex, -1);
}

with __atomic_fetch_or/and/xor?

2021-10-10 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51559|0                           |1
        is obsolete|                            |

--- Comment #27 from H.J. Lu  ---
Created attachment 51580
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51580&action=edit
The new v4 patch

Changes in v4:

1. Bypass redundant check when inputs have been transformed to the
equivalent canonical form with valid bit operation.

2021-10-07 Thread thiago at kde dot org via Gcc-bugs

--- Comment #26 from Thiago Macieira  ---
(In reply to H.J. Lu from comment #25)
> Can you get some performance improvement data on real workloads?

Will ask.

2021-10-07 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #25 from H.J. Lu  ---
(In reply to Thiago Macieira from comment #24)
> (In reply to H.J. Lu from comment #23)
> > I renamed the commit title.  The new v3 is the v6 + fixes.
> 
> Got it. Still no issues.

Can you get some performance improvement data on real workloads?

2021-10-07 Thread thiago at kde dot org via Gcc-bugs

--- Comment #24 from Thiago Macieira  ---
(In reply to H.J. Lu from comment #23)
> I renamed the commit title.  The new v3 is the v6 + fixes.

Got it. Still no issues.

2021-10-06 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #23 from H.J. Lu  ---
(In reply to Thiago Macieira from comment #22)
> (In reply to H.J. Lu from comment #21)
> > Created attachment 51559 [details]
> > The new v3 patch
> > 
> > The new v3 patch to check invalid mask.
> 
> v3? We were already up to v6.

I renamed the commit title.  The new v3 is the v6 + fixes.

2021-10-06 Thread thiago at kde dot org via Gcc-bugs

--- Comment #22 from Thiago Macieira  ---
(In reply to H.J. Lu from comment #21)
> Created attachment 51559 [details]
> The new v3 patch
> 
> The new v3 patch to check invalid mask.

v3? We were already up to v6.

2021-10-06 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51558|0                           |1
        is obsolete|                            |

--- Comment #21 from H.J. Lu  ---
Created attachment 51559
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51559&action=edit
The new v3 patch

The new v3 patch to check invalid mask.

2021-10-06 Thread thiago at kde dot org via Gcc-bugs

--- Comment #20 from Thiago Macieira  ---
And:

$ cat /tmp/test.cpp 
#include <atomic>
bool tbit(std::atomic &i)
{
  return i.fetch_xor(CONSTANT, std::memory_order_relaxed) & (CONSTANT);
}
$ ~/dev/gcc/bin/gcc "-DCONSTANT=(1LL<<63)" -S -o - -O2 /tmp/test.cpp | sed '1,/startproc/d;/endproc/,$d'
        lock btcq   $63, (%rdi)
        setc    %al
        ret

Nice!

2021-10-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
Bug 102566 depends on bug 49244, which changed state.

Bug 49244 Summary: __sync or __atomic builtins will not emit 'lock bts/btr/btc'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

2021-10-05 Thread thiago at kde dot org via Gcc-bugs

--- Comment #19 from Thiago Macieira  ---
(In reply to H.J. Lu from comment #17)
> Created attachment 51558 [details]
> The v6 patch
> 
> Please try this.

Confirmed for all inputs.

2021-10-05 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #18 from H.J. Lu  ---
(In reply to Andrew Pinski from comment #16)
> (In reply to H.J. Lu from comment #14)
> > Created attachment 51556 [details]
> > The v5 patch
> > 
> > Changes in v5:
> > 
> > 1. Check SSA_NAME before SSA_NAME_OCCURS_IN_ABNORMAL_PHI.
> 
> Why don't you just move this to match.pd instead as suggested by Richard B.
> on the mailing list?  Then you get the check for
> SSA_NAME_OCCURS_IN_ABNORMAL_PHI for free and such.  Plus other passes will
> do the optimization too 

Without __atomic_fetch_or_* or __atomic_fetch_and_*, the conversion isn't
needed.  We also need to check the mask of the atomic builtin.

2021-10-05 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51556|0                           |1
        is obsolete|                            |

--- Comment #17 from H.J. Lu  ---
Created attachment 51558
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51558&action=edit
The v6 patch

Please try this.

2021-10-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs

--- Comment #16 from Andrew Pinski  ---
(In reply to H.J. Lu from comment #14)
> Created attachment 51556 [details]
> The v5 patch
> 
> Changes in v5:
> 
> 1. Check SSA_NAME before SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

Why don't you just move this to match.pd instead as suggested by Richard B. on
the mailing list?  Then you get the check for SSA_NAME_OCCURS_IN_ABNORMAL_PHI
for free and such.  Plus other passes will do the optimization too 

2021-10-05 Thread thiago at kde dot org via Gcc-bugs

--- Comment #15 from Thiago Macieira  ---
Works now for the failing case. Additionally:

bool tbit(std::atomic &i)
{
  return i.fetch_and(~CONSTANT, std::memory_order_relaxed) & (CONSTANT);
}

Will properly produce LOCK BTR (CONSTANT=2):

        lock btrq   $1, (%rdi)
        setc    %al
        ret

CONSTANT=(1L<<62):

        lock btrq   $62, (%rdi)
        setc    %al
        ret

But not for CONSTANT=1 or CONSTANT=(1L<<63):

        movq    (%rdi), %rax
.L2:
        movq    %rax, %rcx
        movq    %rax, %rdx
        andq    $-2, %rcx
        lock cmpxchgq   %rcx, (%rdi)
        jne     .L2
        movl    %edx, %eax
        andl    $1, %eax
        ret

Same applies to 1<<31 for atomic.

2021-10-05 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51551|0                           |1
        is obsolete|                            |

--- Comment #14 from H.J. Lu  ---
Created attachment 51556
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51556&action=edit
The v5 patch

Changes in v5:

1. Check SSA_NAME before SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

2021-10-04 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51549|0                           |1
        is obsolete|                            |

--- Comment #13 from H.J. Lu  ---
Created attachment 51551
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51551&action=edit
The v4 patch

Please try the v4 patch.

2021-10-04 Thread thiago at kde dot org via Gcc-bugs

--- Comment #12 from Thiago Macieira  ---
Commit 7e0c0500808d58bca5b8e23cbd474022c32234e4 + your patch.

2021-10-04 Thread thiago at kde dot org via Gcc-bugs

--- Comment #11 from Thiago Macieira  ---
$ for ((i=0;i<32;++i)); do ~/dev/gcc/bin/gcc "-DCONSTANT=(1<<$i)" -S -o - -O2 /tmp/test.cpp | grep bts; done
lock btsl   $0, (%rdi)
lock btsl   $1, (%rdi)
lock btsl   $2, (%rdi)
lock btsl   $3, (%rdi)
lock btsl   $4, (%rdi)
lock btsl   $5, (%rdi)
lock btsl   $6, (%rdi)
lock btsl   $7, (%rdi)
lock btsl   $8, (%rdi)
lock btsl   $9, (%rdi)
lock btsl   $10, (%rdi)
lock btsl   $11, (%rdi)
lock btsl   $12, (%rdi)
lock btsl   $13, (%rdi)
lock btsl   $14, (%rdi)
lock btsl   $15, (%rdi)
lock btsl   $16, (%rdi)
lock btsl   $17, (%rdi)
lock btsl   $18, (%rdi)
lock btsl   $19, (%rdi)
lock btsl   $20, (%rdi)
lock btsl   $21, (%rdi)
lock btsl   $22, (%rdi)
lock btsl   $23, (%rdi)
lock btsl   $24, (%rdi)
lock btsl   $25, (%rdi)
lock btsl   $26, (%rdi)
lock btsl   $27, (%rdi)
lock btsl   $28, (%rdi)
lock btsl   $29, (%rdi)
lock btsl   $30, (%rdi)
lock btsl   $31, (%rdi)

And after changing to long:

$ for ((i=32;i<64;++i)); do ~/dev/gcc/bin/gcc "-DCONSTANT=(1L<<$i)" -S -o - -O2 /tmp/test.cpp | grep bts; done
lock btsq   $32, (%rdi)
lock btsq   $33, (%rdi)
lock btsq   $34, (%rdi)
lock btsq   $35, (%rdi)
lock btsq   $36, (%rdi)
lock btsq   $37, (%rdi)
lock btsq   $38, (%rdi)
lock btsq   $39, (%rdi)
lock btsq   $40, (%rdi)
lock btsq   $41, (%rdi)
lock btsq   $42, (%rdi)
lock btsq   $43, (%rdi)
lock btsq   $44, (%rdi)
lock btsq   $45, (%rdi)
lock btsq   $46, (%rdi)
lock btsq   $47, (%rdi)
lock btsq   $48, (%rdi)
lock btsq   $49, (%rdi)
lock btsq   $50, (%rdi)
lock btsq   $51, (%rdi)
lock btsq   $52, (%rdi)
lock btsq   $53, (%rdi)
lock btsq   $54, (%rdi)
lock btsq   $55, (%rdi)
lock btsq   $56, (%rdi)
lock btsq   $57, (%rdi)
lock btsq   $58, (%rdi)
lock btsq   $59, (%rdi)
lock btsq   $60, (%rdi)
lock btsq   $61, (%rdi)
lock btsq   $62, (%rdi)
lock btsq   $63, (%rdi)

But:

$ cat /tmp/test2.cpp 
#include <atomic>
bool tbit(std::atomic &i)
{
  return i.fetch_or(1, std::memory_order_relaxed) & (~1);
}
$ ~/dev/gcc/bin/gcc -S -o - -O2 /tmp/test2.cpp
.file   "test.cpp"
.text
/tmp/test.cpp: In function ‘bool tbit(std::atomic&)’:
/tmp/test.cpp:2:6: error: type mismatch in binary expression
2 | bool tbit(std::atomic &i)
  |  ^~~~
long int

long unsigned int

__int_type

_9 = _6 & -2;
during GIMPLE pass: fab
/tmp/test.cpp:2:6: internal compiler error: verify_gimple failed
0x119fbba verify_gimple_in_cfg(function*, bool)
/home/tjmaciei/src/gcc/gcc/tree-cfg.c:5576
0x106ced7 execute_function_todo
/home/tjmaciei/src/gcc/gcc/passes.c:2042
0x106d8fb execute_todo
/home/tjmaciei/src/gcc/gcc/passes.c:2096
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

2021-10-04 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51543|0                           |1
        is obsolete|                            |

--- Comment #10 from H.J. Lu  ---
Created attachment 51549
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51549&action=edit
The v3 patch

Please try the v3 patch.

2021-10-04 Thread thiago at kde dot org via Gcc-bugs

--- Comment #9 from Thiago Macieira  ---
Looks like it doesn't work for the sign bit.

$ cat /tmp/test.cpp 
#include <atomic>
bool tbit(std::atomic<int> &i)
{
return i.fetch_or(CONSTANT, std::memory_order_relaxed) & CONSTANT;
}
$ ~/dev/gcc/bin/gcc -DCONSTANT='(1<<30)' -S -o - -O2 /tmp/test.cpp | sed -n '/startproc/,/endproc/p'
        .cfi_startproc
        lock btsl   $30, (%rdi)
        setc    %al
        ret
        .cfi_endproc
$ ~/dev/gcc/bin/gcc -DCONSTANT='(1<<31)' -S -o - -O2 /tmp/test.cpp | sed -n '/startproc/,/endproc/p'
        .cfi_startproc
        movl    (%rdi), %eax
.L2:
        movl    %eax, %ecx
        movl    %eax, %edx
        orl     $-2147483648, %ecx
        lock cmpxchgl   %ecx, (%rdi)
        jne     .L2
        shrl    $31, %edx
        movl    %edx, %eax
        ret
        .cfi_endproc

Changing to std::atomic makes no difference.

2021-10-04 Thread thiago at kde dot org via Gcc-bugs

--- Comment #8 from Thiago Macieira  ---
$ cat /tmp/test.cpp  
#include <atomic>
bool tbit(std::atomic<int> &i)
{
   return i.fetch_or(1, std::memory_order_relaxed) & 1;
}
$ ~/dev/gcc/bin/gcc -S -o - -O2 /tmp/test.cpp  
   .file   "test.cpp"
   .text
   .p2align 4
   .globl  _Z4tbitRSt6atomicIiE
   .type   _Z4tbitRSt6atomicIiE, @function
_Z4tbitRSt6atomicIiE:
.LFB339:
   .cfi_startproc
   lock btsl   $0, (%rdi)
   setc    %al
   ret
   .cfi_endproc
.LFE339:
   .size   _Z4tbitRSt6atomicIiE, .-_Z4tbitRSt6atomicIiE
   .ident  "GCC: (GNU) 12.0.0 20211004 (experimental)"
   .section        .note.GNU-stack,"",@progbits

+1

2021-10-04 Thread thiago at kde dot org via Gcc-bugs

--- Comment #7 from Thiago Macieira  ---
(In reply to H.J. Lu from comment #5)
> Created attachment 51536 [details]
> A patch
> 
> Please try this.

Give me an hour (will try v2).

2021-10-03 Thread hjl.tools at gmail dot com via Gcc-bugs

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51536|0                           |1
        is obsolete|                            |

--- Comment #6 from H.J. Lu  ---
Created attachment 51543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51543&action=edit
The v2 patch

2021-10-03 Thread hjl.tools at gmail dot com via Gcc-bugs

--- Comment #5 from H.J. Lu  ---
Created attachment 51536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51536&action=edit
A patch

Please try this.