[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #37 from H.J. Lu ---
It is
if ((__atomic_fetch_xor_4 ((volatile void *) a, (unsigned int) (1 << bit), 0) & (unsigned int) (1 << bit)) != 0)
vs
if ((__atomic_fetch_xor_4 ((volatile void *) a, 1 << bit, 0) >> bit & 1) != 0)
Why does GCC generate the second one?
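Both conditions ask the same question: was bit `bit` set in the value the atomic XOR returned? A minimal C++ sketch of that equivalence, assuming a 32-bit `uint32_t` word and `bit` in 0..31; the helper names are hypothetical, and only the test on the returned value is modeled, not the atomic operation:

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// First form: AND the old value with the single-bit mask.
bool test_masked(uint32_t old, unsigned bit) {
    return (old & (uint32_t{1} << bit)) != 0;
}

// Second form: shift the old value right and test bit 0.
bool test_shifted(uint32_t old, unsigned bit) {
    return ((old >> bit) & 1u) != 0;
}

// Compare both forms for every bit index over a few sample values.
void check_forms_agree() {
    const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu};
    for (uint32_t v : samples)
        for (unsigned bit = 0; bit < 32; ++bit)
            assert(test_masked(v, bit) == test_shifted(v, bit));
}
```

Since the two forms always agree, the difference presumably matters only for which shapes the middle-end bit-test pattern matching recognizes.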
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #36 from H.J. Lu ---
(1 << (x)) works, but (((unsigned int) 1) << (x)) doesn't work:
[hjl@gnu-skx-1 gcc]$ cat bar.c
void bar (void);
#define MASK1(x) (1 << (x))
void
f1 (unsigned int *a, unsigned int bit)
{
if ((__atomic_fetch_xor (a, MASK1 (bit), __ATOMIC_RELAXED) & MASK1 (bit)))
bar ();
}
#define MASK2(x) (((unsigned int) 1) << (x))
void
f2 (unsigned int *a, unsigned int bit)
{
if ((__atomic_fetch_xor (a, MASK2 (bit), __ATOMIC_RELAXED) & MASK2 (bit)))
bar ();
}
[hjl@gnu-skx-1 gcc]$ ./xgcc -B./ -S -O2 bar.c
[hjl@gnu-skx-1 gcc]$ cat bar.s
.file "bar.c"
.text
.p2align 4
.globl f1
.type f1, @function
f1:
.LFB0:
.cfi_startproc
lock btcl %esi, (%rdi)
jc .L4
ret
.p2align 4,,10
.p2align 3
.L4:
jmp bar
.cfi_endproc
.LFE0:
.size f1, .-f1
.p2align 4
.globl f2
.type f2, @function
f2:
.LFB1:
.cfi_startproc
movl %esi, %ecx
movl $1, %edx
movl (%rdi), %eax
sall %cl, %edx
.L6:
movl %eax, %r8d
movl %eax, %esi
xorl %edx, %r8d
lock cmpxchgl %r8d, (%rdi)
jne .L6
btl %ecx, %esi
jc .L10
ret
.p2align 4,,10
.p2align 3
.L10:
jmp bar
.cfi_endproc
.LFE1:
.size f2, .-f2
.ident "GCC: (GNU) 13.0.1 20230118 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-skx-1 gcc]$
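The `f2` body above is the generic fallback: load the word, XOR in the mask, retry `lock cmpxchgl` until it succeeds, then test the bit in the old value, whereas `f1` collapses all of that into `lock btcl` plus a carry-flag branch. A C++ sketch of the two shapes using `std::atomic<uint32_t>` (function names are hypothetical; relaxed ordering mirrors `__ATOMIC_RELAXED` in the test case) showing that they toggle the same bit and report the same previous state:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Shape of f2: emulate fetch_xor with a CAS loop, then test the old bit.
bool xor_bit_cas_loop(std::atomic<uint32_t>& a, unsigned bit) {
    const uint32_t mask = uint32_t{1} << bit;
    uint32_t old = a.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into `old`.
    while (!a.compare_exchange_weak(old, old ^ mask, std::memory_order_relaxed,
                                    std::memory_order_relaxed)) {
    }
    return (old & mask) != 0;
}

// Shape of f1: one atomic read-modify-write whose old bit is tested.
bool xor_bit_fetch(std::atomic<uint32_t>& a, unsigned bit) {
    const uint32_t mask = uint32_t{1} << bit;
    return (a.fetch_xor(mask, std::memory_order_relaxed) & mask) != 0;
}

// Single-threaded check that both shapes agree bit by bit.
void check_equivalent() {
    std::atomic<uint32_t> x{0x5u}, y{0x5u};
    for (unsigned bit = 0; bit < 32; ++bit)
        assert(xor_bit_cas_loop(x, bit) == xor_bit_fetch(y, bit));
    assert(x.load() == y.load());
}
```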
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #35 from CVS Commits ---
The master branch has been updated by H.J. Lu:

https://gcc.gnu.org/g:03ed4e57e3d46a61513b3d1ab1720997aec8cf71

commit r13-3760-g03ed4e57e3d46a61513b3d1ab1720997aec8cf71
Author: H.J. Lu
Date: Tue Nov 1 09:49:18 2022 -0700

Extend optimization for integer bit test on __atomic_fetch_[or|and]_*

Extend optimization for

  _1 = __atomic_fetch_or_4 (ptr_6, 0x8000, _3);
  _5 = (signed int) _1;
  _4 = _5 >= 0;

to

  _1 = __atomic_fetch_or_4 (ptr_6, 0x8000, _3);
  _5 = (signed int) _1;
  if (_5 >= 0)

gcc/

PR middle-end/102566
* tree-ssa-ccp.cc (optimize_atomic_bit_test_and): Also handle
if (_5 < 0) and if (_5 >= 0).

gcc/testsuite/

PR middle-end/102566
* g++.target/i386/pr102566-7.C
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #34 from H.J. Lu ---
Created attachment 53813
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53813&action=edit
A patch to handle if (_5 < 0)

A patch to extend optimization for

  _1 = __atomic_fetch_or_4 (ptr_6, 0x8000, _3);
  _5 = (signed int) _1;
  _4 = _5 >= 0;

to

  _1 = __atomic_fetch_or_4 (ptr_6, 0x8000, _3);
  _5 = (signed int) _1;
  if (_5 >= 0)
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #33 from Marko Mäkelä ---
When it comes to toggling the most significant bit, std::atomic::fetch_xor()
could be translated to LOCK XADD which would be able to return all bits:
#include <atomic>
#include <cstdint>
uint32_t toggle_by_add(std::atomic<uint32_t>& a)
{
return a.fetch_add(1U<<31);
}
uint32_t toggle_by_xor(std::atomic<uint32_t>& a)
{
return a.fetch_xor(1U<<31);
}
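Adding `1U << 31` modulo 2^32 changes only bit 31: the lower 31 bits of the addend are zero, and the carry out of bit 31 is discarded. That is why `fetch_add` and `fetch_xor` with this constant are interchangeable, and why the `fetch_add` form can map to `lock xadd`, which returns the full old value. A minimal sketch, assuming a 32-bit `uint32_t` (helper names hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Adding 1U<<31 modulo 2^32 flips bit 31 and nothing else.
uint32_t toggle_msb_by_add(uint32_t v) { return v + (1U << 31); }
uint32_t toggle_msb_by_xor(uint32_t v) { return v ^ (1U << 31); }

// The same identity for the atomic read-modify-write operations:
// both store the same new value and return the same full old value.
void check_add_equals_xor() {
    const uint32_t samples[] = {0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu};
    for (uint32_t v : samples) {
        assert(toggle_msb_by_add(v) == toggle_msb_by_xor(v));
        std::atomic<uint32_t> a{v}, b{v};
        assert(a.fetch_add(1U << 31) == b.fetch_xor(1U << 31));
        assert(a.load() == b.load());
    }
}
```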
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #32 from Hongtao.liu ---
(In reply to Marko Mäkelä from comment #31)
> Much of this seems to work in GCC 12.2.0 as well as in clang++-15. For clang
> there is a related ticket https://github.com/llvm/llvm-project/issues/37322
>
> I noticed a missed optimization in both g++-12 and clang++-15: Some
> operations involving bit 31 degrade to loops around lock cmpxchg. I compiled

Bit 31 is the sign bit, so c = a & 1U << 31; c == 0 is folded to
(signed int) a >= 0. The optimization in optimize_atomic_bit_test_and is
supposed to match a & 1U << 31, so it fails here. I guess it could be extended
to match (signed int) a >= 0 when the mask is 1U << 31.

  _1 = __atomic_fetch_or_4 (v, 2147483648, 0);
  _2 = (signed int) _1;
  if (_2 >= 0) goto ; else goto ;
  return;
}
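The folding described above relies on bit 31 being the sign bit of a 32-bit two's-complement integer. A small C++ check of the equivalence (helper names hypothetical; it assumes the `uint32_t` to `int32_t` conversion wraps modulo 2^32, which GCC documents and C++20 guarantees):

```cpp
#include <cassert>
#include <cstdint>

using std::int32_t;
using std::uint32_t;

// Bit 31 clear, written as a mask test.
bool bit31_clear_mask(uint32_t a) { return (a & (1U << 31)) == 0; }

// The folded form: reinterpret as signed and compare with zero.
bool bit31_clear_signed(uint32_t a) { return static_cast<int32_t>(a) >= 0; }

void check_sign_bit_forms() {
    const uint32_t samples[] = {0u, 1u, 0x7fffffffu, 0x80000000u, 0xfffffffeu};
    for (uint32_t a : samples)
        assert(bit31_clear_mask(a) == bit31_clear_signed(a));
}
```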
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
Marko Mäkelä changed:
What|Removed |Added
CC||marko.makela at mariadb dot com
--- Comment #31 from Marko Mäkelä ---
Much of this seems to work in GCC 12.2.0 as well as in clang++-15. For clang
there is a related ticket https://github.com/llvm/llvm-project/issues/37322
I noticed a missed optimization in both g++-12 and clang++-15: Some operations
involving bit 31 degrade to loops around lock cmpxchg. I compiled it with "-c
-O2" (AMD64) or "-c -O2 -m32 -march=i686" (IA-32).
#include <atomic>
#include <cstdint>
template<uint32_t b>
void lock_bts(std::atomic<uint32_t> &a) { while (!(a.fetch_or(b) & b)); }
template<uint32_t b>
void lock_btr(std::atomic<uint32_t> &a) { while (a.fetch_and(~b) & b); }
template<uint32_t b>
void lock_btc(std::atomic<uint32_t> &a) { while (a.fetch_xor(b) & b); }
template void lock_bts<1U<<30>(std::atomic<uint32_t> &a);
template void lock_btr<1U<<30>(std::atomic<uint32_t> &a);
template void lock_btc<1U<<30>(std::atomic<uint32_t> &a);
// bug: uses lock cmpxchg
template void lock_bts<1U<<31>(std::atomic<uint32_t> &a);
template void lock_btr<1U<<31>(std::atomic<uint32_t> &a);
template void lock_btc<1U<<31>(std::atomic<uint32_t> &a);
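The three templates test exactly what the x86 bit-test instructions report: each copies the selected bit's old value into CF, then sets (BTS), resets (BTR) or complements (BTC) it. A non-atomic C++ model, assuming a 32-bit word and `bit` in 0..31 (function names hypothetical; the LOCK prefix, which makes the real instruction atomic, is not modeled). With `b = 1U << k`, `model_bts(w, k)` corresponds to `fetch_or(b) & b`, `model_btr` to `fetch_and(~b) & b` and `model_btc` to `fetch_xor(b) & b`:

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Each function returns the old value of the bit (the CF result),
// then updates the word the way the matching instruction does.
bool model_bts(uint32_t& word, unsigned bit) {
    const uint32_t mask = uint32_t{1} << bit;
    const bool old_bit = (word & mask) != 0;
    word |= mask;   // set
    return old_bit;
}

bool model_btr(uint32_t& word, unsigned bit) {
    const uint32_t mask = uint32_t{1} << bit;
    const bool old_bit = (word & mask) != 0;
    word &= ~mask;  // reset
    return old_bit;
}

bool model_btc(uint32_t& word, unsigned bit) {
    const uint32_t mask = uint32_t{1} << bit;
    const bool old_bit = (word & mask) != 0;
    word ^= mask;   // complement
    return old_bit;
}

void check_models() {
    uint32_t w = 0;
    assert(!model_bts(w, 31) && w == 0x80000000u);  // was clear, now set
    assert(model_btr(w, 31) && w == 0u);            // was set, now clear
    assert(!model_btc(w, 30) && w == 0x40000000u);  // was clear, now set
}
```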
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #30 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:fb161782545224f55ba26ba663889c5e6e9a04d1

commit r12-5102-gfb161782545224f55ba26ba663889c5e6e9a04d1
Author: liuhongt
Date: Mon Oct 25 13:59:51 2021 +0800

Improve integer bit test on __atomic_fetch_[or|and]_* returns

commit adedd5c173388ae505470df152b9cb3947339566
Author: Jakub Jelinek
Date: Tue May 3 13:37:25 2016 +0200

re PR target/49244 (__sync or __atomic builtins will not emit 'lock bts/btr/btc')

optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns
with lock bts/btr/btc by turning

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

into

  _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3);
  _5 = _4;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

into

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0);
  _4 = _11 != 0;

But it failed to optimize many equivalent, but slightly different cases:

1.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _4 = (_Bool) _1;
2.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _4 = (_Bool) _1;
3.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
4.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _7 = ~_1;
  _5 = (_Bool) _7;
5.
  _1 = __atomic_fetch_or_4 (ptr_6, 1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
6.
  _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3);
  _2 = (int) _1;
  _7 = ~_2;
  _5 = (_Bool) _7;
7.
  _1 = __atomic_fetch_or_4 (ptr_6, 0x8000, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
8.
  _1 = __atomic_fetch_and_4 (ptr_6, 0x7fff, _3);
  _5 = (signed int) _1;
  _4 = _5 < 0;
9.
  _1 = 1 << bit_4(D);
  mask_5 = (unsigned int) _1;
  _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0);
  _3 = _2 & mask_5;
10.
  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _2 = (unsigned int) _1;
  _3 = __atomic_fetch_and_4 (v_9(D), _2, 0);
  _4 = (int) _3;
  _5 = _4 & mask_7;

We make

  mask_2 = 1 << cnt_1;
  _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3);
  _5 = _4 & mask_2;

and

  mask_6 = 1 << bit_5(D);
  _1 = ~mask_6;
  _2 = __atomic_fetch_and_4 (v_8(D), _1, 0);
  _3 = _2 & mask_6;
  _4 = _3 != 0;

the canonical forms for this optimization and transform cases 1-9 to the
equivalent canonical form. For cases 10 and 11, we simply remove the cast
before __atomic_fetch_or_4/__atomic_fetch_and_4 with

  _1 = 1 << bit_4(D);
  _2 = __atomic_fetch_or_4 (v_7(D), _1, 0);
  _3 = _2 & _1;

and

  mask_7 = 1 << bit_6(D);
  _1 = ~mask_7;
  _3 = __atomic_fetch_and_4 (v_9(D), _1, 0);
  _6 = _3 & mask_7;
  _5 = (int) _6;

2021-11-04 H.J. Lu
Hongtao Liu

gcc/

PR middle-end/102566
* match.pd (nop_atomic_bit_test_and_p): New match.
* tree-ssa-ccp.c (convert_atomic_bit_not): New function.
(gimple_nop_atomic_bit_test_and_p): New prototype.
(optimize_atomic_bit_test_and): Transform equivalent, but slightly
different cases to their canonical forms.

gcc/testsuite/

PR middle-end/102566
* g++.target/i386/pr102566-1.C: New test.
* g++.target/i386/pr102566-2.C: Likewise.
* g++.target/i386/pr102566-3.C: Likewise.
* g++.target/i386/pr102566-4.C: Likewise.
* g++.target/i386/pr102566-5a.C: Likewise.
* g++.target/i386/pr102566-5b.C: Likewise.
* g++.target/i386/pr102566-6a.C: Likewise.
* g++.target/i386/pr102566-6b.C: Likewise.
* gcc.target/i386/pr102566-1a.c: Likewise.
* gcc.target/i386/pr102566-1b.c: Likewise.
* gcc.target/i386/pr102566-2.c: Likewise.
* gcc.target/i386/pr102566-3a.c: Likewise.
* gcc.target/i386/pr102566-3b.c: Likewise.
* gcc.target/i386/pr102566-4.c: Likewise.
* gcc.target/i386/pr102566-5.c: Likewise.
* gcc.target/i386/pr102566-6.c: Likewise.
* gcc.target/i386/pr102566-7.c: Likewise.
* gcc.target/i386/pr102566-8a.c: Likewise.
* gcc.target/i386/pr102566-8b.c: Likewise.
* gcc.target/i386/pr102566-9a.c: Likewise.
* gcc.target/i386/pr102566-9b.c: Likewise.
* gcc.target/i386/pr102566-10a.c: Likewise.
* gcc.target/i386/pr102566-10b.c: Likewise.
* gcc.target/i386/pr102566-11.c: Likewise.
* gcc.target/i386/pr102566-12.c: Likewise.
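The commit's cast-removal cases (the ones whose mask passes through `mask_5 = (unsigned int) _1` or `_2 = (unsigned int) _1`) differ from the canonical forms only by an int/unsigned conversion, and dropping it is safe because that conversion keeps the bit pattern on a 32-bit two's-complement target. A C++ sketch of that property (helper names hypothetical; `bit` stays in 0..30 to sidestep the signed-shift rules for `1 << 31`, which differ between C and C++ versions):

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;

// Mask computed in int, then converted to unsigned (the cast being removed).
uint32_t mask_via_int(unsigned bit) { return static_cast<uint32_t>(1 << bit); }

// Mask computed directly in unsigned.
uint32_t mask_via_unsigned(unsigned bit) { return 1U << bit; }

// Complemented mask for fetch_and; int -> unsigned conversion is modular.
uint32_t not_mask_via_int(unsigned bit) {
    return static_cast<uint32_t>(~(1 << bit));
}

void check_cast_is_bit_preserving() {
    for (unsigned bit = 0; bit < 31; ++bit) {
        assert(mask_via_int(bit) == mask_via_unsigned(bit));
        assert(not_mask_via_int(bit) == static_cast<uint32_t>(~mask_via_unsigned(bit)));
    }
}
```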
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #29 from Thiago Macieira ---
New suggestion in bug 103090
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #28 from Hongtao.liu ---
Can the following be optimized
int gomp_futex_wake = FUTEX_WAKE | FUTEX_PRIVATE_FLAG;
int gomp_futex_wait = FUTEX_WAIT | FUTEX_PRIVATE_FLAG;
void
gomp_mutex_lock_slow (gomp_mutex_t *mutex, int oldval)
{
/* First loop spins a while. */
while (oldval == 1)
{
if (do_spin (mutex, 1))
{
/* Spin timeout, nothing changed. Set waiting flag. */
oldval = __atomic_exchange_n (mutex, -1, MEMMODEL_ACQUIRE);
if (oldval == 0)
return;
futex_wait (mutex, -1);
break;
}
else
{
/* Something changed. If now unlocked, we're good to go. */
oldval = 0;
if (__atomic_compare_exchange_n (mutex, &oldval, 1, false,
MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
return;
}
}
/* Second loop waits until mutex is unlocked. We always exit this
loop with wait flag set, so next unlock will awaken a thread. */
while ((oldval = __atomic_exchange_n (mutex, -1, MEMMODEL_ACQUIRE)))
do_wait (mutex, -1);
}
with __atomic_fetch_or/and/xor?
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
H.J. Lu changed:
What|Removed |Added
Attachment #51559 is obsolete|0 |1
--- Comment #27 from H.J. Lu ---
Created attachment 51580
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51580&action=edit
The new v4 patch

Changes in v4:

1. Bypass redundant check when inputs have been transformed to the
equivalent canonical form with valid bit operation.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #26 from Thiago Macieira ---
(In reply to H.J. Lu from comment #25)
> Can you get some performance improvement data on real workloads?

Will ask.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #25 from H.J. Lu ---
(In reply to Thiago Macieira from comment #24)
> (In reply to H.J. Lu from comment #23)
> > I renamed the commit title. The new v3 is the v6 + fixes.
>
> Got it. Still no issues.

Can you get some performance improvement data on real workloads?
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #24 from Thiago Macieira ---
(In reply to H.J. Lu from comment #23)
> I renamed the commit title. The new v3 is the v6 + fixes.

Got it. Still no issues.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #23 from H.J. Lu ---
(In reply to Thiago Macieira from comment #22)
> (In reply to H.J. Lu from comment #21)
> > Created attachment 51559 [details]
> > The new v3 patch
> >
> > The new v3 patch to check invalid mask.
>
> v3? We were already up to v6.

I renamed the commit title. The new v3 is the v6 + fixes.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #22 from Thiago Macieira ---
(In reply to H.J. Lu from comment #21)
> Created attachment 51559 [details]
> The new v3 patch
>
> The new v3 patch to check invalid mask.

v3? We were already up to v6.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
H.J. Lu changed:
What|Removed |Added
Attachment #51558 is obsolete|0 |1
--- Comment #21 from H.J. Lu ---
Created attachment 51559
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51559&action=edit
The new v3 patch

The new v3 patch to check invalid mask.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #20 from Thiago Macieira ---
And:
$ cat /tmp/test.cpp
#include <atomic>
bool tbit(std::atomic &i)
{
return i.fetch_xor(CONSTANT, std::memory_order_relaxed) & (CONSTANT);
}
$ ~/dev/gcc/bin/gcc "-DCONSTANT=(1LL<<63)" -S -o - -O2 /tmp/test.cpp | sed '1,/startproc/d;/endproc/,$d'
lock btcq $63, (%rdi)
setc %al
ret
Nice!
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
Bug 102566 depends on bug 49244, which changed state.

Bug 49244 Summary: __sync or __atomic builtins will not emit 'lock bts/btr/btc'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49244

What|Removed |Added
Status|NEW |RESOLVED
Resolution|--- |FIXED
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #19 from Thiago Macieira ---
(In reply to H.J. Lu from comment #17)
> Created attachment 51558 [details]
> The v6 patch
>
> Please try this.

Confirmed for all inputs.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #18 from H.J. Lu ---
(In reply to Andrew Pinski from comment #16)
> (In reply to H.J. Lu from comment #14)
> > Created attachment 51556 [details]
> > The v5 patch
> >
> > Changes in v5:
> >
> > 1. Check SSA_NAME before SSA_NAME_OCCURS_IN_ABNORMAL_PHI.
>
> Why don't you just move this to match.pd instead as suggested by Richard B.
> on the mailing list? Then you get the check for
> SSA_NAME_OCCURS_IN_ABNORMAL_PHI for free and such. Plus other passes will
> do the optimization too

Without __atomic_fetch_or_* or __atomic_fetch_and_*, the conversion isn't
needed. We also need to check the mask of the atomic builtin.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
H.J. Lu changed:
What|Removed |Added
Attachment #51556 is obsolete|0 |1
--- Comment #17 from H.J. Lu ---
Created attachment 51558
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51558&action=edit
The v6 patch

Please try this.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #16 from Andrew Pinski ---
(In reply to H.J. Lu from comment #14)
> Created attachment 51556 [details]
> The v5 patch
>
> Changes in v5:
>
> 1. Check SSA_NAME before SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

Why don't you just move this to match.pd instead, as suggested by Richard B.
on the mailing list? Then you get the check for
SSA_NAME_OCCURS_IN_ABNORMAL_PHI for free and such. Plus other passes will
do the optimization too.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #15 from Thiago Macieira ---
Works now for the failing case. Additionally:
bool tbit(std::atomic &i)
{
return i.fetch_and(~CONSTANT, std::memory_order_relaxed) & (CONSTANT);
}
Will properly produce LOCK BTR (CONSTANT=2):
lock btrq $1, (%rdi)
setc %al
ret
CONSTANT=(1L<<62):
lock btrq $62, (%rdi)
setc %al
ret
But not for CONSTANT=1 or CONSTANT=(1L<<63):
movq (%rdi), %rax
.L2:
movq %rax, %rcx
movq %rax, %rdx
andq $-2, %rcx
lock cmpxchgq %rcx, (%rdi)
jne .L2
movl %edx, %eax
andl $1, %eax
ret
Same applies to 1<<31 for atomic<int>.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
H.J. Lu changed:
What|Removed |Added
Attachment #51551 is obsolete|0 |1
--- Comment #14 from H.J. Lu ---
Created attachment 51556
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51556&action=edit
The v5 patch

Changes in v5:

1. Check SSA_NAME before SSA_NAME_OCCURS_IN_ABNORMAL_PHI.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
H.J. Lu changed:
What|Removed |Added
Attachment #51549 is obsolete|0 |1
--- Comment #13 from H.J. Lu ---
Created attachment 51551
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51551&action=edit
The v4 patch

Please try the v4 patch.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #12 from Thiago Macieira ---
Commit 7e0c0500808d58bca5b8e23cbd474022c32234e4 + your patch.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #11 from Thiago Macieira ---
$ for ((i=0;i<32;++i)); do ~/dev/gcc/bin/gcc "-DCONSTANT=(1<<$i)" -S -o - -O2 /tmp/test.cpp | grep bts; done
lock btsl $0, (%rdi)
lock btsl $1, (%rdi)
lock btsl $2, (%rdi)
lock btsl $3, (%rdi)
lock btsl $4, (%rdi)
lock btsl $5, (%rdi)
lock btsl $6, (%rdi)
lock btsl $7, (%rdi)
lock btsl $8, (%rdi)
lock btsl $9, (%rdi)
lock btsl $10, (%rdi)
lock btsl $11, (%rdi)
lock btsl $12, (%rdi)
lock btsl $13, (%rdi)
lock btsl $14, (%rdi)
lock btsl $15, (%rdi)
lock btsl $16, (%rdi)
lock btsl $17, (%rdi)
lock btsl $18, (%rdi)
lock btsl $19, (%rdi)
lock btsl $20, (%rdi)
lock btsl $21, (%rdi)
lock btsl $22, (%rdi)
lock btsl $23, (%rdi)
lock btsl $24, (%rdi)
lock btsl $25, (%rdi)
lock btsl $26, (%rdi)
lock btsl $27, (%rdi)
lock btsl $28, (%rdi)
lock btsl $29, (%rdi)
lock btsl $30, (%rdi)
lock btsl $31, (%rdi)
And after changing to long:
$ for ((i=32;i<64;++i)); do ~/dev/gcc/bin/gcc "-DCONSTANT=(1L<<$i)" -S -o - -O2 /tmp/test.cpp | grep bts; done
lock btsq $32, (%rdi)
lock btsq $33, (%rdi)
lock btsq $34, (%rdi)
lock btsq $35, (%rdi)
lock btsq $36, (%rdi)
lock btsq $37, (%rdi)
lock btsq $38, (%rdi)
lock btsq $39, (%rdi)
lock btsq $40, (%rdi)
lock btsq $41, (%rdi)
lock btsq $42, (%rdi)
lock btsq $43, (%rdi)
lock btsq $44, (%rdi)
lock btsq $45, (%rdi)
lock btsq $46, (%rdi)
lock btsq $47, (%rdi)
lock btsq $48, (%rdi)
lock btsq $49, (%rdi)
lock btsq $50, (%rdi)
lock btsq $51, (%rdi)
lock btsq $52, (%rdi)
lock btsq $53, (%rdi)
lock btsq $54, (%rdi)
lock btsq $55, (%rdi)
lock btsq $56, (%rdi)
lock btsq $57, (%rdi)
lock btsq $58, (%rdi)
lock btsq $59, (%rdi)
lock btsq $60, (%rdi)
lock btsq $61, (%rdi)
lock btsq $62, (%rdi)
lock btsq $63, (%rdi)
But:
$ cat /tmp/test2.cpp
#include <atomic>
bool tbit(std::atomic<long> &i)
{
return i.fetch_or(1, std::memory_order_relaxed) & (~1);
}
$ ~/dev/gcc/bin/gcc -S -o - -O2 /tmp/test2.cpp
.file "test.cpp"
.text
/tmp/test.cpp: In function ‘bool tbit(std::atomic<long int>&)’:
/tmp/test.cpp:2:6: error: type mismatch in binary expression
2 | bool tbit(std::atomic &i)
| ^~~~
long int
long unsigned int
__int_type
_9 = _6 & -2;
during GIMPLE pass: fab
/tmp/test.cpp:2:6: internal compiler error: verify_gimple failed
0x119fbba verify_gimple_in_cfg(function*, bool)
/home/tjmaciei/src/gcc/gcc/tree-cfg.c:5576
0x106ced7 execute_function_todo
/home/tjmaciei/src/gcc/gcc/passes.c:2042
0x106d8fb execute_todo
/home/tjmaciei/src/gcc/gcc/passes.c:2096
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
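The failing input tests a mask (`~1`) that does not match the OR mask (`1`), so a bit-test rewrite should leave it untouched; a later patch revision in this thread is described as checking for an invalid mask. A hypothetical C++ sketch of such a validity check (not GCC's code in `tree-ssa-ccp`), assuming the rewrite to `lock bts/btr/btc` is only sound when the read-modify-write mask and the tested mask select the same single bit:

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// True when m has exactly one bit set.
bool is_single_bit(uint64_t m) { return m != 0 && (m & (m - 1)) == 0; }

// Hypothetical check for rewriting `old = fetch_op(p, rmw_mask); old & test_mask`
// into one LOCK BTS/BTR/BTC. For fetch_and, pass the un-complemented mask
// (the operand itself is ~mask). Returns the bit index, or -1 to skip.
int foldable_bit(uint64_t rmw_mask, uint64_t test_mask) {
    if (rmw_mask != test_mask || !is_single_bit(test_mask))
        return -1;
    int bit = 0;
    while ((test_mask >> bit) != 1)
        ++bit;
    return bit;
}

void check_foldable() {
    assert(foldable_bit(1, 1) == 0);                                   // fetch_or(1) & 1
    assert(foldable_bit(uint64_t{1} << 63, uint64_t{1} << 63) == 63);  // sign bit
    assert(foldable_bit(1, ~uint64_t{1}) == -1);                       // fetch_or(1) & ~1
    assert(foldable_bit(6, 6) == -1);                                  // two bits set
}
```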
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
H.J. Lu changed:
What|Removed |Added
Attachment #51543 is obsolete|0 |1
--- Comment #10 from H.J. Lu ---
Created attachment 51549
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51549&action=edit
The v3 patch

Please try the v3 patch.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #9 from Thiago Macieira ---
Looks like it doesn't work for the sign bit.
$ cat /tmp/test.cpp
#include <atomic>
bool tbit(std::atomic &i)
{
return i.fetch_or(CONSTANT, std::memory_order_relaxed) & CONSTANT;
}
$ ~/dev/gcc/bin/gcc -DCONSTANT='(1<<30)' -S -o - -O2 /tmp/test.cpp | sed -n '/startproc/,/endproc/p'
.cfi_startproc
lock btsl $30, (%rdi)
setc %al
ret
.cfi_endproc
$ ~/dev/gcc/bin/gcc -DCONSTANT='(1<<31)' -S -o - -O2 /tmp/test.cpp | sed -n '/startproc/,/endproc/p'
.cfi_startproc
movl (%rdi), %eax
.L2:
movl %eax, %ecx
movl %eax, %edx
orl $-2147483648, %ecx
lock cmpxchgl %ecx, (%rdi)
jne .L2
shrl $31, %edx
movl %edx, %eax
ret
.cfi_endproc
Changing to std::atomic makes no difference.
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #8 from Thiago Macieira ---
$ cat /tmp/test.cpp
#include <atomic>
bool tbit(std::atomic<int> &i)
{
return i.fetch_or(1, std::memory_order_relaxed) & 1;
}
$ ~/dev/gcc/bin/gcc -S -o - -O2 /tmp/test.cpp
.file "test.cpp"
.text
.p2align 4
.globl _Z4tbitRSt6atomicIiE
.type _Z4tbitRSt6atomicIiE, @function
_Z4tbitRSt6atomicIiE:
.LFB339:
.cfi_startproc
lock btsl $0, (%rdi)
setc %al
ret
.cfi_endproc
.LFE339:
.size _Z4tbitRSt6atomicIiE, .-_Z4tbitRSt6atomicIiE
.ident "GCC: (GNU) 12.0.0 20211004 (experimental)"
.section .note.GNU-stack,"",@progbits
+1
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #7 from Thiago Macieira ---
(In reply to H.J. Lu from comment #5)
> Created attachment 51536 [details]
> A patch
>
> Please try this.

Give me an hour (will try v2).
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
H.J. Lu changed:
What|Removed |Added
Attachment #51536 is obsolete|0 |1
--- Comment #6 from H.J. Lu ---
Created attachment 51543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51543&action=edit
The v2 patch
[Bug middle-end/102566] [i386] GCC should emit LOCK BTS for simple bit-test-and-set operations with std::atomic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102566
--- Comment #5 from H.J. Lu ---
Created attachment 51536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51536&action=edit
A patch

Please try this.
