[Bug target/25671] test_bit() compilation does not expand to "bt" instruction

2021-08-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671

--- Comment #10 from Andrew Pinski  ---
With the fixed testcase we get:

        movq    %rsi, %rax
        movq    %rsi, %rcx
        shrq    $6, %rax
        andl    $63, %ecx
        movq    (%rdi,%rax,8), %rax
        shrq    %cl, %rax
        andl    $1, %eax

ICC does produce the btq, but still with extra instructions:
        movq      %rsi, %rax                #5.25
        shrq      $6, %rax                  #5.25
        movq      (%rdi,%rax,8), %rdx       #5.13
        xorl      %eax, %eax                #5.61
        btq       %rsi, %rdx                #5.61
        setc      %al
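
For readers following the two listings, here is a minimal C sketch of what the GCC sequence computes: pick the word with a shift, move the wanted bit down to position 0, and mask it. The name test_bit_shift and the unsigned long index are assumptions made for illustration; the exact source behind the listings is not shown in this comment.

```c
#include <limits.h>

/* Hypothetical helper mirroring the GCC listing above. */
static int test_bit_shift(const unsigned long *words, unsigned long bit)
{
    const unsigned long wsize = sizeof *words * CHAR_BIT; /* 64 on x86-64 */
    unsigned long word = words[bit / wsize];   /* shrq $6 + indexed load */
    return (int)((word >> (bit % wsize)) & 1); /* shrq %cl + andl $1 */
}
```

The ICC listing does the same word load and then lets btq select the bit from the low bits of the index, which is why it needs no separate mask of the index.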

[Bug target/25671] test_bit() compilation does not expand to "bt" instruction

2021-08-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671

--- Comment #9 from Andrew Pinski  ---
Note there is a bug in the original testcase.
It should be:

int test_bit(unsigned long *words, int bit)
{
    int wsize = (sizeof *words) * 8;
    return (words[bit / wsize] & (1ul << (bit % wsize))) != 0;
}

If int is 32 bits and long is 64 bits, you would have gotten the wrong result, because the mask has to be built from 1ul to reach bits 32 and up.
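
A small sketch of why the 1ul matters, assuming the original testcase built the mask from a plain int constant. Shifting an int by its width or more is undefined in C, so only an unsigned long mask can address the upper half of a 64-bit word. The name test_bit_fixed is hypothetical, and CHAR_BIT stands in for the literal 8.

```c
#include <limits.h>

/* The corrected testcase, renamed for illustration. */
int test_bit_fixed(unsigned long *words, int bit)
{
    int wsize = (int)(sizeof *words * CHAR_BIT);
    /* 1ul makes the shift happen in unsigned long, so any bit
       position below wsize is a valid shift count. */
    return (words[bit / wsize] & (1ul << (bit % wsize))) != 0;
}
```

With this version, testing the top bit of a word (bit 63 where long is 64 bits) is well defined.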

[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-04-11 Thread avi at argo dot co dot il


--- Comment #4 from avi at argo dot co dot il  2006-04-11 15:36 ---
Benchmark results, 32-bit code, various methods

On an Athlon 64:

   bts reg, (reg):   1 cycle
   bts reg, (mem):   3 cycles
   C code (reg):     1 cycle
   C code (mem):     5 cycles

On a Xeon:

   bts reg, (reg):   6 cycles
   bts reg, (mem):  15 cycles
   C code (reg):     1 cycle
   C code (mem):     5 cycles

Looks like a very small win on Athlon 64 when modifying memory.
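
For context, the "C code" rows presumably refer to a plain shift-and-OR set_bit. The benchmark attachment is not reproduced in this thread, so the following is only a sketch of that generic form; the name set_bit_generic and the unsigned long word type are assumptions.

```c
#include <limits.h>

/* Hypothetical stand-in for the generic variant: OR a one-bit mask
   into the word that holds the requested bit. */
static void set_bit_generic(unsigned long *words, unsigned long bit)
{
    const unsigned long wsize = sizeof *words * CHAR_BIT;
    words[bit / wsize] |= 1ul << (bit % wsize);
}
```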



[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-04-11 Thread avi at argo dot co dot il


--- Comment #5 from avi at argo dot co dot il  2006-04-11 15:38 ---
Created an attachment (id=11243)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11243&action=view)
benchmark for various set_bit() implementations



[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-04-11 Thread avi at argo dot co dot il


--- Comment #6 from avi at argo dot co dot il  2006-04-11 15:39 ---
Oops, the benchmark was for bts. Will redo it for bt.



[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-04-11 Thread avi at argo dot co dot il


--- Comment #7 from avi at argo dot co dot il  2006-04-11 15:53 ---
Created an attachment (id=11244)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11244&action=view)
bt instruction benchmark

Redid the test for test_bit(), this time always forcing a memory access:

Athlon 64:

 bt:  3 cycles
 generic: 3 cycles

Xeon:

 bt: 10 cycles
 generic: 4 cycles

So bt might be usable for -Os, but it is likely not worth the effort.
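
To make the two contenders concrete, here is a hedged sketch of a bt-based test_bit next to the generic form. It is not the attached benchmark and not what GCC emits. The wrapper name test_bit_bt is hypothetical, it uses GNU extended inline assembly, and it falls back to plain C when not compiled for LP64 x86-64.

```c
#include <limits.h>

/* Hypothetical wrapper around btq with a memory operand. */
static int test_bit_bt(const unsigned long *words, unsigned long bit)
{
#if defined(__x86_64__) && defined(__LP64__) && defined(__GNUC__)
    unsigned char carry;
    /* With a memory operand, btq treats memory as a bit string starting
       at the given address, so it selects the word itself and copies
       the addressed bit into the carry flag. */
    __asm__ volatile("btq %2, %1\n\t"
                     "setc %0"
                     : "=q"(carry)
                     : "m"(*words), "r"(bit)
                     : "cc", "memory");
    return carry;
#else
    /* Generic fallback: index the word, shift, and mask. */
    const unsigned long wsize = sizeof *words * CHAR_BIT;
    return (int)((words[bit / wsize] >> (bit % wsize)) & 1);
#endif
}
```

The "memory" clobber is there because the instruction may read a word other than the one named by the "m" operand.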



[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-04-11 Thread steven at gcc dot gnu dot org


--- Comment #8 from steven at gcc dot gnu dot org  2006-04-11 23:03 ---
Code size issue


-- 

steven at gcc dot gnu dot org changed:

                     What|Removed |Added

 OtherBugsDependingOnThis|        |16996



[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-04-10 Thread steven at gcc dot gnu dot org


--- Comment #2 from steven at gcc dot gnu dot org  2006-04-10 20:18 ---
The resulting code for -march=opteron:

test_bit:
.LFB2:
        leal    63(%rsi), %edx
        testl   %esi, %esi
        movl    %esi, %eax
        cmovns  %esi, %edx
        sarl    $31, %eax
        shrl    $26, %eax
        sarl    $6, %edx
        leal    (%rsi,%rax), %ecx
        movslq  %edx,%rdx
        andl    $63, %ecx
        subl    %eax, %ecx
        movl    $1, %eax
        sall    %cl, %eax
        cltq
        testq   %rax, (%rdi,%rdx,8)
        setne   %al
        movzbl  %al, %eax
        ret

For -march=nocona the code is even uglier.
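
Part of the length above appears to come from the signed index: bit / wsize and bit % wsize on an int must round toward zero, which is what the leal 63/cmovns/sarl and sarl $31/shrl $26 fix-ups implement. As a sketch only, the two hypothetical functions below show the signed form next to an unsigned-index variant (not part of the report), where division and remainder by a power of two need no such fix-up. They agree for every non-negative bit.

```c
#include <limits.h>

/* Signed index, as in the testcase: / and % round toward zero. */
static int test_bit_signed(const unsigned long *words, int bit)
{
    int wsize = (int)(sizeof *words * CHAR_BIT);
    return (words[bit / wsize] & (1ul << (bit % wsize))) != 0;
}

/* Hypothetical unsigned-index variant: / and % by a power of two can
   be compiled to a plain shift and mask. */
static int test_bit_unsigned(const unsigned long *words, unsigned int bit)
{
    unsigned int wsize = (unsigned int)(sizeof *words * CHAR_BIT);
    return (words[bit / wsize] & (1ul << (bit % wsize))) != 0;
}
```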



[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-04-10 Thread steven at gcc dot gnu dot org


--- Comment #3 from steven at gcc dot gnu dot org  2006-04-10 20:31 ---
This is what the i386 machine description has to say about BT and friends:

;; %%% bts, btr, btc, bt.
;; In general these instructions are *slow* when applied to memory,
;; since they enforce atomic operation.  When applied to registers,
;; it depends on the cpu implementation.  They're never faster than
;; the corresponding and/ior/xor operations, so with 32-bit there's
;; no point.  But in 64-bit, we can't hold the relevant immediates
;; within the instruction itself, so operating on bits in the high
;; 32-bits of a register becomes easier.
;;
;; These are slow on Nocona, but fast on Athlon64.  We do require the use
;; of btrq and btcq for corner cases of post-reload expansion of absdf and
;; negdf respectively, so they can never be disabled entirely.



[Bug target/25671] test_bit() compilation does not expand to bt instruction

2006-01-04 Thread pinskia at gcc dot gnu dot org


--- Comment #1 from pinskia at gcc dot gnu dot org  2006-01-04 15:33 ---
Confirmed, not a regression.


-- 

pinskia at gcc dot gnu dot org changed:

                  What|Removed            |Added

              Severity|minor              |enhancement
                Status|UNCONFIRMED        |NEW
        Ever Confirmed|0                  |1
              Keywords|                   |missed-optimization
 Last reconfirmed date|-00-00 00:00:00    |2006-01-04 15:33:26

