[Bug target/25671] test_bit() compilation does not expand to "bt" instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671

--- Comment #10 from Andrew Pinski ---
With the fixed testcase we get:

        movq    %rsi, %rax
        movq    %rsi, %rcx
        shrq    $6, %rax
        andl    $63, %ecx
        movq    (%rdi,%rax,8), %rax
        shrq    %cl, %rax
        andl    $1, %eax

ICC can produce the btq but with extra instructions still:

        movq      %rsi, %rax            #5.25
        shrq      $6, %rax              #5.25
        movq      (%rdi,%rax,8), %rdx   #5.13
        xorl      %eax, %eax            #5.61
        btq       %rsi, %rdx            #5.61
        setc      %al
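As a reading aid, the GCC sequence above corresponds roughly to the C below. This is only a sketch: the function name is made up, and the unsigned 64-bit index and word type are inferred from the shrq/andl on %rsi rather than stated in the comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; follows the GCC output above step by step.
   Assumes 64-bit words and an unsigned bit index. */
int test_bit_shift(const uint64_t *words, uint64_t bit)
{
    uint64_t word = words[bit >> 6];      /* shrq $6; movq (%rdi,%rax,8) */
    unsigned off = (unsigned)(bit & 63);  /* andl $63, %ecx */
    return (int)((word >> off) & 1);      /* shrq %cl; andl $1 */
}

/* Small usage check: bit 63 of word 0 and bit 3 of word 1 are set. */
void check_test_bit_shift(void)
{
    uint64_t w[2] = { UINT64_C(1) << 63, UINT64_C(1) << 3 };
    assert(test_bit_shift(w, 63) == 1);
    assert(test_bit_shift(w, 67) == 1);
    assert(test_bit_shift(w, 64) == 0);
}
```

The ICC output reaches the same result with one fewer masking step, since btq with a register destination takes the bit offset modulo 64 on its own.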
[Bug target/25671] test_bit() compilation does not expand to "bt" instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671

--- Comment #9 from Andrew Pinski ---
Note there is a bug in the original testcase. It should be:

int test_bit(unsigned long *words, int bit)
{
    int wsize = (sizeof *words) * 8;
    return (words[bit / wsize] & (1ul << (bit % wsize))) != 0;
}

If int is 32bit and long is 64bit, you would have gotten the wrong result.
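To illustrate why the 1ul matters, here is the corrected function with a small check that exercises the top bit of a word. The check function name is made up. The original testcase is not quoted in this comment; the -march=opteron output in comment #2 shifts a 32-bit 1 and sign-extends it (sall, cltq), which suggests it used a plain int constant, but that is an inference.

```c
#include <assert.h>

/* Corrected testcase from the comment above. The mask is built from an
   unsigned long constant, so the shift happens at the width of *words.
   With a plain int 1 the shift would be done in int; shifting an int by
   its width or more is undefined, and a shift into the sign bit would be
   sign-extended when widened to long. */
int test_bit(unsigned long *words, int bit)
{
    int wsize = (sizeof *words) * 8;
    return (words[bit / wsize] & (1ul << (bit % wsize))) != 0;
}

/* Hypothetical usage check: set the highest bit of word 0 and the
   lowest bit of word 1, then read them back. */
void check_test_bit(void)
{
    int wsize = (int)(sizeof(unsigned long) * 8);
    unsigned long w[2] = { 1ul << (wsize - 1), 1ul };
    assert(test_bit(w, wsize - 1) == 1);
    assert(test_bit(w, wsize) == 1);
    assert(test_bit(w, 0) == 0);
}
```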
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #4 from avi at argo dot co dot il 2006-04-11 15:36 ---
Benchmark results, 32 bit code, various methods

On an athlon 64:

  bts reg, (reg): 1 cycle
  bts reg, (mem): 3 cycles
  C code (reg):   1 cycle
  C code (mem):   5 cycles

On a Xeon:

  bts reg, (reg): 6 cycles
  bts reg, (mem): 15 cycles
  C code (reg):   1 cycle
  C code (mem):   5 cycles

Looks like a very small win on athlon 64 when modifying memory.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #5 from avi at argo dot co dot il 2006-04-11 15:38 ---
Created an attachment (id=11243)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11243&action=view)
benchmark for various set_bit() implementations

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #6 from avi at argo dot co dot il 2006-04-11 15:39 ---
oops, the benchmark was for bts. will do again for bt.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #7 from avi at argo dot co dot il 2006-04-11 15:53 ---
Created an attachment (id=11244)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11244&action=view)
bt instruction benchmark

redone the test for test_bit(), this time always forcing a memory access:

Athlon 64:

  bt:      3 cycles
  generic: 3 cycles

Xeon:

  bt:      10 cycles
  generic: 4 cycles

so, bt might be usable for -Os, but likely not worth the effort.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671
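For readers without the attachment, the two variants being compared might look roughly like the sketch below. This is not the attached benchmark: the function names, the unsigned index, and the inline asm constraints are assumptions, and the bt path is only compiled for x86-64 LP64 targets with GCC-style extended asm (elsewhere it falls back to the generic code).

```c
#include <assert.h>

/* Generic C version: load the word, shift, mask. */
static int test_bit_generic(const unsigned long *words, unsigned long bit)
{
    const unsigned long wsize = sizeof *words * 8;
    return (int)((words[bit / wsize] >> (bit % wsize)) & 1ul);
}

#if defined(__x86_64__) && defined(__LP64__) && defined(__GNUC__)
/* bt with a memory operand and a register index treats the memory as a
   bit string, so it may read words beyond *words; the "memory" clobber
   tells the compiler about that. */
static int test_bit_bt(const unsigned long *words, unsigned long bit)
{
    unsigned char carry;
    __asm__ volatile("btq %2, %1\n\t"
                     "setc %0"
                     : "=q"(carry)
                     : "m"(*words), "r"(bit)
                     : "cc", "memory");
    return carry;
}
#else
#define test_bit_bt test_bit_generic   /* no bt available: reuse generic */
#endif

/* Hypothetical usage check: both variants must agree. */
void check_bt_variants(void)
{
    const unsigned long wsize = sizeof(unsigned long) * 8;
    unsigned long w[2] = { 1ul << 5, 1ul << 1 };
    for (unsigned long b = 0; b < 2 * wsize; b++)
        assert(test_bit_bt(w, b) == test_bit_generic(w, b));
}
```

The memory-operand form is the one the "always forcing a memory access" numbers would apply to; the register form (load the word first, then bt on the register) is what the ICC output in comment #10 uses.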
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #8 from steven at gcc dot gnu dot org 2006-04-11 23:03 ---
Code size issue

--
steven at gcc dot gnu dot org changed:

                    What|Removed                     |Added
OtherBugsDependingOnThis|                            |16996

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #2 from steven at gcc dot gnu dot org 2006-04-10 20:18 ---
The resulting code for -march=opteron:

test_bit:
.LFB2:
        leal    63(%rsi), %edx
        testl   %esi, %esi
        movl    %esi, %eax
        cmovns  %esi, %edx
        sarl    $31, %eax
        shrl    $26, %eax
        sarl    $6, %edx
        leal    (%rsi,%rax), %ecx
        movslq  %edx,%rdx
        andl    $63, %ecx
        subl    %eax, %ecx
        movl    $1, %eax
        sall    %cl, %eax
        cltq
        testq   %rax, (%rdi,%rdx,8)
        setne   %al
        movzbl  %al, %eax
        ret

For -march=nocona the code is even uglier.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671
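Much of the length above comes from bit being a signed int: C division and remainder round toward zero, so bit / 64 and bit % 64 cannot be a bare shift and mask. The sketch below spells out what the leal/cmovns/sarl and sarl/shrl/andl/subl groups appear to compute. The function names are made up, and the code assumes a 32-bit int and an arithmetic right shift of negative values (implementation-defined in C, but what GCC does on x86).

```c
#include <assert.h>

/* bit / 64 with round-toward-zero semantics:
   leal 63(%rsi); testl; cmovns; sarl $6 */
int div64_signed(int bit)
{
    int adj = bit < 0 ? bit + 63 : bit;  /* bias negatives before shifting */
    return adj >> 6;
}

/* bit % 64 with the sign of the dividend:
   sarl $31; shrl $26; leal (%rsi,%rax); andl $63; subl */
int mod64_signed(int bit)
{
    int bias = (int)((unsigned)(bit >> 31) >> 26);  /* 63 if bit < 0, else 0 */
    return ((bit + bias) & 63) - bias;
}

/* Hypothetical check against the built-in operators. */
void check_signed_helpers(void)
{
    static const int samples[] = { 0, 1, 63, 64, 100, -1, -63, -64, -65, -128 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(div64_signed(samples[i]) == samples[i] / 64);
        assert(mod64_signed(samples[i]) == samples[i] % 64);
    }
}
```

With an unsigned index both helpers collapse to bit >> 6 and bit & 63, which matches the much shorter sequence quoted in comment #10.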
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #3 from steven at gcc dot gnu dot org 2006-04-10 20:31 ---
This is what the i386 machine description has to say about BT and friends:

;; %%% bts, btr, btc, bt.
;; In general these instructions are *slow* when applied to memory,
;; since they enforce atomic operation.  When applied to registers,
;; it depends on the cpu implementation.  They're never faster than
;; the corresponding and/ior/xor operations, so with 32-bit there's
;; no point.  But in 64-bit, we can't hold the relevant immediates
;; within the instruction itself, so operating on bits in the high
;; 32-bits of a register becomes easier.
;;
;; These are slow on Nocona, but fast on Athlon64.  We do require the use
;; of btrq and btcq for corner cases of post-reload expansion of absdf and
;; negdf respectively, so they can never be disabled entirely.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671
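The absdf/negdf remark refers to the sign bit of a double, which sits at bit 63 and so falls in the "high 32 bits" case where a 64-bit and/xor mask does not fit in an instruction immediate. A sketch of the two operations at the bit level, assuming IEEE 754 binary64 doubles; the function names are made up, and whether a given GCC version actually emits btrq/btcq for this source is not claimed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* fabs as "reset bit 63" (the operation btrq $63 performs on a register). */
double abs_via_bit63(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    u &= ~(UINT64_C(1) << 63);
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Negation as "complement bit 63" (the operation btcq $63 performs). */
double neg_via_bit63(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    u ^= UINT64_C(1) << 63;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical usage check on exactly representable values. */
void check_bit63_helpers(void)
{
    assert(abs_via_bit63(-2.0) == 2.0);
    assert(abs_via_bit63(3.0) == 3.0);
    assert(neg_via_bit63(1.5) == -1.5);
    assert(neg_via_bit63(-0.25) == 0.25);
}
```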
[Bug target/25671] test_bit() compilation does not expand to bt instruction
--- Comment #1 from pinskia at gcc dot gnu dot org 2006-01-04 15:33 ---
Confirmed, not a regression.

--
pinskia at gcc dot gnu dot org changed:

                What|Removed                     |Added
            Severity|minor                       |enhancement
              Status|UNCONFIRMED                 |NEW
      Ever Confirmed|0                           |1
            Keywords|                            |missed-optimization
    Last reconfirmed|-00-00 00:00:00             |2006-01-04 15:33:26
                date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25671