[Bug target/82680] Use cmpXXss and cmpXXsd for setcc boolean compare

2021-08-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82680

--- Comment #4 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #3)
I should say this is with the following options:
/O2 /std:c++latest /arch:AVX2

2021-08-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-08-18
     Ever confirmed|0                           |1

--- Comment #3 from Andrew Pinski  ---
MSVC (64-bit, 19.30.30423.0) does:
vucomisd xmm0, xmm1
jp  SHORT $LN3@g
jne SHORT $LN3@g
mov eax, 1
ret 0
$LN3@g:
xor eax, eax
ret 0

MSVC (32-bit, 19.30.30423.0) does:
vmovsd  xmm0, QWORD PTR _x$[esp-4]
vucomisd xmm0, QWORD PTR _y$[esp-4]
mov edx, 1
lahf
xor ecx, ecx
test ah, 68  ; 0044H
cmovnp  ecx, edx
mov eax, ecx
; Line 5
ret 0

I don't know why the two differ, since both are the same compiler version.
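
For anyone reading the two listings side by side, here is a rough C model of
what each one computes.  It is only a sketch: the helper names (ucomisd_flags,
eq_branchy, eq_lahf) are made up for illustration, and the flag model assumes
the documented ucomisd behaviour (unordered sets ZF, PF and CF; equal sets only
ZF).  It is not compiler output.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits at the positions LAHF puts them in AH (other bits ignored). */
#define AH_CF 0x01u
#define AH_PF 0x04u
#define AH_ZF 0x40u

/* Hypothetical model of the flags written by (v)ucomisd x, y. */
static uint8_t ucomisd_flags(double x, double y)
{
    if (isnan(x) || isnan(y))
        return AH_ZF | AH_PF | AH_CF;   /* unordered */
    if (x == y)
        return AH_ZF;                   /* equal */
    if (x < y)
        return AH_CF;                   /* less than */
    return 0;                           /* greater than */
}

/* x86 PF is set when the low byte of a result has an even number of 1 bits. */
static bool parity_even(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (v & 1u) == 0;
}

/* 64-bit listing: jp and jne both jump to the "xor eax, eax" path. */
static int eq_branchy(double x, double y)
{
    uint8_t f = ucomisd_flags(x, y);
    if (f & AH_PF)
        return 0;                       /* jp: unordered */
    if (!(f & AH_ZF))
        return 0;                       /* jne: not equal */
    return 1;
}

/* 32-bit listing: lahf; test ah, 68; cmovnp ecx, edx. */
static int eq_lahf(double x, double y)
{
    uint8_t ah = ucomisd_flags(x, y);
    uint8_t t = ah & (AH_ZF | AH_PF);   /* test ah, 0x44 */
    /* t is 0x40 only for "equal", which has odd parity, so PF is clear
       and cmovnp selects 1.  0x00 and 0x44 both have even parity. */
    return parity_even(t) ? 0 : 1;
}
```

Under this model both sequences return 1 only for an ordered, equal compare,
so the difference is in how the result is materialized, not in what it is.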

Confirmed.

2017-10-24 Thread peter at cordes dot ca

--- Comment #2 from Peter Cordes  ---
gcc's sequence is *probably* good, as long as it uses xor / comisd / setcc and
not comisd / setcc / movzx (which gcc often likes to do for integer setcc).

(u)comisd and cmpeqsd both run on the FP add unit.  Agner Fog doesn't list the
latency.  (It's hard to measure, because you'd need to construct a round-trip
back to FP.)  XOR-zeroing is as cheap as a NOP on Intel SnB-family, but uses an
execution port on AMD, so gcc's sequence has the same number of front-end uops
but fewer unfused-domain uops for the execution units on SnB.  Also, the
xor-zeroing is off the critical path on all CPUs.  (But ucomisd latency is
probably as high as cmpeqsd + movd.)

Hmm, AMD bdver* and Ryzen take 2 uops for comisd, so for tune=generic it's
probably worth thinking about using ICC's sequence.

ICC's sequence is especially good if you're doing something with the integer
result that can optimize away the NEG.  (e.g. use it with AND instead of a CMOV
to conditionally zero something, or AND it with another condition).  Or if
you're storing the boolean result to memory, psrld $31, %xmm0 or PAND, then
movd directly to memory without going through integer regs.


comisd doesn't destroy either of its args, but cmpeqsd does (without AVX).  If
you want both x and y afterwards (e.g. if they weren't equal, or you care about
-0.0 and +0.0 being different even though they compare equal), then comisd is a
win.
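
A small C illustration of the -0.0 / +0.0 case, where code still needs both
operands after the compare.  The function names are hypothetical and this is
only a sketch of the kind of surrounding code that would favour a
non-destructive compare.

```c
#include <math.h>
#include <stdbool.h>

/* Plain IEEE equality: +0.0 and -0.0 compare equal. */
static bool same_value(double x, double y)
{
    return x == y;
}

/* Equality that also distinguishes the sign of zero.  Both x and y are
   read again after the compare, so neither may be overwritten by it. */
static bool same_value_and_sign(double x, double y)
{
    return x == y && (signbit(x) != 0) == (signbit(y) != 0);
}
```

Here same_value(0.0, -0.0) is true while same_value_and_sign(0.0, -0.0) is
false.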

So I think we need to look at the choices given some more surrounding code.

I'll hopefully look at this some more soon.

2017-10-23 Thread ubizjak at gmail dot com

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Uroš Bizjak  ---
Maybe Peter knows which version is the best.
