[Bug target/82261] x86: missing peephole for SHLD / SHRD

2022-04-09 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

--- Comment #4 from Peter Cordes  ---
GCC will emit SHLD / SHRD as part of shifting an integer that's two registers
wide.
Hironori Bono proposed the following functions as a workaround for this missed
optimization (https://stackoverflow.com/a/71805063/224132)

#include 

#ifdef __SIZEOF_INT128__
uint64_t shldq_x64(uint64_t low, uint64_t high, uint64_t count) {
  return (uint64_t)(unsigned __int128)high << 64) | (unsigned __int128)low)
<< (count & 63)) >> 64);
}

uint64_t shrdq_x64(uint64_t low, uint64_t high, uint64_t count) {
  return (uint64_t)unsigned __int128)high << 64) | (unsigned __int128)low)
>> (count & 63));
}
#endif

uint32_t shld_x86(uint32_t low, uint32_t high, uint32_t count) {
  return (uint32_t)(uint64_t)high << 32) | (uint64_t)low) << (count & 31))
>> 32);
}

uint32_t shrd_x86(uint32_t low, uint32_t high, uint32_t count) {
  return (uint32_t)uint64_t)high << 32) | (uint64_t)low) >> (count & 31));
}

---

The uint64_t functions (using __int128) compile cleanly in 64-bit mode
(https://godbolt.org/z/1j94Gcb4o) using 64-bit operand-size shld/shrd

but the uint32_t functions compile to a total mess in 32-bit mode (GCC11.2 -O3
-m32 -mregparm=3) before eventually using shld, including a totally insane 
or  dh, 0

GCC trunk with -O3 -mregparm=3 compiles them cleanly, but without regparm it's
also slightly different mess.

Ironically, the uint32_t functions compile to quite a few instructions in
64-bit mode, actually doing the operations as written with shifts and ORs, and
having to manually mask the shift count to &31 because it uses a 64-bit
operand-size shift which masks with &63.  32-bit operand-size SHLD would be a
win here, at least for -mtune=intel or a specific Intel uarch.

I haven't looked at whether they still compile ok after inlining into
surrounding code, or whether operations would tend to combine with other things
in preference to becoming an SHLD.

[Bug target/82261] x86: missing peephole for SHLD / SHRD

2020-05-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

--- Comment #3 from Uroš Bizjak  ---
(In reply to Michael Clark from comment #2)
> Just refreshing this issue. I found it while testing some code-gen on
> Godbolt:

The combiner creates:

Failed to match this instruction:
(parallel [
(set (reg:SI 89)
(ior:SI (ashift:SI (reg:SI 94)
(subreg:QI (reg/v:SI 88 [ n ]) 0))
(lshiftrt:SI (reg:SI 95)
(minus:QI (subreg:QI (reg:SI 91) 0)
(subreg:QI (reg/v:SI 88 [ n ]) 0)
(clobber (reg:CC 17 flags))
])

This is *almost* matched by:

(define_insn "x86_shld"
  [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
(ior:SI (ashift:SI (match_dup 0)
  (match_operand:QI 2 "nonmemory_operand" "Ic"))
(lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
  (minus:QI (const_int 32) (match_dup 2)
   (clobber (reg:CC FLAGS_REG))]

but RTL combiner doesn't propagate (const_int 32) into the pattern.

I wonder if tree combiner can help here.

[Bug target/82261] x86: missing peephole for SHLD / SHRD

2020-05-18 Thread michaeljclark at mac dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

Michael Clark  changed:

   What|Removed |Added

 CC||michaeljclark at mac dot com

--- Comment #2 from Michael Clark  ---
Just refreshing this issue. I found it while testing some code-gen on Godbolt:

- https://godbolt.org/z/uXGxZ9

I noticed that Haswell code-gen uses SHRX/SHLX, but I think -Os and pre-Haswell
would benefit from this peephole if it is not complex to add. Noting that Clang
prefers SHLD / SHRD over the SHRX+SHLX pair no matter the -march flavor.

[Bug target/82261] x86: missing peephole for SHLD / SHRD

2017-09-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

--- Comment #1 from Marc Glisse  ---
Related to PR 55583.