[Bug target/98957] [11 Regression] [x86] Odd code generation for 8-bit left shift

2021-02-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98957

--- Comment #5 from Jakub Jelinek  ---
Though, in this case it seems like the best fix is:
2021-02-04  Jakub Jelinek  

	PR target/98957
	* config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS,
	X86_TUNE_PROMOTE_QI_REGS): Use HOST_WIDE_INT_0U instead of 0U.
	(X86_TUNE_QIMODE_MATH): Use ~HOST_WIDE_INT_0U instead of ~0U.

--- gcc/config/i386/x86-tune.def.jj 2021-01-04 10:25:45.175162012 +0100
+++ gcc/config/i386/x86-tune.def	2021-02-04 10:56:20.031489884 +0100
@@ -580,15 +580,16 @@ DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE,
    on simulation result. But after P4 was made, no performance benefit
    was observed with branch hints.  It also increases the code size.
    As a result, icc never generates branch hints.  */
-DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints", 0U)
+DEF_TUNE (X86_TUNE_BRANCH_PREDICTION_HINTS, "branch_prediction_hints",
+	  HOST_WIDE_INT_0U)

 /* X86_TUNE_QIMODE_MATH: Enable use of 8bit arithmetic.  */
-DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", ~0U)
+DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", ~HOST_WIDE_INT_0U)

 /* X86_TUNE_PROMOTE_QI_REGS: This enables generic code that promotes all 8bit
    arithmetic to 32bit via PROMOTE_MODE macro.  This code generation scheme
    is usually used for RISC targets.  */
-DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", 0U)
+DEF_TUNE (X86_TUNE_PROMOTE_QI_REGS, "promote_qi_regs", HOST_WIDE_INT_0U)

 /* X86_TUNE_EMIT_VZEROUPPER: This enables vzeroupper instruction insertion
    before a transfer of control flow out of the function.  */

because disabling QImode math on all the PROCESSOR_* tunings whose index
happens to be >= 32 seems unintended: ~0U only sets the low 32 bits of the
wider tuning mask.  In GCC <= 7 no tuning was affected, in GCC 8 just
PROCESSOR_BTVER2 and PROCESSOR_ZNVER1, in GCC 9 PROCESSOR_BDVER{2,3,4},
PROCESSOR_BTVER{1,2} and PROCESSOR_ZNVER{1,2}.  GCC 10 added
PROCESSOR_AMDFAM10 to the GCC 9 set, and trunk adds PROCESSOR_ATHLON,
PROCESSOR_K8 and PROCESSOR_ZNVER3 to that set (the set of tunings that
disable QImode math).

2021-02-04 Thread jakub at gcc dot gnu.org via Gcc-bugs

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org,
                   |                            |sayle at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
The thing is that during combine, that change allows one further optimization.
After successfully optimizing the AND away:
Trying 8 -> 10:
    8: {r88:HI=r87:HI 0>>0x7;clobber flags:CC;}
      REG_DEAD r87:HI
      REG_UNUSED flags:CC
   10: {r90:HI=r88:HI&0x1;clobber flags:CC;}
      REG_DEAD r88:HI
      REG_UNUSED flags:CC
Successfully matched this instruction:
(parallel [
        (set (reg:HI 90)
            (lshiftrt:HI (reg:HI 87 [ m ])
                (const_int 7 [0x7])))
        (clobber (reg:CC 17 flags))
    ])
it adds it back again:
Trying 7, 10 -> 11:
    7: r87:HI=zero_extend(r91:SI#0)
      REG_DEAD r91:SI
   10: {r90:HI=r87:HI 0>>0x7;clobber flags:CC;}
      REG_DEAD r87:HI
      REG_UNUSED flags:CC
   11: r86:QI=r90:HI#0
      REG_DEAD r90:HI
Failed to match this instruction:
(set (subreg:HI (reg:QI 86) 0)
    (zero_extract:HI (subreg:HI (reg:SI 91) 0)
        (const_int 1 [0x1])
        (const_int 7 [0x7])))
Failed to match this instruction:
(set (subreg:HI (reg:QI 86) 0)
    (and:HI (lshiftrt:HI (subreg:HI (reg:SI 91) 0)
            (const_int 7 [0x7]))
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (reg:HI 90)
    (lshiftrt:HI (subreg:HI (reg:SI 91) 0)
        (const_int 7 [0x7])))
Successfully matched this instruction:
(set (subreg:HI (reg:QI 86) 0)
    (and:HI (reg:HI 90)
        (const_int 1 [0x1])))
allowing combination of insns 7, 10 and 11
original costs 4 + 4 + 4 = 12
replacement costs 4 + 4 = 8
deferring deletion of insn with uid = 7.
modifying insn i2    10: {r90:HI=r91:SI#0 0>>0x7;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r91:SI
deferring rescan insn with uid = 10.
modifying insn i3    11: {r86:QI#0=r90:HI&0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r90:HI
deferring rescan insn with uid = 11.
in a 3 to 2 combination.  It is unclear why the
(insn 11 10 16 2 (set (reg:QI 86)
        (subreg:QI (reg:HI 90) 0)) "pr98957.c":3:14 77 {*movqi_internal}
     (expr_list:REG_DEAD (reg:HI 90)
        (nil)))
insn is considered to have any cost at all though...

2021-02-04 Thread jakub at gcc dot gnu.org via Gcc-bugs

--- Comment #3 from Jakub Jelinek  ---
The first change is during cse1:
 (insn 10 9 11 2 (parallel [
             (set (reg:HI 90)
                 (and:HI (subreg:HI (reg:QI 89) 0)
                     (const_int 1 [0x1])))
             (clobber (reg:CC 17 flags))
         ]) "pr98957.c":3:14 489 {*andhi_1}
      (expr_list:REG_DEAD (reg:QI 89)
         (expr_list:REG_UNUSED (reg:CC 17 flags)
             (nil))))
 (insn 11 10 12 2 (set (reg:QI 86)
         (subreg:QI (reg:HI 90) 0)) "pr98957.c":3:14 77 {*movqi_internal}
      (expr_list:REG_DEAD (reg:HI 90)
         (nil)))
 (insn 12 11 16 2 (set (reg:QI 83 [ <retval> ])
         (subreg:QI (reg:HI 90) 0)) "pr98957.c":3:14 77 {*movqi_internal}
      (expr_list:REG_DEAD (reg:QI 86)
         (nil)))
 (insn 16 12 17 2 (set (reg/i:QI 0 ax)
-        (subreg:QI (reg:HI 90) 0)) "pr98957.c":4:1 77 {*movqi_internal}
+        (reg:QI 86)) "pr98957.c":4:1 77 {*movqi_internal}
      (expr_list:REG_DEAD (reg:QI 83 [ <retval> ])
         (nil)))
and that then changes how combine handles it.

2021-02-04 Thread jakub at gcc dot gnu.org via Gcc-bugs

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Started with r11-5066-gbe39636d9f68c437c8a2c2e7a225c4aed4663e78

2021-02-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
            Summary|[x86] Odd code generation   |[11 Regression] [x86] Odd
                   |for 8-bit left shift        |code generation for 8-bit
                   |                            |left shift
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-02-04
   Target Milestone|---                         |11.0
           Keywords|                            |needs-bisection

--- Comment #1 from Richard Biener  ---
Indeed odd.  -mtune=core-avx2 outputs just

        movl    %edi, %eax
        shrb    $7, %al
        ret

which is also what generic tuning produces.  This must be some partial
reg stall stuff, not sure which.

Maybe there's again some flags aliasing going on in the backend.  Marking as
regression for investigation.