[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-07-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2021-04-06 00:00:00         |2021-7-24
           Severity|normal                      |enhancement

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-08 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #10 from Segher Boessenkool  ---
That is a USE of a constant, which is a no-op always.  Here we have a USE
of a register, which is not.  We actually have *two* uses of pseudos, and
combine cannot know what that means for the target (all PARALLELs are split
up in combine).

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-08 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #9 from Hongtao.liu  ---
(In reply to Segher Boessenkool from comment #8)
> That patch is no good.  The combination is not allowed because it is not
> known what the "use"s are *for*.  Checking if something is from the constant
> pools is not enough at all.

At -O1 the USE in the insn is (use [`*.LC0']), a reference to the constant
pool.  We don't know what that use is for either, so why can it be combined?

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-07 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #8 from Segher Boessenkool  ---
That patch is no good.  The combination is not allowed because it is not
known what the "use"s are *for*.  Checking if something is from the constant
pools is not enough at all.

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-07 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #7 from Hongtao.liu  ---
I'm testing:

1 file changed, 30 insertions(+)
gcc/combine.c | 30 ++

modified   gcc/combine.c
@@ -1811,6 +1811,33 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx set, void *data)
}
 }
 }
+
+/* Return true if REG is only defined by a load from the constant pool.  */
+static int
+single_ref_from_constant_pool (rtx reg)
+{
+  gcc_assert (REG_P (reg));
+  rtx_insn* insn;
+  rtx src, set;
+
+  if (DF_REG_DEF_COUNT (REGNO (reg)) != 1)
+    return 0;
+  insn = DF_REF_INSN (DF_REG_DEF_CHAIN (REGNO (reg)));
+  if (!insn)
+    return 0;
+  set = single_set (insn);
+  if (!set)
+    return 0;
+  src = SET_SRC (set);
+
+  /* Constant pool.  */
+  if (!MEM_P (src)
+      || !SYMBOL_REF_P (XEXP (src, 0))
+      || !CONSTANT_POOL_ADDRESS_P (XEXP (src, 0)))
+    return 0;
+
+  return 1;
+}

 /* See if INSN can be combined into I3.  PRED, PRED2, SUCC and SUCC2 are
optionally insns that were previously combined into I3 or that will be
@@ -1895,7 +1922,10 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
 something to tell them apart, e.g. different modes.  For
 now, we forgo such complicated tests and simply disallow
 combining of USES of pseudo registers with any other USE.  */
+ /* If the USE in INSN is only defined by loading from constant
+pool, it must have identical value.  */
  if (REG_P (XEXP (elt, 0))
+ && !single_ref_from_constant_pool (XEXP (elt, 0))
  && GET_CODE (PATTERN (i3)) == PARALLEL)
{
  rtx i3pat = PATTERN (i3);

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-07 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #6 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #4)
> Is there some reason why the patterns are written that way rather than split
> immediately into the AND or XOR?  Perhaps it could be done on SUBREGs to
> make it valid RTL, but we split into those post reload already anyway.

I don't know, since these patterns pre-date my involvement in gcc.

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-07 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #5 from Jakub Jelinek  ---
Maybe the X alternatives where we don't know the sign bit mask.

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-07 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Is there some reason why the patterns are written that way rather than split
immediately into the AND or XOR?  Perhaps it could be done on SUBREGs to make
it valid RTL, but we split into those post reload already anyway.
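
For readers following along, the AND/XOR forms mentioned above can be sketched in plain C. This is a minimal illustration only, assuming IEEE 754 binary32 `float`; the helper names (`abs_via_and`, `neg_via_xor`, `neg_abs_via_or`) are hypothetical and are not taken from GCC or from the i386 patterns. It shows that abs clears the sign bit, neg flips it, and the combined -abs(x) amounts to a single OR with the sign-bit mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32, so the sign is bit 31.  */
static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

#define SIGN_MASK UINT32_C (0x80000000)

/* Reinterpret a float as its bit pattern (memcpy avoids aliasing issues).  */
uint32_t
float_bits (float x)
{
  uint32_t b;
  memcpy (&b, &x, sizeof b);
  return b;
}

/* Reinterpret a bit pattern as a float.  */
float
bits_float (uint32_t b)
{
  float x;
  memcpy (&x, &b, sizeof x);
  return x;
}

/* abs(x): clear the sign bit, i.e. AND with the inverted mask.  */
float
abs_via_and (float x)
{
  return bits_float (float_bits (x) & ~SIGN_MASK);
}

/* -x: flip the sign bit, i.e. XOR with the mask.  */
float
neg_via_xor (float x)
{
  return bits_float (float_bits (x) ^ SIGN_MASK);
}

/* -abs(x): force the sign bit on, i.e. OR with the mask.  */
float
neg_abs_via_or (float x)
{
  return bits_float (float_bits (x) | SIGN_MASK);
}
```

For any input, `neg_via_xor (abs_via_and (x))` and `neg_abs_via_or (x)` produce the same bit pattern, which is the single-instruction form the report is asking the compiler to find.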

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #3 from Segher Boessenkool  ---
What happens here is
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/combine.c;h=3294575357bfcb19e589868da34364498a860dcf;hb=HEAD#l1884

"*<code><mode>2_1" for absneg:MODEF has a bare "use".  And then we trigger

  If the USE in INSN was for a pseudo register, the matching
  insn pattern will likely match any register; combining this
  with any other USE would only be safe if we knew that the
  used registers have identical values, or if there was
  something to tell them apart, e.g. different modes.  For
  now, we forgo such complicated tests and simply disallow
  combining of USES of pseudo registers with any other USE.

because both the abs and the neg have a bare use.

The patterns should be rewritten to not have such bare uses.  Alternatively
we can add some pretty-much-never-triggered code to combine to handle this
case.  Patches welcome.
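
To make the rule quoted above concrete, here is a toy model in C. It is only a sketch of a literal reading of that comment, not the logic in combine.c: the enum and the function `toy_use_rule_allows` are hypothetical, and the real check works on RTL and also handles the case where both USEs name the same register, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the bare USE (if any) in an insn
   pattern.  These are not GCC types.  */
enum toy_use
{
  TOY_USE_NONE,        /* pattern has no bare USE */
  TOY_USE_CONST_POOL,  /* (use [`*.LC0']): a constant-pool memory reference */
  TOY_USE_PSEUDO       /* (use rNN): a pseudo register */
};

static_assert (TOY_USE_NONE == 0, "TOY_USE_NONE is the zero value");

/* Literal reading of the quoted comment: a USE of a pseudo register in
   INSN may not be combined with any other USE in I3.  Returns true when
   this particular rule does not block the combination.  */
bool
toy_use_rule_allows (enum toy_use insn_use, enum toy_use i3_use)
{
  if (insn_use == TOY_USE_PSEUDO && i3_use != TOY_USE_NONE)
    return false;
  return true;
}
```

Under this model the -O1 attempt from comment #1 (insn 10 uses the constant pool directly, insn 12 uses r95) passes the check, while the -O2 attempt (insn 10 uses r93, insn 12 uses r95) is rejected, matching the "Can't combine i2 into i3" line in the dump.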

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
Seems because of r93 being live:

insn_cost 8 for     9: r93:V4SF=[`*.LC0']
      REG_EQUAL const_vector
insn_cost 4 for    10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r92:SF
      REG_UNUSED flags:CC
insn_cost 8 for    11: r95:V4SF=[`*.LC1']
      REG_EQUAL const_vector
insn_cost 4 for    12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
insn_cost 4 for    13: flags:CCFP=cmp(r90:SF,r94:SF)
      REG_DEAD r94:SF
insn_cost 12 for    14: pc={(flags:CCFP>0)?L35:pc}
      REG_DEAD flags:CCFP
      REG_BR_PROB 59055804
insn_cost 8 for    16: r97:SF=[r89:DI+0x4]
      REG_DEAD r89:DI
insn_cost 4 for    18: {r96:SF=abs(r97:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r97:SF
      REG_DEAD r93:V4SF
      REG_UNUSED flags:CC
insn_cost 4 for    20: {r99:SF=-r96:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r96:SF
      REG_DEAD r95:V4SF
      REG_UNUSED flags:CC

while at -O1 we have two loads of LC0 and r93 is dead after insn 10.

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-04-06
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
             Target|                            |x86_64-*-*
          Component|target                      |rtl-optimization

--- Comment #1 from Richard Biener  ---
Confirmed.  At -O1

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use [`*.LC0'];clobber flags:CC;}
  REG_UNUSED flags:CC
  REG_DEAD r92:SF
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r95:V4SF
  REG_DEAD r91:SF
  REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:SF 94)
            (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]))))
        (use (reg:V4SF 95))
        (clobber (reg:CC 17 flags))
    ])
Successfully matched this instruction:
(parallel [
        (set (reg:SF 94)
            (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]))))
        (use (reg:V4SF 95))
    ])
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 8

but with -O2:

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
  REG_DEAD r92:SF
  REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r91:SF
  REG_UNUSED flags:CC
Can't combine i2 into i3

we're later trying

Trying 10, 12 -> 13:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
  REG_DEAD r92:SF
  REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
  REG_DEAD r91:SF
  REG_UNUSED flags:CC
   13: flags:CCFP=cmp(r90:SF,r94:SF)
  REG_DEAD r94:SF
Failed to match this instruction:
(set (reg:CCFP 17 flags)
    (compare:CCFP (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ])))
        (reg/v:SF 90 [ m ])))
Failed to match this instruction:
(set (reg:SF 94)
    (abs:SF (reg:SF 92 [ *n_9(D) ])))