[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2021-04-06 00:00:00         |2021-7-24
           Severity|normal                      |enhancement
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #10 from Segher Boessenkool ---
That is a USE of a constant, which is a no-op always.  Here we have a USE
of a register, which is not.

We actually have *two* uses of pseudos, and combine cannot know what that
means for the target (all PARALLELs are split up in combine).
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #9 from Hongtao.liu ---
(In reply to Segher Boessenkool from comment #8)
> That patch is no good.  The combination is not allowed because it is not
> known what the "use"s are *for*.  Checking if something is from the constant
> pools is not enough at all.

At -O1 the USE in INSN is

  use [`*.LC0']

which is a reference to the constant pool.  We don't know what the uses are
for in that case either, so why can it be combined?
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #8 from Segher Boessenkool ---
That patch is no good.  The combination is not allowed because it is not
known what the "use"s are *for*.  Checking if something is from the constant
pools is not enough at all.
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #7 from Hongtao.liu ---
I'm testing

1 file changed, 30 insertions(+)
gcc/combine.c | 30 ++

modified   gcc/combine.c
@@ -1811,6 +1811,33 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx set, void *data)
         }
     }
 }
+
+/* Return true if REG is only defined by loading from constant pool.  */
+static int
+single_ref_from_constant_pool (rtx reg)
+{
+  gcc_assert (REG_P (reg));
+  rtx_insn* insn;
+  rtx src, set;
+
+  if (DF_REG_DEF_COUNT (REGNO (reg)) != 1)
+    return 0;
+  insn = DF_REF_INSN (DF_REG_DEF_CHAIN (REGNO (reg)));
+  if (!insn)
+    return 0;
+  set = single_set (insn);
+  if (!set)
+    return 0;
+  src = SET_SRC (set);
+
+  /* Constant pool.  */
+  if (!MEM_P (src)
+      || !SYMBOL_REF_P (XEXP (src, 0))
+      || !CONSTANT_POOL_ADDRESS_P (XEXP (src, 0)))
+    return 0;
+
+  return 1;
+}

 /* See if INSN can be combined into I3.  PRED, PRED2, SUCC and SUCC2 are
    optionally insns that were previously combined into I3 or that will be
@@ -1895,7 +1922,10 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
                  something to tell them apart, e.g. different modes.  For
                  now, we forgo such complicated tests and simply disallow
                  combining of USES of pseudo registers with any other USE.  */
+              /* If the USE in INSN is only defined by loading from constant
+                 pool, it must have identical value.  */
               if (REG_P (XEXP (elt, 0))
+                  && !single_ref_from_constant_pool (XEXP (elt, 0))
                   && GET_CODE (PATTERN (i3)) == PARALLEL)
                 {
                   rtx i3pat = PATTERN (i3);
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #6 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #4)
> Is there some reason why the patterns are written that way rather than split
> immediately into the AND or XOR?  Perhaps it could be done on SUBREGs to
> make it valid RTL, but we split into those post reload already anyway.

I don't know, since these patterns pre-date my involvement in gcc.
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #5 from Jakub Jelinek ---
Maybe the X alternatives where we don't know the sign bit mask.
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
Is there some reason why the patterns are written that way rather than split
immediately into the AND or XOR?  Perhaps it could be done on SUBREGs to
make it valid RTL, but we split into those post reload already anyway.
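
For readers unfamiliar with the AND/XOR split mentioned in comment #4, here
is a minimal sketch of the underlying bit operations in plain C.  It assumes
`float` is IEEE 754 binary32 with the sign in bit 31.  The helper names
(`float_bits`, `bits_float`, `abs_via_and`, `neg_via_xor`, `neg_abs_via_or`)
are made up for illustration and do not come from GCC or the i386 backend.
It only shows why abs, neg and -abs each reduce to one logical operation
against a sign-bit mask; it is not what the backend emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32, sign bit is bit 31.  */
#define SIGN_MASK UINT32_C(0x80000000)

/* Reinterpret a float as its 32-bit pattern (memcpy avoids aliasing UB).  */
static uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Reinterpret a 32-bit pattern as a float.  */
static float bits_float(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* abs(x): clear the sign bit with AND.  */
static float abs_via_and(float x)
{
    return bits_float(float_bits(x) & ~SIGN_MASK);
}

/* -x: flip the sign bit with XOR.  */
static float neg_via_xor(float x)
{
    return bits_float(float_bits(x) ^ SIGN_MASK);
}

/* -abs(x): force the sign bit on with OR.  */
static float neg_abs_via_or(float x)
{
    return bits_float(float_bits(x) | SIGN_MASK);
}
```

Under these assumptions, `neg_via_xor(abs_via_and(x))` and
`neg_abs_via_or(x)` give the same bit pattern for every `x`, which is the
single-operation form this bug asks for.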
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #3 from Segher Boessenkool ---
What happens here is
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/combine.c;h=3294575357bfcb19e589868da34364498a860dcf;hb=HEAD#l1884

"*2_1" for absneg:MODEF has a bare "use".  And then we trigger

    If the USE in INSN was for a pseudo register, the matching
    insn pattern will likely match any register; combining this
    with any other USE would only be safe if we knew that the
    used registers have identical values, or if there was
    something to tell them apart, e.g. different modes.  For
    now, we forgo such complicated tests and simply disallow
    combining of USES of pseudo registers with any other USE.

because both the abs and the neg have a bare use.

The patterns should be rewritten to not have such bare uses.  Alternatively
we can add some pretty-much-never-triggered code to combine to handle this
case.  Patches welcome.
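
The rule quoted in comment #3 can be sketched as a toy model in plain C.
This is only an illustration: the types and names (`elt_kind`,
`operand_kind`, `struct elt`, `struct pattern`, `has_use`,
`toy_can_combine`) are hypothetical stand-ins, not GCC's RTL structures, and
the logic follows the wording of the quoted comment rather than the full
checks in combine.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the elements of a PARALLEL; NOT GCC's data types.  */
enum elt_kind { ELT_SET, ELT_USE, ELT_CLOBBER };
enum operand_kind { OP_NONE, OP_PSEUDO_REG, OP_CONST_POOL_MEM };

struct elt {
    enum elt_kind kind;        /* SET, USE or CLOBBER.  */
    enum operand_kind operand; /* What a USE refers to; OP_NONE otherwise.  */
};

struct pattern {
    const struct elt *elts;
    size_t n;
};

/* True if the pattern contains any USE element.  */
static bool has_use(const struct pattern *p)
{
    for (size_t i = 0; i < p->n; i++)
        if (p->elts[i].kind == ELT_USE)
            return true;
    return false;
}

/* Conservative rule from the quoted comment: a USE of a pseudo register
   in INSN may not be combined with any other USE in I3.  */
static bool toy_can_combine(const struct pattern *insn,
                            const struct pattern *i3)
{
    assert(insn != NULL && i3 != NULL);
    for (size_t i = 0; i < insn->n; i++)
        if (insn->elts[i].kind == ELT_USE
            && insn->elts[i].operand == OP_PSEUDO_REG
            && has_use(i3))
            return false;
    return true;
}
```

In this model, an abs whose USE is a constant-pool reference followed by a
neg whose USE is a pseudo passes the check, while an abs and a neg that both
USE pseudos is rejected.  That matches the difference between the -O1 and
-O2 dumps in comment #1.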
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #2 from Richard Biener ---
Seems because of r93 being live:

insn_cost 8 for     9: r93:V4SF=[`*.LC0']
      REG_EQUAL const_vector
insn_cost 4 for    10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r92:SF
      REG_UNUSED flags:CC
insn_cost 8 for    11: r95:V4SF=[`*.LC1']
      REG_EQUAL const_vector
insn_cost 4 for    12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
insn_cost 4 for    13: flags:CCFP=cmp(r90:SF,r94:SF)
      REG_DEAD r94:SF
insn_cost 12 for    14: pc={(flags:CCFP>0)?L35:pc}
      REG_DEAD flags:CCFP
      REG_BR_PROB 59055804
insn_cost 8 for    16: r97:SF=[r89:DI+0x4]
      REG_DEAD r89:DI
insn_cost 4 for    18: {r96:SF=abs(r97:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r97:SF
      REG_DEAD r93:V4SF
      REG_UNUSED flags:CC
insn_cost 4 for    20: {r99:SF=-r96:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r96:SF
      REG_DEAD r95:V4SF
      REG_UNUSED flags:CC

while at -O1 we have two loads of LC0 and r93 is dead after insn 10.
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-04-06
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
             Target|                            |x86_64-*-*
          Component|target                      |rtl-optimization

--- Comment #1 from Richard Biener ---
Confirmed.  At -O1

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use [`*.LC0'];clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r92:SF
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r95:V4SF
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:SF 94)
            (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]))))
        (use (reg:V4SF 95))
        (clobber (reg:CC 17 flags))
    ])
Successfully matched this instruction:
(parallel [
        (set (reg:SF 94)
            (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ]))))
        (use (reg:V4SF 95))
    ])
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 8

but with -O2:

Trying 10 -> 12:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r92:SF
      REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
Can't combine i2 into i3

we're later trying

Trying 10, 12 -> 13:
   10: {r91:SF=abs(r92:SF);use r93:V4SF;clobber flags:CC;}
      REG_DEAD r92:SF
      REG_UNUSED flags:CC
   12: {r94:SF=-r91:SF;use r95:V4SF;clobber flags:CC;}
      REG_DEAD r91:SF
      REG_UNUSED flags:CC
   13: flags:CCFP=cmp(r90:SF,r94:SF)
      REG_DEAD r94:SF
Failed to match this instruction:
(set (reg:CCFP 17 flags)
    (compare:CCFP (neg:SF (abs:SF (reg:SF 92 [ *n_9(D) ])))
        (reg/v:SF 90 [ m ])))
Failed to match this instruction:
(set (reg:SF 94)
    (abs:SF (reg:SF 92 [ *n_9(D) ])))