[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #9 from Segher Boessenkool ---
Yeah, that looks better already, thanks. Please get rid of the debug stuff
still in here, and send to gcc-patches@?
--- Comment #8 from HaoChen Gui ---
I refined the patch and put everything in a helper - change_pseudo_and_mask.
As you mentioned, it's still a band-aid. The perfect solution might be a
better version of nonzero_bits. Thanks.

diff --git a/gcc/combine.c b/gcc/combine.c
index 892c834a160..f0e6ca5d8cf 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11539,6 +11539,41 @@ change_zero_ext (rtx pat)
   return changed;
 }

+/* Convert a pseudo to pseudo AND with a mask if its nonzero_bits is less
+   than its mode mask.  */
+static bool
+change_pseudo_and_mask (rtx pat)
+{
+  bool changed = false;
+
+  rtx src = SET_SRC (pat);
+  if ((GET_CODE (src) == IOR
+       || GET_CODE (src) == XOR
+       || GET_CODE (src) == PLUS)
+      && (((GET_CODE (XEXP (src, 0)) == ASHIFT
+            || GET_CODE (XEXP (src, 0)) == LSHIFTRT
+            || GET_CODE (XEXP (src, 0)) == AND)
+           && REG_P (XEXP (src, 1)))
+          || ((GET_CODE (XEXP (src, 1)) == ASHIFT
+               || GET_CODE (XEXP (src, 1)) == LSHIFTRT
+               || GET_CODE (XEXP (src, 1)) == AND)
+              && REG_P (XEXP (src, 0)))))
+    {
+      rtx *reg = REG_P (XEXP (src, 0))
+                   ? &XEXP (SET_SRC (pat), 0)
+                   : &XEXP (SET_SRC (pat), 1);
+      machine_mode mode = GET_MODE (*reg);
+      unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
+      if (nonzero < GET_MODE_MASK (mode))
+        {
+          rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero));
+          SUBST (*reg, x);
+          changed = true;
+        }
+    }
+  return changed;
+}
+
 /* Like recog, but we receive the address of a pointer to a new pattern.
    We try to match the rtx that the pointer points to.
    If that fails, we may try to modify or replace the pattern,
@@ -11565,9 +11600,18 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)

   void *marker = get_undo_marker ();
   bool changed = false;
+  //bool PIX_opt = false;

   if (GET_CODE (pat) == SET)
-    changed = change_zero_ext (pat);
+    {
+      changed = change_pseudo_and_mask (pat);
+      if (changed)
+        {
+          maybe_swap_commutative_operands (SET_SRC (pat));
+          //PIX_opt = true;
+        }
+      changed |= change_zero_ext (pat);
+    }
   else if (GET_CODE (pat) == PARALLEL)
     {
       int i;
@@ -11585,6 +11629,8 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)

   if (insn_code_number < 0)
     undo_to_marker (marker);
+  //else if (PIX_opt)
+  //  fprintf (stdout, "PIX applied\n");
 }

   return insn_code_number;
--- Comment #7 from Segher Boessenkool ---
(In reply to HaoChen Gui from comment #6)
> Yes, I found that the nonzero_bits doesn't return exact value in other
> pass.

It returns a different value. Neither is "exact". The version used by
combine uses what combine keeps track of in reg_stat. What that records
depends on the order combine tries its combinations, and on which succeed
as well: it is suboptimal. The version used outside of combine does not
keep track of known values at all. This is worse.

> So calling nonzero_bits in md file is bad as it can't be recognized in
> other pass.

An insn that matches in one pass is required to match in the next as well,
yes. This does not mean you cannot use nonzero_bits at all, but yes it is
close.

> Right now I want to convert a single pseudo to the pseudo AND with a mask
> in combine pass if its nonzero_bits is less than its mode mask and the
> outer operation is plus/ior/xor and one of its inner operations is
> ashift/lshiftrt/and. Thus it is possible to match rotate and insert
> pattern. What's your opinion? Thanks a lot.
>
> (ior:DI (ashift:DI (reg:DI 125)
>                    (const_int 32 [0x20]))
>         (reg:DI 126))
>
> is converted to
>
> (ior:DI (ashift:DI (reg:DI 125)
>                    (const_int 32 [0x20]))
>         (and:DI (reg:DI 126)
>                 (const_int 4294967295 [0xffffffff])))

In this example it isn't invalid RTL.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 892c834a160..8b72a5ec831 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -11539,6 +11539,26 @@ change_zero_ext (rtx pat)
>    return changed;
>  }
>
> +/* Convert a pseudo to pseudo AND with a mask if its nonzero_bits is less
> +   than its mode mask.  */
> +static bool
> +pseudo_and_with_mask (rtx *reg)
> +{
> +  bool changed = false;
> +  gcc_assert (REG_P (*reg));
> +
> +  machine_mode mode = GET_MODE (*reg);
> +  unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
> +  if (nonzero < GET_MODE_MASK (mode))
> +    {
> +      rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero));
> +      SUBST (*reg, x);
> +      changed = true;
> +      //fprintf (stdout, "PIX optimization\n");
> +    }
> +  return changed;
> +}
> +
>  /* Like recog, but we receive the address of a pointer to a new pattern.
>     We try to match the rtx that the pointer points to.
>     If that fails, we may try to modify or replace the pattern,
> @@ -11565,9 +11585,34 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
>
>    void *marker = get_undo_marker ();
>    bool changed = false;
> +  //bool PIX_opt = false;
>
>    if (GET_CODE (pat) == SET)
> -    changed = change_zero_ext (pat);
> +    {
> +      rtx src = SET_SRC (pat);
> +      if ((GET_CODE (src) == IOR
> +           || GET_CODE (src) == XOR
> +           || GET_CODE (src) == PLUS)
> +          && (((GET_CODE (XEXP (src, 0)) == ASHIFT
> +                || GET_CODE (XEXP (src, 0)) == LSHIFTRT
> +                || GET_CODE (XEXP (src, 0)) == AND)
> +               && REG_P (XEXP (src, 1)))
> +              || ((GET_CODE (XEXP (src, 1)) == ASHIFT
> +                   || GET_CODE (XEXP (src, 1)) == LSHIFTRT
> +                   || GET_CODE (XEXP (src, 1)) == AND)
> +                  && REG_P (XEXP (src, 0)))))
> +        {
> +          changed = REG_P (XEXP (src, 0))
> +                      ? pseudo_and_with_mask (&XEXP (SET_SRC (pat), 0))
> +                      : pseudo_and_with_mask (&XEXP (SET_SRC (pat), 1));
> +          if (changed)
> +            {
> +              maybe_swap_commutative_operands (SET_SRC (pat));
> +              //PIX_opt = true;
> +            }
> +        }
> +      changed |= change_zero_ext (pat);
> +    }
>    else if (GET_CODE (pat) == PARALLEL)
>      {
>        int i;
> @@ -11585,6 +11630,8 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
>
>    if (insn_code_number < 0)
>      undo_to_marker (marker);
> +  //else if (PIX_opt)
> +  //  fprintf (stdout, "PIX applied\n");
>  }
>
>    return insn_code_number;

I don't like this at all.
Maybe it will look better if you split this out to a new helper like change_zero_ext? That will make it clearer if and how these change things interact: they should not make us try out an exponential number of things (you don't cause that here, good), but also they should not negatively affect each other, and that isn't clear at all as written. Another problem is this is kind of giving up and trying to solve a problem in combine itself with this band-aid, while the existing change_* stuff is battling a problem in the design of RTL itself. That can be solved by clarifying things in a comment of course :-)
--- Comment #6 from HaoChen Gui ---
Segher,
Yes, I found that the nonzero_bits doesn't return exact value in other
pass. So calling nonzero_bits in md file is bad as it can't be recognized
in other pass.

Right now I want to convert a single pseudo to the pseudo AND with a mask
in combine pass if its nonzero_bits is less than its mode mask and the
outer operation is plus/ior/xor and one of its inner operations is
ashift/lshiftrt/and. Thus it is possible to match rotate and insert
pattern. What's your opinion? Thanks a lot.

(ior:DI (ashift:DI (reg:DI 125)
                   (const_int 32 [0x20]))
        (reg:DI 126))

is converted to

(ior:DI (ashift:DI (reg:DI 125)
                   (const_int 32 [0x20]))
        (and:DI (reg:DI 126)
                (const_int 4294967295 [0xffffffff])))

patch.diff

diff --git a/gcc/combine.c b/gcc/combine.c
index 892c834a160..8b72a5ec831 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11539,6 +11539,26 @@ change_zero_ext (rtx pat)
   return changed;
 }

+/* Convert a pseudo to pseudo AND with a mask if its nonzero_bits is less
+   than its mode mask.  */
+static bool
+pseudo_and_with_mask (rtx *reg)
+{
+  bool changed = false;
+  gcc_assert (REG_P (*reg));
+
+  machine_mode mode = GET_MODE (*reg);
+  unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
+  if (nonzero < GET_MODE_MASK (mode))
+    {
+      rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero));
+      SUBST (*reg, x);
+      changed = true;
+      //fprintf (stdout, "PIX optimization\n");
+    }
+  return changed;
+}
+
 /* Like recog, but we receive the address of a pointer to a new pattern.
    We try to match the rtx that the pointer points to.
    If that fails, we may try to modify or replace the pattern,
@@ -11565,9 +11585,34 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)

   void *marker = get_undo_marker ();
   bool changed = false;
+  //bool PIX_opt = false;

   if (GET_CODE (pat) == SET)
-    changed = change_zero_ext (pat);
+    {
+      rtx src = SET_SRC (pat);
+      if ((GET_CODE (src) == IOR
+           || GET_CODE (src) == XOR
+           || GET_CODE (src) == PLUS)
+          && (((GET_CODE (XEXP (src, 0)) == ASHIFT
+                || GET_CODE (XEXP (src, 0)) == LSHIFTRT
+                || GET_CODE (XEXP (src, 0)) == AND)
+               && REG_P (XEXP (src, 1)))
+              || ((GET_CODE (XEXP (src, 1)) == ASHIFT
+                   || GET_CODE (XEXP (src, 1)) == LSHIFTRT
+                   || GET_CODE (XEXP (src, 1)) == AND)
+                  && REG_P (XEXP (src, 0)))))
+        {
+          changed = REG_P (XEXP (src, 0))
+                      ? pseudo_and_with_mask (&XEXP (SET_SRC (pat), 0))
+                      : pseudo_and_with_mask (&XEXP (SET_SRC (pat), 1));
+          if (changed)
+            {
+              maybe_swap_commutative_operands (SET_SRC (pat));
+              //PIX_opt = true;
+            }
+        }
+      changed |= change_zero_ext (pat);
+    }
   else if (GET_CODE (pat) == PARALLEL)
     {
       int i;
@@ -11585,6 +11630,8 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)

   if (insn_code_number < 0)
     undo_to_marker (marker);
+  //else if (PIX_opt)
+  //  fprintf (stdout, "PIX applied\n");
 }

   return insn_code_number;
--- Comment #5 from Segher Boessenkool ---
(In reply to HaoChen Gui from comment #4)
> (define_insn_and_split "*rotl<mode>3_insert_8"
>   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>         (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
>                                       (match_operand:SI 2 "const_int_operand" "n"))
>                           (match_operand:GPR 3 "gpc_reg_operand" "0")))]
>   "INTVAL (operands[2]) > 0
>    && (nonzero_bits (operands[3], <MODE>mode)
>        < HOST_WIDE_INT_1U << INTVAL (operands[2]))"
> {
>   if (<MODE>mode == SImode)
>     return "rlwimi %0,%1,%h2,0,31-%h2";
>   else
>     return "rldimi %0,%1,%H2,0";
> }
>   "&& 1"
>   [(set (match_dup 0)
>         (ior:GPR (and:GPR (match_dup 3)
>                           (match_dup 4))
>                  (ashift:GPR (match_dup 1)
>                              (match_dup 2))))]
> {
>   operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
> }
>   [(set_attr "type" "insert")])
>
> But I found that nonzero_bits can't return an exact value except in combine
> pass.

It can return a different value after the combine pass, yes. But making the
version of nonzero_bits used by combine be the generic version would be a
big regression, and the version used by combine cannot be used anywhere
else (it was an extension of flow.c originally, but this wasn't ported to
the dataflow framework).

I planned to fix this for GCC 12, but we are in stage 3 already :-)

> So the pattern finally can't be split to pattern of
> 'rotl<mode>3_insert_3'. Also if the pass after combine changes the insn,
> it can't be recognized as the nonzero_bits doesn't return exact value in
> that pass.
>
> I am thinking if we can convert the third operand to "reg and a mask" when
> the nonzero_bits is known in combine pass. Thus the pattern can be
> directly combined to 'rotl<mode>3_insert_3'.
>
> (set (reg:DI 123)
>      (ior:DI (ashift:DI (reg:DI 125)
>                         (const_int 32 [0x20]))
>              (reg:DI 126)))
>
> (set (reg:DI 123)
>      (ior:DI (ashift:DI (reg:DI 125)
>                         (const_int 32 [0x20]))
>              (and:DI (reg:DI 126)
>                      (const_int 4294967295 [0xffffffff]))))

Will nothing modify it back?
--- Comment #4 from HaoChen Gui ---
For the second issue, I drafted the following insn_and_split pattern. It
tries to combine the shift and ior when the nonzero_bits of operands[3]
matches the condition.

(define_insn_and_split "*rotl<mode>3_insert_8"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
        (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
                                      (match_operand:SI 2 "const_int_operand" "n"))
                          (match_operand:GPR 3 "gpc_reg_operand" "0")))]
  "INTVAL (operands[2]) > 0
   && (nonzero_bits (operands[3], <MODE>mode)
       < HOST_WIDE_INT_1U << INTVAL (operands[2]))"
{
  if (<MODE>mode == SImode)
    return "rlwimi %0,%1,%h2,0,31-%h2";
  else
    return "rldimi %0,%1,%H2,0";
}
  "&& 1"
  [(set (match_dup 0)
        (ior:GPR (and:GPR (match_dup 3)
                          (match_dup 4))
                 (ashift:GPR (match_dup 1)
                             (match_dup 2))))]
{
  operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
}
  [(set_attr "type" "insert")])

But I found that nonzero_bits can't return an exact value except in combine
pass. So the pattern finally can't be split to pattern of
'rotl<mode>3_insert_3'. Also if the pass after combine changes the insn, it
can't be recognized as the nonzero_bits doesn't return exact value in that
pass.

I am thinking if we can convert the third operand to "reg and a mask" when
the nonzero_bits is known in combine pass. Thus the pattern can be directly
combined to 'rotl<mode>3_insert_3'.

(set (reg:DI 123)
     (ior:DI (ashift:DI (reg:DI 125)
                        (const_int 32 [0x20]))
             (reg:DI 126)))

(set (reg:DI 123)
     (ior:DI (ashift:DI (reg:DI 125)
                        (const_int 32 [0x20]))
             (and:DI (reg:DI 126)
                     (const_int 4294967295 [0xffffffff]))))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453 --- Comment #3 from Segher Boessenkool --- Splitters run after all RTL transforms, so anything that can be done on the split result has to be done manually. This does not scale. Splitters are not suitable for this kind of thing. You can do define_insns to recognise some special case rl*imi patterns we do not yet recognise.
HaoChen Gui changed:

           What    |Removed     |Added
--------------------------------------------------------------
                 CC|            |guihaoc at gcc dot gnu.org

--- Comment #2 from HaoChen Gui ---
My solution is to split the move (from TI to V1TI) into one vsx_concat_v2di
and one V2DI to V1TI move. Thus, TI register 122 can be decomposed.

(insn 12 11 17 2 (set (reg:V1TI 121 [ b ])
        (subreg:V1TI (reg:TI 122 [ a ]) 0)) "test2.c":4:5 1167 {vsx_movv1ti_64bit}
     (expr_list:REG_DEAD (reg:TI 122 [ a ])
        (nil)))

//after split pass
(insn 23 11 24 2 (set (reg:V2DI 125)
        (vec_concat:V2DI (subreg:DI (reg:TI 122 [ a ]) 0)
            (subreg:DI (reg:TI 122 [ a ]) 8))) "test2.c":4:5 -1
     (nil))
(insn 24 23 17 2 (set (reg:V1TI 121 [ b ])
        (subreg:V1TI (reg:V2DI 125) 0)) "test2.c":4:5 -1
     (nil))

//after subreg pass
(insn 23 11 24 2 (set (reg:V2DI 125)
        (vec_concat:V2DI (reg:DI 126 [ a ])
            (reg:DI 127 [ a+8 ]))) "test2.c":4:5 1346 {vsx_concat_v2di}
     (nil))
(insn 24 23 17 2 (set (reg:V1TI 121 [ b ])
        (subreg:V1TI (reg:V2DI 125) 0)) "test2.c":4:5 1167 {vsx_movv1ti_64bit}
     (nil))
Segher Boessenkool changed:

           What    |Removed     |Added
--------------------------------------------------------------
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2020-01-28
     Ever confirmed|0           |1

--- Comment #1 from Segher Boessenkool ---
The first case is because the splitter does

  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
  emit_insn (gen_iordi3 (dest, dest, op3));

but this splitter runs late, so it has to do all wanted simple
optimisations manually.

The second is

Trying 7 -> 8:
    7: r124:DI=r125:DI<<0x20
      REG_DEAD r125:DI
    8: r123:DI=r124:DI|r126:DI
      REG_DEAD r126:DI
      REG_DEAD r124:DI
Failed to match this instruction:
(set (reg:DI 123)
    (ior:DI (ashift:DI (reg:DI 125)
            (const_int 32 [0x20]))
        (reg:DI 126)))

We need a separate pattern to also recognise this (instead of just a
version with the AND) if we already know the high half bits (of r126 here)
are zero.

Confirmed.