[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #9 from Segher Boessenkool  ---
Yeah that looks better already, thanks.  Please get rid of the debug stuff
still in here, and send to gcc-patches@?

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-24 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #8 from HaoChen Gui  ---
I refined the patch and put all things in a helper - change_pseudo_and_mask. As
you mentioned, it's still a band-aid. The perfect solution might be a better
version of nonzero_bits. Thanks.

diff --git a/gcc/combine.c b/gcc/combine.c
index 892c834a160..f0e6ca5d8cf 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11539,6 +11539,41 @@ change_zero_ext (rtx pat)
   return changed;
 }

+/* Convert a psuedo to psuedo AND with a mask if its nonzero_bits is less
+   than its mode mask.  */
+static bool
+change_pseudo_and_mask (rtx pat)
+{
+  bool changed = false;
+
+  rtx src = SET_SRC (pat);
+  if ((GET_CODE (src) == IOR
+   || GET_CODE (src) == XOR
+   || GET_CODE (src) == PLUS)
+  && (((GET_CODE (XEXP (src, 0)) == ASHIFT
+   || GET_CODE (XEXP (src, 0)) == LSHIFTRT
+   || GET_CODE (XEXP (src, 0)) == AND)
+  && REG_P (XEXP (src, 1)))
+ || ((GET_CODE (XEXP (src, 1)) == ASHIFT
+  || GET_CODE (XEXP (src, 1)) == LSHIFTRT
+  || GET_CODE (XEXP (src, 1)) == AND)
+ && REG_P (XEXP (src, 0)
+{
+  rtx *reg = REG_P (XEXP (src, 0))
+?  (SET_SRC (pat), 0)
+:  (SET_SRC (pat), 1);
+  machine_mode mode = GET_MODE (*reg);
+  unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
+  if (nonzero < GET_MODE_MASK (mode))
+   {
+ rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero));
+ SUBST (*reg, x);
+ changed = true;
+   }
+ }
+  return changed;
+}
+
 /* Like recog, but we receive the address of a pointer to a new pattern.
We try to match the rtx that the pointer points to.
If that fails, we may try to modify or replace the pattern,
@@ -11565,9 +11600,18 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx
*pnotes)

   void *marker = get_undo_marker ();
   bool changed = false;
+  //bool PIX_opt = false;

   if (GET_CODE (pat) == SET)
-changed = change_zero_ext (pat);
+{
+  changed = change_pseudo_and_mask (pat);
+  if (changed)
+   {
+ maybe_swap_commutative_operands (SET_SRC (pat));
+ //PIX_opt = true;
+   }
+  changed |= change_zero_ext (pat);
+}
   else if (GET_CODE (pat) == PARALLEL)
 {
   int i;
@@ -11585,6 +11629,8 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx
*pnotes)

   if (insn_code_number < 0)
undo_to_marker (marker);
+  //else if (PIX_opt)
+   //fprintf (stdout, "PIX applied\n");
 }

   return insn_code_number;

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-23 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #7 from Segher Boessenkool  ---
(In reply to HaoChen Gui from comment #6)
>   Yes, I found that the nonzero_bits doesn't return exact value in other
> pass.

It returns a different value.  Neither is "exact".

The version used by combine uses what combine keeps track of in reg_stat.  What
that records depends on the order combine tries its combinations, and on which
succeed as well: it is suboptimal.

The version used outside of combine does not keep track of known values at all.
This is worse.

> So calling nonzero_bits in md file is bad as it can't be recognized in
> other pass. 

An insn that matches in one pass is required to match in the next as well, yes.
This does not mean you cannot use nonzero_bits at all, but yes it is close.

>   Right now I want to convert a single pseudo to the pseudo AND with a mask
> in combine pass if its nonzero_bits is less than its mode mask and the outer
> operation is plus/ior/xor and its one of inner operation is
> ashift/lshiftrt/and. Thus it is possible to match rotate and insert pattern.
> What's your opinion? Thanks a lot.
> 
> (ior:DI (ashift:DI (reg:DI 125)
> (const_int 32 [0x20]))
> (reg:DI 126)))
> 
> is converted to
> 
> (ior:DI (ashift:DI (reg:DI 125)
>(const_int 32 [0x20]))
> (and:DI (reg:DI 126)
> (const_int 4294967295 [0xfff]

In this example it isn't invalid RTL.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 892c834a160..8b72a5ec831 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -11539,6 +11539,26 @@ change_zero_ext (rtx pat)
>return changed;
>  }
> 
> +/* Convert a psuedo to psuedo AND with a mask if its nonzero_bits is less
> +   than its mode mask.  */
> +static bool
> +pseudo_and_with_mask (rtx *reg)
> +{
> +  bool changed = false;
> +  gcc_assert (REG_P (*reg));
> +
> +  machine_mode mode = GET_MODE (*reg);
> +  unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
> +  if (nonzero < GET_MODE_MASK (mode))
> +{
> +  rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero));
> +  SUBST (*reg, x);
> +  changed = true;
> +  //fprintf (stdout, "PIX optimization\n");
> +}
> +  return changed;
> +}
> +
>  /* Like recog, but we receive the address of a pointer to a new pattern.
> We try to match the rtx that the pointer points to.
> If that fails, we may try to modify or replace the pattern,
> @@ -11565,9 +11585,34 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn,
> rtx *pnotes)
> 
>void *marker = get_undo_marker ();
>bool changed = false;
> +  //bool PIX_opt = false;
> 
>if (GET_CODE (pat) == SET)
> -changed = change_zero_ext (pat);
> +{
> +  rtx src = SET_SRC (pat);
> +  if ((GET_CODE (src) == IOR
> +  || GET_CODE (src) == XOR
> +  || GET_CODE (src) == PLUS)
> + && (((GET_CODE (XEXP (src, 0)) == ASHIFT
> +   || GET_CODE (XEXP (src, 0)) == LSHIFTRT
> +   || GET_CODE (XEXP (src, 0)) == AND)
> +  && REG_P (XEXP (src, 1)))
> + || ((GET_CODE (XEXP (src, 1)) == ASHIFT
> +  || GET_CODE (XEXP (src, 1)) == LSHIFTRT
> +  || GET_CODE (XEXP (src, 1)) == AND)
> + && REG_P (XEXP (src, 0)
> +   {
> + changed = REG_P (XEXP (src, 0))
> +   ? pseudo_and_with_mask ( (SET_SRC (pat), 0))
> +   : pseudo_and_with_mask ( (SET_SRC (pat), 1));
> + if (changed)
> +   {
> + maybe_swap_commutative_operands (SET_SRC (pat));
> + //PIX_opt = true;
> +   }
> +   }
> +  changed |= change_zero_ext (pat);
> +}
>else if (GET_CODE (pat) == PARALLEL)
>  {
>int i;
> @@ -11585,6 +11630,8 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx
> *pnotes)
> 
>if (insn_code_number < 0)
> undo_to_marker (marker);
> +  //else if (PIX_opt)
> +   //fprintf (stdout, "PIX applied\n");
>  }
> 
>return insn_code_number;

I don't like this at all.  Maybe it will look better if you split this out
to a new helper like change_zero_ext?  That will make it clearer if and how
these change things interact: they should not make us try out an exponential
number of things (you don't cause that here, good), but also they should not
negatively affect each other, and that isn't clear at all as written.

Another problem is this is kind of giving up and trying to solve a problem
in combine itself with this band-aid, while the existing change_* stuff is
battling a problem in the design of RTL itself.  That can be solved by
clarifying things in a comment of course :-)

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-22 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #6 from HaoChen Gui  ---
Sehger,
  Yes, I found that the nonzero_bits doesn't return exact value in other pass.
So calling nonzero_bits in md file is bad as it can't be recognized in other
pass. 
  Right now I want to convert a single pseudo to the pseudo AND with a mask in
combine pass if its nonzero_bits is less than its mode mask and the outer
operation is plus/ior/xor and its one of inner operation is
ashift/lshiftrt/and. Thus it is possible to match rotate and insert pattern.
What's your opinion? Thanks a lot.

(ior:DI (ashift:DI (reg:DI 125)
(const_int 32 [0x20]))
(reg:DI 126)))

is converted to

(ior:DI (ashift:DI (reg:DI 125)
   (const_int 32 [0x20]))
(and:DI (reg:DI 126)
(const_int 4294967295 [0xfff]


patch.diff

diff --git a/gcc/combine.c b/gcc/combine.c
index 892c834a160..8b72a5ec831 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11539,6 +11539,26 @@ change_zero_ext (rtx pat)
   return changed;
 }

+/* Convert a psuedo to psuedo AND with a mask if its nonzero_bits is less
+   than its mode mask.  */
+static bool
+pseudo_and_with_mask (rtx *reg)
+{
+  bool changed = false;
+  gcc_assert (REG_P (*reg));
+
+  machine_mode mode = GET_MODE (*reg);
+  unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
+  if (nonzero < GET_MODE_MASK (mode))
+{
+  rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero));
+  SUBST (*reg, x);
+  changed = true;
+  //fprintf (stdout, "PIX optimization\n");
+}
+  return changed;
+}
+
 /* Like recog, but we receive the address of a pointer to a new pattern.
We try to match the rtx that the pointer points to.
If that fails, we may try to modify or replace the pattern,
@@ -11565,9 +11585,34 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx
*pnotes)

   void *marker = get_undo_marker ();
   bool changed = false;
+  //bool PIX_opt = false;

   if (GET_CODE (pat) == SET)
-changed = change_zero_ext (pat);
+{
+  rtx src = SET_SRC (pat);
+  if ((GET_CODE (src) == IOR
+  || GET_CODE (src) == XOR
+  || GET_CODE (src) == PLUS)
+ && (((GET_CODE (XEXP (src, 0)) == ASHIFT
+   || GET_CODE (XEXP (src, 0)) == LSHIFTRT
+   || GET_CODE (XEXP (src, 0)) == AND)
+  && REG_P (XEXP (src, 1)))
+ || ((GET_CODE (XEXP (src, 1)) == ASHIFT
+  || GET_CODE (XEXP (src, 1)) == LSHIFTRT
+  || GET_CODE (XEXP (src, 1)) == AND)
+ && REG_P (XEXP (src, 0)
+   {
+ changed = REG_P (XEXP (src, 0))
+   ? pseudo_and_with_mask ( (SET_SRC (pat), 0))
+   : pseudo_and_with_mask ( (SET_SRC (pat), 1));
+ if (changed)
+   {
+ maybe_swap_commutative_operands (SET_SRC (pat));
+ //PIX_opt = true;
+   }
+   }
+  changed |= change_zero_ext (pat);
+}
   else if (GET_CODE (pat) == PARALLEL)
 {
   int i;
@@ -11585,6 +11630,8 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx
*pnotes)

   if (insn_code_number < 0)
undo_to_marker (marker);
+  //else if (PIX_opt)
+   //fprintf (stdout, "PIX applied\n");
 }

   return insn_code_number;

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #5 from Segher Boessenkool  ---
(In reply to HaoChen Gui from comment #4)
> (define_insn_and_split "*rotl3_insert_8"
>   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand" 
> "r")
>   (match_operand:SI 2 "const_int_operand" 
> "n"))
>   (match_operand:GPR 3 "gpc_reg_operand" "0")))]
>   "INTVAL (operands[2]) > 0
>&& (nonzero_bits (operands[3], mode)
>< HOST_WIDE_INT_1U << INTVAL (operands[2]))"
> {
>   if (mode == SImode)
> return "rlwimi %0,%1,%h2,0,31-%h2";
>   else
> return "rldimi %0,%1,%H2,0";
> }
>   "&& 1"
>   [(set (match_dup 0)
> (ior:GPR (and:GPR (match_dup 3)
>   (match_dup 4))
>  (ashift:GPR (match_dup 1)
>  (match_dup 2]
> {
>   operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
> }
>   [(set_attr "type" "insert")])
> 
> But I found that nonzero_bits can't return an exact value except in combine
> pass.

It can return a different value after the combine pass, yes.  But making the
version of nonzero_bits used by combine be the generic version would be a big
regression, and the version used by combine cannot be used anywhere else (it
was an extension of flow.c originally, but this wasn't ported to the dataflow
framework).

I planned to fix this for GCC 12, but we are in stage 3 already :-)

> So the pattern finally can't be split to pattern of
> 'rotl3_insert_3'. Also if the pass after combine changes the insn, it
> can't be recognized as the nonzero_bits doesn't return exact value in that
> pass.
> 
> I am thinking if we can convert third operand to "reg and a mask" when the
> nonzero_bits is known in combine pass. Thus the pattern can be directly
> combined to 'rotl3_insert_3'.  
> 
> (set (reg:DI 123)
> (ior:DI (ashift:DI (reg:DI 125)
> (const_int 32 [0x20]))
> (reg:DI 126)))
> 
> (set (reg:DI 123)
> (ior:DI (ashift:DI (reg:DI 125)
>(const_int 32 [0x20]))
> (and:DI (reg:DI 126)
> (const_int 4294967295 [0xfff]

Will nothing modify it back?

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-15 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #4 from HaoChen Gui  ---
For the second issue, I drafted following insn_and_split pattern. It tries to
combine the shift and ior when the nonzero_bits of operands[3] matches the
condition. 

(define_insn_and_split "*rotl3_insert_8"
  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand"
"r")
  (match_operand:SI 2 "const_int_operand"
"n"))
  (match_operand:GPR 3 "gpc_reg_operand" "0")))]
  "INTVAL (operands[2]) > 0
   && (nonzero_bits (operands[3], mode)
   < HOST_WIDE_INT_1U << INTVAL (operands[2]))"
{
  if (mode == SImode)
return "rlwimi %0,%1,%h2,0,31-%h2";
  else
return "rldimi %0,%1,%H2,0";
}
  "&& 1"
  [(set (match_dup 0)
(ior:GPR (and:GPR (match_dup 3)
  (match_dup 4))
 (ashift:GPR (match_dup 1)
 (match_dup 2]
{
  operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
}
  [(set_attr "type" "insert")])

But I found that nonzero_bits can't return an exact value except in combine
pass. So the pattern finally can't be split to pattern of
'rotl3_insert_3'. Also if the pass after combine changes the insn, it
can't be recognized as the nonzero_bits doesn't return exact value in that
pass.

I am thinking if we can convert third operand to "reg and a mask" when the
nonzero_bits is known in combine pass. Thus the pattern can be directly
combined to 'rotl3_insert_3'.  

(set (reg:DI 123)
(ior:DI (ashift:DI (reg:DI 125)
(const_int 32 [0x20]))
(reg:DI 126)))

(set (reg:DI 123)
(ior:DI (ashift:DI (reg:DI 125)
   (const_int 32 [0x20]))
(and:DI (reg:DI 126)
(const_int 4294967295 [0xfff]

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

--- Comment #3 from Segher Boessenkool  ---
Splitters run after all RTL transforms, so anything that can be done on the
split result has to be done manually.  This does not scale.  Splitters are
not suitable for this kind of thing.

You can do define_insns to recognise some special case rl*imi patterns we do
not yet recognise.

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2021-11-08 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

HaoChen Gui  changed:

   What|Removed |Added

 CC||guihaoc at gcc dot gnu.org

--- Comment #2 from HaoChen Gui  ---
My solution is to split the move (from TI to V1TI) into one vsx_concat_v2di and
one V2DI to V1TI move. Thus, TI register 122 can be decomposed.

(insn 12 11 17 2 (set (reg:V1TI 121 [ b ])
(subreg:V1TI (reg:TI 122 [ a ]) 0)) "test2.c":4:5 1167
{vsx_movv1ti_64bit}
 (expr_list:REG_DEAD (reg:TI 122 [ a ])
(nil)))

//after split pass
(insn 23 11 24 2 (set (reg:V2DI 125)
(vec_concat:V2DI (subreg:DI (reg:TI 122 [ a ]) 0)
(subreg:DI (reg:TI 122 [ a ]) 8))) "test2.c":4:5 -1
 (nil))
(insn 24 23 17 2 (set (reg:V1TI 121 [ b ])
(subreg:V1TI (reg:V2DI 125) 0)) "test2.c":4:5 -1
 (nil))

//after subreg pass
(insn 23 11 24 2 (set (reg:V2DI 125)
(vec_concat:V2DI (reg:DI 126 [ a ])
(reg:DI 127 [ a+8 ]))) "test2.c":4:5 1346 {vsx_concat_v2di}
 (nil))
(insn 24 23 17 2 (set (reg:V1TI 121 [ b ])
(subreg:V1TI (reg:V2DI 125) 0)) "test2.c":4:5 1167 {vsx_movv1ti_64bit}
 (nil))

[Bug target/93453] PPC: rldimi not taken into account to avoid shift+or

2020-01-28 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93453

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-01-28
 Ever confirmed|0   |1

--- Comment #1 from Segher Boessenkool  ---
The first case is because the splitter does

  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
  emit_insn (gen_iordi3 (dest, dest, op3));

but this splitter runs late, so it has to do all wanted simple optimisations
manually.


The second is

Trying 7 -> 8:
7: r124:DI=r125:DI<<0x20
  REG_DEAD r125:DI
8: r123:DI=r124:DI|r126:DI
  REG_DEAD r126:DI
  REG_DEAD r124:DI
Failed to match this instruction:
(set (reg:DI 123)
(ior:DI (ashift:DI (reg:DI 125)
(const_int 32 [0x20]))
(reg:DI 126)))

We need a separate pattern to also recognise this (instead of just a version
with the AND) if we already know the high half bits (of r126 here) are zero.
Confirmed.