Re: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]

2023-12-26 Thread Prathamesh Kulkarni
On Thu, 21 Dec 2023 at 00:00, Richard Sandiford wrote:
>
> If cse sees:
>
>   (set (reg R) (const_vector [A B ...]))
>
> it creates fake sets of the form:
>
>   (set R[0] A)
>   (set R[1] B)
>   ...
>
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
>
>   (set (reg R2) A)
>   ...(reg R2)...
>
> to try to use R[0] instead of (reg R2).
>
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
>
>   (set (reg T) A)
>
> the pass would go to considerable effort trying to work out whether:
>
>   (set R[0] A)
>
> or:
>
>   (set R[0] (reg T))
>
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
>
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
>
> The problem in the PR occurred if we had:
>
> (1) something that establishes an equivalence between a vector V1 of
> M-bit scalar integers and a hard register H
>
> (2) something that establishes an equivalence between a vector V2 of
> N-bit scalar integers, where N<M and where V2 contains at least 2
> instances of V1[0]
>
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
>
>   /* See if we have a CONST_INT that is already in a register in a
>  wider mode.  */
>
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
>
>   /* At this point, ELT, if nonzero, points to a class of expressions
>  equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>  and SRC_RELATED, if nonzero, each contain additional equivalent
>  expressions.  Prune these latter expressions by deleting expressions
>  already in the equivalence class.
>
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
>
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
>
> We could avoid the double registration by adding:
>
>   for (elt = classp; elt; elt = elt->next_same_value)
>     if (rtx_equal_p (elt->exp, x))
>       return elt;
>
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
>
> The handling of fake sets therefore seems like the more fundamental bug.
>
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.
Hi Richard,
Thanks for the detailed explanation and fix!

Thanks,
Prathamesh
>
> gcc/
> PR rtl-optimization/111702
> * cse.cc (set::mode): Move earlier.
> (set::src_in_memory, set::src_volatile): Convert to bitfields.
> (set::is_fake_set): New member variable.
> (add_to_set): Add an is_fake_set parameter.
> (find_sets_in_insn): Update calls accordingly.
> (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
> try to optimize them either, or validate changes to them.
>
> gcc/testsuite/
> PR rtl-optimization/111702
> * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
>  gcc/cse.cc  | 38 +++---
>  gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +
>  2 files changed, 67 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>    unsigned dest_hash;
>    /* The SET_DEST, with SUBREG, etc., stripped.  */
>    rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>    /* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>    /* Nonzero if the SET_SRC contains something
>       whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artificial set that has been c

Re: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]

2023-12-20 Thread Richard Biener



> On 20.12.2023 at 19:30, Richard Sandiford wrote:
> 
> If cse sees:
> 
>  (set (reg R) (const_vector [A B ...]))
> 
> it creates fake sets of the form:
> 
>  (set R[0] A)
>  (set R[1] B)
>  ...
> 
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
> 
>  (set (reg R2) A)
>  ...(reg R2)...
> 
> to try to use R[0] instead of (reg R2).
> 
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
> 
>  (set (reg T) A)
> 
> the pass would go to considerable effort trying to work out whether:
> 
>  (set R[0] A)
> 
> or:
> 
>  (set R[0] (reg T))
> 
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
> 
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
> 
> The problem in the PR occurred if we had:
> 
> (1) something that establishes an equivalence between a vector V1 of
>    M-bit scalar integers and a hard register H
> 
> (2) something that establishes an equivalence between a vector V2 of
>    N-bit scalar integers, where N<M and where V2 contains at least 2
>    instances of V1[0]
> 
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
> 
>  /* See if we have a CONST_INT that is already in a register in a
> wider mode.  */
> 
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
> 
>  /* At this point, ELT, if nonzero, points to a class of expressions
> equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
> and SRC_RELATED, if nonzero, each contain additional equivalent
> expressions.  Prune these latter expressions by deleting expressions
> already in the equivalence class.
> 
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
> 
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
> 
> We could avoid the double registration by adding:
> 
>  for (elt = classp; elt; elt = elt->next_same_value)
>    if (rtx_equal_p (elt->exp, x))
>      return elt;
> 
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
> 
> The handling of fake sets therefore seems like the more fundamental bug.
> 
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.

Agreed and OK

> gcc/
>    PR rtl-optimization/111702
>    * cse.cc (set::mode): Move earlier.
>    (set::src_in_memory, set::src_volatile): Convert to bitfields.
>    (set::is_fake_set): New member variable.
>    (add_to_set): Add an is_fake_set parameter.
>    (find_sets_in_insn): Update calls accordingly.
>    (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
>    try to optimize them either, or validate changes to them.
> 
> gcc/testsuite/
>    PR rtl-optimization/111702
>    * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
> gcc/cse.cc  | 38 +++---
> gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +
> 2 files changed, 67 insertions(+), 14 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
> 
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>   unsigned dest_hash;
>   /* The SET_DEST, with SUBREG, etc., stripped.  */
>   rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>   /* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>   /* Nonzero if the SET_SRC contains something
>  whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artificial set that has been created to describe
> +     part of an insn's effect.  Zero means that RTL appears directly in
> +     the insn pattern.  */
> +  

[PATCH] cse: Fix handling of fake vec_select sets [PR111702]

2023-12-20 Thread Richard Sandiford
If cse sees:

  (set (reg R) (const_vector [A B ...]))

it creates fake sets of the form:

  (set R[0] A)
  (set R[1] B)
  ...

(with R[n] replaced by appropriate rtl) and then adds them to the tables
in the same way as for normal sets.  This allows a sequence like:

  (set (reg R2) A)
  ...(reg R2)...

to try to use R[0] instead of (reg R2).
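
As a rough illustration of that expansion (a toy model, not code from
cse.cc: ToySet and expand_const_vector_set are made-up names, and strings
stand in for rtl), the real set is recorded first and each element then
gets its own fake set:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the pass's per-insn set record.  Strings replace rtl.
struct ToySet {
  std::string dest;   // "R" for the real set, "R[i]" for a fake one
  std::string src;    // the whole vector, or a single element
  bool is_fake_set;   // mirrors the flag named in the ChangeLog
};

// Model (set (reg REG) (const_vector ELEMS)): the real set comes first,
// followed by one fake set per element in forward order.
std::vector<ToySet> expand_const_vector_set(
    const std::string &reg, const std::vector<std::string> &elems) {
  assert(!elems.empty());
  std::string vec = "(const_vector [";
  for (std::size_t i = 0; i < elems.size(); ++i) {
    if (i != 0)
      vec += " ";
    vec += elems[i];
  }
  vec += "])";

  std::vector<ToySet> sets;
  sets.push_back({reg, vec, false});
  for (std::size_t i = 0; i < elems.size(); ++i)
    sets.push_back({reg + "[" + std::to_string(i) + "]", elems[i], true});
  return sets;
}
```

For a two-element vector [A B] in R, this gives three entries: the real
set of R, then fake sets for R[0] and R[1].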

But the pass was taking the analogy too far, and was trying to simplify
these fake sets based on costs.  That is, if there was an earlier:

  (set (reg T) A)

the pass would go to considerable effort trying to work out whether:

  (set R[0] A)

or:

  (set R[0] (reg T))

was more profitable.  This included running validate*_change on the sets,
which has no meaning given that the sets are not part of the insn.

In this example, the equivalence A == T is already known, and the
purpose of the fake sets is to add A == T == R[0].  We can do that
just as easily (or, as the PR shows, more easily) if we keep the
original form of the fake set, with A instead of T.
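
To see why the original form is enough, here is a minimal sketch using a
hypothetical ToyTable that maps expression text to an equivalence class
(none of these names exist in cse.cc, and the real tables are hash tables
of rtx).  Recording (set R[0] A) puts R[0] in the class that already
holds A and T, without ever rewriting the source to T:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy equivalence table: expression text -> class id.
struct ToyTable {
  std::map<std::string, int> class_of;
  int next_class = 0;

  // Return the class of EXP, creating a singleton class if needed.
  int lookup_or_create(const std::string &exp) {
    std::map<std::string, int>::iterator it = class_of.find(exp);
    if (it != class_of.end())
      return it->second;
    int cls = next_class++;
    class_of[exp] = cls;
    return cls;
  }

  // Record (set DEST SRC): DEST joins SRC's class, replacing whatever
  // class DEST was in before.
  void record_set(const std::string &dest, const std::string &src) {
    int cls = lookup_or_create(src);
    assert(cls >= 0 && cls < next_class);
    class_of[dest] = cls;
  }

  // True if both expressions are known and share a class.
  bool equivalent(const std::string &a, const std::string &b) const {
    std::map<std::string, int>::const_iterator ia = class_of.find(a);
    std::map<std::string, int>::const_iterator ib = class_of.find(b);
    return ia != class_of.end() && ib != class_of.end()
           && ia->second == ib->second;
  }
};
```

After record_set ("T", "A") and record_set ("R[0]", "A"), the toy table
reports R[0], T and A as mutually equivalent.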

The problem in the PR occurred if we had:

(1) something that establishes an equivalence between a vector V1 of
M-bit scalar integers and a hard register H

(2) something that establishes an equivalence between a vector V2 of
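N-bit scalar integers, where N<M and where V2 contains at least 2
instances of V1[0]

(1) established an equivalence between V1[0] and H in M bits.
(2) then triggered a search for an equivalence of V1[0] in N bits.
This included:

  /* See if we have a CONST_INT that is already in a register in a
     wider mode.  */

which (correctly) found that the low N bits of H contain the right value.
But because it came from a wider mode, this equivalence between N-bit H
and N-bit V1[0] was not yet in the hash table.  It therefore survived
the purge in:

  /* At this point, ELT, if nonzero, points to a class of expressions
     equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
     and SRC_RELATED, if nonzero, each contain additional equivalent
     expressions.  Prune these latter expressions by deleting expressions
     already in the equivalence class.

And since more than 1 set found the same N-bit equivalence between
H and V1[0], the pass tried to add it more than once.

Things were already wrong at this stage, but an ICE was only triggered
later when trying to merge this N-bit equivalence with another one.

We could avoid the double registration by adding:

  for (elt = classp; elt; elt = elt->next_same_value)
    if (rtx_equal_p (elt->exp, x))
      return elt;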

to insert_with_costs, or by making cse_insn check whether previous
sets have recorded the same equivalence.  The latter seems more
appealing from a compile-time perspective.  But in this case,
doing that would be adding yet more spurious work to the handling
of fake sets.
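
For reference, the first option (a duplicate check at insertion time)
might look something like the sketch below.  It is only a toy and is not
what the patch does: ToyElt, insert_unique and class_size are invented
names, strings stand in for rtx, and operator== stands in for
rtx_equal_p.  The walk returns the existing element instead of
registering the same expression twice:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for a hash-table element; each element owns the
// next member of its equivalence class.
struct ToyElt {
  std::string exp;
  std::unique_ptr<ToyElt> next_same_value;
};

// Add X to the class headed by CLASSP unless an equal expression is
// already present.  Returns the element that holds X either way.
ToyElt *insert_unique(ToyElt *classp, const std::string &x) {
  assert(classp != nullptr);
  ToyElt *elt = classp;
  while (true) {
    if (elt->exp == x)
      return elt;               // already registered: reuse it
    if (!elt->next_same_value)
      break;                    // reached the tail without a match
    elt = elt->next_same_value.get();
  }
  elt->next_same_value.reset(new ToyElt());
  elt->next_same_value->exp = x;
  return elt->next_same_value.get();
}

// Count the members of the class headed by CLASSP.
std::size_t class_size(const ToyElt *classp) {
  std::size_t n = 0;
  for (const ToyElt *elt = classp; elt; elt = elt->next_same_value.get())
    ++n;
  return n;
}
```

Every insertion walks the whole class, so the check is linear in the
class size each time it runs.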

The handling of fake sets therefore seems like the more fundamental bug.

While there, the patch also makes sure that we don't apply REG_EQUAL
notes to these fake sets.  They only describe the "real" (first) set.

gcc/
PR rtl-optimization/111702
* cse.cc (set::mode): Move earlier.
(set::src_in_memory, set::src_volatile): Convert to bitfields.
(set::is_fake_set): New member variable.
(add_to_set): Add an is_fake_set parameter.
(find_sets_in_insn): Update calls accordingly.
(cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
try to optimize them either, or validate changes to them.

gcc/testsuite/
PR rtl-optimization/111702
* gcc.dg/rtl/aarch64/pr111702.c: New test.
---
 gcc/cse.cc  | 38 +++---
 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +
 2 files changed, 67 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c

diff --git a/gcc/cse.cc b/gcc/cse.cc
index f9603fdfd43..9fd51ca2832 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -4128,13 +4128,17 @@ struct set
   unsigned dest_hash;
   /* The SET_DEST, with SUBREG, etc., stripped.  */
   rtx inner_dest;
+  /* Original machine mode, in case it becomes a CONST_INT.  */
+  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
   /* Nonzero if the SET_SRC is in memory.  */
-  char src_in_memory;
+  unsigned int src_in_memory : 1;
   /* Nonzero if the SET_SRC contains something
      whose value cannot be predicted and understood.  */
-  char src_volatile;
-  /* Original machine mode, in case it becomes a CONST_INT.  */
-  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
+  unsigned int src_volatile : 1;
+  /* Nonzero if RTL is an artificial set that has been created to describe
+     part of an insn's effect.  Zero means that RTL appears directly in
+     the insn pattern.  */
+  unsigned int is_fake_set : 1;
   /* Hash value of constant equivalent for SET_SRC.  */
   unsigned src_const_hash;
   /* A constant equivalent for SET_SRC, if any.  */
@@ -4229,12 +4233,15 @@ try_back_substitute_reg (rtx set, rtx_insn *insn)
 }
 }
 
-/* Add an entry containing RTL X into SETS.  */
+/* Add an entry containing RTL X into SETS.  IS_FAKE_SET is true if X is
+   an artificial set that has been created to describe part of an insn's
+   effect.  */
 static inline void
-add_to_set (vec<struct set> *sets, rtx x)
+add_to_set (vec<struct set> *sets, rtx x, bool is_fake_set)
 {
   struct set entry = {};
   entry.rtl = x;
+  entry.is_fake_set = is_fake_set;
   sets->safe_push (entry);
 }
 
@@ -4271,7 +4278,7 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
                     && known_eq (GET_MODE_NUNITS (GET_MODE (SET_SRC (x))), 1)))
         {
           /* First register the vector itself.  */
-          add_to_set (psets, x);
+          add_to_set (psets, x, false);
           rtx src = SET_SRC (x);
           /* Go over the constants of the CONST_VECTOR in forward order, to
              put them in the same order in the SETS array.  */
@@ -4281,11 +4288,12 @@ find_sets_in_insn (rtx_insn *insn, vec<struct set> *psets)
 used to tell CSE how to get to a particular constant.  */
  rtx y = simplify_gen_vec_select (SET_DEST (x), i);
  gcc_assert (y);
- add_to_set (psets, gen_rtx_SET (y,