Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-04-05 Thread Ajit Agarwal
Hello Alex:

On 03/04/24 8:51 pm, Alex Coplan wrote:
> On 23/02/2024 16:41, Ajit Agarwal wrote:
>> Hello Richard/Alex/Segher:
> 
> Hi Ajit,
> 
> Sorry for the delay and thanks for working on this.
> 
> Generally this looks like the right sort of approach (IMO) but I've left
> some comments below.
> 
> I'll start with a meta comment: in the subject line you have marked this
> as 0/2, but usually 0/n is reserved for the cover letter of a patch
> series and wouldn't contain an actual patch.  I think this might have
> confused the Linaro CI suitably such that it didn't run regression tests
> on the patch.
> 
Sorry for that. I have changed that in latest patch.
>>
>> This patch adds the changed code for target independent and
>> dependent code for load store fusion.
>>
>> Common infrastructure of load store pair fusion is
>> divided into target independent and target dependent
>> changed code.
>>
>> Target independent code is the Generic code with
>> pure virtual function to interface betwwen target
>> independent and dependent code.
>>
>> Target dependent code is the implementation of pure
>> virtual function for aarch64 target and the call
>> to target independent code.
>>
>> Bootstrapped for aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>> aarch64: Place target independent and dependent changed code in one file.
>>
>> Common infrastructure of load store pair fusion is
>> divided into target independent and target dependent
>> changed code.
>>
>> Target independent code is the Generic code with
>> pure virtual function to interface betwwen target
>> independent and dependent code.
>>
>> Target dependent code is the implementation of pure
>> virtual function for aarch64 target and the call
>> to target independent code.
>>
>> 2024-02-23  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc: Place target
>>  independent and dependent changed code.
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ---
>>  1 file changed, 305 insertions(+), 132 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 22ed95eb743..2ef22ff1e96 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -40,10 +40,10 @@
>>  
>>  using namespace rtl_ssa;
>>  
>> -static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
>> -static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
>> -static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
>> -static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << 
>> (PAIR_MEM_IMM_BITS - 1));
>> +static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
>> +static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;
> 
> These constants shouldn't be renamed: they are specific to aarch64 so the
> original names should be preserved in this file.
> 
> I expect we want to introduce virtual functions to validate an offset
> for a paired access.  The aarch64 code could then implement it by
> comparing the offset against LDP_{MIN,MAX}_IMM, and the generic code
> could use that hook to replace the code that queries those constants
> directly (i.e. in find_trailing_add and get_viable_bases).
> 
>>  
>>  // We pack these fields (load_p, fpsimd_p, and size) into an integer
>>  // (LFS) which we use as part of the key into the main hash tables.
>> @@ -138,8 +138,18 @@ struct alt_base
>>poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int &budget) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const  = 0;
>> +  virtual void advance () = 0;
>> +};
>> +
>> +
>>  // State used by the pass for a given basic block.
>> -struct ldp_bb_info
>> +struct pair_fusion
> 
> As a comment on the high-level design, I think we want a generic class
> for the overall pass, not just for the BB-specific structure.
> 
> That is because naturally we want the ldp_fusion_bb function itself to
> be a member of such a class, so that it can access virtual functions to
> query the target e.g. about the load/store pair policy, and whether to
> try and promote writeback pairs.
> 
> If we keep all of the virtual functions in such an outer class, then we
> can keep the ldp_fusion_bb class generic (not needing an override for
> each target) and that inner class can perhaps be given a pointer or
> reference to the outer class when it is instantiated in ldp_fusion_bb.
> 

Above comments is addressed in latest patch.

>>  {
>>using def_hash = nofree_ptr_hash;
>>using expr_key_t = pair_hash>;
>> @@ -161,13 +171,13 @@ struct ldp_bb_info
>>static const size_t obstack_alignment = sizeof (void *);
>>bb_inf

Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-04-03 Thread Richard Sandiford
Alex Coplan  writes:
> On 23/02/2024 16:41, Ajit Agarwal wrote:
>> Hello Richard/Alex/Segher:
>
> Hi Ajit,
>
> Sorry for the delay and thanks for working on this.
>
> Generally this looks like the right sort of approach (IMO) but I've left
> some comments below.
>
> I'll start with a meta comment: in the subject line you have marked this
> as 0/2, but usually 0/n is reserved for the cover letter of a patch
> series and wouldn't contain an actual patch.  I think this might have
> confused the Linaro CI suitably such that it didn't run regression tests
> on the patch.

Alex, thanks for the thorough and in-depth review.  I agree with all the
comments FWIW.  Just to add a couple of things:

> > @@ -138,8 +138,18 @@ struct alt_base
> >poly_int64 offset;
> >  };
> >  
> > +// Virtual base class for load/store walkers used in alias analysis.
> > +struct alias_walker
> > +{
> > +  virtual bool conflict_p (int &budget) const = 0;
> > +  virtual insn_info *insn () const = 0;
> > +  virtual bool valid () const  = 0;
> > +  virtual void advance () = 0;
> > +};
> > +
> > +
> >  // State used by the pass for a given basic block.
> > -struct ldp_bb_info
> > +struct pair_fusion
>
> As a comment on the high-level design, I think we want a generic class
> for the overall pass, not just for the BB-specific structure.
>
> That is because naturally we want the ldp_fusion_bb function itself to
> be a member of such a class, so that it can access virtual functions to
> query the target e.g. about the load/store pair policy, and whether to
> try and promote writeback pairs.
>
> If we keep all of the virtual functions in such an outer class, then we
> can keep the ldp_fusion_bb class generic (not needing an override for
> each target) and that inner class can perhaps be given a pointer or
> reference to the outer class when it is instantiated in ldp_fusion_bb.

I agree that in general, the new virtual methods should belong to a pass
class rather than the per-bb class.

In principle, if we need to virtualise existing members of ldp_bb_info
(or code contained within existing members of ldp_bb_info), and if that
code accesses members of the bb info, then it might make sense to have
target-specific derivatives of the bb info structure too, with a virtual
function to create the bb info structure for a given bb.

However, it looks like all but one of the virtual functions in the patch
are self-contained (in the sense of depending only on their arguments
and on globals).  The one exception is transform_for_base, but Alex
asked whether that needs to be virtualised.  If it doesn't, then like
Alex says, it seems that all virtuals could belong to the pass class
rather than to the bb info.

>> [...]
>> +  }
>>  };
>>  
>> +bool
>> +store_modifies_mem_p (rtx mem, insn_info *store_insn, int &budget);
>> +bool load_modified_by_store_p (insn_info *load,
>> +  insn_info *store,
>> +  int &budget);
>> +extern insn_info *
>> +try_repurpose_store (insn_info *first,
>> + insn_info *second,
>> + const insn_range_info &move_range);
>> +
>> +void reset_debug_use (use_info *use);
>> +
>> +extern void
>> +fixup_debug_uses (obstack_watermark &attempt,
>> +  insn_info *insns[2],
>> +  rtx orig_rtl[2],
>> +  insn_info *pair_dst,
>> +  insn_info *trailing_add,
>> +  bool load_p,
>> +  int writeback,
>> +  rtx writeback_effect,
>> +  unsigned base_regno);
>> +
>> +void
>> +fixup_debug_uses_trailing_add (obstack_watermark &attempt,
>> +   insn_info *pair_dst,
>> +   insn_info *trailing_add,
>> +   rtx writeback_effect);
>> +
>> +
>> +extern void
>> +fixup_debug_use (obstack_watermark &attempt,
>> + use_info *use,
>> + def_info *def,
>> + rtx base,
>> + poly_int64 wb_offset);
>> +
>> +extern insn_info *
>> +find_trailing_add (insn_info *insns[2],
>> +   const insn_range_info &pair_range,
>> +   int initial_writeback,
>> +   rtx *writeback_effect,
>> +   def_info **add_def,
>> +   def_info *base_def,
>> +   poly_int64 initial_offset,
>> +   unsigned access_size);
>> +
>> +rtx drop_writeback (rtx mem);
>> +rtx pair_mem_strip_offset (rtx mem, poly_int64 *offset);
>> +bool any_pre_modify_p (rtx x);
>> +bool any_post_modify_p (rtx x);
>> +int encode_lfs (lfs_fields fields);
>> +extern insn_info * latest_hazard_before (insn_info *insn, rtx *ignore,
>> +  insn_info *ignore_insn = nullptr);
>> +insn_info * first_hazard_after (insn_info *insn, rtx *ignore);
>> +bool ranges_overlap_p (const insn_range_info &r1, const insn_range_info 
>> &r2);
>> +insn_range_info get_def_range (def_info *def);
>> +insn_range_info def_downwards_move_range (def_info *def);
>> +insn_range_info def_upwards_mov

Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-04-03 Thread Alex Coplan
On 23/02/2024 16:41, Ajit Agarwal wrote:
> Hello Richard/Alex/Segher:

Hi Ajit,

Sorry for the delay and thanks for working on this.

Generally this looks like the right sort of approach (IMO) but I've left
some comments below.

I'll start with a meta comment: in the subject line you have marked this
as 0/2, but usually 0/n is reserved for the cover letter of a patch
series and wouldn't contain an actual patch.  I think this might have
confused the Linaro CI suitably such that it didn't run regression tests
on the patch.

> 
> This patch adds the changed code for target independent and
> dependent code for load store fusion.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface betwwen target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> Bootstrapped for aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64: Place target independent and dependent changed code in one file.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface betwwen target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> 2024-02-23  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ---
>  1 file changed, 305 insertions(+), 132 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 22ed95eb743..2ef22ff1e96 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -40,10 +40,10 @@
>  
>  using namespace rtl_ssa;
>  
> -static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
> -static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
> -static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
> -static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << 
> (PAIR_MEM_IMM_BITS - 1));
> +static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;

These constants shouldn't be renamed: they are specific to aarch64 so the
original names should be preserved in this file.

I expect we want to introduce virtual functions to validate an offset
for a paired access.  The aarch64 code could then implement it by
comparing the offset against LDP_{MIN,MAX}_IMM, and the generic code
could use that hook to replace the code that queries those constants
directly (i.e. in find_trailing_add and get_viable_bases).

>  
>  // We pack these fields (load_p, fpsimd_p, and size) into an integer
>  // (LFS) which we use as part of the key into the main hash tables.
> @@ -138,8 +138,18 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const  = 0;
> +  virtual void advance () = 0;
> +};
> +
> +
>  // State used by the pass for a given basic block.
> -struct ldp_bb_info
> +struct pair_fusion

As a comment on the high-level design, I think we want a generic class
for the overall pass, not just for the BB-specific structure.

That is because naturally we want the ldp_fusion_bb function itself to
be a member of such a class, so that it can access virtual functions to
query the target e.g. about the load/store pair policy, and whether to
try and promote writeback pairs.

If we keep all of the virtual functions in such an outer class, then we
can keep the ldp_fusion_bb class generic (not needing an override for
each target) and that inner class can perhaps be given a pointer or
reference to the outer class when it is instantiated in ldp_fusion_bb.

>  {
>using def_hash = nofree_ptr_hash;
>using expr_key_t = pair_hash>;
> @@ -161,13 +171,13 @@ struct ldp_bb_info
>static const size_t obstack_alignment = sizeof (void *);
>bb_info *m_bb;
>  
> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> +  pair_fusion (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>{
>  obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>   obstack_alignment, obstack_chunk_alloc,
> 

[PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-02-23 Thread Ajit Agarwal
Hello Richard/Alex/Segher:

This patch adds the changed code for target independent and
dependent code for load store fusion.

Common infrastructure of load store pair fusion is
divided into target independent and target dependent
changed code.

Target independent code is the Generic code with
pure virtual function to interface betwwen target
independent and dependent code.

Target dependent code is the implementation of pure
virtual function for aarch64 target and the call
to target independent code.

Bootstrapped for aarch64-linux-gnu.

Thanks & Regards
Ajit

aarch64: Place target independent and dependent changed code in one file.

Common infrastructure of load store pair fusion is
divided into target independent and target dependent
changed code.

Target independent code is the Generic code with
pure virtual function to interface betwwen target
independent and dependent code.

Target dependent code is the implementation of pure
virtual function for aarch64 target and the call
to target independent code.

2024-02-23  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Place target
independent and dependent changed code.
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ---
 1 file changed, 305 insertions(+), 132 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 22ed95eb743..2ef22ff1e96 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -40,10 +40,10 @@
 
 using namespace rtl_ssa;
 
-static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
-static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
-static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
-static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
+static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
+static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << 
(PAIR_MEM_IMM_BITS - 1));
+static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
+static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;
 
 // We pack these fields (load_p, fpsimd_p, and size) into an integer
 // (LFS) which we use as part of the key into the main hash tables.
@@ -138,8 +138,18 @@ struct alt_base
   poly_int64 offset;
 };
 
+// Virtual base class for load/store walkers used in alias analysis.
+struct alias_walker
+{
+  virtual bool conflict_p (int &budget) const = 0;
+  virtual insn_info *insn () const = 0;
+  virtual bool valid () const  = 0;
+  virtual void advance () = 0;
+};
+
+
 // State used by the pass for a given basic block.
-struct ldp_bb_info
+struct pair_fusion
 {
   using def_hash = nofree_ptr_hash;
   using expr_key_t = pair_hash>;
@@ -161,13 +171,13 @@ struct ldp_bb_info
   static const size_t obstack_alignment = sizeof (void *);
   bb_info *m_bb;
 
-  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
+  pair_fusion (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
   {
 obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
obstack_alignment, obstack_chunk_alloc,
obstack_chunk_free);
   }
-  ~ldp_bb_info ()
+  ~pair_fusion ()
   {
 obstack_free (&m_obstack, nullptr);
 
@@ -177,10 +187,50 @@ struct ldp_bb_info
bitmap_obstack_release (&m_bitmap_obstack);
   }
   }
+  void track_access (insn_info *, bool load, rtx mem);
+  void transform ();
+  void cleanup_tombstones ();
+  virtual void set_multiword_subreg (insn_info *i1, insn_info *i2,
+bool load_p) = 0;
+  virtual rtx gen_load_store_pair (rtx *pats,  rtx writeback,
+  bool load_p) = 0;
+  void merge_pairs (insn_list_t &, insn_list_t &,
+   bool load_p, unsigned access_size);
+  virtual void transform_for_base (int load_size, access_group &group) = 0;
+
+  bool try_fuse_pair (bool load_p, unsigned access_size,
+insn_info *i1, insn_info *i2);
+
+  bool fuse_pair (bool load_p, unsigned access_size,
+ int writeback,
+ insn_info *i1, insn_info *i2,
+ base_cand &base,
+ const insn_range_info &move_range);
+
+  void do_alias_analysis (insn_info *alias_hazards[4],
+ alias_walker *walkers[4],
+ bool load_p);
+
+  void track_tombstone (int uid);
+
+  bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
+
+  virtual bool is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
+  bool load_p) = 0;
+
+  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
+  virtual bool pair_trailing_writeback_p () = 0;
+  virtual bool pair_check_register_operand (bool load_p, rtx reg_op,
+   machine_mode mem_mode) = 0;
+  virtual int pair_mem_alias_check_limit () = 0;