Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-03 Thread Manolis Tsamis
On Fri, Sep 29, 2023 at 10:22 PM Jeff Law  wrote:
>
>
>
> On 9/12/23 04:13, Manolis Tsamis wrote:
>
> >>> +
> >>> +/* Get the single reaching definition of an instruction inside a BB.
> >>> +   The definition is desired for REG used in INSN.
> >>> +   Return the definition insn or NULL if there's no definition with
> >>> +   the desired criteria.  */
> >>> +static rtx_insn*
> >>> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> >>> +{
> >>> +  df_ref use;
> >>> +  struct df_link *ref_chain, *ref_link;
> >>> +
> >>> +  FOR_EACH_INSN_USE (use, insn)
> >>> +{
> >>> +  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> >>> + return NULL;
> >>> +  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> >>> + break;
> >>> +}
> >>> +
> >>> +  if (!use)
> >>> +return NULL;
> >>> +
> >>> +  ref_chain = DF_REF_CHAIN (use);
> >> So what if there's two uses of REG in INSN?  I don't think it'd be
> >> common at all, but probably better safe and reject than sorry, right? Or
> >> is that case filtered out earlier?
> >>
> > If the REG is the same won't the definitions be the same even if that
> > REG appears multiple times in INSN?
> Yes.
Good, so no issues here.
>
> > fold_offsets_1 should be able to handle the folding with multiple uses
> > of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
> > If there's no other issue here, I assume we want to keep that as-is in
> > order to not reduce the propagation power (which I assume is similar
> > to ree, which uses the same logic).
> OK.  I was primarily concerned about the folding and rewriting aspects.
> It probably can only show up on targets with LEA-like instructions, and
> even on such targets it's probably rare.
>
OK.
>
>
>
> >>> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> >>> +   Return true iff INSN is such an instruction and return through MEM_OUT,
> >>> +   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> >>> +   used as a base address and the offset respectively.
> >>> +   All of the out pointers may be NULL in which case they will be ignored.  */
> >>> +bool
> >>> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> >>> +HOST_WIDE_INT *offset_out)
> >>> +{
> >>> +  rtx set = single_set (insn);
> >>> +  rtx mem = NULL_RTX;
> >>> +
> >>> +  if (set != NULL_RTX)
> >>> +{
> >>> +  rtx src = SET_SRC (set);
> >>> +  rtx dest = SET_DEST (set);
> >>> +
> >>> +  /* Don't fold when we have unspec / volatile.  */
> >>> +  if (GET_CODE (src) == UNSPEC
> >>> +   || GET_CODE (src) == UNSPEC_VOLATILE
> >>> +   || GET_CODE (dest) == UNSPEC
> >>> +   || GET_CODE (dest) == UNSPEC_VOLATILE)
> >>> + return false;
> >>> +
> >>> +  if (MEM_P (src))
> >>> + mem = src;
> >>> +  else if (MEM_P (dest))
> >>> + mem = dest;
> >>> +  else if ((GET_CODE (src) == SIGN_EXTEND
> >>> + || GET_CODE (src) == ZERO_EXTEND)
> >>> +&& MEM_P (XEXP (src, 0)))
> >> Note some architectures allow both a source and destination memory.  It
> >> looks like your code will prefer the source operand in that case.
> >> That's fine, just pointing it out.
> >>
> > Thanks for pointing that possibility out. I thought for a moment that
> > this would be a bug with multiple mentions of the address register,
> > but it should be fine due to:
> > /* Special case: A foldable memory store is not foldable if it
> > mentions DEST outside of the address calculation. */
> ACK.
>
> The other thing I keep pondering is autoincrement style addressing.
> Though I think at some point I convinced myself they weren't a problem.
> I think your checks only allow specific kinds of expressions for the
> memory address and I don't think {PRE,POST}_{INC,DEC,MODIFY} were in the
> list of valid ops.
>
Yes, although I hadn't considered pre/post-inc/dec explicitly, they're
indeed not allowed because get_fold_mem_root restricts which
expressions can serve as a root memory operation.
I believe these shouldn't be an issue, as anything that transitively
affects even one of them will be rejected.
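
To illustrate, a minimal sketch of the shape of that check (not the
patch's exact code): only a plain register or reg + const_int address
can become a fold root, so every autoincrement form falls through to
the rejecting branch.

  rtx addr = XEXP (mem, 0);
  if (REG_P (addr))
    ;  /* Plain register base: acceptable fold root.  */
  else if (GET_CODE (addr) == PLUS
           && REG_P (XEXP (addr, 0))
           && CONST_INT_P (XEXP (addr, 1)))
    ;  /* reg + const_int: acceptable fold root.  */
  else
    return false;  /* PRE_INC, POST_DEC, PRE_MODIFY etc. end up here.  */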

>
> >>> +
> >>> +  int max_iters = 5;
> >>> +  for (int i = 0; i < max_iters; i++)
> >>> +{
> >>> +  bool made_changes = false;
> >>> +  for (fold_info_map::iterator iter = fold_info->begin ();
> >>> +iter != fold_info->end (); ++iter)
> >>> + {
> >>> +   fold_mem_info *info = (*iter).second;
> >>> +   if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> >>> + made_changes |= bitmap_ior_into (&cannot_fold_insns,
> >>> +  info->fold_insns);
> >>> + }
> >>> +
> >>> +  if (!made_changes)
> >>> + return true;
> >>> +}
> >>> +
> >>> +  return false;
> >> So how was the magic value of "5" determined here?  In general we try
> >> not to have magic #s like that and instead find a better way to control
> >> iterations, falling back to a PARAM when all else fails.
> >>
> > It's 

Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-09-29 Thread Jeff Law




On 9/12/23 04:13, Manolis Tsamis wrote:


+
+/* Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition with
+   the desired criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+{
+  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+ return NULL;
+  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+ break;
+}
+
+  if (!use)
+return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);

So what if there's two uses of REG in INSN?  I don't think it'd be
common at all, but probably better safe and reject than sorry, right? Or
is that case filtered out earlier?


If the REG is the same won't the definitions be the same even if that
REG appears multiple times in INSN?

Yes.


fold_offsets_1 should be able to handle the folding with multiple uses
of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
If there's no other issue here, I assume we want to keep that as-is in
order to not reduce the propagation power (which I assume is similar
to ree, which uses the same logic).
OK.  I was primarily concerned about the folding and rewriting aspects. 
It probably can only show up on targets with LEA-like instructions, and 
even on such targets it's probably rare.






+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction and return through MEM_OUT,
+   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
+   used as a base address and the offset respectively.
+   All of the out pointers may be NULL in which case they will be ignored.  */
+bool
+get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
+HOST_WIDE_INT *offset_out)
+{
+  rtx set = single_set (insn);
+  rtx mem = NULL_RTX;
+
+  if (set != NULL_RTX)
+{
+  rtx src = SET_SRC (set);
+  rtx dest = SET_DEST (set);
+
+  /* Don't fold when we have unspec / volatile.  */
+  if (GET_CODE (src) == UNSPEC
+   || GET_CODE (src) == UNSPEC_VOLATILE
+   || GET_CODE (dest) == UNSPEC
+   || GET_CODE (dest) == UNSPEC_VOLATILE)
+ return false;
+
+  if (MEM_P (src))
+ mem = src;
+  else if (MEM_P (dest))
+ mem = dest;
+  else if ((GET_CODE (src) == SIGN_EXTEND
+ || GET_CODE (src) == ZERO_EXTEND)
+&& MEM_P (XEXP (src, 0)))

Note some architectures allow both a source and destination memory.  It
looks like your code will prefer the source operand in that case.
That's fine, just pointing it out.


Thanks for pointing that possibility out. I thought for a moment that
this would be a bug with multiple mentions of the address register,
but it should be fine due to:
/* Special case: A foldable memory store is not foldable if it
mentions DEST outside of the address calculation. */
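
A sketch of the problematic shape (illustrative, not code from the
patch; register numbers arbitrary): a store whose address is based on
R1 and whose stored value is also R1.  Folding rewrites R1's defining
instruction to drop the offset, so the stored value would silently
change.

  rtx r1 = gen_rtx_REG (DImode, 10);
  rtx addr = gen_rtx_PLUS (DImode, r1, GEN_INT (8));
  /* (set (mem (plus r1 8)) r1) -- r1 is mentioned outside the
     address calculation, so this store must not be folded.  */
  rtx store = gen_rtx_SET (gen_rtx_MEM (DImode, addr), r1);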

ACK.

The other thing I keep pondering is autoincrement style addressing. 
Though I think at some point I convinced myself they weren't a problem. 
I think your checks only allow specific kinds of expressions for the 
memory address and I don't think {PRE,POST}_{INC,DEC,MODIFY} were in the 
list of valid ops.




+
+  int max_iters = 5;
+  for (int i = 0; i < max_iters; i++)
+{
+  bool made_changes = false;
+  for (fold_info_map::iterator iter = fold_info->begin ();
+iter != fold_info->end (); ++iter)
+ {
+   fold_mem_info *info = (*iter).second;
+   if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
+ made_changes |= bitmap_ior_into (&cannot_fold_insns,
+  info->fold_insns);
+ }
+
+  if (!made_changes)
+ return true;
+}
+
+  return false;

So how was the magic value of "5" determined here?  In general we try
not to have magic #s like that and instead find a better way to control
iterations, falling back to a PARAM when all else fails.


It's indeed just a magic value :)
In general even two iterations of this should be quite rare; one
iteration will handle the majority of interesting cases. I chose 5 to
limit the worst-case execution time for degenerate cases.

I'll be happy to change that to something better, but I couldn't
figure out how "falling back to a PARAM when all else fails" should
work here. Is there a similar example somewhere else in the code that
I can look at?
If we expect two iterations to be quite rare, then I doubt a param is 
really worth the effort.  I might suggest using a loop like


for (pass = 0; pass < flag_expensive_optimizations + 1; pass++)

No magic numbers :-)   We use similar loops elsewhere for this kind of 
scenario.
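
Applied to the closure loop quoted earlier, only the loop header needs
to change -- a sketch, assuming the bitmap involved is the pass's
cannot_fold_insns:

  for (int pass = 0; pass < flag_expensive_optimizations + 1; pass++)
    {
      bool made_changes = false;
      for (fold_info_map::iterator iter = fold_info->begin ();
           iter != fold_info->end (); ++iter)
        {
          fold_mem_info *info = (*iter).second;
          if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
            made_changes |= bitmap_ior_into (&cannot_fold_insns,
                                             info->fold_insns);
        }

      if (!made_changes)
        return true;
    }

  return false;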



If you ever need to create a param, the procedure is to add the option 
to params.opt.  If you look in that file you'll see that you define a 
variable name in the specification that you can then reference.
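
For example, a hypothetical entry for this limit in params.opt (the
param name here is made up for illustration) would look like:

  -param=fmo-max-closure-iterations=
  Common Joined UInteger Var(param_fmo_max_closure_iterations) Init(5) Param Optimization
  Maximum number of iterations when computing the validity closure in the fold-mem-offsets pass.

and the pass could then read param_fmo_max_closure_iterations directly.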



Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-09-12 Thread Manolis Tsamis
On Tue, Sep 12, 2023 at 3:47 AM Jeff Law  wrote:
>
>
>
> On 9/9/23 02:46, Manolis Tsamis wrote:
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >addi t4,sp,16
> >add  t2,a6,t4
> >shl  t3,t2,1
> >ld   a2,0(t3)
> >addi a2,1
> >sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >add  t2,a6,sp
> >shl  t3,t2,1
> >ld   a2,32(t3)
> >addi a2,1
> >sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in: Add fold-mem-offsets.o.
> >   * passes.def: Schedule a new pass.
> >   * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> >   * common.opt: New options.
> >   * doc/invoke.texi: Document new option.
> >   * fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> > Changes in v5:
> >  - Introduce new helper function fold_offsets_1.
> >  - Fix bug because constants could be partially propagated
> >through instructions that weren't understood.
> >  - Introduce helper class fold_mem_info that stores f-m-o
> >info for an instruction.
> >  - Calculate fold_offsets only once with do_fold_info_calculation.
> >  - Fix correctness issue by introducing compute_validity_closure.
> >  - Propagate in more cases for PLUS/MINUS with constant.
> >
> > Changes in v4:
> >  - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> >  - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> >  - Call recog when patching offsets in do_commit_offset.
> >  - Restore INSN_CODE after modifying insn in do_check_validity.
> >
> > Changes in v3:
> >  - Added propagation for more codes:
> >sub, neg, mul.
> >  - Added folding / elimination for sub and
> >const int moves.
> >  - For the validity check of the generated addresses
> >also test memory_address_addr_space_p.
> >  - Replaced GEN_INT with gen_int_mode.
> >  - Replaced some bitmap_head with auto_bitmap.
> >  - Refactor each phase into own function for readability.
> >  - Add dump details.
> >  - Replace rtx iteration with reg_mentioned_p.
> >  - Return early for codes that we can't propagate through.
> >
> > Changes in v2:
> >  - Made the pass target-independent instead of RISCV specific.
> >  - Fixed a number of bugs.
> >  - Add code to handle more ADD patterns as found
> >in other targets (x86, aarch64).
> >  - Improved naming and comments.
> >  - Fixed bitmap memory leak.
> >
>
>
> > +
> > +/* Get the single reaching definition of an instruction inside a BB.
> > +   The definition is desired for REG used in INSN.
> > +   Return the definition insn or NULL if there's no definition with
> > +   the desired criteria.  */
> > +static rtx_insn*
> > +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> > +{
> > +  df_ref use;
> > +  struct df_link *ref_chain, *ref_link;
> > +
> > +  FOR_EACH_INSN_USE (use, insn)
> > +{
> > +  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> > + return NULL;
> > +  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> > + break;
> > +}
> > +
> > +  if (!use)
> > +return NULL;
> > +
> > +  ref_chain = DF_REF_CHAIN (use);
> So what if there's two uses of REG in INSN?  I don't think it'd be
> common at all, but probably better safe and reject than sorry, right? Or
> is that case filtered out earlier?
>
If the REG is the same won't the definitions be the same even if that
REG appears multiple times in INSN?
fold_offsets_1 should be able to handle the folding with multiple uses
of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
If there's no other issue here, I assume we want to keep that as-is in
order to not reduce the propagation power (which I assume is similar
to ree, which uses the same logic).
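
For illustration, those two shapes built with GCC's RTL constructors
(a sketch; DImode and the register number are chosen arbitrarily):

  rtx r1 = gen_rtx_REG (DImode, 10);
  /* add R1, R1  */
  rtx add1 = gen_rtx_PLUS (DImode, r1, r1);
  /* add (ashift R1, 1), R1  */
  rtx add2 = gen_rtx_PLUS (DImode,
                           gen_rtx_ASHIFT (DImode, r1, GEN_INT (1)),
                           r1);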

>
>
>
> > +
> > +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> Formatting nit.  The '*' should be next to the variable, not the type.
>
Thanks, I fixed this in multiple places. I thought that
check_GNU_style would whine about this...
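
For reference, the fixed form of the line above, with the '*' next to
the variable:

  rtx_insn *def = DF_REF_INSN (ref_chain->ref);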

> > +
>
> > +
> > +static HOST_WIDE_INT
> > +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> > +
> > +/*  Helper function for fold_offsets.
> > +
> > +If 

Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-09-11 Thread Jeff Law via Gcc-patches




On 9/9/23 02:46, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v5:
 - Introduce new helper function fold_offsets_1.
 - Fix bug because constants could be partially propagated
   through instructions that weren't understood.
 - Introduce helper class fold_mem_info that stores f-m-o
   info for an instruction.
 - Calculate fold_offsets only once with do_fold_info_calculation.
 - Fix correctness issue by introducing compute_validity_closure.
 - Propagate in more cases for PLUS/MINUS with constant.

Changes in v4:
 - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
 - Remove fold_mem_offsets_driver and enum fold_mem_phase.
 - Call recog when patching offsets in do_commit_offset.
 - Restore INSN_CODE after modifying insn in do_check_validity.

Changes in v3:
 - Added propagation for more codes:
   sub, neg, mul.
 - Added folding / elimination for sub and
   const int moves.
 - For the validity check of the generated addresses
   also test memory_address_addr_space_p.
 - Replaced GEN_INT with gen_int_mode.
 - Replaced some bitmap_head with auto_bitmap.
 - Refactor each phase into own function for readability.
 - Add dump details.
 - Replace rtx iteration with reg_mentioned_p.
 - Return early for codes that we can't propagate through.

Changes in v2:
 - Made the pass target-independent instead of RISCV specific.
 - Fixed a number of bugs.
 - Add code to handle more ADD patterns as found
   in other targets (x86, aarch64).
 - Improved naming and comments.
 - Fixed bitmap memory leak.





+
+/* Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition with
+   the desired criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+{
+  if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+   return NULL;
+  if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+   break;
+}
+
+  if (!use)
+return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);
So what if there's two uses of REG in INSN?  I don't think it'd be 
common at all, but probably better safe and reject than sorry, right? Or 
is that case filtered out earlier?






+
+  rtx_insn* def = DF_REF_INSN (ref_chain->ref);

Formatting nit.  The '*' should be next to the variable, not the type.


+



+
+static HOST_WIDE_INT
+fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
+
+/*  Helper function for fold_offsets.
+
+If DO_RECURSION is false and ANALYZE is true this function returns true iff
+it understands the structure of INSN and knows how to propagate constants
+through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
+
+If DO_RECURSION is true then it also calls fold_offsets for each recognised
+part of INSN with the appropriate arguments.
+
+If DO_RECURSION is true and ANALYZE is false then the offset that would
+result from folding is computed and returned through the pointer OFFSET_OUT.
+The instructions that can be folded are recorded in FOLDABLE_INSNS.
+*/
+static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
+   HOST_WIDE_INT *offset_out, bitmap foldable_insns)
Nit.  Linkage and return type on separate line.  That makes the function 
name start at the beginning of a line.
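
For reference, the declaration reformatted per that nit (with the '*'
placement fixed as well):

  static bool
  fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
                  HOST_WIDE_INT *offset_out, bitmap foldable_insns)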





+
+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an 

Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

2023-09-09 Thread Manolis Tsamis
This new version fixes the issues discussed in v4 and also fixes an
issue that is described in the newly introduced
compute_validity_closure.

Bootstrapped on x86-64 and AArch64.
I also ran the GCC testsuite on x86-64, AArch64 and RISCV64. There are
no regressions except for gcc.target/i386/pr52146.c, which I have
already mentioned and which I believe shouldn't be fixed in f-m-o.

I have also measured the number of eliminated instructions for SPEC
intrate on these three architectures, which are as follows:

RISCV64:
  500: 112
  502: 443
  505: 0
  520: 808
  523: 20
  525: 384
  531: 41
  541: 97
  548: 101
  557: 9

AArch64:
  500: 71
  502: 318
  505: 0
  520: 23
  523: 205
  525: 73
  531: 7
  541: 56
  548: 0
  557: 2

x86-64:
  500: 8
  502: 16
  505: 0
  520: 4
  523: 5
  525: 2
  531: 0
  541: 0
  548: 0
  557: 0

Thanks,
Manolis

On Sat, Sep 9, 2023 at 11:47 AM Manolis Tsamis  wrote:
>
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
> * Makefile.in: Add fold-mem-offsets.o.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> * common.opt: New options.
> * doc/invoke.texi: Document new option.
> * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
> Changes in v5:
> - Introduce new helper function fold_offsets_1.
> - Fix bug because constants could be partially propagated
>   through instructions that weren't understood.
> - Introduce helper class fold_mem_info that stores f-m-o
>   info for an instruction.
> - Calculate fold_offsets only once with do_fold_info_calculation.
> - Fix correctness issue by introducing compute_validity_closure.
> - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
> - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> - Call recog when patching offsets in do_commit_offset.
> - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
> - Added propagation for more codes:
>   sub, neg, mul.
> - Added folding / elimination for sub and
>   const int moves.
> - For the validity check of the generated addresses
>   also test memory_address_addr_space_p.
> - Replaced GEN_INT with gen_int_mode.
> - Replaced some bitmap_head with auto_bitmap.
> - Refactor each phase into own function for readability.
> - Add dump details.
> - Replace rtx iteration with reg_mentioned_p.
> - Return early for codes that we can't propagate through.
>
> Changes in v2:
> - Made the pass target-independent instead of RISCV specific.
> - Fixed a number of bugs.
> - Add code to handle more ADD patterns as found
>   in other targets (x86, aarch64).
> - Improved naming and comments.
> - Fixed bitmap memory leak.
>
>  gcc/Makefile.in   |   1 +
>  gcc/common.opt|   4 +
>  gcc/doc/invoke.texi   |   8 +
>  gcc/fold-mem-offsets.cc   | 891 ++
>  gcc/passes.def|   1 +
>  .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
>  .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
>  .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
>  gcc/tree-pass.h   |   1 +
>  9 files changed, 963 insertions(+)
>  create mode 100644 gcc/fold-mem-offsets.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 6d608db4dd2..d18bef1be4b 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1435,6 +1435,7 @@ OBJS = \
> fixed-value.o \
> fold-const.o \
> fold-const-call.o \
> +