Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-08 Thread Manos Anagnostakis
So is it OK for trunk as-is in v6, with the generic changes added in GCC 15?

Manos.

On Thu, Dec 7, 2023 at 16:10 Richard Biener <richard.guent...@gmail.com> wrote:

> On Thu, Dec 7, 2023 at 1:20 PM Richard Sandiford
>  wrote:
> >
> > Richard Biener  writes:
> > > On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich <
> philipp.toms...@vrull.eu> wrote:
> > >>
> > >> On Wed, 6 Dec 2023 at 23:32, Richard Biener <
> richard.guent...@gmail.com> wrote:
> > >> >
> > >> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
> > >> >  wrote:
> > >> > >
> > >> > > This is an RTL pass that detects store forwarding from stores to
> larger loads (load pairs).
> > >> > >
> > >> > > This optimization is SPEC2017-driven and was found to be
> beneficial for some benchmarks,
> > >> > > through testing on ampere1/ampere1a machines.
> > >> > >
> > >> > > For example, it can transform cases like
> > >> > >
> > >> > > str  d5, [sp, #320]
> > >> > > fmul d5, d31, d29
> > >> > > ldp  d31, d17, [sp, #312] # Large load from small store
> > >> > >
> > >> > > to
> > >> > >
> > >> > > str  d5, [sp, #320]
> > >> > > fmul d5, d31, d29
> > >> > > ldr  d31, [sp, #312]
> > >> > > ldr  d17, [sp, #320]
> > >> > >
> > >> > > Currently, the pass is disabled by default on all architectures
> and enabled by a target-specific option.
> > >> > >
> > >> > > If deemed beneficial enough for a default, it will be enabled on
> ampere1/ampere1a,
> > >> > > or other architectures as well, without needing to be turned on
> by this option.
> > >> >
> > >> > What is aarch64-specific about the pass?
> > >> >
> > >> > I see an increasingly large number of target specific passes pop up
> (probably
> > >> > for the excuse we can generalize them if necessary).  But GCC isn't
> LLVM
> > >> > and this feels like getting out of hand?
> > >>
> > >> We had an OK from Richard Sandiford on the earlier (v5) version with
> > >> v6 just fixing an obvious bug... so I was about to merge this earlier
> > >> just when you commented.
> > >>
> > >> Given that this had months of test exposure on our end, I would prefer
> > >> to move this forward for GCC14 in its current form.
> > >> The project of replacing architecture-specific store-forwarding passes
> > >> with a generalized infrastructure could then be addressed in the GCC15
> > >> timeframe (or beyond)?
> > >
> > > It's up to target maintainers, I just picked this pass (randomly) to
> make this
> > > comment (of course also knowing that STLF fails are a common issue on
> > > pipelined uarchs).
> >
> > I agree there's scope for making some of this target-independent.
> >
> > One vague thing I've been wondering about is whether, for some passes
> > like these, we should use inheritance rather than target hooks.  So in
> > this case, the target-independent code would provide a framework for
> > iterating over the function and testing for forwarding, but the target
> > would ultimately decide what to do with that information.  This would
> > also make it easier for targets to add genuinely target-specific
> > information to the bookkeeping structures.
> >
> > In case it sounds otherwise, that's supposed to be more than
> > just a structural C++-vs-C thing.  The idea is that we'd have
> > a pass for "resolving store forwarding-related problems",
> > but the specific goals would be mostly (or at least partially)
> > target-specific rather than target-independent.
>
> In some cases we've used target hooks for this, in this case it might
> work as well.
>
> > I'd wondered the same thing about the early-ra pass that we're
> > adding for SME.  Some of the framework could be generalised and
> > made target-independent, but the main purpose of the pass (using
> > strided registers with certain patterns and constraints) is highly
> > target-specific.
>
> ... not sure about this one, though.
>
> Richard.
>
> > Thanks,
> > Richard
>


Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-07 Thread Richard Biener
On Thu, Dec 7, 2023 at 1:20 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich  
> > wrote:
> >>
> >> On Wed, 6 Dec 2023 at 23:32, Richard Biener  
> >> wrote:
> >> >
> >> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
> >> >  wrote:
> >> > >
> >> > > This is an RTL pass that detects store forwarding from stores to 
> >> > > larger loads (load pairs).
> >> > >
> >> > > This optimization is SPEC2017-driven and was found to be beneficial 
> >> > > for some benchmarks,
> >> > > through testing on ampere1/ampere1a machines.
> >> > >
> >> > > For example, it can transform cases like
> >> > >
> >> > > str  d5, [sp, #320]
> >> > > fmul d5, d31, d29
> >> > > ldp  d31, d17, [sp, #312] # Large load from small store
> >> > >
> >> > > to
> >> > >
> >> > > str  d5, [sp, #320]
> >> > > fmul d5, d31, d29
> >> > > ldr  d31, [sp, #312]
> >> > > ldr  d17, [sp, #320]
> >> > >
> >> > > Currently, the pass is disabled by default on all architectures and 
> >> > > enabled by a target-specific option.
> >> > >
> >> > > If deemed beneficial enough for a default, it will be enabled on 
> >> > > ampere1/ampere1a,
> >> > > or other architectures as well, without needing to be turned on by 
> >> > > this option.
> >> >
> >> > What is aarch64-specific about the pass?
> >> >
> >> > I see an increasingly large number of target specific passes pop up 
> >> > (probably
> >> > for the excuse we can generalize them if necessary).  But GCC isn't LLVM
> >> > and this feels like getting out of hand?
> >>
> >> We had an OK from Richard Sandiford on the earlier (v5) version with
> >> v6 just fixing an obvious bug... so I was about to merge this earlier
> >> just when you commented.
> >>
> >> Given that this had months of test exposure on our end, I would prefer
> >> to move this forward for GCC14 in its current form.
> >> The project of replacing architecture-specific store-forwarding passes
> >> with a generalized infrastructure could then be addressed in the GCC15
> >> timeframe (or beyond)?
> >
> > It's up to target maintainers, I just picked this pass (randomly) to make 
> > this
> > comment (of course also knowing that STLF fails are a common issue on
> > pipelined uarchs).
>
> I agree there's scope for making some of this target-independent.
>
> One vague thing I've been wondering about is whether, for some passes
> like these, we should use inheritance rather than target hooks.  So in
> this case, the target-independent code would provide a framework for
> iterating over the function and testing for forwarding, but the target
> would ultimately decide what to do with that information.  This would
> also make it easier for targets to add genuinely target-specific
> information to the bookkeeping structures.
>
> In case it sounds otherwise, that's supposed to be more than
> just a structural C++-vs-C thing.  The idea is that we'd have
> a pass for "resolving store forwarding-related problems",
> but the specific goals would be mostly (or at least partially)
> target-specific rather than target-independent.

In some cases we've used target hooks for this, in this case it might
work as well.

> I'd wondered the same thing about the early-ra pass that we're
> adding for SME.  Some of the framework could be generalised and
> made target-independent, but the main purpose of the pass (using
> strided registers with certain patterns and constraints) is highly
> target-specific.

... not sure about this one, though.

Richard.

> Thanks,
> Richard


Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-07 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich  
> wrote:
>>
>> On Wed, 6 Dec 2023 at 23:32, Richard Biener  
>> wrote:
>> >
>> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>> >  wrote:
>> > >
>> > > This is an RTL pass that detects store forwarding from stores to larger 
>> > > loads (load pairs).
>> > >
>> > > This optimization is SPEC2017-driven and was found to be beneficial for 
>> > > some benchmarks,
>> > > through testing on ampere1/ampere1a machines.
>> > >
>> > > For example, it can transform cases like
>> > >
>> > > str  d5, [sp, #320]
>> > > fmul d5, d31, d29
>> > > ldp  d31, d17, [sp, #312] # Large load from small store
>> > >
>> > > to
>> > >
>> > > str  d5, [sp, #320]
>> > > fmul d5, d31, d29
>> > > ldr  d31, [sp, #312]
>> > > ldr  d17, [sp, #320]
>> > >
>> > > Currently, the pass is disabled by default on all architectures and 
>> > > enabled by a target-specific option.
>> > >
>> > > If deemed beneficial enough for a default, it will be enabled on 
>> > > ampere1/ampere1a,
>> > > or other architectures as well, without needing to be turned on by this 
>> > > option.
>> >
>> > What is aarch64-specific about the pass?
>> >
>> > I see an increasingly large number of target specific passes pop up 
>> > (probably
>> > for the excuse we can generalize them if necessary).  But GCC isn't LLVM
>> > and this feels like getting out of hand?
>>
>> We had an OK from Richard Sandiford on the earlier (v5) version with
>> v6 just fixing an obvious bug... so I was about to merge this earlier
>> just when you commented.
>>
>> Given that this had months of test exposure on our end, I would prefer
>> to move this forward for GCC14 in its current form.
>> The project of replacing architecture-specific store-forwarding passes
>> with a generalized infrastructure could then be addressed in the GCC15
>> timeframe (or beyond)?
>
> It's up to target maintainers, I just picked this pass (randomly) to make this
> comment (of course also knowing that STLF fails are a common issue on
> pipelined uarchs).

I agree there's scope for making some of this target-independent.

One vague thing I've been wondering about is whether, for some passes
like these, we should use inheritance rather than target hooks.  So in
this case, the target-independent code would provide a framework for
iterating over the function and testing for forwarding, but the target
would ultimately decide what to do with that information.  This would
also make it easier for targets to add genuinely target-specific
information to the bookkeeping structures.

In case it sounds otherwise, that's supposed to be more than
just a structural C++-vs-C thing.  The idea is that we'd have
a pass for "resolving store forwarding-related problems",
but the specific goals would be mostly (or at least partially)
target-specific rather than target-independent.
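
Something along these lines, as a standalone sketch -- every class and
member name below is a hypothetical stand-in, not an existing GCC
interface:

#include <cstdint>
#include <vector>

struct mem_access
{
  bool is_store;
  uint64_t addr;   /* resolved address of the access */
  uint64_t size;   /* access size in bytes */
};

/* Target-independent framework: walk the recorded accesses and detect
   stores that forward into a strictly wider load.  What to do about a
   hit is left to the target.  */
class store_forwarding_walker
{
public:
  virtual ~store_forwarding_walker () = default;

  void run (const std::vector<mem_access> &accesses)
  {
    for (size_t i = 0; i < accesses.size (); ++i)
      for (size_t j = i + 1; j < accesses.size (); ++j)
        if (forwards_p (accesses[i], accesses[j]))
          handle_forwarding (accesses[i], accesses[j]);
  }

protected:
  /* Target-specific decision point; targets can also hang their own
     bookkeeping off a subclass.  */
  virtual void handle_forwarding (const mem_access &store,
                                  const mem_access &load) = 0;

private:
  static bool forwards_p (const mem_access &st, const mem_access &ld)
  {
    return st.is_store && !ld.is_store
           && ld.addr <= st.addr
           && st.addr + st.size <= ld.addr + ld.size
           && ld.size > st.size;   /* the "larger load" (LDP) case */
  }
};

/* aarch64-flavoured subclass: e.g. split the wide load (LDP) into two
   single-register loads (LDRs).  */
class aarch64_store_forwarding_walker : public store_forwarding_walker
{
protected:
  void handle_forwarding (const mem_access &, const mem_access &) override
  {
    /* ... replace the load pair with two single-register loads ...  */
  }
};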

I'd wondered the same thing about the early-ra pass that we're
adding for SME.  Some of the framework could be generalised and
made target-independent, but the main purpose of the pass (using
strided registers with certain patterns and constraints) is highly
target-specific.

Thanks,
Richard


Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-07 Thread Manos Anagnostakis
On Thu, Dec 7, 2023 at 09:39 Richard Biener <richard.guent...@gmail.com> wrote:

> On Wed, Dec 6, 2023 at 6:15 PM Manos Anagnostakis
>  wrote:
> >
> > Hi again,
> >
> > I went and tested the requested changes and found out the following:
> >
> > 1. The pass is currently increasing insn_cnt on a NONJUMP_INSN_P, which
> is a subset of NONDEBUG_INSN_P. I think there is no problem with depending
> on -g with the current version. Do you see something I don't or did you
> mean something else?
>
> It just occurred to me - thanks for double-checking (it wasn't obvious
> to me NONJUMP_INSN_P doesn't include DEBUG_INSNs ...).
>
> > 2. Not processing all instructions is not letting cselib record all the
> effects they have, thus it does not have updated information to find true
> forwardings at any given time. I can confirm this since I am witnessing
> many unexpected changes on the number of handled cases if I do this only
> for loads/stores.
>
> Ah, OK.  I guess I don't fully get it: it seems you use cselib to
> compare addresses, and while possibly not processing part of the
> address computation might break this, other stmts in between should be
> uninteresting (at least I don't see you handling intermediate
> may-aliasing [small] stores to disable the splitting).
>
> So in the end it's a compile-time trade-off between relying solely on
> cselib, or trying to optimize cselib use with DF for the address
> computations?
>
> Richard.
>
I am not really familiar with the DF technique, but what we did is the
following:

At first we were using a combination of alias analysis and cselib which,
while trying to avoid false positives whose memory-register address had
changed in between, was rejecting some stores from being considered as
candidates for forwarding to an ldp.

Thus, in the end, using only cselib with the correct lookups was the way
to handle both a register difference and an intermediate change of the
address value.
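
Roughly, the shape of the lookup is as follows (an illustrative sketch
only, not the actual pass code; the only GCC interfaces assumed here are
cselib_process_insn, cselib_lookup and the usual RTL predicates):

/* Illustrative sketch -- not the code of the pass.  Let cselib see every
   non-debug insn so address tracking stays accurate, and compare the
   canonical cselib values of a store address and a load address.  */

static bool
same_base_address_p (rtx store_mem, rtx load_mem)
{
  /* Look the addresses up without creating new values; equal canonical
     values indicate the same address (a simplification: a real check
     would also reason about offsets and sizes).  */
  cselib_val *sval = cselib_lookup (XEXP (store_mem, 0), Pmode, 0,
                                    GET_MODE (store_mem));
  cselib_val *lval = cselib_lookup (XEXP (load_mem, 0), Pmode, 0,
                                    GET_MODE (load_mem));
  return sval && lval && sval == lval;
}

static void
scan_bb (basic_block bb)
{
  rtx_insn *insn;
  FOR_BB_INSNS (bb, insn)
    {
      if (NONJUMP_INSN_P (insn))
        {
          /* ... record candidate stores here, and for each wide load
             check same_base_address_p against the tracked stores ...  */
        }
      /* Crucially, process every non-debug insn, not just loads and
         stores, so cselib's register/address information is current.  */
      if (NONDEBUG_INSN_P (insn))
        cselib_process_insn (insn);
    }
}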

>
> > Thanks in advance and please let me know your thoughts on the above.
> > Manos.
> >
> > On Wed, Dec 6, 2023 at 5:10 PM Manos Anagnostakis <
> manos.anagnosta...@vrull.eu> wrote:
> >>
> >> Hi Richard,
> >>
> >> thanks for the useful comments.
> >>
> >> On Wed, Dec 6, 2023 at 4:32 PM Richard Biener <
> richard.guent...@gmail.com> wrote:
> >>>
> >>> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
> >>>  wrote:
> >>> >
> >>> > This is an RTL pass that detects store forwarding from stores to
> larger loads (load pairs).
> >>> >
> >>> > This optimization is SPEC2017-driven and was found to be beneficial
> for some benchmarks,
> >>> > through testing on ampere1/ampere1a machines.
> >>> >
> >>> > For example, it can transform cases like
> >>> >
> >>> > str  d5, [sp, #320]
> >>> > fmul d5, d31, d29
> >>> > ldp  d31, d17, [sp, #312] # Large load from small store
> >>> >
> >>> > to
> >>> >
> >>> > str  d5, [sp, #320]
> >>> > fmul d5, d31, d29
> >>> > ldr  d31, [sp, #312]
> >>> > ldr  d17, [sp, #320]
> >>> >
> >>> > Currently, the pass is disabled by default on all architectures and
> enabled by a target-specific option.
> >>> >
> >>> > If deemed beneficial enough for a default, it will be enabled on
> ampere1/ampere1a,
> >>> > or other architectures as well, without needing to be turned on by
> this option.
> >>>
> >>> What is aarch64-specific about the pass?
> >>
> >> The pass was designed to target load pairs, which are aarch64 specific,
> thus it cannot handle generic loads.
> >>>
> >>>
> >>> I see an increasingly large number of target specific passes pop up
> (probably
> >>> for the excuse we can generalize them if necessary).  But GCC isn't
> LLVM
> >>> and this feels like getting out of hand?
> >>>
> >>> The x86 backend also has its store-forwarding "pass" as part of mdreorg
> >>> in ix86_split_stlf_stall_load.
> >>>
> >>> Richard.
> >>>
> >>> > Bootstrapped and regtested on aarch64-linux.
> >>> >
> >>> > gcc/ChangeLog:
> >>> >
> >>> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> >>> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New
> pass.
> >>> > * config/aarch64/aarch64-protos.h
> (make_pass_avoid_store_forwarding): Declare.
> >>> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New
> option.
> >>> > (aarch64-store-forwarding-threshold): New param.
> >>> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> >>> > * doc/invoke.texi: Document new option and new param.
> >>> > * config/aarch64/aarch64-store-forwarding.cc: New file.
> >>> >
> >>> > gcc/testsuite/ChangeLog:
> >>> >
> >>> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> >>> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> >>> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >>> >
> >>> > Signed-off-by: Manos Anagnostakis 
> >>> > Co-Authored-By: Manolis Tsamis 
> >>> > Co-Authored-By: Philipp Tomsich 
> >>> > ---
> >>> > Changes in v6:
> 

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Richard Biener
On Wed, Dec 6, 2023 at 6:15 PM Manos Anagnostakis
 wrote:
>
> Hi again,
>
> I went and tested the requested changes and found out the following:
>
> 1. The pass is currently increasing insn_cnt on a NONJUMP_INSN_P, which is a 
> subset of NONDEBUG_INSN_P. I think there is no problem with depending on -g 
> with the current version. Do you see something I don't or did you mean 
> something else?

It just occurred to me - thanks for double-checking (it wasn't obvious
to me NONJUMP_INSN_P doesn't include DEBUG_INSNs ...).

> 2. Not processing all instructions is not letting cselib record all the 
> effects they have, thus it does not have updated information to find true 
> forwardings at any given time. I can confirm this since I am witnessing many 
> unexpected changes on the number of handled cases if I do this only for 
> loads/stores.

Ah, OK.  I guess I don't fully get it: it seems you use cselib to
compare addresses, and while possibly not processing part of the
address computation might break this, other stmts in between should be
uninteresting (at least I don't see you handling intermediate
may-aliasing [small] stores to disable the splitting).

So in the end it's a compile-time trade-off between relying solely on
cselib, or trying to optimize cselib use with DF for the address
computations?

Richard.

> Thanks in advance and please let me know your thoughts on the above.
> Manos.
>
> On Wed, Dec 6, 2023 at 5:10 PM Manos Anagnostakis 
>  wrote:
>>
>> Hi Richard,
>>
>> thanks for the useful comments.
>>
>> On Wed, Dec 6, 2023 at 4:32 PM Richard Biener  
>> wrote:
>>>
>>> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>>>  wrote:
>>> >
>>> > This is an RTL pass that detects store forwarding from stores to larger 
>>> > loads (load pairs).
>>> >
>>> > This optimization is SPEC2017-driven and was found to be beneficial for 
>>> > some benchmarks,
>>> > through testing on ampere1/ampere1a machines.
>>> >
>>> > For example, it can transform cases like
>>> >
>>> > str  d5, [sp, #320]
>>> > fmul d5, d31, d29
>>> > ldp  d31, d17, [sp, #312] # Large load from small store
>>> >
>>> > to
>>> >
>>> > str  d5, [sp, #320]
>>> > fmul d5, d31, d29
>>> > ldr  d31, [sp, #312]
>>> > ldr  d17, [sp, #320]
>>> >
>>> > Currently, the pass is disabled by default on all architectures and 
>>> > enabled by a target-specific option.
>>> >
>>> > If deemed beneficial enough for a default, it will be enabled on 
>>> > ampere1/ampere1a,
>>> > or other architectures as well, without needing to be turned on by this 
>>> > option.
>>>
>>> What is aarch64-specific about the pass?
>>
>> The pass was designed to target load pairs, which are aarch64 specific, thus 
>> it cannot handle generic loads.
>>>
>>>
>>> I see an increasingly large number of target specific passes pop up 
>>> (probably
>>> for the excuse we can generalize them if necessary).  But GCC isn't LLVM
>>> and this feels like getting out of hand?
>>>
>>> The x86 backend also has its store-forwarding "pass" as part of mdreorg
>>> in ix86_split_stlf_stall_load.
>>>
>>> Richard.
>>>
>>> > Bootstrapped and regtested on aarch64-linux.
>>> >
>>> > gcc/ChangeLog:
>>> >
>>> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
>>> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
>>> > * config/aarch64/aarch64-protos.h 
>>> > (make_pass_avoid_store_forwarding): Declare.
>>> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New 
>>> > option.
>>> > (aarch64-store-forwarding-threshold): New param.
>>> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
>>> > * doc/invoke.texi: Document new option and new param.
>>> > * config/aarch64/aarch64-store-forwarding.cc: New file.
>>> >
>>> > gcc/testsuite/ChangeLog:
>>> >
>>> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
>>> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
>>> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>>> >
>>> > Signed-off-by: Manos Anagnostakis 
>>> > Co-Authored-By: Manolis Tsamis 
>>> > Co-Authored-By: Philipp Tomsich 
>>> > ---
>>> > Changes in v6:
>>> > - An obvious change. insn_cnt was incremented only on
>>> >   stores and not for every insn in the bb. Now restored.
>>> >
>>> >  gcc/config.gcc|   1 +
>>> >  gcc/config/aarch64/aarch64-passes.def |   1 +
>>> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
>>> >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
>>> >  gcc/config/aarch64/aarch64.opt|   9 +
>>> >  gcc/config/aarch64/t-aarch64  |  10 +
>>> >  gcc/doc/invoke.texi   |  11 +-
>>> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
>>> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
>>> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
>>> >  10 files changed, 449 insertions(+), 1 

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Richard Biener
On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich  wrote:
>
> On Wed, 6 Dec 2023 at 23:32, Richard Biener  
> wrote:
> >
> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
> >  wrote:
> > >
> > > This is an RTL pass that detects store forwarding from stores to larger 
> > > loads (load pairs).
> > >
> > > This optimization is SPEC2017-driven and was found to be beneficial for 
> > > some benchmarks,
> > > through testing on ampere1/ampere1a machines.
> > >
> > > For example, it can transform cases like
> > >
> > > str  d5, [sp, #320]
> > > fmul d5, d31, d29
> > > ldp  d31, d17, [sp, #312] # Large load from small store
> > >
> > > to
> > >
> > > str  d5, [sp, #320]
> > > fmul d5, d31, d29
> > > ldr  d31, [sp, #312]
> > > ldr  d17, [sp, #320]
> > >
> > > Currently, the pass is disabled by default on all architectures and 
> > > enabled by a target-specific option.
> > >
> > > If deemed beneficial enough for a default, it will be enabled on 
> > > ampere1/ampere1a,
> > > or other architectures as well, without needing to be turned on by this 
> > > option.
> >
> > What is aarch64-specific about the pass?
> >
> > I see an increasingly large number of target specific passes pop up 
> > (probably
> > for the excuse we can generalize them if necessary).  But GCC isn't LLVM
> > and this feels like getting out of hand?
>
> We had an OK from Richard Sandiford on the earlier (v5) version with
> v6 just fixing an obvious bug... so I was about to merge this earlier
> just when you commented.
>
> Given that this had months of test exposure on our end, I would prefer
> to move this forward for GCC14 in its current form.
> The project of replacing architecture-specific store-forwarding passes
> with a generalized infrastructure could then be addressed in the GCC15
> timeframe (or beyond)?

It's up to target maintainers, I just picked this pass (randomly) to make this
comment (of course also knowing that STLF fails are a common issue on
pipelined uarchs).

Richard.

>
> --Philipp.
>
> >
> > The x86 backend also has its store-forwarding "pass" as part of mdreorg
> > in ix86_split_stlf_stall_load.
> >
> > Richard.
> >
> > > Bootstrapped and regtested on aarch64-linux.
> > >
> > > gcc/ChangeLog:
> > >
> > > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> > > * config/aarch64/aarch64-protos.h 
> > > (make_pass_avoid_store_forwarding): Declare.
> > > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New 
> > > option.
> > > (aarch64-store-forwarding-threshold): New param.
> > > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> > > * doc/invoke.texi: Document new option and new param.
> > > * config/aarch64/aarch64-store-forwarding.cc: New file.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> > >
> > > Signed-off-by: Manos Anagnostakis 
> > > Co-Authored-By: Manolis Tsamis 
> > > Co-Authored-By: Philipp Tomsich 
> > > ---
> > > Changes in v6:
> > > - An obvious change. insn_cnt was incremented only on
> > >   stores and not for every insn in the bb. Now restored.
> > >
> > >  gcc/config.gcc|   1 +
> > >  gcc/config/aarch64/aarch64-passes.def |   1 +
> > >  gcc/config/aarch64/aarch64-protos.h   |   1 +
> > >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
> > >  gcc/config/aarch64/aarch64.opt|   9 +
> > >  gcc/config/aarch64/t-aarch64  |  10 +
> > >  gcc/doc/invoke.texi   |  11 +-
> > >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
> > >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
> > >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
> > >  10 files changed, 449 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> > >  create mode 100644 
> > > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> > >  create mode 100644 
> > > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> > >
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 6450448f2f0..7c48429eb82 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -350,6 +350,7 @@ aarch64*-*-*)
> > > cxx_target_objs="aarch64-c.o"
> > > d_target_objs="aarch64-d.o"
> > > extra_objs="aarch64-builtins.o aarch-common.o 
> > > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o 
> > > aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o 
> > > aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o 
> > > aarch64-speculation.o 

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Philipp Tomsich
On Wed, 6 Dec 2023 at 23:32, Richard Biener  wrote:
>
> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>  wrote:
> >
> > This is an RTL pass that detects store forwarding from stores to larger 
> > loads (load pairs).
> >
> > This optimization is SPEC2017-driven and was found to be beneficial for 
> > some benchmarks,
> > through testing on ampere1/ampere1a machines.
> >
> > For example, it can transform cases like
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldp  d31, d17, [sp, #312] # Large load from small store
> >
> > to
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldr  d31, [sp, #312]
> > ldr  d17, [sp, #320]
> >
> > Currently, the pass is disabled by default on all architectures and enabled 
> > by a target-specific option.
> >
> > If deemed beneficial enough for a default, it will be enabled on 
> > ampere1/ampere1a,
> > or other architectures as well, without needing to be turned on by this 
> > option.
>
> What is aarch64-specific about the pass?
>
> I see an increasingly large number of target specific passes pop up (probably
> for the excuse we can generalize them if necessary).  But GCC isn't LLVM
> and this feels like getting out of hand?

We had an OK from Richard Sandiford on the earlier (v5) version with
v6 just fixing an obvious bug... so I was about to merge this earlier
just when you commented.

Given that this had months of test exposure on our end, I would prefer
to move this forward for GCC14 in its current form.
The project of replacing architecture-specific store-forwarding passes
with a generalized infrastructure could then be addressed in the GCC15
timeframe (or beyond)?

--Philipp.

>
> The x86 backend also has its store-forwarding "pass" as part of mdreorg
> in ix86_split_stlf_stall_load.
>
> Richard.
>
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >
> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> > * config/aarch64/aarch64-protos.h 
> > (make_pass_avoid_store_forwarding): Declare.
> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
> > (aarch64-store-forwarding-threshold): New param.
> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> > * doc/invoke.texi: Document new option and new param.
> > * config/aarch64/aarch64-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > Co-Authored-By: Philipp Tomsich 
> > ---
> > Changes in v6:
> > - An obvious change. insn_cnt was incremented only on
> >   stores and not for every insn in the bb. Now restored.
> >
> >  gcc/config.gcc|   1 +
> >  gcc/config/aarch64/aarch64-passes.def |   1 +
> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
> >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
> >  gcc/config/aarch64/aarch64.opt|   9 +
> >  gcc/config/aarch64/t-aarch64  |  10 +
> >  gcc/doc/invoke.texi   |  11 +-
> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
> >  10 files changed, 449 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 6450448f2f0..7c48429eb82 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -350,6 +350,7 @@ aarch64*-*-*)
> > cxx_target_objs="aarch64-c.o"
> > d_target_objs="aarch64-d.o"
> > extra_objs="aarch64-builtins.o aarch-common.o 
> > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o 
> > aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o 
> > aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o 
> > falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> > +   extra_objs="${extra_objs} aarch64-store-forwarding.o"
> > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
> > \$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
> > \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
> > target_has_targetm_common=yes
> > ;;
> > diff --git a/gcc/config/aarch64/aarch64-passes.def 
> > b/gcc/config/aarch64/aarch64-passes.def
> > index 

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Manos Anagnostakis
Hi again,

I went and tested the requested changes and found out the following:

1. The pass currently increases insn_cnt only on NONJUMP_INSN_P insns,
which are a subset of NONDEBUG_INSN_P, so I don't think the current
version has a problem with depending on -g. Do you see something I
don't, or did you mean something else?
2. Not processing all instructions prevents cselib from recording all of
their effects, so it does not have up-to-date information to find true
forwardings at any given point. I can confirm this, since I see many
unexpected changes in the number of handled cases if I process only
loads/stores.

Thanks in advance and please let me know your thoughts on the above.
Manos.

On Wed, Dec 6, 2023 at 5:10 PM Manos Anagnostakis <
manos.anagnosta...@vrull.eu> wrote:

> Hi Richard,
>
> thanks for the useful comments.
>
> On Wed, Dec 6, 2023 at 4:32 PM Richard Biener 
> wrote:
>
>> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>>  wrote:
>> >
>> > This is an RTL pass that detects store forwarding from stores to larger
>> loads (load pairs).
>> >
>> > This optimization is SPEC2017-driven and was found to be beneficial for
>> some benchmarks,
>> > through testing on ampere1/ampere1a machines.
>> >
>> > For example, it can transform cases like
>> >
>> > str  d5, [sp, #320]
>> > fmul d5, d31, d29
>> > ldp  d31, d17, [sp, #312] # Large load from small store
>> >
>> > to
>> >
>> > str  d5, [sp, #320]
>> > fmul d5, d31, d29
>> > ldr  d31, [sp, #312]
>> > ldr  d17, [sp, #320]
>> >
>> > Currently, the pass is disabled by default on all architectures and
>> enabled by a target-specific option.
>> >
>> > If deemed beneficial enough for a default, it will be enabled on
>> ampere1/ampere1a,
>> > or other architectures as well, without needing to be turned on by this
>> option.
>>
>> What is aarch64-specific about the pass?
>>
> The pass was designed to target load pairs, which are aarch64 specific,
> thus it cannot handle generic loads.
>
>>
>> I see an increasingly large number of target specific passes pop up
>> (probably
>> for the excuse we can generalize them if necessary).  But GCC isn't LLVM
>> and this feels like getting out of hand?
>>
>> The x86 backend also has its store-forwarding "pass" as part of mdreorg
>> in ix86_split_stlf_stall_load.
>>
>> Richard.
>>
>> > Bootstrapped and regtested on aarch64-linux.
>> >
>> > gcc/ChangeLog:
>> >
>> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
>> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New
>> pass.
>> > * config/aarch64/aarch64-protos.h
>> (make_pass_avoid_store_forwarding): Declare.
>> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New
>> option.
>> > (aarch64-store-forwarding-threshold): New param.
>> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
>> > * doc/invoke.texi: Document new option and new param.
>> > * config/aarch64/aarch64-store-forwarding.cc: New file.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
>> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
>> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>> >
>> > Signed-off-by: Manos Anagnostakis 
>> > Co-Authored-By: Manolis Tsamis 
>> > Co-Authored-By: Philipp Tomsich 
>> > ---
>> > Changes in v6:
>> > - An obvious change. insn_cnt was incremented only on
>> >   stores and not for every insn in the bb. Now restored.
>> >
>> >  gcc/config.gcc|   1 +
>> >  gcc/config/aarch64/aarch64-passes.def |   1 +
>> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
>> >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
>> >  gcc/config/aarch64/aarch64.opt|   9 +
>> >  gcc/config/aarch64/t-aarch64  |  10 +
>> >  gcc/doc/invoke.texi   |  11 +-
>> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
>> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
>> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
>> >  10 files changed, 449 insertions(+), 1 deletion(-)
>> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
>> >  create mode 100644
>> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
>> >  create mode 100644
>> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
>> >
>> > diff --git a/gcc/config.gcc b/gcc/config.gcc
>> > index 6450448f2f0..7c48429eb82 100644
>> > --- a/gcc/config.gcc
>> > +++ b/gcc/config.gcc
>> > @@ -350,6 +350,7 @@ aarch64*-*-*)
>> > cxx_target_objs="aarch64-c.o"
>> > d_target_objs="aarch64-d.o"
>> > extra_objs="aarch64-builtins.o aarch-common.o
>> aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o
>> aarch64-sve-builtins-base.o 

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Manos Anagnostakis
Hi Richard,

thanks for the useful comments.

On Wed, Dec 6, 2023 at 4:32 PM Richard Biener 
wrote:

> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>  wrote:
> >
> > This is an RTL pass that detects store forwarding from stores to larger
> loads (load pairs).
> >
> > This optimization is SPEC2017-driven and was found to be beneficial for
> some benchmarks,
> > through testing on ampere1/ampere1a machines.
> >
> > For example, it can transform cases like
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldp  d31, d17, [sp, #312] # Large load from small store
> >
> > to
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldr  d31, [sp, #312]
> > ldr  d17, [sp, #320]
> >
> > Currently, the pass is disabled by default on all architectures and
> enabled by a target-specific option.
> >
> > If deemed beneficial enough for a default, it will be enabled on
> ampere1/ampere1a,
> > or other architectures as well, without needing to be turned on by this
> option.
>
> What is aarch64-specific about the pass?
>
The pass was designed to target load pairs, which are aarch64-specific,
so it cannot handle generic loads.
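
For example, source of roughly this shape (an assumed, illustrative
example -- not taken from the patch or its testsuite) can end up as a
small STR feeding a wider LDP of the same slots:

/* Illustrative only: a store into half of a structure followed by a
   full copy of it, which aarch64 may compile to an STR immediately
   followed by an LDP covering the stored bytes.  */
typedef struct { double lo, hi; } pair_t;

double
sum_after_update (pair_t *p, double x)
{
  p->hi = x;       /* str  d?, [x0, 8]                        */
  pair_t q = *p;   /* may become ldp d?, d?, [x0] -- a load
                      pair partially forwarded from the store  */
  return q.lo + q.hi;
}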

>
> I see an increasingly large number of target specific passes pop up
> (probably
> for the excuse we can generalize them if necessary).  But GCC isn't LLVM
> and this feels like getting out of hand?
>
> The x86 backend also has its store-forwarding "pass" as part of mdreorg
> in ix86_split_stlf_stall_load.
>
> Richard.
>
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >
> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New
> pass.
> > * config/aarch64/aarch64-protos.h
> (make_pass_avoid_store_forwarding): Declare.
> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New
> option.
> > (aarch64-store-forwarding-threshold): New param.
> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> > * doc/invoke.texi: Document new option and new param.
> > * config/aarch64/aarch64-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > Co-Authored-By: Philipp Tomsich 
> > ---
> > Changes in v6:
> > - An obvious change. insn_cnt was incremented only on
> >   stores and not for every insn in the bb. Now restored.
> >
> >  gcc/config.gcc|   1 +
> >  gcc/config/aarch64/aarch64-passes.def |   1 +
> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
> >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
> >  gcc/config/aarch64/aarch64.opt|   9 +
> >  gcc/config/aarch64/t-aarch64  |  10 +
> >  gcc/doc/invoke.texi   |  11 +-
> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
> >  10 files changed, 449 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> >  create mode 100644
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 6450448f2f0..7c48429eb82 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -350,6 +350,7 @@ aarch64*-*-*)
> > cxx_target_objs="aarch64-c.o"
> > d_target_objs="aarch64-d.o"
> > extra_objs="aarch64-builtins.o aarch-common.o
> aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o
> aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o
> aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o
> falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> > +   extra_objs="${extra_objs} aarch64-store-forwarding.o"
> > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
> > target_has_targetm_common=yes
> > ;;
> > diff --git a/gcc/config/aarch64/aarch64-passes.def
> b/gcc/config/aarch64/aarch64-passes.def
> > index 662a13fd5e6..94ced0aebf6 100644
> > --- a/gcc/config/aarch64/aarch64-passes.def
> > +++ b/gcc/config/aarch64/aarch64-passes.def
> > @@ -24,3 +24,4 @@ INSERT_PASS_BEFORE
> (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstat
> >  INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
> >  INSERT_PASS_BEFORE 

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Richard Biener
On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
 wrote:
>
> This is an RTL pass that detects store forwarding from stores to larger loads 
> (load pairs).
>
> This optimization is SPEC2017-driven and was found to be beneficial for some 
> benchmarks,
> through testing on ampere1/ampere1a machines.
>
> For example, it can transform cases like
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldp  d31, d17, [sp, #312] # Large load from small store
>
> to
>
> str  d5, [sp, #320]
> fmul d5, d31, d29
> ldr  d31, [sp, #312]
> ldr  d17, [sp, #320]
>
> Currently, the pass is disabled by default on all architectures and enabled 
> by a target-specific option.
>
> If deemed beneficial enough for a default, it will be enabled on 
> ampere1/ampere1a,
> or other architectures as well, without needing to be turned on by this 
> option.

What is aarch64-specific about the pass?

I see an increasingly large number of target specific passes pop up (probably
for the excuse we can generalize them if necessary).  But GCC isn't LLVM
and this feels like getting out of hand?

The x86 backend also has its store-forwarding "pass" as part of mdreorg
in ix86_split_stlf_stall_load.

Richard.

> Bootstrapped and regtested on aarch64-linux.
>
> gcc/ChangeLog:
>
> * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> * config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): 
> Declare.
> * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
> (aarch64-store-forwarding-threshold): New param.
> * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> * doc/invoke.texi: Document new option and new param.
> * config/aarch64/aarch64-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
>
> Signed-off-by: Manos Anagnostakis 
> Co-Authored-By: Manolis Tsamis 
> Co-Authored-By: Philipp Tomsich 
> ---
> Changes in v6:
> - An obvious change. insn_cnt was incremented only on
>   stores and not for every insn in the bb. Now restored.
>
>  gcc/config.gcc|   1 +
>  gcc/config/aarch64/aarch64-passes.def |   1 +
>  gcc/config/aarch64/aarch64-protos.h   |   1 +
>  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
>  gcc/config/aarch64/aarch64.opt|   9 +
>  gcc/config/aarch64/t-aarch64  |  10 +
>  gcc/doc/invoke.texi   |  11 +-
>  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
>  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
>  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
>  10 files changed, 449 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 6450448f2f0..7c48429eb82 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -350,6 +350,7 @@ aarch64*-*-*)
> cxx_target_objs="aarch64-c.o"
> d_target_objs="aarch64-d.o"
> extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o 
> aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o 
> aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o 
> cortex-a57-fma-steering.o aarch64-speculation.o 
> falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> +   extra_objs="${extra_objs} aarch64-store-forwarding.o"
> target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
> target_has_targetm_common=yes
> ;;
> diff --git a/gcc/config/aarch64/aarch64-passes.def 
> b/gcc/config/aarch64/aarch64-passes.def
> index 662a13fd5e6..94ced0aebf6 100644
> --- a/gcc/config/aarch64/aarch64-passes.def
> +++ b/gcc/config/aarch64/aarch64-passes.def
> @@ -24,3 +24,4 @@ INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 
> 1, pass_switch_pstat
>  INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
>  INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
>  INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
> +INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding);
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 60ff61f6d54..8f5f2ca4710 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@