Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
So is it OK for trunk as is in v6 with the generic changes added in GCC-15? Manos. Στις Πέμ 7 Δεκ 2023, 16:10 ο χρήστης Richard Biener < richard.guent...@gmail.com> έγραψε: > On Thu, Dec 7, 2023 at 1:20 PM Richard Sandiford > wrote: > > > > Richard Biener writes: > > > On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich < > philipp.toms...@vrull.eu> wrote: > > >> > > >> On Wed, 6 Dec 2023 at 23:32, Richard Biener < > richard.guent...@gmail.com> wrote: > > >> > > > >> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis > > >> > wrote: > > >> > > > > >> > > This is an RTL pass that detects store forwarding from stores to > larger loads (load pairs). > > >> > > > > >> > > This optimization is SPEC2017-driven and was found to be > beneficial for some benchmarks, > > >> > > through testing on ampere1/ampere1a machines. > > >> > > > > >> > > For example, it can transform cases like > > >> > > > > >> > > str d5, [sp, #320] > > >> > > fmul d5, d31, d29 > > >> > > ldp d31, d17, [sp, #312] # Large load from small store > > >> > > > > >> > > to > > >> > > > > >> > > str d5, [sp, #320] > > >> > > fmul d5, d31, d29 > > >> > > ldr d31, [sp, #312] > > >> > > ldr d17, [sp, #320] > > >> > > > > >> > > Currently, the pass is disabled by default on all architectures > and enabled by a target-specific option. > > >> > > > > >> > > If deemed beneficial enough for a default, it will be enabled on > ampere1/ampere1a, > > >> > > or other architectures as well, without needing to be turned on > by this option. > > >> > > > >> > What is aarch64-specific about the pass? > > >> > > > >> > I see an increasingly large number of target specific passes pop up > (probably > > >> > for the excuse we can generalize them if necessary). But GCC isn't > LLVM > > >> > and this feels like getting out of hand? > > >> > > >> We had an OK from Richard Sandiford on the earlier (v5) version with > > >> v6 just fixing an obvious bug... so I was about to merge this earlier > > >> just when you commented. > > >> > > >> Given that this had months of test exposure on our end, I would prefer > > >> to move this forward for GCC14 in its current form. > > >> The project of replacing architecture-specific store-forwarding passes > > >> with a generalized infrastructure could then be addressed in the GCC15 > > >> timeframe (or beyond)? > > > > > > It's up to target maintainers, I just picked this pass (randomly) to > make this > > > comment (of course also knowing that STLF fails are a common issue on > > > pipelined uarchs). > > > > I agree there's scope for making some of this target-independent. > > > > One vague thing I've been wondering about is whether, for some passes > > like these, we should use inheritance rather than target hooks. So in > > this case, the target-independent code would provide a framework for > > iterating over the function and testing for forwarding, but the target > > would ultimately decide what to do with that information. This would > > also make it easier for targets to add genuinely target-specific > > information to the bookkeeping structures. > > > > In case it sounds otherwise, that's supposed to be more than > > just a structural C++-vs-C thing. The idea is that we'd have > > a pass for "resolving store forwarding-related problems", > > but the specific goals would be mostly (or at least partially) > > target-specific rather than target-independent. > > In some cases we've used target hooks for this, in this case it might > work as well. > > > I'd wondered the same thing about the early-ra pass that we're > > adding for SME. Some of the framework could be generalised and > > made target-independent, but the main purpose of the pass (using > > strided registers with certain patterns and constraints) is highly > > target-specific. > > .. not sure about this one though. > > Richard. > > > Thanks, > > Richard >
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
On Thu, Dec 7, 2023 at 1:20 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich > > wrote: > >> > >> On Wed, 6 Dec 2023 at 23:32, Richard Biener > >> wrote: > >> > > >> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis > >> > wrote: > >> > > > >> > > This is an RTL pass that detects store forwarding from stores to > >> > > larger loads (load pairs). > >> > > > >> > > This optimization is SPEC2017-driven and was found to be beneficial > >> > > for some benchmarks, > >> > > through testing on ampere1/ampere1a machines. > >> > > > >> > > For example, it can transform cases like > >> > > > >> > > str d5, [sp, #320] > >> > > fmul d5, d31, d29 > >> > > ldp d31, d17, [sp, #312] # Large load from small store > >> > > > >> > > to > >> > > > >> > > str d5, [sp, #320] > >> > > fmul d5, d31, d29 > >> > > ldr d31, [sp, #312] > >> > > ldr d17, [sp, #320] > >> > > > >> > > Currently, the pass is disabled by default on all architectures and > >> > > enabled by a target-specific option. > >> > > > >> > > If deemed beneficial enough for a default, it will be enabled on > >> > > ampere1/ampere1a, > >> > > or other architectures as well, without needing to be turned on by > >> > > this option. > >> > > >> > What is aarch64-specific about the pass? > >> > > >> > I see an increasingly large number of target specific passes pop up > >> > (probably > >> > for the excuse we can generalize them if necessary). But GCC isn't LLVM > >> > and this feels like getting out of hand? > >> > >> We had an OK from Richard Sandiford on the earlier (v5) version with > >> v6 just fixing an obvious bug... so I was about to merge this earlier > >> just when you commented. > >> > >> Given that this had months of test exposure on our end, I would prefer > >> to move this forward for GCC14 in its current form. > >> The project of replacing architecture-specific store-forwarding passes > >> with a generalized infrastructure could then be addressed in the GCC15 > >> timeframe (or beyond)? > > > > It's up to target maintainers, I just picked this pass (randomly) to make > > this > > comment (of course also knowing that STLF fails are a common issue on > > pipelined uarchs). > > I agree there's scope for making some of this target-independent. > > One vague thing I've been wondering about is whether, for some passes > like these, we should use inheritance rather than target hooks. So in > this case, the target-independent code would provide a framework for > iterating over the function and testing for forwarding, but the target > would ultimately decide what to do with that information. This would > also make it easier for targets to add genuinely target-specific > information to the bookkeeping structures. > > In case it sounds otherwise, that's supposed to be more than > just a structural C++-vs-C thing. The idea is that we'd have > a pass for "resolving store forwarding-related problems", > but the specific goals would be mostly (or at least partially) > target-specific rather than target-independent. In some cases we've used target hooks for this, in this case it might work as well. > I'd wondered the same thing about the early-ra pass that we're > adding for SME. Some of the framework could be generalised and > made target-independent, but the main purpose of the pass (using > strided registers with certain patterns and constraints) is highly > target-specific. .. not sure about this one though. Richard. > Thanks, > Richard
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
Richard Biener writes: > On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich > wrote: >> >> On Wed, 6 Dec 2023 at 23:32, Richard Biener >> wrote: >> > >> > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis >> > wrote: >> > > >> > > This is an RTL pass that detects store forwarding from stores to larger >> > > loads (load pairs). >> > > >> > > This optimization is SPEC2017-driven and was found to be beneficial for >> > > some benchmarks, >> > > through testing on ampere1/ampere1a machines. >> > > >> > > For example, it can transform cases like >> > > >> > > str d5, [sp, #320] >> > > fmul d5, d31, d29 >> > > ldp d31, d17, [sp, #312] # Large load from small store >> > > >> > > to >> > > >> > > str d5, [sp, #320] >> > > fmul d5, d31, d29 >> > > ldr d31, [sp, #312] >> > > ldr d17, [sp, #320] >> > > >> > > Currently, the pass is disabled by default on all architectures and >> > > enabled by a target-specific option. >> > > >> > > If deemed beneficial enough for a default, it will be enabled on >> > > ampere1/ampere1a, >> > > or other architectures as well, without needing to be turned on by this >> > > option. >> > >> > What is aarch64-specific about the pass? >> > >> > I see an increasingly large number of target specific passes pop up >> > (probably >> > for the excuse we can generalize them if necessary). But GCC isn't LLVM >> > and this feels like getting out of hand? >> >> We had an OK from Richard Sandiford on the earlier (v5) version with >> v6 just fixing an obvious bug... so I was about to merge this earlier >> just when you commented. >> >> Given that this had months of test exposure on our end, I would prefer >> to move this forward for GCC14 in its current form. >> The project of replacing architecture-specific store-forwarding passes >> with a generalized infrastructure could then be addressed in the GCC15 >> timeframe (or beyond)? > > It's up to target maintainers, I just picked this pass (randomly) to make this > comment (of course also knowing that STLF fails are a common issue on > pipelined uarchs). I agree there's scope for making some of this target-independent. One vague thing I've been wondering about is whether, for some passes like these, we should use inheritance rather than target hooks. So in this case, the target-independent code would provide a framework for iterating over the function and testing for forwarding, but the target would ultimately decide what to do with that information. This would also make it easier for targets to add genuinely target-specific information to the bookkeeping structures. In case it sounds otherwise, that's supposed to be more than just a structural C++-vs-C thing. The idea is that we'd have a pass for "resolving store forwarding-related problems", but the specific goals would be mostly (or at least partially) target-specific rather than target-independent. I'd wondered the same thing about the early-ra pass that we're adding for SME. Some of the framework could be generalised and made target-independent, but the main purpose of the pass (using strided registers with certain patterns and constraints) is highly target-specific. Thanks, Richard
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
Στις Πέμ 7 Δεκ 2023, 09:39 ο χρήστης Richard Biener < richard.guent...@gmail.com> έγραψε: > On Wed, Dec 6, 2023 at 6:15 PM Manos Anagnostakis > wrote: > > > > Hi again, > > > > I went and tested the requested changes and found out the following: > > > > 1. The pass is currently increasing insn_cnt on a NONJUMP_INSN_P, which > is a subset of NONDEBUG_INSN_P. I think there is no problem with depending > on -g with the current version. Do you see something I don't or did you > mean something else? > > It just occurred to me - thanks for double-checking (it wasn't obvious > to me NONJUMP_INSN_P doesn't include DEBUG_INSNs ...). > > > 2. Not processing all instructions is not letting cselib record all the > effects they have, thus it does not have updated information to find true > forwardings at any given time. I can confirm this since I am witnessing > many unexpected changes on the number of handled cases if I do this only > for loads/stores. > > Ah, OK. I guess I don't fully get it, it seems you use cselib to > compare addresses and while possibly not > processing part of the address computation might break this other > stmts inbetween should be uninteresting > (at least I don't see you handling intermediate may-aliasing [small] > stores to disable the splitting). > > So in the end it's a compile-time trade-off between relying solely on > cselib or trying to optimize > cselib use with DF for address computes? > > Richard. > I am not really familiar with the DF technique, but what we did is the following: At first we were using a combination of alias analysis with cselib, which while trying to avoid false positives that had their memory-register address changed inbetween, was rejecting part of stores from being considered as candidates for a forwarding to an ldp. Thus in the end using only the cselib with the correct lookups was the way to address both the register difference and/or the intermediate change on the address value. > > > Thanks in advance and please let me know your thoughts on the above. > > Manos. > > > > On Wed, Dec 6, 2023 at 5:10 PM Manos Anagnostakis < > manos.anagnosta...@vrull.eu> wrote: > >> > >> Hi Richard, > >> > >> thanks for the useful comments. > >> > >> On Wed, Dec 6, 2023 at 4:32 PM Richard Biener < > richard.guent...@gmail.com> wrote: > >>> > >>> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis > >>> wrote: > >>> > > >>> > This is an RTL pass that detects store forwarding from stores to > larger loads (load pairs). > >>> > > >>> > This optimization is SPEC2017-driven and was found to be beneficial > for some benchmarks, > >>> > through testing on ampere1/ampere1a machines. > >>> > > >>> > For example, it can transform cases like > >>> > > >>> > str d5, [sp, #320] > >>> > fmul d5, d31, d29 > >>> > ldp d31, d17, [sp, #312] # Large load from small store > >>> > > >>> > to > >>> > > >>> > str d5, [sp, #320] > >>> > fmul d5, d31, d29 > >>> > ldr d31, [sp, #312] > >>> > ldr d17, [sp, #320] > >>> > > >>> > Currently, the pass is disabled by default on all architectures and > enabled by a target-specific option. > >>> > > >>> > If deemed beneficial enough for a default, it will be enabled on > ampere1/ampere1a, > >>> > or other architectures as well, without needing to be turned on by > this option. > >>> > >>> What is aarch64-specific about the pass? > >> > >> The pass was designed to target load pairs, which are aarch64 specific, > thus it cannot handle generic loads. > >>> > >>> > >>> I see an increasingly large number of target specific passes pop up > (probably > >>> for the excuse we can generalize them if necessary). But GCC isn't > LLVM > >>> and this feels like getting out of hand? > >>> > >>> The x86 backend also has its store-forwarding "pass" as part of mdreorg > >>> in ix86_split_stlf_stall_load. > >>> > >>> Richard. > >>> > >>> > Bootstrapped and regtested on aarch64-linux. > >>> > > >>> > gcc/ChangeLog: > >>> > > >>> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs. > >>> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New > pass. > >>> > * config/aarch64/aarch64-protos.h > (make_pass_avoid_store_forwarding): Declare. > >>> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New > option. > >>> > (aarch64-store-forwarding-threshold): New param. > >>> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o > >>> > * doc/invoke.texi: Document new option and new param. > >>> > * config/aarch64/aarch64-store-forwarding.cc: New file. > >>> > > >>> > gcc/testsuite/ChangeLog: > >>> > > >>> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test. > >>> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test. > >>> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test. > >>> > > >>> > Signed-off-by: Manos Anagnostakis > >>> > Co-Authored-By: Manolis Tsamis > >>> > Co-Authored-By: Philipp Tomsich > >>> > --- > >>> > Changes in v6: >
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
On Wed, Dec 6, 2023 at 6:15 PM Manos Anagnostakis wrote: > > Hi again, > > I went and tested the requested changes and found out the following: > > 1. The pass is currently increasing insn_cnt on a NONJUMP_INSN_P, which is a > subset of NONDEBUG_INSN_P. I think there is no problem with depending on -g > with the current version. Do you see something I don't or did you mean > something else? It just occurred to me - thanks for double-checking (it wasn't obvious to me NONJUMP_INSN_P doesn't include DEBUG_INSNs ...). > 2. Not processing all instructions is not letting cselib record all the > effects they have, thus it does not have updated information to find true > forwardings at any given time. I can confirm this since I am witnessing many > unexpected changes on the number of handled cases if I do this only for > loads/stores. Ah, OK. I guess I don't fully get it, it seems you use cselib to compare addresses and while possibly not processing part of the address computation might break this other stmts inbetween should be uninteresting (at least I don't see you handling intermediate may-aliasing [small] stores to disable the splitting). So in the end it's a compile-time trade-off between relying solely on cselib or trying to optimize cselib use with DF for address computes? Richard. > Thanks in advance and please let me know your thoughts on the above. > Manos. > > On Wed, Dec 6, 2023 at 5:10 PM Manos Anagnostakis > wrote: >> >> Hi Richard, >> >> thanks for the useful comments. >> >> On Wed, Dec 6, 2023 at 4:32 PM Richard Biener >> wrote: >>> >>> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis >>> wrote: >>> > >>> > This is an RTL pass that detects store forwarding from stores to larger >>> > loads (load pairs). >>> > >>> > This optimization is SPEC2017-driven and was found to be beneficial for >>> > some benchmarks, >>> > through testing on ampere1/ampere1a machines. >>> > >>> > For example, it can transform cases like >>> > >>> > str d5, [sp, #320] >>> > fmul d5, d31, d29 >>> > ldp d31, d17, [sp, #312] # Large load from small store >>> > >>> > to >>> > >>> > str d5, [sp, #320] >>> > fmul d5, d31, d29 >>> > ldr d31, [sp, #312] >>> > ldr d17, [sp, #320] >>> > >>> > Currently, the pass is disabled by default on all architectures and >>> > enabled by a target-specific option. >>> > >>> > If deemed beneficial enough for a default, it will be enabled on >>> > ampere1/ampere1a, >>> > or other architectures as well, without needing to be turned on by this >>> > option. >>> >>> What is aarch64-specific about the pass? >> >> The pass was designed to target load pairs, which are aarch64 specific, thus >> it cannot handle generic loads. >>> >>> >>> I see an increasingly large number of target specific passes pop up >>> (probably >>> for the excuse we can generalize them if necessary). But GCC isn't LLVM >>> and this feels like getting out of hand? >>> >>> The x86 backend also has its store-forwarding "pass" as part of mdreorg >>> in ix86_split_stlf_stall_load. >>> >>> Richard. >>> >>> > Bootstrapped and regtested on aarch64-linux. >>> > >>> > gcc/ChangeLog: >>> > >>> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs. >>> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass. >>> > * config/aarch64/aarch64-protos.h >>> > (make_pass_avoid_store_forwarding): Declare. >>> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New >>> > option. >>> > (aarch64-store-forwarding-threshold): New param. >>> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o >>> > * doc/invoke.texi: Document new option and new param. >>> > * config/aarch64/aarch64-store-forwarding.cc: New file. >>> > >>> > gcc/testsuite/ChangeLog: >>> > >>> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test. >>> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test. >>> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test. >>> > >>> > Signed-off-by: Manos Anagnostakis >>> > Co-Authored-By: Manolis Tsamis >>> > Co-Authored-By: Philipp Tomsich >>> > --- >>> > Changes in v6: >>> > - An obvious change. insn_cnt was incremented only on >>> > stores and not for every insn in the bb. Now restored. >>> > >>> > gcc/config.gcc| 1 + >>> > gcc/config/aarch64/aarch64-passes.def | 1 + >>> > gcc/config/aarch64/aarch64-protos.h | 1 + >>> > .../aarch64/aarch64-store-forwarding.cc | 318 ++ >>> > gcc/config/aarch64/aarch64.opt| 9 + >>> > gcc/config/aarch64/t-aarch64 | 10 + >>> > gcc/doc/invoke.texi | 11 +- >>> > .../aarch64/ldp_ssll_no_overlap_address.c | 33 ++ >>> > .../aarch64/ldp_ssll_no_overlap_offset.c | 33 ++ >>> > .../gcc.target/aarch64/ldp_ssll_overlap.c | 33 ++ >>> > 10 files changed, 449 insertions(+), 1
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
On Wed, Dec 6, 2023 at 7:44 PM Philipp Tomsich wrote: > > On Wed, 6 Dec 2023 at 23:32, Richard Biener > wrote: > > > > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis > > wrote: > > > > > > This is an RTL pass that detects store forwarding from stores to larger > > > loads (load pairs). > > > > > > This optimization is SPEC2017-driven and was found to be beneficial for > > > some benchmarks, > > > through testing on ampere1/ampere1a machines. > > > > > > For example, it can transform cases like > > > > > > str d5, [sp, #320] > > > fmul d5, d31, d29 > > > ldp d31, d17, [sp, #312] # Large load from small store > > > > > > to > > > > > > str d5, [sp, #320] > > > fmul d5, d31, d29 > > > ldr d31, [sp, #312] > > > ldr d17, [sp, #320] > > > > > > Currently, the pass is disabled by default on all architectures and > > > enabled by a target-specific option. > > > > > > If deemed beneficial enough for a default, it will be enabled on > > > ampere1/ampere1a, > > > or other architectures as well, without needing to be turned on by this > > > option. > > > > What is aarch64-specific about the pass? > > > > I see an increasingly large number of target specific passes pop up > > (probably > > for the excuse we can generalize them if necessary). But GCC isn't LLVM > > and this feels like getting out of hand? > > We had an OK from Richard Sandiford on the earlier (v5) version with > v6 just fixing an obvious bug... so I was about to merge this earlier > just when you commented. > > Given that this had months of test exposure on our end, I would prefer > to move this forward for GCC14 in its current form. > The project of replacing architecture-specific store-forwarding passes > with a generalized infrastructure could then be addressed in the GCC15 > timeframe (or beyond)? It's up to target maintainers, I just picked this pass (randomly) to make this comment (of course also knowing that STLF fails are a common issue on pipelined uarchs). Richard. > > --Philipp. > > > > > The x86 backend also has its store-forwarding "pass" as part of mdreorg > > in ix86_split_stlf_stall_load. > > > > Richard. > > > > > Bootstrapped and regtested on aarch64-linux. > > > > > > gcc/ChangeLog: > > > > > > * config.gcc: Add aarch64-store-forwarding.o to extra_objs. > > > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass. > > > * config/aarch64/aarch64-protos.h > > > (make_pass_avoid_store_forwarding): Declare. > > > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New > > > option. > > > (aarch64-store-forwarding-threshold): New param. > > > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o > > > * doc/invoke.texi: Document new option and new param. > > > * config/aarch64/aarch64-store-forwarding.cc: New file. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test. > > > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test. > > > * gcc.target/aarch64/ldp_ssll_overlap.c: New test. > > > > > > Signed-off-by: Manos Anagnostakis > > > Co-Authored-By: Manolis Tsamis > > > Co-Authored-By: Philipp Tomsich > > > --- > > > Changes in v6: > > > - An obvious change. insn_cnt was incremented only on > > > stores and not for every insn in the bb. Now restored. > > > > > > gcc/config.gcc| 1 + > > > gcc/config/aarch64/aarch64-passes.def | 1 + > > > gcc/config/aarch64/aarch64-protos.h | 1 + > > > .../aarch64/aarch64-store-forwarding.cc | 318 ++ > > > gcc/config/aarch64/aarch64.opt| 9 + > > > gcc/config/aarch64/t-aarch64 | 10 + > > > gcc/doc/invoke.texi | 11 +- > > > .../aarch64/ldp_ssll_no_overlap_address.c | 33 ++ > > > .../aarch64/ldp_ssll_no_overlap_offset.c | 33 ++ > > > .../gcc.target/aarch64/ldp_ssll_overlap.c | 33 ++ > > > 10 files changed, 449 insertions(+), 1 deletion(-) > > > create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc > > > create mode 100644 > > > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c > > > create mode 100644 > > > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c > > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c > > > > > > diff --git a/gcc/config.gcc b/gcc/config.gcc > > > index 6450448f2f0..7c48429eb82 100644 > > > --- a/gcc/config.gcc > > > +++ b/gcc/config.gcc > > > @@ -350,6 +350,7 @@ aarch64*-*-*) > > > cxx_target_objs="aarch64-c.o" > > > d_target_objs="aarch64-d.o" > > > extra_objs="aarch64-builtins.o aarch-common.o > > > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o > > > aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o > > > aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o > > > aarch64-speculation.o
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
On Wed, 6 Dec 2023 at 23:32, Richard Biener wrote: > > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis > wrote: > > > > This is an RTL pass that detects store forwarding from stores to larger > > loads (load pairs). > > > > This optimization is SPEC2017-driven and was found to be beneficial for > > some benchmarks, > > through testing on ampere1/ampere1a machines. > > > > For example, it can transform cases like > > > > str d5, [sp, #320] > > fmul d5, d31, d29 > > ldp d31, d17, [sp, #312] # Large load from small store > > > > to > > > > str d5, [sp, #320] > > fmul d5, d31, d29 > > ldr d31, [sp, #312] > > ldr d17, [sp, #320] > > > > Currently, the pass is disabled by default on all architectures and enabled > > by a target-specific option. > > > > If deemed beneficial enough for a default, it will be enabled on > > ampere1/ampere1a, > > or other architectures as well, without needing to be turned on by this > > option. > > What is aarch64-specific about the pass? > > I see an increasingly large number of target specific passes pop up (probably > for the excuse we can generalize them if necessary). But GCC isn't LLVM > and this feels like getting out of hand? We had an OK from Richard Sandiford on the earlier (v5) version with v6 just fixing an obvious bug... so I was about to merge this earlier just when you commented. Given that this had months of test exposure on our end, I would prefer to move this forward for GCC14 in its current form. The project of replacing architecture-specific store-forwarding passes with a generalized infrastructure could then be addressed in the GCC15 timeframe (or beyond)? --Philipp. > > The x86 backend also has its store-forwarding "pass" as part of mdreorg > in ix86_split_stlf_stall_load. > > Richard. > > > Bootstrapped and regtested on aarch64-linux. > > > > gcc/ChangeLog: > > > > * config.gcc: Add aarch64-store-forwarding.o to extra_objs. > > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass. > > * config/aarch64/aarch64-protos.h > > (make_pass_avoid_store_forwarding): Declare. > > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option. > > (aarch64-store-forwarding-threshold): New param. > > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o > > * doc/invoke.texi: Document new option and new param. > > * config/aarch64/aarch64-store-forwarding.cc: New file. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test. > > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test. > > * gcc.target/aarch64/ldp_ssll_overlap.c: New test. > > > > Signed-off-by: Manos Anagnostakis > > Co-Authored-By: Manolis Tsamis > > Co-Authored-By: Philipp Tomsich > > --- > > Changes in v6: > > - An obvious change. insn_cnt was incremented only on > > stores and not for every insn in the bb. Now restored. > > > > gcc/config.gcc| 1 + > > gcc/config/aarch64/aarch64-passes.def | 1 + > > gcc/config/aarch64/aarch64-protos.h | 1 + > > .../aarch64/aarch64-store-forwarding.cc | 318 ++ > > gcc/config/aarch64/aarch64.opt| 9 + > > gcc/config/aarch64/t-aarch64 | 10 + > > gcc/doc/invoke.texi | 11 +- > > .../aarch64/ldp_ssll_no_overlap_address.c | 33 ++ > > .../aarch64/ldp_ssll_no_overlap_offset.c | 33 ++ > > .../gcc.target/aarch64/ldp_ssll_overlap.c | 33 ++ > > 10 files changed, 449 insertions(+), 1 deletion(-) > > create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c > > create mode 100644 > > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c > > > > diff --git a/gcc/config.gcc b/gcc/config.gcc > > index 6450448f2f0..7c48429eb82 100644 > > --- a/gcc/config.gcc > > +++ b/gcc/config.gcc > > @@ -350,6 +350,7 @@ aarch64*-*-*) > > cxx_target_objs="aarch64-c.o" > > d_target_objs="aarch64-d.o" > > extra_objs="aarch64-builtins.o aarch-common.o > > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o > > aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o > > aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o > > falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o" > > + extra_objs="${extra_objs} aarch64-store-forwarding.o" > > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc > > \$(srcdir)/config/aarch64/aarch64-sve-builtins.h > > \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc" > > target_has_targetm_common=yes > > ;; > > diff --git a/gcc/config/aarch64/aarch64-passes.def > > b/gcc/config/aarch64/aarch64-passes.def > > index
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
Hi again, I went and tested the requested changes and found out the following: 1. The pass is currently increasing insn_cnt on a NONJUMP_INSN_P, which is a subset of NONDEBUG_INSN_P. I think there is no problem with depending on -g with the current version. Do you see something I don't or did you mean something else? 2. Not processing all instructions is not letting cselib record all the effects they have, thus it does not have updated information to find true forwardings at any given time. I can confirm this since I am witnessing many unexpected changes on the number of handled cases if I do this only for loads/stores. Thanks in advance and please let me know your thoughts on the above. Manos. On Wed, Dec 6, 2023 at 5:10 PM Manos Anagnostakis < manos.anagnosta...@vrull.eu> wrote: > Hi Richard, > > thanks for the useful comments. > > On Wed, Dec 6, 2023 at 4:32 PM Richard Biener > wrote: > >> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis >> wrote: >> > >> > This is an RTL pass that detects store forwarding from stores to larger >> loads (load pairs). >> > >> > This optimization is SPEC2017-driven and was found to be beneficial for >> some benchmarks, >> > through testing on ampere1/ampere1a machines. >> > >> > For example, it can transform cases like >> > >> > str d5, [sp, #320] >> > fmul d5, d31, d29 >> > ldp d31, d17, [sp, #312] # Large load from small store >> > >> > to >> > >> > str d5, [sp, #320] >> > fmul d5, d31, d29 >> > ldr d31, [sp, #312] >> > ldr d17, [sp, #320] >> > >> > Currently, the pass is disabled by default on all architectures and >> enabled by a target-specific option. >> > >> > If deemed beneficial enough for a default, it will be enabled on >> ampere1/ampere1a, >> > or other architectures as well, without needing to be turned on by this >> option. >> >> What is aarch64-specific about the pass? >> > The pass was designed to target load pairs, which are aarch64 specific, > thus it cannot handle generic loads. > >> >> I see an increasingly large number of target specific passes pop up >> (probably >> for the excuse we can generalize them if necessary). But GCC isn't LLVM >> and this feels like getting out of hand? >> >> The x86 backend also has its store-forwarding "pass" as part of mdreorg >> in ix86_split_stlf_stall_load. >> >> Richard. >> >> > Bootstrapped and regtested on aarch64-linux. >> > >> > gcc/ChangeLog: >> > >> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs. >> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New >> pass. >> > * config/aarch64/aarch64-protos.h >> (make_pass_avoid_store_forwarding): Declare. >> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New >> option. >> > (aarch64-store-forwarding-threshold): New param. >> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o >> > * doc/invoke.texi: Document new option and new param. >> > * config/aarch64/aarch64-store-forwarding.cc: New file. >> > >> > gcc/testsuite/ChangeLog: >> > >> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test. >> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test. >> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test. >> > >> > Signed-off-by: Manos Anagnostakis >> > Co-Authored-By: Manolis Tsamis >> > Co-Authored-By: Philipp Tomsich >> > --- >> > Changes in v6: >> > - An obvious change. insn_cnt was incremented only on >> > stores and not for every insn in the bb. Now restored. >> > >> > gcc/config.gcc| 1 + >> > gcc/config/aarch64/aarch64-passes.def | 1 + >> > gcc/config/aarch64/aarch64-protos.h | 1 + >> > .../aarch64/aarch64-store-forwarding.cc | 318 ++ >> > gcc/config/aarch64/aarch64.opt| 9 + >> > gcc/config/aarch64/t-aarch64 | 10 + >> > gcc/doc/invoke.texi | 11 +- >> > .../aarch64/ldp_ssll_no_overlap_address.c | 33 ++ >> > .../aarch64/ldp_ssll_no_overlap_offset.c | 33 ++ >> > .../gcc.target/aarch64/ldp_ssll_overlap.c | 33 ++ >> > 10 files changed, 449 insertions(+), 1 deletion(-) >> > create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc >> > create mode 100644 >> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c >> > create mode 100644 >> gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c >> > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c >> > >> > diff --git a/gcc/config.gcc b/gcc/config.gcc >> > index 6450448f2f0..7c48429eb82 100644 >> > --- a/gcc/config.gcc >> > +++ b/gcc/config.gcc >> > @@ -350,6 +350,7 @@ aarch64*-*-*) >> > cxx_target_objs="aarch64-c.o" >> > d_target_objs="aarch64-d.o" >> > extra_objs="aarch64-builtins.o aarch-common.o >> aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o >> aarch64-sve-builtins-base.o
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
Hi Richard, thanks for the useful comments. On Wed, Dec 6, 2023 at 4:32 PM Richard Biener wrote: > On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis > wrote: > > > > This is an RTL pass that detects store forwarding from stores to larger > loads (load pairs). > > > > This optimization is SPEC2017-driven and was found to be beneficial for > some benchmarks, > > through testing on ampere1/ampere1a machines. > > > > For example, it can transform cases like > > > > str d5, [sp, #320] > > fmul d5, d31, d29 > > ldp d31, d17, [sp, #312] # Large load from small store > > > > to > > > > str d5, [sp, #320] > > fmul d5, d31, d29 > > ldr d31, [sp, #312] > > ldr d17, [sp, #320] > > > > Currently, the pass is disabled by default on all architectures and > enabled by a target-specific option. > > > > If deemed beneficial enough for a default, it will be enabled on > ampere1/ampere1a, > > or other architectures as well, without needing to be turned on by this > option. > > What is aarch64-specific about the pass? > The pass was designed to target load pairs, which are aarch64 specific, thus it cannot handle generic loads. > > I see an increasingly large number of target specific passes pop up > (probably > for the excuse we can generalize them if necessary). But GCC isn't LLVM > and this feels like getting out of hand? > > The x86 backend also has its store-forwarding "pass" as part of mdreorg > in ix86_split_stlf_stall_load. > > Richard. > > > Bootstrapped and regtested on aarch64-linux. > > > > gcc/ChangeLog: > > > > * config.gcc: Add aarch64-store-forwarding.o to extra_objs. > > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New > pass. > > * config/aarch64/aarch64-protos.h > (make_pass_avoid_store_forwarding): Declare. > > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New > option. > > (aarch64-store-forwarding-threshold): New param. > > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o > > * doc/invoke.texi: Document new option and new param. > > * config/aarch64/aarch64-store-forwarding.cc: New file. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test. > > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test. > > * gcc.target/aarch64/ldp_ssll_overlap.c: New test. > > > > Signed-off-by: Manos Anagnostakis > > Co-Authored-By: Manolis Tsamis > > Co-Authored-By: Philipp Tomsich > > --- > > Changes in v6: > > - An obvious change. insn_cnt was incremented only on > > stores and not for every insn in the bb. Now restored. > > > > gcc/config.gcc| 1 + > > gcc/config/aarch64/aarch64-passes.def | 1 + > > gcc/config/aarch64/aarch64-protos.h | 1 + > > .../aarch64/aarch64-store-forwarding.cc | 318 ++ > > gcc/config/aarch64/aarch64.opt| 9 + > > gcc/config/aarch64/t-aarch64 | 10 + > > gcc/doc/invoke.texi | 11 +- > > .../aarch64/ldp_ssll_no_overlap_address.c | 33 ++ > > .../aarch64/ldp_ssll_no_overlap_offset.c | 33 ++ > > .../gcc.target/aarch64/ldp_ssll_overlap.c | 33 ++ > > 10 files changed, 449 insertions(+), 1 deletion(-) > > create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc > > create mode 100644 > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c > > create mode 100644 > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c > > > > diff --git a/gcc/config.gcc b/gcc/config.gcc > > index 6450448f2f0..7c48429eb82 100644 > > --- a/gcc/config.gcc > > +++ b/gcc/config.gcc > > @@ -350,6 +350,7 @@ aarch64*-*-*) > > cxx_target_objs="aarch64-c.o" > > d_target_objs="aarch64-d.o" > > extra_objs="aarch64-builtins.o aarch-common.o > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o > aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o > aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o > falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o" > > + extra_objs="${extra_objs} aarch64-store-forwarding.o" > > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc > \$(srcdir)/config/aarch64/aarch64-sve-builtins.h > \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc" > > target_has_targetm_common=yes > > ;; > > diff --git a/gcc/config/aarch64/aarch64-passes.def > b/gcc/config/aarch64/aarch64-passes.def > > index 662a13fd5e6..94ced0aebf6 100644 > > --- a/gcc/config/aarch64/aarch64-passes.def > > +++ b/gcc/config/aarch64/aarch64-passes.def > > @@ -24,3 +24,4 @@ INSERT_PASS_BEFORE > (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstat > > INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance); > > INSERT_PASS_BEFORE
Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis wrote: > > This is an RTL pass that detects store forwarding from stores to larger loads > (load pairs). > > This optimization is SPEC2017-driven and was found to be beneficial for some > benchmarks, > through testing on ampere1/ampere1a machines. > > For example, it can transform cases like > > str d5, [sp, #320] > fmul d5, d31, d29 > ldp d31, d17, [sp, #312] # Large load from small store > > to > > str d5, [sp, #320] > fmul d5, d31, d29 > ldr d31, [sp, #312] > ldr d17, [sp, #320] > > Currently, the pass is disabled by default on all architectures and enabled > by a target-specific option. > > If deemed beneficial enough for a default, it will be enabled on > ampere1/ampere1a, > or other architectures as well, without needing to be turned on by this > option. What is aarch64-specific about the pass? I see an increasingly large number of target specific passes pop up (probably for the excuse we can generalize them if necessary). But GCC isn't LLVM and this feels like getting out of hand? The x86 backend also has its store-forwarding "pass" as part of mdreorg in ix86_split_stlf_stall_load. Richard. > Bootstrapped and regtested on aarch64-linux. > > gcc/ChangeLog: > > * config.gcc: Add aarch64-store-forwarding.o to extra_objs. > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass. > * config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): > Declare. > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option. > (aarch64-store-forwarding-threshold): New param. > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o > * doc/invoke.texi: Document new option and new param. > * config/aarch64/aarch64-store-forwarding.cc: New file. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test. > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test. > * gcc.target/aarch64/ldp_ssll_overlap.c: New test. > > Signed-off-by: Manos Anagnostakis > Co-Authored-By: Manolis Tsamis > Co-Authored-By: Philipp Tomsich > --- > Changes in v6: > - An obvious change. insn_cnt was incremented only on > stores and not for every insn in the bb. Now restored. > > gcc/config.gcc| 1 + > gcc/config/aarch64/aarch64-passes.def | 1 + > gcc/config/aarch64/aarch64-protos.h | 1 + > .../aarch64/aarch64-store-forwarding.cc | 318 ++ > gcc/config/aarch64/aarch64.opt| 9 + > gcc/config/aarch64/t-aarch64 | 10 + > gcc/doc/invoke.texi | 11 +- > .../aarch64/ldp_ssll_no_overlap_address.c | 33 ++ > .../aarch64/ldp_ssll_no_overlap_offset.c | 33 ++ > .../gcc.target/aarch64/ldp_ssll_overlap.c | 33 ++ > 10 files changed, 449 insertions(+), 1 deletion(-) > create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc > create mode 100644 > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c > create mode 100644 > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c > > diff --git a/gcc/config.gcc b/gcc/config.gcc > index 6450448f2f0..7c48429eb82 100644 > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -350,6 +350,7 @@ aarch64*-*-*) > cxx_target_objs="aarch64-c.o" > d_target_objs="aarch64-d.o" > extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o > aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o > aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o > cortex-a57-fma-steering.o aarch64-speculation.o > falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o" > + extra_objs="${extra_objs} aarch64-store-forwarding.o" > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc > \$(srcdir)/config/aarch64/aarch64-sve-builtins.h > \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc" > target_has_targetm_common=yes > ;; > diff --git a/gcc/config/aarch64/aarch64-passes.def > b/gcc/config/aarch64/aarch64-passes.def > index 662a13fd5e6..94ced0aebf6 100644 > --- a/gcc/config/aarch64/aarch64-passes.def > +++ b/gcc/config/aarch64/aarch64-passes.def > @@ -24,3 +24,4 @@ INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, > 1, pass_switch_pstat > INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance); > INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti); > INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion); > +INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding); > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index 60ff61f6d54..8f5f2ca4710 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@