Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
On Fri, Sep 29, 2023 at 10:22 PM Jeff Law wrote:
>
> On 9/12/23 04:13, Manolis Tsamis wrote:
> >>> +
> >>> +/* Get the single reaching definition of an instruction inside a BB.
> >>> +   The definition is desired for REG used in INSN.
> >>> +   Return the definition insn or NULL if there's no definition with
> >>> +   the desired criteria.  */
> >>> +static rtx_insn*
> >>> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> >>> +{
> >>> +  df_ref use;
> >>> +  struct df_link *ref_chain, *ref_link;
> >>> +
> >>> +  FOR_EACH_INSN_USE (use, insn)
> >>> +    {
> >>> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> >>> +        return NULL;
> >>> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> >>> +        break;
> >>> +    }
> >>> +
> >>> +  if (!use)
> >>> +    return NULL;
> >>> +
> >>> +  ref_chain = DF_REF_CHAIN (use);
> >> So what if there's two uses of REG in INSN?  I don't think it'd be
> >> common at all, but probably better safe and reject than sorry, right?
> >> Or is that case filtered out earlier?
> >>
> > If the REG is the same won't the definitions be the same even if that
> > REG appears multiple times in INSN?
> Yes.
Good, so no issues here.

> > fold_offsets_1 should be able to handle the folding with multiple uses
> > of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
> > If there's no other issue here I assume we want to keep that as-is in
> > order to not reduce the propagation power (which I assume is similar
> > to ree which uses the same logic).
> OK.  I was primarily concerned about the folding and rewriting aspects.
> It probably can only show up on targets with LEA-like instructions, and
> even on such targets it's probably rare.
OK.

> >>> +/* Test if INSN is a memory load / store that can have an offset folded
> >>> +   to it.
> >>> +   Return true iff INSN is such an instruction and return through
> >>> +   MEM_OUT, REG_OUT and OFFSET_OUT the RTX that has a MEM code, the
> >>> +   register that is used as a base address and the offset accordingly.
> >>> +   All of the out pointers may be NULL in which case they will be
> >>> +   ignored.  */
> >>> +bool
> >>> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> >>> +                   HOST_WIDE_INT *offset_out)
> >>> +{
> >>> +  rtx set = single_set (insn);
> >>> +  rtx mem = NULL_RTX;
> >>> +
> >>> +  if (set != NULL_RTX)
> >>> +    {
> >>> +      rtx src = SET_SRC (set);
> >>> +      rtx dest = SET_DEST (set);
> >>> +
> >>> +      /* Don't fold when we have unspec / volatile.  */
> >>> +      if (GET_CODE (src) == UNSPEC
> >>> +          || GET_CODE (src) == UNSPEC_VOLATILE
> >>> +          || GET_CODE (dest) == UNSPEC
> >>> +          || GET_CODE (dest) == UNSPEC_VOLATILE)
> >>> +        return false;
> >>> +
> >>> +      if (MEM_P (src))
> >>> +        mem = src;
> >>> +      else if (MEM_P (dest))
> >>> +        mem = dest;
> >>> +      else if ((GET_CODE (src) == SIGN_EXTEND
> >>> +                || GET_CODE (src) == ZERO_EXTEND)
> >>> +               && MEM_P (XEXP (src, 0)))
> >> Note some architectures allow both a source and destination memory.
> >> It looks like your code will prefer the source operand in that case.
> >> That's fine, just pointing it out.
> >>
> > Thanks for pointing that possibility out.  I thought for a moment that
> > this would be a bug with multiple mentions of the address register,
> > but it should be fine due to:
> >   /* Special case: A foldable memory store is not foldable if it
> >      mentions DEST outside of the address calculation.  */
> ACK.
>
> The other thing I keep pondering is autoincrement style addressing.
> Though I think at some point I convinced myself they weren't a problem.
> I think your checks only allow specific kinds of expressions for the
> memory address and I don't think {PRE,POST}_{INC,DEC,MODIFY} were in
> the list of valid ops.
Yes, although I haven't considered pre/post-inc/dec, they're indeed not
allowed due to what is allowed to be a root memory operation
(get_fold_mem_root).  I believe these shouldn't be an issue as anything
that transitively affects even one of these will be rejected.

> >>> +
> >>> +  int max_iters = 5;
> >>> +  for (int i = 0; i < max_iters; i++)
> >>> +    {
> >>> +      bool made_changes = false;
> >>> +      for (fold_info_map::iterator iter = fold_info->begin ();
> >>> +           iter != fold_info->end (); ++iter)
> >>> +        {
> >>> +          fold_mem_info *info = (*iter).second;
> >>> +          if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> >>> +            made_changes |= bitmap_ior_into (&cannot_fold_insns,
> >>> +                                             info->fold_insns);
> >>> +        }
> >>> +
> >>> +      if (!made_changes)
> >>> +        return true;
> >>> +    }
> >>> +
> >>> +  return false;
> >> So how was the magic value of "5" determined here?  In general we try
> >> not to have magic #s like that and instead find a better way to
> >> control iterations, falling back to a PARAM when all else fails.
> >>
> > It's
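The rejection rule in get_single_def_in_bb can be modeled outside of GCC's df framework. The sketch below uses invented stand-ins (`use_info`, `single_def_for_reg`), not GCC API, and keeps only the decision logic being discussed: a use of the register is accepted only when exactly one definition reaches it.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for GCC's df_ref / df_link machinery (invented names,
// not GCC API): each use of a register records which definition insns
// can reach it.
struct use_info
{
  int regno;                 // register used by the insn
  std::vector<int> def_ids;  // ids of the reaching definition insns
};

// Mirror of get_single_def_in_bb's decision rule: accept a use of
// REGNO only when exactly one definition reaches it; otherwise give
// up (the NULL returns in the patch).
int
single_def_for_reg (const std::vector<use_info> &uses, int regno)
{
  for (const use_info &u : uses)
    if (u.regno == regno)
      return u.def_ids.size () == 1 ? u.def_ids[0] : -1;
  return -1;  // REG is not used in this insn at all
}

// One reaching definition for r5: accepted.
int
demo_single_def ()
{
  std::vector<use_info> uses = { { 5, { 10 } } };
  return single_def_for_reg (uses, 5);
}

// Two reaching definitions for r6: rejected.
int
demo_multiple_defs ()
{
  std::vector<use_info> uses = { { 6, { 11, 12 } } };
  return single_def_for_reg (uses, 6);
}
```

As discussed above, multiple uses of the same register in one insn are harmless under this rule, since they all share the same reaching-definition chain.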
Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
On 9/12/23 04:13, Manolis Tsamis wrote:
>>> +
>>> +/* Get the single reaching definition of an instruction inside a BB.
>>> +   The definition is desired for REG used in INSN.
>>> +   Return the definition insn or NULL if there's no definition with
>>> +   the desired criteria.  */
>>> +static rtx_insn*
>>> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
>>> +{
>>> +  df_ref use;
>>> +  struct df_link *ref_chain, *ref_link;
>>> +
>>> +  FOR_EACH_INSN_USE (use, insn)
>>> +    {
>>> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
>>> +        return NULL;
>>> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
>>> +        break;
>>> +    }
>>> +
>>> +  if (!use)
>>> +    return NULL;
>>> +
>>> +  ref_chain = DF_REF_CHAIN (use);
>> So what if there's two uses of REG in INSN?  I don't think it'd be
>> common at all, but probably better safe and reject than sorry, right?
>> Or is that case filtered out earlier?
> If the REG is the same won't the definitions be the same even if that
> REG appears multiple times in INSN?
Yes.

> fold_offsets_1 should be able to handle the folding with multiple uses
> of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
> If there's no other issue here I assume we want to keep that as-is in
> order to not reduce the propagation power (which I assume is similar
> to ree which uses the same logic).
OK.  I was primarily concerned about the folding and rewriting aspects.
It probably can only show up on targets with LEA-like instructions, and
even on such targets it's probably rare.

>>> +/* Test if INSN is a memory load / store that can have an offset folded
>>> +   to it.
>>> +   Return true iff INSN is such an instruction and return through
>>> +   MEM_OUT, REG_OUT and OFFSET_OUT the RTX that has a MEM code, the
>>> +   register that is used as a base address and the offset accordingly.
>>> +   All of the out pointers may be NULL in which case they will be
>>> +   ignored.  */
>>> +bool
>>> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
>>> +                   HOST_WIDE_INT *offset_out)
>>> +{
>>> +  rtx set = single_set (insn);
>>> +  rtx mem = NULL_RTX;
>>> +
>>> +  if (set != NULL_RTX)
>>> +    {
>>> +      rtx src = SET_SRC (set);
>>> +      rtx dest = SET_DEST (set);
>>> +
>>> +      /* Don't fold when we have unspec / volatile.  */
>>> +      if (GET_CODE (src) == UNSPEC
>>> +          || GET_CODE (src) == UNSPEC_VOLATILE
>>> +          || GET_CODE (dest) == UNSPEC
>>> +          || GET_CODE (dest) == UNSPEC_VOLATILE)
>>> +        return false;
>>> +
>>> +      if (MEM_P (src))
>>> +        mem = src;
>>> +      else if (MEM_P (dest))
>>> +        mem = dest;
>>> +      else if ((GET_CODE (src) == SIGN_EXTEND
>>> +                || GET_CODE (src) == ZERO_EXTEND)
>>> +               && MEM_P (XEXP (src, 0)))
>> Note some architectures allow both a source and destination memory.
>> It looks like your code will prefer the source operand in that case.
>> That's fine, just pointing it out.
> Thanks for pointing that possibility out.  I thought for a moment that
> this would be a bug with multiple mentions of the address register,
> but it should be fine due to:
>   /* Special case: A foldable memory store is not foldable if it
>      mentions DEST outside of the address calculation.  */
ACK.

The other thing I keep pondering is autoincrement style addressing.
Though I think at some point I convinced myself they weren't a problem.
I think your checks only allow specific kinds of expressions for the
memory address and I don't think {PRE,POST}_{INC,DEC,MODIFY} were in
the list of valid ops.

>>> +
>>> +  int max_iters = 5;
>>> +  for (int i = 0; i < max_iters; i++)
>>> +    {
>>> +      bool made_changes = false;
>>> +      for (fold_info_map::iterator iter = fold_info->begin ();
>>> +           iter != fold_info->end (); ++iter)
>>> +        {
>>> +          fold_mem_info *info = (*iter).second;
>>> +          if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
>>> +            made_changes |= bitmap_ior_into (&cannot_fold_insns,
>>> +                                             info->fold_insns);
>>> +        }
>>> +
>>> +      if (!made_changes)
>>> +        return true;
>>> +    }
>>> +
>>> +  return false;
>> So how was the magic value of "5" determined here?  In general we try
>> not to have magic #s like that and instead find a better way to
>> control iterations, falling back to a PARAM when all else fails.
> It's indeed just a magic value :)  In general even two iterations of
> this should be quite rare.  One iteration will handle the majority of
> interesting cases.  I chose 5 to limit the worst case execution time
> for degenerate cases.
>
> I'll be happy to change that to something better but I couldn't figure
> out how "falling back to a PARAM when all else fails" should work here.
> Is there a similar example somewhere else in the code that I can look
> at?
If we expect two iterations to be quite rare, then I doubt a param is
really worth the effort.  I might suggest using a loop like

  for (pass = 0; pass < flag_expensive_optimizations + 1; pass++)

No magic numbers :-)  We use similar loops elsewhere for this kind of
scenario.

If you ever need to create a param, the procedure is to add the option
to params.opt.  If you look in that file you'll see that you define a
variable name in the specification that you can then reference.
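The fixed-point iteration under discussion can be modeled with ordinary sets. Below is a minimal sketch, assuming the shape described for compute_validity_closure; `close_cannot_fold` and the set-based types are hypothetical stand-ins for the bitmap code, not the patch itself. Each fold candidate carries the set of insns it relies on, and the "cannot fold" set grows until nothing changes or the iteration cap is hit.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Set-based stand-in for the bitmap loop above (close_cannot_fold is
// an invented name): if any insn a candidate relies on is already
// unfoldable, every insn in that candidate's set becomes unfoldable,
// which can in turn invalidate further candidates transitively.
bool
close_cannot_fold (std::set<int> &cannot_fold,
                   const std::vector<std::set<int>> &candidates,
                   int max_iters)
{
  for (int i = 0; i < max_iters; i++)
    {
      bool made_changes = false;
      for (const std::set<int> &c : candidates)
        {
          bool intersects = false;
          for (int insn : c)
            if (cannot_fold.count (insn))
              intersects = true;
          if (intersects)
            for (int insn : c)
              made_changes |= cannot_fold.insert (insn).second;
        }
      if (!made_changes)
        return true;   // fixed point reached
    }
  return false;        // iteration cap hit: caller gives up folding
}

// Chain {1,2}, {2,3} with insn 1 unfoldable: unfoldability spreads to
// insn 2 and then to insn 3, and the loop then converges.
int
demo_closure_size ()
{
  std::set<int> cannot_fold = { 1 };
  std::vector<std::set<int>> candidates = { { 1, 2 }, { 2, 3 } };
  return close_cannot_fold (cannot_fold, candidates, 5)
         ? (int) cannot_fold.size () : -1;
}
```

This also shows why a small cap usually suffices: each extra iteration is only needed when invalidation has to hop across another level of dependent candidates, which chains rarely do in practice.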
Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
On Tue, Sep 12, 2023 at 3:47 AM Jeff Law wrote:
>
> On 9/9/23 02:46, Manolis Tsamis wrote:
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >   addi t4,sp,16
> >   add  t2,a6,t4
> >   shl  t3,t2,1
> >   ld   a2,0(t3)
> >   addi a2,1
> >   sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >   add  t2,a6,sp
> >   shl  t3,t2,1
> >   ld   a2,32(t3)
> >   addi a2,1
> >   sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized.  Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > gcc/ChangeLog:
> >
> >         * Makefile.in: Add fold-mem-offsets.o.
> >         * passes.def: Schedule a new pass.
> >         * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> >         * common.opt: New options.
> >         * doc/invoke.texi: Document new option.
> >         * fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >         * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >         * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis
> > ---
> >
> > Changes in v5:
> >   - Introduce new helper function fold_offsets_1.
> >   - Fix bug because constants could be partially propagated
> >     through instructions that weren't understood.
> >   - Introduce helper class fold_mem_info that stores f-m-o
> >     info for an instruction.
> >   - Calculate fold_offsets only once with do_fold_info_calculation.
> >   - Fix correctness issue by introducing compute_validity_closure.
> >   - Propagate in more cases for PLUS/MINUS with constant.
> >
> > Changes in v4:
> >   - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> >   - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> >   - Call recog when patching offsets in do_commit_offset.
> >   - Restore INSN_CODE after modifying insn in do_check_validity.
> >
> > Changes in v3:
> >   - Added propagation for more codes:
> >     sub, neg, mul.
> >   - Added folding / elimination for sub and
> >     const int moves.
> >   - For the validity check of the generated addresses
> >     also test memory_address_addr_space_p.
> >   - Replaced GEN_INT with gen_int_mode.
> >   - Replaced some bitmap_head with auto_bitmap.
> >   - Refactor each phase into own function for readability.
> >   - Add dump details.
> >   - Replace rtx iteration with reg_mentioned_p.
> >   - Return early for codes that we can't propagate through.
> >
> > Changes in v2:
> >   - Made the pass target-independent instead of RISCV specific.
> >   - Fixed a number of bugs.
> >   - Add code to handle more ADD patterns as found
> >     in other targets (x86, aarch64).
> >   - Improved naming and comments.
> >   - Fixed bitmap memory leak.
> >
> > +
> > +/* Get the single reaching definition of an instruction inside a BB.
> > +   The definition is desired for REG used in INSN.
> > +   Return the definition insn or NULL if there's no definition with
> > +   the desired criteria.  */
> > +static rtx_insn*
> > +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> > +{
> > +  df_ref use;
> > +  struct df_link *ref_chain, *ref_link;
> > +
> > +  FOR_EACH_INSN_USE (use, insn)
> > +    {
> > +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> > +        return NULL;
> > +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> > +        break;
> > +    }
> > +
> > +  if (!use)
> > +    return NULL;
> > +
> > +  ref_chain = DF_REF_CHAIN (use);
> So what if there's two uses of REG in INSN?  I don't think it'd be
> common at all, but probably better safe and reject than sorry, right?
> Or is that case filtered out earlier?
>
If the REG is the same won't the definitions be the same even if that
REG appears multiple times in INSN?

fold_offsets_1 should be able to handle the folding with multiple uses
of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
If there's no other issue here I assume we want to keep that as-is in
order to not reduce the propagation power (which I assume is similar
to ree which uses the same logic).

> > +
> > +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> Formatting nit.  The '*' should be next to the variable, not the type.
>
Thanks, I fixed this in multiple places.  I thought that
check_GNU_style would whine about this...

> > +
> > +static HOST_WIDE_INT
> > +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> > +
> > +/* Helper function for fold_offsets.
> > +
> > +If
Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
On 9/9/23 02:46, Manolis Tsamis wrote:
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized.  Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
>         * Makefile.in: Add fold-mem-offsets.o.
>         * passes.def: Schedule a new pass.
>         * tree-pass.h (make_pass_fold_mem_offsets): Declare.
>         * common.opt: New options.
>         * doc/invoke.texi: Document new option.
>         * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/fold-mem-offsets-1.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-2.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis
> ---
>
> Changes in v5:
>   - Introduce new helper function fold_offsets_1.
>   - Fix bug because constants could be partially propagated
>     through instructions that weren't understood.
>   - Introduce helper class fold_mem_info that stores f-m-o
>     info for an instruction.
>   - Calculate fold_offsets only once with do_fold_info_calculation.
>   - Fix correctness issue by introducing compute_validity_closure.
>   - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
>   - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
>   - Remove fold_mem_offsets_driver and enum fold_mem_phase.
>   - Call recog when patching offsets in do_commit_offset.
>   - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
>   - Added propagation for more codes:
>     sub, neg, mul.
>   - Added folding / elimination for sub and
>     const int moves.
>   - For the validity check of the generated addresses
>     also test memory_address_addr_space_p.
>   - Replaced GEN_INT with gen_int_mode.
>   - Replaced some bitmap_head with auto_bitmap.
>   - Refactor each phase into own function for readability.
>   - Add dump details.
>   - Replace rtx iteration with reg_mentioned_p.
>   - Return early for codes that we can't propagate through.
>
> Changes in v2:
>   - Made the pass target-independent instead of RISCV specific.
>   - Fixed a number of bugs.
>   - Add code to handle more ADD patterns as found
>     in other targets (x86, aarch64).
>   - Improved naming and comments.
>   - Fixed bitmap memory leak.
>
> +
> +/* Get the single reaching definition of an instruction inside a BB.
> +   The definition is desired for REG used in INSN.
> +   Return the definition insn or NULL if there's no definition with
> +   the desired criteria.  */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> +  df_ref use;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  FOR_EACH_INSN_USE (use, insn)
> +    {
> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> +        return NULL;
> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> +        break;
> +    }
> +
> +  if (!use)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (use);
So what if there's two uses of REG in INSN?  I don't think it'd be
common at all, but probably better safe and reject than sorry, right?
Or is that case filtered out earlier?

> +
> +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
Formatting nit.  The '*' should be next to the variable, not the type.

> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/* Helper function for fold_offsets.
> +
> +If DO_RECURSION is false and ANALYZE is true this function returns true iff
> +it understands the structure of INSN and knows how to propagate constants
> +through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> +If DO_RECURSION is true then it also calls fold_offsets for each recognised
> +part of INSN with the appropriate arguments.
> +
> +If DO_RECURSION is true and ANALYZE is false then offset that would result
> +from folding is computed and is returned through the pointer OFFSET_OUT.
> +The instructions that can be folded are recorded in FOLDABLE_INSNS.
> +*/
> +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> +                            HOST_WIDE_INT *offset_out, bitmap foldable_insns)
Nit.  Linkage and return type on separate line.  That makes the
function name start at the beginning of a line.

> +
> +/* Test if INSN is a memory load / store that can have an offset folded
> +   to it.
> +   Return true iff INSN is such an
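The analyze-versus-compute split documented for fold_offsets_1 can be illustrated on a toy expression tree. This is only a sketch of the idea, not the patch's RTL walk: `expr` and the simplified single-flag `fold_offsets_1` below are invented stand-ins (the real helper takes both ANALYZE and DO_RECURSION and walks real RTL). The key property is that a subtree contributes an offset only if every part of it is understood.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the RTL that fold_offsets_1 walks: 'C' is a
// constant, 'R' is a register (offset 0), '+' adds two children, and
// any other code is unrecognized.
struct expr { char code; int64_t val; const expr *a, *b; };

// In analyze mode we only answer "do we understand this expression?";
// in compute mode we also return the foldable constant offset.
// Separating the two avoids partially propagating constants through
// insns that are not fully understood (the bug fixed in v5).
bool
fold_offsets_1 (const expr *e, bool analyze, int64_t *offset_out)
{
  switch (e->code)
    {
    case 'C':
      if (!analyze) *offset_out = e->val;
      return true;
    case 'R':
      if (!analyze) *offset_out = 0;
      return true;
    case '+':
      {
        int64_t lhs = 0, rhs = 0;
        if (!fold_offsets_1 (e->a, analyze, &lhs)
            || !fold_offsets_1 (e->b, analyze, &rhs))
          return false;  // one side unknown: the whole tree is rejected
        if (!analyze) *offset_out = lhs + rhs;
        return true;
      }
    default:
      return false;      // unknown code: nothing can be folded
    }
}

// Compute mode: the offset of (16 + (R + 8)) is 24.
int64_t
demo_offset ()
{
  expr r   { 'R',  0, nullptr, nullptr };
  expr c8  { 'C',  8, nullptr, nullptr };
  expr c16 { 'C', 16, nullptr, nullptr };
  expr sum { '+',  0, &r, &c8 };
  expr top { '+',  0, &c16, &sum };
  int64_t off = 0;
  return fold_offsets_1 (&top, /*analyze=*/false, &off) ? off : -1;
}

// Analyze mode: any unrecognized sub-expression poisons the tree.
bool
demo_rejects_unknown ()
{
  expr bad { '?', 0, nullptr, nullptr };
  expr r   { 'R', 0, nullptr, nullptr };
  expr top { '+', 0, &r, &bad };
  int64_t off = 0;
  return !fold_offsets_1 (&top, /*analyze=*/true, &off);
}
```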
Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
This new version fixes the issues discussed in v4 and also fixes an
issue that is described in the newly introduced compute_validity_closure.

Bootstrapped on x86-64 and AArch64.  I also ran the GCC testsuite on
x86-64, AArch64 and RISCV64.  There are no regressions except for
gcc.target/i386/pr52146.c which I have already mentioned and I believe
it shouldn't be fixed in f-m-o.

I have also measured the number of eliminated instructions for SPEC
intrate on these three architectures, which are as follows:

  Benchmark  RISCV64  AArch64  x86-64
  500            112       71       8
  502            443      318      16
  505              0        0       0
  520            808       23       4
  523             20      205       5
  525            384       73       2
  531             41        7       0
  541             97       56       0
  548            101        0       0
  557              9        2       0

Thanks,
Manolis

On Sat, Sep 9, 2023 at 11:47 AM Manolis Tsamis wrote:
>
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized.  Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
>         * Makefile.in: Add fold-mem-offsets.o.
>         * passes.def: Schedule a new pass.
>         * tree-pass.h (make_pass_fold_mem_offsets): Declare.
>         * common.opt: New options.
>         * doc/invoke.texi: Document new option.
>         * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/fold-mem-offsets-1.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-2.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis
> ---
>
> Changes in v5:
>   - Introduce new helper function fold_offsets_1.
>   - Fix bug because constants could be partially propagated
>     through instructions that weren't understood.
>   - Introduce helper class fold_mem_info that stores f-m-o
>     info for an instruction.
>   - Calculate fold_offsets only once with do_fold_info_calculation.
>   - Fix correctness issue by introducing compute_validity_closure.
>   - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
>   - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
>   - Remove fold_mem_offsets_driver and enum fold_mem_phase.
>   - Call recog when patching offsets in do_commit_offset.
>   - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
>   - Added propagation for more codes:
>     sub, neg, mul.
>   - Added folding / elimination for sub and
>     const int moves.
>   - For the validity check of the generated addresses
>     also test memory_address_addr_space_p.
>   - Replaced GEN_INT with gen_int_mode.
>   - Replaced some bitmap_head with auto_bitmap.
>   - Refactor each phase into own function for readability.
>   - Add dump details.
>   - Replace rtx iteration with reg_mentioned_p.
>   - Return early for codes that we can't propagate through.
>
> Changes in v2:
>   - Made the pass target-independent instead of RISCV specific.
>   - Fixed a number of bugs.
>   - Add code to handle more ADD patterns as found
>     in other targets (x86, aarch64).
>   - Improved naming and comments.
>   - Fixed bitmap memory leak.
>
>  gcc/Makefile.in                                    |   1 +
>  gcc/common.opt                                     |   4 +
>  gcc/doc/invoke.texi                                |   8 +
>  gcc/fold-mem-offsets.cc                            | 891 ++
>  gcc/passes.def                                     |   1 +
>  .../gcc.target/riscv/fold-mem-offsets-1.c          |  16 +
>  .../gcc.target/riscv/fold-mem-offsets-2.c          |  24 +
>  .../gcc.target/riscv/fold-mem-offsets-3.c          |  17 +
>  gcc/tree-pass.h                                    |   1 +
>  9 files changed, 963 insertions(+)
>  create mode 100644 gcc/fold-mem-offsets.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 6d608db4dd2..d18bef1be4b 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1435,6 +1435,7 @@ OBJS = \
>         fixed-value.o \
>         fold-const.o \
>         fold-const-call.o \
> +