[Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674 Alex Coplan changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #7 from Alex Coplan --- Fixed for GCC 15, thanks for the report.
[Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674 --- Comment #6 from GCC Commits --- The master branch has been updated by Alex Coplan : https://gcc.gnu.org/g:74690ff96b263b8609639b7fbc5d6afd3f19cb98 commit r15-282-g74690ff96b263b8609639b7fbc5d6afd3f19cb98 Author: Alex Coplan Date: Wed Apr 10 16:30:36 2024 +0100 aarch64: Preserve mem info on change of base for ldp/stp [PR114674] The ldp/stp fusion pass can change the base of an access so that the two accesses end up using a common base register. So far we have been using adjust_address_nv to do this, but this means that we don't preserve other properties of the mem we're replacing. It seems better to use replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the mem whose address we're changing. The PR shows that by adjusting the other mem we lose alignment information about the original access and therefore end up rejecting an otherwise viable pair when --param=aarch64-stp-policy=aligned is passed. This patch fixes that by using replace_equiv_address_nv instead. Notably this is the same approach as taken by aarch64_check_consecutive_mems when a change of base is required, so this at least makes things more consistent between the ldp fusion pass and the peepholes. gcc/ChangeLog: PR target/114674 * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair): Use replace_equiv_address_nv on a change of base instead of adjust_address_nv on the other access. gcc/testsuite/ChangeLog: PR target/114674 * gcc.target/aarch64/pr114674.c: New test.
[Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674 --- Comment #5 from Di Zhao --- Yes it's better to take a maximum MEM_ALIGN. I've missed that. Thanks.
[Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674 Alex Coplan changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |acoplan at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Alex Coplan --- Discussing offline with Richard S an alternative approach would be to use replace_equiv_address[_nv] instead of adjust_address[_nv]; that way we preserve most properties of the original mem and just take the address from the other. In fact this is what aarch64_check_consecutive_mems already does so I think we should follow that. I'll try a patch along those lines for stage 1.
[Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674 Alex Coplan changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-04-10 --- Comment #3 from Alex Coplan --- Confirmed. I think it might be best to take the maximum MEM_ALIGN between the adjusted mem (new_mem) and the original mem (change_mem). In this case it happens that the original mem (change_mem) has a stronger alignment guarantee, but in general it could be the case that the adjusted mem gives a better alignment guarantee. Since both locations are known to point to the same address, it seems best to me to take the largest alignment of the two.
[Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674 Alex Coplan changed: What|Removed |Added CC||acoplan at gcc dot gnu.org Keywords||missed-optimization --- Comment #2 from Alex Coplan --- Thanks for the report (and patch), I'll look into this.
[Bug rtl-optimization/114674] [aarch64] ldp_fusion fails to merge 2 strs due to imprecise alignment info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114674 --- Comment #1 from Di Zhao --- Here's a quick fix I tried, that works on the small test case above: diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc index 365dcf48b22..43478ede72b 100644 --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc @@ -1735,6 +1735,9 @@ ldp_bb_info::fuse_pair (bool load_p, rtx new_mem = adjust_address_nv (effective_base, mode_for_mem, adjust_amt); + // Keep original alignment info for ldp/stp policy. + set_mem_align (new_mem, MEM_ALIGN (change_mem)); + rtx new_set = load_p ? gen_rtx_SET (change_reg, new_mem) : gen_rtx_SET (new_mem, change_reg); Is is OK?