[Bug tree-optimization/88709] Improve store-merging
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88709

Ramana Radhakrishnan changed:

           What    |Removed |Added
----------------------------------------
                 CC|        |ramana at gcc dot gnu.org

--- Comment #13 from Ramana Radhakrishnan ---
(In reply to Richard Earnshaw from comment #9)
> (In reply to Jakub Jelinek from comment #7)
> > (In reply to Christophe Lyon from comment #6)
> > > I've noticed that the new test store_merging_29.c fails on
> > > arm-none-eabi --with-cpu cortex-a9
> > > FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New
> > > sequence of 3 stores to replace old one of 6 stores"
> >
> > That is because target-supports.exp lies on arm, even for -mcpu=cortex-a9:
> > STRICT_ALIGNMENT is 1 on arm (it is 1 unconditionally), but
> > check_effective_target_non_strict_align returns true on arm anyway.
> > I've already added hacks for it in r256783 in another testcase, though;
> > guess I'll do something similar now, but I must say I'm not very excited
> > about that.
>
> Support for misaligned accesses is a three(.5!)-valued problem:
>
> 1) There's no support in the architecture at all.
> 2) There's some support, with a limited set of instructions.
> 3) There's full support: any memory access can handle any alignment.
> 3.5) There's full support, but some accesses may be very slow.
>
> I would think that these days most CPU architectures actually fall into
> either 1 or 2. Many architectures have limitations, for example on atomic
> accesses that are unaligned.
>
> STRICT_ALIGNMENT only covers, in reality, case 3. I'm not even sure if it
> would be defined on a machine with case 3.5.
>
> I think the real problem here is that it's not clear what question this
> target-supports macro is really asking - does the CPU have the capability
> to do (some) unaligned accesses? Or can it arbitrarily support casts from
> unaligned pointers to standard types?

I agree - it sounds like this should be put in a comment next to the
target-supports query.
--- Comment #12 from Jakub Jelinek ---
Author: jakub
Date: Fri May 10 07:53:23 2019
New Revision: 271056

URL: https://gcc.gnu.org/viewcvs?rev=271056&root=gcc&view=rev
Log:
	PR tree-optimization/88709
	PR tree-optimization/90271
	* gcc.dg/store_merging_29.c: Allow 4 stores to replace 6 stores on
	arm*-*-*.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/store_merging_29.c
--- Comment #11 from Richard Earnshaw ---
And in the testcase that prompted Ramana's original patch it clearly wanted
to ask something else. We can't have it both ways.
--- Comment #10 from Jakub Jelinek ---
In this and many other testcases it wants to ask: is STRICT_ALIGNMENT
non-zero?
--- Comment #9 from Richard Earnshaw ---
(In reply to Jakub Jelinek from comment #7)
> (In reply to Christophe Lyon from comment #6)
> > I've noticed that the new test store_merging_29.c fails on
> > arm-none-eabi --with-cpu cortex-a9
> > FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New
> > sequence of 3 stores to replace old one of 6 stores"
>
> That is because target-supports.exp lies on arm, even for -mcpu=cortex-a9:
> STRICT_ALIGNMENT is 1 on arm (it is 1 unconditionally), but
> check_effective_target_non_strict_align returns true on arm anyway.
> I've already added hacks for it in r256783 in another testcase, though;
> guess I'll do something similar now, but I must say I'm not very excited
> about that.

Support for misaligned accesses is a three(.5!)-valued problem:

1) There's no support in the architecture at all.
2) There's some support, with a limited set of instructions.
3) There's full support: any memory access can handle any alignment.
3.5) There's full support, but some accesses may be very slow.

I would think that these days most CPU architectures actually fall into
either 1 or 2. Many architectures have limitations, for example on atomic
accesses that are unaligned.

STRICT_ALIGNMENT only covers, in reality, case 3. I'm not even sure if it
would be defined on a machine with case 3.5.

I think the real problem here is that it's not clear what question this
target-supports macro is really asking - does the CPU have the capability
to do (some) unaligned accesses? Or can it arbitrarily support casts from
unaligned pointers to standard types?
--- Comment #8 from Jakub Jelinek ---
Created attachment 46327
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46327&action=edit
gcc10-pr88709-test.patch

Untested patch for the testsuite (well, I've tested it on x86_64-linux,
together with some tweaks to test the 3 variants - replaced the [34] with
[4] so that the second regex doesn't match for me on x86_64, then tested
normally, with be and le swapped, and with x86_64 instead of arm).
--- Comment #7 from Jakub Jelinek ---
(In reply to Christophe Lyon from comment #6)
> I've noticed that the new test store_merging_29.c fails on
> arm-none-eabi --with-cpu cortex-a9
> FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New sequence
> of 3 stores to replace old one of 6 stores"

That is because target-supports.exp lies on arm, even for -mcpu=cortex-a9:
STRICT_ALIGNMENT is 1 on arm (it is 1 unconditionally), but
check_effective_target_non_strict_align returns true on arm anyway.
I've already added hacks for it in r256783 in another testcase, though;
guess I'll do something similar now, but I must say I'm not very excited
about that.
Christophe Lyon changed:

           What    |Removed |Added
----------------------------------------
                 CC|        |clyon at gcc dot gnu.org

--- Comment #6 from Christophe Lyon ---
I've noticed that the new test store_merging_29.c fails on arm-none-eabi
--with-cpu cortex-a9:

FAIL: gcc.dg/store_merging_29.c scan-tree-dump store-merging "New sequence
of 3 stores to replace old one of 6 stores"
--- Comment #5 from Jakub Jelinek ---
Author: jakub
Date: Mon May  6 21:50:14 2019
New Revision: 270924

URL: https://gcc.gnu.org/viewcvs?rev=270924&root=gcc&view=rev
Log:
	PR tree-optimization/88709
	PR tree-optimization/90271
	* params.def (PARAM_STORE_MERGING_MAX_SIZE): New parameter.
	* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Handle
	non-clobber CONSTRUCTORs with no elts.  Remove useless tmp_int
	variable.
	(imm_store_chain_info::coalesce_immediate_stores): Punt if the
	size of the store merging group is larger than
	PARAM_STORE_MERGING_MAX_SIZE parameter.
	(split_group): Add bzero_first argument.  If set, always emit
	first the first store which must be = {} of the whole area and
	then for the rest of the stores consider all zero bytes as
	paddings.
	(imm_store_chain_info::output_merged_store): Check if first store
	is = {} of the whole area and if yes, determine which setting of
	bzero_first for split_group gives smaller number of stores.
	Adjust split_group callers.
	(lhs_valid_for_store_merging_p): Allow decls.
	(rhs_valid_for_store_merging_p): Allow non-clobber CONSTRUCTORs
	with no elts.
	(pass_store_merging::process_store): Likewise.

	* gcc.dg/store_merging_26.c: New test.
	* gcc.dg/store_merging_27.c: New test.
	* gcc.dg/store_merging_28.c: New test.
	* gcc.dg/store_merging_29.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/store_merging_26.c
    trunk/gcc/testsuite/gcc.dg/store_merging_27.c
    trunk/gcc/testsuite/gcc.dg/store_merging_28.c
    trunk/gcc/testsuite/gcc.dg/store_merging_29.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimple-ssa-store-merging.c
    trunk/gcc/params.def
    trunk/gcc/testsuite/ChangeLog
Jakub Jelinek changed:

           What    |Removed |Added
----------------------------------------
             Status|NEW     |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
Created attachment 46278
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46278&action=edit
gcc10-pr88709-wip.patch

Further progress on the patch.
--- Comment #3 from Jakub Jelinek ---
WIP:

--- gcc/gimple-ssa-store-merging.c.jj	2019-01-01 12:37:19.063943678 +0100
+++ gcc/gimple-ssa-store-merging.c	2019-04-29 19:02:55.992151104 +0200
@@ -1615,13 +1615,31 @@ encode_tree_to_bitpos (tree expr, unsign
 		       unsigned int total_bytes)
 {
   unsigned int first_byte = bitpos / BITS_PER_UNIT;
-  tree tmp_int = expr;
   bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
 			|| (bitpos % BITS_PER_UNIT)
 			|| !int_mode_for_size (bitlen, 0).exists ());
+  bool empty_ctor_p
+    = (TREE_CODE (expr) == CONSTRUCTOR
+       && CONSTRUCTOR_NELTS (expr) == 0
+       && TYPE_SIZE_UNIT (TREE_TYPE (expr))
+       && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
 
   if (!sub_byte_op_p)
-    return native_encode_expr (tmp_int, ptr + first_byte, total_bytes) != 0;
+    {
+      if (first_byte >= total_bytes)
+	return false;
+      total_bytes -= first_byte;
+      if (empty_ctor_p)
+	{
+	  unsigned HOST_WIDE_INT rhs_bytes
+	    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
+	  if (rhs_bytes > total_bytes)
+	    return false;
+	  memset (ptr + first_byte, '\0', rhs_bytes);
+	  return true;
+	}
+      return native_encode_expr (expr, ptr + first_byte, total_bytes) != 0;
+    }
 
   /* LITTLE-ENDIAN
      We are writing a non byte-sized quantity or at a position that is not
@@ -1667,14 +1685,29 @@ encode_tree_to_bitpos (tree expr, unsign
 
   /* We must be dealing with fixed-size data at this point, since the
      total size is also fixed.  */
-  fixed_size_mode mode = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (expr)));
+  unsigned int byte_size;
+  if (empty_ctor_p)
+    {
+      unsigned HOST_WIDE_INT rhs_bytes
+	= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
+      if (rhs_bytes > total_bytes)
+	return false;
+      byte_size = rhs_bytes;
+    }
+  else
+    {
+      fixed_size_mode mode
+	= as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (expr)));
+      byte_size = GET_MODE_SIZE (mode);
+    }
   /* Allocate an extra byte so that we have space to shift into.  */
-  unsigned int byte_size = GET_MODE_SIZE (mode) + 1;
+  byte_size++;
   unsigned char *tmpbuf = XALLOCAVEC (unsigned char, byte_size);
   memset (tmpbuf, '\0', byte_size);
   /* The store detection code should only have allowed constants that are
-     accepted by native_encode_expr.  */
-  if (native_encode_expr (expr, tmpbuf, byte_size - 1) == 0)
+     accepted by native_encode_expr or empty ctors.  */
+  if (!empty_ctor_p
+      && native_encode_expr (expr, tmpbuf, byte_size - 1) == 0)
     gcc_unreachable ();
 
   /* The native_encode_expr machinery uses TYPE_MODE to determine how many
@@ -4164,7 +4197,8 @@ lhs_valid_for_store_merging_p (tree lhs)
   tree_code code = TREE_CODE (lhs);
 
   if (code == ARRAY_REF || code == ARRAY_RANGE_REF || code == MEM_REF
-      || code == COMPONENT_REF || code == BIT_FIELD_REF)
+      || code == COMPONENT_REF || code == BIT_FIELD_REF
+      || DECL_P (lhs))
     return true;
   return false;
 }
@@ -4178,6 +4212,11 @@ static bool
 rhs_valid_for_store_merging_p (tree rhs)
 {
   unsigned HOST_WIDE_INT size;
+  if (TREE_CODE (rhs) == CONSTRUCTOR
+      && CONSTRUCTOR_NELTS (rhs) == 0
+      && TYPE_SIZE_UNIT (TREE_TYPE (rhs))
+      && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (rhs))))
+    return true;
   return (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (rhs))).is_constant (&size)
 	  && native_encode_expr (rhs, NULL, size) != 0);
 }
--- Comment #2 from Jakub Jelinek ---
Note, especially with large constructors, unlike other overlapping stores it
might not be feasible to merge the large clearing or memset with the other
stores, but it still might be possible to merge several later stores using
info on what we know has been stored in the gaps in between. So, say:

struct S { char a[16]; };

void
foo (void)
{
  struct S s = {};
  s.a[3] = 1;	// We don't want to bump this into a larger store unless we
		// can eliminate the whole 16-byte initialization through it
		// (still, in this case it is likely beneficial, but we
		// should see what that = {} would be expanded to, depending
		// on can_store_by_pieces etc.).
  bar (&s);
}

struct T { char a[1024]; };

void
foo (void)
{
  struct T t = {};
  t.a[54] = 1;
  t.a[52] = 2;	// But we can, using the info that t = {} cleared the whole
		// var, merge these two stores, not using unnecessary
		// masking etc., because we know what the gaps will contain
		// and can even extend the store on either side or both to
		// find the optimal store size.
  bar (&t);
}
Richard Biener changed:

           What          |Removed     |Added
----------------------------------------------
           Keywords      |            |missed-optimization
             Status      |UNCONFIRMED |NEW
   Last reconfirmed      |            |2019-01-07
     Ever confirmed      |0           |1
Jakub Jelinek changed:

           What    |Removed |Added
----------------------------------------
                 CC|        |redi at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek ---
Compared to the first testcase, we do handle

struct S { char buf[8]; };
void bar (struct S *);

void
foo (void)
{
  struct S s;
  int a = 0;
  __builtin_memcpy (&s.buf[4], &a, sizeof (int));
  s.buf[0] = 5;
  s.buf[1] = 2;
  s.buf[2] = 3;
  s.buf[3] = 2;
  s.buf[5] = 7;
  bar (&s);
}

though, because the store is in that case MEM[&s + 4B] = {} and thus valid
for lhs.