[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #34 from Andrew Pinski --- aarch64 was fixed fully with r11-4973-g54bbde550ec5 .
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #33 from CVS Commits --- The master branch has been updated by Kito Cheng : https://gcc.gnu.org/g:46275300312b44e1388b86a45f1600a5a1722303 commit r10-7055-g46275300312b44e1388b86a45f1600a5a1722303 Author: Kito Cheng Date: Tue Mar 3 14:16:42 2020 +0800 re PR tree-optimization/90883 (Generated code is worse if returned struct is unnamed) After add --param max-inline-insns-size=1 all target will remove the redundant store at dse1, except some targets like AArch64 and MIPS will expand the struct initialization into loop due to CLEAR_RATIO. Tested on cross compiler of riscv32, riscv64, x86, x86_64, mips, mips64, aarch64, nds32 and arm. gcc/testsuite/ChangeLog PR tree-optimization/90883 * g++.dg/tree-ssa/pr90883.c: Add --param max-inline-insns-size=1. Add aarch64-*-* mips*-*-* to XFAIL.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #32 from Jim Wilson --- The proposed patch looks OK to me. I suggest you submit it to gcc-patches.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #31 from Kito Cheng --- Maybe we could add --param max-inline-insns-size=1 to compile flag and add mips* and aarch64 into xfail list to make every target happy for this test case? and if some other target fail is cause by the CLEAR_RATIO too, they should add into xfail list. Patch here: diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr90883.C b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C index c5faffa1f32..0e622f263d2 100644 --- a/gcc/testsuite/g++.dg/tree-ssa/pr90883.C +++ b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C @@ -1,4 +1,4 @@ -// { dg-options "-O2 -Os -fdump-tree-dse-details -std=c++11" } +// { dg-options "-O2 -Os -fdump-tree-dse-details -std=c++11 --param max-inline-insns-size=1" } class C @@ -15,6 +15,6 @@ // We want to match enough here to capture that we deleted an empty // constructor store -// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse1" { target { ! i?86-*-* } } } } -// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse2" { target i?86-*-* } } } +// aarch64 and mips will expand into loop to clear because CLEAR_RATIO. +// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse1" { xfail { aarch64-*-* mips*-*-* } } } }
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #30 from Kito Cheng --- After add --param max-inline-insns-size=1 to compile flags, x86, x86_64, rv32, rv64, nds32 and arm are "Deleted redundant store" at dse1. But mips, mips64 and aarch64 still not pass the scan testing since those 3 targets are expand " = {}" to loop at earlier passes.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #29 from Jim Wilson --- The testcase works for riscv64-elf but does not work for riscv32-elf. The difference is in the einline pass before dse1. riscv64-elf has tmp.C:12:17: optimized: Inlining constexpr C::C()/1 into C slow()/3. where as riscv32-elf has tmp.C:12:17: missed: will not early inline: C slow()/3->constexpr C::C()/1, call is cold and code would grow by 1 Since the constructor was not early inlined, dse1 can't eliminate the redundant store. The constructor eventually gets inlined between 085i.materialize-all-clones and 088t.fixup_cfg3 which allows dse2 to eliminate the redundant store. I can make the testcase work for riscv32-elf if I add --param max-inline-insns-size=1 to allow the constructor to be inlined during the einline pass. I didn't check to see if this works for the other failing targets.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #28 from Kito Cheng --- pr90883.C got fail for following target: | fail reason arm-eabi| "Deleted redundant store:" appear dse2 mips-elf| "Deleted dead store: .a[*] = 0;" appear at dse2 mips64-elf | "Deleted dead store: .a[*] = 0;" appear at dse2 nds32le-elf | "Deleted redundant store:" appear dse2 riscv32-elf | "Deleted redundant store:" appear dse2 And configure options: | configure option arm-eabi| --target=arm-eabi mips-elf| --target=mips-elf mips64-elf | --target=mips64-elf nds32le-elf | --target=nds32le-elf riscv32-elf | --target=riscv32-elf --with-arch=rv32gc --with-abi=ilp32d sha1: g:85232b45840330df0fef81f2d4756d9a25d5ac3b
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Bug 90883 depends on bug 92328, which changed state. Bug 92328 Summary: [10 Regression] ICE in eliminate_stmt, at tree-ssa-sccvn.c:5497 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92328 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #27 from Richard Biener --- Author: rguenth Date: Fri Oct 11 13:10:15 2019 New Revision: 276882 URL: https://gcc.gnu.org/viewcvs?rev=276882=gcc=rev Log: 2019-10-11 Richard Biener PR tree-optimization/90883 PR tree-optimization/91091 * tree-ssa-sccvn.c (vn_reference_lookup_3): Use correct alias-sets both for recording VN table entries and continuing walking after translating through copies. Handle same-sized reads from SSA names by returning the plain SSA name. (eliminate_dom_walker::eliminate_stmt): Properly handle non-size precision stores in redundant store elimination. * gcc.dg/torture/20191011-1.c: New testcase. * gcc.dg/tree-ssa/ssa-fre-82.c: Likewise. * gcc.dg/tree-ssa/ssa-fre-83.c: Likewise. * gcc.dg/tree-ssa/redundant-assign-zero-1.c: Disable FRE. * gcc.dg/tree-ssa/redundant-assign-zero-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/torture/20191011-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-82.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-83.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-2.c trunk/gcc/tree-ssa-sccvn.c
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #26 from Richard Biener --- (In reply to Jakub Jelinek from comment #25) > The FRE redundant store removal went away in r273135 aka PR91091 fix. I'll see whether I can do something about FRE. Redundant store removal there isn't a good fit and thus not easy to get semantically correct.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #25 from Jakub Jelinek --- The FRE redundant store removal went away in r273135 aka PR91091 fix.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #24 from Jakub Jelinek --- Shouldn't dse_optimize_redundant_stores be at least a little bit more aggressive? I mean on: struct S { int a; char b; char c; char d; }; void foo (struct S); void bar (void) { struct S s = {}; s.a = 0; s.b = 0; s.c = 0; s.d = 0; foo (s); } dse1 removes s.a = 0; but keeps the rest, dse2 removes s.b = 0; but keeps the rest, dse3 removes s.c = 0; but still keeps s.d = 0; and only store-merging in this case optimizes it, while e.g. in GCC 6 or 8 fre1 optimized this (GCC 9 fre1 no longer does). So, rather than just walking immediate uses of the vdef, can't it under some small limit keep walking the vdef-vuse chains, skip loads/stores that cannot alias it, and when removing a redundant store keep looking after it (or, even when not removing, such as s.a = 3; above, keep looking for stores to bytes that don't overlap already non-zero ones)? Or is it better to readd it in FRE after fixing whatever bugs it had that lead to the removal?
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #23 from Martin Sebor --- I get the same failure with -m32 -mtune=generic: spawn -ignore SIGHUP /ssd/build/gcc-svn/gcc/testsuite/g++/../../xg++ -B/ssd/buil d/gcc-svn/gcc/testsuite/g++/../../ /src/gcc/svn/gcc/testsuite/g++.dg/tree-ssa/pr 90883.C -m32 -mtune=generic -fno-diagnostics-show-caret -fno-diagnostics-show-li ne-numbers -fdiagnostics-color=never -nostdinc++ -I/ssd/build/gcc-svn/x86_64-pc- linux-gnu/32/libstdc++-v3/include/x86_64-pc-linux-gnu -I/ssd/build/gcc-svn/x86_6 4-pc-linux-gnu/32/libstdc++-v3/include -I/src/gcc/svn/libstdc++-v3/libsupc++ -I/src/gcc/svn/libstdc++-v3/include/backward -I/src/gcc/svn/libstdc++-v3/testsuite/util -fmessage-length=0 -O2 -Os -fdump-tree-dse-details -std=c++11 -S -o pr90883.s PASS: g++.dg/tree-ssa/pr90883.C (test for excess errors) FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #22 from Jeffrey A. Law --- The test is somewhat sensitive to target bits that select between various strategies for implementing mem* routines. Can you try with -mtune=generic? If that works, I can adjust the testcase appropriately.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Martin Sebor changed: What|Removed |Added CC||msebor at gcc dot gnu.org --- Comment #21 from Martin Sebor --- I just noticed the test fail in my test run with -m32 on x86_64: FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store: .*.a = {}" With -m32 the pattern is dse2 dump but the target matches the dse1 directive: // { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse1" { target { ! i?86-*-* } } } }
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #20 from Jeffrey A. Law --- Just to be clear, the expansion in question happens very early, essentially pre-gimple, not at the gimple/RTL border and it's driven by a target macro. I guess another approach would be to write the whole test in gimple so that we could avoid the paths that query the target macros and ensure DSE gets the code in a suitable form.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #19 from Richard Earnshaw --- Surely the real problem is that the expansion doesn't really understand about the 'don't care' location and that we can therefore put any value in that? That additional knowledge would allow the earlier passes of the compiler to rewrite this as a simple store of zero to then entire object.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #18 from Jeffrey A. Law --- It's the aarch64's definition of CLEAR_RATIO that's coming into play here. The hole in the structure is critical here to show the DSE pessimization. Changing the size of the object to something "nice" and eliminating the hole defeats the whole purpose here. he fact that the aarch64 port is inlining the expansion essentially invalidates the test, hence if there isn't some guidance from the aarch64 folks, I'm just going to xfail it.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #17 from Wilco --- (In reply to Jeffrey A. Law from comment #16) > The issue here (of course) is that aarch64 has a different set of defaults > for when to open-code vs loop vs function call. My attempts to pick a > better size for the objects results in failures on other targets. > > Do we have a method on aarch64 to tune this stuff via flags? Otherwise I'm > likely to just xfail aarch64 and move on since DSE is doing what we want at > this point if given sane input. I don't know, this issue doesn't seem related to any backend setting - this is a typical inline memset expansion. Handling structures that are not a multiple of 4 or 8 are generally inefficient on GCC given the mid-end can't deal with overlapping accesses of different sizes. It's efficient if I change the size of the array to 8 rather than 7. So there is a real issue here, but maybe you'd prefer a new bugreport for that?
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #16 from Jeffrey A. Law --- The issue here (of course) is that aarch64 has a different set of defaults for when to open-code vs loop vs function call. My attempts to pick a better size for the objects results in failures on other targets. Do we have a method on aarch64 to tune this stuff via flags? Otherwise I'm likely to just xfail aarch64 and move on since DSE is doing what we want at this point if given sane input.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Wilco changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #15 from Wilco --- Reopen
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Wilco changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #14 from Wilco --- (In reply to Jeffrey A. Law from comment #13) > Author: law > Date: Tue Jul 2 23:01:53 2019 > New Revision: 272949 > > URL: https://gcc.gnu.org/viewcvs?rev=272949=gcc=rev > Log: > PR tree-optimization/90883 > * g++.dg/tree-ssa/pr90883.c: Add -Os. Check dse2 for the > deleted store on some targets. > > Modified: > trunk/gcc/testsuite/ChangeLog > trunk/gcc/testsuite/g++.dg/tree-ssa/pr90883.C This test still fails on Arm and AArch64, final code is still inefficient: _Z4slowv: .LFB0: .cfi_startproc sub sp, sp, #16 .cfi_def_cfa_offset 16 mov x1, 0 str wzr, [sp] strhwzr, [sp, 4] strbwzr, [sp, 6] ldr x0, [sp] add sp, sp, 16 .cfi_def_cfa_offset 0 ret
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #13 from Jeffrey A. Law --- Author: law Date: Tue Jul 2 23:01:53 2019 New Revision: 272949 URL: https://gcc.gnu.org/viewcvs?rev=272949=gcc=rev Log: PR tree-optimization/90883 * g++.dg/tree-ssa/pr90883.c: Add -Os. Check dse2 for the deleted store on some targets. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #12 from Jeffrey A. Law --- Author: law Date: Thu Jun 27 02:42:30 2019 New Revision: 272726 URL: https://gcc.gnu.org/viewcvs?rev=272726=gcc=rev Log: PR tree-optimization/90883 * tree-ssa-dse.c (delete_dead_or_redundant_call): Fix signature. (delete_dead_or_redundant_assignment): Likewise. Modified: trunk/gcc/tree-ssa-dse.c
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #11 from Jeffrey A. Law --- Author: law Date: Thu Jun 27 02:41:27 2019 New Revision: 272725 URL: https://gcc.gnu.org/viewcvs?rev=272725=gcc=rev Log: PR tree-optimization/90883 * tree-ssa-dse.c (delete_dead_or_redundant_call): Fix signature. (delete_dead_or_redundant_assignment): Likewise. Modified: trunk/gcc/ChangeLog
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #10 from Jeffrey A. Law --- Fixed on the trunk.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #9 from Jeffrey A. Law --- Author: law Date: Wed Jun 26 21:36:27 2019 New Revision: 272717 URL: https://gcc.gnu.org/viewcvs?rev=272717=gcc=rev Log: PR tree-optimization/90883 * tree-ssa-alias.c (stmt_kills_ref_p): Handle BUILT_IN_CALLOC. * tree-ssa-dse.c: Update various comments to distinguish between dead and redundant stores. (initialize_ao_ref_for_dse): Handle BUILT_IN_CALLOC. (dse_optimize_redundant_stores): New function. (delete_dead_or_redundant_call): Renamed from delete_dead_call. Distinguish between dead and redundant calls in dump output. All callers updated. (delete_dead_or_redundant_assignment): Similarly for assignments. (dse_optimize_stmt): Handle _CHK variants. For statements which store 0 into multiple memory locations, try to prove a subsequent store is redundant. PR tree-optimization/90883 * g++.dg/tree-ssa/pr90883.C: New test. * gcc.dg/tree-ssa/ssa-dse-36.c: New test. Added: trunk/gcc/testsuite/g++.dg/tree-ssa/pr90883.C trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-36.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-alias.c trunk/gcc/tree-ssa-dse.c
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #8 from Jeffrey A. Law --- Oh, yea, I kept looking at this from a DSE lens in which case it's the earlier store that is partially dead. But if we're storing the same value, then the latter store is totally dead and removing the latter store is a better choice. This might actually be fairly easy to implement, let me poke around a bit. WRT the padding byte and the unaligned issues. We try to keep the head of an area aligned, but it's much less important for the tail. And I even poked at things in the debugger and having DSE ignore that padding byte doesn't really help. Really the way to go is to realize the second store is redundant because it's storing the exact same values as the earlier store.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #7 from rguenther at suse dot de --- On Tue, 18 Jun 2019, law at redhat dot com wrote: > slow () > { > struct C D.25898; > struct C D.29462; > > ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe > hot > ;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED) > ;;pred: ENTRY [always] count:1073741824 (estimated locally) > (FALLTHRU,EXECUTABLE) > D.25898.a = {}; > D.29462 = D.25898; > D.25898 ={v} {CLOBBER}; > return D.29462; > ;;succ: EXIT [always] count:1073741824 (estimated locally) > > } > > WHich still isn't sufficient to get good code. > > I'm not really sure what you want DSE to do here Richi :-) I observed that D.26322 = {}; D.26322.a = {}; looks like that the later store is dead (a C testcase showing actual layout might be nice here). Of course DSE doesn't work this way around but trimming might be able to trim the second store instead of the first (to nothing)? I also noticed that MEM[(struct C *) + 7B] = {}; D.26322.a = {}; here the first store is at offset 7 which will result in unaligned and or small stores. DSE doesn't seem to exploit the fact that we do not need to preserve the stores into the padding (in fact we do not expand that way I think). Given that -fno-tree-dse produces nearly optimal code (well, RTL manages to clean up all the useless stuff) some of the above might help.
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Jeffrey A. Law changed: What|Removed |Added CC||law at redhat dot com --- Comment #6 from Jeffrey A. Law --- So we could have DSE ignore the padding bytes at the read point. We may have even discussed that at some point. That results in something like this from the gimple optimizers: slow () { struct C D.25898; struct C D.29462; ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot ;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED) ;;pred: ENTRY [always] count:1073741824 (estimated locally) (FALLTHRU,EXECUTABLE) MEM [(struct C *) + 8B] = {}; D.25898.a = {}; D.29462 = D.25898; D.25898 ={v} {CLOBBER}; return D.29462; ;;succ: EXIT [always] count:1073741824 (estimated locally) } But that still doesn't really help. Even if we tell DSE stores to the bytes for C.b aren't live (they're uninitialized), then we get something like this: slow () { struct C D.25898; struct C D.29462; ;; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe hot ;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED) ;;pred: ENTRY [always] count:1073741824 (estimated locally) (FALLTHRU,EXECUTABLE) D.25898.a = {}; D.29462 = D.25898; D.25898 ={v} {CLOBBER}; return D.29462; ;;succ: EXIT [always] count:1073741824 (estimated locally) } WHich still isn't sufficient to get good code. I'm not really sure what you want DSE to do here Richi :-)
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #5 from rguenther at suse dot de --- On June 14, 2019 2:27:22 PM GMT+02:00, "jamborm at gcc dot gnu.org" wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 > >--- Comment #4 from Martin Jambor --- >(In reply to Richard Biener from comment #3) >> ...I also wonder why SRA does not elide the aggregate copy. > >SRA has a special condition not to attempt to totally scalarize array >of chars, so that it does not interfere with bigger types >placement-new placed in those. > >It is true that artificially removing this condition would avoid the >problem though. However, the IL after esra would look like the >following: > : > SR.1_3 = 0; > SR.2_8 = 0; > SR.3_9 = 0; > SR.4_10 = 0; > SR.5_11 = 0; > SR.6_12 = 0; > SR.7_13 = 0; > SR.8_14 = 0; > SR.1_15 = 0; > SR.2_16 = 0; > SR.3_17 = 0; > SR.4_18 = 0; > SR.5_19 = 0; > SR.6_20 = 0; > SR.7_21 = 0; > SR.8_22 = 0; > D.10899.a._M_elems[0] = SR.1_15; > D.10899.a._M_elems[1] = SR.2_16; > D.10899.a._M_elems[2] = SR.3_17; > D.10899.a._M_elems[3] = SR.4_18; > D.10899.a._M_elems[4] = SR.5_19; > D.10899.a._M_elems[5] = SR.6_20; > D.10899.a._M_elems[6] = SR.7_21; Ick. OK, so the important fact would be the zero initialization which we'd only see in a flow sensitive analysis? Then the above could just be D. 10899.a._M_elems = {}; > D.10899.b = SR.8_22; > return D.10899;
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 --- Comment #4 from Martin Jambor --- (In reply to Richard Biener from comment #3) > ...I also wonder why SRA does not elide the aggregate copy. SRA has a special condition not to attempt to totally scalarize array of chars, so that it does not interfere with bigger types placement-new placed in those. It is true that artificially removing this condition would avoid the problem though. However, the IL after esra would look like the following: : SR.1_3 = 0; SR.2_8 = 0; SR.3_9 = 0; SR.4_10 = 0; SR.5_11 = 0; SR.6_12 = 0; SR.7_13 = 0; SR.8_14 = 0; SR.1_15 = 0; SR.2_16 = 0; SR.3_17 = 0; SR.4_18 = 0; SR.5_19 = 0; SR.6_20 = 0; SR.7_21 = 0; SR.8_22 = 0; D.10899.a._M_elems[0] = SR.1_15; D.10899.a._M_elems[1] = SR.2_16; D.10899.a._M_elems[2] = SR.3_17; D.10899.a._M_elems[3] = SR.4_18; D.10899.a._M_elems[4] = SR.5_19; D.10899.a._M_elems[5] = SR.6_20; D.10899.a._M_elems[6] = SR.7_21; D.10899.b = SR.8_22; return D.10899;
[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883 Richard Biener changed: What|Removed |Added CC||jamborm at gcc dot gnu.org, ||law at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Component|c++ |tree-optimization --- Comment #3 from Richard Biener --- So before SRA we have slow () { struct C D.26322; struct C D.29886; : D.26322 = {}; D.26322.a = {}; D.26322.b = 0; D.29886 = D.26322; D.26322 ={v} {CLOBBER}; return D.29886; fast () { struct C c; struct C D.29889; : c = {}; D.29889 = c; c ={v} {CLOBBER}; return D.29889; and then SRA does nothing to fast but half-way optimizes slow() to : D.26322 = {}; D.26322.a = {}; D.29886 = D.26322; D.26322 ={v} {CLOBBER}; return D.29886; where then probably DSE wrecks things by trimming the earlier store: : MEM[(struct C *) + 7B] = {}; D.26322.a = {}; D.29886 = D.26322; D.26322 ={v} {CLOBBER}; return D.29886; which we later do not recover from (store-merging?). I also wonder why SRA does not elide the aggregate copy. -O2 -fno-tree-dse generates _Z4slowv: .LFB1099: .cfi_startproc movq$0, -32(%rsp) xorl%eax, %eax xorl%edx, %edx ret