[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2023-05-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #34 from Andrew Pinski  ---
aarch64 was fixed fully with r11-4973-g54bbde550ec5 .

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-03-05 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #33 from CVS Commits  ---
The master branch has been updated by Kito Cheng :

https://gcc.gnu.org/g:46275300312b44e1388b86a45f1600a5a1722303

commit r10-7055-g46275300312b44e1388b86a45f1600a5a1722303
Author: Kito Cheng 
Date:   Tue Mar 3 14:16:42 2020 +0800

re PR tree-optimization/90883 (Generated code is worse if returned struct
is unnamed)

After add --param max-inline-insns-size=1 all target will remove the
redundant store at dse1, except some targets like AArch64 and MIPS will
expand the struct initialization into loop due to CLEAR_RATIO.

Tested on cross compiler of riscv32, riscv64, x86, x86_64, mips, mips64,
aarch64, nds32 and arm.

gcc/testsuite/ChangeLog

PR tree-optimization/90883
* g++.dg/tree-ssa/pr90883.c: Add --param max-inline-insns-size=1.
Add aarch64-*-* mips*-*-* to XFAIL.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-03-02 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #32 from Jim Wilson  ---
The proposed patch looks OK to me.  I suggest you submit it to gcc-patches.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-03-02 Thread kito at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #31 from Kito Cheng  ---
Maybe we could add --param max-inline-insns-size=1 to compile flag and add
mips* and aarch64 into xfail list to make every target happy for this test
case? and if some other target fail is cause by the CLEAR_RATIO too, they
should add into xfail list.

Patch here:

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
index c5faffa1f32..0e622f263d2 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
@@ -1,4 +1,4 @@
-// { dg-options "-O2 -Os -fdump-tree-dse-details -std=c++11" }
+// { dg-options "-O2 -Os -fdump-tree-dse-details -std=c++11 --param
max-inline-insns-size=1" }


 class C
@@ -15,6 +15,6 @@

 // We want to match enough here to capture that we deleted an empty
 // constructor store
-// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse1" {
target { ! i?86-*-* } } } }
-// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse2" {
target i?86-*-* } } }
+// aarch64 and mips will expand into loop to clear because CLEAR_RATIO.
+// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse1" {
xfail { aarch64-*-* mips*-*-* } } } }

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-03-02 Thread kito at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #30 from Kito Cheng  ---
After add --param max-inline-insns-size=1 to compile flags, x86, x86_64, rv32,
rv64, nds32 and arm are "Deleted redundant store" at dse1.


But mips, mips64 and aarch64 still not pass the scan testing since those 3
targets are expand " = {}" to loop at earlier passes.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-02-20 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #29 from Jim Wilson  ---
The testcase works for riscv64-elf but does not work for riscv32-elf.  The
difference is in the einline pass before dse1.  riscv64-elf has
tmp.C:12:17: optimized:  Inlining constexpr C::C()/1 into C slow()/3.
where as riscv32-elf has
tmp.C:12:17: missed:   will not early inline: C slow()/3->constexpr
C::C()/1, call is cold and code would grow by 1

Since the constructor was not early inlined, dse1 can't eliminate the redundant
store.  The constructor eventually gets inlined between
085i.materialize-all-clones and 088t.fixup_cfg3 which allows dse2 to eliminate
the redundant store.

I can make the testcase work for riscv32-elf if I add
--param max-inline-insns-size=1
to allow the constructor to be inlined during the einline pass.  I didn't check
to see if this works for the other failing targets.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-02-20 Thread kito at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org

--- Comment #28 from Kito Cheng  ---
pr90883.C got fail for following target:

| fail reason
arm-eabi| "Deleted redundant store:" appear dse2
mips-elf| "Deleted dead store: .a[*] = 0;" appear at dse2
mips64-elf  | "Deleted dead store: .a[*] = 0;" appear at dse2
nds32le-elf | "Deleted redundant store:" appear dse2
riscv32-elf | "Deleted redundant store:" appear dse2


And configure options:

| configure option
arm-eabi| --target=arm-eabi
mips-elf| --target=mips-elf
mips64-elf  | --target=mips64-elf
nds32le-elf | --target=nds32le-elf
riscv32-elf | --target=riscv32-elf --with-arch=rv32gc --with-abi=ilp32d

sha1: g:85232b45840330df0fef81f2d4756d9a25d5ac3b

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2020-01-21 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883
Bug 90883 depends on bug 92328, which changed state.

Bug 92328 Summary: [10 Regression] ICE in eliminate_stmt, at 
tree-ssa-sccvn.c:5497
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92328

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-10-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #27 from Richard Biener  ---
Author: rguenth
Date: Fri Oct 11 13:10:15 2019
New Revision: 276882

URL: https://gcc.gnu.org/viewcvs?rev=276882=gcc=rev
Log:
2019-10-11  Richard Biener  

PR tree-optimization/90883
PR tree-optimization/91091
* tree-ssa-sccvn.c (vn_reference_lookup_3): Use correct
alias-sets both for recording VN table entries and continuing
walking after translating through copies.  Handle same-sized
reads from SSA names by returning the plain SSA name.
(eliminate_dom_walker::eliminate_stmt): Properly handle
non-size precision stores in redundant store elimination.

* gcc.dg/torture/20191011-1.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-82.c: Likewise.
* gcc.dg/tree-ssa/ssa-fre-83.c: Likewise.
* gcc.dg/tree-ssa/redundant-assign-zero-1.c: Disable FRE.
* gcc.dg/tree-ssa/redundant-assign-zero-2.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.dg/torture/20191011-1.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-82.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-83.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-1.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/redundant-assign-zero-2.c
trunk/gcc/tree-ssa-sccvn.c

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-10-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #26 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #25)
> The FRE redundant store removal went away in r273135 aka PR91091 fix.

I'll see whether I can do something about FRE.  Redundant store removal
there isn't a good fit and thus not easy to get semantically correct.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-10-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #25 from Jakub Jelinek  ---
The FRE redundant store removal went away in r273135 aka PR91091 fix.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-10-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #24 from Jakub Jelinek  ---
Shouldn't dse_optimize_redundant_stores be at least a little bit more
aggressive?
I mean on:
struct S { int a; char b; char c; char d; };
void foo (struct S);
void bar (void) { struct S s = {}; s.a = 0; s.b = 0; s.c = 0; s.d = 0; foo (s);
}
dse1 removes s.a = 0; but keeps the rest, dse2 removes s.b = 0; but keeps the
rest, dse3 removes s.c = 0; but still keeps s.d = 0; and only store-merging in
this case optimizes it, while e.g. in GCC 6 or 8 fre1 optimized this (GCC 9
fre1 no longer does).
So, rather than just walking immediate uses of the vdef, can't it under some
small limit keep walking the vdef-vuse chains, skip loads/stores that cannot
alias it, and when removing a redundant store keep looking after it (or, even
when not removing, such as s.a = 3; above, keep looking for stores to bytes
that don't overlap already non-zero ones)?
Or is it better to readd it in FRE after fixing whatever bugs it had that lead
to the removal?

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-08-23 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #23 from Martin Sebor  ---
I get the same failure with -m32 -mtune=generic:

spawn -ignore SIGHUP /ssd/build/gcc-svn/gcc/testsuite/g++/../../xg++
-B/ssd/buil
d/gcc-svn/gcc/testsuite/g++/../../
/src/gcc/svn/gcc/testsuite/g++.dg/tree-ssa/pr
90883.C -m32 -mtune=generic -fno-diagnostics-show-caret
-fno-diagnostics-show-li
ne-numbers -fdiagnostics-color=never -nostdinc++
-I/ssd/build/gcc-svn/x86_64-pc-
linux-gnu/32/libstdc++-v3/include/x86_64-pc-linux-gnu
-I/ssd/build/gcc-svn/x86_6
4-pc-linux-gnu/32/libstdc++-v3/include -I/src/gcc/svn/libstdc++-v3/libsupc++
-I/src/gcc/svn/libstdc++-v3/include/backward
-I/src/gcc/svn/libstdc++-v3/testsuite/util -fmessage-length=0 -O2 -Os
-fdump-tree-dse-details -std=c++11 -S -o pr90883.s
PASS: g++.dg/tree-ssa/pr90883.C   (test for excess errors)
FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store:
.*.a = {}"

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-08-23 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #22 from Jeffrey A. Law  ---
The test is somewhat sensitive to target bits that select between various
strategies for implementing mem* routines.

Can you try with -mtune=generic?  If that works, I can adjust the testcase
appropriately.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-08-22 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Martin Sebor  changed:

   What|Removed |Added

 CC||msebor at gcc dot gnu.org

--- Comment #21 from Martin Sebor  ---
I just noticed the test fail in my test run with -m32 on x86_64:

FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store:
.*.a = {}"

With -m32 the pattern is dse2 dump but the target matches the dse1 directive:

// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse1" {
target { ! i?86-*-* } } } }

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-24 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #20 from Jeffrey A. Law  ---
Just to be clear, the expansion in question happens very early, essentially
pre-gimple, not at the gimple/RTL border and it's driven by a target macro.

I guess another approach would be to write the whole test in gimple so that we
could avoid the paths that query the target macros and ensure DSE gets the code
in a suitable form.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-24 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #19 from Richard Earnshaw  ---
Surely the real problem is that the expansion doesn't really understand about
the 'don't care' location and that we can therefore put any value in that?

That additional knowledge would allow the earlier passes of the compiler to
rewrite this as a simple store of zero to then entire object.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-23 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #18 from Jeffrey A. Law  ---
It's the aarch64's definition of CLEAR_RATIO that's coming into play here.

The hole in the structure is critical here to show the DSE pessimization. 
Changing the size of the object to something "nice" and eliminating the hole
defeats the whole purpose here. 

he fact that the aarch64 port is inlining the expansion essentially invalidates
the test, hence if there isn't some guidance from the aarch64 folks, I'm just
going to xfail it.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-23 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #17 from Wilco  ---
(In reply to Jeffrey A. Law from comment #16)
> The issue here (of course) is that aarch64 has a different set of defaults
> for when to open-code vs loop vs function call.   My attempts to pick a
> better size for the objects results in failures on other targets.
> 
> Do we have a method on aarch64 to tune this stuff via flags?  Otherwise I'm
> likely to just xfail aarch64 and move on since DSE is doing what we want at
> this point if given sane input.

I don't know, this issue doesn't seem related to any backend setting - this is
a typical inline memset expansion.

Handling structures that are not a multiple of 4 or 8 are generally inefficient
on GCC given the mid-end can't deal with overlapping accesses of different
sizes. It's efficient if I change the size of the array to 8 rather than 7.

So there is a real issue here, but maybe you'd prefer a new bugreport for that?

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-22 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #16 from Jeffrey A. Law  ---
The issue here (of course) is that aarch64 has a different set of defaults for
when to open-code vs loop vs function call.   My attempts to pick a better size
for the objects results in failures on other targets.

Do we have a method on aarch64 to tune this stuff via flags?  Otherwise I'm
likely to just xfail aarch64 and move on since DSE is doing what we want at
this point if given sane input.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-05 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Wilco  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #15 from Wilco  ---
Reopen

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-04 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #14 from Wilco  ---
(In reply to Jeffrey A. Law from comment #13)
> Author: law
> Date: Tue Jul  2 23:01:53 2019
> New Revision: 272949
> 
> URL: https://gcc.gnu.org/viewcvs?rev=272949=gcc=rev
> Log:
>   PR tree-optimization/90883
>   * g++.dg/tree-ssa/pr90883.c: Add -Os.  Check dse2 for the
>   deleted store on some targets.
> 
> Modified:
> trunk/gcc/testsuite/ChangeLog
> trunk/gcc/testsuite/g++.dg/tree-ssa/pr90883.C

This test still fails on Arm and AArch64, final code is still inefficient:

_Z4slowv:
.LFB0:
.cfi_startproc
sub sp, sp, #16
.cfi_def_cfa_offset 16
mov x1, 0
str wzr, [sp]
strhwzr, [sp, 4]
strbwzr, [sp, 6]
ldr x0, [sp]
add sp, sp, 16
.cfi_def_cfa_offset 0
ret

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-07-02 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #13 from Jeffrey A. Law  ---
Author: law
Date: Tue Jul  2 23:01:53 2019
New Revision: 272949

URL: https://gcc.gnu.org/viewcvs?rev=272949=gcc=rev
Log:
PR tree-optimization/90883
* g++.dg/tree-ssa/pr90883.c: Add -Os.  Check dse2 for the
deleted store on some targets.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/g++.dg/tree-ssa/pr90883.C

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-26 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #12 from Jeffrey A. Law  ---
Author: law
Date: Thu Jun 27 02:42:30 2019
New Revision: 272726

URL: https://gcc.gnu.org/viewcvs?rev=272726=gcc=rev
Log:
PR tree-optimization/90883
* tree-ssa-dse.c (delete_dead_or_redundant_call): Fix signature.
(delete_dead_or_redundant_assignment): Likewise.

Modified:
trunk/gcc/tree-ssa-dse.c

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-26 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #11 from Jeffrey A. Law  ---
Author: law
Date: Thu Jun 27 02:41:27 2019
New Revision: 272725

URL: https://gcc.gnu.org/viewcvs?rev=272725=gcc=rev
Log:
PR tree-optimization/90883
* tree-ssa-dse.c (delete_dead_or_redundant_call): Fix signature.
(delete_dead_or_redundant_assignment): Likewise.

Modified:
trunk/gcc/ChangeLog

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-26 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-26 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #9 from Jeffrey A. Law  ---
Author: law
Date: Wed Jun 26 21:36:27 2019
New Revision: 272717

URL: https://gcc.gnu.org/viewcvs?rev=272717=gcc=rev
Log:
PR tree-optimization/90883
* tree-ssa-alias.c (stmt_kills_ref_p): Handle BUILT_IN_CALLOC.
* tree-ssa-dse.c: Update various comments to distinguish between
dead and redundant stores.
(initialize_ao_ref_for_dse): Handle BUILT_IN_CALLOC.
(dse_optimize_redundant_stores): New function.
(delete_dead_or_redundant_call): Renamed from delete_dead_call.
Distinguish between dead and redundant calls in dump output.  All
callers updated.
(delete_dead_or_redundant_assignment): Similarly for assignments.
(dse_optimize_stmt): Handle _CHK variants.  For statements which
store 0 into multiple memory locations, try to prove a subsequent
store is redundant.

PR tree-optimization/90883
* g++.dg/tree-ssa/pr90883.C: New test.
* gcc.dg/tree-ssa/ssa-dse-36.c: New test.

Added:
trunk/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-36.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-alias.c
trunk/gcc/tree-ssa-dse.c

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-21 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #8 from Jeffrey A. Law  ---
Oh, yea, I kept looking at this from a DSE lens in which case it's the earlier
store that is partially dead.

But if we're storing the same value, then the latter store is totally dead and
removing the latter store is a better choice.

This might actually be fairly easy to implement, let me poke around a bit.

WRT the padding byte and the unaligned issues.  We try to keep the head of an
area aligned, but it's much less important for the tail.  And I even poked at
things in the debugger and having DSE ignore that padding byte doesn't really
help.  Really the way to go is to realize the second store is redundant because
it's storing the exact same values as the earlier store.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-21 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #7 from rguenther at suse dot de  ---
On Tue, 18 Jun 2019, law at redhat dot com wrote:

> slow ()
> {
>   struct C D.25898;
>   struct C D.29462;
> 
> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe
> hot
> ;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   ENTRY [always]  count:1073741824 (estimated locally)
> (FALLTHRU,EXECUTABLE)
>   D.25898.a = {};
>   D.29462 = D.25898;
>   D.25898 ={v} {CLOBBER};
>   return D.29462;
> ;;succ:   EXIT [always]  count:1073741824 (estimated locally)
> 
> }
> 
> WHich still isn't sufficient to get good code.
> 
> I'm not really sure what you want DSE to do here Richi :-)

I observed that

  D.26322 = {};
  D.26322.a = {};

looks like that the later store is dead (a C testcase showing actual
layout might be nice here).  Of course DSE doesn't work this way
around but trimming might be able to trim the second store instead of
the first (to nothing)?  I also noticed that

  MEM[(struct C *) + 7B] = {};
  D.26322.a = {};

here the first store is at offset 7 which will result in unaligned
and or small stores.  DSE doesn't seem to exploit the fact that
we do not need to preserve the stores into the padding (in fact
we do not expand that way I think).

Given that -fno-tree-dse produces nearly optimal code
(well, RTL manages to clean up all the useless stuff) some of
the above might help.

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-18 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #6 from Jeffrey A. Law  ---
So we could have DSE ignore the padding bytes at the read point.  We may have
even discussed that at some point.  That results in something like this from
the gimple optimizers:

slow ()
{
  struct C D.25898;
  struct C D.29462;

;;   basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe
hot
;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   ENTRY [always]  count:1073741824 (estimated locally)
(FALLTHRU,EXECUTABLE)
  MEM  [(struct C *) + 8B] = {};
  D.25898.a = {};
  D.29462 = D.25898;
  D.25898 ={v} {CLOBBER};
  return D.29462;
;;succ:   EXIT [always]  count:1073741824 (estimated locally)

}


But that still doesn't really help.  Even if we tell DSE stores to the bytes
for C.b aren't live (they're uninitialized), then we get something like this:

slow ()
{
  struct C D.25898;
  struct C D.29462;

;;   basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe
hot
;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   ENTRY [always]  count:1073741824 (estimated locally)
(FALLTHRU,EXECUTABLE)
  D.25898.a = {};
  D.29462 = D.25898;
  D.25898 ={v} {CLOBBER};
  return D.29462;
;;succ:   EXIT [always]  count:1073741824 (estimated locally)

}

WHich still isn't sufficient to get good code.

I'm not really sure what you want DSE to do here Richi :-)

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-14 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #5 from rguenther at suse dot de  ---
On June 14, 2019 2:27:22 PM GMT+02:00, "jamborm at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883
>
>--- Comment #4 from Martin Jambor  ---
>(In reply to Richard Biener from comment #3)
>> ...I also wonder why SRA does not elide the aggregate copy.
>
>SRA has a special condition not to attempt to totally scalarize array
>of chars, so that it does not interfere with bigger types
>placement-new placed in those.
>
>It is true that artificially removing this condition would avoid the
>problem though.  However, the IL after esra would look like the
>following:
>   :
>  SR.1_3 = 0;
>  SR.2_8 = 0;
>  SR.3_9 = 0;
>  SR.4_10 = 0;
>  SR.5_11 = 0;
>  SR.6_12 = 0;
>  SR.7_13 = 0;
>  SR.8_14 = 0;
>  SR.1_15 = 0;
>  SR.2_16 = 0;
>  SR.3_17 = 0;
>  SR.4_18 = 0;
>  SR.5_19 = 0;
>  SR.6_20 = 0;
>  SR.7_21 = 0;
>  SR.8_22 = 0;
>  D.10899.a._M_elems[0] = SR.1_15;
>  D.10899.a._M_elems[1] = SR.2_16;
>  D.10899.a._M_elems[2] = SR.3_17;
>  D.10899.a._M_elems[3] = SR.4_18;
>  D.10899.a._M_elems[4] = SR.5_19;
>  D.10899.a._M_elems[5] = SR.6_20;
>  D.10899.a._M_elems[6] = SR.7_21;

Ick. OK, so the important fact would be the zero initialization which we'd only
see in a flow sensitive analysis? Then the above could just be

D. 10899.a._M_elems = {};

>  D.10899.b = SR.8_22;
>  return D.10899;

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-14 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

--- Comment #4 from Martin Jambor  ---
(In reply to Richard Biener from comment #3)
> ...I also wonder why SRA does not elide the aggregate copy.

SRA has a special condition not to attempt to totally scalarize array
of chars, so that it does not interfere with bigger types
placement-new placed in those.

It is true that artificially removing this condition would avoid the
problem though.  However, the IL after esra would look like the
following:

   :
  SR.1_3 = 0;
  SR.2_8 = 0;
  SR.3_9 = 0;
  SR.4_10 = 0;
  SR.5_11 = 0;
  SR.6_12 = 0;
  SR.7_13 = 0;
  SR.8_14 = 0;
  SR.1_15 = 0;
  SR.2_16 = 0;
  SR.3_17 = 0;
  SR.4_18 = 0;
  SR.5_19 = 0;
  SR.6_20 = 0;
  SR.7_21 = 0;
  SR.8_22 = 0;
  D.10899.a._M_elems[0] = SR.1_15;
  D.10899.a._M_elems[1] = SR.2_16;
  D.10899.a._M_elems[2] = SR.3_17;
  D.10899.a._M_elems[3] = SR.4_18;
  D.10899.a._M_elems[4] = SR.5_19;
  D.10899.a._M_elems[5] = SR.6_20;
  D.10899.a._M_elems[6] = SR.7_21;
  D.10899.b = SR.8_22;
  return D.10899;

[Bug tree-optimization/90883] Generated code is worse if returned struct is unnamed

2019-06-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

Richard Biener  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org,
   ||law at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
  Component|c++ |tree-optimization

--- Comment #3 from Richard Biener  ---
So before SRA we have

slow ()
{
  struct C D.26322;
  struct C D.29886;

   :
  D.26322 = {};
  D.26322.a = {};
  D.26322.b = 0;
  D.29886 = D.26322;
  D.26322 ={v} {CLOBBER};
  return D.29886;

fast ()
{
  struct C c;
  struct C D.29889;

   :
  c = {};
  D.29889 = c;
  c ={v} {CLOBBER};
  return D.29889;

and then SRA does nothing to fast but half-way optimizes slow() to

   :
  D.26322 = {};
  D.26322.a = {};
  D.29886 = D.26322;
  D.26322 ={v} {CLOBBER};
  return D.29886;

where then probably DSE wrecks things by trimming the earlier store:

   :
  MEM[(struct C *) + 7B] = {};
  D.26322.a = {};
  D.29886 = D.26322;
  D.26322 ={v} {CLOBBER};
  return D.29886;

which we later do not recover from (store-merging?).  I also wonder
why SRA does not elide the aggregate copy.

-O2 -fno-tree-dse generates

_Z4slowv:
.LFB1099:
.cfi_startproc
movq$0, -32(%rsp)
xorl%eax, %eax
xorl%edx, %edx
ret