[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2018-04-30 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485
Bug 77485 depends on bug 33562, which changed state.

Bug 33562 Summary: [6 Regression] aggregate DSE disabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33562

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2017-01-16 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #16 from Jeffrey A. Law  ---
Author: law
Date: Mon Jan 16 23:43:05 2017
New Revision: 244509

URL: https://gcc.gnu.org/viewcvs?rev=244509=gcc=rev
Log:
2017-01-16  Jeff Law  

PR tree-optimization/79090
PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* tree-ssa-dse.c (compute_trims): Accept STMT argument.  Dump STMT
and computed trims into the dump file.

PR tree-optimization/79090
PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* tree-ssa-dse.c (compute_trims): Accept STMT argument.  Dump STMT
and computed trims into the dump file.

Added:
trunk/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-29.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-dse.c

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2017-01-13 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #15 from Jeffrey A. Law  ---
Author: law
Date: Sat Jan 14 06:16:23 2017
New Revision: 244461

URL: https://gcc.gnu.org/viewcvs?rev=244461=gcc=rev
Log:
PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* tree-ssa-dse.c (delete_dead_call): Accept gsi rather than
a statement.
(delete_dead_assignment): Likewise.
(dse_dom_walker::dse_optimize_stmt): Pass in the gsi rather than
statement to delete_dead_call and delete_dead_assignment.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-dse.c

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2017-01-13 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485
Bug 77485 depends on bug 33562, which changed state.

Bug 33562 Summary: [5/6 Regression] aggregate DSE disabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33562

   What|Removed |Added

 Status|RESOLVED|NEW
 Resolution|FIXED   |---

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2017-01-13 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485
Bug 77485 depends on bug 33562, which changed state.

Bug 33562 Summary: [5/6 Regression] aggregate DSE disabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33562

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2017-01-13 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #14 from Jeffrey A. Law  ---
Author: law
Date: Fri Jan 13 15:46:22 2017
New Revision: 23

URL: https://gcc.gnu.org/viewcvs?rev=23=gcc=rev
Log:
PR tree-optimization/61912
PR tree-optimization/77485
* tree-ssa-dse.c: Include expr.h.
(maybe_trim_constructor_store): New function.
(maybe_trim_partially_dead_store): Call maybe_trim_constructor_store.

PR tree-optimization/61912
PR tree-optimization/77485
* g++.dg/tree-ssa/ssa-dse-1.C: New test.
* gcc.dg/tree-ssa/pr30375: Adjust expected output.
* gcc.dg/tree-ssa/ssa-dse-24.c: New test.

Added:
trunk/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c
trunk/gcc/tree-ssa-dse.c

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2017-01-13 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #12 from Jeffrey A. Law  ---
Author: law
Date: Fri Jan 13 15:37:09 2017
New Revision: 21

URL: https://gcc.gnu.org/viewcvs?rev=21=gcc=rev
Log:
PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* sbitmap.h (bitmap_count_bits): Prototype.
(bitmap_clear_range, bitmap_set_range): Likewise.
* sbitmap.c (bitmap_clear_range): New function.
(bitmap_set_range, sbitmap_popcount, bitmap_count_bits): Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/sbitmap.c
trunk/gcc/sbitmap.h

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2017-01-13 Thread law at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #13 from Jeffrey A. Law  ---
Author: law
Date: Fri Jan 13 15:42:08 2017
New Revision: 22

URL: https://gcc.gnu.org/viewcvs?rev=22=gcc=rev
Log:
PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* doc/invoke.texi: Document new dse-max-object-size param.
* params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
* tree-ssa-dse.c: Include params.h.
(dse_store_status): New enum.
(initialize_ao_ref_for_dse): New, partially extracted from
dse_optimize_stmt.
(valid_ao_ref_for_dse, normalize_ref): New.
(setup_live_bytes_from_ref, compute_trims): Likewise.
(clear_bytes_written_by, maybe_trim_complex_store): Likewise.
(maybe_trim_partially_dead_store): Likewise.
(maybe_trim_complex_store): Likewise.
(dse_classify_store): Renamed from dse_possibly_dead_store_p.
Track what bytes live from the original store.  Return tri-state
for dead, partially dead or live.
(dse_dom_walker): Add constructor, destructor and new private members.
(delete_dead_call, delete_dead_assignment): New extracted from
dse_optimize_stmt.
(dse_optimize_stmt): Make a member of dse_dom_walker.
Use initialize_ao_ref_for_dse.

PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* gcc.dg/tree-ssa/complex-4.c: Remove xfail.
* gcc.dg/tree-ssa/complex-5.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-18.c: New test.
* gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
  - copied, changed from r21,
trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
  - copied, changed from r21,
trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
  - copied, changed from r21,
trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
  - copied, changed from r21,
trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/invoke.texi
trunk/gcc/params.def
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
trunk/gcc/tree-ssa-dse.c

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-15 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #11 from Jeffrey A. Law  ---
This is effectively the same as 61912 and fixed by the same change under
development.  We'll track via 61912, but expect to use the testcase from this
BZ as well as 61912 in the testsuite.

*** This bug has been marked as a duplicate of bug 61912 ***

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-15 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #10 from Jeffrey A. Law  ---
   [0.00%]:
  MEM[(char[170] *)& + 30B] = {};
  .buf[0] = 48;
[ ... ]

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-14 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #9 from Jeffrey A. Law  ---
That ought to be significantly easier and cleaner.  I really didn't like the
transformation into memset.  I'll give that a whirl tomorrow.

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-14 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #8 from rguenther at suse dot de  ---
On Wed, 14 Dec 2016, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485
> 
> --- Comment #7 from Jeffrey A. Law  ---
> There was a bit of cruft and a missed ADDR_EXPR in that last fragment of 
> gimple
> code.
> 
> ;;   basic block 2, loop depth 0, count 0, freq 0, maybe hot
> ;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   ENTRY (FALLTHRU,EXECUTABLE)
>   __builtin_memset ([(struct FixBuf *)& + 31B], 0, 169);
>   .buf[0] = 48;
>   .buf[1] = 32;
> [ ... ]
>   return ;

It also shows another (awkward) detail of the transform -- a previously
non-TREE_ADDRESSABLE  becomes TREE_ADDRESSABLE.  This 
generally invalidates points-to (though in this particular case should
be harmless).  It also possibly pessimizes alias analysis because
pointers pointing to unknown stuff now also alias  while
previously not (non-TREE_ADDRESSABLE decls are not aliased).

This is why we generally prefer

 decl = {};

over a memset.  If you just chop off elements from the head/tail
you can keep the {} if you do

 MEM  [, off with original alias type]
   = {}; (CONSTRUCTOR of same char[] type)

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-13 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #7 from Jeffrey A. Law  ---
There was a bit of cruft and a missed ADDR_EXPR in that last fragment of gimple
code.

;;   basic block 2, loop depth 0, count 0, freq 0, maybe hot
;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   ENTRY (FALLTHRU,EXECUTABLE)
  __builtin_memset ([(struct FixBuf *)& + 31B], 0, 169);
  .buf[0] = 48;
  .buf[1] = 32;
[ ... ]
  return ;

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-13 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #6 from Jeffrey A. Law  ---
So DSE involving CONSTRUCTOR nodes is a bit of a pain.

The way CONSTRUCTOR nodes are used past gimplification is awkward for DSE. 
Specifically we have a CONSTRUCTOR node with no ELTS.  The semantics of that
are the entire object is zero initialized.  In fact, unless a magic flag is
set, any field not showing up in CONSTRUCTOR_ELTS is to be zero initialized.

So while we can easily identify elements from the CONSTRUCTOR that are dead,
it's not just a matter of walking the CONSTRUCTOR_ELTs and removing those which
are dead.  Instead we'd have to add a suitable field for every element of the
object which is not dead.  Ugh.

Given the usage and semantics, we could treat a CONSTRUCTOR node (after
verification) as a memset.  That would allow the code I've already written
which can chop off the head or tail of a memset to be re-used to simplify
CONSTRUCTORS.  Note that my prototype actually turns the CONSTRUCTOR into a
memset.  At that point we bypass all that code in expr.c to find a reasonably
efficient way to expand the CONSTRUCTOR node and instead go through the paths
to try to emit an efficient memset.

For the given test we take this:

basic block 2, loop depth 0
 pred:   ENTRY
# .MEM_2 = VDEF <.MEM_1(D)>
 = {};
# .MEM_3 = VDEF <.MEM_2>
.buf[0] = 48;
# .MEM_4 = VDEF <.MEM_3>
.buf[1] = 32;
[ ... ]
# VUSE <.MEM_33>
return ;
 succ:   EXIT


Note the CONSTRUCTOR assignment to retval.  We turn that into:

basic block 2, loop depth 0
 pred:   ENTRY
# VUSE <.MEM>
_34 = MEM[(struct FixBuf *)& + 31B];
# .MEM_2 = VDEF <.MEM_1(D)>
__builtin_memset (_34, 0, 169);
# .MEM_3 = VDEF <.MEM_2>
.buf[0] = 48;
# .MEM_4 = VDEF <.MEM_3>
.buf[1] = 32;
[ ... ]
# VUSE <.MEM_33>
return ;
 succ:   EXIT


Note how the CONSTRUCTOR initialization has been replaced by a memset which
starts at offset 31 from the start of the object and initialized 169 bytes.


Instrumentation shows that handling CONSTRUCTORS in terms of static
opportunities to improve the code is more beneficial than any of the other
things we've looked at within the context of bz33562.   But it's also by far
the ugliest of the transformations.

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-08 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #5 from rguenther at suse dot de  ---
On Thu, 8 Dec 2016, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485
> 
> --- Comment #4 from Jeffrey A. Law  ---
> So my patches for 33562 will detect the partial dead store in "foo", but I
> never wrote the bits to narrow partial dead stores.
> 
> The difficulty in optimizing this particular case will be rewriting the
> CONSTRUCTOR node.  Though it may be as simple as walking down the CONSTRUCTOR
> and splicing out components which correspond to dead assignments -- I'll have
> to familiarize myself with the guts of how to walk CONSTRUCTOR nodes. 
> Hopefully they're at least in-order and extracting byte offsets is easy :-)

Should be.

But I still think any "real" DSE work should be done by unifying what
SRA, update-address-taken, store-merging and current DSE do (eventually 
even the bswap pass).  Those are all related and they share basic analysis
(and dataflow) parts and would benefit from each other transform "tricks".

And all of them would benefit from being flow-sensitive in a way to
consider parts of the CFG where for example addresses do not escape
and doing some (eventually expensive) stuff at such (cold?) region 
boundaries.

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-12-08 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #4 from Jeffrey A. Law  ---
So my patches for 33562 will detect the partial dead store in "foo", but I
never wrote the bits to narrow partial dead stores.

The difficulty in optimizing this particular case will be rewriting the
CONSTRUCTOR node.  Though it may be as simple as walking down the CONSTRUCTOR
and splicing out components which correspond to dead assignments -- I'll have
to familiarize myself with the guts of how to walk CONSTRUCTOR nodes. 
Hopefully they're at least in-order and extracting byte offsets is easy :-)

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-09-21 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #3 from Jeffrey A. Law  ---
Richi, I think you were referring to BZ33562.   I have a follow-up patch for
that which attacks it on the DSE side rather than in complex lowering.  The
approach in that follow-up patch has a reasonable chance of catching the case
in this BZ.

I still hope to get back to it prior to close of stage1.

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-09-06 Thread petschy at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

--- Comment #2 from petschy at gmail dot com ---
I agree that the generic case can become quite complicated: if after the
memset, the individual values are written with gaps between them, or multiple
contiguous chunks with gaps between them, it's not easy to tell whether having
a single memset + overwrites is better than having multiple memsets with
distinct regions + the individual byte writes, or anything in-between. It all
depends on the actual pattern.

However, for a simplified approach I can think of is keeping track of
contiguous regions that are written, then trimming the regions based on the
order and overlap, or merging them if they are adjacent.

In this particular case this would mean that
 [0,199] : memset 0 + [0,31] : const init
could be converted to
 [32,199] : memset 0 + [0,31] : const init
knowing that the const init comes later.

A similar adjustment can be made if a second const init region overlaps with
the end of the memset region. Adjacent or overlapping const init regions can be
merged.

But then, of course comes the devil with the details: if the trivial merging
and trimming of the intervals is done,
 - at what length is it worth having the memset merged into the const init
   regions, if a short memset is stuck between two const init regions?
 - and vica versa, at what length is it worth having a single memset with
   an overwriting const init region at the middle vs memset + const init +
   memset as disjunct regions?
 - at what point is it worth storing the whole data in .rodata and just memcpy
   it to the target?
 - how to integrate regions of runtime calculated values into the above?

For my particular case, I can work around this inefficiency by setting the
buffer to the exact size. I have no idea how a simple region based approach
like the above would perform in general and whether it would worth the
development effort.

[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores

2016-09-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-09-06
 CC||law at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
  Component|c++ |tree-optimization
Summary|Missed dead store   |Missed dead store
   |elimination when returning  |elimination of aggregate
   |constexpr generated data|store followed by partial
   ||stores
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
I think we have a duplicate for this somewhere -- our DSE cannot handle the
situation where a larger store is made dead by several following smaller ones.
We have

auto foo() ()
{
  :
   = {};
  .buf[0] = 48;
  .buf[1] = 32;
  .buf[2] = 49;
  .buf[3] = 32;
...
  .buf[29] = 49;
  .buf[30] = 51;
  return ;

}

here.  I believe Jeff had some patches in this area (that I didn't like too
much)
that would maybe have handled this special case.

Note that integrating (some) "DSE" into SRA could allow to handle the case of
a single byte not being written twice and thus decompose the large initializer.

Note it's even harder to catch the case of an aggregate init which also covers
struct padding vs. initializing all fields (but obviously not the padding).