[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 Bug 77485 depends on bug 33562, which changed state. Bug 33562 Summary: [6 Regression] aggregate DSE disabled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33562 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #16 from Jeffrey A. Law --- Author: law Date: Mon Jan 16 23:43:05 2017 New Revision: 244509 URL: https://gcc.gnu.org/viewcvs?rev=244509=gcc=rev Log: 2017-01-16 Jeff LawPR tree-optimization/79090 PR tree-optimization/33562 PR tree-optimization/61912 PR tree-optimization/77485 * tree-ssa-dse.c (compute_trims): Accept STMT argument. Dump STMT and computed trims into the dump file. PR tree-optimization/79090 PR tree-optimization/33562 PR tree-optimization/61912 PR tree-optimization/77485 * tree-ssa-dse.c (compute_trims): Accept STMT argument. Dump STMT and computed trims into the dump file. Added: trunk/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-29.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-dse.c
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #15 from Jeffrey A. Law --- Author: law Date: Sat Jan 14 06:16:23 2017 New Revision: 244461 URL: https://gcc.gnu.org/viewcvs?rev=244461=gcc=rev Log: PR tree-optimization/33562 PR tree-optimization/61912 PR tree-optimization/77485 * tree-ssa-dse.c (delete_dead_call): Accept gsi rather than a statement. (delete_dead_assignment): Likewise. (dse_dom_walker::dse_optimize_stmt): Pass in the gsi rather than statement to delete_dead_call and delete_dead_assignment. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-dse.c
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 Bug 77485 depends on bug 33562, which changed state. Bug 33562 Summary: [5/6 Regression] aggregate DSE disabled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33562 What|Removed |Added Status|RESOLVED|NEW Resolution|FIXED |---
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 Bug 77485 depends on bug 33562, which changed state. Bug 33562 Summary: [5/6 Regression] aggregate DSE disabled https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33562 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #14 from Jeffrey A. Law --- Author: law Date: Fri Jan 13 15:46:22 2017 New Revision: 23 URL: https://gcc.gnu.org/viewcvs?rev=23=gcc=rev Log: PR tree-optimization/61912 PR tree-optimization/77485 * tree-ssa-dse.c: Include expr.h. (maybe_trim_constructor_store): New function. (maybe_trim_partially_dead_store): Call maybe_trim_constructor_store. PR tree-optimization/61912 PR tree-optimization/77485 * g++.dg/tree-ssa/ssa-dse-1.C: New test. * gcc.dg/tree-ssa/pr30375: Adjust expected output. * gcc.dg/tree-ssa/ssa-dse-24.c: New test. Added: trunk/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-1.C trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-24.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/pr30375.c trunk/gcc/tree-ssa-dse.c
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #12 from Jeffrey A. Law --- Author: law Date: Fri Jan 13 15:37:09 2017 New Revision: 21 URL: https://gcc.gnu.org/viewcvs?rev=21=gcc=rev Log: PR tree-optimization/33562 PR tree-optimization/61912 PR tree-optimization/77485 * sbitmap.h (bitmap_count_bits): Prototype. (bitmap_clear_range, bitmap_set_range): Likewise. * sbitmap.c (bitmap_clear_range): New function. (bitmap_set_range, sbitmap_popcount, bitmap_count_bits): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/sbitmap.c trunk/gcc/sbitmap.h
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #13 from Jeffrey A. Law --- Author: law Date: Fri Jan 13 15:42:08 2017 New Revision: 22 URL: https://gcc.gnu.org/viewcvs?rev=22=gcc=rev Log: PR tree-optimization/33562 PR tree-optimization/61912 PR tree-optimization/77485 * doc/invoke.texi: Document new dse-max-object-size param. * params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM. * tree-ssa-dse.c: Include params.h. (dse_store_status): New enum. (initialize_ao_ref_for_dse): New, partially extracted from dse_optimize_stmt. (valid_ao_ref_for_dse, normalize_ref): New. (setup_live_bytes_from_ref, compute_trims): Likewise. (clear_bytes_written_by, maybe_trim_complex_store): Likewise. (maybe_trim_partially_dead_store): Likewise. (maybe_trim_complex_store): Likewise. (dse_classify_store): Renamed from dse_possibly_dead_store_p. Track what bytes live from the original store. Return tri-state for dead, partially dead or live. (dse_dom_walker): Add constructor, destructor and new private members. (delete_dead_call, delete_dead_assignment): New extracted from dse_optimize_stmt. (dse_optimize_stmt): Make a member of dse_dom_walker. Use initialize_ao_ref_for_dse. PR tree-optimization/33562 PR tree-optimization/61912 PR tree-optimization/77485 * gcc.dg/tree-ssa/complex-4.c: Remove xfail. * gcc.dg/tree-ssa/complex-5.c: Likewise. * gcc.dg/tree-ssa/ssa-dse-9.c: Likewise. * gcc.dg/tree-ssa/ssa-dse-18.c: New test. * gcc.dg/tree-ssa/ssa-dse-19.c: Likewise. * gcc.dg/tree-ssa/ssa-dse-20.c: Likewise. * gcc.dg/tree-ssa/ssa-dse-21.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c - copied, changed from r21, trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c - copied, changed from r21, trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c - copied, changed from r21, trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c - copied, changed from r21, trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c Modified: trunk/gcc/ChangeLog trunk/gcc/doc/invoke.texi trunk/gcc/params.def trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c trunk/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c trunk/gcc/tree-ssa-dse.c
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 Jeffrey A. Law changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #11 from Jeffrey A. Law --- This is effectively the same as 61912 and fixed by the same change under development. We'll track via 61912, but expect to use the testcase from this BZ as well as 61912 in the testsuite. *** This bug has been marked as a duplicate of bug 61912 ***
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #10 from Jeffrey A. Law --- [0.00%]: MEM[(char[170] *)& + 30B] = {}; .buf[0] = 48; [ ... ]
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #9 from Jeffrey A. Law --- That ought to be significantly easier and cleaner. I really didn't like the transformation into memset. I'll give that a whirl tomorrow.
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #8 from rguenther at suse dot de --- On Wed, 14 Dec 2016, law at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 > > --- Comment #7 from Jeffrey A. Law --- > There was a bit of cruft and a missed ADDR_EXPR in that last fragment of > gimple > code. > > ;; basic block 2, loop depth 0, count 0, freq 0, maybe hot > ;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED) > ;;pred: ENTRY (FALLTHRU,EXECUTABLE) > __builtin_memset ([(struct FixBuf *)& + 31B], 0, 169); > .buf[0] = 48; > .buf[1] = 32; > [ ... ] > return ; It also shows another (awkward) detail of the transform -- a previously non-TREE_ADDRESSABLE becomes TREE_ADDRESSABLE. This generally invalidates points-to (though in this particular case should be harmless). It also possibly pessimizes alias analysis because pointers pointing to unknown stuff now also alias while previously not (non-TREE_ADDRESSABLE decls are not aliased). This is why we generally prefer decl = {}; over a memset. If you just chop off elements from the head/tail you can keep the {} if you do MEM[, off with original alias type] = {}; (CONSTRUCTOR of same char[] type)
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #7 from Jeffrey A. Law --- There was a bit of cruft and a missed ADDR_EXPR in that last fragment of gimple code. ;; basic block 2, loop depth 0, count 0, freq 0, maybe hot ;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED) ;;pred: ENTRY (FALLTHRU,EXECUTABLE) __builtin_memset ([(struct FixBuf *)& + 31B], 0, 169); .buf[0] = 48; .buf[1] = 32; [ ... ] return ;
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #6 from Jeffrey A. Law --- So DSE involving CONSTRUCTOR nodes is a bit of a pain. The way CONSTRUCTOR nodes are used past gimplification is awkward for DSE. Specifically we have a CONSTRUCTOR node with no ELTS. The semantics of that are the entire object is zero initialized. In fact, unless a magic flag is set, any field not showing up in CONSTRUCTOR_ELTS is to be zero initialized. So while we can easily identify elements from the CONSTRUCTOR that are dead, it's not just a matter of walking the CONSTRUCTOR_ELTs and removing those which are dead. Instead we'd have to add a suitable field for every element of the object which is not dead. Ugh. Given the usage and semantics, we could treat a CONSTRUCTOR node (after verification) as a memset. That would allow the code I've already written which can chop off the head or tail of a memset to be re-used to simplify CONSTRUCTORS. Note that my prototype actually turns the CONSTRUCTOR into a memset. At that point we bypass all that code in expr.c to find a reasonably efficient way to expand the CONSTRUCTOR node and instead go through the paths to try to emit an efficient memset. For the given test we take this: basic block 2, loop depth 0 pred: ENTRY # .MEM_2 = VDEF <.MEM_1(D)> = {}; # .MEM_3 = VDEF <.MEM_2> .buf[0] = 48; # .MEM_4 = VDEF <.MEM_3> .buf[1] = 32; [ ... ] # VUSE <.MEM_33> return ; succ: EXIT Note the CONSTRUCTOR assignment to retval. We turn that into: basic block 2, loop depth 0 pred: ENTRY # VUSE <.MEM> _34 = MEM[(struct FixBuf *)& + 31B]; # .MEM_2 = VDEF <.MEM_1(D)> __builtin_memset (_34, 0, 169); # .MEM_3 = VDEF <.MEM_2> .buf[0] = 48; # .MEM_4 = VDEF <.MEM_3> .buf[1] = 32; [ ... ] # VUSE <.MEM_33> return ; succ: EXIT Note how the CONSTRUCTOR initialization has been replaced by a memset which starts at offset 31 from the start of the object and initialized 169 bytes. Instrumentation shows that handling CONSTRUCTORS in terms of static opportunities to improve the code is more beneficial than any of the other things we've looked at within the context of bz33562. But it's also by far the ugliest of the transformations.
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #5 from rguenther at suse dot de --- On Thu, 8 Dec 2016, law at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 > > --- Comment #4 from Jeffrey A. Law --- > So my patches for 33562 will detect the partial dead store in "foo", but I > never wrote the bits to narrow partial dead stores. > > The difficulty in optimizing this particular case will be rewriting the > CONSTRUCTOR node. Though it may be as simple as walking down the CONSTRUCTOR > and splicing out components which correspond to dead assignments -- I'll have > to familiarize myself with the guts of how to walk CONSTRUCTOR nodes. > Hopefully they're at least in-order and extracting byte offsets is easy :-) Should be. But I still think any "real" DSE work should be done by unifying what SRA, update-address-taken, store-merging and current DSE do (eventually even the bswap pass). Those are all related and they share basic analysis (and dataflow) parts and would benefit from each other transform "tricks". And all of them would benefit from being flow-sensitive in a way to consider parts of the CFG where for example addresses do not escape and doing some (eventually expensive) stuff at such (cold?) region boundaries.
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #4 from Jeffrey A. Law --- So my patches for 33562 will detect the partial dead store in "foo", but I never wrote the bits to narrow partial dead stores. The difficulty in optimizing this particular case will be rewriting the CONSTRUCTOR node. Though it may be as simple as walking down the CONSTRUCTOR and splicing out components which correspond to dead assignments -- I'll have to familiarize myself with the guts of how to walk CONSTRUCTOR nodes. Hopefully they're at least in-order and extracting byte offsets is easy :-)
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 Jeffrey A. Law changed: What|Removed |Added CC||law at redhat dot com --- Comment #3 from Jeffrey A. Law --- Richi, I think you were referring to BZ33562. I have a follow-up patch for that which attacks it on the DSE side rather than in complex lowering. The approach in that follow-up patch has a reasonable chance of catching the case in this BZ. I still hope to get back to it prior to close of stage1.
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 --- Comment #2 from petschy at gmail dot com --- I agree that the generic case can become quite complicated: if after the memset, the individual values are written with gaps between them, or multiple contiguous chunks with gaps between them, it's not easy to tell whether having a single memset + overwrites is better than having multiple memsets with distinct regions + the individual byte writes, or anything in-between. It all depends on the actual pattern. However, for a simplified approach I can think of is keeping track of contiguous regions that are written, then trimming the regions based on the order and overlap, or merging them if they are adjacent. In this particular case this would mean that [0,199] : memset 0 + [0,31] : const init could be converted to [32,199] : memset 0 + [0,31] : const init knowing that the const init comes later. A similar adjustment can be made if a second const init region overlaps with the end of the memset region. Adjacent or overlapping const init regions can be merged. But then, of course comes the devil with the details: if the trivial merging and trimming of the intervals is done, - at what length is it worth having the memset merged into the const init regions, if a short memset is stuck between two const init regions? - and vica versa, at what length is it worth having a single memset with an overwriting const init region at the middle vs memset + const init + memset as disjunct regions? - at what point is it worth storing the whole data in .rodata and just memcpy it to the target? - how to integrate regions of runtime calculated values into the above? For my particular case, I can work around this inefficiency by setting the buffer to the exact size. I have no idea how a simple region based approach like the above would perform in general and whether it would worth the development effort.
[Bug tree-optimization/77485] Missed dead store elimination of aggregate store followed by partial stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77485 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2016-09-06 CC||law at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Component|c++ |tree-optimization Summary|Missed dead store |Missed dead store |elimination when returning |elimination of aggregate |constexpr generated data|store followed by partial ||stores Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- I think we have a duplicate for this somewhere -- our DSE cannot handle the situation where a larger store is made dead by several following smaller ones. We have auto foo() () { : = {}; .buf[0] = 48; .buf[1] = 32; .buf[2] = 49; .buf[3] = 32; ... .buf[29] = 49; .buf[30] = 51; return ; } here. I believe Jeff had some patches in this area (that I didn't like too much) that would maybe have handled this special case. Note that integrating (some) "DSE" into SRA could allow to handle the case of a single byte not being written twice and thus decompose the large initializer. Note it's even harder to catch the case of an aggregate init which also covers struct padding vs. initializing all fields (but obviously not the padding).