[PATCH] Fix PR63677 and a regression from alias-improvements

2014-11-19 Thread Richard Biener

PR63677 shows that late complete unrolling can expose quite some memory
CSE opportunities (and followup simplifications) which we fail to
exploit because currently DOMs ability to do memory CSE is quite
limited (it requires the redundant load to optimize to directly follow the
load or store).  This is actually a regression from before
alias-improvements, thus to the time where we had multiple virtual
operand symbols which made us handle some cases better than now
with a single memory operand symbol.

The following patch makes DOM use the alias walker to fix that.

The easiest way to do that is to no longer hash virtual operands
and use the expression hash lookup result as candidate if
the expression involves memory.  We can then verify if the candidate
can be used by using the alias walker to walk to the candidate.
If there is an intermediate possibly aliasing stmt we simply
replace the hashtable entry and set up enough info to be able
to rewind to the previous state during unwinding.

This means the DOM implementation of memory CSE is a tad bit more
efficient than that of SCCVN.  But it is also very simplistic as
it requires 1:1 requivalence for the memory access structure
(as compared to SCCVN which can look through various kinds of
aggregate copies or initializations as well as handling accessing
the same memory location with different memory access structures).
This is also the reason that the patch doesn't fix PR63864 which
requires us to look through aggregate copies.

To fix PR63677 I also had to teach DOM the trick of propagating
a + CST as MEM[a, CST] which is used by all other propagators.

Given how easy this was to restore pre-alias-improvements state
I wonder why I didn't think of this before...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for trunk?

Thanks,
Richard.

2014-11-19   Richard Biener  rguent...@suse.de

PR tree-optimization/63677
* tree-ssa-dom.c: Include gimplify.h for unshare_expr.
(avail_exprs_stack): Make a vector of pairs.
(struct hash_expr_elt): Replace stmt member with vop member.
(expr_elt_hasher::equal): Simplify.
(initialize_hash_element): Adjust.
(initialize_hash_element_from_expr): Likewise.
(dom_opt_dom_walker::thread_across_edge): Likewise.
(record_cond): Likewise.
(dom_opt_dom_walker::before_dom_children): Likewise.
(print_expr_hash_elt): Likewise.
(remove_local_expressions_from_table): Restore previous state
if requested.
(record_equivalences_from_stmt): Record x + CST as constant
MEM[x, CST] for further propagation.
(vuse_eq): New function.
(lookup_avail_expr): For loads use the alias oracle to see
whether a candidate from the expr hash is usable.
(avail_expr_hash): Do not hash VUSEs.

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase.
* gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.

Index: gcc/tree-ssa-dom.c
===
*** gcc/tree-ssa-dom.c.orig 2014-11-19 13:28:46.123015743 +0100
--- gcc/tree-ssa-dom.c  2014-11-19 13:58:33.122937543 +0100
*** along with GCC; see the file COPYING3.
*** 66,71 
--- 66,72 
  #include tree-ssa-threadedge.h
  #include tree-ssa-dom.h
  #include inchash.h
+ #include gimplify.h
  
  /* This file implements optimizations on the dominator tree.  */
  
*** struct edge_info
*** 139,145 
 marker.  */
  typedef struct expr_hash_elt * expr_hash_elt_t;
  
! static vecexpr_hash_elt_t avail_exprs_stack;
  
  /* Structure for entries in the expression hash table.  */
  
--- 140,146 
 marker.  */
  typedef struct expr_hash_elt * expr_hash_elt_t;
  
! static vecstd::pairexpr_hash_elt_t, expr_hash_elt_t  avail_exprs_stack;
  
  /* Structure for entries in the expression hash table.  */
  
*** struct expr_hash_elt
*** 151,158 
/* The expression (rhs) we want to record.  */
struct hashable_expr expr;
  
!   /* The stmt pointer if this element corresponds to a statement.  */
!   gimple stmt;
  
/* The hash value for RHS.  */
hashval_t hash;
--- 152,160 
/* The expression (rhs) we want to record.  */
struct hashable_expr expr;
  
!   /* The virtual operand associated with the nearest dominating stmt
!  loading from or storing to expr.  */
!   tree vop;
  
/* The hash value for RHS.  */
hashval_t hash;
*** expr_elt_hasher::hash (const value_type
*** 187,196 
  inline bool
  expr_elt_hasher::equal (const value_type p1, const compare_type p2)
  {
-   gimple stmt1 = p1-stmt;
const struct hashable_expr *expr1 = p1-expr;
const struct expr_hash_elt *stamp1 = p1-stamp;
-   gimple stmt2 = p2-stmt;
const struct hashable_expr *expr2 = p2-expr;
const struct expr_hash_elt *stamp2 = p2-stamp;
  
--- 189,196 
*** expr_elt_hasher::equal (const value_type
*** 198,222 

Re: [PATCH] Fix PR63677 and a regression from alias-improvements

2014-11-19 Thread Jeff Law

On 11/19/14 06:12, Richard Biener wrote:


PR63677 shows that late complete unrolling can expose quite some memory
CSE opportunities
Yes, we saw the same thing with unrolling in RTL.  So this is not a 
surprise at all.




(and followup simplifications) which we fail to

exploit because currently DOMs ability to do memory CSE is quite
limited (it requires the redundant load to optimize to directly follow the
load or store).  This is actually a regression from before
alias-improvements, thus to the time where we had multiple virtual
operand symbols which made us handle some cases better than now
with a single memory operand symbol.
I certainly wasn't aware of the directly follow requirement -- it 
should just be depending on the state of the operands, both real and 
virtual.


Certainly the change away from multiple virtual operand symbols to a 
single one has hurt DOM's ability to perform CSE of loads.  Those 
changes went in while I wasn't doing GCC work, so I never looked at how 
bad they mucked things up.




The easiest way to do that is to no longer hash virtual operands
and use the expression hash lookup result as candidate if
the expression involves memory.  We can then verify if the candidate
can be used by using the alias walker to walk to the candidate.
If there is an intermediate possibly aliasing stmt we simply
replace the hashtable entry and set up enough info to be able
to rewind to the previous state during unwinding.

Seems quite reasonable.


2014-11-19   Richard Biener  rguent...@suse.de

PR tree-optimization/63677
* tree-ssa-dom.c: Include gimplify.h for unshare_expr.
(avail_exprs_stack): Make a vector of pairs.
(struct hash_expr_elt): Replace stmt member with vop member.
(expr_elt_hasher::equal): Simplify.
(initialize_hash_element): Adjust.
(initialize_hash_element_from_expr): Likewise.
(dom_opt_dom_walker::thread_across_edge): Likewise.
(record_cond): Likewise.
(dom_opt_dom_walker::before_dom_children): Likewise.
(print_expr_hash_elt): Likewise.
(remove_local_expressions_from_table): Restore previous state
if requested.
(record_equivalences_from_stmt): Record x + CST as constant
MEM[x, CST] for further propagation.
(vuse_eq): New function.
(lookup_avail_expr): For loads use the alias oracle to see
whether a candidate from the expr hash is usable.
(avail_expr_hash): Do not hash VUSEs.

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase.
* gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.

Looks good to me.

Jeff