Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2

2012-03-16 Thread Richard Guenther
On Thu, Mar 15, 2012 at 3:34 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 3:00 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Mar 15, 2012 at 02:53:10PM +0100, Kai Tietz wrote:
  This looks like to match unbound pattern sizes and thus does not fit
  into the forwprop machinery.  Instead it was suggested elsewhere
  that promoting / demoting registers should be done in a separate pass
  where you can compute a lattice of used bits and apply a transform
  based on that lattice and target information (according to PROMOTE_MODE
  for example).

 Well, the integer truncation part might be something for a separate
 pass.  It could then also take care that within single-use
 gimple-statements the integral-constant is always on right-hand-side
 of first statement of an +, -, |, ^, , and mul.

 But the cast-hoisting code itself is not unbound AFAICS and has fixed
 pattern size.

 Can you split that part out then please?

 I can do.  In fact just the part of calling

 Sure, it would be the removal of the function truncate_integers and its call.

 The type demotion is PR45397/PR47477 among other PRs.
 I'd just walk from the narrowing integer conversion stmts recursively
 through the def stmts, see if they can be narrowed, note it, and finally if
 everything or significant portion of the stmts can be demoted (if not all,
 with some narrowing integer conversion stmt inserted), do it all together.

 Jakub, this might be something good to have it in separate pass.
 Right now I need to avoid some type-hoisting in forward-propagation,
 as otherwise it would loop endless caused by type-sinking code in
 forward-propagation.

Well, that sounds like either of it is not applying costs properly.  We should
have a total ordering cost-wise on both forms.  Which forms do we iterate
on?

 Only question would be where such pass would be
 best placed.  After or before forward-propagation pass?

Somewhere before loop optimizations (especially IVOPTs).  Generally
it might help SCEV analysis, so I'd do it before PRE, after the CCP/copy-prop
passes.

 For PROMOTE_MODE targets you'd promote but properly mask out
 constants (to make them cheaper to generate, for example).  You'd
 also take advantate of targets that can do zero/sign-extending loads
 without extra cost (ISTR that's quite important for some SPEC 2k6
 benchmark on x86_64).

 Hmm, as we are talking about truncation-casts here, what is the reaons
 for PROMOTE_MODE here?
 You mean to generate for a PROMOTE_MODE target explicit something like:
 D1 = 0x80
 D2 = (int) D1

 instead of having D1 = 0xff80.  Isn't that a decision done on RTL level?

PROMOTE_MODE is about targets only having basic operations in
the mode PROMOTE_MODE promotes to.  For example (IIRC) powerpc
can only do arithmetic on full register width, not on arbitrarily sub-regs
as i386.  Which means that if you need sub-reg precision values we have
to insert truncataions after every operation on RTL.  Thus, if you artificially
lower precision of computations on the tree level this will pessimize things
on RTL.  So, you never should narrow operations below what PROMOTE_MODE
would do.  In fact, you might as well expose the PROMOTE_MODE fact on
the tree level.

 Another interesting type-hoisting part is to check if a type-sign
 change might be profitable.

 Eg:
 signed char D0, D1, D5;
 int D2,D3,D4
 ...
 D2 = (int) D0
 D3 = (int) D1
 D4 = D2 + D3
 D5 = (signed char) D4
 D6 = D5 == 0x8f

 to

 signed char D0, D1, D5;
 unsigned char D2,D3,D4
 ...
 D2 = (unsigned char) D0
 D3 = (unsigned char) D1
 D4 = D2 + D3
 D5 = D4 == 0x7fu

You need to watch for undefined overflow issues on the narrowed expressions
anyway (thus, have to resort to unsigned arithmetic in nearly all cases).

Richard.

 Richard.

        Jakub

 Regards,
 Kai


Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2

2012-03-15 Thread Richard Guenther
On Thu, Mar 15, 2012 at 2:08 PM, Kai Tietz ktiet...@googlemail.com wrote
 Hi,

 The solution for this PR is a mix out of different issues.  First is
 of course the type-hoisting, but also
 it shows some lacks in simplifications on integer-values, and on equal
 and none-equal
 comparisons.
 The first patch adds to forward-propagation the ability to do type-hoisting
 for some conversion operations and do simplification for inner binary
 operations on it.
 Most important part is here the adjustment of constant integer-values
 in statement-lists
 for a truncation.
 I limited that patch to handle in compare_equal_optimize_1 only
 bitwise-and operations
 inner a truncation-cast.  Of course for bitwise-xor/or operations some
 more simplifications
 are possible.
 This patch just does the type-hoisting part.  In a second patch I add
 to compare_equal_optimize_1
 the ability for further required simplifications for fixing this problem.

This looks like to match unbound pattern sizes and thus does not fit
into the forwprop machinery.  Instead it was suggested elsewhere
that promoting / demoting registers should be done in a separate pass
where you can compute a lattice of used bits and apply a transform
based on that lattice and target information (according to PROMOTE_MODE
for example).

Richard.

 ChangeLog

 2012-03-15  Kai Tietz  kti...@redhat.com

        PR tree-optimization/45397
        * tree-ssa-forwprop.c (compare_equal_optimize_1): New
        function.
        (combine_cond_expr_cond): Use compare_equal_optimize_1
        function.
        (truncate_integers): New function.
        (combine_inner_conversion): Likewise.
        (ssa_forward_propagate_and_combine): Use for conversions
        combine_inner_conversion function.

 2012-03-15  Kai Tietz  kti...@redhat.com

        * gcc.dg/tree-ssa/pr45397-1.c: New testcase.

 Regression tested for all languages (including Ada and Obj-C) on
 x86_64-unknown-linux-gnu.  Ok for apply?

 Regards,
 Kai

 Index: gcc-trunk/gcc/tree-ssa-forwprop.c
 ===
 --- gcc-trunk.orig/gcc/tree-ssa-forwprop.c
 +++ gcc-trunk/gcc/tree-ssa-forwprop.c
 @@ -362,6 +362,150 @@ rhs_to_tree (tree type, gimple stmt)
     gcc_unreachable ();
  }

 +/* This function does simplifications of comparison OP0 ==/!= OP1
 while integral
 +   typed operands
 +   On success new statement's TREE is returned, otherwise NULL_TREE.  */
 +
 +static tree
 +compare_equal_optimize_1 (gimple stmt, enum tree_code code, tree type,
 +                         tree op0, tree op1)
 +{
 +  gimple_stmt_iterator gsi;
 +  tree type_outer;
 +  tree type_inner, op_inner;
 +  tree op1_l, op1_r, tem;
 +  gimple newop;
 +
 +  type_outer = TREE_TYPE (op0);
 +  if ((code != EQ_EXPR  code != NE_EXPR)
 +      || !INTEGRAL_TYPE_P (type_outer))
 +    return NULL_TREE;
 +
 +  /* If OP0 isn't a conversion, then check if OP1 might be one.  If so
 +     swap arguments, otherwise return NULL_TREE.  */
 +  if (!CONVERT_EXPR_P (op0))
 +    {
 +      if (!CONVERT_EXPR_P (op1))
 +        return NULL_TREE;
 +      tem = op0;
 +      op0 = op1;
 +      op1 = tem;
 +    }
 +
 +  op_inner = TREE_OPERAND (op0, 0);
 +  type_inner = TREE_TYPE (op_inner);
 +
 +  /* Operations only apply to integral typed operands of cast.  */
 +  if (!INTEGRAL_TYPE_P (type_inner))
 +    return NULL_TREE;
 +
 +  /* If second operand is also a type-conversion, check that underlying 
 operand
 +     is integral typed.  */
 +  if (CONVERT_EXPR_P (op1)
 +       !INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (op1, 0
 +    return NULL_TREE;
 +
 +  /* Simplify ((type) X ==/!= (type) X) to true/false, if X has no 
 side-effects
 +     and is integral typed.  */
 +  if (CONVERT_EXPR_P (op1)
 +       TREE_OPERAND (op0, 0) == TREE_OPERAND (op1, 0)
 +       !TREE_SIDE_EFFECTS (TREE_OPERAND (op0, 0)))
 +    return fold_convert (type, (code == EQ_EXPR ? boolean_true_node
 +                                               : boolean_false_node));
 +
 +  /* Simplify (type) X ==/!= (type) Y to X ==/!= Y, if types of X and Y are
 +     compatible and type-precision of X is smaller or equal to TYPE's.  */
 +  if (CONVERT_EXPR_P (op1)
 +       TYPE_PRECISION (type_inner) = TYPE_PRECISION (type_outer))
 +    {
 +      tem = TREE_OPERAND (op1, 0);
 +      if (!useless_type_conversion_p (type_inner, TREE_TYPE (tem)))
 +       return NULL_TREE;
 +      return fold_build2_loc (gimple_location (stmt), code, type,
 +                             op_inner, tem);
 +    }
 +
 +  /* Verify that for pattern 'OP0 = (type) X' the type of X is of
 integral kind,
 +     is unsigned, and has smaller or equal precision to type TYPE.  */
 +  if (TYPE_PRECISION (type_inner)  TYPE_PRECISION (type_outer)
 +      || !TYPE_UNSIGNED (type_inner))
 +    return NULL_TREE;
 +
 +  /* Simplify (type) X ==/!= CST to X ==/!= CST' with CST'=(type-of-X) CST.  
 */
 +  if (TREE_CODE (op1) == INTEGER_CST)
 +    {
 +      tree new_cst = fold_convert (type_inner, 

Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2

2012-03-15 Thread Kai Tietz
2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 2:08 PM, Kai Tietz ktiet...@googlemail.com wrote
 Hi,

 The solution for this PR is a mix out of different issues.  First is
 of course the type-hoisting, but also
 it shows some lacks in simplifications on integer-values, and on equal
 and none-equal
 comparisons.
 The first patch adds to forward-propagation the ability to do type-hoisting
 for some conversion operations and do simplification for inner binary
 operations on it.
 Most important part is here the adjustment of constant integer-values
 in statement-lists
 for a truncation.
 I limited that patch to handle in compare_equal_optimize_1 only
 bitwise-and operations
 inner a truncation-cast.  Of course for bitwise-xor/or operations some
 more simplifications
 are possible.
 This patch just does the type-hoisting part.  In a second patch I add
 to compare_equal_optimize_1
 the ability for further required simplifications for fixing this problem.

 This looks like to match unbound pattern sizes and thus does not fit
 into the forwprop machinery.  Instead it was suggested elsewhere
 that promoting / demoting registers should be done in a separate pass
 where you can compute a lattice of used bits and apply a transform
 based on that lattice and target information (according to PROMOTE_MODE
 for example).

 Richard.

Well, the integer truncation part might be something for a separate
pass.  It could then also take care that within single-use
gimple-statements the integral-constant is always on right-hand-side
of first statement of an +, -, |, ^, , and mul.

But the cast-hoisting code itself is not unbound AFAICS and has fixed
pattern size.

Regards,
Kai


Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2

2012-03-15 Thread Jakub Jelinek
On Thu, Mar 15, 2012 at 02:53:10PM +0100, Kai Tietz wrote:
  This looks like to match unbound pattern sizes and thus does not fit
  into the forwprop machinery.  Instead it was suggested elsewhere
  that promoting / demoting registers should be done in a separate pass
  where you can compute a lattice of used bits and apply a transform
  based on that lattice and target information (according to PROMOTE_MODE
  for example).
 
 Well, the integer truncation part might be something for a separate
 pass.  It could then also take care that within single-use
 gimple-statements the integral-constant is always on right-hand-side
 of first statement of an +, -, |, ^, , and mul.
 
 But the cast-hoisting code itself is not unbound AFAICS and has fixed
 pattern size.

The type demotion is PR45397/PR47477 among other PRs.
I'd just walk from the narrowing integer conversion stmts recursively
through the def stmts, see if they can be narrowed, note it, and finally if
everything or significant portion of the stmts can be demoted (if not all,
with some narrowing integer conversion stmt inserted), do it all together.

Jakub


Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2

2012-03-15 Thread Richard Guenther
On Thu, Mar 15, 2012 at 3:00 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Mar 15, 2012 at 02:53:10PM +0100, Kai Tietz wrote:
  This looks like to match unbound pattern sizes and thus does not fit
  into the forwprop machinery.  Instead it was suggested elsewhere
  that promoting / demoting registers should be done in a separate pass
  where you can compute a lattice of used bits and apply a transform
  based on that lattice and target information (according to PROMOTE_MODE
  for example).

 Well, the integer truncation part might be something for a separate
 pass.  It could then also take care that within single-use
 gimple-statements the integral-constant is always on right-hand-side
 of first statement of an +, -, |, ^, , and mul.

 But the cast-hoisting code itself is not unbound AFAICS and has fixed
 pattern size.

Can you split that part out then please?

 The type demotion is PR45397/PR47477 among other PRs.
 I'd just walk from the narrowing integer conversion stmts recursively
 through the def stmts, see if they can be narrowed, note it, and finally if
 everything or significant portion of the stmts can be demoted (if not all,
 with some narrowing integer conversion stmt inserted), do it all together.

For PROMOTE_MODE targets you'd promote but properly mask out
constants (to make them cheaper to generate, for example).  You'd
also take advantate of targets that can do zero/sign-extending loads
without extra cost (ISTR that's quite important for some SPEC 2k6
benchmark on x86_64).

Richard.

        Jakub


Re: [patch tree-optimization]: Fix for PR 45397 part 1 of 2

2012-03-15 Thread Michael Matz
Hi,

On Thu, 15 Mar 2012, Richard Guenther wrote:

  The type demotion is PR45397/PR47477 among other PRs. I'd just walk 
  from the narrowing integer conversion stmts recursively through the 
  def stmts, see if they can be narrowed, note it, and finally if 
  everything or significant portion of the stmts can be demoted (if not 
  all, with some narrowing integer conversion stmt inserted), do it all 
  together.
 
 For PROMOTE_MODE targets you'd promote but properly mask out
 constants (to make them cheaper to generate, for example).  You'd
 also take advantate of targets that can do zero/sign-extending loads
 without extra cost (ISTR that's quite important for some SPEC 2k6
 benchmark on x86_64).

gamess.  I still have an old proof of concept patch doing type promotion.  
Probably doesn't apply anymore, and it's using too broad predicates (it 
simple-mindedly extends to the largest type see in an expression tree).
But I think that basic idea of it is sound.


Ciao,
Michael.

Index: passes.c
===
--- passes.c(revision 159226)
+++ passes.c(working copy)
@@ -831,6 +831,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_all_optimizations);
 {
   struct opt_pass **p = pass_all_optimizations.pass.sub;
+  extern struct gimple_opt_pass pass_bprop_extends;
   NEXT_PASS (pass_remove_cgraph_callee_edges);
   /* Initial scalar cleanups before alias computation.
 They ensure memory accesses are not indirect wherever possible.  */
@@ -838,6 +839,7 @@ init_optimization_passes (void)
   NEXT_PASS (pass_update_address_taken);
   NEXT_PASS (pass_rename_ssa_copies);
   NEXT_PASS (pass_complete_unrolli);
+  NEXT_PASS (pass_bprop_extends);
   NEXT_PASS (pass_ccp);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_call_cdce);
Index: tree-ssa-ccp.c
===
--- tree-ssa-ccp.c  (revision 159226)
+++ tree-ssa-ccp.c  (working copy)
@@ -1999,3 +1999,263 @@ struct gimple_opt_pass pass_fold_builtin
 | TODO_update_ssa  /* todo_flags_finish */
  }
 };
+
+#if 1
+static bool
+promote_through_insn_p (gimple def, tree *prhs1, tree *prhs2)
+{
+  tree lhs, rhs1, rhs2;
+  if (!is_gimple_assign (def))
+return false;
+  lhs = gimple_assign_lhs (def);
+  rhs1 = rhs2 = NULL;
+  switch (gimple_assign_rhs_class (def))
+{
+  case GIMPLE_SINGLE_RHS:
+   rhs1 = gimple_assign_rhs1 (def);
+   if (TREE_CODE (rhs1) != SSA_NAME)
+ return false;
+   break;
+  case GIMPLE_UNARY_RHS:
+   rhs1 = gimple_assign_rhs1 (def);
+   if (TREE_TYPE (gimple_expr_type (def)) != TREE_TYPE (rhs1))
+ return false;
+   break;
+  case GIMPLE_BINARY_RHS:
+   rhs1 = gimple_assign_rhs1 (def);
+   rhs2 = gimple_assign_rhs2 (def);
+
+   switch (gimple_assign_rhs_code (def))
+ {
+   case LSHIFT_EXPR: case RSHIFT_EXPR:
+   case LROTATE_EXPR: case RROTATE_EXPR:
+ rhs2 = NULL;
+ if (TREE_TYPE (lhs) != TREE_TYPE (rhs1))
+   return false;
+ break;
+   case PLUS_EXPR: case MINUS_EXPR:
+   case MULT_EXPR: case EXACT_DIV_EXPR:
+   case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR:
+   case FLOOR_DIV_EXPR: case ROUND_DIV_EXPR:
+   case TRUNC_MOD_EXPR: case CEIL_MOD_EXPR:
+   case FLOOR_MOD_EXPR: case ROUND_MOD_EXPR:
+   case RDIV_EXPR:
+   case MIN_EXPR: case MAX_EXPR:
+   case BIT_IOR_EXPR: case BIT_XOR_EXPR: case BIT_AND_EXPR:
+ if (TREE_TYPE (lhs) != TREE_TYPE (rhs1)
+ || TREE_TYPE (lhs) != TREE_TYPE (rhs2))
+   return false;
+ break;
+   default:
+ return false;
+ }
+   break;
+  default:
+   return false;
+}
+  if (rhs1  TREE_CODE (rhs1) != SSA_NAME)
+rhs1 = NULL;
+  if (prhs1)
+*prhs1 = rhs1;
+  if (rhs2  TREE_CODE (rhs2) != SSA_NAME)
+rhs2 = NULL;
+  if (prhs2)
+*prhs2 = rhs2;
+  return true;
+}
+
+static tree
+get_extended_version (tree newtype, tree name, bool force)
+{
+  tree ret = TREE_CHAIN (name);
+  tree rhs1, rhs2;
+  gimple def;
+  /* If we already have a version of NAME, try to use it.  If it
+ doesn't match in type, fail.  */
+  if (ret)
+{
+  if (TREE_TYPE (ret) == newtype)
+   return ret;
+  else
+   return NULL_TREE;
+}
+  def = SSA_NAME_DEF_STMT (name);
+  /* If we can propagate through our defining insn, try to do that.  */
+  if (promote_through_insn_p (def, rhs1, rhs2))
+{
+  gimple stmt;
+  tree extrhs1, extrhs2;
+  gimple_stmt_iterator gsi;
+  enum tree_code code;
+  if (rhs1)
+   {
+ extrhs1 = get_extended_version (newtype, rhs1, true);
+ if (!extrhs1)
+   /* ??? We could force here.  */
+   return NULL_TREE;
+   }
+  else
+   extrhs1 =