Re: [PATCH] Fix PR 33512 Folding of x & ((~x) | y) into x & y on the tree level

2012-04-24 Thread Richard Guenther
On Mon, Apr 23, 2012 at 11:21 PM, Andrew Pinski
andrew.pin...@caviumnetworks.com wrote:
 On Mon, Apr 23, 2012 at 2:41 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Sat, Apr 21, 2012 at 6:06 AM, Andrew Pinski
 andrew.pin...@caviumnetworks.com wrote:
 This time with the patch and describing what the bug was.  The problem
 was defcodefor_name does not always set arg1 and arg2.  This fixes it
 so they are always set to NULL if they don't exist.

 Ok with ...

 +  if (code1 == SSA_NAME)
 +    {
 +      def = SSA_NAME_DEF_STMT (name);
 +
 +      if (def && is_gimple_assign (def)
 +          && can_propagate_from (def))
 +       {
 +         code1 = gimple_assign_rhs_code (def);
 +         arg11 = gimple_assign_rhs1 (def);
 +          arg21 = gimple_assign_rhs2 (def);
 +          arg31 = gimple_assign_rhs2 (def);
 +       }
 +    }

 ... recursing here instead.


 Recursing how?  Or do you mean when code1 is a SSA_NAME do a recursive
 call so that we can get some simple copy-prop happening?
 The current code does not do that though.

Ah, this is all pre-existing code.  But yes, to get simple copy-prop happening
(which is what this code seems to do, but just a single level).
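
To illustrate the suggestion, a sketch of a recursive variant
(editorial illustration only; the committed defcodefor_name takes
three result slots and differs in detail):

/* Hypothetical defcodefor_name that looks through chains of SSA_NAME
   copies instead of stopping after a single level.  */
static void
defcodefor_name (tree name, enum tree_code *code, tree *arg1, tree *arg2)
{
  gimple def;

  *code = TREE_CODE (name);
  *arg1 = name;
  *arg2 = NULL_TREE;

  if (TREE_CODE (name) != SSA_NAME)
    return;

  def = SSA_NAME_DEF_STMT (name);
  if (!is_gimple_assign (def) || !can_propagate_from (def))
    return;

  if (gimple_assign_rhs_code (def) == SSA_NAME)
    /* A plain copy: recurse to get the simple copy-prop effect.  */
    defcodefor_name (gimple_assign_rhs1 (def), code, arg1, arg2);
  else
    {
      *code = gimple_assign_rhs_code (def);
      *arg1 = gimple_assign_rhs1 (def);
      *arg2 = gimple_assign_rhs2 (def);
    }
}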

The patch is ok as-is, you can improve on top of it if you like.

Thanks,
Richard.

 Thanks,
 Andrew



 Thanks,
 Richard.


 Thanks,
 Andrew

 On Fri, Apr 20, 2012 at 9:05 PM, Andrew Pinski
 andrew.pin...@caviumnetworks.com wrote:
 On Thu, Jan 19, 2012 at 3:13 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Jan 19, 2012 at 10:00 AM, Andrew Pinski
 andrew.pin...@caviumnetworks.com wrote:
 On Tue, Jan 17, 2012 at 1:38 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Tue, Jan 17, 2012 at 8:06 AM, Andrew Pinski
 andrew.pin...@caviumnetworks.com wrote:
 Hi,
  This adds the folding of x & ((~x) | y) into x & y on the tree
 level via fold-const.c.
 This is already partly done on the RTL level, but it would be a good
 thing for the tree level also.


 OK for 4.8 (yes I know we have not branched yet but I thought I would
 send it out so I don't forget about it)?
 Bootstrapped and tested on x86_64-linux-gnu with no regressions.

 Can you instead patch tree-ssa-forwprop.c:simplify_bitwise_binary?

 Yes, and here is a new patch which also adds optimizing x | ((~x) & y)
 to x | y.
 It also adds optimizing x & (x | y) to x and x | (x & y) to x in
 tree-ssa-forwprop.c:simplify_bitwise_binary
 since it was an easy extension on top of the ~x case (well I
 implemented the one without the ~ first).  I did not remove those
 folding from fold-const.c though.

 Also I was thinking maybe this belongs in reassociate though I don't
 see how to do it.

 I still have plans to create that piecewise gimple_fold (see my proposal
 from early last year) that would be the container for this kind of pattern
 matching.  It would then be usable from reassoc as well (but reassoc
 has the issue of only collecting one kind of op, so its simplification
 wouldn't trigger reliably on these).

 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01099.html

 OK for 4.8, once in stage 1? Again bootstrapped and tested on
 x86_64-linux-gnu with no regressions.

 Ok.

 Here is an updated patch which fixes a bug which I found while doing
 the treecombine branch.  I rewrote defcodefor_name so more than
 SSA_NAMEs can be passed to it.

 OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

 Thanks,
 Andrew Pinski

 ChangeLog:
 * tree-ssa-forwprop.c (defcodefor_name): New function.
 (simplify_bitwise_binary): Use defcodefor_name instead of manually
  Simplify (X | Y) & X to X and (X & Y) | X to X.
  Simplify (~X | Y) & X to X & Y and
  (~X & Y) | X to X | Y.

 testsuite/ChangeLog:
 * gcc.dg/tree-ssa/andor-3.c: New testcase.
 * gcc.dg/tree-ssa/andor-4.c: New testcase.
 * gcc.dg/tree-ssa/andor-5.c: New testcase.
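
The new tests themselves are not quoted in this thread; a hypothetical
sketch of their shape (the options and dump name are assumptions, not
the committed files):

/* Hypothetical andor-*.c style test, for illustration only.  */
/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-forwprop1" } */

int f (int x, int y) { return x & (~x | y); }  /* expect x & y */
int g (int x, int y) { return x | (~x & y); }  /* expect x | y */
int h (int x, int y) { return x & (x | y); }   /* expect x     */
int k (int x, int y) { return x | (x & y); }   /* expect x     */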


 Thanks,
 Richard.

 Thanks,
 Andrew Pinski

 ChangeLog:
 * tree-ssa-forwprop.c (defcodefor_name): New function.
 (simplify_bitwise_binary): Use defcodefor_name.
 Simplify (X | Y) & X to X and (X & Y) | X to X.
 Simplify (~X | Y) & X to X & Y and
 (~X & Y) | X to X | Y.

 testsuite/ChangeLog:
 * gcc.dg/tree-ssa/andor-3.c: New testcase.
 * gcc.dg/tree-ssa/andor-4.c: New testcase.
 * gcc.dg/tree-ssa/andor-5.c: New testcase.



 Thanks,
 Richard.

 Thanks,
 Andrew Pinski

 ChangeLog:
 * fold-const.c (fold_binary_loc case BIT_AND_EXPR): Add folding of x
  & (~x | y) into x & y.

 testsuite/ChangeLog:
 * gcc.dg/tree-ssa/andor-3.c: New testcase.


Re: [PATCH] Fix VRP ICE on x >> cst1; if (x cmp cst2) (PR tree-optimization/53058)

2012-04-24 Thread Richard Guenther
On Tue, Apr 24, 2012 at 1:01 AM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 My PR52533 patch fixed unsigned comparisons, but not signed ones
 where the maximum is different.
 Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
 ok for trunk?

 2012-04-23  Jakub Jelinek  ja...@redhat.com

        PR tree-optimization/53058
        * tree-vrp.c (register_edge_assert_for_2): Compare mask
        for LE_EXPR or GT_EXPR with TYPE_MAX_VALUE of TREE_TYPE (val)
        instead of all ones in the precision.

        * gcc.c-torture/compile/pr53058.c: New test.

 --- gcc/tree-vrp.c.jj   2012-04-23 11:11:21.0 +0200
 +++ gcc/tree-vrp.c      2012-04-23 16:08:43.752462433 +0200
 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e
               && INTEGRAL_TYPE_P (TREE_TYPE (name2))
               && IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1)
               && prec <= 2 * HOST_BITS_PER_WIDE_INT
 +              && prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val)))
               && live_on_edge (e, name2)
               && !has_single_use (name2))
            {
 @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e
            new_val = val2;
          else
            {
 +             double_int maxval
 +               = tree_to_double_int (TYPE_MAX_VALUE (TREE_TYPE (val)));

I don't like using TYPE_MAX_VALUE in VRP - at least please use vrp_val_max.
For enums this can be not what you expect(?)

Please consider adding a double_int_max_value (unsigned prec, bool sign)
and double_int_min_value.

Thanks,
Richard.

              mask = double_int_ior (tree_to_double_int (val2), mask);
 -             if (double_int_minus_one_p (double_int_sext (mask, prec)))
 +             if (double_int_equal_p (mask, maxval))
                new_val = NULL_TREE;
              else
                new_val = double_int_to_tree (TREE_TYPE (val2), mask);
 --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj    2012-04-23 
 15:53:41.489982650 +0200
 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c       2012-04-23 
 15:53:13.0 +0200
 @@ -0,0 +1,12 @@
 +/* PR tree-optimization/53058 */
 +
 +int a, b, c;
 +
 +void
 +foo ()
 +{
 +  c = b >> 16;
 +  if (c > 32767)
 +    c = 0;
 +  a = b;
 +}

        Jakub
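
A rough reconstruction of the arithmetic for the testcase above
(editorial, assuming 32-bit int; not part of the patch): from
c = b >> 16 and the GT_EXPR comparison against 32767, the mask built
for the assert on b comes out as

  (32767 << 16) | ((1 << 16) - 1) = 0x7fff0000 | 0x0000ffff = 0x7fffffff

which is INT_MAX for the signed type but not all ones when
sign-extended, so the old double_int_minus_one_p test failed to notice
that the range covers every value.  Comparing against the type's
maximum value catches the signed case as well.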


[PATCH] Fix VRP ICE on x >> cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)

2012-04-24 Thread Jakub Jelinek
On Tue, Apr 24, 2012 at 08:05:35AM +0200, Richard Guenther wrote:
 On Tue, Apr 24, 2012 at 1:01 AM, Jakub Jelinek ja...@redhat.com wrote:
 I don't like using TYPE_MAX_VALUE in VRP - at least please use vrp_val_max.
 For enums this can be not what you expect(?)
 
 Please consider adding a double_int_max_value (unsigned prec, bool sign)
 and double_int_min_value.

So like this instead?

2012-04-24  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/53058
* double-int.h (double_int_max_value, double_int_min_value): New
functions.
* tree-vrp.c (register_edge_assert_for_2): Compare mask
for LE_EXPR or GT_EXPR with double_int_max_value
instead of double_int_mask.

* gcc.c-torture/compile/pr53058.c: New test.

--- gcc/double-int.h.jj 2012-04-19 11:09:13.0 +0200
+++ gcc/double-int.h2012-04-24 08:42:42.655756406 +0200
@@ -1,5 +1,5 @@
 /* Operations with long integers.
-   Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -280,6 +280,26 @@ double_int_equal_p (double_int cst1, dou
   return cst1.low == cst2.low && cst1.high == cst2.high;
 }
 
+/* Returns a maximum value for signed or unsigned integer
+   of precision PREC.  */
+
+static inline double_int
+double_int_max_value (unsigned int prec, bool uns)
+{
+  return double_int_mask (prec - (uns ? 0 : 1));
+}
+
+/* Returns a minimum value for signed or unsigned integer
+   of precision PREC.  */
+
+static inline double_int
+double_int_min_value (unsigned int prec, bool uns)
+{
+  if (uns)
+    return double_int_zero;
+  return double_int_lshift (double_int_one, prec - 1, prec, false);
+}
+
 
 /* Legacy interface with decomposed high/low parts.  */
 
--- gcc/tree-vrp.c.jj   2012-04-23 11:11:21.0 +0200
+++ gcc/tree-vrp.c  2012-04-24 09:06:29.116624564 +0200
@@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e
   && INTEGRAL_TYPE_P (TREE_TYPE (name2))
   && IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1)
   && prec <= 2 * HOST_BITS_PER_WIDE_INT
+  && prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val)))
   && live_on_edge (e, name2)
   && !has_single_use (name2))
{
@@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e
new_val = val2;
  else
{
+ double_int maxval
+   = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE (val)));
  mask = double_int_ior (tree_to_double_int (val2), mask);
- if (double_int_minus_one_p (double_int_sext (mask, prec)))
+ if (double_int_equal_p (mask, maxval))
    new_val = NULL_TREE;
  else
    new_val = double_int_to_tree (TREE_TYPE (val2), mask);
--- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj2012-04-23 
15:53:41.489982650 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr53058.c   2012-04-23 
15:53:13.0 +0200
@@ -0,0 +1,12 @@
+/* PR tree-optimization/53058 */
+
+int a, b, c;
+
+void
+foo ()
+{
+  c = b >> 16;
+  if (c > 32767)
+    c = 0;
+  a = b;
+}

Jakub
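
A quick sanity check of the new helpers, read directly off the
definitions above (editorial note, not part of the patch):

  double_int_max_value (8, true)  == double_int_mask (8)  ->  0xff (255)
  double_int_max_value (8, false) == double_int_mask (7)  ->  0x7f (127)
  double_int_min_value (8, true)  == double_int_zero      ->  0
  double_int_min_value (8, false) == 1 << 7               ->  0x80
                                     (-128 once sign-extended to 8 bits)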


RE: [H8300] Use braced strings in MD

2012-04-24 Thread Naveen H. S
Hi,

 Great.  I approved it this morning and it looks like your account was
 created soon thereafter.

Thanks for the approval. My account has been created.

 add yourself to the MAINTAINERS file with write-after-approval
 privileges.

Done.
http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00705.html

 Looks good.  Please install.  Similarly for the v850 changes.

The h8300.md and v850.md changes are committed on trunk at the
following links, respectively:
http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00696.html
http://gcc.gnu.org/ml/gcc-cvs/2012-04/msg00699.html

Thanks for your help.

Regards,
Naveen




[PATCH] Fix PR53098

2012-04-24 Thread Richard Guenther

This fixes PR53098.

Committed as obvious.

Richard.

2012-04-24  Richard Guenther  rguent...@suse.de

PR tree-optimization/53098
* tree-vect-loop.c (vect_analyze_loop_operations): Fixup
comparison sign.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 186751)
+++ gcc/tree-vect-loop.c(working copy)
@@ -1411,7 +1411,7 @@ vect_analyze_loop_operations (loop_vec_i
   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
        && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
       || ((max_niter = max_stmt_executions_int (loop)) != -1
-          && max_niter < vectorization_factor))
+          && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
 {
   if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
 fprintf (vect_dump, "not vectorized: iteration count too small.");
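
The cast matters because of the usual arithmetic conversions; a
stand-alone illustration of the pitfall (editorial sketch, plain C,
independent of the vectorizer):

#include <stdio.h>

int
main (void)
{
  long max_niter = -1;       /* -1 plays the "unknown" role here */
  unsigned long factor = 4;

  /* Converted to unsigned, -1 becomes ULONG_MAX and compares greater
     than any factor, so the "too small" branch is never taken; mixing
     the signedness also draws a -Wsign-compare warning, which is what
     the fixup above silences.  */
  if ((unsigned long) max_niter < factor)
    printf ("too small\n");
  else
    printf ("-1 converted to %lu\n", (unsigned long) max_niter);
  return 0;
}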


Re: [PATCH] Fix VRP ICE on x cst1; if (x cmp cst2) (PR tree-optimization/53058, take 2)

2012-04-24 Thread Richard Guenther
On Tue, Apr 24, 2012 at 9:10 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Apr 24, 2012 at 08:05:35AM +0200, Richard Guenther wrote:
 On Tue, Apr 24, 2012 at 1:01 AM, Jakub Jelinek ja...@redhat.com wrote:
 I don't like using TYPE_MAX_VALUE in VRP - at least please use vrp_val_max.
 For enums this can be not what you expect(?)

 Please consider adding a double_int_max_value (unsigned prec, bool sign)
 and double_int_min_value.

 So like this instead?

Yes!

Thanks,
Richard.

 2012-04-24  Jakub Jelinek  ja...@redhat.com

        PR tree-optimization/53058
        * double-int.h (double_int_max_value, double_int_min_value): New
        functions.
        * tree-vrp.c (register_edge_assert_for_2): Compare mask
        for LE_EXPR or GT_EXPR with double_int_max_value
        instead of double_int_mask.

        * gcc.c-torture/compile/pr53058.c: New test.

 --- gcc/double-int.h.jj 2012-04-19 11:09:13.0 +0200
 +++ gcc/double-int.h    2012-04-24 08:42:42.655756406 +0200
 @@ -1,5 +1,5 @@
  /* Operations with long integers.
 -   Copyright (C) 2006, 2007, 2008, 2010 Free Software Foundation, Inc.
 +   Copyright (C) 2006, 2007, 2008, 2010, 2012 Free Software Foundation, Inc.

  This file is part of GCC.

 @@ -280,6 +280,26 @@ double_int_equal_p (double_int cst1, dou
   return cst1.low == cst2.low && cst1.high == cst2.high;
  }

 +/* Returns a maximum value for signed or unsigned integer
 +   of precision PREC.  */
 +
 +static inline double_int
 +double_int_max_value (unsigned int prec, bool uns)
 +{
 +  return double_int_mask (prec - (uns ? 0 : 1));
 +}
 +
 +/* Returns a minimum value for signed or unsigned integer
 +   of precision PREC.  */
 +
 +static inline double_int
 +double_int_min_value (unsigned int prec, bool uns)
 +{
 +  if (uns)
 +    return double_int_zero;
 +  return double_int_lshift (double_int_one, prec - 1, prec, false);
 +}
 +

  /* Legacy interface with decomposed high/low parts.  */

 --- gcc/tree-vrp.c.jj   2012-04-23 11:11:21.0 +0200
 +++ gcc/tree-vrp.c      2012-04-24 09:06:29.116624564 +0200
 @@ -4565,6 +4565,7 @@ register_edge_assert_for_2 (tree name, e
               && INTEGRAL_TYPE_P (TREE_TYPE (name2))
               && IN_RANGE (tree_low_cst (cst2, 1), 1, prec - 1)
               && prec <= 2 * HOST_BITS_PER_WIDE_INT
 +              && prec == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (val)))
               && live_on_edge (e, name2)
               && !has_single_use (name2))
            {
 @@ -4598,8 +4599,10 @@ register_edge_assert_for_2 (tree name, e
            new_val = val2;
          else
            {
 +             double_int maxval
 +               = double_int_max_value (prec, TYPE_UNSIGNED (TREE_TYPE (val)));
              mask = double_int_ior (tree_to_double_int (val2), mask);
 -             if (double_int_minus_one_p (double_int_sext (mask, prec)))
 +             if (double_int_equal_p (mask, maxval))
                new_val = NULL_TREE;
              else
                new_val = double_int_to_tree (TREE_TYPE (val2), mask);
 --- gcc/testsuite/gcc.c-torture/compile/pr53058.c.jj    2012-04-23 
 15:53:41.489982650 +0200
 +++ gcc/testsuite/gcc.c-torture/compile/pr53058.c       2012-04-23 
 15:53:13.0 +0200
 @@ -0,0 +1,12 @@
 +/* PR tree-optimization/53058 */
 +
 +int a, b, c;
 +
 +void
 +foo ()
 +{
 +  c = b >> 16;
 +  if (c > 32767)
 +    c = 0;
 +  a = b;
 +}

        Jakub


[Committed] Fix bootstrap issue in tree-ssa-forwprop.c

2012-04-24 Thread Andrew Pinski
I tested my two patches independently and forgot to test them together,
and one used a variable which I removed in the other.
Anyway, this patch changes the code to be correct and fixes the bootstrap issue.

Committed as obvious after building stage1.

Thanks,
Andrew Pinski

ChangeLog:
2012-04-24  Andrew Pinski  apin...@cavium.com

* tree-ssa-forwprop.c (simplify_bitwise_binary):
Don't directly use def1/def2.
Index: ChangeLog
===
--- ChangeLog   (revision 186756)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2012-04-24  Andrew Pinski  apin...@cavium.com
+
+   * tree-ssa-forwprop.c (simplify_bitwise_binary):
+   Don't directly use def1/def2.
+
 2012-04-24  Richard Guenther  rguent...@suse.de
 
PR tree-optimization/53098
Index: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c (revision 186755)
+++ tree-ssa-forwprop.c (working copy)
@@ -1913,10 +1913,10 @@ simplify_bitwise_binary (gimple_stmt_ite
 /* Simplify (A & B) OP0 (C & B) to (A OP0 C) & B. */
 if (def1_code == def2_code
     && def1_code == BIT_AND_EXPR
-    && operand_equal_for_phi_arg_p (gimple_assign_rhs2 (def1),
-                                    gimple_assign_rhs2 (def2)))
+    && operand_equal_for_phi_arg_p (def1_arg2,
+                                    def2_arg2))
 {
-  tree b = gimple_assign_rhs2 (def1);
+  tree b = def1_arg2;
   tree a = def1_arg1;
   tree c = def2_arg1;
   tree inner = fold_build2 (code, TREE_TYPE (arg2), a, c);
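
For reference, the identity being exercised holds for each bitwise
choice of OP0; a stand-alone check (editorial sketch, independent of
GCC internals):

#include <assert.h>

int
main (void)
{
  unsigned a = 0x5a, b = 0x0f, c = 0x33;

  /* (A & B) OP0 (C & B) == (A OP0 C) & B for OP0 in {|, ^, &}.  */
  assert (((a & b) | (c & b)) == ((a | c) & b));
  assert (((a & b) ^ (c & b)) == ((a ^ c) & b));
  assert (((a & b) & (c & b)) == ((a & c) & b));
  return 0;
}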


Re: [Fixinclude]: Fix typo and default to twoprocess on VMS

2012-04-24 Thread Tristan Gingold

On Apr 18, 2012, at 8:23 PM, Bruce Korb wrote:

 Hi,
 
 When I approved a patch in 2008, there was a typo.  I didn't
 notice and it was fixed by removing a formatting element.
 Your patch corrects the error.
 
 Please apply your changes to active branches.  Thank you!
 Regards, Bruce

Thanks, committed on the trunk.

Tristan.




[PATCH] Record vectorized loop epilogue loop max number of iterations

2012-04-24 Thread Richard Guenther

I have two patches which also record the maximum number of loop iterations
for the epilogue loop the vectorizer creates.  The first one reverts
an old patch that made us re-use that epilogue loop also for the
unvectorized case (when versioning for alignment or aliasing).  Thus
it creates one more loop copy (though it then properly separates the
epilogue with very few iterations from the unvectorized loops with
the original number of iterations).  The second patch simply only
records the maximum number of iterations when we do not re-use
that loop body for the unvectorized version.

Numbers on the single testcase I tried this on shows the second version
has slightly better code-size behavior.  I was testing this on
the single-file leslie3d SPEC CPU 2006 testcase whose file-size regressed
the most when enabling array-prefetching by default on certain AMD models.
The numbers are as follows, base flags -Ofast -funroll-loops -fpeel-loops 
-march=barcelona.

                           unpatched  patch #1  patch #2
                           564423     391623    386151
-fno-prefetch-loop-arrays  366247     308711    303783
-fwhole-program            471481     326944    322925

the difference is barely noticeable, but the testcase is surely special.

Both patches were bootstrapped and tested on x86_64-unknown-linux-gnu
(the first one needs some assembler scan adjustments in the x86 testsuite
still).

I'm leaning towards using the first patch, as only that will enable
us to more easily disentangle the various loops generated by the
vectorizer and eventually optimize them more (like unroll the
epilogue, re-order and re-structure the cost model, alignment and
alias checks, etc.).

But for wider testing coverage (and also to get some runtime numbers,
which should show just noise for both patches ...) I'm going to
apply patch two first and later revert to patch one.

Thanks,
Richard.

2012-04-24  Richard Guenther  rguent...@suse.de

* tree-vectorizer.h (vect_loop_versioning): Adjust prototype.
* tree-vect-loop.c (vect_transform_loop): Adjust.
* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Record
the maximum number of iterations for the epilogue loop.
(vect_loop_versioning): Remove case re-using the peeled
epilogue loop.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 186709)
--- gcc/tree-vectorizer.h   (working copy)
*** extern LOC vect_loop_location;
*** 807,813 
 in tree-vect-loop-manip.c.  */
  extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
  extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
! extern void vect_loop_versioning (loop_vec_info, bool, tree *, gimple_seq *);
  extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *,
  tree, gimple_seq);
  extern void vect_do_peeling_for_alignment (loop_vec_info);
--- 807,813 
 in tree-vect-loop-manip.c.  */
  extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
  extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
! extern void vect_loop_versioning (loop_vec_info);
  extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *,
  tree, gimple_seq);
  extern void vect_do_peeling_for_alignment (loop_vec_info);
Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 186709)
--- gcc/tree-vect-loop.c(working copy)
*** vect_transform_loop (loop_vec_info loop_
*** 5229,5235 
unsigned int nunits;
tree cond_expr = NULL_TREE;
gimple_seq cond_expr_stmt_list = NULL;
-   bool do_peeling_for_loop_bound;
gimple stmt, pattern_stmt;
gimple_seq pattern_def_seq = NULL;
gimple_stmt_iterator pattern_def_si = gsi_start (NULL);
--- 5229,5234 
*** vect_transform_loop (loop_vec_info loop_
*** 5244,5260 
if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
  vect_do_peeling_for_alignment (loop_vinfo);
  
-   do_peeling_for_loop_bound
- = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-|| (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-       && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0)
-|| LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
- 
if (LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
|| LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
! vect_loop_versioning (loop_vinfo,
! !do_peeling_for_loop_bound,
! cond_expr, cond_expr_stmt_list);
  
/* If the loop has a symbolic number of iterations 'n' (i.e. it's not a
   compile time constant), or it is a constant that doesn't divide by the
--- 5243,5251 
if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
  vect_do_peeling_for_alignment 

Re: [PATCH] Support for known unknown alignment

2012-04-24 Thread Martin Jambor
Hi,

On Mon, Apr 23, 2012 at 03:30:19PM +0200, Richard Guenther wrote:
 On Mon, 23 Apr 2012, Martin Jambor wrote:
 
  Hi,
  
  On Mon, Apr 23, 2012 at 12:50:51PM +0200, Richard Guenther wrote:
   On Fri, 20 Apr 2012, Martin Jambor wrote:
   
Hi,

two days ago I talked to Richi on IRC about the functions to determine
the expected alignment of objects and pointers we have and he
suggested that get_object_alignment_1 and get_pointer_alignment_1
should return whether the alignment is actually known and return the
actual alignment in a reference variable (as they currently return
bitpos).


...

   
   This looks all good apart from
   
   + void
   + set_ptr_info_alignment (struct ptr_info_def *pi, unsigned int align,
   +   unsigned int misalign)
   + {
   +   gcc_checking_assert (align != 0);
   
   Why this assert?  If we assert sth then we should assert that
    misalign & ~(align - 1) == 0 and that align is a power-of-two or 0.
  
  Because the patch makes zero value special, meaning we do not know,
  and my intention was that this should be announced using
  mark_ptr_info_alignment_unknown, mainly to prevent bugs when somebody
  miscalculated the alignment.  Without the assert, the subsystem would
  silently accept zero as its internal flag which would not ICE or
  miscompile stuff, but perhaps unnecessarily pessimistic alignment
  assumptions would be used.  Moreover, the smallest sensible alignment
  is 1, the misalign component is also byte-based (after all, they
  describe pointers) and zero alignment does not really have any
  meaning, or does it?
 
  Hmm, indeed (given the docs mention ptr & (align - 1)).
 
  I'll add a FIXME to the place where CCP previously attempted setting
  alignment zero, but I won't have time to investigate that anytime
  soon.
 
 Well, CCP simply tracks known-bits and derives the alignment
  value from that.  If tem & -tem computes as zero that means
  val->mask.low is all zeros.  Thus the if (align == 1) check
  should have been if (align <= 1) from the start.  No fixme necessary
 I think.
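
The known-bits-to-alignment step described above can be sketched in
plain C (editorial illustration; mask plays the role of val->mask.low,
with a 1 bit for every bit whose value is unknown):

#include <stdio.h>

int
main (void)
{
  unsigned long mask  = 0xf0;          /* bits 4..7 unknown, rest known */
  unsigned long value = 0x04;          /* the known low bits are 0100   */
  unsigned long align = mask & -mask;  /* lowest unknown bit            */

  /* align == 0 means no bit is unknown (the value is constant) and
     align == 1 carries no information, hence the align <= 1 check.  */
  if (align > 1)
    printf ("align = %lu, misalign = %lu\n", align, value & (align - 1));
  return 0;
}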
 
  Asserting alignment is a power of two and that misalign
  is not bigger than align are good ideas.
  
   
    +   pi->align = align;
    +   pi->misalign = misalign;
   + }
   + 
    + /* If pointer described by PI has known alignment, increase its known
    +    misalignment by INCREMENT modulo its current alignment.  */
   + 
   + void
   + increment_ptr_info_misalignment (struct ptr_info_def *pi,
   + unsigned int increment)
   + {
   
   Hmm, I'd call it adjust_ptr_info_misalignment instead.
   
  
  Fine with me.
  
   Ok with that changes.
   
  
  Unfortunately, the testsuite results from ppc64 look like there are
  some problems.  Hopefully they are not too complex or I'll have to
  postpone this for a few weeks as I have more pressing tasks to do now.

The testsuite differences I got on Friday were probably noise, tonight
(on an updated svn tree) I did not get any on ppc64-linux,
x86_64-linux or i686-linux.  Considering that and the OK above, I'm
going to bootstrap and test also on sparc64-linux and ia64-linux and
if those pass too, I'll commit the patch tomorrow, unless there are
any objections.

Of course, I have renamed increment_ptr_info_misalignment to
adjust_ptr_info_misalignment and changed the asserts at the beginning
of set_ptr_info_alignment to:

  gcc_checking_assert (align != 0);
  gcc_assert ((align & (align - 1)) == 0);
  gcc_assert ((misalign & ~(align - 1)) == 0);

Those were the only changes, the whole updated patch is below.

Thanks,

Martin

2012-04-23  Martin Jambor  mjam...@suse.cz

* builtins.c (get_object_alignment_1): Return whether we can determine
the alignment or conservatively assume byte alignment.  Return the
alignment by reference.  Use get_pointer_alignment_1 for dereference
alignment.
(get_pointer_alignment_1): Return whether we can determine the
alignment or conservatively assume byte alignment.  Return the
alignment by reference.  Use get_ptr_info_alignment to get SSA name
alignment.
(get_object_alignment): Update call to get_object_alignment_1.
(get_object_or_type_alignment): Likewise, fall back to type alignment
only when it returned false.
(get_pointer_alignment): Update call to get_pointer_alignment_1.
* fold-const.c (get_pointer_modulus_and_residue): Update call to
get_object_alignment_1.
* ipa-prop.c (ipa_modify_call_arguments): Update call to
get_pointer_alignment_1.
* tree-sra.c (build_ref_for_offset): Likewise, fall back to the type
of MEM_REF or TARGET_MEM_REF only when it returns false.
* tree-ssa-ccp.c (get_value_from_alignment): Update call to
get_object_alignment_1.
(ccp_finalize): Use set_ptr_info_alignment.
* tree.h (get_object_alignment_1): Update declaration.

Re: [PATCH] genattrtab: avoid NULL-deref on error

2012-04-24 Thread Jim Meyering
Richard Guenther wrote:
 On Sat, Apr 21, 2012 at 5:53 PM, Jim Meyering j...@meyering.net wrote:
 I see that no one responded when I posted this in February.
 Ok to commit, now?

 Ok.

Thanks.  Pushed.


[Patch,avr,trunk,4.7]: Fix PR53065

2012-04-24 Thread Georg-Johann Lay
This implements HARD_REGNO_CALL_PART_CLOBBERED.

Ok to apply?

Johann

PR target/53065
* config/avr/avr.h (HARD_REGNO_CALL_PART_CLOBBERED): New hook define.
Index: config/avr/avr.h
===
--- config/avr/avr.h	(revision 186600)
+++ config/avr/avr.h	(working copy)
@@ -394,6 +394,11 @@ enum reg_class {
 
 #define REGNO_OK_FOR_INDEX_P(NUM) 0
 
+#define HARD_REGNO_CALL_PART_CLOBBERED(REGNO, MODE)                  \
+  (((REGNO) < 18 && (REGNO) + GET_MODE_SIZE (MODE) > 18)             \
+   || ((REGNO) < REG_Y && (REGNO) + GET_MODE_SIZE (MODE) > REG_Y)    \
+   || ((REGNO) < REG_Z && (REGNO) + GET_MODE_SIZE (MODE) > REG_Z))
+
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true
 
 #define STACK_PUSH_CODE POST_DEC
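
A worked reading of the macro (editorial note; GET_MODE_SIZE (DImode)
is 8 bytes):

  HARD_REGNO_CALL_PART_CLOBBERED (12, DImode)
    -> (12 < 18 && 12 + 8 > 18) -> true

A DImode value in r12 occupies r12..r19 and straddles the boundary at
r18 between call-saved and call-clobbered registers, so it must be
treated as partially clobbered; a value kept entirely within r2..r17
matches none of the three clauses.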


[PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch

2012-04-24 Thread Ilya Enkovich
Hello,

here is a port of Android support patch
(http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7
branch. Is it OK?

Bootstrap and check passed on linux-x86_64.

Thanks,
Ilya
---
2012-04-24  Enkovich Ilya  ilya.enkov...@intel.com

* config/i386/linux-common.h: New.

* config.gcc: Add i386/linux-common.h before
all i386/linux.h and i386/linux64.h usages.

* config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New.
(LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC.
(GNU_USER_TARGET_CC1_SPEC): New.
(CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC.
(GNU_USER_TARGET_MATHFILE_SPEC): New.
(ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC.

* config/i386/gnu-user64.h: Likewise.


mandroid-4_7.patch
Description: Binary data


[PATCH] Make sizetypes no longer sign-extending

2012-04-24 Thread Richard Guenther

I've been carrying this patch for quite some while now and really
want to go forward with it - the problem is that while all default
languages work fine after this patch Ada shows some testsuite
regressions.  I've had various hacks/workarounds throughout the
Ada frontend for them, but lost track of what fixed what and
they all felt like hacks anyway.

Thus - I know the patch will add Ada testsuite regressions.  But it will
not break Ada bootstrap.  Ada is not in the set of default languages,
nor is it considered release critical.

Are the Ada folks happy with helping to fix the fallout after-the-fact
(I got Eric to fix the bootstrap issues that were first present - thanks
for that)?  I am happy to revisit my hacks/workarounds and post them,
but it will be ultimatively easier to review them if you can see
the FAIL for yourself (there are some workarounds/hacks posted on the
mailinglist for previous attempts IIRC).

Thanks for your consideration.

The patch is currently under re-testing (it needs the 2nd patch
below, which was already approved but back in time broke Ada
bootstrap - I didn't verify if that still occurs).

Richard.

2012-03-06  Richard Guenther  rguent...@suse.de

* fold-const.c (div_if_zero_remainder): sizetypes no longer
sign-extend.
(int_const_binop_1): New worker for int_const_binop with
overflowable parameter.  Pass it through
to force_fit_type_double.
(int_const_binop): Wrap around int_const_binop_1 with overflowable
equal to one.
(size_binop_loc): Call int_const_binop_1 with overflowable equal
to minus one, forcing overflow detection for even unsigned types.
(extract_muldiv_1): Remove bogus TYPE_IS_SIZETYPE special-casing.
(fold_binary_loc): Call try_move_mult_to_index with signed offset.
* stor-layout.c (initialize_sizetypes): sizetypes no longer
sign-extend.
(layout_type): For zero-sized arrays ignore overflow on the
size calculations.
* tree-ssa-ccp.c (bit_value_unop_1): Likewise.
(bit_value_binop_1): Likewise.
* tree.c (double_int_to_tree): Likewise.
(double_int_fits_to_tree_p): Likewise.
(force_fit_type_double): Likewise.
(host_integerp): Likewise.
(int_fits_type_p): Likewise.
* varasm.c (output_constructor_regular_field): Sign-extend the
field-offset to cater for negative offsets produced by the Ada frontend.
* omp-low.c (extract_omp_for_data): Convert the loop step to
signed for pointer adjustments.

* g++.dg/tree-ssa/pr19807.C: Adjust.

Index: trunk/gcc/fold-const.c
===
*** trunk.orig/gcc/fold-const.c 2012-03-06 12:41:10.0 +0100
--- trunk/gcc/fold-const.c  2012-03-06 14:46:25.0 +0100
*** div_if_zero_remainder (enum tree_code co
*** 191,199 
   does the correct thing for POINTER_PLUS_EXPR where we want
   a signed division.  */
uns = TYPE_UNSIGNED (TREE_TYPE (arg2));
-   if (TREE_CODE (TREE_TYPE (arg2)) == INTEGER_TYPE
-       && TYPE_IS_SIZETYPE (TREE_TYPE (arg2)))
- uns = false;
  
quo = double_int_divmod (tree_to_double_int (arg1),
   tree_to_double_int (arg2),
--- 191,196 
*** int_binop_types_match_p (enum tree_code
*** 935,942 
 to produce a new constant.  Return NULL_TREE if we don't know how
 to evaluate CODE at compile-time.  */
  
! tree
! int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2)
  {
double_int op1, op2, res, tmp;
tree t;
--- 932,940 
 to produce a new constant.  Return NULL_TREE if we don't know how
 to evaluate CODE at compile-time.  */
  
! static tree
! int_const_binop_1 (enum tree_code code, const_tree arg1, const_tree arg2,
!  int overflowable)
  {
double_int op1, op2, res, tmp;
tree t;
*** int_const_binop (enum tree_code code, co
*** 1078,1090 
return NULL_TREE;
  }
  
!   t = force_fit_type_double (TREE_TYPE (arg1), res, 1,
  ((!uns || is_sizetype) && overflow)
 | TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
  
return t;
  }
  
  /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
 constant.  We assume ARG1 and ARG2 have the same data type, or at least
 are the same kind of constant and the same machine mode.  Return zero if
--- 1076,1094 
return NULL_TREE;
  }
  
!   t = force_fit_type_double (TREE_TYPE (arg1), res, overflowable,
  ((!uns || is_sizetype) && overflow)
 | TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
  
return t;
  }
  
+ tree
+ int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2)
+ {
+   return int_const_binop_1 (code, arg1, arg2, 1);
+ }
+ 
  /* Combine two constants ARG1 and ARG2 

Re: [fortran, patch] Allow displaying backtraces from user code

2012-04-24 Thread Janus Weil
Hi guys,

(coming back to an old patch proposed by FX some time ago ...)

2012/3/3 FX fxcoud...@gmail.com:
 I think that this approach is a mistake.  The patch
 starts us down a slippery slope.  Why not export all
 the nonstandard intrinsics?  In additions, the
 _gfortran_ prefix was used to separate the libgfortran
 namespace from userspace.  Providing a means to
 circumvent this separation seems to be asking for more
 PRs.

 Well, the means exists. All _gfortran_* functions can already be called, 
 they're part of libgfortran's public (and versioned) API. I'm just saying 
 adding a simple backtrace function to that list is useful, and documenting it 
 too.

Yes, I agree that this is useful, and in that sense the patch is ok
from my side ...


 I would rather see a new intrinsic procedure add to
 gfortran.  The standard does not prevent a vendor
 from offer additional intrinsic procedures not found
 in the standard.

 I just think multiplying vendor intrinsics makes our life harder. I'd like 
 to hear other's opinions on this issue, but I'll implement the new intrinsic 
 if that's the consensus.

... but I also think that having an intrinsic function for it would be
useful, so that one can just call it without the detour via
ISO_C_BINDING. Note that ifort also has a vendor intrinsic for this,
called TRACEBACKQQ, so we're in good company.

Cheers,
Janus


Re: [google/integration] Extend C++11 UDLs to be compatible with inttypes.h macros (issue6104051)

2012-04-24 Thread Diego Novillo
On Mon, Apr 23, 2012 at 18:16, Ollie Wild a...@google.com wrote:

 If the trunk change isn't approved today, I'll probably go ahead and
 check in *that* patch to google/integration.  The main difference
 relative to this is that the option name has been changed to
 -Wliteral-suffix.  *That* part has been approved, so hopefully any
 additional adjustments will be minor.

Sounds good.  Thanks.


Diego.


Re: [Patch,avr,trunk,4.7]: Fix PR53065

2012-04-24 Thread Denis Chertykov
2012/4/24 Georg-Johann Lay a...@gjlay.de:
 This implements HARD_REGNO_CALL_PART_CLOBBERED.

 Ok to apply?

 Johann

        PR target/53065
        * config/avr/avr.h (HARD_REGNO_CALL_PART_CLOBBERED): New hook define.

Please commit.

Denis.


Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch

2012-04-24 Thread Richard Guenther
On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hello,

 here is a port of Android support patch
 (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7
 branch. Is it OK?

This kind of re-org is not appropriate for the branch.

Thanks,
Richard.

 Bootstrap and check passed on linux-x86_64.

 Thanks,
 Ilya
 ---
 2012-04-24  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/linux-common.h: New.

        * config.gcc: Add i386/linux-common.h before
        all i386/linux.h and i386/linux64.h usages.

        * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New.
        (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC.
        (GNU_USER_TARGET_CC1_SPEC): New.
        (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC.
        (GNU_USER_TARGET_MATHFILE_SPEC): New.
        (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC.

        * config/i386/gnu-user64.h: Likewise.


Re: [PATCH] Make sizetypes no longer sign-extending

2012-04-24 Thread Richard Earnshaw
On 24/04/12 14:03, Richard Guenther wrote:
 /* Size types *are* sign extended.  */
 !   sign_extended_type = !TYPE_UNSIGNED (type);

looks like you failed to update the comment.

R.



Re: [PATCH] Make sizetypes no longer sign-extending

2012-04-24 Thread Richard Guenther
On Tue, 24 Apr 2012, Richard Earnshaw wrote:

 On 24/04/12 14:03, Richard Guenther wrote:
  /* Size types *are* sign extended.  */
  !   sign_extended_type = !TYPE_UNSIGNED (type);
 
 looks like you failed to update the comment.

Indeed.  Fixed.

Thanks,
Richard.


Re: [PATCH v2] ARM: Use different linker path for hardfloat ABI

2012-04-24 Thread Richard Earnshaw
On 23/04/12 22:36, Michael Hope wrote:
 On 24 April 2012 03:35, Richard Earnshaw rearn...@arm.com wrote:
 On 22/04/12 23:20, Michael Hope wrote:
 Change the dynamic linker path for ARM hard float executables.
 Matches the path discussed and agreed on last week[1].  Carlos will
 follow up with the matching patch to GLIBC[2].  I'm happy to if he's
 busy.

 OK for trunk?

 -- Michael
 [1] http://sourceware.org/ml/libc-ports/2012-04/msg00060.html
 [2] http://sourceware.org/ml/libc-ports/2012-04/msg00064.html

 2012-04-23  Michael Hope  michael.h...@linaro.org

   * config/arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_HARD_FLOAT): Define.
   (GLIBC_DYNAMIC_LINKER_SOFT_FLOAT): Define.
   (GLIBC_DYNAMIC_LINKER): Redefine to use the hard float path.

 diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
 index 80bd825..3ddf812 100644
 --- a/gcc/config/arm/linux-eabi.h
 +++ b/gcc/config/arm/linux-eabi.h
 @@ -62,7 +62,11 @@
  /* Use ld-linux.so.3 so that it will be possible to run classic
 GNU/Linux binaries on an EABI system.  */
  #undef  GLIBC_DYNAMIC_LINKER
 -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.3"
 +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "/lib/ld-linux.so.3"
 +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT "/lib/ld-linux-armhf.so.3"
 +#define GLIBC_DYNAMIC_LINKER \
 +   "%{mfloat-abi=hard:" GLIBC_DYNAMIC_LINKER_HARD_FLOAT "} \
 +    %{!mfloat-abi=hard:" GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "}"

  /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
 use the GNU/Linux version, not the generic BPABI version.  */



 I think this should handle having the hard-float variant as the default
 more gracefully, so something like:


  -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.3"
  +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "/lib/ld-linux.so.3"
  +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT "/lib/ld-linux-armhf.so.3"
 +#if TARGET_DEFAULT_FLOAT_ABI == ARM_FLOAT_ABI_HARD
 +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_HARD_FLOAT
 +#else
 +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_SOFT_FLOAT
 +#endif
 +#define GLIBC_DYNAMIC_LINKER \
  +   "%{mfloat-abi=hard:" GLIBC_DYNAMIC_LINKER_HARD_FLOAT "} \
  +    %{mfloat-abi=soft*:" GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "} \
  +    %{!mfloat-abi=*:" GLIBC_DYNAMIC_LINKER_DEFAULT "}"

 But I haven't tested any of that.
 
 It tests just fine when setting the float ABI at configure time or
 explicitly.  Updated patch is below.  I prefer the original, shorter
 version but the new one is easier for someone else to understand.
 

Your version would work for configuration using --with-float-abi, but
not for built-in setting with hard-float as default.

 OK for trunk?
 

OK.

 -- Michael
 
 2012-04-24  Michael Hope  michael.h...@linaro.org
   Richard Earnshaw  rearn...@arm.com
 
   * config/arm/linux-eabi.h (GLIBC_DYNAMIC_LINKER_SOFT_FLOAT): Define.
   (GLIBC_DYNAMIC_LINKER_HARD_FLOAT): Define.
   (GLIBC_DYNAMIC_LINKER_DEFAULT): Define.
   (GLIBC_DYNAMIC_LINKER): Redefine to use the hard float path.
 
 diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
 index 80bd825..2ace6f0 100644
 --- a/gcc/config/arm/linux-eabi.h
 +++ b/gcc/config/arm/linux-eabi.h
 @@ -62,7 +62,17 @@
  /* Use ld-linux.so.3 so that it will be possible to run classic
 GNU/Linux binaries on an EABI system.  */
  #undef  GLIBC_DYNAMIC_LINKER
  -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.3"
  +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "/lib/ld-linux.so.3"
  +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT "/lib/ld-linux-armhf.so.3"
 +#if TARGET_DEFAULT_FLOAT_ABI == ARM_FLOAT_ABI_HARD
 +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_HARD_FLOAT
 +#else
 +#define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_SOFT_FLOAT
 +#endif
 +#define GLIBC_DYNAMIC_LINKER \
  +   "%{mfloat-abi=hard:" GLIBC_DYNAMIC_LINKER_HARD_FLOAT "} \
  +    %{mfloat-abi=soft*:" GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "} \
  +    %{!mfloat-abi=*:" GLIBC_DYNAMIC_LINKER_DEFAULT "}"
 
  /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
 use the GNU/Linux version, not the generic BPABI version.  */
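
Spelled out, the spec above picks the loader as follows (an editorial
summary of the macro, not additional patch text):

  -mfloat-abi=hard                       -> /lib/ld-linux-armhf.so.3
  -mfloat-abi=soft or -mfloat-abi=softfp -> /lib/ld-linux.so.3
  no -mfloat-abi= option                 -> GLIBC_DYNAMIC_LINKER_DEFAULT,
                                            i.e. whichever linker matches
                                            the configured default ABI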
 




Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch

2012-04-24 Thread Ilya Enkovich
 On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hello,

 here is a port of Android support patch
 (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7
 branch. Is it OK?

 This kind of re-org is not appropriate for the branch.

Could you please specify what exactly is wrong with it? Do you mean
adding linux-common.h?

Ilya

 Thanks,
 Richard.

 Bootstrap and check passed on linux-x86_64.

 Thanks,
 Ilya
 ---
 2012-04-24  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/linux-common.h: New.

        * config.gcc: Add i386/linux-common.h before
        all i386/linux.h and i386/linux64.h usages.

        * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New.
        (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC.
        (GNU_USER_TARGET_CC1_SPEC): New.
        (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC.
        (GNU_USER_TARGET_MATHFILE_SPEC): New.
        (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC.

        * config/i386/gnu-user64.h: Likewise.


Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch

2012-04-24 Thread Richard Guenther
On Tue, Apr 24, 2012 at 3:44 PM, Ilya Enkovich enkovich@gmail.com wrote:
 On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 Hello,

 here is a port of Android support patch
 (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7
 branch. Is it OK?

 This kind of re-org is not appropriate for the branch.

 Could you please specify what exactly is wrong with it? Do you mean
 adding linux-common.h?

The 4.7 branch is restricted to fixing regression bugs.  This does not appear
to fix a bug nor a regression but adds a feature.

Richard.

 Ilya

 Thanks,
 Richard.

 Bootstrap and check passed on linux-x86_64.

 Thanks,
 Ilya
 ---
 2012-04-24  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/linux-common.h: New.

        * config.gcc: Add i386/linux-common.h before
        all i386/linux.h and i386/linux64.h usages.

        * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New.
        (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC.
        (GNU_USER_TARGET_CC1_SPEC): New.
        (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC.
        (GNU_USER_TARGET_MATHFILE_SPEC): New.
        (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC.

        * config/i386/gnu-user64.h: Likewise.


Re: [PATCH, 4.7, Android, i386] Port of Android target support in i386 for 4.7 branch

2012-04-24 Thread Ilya Enkovich
 On Tue, Apr 24, 2012 at 3:44 PM, Ilya Enkovich enkovich@gmail.com wrote:
 On Tue, Apr 24, 2012 at 2:54 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 Hello,

 here is a port of Android support patch
 (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html) for 4.7
 branch. Is it OK?

 This kind of re-org is not appropriate for the branch.

 Could you please specify what exactly is wrong with it? Do you mean
 adding linux-common.h?

 The 4.7 branch is restricted to fixing regression bugs.  This does not appear
 to fix a bug nor a regression but adds a feature.

OK, thanks!

Ilya


 Richard.

 Ilya

 Thanks,
 Richard.

 Bootstrap and check passed on linux-x86_64.

 Thanks,
 Ilya
 ---
 2012-04-24  Enkovich Ilya  ilya.enkov...@intel.com

        * config/i386/linux-common.h: New.

        * config.gcc: Add i386/linux-common.h before
        all i386/linux.h and i386/linux64.h usages.

        * config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): New.
        (LINK_SPEC): Use GNU_USER_TARGET_LINK_SPEC.
        (GNU_USER_TARGET_CC1_SPEC): New.
        (CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC.
        (GNU_USER_TARGET_MATHFILE_SPEC): New.
        (ENDFILE_SPEC): Use GNU_USER_TARGET_MATHFILE_SPEC.

        * config/i386/gnu-user64.h: Likewise.


[PATCH] Fix release checking

2012-04-24 Thread Richard Guenther

Committed as obvious.

Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

* tree-if-conv.c (main_tree_if_conversion): Move bb under
ENABLE_CHECKING.


Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 186760)
+++ gcc/tree-if-conv.c  (working copy)
@@ -1779,7 +1779,6 @@ main_tree_if_conversion (void)
   struct loop *loop;
   bool changed = false;
   unsigned todo = 0;
-  basic_block bb;
 
   if (number_of_loops () <= 1)
 return 0;
@@ -1796,8 +1795,11 @@ main_tree_if_conversion (void)
   free_dominance_info (CDI_POST_DOMINATORS);
 
 #ifdef ENABLE_CHECKING
-  FOR_EACH_BB (bb)
-    gcc_assert (!bb->aux);
+  {
+    basic_block bb;
+    FOR_EACH_BB (bb)
+      gcc_assert (!bb->aux);
+  }
 #endif
 
   return todo;


Re: [PATCH] Make sizetypes no longer sign-extending

2012-04-24 Thread Richard Guenther
On Tue, 24 Apr 2012, Richard Guenther wrote:

 
 I've been carrying this patch for quite some while now and really
 want to go forward with it - the problem is that while all default
 languages work fine after this patch Ada shows some testsuite
 regressions.  I've had various hacks/workarounds throughout the
 Ada frontend for them, but lost track of what fixed what and
 they all felt like hacks anyway.
 
 Thus - I know the patch will add Ada testsuite regressions.  But it will
 not break Ada bootstrap.  Ada is not in the set of default languages,
 nor is it considered release critical.
 
 Are the Ada folks happy with helping to fix the fallout after-the-fact
 (I got Eric to fix the bootstrap issues that were first present - thanks
 for that)?  I am happy to revisit my hacks/workarounds and post them,
 but it will be ultimatively easier to review them if you can see
 the FAIL for yourself (there are some workarounds/hacks posted on the
 mailinglist for previous attempts IIRC).
 
 Thanks for your consideration.
 
 The patch is currently under re-testing (it needs the 2nd patch
 below, which was already approved but back in time broke Ada
 bootstrap - I didn't verify if that still occurs).

To followup myself - bootstrap with just the 2nd patch is still
broken:

/abuild/rguenther/obj2/./gcc/xgcc -B/abuild/rguenther/obj2/./gcc/ 
-B/usr/local/x86_64-unknown-linux-gnu/bin/ 
-B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem 
/usr/local/x86_64-unknown-linux-gnu/include -isystem 
/usr/local/x86_64-unknown-linux-gnu/sys-include-c -g -O2 -m32 -fpic  
-W -Wall -gnatpg -nostdinc -m32  s-secsta.adb -o s-secsta.o
s-secsta.adb:501:4: error: size of variable 'System.Secondary_Stack.Chunk' 
is too large
Chunk : aliased Chunk_Id (1, Static_Secondary_Stack_Size);
^
make[9]: *** [s-secsta.o] Error 1

And the following is the list of regressions introduced by the combined
patch set:

=== acats tests ===
FAIL:   a71004a
FAIL:   c36204d
FAIL:   c36205l
FAIL:   c37404b
FAIL:   c41107a
FAIL:   c41204a
FAIL:   c43204c
FAIL:   c43204e
FAIL:   c43204f
FAIL:   c43204g
FAIL:   c43204h
FAIL:   c43204i
FAIL:   c52102a
FAIL:   c52102c
FAIL:   c64103c
FAIL:   c64103d
FAIL:   c64106a
FAIL:   c95087a
FAIL:   cc1224a
FAIL:   cc1311a
FAIL:   cc3106b
FAIL:   cc3224a
FAIL:   cd2a31a

=== acats Summary ===
# of expected passes            2297
# of unexpected failures        23

=== gnat tests ===


Running target unix/
FAIL: gnat.dg/array11.adb  (test for warnings, line 12)
FAIL: gnat.dg/loop_optimization3.adb (test for excess errors)
FAIL: gnat.dg/loop_optimization3.adb execution test
FAIL: gnat.dg/object_overflow.adb  (test for warnings, line 8)
FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2
FAIL: gnat.dg/return3.adb scan-assembler loc 1 6
FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors)
FAIL: gnat.dg/test_8bitlong_overflow.adb execution test

=== gnat Summary for unix/ ===

# of expected passes            1089
# of unexpected failures        8
# of expected failures          13
# of unsupported tests          2

Running target unix//-m32
FAIL: gnat.dg/array11.adb  (test for warnings, line 12)
FAIL: gnat.dg/loop_optimization3.adb (test for excess errors)
FAIL: gnat.dg/loop_optimization3.adb execution test
FAIL: gnat.dg/object_overflow.adb  (test for warnings, line 8)
FAIL: gnat.dg/renaming5.adb scan-tree-dump-times optimized goto 2
FAIL: gnat.dg/return3.adb scan-assembler loc 1 6
FAIL: gnat.dg/test_8bitlong_overflow.adb (test for excess errors)
FAIL: gnat.dg/test_8bitlong_overflow.adb execution test

=== gnat Summary for unix//-m32 ===

# of expected passes            1089
# of unexpected failures        8
# of expected failures          13
# of unsupported tests          2

=== gnat Summary ===

# of expected passes            2178
# of unexpected failures        16
# of expected failures          26
# of unsupported tests          4

Most of the ACATS errors are "raised STORAGE_ERROR : object too large"
or "error: size of variable '...' is too large".  The gnat testsuite
adds "warning: Storage_Error will be raised at run time" to this.

I remember one workaround (which usually involves re-setting TREE_OVERFLOW
at strategic places) fixes most of ACATS.  I'll try to isolate that
(but it's a hack and does not feel right).

Richard.

 2012-03-06  Richard Guenther  rguent...@suse.de
 
   * fold-const.c (div_if_zero_remainder): sizetypes no longer
   sign-extend.
   (int_const_binop_1): New worker for int_const_binop with
   overflowable parameter.  Pass it through
   to force_fit_type_double.
   (int_const_binop): Wrap around int_const_binop_1 with overflowable
   equal to one.
   (size_binop_loc): Call int_const_binop_1 with overflowable equal
   to minus one, forcing overflow detection for even unsigned types.
   (extract_muldiv_1): Remove bogus TYPE_IS_SIZETYPE 

[PATCH] Fix PR53085

2012-04-24 Thread Richard Guenther

This fixes redundant store elimination in FRE/PRE to preserve
volatile stores.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to the
trunk and the branch.

Richard.

2012-04-24  Richard Guenther  rguent...@suse.de

PR tree-optimization/53085
* tree-ssa-pre.c (eliminate): Do not eliminate volatile redundant
stores.

* g++.dg/torture/pr53085.C: New testcase.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 186761)
--- gcc/tree-ssa-pre.c  (working copy)
*** eliminate (void)
*** 4409,4414 
--- 4409,4415 
 has the same value number as its rhs.  If so, the store is
 dead.  */
  else if (gimple_assign_single_p (stmt)
+          && !gimple_has_volatile_ops (stmt)
           && !is_gimple_reg (gimple_assign_lhs (stmt))
           && (TREE_CODE (rhs) == SSA_NAME
               || is_gimple_min_invariant (rhs)))
Index: gcc/testsuite/g++.dg/torture/pr53085.C
===
*** gcc/testsuite/g++.dg/torture/pr53085.C  (revision 0)
--- gcc/testsuite/g++.dg/torture/pr53085.C  (revision 0)
***
*** 0 
--- 1,17 
+ // { dg-do compile }
+ // { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } }
+ // { dg-options "-fdump-tree-optimized" }
+ 
+ class aa{
+ void f();
+ private:
+ volatile int a;
+ };
+ 
+ void aa::f(){
+ a=1;
+ a=1;
+ }
+ 
+ // { dg-final { scan-tree-dump-times "a ={v} 1;" 2 "optimized" } }
+ // { dg-final { cleanup-tree-dump "optimized" } }


Go patch committed: Reject invalid composite literals

2012-04-24 Thread Ian Lance Taylor
This patch to the Go frontend adds a check to reject composite literals
that the language syntax forbids.  The parser was able to parse some
cases unambiguously because it already knew that some name was a type,
but since that can not always be known that parse is not permitted.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.7 branch.

Ian

diff -r 91db663343df go/parse.cc
--- a/go/parse.cc	Mon Apr 23 22:54:27 2012 -0700
+++ b/go/parse.cc	Tue Apr 24 07:50:51 2012 -0700
@@ -2865,7 +2865,16 @@
 {
   if (this->peek_token()->is_op(OPERATOR_LCURLY))
 	{
-	  if (is_parenthesized)
+	  if (!may_be_composite_lit)
+	{
+	      Type* t = ret->type();
+	      if (t->named_type() != NULL
+		  || t->forward_declaration_type() != NULL)
+		error_at(start_loc,
+			 _("parentheses required around this composite literal "
+			   "to avoid parsing ambiguity"));
+	    }
+	  else if (is_parenthesized)
 	    error_at(start_loc,
 		     "cannot parenthesize type in composite literal");
 	  ret = this->composite_lit(ret->type(), 0, ret->location());


Re: [C/C++] Do not pretty-print expressions with caret diagnostics for wrong function call

2012-04-24 Thread Jason Merrill

On 04/22/2012 03:49 PM, Manuel López-Ibáñez wrote:

On 22 April 2012 21:20, Jason Merrill ja...@redhat.com wrote:

When we aren't printing the expression, please print the type instead; in a
template it might not be clear what the type of the expression we're
complaining about works out to be.


I can do that. Should I also change it in the C FE?


I think it wouldn't hurt.

Jason


Re: PowerPC prologue and epilogue 4

2012-04-24 Thread David Edelsohn
On Tue, Apr 17, 2012 at 11:13 AM, Alan Modra amo...@gmail.com wrote:
 This provides some protection against misuse of r0, r11 and r12.  I
 found it useful when enabling out-of-line saves for large frames.  ;-)

        * config/rs6000/rs6000.c (START_USE, END_USE, NOT_INUSE): Define.
        (rs6000_emit_prologue): Use the above to catch register overlap.

This patch is okay.

Thanks, David


Re: PowerPC prologue and epilogue 5

2012-04-24 Thread David Edelsohn
On Thu, Apr 19, 2012 at 11:36 AM, Alan Modra amo...@gmail.com wrote:
 On Thu, Apr 19, 2012 at 08:00:15PM +0930, Alan Modra wrote:
 On Wed, Apr 18, 2012 at 12:45:16AM +0930, Alan Modra wrote:
  This enables out-of-line save and restore for large frames, and for
  ABI_AIX when using the static chain.

 Further testing revealed two problems when compiling nested
 functions.
 1) The logic I had for cr_save_regno is wrong, resulting in one of my
    NOT_INUSE asserts triggering.

 Fixed in this revised patch.  Bootstrapped etc. powerpc-linux.

 2) In some cases the prologue uses in-line saves while the epilogue
    uses out-of-line restores.  This can lead to restoring regs that
    haven't been saved.

 This turned out to be a pre-existing problem, patch in PR50340.

        * config/rs6000/rs6000.c (rs6000_savres_strategy): Allow
        out-of-line save/restore for large frames.  Don't disable
        out-of-line saves on ABI_AIX when using static chain reg.
        (rs6000_emit_prologue): Adjust cr_save_regno on ABI_AIX to not
        clobber static chain reg, and tweak for out-of-line gpr saves
        that use r1.

This patch is okay.

Thanks, David


Re: [C++ Patch] PR 52363

2012-04-24 Thread Jason Merrill

OK, thanks!

Jason


Re: [PATCH, powerpc] Fix PR47197

2012-04-24 Thread David Edelsohn
2012-04-24  Bill Schmidt  wschm...@linux.vnet.ibm.com

 PR target/47197
 * config/rs6000/rs6000-c.c (fully_fold_convert): New function.
 (altivec_build_resolved_builtin): Call fully_fold_convert.

2012-04-24  Bill Schmidt  wschm...@linux.vnet.ibm.com

 PR target/47197
 * gcc.target/powerpc/pr47197.c: New test.

Okay.

Thanks, David


[PATCH, powerpc] Fix PR47197

2012-04-24 Thread William J. Schmidt
This fixes an error wherein a nontrivial expression passed to an Altivec
built-in results in an ICE, following Joseph Myers's suggested approach
in the bugzilla.

Bootstrapped and tested with no new regressions on
powerpc64-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


gcc:

2012-04-24  Bill Schmidt  wschm...@linux.vnet.ibm.com

PR target/47197
* config/rs6000/rs6000-c.c (fully_fold_convert): New function.
(altivec_build_resolved_builtin): Call fully_fold_convert.

gcc/testsuite:

2012-04-24  Bill Schmidt  wschm...@linux.vnet.ibm.com

PR target/47197
* gcc.target/powerpc/pr47197.c: New test.


Index: gcc/testsuite/gcc.target/powerpc/pr47197.c
===
--- gcc/testsuite/gcc.target/powerpc/pr47197.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr47197.c  (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-maltivec" } */
+
+/* Compile-only test to ensure that expressions can be passed to
+   Altivec builtins without error.  */
+
+#include altivec.h
+
+void func(unsigned char *buf, unsigned len)
+{
+    vec_dst(buf, (len >= 256 ? 0 : len) | 512, 2);
+}
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 186761)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -3421,6 +3421,22 @@ rs6000_builtin_type_compatible (tree t, int id)
 }
 
 
+/* In addition to calling fold_convert for EXPR of type TYPE, also
+   call c_fully_fold to remove any C_MAYBE_CONST_EXPRs that could be
+   hiding there (PR47197).  */
+
+static tree
+fully_fold_convert (tree type, tree expr)
+{
+  tree result = fold_convert (type, expr);
+  bool maybe_const = true;
+
+  if (!c_dialect_cxx ())
+    result = c_fully_fold (result, false, &maybe_const);
+
+  return result;
+}
+
 /* Build a tree for a function call to an Altivec non-overloaded builtin.
The overloaded builtin that matched the types and args is described
by DESC.  The N arguments are given in ARGS, respectively.  
@@ -3470,18 +3486,18 @@ altivec_build_resolved_builtin (tree *args, int n,
   break;
 case 1:
   call = build_call_expr (impl_fndecl, 1,
- fold_convert (arg_type[0], args[0]));
+ fully_fold_convert (arg_type[0], args[0]));
   break;
 case 2:
   call = build_call_expr (impl_fndecl, 2,
- fold_convert (arg_type[0], args[0]),
- fold_convert (arg_type[1], args[1]));
+ fully_fold_convert (arg_type[0], args[0]),
+ fully_fold_convert (arg_type[1], args[1]));
   break;
 case 3:
   call = build_call_expr (impl_fndecl, 3,
- fold_convert (arg_type[0], args[0]),
- fold_convert (arg_type[1], args[1]),
- fold_convert (arg_type[2], args[2]));
+ fully_fold_convert (arg_type[0], args[0]),
+ fully_fold_convert (arg_type[1], args[1]),
+ fully_fold_convert (arg_type[2], args[2]));
   break;
 default:
   gcc_unreachable ();




Re: default to strict dwarf-2 on powerpc-vxworks

2012-04-24 Thread Jakub Jelinek
On Fri, Apr 20, 2012 at 09:57:24AM +0200, Jakub Jelinek wrote:
 Looks good to me, but IMHO we should go further at this point, and
 default to dwarf_version 4 unless overridden (and override
 also on Darwin too due to its tools not being capable of handling it),
 probably together with making -fno-debug-types-section the default
 (since while DWARF 4 DW_FORM_* etc. support is in gdb/libdw/valgrind/etc.
 for quite some time, e.g. .debug_types support is very recent, and
 furthermore on lots of programs/DSOs .debug_types actually increases
 size instead of decreases.  If a post-linking utility is used
 undoing the extra duplication caused by .debug_types (yeah, I know
 Cary said he'll work on it) is going to be very hard.  I've noticed
 also lots of intra-CU redundancies with current state of .debug_types,
 e.g. lots of DW_TAG_{const,pointer,reference}_type DIEs with the same
 DW_FORM_ref_sig8 hash).
 
 Jason, are you ok with Olivier's patch?

Talked to Jason about this on IRC last night, he is ok with that, so your
patch is ok for trunk.

Jakub


Re: PR c/44774 -Werror=pedantic

2012-04-24 Thread Jason Merrill

On 04/22/2012 03:38 PM, Manuel López-Ibáñez wrote:

Then, let's say we have one of such options. For example, let's call
it -Wx for this example. If the behaviour is consistent with other
group options like -Wall, then:

-Wx is enabled by default (like now)
-Wno-pedantic does not disable -Wx (like now)


-Wno-all doesn't disable the group elements?  I'd think it should, or we 
should warn that it's a useless option.



-Werror=pedantic implies -Werror=x ? (to mimic -pedantic-errors)


I would think so.


For example, -Wmain is enabled by default but also by -Wall and
-pedantic. However, -Werror=all does not enable -Werror=main. Is this
a bug or the desired behaviour?


Sounds like a bug to me.


-Wno-error=pedantic -Werror implies -Werror=x or -Wno-error=x? The
documentation only says that -Wno-error= does not imply anything.


This should work the same as -Wno-error=all does for errors that are 
part of the all group.


I think the bit about -Wno-error not implying anything just means that 
-Wno-error=foo does not imply either -Wfoo or -Wno-foo.


Jason


Re: PR c/44774 -Werror=pedantic

2012-04-24 Thread Manuel López-Ibáñez
On 24 April 2012 17:53, Jason Merrill ja...@redhat.com wrote:
 On 04/22/2012 03:38 PM, Manuel López-Ibáñez wrote:

 Then, let's say we have one of such options. For example, let's call
 it -Wx for this example. If the behaviour is consistent with other
 group options like -Wall, then:

 -Wx is enabled by default (like now)
 -Wno-pedantic does not disable -Wx (like now)


 -Wno-all doesn't disable the group elements?  I'd think it should, or we
 should warn that it's a useless option.

-Wall -Wno-all = no effect. However,
-Wno-all -Wuninitialized  = -Wuninitialized (I think this is intended)
-Wuninitialized -Wno-all = -Wno-uninitialized (I think this is a bug)

 For example, -Wmain is enabled by default but also by -Wall and
 -pedantic. However, -Werror=all does not enable -Werror=main. Is this
 a bug or the desired behaviour?


 Sounds like a bug to me.

It seems there are a lot of bugs to be fixed before we can even
consider to make -Werror=pedantic equivalent to -pedantic-errors.

 -Wno-error=pedantic -Werror implies -Werror=x or -Wno-error=x? The
 documentation only says that -Wno-error= does not imply anything.


 This should work the same as -Wno-error=all does for errors that are part of
 the all group.

-Wuninitialized -Wno-error=all -Werror  = -Werror=uninitialized

but I think this may be a bug and -Wno-error=all is simply ignored as
-Werror=all is.

Anyway, the only reliable way to solve all these bugs is to encode
group options in the .opt files, and this seems to be non-trivial.
http://gcc.gnu.org/PR53072

If anyone wishes to give it a try, you are very welcome.

Cheers,

Manuel.


[Patch, ARM][0/2] Preparation for Epilogue in RTL

2012-04-24 Thread Greta Yorsh
The following patches perform code reorganization in preparation for
epilogue generation in RTL.

[1/2] move the code of the special predicates load_multiple_operation and
store_multiple_operation into a separate function ldm_stm_operation_p
[2/2] generalize ldm_stm_operation_p 

No regression on qemu for arm-none-eabi neon softfp arm/thumb.

Ok for trunk?

Thanks,
Greta





Go patch committed: Fix order of evaluation

2012-04-24 Thread Ian Lance Taylor
This patch to the Go frontend fixes the order of evaluation for a
statement like m[0] = len(m) when m is a map.  I added a test case for
this to the master testsuite.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.

Ian

diff -r 5b5853844f00 go/gogo.cc
--- a/go/gogo.cc	Tue Apr 24 07:52:37 2012 -0700
+++ b/go/gogo.cc	Tue Apr 24 09:22:09 2012 -0700
@@ -2216,12 +2216,18 @@
   Expression::traverse(&init, &find_eval_ordering);
 }
 
-  if (find_eval_ordering.size() <= 1)
-{
-  // If there is only one expression with a side-effect, we can
-  // leave it in place.
-  return TRAVERSE_CONTINUE;
-}
+  size_t c = find_eval_ordering.size();
+  if (c == 0)
+return TRAVERSE_CONTINUE;
+
+  // If there is only one expression with a side-effect, we can
+  // usually leave it in place.  However, for an assignment statement,
+  // we need to evaluate an expression on the right hand side before
+  // we evaluate any index expression on the left hand side, so for
+  // that case we always move the expression.  Otherwise we mishandle
+  // m[0] = len(m) where m is a map.
+  if (c == 1 && s->classification() != Statement::STATEMENT_ASSIGNMENT)
+return TRAVERSE_CONTINUE;
 
   bool is_thunk = s->thunk_statement() != NULL;
   for (Find_eval_ordering::const_iterator p = find_eval_ordering.begin();


[Patch, ARM][1/2] add ldm_stm_operation_p

2012-04-24 Thread Greta Yorsh
Move the code of the special predicates load_multiple_operation and
store_multiple_operation into a separate function. No change in
functionality. 

gcc/ChangeLog

2012-04-24  Ian Bolton  ian.bolton at arm.com
Sameera Deshpande  sameera.deshpande at arm.com
Greta Yorsh  greta.yorsh at arm.com

* config/arm/arm-protos.h (ldm_stm_operation_p): New declaration.
* config/arm/arm.c (ldm_stm_operation_p): New function.
* config/arm/predicates.md (load_multiple_operation): Update
predicate.
(store_multiple_operation): Likewise.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 900d09a..7da0e90 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -62,6 +62,7 @@ extern bool arm_legitimize_reload_address (rtx *, enum 
machine_mode, int, int,
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
int);
 extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
+extern bool ldm_stm_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e5779ce..74f4abf 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10138,6 +10138,150 @@ adjacent_mem_locations (rtx a, rtx b)
   return 0;
 }
 
+/* Return true if OP is a valid load or store multiple operation.  LOAD is true
+   for load operations, false for store operations.
+   The pattern we are trying to match for load is:
+     [(SET (R_d0) (MEM (PLUS (addr) (offset))))
+      (SET (R_d1) (MEM (PLUS (addr) (offset + <reg_increment>))))
+       :
+       :
+      (SET (R_dn) (MEM (PLUS (addr) (offset + n * <reg_increment>))))
+     ]
+     where
+     1.  If offset is 0, first insn should be (SET (R_d0) (MEM (src_addr))).
+     2.  REGNO (R_d0) < REGNO (R_d1) < ... < REGNO (R_dn).
+     3.  If consecutive is TRUE, then for kth register being loaded,
+         REGNO (R_dk) = REGNO (R_d0) + k.
+   The pattern for store is similar.  */
+bool
+ldm_stm_operation_p (rtx op, bool load)
+{
+  HOST_WIDE_INT count = XVECLEN (op, 0);
+  rtx reg, mem, addr;
+  unsigned regno;
+  HOST_WIDE_INT i = 1, base = 0, offset = 0;
+  rtx elt;
+  bool addr_reg_in_reglist = false;
+  bool update = false;
+  int reg_increment;
+  int offset_adj;
+
+  reg_increment = 4;
+  offset_adj = 0;
+
+  if (count <= 1
+      || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
+      || (load && !REG_P (SET_DEST (XVECEXP (op, 0, offset_adj)))))
+    return false;
+
+  /* Check if this is a write-back.  */
+  elt = XVECEXP (op, 0, offset_adj);
+  if (GET_CODE (SET_SRC (elt)) == PLUS)
+{
+  i++;
+  base = 1;
+  update = true;
+
+  /* The offset adjustment must be the number of registers being
+ popped times the size of a single register.  */
+  if (!REG_P (SET_DEST (elt))
+  || !REG_P (XEXP (SET_SRC (elt), 0))
+  || (REGNO (SET_DEST (elt)) != REGNO (XEXP (SET_SRC (elt), 0)))
+  || !CONST_INT_P (XEXP (SET_SRC (elt), 1))
+  || INTVAL (XEXP (SET_SRC (elt), 1)) !=
+ ((count - 1 - offset_adj) * reg_increment))
+return false;
+}
+
+  i = i + offset_adj;
+  base = base + offset_adj;
+  /* Perform a quick check so we don't blow up below.  */
+  if (count <= i)
+    return false;
+
+  elt = XVECEXP (op, 0, i - 1);
+  if (GET_CODE (elt) != SET)
+return false;
+
+  if (load)
+{
+  reg = SET_DEST (elt);
+  mem = SET_SRC (elt);
+}
+  else
+{
+  reg = SET_SRC (elt);
+  mem = SET_DEST (elt);
+}
+
+  if (!REG_P (reg) || !MEM_P (mem))
+return false;
+
+  regno = REGNO (reg);
+  addr = XEXP (mem, 0);
+  if (GET_CODE (addr) == PLUS)
+{
+  if (!CONST_INT_P (XEXP (addr, 1)))
+   return false;
+
+  offset = INTVAL (XEXP (addr, 1));
+  addr = XEXP (addr, 0);
+}
+
+  if (!REG_P (addr))
+return false;
+
+  for (; i < count; i++)
+{
+  elt = XVECEXP (op, 0, i);
+  if (GET_CODE (elt) != SET)
+return false;
+
+  if (load)
+{
+  reg = SET_DEST (elt);
+  mem = SET_SRC (elt);
+}
+  else
+{
+  reg = SET_SRC (elt);
+  mem = SET_DEST (elt);
+}
+
+      if (!REG_P (reg)
+          || GET_MODE (reg) != SImode
+          || REGNO (reg) <= regno
+          || !MEM_P (mem)
+          || GET_MODE (mem) != SImode
+          || ((GET_CODE (XEXP (mem, 0)) != PLUS
+               || !rtx_equal_p (XEXP (XEXP (mem, 0), 0), addr)
+               || !CONST_INT_P (XEXP (XEXP (mem, 0), 1))
+               || (INTVAL (XEXP (XEXP (mem, 0), 1)) !=
+                   offset + (i - base) * reg_increment))
+              && (!REG_P (XEXP (mem, 0))
+                  || offset + (i - base) * reg_increment != 0)))
+        return false;
+
+  regno = REGNO (reg);
+  if (regno == 

[Patch, ARM][2/2] generalize ldm_stm_operation_p

2012-04-24 Thread Greta Yorsh
Generalize ldm_stm_operation_p with additional parameters that will be used
by epilogue patterns:
  * machine mode to support both SImode and DFmode registers
  * flag to request consecutive registers in the register list
  * flag to indicate whether PC is in the register list
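
To make the new interface concrete, here is a sketch of how a caller
might use it (illustrative values only; the real call sites are in the
predicates.md hunk at the end of this patch):

/* Illustrative only: SImode with non-consecutive registers matches a
   plain LDM; DFmode with consecutive registers would match a VLDM.  */
static bool
example_load_multiple_check (rtx op)
{
  return ldm_stm_operation_p (op, /*load=*/true, SImode,
                              /*consecutive=*/false, /*return_pc=*/false);
}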

gcc/ChangeLog

2012-04-24  Ian Bolton  ian.bolton at arm.com
Sameera Deshpande  sameera.deshpande at arm.com
Greta Yorsh  greta.yorsh at arm.com

* config/arm/arm-protos.h (ldm_stm_operation_p): New parameters.
* config/arm/arm.c (ldm_stm_operation_p): New parameters.
* config/arm/predicates.md (load_multiple_operation): Add arguments.
(store_multiple_operation): Likewise.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 753e109..efb5b9f 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -62,7 +62,8 @@ extern bool arm_legitimize_reload_address (rtx *, enum 
machine_mode, int, int,
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
int);
 extern int thumb1_legitimate_address_p (enum machine_mode, rtx, int);
-extern bool ldm_stm_operation_p (rtx, bool);
+extern bool ldm_stm_operation_p (rtx, bool, enum machine_mode mode,
+ bool, bool);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4216d05..5477de2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10139,7 +10139,9 @@ adjacent_mem_locations (rtx a, rtx b)
 }
 
 /* Return true if OP is a valid load or store multiple operation.  LOAD is true
-   for load operations, false for store operations.
+   for load operations, false for store operations.  CONSECUTIVE is true
+   if the register numbers in the operation must be consecutive in the register
+   bank. RETURN_PC is true if value is to be loaded in PC.
The pattern we are trying to match for load is:
     [(SET (R_d0) (MEM (PLUS (addr) (offset))))
      (SET (R_d1) (MEM (PLUS (addr) (offset + <reg_increment>))))
@@ -10154,20 +10156,31 @@ adjacent_mem_locations (rtx a, rtx b)
  REGNO (R_dk) = REGNO (R_d0) + k.
The pattern for store is similar.  */
 bool
-ldm_stm_operation_p (rtx op, bool load)
+ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
+ bool consecutive, bool return_pc)
 {
   HOST_WIDE_INT count = XVECLEN (op, 0);
   rtx reg, mem, addr;
   unsigned regno;
+  unsigned first_regno;
   HOST_WIDE_INT i = 1, base = 0, offset = 0;
   rtx elt;
   bool addr_reg_in_reglist = false;
   bool update = false;
   int reg_increment;
   int offset_adj;
+  int regs_per_val;
 
-  reg_increment = 4;
-  offset_adj = 0;
+  /* If not in SImode, then registers must be consecutive
+ (e.g., VLDM instructions for DFmode).  */
+  gcc_assert ((mode == SImode) || consecutive);
+  /* Setting return_pc for stores is illegal.  */
+  gcc_assert (!return_pc || load);
+
+  /* Set up the increments and the regs per val based on the mode.  */
+  reg_increment = GET_MODE_SIZE (mode);
+  regs_per_val = reg_increment / 4;
+  offset_adj = return_pc ? 1 : 0;
 
   if (count <= 1
   || GET_CODE (XVECEXP (op, 0, offset_adj)) != SET
@@ -10195,9 +10208,11 @@ ldm_stm_operation_p (rtx op, bool load)
 
   i = i + offset_adj;
   base = base + offset_adj;
-  /* Perform a quick check so we don't blow up below.  */
-  if (count <= i)
-    return false;
+  /* Perform a quick check so we don't blow up below.  If only one reg is
+     loaded, success depends on the type: VLDM can do just one reg,
+     LDM must do at least two.  */
+  if ((count <= i) && (mode == SImode))
+    return false;
 
   elt = XVECEXP (op, 0, i - 1);
   if (GET_CODE (elt) != SET)
@@ -10218,6 +10233,7 @@ ldm_stm_operation_p (rtx op, bool load)
 return false;
 
   regno = REGNO (reg);
+  first_regno = regno;
   addr = XEXP (mem, 0);
   if (GET_CODE (addr) == PLUS)
 {
@@ -10249,10 +10265,13 @@ ldm_stm_operation_p (rtx op, bool load)
 }
 
   if (!REG_P (reg)
-  || GET_MODE (reg) != SImode
+  || GET_MODE (reg) != mode
       || REGNO (reg) <= regno
+      || (consecutive
+          && (REGNO (reg) !=
+              (unsigned int) (first_regno + regs_per_val * (i - base))))
   || !MEM_P (mem)
-  || GET_MODE (mem) != SImode
+  || GET_MODE (mem) != mode
   || ((GET_CODE (XEXP (mem, 0)) != PLUS
   || !rtx_equal_p (XEXP (XEXP (mem, 0), 0), addr)
   || !CONST_INT_P (XEXP (XEXP (mem, 0), 1))
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 20a64ec..428f9e0 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -380,13 +380,17 @@
 (define_special_predicate "load_multiple_operation"
   (match_code "parallel")
 {
- return ldm_stm_operation_p (op, 

move default_tree_printer out of toplev.c

2012-04-24 Thread Manuel López-Ibáñez
It seems an obvious cleanup to me.

OK?

2012-04-24  Manuel López-Ibáñez  m...@gcc.gnu.org

gcc/
* tree-pretty-print.h (default_tree_printer): Do not declare.
* tree-diagnostic.c: Include tree-pretty-print.h, tree-pass.h and
intl.h.
(default_tree_diagnostic_starter): Make static.
(default_tree_printer): Move to here. Make static.
(tree_diagnostics_defaults): New.
* tree-diagnostic.h (default_tree_diagnostic_starter): Do not declare.
* tree.c (free_lang_data): Use tree_diagnostics_defaults.
* toplev.c: Do not include tree-pass.h.
(default_tree_printer): Move from here.
(general_init): Use tree_diagnostics_defaults.


move-tree-default-printer.diff
Description: Binary data


[Patch, ARM] rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue

2012-04-24 Thread Greta Yorsh
Rename thumb_unexpanded_epilogue to thumb1_unexpanded_epilogue.

In preparation for epilogue generation in RTL, and anyway it's the right name
for this function.

Ok for trunk?

Thanks,
Greta

gcc/ChangeLog

2012-04-24  Ian Bolton  ian.bolton at arm.com
Sameera Deshpande  sameera.deshpande at arm.com
Greta Yorsh  greta.yorsh at arm.com

* config/arm/arm-protos.h (thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.
* config/arm/arm.c (thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.
* config/arm/arm.md (thumb_unexpanded_epilogue): Rename to...
(thumb1_unexpanded_epilogue): ...this.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 21a89aa..0c4bb96 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -178,7 +178,7 @@ extern int arm_float_words_big_endian (void);
 
 /* Thumb functions.  */
 extern void arm_init_expanders (void);
-extern const char *thumb_unexpanded_epilogue (void);
+extern const char *thumb1_unexpanded_epilogue (void);
 extern void thumb1_expand_prologue (void);
 extern void thumb1_expand_epilogue (void);
 extern const char *thumb1_output_interwork (void);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5d9cbc5..0b7f3c8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22032,7 +22032,7 @@ thumb1_extra_regs_pushed (arm_stack_offsets *offsets, 
bool for_prologue)
 
 /* The bits which aren't usefully expanded as rtl.  */
 const char *
-thumb_unexpanded_epilogue (void)
+thumb1_unexpanded_epilogue (void)
 {
   arm_stack_offsets *offsets;
   int regno;
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4f6d965..ed33c9b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10673,7 +10673,7 @@
   if (TARGET_32BIT)
 return arm_output_epilogue (NULL);
   else /* TARGET_THUMB1 */
-return thumb_unexpanded_epilogue ();
+return thumb1_unexpanded_epilogue ();
   
   ; Length is absolute worst case
   [(set_attr "length" "44")


[rfc] PR tree-optimization/52633 - ICE due to vectorizer pattern detection collision

2012-04-24 Thread Ulrich Weigand
Hello,

PR 52633 is caused by bad interaction between two different vectorizer
pattern recognition passed.  A minimal test case is:

void
test (unsigned short *x, signed char *y)
{
  int i;
  for (i = 0; i < 32; i++)
    x[i] = (short) (y[i] << 5);
}

built with cc1 -O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp
on an arm-linux-gnueabi target.

Before the vectorizer, we have something like:

  short unsigned int D.4976;
  int D.4975;
  int D.4974;
  signed char D.4973;
  signed char * D.4972;
  short unsigned int * D.4970;
[snip]
  D.4973_8 = *D.4972_7;
  D.4974_9 = (int) D.4973_8;
  D.4975_10 = D.4974_9  5;
  D.4976_11 = (short unsigned int) D.4975_10;
  *D.4970_5 = D.4976_11;


The pattern recognizer now goes through its list of patterns it tries
to detect.  The first successful one is vect_recog_over_widening_pattern.
This will annotate the statements (via related_stmt fields):

  D.4973_8 = *D.4972_7;
  D.4974_9 = (int) D.4973_8;
--> D.4992_26 = (signed short) D.4973_8
  D.4975_10 = D.4974_9 << 5;
--> patt.16_31 = D.4992_26 << 5
  D.4976_11 = (short unsigned int) D.4975_10;
--> D.4994_32 = (short unsigned int) patt.16_31
  *D.4970_5 = D.4976_11;


In the next step, vect_recog_widen_shift_pattern *also* matches, and
creates a new annotation for the shift statement (using a widening
shift):

  D.4973_8 = *D.4972_7;
  D.4974_9 = (int) D.4973_8;
--> D.4992_26 = (signed short) D.4973_8
  D.4975_10 = D.4974_9 << 5;
   [--> patt.16_31 = D.4992_26 << 5]
--> patt.17_33 = D.4973_8 w<< 5
  D.4976_11 = (short unsigned int) D.4975_10;
--> D.4994_32 = (short unsigned int) patt.16_31
  *D.4970_5 = D.4976_11;


Since the original statement can only point to a single related_stmt,
the statement setting patt.16_31 is no longer referred to as related_stmt
by any other statement.  This causes it to no longer be considered
relevant for the vectorizer.

However, the statement:
--> D.4994_32 = (short unsigned int) patt.16_31
*is* still considered relevant.  While analysing it, however, the
vectorizer follows through to the def statement for patt.16_31,
and gets quite confused to find that it doesn't have a vectype
(because it wasn't considered by the vectorizer).  The symptom
is a failing assertion
  gcc_assert (*vectype != NULL_TREE);
in vect_is_simple_use_1.


Now, it seems quite unusual for multiple patterns to match for a
single original statement.  In fact, most pattern recognizers
explicitly refuse to even consider statements that were already
recognized.  However, vect_recog_widen_shift_pattern makes an
exception:

  /* This statement was also detected as over-widening operation (it can't
 be any other pattern, because only over-widening detects shifts).
 LAST_STMT is the final type demotion statement, but its related
 statement is shift.  We analyze the related statement to catch cases:

 orig code:
  type a_t;
  itype res;
  TYPE a_T, res_T;

  S1 a_T = (TYPE) a_t;
  S2 res_T = a_T << CONST;
  S3 res = (itype)res_T;

  (size of type * 2 <= size of itype
   and size of itype * 2 <= size of TYPE)

 code after over-widening pattern detection:

  S1 a_T = (TYPE) a_t;
   --> a_it = (itype) a_t;
  S2 res_T = a_T << CONST;
  S3 res = (itype)res_T;  <--- LAST_STMT
   --> res = a_it << CONST;

 after widen_shift:

  S1 a_T = (TYPE) a_t;
   --> a_it = (itype) a_t; - redundant
  S2 res_T = a_T << CONST;
  S3 res = (itype)res_T;
   --> res = a_t w<< CONST;

  i.e., we replace the three statements with res = a_t w<< CONST.  */


If everything were indeed as described in that comment, things would work out
fine.  However, what is described above as "code after over-widening pattern
detection" is only one of two possible outcomes of that latter pattern; the
other is the one that happens in the current test case, where we still have
a final type conversion left after applying the over-widening pattern.


I guess one could try to distinguish the two cases somehow and handle both;
but the overall approach seems quite fragile to me; it doesn't look really
maintainable to have to rely on so many details of the operation of one
particular pattern detection function while writing another one, or else
risk creating subtle problems like the one described above.

So I was wondering why vect_recog_widen_shift_pattern tries to take advantage
of an already recognized over-widening pattern.  But indeed, if it does not,
it will generate less efficient code in cases like the above test case: by
itself, vect_recog_widen_shift_pattern would generate code to expand the
char to a short, then do a widening shift resulting in an int, followed
by narrowing back down to a short.
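
As a C-level sketch (illustrative only, not vectorizer output), that
less efficient per-element sequence would be:

/* Sketch: what widen_shift alone would produce for each element.  */
unsigned short
widen_shift_only (signed char y_i)
{
  short a_sh = (short) y_i;        /* expand the char to a short */
  int wide = (int) a_sh << 5;      /* widening shift resulting in an int */
  return (unsigned short) wide;    /* narrow back down to a short */
}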


However, even so, it might actually be preferable to just handle such
cases within vect_recog_widen_shift_pattern itself.  Indeed, the routine
already looks for another 

[PATCH, rtl-optimization]: Enhance post-reload compare elimination pass to handle arithmetic operations with implicit extensions

2012-04-24 Thread Uros Bizjak
Hello!

Back to converting x86 to post-reload compare elimination pass.

Arithmetic operations in x86_64 can implicitly zero extend the result,
and set flags according to the non-extended result.  The following testcase
should exercise both features:

--cut here--
void foo (long int, int);

void test (unsigned int a, unsigned int b)
{
  long int x = a + b;
  int f = a + b > 0;

  foo (x, f);
}
--cut here--

However, with the pre-reload implementation, gcc was able to merge the
operation + compare, while ignoring the implicit extension:

test:
	addl	%esi, %edi	# 8	*addsi_2/3
	setne	%sil	# 19	*setcc_qi
	movl	%edi, %edi	# 11	*zero_extendsidi2_rex64/1
	movzbl	%sil, %esi	# 20	*zero_extendqisi2
	jmp	foo	# 14	*sibcall

Unpatched post-reload compare elimination was able to merge operation
+ zero extension (please note that attached WIP target patch is
needed):

test:
	addl	%esi, %edi	# 7	addsi_1_zext/2
	xorl	%esi, %esi	# 22	*movsi_xor
	testl	%edi, %edi	# 23	*cmpsi_ccno_1/1
	setne	%sil	# 24	*setcc_qi_slp
	jmp	foo	# 14	*sibcall

And patched post-reload compare finally produces optimal code:

test:
	addl	%esi, %edi	# 7	*addsi_2_zext/2
	setne	%sil	# 19	*setcc_qi
	movzbl	%sil, %esi	# 20	*zero_extendqisi2
	jmp	foo	# 14	*sibcall

Where the operation looks like:

#(insn:TI 7 9 19 2 (parallel [
#            (set (reg:DI 5 di)
#                (zero_extend:DI (plus:SI (reg/v:SI 4 si [orig:64 b ] [64])
#                        (reg/v:SI 5 di [orig:63 a ] [63]))))
#            (set (reg:CCZ 17 flags)
#                (compare:CCZ (plus:SI (reg/v:SI 4 si [orig:64 b ] [64])
#                        (reg/v:SI 5 di [orig:63 a ] [63]))
#                    (const_int 0 [0])))
#        ]) cmpadd.c:5 261 {*addsi_2_zext}
#     (expr_list:REG_DEAD (reg/v:SI 4 si [orig:64 b ] [64])
#        (nil)))
	addl	%esi, %edi	# 7	*addsi_2_zext/2	[length = 2]

Attached patch teaches post-reload compare optimization how to handle
arithmetic operations with implicit extensions. The input is:

(insn 7 4 8 2 (parallel [
            (set (reg:DI 5 di)
                (zero_extend:DI (plus:SI (reg/v:SI 4 si [orig:64 b ] [64])
                        (reg/v:SI 5 di [orig:63 a ] [63]))))
            (clobber (reg:CC 17 flags))
        ]) cmpadd.c:5 253 {addsi_1_zext}
     (nil))

(insn 8 7 9 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 5 di [orig:59 D.1714 ] [59])
            (const_int 0 [0]))) cmpadd.c:6 2 {*cmpsi_ccno_1}
     (nil))

It simply checks that the REGNOs of the registers are the same; the mode of
the compared register (against zero!) is checked against the mode of the
inner part of the extension instruction.

2012-04-24  Uros Bizjak  ubiz...@gmail.com

* compare-elim.c (try_eliminate_compare): Also handle operands with
implicit extensions.

Patch is lightly tested on x86_64-pc-linux-gnu, together with attached
WIP patch.

Opinions? Since it looks quite safe, is it OK for mainline?

Uros.
Index: compare-elim.c
===
--- compare-elim.c  (revision 186721)
+++ compare-elim.c  (working copy)
@@ -563,10 +563,26 @@
  Validate that PREV_CLOBBER itself does in fact refer to IN_A.  Do
  recall that we've already validated the shape of PREV_CLOBBER.  */
   x = XVECEXP (PATTERN (insn), 0, 0);
-  if (!rtx_equal_p (SET_DEST (x), in_a))
+  if (rtx_equal_p (SET_DEST (x), in_a))
+    cmp_src = SET_SRC (x);
+
+  /* Also check operations with implicit extensions, e.g.:
+     [(set (reg:DI)
+          (zero_extend:DI (plus:SI (reg:SI)(reg:SI))))
+      (set (reg:CCZ flags)
+          (compare:CCZ
+            (plus:SI (reg:SI)(reg:SI))
+            (const_int 0)))]   */
+  else if (REG_P (SET_DEST (x))
+           && REG_P (in_a)
+           && REGNO (SET_DEST (x)) == REGNO (in_a)
+           && (GET_CODE (SET_SRC (x)) == ZERO_EXTEND
+               || GET_CODE (SET_SRC (x)) == SIGN_EXTEND)
+           && GET_MODE (XEXP (SET_SRC (x), 0)) == GET_MODE (in_a))
+    cmp_src = XEXP (SET_SRC (x), 0);
+  else
 return false;
-  cmp_src = SET_SRC (x);
-  
+
   /* Determine if we ought to use a different CC_MODE here.  */
   flags = maybe_select_cc_mode (cmp, cmp_src, cmp->in_b);
   if (flags == NULL)
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 186721)
+++ config/i386/i386.c  (working copy)
@@ -17861,19 +17861,31 @@
   emit_insn (gen_rtx_SET (VOIDmode, dest, x));
 }
 
-/* Return TRUE or FALSE depending on whether the first SET in INSN
-   has source and destination with matching CC modes, and that the
+/* Return TRUE or FALSE depending on whether the first SET from COMPARE
+   in INSN has source and destination with matching CC modes, and that the
 

[testsuite,4.7,committed]: fix gcc.c-torture/compile/pr52891-2.c

2012-04-24 Thread Georg-Johann Lay
Committed as obvious.

http://gcc.gnu.org/viewcvs?view=revision&revision=186775

Johann

PR testsuite/52641
PR tree-optimization/52891
* gcc.c-torture/compile/pr52891-2.c: Fix test for 16-bit int.


Re: [PR tree-optimization/52558]: RFC: questions on store data race

2012-04-24 Thread Aldy Hernandez

On 04/13/12 03:46, Richard Guenther wrote:

On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandezal...@redhat.com  wrote:


Richard.  Thanks so much for reviewing and providing an alternative 
approach, which AFAICT provides superior results.



A similar effect could be obtained by keeping a flag whether we entered the
path that stored the value before the transform.  Thus, do

   lsm = g2;  // technically not needed, if you also handle loads
   flag = false;
   for (...)
 {
if (g1)
  {
 if (flag)
   g2 = lsm;
  }
else
  {
 lsm = 0;
 flag = true;
  }
 }
   if (flag)
 g2 = lsm;


I have implemented this in the attached patch, and it seems to be 
generating better code than my other if-changed approach.  It also 
avoids generating unnecessary loads that may trap.


For the original testcase:

int g_1 = 1;
int g_2 = 0;

int func_1(void)
{
 int l;
  for (l = 0; l < 1234; l++)
 {
   if (g_1)
 return l;
   else
 g_2 = 0;
 }
 return 999;
}
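
At the source level, the transform amounts to something like the
following sketch (illustrative only, not actual compiler output; the
g_2_lsm and flag names are made up):

extern int g_1, g_2;

int func_1_with_flag (void)
{
  int l, g_2_lsm = 0, flag = 0;  /* shadow of g_2 plus a dirty flag */
  for (l = 0; l < 1234; l++)
    {
      if (g_1)
        {
          if (flag)              /* flush only if we actually stored */
            g_2 = g_2_lsm;
          return l;
        }
      g_2_lsm = 0;               /* stores hit the shadow only */
      flag = 1;
    }
  if (flag)
    g_2 = g_2_lsm;
  return 999;
}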

After all optimizations are done, we are now generating the following 
for the flags approach:


new:
func_1:
	movl	g_1(%rip), %edx
	xorl	%eax, %eax
	testl	%edx, %edx
	je	.L5
	rep
	ret
.L5:
	movl	$0, g_2(%rip)
	movw	$999, %ax
	ret

Which is significantly better than the unmodified trunk of:

func_1:
	movl	g_1(%rip), %edx
	movl	g_2(%rip), %eax
	testl	%edx, %edx
	je	.L5
	movl	%eax, g_2(%rip)
	xorl	%eax, %eax
	ret
.L5:
	movl	$0, g_2(%rip)
	movl	$999, %eax
	ret

And in other less contrived cases, we generate correct code that avoids 
writes on code paths that did not have it.  For example, in Hans'
register promotion example:


int count;

struct obj {
int data;
struct obj *next;
} *q;

void func()
{
  struct obj *p;
  for (p = q; p; p = p->next)
    if (p->data > 0)
count++;
}

Under the new memory model we should avoid promoting count to a 
register and unilaterally writing to it upon exiting the loop.


With the attached patch, we now generate:

func:
.LFB0:
	movq	q(%rip), %rax
	xorl	%ecx, %ecx
	movl	count(%rip), %edx
	testq	%rax, %rax
	je	.L1
.L9:
	movl	(%rax), %esi
	testl	%esi, %esi
	jle	.L3
	addl	$1, %edx
	movl	$1, %ecx
.L3:
	movq	8(%rax), %rax
	testq	%rax, %rax
	jne	.L9
	testb	%cl, %cl
	jne	.L12
.L1:
	rep
	ret
.L12:
	movl	%edx, count(%rip)
	ret

Which is as expected.

I don't understand your suggestion of:

lsm = g2;  // technically not needed, if you also handle loads

that would allow to extend this to cover the cases where the access may
eventually trap (if you omit the load before the loop).


Can't I just remove the lsm=g2?  What's this about also handling loads?



Not sure which is more efficient (you can eventually combine flag variables
for different store locations, combining the if-changed compare is not so easy
I guess).


So far, I see better code generated with the flags approach, so...


1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2.
  In the PR, Richard mentions that both FRE/PRE can figure this out, but they
are not run after store motion.  DOM should also be able to, but apparently
does not :(.  Tips?


Well.  Schedule FRE after loop opts ...

DOM is not prepared to handle loads/stores with differing VOPs - it was never
updated really after the alias-improvements merge.


2. The GIMPLE_CONDs I create in this patch are currently causing problems
with complex floats/doubles.  do_jump is complaining that it can't compare
them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex
values since complex lowering has already happened (??). Is there a more
canonical way of creating a GIMPLE_COND that may contain complex floats at
this stage?


As you are handling redundant stores you want a bitwise comparison anyway,
but yes, complex compares need to be lowered.  But as said, you want a
bitwise comparison, not a value-comparison.  You can try using


I can ignore all this.  I have implemented both alternatives, with a 
target hook as suggested, but I'm thinking of removing it altogether and 
just leaving the flags approach.



Few comments on the patch itself.

+ new_bb = split_edge (ex);
+ then_bb = create_empty_bb (new_bb);
+ if (current_loops && new_bb->loop_father)
+   add_bb_to_loop (then_bb, new_bb->loop_father);

+ gsi = gsi_start_bb (new_bb);
+ t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);

Hmm, why do you need to re-load the value?  Don't you have both
the value as it was loaded before the loop and the new value (the lsm tmp reg)
already?  Why not simply remember the SSA 

Re: move default_tree_printer out of toplev.c

2012-04-24 Thread Jason Merrill

OK.

Jason


Re: [PATCH, rtl-optimization]: Enhance post-reload compare elimination pass to handle arithmetic operations with implicit extensions

2012-04-24 Thread Uros Bizjak
On Tue, Apr 24, 2012 at 6:56 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Back to converting x86 to post-reload compare elimination pass.

 Arithmetic operations in x86_64 can implicitly zero extend the result,
 and set flags according to the non-extended result. Following testcase
 should exercise both features:

 2012-04-24  Uros Bizjak  ubiz...@gmail.com

        * compare-elim.c (try_eliminate_compare): Also handle operands with
        implicit extensions.

 Patch is lightly tested on x86_64-pc-linux-gnu, together with attached
 WIP patch.

 Opinions? Since it looks quite safe, is it OK for mainline?

The full bootstrap and regression test passed on x86_64-pc-linux-gnu
{,-m32} with attached x86 WIP patch (the updated patch disables some
interfering peephole2s). I am confident that compare-elim.c change
doesn't cause regressions, so this message is a formal request for a
patch inclusion.

BTW: This is also the first time x86 bootstrapped with enabled
post-reload compare elimination pass.

Uros.
Index: i386.c
===
--- i386.c  (revision 186721)
+++ i386.c  (working copy)
@@ -17861,19 +17861,32 @@ ix86_split_copysign_var (rtx operands[])
   emit_insn (gen_rtx_SET (VOIDmode, dest, x));
 }
 
-/* Return TRUE or FALSE depending on whether the first SET in INSN
-   has source and destination with matching CC modes, and that the
+/* Return TRUE or FALSE depending on whether the first SET from COMPARE
+   in INSN has source and destination with matching CC modes, and that the
CC mode is at least as constrained as REQ_MODE.  */
 
 bool
 ix86_match_ccmode (rtx insn, enum machine_mode req_mode)
 {
-  rtx set;
+  rtx pat, set;
   enum machine_mode set_mode;
+  int i;
 
-  set = PATTERN (insn);
-  if (GET_CODE (set) == PARALLEL)
-set = XVECEXP (set, 0, 0);
+  pat = PATTERN (insn);
+  if (GET_CODE (pat) == PARALLEL)
+    {
+      set = NULL_RTX;
+      for (i = 0; i < XVECLEN (pat, 0); i++)
+	{
+	  set = XVECEXP (pat, 0, i);
+	  if (GET_CODE (set) == SET
+	      && GET_CODE (SET_SRC (set)) == COMPARE)
+	    break;
+	}
+    }
+  else
+    set = pat;
+
   gcc_assert (GET_CODE (set) == SET);
   gcc_assert (GET_CODE (SET_SRC (set)) == COMPARE);
 
@@ -39090,6 +39103,8 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_FIXED_CONDITION_CODE_REGS ix86_fixed_condition_code_regs
 #undef TARGET_CC_MODES_COMPATIBLE
 #define TARGET_CC_MODES_COMPATIBLE ix86_cc_modes_compatible
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM FLAGS_REG
 
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG ix86_reorg
Index: i386.md
===
--- i386.md (revision 186769)
+++ i386.md (working copy)
@@ -5808,14 +5808,14 @@
	(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))])
 
 (define_insn "*add<mode>_2"
-  [(set (reg FLAGS_REG)
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>,<r>m,<r>")
+	(plus:SWI
+	  (match_operand:SWI 1 "nonimmediate_operand" "%0,0,<r>")
+	  (match_operand:SWI 2 "<general_operand>" "<g>,<r><i>,0")))
+   (set (reg FLAGS_REG)
 	(compare
-	  (plus:SWI
-	    (match_operand:SWI 1 "nonimmediate_operand" "%0,0,<r>")
-	    (match_operand:SWI 2 "<general_operand>" "<g>,<r><i>,0"))
-	  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=<r>,<r>m,<r>")
-	(plus:SWI (match_dup 1) (match_dup 2)))]
+	  (plus:SWI (match_dup 1) (match_dup 2))
+	  (const_int 0)))]
   "ix86_match_ccmode (insn, CCGOCmode)
    && ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
 {
@@ -5857,13 +5857,14 @@
 
 ;; See comment for addsi_1_zext why we do use nonimmediate_operand
 (define_insn "*addsi_2_zext"
-  [(set (reg FLAGS_REG)
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+	(zero_extend:DI
+	  (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r")
+		   (match_operand:SI 2 "x86_64_general_operand" "rme,0"))))
+   (set (reg FLAGS_REG)
 	(compare
-	  (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r")
-		   (match_operand:SI 2 "x86_64_general_operand" "rme,0"))
-	  (const_int 0)))
-   (set (match_operand:DI 0 "register_operand" "=r,r")
-	(zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
+	  (plus:SI (match_dup 1) (match_dup 2))
+	  (const_int 0)))]
   "TARGET_64BIT && ix86_match_ccmode (insn, CCGOCmode)
    && ix86_binary_operator_ok (PLUS, SImode, operands)"
 {
@@ -6090,7 +6091,7 @@
	    (match_operand:SWI 2 "<general_operand>" "<g>,0"))
	  (const_int 0)))
    (clobber (match_scratch:SWI 0 "=<r>,<r>"))]
-  "ix86_match_ccmode (insn, CCGOCmode)
+  "0 && ix86_match_ccmode (insn, CCGOCmode)
    && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
 {
   switch (get_attr_type (insn))
@@ -17257,7 +17258,7 @@
  (clobber (reg:CC FLAGS_REG))])
(set (match_dup 1) (match_dup 0))
(set (reg FLAGS_REG) (compare (match_dup 0) (const_int 0)))]
-  

Re: thoughts on libatomic

2012-04-24 Thread Torvald Riegel
On Mon, 2012-04-23 at 12:29 -0700, Richard Henderson wrote:
 On 04/16/12 11:15, Torvald Riegel wrote:
  - for acq/rel memorders, we don't need seq_cst fences whenever the
atomicity is implemented with a lock
(hostconfig.c:pre_barrier/post_barrier)
 
 Err.. where?  This also seems to conflict with...

I thought on the fallback paths that use locks.  Locks typically use
acq/rel semantics, so for the particular location, we don't need
pre/post barrier code to be run.

Or maybe it didn't happen, and the one below is correct.  Sorry for the
potential mix-up.  I can look at this again after PTO.

 
  - protect_start|end:
- BUG: those are called without additional post|pre_barrier calls in
  cas_n.c, exch_n.c, fop_n.c, load_n.c, store_n.c
  The lock membar guarantees are too weak if memorder for the atomic
  access is seq_cst, and if you use more than one lock.
  (assumes that lock implementation uses typical acq/rel barriers).
 
 ... this.  I don't see what multiple locks has to do with seq vs acqrel.

If we don't run seq_cst membars when falling back to locks, then seq_cst
operations on two locations ordered by acq/rel locks will not be forced
to be in one total order (which seq_cst requires).  They will be ordered
per lock, but not globally.
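
A standard store-buffering litmus test illustrates the point (a sketch
in C11 atomics, not code taken from libatomic):

#include <stdatomic.h>

atomic_int x, y;
int r1, r2;

/* Run concurrently.  With seq_cst (the default order below) the outcome
   r1 == 0 && r2 == 0 is forbidden, because all four accesses must fit
   into one total order.  If x and y were instead each guarded by their
   own acquire/release lock, nothing would order the two critical
   sections against each other, and both threads could read the old
   value.  */
void thread1 (void) { atomic_store (&x, 1); r1 = atomic_load (&y); }
void thread2 (void) { atomic_store (&y, 1); r2 = atomic_load (&x); }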

  - config/posix/lock.c:
- I'm not sure that WATCH_SIZE should almost certainly be the same as
  the cacheline size.  Why?  Which assumptions about the program?
 
 See above.  Plus I assumed that if we have to take a lock, that's got to
 dominate over any cache ping-pong that would be played.

Perhaps the cache ping-pong dominates (and this assumes that the lock is
contended).  But perhaps the cache ping-pong wouldn't be worse than the
contention potentially introduced from a too-coarse-grained lock?

- Do locks need to care about priority inversion?
 
 At the moment I'm leaving this with the pthread implementation.

pthreads offers PI-safe locks too.  But I also don't know whether we
need this. 



Re: [PATCH, rtl-optimization]: Enhance post-reload compare elimination pass to handle arithmetic operations with implicit extensions

2012-04-24 Thread Richard Henderson
On 04/24/12 09:56, Uros Bizjak wrote:
 2012-04-24  Uros Bizjak  ubiz...@gmail.com
 
   * compare-elim.c (try_eliminate_compare): Also handle operands with
   implicit extensions.

Ok.


r~


Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-24 Thread Richard Sandiford
Vladimir Makarov vmaka...@redhat.com writes:
 On 04/23/2012 11:42 AM, Ulrich Weigand wrote:
 Vladimir Makarov wrote:

 I have a mixed feeling with the patch. I've tried it on SPEC2000 on
 x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5%
 (SPECFP on x86), 2% (SPECFP on x86-64), and 0.23% (SPECFP on ARM) in
 comparison with the current algorithm. It is slower too. Although
 the difference is quite insignificant on Corei7, compiler speed
 slowdown achieves 0.4% on SPECFP2000 on arm. The algorithm also
 generates slower code on x86 (1.5% on SPECINT and 5% on SPECFP2000)
 and practically the same average code on x86-64 and ARM (I've tried
 only SPECINT on ARM).
 Was this using NEON on ARM?  The biggest benefits we saw involved
 vector code ...
 Sorry, No.
 On 04/17/2012 04:29 AM, Richard Sandiford wrote:
 Anyway, given those results and your mixed feelings, I think it would
 be best to drop the patch.  It's a lot of code to carry around, and its
 ad-hoc nature would make it hard to change in future.  There must be
 better ways of achieving the same thing.

 It is up to you, Richard.  I don't mind if you commit it into the trunk.

 There is something in your approach too.  If it would be possible to get
 the best of the two methods, we could see a big improvement.
 On s390, the patch is pretty much a straight improvement across the line:
 0.5% on SPECint, 1.7% on SPECfp overall, with the best single improvement
 in GemsFDTD (9.3%), and no regression beyond usual noise levels.
 That is a really big improvement.
 Given that, and the impressive improvements we saw on some ARM benchmarks
 involving vector code (up to 70%), it would really be a pity to see the
 patch going nowhere ...

 Is there anything we can do to help make the patch more acceptable?
 Any suggestions welcome ...

 I did not object the patch.  As I wrote Richard, it is up to him to 
 decide to commit the patch or not (it still is).  I had mixed results on 
 x86-64 -- some better and some worse (but with average the same) with 
 pretty big variations.  Those big variations mean that people could use 
 this for fine-tuning their programs.

 Taking your results for S390 and ARM with Neon into account, I guess it 
 should be included and probably made by default for these 2 targets (for 
 sure for s390).

OK, thanks to both of you.

Ulrich and Andreas: would you be happy for s390 to use this by default?
I'll update the patch and install if so.

Richard


libgo patch committed: Only define PathMax if PATH_MAX is defined

2012-04-24 Thread Ian Lance Taylor
This patch to libgo only defines syscall.PathMax if PATH_MAX is defined
in the system header files.  Bootstrapped on x86_64-unknown-linux-gnu.
Committed to mainline and 4.7 branch.

Ian

diff -r 4692bc350a0a libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh	Tue Apr 24 09:23:40 2012 -0700
+++ b/libgo/mksysinfo.sh	Tue Apr 24 12:11:02 2012 -0700
@@ -275,8 +275,9 @@
   sed -e 's/^\(const \)__\(PC[^= ]*\)\(.*\)$/\1\2 = __\2/' >> ${OUT}
 
 # The PATH_MAX constant.
-grep '^const _PATH_MAX ' gen-sysinfo.go |
+if grep '^const _PATH_MAX ' gen-sysinfo.go >/dev/null 2>&1; then
   echo 'const PathMax = _PATH_MAX' >> ${OUT}
+fi
 
 # epoll constants.
 grep '^const _EPOLL' gen-sysinfo.go |


Re: [Patch, Fortran] PR 52196 add -Wrealloc-lhs(-all)

2012-04-24 Thread Mikael Morin
On 19/04/2012 14:02, Tobias Burnus wrote:
 Updated patch enclosed.
 
 On 02/14/2012 12:42 PM, Tobias Burnus wrote:
 in order to gain an overview for our code whether the recent RESHAPE
 (and friends) bug affects us and to determine for which assignment a
 reallocation happens, useful to mitigate performance issues, I added
 -Wrealloc-lhs and -Wrealloc-lhs-all.

 The flag -Wrealloc-lhs is the more useful flag: It's about arrays of
 intrinsic types, which are more likely to appear in hot loops than
 other types of reallocatable variables such as derived types or
 (scalar) character  variables with deferred length. 
 
 On 02/27/2012 09:59 PM, Mikael Morin wrote:
  In turn, the warning might be printed even if at the end no realloc
  code is generated or present with -O1.
  Can it be caused by the frontend not going in the realloc-lhs
  functions in some cases?  Especially, it seems that there is a missing
  codimension/coindexed condition guarding the warning if I compare to
  the flag_realloc_lhs conditions in trans-expr.c.  I would rather move
  the warnings to a function and call it in the places where we really
  generate the extra code, like it's done for -Warray-temporaries.
 
 Two months later I have finally worked on the patch again and followed
 the suggestion and added the checks to trans-expr.c.
 
 Build and regtested on x86-64-linux.
 OK for the trunk?
 
Looks good. OK. Thanks.

Mikael


Re: [Patch, Fortran] PR52864 - fix actual/formal checks

2012-04-24 Thread Mikael Morin
On 12/04/2012 17:23, Tobias Burnus wrote:
 This patch is a kind of follow up to the other one for the same PR -
 though this one is for a separate test case, it is not a regression and
 it's about actual/formal checks.
 
 When trying to fix the rejects-valid bug, I realized that one function
 was never accessed as a call to expr.c's gfc_check_vardef_context is
 done before. I made some cleanup and added some code to ensure pointer
 CLASS are correctly handled. I am not positive that the removed code is
 unreachable, but I failed to produce reachable code and also the test
 suite passed.
 
 Thus, this patch removed a rejects-valid bug, an accepts-invalid bug,
 cleans up the code a bit and adds a test case for existing checks
 (besides testing the bug fixes).
 
 Build and regtested on x86-64-linux.
 OK for the trunk?
 

Hello, is there a reason to guard the class_pointer condition with
attr.class_ok in the first conditional and with CLASS_DATA(...) != NULL
in the two other ones?
Not that it matters much, and in fact, I think the patch as is is good
enough for committal (yes, it is an OK).
I'm asking as I never know myself what is the correct, canonical way to
handle the class_* hell...
Thanks

Mikael


libgo patch committed: Remove race in use of ../testdata

2012-04-24 Thread Ian Lance Taylor
This patch to the libgo testsuite driver removes a race in the use of
the ../testdata directory.  This is PR go/52462.  Ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.

Ian

diff -r 0b5618e33848 libgo/testsuite/gotest
--- a/libgo/testsuite/gotest	Tue Apr 24 12:11:37 2012 -0700
+++ b/libgo/testsuite/gotest	Tue Apr 24 13:10:27 2012 -0700
@@ -136,18 +136,20 @@
 mkdir $DIR
 
 cd $DIR
+mkdir test
+cd test
 
 if test "$keep" = false; then
-  trap "cd ..; rm -rf $DIR" 0 1 2 3 14 15
+  trap "cd ../..; rm -rf $DIR" 0 1 2 3 14 15
 else
-  trap "cd ..; echo Keeping $DIR" 0 1 2 3 14 15
+  trap "cd ../..; echo Keeping $DIR" 0 1 2 3 14 15
 fi
 
 case $srcdir in
 	/*)
 		;;
 	*)
-		srcdir=../$srcdir
+		srcdir=../../$srcdir
 ;;
 esac
 
@@ -158,7 +160,7 @@
 	/*)
 		;;
 	*)
-		basedir=../$basedir
+		basedir=../../$basedir
 ;;
 esac
 
@@ -189,10 +191,10 @@
 			b=`basename $f`
 			rm -f $b
 			cp $basedir/$f $b
-		elif test -f ../$f; then
+		elif test -f ../../$f; then
 			b=`basename $f`
 			rm -f $b
-			cp ../$f $b
+			cp ../../$f $b
 		else
			echo "file $f not found" 1>&2
 			exit 1
@@ -224,10 +226,10 @@
 			b=`basename $f`
 			rm -f $b
 			cp $basedir/$f $b
-		elif test -f ../$f; then
+		elif test -f ../../$f; then
 			b=`basename $f`
 			rm -f $b
-			cp ../$f $b
+			cp ../../$f $b
 		else
			echo "file $f not found" 1>&2
 			exit 1
@@ -455,27 +457,27 @@
 	exit $status
 	;;
 xyes)
-	rm -rf ../testsuite/*.o
+	rm -rf ../../testsuite/*.o
 	files=`echo *`
 	for f in $files; do
 		if test "$f" = "_obj" || test "$f" = "_test"; then
 			continue
 		fi
-		rm -rf ../testsuite/$f
+		rm -rf ../../testsuite/$f
 		if test -f $f; then
-			cp $f ../testsuite/
+			cp $f ../../testsuite/
 		else
-			ln -s ../$DIR/$f ../testsuite/
+			ln -s ../$DIR/test/$f ../../testsuite/
 		fi
 	done
-	cd ../testsuite
+	cd ../../testsuite
 	rm -rf _obj _test
 	mkdir _obj _test
 	if test "$testname" != ""; then
 	    GOTESTNAME="$testname"
 	export GOTESTNAME
 	fi
-	$MAKE check RUNTESTFLAGS="$RUNTESTFLAGS GOTEST_TMPDIR=$DIR"
+	$MAKE check RUNTESTFLAGS="$RUNTESTFLAGS GOTEST_TMPDIR=$DIR/test"
 	# Useful when using make check-target-libgo
 	cat libgo.log >> libgo-all.log
 	cat libgo.sum >> libgo-all.sum


[C++ Patch] PR 39970

2012-04-24 Thread Paolo Carlini

Hi,

I'm trying to resolve this rather old PR, where we don't reject uses of 
the dot operator in default template arguments, eg:


class blah { int member; };
blah global;

template<int param = global.member>
class template_blah;


I'm enforcing the check by adding to cp_parser a new 
in_template_default_argument_p which can be set in 
cp_parser_default_argument. Then it's just a matter of reading it 
(together with integral_constant_expression_p, otherwise we can make 
mistakes in the argument of, eg, sizeof or decltype) exactly when parsing 
., ->, .* and ->* in order to have a precise location for the error. Is 
this the way we can fix this or can we do better?


Tested x86_64-linux.

Thanks,
Paolo.

//

/cp
2012-04-24  Paolo Carlini  paolo.carl...@oracle.com

PR c++/39970
* parser.h (cp_parser::in_template_default_argument_p): Add.
* parser.c (cp_parser_new): Initialize the latter.
(cp_parser_default_argument): Set the latter to true when parsing
the default argument for a non-type template parameter. 
(cp_parser_primary_expression): When in_template_default_argument_p
&& integral_constant_expression_p reject pointer-to-member operators.
(cp_parser_postfix_expression): Likewise for class member access
operators.

/testsuite
2012-04-24  Paolo Carlini  paolo.carl...@oracle.com

PR c++/39970
* g++.dg/parse/template27.C: New.

Index: testsuite/g++.dg/parse/template27.C
===
--- testsuite/g++.dg/parse/template27.C (revision 0)
+++ testsuite/g++.dg/parse/template27.C (revision 0)
@@ -0,0 +1,22 @@
+// PR c++/39970
+
+struct blah { int member; };
+blah global;
+blah* pglobal;
+
+template <int param = global.member>    // { dg-error "class member access operator" }
+class template1_blah;
+
+template <int* param = &global.member>  // { dg-error "class member access operator" }
+class template2_blah;
+
+template <int param = pglobal->member>  // { dg-error "class member access operator" }
+class template3_blah;
+
+int blah::* pm = &blah::member;
+
+template <int param = global.*pm>       // { dg-error "pointer-to-member operator" }
+class template4_blah;
+
+template <int param = pglobal->*pm>     // { dg-error "pointer-to-member operator" }
+class template5_blah;
Index: cp/parser.c
===
--- cp/parser.c (revision 186780)
+++ cp/parser.c (working copy)
@@ -503,6 +503,8 @@ cp_debug_parser (FILE *file, cp_parser *parser)
   cp_debug_print_flag (file, "Local names and 'this' forbidden in "
			      "current context",
		       parser->local_variables_forbidden_p);
+  cp_debug_print_flag (file, "In template default argument",
+		       parser->in_template_default_argument_p);
   cp_debug_print_flag (file, "In unbraced linkage specification",
		       parser->in_unbraced_linkage_specification_p);
   cp_debug_print_flag (file, "Parsing a declarator",
@@ -3278,6 +3280,10 @@ cp_parser_new (void)
   /* Local variable names are not forbidden.  */
   parser->local_variables_forbidden_p = false;
 
+  /* We are not processing the default argument for
+     a non-type template parameter.  */
+  parser->in_template_default_argument_p = false;
+
   /* We are not processing an `extern "C"' declaration.  */
   parser->in_unbraced_linkage_specification_p = false;
 
@@ -4266,6 +4272,14 @@ cp_parser_primary_expression (cp_parser *parser,
  return error_mark_node;
id_expr_token = token;
	token = cp_lexer_peek_token (parser->lexer);
+	if (parser->in_template_default_argument_p
+	    && parser->integral_constant_expression_p
+	    && (token->type == CPP_DOT_STAR || token->type == CPP_DEREF_STAR))
+	  {
+	    error_at (token->location,
+		      "pointer-to-member operator is not allowed here");
+	    return error_mark_node;
+	  }
	done = (token->type != CPP_OPEN_SQUARE
		&& token->type != CPP_OPEN_PAREN
		&& token->type != CPP_DOT
@@ -5776,6 +5790,11 @@ cp_parser_postfix_expression (cp_parser *parser, b
	     postfix-expression -> template [opt] id-expression
	     postfix-expression -> pseudo-destructor-name */

+	  if (parser->in_template_default_argument_p
+	      && parser->integral_constant_expression_p)
+	    error_at (token->location,
+		      "class member access operator is not allowed here");
+
	  /* Consume the `.' or `->' operator.  */
	  cp_lexer_consume_token (parser->lexer);
 
@@ -17372,6 +17391,7 @@ cp_parser_default_argument (cp_parser *parser, boo
   tree default_argument = NULL_TREE;
   bool saved_greater_than_is_operator_p;
   bool saved_local_variables_forbidden_p;
+  bool saved_in_template_default_argument_p;
   bool non_constant_p, is_direct_init;
 
   /* Make sure that PARSER->GREATER_THAN_IS_OPERATOR_P is

Re: Propose to add a new argument MULTILIB_REQUIRED in multilib framework

2012-04-24 Thread Joseph S. Myers
On Tue, 17 Apr 2012, Terry Guo wrote:

 2012-04-17  Terry Guo  terry@arm.com
 
 * Makefile.in (s-mlib): Add new argument MULTILIB_REQUIRED.
 * genmultilib (MULTILIB_REQUIRED): New.

You need to add documentation in fragments.texi for MULTILIB_REQUIRED.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: default to strict dwarf-2 on powerpc-vxworks

2012-04-24 Thread Olivier Hainque

On Apr 24, 2012, at 17:49 , Jakub Jelinek wrote:
 Talked to Jason about this on IRC last night, he is ok with that, so your
 patch is ok for trunk.

Just committed - Thanks :-)



Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

2012-04-24 Thread Tom de Vries
On 17/04/12 14:24, Richard Guenther wrote:
 On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote:
 On 27/01/12 21:37, Tom de Vries wrote:
 On 24/01/12 11:40, Richard Guenther wrote:
 On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com 
 wrote:
 Richard,
 Jakub,

 the following patch fixes PR51879.

 Consider the following test-case:
 ...
 int bar (int);
 void baz (int);

 void
 foo (int y)
 {
  int a;
  if (y == 6)
a = bar (7);
  else
a = bar (7);
  baz (a);
 }
 ...

 after compiling at -O2, the representation looks like this before tail-merging:
 tail-merging:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_3 = barD.1703 (7);
  goto bb 5;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_4 = barD.1703 (7);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:1
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
  # USE = nonlocal
  # CLB = nonlocal
  bazD.1705 (aD.1709_1);
  # VUSE <.MEMD.1714_9>
  return;
 ...

 the patch allows aD.1709_4 to be value numbered to aD.1709_3, and 
 .MEMD.1714_8
 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.

 The patch makes sure non-const/pure call results (gimple_vdef and
 gimple_call_lhs) are properly value numbered.

 Bootstrapped and reg-tested on x86_64.

 ok for stage1?

 The following cannot really work:

 @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
result = vn_reference_lookup_1 (vr1, NULL);
if (result)
  {
 -  changed = set_ssa_val_to (lhs, result);
 +  tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
 +  if (vdef)
 +   changed |= set_ssa_val_to (vdef, result_vdef);
 +  changed |= set_ssa_val_to (lhs, result);

 because 'result' may be not an SSA name.  It might also not have
 a proper definition statement (if VN_INFO (result)->needs_insertion
 is true).  So you at least need to guard things properly.


 Right. And that also doesn't work if the function is without lhs, such as 
 in the
 new test-case pr51879-6.c.

 I fixed this by storing both lhs and vdef, such that I don't have to derive
 the vdef from the lhs.

 (On a side-note - I _did_ want to remove value-numbering virtual operands
 at some point ...)


 Doing so willl hurt performance of tail-merging in its current form.
 OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
 value numbering as used in tail-merging has its limitations too.
 Do you have any ideas how to address that one?

 @@ -3359,8 +3366,10 @@ visit_use (tree use)
   /* ???  We should handle stores from calls.  */
   else if (TREE_CODE (lhs) == SSA_NAME)
 {
 + tree vuse = gimple_vuse (stmt);
   if (!gimple_call_internal_p (stmt)
 -      && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
 +      && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
 +          || (vuse && SSA_VAL (vuse) != VN_TOP)))
 changed = visit_reference_op_call (lhs, stmt);
   else
 changed = defs_to_varying (stmt);

 ... exactly because of the issue that a stmt has multiple defs.  Btw,
 vuse should have been visited here or be part of our SCC, so, why do
 you need this check?


 Removed now, that was a workaround for a bug in an earlier version of the 
 patch,
 that I didn't need anymore.

 Bootstrapped and reg-tested on x86_64.

 OK for stage1?


 Richard,

 quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
 ...
 I think these fixes hint at that we should
 use structural equality as fallback if value-numbering doesn't equate
 two stmt effects.  Thus, treat two stmts with exactly the same operands
 and flags as equal and using value-numbering to canonicalize operands
 (when they are SSA names) for that comparison, or use VN entirely
 if there are no side-effects on the stmt.

 Changing value-numbering of virtual operands, even if it looks correct in the
 simple cases you change, doesn't look like a general solution for the missed
 tail merging opportunities.
 ...

 The test-case pr51879-6.c shows a case where improving value numbering will 
 help
 tail-merging, but structural equality comparison not:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  blaD.1708 (5);
  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1712_3 = barD.1704 (7);
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
  # USE = 

Re: [C++ Patch] PR 39970

2012-04-24 Thread Paolo Carlini

Hi,


We should already reject that because it isn't a constant expression; 
I don't think we want to bother checking specifically for this case in 
the parser as well.


Sure, we reject this kind of code at instantiation time. I suspect the 
submitter bothered filing a PR because other front ends also enforce the 
check in the parser, e.g., EDG, but certainly I don't have a strong opinion.


Perhaps we aren't checking default arguments unless they're actually 
used; a could change that if they aren't dependent.



Your reply reached the mailing list a bit mangled, could you please clarify?

Thanks!
Paolo.


[PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-24 Thread Teresa Johnson
This patch adds heuristics to limit unrolling in loops with branches that may 
increase
branch mispredictions. It affects loops that are not frequently iterated, and 
that are
nested within a hot region of code that already contains many branch 
instructions.

Performance tested with both internal benchmarks and with SPEC 2000/2006 on a 
variety
of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD 
Opteron systems.
This improves performance of an internal search indexing benchmark by close to 
2% on
all the tested Intel platforms.  It also consistently improves 445.gobmk (with 
FDO feedback
where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
effects are
neutral.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

Thanks,
Teresa

2012-04-24   Teresa Johnson  tejohn...@google.com

* loop-unroll.c (loop_has_call): New function.
(loop_has_FP_comp): Ditto.
(compute_weighted_branches): Ditto.
(max_unroll_with_branches): Ditto.
(decide_unroll_constant_iterations): Add heuristic to avoid
increasing branch mispredicts when unrolling.
(decide_unroll_runtime_iterations): Ditto.
* params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
(PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

Index: loop-unroll.c
===
--- loop-unroll.c   (revision 186783)
+++ loop-unroll.c   (working copy)
@@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
 basic_block);
 static rtx get_expansion (struct var_to_expand *);
 
+/* Determine whether LOOP contains call.  */
+static bool
+loop_has_call(struct loop *loop)
+{
+  basic_block *body, bb;
+  unsigned i;
+  rtx insn;
+
+  body = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      bb = body[i];
+
+      FOR_BB_INSNS (bb, insn)
+        {
+          if (CALL_P (insn))
+            {
+              free (body);
+              return true;
+            }
+        }
+    }
+  free (body);
+  return false;
+}
+
+/* Determine whether LOOP contains floating-point computation.  */
+static bool
+loop_has_FP_comp(struct loop *loop)
+{
+  rtx set, dest;
+  basic_block *body, bb;
+  unsigned i;
+  rtx insn;
+
+  body = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      bb = body[i];
+
+      FOR_BB_INSNS (bb, insn)
+        {
+          set = single_set (insn);
+          if (!set)
+            continue;
+
+          dest = SET_DEST (set);
+          if (FLOAT_MODE_P (GET_MODE (dest)))
+            {
+              free (body);
+              return true;
+            }
+        }
+    }
+  free (body);
+  return false;
+}
+
+/* Compute the number of branches in LOOP, weighted by execution counts.  */
+static float
+compute_weighted_branches(struct loop *loop)
+{
+  int header_count = loop->header->count;
+  unsigned i;
+  float n;
+  basic_block * body;
+
+  /* If no profile feedback data exists, don't limit unrolling  */
+  if (header_count == 0)
+    return 0.0;
+
+  gcc_assert (loop->latch != EXIT_BLOCK_PTR);
+
+  body = get_loop_body (loop);
+  n = 0.0;
+  for (i = 0; i < loop->num_nodes; i++)
+    {
+      if (EDGE_COUNT (body[i]->succs) >= 2)
+        {
+          /* If this block is executed less frequently than the header (loop
+             entry), then it is weighted based on the ratio of times it is
+             executed compared to the header.  */
+          if (body[i]->count < header_count)
+            n += ((float)body[i]->count)/header_count;
+
+          /* When it is executed more frequently than the header (i.e. it is
+             in a nested inner loop), simply weight the branch at 1.0.  */
+          else
+            n += 1.0;
+        }
+    }
+  free (body);
+
+  return n;
+}
+
+/* Compute the maximum number of times LOOP can be unrolled without exceeding
+   a branch budget, which can increase branch mispredictions. The number of
+   branches is computed by weighting each branch with its expected execution
+   probability through the loop based on profile data. If no profile feedback
+   data exists, simply return the current NUNROLL factor.  */
+static unsigned
+max_unroll_with_branches(struct loop *loop, unsigned nunroll)
+{
+  struct loop *outer;
+  struct niter_desc *outer_desc;
+  int outer_niters = 1;
+  float weighted_outer_branches = 0.0;
+  float weighted_num_branches = compute_weighted_branches (loop);
+
+  /* If there was no profile feedback data, weighted_num_branches will be 0.0
+     and we won't limit unrolling. If the weighted_num_branches is at most
+     1.0, also don't limit unrolling as the back-edge branch will not be
+     duplicated.  */
+  if (weighted_num_branches <= 1.0)
+    return nunroll;
+
+  /* Walk up the loop tree until we find a hot outer loop in which the current
+     loop is nested. At that point we will compute the number 

Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-24 Thread Andrew Pinski
On Tue, Apr 24, 2012 at 2:26 PM, Teresa Johnson tejohn...@google.com wrote:
 This patch adds heuristics to limit unrolling in loops with branches that may 
 increase
 branch mispredictions. It affects loops that are not frequently iterated, and 
 that are
 nested within a hot region of code that already contains many branch 
 instructions.

 Performance tested with both internal benchmarks and with SPEC 2000/2006 on a 
 variety
 of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD 
 Opteron systems.
 This improves performance of an internal search indexing benchmark by close 
 to 2% on
 all the tested Intel platforms.  It also consistently improves 445.gobmk 
 (with FDO feedback
 where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
 effects are
 neutral.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

 Thanks,
 Teresa

 2012-04-24   Teresa Johnson  tejohn...@google.com

        * loop-unroll.c (loop_has_call): New function.
        (loop_has_FP_comp): Ditto.
        (compute_weighted_branches): Ditto.
        (max_unroll_with_branches): Ditto.
        (decide_unroll_constant_iterations): Add heuristic to avoid
        increasing branch mispredicts when unrolling.
        (decide_unroll_runtime_iterations): Ditto.
        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

 Index: loop-unroll.c
 ===
 --- loop-unroll.c       (revision 186783)
 +++ loop-unroll.c       (working copy)
 @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
                                             basic_block);
  static rtx get_expansion (struct var_to_expand *);

 +/* Determine whether LOOP contains call.  */
 +static bool
 +loop_has_call(struct loop *loop)
 +{
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i < loop->num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          if (CALL_P (insn))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Determine whether LOOP contains floating-point computation.  */
 +static bool
 +loop_has_FP_comp(struct loop *loop)
 +{
 +  rtx set, dest;
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i < loop->num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          set = single_set (insn);
 +          if (!set)
 +            continue;
 +
 +          dest = SET_DEST (set);
 +          if (FLOAT_MODE_P (GET_MODE (dest)))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Compute the number of branches in LOOP, weighted by execution counts.  */
 +static float
 +compute_weighted_branches(struct loop *loop)
 +{
 +  int header_count = loop->header->count;
 +  unsigned i;
 +  float n;
 +  basic_block * body;
 +
 +  /* If no profile feedback data exists, don't limit unrolling  */
 +  if (header_count == 0)
 +    return 0.0;
 +
 +  gcc_assert (loop->latch != EXIT_BLOCK_PTR);
 +
 +  body = get_loop_body (loop);
 +  n = 0.0;
 +  for (i = 0; i < loop->num_nodes; i++)
 +    {
 +      if (EDGE_COUNT (body[i]->succs) >= 2)
 +        {
 +          /* If this block is executed less frequently than the header (loop
 +             entry), then it is weighted based on the ratio of times it is
 +             executed compared to the header.  */
 +          if (body[i]->count < header_count)
 +            n += ((float)body[i]->count)/header_count;

Please don't introduce more floating point usage into the compiler,
since it could change between different hosts (SSE vs. x87, for
example).
Maybe use a fixed point multiply of 1000 (note use a macro for this
special value though)  like what is used in the rest of the predict
code.
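
For reference, a minimal sketch of the fixed-point variant (the
BRANCH_WEIGHT_SCALE name is a placeholder, not something taken from the
patch or from predict.c):

  #define BRANCH_WEIGHT_SCALE 1000

  /* Weight of one branch in units of 1/BRANCH_WEIGHT_SCALE; callers
     would compare accumulated weights against budget * BRANCH_WEIGHT_SCALE
     instead of against a float.  Counts are widened before the multiply
     to avoid overflow.  */
  static int
  weighted_branch (long long block_count, long long header_count)
  {
    if (block_count < header_count)
      return (int) ((block_count * BRANCH_WEIGHT_SCALE) / header_count);
    return BRANCH_WEIGHT_SCALE;
  }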


Thanks,
Andrew Pinski


 +
 +          /* When it is executed more frequently than the header (i.e. it is
 +             in a nested inner loop), simply weight the branch at 1.0.  */
 +          else
 +            n += 1.0;
 +        }
 +    }
 +  free (body);
 +
 +  return n;
 +}
 +
 +/* Compute the maximum number of times LOOP can be unrolled without exceeding
 +   a branch budget, which can increase branch mispredictions. The number of
 +   branches is computed by weighting each branch with its expected execution
 +   probability through the loop based on profile data. If no profile feedback
 +   data exists, simply return the current NUNROLL factor.  */
 +static unsigned
 +max_unroll_with_branches(struct loop *loop, unsigned nunroll)
 +{
 +  struct loop *outer;
 +  struct niter_desc *outer_desc;
 +  int outer_niters = 1;
 +  float weighted_outer_branches = 

Re: [C PATCH] Fix -Woverride-init (PR c/52880)

2012-04-24 Thread Joseph S. Myers
On Fri, 20 Apr 2012, Jakub Jelinek wrote:

 2012-04-19  Jakub Jelinek  ja...@redhat.com
 
   PR c/52880
   * c-typeck.c (set_nonincremental_init,
   set_nonincremental_init_from_string): Pass true instead of false
   as IMPLICIT to add_pending_init.
 
   * gcc.dg/pr52880.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Go patch committed: Add explicit checks for division zero and overflow

2012-04-24 Thread Joseph S. Myers
On Fri, 20 Apr 2012, Ian Lance Taylor wrote:

 In Go, a division overflow is supposed to wrap.  On x86 processors it
 would raise a SIGFPE signal leading to a panic.  This patch fixes that
 problem as well.

It's a pity that this change is Go-specific.  For -fwrapv, INT_MIN / -1 
and INT_MIN % -1 should both have defined modulo semantics for C and C++ 
(similarly of course for all signed integer types, not just int).  Ideally 
of course the semantics for this would be encoded properly in GENERIC / 
GIMPLE, but until then it would make sense to have a global flag, set by 
both -fwrapv and the Go front end, to enable those checks in the 
language-independent compiler.  See bug 30484.
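
For concreteness, a C sketch of the semantics such a flag would have to
guarantee (an illustration of the required behaviour, not code from the Go
patch):

  #include <limits.h>

  /* With wrapping semantics, INT_MIN / -1 wraps back to INT_MIN and
     INT_MIN % -1 is 0, instead of trapping the way x86 idiv does.  */
  int
  wrapv_div (int x, int y)
  {
    if (x == INT_MIN && y == -1)
      return INT_MIN;
    return x / y;
  }

  int
  wrapv_mod (int x, int y)
  {
    if (y == -1)
      return 0;   /* covers INT_MIN % -1 as well */
    return x % y;
  }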

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-24 Thread Teresa Johnson
Resending my response in plain text so it will go through to gcc-patches...

On Tue, Apr 24, 2012 at 2:36 PM, Teresa Johnson tejohn...@google.com wrote:



 On Tue, Apr 24, 2012 at 2:30 PM, Andrew Pinski pins...@gmail.com wrote:

 On Tue, Apr 24, 2012 at 2:26 PM, Teresa Johnson tejohn...@google.com wrote:
  This patch adds heuristics to limit unrolling in loops with branches that 
  may increase
  branch mispredictions. It affects loops that are not frequently iterated, 
  and that are
  nested within a hot region of code that already contains many branch 
  instructions.
 
  Performance tested with both internal benchmarks and with SPEC 2000/2006 
  on a variety
  of Intel systems (Core2, Corei7, SandyBridge) and a couple of different 
  AMD Opteron systems.
  This improves performance of an internal search indexing benchmark by 
  close to 2% on
  all the tested Intel platforms.  It also consistently improves 445.gobmk 
  (with FDO feedback
  where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
  effects are
  neutral.
 
  Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?
 
  Thanks,
  Teresa
 
  2012-04-24   Teresa Johnson  tejohn...@google.com
 
         * loop-unroll.c (loop_has_call): New function.
         (loop_has_FP_comp): Ditto.
         (compute_weighted_branches): Ditto.
         (max_unroll_with_branches): Ditto.
         (decide_unroll_constant_iterations): Add heuristic to avoid
         increasing branch mispredicts when unrolling.
         (decide_unroll_runtime_iterations): Ditto.
         * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
         (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.
 
  Index: loop-unroll.c
  ===
  --- loop-unroll.c       (revision 186783)
  +++ loop-unroll.c       (working copy)
  @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
                                              basic_block);
   static rtx get_expansion (struct var_to_expand *);
 
  +/* Determine whether LOOP contains call.  */
  +static bool
  +loop_has_call(struct loop *loop)
  +{
  +  basic_block *body, bb;
  +  unsigned i;
  +  rtx insn;
  +
  +  body = get_loop_body (loop);
  +  for (i = 0; i < loop->num_nodes; i++)
  +    {
  +      bb = body[i];
  +
  +      FOR_BB_INSNS (bb, insn)
  +        {
  +          if (CALL_P (insn))
  +            {
  +              free (body);
  +              return true;
  +            }
  +        }
  +    }
  +  free (body);
  +  return false;
  +}
  +
  +/* Determine whether LOOP contains floating-point computation.  */
  +static bool
  +loop_has_FP_comp(struct loop *loop)
  +{
  +  rtx set, dest;
  +  basic_block *body, bb;
  +  unsigned i;
  +  rtx insn;
  +
  +  body = get_loop_body (loop);
  +  for (i = 0; i < loop->num_nodes; i++)
  +    {
  +      bb = body[i];
  +
  +      FOR_BB_INSNS (bb, insn)
  +        {
  +          set = single_set (insn);
  +          if (!set)
  +            continue;
  +
  +          dest = SET_DEST (set);
  +          if (FLOAT_MODE_P (GET_MODE (dest)))
  +            {
  +              free (body);
  +              return true;
  +            }
  +        }
  +    }
  +  free (body);
  +  return false;
  +}
  +
  +/* Compute the number of branches in LOOP, weighted by execution counts.  
  */
  +static float
  +compute_weighted_branches(struct loop *loop)
  +{
  +  int header_count = loop->header->count;
  +  unsigned i;
  +  float n;
  +  basic_block * body;
  +
  +  /* If no profile feedback data exists, don't limit unrolling  */
  +  if (header_count == 0)
  +    return 0.0;
  +
  +  gcc_assert (loop->latch != EXIT_BLOCK_PTR);
  +
  +  body = get_loop_body (loop);
  +  n = 0.0;
  +  for (i = 0; i < loop->num_nodes; i++)
  +    {
  +      if (EDGE_COUNT (body[i]->succs) >= 2)
  +        {
  +          /* If this block is executed less frequently than the header 
  (loop
  +             entry), then it is weighted based on the ratio of times it is
  +             executed compared to the header.  */
  +          if (body[i]->count < header_count)
  +            n += ((float)body[i]->count)/header_count;

 Please don't introduce more floating point usage into the compiler
 since it could change between different hosts (sse vs x87 for an
 example).
 Maybe use a fixed point multiply of 1000 (note use a macro for this
 special value though)  like what is used in the rest of the predict
 code.


 Ok, got it. I will address this in the next version of the patch.

 Thanks,
 Teresa




 Thanks,
 Andrew Pinski


  +
  +          /* When it is executed more frequently than the header (i.e. it 
  is
  +             in a nested inner loop), simply weight the branch at 1.0.  */
  +          else
  +            n += 1.0;
  +        }
  +    }
  +  free (body);
  +
  +  return n;
  +}
  +
  +/* Compute the maximum number of times LOOP can be unrolled without 
  exceeding
  +   a branch budget, which 

Re: [C] improve missing initializers diagnostics

2012-04-24 Thread Joseph S. Myers
On Sat, 21 Apr 2012, Manuel López-Ibáñez wrote:

 2012-04-19  Manuel López-Ibáñez  m...@gcc.gnu.org
 
   * c-typeck.c (pop_init_level): Improve diagnostics.
 testsuite/
   * gcc.dg/m-un-2.c: Update.
   * gcc.dg/20011021-1.c: Update.

OK.  The strange "near initialization" formulation came in with a gcc2 
merge in 1999 (r24879), replacing previous diagnostics that did identify 
the field directly in the message, and never really made sense to me; I 
think naming the field directly in the diagnostic is better for all such 
diagnostics.
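
For illustration, the kind of case at issue (the diagnostic wording below
is made up, not the exact GCC text):

  struct s { int a; int b; };
  struct s x = { 1 };
  /* before: warning: missing initializer ... near initialization for 'x'
     after:  warning: missing initializer for field 'b' of 'struct s'  */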

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Symbol table 14/many: remove rest of reachability code

2012-04-24 Thread H.J. Lu
On Mon, Apr 23, 2012 at 2:07 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this patch removes remaining traces of the code handling needed and
 reachability flags and replaces last use of it in cgraph_mark_reachable_node 
 by
 pointer set.  It is nice to have 60k of patch that essentially just removes
 things.

 Bootstrapped/regtested x86_64-linux, will commit it after rebuilding Mozilla.

 Honza

        * lto-symtab.c (lto_cgraph_replace_node): Do not call
        mark_reachable_node.
        * cgraph.c (cgraph_remove_node): Do not clear reachable.
        (cgraph_mark_reachable_node): Remove.
        (cgraph_mark_force_output_node): Do not set reachable.
        (dump_cgraph_node): Do not dump reachable.
        (cgraph_create_virtual_clone): Do not set reachable.
        * cgraph.h (cgraph_node): Remove reachable flag.
        (varpool_node): Remove reachable flag.
        (cgraph_mark_if_needed): Remove.
        (cgraph_mark_reachable_node): Remove.
        * tree-emutls.c (ipa_lower_emutls): Do not check
        reachable.
        * cgraphunit.c (cgraph_finalize_function): Do not mark node as
        reachable.
        (cgraph_add_new_function): Likewise.
        (cgraph_mark_if_needed): Remove.
        (cgraph_analyze_function): Do not set target as reachable.
        (process_function_and_variable_attributes): Do not care about 
 dllexport.
        (cgraph_analyze_functions): Do not set reachable flags.
        (cgraph_mark_functions_to_output): Do not check reachability.
        (cgraph_copy_node_for_versioning): Do not set reachable flag.
        (dbxout_expand_expr): Update.
        * c-decl.c (merge_decls): Do not track changed externs.
        * ipa.c: Include pointer-set.h
        (enqueue_cgraph_node): Use reachable pointer set.
        (process_references): Likewise.
        (cgraph_remove_unreachable_nodes): Likewise.
        (whole_program_function_and_variable_visibility): Do not recompute 
 reachable.
        * trans-mem.c (ipa_tm_execute): Do not check reachable flag.

This patch caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53106

-- 
H.J.


set the correct block info for the call stmt in fnsplit (issue6111050)

2012-04-24 Thread Rong Xu
Hi,

In split_function() (ipa-split.c), for the newly created call stmt,
its block is set to the outermost scope, i.e.
DECL_INITIAL(current_function_decl). When we inline this
partially outlined function, we create the new block based on the
block for the call stmt (in expand_call_inline()). 
So the new block for the inlined body is in
parallel to the function top level scope. This bad block structure will
later result in a bad stack layout.

This patch fixes the issue by setting the correct block number. It 
starts with the split_point entry bb to find the block stmt in the 
outlined region. The entry_bb may be an empty BB so we need to follow
the CFG until we find a non-empty bb.

My early patch tried to use the block info from the first non-empty 
bb in the outlined region. But that failed bootstrap as some of the
stmts (assignment stmt) do not have the block info (bug?). So in this
patch I traverse all the stmts in the bb until getting the block info.

Tested with gcc bootstrap.

Thanks,

2012-04-24   Rong Xu  x...@google.com

* ipa-split.c (split_function): Set the correct block for the
call statement.

Index: ipa-split.c
===
--- ipa-split.c (revision 186634)
+++ ipa-split.c (working copy)
@@ -948,7 +948,7 @@
   int num = 0;
   struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl);
   basic_block return_bb = find_return_bb ();
-  basic_block call_bb;
+  basic_block call_bb, bb;
   gimple_stmt_iterator gsi;
   gimple call;
   edge e;
@@ -958,6 +958,7 @@
   gimple last_stmt = NULL;
   unsigned int i;
   tree arg;
+  tree split_loc_block = NULL;
 
   if (dump_file)
 {
@@ -1072,6 +1073,33 @@
}
 }
 
+  /* Find the block location of the split region.  */
+  bb = split_point->entry_bb;
+  do
+    {
+      bool found = false;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
+        {
+          if (is_gimple_debug(gsi_stmt(gsi)))
+            continue;
+          if ((split_loc_block = gimple_block (gsi_stmt (gsi))))
+            {
+              found = true;
+              break;
+            }
+        }
+      if (found)
+        break;
+
+      /* If we reach here, this bb should be an empty unconditional
+         or fall-through branch. We continue with the succ bb.  */
+      bb = single_succ (bb);
+    }
+  while (bb && bitmap_bit_p (split_point->split_bbs, bb->index));
+
+  gcc_assert (split_loc_block);
+
   /* Now create the actual clone.  */
   rebuild_cgraph_edges ();
   node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip,
@@ -1115,7 +1143,7 @@
VEC_replace (tree, args_to_pass, i, arg);
   }
   call = gimple_build_call_vec (node->decl, args_to_pass);
-  gimple_set_block (call, DECL_INITIAL (current_function_decl));
+  gimple_set_block (call, split_loc_block);
 
   /* We avoid address being taken on any variable used by split part,
  so return slot optimization is always possible.  Moreover this is

--
This patch is available for review at http://codereview.appspot.com/6111050


Re: PR c/53066 Wshadow should not warn for shadowing an extern function

2012-04-24 Thread Joseph S. Myers
On Sun, 22 Apr 2012, Manuel L?pez-Ib??ez wrote:

 Wshadow warns whenever any declaration shadows a global function
 declaration. This is almost always noise, since most (always?) of the
 time one cannot mistakenly replace a function by another variable. The
 false positives are too common (Linus mentions using the name 'index'
 when including string.h).

I think the correct rule would be: warn if a variable *with 
pointer-to-function type* shadows a function, and warn if a nested 
function shadows another function, but don't warn for variables shadowing 
functions if the variable has any other type (because if the variable has 
some type that isn't a pointer-to-function, no confusion is likely without 
another error being given).
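
A small C sketch of that rule ('index' mirrors the string.h clash Linus
mentioned; the comments describe the proposed behaviour, not current GCC):

  extern int index (int);

  void
  f (void)
  {
    int index = 0;          /* no warning wanted: a plain int cannot be
                               confused with the function */
    int (*fp) (int) = 0;    /* a warning would still be wanted if this
                               pointer-to-function were named 'index' */
    (void) index;
    (void) fp;
  }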

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: set the correct block info for the call stmt in fnsplit (issue6111050)

2012-04-24 Thread Andrew Pinski
On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu x...@google.com wrote:
 Hi,

 In split_function() (ipa-split.c), for the newly created call stmt,
 its block is set to the outermost scope, i.e.
 DECL_INITIAL(current_function_decl). When we inline this
 partially outlined function, we create the new block based on the
 block for the call stmt (in expand_call_inline()).
 So the new block for the inlined body is in
 parallel to the function top level scope. This bad block structure will
 later result in a bad stack layout.

We do not depend on the block structure any more when dealing with
stack layout for variables in GCC 4.7.0 and above.  I am not saying
your patch is incorrect or not needed.  Just it will not have an
effect on variable stack layout.

Did you have a testcase for the bad stack layout issue?  Or was it too
hard to produce one because the size matters?



 This patch fixes the issue by setting the correct block number. It
 starts with the split_point entry bb to find the block stmt in the
 outlined region. The entry_bb may be an empty BB so we need to follow
 the CFG until we find a non-empty bb.

 My early patch tried to use the block info from the first non-empty
 bb in the outlined region. But that failed bootstrap as some of the
 stmts (assignment stmt) do not have the block info (bug?). So in this
 patch I traverse all the stmts in the bb until getting the block info.

 Tested with gcc bootstrap.

On which target?

Thanks,
Andrew Pinski


 Thanks,

 2012-04-24   Rong Xu  x...@google.com

        * ipa-split.c (split_function): set the correct block for the
        call statement.

 Index: ipa-split.c
 ===
 --- ipa-split.c (revision 186634)
 +++ ipa-split.c (working copy)
 @@ -948,7 +948,7 @@
   int num = 0;
   struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl);
   basic_block return_bb = find_return_bb ();
 -  basic_block call_bb;
 +  basic_block call_bb, bb;
   gimple_stmt_iterator gsi;
   gimple call;
   edge e;
 @@ -958,6 +958,7 @@
   gimple last_stmt = NULL;
   unsigned int i;
   tree arg;
 +  tree split_loc_block = NULL;

   if (dump_file)
     {
 @@ -1072,6 +1073,33 @@
        }
     }

 +  /* Find the block location of the split region.  */
 +      bb = split_point->entry_bb;
 +  do
 +    {
 +      bool found = false;
 +
 +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
 +        {
 +          if (is_gimple_debug(gsi_stmt(gsi)))
 +            continue;
 +          if ((split_loc_block = gimple_block (gsi_stmt (gsi))))
 +            {
 +              found = true;
 +              break;
 +            }
 +        }
 +      if (found)
 +        break;
 +
 +      /* If we reach here, this bb should be an empty unconditional
 +         or fall-through branch. We continue with the succ bb.  */
 +      bb = single_succ (bb);
 +    }
 +  while (bb && bitmap_bit_p (split_point->split_bbs, bb->index));
 +
 +  gcc_assert (split_loc_block);
 +
   /* Now create the actual clone.  */
   rebuild_cgraph_edges ();
   node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip,
 @@ -1115,7 +1143,7 @@
        VEC_replace (tree, args_to_pass, i, arg);
       }
    call = gimple_build_call_vec (node->decl, args_to_pass);
 -  gimple_set_block (call, DECL_INITIAL (current_function_decl));
 +  gimple_set_block (call, split_loc_block);

   /* We avoid address being taken on any variable used by split part,
      so return slot optimization is always possible.  Moreover this is

 --
 This patch is available for review at http://codereview.appspot.com/6111050


Re: set the correct block info for the call stmt in fnsplit (issue6111050)

2012-04-24 Thread Rong Xu
Please find the test case in the attachment. It shows the issue in the
google-4_6 branch:

-c -O2 -fno-strict-aliasing ss.C -fdump-rtl-expand-all

In the rtl-expand dump, trianglevertices and one of the gtest_ar are in
the same partition.

The problem was found with the arm compiler, but we managed to reproduce it
on x86.

-Rong


On Tue, Apr 24, 2012 at 3:02 PM, Andrew Pinski pins...@gmail.com wrote:
 On Tue, Apr 24, 2012 at 2:57 PM, Rong Xu x...@google.com wrote:
 Hi,

 In split_function() (ipa-split.c), for the newly created call stmt,
 its block is set to the outermost scope, i.e.
 DECL_INITIAL(current_function_decl). When we inline this
 partially outlined function, we create the new block based on the
 block for the call stmt (in expand_call_inline()).
 So the new block for the inlined body is in
 parallel to the function top level scope. This bad block structure will
 later result in a bad stack layout.

 We do not depend on the block structure any more when dealing with
 stack layout for variables in GCC 4.7.0 and above.  I am not saying
 your patch is incorrect or not needed.  Just it will not have an
 effect on variable stack layout.

 Did you have a testcase for the bad stack layout issue?  Or was it too
 hard to produce one because the size matters?



 This patch fixes the issue by setting the correct block number. It
 starts with the split_point entry bb to find the block stmt in the
  outlined region. The entry_bb may be an empty BB so we need to follow
 the CFG until we find a non-empty bb.

 My early patch tried to use the block info from the first non-empty
  bb in the outlined region. But that failed bootstrap as some of the
 stmts (assignment stmt) do not have the block info (bug?). So in this
 patch I traverse all the stmts in the bb until getting the block info.

  Tested with gcc bootstrap.

 On which target?

 Thanks,
 Andrew Pinski


 Thanks,

 2012-04-24   Rong Xu  x...@google.com

        * ipa-split.c (split_function): set the correct block for the
        call statement.

 Index: ipa-split.c
 ===
 --- ipa-split.c (revision 186634)
 +++ ipa-split.c (working copy)
 @@ -948,7 +948,7 @@
   int num = 0;
   struct cgraph_node *node, *cur_node = cgraph_node (current_function_decl);
   basic_block return_bb = find_return_bb ();
 -  basic_block call_bb;
 +  basic_block call_bb, bb;
   gimple_stmt_iterator gsi;
   gimple call;
   edge e;
 @@ -958,6 +958,7 @@
   gimple last_stmt = NULL;
   unsigned int i;
   tree arg;
 +  tree split_loc_block = NULL;

   if (dump_file)
     {
 @@ -1072,6 +1073,33 @@
        }
     }

 +  /* Find the block location of the split region.  */
  +      bb = split_point->entry_bb;
 +  do
 +    {
 +      bool found = false;
 +
 +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi))
 +        {
 +          if (is_gimple_debug(gsi_stmt(gsi)))
 +            continue;
  +          if ((split_loc_block = gimple_block (gsi_stmt (gsi))))
 +            {
 +              found = true;
 +              break;
 +            }
 +        }
 +      if (found)
 +        break;
 +
 +      /* If we reach here, this bb should be an empty unconditional
  +         or fall-through branch. We continue with the succ bb.  */
 +      bb = single_succ (bb);
 +    }
  +  while (bb && bitmap_bit_p (split_point->split_bbs, bb->index));
 +
 +  gcc_assert (split_loc_block);
 +
   /* Now create the actual clone.  */
   rebuild_cgraph_edges ();
   node = cgraph_function_versioning (cur_node, NULL, NULL, args_to_skip,
 @@ -1115,7 +1143,7 @@
        VEC_replace (tree, args_to_pass, i, arg);
       }
    call = gimple_build_call_vec (node->decl, args_to_pass);
 -  gimple_set_block (call, DECL_INITIAL (current_function_decl));
 +  gimple_set_block (call, split_loc_block);

   /* We avoid address being taken on any variable used by split part,
      so return slot optimization is always possible.  Moreover this is

 --
 This patch is available for review at http://codereview.appspot.com/6111050
namespace testing
{
  class Test
  {
virtual void TestBody() = 0;
  };
  class AssertionResult
  {
public:
  AssertionResult(const AssertionResult& other);
  operator bool() const { return success_; }

private:
  bool success_;
  void operator=(AssertionResult const &);
  };

  namespace internal
  {
class TestFactoryBase
{
  public:
virtual ::testing::Test* CreateTest() = 0;
};
    template <class TestClass>
  class TestFactoryImpl : public TestFactoryBase
{
  public:
virtual ::testing::Test* CreateTest() { return new TestClass; }
};
char* MakeAndRegisterTestInfo2(
  const char* test_case_name,
  TestFactoryBase* factory);
AssertionResult CmpHelperEQ(const char* expected_expression,
  const char* actual_expression,
  long long expected,
  long long actual);
AssertionResult CmpHelperNE( const char* expr, const char* expr2,
  long long val1, long long val2);
  

Re: strengthen protection against REG_EQUIV/EQUAL on !REG set

2012-04-24 Thread Richard Sandiford
Olivier Hainque hain...@adacore.com writes:
 *** /tmp/rkQ7Ep_emit-rtl.c	2012-04-12 11:19:51.830104512 +0200
 --- gcc/emit-rtl.c	2012-04-11 22:39:11.323103686 +0200
 *** set_unique_reg_note (rtx insn, enum reg_
 *** 4955,4960 ****
 --- 4955,4975 ----
 if (GET_CODE (datum) == ASM_OPERANDS)
   return NULL_RTX;
   
 +   /* Don't add REG_EQUAL/REG_EQUIV notes on a single_set destination which
 +      isn't a REG or a SUBREG of REG.  Such notes are invalid, could lead
 +      to bogus assumptions downstream (e.g. that a (set MEM) couldn't trap),
 +      and many callers just don't bother checking.  Note that we might have
 +      single_set (insn) == 0 here, typically from reload attaching REG_EQUAL
 +      to USEs for inheritance processing purposes.  */
 +   {
 +     rtx set = single_set (insn);
 +
 +     if (set && ! (REG_P (SET_DEST (set))
 +                   || (GET_CODE (SET_DEST (set)) == SUBREG
 +                       && REG_P (SUBREG_REG (SET_DEST (set))))))
 +       return NULL_RTX;
 +   }
 +

STRICT_LOW_PART is OK too.

I like the idea, but I think we're in danger of having too many
functions check the set.  Further up set_unique_reg_note we have:

  /* Don't add REG_EQUAL/REG_EQUIV notes if the insn
 has multiple sets (some callers assume single_set
 means the insn only has one set, when in fact it
 means the insn only has one * useful * set).  */
   if (GET_CODE (PATTERN (insn)) == PARALLEL && multiple_sets (insn))
{
  gcc_assert (!note);
  return NULL_RTX;
}

And set_dst_reg_note has:

  rtx set = single_set (insn);

  if (set && SET_DEST (set) == dst)
return set_unique_reg_note (insn, kind, datum);

Would be nice to use a single function that knows about the extra
contraints here.  Maybe something like the attached?

I'm deliberately requiring the SET to the first rtx in the PARALLEL.
I'm not entirely happy with the optabs.c code:

  if (! rtx_equal_p (SET_DEST (set), target)
      /* For a STRICT_LOW_PART, the REG_NOTE applies to what is inside it.  */
      && (GET_CODE (SET_DEST (set)) != STRICT_LOW_PART
          || ! rtx_equal_p (XEXP (SET_DEST (set), 0), target)))
    return 1;

either; the note is always GET_MODE (target), so surely this
should be checking for STRICT_LOW_PART before applying rtx_equal_p?
Maybe set_for_reg_notes should return the reg too, and we'd just
apply rtx_equal_p to that.

Richard


gcc/
* rtl.h (set_for_reg_notes): Declare.
* emit-rtl.c (set_for_reg_notes): New function.
(set_unique_reg_note): Use it.
* optabs.c (add_equal_note): Likewise.

Index: gcc/rtl.h
===
--- gcc/rtl.h   2012-04-24 22:55:36.002967164 +0100
+++ gcc/rtl.h   2012-04-24 22:59:34.150966581 +0100
@@ -1879,6 +1879,7 @@ extern enum machine_mode choose_hard_reg
   bool);
 
 /* In emit-rtl.c  */
+extern rtx set_for_reg_notes (rtx);
 extern rtx set_unique_reg_note (rtx, enum reg_note, rtx);
 extern rtx set_dst_reg_note (rtx, enum reg_note, rtx, rtx);
 extern void set_insn_deleted (rtx);
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2012-04-24 22:55:36.002967164 +0100
+++ gcc/emit-rtl.c  2012-04-24 23:00:13.677966484 +0100
@@ -4944,6 +4944,45 @@ force_next_line_note (void)
   last_location = -1;
 }
 
+/* Notes like REG_EQUAL and REG_EQUIV refer to a set in an instruction.
+   Return the set in INSN that such notes describe, or NULL if the notes
+   have no meaning for INSN.  */
+
+rtx
+set_for_reg_notes (rtx insn)
+{
+  rtx pat, reg;
+
+  if (!INSN_P (insn))
+return NULL_RTX;
+
+  pat = PATTERN (insn);
+  if (GET_CODE (pat) == PARALLEL)
+{
+  /* We do not use single_set because that ignores SETs of unused
+     registers.  REG_EQUAL and REG_EQUIV notes really do require the
+     PARALLEL to have a single SET.  */
+  if (multiple_sets (insn))
+   return NULL_RTX;
+  pat = XVECEXP (pat, 0, 0);
+}
+
+  if (GET_CODE (pat) != SET)
+return NULL_RTX;
+
+  reg = SET_DEST (pat);
+
+  /* Notes apply to the contents of a STRICT_LOW_PART.  */
+  if (GET_CODE (reg) == STRICT_LOW_PART)
+reg = XEXP (reg, 0);
+
+  /* Check that we have a register.  */
+  if (!(REG_P (reg) || GET_CODE (reg) == SUBREG))
+return NULL_RTX;
+
+  return pat;
+}
+
 /* Place a note of KIND on insn INSN with DATUM as the datum. If a
note of this type already exists, remove it first.  */
 
@@ -4956,39 +4995,26 @@ set_unique_reg_note (rtx insn, enum reg_
 {
 case REG_EQUAL:
 case REG_EQUIV:
-  /* Don't add REG_EQUAL/REG_EQUIV notes if the insn
-     has multiple sets (some callers assume single_set
-     means the insn only has one set, when in fact it
-     means the insn only has one * useful * set).  */
-  if (GET_CODE (PATTERN (insn)) == PARALLEL && 

Re: Added better handling of outdated profile data. (issue 5989046)

2012-04-24 Thread asharif
Ping?

Here is the formatted ChangeLog in case you want a summary of what I did:

* gcc/ipa-inline.c (edge_badness): Make sure profile is valid before
using it to compute badness.
* gcc/predict.c (maybe_hot_frequency_p): Ditto.
(cgraph_maybe_hot_edge_p): Ditto.
(maybe_hot_edge_p): Ditto.
(probably_never_executed_bb_p): Ditto.
(compute_function_frequency): Ditto.
* gcc/profile.c (compute_branch_probabilities): Return without setting
the profile read flag if get_exec_counts returns NULL.
* gcc/tree.h: Added macro for accessing profile status.

The patch is in the original email and on the codereview issue
(http://codereview.appspot.com/5989046/).


Re: PowerPC prologue and epilogue 6

2012-04-24 Thread David Edelsohn
On Sat, Apr 21, 2012 at 2:48 AM, Alan Modra amo...@gmail.com wrote:
 This patch adds out-of-line vector saves and restores.  To do this I
 made some infrastructure changes to various functions like
 rs6000_emit_savres_rtx that currently take boolean parameters (savep,
 gpr, and lr).  Rather than add yet another boolean to specify vector
 regs, I chose to lump them all together in a bitmask.  This made the
 patch a little larger but overall is a better interface, I think.

 I also revert a change I made in
 http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01014.html to always use
 r11 as a frame reg whenever abiv4 emits out-of-line saves.  Code
 quality in functions with small frames is better without that
 particular change.  This however meant some changes are required later
 when setting up pointer regs for gpr and fpr out-of-line saves.

 What else is here?  Improved register selection when saving vrsave in
 the prologue and when restoring cr in the epilogue, allowing better
 scheduling.  A fix to rs6000_output_function_prologue to output the
 correct .extern for ELF, then deciding we don't need such things
 anyway.  And various other little code cleanups.  Bootstrapped and
 regression tested powerpc-linux.

 gcc/
        * config/rs6000/rs6000.c (SAVE_INLINE_VRS, REST_INLINE_VRS,
        V_SAVE_INLINE, SAVRES_LR, SAVRES_SAVE, SAVRES_REG,
        SAVRES_GPR, SAVRES_FPR, SAVRES_VR): Define.
        (no_global_regs_above): Delete.
        (no_global_regs): New function.
        (rs6000_savres_strategy): Handle vector regs.  Use proper lr_save_p
        value for load multiple test.
        (savres_routine_syms): Increase size.
        (rs6000_savres_routine_name, rs6000_savres_routine_sym,
        ptr_regno_for_savres, rs6000_emit_savres_rtx): Pass in int selector
        rather than a number of boolean flags.  Update all callers.
        (rs6000_savres_routine_name): Generate vector save/restore names.
        (rs6000_savres_routine_sym): Handle vector regs.  Delete forward decl.
        (ptr_regno_for_savres, rs6000_emit_savres_rtx): Likewise.
        (rs6000_emit_prologue): Delete saving_FPRs_inline, saving_GPRs_inline
        and using_store_multiple.  Expand uses.  Don't always use r11 as
        frame reg when needed for out-of-line saves.  Set up initial offset
        for out-of-line vector saves when buying stack frame.  Handle pointer
        reg setup for out-of-line fp save.  Emit call to out-of-line vector
        save function.  Choose r11 or r12 for vrsave reg when available for
        better scheduling.
        (rs6000_output_function_prologue): Don't emit .extern for ELF.
        (rs6000_emit_epilogue): Choose a better frame reg when restoring
        from back-chain to suit out-of-line vector restore functions.  Emit
        call to out-of-line vector restore function.  Adjust register used
        for cr restore.  Tweak pointer register setup for gpr restore.
        * config/rs6000/rs6000.h (FIRST_SAVED_GP_REGNO): Take into account
        FIXED_R13.
        * config/rs6000/sysv4.h (FP_SAVE_INLINE, GP_SAVE_INLINE): Simplify.
        (V_SAVE_INLINE): Define.
        * config/rs6000/altivec.md (save_vregs_*, restore_vregs_*): New insns.
 libgcc/
        * config/rs6000/crtsavevr.S: New file.
        * config/rs6000/crtrestvr.S: New file.
        * config/rs6000/t-savresfgpr: Build the above.
        * config/rs6000/t-netbsd: Likewise.

This patch is okay with the macro usage fix.

Thanks, David


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-24 Thread Steven Bosscher
On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson tejohn...@google.com wrote:

        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

You should add documentation for these new PARAMs to doc/invoke.texi.

I don't really like these new PARAMs: All other loop PARAMs are based
on the number of insns in a loop, or the maximum number of times a
transformation is applied. Your new
PARAM_MIN_ITER_UNROLL_WITH_BRANCHES is completely different, because
it is a number of iterations. This makes the PARAM value feel even
more arbitrary than all the other PARAMs to some extent already do...

(The only other PARAM like that is PARAM_ALIGN_LOOP_ITERATIONS, and
its default value also looks quite arbitrary...)


 Index: loop-unroll.c
 ===
 --- loop-unroll.c       (revision 186783)
 +++ loop-unroll.c       (working copy)
 @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
                                             basic_block);
  static rtx get_expansion (struct var_to_expand *);

 +/* Determine whether LOOP contains call.  */
 +static bool
 +loop_has_call(struct loop *loop)
 +{
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i < loop->num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          if (CALL_P (insn))
 +            {
 +              free (body);
 +              return true;
 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}
 +
 +/* Determine whether LOOP contains floating-point computation.  */
 +static bool
 +loop_has_FP_comp(struct loop *loop)
 +{
 +  rtx set, dest;
 +  basic_block *body, bb;
 +  unsigned i;
 +  rtx insn;
 +
 +  body = get_loop_body (loop);
 +  for (i = 0; i < loop->num_nodes; i++)
 +    {
 +      bb = body[i];
 +
 +      FOR_BB_INSNS (bb, insn)
 +        {
 +          set = single_set (insn);
 +          if (!set)
 +            continue;
 +
 +          dest = SET_DEST (set);
 +          if (FLOAT_MODE_P (GET_MODE (dest)))
 +            {
 +              free (body);
 +              return true;

So you only detect single-set FP operations where some insn stores in
a float mode. It wouldn't be very difficult to just walk over all sets
and look for float modes. This is also necessary e.g. for x87 sincos,
as well as various insns on other machines. Your comments say you
don't want to apply the new heuristic to loops containing FP
operations because these loops usually benefit more from unrolling.
Therefore, you should IMHO look at non-single_set() insns also here,
to avoid applying the heuristics to loops containing non-single_set()
FP insns.


 +            }
 +        }
 +    }
 +  free (body);
 +  return false;
 +}

Nit: You are calling loop_has_call and loop_has_FP_comp() twice on
each loop (first for constant iterations and next for runtime
iterations), maybe you can fuse the functions and cache the results
(e.g. with two bitmaps, or put it in the loop description and retrieve
it with get_simple_loop_desc). Actually num_loop_branches()
could/should also be cached. I realize that the loop body walks are
probably not very expensive (and compile time probably isn't a concern
if you're using profile driven optimizations) but they do all add
up...


 +/* Compute the number of branches in LOOP, weighted by execution counts.  */
 +static float
 +compute_weighted_branches(struct loop *loop)

The floating point thing was already mentioned by Andrew. You can use
integer math instead (for examples, look for BB_FREQ_MAX e.g. in
average_num_loop_insns()).


 +  while (loop_outer(outer))
 +    {
 +      outer_desc = get_simple_loop_desc (outer);
 +
 +      if (outer_desc->const_iter)
 +        outer_niters *= outer_desc->niter;
 +      else if (outer->header->count)
 +        outer_niters *= expected_loop_iterations (outer);
 +
 +      weighted_outer_branches = compute_weighted_branches (outer);

Can you delay this computation of weighted_outer_branches call to ...

 +      /* Should have been checked by caller.  */
 +      gcc_assert(PARAM_VALUE (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES) != -1);

Should never even happen. You have set the minimum acceptable value to
0. If you managed to test this code with
PARAM_MIN_ITER_UNROLL_WITH_BRANCHES==-1, I'd like to know how (if you
can do it from the command line, there is a bug in the handling of
acceptable PARAM values :-)


 +      /* If the outer loop has enough iterations to be considered hot, then
 +         we can stop our upwards loop tree traversal and examine the current
 +         outer loop.  */
 +      if (outer_niters >= PARAM_VALUE (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES))
 +        {
 +          /* Assume that any call will cause the branch budget to be 
 exceeded,
 +             and that we can't unroll the current loop without increasing
 +             mispredicts.  */
 +   

Re: [C++ Patch] PR 39970

2012-04-24 Thread Jason Merrill

On 04/24/2012 05:24 PM, Paolo Carlini wrote:

Perhaps we aren't checking default arguments unless they're actually
used; a could change that if they aren't dependent.


Your reply reached the mailing list a bit mangled, could you please
clarify?


If the default argument isn't dependent on other template parameters, we 
can check immediately whether it is a suitable template argument for its 
parameter, including being a constant expression.  That ought to give an 
appropriate diagnostic for this testcase.


Jason


Re: Support for Runtime CPU type detection via builtins (issue5754058)

2012-04-24 Thread Sriraman Tallam
Hi,

   Thanks for all the comments. I have made all the changes as
mentioned and submitted the patch. Summary of changes made:

* Add support for AVX
* Fix documentation in extend.texi
* Make it thread-safe according to H.J.'s comments.

I have attached the patch. Boot-strapped and checked for test parity
with pristine build.

   * config/i386/i386.c (build_processor_model_struct): New function.
(make_var_decl): New function.
(fold_builtin_cpu): New function.
(ix86_fold_builtin): New function.
(make_cpu_type_builtin): New function.
(ix86_init_platform_type_builtins): New function.
(ix86_expand_builtin): Expand new builtins by folding them.
(ix86_init_builtins): Make new builtins to detect CPU type.
(TARGET_FOLD_BUILTIN): New macro.
(IX86_BUILTIN_CPU_INIT): New enum value.
(IX86_BUILTIN_CPU_IS): New enum value.
(IX86_BUILTIN_CPU_SUPPORTS): New enum value.
* config/i386/i386-builtin-types.def: New function type.
* testsuite/gcc.target/builtin_target.c: New testcase.
* doc/extend.texi: Document builtins.

* libgcc/config/i386/i386-cpuinfo.c: New file.
* libgcc/config/i386/t-cpuinfo: New file.
* libgcc/config.host: Include t-cpuinfo.
* libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model.


Thanks,
-Sri.

On Mon, Apr 23, 2012 at 5:46 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Apr 23, 2012 at 1:43 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Apr 23, 2012 at 12:16 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Apr 23, 2012 at 6:59 PM, Sriraman Tallam tmsri...@google.com 
 wrote:

 i386 maintainers -  Is this patch ok?

 Has the community reached the consensus on how this kind of
 functionality has to be implemented? I have followed the discussion a
 bit, but IIRC, there was no clear decision. Without this decision, I
 am not able to review the _implementation_ of agreed functionality for
 x86 target.

 (I apologize if I have missed the decision, please point me to the
 discussion in this case.)

 The discussions are here:

 http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01446.html
 and follow-ups to this.

 I am not sure about consensus, but the important points raised were:

 1) Constructor ordering: What if some constructors fire before
 cpu_indicator_init?, which determines the CPU. I addressed this
 problem by making the priority of cpu_indicator_init to be the highest
 possible. Still, IFUNC initializers will fire before and they have to
 explicitly call __builtin_cpu_init() before checking the CPU type.
 2) Reducing the number of builtins : It is only two now.

       * config/i386/i386.c (build_processor_features_struct): New function.
       (build_processor_model_struct): New function.
       (make_var_decl): New function.
       (get_field_from_struct): New function.
       (fold_builtin_target): New function.
       (ix86_fold_builtin): New function.
       (ix86_expand_builtin): Expand new builtins by folding them.
       (make_cpu_type_builtin): New functions.
       (ix86_init_platform_type_builtins): Make the new builtins.
       (ix86_init_builtins): Make new builtins to detect CPU type.
       (TARGET_FOLD_BUILTIN): New macro.
       (IX86_BUILTIN_CPU_INIT): New enum value.
       (IX86_BUILTIN_CPU_IS): New enum value.
       (IX86_BUILTIN_CPU_SUPPORTS): New enum value.
       * config/i386/i386-builtin-types.def: New function type.
       * testsuite/gcc.target/builtin_target.c: New testcase.
       * doc/extend.texi: Document builtins.

       * libgcc/config/i386/i386-cpuinfo.c: New file.
       * libgcc/config/i386/t-cpuinfo: New file.
       * libgcc/config.host: Include t-cpuinfo.
       * libgcc/config/i386/libgcc-glibc.ver: Version symbols __cpu_model
       and __cpu_features.

 The patch is OK.

 I guess that AVX is left as an exercise for a x86 maintainer ;)

 I will add AVX, and make all the changes.


 3 more comments:

 1.  Your implementation isn't thread-safe. With my suggestion of

 if (__cpu_model.__cpu_vendor)
   return 0;

 you should change get_intel_cpu to return processor_vendor and
 set __cpu_model.__cpu_vendor after setting up all other bits.

 2. Bitfields

 struct __processor_features
 +{
 +  unsigned int __cpu_cmov : 1;
 +  unsigned int __cpu_mmx : 1;
 +  unsigned int __cpu_popcnt : 1;
 +  unsigned int __cpu_sse : 1;
 +  unsigned int __cpu_sse2 : 1;
 +  unsigned int __cpu_sse3 : 1;
 +  unsigned int __cpu_ssse3 : 1;
 +  unsigned int __cpu_sse4_1 : 1;
 +  unsigned int __cpu_sse4_2 : 1;
 +} __cpu_features;

 may not be thread-safe since only char/short/int/long long
 stores are atomic and it isn't extensible. You can use

 unsigned int __cpu_features[1];

 to make it extensible and store atomic.

 3. You may move

 unsigned int __cpu_features[1];

 into struct __processor_model to only export one symbol
 instead of 2.
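
Putting 2. and 3. together, the layout would be something like this sketch
(the FEATURE_SSE2 name and bit position below are illustrative only):

  struct __processor_model
  {
    unsigned int __cpu_vendor;
    unsigned int __cpu_type;
    unsigned int __cpu_subtype;
    unsigned int __cpu_features[1];  /* whole-int stores are atomic, and
                                        the array can grow if we run out
                                        of feature bits */
  };

  /* testing a feature bit:
       if (__cpu_model.__cpu_features[0] & (1U << FEATURE_SSE2)) ...
     setting bits: accumulate locally, then do one atomic int store:
       __cpu_model.__cpu_features[0] = features;  */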

 --
 H.J.
2012-04-24  Sriraman Tallam  tmsri...@google.com

This patch adds new 

Re: Support for Runtime CPU type detection via builtins (issue5754058)

2012-04-24 Thread H.J. Lu
On Tue, Apr 24, 2012 at 5:10 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

   Thanks for all the comments. I have made all the changes as
 mentioned and submitted the patch. Summary of changes made:

 * Add support for AVX
 * Fix documentation in extend.texi
 * Make it thread-safe according to H.J.'s comments.

 I have attached the patch. Boot-strapped and checked for test parity
 with pristine build.

       * config/i386/i386.c (build_processor_model_struct): New function.
        (make_var_decl): New function.
        (fold_builtin_cpu): New function.
        (ix86_fold_builtin): New function.
        (make_cpu_type_builtin): New function.
        (ix86_init_platform_type_builtins): New function.
        (ix86_expand_builtin): Expand new builtins by folding them.
        (ix86_init_builtins): Make new builtins to detect CPU type.
        (TARGET_FOLD_BUILTIN): New macro.
        (IX86_BUILTIN_CPU_INIT): New enum value.
        (IX86_BUILTIN_CPU_IS): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS): New enum value.
        * config/i386/i386-builtin-types.def: New function type.
        * testsuite/gcc.target/builtin_target.c: New testcase.
        * doc/extend.texi: Document builtins.

        * libgcc/config/i386/i386-cpuinfo.c: New file.
        * libgcc/config/i386/t-cpuinfo: New file.
        * libgcc/config.host: Include t-cpuinfo.
        * libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model.



+  /* This function needs to run just once.  */
+  if (__cpu_model.__cpu_vendor)
+    return 0;
+
+  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
+  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
+    return -1;

If __get_cpuid_output returns zero here, this function will be called
repeatedly.  I think you should set __cpu_model.__cpu_vendor
to non-zero in this case.
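
I.e., something along these lines (a sketch; VENDOR_OTHER stands in for
whatever non-zero sentinel value the patch defines):

  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
    {
      /* Mark the model initialized so cpuid is not retried on every call.  */
      __cpu_model.__cpu_vendor = VENDOR_OTHER;
      return -1;
    }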

Otherwise, it looks good to me.

Thanks.


-- 
H.J.


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-24 Thread Andi Kleen
tejohn...@google.com (Teresa Johnson) writes:

 This patch adds heuristics to limit unrolling in loops with branches that may 
 increase
 branch mispredictions. It affects loops that are not frequently iterated, and 
 that are
 nested within a hot region of code that already contains many branch 
 instructions.

 Performance tested with both internal benchmarks and with SPEC 2000/2006 on a 
 variety
 of Intel systems (Core2, Corei7, SandyBridge) and a couple of different AMD 
 Opteron systems.
 This improves performance of an internal search indexing benchmark by close 
 to 2% on
 all the tested Intel platforms.  It also consistently improves 445.gobmk 
 (with FDO feedback
 where unrolling kicks in) by close to 1% on AMD Opteron. Other performance 
 effects are
 neutral.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

One problem with any unrolling heuristics is currently that gcc has both
the tree level and the rtl level unroller. The tree one is even on at
-O3.  So if you tweak anything for one you have to affect both, otherwise the
other may still do the wrong thing(tm).

For some other tweaks I looked into a shared cost model some time ago.
May be still needed.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


[v3] libstdc++/52689 testcase

2012-04-24 Thread Benjamin De Kosnik

Noticed that this testcase wasn't put in as part of the patch. Fixed as
follows.

tested x86/linux

-benjamin

2012-04-24  Benjamin Kosnik  b...@redhat.com

	PR libstdc++/52689
	* testsuite/17_intro/static.cc: New.


diff --git a/libstdc++-v3/testsuite/17_intro/static.cc b/libstdc++-v3/testsuite/17_intro/static.cc
new file mode 100644
index 000..99362f5
--- /dev/null
+++ b/libstdc++-v3/testsuite/17_intro/static.cc
@@ -0,0 +1,31 @@
+// { dg-do link }
+// { dg-require-effective-target static }
+// { dg-options "-static -std=gnu++11" }
+
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// libstdc++/52689 static linking fails
+#include <iostream>
+#include <locale>
+
+int main()
+{
+  std::locale c = std::locale::global(std::locale::classic());
+  std::cout << "i am old-skool\n";
+  return 0;
+}


Re: [C++ Patch] PR 39970

2012-04-24 Thread Paolo Carlini

On 04/25/2012 01:41 AM, Jason Merrill wrote:

On 04/24/2012 05:24 PM, Paolo Carlini wrote:

Perhaps we aren't checking default arguments unless they're actually
used; a could change that if they aren't dependent.


Your reply reached the mailing list a bit mangled, could you please
clarify?


If the default argument isn't dependent on other template parameters, 
we can check immediately whether it is a suitable template argument 
for its parameter, including being a constant expression.  That ought 
to give an appropriate diagnostic for this testcase.
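
A hypothetical illustration of the situation (not the PR's testcase):

int i;

template <int N = i>   // not dependent, and not a constant expression:
struct A { };          // could be diagnosed right here

A<> a;                 // currently the error only appears at this use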

Ok, thanks, I'll see what I can do.

Paolo.


Re: Support for Runtime CPU type detection via builtins (issue5754058)

2012-04-24 Thread Sriraman Tallam
On Tue, Apr 24, 2012 at 5:24 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Apr 24, 2012 at 5:10 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

   Thanks for all the comments. I have made all the changes as
 mentioned and submitted the patch. Summary of changes made:

 * Add support for AVX
 * Fix documentation in extend.texi
 * Make it thread-safe according to H.J.'s comments.

 I have attached the patch. Bootstrapped and checked for test parity
 with pristine build.

       * config/i386/i386.c (build_processor_model_struct): New function.
        (make_var_decl): New function.
        (fold_builtin_cpu): New function.
        (ix86_fold_builtin): New function.
        (make_cpu_type_builtin): New function.
        (ix86_init_platform_type_builtins): New function.
        (ix86_expand_builtin): Expand new builtins by folding them.
        (ix86_init_builtins): Make new builtins to detect CPU type.
        (TARGET_FOLD_BUILTIN): New macro.
        (IX86_BUILTIN_CPU_INIT): New enum value.
        (IX86_BUILTIN_CPU_IS): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS): New enum value.
        * config/i386/i386-builtin-types.def: New function type.
        * testsuite/gcc.target/builtin_target.c: New testcase.
        * doc/extend.texi: Document builtins.

        * libgcc/config/i386/i386-cpuinfo.c: New file.
        * libgcc/config/i386/t-cpuinfo: New file.
        * libgcc/config.host: Include t-cpuinfo.
        * libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model.



 +  /* This function needs to run just once.  */
 +  if (__cpu_model.__cpu_vendor)
 +    return 0;
 +
 +  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
 +  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
 +    return -1;

 If __get_cpuid_output fails here, __cpu_indicator_init will be called
 repeatedly.  I think you should set __cpu_model.__cpu_vendor
 to non-zero in this case.

Done now.

2012-04-24  Sriraman Tallam  tmsri...@google.com

	* libgcc/config/i386/i386-cpuinfo.c: Set __cpu_vendor always.


Index: libgcc/config/i386/i386-cpuinfo.c
===================================================================
--- libgcc/config/i386/i386-cpuinfo.c	(revision 186789)
+++ libgcc/config/i386/i386-cpuinfo.c	(working copy)
@@ -256,16 +256,25 @@ __cpu_indicator_init (void)
 
   /* Assume cpuid insn present. Run in level 0 to get vendor id. */
   if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
-    return -1;
+    {
+      __cpu_model.__cpu_vendor = VENDOR_OTHER;
+      return -1;
+    }
 
   vendor = ebx;
   max_level = eax;
 
   if (max_level < 1)
-    return -1;
+    {
+      __cpu_model.__cpu_vendor = VENDOR_OTHER;
+      return -1;
+    }
 
   if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
-    return -1;
+    {
+      __cpu_model.__cpu_vendor = VENDOR_OTHER;
+      return -1;
+    }
 
   model = (eax >> 4) & 0x0f;
   family = (eax >> 8) & 0x0f;


Thanks,
-Sri.



 Otherwise, it looks good to me.

 Thanks.


 --
 H.J.


Re: Support for Runtime CPU type detection via builtins (issue5754058)

2012-04-24 Thread H.J. Lu
On Tue, Apr 24, 2012 at 7:06 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Tue, Apr 24, 2012 at 5:24 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Apr 24, 2012 at 5:10 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

   Thanks for all the comments. I have made all the changes as
 mentioned and submitted the patch. Summary of changes made:

 * Add support for AVX
 * Fix documentation in extend.texi
 * Make it thread-safe according to H.J.'s comments.

 I have attached the patch. Bootstrapped and checked for test parity
 with pristine build.

       * config/i386/i386.c (build_processor_model_struct): New function.
        (make_var_decl): New function.
        (fold_builtin_cpu): New function.
        (ix86_fold_builtin): New function.
        (make_cpu_type_builtin): New function.
        (ix86_init_platform_type_builtins): New function.
        (ix86_expand_builtin): Expand new builtins by folding them.
        (ix86_init_builtins): Make new builtins to detect CPU type.
        (TARGET_FOLD_BUILTIN): New macro.
        (IX86_BUILTIN_CPU_INIT): New enum value.
        (IX86_BUILTIN_CPU_IS): New enum value.
        (IX86_BUILTIN_CPU_SUPPORTS): New enum value.
        * config/i386/i386-builtin-types.def: New function type.
        * testsuite/gcc.target/builtin_target.c: New testcase.
        * doc/extend.texi: Document builtins.

        * libgcc/config/i386/i386-cpuinfo.c: New file.
        * libgcc/config/i386/t-cpuinfo: New file.
        * libgcc/config.host: Include t-cpuinfo.
        * libgcc/config/i386/libgcc-glibc.ver: Version symbol __cpu_model.



 +  /* This function needs to run just once.  */
 +  if (__cpu_model.__cpu_vendor)
 +    return 0;
 +
 +  /* Assume cpuid insn present. Run in level 0 to get vendor id. */
 +  if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
 +    return -1;

 If __get_cpuid_output fails here, __cpu_indicator_init will be called
 repeatedly.  I think you should set __cpu_model.__cpu_vendor
 to non-zero in this case.

 Done now.

 2012-04-24  Sriraman Tallam  tmsri...@google.com

        * libgcc/config/i386/i386-cpuinfo.c: Set __cpu_vendor always.


 Index: libgcc/config/i386/i386-cpuinfo.c
 ===================================================================
 --- libgcc/config/i386/i386-cpuinfo.c	(revision 186789)
 +++ libgcc/config/i386/i386-cpuinfo.c	(working copy)
 @@ -256,16 +256,25 @@ __cpu_indicator_init (void)
 
    /* Assume cpuid insn present. Run in level 0 to get vendor id. */
    if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
 -    return -1;
 +    {
 +      __cpu_model.__cpu_vendor = VENDOR_OTHER;
 +      return -1;
 +    }
 
    vendor = ebx;
    max_level = eax;
 
    if (max_level < 1)
 -    return -1;
 +    {
 +      __cpu_model.__cpu_vendor = VENDOR_OTHER;
 +      return -1;
 +    }
 
    if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
 -    return -1;
 +    {
 +      __cpu_model.__cpu_vendor = VENDOR_OTHER;
 +      return -1;
 +    }
 
    model = (eax >> 4) & 0x0f;
    family = (eax >> 8) & 0x0f;


 Thanks,

Should you also handle AVX2?


-- 
H.J.


libgo patch committed: Fix net package for Solaris

2012-04-24 Thread Ian Lance Taylor
This patch to libgo fixes the net package to work correctly on Solaris.
The main fix is to the select code, to fix the case where one goroutine
closes a file descriptor while another goroutine is waiting for I/O on
the descriptor.  Bootstrapped and ran Go testsuite on
sparc-sun-solaris2.11.  Committed to mainline and 4.7 branch.

Ian
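
Restated as a C++/POSIX sketch for readers who don't follow the Go diff
below (technique only, not code from the patch):

// After select() fails with EBADF, probe every watched descriptor with
// fstat(); the closed ones are reported as "ready" so the goroutine
// blocked on them wakes up and observes the error.
#include <sys/select.h>
#include <sys/stat.h>
#include <cerrno>

static int
find_closed_fds (fd_set *watched, int maxfd, fd_set *closed)
{
  int n = 0;
  for (int fd = 0; fd <= maxfd; fd++)
    if (FD_ISSET (fd, watched))
      {
        struct stat st;
        if (fstat (fd, &st) == -1 && errno == EBADF)
          {
            FD_SET (fd, closed);
            n++;
          }
      }
  return n;
}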

diff -r 19c80c28161e libgo/Makefile.am
--- a/libgo/Makefile.am	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/Makefile.am	Tue Apr 24 20:10:20 2012 -0700
@@ -654,9 +654,9 @@
 else
 if LIBGO_IS_SOLARIS
 go_net_cgo_file = go/net/cgo_linux.go
-go_net_sock_file = go/net/sock_linux.go
-go_net_sockopt_file = go/net/sockopt_linux.go
-go_net_sockoptip_file = go/net/sockoptip_linux.go
+go_net_sock_file = go/net/sock_solaris.go
+go_net_sockopt_file = go/net/sockopt_bsd.go
+go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_solaris.go
 else
 if LIBGO_IS_FREEBSD
 go_net_cgo_file = go/net/cgo_bsd.go
diff -r 19c80c28161e libgo/Makefile.in
--- a/libgo/Makefile.in	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/Makefile.in	Tue Apr 24 20:10:20 2012 -0700
@@ -1019,17 +1019,17 @@
 @LIBGO_IS_LINUX_TRUE@go_net_cgo_file = go/net/cgo_linux.go
 @LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sock_file = go/net/sock_bsd.go
 @LIBGO_IS_FREEBSD_TRUE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sock_file = go/net/sock_bsd.go
-@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sock_file = go/net/sock_linux.go
+@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sock_file = go/net/sock_solaris.go
 @LIBGO_IS_IRIX_TRUE@@LIBGO_IS_LINUX_FALSE@go_net_sock_file = go/net/sock_linux.go
 @LIBGO_IS_LINUX_TRUE@go_net_sock_file = go/net/sock_linux.go
 @LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockopt_file = go/net/sockopt_bsd.go
 @LIBGO_IS_FREEBSD_TRUE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockopt_file = go/net/sockopt_bsd.go
-@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockopt_file = go/net/sockopt_linux.go
+@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockopt_file = go/net/sockopt_bsd.go
 @LIBGO_IS_IRIX_TRUE@@LIBGO_IS_LINUX_FALSE@go_net_sockopt_file = go/net/sockopt_linux.go
 @LIBGO_IS_LINUX_TRUE@go_net_sockopt_file = go/net/sockopt_linux.go
 @LIBGO_IS_FREEBSD_FALSE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_netbsd.go
 @LIBGO_IS_FREEBSD_TRUE@@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_FALSE@go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_freebsd.go
-@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockoptip_file = go/net/sockoptip_linux.go
+@LIBGO_IS_IRIX_FALSE@@LIBGO_IS_LINUX_FALSE@@LIBGO_IS_SOLARIS_TRUE@go_net_sockoptip_file = go/net/sockoptip_bsd.go go/net/sockoptip_solaris.go
 @LIBGO_IS_IRIX_TRUE@@LIBGO_IS_LINUX_FALSE@go_net_sockoptip_file = go/net/sockoptip_linux.go
 @LIBGO_IS_LINUX_TRUE@go_net_sockoptip_file = go/net/sockoptip_linux.go
 @LIBGO_IS_LINUX_FALSE@go_net_sendfile_file = go/net/sendfile_stub.go
diff -r 19c80c28161e libgo/go/net/dial_test.go
--- a/libgo/go/net/dial_test.go	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/go/net/dial_test.go	Tue Apr 24 20:10:20 2012 -0700
@@ -130,7 +130,7 @@
 		n = 1000
 	}
 	switch runtime.GOOS {
-	case "darwin", "freebsd", "openbsd", "windows":
+	case "darwin", "freebsd", "openbsd", "solaris", "windows":
 		// Non-Linux systems take a long time to figure
 		// out that there is nothing listening on localhost.
 		n = 100
diff -r 19c80c28161e libgo/go/net/fd_select.go
--- a/libgo/go/net/fd_select.go	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/go/net/fd_select.go	Tue Apr 24 20:10:20 2012 -0700
@@ -102,7 +102,27 @@
 				break
 			}
 		}
-		if e != nil {
+		if e == syscall.EBADF {
+			// Some file descriptor has been closed.
+			tmpReadFds = syscall.FdSet{}
+			tmpWriteFds = syscall.FdSet{}
+			n = 0
+			for i := 0; i < p.maxFd+1; i++ {
+				if syscall.FDIsSet(i, p.readFds) {
+					var s syscall.Stat_t
+					if syscall.Fstat(i, &s) == syscall.EBADF {
+						syscall.FDSet(i, &tmpReadFds)
+						n++
+					}
+				} else if syscall.FDIsSet(i, p.writeFds) {
+					var s syscall.Stat_t
+					if syscall.Fstat(i, &s) == syscall.EBADF {
+						syscall.FDSet(i, &tmpWriteFds)
+						n++
+					}
+				}
+			}
+		} else if e != nil {
 			return -1, 0, os.NewSyscallError("select", e)
 		}
 		if n == 0 {
diff -r 19c80c28161e libgo/go/net/multicast_test.go
--- a/libgo/go/net/multicast_test.go	Tue Apr 24 13:11:46 2012 -0700
+++ b/libgo/go/net/multicast_test.go	Tue Apr 24 20:10:20 2012 -0700
@@ -46,7 +46,7 @@
 // listener with same address family, same group address and same port.
 func TestMulticastListener(t *testing.T) {
 	switch runtime.GOOS {
-	case "netbsd", "openbsd", "plan9", "windows":
+	case "netbsd", 

libgo patch committed: Fix crypto/rand test to read all data

2012-04-24 Thread Ian Lance Taylor
This patch to libgo fixes the crypto/rand test to read all the data that
it needs.  This should fix PR 52341.  Bootstrapped and ran crypto/rand
test on x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.

Ian
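
For context: Read on an io.Reader may legally return fewer bytes than
requested, while io.ReadFull loops until the buffer is full.  The
equivalent loop in C++/POSIX terms, as a sketch only:

// Keep reading until the whole buffer is filled, since a single
// read() may legitimately return less than was asked for.
#include <unistd.h>
#include <cstddef>

static bool read_full (int fd, char *buf, size_t len)
{
  size_t got = 0;
  while (got < len)
    {
      ssize_t n = read (fd, buf + got, len - got);
      if (n <= 0)          /* error or unexpected EOF */
        return false;
      got += (size_t) n;
    }
  return true;
}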

diff -r c4931780d05a libgo/go/crypto/rand/rand_test.go
--- a/libgo/go/crypto/rand/rand_test.go	Tue Apr 24 21:24:41 2012 -0700
+++ b/libgo/go/crypto/rand/rand_test.go	Tue Apr 24 21:38:23 2012 -0700
@@ -7,6 +7,7 @@
 import (
 	"bytes"
 	"compress/flate"
+	"io"
 	"testing"
 )
 
@@ -16,9 +17,9 @@
 		n = 1e5
 	}
 	b := make([]byte, n)
-	n, err := Read(b)
+	n, err := io.ReadFull(Reader, b)
 	if n != len(b) || err != nil {
-		t.Fatalf("Read(buf) = %d, %s", n, err)
+		t.Fatalf("ReadFull(buf) = %d, %s", n, err)
 	}
 
 	var z bytes.Buffer


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-04-24 Thread Xinliang David Li
On Tue, Apr 24, 2012 at 6:13 PM, Andi Kleen a...@firstfloor.org wrote:
 tejohn...@google.com (Teresa Johnson) writes:

 This patch adds heuristics to limit unrolling in loops with branches
 that may increase branch mispredictions. It affects loops that are not
 frequently iterated, and that are nested within a hot region of code
 that already contains many branch instructions.

 Performance tested with both internal benchmarks and with SPEC
 2000/2006 on a variety of Intel systems (Core2, Corei7, SandyBridge)
 and a couple of different AMD Opteron systems.  This improves
 performance of an internal search indexing benchmark by close to 2% on
 all the tested Intel platforms.  It also consistently improves
 445.gobmk (with FDO feedback, where unrolling kicks in) by close to 1%
 on AMD Opteron.  Other performance effects are neutral.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Is this ok for trunk?

 One problem with any unrolling heuristic currently is that gcc has
 both the tree level and the rtl level unroller. The tree one is even
 on at -O3.  So if you tweak anything for one you have to adjust the
 other too, otherwise it may still do the wrong thing(tm).

Tree level unrollers (cunrolli and cunroll) only do complete
unrolling. At O2, both of them are turned on, but gcc does not allow
any code growth -- which makes them pretty useless at O2 (very few
loops qualify). The default max complete peel iteration count is also
too low compared with both icc and llvm.  This needs to be tuned.
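
For illustration, the kind of loop complete unrolling targets (a
sketch; the exact thresholds are --param controlled):

// With a compile-time trip count of 8, cunroll can unroll this
// completely at -O3; at -O2 complete unrolling must not grow code
// size, so loops like this usually stay rolled.
int sum8 (const int *a)
{
  int s = 0;
  for (int i = 0; i < 8; i++)
    s += a[i];
  return s;
}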

David


 For some other tweaks I looked into a shared cost model some time ago.
 May be still needed.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: PowerPC prologue and epilogue 6

2012-04-24 Thread Alan Modra
On Tue, Apr 24, 2012 at 07:19:42PM -0400, David Edelsohn wrote:
 This patch is okay with the macro usage fix.

Thanks, series 2 to 6 committed as 186796, 186797, 186798, 186799,
186800.  I noticed after I committed the lot that 186797 has some
duplicated lines (harmless), corrected in 186798, and 186799 kept the
old cr_save_regno assignment (again harmless), corrected in 186800.
A result of merge conflicts.  I normally start from a clean source
tree, apply patch as posted, commit, repeat.  This time I had a series
of directories with the cumulative patches applied.  Bad idea unless
you use mf to resolve conflicts..

This patch adds a testcase to verify register saves and restores.
I tried to write it so that it will run on all powerpc targets.  From
past experience it probably won't.  OK to apply anyway, and fix
fallout later?

* gcc.target/powerpc/savres.c: New test.
* gcc.target/powerpc/powerpc.exp: Run it.

Index: gcc/testsuite/gcc.target/powerpc/powerpc.exp
===================================================================
--- gcc/testsuite/gcc.target/powerpc/powerpc.exp	(revision 186800)
+++ gcc/testsuite/gcc.target/powerpc/powerpc.exp	(working copy)
@@ -37,5 +37,14 @@
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
 $DEFAULT_CFLAGS
 
+set SAVRES_TEST_OPTS [list -Os -O2 {-Os -mno-multiple} {-O2 -mno-multiple}]
+set alti ""
+if [check_vmx_hw_available] {
+set alti "-maltivec"
+}
+torture-init
+set-torture-options $SAVRES_TEST_OPTS
+gcc-dg-runtest [list $srcdir/$subdir/savres.c] $alti
+
 # All done.
 dg-finish
Index: gcc/testsuite/gcc.target/powerpc/savres.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/savres.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/savres.c	(revision 0)
@@ -0,0 +1,1158 @@
+/* { dg-do run } */
+/* { dg-options "-fno-inline -fomit-frame-pointer" } */
+
+/* -fno-inline -maltivec -m32/-m64 -mmultiple/no-multiple -Os/-O2.  */
+#ifndef NO_BODY
+#define abort() __builtin_abort ()
+#define vec_all_eq(v1,v2) __builtin_vec_vcmpeq_p (2, v1, v2)
+#define SET(T,R,V) register T R __asm__ (#R) = V
+#define SET_GPR(R,V) SET (long, R, V)
+#define SET_FPR(R,V) SET (double, R, V)
+#define SET_VR(R,V) SET (__attribute__ ((vector_size (16))) int, R, V)
+#define SET_CR(R,V) __asm__ __volatile__ ("mtcrf %0,%1" : : "n" (1<<(7-R)), "r" (V<<(4*(7-R))) : "cr" #R)
+#define TRASH_GPR(R) SET_GPR (R, 0)
+#define TRASH_FPR(R) SET_FPR (R, 0)
+#define TRASH_VR(R) SET_VR (R, val0)
+#define TRASH_CR(R) SET_CR (R, 0)
+#define TRASH_SOME_GPR TRASH_GPR (r30); TRASH_GPR (r31)
+#define TRASH_SOME_FPR TRASH_FPR (fr28); TRASH_FPR (fr31)
+#define TRASH_SOME_VR TRASH_VR (v26); TRASH_VR (v27); TRASH_VR (v31)
+#define TRASH_SOME_CR TRASH_CR (2)
+#define TRASH_ALL_GPR TRASH_GPR (r14); TRASH_GPR (r15); TRASH_GPR (r16); TRASH_GPR (r17); TRASH_GPR (r18); TRASH_GPR (r19); TRASH_GPR (r20); TRASH_GPR (r21); TRASH_GPR (r22); TRASH_GPR (r23); TRASH_GPR (r24); TRASH_GPR (r25); TRASH_GPR (r26); TRASH_GPR (r27); TRASH_GPR (r28); TRASH_GPR (r29); TRASH_GPR (r30); TRASH_GPR (r31)
+#define TRASH_ALL_FPR TRASH_FPR (fr14); TRASH_FPR (fr15); TRASH_FPR (fr16); TRASH_FPR (fr17); TRASH_FPR (fr18); TRASH_FPR (fr19); TRASH_FPR (fr20); TRASH_FPR (fr21); TRASH_FPR (fr22); TRASH_FPR (fr23); TRASH_FPR (fr24); TRASH_FPR (fr25); TRASH_FPR (fr26); TRASH_FPR (fr27); TRASH_FPR (fr28); TRASH_FPR (fr29); TRASH_FPR (fr30); TRASH_FPR (fr31)
+#define TRASH_ALL_VR TRASH_VR (v20); TRASH_VR (v21); TRASH_VR (v22); TRASH_VR (v23); TRASH_VR (v24); TRASH_VR (v25); TRASH_VR (v26); TRASH_VR (v27); TRASH_VR (v28); TRASH_VR (v29); TRASH_VR (v30); TRASH_VR (v31)
+#define TRASH_ALL_CR TRASH_CR (2); TRASH_CR (3); TRASH_CR (4)
+#define USE_SOME_GPR __asm__ __volatile__ ("#%0 %1" : : "r" (r30), "r" (r31))
+#define USE_SOME_FPR __asm__ __volatile__ ("#%0 %1" : : "f" (fr28), "f" (fr31))
+#define USE_SOME_VR __asm__ __volatile__ ("#%0 %1 %2" : : "v" (v26), "v" (v27), "v" (v31))
+#define USE_SOME_CR
+#define USE_ALL_GPR __asm__ __volatile__ ("#%0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : : "r" (r14), "r" (r15), "r" (r16), "r" (r17), "r" (r18), "r" (r19), "r" (r20), "r" (r21), "r" (r22), "r" (r23), "r" (r24), "r" (r25), "r" (r26), "r" (r27), "r" (r28), "r" (r29), "r" (r30), "r" (r31))
+#define USE_ALL_FPR __asm__ __volatile__ ("#%0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : : "f" (fr14), "f" (fr15), "f" (fr16), "f" (fr17), "f" (fr18), "f" (fr19), "f" (fr20), "f" (fr21), "f" (fr22), "f" (fr23), "f" (fr24), "f" (fr25), "f" (fr26), "f" (fr27), "f" (fr28), "f" (fr29), "f" (fr30), "f" (fr31))
+#define USE_ALL_VR __asm__ __volatile__ ("#%0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11" : : "v" (v20), "v" (v21), "v" (v22), "v" (v23), "v" (v24), "v" (v25), "v" (v26), "v" (v27), "v" (v28), "v" (v29), "v" (v30), "v" (v31))
+#define USE_ALL_CR
+
+#define INIT_GPR SET_GPR (r14, 14); SET_GPR (r15, 15); SET_GPR (r16, 16); SET_GPR (r17, 17); SET_GPR (r18, 18); SET_GPR (r19, 19); SET_GPR (r20, 20); SET_GPR (r21, 21); SET_GPR (r22, 22); SET_GPR (r23, 23); 

Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue 6099055)

2012-04-24 Thread davidxl


http://codereview.appspot.com/6099055/diff/1/loop-unroll.c
File loop-unroll.c (right):

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode156
loop-unroll.c:156: static bool
An empty line here.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode182
loop-unroll.c:182: static bool
add empty line after comment.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode213
loop-unroll.c:213: /* Compute the number of branches in LOOP, weighted
by execution counts.  */
same here.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode215
loop-unroll.c:215: compute_weighted_branches(struct loop *loop)
Missing space. Similarly for other functions.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode255
loop-unroll.c:255: data exists, simply return the current NUNROLL
factor.  */
same here.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode281
loop-unroll.c:281: while (loop_outer(outer))
Should the check start from the current loop? Or the handling of the
current loop should be peeled out -- max_unroll_factor =
branch_budget/weighted_num_branches

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode317
loop-unroll.c:317: return (PARAM_VALUE
(PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET) -
The number of branches may be larger than budget -- leading to overflow
-- need to guard against it.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode318
loop-unroll.c:318: weighted_outer_branches)/(weighted_num_branches - 1)
+ 1;
Should it continue walking up the loop tree and find the minimal unroll
factor?
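
I.e. guard and clamp before dividing; a hypothetical sketch using the
names from the quoted code:

/* Hypothetical sketch: return an unroll factor of 1 when the outer
   branches already exhaust the budget or there is at most one
   weighted branch, instead of letting the subtraction go negative.  */
static int
branch_budget_unroll_factor (int weighted_outer_branches,
                             int weighted_num_branches)
{
  int budget = PARAM_VALUE (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET);
  if (weighted_outer_branches >= budget || weighted_num_branches <= 1)
    return 1;
  return (budget - weighted_outer_branches)
	 / (weighted_num_branches - 1) + 1;
}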

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode747
loop-unroll.c:747: && loop_has_FP_comp(loop)
missing space

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode749
loop-unroll.c:749: && desc->niter < (unsigned) PARAM_VALUE
(PARAM_MIN_ITER_UNROLL_WITH_BRANCHES))
line too long.

http://codereview.appspot.com/6099055/diff/1/loop-unroll.c#newcode1057
loop-unroll.c:1057: && loop_has_FP_comp(loop)
Refactor the check into a helper function?

http://codereview.appspot.com/6099055/diff/1/params.def
File params.def (right):

http://codereview.appspot.com/6099055/diff/1/params.def#newcode314
params.def:314:
Missing comment.

http://codereview.appspot.com/6099055/diff/1/params.def#newcode319
params.def:319:
missing comment.

http://codereview.appspot.com/6099055/