[patch] [1/2] Support reduction in loop SLP

2011-05-18 Thread Ira Rosen
Hi,

This is the first part of reduction support in loop-aware SLP. The
purpose of the patch is to handle unrolled reductions such as:

#a1 = phi <a0, a5>
...
a2 = a1 + x
...
a3 = a2 + y
...
a5 = a4 + z

Such a sequence of statements is gathered into a reduction chain, which
serves as the root of an SLP instance (similar to a group of strided
stores in the existing loop SLP implementation).
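
To make the shape concrete, here is a C-level sketch (essentially fun2 from
the new O3-slp-reduc-10.c test below): after the manual unroll-by-two, each
iteration contributes two '+' statements to the chain rooted at the
reduction PHI of dot:

static int
dot_prod (int *x, int *y, int n)
{
  int i, dot = 0;

  for (i = 0; i < n / 2; i++)
    {
      /* Two statements per iteration; together with the loop PHI they
         form the reduction chain the SLP code now detects.  */
      dot += *x++ * *y++;
      dot += *x++ * *y++;
    }

  return dot;
}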

The patch also fixes PR tree-optimization/41881.

Since reduction chains use the same data structure as strided data
accesses, this part of the patch renames these data structures,
removing data-ref and interleaving references.

Bootstrapped and tested on powerpc64-suse-linux.
I am going to apply it later today.

Ira


ChangeLog:

* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Use new
names for group elements access.
* tree-vectorizer.h (struct _stmt_vec_info): Use interleaving info for
reduction chains as well.  Remove data reference and interleaving related
words from the field names.
* tree-vect-loop.c (vect_transform_loop): Use new names for group
elements access.
* tree-vect-data-refs.c (vect_get_place_in_interleaving_chain,
vect_insert_into_interleaving_chain, vect_update_interleaving_chain,
vect_update_interleaving_chain, vect_same_range_drs,
vect_analyze_data_ref_dependence, vect_update_misalignment_for_peel,
vect_verify_datarefs_alignment, vector_alignment_reachable_p,
vect_peeling_hash_get_lowest_cost, vect_enhance_data_refs_alignment,
vect_analyze_group_access, vect_analyze_data_ref_access,
vect_create_data_ref_ptr, vect_transform_strided_load,
vect_record_strided_load_vectors): Likewise.
* tree-vect-stmts.c (vect_model_simple_cost, vect_model_store_cost,
vect_model_load_cost, vectorizable_store, vectorizable_load,
vect_remove_stores, new_stmt_vec_info): Likewise.
* tree-vect-slp.c (vect_build_slp_tree,
vect_supported_slp_permutation_p, vect_analyze_slp_instance): Likewise.
Index: tree-vect-loop-manip.c
===
--- tree-vect-loop-manip.c  (revision 173814)
+++ tree-vect-loop-manip.c  (working copy)
@@ -2437,7 +2437,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l
 
   dr_a = DDR_A (ddr);
   stmt_a = DR_STMT (DDR_A (ddr));
-  dr_group_first_a = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_a));
+  dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
   if (dr_group_first_a)
 {
  stmt_a = dr_group_first_a;
@@ -2446,7 +2446,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l
 
   dr_b = DDR_B (ddr);
   stmt_b = DR_STMT (DDR_B (ddr));
-  dr_group_first_b = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_b));
+  dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
   if (dr_group_first_b)
 {
  stmt_b = dr_group_first_b;
Index: tree-vectorizer.h
===
--- tree-vectorizer.h   (revision 173814)
+++ tree-vectorizer.h   (working copy)
@@ -468,15 +473,15 @@ typedef struct _stmt_vec_info {
   /*  Whether the stmt is SLPed, loop-based vectorized, or both.  */
   enum slp_vect_type slp_type;
 
-  /* Interleaving info.  */
-  /* First data-ref in the interleaving group.  */
-  gimple first_dr;
-  /* Pointer to the next data-ref in the group.  */
-  gimple next_dr;
-  /* In case that two or more stmts share data-ref, this is the pointer to the
- previously detected stmt with the same dr.  */
+  /* Interleaving and reduction chains info.  */
+  /* First element in the group.  */
+  gimple first_element;
+  /* Pointer to the next element in the group.  */
+  gimple next_element;
+  /* For data-refs, in case that two or more stmts share data-ref, this is the
+ pointer to the previously detected stmt with the same dr.  */
   gimple same_dr_stmt;
-  /* The size of the interleaving group.  */
+  /* The size of the group.  */
   unsigned int size;
   /* For stores, number of stores from this group seen. We vectorize the last
  one.  */
@@ -527,22 +532,22 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_RELATED_STMT(S)         (S)->related_stmt
 #define STMT_VINFO_SAME_ALIGN_REFS(S)      (S)->same_align_refs
 #define STMT_VINFO_DEF_TYPE(S)             (S)->def_type
-#define STMT_VINFO_DR_GROUP_FIRST_DR(S)    (S)->first_dr
-#define STMT_VINFO_DR_GROUP_NEXT_DR(S)     (S)->next_dr
-#define STMT_VINFO_DR_GROUP_SIZE(S)        (S)->size
-#define STMT_VINFO_DR_GROUP_STORE_COUNT(S) (S)->store_count
-#define STMT_VINFO_DR_GROUP_GAP(S)         (S)->gap
-#define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S) (S)->same_dr_stmt
-#define STMT_VINFO_DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
-#define STMT_VINFO_STRIDED_ACCESS(S)       ((S)->first_dr != NULL)
+#define STMT_VINFO_GROUP_FIRST_ELEMENT(S)  (S)->first_element
+#define 

[patch] [2/2] Support reduction in loop SLP

2011-05-18 Thread Ira Rosen
This part adds the actual code for reduction support.

Bootstrapped and tested on powerpc64-suse-linux.
I am planning to apply it later today.

Ira

ChangeLog:

PR tree-optimization/41881
* tree-vectorizer.h (struct _loop_vec_info): Add new field
reduction_chains along with a macro for its access.
* tree-vect-loop.c (new_loop_vec_info): Initialize reduction chains.
(destroy_loop_vec_info): Free reduction chains.
(vect_analyze_loop_2): Return false if vect_analyze_slp() returns false.
(vect_is_slp_reduction): New function.
(vect_is_simple_reduction_1): Call vect_is_slp_reduction.
(vect_create_epilog_for_reduction): Support SLP reduction chains.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Allow different
definition types for reduction chains.
(vect_supported_load_permutation_p): Don't allow permutations for
reduction chains.
(vect_analyze_slp_instance): Support reduction chains.
(vect_analyze_slp): Try to build SLP instance from reduction chains.
(vect_get_constant_vectors):  Handle reduction chains.
(vect_schedule_slp_instance): Mark the first statement of the
reduction chain as reduction.

testsuite/ChangeLog:

PR tree-optimization/41881
* gcc.dg/vect/O3-pr41881.c: New test.
* gcc.dg/vect/O3-slp-reduc-10.c: New test.
Index: testsuite/gcc.dg/vect/O3-slp-reduc-10.c
===
--- testsuite/gcc.dg/vect/O3-slp-reduc-10.c (revision 0)
+++ testsuite/gcc.dg/vect/O3-slp-reduc-10.c (revision 0)
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 128
+#define TYPE int
+#define RESULT 755918
+
+__attribute__ ((noinline)) TYPE fun2 (TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 14;
+
+  for (i = 0; i < n / 2; i++)
+    for (j = 0; j < 2; j++)
+  dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+int main (void)
+{
+  TYPE a[N], b[N], dot;
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+{
+  a[i] = i;
+  b[i] = i+8;
+}
+
+  dot = fun2 (a, b, N);
+  if (dot != RESULT)
+abort();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/O3-pr41881.c
===
--- testsuite/gcc.dg/vect/O3-pr41881.c  (revision 0)
+++ testsuite/gcc.dg/vect/O3-pr41881.c  (revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+
+#define TYPE int
+
+TYPE fun1(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n; i++)
+dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+TYPE fun2(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n / 8; i++)
+    for (j = 0; j < 8; j++)
+  dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: tree-vectorizer.h
===
--- tree-vectorizer.h   (revision 173814)
+++ tree-vectorizer.h   (working copy)
@@ -248,6 +248,10 @@ typedef struct _loop_vec_info {
   /* Reduction cycles detected in the loop. Used in loop-aware SLP.  */
   VEC (gimple, heap) *reductions;
 
+  /* All reduction chains in the loop, represented by the first
+ stmt in the chain.  */
+  VEC (gimple, heap) *reduction_chains;
+
   /* Hash table used to choose the best peeling option.  */
   htab_t peeling_htab;
 
@@ -277,6 +281,7 @@ typedef struct _loop_vec_info {
 #define LOOP_VINFO_SLP_INSTANCES(L)        (L)->slp_instances
 #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
 #define LOOP_VINFO_REDUCTIONS(L)           (L)->reductions
+#define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_HTAB(L)         (L)->peeling_htab
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
Index: tree-vect-loop.c
===
--- tree-vect-loop.c(revision 173814)
+++ tree-vect-loop.c(working copy)
@@ -757,6 +757,7 @@ new_loop_vec_info (struct loop *loop)
PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
   LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_REDUCTIONS (res) = VEC_alloc (gimple, heap, 10);
+  LOOP_VINFO_REDUCTION_CHAINS (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
   LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
   LOOP_VINFO_PEELING_HTAB (res) = NULL;
@@ -852,6 +853,7 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
   VEC_free (slp_instance, heap, 

[PATCH] Fix up execute_update_addresses_taken for debug stmts (PR tree-optimization/49000)

2011-05-18 Thread Jakub Jelinek
Hi!

When an addressable var is optimized into a non-addressable one, we didn't
clean up MEM_REFs containing an ADDR_EXPR of such vars in debug stmts.  These
later got folded into the var itself and caused SSA verification errors.
Fixed by trying to rewrite the MEM_REF and, if that fails, resetting the
debug stmt.
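
As a rough sketch of the situation (the new gcc.dg/pr49000.c test below is
the real reproducer): 'a' starts out addressable, but once the call is
inlined and its result found dead, the address no longer escapes, 'a' is
rewritten into SSA form, and only the -g debug bind still refers to
MEM[(int *)&a]:

static int bar (int *z) { return *z; }

void
baz (void)
{
  int a = 42;
  int *b = &a;   /* Address taken here ...  */
  bar (b);       /* ... but dead after inlining, so 'a' becomes
                    non-addressable while a debug stmt keeps the MEM_REF.  */
}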

Bootstrapped/regtested on x86_64-linux and i686-linux, no change in cc1plus
.debug_info/.debug_loc, implicitptr.c testcase still works too.
Ok for trunk/4.6?

2011-05-18  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/49000
* tree-ssa.c (execute_update_addresses_taken): Call
maybe_rewrite_mem_ref_base on debug stmt value.  If it couldn't
be rewritten and decl has been marked for renaming, reset
the debug stmt.

* gcc.dg/pr49000.c: New test.

--- gcc/tree-ssa.c.jj   2011-05-11 19:39:04.0 +0200
+++ gcc/tree-ssa.c  2011-05-17 18:20:10.0 +0200
@@ -2230,6 +2230,17 @@ execute_update_addresses_taken (void)
  }
  }
 
+	  else if (gimple_debug_bind_p (stmt)
+		   && gimple_debug_bind_has_value_p (stmt))
+	    {
+	      tree *valuep = gimple_debug_bind_get_value_ptr (stmt);
+	      tree decl;
+	      maybe_rewrite_mem_ref_base (valuep);
+	      decl = non_rewritable_mem_ref_base (*valuep);
+	      if (decl && symbol_marked_for_renaming (decl))
+		gimple_debug_bind_reset_value (stmt);
+	    }
+
if (gimple_references_memory_p (stmt)
|| is_gimple_debug (stmt))
  update_stmt (stmt);
--- gcc/testsuite/gcc.dg/pr49000.c.jj   2011-05-17 18:30:10.0 +0200
+++ gcc/testsuite/gcc.dg/pr49000.c  2011-05-17 18:23:16.0 +0200
@@ -0,0 +1,29 @@
+/* PR tree-optimization/49000 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+static
+foo (int x, int y)
+{
+  return x * y;
+}
+
+static int
+bar (int *z)
+{
+  return *z;
+}
+
+void
+baz (void)
+{
+  int a = 42;
+  int *b = &a;
+  foo (bar (&a), 3);
+}
+
+void
+test (void)
+{
+  baz ();
+}

Jakub


[PATCH] Small typed DWARF improvement

2011-05-18 Thread Jakub Jelinek
Hi!

This patch optimizes away unneeded DW_OP_GNU_converts.  mem_loc_descriptor
attempts to keep the operands signed when it returns; if the next op needs
them unsigned again with the same size, there may be useless converts.  The
patch won't change a DW_OP_GNU_convert to an integral type from a
non-integral one (so that, say, a float to {un,}signed conversion is done
with the right sign); for other converts it will, where possible, change the
preceding typed op's base type, provided the size is the same and both the
typed op and the following DW_OP_GNU_convert are integral or have the same
encoding.
Example testcase which is improved is e.g.:
/* { dg-do run } */
/* { dg-options -g } */

volatile int vv;

__attribute__((noclone, noinline)) void
foo (double d)
{
  unsigned long f = ((unsigned long) d) / 33UL;
  vv++; /* { dg-final { gdb-test 10 f 7 } } */
}

int
main ()
{
  foo (231.0);
  return 0;
}
where previously we emitted
DW_OP_GNU_regval_type xmm0, double DW_OP_GNU_convert ulong
DW_OP_GNU_convert long DW_OP_GNU_convert ulong DW_OP_const1u 33
DW_OP_GNU_convert ulong DW_OP_div DW_OP_GNU_convert long
while with this patch DW_OP_GNU_convert long DW_OP_GNU_convert ulong
can go away.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-05-17  Jakub Jelinek  ja...@redhat.com

* dwarf2out.c (resolve_addr_in_expr): Optimize away redundant
DW_OP_GNU_convert ops.

--- gcc/dwarf2out.c.jj  2011-05-17 13:35:26.0 +0200
+++ gcc/dwarf2out.c 2011-05-17 14:41:21.0 +0200
@@ -24092,23 +24092,84 @@ resolve_one_addr (rtx *addr, void *data 
 static bool
 resolve_addr_in_expr (dw_loc_descr_ref loc)
 {
+  dw_loc_descr_ref keep = NULL;
   for (; loc; loc = loc->dw_loc_next)
-    if (((loc->dw_loc_opc == DW_OP_addr || loc->dtprel)
-         && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
-        || (loc->dw_loc_opc == DW_OP_implicit_value
-            && loc->dw_loc_oprnd2.val_class == dw_val_class_addr
-            && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL)))
-      return false;
-    else if (loc->dw_loc_opc == DW_OP_GNU_implicit_pointer
-             && loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+    switch (loc->dw_loc_opc)
       {
-       dw_die_ref ref
-         = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
-       if (ref == NULL)
+      case DW_OP_addr:
+       if (resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
          return false;
-       loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
-       loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
-       loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+       break;
+      case DW_OP_const4u:
+      case DW_OP_const8u:
+       if (loc->dtprel
+           && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
+         return false;
+       break;
+      case DW_OP_implicit_value:
+       if (loc->dw_loc_oprnd2.val_class == dw_val_class_addr
+           && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL))
+         return false;
+       break;
+      case DW_OP_GNU_implicit_pointer:
+       if (loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+         {
+           dw_die_ref ref
+             = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
+           if (ref == NULL)
+             return false;
+           loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
+           loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
+           loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+         }
+       break;
+      case DW_OP_GNU_const_type:
+      case DW_OP_GNU_regval_type:
+      case DW_OP_GNU_deref_type:
+      case DW_OP_GNU_convert:
+      case DW_OP_GNU_reinterpret:
+       while (loc->dw_loc_next
+              && loc->dw_loc_next->dw_loc_opc == DW_OP_GNU_convert)
+         {
+           dw_die_ref base1, base2;
+           unsigned enc1, enc2, size1, size2;
+           if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+               || loc->dw_loc_opc == DW_OP_GNU_deref_type)
+             base1 = loc->dw_loc_oprnd2.v.val_die_ref.die;
+           else
+             base1 = loc->dw_loc_oprnd1.v.val_die_ref.die;
+           base2 = loc->dw_loc_next->dw_loc_oprnd1.v.val_die_ref.die;
+           gcc_assert (base1->die_tag == DW_TAG_base_type
+                       && base2->die_tag == DW_TAG_base_type);
+           enc1 = get_AT_unsigned (base1, DW_AT_encoding);
+           enc2 = get_AT_unsigned (base2, DW_AT_encoding);
+           size1 = get_AT_unsigned (base1, DW_AT_byte_size);
+           size2 = get_AT_unsigned (base2, DW_AT_byte_size);
+           if (size1 == size2
+               && (((enc1 == DW_ATE_unsigned || enc1 == DW_ATE_signed)
+                    && (enc2 == DW_ATE_unsigned || enc2 == DW_ATE_signed)
+                    && loc != keep)
+                   || enc1 == enc2))
+             {
+               /* Optimize away next DW_OP_GNU_convert after
+                  adjusting LOC's base type die reference.  */
+               if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+                   || loc->dw_loc_opc == 

Re: [google] Increase inlining limits with FDO/LIPO

2011-05-18 Thread Xinliang David Li
To make consistent inline decisions between profile-gen and
profile-use, probably better to check these two:

flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
enables profile-arcs, and value profiling is enabled only when
edge/branch profiling is enabled (so no need to be checked).

David


On Tue, May 17, 2011 at 10:50 PM, Mark Heffernan meh...@google.com wrote:
 This small patch greatly expands the function size limits for inlining with
 FDO/LIPO.  With profile information, the inliner is much more selective and
 precise and so the limits can be increased with less worry that functions
 and total code size will blow up.  This speeds up x86-64 internal benchmarks
 by about geomean 1.5% to 3% with LIPO (depending on microarch), and 1% to
 1.5% with FDO.  Size increase is negligible (0.1% mean).
 Bootstrapped and regression tested on x86-64.
 Trunk testing to follow.
 Ok for google/main?
 Mark

 2011-05-17  Mark Heffernan  meh...@google.com
        * opts.c (finish_options): Increase inlining limits with profile
        generate and use.

 Index: opts.c
 ===
 --- opts.c (revision 173666)
 +++ opts.c (working copy)
 @@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts
    opts-x_flag_split_stack = 0;
   }
      }
 +
 +  if (opts->x_flag_profile_use
 +      || opts->x_profile_arc_flag
 +      || opts->x_flag_profile_values)
 +    {
 +      /* With accurate profile information, inlining is much more
 +         selective and makes better decisions, so increase the
 +         inlining function size limits.  Changes must be added to both
 +         the generate and use builds to avoid profile mismatches.  */
 +      maybe_set_param_value
 +        (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 +         opts->x_param_values, opts_set->x_param_values);
 +      maybe_set_param_value
 +        (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 +         opts->x_param_values, opts_set->x_param_values);
 +    }
  }



Re: [patch gimplifier]: Make sure TRUTH_NOT_EXPR has boolean_type_node type and argument

2011-05-18 Thread Kai Tietz
2011/5/16 Richard Guenther richard.guent...@gmail.com:
 On Mon, May 16, 2011 at 3:45 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 16 May 2011, Richard Guenther wrote:

  I think conversion _to_ BOOLEAN_TYPE shouldn't be useless, on the
  grounds that it requires booleanization (at least conceptually), i.e.
  conversion to a set of two values (no matter the precision or size)
  based on the outcome of comparing the RHS value with
  false_pre_image(TREE_TYPE(RHS)).
 
  Conversion _from_ BOOLEAN_TYPE can be regarded as useless, as the
  conversions from false or true into false_pre_image or true_pre_image
  always is simply an embedding of 0 or 1/-1 (depending on target type
  signedness).  And if the BOOLEAN_TYPE and the LHS have same signedness
  the bit representation of boolean_true_type is (or should be) the same
  as the one converted to LHS (namely either 1 or -1).

 Sure, that would probably be enough to prevent non-BOOLEAN_TYPEs be used
 where BOOLEAN_TYPE nodes were used before.  It still will cause an
 artificial conversion from a single-bit bitfield read to a bool.

 Not if you're special casing single-bit conversions (on the grounds that a
 booleanization from two-valued set to a different two-valued set of
 the same signedness will not actually require a comparison).  I think it's
 better to be very precise in our base predicates than to add various hacks
 over the place to care for imprecision.

 Or require a 1-bit integral type for TRUTH_* operands only (which ensures
 the two-valueness which is what we really want).  That can be done
 by either fixing the frontends to make boolean_type_node have 1-bit
 precision or to build a middle-end private type with that constraints
 (though that's the more difficult route as we still do not have a strong
 FE - middle-end hand-off point, and it certainly is not the gimplifier).

 Long term all the global trees should be FE private and the middle-end
 should have its own set.

 Richard.


 Ciao,
 Michael.


Hello,

The initial idea was to check, for logical operations, that the conversion
to boolean_type_node is useless.  This assumption was flawed by the fact
that boolean_type_node gets redefined in free_lang_decl to a 1-bit
precision, BOOL_TYPE_SIZE-sized type if the FE's boolean_type_node is
incompatible with it.  Through this back door the FE's boolean_type_node
becomes incompatible in the tree-cfg checks.
So for all languages - but Ada - logical types have precision set to one.
Just for the Ada case, which requires a different kind of boolean_type_node,
we need to inspect the inner type to be a boolean.  As Fortran also has
integer-typed boolean-compatible types, we can't simply check for
BOOLEAN_TYPE here and need to check the precision first.

ChangeLog

2011-05-18  Kai Tietz

PR middle-end/48989
* tree-cfg.c (verify_gimple_assign_binary): Check lhs type
for being compatible to boolean for logical operations.
(verify_gimple_assign_unary): Likewise.
(compatible_boolean_type_p): New helper.

Bootstrapped on x86_64-pc-linux-gnu. And regression tested for ADA
and Fortran.

Ok for apply?

Regards,
Kai
Index: gcc/gcc/tree-cfg.c
===
--- gcc.orig/gcc/tree-cfg.c 2011-05-16 14:26:12.369031500 +0200
+++ gcc/gcc/tree-cfg.c  2011-05-18 08:20:34.935819100 +0200
@@ -3220,6 +3220,31 @@ verify_gimple_comparison (tree type, tre
   return false;
 }
 
+/* Checks TYPE for being compatible to boolean. Returns
+   FALSE, if type is not compatible, otherwise TRUE.
+
+   A type is compatible if
+   a) TYPE_PRECISION is one.
+   b) The type - or the inner type - is of kind BOOLEAN_TYPE.  */
+
+static bool
+compatible_boolean_type_p (tree type)
+{
+  if (!type)
+return false;
+  if (TYPE_PRECISION (type) == 1)
+return true;
+
+  /* We try to look here into inner type, as ADA uses
+ boolean_type_node with type precision != 1.  */
+  while (TREE_TYPE (type)
+	 && (TREE_CODE (type) == INTEGER_TYPE
+	     || TREE_CODE (type) == REAL_TYPE))
+type = TREE_TYPE (type);
+
+  return TYPE_PRECISION (type) == 1 || TREE_CODE (type) == BOOLEAN_TYPE;
+}
+
 /* Verify a gimple assignment statement STMT with an unary rhs.
Returns true if anything is wrong.  */
 
@@ -3350,15 +3375,16 @@ verify_gimple_assign_unary (gimple stmt)
   return false;
 
 case TRUTH_NOT_EXPR:
-  if (!useless_type_conversion_p (boolean_type_node,  rhs1_type))
+
+  if (!useless_type_conversion_p (lhs_type,  rhs1_type)
+  || !compatible_boolean_type_p (lhs_type))
 {
-	error ("invalid types in truth not");
-	debug_generic_expr (lhs_type);
-	debug_generic_expr (rhs1_type);
-	return true;
+	error ("invalid types in truth not");
+	debug_generic_expr (lhs_type);
+	debug_generic_expr (rhs1_type);
+	return true;
+   return true;
 }
   break;
-
 case NEGATE_EXPR:
 case ABS_EXPR:
 case BIT_NOT_EXPR:

Re: [PATCH, PR45098, 2/10]

2011-05-18 Thread Zdenek Dvorak
Hi,

 2011-05-05  Tom de Vries  t...@codesourcery.com
 
   PR target/45098
   * tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost.

OK,

Zdenek


Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use

2011-05-18 Thread Eric Botcazou
 Ok, thanks for explaining it.  So would be patch ok for apply without
 the precision setting?

Sure, everything but the gcc-interface/misc.c part is OK.  Thanks.

-- 
Eric Botcazou


Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use

2011-05-18 Thread Kai Tietz
2011/5/18 Eric Botcazou ebotca...@adacore.com:
 Ok, thanks for explaining it.  So would be patch ok for apply without
 the precision setting?

 Sure, everything but the gcc-interface/misc.c part is OK.  Thanks.

 --
 Eric Botcazou

Hmm, you mean the initialization of boolean_false_node is wrong, too?
Not sure here. As this patch introduces its use in the other parts.
The precision part of course is wrong.

Regards,
Kai


Re: [PATCH] Fix PR48989

2011-05-18 Thread Eric Botcazou
   * tree-ssa.c (useless_type_conversion_p): Preserve conversions
   to non-1-precision BOOLEAN_TYPEs.

This looks overeager if you're allowing non-boolean types in tree-cfg.c.
The conversion can be stripped if the source type has precision 1, can't it?

-- 
Eric Botcazou


Re: [PATCH] Fix PR48989

2011-05-18 Thread Richard Guenther
On Wed, 18 May 2011, Eric Botcazou wrote:

  * tree-ssa.c (useless_type_conversion_p): Preserve conversions
  to non-1-precision BOOLEAN_TYPEs.
 
 This looks like overeager if you're allowing non-boolean types in tree-cfg.c.
 The conversion can be stripped if the source type has precision 1, can't it?

That's true, though in that case the previous

  /* Preserve changes in signedness or precision.  */
  if (TYPE_UNSIGNED (inner_type) != TYPE_UNSIGNED (outer_type)
  || TYPE_PRECISION (inner_type) != TYPE_PRECISION (outer_type))
return false;

check would have preserved the conversion already (or the BOOLEAN_TYPEs
precision is 1 as well).  Thus, we preserved such conversions already
in the past.

Richard.


[PATCH] Fix PR49018

2011-05-18 Thread Richard Guenther

This fixes PR49018, ifcombine looks for side-effects but instead
asks only gimple_has_volatile_ops.  And gimple_has_side_effects
disregards that volatile asms have side-effects.  The function
also doesn't handle all stmts gracefully so I fixed it as well
as turning the asserts to checking asserts.  Fixed as follows.
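
As a hand-written sketch of the kind of code this is about (not the PR
testcase): the asm below has no volatile operands, so gimple_has_volatile_ops
is false for it, yet it clearly has a side-effect, and ifcombine must not
treat the block between the two conditions as side-effect free:

int
f (int a, int b)
{
  if (a)
    {
      /* Volatile asm with no operands: no volatile ops, but it must not
         be executed unconditionally by merging the two conditions.  */
      __asm__ __volatile__ ("" : : : "memory");
      if (b)
        return 1;
    }
  return 0;
}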

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2011-05-18  Richard Guenther  rguent...@suse.de

PR tree-optimization/49018
* gimple.c (gimple_has_side_effects): Volatile asms have side-effects.
* tree-ssa-ifcombine.c (bb_no_side_effects_p): Use
gimple_has_side_effects.

Index: gcc/gimple.c
===
--- gcc/gimple.c(revision 173854)
+++ gcc/gimple.c(working copy)
@@ -2354,6 +2354,10 @@ gimple_has_side_effects (const_gimple s)
   if (gimple_has_volatile_ops (s))
 return true;
 
+  if (gimple_code (s) == GIMPLE_ASM
+      && gimple_asm_volatile_p (s))
+    return true;
+
   if (is_gimple_call (s))
 {
   unsigned nargs = gimple_call_num_args (s);
@@ -2368,7 +2372,7 @@ gimple_has_side_effects (const_gimple s)
   if (gimple_call_lhs (s)
       && TREE_SIDE_EFFECTS (gimple_call_lhs (s)))
{
- gcc_assert (gimple_has_volatile_ops (s));
+ gcc_checking_assert (gimple_has_volatile_ops (s));
  return true;
}
 
@@ -2379,7 +2383,7 @@ gimple_has_side_effects (const_gimple s)
   for (i = 0; i < nargs; i++)
 if (TREE_SIDE_EFFECTS (gimple_call_arg (s, i)))
  {
-   gcc_assert (gimple_has_volatile_ops (s));
+   gcc_checking_assert (gimple_has_volatile_ops (s));
return true;
  }
 
@@ -2388,11 +2392,14 @@ gimple_has_side_effects (const_gimple s)
   else
 {
       for (i = 0; i < gimple_num_ops (s); i++)
-   if (TREE_SIDE_EFFECTS (gimple_op (s, i)))
- {
-   gcc_assert (gimple_has_volatile_ops (s));
-   return true;
- }
+   {
+ tree op = gimple_op (s, i);
+	  if (op && TREE_SIDE_EFFECTS (op))
+   {
+ gcc_checking_assert (gimple_has_volatile_ops (s));
+ return true;
+   }
+   }
 }
 
   return false;
Index: gcc/tree-ssa-ifcombine.c
===
--- gcc/tree-ssa-ifcombine.c(revision 173854)
+++ gcc/tree-ssa-ifcombine.c(working copy)
@@ -107,7 +107,7 @@ bb_no_side_effects_p (basic_block bb)
 {
   gimple stmt = gsi_stmt (gsi);
 
-  if (gimple_has_volatile_ops (stmt)
+  if (gimple_has_side_effects (stmt)
  || gimple_vuse (stmt))
return false;
 }


Re: [PATCH, i386]: Cleanup TARGET_GNU2_TLS usage

2011-05-18 Thread Rainer Orth
Uros,

 The test (tls.c), used to check all TLS models is attached to the
 message. I plan to convert it to proper dg test... ;)

I've got it in my tree since you sent it to me while debugging/testing
support for the various TLS models on Solaris.  I'd really prefer (and
have modified it this way) the test to be split into one test per access
model so it becomes easier to figure out what is failing.

I can provide such a patch if desired.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix extract_fixed_bit_field (PR middle-end/49029)

2011-05-18 Thread Jakub Jelinek
Hi!

The attached testcase ICEs on arm, because extract_fixed_bit_field
with tmode SImode (and an SImode non-NULL target) decides to use DImode for
the signed shifts, but doesn't clear target and thus attempts to use
that SImode target for DImode shifts.
The code apparently already has "if (mode != tmode) target = 0;", just
done at the wrong spot, before mode can be changed.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux
and tested with a cross to arm-linux on the testcase, ok for trunk/4.6?

2011-05-18  Jakub Jelinek  ja...@redhat.com

PR middle-end/49029
* expmed.c (extract_fixed_bit_field): Test whether target can be used
only after deciding which mode to use.

* gcc.c-torture/compile/pr49029.c: New test.

--- gcc/expmed.c.jj 2011-05-11 19:39:04.0 +0200
+++ gcc/expmed.c2011-05-18 11:38:43.0 +0200
@@ -1769,8 +1769,6 @@ extract_fixed_bit_field (enum machine_mo
   /* To extract a signed bit-field, first shift its msb to the msb of the word,
  then arithmetic-shift its lsb to the lsb of the word.  */
   op0 = force_reg (mode, op0);
-  if (mode != tmode)
-target = 0;
 
   /* Find the narrowest integer mode that contains the field.  */
 
@@ -1782,6 +1780,9 @@ extract_fixed_bit_field (enum machine_mo
break;
   }
 
+  if (mode != tmode)
+target = 0;
+
   if (GET_MODE_BITSIZE (mode) != (bitsize + bitpos))
 {
   int amount = GET_MODE_BITSIZE (mode) - (bitsize + bitpos);
--- gcc/testsuite/gcc.c-torture/compile/pr49029.c.jj2011-05-18 
11:55:25.0 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr49029.c   2011-05-18 
11:54:22.0 +0200
@@ -0,0 +1,10 @@
+/* PR middle-end/49029 */
+struct S { volatile unsigned f : 11; signed g : 30; } __attribute__((packed));
+struct T { volatile struct S h; } __attribute__((packed)) a;
+void foo (int);
+
+void
+bar ()
+{
+  foo (a.h.g);
+}

Jakub


Re: [PATCH][ARM] Add support for ADDW and SUBW instructions

2011-05-18 Thread Andrew Stubbs

Ping.

On 20/04/11 16:27, Andrew Stubbs wrote:

This patch adds basic support for the Thumb ADDW and SUBW instructions.

The patch permits the compiler to use the new instructions for constants
that can be loaded with a single instruction (i.e. 16-bit unshifted),
but does not support use of addw with split-constants; I have a patch
for that coming soon.

This patch requires that my previously posted patch for MOVW is applied
first.

OK?

Andrew




Re: [PATCH 2/2] Reimplementation of build_ref_for_offset

2011-05-18 Thread H.J. Lu
On Sat, Oct 23, 2010 at 10:12 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Sep 8, 2010 at 9:43 AM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 this patch reimplements build_ref_for_offset so that it simply creates
 a MEM_REF rather than trying to figure out what combination of
 component and array refs are necessary.  The main advantage of this
 approach is that this can never fail, allowing us to be more
 aggressive and remove a number of checks.

 There were two main problems with this, though.  First is that
 MEM_REFs are not particularly readable to by users.  This would be a
 problem when we are creating a reference that might be displayed to
 them in a warning or a debugger which is what we do with
 DECL_DEBUG_EXPR expressions.  We sometimes construct these
 artificially when propagating accesses across assignments.  So for
 those cases I retained the old implementation and only simplified it a
 bit - it is now called build_user_friendly_ref_for_offset.

 The other problem was bit-fields.  Constructing accesses to them was
 difficult enough but then I realized that I was not even able to
 detect the cases when I was accessing a bit field if their offset
 happened to be on a byte boundary.  I thought I would be able to
 figure this out from TYPE_SIZE and TYPE_PRECISION of exp_type but
 combinations that signal a bit-field in one language may not be
 applied in another (in C, small TYPE_PRECISION denotes bit-fields and
 TYPE_SIZE is big, but for example Fortran booleans have the precision
 set to one even though they are not bit-fields).

 So in the end I based the detection on the access structures that
 represented the thing being loaded or stored which I knew had their
 sizes correct because they are based on field sizes.  Since I use the
 access, the simplest way to actually create the reference to the bit
 field is to re-use the last component ref of its expression - that is
 what build_ref_for_model (meaning a model access) does.  Separating
 this from build_ref_for_offset (which cannot handle bit-fields) makes
 the code a bit cleaner and keeps the latter function for other users
 which know nothing about SRA access structures.

 I hope that you'll find these approaches reasonable.  The patch was
 bootstrapped and tested on x86_64-linux without any issues.  I'd like
 to commit it to trunk but I'm sure there will be comments and
 suggestions.

 Thanks,

 Martin



 2010-09-08  Martin Jambor  mjam...@suse.cz

        PR tree-optimization/44972
        * tree-sra.c: Include toplev.h.
        (build_ref_for_offset): Entirely reimplemented.
        (build_ref_for_model): New function.
        (build_user_friendly_ref_for_offset): New function.
        (analyze_access_subtree): Removed build_ref_for_offset check.
        (propagate_subaccesses_across_link): Likewise.
        (create_artificial_child_access): Use
        build_user_friendly_ref_for_offset.
        (propagate_subaccesses_across_link): Likewise.
        (ref_expr_for_all_replacements_p): Removed.
        (generate_subtree_copies): Updated comment.  Use build_ref_for_model.
        (sra_modify_expr): Use build_ref_for_model.
        (load_assign_lhs_subreplacements): Likewise.
        (sra_modify_assign): Removed ref_expr_for_all_replacements_p checks,
        checks for return values of build_ref_for_offset.
        * ipa-cp.c (ipcp_lattice_from_jfunc): No need to check return value of
        build_ref_for_offset.
        * ipa-prop.h: Include gimple.h
        * ipa-prop.c (ipa_compute_jump_functions): Update to look for 
 MEM_REFs.
        (ipa_analyze_indirect_call_uses): Update comment.
        * Makefile.in (tree-sra.o): Add $(GIMPLE_H) to dependencies.
        (IPA_PROP_H): Likewise.

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46150


This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49039

-- 
H.J.


PING: PATCH: PR rtl-optimization/48575: RTL vector patterns are limited to 26 elements

2011-05-18 Thread H.J. Lu
On Tue, Apr 26, 2011 at 3:32 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Apr 4, 2011 at 6:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Mar 31, 2011 at 5:05 AM, Kenneth Zadeck
 zad...@naturalbridge.com wrote:
 we hit this limit trying to write the explicit semantics for a
 vec_interleave_evenv32qi.

 ;;(define_insn "vec_interleave_evenv32qi"
 ;;  [(set (match_operand:V32QI 0 "register_operand" "=r")
 ;;    (vec_select:V32QI
 ;;      (vec_concat:V64QI
 ;;        (match_operand:V32QI 1 "register_operand" "0")
 ;;        (match_operand:V32QI 2 "register_operand" "r"))
 ;;      (parallel [(const_int  0) (const_int 32)
 ;;             (const_int  2) (const_int 34)
 ;;             (const_int  4) (const_int 36)
 ;;             (const_int  6) (const_int 38)
 ;;             (const_int  8) (const_int 40)
 ;;             (const_int 10) (const_int 42)
 ;;             (const_int 12) (const_int 44)
 ;;             (const_int 14) (const_int 46)
 ;;             (const_int 16) (const_int 48)
 ;;             (const_int 18) (const_int 50)
 ;;             (const_int 20) (const_int 52)
 ;;             (const_int 22) (const_int 54)
 ;;             (const_int 24) (const_int 56)
 ;;             (const_int 26) (const_int 58)
 ;;             (const_int 28) (const_int 60)
 ;;             (const_int 30) (const_int 62)])))]
 ;;  ""
 ;;  "rimihv\t%0,%2,8,15,8"
 ;;  [(set_attr "type" "rimi")])


 kenny

 On 03/31/2011 06:16 AM, Mike Stump wrote:

 On Mar 31, 2011, at 1:41 AM, Richard Guenther wrote:

 On Wed, Mar 30, 2011 at 8:09 PM, H.J. Luhongjiu...@intel.com  wrote:

 On Wed, Mar 30, 2011 at 08:02:38AM -0700, H.J. Lu wrote:

 Hi,

 Currently, we limit XVECEXP to 26 elements in machine description
 since we use letters 'a' to 'z' to encode them.  I don't see any
 reason why we can't go beyond 'z'.  This patch removes this
 restriction.
 Any comments?

 That was wrong.  The problem is in vector elements.  This patch passes
 bootstrap.  Any comments?

 Do you really need it?

 I'm trying to recall if this is the limit Kenny and I hit  If so,
 annoying.  Kenny could confirm if it was.  gcc's general strategy of, no
 fixed N gives gcc a certain flexibility that is very nice to have, on those
 general grounds, I kinda liked this patch.


 Is my patch OK to install?


 Here is my patch:

 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02105.html

 OK for trunk?


Hi,

No one is listed to review genrecog.c.  Could global reviewers comment
on my patch?

Thanks.

-- 
H.J.


Re: [PATCH, MELT] correcting path error in the Makefile.in

2011-05-18 Thread Ian Lance Taylor
Basile Starynkevitch bas...@starynkevitch.net writes:

 On Wed, May 18, 2011 at 10:27:11AM +0400, Andrey Belevantsev wrote:

 On 17.05.2011 23:42, Basile Starynkevitch wrote:
 On Tue, 17 May 2011 21:30:44 +0200
 Pierre Vittetpier...@pvittet.com  wrote:
 
 My contributor number is 634276.
 You don't have to write your FSF contributor number in each mail to
 gcc-patches.  This information is irrelevant to anybody reading the
 list as soon as you have got your papers right and got acquainted
 with the community.  So don't worry about this :)


 It would help a lot if Pierre Vittet had write access to the SVN of GCC.
 His legal papers are done. However, neither Pierre nor I understand how
 he can get actual write access to the SVN (that is, an SSH account on
 gcc.gnu.org). Apparently, Pierre needs to be presented (or introduced) by
 someone. But a plain write-after-approval GCC maintainer like me is not
 enough.

 So how can Pierre get write access to GCC ?

We usually like to see a few successful patches before granting people
write access, to make sure that people have the mechanics down before
they start changing the repository.

Ian


Re: PING: PATCH: PR other/48007: Unwind library doesn't work with UNITS_PER_WORD sizeof (void *)

2011-05-18 Thread H.J. Lu
On Tue, Apr 26, 2011 at 6:07 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, Apr 9, 2011 at 6:52 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Mar 24, 2011 at 12:15 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Mar 23, 2011 at 12:22 PM, Ulrich Weigand uweig...@de.ibm.com 
 wrote:
 Richard Henderson wrote:
 Because, really, if we consider the structure truly public, we can't even
 change the number of registers for a given port to support new features of
 the cpu.

 Indeed, and I remember we got bitten by that a long time ago, which is why
 s390.h now has this comment:

 /* Number of hardware registers that go into the DWARF-2 unwind info.
   To avoid ABI incompatibility, this number must not change even as
   'fake' hard registers are added or removed.  */
 #define DWARF_FRAME_REGISTERS 34

 I don't suppose there's any way that we can declare these old
 programs Just Broken, and drop this compatibility stuff?

 I wouldn't like that ... we did run into this problem in the wild, and
 some s390 users really run very old programs for some reason.

 However, I'm wondering: this bug that leaked the implementation of
 _Unwind_Context only ever affected the *original* version of the
 structure -- it was fixed before the extended context was ever
 added, right?

 If this is true, we'd still need to keep the original context format
 unchanged, but we'd be free to modify the *extended* format at any
 time, without ABI considerations and need for further versioning ...


 From what I can tell, the issues are:

 1. _Unwind_Context is supposed to be opaque and we are free to
 change it.  We should be able to extend DWARF_FRAME_REGISTERS
 to support the new hard registers if needed, without breaking binary
 compatibility.
 2.  _Unwind_Context was leaked and wasn't really opaque.  To
 provide backward binary compatibility, we are stuck with what we
 had.

 Is that possible to implement something along the line:

 1. Add some bits to _Unwind_Context so that we can detect
 the leaked _Unwind_Context.
 2. When a leaked _Unwind_Context is detected at run-time,
 as a compile time option, a target can either provide binary
 compatibility or issue a run-time error.

 This is the attempt to implement it.  Any comments?

 Thanks.

 --
 H.J.
 --
 2011-04-09  H.J. Lu  hongjiu...@intel.com

        PR other/48007
        * unwind-dw2.c (UNIQUE_UNWIND_CONTEXT): New.
        (_Unwind_Context): If UNIQUE_UNWIND_CONTEXT is defined, add
        dwarf_reg_size_table and value, remove version and by_value.
        (EXTENDED_CONTEXT_BIT): Don't define if UNIQUE_UNWIND_CONTEXT
        is defined.
        (_Unwind_IsExtendedContext): Likewise.
        (_Unwind_GetGR): Support UNIQUE_UNWIND_CONTEXT.
        (_Unwind_SetGR): Likewise.
        (_Unwind_GetGRPtr): Likewise.
        (_Unwind_SetGRPtr): Likewise.
        (_Unwind_SetGRValue): Likewise.
        (_Unwind_GRByValue): Likewise.
        (__frame_state_for): Initialize dwarf_reg_size_table field if
        UNIQUE_UNWIND_CONTEXT is defined.
        (uw_install_context_1): Likewise.  Support UNIQUE_UNWIND_CONTEXT.


 PING.


Hi Jason,

Can you take a look at:

http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00695.html

Thanks.


-- 
H.J.


Re: [Patch, Fortran] PR 48700: memory leak with MOVE_ALLOC

2011-05-18 Thread Tobias Burnus
Janus Weil wrote:
 The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?

OK. Thanks for the patch!

(What next on your gfortran agenda?)

Tobias

PS: For the following two patches review is pending:
My trans*.c coarray patch at
  http://gcc.gnu.org/ml/fortran/2011-05/msg00123.html
Janne's
  http://gcc.gnu.org/ml/fortran/2011-05/msg00122.html

 2011-05-16  Janus Weil  ja...@gcc.gnu.org

   PR fortran/48700
   * trans-intrinsic.c (gfc_conv_intrinsic_move_alloc): Deallocate 'TO'
   argument to avoid memory leaks.

 2011-05-16  Janus Weil  ja...@gcc.gnu.org

   PR fortran/48700
   * gfortran.dg/move_alloc_4.f90: New.


[patch gimplifier]: Change TRUTH_(AND|OR|XOR) expressions to binary form

2011-05-18 Thread Kai Tietz
Hello

As follow-up for logical to binary transition

2011-05-18  Kai Tietz  kti...@redhat.com

* tree-cfg.c (verify_gimple_assign_binary): Barf on
TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR.
(gimplify_expr): Boolify TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR,
TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR. Additionally
move TRUTH_AND|OR|XOR_EXPR to its binary form.

Boostrapped for x86_64-pc-linux-gnu and regression tested for ada,
fortran, g++, and c. Ok for apply?

Regards,
Kai
Index: gcc/gcc/gimplify.c
===
--- gcc.orig/gcc/gimplify.c 2011-05-13 13:15:01.0 +0200
+++ gcc/gcc/gimplify.c  2011-05-18 14:03:31.730740200 +0200
@@ -7210,7 +7210,21 @@ gimplify_expr (tree *expr_p, gimple_seq
break;
  }
  }
- 
+
+   switch (TREE_CODE (*expr_p))
+ {
+ case TRUTH_AND_EXPR:
+   TREE_SET_CODE (*expr_p, BIT_AND_EXPR);
+   break;
+ case TRUTH_OR_EXPR:
+   TREE_SET_CODE (*expr_p, BIT_IOR_EXPR);
+   break;
+ case TRUTH_XOR_EXPR:
+   TREE_SET_CODE (*expr_p, BIT_XOR_EXPR);
+   break;
+ default:
+   break;
+ }
  /* Classified as tcc_expression.  */
  goto expr_2;
 
Index: gcc/gcc/tree-cfg.c
===
--- gcc.orig/gcc/tree-cfg.c 2011-05-18 14:01:18.0 +0200
+++ gcc/gcc/tree-cfg.c  2011-05-18 14:05:06.512276000 +0200
@@ -3555,29 +3555,11 @@ do_pointer_plus_expr_check:
 
 case TRUTH_ANDIF_EXPR:
 case TRUTH_ORIF_EXPR:
-  gcc_unreachable ();
-
 case TRUTH_AND_EXPR:
 case TRUTH_OR_EXPR:
 case TRUTH_XOR_EXPR:
-  {
-   /* We require two-valued operand types.  */
-	if (!(TREE_CODE (rhs1_type) == BOOLEAN_TYPE
-	      || (INTEGRAL_TYPE_P (rhs1_type)
-		  && TYPE_PRECISION (rhs1_type) == 1))
-	    || !(TREE_CODE (rhs2_type) == BOOLEAN_TYPE
-		 || (INTEGRAL_TYPE_P (rhs2_type)
-		     && TYPE_PRECISION (rhs2_type) == 1)))
-	  {
-	    error ("type mismatch in binary truth expression");
-	    debug_generic_expr (lhs_type);
-	    debug_generic_expr (rhs1_type);
-	    debug_generic_expr (rhs2_type);
-	    return true;
-	  }
 
-   break;
-  }
+  gcc_unreachable ();
 
 case LT_EXPR:
 case LE_EXPR:


Re: [PATCH] fix vfmsubaddpd/vfmaddsubpd generation

2011-05-18 Thread Uros Bizjak
Hello!

 This patch fixes an obvious problem: the fma4_fmsubadd/fma4_fmaddsub
 instruction templates don't generate vfmsubaddpd/vfmaddsubpd because
 they don't use ssemodesuffix

 This passes bootstrap on x86_64 on trunk.  Okay to commit?

See comments in the code.

 BTW, I'm testing on gcc-4_6-branch.  Should I post a different patch
 thread, or just use this one?

No, the patch is clear and simple enough, you don't need to post it twice.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3625d9b..e86ea4e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-05-17  Harsha Jagasia  harsha.jaga...@amd.com
+
+   * config/i386/sse.md (fma4_fmsubadd): Use ssemodesuffix.
+   (fma4_fmaddsub): Likewise
+
 2011-05-17  Richard Guenther  rguent...@suse.de

ChangeLog should be included in the message body, not in the patch.
Please see [1] for details.

* gimple.c (iterative_hash_gimple_type): Simplify singleton
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 291bffb..7c4e6dd 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1663,7 +1663,7 @@
    (match_operand:VF 3 "nonimmediate_operand" "xm,x")]
   UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmaddsubps<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])

No, ssemodesuffix mode attribute resolves to ps and pd for the VF mode
iterator, so vfmaddsub<ssemodesuffix>.

@@ -1676,7 +1676,7 @@
     (match_operand:VF 3 "nonimmediate_operand" "xm,x"))]
   UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmsubaddps<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
    (set_attr "mode" "<MODE>")])

Same here.

OK everywhere with these two changes.

[1] http://gcc.gnu.org/contribute.html.

Thanks,
Uros.


[PATCH 1/2] Add bf592 support

2011-05-18 Thread Henderson, Stuart
Hi,

The attached patch adds support for the bfin bf592 part.

* doc/invoke.texi (Blackfin Options): -mcpu accepts bf592.
* config/bfin/t-bfin-elf (MULTILIB_MATCHES): Select bf532-none for
bf592-none.
* config/bfin/t-bfin-linux (MULTILIB_MATCHES): Likewise.
* config/bfin/t-bfin-uclinux (MULTILIB_MATCHES): Likewise.
* config/bfin/bfin.c (bfin_cpus): Add bf592.
* config/bfin/bfin.h (TARGET_CPU_CPP_BUILTINS): Define
__ADSPBF592__ and __ADSPBF59x__ for BFIN_CPU_BF592.
* config/bfin/bfin-opts.h (bfin_cpu_type): Add BFIN_CPU_BF592.
* config/bfin/elf.h (LIB_SPEC): Add bf592.


Ok to add to trunk?

thanks,
Stu

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 173825)
+++ gcc/doc/invoke.texi (working copy)
@@ -10414,7 +10414,7 @@
 @samp{bf534}, @samp{bf536}, @samp{bf537}, @samp{bf538}, @samp{bf539},
 @samp{bf542}, @samp{bf544}, @samp{bf547}, @samp{bf548}, @samp{bf549},
 @samp{bf542m}, @samp{bf544m}, @samp{bf547m}, @samp{bf548m}, @samp{bf549m},
-@samp{bf561}.
+@samp{bf561}, @samp{bf592}.
 The optional @var{sirevision} specifies the silicon revision of the target
 Blackfin processor.  Any workarounds available for the targeted silicon 
revision
 will be enabled.  If @var{sirevision} is @samp{none}, no workarounds are 
enabled.
Index: gcc/config/bfin/t-bfin-elf
===
--- gcc/config/bfin/t-bfin-elf  (revision 173825)
+++ gcc/config/bfin/t-bfin-elf  (working copy)
@@ -58,6 +58,7 @@
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549-none
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549m-none
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf561-none
+MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf592-none

 MULTILIB_EXCEPTIONS=mleaf-id-shared-library*
 MULTILIB_EXCEPTIONS+=mcpu=bf532-none/mleaf-id-shared-library*
Index: gcc/config/bfin/bfin-opts.h
===
--- gcc/config/bfin/bfin-opts.h (revision 173825)
+++ gcc/config/bfin/bfin-opts.h (working copy)
@@ -53,7 +53,8 @@
   BFIN_CPU_BF548M,
   BFIN_CPU_BF549,
   BFIN_CPU_BF549M,
-  BFIN_CPU_BF561
+  BFIN_CPU_BF561,
+  BFIN_CPU_BF592
 } bfin_cpu_t;

 #endif
Index: gcc/config/bfin/elf.h
===
--- gcc/config/bfin/elf.h   (revision 173825)
+++ gcc/config/bfin/elf.h   (working copy)
@@ -51,6 +51,7 @@
%{mmulticore:%{mcorea:-T bf561a.ld%s}} \
%{mmulticore:%{mcoreb:-T bf561b.ld%s}} \
%{mmulticore:%{!mcorea:%{!mcoreb:-T bf561m.ld%s \
+ %{mcpu=bf592*:-T bf592.ld%s} \
  %{!mcpu=*:%eno processor type specified for linking} \
  %{!mcpu=bf561*:-T bfin-common-sc.ld%s} \
  %{mcpu=bf561*:%{!mmulticore:-T bfin-common-sc.ld%s} \
Index: gcc/config/bfin/bfin.c
===
--- gcc/config/bfin/bfin.c  (revision 173825)
+++ gcc/config/bfin/bfin.c  (working copy)
@@ -350,6 +350,11 @@
| WA_05000283 | WA_05000257 | WA_05000315 | WA_LOAD_LCREGS
	| WA_05000074},
 
+  {"bf592", BFIN_CPU_BF592, 0x0001,
+   WA_SPECULATIVE_LOADS | WA_05000074},
+  {"bf592", BFIN_CPU_BF592, 0x0000,
+   WA_SPECULATIVE_LOADS | WA_05000074},
+
   {NULL, BFIN_CPU_UNKNOWN, 0, 0}
 };

Index: gcc/config/bfin/bfin.h
===
--- gcc/config/bfin/bfin.h  (revision 173825)
+++ gcc/config/bfin/bfin.h  (working copy)
@@ -140,6 +140,10 @@
	case BFIN_CPU_BF561:		\
	  builtin_define ("__ADSPBF561__"); \
	  break;			\
+	case BFIN_CPU_BF592:		\
+	  builtin_define ("__ADSPBF592__"); \
+	  builtin_define ("__ADSPBF59x__"); \
+	  break;			\
}   \
\
   if (bfin_si_revision != -1)  \
Index: gcc/config/bfin/t-bfin-uclinux
===
--- gcc/config/bfin/t-bfin-uclinux  (revision 173825)
+++ gcc/config/bfin/t-bfin-uclinux  (working copy)
@@ -58,6 +58,7 @@
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549-none
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549m-none
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf561-none
+MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf592-none

 MULTILIB_EXCEPTIONS=mleaf-id-shared-library*
 MULTILIB_EXCEPTIONS+=mcpu=bf532-none/mleaf-id-shared-library*
Index: gcc/config/bfin/t-bfin-linux
===
--- gcc/config/bfin/t-bfin-linux(revision 173825)
+++ gcc/config/bfin/t-bfin-linux(working copy)
@@ -57,6 +57,7 @@
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549-none
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549m-none
 MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf561-none
+MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf592-none

 SHLIB_MAPFILES=$(srcdir)/config/bfin/libgcc-bfin.ver



[PATCH 2/2] Add bf592 support

2011-05-18 Thread Henderson, Stuart
Hi,

The attached patch adds a new test for the bfin bf592 part.


* gcc.target/bfin/mcpu-bf592.c: New test.


Ok to add to trunk?

thanks,
Stu

Index: gcc/testsuite/gcc.target/bfin/mcpu-bf592.c
===
--- gcc/testsuite/gcc.target/bfin/mcpu-bf592.c  (revision 0)
+++ gcc/testsuite/gcc.target/bfin/mcpu-bf592.c  (revision 0)
@@ -0,0 +1,31 @@
+/* Test for -mcpu=.  */
+/* { dg-do preprocess } */
+/* { dg-bfin-options "-mcpu=bf592" } */
+
+#ifndef __ADSPBF592__
+#error __ADSPBF592__ is not defined
+#endif
+
+#ifndef __ADSPBF59x__
+#error __ADSPBF59x__ is not defined
+#endif
+
+#if __SILICON_REVISION__ != 0x0001
+#error __SILICON_REVISION__ is not 0x0001
+#endif
+
+#ifndef __WORKAROUNDS_ENABLED
+#error __WORKAROUNDS_ENABLED is not defined
+#endif
+
+#ifdef __WORKAROUND_RETS
+#error __WORKAROUND_RETS is defined
+#endif
+
+#ifndef __WORKAROUND_SPECULATIVE_LOADS
+#error __WORKAROUND_SPECULATIVE_LOADS is not defined
+#endif
+
+#ifdef __WORKAROUND_SPECULATIVE_SYNCS
+#error __WORKAROUND_SPECULATIVE_SYNCS is defined
+#endif



[PATCH PR45098, 4/10] Iv init cost.

2011-05-18 Thread Tom de Vries
On 05/17/2011 09:17 AM, Tom de Vries wrote:
 On 05/17/2011 09:10 AM, Tom de Vries wrote:
 Hi Zdenek,

 I have a patch set for for PR45098.

 01_object-size-target.patch
 02_pr45098-rtx-cost-set.patch
 03_pr45098-computation-cost.patch
 04_pr45098-iv-init-cost.patch
 05_pr45098-bound-cost.patch
 06_pr45098-bound-cost.test.patch
 07_pr45098-nowrap-limits-iterations.patch
 08_pr45098-nowrap-limits-iterations.test.patch
 09_pr45098-shift-add-cost.patch
 10_pr45098-shift-add-cost.test.patch

 I will sent out the patches individually.

 
 OK for trunk?
 
 Thanks,
 - Tom

Resubmitting with comment.

The init cost of an iv will in general not be zero. It will be
exceptional that the iv register happens to be initialized with the
proper value at no cost. In general, there will at the very least be a
regcopy or a const set.
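
To illustrate with the tr5 function from the test added in patch 6/10: the
pointer IV for the store starts at 'array', so even in the cheapest case its
initialization is a register copy before the loop; assuming a zero init cost
made such candidates look better than they really are:

void
tr5 (short array[], int n)
{
  int x;

  if (n > 0)
    for (x = 0; x < n; x++)
      array[x] = 0;   /* the chosen IV is set up from 'array' at loop entry */
}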

2011-05-05  Tom de Vries  t...@codesourcery.com

PR target/45098
* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
cost_base.cost == 0.
Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
 
   base = cand-iv-base;
   cost_base = force_var_cost (data, base, NULL);
+  if (cost_base.cost == 0)
+    cost_base.cost = COSTS_N_INSNS (1);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data-speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);


[PATCH PR45098, 6/10] Bound cost - test cases.

2011-05-18 Thread Tom de Vries
On 05/17/2011 09:19 AM, Tom de Vries wrote:
 On 05/17/2011 09:10 AM, Tom de Vries wrote:
 Hi Zdenek,

 I have a patch set for for PR45098.

 01_object-size-target.patch
 02_pr45098-rtx-cost-set.patch
 03_pr45098-computation-cost.patch
 04_pr45098-iv-init-cost.patch
 05_pr45098-bound-cost.patch
 06_pr45098-bound-cost.test.patch
 07_pr45098-nowrap-limits-iterations.patch
 08_pr45098-nowrap-limits-iterations.test.patch
 09_pr45098-shift-add-cost.patch
 10_pr45098-shift-add-cost.test.patch

 I will sent out the patches individually.

 
 OK for trunk?
 
 Thanks,
 - Tom

These patch adds 2 new test cases. These need the preceding patches to pass.
Index: gcc/testsuite/gcc.target/arm/ivopts-2.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-2.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern void foo2 (short*);
+
+void
+tr4 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      foo2 (&array[x]);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 26 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump ivopts } } */
Index: gcc/testsuite/gcc.target/arm/ivopts.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts.c (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+void
+tr5 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 20 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump ivopts } } */


[PATCH PR45098, 7/10] Nowrap limits iterations

2011-05-18 Thread Tom de Vries
On 05/17/2011 09:20 AM, Tom de Vries wrote:
 On 05/17/2011 09:10 AM, Tom de Vries wrote:
 Hi Zdenek,

 I have a patch set for for PR45098.

 01_object-size-target.patch
 02_pr45098-rtx-cost-set.patch
 03_pr45098-computation-cost.patch
 04_pr45098-iv-init-cost.patch
 05_pr45098-bound-cost.patch
 06_pr45098-bound-cost.test.patch
 07_pr45098-nowrap-limits-iterations.patch
 08_pr45098-nowrap-limits-iterations.test.patch
 09_pr45098-shift-add-cost.patch
 10_pr45098-shift-add-cost.test.patch

 I will sent out the patches individually.

 
 OK for trunk?
 
 Thanks,
 - Tom

Resubmitting with comment.

This patch attempts to estimate the number of iterations of the loop based on
nonwrapping arithmetic in the loop body.
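
As an illustration (the same shape as the gcc.target/arm/ivopts-2.c test from
patch 6/10): the access array[x] implies non-wrapping pointer arithmetic, so
the loop cannot run for more iterations than the period of that IV, and the
patch uses this as an upper bound when nothing better is known about 'n':

extern void foo2 (short *);

void
tr4 (short array[], int n)
{
  int x;

  if (n > 0)
    for (x = 0; x < n; x++)
      foo2 (&array[x]);  /* non-wrapping access that bounds the iteration count */
}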

2011-05-05  Tom de Vries  t...@codesourcery.com

PR target/45098
* tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields
max_iterations_p and max_iterations.
(is_nonwrap_use, max_loop_iterations, set_max_iterations): New function.
(may_eliminate_iv): Use max_iterations_p and max_iterations.
(tree_ssa_iv_optimize_loop): Use set_max_iterations.
Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c (revision 173355)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -291,6 +291,12 @@ struct ivopts_data
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether max_iterations is valid.  */
+  bool max_iterations_p;
+
+  /* Maximum number of iterations of current_loop.  */
+  double_int max_iterations;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -4319,6 +4325,108 @@ iv_elimination_compare (struct ivopts_da
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Determine if USE contains non-wrapping arithmetic.  */
+
+static bool
+is_nonwrap_use (struct ivopts_data *data, struct iv_use *use)
+{
+  gimple stmt = use->stmt;
+  tree var, ptr, ptr_type;
+
+  if (!is_gimple_assign (stmt))
+return false;
+
+  switch (gimple_assign_rhs_code (stmt))
+{
+case POINTER_PLUS_EXPR:
+  ptr = gimple_assign_rhs1 (stmt);
+  ptr_type = TREE_TYPE (ptr);
+  var = gimple_assign_rhs2 (stmt);
+  if (!expr_invariant_in_loop_p (data->current_loop, ptr))
+return false;
+  break;
+case ARRAY_REF:
+  ptr = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 0);
+  ptr_type = build_pointer_type (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+  var = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 1);
+  break;
+default:
+  return false;
+}
+
+  if (!nowrap_type_p (ptr_type))
+return false;
+
+  if (TYPE_PRECISION (ptr_type) != TYPE_PRECISION (TREE_TYPE (var)))
+return false;
+
+  return true;
+}
+
+/* Attempt to infer maximum number of loop iterations of DATA->current_loop
+   from uses in loop containing non-wrapping arithmetic.  If successful,
+   return true, and return maximum iterations in MAX_NITER.  */
+
+static bool
+max_loop_iterations (struct ivopts_data *data, double_int *max_niter)
+{
+  struct iv_use *use;
+  struct iv *iv;
+  bool found = false;
+  double_int period;
+  gimple stmt;
+  unsigned i;
+
+  for (i = 0; i < n_iv_uses (data); i++)
+    {
+      use = iv_use (data, i);
+
+      stmt = use->stmt;
+      if (!just_once_each_iteration_p (data->current_loop, gimple_bb (stmt)))
+	continue;
+
+      if (!is_nonwrap_use (data, use))
+	continue;
+
+      iv = use->iv;
+      if (iv->step == NULL_TREE || TREE_CODE (iv->step) != INTEGER_CST)
+	continue;
+      period = tree_to_double_int (iv_period (iv));
+
+      if (found)
+	*max_niter = double_int_umin (*max_niter, period);
+      else
+	{
+	  found = true;
+	  *max_niter = period;
+	}
+    }
+
+  return found;
+}
+
+/* Initializes DATA->max_iterations and DATA->max_iterations_p.  */
+
+static void
+set_max_iterations (struct ivopts_data *data)
+{
+  double_int max_niter, max_niter2;
+  bool estimate1, estimate2;
+
+  data->max_iterations_p = false;
+  estimate1 = estimated_loop_iterations (data->current_loop, true, &max_niter);
+  estimate2 = max_loop_iterations (data, &max_niter2);
+  if (!(estimate1 || estimate2))
+    return;
+  if (estimate1 && estimate2)
+    data->max_iterations = double_int_umin (max_niter, max_niter2);
+  else if (estimate1)
+    data->max_iterations = max_niter;
+  else
+    data->max_iterations = max_niter2;
+  data->max_iterations_p = true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
of candidate CAND.  If so, store the value compared with to BOUND.  */
 
@@ -4391,10 +4499,10 @@ may_eliminate_iv (struct ivopts_data *da
   /* See if we can take advantage of infered loop bound information.  */
   if (loop_only_exit_p (loop, exit))
 {
-  if (!estimated_loop_iterations (loop, true, &max_niter))
+  if (!data->max_iterations_p)
 return false;
   /* 

Re: C++ PATCH for c++/48948 (rejecting constexpr friend that takes the current class)

2011-05-18 Thread Jason Merrill

On 05/11/2011 05:27 PM, Jason Merrill wrote:

We want to allow a constexpr friend function that takes the current
class, so we need to defer checking the literality of parameter types
until any classes involved are complete.


It was pointed out to me that the restriction already only applies to 
function definitions, not declarations, which dramatically simplifies 
the code.  This patch reverts most of the previous one, and only checks 
return/parameter types at the point of definition.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit f2a2c7b6af06123b5f81bd474b60bddfe9b58550
Author: Jason Merrill ja...@redhat.com
Date:   Mon May 16 17:21:17 2011 -0400

	PR c++/48948
	PR c++/49015
	* class.c (finalize_literal_type_property): Do check
	for constexpr member functions of non-literal class.
	(finish_struct): Don't call check_deferred_constexpr_decls.
	* cp-tree.h: Don't declare it.
	(DECL_DEFERRED_CONSTEXPR_CHECK): Remove.
	* decl.c (grok_special_member_properties): Don't check it.
	(grokfndecl): Don't call validate_constexpr_fundecl.
	(start_preparsed_function): Do call it.
	* pt.c (tsubst_decl): Don't call it.
	(instantiate_class_template_1): Don't call
	check_deferred_constexpr_decls.
	* semantics.c (literal_type_p): Check for any incompleteness.
	(ensure_literal_type_for_constexpr_object): Likewise.
	(is_valid_constexpr_fn): Revert deferral changes.
	(validate_constexpr_fundecl): Likewise.
	(register_constexpr_fundef): Likewise.
	(check_deferred_constexpr_decls): Remove.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index dc2c509..4e52b18 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4582,6 +4582,8 @@ type_requires_array_cookie (tree type)
 static void
 finalize_literal_type_property (tree t)
 {
+  tree fn;
+
   if (cxx_dialect < cxx0x
   || TYPE_HAS_NONTRIVIAL_DESTRUCTOR (t)
   /* FIXME These constraints seem unnecessary; remove from standard.
@@ -4591,6 +4593,18 @@ finalize_literal_type_property (tree t)
   else if (CLASSTYPE_LITERAL_P (t) && !TYPE_HAS_TRIVIAL_DFLT (t)
 	   && !TYPE_HAS_CONSTEXPR_CTOR (t))
 CLASSTYPE_LITERAL_P (t) = false;
+
+  if (!CLASSTYPE_LITERAL_P (t))
+    for (fn = TYPE_METHODS (t); fn; fn = DECL_CHAIN (fn))
+      if (DECL_DECLARED_CONSTEXPR_P (fn)
+	  && TREE_CODE (fn) != TEMPLATE_DECL
+	  && DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)
+	  && !DECL_CONSTRUCTOR_P (fn))
+	{
+	  DECL_DECLARED_CONSTEXPR_P (fn) = false;
+	  if (!DECL_TEMPLATE_INFO (fn))
+	    error ("enclosing class of %q+#D is not a literal type", fn);
+	}
 }
 
 /* Check the validity of the bases and members declared in T.  Add any
@@ -5831,8 +5845,6 @@ finish_struct (tree t, tree attributes)
   else
  error ("trying to finish struct, but kicked out due to previous parse errors");
 
-  check_deferred_constexpr_decls ();
-
   if (processing_template_decl && at_function_scope_p ())
 add_stmt (build_min (TAG_DEFN, t));
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c0b5290..dfb2b66 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -93,7 +93,6 @@ c-common.h, not after.
   TYPENAME_IS_RESOLVING_P (in TYPE_NAME_TYPE)
   LAMBDA_EXPR_DEDUCE_RETURN_TYPE_P (in LAMBDA_EXPR)
   TARGET_EXPR_DIRECT_INIT_P (in TARGET_EXPR)
-  DECL_DEFERRED_CONSTEXPR_CHECK (in FUNCTION_DECL)
3: (TREE_REFERENCE_EXPR) (in NON_LVALUE_EXPR) (commented-out).
   ICS_BAD_FLAG (in _CONV)
   FN_TRY_BLOCK_P (in TRY_BLOCK)
@@ -2345,11 +2344,6 @@ struct GTY((variable_size)) lang_decl {
 #define DECL_DECLARED_CONSTEXPR_P(DECL) \
   DECL_LANG_FLAG_8 (VAR_OR_FUNCTION_DECL_CHECK (STRIP_TEMPLATE (DECL)))
 
-/* True if we can't tell yet whether the argument/return types of DECL
-   are literal because one is still being defined.  */
-#define DECL_DEFERRED_CONSTEXPR_CHECK(DECL) \
-  TREE_LANG_FLAG_2 (FUNCTION_DECL_CHECK (STRIP_TEMPLATE (DECL)))
-
 /* Nonzero if this DECL is the __PRETTY_FUNCTION__ variable in a
template function.  */
 #define DECL_PRETTY_FUNCTION_P(NODE) \
@@ -5337,7 +5331,6 @@ extern void finish_handler_parms		(tree, tree);
 extern void finish_handler			(tree);
 extern void finish_cleanup			(tree, tree);
 extern bool literal_type_p (tree);
-extern void check_deferred_constexpr_decls (void);
 extern tree validate_constexpr_fundecl (tree);
 extern tree register_constexpr_fundef (tree, tree);
 extern bool check_constexpr_ctor_body (tree, tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 7939140..e950c43 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7200,10 +7200,7 @@ grokfndecl (tree ctype,
   if (inlinep)
 DECL_DECLARED_INLINE_P (decl) = 1;
   if (inlinep & 2)
-{
-  DECL_DECLARED_CONSTEXPR_P (decl) = true;
-  validate_constexpr_fundecl (decl);
-}
+DECL_DECLARED_CONSTEXPR_P (decl) = true;
 
   DECL_EXTERNAL (decl) = 1;
   if (quals && TREE_CODE (type) == FUNCTION_TYPE)
@@ -10681,9 +10678,6 @@ grok_special_member_properties (tree decl)
 	TYPE_HAS_LIST_CTOR (class_type) = 1;
 
   if 

Re: [PATCH][?/n] LTO type merging cleanup

2011-05-18 Thread Jan Hubicka
 
 We can end up with an infinite recursion as gimple_register_type
 tries to register TYPE_MAIN_VARIANT first.  This is because we
 are being called from the LTO type-fixup code which walks the
 type graph and adjusts types to their leaders.  So we can
 be called for type SCCs that are only partially fixed up yet
 which means TYPE_MAIN_VARIANT might temporarily not honor
 the invariant that the main variant of a main variant is itself.
 Thus, simply avoid recursing more than once - we are sure that
 we will be reaching at most type duplicates in further recursion.
 
 Bootstrap & regtest pending on x86_64-unknown-linux-gnu.

With this patch, WPA stage passes with some improvements I reported to the
mozilla metabug.

We now get ICE in ltrans:
#0  gimple_register_type (t=0x0) at ../../gcc/gimple.c:4616
#1  0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) at 
../../gcc/gimple.c:4890
#2  0x0048f14d in lto_ft_type (t=0x7fffe851f498) at 
../../gcc/lto/lto.c:401
#3  lto_fixup_types (t=0x7fffe851f498) at ../../gcc/lto/lto.c:581
#4  0x0048f4a0 in uniquify_nodes (node=Unhandled dwarf expression 
opcode 0xf3

TYPE_MAIN_VARIANT is NULL.
(gdb) up
#1  0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) at 
../../gcc/gimple.c:4890
4890  t = gimple_register_type (TYPE_MAIN_VARIANT (t));
(gdb) p debug_generic_stmt (t)
struct _ffi_type

$1 = void
(gdb) p debug_tree (t)
 record_type 0x7fffe851f498 _ffi_type BLK
size integer_cst 0x77ecf680 type integer_type 0x77eca0a8 
bit_size_type constant 192
unit size integer_cst 0x77ecf640 type integer_type 0x77eca000 
constant 24
align 64 symtab 0 alias set -1 structural equality
fields field_decl 0x7fffe87684c0 size
type integer_type 0x77eca690 long unsigned int public unsigned DI
size integer_cst 0x77ecf1e0 constant 64
unit size integer_cst 0x77ecf200 constant 8
align 64 symtab 0 alias set -1 canonical type 0x77eca690 
precision 64 min integer_cst 0x77ecf220 0 max integer_cst 0x77ecf1c0 
18446744073709551615
pointer_to_this pointer_type 0x75336150 reference_to_this 
reference_type 0x70aba000
used unsigned nonlocal DI file ctypes/libffi/include/ffi.h line 109 col 
0 size integer_cst 0x77ecf1e0 64 unit size integer_cst 0x77ecf200 8
align 64 offset_align 128
offset integer_cst 0x77ebaf00 constant 0
bit offset integer_cst 0x77ecf420 constant 0 context record_type 
0x7fffe851f2a0 _ffi_type
chain field_decl 0x7fffe8768558 alignment type integer_type 
0x77eca3f0 short unsigned int
used unsigned nonlocal HI file ctypes/libffi/include/ffi.h line 110 
col 0
size integer_cst 0x77ecf080 constant 16
unit size integer_cst 0x77ecf0a0 constant 2
align 16 offset_align 128 offset integer_cst 0x77ebaf00 0 bit 
offset integer_cst 0x77ecf1e0 64 context record_type 0x7fffe851f2a0 
_ffi_type chain field_decl 0x7fffe87685f0 type
chain type_decl 0x7fffe8966ac8 _ffi_type
$2 = void

Let me know if there is anything easy I could work out ;)
I think the bug may be in the recursion guard.  When you have a cycle of MVs
of length greater than 2, you won't walk them all.
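
A sketch of that concern (hypothetical code, not the patch itself): suppose
the fixup code temporarily leaves main-variant links forming a chain
a -> b -> c where only c is its own main variant.  A guard that follows
TYPE_MAIN_VARIANT just once resolves a only as far as b; reaching the real
main variant needs a walk until the link is a fixed point, plus some defense
against a genuine cycle:

static tree
resolve_main_variant (tree t)
{
  tree mv = TYPE_MAIN_VARIANT (t);
  tree start = mv;

  /* Walk until the main variant is its own main variant; bail out if we
     come back to where we started, which would mean a real cycle.  */
  while (TYPE_MAIN_VARIANT (mv) != mv)
    {
      mv = TYPE_MAIN_VARIANT (mv);
      if (mv == start)
	break;
    }
  return mv;
}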

Honza


[PATCH PR45098, 8/10] Nowrap limits iterations - test cases.

2011-05-18 Thread Tom de Vries
On 05/17/2011 09:21 AM, Tom de Vries wrote:
 On 05/17/2011 09:10 AM, Tom de Vries wrote:
 Hi Zdenek,

 I have a patch set for PR45098.

 01_object-size-target.patch
 02_pr45098-rtx-cost-set.patch
 03_pr45098-computation-cost.patch
 04_pr45098-iv-init-cost.patch
 05_pr45098-bound-cost.patch
 06_pr45098-bound-cost.test.patch
 07_pr45098-nowrap-limits-iterations.patch
 08_pr45098-nowrap-limits-iterations.test.patch
 09_pr45098-shift-add-cost.patch
 10_pr45098-shift-add-cost.test.patch

 I will send out the patches individually.

 
 OK for trunk?
 
 Thanks,
 - Tom

Resubmitting with comment.

This patch introduces 3 new testcases, and modifies an existing test case.
The 3 new testcases need the preceding patches to pass.

The modified test case is ivopt_infer_2.c.

#ifndef TYPE
#define TYPE char*
#endif

extern int a[];

/* Can not infer loop iteration from array -- exit test can not be replaced.  */
void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
{
  TYPE dstn= dst + i_width;
  TYPE dst0 = dst;
  unsigned long long i = 0;
  for( ; dst <= dstn; )
{
  dst0[i] = ( src1[i] + src2[i] + 1 +a[i]) >> 1;
  dst++;
  i += 16;
}
}

The estimates in set_max_iterations for this testcase are:

(gdb) p /x  max_niter
$3 = {low = 0x0, high = 0x1}
(gdb) p /x  max_niter2
$4 = {low = 0x3ff, high = 0x0}

The second estimate is based on a[i], which contains the non-wrapping pointer
arithmetic a+i. Var i is incremented by 16 each iteration, and a is an int
pointer (4-byte elements), so each iteration advances the address by 16 * 4 =
64 bytes, which explains the factor 64 difference between the two estimates.

2011-05-05  Tom de Vries  t...@codesourcery.com

PR target/45098
* gcc.target/arm/ivopts-3.c: New test.
* gcc.target/arm/ivopts-4.c: New test.
* gcc.target/arm/ivopts-5.c: New test.
* gcc.dg/tree-ssa/ivopt_infer_2.c: Adapt test.
Index: gcc/testsuite/gcc.target/arm/ivopts-3.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-3.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo2 (short*) __attribute__((pure));
+
+unsigned int
+tr3 (short array[], unsigned int n)
+{
+  unsigned sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo2 (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-4.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-4.c (revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do assemble } */
+/* { dg-options "-mthumb -Os -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr2 (int array[], int n)
+{
+  unsigned int sum = 0;
+  int x;
+  if (n > 0)
+    for (x = 0; x < n; x++)
+      sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 36 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.target/arm/ivopts-5.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-5.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo (int*) __attribute__((pure));
+
+unsigned int
+tr1 (int array[], unsigned int n)
+{
+  unsigned int sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+    sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */
+/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 173380)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@
 
 extern int a[];
 
-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can infer loop iteration from nonwrapping pointer arithmetic.
+   exit test can be replaced.  */
 void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
 {
   TYPE dstn= dst + i_width;
@@ -21,5 +22,5 @@ void foo (int i_width, TYPE dst, TYPE sr
}
 }
 
-/* { dg-final { 

[PATCH PR45098, 9/10] Cheap shift-add.

2011-05-18 Thread Tom de Vries
On 05/17/2011 09:21 AM, Tom de Vries wrote:
 On 05/17/2011 09:10 AM, Tom de Vries wrote:
 Hi Zdenek,

 I have a patch set for PR45098.

 01_object-size-target.patch
 02_pr45098-rtx-cost-set.patch
 03_pr45098-computation-cost.patch
 04_pr45098-iv-init-cost.patch
 05_pr45098-bound-cost.patch
 06_pr45098-bound-cost.test.patch
 07_pr45098-nowrap-limits-iterations.patch
 08_pr45098-nowrap-limits-iterations.test.patch
 09_pr45098-shift-add-cost.patch
 10_pr45098-shift-add-cost.test.patch

 I will send out the patches individually.

 
 OK for trunk?
 
 Thanks,
 - Tom

Resubmitting with comment.

ARM has cheap shift-add instructions. Take that into account in
force_expr_to_var_cost.
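
For a concrete flavour (illustration only, not part of the patch): an address
computation like the one below is a single ARM/Thumb-2 add with a shifted
operand, so costing the multiply and the add separately overestimates it.

long
shift_add (long base, long i)
{
  return base + i * 8;   /* one "add rD, rN, rM, lsl #3" on ARM */
}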

2011-05-05  Tom de Vries  t...@codesourcery.com

PR target/45098
* tree-ssa-loop-ivopts.c: Include expmed.h.
(get_shiftadd_cost): New function.
(force_expr_to_var_cost): Use get_shiftadd_cost.
Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -92,6 +92,12 @@ along with GCC; see the file COPYING3.  
 #include "tree-inline.h"
 #include "tree-ssa-propagate.h"
 
+/* FIXME: add_cost and zero_cost defined in expmed.h conflict with local uses.
+ */
+#include "expmed.h"
+#undef add_cost
+#undef zero_cost
+
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
cost of different addressing modes.  This should be moved to a TBD
interface between the GIMPLE and RTL worlds.  */
@@ -3504,6 +3510,37 @@ get_address_cost (bool symbol_present, b
   return new_cost (cost + acost, complexity);
 }
 
+/* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
+   EXPR operand holding the shift.  COST0 and COST1 are the costs for
+   calculating the operands of EXPR.  Returns true if successful, and returns
+   the cost in COST.  */
+
+static bool
+get_shiftadd_cost (tree expr, enum machine_mode mode, comp_cost cost0,
+   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+{
+  comp_cost res;
+  tree op1 = TREE_OPERAND (expr, 1);
+  tree cst = TREE_OPERAND (mult, 1);
+  int m = exact_log2 (int_cst_value (cst));
+  int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
+  int sa_cost;
+
+  if (!(m >= 0 && m < maxm))
+    return false;
+
+  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
+	     ? shiftadd_cost[speed][mode][m]
+	     : (mult == op1
+		? shiftsub1_cost[speed][mode][m]
+		: shiftsub0_cost[speed][mode][m]));
+  res = new_cost (sa_cost, 0);
+  res = add_costs (res, mult == op1 ? cost0 : cost1);
+
+  *cost = res;
+  return true;
+}
+
 /* Estimates cost of forcing expression EXPR into a variable.  */
 
 static comp_cost
@@ -3629,6 +3666,21 @@ force_expr_to_var_cost (tree expr, bool 
 case MINUS_EXPR:
 case NEGATE_EXPR:
   cost = new_cost (add_cost (mode, speed), 0);
+      if (TREE_CODE (expr) != NEGATE_EXPR)
+        {
+          tree mult = NULL_TREE;
+          comp_cost sa_cost;
+          if (TREE_CODE (op1) == MULT_EXPR)
+            mult = op1;
+          else if (TREE_CODE (op0) == MULT_EXPR)
+            mult = op0;
+
+          if (mult != NULL_TREE
+              && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST
+              && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed,
+                                    &sa_cost))
+            return sa_cost;
+        }
   break;
 
 case MULT_EXPR:


[PATCH, i386]: Trivial, split long asm templates in TLS patterns

2011-05-18 Thread Uros Bizjak
Hello!

2011-05-18  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (*tls_global_dynamic_32_gnu): Split asm template.
(*tls_global_dynamic_64): Ditto.
(*tls_local_dynamic_base_32_gnu): Ditto.
(*tls_local_dynamic_base_64): Ditto.
(tls_initial_exec_64_sun): Ditto.

No functional changes.

Patch was tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 173864)
+++ i386.md (working copy)
@@ -12364,7 +12364,11 @@
 	(clobber (match_scratch:SI 5 "=c"))
 	(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_GNU_TLS"
-  "lea{l}\t{%a2@tlsgd(,%1,1), %0|%0, %a2@tlsgd[%1*1]}\;call\t%P3"
+{
+  output_asm_insn
+    ("lea{l}\t{%a2@tlsgd(,%1,1), %0|%0, %a2@tlsgd[%1*1]}", operands);
+  return "call\t%P3";
+}
   [(set_attr "type" "multi")
    (set_attr "length" "12")])
 
@@ -12387,7 +12391,14 @@
 	(unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
 		   UNSPEC_TLS_GD)]
   "TARGET_64BIT"
-  { return ASM_BYTE "0x66\n\tlea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}\n" ASM_SHORT "0x6666\n\trex64\n\tcall\t%P2"; }
+{
+  fputs (ASM_BYTE "0x66\n", asm_out_file);
+  output_asm_insn
+    ("lea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}", operands);
+  fputs (ASM_SHORT "0x6666\n", asm_out_file);
+  fputs ("\trex64\n", asm_out_file);
+  return "call\t%P2";
+}
   [(set_attr "type" "multi")
    (set_attr "length" "16")])
 
@@ -12410,7 +12421,11 @@
 	(clobber (match_scratch:SI 4 "=c"))
 	(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_GNU_TLS"
-  "lea{l}\t{%&@tlsldm(%1), %0|%0, %&@tlsldm[%1]}\;call\t%P2"
+{
+  output_asm_insn
+    ("lea{l}\t{%&@tlsldm(%1), %0|%0, %&@tlsldm[%1]}", operands);
+  return "call\t%P2";
+}
   [(set_attr "type" "multi")
    (set_attr "length" "11")])
 
@@ -12432,7 +12447,11 @@
 		 (match_operand:DI 2 "" "")))
 	(unspec:DI [(const_int 0)] UNSPEC_TLS_LD_BASE)]
   "TARGET_64BIT"
-  "lea{q}\t{%&@tlsld(%%rip), %%rdi|rdi, %&@tlsld[rip]}\;call\t%P1"
+{
+  output_asm_insn
+    ("lea{q}\t{%&@tlsld(%%rip), %%rdi|rdi, %&@tlsld[rip]}", operands);
+  return "call\t%P1";
+}
   [(set_attr "type" "multi")
    (set_attr "length" "12")])
 
@@ -12507,7 +12526,11 @@
 		    UNSPEC_TLS_IE_SUN))
 	(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && TARGET_SUN_TLS"
-  "mov{q}\t{%%fs:0, %0|%0, QWORD PTR fs:0}\n\tadd{q}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"
+{
+  output_asm_insn
+    ("mov{q}\t{%%fs:0, %0|%0, QWORD PTR fs:0}", operands);
+  return "add{q}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
+}
   [(set_attr "type" "multi")])
 
 ;; GNU2 TLS patterns can be split.


Re: [PATCH,c++] describe reasons for function template overload resolution failure

2011-05-18 Thread Jason Merrill

Thanks for the background; I will keep the principle in mind.  IMHO, in
a case like this where we're logically printing one diagnostic (one
error and then some number of explanatory notes) keeping all the logic
for the diagnostic centralized makes more sense.


I understand, but that means we have to create a whole data structure to 
try and preserve information about the failure, and either having to 
duplicate every possible error or give less informative messages.  I 
feel even more strongly about this after looking more closely at your patch.



+case ur_invalid:
+  inform (loc,
+   template argument deduction attempted with invalid input);
+  break;


In ur_invalid cases, we should have had an earlier error message 
already, so giving an extra message here seems kind of redundant.



+   types %qT and %qT differ in their qualifiers,


Let's say ...have incompatible cv-qualifiers, since some differences 
are OK.



+  inform (loc,   variable-sized array type %qT is not permitted,


...is not a valid template argument


+  inform (loc,   %qT is not derived from %qT,


This could be misleading, since we can also fail when the deduction is 
ambiguous.



+  inform (loc,   %qE is not a valid pointer-to-member of type %qT,


This needs to say pointer-to-member constant, not just 
pointer-to-member.



+case ur_parameter_deduction_failure:
+      inform (loc, "  couldn't deduce template argument %qD", ui->u.parm);
+  break;


It seems like you're using this both for cases where unification 
succeeded but just didn't produce template arguments for all parameters, 
and for cases where unification failed for some reason; this message 
should only apply to the first case.



  if (TREE_PURPOSE (TREE_VEC_ELT (tparms, i)))
{
  tree parm = TREE_VALUE (TREE_VEC_ELT (tparms, i));
  tree arg = TREE_PURPOSE (TREE_VEC_ELT (tparms, i));
  arg = tsubst_template_arg (arg, targs, tf_none, NULL_TREE);
  arg = convert_template_argument (parm, arg, targs, tf_none,
   i, NULL_TREE, ui);
  if (arg == error_mark_node)
return unify_parameter_deduction_failure (ui, parm);


In this case, the problem is that we tried to use the default template 
argument but it didn't work for some reason; we should say that, not 
just say we didn't deduce something, or the users will say but there's 
a default argument!.


In this case, we should do the substitution again with 
tf_warning_or_error so the user can see what the problem actually is, 
not just say that there was some unspecified problem.



- return 2;
+ return unify_parameter_deduction_failure (ui, tparm);


This seems like the only place we actually want to use 
unify_parameter_deduction_failure.



  /* Check for mixed types and values.  */
   if ((TREE_CODE (parm) == TEMPLATE_TYPE_PARM
	&& TREE_CODE (tparm) != TYPE_DECL)
       || (TREE_CODE (parm) == TEMPLATE_TEMPLATE_PARM
	   && TREE_CODE (tparm) != TEMPLATE_DECL))
return unify_parameter_deduction_failure (ui, parm);


This is a type/template mismatch issue that deserves a more helpful 
diagnostic.



  /* ARG must be constructed from a template class or a template
 template parameter.  */
   if (TREE_CODE (arg) != BOUND_TEMPLATE_TEMPLATE_PARM
       && !CLASSTYPE_SPECIALIZATION_OF_PRIMARY_TEMPLATE_P (arg))
return unify_parameter_deduction_failure (ui, parm);


This is saying that we can't deduce a template from a non-template type.


  /* If the argument deduction results is a METHOD_TYPE,
 then there is a problem.
 METHOD_TYPE doesn't map to any real C++ type the result of
 the deduction can not be of that type.  */
  if (TREE_CODE (arg) == METHOD_TYPE)
return unify_parameter_deduction_failure (ui, parm);


Like with the VLA case, the problem here is deducing something that 
isn't a valid template type argument.



/* We haven't deduced the type of this parameter yet.  Try again
   later.  */
return unify_success (ui);
  else
return unify_parameter_deduction_failure (ui, parm);


Here the problem is a type mismatch between parm and arg for a non-type 
template argument.



/* Perhaps PARM is something like SU and ARG is Sint.
   Then, we should unify `int' and `U'.  */
t = arg;
  else
/* There's no chance of unification succeeding.  */
return unify_parameter_deduction_failure (ui, parm);


This should be type_mismatch.


case FIELD_DECL:
case TEMPLATE_DECL:
  /* Matched cases are handled by the ARG == PARM test above.  */
  return unify_parameter_deduction_failure (ui, parm);


Another case where we should talk about the arg/parm mismatch.


+   case rr_invalid_copy:
+ inform 

Re: Libiberty: POSIXify psignal definition

2011-05-18 Thread Richard Earnshaw

On Tue, 2011-05-17 at 12:48 -0400, DJ Delorie wrote:
  What I don't understand is why the newlib change broke older compilers.
 
 Older compilers have the older libiberty.  At the moment, libiberty
 cannot be built by *any* released gcc, because you cannot *build* any
 released gcc, because it cannot build its target libiberty.
 

And the problem is that libiberty is assuming that it *knows* what
functions newlib provides, so that it doesn't need to check directly.
This is just broken...


  # If we are being configured for newlib, we know which functions
  # newlib provide and which ones we will be expected to provide.






Re: [google] Increase inlining limits with FDO/LIPO

2011-05-18 Thread Xinliang David Li
Though not common, people can do this:

1. for profile gen:
gcc -fprofile-arcs ...

2. for profile use
gcc -fbranch-probabilities ...

The new change won't help those. Your original place will be ok if you
test profile_arcs and branch_probability flags.

David


On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote:
 On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com
 wrote:

 To make consistent inline decisions between profile-gen and
 profile-use, probably better to check these two:

 flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
 enables profile-arcs, and value profiling is enabled only when
 edge/branch profiling is enabled (so no need to be checked).

 I changed the location where these parameters are set to someplace more
 appropriate (to where the flags are set when profile gen/use is indicated).
  Verified identical binaries are generated.
 OK as updated?

 Mark
 2011-05-18  Mark Heffernan  meh...@google.com
 * opts.c (set_profile_parameters): New function.
 Index: opts.c
 ===
 --- opts.c      (revision 173666)
 +++ opts.c      (working copy)
 @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ
                        opts-x_help_columns, opts, lang_mask);
  }

 +
 +/* Set parameters to more appropriate values when profile information
 +   is available.  */
 +static void
 +set_profile_parameters (struct gcc_options *opts,
 +                       struct gcc_options *opts_set)
 +{
 +  /* With accurate profile information, inlining is much more
 +     selective and makes better decisions, so increase the
 +     inlining function size limits.  */
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 +     opts->x_param_values, opts_set->x_param_values);
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 +     opts->x_param_values, opts_set->x_param_values);
 +}
 +
 +
  /* Handle target- and language-independent options.  Return zero to
     generate an unknown option message.  Only options that need
     extra handling need to be listed here; if you simply want
 @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options
         opts->x_flag_unswitch_loops = value;
        if (!opts_set->x_flag_gcse_after_reload)
         opts->x_flag_gcse_after_reload = value;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fprofile_generate_:
 @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options
          is done.  */
        if (!opts_set->x_flag_ipa_reference && in_lto_p)
          opts->x_flag_ipa_reference = false;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fshow_column:



Re: Libiberty: POSIXify psignal definition

2011-05-18 Thread DJ Delorie

 And the problem is that libiberty is assuming that it *knows* what
 functions newlib provides, so that it doesn't need to check
 directly.  This is just broken...

Historically, cygwin was built using libiberty and newlib, so you did
not have a runtime at the time you were building libiberty, because
you hadn't built newlib yet.

In a combined tree, target-libiberty is still built before
target-newlib, so the problem exists there too.

At this point, though, I'm tempted to say there's no such thing as a
target libiberty and rip all the target-libiberty rules out, and let
newlib-hosted targets autodetect the host-libiberty.  That is, if
Cygwin doesn't need a target-libiberty any more?


[v3] Update bitset (and a few more bits elsewhere) for noexcept

2011-05-18 Thread Paolo Carlini

Hi,

tested x86_64-linux, committed.

Thanks,
Paolo.

//
2011-05-18  Paolo Carlini  paolo.carl...@oracle.com

* libsupc++/initializer_list: Use noexcept specifier.
(initializer_list::size, begin, end): Qualify as const.
* include/bits/move.h (__addressof, forward, move, addressof): Specify
as noexcept.
* include/std/bitset: Use noexcept specifier throughout.
* include/debug/bitset: Update.
* include/profile/bitset: Likewise.

Index: include/debug/bitset
===
--- include/debug/bitset(revision 173870)
+++ include/debug/bitset(working copy)
@@ -1,6 +1,6 @@
 // Debugging bitset implementation -*- C++ -*-
 
-// Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+// Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -66,19 +66,19 @@
reference();
 
reference(const _Base_ref __base,
- bitset* __seq __attribute__((__unused__)))
+ bitset* __seq __attribute__((__unused__))) _GLIBCXX_NOEXCEPT
: _Base_ref(__base)
, _Safe_iterator_base(__seq, false)
{ }
 
   public:
-   reference(const reference __x)
+   reference(const reference __x) _GLIBCXX_NOEXCEPT
: _Base_ref(__x)
, _Safe_iterator_base(__x, false)
{ }
 
reference
-   operator=(bool __x)
+   operator=(bool __x) _GLIBCXX_NOEXCEPT
{
  _GLIBCXX_DEBUG_VERIFY(! this->_M_singular(),
  _M_message(__gnu_debug::__msg_bad_bitset_write)
@@ -88,7 +88,7 @@
}
 
reference
-   operator=(const reference __x)
+   operator=(const reference __x) _GLIBCXX_NOEXCEPT
{
  _GLIBCXX_DEBUG_VERIFY(! __x._M_singular(),
   _M_message(__gnu_debug::__msg_bad_bitset_read)
@@ -101,7 +101,7 @@
}
 
bool
-   operator~() const
+   operator~() const _GLIBCXX_NOEXCEPT
{
  _GLIBCXX_DEBUG_VERIFY(! this->_M_singular(),
   _M_message(__gnu_debug::__msg_bad_bitset_read)
@@ -109,7 +109,7 @@
  return ~(*static_cast<const _Base_ref*>(this));
}
 
-   operator bool() const
+   operator bool() const _GLIBCXX_NOEXCEPT
{
  _GLIBCXX_DEBUG_VERIFY(! this->_M_singular(),
  _M_message(__gnu_debug::__msg_bad_bitset_read)
@@ -118,7 +118,7 @@
}
 
reference
-   flip()
+   flip() _GLIBCXX_NOEXCEPT
{
  _GLIBCXX_DEBUG_VERIFY(! this->_M_singular(),
  _M_message(__gnu_debug::__msg_bad_bitset_flip)
@@ -130,10 +130,11 @@
 #endif
 
   // 23.3.5.1 constructors:
-  _GLIBCXX_CONSTEXPR bitset() : _Base() { }
+  _GLIBCXX_CONSTEXPR bitset() _GLIBCXX_NOEXCEPT
+  : _Base() { }
 
 #ifdef __GXX_EXPERIMENTAL_CXX0X__
-  constexpr bitset(unsigned long long __val)
+  constexpr bitset(unsigned long long __val) noexcept
 #else
   bitset(unsigned long __val)
 #endif
@@ -173,42 +174,42 @@
 
   // 23.3.5.2 bitset operations:
   bitset_Nb
-  operator=(const bitset_Nb __rhs)
+  operator=(const bitset_Nb __rhs) _GLIBCXX_NOEXCEPT
   {
_M_base() = __rhs;
return *this;
   }
 
   bitset_Nb
-  operator|=(const bitset_Nb __rhs)
+  operator|=(const bitset_Nb __rhs) _GLIBCXX_NOEXCEPT
   {
_M_base() |= __rhs;
return *this;
   }
 
   bitset_Nb
-  operator^=(const bitset_Nb __rhs)
+  operator^=(const bitset_Nb __rhs) _GLIBCXX_NOEXCEPT
   {
_M_base() ^= __rhs;
return *this;
   }
 
       bitset<_Nb>&
-      operator<<=(size_t __pos)
+      operator<<=(size_t __pos) _GLIBCXX_NOEXCEPT
       {
 	_M_base() <<= __pos;
 	return *this;
       }
 
       bitset<_Nb>&
-      operator>>=(size_t __pos)
+      operator>>=(size_t __pos) _GLIBCXX_NOEXCEPT
       {
 	_M_base() >>= __pos;
 	return *this;
       }
 
   bitset_Nb
-  set()
+  set() _GLIBCXX_NOEXCEPT
   {
_Base::set();
return *this;
@@ -224,7 +225,7 @@
   }
 
   bitset_Nb
-  reset()
+  reset() _GLIBCXX_NOEXCEPT
   {
_Base::reset();
return *this;
@@ -237,10 +238,12 @@
return *this;
   }
 
-  bitset_Nb operator~() const { return bitset(~_M_base()); }
+  bitset_Nb
+  operator~() const _GLIBCXX_NOEXCEPT
+  { return bitset(~_M_base()); }
 
   bitset_Nb
-  flip()
+  flip() _GLIBCXX_NOEXCEPT
   {
_Base::flip();
return *this;
@@ -346,11 +349,11 @@
   using _Base::size;
 
   bool
-  operator==(const bitset_Nb __rhs) const
+  operator==(const bitset_Nb __rhs) const _GLIBCXX_NOEXCEPT
   { return _M_base() == __rhs; }
 
   bool

Re: Libiberty: POSIXify psignal definition

2011-05-18 Thread Corinna Vinschen
On May 18 14:03, DJ Delorie wrote:
 
  And the problem is that libiberty is assuming that it *knows* what
  functions newlib provides, so that it doesn't need to check
  directly.  This is just broken...
 
 Historically, cygwin was built using libiberty and newlib, so you did
 not have a runtime at the time you were building libiberty, because
 you hadn't built newlib yet.
 
 In a combined tree, target-libiberty is still built before
 target-newlib, so the problem exists there too.
 
 At this point, though, I'm tempted to say there's no such thing as a
 target libiberty and rip all the target-libiberty rules out, and let
 newlib-hosted targets autodetect the host-libiberty.  That is, if
 Cygwin doesn't need a target-libiberty any more?

Cygwin doesn't need libiberty anymore since 2007.


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat


Re: [PATCH, PR45098, 3/10]

2011-05-18 Thread Tom de Vries
Hi Zdenek,

On 05/18/2011 05:24 PM, Zdenek Dvorak wrote:
 Hi,
 
 How about:
 ...
 @@ -2866,6 +2878,8 @@ computation_cost (tree expr, bool speed)
if (MEM_P (rslt))
  cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
TYPE_ADDR_SPACE (type), speed);
 +  else if (!REG_P (rslt))
 +cost += (unsigned)rtx_cost (rslt, SET, speed);

return cost;
  }
 ...
 ?
 
 this looks ok to me 
 

thanks for the review.

 (the cast to unsigned is not necessary, though?)

You're right, it's not, that was only necessary to prevent a warning in the
conditional expression originally proposed.

Checked in without cast.

Thanks,
- Tom



Re: [patch gimplifier]: Change TRUTH_(AND|OR|XOR) expressions to binary form

2011-05-18 Thread Kai Tietz
2011/5/18 Kai Tietz ktiet...@googlemail.com:
 Hello

 As follow-up for logical to binary transition

 2011-05-18  Kai Tietz  kti...@redhat.com

        * tree-cfg.c (verify_gimple_assign_binary): Barf on
        TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR.
        (gimplify_expr): Boolify TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR,
        TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR. Additionally
        move TRUTH_AND|OR|XOR_EXPR to its binary form.

 Bootstrapped for x86_64-pc-linux-gnu and regression tested for ada,
 fortran, g++, and c. Ok for apply?

Additionally bootstrapped and regression tested for java, obj-c, and
obj-c++. Regression tested also libstdc++ and libjava. No regressions.

Regards,
Kai


Re: [google] Increase inlining limits with FDO/LIPO

2011-05-18 Thread Mark Heffernan
On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com wrote:
 The new change won't help those. Your original place will be ok if you
 test profile_arcs and branch_probability flags.

Ah, yes.  I see your point now. Reverted to the original change with
condition profile_arc_flag and flag_branch_probabilities.
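
Roughly, the guarded form looks like the sketch below (sketch only; the exact
location and surrounding code differ in the actual google-branch patch):

  if (profile_arc_flag || flag_branch_probabilities)
    {
      /* Instrumented and feedback builds must agree on inline limits so
	 both compilations see the same CFGs.  */
      maybe_set_param_value (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
			     opts->x_param_values, opts_set->x_param_values);
      maybe_set_param_value (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
			     opts->x_param_values, opts_set->x_param_values);
    }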

Mark


 David


 On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote:
 On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com
 wrote:

 To make consistent inline decisions between profile-gen and
 profile-use, probably better to check these two:

 flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
 enables profile-arcs, and value profiling is enabled only when
 edge/branch profiling is enabled (so no need to be checked).

 I changed the location where these parameters are set to someplace more
 appropriate (to where the flags are set when profile gen/use is indicated).
  Verified identical binaries are generated.
 OK as updated?

 Mark
 2011-05-18  Mark Heffernan  meh...@google.com
 * opts.c (set_profile_parameters): New function.
 Index: opts.c
 ===
 --- opts.c      (revision 173666)
 +++ opts.c      (working copy)
 @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ
                        opts-x_help_columns, opts, lang_mask);
  }

 +
 +/* Set parameters to more appropriate values when profile information
 +   is available.  */
 +static void
 +set_profile_parameters (struct gcc_options *opts,
 +                       struct gcc_options *opts_set)
 +{
 +  /* With accurate profile information, inlining is much more
 +     selective and makes better decisions, so increase the
 +     inlining function size limits.  */
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +}
 +
 +
  /* Handle target- and language-independent options.  Return zero to
     generate an unknown option message.  Only options that need
     extra handling need to be listed here; if you simply want
 @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options
         opts-x_flag_unswitch_loops = value;
        if (!opts_set-x_flag_gcse_after_reload)
         opts-x_flag_gcse_after_reload = value;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fprofile_generate_:
 @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options
          is done.  */
        if (!opts_set-x_flag_ipa_reference  in_lto_p)
          opts-x_flag_ipa_reference = false;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fshow_column:




New options to disable/enable any pass for any functions (issue4550056)

2011-05-18 Thread David Li

In gcc, not all passes have user level control to turn it on/off, and
there is no way to flip on/off the pass for a subset of functions. I
implemented a generic option handling scheme in gcc to allow
disabling/enabling any gcc pass for any specified function(s).  The
new options will be very useful for things like performance
experiments and bug triaging (gcc has dbgcnt mechanism, but not all
passes have the counter).

The option syntax is very similar to -fdump- options. The following
are some examples:

-fdisable-tree-ccp1--- disable ccp1 for all functions
-fenable-tree-cunroll=1   --- enable complete unroll for the function
   whose cgraphnode uid is 1
-fdisable-rtl-gcse2=1:100,300,400:1000   -- disable gcse2 for
   functions at the following
ranges [1,1], [300,400], and 
[400,1000]
-fdisable-tree-einline -- disable early inlining for all callers
-fdisable-ipa-inline -- disable ipa inlininig

In the gcc dumps, the uid numbers are displayed in the function header.

The options are intended to be used internally by gcc developers.

Ok for trunk ? (There is a little LIPO specific change that can be removed).

David

2011-05-18  David Li  davi...@google.com

* final.c (rest_of_clean_state): Call function header dumper.
* opts-global.c (handle_common_deferred_options): Handle new options.
* tree-cfg.c (gimple_dump_cfg): Call function header dumper.
* passes.c (register_one_dump_file): Call register_pass_name.
(pass_init_dump_file): Call function header dumper.
(execute_one_pass): Check explicit enable/disable flag.
(passr_hash): New function.
(passr_eq): 
(register_pass_name):
(get_pass_by_name):
(pass_hash):
(pass_eq):
(enable_disable_pass):
(is_pass_explicitly_enabled_or_disabled):
(is_pass_explicitly_enabled):
(is_pass_explicitly_disabled):


Index: tree-pass.h
===
--- tree-pass.h (revision 173635)
+++ tree-pass.h (working copy)
@@ -644,4 +644,12 @@ extern bool first_pass_instance;
 /* Declare for plugins.  */
 extern void do_per_function_toporder (void (*) (void *), void *);
 
+extern void enable_disable_pass (const char *, bool);
+extern bool is_pass_explicitly_disabled (struct opt_pass *, tree);
+extern bool is_pass_explicitly_enabled (struct opt_pass *, tree);
+extern void register_pass_name (struct opt_pass *, const char *);
+extern struct opt_pass *get_pass_by_name (const char *);
+struct function;
+extern void pass_dump_function_header (FILE *, tree, struct function *);
+
 #endif /* GCC_TREE_PASS_H */
Index: final.c
===
--- final.c (revision 173635)
+++ final.c (working copy)
@@ -4456,19 +4456,7 @@ rest_of_clean_state (void)
}
   else
{
- const char *aname;
- struct cgraph_node *node = cgraph_node (current_function_decl);
-
- aname = (IDENTIFIER_POINTER
-  (DECL_ASSEMBLER_NAME (current_function_decl)));
- fprintf (final_output, "\n;; Function (%s) %s\n\n", aname,
-	   node->frequency == NODE_FREQUENCY_HOT
-	   ? " (hot)"
-	   : node->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
-	   ? " (unlikely executed)"
-	   : node->frequency == NODE_FREQUENCY_EXECUTED_ONCE
-	   ? " (executed once)"
-	   : "");
+ pass_dump_function_header (final_output, current_function_decl, cfun);
 
  flag_dump_noaddr = flag_dump_unnumbered = 1;
  if (flag_compare_debug_opt || flag_compare_debug)
Index: common.opt
===
--- common.opt  (revision 173635)
+++ common.opt  (working copy)
@@ -1018,6 +1018,14 @@ fdiagnostics-show-option
 Common Var(flag_diagnostics_show_option) Init(1)
 Amend appropriate diagnostic messages with the command line option that 
controls them
 
+fdisable-
+Common Joined RejectNegative Var(common_deferred_options) Defer
+-fdisable-[tree|rtl|ipa]-<pass>=range1+range2 disables an optimization pass
+
+fenable-
+Common Joined RejectNegative Var(common_deferred_options) Defer
+-fenable-[tree|rtl|ipa]-<pass>=range1+range2 enables an optimization pass
+
 fdump-
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fdump-type  Dump various compiler internals to a file
Index: opts-global.c
===
--- opts-global.c   (revision 173635)
+++ opts-global.c   (working copy)
@@ -411,6 +411,12 @@ handle_common_deferred_options (void)
 error ("unrecognized command line option %<-fdump-%s%>", opt->arg);
  break;
 
+   case OPT_fenable_:
+   case OPT_fdisable_:
+	  enable_disable_pass (opt->arg, (opt->opt_index == OPT_fenable_ ?
+

Re: [google] Increase inlining limits with FDO/LIPO

2011-05-18 Thread Xinliang David Li
Ok with that change to google/main with some retesting.

David

On Wed, May 18, 2011 at 11:34 AM, Mark Heffernan meh...@google.com wrote:
 On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com 
 wrote:
 The new change won't help those. Your original place will be ok if you
 test profile_arcs and branch_probability flags.

 Ah, yes.  I see your point now. Reverted to the original change with
 condition profile_arc_flag and flag_branch_probabilities.

 Mark


 David


 On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote:
 On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com
 wrote:

 To make consistent inline decisions between profile-gen and
 profile-use, probably better to check these two:

 flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
 enables profile-arcs, and value profiling is enabled only when
 edge/branch profiling is enabled (so no need to be checked).

 I changed the location where these parameters are set to someplace more
 appropriate (to where the flags are set when profile gen/use is indicated).
  Verified identical binaries are generated.
 OK as updated?

 Mark
 2011-05-18  Mark Heffernan  meh...@google.com
 * opts.c (set_profile_parameters): New function.
 Index: opts.c
 ===
 --- opts.c      (revision 173666)
 +++ opts.c      (working copy)
 @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ
                        opts-x_help_columns, opts, lang_mask);
  }

 +
 +/* Set parameters to more appropriate values when profile information
 +   is available.  */
 +static void
 +set_profile_parameters (struct gcc_options *opts,
 +                       struct gcc_options *opts_set)
 +{
 +  /* With accurate profile information, inlining is much more
 +     selective and makes better decisions, so increase the
 +     inlining function size limits.  */
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +  maybe_set_param_value
 +    (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
 +     opts-x_param_values, opts_set-x_param_values);
 +}
 +
 +
  /* Handle target- and language-independent options.  Return zero to
     generate an unknown option message.  Only options that need
     extra handling need to be listed here; if you simply want
 @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options
         opts-x_flag_unswitch_loops = value;
        if (!opts_set-x_flag_gcse_after_reload)
         opts-x_flag_gcse_after_reload = value;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fprofile_generate_:
 @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options
          is done.  */
        if (!opts_set-x_flag_ipa_reference  in_lto_p)
          opts-x_flag_ipa_reference = false;
 +      set_profile_parameters (opts, opts_set);
        break;

      case OPT_fshow_column:





Re: [PATCH,c++] describe reasons for function template overload resolution failure

2011-05-18 Thread Nathan Froyd
On 05/18/2011 01:45 PM, Jason Merrill wrote:
 Thanks for the background; I will keep the principle in mind.  IMHO, in
 a case like this where we're logically printing one diagnostic (one
 error and then some number of explanatory notes) keeping all the logic
 for the diagnostic centralized makes more sense.
 
 I understand, but that means we have to create a whole data structure to try
 and preserve information about the failure, and either having to duplicate
 every possible error or give less informative messages.  I feel even more
 strongly about this after looking more closely at your patch.

Thank you for the review.  I'll go back and try things the way you suggest;
before I go off and do that, I've taken your comments to mean that:

- fn_type_unification/type_unification_real and associated callers should take
  a boolean `explain' parameter, which is normally false;

- failed calls to fn_type_unification should save the arguments for the call
  for future explanation;

- printing diagnostic messages should call fn_type_unification with the saved
  arguments and a true `explain' parameter.

This is similar to passing `struct unification_info' and really only involves
shuffling code from call.c into the unify_* functions in pt.c and some minor
changes to the rejection_reason code in call.c.  The only wrinkle I see is
that in cases like these:

   if (TREE_PURPOSE (TREE_VEC_ELT (tparms, i)))
 {
   tree parm = TREE_VALUE (TREE_VEC_ELT (tparms, i));
   tree arg = TREE_PURPOSE (TREE_VEC_ELT (tparms, i));
   arg = tsubst_template_arg (arg, targs, tf_none, NULL_TREE);
   arg = convert_template_argument (parm, arg, targs, tf_none,
i, NULL_TREE, ui);
   if (arg == error_mark_node)
 return unify_parameter_deduction_failure (ui, parm);
 
 In this case, the problem is that we tried to use the default template
 argument but it didn't work for some reason; we should say that, not just say
 we didn't deduce something, or the users will say but there's a default
 argument!.
 
 In this case, we should do the substitution again with tf_warning_or_error so
 the user can see what the problem actually is, not just say that there was
 some unspecified problem.

 if (coerce_template_parms (parm_parms,
full_argvec,
TYPE_TI_TEMPLATE (parm),
tf_none,
/*require_all_args=*/true,
/*use_default_args=*/false, ui)
 == error_mark_node)
   return 1;
 
 Rather than pass ui down into coerce_template_parms we should just note when
 it fails and run it again at diagnostic time.
 
   converted_args
 = (coerce_template_parms (tparms, explicit_targs, NULL_TREE, tf_none,
   /*require_all_args=*/false,
   /*use_default_args=*/false, ui));
   if (converted_args == error_mark_node)
 return 1;

 Here too.

   if (fntype == error_mark_node)
 return unify_substitution_failure (ui);
 
 And this should remember the arguments so we can do the tsubst again at
 diagnostic time.

and other bits of pt.c, I'm interpreting your suggestions to mean that
tf_warning_or_error should be passed if `explain' is true.  That doesn't seem
like the best interface for diagnostics, as we'll get:

foo.cc:105:40 error: no matching function for call to bar (...)
foo.cc:105:40 note: candidates are:
bar.hh:7000:30 note: bar (...)
bar.hh:7000:30 note: [some reason]
bar.hh:4095:63 note: bar (...)
bar.hh:... error: [some message from tf_warning_or_error code]

I'm not sure that the last location there will necessary be the same as the
one that's printed for the declaration.  I think I'll punt on that issue for
the time being until we see how the diagnostics work out.  There's also the
matter of the error vs. note diagnostic.  I think it'd be nicer to keep the
conformity of a note for all the explanations; the only way I see to do that
is something like:

- Add a tf_note flag; pass it at all appropriate call sites when explaining
  things;

- Add a tf_issue_diagnostic flag that's the union of tf_{warning,error,note};

- Change code that looks like:

   if (complain & tf_warning_or_error)
error (STUFF);

  to something like:

   if (complain & tf_issue_diagnostic)
 emit_diagnostic (complain & tf_note ? DK_NOTE : DK_ERROR, STUFF);

  passing input_location if we're not already passing a location.

That involves a lot of code churn.  (Not a lot if you just modified the
functions above, but with this scheme, you'd have to call instantiate_template
again from the diagnostic code, and I assume you'd want to call that with
tf_note as well, which means hitting a lot more code.)  I don't see a better
way 

Re: [Patch, Fortran] PR 48700: memory leak with MOVE_ALLOC

2011-05-18 Thread Janus Weil
 The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk?

 OK. Thanks for the patch!

Thanks, Tobias. Committed as r173874.


 (What next on your gfortran agenda?)

Well, I am pretty busy with my day job at university and working on
my PhD, which will not allow me to make huge leaps on gfortran anytime
soon (and was the reason for my abstinence from this mailing list
during the last weeks). However, I'll surely try to continue working
on a few PRs in the OOP  friends area. As the OOP wiki page shows,
there is enough work left to do in this respect (e.g. we urgently need
polymorphic deallocation and there are problems with type-bound
operators and assignments, just to name a few).

In case anyone feels a strong urge to implement polymorphic arrays or
finalization, please go ahead! I will probably not be able to take
this on myself in the next months, but I can offer some support and
advice regarding the present implementation of polymorphism and how to
extend it.

This is all I can promise right now ...

Cheers,
Janus



 2011-05-16  Janus Weil  ja...@gcc.gnu.org

       PR fortran/48700
       * trans-intrinsic.c (gfc_conv_intrinsic_move_alloc): Deallocate 'TO'
       argument to avoid memory leaks.

 2011-05-16  Janus Weil  ja...@gcc.gnu.org

       PR fortran/48700
       * gfortran.dg/move_alloc_4.f90: New.



Re: Libiberty: POSIXify psignal definition

2011-05-18 Thread Joseph S. Myers
On Wed, 18 May 2011, DJ Delorie wrote:

 At this point, though, I'm tempted to say there's no such thing as a
 target libiberty and rip all the target-libiberty rules out, and let

Yes please.  I've been arguing for that for some time.

http://gcc.gnu.org/ml/gcc/2009-04/msg00410.html
http://gcc.gnu.org/ml/gcc/2010-03/msg2.html
http://gcc.gnu.org/ml/gcc/2010-03/msg00012.html
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01231.html
http://gcc.gnu.org/ml/gcc-bugs/2011-03/msg00206.html
http://gcc.gnu.org/ml/gcc/2011-03/msg00465.html
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02304.html

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Libiberty: POSIXify psignal definition

2011-05-18 Thread DJ Delorie

What about these?

dependencies = { module=all-target-fastjar; on=all-target-libiberty; };
dependencies = { module=all-target-libobjc; on=all-target-libiberty; };
dependencies = { module=all-target-libstdc++-v3; on=all-target-libiberty; };


Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-18 Thread Xinliang David Li
Thanks for the comment. Will fix those.

David

On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Wed, 18 May 2011, David Li wrote:

 +      error (Unrecognized option %s, is_enable ? -fenable : 
 -fdisable);

 +      error (Unknown pass %s specified in %s,
 +          phase_name,
 +          is_enable ? -fenable : -fdisable);

 Follow GNU Coding Standards for diagnostics (start with lowercase letter).

 +      inform (UNKNOWN_LOCATION, %s pass %s for functions in the range of 
 [%u, %u]\n,
 +              is_enable? Enable:Disable, phase_name, new_range-start, 
 new_range-last);

 Use separate calls to inform for the enable and disable cases, so that
 full sentences can be extracted for translation.

 +           error (Invalid range %s in option %s,
 +                  one_range,
 +                  is_enable ? -fenable : -fdisable);

 GNU Coding Standards.

 +               error (Invalid range %s in option %s,

 Likewise.

 +          inform (UNKNOWN_LOCATION, %s pass %s for functions in the range 
 of [%u, %u]\n,
 +                  is_enable? Enable:Disable, phase_name, 
 new_range-start, new_range-last);

 Again needs GCS and i18n fixes.

 --
 Joseph S. Myers
 jos...@codesourcery.com



Re: [PATCH,c++] describe reasons for function template overload resolution failure

2011-05-18 Thread Jason Merrill

On 05/18/2011 03:00 PM, Nathan Froyd wrote:

Thank you for the review.  I'll go back and try things the way you suggest;
before I go off and do that, I've taken your comments to mean that:

- fn_type_unification/type_unification_real and associated callers should take
   a boolean `explain' parameter, which is normally false;

- failed calls to fn_type_unification should save the arguments for the call
   for future explanation;

- printing diagnostic messages should call fn_type_unification with the saved
   arguments and a true `explain' parameter.


Yes, that's what I had in mind.  Though I think you can reconstruct the 
arguments rather than save them.


...

bar.hh:4095:63 note: bar (...)
bar.hh:... error: [some message from tf_warning_or_error code]



I'm not sure that the last location there will necessary be the same as the
one that's printed for the declaration.  I think I'll punt on that issue for
the time being until we see how the diagnostics work out.  There's also the
matter of the error vs. note diagnostic.  I think it'd be nicer to keep the
conformity of a note for all the explanations


Nicer, yes, but I think that's a secondary concern after usefulness of 
the actual message.  In similar cases I've introduced the errors with 
another message like %qD is implicitly deleted because the default 
definition would be ill-formed:


Or, in this case, deduction failed because substituting the template 
arguments would be ill-formed:



; the only way I see to do that
is something like:

- Add a tf_note flag; pass it at all appropriate call sites when explaining
   things;

- Add a tf_issue_diagnostic flag that's the union of tf_{warning,error,note};

- Change code that looks like:

    if (complain & tf_warning_or_error)
 error (STUFF);

   to something like:

    if (complain & tf_issue_diagnostic)
  emit_diagnostic (complain & tf_note ? DK_NOTE : DK_ERROR, STUFF);

   passing input_location if we're not already passing a location.

That involves a lot of code churn.  (Not a lot if you just modified the
functions above, but with this scheme, you'd have to call instantiate_template
again from the diagnostic code, and I assume you'd want to call that with
tf_note as well, which means hitting a lot more code.)  I don't see a better
way to keep the diagnostics uniform, but I might be making things too
complicated; did you have a different idea of how to implement what you were
suggesting?


That all makes sense, but I'd put it in a follow-on patch.  And wrap the 
complexity in a cp_error function that takes a complain parameter and 
either gives no message, a note, or an error depending.
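
As a rough illustration of that wrapper idea (stand-in names and plain stderr
output, not the real C++ front-end diagnostic machinery):

#include <stdarg.h>
#include <stdio.h>

/* Stand-in flag values; the real ones are the C++ front end's
   tsubst_flags_t bits, and error ()/inform () replace the fprintf s.  */
enum complain_flags { TF_NONE = 0, TF_ERROR = 1, TF_NOTE = 2 };

/* Give no message, a note, or an error, depending on COMPLAIN.  */
static void
cp_error_sketch (int complain, const char *fmt, ...)
{
  va_list ap;

  if (complain == TF_NONE)
    return;
  fputs (complain & TF_NOTE ? "note: " : "error: ", stderr);
  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  fputc ('\n', stderr);
}

int
main (void)
{
  cp_error_sketch (TF_NONE, "silent while deduction is merely being tried");
  cp_error_sketch (TF_NOTE, "substituting %s would be ill-formed", "T = void");
  cp_error_sketch (TF_ERROR, "no matching function for call");
  return 0;
}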


Jason


Make ARM -mfpu= option handling use Enum

2011-05-18 Thread Joseph S. Myers
This patch continues the cleanup of ARM option handling by making
-mfpu= handling use Enum, with the table of FPUs moved to a new
arm-fpus.def.

Tested building cc1 and xgcc for cross to arm-eabi.  Will commit to
trunk in the absence of target maintainer objections.

contrib:
2011-05-18  Joseph Myers  jos...@codesourcery.com

* gcc_update (gcc/config/arm/arm-tables.opt): Also depend on
gcc/config/arm/arm-fpus.def.

gcc:
2011-05-18  Joseph Myers  jos...@codesourcery.com

* config/arm/arm-fpus.def: New.
* config/arm/genopt.sh: Generate Enum and EnumValue entries from
arm-fpus.def.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (all_fpus): Move contents to arm-fpus.def.
(arm_option_override): Don't decode FPU name to string here.
* config/arm/arm.opt (mfpu=): Use Enum.
* config/arm/t-arm ($(srcdir)/config/arm/arm-tables.opt, arm.o):
Update dependencies.

Index: contrib/gcc_update
===
--- contrib/gcc_update  (revision 173864)
+++ contrib/gcc_update  (working copy)
@@ -80,7 +80,7 @@
 gcc/config.in: gcc/cstamp-h.in
 gcc/fixinc/fixincl.x: gcc/fixinc/fixincl.tpl gcc/fixinc/inclhack.def
 gcc/config/arm/arm-tune.md: gcc/config/arm/arm-cores.def 
gcc/config/arm/gentune.sh
-gcc/config/arm/arm-tables.opt: gcc/config/arm/arm-arches.def 
gcc/config/arm/arm-cores.def gcc/config/arm/genopt.sh
+gcc/config/arm/arm-tables.opt: gcc/config/arm/arm-arches.def 
gcc/config/arm/arm-cores.def gcc/config/arm/arm-fpus.def 
gcc/config/arm/genopt.sh
 gcc/config/m68k/m68k-tables.opt: gcc/config/m68k/m68k-devices.def 
gcc/config/m68k/m68k-isas.def gcc/config/m68k/m68k-microarchs.def 
gcc/config/m68k/genopt.sh
 gcc/config/mips/mips-tables.opt: gcc/config/mips/mips-cpus.def 
gcc/config/mips/genopt.sh
 gcc/config/rs6000/rs6000-tables.opt: gcc/config/rs6000/rs6000-cpus.def 
gcc/config/rs6000/genopt.sh
Index: gcc/config/arm/arm-tables.opt
===
--- gcc/config/arm/arm-tables.opt   (revision 173864)
+++ gcc/config/arm/arm-tables.opt   (working copy)
@@ -1,5 +1,6 @@
 ; -*- buffer-read-only: t -*-
-; Generated automatically by genopt.sh from arm-cores.def and arm-arches.def.
+; Generated automatically by genopt.sh from arm-cores.def, arm-arches.def
+; and arm-fpus.def.
 
 ; Copyright (C) 2011 Free Software Foundation, Inc.
 ;
@@ -339,3 +340,61 @@
 EnumValue
 Enum(arm_arch) String(iwmmxt2) Value(24)
 
+Enum
+Name(arm_fpu) Type(int)
+Known ARM FPUs (for use with the -mfpu= option):
+
+EnumValue
+Enum(arm_fpu) String(fpa) Value(0)
+
+EnumValue
+Enum(arm_fpu) String(fpe2) Value(1)
+
+EnumValue
+Enum(arm_fpu) String(fpe3) Value(2)
+
+EnumValue
+Enum(arm_fpu) String(maverick) Value(3)
+
+EnumValue
+Enum(arm_fpu) String(vfp) Value(4)
+
+EnumValue
+Enum(arm_fpu) String(vfpv3) Value(5)
+
+EnumValue
+Enum(arm_fpu) String(vfpv3-fp16) Value(6)
+
+EnumValue
+Enum(arm_fpu) String(vfpv3-d16) Value(7)
+
+EnumValue
+Enum(arm_fpu) String(vfpv3-d16-fp16) Value(8)
+
+EnumValue
+Enum(arm_fpu) String(vfpv3xd) Value(9)
+
+EnumValue
+Enum(arm_fpu) String(vfpv3xd-fp16) Value(10)
+
+EnumValue
+Enum(arm_fpu) String(neon) Value(11)
+
+EnumValue
+Enum(arm_fpu) String(neon-fp16) Value(12)
+
+EnumValue
+Enum(arm_fpu) String(vfpv4) Value(13)
+
+EnumValue
+Enum(arm_fpu) String(vfpv4-d16) Value(14)
+
+EnumValue
+Enum(arm_fpu) String(fpv4-sp-d16) Value(15)
+
+EnumValue
+Enum(arm_fpu) String(neon-vfpv4) Value(16)
+
+EnumValue
+Enum(arm_fpu) String(vfp3) Value(17)
+
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 173864)
+++ gcc/config/arm/arm.c(working copy)
@@ -939,25 +939,10 @@
 
 static const struct arm_fpu_desc all_fpus[] =
 {
-  {fpa,  ARM_FP_MODEL_FPA, 0, VFP_NONE, false, false},
-  {fpe2, ARM_FP_MODEL_FPA, 2, VFP_NONE, false, false},
-  {fpe3, ARM_FP_MODEL_FPA, 3, VFP_NONE, false, false},
-  {maverick, ARM_FP_MODEL_MAVERICK, 0, VFP_NONE, false, false},
-  {vfp,  ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false},
-  {vfpv3,ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false},
-  {vfpv3-fp16,   ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true},
-  {vfpv3-d16,ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false},
-  {vfpv3-d16-fp16,   ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true},
-  {vfpv3xd,  ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false},
-  {vfpv3xd-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true},
-  {neon, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false},
-  {neon-fp16,ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , true },
-  {vfpv4,ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true},
-  {vfpv4-d16,ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true},
-  {fpv4-sp-d16,  ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, 

[Patch, fortran] Update documentation and error messages for -ffpe-trap

2011-05-18 Thread Janne Blomqvist
Hi,

the attached patch updates the documentation and error messages for
the -ffpe-trap= option:

- The IEEE 754 name for the loss of precision exception is
inexact, and not precision (both in 754-1985 and 754-2008). So use
that instead, while still allowing precision as an alias for inexact
for backwards compatibility. Also, change the name of the
corresponding macro in the internal headers (ABI is not broken since
the value is still the same).

- The denormal exception is not an IEEE exception, but an additional
one supported at least on x86. And the difference between underflow
and denormal is, AFAICS, that underflow refers to the result of a FP
operation, whereas the denormal exception means that an operand to an
operation was a denormal. So try to clarify that.
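
As a small glibc-only illustration of the flag-versus-trap distinction
(feenableexcept/fedisableexcept are GNU extensions and not part of this patch;
link with -lm):

#define _GNU_SOURCE
#include <fenv.h>
#include <stdio.h>

/* "inexact" (a.k.a. "precision") is set by almost any operation whose
   result cannot be represented exactly, e.g. 1.0/3.0 below, which is why
   trapping on it is rarely what users want.  "underflow" describes a tiny
   result; the non-IEEE "denormal" exception fires when an operand is
   already denormal.  */
int
main (void)
{
  volatile double x = 1.0, y = 3.0, r;

  feclearexcept (FE_ALL_EXCEPT);
  r = x / y;                    /* quietly raises the inexact flag */
  printf ("inexact flag set: %d (r = %g)\n",
          fetestexcept (FE_INEXACT) != 0, r);

  feclearexcept (FE_ALL_EXCEPT);
  feenableexcept (FE_INEXACT);  /* from here on, x / y would raise SIGFPE */
  fedisableexcept (FE_INEXACT); /* turn the trap back off before exiting */
  return 0;
}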

- In fpu-aix.h we had a bug where we enabled underflow instead of
inexact when inexact was specified. Fixed.

Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

frontend ChangeLog:

2011-05-18  Janne Blomqvist  j...@gcc.gnu.org

* gfortran.texi (set_fpe): Update documentation.
* invoke.texi (-ffpe-trap): Likewise.
* libgfortran.h (GFC_FPE_PRECISION): Rename to GFC_FPE_INEXACT.
* options.c (gfc_handle_fpe_trap_option): Handle inexact and make
precision an alias for it.

libgfortran ChangeLog:

2011-05-18  Janne Blomqvist  j...@gcc.gnu.org

* config/fpu-387.h (set_fpu): Use renamed inexact macro.
* config/fpu-aix.h (set_fpu): Clarify error messages, use renamed
inexact macro, set TRP_INEXACT for inexact exception instead of
TRP_UNDERFLOW.
* config/fpu-generic.h (set_fpu): Clarify error messages, use
renamed inexact macro.
* config/fpu-glibc.h (set_fpu): Likewise.
* config/fpu-sysv.h (set_fpu): Likewise.


-- 
Janne Blomqvist
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 995d9d8..4db506c 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -2718,16 +2718,15 @@ int main (int argc, char *argv[])
 
 
 @node _gfortran_set_fpe
-@subsection @code{_gfortran_set_fpe} --- Set when a Floating Point Exception should be raised
+@subsection @code{_gfortran_set_fpe} --- Enable floating point exception traps
 @fnindex _gfortran_set_fpe
 @cindex libgfortran initialization, set_fpe
 
 @table @asis
 @item @emph{Description}:
-@code{_gfortran_set_fpe} sets the IEEE exceptions for which a
-Floating Point Exception (FPE) should be raised.  On most systems,
-this will result in a SIGFPE signal being sent and the program
-being interrupted.
+@code{_gfortran_set_fpe} enables floating point exception traps for
+the specified exceptions.  On most systems, this will result in a
+SIGFPE signal being sent and the program being aborted.
 
 @item @emph{Syntax}:
 @code{void _gfortran_set_fpe (int val)}
@@ -2738,7 +2737,7 @@ being interrupted.
 (bitwise or-ed) zero (0, default) no trapping,
 @code{GFC_FPE_INVALID} (1), @code{GFC_FPE_DENORMAL} (2),
 @code{GFC_FPE_ZERO} (4), @code{GFC_FPE_OVERFLOW} (8),
-@code{GFC_FPE_UNDERFLOW} (16), and @code{GFC_FPE_PRECISION} (32).
+@code{GFC_FPE_UNDERFLOW} (16), and @code{GFC_FPE_INEXACT} (32).
 @end multitable
 
 @item @emph{Example}:
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index ab45072..41fee67 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -919,21 +919,31 @@ GNU Fortran compiler itself.  This option is deprecated; use
 
 @item -ffpe-trap=@var{list}
 @opindex @code{ffpe-trap=}@var{list}
-Specify a list of IEEE exceptions when a Floating Point Exception
-(FPE) should be raised.  On most systems, this will result in a SIGFPE
-signal being sent and the program being interrupted, producing a core
-file useful for debugging.  @var{list} is a (possibly empty) comma-separated
-list of the following IEEE exceptions: @samp{invalid} (invalid floating
-point operation, such as @code{SQRT(-1.0)}), @samp{zero} (division by
-zero), @samp{overflow} (overflow in a floating point operation),
-@samp{underflow} (underflow in a floating point operation),
-@samp{precision} (loss of precision during operation) and @samp{denormal}
-(operation produced a denormal value).
-
-Some of the routines in the Fortran runtime library, like
-@samp{CPU_TIME}, are likely to trigger floating point exceptions when
-@code{ffpe-trap=precision} is used. For this reason, the use of 
-@code{ffpe-trap=precision} is not recommended.
+Specify a list of floating point exception traps to enable.  On most
+systems, if a floating point exception occurs and the trap for that
+exception is enabled, a SIGFPE signal will be sent and the program
+will be aborted, producing a core file useful for debugging.  @var{list}
+is a (possibly empty) comma-separated list of the following
+exceptions: @samp{invalid} (invalid floating point operation, such as
+@code{SQRT(-1.0)}), @samp{zero} (division by zero), @samp{overflow}
+(overflow in a floating point operation), @samp{underflow} (underflow
+in a 

Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx

2011-05-18 Thread Toon Moene

On 05/18/2011 05:41 AM, Gabriel Dos Reis wrote:


On Tue, May 17, 2011 at 2:46 PM, Toon Moenet...@moene.org  wrote:



On 05/17/2011 08:32 PM, Uros Bizjak wrote:


Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx.
Committed to mainline SVN as obvious.


Does that mean that I can now remove the --disable-werror from my daily C++
bootstrap run ?


Well, that certainly worked, as exemplified by this:

http://gcc.gnu.org/ml/gcc-testresults/2011-05/msg01890.html

At least that would enable my daily run (between 18:10 and 20:10 UTC) to 
catch -Werror mistakes ...



It's great that some people understand the intricacies of the
infight^H^H^H^H^H^H differences between the C and C++ type model.

OK: 1/2 :-)


I suspect this infight would vanish if we just switched, as we discussed
in the past.


Perhaps it would just help if we implemented the next step of the plan 
(http://gcc.gnu.org/wiki/gcc-in-cxx):


# it would be a good thing to try forcing the C++ host compiler 
requirement for GCC 4.[7] with just building stage1 with C++ and 
stage2/3 with the stage1 C compiler. --disable-build-with-cxx would be a 
workaround for a missing C++ host compiler.


Of course, that still wouldn't make it possible to implement C++ 
solutions for C hacks because the --disable-build-with-cxx crowd would 
cry foul over this ...


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [PATCH][?/n] LTO type merging cleanup

2011-05-18 Thread Richard Guenther
On Wed, May 18, 2011 at 7:20 PM, Jan Hubicka hubi...@ucw.cz wrote:

 We can end up with an infinite recursion as gimple_register_type
 tries to register TYPE_MAIN_VARIANT first.  This is because we
 are being called from the LTO type-fixup code which walks the
 type graph and adjusts types to their leaders.  So we can
 be called for type SCCs that are only partially fixed up yet
 which means TYPE_MAIN_VARIANT might temporarily not honor
 the invariant that the main variant of a main variant is itself.
 Thus, simply avoid recursing more than once - we are sure that
 we will be reaching at most type duplicates in further recursion.
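
A toy model of that depth-one guard, with made-up names (nothing here is the
actual gimple.c interface):

#include <stdio.h>

/* Each node has a main-variant pointer; normally the main variant of a
   main variant is itself, but while the fixup walk is rewriting pointers
   that invariant can be temporarily broken, so unconditionally recursing
   on the main variant first could loop forever.  Recursing at most once
   breaks any such temporary cycle.  */
struct type_node
{
  struct type_node *main_variant;
  const char *name;
};

static void
register_type_1 (struct type_node *t, int may_recurse)
{
  if (may_recurse && t->main_variant != t)
    register_type_1 (t->main_variant, /*may_recurse=*/0);
  printf ("registered %s\n", t->name);
}

int
main (void)
{
  /* Two nodes whose main-variant pointers temporarily form a cycle.  */
  struct type_node a = { NULL, "a" }, b = { NULL, "b" };
  a.main_variant = &b;
  b.main_variant = &a;
  register_type_1 (&a, 1);      /* registers b, then a; no infinite loop */
  return 0;
}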

 Bootstrap & regtest pending on x86_64-unknown-linux-gnu.

 With this fix, the WPA stage passes with some improvements I reported to the 
 mozilla metabug.

 We now get ICE in ltrans:
 #0  gimple_register_type (t=0x0) at ../../gcc/gimple.c:4616
 #1  0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) 
 at ../../gcc/gimple.c:4890
 #2  0x0048f14d in lto_ft_type (t=0x7fffe851f498) at 
 ../../gcc/lto/lto.c:401
 #3  lto_fixup_types (t=0x7fffe851f498) at ../../gcc/lto/lto.c:581
 #4  0x0048f4a0 in uniquify_nodes (node=Unhandled dwarf expression 
 opcode 0xf3

 TYPE_MAIN_VARIANT is NULL.
 (gdb) up
 #1  0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) 
 at ../../gcc/gimple.c:4890
 4890      t = gimple_register_type (TYPE_MAIN_VARIANT (t));
 (gdb) p debug_generic_stmt (t)
 struct _ffi_type

 $1 = void
 (gdb) p debug_tree (t)
  record_type 0x7fffe851f498 _ffi_type BLK
    size integer_cst 0x77ecf680 type integer_type 0x77eca0a8 
 bit_size_type constant 192
    unit size integer_cst 0x77ecf640 type integer_type 0x77eca000 
 constant 24
    align 64 symtab 0 alias set -1 structural equality
    fields field_decl 0x7fffe87684c0 size
        type integer_type 0x77eca690 long unsigned int public unsigned DI
            size integer_cst 0x77ecf1e0 constant 64
            unit size integer_cst 0x77ecf200 constant 8
            align 64 symtab 0 alias set -1 canonical type 0x77eca690 
 precision 64 min integer_cst 0x77ecf220 0 max integer_cst 
 0x77ecf1c0 18446744073709551615
            pointer_to_this pointer_type 0x75336150 reference_to_this 
 reference_type 0x70aba000
        used unsigned nonlocal DI file ctypes/libffi/include/ffi.h line 109 
 col 0 size integer_cst 0x77ecf1e0 64 unit size integer_cst 
 0x77ecf200 8
        align 64 offset_align 128
        offset integer_cst 0x77ebaf00 constant 0
        bit offset integer_cst 0x77ecf420 constant 0 context 
 record_type 0x7fffe851f2a0 _ffi_type
        chain field_decl 0x7fffe8768558 alignment type integer_type 
 0x77eca3f0 short unsigned int
            used unsigned nonlocal HI file ctypes/libffi/include/ffi.h line 
 110 col 0
            size integer_cst 0x77ecf080 constant 16
            unit size integer_cst 0x77ecf0a0 constant 2
            align 16 offset_align 128 offset integer_cst 0x77ebaf00 0 
 bit offset integer_cst 0x77ecf1e0 64 context record_type 
 0x7fffe851f2a0 _ffi_type chain field_decl 0x7fffe87685f0 type
    chain type_decl 0x7fffe8966ac8 _ffi_type
 $2 = void

 Let me know if there is anything easy I could work out ;)
 I think the bug may be in the recursion guard.  When you have a cycle of MVs
 of length greater than 2, you won't walk them all.

That doesn't matter.  MVs are acyclic initially (in fact the chain has
length 1), only during
fixup we can temporarily create larger chains or cycles.  MVs also
never are NULL, so it
would be interesting to see what clears it ...

Richard.

 Honza



[PATCH] Fix VRP MIN/MAX handling with two anti-ranges (PR tree-optimization/49039)

2011-05-18 Thread Jakub Jelinek
Hi!

The testcases below are miscompiled (execute/ by 4.6/4.7, pr49039.C
by 4.6 and twice so by 4.7 (so much that it doesn't abort)), because
VRP thinks that
MIN_EXPR <~[-1UL, -1UL], ~[0, 0]> is ~[0, 0] (correct is VARYING and similarly
MAX_EXPR <~[-1UL, -1UL], ~[0, 0]> is ~[-1UL, -1UL]).

 min = vrp_int_const_binop (code, vr0.min, vr1.min);
 max = vrp_int_const_binop (code, vr0.max, vr1.max);
is only correct for VR_RANGE for +/min/max, for + we give up
for VR_ANTI_RANGE.
The following patch instead for both min and max with anti-ranges
returns ~[MAX_EXPR <vr0.min, vr1.min>, MIN_EXPR <vr0.max, vr1.max>].
The code later on in the function will change that into VARYING if
there is no intersection and thus min is above max.
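
A worked example of that formula using the ranges from the new
gcc.dg/tree-ssa testcase below (plain C for illustration, not the VRP
implementation):

#include <stdio.h>

/* For MIN_EXPR/MAX_EXPR of two anti-ranges ~[a, b] and ~[c, d] the result
   is ~[max (a, c), min (b, d)], i.e. the intersection of the two excluded
   intervals; if the intersection is empty the caller later turns the
   range into VARYING because min ends up above max.  */
static long lmax (long a, long b) { return a > b ? a : b; }
static long lmin (long a, long b) { return a < b ? a : b; }

int
main (void)
{
  long a = 3, b = 6;            /* x is in ~[3, 6] */
  long c = 5, d = 8;            /* y is in ~[5, 8] */
  long lo = lmax (a, c), hi = lmin (b, d);

  if (lo > hi)
    printf ("empty intersection -> VARYING\n");
  else
    printf ("MIN/MAX of ~[%ld,%ld] and ~[%ld,%ld] is ~[%ld,%ld]\n",
            a, b, c, d, lo, hi);
  /* With ~[5, 6] for both minv and maxv, the == 5 and == 6 tests in the
     tree-ssa testcase fold to false.  */
  return 0;
}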

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6?

2011-05-18  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/49039
* tree-vrp.c (extract_range_from_binary_expr): For
MIN_EXPR <~[a, b], ~[c, d]> and MAX_EXPR <~[a, b], ~[c, d]>
return ~[MAX_EXPR <a, c>, MIN_EXPR <b, d>].

* gcc.c-torture/execute/pr49039.c: New test.
* gcc.dg/tree-ssa/pr49039.c: New test.
* g++.dg/torture/pr49039.C: New test.

--- gcc/tree-vrp.c.jj   2011-05-11 19:39:03.0 +0200
+++ gcc/tree-vrp.c  2011-05-18 19:13:54.0 +0200
@@ -2358,17 +2358,27 @@ extract_range_from_binary_expr (value_ra
 op0 + op1 == 0, so we cannot claim that the sum is in ~[0,0].
 Note that we are guaranteed to have vr0.type == vr1.type at
 this point.  */
-  if (code == PLUS_EXPR && vr0.type == VR_ANTI_RANGE)
+  if (vr0.type == VR_ANTI_RANGE)
{
- set_value_range_to_varying (vr);
- return;
+ if (code == PLUS_EXPR)
+   {
+ set_value_range_to_varying (vr);
+ return;
+   }
+ /* For MIN_EXPR and MAX_EXPR with two VR_ANTI_RANGEs,
+the resulting VR_ANTI_RANGE is the same - intersection
+of the two ranges.  */
+ min = vrp_int_const_binop (MAX_EXPR, vr0.min, vr1.min);
+ max = vrp_int_const_binop (MIN_EXPR, vr0.max, vr1.max);
+   }
+  else
+   {
+ /* For operations that make the resulting range directly
+proportional to the original ranges, apply the operation to
+the same end of each range.  */
+ min = vrp_int_const_binop (code, vr0.min, vr1.min);
+ max = vrp_int_const_binop (code, vr0.max, vr1.max);
}
-
-  /* For operations that make the resulting range directly
-proportional to the original ranges, apply the operation to
-the same end of each range.  */
-  min = vrp_int_const_binop (code, vr0.min, vr1.min);
-  max = vrp_int_const_binop (code, vr0.max, vr1.max);
 
   /* If both additions overflowed the range kind is still correct.
 This happens regularly with subtracting something in unsigned
--- gcc/testsuite/gcc.c-torture/execute/pr49039.c.jj2011-05-18 
19:18:57.0 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr49039.c   2011-05-18 
19:03:24.0 +0200
@@ -0,0 +1,26 @@
+/* PR tree-optimization/49039 */
+extern void abort (void);
+int cnt;
+
+__attribute__((noinline, noclone)) void
+foo (unsigned int x, unsigned int y)
+{
+  unsigned int minv, maxv;
+  if (x == 1 || y == -2U)
+return;
+  minv = x < y ? x : y;
+  maxv = x > y ? x : y;
+  if (minv == 1)
+++cnt;
+  if (maxv == -2U)
+++cnt;
+}
+
+int
+main ()
+{
+  foo (-2U, 1);
+  if (cnt != 2)
+abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/tree-ssa/pr49039.c.jj  2011-05-18 19:30:04.0 
+0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr49039.c 2011-05-18 19:29:57.0 
+0200
@@ -0,0 +1,31 @@
+/* PR tree-optimization/49039 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+extern void bar (void);
+
+void
+foo (unsigned int x, unsigned int y)
+{
+  unsigned int minv, maxv;
+  if (x >= 3 && x <= 6)
+    return;
+  if (y >= 5 && y <= 8)
+    return;
+  minv = x < y ? x : y;
+  maxv = x > y ? x : y;
+  if (minv == 5)
+bar ();
+  if (minv == 6)
+bar ();
+  if (maxv == 5)
+bar ();
+  if (maxv == 6)
+bar ();
+}
+
+/* { dg-final { scan-tree-dump Folding predicate minv_\[0-9\]* == 5 to 0 
vrp1 } } */
+/* { dg-final { scan-tree-dump Folding predicate minv_\[0-9\]* == 6 to 0 
vrp1 } } */
+/* { dg-final { scan-tree-dump Folding predicate maxv_\[0-9\]* == 5 to 0 
vrp1 } } */
+/* { dg-final { scan-tree-dump Folding predicate maxv_\[0-9\]* == 6 to 0 
vrp1 } } */
+/* { dg-final { cleanup-tree-dump vrp1 } } */
--- gcc/testsuite/g++.dg/torture/pr49039.C.jj   2011-05-18 19:20:45.0 
+0200
+++ gcc/testsuite/g++.dg/torture/pr49039.C  2011-05-18 19:20:03.0 
+0200
@@ -0,0 +1,76 @@
+// PR tree-optimization/49039
+// { dg-do run }
+
+template <class T1, class T2>
+struct pair
+{
+  T1 first;
+  T2 second;
+  pair (const T1 & a, const T2 & b):first (a), second (b) {}
+};
+
+template class 

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-18 Thread Richard Guenther
On Wed, May 18, 2011 at 8:37 PM, David Li davi...@google.com wrote:

 In gcc, not all passes have user level control to turn it on/off, and
 there is no way to flip on/off the pass for a subset of functions. I
 implemented a generic option handling scheme in gcc to allow
 disabling/enabling any gcc pass for any specified function(s).  The
 new options will be very useful for things like performance
 experiments and bug triaging (gcc has dbgcnt mechanism, but not all
 passes have the counter).

 The option syntax is very similar to -fdump- options. The following
 are some examples:

 -fdisable-tree-ccp1    --- disable ccp1 for all functions
 -fenable-tree-cunroll=1   --- enable complete unroll for the function
                           whose cgraphnode uid is 1
 -fdisable-rtl-gcse2=1:100,300,400:1000   -- disable gcse2 for
                                           functions at the following
                                            ranges [1,1], [300,400], and 
 [400,1000]
 -fdisable-tree-einline -- disable early inlining for all callers
 -fdisable-ipa-inline -- disable ipa inlining

 In the gcc dumps, the uid numbers are displayed in the function header.

 The options are intended to be used internally by gcc developers.

 Ok for trunk ? (There is a little LIPO specific change that can be removed).

 David

 2011-05-18  David Li  davi...@google.com

        * final.c (rest_of_clean_state): Call function header dumper.
        * opts-global.c (handle_common_deferred_options): Handle new options.
        * tree-cfg.c (gimple_dump_cfg): Call function header dumper.
        * passes.c (register_one_dump_file): Call register_pass_name.
        (pass_init_dump_file): Call function header dumper.
        (execute_one_pass): Check explicit enable/disable flag.
        (passr_hash): New function.
        (passr_eq):
        (register_pass_name):
        (get_pass_by_name):
        (pass_hash):
        (pass_eq):
        (enable_disable_pass):
        (is_pass_explicitly_enabled_or_disabled):
        (is_pass_explicitly_enabled):
        (is_pass_explicitly_disabled):

Bogus changelog entry.

New options need documenting in doc/invoke.texi.

Richard.


 Index: tree-pass.h
 ===
 --- tree-pass.h (revision 173635)
 +++ tree-pass.h (working copy)
 @@ -644,4 +644,12 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void enable_disable_pass (const char *, bool);
 +extern bool is_pass_explicitly_disabled (struct opt_pass *, tree);
 +extern bool is_pass_explicitly_enabled (struct opt_pass *, tree);
 +extern void register_pass_name (struct opt_pass *, const char *);
 +extern struct opt_pass *get_pass_by_name (const char *);
 +struct function;
 +extern void pass_dump_function_header (FILE *, tree, struct function *);
 +
  #endif /* GCC_TREE_PASS_H */
 Index: final.c
 ===
 --- final.c     (revision 173635)
 +++ final.c     (working copy)
 @@ -4456,19 +4456,7 @@ rest_of_clean_state (void)
        }
       else
        {
 -         const char *aname;
 -         struct cgraph_node *node = cgraph_node (current_function_decl);
 -
 -         aname = (IDENTIFIER_POINTER
 -                  (DECL_ASSEMBLER_NAME (current_function_decl)));
 -         fprintf (final_output, \n;; Function (%s) %s\n\n, aname,
 -            node-frequency == NODE_FREQUENCY_HOT
 -            ?  (hot)
 -            : node-frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
 -            ?  (unlikely executed)
 -            : node-frequency == NODE_FREQUENCY_EXECUTED_ONCE
 -            ?  (executed once)
 -            : );
 +         pass_dump_function_header (final_output, current_function_decl, 
 cfun);

          flag_dump_noaddr = flag_dump_unnumbered = 1;
          if (flag_compare_debug_opt || flag_compare_debug)
 Index: common.opt
 ===
 --- common.opt  (revision 173635)
 +++ common.opt  (working copy)
 @@ -1018,6 +1018,14 @@ fdiagnostics-show-option
  Common Var(flag_diagnostics_show_option) Init(1)
  Amend appropriate diagnostic messages with the command line option that 
 controls them

 +fdisable-
 +Common Joined RejectNegative Var(common_deferred_options) Defer
 +-fdisable-[tree|rtl|ipa]-pass=range1+range2 disables an optimization pass
 +
 +fenable-
 +Common Joined RejectNegative Var(common_deferred_options) Defer
 +-fenable-[tree|rtl|ipa]-pass=range1+range2 enables an optimization pass
 +
  fdump-
  Common Joined RejectNegative Var(common_deferred_options) Defer
  -fdump-type  Dump various compiler internals to a file
 Index: opts-global.c
 ===
 --- opts-global.c       (revision 173635)
 +++ opts-global.c       (working copy)
 @@ -411,6 +411,12 @@ handle_common_deferred_options (void)
            error 

Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx

2011-05-18 Thread Richard Guenther
On Wed, May 18, 2011 at 10:17 PM, Toon Moene t...@moene.org wrote:
 On 05/18/2011 05:41 AM, Gabriel Dos Reis wrote:

 On Tue, May 17, 2011 at 2:46 PM, Toon Moenet...@moene.org  wrote:

 On 05/17/2011 08:32 PM, Uros Bizjak wrote:

 Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx.
 Committed to mainline SVN as obvious.

 Does that mean that I can now remove the --disable-werror from my daily
 C++
 bootstrap run ?

 Well, that certainly worked, as exemplified by this:

 http://gcc.gnu.org/ml/gcc-testresults/2011-05/msg01890.html

 At least that would enable my daily run (between 18:10 and 20:10 UTC) to
 catch -Werror mistakes ...

 It's great that some people understand the intricacies of the
 infight^H^H^H^H^H^H differences between the C and C++ type model.

 OK: 1/2 :-)

 I suspect this infight would vanish if we just switched, as we discussed
 in the past.

 Perhaps it would just help if we implemented the next step of the plan
 (http://gcc.gnu.org/wiki/gcc-in-cxx):

 # it would be a good thing to try forcing the C++ host compiler requirement
 for GCC 4.[7] with just building stage1 with C++ and stage2/3 with the
 stage1 C compiler. --disable-build-with-cxx would be a workaround for a
 missing C++ host compiler.

Or the other way around, build stage1 with the host C compiler, add
C++ to stage1-languages and build stage2/3 with the stageN C++ compiler.
That avoids the host C++ compiler requirement for now and exercises
the libstdc++ linking issues.

But yes, somebody has to go forward to implement either (or both) variants.

Not that I'm too excited to see GCC built with a C++ compiler (or even C++
features being used).

Richard.


Re: [PATCH] Fix VRP MIN/MAX handling with two anti-ranges (PR tree-optimization/49039)

2011-05-18 Thread Richard Guenther
On Wed, May 18, 2011 at 10:21 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 The testcases below are miscompiled (execute/ by 4.6/4.7, pr49039.C
 by 4.6 and twice so by 4.7 (so much that it doesn't abort)), because
 VRP thinks that
 MIN_EXPR ~[-1UL, -1UL], ~[0, 0] is ~[0, 0] (correct is VARYING and similarly
 MAX_EXPR ~[-1UL, -1UL], ~[0, 0] is ~[-1UL, -1UL]).

         min = vrp_int_const_binop (code, vr0.min, vr1.min);
         max = vrp_int_const_binop (code, vr0.max, vr1.max);
 is only correct for VR_RANGE for +/min/max, for + we give up
 for VRP_ANTI_RANGE.
 The following patch instead for both min and max with anti-ranges
 returns ~[MAX_EXPR vr0.min, vr1.min, MIN_EXPR vr0.max, vr1.max].
 The code later on in the function will change that into VARYING if
 there is no intersection and thus min is above max.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6?

Ok.  Isn't it latent on the 4.5 branch as well?

Thanks,
Richard.

 2011-05-18  Jakub Jelinek  ja...@redhat.com

        PR tree-optimization/49039
        * tree-vrp.c (extract_range_from_binary_expr): For
        MIN_EXPR ~[a, b], ~[c, d] and MAX_EXPR ~[a, b], ~[c, d]
        return ~[MAX_EXPR a, c, MIN_EXPR b, d].

        * gcc.c-torture/execute/pr49039.c: New test.
        * gcc.dg/tree-ssa/pr49039.c: New test.
        * g++.dg/torture/pr49039.C: New test.

 --- gcc/tree-vrp.c.jj   2011-05-11 19:39:03.0 +0200
 +++ gcc/tree-vrp.c      2011-05-18 19:13:54.0 +0200
 @@ -2358,17 +2358,27 @@ extract_range_from_binary_expr (value_ra
         op0 + op1 == 0, so we cannot claim that the sum is in ~[0,0].
         Note that we are guaranteed to have vr0.type == vr1.type at
         this point.  */
 -      if (code == PLUS_EXPR  vr0.type == VR_ANTI_RANGE)
 +      if (vr0.type == VR_ANTI_RANGE)
        {
 -         set_value_range_to_varying (vr);
 -         return;
 +         if (code == PLUS_EXPR)
 +           {
 +             set_value_range_to_varying (vr);
 +             return;
 +           }
 +         /* For MIN_EXPR and MAX_EXPR with two VR_ANTI_RANGEs,
 +            the resulting VR_ANTI_RANGE is the same - intersection
 +            of the two ranges.  */
 +         min = vrp_int_const_binop (MAX_EXPR, vr0.min, vr1.min);
 +         max = vrp_int_const_binop (MIN_EXPR, vr0.max, vr1.max);
 +       }
 +      else
 +       {
 +         /* For operations that make the resulting range directly
 +            proportional to the original ranges, apply the operation to
 +            the same end of each range.  */
 +         min = vrp_int_const_binop (code, vr0.min, vr1.min);
 +         max = vrp_int_const_binop (code, vr0.max, vr1.max);
        }
 -
 -      /* For operations that make the resulting range directly
 -        proportional to the original ranges, apply the operation to
 -        the same end of each range.  */
 -      min = vrp_int_const_binop (code, vr0.min, vr1.min);
 -      max = vrp_int_const_binop (code, vr0.max, vr1.max);

       /* If both additions overflowed the range kind is still correct.
         This happens regularly with subtracting something in unsigned
 --- gcc/testsuite/gcc.c-torture/execute/pr49039.c.jj    2011-05-18 
 19:18:57.0 +0200
 +++ gcc/testsuite/gcc.c-torture/execute/pr49039.c       2011-05-18 
 19:03:24.0 +0200
 @@ -0,0 +1,26 @@
 +/* PR tree-optimization/49039 */
 +extern void abort (void);
 +int cnt;
 +
 +__attribute__((noinline, noclone)) void
 +foo (unsigned int x, unsigned int y)
 +{
 +  unsigned int minv, maxv;
 +  if (x == 1 || y == -2U)
 +    return;
 +  minv = x  y ? x : y;
 +  maxv = x  y ? x : y;
 +  if (minv == 1)
 +    ++cnt;
 +  if (maxv == -2U)
 +    ++cnt;
 +}
 +
 +int
 +main ()
 +{
 +  foo (-2U, 1);
 +  if (cnt != 2)
 +    abort ();
 +  return 0;
 +}
 --- gcc/testsuite/gcc.dg/tree-ssa/pr49039.c.jj  2011-05-18 19:30:04.0 
 +0200
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr49039.c     2011-05-18 19:29:57.0 
 +0200
 @@ -0,0 +1,31 @@
 +/* PR tree-optimization/49039 */
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-tree-vrp1 } */
 +
 +extern void bar (void);
 +
 +void
 +foo (unsigned int x, unsigned int y)
 +{
 +  unsigned int minv, maxv;
 +  if (x = 3  x = 6)
 +    return;
 +  if (y = 5  y = 8)
 +    return;
 +  minv = x  y ? x : y;
 +  maxv = x  y ? x : y;
 +  if (minv == 5)
 +    bar ();
 +  if (minv == 6)
 +    bar ();
 +  if (maxv == 5)
 +    bar ();
 +  if (maxv == 6)
 +    bar ();
 +}
 +
 +/* { dg-final { scan-tree-dump Folding predicate minv_\[0-9\]* == 5 to 0 
 vrp1 } } */
 +/* { dg-final { scan-tree-dump Folding predicate minv_\[0-9\]* == 6 to 0 
 vrp1 } } */
 +/* { dg-final { scan-tree-dump Folding predicate maxv_\[0-9\]* == 5 to 0 
 vrp1 } } */
 +/* { dg-final { scan-tree-dump Folding predicate maxv_\[0-9\]* == 6 to 0 
 vrp1 } } */
 +/* { dg-final { cleanup-tree-dump vrp1 } } */
 --- gcc/testsuite/g++.dg/torture/pr49039.C.jj   2011-05-18 19:20:45.0 
 +0200
 +++ gcc/testsuite/g++.dg/torture/pr49039.C      

Re: New options to disable/enable any pass for any functions (issue4550056)

2011-05-18 Thread Xinliang David Li
Will fix the Changelog, and add documentation.

Thanks,

David



On Wed, May 18, 2011 at 1:26 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, May 18, 2011 at 8:37 PM, David Li davi...@google.com wrote:

 In gcc, not all passes have user level control to turn it on/off, and
 there is no way to flip on/off the pass for a subset of functions. I
 implemented a generic option handling scheme in gcc to allow
 disabling/enabling any gcc pass for any specified function(s).  The
 new options will be very useful for things like performance
 experiments and bug triaging (gcc has dbgcnt mechanism, but not all
 passes have the counter).

 The option syntax is very similar to -fdump- options. The following
 are some examples:

 -fdisable-tree-ccp1    --- disable ccp1 for all functions
 -fenable-tree-cunroll=1   --- enable complete unroll for the function
                           whose cgraphnode uid is 1
 -fdisable-rtl-gcse2=1:100,300,400:1000   -- disable gcse2 for
                                           functions at the following
                                            ranges [1,1], [300,400], and 
 [400,1000]
 -fdisable-tree-einline -- disable early inlining for all callers
  -fdisable-ipa-inline -- disable ipa inlining

 In the gcc dumps, the uid numbers are displayed in the function header.

 The options are intended to be used internally by gcc developers.

 Ok for trunk ? (There is a little LIPO specific change that can be removed).

 David

 2011-05-18  David Li  davi...@google.com

        * final.c (rest_of_clean_state): Call function header dumper.
        * opts-global.c (handle_common_deferred_options): Handle new options.
        * tree-cfg.c (gimple_dump_cfg): Call function header dumper.
        * passes.c (register_one_dump_file): Call register_pass_name.
        (pass_init_dump_file): Call function header dumper.
        (execute_one_pass): Check explicit enable/disable flag.
        (passr_hash): New function.
        (passr_eq):
        (register_pass_name):
        (get_pass_by_name):
        (pass_hash):
        (pass_eq):
        (enable_disable_pass):
        (is_pass_explicitly_enabled_or_disabled):
        (is_pass_explicitly_enabled):
        (is_pass_explicitly_disabled):

 Bogus changelog entry.

 New options need documenting in doc/invoke.texi.

 Richard.


 Index: tree-pass.h
 ===
 --- tree-pass.h (revision 173635)
 +++ tree-pass.h (working copy)
 @@ -644,4 +644,12 @@ extern bool first_pass_instance;
  /* Declare for plugins.  */
  extern void do_per_function_toporder (void (*) (void *), void *);

 +extern void enable_disable_pass (const char *, bool);
 +extern bool is_pass_explicitly_disabled (struct opt_pass *, tree);
 +extern bool is_pass_explicitly_enabled (struct opt_pass *, tree);
 +extern void register_pass_name (struct opt_pass *, const char *);
 +extern struct opt_pass *get_pass_by_name (const char *);
 +struct function;
 +extern void pass_dump_function_header (FILE *, tree, struct function *);
 +
  #endif /* GCC_TREE_PASS_H */
 Index: final.c
 ===
 --- final.c     (revision 173635)
 +++ final.c     (working copy)
 @@ -4456,19 +4456,7 @@ rest_of_clean_state (void)
        }
       else
        {
 -         const char *aname;
 -         struct cgraph_node *node = cgraph_node (current_function_decl);
 -
 -         aname = (IDENTIFIER_POINTER
 -                  (DECL_ASSEMBLER_NAME (current_function_decl)));
 -         fprintf (final_output, \n;; Function (%s) %s\n\n, aname,
 -            node-frequency == NODE_FREQUENCY_HOT
 -            ?  (hot)
 -            : node-frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
 -            ?  (unlikely executed)
 -            : node-frequency == NODE_FREQUENCY_EXECUTED_ONCE
 -            ?  (executed once)
 -            : );
 +         pass_dump_function_header (final_output, current_function_decl, 
 cfun);

          flag_dump_noaddr = flag_dump_unnumbered = 1;
          if (flag_compare_debug_opt || flag_compare_debug)
 Index: common.opt
 ===
 --- common.opt  (revision 173635)
 +++ common.opt  (working copy)
 @@ -1018,6 +1018,14 @@ fdiagnostics-show-option
  Common Var(flag_diagnostics_show_option) Init(1)
  Amend appropriate diagnostic messages with the command line option that 
 controls them

 +fdisable-
 +Common Joined RejectNegative Var(common_deferred_options) Defer
 +-fdisable-[tree|rtl|ipa]-pass=range1+range2 disables an optimization pass
 +
 +fenable-
 +Common Joined RejectNegative Var(common_deferred_options) Defer
 +-fenable-[tree|rtl|ipa]-pass=range1+range2 enables an optimization pass
 +
  fdump-
  Common Joined RejectNegative Var(common_deferred_options) Defer
  -fdump-type  Dump various compiler internals to a file
 Index: opts-global.c
 ===
 --- 

Re: [PATCH] Fix VRP MIN/MAX handling with two anti-ranges (PR tree-optimization/49039)

2011-05-18 Thread Jakub Jelinek
On Wed, May 18, 2011 at 10:32:49PM +0200, Richard Guenther wrote:
 On Wed, May 18, 2011 at 10:21 PM, Jakub Jelinek ja...@redhat.com wrote:
  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6?
 
 Ok. 

Thanks.

 Isn't it latent on the 4.5 branch as well?

Not only latent, execute/pr49039.c fails at -O2 with 4.3/4.4/4.5 too
(and succeeds with -O1).  I'll queue this for backporting...

Jakub


PATCH: PR target/49002: 128-bit AVX load incorrectly becomes 256-bit AVX load

2011-05-18 Thread H.J. Lu
Hi,

This patch properly handles 256bit load cast.  OK for trunk if there
is no regression?  I will also prepare a patch for 4.6 branch.

Thanks.


H.J.
--
gcc/

2011-05-18  H.J. Lu  hongjiu...@intel.com

PR target/49002
* config/i386/sse.md 
(avx_ssemodesuffixavxsizesuffix_ssemodesuffix):
Properly handle load cast.

gcc/testsuite/

2011-05-18  H.J. Lu  hongjiu...@intel.com

PR target/49002
* gcc.target/i386/pr49002-1.c: New test.
* gcc.target/i386/pr49002-2.c: Likewise.

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 291bffb..cf12a6d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10294,12 +10294,13 @@
reload_completed
   [(const_int 0)]
 {
+  rtx op0 = operands[0];
   rtx op1 = operands[1];
-  if (REG_P (op1))
+  if (REG_P (op0))
+    op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0));
+  else 
     op1 = gen_rtx_REG (<MODE>mode, REGNO (op1));
-  else
-    op1 = gen_lowpart (<MODE>mode, op1);
-  emit_move_insn (operands[0], op1);
+  emit_move_insn (op0, op1);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/pr49002-1.c 
b/gcc/testsuite/gcc.target/i386/pr49002-1.c
new file mode 100644
index 000..7553e82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr49002-1.c
@@ -0,0 +1,16 @@
+/* PR target/49002 */
+/* { dg-do compile } */
+/* { dg-options "-O -mavx" } */
+
+#include <immintrin.h>
+
+void foo(const __m128d *from, __m256d *to, int s)
+{
+  __m256d var = _mm256_castpd128_pd256(from[0]);
+  var = _mm256_insertf128_pd(var, from[s], 1);
+  to[0] = var;
+}
+
+/* Ensure we load into xmm, not ymm.  */
+/* { dg-final { scan-assembler-not vmovapd\[\t \]*\[^,\]*,\[\t \]*%ymm } } */
+/* { dg-final { scan-assembler vmovapd\[\t \]*\[^,\]*,\[\t \]*%xmm } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr49002-2.c 
b/gcc/testsuite/gcc.target/i386/pr49002-2.c
new file mode 100644
index 000..b0e1009
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr49002-2.c
@@ -0,0 +1,14 @@
+/* PR target/49002 */
+/* { dg-do compile } */
+/* { dg-options "-O -mavx" } */
+
+#include <immintrin.h>
+
+void foo(const __m128d from, __m256d *to)
+{
+  *to = _mm256_castpd128_pd256(from);
+}
+
+/* Ensure we store ymm, not xmm.  */
+/* { dg-final { scan-assembler-not vmovapd\[\t \]*%xmm\[0-9\]\+,\[^,\]* } } 
*/
+/* { dg-final { scan-assembler vmovapd\[\t \]*%ymm\[0-9\]\+,\[^,\]* } } */


Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx

2011-05-18 Thread Toon Moene

On 05/18/2011 10:31 PM, Richard Guenther wrote:


Not that I'm too excited to see GCC built with a C++ compiler (or even C++
features being used).


Hmmm, you think using false as a value for a pointer-returning 
function is just A-OK ?


Duh, I'm glad I'm using Fortran, where the programmer isn't even 
supposed to know what the value of .FALSE. is, because it is 
implementation dependent.


--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx

2011-05-18 Thread Richard Guenther
On Wed, May 18, 2011 at 10:44 PM, Toon Moene t...@moene.org wrote:
 On 05/18/2011 10:31 PM, Richard Guenther wrote:

 Not that I'm too excited to see GCC built with a C++ compiler (or even C++
 features being used).

 Hmmm, you think using false as a value for a pointer-returning function is
 just A-OK ?

No, it isn't ;)

Richard.

 Duh, I'm glad I'm using Fortran, where the programmer isn't even supposed to
 know what the value of .FALSE. is, because it is implementation dependent.

 --
 Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
 Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
 At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
 Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news



Re: [PATCH PR45098, 4/10] Iv init cost.

2011-05-18 Thread Zdenek Dvorak
Hi,

 Resubmitting with comment.
 
 The init cost of an iv will in general not be zero. It will be
 exceptional that the iv register happens to be initialized with the
 proper value at no cost. In general, there will at the very least be a
 regcopy or a const set.

OK.  Please add a comment explaining this to the code,

Zdenek

 2011-05-05  Tom de Vries  t...@codesourcery.com
 
   PR target/45098
   * tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
   cost_base.cost == 0.

 Index: gcc/tree-ssa-loop-ivopts.c
 ===
 --- gcc/tree-ssa-loop-ivopts.c(revision 173380)
 +++ gcc/tree-ssa-loop-ivopts.c(working copy)
 @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
  
base = cand-iv-base;
cost_base = force_var_cost (data, base, NULL);
 +  if (cost_base.cost == 0)
 +  cost_base.cost = COSTS_N_INSNS (1);
cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data-speed);
  
cost = cost_step + adjust_setup_cost (data, cost_base.cost);



[patch, fortran] Some more function elimination tweaks

2011-05-18 Thread Thomas Koenig

Hello world,

the attached patch does the following:

- It removes the restriction on functions returning allocatables
  for elimination (unnecessary since the introduction of allocatable
  temporary variables)

- It allows character function elimination if the character length is
  a constant known at compile time

- It removes the introduction of temporary variables for the TRANSFER function;
  this is better handled by the middle-end.

Regression-tested.

OK for trunk?

Thomas

2011-05-18  Thomas Koenig  tkoe...@gcc.gnu.org

* frontend-passes.c (cfe_register_funcs):  Also register
character functions if their charlens are known and constant.
Also register allocatable functions.

2011-05-18  Thomas Koenig  tkoe...@gcc.gnu.org

* gfortran.dg/function_optimize_8.f90:  New test case.
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 173754)
+++ frontend-passes.c	(Arbeitskopie)
@@ -137,8 +137,7 @@ optimize_expr (gfc_expr **e, int *walk_subtrees AT
 
 
 /* Callback function for common function elimination, called from cfe_expr_0.
-   Put all eligible function expressions into expr_array.  We can't do
-   allocatable functions.  */
+   Put all eligible function expressions into expr_array.  */
 
 static int
 cfe_register_funcs (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
@@ -148,8 +147,10 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre
   if ((*e)-expr_type != EXPR_FUNCTION)
 return 0;
 
-  /* We don't do character functions (yet).  */
-  if ((*e)-ts.type == BT_CHARACTER)
+  /* We don't do character functions with unknown charlens.  */
+  if ((*e)->ts.type == BT_CHARACTER
+      && ((*e)->ts.u.cl == NULL || (*e)->ts.u.cl->length == NULL
+	  || (*e)->ts.u.cl->length->expr_type != EXPR_CONSTANT))
 return 0;
 
   /* If we don't know the shape at compile time, we create an allocatable
@@ -163,9 +164,6 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre
  is specified.  */
   if ((*e)-value.function.esym)
 {
-  if ((*e)-value.function.esym-attr.allocatable)
-	return 0;
-
   /* Don't create an array temporary for elemental functions.  */
   if ((*e)-value.function.esym-attr.elemental  (*e)-rank  0)
 	return 0;
@@ -181,9 +179,10 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre
   if ((*e)-value.function.isym)
 {
   /* Conversions are handled on the fly by the middle end,
-	 transpose during trans-* stages.  */
+	 transpose during trans-* stages and TRANSFER by the middle end.  */
   if ((*e)-value.function.isym-id == GFC_ISYM_CONVERSION
-	  || (*e)-value.function.isym-id == GFC_ISYM_TRANSPOSE)
+	  || (*e)-value.function.isym-id == GFC_ISYM_TRANSPOSE
+	  || (*e)-value.function.isym-id == GFC_ISYM_TRANSFER)
 	return 0;
 
   /* Don't create an array temporary for elemental functions,
! { dg-do compile }
! { dg-options "-O -fdump-tree-original" }
! Check that duplicate function calls are removed for
! - Functions returning allocatables
! - Character functions with known length
module x
  implicit none
contains
  pure function myfunc(x) result(y)
integer, intent(in) :: x
integer, dimension(:), allocatable :: y
allocate (y(3))
y(1) = x
y(2) = 2*x
y(3) = 3*x
  end function myfunc

  pure function mychar(x) result(r)
integer, intent(in) :: x
character(len=2) :: r
r = achar(x + iachar('0')) // achar(x + iachar('1'))
  end function mychar
end module x

program main
  use x
  implicit none
  integer :: n
  character(len=20) :: line
  n = 3
  write (unit=line,fmt='(3I2)') myfunc(n) + myfunc(n)
  if (line /= ' 61218') call abort
  write (unit=line,fmt='(A)') mychar(2) // mychar(2)
  if (line /= '2323') call abort
end program main
! { dg-final { scan-tree-dump-times myfunc 2 original } }
! { dg-final { scan-tree-dump-times mychar 2 original } }
! { dg-final { cleanup-tree-dump original } }
! { dg-final { cleanup-modules x } }


Re: [C++ PATCH] Attempt to find implicitly determined firstprivate class type vars during genericization (PR c++/48869)

2011-05-18 Thread Jason Merrill

On 05/11/2011 08:26 AM, Jakub Jelinek wrote:

This patch duplicates parts of the gimplifier's work during genericization,


That seems unfortunate, but I'll accept your judgment that it's the 
least-bad solution.  The patch is OK.


Jason


Re: [PATCH PR45098, 9/10] Cheap shift-add.

2011-05-18 Thread Zdenek Dvorak
Hi,

 +  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
 + ? shiftadd_cost[speed][mode][m]
 + : (mult == op1
 +? shiftsub1_cost[speed][mode][m]
 +: shiftsub0_cost[speed][mode][m]));
 +  res = new_cost (sa_cost, 0);
 +  res = add_costs (res, mult == op1 ? cost0 : cost1);

just forgetting the cost of the other operand does not seem correct -- what
if it contains some more complicated subexpression?

Zdenek


Re: PATCH: PR target/49002: 128-bit AVX load incorrectly becomes 256-bit AVX load

2011-05-18 Thread Uros Bizjak
On Wed, May 18, 2011 at 10:37 PM, H.J. Lu hongjiu...@intel.com wrote:

 This patch properly handles 256bit load cast.  OK for trunk if there
 is no regression?  I will also prepare a patch for 4.6 branch.

 2011-05-18  H.J. Lu  hongjiu...@intel.com

        PR target/49002
        * config/i386/sse.md 
 (avx_ssemodesuffixavxsizesuffix_ssemodesuffix):
        Properly handle load cast.

 gcc/testsuite/

 2011-05-18  H.J. Lu  hongjiu...@intel.com

        PR target/49002
        * gcc.target/i386/pr49002-1.c: New test.
        * gcc.target/i386/pr49002-2.c: Likewise.

OK for 4.6 and mainline (4.6 needs a bit adjusted patch due to renamed
avxmodesuffixp).

Thanks,
Uros.


Re: [C++0x] contiguous bitfields race implementation

2011-05-18 Thread Jason Merrill
It seems like you're calculating maxbits correctly now, but an access 
doesn't necessarily start from the beginning of the sequence of 
bit-fields, especially given store_split_bit_field.  That is,


struct A
{
  int i;
  int j: 32;
  int k: 8;
  char c[2];
};

Here maxbits would be 40, so we decide that it's OK to use SImode to 
access the word starting with k, and clobber c in the process.  Am I wrong?
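
A single-threaded simulation of exactly that clobber (illustrative only;
layout and endianness assumptions are noted in the comment):

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Typical LP64 layout and little-endian byte order assumed; this is an
   illustration, not a miscompilation test.  An SImode access to k is a
   4-byte load, an 8-bit update, and a 4-byte store; if "another thread"
   writes c between the load and the store, the wide store silently
   undoes it.  */
struct A
{
  int i;
  int j: 32;
  int k: 8;
  char c[2];
};

int
main (void)
{
  struct A a;
  unsigned char *p;
  unsigned int word;

  memset (&a, 0, sizeof a);
  a.c[0] = 'x';

  p = (unsigned char *) &a + offsetof (struct A, c) - 1;  /* byte holding k */
  memcpy (&word, p, sizeof word);      /* wide load also captures c        */
  a.c[0] = 'y';                        /* "the other thread" updates c     */
  word = (word & ~0xffu) | 42;         /* update only k's 8 bits           */
  memcpy (p, &word, sizeof word);      /* wide store loses the update to c */

  printf ("k = %d, c[0] = '%c' (expected 'y')\n", a.k, a.c[0]);
  return 0;
}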


Jason



[PATCH, SMS 1/4] Fix calculation of row_rest_count

2011-05-18 Thread Revital Eres
Hello,

The calculation of the number of instructions in a row is currently
done by updating the row_rest_count field in struct ps_insn on the fly
while creating a new instruction.  It is used to make sure we do not
exceed the issue_rate.
This calculation assumes the instruction is inserted at the beginning
of a row and thus does not take into account the cases where it must
follow other instructions.  Also, it has not been properly updated when
an instruction is removed.
To avoid the overhead of maintaining this row_rest_count in every
instruction in each row, as is currently done, this patch maintains one
count per row which holds the number of instructions in the row.
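
A toy model of that per-row counter (made-up names; the real fields and
SMODULO live in modulo-sched.c):

#include <stdio.h>
#include <stdlib.h>

/* One counter per row, bumped on insertion, decremented on removal,
   and checked against the issue rate.  */
struct sched
{
  int ii;             /* initiation interval = number of rows */
  int issue_rate;
  int *rows_length;   /* rows_length[r] = instructions in row r */
};

static int
row_of (struct sched *ps, int cycle)
{
  return ((cycle % ps->ii) + ps->ii) % ps->ii;   /* SMODULO-style modulo */
}

static int
try_insert (struct sched *ps, int cycle)
{
  int row = row_of (ps, cycle);
  if (ps->rows_length[row] >= ps->issue_rate)
    return 0;                   /* row full: would exceed the issue rate */
  ps->rows_length[row]++;
  return 1;
}

static void
remove_insn (struct sched *ps, int cycle)
{
  ps->rows_length[row_of (ps, cycle)]--;
}

int
main (void)
{
  struct sched ps = { 4, 2, calloc (4, sizeof (int)) };
  int r1 = try_insert (&ps, 0);
  int r2 = try_insert (&ps, 4);
  int r3 = try_insert (&ps, 8);   /* cycles 0, 4, 8 share row 0: this fails */

  printf ("%d %d %d\n", r1, r2, r3);
  remove_insn (&ps, 4);
  printf ("%d\n", try_insert (&ps, 8));   /* room again after removal */
  free (ps.rows_length);
  return 0;
}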

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi, regtested on c,c++. Bootstrapped the C language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

* modulo-sched.c (struct ps_insn): Remove row_rest_count field.
(struct partial_schedule): Add rows_length field.
(ps_insert_empty_row): Handle rows_length.
(create_partial_schedule): Likewise.
(free_partial_schedule): Likewise.
(reset_partial_schedule): Likewise.
(create_ps_insn): Remove rest_count argument.
(remove_node_from_ps): Update rows_length.
(add_node_to_ps): Update rows_length and call create_ps_insn
without passing row_rest_count.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173814)
+++ modulo-sched.c  (working copy)
@@ -134,8 +135,6 @@ struct ps_insn
   ps_insn_ptr next_in_row,
  prev_in_row;
 
-  /* The number of nodes in the same row that come after this node.  */
-  int row_rest_count;
 };
 
 /* Holds the partial schedule as an array of II rows.  Each entry of the
@@ -149,6 +148,9 @@ struct partial_schedule
   /* rows[i] points to linked list of insns scheduled in row i (0=iii).  */
   ps_insn_ptr *rows;
 
+  /*  rows_length[i] holds the number of instructions in the row.  */
+  int *rows_length;
+
   /* The earliest absolute cycle of an insn in the partial schedule.  */
   int min_cycle;
 
@@ -1908,6 +2140,7 @@ ps_insert_empty_row (partial_schedule_pt
   int ii = ps-ii;
   int new_ii = ii + 1;
   int row;
+  int *rows_length_new;
 
   verify_partial_schedule (ps, sched_nodes);
 
@@ -1922,6 +2155,7 @@ ps_insert_empty_row (partial_schedule_pt
   rotate_partial_schedule (ps, PS_MIN_CYCLE (ps));
 
   rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr));
+  rows_length_new = (int *) xcalloc (new_ii, sizeof (int));
   for (row = 0; row  split_row; row++)
 {
   rows_new[row] = ps-rows[row];
@@ -1966,6 +2200,8 @@ ps_insert_empty_row (partial_schedule_pt
 + (SMODULO (ps-max_cycle, ii) = split_row ? 1 : 0);
   free (ps-rows);
   ps-rows = rows_new;
+  free (ps-rows_length);
+  ps-rows_length = rows_length_new;
   ps-ii = new_ii;
   gcc_assert (ps-min_cycle = 0);
 
@@ -2456,6 +2692,7 @@ create_partial_schedule (int ii, ddg_ptr
 {
   partial_schedule_ptr ps = XNEW (struct partial_schedule);
   ps-rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
+  ps-rows_length = (int *) xcalloc (ii, sizeof (int));
   ps-ii = ii;
   ps-history = history;
   ps-min_cycle = INT_MAX;
@@ -2494,6 +2731,7 @@ free_partial_schedule (partial_schedule_
 return;
   free_ps_insns (ps);
   free (ps-rows);
+  free (ps-rows_length);
   free (ps);
 }
 
@@ -2511,6 +2749,8 @@ reset_partial_schedule (partial_schedule
   ps-rows = (ps_insn_ptr *) xrealloc (ps-rows, new_ii
 * sizeof (ps_insn_ptr));
   memset (ps-rows, 0, new_ii * sizeof (ps_insn_ptr));
+  ps-rows_length = (int *) xrealloc (ps-rows_length, new_ii * sizeof (int));
+  memset (ps-rows_length, 0, new_ii * sizeof (int));
   ps-ii = new_ii;
   ps-min_cycle = INT_MAX;
   ps-max_cycle = INT_MIN;
@@ -2539,14 +2784,13 @@ print_partial_schedule (partial_schedule
 
 /* Creates an object of PS_INSN and initializes it to the given parameters.  */
 static ps_insn_ptr
-create_ps_insn (ddg_node_ptr node, int rest_count, int cycle)
+create_ps_insn (ddg_node_ptr node, int cycle)
 {
   ps_insn_ptr ps_i = XNEW (struct ps_insn);
 
   ps_i-node = node;
   ps_i-next_in_row = NULL;
   ps_i-prev_in_row = NULL;
-  ps_i-row_rest_count = rest_count;
   ps_i-cycle = cycle;
 
   return ps_i;
@@ -2579,6 +2823,8 @@ remove_node_from_ps (partial_schedule_pt
   if (ps_i-next_in_row)
ps_i-next_in_row-prev_in_row = ps_i-prev_in_row;
 }
+   
+  ps-rows_length[row] -= 1; 
   free (ps_i);
   return true;
 }
@@ -2735,17 +2981,12 @@ add_node_to_ps (partial_schedule_ptr ps,
sbitmap must_precede, sbitmap must_follow)
 {
   ps_insn_ptr ps_i;
-  int rest_count = 1;
   int row = SMODULO (cycle, ps-ii);
 
-  

[PATCH, SMS 2/4] Move the creation of anti-dep edge

2011-05-18 Thread Revital Eres
Hello,

The attached patch moves the creation of the anti-dep edge from a
branch to its def from create_ddg_dep_from_intra_loop_link () to
add_cross_iteration_register_deps (), because the edge has distance 1
and thus belongs in the latter function.
The edge was added to avoid creating reg-moves.

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi, regtested on c,c++. Bootstrapped the C language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

* ddg.c (create_ddg_dep_from_intra_loop_link): Remove the creation
of anti-dep edge from a branch.
(add_cross_iteration_register_deps): Create anti-dep edge from
a branch.


Index: ddg.c
===
--- ddg.c   (revision 173785)
+++ ddg.c   (working copy)
@@ -197,11 +197,6 @@ create_ddg_dep_from_intra_loop_link (ddg
 }
 }

-  /* If a true dep edge enters the branch create an anti edge in the
- opposite direction to prevent the creation of reg-moves.  */
-  if ((DEP_TYPE (link) == REG_DEP_TRUE) && JUMP_P (dest_node->insn))
-create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1);
-
latency = dep_cost (link);
e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
add_edge_to_ddg (g, e);
@@ -306,8 +301,11 @@ add_cross_iteration_register_deps (ddg_p

  gcc_assert (first_def_node);

+ /* Always create the edge if the use node is a branch in
+order to prevent the creation of reg-moves.  */
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
-  || !flag_modulo_sched_allow_regmoves)
+  || !flag_modulo_sched_allow_regmoves
+	  || (flag_modulo_sched_allow_regmoves && JUMP_P (use_node->insn)))
 create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
 REG_DEP, 1);


[PATCH, SMS 3/4] Optimize stage count

2011-05-18 Thread Revital Eres
Hello,

The attached patch tries to achieve an optimized stage count (SC) by
normalizing the partial schedule (having the cycles start from cycle
zero).  The branch must be placed in row ii-1 in the final schedule.
If that is not the case after the normalization, the patch tries to
move the branch to that row if possible, while preserving the
scheduling of the rest of the instructions.
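
A tiny numeric illustration of the row constraint (illustrative only, not the
patch's code):

#include <stdio.h>

/* After normalizing the schedule so the minimum cycle is 0, an instruction
   scheduled at CYCLE lands in row CYCLE mod II, and the branch has to end
   up in row II - 1; otherwise it must be moved there.  */
int
main (void)
{
  int ii = 4;
  int cycles[] = { 6, 7 };
  int i;

  for (i = 0; i < 2; i++)
    {
      int row = ((cycles[i] % ii) + ii) % ii;   /* SMODULO-style modulo */
      printf ("cycle %d -> row %d (%s)\n", cycles[i], row,
              row == ii - 1 ? "fine for the branch" : "branch must be moved");
    }
  return 0;
}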

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi, regtested on c,c++. Bootstrapped the C language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

Changelog:

   * modulo-sched.c (calculate_stage_count,
calculate_must_precede_follow, get_sched_window,
try_scheduling_node_in_cycle, remove_node_from_ps): Add
declaration.
(update_node_sched_params, set_must_precede_follow, optimize_sc):
New functions.
(reset_sched_times): Call update_node_sched_params.
(sms_schedule): Call optimize_sc.
(get_sched_window): Change function arguments.
(sms_schedule_by_order): Update call to get_sched_window.
all set_must_precede_follow.
(calculate_stage_count): Add function argument.
Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173786)
+++ modulo-sched.c  (working copy)
@@ -198,7 +198,16 @@ static void generate_prolog_epilog (part
 rtx, rtx);
 static void duplicate_insns_of_cycles (partial_schedule_ptr,
   int, int, int, rtx);
-static int calculate_stage_count (partial_schedule_ptr ps);
+static int calculate_stage_count (partial_schedule_ptr, int);
+static void calculate_must_precede_follow (ddg_node_ptr, int, int,
+  int, int, sbitmap, sbitmap, sbitmap);
+static int get_sched_window (partial_schedule_ptr, ddg_node_ptr, 
+sbitmap, int, int *, int *, int *);
+static bool try_scheduling_node_in_cycle (partial_schedule_ptr, ddg_node_ptr,
+ int, int, sbitmap, int *, sbitmap,
+ sbitmap);
+static bool remove_node_from_ps (partial_schedule_ptr, ps_insn_ptr);
+
 #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
 #define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
 #define SCHED_FIRST_REG_MOVE(x) \
@@ -572,6 +581,33 @@ free_undo_replace_buff (struct undo_repl
 }
 }
 
+/* Update the sched_params for node U using the II,
+   the CYCLE of U and MIN_CYCLE.  */
+static void
+update_node_sched_params (ddg_node_ptr u, int ii, int cycle, int min_cycle)
+{
+  int sc_until_cycle_zero;
+  int stage;
+
+  SCHED_TIME (u) = cycle;
+  SCHED_ROW (u) = SMODULO (cycle, ii);
+
+  /* The calculation of stage count is done adding the number
+ of stages before cycle zero and after cycle zero.  */
+  sc_until_cycle_zero = CALC_STAGE_COUNT (-1, min_cycle, ii);
+
+  if (SCHED_TIME (u) < 0)
+{
+  stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii);
+  SCHED_STAGE (u) = sc_until_cycle_zero - stage;
+}
+  else
+{
+  stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii);
+  SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1;
+}
+}
+
 /* Bump the SCHED_TIMEs of all nodes by AMOUNT.  Set the values of
SCHED_ROW and SCHED_STAGE.  Instruction scheduled on cycle AMOUNT
will move to cycle zero.  */
@@ -588,7 +624,6 @@ reset_sched_times (partial_schedule_ptr 
ddg_node_ptr u = crr_insn-node;
int normalized_time = SCHED_TIME (u) - amount;
int new_min_cycle = PS_MIN_CYCLE (ps) - amount;
-int sc_until_cycle_zero, stage;
 
 if (dump_file)
   {
@@ -604,23 +639,9 @@ reset_sched_times (partial_schedule_ptr 

gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
gcc_assert (SCHED_TIME (u) <= ps->max_cycle);
-   SCHED_TIME (u) = normalized_time;
-   SCHED_ROW (u) = SMODULO (normalized_time, ii);
-  
-/* The calculation of stage count is done adding the number
-   of stages before cycle zero and after cycle zero.  */
-   sc_until_cycle_zero = CALC_STAGE_COUNT (-1, new_min_cycle, ii);
-   
-   if (SCHED_TIME (u) < 0)
- {
-   stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii);
-   SCHED_STAGE (u) = sc_until_cycle_zero - stage;
- }
-   else
- {
-   stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii);
-   SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1;
- }
+
+   crr_insn->cycle = normalized_time;
+   update_node_sched_params (u, ii, normalized_time, new_min_cycle);
   }
 }
  
@@ -657,6 +678,206 @@ permute_partial_schedule (partial_schedu

[PATCH, SMS 4/4] Misc. fixes

2011-05-18 Thread Revital Eres
Hello,

The attached patch contains misc. fixes and changes.

The patch was tested together with the rest of the patches in this series.
On ppc64-redhat-linux, regtested and bootstrapped with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi, regtested on c and c++, and bootstrapped the C
language with SMS flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital


Changelog:

* modulo-sched.c: Change comment.
(reset_sched_times): Fix print message.
(print_partial_schedule): Add print info.


Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173786)
+++ modulo-sched.c  (working copy)
@@ -84,13 +84,14 @@ along with GCC; see the file COPYING3.
   II cycles (i.e. use register copies to prevent a def from overwriting
   itself before reaching the use).

-SMS works with countable loops whose loop count can be easily
-adjusted.  This is because we peel a constant number of iterations
-into a prologue and epilogue for which we want to avoid emitting
-the control part, and a kernel which is to iterate that constant
-number of iterations less than the original loop.  So the control
-part should be a set of insns clearly identified and having its
-own iv, not otherwise used in the loop (at-least for now), which
+SMS works with countable loops (1) whose control part can be easily
+decoupled from the rest of the loop and (2) whose loop count can
+be easily adjusted.  This is because we peel a constant number of
+iterations into a prologue and epilogue for which we want to avoid
+emitting the control part, and a kernel which is to iterate that
+constant number of iterations less than the original loop.  So the
+control part should be a set of insns clearly identified and having
+its own iv, not otherwise used in the loop (at-least for now), which
 initializes a register before the loop to the number of iterations.
 Currently SMS relies on the do-loop pattern to recognize such loops,
 where (1) the control part comprises of all insns defining and/or
@@ -595,8 +596,8 @@ reset_sched_times (partial_schedule_ptr
 /* Print the scheduling times after the rotation.  */
 fprintf (dump_file, "crr_insn->node=%d (insn id %d), "
          "crr_insn->cycle=%d, min_cycle=%d", crr_insn->node->cuid,
-         INSN_UID (crr_insn->node->insn), SCHED_TIME (u),
-         normalized_time);
+         INSN_UID (crr_insn->node->insn), normalized_time,
+         new_min_cycle);
 if (JUMP_P (crr_insn->node->insn))
   fprintf (dump_file, " (branch)");
 fprintf (dump_file, "\n");
@@ -2530,8 +2531,13 @@ print_partial_schedule (partial_schedule
   fprintf (dump, "\n[ROW %d ]: ", i);
   while (ps_i)
     {
-      fprintf (dump, "%d, ",
-               INSN_UID (ps_i->node->insn));
+      if (JUMP_P (ps_i->node->insn))
+        fprintf (dump, "%d (branch), ",
+                 INSN_UID (ps_i->node->insn));
+      else
+        fprintf (dump, "%d, ",
+                 INSN_UID (ps_i->node->insn));
+
       ps_i = ps_i->next_in_row;
     }
 }


Re: [PATCH, MELT] add dominance functions

2011-05-18 Thread Basile Starynkevitch
On Wed, 18 May 2011 21:04:39 +0200
Pierre Vittet pier...@pvittet.com wrote:

 Hello,
 
 I have written a patch to allow the use of the GCC dominance functions 
 into MELT.
[...]

 Changelog:
 2011-05-17  Pierre Vittet pier...@pvittet.com
 
  * melt/xtramelt-ana-base.melt
  (is_dominance_info_available, is_post_dominance_info_available,
  calculate_dominance_info_unsafe,
  calculate_post_dominance_info_unsafe,
  free_dominance_info, free_post_dominance_info,
  calculate_dominance_info,
  calculate_post_dominance_info, debug_dominance_info,
  debug_post_dominance_info, get_immediate_dominator_unsafe,
  get_immediate_dominator, get_immediate_post_dominator_unsafe,
  get_immediate_post_dominator, dominated_by_other_unsafe,
  dominated_by_other, post_dominated_by_other_unsafe,
  post_dominated_by_other, foreach_dominated_unsafe,
  dominated_by_bb_iterator): Add primitives, functions, iterators for
  using dominance info.
 
 

Thanks for the patch. Some minor tweaks:

First, put a space between the formal arguments list and the function name.
So 

+(defprimitive calculate_dominance_info_unsafe() :void
should be
+(defprimitive calculate_dominance_info_unsafe () :void

Then, please put the defined name on the same line as defprimitive or
defun or def... When consecutive MELT formals have the same ctype, you
don't need to repeat it.

So

+(defprimitive 
+  dominated_by_other_unsafe(:basic_block bbA :basic_block bbB) :long

should be

+(defprimitive dominated_by_other_unsafe (:basic_block bbA bbB) :long

In :doc strings, document when something is a boxed value
(the distinction between values & stuffs is crucial), so write instead
[I added the word boxed, it is important]

+(defun get_immediate_dominator (bb)
+ :doc#{Return the next immediate dominator of the boxed basic_block
+ $BB as a MELT value.}#

Finally, all debug* operations should output debug information to stderr
only when flag_melt_debug is set, and should give the MELT source position
(because we don't want any debug printing in the usual case, when
-fmelt-debug is not given to our cc1).  Look at debugloop in
xtramelt-ana-base.melt for an example (notice that debugeprintfnonl is a
C macro printing the MELT source position).


So please resubmit a slightly improved patch.

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***