Re: RFC: [build, ada] Centralize PICFLAG configuration

2011-08-22 Thread Paolo Bonzini

On 08/19/2011 09:11 PM, Rainer Orth wrote:


2011-07-31  Rainer Orth  r...@cebitec.uni-bielefeld.de

config:
* picflag.m4: New file.

gcc:
* configure.ac (GCC_PICFLAG_FOR_TARGET): Call it.
(PICFLAG_FOR_TARGET): Substitute.
* aclocal.m4: Regenerate.
* configure: Regenerate.

gcc/ada:
* gcc-interface/Makefile.in (PICFLAG_FOR_TARGET): New.
(GNATLIBCFLAGS_FOR_C): Replace
TARGET_LIBGCC2_CFLAGS by PICFLAG_FOR_TARGET.
(gnatlib-shared-default, gnatlib-shared-dual-win32)
(gnatlib-shared-win32, gnatlib-shared-darwin, gnatlib-shared)
(gnatlib-sjlj, gnatlib-zcx): Likewise.

libada:
* configure.ac: Include ../config/picflag.m4.
(GCC_PICFLAG): Call it.
Substitute.
* configure: Regenerate.
* Makefile.in (TARGET_LIBGCC2_CFLAGS): Replace by PICFLAG.
(GNATLIBCFLAGS_FOR_C): Replace TARGET_LIBGCC2_CFLAGS by PICFLAG.
(LIBADA_FLAGS_TO_PASS): Pass PICFLAG as PICFLAG_FOR_TARGET.
Don't include $(GCC_DIR)/libgcc.mvars.

libiberty:
* aclocal.m4: Include ../config/picflag.m4.
* configure.ac (GCC_PICFLAG): Call it.
(enable_shared): Clear PICFLAG unless shared.
* configure: Regenerate.



Ok, thanks.

Paolo


Various minor speed-ups

2011-08-22 Thread Dimitrios Apostolou

Hello list,

the following patches are a selection of minor changes introduced at
various times during my GSoC project. Most are simple or not important
enough to be posted alone, so I'll post them all together under this
thread. Nevertheless, they have been carefully selected from a pool of
other changes: they are the ones that *do* offer some (minor) speed
improvement while having the least impact on memory usage, if any.


They have all been tested on x86_64, some also on i386. I have seen no
regressions introduced in production builds.



Thanks,
Dimitris



mem_attrs_htab

2011-08-22 Thread Dimitrios Apostolou


2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* emit-rtl.c (mem_attrs_htab_hash): Hash the whole struct by calling
iterative_hash(). We disregard the offset and size rtx fields of the
mem_attrs struct, but overall this hash is a *huge* improvement over
the previous one: it reduces the collisions/searches ratio from 8
to 0.8 in some cases.
(init_emit_once): Slightly increase the initial size of
mem_attrs_htab because it's frequently used and expanded many times.
=== modified file 'gcc/emit-rtl.c'
--- gcc/emit-rtl.c  2011-05-29 17:40:05 +
+++ gcc/emit-rtl.c  2011-08-21 04:44:25 +
@@ -256,11 +256,10 @@ mem_attrs_htab_hash (const void *x)
 {
   const mem_attrs *const p = (const mem_attrs *) x;
 
-  return (p->alias ^ (p->align * 1000)
- ^ (p->addrspace * 4000)
- ^ ((p->offset ? INTVAL (p->offset) : 0) * 5)
- ^ ((p->size ? INTVAL (p->size) : 0) * 250)
- ^ (size_t) iterative_hash_expr (p->expr, 0));
+  /* By massively feeding the mem_attrs struct to iterative_hash() we
+ disregard the p->offset and p->size rtx, but in total the hash is
+ quick and good enough. */
+  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));
 }
 
 /* Returns nonzero if the value represented by X (which is really a
@@ -5494,7 +5500,7 @@ init_emit_once (void)
   const_fixed_htab = htab_create_ggc (37, const_fixed_htab_hash,
  const_fixed_htab_eq, NULL);
 
-  mem_attrs_htab = htab_create_ggc (37, mem_attrs_htab_hash,
+  mem_attrs_htab = htab_create_ggc (128, mem_attrs_htab_hash,
mem_attrs_htab_eq, NULL);
   reg_attrs_htab = htab_create_ggc (37, reg_attrs_htab_hash,
reg_attrs_htab_eq, NULL);



graphds.[ch]: alloc_pool for edges

2011-08-22 Thread Dimitrios Apostolou
free() was called way too often before; this patch reduces the calls
significantly. There is a minor speed-up here too, but I don't mention it
individually since the numbers are within noise margins.



2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* graphds.h (struct graph): Added edge_pool as a pool for
allocating edges.
* graphds.c (new_graph): Initialise edge_pool.
(add_edge): Allocate edge from edge_pool rather than with malloc.
(free_graph): Instead of iterating across the graph freeing edges,
just destroy the edge_pool.
=== modified file 'gcc/graphds.c'
--- gcc/graphds.c   2009-11-25 10:55:54 +
+++ gcc/graphds.c   2011-08-19 16:44:41 +
@@ -62,7 +62,8 @@ new_graph (int n_vertices)
 
   g->n_vertices = n_vertices;
   g->vertices = XCNEWVEC (struct vertex, n_vertices);
-
+  g->edge_pool = create_alloc_pool ("edge_pool",
+   sizeof (struct graph_edge), 32);
   return g;
 }
 
@@ -71,7 +72,7 @@ new_graph (int n_vertices)
 struct graph_edge *
 add_edge (struct graph *g, int f, int t)
 {
-  struct graph_edge *e = XNEW (struct graph_edge);
+  struct graph_edge *e = (struct graph_edge *) pool_alloc (g->edge_pool);
   struct vertex *vf = g->vertices[f], *vt = g->vertices[t];
 
 
@@ -326,19 +327,7 @@ for_each_edge (struct graph *g, graphds_
 void
 free_graph (struct graph *g)
 {
-  struct graph_edge *e, *n;
-  struct vertex *v;
-  int i;
-
-  for (i = 0; i < g->n_vertices; i++)
-{
-  v = g->vertices[i];
-  for (e = v->succ; e; e = n)
-   {
- n = e->succ_next;
- free (e);
-   }
-}
+  free_alloc_pool (g->edge_pool);
   free (g->vertices);
   free (g);
 }

=== modified file 'gcc/graphds.h'
--- gcc/graphds.h   2009-02-20 15:20:38 +
+++ gcc/graphds.h   2011-08-19 16:44:41 +
@@ -18,6 +18,10 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+
+#include "alloc-pool.h"
+
+
 /* Structure representing edge of a graph.  */
 
 struct graph_edge
@@ -44,10 +48,10 @@ struct vertex
 
 struct graph
 {
-  int n_vertices;  /* Number of vertices.  */
-  struct vertex *vertices;
-   /* The vertices.  */
-  htab_t indices;  /* Fast lookup for indices.  */
+  int n_vertices;  /* Number of vertices.  */
+  struct vertex *vertices; /* The vertices.  */
+  htab_t indices;  /* Fast lookup for indices.  */
+  alloc_pool edge_pool;	   /* Pool for allocating edges.  */
 };
 
 struct graph *new_graph (int);



tree-ssa*: reduce malloc() calls by preallocating hot VECs on the stack

2011-08-22 Thread Dimitrios Apostolou


2011-08-22  Dimitrios Apostolou  ji...@gmx.net

Allocate some very frequently used vectors on the stack:
* vecir.h: Define a tree vector allocated on the stack.
* tree-ssa-sccvn.c (print_scc, sort_scc, process_scc)
(extract_and_process_scc_for_name): Allocate the scc vector on the
stack instead of the heap, giving it a minimal initial size
instead of 0.
* tree-ssa-structalias.c (get_constraint_for_1)
(get_constraint_for, get_constraint_for_rhs, do_deref)
(get_constraint_for_ssa_var, get_constraint_for_ptr_offset)
(get_constraint_for_component_ref, get_constraint_for_address_of)
(process_all_all_constraints, do_structure_copy)
(make_constraints_to, make_constraint_to, handle_rhs_call)
(handle_lhs_call, handle_const_call, handle_pure_call)
(find_func_aliases_for_builtin_call, find_func_aliases_for_call)
(find_func_aliases, process_ipa_clobber, find_func_clobbers)
(create_variable_info_for): Converted the rhsc, lhsc vectors from
heap to stack, with a minimal initial size, since they were very
frequently allocated.
=== modified file 'gcc/tree-ssa-structalias.c'
--- gcc/tree-ssa-structalias.c  2011-08-18 06:53:12 +
+++ gcc/tree-ssa-structalias.c  2011-08-19 09:43:41 +
@@ -477,11 +477,14 @@ struct constraint_expr
 
 typedef struct constraint_expr ce_s;
 DEF_VEC_O(ce_s);
-DEF_VEC_ALLOC_O(ce_s, heap);
-static void get_constraint_for_1 (tree, VEC(ce_s, heap) **, bool, bool);
-static void get_constraint_for (tree, VEC(ce_s, heap) **);
-static void get_constraint_for_rhs (tree, VEC(ce_s, heap) **);
-static void do_deref (VEC (ce_s, heap) **);
+DEF_VEC_ALLOC_O_STACK(ce_s);
+#define VEC_ce_s_stack_alloc(alloc) \
+  VEC_stack_alloc (ce_s, alloc)
+
+static void get_constraint_for_1 (tree, VEC(ce_s, stack) **, bool, bool);
+static void get_constraint_for (tree, VEC(ce_s, stack) **);
+static void get_constraint_for_rhs (tree, VEC(ce_s, stack) **);
+static void do_deref (VEC (ce_s, stack) **);
 
 /* Our set constraints are made up of two constraint expressions, one
LHS, and one RHS.
@@ -2736,7 +2739,7 @@ new_scalar_tmp_constraint_exp (const cha
If address_p is true, the result will be taken its address of.  */
 
 static void
-get_constraint_for_ssa_var (tree t, VEC(ce_s, heap) **results, bool address_p)
+get_constraint_for_ssa_var (tree t, VEC(ce_s, stack) **results, bool address_p)
 {
   struct constraint_expr cexpr;
   varinfo_t vi;
@@ -2776,12 +2779,12 @@ get_constraint_for_ssa_var (tree t, VEC(
   for (; vi; vi = vi->next)
{
  cexpr.var = vi->id;
- VEC_safe_push (ce_s, heap, *results, cexpr);
+ VEC_safe_push (ce_s, stack, *results, cexpr);
}
   return;
 }
 
-  VEC_safe_push (ce_s, heap, *results, cexpr);
+  VEC_safe_push (ce_s, stack, *results, cexpr);
 }
 
 /* Process constraint T, performing various simplifications and then
@@ -2861,7 +2864,7 @@ bitpos_of_field (const tree fdecl)
 
 static void
 get_constraint_for_ptr_offset (tree ptr, tree offset,
-  VEC (ce_s, heap) **results)
+  VEC (ce_s, stack) **results)
 {
   struct constraint_expr c;
   unsigned int j, n;
@@ -2920,7 +2923,7 @@ get_constraint_for_ptr_offset (tree ptr,
  c2.type = ADDRESSOF;
  c2.offset = 0;
  if (c2.var != c.var)
-   VEC_safe_push (ce_s, heap, *results, c2);
+   VEC_safe_push (ce_s, stack, *results, c2);
  temp = temp->next;
}
  while (temp);
@@ -2955,7 +2958,7 @@ get_constraint_for_ptr_offset (tree ptr,
  c2.var = temp->next->id;
  c2.type = ADDRESSOF;
  c2.offset = 0;
- VEC_safe_push (ce_s, heap, *results, c2);
+ VEC_safe_push (ce_s, stack, *results, c2);
}
  c.var = temp->id;
  c.offset = 0;
  c.offset = 0;
@@ -2974,7 +2977,7 @@ get_constraint_for_ptr_offset (tree ptr,
as the lhs.  */
 
 static void
-get_constraint_for_component_ref (tree t, VEC(ce_s, heap) **results,
+get_constraint_for_component_ref (tree t, VEC(ce_s, stack) **results,
  bool address_p, bool lhs_p)
 {
   tree orig_t = t;
@@ -2999,7 +3002,7 @@ get_constraint_for_component_ref (tree t
   temp.offset = 0;
   temp.var = integer_id;
   temp.type = SCALAR;
-  VEC_safe_push (ce_s, heap, *results, temp);
+  VEC_safe_push (ce_s, stack, *results, temp);
   return;
 }
 
@@ -3021,7 +3024,7 @@ get_constraint_for_component_ref (tree t
temp.offset = 0;
temp.var = anything_id;
temp.type = ADDRESSOF;
-   VEC_safe_push (ce_s, heap, *results, temp);
+   VEC_safe_push (ce_s, stack, *results, temp);
return;
  }
 }
@@ -3062,7 +3065,7 @@ get_constraint_for_component_ref (tree t
bitpos, bitmaxsize))
 

tree-ssa-structalias.c: alloc_pool for struct equiv_class_label

2011-08-22 Thread Dimitrios Apostolou


2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* tree-ssa-structalias.c (equiv_class_add)
(perform_var_substitution, free_var_substitution_info): Created a
new equiv_class_pool allocator pool for struct
equiv_class_label. Changed the pointer_equiv_class_table and
location_equiv_class_table hash tables to not iterate freeing all
elements in the end, but just free the pool.




[PATCH] make assign_hard_reg's saved_nregs conditional to unbreak ARM bootstrap (PR50146)

2011-08-22 Thread Mikael Pettersson
As described in PR50146, a recent change to ira-color.c caused trunk to
fail to bootstrap on ARM.  The issue is that a new variable saved_nregs
is declared unconditionally but only used in #ifndef HONOR_REG_ALLOC_ORDER
code.  Since the ARM backend defines HONOR_REG_ALLOC_ORDER, an 'unused
variable' warning results, which becomes a hard error in stage 2.

Fixed by moving the declaration to the #ifndef HONOR_REG_ALLOC_ORDER block
of variable declarations.  With this in place bootstrap succeeds on
armv5tel-linux-gnueabi.

Patch was pre-approved by Vladimir Makarov in the PR trail, but I don't
have svn write access so I'll need help to commit it.

/Mikael

gcc/

2011-08-22  Mikael Pettersson  mi...@it.uu.se

PR bootstrap/50146
* ira-color.c (assign_hard_reg): Move saved_nregs declaration
to #ifndef HONOR_REG_ALLOC_ORDER block.

--- gcc-4.7-20110820/gcc/ira-color.c.~1~	2011-08-18 16:56:36.0 +0200
+++ gcc-4.7-20110820/gcc/ira-color.c	2011-08-21 19:11:00.0 +0200
@@ -1567,13 +1567,14 @@ static bool
 assign_hard_reg (ira_allocno_t a, bool retry_p)
 {
   HARD_REG_SET conflicting_regs[2], profitable_hard_regs[2];
-  int i, j, hard_regno, best_hard_regno, class_size, saved_nregs;
+  int i, j, hard_regno, best_hard_regno, class_size;
   int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word;
   int *a_costs;
   enum reg_class aclass;
   enum machine_mode mode;
   static int costs[FIRST_PSEUDO_REGISTER], full_costs[FIRST_PSEUDO_REGISTER];
 #ifndef HONOR_REG_ALLOC_ORDER
+  int saved_nregs;
   enum reg_class rclass;
   int add_cost;
 #endif


Re: mem_attrs_htab

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 10:32:35AM +0300, Dimitrios Apostolou wrote:
 --- gcc/emit-rtl.c	2011-05-29 17:40:05 +
 +++ gcc/emit-rtl.c	2011-08-21 04:44:25 +
 @@ -256,11 +256,10 @@ mem_attrs_htab_hash (const void *x)
  {
const mem_attrs *const p = (const mem_attrs *) x;
  
 -  return (p->alias ^ (p->align * 1000)
 -   ^ (p->addrspace * 4000)
 -   ^ ((p->offset ? INTVAL (p->offset) : 0) * 5)
 -   ^ ((p->size ? INTVAL (p->size) : 0) * 250)
 -   ^ (size_t) iterative_hash_expr (p->expr, 0));
 +  /* By massively feeding the mem_attrs struct to iterative_hash() we
 + disregard the p->offset and p->size rtx, but in total the hash is
 + quick and good enough. */
 +  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));
  }
  
  /* Returns nonzero if the value represented by X (which is really a

This patch isn't against the trunk, where p->offset and p->size aren't rtxes
anymore, but HOST_WIDE_INTs.  Furthermore, it is a bad idea to hash
the p->expr address itself, it doesn't make any sense to hash on what
p->expr points to in that case.  And p->offset and p->size should be ignored
if the corresponding *known_p fields are false.  So, if you really think
using iterative_hash_object is a win, it should be something like:
  mem_attrs q = *p;
  q.expr = NULL;
  if (!q.offset_known_p) q.offset = 0;
  if (!q.size_known_p) q.size = 0;
  return iterative_hash_object (q, iterative_hash_expr (p->expr, 0));
(or better yet avoid q.expr = NULL and instead start hashing from the next
field after expr).  Hashing the struct padding might not be a good idea
either.

Jakub
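
A minimal sketch of that last suggestion (hashing from the field after
expr) could look as follows; the helper name is invented here, and it
assumes expr is the first field of mem_attrs, as the suggestion implies:

/* Sketch only, not the posted patch: hash every byte after the expr
   pointer, so neither the pointer value nor any padding before it is
   mixed in.  Padding between the remaining fields is still hashed, so
   the struct must be zero-initialized for the hash to be stable.  */
static hashval_t
mem_attrs_hash_after_expr (const mem_attrs *p)
{
  size_t skip = offsetof (mem_attrs, expr) + sizeof (p->expr);

  return iterative_hash ((const char *) p + skip, sizeof (*p) - skip,
                         iterative_hash_expr (p->expr, 0));
}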


Re: Various minor speed-ups

2011-08-22 Thread Dimitrios Apostolou


2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* tree-ssa-pre.c (phi_trans_add, init_pre, fini_pre): Added a pool
for phi_translate_table elements to avoid free() calls from
htab_delete().

=== modified file 'gcc/tree-ssa-pre.c'
--- gcc/tree-ssa-pre.c  2011-05-04 09:04:53 +
+++ gcc/tree-ssa-pre.c  2011-08-17 08:43:23 +
@@ -515,6 +515,10 @@ typedef struct expr_pred_trans_d
 } *expr_pred_trans_t;
 typedef const struct expr_pred_trans_d *const_expr_pred_trans_t;
 
+/* Pool of memory for the above */
+
+static alloc_pool phi_translate_pool;
+
 /* Return the hash value for a phi translation table entry.  */
 
 static hashval_t
@@ -571,7 +575,8 @@ static inline void
 phi_trans_add (pre_expr e, pre_expr v, basic_block pred)
 {
   void **slot;
-  expr_pred_trans_t new_pair = XNEW (struct expr_pred_trans_d);
+  expr_pred_trans_t new_pair
+= (expr_pred_trans_t) pool_alloc (phi_translate_pool);
   new_pair->e = e;
   new_pair->pred = pred;
   new_pair->v = v;
@@ -580,7 +585,8 @@ phi_trans_add (pre_expr e, pre_expr v, b
 
   slot = htab_find_slot_with_hash (phi_translate_table, new_pair,
   new_pair->hashcode, INSERT);
-  free (*slot);
+  if (*slot)
+pool_free (phi_translate_pool, *slot);
   *slot = (void *) new_pair;
 }
 
@@ -4804,8 +4810,12 @@ init_pre (bool do_fre)
   calculate_dominance_info (CDI_DOMINATORS);
 
   bitmap_obstack_initialize (grand_bitmap_obstack);
+  phi_translate_pool = create_alloc_pool ("phi_translate_table pool",
+ sizeof (struct expr_pred_trans_d),
+ 4096);
+  /* NULL as free because we'll free the whole pool in the end. */
   phi_translate_table = htab_create (5110, expr_pred_trans_hash,
-expr_pred_trans_eq, free);
+expr_pred_trans_eq, NULL);
   expression_to_id = htab_create (num_ssa_names * 3,
  pre_expr_hash,
  pre_expr_eq, NULL);
@@ -4839,6 +4849,7 @@ fini_pre (bool do_fre)
   free_alloc_pool (bitmap_set_pool);
   free_alloc_pool (pre_expr_pool);
   htab_delete (phi_translate_table);
+  free_alloc_pool (phi_translate_pool);
   htab_delete (expression_to_id);
   VEC_free (unsigned, heap, name_to_id);
 



Re: graphds.[ch]: alloc_pool for edges

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 10:37:58AM +0300, Dimitrios Apostolou wrote:
 --- gcc/graphds.h 2009-02-20 15:20:38 +
 +++ gcc/graphds.h 2011-08-19 16:44:41 +
 @@ -18,6 +18,10 @@ You should have received a copy of the G
  along with GCC; see the file COPYING3.  If not see
  <http://www.gnu.org/licenses/>.  */
  
 +
 +#include "alloc-pool.h"
 +
 +

This needs to be reflected in Makefile.in, we unfortunately don't have
automatic dependency generation for gcc.
So, create
GRAPHDS_H = graphds.h alloc-pool.h
and use $(GRAPHDS_H) everywhere where graphds.h has been used so far.

Jakub


Re: tree-ssa-structalias.c: alloc_pool for struct equiv_class_label

2011-08-22 Thread Dimitrios Apostolou

Forgot the patch...

On Mon, 22 Aug 2011, Dimitrios Apostolou wrote:


2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* tree-ssa-structalias.c (equiv_class_add)
(perform_var_substitution, free_var_substitution_info): Created a
new equiv_class_pool allocator pool for struct
equiv_class_label. Changed the pointer_equiv_class_table and
location_equiv_class_table hash tables to not iterate freeing all
elements in the end, but just free the pool.


=== modified file 'gcc/tree-ssa-structalias.c'
--- gcc/tree-ssa-structalias.c  2011-04-29 10:59:33 +
+++ gcc/tree-ssa-structalias.c  2011-08-18 06:53:12 +
@@ -1899,6 +1899,9 @@ static htab_t pointer_equiv_class_table;
classes.  */
 static htab_t location_equiv_class_table;
 
+/* Pool of memory for storing the above */
+static alloc_pool equiv_class_pool;
+
 /* Hash function for a equiv_class_label_t */
 
 static hashval_t
@@ -1948,7 +1951,8 @@ equiv_class_add (htab_t table, unsigned 
 bitmap labels)
 {
   void **slot;
-  equiv_class_label_t ecl = XNEW (struct equiv_class_label);
+  equiv_class_label_t ecl
+= (equiv_class_label_t) pool_alloc (equiv_class_pool);
 
   ecl->labels = labels;
   ecl->equivalence_class = equivalence_class;
@@ -2159,10 +2163,14 @@ perform_var_substitution (constraint_gra
   struct scc_info *si = init_scc_info (size);
 
   bitmap_obstack_initialize (iteration_obstack);
+  equiv_class_pool = create_alloc_pool ("equiv_class_label pool",
+   sizeof (struct equiv_class_label),
+   64);
+  /* NULL free function, we'll free the whole pool at the end of the pass. */
   pointer_equiv_class_table = htab_create (511, equiv_class_label_hash,
-  equiv_class_label_eq, free);
+  equiv_class_label_eq, NULL);
   location_equiv_class_table = htab_create (511, equiv_class_label_hash,
-   equiv_class_label_eq, free);
+   equiv_class_label_eq, NULL);
   pointer_equiv_class = 1;
   location_equiv_class = 1;
 
@@ -2269,6 +2277,7 @@ free_var_substitution_info (struct scc_i
   sbitmap_free (graph->direct_nodes);
   htab_delete (pointer_equiv_class_table);
   htab_delete (location_equiv_class_table);
+  free_alloc_pool (equiv_class_pool);
   bitmap_obstack_release (iteration_obstack);
 }
 



Re: mem_attrs_htab

2011-08-22 Thread Dimitrios Apostolou

Hi Jakub,

I forgot to mention that all patches are against mid-July trunk; I was
hoping I'd have no conflicts. Anyway, thanks for letting me know. If there
are conflicts with my other patches please let me know, and I'll post an
updated version at a later date.


All your other concerns are valid and I'll try to address them in the
future. I didn't like hashing addresses either, and I was surprised I saw
no regressions.



Dimitris




This patch isn't against the trunk, where p->offset and p->size aren't rtxes
anymore, but HOST_WIDE_INTs.  Furthermore, it is a bad idea to hash
the p->expr address itself, it doesn't make any sense to hash on what
p->expr points to in that case.  And p->offset and p->size should be ignored
if the corresponding *known_p fields are false.  So, if you really think
using iterative_hash_object is a win, it should be something like:
 mem_attrs q = *p;
 q.expr = NULL;
 if (!q.offset_known_p) q.offset = 0;
 if (!q.size_known_p) q.size = 0;
 return iterative_hash_object (q, iterative_hash_expr (p->expr, 0));
(or better yet avoid q.expr = NULL and instead start hashing from the next
field after expr).  Hashing the struct padding might not be a good idea
either.

Jakub



Re: mem_attrs_htab

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote:
 the future. I didn't like hashing addresses either, and I was
 surprised I saw no regressions.

Hashing on the expr address as well just results in smaller sharing
in the hash table (i.e. if the expr has different address, but is considered
equal).  The hashing of mem attrs is done just to reduce memory overhead.

Jakub


Re: Various minor speed-ups

2011-08-22 Thread Dimitrios Apostolou
For whoever is concerned about memory usage: I didn't measure a real
increase, besides a few KB. These are very hot allocation pools, and
allocating too many blocks of 10 elements is suboptimal.




2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* cselib.c (cselib_init): Increased initial size of elt_list_pool,
elt_loc_list_pool, cselib_val_pool, value_pool allocation pools
since they are very frequently used.


=== modified file 'gcc/cselib.c'
--- gcc/cselib.c	2011-05-31 19:14:21 +
+++ gcc/cselib.c	2011-08-17 14:03:56 +
@@ -2484,12 +2484,12 @@ void
 cselib_init (int record_what)
 {
   elt_list_pool = create_alloc_pool ("elt_list",
-sizeof (struct elt_list), 10);
+sizeof (struct elt_list), 128);
   elt_loc_list_pool = create_alloc_pool ("elt_loc_list",
-sizeof (struct elt_loc_list), 10);
+sizeof (struct elt_loc_list), 128);
   cselib_val_pool = create_alloc_pool ("cselib_val_list",
-  sizeof (cselib_val), 10);
-  value_pool = create_alloc_pool ("value", RTX_CODE_SIZE (VALUE), 100);
+  sizeof (cselib_val), 128);
+  value_pool = create_alloc_pool ("value", RTX_CODE_SIZE (VALUE), 128);
   cselib_record_memory = record_what & CSELIB_RECORD_MEMORY;
   cselib_preserve_constants = record_what & CSELIB_PRESERVE_CONSTANTS;
 



cse.c: preferable()

2011-08-22 Thread Dimitrios Apostolou
The attached patch is also posted at bug #19832 and I think it resolves it,
as well as /maybe/ offering a negligible speedup of 3-4 M instructions or a
couple of milliseconds. I also post it here for comments.



2011-08-13  Dimitrios Apostolou  ji...@gmx.net

* cse.c (preferable): Make it more readable and slightly faster,
without affecting its logic.

=== modified file 'gcc/cse.c'
--- gcc/cse.c   2011-06-02 21:52:46 +
+++ gcc/cse.c   2011-08-13 00:54:06 +
@@ -720,32 +720,25 @@ approx_reg_cost (rtx x)
 static int
 preferable (int cost_a, int regcost_a, int cost_b, int regcost_b)
 {
-  /* First, get rid of cases involving expressions that are entirely
- unwanted.  */
-  if (cost_a != cost_b)
-{
-  if (cost_a == MAX_COST)
-   return 1;
-  if (cost_b == MAX_COST)
-   return -1;
-}
+  int cost_diff = cost_a - cost_b;
+  int regcost_diff = regcost_a - regcost_b;
 
-  /* Avoid extending lifetimes of hardregs.  */
-  if (regcost_a != regcost_b)
+  if (cost_diff != 0)
 {
-  if (regcost_a == MAX_COST)
-   return 1;
-  if (regcost_b == MAX_COST)
-   return -1;
+  /* If none of the expressions are entirely unwanted */
+  if ((cost_a != MAX_COST) && (cost_b != MAX_COST)
+ /* AND only one of the regs is HARD_REG */
+  && (regcost_diff != 0)
+  && ((regcost_a == MAX_COST) || (regcost_b == MAX_COST))
+ )
+   /* Then avoid extending lifetime of HARD_REG */
+   return regcost_diff;
+
+  return cost_diff;
 }
 
-  /* Normal operation costs take precedence.  */
-  if (cost_a != cost_b)
-return cost_a - cost_b;
-  /* Only if these are identical consider effects on register pressure.  */
-  if (regcost_a != regcost_b)
-return regcost_a - regcost_b;
-  return 0;
+  /* cost_a == cost_b, consider effects on register pressure */
+  return regcost_diff;
 }
 
 /* Internal function, to compute cost when X is not a register; called



Re: Use of vector instructions in memmov/memset expanding

2011-08-22 Thread Michael Zolotukhin
Ping.

On 18 July 2011 15:00, Michael Zolotukhin
michael.v.zolotuk...@gmail.com wrote:
 Here is a summary - probably, it doesn't cover every single piece in
 the patch, but I tried to describe the major changes. I hope this will
 help you a bit - and of course I'll answer your further questions if
 they appear.

 The changes could be logically divided into two parts (though, these
 parts have something in common).
 The first part is changes in target-independent part, in functions
 move_by_pieces() and store_by_pieces() - mostly located in expr.c.
 The second part touches ix86_expand_movmem() and ix86_expand_setmem()
 - mostly located in config/i386/i386.c.

 Changes in i386.c (target-dependent part):
 1) Strategies for cases with known and unknown alignment are separated
 from each other.
 When alignment is known at compile time, we could generate optimized
 code without libcalls.
 When it's unknown, we sometimes could create runtime-checks to reach
 desired alignment, but not always.
 Strategies for atom and generic_32, generic_64 were chosen according
 to a set of experiments; strategies in other
 cost models are unchanged (strategies for unknown alignment are copied
 from existing strategies).
 2) unrolled_loop algorithm was modified - now it uses SSE move-modes,
 if they're available.
 3) As the size of data moved in one iteration greatly increased and
 epilogues became bigger, some changes were needed in epilogue
 generation. In some cases a special (not unrolled) loop is generated
 in the epilogue to avoid slow copying by bytes (the changes in
 expand_set_or_movmem_via_loop() and the introduction of
 expand_set_or_movmem_via_loop_with_iter() are made for these cases).
 4) As bigger alignment might be needed than previously, prologue
 generation was also modified.

 Changes in expr.c (target-independent part):
 There are two possible strategies now: use of aligned and unaligned
 moves. For each of them a cost model was implemented and the choice is
 made according to the cost of each option. Move-mode choice is made by
 functions widest_mode_for_unaligned_mov() and
 widest_mode_for_aligned_mov().
 Cost estimation is implemented in functions compute_aligned_cost() and
 compute_unaligned_cost().
 Choice between these two strategies and the generation of moves
 themselves are in function move_by_pieces().

 Function store_by_pieces() calls set_by_pieces_1() instead of
 store_by_pieces_1() if this is the memset case (I needed to introduce
 set_by_pieces_1 to separate the memset case from others -
 store_by_pieces_1 is sometimes called for strcpy and some other
 functions, not only for memset).

 Set_by_pieces_1() estimates the costs of the aligned and unaligned
 strategies (as in move_by_pieces()) and generates the moves for memset.
 A single move is generated via generate_move_with_mode(). On the first
 call, a promoted value (a register filled with the one-byte value of the
 memset argument) is generated - later calls reuse this value.
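
For illustration, the promoted value amounts to the usual byte-broadcast
trick; a sketch under that assumption, not code from the patch:

/* Sketch: broadcast the memset fill byte across a 64-bit word, e.g.
   0x2a becomes 0x2a2a2a2a2a2a2a2a; such a value lives in a register
   and is reused by the subsequent wide stores.  */
unsigned long long
broadcast_fill_byte (unsigned char c)
{
  return (unsigned long long) c * 0x0101010101010101ULL;
}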

 Changes in MD-files:
 For generation of promoted values, I made some changes in
 promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands
 for vec_dupv4si and vec_dupv2di were introduced for this too (these
 expands differ from the corresponding define_insns - the existing
 define_insns work only with registers, while the new expands can
 process a memory operand as well).

 Some code was added to allow generation of MOVQ (with SSE registers)
 - such moves aren't usual ones, because they use only half of an
 xmm register.
 There was a need to generate such moves explicitly, so I added a
 simple expand to sse.md.


 On 16 July 2011 03:24, Jan Hubicka hubi...@ucw.cz wrote:
  New algorithm for move-mode selection is implemented for move_by_pieces,
  store_by_pieces.
  x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
  similar way, x86 cost-models parameters are slightly changed to support
  this. This implementation checks if array's alignment is known at compile
  time and chooses expanding algorithm and move-mode according to it.

 Can you give some summary of the changes you made?  It would make it a lot
 easier to review if it was broken up into the generic changes (with a
 rationale for why they are needed) and i386 backend changes that I could
 review then.

 From a first pass through the patch I don't quite see the need for e.g.
 adding new move patterns when we can output all kinds of SSE moves
 already.  Will look more into the patch to see if I can come up with
 useful comments.

 Honza




Re: [PATCH v3, i386] BMI2 support for GCC, mulx, rorx, shiftx part

2011-08-22 Thread Uros Bizjak
On Sun, Aug 21, 2011 at 1:39 PM, Uros Bizjak ubiz...@gmail.com wrote:

 This is the third version of BMI2 support that includes generation of
 mulx, rorx, shiftx part. This patch includes all comments on
 previous version, splits all insn post-reload, uses enable attribute
 and avoids new register modifiers. As a compromise (see previous
 posts), the mulx insn is now split post-reload into pattern that
 separates outputs (so, post-reload passes can do their job more
 effectively), with the hope that someday other DWI patterns will be
 rewritten in the same way.

A small update that removes the need for the <w> mode attribute.  We can
convert the count register to the correct mode in a splitter.

Re-tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
Index: i386.md
===
--- i386.md (revision 177949)
+++ i386.md (working copy)
@@ -377,7 +377,7 @@
 (define_attr "type"
   "other,multi,
    alu,alu1,negnot,imov,imovx,lea,
-   incdec,ishift,ishift1,rotate,rotate1,imul,idiv,
+   incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
    icmp,test,ibr,setcc,icmov,
    push,pop,call,callv,leave,
    str,bitmanip,
@@ -410,12 +410,12 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-  bitmanip")
+  bitmanip,imulx")
    (const_int 0)
  (eq_attr "unit" "i387,sse,mmx")
    (const_int 0)
-(eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1,
- imul,icmp,push,pop")
+(eq_attr "type" "alu,alu1,negnot,imovx,ishift,ishiftx,ishift1,
+ rotate,rotatex,rotate1,imul,icmp,push,pop")
    (symbol_ref "ix86_attr_length_immediate_default (insn, true)")
  (eq_attr "type" "imov,test")
    (symbol_ref "ix86_attr_length_immediate_default (insn, false)")
@@ -675,7 +675,7 @@
  (and (match_operand 0 "memory_displacement_operand" "")
   (match_operand 1 "immediate_operand" "")))
   (const_string "true")
-(and (eq_attr "type" "alu,ishift,rotate,imul,idiv")
+(and (eq_attr "type" "alu,ishift,ishiftx,rotate,rotatex,imul,idiv")
  (and (match_operand 0 "memory_displacement_operand" "")
   (match_operand 2 "immediate_operand" "")))
   (const_string "true")
@@ -699,12 +699,13 @@
 (define_attr "movu" "0,1" (const_string "0"))
 
 ;; Used to control the enabled attribute on a per-instruction basis.
-(define_attr "isa" "base,noavx,avx"
+(define_attr "isa" "base,noavx,avx,bmi2"
   (const_string "base"))
 
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
  (eq_attr "isa" "avx") (symbol_ref "TARGET_AVX")
+(eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 ]
 (const_int 1)))
 
@@ -6844,16 +6845,102 @@
  (clobber (reg:CC FLAGS_REG))])]
   "TARGET_QIMODE_MATH")
 
-(define_insn "*umul<mode><dwi>3_1"
+(define_insn "*bmi2_umulditi3_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (mult:DI
+ (match_operand:DI 2 "nonimmediate_operand" "%d")
+ (match_operand:DI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:DI 1 "register_operand" "=r")
+   (truncate:DI
+ (lshiftrt:TI
+   (mult:TI (zero_extend:TI (match_dup 2))
+(zero_extend:TI (match_dup 3)))
+   (const_int 64))))]
+  "TARGET_64BIT && TARGET_BMI2
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "mulx\t{%3, %0, %1|%1, %0, %3}"
+  [(set_attr "type" "imulx")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "DI")])
+
+(define_insn "*bmi2_umulsidi3_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (mult:SI
+ (match_operand:SI 2 "nonimmediate_operand" "%d")
+ (match_operand:SI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:SI 1 "register_operand" "=r")
+   (truncate:SI
+ (lshiftrt:DI
+   (mult:DI (zero_extend:DI (match_dup 2))
+(zero_extend:DI (match_dup 3)))
+   (const_int 32))))]
+  "!TARGET_64BIT && TARGET_BMI2
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "mulx\t{%3, %0, %1|%1, %0, %3}"
+  [(set_attr "type" "imulx")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "SI")])
+
+(define_insn "*umul<mode><dwi>3_1"
+  [(set (match_operand:<DWI> 0 "register_operand" "=A,r")
+   (mult:<DWI>
+ (zero_extend:<DWI>
+   (match_operand:DWIH 1 "nonimmediate_operand" "%0,d"))
+ (zero_extend:<DWI>
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   mul{<imodesuffix>}\t%2
+   #"
+  [(set_attr "isa" "base,bmi2")
+   (set_attr "type" "imul,imulx")
+   (set_attr "length_immediate" "0,*")
+   (set (attr "athlon_decode")
+   (cond [(eq_attr "alternative" "0")
+(if_then_else (eq_attr "cpu" "athlon")
+  (const_string "vector")
+  (const_string "double"))]
+ 

Re: [patch, committed] gfortran.dg/graphite/interchange-1.f: Remove xfail

2011-08-22 Thread Rainer Orth
Tobias,

 I saw that the test case now XPASSes and I also find some xpasses in
 gcc-testresults. (Not for all, but I think those do not build graphite.)

 I think the XPASS is due to
 http://gcc.gnu.org/ml/fortran/2011-08/msg00023.html

This is PR tree-optimization/50124.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] make assign_hard_reg's saved_nregs conditional to unbreak ARM bootstrap (PR50146)

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 09:46:17AM +0200, Mikael Pettersson wrote:
 Patch was pre-approved by Vladimir Makarov in the PR trail, but I don't
 have svn write access so I'll need help to commit it.

Committed, thanks.

Jakub


[trans-mem] Use __x86_64__ instead of __LP64__.

2011-08-22 Thread Torvald Riegel
Use __x86_64__ instead of __LP64__ in setjmp/longjmp and TLS
definitions.

H.J.: Is that sufficient for x32, or do we need entirely different code?
If so, can you please provide the required changes?

Otherwise, OK for branch?
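
For background, the predefines this hinges on are roughly as follows
(summarized from general x86 ABI knowledge, not from this thread):

/* -m64  : __x86_64__ and __LP64__ are defined.
   -mx32 : __x86_64__ and __ILP32__ are defined; __LP64__ is not.
   -m32  : neither __x86_64__ nor __LP64__ is defined.
   Keying on __x86_64__ therefore selects the 64-bit register paths
   (movq, %fs-relative accesses) for x32 as well.  */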
commit 5337bae3f70d53e463d09e8d6806826876b0da8a
Author: Torvald Riegel trie...@redhat.com
Date:   Mon Aug 22 11:21:03 2011 +0200

Use __x86_64__ instead of __LP64__.

* config/x86/tls.h: Use __x86_64__ instead of __LP64__.
* config/x86/sjlj.S: Same.

diff --git a/libitm/config/x86/sjlj.S b/libitm/config/x86/sjlj.S
index 0e9c246..725ffec 100644
--- a/libitm/config/x86/sjlj.S
+++ b/libitm/config/x86/sjlj.S
@@ -1,4 +1,4 @@
-/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 2008, 2009, 2011 Free Software Foundation, Inc.
Contributed by Richard Henderson r...@redhat.com.
 
This file is part of the GNU Transactional Memory Library (libitm).
@@ -29,7 +29,7 @@
 
 _ITM_beginTransaction:
.cfi_startproc
-#ifdef __LP64__
+#ifdef __x86_64__
leaq8(%rsp), %rax
movq(%rsp), %r8
subq$72, %rsp
@@ -72,7 +72,7 @@ _ITM_beginTransaction:
 
 GTM_longjmp:
.cfi_startproc
-#ifdef __LP64__
+#ifdef __x86_64__
movq(%rdi), %rcx
movq8(%rdi), %rdx
movq16(%rdi), %rbx
diff --git a/libitm/config/x86/tls.h b/libitm/config/x86/tls.h
index 03fdab2..ca644f4 100644
--- a/libitm/config/x86/tls.h
+++ b/libitm/config/x86/tls.h
@@ -37,7 +37,7 @@
 #if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
 namespace GTM HIDDEN {
 
-#ifdef __LP64__
+#ifdef __x86_64__
 # define SEG_READ(OFS) "movq\t%%fs:(" #OFS "*8),%0"
 # define SEG_WRITE(OFS)"movq\t%0,%%fs:(" #OFS "*8)"
 # define SEG_DECODE_READ(OFS)  SEG_READ(OFS) "\n\t" \


Re: Dump stats about hottest hash tables when -fmem-report

2011-08-22 Thread Dimitrios Apostolou

Hello,

I'm attaching here the final version of statistics dumping; I think I made 
some stylistic/insignificant changes. Tested on i386 and x86_64. I have 
withheld the hash table size changes, so please apply if you find 
everything good.


Would you agree to a future patch that would make such statistics be 
computed only when the gather-mem-stats compile-time variable is set?



On Sat, 20 Aug 2011, Richard Guenther wrote:

On Fri, Aug 19, 2011 at 6:40 PM, Tom Tromey tro...@redhat.com wrote:


Your patch to change the symbol table's load factor is fine technically.
I think the argument for putting it in is lacking; what I would like to
see is either some rationale explaining that the increased memory use is
not important, or some numbers showing that it still performs well on
more than a single machine.  My reason for wanting this is just that,
historically, GCC has been very sensitive to increases in memory use.
Alternatively, comments from more active maintainers indicating that
they don't care about this would also help your case.

I can't approve or reject the libiberty change, just the libcpp one.


Yes, memory usage is as important as compile time.  We still have testcases
that show a vast imbalance between them.  I don't know if the symbol table hash
is ever the problem, but changing the generic load factor in libiberty doesn't
sound like a good idea - maybe instead have a way of specifying that factor
per hashtable instance.  Or, as usual, improve the hash function to
reduce the collision rate and/or to make re-hashing cheaper.


Regarding hash table size, I will back off from hashtab.c. I really 
believe that even though it makes a difference, the proper solution would 
be a much more intrusive one: a complete reimplementation.


If we primarily care about memory usage we should use a closed-addressing 
hash table (with linked lists in each bucket) and set the load factor 
high. If we care about cache-efficiency and speed we should use an 
open-addressing hash table, like the one we have, but decrease the 
load factor and resolve collisions by quadratic probing instead of 
rehashing. Also we should make sure our hash table is always power-of-2 
sized, and that hash values are always computed using good and proven hash 
functions (we use Bob Jenkins' v2 hash; we should upgrade to v3 even though 
v2 seems very good *when* we actually do use it). That would probably mean 
the interface should change to disallow custom hash values, and while we 
do that I'd really like to see new functions dedicated to searching and 
inserting.
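
As an illustration of the open-addressing scheme described above, a
minimal probe loop over a power-of-2 table might look like this (names
are illustrative, not the hashtab.c API):

/* Sketch: quadratic probing with triangular steps (1, 3, 6, 10, ...).
   When the table size is a power of two this sequence visits every
   slot, so a search on a non-full table always terminates.  */
static void **
find_slot_quadratic (void **slots, size_t size, hashval_t hash)
{
  size_t mask = size - 1;	/* valid because size is a power of 2 */
  size_t idx = hash & mask;
  size_t i;

  for (i = 1; slots[idx] != NULL; i++)
    idx = (idx + i) & mask;
  return &slots[idx];
}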


I've experimented with some of these changes, but these changes 
individually do not offer significant speed-up. I believe that change will 
show if hash tables are completely reimplemented.


But regarding symtab.c, I only see benefits. Collisions are reduced and 
the extra memory usage is negligible. For a hash table with 32K elements and 
64K slots, the overhead of empty slots is 32K*8 = 256KB. If it were 75% 
full it would be 128KB. Actual measurements on i386 show negligible 
overhead; it could be by luck that the final ht size is the same (they are 
always power-of-2 sized, so it's possible). Anyway I'll hopefully test on 
other platforms within September and report back.


A change that would probably help for symtab would be to store custom 
string structs that include the string length, so we avoid several strlen() 
calls. Another could be to not support deletion of strings (since what we 
are actually doing is string interning, isn't it?). This would make the 
open-addressing hash table much simpler, and I think there is only one 
case where we actually delete strings. I have to look further into this one.
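
A sketch of what such a string entry could look like (hypothetical
layout, not existing symtab.c code):

/* Hypothetical interned-string entry: length and hash are computed once
   when the string is interned, so lookups and comparisons need no
   further strlen() calls.  */
struct interned_string
{
  hashval_t hash;	/* cached hash of the bytes */
  size_t len;		/* cached length */
  char str[1];		/* NUL-terminated bytes, allocated inline */
};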



All comments welcome,
Dimitris




Changelog:

2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* cgraph.c, cgraph.h (cgraph_dump_stats): New function to dump
stats about cgraph_hash hash table.
* cselib.c, cselib.h (cselib_dump_stats): New function to dump
stats about cselib_hash_table.
* cselib.c (cselib_finish): Keep statistics by increasing values
of new global variables
cselib_htab_{num,expands,searches,collisions} if -fmem-report.
* emit-rtl.c, emit-rtl.h (mem_attrs_dump_stats): New function to
dump statistics about mem_attrs_htab hash table.
* tree.c (print_type_hash_statistics): Used the new
htab_dump_statistics() function.
* var-tracking.c (shared_hash_destroy): Keep statistics by
increasing values of new global variables
vars_htab_{num,expands,searches,collisions} if -fmem-report.
(vt_finalize): Keep statistics by increasing values of new global
variables cv_htab_{num,expands,searches,collisions} and
vc_htab_{num,expands,searches,collisions} if -fmem-report.
* var-tracking.c, rtl.h (vt_dump_stats): New function to dump
stats about vars-htab, changed_variables and value_chains hash
tables.
* hashtab.c, 

Re: tree-ssa*: reduce malloc() calls by preallocating hot VECs on the stack

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 9:43 AM, Dimitrios Apostolou ji...@gmx.net wrote:

 2011-08-22  Dimitrios Apostolou  ji...@gmx.net

        Allocate some very frequently used vectors on the stack:
        * vecir.h: Defined a tree vector on the stack.
        * tree-ssa-sccvn.c (print_scc, sort_scc, process_scc)
        (extract_and_process_scc_for_name): Allocate the scc vector on the
        stack instead of the heap, giving it a minimal initial size
        instead of 0.
        * tree-ssa-structalias.c (get_constraint_for_1)
        (get_constraint_for, get_constraint_for_rhs, do_deref)
        (get_constraint_for_ssa_var, get_constraint_for_ptr_offset)
        (get_constraint_for_component_ref, get_constraint_for_address_of)
        (process_all_all_constraints, do_structure_copy)
        (make_constraints_to, make_constraint_to, handle_rhs_call)
        (handle_lhs_call, handle_const_call, handle_pure_call)
        (find_func_aliases_for_builtin_call, find_func_aliases_for_call)
        (find_func_aliases, process_ipa_clobber, find_func_clobbers)
        (create_variable_info_for): Converted the rhsc, lhsc vectors from
        heap to stack, with a minimal initial size, since they were very
        frequently allocated.

Ok if bootstrapped and tested ok - please always state how you tested
a patch.

Thanks,
Richard.


Re: graphds.[ch]: alloc_pool for edges

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 9:37 AM, Dimitrios Apostolou ji...@gmx.net wrote:
 free() was called way too often before, this patch reduces it significantly.
 Minor speed-up here too, I don't mention it individually since numbers are
 within noise margins.

As there is no re-use in this pool, the natural allocator to use is an
obstack, which has even less overhead than an alloc_pool.

Richard.
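
A rough sketch of that obstack variant (the obstack field and helper
names are assumptions, not the posted patch):

/* Sketch: carve edges out of an obstack initialized once per graph
   (e.g. with gcc_obstack_init).  Edges are never freed individually;
   free_graph releases them all in one call.  */
static struct graph_edge *
alloc_edge (struct obstack *edge_obstack)
{
  return (struct graph_edge *)
    obstack_alloc (edge_obstack, sizeof (struct graph_edge));
}

static void
free_all_edges (struct obstack *edge_obstack)
{
  obstack_free (edge_obstack, NULL);	/* NULL base frees everything */
}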


 2011-08-22  Dimitrios Apostolou  ji...@gmx.net

        * graphds.h (struct graph): Added edge_pool as a pool for
        allocating edges.
        * graphds.c (new_graph): Initialise edge_pool.
        (add_edge): Allocate edge from edge_pool rather than with malloc.
        (free_graph): Instead of iterating across the graph freeing edges,
        just destroy the edge_pool.



Re: tree-ssa-structalias.c: alloc_pool for struct equiv_class_label

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 9:46 AM, Dimitrios Apostolou ji...@gmx.net wrote:

 2011-08-22  Dimitrios Apostolou  ji...@gmx.net

        * tree-ssa-structalias.c (equiv_class_add)
        (perform_var_substitution, free_var_substitution_info): Created a
        new equiv_class_pool allocator pool for struct
        equiv_class_label. Changed the pointer_equiv_class_table and
        location_equiv_class_table hash tables to not iterate freeing all
        elements in the end, but just free the pool.

Did you check if the hash functions have ever called free()?  If so why
not use the pool free function so that entries can get re-used?  If not,
the natural allocator would be an obstack instead.

Richard.





[trans-mem] Move x86 TLS definition to linux/x86.

2011-08-22 Thread Torvald Riegel
A small change. OK for branch?
commit b9db8481779fc5a4069ac36505bd6cce8da971a1
Author: Torvald Riegel trie...@redhat.com
Date:   Mon Aug 22 11:52:00 2011 +0200

Move x86 TLS definition to linux/x86.

* config/x86/tls.h: Moved to ...
* config/linux/x86/tls.h: ... here.

diff --git a/libitm/config/linux/x86/tls.h b/libitm/config/linux/x86/tls.h
new file mode 100644
index 000..ca644f4
--- /dev/null
+++ b/libitm/config/linux/x86/tls.h
@@ -0,0 +1,93 @@
+/* Copyright (C) 2008, 2009, 2011 Free Software Foundation, Inc.
+   Contributed by Richard Henderson r...@redhat.com.
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef LIBITM_X86_TLS_H
+#define LIBITM_X86_TLS_H 1
+
+#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
+/* Use slots in the TCB head rather than __thread lookups.
+   GLIBC has reserved words 10 through 13 for TM.  */
+#define HAVE_ARCH_GTM_THREAD 1
+#define HAVE_ARCH_GTM_THREAD_DISP 1
+#endif
+
+#include config/generic/tls.h
+
+#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
+namespace GTM HIDDEN {
+
+#ifdef __x86_64__
+# define SEG_READ(OFS) "movq\t%%fs:(" #OFS "*8),%0"
+# define SEG_WRITE(OFS)"movq\t%0,%%fs:(" #OFS "*8)"
+# define SEG_DECODE_READ(OFS)  SEG_READ(OFS) "\n\t" \
+   "rorq\t$17,%0\n\t" \
+   "xorq\t%%fs:48,%0"
+# define SEG_ENCODE_WRITE(OFS) "xorq\t%%fs:48,%0\n\t" \
+   "rolq\t$17,%0\n\t" \
+   SEG_WRITE(OFS)
+#else
+# define SEG_READ(OFS)  "movl\t%%gs:(" #OFS "*4),%0"
+# define SEG_WRITE(OFS) "movl\t%0,%%gs:(" #OFS "*4)"
+# define SEG_DECODE_READ(OFS)  SEG_READ(OFS) "\n\t" \
+   "rorl\t$9,%0\n\t" \
+   "xorl\t%%gs:24,%0"
+# define SEG_ENCODE_WRITE(OFS) "xorl\t%%gs:24,%0\n\t" \
+   "roll\t$9,%0\n\t" \
+   SEG_WRITE(OFS)
+#endif
+
+static inline struct gtm_thread *gtm_thr(void)
+{
+  struct gtm_thread *r;
+  asm (SEG_READ(10) : "=r"(r));
+  return r;
+}
+
+static inline void set_gtm_thr(struct gtm_thread *x)
+{
+  asm volatile (SEG_WRITE(10) : : "r"(x));
+}
+
+static inline struct abi_dispatch *abi_disp(void)
+{
+  struct abi_dispatch *r;
+  asm (SEG_DECODE_READ(11) : "=r"(r));
+  return r;
+}
+
+static inline void set_abi_disp(struct abi_dispatch *x)
+{
+  void *scratch;
+  asm volatile (SEG_ENCODE_WRITE(11) : "=r"(scratch) : "0"(x));
+}
+
+#undef SEG_READ
+#undef SEG_WRITE
+#undef SEG_DECODE_READ
+#undef SEG_ENCODE_WRITE
+
+} // namespace GTM
+#endif /* >= GLIBC 2.10 */
+
+#endif // LIBITM_X86_TLS_H
diff --git a/libitm/config/x86/tls.h b/libitm/config/x86/tls.h
deleted file mode 100644
index ca644f4..000
--- a/libitm/config/x86/tls.h
+++ /dev/null
@@ -1,93 +0,0 @@
-/* Copyright (C) 2008, 2009, 2011 Free Software Foundation, Inc.
-   Contributed by Richard Henderson r...@redhat.com.
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef LIBITM_X86_TLS_H
-#define LIBITM_X86_TLS_H 1
-
-#if defined(__GLIBC_PREREQ)  

Re: [ARM] Model automodified addresses in the Cortex A8 and A9 schedulers

2011-08-22 Thread Ramana Radhakrishnan
 Tested on arm-linux-gnueabi.  OK to install?


 gcc/
	* config/arm/arm-protos.h (arm_writeback_dep): Declare.
	(arm_writeback_only_dep): Likewise.
	* config/arm/arm.c (arm_writeback_dep): New function.
	(arm_writeback_only_dep_1, arm_writeback_only_dep): Likewise.
	* config/arm/cortex-a8.md: Add address-writeback bypasses for
	loads and stores.
	* config/arm/cortex-a9.md: Likewise.

This is OK.

Ramana


Re: mem_attrs_htab

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 10:04 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote:
 the future. I didn't like hashing addresses either, and I was
 surprised I saw no regressions.

 Hashing on the expr address as well just results in smaller sharing
 in the hash table (i.e. if the expr has different address, but is considered
 equal).  The hashing of mem attrs is done just to reduce memory overhead.

And at some point the idea popped up to just dump this whole re-using
mem-attrs.  There is at most a single function in RTL but the whole program
is available in SSA, so the memory overhead must be small.

Richard.

        Jakub



Re: Various minor speed-ups

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 9:50 AM, Dimitrios Apostolou ji...@gmx.net wrote:

 2011-08-22  Dimitrios Apostolou  ji...@gmx.net

        * tree-ssa-pre.c (phi_trans_add, init_pre, fini_pre): Added a pool
        for phi_translate_table elements to avoid free() calls from
        htab_delete().

Ok if bootstrap and test pass.

Richard.


Re: mem_attrs_htab

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote:
 And at some point the idea popped up to just dump this whole re-using
 mem-attrs.  There is at most a single function in RTL but the whole program
 is available in SSA, so the memory overhead must be small.

Some functions are extremely large though.  Do you mean that MEM itself would be
enlarged to have the MEM_ATTRS field so that one operand is the address,
then expr, then HWI size, offset, etc.?  Because if the mem attrs aren't
shared any longer, it doesn't make sense to keep the indirection.
I still fear we have way too many MEMs in RTL that this would be noticeable.

Jakub


Re: tree-ssa-structalias.c: alloc_pool for struct equiv_class_label

2011-08-22 Thread Dimitrios Apostolou

On Mon, 22 Aug 2011, Richard Guenther wrote:


On Mon, Aug 22, 2011 at 9:46 AM, Dimitrios Apostolou ji...@gmx.net wrote:


2011-08-22  Dimitrios Apostolou  ji...@gmx.net

       * tree-ssa-structalias.c (equiv_class_add)
       (perform_var_substitution, free_var_substitution_info): Created a
       new equiv_class_pool allocator pool for struct
       equiv_class_label. Changed the pointer_equiv_class_table and
       location_equiv_class_table hash tables to not iterate freeing all
       elements in the end, but just free the pool.


Did you check if the hash functions have ever called free()?  If so why
not use the pool free function so that entries can get re-used?  If not,
the natural allocator would be an obstack instead.


I have not found any relevant call of htab_clear_slot(). I didn't consider 
obstacks at all for these cases; thanks for telling me, I'll see where 
I can use them. As I've noted, I have bootstrapped and tested all these 
changes at least on x86_64 with release checking enabled, but I plan to 
test and measure all my changes together later, and hopefully on other 
platforms in the future.



Thanks,
Dimitris


[PATCH] Fix PR50067

2011-08-22 Thread Richard Guenther

Applied.

Richard.

2011-08-22  Richard Guenther  rguent...@suse.de

PR testsuite/50145
* gcc.dg/torture/pr50067-1.c: Run on little-endian systems only.
* gcc.dg/torture/pr50067-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/torture/pr50067-1.c
===
--- gcc/testsuite/gcc.dg/torture/pr50067-1.c	(revision 177949)
+++ gcc/testsuite/gcc.dg/torture/pr50067-1.c	(working copy)
@@ -9,10 +9,15 @@ short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7,
 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, };
 int main()
 {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   int i;
-  for (i = 0; i < 32; ++i)
-(*((unsigned short(*)[32])&a[0]))[i] = (*((char(*)[32])&a[0]))[i+8];
-  if (memcmp (a, b, sizeof (a)) != 0)
-abort ();
+  if (sizeof (short) == 2)
+{
+  for (i = 0; i < 32; ++i)
+   (*((unsigned short(*)[32])&a[0]))[i] = (*((char(*)[32])&a[0]))[i+8];
+  if (memcmp (a, b, sizeof (a)) != 0)
+   abort ();
+}
+#endif
   return 0;
 }
Index: gcc/testsuite/gcc.dg/torture/pr50067-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr50067-2.c	(revision 177949)
+++ gcc/testsuite/gcc.dg/torture/pr50067-2.c	(working copy)
@@ -9,12 +9,17 @@ short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7,
 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, };
 int main()
 {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   int i;
-  for (i = 0; i < 32; ++i)
+  if (sizeof (short) == 2)
 {
-  a[i] = (*((char(*)[32])&a[0]))[i+8];
+  for (i = 0; i < 32; ++i)
+   {
+ a[i] = (*((char(*)[32])&a[0]))[i+8];
+   }
+  if (memcmp (a, b, sizeof (a)) != 0)
+   abort ();
 }
-  if (memcmp (a, b, sizeof (a)) != 0)
-abort ();
+#endif
   return 0;
 }


[var-tracking] small speed-ups

2011-08-22 Thread Dimitrios Apostolou

Hello list,

this patch has all of my changes to var-tracking that I believe are worth 
keeping. These are all minor changes not touching algorithmic issues; I 
lost much time trying to understand what is actually done in 
var-tracking.c, but I still don't get it. I wish there were a document 
describing the current implementation. I'd appreciate all the help I can 
get here.


Bootstrapped/tested on x86_64.

Thanks to lxo for helping me, hopefully his plan for better var-tracking 
throughout all optimisation passes will be implemented. This is the only 
var-tracking related doc I could find... ( 
http://gcc.gnu.org/wiki/Var_Tracking_Assignments).


This patch also includes minor stylistic changes that made the code just a 
tiny bit more accessible to me; the indirection was so much that it hardly 
reminded me of C. Let me know if I should remove these parts. :-s For the 
sake of completeness I'll also post a follow-up patch where I 
delete/simplify a big part of var-tracking, unfortunately with some impact 
on performance.



2011-08-22  Dimitrios Apostolou  ji...@gmx.net

* var-tracking.c (init_attrs_list_set): Remove function, instead
use a memset() call to zero the attr list in...
(dataflow_set_init).
(vars_copy): Remove function because inserting each element into a
new hash table was too costly. Replaced with the ...
(htab_dup): ... new function that only does a memcpy() of the
element table in htab_t, without rehashing any elements.
(shared_hash_unshare): Replace the vars_copy() call with
htab_dup(), plus do a little extra work (reference counting) which
was in vars_copy.
(shared_hash_destroy, dataflow_set_destroy): Add an extra
do_free bool argument, to avoid iterating over hash tables
freeing elements, when not needed.
(vt_finalize, vt_emit_notes): Call the above with do_free=false
since all pools will be freed later.
(dataflow_set_clear, dataflow_set_copy, dataflow_set_union)
(dataflow_set_merge, dataflow_set_different, compute_bb_dataflow)
(vt_find_locations): Call shared_hash_destroy with do_free=true.
(attrs_list_copy): Do not free destination list but reuse already
allocated elements if possible.
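
Since htab_dup itself is not shown in the excerpt below, here is a sketch
of what it plausibly does (libiberty's struct htab internals assumed;
not the posted code):

/* Sketch: clone a libiberty hash table by copying the slot array
   verbatim.  Slots keep their positions, so no element is rehashed,
   unlike inserting every element into a fresh table.  */
static htab_t
htab_dup (htab_t src)
{
  htab_t dst = XNEW (struct htab);

  *dst = *src;				/* size, counters, callbacks */
  dst->entries = XNEWVEC (void *, src->size);
  memcpy (dst->entries, src->entries, src->size * sizeof (void *));
  return dst;
}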
=== modified file 'gcc/var-tracking.c'
--- gcc/var-tracking.c  2011-06-03 01:42:31 +
+++ gcc/var-tracking.c  2011-08-22 09:52:08 +
@@ -414,7 +414,6 @@ static hashval_t variable_htab_hash (con
 static int variable_htab_eq (const void *, const void *);
 static void variable_htab_free (void *);
 
-static void init_attrs_list_set (attrs *);
 static void attrs_list_clear (attrs *);
 static attrs attrs_list_member (attrs, decl_or_value, HOST_WIDE_INT);
 static void attrs_list_insert (attrs *, decl_or_value, HOST_WIDE_INT, rtx);
@@ -423,7 +422,6 @@ static void attrs_list_union (attrs *, a
 
 static void **unshare_variable (dataflow_set *set, void **slot, variable var,
enum var_init_status);
-static void vars_copy (htab_t, htab_t);
 static tree var_debug_decl (tree);
 static void var_reg_set (dataflow_set *, rtx, enum var_init_status, rtx);
 static void var_reg_delete_and_set (dataflow_set *, rtx, bool,
@@ -447,7 +445,7 @@ static bool variable_part_different_p (v
 static bool onepart_variable_different_p (variable, variable);
 static bool variable_different_p (variable, variable);
 static bool dataflow_set_different (dataflow_set *, dataflow_set *);
-static void dataflow_set_destroy (dataflow_set *);
+static void dataflow_set_destroy (dataflow_set *, bool);
 
 static bool contains_symbol_ref (rtx);
 static bool track_expr_p (tree, bool);
@@ -1069,14 +1067,14 @@ adjust_insn (basic_block bb, rtx insn)
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  return !dv || ((int) TREE_CODE ((tree) dv) != (int) VALUE);
 }
 
 /* Return true if a decl_or_value is a VALUE rtl.  */
 static inline bool
 dv_is_value_p (decl_or_value dv)
 {
-  return dv && !dv_is_decl_p (dv);
+  return !dv_is_decl_p (dv);
 }
 
 /* Return the decl in the decl_or_value.  */
@@ -1092,7 +1090,7 @@ static inline rtx
 dv_as_value (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_value_p (dv));
-  return (rtx)dv;
+  return (rtx) dv;
 }
 
 /* Return the opaque pointer in the decl_or_value.  */
@@ -1191,7 +1189,7 @@ dv_uid2hash (dvuid uid)
 static inline hashval_t
 dv_htab_hash (decl_or_value dv)
 {
-  return dv_uid2hash (dv_uid (dv));
+  return (hashval_t) (dv_uid (dv));
 }
 
 /* The hash function for variable_htab, computes the hash value
@@ -1202,7 +1200,7 @@ variable_htab_hash (const void *x)
 {
   const_variable const v = (const_variable) x;
 
-  return dv_htab_hash (v->dv);
+  return (hashval_t) (dv_uid (v->dv));
 }
 
 /* Compare the declaration of variable X with declaration Y.  */
@@ -1211,9 +1209,8 @@ static int
 variable_htab_eq (const void *x, const void *y)
 {
   const_variable const v 

Re: mem_attrs_htab

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 12:07 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote:
 And at some point the idea popped up to just dump this whole re-using
 mem-attrs.  There is at most a single function in RTL but the whole program
 is available in SSA, so the memory overhead must be small.

 Some functions are extremely large though.  Do you mean that MEM itself would 
 be
 enlarged to have the MEM_ATTRS field so that one operand is the address,
 then expr, then HWI size, offset, etc.?  Because if the mem attrs aren't
 shared any longer, it doesn't make sense to keep the indirection.
 I still fear we have so many MEMs in RTL that this would be noticeable.

It would be interesting to have numbers about the amount of sharing that
happens - it might not be trivial though, as some re-uses would be able to
simply modify the attr in place.

Richard.

        Jakub



[patch] revert an obsolete part from the mips-triarch checkin

2011-08-22 Thread Matthias Klose
While looking at the multiarch patches, I noticed that a previous change is not
necessary.  MULTILIB_DEFAULTS is handled in config/mips/mips.h.

  Matthias

gcc/

2011-08-22  Matthias Klose d...@debian.org

Revert:
2011-07-11  Arthur Loiret  aloi...@debian.org
Matthias Klose d...@debian.org
* config/mips/t-linux64 (MULTILIB_DIRNAMES): Set to 'n32 . 64' if
tm_defines contains MIPS_ABI_DEFAULT ABI_32, to follow the glibc
convention.
 
Index: gcc/config/mips/t-linux64
===
--- gcc/config/mips/t-linux64   (revision 177952)
+++ gcc/config/mips/t-linux64   (working copy)
@@ -17,11 +17,7 @@
 # http://www.gnu.org/licenses/.
 
 MULTILIB_OPTIONS = mabi=n32/mabi=32/mabi=64
-ifneq ($(filter MIPS_ABI_DEFAULT=ABI_32,$(tm_defines)),)
-MULTILIB_DIRNAMES = n32 . 64
-else
 MULTILIB_DIRNAMES = n32 32 64
-endif
 MULTILIB_OSDIRNAMES = ../lib32 ../lib ../lib64
 
 EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o


Re: [patch] revert an obsolete part from the mips-triarch checkin

2011-08-22 Thread Richard Sandiford
Matthias Klose d...@debian.org writes:
 2011-08-22  Matthias Klose d...@debian.org

   Revert:
   2011-07-11  Arthur Loiret  aloi...@debian.org
   Matthias Klose d...@debian.org
   * config/mips/t-linux64 (MULTILIB_DIRNAMES): Set to 'n32 . 64' if
   tm_defines contains MIPS_ABI_DEFAULT ABI_32, to follow the glibc
   convention.

Please also remove:

tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_32"

from:

mips*-*-linux*)				# Linux MIPS, either endian.
	tm_file="dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file} mips/gnu-user.h mips/linux.h"
	tmake_file="${tmake_file} mips/t-libgcc-mips16"
	if test x$enable_targets = xall; then
		tm_file="${tm_file} mips/gnu-user64.h mips/linux64.h"
		tmake_file="${tmake_file} mips/t-linux64"
		tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_32"
	fi

in config.gcc.  (For the record, this follows up from:
http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01108.html
which I'd forgotten about until Matthias reminded me this morning.)

OK with that change, thanks.

Richard


[PATCH] void dangling line table after loading pch

2011-08-22 Thread Dodji Seketeli
Hello,

In c_common_read_pch when gt_pch_restore loads a new pch, the previous
line table (referenced from the global 'line_table') is
garbage-collected and a new one is built.

As the global instance of cpp_reader referenced by the local variable
'pfile' has a pfile->line_table member that references the
'line_table' global, pfile->line_table needs to be set to the new
value of line_table after gt_pch_restore is called, otherwise
pfile->line_table points to garbage-collected memory.  This is what
the call to cpp_set_line_map in c_common_read_pch is for.

The problem is that cpp_set_line_map is called too late as
cpp_read_state (called before cpp_set_line_map) indirectly touches
some pfile->line_table dangling garbage-collected memory[1].

This doesn't cause any visible havoc in trunk yet but I am seeing it
crashing as I am fiddling with line map stuff on the side.

The two-liner patch below just calls cpp_set_line_map right after
gt_pch_restore to restore pfile->line_table before anyone touches it.

[1]: This happens via cpp_read_state -> _cpp_create_definition ->
_cpp_create_iso_definition -> _cpp_lex_token -> _cpp_lex_direct ->
linemap_position_for_column.

Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk.

OK for trunk?

gcc/

* c-family/c-pch.c (c_common_read_pch): Re-set line table right
after reading in the pch.
---
 gcc/c-family/c-pch.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
index b429d9d..3c2fd18 100644
--- a/gcc/c-family/c-pch.c
+++ b/gcc/c-family/c-pch.c
@@ -431,6 +431,7 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   timevar_pop (TV_PCH_CPP_RESTORE);
 
   gt_pch_restore (f);
+  cpp_set_line_map (pfile, line_table);
 
   timevar_push (TV_PCH_CPP_RESTORE);
   if (cpp_read_state (pfile, name, f, smd) != 0)
@@ -445,7 +446,6 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   fclose (f);
 
   line_table-trace_includes = saved_trace_includes;
-  cpp_set_line_map (pfile, line_table);
   linemap_add (line_table, LC_RENAME, 0, saved_loc.file, saved_loc.line);
 
   /* Give the front end a chance to take action after a PCH file has
-- 
Dodji


Re: Vector Comparison patch

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
artyom.shinkar...@gmail.com wrote:
 Richard

 I formalized an approach a little-bit, now it works without target
 hooks, but some polishing is still required. I want you to comment on
 the several important approaches that I use in the patch.

 So how does it work.
 1) All the vector comparisons at the level of the type-checker are
 introduced using VEC_COND_EXPR with constant selection operands being
 {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >
 v1, {-1}, {0}>.

 2) When optabs expand VEC_COND_EXPR, two cases are considered:
 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing 
 changes.
 2.b) first operand is something else, in that case, we specially mark
 this case, recognize it in the backend, and do not create a
 comparison, but use the mask as it was a result of some comparison.

 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
 vector comparison we use the is_vector_comparison function; if it returns
 false, then we replace mask with mask != {0}.

 So we end up with the following functionality:
 VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is a result of
 comparison of two vectors, we leave it as it is, otherwise change with
 mask != {0}.

 Plain vector comparison a op b is represented with VEC_COND_EXPR,
 which correctly expands, without creating useless masking.


 Basically for me there are two questions:
 1) Can we perform information passing in optabs in a nicer way?
 2) How is_vector_comparison could be improved? I have several ideas,
 like checking if constant vector all consists of 0 and -1, and so on.
 But first is it conceptually fine.

 P.S. I tried to put the functionality of is_vector_comparison in
 tree-ssa-forwprop, but the thing is that it is called only with -On,
 which I find inappropriate, and the functionality gets more
 complicated.

Why is it inappropriate to not optimize it at -O0?  If the user
separates comparison and ?: expression it's his own fault.

Btw, the new hook is still in the patch.

I would simply always create != 0 if it isn't and let optimizers
(tree-ssa-forwprop.c) optimize this.  You still have to deal with
non-comparison operands during expansion though, but if
you always forced a != 0 from the start you can then simply
interpret it as a proper comparison result (in which case I'd
modify the backends to have an alternate pattern or directly
expand to masking operations - using the fake comparison
RTX is too much of a hack).

 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+{
+  tree tval;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+  tval = build_int_cst (TREE_TYPE (type), value);
+  return build_vector_from_val (type, tval);

as value is either 0 or 1 that won't work.  Oh, I see you pass -1
for true in the callers.  But I think we should simply decide that true (1)
means -1 for a vector boolean node (and the value parameter should
be a bool instead).  Thus,

+  if (TREE_CODE (type) == VECTOR_TYPE)
+{
+  tree tval;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+  tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+  return build_vector_from_val (type, tval);

instead.

@@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr
  floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
 {
+  int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0;
+  tree arg0_type = TREE_TYPE (arg0);
+

as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS
macros work perfectly fine on vector types.

Richard.


 Thanks,
 Artem.



[var-tracking] [not-good!] disable shared_hash and other simplifications

2011-08-22 Thread Dimitrios Apostolou

Hello,

the attached patch applies after my previous one, and actually cancels all 
runtime gains from it. It doesn't make things worse than initially, so 
it's not *that* bad.


While trying to understand var-tracking I deleted the whole shared hash 
table concept and some other indirections. It didn't make things much 
worse, and it makes me think that getting rid of other indirections like 
shared variables reference counting would actually help. But it got too 
complicated and I had to give up.


While I don't understand variable tracking, I couldn't help but notice 
some things that could be discussed:


* This is the part of the compiler that makes the heaviest use of hash 
tables. htab_traverse(), htab_delete(), FOR_EACH_HTAB_ELEMENT (should that 
be moved into libiberty?) all iterate way too many times over hash tables. 
This is all because decl/value UIDs are sparse.
* Since var-tracking is happening per-function, is there a way to produce 
per-function UIDs and use those in a regular array instead of a sparse 
hash table? These UIDs could be produced at tree/rtx level, and I bet 
other parts in the compiler could make good use of them.
* Another alternative would be to store variables in a vector, sorted by 
UID value. That way checking membership would be O(log n) and merging 
would be a very simple operation (a rough sketch follows this list). 
Still, merging dataflow sets is way too complex to follow, so I didn't 
manage to change this.
* From a very wide field of view, all this DF solving reminded me a lot of 
what I've seen in df-*.c. Why can't variable tracking be integrated in the 
main DF pass of the compiler, the one that happens *after* register 
allocation?



Thanks,
Dimitris
=== modified file 'gcc/var-tracking.c'
--- gcc/var-tracking.c  2011-08-22 08:27:47 +
+++ gcc/var-tracking.c  2011-08-22 10:36:15 +
@@ -233,17 +233,6 @@ typedef struct attrs_def
   HOST_WIDE_INT offset;
 } *attrs;
 
-/* Structure holding a refcounted hash table.  If refcount > 1,
-   it must be first unshared before modified.  */
-typedef struct shared_hash_def
-{
-  /* Reference count.  */
-  int refcount;
-
-  /* Actual hash table.  */
-  htab_t htab;
-} *shared_hash;
-
 /* Structure holding the IN or OUT set for a basic block.  */
 typedef struct dataflow_set_def
 {
@@ -253,11 +242,11 @@ typedef struct dataflow_set_def
   /* Attributes for registers (lists of attrs).  */
   attrs regs[FIRST_PSEUDO_REGISTER];
 
-  /* Variable locations.  */
-  shared_hash vars;
+  /* Variable locations, stored in a hash table.  */
+  htab_t vars;
 
   /* Vars that is being traversed.  */
-  shared_hash traversed_vars;
+  htab_t traversed_vars;
 } dataflow_set;
 
 /* The structure (one for each basic block) containing the information
@@ -379,9 +368,6 @@ static alloc_pool valvar_pool;
 /* Alloc pool for struct location_chain_def.  */
 static alloc_pool loc_chain_pool;
 
-/* Alloc pool for struct shared_hash_def.  */
-static alloc_pool shared_hash_pool;
-
 /* Alloc pool for struct value_chain_def.  */
 static alloc_pool value_chain_pool;
 
@@ -395,7 +381,7 @@ static htab_t value_chains;
 static bool emit_notes;
 
 /* Empty shared hashtable.  */
-static shared_hash empty_shared_hash;
+static htab_t empty_htab;
 
 /* Scratch register bitmap used by cselib_expand_value_rtx.  */
 static bitmap scratch_regs = NULL;
@@ -478,7 +464,7 @@ static void **delete_slot_part (dataflow
 static void delete_variable_part (dataflow_set *, rtx,
  decl_or_value, HOST_WIDE_INT);
 static int emit_note_insn_var_location (void **, void *);
-static void emit_notes_for_changes (rtx, enum emit_note_where, shared_hash);
+static void emit_notes_for_changes (rtx, enum emit_note_where, htab_t);
 static int emit_notes_for_differences_1 (void **, void *);
 static int emit_notes_for_differences_2 (void **, void *);
 static void emit_notes_for_differences (rtx, dataflow_set *, dataflow_set *);
@@ -1370,164 +1356,30 @@ attrs_list_mpdv_union (attrs *dstp, attr
 }
 }
 
-/* Shared hashtable support.  */
-
-/* Return true if VARS is shared.  */
-
-static inline bool
-shared_hash_shared (shared_hash vars)
-{
-  return vars->refcount > 1;
-}
-
-/* Return the hash table for VARS.  */
-
-static inline htab_t
-shared_hash_htab (shared_hash vars)
-{
-  return vars->htab;
-}
-
 /* Return true if VAR is shared, or maybe because VARS is shared.  */
 
 static inline bool
-shared_var_p (variable var, shared_hash vars)
+shared_var_p (variable var)
 {
   /* Don't count an entry in the changed_variables table as a duplicate.  */
-  return ((var->refcount > 1 + (int) var->in_changed_variables)
- || shared_hash_shared (vars));
+  return (var->refcount - (int) var->in_changed_variables) > 1;
 }
 
-/* Copy all variables from hash table SRC to hash table DST without rehashing
-   any values.  */
-
-static htab_t
-htab_dup (htab_t src)
-{
-  htab_t dst;
-
-  dst = (htab_t) xmalloc (sizeof (*src));
-  memcpy (dst, src, sizeof (*src));
-  dst->entries = (void **) xmalloc (src->size * 

Re: Dump stats about hottest hash tables when -fmem-report

2011-08-22 Thread Dimitrios Apostolou
I should note here that the specialised hash tables in pointer-set.c have a 
load factor of at most 25%. Also another very fast hash table I've 
studied, dense_hash_map from Google's sparsehash, has a load factor 
of 50% max.


As I understand it, a good hash function gives a perfectly random value up 
to N. So if we have only 25% of slots empty, as we do with today's 75% 
load factor, the chances that we won't have a collision are small. And collisions 
cost more for an open-addressing hash table.


But for implementing a closed-addressing hash-table with linked lists for 
collision resolution, we'd need allocator pools within libiberty. Is it 
OK to move alloc-pool.[ch] into libiberty as a first step?



Dimitris




Re: [var-tracking] [not-good!] disable shared_hash and other simplifications

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 02:26:58PM +0300, Dimitrios Apostolou wrote:
 the attached patch applies after my previous one, and actually
 cancels all runtime gains from it. It doesn't make things worse than
 initially, so it's not *that* bad.

You should really test it on the testcases which lead to the changes,
e.g. http://gcc.gnu.org/ml/gcc-patches/2009-06/msg01586.html
You might not notice significant differences on small or common testcases,
but on larger testcases it may be a question whether something can be
compiled at all or not.  You'll see that reverting it is actually very
bad for the worst cases.

 * This is the part of the compiler that makes the heaviest use of
 hash tables. htab_traverse(), htab_delete(), FOR_EACH_HTAB_ELEMENT
 (should that be moved in libiberty?) all iterate way too many times
 over hash tables. This is all because decl/value UIDs are sparse.
 * Since var-tracking is happening per-function, is there a way to
 produce per-function UIDs and use those in a regular array instead
 of a sparse hash table? These UIDs could be produced at tree/rtx
 level, and I bet other parts in the compiler could make good use of
 them.

That wouldn't help you, you don't want/need to represent every DECL_UID and
every VALUE seen in the whole function in every bb, only those that were
actually seen there.  Having
(number_of_seen_decl_uids+number_of_values)*number_of_basic_blocks*sizeof(void*)
memory requirements would be still bigger than the size of all the hash
tables together, I think much bigger actually, and also slower when you'd
need to traverse it.
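(For scale, with made-up but plausible numbers: 50000 DECL_UIDs+VALUEs
times 500 basic blocks times 8 bytes would already be ~200MB for the
arrays alone.)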

 * From a very wide field of view, all this DF solving reminded me a
 lot of what I've seen in df-*.c. Why can't variable tracking be
 integrated in the main DF pass of the compiler, the one that happens
 *after* register allocation?

I don't think there is any significant overlap.  Yes, var-tracking is a data
flow algorithm, but the code that does that is just a few lines, the big
part is how the merges are done and how the tracking propagates through a
basic block, and that has nothing in common with df-*.c.

Jakub


Re: Dump stats about hottest hash tables when -fmem-report

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 1:37 PM, Dimitrios Apostolou ji...@gmx.net wrote:
 I should note here that specialised hash-tables in pointer-set.c have a
 load-factor of at most 25%. Also another very fast hash table I've studied,
 dense_hash_map from google's sparse_hash_table, has a load factor of 50%
 max.

 As I understand it a good hash function gives a perfectly random value up to
 N. So if we have only 25% of slots empty, like we do with today's 75% load
 factor, chances we won't have collision are small. And collisions cost more
 for an open-addressing hash table.

 But for implementing a closed-addressing hash-table with linked lists for
 collision resolution, we'd need allocator pools within libiberty. Is it OK
 to move alloc-pool.[ch] into libiberty as a first step?

Well, what would be the constant overhead for a closed-addressing hash
if it is empty and per element (per collision)?  Decreasing the load factor
might be a better choice ...

Richard.


 Dimitris





Re: [var-tracking] small speed-ups

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 01:30:33PM +0300, Dimitrios Apostolou wrote:
 --- gcc/var-tracking.c  2011-06-03 01:42:31 +
 +++ gcc/var-tracking.c  2011-08-22 09:52:08 +
 @@ -1069,14 +1067,14 @@ adjust_insn (basic_block bb, rtx insn)
  static inline bool
  dv_is_decl_p (decl_or_value dv)
  {
 -  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
 +  return !dv || ((int) TREE_CODE ((tree) dv) != (int) VALUE);
  }

Why?

  /* Return true if a decl_or_value is a VALUE rtl.  */
  static inline bool
  dv_is_value_p (decl_or_value dv)
  {
  -  return dv && !dv_is_decl_p (dv);
 +  return !dv_is_decl_p (dv);
  }

This is fine, though I don't think it makes any difference if var-tracking
is compiled with -O+ - gcc should be able to optimize the second
NULL/non-NULL check out.

 @@ -1191,7 +1189,7 @@ dv_uid2hash (dvuid uid)
  static inline hashval_t
  dv_htab_hash (decl_or_value dv)
  {
 -  return dv_uid2hash (dv_uid (dv));
 +  return (hashval_t) (dv_uid (dv));
  }

Why?  dv_uid2hash is an inline that does exactly that.

 @@ -1202,7 +1200,7 @@ variable_htab_hash (const void *x)
  {
const_variable const v = (const_variable) x;
  
  -  return dv_htab_hash (v->dv);
  +  return (hashval_t) (dv_uid (v->dv));
  }

Why?

  /* Compare the declaration of variable X with declaration Y.  */
 @@ -1211,9 +1209,8 @@ static int
  variable_htab_eq (const void *x, const void *y)
  {
const_variable const v = (const_variable) x;
 -  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);
  
  -  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));
  +  return (v->dv) == y;
  }

Why?

 @@ -1397,19 +1398,40 @@ shared_var_p (variable var, shared_hash 
 || shared_hash_shared (vars));
  }
  
 +/* Copy all variables from hash table SRC to hash table DST without rehashing
 +   any values.  */
 +
 +static htab_t
 +htab_dup (htab_t src)
 +{
 +  htab_t dst;
 +
 +  dst = (htab_t) xmalloc (sizeof (*src));
 +  memcpy (dst, src, sizeof (*src));
  +  dst->entries = (void **) xmalloc (src->size * sizeof (*src->entries));
  +  memcpy (dst->entries, src->entries,
  +   src->size * sizeof (*src->entries));
 +  return dst;
 +}
 +

This certainly doesn't belong here, it should go into libiberty/hashtab.c
and prototype into include/hashtab.h.  It relies on hashtab.c
implementation details.

 @@ -2034,7 +2041,8 @@ val_resolve (dataflow_set *set, rtx val,
  static void
  dataflow_set_init (dataflow_set *set)
  {
  -  init_attrs_list_set (set->regs);
  +  /* Initialize the set (array) SET of attrs to empty lists.  */
  +  memset (set->regs, 0, sizeof (set->regs));
    set->vars = shared_hash_copy (empty_shared_hash);
    set->stack_adjust = 0;
    set->traversed_vars = NULL;

I'd say you should instead just implement init_attrs_list_set inline using
memset.

    dst->vars = (shared_hash) pool_alloc (shared_hash_pool);
    dst->vars->refcount = 1;
    dst->vars->htab
 -= htab_create (MAX (src1_elems, src2_elems), variable_htab_hash,
 += htab_create (2 * MAX (src1_elems, src2_elems), variable_htab_hash,
  variable_htab_eq, variable_htab_free);

This looks wrong, 2 * max is definitely too much.

 @@ -8996,11 +9006,13 @@ vt_finalize (void)
  
FOR_ALL_BB (bb)
  {
  -  dataflow_set_destroy (VTI (bb)->in);
  -  dataflow_set_destroy (VTI (bb)->out);
  +  /* The false do_free parameter means to not bother to iterate and free
  +  all hash table elements, since we'll destroy the pools. */
  +  dataflow_set_destroy (VTI (bb)->in, false);
  +  dataflow_set_destroy (VTI (bb)->out, false);
    if (VTI (bb)->permp)
      {
  -   dataflow_set_destroy (VTI (bb)->permp);
  +   dataflow_set_destroy (VTI (bb)->permp, false);
       XDELETE (VTI (bb)->permp);
      }
   }
  }
 

How much does this actually speed things up (the not freeing of pool-allocated
stuff during finalization)?  Is it really worth it?

Jakub


Re: Vector Comparison patch

2011-08-22 Thread Artem Shinkarov
On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 Richard

 I formalized an approach a little-bit, now it works without target
 hooks, but some polishing is still required. I want you to comment on
 the several important approaches that I use in the patch.

 So how does it work.
 1) All the vector comparisons at the level of the type-checker are
 introduced using VEC_COND_EXPR with constant selection operands being
 {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >
 v1, {-1}, {0}>.

 2) When optabs expand VEC_COND_EXPR, two cases are considered:
 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing 
 changes.
 2.b) first operand is something else, in that case, we specially mark
 this case, recognize it in the backend, and do not create a
 comparison, but use the mask as it was a result of some comparison.

 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
 vector comparison we use the is_vector_comparison function; if it returns
 false, then we replace mask with mask != {0}.

 So we end up with the following functionality:
 VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is a result of
 comparison of two vectors, we leave it as it is, otherwise change with
 mask != {0}.

 Plain vector comparison a op b is represented with VEC_COND_EXPR,
 which correctly expands, without creating useless masking.


 Basically for me there are two questions:
 1) Can we perform information passing in optabs in a nicer way?
 2) How is_vector_comparison could be improved? I have several ideas,
 like checking if constant vector all consists of 0 and -1, and so on.
 But first is it conceptually fine.

 P.S. I tired to put the functionality of is_vector_comparison in
 tree-ssa-forwprop, but the thing is that it is called only with -On,
 which I find inappropriate, and the functionality gets more
 complicated.

 Why is it inappropriate to not optimize it at -O0?  If the user
 separates comparison and ?: expression it's his own fault.

Well, because all the information is there, and I perfectly envision
the case where the user expresses the comparison separately, just to avoid
code duplication.

Like:
mask = a > b;
res1 = mask ? v0 : v1;
res2 = mask ? v2 : v3;

Which in this case would be different from
res1 = a > b ? v0 : v1;
res2 = a > b ? v2 : v3;
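
For completeness, a self-contained version of the above -- hand-written
here, assuming the vector comparison syntax this patch enables, and with
reuse_mask being a made-up name:

typedef int v4si __attribute__ ((vector_size (16)));

v4si
reuse_mask (v4si a, v4si b, v4si v0, v4si v1)
{
  /* a > b is lowered to VEC_COND_EXPR<a > b, {-1}, {0}>, so every
     true lane is all-ones and the mask can be reused bitwise.  */
  v4si mask = a > b;
  return (v0 & mask) | (v1 & ~mask);
}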

 Btw, the new hook is still in the patch.

 I would simply always create != 0 if it isn't and let optimizers
 (tree-ssa-forwprop.c) optimize this.  You still have to deal with
 non-comparison operands during expansion though, but if
 you always forced a != 0 from the start you can then simply
 interpret it as a proper comparison result (in which case I'd
 modify the backends to have an alternate pattern or directly
 expand to masking operations - using the fake comparison
 RTX is too much of a hack).

Richard, I think you didn't get the problem.
I really need a way to pass the information that the expression
in the first operand of vcond is an appropriate mask. I thought
for quite a while and this hack is the only answer I found; is there a
better way to do that? I could for example introduce another
tree node, but that would be overkill as well.

Now why do I need it so much:
I want to implement the comparison in a way that {1, 5, 0, -1} is
actually {-1,-1,-1,-1}. So whenever I am not sure that the mask of a
VEC_COND_EXPR is a real comparison, I transform it to mask != {0} (not
always). To check this, I use is_vector_comparison in
tree-vect-generic.

So I really have the difference between mask ? x : y and mask != {0} ?
x : y, otherwise I could treat mask != {0} in the backend as just
mask.
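
To illustrate the normalization (a hand-written sketch in
vector-extension C, again assuming the syntax from this patch):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
apply_mask (v4si mask, v4si x, v4si y)
{
  /* An arbitrary vector such as {1, 5, 0, -1} is not a valid mask,
     so compare it against {0} first: every nonzero lane becomes
     all-ones.  */
  v4si norm = mask != (v4si) {0, 0, 0, 0};
  return (x & norm) | (y & ~norm);
}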

If this link between optabs and the backend breaks, then the patch falls
apart, because every time the comparison is taken out of the VEC_COND_EXPR, I
will have to add != {0}. Keep in mind that I cannot always put the
comparison inside the VEC_COND_EXPR -- what if it is defined in another
function, or somewhere else?

So what would be an appropriate way to connect optabs and the backend?


Thanks,
Artem.

All the rest would be adjusted later.

  tree
  constant_boolean_node (int value, tree type)
  {
 -  if (type == integer_type_node)
 +  if (TREE_CODE (type) == VECTOR_TYPE)
 +    {
 +      tree tval;
 +
 +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
 +      tval = build_int_cst (TREE_TYPE (type), value);
 +      return build_vector_from_val (type, tval);

 as value is either 0 or 1 that won't work.  Oh, I see you pass -1
 for true in the callers.  But I think we should simply decide that true (1)
 means -1 for a vector boolean node (and the value parameter should
 be a bool instead).  Thus,

 +  if (TREE_CODE (type) == VECTOR_TYPE)
 +    {
 +      tree tval;
 +
 +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
 +      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
 +      return 

[dwarf2out, elfos] Output assembly faster

2011-08-22 Thread Dimitrios Apostolou

Hello again,

most of this patch was posted at the beginning of my GSOC adventure and 
is primarily about replacing fprintf() calls with custom, faster ones. Since 
then I have changed the minor things you suggested, omitted some complexities 
that offered only minor speed-ups, and made the code as clear as possible, 
while following the coding guidelines more closely.


Bootstrapped and tested on both i386 and x86_64, showing runtime 
improvement when compiling with debug info enabled.


The only thing I am not sure about is the distinction between HOST_WIDE_INT 
and long, so I provided two versions of the same function. Please let me 
know if I should handle this differently.
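
For reference, here is a minimal sketch of what the unsigned long
variant could look like -- hand-written for illustration; the actual
final.c implementation may differ in details such as the return value:

/* Write VALUE in decimal into S, least-significant digit first,
   and return the number of characters written.  */
static int
sprint_ul_rev (char *s, unsigned long value)
{
  int i = 0;
  do
    {
      s[i++] = "0123456789"[value % 10];
      value /= 10;
    }
  while (value != 0);
  return i;
}

/* Like sprintf (s, "%lu", value), but much cheaper.  */
int
sprint_ul (char *s, unsigned long value)
{
  int len = sprint_ul_rev (s, value);
  int lo, hi;
  char tmp;

  /* The digits were produced in reverse; swap them into place.  */
  for (lo = 0, hi = len - 1; lo < hi; lo++, hi--)
    {
      tmp = s[lo];
      s[lo] = s[hi];
      s[hi] = tmp;
    }
  s[len] = '\0';
  return len;
}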



Future speedup in assembly generation *is possible* but requires more 
intrusive changes, maybe making assembly hard to read by human:

* Always use hex numbers, they are much faster to produce
* Call GNU assembler with -s parameter, though it's pretty hard to be 
compliant.
* Even further in the future we could generate binary data, if we *know* 
the assembler is GAS.



Changelog:

2011-08-12  Dimitrios Apostolou  ji...@gmx.net

* final.c, output.h (fprint_whex, fprint_w, fprint_ul, sprint_ul):
New functions serving as fast replacements for fprintf() integer
to string conversions. They were used in the following changes.
* final.c (sprint_ul_rev): Internal helper for the above.
(output_addr_const): case CONST_INT: don't use fprintf().
* elfos.h (ASM_GENERATE_INTERNAL_LABEL): Don't use sprintf("%u"),
use sprint_ul() and custom loop which are much faster.
(TARGET_ASM_INTERNAL_LABEL): Define as default_elf_internal_label.
(ELF_ASCII_ESCAPES, ELF_STRING_LIMIT): Are the old ESCAPES and
STRING_LIMIT macros.
(ASM_OUTPUT_LIMITED_STRING, ASM_OUTPUT_ASCII): Macros now just
call respective functions that provide now the same
functionality. Those are default_elf_asm_output_limited_string()
and default_elf_asm_output_ascii() in varasm.c.
* varasm.c: Fixed some whitespace inconsistencies.
(default_elf_asm_output_limited_string)
(default_elf_asm_output_ascii): The above macros from elfos.h are
implemented here, avoiding superfluous calls to fprintf().
(default_elf_internal_label): Hook for
targetm.asm_out.internal_label and ASM_OUTPUT_DEBUG_LABEL.
* i386.c: Don't call fprintf("%u") but fprint_ul() instead.
* defaults.h (ASM_OUTPUT_LABEL, ASM_OUTPUT_INTERNAL_LABEL):
Expanded the macros on multiple lines for readability.
(ASM_OUTPUT_LABELREF): Have two calls to fputs() instead of one to
asm_fprintf().
* dwarf2asm.c (dw2_assemble_integer, dw2_asm_output_data)
(dw2_asm_data_uleb128, dw2_asm_delta_uleb128)
(dw2_asm_delta_sleb128): Convert fprintf() calls to faster,
simpler functions.
* dwarf2out.c (dwarf2out_source_line): Convert fprintf() calls to
faster, simpler functions.



Thanks,
Dimitris
=== modified file 'gcc/config/elfos.h'
--- gcc/config/elfos.h  2010-11-21 00:54:14 +
+++ gcc/config/elfos.h  2011-08-22 11:39:43 +
@@ -117,10 +117,23 @@ see the files COPYING3 and COPYING.RUNTI
 #define ASM_GENERATE_INTERNAL_LABEL(LABEL, PREFIX, NUM)\
   do   \
 {  \
-  sprintf (LABEL, "*.%s%u", PREFIX, (unsigned) (NUM)); \
+  int __i = 2; \
+  int __j = 0; \
+  (LABEL)[0] = '*';\
+  (LABEL)[1] = '.';\
+  do   \
+   {   \
+ (LABEL)[__i] = (PREFIX)[__j]; \
+ __i++; __j++; \
+   }   \
+  while ((PREFIX)[__j] != '\0');   \
+      sprint_ul (&(LABEL)[__i], (unsigned long) (NUM));\
 }  \
   while (0)
 
+#undef TARGET_ASM_INTERNAL_LABEL
+#define TARGET_ASM_INTERNAL_LABEL default_elf_internal_label
+
 /* Output the label which precedes a jumptable.  Note that for all svr4
systems where we actually generate jumptables (which is to say every
svr4 target except i386, where we use casesi instead) we put the jump-
@@ -371,7 +384,7 @@ see the files COPYING3 and COPYING.RUNTI
the i386) don't know about that.  Also, we don't use \v
since some versions of gas, such as 2.2 did not accept it.  */
 
-#define ESCAPES \
+#define ELF_ASCII_ESCAPES \
 \1\1\1\1\1\1\1\1btn\1fr\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\
 

Re: GIMPLE and intent of variables

2011-08-22 Thread Mateusz Grabowski

Thank you for all the answers. Now I have another problem.

When I reach a function call during statement iteration, I want to go to
this function's code.

if (is_gimple_call(stmt)) {
tree fndecl = gimple_call_fndecl(stmt); // This returns 
function_decl ;-)

		struct function * new_cfun = DECL_STRUCT_FUNCTION(fndecl);	
/* This returns
struct function * from tree */

/* Push the current cfun onto the stack, and set cfun to 
new_cfun.  */
push_cfun (new_cfun);

annotate_function(cfun);

/* Pop cfun from the stack.  */
pop_cfun();
}


I have a problem with variable types (TREE_CODE). When I change cfun (using
push_cfun), temporary variables used in the new function have TREE_CODE equal to
VAR_DECL instead of SSA_NAME.
I think I have reached the called function before it was passed through SSA.
Should I pass the function through SSA manually or somehow change the moment
when the plugin is run, so that all functions have already been passed through SSA?
-- 
View this message in context: 
http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310211.html
Sent from the gcc - patches mailing list archive at Nabble.com.



Re: GIMPLE and intent of variables

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 2:21 PM, Mateusz Grabowski grabek...@wp.pl wrote:

 Thank you for all the answers. Now I have another problem.

 When I reach a function call during statement iteration, I want to go to
 this function's code.

 if (is_gimple_call(stmt)) {
                tree fndecl = gimple_call_fndecl(stmt);         // This 
 returns function_decl ;-)

                struct function * new_cfun = DECL_STRUCT_FUNCTION(fndecl);      
  /* This returns
 struct function * from tree */

                /* Push the current cfun onto the stack, and set cfun to 
 new_cfun.  */
                push_cfun (new_cfun);

                annotate_function(cfun);

                /* Pop cfun from the stack.  */
                pop_cfun();
 }


 I have a problem with variable types (TREE_CODE). When I change cfun (using
 push_cfun), temporary variables, used in new function, have TREE_CODE equals
 VAR_DECL instead of SSA_NAME.
 I think I have reached the called function before it was passed through SSA.
 Should I pass the function through SSA manually or somehow change the moment
  when the plugin is run, so that all functions have already been passed through SSA?

The latter.

 View this message in context: 
 http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310211.html
 Sent from the gcc - patches mailing list archive at Nabble.com.




Re: GIMPLE and intent of variables

2011-08-22 Thread Mateusz Grabowski


Richard Guenther-2 wrote:
 
 
 The latter.
 
 

But how to do it? I want to have all functions after SSA pass, but before
any optimizations. Maybe you could tell me how to enforce it (or even better
- a small example)?
Thanks in advance.

-- 
View this message in context: 
http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310942.html
Sent from the gcc - patches mailing list archive at Nabble.com.



Re: [trans-mem] Use __x86_64__ instead of __LP64__.

2011-08-22 Thread H.J. Lu
On Mon, Aug 22, 2011 at 2:42 AM, Torvald Riegel trie...@redhat.com wrote:
 Use __x86_64__ instead of __LP64__ in setjmp/longjmp and TLS
 definitions.

 H.J.: Is that sufficient for x32, or do we need entirely different code?
 If so, can you please provide the required changes?


I need to take a look.

Thanks.

-- 
H.J.


Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order

2011-08-22 Thread Dimitrios Apostolou

Hi Steven,

On Mon, 1 Aug 2011, Steven Bosscher wrote:

On Sun, Jul 31, 2011 at 11:59 PM, Steven Bosscher stevenb@gmail.com wrote:

On Fri, Jul 29, 2011 at 11:48 PM, Steven Bosscher stevenb@gmail.com wrote:

I'll see if I can test the patch on the compile farm this weekend,
just to be sure.


Bootstrap on
ia64-unknown-linux-gnu is in stage2 but it is taking forever (on
gcc60)...


Just to be clear, it is taking forever without the patch too. I did
time -O2 non-bootstrap builds but there was no difference worth
mentioning.


Did the testsuite finish with no regressions on IA64?

I'd test on a sparcstation but unfortunately the machine crashed before
finishing and I can't regain access to it. I'll hopefully test it in some
other platform by next week, I'm curious to actually measure runtime
there.


Thanks for helping!
Dimitris


Re: [PATCH, testsuite, i386] AVX2 support for GCC

2011-08-22 Thread H.J. Lu
On Mon, Aug 22, 2011 at 6:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hi,
 thanks for input, Uros. Spaces were fixed.

 Updated patch is attached. ChangeLog entry is attached.

 Could anybody please commit it?


I checked in for you.


-- 
H.J.


Re: [PATCH, testsuite, i386] AVX2 support for GCC

2011-08-22 Thread Kirill Yukhin
Thanks!

K

On Mon, Aug 22, 2011 at 5:57 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 6:18 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 Hi,
 thanks for input, Uros. Spaces were fixed.

 Updated patch is attached. ChangeLog entry is attached.

 Could anybody please commit it?


 I checked in for you.


 --
 H.J.



[testsuite,committed]: Add require scheduling to gcc.dg/pr49994-[23].c

2011-08-22 Thread Georg-Johann Lay
http://gcc.gnu.org/viewcvs?view=revision&revision=177954

Committed that patchlet as obvious.

testsuite/
* gcc.dg/pr49994-2.c: Add dg-require-effective-target scheduling.
* gcc.dg/pr49994-3.c: Ditto.

Johann


Index: ChangeLog
===
--- ChangeLog   (revision 177953)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2011-08-22  Georg-Johann Lay  a...@gjlay.de
+
+   * gcc.dg/pr49994-2.c: Add dg-require-effective-target scheduling.
+   * gcc.dg/pr49994-3.c: Ditto.
+
 2011-08-22  Richard Guenther  rguent...@suse.de

PR testsuite/50145
Index: gcc.dg/pr49994-3.c
===
--- gcc.dg/pr49994-3.c  (revision 177949)
+++ gcc.dg/pr49994-3.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fsched2-use-superblocks -g" } */
+/* { dg-require-effective-target scheduling } */

 void *
 foo (int offset)
Index: gcc.dg/pr49994-2.c
===
--- gcc.dg/pr49994-2.c  (revision 177949)
+++ gcc.dg/pr49994-2.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fno-omit-frame-pointer -fschedule-insns2 -fsched2-use-superblocks -g" } */
+/* { dg-require-effective-target scheduling } */

 int
 bar (int i)


Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread H.J. Lu
On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Sun, Aug 21, 2011 at 05:09:59PM -0700, H.J. Lu wrote:
 I didn't know .init_array section was enabled for AIX.  Does this patch
 work for you?

 Some ELF targets (e.g. arm*-linux*) don't use elfos.h.  IMHO you should
 instead add
 #ifndef __ELF__
 #error NonELF
 #endif
 to gcc_AC_INITFINI_ARRAY test.  And/or add another test to it that tests
 that you can actually use
 .section .init_array
 and it will use correct section flags for the section.


I will update the test.

-- 
H.J.


Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-22 Thread Michael Matz
Hi,

On Sun, 21 Aug 2011, Richard Guenther wrote:

 On Sat, Aug 20, 2011 at 11:02 PM, Richard Henderson r...@redhat.com wrote:
  On 08/19/2011 02:04 AM, Richard Guenther wrote:
  So make sure that __cpu_indicator initially has a conservative correct
  value?  I'd still prefer the constructor-in-libgcc option - if only because
  then the compiler-side is much simplified.
 
 
  Err, I thought __cpu_indicator was a function, not data.
 
  I think we need to discuss this more...
 
 Oh, I thought it was data initialized by the constructor ...

Sriraman's patch right now has a function __cpu_indicator_init which is 
called from (ad-hoc constructed) ctors and that initializes the variables
__cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)

I think the whole initializer function and the associated data blobs have 
to sit in static libgcc and be hidden.  That way all shared modules 
will have their own copies of the model and features (and the initializer 
function), so there won't be issues with copy relocs, or cross-shared-lib 
calls while relocating the modules.  Dynamically they will contain the 
same data always, but it's not many bytes, and only modules making use of 
this facility will pay for it.

The initializer function has to be callable from pre-.init contexts, e.g.  
ifunc dispatchers.  And to make life easier there should be one ctor 
function calling this initializer function too, so that normal code can 
rely on it having already been called, saving one check.
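
Roughly what I have in mind -- a hand-written sketch, not Sriraman's
actual code; the struct layout and field names here are hypothetical,
only __cpu_model and __cpu_indicator_init come from the patch:

struct cpu_model_t { unsigned vendor, type, subtype; }; /* hypothetical */
struct cpu_model_t __cpu_model
  __attribute__ ((visibility ("hidden")));

void __attribute__ ((visibility ("hidden")))
__cpu_indicator_init (void)
{
  if (__cpu_model.vendor != 0)
    return;			/* already run, e.g. from an ifunc */
  /* ... run cpuid here and fill in __cpu_model/__cpu_features ... */
  __cpu_model.vendor = 1;
}

/* One ctor per module, so normal code can rely on the data.  */
static void __attribute__ ((constructor))
cpu_indicator_ctor (void)
{
  __cpu_indicator_init ();
}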


Ciao,
Michael.

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-22 Thread H.J. Lu
On Mon, Aug 22, 2011 at 7:07 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Sun, 21 Aug 2011, Richard Guenther wrote:

 On Sat, Aug 20, 2011 at 11:02 PM, Richard Henderson r...@redhat.com wrote:
  On 08/19/2011 02:04 AM, Richard Guenther wrote:
  So make sure that __cpu_indicator initially has a conservative correct
  value?  I'd still prefer the constructor-in-libgcc option - if only 
  because
  then the compiler-side is much simplified.
 
 
  Err, I thought __cpu_indicator was a function, not data.
 
  I think we need to discuss this more...

 Oh, I thought it was data initialized by the constructor ...

 Sriramans patch right now has a function __cpu_indicator_init which is
 called from (adhoc constructed) ctors and that initializes variables
 __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)

 I think the whole initializer function and the associated data blobs have
 to sit in static libgcc and be hidden.  By that all shared modules
 will have their own copies of the model and features (and the initializer
 function) so there won't be issues with copy relocs, or cross shared lib
 calls while relocating the modules.  Dynamically they will contain the
 same data always, but it's not many bytes, and only modules making use of
 this facility will pay it.

 The initializer function has to be callable from pre-.init contexts, e.g.
 ifunc dispatchers.  And to make life easier there should be one ctor
 function calling this initializer function too, so that normal code can
 rely on it being already called saving one check.


It sounds more complicated than necessary.  Why not just do it
on demand like glibc does?

-- 
H.J.


Re: GIMPLE and intent of variables

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 2:57 PM, Mateusz Grabowski grabek...@wp.pl wrote:


 Richard Guenther-2 wrote:


 The latter.



 But how to do it? I want to have all functions after SSA pass, but before
 any optimizations. Maybe you could tell me how to enforce it (or even better
 - a small example)?
 Thanks in advance.

There is no such place at the moment.

 --
 View this message in context: 
 http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310942.html
 Sent from the gcc - patches mailing list archive at Nabble.com.




Re: [PATCH 4/7] Support -fdebug-cpp option

2011-08-22 Thread Tom Tromey
 Jakub == Jakub Jelinek ja...@redhat.com writes:

Jakub For ccache and friends I think it would be better to have a
Jakub preprocessing mode that would output all lines as is (i.e. no
Jakub macro replacement), except for processing #include/#include_next
Jakub directives.

That exists -- -fdirectives-only.

Tom


[PATCH] Fix a RTL sharing problem with CALL_INSN_FUNCTION_USAGE (PR middle-end/48722)

2011-08-22 Thread Jakub Jelinek
Hi!

As the testcase below shows (on i686-linux or x86_64-linux -m32),
we don't unshare expressions in CALL_INSN_FUNCTION_USAGE, which
with entry_value support now include MEMs.  The invalid sharing then
can lead to changes in unrelated insns affecting a call insn (or vice
versa), which results in DF being out of date and ICE afterwards.

Fixed thusly, bootstrapped/regtested (with rtl checking) on x86_64-linux and
i686-linux, ok for trunk?

2011-08-22  Jakub Jelinek  ja...@redhat.com

PR middle-end/48722
* emit-rtl.c (unshare_all_rtl_again): For CALL_INSNs,
reset_used_flags also in CALL_INSN_FUNCTION_USAGE.
(verify_rtl_sharing): Likewise and verify_rtx_sharing
in there too.
(unshare_all_rtl_in_chain): For CALL_INSNs
copy_rtx_if_shared also CALL_INSN_FUNCTION_USAGE.

* gcc.target/i386/pr48722.c: New test.

--- gcc/emit-rtl.c.jj   2011-08-18 08:36:00.0 +0200
+++ gcc/emit-rtl.c  2011-08-22 08:48:27.0 +0200
@@ -2444,6 +2444,8 @@ unshare_all_rtl_again (rtx insn)
   {
reset_used_flags (PATTERN (p));
reset_used_flags (REG_NOTES (p));
+   if (CALL_P (p))
+ reset_used_flags (CALL_INSN_FUNCTION_USAGE (p));
   }
 
   /* Make sure that virtual stack slots are not shared.  */
@@ -2610,6 +2612,8 @@ verify_rtl_sharing (void)
   {
reset_used_flags (PATTERN (p));
reset_used_flags (REG_NOTES (p));
+   if (CALL_P (p))
+ reset_used_flags (CALL_INSN_FUNCTION_USAGE (p));
if (GET_CODE (PATTERN (p)) == SEQUENCE)
  {
int i;
@@ -2621,6 +2625,8 @@ verify_rtl_sharing (void)
gcc_assert (INSN_P (q));
reset_used_flags (PATTERN (q));
reset_used_flags (REG_NOTES (q));
+   if (CALL_P (q))
+ reset_used_flags (CALL_INSN_FUNCTION_USAGE (q));
  }
  }
   }
@@ -2630,6 +2636,8 @@ verify_rtl_sharing (void)
   {
verify_rtx_sharing (PATTERN (p), p);
verify_rtx_sharing (REG_NOTES (p), p);
+   if (CALL_P (p))
+ verify_rtx_sharing (CALL_INSN_FUNCTION_USAGE (p), p);
   }
 
   timevar_pop (TV_VERIFY_RTL_SHARING);
@@ -2646,6 +2654,9 @@ unshare_all_rtl_in_chain (rtx insn)
   {
PATTERN (insn) = copy_rtx_if_shared (PATTERN (insn));
REG_NOTES (insn) = copy_rtx_if_shared (REG_NOTES (insn));
+   if (CALL_P (insn))
+ CALL_INSN_FUNCTION_USAGE (insn)
+   = copy_rtx_if_shared (CALL_INSN_FUNCTION_USAGE (insn));
   }
 }
 
--- gcc/testsuite/gcc.target/i386/pr48722.c.jj  2011-08-22 08:53:10.0 +0200
+++ gcc/testsuite/gcc.target/i386/pr48722.c 2011-08-22 08:52:37.0 +0200
@@ -0,0 +1,13 @@
+/* PR middle-end/48722 */
+/* { dg-do compile } */
+/* { dg-options "-Os -mno-push-args" } */
+
+extern long long a;
+extern int b;
+void bar (int, long long);
+
+void
+foo (void)
+{
+  bar (a & 0x85, b);
+}

Jakub


[PATCH] Fix ICEs in get_bit_range (PR middle-end/50141)

2011-08-22 Thread Jakub Jelinek
Hi!

DECL_THREAD_LOCAL_P may be used only on VAR_DECLs, not other decls
like PARM_DECL, RESULT_DECL etc.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2011-08-22  Jakub Jelinek  ja...@redhat.com

PR middle-end/50141
* expr.c (get_bit_range): Only use DECL_THREAD_LOCAL_P if
innerdecl is a VAR_DECL.

* c-c++-common/cxxbitfields-6.c: New test.

--- gcc/expr.c.jj   2011-08-22 08:17:07.0 +0200
+++ gcc/expr.c  2011-08-22 10:06:26.0 +0200
@@ -4354,7 +4354,8 @@ get_bit_range (unsigned HOST_WIDE_INT *b
	   || TREE_CODE (innerdecl) == TARGET_MEM_REF)
	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
      || (DECL_P (innerdecl)
-	  && (DECL_THREAD_LOCAL_P (innerdecl)
+	  && ((TREE_CODE (innerdecl) == VAR_DECL
+	       && DECL_THREAD_LOCAL_P (innerdecl))
	      || !TREE_STATIC (innerdecl))))
 {
   *bitstart = *bitend = 0;
--- gcc/testsuite/c-c++-common/cxxbitfields-6.c.jj  2011-08-22 10:11:34.0 +0200
+++ gcc/testsuite/c-c++-common/cxxbitfields-6.c 2011-08-22 10:11:59.0 +0200
@@ -0,0 +1,17 @@
+/* PR middle-end/50141 */
+/* { dg-do compile } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct S
+{
+  int i:8;
+};
+
+void bar (struct S, int);
+
+void
+foo (struct S s, int i)
+{
+  s.i = i;
+  bar (s, i);
+}

Jakub


[PATCH] Fix ICEs in vect_finish_stmt_generation (PR tree-optimization/50133)

2011-08-22 Thread Jakub Jelinek
Hi!

The following testcase ICEs because gsi_end_p (*gsi) is true and thus
there is no stmt after it from which to copy over the location.
As can be seen in the PR, we could do ugly hacks to retrieve the locus
from the previous stmt (non-debug of course) instead, but I'm probably missing
something obvious about why we shouldn't take the location from the stmt itself
instead.  We've been doing that before; just the PR37482 patch changed it
without an explanation.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-08-22  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/50133
* tree-vect-stmts.c (vect_finish_stmt_generation): Copy location
from stmt instead of some statement around gsi.

* gcc.dg/pr50133.c: New test.

--- gcc/tree-vect-stmts.c.jj    2011-08-22 08:17:08.0 +0200
+++ gcc/tree-vect-stmts.c   2011-08-22 11:26:27.0 +0200
@@ -1419,7 +1419,6 @@ vect_finish_stmt_generation (gimple stmt
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
-  gimple_stmt_iterator si;
 
   gcc_assert (gimple_code (stmt) != GIMPLE_LABEL);
 
@@ -1434,13 +1433,7 @@ vect_finish_stmt_generation (gimple stmt
   print_gimple_stmt (vect_dump, vec_stmt, 0, TDF_SLIM);
 }
 
-  si = *gsi;
-  if (is_gimple_debug (gsi_stmt (si)))
-{
-  gsi_next_nondebug (si);
-  gcc_assert (!gsi_end_p (si));
-}
-  gimple_set_location (vec_stmt, gimple_location (gsi_stmt (si)));
+  gimple_set_location (vec_stmt, gimple_location (stmt));
 }
 
 /* Checks if CALL can be vectorized in type VECTYPE.  Returns
--- gcc/testsuite/gcc.dg/pr50133.c.jj   2011-08-22 11:27:15.0 +0200
+++ gcc/testsuite/gcc.dg/pr50133.c  2011-08-22 11:12:04.0 +0200
@@ -0,0 +1,18 @@
+/* PR tree-optimization/50133 */
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize -fno-tree-loop-im" } */
+
+extern int A[], B[];
+
+void
+foo (int z)
+{
+  int j, i;
+  for (j = 0; j < 32; j++)
+{
+  int a = A[0];
+  for (i = 0; i < 16; i++)
+   a = A[i] ? a : z;
+  B[j] = a;
+}
+}

Jakub


[C++ PATCH] Clear TYPE_TRANSPARENT_AGGR if there are no fields (PR c++/46862)

2011-08-22 Thread Jakub Jelinek
Hi!

TYPE_TRANSPARENT_AGGR types are passed and mangled as its first field.
But if somebody errorneously doesn't put any types in its definition,
the FE marks it TYPE_TRANSPARENT_AGGR early (e.g. even on a forward
declaration), but during mangling or in middle-end when trying to find out
how it will be passed we ICE because first_field is NULL.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
Ok for trunk/4.6?

2011-08-22  Jakub Jelinek  ja...@redhat.com

PR c++/46862
* class.c (finish_struct_1): If TYPE_TRANSPARENT_AGGR is set on a type
which doesn't have any fields, clear it and diagnose.

* g++.dg/ext/decimal1.C: New test.

--- gcc/cp/class.c.jj   2011-08-18 08:35:38.0 +0200
+++ gcc/cp/class.c  2011-08-22 12:52:13.0 +0200
@@ -1,6 +1,6 @@
 /* Functions related to building classes and their related objects.
Copyright (C) 1987, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010
+   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010, 2011
Free Software Foundation, Inc.
Contributed by Michael Tiemann (tiem...@cygnus.com)
 
@@ -5794,6 +5794,12 @@ finish_struct_1 (tree t)
 
   /* Finish debugging output for this type.  */
   rest_of_type_compilation (t, ! LOCAL_CLASS_P (t));
+
+  if (TYPE_TRANSPARENT_AGGR (t) && first_field (t) == NULL_TREE)
+    {
+      error ("type transparent class %qT does not have any fields", t);
+      TYPE_TRANSPARENT_AGGR (t) = 0;
+    }
 }
 
 /* When T was built up, the member declarations were added in reverse
--- gcc/testsuite/g++.dg/ext/decimal1.C.jj  2011-08-22 13:22:01.0 +0200
+++ gcc/testsuite/g++.dg/ext/decimal1.C 2011-08-22 13:16:30.0 +0200
@@ -0,0 +1,19 @@
+// PR c++/46862
+// { dg-do compile }
+
+namespace std
+{
+  namespace decimal
+  {
+class decimal32 { };   // { dg-error "does not have any fields" }
+class decimal64 { };   // { dg-error "does not have any fields" }
+class decimal128 { };  // { dg-error "does not have any fields" }
+  }
+}
+
+void
+foo (std::decimal::decimal32 x,
+ std::decimal::decimal64 y,
+ std::decimal::decimal128 z)
+{
+}

Jakub


Re: [PATCH] void dangling line table after loading pch

2011-08-22 Thread Diego Novillo

On 11-08-22 07:10 , Dodji Seketeli wrote:

Hello,

In c_common_read_pch when gt_pch_restore loads a new pch, the previous
line table (referenced from the global 'line_table') is
garbage-collected and a new one is built.

As the global instance of cpp_reader referenced by the local variable
'pfile' has a pfile->line_table member that references the
'line_table' global, pfile->line_table needs to be set to the new
value of line_table after gt_pch_restore is called, otherwise
pfile->line_table points to garbage-collected memory.  This is what
the call to cpp_set_line_map in c_common_read_pch is for.

The problem is that cpp_set_line_map is called too late as
cpp_read_state (called before cpp_set_line_map) indirectly touches
some pfile->line_table dangling garbage-collected memory[1].

This doesn't cause any visible havoc in trunk yet but I am seeing it
crashing as I am fiddling with line map stuff on the side.

The two-liner patch below just calls cpp_set_line_map right after
gt_pch_restore to restore pfile->line_table before anyone touches it.

[1]: This happens via cpp_read_state -> _cpp_create_definition ->
_cpp_create_iso_definition -> _cpp_lex_token -> _cpp_lex_direct ->
linemap_position_for_column.

Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk.

OK for trunk?

gcc/

* c-family/c-pch.c (c_common_read_pch): Re-set line table right
after reading in the pch.


OK.


Diego.


Re: [PATCH 4/7] Support -fdebug-cpp option

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 08:16:45AM -0600, Tom Tromey wrote:
  Jakub == Jakub Jelinek ja...@redhat.com writes:
 
 Jakub For ccache and friends I think it would be better to have a
 Jakub preprocessing mode that would output all lines as is (i.e. no
 Jakub macro replacement), except for processing #include/#include_next
 Jakub directives.
 
 That exists -- -fdirectives-only.

It isn't exactly what would be needed, as e.g. \\\n sequences are removed
from #defines and thus the tokens get different locations.
BTW, I guess we should do something about parsing such an output,
we emit e.g.
# 1 "<stdin>"
# 1 "<built-in>"
#define __STDC__ 1
#define __STDC_HOSTED__ 1
#define __GNUC__ 4
#define __GNUC_MINOR__ 6
#define __GNUC_PATCHLEVEL__ 0
...

For <built-in> we really should assume line 0 for all the defines
in that file.

Jakub


Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread H.J. Lu
On Mon, Aug 22, 2011 at 7:06 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Sun, Aug 21, 2011 at 05:09:59PM -0700, H.J. Lu wrote:
 I didn't know .init_array section was enabled for AIX.  Does this patch
 work for you?

 Some ELF targets (e.g. arm*-linux*) don't use elfos.h.  IMHO you should
 instead add
 #ifndef __ELF__
 #error NonELF
 #endif
 to gcc_AC_INITFINI_ARRAY test.  And/or add another test to it that tests
 that you can actually use
 .section .init_array
 and it will use correct section flags for the section.


 I will update the test.

Can I check in this patch to address AIX issue first?
I will submit a patch to test .section .init_array later?

Thanks.

-- 
H.J.
---
2011-08-22  H.J. Lu  hongjiu...@intel.com

* acinclude.m4 (gcc_AC_INITFINI_ARRAY): Error if __ELF__ isn't
defined.

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index 74c86db..a8ecd2d 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
AC_CACHE_CHECK(for .preinit_array/.init_array/.fini_array support,
 gcc_cv_initfini_array, [dnl
   if test x${build} = x${target}  test x${build} = x${host}; then
 AC_RUN_IFELSE([AC_LANG_SOURCE([
+#ifndef __ELF__
+#error Not an ELF OS
+#endif
 #ifdef __ia64__
 /* We turn on .preinit_array/.init_array/.fini_array support for ia64
if it can be used.  */


Re: [PATCH] Fix ICEs in get_bit_range (PR middle-end/50141)

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 4:18 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 DECL_THREAD_LOCAL_P may be used only on VAR_DECLs, not other decls
 like PARM_DECL, RESULT_DECL etc.
 Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
 ok for trunk?

Ok.

Thanks,
Richard.

 2011-08-22  Jakub Jelinek  ja...@redhat.com

        PR middle-end/50141
        * expr.c (get_bit_range): Only use DECL_THREAD_LOCAL_P if
        innerdecl is a VAR_DECL.

        * c-c++-common/cxxbitfields-6.c: New test.

 --- gcc/expr.c.jj       2011-08-22 08:17:07.0 +0200
 +++ gcc/expr.c  2011-08-22 10:06:26.0 +0200
  @@ -4354,7 +4354,8 @@ get_bit_range (unsigned HOST_WIDE_INT *b
            || TREE_CODE (innerdecl) == TARGET_MEM_REF)
           && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
        || (DECL_P (innerdecl)
  -          && (DECL_THREAD_LOCAL_P (innerdecl)
  +          && ((TREE_CODE (innerdecl) == VAR_DECL
  +               && DECL_THREAD_LOCAL_P (innerdecl))
               || !TREE_STATIC (innerdecl))))
      {
       *bitstart = *bitend = 0;
 --- gcc/testsuite/c-c++-common/cxxbitfields-6.c.jj      2011-08-22 
 10:11:34.0 +0200
 +++ gcc/testsuite/c-c++-common/cxxbitfields-6.c 2011-08-22 10:11:59.0 
 +0200
 @@ -0,0 +1,17 @@
 +/* PR middle-end/50141 */
 +/* { dg-do compile } */
  +/* { dg-options "-O2 --param allow-store-data-races=0" } */
 +
 +struct S
 +{
 +  int i:8;
 +};
 +
 +void bar (struct S, int);
 +
 +void
 +foo (struct S s, int i)
 +{
 +  s.i = i;
 +  bar (s, i);
 +}

        Jakub



Re: [PATCH] Fix ICEs in vect_finish_stmt_generation (PR tree-optimization/50133)

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 4:22 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 The following testcase ICEs, because gsi_end_p (*gsi) and thus
 there is no stmt after it from which to copy over the location.
 As can be seen in the PR, we could do ugly hacks to retrieve locus
 from previous stmt (non-debug of course) instead, but I'm probably missing
 something obvious why we shouldn't take location from stmt itself
 instead.  We've been doing that before, just PR37482 patch changed that
 without an explanation.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2011-08-22  Jakub Jelinek  ja...@redhat.com

        PR tree-optimization/50133
        * tree-vect-stmts.c (vect_finish_stmt_generation): Copy location
        from stmt instead of some statement around gsi.

        * gcc.dg/pr50133.c: New test.

 --- gcc/tree-vect-stmts.c.jj    2011-08-22 08:17:08.0 +0200
 +++ gcc/tree-vect-stmts.c       2011-08-22 11:26:27.0 +0200
 @@ -1419,7 +1419,6 @@ vect_finish_stmt_generation (gimple stmt
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
 -  gimple_stmt_iterator si;

   gcc_assert (gimple_code (stmt) != GIMPLE_LABEL);

 @@ -1434,13 +1433,7 @@ vect_finish_stmt_generation (gimple stmt
       print_gimple_stmt (vect_dump, vec_stmt, 0, TDF_SLIM);
     }

 -  si = *gsi;
 -  if (is_gimple_debug (gsi_stmt (si)))
 -    {
  -      gsi_next_nondebug (&si);
 -      gcc_assert (!gsi_end_p (si));
 -    }
 -  gimple_set_location (vec_stmt, gimple_location (gsi_stmt (si)));
 +  gimple_set_location (vec_stmt, gimple_location (stmt));
  }

  /* Checks if CALL can be vectorized in type VECTYPE.  Returns
 --- gcc/testsuite/gcc.dg/pr50133.c.jj   2011-08-22 11:27:15.0 +0200
 +++ gcc/testsuite/gcc.dg/pr50133.c      2011-08-22 11:12:04.0 +0200
 @@ -0,0 +1,18 @@
 +/* PR tree-optimization/50133 */
 +/* { dg-do compile } */
  +/* { dg-options "-O -ftree-vectorize -fno-tree-loop-im" } */
 +
 +extern int A[], B[];
 +
 +void
 +foo (int z)
 +{
 +  int j, i;
  +  for (j = 0; j < 32; j++)
 +    {
 +      int a = A[0];
  +      for (i = 0; i < 16; i++)
 +       a = A[i] ? a : z;
 +      B[j] = a;
 +    }
 +}

        Jakub



Re: Vector Comparison patch

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
artyom.shinkar...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 Richard

 I formalized an approach a little bit; now it works without target
 hooks, but some polishing is still required. I want you to comment on
 the several important approaches that I use in the patch.

 So how does it work.
 1) All the vector comparisons at the level of the type-checker are
 introduced using VEC_COND_EXPR with constant selection operands being
 {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >
 v1, {-1}, {0}>.

 2) When optabs expand VEC_COND_EXPR, two cases are considered:
 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing 
 changes.
 2.b) first operand is something else, in that case, we specially mark
 this case, recognize it in the backend, and do not create a
 comparison, but use the mask as if it was a result of some comparison.

 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
 vector comparison we use the is_vector_comparison function; if it returns
 false, then we replace mask with mask != {0}.

 So we end up with the following functionality:
 VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is a result of
 comparison of two vectors, we leave it as it is, otherwise we replace it
 with mask != {0}.

 Plain vector comparison a op b is represented with VEC_COND_EXPR,
 which correctly expands, without creating useless masking.


 Basically for me there are two questions:
 1) Can we perform information passing in optabs in a nicer way?
 2) How is_vector_comparison could be improved? I have several ideas,
 like checking if constant vector all consists of 0 and -1, and so on.
 But first is it conceptually fine.

 P.S. I tried to put the functionality of is_vector_comparison in
 tree-ssa-forwprop, but the thing is that it is called only with -On,
 which I find inappropriate, and the functionality gets more
 complicated.

 Why is it inappropriate to not optimize it at -O0?  If the user
 separates comparison and ?: expression it's his own fault.

 Well, because all the information is there, and I perfectly envision
 the case when user expressed comparison separately, just to avoid code
 duplication.

 Like:
 mask = a > b;
 res1 = mask ? v0 : v1;
 res2 = mask ? v2 : v3;

 Which in this case would be different from
 res1 = a > b ? v0 : v1;
 res2 = a > b ? v2 : v3;

 Btw, the new hook is still in the patch.

 I would simply always create != 0 if it isn't and let optimizers
 (tree-ssa-forwprop.c) optimize this.  You still have to deal with
 non-comparison operands during expansion though, but if
 you always forced a != 0 from the start you can then simply
 interpret it as a proper comparison result (in which case I'd
 modify the backends to have an alternate pattern or directly
 expand to masking operations - using the fake comparison
 RTX is too much of a hack).

 Richard, I think you didn't get the problem.
 I really need a way to pass the information that the expression
 that is in the first operand of vcond is an appropriate mask. I thought
 for quite a while and this hack is the only answer I found; is there a
 better way to do that? I could for example introduce another
 tree-node, but it would be overkill as well.

 Now why do I need it so much:
 I want to implement the comparison in a way that {1, 5, 0, -1} is
 actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
 VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
 always). To check the stuff, I use is_vector_comparison in
 tree-vect-generic.

 So I really have the difference between mask ? x : y and mask != {0} ?
 x : y, otherwise I could treat mask != {0} in the backend as just
 mask.

 If this link between optabs and backend breaks, then the patch falls
 apart. Because every time the comparison is taken out VEC_COND_EXPR, I
 will have to put != {0}. Keep in mind, that I cannot always put the
 comparison inside the VEC_COND_EXPR, what if it is defined in a
 function-comparison, or somehow else?

 So what would be an appropriate way to connect optabs and the backend?

Well, there is no problem in having the only valid mask operand for
VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,...} mask.
Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
vec1 : vec2.  This comparison can be eliminated by optimization passes
that
either replace it by the real comparison computing the mask or just
propagating the information this mask is already {-1,...} / {0,...} by simply
dropping the comparison against zero.

For the backends I'd have vcond patterns for both an embedded comparison
and for a mask.  (Now we can rewind the discussion a bit and allow
arbitrary masks and define a vcond with a mask operand to do bitwise
selection - what matters is the C frontend semantics which we need to

Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread Paolo Bonzini

On 08/22/2011 04:45 PM, H.J. Lu wrote:

Can I check in this patch to address AIX issue first?
I will submit a patch to test .section .init_array later?

Thanks.


Yes.

Paolo


[pph] Fix x1dynarray1, x1dynarray2a and x1dynarray2b (issue4921051)

2011-08-22 Thread Diego Novillo

This patch fixes some template test cases.  We were trying to 
expand functions that did not really need expanding (templates).  This
got me thinking that we are not approaching the expansion properly.

We are trying to re-create all the cgraph creation done during the
compilation of the header file.  Rather than re-creating all this, it
would be more efficient to save the state of the callgraph and varpool
at the time of pph generation and then simply recreate it at read
time.  I will experiment with that, see if it actually makes any
sense.

Tested on x86_64.  Applied to pph.

* pph-streamer-out.c (pph_out_symtab): Mark the function as
having a body if it has a cgraph node associated with it.
* pph-streamer-in.c (pph_in_binding_level): Call pph_in_tree to
read BL->BLOCKS.
* pph-streamer-out.c (pph_out_binding_level_1): Call pph_out_tree
to write BL->BLOCKS.

testsuite/ChangeLog.pph
2011-08-18   Diego Novillo  dnovi...@google.com

* g++.dg/pph/x1dynarray1.cc: Mark fixed.
* g++.dg/pph/x1dynarray2a.cc: Mark fixed.
* g++.dg/pph/x1dynarray2b.cc: Mark fixed.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 9686edf..ae3ede9 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -557,7 +557,7 @@ pph_in_binding_level (pph_stream *stream, cp_binding_level *to_register)
   VEC_safe_push (cp_label_binding, gc, bl->shadowed_labels, sl);
 }
 
-  bl->blocks = pph_in_chain (stream);
+  bl->blocks = pph_in_tree (stream);
   bl->this_entity = pph_in_tree (stream);
   bl->level_chain = pph_in_binding_level (stream, NULL);
   bl->dead_vars_from_for = pph_in_tree_vec (stream);
diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c
index b563891..fe0758f 100644
--- a/gcc/cp/pph-streamer-out.c
+++ b/gcc/cp/pph-streamer-out.c
@@ -574,7 +574,7 @@ pph_out_binding_level_1 (pph_stream *stream, cp_binding_level *bl,
   FOR_EACH_VEC_ELT (cp_label_binding, bl->shadowed_labels, i, sl)
 pph_out_label_binding (stream, sl);
 
-  pph_out_chain (stream, bl->blocks);
+  pph_out_tree (stream, bl->blocks);
   pph_out_tree (stream, bl->this_entity);
   pph_out_binding_level (stream, bl->level_chain, filter);
   pph_out_tree_vec (stream, bl->dead_vars_from_for);
@@ -1128,7 +1128,9 @@ pph_out_symtab (pph_stream *stream)
   FOR_EACH_VEC_ELT (tree, stream->symtab.v, i, decl)
 if (TREE_CODE (decl) == FUNCTION_DECL && DECL_STRUCT_FUNCTION (decl))
   {
-   if (DECL_SAVED_TREE (decl))
+   /* If this is a regular (non-template) function with a body,
+  mark it for expansion during reading.  */
+   if (DECL_SAVED_TREE (decl) && cgraph_get_node (decl))
  pph_out_symtab_marker (stream, PPH_SYMTAB_FUNCTION_BODY);
else
  pph_out_symtab_marker (stream, PPH_SYMTAB_FUNCTION);
diff --git a/gcc/testsuite/g++.dg/pph/x1dynarray1.cc 
b/gcc/testsuite/g++.dg/pph/x1dynarray1.cc
index 2a1f9f6..6ef279d 100644
--- a/gcc/testsuite/g++.dg/pph/x1dynarray1.cc
+++ b/gcc/testsuite/g++.dg/pph/x1dynarray1.cc
@@ -1,5 +1,4 @@
-// { dg-do link }
-// { dg-xfail-if "BOGUS" { *-*-* } { "-fpph-map=pph.map" } }
+// { dg-do run }
 
 #include x0dynarray1.h
 
diff --git a/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc 
b/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc
index f941086..252e801 100644
--- a/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc
+++ b/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc
@@ -1,5 +1,4 @@
-// { dg-xfail-if "BOGUS" { *-*-* } { "-fpph-map=pph.map" } }
-// { dg-do link }
+// { dg-do run }
 
 #include x0dynarray2.h
 
diff --git a/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc 
b/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc
index 8f14149..c942d9d 100644
--- a/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc
+++ b/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc
@@ -1,5 +1,4 @@
-// { dg-xfail-if "BOGUS" { *-*-* } { "-fpph-map=pph.map" } }
-// { dg-do link }
+// { dg-do run }
 
 #include x0dynarray2.h
 
-- 
1.7.3.1


--
This patch is available for review at http://codereview.appspot.com/4921051


Re: Vector Comparison patch

2011-08-22 Thread Artem Shinkarov
On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 Richard

 I formalized an approach a little bit; now it works without target
 hooks, but some polishing is still required. I want you to comment on
 the several important approaches that I use in the patch.

 So how does it work.
 1) All the vector comparisons at the level of the type-checker are
 introduced using VEC_COND_EXPR with constant selection operands being
 {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >
 v1, {-1}, {0}>.

 2) When optabs expand VEC_COND_EXPR, two cases are considered:
 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing 
 changes.
 2.b) first operand is something else, in that case, we specially mark
 this case, recognize it in the backend, and do not create a
 comparison, but use the mask as if it was a result of some comparison.

 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
 vector comparison we use the is_vector_comparison function; if it returns
 false, then we replace mask with mask != {0}.

 So we end up with the following functionality:
 VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is a result of
 comparison of two vectors, we leave it as it is, otherwise we replace it
 with mask != {0}.

 Plain vector comparison a op b is represented with VEC_COND_EXPR,
 which correctly expands, without creating useless masking.


 Basically for me there are two questions:
 1) Can we perform information passing in optabs in a nicer way?
 2) How is_vector_comparison could be improved? I have several ideas,
 like checking if constant vector all consists of 0 and -1, and so on.
 But first is it conceptually fine.

 P.S. I tried to put the functionality of is_vector_comparison in
 tree-ssa-forwprop, but the thing is that it is called only with -On,
 which I find inappropriate, and the functionality gets more
 complicated.

 Why is it inappropriate to not optimize it at -O0?  If the user
 separates comparison and ?: expression it's his own fault.

 Well, because all the information is there, and I perfectly envision
 the case when user expressed comparison separately, just to avoid code
 duplication.

 Like:
 mask = a > b;
 res1 = mask ? v0 : v1;
 res2 = mask ? v2 : v3;

 Which in this case would be different from
 res1 = a > b ? v0 : v1;
 res2 = a > b ? v2 : v3;

 Btw, the new hook is still in the patch.

 I would simply always create != 0 if it isn't and let optimizers
 (tree-ssa-forwprop.c) optimize this.  You still have to deal with
 non-comparison operands during expansion though, but if
 you always forced a != 0 from the start you can then simply
 interpret it as a proper comparison result (in which case I'd
 modify the backends to have an alternate pattern or directly
 expand to masking operations - using the fake comparison
 RTX is too much of a hack).

 Richard, I think you didn't get the problem.
 I really need a way to pass the information that the expression
 that is in the first operand of vcond is an appropriate mask. I thought
 for quite a while and this hack is the only answer I found; is there a
 better way to do that? I could for example introduce another
 tree-node, but it would be overkill as well.

 Now why do I need it so much:
 I want to implement the comparison in a way that {1, 5, 0, -1} is
 actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
 VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
 always). To check the stuff, I use is_vector_comparison in
 tree-vect-generic.

 So I really have the difference between mask ? x : y and mask != {0} ?
 x : y, otherwise I could treat mask != {0} in the backend as just
 mask.

 If this link between optabs and backend breaks, then the patch falls
 apart. Because every time the comparison is taken out VEC_COND_EXPR, I
 will have to put != {0}. Keep in mind, that I cannot always put the
 comparison inside the VEC_COND_EXPR, what if it is defined in a
 function-comparison, or somehow else?

 So what would be an appropriate way to connect optabs and the backend?

 Well, there is no problem in having the only valid mask operand for
 VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,...} mask.
 Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
 vec1 : vec2.

This happens already in the new version of patch (not submitted yet).

 This comparison can be eliminated by optimization passes
 that
 either replace it by the real comparison computing the mask or just
 propagating the information this mask is already {-1,...} / {0,...} by simply
 dropping the comparison against zero.

This is not a problem, because the backend recognizes these patterns,
so no optimization is needed in this part.
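
To make the semantics concrete, here is a hedged sketch using the syntax
this patch enables (GCC generic vector extensions; an illustration, not
code from the patch itself):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  select (v4si a, v4si b, v4si v0, v4si v1)
  {
    /* Each lane of MASK is -1 (all bits set) where a > b holds, else 0;
       the front end builds VEC_COND_EXPR <a > b, {-1,...}, {0,...}>.  */
    v4si mask = a > b;

    /* Reusing a stored mask is lowered to
       VEC_COND_EXPR <mask != 0, v0, v1>, so a mask such as
       {1, 5, 0, -1} is first normalized to {-1, -1, 0, -1}.  */
    return mask ? v0 : v1;
  }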

[pph] Re-organize pph_write_tree/pph_read_tree (issue4934045)

2011-08-22 Thread Diego Novillo
This patch refactors pph_write_tree and pph_read_tree so that we can
reduce the amount of special casing we were doing.  It first tries to
handle nodes based on their tree code class.  The only class that
cannot be handled this way is tcc_exceptional, so we only deal with
those nodes individually.

This fixes no testcases but it simplifies the logic of the AST
streamer.  I think we can now get rid of flag_pph_untree since we
are unconditionally ICEing if a node is not handled.

Tested on x86_64.  Applied to branch.

* pph-streamer-in.c (pph_in_tcc_declaration): Factor out
of pph_read_tree_body.  Handle all decls.
(pph_in_tcc_type): Factor out of pph_read_tree_body.  Handle
all types.
(pph_read_tree_body): Issue a fatal error if a node has not
been handled.
* pph-streamer-out.c (pph_out_tcc_declaration): Factor out of
pph_write_tree_body.  Handle all decls.
(pph_out_tcc_type): Factor out of pph_write_tree_body.  Handle
all types.
(pph_write_tree_body): Issue a fatal error if a node has not
been handled.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/pph@177960 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/cp/ChangeLog.pph  |   15 ++
 gcc/cp/pph-streamer-in.c  |  342 --
 gcc/cp/pph-streamer-out.c |  359 +
 3 files changed, 284 insertions(+), 432 deletions(-)

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index ae3ede9..07e1135 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -1538,119 +1538,129 @@ pph_read_file (const char *filename)
 }
 
 
-/* Read the attributes for a FUNCTION_DECL FNDECL.  If FNDECL had
-   a body, mark it for expansion.  */
+/* Read from STREAM the body of tcc_declaration tree DECL.  */
 
 static void
-pph_in_function_decl (pph_stream *stream, tree fndecl)
+pph_in_tcc_declaration (pph_stream *stream, tree decl)
 {
-  DECL_INITIAL (fndecl) = pph_in_tree (stream);
-  pph_in_lang_specific (stream, fndecl);
-  DECL_SAVED_TREE (fndecl) = pph_in_tree (stream);
-  DECL_CHAIN (fndecl) = pph_in_tree (stream);
-}
-
+  pph_in_lang_specific (stream, decl);
+  DECL_INITIAL (decl) = pph_in_tree (stream);
 
+  /* The tree streamer only writes DECL_CHAIN for PARM_DECL nodes.  */
+  if (TREE_CODE (decl) == VAR_DECL
+  || TREE_CODE (decl) == FUNCTION_DECL)
+DECL_CHAIN (decl) = pph_in_tree (stream);
 
-/* Read the body fields of EXPR from STREAM.  */
-
-static void
-pph_read_tree_body (pph_stream *stream, tree expr)
-{
-  struct lto_input_block *ib = stream->encoder.r.ib;
-  struct data_in *data_in = stream->encoder.r.data_in;
-
-  /* Read the language-independent parts of EXPR's body.  */
-  streamer_read_tree_body (ib, data_in, expr);
-
-  /* Read all the language-dependent fields.  */
-  switch (TREE_CODE (expr))
+  /* Handle some individual decl nodes.  */
+  switch (TREE_CODE (decl))
 {
-/* TREES NEEDING EXTRA WORK */
-
-/* tcc_declaration */
-
-case DEBUG_EXPR_DECL:
-case IMPORTED_DECL:
-case LABEL_DECL:
-case RESULT_DECL:
-  DECL_INITIAL (expr) = pph_in_tree (stream);
-  break;
-
-case CONST_DECL:
-case FIELD_DECL:
-case NAMESPACE_DECL:
-case PARM_DECL:
-case USING_DECL:
-case VAR_DECL:
-  /* FIXME pph: Should we merge DECL_INITIAL into lang_specific? */
-  DECL_INITIAL (expr) = pph_in_tree (stream);
-  pph_in_lang_specific (stream, expr);
-  /* DECL_CHAIN is handled by generic code, except for VAR_DECLs.  */
-  if (TREE_CODE (expr) == VAR_DECL)
-   DECL_CHAIN (expr) = pph_in_tree (stream);
-  break;
-
 case FUNCTION_DECL:
-  pph_in_function_decl (stream, expr);
+  DECL_SAVED_TREE (decl) = pph_in_tree (stream);
   break;
 
 case TYPE_DECL:
-  DECL_INITIAL (expr) = pph_in_tree (stream);
-  pph_in_lang_specific (stream, expr);
-  DECL_ORIGINAL_TYPE (expr) = pph_in_tree (stream);
+  DECL_ORIGINAL_TYPE (decl) = pph_in_tree (stream);
   break;
 
 case TEMPLATE_DECL:
-  DECL_INITIAL (expr) = pph_in_tree (stream);
-  pph_in_lang_specific (stream, expr);
-  DECL_TEMPLATE_RESULT (expr) = pph_in_tree (stream);
-  DECL_TEMPLATE_PARMS (expr) = pph_in_tree (stream);
-  DECL_CONTEXT (expr) = pph_in_tree (stream);
+  DECL_TEMPLATE_RESULT (decl) = pph_in_tree (stream);
+  DECL_TEMPLATE_PARMS (decl) = pph_in_tree (stream);
   break;
 
-/* tcc_type */
-
-case ARRAY_TYPE:
-case BOOLEAN_TYPE:
-case COMPLEX_TYPE:
-case ENUMERAL_TYPE:
-case FIXED_POINT_TYPE:
-case FUNCTION_TYPE:
-case INTEGER_TYPE:
-case LANG_TYPE:
-case METHOD_TYPE:
-case NULLPTR_TYPE:
-case OFFSET_TYPE:
-case POINTER_TYPE:
-case REAL_TYPE:
-case REFERENCE_TYPE:
-case VECTOR_TYPE:
-case VOID_TYPE:
-  TYPE_LANG_SPECIFIC (expr) = pph_in_lang_type (stream);
+default:
   break;

Re: Vector Comparison patch

2011-08-22 Thread Artem Shinkarov
On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 Richard

 I formalized an approach a little bit; now it works without target
 hooks, but some polishing is still required. I want you to comment on
 the several important approaches that I use in the patch.

 So how does it work.
 1) All the vector comparisons at the level of the type-checker are
 introduced using VEC_COND_EXPR with constant selection operands being
 {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >
 v1, {-1}, {0}>.

 2) When optabs expand VEC_COND_EXPR, two cases are considered:
 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing 
 changes.
 2.b) first operand is something else, in that case, we specially mark
 this case, recognize it in the backend, and do not create a
 comparison, but use the mask as if it was a result of some comparison.

 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
 vector comparison we use the is_vector_comparison function; if it returns
 false, then we replace mask with mask != {0}.

 So we end up with the following functionality:
 VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is a result of
 comparison of two vectors, we leave it as it is, otherwise we replace it
 with mask != {0}.

 Plain vector comparison a op b is represented with VEC_COND_EXPR,
 which correctly expands, without creating useless masking.


 Basically for me there are two questions:
 1) Can we perform information passing in optabs in a nicer way?
 2) How is_vector_comparison could be improved? I have several ideas,
 like checking if constant vector all consists of 0 and -1, and so on.
 But first is it conceptually fine.

 P.S. I tried to put the functionality of is_vector_comparison in
 tree-ssa-forwprop, but the thing is that it is called only with -On,
 which I find inappropriate, and the functionality gets more
 complicated.

 Why is it inappropriate to not optimize it at -O0?  If the user
 separates comparison and ?: expression it's his own fault.

 Well, because all the information is there, and I perfectly envision
 the case when user expressed comparison separately, just to avoid code
 duplication.

 Like:
 mask = a > b;
 res1 = mask ? v0 : v1;
 res2 = mask ? v2 : v3;

 Which in this case would be different from
 res1 = a > b ? v0 : v1;
 res2 = a > b ? v2 : v3;

 Btw, the new hook is still in the patch.

 I would simply always create != 0 if it isn't and let optimizers
 (tree-ssa-forwprop.c) optimize this.  You still have to deal with
 non-comparison operands during expansion though, but if
 you always forced a != 0 from the start you can then simply
 interpret it as a proper comparison result (in which case I'd
 modify the backends to have an alternate pattern or directly
 expand to masking operations - using the fake comparison
 RTX is too much of a hack).

 Richard, I think you didn't get the problem.
 I really need a way to pass the information that the expression
 that is in the first operand of vcond is an appropriate mask. I thought
 for quite a while and this hack is the only answer I found; is there a
 better way to do that? I could for example introduce another
 tree-node, but it would be overkill as well.

 Now why do I need it so much:
 I want to implement the comparison in a way that {1, 5, 0, -1} is
 actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
 VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
 always). To check the stuff, I use is_vector_comparison in
 tree-vect-generic.

 So I really have the difference between mask ? x : y and mask != {0} ?
 x : y, otherwise I could treat mask != {0} in the backend as just
 mask.

 If this link between optabs and backend breaks, then the patch falls
 apart. Because every time the comparison is taken out VEC_COND_EXPR, I
 will have to put != {0}. Keep in mind, that I cannot always put the
 comparison inside the VEC_COND_EXPR, what if it is defined in a
 function-comparison, or somehow else?

 So what would be an appropriate way to connect optabs and the backend?

 Well, there is no problem in having the only valid mask operand for
 VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,...} mask.
 Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
 vec1 : vec2.

 This happens already in the new version of patch (not submitted yet).

 This comparison can be eliminated by optimization passes
 that
 either replace it by the real comparison computing the mask or just
 propagating the information this mask is already {-1,...} 

Re: mem_attrs_htab

2011-08-22 Thread Michael Matz
Hi,

On Mon, 22 Aug 2011, Richard Guenther wrote:

  Some functions are extremely large though.  Do you mean that MEM 
  itself would be enlarged to have the MEM_ATTRS field so that one 
  operand is the address, then expr, then HWI size, offset, etc.? 
   Because if the mem attrs aren't shared any longer, it doesn't make 
  sense to keep the indirection. I still fear we have way too many MEMs 
  in RTL that this would be noticeable.
 
 It would be interesting to have numbers about the amount of sharing that 
 happens

A pathetic amount.  From compiling cse.c combine.c tree.c and dwarf2out.c 
the top 10 users of MEMs per routine are:

With -O0:
MEMs  ATTR  name
970   485   combine_simplify_rtx
1011  464   simple_cst_equal
1047  612   mem_loc_descriptor
1173  442   walk_tree_1
1296  515   loc_list_from_tree
1431  690   simplify_comparison
1911  752   substitute_placeholder_in_expr
1951  745   substitute_in_expr
2503  1532  cse_insn
3242  2206  try_combine

With -O2:
MEMs  ATTR  name
514   502   gen_tagged_type_die
701   536   simplify_comparison
743   877   find_decls_types_r
851   840   dwarf2out_finish
863   784   loc_list_from_tree
916   839   combine_simplify_rtx
978   878   gen_subprogram_die
1650  1475  cse_insn
1720  1782  mem_loc_descriptor
2336  1792  try_combine

Summing doesn't make sense, but the routines with largest differences:
-O0
532 force_to_mode
547 simple_cst_equal
640 simplify_shift_const_1
731 walk_tree_1
741 simplify_comparison
781 loc_list_from_tree
971 cse_insn
1036 try_combine
1159 substitute_placeholder_in_expr
1206 substitute_in_expr

-O2
100 gen_subprogram_die
101 make_extraction
112 output_loc_sequence
122 if_then_else_cond
124 substitute_placeholder_in_expr
144 simplify_shift_const_1
165 simplify_comparison
175 cse_insn
205 simplify_if_then_else
544 try_combine

(Using -g or not doesn't make a difference).  I've counted all MEM rtx in 
the whole insn stream at finalization time (i.e. slightly less than 
potentially are actually generated during RTL passes).  ATTR is the number 
of unique mem_attrs ever created by set_mem_attrs, reset to zero at each 
function start (including emptying the htab).

That is, we save a whopping 48 kilobytes due to this fantastic hash table
:-)  (offset by the need for a pointer in the MEM rtx)

Just remove the whole thing.  Same for the reg_attrs hash table (I haven't 
measured that one, though).

 - might be not trivial though, as some re-uses would be able to 
 simply modify the attr inplace.


Ciao,
Michael.
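
For context, a hedged sketch of the kind of instrumentation described
above (the exact counting code was not posted; the helpers below are
hypothetical and assume the GCC 4.6-era RTL walkers):

  /* Hypothetical callback: bump the counter for every MEM rtx seen.  */
  static int
  count_mems_1 (rtx *px, void *data)
  {
    if (*px && MEM_P (*px))
      ++*(unsigned long *) data;
    return 0;  /* keep walking the rtx tree */
  }

  /* Count MEMs in the current function's insn stream, roughly what the
     MEMs column in the tables above measures.  */
  static unsigned long
  count_mems (void)
  {
    unsigned long n = 0;
    rtx insn;

    for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
      if (INSN_P (insn))
        for_each_rtx (&PATTERN (insn), count_mems_1, &n);
    return n;
  }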

Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread H.J. Lu
On Sun, Aug 21, 2011 at 4:19 PM, David Edelsohn dje@gmail.com wrote:
 This patch broke bootstrap on AIX.  It emits a .section op in
 assembly but .section is an ELF syntax op not AIX XCOFF.

 FE..initialize_critical:
        .section        .init_array

 varasm.c should not be generating ELF ops for non-ELF targets.

 config.log shows:

 gcc_cv_initfini_array=yes
 tm_file_list includes initfini-array.h
 tm_include_list includes initfini-array.h

 Why is the patch affecting non-ELF targets?  Please fix or revert immediately.


David, I checked in the fix for AIX.  Please verify it.

Thanks.


-- 
H.J.


Re: [PATCH] Fix ICEs in get_bit_range (PR middle-end/50141)

2011-08-22 Thread Aldy Hernandez



@@ -4354,7 +4354,8 @@ get_bit_range (unsigned HOST_WIDE_INT *b
	   || TREE_CODE (innerdecl) == TARGET_MEM_REF)
	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
       || (DECL_P (innerdecl)
-	  && (DECL_THREAD_LOCAL_P (innerdecl)
+	  && ((TREE_CODE (innerdecl) == VAR_DECL
+	       && DECL_THREAD_LOCAL_P (innerdecl))
	      || !TREE_STATIC (innerdecl))))


Thanks.  I already had this in my local tree, but am waiting to polish 
everything up before another round of iterations with Richi.


Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-22 Thread Michael Matz
Hi,

On Mon, 22 Aug 2011, H.J. Lu wrote:

  Oh, I thought it was data initialized by the constructor ...
 
  Sriramans patch right now has a function __cpu_indicator_init which is 
  called from (adhoc constructed) ctors and that initializes variables
  __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
 
  I think the whole initializer function and the associated data blobs have
  to sit in static libgcc and be hidden.  By that all shared modules
  will have their own copies of the model and features (and the initializer
  function) so there won't be issues with copy relocs, or cross shared lib
  calls while relocating the modules.  Dynamically they will contain the
  same data always, but it's not many bytes, and only modules making use of
  this facility will pay it.
 
  The initializer function has to be callable from pre-.init contexts, e.g.
  ifunc dispatchers.  And to make life easier there should be one ctor
  function calling this initializer function too, so that normal code can
  rely on it being already called saving one check.
 
 
 It sounds more complicated than necessary.  Why not just do it
 on demand like glibc does?

Ehm, the only difference would be to not have a ctor in libgcc that looks 
like so:

void __attribute__((constructor)) bla(void)
{
  __cpu_indicator_init ();
}

I don't see any complication?


Ciao,
Michael.

Re: RFC: [build, ada] Centralize PICFLAG configuration

2011-08-22 Thread Rainer Orth
Paolo Bonzini bonz...@gnu.org writes:

 On 08/19/2011 09:11 PM, Rainer Orth wrote:

 2011-07-31  Rainer Orthr...@cebitec.uni-bielefeld.de

  config:
  * picflag.m4: New file.

  gcc:
  * configure.ac (GCC_PICFLAG_FOR_TARGET): Call it.
  (PICFLAG_FOR_TARGET): Substitute.
  * aclocal.m4: Regenerate.
  * configure: Regenerate.

  gcc/ada:
  * gcc-interface/Makefile.in (PICFLAG_FOR_TARGET): New.
  (GNATLIBCFLAGS_FOR_C): Replace
  TARGET_LIBGCC2_CFLAGS by PICFLAG_FOR_TARGET.
  (gnatlib-shared-default, gnatlib-shared-dual-win32)
  (gnatlib-shared-win32, gnatlib-shared-darwin, gnatlib-shared)
  (gnatlib-sjlj, gnatlib-zcx): Likewise.

  libada:
  * configure.ac: Include ../config/picflag.m4.
  (GCC_PICFLAG): Call it.
  Substitute.
  * configure: Regenerate.
  * Makefile.in (TARGET_LIBGCC2_CFLAGS): Replace by PICFLAG.
  (GNATLIBCFLAGS_FOR_C): Replace TARGET_LIBGCC2_CFLAGS by PICFLAG.
  (LIBADA_FLAGS_TO_PASS): Pass PICFLAG as PICFLAG_FOR_TARGET.
  Don't include $(GCC_DIR)/libgcc.mvars.

  libiberty:
  * aclocal.m4: Include ../config/picflag.m4.
  * configure.ac (GCC_PICFLAG): Call it.
  (enable_shared): Clear PICFLAG unless shared.
  * configure: Regenerate.


 Ok, thanks.

installed, thanks.

Do I need to sync the config and libiberty parts to src manually or does
this happen by some sort of magic?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: RFC: [build, ada] Centralize PICFLAG configuration

2011-08-22 Thread DJ Delorie

 Do I need to sync the config and libiberty parts to src manually or does
 this happen by some sort of magic?


intl/; config.rhost; libiberty/; libiberty's part of include/ 
gcc: http://gcc.gnu.org
Changes need to be done in tandem with the official GCC
sources or submitted to the master file maintainer and brought
in via a merge.  Note: approved patches in gcc's libiberty or
intl are automatically approved in this libiberty and intl also;
feel free to merge them yourself if needed sooner than the next
merge.  Otherwise, changes are automatically merged, usually
within a day.


Re: repost: [DF] Use HARD_REG_SETs instead of bitmaps

2011-08-22 Thread Jeff Law
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/22/11 11:16, Dimitrios Apostolou wrote:

 
 Any updates will come as a followup to this thread. I still have to
 do some testing on other platforms. Do we have access to any PA and
 MIPS machinery? I also wanted to test on architecture with lots of
 hard registers and small long size, but my 32-bit sparcstation
 crashed badly, any ideas which arch to try next?
I thought there was a PA available in the testfarm.

jeff
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOUo9iAAoJEBRtltQi2kC7qLsH/RDhJIi0iK4WSYIyGk/KiG15
4m/1vJzCN0u2E0Aw92ArpQtzFSU3/wMvLJFvJMZQcyWKJ73H0Ri0k81POCXBnEmd
nMDoQ8/T1xXTyYRCrnh7Y4ZuVxAhXUqL2V5ONnfiayqCR+z5jeKPZGu9GOStPGhI
ysGP+rzSQAwgCF9lrXgy7+zaetnK230BmNw4P5C/yLYhySP5zTTEZ3e1JCCHWSbt
fg2Hjt7FKHdzgeSs9X8LpUCZd2QWFDaunlYzOJTrrDj1TatIz9WL+e7NH3+gsrQ9
enL8+CKUeOE63zKOfvarHw3ucxHZn8hYZriCZzdB+RbKnoikCjWCNnGdH8pDj6c=
=qSqA
-END PGP SIGNATURE-


Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread H.J. Lu
On Mon, Aug 22, 2011 at 10:09 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote:
 And/or add another test to it that tests
 that you can actually use
 .section .init_array
 and it will use correct section flags for the section.


 We need this information in config.gcc. But config.gcc is used
 before assembler and readelf are detected. I am running out of
 ideas.  Any suggestions?

 Thanks.


This is a much bigger change than I like and I don't feel very comfortable
about it since I can only test it on Linux/x86.


-- 
H.J.

2011-08-22  H.J. Lu  hongjiu...@intel.com

* acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check if
.section .init_array works.

* configure.ac: Move binutils tests before gcc_AC_INITFINI_ARRAY.
* configure: Regenerated.
diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index a8ecd2d..3827046 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -376,6 +376,18 @@ AC_DEFUN([gcc_AC_INITFINI_ARRAY],
 AC_CACHE_CHECK(for .preinit_array/.init_array/.fini_array support,
 		 gcc_cv_initfini_array, [dnl
   if test x${build} = x${target} && test x${build} = x${host}; then
+if test x$gcc_cv_as != x -a x$gcc_cv_readelf != x ; then
+  echo ".section .init_array; .byte 0" > conftest.s
+  if $gcc_cv_as -o conftest.o conftest.s > /dev/null 2>&1; then
+	if $gcc_cv_readelf -S --wide conftest.o 2>&1 \
+	   | grep INIT_ARRAY > /dev/null 2>&1; then
+	  gcc_cv_as_init_array=yes
+	fi
+  fi
+  rm -f conftest.s conftest.o
+fi
+  fi
+  if test x${gcc_cv_as_init_array} = xyes; then
 AC_RUN_IFELSE([AC_LANG_SOURCE([
 #ifndef __ELF__
 #error Not an ELF OS
diff --git a/gcc/configure.ac b/gcc/configure.ac
index ed01904..3cae58a 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1153,6 +1153,231 @@ fi
 # Restore CFLAGS from before the gcc_AC_NEED_DECLARATIONS tests.
 CFLAGS=$saved_CFLAGS
 
+# Identify the assembler which will work hand-in-glove with the newly
+# built GCC, so that we can examine its features.  This is the assembler
+# which will be driven by the driver program.
+#
+# If build != host, and we aren't building gas in-tree, we identify a
+# build-target assembler and hope that it will have the same features
+# as the host-target assembler we'll be using.
+gcc_cv_gas_major_version=
+gcc_cv_gas_minor_version=
+gcc_cv_as_gas_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/gas
+
+m4_pattern_allow([AS_FOR_TARGET])dnl
+AS_VAR_SET_IF(gcc_cv_as,, [
+if test -x $DEFAULT_ASSEMBLER; then
+	gcc_cv_as=$DEFAULT_ASSEMBLER
+elif test -f $gcc_cv_as_gas_srcdir/configure.in \
+     && test -f ../gas/Makefile \
+     && test x$build = x$host; then
+	gcc_cv_as=../gas/as-new$build_exeext
+elif test -x as$build_exeext; then
+	# Build using assembler in the current directory.
+	gcc_cv_as=./as$build_exeext
+elif ( set dummy $AS_FOR_TARGET; test -x $[2] ); then
+gcc_cv_as=$AS_FOR_TARGET
+else
+AC_PATH_PROG(gcc_cv_as, $AS_FOR_TARGET)
+fi])
+
+ORIGINAL_AS_FOR_TARGET=$gcc_cv_as
+AC_SUBST(ORIGINAL_AS_FOR_TARGET)
+case $ORIGINAL_AS_FOR_TARGET in
+  ./as | ./as$build_exeext) ;;
+  *) AC_CONFIG_FILES(as:exec-tool.in, [chmod +x as]) ;;
+esac 
+
+AC_MSG_CHECKING(what assembler to use)
+if test $gcc_cv_as = ../gas/as-new$build_exeext; then
+  # Single tree build which includes gas.  We want to prefer it
+  # over whatever linker top-level may have detected, since
+  # we'll use what we're building after installation anyway.
+  AC_MSG_RESULT(newly built gas)
+  in_tree_gas=yes
+  _gcc_COMPUTE_GAS_VERSION
+  in_tree_gas_is_elf=no
+  if grep 'obj_format = elf' ../gas/Makefile > /dev/null \
+     || (grep 'obj_format = multi' ../gas/Makefile \
+	 && grep 'extra_objects =.* obj-elf' ../gas/Makefile) > /dev/null
+  then
+in_tree_gas_is_elf=yes
+  fi
+else
+  AC_MSG_RESULT($gcc_cv_as)
+  in_tree_gas=no
+fi
+
+# Identify the linker which will work hand-in-glove with the newly
+# built GCC, so that we can examine its features.  This is the linker
+# which will be driven by the driver program.
+#
+# If build != host, and we aren't building gas in-tree, we identify a
+# build-target linker and hope that it will have the same features
+# as the host-target linker we'll be using.
+gcc_cv_gld_major_version=
+gcc_cv_gld_minor_version=
+gcc_cv_ld_gld_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/ld
+gcc_cv_ld_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd
+
+AS_VAR_SET_IF(gcc_cv_ld,, [
+if test -x $DEFAULT_LINKER; then
+	gcc_cv_ld=$DEFAULT_LINKER
+elif test -f $gcc_cv_ld_gld_srcdir/configure.in \
+     && test -f ../ld/Makefile \
+     && test x$build = x$host; then
+	gcc_cv_ld=../ld/ld-new$build_exeext
+elif test -x collect-ld$build_exeext; then
+	# Build using linker in the current directory.
+	gcc_cv_ld=./collect-ld$build_exeext
+elif ( set dummy $LD_FOR_TARGET; test -x $[2] ); then
+gcc_cv_ld=$LD_FOR_TARGET
+else
+AC_PATH_PROG(gcc_cv_ld, $LD_FOR_TARGET)
+fi])
+
+ORIGINAL_PLUGIN_LD_FOR_TARGET=$gcc_cv_ld

[ping 2] [patch] attribute to reverse bitfield allocations

2011-08-22 Thread DJ Delorie

Ping 2 ?

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01889.html
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02555.html


Re: [patch] support for multiarch systems

2011-08-22 Thread Toon Moene

On 08/21/2011 02:14 AM, Matthias Klose wrote:


On 08/20/2011 09:51 PM, Matthias Klose wrote:



Multiarch [1] is the term being used to refer to the capability of a system to
install and run applications of multiple different binary targets on the same
system.  The idea and name of multiarch dates back to 2004/2005 [2] (to be
confused with multiarch in glibc).


attached is an updated patch which includes feedback from Jakub and Joseph.


Perhaps I could like this patch ?  It probably solves

http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg02398.html

[ My system is Debian Testing, updated 20110821 at 12:15 UTC ]

(h/t Mark Glisse).

Thanks in advance,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [var-tracking] [not-good!] disable shared_hash and other simplifications

2011-08-22 Thread Steven Bosscher
On Mon, Aug 22, 2011 at 1:54 PM, Jakub Jelinek ja...@redhat.com wrote:
 * From a very wide field of view, all this DF solving reminded me a
 lot of what I've seen in df-*.c. Why can't variable tracking be
 integrated in the main DF pass of the compiler, the one that happens
 *after* register allocation?

 I don't think there is any significant overlap.  Yes, var-tracking is a data
 flow algorithm, but the code that does that is just a few lines, the big
 part is how the merges are done and how the tracking propagates through a
 basic block, and that has nothing in common with df-*.c.

But perhaps with the DF solver fewer blocks are visited...?

Ciao!
Steven


Re: [var-tracking] [not-good!] disable shared_hash and other simplifications

2011-08-22 Thread Jakub Jelinek
On Mon, Aug 22, 2011 at 07:49:43PM +0200, Steven Bosscher wrote:
 On Mon, Aug 22, 2011 at 1:54 PM, Jakub Jelinek ja...@redhat.com wrote:
  * From a very wide field of view, all this DF solving reminded me a
  lot of what I've seen in df-*.c. Why can't variable tracking be
  integrated in the main DF pass of the compiler, the one that happens
  *after* register allocation?
 
  I don't think there is any significant overlap.  Yes, var-tracking is a data
  flow algorithm, but the code that does that is just a few lines, the big
  part is how the merges are done and how the tracking propagates through a
  basic block, and that has nothing in common with df-*.c.
 
 But perhaps with the DF solver fewer blocks are visited...?

Or perhaps more.  Who knows.

Jakub


Re: RFC: [build, ada] Centralize PICFLAG configuration

2011-08-22 Thread Rainer Orth
DJ Delorie d...@redhat.com writes:

 Do I need to sync the config and libiberty parts to src manually or does
 this happen by some sort of magic?


 intl/; config.rhost; libiberty/; libiberty's part of include/ 
   gcc: http://gcc.gnu.org
   Changes need to be done in tandem with the official GCC
   sources or submitted to the master file maintainer and brought
   in via a merge.  Note: approved patches in gcc's libiberty or
   intl are automatically approved in this libiberty and intl also;
   feel free to merge them yourself if needed sooner than the next
   merge.  Otherwise, changes are automatically merged, usually
   within a day.

Thanks.  I've just installed config/picflag.m4 and rely on the automatic
merge for libiberty.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[pph] Cleanup line_table and includes streaming (issue4921052)

2011-08-22 Thread Gabriel Charette
This is a refactoring patch, it doesn't change/fix anything in pph itself.

I extracted out/in logic for includes into their own functions.

I removed the LINETAB parameter from in/out functions for the line_table as 
there is only one line_table and that's the only one we stream, the main/global 
one.

I wanted to move pph_loc_offset to pph_stream.encoder.r, but that's not possible
while locations are read through the streamer hook, as the callback does not
have a pph_stream* parameter; I added a FIXME to do it later if we do get rid of
the hook for locations.

Tested with bootstrap and pph regression testing on x64.

Cheers,
Gab

2011-08-22  Gabriel Charette  gch...@google.com

* pph-streamer-in.c (pph_loc_offset): Add FIXME to move this variable
to pph_stream.encoder.r
(pph_in_include): New.
(pph_in_line_table_and_includes): Remove LINETAB parameter. Update
all users to use global LINE_TABLE instead.
Remove code moved to new function pph_in_include. And call it.
(pph_read_file_1): Remove extra parameter in call to
pph_in_line_table_and_includes.
* pph-streamer-out.c (pph_out_include): New.
(pph_get_next_include): Fix comment.
(pph_out_line_table_and_includes): Remove LINETAB parameter. Update
all users to use global LINE_TABLE instead.
Remove code moved to new function pph_out_include. And call it.
* gcc/cp/pph-streamer.h (pph_stream): Fix indenting.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 07e1135..2b7dc2d 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -75,7 +75,11 @@ static int pph_reading_includes = 0;
 } while (0)
 
 /* Set in pph_in_and_merge_line_table. Represents the source_location offset
-   which every streamed in token must add to it's serialized source_location. */
+   which every streamed in token must add to it's serialized source_location.
+
+   FIXME pph: Ideally this would be in pph_stream.encoder.r, but for that we
+   first need to get rid of the dependency to the streamer_hook for locations.
+   */
 static int pph_loc_offset;
 
 
@@ -1336,9 +1340,37 @@ pph_in_line_map (pph_stream *stream, struct line_map *lm)
 }
 
 
-/* Read the line_table from STREAM and merge it in LINETAB.  At the same time
-   load includes in the order they were originally included by loading them at
-   the point they were referenced in the line_table.
+/* Read in from STREAM and merge a referenced include into the current parsing
+   context.  */
+
+static void
+pph_in_include (pph_stream *stream)
+{
+  int old_loc_offset;
+  const char *include_name;
+  pph_stream *include;
+  source_location prev_start_loc = pph_in_source_location (stream);
+
+  /* Simulate highest_location to be as it would be at this point in a non-pph
+ compilation.  */
+  line_table->highest_location = (prev_start_loc - 1) + pph_loc_offset;
+
+  /* FIXME pph: If we move pph_loc_offset to pph_stream.encoder.r, we could
+ have an independent offset for each stream and not have to save and
+ restore the state of a global pph_loc_offset as we are doing here.  */
+  old_loc_offset = pph_loc_offset;
+
+  include_name = pph_in_string (stream);
+  include = pph_read_file (include_name);
+  pph_add_include (include, false);
+
+  pph_loc_offset = old_loc_offset;
+}
+
+
+/* Read the line_table from STREAM and merge it in the current line_table.  At
+   the same time load includes in the order they were originally included by
+   loading them at the point they were referenced in the line_table.
 
Returns the source_location of line 1 / col 0 for this include.
 
@@ -1348,13 +1380,13 @@ pph_in_line_map (pph_stream *stream, struct line_map *lm)
 a known current issue, so I didn't bother working around it here for now.  */
 
 static source_location
-pph_in_line_table_and_includes (pph_stream *stream, struct line_maps *linetab)
+pph_in_line_table_and_includes (pph_stream *stream)
 {
   unsigned int old_depth;
   bool first;
   int includer_ix = -1;
-  unsigned int used_before = linetab->used;
-  int entries_offset = linetab->used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
+  unsigned int used_before = line_table->used;
+  int entries_offset = line_table->used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
   enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream);
 
   pph_reading_includes++;
@@ -1364,29 +1396,16 @@ pph_in_line_table_and_includes (pph_stream *stream, 
struct line_maps *linetab)
 {
   if (next_lt_marker == PPH_LINETABLE_REFERENCE)
{
- int old_loc_offset;
- const char *include_name = pph_in_string (stream);
- source_location prev_start_loc = pph_in_source_location (stream);
- pph_stream *include;
-
  gcc_assert (!first);
-
- linetab->highest_location = (prev_start_loc - 1) + pph_loc_offset;
-
- old_loc_offset = pph_loc_offset;
-
- include = pph_read_file (include_name);

Re: [pph] Cleanup line_table and includes streaming (issue4921052)

2011-08-22 Thread Diego Novillo

On 11-08-22 14:20 , Gabriel Charette wrote:


2011-08-22  Gabriel Charettegch...@google.com

* pph-streamer-in.c (pph_loc_offset): Add FIXME to move this variable
to pph_stream.encoder.r
(pph_in_include): New.
(pph_in_line_table_and_includes): Remove LINETAB parameter. Update
all users to use global LINE_TABLE instead.
Remove code moved to new function pph_in_include. And call it.
(pph_read_file_1): Remove extra parameter in call to
pph_in_line_table_and_includes.
* pph-streamer-out.c (pph_out_include): New.
(pph_get_next_include): Fix comment.
(pph_out_line_table_and_includes): Remove LINETAB parameter. Update
all users to use global LINE_TABLE instead.
Remove code moved to new function pph_out_include. And call it.
* gcc/cp/pph-streamer.h (pph_stream): Fix indenting.


OK.


Diego.


Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-22 Thread Sriraman Tallam
On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 22 Aug 2011, H.J. Lu wrote:

  Oh, I thought it was data initialized by the constructor ...
 
  Sriramans patch right now has a function __cpu_indicator_init which is
  called from (adhoc constructed) ctors and that initializes variables
  __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
 
  I think the whole initializer function and the associated data blobs have
  to sit in static libgcc and be hidden.  By that all shared modules
  will have their own copies of the model and features (and the initializer
  function) so there won't be issues with copy relocs, or cross shared lib
  calls while relocating the modules.  Dynamically they will contain the
  same data always, but it's not many bytes, and only modules making use of
  this facility will pay it.
 
  The initializer function has to be callable from pre-.init contexts, e.g.
  ifunc dispatchers.  And to make life easier there should be one ctor
  function calling this initializer function too, so that normal code can
  rely on it being already called saving one check.
 

 It sounds more complicated than necessary.  Why not just do it
 on demand like glibc does?

 Ehm, the only difference would be to not have a ctor in libgcc that looks
 like so:

 void __attribute__((constructor)) bla(void)
 {
  __cpu_indicator_init ();
 }

 I don't see any complication.?


 Order of constructors.  A constructor may call functions
 which use __cpu_indicator.

I have a suggestion that is a hybrid of the proposed solutions here:

1) Make a constructor in every module that calls
 __cpu_indicator_init and make it the first constructor to run.
 Will this solve the ordering problem?
2) Change __cpu_indicator_init to run only once by using a variable to
check if it has been run before.

So, each module's constructor will call __cpu_indicator_init but the
CPUID insns are only done once. I also avoid the extra overhead of
having to check if __cpu_indicator_init is called from within the
binary. Will this work?

Thanks,
-Sri.


 --
 H.J.
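
A minimal sketch of the combined proposal (run-once initialization plus a
per-module early constructor); apart from __cpu_indicator_init itself,
the guard variable, the constructor name and the priority below are
hypothetical, since the actual libgcc interface is still being discussed
in this thread:

  /* Hypothetical guard so the CPUID probing runs at most once per
     process, no matter how many modules' constructors call in.  */
  static int cpu_indicator_initialized;

  void
  __cpu_indicator_init (void)
  {
    if (cpu_indicator_initialized)
      return;
    cpu_indicator_initialized = 1;
    /* ... run CPUID here and fill in __cpu_model / __cpu_features ... */
  }

  /* Per-module constructor.  Priority 101 is the smallest value
     available to user code and runs before default-priority ctors,
     illustrating "make it the first constructor to run".  */
  static void __attribute__ ((constructor (101)))
  cpu_indicator_ctor (void)
  {
    __cpu_indicator_init ();
  }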



Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread Joseph S. Myers
On Mon, 22 Aug 2011, H.J. Lu wrote:

 On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote:
 nd/or add another test to it that tests
  that you can actually use
  .section .init_array
  and it will use correct section flags for the section.
 
 
 We need this information in config.gcc. But config.gcc is used
 before assembler and readelf are detected. I am running out of
 ideas.  Any suggestions?

Require a good assembler on ELF targets and just enable this by default 
for them without trying a configure test that won't work for cross 
compilation (AC_RUN_IFELSE is bad).

The toplevel config/elf.m4 provides a good notion of what is or is not 
ELF (if there are problems, we can fix that file).  Only a handful of 
targets support non-GNU assemblers; for the vast bulk of targets we should 
assume a not-too-old GNU assembler.  That way, the configure test can be 
used to cause a configure-time error if the assembler is defective and it 
doesn't matter that the test is late.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR middle-end/38509: add -Wfree-nonheap-object warning option

2011-08-22 Thread Diego Novillo

On 11-08-21 18:14 , Mark Heffernan wrote:

Ping?

Mark

On Fri, Aug 12, 2011 at 9:41 AM, Mark Heffernanmeh...@google.com  wrote:

This patch adds an option for enabling/disabling the warning for
attempting to free nonheap objects (PR/38509).  The warning is
imprecise and can issue false positives.

Bootstrapped on x86-64.  Ok for trunk?

Mark

2011-08-11  Mark Heffernanmeh...@google.com

PR middle-end/38509
* common.opt (Wfree-nonheap-object): New option.
* doc/invoke.texi (Warning options): Document -Wfree-nonheap-object.
* builtins.c (maybe_emit_free_warning): Add OPT_Wfree_nonheap_object
to warning.


OK.


Diego.


Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread H.J. Lu
On Mon, Aug 22, 2011 at 11:53 AM, Joseph S. Myers
jos...@codesourcery.com wrote:
 On Mon, 22 Aug 2011, H.J. Lu wrote:

 On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote:
 nd/or add another test to it that tests
  that you can actually use
  .section .init_array
  and it will use correct section flags for the section.
 

 We need this information in config.gcc. But config.gcc is used
 before assembler and readelf are detected. I am running out of
 ideas.  Any suggestions?

 Require a good assembler on ELF targets and just enable this by default
 for them without trying a configure test that won't work for cross
 compilation (AC_RUN_IFELSE is bad).

 The toplevel config/elf.m4 provides a good notion of what is or is not
 ELF (if there are problems, we can fix that file).  Only a handful of
 targets support non-GNU assemblers; for the vast bulk of targets we should
 assume a not-too-old GNU assembler.  That way, the configure test can be
 used to cause a configure-time error if the assembler is defective and it
 doesn't matter that the test is late.


A working .init_array support needs assembler, linker and libc.
That is why AC_RUN_IFELSE is used.

-- 
H.J.


Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-22 Thread H.J. Lu
On Mon, Aug 22, 2011 at 11:50 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 22 Aug 2011, H.J. Lu wrote:

  Oh, I thought it was data initialized by the constructor ...
 
  Sriraman's patch right now has a function __cpu_indicator_init which is
  called from (adhoc constructed) ctors and that initializes variables
  __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
 
  I think the whole initializer function and the associated data blobs have
  to sit in static libgcc and be hidden.  By that all shared modules
  will have their own copies of the model and features (and the initializer
  function) so there won't be issues with copy relocs, or cross shared lib
  calls while relocating the modules.  Dynamically they will contain the
  same data always, but it's not many bytes, and only modules making use of
  this facility will pay it.
 
  The initializer function has to be callable from pre-.init contexts, e.g.
  ifunc dispatchers.  And to make life easier there should be one ctor
  function calling this initializer function too, so that normal code can
  rely on it being already called saving one check.
 

 It sounds more complicated than necessary.  Why not just do it
 on demand like glibc does?

 Ehm, the only difference would be to not have a ctor in libgcc that looks
 like so:

 void __attribute__((constructor)) bla(void)
 {
  __cpu_indicator_init ();
 }

 I don't see any complication?


 Order of constructors.  A constructor may call functions
 which use __cpu_indicator.

 I have a suggestion that is a hybrid of the proposed solutions here:

 1) Make a constructor in every module that calls
 __cpu_indicator_init and make it the first constructor to run.
  Will this solve the ordering problem?
 2) Change __cpu_indicator_init to run only once by using a variable to
 check if it has been run before.

 So, each module's constructor will call __cpu_indicator_init but the
 CPUID insns are only done once. I also avoid the extra overhead of
 having to check if __cpu_indicator_init is called from within the
 binary. Will this work?
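
A rough C sketch of that hybrid (the guard variable name and the
constructor priority are illustrative assumptions, not part of the
actual patch):

static int cpu_indicator_initialized;  /* run-once guard (assumption) */

void
__cpu_indicator_init (void)
{
  if (cpu_indicator_initialized)
    return;                            /* CPUID work happens only once */
  cpu_indicator_initialized = 1;
  /* ... CPUID probing, filling in __cpu_model/__cpu_features ... */
}

/* One such ctor per module; priority 101 (the lowest user priority) is
   an assumption meant to make it run before ordinary constructors.  */
static void __attribute__ ((constructor (101)))
cpu_indicator_ctor (void)
{
  __cpu_indicator_init ();
}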


Please make it simple like

if __cpu_indicator is not initialized then
    call __cpu_indicator_init
fi

use  __cpu_indicator
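
In C, that scheme amounts to the following sketch (the
zero-means-uninitialized encoding and the helper name are assumptions):

extern unsigned int __cpu_indicator;   /* 0 until initialized */
extern void __cpu_indicator_init (void);

static inline unsigned int
get_cpu_indicator (void)               /* hypothetical helper */
{
  if (__cpu_indicator == 0)            /* not initialized yet */
    __cpu_indicator_init ();           /* must leave a nonzero value */
  return __cpu_indicator;
}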


-- 
H.J.


Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections

2011-08-22 Thread Joseph S. Myers
On Mon, 22 Aug 2011, H.J. Lu wrote:

  Require a good assembler on ELF targets and just enable this by default
  for them without trying a configure test that won't work for cross
  compilation (AC_RUN_IFELSE is bad).
 
  The toplevel config/elf.m4 provides a good notion of what is or is not
  ELF (if there are problems, we can fix that file).  Only a handful of
  targets support non-GNU assemblers; for the vast bulk of targets we should
  assume a not-too-old GNU assembler.  That way, the configure test can be
  used to cause a configure-time error if the assembler is defective and it
  doesn't matter that the test is late.
 
 
 A working .init_array support needs assembler, linker and libc.
 That is why AC_RUN_IFELSE is used.

A working .init_array is a standard part of ELF.  The correct default for 
cross compilation to an ELF target (as determined by elf.m4) is to assume 
it is present; this is not a matter of a GNU extension that could affect 
interoperation with other tools.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-22 Thread Sriraman Tallam
On Mon, Aug 22, 2011 at 11:58 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 11:50 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 22 Aug 2011, H.J. Lu wrote:

  Oh, I thought it was data initialized by the constructor ...
 
  Sriraman's patch right now has a function __cpu_indicator_init which is
  called from (adhoc constructed) ctors and that initializes variables
  __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
 
  I think the whole initializer function and the associated data blobs 
  have
  to sit in static libgcc and be hidden.  By that all shared modules
  will have their own copies of the model and features (and the 
  initializer
  function) so there won't be issues with copy relocs, or cross shared lib
  calls while relocating the modules.  Dynamically they will contain the
  same data always, but it's not many bytes, and only modules making use 
  of
  this facility will pay it.
 
  The initializer function has to be callable from pre-.init contexts, 
  e.g.
  ifunc dispatchers.  And to make life easier there should be one ctor
  function calling this initializer function too, so that normal code can
  rely on it being already called saving one check.
 

 It sounds more complicated than necessary.  Why not just do it
 on demand like glibc does?

 Ehm, the only difference would be to not have a ctor in libgcc that looks
 like so:

 void __attribute__((constructor)) bla(void)
 {
  __cpu_indicator_init ();
 }

 I don't see any complication?


 Order of constructors.  A constructor may call functions
 which use __cpu_indicator.

 I have a suggestion that is a hybrid of the proposed solutions here:

 1) Make a constructor in every module that calls
 __cpu_indicator_init and make it the first constructor to run.
  Will this solve the ordering problem?
 2) Change __cpu_indicator_init to run only once by using a variable to
 check if it has been run before.

 So, each module's constructor will call __cpu_indicator_init but the
 CPUID insns are only done once. I also avoid the extra overhead of
 having to check if __cpu_indicator_init is called from within the
 binary. Will this work?


 Please make it simple like

 if __cpu_indicator is not initialized then
    call __cpu_indicator_init
 fi

 use  __cpu_indicator


Will do, thanks.

-Sri.


 --
 H.J.



Re: [PATCH, test, i386] Fix for PR50155

2011-08-22 Thread Uros Bizjak
On Mon, Aug 22, 2011 at 8:51 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 Attached fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50155

 ChangeLog entry:
 2011-08-22  Kirill Yukhin  kirill.yuk...@intel.com

        PR target/50155
        * config/i386/sse.md (VI1248_AVX2): New.
        (<plusminus_insn><mode>3): Update.
        (*<plusminus_insn><mode>3): Likewise.
        (<sse2_avx2>_andnot<mode>3): Likewise.
        (avx2_pbroadcast<mode>): Likewise.

 testsuite/ChangeLog entry:
 2011-08-22  Kirill Yukhin  kirill.yuk...@intel.com

        PR target/50155
        * gcc.target/i386/pr50155.c: New test.

 New test fails without the fix and passes with it applied.

 Ok for trunk if bootstrap succeeds?

No.

- you are disabling andnotps for 256bit integer modes on !TARGET_AVX2 targets.
- avx2_pbroadcast change is a no-op.

I found two additional problems with the patch:
- order of evaluation of the cond RTX in the mode attribute calculation is
wrong for the *andnot<mode>3 and *<any_logic:code><mode>3 instructions.
- the <shortmode> mode attribute is not used (minor)

Attached (lightly tested) patch fixes all of these problems and adds
additional asserts to the mentioned logic instructions.

Uros.
Index: sse.md
===
--- sse.md  (revision 177968)
+++ sse.md  (working copy)
@@ -73,6 +73,12 @@
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI])
 
+(define_mode_iterator VI_AVX2
+  [(V32QI "TARGET_AVX2") V16QI
+   (V16HI "TARGET_AVX2") V8HI
+   (V8SI "TARGET_AVX2") V4SI
+   (V4DI "TARGET_AVX2") V2DI])
+
 ;; All QImode vector integer modes
 (define_mode_iterator VI1
   [(V32QI "TARGET_AVX") V16QI])
@@ -124,8 +130,8 @@
   [V4SI V4DI])
 
 (define_mode_iterator V48_AVX2
-  [(V4SF "TARGET_SSE") (V2DF "TARGET_SSE2")
-   (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
+  [V4SF V2DF
+   V8SF V4DF
    (V4SI "TARGET_AVX2") (V2DI "TARGET_AVX2")
    (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")])
 
@@ -170,9 +176,6 @@
 (define_mode_attr ssebytemode
   [(V4DI "V32QI") (V2DI "V16QI")])
 
-(define_mode_attr shortmode
-  [(V4DI "v4si") (V2DI "v2si")])
-
 ;; All 128bit vector integer modes
 (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
 
@@ -4641,18 +4644,18 @@
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
 (define_expand "<plusminus_insn><mode>3"
-  [(set (match_operand:VI 0 "register_operand" "")
-       (plusminus:VI
-         (match_operand:VI 1 "nonimmediate_operand" "")
-         (match_operand:VI 2 "nonimmediate_operand" "")))]
+  [(set (match_operand:VI_AVX2 0 "register_operand" "")
+       (plusminus:VI_AVX2
+         (match_operand:VI_AVX2 1 "nonimmediate_operand" "")
+         (match_operand:VI_AVX2 2 "nonimmediate_operand" "")))]
   "TARGET_SSE2"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*<plusminus_insn><mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=x,x")
-       (plusminus:VI
-         (match_operand:VI 1 "nonimmediate_operand" "<comm>0,x")
-         (match_operand:VI 2 "nonimmediate_operand" "xm,xm")))]
+  [(set (match_operand:VI_AVX2 0 "register_operand" "=x,x")
+       (plusminus:VI_AVX2
+         (match_operand:VI_AVX2 1 "nonimmediate_operand" "<comm>0,x")
+         (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,xm")))]
   "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
@@ -6176,10 +6179,30 @@
 {
   static char buf[32];
   const char *ops;
-  const char *tmp
-    = ((get_attr_mode (insn) == MODE_TI) ||
-       (get_attr_mode (insn) == MODE_OI)) ? "pandn" : "andnps";
+  const char *tmp;
 
+  switch (get_attr_mode (insn))
+    {
+    case MODE_OI:
+      gcc_assert (TARGET_AVX2);
+    case MODE_TI:
+      gcc_assert (TARGET_SSE2);
+
+      tmp = "pandn";
+      break;
+
+    case MODE_V8SF:
+      gcc_assert (TARGET_AVX);
+    case MODE_V4SF:
+      gcc_assert (TARGET_SSE);
+
+      tmp = "andnps";
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
   switch (which_alternative)
     {
     case 0:
@@ -6205,12 +6228,12 @@
          (const_string "*")))
    (set_attr "prefix" "orig,vex")
    (set (attr "mode")
-       (cond [(ne (symbol_ref "GET_MODE_SIZE (<MODE>mode) > 128") (const_int 0))
+       (cond [(ne (symbol_ref "TARGET_AVX2") (const_int 0))
+                (const_string "OI")
+              (ne (symbol_ref "GET_MODE_SIZE (<MODE>mode) > 128") (const_int 0))
                 (const_string "V8SF")
               (ne (symbol_ref "TARGET_SSE2") (const_int 0))
                 (const_string "TI")
-              (ne (symbol_ref "TARGET_AVX2") (const_int 0))
-                (const_string "OI")
              ]
              (const_string "V4SF")))])
 
@@ -6232,10 +6255,30 @@
 {
   static char buf[32];
   const char *ops;
-  const char *tmp
-    = (get_attr_mode (insn) == MODE_TI)||
-      (get_attr_mode (insn) == MODE_OI) ? "p<logic>" : "<logic>ps";
+  const char *tmp;
 
+  switch (get_attr_mode (insn))
+    {
+    case MODE_OI:
+      gcc_assert (TARGET_AVX2);
+    case MODE_TI:
+      gcc_assert (TARGET_SSE2);
+
+      tmp = "p<logic>";
+      break;
+
+    case MODE_V8SF:
+      gcc_assert (TARGET_AVX);
Re: [M32C] Hookize CLASS_MAX_NREGS

2011-08-22 Thread DJ Delorie

   OK to install?
 
 * config/m32c/m32c.h (CLASS_MAX_NREGS): Remove macro.
 * config/m32c/m32c-protos.h (m32c_class_max_nregs): Remove.
 * config/m32c/m32c.c (m32c_class_max_nregs): Make static. Change
 regclass argument type to reg_class_t. Change 'max' and 'v' vars
 and return types to unsigned char. Use reg_class_contents instead
 of class_contents.
 (TARGET_CLASS_MAX_NREGS): Define.

Ok.


Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-22 Thread Uros Bizjak
On Mon, Aug 22, 2011 at 6:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

 You don't need to add negated versions, one FMA builtin per mode is
 enough, please see existing FMA4 descriptions. Just put unary minus
 sign in the intrinsics header for negated operand and let GCC do its
 job. Please see existing FMA4 intrinsics header.

 Actually I tried that. But in such a case, when I compile (FMA4 example)
 #include <x86intrin.h>
 extern __m128 a,b,c;
 void foo(){
   a = _mm_nmsub_ps(a,b,c);
 }
 with -S -O0 -mfma4
 the asm has

        vxorps  %xmm1, %xmm0, %xmm0
        vmovaps -16(%rbp), %xmm1
        vmovaps .LC0(%rip), %xmm2
        vxorps  %xmm2, %xmm1, %xmm1
        vfmaddps        %xmm0, -32(%rbp), %xmm1, %xmm0
 So vfmaddps of negated values is generated instead of vfnmsubps.
 I think it is bad that an intrinsic for an instruction can generate code
 without that instruction.
 So, to make sure that the exact instruction is always generated, I
 introduced additional expands and builtins.
 Is that wrong?

This is an artificial limitation. The user requested the functionality
of the intrinsic and should not have to care how the compiler realizes
it. With -O2, the negation would propagate into the insn during the
combine pass, and the optimal instruction would be generated.

So, to answer your question - it is wrong to expect an exact instruction
from a builtin. Maybe at -O0, but that should not be used in the
testsuite anyway.
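
For illustration, a sketch of the header-only approach, modeled on
GCC's existing fma4intrin.h (the helper name is hypothetical, the code
must be compiled with -mfma4, and the exact builtin signature should be
double-checked against the header):

#include <x86intrin.h>

/* -(a*b) - c: negating both A and C feeds the generic fmadd builtin;
   at -O2 the combine pass folds the negations and emits vfnmsubps.  */
extern __inline __m128
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
my_nmsub_ps (__m128 __A, __m128 __B, __m128 __C)
{
  return (__m128) __builtin_ia32_vfmaddps (-(__v4sf) __A, (__v4sf) __B,
                                           -(__v4sf) __C);
}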

Uros.


Re: [C++ PATCH] Clear TYPE_TRANSPARENT_AGGR if there are no fields (PR c++/46862)

2011-08-22 Thread Jason Merrill

OK.

Jason


Re: Vector Comparison patch

2011-08-22 Thread Uros Bizjak
On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther
richard.guent...@gmail.com wrote:

 In this case it is simple to analyse that a is a comparison, but you
 cannot embed the operations of a into VEC_COND_EXPR.

 Sure, but if the above is C source the frontend would generate
 res = a != 0 ? v0 : v1; initially.  An optimization pass could still
 track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
 with VEC_COND_EXPR <a, v0, v1> (no existing one would track
 vector contents though).
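
Concretely, with the generic vector extension the source form under
discussion is the following sketch (whether ?: accepts vector operands
at all is part of what this patch series settles):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
vec_select (v4si a, v4si v0, v4si v1)
{
  /* The front end lowers this to VEC_COND_EXPR <a != 0, v0, v1>; the
     pass described above could drop the redundant "!= 0" when a is
     already known to hold a comparison result.  */
  return a != 0 ? v0 : v1;
}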

 Ok, I am testing the patch that removes hooks. Could you push a little
 bit the backend-patterns business?

 Well, I suppose we're waiting for Uros here.  I hadn't much luck with
 fiddling with the mode-iterator stuff myself.

It is not _that_ trivial a change, since we have ix86_expand_fp_vcond
and ix86_expand_int_vcond to merge. ATM, the FP version deals with FP
operands only and vice versa. We have to merge them somehow and split
out the comparison part so that it handles FP as well as integer
operands.

I also don't know why vcond is not allowed to FAIL... probably the
middle-end should be enhanced with a fallback for comparisons that
aren't supported by the optab.

Uros.


Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-22 Thread Richard Guenther
On Mon, Aug 22, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 22 Aug 2011, H.J. Lu wrote:

  Oh, I thought it was data initialized by the constructor ...
 
  Sriraman's patch right now has a function __cpu_indicator_init which is
  called from (adhoc constructed) ctors and that initializes variables
  __cpu_model and __cpu_features ;-)  There's no __cpu_indicator symbol :)
 
  I think the whole initializer function and the associated data blobs have
  to sit in static libgcc and be hidden.  By that all shared modules
  will have their own copies of the model and features (and the initializer
  function) so there won't be issues with copy relocs, or cross shared lib
  calls while relocating the modules.  Dynamically they will contain the
  same data always, but it's not many bytes, and only modules making use of
  this facility will pay it.
 
  The initializer function has to be callable from pre-.init contexts, e.g.
  ifunc dispatchers.  And to make life easier there should be one ctor
  function calling this initializer function too, so that normal code can
  rely on it being already called saving one check.
 

 It sounds more complicated than necessary.  Why not just do it
 on demand like glibc does?

 Ehm, the only difference would be to not have a ctor in libgcc that looks
 like so:

 void __attribute__((constructor)) bla(void)
 {
  __cpu_indicator_init ();
 }

 I don't see any complication?


 Order of constructors.  A constructor may call functions
 which use __cpu_indicator.

As I said - make __cpu_indicator have a conservative
default value (zero).  It is irrelevant if constructors that
run before initializing __cpu_indicator run with the
default CPU capabilities.

Richard.

 --
 H.J.


