Re: RFC: [build, ada] Centralize PICFLAG configuration
On 08/19/2011 09:11 PM, Rainer Orth wrote:

	2011-07-31  Rainer Orth  <r...@cebitec.uni-bielefeld.de>

		config:
		* picflag.m4: New file.

		gcc:
		* configure.ac (GCC_PICFLAG_FOR_TARGET): Call it.
		(PICFLAG_FOR_TARGET): Substitute.
		* aclocal.m4: Regenerate.
		* configure: Regenerate.

		gcc/ada:
		* gcc-interface/Makefile.in (PICFLAG_FOR_TARGET): New.
		(GNATLIBCFLAGS_FOR_C): Replace TARGET_LIBGCC2_CFLAGS by
		PICFLAG_FOR_TARGET.
		(gnatlib-shared-default, gnatlib-shared-dual-win32)
		(gnatlib-shared-win32, gnatlib-shared-darwin, gnatlib-shared)
		(gnatlib-sjlj, gnatlib-zcx): Likewise.

		libada:
		* configure.ac: Include ../config/picflag.m4.
		(GCC_PICFLAG): Call it.  Substitute.
		* configure: Regenerate.
		* Makefile.in (TARGET_LIBGCC2_CFLAGS): Replace by PICFLAG.
		(GNATLIBCFLAGS_FOR_C): Replace TARGET_LIBGCC2_CFLAGS by PICFLAG.
		(LIBADA_FLAGS_TO_PASS): Pass PICFLAG as PICFLAG_FOR_TARGET.
		Don't include $(GCC_DIR)/libgcc.mvars.

		libiberty:
		* aclocal.m4: Include ../config/picflag.m4.
		* configure.ac (GCC_PICFLAG): Call it.
		(enable_shared): Clear PICFLAG unless shared.
		* configure: Regenerate.

Ok, thanks.

Paolo
Various minor speed-ups
Hello list,

The following patches are a selection of minor changes introduced at various times during my GSoC project.  Most are too simple or too unimportant to be posted alone, so I'll post them all together in this thread.  Nevertheless, they have been carefully selected from a pool of other changes: they are the ones that *do* offer some (minor) speed improvement while having the least impact on memory usage, if any.  They have all been tested on x86_64, some also on i386.  For production builds I have seen no regressions introduced.

Thanks,
Dimitris
mem_attrs_htab
2011-08-22  Dimitrios Apostolou  <ji...@gmx.net>

	* emit-rtl.c (mem_attrs_htab_hash): Hash massively by calling
	iterative_hash().  We disregard the offset and size rtx fields of
	the mem_attrs struct, but overall this hash is a *huge*
	improvement over the previous one; it reduces the
	collisions/searches ratio from 8 to 0.8 for some cases.
	(init_emit_once): Slightly increase the mem_attrs_htab initial
	size because it's frequently used and expanded many times.

=== modified file 'gcc/emit-rtl.c'
--- gcc/emit-rtl.c	2011-05-29 17:40:05 +0000
+++ gcc/emit-rtl.c	2011-08-21 04:44:25 +0000
@@ -256,11 +256,10 @@
 mem_attrs_htab_hash (const void *x)
 {
   const mem_attrs *const p = (const mem_attrs *) x;
 
-  return (p->alias ^ (p->align * 1000)
-	  ^ (p->addrspace * 4000)
-	  ^ ((p->offset ? INTVAL (p->offset) : 0) * 5)
-	  ^ ((p->size ? INTVAL (p->size) : 0) * 250)
-	  ^ (size_t) iterative_hash_expr (p->expr, 0));
+  /* By massively feeding the mem_attrs struct to iterative_hash() we
+     disregard the p->offset and p->size rtx, but in total the hash is
+     quick and good enough.  */
+  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));
 }
 
 /* Returns nonzero if the value represented by X (which is really a
@@ -5494,7 +5500,7 @@ init_emit_once (void)
   const_fixed_htab = htab_create_ggc (37, const_fixed_htab_hash,
				       const_fixed_htab_eq, NULL);
 
-  mem_attrs_htab = htab_create_ggc (37, mem_attrs_htab_hash,
+  mem_attrs_htab = htab_create_ggc (128, mem_attrs_htab_hash,
				     mem_attrs_htab_eq, NULL);
 
   reg_attrs_htab = htab_create_ggc (37, reg_attrs_htab_hash,
				     reg_attrs_htab_eq, NULL);
graphds.[ch]: alloc_pool for edges
free() was called far too often before; this patch reduces that significantly.  There is a minor speed-up here too; I don't mention it individually since the numbers are within noise margins.

2011-08-22  Dimitrios Apostolou  <ji...@gmx.net>

	* graphds.h (struct graph): Added edge_pool as a pool for
	allocating edges.
	* graphds.c (new_graph): Initialise edge_pool.
	(add_edge): Allocate edge from edge_pool rather than with malloc.
	(free_graph): Instead of iterating across the graph freeing edges,
	just destroy the edge_pool.

=== modified file 'gcc/graphds.c'
--- gcc/graphds.c	2009-11-25 10:55:54 +0000
+++ gcc/graphds.c	2011-08-19 16:44:41 +0000
@@ -62,7 +62,8 @@ new_graph (int n_vertices)
 
   g->n_vertices = n_vertices;
   g->vertices = XCNEWVEC (struct vertex, n_vertices);
-
+  g->edge_pool = create_alloc_pool ("edge_pool",
+				    sizeof (struct graph_edge), 32);
   return g;
 }
 
@@ -71,7 +72,7 @@ new_graph (int n_vertices)
 struct graph_edge *
 add_edge (struct graph *g, int f, int t)
 {
-  struct graph_edge *e = XNEW (struct graph_edge);
+  struct graph_edge *e = (struct graph_edge *) pool_alloc (g->edge_pool);
   struct vertex *vf = &g->vertices[f], *vt = &g->vertices[t];
 
@@ -326,19 +327,7 @@ for_each_edge (struct graph *g, graphds_
 void
 free_graph (struct graph *g)
 {
-  struct graph_edge *e, *n;
-  struct vertex *v;
-  int i;
-
-  for (i = 0; i < g->n_vertices; i++)
-    {
-      v = &g->vertices[i];
-      for (e = v->succ; e; e = n)
-	{
-	  n = e->succ_next;
-	  free (e);
-	}
-    }
+  free_alloc_pool (g->edge_pool);
   free (g->vertices);
   free (g);
 }

=== modified file 'gcc/graphds.h'
--- gcc/graphds.h	2009-02-20 15:20:38 +0000
+++ gcc/graphds.h	2011-08-19 16:44:41 +0000
@@ -18,6 +18,10 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+
+#include "alloc-pool.h"
+
+
 /* Structure representing edge of a graph.  */
 
 struct graph_edge
@@ -44,10 +48,10 @@ struct vertex
 
 struct graph
 {
-  int n_vertices;	/* Number of vertices.  */
-  struct vertex *vertices;
-			/* The vertices.  */
-  htab_t indices;	/* Fast lookup for indices.  */
+  int n_vertices;	   /* Number of vertices.  */
+  struct vertex *vertices; /* The vertices.  */
+  htab_t indices;	   /* Fast lookup for indices.  */
+  alloc_pool edge_pool;	   /* Pool for allocating edges.  */
 };

struct graph *new_graph (int);
tree-ssa*: reduce malloc() calls by preallocating hot VECs on the stack
2011-08-22  Dimitrios Apostolou  <ji...@gmx.net>

	Allocate some very frequently used vectors on the stack:
	* vecir.h: Defined a tree vector on the stack.
	* tree-ssa-sccvn.c (print_scc, sort_scc, process_scc)
	(extract_and_process_scc_for_name): Allocate the scc vector on the
	stack instead of the heap, giving it a minimal initial size
	instead of 0.
	* tree-ssa-structalias.c (get_constraint_for_1)
	(get_constraint_for, get_constraint_for_rhs, do_deref)
	(get_constraint_for_ssa_var, get_constraint_for_ptr_offset)
	(get_constraint_for_component_ref, get_constraint_for_address_of)
	(process_all_all_constraints, do_structure_copy)
	(make_constraints_to, make_constraint_to, handle_rhs_call)
	(handle_lhs_call, handle_const_call, handle_pure_call)
	(find_func_aliases_for_builtin_call, find_func_aliases_for_call)
	(find_func_aliases, process_ipa_clobber, find_func_clobbers)
	(create_variable_info_for): Converted the rhsc, lhsc vectors from
	heap to stack, with a minimal initial size, since they were very
	frequently allocated.

=== modified file 'gcc/tree-ssa-structalias.c'
--- gcc/tree-ssa-structalias.c	2011-08-18 06:53:12 +0000
+++ gcc/tree-ssa-structalias.c	2011-08-19 09:43:41 +0000
@@ -477,11 +477,14 @@ struct constraint_expr
 typedef struct constraint_expr ce_s;
 DEF_VEC_O(ce_s);
-DEF_VEC_ALLOC_O(ce_s, heap);
-static void get_constraint_for_1 (tree, VEC(ce_s, heap) **, bool, bool);
-static void get_constraint_for (tree, VEC(ce_s, heap) **);
-static void get_constraint_for_rhs (tree, VEC(ce_s, heap) **);
-static void do_deref (VEC (ce_s, heap) **);
+DEF_VEC_ALLOC_O_STACK(ce_s);
+#define VEC_ce_s_stack_alloc(alloc) \
+  VEC_stack_alloc (ce_s, alloc)
+
+static void get_constraint_for_1 (tree, VEC(ce_s, stack) **, bool, bool);
+static void get_constraint_for (tree, VEC(ce_s, stack) **);
+static void get_constraint_for_rhs (tree, VEC(ce_s, stack) **);
+static void do_deref (VEC (ce_s, stack) **);
 
 /* Our set constraints are made up of two constraint expressions, one
    LHS, and one RHS.
@@ -2736,7 +2739,7 @@ new_scalar_tmp_constraint_exp (const cha
    If address_p is true, the result will be taken its address of.  */
 
 static void
-get_constraint_for_ssa_var (tree t, VEC(ce_s, heap) **results, bool address_p)
+get_constraint_for_ssa_var (tree t, VEC(ce_s, stack) **results, bool address_p)
 {
   struct constraint_expr cexpr;
   varinfo_t vi;
@@ -2776,12 +2779,12 @@ get_constraint_for_ssa_var (tree t, VEC(
       for (; vi; vi = vi->next)
	{
	  cexpr.var = vi->id;
-	  VEC_safe_push (ce_s, heap, *results, cexpr);
+	  VEC_safe_push (ce_s, stack, *results, cexpr);
	}
       return;
     }
 
-  VEC_safe_push (ce_s, heap, *results, cexpr);
+  VEC_safe_push (ce_s, stack, *results, cexpr);
 }
 
 /* Process constraint T, performing various simplifications and then
@@ -2861,7 +2864,7 @@ bitpos_of_field (const tree fdecl)
 
 static void
 get_constraint_for_ptr_offset (tree ptr, tree offset,
-			       VEC (ce_s, heap) **results)
+			       VEC (ce_s, stack) **results)
 {
   struct constraint_expr c;
   unsigned int j, n;
@@ -2920,7 +2923,7 @@ get_constraint_for_ptr_offset (tree ptr,
	      c2.type = ADDRESSOF;
	      c2.offset = 0;
	      if (c2.var != c.var)
-		VEC_safe_push (ce_s, heap, *results, c2);
+		VEC_safe_push (ce_s, stack, *results, c2);
	      temp = temp->next;
	    }
	  while (temp);
@@ -2955,7 +2958,7 @@ get_constraint_for_ptr_offset (tree ptr,
	      c2.var = temp->next->id;
	      c2.type = ADDRESSOF;
	      c2.offset = 0;
-	      VEC_safe_push (ce_s, heap, *results, c2);
+	      VEC_safe_push (ce_s, stack, *results, c2);
	    }
	  c.var = temp->id;
	  c.offset = 0;
@@ -2974,7 +2977,7 @@ get_constraint_for_ptr_offset (tree ptr,
    as the lhs.  */
 
 static void
-get_constraint_for_component_ref (tree t, VEC(ce_s, heap) **results,
+get_constraint_for_component_ref (tree t, VEC(ce_s, stack) **results,
				  bool address_p, bool lhs_p)
 {
   tree orig_t = t;
@@ -2999,7 +3002,7 @@ get_constraint_for_component_ref (tree t
       temp.offset = 0;
       temp.var = integer_id;
       temp.type = SCALAR;
-      VEC_safe_push (ce_s, heap, *results, temp);
+      VEC_safe_push (ce_s, stack, *results, temp);
       return;
     }
 
@@ -3021,7 +3024,7 @@ get_constraint_for_component_ref (tree t
	  temp.offset = 0;
	  temp.var = anything_id;
	  temp.type = ADDRESSOF;
-	  VEC_safe_push (ce_s, heap, *results, temp);
+	  VEC_safe_push (ce_s, stack, *results, temp);
	  return;
	}
     }
@@ -3062,7 +3065,7 @@ get_constraint_for_component_ref (tree t
					bitpos, bitmaxsize))
tree-ssa-structalias.c: alloc_pool for struct equiv_class_label
2011-08-22  Dimitrios Apostolou  <ji...@gmx.net>

	* tree-ssa-structalias.c (equiv_class_add)
	(perform_var_substitution, free_var_substitution_info): Created a
	new equiv_class_pool allocator pool for struct equiv_class_label.
	Changed the pointer_equiv_class_table and
	location_equiv_class_table hash tables to not iterate freeing all
	elements in the end, but just free the pool.
[PATCH] make assign_hard_reg's saved_nregs conditional to unbreak ARM bootstrap (PR50146)
As described in PR50146, a recent change to ira-color.c caused trunk to fail to bootstrap on ARM.  The issue is that a new variable, saved_nregs, is declared unconditionally but used only when HONOR_REG_ALLOC_ORDER is not defined.  Since the ARM backend defines HONOR_REG_ALLOC_ORDER, an 'unused variable' warning results, which becomes a hard error in stage 2.

Fixed by moving the declaration to the #ifndef HONOR_REG_ALLOC_ORDER block of variable declarations.  With this in place, bootstrap succeeds on armv5tel-linux-gnueabi.

Patch was pre-approved by Vladimir Makarov in the PR trail, but I don't have svn write access so I'll need help to commit it.

/Mikael

gcc/

2011-08-22  Mikael Pettersson  <mi...@it.uu.se>

	PR bootstrap/50146
	* ira-color.c (assign_hard_reg): Move saved_nregs declaration
	to #ifndef HONOR_REG_ALLOC_ORDER block.

--- gcc-4.7-20110820/gcc/ira-color.c.~1~	2011-08-18 16:56:36.0 +0200
+++ gcc-4.7-20110820/gcc/ira-color.c	2011-08-21 19:11:00.0 +0200
@@ -1567,13 +1567,14 @@
 static bool
 assign_hard_reg (ira_allocno_t a, bool retry_p)
 {
   HARD_REG_SET conflicting_regs[2], profitable_hard_regs[2];
-  int i, j, hard_regno, best_hard_regno, class_size, saved_nregs;
+  int i, j, hard_regno, best_hard_regno, class_size;
   int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word;
   int *a_costs;
   enum reg_class aclass;
   enum machine_mode mode;
   static int costs[FIRST_PSEUDO_REGISTER], full_costs[FIRST_PSEUDO_REGISTER];
 #ifndef HONOR_REG_ALLOC_ORDER
+  int saved_nregs;
   enum reg_class rclass;
   int add_cost;
 #endif
Re: mem_attrs_htab
On Mon, Aug 22, 2011 at 10:32:35AM +0300, Dimitrios Apostolou wrote:

	--- gcc/emit-rtl.c	2011-05-29 17:40:05 +0000
	+++ gcc/emit-rtl.c	2011-08-21 04:44:25 +0000
	@@ -256,11 +256,10 @@
	 mem_attrs_htab_hash (const void *x)
	 {
	   const mem_attrs *const p = (const mem_attrs *) x;
	-  return (p->alias ^ (p->align * 1000)
	-	  ^ (p->addrspace * 4000)
	-	  ^ ((p->offset ? INTVAL (p->offset) : 0) * 5)
	-	  ^ ((p->size ? INTVAL (p->size) : 0) * 250)
	-	  ^ (size_t) iterative_hash_expr (p->expr, 0));
	+  /* By massively feeding the mem_attrs struct to iterative_hash() we
	+     disregard the p->offset and p->size rtx, but in total the hash is
	+     quick and good enough.  */
	+  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));
	 }

	 /* Returns nonzero if the value represented by X (which is really a

This patch isn't against the trunk, where p->offset and p->size aren't rtxes anymore, but HOST_WIDE_INTs.  Furthermore, it is a bad idea to hash the p->expr address itself; it doesn't make any sense to hash on what p->expr points to in that case.  And p->offset and p->size should be ignored if the corresponding *known_p fields are false.  So, if you really think using iterative_hash_object is a win, it should be something like:

	mem_attrs q = *p;
	q.expr = NULL;
	if (!q.offset_known_p)
	  q.offset = 0;
	if (!q.size_known_p)
	  q.size = 0;
	return iterative_hash_object (q, iterative_hash_expr (p->expr, 0));

(or better yet avoid q.expr = NULL and instead start hashing from the next field after expr).  Hashing the struct padding might not be a good idea either.

	Jakub
Re: Various minor speed-ups
2011-08-22  Dimitrios Apostolou  <ji...@gmx.net>

	* tree-ssa-pre.c (phi_trans_add, init_pre, fini_pre): Added a pool
	for phi_translate_table elements to avoid free() calls from
	htab_delete().

=== modified file 'gcc/tree-ssa-pre.c'
--- gcc/tree-ssa-pre.c	2011-05-04 09:04:53 +0000
+++ gcc/tree-ssa-pre.c	2011-08-17 08:43:23 +0000
@@ -515,6 +515,10 @@ typedef struct expr_pred_trans_d
 } *expr_pred_trans_t;
 typedef const struct expr_pred_trans_d *const_expr_pred_trans_t;
 
+/* Pool of memory for the above.  */
+
+static alloc_pool phi_translate_pool;
+
 /* Return the hash value for a phi translation table entry.  */
 
 static hashval_t
@@ -571,7 +575,8 @@ static inline void
 phi_trans_add (pre_expr e, pre_expr v, basic_block pred)
 {
   void **slot;
-  expr_pred_trans_t new_pair = XNEW (struct expr_pred_trans_d);
+  expr_pred_trans_t new_pair
+    = (expr_pred_trans_t) pool_alloc (phi_translate_pool);
   new_pair->e = e;
   new_pair->pred = pred;
   new_pair->v = v;
@@ -580,7 +585,8 @@ phi_trans_add (pre_expr e, pre_expr v, b
   slot = htab_find_slot_with_hash (phi_translate_table, new_pair,
				   new_pair->hashcode, INSERT);
-  free (*slot);
+  if (*slot)
+    pool_free (phi_translate_pool, *slot);
   *slot = (void *) new_pair;
 }
 
@@ -4804,8 +4810,12 @@ init_pre (bool do_fre)
   calculate_dominance_info (CDI_DOMINATORS);
 
   bitmap_obstack_initialize (&grand_bitmap_obstack);
+  phi_translate_pool = create_alloc_pool ("phi_translate_table pool",
+					  sizeof (struct expr_pred_trans_d),
+					  4096);
+  /* NULL as free function because we'll free the whole pool in the end.  */
   phi_translate_table = htab_create (5110, expr_pred_trans_hash,
-				     expr_pred_trans_eq, free);
+				     expr_pred_trans_eq, NULL);
   expression_to_id = htab_create (num_ssa_names * 3, pre_expr_hash,
				   pre_expr_eq, NULL);
@@ -4839,6 +4849,7 @@ fini_pre (bool do_fre)
   free_alloc_pool (bitmap_set_pool);
   free_alloc_pool (pre_expr_pool);
   htab_delete (phi_translate_table);
+  free_alloc_pool (phi_translate_pool);
   htab_delete (expression_to_id);
   VEC_free (unsigned, heap, name_to_id);
Re: graphds.[ch]: alloc_pool for edges
On Mon, Aug 22, 2011 at 10:37:58AM +0300, Dimitrios Apostolou wrote:

	--- gcc/graphds.h	2009-02-20 15:20:38 +0000
	+++ gcc/graphds.h	2011-08-19 16:44:41 +0000
	@@ -18,6 +18,10 @@ You should have received a copy of the G
	 along with GCC; see the file COPYING3.  If not see
	 <http://www.gnu.org/licenses/>.  */
	+
	+#include "alloc-pool.h"
	+
	+

This needs to be reflected in Makefile.in; we unfortunately don't have automatic dependency generation for gcc.  So, create

	GRAPHDS_H = graphds.h alloc-pool.h

and use $(GRAPHDS_H) everywhere graphds.h has been used so far.

	Jakub
Re: tree-ssa-structalias.c: alloc_pool for struct equiv_class_label
Forgot the patch...

On Mon, 22 Aug 2011, Dimitrios Apostolou wrote:

2011-08-22  Dimitrios Apostolou  <ji...@gmx.net>

	* tree-ssa-structalias.c (equiv_class_add)
	(perform_var_substitution, free_var_substitution_info): Created a
	new equiv_class_pool allocator pool for struct equiv_class_label.
	Changed the pointer_equiv_class_table and
	location_equiv_class_table hash tables to not iterate freeing all
	elements in the end, but just free the pool.

=== modified file 'gcc/tree-ssa-structalias.c'
--- gcc/tree-ssa-structalias.c	2011-04-29 10:59:33 +0000
+++ gcc/tree-ssa-structalias.c	2011-08-18 06:53:12 +0000
@@ -1899,6 +1899,9 @@ static htab_t pointer_equiv_class_table;
    classes.  */
 static htab_t location_equiv_class_table;
 
+/* Pool of memory for storing the above.  */
+static alloc_pool equiv_class_pool;
+
 /* Hash function for a equiv_class_label_t */
 
 static hashval_t
@@ -1948,7 +1951,8 @@ equiv_class_add (htab_t table, unsigned
		 bitmap labels)
 {
   void **slot;
-  equiv_class_label_t ecl = XNEW (struct equiv_class_label);
+  equiv_class_label_t ecl
+    = (equiv_class_label_t) pool_alloc (equiv_class_pool);
 
   ecl->labels = labels;
   ecl->equivalence_class = equivalence_class;
@@ -2159,10 +2163,14 @@ perform_var_substitution (constraint_gra
   struct scc_info *si = init_scc_info (size);
 
   bitmap_obstack_initialize (&iteration_obstack);
+  equiv_class_pool = create_alloc_pool ("equiv_class_label pool",
+					sizeof (struct equiv_class_label),
+					64);
+  /* NULL free function, we'll free the whole pool at the end of the
+     pass.  */
   pointer_equiv_class_table = htab_create (511, equiv_class_label_hash,
-					   equiv_class_label_eq, free);
+					   equiv_class_label_eq, NULL);
   location_equiv_class_table = htab_create (511, equiv_class_label_hash,
-					    equiv_class_label_eq, free);
+					    equiv_class_label_eq, NULL);
   pointer_equiv_class = 1;
   location_equiv_class = 1;
@@ -2269,6 +2277,7 @@ free_var_substitution_info (struct scc_i
   sbitmap_free (graph->direct_nodes);
   htab_delete (pointer_equiv_class_table);
   htab_delete (location_equiv_class_table);
+  free_alloc_pool (equiv_class_pool);
   bitmap_obstack_release (&iteration_obstack);
 }
Re: mem_attrs_htab
Hi Jakub,

I forgot to mention that all patches are against mid-July trunk; I was hoping I'd have no conflicts.  Anyway, thanks for letting me know.  If there are conflicts with my other patches, please let me know and I'll post an updated version at a later date.

All your other concerns are valid and I'll try addressing them in the future.  I didn't like hashing addresses either, and I was surprised I saw no regressions.

Dimitris

	This patch isn't against the trunk, where p->offset and p->size
	aren't rtxes anymore, but HOST_WIDE_INTs.  Furthermore, it is a bad
	idea to hash the p->expr address itself; it doesn't make any sense
	to hash on what p->expr points to in that case.  And p->offset and
	p->size should be ignored if the corresponding *known_p fields are
	false.  So, if you really think using iterative_hash_object is a
	win, it should be something like:

		mem_attrs q = *p;
		q.expr = NULL;
		if (!q.offset_known_p)
		  q.offset = 0;
		if (!q.size_known_p)
		  q.size = 0;
		return iterative_hash_object (q, iterative_hash_expr (p->expr, 0));

	(or better yet avoid q.expr = NULL and instead start hashing from
	the next field after expr).  Hashing the struct padding might not
	be a good idea either.

		Jakub
Re: mem_attrs_htab
On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote:

	... the future.  I didn't like hashing addresses either, and I was
	surprised I saw no regressions.

Hashing on the expr address as well just results in less sharing in the hash table (i.e. when an expr has a different address but is considered equal).  The hashing of mem attrs is done just to reduce memory overhead.

	Jakub
Re: Various minor speed-ups
For whoever is concerned about memory usage: I didn't measure a real increase, besides a few KB.  These are very hot allocation pools, and allocating too many blocks of 10 elements is suboptimal.

2011-08-22  Dimitrios Apostolou  <ji...@gmx.net>

	* cselib.c (cselib_init): Increased initial size of elt_list_pool,
	elt_loc_list_pool, cselib_val_pool, value_pool allocation pools
	since they are very frequently used.

=== modified file 'gcc/cselib.c'
--- gcc/cselib.c	2011-05-31 19:14:21 +0000
+++ gcc/cselib.c	2011-08-17 14:03:56 +0000
@@ -2484,12 +2484,12 @@ void
 cselib_init (int record_what)
 {
   elt_list_pool = create_alloc_pool ("elt_list",
-				     sizeof (struct elt_list), 10);
+				     sizeof (struct elt_list), 128);
   elt_loc_list_pool = create_alloc_pool ("elt_loc_list",
-					 sizeof (struct elt_loc_list), 10);
+					 sizeof (struct elt_loc_list), 128);
   cselib_val_pool = create_alloc_pool ("cselib_val_list",
-				       sizeof (cselib_val), 10);
-  value_pool = create_alloc_pool ("value", RTX_CODE_SIZE (VALUE), 100);
+				       sizeof (cselib_val), 128);
+  value_pool = create_alloc_pool ("value", RTX_CODE_SIZE (VALUE), 128);
   cselib_record_memory = record_what & CSELIB_RECORD_MEMORY;
   cselib_preserve_constants = record_what & CSELIB_PRESERVE_CONSTANTS;
cse.c: preferable()
The attached patch is also posted at bug #19832 and I think resolves it, as well as maybe offering a negligible speedup of 3-4 M instructions, or a couple of milliseconds.  I also post it here for comments.

2011-08-13  Dimitrios Apostolou  <ji...@gmx.net>

	* cse.c (preferable): Make it more readable and slightly faster,
	without affecting its logic.

=== modified file 'gcc/cse.c'
--- gcc/cse.c	2011-06-02 21:52:46 +0000
+++ gcc/cse.c	2011-08-13 00:54:06 +0000
@@ -720,32 +720,25 @@ approx_reg_cost (rtx x)
 static int
 preferable (int cost_a, int regcost_a, int cost_b, int regcost_b)
 {
-  /* First, get rid of cases involving expressions that are entirely
-     unwanted.  */
-  if (cost_a != cost_b)
-    {
-      if (cost_a == MAX_COST)
-	return 1;
-      if (cost_b == MAX_COST)
-	return -1;
-    }
+  int cost_diff = cost_a - cost_b;
+  int regcost_diff = regcost_a - regcost_b;
 
-  /* Avoid extending lifetimes of hardregs.  */
-  if (regcost_a != regcost_b)
+  if (cost_diff != 0)
     {
-      if (regcost_a == MAX_COST)
-	return 1;
-      if (regcost_b == MAX_COST)
-	return -1;
+      /* If none of the expressions are entirely unwanted...  */
+      if ((cost_a != MAX_COST) && (cost_b != MAX_COST)
+	  /* ...AND only one of the regs is a HARD_REG...  */
+	  && (regcost_diff != 0)
+	  && ((regcost_a == MAX_COST) || (regcost_b == MAX_COST)))
+	/* ...then avoid extending the lifetime of the HARD_REG.  */
+	return regcost_diff;
+
+      return cost_diff;
     }
 
-  /* Normal operation costs take precedence.  */
-  if (cost_a != cost_b)
-    return cost_a - cost_b;
-  /* Only if these are identical consider effects on register pressure.  */
-  if (regcost_a != regcost_b)
-    return regcost_a - regcost_b;
-  return 0;
+  /* cost_a == cost_b; consider effects on register pressure.  */
+  return regcost_diff;
 }
 
 /* Internal function, to compute cost when X is not a register; called
Re: Use of vector instructions in memmov/memset expanding
Ping. On 18 July 2011 15:00, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Here is a summary - probably, it doesn't cover every single piece in the patch, but I tried to describe the major changes. I hope this will help you a bit - and of course I'll answer your further questions if they appear. The changes could be logically divided into two parts (though, these parts have something in common). The first part is changes in target-independent part, in functions move_by_pieces() and store_by_pieces() - mostly located in expr.c. The second part touches ix86_expand_movmem() and ix86_expand_setmem() - mostly located in config/i386/i386.c. Changes in i386.c (target-dependent part): 1) Strategies for cases with known and unknown alignment are separated from each other. When alignment is known at compile time, we could generate optimized code without libcalls. When it's unknown, we sometimes could create runtime-checks to reach desired alignment, but not always. Strategies for atom and generic_32, generic_64 were chosen according to set of experiments, strategies in other cost models are unchanged (strategies for unknown alignment are copied from existing strategies). 2) unrolled_loop algorithm was modified - now it uses SSE move-modes, if they're available. 3) As size of data, moved in one iteration, greatly increased, and epilogues became bigger - so some changes were needed in epilogue generation. In some cases a special loop (not unrolled) is generated in epilogue to avoid slow copying by bytes (changes in expand_set_or_movmem_via_loop() and introducing of expand_set_or_movmem_via_loop_with_iter() is made for these cases). 4) As bigger alignment might be needed than previously, prologue generation was also modified. Changes in expr.c (target-independent part): There are two possible strategies now: use of aligned and unaligned moves. For each of them a cost model was implemented and the choice is made according to the cost of each option. 
Move-mode choice is made by functions widest_mode_for_unaligned_mov() and widest_mode_for_aligned_mov(). Cost estimation is implemented in functions compute_aligned_cost() and compute_unaligned_cost(). Choice between these two strategies and the generation of moves themselves are in function move_by_pieces(). Function store_by_pieces() calls set_by_pieces_1() instead of store_by_pieces_1(), if this is memset-case (I needed to introduce set_by_pieces_1 to separate memset-case from others - store_by_pieces_1 is sometimes called for strcpy and some other functions, not only for memset). Set_by_pieces_1() estimates costs of aligned and unaligned strategies (as in move_by_pieces() ) and generates moves for memset. Single move is generated via generate_move_with_mode(). If it's called first time, a promoted value (register, filled with one-byte value of memset argument) is generated - later calls reuse this value. Changes in MD-files: For generation of promoted values, I made some changes in promote_duplicated_reg() and promote_duplicated_reg_to_size(). Expands for vec_dup4si and vec_dupv2di were introduced for this too (these expands differ from corresponding define_insns - existing define_insn work only with registers, while new expands could process memory operand as well). Some code were added to allow generation of MOVQ (with SSE-registers) - such moves aren't usual ones, because they use only half of xmm-register. There was a need to generate such moves explicitly, so I added a simple expand to sse.md. On 16 July 2011 03:24, Jan Hubicka hubi...@ucw.cz wrote: New algorithm for move-mode selection is implemented for move_by_pieces, store_by_pieces. x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in similar way, x86 cost-models parameters are slightly changed to support this. This implementation checks if array's alignment is known at compile time and chooses expanding algorithm and move-mode according to it. 
Can you give some sumary of changes you made? It would make it a lot easier to review if it was broken up int the generic changes (with rationaly why they are needed) and i386 backend changes that I could review then. From first pass through the patch I don't quite see the need for i.e. adding new move patterns when we can output all kinds of SSE moves already. Will look more into the patch to see if I can come up with useful comments. Honza
Re: [PATCH v3, i386] BMI2 support for GCC, mulx, rorx, shiftx part
On Sun, Aug 21, 2011 at 1:39 PM, Uros Bizjak <ubiz...@gmail.com> wrote:

	This is the third version of BMI2 support that includes generation
	of the mulx, rorx, shiftx part.  This patch includes all comments
	on the previous version, splits all insns post-reload, uses the
	"enabled" attribute and avoids new register modifiers.

	As a compromise (see previous posts), the mulx insn is now split
	post-reload into a pattern that separates outputs (so post-reload
	passes can do their job more effectively), with the hope that
	someday other <dwi> patterns will be rewritten in the same way.

A small update that removes the need for the <w> mode attribute.  We can convert the count register to the correct mode in a splitter.

Re-tested on x86_64-pc-linux-gnu {,-m32}.

Uros.

Index: i386.md
===================================================================
--- i386.md	(revision 177949)
+++ i386.md	(working copy)
@@ -377,7 +377,7 @@
 (define_attr "type"
   "other,multi,
    alu,alu1,negnot,imov,imovx,lea,
-   incdec,ishift,ishift1,rotate,rotate1,imul,idiv,
+   incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
    icmp,test,ibr,setcc,icmov,
    push,pop,call,callv,leave,
    str,bitmanip,
@@ -410,12 +410,12 @@
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
-			  bitmanip")
+			  bitmanip,imulx")
	   (const_int 0)
	 (eq_attr "unit" "i387,sse,mmx")
	   (const_int 0)
-	 (eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1,
-			  imul,icmp,push,pop")
+	 (eq_attr "type" "alu,alu1,negnot,imovx,ishift,ishiftx,ishift1,
+			  rotate,rotatex,rotate1,imul,icmp,push,pop")
	   (symbol_ref "ix86_attr_length_immediate_default (insn, true)")
	 (eq_attr "type" "imov,test")
	   (symbol_ref "ix86_attr_length_immediate_default (insn, false)")
@@ -675,7 +675,7 @@
	      (and (match_operand 0 "memory_displacement_operand" "")
		   (match_operand 1 "immediate_operand" "")))
	 (const_string "true")
-	 (and (eq_attr "type" "alu,ishift,rotate,imul,idiv")
+	 (and (eq_attr "type" "alu,ishift,ishiftx,rotate,rotatex,imul,idiv")
	      (and (match_operand 0 "memory_displacement_operand" "")
		   (match_operand 2 "immediate_operand" "")))
	 (const_string "true")
@@ -699,12 +699,13 @@
 (define_attr "movu" "0,1" (const_string "0"))
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
-(define_attr "isa" "base,noavx,avx"
+(define_attr "isa" "base,noavx,avx,bmi2"
   (const_string "base"))
 
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
	 (eq_attr "isa" "avx") (symbol_ref "TARGET_AVX")
+	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
	]
	(const_int 1)))
 
@@ -6844,16 +6845,102 @@
	      (clobber (reg:CC FLAGS_REG))])]
   "TARGET_QIMODE_MATH"
   "")
 
-(define_insn "*umul<mode><dwi>3_1"
+(define_insn "*bmi2_umulditi3_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(mult:DI
+	  (match_operand:DI 2 "nonimmediate_operand" "%d")
+	  (match_operand:DI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:DI 1 "register_operand" "=r")
+	(truncate:DI
+	  (lshiftrt:TI
+	    (mult:TI (zero_extend:TI (match_dup 2))
+		     (zero_extend:TI (match_dup 3)))
+	    (const_int 64))))]
+  "TARGET_64BIT && TARGET_BMI2
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "mulx\t{%3, %0, %1|%1, %0, %3}"
+  [(set_attr "type" "imulx")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "DI")])
+
+(define_insn "*bmi2_umulsidi3_1"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(mult:SI
+	  (match_operand:SI 2 "nonimmediate_operand" "%d")
+	  (match_operand:SI 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:SI 1 "register_operand" "=r")
+	(truncate:SI
+	  (lshiftrt:DI
+	    (mult:DI (zero_extend:DI (match_dup 2))
+		     (zero_extend:DI (match_dup 3)))
+	    (const_int 32))))]
+  "!TARGET_64BIT && TARGET_BMI2
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "mulx\t{%3, %0, %1|%1, %0, %3}"
+  [(set_attr "type" "imulx")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "SI")])
+
+(define_insn "*umul<mode><dwi>3_1"
+  [(set (match_operand:<DWI> 0 "register_operand" "=A,r")
+	(mult:<DWI>
+	  (zero_extend:<DWI>
+	    (match_operand:DWIH 1 "nonimmediate_operand" "%0,d"))
+	  (zero_extend:<DWI>
+	    (match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   mul{<imodesuffix>}\t%2
+   #"
+  [(set_attr "isa" "base,bmi2")
+   (set_attr "type" "imul,imulx")
+   (set_attr "length_immediate" "0,*")
+   (set (attr "athlon_decode")
+     (cond [(eq_attr "alternative" "0")
+	      (if_then_else (eq_attr "cpu" "athlon")
+		(const_string "vector")
+		(const_string "double"))]
Re: [patch, committed] gfortran.dg/graphite/interchange-1.f: Remove xfail
Tobias,

I saw that the test case now XPASSes, and I also find some XPASSes in gcc-testresults.  (Not for all targets, but I think those don't build graphite.)  I think the XPASS is due to http://gcc.gnu.org/ml/fortran/2011-08/msg00023.html; this is PR tree-optimization/50124.

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] make assign_hard_reg's saved_nregs conditional to unbreak ARM bootstrap (PR50146)
On Mon, Aug 22, 2011 at 09:46:17AM +0200, Mikael Pettersson wrote: Patch was pre-approved by Vladimir Makarov in the PR trail, but I don't have svn write access so I'll need help to commit it. Committed, thanks. Jakub
[trans-mem] Use __x86_64__ instead of __LP64__.
Use __x86_64__ instead of __LP64__ in setjmp/longjmp and TLS definitions.

H.J.: Is that sufficient for x32, or do we need entirely different code? If so, can you please provide the required changes? Otherwise, OK for branch?

commit 5337bae3f70d53e463d09e8d6806826876b0da8a
Author: Torvald Riegel trie...@redhat.com
Date:   Mon Aug 22 11:21:03 2011 +0200

    Use __x86_64__ instead of __LP64__.

    	* config/x86/tls.h: Use __x86_64__ instead of __LP64__.
    	* config/x86/sjlj.S: Same.

diff --git a/libitm/config/x86/sjlj.S b/libitm/config/x86/sjlj.S
index 0e9c246..725ffec 100644
--- a/libitm/config/x86/sjlj.S
+++ b/libitm/config/x86/sjlj.S
@@ -1,4 +1,4 @@
-/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 2008, 2009, 2011 Free Software Foundation, Inc.
    Contributed by Richard Henderson r...@redhat.com.

    This file is part of the GNU Transactional Memory Library (libitm).
@@ -29,7 +29,7 @@
 _ITM_beginTransaction:
 	.cfi_startproc
-#ifdef __LP64__
+#ifdef __x86_64__
 	leaq	8(%rsp), %rax
 	movq	(%rsp), %r8
 	subq	$72, %rsp
@@ -72,7 +72,7 @@
 GTM_longjmp:
 	.cfi_startproc
-#ifdef __LP64__
+#ifdef __x86_64__
 	movq	(%rdi), %rcx
 	movq	8(%rdi), %rdx
 	movq	16(%rdi), %rbx
diff --git a/libitm/config/x86/tls.h b/libitm/config/x86/tls.h
index 03fdab2..ca644f4 100644
--- a/libitm/config/x86/tls.h
+++ b/libitm/config/x86/tls.h
@@ -37,7 +37,7 @@
 #if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
 namespace GTM HIDDEN {
-#ifdef __LP64__
+#ifdef __x86_64__
 # define SEG_READ(OFS)		"movq\t%%fs:(" #OFS "*8),%0"
 # define SEG_WRITE(OFS)		"movq\t%0,%%fs:(" #OFS "*8)"
 # define SEG_DECODE_READ(OFS)	SEG_READ(OFS) "\n\t" \
Re: Dump stats about hottest hash tables when -fmem-report
Hello, I'm attaching here the final version of statistics dumping; I think I made some stylistic/insignificant changes. Tested on i386 and x86_64. I have withheld the hash table size changes, so please apply if you find everything good. Would you agree to a future patch that would compute such statistics only when the gather-mem-stats compile-time variable is set? On Sat, 20 Aug 2011, Richard Guenther wrote: On Fri, Aug 19, 2011 at 6:40 PM, Tom Tromey tro...@redhat.com wrote: Your patch to change the symbol table's load factor is fine technically. I think the argument for putting it in is lacking; what I would like to see is either some rationale explaining that the increased memory use is not important, or some numbers showing that it still performs well on more than a single machine. My reason for wanting this is just that, historically, GCC has been very sensitive to increases in memory use. Alternatively, comments from more active maintainers indicating that they don't care about this would also help your case. I can't approve or reject the libiberty change, just the libcpp one. Yes, memory usage is as important as compile-time. We still have testcases that show a vast imbalance between them. I don't know if the symbol table hash is ever the problem, but changing the generic load factor in libiberty doesn't sound like a good idea - maybe instead have a way of specifying that factor per hashtable instance. Or, as usual, improve the hash function to reduce the collision rate and/or to make re-hashing cheaper. Regarding hash table size, I will back off from hashtab.c. I really believe that even though it makes a difference, the proper solution would be a much more intrusive one: complete reimplementation. If we primarily care about memory usage we should use a closed-addressing (with linked lists in each bucket) hash table, and set the load factor high.
If we care about cache-efficiency and speed we should use an open-addressing hash table, like the one we have, but decrease the load factor and resolve collisions by quadratic probing instead of rehashing. Also we should make sure our hash table is always power-of-2 sized, and that hash values are always computed using good and proven hash functions (we use Bob Jenkins' v2 hash; we should upgrade to v3, even though v2 seems very good *when* we actually do use it). That would probably mean the interface should change to disallow custom hash values, and while we are at it I'd really like to see new functions dedicated to searching and inserting. I've experimented with some of these changes, but individually they do not offer significant speed-ups. I believe the gain will show only if hash tables are completely reimplemented. But regarding symtab.c, I only see benefits. Collisions are reduced and the extra memory usage is negligible. For a hash table with 32K elements and 64K slots, the overhead of empty slots is 32K*8 = 256KB. If it were 75% full it would be 128KB. Actual measurements on i386 show negligible overhead; it could be by luck that the final ht size is the same (they are always power-of-2 sized, so it's possible). Anyway I'll hopefully test on other platforms within September and report back. A change that would probably help for symtab would be to store custom string structs that include the string length, so we avoid several strlen() calls. Another could be to not support deletion of strings (since what we are actually doing is string interning, isn't it?). This would make the open-addressing hash table much simpler, and I think there is only one case where we actually delete strings. Have to look further into this one. All comments welcome, Dimitris Changelog: 2011-08-22 Dimitrios Apostolou ji...@gmx.net * cgraph.c, cgraph.h (cgraph_dump_stats): New function to dump stats about cgraph_hash hash table.
* cselib.c, cselib.h (cselib_dump_stats): New function to dump stats about cselib_hash_table. * cselib.c (cselib_finish): Keep statistics by increasing values of new global variables cselib_htab_{num,expands,searches,collisions} if -fmem-report. * emit-rtl.c, emit-rtl.h (mem_attrs_dump_stats): New function to dump statistics about mem_attrs_htab hash table. * tree.c (print_type_hash_statistics): Use the new htab_dump_statistics() function. * var-tracking.c (shared_hash_destroy): Keep statistics by increasing values of new global variables vars_htab_{num,expands,searches,collisions} if -fmem-report. (vt_finalize): Keep statistics by increasing values of new global variables cv_htab_{num,expands,searches,collisions} and vc_htab_{num,expands,searches,collisions} if -fmem-report. * var-tracking.c, rtl.h (vt_dump_stats): New function to dump stats about vars->htab, changed_variables and value_chains hash tables. * hashtab.c,
Re: tree-ssa*: reduce malloc() calls by preallocating hot VECs on the stack
On Mon, Aug 22, 2011 at 9:43 AM, Dimitrios Apostolou ji...@gmx.net wrote:

 2011-08-22  Dimitrios Apostolou  ji...@gmx.net

 	Allocate some very frequently used vectors on the stack:
 	* vecir.h: Define a tree vector on the stack.
 	* tree-ssa-sccvn.c (print_scc, sort_scc, process_scc)
 	(extract_and_process_scc_for_name): Allocate the scc vector on the
 	stack instead of the heap, giving it a minimal initial size
 	instead of 0.
 	* tree-ssa-structalias.c (get_constraint_for_1)
 	(get_constraint_for, get_constraint_for_rhs, do_deref)
 	(get_constraint_for_ssa_var, get_constraint_for_ptr_offset)
 	(get_constraint_for_component_ref, get_constraint_for_address_of)
 	(process_all_all_constraints, do_structure_copy)
 	(make_constraints_to, make_constraint_to, handle_rhs_call)
 	(handle_lhs_call, handle_const_call, handle_pure_call)
 	(find_func_aliases_for_builtin_call, find_func_aliases_for_call)
 	(find_func_aliases, process_ipa_clobber, find_func_clobbers)
 	(create_variable_info_for): Convert the rhsc, lhsc vectors from
 	heap to stack, with a minimal initial size, since they were very
 	frequently allocated.

Ok if bootstrapped and tested ok - please always state how you tested a patch. Thanks, Richard.
Re: graphds.[ch]: alloc_pool for edges
On Mon, Aug 22, 2011 at 9:37 AM, Dimitrios Apostolou ji...@gmx.net wrote: free() was called way too often before; this patch reduces it significantly. Minor speed-up here too; I don't mention it individually since the numbers are within noise margins. As there is no re-use in this pool, the natural allocator to use is an obstack, which has even less overhead than an alloc_pool. Richard.

2011-08-22  Dimitrios Apostolou  ji...@gmx.net

	* graphds.h (struct graph): Added edge_pool as a pool for
	allocating edges.
	* graphds.c (new_graph): Initialise edge_pool.
	(add_edge): Allocate edge from edge_pool rather than with malloc.
	(free_graph): Instead of iterating across the graph freeing edges,
	just destroy the edge_pool.
Re: tree-ssa-structalias.c: alloc_pool for struct equiv_class_label
On Mon, Aug 22, 2011 at 9:46 AM, Dimitrios Apostolou ji...@gmx.net wrote: 2011-08-22 Dimitrios Apostolou ji...@gmx.net * tree-ssa-structalias.c (equiv_class_add) (perform_var_substitution, free_var_substitution_info): Created a new equiv_class_pool allocator pool for struct equiv_class_label. Changed the pointer_equiv_class_table and location_equiv_class_table hash tables to not iterate freeing all elements in the end, but just free the pool. Did you check if the hash functions have ever called free()? If so why not use the pool free function so that entries can get re-used? If not, the natural allocator would be an obstack instead. Richard.
[trans-mem] Move x86 TLS definition to linux/x86.
A small change. OK for branch?

commit b9db8481779fc5a4069ac36505bd6cce8da971a1
Author: Torvald Riegel trie...@redhat.com
Date:   Mon Aug 22 11:52:00 2011 +0200

    Move x86 TLS definition to linux/x86.

    	* config/x86/tls.h: Moved to ...
    	* config/linux/x86/tls.h: ... here.

diff --git a/libitm/config/linux/x86/tls.h b/libitm/config/linux/x86/tls.h
new file mode 100644
index 000..ca644f4
--- /dev/null
+++ b/libitm/config/linux/x86/tls.h
@@ -0,0 +1,93 @@
+/* Copyright (C) 2008, 2009, 2011 Free Software Foundation, Inc.
+   Contributed by Richard Henderson r...@redhat.com.
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   http://www.gnu.org/licenses/.  */
+
+#ifndef LIBITM_X86_TLS_H
+#define LIBITM_X86_TLS_H 1
+
+#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
+/* Use slots in the TCB head rather than __thread lookups.
+   GLIBC has reserved words 10 through 13 for TM.  */
+#define HAVE_ARCH_GTM_THREAD 1
+#define HAVE_ARCH_GTM_THREAD_DISP 1
+#endif
+
+#include "config/generic/tls.h"
+
+#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 10)
+namespace GTM HIDDEN {
+
+#ifdef __x86_64__
+# define SEG_READ(OFS)		"movq\t%%fs:(" #OFS "*8),%0"
+# define SEG_WRITE(OFS)		"movq\t%0,%%fs:(" #OFS "*8)"
+# define SEG_DECODE_READ(OFS)	SEG_READ(OFS) "\n\t" \
+				"rorq\t$17,%0\n\t" \
+				"xorq\t%%fs:48,%0"
+# define SEG_ENCODE_WRITE(OFS)	"xorq\t%%fs:48,%0\n\t" \
+				"rolq\t$17,%0\n\t" \
+				SEG_WRITE(OFS)
+#else
+# define SEG_READ(OFS)		"movl\t%%gs:(" #OFS "*4),%0"
+# define SEG_WRITE(OFS)		"movl\t%0,%%gs:(" #OFS "*4)"
+# define SEG_DECODE_READ(OFS)	SEG_READ(OFS) "\n\t" \
+				"rorl\t$9,%0\n\t" \
+				"xorl\t%%gs:24,%0"
+# define SEG_ENCODE_WRITE(OFS)	"xorl\t%%gs:24,%0\n\t" \
+				"roll\t$9,%0\n\t" \
+				SEG_WRITE(OFS)
+#endif
+
+static inline struct gtm_thread *gtm_thr(void)
+{
+  struct gtm_thread *r;
+  asm (SEG_READ(10) : "=r"(r));
+  return r;
+}
+
+static inline void set_gtm_thr(struct gtm_thread *x)
+{
+  asm volatile (SEG_WRITE(10) : : "r"(x));
+}
+
+static inline struct abi_dispatch *abi_disp(void)
+{
+  struct abi_dispatch *r;
+  asm (SEG_DECODE_READ(11) : "=r"(r));
+  return r;
+}
+
+static inline void set_abi_disp(struct abi_dispatch *x)
+{
+  void *scratch;
+  asm volatile (SEG_ENCODE_WRITE(11) : "=r"(scratch) : "0"(x));
+}
+
+#undef SEG_READ
+#undef SEG_WRITE
+#undef SEG_DECODE_READ
+#undef SEG_ENCODE_WRITE
+
+} // namespace GTM
+#endif /* >= GLIBC 2.10 */
+
+#endif // LIBITM_X86_TLS_H
diff --git a/libitm/config/x86/tls.h b/libitm/config/x86/tls.h
deleted file mode 100644
index ca644f4..000
--- a/libitm/config/x86/tls.h
+++ /dev/null
@@ -1,93 +0,0 @@
-/* Copyright (C) 2008, 2009, 2011 Free Software Foundation, Inc.
-   Contributed by Richard Henderson r...@redhat.com.
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   http://www.gnu.org/licenses/.  */
-
-#ifndef LIBITM_X86_TLS_H
-#define LIBITM_X86_TLS_H 1
-
-#if defined(__GLIBC_PREREQ)
Re: [ARM] Model automodified addresses in the Cortex A8 and A9 schedulers
Tested on arm-linux-gnueabi. OK to install? gcc/ * config/arm/arm-protos.h (arm_writeback_dep): Declare. (arm_writeback_only_dep): Likewise. * config/arm/arm.c (arm_writeback_dep): New function. (arm_writeback_only_dep_1, arm_writeback_only_dep): Likewise. * config/arm/cortex-a8.md: Add address-writeback bypasses for loads and stores. * config/arm/cortex-a9.md: Likewise. This is OK. Ramana
Re: mem_attrs_htab
On Mon, Aug 22, 2011 at 10:04 AM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote: the future. I didn't like hashing addresses either, and I was surprised I saw no regressions. Hashing on the expr address as well just results in smaller sharing in the hash table (i.e. if the expr has different address, but is considered equal). The hashing of mem attrs is done just to reduce memory overhead. And at some point the idea popped up to just dump this whole re-using mem-attrs. There is at most a single function in RTL but the whole program is available in SSA, so the memory overhead must be small. Richard. Jakub
Re: Various minor speed-ups
On Mon, Aug 22, 2011 at 9:50 AM, Dimitrios Apostolou ji...@gmx.net wrote: 2011-08-22 Dimitrios Apostolou ji...@gmx.net * tree-ssa-pre.c (phi_trans_add, init_pre, fini_pre): Added a pool for phi_translate_table elements to avoid free() calls from htab_delete(). Ok if bootstrap and test pass. Richard.
Re: mem_attrs_htab
On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote: And at some point the idea popped up to just dump this whole re-using mem-attrs. There is at most a single function in RTL but the whole program is available in SSA, so the memory overhead must be small. Some functions are extremely large though. Do you mean that MEM itself would be enlarged to have the MEM_ATTRS field so that one operand is the address, then expr, then HWI size, offset, etc.? Because if the mem attrs aren't shared any longer, it doesn't make sense to keep the indirection. I still fear we have way too many MEMs in RTL that this would be noticeable. Jakub
Re: tree-ssa-structalias.c: alloc_pool for struct equiv_class_label
On Mon, 22 Aug 2011, Richard Guenther wrote: On Mon, Aug 22, 2011 at 9:46 AM, Dimitrios Apostolou ji...@gmx.net wrote: 2011-08-22 Dimitrios Apostolou ji...@gmx.net * tree-ssa-structalias.c (equiv_class_add) (perform_var_substitution, free_var_substitution_info): Created a new equiv_class_pool allocator pool for struct equiv_class_label. Changed the pointer_equiv_class_table and location_equiv_class_table hash tables to not iterate freeing all elements in the end, but just free the pool. Did you check if the hash functions have ever called free()? If so why not use the pool free function so that entries can get re-used? If not, the natural allocator would be an obstack instead. I have not found any relevant call of htab_clear_slot(). I didn't consider obstacks at all for all these cases, thanks for telling me, I'll see where I can use them. As I've noted I have bootstrapped and tested all these changes at least on x86_64 with release-checking enabled, but I plan to test and measure all my changes together later, and hopefully on other platforms in the future. Thanks, Dimitris
[PATCH] Fix PR50067
Applied. Richard.

2011-08-22  Richard Guenther  rguent...@suse.de

	PR testsuite/50145
	* gcc.dg/torture/pr50067-1.c: Run on little-endian systems only.
	* gcc.dg/torture/pr50067-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/torture/pr50067-1.c
===
--- gcc/testsuite/gcc.dg/torture/pr50067-1.c	(revision 177949)
+++ gcc/testsuite/gcc.dg/torture/pr50067-1.c	(working copy)
@@ -9,10 +9,15 @@ short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7,
 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, };
 int main()
 {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   int i;
-  for (i = 0; i < 32; ++i)
-    (*((unsigned short(*)[32])&a[0]))[i] = (*((char(*)[32])&a[0]))[i+8];
-  if (memcmp (a, b, sizeof (a)) != 0)
-    abort ();
+  if (sizeof (short) == 2)
+    {
+      for (i = 0; i < 32; ++i)
+	(*((unsigned short(*)[32])&a[0]))[i] = (*((char(*)[32])&a[0]))[i+8];
+      if (memcmp (a, b, sizeof (a)) != 0)
+	abort ();
+    }
+#endif
   return 0;
 }
Index: gcc/testsuite/gcc.dg/torture/pr50067-2.c
===
--- gcc/testsuite/gcc.dg/torture/pr50067-2.c	(revision 177949)
+++ gcc/testsuite/gcc.dg/torture/pr50067-2.c	(working copy)
@@ -9,12 +9,17 @@ short a[32] = { 0, 1, 2, 3, 4, 5, 6, 7,
 short b[32] = { 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, };
 int main()
 {
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   int i;
-  for (i = 0; i < 32; ++i)
+  if (sizeof (short) == 2)
     {
-      a[i] = (*((char(*)[32])&a[0]))[i+8];
+      for (i = 0; i < 32; ++i)
+	{
+	  a[i] = (*((char(*)[32])&a[0]))[i+8];
+	}
+      if (memcmp (a, b, sizeof (a)) != 0)
+	abort ();
     }
-  if (memcmp (a, b, sizeof (a)) != 0)
-    abort ();
+#endif
   return 0;
 }
[var-tracking] small speed-ups
Hello list, this patch has all of my changes to var-tracking that I believe are worth keeping. These are all minor changes that do not touch algorithmic issues. I lost much time trying to understand what is actually done in var-tracking.c, but I still don't get it; I wish there were some document describing the current implementation. I'd appreciate all the help I can get here. Bootstrapped/tested on x86_64. Thanks to lxo for helping me; hopefully his plan for better var-tracking throughout all optimisation passes will be implemented. This is the only var-tracking related doc I could find... (http://gcc.gnu.org/wiki/Var_Tracking_Assignments). This patch also includes minor stylistic changes that made the code just a tiny bit more accessible to me; the indirection was so much that it hardly reminded me of C. Let me know if I should remove these parts. :-s For the sake of completeness I'll also post a follow-up patch where I delete/simplify a big part of var-tracking, unfortunately with some impact on performance.

2011-08-22  Dimitrios Apostolou  ji...@gmx.net

	* var-tracking.c (init_attrs_list_set): Remove function, instead
	use a memset() call to zero the attr list in...
	(dataflow_set_init): ... here.
	(vars_copy): Remove function because inserting each element into a
	new hash table was too costly. Replaced with the ...
	(htab_dup): ... new function that only does a memcpy() of the
	element table in htab_t, without rehashing any elements.
	(shared_hash_unshare): Replace the vars_copy() call with
	htab_dup(), plus do a little extra work (reference counting) which
	was in vars_copy.
	(shared_hash_destroy, dataflow_set_destroy): Add an extra do_free
	bool argument, to avoid iterating over hash tables freeing
	elements, when not needed.
	(vt_finalize, vt_emit_notes): Call the above with do_free=false
	since all pools will be freed later.
	(dataflow_set_clear, dataflow_set_copy, dataflow_set_union)
	(dataflow_set_merge, dataflow_set_different, compute_bb_dataflow)
	(vt_find_locations): Call shared_hash_destroy with do_free=true.
	(attrs_list_copy): Do not free destination list but reuse already
	allocated elements if possible.

=== modified file 'gcc/var-tracking.c'
--- gcc/var-tracking.c	2011-06-03 01:42:31 +
+++ gcc/var-tracking.c	2011-08-22 09:52:08 +
@@ -414,7 +414,6 @@
 static hashval_t variable_htab_hash (con
 static int variable_htab_eq (const void *, const void *);
 static void variable_htab_free (void *);
-static void init_attrs_list_set (attrs *);
 static void attrs_list_clear (attrs *);
 static attrs attrs_list_member (attrs, decl_or_value, HOST_WIDE_INT);
 static void attrs_list_insert (attrs *, decl_or_value, HOST_WIDE_INT, rtx);
@@ -423,7 +422,6 @@
 static void attrs_list_union (attrs *, a
 static void **unshare_variable (dataflow_set *set, void **slot, variable var,
 				enum var_init_status);
-static void vars_copy (htab_t, htab_t);
 static tree var_debug_decl (tree);
 static void var_reg_set (dataflow_set *, rtx, enum var_init_status, rtx);
 static void var_reg_delete_and_set (dataflow_set *, rtx, bool,
@@ -447,7 +445,7 @@
 static bool variable_part_different_p (v
 static bool onepart_variable_different_p (variable, variable);
 static bool variable_different_p (variable, variable);
 static bool dataflow_set_different (dataflow_set *, dataflow_set *);
-static void dataflow_set_destroy (dataflow_set *);
+static void dataflow_set_destroy (dataflow_set *, bool);
 static bool contains_symbol_ref (rtx);
 static bool track_expr_p (tree, bool);
@@ -1069,14 +1067,14 @@ adjust_insn (basic_block bb, rtx insn)
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  return !dv || ((int) TREE_CODE ((tree) dv) != (int) VALUE);
 }

 /* Return true if a decl_or_value is a VALUE rtl.  */
 static inline bool
 dv_is_value_p (decl_or_value dv)
 {
-  return dv && !dv_is_decl_p (dv);
+  return !dv_is_decl_p (dv);
 }

 /* Return the decl in the decl_or_value.  */
@@ -1092,7 +1090,7 @@
 static inline rtx
 dv_as_value (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_value_p (dv));
-  return (rtx)dv;
+  return (rtx) dv;
 }

 /* Return the opaque pointer in the decl_or_value.  */
@@ -1191,7 +1189,7 @@
 dv_uid2hash (dvuid uid)
 static inline hashval_t
 dv_htab_hash (decl_or_value dv)
 {
-  return dv_uid2hash (dv_uid (dv));
+  return (hashval_t) (dv_uid (dv));
 }

 /* The hash function for variable_htab, computes the hash value
@@ -1202,7 +1200,7 @@
 variable_htab_hash (const void *x)
 {
   const_variable const v = (const_variable) x;
-  return dv_htab_hash (v->dv);
+  return (hashval_t) (dv_uid (v->dv));
 }

 /* Compare the declaration of variable X with declaration Y.  */
@@ -1211,9 +1209,8 @@
 static int
 variable_htab_eq (const void *x, const void *y)
 {
   const_variable const v
Re: mem_attrs_htab
On Mon, Aug 22, 2011 at 12:07 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote: And at some point the idea popped up to just dump this whole re-using of mem-attrs. There is at most a single function in RTL but the whole program is available in SSA, so the memory overhead must be small. Some functions are extremely large though. Do you mean that MEM itself would be enlarged to have the MEM_ATTRS field, so that one operand is the address, then expr, then HWI size, offset, etc.? Because if the mem attrs aren't shared any longer, it doesn't make sense to keep the indirection. I still fear we have way too many MEMs in RTL for this not to be noticeable. It would be interesting to have numbers about the amount of sharing that happens - it might not be trivial though, as some re-uses would be able to simply modify the attr in place. Richard. Jakub
[patch] revert an obsolete part from the mips-triarch checkin
While looking at the multiarch patches, I noticed that a previous change is not necessary: MULTILIB_DEFAULTS is handled in config/mips/mips.h.

  Matthias

gcc/
2011-08-22  Matthias Klose  d...@debian.org

	Revert:
	2011-07-11  Arthur Loiret  aloi...@debian.org
		    Matthias Klose  d...@debian.org
	* config/mips/t-linux64 (MULTILIB_DIRNAMES): Set to 'n32 . 64' if
	tm_defines contains MIPS_ABI_DEFAULT ABI_32, to follow the glibc
	convention.

Index: gcc/config/mips/t-linux64
===
--- gcc/config/mips/t-linux64	(revision 177952)
+++ gcc/config/mips/t-linux64	(working copy)
@@ -17,11 +17,7 @@
 # http://www.gnu.org/licenses/.

 MULTILIB_OPTIONS = mabi=n32/mabi=32/mabi=64
-ifneq ($(filter MIPS_ABI_DEFAULT=ABI_32,$(tm_defines)),)
-MULTILIB_DIRNAMES = n32 . 64
-else
 MULTILIB_DIRNAMES = n32 32 64
-endif
 MULTILIB_OSDIRNAMES = ../lib32 ../lib ../lib64
 EXTRA_MULTILIB_PARTS=crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o
Re: [patch] revert an obsolete part from the mips-triarch checkin
Matthias Klose d...@debian.org writes:

 2011-08-22  Matthias Klose  d...@debian.org

 	Revert:
 	2011-07-11  Arthur Loiret  aloi...@debian.org
 		    Matthias Klose  d...@debian.org
 	* config/mips/t-linux64 (MULTILIB_DIRNAMES): Set to 'n32 . 64' if
 	tm_defines contains MIPS_ABI_DEFAULT ABI_32, to follow the glibc
 	convention.

Please also remove:

      tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_32"

from:

mips*-*-linux*)			# Linux MIPS, either endian.
	tm_file="dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file} mips/gnu-user.h mips/linux.h"
	tmake_file="${tmake_file} mips/t-libgcc-mips16"
	if test x$enable_targets = xall; then
		tm_file="${tm_file} mips/gnu-user64.h mips/linux64.h"
		tmake_file="${tmake_file} mips/t-linux64"
		tm_defines="${tm_defines} MIPS_ABI_DEFAULT=ABI_32"
	fi

in config.gcc. (For the record, this follows up from: http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01108.html which I'd forgotten about until Matthias reminded me this morning.)

OK with that change, thanks.

Richard
[PATCH] avoid dangling line table after loading pch
Hello, In c_common_read_pch, when gt_pch_restore loads a new pch, the previous line table (referenced from the global 'line_table') is garbage-collected and a new one is built. As the global instance of cpp_reader referenced by the local variable 'pfile' has a pfile->line_table member that references the 'line_table' global, pfile->line_table needs to be set to the new value of line_table after gt_pch_restore is called; otherwise pfile->line_table points to garbage-collected memory. This is what the call to cpp_set_line_map in c_common_read_pch is for. The problem is that cpp_set_line_map is called too late, as cpp_read_state (called before cpp_set_line_map) indirectly touches some pfile->line_table dangling garbage-collected memory[1]. This doesn't cause any visible havoc in trunk yet, but I am seeing it crashing as I am fiddling with line map stuff on the side. The two-liner patch below just calls cpp_set_line_map right after gt_pch_restore to restore pfile->line_table before anyone touches it. [1]: This happens via cpp_read_state -> _cpp_create_definition -> _cpp_create_iso_definition -> _cpp_lex_token -> _cpp_lex_direct -> linemap_position_for_column. Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk. OK for trunk? gcc/ * c-family/c-pch.c (c_common_read_pch): Re-set line table right after reading in the pch.
---
 gcc/c-family/c-pch.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/c-family/c-pch.c b/gcc/c-family/c-pch.c
index b429d9d..3c2fd18 100644
--- a/gcc/c-family/c-pch.c
+++ b/gcc/c-family/c-pch.c
@@ -431,6 +431,7 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   timevar_pop (TV_PCH_CPP_RESTORE);

   gt_pch_restore (f);
+  cpp_set_line_map (pfile, line_table);

   timevar_push (TV_PCH_CPP_RESTORE);
   if (cpp_read_state (pfile, name, f, smd) != 0)
@@ -445,7 +446,6 @@ c_common_read_pch (cpp_reader *pfile, const char *name,
   fclose (f);

   line_table->trace_includes = saved_trace_includes;
-  cpp_set_line_map (pfile, line_table);
   linemap_add (line_table, LC_RENAME, 0, saved_loc.file, saved_loc.line);

   /* Give the front end a chance to take action after a PCH file has

--
	Dodji
Re: Vector Comparison patch
On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Richard, I formalized the approach a little bit; now it works without target hooks, but some polishing is still required. I want you to comment on the several important approaches that I use in the patch. So how does it work. 1) All the vector comparisons at the level of the type-checker are introduced using VEC_COND_EXPR with constant selection operands being {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 > v1, {-1}, {0}>. 2) When optabs expand VEC_COND_EXPR, two cases are considered: 2.a) the first operand of VEC_COND_EXPR is a comparison; in that case nothing changes. 2.b) the first operand is something else; in that case we specially mark this case, recognize it in the backend, and do not create a comparison, but use the mask as if it were a result of some comparison. 3) In order to make sure that the mask in VEC_COND_EXPR<mask, v0, v1> is a vector comparison we use the is_vector_comparison function; if it returns false, then we replace mask with mask != {0}. So we end up with the following functionality: VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is a result of a comparison of two vectors, we leave it as it is, otherwise we change it to mask != {0}. A plain vector comparison a op b is represented with VEC_COND_EXPR, which correctly expands, without creating useless masking. Basically for me there are two questions: 1) Can we perform information passing in optabs in a nicer way? 2) How could is_vector_comparison be improved? I have several ideas, like checking whether a constant vector consists only of 0 and -1, and so on. But first, is it conceptually fine? P.S. I tried to put the functionality of is_vector_comparison in tree-ssa-forwprop, but the thing is that it is called only with -On, which I find inappropriate, and the functionality gets more complicated. Why is it inappropriate to not optimize it at -O0? If the user separates the comparison and the ?: expression, it's his own fault.
Btw, the new hook is still in the patch. I would simply always create != 0 if it isn't and let optimizers (tree-ssa-forwprop.c) optimize this. You still have to deal with non-comparison operands during expansion though, but if you always forced a != 0 from the start you can then simply interpret it as a proper comparison result (in which case I'd modify the backends to have an alternate pattern or directly expand to masking operations - using the fake comparison RTX is too much of a hack).

 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value);
+      return build_vector_from_val (type, tval);

as value is either 0 or 1 that won't work. Oh, I see you pass -1 for true in the callers. But I think we should simply decide that true (1) means -1 for a vector boolean node (and the value parameter should be a bool instead). Thus,

+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return build_vector_from_val (type, tval);

instead.

@@ -9073,26 +9082,29 @@
    floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0;
+      tree arg0_type = TREE_TYPE (arg0);
+

as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS macros work perfectly fine on vector types.

Richard.

Thanks, Artem.
[var-tracking] [not-good!] disable shared_hash and other simplifications
Hello, the attached patch applies after my previous one, and actually cancels all runtime gains from it. It doesn't make things worse than initially, so it's not *that* bad. While trying to understand var-tracking I deleted the whole shared hash table concept and some other indirections. It didn't make things much worse, and it makes me think that getting rid of other indirections like shared-variable reference counting would actually help. But it got too complicated and I had to give up. While I don't understand variable tracking, I couldn't help but notice some things that could be discussed:

* This is the part of the compiler that makes the heaviest use of hash tables. htab_traverse(), htab_delete(), FOR_EACH_HTAB_ELEMENT (should that be moved into libiberty?) all iterate way too many times over hash tables. This is all because decl/value UIDs are sparse.

* Since var-tracking happens per-function, is there a way to produce per-function UIDs and use those in a regular array instead of a sparse hash table? These UIDs could be produced at tree/rtx level, and I bet other parts of the compiler could make good use of them.

* Another alternative would be to store variables in a vector, sorted by UID value. That way checking if an element is there would be O(logn) but merging would be a very simple operation. Still, merging dataflow sets is way too complex to follow so I didn't manage to change this.

* From a very wide field of view, all this DF solving reminded me a lot of what I've seen in df-*.c. Why can't variable tracking be integrated in the main DF pass of the compiler, the one that happens *after* register allocation?

Thanks, Dimitris

=== modified file 'gcc/var-tracking.c'
--- gcc/var-tracking.c 2011-08-22 08:27:47 +
+++ gcc/var-tracking.c 2011-08-22 10:36:15 +
@@ -233,17 +233,6 @@ typedef struct attrs_def
   HOST_WIDE_INT offset;
 } *attrs;

-/* Structure holding a refcounted hash table.  If refcount > 1,
-   it must be first unshared before modified.
*/
-typedef struct shared_hash_def
-{
-  /* Reference count.  */
-  int refcount;
-
-  /* Actual hash table.  */
-  htab_t htab;
-} *shared_hash;
-
 /* Structure holding the IN or OUT set for a basic block.  */
 typedef struct dataflow_set_def
 {
@@ -253,11 +242,11 @@ typedef struct dataflow_set_def
   /* Attributes for registers (lists of attrs).  */
   attrs regs[FIRST_PSEUDO_REGISTER];

-  /* Variable locations.  */
-  shared_hash vars;
+  /* Variable locations, stored in a hash table.  */
+  htab_t vars;

   /* Vars that is being traversed.  */
-  shared_hash traversed_vars;
+  htab_t traversed_vars;
 } dataflow_set;

 /* The structure (one for each basic block) containing the information
@@ -379,9 +368,6 @@ static alloc_pool valvar_pool;
 /* Alloc pool for struct location_chain_def.  */
 static alloc_pool loc_chain_pool;

-/* Alloc pool for struct shared_hash_def.  */
-static alloc_pool shared_hash_pool;
-
 /* Alloc pool for struct value_chain_def.  */
 static alloc_pool value_chain_pool;

@@ -395,7 +381,7 @@ static htab_t value_chains;
 static bool emit_notes;

 /* Empty shared hashtable.  */
-static shared_hash empty_shared_hash;
+static htab_t empty_htab;

 /* Scratch register bitmap used by cselib_expand_value_rtx.  */
 static bitmap scratch_regs = NULL;

@@ -478,7 +464,7 @@ static void **delete_slot_part (dataflow
 static void delete_variable_part (dataflow_set *, rtx, decl_or_value,
                                   HOST_WIDE_INT);
 static int emit_note_insn_var_location (void **, void *);
-static void emit_notes_for_changes (rtx, enum emit_note_where, shared_hash);
+static void emit_notes_for_changes (rtx, enum emit_note_where, htab_t);
 static int emit_notes_for_differences_1 (void **, void *);
 static int emit_notes_for_differences_2 (void **, void *);
 static void emit_notes_for_differences (rtx, dataflow_set *, dataflow_set *);
@@ -1370,164 +1356,30 @@ attrs_list_mpdv_union (attrs *dstp, attr
     }
 }

-/* Shared hashtable support.  */
-
-/* Return true if VARS is shared.
*/
-
-static inline bool
-shared_hash_shared (shared_hash vars)
-{
-  return vars->refcount > 1;
-}
-
-/* Return the hash table for VARS.  */
-
-static inline htab_t
-shared_hash_htab (shared_hash vars)
-{
-  return vars->htab;
-}
-
 /* Return true if VAR is shared, or maybe because VARS is shared.  */

 static inline bool
-shared_var_p (variable var, shared_hash vars)
+shared_var_p (variable var)
 {
   /* Don't count an entry in the changed_variables table as a duplicate.  */
-  return ((var->refcount > 1 + (int) var->in_changed_variables)
-          || shared_hash_shared (vars));
+  return (var->refcount - (int) var->in_changed_variables) > 1;
 }

-/* Copy all variables from hash table SRC to hash table DST without rehashing
-   any values.  */
-
-static htab_t
-htab_dup (htab_t src)
-{
-  htab_t dst;
-
-  dst = (htab_t) xmalloc (sizeof (*src));
-  memcpy (dst, src, sizeof (*src));
-  dst->entries = (void **) xmalloc (src->size *
Re: Dump stats about hottest hash tables when -fmem-report
I should note here that the specialised hash tables in pointer-set.c have a load factor of at most 25%. Also another very fast hash table I've studied, dense_hash_map from Google's sparsehash library, has a load factor of 50% max. As I understand it, a good hash function gives a perfectly random value up to N. So if we have only 25% of slots empty, like we do with today's 75% load factor, the chances that we won't have collisions are small. And collisions cost more for an open-addressing hash table. But for implementing a closed-addressing hash table with linked lists for collision resolution, we'd need allocator pools within libiberty. Is it OK to move alloc-pool.[ch] into libiberty as a first step? Dimitris
Re: [var-tracking] [not-good!] disable shared_hash and other simplifications
On Mon, Aug 22, 2011 at 02:26:58PM +0300, Dimitrios Apostolou wrote: the attached patch applies after my previous one, and actually cancels all runtime gains from it. It doesn't make things worse than initially, so it's not *that* bad. You should really test it on the testcases which lead to the changes, e.g. http://gcc.gnu.org/ml/gcc-patches/2009-06/msg01586.html You might not notice significant differences on small or common testcases, but on larger testcases it may be a question whether something can be compiled at all or not. You'll see that reverting it is actually very bad for the worst cases. * This is the part of the compiler that makes the heaviest use of hash tables. htab_traverse(), htab_delete(), FOR_EACH_HTAB_ELEMENT (should that be moved in libiberty?) all iterate way too many times over hash tables. This is all because decl/value UIDs are sparse. * Since var-tracking is happening per-function, is there a way to produce per-function UIDs and use those in a regular array instead of a sparse hash table? These UIDs could be produced at tree/rtx level, and I bet other parts in the compiler could make good use of them. That wouldn't help you, you don't want/need to represent every DECL_UID and every VALUE seen in the whole function in every bb, only those that were actually seen there. Having (number_of_seen_decl_uids+number_of_values)*number_of_basic_blocks*sizeof(void*) memory requirements would be still bigger than the size of all the hash tables together, I think much bigger actually, and also slower when you'd need to traverse it. * From a very wide field of view, all this DF solving reminded me a lot of what I've seen in df-*.c. Why can't variable tracking be integrated in the main DF pass of the compiler, the one that happens *after* register allocation? I don't think there is any significant overlap. 
Yes, var-tracking is a data flow algorithm, but the code that does that is just a few lines, the big part is how the merges are done and how the tracking propagates through a basic block, and that has nothing in common with df-*.c. Jakub
Re: Dump stats about hottest hash tables when -fmem-report
On Mon, Aug 22, 2011 at 1:37 PM, Dimitrios Apostolou ji...@gmx.net wrote: I should note here that specialised hash-tables in pointer-set.c have a load-factor of at most 25%. Also another very fast hash table I've studied, dense_hash_map from google's sparse_hash_table, has a load factor of 50% max. As I understand it a good hash function gives a perfectly random value up to N. So if we have only 25% of slots empty, like we do with today's 75% load factor, chances we won't have collision are small. And collisions cost more for an open-addressing hash table. But for implementing a closed-addressing hash-table with linked lists for collision resolution, we'd need allocator pools within libiberty. Is it OK to move alloc-pool.[ch] into libiberty as a first step? Well, what would be the constant overhead for a closed-addressing hash if it is empty and per element (per collision)? Decreasing the load factor might be a better choice ... Richard. Dimitris
Re: [var-tracking] small speed-ups
On Mon, Aug 22, 2011 at 01:30:33PM +0300, Dimitrios Apostolou wrote:

--- gcc/var-tracking.c	2011-06-03 01:42:31 +
+++ gcc/var-tracking.c	2011-08-22 09:52:08 +
@@ -1069,14 +1067,14 @@ adjust_insn (basic_block bb, rtx insn)
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  return !dv || ((int) TREE_CODE ((tree) dv) != (int) VALUE);
 }

Why?

 /* Return true if a decl_or_value is a VALUE rtl.  */
 static inline bool
 dv_is_value_p (decl_or_value dv)
 {
-  return dv && !dv_is_decl_p (dv);
+  return !dv_is_decl_p (dv);
 }

This is fine, though I don't think it makes any difference if var-tracking is compiled with -O+ - gcc should be able to optimize the second NULL/non-NULL check out.

@@ -1191,7 +1189,7 @@ dv_uid2hash (dvuid uid)
 static inline hashval_t
 dv_htab_hash (decl_or_value dv)
 {
-  return dv_uid2hash (dv_uid (dv));
+  return (hashval_t) (dv_uid (dv));
 }

Why? dv_uid2hash is an inline that does exactly that.

@@ -1202,7 +1200,7 @@ variable_htab_hash (const void *x)
 {
   const_variable const v = (const_variable) x;

-  return dv_htab_hash (v->dv);
+  return (hashval_t) (dv_uid (v->dv));
 }

Why?

 /* Compare the declaration of variable X with declaration Y.  */
@@ -1211,9 +1209,8 @@ static int
 variable_htab_eq (const void *x, const void *y)
 {
   const_variable const v = (const_variable) x;
-  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);

-  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));
+  return (v->dv) == y;
 }

Why?

@@ -1397,19 +1398,40 @@ shared_var_p (variable var, shared_hash
           || shared_hash_shared (vars));
 }

+/* Copy all variables from hash table SRC to hash table DST without rehashing
+   any values.
*/
+
+static htab_t
+htab_dup (htab_t src)
+{
+  htab_t dst;
+
+  dst = (htab_t) xmalloc (sizeof (*src));
+  memcpy (dst, src, sizeof (*src));
+  dst->entries = (void **) xmalloc (src->size * sizeof (*src->entries));
+  memcpy (dst->entries, src->entries,
+          src->size * sizeof (*src->entries));
+  return dst;
+}
+

This certainly doesn't belong here, it should go into libiberty/hashtab.c with a prototype in include/hashtab.h. It relies on hashtab.c implementation details.

@@ -2034,7 +2041,8 @@ val_resolve (dataflow_set *set, rtx val,
 static void
 dataflow_set_init (dataflow_set *set)
 {
-  init_attrs_list_set (set->regs);
+  /* Initialize the set (array) SET of attrs to empty lists.  */
+  memset (set->regs, 0, sizeof (set->regs));
   set->vars = shared_hash_copy (empty_shared_hash);
   set->stack_adjust = 0;
   set->traversed_vars = NULL;

I'd say you should instead just implement init_attrs_list_set inline using memset.

   dst->vars = (shared_hash) pool_alloc (shared_hash_pool);
   dst->vars->refcount = 1;
   dst->vars->htab
-    = htab_create (MAX (src1_elems, src2_elems), variable_htab_hash,
+    = htab_create (2 * MAX (src1_elems, src2_elems), variable_htab_hash,
                   variable_htab_eq, variable_htab_free);

This looks wrong, 2 * max is definitely too much.

@@ -8996,11 +9006,13 @@ vt_finalize (void)
   FOR_ALL_BB (bb)
     {
-      dataflow_set_destroy (&VTI (bb)->in);
-      dataflow_set_destroy (&VTI (bb)->out);
+      /* The false do_free parameter means to not bother to iterate and free
+         all hash table elements, since we'll destroy the pools.  */
+      dataflow_set_destroy (&VTI (bb)->in, false);
+      dataflow_set_destroy (&VTI (bb)->out, false);
       if (VTI (bb)->permp)
        {
-         dataflow_set_destroy (VTI (bb)->permp);
+         dataflow_set_destroy (VTI (bb)->permp, false);
          XDELETE (VTI (bb)->permp);
        }
     }

How much does this actually speed things up (the not freeing pool-allocated stuff during finalization)? Is it really worth it? Jakub
Re: Vector Comparison patch
On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Richard, I formalized the approach a little bit; now it works without target hooks, but some polishing is still required. I want you to comment on the several important approaches that I use in the patch. So how does it work.

1) All the vector comparisons at the level of the type-checker are introduced using VEC_COND_EXPR with constant selection operands being {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 > v1, {-1}, {0}>.

2) When optabs expand VEC_COND_EXPR, two cases are considered: 2.a) the first operand of VEC_COND_EXPR is a comparison, in which case nothing changes. 2.b) the first operand is something else; in that case we specially mark this case, recognize it in the backend, and do not create a comparison, but use the mask as if it were the result of some comparison.

3) In order to make sure that the mask in VEC_COND_EXPR<mask, v0, v1> is a vector comparison we use the is_vector_comparison function; if it returns false, then we replace mask with mask != {0}.

So we end up with the following functionality: VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is the result of a comparison of two vectors, we leave it as it is, otherwise we change it to mask != {0}. A plain vector comparison a op b is represented with VEC_COND_EXPR, which correctly expands without creating useless masking.

Basically for me there are two questions: 1) Can we perform information passing in optabs in a nicer way? 2) How could is_vector_comparison be improved? I have several ideas, like checking whether a constant vector consists entirely of 0 and -1, and so on. But first, is it conceptually fine?

P.S. I tried to put the functionality of is_vector_comparison in tree-ssa-forwprop, but the thing is that it is called only with -On, which I find inappropriate, and the functionality gets more complicated.

Why is it inappropriate to not optimize it at -O0?
If the user separates comparison and ?: expression it's his own fault.

Well, because all the information is there, and I can perfectly envision the case where the user expressed the comparison separately just to avoid code duplication. Like:

mask = a > b;
res1 = mask ? v0 : v1;
res2 = mask ? v2 : v3;

Which in this case would be different from:

res1 = a > b ? v0 : v1;
res2 = a > b ? v2 : v3;

Btw, the new hook is still in the patch. I would simply always create != 0 if it isn't and let optimizers (tree-ssa-forwprop.c) optimize this. You still have to deal with non-comparison operands during expansion though, but if you always forced a != 0 from the start you can then simply interpret it as a proper comparison result (in which case I'd modify the backends to have an alternate pattern or directly expand to masking operations - using the fake comparison RTX is too much of a hack).

Richard, I think you didn't get the problem. I really need a way to pass the information that the expression in the first operand of vcond is an appropriate mask. I thought for quite a while and this hack is the only answer I found; is there a better way to do that? I could for example introduce another tree node, but that would be overkill as well. Now, why do I need it so much: I want to implement the comparison in a way that {1, 5, 0, -1} is actually {-1,-1,-1,-1}. So whenever I am not sure that the mask of VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not always). To check the stuff, I use is_vector_comparison in tree-vect-generic. So I really have the difference between mask ? x : y and mask != {0} ? x : y, otherwise I could treat mask != {0} in the backend as just mask. If this link between optabs and the backend breaks, then the patch falls apart, because every time the comparison is taken out of VEC_COND_EXPR I will have to put != {0}. Keep in mind that I cannot always put the comparison inside the VEC_COND_EXPR: what if it is defined in a function-comparison, or somehow else?
So what would be an appropriate way to connect optabs and the backend? Thanks, Artem. All the rest would be adjusted later.

 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value);
+      return build_vector_from_val (type, tval);

as value is either 0 or 1 that won't work. Oh, I see you pass -1 for true in the callers. But I think we should simply decide that true (1) means -1 for a vector boolean node (and the value parameter should be a bool instead). Thus,

+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return
[dwarf2out, elfos] Output assembly faster
Hello again, most of this patch was posted at the beginning of my GSOC adventure and primarily is about replacing fprintf() calls with custom faster ones. From that time I changed minor things you suggested, omitted some complexities that offered minor speed-up, and made the code as clear as possible, while following coding guidelines more closely. Bootstrapped and tested on both i386 and x86_64, showing runtime improvement when compiling with debug info enabled. The only thing I am not sure about is the differentiation between HOST_WIDE_INT and long, so I provided two versions of the same function. Please let me know if I should handle this differently. Future speedup in assembly generation *is possible* but requires more intrusive changes, maybe making assembly hard to read by a human:

* Always use hex numbers, they are much faster to produce
* Call GNU assembler with the -s parameter, though it's pretty hard to be compliant.
* Even further in the future we could generate binary data, if we *know* the assembler is GAS.

Changelog:
2011-08-12 Dimitrios Apostolou ji...@gmx.net

	* final.c, output.h (fprint_whex, fprint_w, fprint_ul, sprint_ul):
	New functions serving as fast replacements for fprintf() integer
	to string conversions.  They were used in the following changes.
	* final.c (sprint_ul_rev): Internal helper for the above.
	(output_addr_const): case CONST_INT: don't use fprintf().
	* elfos.h (ASM_GENERATE_INTERNAL_LABEL): Don't use sprintf ("%u"),
	use sprint_ul() and a custom loop, which are much faster.
	(TARGET_ASM_INTERNAL_LABEL): Define as default_elf_internal_label.
	(ELF_ASCII_ESCAPES, ELF_STRING_LIMIT): Are the old ESCAPES and
	STRING_LIMIT macros.
	(ASM_OUTPUT_LIMITED_STRING, ASM_OUTPUT_ASCII): Macros now just call
	respective functions that provide now the same functionality.
	Those are default_elf_asm_output_limited_string() and
	default_elf_asm_output_ascii() in varasm.c.
	* varasm.c: Fixed some whitespace inconsistencies.
	(default_elf_asm_output_limited_string)
	(default_elf_asm_output_ascii): The above macros from elfos.h are
	implemented here, avoiding superfluous calls to fprintf().
	(default_elf_internal_label): Hook for targetm.asm_out.internal_label
	and ASM_OUTPUT_DEBUG_LABEL.
	* i386.c: Don't call fprintf ("%u") but fprint_ul() instead.
	* defaults.h (ASM_OUTPUT_LABEL, ASM_OUTPUT_INTERNAL_LABEL): Expanded
	the macros on multiple lines for readability.
	(ASM_OUTPUT_LABELREF): Have two calls to fputs() instead of one to
	asm_fprintf().
	* dwarf2asm.c (dw2_assemble_integer, dw2_asm_output_data)
	(dw2_asm_data_uleb128, dw2_asm_delta_uleb128)
	(dw2_asm_delta_sleb128): Convert fprintf() calls to faster,
	simpler functions.
	* dwarf2out.c (dwarf2out_source_line): Convert fprintf() calls to
	faster, simpler functions.

Thanks, Dimitris

=== modified file 'gcc/config/elfos.h'
--- gcc/config/elfos.h 2010-11-21 00:54:14 +
+++ gcc/config/elfos.h 2011-08-22 11:39:43 +
@@ -117,10 +117,23 @@ see the files COPYING3 and COPYING.RUNTI
 #define ASM_GENERATE_INTERNAL_LABEL(LABEL, PREFIX, NUM)	\
   do								\
     {								\
-      sprintf (LABEL, "*.%s%u", PREFIX, (unsigned) (NUM));	\
+      int __i = 2;						\
+      int __j = 0;						\
+      (LABEL)[0] = '*';						\
+      (LABEL)[1] = '.';						\
+      do							\
+	{							\
+	  (LABEL)[__i] = (PREFIX)[__j];				\
+	  __i++; __j++;						\
+	}							\
+      while ((PREFIX)[__j] != '\0');				\
+      sprint_ul (&(LABEL)[__i], (unsigned long) (NUM));		\
     }								\
   while (0)

+#undef TARGET_ASM_INTERNAL_LABEL
+#define TARGET_ASM_INTERNAL_LABEL default_elf_internal_label
+
 /* Output the label which precedes a jumptable.  Note that for all svr4
    systems where we actually generate jumptables (which is to say every
    svr4 target except i386, where we use casesi instead) we put the jump-
@@ -371,7 +384,7 @@ see the files COPYING3 and COPYING.RUNTI
    the i386) don't know about that.  Also, we don't use \v since some
    versions of gas, such as 2.2 did not accept it.  */

-#define ESCAPES \
+#define ELF_ASCII_ESCAPES \
 \1\1\1\1\1\1\1\1btn\1fr\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\
Re: GIMPLE and intent of variables
Thank you for all the answers. Now I have another problem. When I reach a function call during statement iteration, I want to go to this function's code.

if (is_gimple_call (stmt))
{
  tree fndecl = gimple_call_fndecl (stmt); // This returns the function_decl ;-)
  struct function *new_cfun = DECL_STRUCT_FUNCTION (fndecl); /* This returns struct function * from the tree */
  /* Push the current cfun onto the stack, and set cfun to new_cfun.  */
  push_cfun (new_cfun);
  annotate_function (cfun);
  /* Pop cfun from the stack.  */
  pop_cfun ();
}

I have a problem with variable types (TREE_CODE). When I change cfun (using push_cfun), temporary variables used in the new function have TREE_CODE equal to VAR_DECL instead of SSA_NAME. I think I have reached the called function before it was passed through SSA. Should I pass the function through SSA manually or somehow change the moment when the plugin is run, so that all functions have already been passed through SSA? -- View this message in context: http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310211.html Sent from the gcc - patches mailing list archive at Nabble.com.
Re: GIMPLE and intent of variables
On Mon, Aug 22, 2011 at 2:21 PM, Mateusz Grabowski grabek...@wp.pl wrote: Thank you for all the answers. Now I have another problem. When I reach a function call during statement iteration, I want to go to this function's code. if (is_gimple_call(stmt)) { tree fndecl = gimple_call_fndecl(stmt); // This returns function_decl ;-) struct funtion * new_cfun = DECL_STRUCT_FUNCTION(fndecl); /* This returns struct function * from tree */ /* Push the current cfun onto the stack, and set cfun to new_cfun. */ push_cfun (new_cfun); annotate_function(cfun); /* Pop cfun from the stack. */ pop_cfun(); } I have a problem with variable types (TREE_CODE). When I change cfun (using push_cfun), temporary variables, used in new function, have TREE_CODE equals VAR_DECL instead of SSA_NAME. I think I have reached the called function before it was passed through SSA. Should I pass the function through SSA manually or somehow change the moment when the plugin is run, so that all function are already passed through SSA? The latter. View this message in context: http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310211.html Sent from the gcc - patches mailing list archive at Nabble.com.
Re: GIMPLE and intent of variables
Richard Guenther-2 wrote: The latter. But how to do it? I want to have all functions after SSA pass, but before any optimizations. Maybe you could tell me how to enforce it (or even better - a small example)? Thanks in advance. -- View this message in context: http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310942.html Sent from the gcc - patches mailing list archive at Nabble.com.
Re: [trans-mem] Use __x86_64__ instead of __LP64__.
On Mon, Aug 22, 2011 at 2:42 AM, Torvald Riegel trie...@redhat.com wrote: Use __x86_64__ instead of __LP64__ in setjmp/longjmp and TLS definitions. H.J.: Is that sufficient for x32, or do we need entirely different code? If so, can you please provide the required changes? I need to take a look. Thanks. -- H.J.
Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
Hi Steven, On Mon, 1 Aug 2011, Steven Bosscher wrote: On Sun, Jul 31, 2011 at 11:59 PM, Steven Bosscher stevenb@gmail.com wrote: On Fri, Jul 29, 2011 at 11:48 PM, Steven Bosscher stevenb@gmail.com wrote: I'll see if I can test the patch on the compile farm this weekend, just to be sure. Bootstrap on ia64-unknown-linux-gnu is in stage2 but it is taking forever (on gcc60)... Just to be clear, it is taking forever without the patch too. I did time -O2 non-bootstrap builds but there was no difference worth mentioning. Did the testsuite finish with no regressions on IA64? I'd test on a sparcstation but unfortunately the machine crashed before finishing and I can't regain access to it. I'll hopefully test it in some other platform by next week, I'm curious to actually measure runtime there. Thanks for helping! Dimitris
Re: [PATCH, testsuite, i386] AVX2 support for GCC
On Mon, Aug 22, 2011 at 6:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi, thanks for input, Uros. Spaces were fixed. Updated patch is attached. ChangeLog entry is attached. Could anybody please commit it? I checked in for you. -- H.J.
Re: [PATCH, testsuite, i386] AVX2 support for GCC
Thanks! K On Mon, Aug 22, 2011 at 5:57 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Aug 22, 2011 at 6:18 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi, thanks for input, Uros. Spaces were fixed. Updated patch is attached. ChangeLog entry is attached. Could anybody please commit it? I checked in for you. -- H.J.
[testsuite,committed]: Add require scheduling to gcc.dg/pr49994-[23].c
http://gcc.gnu.org/viewcvs?view=revision&revision=177954 Committed that patchlet as obvious.

testsuite/
	* gcc.dg/pr49994-2.c: Add dg-require-effective-target scheduling.
	* gcc.dg/pr49994-3.c: Ditto.

Johann

Index: ChangeLog
===
--- ChangeLog (revision 177953)
+++ ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2011-08-22  Georg-Johann Lay  a...@gjlay.de
+
+	* gcc.dg/pr49994-2.c: Add dg-require-effective-target scheduling.
+	* gcc.dg/pr49994-3.c: Ditto.
+
 2011-08-22  Richard Guenther  rguent...@suse.de

 	PR testsuite/50145
Index: gcc.dg/pr49994-3.c
===
--- gcc.dg/pr49994-3.c (revision 177949)
+++ gcc.dg/pr49994-3.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fsched2-use-superblocks -g" } */
+/* { dg-require-effective-target scheduling } */

 void *
 foo (int offset)
Index: gcc.dg/pr49994-2.c
===
--- gcc.dg/pr49994-2.c (revision 177949)
+++ gcc.dg/pr49994-2.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fno-omit-frame-pointer -fschedule-insns2 -fsched2-use-superblocks -g" } */
+/* { dg-require-effective-target scheduling } */

 int
 bar (int i)
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote: On Sun, Aug 21, 2011 at 05:09:59PM -0700, H.J. Lu wrote: I didn't know .init_array section was enabled for AIX. Does this patch work for you? Some ELF targets (e.g. arm*-linux*) don't use elfos.h. IMHO you should instead add #ifndef __ELF__ #error NonELF #endif to gcc_AC_INITFINI_ARRAY test. And/or add another test to it that tests that you can actually use .section .init_array and it will use correct section flags for the section. I will update the test. -- H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
Hi, On Sun, 21 Aug 2011, Richard Guenther wrote: On Sat, Aug 20, 2011 at 11:02 PM, Richard Henderson r...@redhat.com wrote: On 08/19/2011 02:04 AM, Richard Guenther wrote: So make sure that __cpu_indicator initially has a conservative correct value? I'd still prefer the constructor-in-libgcc option - if only because then the compiler-side is much simplified. Err, I thought __cpu_indicator was a function, not data. I think we need to discuss this more... Oh, I thought it was data initialized by the constructor ... Sriramans patch right now has a function __cpu_indicator_init which is called from (adhoc constructed) ctors and that initializes variables __cpu_model and __cpu_features ;-) There's no __cpu_indicator symbol :) I think the whole initializer function and the associated data blobs have to sit in static libgcc and be hidden. By that all shared modules will have their own copies of the model and features (and the initializer function) so there won't be issues with copy relocs, or cross shared lib calls while relocating the modules. Dynamically they will contain the same data always, but it's not many bytes, and only modules making use of this facility will pay it. The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. And to make life easier there should be one ctor function calling this initializer function too, so that normal code can rely on it being already called saving one check. Ciao, Michael.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Mon, Aug 22, 2011 at 7:07 AM, Michael Matz m...@suse.de wrote: Hi, On Sun, 21 Aug 2011, Richard Guenther wrote: On Sat, Aug 20, 2011 at 11:02 PM, Richard Henderson r...@redhat.com wrote: On 08/19/2011 02:04 AM, Richard Guenther wrote: So make sure that __cpu_indicator initially has a conservative correct value? I'd still prefer the constructor-in-libgcc option - if only because then the compiler-side is much simplified. Err, I thought __cpu_indicator was a function, not data. I think we need to discuss this more... Oh, I thought it was data initialized by the constructor ... Sriramans patch right now has a function __cpu_indicator_init which is called from (adhoc constructed) ctors and that initializes variables __cpu_model and __cpu_features ;-) There's no __cpu_indicator symbol :) I think the whole initializer function and the associated data blobs have to sit in static libgcc and be hidden. By that all shared modules will have their own copies of the model and features (and the initializer function) so there won't be issues with copy relocs, or cross shared lib calls while relocating the modules. Dynamically they will contain the same data always, but it's not many bytes, and only modules making use of this facility will pay it. The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. And to make life easier there should be one ctor function calling this initializer function too, so that normal code can rely on it being already called saving one check. It sounds more complicated than necessary. Why not just do it on demand like glibc does? -- H.J.
Re: GIMPLE and intent of variables
On Mon, Aug 22, 2011 at 2:57 PM, Mateusz Grabowski grabek...@wp.pl wrote: Richard Guenther-2 wrote: The latter. But how to do it? I want to have all functions after SSA pass, but before any optimizations. Maybe you could tell me how to enforce it (or even better - a small example)? Thanks in advance. There is no such place at the moment. -- View this message in context: http://old.nabble.com/GIMPLE-and-intent-of-variables-tp32275433p32310942.html Sent from the gcc - patches mailing list archive at Nabble.com.
Re: [PATCH 4/7] Support -fdebug-cpp option
>>>>> "Jakub" == Jakub Jelinek ja...@redhat.com writes:

Jakub> For ccache and friends I think it would be better to have a
Jakub> preprocessing mode that would output all lines as is (i.e. no
Jakub> macro replacement), except for processing #include/#include_next
Jakub> directives.

That exists -- -fdirectives-only. Tom
[PATCH] Fix a RTL sharing problem with CALL_INSN_FUNCTION_USAGE (PR middle-end/48722)
Hi! As the testcase below shows (on i686-linux or x86_64-linux -m32), we don't unshare expressions in CALL_INSN_FUNCTION_USAGE, which with entry_value support now include MEMs. The invalid sharing then can lead to changes in unrelated insns affecting a call insn (or vice versa), which results in DF being out of date and an ICE afterwards. Fixed thusly, bootstrapped/regtested (with rtl checking) on x86_64-linux and i686-linux, ok for trunk?

2011-08-22  Jakub Jelinek  ja...@redhat.com

	PR middle-end/48722
	* emit-rtl.c (unshare_all_rtl_again): For CALL_INSNs,
	reset_used_flags also in CALL_INSN_FUNCTION_USAGE.
	(verify_rtl_sharing): Likewise and verify_rtx_sharing in there too.
	(unshare_all_rtl_in_chain): For CALL_INSNs copy_rtx_if_shared also
	CALL_INSN_FUNCTION_USAGE.

	* gcc.target/i386/pr48722.c: New test.

--- gcc/emit-rtl.c.jj	2011-08-18 08:36:00.0 +0200
+++ gcc/emit-rtl.c	2011-08-22 08:48:27.0 +0200
@@ -2444,6 +2444,8 @@ unshare_all_rtl_again (rtx insn)
       {
	 reset_used_flags (PATTERN (p));
	 reset_used_flags (REG_NOTES (p));
+	if (CALL_P (p))
+	  reset_used_flags (CALL_INSN_FUNCTION_USAGE (p));
       }

   /* Make sure that virtual stack slots are not shared.
*/ @@ -2610,6 +2612,8 @@ verify_rtl_sharing (void) { reset_used_flags (PATTERN (p)); reset_used_flags (REG_NOTES (p)); + if (CALL_P (p)) + reset_used_flags (CALL_INSN_FUNCTION_USAGE (p)); if (GET_CODE (PATTERN (p)) == SEQUENCE) { int i; @@ -2621,6 +2625,8 @@ verify_rtl_sharing (void) gcc_assert (INSN_P (q)); reset_used_flags (PATTERN (q)); reset_used_flags (REG_NOTES (q)); + if (CALL_P (q)) + reset_used_flags (CALL_INSN_FUNCTION_USAGE (q)); } } } @@ -2630,6 +2636,8 @@ verify_rtl_sharing (void) { verify_rtx_sharing (PATTERN (p), p); verify_rtx_sharing (REG_NOTES (p), p); + if (CALL_P (p)) + verify_rtx_sharing (CALL_INSN_FUNCTION_USAGE (p), p); } timevar_pop (TV_VERIFY_RTL_SHARING); @@ -2646,6 +2654,9 @@ unshare_all_rtl_in_chain (rtx insn) { PATTERN (insn) = copy_rtx_if_shared (PATTERN (insn)); REG_NOTES (insn) = copy_rtx_if_shared (REG_NOTES (insn)); + if (CALL_P (insn)) + CALL_INSN_FUNCTION_USAGE (insn) + = copy_rtx_if_shared (CALL_INSN_FUNCTION_USAGE (insn)); } } --- gcc/testsuite/gcc.target/i386/pr48722.c.jj 2011-08-22 08:53:10.0 +0200 +++ gcc/testsuite/gcc.target/i386/pr48722.c 2011-08-22 08:52:37.0 +0200 @@ -0,0 +1,13 @@ +/* PR middle-end/48722 */ +/* { dg-do compile } */ +/* { dg-options -Os -mno-push-args } */ + +extern long long a; +extern int b; +void bar (int, long long); + +void +foo (void) +{ + bar (a 0x85, b); +} Jakub
[PATCH] Fix ICEs in get_bit_range (PR middle-end/50141)
Hi! DECL_THREAD_LOCAL_P may be used only on VAR_DECLs, not other decls like PARM_DECL, RESULT_DECL etc. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-08-22 Jakub Jelinek ja...@redhat.com PR middle-end/50141 * expr.c (get_bit_range): Only use DECL_THREAD_LOCAL_P if innerdecl is a VAR_DECL. * c-c++-common/cxxbitfields-6.c: New test. --- gcc/expr.c.jj 2011-08-22 08:17:07.0 +0200 +++ gcc/expr.c 2011-08-22 10:06:26.0 +0200 @@ -4354,7 +4354,8 @@ get_bit_range (unsigned HOST_WIDE_INT *b || TREE_CODE (innerdecl) == TARGET_MEM_REF) !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0))) || (DECL_P (innerdecl) - (DECL_THREAD_LOCAL_P (innerdecl) + ((TREE_CODE (innerdecl) == VAR_DECL + DECL_THREAD_LOCAL_P (innerdecl)) || !TREE_STATIC (innerdecl { *bitstart = *bitend = 0; --- gcc/testsuite/c-c++-common/cxxbitfields-6.c.jj 2011-08-22 10:11:34.0 +0200 +++ gcc/testsuite/c-c++-common/cxxbitfields-6.c 2011-08-22 10:11:59.0 +0200 @@ -0,0 +1,17 @@ +/* PR middle-end/50141 */ +/* { dg-do compile } */ +/* { dg-options -O2 --param allow-store-data-races=0 } */ + +struct S +{ + int i:8; +}; + +void bar (struct S, int); + +void +foo (struct S s, int i) +{ + s.i = i; + bar (s, i); +} Jakub
[PATCH] Fix ICEs in vect_finish_stmt_generation (PR tree-optimization/50133)
Hi! The following testcase ICEs, because gsi_end_p (*gsi) and thus there is no stmt after it from which to copy over the location. As can be seen in the PR, we could do ugly hacks to retrieve locus from previous stmt (non-debug of course) instead, but I'm probably missing something obvious why we shouldn't take location from stmt itself instead. We've been doing that before, just PR37482 patch changed that without an explanation. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-08-22 Jakub Jelinek ja...@redhat.com PR tree-optimization/50133 * tree-vect-stmts.c (vect_finish_stmt_generation): Copy location from stmt instead of some statement around gsi. * gcc.dg/pr50133.c: New test. --- gcc/tree-vect-stmts.c.jj2011-08-22 08:17:08.0 +0200 +++ gcc/tree-vect-stmts.c 2011-08-22 11:26:27.0 +0200 @@ -1419,7 +1419,6 @@ vect_finish_stmt_generation (gimple stmt stmt_vec_info stmt_info = vinfo_for_stmt (stmt); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); - gimple_stmt_iterator si; gcc_assert (gimple_code (stmt) != GIMPLE_LABEL); @@ -1434,13 +1433,7 @@ vect_finish_stmt_generation (gimple stmt print_gimple_stmt (vect_dump, vec_stmt, 0, TDF_SLIM); } - si = *gsi; - if (is_gimple_debug (gsi_stmt (si))) -{ - gsi_next_nondebug (si); - gcc_assert (!gsi_end_p (si)); -} - gimple_set_location (vec_stmt, gimple_location (gsi_stmt (si))); + gimple_set_location (vec_stmt, gimple_location (stmt)); } /* Checks if CALL can be vectorized in type VECTYPE. Returns --- gcc/testsuite/gcc.dg/pr50133.c.jj 2011-08-22 11:27:15.0 +0200 +++ gcc/testsuite/gcc.dg/pr50133.c 2011-08-22 11:12:04.0 +0200 @@ -0,0 +1,18 @@ +/* PR tree-optimization/50133 */ +/* { dg-do compile } */ +/* { dg-options -O -ftree-vectorize -fno-tree-loop-im } */ + +extern int A[], B[]; + +void +foo (int z) +{ + int j, i; + for (j = 0; j 32; j++) +{ + int a = A[0]; + for (i = 0; i 16; i++) + a = A[i] ? a : z; + B[j] = a; +} +} Jakub
[C++ PATCH] Clear TYPE_TRANSPARENT_AGGR if there are no fields (PR c++/46862)
Hi! TYPE_TRANSPARENT_AGGR types are passed and mangled as its first field. But if somebody errorneously doesn't put any types in its definition, the FE marks it TYPE_TRANSPARENT_AGGR early (e.g. even on a forward declaration), but during mangling or in middle-end when trying to find out how it will be passed we ICE because first_field is NULL. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux. Ok for trunk/4.6? 2011-08-22 Jakub Jelinek ja...@redhat.com PR c++/46862 * class.c (finish_struct_1): If TYPE_TRANSPARENT_AGGR is set on a type which doesn't have any fields, clear it and diagnose. * g++.dg/ext/decimal1.C: New test. --- gcc/cp/class.c.jj 2011-08-18 08:35:38.0 +0200 +++ gcc/cp/class.c 2011-08-22 12:52:13.0 +0200 @@ -1,6 +1,6 @@ /* Functions related to building classes and their related objects. Copyright (C) 1987, 1992, 1993, 1994, 1995, 1996, 1997, 1998, - 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010 + 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. Contributed by Michael Tiemann (tiem...@cygnus.com) @@ -5794,6 +5794,12 @@ finish_struct_1 (tree t) /* Finish debugging output for this type. */ rest_of_type_compilation (t, ! 
LOCAL_CLASS_P (t)); + + if (TYPE_TRANSPARENT_AGGR (t) first_field (t) == NULL_TREE) +{ + error (type transparent class %qT does not have any fields, t); + TYPE_TRANSPARENT_AGGR (t) = 0; +} } /* When T was built up, the member declarations were added in reverse --- gcc/testsuite/g++.dg/ext/decimal1.C.jj 2011-08-22 13:22:01.0 +0200 +++ gcc/testsuite/g++.dg/ext/decimal1.C 2011-08-22 13:16:30.0 +0200 @@ -0,0 +1,19 @@ +// PR c++/46862 +// { dg-do compile } + +namespace std +{ + namespace decimal + { +class decimal32 { }; // { dg-error does not have any fields } +class decimal64 { }; // { dg-error does not have any fields } +class decimal128 { }; // { dg-error does not have any fields } + } +} + +void +foo (std::decimal::decimal32 x, + std::decimal::decimal64 y, + std::decimal::decimal128 z) +{ +} Jakub
Re: [PATCH] avoid dangling line table after loading pch
On 11-08-22 07:10 , Dodji Seketeli wrote: Hello, In c_common_read_pch when gt_pch_restore loads a new pch, the previous line table (referenced from the global 'line_table') is garbage-collected and a new one is built. As the global instance of cpp_reader referenced by the local variable 'pfile' has a pfile-line_table member that references the 'line_table' global, pfile-line_table needs to be set to the new value of line_table after gt_pch_restore is called, otherwise pfile-line_table points to garbage-collected memory. This is what the call to cpp_set_line_map in c_common_read_pch is for. The problem is that cpp_set_line_map is called too late as cpp_read_state (called before cpp_set_line_map) indirectly touches some pfile-line_table dangling garbage-collected memory[1]. This doesn't cause any visible havoc in trunk yet but I am seeing it crashing as I am fiddling with line map stuff on the side. The two-liner patch below just calls cpp_set_line_map right after gt_pch_restore to restore pfile-line_table before anyone touches it. [1]: This happens via cpp_read_state - _cpp_create_definition - _cpp_create_iso_definition - _cpp_lex_token - _cpp_lex_direct - linemap_position_for_column. Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk. OK for trunk? gcc/ * c-family/c-pch.c (c_common_read_pch): Re-set line table right after reading in the pch. OK. Diego.
Re: [PATCH 4/7] Support -fdebug-cpp option
On Mon, Aug 22, 2011 at 08:16:45AM -0600, Tom Tromey wrote: Jakub == Jakub Jelinek ja...@redhat.com writes: Jakub For ccache and friends I think it would be better to have a Jakub preprocessing mode that would output all lines as is (i.e. no Jakub macro replacement), except for processing #include/#include_next Jakub directives. That exists -- -fdirectives-only. It isn't exactly what would be needed, as e.g. \\\n are removed from #defines and thus the tokens get different locations. BTW, I guess we should do something about parsing such an output, we emit e.g. # 1 "<stdin>" # 1 "<built-in>" #define __STDC__ 1 #define __STDC_HOSTED__ 1 #define __GNUC__ 4 #define __GNUC_MINOR__ 6 #define __GNUC_PATCHLEVEL__ 0 ... For "<built-in>" we really should assume line 0 for all the defines in that file. Jakub
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Mon, Aug 22, 2011 at 7:06 AM, H.J. Lu hjl.to...@gmail.com wrote: On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote: On Sun, Aug 21, 2011 at 05:09:59PM -0700, H.J. Lu wrote: I didn't know .init_array section was enabled for AIX. Does this patch work for you? Some ELF targets (e.g. arm*-linux*) don't use elfos.h. IMHO you should instead add #ifndef __ELF__ #error NonELF #endif to gcc_AC_INITFINI_ARRAY test. And/or add another test to it that tests that you can actually use .section .init_array and it will use correct section flags for the section. I will update the test. Can I check in this patch to address AIX issue first? I will submit a patch to test .section .init_array later? Thanks. -- H.J. --- 2011-08-22 H.J. Lu hongjiu...@intel.com * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Error if __ELF__ isn't defined. diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4 index 74c86db..a8ecd2d 100644 --- a/gcc/acinclude.m4 +++ b/gcc/acinclude.m4 @@ -377,6 +377,9 @@ AC_CACHE_CHECK(for .preinit_array/.init_array/.fini_array su pport, gcc_cv_initfini_array, [dnl if test x${build} = x${target} test x${build} = x${host}; then AC_RUN_IFELSE([AC_LANG_SOURCE([ +#ifndef __ELF__ +#error Not an ELF OS +#endif #ifdef __ia64__ /* We turn on .preinit_array/.init_array/.fini_array support for ia64 if it can be used. */
Re: [PATCH] Fix ICEs in get_bit_range (PR middle-end/50141)
On Mon, Aug 22, 2011 at 4:18 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! DECL_THREAD_LOCAL_P may be used only on VAR_DECLs, not other decls like PARM_DECL, RESULT_DECL etc. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2011-08-22 Jakub Jelinek ja...@redhat.com PR middle-end/50141 * expr.c (get_bit_range): Only use DECL_THREAD_LOCAL_P if innerdecl is a VAR_DECL. * c-c++-common/cxxbitfields-6.c: New test. --- gcc/expr.c.jj 2011-08-22 08:17:07.0 +0200 +++ gcc/expr.c 2011-08-22 10:06:26.0 +0200 @@ -4354,7 +4354,8 @@ get_bit_range (unsigned HOST_WIDE_INT *b || TREE_CODE (innerdecl) == TARGET_MEM_REF) !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0))) || (DECL_P (innerdecl) - (DECL_THREAD_LOCAL_P (innerdecl) + ((TREE_CODE (innerdecl) == VAR_DECL + DECL_THREAD_LOCAL_P (innerdecl)) || !TREE_STATIC (innerdecl { *bitstart = *bitend = 0; --- gcc/testsuite/c-c++-common/cxxbitfields-6.c.jj 2011-08-22 10:11:34.0 +0200 +++ gcc/testsuite/c-c++-common/cxxbitfields-6.c 2011-08-22 10:11:59.0 +0200 @@ -0,0 +1,17 @@ +/* PR middle-end/50141 */ +/* { dg-do compile } */ +/* { dg-options -O2 --param allow-store-data-races=0 } */ + +struct S +{ + int i:8; +}; + +void bar (struct S, int); + +void +foo (struct S s, int i) +{ + s.i = i; + bar (s, i); +} Jakub
Re: [PATCH] Fix ICEs in vect_finish_stmt_generation (PR tree-optimization/50133)
On Mon, Aug 22, 2011 at 4:22 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! The following testcase ICEs, because gsi_end_p (*gsi) and thus there is no stmt after it from which to copy over the location. As can be seen in the PR, we could do ugly hacks to retrieve locus from previous stmt (non-debug of course) instead, but I'm probably missing something obvious why we shouldn't take location from stmt itself instead. We've been doing that before, just PR37482 patch changed that without an explanation. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2011-08-22 Jakub Jelinek ja...@redhat.com PR tree-optimization/50133 * tree-vect-stmts.c (vect_finish_stmt_generation): Copy location from stmt instead of some statement around gsi. * gcc.dg/pr50133.c: New test. --- gcc/tree-vect-stmts.c.jj 2011-08-22 08:17:08.0 +0200 +++ gcc/tree-vect-stmts.c 2011-08-22 11:26:27.0 +0200 @@ -1419,7 +1419,6 @@ vect_finish_stmt_generation (gimple stmt stmt_vec_info stmt_info = vinfo_for_stmt (stmt); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); - gimple_stmt_iterator si; gcc_assert (gimple_code (stmt) != GIMPLE_LABEL); @@ -1434,13 +1433,7 @@ vect_finish_stmt_generation (gimple stmt print_gimple_stmt (vect_dump, vec_stmt, 0, TDF_SLIM); } - si = *gsi; - if (is_gimple_debug (gsi_stmt (si))) - { - gsi_next_nondebug (si); - gcc_assert (!gsi_end_p (si)); - } - gimple_set_location (vec_stmt, gimple_location (gsi_stmt (si))); + gimple_set_location (vec_stmt, gimple_location (stmt)); } /* Checks if CALL can be vectorized in type VECTYPE. 
Returns --- gcc/testsuite/gcc.dg/pr50133.c.jj 2011-08-22 11:27:15.0 +0200 +++ gcc/testsuite/gcc.dg/pr50133.c 2011-08-22 11:12:04.0 +0200 @@ -0,0 +1,18 @@ +/* PR tree-optimization/50133 */ +/* { dg-do compile } */ +/* { dg-options "-O -ftree-vectorize -fno-tree-loop-im" } */ + +extern int A[], B[]; + +void +foo (int z) +{ + int j, i; + for (j = 0; j < 32; j++) + { + int a = A[0]; + for (i = 0; i < 16; i++) + a = A[i] ? a : z; + B[j] = a; + } +} Jakub
Re: Vector Comparison patch
On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Richard I formalized an approach a little-bit, now it works without target hooks, but some polishing is still required. I want you to comment on the several important approaches that I use in the patch. So how does it work. 1) All the vector comparisons at the level of type-checker are introduced using VEC_COND_EXPR with constant selection operands being {-1} and {0}. For example v0 v1 is transformed into VEC_COND_EXPRv0 v1, {-1}, {0}. 2) When optabs expand VEC_COND_EXPR, two cases are considered: 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. 2.b) first operand is something else, in that case, we specially mark this case, recognize it in the backend, and do not create a comparison, but use the mask as it was a result of some comparison. 3) In order to make sure that mask in VEC_COND_EXPRmask, v0, v1 is a vector comparison we use is_vector_comparison function, if it returns false, then we replace mask with mask != {0}. So we end-up with the following functionality: VEC_COND_EXPRmask, v0,v1 -- if we know that mask is a result of comparison of two vectors, we leave it as it is, otherwise change with mask != {0}. Plain vector comparison a op b is represented with VEC_COND_EXPR, which correctly expands, without creating useless masking. Basically for me there are two questions: 1) Can we perform information passing in optabs in a nicer way? 2) How is_vector_comparison could be improved? I have several ideas, like checking if constant vector all consists of 0 and -1, and so on. But first is it conceptually fine. P.S. 
I tried to put the functionality of is_vector_comparison in tree-ssa-forwprop, but the thing is that it is called only with -On, which I find inappropriate, and the functionality gets more complicated. Why is it inappropriate to not optimize it at -O0? If the user separates comparison and ?: expression it's his own fault. Well, because all the information is there, and I perfectly envision the case when the user expressed the comparison separately, just to avoid code duplication. Like: mask = a < b; res1 = mask ? v0 : v1; res2 = mask ? v2 : v3; Which in this case would be different from res1 = a < b ? v0 : v1; res2 = a < b ? v2 : v3; Btw, the new hook is still in the patch. I would simply always create != 0 if it isn't and let optimizers (tree-ssa-forwprop.c) optimize this. You still have to deal with non-comparison operands during expansion though, but if you always forced a != 0 from the start you can then simply interpret it as a proper comparison result (in which case I'd modify the backends to have an alternate pattern or directly expand to masking operations - using the fake comparison RTX is too much of a hack). Richard, I think you didn't get the problem. I really need a way to pass the information that the expression in the first operand of vcond is an appropriate mask. I thought for quite a while and this hack is the only answer I found; is there a better way to do that? I could for example introduce another tree-node, but it would be overkill as well. Now why do I need it so much: I want to implement the comparison in a way that {1, 5, 0, -1} is actually {-1,-1,-1,-1}. So whenever I am not sure that the mask of VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not always). To check the stuff, I use is_vector_comparison in tree-vect-generic. So I really have the difference between mask ? x : y and mask != {0} ? x : y, otherwise I could treat mask != {0} in the backend as just mask.
If this link between optabs and backend breaks, then the patch falls apart. Because every time the comparison is taken out VEC_COND_EXPR, I will have to put != {0}. Keep in mind, that I cannot always put the comparison inside the VEC_COND_EXPR, what if it is defined in a function-comparison, or somehow else? So what would be an appropriate way to connect optabs and the backend? Well, there is no problem in having the only valid mask operand for VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,} mask. Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? vec1 : vec2. This comparison can be eliminated by optimization passes that either replace it by the real comparison computing the mask or just propagating the information this mask is already {-1,...} / {0,} by simply dropping the comparison against zero. For the backends I'd have vcond patterns for both an embedded comparison and for a mask. (Now we can rewind the discussion a bit and allow arbitrary masks and define a vcond with a mask operand to do bitwise selection - what matters is the C frontend semantics which we need to
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On 08/22/2011 04:45 PM, H.J. Lu wrote: Can I check in this patch to address AIX issue first? I will submit a patch to test .section .init_array later? Thanks. Yes. Paolo
[pph] Fix x1dynarray1, x1dynarray2a and x1dynarray2b (issue4921051)
This patch fixes some template test cases. We were trying to expand functions that did not really need expanding (templates). This got me thinking that we are not approaching the expansion properly. We are trying to re-create all the cgraph creation done during the compilation of the header file. Rather than re-creating all this, it would be more efficient to save the state of the callgraph and varpool at the time of pph generation and then simply recreate it at read time. I will experiment with that, see if it actually makes any sense. Tested on x86_64. Applied to pph. * pph-streamer-out.c (pph_out_symtab): Mark the function as having a body if it has a cgraph node associated with it. * pph-streamer-in.c (pph_in_binding_level): Call pph_in_tree to read BL-BLOCKS. * pph-streamer-out.c (pph_out_binding_level_1): Call pph_out_tree to write BL-BLOCKS. testsuite/ChangeLog.pph 2011-08-18 Diego Novillo dnovi...@google.com * g++.dg/pph/x1dynarray1.cc: Mark fixed. * g++.dg/pph/x1dynarray2a.cc: Mark fixed. * g++.dg/pph/x1dynarray2b.cc: Mark fixed. 
diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index 9686edf..ae3ede9 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -557,7 +557,7 @@ pph_in_binding_level (pph_stream *stream, cp_binding_level *to_register) VEC_safe_push (cp_label_binding, gc, bl-shadowed_labels, sl); } - bl-blocks = pph_in_chain (stream); + bl-blocks = pph_in_tree (stream); bl-this_entity = pph_in_tree (stream); bl-level_chain = pph_in_binding_level (stream, NULL); bl-dead_vars_from_for = pph_in_tree_vec (stream); diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c index b563891..fe0758f 100644 --- a/gcc/cp/pph-streamer-out.c +++ b/gcc/cp/pph-streamer-out.c @@ -574,7 +574,7 @@ pph_out_binding_level_1 (pph_stream *stream, cp_binding_level *bl, FOR_EACH_VEC_ELT (cp_label_binding, bl-shadowed_labels, i, sl) pph_out_label_binding (stream, sl); - pph_out_chain (stream, bl-blocks); + pph_out_tree (stream, bl-blocks); pph_out_tree (stream, bl-this_entity); pph_out_binding_level (stream, bl-level_chain, filter); pph_out_tree_vec (stream, bl-dead_vars_from_for); @@ -1128,7 +1128,9 @@ pph_out_symtab (pph_stream *stream) FOR_EACH_VEC_ELT (tree, stream-symtab.v, i, decl) if (TREE_CODE (decl) == FUNCTION_DECL DECL_STRUCT_FUNCTION (decl)) { - if (DECL_SAVED_TREE (decl)) + /* If this is a regular (non-template) function with a body, + mark it for expansion during reading. 
*/ + if (DECL_SAVED_TREE (decl) cgraph_get_node (decl)) pph_out_symtab_marker (stream, PPH_SYMTAB_FUNCTION_BODY); else pph_out_symtab_marker (stream, PPH_SYMTAB_FUNCTION); diff --git a/gcc/testsuite/g++.dg/pph/x1dynarray1.cc b/gcc/testsuite/g++.dg/pph/x1dynarray1.cc index 2a1f9f6..6ef279d 100644 --- a/gcc/testsuite/g++.dg/pph/x1dynarray1.cc +++ b/gcc/testsuite/g++.dg/pph/x1dynarray1.cc @@ -1,5 +1,4 @@ -// { dg-do link } -// { dg-xfail-if BOGUS { *-*-* } { -fpph-map=pph.map } } +// { dg-do run } #include x0dynarray1.h diff --git a/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc b/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc index f941086..252e801 100644 --- a/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc +++ b/gcc/testsuite/g++.dg/pph/x1dynarray2a.cc @@ -1,5 +1,4 @@ -// { dg-xfail-if BOGUS { *-*-* } { -fpph-map=pph.map } } -// { dg-do link } +// { dg-do run } #include x0dynarray2.h diff --git a/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc b/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc index 8f14149..c942d9d 100644 --- a/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc +++ b/gcc/testsuite/g++.dg/pph/x1dynarray2b.cc @@ -1,5 +1,4 @@ -// { dg-xfail-if BOGUS { *-*-* } { -fpph-map=pph.map } } -// { dg-do link } +// { dg-do run } #include x0dynarray2.h -- 1.7.3.1 -- This patch is available for review at http://codereview.appspot.com/4921051
Re: Vector Comparison patch
On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Richard I formalized an approach a little-bit, now it works without target hooks, but some polishing is still required. I want you to comment on the several important approaches that I use in the patch. So how does it work. 1) All the vector comparisons at the level of type-checker are introduced using VEC_COND_EXPR with constant selection operands being {-1} and {0}. For example v0 v1 is transformed into VEC_COND_EXPRv0 v1, {-1}, {0}. 2) When optabs expand VEC_COND_EXPR, two cases are considered: 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. 2.b) first operand is something else, in that case, we specially mark this case, recognize it in the backend, and do not create a comparison, but use the mask as it was a result of some comparison. 3) In order to make sure that mask in VEC_COND_EXPRmask, v0, v1 is a vector comparison we use is_vector_comparison function, if it returns false, then we replace mask with mask != {0}. So we end-up with the following functionality: VEC_COND_EXPRmask, v0,v1 -- if we know that mask is a result of comparison of two vectors, we leave it as it is, otherwise change with mask != {0}. Plain vector comparison a op b is represented with VEC_COND_EXPR, which correctly expands, without creating useless masking. Basically for me there are two questions: 1) Can we perform information passing in optabs in a nicer way? 2) How is_vector_comparison could be improved? I have several ideas, like checking if constant vector all consists of 0 and -1, and so on. But first is it conceptually fine. P.S. 
I tired to put the functionality of is_vector_comparison in tree-ssa-forwprop, but the thing is that it is called only with -On, which I find inappropriate, and the functionality gets more complicated. Why is it inappropriate to not optimize it at -O0? If the user separates comparison and ?: expression it's his own fault. Well, because all the information is there, and I perfectly envision the case when user expressed comparison separately, just to avoid code duplication. Like: mask = a b; res1 = mask ? v0 : v1; res2 = mask ? v2 : v3; Which in this case would be different from res1 = a b ? v0 : v1; res2 = a b ? v2 : v3; Btw, the new hook is still in the patch. I would simply always create != 0 if it isn't and let optimizers (tree-ssa-forwprop.c) optimize this. You still have to deal with non-comparison operands during expansion though, but if you always forced a != 0 from the start you can then simply interpret it as a proper comparison result (in which case I'd modify the backends to have an alternate pattern or directly expand to masking operations - using the fake comparison RTX is too much of a hack). Richard, I think you didn't get the problem. I really need the way, to pass the information, that the expression that is in the first operand of vcond is an appropriate mask. I though for quite a while and this hack is the only answer I found, is there a better way to do that. I could for example introduce another tree-node, but it would be overkill as well. Now why do I need it so much: I want to implement the comparison in a way that {1, 5, 0, -1} is actually {-1,-1,-1,-1}. So whenever I am not sure that mask of VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not always). To check the stuff, I use is_vector_comparison in tree-vect-generic. So I really have the difference between mask ? x : y and mask != {0} ? x : y, otherwise I could treat mask != {0} in the backend as just mask. 
If this link between optabs and backend breaks, then the patch falls apart. Because every time the comparison is taken out VEC_COND_EXPR, I will have to put != {0}. Keep in mind, that I cannot always put the comparison inside the VEC_COND_EXPR, what if it is defined in a function-comparison, or somehow else? So what would be an appropriate way to connect optabs and the backend? Well, there is no problem in having the only valid mask operand for VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,} mask. Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? vec1 : vec2. This happens already in the new version of patch (not submitted yet). This comparison can be eliminated by optimization passes that either replace it by the real comparison computing the mask or just propagating the information this mask is already {-1,...} / {0,} by simply dropping the comparison against zero. This is not a problem, because the backend recognizes these patterns, so no optimization is needed in this part.
[pph] Re-organize pph_write_tree/pph_read_tree (issue4934045)
This patch refactors pph_write_tree and pph_read_tree so that we can reduce the amount of special casing we were doing. It first tries to handle nodes based on their tree code class. The only class that cannot be handled this way is tcc_exceptional, so we only deal with those nodes individually. This fixes no testcases but it simplifies the logic of the AST streamer. I think we can now get rid of flag_pph_untree now since we are unconditionally ICEing if a node is not handled. Tested on x86_64. Applied to branch. * pph-streamer-in.c (pph_in_tcc_declaration): Factor out of pph_read_tree_body. Handle all decls. (pph_in_tcc_type): Factor out of pph_read_tree_body. Handle all types. (pph_read_tree_body): Issue a fatal error if a node has not been handled. * pph-streamer-out.c (pph_out_tcc_declaration): Factor out of pph_write_tree_body. Handle all decls. (pph_out_tcc_type): Factor out of pph_write_tree_body. Handle all types. (pph_write_tree_body): Issue a fatal error if a node has not been handled. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/pph@177960 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/cp/ChangeLog.pph | 15 ++ gcc/cp/pph-streamer-in.c | 342 -- gcc/cp/pph-streamer-out.c | 359 + 3 files changed, 284 insertions(+), 432 deletions(-) diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index ae3ede9..07e1135 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -1538,119 +1538,129 @@ pph_read_file (const char *filename) } -/* Read the attributes for a FUNCTION_DECL FNDECL. If FNDECL had - a body, mark it for expansion. */ +/* Read from STREAM the body of tcc_declaration tree DECL. 
*/ static void -pph_in_function_decl (pph_stream *stream, tree fndecl) +pph_in_tcc_declaration (pph_stream *stream, tree decl) { - DECL_INITIAL (fndecl) = pph_in_tree (stream); - pph_in_lang_specific (stream, fndecl); - DECL_SAVED_TREE (fndecl) = pph_in_tree (stream); - DECL_CHAIN (fndecl) = pph_in_tree (stream); -} - + pph_in_lang_specific (stream, decl); + DECL_INITIAL (decl) = pph_in_tree (stream); + /* The tree streamer only writes DECL_CHAIN for PARM_DECL nodes. */ + if (TREE_CODE (decl) == VAR_DECL + || TREE_CODE (decl) == FUNCTION_DECL) +DECL_CHAIN (decl) = pph_in_tree (stream); -/* Read the body fields of EXPR from STREAM. */ - -static void -pph_read_tree_body (pph_stream *stream, tree expr) -{ - struct lto_input_block *ib = stream-encoder.r.ib; - struct data_in *data_in = stream-encoder.r.data_in; - - /* Read the language-independent parts of EXPR's body. */ - streamer_read_tree_body (ib, data_in, expr); - - /* Read all the language-dependent fields. */ - switch (TREE_CODE (expr)) + /* Handle some individual decl nodes. */ + switch (TREE_CODE (decl)) { -/* TREES NEEDING EXTRA WORK */ - -/* tcc_declaration */ - -case DEBUG_EXPR_DECL: -case IMPORTED_DECL: -case LABEL_DECL: -case RESULT_DECL: - DECL_INITIAL (expr) = pph_in_tree (stream); - break; - -case CONST_DECL: -case FIELD_DECL: -case NAMESPACE_DECL: -case PARM_DECL: -case USING_DECL: -case VAR_DECL: - /* FIXME pph: Should we merge DECL_INITIAL into lang_specific? */ - DECL_INITIAL (expr) = pph_in_tree (stream); - pph_in_lang_specific (stream, expr); - /* DECL_CHAIN is handled by generic code, except for VAR_DECLs. 
*/ - if (TREE_CODE (expr) == VAR_DECL) - DECL_CHAIN (expr) = pph_in_tree (stream); - break; - case FUNCTION_DECL: - pph_in_function_decl (stream, expr); + DECL_SAVED_TREE (decl) = pph_in_tree (stream); break; case TYPE_DECL: - DECL_INITIAL (expr) = pph_in_tree (stream); - pph_in_lang_specific (stream, expr); - DECL_ORIGINAL_TYPE (expr) = pph_in_tree (stream); + DECL_ORIGINAL_TYPE (decl) = pph_in_tree (stream); break; case TEMPLATE_DECL: - DECL_INITIAL (expr) = pph_in_tree (stream); - pph_in_lang_specific (stream, expr); - DECL_TEMPLATE_RESULT (expr) = pph_in_tree (stream); - DECL_TEMPLATE_PARMS (expr) = pph_in_tree (stream); - DECL_CONTEXT (expr) = pph_in_tree (stream); + DECL_TEMPLATE_RESULT (decl) = pph_in_tree (stream); + DECL_TEMPLATE_PARMS (decl) = pph_in_tree (stream); break; -/* tcc_type */ - -case ARRAY_TYPE: -case BOOLEAN_TYPE: -case COMPLEX_TYPE: -case ENUMERAL_TYPE: -case FIXED_POINT_TYPE: -case FUNCTION_TYPE: -case INTEGER_TYPE: -case LANG_TYPE: -case METHOD_TYPE: -case NULLPTR_TYPE: -case OFFSET_TYPE: -case POINTER_TYPE: -case REAL_TYPE: -case REFERENCE_TYPE: -case VECTOR_TYPE: -case VOID_TYPE: - TYPE_LANG_SPECIFIC (expr) = pph_in_lang_type (stream); +default: break;
Re: Vector Comparison patch
On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Richard, I formalized an approach a little bit; now it works without target hooks, but some polishing is still required. I want you to comment on several important approaches that I use in the patch. So how does it work. 1) All the vector comparisons at the level of the type-checker are introduced using VEC_COND_EXPR with constant selection operands being {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 > v1, {-1}, {0}>. 2) When optabs expand VEC_COND_EXPR, two cases are considered: 2.a) the first operand of VEC_COND_EXPR is a comparison; in that case nothing changes. 2.b) the first operand is something else; in that case, we specially mark this case, recognize it in the backend, and do not create a comparison, but use the mask as if it were the result of some comparison. 3) In order to make sure that the mask in VEC_COND_EXPR<mask, v0, v1> is a vector comparison we use the is_vector_comparison function; if it returns false, then we replace mask with mask != {0}. So we end up with the following functionality: VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is the result of a comparison of two vectors, we leave it as it is, otherwise replace it with mask != {0}. A plain vector comparison a op b is represented with VEC_COND_EXPR, which correctly expands without creating useless masking. Basically for me there are two questions: 1) Can we perform information passing in optabs in a nicer way? 2) How could is_vector_comparison be improved?
I have several ideas, like checking whether a constant vector consists only of 0 and -1, and so on. But first, is it conceptually fine? P.S. I tried to put the functionality of is_vector_comparison in tree-ssa-forwprop, but the thing is that it is called only with -On, which I find inappropriate, and the functionality gets more complicated. Why is it inappropriate to not optimize it at -O0? If the user separates comparison and ?: expression it's his own fault. Well, because all the information is there, and I perfectly envision the case when the user expressed the comparison separately, just to avoid code duplication. Like: mask = a > b; res1 = mask ? v0 : v1; res2 = mask ? v2 : v3; Which in this case would be different from res1 = a > b ? v0 : v1; res2 = a > b ? v2 : v3; Btw, the new hook is still in the patch. I would simply always create != 0 if it isn't and let optimizers (tree-ssa-forwprop.c) optimize this. You still have to deal with non-comparison operands during expansion though, but if you always forced a != 0 from the start you can then simply interpret it as a proper comparison result (in which case I'd modify the backends to have an alternate pattern or directly expand to masking operations - using the fake comparison RTX is too much of a hack). Richard, I think you didn't get the problem. I really need a way to pass the information that the expression in the first operand of vcond is an appropriate mask. I thought for quite a while and this hack is the only answer I found; is there a better way to do that? I could for example introduce another tree node, but it would be overkill as well. Now why do I need it so much: I want to implement the comparison in a way that {1, 5, 0, -1} is actually {-1,-1,-1,-1}. So whenever I am not sure that the mask of VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not always). To check the stuff, I use is_vector_comparison in tree-vect-generic. So I really have the difference between mask ? x : y and mask != {0} ?
x : y, otherwise I could treat mask != {0} in the backend as just mask. If this link between optabs and backend breaks, then the patch falls apart. Because every time the comparison is taken out VEC_COND_EXPR, I will have to put != {0}. Keep in mind, that I cannot always put the comparison inside the VEC_COND_EXPR, what if it is defined in a function-comparison, or somehow else? So what would be an appropriate way to connect optabs and the backend? Well, there is no problem in having the only valid mask operand for VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,} mask. Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? vec1 : vec2. This happens already in the new version of patch (not submitted yet). This comparison can be eliminated by optimization passes that either replace it by the real comparison computing the mask or just propagating the information this mask is already {-1,...}
Re: mem_attrs_htab
Hi, On Mon, 22 Aug 2011, Richard Guenther wrote: Some functions are extremely large though. Do you mean that MEM itself would be enlarged to have the MEM_ATTRS field so that one operand is the address, then expr, then HWI size, offset, etc.? Because if the mem attrs aren't shared any longer, it doesn't make sense to keep the indirection. I still fear we have way too many MEMs in RTL that this would be noticeable. It would be interesting to have numbers about the amount of sharing that happens A pathetic amount. From compiling cse.c combine.c tree.c and dwarf2out.c the top 10 users of MEMs per routine are: With -O0: MEMs ATTR name 970 485 combine_simplify_rtx 1011 464 simple_cst_equal 1047 612 mem_loc_descriptor 1173 442 walk_tree_1 1296 515 loc_list_from_tree 1431 690 simplify_comparison 1911 752 substitute_placeholder_in_expr 1951 745 substitute_in_expr 2503 1532 cse_insn 3242 2206 try_combine With -O2: MEMs ATTR name 514 502 gen_tagged_type_die 701 536 simplify_comparison 743 877 find_decls_types_r 851 840 dwarf2out_finish 863 784 loc_list_from_tree 916 839 combine_simplify_rtx 978 878 gen_subprogram_die 1650 1475 cse_insn 1720 1782 mem_loc_descriptor 2336 1792 try_combine Summing doesn't make sense, but the routines with largest differences: -O0 532 force_to_mode 547 simple_cst_equal 640 simplify_shift_const_1 731 walk_tree_1 741 simplify_comparison 781 loc_list_from_tree 971 cse_insn 1036 try_combine 1159 substitute_placeholder_in_expr 1206 substitute_in_expr -O2 100 gen_subprogram_die 101 make_extraction 112 output_loc_sequence 122 if_then_else_cond 124 substitute_placeholder_in_expr 144 simplify_shift_const_1 165 simplify_comparison 175 cse_insn 205 simplify_if_then_else 544 try_combine (Using -g or not doesn't make a difference). I've counted all MEM rtx in the whole insn stream at finalization time (i.e. slightly less than potentially are actually generated during RTL passes). 
ATTR is the number of unique mem_attrs ever created by set_mem_attrs, reset to zero at each function start (including emptying the htab). That is, we save a whopping 48 kilobytes due to this fantastic hash table :-) (offset by the need for a pointer in the MEM rtx) Just remove the whole thing. Same for the reg_attrs hash table (I haven't measured that one, though). - might not be trivial though, as some re-uses would be able to simply modify the attr in place. Ciao, Michael.
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Sun, Aug 21, 2011 at 4:19 PM, David Edelsohn dje@gmail.com wrote: This patch broke bootstrap on AIX. It emits a .section op in assembly but .section is an ELF syntax op not AIX XCOFF. FE..initialize_critical: .section .init_array varasm.c should not be generating ELF ops for non-ELF targets. config.log shows: gcc_cv_initfini_array=yes tm_file_list includes initfini-array.h tm_include_list includes initfini-array.h Why is the patch affecting non-ELF targets? Please fix or revert immediately. David, I checked in the fix for AIX. Please verify it. Thanks. -- H.J.
Re: [PATCH] Fix ICEs in get_bit_range (PR middle-end/50141)
@@ -4354,7 +4354,8 @@ get_bit_range (unsigned HOST_WIDE_INT *b || TREE_CODE (innerdecl) == TARGET_MEM_REF) && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0))) || (DECL_P (innerdecl) - && (DECL_THREAD_LOCAL_P (innerdecl) + && ((TREE_CODE (innerdecl) == VAR_DECL + && DECL_THREAD_LOCAL_P (innerdecl)) || !TREE_STATIC (innerdecl Thanks. I already had this in my local tree, but am waiting to polish everything up before another round of iterations with Richi.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
Hi, On Mon, 22 Aug 2011, H.J. Lu wrote: Oh, I thought it was data initialized by the constructor ... Sriramans patch right now has a function __cpu_indicator_init which is called from (adhoc constructed) ctors and that initializes variables __cpu_model and __cpu_features ;-) There's no __cpu_indicator symbol :) I think the whole initializer function and the associated data blobs have to sit in static libgcc and be hidden. By that all shared modules will have their own copies of the model and features (and the initializer function) so there won't be issues with copy relocs, or cross shared lib calls while relocating the modules. Dynamically they will contain the same data always, but it's not many bytes, and only modules making use of this facility will pay it. The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. And to make life easier there should be one ctor function calling this initializer function too, so that normal code can rely on it being already called saving one check. It sounds more complicated than necessary. Why not just do it on demand like glibc does? Ehm, the only difference would be to not have a ctor in libgcc that looks like so: void __attribute__((constructor)) bla(void) { __cpu_indicator_init (); } I don't see any complication.? Ciao, Michael.
Re: RFC: [build, ada] Centralize PICFLAG configuration
Paolo Bonzini bonz...@gnu.org writes: On 08/19/2011 09:11 PM, Rainer Orth wrote: 2011-07-31 Rainer Orthr...@cebitec.uni-bielefeld.de config: * picflag.m4: New file. gcc: * configure.ac (GCC_PICFLAG_FOR_TARGET): Call it. (PICFLAG_FOR_TARGET): Substitute. * aclocal.m4: Regenerate. * configure: Regenerate. gcc/ada: * gcc-interface/Makefile.in (PICFLAG_FOR_TARGET): New. (GNATLIBCFLAGS_FOR_C): Replace TARGET_LIBGCC2_CFLAGS by PICFLAG_FOR_TARGET. (gnatlib-shared-default, gnatlib-shared-dual-win32) (gnatlib-shared-win32, gnatlib-shared-darwin, gnatlib-shared) (gnatlib-sjlj, gnatlib-zcx): Likewise. libada: * configure.ac: Include ../config/picflag.m4. (GCC_PICFLAG): Call it. Substitute. * configure: Regenerate. * Makefile.in (TARGET_LIBGCC2_CFLAGS): Replace by PICFLAG. (GNATLIBCFLAGS_FOR_C): Replace TARGET_LIBGCC2_CFLAGS by PICFLAG. (LIBADA_FLAGS_TO_PASS): Pass PICFLAG as PICFLAG_FOR_TARGET. Don't include $(GCC_DIR)/libgcc.mvars. libiberty: * aclocal.m4: Include ../config/picflag.m4. * configure.ac (GCC_PICFLAG): Call it. (enable_shared): Clear PICFLAG unless shared. * configure: Regenerate. Ok, thanks. installed, thanks. Do I need to sync the config and libiberty parts to src manually or does this happen by some sort of magic? Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: RFC: [build, ada] Centralize PICFLAG configuration
Do I need to sync the config and libiberty parts to src manually or does this happen by some sort of magic? intl/; config.rhost; libiberty/; libiberty's part of include/ gcc: http://gcc.gnu.org Changes need to be done in tandem with the official GCC sources or submitted to the master file maintainer and brought in via a merge. Note: approved patches in gcc's libiberty or intl are automatically approved in this libiberty and intl also; feel free to merge them yourself if needed sooner than the next merge. Otherwise, changes are automatically merged, usually within a day.
Re: repost: [DF] Use HARD_REG_SETs instead of bitmaps
On 08/22/11 11:16, Dimitrios Apostolou wrote: Any updates will come as a followup to this thread. I still have to do some testing on other platforms. Do we have access to any PA and MIPS machinery? I also wanted to test on architecture with lots of hard registers and small long size, but my 32-bit sparcstation crashed badly, any ideas which arch to try next? I thought there was a PA available in the testfarm. jeff
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Mon, Aug 22, 2011 at 10:09 AM, H.J. Lu hjl.to...@gmail.com wrote: On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote: nd/or add another test to it that tests that you can actually use .section .init_array and it will use correct section flags for the section. We need this information in config.gcc. But config.gcc is used before assembler and readelf are detected. I am running out of ideas. Any suggestions? Thanks. This is a much bigger change that I like and I don't feel very comfortable about it since I can only test it on Linux/x86. -- H.J. 2011-08-22 H.J. Lu hongjiu...@intel.com * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check if .section .init_array works. * configure.ac: Move binutils tests before gcc_AC_INITFINI_ARRAY. * configure: Regenerated. diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4 index a8ecd2d..3827046 100644 --- a/gcc/acinclude.m4 +++ b/gcc/acinclude.m4 @@ -376,6 +376,18 @@ AC_DEFUN([gcc_AC_INITFINI_ARRAY], AC_CACHE_CHECK(for .preinit_array/.init_array/.fini_array support, gcc_cv_initfini_array, [dnl if test x${build} = x${target} test x${build} = x${host}; then +if test x$gcc_cv_as != x -a x$gcc_cv_readelf != x ; then + echo .section .init_array; .byte 0 conftest.s + if $gcc_cv_as -o conftest.o conftest.s /dev/null 21; then + if $gcc_cv_readelf -S --wide conftest.o 21 \ + | grep INIT_ARRAY /dev/null 21; then + gcc_cv_as_init_array=yes + fi + fi + rm -f conftest.s conftest.o +fi + fi + if test x${gcc_cv_as_init_array} = xyes; then AC_RUN_IFELSE([AC_LANG_SOURCE([ #ifndef __ELF__ #error Not an ELF OS diff --git a/gcc/configure.ac b/gcc/configure.ac index ed01904..3cae58a 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -1153,6 +1153,231 @@ fi # Restore CFLAGS from before the gcc_AC_NEED_DECLARATIONS tests. CFLAGS=$saved_CFLAGS +# Identify the assembler which will work hand-in-glove with the newly +# built GCC, so that we can examine its features. This is the assembler +# which will be driven by the driver program. 
+# +# If build != host, and we aren't building gas in-tree, we identify a +# build-target assembler and hope that it will have the same features +# as the host-target assembler we'll be using. +gcc_cv_gas_major_version= +gcc_cv_gas_minor_version= +gcc_cv_as_gas_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/gas + +m4_pattern_allow([AS_FOR_TARGET])dnl +AS_VAR_SET_IF(gcc_cv_as,, [ +if test -x $DEFAULT_ASSEMBLER; then + gcc_cv_as=$DEFAULT_ASSEMBLER +elif test -f $gcc_cv_as_gas_srcdir/configure.in \ + test -f ../gas/Makefile \ + test x$build = x$host; then + gcc_cv_as=../gas/as-new$build_exeext +elif test -x as$build_exeext; then + # Build using assembler in the current directory. + gcc_cv_as=./as$build_exeext +elif ( set dummy $AS_FOR_TARGET; test -x $[2] ); then +gcc_cv_as=$AS_FOR_TARGET +else +AC_PATH_PROG(gcc_cv_as, $AS_FOR_TARGET) +fi]) + +ORIGINAL_AS_FOR_TARGET=$gcc_cv_as +AC_SUBST(ORIGINAL_AS_FOR_TARGET) +case $ORIGINAL_AS_FOR_TARGET in + ./as | ./as$build_exeext) ;; + *) AC_CONFIG_FILES(as:exec-tool.in, [chmod +x as]) ;; +esac + +AC_MSG_CHECKING(what assembler to use) +if test $gcc_cv_as = ../gas/as-new$build_exeext; then + # Single tree build which includes gas. We want to prefer it + # over whatever linker top-level may have detected, since + # we'll use what we're building after installation anyway. + AC_MSG_RESULT(newly built gas) + in_tree_gas=yes + _gcc_COMPUTE_GAS_VERSION + in_tree_gas_is_elf=no + if grep 'obj_format = elf' ../gas/Makefile /dev/null \ + || (grep 'obj_format = multi' ../gas/Makefile \ + grep 'extra_objects =.* obj-elf' ../gas/Makefile) /dev/null + then +in_tree_gas_is_elf=yes + fi +else + AC_MSG_RESULT($gcc_cv_as) + in_tree_gas=no +fi + +# Identify the linker which will work hand-in-glove with the newly +# built GCC, so that we can examine its features. This is the linker +# which will be driven by the driver program. 
+# +# If build != host, and we aren't building gas in-tree, we identify a +# build-target linker and hope that it will have the same features +# as the host-target linker we'll be using. +gcc_cv_gld_major_version= +gcc_cv_gld_minor_version= +gcc_cv_ld_gld_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/ld +gcc_cv_ld_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd + +AS_VAR_SET_IF(gcc_cv_ld,, [ +if test -x $DEFAULT_LINKER; then + gcc_cv_ld=$DEFAULT_LINKER +elif test -f $gcc_cv_ld_gld_srcdir/configure.in \ + test -f ../ld/Makefile \ + test x$build = x$host; then + gcc_cv_ld=../ld/ld-new$build_exeext +elif test -x collect-ld$build_exeext; then + # Build using linker in the current directory. + gcc_cv_ld=./collect-ld$build_exeext +elif ( set dummy $LD_FOR_TARGET; test -x $[2] ); then +gcc_cv_ld=$LD_FOR_TARGET +else +AC_PATH_PROG(gcc_cv_ld, $LD_FOR_TARGET) +fi]) + +ORIGINAL_PLUGIN_LD_FOR_TARGET=$gcc_cv_ld
[ping 2] [patch] attribute to reverse bitfield allocations
Ping 2 ? http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01889.html http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02555.html
Re: [patch] support for multiarch systems
On 08/21/2011 02:14 AM, Matthias Klose wrote: On 08/20/2011 09:51 PM, Matthias Klose wrote: Multiarch [1] is the term being used to refer to the capability of a system to install and run applications of multiple different binary targets on the same system. The idea and name of multiarch dates back to 2004/2005 [2] (to be confused with multiarch in glibc). attached is an updated patch which includes feedback from Jakub and Joseph. Perhaps I could like this patch ? It probably solves http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg02398.html [ My system is Debian Testing, updated 20110821 at 12:15 UTC ] (h/t Mark Glisse). Thanks in advance, -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: [var-tracking] [not-good!] disable shared_hash and other simplifications
On Mon, Aug 22, 2011 at 1:54 PM, Jakub Jelinek ja...@redhat.com wrote: * From a very wide field of view, all this DF solving reminded me a lot of what I've seen in df-*.c. Why can't variable tracking be integrated in the main DF pass of the compiler, the one that happens *after* register allocation? I don't think there is any significant overlap. Yes, var-tracking is a data flow algorithm, but the code that does that is just a few lines, the big part is how the merges are done and how the tracking propagates through a basic block, and that has nothing in common with df-*.c. But perhaps with the DF solver fewer blocks are visited...? Ciao! Steven
Re: [var-tracking] [not-good!] disable shared_hash and other simplifications
On Mon, Aug 22, 2011 at 07:49:43PM +0200, Steven Bosscher wrote: On Mon, Aug 22, 2011 at 1:54 PM, Jakub Jelinek ja...@redhat.com wrote: * From a very wide field of view, all this DF solving reminded me a lot of what I've seen in df-*.c. Why can't variable tracking be integrated in the main DF pass of the compiler, the one that happens *after* register allocation? I don't think there is any significant overlap. Yes, var-tracking is a data flow algorithm, but the code that does that is just a few lines, the big part is how the merges are done and how the tracking propagates through a basic block, and that has nothing in common with df-*.c. But perhaps with the DF solver fewer blocks are visited...? Or perhaps more. Who knows. Jakub
Re: RFC: [build, ada] Centralize PICFLAG configuration
DJ Delorie d...@redhat.com writes: Do I need to sync the config and libiberty parts to src manually or does this happen by some sort of magic? intl/; config.rhost; libiberty/; libiberty's part of include/ gcc: http://gcc.gnu.org Changes need to be done in tandem with the official GCC sources or submitted to the master file maintainer and brought in via a merge. Note: approved patches in gcc's libiberty or intl are automatically approved in this libiberty and intl also; feel free to merge them yourself if needed sooner than the next merge. Otherwise, changes are automatically merged, usually within a day. Thanks. I've just installed config/picflag.m4 and rely on the automatic merge for libiberty. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[pph] Cleanup line_table and includes streaming (issue4921052)
This is a refactoring patch, it doesn't change/fix anything in pph itself. I extracted out/in logic for includes into their own functions. I removed the LINETAB parameter from in/out functions for the line_table as there is only one line_table and that's the only one we stream, the main/global one. I wanted to move pph_loc_offset to pph_stream.encoder.r, but it's not possible while locations are called by the streamer hook as in the callback we do not have a pph_stream* parameter, added a FIXME to do it later if we do get rid of the hook for locations. Tested with bootstrap and pph regression testing on x64. Cheers, Gab 2011-08-22 Gabriel Charette gch...@google.com * pph-streamer-in.c (pph_loc_offset): Add FIXME to move this variable to pph_stream.encoder.r (pph_in_include): New. (pph_in_line_table_and_includes): Remove LINETAB parameter. Update all users to use global LINE_TABLE instead. Remove code moved to new function pph_in_include. And call it. (pph_read_file_1): Remove extra parameter in call to pph_in_line_table_and_includes. * pph-streamer-out.c (pph_out_include): New. (pph_get_next_include): Fix comment. (pph_out_line_table_and_includes): Remove LINETAB parameter. Update all users to use global LINE_TABLE instead. Remove code moved to new function pph_out_include. And call it. * gcc/cp/pph-streamer.h (pph_stream): Fix indenting. diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index 07e1135..2b7dc2d 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -75,7 +75,11 @@ static int pph_reading_includes = 0; } while (0) /* Set in pph_in_and_merge_line_table. Represents the source_location offset - which every streamed in token must add to it's serialized source_location. */ + which every streamed in token must add to it's serialized source_location. + + FIXME pph: Ideally this would be in pph_stream.encoder.r, but for that we + first need to get rid of the dependency to the streamer_hook for locations. 
+ */ static int pph_loc_offset; @@ -1336,9 +1340,37 @@ pph_in_line_map (pph_stream *stream, struct line_map *lm) } -/* Read the line_table from STREAM and merge it in LINETAB. At the same time - load includes in the order they were originally included by loading them at - the point they were referenced in the line_table. +/* Read in from STREAM and merge a referenced include into the current parsing + context. */ + +static void +pph_in_include (pph_stream *stream) +{ + int old_loc_offset; + const char *include_name; + pph_stream *include; + source_location prev_start_loc = pph_in_source_location (stream); + + /* Simulate highest_location to be as it would be at this point in a non-pph + compilation. */ + line_table-highest_location = (prev_start_loc - 1) + pph_loc_offset; + + /* FIXME pph: If we move pph_loc_offset to pph_stream.encoder.r, we could + have an independent offset for each stream and not have to save and + restore the state of a global pph_loc_offset as we are doing here. */ + old_loc_offset = pph_loc_offset; + + include_name = pph_in_string (stream); + include = pph_read_file (include_name); + pph_add_include (include, false); + + pph_loc_offset = old_loc_offset; +} + + +/* Read the line_table from STREAM and merge it in the current line_table. At + the same time load includes in the order they were originally included by + loading them at the point they were referenced in the line_table. Returns the source_location of line 1 / col 0 for this include. @@ -1348,13 +1380,13 @@ pph_in_line_map (pph_stream *stream, struct line_map *lm) a known current issue, so I didn't bother working around it here for now. 
*/ static source_location -pph_in_line_table_and_includes (pph_stream *stream, struct line_maps *linetab) +pph_in_line_table_and_includes (pph_stream *stream) { unsigned int old_depth; bool first; int includer_ix = -1; - unsigned int used_before = linetab-used; - int entries_offset = linetab-used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES; + unsigned int used_before = line_table-used; + int entries_offset = line_table-used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES; enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream); pph_reading_includes++; @@ -1364,29 +1396,16 @@ pph_in_line_table_and_includes (pph_stream *stream, struct line_maps *linetab) { if (next_lt_marker == PPH_LINETABLE_REFERENCE) { - int old_loc_offset; - const char *include_name = pph_in_string (stream); - source_location prev_start_loc = pph_in_source_location (stream); - pph_stream *include; - gcc_assert (!first); - - linetab-highest_location = (prev_start_loc - 1) + pph_loc_offset; - - old_loc_offset = pph_loc_offset; - - include = pph_read_file (include_name);
Re: [pph] Cleanup line_table and includes streaming (issue4921052)
On 11-08-22 14:20 , Gabriel Charette wrote: 2011-08-22 Gabriel Charettegch...@google.com * pph-streamer-in.c (pph_loc_offset): Add FIXME to move this variable to pph_stream.encoder.r (pph_in_include): New. (pph_in_line_table_and_includes): Remove LINETAB parameter. Update all users to use global LINE_TABLE instead. Remove code moved to new function pph_in_include. And call it. (pph_read_file_1): Remove extra parameter in call to pph_in_line_table_and_includes. * pph-streamer-out.c (pph_out_include): New. (pph_get_next_include): Fix comment. (pph_out_line_table_and_includes): Remove LINETAB parameter. Update all users to use global LINE_TABLE instead. Remove code moved to new function pph_out_include. And call it. * gcc/cp/pph-streamer.h (pph_stream): Fix indenting. OK. Diego.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 22 Aug 2011, H.J. Lu wrote: Oh, I thought it was data initialized by the constructor ... Sriramans patch right now has a function __cpu_indicator_init which is called from (adhoc constructed) ctors and that initializes variables __cpu_model and __cpu_features ;-) There's no __cpu_indicator symbol :) I think the whole initializer function and the associated data blobs have to sit in static libgcc and be hidden. By that all shared modules will have their own copies of the model and features (and the initializer function) so there won't be issues with copy relocs, or cross shared lib calls while relocating the modules. Dynamically they will contain the same data always, but it's not many bytes, and only modules making use of this facility will pay it. The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. And to make life easier there should be one ctor function calling this initializer function too, so that normal code can rely on it being already called saving one check. It sounds more complicated than necessary. Why not just do it on demand like glibc does? Ehm, the only difference would be to not have a ctor in libgcc that looks like so: void __attribute__((constructor)) bla(void) { __cpu_indicator_init (); } I don't see any complication.? Order of constructors. A constructor may call functions which use __cpu_indicator. I have a suggestion that is a hybrid of the proposed solutions here: 1) Make a constructor in every module that calls __cpu_indicator_init and make it to be the first constructor to run. Will this solve the ordering problem? 2) Change __cpu_indicator_init to run only once by using a variable to check if it has been run before. So, each module's constructor will call __cpu_indicator_init but the CPUID insns are only done once. 
I also avoid the extra overhead of having to check if __cpu_indicator_init is called from within the binary. Will this work? Thanks, -Sri. -- H.J.
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Mon, 22 Aug 2011, H.J. Lu wrote: On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote: nd/or add another test to it that tests that you can actually use .section .init_array and it will use correct section flags for the section. We need this information in config.gcc. But config.gcc is used before assembler and readelf are detected. I am running out of ideas. Any suggestions? Require a good assembler on ELF targets and just enable this by default for them without trying a configure test that won't work for cross compilation (AC_RUN_IFELSE is bad). The toplevel config/elf.m4 provides a good notion of what is or is not ELF (if there are problems, we can fix that file). Only a handful of targets support non-GNU assemblers; for the vast bulk of targets we should assume a not-too-old GNU assembler. That way, the configure test can be used to cause a configure-time error if the assembler is defective and it doesn't matter that the test is late. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] PR middle-end/38509: add -Wfree-nonheap-object warning option
On 11-08-21 18:14 , Mark Heffernan wrote: Ping? Mark On Fri, Aug 12, 2011 at 9:41 AM, Mark Heffernanmeh...@google.com wrote: This patch adds an option for enabling/disabling the warning for attempting to free nonheap objects (PR/38509). The warning is imprecise and can issue false positives. Bootstrapped on x86-64. Ok for trunk? Mark 2011-08-11 Mark Heffernanmeh...@google.com PR middle-end/38509 * common.opt (Wfree-nonheap-object): New option. * doc/invoke.texi (Warning options): Document -Wfree-nonheap-object. * builtins.c (maybe_emit_free_warning): Add OPT_Wfree_nonheap_object to warning. OK. Diego.
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Mon, Aug 22, 2011 at 11:53 AM, Joseph S. Myers jos...@codesourcery.com wrote: On Mon, 22 Aug 2011, H.J. Lu wrote: On Sun, Aug 21, 2011 at 10:37 PM, Jakub Jelinek ja...@redhat.com wrote: nd/or add another test to it that tests that you can actually use .section .init_array and it will use correct section flags for the section. We need this information in config.gcc. But config.gcc is used before assembler and readelf are detected. I am running out of ideas. Any suggestions? Require a good assembler on ELF targets and just enable this by default for them without trying a configure test that won't work for cross compilation (AC_RUN_IFELSE is bad). The toplevel config/elf.m4 provides a good notion of what is or is not ELF (if there are problems, we can fix that file). Only a handful of targets support non-GNU assemblers; for the vast bulk of targets we should assume a not-too-old GNU assembler. That way, the configure test can be used to cause a configure-time error if the assembler is defective and it doesn't matter that the test is late. A working .init_array support needs assembler, linker and libc. That is why AC_RUN_IFELSE is used. -- H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Mon, Aug 22, 2011 at 11:50 AM, Sriraman Tallam tmsri...@google.com wrote: On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 22 Aug 2011, H.J. Lu wrote: Oh, I thought it was data initialized by the constructor ... Sriraman's patch right now has a function __cpu_indicator_init which is called from (ad hoc constructed) ctors and which initializes the variables __cpu_model and __cpu_features ;-) There's no __cpu_indicator symbol :)

I think the whole initializer function and the associated data blobs have to sit in static libgcc and be hidden. That way all shared modules will have their own copies of the model and features (and the initializer function), so there won't be issues with copy relocs or cross-shared-lib calls while relocating the modules. Dynamically they will always contain the same data, but it's not many bytes, and only modules making use of this facility will pay for it. The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. And to make life easier there should be one ctor function calling this initializer function too, so that normal code can rely on it being already called, saving one check.

It sounds more complicated than necessary. Why not just do it on demand like glibc does?

Ehm, the only difference would be to not have a ctor in libgcc that looks like so: void __attribute__((constructor)) bla(void) { __cpu_indicator_init (); } I don't see any complication.

Order of constructors. A constructor may call functions which use __cpu_indicator.

I have a suggestion that is a hybrid of the proposed solutions here: 1) Make a constructor in every module that calls __cpu_indicator_init and make it the first constructor to run. Will this solve the ordering problem? 2) Change __cpu_indicator_init to run only once by using a variable to check if it has been run before.
So, each module's constructor will call __cpu_indicator_init but the CPUID insns are only done once. I also avoid the extra overhead of having to check if __cpu_indicator_init is called from within the binary. Will this work?

Please make it simple, like:

	if __cpu_indicator is not initialized then
	  call __cpu_indicator_init
	fi
	use __cpu_indicator

-- H.J.
Re: PING: PATCH: PR target/46770: Use .init_array/.fini_array sections
On Mon, 22 Aug 2011, H.J. Lu wrote: Require a good assembler on ELF targets and just enable this by default for them, without trying a configure test that won't work for cross compilation (AC_RUN_IFELSE is bad). The toplevel config/elf.m4 provides a good notion of what is or is not ELF (if there are problems, we can fix that file). Only a handful of targets support non-GNU assemblers; for the vast bulk of targets we should assume a not-too-old GNU assembler. That way, the configure test can be used to cause a configure-time error if the assembler is defective, and it doesn't matter that the test is late.

A working .init_array support needs the assembler, linker and libc. That is why AC_RUN_IFELSE is used.

A working .init_array is a standard part of ELF. The correct default for cross compilation to an ELF target (as determined by elf.m4) is to assume it is present; this is not a matter of a GNU extension that could affect interoperation with other tools.

-- Joseph S. Myers jos...@codesourcery.com
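For reference, the kind of cross-safe compile-time probe being contrasted with AC_RUN_IFELSE might look roughly like this. This is a sketch only: the file name conftest.s and the bare "as" invocation are illustrative, not the actual gcc configure.ac code, which is more involved and also cross-checks the resulting section flags with readelf.

```shell
# Ask the assembler whether it accepts .init_array with "aw"
# (SHF_ALLOC|SHF_WRITE) flags; no target execution is needed,
# so this works when cross compiling.
cat > conftest.s <<'EOF'
	.section .init_array,"aw"
	.p2align 2
EOF
if as -o conftest.o conftest.s 2>/dev/null; then
  result=yes
else
  result=no
fi
echo "init_array supported by assembler: $result"
rm -f conftest.s conftest.o
```

Per Joseph's point, on an ELF target such a probe would only be used to reject a defective assembler at configure time, not to decide the default.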
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Mon, Aug 22, 2011 at 11:58 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Aug 22, 2011 at 11:50 AM, Sriraman Tallam tmsri...@google.com wrote: On Mon, Aug 22, 2011 at 9:02 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 22 Aug 2011, H.J. Lu wrote: Oh, I thought it was data initialized by the constructor ... Sriraman's patch right now has a function __cpu_indicator_init which is called from (ad hoc constructed) ctors and which initializes the variables __cpu_model and __cpu_features ;-) There's no __cpu_indicator symbol :)

I think the whole initializer function and the associated data blobs have to sit in static libgcc and be hidden. That way all shared modules will have their own copies of the model and features (and the initializer function), so there won't be issues with copy relocs or cross-shared-lib calls while relocating the modules. Dynamically they will always contain the same data, but it's not many bytes, and only modules making use of this facility will pay for it. The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. And to make life easier there should be one ctor function calling this initializer function too, so that normal code can rely on it being already called, saving one check.

It sounds more complicated than necessary. Why not just do it on demand like glibc does?

Ehm, the only difference would be to not have a ctor in libgcc that looks like so: void __attribute__((constructor)) bla(void) { __cpu_indicator_init (); } I don't see any complication.

Order of constructors. A constructor may call functions which use __cpu_indicator.

I have a suggestion that is a hybrid of the proposed solutions here: 1) Make a constructor in every module that calls __cpu_indicator_init and make it the first constructor to run. Will this solve the ordering problem?
2) Change __cpu_indicator_init to run only once by using a variable to check if it has been run before.

So, each module's constructor will call __cpu_indicator_init but the CPUID insns are only done once. I also avoid the extra overhead of having to check if __cpu_indicator_init is called from within the binary. Will this work?

Please make it simple, like:

	if __cpu_indicator is not initialized then
	  call __cpu_indicator_init
	fi
	use __cpu_indicator

Will do, thanks.

-Sri.

-- H.J.
Re: [PATCH, test, i386] Fix for PR50155
On Mon, Aug 22, 2011 at 8:51 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Attached is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50155

ChangeLog entry:

2011-08-22 Kirill Yukhin kirill.yuk...@intel.com

	PR target/50155
	* config/i386/sse.md (VI1248_AVX2): New.
	(<plusminus_insn><mode>3): Update.
	(*<plusminus_insn><mode>3): Likewise.
	(<sse2_avx2>_andnot<mode>3): Likewise.
	(avx2_pbroadcast<mode>): Likewise.

testsuite/ChangeLog entry:

2011-08-22 Kirill Yukhin kirill.yuk...@intel.com

	PR target/50155
	* gcc.target/i386/pr50155.c: New test.

The new test fails without the fix and passes with it applied. Ok for trunk if bootstrap succeeds?

No. - You are disabling andnotps for 256bit integer modes on !TARGET_AVX2 targets. - The avx2_pbroadcast change is a no-op.

I found two additional problems with the patch: - the order of evaluation of the cond RTX in the mode attribute calculation is wrong for the *andnot<mode>3 and *<any_logic:code><mode>3 instructions; - the <shortmode> mode attribute is not used (minor).

The attached (lightly tested) patch fixes all of these problems and adds additional asserts to the mentioned logic instructions.

Uros.
Index: sse.md
===================================================================
--- sse.md	(revision 177968)
+++ sse.md	(working copy)
@@ -73,6 +73,12 @@
    (V8SI "TARGET_AVX") V4SI
    (V4DI "TARGET_AVX") V2DI])
 
+(define_mode_iterator VI_AVX2
+  [(V32QI "TARGET_AVX2") V16QI
+   (V16HI "TARGET_AVX2") V8HI
+   (V8SI "TARGET_AVX2") V4SI
+   (V4DI "TARGET_AVX2") V2DI])
+
 ;; All QImode vector integer modes
 (define_mode_iterator VI1
   [(V32QI "TARGET_AVX") V16QI])
@@ -124,8 +130,8 @@
   [V4SI V4DI])
 
 (define_mode_iterator V48_AVX2
-  [(V4SF "TARGET_SSE") (V2DF "TARGET_SSE2")
-   (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
+  [V4SF V2DF
+   V8SF V4DF
    (V4SI "TARGET_AVX2") (V2DI "TARGET_AVX2")
    (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")])
@@ -170,9 +176,6 @@
 (define_mode_attr ssebytemode
   [(V4DI "V32QI") (V2DI "V16QI")])
 
-(define_mode_attr shortmode
-  [(V4DI "v4si") (V2DI "v2si")])
-
 ;; All 128bit vector integer modes
 (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
@@ -4641,18 +4644,18 @@
   "operands[2] = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));")
 
 (define_expand "<plusminus_insn><mode>3"
-  [(set (match_operand:VI 0 "register_operand" "")
-	(plusminus:VI
-	  (match_operand:VI 1 "nonimmediate_operand" "")
-	  (match_operand:VI 2 "nonimmediate_operand" "")))]
+  [(set (match_operand:VI_AVX2 0 "register_operand" "")
+	(plusminus:VI_AVX2
+	  (match_operand:VI_AVX2 1 "nonimmediate_operand" "")
+	  (match_operand:VI_AVX2 2 "nonimmediate_operand" "")))]
   "TARGET_SSE2"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*<plusminus_insn><mode>3"
-  [(set (match_operand:VI 0 "register_operand" "=x,x")
-	(plusminus:VI
-	  (match_operand:VI 1 "nonimmediate_operand" "<comm>0,x")
-	  (match_operand:VI 2 "nonimmediate_operand" "xm,xm")))]
+  [(set (match_operand:VI_AVX2 0 "register_operand" "=x,x")
+	(plusminus:VI_AVX2
+	  (match_operand:VI_AVX2 1 "nonimmediate_operand" "<comm>0,x")
+	  (match_operand:VI_AVX2 2 "nonimmediate_operand" "xm,xm")))]
   "TARGET_SSE2
   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "@
   p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
@@ -6176,10 +6179,30 @@
 {
   static char buf[32];
   const char *ops;
-  const char *tmp
-    = ((get_attr_mode (insn) == MODE_TI) ||
-       (get_attr_mode (insn) == MODE_OI)) ? "pandn" : "andnps";
+  const char *tmp;
+  switch (get_attr_mode (insn))
+    {
+    case MODE_OI:
+      gcc_assert (TARGET_AVX2);
+    case MODE_TI:
+      gcc_assert (TARGET_SSE2);
+
+      tmp = "pandn";
+      break;
+
+    case MODE_V8SF:
+      gcc_assert (TARGET_AVX);
+    case MODE_V4SF:
+      gcc_assert (TARGET_SSE);
+
+      tmp = "andnps";
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
   switch (which_alternative)
     {
     case 0:
@@ -6205,12 +6228,12 @@
 	      (const_string "*")))
    (set_attr "prefix" "orig,vex")
    (set (attr "mode")
-	(cond [(ne (symbol_ref "GET_MODE_SIZE (<MODE>mode) > 128") (const_int 0))
+	(cond [(ne (symbol_ref "TARGET_AVX2") (const_int 0))
+		 (const_string "OI")
+	       (ne (symbol_ref "GET_MODE_SIZE (<MODE>mode) > 128") (const_int 0))
 		 (const_string "V8SF")
 	       (ne (symbol_ref "TARGET_SSE2") (const_int 0))
 		 (const_string "TI")
-	       (ne (symbol_ref "TARGET_AVX2") (const_int 0))
-		 (const_string "OI")
 	      ]
 	      (const_string "V4SF")))])
@@ -6232,10 +6255,30 @@
 {
   static char buf[32];
   const char *ops;
-  const char *tmp
-    = (get_attr_mode (insn) == MODE_TI) ||
-      (get_attr_mode (insn) == MODE_OI) ? "p<logic>" : "<logic>ps";
+  const char *tmp;
+  switch (get_attr_mode (insn))
+    {
+    case MODE_OI:
+      gcc_assert (TARGET_AVX2);
+    case MODE_TI:
+      gcc_assert (TARGET_SSE2);
+
+      tmp = "p<logic>";
+      break;
+
+    case MODE_V8SF:
+      gcc_assert (TARGET_AVX);
Re: [M32C] Hookize CLASS_MAX_NREGS
OK to install?

	* config/m32c/m32c.h (CLASS_MAX_NREGS): Remove macro.
	* config/m32c/m32c-protos.h (m32c_class_max_nregs): Remove.
	* config/m32c/m32c.c (m32c_class_max_nregs): Make static.
	Change regclass argument type to reg_class_t.  Change 'max' and
	'v' vars and return types to unsigned char.  Use
	reg_class_contents instead of class_contents.
	(TARGET_CLASS_MAX_NREGS): Define.

Ok.
Re: [PATCH, i386, testsuite] FMA intrinsics
On Mon, Aug 22, 2011 at 6:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote: You don't need to add negated versions, one FMA builtin per mode is enough; please see the existing FMA4 descriptions. Just put a unary minus sign in the intrinsics header for the negated operand and let GCC do its job. Please see the existing FMA4 intrinsics header.

Actually I tried that. But in that case, when I compile (an FMA4 example)

#include <x86intrin.h>
extern __m128 a, b, c;
void foo () { a = _mm_nmsub_ps (a, b, c); }

with -S -O0 -mfma4, the asm has

	vxorps	%xmm1, %xmm0, %xmm0
	vmovaps	-16(%rbp), %xmm1
	vmovaps	.LC0(%rip), %xmm2
	vxorps	%xmm2, %xmm1, %xmm1
	vfmaddps	%xmm0, -32(%rbp), %xmm1, %xmm0

So vfmaddps of negated values is generated instead of vfnmsubps. I think it is bad that an intrinsic for an instruction can generate code without that instruction. So, to make sure that the exact instruction is always generated, I introduced additional expands and builtins. Is that wrong?

That is an artificial limitation. The user requested the functionality of the intrinsic, and should not be bothered with how the compiler realizes it. With -O2, the negation would propagate into the insn during the combine pass, and the optimal instruction would be generated. So, to answer your question - it is wrong to expect an exact instruction from builtins. Maybe with -O0, but that should not be used in the testsuite anyway.

Uros.
Re: [C++ PATCH] Clear TYPE_TRANSPARENT_AGGR if there are no fields (PR c++/46862)
OK. Jason
Re: Vector Comparison patch
On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther richard.guent...@gmail.com wrote: In this case it is simple to analyse that a is a comparison, but you cannot embed the operations of a into VEC_COND_EXPR.

Sure, but if the above is C source the frontend would generate res = a != 0 ? v0 : v1; initially. An optimization pass could still track this information and replace VEC_COND_EXPR <a != 0, v0, v1> with VEC_COND_EXPR <a, v0, v1> (no existing one would track vector contents though).

Ok, I am testing the patch that removes the hooks. Could you push the backend-patterns business a little bit?

Well, I suppose we're waiting for Uros here. I hadn't much luck fiddling with the mode-iterator stuff myself.

It is not _that_ trivial a change, since we have ix86_expand_fp_vcond and ix86_expand_int_vcond to merge. ATM, the FP version deals with FP operands and vice versa. We have to merge them somehow and split out the comparison part that handles FP as well as integer operands. I also don't know why vcond is not allowed to FAIL... probably the middle-end should be enhanced with a fallback for when some comparison isn't supported by the optab.

Uros.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Mon, Aug 22, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Aug 22, 2011 at 8:56 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 22 Aug 2011, H.J. Lu wrote: Oh, I thought it was data initialized by the constructor ... Sriraman's patch right now has a function __cpu_indicator_init which is called from (ad hoc constructed) ctors and which initializes the variables __cpu_model and __cpu_features ;-) There's no __cpu_indicator symbol :)

I think the whole initializer function and the associated data blobs have to sit in static libgcc and be hidden. That way all shared modules will have their own copies of the model and features (and the initializer function), so there won't be issues with copy relocs or cross-shared-lib calls while relocating the modules. Dynamically they will always contain the same data, but it's not many bytes, and only modules making use of this facility will pay for it. The initializer function has to be callable from pre-.init contexts, e.g. ifunc dispatchers. And to make life easier there should be one ctor function calling this initializer function too, so that normal code can rely on it being already called, saving one check.

It sounds more complicated than necessary. Why not just do it on demand like glibc does?

Ehm, the only difference would be to not have a ctor in libgcc that looks like so: void __attribute__((constructor)) bla(void) { __cpu_indicator_init (); } I don't see any complication.

Order of constructors. A constructor may call functions which use __cpu_indicator.

As I said - make __cpu_indicator have a conservative default value (zero). It is irrelevant if constructors that run before the initialization of __cpu_indicator run with the default CPU capabilities.

Richard.

-- H.J.