[patch] [1/2] Support reduction in loop SLP
Hi,

This is the first part of reduction support in loop-aware SLP. The purpose of
the patch is to handle unrolled reductions such as:

  # a1 = phi <a0, a5>
  ...
  a2 = a1 + x
  ...
  a3 = a2 + y
  ...
  a5 = a4 + z

Such a sequence of statements is gathered into a reduction chain and serves as
a root for an SLP instance (similar to a group of strided stores in the
existing loop SLP implementation). The patch also fixes PR
tree-optimization/41881.

Since reduction chains use the same data structure as strided data accesses,
this part of the patch renames these data structures, removing data-ref and
interleaving references.

Bootstrapped and tested on powerpc64-suse-linux. I am going to apply it later
today.

Ira

ChangeLog:

	* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Use
	new names for group elements access.
	* tree-vectorizer.h (struct _stmt_vec_info): Use interleaving info
	for reduction chains as well. Remove data reference and interleaving
	related words from the fields names.
	* tree-vect-loop.c (vect_transform_loop): Use new names for group
	elements access.
	* tree-vect-data-refs.c (vect_get_place_in_interleaving_chain,
	vect_insert_into_interleaving_chain, vect_update_interleaving_chain,
	vect_same_range_drs, vect_analyze_data_ref_dependence,
	vect_update_misalignment_for_peel, vect_verify_datarefs_alignment,
	vector_alignment_reachable_p, vect_peeling_hash_get_lowest_cost,
	vect_enhance_data_refs_alignment, vect_analyze_group_access,
	vect_analyze_data_ref_access, vect_create_data_ref_ptr,
	vect_transform_strided_load, vect_record_strided_load_vectors):
	Likewise.
	* tree-vect-stmts.c (vect_model_simple_cost, vect_model_store_cost,
	vect_model_load_cost, vectorizable_store, vectorizable_load,
	vect_remove_stores, new_stmt_vec_info): Likewise.
	* tree-vect-slp.c (vect_build_slp_tree,
	vect_supported_slp_permutation_p, vect_analyze_slp_instance):
	Likewise.
Index: tree-vect-loop-manip.c
===
--- tree-vect-loop-manip.c	(revision 173814)
+++ tree-vect-loop-manip.c	(working copy)
@@ -2437,7 +2437,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l
       dr_a = DDR_A (ddr);
       stmt_a = DR_STMT (DDR_A (ddr));
-      dr_group_first_a = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_a));
+      dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
       if (dr_group_first_a)
	{
	  stmt_a = dr_group_first_a;
@@ -2446,7 +2446,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l
       dr_b = DDR_B (ddr);
       stmt_b = DR_STMT (DDR_B (ddr));
-      dr_group_first_b = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_b));
+      dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
       if (dr_group_first_b)
	{
	  stmt_b = dr_group_first_b;
Index: tree-vectorizer.h
===
--- tree-vectorizer.h	(revision 173814)
+++ tree-vectorizer.h	(working copy)
@@ -468,15 +473,15 @@ typedef struct _stmt_vec_info {
   /* Whether the stmt is SLPed, loop-based vectorized, or both.  */
   enum slp_vect_type slp_type;

-  /* Interleaving info.  */
-  /* First data-ref in the interleaving group.  */
-  gimple first_dr;
-  /* Pointer to the next data-ref in the group.  */
-  gimple next_dr;
-  /* In case that two or more stmts share data-ref, this is the pointer to the
-     previously detected stmt with the same dr.  */
+  /* Interleaving and reduction chains info.  */
+  /* First element in the group.  */
+  gimple first_element;
+  /* Pointer to the next element in the group.  */
+  gimple next_element;
+  /* For data-refs, in case that two or more stmts share data-ref, this is the
+     pointer to the previously detected stmt with the same dr.  */
   gimple same_dr_stmt;
-  /* The size of the interleaving group.  */
+  /* The size of the group.  */
   unsigned int size;
   /* For stores, number of stores from this group seen. We vectorize the
      last one.  */
@@ -527,22 +532,22 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_RELATED_STMT(S)	(S)->related_stmt
 #define STMT_VINFO_SAME_ALIGN_REFS(S)	(S)->same_align_refs
 #define STMT_VINFO_DEF_TYPE(S)		(S)->def_type
-#define STMT_VINFO_DR_GROUP_FIRST_DR(S)	(S)->first_dr
-#define STMT_VINFO_DR_GROUP_NEXT_DR(S)	(S)->next_dr
-#define STMT_VINFO_DR_GROUP_SIZE(S)	(S)->size
-#define STMT_VINFO_DR_GROUP_STORE_COUNT(S) (S)->store_count
-#define STMT_VINFO_DR_GROUP_GAP(S)	(S)->gap
-#define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S) (S)->same_dr_stmt
-#define STMT_VINFO_DR_GROUP_READ_WRITE_DEPENDENCE(S) (S)->read_write_dep
-#define STMT_VINFO_STRIDED_ACCESS(S)	((S)->first_dr != NULL)
+#define STMT_VINFO_GROUP_FIRST_ELEMENT(S)  (S)->first_element
+#define
[patch] [2/2] Support reduction in loop SLP
This part adds the actual code for reduction support. Bootstrapped and tested
on powerpc64-suse-linux. I am planning to apply it later today.

Ira

ChangeLog:

	PR tree-optimization/41881
	* tree-vectorizer.h (struct _loop_vec_info): Add new field
	reduction_chains along with a macro for its access.
	* tree-vect-loop.c (new_loop_vec_info): Initialize reduction chains.
	(destroy_loop_vec_info): Free reduction chains.
	(vect_analyze_loop_2): Return false if vect_analyze_slp() returns
	false.
	(vect_is_slp_reduction): New function.
	(vect_is_simple_reduction_1): Call vect_is_slp_reduction.
	(vect_create_epilog_for_reduction): Support SLP reduction chains.
	* tree-vect-slp.c (vect_get_and_check_slp_defs): Allow different
	definition types for reduction chains.
	(vect_supported_load_permutation_p): Don't allow permutations for
	reduction chains.
	(vect_analyze_slp_instance): Support reduction chains.
	(vect_analyze_slp): Try to build SLP instance from reduction chains.
	(vect_get_constant_vectors): Handle reduction chains.
	(vect_schedule_slp_instance): Mark the first statement of the
	reduction chain as reduction.

testsuite/ChangeLog:

	PR tree-optimization/41881
	* gcc.dg/vect/O3-pr41881.c: New test.
	* gcc.dg/vect/O3-slp-reduc-10.c: New test.
Index: testsuite/gcc.dg/vect/O3-slp-reduc-10.c
===
--- testsuite/gcc.dg/vect/O3-slp-reduc-10.c	(revision 0)
+++ testsuite/gcc.dg/vect/O3-slp-reduc-10.c	(revision 0)
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 128
+#define TYPE int
+#define RESULT 755918
+
+__attribute__ ((noinline)) TYPE fun2 (TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 14;
+
+  for (i = 0; i < n / 2; i++)
+    for (j = 0; j < 2; j++)
+      dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+int main (void)
+{
+  TYPE a[N], b[N], dot;
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i+8;
+    }
+
+  dot = fun2 (a, b, N);
+  if (dot != RESULT)
+    abort();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/O3-pr41881.c
===
--- testsuite/gcc.dg/vect/O3-pr41881.c	(revision 0)
+++ testsuite/gcc.dg/vect/O3-pr41881.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+
+#define TYPE int
+
+TYPE fun1(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n; i++)
+    dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+TYPE fun2(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n / 8; i++)
+    for (j = 0; j < 8; j++)
+      dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: tree-vectorizer.h
===
--- tree-vectorizer.h	(revision 173814)
+++ tree-vectorizer.h	(working copy)
@@ -248,6 +248,10 @@ typedef struct _loop_vec_info {
   /* Reduction cycles detected in the loop. Used in loop-aware SLP.  */
   VEC (gimple, heap) *reductions;

+  /* All reduction chains in the loop, represented by the first
+     stmt in the chain.  */
+  VEC (gimple, heap) *reduction_chains;
+
   /* Hash table used to choose the best peeling option.  */
   htab_t peeling_htab;

@@ -277,6 +281,7 @@ typedef struct _loop_vec_info {
 #define LOOP_VINFO_SLP_INSTANCES(L)	   (L)->slp_instances
 #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
 #define LOOP_VINFO_REDUCTIONS(L)	   (L)->reductions
+#define LOOP_VINFO_REDUCTION_CHAINS(L)	   (L)->reduction_chains
 #define LOOP_VINFO_PEELING_HTAB(L)	   (L)->peeling_htab

 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L)	\
Index: tree-vect-loop.c
===
--- tree-vect-loop.c	(revision 173814)
+++ tree-vect-loop.c	(working copy)
@@ -757,6 +757,7 @@ new_loop_vec_info (struct loop *loop)
	   PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
   LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_REDUCTIONS (res) = VEC_alloc (gimple, heap, 10);
+  LOOP_VINFO_REDUCTION_CHAINS (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
   LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
   LOOP_VINFO_PEELING_HTAB (res) = NULL;
@@ -852,6 +853,7 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
   VEC_free (slp_instance, heap,
[PATCH] Fix up execute_update_addresses_taken for debug stmts (PR tree-optimization/49000)
Hi!

When an addressable var is optimized into non-addressable, we didn't clean up
MEM_REFs containing ADDR_EXPRs of such vars in debug stmts. These got later
on folded into the var itself and caused SSA verification errors. Fixed by
trying to rewrite the reference and, if that fails, resetting the debug stmt.

Bootstrapped/regtested on x86_64-linux and i686-linux, no change in cc1plus
.debug_info/.debug_loc, implicitptr.c testcase still works too. Ok for
trunk/4.6?

2011-05-18  Jakub Jelinek  ja...@redhat.com

	PR tree-optimization/49000
	* tree-ssa.c (execute_update_addresses_taken): Call
	maybe_rewrite_mem_ref_base on debug stmt value. If it couldn't be
	rewritten and decl has been marked for renaming, reset the debug
	stmt.

	* gcc.dg/pr49000.c: New test.

--- gcc/tree-ssa.c.jj	2011-05-11 19:39:04.0 +0200
+++ gcc/tree-ssa.c	2011-05-17 18:20:10.0 +0200
@@ -2230,6 +2230,17 @@ execute_update_addresses_taken (void)
		    }
		}

+	      else if (gimple_debug_bind_p (stmt)
+		       && gimple_debug_bind_has_value_p (stmt))
+		{
+		  tree *valuep = gimple_debug_bind_get_value_ptr (stmt);
+		  tree decl;
+		  maybe_rewrite_mem_ref_base (valuep);
+		  decl = non_rewritable_mem_ref_base (*valuep);
+		  if (decl && symbol_marked_for_renaming (decl))
+		    gimple_debug_bind_reset_value (stmt);
+		}
+
	      if (gimple_references_memory_p (stmt)
		  || is_gimple_debug (stmt))
		update_stmt (stmt);
--- gcc/testsuite/gcc.dg/pr49000.c.jj	2011-05-17 18:30:10.0 +0200
+++ gcc/testsuite/gcc.dg/pr49000.c	2011-05-17 18:23:16.0 +0200
@@ -0,0 +1,29 @@
+/* PR tree-optimization/49000 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+static
+foo (int x, int y)
+{
+  return x * y;
+}
+
+static int
+bar (int *z)
+{
+  return *z;
+}
+
+void
+baz (void)
+{
+  int a = 42;
+  int *b = &a;
+  foo (bar (&a), 3);
+}
+
+void
+test (void)
+{
+  baz ();
+}

	Jakub
[PATCH] Small typed DWARF improvement
Hi!

This patch optimizes away unneeded DW_OP_GNU_converts. mem_loc_descriptor
attempts to keep the operands signed when it returns; if the next op needs
them unsigned again with the same size, there might be useless converts.

The patch won't change a DW_OP_GNU_convert to integral from non-integral (so
that e.g. a float to {un,}signed conversion is done with the right sign); for
other converts it will, if possible, change the preceding typed op's base
type, provided the size is the same and both the typed op and the following
DW_OP_GNU_convert are integral or have the same encoding.

An example testcase which is improved:

/* { dg-do run } */
/* { dg-options "-g" } */

volatile int vv;

__attribute__((noclone, noinline)) void
foo (double d)
{
  unsigned long f = ((unsigned long) d) / 33UL;
  vv++;		/* { dg-final { gdb-test 10 "f" "7" } } */
}

int
main ()
{
  foo (231.0);
  return 0;
}

where previously we emitted

  DW_OP_GNU_regval_type xmm0, double
  DW_OP_GNU_convert ulong
  DW_OP_GNU_convert long
  DW_OP_GNU_convert ulong
  DW_OP_const1u 33
  DW_OP_GNU_convert ulong
  DW_OP_div
  DW_OP_GNU_convert long

while with this patch the

  DW_OP_GNU_convert long
  DW_OP_GNU_convert ulong

pair can go away.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-05-17  Jakub Jelinek  ja...@redhat.com

	* dwarf2out.c (resolve_addr_in_expr): Optimize away redundant
	DW_OP_GNU_convert ops.
--- gcc/dwarf2out.c.jj	2011-05-17 13:35:26.0 +0200
+++ gcc/dwarf2out.c	2011-05-17 14:41:21.0 +0200
@@ -24092,23 +24092,84 @@ resolve_one_addr (rtx *addr, void *data
 static bool
 resolve_addr_in_expr (dw_loc_descr_ref loc)
 {
+  dw_loc_descr_ref keep = NULL;
   for (; loc; loc = loc->dw_loc_next)
-    if (((loc->dw_loc_opc == DW_OP_addr || loc->dtprel)
-	 && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
-	|| (loc->dw_loc_opc == DW_OP_implicit_value
-	    && loc->dw_loc_oprnd2.val_class == dw_val_class_addr
-	    && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL)))
-      return false;
-    else if (loc->dw_loc_opc == DW_OP_GNU_implicit_pointer
-	     && loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+    switch (loc->dw_loc_opc)
       {
-	dw_die_ref ref
-	  = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
-	if (ref == NULL)
+      case DW_OP_addr:
+	if (resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
	  return false;
-	loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
-	loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
-	loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+	break;
+      case DW_OP_const4u:
+      case DW_OP_const8u:
+	if (loc->dtprel
+	    && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
+	  return false;
+	break;
+      case DW_OP_implicit_value:
+	if (loc->dw_loc_oprnd2.val_class == dw_val_class_addr
+	    && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL))
+	  return false;
+	break;
+      case DW_OP_GNU_implicit_pointer:
+	if (loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+	  {
+	    dw_die_ref ref
+	      = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
+	    if (ref == NULL)
+	      return false;
+	    loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
+	    loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
+	    loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+	  }
+	break;
+      case DW_OP_GNU_const_type:
+      case DW_OP_GNU_regval_type:
+      case DW_OP_GNU_deref_type:
+      case DW_OP_GNU_convert:
+      case DW_OP_GNU_reinterpret:
+	while (loc->dw_loc_next
+	       && loc->dw_loc_next->dw_loc_opc == DW_OP_GNU_convert)
+	  {
+	    dw_die_ref base1, base2;
+	    unsigned enc1, enc2, size1, size2;
+	    if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+		|| loc->dw_loc_opc == DW_OP_GNU_deref_type)
+	      base1 = loc->dw_loc_oprnd2.v.val_die_ref.die;
+	    else
+	      base1 = loc->dw_loc_oprnd1.v.val_die_ref.die;
+	    base2 = loc->dw_loc_next->dw_loc_oprnd1.v.val_die_ref.die;
+	    gcc_assert (base1->die_tag == DW_TAG_base_type
+			&& base2->die_tag == DW_TAG_base_type);
+	    enc1 = get_AT_unsigned (base1, DW_AT_encoding);
+	    enc2 = get_AT_unsigned (base2, DW_AT_encoding);
+	    size1 = get_AT_unsigned (base1, DW_AT_byte_size);
+	    size2 = get_AT_unsigned (base2, DW_AT_byte_size);
+	    if (size1 == size2
+		&& (((enc1 == DW_ATE_unsigned || enc1 == DW_ATE_signed)
+		     && (enc2 == DW_ATE_unsigned || enc2 == DW_ATE_signed)
+		     && loc != keep)
+		    || enc1 == enc2))
+	      {
+		/* Optimize away next DW_OP_GNU_convert after
+		   adjusting LOC's base type die reference.  */
+		if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+		    || loc->dw_loc_opc ==
Re: [google] Increase inlining limits with FDO/LIPO
To make consistent inline decisions between profile-gen and profile-use,
probably better to check these two: flag_profile_arcs and
flag_branch_probabilities. -fprofile-use enables profile-arcs, and value
profiling is enabled only when edge/branch profiling is enabled (so no need
to be checked).

David

On Tue, May 17, 2011 at 10:50 PM, Mark Heffernan meh...@google.com wrote:

This small patch greatly expands the function size limits for inlining with
FDO/LIPO. With profile information, the inliner is much more selective and
precise, so the limits can be increased with less worry that functions and
total code size will blow up. This speeds up x86-64 internal benchmarks by
about geomean 1.5% to 3% with LIPO (depending on microarch), and 1% to 1.5%
with FDO. Size increase is negligible (0.1% mean).

Bootstrapped and regression tested on x86-64. Trunk testing to follow. Ok
for google/main?

Mark

2011-05-17  Mark Heffernan  meh...@google.com

	* opts.c (finish_options): Increase inlining limits with profile
	generate and use.

Index: opts.c
===
--- opts.c	(revision 173666)
+++ opts.c	(working copy)
@@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts
	  opts->x_flag_split_stack = 0;
	}
     }
+
+  if (opts->x_flag_profile_use
+      || opts->x_profile_arc_flag
+      || opts->x_flag_profile_values)
+    {
+      /* With accurate profile information, inlining is much more
+	 selective and makes better decisions, so increase the
+	 inlining function size limits.  Changes must be added to both
+	 the generate and use builds to avoid profile mismatches.  */
+      maybe_set_param_value
+	(PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
+	 opts->x_param_values, opts_set->x_param_values);
+      maybe_set_param_value
+	(PARAM_MAX_INLINE_INSNS_AUTO, 1000,
+	 opts->x_param_values, opts_set->x_param_values);
+    }
 }
Re: [patch gimplifier]: Make sure TRUTH_NOT_EXPR has boolean_type_node type and argument
2011/5/16 Richard Guenther richard.guent...@gmail.com:

On Mon, May 16, 2011 at 3:45 PM, Michael Matz m...@suse.de wrote:

Hi,

On Mon, 16 May 2011, Richard Guenther wrote:

I think conversion _to_ BOOLEAN_TYPE shouldn't be useless, on the grounds
that it requires booleanization (at least conceptually), i.e. conversion to
a set of two values (no matter the precision or size) based on the outcome
of comparing the RHS value with false_pre_image(TREE_TYPE(RHS)).

Conversion _from_ BOOLEAN_TYPE can be regarded as useless, as the conversion
from false or true into false_pre_image or true_pre_image always is simply
an embedding of 0 or 1/-1 (depending on target type signedness). And if the
BOOLEAN_TYPE and the LHS have the same signedness, the bit representation of
boolean_true_type is (or should be) the same as the one converted to LHS
(namely either 1 or -1).

Sure, that would probably be enough to prevent non-BOOLEAN_TYPEs being used
where BOOLEAN_TYPE nodes were used before. It still will cause an artificial
conversion from a single-bit bitfield read to a bool.

Not if you're special-casing single-bit conversions (on the grounds that a
booleanization from a two-valued set to a different two-valued set of the
same signedness will not actually require a comparison). I think it's better
to be very precise in our base predicates than to add various hacks over the
place to care for imprecision.

Or require a 1-bit integral type for TRUTH_* operands only (which ensures
the two-valuedness, which is what we really want). That can be done by
either fixing the frontends to make boolean_type_node have 1-bit precision
or building a middle-end private type with that constraint (though that's
the more difficult route, as we still do not have a strong FE -> middle-end
hand-off point, and it certainly is not the gimplifier). Long term all the
global trees should be FE private and the middle-end should have its own
set.

Richard.

Ciao,
Michael.
Hello,

The initial idea was to check for logical operations that the conversion to
boolean_type_node is useless. This assumption was flawed by the fact that
boolean_type_node gets re-defined in free_lang_decl to a 1-bit precision
BOOL_TYPE_SIZE-ed type if the FE's boolean_type_node is incompatible with
it. By this, the FE's boolean_type_node becomes, via the back door,
incompatible in tree-cfg checks.

So for all languages but Ada, logical types have precision set to one. Just
for the Ada case, which requires a different boolean_type_node kind, we need
to inspect the inner type to be a boolean. As Fortran also has integer-typed
boolean-compatible types, we can't simply check for BOOLEAN_TYPE here and
need to check for precision first.

ChangeLog

2011-05-18  Kai Tietz

	PR middle-end/48989
	* tree-cfg.c (verify_gimple_assign_binary): Check lhs type for
	being compatible to boolean for logical operations.
	(verify_gimple_assign_unary): Likewise.
	(compatible_boolean_type_p): New helper.

Bootstrapped on x86_64-pc-linux-gnu, and regression tested for Ada and
Fortran. Ok for apply?

Regards,
Kai

Index: gcc/gcc/tree-cfg.c
===
--- gcc.orig/gcc/tree-cfg.c	2011-05-16 14:26:12.369031500 +0200
+++ gcc/gcc/tree-cfg.c	2011-05-18 08:20:34.935819100 +0200
@@ -3220,6 +3220,31 @@ verify_gimple_comparison (tree type, tre
   return false;
 }

+/* Checks TYPE for being compatible to boolean.  Returns
+   FALSE, if type is not compatible, otherwise TRUE.
+
+   A type is compatible if
+   a) TYPE_PRECISION is one.
+   b) The type - or the inner type - is of kind BOOLEAN_TYPE.  */
+
+static bool
+compatible_boolean_type_p (tree type)
+{
+  if (!type)
+    return false;
+  if (TYPE_PRECISION (type) == 1)
+    return true;
+
+  /* We try to look here into inner type, as Ada uses
+     boolean_type_node with type precision != 1.  */
+  while (TREE_TYPE (type)
+	 && (TREE_CODE (type) == INTEGER_TYPE
+	     || TREE_CODE (type) == REAL_TYPE))
+    type = TREE_TYPE (type);
+
+  return TYPE_PRECISION (type) == 1 || TREE_CODE (type) == BOOLEAN_TYPE;
+}
+
 /* Verify a gimple assignment statement STMT with an unary rhs.
    Returns true if anything is wrong.  */

@@ -3350,15 +3375,16 @@ verify_gimple_assign_unary (gimple stmt)
       return false;

     case TRUTH_NOT_EXPR:
-      if (!useless_type_conversion_p (boolean_type_node, rhs1_type))
+
+      if (!useless_type_conversion_p (lhs_type, rhs1_type)
+	  || !compatible_boolean_type_p (lhs_type))
	{
-	  error ("invalid types in truth not");
-	  debug_generic_expr (lhs_type);
-	  debug_generic_expr (rhs1_type);
-	  return true;
+	  error ("invalid types in truth not");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  return true;
	}
       break;
-
     case NEGATE_EXPR:
     case ABS_EXPR:
     case BIT_NOT_EXPR:
Re: [PATCH, PR45098, 2/10]
Hi,

2011-05-05  Tom de Vries  t...@codesourcery.com

	PR target/45098
	* tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost.

OK,

Zdenek
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
> Ok, thanks for explaining it. So would be patch ok for apply without the
> precision setting?

Sure, everything but the gcc-interface/misc.c part is OK. Thanks.

-- 
Eric Botcazou
Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use
2011/5/18 Eric Botcazou ebotca...@adacore.com:
>> Ok, thanks for explaining it. So would be patch ok for apply without the
>> precision setting?
>
> Sure, everything but the gcc-interface/misc.c part is OK. Thanks.
>
> --
> Eric Botcazou

Hmm, you mean the initialization of boolean_false_node is wrong, too? Not
sure here, as this patch introduces its use in the other parts. The
precision part of course is wrong.

Regards,
Kai
Re: [PATCH] Fix PR48989
> 	* tree-ssa.c (useless_type_conversion_p): Preserve conversions
> 	to non-1-precision BOOLEAN_TYPEs.

This looks overeager if you're allowing non-boolean types in tree-cfg.c.
The conversion can be stripped if the source type has precision 1, can't it?

-- 
Eric Botcazou
Re: [PATCH] Fix PR48989
On Wed, 18 May 2011, Eric Botcazou wrote:

> > 	* tree-ssa.c (useless_type_conversion_p): Preserve conversions
> > 	to non-1-precision BOOLEAN_TYPEs.
>
> This looks like overeager if you're allowing non-boolean types in
> tree-cfg.c. The conversion can be stripped if the source type has
> precision 1, can't it?

That's true, though in that case the previous

  /* Preserve changes in signedness or precision.  */
  if (TYPE_UNSIGNED (inner_type) != TYPE_UNSIGNED (outer_type)
      || TYPE_PRECISION (inner_type) != TYPE_PRECISION (outer_type))
    return false;

check would have preserved the conversion already (or the BOOLEAN_TYPE's
precision is 1 as well). Thus, we preserved such conversions already in the
past.

Richard.
[PATCH] Fix PR49018
This fixes PR49018. ifcombine looks for side-effects but instead asks only
gimple_has_volatile_ops, and gimple_has_side_effects disregards that volatile
asms have side-effects. The function also doesn't handle all stmts
gracefully, so I fixed it as well, turning the asserts into checking asserts.

Fixed as follows. Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2011-05-18  Richard Guenther  rguent...@suse.de

	PR tree-optimization/49018
	* gimple.c (gimple_has_side_effects): Volatile asms have
	side-effects.
	* tree-ssa-ifcombine.c (bb_no_side_effects_p): Use
	gimple_has_side_effects.

Index: gcc/gimple.c
===
--- gcc/gimple.c	(revision 173854)
+++ gcc/gimple.c	(working copy)
@@ -2354,6 +2354,10 @@ gimple_has_side_effects (const_gimple s)
   if (gimple_has_volatile_ops (s))
     return true;

+  if (gimple_code (s) == GIMPLE_ASM
+      && gimple_asm_volatile_p (s))
+    return true;
+
   if (is_gimple_call (s))
     {
       unsigned nargs = gimple_call_num_args (s);
@@ -2368,7 +2372,7 @@ gimple_has_side_effects (const_gimple s)
       if (gimple_call_lhs (s)
	  && TREE_SIDE_EFFECTS (gimple_call_lhs (s)))
	{
-	  gcc_assert (gimple_has_volatile_ops (s));
+	  gcc_checking_assert (gimple_has_volatile_ops (s));
	  return true;
	}

@@ -2379,7 +2383,7 @@ gimple_has_side_effects (const_gimple s)
       for (i = 0; i < nargs; i++)
	if (TREE_SIDE_EFFECTS (gimple_call_arg (s, i)))
	  {
-	    gcc_assert (gimple_has_volatile_ops (s));
+	    gcc_checking_assert (gimple_has_volatile_ops (s));
	    return true;
	  }

@@ -2388,11 +2392,14 @@ gimple_has_side_effects (const_gimple s)
   else
     {
       for (i = 0; i < gimple_num_ops (s); i++)
-	if (TREE_SIDE_EFFECTS (gimple_op (s, i)))
-	  {
-	    gcc_assert (gimple_has_volatile_ops (s));
-	    return true;
-	  }
+	{
+	  tree op = gimple_op (s, i);
+	  if (op && TREE_SIDE_EFFECTS (op))
+	    {
+	      gcc_checking_assert (gimple_has_volatile_ops (s));
+	      return true;
+	    }
+	}
     }

   return false;
Index: gcc/tree-ssa-ifcombine.c
===
--- gcc/tree-ssa-ifcombine.c	(revision 173854)
+++ gcc/tree-ssa-ifcombine.c	(working copy)
@@ -107,7 +107,7 @@ bb_no_side_effects_p (basic_block bb)
 {
       gimple stmt = gsi_stmt (gsi);

-      if (gimple_has_volatile_ops (stmt)
+      if (gimple_has_side_effects (stmt)
	  || gimple_vuse (stmt))
	return false;
 }
Re: [PATCH, i386]: Cleanup TARGET_GNU2_TLS usage
Uros, The test (tls.c), used to check all TLS models is attached to the message. I plan to convert it to proper dg test... ;) I've got it in my tree since you sent it to me while debugging/testing support for the various TLS models on Solaris. I'd really prefer (and have modified it this way) the test to be split into one test per access model so it becomes easier to figure out what is failing. I can provide such a patch if desired. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Fix extract_fixed_bit_field (PR middle-end/49029)
Hi!

The attached testcase ICEs on arm, because extract_fixed_bit_field with tmode
SImode (and an SImode non-NULL target) decides to use DImode for the signed
shifts, but doesn't clear target and thus attempts to use that SImode target
for DImode shifts. The code apparently already has
  if (mode != tmode) target = 0;
just done at the wrong spot, before mode can be changed.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux and
tested with a cross to arm-linux on the testcase, ok for trunk/4.6?

2011-05-18  Jakub Jelinek  ja...@redhat.com

	PR middle-end/49029
	* expmed.c (extract_fixed_bit_field): Test whether target can be
	used only after deciding which mode to use.

	* gcc.c-torture/compile/pr49029.c: New test.

--- gcc/expmed.c.jj	2011-05-11 19:39:04.0 +0200
+++ gcc/expmed.c	2011-05-18 11:38:43.0 +0200
@@ -1769,8 +1769,6 @@ extract_fixed_bit_field (enum machine_mo
   /* To extract a signed bit-field, first shift its msb to the msb of the
      word, then arithmetic-shift its lsb to the lsb of the word.  */
   op0 = force_reg (mode, op0);
-  if (mode != tmode)
-    target = 0;

   /* Find the narrowest integer mode that contains the field.  */

@@ -1782,6 +1780,9 @@ extract_fixed_bit_field (enum machine_mo
	break;
       }

+  if (mode != tmode)
+    target = 0;
+
   if (GET_MODE_BITSIZE (mode) != (bitsize + bitpos))
     {
       int amount = GET_MODE_BITSIZE (mode) - (bitsize + bitpos);
--- gcc/testsuite/gcc.c-torture/compile/pr49029.c.jj	2011-05-18 11:55:25.0 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr49029.c	2011-05-18 11:54:22.0 +0200
@@ -0,0 +1,10 @@
+/* PR middle-end/49029 */
+struct S { volatile unsigned f : 11; signed g : 30; } __attribute__((packed));
+struct T { volatile struct S h; } __attribute__((packed)) a;
+void foo (int);
+
+void
+bar ()
+{
+  foo (a.h.g);
+}

	Jakub
Re: [PATCH][ARM] Add support for ADDW and SUBW instructions
Ping. On 20/04/11 16:27, Andrew Stubbs wrote: This patch adds basic support for the Thumb ADDW and SUBW instructions. The patch permits the compiler to use the new instructions for constants that can be loaded with a single instruction (i.e. 16-bit unshifted), but does not support use of addw with split-constants; I have a patch for that coming soon. This patch requires that my previously posted patch for MOVW is applied first. OK? Andrew
Re: [PATCH 2/2] Reimplementation of build_ref_for_offset
On Sat, Oct 23, 2010 at 10:12 AM, H.J. Lu hjl.to...@gmail.com wrote:

On Wed, Sep 8, 2010 at 9:43 AM, Martin Jambor mjam...@suse.cz wrote:

Hi,

this patch reimplements build_ref_for_offset so that it simply creates a
MEM_REF rather than trying to figure out what combination of component and
array refs are necessary. The main advantage of this approach is that it can
never fail, allowing us to be more aggressive and remove a number of checks.

There were two main problems with this, though. The first is that MEM_REFs
are not particularly readable to users. This would be a problem when we are
creating a reference that might be displayed to them in a warning or a
debugger, which is what we do with DECL_DEBUG_EXPR expressions. We sometimes
construct these artificially when propagating accesses across assignments.
So for those cases I retained the old implementation and only simplified it
a bit - it is now called build_user_friendly_ref_for_offset.

The other problem was bit-fields. Constructing accesses to them was
difficult enough, but then I realized that I was not even able to detect the
cases when I was accessing a bit-field if its offset happened to be on a
byte boundary. I thought I would be able to figure this out from TYPE_SIZE
and TYPE_PRECISION of exp_type, but combinations that signal a bit-field in
one language may not apply in another (in C, small TYPE_PRECISION denotes
bit-fields and TYPE_SIZE is big, but for example Fortran booleans have the
precision set to one even though they are not bit-fields). So in the end I
based the detection on the access structures that represented the thing
being loaded or stored, which I knew had their sizes correct because they
are based on field sizes. Since I use the access, the simplest way to
actually create the reference to the bit-field is to re-use the last
component ref of its expression - that is what build_ref_for_model (meaning
a model access) does.
Separating this from build_ref_for_offset (which cannot handle bit-fields)
makes the code a bit cleaner and keeps the latter function for other users
which know nothing about SRA access structures.

I hope that you'll find these approaches reasonable. The patch was
bootstrapped and tested on x86_64-linux without any issues. I'd like to
commit it to trunk, but I'm sure there will be comments and suggestions.

Thanks,

Martin

2010-09-08  Martin Jambor  mjam...@suse.cz

	PR tree-optimization/44972
	* tree-sra.c: Include toplev.h.
	(build_ref_for_offset): Entirely reimplemented.
	(build_ref_for_model): New function.
	(build_user_friendly_ref_for_offset): New function.
	(analyze_access_subtree): Removed build_ref_for_offset check.
	(propagate_subaccesses_across_link): Likewise.
	(create_artificial_child_access): Use
	build_user_friendly_ref_for_offset.
	(propagate_subaccesses_across_link): Likewise.
	(ref_expr_for_all_replacements_p): Removed.
	(generate_subtree_copies): Updated comment.  Use
	build_ref_for_model.
	(sra_modify_expr): Use build_ref_for_model.
	(load_assign_lhs_subreplacements): Likewise.
	(sra_modify_assign): Removed ref_expr_for_all_replacements_p
	checks, checks for return values of build_ref_for_offset.
	* ipa-cp.c (ipcp_lattice_from_jfunc): No need to check return
	value of build_ref_for_offset.
	* ipa-prop.h: Include gimple.h.
	* ipa-prop.c (ipa_compute_jump_functions): Update to look for
	MEM_REFs.
	(ipa_analyze_indirect_call_uses): Update comment.
	* Makefile.in (tree-sra.o): Add $(GIMPLE_H) to dependencies.
	(IPA_PROP_H): Likewise.

This caused:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46150

This also caused:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49039

-- 
H.J.
PING: PATCH: PR rtl-optimization/48575: RTL vector patterns are limited to 26 elements
On Tue, Apr 26, 2011 at 3:32 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Apr 4, 2011 at 6:05 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Mar 31, 2011 at 5:05 AM, Kenneth Zadeck zad...@naturalbridge.com wrote: we hit this limit trying to write the explicit semantics for a vec_interleave_evenv32qi. ;;(define_insn vec_interleave_evenv32qi ;; [(set (match_operand:V32QI 0 register_operand =r) ;; (vec_select:V32QI ;; (vec_concat:V64QI ;; (match_operand:V32QI 1 register_operand 0) ;; (match_operand:V32QI 2 register_operand r)) ;; (parallel [(const_int 0) (const_int 32) ;; (const_int 2) (const_int 34) ;; (const_int 4) (const_int 36) ;; (const_int 6) (const_int 38) ;; (const_int 8) (const_int 40) ;; (const_int 10) (const_int 42) ;; (const_int 12) (const_int 44) ;; (const_int 14) (const_int 46) ;; (const_int 16) (const_int 48) ;; (const_int 18) (const_int 50) ;; (const_int 20) (const_int 52) ;; (const_int 22) (const_int 54) ;; (const_int 24) (const_int 56) ;; (const_int 26) (const_int 58) ;; (const_int 28) (const_int 60) ;; (const_int 30) (const_int 62)])))] ;; ;; rimihv\t%0,%2,8,15,8 ;; [(set_attr type rimi)]) kenny On 03/31/2011 06:16 AM, Mike Stump wrote: On Mar 31, 2011, at 1:41 AM, Richard Guenther wrote: On Wed, Mar 30, 2011 at 8:09 PM, H.J. Luhongjiu...@intel.com wrote: On Wed, Mar 30, 2011 at 08:02:38AM -0700, H.J. Lu wrote: Hi, Currently, we limit XVECEXP to 26 elements in machine description since we use letters 'a' to 'z' to encode them. I don't see any reason why we can't go beyond 'z'. This patch removes this restriction. Any comments? That was wrong. The problem is in vector elements. This patch passes bootstrap. Any comments? Do you really need it? I'm trying to recall if this is the limit Kenny and I hit If so, annoying. Kenny could confirm if it was. gcc's general strategy of, no fixed N gives gcc a certain flexibility that is very nice to have, on those general grounds, I kinda liked this patch. Is my patch OK to install? 
Here is my patch: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02105.html OK for trunk? Hi, No one is listed to review genrecog.c. Could global reviewers comment on my patch? Thanks. -- H.J.
Re: [PATCH, MELT] correcting path error in the Makefile.in
Basile Starynkevitch bas...@starynkevitch.net writes: On Wed, May 18, 2011 at 10:27:11AM +0400, Andrey Belevantsev wrote: On 17.05.2011 23:42, Basile Starynkevitch wrote: On Tue, 17 May 2011 21:30:44 +0200 Pierre Vittet pier...@pvittet.com wrote: My contributor number is 634276. You don't have to write your FSF contributor number in each mail to gcc-patches. This information is irrelevant to anybody reading the list as soon as you have got your papers right and got acquainted with the community. So don't worry about this :) It would help a lot if Pierre Vittet had write access to the SVN of GCC. His legal papers are done. However, neither Pierre nor I understand how he can get actual write access to the SVN (that is, an SSH account on gcc.gnu.org). Apparently, Pierre needs to be presented (or introduced) by someone. But a plain write-after-approval GCC maintainer like me is not enough. So how can Pierre get write access to GCC? We usually like to see a few successful patches before granting people write access, to make sure that people have the mechanics down before they start changing the repository. Ian
Re: PING: PATCH: PR other/48007: Unwind library doesn't work with UNITS_PER_WORD > sizeof (void *)
On Tue, Apr 26, 2011 at 6:07 AM, H.J. Lu hjl.to...@gmail.com wrote: On Sat, Apr 9, 2011 at 6:52 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Mar 24, 2011 at 12:15 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Mar 23, 2011 at 12:22 PM, Ulrich Weigand uweig...@de.ibm.com wrote: Richard Henderson wrote: Because, really, if we consider the structure truly public, we can't even change the number of registers for a given port to support new features of the cpu. Indeed, and I remember we got bitten by that a long time ago, which is why s390.h now has this comment: /* Number of hardware registers that go into the DWARF-2 unwind info. To avoid ABI incompatibility, this number must not change even as 'fake' hard registers are added or removed. */ #define DWARF_FRAME_REGISTERS 34 I don't suppose there's any way that we can declare these old programs Just Broken, and drop this compatibility stuff? I wouldn't like that ... we did run into this problem in the wild, and some s390 users really run very old programs for some reason. However, I'm wondering: this bug that leaked the implementation of _Unwind_Context only ever affected the *original* version of the structure -- it was fixed before the extended context was ever added, right? If this is true, we'd still need to keep the original context format unchanged, but we'd be free to modify the *extended* format at any time, without ABI considerations and need for further versioning ... From what I can tell, the issues are: 1. _Unwind_Context is supposed to be opaque and we are free to change it. We should be able to extend DWARF_FRAME_REGISTERS to support the new hard registers if needed, without breaking binary compatibility. 2. _Unwind_Context was leaked and wasn't really opaque. To provide backward binary compatibility, we are stuck with what we had. Is that possible to implement something along the line: 1. Add some bits to _Unwind_Context so that we can detect the leaked _Unwind_Context. 2. 
When a leaked _Unwind_Context is detected at run-time, as a compile time option, a target can either provide binary compatibility or issue a run-time error. This is the attempt to implement it. Any comments? Thanks. -- H.J. -- 2011-04-09 H.J. Lu hongjiu...@intel.com PR other/48007 * unwind-dw2.c (UNIQUE_UNWIND_CONTEXT): New. (_Unwind_Context): If UNIQUE_UNWIND_CONTEXT is defined, add dwarf_reg_size_table and value, remove version and by_value. (EXTENDED_CONTEXT_BIT): Don't define if UNIQUE_UNWIND_CONTEXT is defined. (_Unwind_IsExtendedContext): Likewise. (_Unwind_GetGR): Support UNIQUE_UNWIND_CONTEXT. (_Unwind_SetGR): Likewise. (_Unwind_GetGRPtr): Likewise. (_Unwind_SetGRPtr): Likewise. (_Unwind_SetGRValue): Likewise. (_Unwind_GRByValue): Likewise. (__frame_state_for): Initialize dwarf_reg_size_table field if UNIQUE_UNWIND_CONTEXT is defined. (uw_install_context_1): Likewise. Support UNIQUE_UNWIND_CONTEXT. PING. Hi Jason, Can you take a look at: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00695.html Thanks. -- H.J.
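The opaqueness question in this thread is the standard C opaque-pointer pattern: as long as consumers only ever hold a pointer to a forward-declared struct, the implementation may grow the struct (say, add registers) without breaking them; the ABI trouble described above starts exactly when the definition leaks and callers bake the layout into their binaries. A minimal sketch of the pattern, with made-up names rather than the real _Unwind_Context API:

```c
#include <stdlib.h>

/* Public view: the context stays opaque; callers hold only a pointer.
   (Names are invented for illustration; this is not the libgcc API.)  */
struct ctx;

struct ctx *ctx_new (void);
unsigned long ctx_get_reg (const struct ctx *, unsigned);
void ctx_set_reg (struct ctx *, unsigned, unsigned long);

/* Private definition: free to grow (more registers, extra fields) without
   breaking callers, provided none of them ever saw this definition.  */
struct ctx
{
  unsigned long regs[64];
};

struct ctx *
ctx_new (void)
{
  return (struct ctx *) calloc (1, sizeof (struct ctx));
}

unsigned long
ctx_get_reg (const struct ctx *c, unsigned i)
{
  return i < 64 ? c->regs[i] : 0;
}

void
ctx_set_reg (struct ctx *c, unsigned i, unsigned long v)
{
  if (i < 64)
    c->regs[i] = v;
}
```

Once a consumer compiles against the private definition instead of the opaque declaration, any change to `struct ctx` becomes an ABI break, which is the situation the thread is trying to paper over.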
Re: [Patch, Fortran] PR 48700: memory leak with MOVE_ALLOC
Janus Weil wrote: The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? OK. Thanks for the patch! (What next on your gfortran agenda?) Tobias PS: For the following two patches review is pending: My trans*.c coarray patch at http://gcc.gnu.org/ml/fortran/2011-05/msg00123.html Janne's http://gcc.gnu.org/ml/fortran/2011-05/msg00122.html 2011-05-16 Janus Weil ja...@gcc.gnu.org PR fortran/48700 * trans-intrinsic.c (gfc_conv_intrinsic_move_alloc): Deallocate 'TO' argument to avoid memory leaks. 2011-05-16 Janus Weil ja...@gcc.gnu.org PR fortran/48700 * gfortran.dg/move_alloc_4.f90: New.
[patch gimplifier]: Change TRUTH_(AND|OR|XOR) expressions to binary form
Hello, As follow-up for the logical-to-binary transition: 2011-05-18 Kai Tietz kti...@redhat.com * tree-cfg.c (verify_gimple_assign_binary): Barf on TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR. * gimplify.c (gimplify_expr): Boolify TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR, TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR. Additionally move TRUTH_AND|OR|XOR_EXPR to their binary form. Bootstrapped for x86_64-pc-linux-gnu and regression tested for ada, fortran, g++, and c. Ok for apply? Regards, Kai Index: gcc/gcc/gimplify.c === --- gcc.orig/gcc/gimplify.c 2011-05-13 13:15:01.0 +0200 +++ gcc/gcc/gimplify.c 2011-05-18 14:03:31.730740200 +0200 @@ -7210,7 +7210,21 @@ gimplify_expr (tree *expr_p, gimple_seq break; } } - + + switch (TREE_CODE (*expr_p)) + { + case TRUTH_AND_EXPR: + TREE_SET_CODE (*expr_p, BIT_AND_EXPR); + break; + case TRUTH_OR_EXPR: + TREE_SET_CODE (*expr_p, BIT_IOR_EXPR); + break; + case TRUTH_XOR_EXPR: + TREE_SET_CODE (*expr_p, BIT_XOR_EXPR); + break; + default: + break; + } /* Classified as tcc_expression. */ goto expr_2; Index: gcc/gcc/tree-cfg.c === --- gcc.orig/gcc/tree-cfg.c 2011-05-18 14:01:18.0 +0200 +++ gcc/gcc/tree-cfg.c 2011-05-18 14:05:06.512276000 +0200 @@ -3555,29 +3555,11 @@ do_pointer_plus_expr_check: case TRUTH_ANDIF_EXPR: case TRUTH_ORIF_EXPR: - gcc_unreachable (); - case TRUTH_AND_EXPR: case TRUTH_OR_EXPR: case TRUTH_XOR_EXPR: - { - /* We require two-valued operand types. */ - if (!(TREE_CODE (rhs1_type) == BOOLEAN_TYPE - || (INTEGRAL_TYPE_P (rhs1_type) - && TYPE_PRECISION (rhs1_type) == 1)) - || !(TREE_CODE (rhs2_type) == BOOLEAN_TYPE - || (INTEGRAL_TYPE_P (rhs2_type) - && TYPE_PRECISION (rhs2_type) == 1))) - { - error ("type mismatch in binary truth expression"); - debug_generic_expr (lhs_type); - debug_generic_expr (rhs1_type); - debug_generic_expr (rhs2_type); - return true; - } - break; - } + gcc_unreachable (); case LT_EXPR: case LE_EXPR:
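The lowering is sound because gimplification first boolifies the operands, i.e. reduces them to the values 0 and 1, and on such operands the logical and bitwise operators agree. A quick standalone check of that equivalence, with ordinary C operators standing in for the tree codes:

```c
#include <stdbool.h>

/* TRUTH_AND_EXPR vs. BIT_AND_EXPR (and the OR variants) on operands that
   are already 0 or 1, the way boolification leaves them.  */
static bool truth_and (bool a, bool b) { return a && b; }
static bool bit_and (bool a, bool b) { return a & b; }
static bool truth_or (bool a, bool b) { return a || b; }
static bool bit_or (bool a, bool b) { return a | b; }

/* Exhaustively compare the logical and bitwise forms on 0/1 inputs.  */
int
lowering_is_sound (void)
{
  int a, b;
  for (a = 0; a <= 1; a++)
    for (b = 0; b <= 1; b++)
      if (truth_and (a, b) != bit_and (a, b)
	  || truth_or (a, b) != bit_or (a, b))
	return 0;
  return 1;
}
```

XOR needs no separate check: on single-bit operands it only ever existed in its bitwise form.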
Re: [PATCH] fix vfmsubaddpd/vfmaddsubpd generation
Hello! This patch fixes an obvious problem: the fma4_fmsubadd/fma4_fmaddsub instruction templates don't generate vfmsubaddpd/vfmaddsubpd because they don't use <ssemodesuffix>. This passes bootstrap on x86_64 on trunk. Okay to commit? See comments in the code. BTW, I'm testing on gcc-4_6-branch. Should I post a different patch thread, or just use this one? No, the patch is clear and simple enough, you don't need to post it twice. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3625d9b..e86ea4e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2011-05-17 Harsha Jagasia harsha.jaga...@amd.com + + * config/i386/sse.md (fma4_fmsubadd): Use <ssemodesuffix>. + (fma4_fmaddsub): Likewise. + 2011-05-17 Richard Guenther rguent...@suse.de ChangeLog should be included in the message body, not in the patch. Please see [1] for details. * gimple.c (iterative_hash_gimple_type): Simplify singleton diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 291bffb..7c4e6dd 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1663,7 +1663,7 @@ (match_operand:VF 3 "nonimmediate_operand" "xm,x")] UNSPEC_FMADDSUB))] "TARGET_FMA4" - "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vfmaddsubp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "<MODE>")]) No, the <ssemodesuffix> mode attribute resolves to "ps" and "pd" for the VF mode iterator, so this should be vfmaddsub<ssemodesuffix>. @@ -1676,7 +1676,7 @@ (match_operand:VF 3 "nonimmediate_operand" "xm,x"))] UNSPEC_FMADDSUB))] "TARGET_FMA4" - "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vfmsubaddp<ssemodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "mode" "<MODE>")]) Same here. OK everywhere with these two changes. [1] http://gcc.gnu.org/contribute.html. Thanks, Uros.
[PATCH 1/2] Add bf592 support
Hi, The attached patch adds support for the bfin bf592 part. * doc/invoke.texi (Blackfin Options): -mcpu accepts bf592. * config/bfin/t-bfin-elf (MULTILIB_MATCHES): Select bf532-none for bf592-none. * config/bfin/t-bfin-linux (MULTILIB_MATCHES): Likewise. * config/bfin/t-bfin-uclinux (MULTILIB_MATCHES): Likewise. * config/bfin/bfin.c (bfin_cpus): Add bf592. * config/bfin/bfin.h (TARGET_CPU_CPP_BUILTINS): Define __ADSPBF592__ and __ADSPBF59x__ for BFIN_CPU_BF592. * config/bfin/bfin-opts.h (bfin_cpu_type): Add BFIN_CPU_BF592. * config/bfin/elf.h (LIB_SPEC): Add bf592. Ok to add to trunk? thanks, Stu Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 173825) +++ gcc/doc/invoke.texi (working copy) @@ -10414,7 +10414,7 @@ @samp{bf534}, @samp{bf536}, @samp{bf537}, @samp{bf538}, @samp{bf539}, @samp{bf542}, @samp{bf544}, @samp{bf547}, @samp{bf548}, @samp{bf549}, @samp{bf542m}, @samp{bf544m}, @samp{bf547m}, @samp{bf548m}, @samp{bf549m}, -@samp{bf561}. +@samp{bf561}, @samp{bf592}. The optional @var{sirevision} specifies the silicon revision of the target Blackfin processor. Any workarounds available for the targeted silicon revision will be enabled. If @var{sirevision} is @samp{none}, no workarounds are enabled. 
Index: gcc/config/bfin/t-bfin-elf === --- gcc/config/bfin/t-bfin-elf (revision 173825) +++ gcc/config/bfin/t-bfin-elf (working copy) @@ -58,6 +58,7 @@ MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549-none MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549m-none MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf561-none +MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf592-none MULTILIB_EXCEPTIONS=mleaf-id-shared-library* MULTILIB_EXCEPTIONS+=mcpu=bf532-none/mleaf-id-shared-library* Index: gcc/config/bfin/bfin-opts.h === --- gcc/config/bfin/bfin-opts.h (revision 173825) +++ gcc/config/bfin/bfin-opts.h (working copy) @@ -53,7 +53,8 @@ BFIN_CPU_BF548M, BFIN_CPU_BF549, BFIN_CPU_BF549M, - BFIN_CPU_BF561 + BFIN_CPU_BF561, + BFIN_CPU_BF592 } bfin_cpu_t; #endif Index: gcc/config/bfin/elf.h === --- gcc/config/bfin/elf.h (revision 173825) +++ gcc/config/bfin/elf.h (working copy) @@ -51,6 +51,7 @@ %{mmulticore:%{mcorea:-T bf561a.ld%s}} \ %{mmulticore:%{mcoreb:-T bf561b.ld%s}} \ %{mmulticore:%{!mcorea:%{!mcoreb:-T bf561m.ld%s}}} \ + %{mcpu=bf592*:-T bf592.ld%s} \ %{!mcpu=*:%eno processor type specified for linking} \ %{!mcpu=bf561*:-T bfin-common-sc.ld%s} \ %{mcpu=bf561*:%{!mmulticore:-T bfin-common-sc.ld%s} \ Index: gcc/config/bfin/bfin.c === --- gcc/config/bfin/bfin.c (revision 173825) +++ gcc/config/bfin/bfin.c (working copy) @@ -350,6 +350,11 @@ | WA_05000283 | WA_05000257 | WA_05000315 | WA_LOAD_LCREGS | WA_05000074}, + {"bf592", BFIN_CPU_BF592, 0x0001, + WA_SPECULATIVE_LOADS | WA_05000074}, + {"bf592", BFIN_CPU_BF592, 0xffff, + WA_SPECULATIVE_LOADS | WA_05000074}, + {NULL, BFIN_CPU_UNKNOWN, 0, 0} }; Index: gcc/config/bfin/bfin.h === --- gcc/config/bfin/bfin.h (revision 173825) +++ gcc/config/bfin/bfin.h (working copy) @@ -140,6 +140,10 @@ case BFIN_CPU_BF561:\ builtin_define ("__ADSPBF561__"); \ break;\ + case BFIN_CPU_BF592:\ + builtin_define ("__ADSPBF592__"); \ + builtin_define ("__ADSPBF59x__"); \ + break;\ } \ \ if (bfin_si_revision != -1) \ Index: gcc/config/bfin/t-bfin-uclinux === --- gcc/config/bfin/t-bfin-uclinux
(revision 173825) +++ gcc/config/bfin/t-bfin-uclinux (working copy) @@ -58,6 +58,7 @@ MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549-none MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549m-none MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf561-none +MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf592-none MULTILIB_EXCEPTIONS=mleaf-id-shared-library* MULTILIB_EXCEPTIONS+=mcpu=bf532-none/mleaf-id-shared-library* Index: gcc/config/bfin/t-bfin-linux === --- gcc/config/bfin/t-bfin-linux(revision 173825) +++ gcc/config/bfin/t-bfin-linux(working copy) @@ -57,6 +57,7 @@ MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549-none MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf549m-none MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf561-none +MULTILIB_MATCHES+=mcpu?bf532-none=mcpu?bf592-none SHLIB_MAPFILES=$(srcdir)/config/bfin/libgcc-bfin.ver
[PATCH 2/2] Add bf592 support
Hi, The attached patch adds a new test for the bfin bf592 part. * gcc.target/bfin/mcpu-bf592.c: New test. Ok to add to trunk? thanks, Stu Index: gcc/testsuite/gcc.target/bfin/mcpu-bf592.c === --- gcc/testsuite/gcc.target/bfin/mcpu-bf592.c (revision 0) +++ gcc/testsuite/gcc.target/bfin/mcpu-bf592.c (revision 0) @@ -0,0 +1,31 @@ +/* Test for -mcpu=. */ +/* { dg-do preprocess } */ +/* { dg-bfin-options -mcpu=bf592 } */ + +#ifndef __ADSPBF592__ +#error __ADSPBF592__ is not defined +#endif + +#ifndef __ADSPBF59x__ +#error __ADSPBF59x__ is not defined +#endif + +#if __SILICON_REVISION__ != 0x0001 +#error __SILICON_REVISION__ is not 0x0001 +#endif + +#ifndef __WORKAROUNDS_ENABLED +#error __WORKAROUNDS_ENABLED is not defined +#endif + +#ifdef __WORKAROUND_RETS +#error __WORKAROUND_RETS is defined +#endif + +#ifndef __WORKAROUND_SPECULATIVE_LOADS +#error __WORKAROUND_SPECULATIVE_LOADS is not defined +#endif + +#ifdef __WORKAROUND_SPECULATIVE_SYNCS +#error __WORKAROUND_SPECULATIVE_SYNCS is defined +#endif
[PATCH PR45098, 4/10] Iv init cost.
On 05/17/2011 09:17 AM, Tom de Vries wrote: On 05/17/2011 09:10 AM, Tom de Vries wrote: Hi Zdenek, I have a patch set for PR45098. 01_object-size-target.patch 02_pr45098-rtx-cost-set.patch 03_pr45098-computation-cost.patch 04_pr45098-iv-init-cost.patch 05_pr45098-bound-cost.patch 06_pr45098-bound-cost.test.patch 07_pr45098-nowrap-limits-iterations.patch 08_pr45098-nowrap-limits-iterations.test.patch 09_pr45098-shift-add-cost.patch 10_pr45098-shift-add-cost.test.patch I will send out the patches individually. OK for trunk? Thanks, - Tom Resubmitting with comment. The init cost of an iv will in general not be zero. It will be exceptional that the iv register happens to be initialized with the proper value at no cost. In general, there will at the very least be a regcopy or a const set. 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent cost_base.cost == 0. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 173380) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d base = cand->iv->base; cost_base = force_var_cost (data, base, NULL); + if (cost_base.cost == 0) + cost_base.cost = COSTS_N_INSNS (1); cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed); cost = cost_step + adjust_setup_cost (data, cost_base.cost);
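The effect of the hunk can be shown in isolation. COSTS_N_INSNS below uses the same scale factor as GCC's rtl.h (4 units per instruction), but iv_cost itself is only a stand-in for the relevant part of determine_iv_cost, not the real function:

```c
/* Same scale as gcc/rtl.h: one instruction costs 4 units.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Stand-in for the adjusted computation in determine_iv_cost: a base cost
   of zero is no longer trusted, because initializing the iv register takes
   at least a register copy or a constant set.  */
unsigned int
iv_cost (unsigned int cost_base, unsigned int cost_step)
{
  if (cost_base == 0)
    cost_base = COSTS_N_INSNS (1);
  return cost_base + cost_step;
}
```

With the clamp, a "free" init now contributes one instruction's worth of cost, so candidates no longer look artificially cheap.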
[PATCH PR45098, 6/10] Bound cost - test cases.
On 05/17/2011 09:19 AM, Tom de Vries wrote: On 05/17/2011 09:10 AM, Tom de Vries wrote: Hi Zdenek, I have a patch set for PR45098. 01_object-size-target.patch 02_pr45098-rtx-cost-set.patch 03_pr45098-computation-cost.patch 04_pr45098-iv-init-cost.patch 05_pr45098-bound-cost.patch 06_pr45098-bound-cost.test.patch 07_pr45098-nowrap-limits-iterations.patch 08_pr45098-nowrap-limits-iterations.test.patch 09_pr45098-shift-add-cost.patch 10_pr45098-shift-add-cost.test.patch I will send out the patches individually. OK for trunk? Thanks, - Tom This patch adds 2 new test cases. These need the preceding patches to pass. Index: gcc/testsuite/gcc.target/arm/ivopts-2.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts-2.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do assemble } */ +/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */ + +extern void foo2 (short*); + +void +tr4 (short array[], int n) +{ + int x; + if (n > 0) + for (x = 0; x < n; x++) + foo2 (&array[x]); +} + +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */ +/* { dg-final { object-size text <= 26 { target arm_thumb2_ok } } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.target/arm/ivopts.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do assemble } */ +/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */ + +void +tr5 (short array[], int n) +{ + int x; + if (n > 0) + for (x = 0; x < n; x++) + array[x] = 0; +} + +/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */ +/* { dg-final { object-size text <= 20 { target arm_thumb2_ok } } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */
[PATCH PR45098, 7/10] Nowrap limits iterations
On 05/17/2011 09:20 AM, Tom de Vries wrote: On 05/17/2011 09:10 AM, Tom de Vries wrote: Hi Zdenek, I have a patch set for PR45098. 01_object-size-target.patch 02_pr45098-rtx-cost-set.patch 03_pr45098-computation-cost.patch 04_pr45098-iv-init-cost.patch 05_pr45098-bound-cost.patch 06_pr45098-bound-cost.test.patch 07_pr45098-nowrap-limits-iterations.patch 08_pr45098-nowrap-limits-iterations.test.patch 09_pr45098-shift-add-cost.patch 10_pr45098-shift-add-cost.test.patch I will send out the patches individually. OK for trunk? Thanks, - Tom Resubmitting with comment. This patch attempts to estimate the number of iterations of the loop based on nonwrapping arithmetic in the loop body. 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields max_iterations_p and max_iterations. (is_nonwrap_use, max_loop_iterations, set_max_iterations): New functions. (may_eliminate_iv): Use max_iterations_p and max_iterations. (tree_ssa_iv_optimize_loop): Use set_max_iterations. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 173355) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -291,6 +291,12 @@ struct ivopts_data /* Whether the loop body includes any function calls. */ bool body_includes_call; + + /* Whether max_iterations is valid. */ + bool max_iterations_p; + + /* Maximum number of iterations of current_loop. */ + double_int max_iterations; }; /* An assignment of iv candidates to uses. */ @@ -4319,6 +4325,108 @@ iv_elimination_compare (struct ivopts_da return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR); } +/* Determine if USE contains non-wrapping arithmetic.
*/ + +static bool +is_nonwrap_use (struct ivopts_data *data, struct iv_use *use) +{ + gimple stmt = use->stmt; + tree var, ptr, ptr_type; + + if (!is_gimple_assign (stmt)) +return false; + + switch (gimple_assign_rhs_code (stmt)) +{ +case POINTER_PLUS_EXPR: + ptr = gimple_assign_rhs1 (stmt); + ptr_type = TREE_TYPE (ptr); + var = gimple_assign_rhs2 (stmt); + if (!expr_invariant_in_loop_p (data->current_loop, ptr)) +return false; + break; +case ARRAY_REF: + ptr = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 0); + ptr_type = build_pointer_type (TREE_TYPE (gimple_assign_rhs1 (stmt))); + var = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 1); + break; +default: + return false; +} + + if (!nowrap_type_p (ptr_type)) +return false; + + if (TYPE_PRECISION (ptr_type) != TYPE_PRECISION (TREE_TYPE (var))) +return false; + + return true; +} + +/* Attempt to infer maximum number of loop iterations of DATA->current_loop + from uses in loop containing non-wrapping arithmetic. If successful, + return true, and return maximum iterations in MAX_NITER. */ + +static bool +max_loop_iterations (struct ivopts_data *data, double_int *max_niter) +{ + struct iv_use *use; + struct iv *iv; + bool found = false; + double_int period; + gimple stmt; + unsigned i; + + for (i = 0; i < n_iv_uses (data); i++) +{ + use = iv_use (data, i); + + stmt = use->stmt; + if (!just_once_each_iteration_p (data->current_loop, gimple_bb (stmt))) + continue; + + if (!is_nonwrap_use (data, use)) +continue; + + iv = use->iv; + if (iv->step == NULL_TREE || TREE_CODE (iv->step) != INTEGER_CST) + continue; + period = tree_to_double_int (iv_period (iv)); + + if (found) +*max_niter = double_int_umin (*max_niter, period); + else +{ + found = true; + *max_niter = period; +} +} + + return found; +} + +/* Initializes DATA->max_iterations and DATA->max_iterations_p.
*/ + +static void +set_max_iterations (struct ivopts_data *data) +{ + double_int max_niter, max_niter2; + bool estimate1, estimate2; + + data->max_iterations_p = false; + estimate1 = estimated_loop_iterations (data->current_loop, true, &max_niter); + estimate2 = max_loop_iterations (data, &max_niter2); + if (!(estimate1 || estimate2)) +return; + if (estimate1 && estimate2) +data->max_iterations = double_int_umin (max_niter, max_niter2); + else if (estimate1) +data->max_iterations = max_niter; + else +data->max_iterations = max_niter2; + data->max_iterations_p = true; +} + /* Check whether it is possible to express the condition in USE by comparison of candidate CAND. If so, store the value compared with to BOUND. */ @@ -4391,10 +4499,10 @@ may_eliminate_iv (struct ivopts_data *da /* See if we can take advantage of inferred loop bound information. */ if (loop_only_exit_p (loop, exit)) { - if (!estimated_loop_iterations (loop, true, &max_niter)) + if (!data->max_iterations_p) return false; /*
Re: C++ PATCH for c++/48948 (rejecting constexpr friend that takes the current class)
On 05/11/2011 05:27 PM, Jason Merrill wrote: We want to allow a constexpr friend function that takes the current class, so we need to defer checking the literality of parameter types until any classes involved are complete. It was pointed out to me that the restriction already only applies to function definitions, not declarations, which dramatically simplifies the code. This patch reverts most of the previous one, and only checks return/parameter types at the point of definition. Tested x86_64-pc-linux-gnu, applying to trunk. commit f2a2c7b6af06123b5f81bd474b60bddfe9b58550 Author: Jason Merrill ja...@redhat.com Date: Mon May 16 17:21:17 2011 -0400 PR c++/48948 PR c++/49015 * class.c (finalize_literal_type_property): Do check for constexpr member functions of non-literal class. (finish_struct): Don't call check_deferred_constexpr_decls. * cp-tree.h: Don't declare it. (DECL_DEFERRED_CONSTEXPR_CHECK): Remove. * decl.c (grok_special_member_properties): Don't check it. (grokfndecl): Don't call validate_constexpr_fundecl. (start_preparsed_function): Do call it. * pt.c (tsubst_decl): Don't call it. (instantiate_class_template_1): Don't call check_deferred_constexpr_decls. * semantics.c (literal_type_p): Check for any incompleteness. (ensure_literal_type_for_constexpr_object): Likewise. (is_valid_constexpr_fn): Revert deferral changes. (validate_constexpr_fundecl): Likewise. (register_constexpr_fundef): Likewise. (check_deferred_constexpr_decls): Remove. diff --git a/gcc/cp/class.c b/gcc/cp/class.c index dc2c509..4e52b18 100644 --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -4582,6 +4582,8 @@ type_requires_array_cookie (tree type) static void finalize_literal_type_property (tree t) { + tree fn; + if (cxx_dialect < cxx0x || TYPE_HAS_NONTRIVIAL_DESTRUCTOR (t) /* FIXME These constraints seem unnecessary; remove from standard.
@@ -4591,6 +4593,18 @@ finalize_literal_type_property (tree t) else if (CLASSTYPE_LITERAL_P (t) && !TYPE_HAS_TRIVIAL_DFLT (t) && !TYPE_HAS_CONSTEXPR_CTOR (t)) CLASSTYPE_LITERAL_P (t) = false; + + if (!CLASSTYPE_LITERAL_P (t)) +for (fn = TYPE_METHODS (t); fn; fn = DECL_CHAIN (fn)) + if (DECL_DECLARED_CONSTEXPR_P (fn) + && TREE_CODE (fn) != TEMPLATE_DECL + && DECL_NONSTATIC_MEMBER_FUNCTION_P (fn) + && !DECL_CONSTRUCTOR_P (fn)) + { + DECL_DECLARED_CONSTEXPR_P (fn) = false; + if (!DECL_TEMPLATE_INFO (fn)) + error ("enclosing class of %q+#D is not a literal type", fn); + } } /* Check the validity of the bases and members declared in T. Add any @@ -5831,8 +5845,6 @@ finish_struct (tree t, tree attributes) else error ("trying to finish struct, but kicked out due to previous parse errors"); - check_deferred_constexpr_decls (); - if (processing_template_decl && at_function_scope_p ()) add_stmt (build_min (TAG_DEFN, t)); diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index c0b5290..dfb2b66 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -93,7 +93,6 @@ c-common.h, not after. TYPENAME_IS_RESOLVING_P (in TYPE_NAME_TYPE) LAMBDA_EXPR_DEDUCE_RETURN_TYPE_P (in LAMBDA_EXPR) TARGET_EXPR_DIRECT_INIT_P (in TARGET_EXPR) - DECL_DEFERRED_CONSTEXPR_CHECK (in FUNCTION_DECL) 3: (TREE_REFERENCE_EXPR) (in NON_LVALUE_EXPR) (commented-out). ICS_BAD_FLAG (in _CONV) FN_TRY_BLOCK_P (in TRY_BLOCK) @@ -2345,11 +2344,6 @@ struct GTY((variable_size)) lang_decl { #define DECL_DECLARED_CONSTEXPR_P(DECL) \ DECL_LANG_FLAG_8 (VAR_OR_FUNCTION_DECL_CHECK (STRIP_TEMPLATE (DECL))) -/* True if we can't tell yet whether the argument/return types of DECL - are literal because one is still being defined. */ -#define DECL_DEFERRED_CONSTEXPR_CHECK(DECL) \ - TREE_LANG_FLAG_2 (FUNCTION_DECL_CHECK (STRIP_TEMPLATE (DECL))) - /* Nonzero if this DECL is the __PRETTY_FUNCTION__ variable in a template function.
*/ #define DECL_PRETTY_FUNCTION_P(NODE) \ @@ -5337,7 +5331,6 @@ extern void finish_handler_parms (tree, tree); extern void finish_handler (tree); extern void finish_cleanup (tree, tree); extern bool literal_type_p (tree); -extern void check_deferred_constexpr_decls (void); extern tree validate_constexpr_fundecl (tree); extern tree register_constexpr_fundef (tree, tree); extern bool check_constexpr_ctor_body (tree, tree); diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 7939140..e950c43 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -7200,10 +7200,7 @@ grokfndecl (tree ctype, if (inlinep) DECL_DECLARED_INLINE_P (decl) = 1; if (inlinep & 2) -{ - DECL_DECLARED_CONSTEXPR_P (decl) = true; - validate_constexpr_fundecl (decl); -} +DECL_DECLARED_CONSTEXPR_P (decl) = true; DECL_EXTERNAL (decl) = 1; if (quals && TREE_CODE (type) == FUNCTION_TYPE) @@ -10681,9 +10678,6 @@ grok_special_member_properties (tree decl) TYPE_HAS_LIST_CTOR (class_type) = 1; if
Re: [PATCH][?/n] LTO type merging cleanup
We can end up with an infinite recursion as gimple_register_type tries to register TYPE_MAIN_VARIANT first. This is because we are being called from the LTO type-fixup code, which walks the type graph and adjusts types to their leaders. So we can be called for type SCCs that are only partially fixed up yet, which means TYPE_MAIN_VARIANT might temporarily not honor the invariant that the main variant of a main variant is itself. Thus, simply avoid recursing more than once - we are sure that we will be reaching at most type duplicates in further recursion. Bootstrap and regtest pending on x86_64-unknown-linux-gnu. With this function, the WPA stage passes with some improvements I reported to the mozilla metabug. We now get an ICE in ltrans: #0 gimple_register_type (t=0x0) at ../../gcc/gimple.c:4616 #1 0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) at ../../gcc/gimple.c:4890 #2 0x0048f14d in lto_ft_type (t=0x7fffe851f498) at ../../gcc/lto/lto.c:401 #3 lto_fixup_types (t=0x7fffe851f498) at ../../gcc/lto/lto.c:581 #4 0x0048f4a0 in uniquify_nodes (node=Unhandled dwarf expression opcode 0xf3 TYPE_MAIN_VARIANT is NULL.
(gdb) up #1 0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) at ../../gcc/gimple.c:4890 4890 t = gimple_register_type (TYPE_MAIN_VARIANT (t)); (gdb) p debug_generic_stmt (t) struct _ffi_type $1 = void (gdb) p debug_tree (t) record_type 0x7fffe851f498 _ffi_type BLK size integer_cst 0x77ecf680 type integer_type 0x77eca0a8 bit_size_type constant 192 unit size integer_cst 0x77ecf640 type integer_type 0x77eca000 constant 24 align 64 symtab 0 alias set -1 structural equality fields field_decl 0x7fffe87684c0 size type integer_type 0x77eca690 long unsigned int public unsigned DI size integer_cst 0x77ecf1e0 constant 64 unit size integer_cst 0x77ecf200 constant 8 align 64 symtab 0 alias set -1 canonical type 0x77eca690 precision 64 min integer_cst 0x77ecf220 0 max integer_cst 0x77ecf1c0 18446744073709551615 pointer_to_this pointer_type 0x75336150 reference_to_this reference_type 0x70aba000 used unsigned nonlocal DI file ctypes/libffi/include/ffi.h line 109 col 0 size integer_cst 0x77ecf1e0 64 unit size integer_cst 0x77ecf200 8 align 64 offset_align 128 offset integer_cst 0x77ebaf00 constant 0 bit offset integer_cst 0x77ecf420 constant 0 context record_type 0x7fffe851f2a0 _ffi_type chain field_decl 0x7fffe8768558 alignment type integer_type 0x77eca3f0 short unsigned int used unsigned nonlocal HI file ctypes/libffi/include/ffi.h line 110 col 0 size integer_cst 0x77ecf080 constant 16 unit size integer_cst 0x77ecf0a0 constant 2 align 16 offset_align 128 offset integer_cst 0x77ebaf00 0 bit offset integer_cst 0x77ecf1e0 64 context record_type 0x7fffe851f2a0 _ffi_type chain field_decl 0x7fffe87685f0 type chain type_decl 0x7fffe8966ac8 _ffi_type $2 = void Let me know if there is anything easy I could work out ;) I think the bug may be in the recursion guard. When you have cycle of length greater than 2 of MVs, you won't walk them all. Honza
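Honza's point - a guard that merely refuses to recurse once only covers main-variant cycles of length one or two - suggests checking the chain with a classic two-pointer walk instead. A sketch over a made-up node type (not GCC's tree representation):

```c
#include <stddef.h>

/* Hypothetical stand-in for a type node with its TYPE_MAIN_VARIANT link.  */
struct tnode
{
  struct tnode *main_variant;
};

/* Floyd's tortoise-and-hare: detects a cycle of any length in the
   main-variant chain, whereas recursing "at most once" only notices
   cycles of length one or two.  */
int
main_variant_chain_has_cycle (struct tnode *t)
{
  struct tnode *slow = t, *fast = t;
  while (fast != NULL && fast->main_variant != NULL)
    {
      slow = slow->main_variant;
      fast = fast->main_variant->main_variant;
      if (slow == fast)
	return 1;
    }
  return 0;
}

/* Tiny self-check: a three-node cycle is caught, a NULL-terminated
   chain is not.  */
int
main_variant_selftest (void)
{
  static struct tnode a, b, c, d, e;
  a.main_variant = &b;
  b.main_variant = &c;
  c.main_variant = &a;		/* cycle of length 3 */
  d.main_variant = &e;
  e.main_variant = NULL;	/* ordinary chain */
  return main_variant_chain_has_cycle (&a)
	 && !main_variant_chain_has_cycle (&d);
}
```

The point of the sketch is only that a depth-one recursion guard and a real cycle check answer different questions once cycles longer than two nodes are possible.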
[PATCH PR45098, 8/10] Nowrap limits iterations - test cases.
On 05/17/2011 09:21 AM, Tom de Vries wrote: On 05/17/2011 09:10 AM, Tom de Vries wrote: Hi Zdenek, I have a patch set for PR45098. 01_object-size-target.patch 02_pr45098-rtx-cost-set.patch 03_pr45098-computation-cost.patch 04_pr45098-iv-init-cost.patch 05_pr45098-bound-cost.patch 06_pr45098-bound-cost.test.patch 07_pr45098-nowrap-limits-iterations.patch 08_pr45098-nowrap-limits-iterations.test.patch 09_pr45098-shift-add-cost.patch 10_pr45098-shift-add-cost.test.patch I will send out the patches individually. OK for trunk? Thanks, - Tom Resubmitting with comment. This patch introduces 3 new testcases, and modifies an existing test case. The 3 new testcases need the preceding patches to pass. The modified test case is ivopt_infer_2.c. #ifndef TYPE #define TYPE char* #endif extern int a[]; /* Can not infer loop iteration from array -- exit test can not be replaced. */ void foo (int i_width, TYPE dst, TYPE src1, TYPE src2) { TYPE dstn = dst + i_width; TYPE dst0 = dst; unsigned long long i = 0; for ( ; dst <= dstn; ) { dst0[i] = (src1[i] + src2[i] + 1 + a[i]) >> 1; dst++; i += 16; } } The estimates in set_max_iterations for this testcase are: (gdb) p /x max_niter $3 = {low = 0x0, high = 0x1} (gdb) p /x max_niter2 $4 = {low = 0x3ff, high = 0x0} The second estimate is based on a[i], which contains the non-wrapping pointer arithmetic a + i. Var i is incremented by 16 each iteration, and a is an int pointer, which explains the factor 64 difference between the two. 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * gcc.target/arm/ivopts-3.c: New test. * gcc.target/arm/ivopts-4.c: New test. * gcc.target/arm/ivopts-5.c: New test. * gcc.dg/tree-ssa/ivopt_infer_2.c: Adapt test.
Index: gcc/testsuite/gcc.target/arm/ivopts-3.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts-3.c (revision 0) @@ -0,0 +1,20 @@ +/* { dg-do assemble } */ +/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */ + +extern unsigned int foo2 (short*) __attribute__((pure)); + +unsigned int +tr3 (short array[], unsigned int n) +{ + unsigned sum = 0; + unsigned int x; + for (x = 0; x < n; x++) +sum += foo2 (&array[x]); + return sum; +} + +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */ +/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.target/arm/ivopts-4.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts-4.c (revision 0) @@ -0,0 +1,21 @@ +/* { dg-do assemble } */ +/* { dg-options "-mthumb -Os -fdump-tree-ivopts -save-temps" } */ + +extern unsigned int foo (int*) __attribute__((pure)); + +unsigned int +tr2 (int array[], int n) +{ + unsigned int sum = 0; + int x; + if (n > 0) +for (x = 0; x < n; x++) + sum += foo (&array[x]); + return sum; +} + +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */ +/* { dg-final { object-size text <= 36 { target arm_thumb2_ok } } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.target/arm/ivopts-5.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.target/arm/ivopts-5.c (revision 0) @@ -0,0 +1,20 @@ +/* { dg-do assemble } */ +/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */ + +extern unsigned int foo (int*) __attribute__((pure)); + +unsigned int +tr1 (int array[], unsigned int n) +{ + unsigned int sum = 0; + unsigned int x; + for (x = 0; x < n; x++) +sum += foo (&array[x]); + return sum; +} + +/* { dg-final { 
scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "PHI <x" 0 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times ", x" 0 "ivopts"} } */ +/* { dg-final { object-size text <= 30 { target arm_thumb2_ok } } } */ +/* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c === --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (revision 173380) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (working copy) @@ -7,7 +7,8 @@ extern int a[]; -/* Can not infer loop iteration from array -- exit test can not be replaced. */ +/* Can infer loop iteration from nonwrapping pointer arithmetic. + exit test can be replaced. */ void foo (int i_width, TYPE dst, TYPE src1, TYPE src2) { TYPE dstn= dst + i_width; @@ -21,5 +22,5 @@ void foo (int i_width, TYPE dst, TYPE sr } } -/* { dg-final {
[PATCH PR45098, 9/10] Cheap shift-add.
On 05/17/2011 09:21 AM, Tom de Vries wrote: On 05/17/2011 09:10 AM, Tom de Vries wrote: Hi Zdenek, I have a patch set for PR45098. 01_object-size-target.patch 02_pr45098-rtx-cost-set.patch 03_pr45098-computation-cost.patch 04_pr45098-iv-init-cost.patch 05_pr45098-bound-cost.patch 06_pr45098-bound-cost.test.patch 07_pr45098-nowrap-limits-iterations.patch 08_pr45098-nowrap-limits-iterations.test.patch 09_pr45098-shift-add-cost.patch 10_pr45098-shift-add-cost.test.patch I will send out the patches individually. OK for trunk? Thanks, - Tom Resubmitting with comment. ARM has cheap shift-add instructions. Take that into account in force_expr_to_var_cost. 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * tree-ssa-loop-ivopts.c: Include expmed.h. (get_shiftadd_cost): New function. (force_expr_to_var_cost): Use get_shiftadd_cost. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 173380) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -92,6 +92,12 @@ along with GCC; see the file COPYING3. #include "tree-inline.h" #include "tree-ssa-propagate.h" +/* FIXME: add_cost and zero_cost defined in expmed.h conflict with local uses. + */ +#include "expmed.h" +#undef add_cost +#undef zero_cost + /* FIXME: Expressions are expanded to RTL in this pass to determine the cost of different addressing modes. This should be moved to a TBD interface between the GIMPLE and RTL worlds. */ @@ -3504,6 +3510,37 @@ get_address_cost (bool symbol_present, b return new_cost (cost + acost, complexity); } + /* Calculate the SPEED or size cost of shiftadd EXPR in MODE. MULT is the +the EXPR operand holding the shift. COST0 and COST1 are the costs for +calculating the operands of EXPR. Returns true if successful, and returns +the cost in COST. 
*/ + +static bool +get_shiftadd_cost (tree expr, enum machine_mode mode, comp_cost cost0, + comp_cost cost1, tree mult, bool speed, comp_cost *cost) +{ + comp_cost res; + tree op1 = TREE_OPERAND (expr, 1); + tree cst = TREE_OPERAND (mult, 1); + int m = exact_log2 (int_cst_value (cst)); + int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode)); + int sa_cost; + + if (!(m >= 0 && m < maxm)) +return false; + + sa_cost = (TREE_CODE (expr) != MINUS_EXPR + ? shiftadd_cost[speed][mode][m] + : (mult == op1 +? shiftsub1_cost[speed][mode][m] +: shiftsub0_cost[speed][mode][m])); + res = new_cost (sa_cost, 0); + res = add_costs (res, mult == op1 ? cost0 : cost1); + + *cost = res; + return true; +} + /* Estimates cost of forcing expression EXPR into a variable. */ static comp_cost @@ -3629,6 +3666,21 @@ force_expr_to_var_cost (tree expr, bool case MINUS_EXPR: case NEGATE_EXPR: cost = new_cost (add_cost (mode, speed), 0); + if (TREE_CODE (expr) != NEGATE_EXPR) +{ + tree mult = NULL_TREE; + comp_cost sa_cost; + if (TREE_CODE (op1) == MULT_EXPR) +mult = op1; + else if (TREE_CODE (op0) == MULT_EXPR) +mult = op0; + + if (mult != NULL_TREE + && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST + && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed, +&sa_cost)) +return sa_cost; +} break; case MULT_EXPR:
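The power-of-two test at the top of get_shiftadd_cost (`m = exact_log2 (...)` followed by the range check on m) can be illustrated standalone; the helper below is my own sketch of the contract exact_log2 provides, not GCC's implementation:

```c
#include <assert.h>

/* Sketch of exact_log2's contract: return m when x == 2^m exactly,
   and -1 otherwise (including for 0).  A shift-add strategy only
   exists for such constants, which is why the cost routine above
   bails out when the result is negative or out of range.  */
static int
exact_log2_ull (unsigned long long x)
{
  int m = 0;
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;                  /* zero or not a power of two */
  while ((x >>= 1) != 0)
    m++;
  return m;
}
```

So a multiply by 8 maps to m == 3, i.e. one shift-by-3 folded into the add, while a multiply by 6 gets no shift-add costing at all.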
[PATCH, i386]: Trivial, split long asm templates in TLS patterns
Hello! 2011-05-18 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*tls_global_dynamic_32_gnu): Split asm template. (*tls_global_dynamic_64): Ditto. (*tls_local_dynamic_base_32_gnu): Ditto. (*tls_local_dynamic_base_64): Ditto. (tls_initial_exec_64_sun): Ditto. No functional changes. Patch was tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: i386.md === --- i386.md (revision 173864) +++ i386.md (working copy) @@ -12364,7 +12364,11 @@ (clobber (match_scratch:SI 5 "=c")) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_GNU_TLS" - "lea{l}\t{%a2@tlsgd(,%1,1), %0|%0, %a2@tlsgd[%1*1]}\;call\t%P3" +{ + output_asm_insn +("lea{l}\t{%a2@tlsgd(,%1,1), %0|%0, %a2@tlsgd[%1*1]}", operands); + return "call\t%P3"; +} [(set_attr "type" "multi") (set_attr "length" "12")]) @@ -12387,7 +12391,14 @@ (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")] UNSPEC_TLS_GD)] "TARGET_64BIT" - { return ASM_BYTE "0x66\n\tlea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}\n" ASM_SHORT "0x6666\n\trex64\n\tcall\t%P2"; } +{ + fputs (ASM_BYTE "0x66\n", asm_out_file); + output_asm_insn +("lea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}", operands); + fputs (ASM_SHORT "0x6666\n", asm_out_file); + fputs ("\trex64\n", asm_out_file); + return "call\t%P2"; +} [(set_attr "type" "multi") (set_attr "length" "16")]) @@ -12410,7 +12421,11 @@ (clobber (match_scratch:SI 4 "=c")) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_GNU_TLS" - "lea{l}\t{%&@tlsldm(%1), %0|%0, %&@tlsldm[%1]}\;call\t%P2" +{ + output_asm_insn +("lea{l}\t{%&@tlsldm(%1), %0|%0, %&@tlsldm[%1]}", operands); + return "call\t%P2"; +} [(set_attr "type" "multi") (set_attr "length" "11")]) @@ -12432,7 +12447,11 @@ (match_operand:DI 2 "" ""))) (unspec:DI [(const_int 0)] UNSPEC_TLS_LD_BASE)] "TARGET_64BIT" - "lea{q}\t{%&@tlsld(%%rip), %%rdi|rdi, %&@tlsld[rip]}\;call\t%P1" +{ + output_asm_insn +("lea{q}\t{%&@tlsld(%%rip), %%rdi|rdi, %&@tlsld[rip]}", operands); + return "call\t%P1"; +} [(set_attr "type" "multi") (set_attr "length" "12")]) @@ -12507,7 +12526,11 @@ UNSPEC_TLS_IE_SUN)) (clobber 
(reg:CC FLAGS_REG))] "TARGET_64BIT && TARGET_SUN_TLS" - "mov{q}\t{%%fs:0, %0|%0, QWORD PTR fs:0}\n\tadd{q}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}" +{ + output_asm_insn +("mov{q}\t{%%fs:0, %0|%0, QWORD PTR fs:0}", operands); + return "add{q}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"; +} [(set_attr "type" "multi")]) ;; GNU2 TLS patterns can be split.
Re: [PATCH,c++] describe reasons for function template overload resolution failure
Thanks for the background; I will keep the principle in mind. IMHO, in a case like this where we're logically printing one diagnostic (one error and then some number of explanatory notes) keeping all the logic for the diagnostic centralized makes more sense. I understand, but that means we have to create a whole data structure to try and preserve information about the failure, and either having to duplicate every possible error or give less informative messages. I feel even more strongly about this after looking more closely at your patch. +case ur_invalid: + inform (loc, + "template argument deduction attempted with invalid input"); + break; In ur_invalid cases, we should have had an earlier error message already, so giving an extra message here seems kind of redundant. + "types %qT and %qT differ in their qualifiers", Let's say "...have incompatible cv-qualifiers", since some differences are OK. + inform (loc, "variable-sized array type %qT is not permitted", "...is not a valid template argument" + inform (loc, "%qT is not derived from %qT", This could be misleading, since we can also fail when the deduction is ambiguous. + inform (loc, "%qE is not a valid pointer-to-member of type %qT", This needs to say "pointer-to-member constant", not just "pointer-to-member". +case ur_parameter_deduction_failure: + inform (loc, "couldn't deduce template argument %qD", ui->u.parm); + break; It seems like you're using this both for cases where unification succeeded but just didn't produce template arguments for all parameters, and for cases where unification failed for some reason; this message should only apply to the first case. 
if (TREE_PURPOSE (TREE_VEC_ELT (tparms, i))) { tree parm = TREE_VALUE (TREE_VEC_ELT (tparms, i)); tree arg = TREE_PURPOSE (TREE_VEC_ELT (tparms, i)); arg = tsubst_template_arg (arg, targs, tf_none, NULL_TREE); arg = convert_template_argument (parm, arg, targs, tf_none, i, NULL_TREE, ui); if (arg == error_mark_node) return unify_parameter_deduction_failure (ui, parm); In this case, the problem is that we tried to use the default template argument but it didn't work for some reason; we should say that, not just say we didn't deduce something, or the users will say "but there's a default argument!". In this case, we should do the substitution again with tf_warning_or_error so the user can see what the problem actually is, not just say that there was some unspecified problem. - return 2; + return unify_parameter_deduction_failure (ui, tparm); This seems like the only place we actually want to use unify_parameter_deduction_failure. /* Check for mixed types and values. */ if ((TREE_CODE (parm) == TEMPLATE_TYPE_PARM && TREE_CODE (tparm) != TYPE_DECL) || (TREE_CODE (parm) == TEMPLATE_TEMPLATE_PARM && TREE_CODE (tparm) != TEMPLATE_DECL)) return unify_parameter_deduction_failure (ui, parm); This is a type/template mismatch issue that deserves a more helpful diagnostic. /* ARG must be constructed from a template class or a template template parameter. */ if (TREE_CODE (arg) != BOUND_TEMPLATE_TEMPLATE_PARM && !CLASSTYPE_SPECIALIZATION_OF_PRIMARY_TEMPLATE_P (arg)) return unify_parameter_deduction_failure (ui, parm); This is saying that we can't deduce a template from a non-template type. /* If the argument deduction results is a METHOD_TYPE, then there is a problem. METHOD_TYPE doesn't map to any real C++ type and the result of the deduction can not be of that type. */ if (TREE_CODE (arg) == METHOD_TYPE) return unify_parameter_deduction_failure (ui, parm); Like with the VLA case, the problem here is deducing something that isn't a valid template type argument. 
/* We haven't deduced the type of this parameter yet. Try again later. */ return unify_success (ui); else return unify_parameter_deduction_failure (ui, parm); Here the problem is a type mismatch between parm and arg for a non-type template argument. /* Perhaps PARM is something like S<U> and ARG is S<int>. Then, we should unify `int' and `U'. */ t = arg; else /* There's no chance of unification succeeding. */ return unify_parameter_deduction_failure (ui, parm); This should be type_mismatch. case FIELD_DECL: case TEMPLATE_DECL: /* Matched cases are handled by the ARG == PARM test above. */ return unify_parameter_deduction_failure (ui, parm); Another case where we should talk about the arg/parm mismatch. + case rr_invalid_copy: + inform
Re: Libiberty: POSIXify psignal definition
On Tue, 2011-05-17 at 12:48 -0400, DJ Delorie wrote: What I don't understand is why the newlib change broke older compilers. Older compilers have the older libiberty. At the moment, libiberty cannot be built by *any* released gcc, because you cannot *build* any released gcc, because it cannot build its target libiberty. And the problem is that libiberty is assuming that it *knows* what functions newlib provides, so that it doesn't need to check directly. This is just broken... # If we are being configured for newlib, we know which functions # newlib provide and which ones we will be expected to provide.
Re: [google] Increase inlining limits with FDO/LIPO
Though not common, people can do this: 1. for profile gen: gcc -fprofile-arcs ... 2. for profile use gcc -fbranch-probabilities ... The new change won't help those. Your original place will be ok if you test profile_arcs and branch_probability flags. David On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote: On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com wrote: To make consistent inline decisions between profile-gen and profile-use, probably better to check these two: flag_profile_arcs and flag_branch_probabilities. -fprofile-use enables profile-arcs, and value profiling is enabled only when edge/branch profiling is enabled (so no need to be checked). I changed the location where these parameters are set to someplace more appropriate (to where the flags are set when profile gen/use is indicated). Verified identical binaries are generated. OK as updated? Mark 2011-05-18 Mark Heffernan meh...@google.com * opts.c (set_profile_parameters): New function. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ opts->x_help_columns, opts, lang_mask); } + +/* Set parameters to more appropriate values when profile information + is available. */ +static void +set_profile_parameters (struct gcc_options *opts, + struct gcc_options *opts_set) +{ + /* With accurate profile information, inlining is much more + selective and makes better decisions, so increase the + inlining function size limits. */ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, + opts->x_param_values, opts_set->x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, + opts->x_param_values, opts_set->x_param_values); +} + + /* Handle target- and language-independent options. Return zero to generate an unknown option message. 
Only options that need extra handling need to be listed here; if you simply want @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options opts->x_flag_unswitch_loops = value; if (!opts_set->x_flag_gcse_after_reload) opts->x_flag_gcse_after_reload = value; + set_profile_parameters (opts, opts_set); break; case OPT_fprofile_generate_: @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options is done. */ if (!opts_set->x_flag_ipa_reference && in_lto_p) opts->x_flag_ipa_reference = false; + set_profile_parameters (opts, opts_set); break; case OPT_fshow_column:
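The reason maybe_set_param_value (rather than an unconditional set) is used in the patch is so that an explicit --param on the command line still wins over the profile-feedback default. The pattern can be sketched like this (the struct and names are mine, not GCC's; GCC keeps the equivalent state in opts->x_param_values and opts_set->x_param_values):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the "maybe set" pattern: bump a tuning knob to a new
   default only when the user has not already set it explicitly.
   All names here are invented for illustration.  */
struct param
{
  int value;
  bool explicitly_set;
};

static void
maybe_set_param (struct param *p, int new_value)
{
  if (!p->explicitly_set)
    p->value = new_value;
}

/* One knob set explicitly by the user, one left at its default of
   400; only the latter is bumped to 1000.  */
static int
demo_maybe_set (void)
{
  struct param user_set = { 400, true };
  struct param defaulted = { 400, false };
  maybe_set_param (&user_set, 1000);
  maybe_set_param (&defaulted, 1000);
  return user_set.value * 10000 + defaulted.value;
}
```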
Re: Libiberty: POSIXify psignal definition
And the problem is that libiberty is assuming that it *knows* what functions newlib provides, so that it doesn't need to check directly. This is just broken... Historically, cygwin was built using libiberty and newlib, so you did not have a runtime at the time you were building libiberty, because you hadn't built newlib yet. In a combined tree, target-libiberty is still built before target-newlib, so the problem exists there too. At this point, though, I'm tempted to say there's no such thing as a target libiberty and rip all the target-libiberty rules out, and let newlib-hosted targets autodetect the host-libiberty. That is, if Cygwin doesn't need a target-libiberty any more?
[v3] Update bitset (and a few more bits elsewhere) for noexcept
Hi, tested x86_64-linux, committed. Thanks, Paolo. // 2011-05-18 Paolo Carlini paolo.carl...@oracle.com * libsupc++/initializer_list: Use noexcept specifier. (initializer_list::size, begin, end): Qualify as const. * include/bits/move.h (__addressof, forward, move, addressof): Specify as noexcept. * include/std/bitset: Use noexcept specifier throughout. * include/debug/bitset: Update. * include/profile/bitset: Likewise. Index: include/debug/bitset === --- include/debug/bitset (revision 173870) +++ include/debug/bitset (working copy) @@ -1,6 +1,6 @@ // Debugging bitset implementation -*- C++ -*- -// Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 +// Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 // Free Software Foundation, Inc. // // This file is part of the GNU ISO C++ Library. This library is free @@ -66,19 +66,19 @@ reference(); reference(const _Base_ref& __base, - bitset* __seq __attribute__((__unused__))) + bitset* __seq __attribute__((__unused__))) _GLIBCXX_NOEXCEPT : _Base_ref(__base) , _Safe_iterator_base(__seq, false) { } public: - reference(const reference& __x) + reference(const reference& __x) _GLIBCXX_NOEXCEPT : _Base_ref(__x) , _Safe_iterator_base(__x, false) { } reference& - operator=(bool __x) + operator=(bool __x) _GLIBCXX_NOEXCEPT { _GLIBCXX_DEBUG_VERIFY(! this->_M_singular(), _M_message(__gnu_debug::__msg_bad_bitset_write) @@ -88,7 +88,7 @@ } reference& - operator=(const reference& __x) + operator=(const reference& __x) _GLIBCXX_NOEXCEPT { _GLIBCXX_DEBUG_VERIFY(! __x._M_singular(), _M_message(__gnu_debug::__msg_bad_bitset_read) @@ -101,7 +101,7 @@ } bool - operator~() const + operator~() const _GLIBCXX_NOEXCEPT { _GLIBCXX_DEBUG_VERIFY(! this->_M_singular(), _M_message(__gnu_debug::__msg_bad_bitset_read) @@ -109,7 +109,7 @@ return ~(*static_cast<const _Base_ref*>(this)); } - operator bool() const + operator bool() const _GLIBCXX_NOEXCEPT { _GLIBCXX_DEBUG_VERIFY(! 
this->_M_singular(), _M_message(__gnu_debug::__msg_bad_bitset_read) @@ -118,7 +118,7 @@ } reference& - flip() + flip() _GLIBCXX_NOEXCEPT { _GLIBCXX_DEBUG_VERIFY(! this->_M_singular(), _M_message(__gnu_debug::__msg_bad_bitset_flip) @@ -130,10 +130,11 @@ #endif // 23.3.5.1 constructors: - _GLIBCXX_CONSTEXPR bitset() : _Base() { } + _GLIBCXX_CONSTEXPR bitset() _GLIBCXX_NOEXCEPT + : _Base() { } #ifdef __GXX_EXPERIMENTAL_CXX0X__ - constexpr bitset(unsigned long long __val) + constexpr bitset(unsigned long long __val) noexcept #else bitset(unsigned long __val) #endif @@ -173,42 +174,42 @@ // 23.3.5.2 bitset operations: bitset<_Nb>& - operator&=(const bitset<_Nb>& __rhs) + operator&=(const bitset<_Nb>& __rhs) _GLIBCXX_NOEXCEPT { _M_base() &= __rhs; return *this; } bitset<_Nb>& - operator|=(const bitset<_Nb>& __rhs) + operator|=(const bitset<_Nb>& __rhs) _GLIBCXX_NOEXCEPT { _M_base() |= __rhs; return *this; } bitset<_Nb>& - operator^=(const bitset<_Nb>& __rhs) + operator^=(const bitset<_Nb>& __rhs) _GLIBCXX_NOEXCEPT { _M_base() ^= __rhs; return *this; } bitset<_Nb>& - operator<<=(size_t __pos) + operator<<=(size_t __pos) _GLIBCXX_NOEXCEPT { _M_base() <<= __pos; return *this; } bitset<_Nb>& - operator>>=(size_t __pos) + operator>>=(size_t __pos) _GLIBCXX_NOEXCEPT { _M_base() >>= __pos; return *this; } bitset<_Nb>& - set() + set() _GLIBCXX_NOEXCEPT { _Base::set(); return *this; @@ -224,7 +225,7 @@ } bitset<_Nb>& - reset() + reset() _GLIBCXX_NOEXCEPT { _Base::reset(); return *this; @@ -237,10 +238,12 @@ return *this; } - bitset<_Nb> operator~() const { return bitset(~_M_base()); } + bitset<_Nb> + operator~() const _GLIBCXX_NOEXCEPT + { return bitset(~_M_base()); } bitset<_Nb>& - flip() + flip() _GLIBCXX_NOEXCEPT { _Base::flip(); return *this; @@ -346,11 +349,11 @@ using _Base::size; bool - operator==(const bitset<_Nb>& __rhs) const + operator==(const bitset<_Nb>& __rhs) const _GLIBCXX_NOEXCEPT { return _M_base() == __rhs; } bool
Re: Libiberty: POSIXify psignal definition
On May 18 14:03, DJ Delorie wrote: And the problem is that libiberty is assuming that it *knows* what functions newlib provides, so that it doesn't need to check directly. This is just broken... Historically, cygwin was built using libiberty and newlib, so you did not have a runtime at the time you were building libiberty, because you hadn't built newlib yet. In a combined tree, target-libiberty is still built before target-newlib, so the problem exists there too. At this point, though, I'm tempted to say there's no such thing as a target libiberty and rip all the target-libiberty rules out, and let newlib-hosted targets autodetect the host-libiberty. That is, if Cygwin doesn't need a target-libiberty any more? Cygwin doesn't need libiberty anymore since 2007. Corinna -- Corinna Vinschen Cygwin Project Co-Leader Red Hat
Re: [PATCH, PR45098, 3/10]
Hi Zdenek, On 05/18/2011 05:24 PM, Zdenek Dvorak wrote: Hi, How about: ... @@ -2866,6 +2878,8 @@ computation_cost (tree expr, bool speed) if (MEM_P (rslt)) cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type), TYPE_ADDR_SPACE (type), speed); + else if (!REG_P (rslt)) +cost += (unsigned)rtx_cost (rslt, SET, speed); return cost; } ... ? this looks ok to me thanks for the review. (the cast to unsigned is not necessary, though?) You're right, it's not, that was only necessary to prevent a warning in the conditional expression originally proposed. Checked in without cast. Thanks, - Tom
Re: [patch gimplifier]: Change TRUTH_(AND|OR|XOR) expressions to binary form
2011/5/18 Kai Tietz ktiet...@googlemail.com: Hello As follow-up for the logical-to-binary transition 2011-05-18 Kai Tietz kti...@redhat.com * tree-cfg.c (verify_gimple_assign_binary): Barf on TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR. (gimplify_expr): Boolify TRUTH_ANDIF_EXPR, TRUTH_ORIF_EXPR, TRUTH_AND_EXPR, TRUTH_OR_EXPR, and TRUTH_XOR_EXPR. Additionally move TRUTH_AND|OR|XOR_EXPR to its binary form. Bootstrapped for x86_64-pc-linux-gnu and regression tested for ada, fortran, g++, and c. Ok for apply? Additionally bootstrapped and regression tested for java, obj-c, and obj-c++. Also regression tested libstdc++ and libjava. No regressions. Regards, Kai
Re: [google] Increase inlining limits with FDO/LIPO
On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com wrote: The new change won't help those. Your original place will be ok if you test profile_arcs and branch_probability flags. Ah, yes. I see your point now. Reverted to the original change with condition profile_arc_flag and flag_branch_probabilities. Mark David On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote: On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com wrote: To make consistent inline decisions between profile-gen and profile-use, probably better to check these two: flag_profile_arcs and flag_branch_probabilities. -fprofile-use enables profile-arcs, and value profiling is enabled only when edge/branch profiling is enabled (so no need to be checked). I changed the location where these parameters are set to someplace more appropriate (to where the flags are set when profile gen/use is indicated). Verified identical binaries are generated. OK as updated? Mark 2011-05-18 Mark Heffernan meh...@google.com * opts.c (set_profile_parameters): New function. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ opts->x_help_columns, opts, lang_mask); } + +/* Set parameters to more appropriate values when profile information + is available. */ +static void +set_profile_parameters (struct gcc_options *opts, + struct gcc_options *opts_set) +{ + /* With accurate profile information, inlining is much more + selective and makes better decisions, so increase the + inlining function size limits. */ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, + opts->x_param_values, opts_set->x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, + opts->x_param_values, opts_set->x_param_values); +} + + /* Handle target- and language-independent options. Return zero to generate an unknown option message. 
Only options that need extra handling need to be listed here; if you simply want @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options opts->x_flag_unswitch_loops = value; if (!opts_set->x_flag_gcse_after_reload) opts->x_flag_gcse_after_reload = value; + set_profile_parameters (opts, opts_set); break; case OPT_fprofile_generate_: @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options is done. */ if (!opts_set->x_flag_ipa_reference && in_lto_p) opts->x_flag_ipa_reference = false; + set_profile_parameters (opts, opts_set); break; case OPT_fshow_column:
New options to disable/enable any pass for any functions (issue4550056)
In gcc, not all passes have user level control to turn it on/off, and there is no way to flip on/off the pass for a subset of functions. I implemented a generic option handling scheme in gcc to allow disabling/enabling any gcc pass for any specified function(s). The new options will be very useful for things like performance experiments and bug triaging (gcc has the dbgcnt mechanism, but not all passes have the counter). The option syntax is very similar to the -fdump- options. The following are some examples: -fdisable-tree-ccp1 --- disable ccp1 for all functions -fenable-tree-cunroll=1 --- enable complete unroll for the function whose cgraphnode uid is 1 -fdisable-rtl-gcse2=1:100,300,400:1000 --- disable gcse2 for functions at the following ranges [1,1], [300,400], and [400,1000] -fdisable-tree-einline --- disable early inlining for all callers -fdisable-ipa-inline --- disable ipa inlining In the gcc dumps, the uid numbers are displayed in the function header. The options are intended to be used internally by gcc developers. Ok for trunk? (There is a little LIPO specific change that can be removed). David 2011-05-18 David Li davi...@google.com * final.c (rest_of_clean_state): Call function header dumper. * opts-global.c (handle_common_deferred_options): Handle new options. * tree-cfg.c (gimple_dump_cfg): Call function header dumper. * passes.c (register_one_dump_file): Call register_pass_name. (pass_init_dump_file): Call function header dumper. (execute_one_pass): Check explicit enable/disable flag. (passr_hash): New function. (passr_eq): (register_pass_name): (get_pass_by_name): (pass_hash): (pass_eq): (enable_disable_pass): (is_pass_explicitly_enabled_or_disabled): (is_pass_explicitly_enabled): (is_pass_explicitly_disabled): Index: tree-pass.h === --- tree-pass.h (revision 173635) +++ tree-pass.h (working copy) @@ -644,4 +644,12 @@ extern bool first_pass_instance; /* Declare for plugins. 
*/ extern void do_per_function_toporder (void (*) (void *), void *); +extern void enable_disable_pass (const char *, bool); +extern bool is_pass_explicitly_disabled (struct opt_pass *, tree); +extern bool is_pass_explicitly_enabled (struct opt_pass *, tree); +extern void register_pass_name (struct opt_pass *, const char *); +extern struct opt_pass *get_pass_by_name (const char *); +struct function; +extern void pass_dump_function_header (FILE *, tree, struct function *); + #endif /* GCC_TREE_PASS_H */ Index: final.c === --- final.c (revision 173635) +++ final.c (working copy) @@ -4456,19 +4456,7 @@ rest_of_clean_state (void) } else { - const char *aname; - struct cgraph_node *node = cgraph_node (current_function_decl); - - aname = (IDENTIFIER_POINTER - (DECL_ASSEMBLER_NAME (current_function_decl))); - fprintf (final_output, "\n;; Function (%s) %s\n\n", aname, -node->frequency == NODE_FREQUENCY_HOT -? " (hot)" -: node->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED -? " (unlikely executed)" -: node->frequency == NODE_FREQUENCY_EXECUTED_ONCE -? 
" (executed once)" -: ""); + pass_dump_function_header (final_output, current_function_decl, cfun); flag_dump_noaddr = flag_dump_unnumbered = 1; if (flag_compare_debug_opt || flag_compare_debug) Index: common.opt === --- common.opt (revision 173635) +++ common.opt (working copy) @@ -1018,6 +1018,14 @@ fdiagnostics-show-option Common Var(flag_diagnostics_show_option) Init(1) Amend appropriate diagnostic messages with the command line option that controls them +fdisable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fdisable-[tree|rtl|ipa]-<pass>=range1+range2 disables an optimization pass + +fenable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fenable-[tree|rtl|ipa]-<pass>=range1+range2 enables an optimization pass + fdump- Common Joined RejectNegative Var(common_deferred_options) Defer -fdump-<type> Dump various compiler internals to a file Index: opts-global.c === --- opts-global.c (revision 173635) +++ opts-global.c (working copy) @@ -411,6 +411,12 @@ handle_common_deferred_options (void) error ("unrecognized command line option %<-fdump-%s%>", opt->arg); break; + case OPT_fenable_: + case OPT_fdisable_: + enable_disable_pass (opt->arg, (opt->opt_index == OPT_fenable_? +
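The range syntax from the examples earlier in this message (e.g. `1:100,300,400:1000`) is easy to model. Here is a small standalone sketch of mine (the function name and parsing details are not the patch's code) of checking whether a function uid falls inside such a comma-separated list of single uids and `low:high` ranges:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative parser, not the patch's code: SPEC is a list such as
   "1:100,300,400:1000" mixing single uids and low:high ranges;
   return true if UID falls inside any entry.  */
static bool
uid_in_range_list (const char *spec, unsigned long uid)
{
  const char *p = spec;
  while (*p)
    {
      char *end;
      unsigned long lo = strtoul (p, &end, 10);
      unsigned long hi = lo;          /* a bare uid is a 1-element range */
      if (*end == ':')
        hi = strtoul (end + 1, &end, 10);
      if (uid >= lo && uid <= hi)
        return true;
      if (*end != ',')
        break;                        /* end of list (or malformed tail) */
      p = end + 1;
    }
  return false;
}
```

Under this reading, `1:100,300,400:1000` covers uids 1-100, uid 300, and uids 400-1000.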
Re: [google] Increase inlining limits with FDO/LIPO
Ok with that change to google/main with some retesting. David On Wed, May 18, 2011 at 11:34 AM, Mark Heffernan meh...@google.com wrote: On Wed, May 18, 2011 at 10:52 AM, Xinliang David Li davi...@google.com wrote: The new change won't help those. Your original place will be ok if you test profile_arcs and branch_probability flags. Ah, yes. I see your point now. Reverted to the original change with condition profile_arc_flag and flag_branch_probabilities. Mark David On Wed, May 18, 2011 at 10:39 AM, Mark Heffernan meh...@google.com wrote: On Tue, May 17, 2011 at 11:34 PM, Xinliang David Li davi...@google.com wrote: To make consistent inline decisions between profile-gen and profile-use, probably better to check these two: flag_profile_arcs and flag_branch_probabilities. -fprofile-use enables profile-arcs, and value profiling is enabled only when edge/branch profiling is enabled (so no need to be checked). I changed the location where these parameters are set to someplace more appropriate (to where the flags are set when profile gen/use is indicated). Verified identical binaries are generated. OK as updated? Mark 2011-05-18 Mark Heffernan meh...@google.com * opts.c (set_profile_parameters): New function. Index: opts.c === --- opts.c (revision 173666) +++ opts.c (working copy) @@ -1209,6 +1209,25 @@ print_specific_help (unsigned int includ opts->x_help_columns, opts, lang_mask); } + +/* Set parameters to more appropriate values when profile information + is available. */ +static void +set_profile_parameters (struct gcc_options *opts, + struct gcc_options *opts_set) +{ + /* With accurate profile information, inlining is much more + selective and makes better decisions, so increase the + inlining function size limits. 
*/ + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_SINGLE, 1000, + opts->x_param_values, opts_set->x_param_values); + maybe_set_param_value + (PARAM_MAX_INLINE_INSNS_AUTO, 1000, + opts->x_param_values, opts_set->x_param_values); +} + + /* Handle target- and language-independent options. Return zero to generate an unknown option message. Only options that need extra handling need to be listed here; if you simply want @@ -1560,6 +1579,7 @@ common_handle_option (struct gcc_options opts->x_flag_unswitch_loops = value; if (!opts_set->x_flag_gcse_after_reload) opts->x_flag_gcse_after_reload = value; + set_profile_parameters (opts, opts_set); break; case OPT_fprofile_generate_: @@ -1580,6 +1600,7 @@ common_handle_option (struct gcc_options is done. */ if (!opts_set->x_flag_ipa_reference && in_lto_p) opts->x_flag_ipa_reference = false; + set_profile_parameters (opts, opts_set); break; case OPT_fshow_column:
Re: [PATCH,c++] describe reasons for function template overload resolution failure
On 05/18/2011 01:45 PM, Jason Merrill wrote: Thanks for the background; I will keep the principle in mind. IMHO, in a case like this where we're logically printing one diagnostic (one error and then some number of explanatory notes) keeping all the logic for the diagnostic centralized makes more sense. I understand, but that means we have to create a whole data structure to try and preserve information about the failure, and either having to duplicate every possible error or give less informative messages. I feel even more strongly about this after looking more closely at your patch. Thank you for the review. I'll go back and try things the way you suggest; before I go off and do that, I've taken your comments to mean that: - fn_type_unification/type_unification_real and associated callers should take a boolean `explain' parameter, which is normally false; - failed calls to fn_type_unification should save the arguments for the call for future explanation; - printing diagnostic messages should call fn_type_unification with the saved arguments and a true `explain' parameter. This is similar to passing `struct unification_info' and really only involves shuffling code from call.c into the unify_* functions in pt.c and some minor changes to the rejection_reason code in call.c. The only wrinkle I see is that in cases like these: if (TREE_PURPOSE (TREE_VEC_ELT (tparms, i))) { tree parm = TREE_VALUE (TREE_VEC_ELT (tparms, i)); tree arg = TREE_PURPOSE (TREE_VEC_ELT (tparms, i)); arg = tsubst_template_arg (arg, targs, tf_none, NULL_TREE); arg = convert_template_argument (parm, arg, targs, tf_none, i, NULL_TREE, ui); if (arg == error_mark_node) return unify_parameter_deduction_failure (ui, parm); In this case, the problem is that we tried to use the default template argument but it didn't work for some reason; we should say that, not just say we didn't deduce something, or the users will say but there's a default argument!. 
In this case, we should do the substitution again with tf_warning_or_error so the user can see what the problem actually is, not just say that there was some unspecified problem. if (coerce_template_parms (parm_parms, full_argvec, TYPE_TI_TEMPLATE (parm), tf_none, /*require_all_args=*/true, /*use_default_args=*/false, ui) == error_mark_node) return 1; Rather than pass ui down into coerce_template_parms we should just note when it fails and run it again at diagnostic time. converted_args = (coerce_template_parms (tparms, explicit_targs, NULL_TREE, tf_none, /*require_all_args=*/false, /*use_default_args=*/false, ui)); if (converted_args == error_mark_node) return 1; Here too. if (fntype == error_mark_node) return unify_substitution_failure (ui); And this should remember the arguments so we can do the tsubst again at diagnostic time. and other bits of pt.c, I'm interpreting your suggestions to mean that tf_warning_or_error should be passed if `explain' is true. That doesn't seem like the best interface for diagnostics, as we'll get: foo.cc:105:40 error: no matching function for call to bar (...) foo.cc:105:40 note: candidates are: bar.hh:7000:30 note: bar (...) bar.hh:7000:30 note: [some reason] bar.hh:4095:63 note: bar (...) bar.hh:... error: [some message from tf_warning_or_error code] I'm not sure that the last location there will necessary be the same as the one that's printed for the declaration. I think I'll punt on that issue for the time being until we see how the diagnostics work out. There's also the matter of the error vs. note diagnostic. 
I think it'd be nicer to keep the conformity of a note for all the explanations; the only way I see to do that is something like:

- Add a tf_note flag; pass it at all appropriate call sites when explaining things;
- Add a tf_issue_diagnostic flag that's the union of tf_{warning,error,note};
- Change code that looks like:

    if (complain & tf_warning_or_error)
      error (STUFF);

  to something like:

    if (complain & tf_issue_diagnostic)
      emit_diagnostic (complain & tf_note ? DK_NOTE : DK_ERROR, STUFF);

  passing input_location if we're not already passing a location.

That involves a lot of code churn. (Not a lot if you just modified the functions above, but with this scheme, you'd have to call instantiate_template again from the diagnostic code, and I assume you'd want to call that with tf_note as well, which means hitting a lot more code.) I don't see a better way
Re: [Patch, Fortran] PR 48700: memory leak with MOVE_ALLOC
The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? OK. Thanks for the patch! Thanks, Tobias. Committed as r173874. (What next on your gfortran agenda?) Well, I am pretty busy with my day job at university and working on my PhD, which will not allow me to make huge leaps on gfortran anytime soon (and was the reason for my abstinence from this mailing list during the last weeks). However, I'll surely try to continue working on a few PRs in the OOP friends area. As the OOP wiki page shows, there is enough work left to do in this respect (e.g. we urgently need polymorphic deallocation and there are problems with type-bound operators and assignments, just to name a few). In case anyone feels a strong urge to implement polymorphic arrays or finalization, please go ahead! I will probably not be able to take this on myself in the next months, but I can offer some support and advice regarding the present implementation of polymorphism and how to extend it. This is all I can promise right now ... Cheers, Janus 2011-05-16 Janus Weil ja...@gcc.gnu.org PR fortran/48700 * trans-intrinsic.c (gfc_conv_intrinsic_move_alloc): Deallocate 'TO' argument to avoid memory leaks. 2011-05-16 Janus Weil ja...@gcc.gnu.org PR fortran/48700 * gfortran.dg/move_alloc_4.f90: New.
Re: Libiberty: POSIXify psignal definition
On Wed, 18 May 2011, DJ Delorie wrote: At this point, though, I'm tempted to say there's no such thing as a target libiberty and rip all the target-libiberty rules out, and let Yes please. I've been arguing for that for some time. http://gcc.gnu.org/ml/gcc/2009-04/msg00410.html http://gcc.gnu.org/ml/gcc/2010-03/msg2.html http://gcc.gnu.org/ml/gcc/2010-03/msg00012.html http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01231.html http://gcc.gnu.org/ml/gcc-bugs/2011-03/msg00206.html http://gcc.gnu.org/ml/gcc/2011-03/msg00465.html http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02304.html -- Joseph S. Myers jos...@codesourcery.com
Re: Libiberty: POSIXify psignal definition
What about these? dependencies = { module=all-target-fastjar; on=all-target-libiberty; }; dependencies = { module=all-target-libobjc; on=all-target-libiberty; }; dependencies = { module=all-target-libstdc++-v3; on=all-target-libiberty; };
Re: New options to disable/enable any pass for any functions (issue4550056)
Thanks for the comment. Will fix those.

David

On Wed, May 18, 2011 at 12:30 PM, Joseph S. Myers jos...@codesourcery.com wrote:
On Wed, 18 May 2011, David Li wrote:

+  error ("Unrecognized option %s", is_enable ? "-fenable" : "-fdisable");

+  error ("Unknown pass %s specified in %s",
+	 phase_name,
+	 is_enable ? "-fenable" : "-fdisable");

Follow GNU Coding Standards for diagnostics (start with lowercase letter).

+  inform (UNKNOWN_LOCATION, "%s pass %s for functions in the range of [%u, %u]\n",
+	  is_enable ? "Enable" : "Disable", phase_name, new_range->start, new_range->last);

Use separate calls to inform for the enable and disable cases, so that full sentences can be extracted for translation.

+  error ("Invalid range %s in option %s",
+	 one_range,
+	 is_enable ? "-fenable" : "-fdisable");

GNU Coding Standards.

+  error ("Invalid range %s in option %s",

Likewise.

+  inform (UNKNOWN_LOCATION, "%s pass %s for functions in the range of [%u, %u]\n",
+	  is_enable ? "Enable" : "Disable", phase_name, new_range->start, new_range->last);

Again needs GCS and i18n fixes.

-- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH,c++] describe reasons for function template overload resolution failure
On 05/18/2011 03:00 PM, Nathan Froyd wrote:

Thank you for the review. I'll go back and try things the way you suggest; before I go off and do that, I've taken your comments to mean that:

- fn_type_unification/type_unification_real and associated callers should take a boolean `explain' parameter, which is normally false;
- failed calls to fn_type_unification should save the arguments for the call for future explanation;
- printing diagnostic messages should call fn_type_unification with the saved arguments and a true `explain' parameter.

Yes, that's what I had in mind. Though I think you can reconstruct the arguments rather than save them.

... bar.hh:4095:63 note: bar (...) bar.hh:... error: [some message from tf_warning_or_error code]

I'm not sure that the last location there will necessarily be the same as the one that's printed for the declaration. I think I'll punt on that issue for the time being until we see how the diagnostics work out. There's also the matter of the error vs. note diagnostic. I think it'd be nicer to keep the conformity of a note for all the explanations

Nicer, yes, but I think that's a secondary concern after usefulness of the actual message. In similar cases I've introduced the errors with another message like "%qD is implicitly deleted because the default definition would be ill-formed:". Or, in this case, "deduction failed because substituting the template arguments would be ill-formed:".

; the only way I see to do that is something like:

- Add a tf_note flag; pass it at all appropriate call sites when explaining things;
- Add a tf_issue_diagnostic flag that's the union of tf_{warning,error,note};
- Change code that looks like:

    if (complain & tf_warning_or_error)
      error (STUFF);

  to something like:

    if (complain & tf_issue_diagnostic)
      emit_diagnostic (complain & tf_note ? DK_NOTE : DK_ERROR, STUFF);

  passing input_location if we're not already passing a location.

That involves a lot of code churn.
(Not a lot if you just modified the functions above, but with this scheme, you'd have to call instantiate_template again from the diagnostic code, and I assume you'd want to call that with tf_note as well, which means hitting a lot more code.) I don't see a better way to keep the diagnostics uniform, but I might be making things too complicated; did you have a different idea of how to implement what you were suggesting? That all makes sense, but I'd put it in a follow-on patch. And wrap the complexity in a cp_error function that takes a complain parameter and either gives no message, a note, or an error depending. Jason
Make ARM -mfpu= option handling use Enum
This patch continues the cleanup of ARM option handling by making -mfpu= handling use Enum, with the table of FPUs moved to a new arm-fpus.def. Tested building cc1 and xgcc for cross to arm-eabi. Will commit to trunk in the absence of target maintainer objections. contrib: 2011-05-18 Joseph Myers jos...@codesourcery.com * gcc_update (gcc/config/arm/arm-tables.opt): Also depend on gcc/config/arm/arm-fpus.def. gcc: 2011-05-18 Joseph Myers jos...@codesourcery.com * config/arm/arm-fpus.def: New. * config/arm/genopt.sh: Generate Enum and EnumValue entries from arm-fpus.def. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.c (all_fpus): Move contents to arm-fpus.def. (arm_option_override): Don't decode FPU name to string here. * config/arm/arm.opt (mfpu=): Use Enum. * config/arm/t-arm ($(srcdir)/config/arm/arm-tables.opt, arm.o): Update dependencies. Index: contrib/gcc_update === --- contrib/gcc_update (revision 173864) +++ contrib/gcc_update (working copy) @@ -80,7 +80,7 @@ gcc/config.in: gcc/cstamp-h.in gcc/fixinc/fixincl.x: gcc/fixinc/fixincl.tpl gcc/fixinc/inclhack.def gcc/config/arm/arm-tune.md: gcc/config/arm/arm-cores.def gcc/config/arm/gentune.sh -gcc/config/arm/arm-tables.opt: gcc/config/arm/arm-arches.def gcc/config/arm/arm-cores.def gcc/config/arm/genopt.sh +gcc/config/arm/arm-tables.opt: gcc/config/arm/arm-arches.def gcc/config/arm/arm-cores.def gcc/config/arm/arm-fpus.def gcc/config/arm/genopt.sh gcc/config/m68k/m68k-tables.opt: gcc/config/m68k/m68k-devices.def gcc/config/m68k/m68k-isas.def gcc/config/m68k/m68k-microarchs.def gcc/config/m68k/genopt.sh gcc/config/mips/mips-tables.opt: gcc/config/mips/mips-cpus.def gcc/config/mips/genopt.sh gcc/config/rs6000/rs6000-tables.opt: gcc/config/rs6000/rs6000-cpus.def gcc/config/rs6000/genopt.sh Index: gcc/config/arm/arm-tables.opt === --- gcc/config/arm/arm-tables.opt (revision 173864) +++ gcc/config/arm/arm-tables.opt (working copy) @@ -1,5 +1,6 @@ ; -*- buffer-read-only: t -*- -; Generated automatically by 
genopt.sh from arm-cores.def and arm-arches.def. +; Generated automatically by genopt.sh from arm-cores.def, arm-arches.def +; and arm-fpus.def. ; Copyright (C) 2011 Free Software Foundation, Inc. ; @@ -339,3 +340,61 @@ EnumValue Enum(arm_arch) String(iwmmxt2) Value(24) +Enum +Name(arm_fpu) Type(int) +Known ARM FPUs (for use with the -mfpu= option): + +EnumValue +Enum(arm_fpu) String(fpa) Value(0) + +EnumValue +Enum(arm_fpu) String(fpe2) Value(1) + +EnumValue +Enum(arm_fpu) String(fpe3) Value(2) + +EnumValue +Enum(arm_fpu) String(maverick) Value(3) + +EnumValue +Enum(arm_fpu) String(vfp) Value(4) + +EnumValue +Enum(arm_fpu) String(vfpv3) Value(5) + +EnumValue +Enum(arm_fpu) String(vfpv3-fp16) Value(6) + +EnumValue +Enum(arm_fpu) String(vfpv3-d16) Value(7) + +EnumValue +Enum(arm_fpu) String(vfpv3-d16-fp16) Value(8) + +EnumValue +Enum(arm_fpu) String(vfpv3xd) Value(9) + +EnumValue +Enum(arm_fpu) String(vfpv3xd-fp16) Value(10) + +EnumValue +Enum(arm_fpu) String(neon) Value(11) + +EnumValue +Enum(arm_fpu) String(neon-fp16) Value(12) + +EnumValue +Enum(arm_fpu) String(vfpv4) Value(13) + +EnumValue +Enum(arm_fpu) String(vfpv4-d16) Value(14) + +EnumValue +Enum(arm_fpu) String(fpv4-sp-d16) Value(15) + +EnumValue +Enum(arm_fpu) String(neon-vfpv4) Value(16) + +EnumValue +Enum(arm_fpu) String(vfp3) Value(17) + Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 173864) +++ gcc/config/arm/arm.c(working copy) @@ -939,25 +939,10 @@ static const struct arm_fpu_desc all_fpus[] = { - {fpa, ARM_FP_MODEL_FPA, 0, VFP_NONE, false, false}, - {fpe2, ARM_FP_MODEL_FPA, 2, VFP_NONE, false, false}, - {fpe3, ARM_FP_MODEL_FPA, 3, VFP_NONE, false, false}, - {maverick, ARM_FP_MODEL_MAVERICK, 0, VFP_NONE, false, false}, - {vfp, ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false}, - {vfpv3,ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false}, - {vfpv3-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true}, - {vfpv3-d16,ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false}, - {vfpv3-d16-fp16, 
ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true},
-  {"vfpv3xd",		ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false},
-  {"vfpv3xd-fp16",	ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true},
-  {"neon",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false},
-  {"neon-fp16",		ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , true },
-  {"vfpv4",		ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true},
-  {"vfpv4-d16",		ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true},
-  {"fpv4-sp-d16",	ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE,
[Patch, fortran] Update documentation and error messages for -ffpe-trap
Hi, the attached patch updates the documentation and error messages for the -ffpe-trap= option: - The IEEE 754 name for the loss of precision exception is inexact, and not precision (both in 754-1985 and 754-2008). So use that instead, while still allowing precision as an alias for inexact for backwards compatibility. Also, change the name of the corresponding macro in the internal headers (ABI is not broken since the value is still the same). - The denormal exception is not an IEEE exception, but an additional one supported at least on x86. And the difference between underflow and denormal is, AFAICS, that underflow refers to the result of a FP operation, whereas the denormal exception means that an operand to an operation was a denormal. So try to clarify that. - In fpu-aix.h we had a bug where we enabled underflow instead of inexact when inexact was specified. Fixed. Regtested on x86_64-unknown-linux-gnu, Ok for trunk? frontend ChangeLog: 2011-05-18 Janne Blomqvist j...@gcc.gnu.org * gfortran.texi (set_fpe): Update documentation. * invoke.texi (-ffpe-trap): Likewise. * libgfortran.h (GFC_FPE_PRECISION): Rename to GFC_FPE_INEXACT. * options.c (gfc_handle_fpe_trap_option): Handle inexact and make precision an alias for it. libgfortran ChangeLog: 2011-05-18 Janne Blomqvist j...@gcc.gnu.org * config/fpu-387.h (set_fpu): Use renamed inexact macro. * config/fpu-aix.h (set_fpu): Clarify error messages, use renamed inexact macro, set TRP_INEXACT for inexact exception instead of TRP_UNDERFLOW. * config/fpu-generic.h (set_fpu): Clarify error messages, use renamed inexact macro. * config/fpu-glibc.h (set_fpu): Likewise. * config/fpu-sysv.h (set_fpu): Likewise. 
-- Janne Blomqvist diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 995d9d8..4db506c 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -2718,16 +2718,15 @@ int main (int argc, char *argv[]) @node _gfortran_set_fpe -@subsection @code{_gfortran_set_fpe} --- Set when a Floating Point Exception should be raised +@subsection @code{_gfortran_set_fpe} --- Enable floating point exception traps @fnindex _gfortran_set_fpe @cindex libgfortran initialization, set_fpe @table @asis @item @emph{Description}: -@code{_gfortran_set_fpe} sets the IEEE exceptions for which a -Floating Point Exception (FPE) should be raised. On most systems, -this will result in a SIGFPE signal being sent and the program -being interrupted. +@code{_gfortran_set_fpe} enables floating point exception traps for +the specified exceptions. On most systems, this will result in a +SIGFPE signal being sent and the program being aborted. @item @emph{Syntax}: @code{void _gfortran_set_fpe (int val)} @@ -2738,7 +2737,7 @@ being interrupted. (bitwise or-ed) zero (0, default) no trapping, @code{GFC_FPE_INVALID} (1), @code{GFC_FPE_DENORMAL} (2), @code{GFC_FPE_ZERO} (4), @code{GFC_FPE_OVERFLOW} (8), -@code{GFC_FPE_UNDERFLOW} (16), and @code{GFC_FPE_PRECISION} (32). +@code{GFC_FPE_UNDERFLOW} (16), and @code{GFC_FPE_INEXACT} (32). @end multitable @item @emph{Example}: diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi index ab45072..41fee67 100644 --- a/gcc/fortran/invoke.texi +++ b/gcc/fortran/invoke.texi @@ -919,21 +919,31 @@ GNU Fortran compiler itself. This option is deprecated; use @item -ffpe-trap=@var{list} @opindex @code{ffpe-trap=}@var{list} -Specify a list of IEEE exceptions when a Floating Point Exception -(FPE) should be raised. On most systems, this will result in a SIGFPE -signal being sent and the program being interrupted, producing a core -file useful for debugging. 
@var{list} is a (possibly empty) comma-separated -list of the following IEEE exceptions: @samp{invalid} (invalid floating -point operation, such as @code{SQRT(-1.0)}), @samp{zero} (division by -zero), @samp{overflow} (overflow in a floating point operation), -@samp{underflow} (underflow in a floating point operation), -@samp{precision} (loss of precision during operation) and @samp{denormal} -(operation produced a denormal value). - -Some of the routines in the Fortran runtime library, like -@samp{CPU_TIME}, are likely to trigger floating point exceptions when -@code{ffpe-trap=precision} is used. For this reason, the use of -@code{ffpe-trap=precision} is not recommended. +Specify a list of floating point exception traps to enable. On most +systems, if a floating point exception occurs and the trap for that +exception is enabled, a SIGFPE signal will be sent and the program +being aborted, producing a core file useful for debugging. @var{list} +is a (possibly empty) comma-separated list of the following +exceptions: @samp{invalid} (invalid floating point operation, such as +@code{SQRT(-1.0)}), @samp{zero} (division by zero), @samp{overflow} +(overflow in a floating point operation), @samp{underflow} (underflow +in a
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On 05/18/2011 05:41 AM, Gabriel Dos Reis wrote: On Tue, May 17, 2011 at 2:46 PM, Toon Moenet...@moene.org wrote: On 05/17/2011 08:32 PM, Uros Bizjak wrote: Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx. Committed to mainline SVN as obvious. Does that mean that I can now remove the --disable-werror from my daily C++ bootstrap run ? Well, that certainly worked, as exemplified by this: http://gcc.gnu.org/ml/gcc-testresults/2011-05/msg01890.html At least that would enable my daily run (between 18:10 and 20:10 UTC) to catch -Werror mistakes ... It's great that some people understand the intricacies of the infight^H^H^H^H^H^H differences between the C and C++ type model. OK: 1/2 :-) I suspect this infight would vanish if we just switched, as we discussed in the past. Perhaps it would just help if we implemented the next step of the plan (http://gcc.gnu.org/wiki/gcc-in-cxx): # it would be a good thing to try forcing the C++ host compiler requirement for GCC 4.[7] with just building stage1 with C++ and stage2/3 with the stage1 C compiler. --disable-build-with-cxx would be a workaround for a missing C++ host compiler. Of course, that still wouldn't make it possible to implement C++ solutions for C hacks because the --disable-build-with-cxx crowd would cry foul over this ... -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: [PATCH][?/n] LTO type merging cleanup
On Wed, May 18, 2011 at 7:20 PM, Jan Hubicka hubi...@ucw.cz wrote: We can end up with an infinite recursion as gimple_register_type tries to register TYPE_MAIN_VARIANT first. This is because we are being called from the LTO type-fixup code which walks the type graph and adjusts types to their leaders. So we can be called for type SCCs that are only partially fixed up yet which means TYPE_MAIN_VARIANT might temporarily not honor the invariant that the main variant of a main variant is itself. Thus, simply avoid recursing more than once - we are sure that we will be reaching at most type duplicates in further recursion. Bootstrap regtest pending on x86_64-unknown-linux-gnu. With this funcion WPA stage passes with some improvements I repported to mozilla metabug. We now get ICE in ltrans: #0 gimple_register_type (t=0x0) at ../../gcc/gimple.c:4616 #1 0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) at ../../gcc/gimple.c:4890 #2 0x0048f14d in lto_ft_type (t=0x7fffe851f498) at ../../gcc/lto/lto.c:401 #3 lto_fixup_types (t=0x7fffe851f498) at ../../gcc/lto/lto.c:581 #4 0x0048f4a0 in uniquify_nodes (node=Unhandled dwarf expression opcode 0xf3 TYPE_MAIN_VARIANT is NULL. 
(gdb) up #1 0x005a0fc9 in gimple_register_canonical_type (t=0x7fffe851f498) at ../../gcc/gimple.c:4890 4890 t = gimple_register_type (TYPE_MAIN_VARIANT (t)); (gdb) p debug_generic_stmt (t) struct _ffi_type $1 = void (gdb) p debug_tree (t) record_type 0x7fffe851f498 _ffi_type BLK size integer_cst 0x77ecf680 type integer_type 0x77eca0a8 bit_size_type constant 192 unit size integer_cst 0x77ecf640 type integer_type 0x77eca000 constant 24 align 64 symtab 0 alias set -1 structural equality fields field_decl 0x7fffe87684c0 size type integer_type 0x77eca690 long unsigned int public unsigned DI size integer_cst 0x77ecf1e0 constant 64 unit size integer_cst 0x77ecf200 constant 8 align 64 symtab 0 alias set -1 canonical type 0x77eca690 precision 64 min integer_cst 0x77ecf220 0 max integer_cst 0x77ecf1c0 18446744073709551615 pointer_to_this pointer_type 0x75336150 reference_to_this reference_type 0x70aba000 used unsigned nonlocal DI file ctypes/libffi/include/ffi.h line 109 col 0 size integer_cst 0x77ecf1e0 64 unit size integer_cst 0x77ecf200 8 align 64 offset_align 128 offset integer_cst 0x77ebaf00 constant 0 bit offset integer_cst 0x77ecf420 constant 0 context record_type 0x7fffe851f2a0 _ffi_type chain field_decl 0x7fffe8768558 alignment type integer_type 0x77eca3f0 short unsigned int used unsigned nonlocal HI file ctypes/libffi/include/ffi.h line 110 col 0 size integer_cst 0x77ecf080 constant 16 unit size integer_cst 0x77ecf0a0 constant 2 align 16 offset_align 128 offset integer_cst 0x77ebaf00 0 bit offset integer_cst 0x77ecf1e0 64 context record_type 0x7fffe851f2a0 _ffi_type chain field_decl 0x7fffe87685f0 type chain type_decl 0x7fffe8966ac8 _ffi_type $2 = void Let me know if there is anything easy I could work out ;) I think the bug may be in the recursion guard. When you have cycle of length greater than 2 of MVs, you won't walk them all. That doesn't matter. 
MVs are acyclic initially (in fact the chain has length 1), only during fixup we can temporarily create larger chains or cycles. MVs also never are NULL, so it would be interesting to see what clears it ... Richard. Honza
[PATCH] Fix VRP MIN/MAX handling with two anti-ranges (PR tree-optimization/49039)
Hi!

The testcases below are miscompiled (execute/ by 4.6/4.7, pr49039.C by 4.6 and twice so by 4.7 (so much that it doesn't abort)), because VRP thinks that MIN_EXPR <~[-1UL, -1UL], ~[0, 0]> is ~[0, 0] (correct is VARYING and similarly MAX_EXPR <~[-1UL, -1UL], ~[0, 0]> is ~[-1UL, -1UL]).

  min = vrp_int_const_binop (code, vr0.min, vr1.min);
  max = vrp_int_const_binop (code, vr0.max, vr1.max);

is only correct for VR_RANGE for +/min/max; for + we give up for VR_ANTI_RANGE. The following patch instead for both min and max with anti-ranges returns ~[MAX_EXPR <vr0.min, vr1.min>, MIN_EXPR <vr0.max, vr1.max>]. The code later on in the function will change that into VARYING if there is no intersection and thus min is above max.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6?

2011-05-18  Jakub Jelinek  ja...@redhat.com

	PR tree-optimization/49039
	* tree-vrp.c (extract_range_from_binary_expr): For
	MIN_EXPR <~[a, b], ~[c, d]> and MAX_EXPR <~[a, b], ~[c, d]>
	return ~[MAX_EXPR <a, c>, MIN_EXPR <b, d>].

	* gcc.c-torture/execute/pr49039.c: New test.
	* gcc.dg/tree-ssa/pr49039.c: New test.
	* g++.dg/torture/pr49039.C: New test.

--- gcc/tree-vrp.c.jj	2011-05-11 19:39:03.0 +0200
+++ gcc/tree-vrp.c	2011-05-18 19:13:54.0 +0200
@@ -2358,17 +2358,27 @@ extract_range_from_binary_expr (value_ra
	 op0 + op1 == 0, so we cannot claim that the sum is in ~[0,0].
	 Note that we are guaranteed to have vr0.type == vr1.type at
	 this point.  */
-      if (code == PLUS_EXPR && vr0.type == VR_ANTI_RANGE)
+      if (vr0.type == VR_ANTI_RANGE)
	{
-	  set_value_range_to_varying (vr);
-	  return;
+	  if (code == PLUS_EXPR)
+	    {
+	      set_value_range_to_varying (vr);
+	      return;
+	    }
+	  /* For MIN_EXPR and MAX_EXPR with two VR_ANTI_RANGEs,
+	     the resulting VR_ANTI_RANGE is the same - intersection
+	     of the two ranges.
 */
+	  min = vrp_int_const_binop (MAX_EXPR, vr0.min, vr1.min);
+	  max = vrp_int_const_binop (MIN_EXPR, vr0.max, vr1.max);
+	}
+      else
+	{
+	  /* For operations that make the resulting range directly
+	     proportional to the original ranges, apply the operation to
+	     the same end of each range.  */
+	  min = vrp_int_const_binop (code, vr0.min, vr1.min);
+	  max = vrp_int_const_binop (code, vr0.max, vr1.max);
	}
-
-      /* For operations that make the resulting range directly
-	 proportional to the original ranges, apply the operation to
-	 the same end of each range.  */
-      min = vrp_int_const_binop (code, vr0.min, vr1.min);
-      max = vrp_int_const_binop (code, vr0.max, vr1.max);

       /* If both additions overflowed the range kind is still correct.
	  This happens regularly with subtracting something in unsigned

--- gcc/testsuite/gcc.c-torture/execute/pr49039.c.jj	2011-05-18 19:18:57.0 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr49039.c	2011-05-18 19:03:24.0 +0200
@@ -0,0 +1,26 @@
+/* PR tree-optimization/49039 */
+extern void abort (void);
+int cnt;
+
+__attribute__((noinline, noclone)) void
+foo (unsigned int x, unsigned int y)
+{
+  unsigned int minv, maxv;
+  if (x == 1 || y == -2U)
+    return;
+  minv = x < y ? x : y;
+  maxv = x > y ? x : y;
+  if (minv == 1)
+    ++cnt;
+  if (maxv == -2U)
+    ++cnt;
+}
+
+int
+main ()
+{
+  foo (-2U, 1);
+  if (cnt != 2)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.dg/tree-ssa/pr49039.c.jj	2011-05-18 19:30:04.0 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr49039.c	2011-05-18 19:29:57.0 +0200
@@ -0,0 +1,31 @@
+/* PR tree-optimization/49039 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+extern void bar (void);
+
+void
+foo (unsigned int x, unsigned int y)
+{
+  unsigned int minv, maxv;
+  if (x >= 3 && x <= 6)
+    return;
+  if (y >= 5 && y <= 8)
+    return;
+  minv = x < y ? x : y;
+  maxv = x > y ?
x : y;
+  if (minv == 5)
+    bar ();
+  if (minv == 6)
+    bar ();
+  if (maxv == 5)
+    bar ();
+  if (maxv == 6)
+    bar ();
+}
+
+/* { dg-final { scan-tree-dump "Folding predicate minv_\[0-9\]* == 5 to 0" "vrp1" } } */
+/* { dg-final { scan-tree-dump "Folding predicate minv_\[0-9\]* == 6 to 0" "vrp1" } } */
+/* { dg-final { scan-tree-dump "Folding predicate maxv_\[0-9\]* == 5 to 0" "vrp1" } } */
+/* { dg-final { scan-tree-dump "Folding predicate maxv_\[0-9\]* == 6 to 0" "vrp1" } } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */
--- gcc/testsuite/g++.dg/torture/pr49039.C.jj	2011-05-18 19:20:45.0 +0200
+++ gcc/testsuite/g++.dg/torture/pr49039.C	2011-05-18 19:20:03.0 +0200
@@ -0,0 +1,76 @@
+// PR tree-optimization/49039
+// { dg-do run }
+
+template <class T1, class T2>
+struct pair
+{
+  T1 first;
+  T2 second;
+  pair (const T1 &a, const T2 &b) : first (a), second (b) {}
+};
+
+template <class
Re: New options to disable/enable any pass for any functions (issue4550056)
On Wed, May 18, 2011 at 8:37 PM, David Li davi...@google.com wrote:

In gcc, not all passes have user level control to turn it on/off, and there is no way to flip on/off the pass for a subset of functions. I implemented a generic option handling scheme in gcc to allow disabling/enabling any gcc pass for any specified function(s). The new options will be very useful for things like performance experiments and bug triaging (gcc has the dbgcnt mechanism, but not all passes have a counter). The option syntax is very similar to the -fdump- options. The following are some examples:

- -fdisable-tree-ccp1 --- disable ccp1 for all functions
- -fenable-tree-cunroll=1 --- enable complete unroll for the function whose cgraph node uid is 1
- -fdisable-rtl-gcse2=1:100,300,400:1000 --- disable gcse2 for functions in the following ranges: [1,100], [300,300], and [400,1000]
- -fdisable-tree-einline --- disable early inlining for all callers
- -fdisable-ipa-inline --- disable ipa inlining

In the gcc dumps, the uid numbers are displayed in the function header. The options are intended to be used internally by gcc developers.

Ok for trunk? (There is a little LIPO specific change that can be removed).

David

2011-05-18  David Li  davi...@google.com

	* final.c (rest_of_clean_state): Call function header dumper.
	* opts-global.c (handle_common_deferred_options): Handle new options.
	* tree-cfg.c (gimple_dump_cfg): Call function header dumper.
	* passes.c (register_one_dump_file): Call register_pass_name.
	(pass_init_dump_file): Call function header dumper.
	(execute_one_pass): Check explicit enable/disable flag.
	(passr_hash): New function.
	(passr_eq):
	(register_pass_name):
	(get_pass_by_name):
	(pass_hash):
	(pass_eq):
	(enable_disable_pass):
	(is_pass_explicitly_enabled_or_disabled):
	(is_pass_explicitly_enabled):
	(is_pass_explicitly_disabled):

Bogus changelog entry. New options need documenting in doc/invoke.texi.

Richard.
Index: tree-pass.h
===
--- tree-pass.h	(revision 173635)
+++ tree-pass.h	(working copy)
@@ -644,4 +644,12 @@ extern bool first_pass_instance;

 /* Declare for plugins.  */
 extern void do_per_function_toporder (void (*) (void *), void *);

+extern void enable_disable_pass (const char *, bool);
+extern bool is_pass_explicitly_disabled (struct opt_pass *, tree);
+extern bool is_pass_explicitly_enabled (struct opt_pass *, tree);
+extern void register_pass_name (struct opt_pass *, const char *);
+extern struct opt_pass *get_pass_by_name (const char *);
+struct function;
+extern void pass_dump_function_header (FILE *, tree, struct function *);
+
 #endif /* GCC_TREE_PASS_H */
Index: final.c
===
--- final.c	(revision 173635)
+++ final.c	(working copy)
@@ -4456,19 +4456,7 @@ rest_of_clean_state (void)
     }
   else
     {
-      const char *aname;
-      struct cgraph_node *node = cgraph_node (current_function_decl);
-
-      aname = (IDENTIFIER_POINTER
-	       (DECL_ASSEMBLER_NAME (current_function_decl)));
-      fprintf (final_output, "\n;; Function (%s) %s\n\n", aname,
-	       node->frequency == NODE_FREQUENCY_HOT
-	       ? " (hot)"
-	       : node->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
-	       ? " (unlikely executed)"
-	       : node->frequency == NODE_FREQUENCY_EXECUTED_ONCE
-	       ?
(executed once) - : ); + pass_dump_function_header (final_output, current_function_decl, cfun); flag_dump_noaddr = flag_dump_unnumbered = 1; if (flag_compare_debug_opt || flag_compare_debug) Index: common.opt === --- common.opt (revision 173635) +++ common.opt (working copy) @@ -1018,6 +1018,14 @@ fdiagnostics-show-option Common Var(flag_diagnostics_show_option) Init(1) Amend appropriate diagnostic messages with the command line option that controls them +fdisable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fdisable-[tree|rtl|ipa]-pass=range1+range2 disables an optimization pass + +fenable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fenable-[tree|rtl|ipa]-pass=range1+range2 enables an optimization pass + fdump- Common Joined RejectNegative Var(common_deferred_options) Defer -fdump-type Dump various compiler internals to a file Index: opts-global.c === --- opts-global.c (revision 173635) +++ opts-global.c (working copy) @@ -411,6 +411,12 @@ handle_common_deferred_options (void) error
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On Wed, May 18, 2011 at 10:17 PM, Toon Moene t...@moene.org wrote: On 05/18/2011 05:41 AM, Gabriel Dos Reis wrote: On Tue, May 17, 2011 at 2:46 PM, Toon Moene t...@moene.org wrote: On 05/17/2011 08:32 PM, Uros Bizjak wrote: Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx. Committed to mainline SVN as obvious. Does that mean that I can now remove the --disable-werror from my daily C++ bootstrap run ? Well, that certainly worked, as exemplified by this: http://gcc.gnu.org/ml/gcc-testresults/2011-05/msg01890.html At least that would enable my daily run (between 18:10 and 20:10 UTC) to catch -Werror mistakes ... It's great that some people understand the intricacies of the infight^H^H^H^H^H^H differences between the C and C++ type model. OK: 1/2 :-) I suspect this infight would vanish if we just switched, as we discussed in the past. Perhaps it would just help if we implemented the next step of the plan (http://gcc.gnu.org/wiki/gcc-in-cxx): # it would be a good thing to try forcing the C++ host compiler requirement for GCC 4.[7] with just building stage1 with C++ and stage2/3 with the stage1 C compiler. --disable-build-with-cxx would be a workaround for a missing C++ host compiler. Or the other way around, build stage1 with the host C compiler, add C++ to stage1-languages and build stage2/3 with the stageN C++ compiler. That avoids the host C++ compiler requirement for now and exercises the libstdc++ linking issues. But yes, somebody has to go forward to implement either (or both) variants. Not that I'm too excited to see GCC built with a C++ compiler (or even C++ features being used). Richard.
Re: [PATCH] Fix VRP MIN/MAX handling with two anti-ranges (PR tree-optimization/49039)
On Wed, May 18, 2011 at 10:21 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! The testcases below are miscompiled (execute/ by 4.6/4.7, pr49039.C by 4.6 and twice so by 4.7 (so much that it doesn't abort)), because VRP thinks that MIN_EXPR <~[-1UL, -1UL], ~[0, 0]> is ~[0, 0] (correct is VARYING and similarly MAX_EXPR <~[-1UL, -1UL], ~[0, 0]> is ~[-1UL, -1UL]). min = vrp_int_const_binop (code, vr0.min, vr1.min); max = vrp_int_const_binop (code, vr0.max, vr1.max); is only correct for VR_RANGE for +/min/max, for + we give up for VR_ANTI_RANGE. The following patch instead for both min and max with anti-ranges returns ~[MAX_EXPR <vr0.min, vr1.min>, MIN_EXPR <vr0.max, vr1.max>]. The code later on in the function will change that into VARYING if there is no intersection and thus min is above max. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6? Ok. Isn't it latent on the 4.5 branch as well? Thanks, Richard. 2011-05-18 Jakub Jelinek ja...@redhat.com PR tree-optimization/49039 * tree-vrp.c (extract_range_from_binary_expr): For MIN_EXPR <~[a, b], ~[c, d]> and MAX_EXPR <~[a, b], ~[c, d]> return ~[MAX_EXPR <a, c>, MIN_EXPR <b, d>]. * gcc.c-torture/execute/pr49039.c: New test. * gcc.dg/tree-ssa/pr49039.c: New test. * g++.dg/torture/pr49039.C: New test. --- gcc/tree-vrp.c.jj 2011-05-11 19:39:03.0 +0200 +++ gcc/tree-vrp.c 2011-05-18 19:13:54.0 +0200 @@ -2358,17 +2358,27 @@ extract_range_from_binary_expr (value_ra op0 + op1 == 0, so we cannot claim that the sum is in ~[0,0]. Note that we are guaranteed to have vr0.type == vr1.type at this point. */ - if (code == PLUS_EXPR && vr0.type == VR_ANTI_RANGE) + if (vr0.type == VR_ANTI_RANGE) { - set_value_range_to_varying (vr); - return; + if (code == PLUS_EXPR) + { + set_value_range_to_varying (vr); + return; + } + /* For MIN_EXPR and MAX_EXPR with two VR_ANTI_RANGEs, + the resulting VR_ANTI_RANGE is the same - intersection + of the two ranges. 
*/ + min = vrp_int_const_binop (MAX_EXPR, vr0.min, vr1.min); + max = vrp_int_const_binop (MIN_EXPR, vr0.max, vr1.max); + } + else + { + /* For operations that make the resulting range directly + proportional to the original ranges, apply the operation to + the same end of each range. */ + min = vrp_int_const_binop (code, vr0.min, vr1.min); + max = vrp_int_const_binop (code, vr0.max, vr1.max); } - - /* For operations that make the resulting range directly - proportional to the original ranges, apply the operation to - the same end of each range. */ - min = vrp_int_const_binop (code, vr0.min, vr1.min); - max = vrp_int_const_binop (code, vr0.max, vr1.max); /* If both additions overflowed the range kind is still correct. This happens regularly with subtracting something in unsigned --- gcc/testsuite/gcc.c-torture/execute/pr49039.c.jj 2011-05-18 19:18:57.0 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr49039.c 2011-05-18 19:03:24.0 +0200 @@ -0,0 +1,26 @@ +/* PR tree-optimization/49039 */ +extern void abort (void); +int cnt; + +__attribute__((noinline, noclone)) void +foo (unsigned int x, unsigned int y) +{ + unsigned int minv, maxv; + if (x == 1 || y == -2U) + return; + minv = x < y ? x : y; + maxv = x > y ? x : y; + if (minv == 1) + ++cnt; + if (maxv == -2U) + ++cnt; +} + +int +main () +{ + foo (-2U, 1); + if (cnt != 2) + abort (); + return 0; +} --- gcc/testsuite/gcc.dg/tree-ssa/pr49039.c.jj 2011-05-18 19:30:04.0 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/pr49039.c 2011-05-18 19:29:57.0 +0200 @@ -0,0 +1,31 @@ +/* PR tree-optimization/49039 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-vrp1" } */ + +extern void bar (void); + +void +foo (unsigned int x, unsigned int y) +{ + unsigned int minv, maxv; + if (x >= 3 && x <= 6) + return; + if (y >= 5 && y <= 8) + return; + minv = x < y ? x : y; + maxv = x > y ? x : y; + if (minv == 5) + bar (); + if (minv == 6) + bar (); + if (maxv == 5) + bar (); + if (maxv == 6) + bar (); +} + +/* { dg-final { scan-tree-dump "Folding predicate minv_\[0-9\]* == 5 to 0" "vrp1" } } */ +/* { dg-final { scan-tree-dump "Folding predicate minv_\[0-9\]* == 6 to 0" "vrp1" } } */ +/* { dg-final { scan-tree-dump "Folding predicate maxv_\[0-9\]* == 5 to 0" "vrp1" } } */ +/* { dg-final { scan-tree-dump "Folding predicate maxv_\[0-9\]* == 6 to 0" "vrp1" } } */ +/* { dg-final { cleanup-tree-dump "vrp1" } } */ --- gcc/testsuite/g++.dg/torture/pr49039.C.jj 2011-05-18 19:20:45.0 +0200 +++ gcc/testsuite/g++.dg/torture/pr49039.C
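The value-range arithmetic at the heart of the fix can be modeled in a few lines of Python. This is a sketch of the semantics only; in GCC the degradation to VARYING happens in later code in the same function once it notices min above max.

```python
def min_max_two_anti_ranges(vr0, vr1):
    """MIN_EXPR/MAX_EXPR of two anti-ranges ~[a,b] and ~[c,d].

    The result is the same for MIN and MAX: the anti-range of the
    intersection of the two excluded intervals, ~[max(a,c), min(b,d)],
    degrading to VARYING when that intersection is empty."""
    (a, b), (c, d) = vr0, vr1
    lo, hi = max(a, c), min(b, d)
    if lo > hi:          # no value is excluded by both inputs
        return "VARYING"
    return ("ANTI_RANGE", lo, hi)

# The miscompiled case from the report: MIN/MAX of ~[-1UL,-1UL] and
# ~[0,0] must be VARYING, not ~[0,0] or ~[-1UL,-1UL].
```

The second assertion below illustrates the conservative soundness of the rule: since min(x,y) is always one of x and y, any value excluded by both inputs is excluded from the result.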
Re: New options to disable/enable any pass for any functions (issue4550056)
Will fix the Changelog, and add documentation. Thanks, David On Wed, May 18, 2011 at 1:26 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 18, 2011 at 8:37 PM, David Li davi...@google.com wrote: In gcc, not all passes have user level control to turn it on/off, and there is no way to flip on/off the pass for a subset of functions. I implemented a generic option handling scheme in gcc to allow disabling/enabling any gcc pass for any specified function(s). The new options will be very useful for things like performance experiments and bug triaging (gcc has dbgcnt mechanism, but not all passes have the counter). The option syntax is very similar to -fdump- options. The following are some examples: -fdisable-tree-ccp1 --- disable ccp1 for all functions -fenable-tree-cunroll=1 --- enable complete unroll for the function whose cgraphnode uid is 1 -fdisable-rtl-gcse2=1:100,300,400:1000 -- disable gcse2 for functions at the following ranges [1,1], [300,400], and [400,1000] -fdisable-tree-einline -- disable early inlining for all callers -fdisable-ipa-inline -- disable ipa inlining In the gcc dumps, the uid numbers are displayed in the function header. The options are intended to be used internally by gcc developers. Ok for trunk ? (There is a little LIPO specific change that can be removed). David 2011-05-18 David Li davi...@google.com * final.c (rest_of_clean_state): Call function header dumper. * opts-global.c (handle_common_deferred_options): Handle new options. * tree-cfg.c (gimple_dump_cfg): Call function header dumper. * passes.c (register_one_dump_file): Call register_pass_name. (pass_init_dump_file): Call function header dumper. (execute_one_pass): Check explicit enable/disable flag. (passr_hash): New function. (passr_eq): (register_pass_name): (get_pass_by_name): (pass_hash): (pass_eq): (enable_disable_pass): (is_pass_explicitly_enabled_or_disabled): (is_pass_explicitly_enabled): (is_pass_explicitly_disabled): Bogus changelog entry. 
New options need documenting in doc/invoke.texi. Richard. Index: tree-pass.h === --- tree-pass.h (revision 173635) +++ tree-pass.h (working copy) @@ -644,4 +644,12 @@ extern bool first_pass_instance; /* Declare for plugins. */ extern void do_per_function_toporder (void (*) (void *), void *); +extern void enable_disable_pass (const char *, bool); +extern bool is_pass_explicitly_disabled (struct opt_pass *, tree); +extern bool is_pass_explicitly_enabled (struct opt_pass *, tree); +extern void register_pass_name (struct opt_pass *, const char *); +extern struct opt_pass *get_pass_by_name (const char *); +struct function; +extern void pass_dump_function_header (FILE *, tree, struct function *); + #endif /* GCC_TREE_PASS_H */ Index: final.c === --- final.c (revision 173635) +++ final.c (working copy) @@ -4456,19 +4456,7 @@ rest_of_clean_state (void) } else { - const char *aname; - struct cgraph_node *node = cgraph_node (current_function_decl); - - aname = (IDENTIFIER_POINTER - (DECL_ASSEMBLER_NAME (current_function_decl))); - fprintf (final_output, \n;; Function (%s) %s\n\n, aname, - node-frequency == NODE_FREQUENCY_HOT - ? (hot) - : node-frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED - ? (unlikely executed) - : node-frequency == NODE_FREQUENCY_EXECUTED_ONCE - ? 
(executed once) - : ); + pass_dump_function_header (final_output, current_function_decl, cfun); flag_dump_noaddr = flag_dump_unnumbered = 1; if (flag_compare_debug_opt || flag_compare_debug) Index: common.opt === --- common.opt (revision 173635) +++ common.opt (working copy) @@ -1018,6 +1018,14 @@ fdiagnostics-show-option Common Var(flag_diagnostics_show_option) Init(1) Amend appropriate diagnostic messages with the command line option that controls them +fdisable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fdisable-[tree|rtl|ipa]-pass=range1+range2 disables an optimization pass + +fenable- +Common Joined RejectNegative Var(common_deferred_options) Defer +-fenable-[tree|rtl|ipa]-pass=range1+range2 enables an optimization pass + fdump- Common Joined RejectNegative Var(common_deferred_options) Defer -fdump-type Dump various compiler internals to a file Index: opts-global.c === ---
Re: [PATCH] Fix VRP MIN/MAX handling with two anti-ranges (PR tree-optimization/49039)
On Wed, May 18, 2011 at 10:32:49PM +0200, Richard Guenther wrote: On Wed, May 18, 2011 at 10:21 PM, Jakub Jelinek ja...@redhat.com wrote: Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6? Ok. Thanks. Isn't it latent on the 4.5 branch as well? Not only latent, execute/pr49039.c fails at -O2 with 4.3/4.4/4.5 too (and succeeds with -O1). I'll queue this for backporting... Jakub
PATCH: PR target/49002: 128-bit AVX load incorrectly becomes 256-bit AVX load
Hi, This patch properly handles 256bit load cast. OK for trunk if there is no regression? I will also prepare a patch for 4.6 branch. Thanks. H.J. -- gcc/ 2011-05-18 H.J. Lu hongjiu...@intel.com PR target/49002 * config/i386/sse.md (avx_<ssemodesuffix><avxsizesuffix>_<ssemodesuffix>): Properly handle load cast. gcc/testsuite/ 2011-05-18 H.J. Lu hongjiu...@intel.com PR target/49002 * gcc.target/i386/pr49002-1.c: New test. * gcc.target/i386/pr49002-2.c: Likewise. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 291bffb..cf12a6d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -10294,12 +10294,13 @@ reload_completed [(const_int 0)] { + rtx op0 = operands[0]; rtx op1 = operands[1]; - if (REG_P (op1)) + if (REG_P (op0)) +op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0)); + else op1 = gen_rtx_REG (<MODE>mode, REGNO (op1)); - else -op1 = gen_lowpart (<MODE>mode, op1); - emit_move_insn (operands[0], op1); + emit_move_insn (op0, op1); DONE; }) diff --git a/gcc/testsuite/gcc.target/i386/pr49002-1.c b/gcc/testsuite/gcc.target/i386/pr49002-1.c new file mode 100644 index 000..7553e82 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr49002-1.c @@ -0,0 +1,16 @@ +/* PR target/49002 */ +/* { dg-do compile } */ +/* { dg-options "-O -mavx" } */ + +#include <immintrin.h> + +void foo(const __m128d *from, __m256d *to, int s) +{ + __m256d var = _mm256_castpd128_pd256(from[0]); + var = _mm256_insertf128_pd(var, from[s], 1); + to[0] = var; +} + +/* Ensure we load into xmm, not ymm. */ +/* { dg-final { scan-assembler-not "vmovapd\[\t \]*\[^,\]*,\[\t \]*%ymm" } } */ +/* { dg-final { scan-assembler "vmovapd\[\t \]*\[^,\]*,\[\t \]*%xmm" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr49002-2.c b/gcc/testsuite/gcc.target/i386/pr49002-2.c new file mode 100644 index 000..b0e1009 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr49002-2.c @@ -0,0 +1,14 @@ +/* PR target/49002 */ +/* { dg-do compile } */ +/* { dg-options "-O -mavx" } */ + +#include <immintrin.h> + +void foo(const __m128d from, __m256d *to) +{ + *to = _mm256_castpd128_pd256(from); +} + +/* Ensure we store ymm, not xmm. */ +/* { dg-final { scan-assembler-not "vmovapd\[\t \]*%xmm\[0-9\]\+,\[^,\]*" } } */ +/* { dg-final { scan-assembler "vmovapd\[\t \]*%ymm\[0-9\]\+,\[^,\]*" } } */
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On 05/18/2011 10:31 PM, Richard Guenther wrote: Not that I'm too excited to see GCC built with a C++ compiler (or even C++ features being used). Hmmm, you think using false as a value for a pointer-returning function is just A-OK ? Duh, I'm glad I'm using Fortran, where the programmer isn't even supposed to know what the value of .FALSE. is, because it is implementation dependent. -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx
On Wed, May 18, 2011 at 10:44 PM, Toon Moene t...@moene.org wrote: On 05/18/2011 10:31 PM, Richard Guenther wrote: Not that I'm too excited to see GCC built with a C++ compiler (or even C++ features being used). Hmmm, you think using false as a value for a pointer-returning function is just A-OK ? No, it isn't ;) Richard. Duh, I'm glad I'm using Fortran, where the programmer isn't even supposed to know what the value of .FALSE. is, because it is implementation dependent. -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: [PATCH PR45098, 4/10] Iv init cost.
Hi, Resubmitting with comment. The init cost of an iv will in general not be zero. It will be exceptional that the iv register happens to be initialized with the proper value at no cost. In general, there will at the very least be a regcopy or a const set. OK. Please add a comment explaining this to the code, Zdenek 2011-05-05 Tom de Vries t...@codesourcery.com PR target/45098 * tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent cost_base.cost == 0. Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c(revision 173380) +++ gcc/tree-ssa-loop-ivopts.c(working copy) @@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d base = cand->iv->base; cost_base = force_var_cost (data, base, NULL); + if (cost_base.cost == 0) + cost_base.cost = COSTS_N_INSNS (1); cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed); cost = cost_step + adjust_setup_cost (data, cost_base.cost);
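The effect of the two added lines can be modeled in Python (a sketch only: COSTS_N_INSNS mirrors GCC's insn-to-cost-unit scaling, and adjust_setup_cost is omitted for brevity):

```python
def COSTS_N_INSNS(n):
    # GCC's rtl.h scaling: one instruction corresponds to 4 cost units.
    return n * 4

def iv_cost(cost_base, cost_step):
    # An iv's setup is never truly free: there is at least a register
    # copy or a constant set, so clamp a zero base cost to one insn
    # before adding the per-iteration step cost.
    if cost_base == 0:
        cost_base = COSTS_N_INSNS(1)
    return cost_step + cost_base
```

Before the fix, a candidate with a "free" init looked strictly cheaper than it really is, biasing candidate selection.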
[patch, fortran] Some more function elimination tweaks
Hello world, the attached patch does the following: - It removes the restriction on functions returning allocatables for elimination (unnecessary since the introduction of allocatable temporary variables) - It allows character function elimination if the character length is a constant known at compile time - It removes introducing temporary variables for the TRANSFER function; this is better handled by the middle-end. Regression-tested. OK for trunk? Thomas 2011-05-18 Thomas Koenig tkoe...@gcc.gnu.org * frontend-passes.c (cfe_register_funcs): Also register character functions if their charlens are known and constant. Also register allocatable functions. 2011-05-18 Thomas Koenig tkoe...@gcc.gnu.org * gfortran.dg/function_optimize_8.f90: New test case. Index: frontend-passes.c === --- frontend-passes.c (revision 173754) +++ frontend-passes.c (working copy) @@ -137,8 +137,7 @@ optimize_expr (gfc_expr **e, int *walk_subtrees AT /* Callback function for common function elimination, called from cfe_expr_0. - Put all eligible function expressions into expr_array. We can't do - allocatable functions. */ + Put all eligible function expressions into expr_array. */ static int cfe_register_funcs (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED, @@ -148,8 +147,10 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre if ((*e)->expr_type != EXPR_FUNCTION) return 0; - /* We don't do character functions (yet). */ - if ((*e)->ts.type == BT_CHARACTER + /* We don't do character functions with unknown charlens. */ + if ((*e)->ts.type == BT_CHARACTER + && ((*e)->ts.u.cl == NULL || (*e)->ts.u.cl->length == NULL + || (*e)->ts.u.cl->length->expr_type != EXPR_CONSTANT)) return 0; /* If we don't know the shape at compile time, we create an allocatable @@ -163,9 +164,6 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre is specified. */ if ((*e)->value.function.esym) { - if ((*e)->value.function.esym->attr.allocatable) - return 0; - /* Don't create an array temporary for elemental functions. */ if ((*e)->value.function.esym->attr.elemental && (*e)->rank > 0) return 0; @@ -181,9 +179,10 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre if ((*e)->value.function.isym) { /* Conversions are handled on the fly by the middle end, - transpose during trans-* stages. */ + transpose during trans-* stages and TRANSFER by the middle end. */ if ((*e)->value.function.isym->id == GFC_ISYM_CONVERSION - || (*e)->value.function.isym->id == GFC_ISYM_TRANSPOSE) + || (*e)->value.function.isym->id == GFC_ISYM_TRANSPOSE + || (*e)->value.function.isym->id == GFC_ISYM_TRANSFER) return 0; /* Don't create an array temporary for elemental functions, ! { dg-do compile } ! { dg-options "-O -fdump-tree-original" } ! Check that duplicate function calls are removed for ! - Functions returning allocatables ! - Character functions with known length module x implicit none contains pure function myfunc(x) result(y) integer, intent(in) :: x integer, dimension(:), allocatable :: y allocate (y(3)) y(1) = x y(2) = 2*x y(3) = 3*x end function myfunc pure function mychar(x) result(r) integer, intent(in) :: x character(len=2) :: r r = achar(x + iachar('0')) // achar(x + iachar('1')) end function mychar end module x program main use x implicit none integer :: n character(len=20) :: line n = 3 write (unit=line,fmt='(3I2)') myfunc(n) + myfunc(n) if (line /= ' 61218') call abort write (unit=line,fmt='(A)') mychar(2) // mychar(2) if (line /= '2323') call abort end program main ! { dg-final { scan-tree-dump-times "myfunc" 2 "original" } } ! { dg-final { scan-tree-dump-times "mychar" 2 "original" } } ! { dg-final { cleanup-tree-dump "original" } } ! { dg-final { cleanup-modules "x" } }
Re: [C++ PATCH] Attempt to find implicitly determined firstprivate class type vars during genericization (PR c++/48869)
On 05/11/2011 08:26 AM, Jakub Jelinek wrote: This patch duplicates parts of the gimplifier's work during genericization, That seems unfortunate, but I'll accept your judgment that it's the least-bad solution. The patch is OK. Jason
Re: [PATCH PR45098, 9/10] Cheap shift-add.
Hi, + sa_cost = (TREE_CODE (expr) != MINUS_EXPR + ? shiftadd_cost[speed][mode][m] + : (mult == op1 +? shiftsub1_cost[speed][mode][m] +: shiftsub0_cost[speed][mode][m])); + res = new_cost (sa_cost, 0); + res = add_costs (res, mult == op1 ? cost0 : cost1); just forgetting the cost of the other operand does not seem correct -- what if it contains some more complicated subexpression? Zdenek
Re: PATCH: PR target/49002: 128-bit AVX load incorrectly becomes 256-bit AVX load
On Wed, May 18, 2011 at 10:37 PM, H.J. Lu hongjiu...@intel.com wrote: This patch properly handles 256bit load cast. OK for trunk if there is no regression? I will also prepare a patch for 4.6 branch. 2011-05-18 H.J. Lu hongjiu...@intel.com PR target/49002 * config/i386/sse.md (avx_ssemodesuffixavxsizesuffix_ssemodesuffix): Properly handle load cast. gcc/testsuite/ 2011-05-18 H.J. Lu hongjiu...@intel.com PR target/49002 * gcc.target/i386/pr49002-1.c: New test. * gcc.target/i386/pr49002-2.c: Likewise. OK for 4.6 and mainline (4.6 needs a bit adjusted patch due to renamed avxmodesuffixp). Thanks, Uros.
Re: [C++0x] contiguous bitfields race implementation
It seems like you're calculating maxbits correctly now, but an access doesn't necessarily start from the beginning of the sequence of bit-fields, especially given store_split_bit_field. That is, struct A { int i; int j: 32; int k: 8; char c[2]; }; Here maxbits would be 40, so we decide that it's OK to use SImode to access the word starting with k, and clobber c in the process. Am I wrong? Jason
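A quick way to check the concern is to lay out the bit positions. Assuming 32-bit int and the usual layout, i occupies bits [0,32), j bits [32,64), k bits [64,72) and c bits [72,88), so maxbits for the bit-field sequence starting at j is indeed 40. The sketch below (hypothetical helper, not GCC code) tests whether a word-sized access overlaps a neighboring field:

```python
def overlaps(access_start, access_bits, field_start, field_bits):
    # Half-open bit intervals: does the access touch the field?
    return (access_start < field_start + field_bits
            and field_start < access_start + access_bits)

# An SImode (32-bit) access to the word holding k starts at bit 64 and
# covers bits [64, 96) -- it overlaps c's bits [72, 88), so a
# read-modify-write in SImode would clobber c.  A QImode access to k
# alone, bits [64, 72), would not.
```

This makes the objection concrete: maxbits being 40 says nothing about where within the sequence the access starts.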
[PATCH, SMS 1/4] Fix calculation of row_rest_count
Hello, The calculation of the number of instructions in a row is currently done by updating row_rest_count field in struct ps_insn on the fly while creating a new instruction. It is used to make sure we do not exceed the issue_rate. This calculation assumes the instruction is inserted in the beginning of a row thus does not take into account the cases where it must follow other instructions. Also, it's not been properly updated when an instruction is removed. To avoid the overhead of maintaining this row_rest_count count in every instruction in each row as is currently done, this patch maintains one count per row which holds the number of instructions in the row. The patch was tested together with the rest of the patches in this series. On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtested on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? Thanks, Revital * modulo-sched.c (struct ps_insn): Remove row_rest_count field. (struct partial_schedule): Add rows_length field. (ps_insert_empty_row): Handle rows_length. (create_partial_schedule): Likewise. (free_partial_schedule): Likewise. (reset_partial_schedule): Likewise. (create_ps_insn): Remove rest_count argument. (remove_node_from_ps): Update rows_length. (add_node_to_ps): Update rows_length and call create_ps_insn without passing row_rest_count. Index: modulo-sched.c === --- modulo-sched.c (revision 173814) +++ modulo-sched.c (working copy) @@ -134,8 +135,6 @@ struct ps_insn ps_insn_ptr next_in_row, prev_in_row; - /* The number of nodes in the same row that come after this node. */ - int row_rest_count; }; /* Holds the partial schedule as an array of II rows. Each entry of the @@ -149,6 +148,9 @@ struct partial_schedule /* rows[i] points to linked list of insns scheduled in row i (0 <= i < ii). 
*/ ps_insn_ptr *rows; + /* rows_length[i] holds the number of instructions in the row. */ + int *rows_length; + /* The earliest absolute cycle of an insn in the partial schedule. */ int min_cycle; @@ -1908,6 +2140,7 @@ ps_insert_empty_row (partial_schedule_pt int ii = ps-ii; int new_ii = ii + 1; int row; + int *rows_length_new; verify_partial_schedule (ps, sched_nodes); @@ -1922,6 +2155,7 @@ ps_insert_empty_row (partial_schedule_pt rotate_partial_schedule (ps, PS_MIN_CYCLE (ps)); rows_new = (ps_insn_ptr *) xcalloc (new_ii, sizeof (ps_insn_ptr)); + rows_length_new = (int *) xcalloc (new_ii, sizeof (int)); for (row = 0; row split_row; row++) { rows_new[row] = ps-rows[row]; @@ -1966,6 +2200,8 @@ ps_insert_empty_row (partial_schedule_pt + (SMODULO (ps-max_cycle, ii) = split_row ? 1 : 0); free (ps-rows); ps-rows = rows_new; + free (ps-rows_length); + ps-rows_length = rows_length_new; ps-ii = new_ii; gcc_assert (ps-min_cycle = 0); @@ -2456,6 +2692,7 @@ create_partial_schedule (int ii, ddg_ptr { partial_schedule_ptr ps = XNEW (struct partial_schedule); ps-rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr)); + ps-rows_length = (int *) xcalloc (ii, sizeof (int)); ps-ii = ii; ps-history = history; ps-min_cycle = INT_MAX; @@ -2494,6 +2731,7 @@ free_partial_schedule (partial_schedule_ return; free_ps_insns (ps); free (ps-rows); + free (ps-rows_length); free (ps); } @@ -2511,6 +2749,8 @@ reset_partial_schedule (partial_schedule ps-rows = (ps_insn_ptr *) xrealloc (ps-rows, new_ii * sizeof (ps_insn_ptr)); memset (ps-rows, 0, new_ii * sizeof (ps_insn_ptr)); + ps-rows_length = (int *) xrealloc (ps-rows_length, new_ii * sizeof (int)); + memset (ps-rows_length, 0, new_ii * sizeof (int)); ps-ii = new_ii; ps-min_cycle = INT_MAX; ps-max_cycle = INT_MIN; @@ -2539,14 +2784,13 @@ print_partial_schedule (partial_schedule /* Creates an object of PS_INSN and initializes it to the given parameters. 
*/ static ps_insn_ptr -create_ps_insn (ddg_node_ptr node, int rest_count, int cycle) +create_ps_insn (ddg_node_ptr node, int cycle) { ps_insn_ptr ps_i = XNEW (struct ps_insn); ps_i-node = node; ps_i-next_in_row = NULL; ps_i-prev_in_row = NULL; - ps_i-row_rest_count = rest_count; ps_i-cycle = cycle; return ps_i; @@ -2579,6 +2823,8 @@ remove_node_from_ps (partial_schedule_pt if (ps_i-next_in_row) ps_i-next_in_row-prev_in_row = ps_i-prev_in_row; } + + ps-rows_length[row] -= 1; free (ps_i); return true; } @@ -2735,17 +2981,12 @@ add_node_to_ps (partial_schedule_ptr ps, sbitmap must_precede, sbitmap must_follow) { ps_insn_ptr ps_i; - int rest_count = 1; int row = SMODULO (cycle, ps-ii); -
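The bookkeeping change can be sketched as follows. This is a hypothetical Python model, not the C code: one count per row, kept in sync on both insert and removal, replaces the per-insn row_rest_count.

```python
class PartialSchedule:
    def __init__(self, ii, issue_rate):
        self.ii = ii                        # initiation interval
        self.issue_rate = issue_rate
        self.rows = [[] for _ in range(ii)]
        self.rows_length = [0] * ii         # insns currently in each row

    def add_insn(self, insn, cycle):
        row = cycle % self.ii
        if self.rows_length[row] >= self.issue_rate:
            return False                    # row full: honor issue_rate
        self.rows[row].append(insn)
        self.rows_length[row] += 1
        return True

    def remove_insn(self, insn, cycle):
        row = cycle % self.ii
        self.rows[row].remove(insn)
        self.rows_length[row] -= 1          # the update the old code missed
```

The count stays correct wherever in the row the insn lands, and removal frees a slot again, which the old per-insn scheme did not guarantee.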
[PATCH, SMS 2/4] Move the creation of anti-dep edge
Hello, The attached patch moves the creation of anti-dep edge from a branch to its def from create_ddg_dep_from_intra_loop_link () to add_cross_iteration_register_deps () due to the fact the edge is with distance 1 and thus should be in the latter function. The edge was added to avoid creating reg-moves. The patch was tested together with the rest of the patches in this series. On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtested on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? Thanks, Revital * ddg.c (create_ddg_dep_from_intra_loop_link): Remove the creation of anti-dep edge from a branch. (add_cross_iteration_register_deps): Create anti-dep edge from a branch. Index: ddg.c === --- ddg.c (revision 173785) +++ ddg.c (working copy) @@ -197,11 +197,6 @@ create_ddg_dep_from_intra_loop_link (ddg } } - /* If a true dep edge enters the branch create an anti edge in the - opposite direction to prevent the creation of reg-moves. */ - if ((DEP_TYPE (link) == REG_DEP_TRUE) && JUMP_P (dest_node->insn)) -create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1); - latency = dep_cost (link); e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance); add_edge_to_ddg (g, e); @@ -306,8 +301,11 @@ add_cross_iteration_register_deps (ddg_p gcc_assert (first_def_node); + /* Always create the edge if the use node is a branch in +order to prevent the creation of reg-moves. */ if (DF_REF_ID (last_def) != DF_REF_ID (first_def) - || !flag_modulo_sched_allow_regmoves) + || !flag_modulo_sched_allow_regmoves + || (flag_modulo_sched_allow_regmoves && JUMP_P (use_node->insn))) create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP, REG_DEP, 1);
[PATCH, SMS 3/4] Optimize stage count
Hello, The attached patch tries to achieve optimised SC by normalizing the partial schedule (having the cycles start from cycle zero). The branch location must be placed in row ii-1 in the final scheduling. If that's not the case after the normalization then it tries to move the branch to that row if possible, while preserving the scheduling of the rest of the instructions. The patch was tested together with the rest of the patches in this series. On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtested on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? Thanks, Revital Changelog: * modulo-sched.c (calculate_stage_count, calculate_must_precede_follow, get_sched_window, try_scheduling_node_in_cycle, remove_node_from_ps): Add declaration. (update_node_sched_params, set_must_precede_follow, optimize_sc): New functions. (reset_sched_times): Call update_node_sched_params. (sms_schedule): Call optimize_sc. (get_sched_window): Change function arguments. (sms_schedule_by_order): Update call to get_sched_window. Call set_must_precede_follow. (calculate_stage_count): Add function argument. 
Index: modulo-sched.c === --- modulo-sched.c (revision 173786) +++ modulo-sched.c (working copy) @@ -198,7 +198,16 @@ static void generate_prolog_epilog (part rtx, rtx); static void duplicate_insns_of_cycles (partial_schedule_ptr, int, int, int, rtx); -static int calculate_stage_count (partial_schedule_ptr ps); +static int calculate_stage_count (partial_schedule_ptr, int); +static void calculate_must_precede_follow (ddg_node_ptr, int, int, + int, int, sbitmap, sbitmap, sbitmap); +static int get_sched_window (partial_schedule_ptr, ddg_node_ptr, +sbitmap, int, int *, int *, int *); +static bool try_scheduling_node_in_cycle (partial_schedule_ptr, ddg_node_ptr, + int, int, sbitmap, int *, sbitmap, + sbitmap); +static bool remove_node_from_ps (partial_schedule_ptr, ps_insn_ptr); + #define SCHED_ASAP(x) (((node_sched_params_ptr)(x)-aux.info)-asap) #define SCHED_TIME(x) (((node_sched_params_ptr)(x)-aux.info)-time) #define SCHED_FIRST_REG_MOVE(x) \ @@ -572,6 +581,33 @@ free_undo_replace_buff (struct undo_repl } } +/* Update the sched_params for node U using the II, + the CYCLE of U and MIN_CYCLE. */ +static void +update_node_sched_params (ddg_node_ptr u, int ii, int cycle, int min_cycle) +{ + int sc_until_cycle_zero; + int stage; + + SCHED_TIME (u) = cycle; + SCHED_ROW (u) = SMODULO (cycle, ii); + + /* The calculation of stage count is done adding the number + of stages before cycle zero and after cycle zero. */ + sc_until_cycle_zero = CALC_STAGE_COUNT (-1, min_cycle, ii); + + if (SCHED_TIME (u) 0) +{ + stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii); + SCHED_STAGE (u) = sc_until_cycle_zero - stage; +} + else +{ + stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii); + SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1; +} +} + /* Bump the SCHED_TIMEs of all nodes by AMOUNT. Set the values of SCHED_ROW and SCHED_STAGE. Instruction scheduled on cycle AMOUNT will move to cycle zero. 
 */
@@ -588,7 +624,6 @@ reset_sched_times (partial_schedule_ptr
       ddg_node_ptr u = crr_insn->node;
       int normalized_time = SCHED_TIME (u) - amount;
       int new_min_cycle = PS_MIN_CYCLE (ps) - amount;
-      int sc_until_cycle_zero, stage;

       if (dump_file)
         {
@@ -604,23 +639,9 @@ reset_sched_times (partial_schedule_ptr
       gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
       gcc_assert (SCHED_TIME (u) <= ps->max_cycle);

-      SCHED_TIME (u) = normalized_time;
-      SCHED_ROW (u) = SMODULO (normalized_time, ii);
-
-      /* The calculation of stage count is done adding the number
-         of stages before cycle zero and after cycle zero.  */
-      sc_until_cycle_zero = CALC_STAGE_COUNT (-1, new_min_cycle, ii);
-
-      if (SCHED_TIME (u) < 0)
-        {
-          stage = CALC_STAGE_COUNT (-1, SCHED_TIME (u), ii);
-          SCHED_STAGE (u) = sc_until_cycle_zero - stage;
-        }
-      else
-        {
-          stage = CALC_STAGE_COUNT (SCHED_TIME (u), 0, ii);
-          SCHED_STAGE (u) = sc_until_cycle_zero + stage - 1;
-        }
+
+      crr_insn->cycle = normalized_time;
+      update_node_sched_params (u, ii, normalized_time, new_min_cycle);
     }
 }

@@ -657,6 +678,206 @@ permute_partial_schedule (partial_schedu
[PATCH, SMS 4/4] Misc. fixes
Hello, The attached patch contains misc. fixes and changes. The patch was tested together with the rest of the patches in this series. On ppc64-redhat-linux, regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi, regtested for c,c++, and bootstrapped the C language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? Thanks, Revital Changelog: * modulo-sched.c: Change comment. (reset_sched_times): Fix print message. (print_partial_schedule): Add print info. Index: modulo-sched.c === --- modulo-sched.c (revision 173786) +++ modulo-sched.c (working copy)
@@ -84,13 +84,14 @@ along with GCC; see the file COPYING3.
    II cycles (i.e. use register copies to prevent a def from overwriting
    itself before reaching the use).

-   SMS works with countable loops whose loop count can be easily
-   adjusted.  This is because we peel a constant number of iterations
-   into a prologue and epilogue for which we want to avoid emitting
-   the control part, and a kernel which is to iterate that constant
-   number of iterations less than the original loop.  So the control
-   part should be a set of insns clearly identified and having its
-   own iv, not otherwise used in the loop (at-least for now), which
+   SMS works with countable loops (1) whose control part can be easily
+   decoupled from the rest of the loop and (2) whose loop count can
+   be easily adjusted.  This is because we peel a constant number of
+   iterations into a prologue and epilogue for which we want to avoid
+   emitting the control part, and a kernel which is to iterate that
+   constant number of iterations less than the original loop.  So the
+   control part should be a set of insns clearly identified and having
+   its own iv, not otherwise used in the loop (at-least for now), which
    initializes a register before the loop to the number of iterations.
   Currently SMS relies on the do-loop pattern to recognize such loops,
   where (1) the control part comprises of all insns defining and/or
@@ -595,8 +596,8 @@ reset_sched_times (partial_schedule_ptr
          /* Print the scheduling times after the rotation.  */
          fprintf (dump_file, "crr_insn->node=%d (insn id %d), "
                   "crr_insn->cycle=%d, min_cycle=%d", crr_insn->node->cuid,
-                  INSN_UID (crr_insn->node->insn), SCHED_TIME (u),
-                  normalized_time);
+                  INSN_UID (crr_insn->node->insn), normalized_time,
+                  new_min_cycle);
          if (JUMP_P (crr_insn->node->insn))
            fprintf (dump_file, " (branch)");
          fprintf (dump_file, "\n");
@@ -2530,8 +2531,13 @@ print_partial_schedule (partial_schedule
       fprintf (dump, "\n[ROW %d ]: ", i);
       while (ps_i)
        {
-         fprintf (dump, "%d, ",
-                  INSN_UID (ps_i->node->insn));
+         if (JUMP_P (ps_i->node->insn))
+           fprintf (dump, "%d (branch), ",
+                    INSN_UID (ps_i->node->insn));
+         else
+           fprintf (dump, "%d, ",
+                    INSN_UID (ps_i->node->insn));
+
          ps_i = ps_i->next_in_row;
        }
     }
Re: [PATCH, MELT] add dominance functions
On Wed, 18 May 2011 21:04:39 +0200 Pierre Vittet pier...@pvittet.com wrote: Hello, I have written a patch to allow the use of the GCC dominance functions from MELT. [...] Changelog: 2011-05-17 Pierre Vittet pier...@pvittet.com * melt/xtramelt-ana-base.melt (is_dominance_info_available, is_post_dominance_info_available, calculate_dominance_info_unsafe, calculate_post_dominance_info_unsafe, free_dominance_info, free_post_dominance_info, calculate_dominance_info, calculate_post_dominance_info, debug_dominance_info, debug_post_dominance_info, get_immediate_dominator_unsafe, get_immediate_dominator, get_immediate_post_dominator_unsafe, get_immediate_post_dominator, dominated_by_other_unsafe, dominated_by_other, post_dominated_by_other_unsafe, post_dominated_by_other, foreach_dominated_unsafe, dominated_by_bb_iterator): Add primitives, functions, iterators for using dominance info. Thanks for the patch. Some minor tweaks: First, put a space between the function name and the formal argument list. So +(defprimitive calculate_dominance_info_unsafe() :void should be +(defprimitive calculate_dominance_info_unsafe () :void Then, please put the defined name on the same line as defprimitive or defun or def...
When consecutive MELT formals have the same ctype, you don't need to repeat it. So +(defprimitive + dominated_by_other_unsafe(:basic_block bbA :basic_block bbB) :long should be +(defprimitive dominated_by_other_unsafe (:basic_block bbA bbB) :long In :doc strings, document when something is a boxed value (the distinction between values and stuff is crucial), so write instead [I added the word boxed, it is important] +(defun get_immediate_dominator (bb) + :doc#{Return the next immediate dominator of the boxed basic_block $BB as a MELT +value.}# Lastly, all debug* operations should output their debug messages to stderr only when flag_melt_debug is set, and should give the MELT source position (because we don't want any debug printing in the usual case when -fmelt-debug is not given to our cc1). Look at debugloop in xtramelt-ana-base.melt for an example (notice that debugeprintfnonl is a C macro printing the MELT source position). So please resubmit a slightly improved patch. Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
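Putting the review comments together, a corrected declaration might look as below. This is a hypothetical sketch only: the :doc wording and the `#{...}#` expansion body are illustrative (the expansion assumes GCC's real `dominated_by_p` function), not taken from the submitted patch.

```lisp
;; Sketch applying the review points: space before the formal list, the
;; defined name on the defprimitive line, the :basic_block ctype written
;; once for both formals, and a :doc string saying the arguments are
;; boxed basic_blocks.
(defprimitive dominated_by_other_unsafe (:basic_block bbA bbB) :long
  :doc #{Return non-zero iff the boxed basic_block $BBA is dominated by
the boxed basic_block $BBB.  Unsafe: dominance info must have been
calculated before.}#
  #{dominated_by_p (CDI_DOMINATORS, $BBA, $BBB)}#)
```

The `_unsafe` suffix follows the patch's own convention of pairing a raw primitive with a safe wrapper that first checks that dominance info is available.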