Re: [patch, mips] Update multilibs for mips-mti-* targets
Steve Ellcey sell...@imgtec.com writes: 2013-05-28 Steve Ellcey sell...@imgtec.com * config/mips/mti-linux.h (SYSROOT_SUFFIX_SPEC): Add micromips and mips16 directories. * config/mips/t-mti-linux (MULTILIB_OPTIONS): Add micromips and mips16. (MULTILIB_DIRNAMES): Ditto. (MULTILIB_EXCEPTIONS): Add new exceptions. * config/mips/t-mti-elf (MULTILIB_OPTIONS): Add micromips. (MULTILIB_DIRNAMES): Ditto. (MULTILIB_EXCEPTIONS): Add new exceptions. OK, thanks. Just one minor suggestion: @@ -32,4 +32,11 @@ MULTILIB_EXCEPTIONS += *mips32*/*mabi=64* # or mips64r2 but does specify mabi=64 is not allowed because that # would be defaulting to the mips32r2 architecture. MULTILIB_EXCEPTIONS += mabi=64* -MULTILIB_EXCEPTIONS += mips16/mabi=64* +MULTILIB_EXCEPTIONS += *mips16/mabi=64* +MULTILIB_EXCEPTIONS += *mmicromips/mabi=64* The new lines here and: # The 64 bit ABI is not supported on the mips32r2 architecture. # Because mips32r2 is the default we can't use that flag to trigger -# the exception so we check for mabi=64 with no specific mips flag -# instead. +# the exception so we check for mabi=64 with no specific mips +# architecture flag instead. MULTILIB_EXCEPTIONS += mabi=64* +MULTILIB_EXCEPTIONS += *mips16/mabi=64* +MULTILIB_EXCEPTIONS += *mmicromips/mabi=64* here aren't really doing what the comment says. The * at the beginning allows any intervening flags, including -mips64 and -mips64r2. The lines are correct for a different reason: we're excluding all 64-bit MIPS16 and microMIPS multilibs, even when a correct -mips option is present. I think it would be clearer if you put these lines in the restrict MIPS16 to 32-bit targets and restrict microMIPS to mips32r2 blocks instead. Changing the order like that shouldn't need a full retest. Richard
Re: [PATCH][2/3] final try: Re-organize -fvect-cost-model, enable vectorization at -O2
On Tue, 28 May 2013, David Edelsohn wrote: Richi, I would greatly appreciate if you would bootstrap and test this on ppc64-linux as well. It's a no-op patch unless you specify -fvect-cost-model=cheap. I'll do a bootstrap and regtest on ppc64-linux for your convenience anyway. I'll record an otherwise positive review for the change from you. Thanks, Richard.
Re: [PATCH 1/2] handwritten part of patch
On Tue, May 28, 2013 at 8:04 PM, David Malcolm dmalc...@redhat.com wrote: On Mon, 2013-05-27 at 15:38 +0200, Richard Biener wrote: On Sat, May 25, 2013 at 3:02 PM, David Malcolm dmalc...@redhat.com wrote: Eliminate all direct references to cfun from macros in basic-block.h and introduce access methods to control_flow_graph * basic-block.h (control_flow_graph::get_basic_block_by_idx): New accessor methods. (control_flow_graph::get_entry_block): New method. (control_flow_graph::get_exit_block): New method. (control_flow_graph::get_basic_block_info): New methods. (control_flow_graph::get_n_basic_blocks): New methods. (control_flow_graph::get_n_edges): New methods. (control_flow_graph::get_last_basic_block): New methods. (control_flow_graph::get_label_to_block_map): New methods. (control_flow_graph::get_profile_status): New method. (control_flow_graph::set_profile_status): New method. (ENTRY_BLOCK_PTR): Eliminate this macro. (EXIT_BLOCK_PTR): Likewise. (basic_block_info): Likewise. (n_basic_blocks): Likewise. (n_edges): Likewise. (last_basic_block): Likewise. (label_to_block_map): Likewise. (profile_status): Likewise. (BASIC_BLOCK): Likewise. (SET_BASIC_BLOCK): Likewise. (FOR_EACH_BB_FN): Rewrite in terms of... (FOR_EACH_BB_CFG): New macro (FOR_EACH_BB): Eliminate this macro (FOR_EACH_BB_REVERSE_FN): Rewrite in terms of... (FOR_EACH_BB_REVERSE_FN_CFG): New macro (FOR_EACH_BB_REVERSE): Eliminate this macro (FOR_ALL_BB): Likewise. (FOR_ALL_BB_CFG): New macro I don't like the mix of _CFG / _FN. It's obvious we are talking about the CFG of a function in 'FOR_EACH_BB' and friends. The current status quo is a pair of macros: #define FOO_FN(FN) ...do something on (FN)-cfg #define FOO() FOO_FN(cfun) My patch changed these to be: #define FOO_CFG(CFG) ...do something on CFG #define FOO_FN(FN) FOO_CFG((FN)-cfg) and to get rid of the FOO() with its cfun usage. So would your preference be for the FOO() to mean the cfg: #define FOO(CFG) ...do something on cfg #define FOO_FN(FN) FOO((FN)-cfg) ? e.g. #define FOR_EACH_BB(BB, CFG) \ FOR_BB_BETWEEN (BB, (CFG)-x_entry_block_ptr-next_bb, (CFG)-x_exit_block_ptr, next_bb) #define FOR_EACH_BB_FN(BB, FN) FOR_EACH_BB_CFG (BB, (FN)-cfg) I suppose the CFG vs. FN distinction is because there is a control-flow-graph sub-structure in a function (OTOH there is at most a single CFG in a FN, so specifying FN is unambiguous). Yes, the un-suffixed variant (FOO) should be the one with the natural interface (taking a CFG). People already know about the _FN variant and if you change all existing uses of FOO (without cfg/fn argument) then that would be the best change. Ideally FN or CFG would be handled transparently via overloading for example (or giving both struct function and struct cfg the same set of member functions), so inline basic_block entry_block_ptr (struct function *fn) { return fn-cfg-x_entry_block_ptr; } inline basic_block entry_block_ptr (struct cfg *) { return cfg-x_entry_block_ptr; } #define FOR_EACH_BB (BB, CONTEXT) FOR_BB_BETWEEN (BB, entry_block_ptr (CONTEXT)-next_bb, exit_block_ptr (CONTEXT), next_bb) and only retain a single macro/function form for context kinds that can be unambiguously interchanged. Or do people think that would be too confusing? (I suppose defining an automatic conversion from struct function * to struct cfg * is also techincally possible, but that only works with the non-member-function way and would avoid the overloading). That said, I have no strong preference to remove the FOO_FN variants together with this patch. Using FOO with cfg arguments and FOO_FN with function arguments is fine for me (eventually as intermediate step). --- diff --git a/gcc/basic-block.h b/gcc/basic-block.h index eed320c..3949417 100644 --- a/gcc/basic-block.h +++ b/gcc/basic-block.h @@ -276,6 +276,57 @@ enum profile_status_d fields of this struct are interpreted as the defines for backward source compatibility following the definition of this struct. */ struct GTY(()) control_flow_graph { +public: + basic_block get_basic_block_by_idx (int idx) const + { +return (*x_basic_block_info)[idx]; + } get_basic_block_by_idx is rather long, any reason to not shorten it to get_block () or get_bb? Will do. get_bb () then + void set_basic_block_by_idx (int idx, basic_block bb) set_block () or set_bb() Will do. + { +(*x_basic_block_info)[idx] = bb; + } + + basic_block get_entry_block () const { return x_entry_block_ptr; } + + basic_block get_exit_block () const { return x_exit_block_ptr; } + + vecbasic_block, va_gc *get_basic_block_info
Re: Fix PR 57442
On Wed, May 29, 2013 at 3:35 AM, Easwaran Raman era...@google.com wrote: I made a thinko when I asserted gcc_unreachable outside the main loop of appears_later_in_bb in my previous fix to PR 57337. It is quite possible for STMT1 to be followed by a one or more statements with the same UID till the end of the BB without encountering STMT2. Right fix is to return STMT1 outside the loop. Ok for trunk? Ok. Thanks, Richard. Thanks, Easwaran 2013-05-28 Easwaran Raman era...@google.com PR tree-optimization/57442 * tree-ssa-reassoc.c (appears_later_in_bb): Return correct value when control exits the main loop. 2013-05-28 Easwaran Raman era...@google.com PR tree-optimization/57442 * gcc.dg/tree-ssa/reassoc-30.c: New testcase. Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c === --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0) @@ -0,0 +1,13 @@ +/* PR tree-optimization/57442 */ +/* { dg-do compile } */ +/* { dg-options -O1 } */ +short a; +unsigned b; +long c; +int d; + +void f(void) +{ +b = a ? : (a = b) - c + (d - (b + b)); +} + Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199385) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2888,7 +2888,7 @@ appears_later_in_bb (gimple stmt1, gimple stmt2) else if (stmt == stmt2) return stmt2; } - gcc_unreachable (); + return stmt1; } /* Find the statement after which STMT must be moved so that the
[Patch, Fortran] Enable FINALization/poly dealloc for allocatables
Dear all, this patch enables finalization (and polymorphic deallocation) for allocatables for: end of scope, DEALLOCATE and intent(out). As a side effect, an allocatable is no longer deallocated at the end of the main program. (Variables declared in the main program have automatically SAVE attribute; before finalization, it made no difference but with finalization it is detectable. And only finalizing nonfinalizable allocatables seems to be too much effort for too little gain.) Note: This patch requires the following patch, which is still pending review: * Enable wrapper generation, http://gcc.gnu.org/ml/fortran/2013-05/msg00073.html http://gcc.gnu.org/ml/fortran/2013-05/msg00093.html Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias PS: Fortran requires additional cases where finalization has to happen; those will be added in follow-up patches. 2013-05-29 Tobias Burnus bur...@net-b.de PR fortran/37336 * trans-array.c (gfc_trans_dealloc_allocated): Support finalization. (structure_alloc_comps): Update caller. (gfc_trans_deferred_array): Call finalizer. * trans-array.h (gfc_trans_dealloc_allocated): Update prototype. * trans-decl.c (gfc_trans_deferred_vars): Don't deallocate/finalize variables of the main program. * trans-expr.c (gfc_conv_procedure_call): Support finalization. * trans-openmp.c (gfc_omp_clause_dtor, gfc_trans_omp_array_reduction): Update calls. * trans-stmt.c (gfc_trans_deallocate): Avoid double deallocation of alloc components. * trans.c (gfc_add_finalizer_call): New function. (gfc_deallocate_with_status, gfc_deallocate_scalar_with_status): Call it (gfc_build_final_call): Fix handling of scalar coarrays. 2013-05-29 Tobias Burnus bur...@net-b.de PR fortran/37336 * gfortran.dg/finalize_12.f90: New. * gfortran.dg/alloc_comp_basics_1.f90: Add BLOCK for end of scope finalization. * gfortran.dg/alloc_comp_constructor_1.f90: Ditto. * gfortran.dg/allocatable_scalar_9.f90: Ditto. * gfortran.dg/auto_dealloc_2.f90: Ditto. * gfortran.dg/class_19.f03: Ditto. * gfortran.dg/coarray_lib_alloc_1.f90: Ditto. * gfortran.dg/coarray_lib_alloc_2.f90: Ditto. * gfortran.dg/extends_14.f03: Ditto. * gfortran.dg/move_alloc_4.f90: Ditto. * gfortran.dg/typebound_proc_27.f03: Ditto. diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index be3a5a0..8160fcd 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -7243,7 +7243,7 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr * expr, bool g77, /* Generate code to deallocate an array, if it is allocated. */ tree -gfc_trans_dealloc_allocated (tree descriptor, bool coarray) +gfc_trans_dealloc_allocated (tree descriptor, bool coarray, gfc_expr *expr) { tree tmp; tree var; @@ -7259,7 +7259,7 @@ gfc_trans_dealloc_allocated (tree descriptor, bool coarray) are already deallocated are ignored. */ tmp = gfc_deallocate_with_status (coarray ? descriptor : var, NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE, true, -NULL, coarray); +expr, coarray); gfc_add_expr_to_block (block, tmp); /* Zero the data pointer. */ @@ -7548,7 +7548,7 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, { comp = fold_build3_loc (input_location, COMPONENT_REF, ctype, decl, cdecl, NULL_TREE); - tmp = gfc_trans_dealloc_allocated (comp, c-attr.codimension); + tmp = gfc_trans_dealloc_allocated (comp, c-attr.codimension, NULL); gfc_add_expr_to_block (tmpblock, tmp); } else if (c-attr.allocatable) @@ -7580,7 +7580,7 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (comp))) tmp = gfc_trans_dealloc_allocated (comp, - CLASS_DATA (c)-attr.codimension); + CLASS_DATA (c)-attr.codimension, NULL); else { tmp = gfc_deallocate_scalar_with_status (comp, NULL_TREE, true, NULL, @@ -8292,7 +8292,7 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block) stmtblock_t cleanup; locus loc; int rank; - bool sym_has_alloc_comp; + bool sym_has_alloc_comp, has_finalizer; sym_has_alloc_comp = (sym-ts.type == BT_DERIVED || sym-ts.type == BT_CLASS) @@ -8379,8 +8379,12 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block) /* Allocatable arrays need to be freed when they go out of scope. The allocatable components of pointers must not be touched. */ - if (sym_has_alloc_comp !(sym-attr.function || sym-attr.result) - !sym-attr.pointer !sym-attr.save) + has_finalizer = sym-ts.type == BT_CLASS || sym-ts.type == BT_DERIVED + ? gfc_is_finalizable (sym-ts.u.derived, NULL) : false; + if ((!sym-attr.allocatable || !has_finalizer) + sym_has_alloc_comp !(sym-attr.function || sym-attr.result) + !sym-attr.pointer !sym-attr.save + !sym-ns-proc_name-attr.is_main_program) { int rank; rank = sym-as ? sym-as-rank : 0; @@
[IA-64] Improve FP comparisons with -fno-trapping-math
Hi, on most platforms the Ada compiler doesn't enable trap-on-FP-exceptions, so it's appropriate to set -fno-trapping-math there, which has been done in http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01461.html However doing so can have an adverse effect if the architecture doesn't have a sufficiently rich set of FP comparison instructions because -fno-trapping-math allows the middle-end and the optimizers to turn the negation of signaling FP comparison operators into quiet FP unordered operators, e.g. not() into != and the quiet unordered operators aren't supported universally. The effect is that you need 2 comparisons instead of just 1 to achieve the desired result. The first (surprising) example is the IA-64, which has ORDERED and UNORDERED but not UNLT/UNLE/UNGT/UNGE because the unordered comparison instructions are signaling (since they are the negation of the signaling ordered comparisons). Therefore the fix is to make it possible to use these unordered comparison instructions, which are signaling, for the quiet unordered operators when the flag -fno-trapping-math is in effect. As a matter of fact, the support was already there (e.g. in ia64_print_operand) but it was rightfully disabled. Tested on IA-64/Linux and IA-64/HP-UX, OK for the mainline? 2013-05-29 Eric Botcazou ebotca...@adacore.com * config/ia64/predicates.md (ia64_cbranch_operator): Accept unordered comparison operators when -fno-trapping-math is in effect. * config/ia64/ia64.c (ia64_expand_compare): Add support for unordered comparison operators in TFmode and assert that unsupported operators cannot reach here. (ia64_print_operand): Likewise. -- Eric BotcazouIndex: config/ia64/predicates.md === --- config/ia64/predicates.md (revision 199343) +++ config/ia64/predicates.md (working copy) @@ -568,9 +568,15 @@ (define_predicate fr_reg_or_0_operand (match_test op == CONST0_RTX (GET_MODE (op)) ;; Return 1 if OP is a valid comparison operator for cbranch instructions. +;; If we're assuming that FP operations cannot generate user-visible traps, +;; then we can use the FP unordered-signaling instructions to implement the +;; FP unordered-quiet comparison predicates. (define_predicate ia64_cbranch_operator - (ior (match_operand 0 ordered_comparison_operator) - (match_code ordered,unordered))) + (if_then_else (match_test flag_trapping_math) + (ior (match_operand 0 ordered_comparison_operator) + (match_code ordered,unordered)) + (and (match_operand 0 comparison_operator) + (not (match_code uneq,ltgt) ;; True if this is a comparison operator, which accepts a normal 8-bit ;; signed immediate operand. Index: config/ia64/ia64.c === --- config/ia64/ia64.c (revision 199343) +++ config/ia64/ia64.c (working copy) @@ -1756,7 +1756,7 @@ ia64_expand_compare (rtx *expr, rtx *op0 else if (TARGET_HPUX GET_MODE (*op0) == TFmode) { enum qfcmp_magic { - QCMP_INV = 1, /* Raise FP_INVALID on SNaN as a side effect. */ + QCMP_INV = 1, /* Raise FP_INVALID on NaNs as a side effect. */ QCMP_UNORD = 2, QCMP_EQ = 4, QCMP_LT = 8, @@ -1770,21 +1770,27 @@ ia64_expand_compare (rtx *expr, rtx *op0 switch (code) { /* 1 = equal, 0 = not equal. Equality operators do - not raise FP_INVALID when given an SNaN operand. */ + not raise FP_INVALID when given a NaN operand. */ case EQ:magic = QCMP_EQ; ncode = NE; break; case NE:magic = QCMP_EQ; ncode = EQ; break; /* isunordered() from C99. */ case UNORDERED: magic = QCMP_UNORD; ncode = NE; break; case ORDERED: magic = QCMP_UNORD; ncode = EQ; break; /* Relational operators raise FP_INVALID when given - an SNaN operand. */ + a NaN operand. */ case LT:magic = QCMP_LT|QCMP_INV; ncode = NE; break; case LE:magic = QCMP_LT|QCMP_EQ|QCMP_INV; ncode = NE; break; case GT:magic = QCMP_GT|QCMP_INV; ncode = NE; break; case GE:magic = QCMP_GT|QCMP_EQ|QCMP_INV; ncode = NE; break; - /* FUTURE: Implement UNEQ, UNLT, UNLE, UNGT, UNGE, LTGT. - Expanders for buneq etc. weuld have to be added to ia64.md - for this to be useful. */ + /* Unordered relational operators do not raise FP_INVALID + when given a NaN operand. */ + case UNLT:magic = QCMP_LT|QCMP_UNORD; ncode = NE; break; + case UNLE:magic = QCMP_LT|QCMP_EQ|QCMP_UNORD; ncode = NE; break; + case UNGT:magic = QCMP_GT|QCMP_UNORD; ncode = NE; break; + case UNGE:magic = QCMP_GT|QCMP_EQ|QCMP_UNORD; ncode = NE; break; + /* Not supported. */ + case UNEQ: + case LTGT: default: gcc_unreachable (); } @@ -5279,6 +5285,9 @@ ia64_print_operand (FILE * file, rtx x, case UNGE: str = nlt;
[rs6000] Improve FP comparisons with -fno-trapping-math
Hi, on most platforms the Ada compiler doesn't enable trap-on-FP-exceptions, so it's appropriate to set -fno-trapping-math there, which has been done in http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01461.html However doing so can have an adverse effect if the architecture doesn't have a sufficiently rich set of FP comparison instructions because -fno-trapping-math allows the middle-end and the optimizers to turn the negation of signaling FP comparison operators into quiet FP unordered operators, e.g. not() into != and the quiet unordered operators aren't supported universally. The effect is that you need 2 comparisons instead of just 1 to achieve the desired result. The second (less surprising) example is the e500 architecture which features FP instructions operating on GP registers. It has neither ORDERED/UNORDERED nor the UNLT/UNLE/UNGT/UNGE operators so, in this case, you get a new call to the libgcc routine implementing UNORDERED in addition to the comparison. The fix is to make it possible to use the (negated form of the) ordered comparison instructions, which are signaling, for the quiet unordered operators when the flag -fno-trapping-math is in effect. As a matter of fact, a partial support was already there (e.g. in rs6000_generate_compare) but it was disabled. Tested on e500v2/Linux and PowerPC/Linux, OK for the mainline? 2013-05-29 Eric Botcazou ebotca...@adacore.com * config/rs6000/predicates.md (rs6000_cbranch_operator): Accept some unordered comparison operators when -fno-trapping-math is in effect on the e500. * config/rs6000/rs6000.c (rs6000_generate_compare): Remove dead code and implement unordered comparison operators properly on the e500. -- Eric BotcazouIndex: config/rs6000/predicates.md === --- config/rs6000/predicates.md (revision 199343) +++ config/rs6000/predicates.md (working copy) @@ -1121,9 +1121,16 @@ (define_predicate branch_comparison_ope GET_MODE (XEXP (op, 0))), 1 +;; Return 1 if OP is a valid comparison operator for cbranch instructions. +;; If we're assuming that FP operations cannot generate user-visible traps, +;; then on e500 we can use the ordered-signaling instructions to implement +;; the unordered-quiet FP comparison predicates modulo a reversal. (define_predicate rs6000_cbranch_operator (if_then_else (match_test TARGET_HARD_FLOAT !TARGET_FPRS) - (match_operand 0 ordered_comparison_operator) + (if_then_else (match_test flag_trapping_math) + (match_operand 0 ordered_comparison_operator) + (ior (match_operand 0 ordered_comparison_operator) + (match_code (unlt,unle,ungt,unge (match_operand 0 comparison_operator))) ;; Return 1 if OP is a comparison operation that is valid for an SCC insn -- Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 199343) +++ config/rs6000/rs6000.c (working copy) @@ -16086,16 +16086,41 @@ rs6000_generate_compare (rtx cmp, enum m { rtx cmp, or_result, compare_result2; enum machine_mode op_mode = GET_MODE (op0); + bool reverse_p; if (op_mode == VOIDmode) op_mode = GET_MODE (op1); + /* First reverse the condition codes that aren't directly supported. */ + switch (code) + { + case NE: + case UNLT: + case UNLE: + case UNGT: + case UNGE: + code = reverse_condition_maybe_unordered (code); + reverse_p = true; + break; + + case EQ: + case LT: + case LE: + case GT: + case GE: + reverse_p = false; + break; + + default: + gcc_unreachable (); + } + /* The E500 FP compare instructions toggle the GT bit (CR bit 1) only. This explains the following mess. */ switch (code) { - case EQ: case UNEQ: case NE: case LTGT: + case EQ: switch (op_mode) { case SFmode: @@ -16121,7 +16146,8 @@ rs6000_generate_compare (rtx cmp, enum m } break; - case GT: case GTU: case UNGT: case UNGE: case GE: case GEU: + case GT: + case GE: switch (op_mode) { case SFmode: @@ -16147,7 +16173,8 @@ rs6000_generate_compare (rtx cmp, enum m } break; - case LT: case LTU: case UNLT: case UNLE: case LE: case LEU: + case LT: + case LE: switch (op_mode) { case SFmode: @@ -16172,24 +16199,16 @@ rs6000_generate_compare (rtx cmp, enum m gcc_unreachable (); } break; + default: gcc_unreachable (); } /* Synthesize LE and GE from LT/GT || EQ. */ - if (code == LE || code == GE || code == LEU || code == GEU) + if (code == LE || code == GE) { emit_insn (cmp); - switch (code) - { - case LE: code = LT; break; - case GE: code = GT; break; - case LEU: code = LT; break; - case GEU: code = GT; break; - default: gcc_unreachable (); - } - compare_result2
Re: [IA-64] Improve FP comparisons with -fno-trapping-math
On Wed, May 29, 2013 at 10:33 AM, Eric Botcazou ebotca...@adacore.com wrote: Hi, on most platforms the Ada compiler doesn't enable trap-on-FP-exceptions, so it's appropriate to set -fno-trapping-math there, which has been done in http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01461.html However doing so can have an adverse effect if the architecture doesn't have a sufficiently rich set of FP comparison instructions because -fno-trapping-math allows the middle-end and the optimizers to turn the negation of signaling FP comparison operators into quiet FP unordered operators, e.g. not() into != and the quiet unordered operators aren't supported universally. The effect is that you need 2 comparisons instead of just 1 to achieve the desired result. The first (surprising) example is the IA-64, which has ORDERED and UNORDERED but not UNLT/UNLE/UNGT/UNGE because the unordered comparison instructions are signaling (since they are the negation of the signaling ordered comparisons). Therefore the fix is to make it possible to use these unordered comparison instructions, which are signaling, for the quiet unordered operators when the flag -fno-trapping-math is in effect. As a matter of fact, the support was already there (e.g. in ia64_print_operand) but it was rightfully disabled. Tested on IA-64/Linux and IA-64/HP-UX, OK for the mainline? Did you check that this doesn't cause traps on SPEC CPU 2000 / 2006 when compiled with -ffast-math (something we generally want to support, even if it is on the border of validity)? Thanks, Richard. 2013-05-29 Eric Botcazou ebotca...@adacore.com * config/ia64/predicates.md (ia64_cbranch_operator): Accept unordered comparison operators when -fno-trapping-math is in effect. * config/ia64/ia64.c (ia64_expand_compare): Add support for unordered comparison operators in TFmode and assert that unsupported operators cannot reach here. (ia64_print_operand): Likewise. -- Eric Botcazou
[PATCH] Fix vect_bb_slp_scalar_cost change
My commit crossed the Cilk+ changes and only its testcases expose that we do not set stmt UIDs on blocks we currently vectorize and thus other vinfo_for_stmt calls have to be properly guarded. Tested on x86_64-unknown-linux-gnu, applied as obvious. Richard. 2013-05-29 Richard Biener rguent...@suse.de * tree-vect-slp.c (vect_bb_slp_scalar_cost): Guard vinfo access on whether the use is in the BB we currently try to vectorize. (vect_bb_vectorization_profitable_p): Pass the BB we currently vectorize to vect_bb_slp_scalar_cost. Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 199402) --- gcc/tree-vect-slp.c (working copy) *** vect_slp_analyze_operations (bb_vec_info *** 1904,1910 update LIFE according to uses of NODE. */ static unsigned ! vect_bb_slp_scalar_cost (slp_tree node, vecbool, va_stack life) { unsigned scalar_cost = 0; unsigned i; --- 1904,1911 update LIFE according to uses of NODE. */ static unsigned ! vect_bb_slp_scalar_cost (basic_block bb, !slp_tree node, vecbool, va_stack life) { unsigned scalar_cost = 0; unsigned i; *** vect_bb_slp_scalar_cost (slp_tree node, *** 1931,1937 imm_use_iterator use_iter; gimple use_stmt; FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) ! if (!vinfo_for_stmt (use_stmt) || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))) { life[i] = true; --- 1932,1938 imm_use_iterator use_iter; gimple use_stmt; FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) ! if (gimple_bb (use_stmt) != bb || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))) { life[i] = true; *** vect_bb_slp_scalar_cost (slp_tree node, *** 1956,1962 } FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) ! scalar_cost += vect_bb_slp_scalar_cost (child, life); return scalar_cost; } --- 1957,1963 } FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) ! scalar_cost += vect_bb_slp_scalar_cost (bb, child, life); return scalar_cost; } *** vect_bb_vectorization_profitable_p (bb_v *** 1995,2001 vecbool, va_stack life; vec_stack_alloc (bool, life, SLP_INSTANCE_GROUP_SIZE (instance)); life.quick_grow_cleared (SLP_INSTANCE_GROUP_SIZE (instance)); ! scalar_cost += vect_bb_slp_scalar_cost (SLP_INSTANCE_TREE (instance), life); life.release (); } --- 1996,2003 vecbool, va_stack life; vec_stack_alloc (bool, life, SLP_INSTANCE_GROUP_SIZE (instance)); life.quick_grow_cleared (SLP_INSTANCE_GROUP_SIZE (instance)); ! scalar_cost += vect_bb_slp_scalar_cost (BB_VINFO_BB (bb_vinfo), ! SLP_INSTANCE_TREE (instance), life); life.release (); }
Re: [IA-64] Improve FP comparisons with -fno-trapping-math
Did you check that this doesn't cause traps on SPEC CPU 2000 / 2006 when compiled with -ffast-math (something we generally want to support, even if it is on the border of validity)? Do you mean SPECfp because some benchmarks might enable trap-on-FP-exceptions and you nevertheless compile them with -ffast-math? -- Eric Botcazou
Re: [PATCH] Fix incorrect discriminator assignment.
Mike Stump m...@mrs.kithrup.com writes: On May 28, 2013, at 3:56 PM, Dehao Chen de...@google.com wrote: The attached patch restricts the test only run on linux-gnu targets. Is it ok? Ok. You can put HAVE_AS_DWARF2_DEBUG_LINE in a comment above it to help explain why the target restriction applies. The idea is, if enough people care, or some x86 target (say, darwin), want to reliably find them all, one can then search for that and fix them all in one go. Right, this does work e.g. on Solaris/x86 with gas, but fails with Sun as or Darwin as. Btw., why is this even x86 specific? At least on Solaris/SPARC with gas, HAVE_AS_DWARF2_DEBUG_LINE *is* defined. And why do you add -gdwarf-2 in dg-options? AFAICS all tests in gcc.dg/debug/dwarf2 are built with -gdwarf-2 anyway. Perhaps you need just { dg-additional-options -O0 } instead? Also, please mention PR testsuite/57413 in the ChangeLog and move the addition of the testcase from gcc/ChangeLog to gcc/testsuite/ChangeLog, removing the testsuite/ prefix in the pathname. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [IA-64] Improve FP comparisons with -fno-trapping-math
On Wed, May 29, 2013 at 11:19 AM, Eric Botcazou ebotca...@adacore.com wrote: Did you check that this doesn't cause traps on SPEC CPU 2000 / 2006 when compiled with -ffast-math (something we generally want to support, even if it is on the border of validity)? Do you mean SPECfp because some benchmarks might enable trap-on-FP-exceptions and you nevertheless compile them with -ffast-math? Yes, SPECfp. And no, I don't think they enable trap-on-FP-exceptions but they may cause exceptional behavior even if -fno-trapping-math is specified (not sure if IA64 masks exceptions on the comparisons?) Richard. -- Eric Botcazou
Re: [x86, PATCH 1/2] Enabling of the new Intel microarchitecture Silvermont
On Tue, May 28, 2013 at 11:12 AM, Zamyatin, Igor igor.zamya...@intel.com wrote: Several weeks ago Intel announced new microarchitecture named Silvermont (see for example http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture ). This patch is the first and it enables Silvermont architecture in the GCC compiler (second patch will add some performance related features). Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk? Thanks, Igor 2013-05-27 Yuri Rumyantsev yuri.s.rumyant...@intel.com Igor Zamyatin igor.zamya...@intel.com Silvermont (SLM) architecture pipeline model, tuning and insn selection. * config.gcc: Add slm config options and target. * config/i386/slm.md: New. * config/i386/driver-i386.c (host_detect_local_cpu): Check movbe. * gcc/config/i386/i386-c.c (ix86_target_macros_internal): New case PROCESSOR_SLM. (ix86_target_macros_internal): Likewise. * gcc/config/i386/i386.c (slm_cost): New cost. (m_SLM): New macro flag. (initial_ix86_tune_features): Set m_SLM. (x86_accumulate_outgoing_args): Likewise. (x86_arch_always_fancy_math_387): Likewise. (processor_target_table): Add slm cost. (cpu_names): Add slm cpu name. (x86_option_override_internal): Set SLM ISA. (ix86_issue_rate): New case PROCESSOR_SLM. (ia32_multipass_dfa_lookahead): Likewise. (fold_builtin_cpu): Add slm. * config/i386/i386.h (TARGET_SLM): New target macro. (target_cpu_default): Add TARGET_CPU_DEFAULT_slm. (processor_type): Add PROCESSOR_SLM. * config/i386/i386.md (cpu): Add new value slm. (slm.md): Include slm.md. * libgcc/config/i386/cpuinfo.c (INTEL_SLM): New enum value. This is OK for mainline. Thanks, Uros.
Re: [IA-64] Improve FP comparisons with -fno-trapping-math
Yes, SPECfp. And no, I don't think they enable trap-on-FP-exceptions but they may cause exceptional behavior even if -fno-trapping-math is specified (not sure if IA64 masks exceptions on the comparisons?) The manual says that QNaNs signal Invalid for the signaling FP comparison operators and that you need to enable the Invalid Operation trap to get a fault from Invalid. The IEEE standard also says that comparisons by way of unordered signaling predicates raise the invalid operation exception, just like the regular operations. That being said, I can double-check. -- Eric Botcazou
Re: [PATCH] Do not allow non-top-level BIT_FIELD_REFs, IMAGPART_EXPRs or REALPART_EXPRs
Hi, On Tue, May 28, 2013 at 03:40:25PM +0200, Richard Biener wrote: On Tue, 28 May 2013, Martin Jambor wrote: I've committed it s revision 199379, thanks. As far as the non-top-levelness is concerned, the following (on top of the previous patch) also survives bootstrap and testsuite on x86_64 (all languages including Ada and Obj-C++). Do you think it would be acceptable as well? With the following minor adjustment: Thanks, Martin 2013-05-27 Martin Jambor mjam...@suse.cz * tree-cfg.c (verify_expr): Verify that BIT_FIELD_REF, REALPART_EXPR and IMAGPART_EXPR do not occur within other handled_components. Index: src/gcc/tree-cfg.c === --- src.orig/gcc/tree-cfg.c +++ src/gcc/tree-cfg.c @@ -2675,6 +2675,33 @@ verify_expr (tree *tp, int *walk_subtree return t; } + if (TREE_CODE (t) == BIT_FIELD_REF) + { + if (!host_integerp (TREE_OPERAND (t, 1), 1) + || !host_integerp (TREE_OPERAND (t, 2), 1)) + { + error (invalid position or size operand to BIT_FIELD_REF); + return t; + } + if (INTEGRAL_TYPE_P (TREE_TYPE (t)) + (TYPE_PRECISION (TREE_TYPE (t)) + != TREE_INT_CST_LOW (TREE_OPERAND (t, 1 + { + error (integral result type precision does not match +field size of BIT_FIELD_REF); + return t; + } + else if (!INTEGRAL_TYPE_P (TREE_TYPE (t)) + TYPE_MODE (TREE_TYPE (t)) != BLKmode + (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (t))) + != TREE_INT_CST_LOW (TREE_OPERAND (t, 1 + { + error (mode precision of non-integral result does not +match field size of BIT_FIELD_REF); + return t; + } + } + t = TREE_OPERAND (t, 0); here instead of ... /* Fall-through. */ case COMPONENT_REF: case ARRAY_REF: @@ -2697,35 +2724,16 @@ verify_expr (tree *tp, int *walk_subtree if (TREE_OPERAND (t, 3)) CHECK_OP (3, invalid array stride); } - else if (TREE_CODE (t) == BIT_FIELD_REF) - { - if (!host_integerp (TREE_OPERAND (t, 1), 1) - || !host_integerp (TREE_OPERAND (t, 2), 1)) - { - error (invalid position or size operand to BIT_FIELD_REF); - return t; - } - if (INTEGRAL_TYPE_P (TREE_TYPE (t)) - (TYPE_PRECISION (TREE_TYPE (t)) - != TREE_INT_CST_LOW (TREE_OPERAND (t, 1 - { - error (integral result type precision does not match -field size of BIT_FIELD_REF); - return t; - } - else if (!INTEGRAL_TYPE_P (TREE_TYPE (t)) - !AGGREGATE_TYPE_P (TREE_TYPE (t)) - TYPE_MODE (TREE_TYPE (t)) != BLKmode - (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (t))) - != TREE_INT_CST_LOW (TREE_OPERAND (t, 1 - { - error (mode precision of non-integral result does not -match field size of BIT_FIELD_REF); - return t; - } - } t = TREE_OPERAND (t, 0); + if (TREE_CODE (t) == BIT_FIELD_REF + || TREE_CODE (t) == REALPART_EXPR + || TREE_CODE (t) == IMAGPART_EXPR) + { + error (non-top-level BIT_FIELD_REF, IMAGPART_EXPR or +REALPART_EXPR); + return t; + } ... doing this after t = TREE_OPERAND (t, 0) (so, do it before). All right, I have committed the following as revision 199405. Thanks, Martin 2013-05-29 Martin Jambor mjam...@suse.cz * tree-cfg.c (verify_expr): Verify that BIT_FIELD_REF, REALPART_EXPR and IMAGPART_EXPR do not occur within other handled_components. Index: src/gcc/tree-cfg.c === --- src.orig/gcc/tree-cfg.c +++ src/gcc/tree-cfg.c @@ -2675,6 +2675,34 @@ verify_expr (tree *tp, int *walk_subtree return t; } + if (TREE_CODE (t) == BIT_FIELD_REF) + { + if (!host_integerp (TREE_OPERAND (t, 1), 1) + || !host_integerp (TREE_OPERAND (t, 2), 1)) + { + error (invalid position or size operand to BIT_FIELD_REF); + return t; + } + if (INTEGRAL_TYPE_P (TREE_TYPE (t)) + (TYPE_PRECISION (TREE_TYPE (t)) + != TREE_INT_CST_LOW (TREE_OPERAND (t, 1 + { + error (integral result type precision does not match +field size of BIT_FIELD_REF); + return t; + } + else if (!INTEGRAL_TYPE_P (TREE_TYPE (t)) + TYPE_MODE (TREE_TYPE
Re: [ada, build] host/target configuration
Hi! On Tue, 28 May 2013 09:28:28 +0200, I wrote: On Wed, 08 May 2013 11:27:09 +0200, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: As described in [PR ada/57188], amd64-pc-solaris2.1[01] Ada bootstrap was failing for some time. It has turned out that this patch is the culprit: 2013-04-23 Eric Botcazou ebotca...@adacore.com Pascal Obryo...@adacore.com * gcc-interface/Makefile.in (targ): Fix target name check. diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in --- a/gcc/ada/gcc-interface/Makefile.in +++ b/gcc/ada/gcc-interface/Makefile.in @@ -259,7 +259,7 @@ TOOLS_LIBS = targext.o link.o ../../ggc- # manufacturer, and operating system and assign each of those to its own # variable. host:=$(subst -, ,$(host_canonical)) -targ:=$(subst -, ,$(target)) +targ:=$(subst -, ,$(subst -gnu, ,$(target_alias))) arch:=$(word 1,$(targ)) ifeq ($(words $(targ)),2) manu:= |osys:=$(word 2,$(targ)) | else |manu:=$(word 2,$(targ)) |osys:=$(word 3,$(targ)) | endif I couldn't find the gcc-patches posting for this patch, thus I'm missing the rationale for it. It seems rather counterintuitive and fragile to me, replacing the canonical $target by the far more varied $target_alias. I concur, and this has now caused confusion for the (pending upstream re-submission) x86 GNU/Hurd port, too, for which, upon removing -gnu from the target of i686-pc-gnu0.3, now a mere 0.3 remains for osys... If there's really a good reason to keep that patch nonetheless, [...] How about we use something like the following [...] patch? In essence, replace the manual parsing in gcc/ada/gcc-interface/Makefile.in by using the values the configure script already computed for us. This is largely straightforward (I would hope); please especially review the more involved cases such as the one where I'm now using linux%eabi -- for example, does that do the right thing or interfere with the Android configurations? The target_cpu_canonical substitution has been added in commit 369e542b3ad1c0acfa9bfaeb72b338d8db5ba2ef (2009-02-27, r144463, schwab) but unused ever since, thus removed. I'll now be testing for x86 GNU/Linux and GNU/Hurd; further testing appreciated. For these two configurations, I have now successfully tested the patch I posted. Further review/testing appreciated. (Adding »build machinery (*.in)« maintainers.) Other than rebuilding from scratch, how do I rebuild only the affected Ada/GNAT bits after regenerating gcc/ada/gcc-interface/Makefile? It doesn't just work, and »make clean-target-libada clean-gnattools« doesn't help either? gcc/ada/ * gcc-interface/Makefile.in (target_alias, host_canonical) (target_cpu_default): Don't substitute. (target_cpu, target_vendor, target_os, host_cpu, host_vendor) (host_os): Substitute. (host, targ, arch, manu, osys): Don't set, and replace their usage with the newly substituted target_* and host_* variables. diff --git gcc/ada/gcc-interface/Makefile.in gcc/ada/gcc-interface/Makefile.in index eeb8c7f..c07722b 100644 --- gcc/ada/gcc-interface/Makefile.in +++ gcc/ada/gcc-interface/Makefile.in @@ -151,12 +151,15 @@ GCC_CFLAGS = $(INTERNAL_CFLAGS) $(T_CFLAGS) $(CFLAGS) program_transform_name = objdir = . -target_alias=@target_alias@ target=@target@ +target_cpu=@target_cpu@ +target_vendor=@target_vendor@ +target_os=@target_os@ +host_cpu=@host_cpu@ +host_vendor=@host_vendor@ +host_os=@host_os@ xmake_file = @xmake_file@ tmake_file = @tmake_file@ -host_canonical=@host@ -target_cpu_default=@target_cpu_default@ #version=`sed -e 's/.*\\([^ \]*\)[ \].*/\1/' $(srcdir)/version.c` #mainversion=`sed -e 's/.*\\([0-9]*\.[0-9]*\).*/\1/' $(srcdir)/version.c` @@ -255,20 +258,6 @@ TOOLS_LIBS = targext.o link.o ../../ggc-none.o ../../libcommon-target.a \ ../../../libbacktrace/.libs/libbacktrace.a ../../../libiberty/libiberty.a \ $(SYSLIBS) $(TGT_LIB) -# Convert the target variable into a space separated list of architecture, -# manufacturer, and operating system and assign each of those to its own -# variable. -host:=$(subst -, ,$(host_canonical)) -targ:=$(subst -, ,$(subst -gnu, ,$(target_alias))) -arch:=$(word 1,$(targ)) -ifeq ($(words $(targ)),2) - manu:= - osys:=$(word 2,$(targ)) -else - manu:=$(word 2,$(targ)) - osys:=$(word 3,$(targ)) -endif - # Specify the directories to be searched for header files. # Both . and srcdir are used, in that order, # so that tm.h and config.h will be found in the compilation @@ -280,7 +269,7 @@ ADA_INCLUDES = -I- -I. -I$(srcdir)/ada # Likewise, but valid for subdirectories of the current dir. # FIXME: for VxWorks, we cannot add $(fsrcdir) because the regs.h file in # that directory conflicts with a system header file. -ifneq ($(findstring vxworks,$(osys)),) +ifneq ($(findstring
RE: [PATCH,i386] FP Reassociation for AMD bdver1 and bdver2
Thanks Uros! Committed at r199405. -Ganesh -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Thursday, May 23, 2013 4:47 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH,i386] FP Reassociation for AMD bdver1 and bdver2 On Thu, May 23, 2013 at 1:11 PM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: The patch enables FP Reassociation pass AMD bdver1 and bdver2 architectures. We note a performance uplift of around ~8% on calculix. make -k check passes. Is it OK for upstream? OK. Thanks, Uros.
RE: [PATCH,i386] Default alignment for AMD BD and BT
Hi We want this to be backported to GCC48 branch. Please approve. Regards Ganesh -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Tuesday, May 07, 2013 6:22 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH,i386] Default alignment for AMD BD and BT On Tue, May 7, 2013 at 9:16 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: The patch updates the alignment values for AMD BD and BT architectures. make -k check passes. Is it OK for upstream? 2013-05-07 Ganesh Gopalasubramanian ganesh.gopalasubraman...@amd.com * config/i386/i386.c (processor_target_table): Modified default alignment values for AMD BD and BT architectures. The value 11 indeed looks a bit weird, but it means: align to 16 byte boundary only if this can be done by skipping 10 bytes or less. Tha patch is OK for mainline. Thanks, Uros.
[PING 2] [PATCH RX] Added target specific macros for RX100, RX200, and RX600
Hi, Below patch added target specific macros for RX100, RX200, and RX600 http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00050.html Please review the patch and let me know if there should be any modifications in it? Regards, Sandeep Kumar Singh, KPIT Cummins InfoSystems Ltd. Pune, India
Re: [PING]RE: [patch] cilkplus: Array notation for C patch
Iyer, Balaji V balaji.v.i...@intel.com writes: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Richard Henderson Sent: Tuesday, May 28, 2013 2:52 PM To: Iyer, Balaji V Cc: Jakub Jelinek; Aldy Hernandez; Jeff Law; 'Joseph S. Myers'; 'gcc-patches' Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch On 05/28/2013 11:44 AM, Iyer, Balaji V wrote: i Richard, Jakub et al.. I think I have fixed everything requested by RTH (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01400.html). I think I have also moved the tests in the correct place Jakub requested. It is passing all the correct regression tests and not affecting others. Is this patch OK for trunk? Yes, it's ok. This patch is committed to trunk at revision 199389. ... and immediately broke Solaris bootstrap, cf. PR bootstrap/57450. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [Patch, Fortran] Enable the generation of the FINALization wrapper function
Hi Tobias, Small re-diff - but essentially unchanged. (I made a thinko when adding a _final call to gfc_trans_class_array_init_assign: Not in all contexts the _final should be called, only for INTENT(OUT). Thus, I remove the _final call and deferred it to the actual finalization call. [That also matches the scalar handling, which only does a memcpy and no dealloc.]) Build and regtested on x86-64-gnu-linux. OK for the trunk? I think this patch is ok. Just one nit: @@ -5571,7 +5569,7 @@ gfc_dump_module (const char *name, int dump_flag) FIXME: For backwards compatibility with the old uncompressed module format, write an extra empty line. When the module version is bumped, this can be removed. */ - gzprintf (module_fp, GFORTRAN module version '%s' created from %s\n\n, + gzprintf (module_fp, GFORTRAN module version '%s' created from %s\n, MOD_VERSION, gfc_source_file); Here you should remove the FIXME. Thanks for the patch, Janus Tobias Burnus wrote: Pre-remark: This patch does *not* enable finalization or polymorphic deallocation. * * * Dear all, The attached patch is a bit boring and invasive, but it paves the way to FINAL support. Changes of technical kind: * Changed ABI for CLASS's virtual table (due to _final) - and, hence, it bumps the .mod version * The finalization wrapper is now generated (this should not but might lead to ICEs) * It also causes that the virtual table is now more often generated New feature: _copy no longer deallocates the dst argument. Doing so lead to bogus finalization with ALLOCATE (exposed with the pending FINAL patch). As a sideeffect, memset could be removed and CALLOC could be replased by MALLOC (minute performance advantage). In order to keep the deallocation in gfc_trans_class_array_init_assign, there is now a call to the finalization wrapper. Next steps: * Add end-of-scope/intent(out) deallocation for polymorphic arrays * Enable FINAL parsing * Stepwise enabling for polymorphic deallocation/finalization * Fix issues with ELEMENTAL(+optional) with intent(out) * Fix some issues related to intrinsic assignment * Fix fallout of any of those items Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias
Fix PR57268
Hi, I noticed that the scheduler created long dependence list about ~9000 elements long and slowed compilation time for about an hour. Attached patch flushes the dependence list is case of it is longer than MAX_PENDING_LIST_LENGTH. Tested with gcc testsite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar. 2013-05-28 Dinar Temirbulatov di...@kugelworks.com PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence list then it is longer than MAX_PENDING_LIST_LENGTH. fix.patch Description: Binary data
Re: Fix PR57268
Hello, On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote: Hi, I noticed that the scheduler created long dependence list about ~9000 elements long and slowed compilation time for about an hour. Attached patch flushes the dependence list is case of it is longer than MAX_PENDING_LIST_LENGTH. Tested with gcc testsite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar. 2013-05-28 Dinar Temirbulatov dinar at kugelworks dot com PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence list then it is longer than MAX_PENDING_LIST_LENGTH. * sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. if (!deps-readonly) - add_insn_mem_dependence (deps, true, insn, x); + { + if ((deps-pending_read_list_length + deps-pending_write_list_length) +MAX_PENDING_LIST_LENGTH) + flush_pending_lists (deps, insn, true, true); +add_insn_mem_dependence (deps, true, insn, x); +} The flush_pending_lists, add_insn_mem_dependence and } lines are not indented correctly. The if (...+...) line is too long (max. 80 characters per line). The GCC style would be if (!deps-readonly) { if ((deps-pending_read_list_length + deps-pending_write_list_length) MAX_PENDING_LIST_LENGTH) flush_pending_lists (deps, insn, true, true); add_insn_mem_dependence (deps, true, insn, x); } (The aesthetics of GCC code style is a matter for debate, but not here and now ;-) Ciao! Steven
[PATCH, AArch64] Re-organize aarch64_classify_symbol
This patch re-organizes the implementation of aarch64_classify_symbol in preparation for add tiny absolute memory model support. Regressed aarch64-none-elf, applied. /Marcus 2013-05-29 Chris Schlumberger-Socha chris.schlumberger-so...@arm.com Marcus Shawcroft marcus.shawcr...@arm.com * config/aarch64/aarch64.c (aarch64_classify_symbol): Remove comment. Refactor if/switch. Replace gcc_assert with if.
Re: [PING 2] [PATCH RX] Added target specific macros for RX100, RX200, and RX600
Hi Sandeep, Below patch added target specific macros for RX100, RX200, and RX600 http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00050.html Please review the patch and let me know if there should be any modifications in it? Sorry - I missed your original posting. The patch is fine, please apply. Cheers Nick
[PATCH, AArch64] Support --mcmodel=tiny.
Hi, This patch adds support for the tiny absolute memory model. Regressed for aarch64-none-elf with each of -mcmodel=tiny -mcmodel=small -mcmodel=small -fPIC Applied. /Marcus 2012-05-29 Chris Schlumberger-Socha chris.schlumberger-so...@arm.com Marcus Shawcroft marcus.shawcr...@arm.com * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Define SYMBOL_TINY_ABSOLUTE. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle SYMBOL_TINY_ABSOLUTE. (aarch64_expand_mov_immediate): Likewise. (aarch64_classify_symbol): Likewise. (aarch64_mov_operand_p): Remove ATTRIBUTE_UNUSED. Permit SYMBOL_TINY_ABSOLUTE. * config/aarch64/predicates.md (aarch64_mov_operand): Permit CONST.
Re: [PATCH, AArch64] Support --mcmodel=tiny.
On 29/05/13 14:08, Marcus Shawcroft wrote: Hi, This patch adds support for the tiny absolute memory model. Regressed for aarch64-none-elf with each of -mcmodel=tiny -mcmodel=small -mcmodel=small -fPIC Applied. This time with patch attached, oops, sorry. /Marcus /Marcus 2012-05-29 Chris Schlumberger-Socha chris.schlumberger-so...@arm.com Marcus Shawcroft marcus.shawcr...@arm.com * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Define SYMBOL_TINY_ABSOLUTE. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle SYMBOL_TINY_ABSOLUTE. (aarch64_expand_mov_immediate): Likewise. (aarch64_classify_symbol): Likewise. (aarch64_mov_operand_p): Remove ATTRIBUTE_UNUSED. Permit SYMBOL_TINY_ABSOLUTE. * config/aarch64/predicates.md (aarch64_mov_operand): Permit CONST. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 91fcde8..bdb6b04 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -81,6 +81,7 @@ enum aarch64_symbol_type SYMBOL_SMALL_TLSDESC, SYMBOL_SMALL_GOTTPREL, SYMBOL_SMALL_TPREL, + SYMBOL_TINY_ABSOLUTE, SYMBOL_FORCE_TO_MEM }; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index cbe7847..074eb1c 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -524,6 +524,10 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, return; } +case SYMBOL_TINY_ABSOLUTE: + emit_insn (gen_rtx_SET (Pmode, dest, imm)); + return; + case SYMBOL_SMALL_GOT: { rtx tmp_reg = dest; @@ -826,6 +830,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) case SYMBOL_SMALL_TPREL: case SYMBOL_SMALL_ABSOLUTE: + case SYMBOL_TINY_ABSOLUTE: aarch64_load_symref_appropriately (dest, imm, sty); return; @@ -5030,6 +5035,8 @@ aarch64_classify_symbol (rtx x, case AARCH64_CMODEL_TINY_PIC: case AARCH64_CMODEL_TINY: + return SYMBOL_TINY_ABSOLUTE; + case AARCH64_CMODEL_SMALL_PIC: case AARCH64_CMODEL_SMALL: return SYMBOL_SMALL_ABSOLUTE; @@ -5051,6 +5058,10 @@ aarch64_classify_symbol (rtx x, switch (aarch64_cmodel) { case AARCH64_CMODEL_TINY: + if (SYMBOL_REF_WEAK (x)) + return SYMBOL_FORCE_TO_MEM; + return SYMBOL_TINY_ABSOLUTE; + case AARCH64_CMODEL_SMALL: if (SYMBOL_REF_WEAK (x)) return SYMBOL_FORCE_TO_MEM; @@ -6444,10 +6455,9 @@ aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode ATTRIBUTE_UNUSED) bool aarch64_mov_operand_p (rtx x, - enum aarch64_symbol_context context ATTRIBUTE_UNUSED, + enum aarch64_symbol_context context, enum machine_mode mode) { - if (GET_CODE (x) == HIGH aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0 return true; @@ -6458,7 +6468,8 @@ aarch64_mov_operand_p (rtx x, if (GET_CODE (x) == SYMBOL_REF mode == DImode CONSTANT_ADDRESS_P (x)) return true; - return false; + return aarch64_classify_symbolic_expression (x, context) +== SYMBOL_TINY_ABSOLUTE; } /* Return a const_int vector of VAL. */ diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 16c4385..3248f61 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -166,7 +166,7 @@ }) (define_predicate aarch64_mov_operand - (and (match_code reg,subreg,mem,const_int,symbol_ref,high) + (and (match_code reg,subreg,mem,const,const_int,symbol_ref,label_ref,high) (ior (match_operand 0 register_operand) (ior (match_operand 0 memory_operand) (match_test aarch64_mov_operand_p (op, SYMBOL_CONTEXT_ADR, mode))
Re: [Patch, Fortran] Enable the generation of the FINALization wrapper function
Janus Weil wrote: Build and regtested on x86-64-gnu-linux. OK for the trunk? I think this patch is ok. Just one nit: @@ -5571,7 +5569,7 @@ gfc_dump_module (const char *name, int dump_flag) FIXME: For backwards compatibility with the old uncompressed module format, write an extra empty line. When the module version is bumped, this can be removed. */ - gzprintf (module_fp, GFORTRAN module version '%s' created from %s\n\n, + gzprintf (module_fp, GFORTRAN module version '%s' created from %s\n, MOD_VERSION, gfc_source_file); Here you should remove the FIXME. I thought I had done this - but seemingly I missed that one. DONE. Dominique was that brave and tested the posted patches. Doing so, he found that the finalization wrapper causes an ICE with -m32. The reason is that I use gfc_convert_type to convert the kind value of result variable of the RANK intrinsic. It turned out that calling that for a no op conversion leads to an ICE (internal error). (The conversion is supposed to convert the default-kind result value to the gfc_array_index_kind, otherwise one runs into an ICE at tree level.) Well, the solution is simple: Only convert it when needed. Hence, I included the following patch in the committal (Rev. 199409). --- a/gcc/fortran/class.c +++ b/gcc/fortran/class.c @@ -1644 +1644,2 @@ generate_finalization_wrapper (gfc_symbol *derived, gfc_namespace *ns, - gfc_convert_type (rank, idx-ts, 2); + if (rank-ts.kind != idx-ts.kind) +gfc_convert_type_warn (rank, idx-ts, 2, 0); Thanks to both of you for the review and the testing. * * * Now the important ingredients for finalization are all in, the next task is to actually do finalization. The first patch has been posted, but it will take a while until all finalization calls will be available. Tobias
RE: [PING 2] [PATCH RX] Added target specific macros for RX100, RX200, and RX600
Hi Nick, The patch is fine, please apply. Thanks for the review. I do not have write approvals to gcc-svn. Could you please commit this for me? Regards, Sandeep Kumar Singh, KPIT Cummins InfoSystems Ltd. Pune, India -Original Message- From: nick clifton [mailto:ni...@redhat.com] Sent: Wednesday, May 29, 2013 6:38 PM To: Sandeep Kumar Singh; gcc-patches@gcc.gnu.org Cc: Kaushik Phatak; Cecilia Rodrigues Subject: Re: [PING 2] [PATCH RX] Added target specific macros for RX100, RX200, and RX600 Hi Sandeep, Below patch added target specific macros for RX100, RX200, and RX600 http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00050.html Please review the patch and let me know if there should be any modifications in it? Sorry - I missed your original posting. The patch is fine, please apply. Cheers Nick
Re: [ada, build] host/target configuration
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Il 29/05/2013 12:50, Thomas Schwinge ha scritto: How about we use something like the following [...] patch? In essence, replace the manual parsing in gcc/ada/gcc-interface/Makefile.in by using the values the configure script already computed for us. This is largely straightforward (I would hope); please especially review the more involved cases such as the one where I'm now using linux%eabi -- for example, does that do the right thing or interfere with the Android configurations? The target_cpu_canonical substitution has been added in commit 369e542b3ad1c0acfa9bfaeb72b338d8db5ba2ef (2009-02-27, r144463, schwab) but unused ever since, thus removed. I'll now be testing for x86 GNU/Linux and GNU/Hurd; further testing appreciated. For these two configurations, I have now successfully tested the patch I posted. Further review/testing appreciated. (Adding »build machinery (*.in)« maintainers.) I agree that using $(target_os) and linux% matches is a better way to strip out the -gnus. The patch is long, but I think I reviewed it carefully enough. It's okay for mainline. Regarding the android change: -ifeq ($(strip $(filter-out arm%-linux,$(arch)-$(osys)) $(if $(findstring eabi,$(word 4,$(targ))),,$(word 4,$(targ,) +ifeq ($(strip $(filter-out arm% linux%eabi,$(target_cpu) $(target_os))),) This is okay, it will match arm-none-linux-androideabi both before and after (perhaps linux-%eabi would be more readable). However, it seems that the first androideabi snippet was dead code. Can you delete it in a follow-up? Paolo -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJRpg7xAAoJEBvWZb6bTYbyEKgP/RI1fU8fEdzH1UK+3BG8sqgF DSoK0mo25PBIGHYT8p4Qe2a9tE2fZys93NHufPmQnVQ1M/YqmB5JyspCoe0mqke0 fQUMI8DpV0f+g9MquIZFDYrceR4qGEaFZ9KlnHGcpErA5zs1Wb7Shj6SoWt4I3b3 2dQy61hD7DcVe1xctG349xU6Hc2j34VCk8axUcpFCm9iD6UBB8GAUmZCk6egkIJr BWs9anasAhmh1E4aEOjGFvhWQcB23pmxSkIIk5wSkKBHZpHURIwXqOUmK1I0ebme j7ee2+9SzSwk0emY4lBdDg14byVSvRMKnMDpFdIfHTJvYEqm6ha8jeNxBKqM/xvk pz7AX1jwfsX22QSWiZuqcqqYBR0y7qVA0/5rjyFitLB85PY+yIfe5BNH4d5KdeuF yDuss+n6vZrYCnyRpk0P4VbLQ5gnrtqDhxTKPABf3eMNFI6iyyadRh0qLQVrnyX5 qmUciuZTMouC4rAx1H/oAAHZbPmeHWS6cH7ggO0WlS5Us/Ws/s3OdFUePygXhQpL xpOgqjSaDhE6s4PQAZkX79EOqvtEJynX/gGi3NTyZey/Fsp9n5JFperjAUfPG/Bt 2Cu7tDmcXLKlNKeaL7IuLppVPmVicfXTUAnYemWja78ijbbKLeD2W5OGzsfxTMuV QoH7oqV2NqSIayTAmYg6 =Sh4D -END PGP SIGNATURE-
Re: Fix PR57268
On 05/29/2013 06:52 AM, Steven Bosscher wrote: Hello, On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote: Hi, I noticed that the scheduler created long dependence list about ~9000 elements long and slowed compilation time for about an hour. Attached patch flushes the dependence list is case of it is longer than MAX_PENDING_LIST_LENGTH. Tested with gcc testsite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar. 2013-05-28 Dinar Temirbulatov dinar at kugelworks dot com PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence list then it is longer than MAX_PENDING_LIST_LENGTH. * sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. if (!deps-readonly) - add_insn_mem_dependence (deps, true, insn, x); + { + if ((deps-pending_read_list_length + deps-pending_write_list_length) +MAX_PENDING_LIST_LENGTH) + flush_pending_lists (deps, insn, true, true); +add_insn_mem_dependence (deps, true, insn, x); +} The flush_pending_lists, add_insn_mem_dependence and } lines are not indented correctly. The if (...+...) line is too long (max. 80 characters per line). The GCC style would be if (!deps-readonly) { if ((deps-pending_read_list_length + deps-pending_write_list_length) MAX_PENDING_LIST_LENGTH) flush_pending_lists (deps, insn, true, true); add_insn_mem_dependence (deps, true, insn, x); } (The aesthetics of GCC code style is a matter for debate, but not here and now ;-) And just to be clear, with Steven's suggested changes, this patch is OK. jeff
[PATCH] Fix PR57441 (memory error in SLSR)
Hi, Prior to adding conditional candidates to SLSR, the size of the increment vector was bounded by the number of related candidates, and the vector was malloc'd to be that size. With conditional candidates, there can be an additional increment for each related PHI operand, causing potential overflow and heap corruption. This patch removes the no longer valid assumption and allocates a fixed-size vector. Bootstrap and regtest in progress on powerpc64-linux-gnu. Ok for trunk if everything checks out? Thanks, Bill 2013-05-29 Bill Schmidt wschm...@linux.vnet.ibm.com * gimple-ssa-strength-reduction.c (analyze_candidates_and_replace): Don't limit size of incr_vec to number of candidates. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 199406) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -3361,7 +3361,6 @@ analyze_candidates_and_replace (void) less expensive to calculate than the replaced statements. */ else { - int length; enum machine_mode mode; bool speed; @@ -3372,14 +3371,11 @@ analyze_candidates_and_replace (void) /* If all candidates have already been replaced under other interpretations, nothing remains to be done. */ - length = count_candidates (c); - if (!length) + if (!count_candidates (c)) continue; - if (length MAX_INCR_VEC_LEN) - length = MAX_INCR_VEC_LEN; /* Construct an array of increments for this candidate chain. */ - incr_vec = XNEWVEC (incr_info, length); + incr_vec = XNEWVEC (incr_info, MAX_INCR_VEC_LEN); incr_vec_len = 0; record_increments (c);
Re: Fix PR 53743 and other -freorder-blocks-and-partition failures
On Thu, May 23, 2013 at 6:18 AM, Teresa Johnson tejohn...@google.com wrote: On Wed, May 22, 2013 at 2:05 PM, Teresa Johnson tejohn...@google.com wrote: Revised patch included below. The spacing of my pasted in patch text looks funky again, let me know if you want the patch as an attachment instead. I addressed all of Steven's comments, except for the suggestion to use gcc_assert instead of error() in verify_hot_cold_block_grouping() to keep this consistent with the rest of the verify_flow_info subroutines (let me know if this is ok). I fixed this issue too, which was actually in insert_section_boundary_note(), so that it gcc_asserts more efficiently as suggested. Retested, latest patch below. Honza, would you be able to review the patch? Ping. Still needs a global maintainer to review and approve. Also, I submitted a PR for the debug range issue: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57451 Thanks! Teresa Thanks! Teresa The other main changes: (1) Added several test cases (cloned from the torture subdirectories, where I manually built/ran with FDO and -freorder-blocks-and-partition with both the current trunk and my fixed trunk compiler, and was able to expose some failures I fixed. (2) Changed existing tree-prof tests that used -freorder-blocks-and-partition to be built with -O2 instead of -O, so that partitioning actually kicks in. (3) Fixed a couple of failures in the new verify_hot_cold_block_grouping() checks exposed by the torture tests I ran manually with splitting (2 of the tests cloned to tree-prof in this patch). One was in computed goto where we were too aggressive about cloning crossing edges, and the other was in rtl_split_edge called from the stack pass which was not correctly inserting the new bb in the correct partition since bb layout is complete at that point. Re-tested on x86_64-unknown-linux-gnu with bootstrap and profiledbootstrap builds and regression testing. Re-built/ran cpu2006int with profile feedback and -freorder-blocks-and-partition enabled. Ok for trunk? Thanks! Teresa 2013-05-23 Teresa Johnson tejohn...@google.com * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert as this is now done by redirect_edge_and_branch_force. * function.c (thread_prologue_and_epilogue_insns): Insert new bb after barriers, and fix interaction with splitting. * emit-rtl.c (try_split): Copy REG_CROSSING_JUMP notes. * cfgcleanup.c (try_forward_edges): Fix early return value to properly reflect changes made in the routine. * bb-reorder.c (emit_barrier_after_bb): Move to cfgrtl.c. (fix_up_fall_thru_edges): Remove incorrect check for bb layout order since this is called in cfglayout mode, and replace partition fixup with assert as that is now done by force_nonfallthru_and_redirect. (add_reg_crossing_jump_notes): Handle the fact that some jumps may already be marked with region crossing note. (insert_section_boundary_note): Make non-static, gate on flag has_bb_partition, rewrite to also check for multiple partitions. (rest_of_handle_reorder_blocks): Remove call to insert_section_boundary_note, now done later during free_cfg. (duplicate_computed_gotos): Don't duplicate partition crossing edge. * bb-reorder.h (insert_section_boundary_note): Declare. * Makefile.in (cfgrtl.o): Depend on bb-reorder.h * cfgrtl.c (rest_of_pass_free_cfg): If partitions exist invoke insert_section_boundary_note. (try_redirect_by_replacing_jump): Remove unnecessary check for region crossing note. (fixup_partition_crossing): New function. (rtl_redirect_edge_and_branch): Fixup partition boundaries. (emit_barrier_after_bb): Move here from bb-reorder.c, handle insertion in non-cfglayout mode. (force_nonfallthru_and_redirect): Fixup partition boundaries, remove old code that tried to do this. Emit barrier correctly when we are in cfglayout mode. (last_bb_in_partition): New function. (rtl_split_edge): Correctly fixup partition boundaries. (commit_one_edge_insertion): Remove old code that tried to fixup region crossing edge since this is now handled in split_block, and set up insertion point correctly since block may now end in a jump. (verify_hot_cold_block_grouping): Guard against checking when not in linearized RTL mode. (rtl_verify_edges): Add checks for incorrect/missing REG_CROSSING_JUMP notes. (rtl_verify_flow_info_1): Move verify_hot_cold_block_grouping to rtl_verify_flow_info, so not called in cfglayout mode. (rtl_verify_flow_info): Move verify_hot_cold_block_grouping here. (fixup_reorder_chain): Remove old code that attempted to fixup region crossing note as this is now handled in force_nonfallthru_and_redirect. (duplicate_insn_chain): Don't duplicate switch section notes. (rtl_can_remove_branch_p):
Re: [PING]RE: [patch] cilkplus: Array notation for C patch
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Iyer, Balaji V balaji.v.i...@intel.com writes: [...] This patch is committed to trunk at revision 199389. ... and immediately broke Solaris bootstrap, cf. PR bootstrap/57450. Fixed implementing Richard's suggestion from the PR. Bootstrapped without regressions on i386-pc-solaris2.10 and x86_64-unknown-linux-gnu, installed as obvious. Rainer 2013-05-29 Rainer Orth r...@cebitec.uni-bielefeld.de PR bootstrap/57450 * c-array-notation.c (length_mismatch_in_expr_p): Use absu_hwi. (build_array_notation_expr): Likewise. # HG changeset patch # Parent 22d5dfeeb2006caf8be6803106b454a19cfe6513 Fix c/c-array-notation.c compilation failure (PR bootstrap/57450) diff --git a/gcc/c/c-array-notation.c b/gcc/c/c-array-notation.c --- a/gcc/c/c-array-notation.c +++ b/gcc/c/c-array-notation.c @@ -116,7 +116,7 @@ length_mismatch_in_expr_p (location_t lo { l_node = int_cst_value (list[ii][jj]); l_start = int_cst_value (start); - if (abs (l_start) != abs (l_node)) + if (absu_hwi (l_start) != absu_hwi (l_node)) { error_at (loc, length mismatch in expression); return true; @@ -1561,7 +1561,7 @@ build_array_notation_expr (location_t lo HOST_WIDE_INT r_length = int_cst_value (rhs_length[0][0]); /* Length can be negative or positive. As long as the magnitude is OK, then the array notation is valid. */ - if (abs (l_length) != abs (r_length)) + if (absu_hwi (l_length) != absu_hwi (r_length)) { error_at (location, length mismatch between LHS and RHS); pop_stmt_list (an_init); -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix PR57441 (memory error in SLSR)
On Wed, May 29, 2013 at 4:50 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, Prior to adding conditional candidates to SLSR, the size of the increment vector was bounded by the number of related candidates, and the vector was malloc'd to be that size. With conditional candidates, there can be an additional increment for each related PHI operand, causing potential overflow and heap corruption. This patch removes the no longer valid assumption and allocates a fixed-size vector. Bootstrap and regtest in progress on powerpc64-linux-gnu. Ok for trunk if everything checks out? Ok with referencing the PR in the changelog and adding the testcase. Thanks, Richard. Thanks, Bill 2013-05-29 Bill Schmidt wschm...@linux.vnet.ibm.com * gimple-ssa-strength-reduction.c (analyze_candidates_and_replace): Don't limit size of incr_vec to number of candidates. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 199406) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -3361,7 +3361,6 @@ analyze_candidates_and_replace (void) less expensive to calculate than the replaced statements. */ else { - int length; enum machine_mode mode; bool speed; @@ -3372,14 +3371,11 @@ analyze_candidates_and_replace (void) /* If all candidates have already been replaced under other interpretations, nothing remains to be done. */ - length = count_candidates (c); - if (!length) + if (!count_candidates (c)) continue; - if (length MAX_INCR_VEC_LEN) - length = MAX_INCR_VEC_LEN; /* Construct an array of increments for this candidate chain. */ - incr_vec = XNEWVEC (incr_info, length); + incr_vec = XNEWVEC (incr_info, MAX_INCR_VEC_LEN); incr_vec_len = 0; record_increments (c);
RE: [PING]RE: [patch] cilkplus: Array notation for C patch
Balaji, Thanks for this new feature and I am relieved that so much of it works successfully on PowerLinux and AIX. I know that you have received a deluge of reports of issues with the cilkplus support that you slowly are working through. I am seeing the following new testsuite failures on AIX: /nasfarm/dje/src/src/gcc/testsuite/c-c++-common/cilk-plus/AN/sec_implicit2.c: In function 'main2': /nasfarm/dje/src/src/gcc/testsuite/c-c++-common/cilk-plus/AN/sec_implicit2.c:23:15: error: __sec_implicit_index parameter must be an integer constant expression /nasfarm/dje/src/src/gcc/testsuite/c-c++-common/cilk-plus/AN/sec_reduce_max_min_ind.c: In function 'main2': /nasfarm/dje/src/src/gcc/testsuite/c-c++-common/cilk-plus/AN/sec_reduce_max_min_ind.c:23:28: error: __sec_reduce_min_ind or __sec_reduce_max_ind cannot have arrays with dimension greater than 1 /nasfarm/dje/src/src/gcc/testsuite/c-c++-common/cilk-plus/AN/sec_reduce_max_min_ind.c:27:28: error: __sec_reduce_min_ind or __sec_reduce_max_ind cannot have arrays with dimension greater than 1 I hope that you can help resolve these errors. Thanks, David
Re: [PATCH] Fix incorrect discriminator assignment.
Patch updated and committed as r199412. Thanks, Dehao
Re: [rs6000] Improve FP comparisons with -fno-trapping-math
* config/rs6000/predicates.md (rs6000_cbranch_operator): Accept some unordered comparison operators when -fno-trapping-math is in effect on the e500. * config/rs6000/rs6000.c (rs6000_generate_compare): Remove dead code and implement unordered comparison operators properly on the e500. I'm okay with most of the change, but I have a question: What happened to the unsigned comparisons and LTGT? I'm not going to insist, but this probably deserves an e500-specific testcase that it's generating the correct results and not calling libgcc for unordered comparisons. Thanks, David
Re: [PING]RE: [patch] cilkplus: Array notation for C patch
On Tue, May 28, 2013 at 5:35 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, May 28, 2013 at 1:02 PM, Iyer, Balaji V balaji.v.i...@intel.com wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Richard Henderson Sent: Tuesday, May 28, 2013 2:52 PM To: Iyer, Balaji V Cc: Jakub Jelinek; Aldy Hernandez; Jeff Law; 'Joseph S. Myers'; 'gcc-patches' Subject: Re: [PING]RE: [patch] cilkplus: Array notation for C patch On 05/28/2013 11:44 AM, Iyer, Balaji V wrote: i Richard, Jakub et al.. I think I have fixed everything requested by RTH (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01400.html). I think I have also moved the tests in the correct place Jakub requested. It is passing all the correct regression tests and not affecting others. Is this patch OK for trunk? Yes, it's ok. This patch is committed to trunk at revision 199389. Thanks, On Linux/x32, I got FAIL: c-c++-common/cilk-plus/AN/if_test.c -O0 -fcilkplus execution test FAIL: c-c++-common/cilk-plus/AN/if_test.c -fcilkplus -O0 -std=c99 execution test FAIL: c-c++-common/cilk-plus/AN/if_test.c -fcilkplus -g -O0 -std=c99 execution test FAIL: c-c++-common/cilk-plus/AN/if_test.c -fcilkplus -g -std=c99 execution test FAIL: c-c++-common/cilk-plus/AN/if_test.c -fcilkplus -std=c99 execution test FAIL: c-c++-common/cilk-plus/AN/if_test.c -fcilkplus execution test FAIL: c-c++-common/cilk-plus/AN/if_test.c -g -O0 -fcilkplus execution test FAIL: c-c++-common/cilk-plus/AN/if_test.c -g -fcilkplus execution test [x32@gnu-35 gcc]$ /export/gnu/import/git/gcc-test-x32/bld/gcc/xgcc -B/export/gnu/import/git/gcc-test-x32/bld/gcc/ /export/gnu/import/git/gcc-test-x32/src-trunk/gcc/testsuite/c-c++-common/cilk-plus/AN/if_test.c -fno-diagnostics-show-caret -fdiagnostics-color=never -fcilkplus -g -O0 -std=c99 -fcilkplus -lm -m32 -o ./if_test.exe [x32@gnu-35 gcc]$ /export/gnu/import/git/gcc-test-x32/bld/gcc/xgcc -B/export/gnu/import/git/gcc-test-x32/bld/gcc/ /export/gnu/import/git/gcc-test-x32/src-trunk/gcc/testsuite/c-c++-common/cilk-plus/AN/if_test.c -fno-diagnostics-show-caret -fdiagnostics-color=never -fcilkplus -g -O0 -std=c99 -fcilkplus -lm -mx32 -o ./if_test.exe [x32@gnu-35 gcc]$ ./if_test.exe [x32@gnu-35 gcc]$ echo $? 5 [x32@gnu-35 gcc]$ (gdb) r Starting program: /export/gnu/import/git/gcc-test-x32/bld/gcc/testsuite/gcc/if_test.exe Breakpoint 1, main2 (argc=3, argv=0xd240) at /export/gnu/import/git/gcc-test-x32/src-trunk/gcc/testsuite/c-c++-common/cilk-plus/AN/if_test.c:131 131 return 5; Missing separate debuginfos, use: debuginfo-install glibc-2.16-30.1.fc18.x32 (gdb) p ii $1 = 0 (gdb) p array2_check $2 = {5, 5, 5, 5, 5, 5, 5, 5, 5, 5} (gdb) p array2 $3 = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10} (gdb) Does cilkplus assume ptr_mode == word_mode? On x32, ptr_mode == SImode and word_mode == DImode. I opened: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57452 There is a bug in the testcase. The problem is the second element in array notation is not the size of array and GCC doesn't detect the problem. -- H.J.
Re: Fix PR57268
* sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote: On 05/29/2013 06:52 AM, Steven Bosscher wrote: Hello, On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote: Hi, I noticed that the scheduler created long dependence list about ~9000 elements long and slowed compilation time for about an hour. Attached patch flushes the dependence list is case of it is longer than MAX_PENDING_LIST_LENGTH. Tested with gcc testsite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar. 2013-05-28 Dinar Temirbulatov dinar at kugelworks dot com PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence list then it is longer than MAX_PENDING_LIST_LENGTH. * sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. if (!deps-readonly) - add_insn_mem_dependence (deps, true, insn, x); + { + if ((deps-pending_read_list_length + deps-pending_write_list_length) +MAX_PENDING_LIST_LENGTH) + flush_pending_lists (deps, insn, true, true); +add_insn_mem_dependence (deps, true, insn, x); +} The flush_pending_lists, add_insn_mem_dependence and } lines are not indented correctly. The if (...+...) line is too long (max. 80 characters per line). The GCC style would be if (!deps-readonly) { if ((deps-pending_read_list_length + deps-pending_write_list_length) MAX_PENDING_LIST_LENGTH) flush_pending_lists (deps, insn, true, true); add_insn_mem_dependence (deps, true, insn, x); } (The aesthetics of GCC code style is a matter for debate, but not here and now ;-) And just to be clear, with Steven's suggested changes, this patch is OK. jeff fix1.patch Description: Binary data
Re: [PATCH] Fix BB vectorization cost model wrt scalar uses
The patch breaks compiling Open MPI with default compilation flags (for that file: -O3 -fno-strict-aliasing): ../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:100:6: internal compiler error: in operator[], at vec.h:815 bool mca_btl_tcp_frag_send(mca_btl_tcp_frag_t* frag, int sd) See PR 57453 for a reduced test case. Tobias Richard Biener wrote: This properly accounts for remaining unvectorized scalar uses of stmts we vectorize during BB vectorization. Previously we happily retained all of the scalar loads and computations, just optimizing the final stores, alongside the vectorized code. To account for scalar uses properly the following patch simply makes the scalar cost cheaper when stmts have to be kept live. Support for vectorizing those by using vector extracts is currently missing. It happens quite frequently that CSE looks through the final scalar global store, re-using the stored value and thus keeping the whole computation sequence live. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Richard. 2013-05-28 Richard Biener rguent...@suse.de * tree-vect-slp.c (vect_bb_slp_scalar_cost): New function computing scalar cost offsetted by stmts that are kept live by scalar uses. (vect_bb_vectorization_profitable_p): Use vect_bb_slp_scalar_cost for computation of scalar cost. * gcc.dg/vect/bb-slp-32.c: New testcase. Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 199375) --- gcc/tree-vect-slp.c (working copy) *** vect_slp_analyze_operations (bb_vec_info *** 1898,1903 --- 1898,1966 return true; } + + /* Compute the scalar cost of the SLP node NODE and its children +and return it. Do not account defs that are marked in LIFE and +update LIFE according to uses of NODE. */ + + static unsigned + vect_bb_slp_scalar_cost (slp_tree node, vecbool, va_stack life) + { + unsigned scalar_cost = 0; + unsigned i; + gimple stmt; + slp_tree child; + + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt) + { + unsigned stmt_cost; + ssa_op_iter op_iter; + def_operand_p def_p; + stmt_vec_info stmt_info; + + if (life[i]) + continue; + + /* If there is a non-vectorized use of the defs then the scalar + stmt is kept live in which case we do not account it or any +required defs in the SLP children in the scalar cost. This +way we make the vectorization more costly when compared to +the scalar cost. */ + FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, op_iter, SSA_OP_DEF) + { + imm_use_iterator use_iter; + gimple use_stmt; + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) + if (!vinfo_for_stmt (use_stmt) + || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))) + { + life[i] = true; + BREAK_FROM_IMM_USE_STMT (use_iter); + } + } + if (life[i]) + continue; + + stmt_info = vinfo_for_stmt (stmt); + if (STMT_VINFO_DATA_REF (stmt_info)) + { + if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))) + stmt_cost = vect_get_stmt_cost (scalar_load); + else + stmt_cost = vect_get_stmt_cost (scalar_store); + } + else + stmt_cost = vect_get_stmt_cost (scalar_stmt); + + scalar_cost += stmt_cost; + } + + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) + scalar_cost += vect_bb_slp_scalar_cost (child, life); + + return scalar_cost; + } + /* Check if vectorization of the basic block is profitable. */ static bool *** vect_bb_vectorization_profitable_p (bb_v *** 1931,1956 } /* Calculate scalar cost. */ ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (si)) { ! stmt = gsi_stmt (si); ! stmt_info = vinfo_for_stmt (stmt); ! ! if (!stmt_info || !STMT_VINFO_VECTORIZABLE (stmt_info) ! || !PURE_SLP_STMT (stmt_info)) ! continue; ! ! if (STMT_VINFO_DATA_REF (stmt_info)) ! { ! if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))) ! stmt_cost = vect_get_stmt_cost (scalar_load); ! else ! stmt_cost = vect_get_stmt_cost (scalar_store); ! } ! else ! stmt_cost = vect_get_stmt_cost (scalar_stmt); ! ! scalar_cost += stmt_cost; } /* Complete the target-specific cost calculation. */ --- 1994,2007 } /* Calculate scalar cost. */ ! FOR_EACH_VEC_ELT (slp_instances, i, instance) { ! vecbool, va_stack life; ! vec_stack_alloc (bool, life, SLP_INSTANCE_GROUP_SIZE (instance)); ! life.quick_grow_cleared (SLP_INSTANCE_GROUP_SIZE (instance)); ! scalar_cost += vect_bb_slp_scalar_cost (SLP_INSTANCE_TREE (instance), !
Re: Fix PR57268
Here is the corrected version of change. Also, I think, I need write-after-approval access to commit the change. thanks, Dinar, PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote: On 05/29/2013 06:52 AM, Steven Bosscher wrote: Hello, On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote: Hi, I noticed that the scheduler created long dependence list about ~9000 elements long and slowed compilation time for about an hour. Attached patch flushes the dependence list is case of it is longer than MAX_PENDING_LIST_LENGTH. Tested with gcc testsite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar. 2013-05-28 Dinar Temirbulatov dinar at kugelworks dot com PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence list then it is longer than MAX_PENDING_LIST_LENGTH. * sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. if (!deps-readonly) - add_insn_mem_dependence (deps, true, insn, x); + { + if ((deps-pending_read_list_length + deps-pending_write_list_length) +MAX_PENDING_LIST_LENGTH) + flush_pending_lists (deps, insn, true, true); +add_insn_mem_dependence (deps, true, insn, x); +} The flush_pending_lists, add_insn_mem_dependence and } lines are not indented correctly. The if (...+...) line is too long (max. 80 characters per line). The GCC style would be if (!deps-readonly) { if ((deps-pending_read_list_length + deps-pending_write_list_length) MAX_PENDING_LIST_LENGTH) flush_pending_lists (deps, insn, true, true); add_insn_mem_dependence (deps, true, insn, x); } (The aesthetics of GCC code style is a matter for debate, but not here and now ;-) And just to be clear, with Steven's suggested changes, this patch is OK. jeff
[GOOGLE] Unrestrict early inline restrictions for AutoFDO
In gcc4-8, the max einline iterations are restricted to 1. For AutoFDO, this is bad because early inline is not size restricted. This patch allows einline to do multiple iterations in AutoFDO. It also enables tracer optimization in AutoFDO. Bootstrapped and passed regression test. OK for googel-4_8? Thanks, Dehao Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c (revision 199416) +++ gcc/ipa-inline.c (working copy) @@ -2161,7 +2161,8 @@ early_inliner (void) { /* We iterate incremental inlining to get trivial cases of indirect inlining. */ - while (iterations PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) + while ((flag_auto_profile + || iterations PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) early_inline_small_functions (node)) { timevar_push (TV_INTEGRATION); Index: gcc/opts.c === --- gcc/opts.c (revision 199416) +++ gcc/opts.c (working copy) @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, opts-x_flag_peel_loops = value; if (!opts_set-x_flag_value_profile_transformations) opts-x_flag_value_profile_transformations = value; + if (!opts_set-x_flag_tracer) + opts-x_flag_tracer = value; if (!opts_set-x_flag_inline_functions) opts-x_flag_inline_functions = value; if (!opts_set-x_flag_ipa_cp)
Re: [PING 2] [PATCH RX] Added target specific macros for RX100, RX200, and RX600
Hi Sandeep, The patch is fine, please apply. Thanks for the review. I do not have write approvals to gcc-svn. Could you please commit this for me? Done. Cheers Nick
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
The early inlining part is ok. The tracer optimization should be revisited -- we should have more fine grain control on it (for instance, based on FDO summary -- but that should be common to FDO/LIPO). David On Wed, May 29, 2013 at 9:39 AM, Dehao Chen de...@google.com wrote: In gcc4-8, the max einline iterations are restricted to 1. For AutoFDO, this is bad because early inline is not size restricted. This patch allows einline to do multiple iterations in AutoFDO. It also enables tracer optimization in AutoFDO. Bootstrapped and passed regression test. OK for googel-4_8? Thanks, Dehao Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c (revision 199416) +++ gcc/ipa-inline.c (working copy) @@ -2161,7 +2161,8 @@ early_inliner (void) { /* We iterate incremental inlining to get trivial cases of indirect inlining. */ - while (iterations PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) + while ((flag_auto_profile + || iterations PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) early_inline_small_functions (node)) { timevar_push (TV_INTEGRATION); Index: gcc/opts.c === --- gcc/opts.c (revision 199416) +++ gcc/opts.c (working copy) @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, opts-x_flag_peel_loops = value; if (!opts_set-x_flag_value_profile_transformations) opts-x_flag_value_profile_transformations = value; + if (!opts_set-x_flag_tracer) + opts-x_flag_tracer = value; if (!opts_set-x_flag_inline_functions) opts-x_flag_inline_functions = value; if (!opts_set-x_flag_ipa_cp)
[PATCH] ARM: Don't clobber CC reg when it is live after the peephole window
Hi All, This patch fixes a bug in one of the ARM peephole2 optimizations. The peephole2 optimization in question was changed to use the CC-updating form for all of the instructions produced by the peephole so that the encoding will be smaller when compiling for thumb [1]. However, I don't think that is always safe. For example, the CC register might be used by something *after* the peephole window. The current peephole will transform: (insn:TI 7 49 18 2 (set (reg:CC 24 cc) (compare:CC (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136]) (const_int 0 [0]))) repro.c:5 212 {*arm_cmpsi_insn} (nil)) (insn:TI 18 7 11 2 (cond_exec (ne (reg:CC 24 cc) (const_int 0 [0])) (set (reg:SI 3 r3 [140]) (const_int 0 [0]))) repro.c:8 3366 {*p *arm_movsi_vfp} (expr_list:REG_EQUIV (const_int 0 [0]) (nil))) (insn 11 18 19 2 (cond_exec (eq (reg:CC 24 cc) (const_int 0 [0])) (set (reg:SI 3 r3 [138]) (const_int 1 [0x1]))) repro.c:6 3366 {*p *arm_movsi_vfp} (expr_list:REG_EQUIV (const_int 1 [0x1]) (nil))) (insn:TI 19 11 12 2 (cond_exec (ne (reg:CC 24 cc) (const_int 0 [0])) (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32]) (reg:SI 3 r3 [140]))) repro.c:8 3366 {*p *arm_movsi_vfp} (expr_list:REG_DEAD (reg/f:SI 2 r2 [143]) (nil))) (insn:TI 12 19 22 2 (cond_exec (eq (reg:CC 24 cc) (const_int 0 [0])) (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32]) (reg:SI 3 r3 [138]))) repro.c:6 3366 {*p *arm_movsi_vfp} (nil)) (insn:TI 22 12 58 2 (cond_exec (ne (reg:CC 24 cc) (const_int 0 [0])) (set (mem:QI (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) [0 *endname_1(D)+0 S1 A8]) (reg:QI 3 r3 [140]))) repro.c:9 3115 {*p *arm_movqi_insn} (expr_list:REG_DEAD (reg:CC 24 cc) (expr_list:REG_DEAD (reg:QI 3 r3 [140]) (expr_list:REG_DEAD (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) (nil) into the following: (insn 59 49 60 2 (parallel [ (set (reg:CC 24 cc) (compare:CC (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136]) (const_int 0 [0]))) (set (reg:SI 1 r1) (minus:SI (reg:SI 3 r3 [orig:136 *endname_1(D) ] [136]) (const_int 0 [0]))) ]) repro.c:6 -1 (nil)) (insn 60 59 61 2 (parallel [ (set (reg:CC 24 cc) (compare:CC (const_int 0 [0]) (reg:SI 1 r1))) (set (reg:SI 3 r3 [140]) (minus:SI (const_int 0 [0]) (reg:SI 1 r1))) ]) repro.c:6 -1 (nil)) (insn 61 60 19 2 (parallel [ (set (reg:SI 3 r3 [140]) (plus:SI (plus:SI (reg:SI 3 r3 [140]) (reg:SI 1 r1)) (geu:SI (reg:CC 24 cc) (const_int 0 [0] (clobber (reg:CC 24 cc)) ]) repro.c:6 -1 (nil)) (insn:TI 19 61 12 2 (cond_exec (ne (reg:CC 24 cc) (const_int 0 [0])) (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32]) (reg:SI 3 r3 [140]))) repro.c:8 3366 {*p *arm_movsi_vfp} (nil)) (insn:TI 12 19 22 2 (cond_exec (eq (reg:CC 24 cc) (const_int 0 [0])) (set (mem/c:SI (reg/f:SI 2 r2 [143]) [2 atend+0 S4 A32]) (reg:SI 3 r3 [138]))) repro.c:6 3366 {*p *arm_movsi_vfp} (expr_list:REG_DEAD (reg/f:SI 2 r2 [143]) (nil))) (insn:TI 22 12 58 2 (cond_exec (ne (reg:CC 24 cc) (const_int 0 [0])) (set (mem:QI (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) [0 *endname_1(D)+0 S1 A8]) (reg:QI 3 r3 [140]))) repro.c:9 3115 {*p *arm_movqi_insn} (expr_list:REG_DEAD (reg:CC 24 cc) (expr_list:REG_DEAD (reg:QI 3 r3 [140]) (expr_list:REG_DEAD (reg/v/f:SI 0 r0 [orig:135 endname ] [135]) (nil) This gets compiled into the incorrect sequence: ldrbr3, [r0, #0] ldr r2, .L4 subsr1, r3, #0 rsbsr3, r1, #0 adcsr3, r3, r1 strne r3, [r2, #0] streq r3, [r2, #0] strneb r3, [r0, #0] The conditional stores are now dealing with an incorrect condition state. This patch fixes the problem by ensuring that the CC reg is dead after the peephole window for the current peephole definition and falls back on the original pre-PR46975 peephole when it is live. Unfortunately I had trouble coming up with a reproduction case against mainline. I only noticed the bug while working with some local changes that
ping: [patch] Support .eh_frame in crt1 x86_64 glibc (PR libgcc/57280, libc/15407)
Hi, [patch update] Support .eh_frame in crt1 x86_64 glibc (PR libgcc/57280, libc/15407) http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00775.html Message-ID: 20130514191244.ga12...@host2.jankratochvil.net Thanks, Jan
Re: Fix PR57268
On 30/05/2013, at 4:36 AM, Dinar Temirbulatov wrote: Here is the corrected version of change. Also, I think, I need write-after-approval access to commit the change. thanks, Dinar, PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. Jeff, Could you approve write-after-approval privileges for Dinar and start the procedure to create a gcc/sourceware account for him? Dinar has been working on the GNU toolchain since 2004 and now he will be more active in upstream development. Thank you, -- Maxim Kuvyrkov KugelWorks
[Patch, Fortran] PR57456 - Handle ALLOCATE with typespec for CLASS
Currently, ALLOCATE ignores the typespec for arrays. Such that: ALLOCATE (t2 :: var(5)) will allocate as much memory as the base type requires instead of using as much as t2 does. I explicitly exclude characters as it otherwise will fail for allocate_with_typespec_1.f90, which uses: allocate(character :: c1(1)) The problem is that gfc_typenode_for_spec will return an array type and not an element type, hence TYPE_SIZE_UNIT won't work. The current version is fine, except for deferred-length strings. To properly handle it, one has to do it as gfortran currently does for scalars. (Best by consolidating the support. See PR.) As I want to work on other things first, I would like to get this in as band aid - until someone has the time to do it properly. (I found it when trying to write a test case for the already submitted final patch.) Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-05-29 Tobias Burnus bur...@net-b.de PR fortran/37336 * trans-array.c (gfc_trans_dealloc_allocated): Support finalization. (structure_alloc_comps): Update caller. (gfc_trans_deferred_array): Call finalizer. * trans-array.h (gfc_trans_dealloc_allocated): Update prototype. * trans-decl.c (gfc_trans_deferred_vars): Don't deallocate/finalize variables of the main program. * trans-expr.c (gfc_conv_procedure_call): Support finalization. * trans-openmp.c (gfc_omp_clause_dtor, gfc_trans_omp_array_reduction): Update calls. * trans-stmt.c (gfc_trans_deallocate): Avoid double deallocation of alloc components. * trans.c (gfc_add_finalizer_call): New function. (gfc_deallocate_with_status, gfc_deallocate_scalar_with_status): Call it (gfc_build_final_call): Fix handling of scalar coarrays. 2013-05-29 Tobias Burnus bur...@net-b.de PR fortran/37336 * gfortran.dg/finalize_12.f90: New. * gfortran.dg/alloc_comp_basics_1.f90: Add BLOCK for end of scope finalization. * gfortran.dg/alloc_comp_constructor_1.f90: Ditto. * gfortran.dg/allocatable_scalar_9.f90: Ditto. * gfortran.dg/auto_dealloc_2.f90: Ditto. * gfortran.dg/class_19.f03: Ditto. * gfortran.dg/coarray_lib_alloc_1.f90: Ditto. * gfortran.dg/coarray_lib_alloc_2.f90: Ditto. * gfortran.dg/extends_14.f03: Ditto. * gfortran.dg/move_alloc_4.f90: Ditto. * gfortran.dg/typebound_proc_27.f03: Ditto. diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index be3a5a0..8160fcd 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -7243,7 +7243,7 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr * expr, bool g77, /* Generate code to deallocate an array, if it is allocated. */ tree -gfc_trans_dealloc_allocated (tree descriptor, bool coarray) +gfc_trans_dealloc_allocated (tree descriptor, bool coarray, gfc_expr *expr) { tree tmp; tree var; @@ -7259,7 +7259,7 @@ gfc_trans_dealloc_allocated (tree descriptor, bool coarray) are already deallocated are ignored. */ tmp = gfc_deallocate_with_status (coarray ? descriptor : var, NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE, true, -NULL, coarray); +expr, coarray); gfc_add_expr_to_block (block, tmp); /* Zero the data pointer. */ @@ -7548,7 +7548,7 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, { comp = fold_build3_loc (input_location, COMPONENT_REF, ctype, decl, cdecl, NULL_TREE); - tmp = gfc_trans_dealloc_allocated (comp, c-attr.codimension); + tmp = gfc_trans_dealloc_allocated (comp, c-attr.codimension, NULL); gfc_add_expr_to_block (tmpblock, tmp); } else if (c-attr.allocatable) @@ -7580,7 +7580,7 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl, if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (comp))) tmp = gfc_trans_dealloc_allocated (comp, - CLASS_DATA (c)-attr.codimension); + CLASS_DATA (c)-attr.codimension, NULL); else { tmp = gfc_deallocate_scalar_with_status (comp, NULL_TREE, true, NULL, @@ -8292,7 +8292,7 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block) stmtblock_t cleanup; locus loc; int rank; - bool sym_has_alloc_comp; + bool sym_has_alloc_comp, has_finalizer; sym_has_alloc_comp = (sym-ts.type == BT_DERIVED || sym-ts.type == BT_CLASS) @@ -8379,8 +8379,12 @@ gfc_trans_deferred_array (gfc_symbol * sym, gfc_wrapped_block * block) /* Allocatable arrays need to be freed when they go out of scope. The allocatable components of pointers must not be touched. */ - if (sym_has_alloc_comp !(sym-attr.function || sym-attr.result) - !sym-attr.pointer !sym-attr.save) + has_finalizer = sym-ts.type == BT_CLASS || sym-ts.type == BT_DERIVED + ? gfc_is_finalizable (sym-ts.u.derived, NULL) : false; + if ((!sym-attr.allocatable || !has_finalizer) + sym_has_alloc_comp !(sym-attr.function || sym-attr.result) + !sym-attr.pointer !sym-attr.save + !sym-ns-proc_name-attr.is_main_program) { int rank;
Re: [Patch, Fortran] PR57456 - Handle ALLOCATE with typespec for CLASS
Now as bonus with the proper patch. Tobias PS: I really wonder why Thunderbird's attach file dialog shows an outdated directory content, unless one hits F5, if one opens the dialog again :-( Tobias Burnus wrote: Currently, ALLOCATE ignores the typespec for arrays. Such that: ALLOCATE (t2 :: var(5)) will allocate as much memory as the base type requires instead of using as much as t2 does. I explicitly exclude characters as it otherwise will fail for allocate_with_typespec_1.f90, which uses: allocate(character :: c1(1)) The problem is that gfc_typenode_for_spec will return an array type and not an element type, hence TYPE_SIZE_UNIT won't work. The current version is fine, except for deferred-length strings. To properly handle it, one has to do it as gfortran currently does for scalars. (Best by consolidating the support. See PR.) As I want to work on other things first, I would like to get this in as band aid - until someone has the time to do it properly. (I found it when trying to write a test case for the already submitted final patch.) Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-05-29 Tobias Burnus bur...@net-b.de PR fortran/57456 * trans-array.c (gfc_array_init_size): Use passed type spec, when available. (gfc_array_allocate): Pass typespec on. * trans-array.h (gfc_array_allocate): Update prototype. * trans-stmt.c (gfc_trans_allocate): Pass typespec on. 2013-05-29 Tobias Burnus bur...@net-b.de PR fortran/57456 * gfortran.dg/class_array_17.f90: New. diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index be3a5a0..b0748b7 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -4834,7 +4834,8 @@ static tree gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset, gfc_expr ** lower, gfc_expr ** upper, stmtblock_t * pblock, stmtblock_t * descriptor_block, tree * overflow, - tree expr3_elem_size, tree *nelems, gfc_expr *expr3) + tree expr3_elem_size, tree *nelems, gfc_expr *expr3, + gfc_typespec *ts) { tree type; tree tmp; @@ -4834,7 +4834,8 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset, tmp = TYPE_SIZE_UNIT (tmp); } } + else if (ts-type != BT_UNKNOWN ts-type != BT_CHARACTER) +/* FIXME: Properly handle characters. See PR 57456. */ +tmp = TYPE_SIZE_UNIT (gfc_typenode_for_spec (ts)); else tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type)); @@ -5081,7 +5084,7 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset, bool gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg, tree errlen, tree label_finish, tree expr3_elem_size, - tree *nelems, gfc_expr *expr3) + tree *nelems, gfc_expr *expr3, gfc_typespec *ts) { tree tmp; tree pointer; @@ -5166,7 +5169,7 @@ gfc_array_allocate (gfc_se * se, gfc_expr * expr, tree status, tree errmsg, size = gfc_array_init_size (se-expr, ref-u.ar.as-rank, ref-u.ar.as-corank, offset, lower, upper, se-pre, set_descriptor_block, overflow, - expr3_elem_size, nelems, expr3); + expr3_elem_size, nelems, expr3, ts); if (dimension) { diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h index 6f44d79..d00e156 100644 --- a/gcc/fortran/trans-array.h +++ b/gcc/fortran/trans-array.h @@ -24,7 +24,7 @@ tree gfc_array_deallocate (tree, tree, tree, tree, tree, gfc_expr*); /* Generate code to initialize and allocate an array. Statements are added to se, which should contain an expression for the array descriptor. */ bool gfc_array_allocate (gfc_se *, gfc_expr *, tree, tree, tree, tree, - tree, tree *, gfc_expr *); + tree, tree *, gfc_expr *, gfc_typespec *); /* Allow the bounds of a loop to be set from a callee's array spec. */ void gfc_set_loop_bounds_from_array_spec (gfc_interface_mapping *, diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c index 7812934..7759b86 100644 --- a/gcc/fortran/trans-stmt.c +++ b/gcc/fortran/trans-stmt.c @@ -4925,7 +4925,7 @@ gfc_trans_allocate (gfc_code * code) nelems = NULL_TREE; if (!gfc_array_allocate (se, expr, stat, errmsg, errlen, label_finish, - memsz, nelems, code-expr3)) + memsz, nelems, code-expr3, code-ext.alloc.ts)) { bool unlimited_char; --- /dev/null 2013-05-29 07:55:34.977108520 +0200 +++ gcc/gcc/testsuite/gfortran.dg/class_array_17.f90 2013-05-29 19:36:00.239941803 +0200 @@ -0,0 +1,34 @@ +! { dg-do compile } +! { dg-options -fdump-tree-original } +! +! PR fortran/57456 +! +module m + implicit none + type t +integer :: i + end type t + type, extends(t) :: t2 +integer :: j + end type t2 +end module m + +program test + use m + implicit none + integer :: i + class(t), save, allocatable :: y(:) + + allocate (t :: y(5)) + select type(y) + type is (t2) +do i = 1, 5 + y(i)%i = i +
Re: Symtab cleanups 1/17
Honza, I would greatly appreciate if you would bootstrap GCC and run the testsuite with your symtab cleanup patches on ppc64-linux. Testing on gcc110.fsffrance.org passed fine. I will re-test x86_64-linux and go ahead with the patch and will mind to continue testing the patches on both platforms. Honza Thanks, David
Re: Fix PR57268
On 05/29/2013 10:36 AM, Dinar Temirbulatov wrote: Here is the corrected version of change. Also, I think, I need write-after-approval access to commit the change. thanks, Dinar, PR rtl-optimization/57268 * sched-deps.c (sched_analyze_2): Flush dependence lists if the sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH. Approved for the trunk. I've separately sent instructions to get write access set up. jeff
Re: [PATCH, rs6000] power8 patches, patch #6, direct move basic quad load/store
+ if (TARGET_DIRECT_MOVE) +{ + if (TARGET_POWERPC64) +{ + reload_gpr_vsx[TImode]= CODE_FOR_reload_gpr_from_vsxti; + reload_gpr_vsx[V2DFmode] = CODE_FOR_reload_gpr_from_vsxv2df; + reload_gpr_vsx[V2DImode] = CODE_FOR_reload_gpr_from_vsxv2di; + reload_gpr_vsx[V4SFmode] = CODE_FOR_reload_gpr_from_vsxv4sf; + reload_gpr_vsx[V4SImode] = CODE_FOR_reload_gpr_from_vsxv4si; + reload_gpr_vsx[V8HImode] = CODE_FOR_reload_gpr_from_vsxv8hi; + reload_gpr_vsx[V16QImode] = CODE_FOR_reload_gpr_from_vsxv16qi; + reload_gpr_vsx[SFmode]= CODE_FOR_reload_gpr_from_vsxsf; + + reload_vsx_gpr[TImode]= CODE_FOR_reload_vsx_from_gprti; + reload_vsx_gpr[V2DFmode] = CODE_FOR_reload_vsx_from_gprv2df; + reload_vsx_gpr[V2DImode] = CODE_FOR_reload_vsx_from_gprv2di; + reload_vsx_gpr[V4SFmode] = CODE_FOR_reload_vsx_from_gprv4sf; + reload_vsx_gpr[V4SImode] = CODE_FOR_reload_vsx_from_gprv4si; + reload_vsx_gpr[V8HImode] = CODE_FOR_reload_vsx_from_gprv8hi; + reload_vsx_gpr[V16QImode] = CODE_FOR_reload_vsx_from_gprv16qi; + reload_vsx_gpr[SFmode]= CODE_FOR_reload_vsx_from_gprsf; +} + else +{ + reload_fpr_gpr[DImode] = CODE_FOR_reload_fpr_from_gprdi; + reload_fpr_gpr[DDmode] = CODE_FOR_reload_fpr_from_gprdd; + reload_fpr_gpr[DFmode] = CODE_FOR_reload_fpr_from_gprdf; +} +} Why do the VSX reload functions depend on TARGET_POWERPC64? That seems like the wrong test. +/* Return true if this is a move direct operation between GPR registers and + floating point/VSX registers. */ + +bool +direct_move_p (rtx op0, rtx op1) Why isn't this function symmetric? It at least needs an explanation in the comment about assumptions for the operands. +/* Return true if this is a load or store quad operation. */ + +bool +quad_load_store_p (rtx op0, rtx op1) Same for this function. +/* Helper function for rs6000_secondary_reload to return true if a move to a + different register classe is really a simple move. */ + +static bool +rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type, + enum rs6000_reg_type from_type, + enum machine_mode mode) +{ + int size; + + /* Add support for various direct moves available. In this function, we only + look at cases where we don't need any extra registers, and one or more + simple move insns are issued. At present, 32-bit integers are not allowed + in FPR/VSX registers. Single precision binary floating is not a simple + move because we need to convert to the single precision memory layout. + The 4-byte SDmode can be moved. */ The second comment should be merged into the first -- it explains the purpose and implementation of the function. +/* Return whether a move between two register classes can be done either + directly (simple move) or via a pattern that uses a single extra temporary + (using power8's direct move in this case. */ + +static bool +rs6000_secondary_reload_move (enum rs6000_reg_type to_type, + enum rs6000_reg_type from_type, + enum machine_mode mode, + secondary_reload_info *sri, + bool altivec_p) Missing a close parenthesis in the comment. (define_insn *vsx_movmode - [(set (match_operand:VSX_M 0 nonimmediate_operand =Z,VSr,VSr,?Z,?wa,?wa,*Y,*r,*r,VSr,?wa,*r,v,wZ,v) -(match_operand:VSX_M 1 input_operand VSr,Z,VSr,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ))] + [(set (match_operand:VSX_M 0 nonimmediate_operand =Z,VSr,VSr,?Z,?wa,?wa,wQ,?r,??Y,??r,??r,VSr,?wa,*r,v,wZ, v) +(match_operand:VSX_M 1 input_operand VSr,Z,VSr,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ))] Why do you need to change the modifiers? Why should vector operands in GPRs matter for register preferences (removing `*' from r constraints)? Thanks, David
[PATCH] Fix -fdump-passes
This patch re-enables -fdump-passes. It had stopped working because dump_passes was changed to use the FOR_EACH_DEFINED_FUNCTION iterator, however, functions are not marked as defined until after dump_passes is called, in cgraph_analyze_functions. Fixed by iterating over all functions. Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2013-05-29 Teresa Johnson tejohn...@google.com * passes.c (dump_passes): Use FOR_EACH_FUNCTION since functions are not yet marked as defined. Index: passes.c === --- passes.c (revision 199199) +++ passes.c (working copy) @@ -718,7 +718,7 @@ dump_passes (void) create_pass_tab(); - FOR_EACH_DEFINED_FUNCTION (n) + FOR_EACH_FUNCTION (n) if (DECL_STRUCT_FUNCTION (n-symbol.decl)) { node = n; -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [gomp4/cilk-in-gomp/cilkplus] C++ parsing for Cilk plus #pragma simd
On 05/28/13 16:20, Iyer, Balaji V wrote: Regarding the tests, I will merge trunk-cilk-in-gomp so we can reuse the c-c++- common infrastructure you wrote. There is no sense having multiple tests for vectorlength, linear, etc. Can you please add them under c-c++-common/cilk-plus/PS directory? AN = Array Notation PS = Pragma SIMD CK = Cilk Keywords EF = Elemental function. This way, we don't have to monkey with the existing gcc.dg/cilk-plus/cilk-plus.exp script. Thanks, Balaji V. Iyer. Aldy On second thought, it looks like you'll be busy working on the fallout from the array notation work. I can take over this entire patch, fix things, and re-post the final version before committing to the branch. Thanks for contributing this! Aldy
Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
- if (mode == QImode || mode == HImode) + /* On power8, we want to use SImode for the operation. On previoius systems, + use the operation in a subword and shift/mask to get the proper byte or + halfword. */ + if (TARGET_SYNC_HI_QI (mode == QImode || mode == HImode)) +{ + val = convert_modes (SImode, mode, val, 1); + + /* Prepare to adjust the return value. */ + before = gen_reg_rtx (SImode); + if (after) +after = gen_reg_rtx (SImode); + mode = SImode; +} + else if (mode == QImode || mode == HImode) Spelling: previoius. This logic is redundant. Why not if (mode == QImode || mode == HImode) { if (TARGET_SYNC_HI_QI) { new code } else { original code } The rest of this patch is okay. Thanks, David
Re: [PATCH] Fix -fdump-passes
On 05/29/2013 02:11 PM, Teresa Johnson wrote: This patch re-enables -fdump-passes. It had stopped working because dump_passes was changed to use the FOR_EACH_DEFINED_FUNCTION iterator, however, functions are not marked as defined until after dump_passes is called, in cgraph_analyze_functions. Fixed by iterating over all functions. Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2013-05-29 Teresa Johnson tejohn...@google.com * passes.c (dump_passes): Use FOR_EACH_FUNCTION since functions are not yet marked as defined. OK, Assuming you fix the whitespace in the ChangeLog entry. jeff
Re: [PATCH, rs6000] power8 patches, patch #6, direct move basic quad load/store
On Wed, May 29, 2013 at 03:53:43PM -0400, David Edelsohn wrote: + if (TARGET_DIRECT_MOVE) +{ + if (TARGET_POWERPC64) +{ + reload_gpr_vsx[TImode]= CODE_FOR_reload_gpr_from_vsxti; + reload_gpr_vsx[V2DFmode] = CODE_FOR_reload_gpr_from_vsxv2df; + reload_gpr_vsx[V2DImode] = CODE_FOR_reload_gpr_from_vsxv2di; + reload_gpr_vsx[V4SFmode] = CODE_FOR_reload_gpr_from_vsxv4sf; + reload_gpr_vsx[V4SImode] = CODE_FOR_reload_gpr_from_vsxv4si; + reload_gpr_vsx[V8HImode] = CODE_FOR_reload_gpr_from_vsxv8hi; + reload_gpr_vsx[V16QImode] = CODE_FOR_reload_gpr_from_vsxv16qi; + reload_gpr_vsx[SFmode]= CODE_FOR_reload_gpr_from_vsxsf; + + reload_vsx_gpr[TImode]= CODE_FOR_reload_vsx_from_gprti; + reload_vsx_gpr[V2DFmode] = CODE_FOR_reload_vsx_from_gprv2df; + reload_vsx_gpr[V2DImode] = CODE_FOR_reload_vsx_from_gprv2di; + reload_vsx_gpr[V4SFmode] = CODE_FOR_reload_vsx_from_gprv4sf; + reload_vsx_gpr[V4SImode] = CODE_FOR_reload_vsx_from_gprv4si; + reload_vsx_gpr[V8HImode] = CODE_FOR_reload_vsx_from_gprv8hi; + reload_vsx_gpr[V16QImode] = CODE_FOR_reload_vsx_from_gprv16qi; + reload_vsx_gpr[SFmode]= CODE_FOR_reload_vsx_from_gprsf; +} + else +{ + reload_fpr_gpr[DImode] = CODE_FOR_reload_fpr_from_gprdi; + reload_fpr_gpr[DDmode] = CODE_FOR_reload_fpr_from_gprdd; + reload_fpr_gpr[DFmode] = CODE_FOR_reload_fpr_from_gprdf; +} +} Why do the VSX reload functions depend on TARGET_POWERPC64? That seems like the wrong test. Because at present we do not do direct move between VSX and GPR registers for 128-bit in 32-bit mode. For 32-bit mode, we only transfer 64-bit types. Due to issues with secondary reload where you only get one temporary register, and that temporary register might/might not overlap with the output register, we might need a type that takes 3 traditional FPR registers, and we don't have one at present (to move from GPR to VSX we would need to do 2 direct moves, a FMRGOW for the first 64-bits, and 2 direct moves, and another FMRGOW for the second 64-bits, and then an XXPERMDI to glue the two 64-bits together). I've seen places where reload does not honor the constraint for these secondary reload cases, so the output can't be one of the temporary registers, hence needing a type that spans 3 registers. In theory it can be done, it just hasn't been done. +/* Return true if this is a move direct operation between GPR registers and + floating point/VSX registers. */ + +bool +direct_move_p (rtx op0, rtx op1) Why isn't this function symmetric? It at least needs an explanation in the comment about assumptions for the operands. I probably should have named the operands to and from, since that is how the callers call it. I'm not sure I understand the comment about it being symetric, since you need different strategies to move in different directions, and you might or might not have implemented all of the cases. +/* Return true if this is a load or store quad operation. */ + +bool +quad_load_store_p (rtx op0, rtx op1) Same for this function. Again it should have been to/from. +/* Helper function for rs6000_secondary_reload to return true if a move to a + different register classe is really a simple move. */ + +static bool +rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type, + enum rs6000_reg_type from_type, + enum machine_mode mode) +{ + int size; + + /* Add support for various direct moves available. In this function, we only + look at cases where we don't need any extra registers, and one or more + simple move insns are issued. At present, 32-bit integers are not allowed + in FPR/VSX registers. Single precision binary floating is not a simple + move because we need to convert to the single precision memory layout. + The 4-byte SDmode can be moved. */ The second comment should be merged into the first -- it explains the purpose and implementation of the function. Ok. +/* Return whether a move between two register classes can be done either + directly (simple move) or via a pattern that uses a single extra temporary + (using power8's direct move in this case. */ + +static bool +rs6000_secondary_reload_move (enum rs6000_reg_type to_type, + enum rs6000_reg_type from_type, + enum machine_mode mode, + secondary_reload_info *sri, + bool altivec_p) Missing a close parenthesis in the comment. Yep, thanks. (define_insn *vsx_movmode - [(set (match_operand:VSX_M 0 nonimmediate_operand =Z,VSr,VSr,?Z,?wa,?wa,*Y,*r,*r,VSr,?wa,*r,v,wZ,v) -(match_operand:VSX_M 1 input_operand VSr,Z,VSr,wa,Z,wa,r,Y,r,j,j,j,W,v,wZ))] + [(set
Re: [PATCH, rs6000] power8 patches, patch #7, quad/byte/half-word atomic instructions
On Wed, May 29, 2013 at 04:29:07PM -0400, David Edelsohn wrote: - if (mode == QImode || mode == HImode) + /* On power8, we want to use SImode for the operation. On previoius systems, + use the operation in a subword and shift/mask to get the proper byte or + halfword. */ + if (TARGET_SYNC_HI_QI (mode == QImode || mode == HImode)) +{ + val = convert_modes (SImode, mode, val, 1); + + /* Prepare to adjust the return value. */ + before = gen_reg_rtx (SImode); + if (after) +after = gen_reg_rtx (SImode); + mode = SImode; +} + else if (mode == QImode || mode == HImode) Spelling: previoius. Thanks. This logic is redundant. Why not if (mode == QImode || mode == HImode) { if (TARGET_SYNC_HI_QI) { new code } else { original code } The rest of this patch is okay. I can do that. I think earlier versions did it that way. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] Fix -fdump-passes
On Wed, May 29, 2013 at 1:32 PM, Jeff Law l...@redhat.com wrote: On 05/29/2013 02:11 PM, Teresa Johnson wrote: This patch re-enables -fdump-passes. It had stopped working because dump_passes was changed to use the FOR_EACH_DEFINED_FUNCTION iterator, however, functions are not marked as defined until after dump_passes is called, in cgraph_analyze_functions. Fixed by iterating over all functions. Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2013-05-29 Teresa Johnson tejohn...@google.com * passes.c (dump_passes): Use FOR_EACH_FUNCTION since functions are not yet marked as defined. OK, Assuming you fix the whitespace in the ChangeLog entry. Fixed and committed at r199424. Thanks, Teresa jeff -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [ada, build] host/target configuration
Hi! On Wed, 29 May 2013 16:21:38 +0200, Paolo Bonzini bonz...@gnu.org wrote: Il 29/05/2013 12:50, Thomas Schwinge ha scritto: How about we use something like the following [...] patch? In essence, replace the manual parsing in gcc/ada/gcc-interface/Makefile.in by using the values the configure script already computed for us. This is largely straightforward (I would hope); please especially review the more involved cases such as the one where I'm now using linux%eabi -- for example, does that do the right thing or interfere with the Android configurations? I agree that using $(target_os) and linux% matches is a better way to strip out the -gnus. The patch is long, but I think I reviewed it carefully enough. It's okay for mainline. Thanks for the review. Before I commit however: Regarding the android change: -ifeq ($(strip $(filter-out arm%-linux,$(arch)-$(osys)) $(if $(findstring eabi,$(word 4,$(targ))),,$(word 4,$(targ,) +ifeq ($(strip $(filter-out arm% linux%eabi,$(target_cpu) $(target_os))),) This is okay, it will match arm-none-linux-androideabi both before and after Well, upon more carful inspection I came to realize that the old/still current code accepts arm-*-linux-*eabi but also plain arm-*-linux, which my new code no longer accepts. Is the old/still current behavior intentional here, or a bug? It has been introduced in commit f49eb158fa5618dbd1130e47281e0bae13798ec6 (r192475): gcc/ada/ 2012-10-15 Matthias Klose d...@ubuntu.com * gcc-interface/Makefile.in: Match arm*-*-linux-*eabi* for ARM Linux/GNU. -ifeq ($(strip $(filter-out arm% linux-gnueabi,$(arch) $(osys)-$(word 4,$(targ,) +ifeq ($(strip $(filter-out arm%-linux,$(arch)-$(osys)) $(if $(findstring eabi,$(word 4,$(targ))),,$(word 4,$(targ,) From the description, I understand that change to mean any ARM, Linux-kernel, EABI configuration with any kind of user-land, so my new interpretation seems correct, and the current code wrong to also accept plain arm-*-linux. That change just cite should just have replaced linux-gnueabi with linux-%eabi? (perhaps linux-%eabi would be more readable). As soon as we've clarified the previous question, I'll do that change. However, it seems that the first androideabi snippet was dead code. Can you delete it in a follow-up? That has been added in commit 7a5ee35f18bc945ec8abbcd1377c446352504e6e (r193238): 2012-11-06 Arnaud Charlet char...@adacore.com [...] * gcc-interface/Makefile.in: Add runtime pairs for Android. Rework handling of s-oscons.ads. * s-osinte-android.ads, s-osinte-android.adb: New files. +ifeq ($(strip $(filter-out arm% linux-androideabi,$(arch) $(osys)-$(word 4,$(targ,) + LIBGNAT_TARGET_PAIRS = \ + [...] + s-osinte.adbs-osinte-android.adb \ + s-osinte.adss-osinte-android.ads \ + [...] Indeed I agree that the second snippet in the Makefile.in (as changed by Matthias on 2012-10-15) triggers anytime the first one (as added by Arnaud on 2012-11-06) triggered, so Arnaud's LIBGNAT_TARGET_PAIRS setting will always be overwritten by Matthias', so to speak, and thus the new s-osinte-android.* files never be used in the current code. Arnaud, Matthias, please clarify the intended behavior? Of course I would assume that for *-android* configurations the new s-osinte-android.* files be used, but as Arnaud's change has gone in after Matthias', so this can never have worked (unless I'm confused)? Grüße, Thomas pgpMQriqgNpfH.pgp Description: PGP signature
Re: [patch, mips] Fix for PR target/56942
On Tue, May 28, 2013 at 10:39 PM, Steven Bosscher stevenb@gmail.com wrote: On Tue, May 28, 2013 at 9:36 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Hi Steven, Steven Bosscher stevenb@gmail.com writes: Imho the active-insn idiom is the best solution for the moment. I will fix this mess properly asap, probably next week. Just wondering, how are things going with this? (I assume fixing it properly means getting rid of the FIXME in next_active_insn?) Yes, it involves getting this to work: And I can't get it to fail. I also can't find any place where the label and jump table might get separated. I was expecting some trouble with cross-jumping but it seems that it takes care of updating the label reference, and skip_insns_after_block already expects the label and table to be adjacent. So let's just see if/what breaks... Bootstrappedtested on powerpc64-unknown-linux-gnu (unix{,-m32}). Bootstrappedtested on x86_64-unknown-linux-gnu (unix{,-m32}). Buildtested powerpc64 X mips-elf. OK for trunk? If it causes any trouble, please file a PR and assign it to me. And when the dust has settled, I can clean up the FIXME for JUMP_TABLE_DATA in next_active_insn, and fix up the back ends. Ciao! Steven * rtlanal.c (tablejump_p): Expect table and label to be adjacent. Index: rtlanal.c === --- rtlanal.c (revision 199324) +++ rtlanal.c (working copy) @@ -2711,6 +2711,7 @@ tablejump_p (const_rtx insn, rtx *labelp, rtx *tab (table = next_active_insn (label)) != NULL_RTX JUMP_TABLE_DATA_P (table)) { + gcc_assert (table == NEXT_INSN (label)); if (labelp) *labelp = label; if (tablep)
Re: [GOOGLE] Unrestrict early inline restrictions for AutoFDO
OK, I'll commit the early inline part. Dehao On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li davi...@google.com wrote: The early inlining part is ok. The tracer optimization should be revisited -- we should have more fine grain control on it (for instance, based on FDO summary -- but that should be common to FDO/LIPO). David On Wed, May 29, 2013 at 9:39 AM, Dehao Chen de...@google.com wrote: In gcc4-8, the max einline iterations are restricted to 1. For AutoFDO, this is bad because early inline is not size restricted. This patch allows einline to do multiple iterations in AutoFDO. It also enables tracer optimization in AutoFDO. Bootstrapped and passed regression test. OK for googel-4_8? Thanks, Dehao Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c (revision 199416) +++ gcc/ipa-inline.c (working copy) @@ -2161,7 +2161,8 @@ early_inliner (void) { /* We iterate incremental inlining to get trivial cases of indirect inlining. */ - while (iterations PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) + while ((flag_auto_profile + || iterations PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) early_inline_small_functions (node)) { timevar_push (TV_INTEGRATION); Index: gcc/opts.c === --- gcc/opts.c (revision 199416) +++ gcc/opts.c (working copy) @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, opts-x_flag_peel_loops = value; if (!opts_set-x_flag_value_profile_transformations) opts-x_flag_value_profile_transformations = value; + if (!opts_set-x_flag_tracer) + opts-x_flag_tracer = value; if (!opts_set-x_flag_inline_functions) opts-x_flag_inline_functions = value; if (!opts_set-x_flag_ipa_cp)
[GOOGLE] Remove records in powerpc64-grtev3-linux-gnu.xfail
Hi Since b/8397853 has been fixed, the related tests now passed, so we can remove them from powerpc64-grtev3-linux-gnu.xfail now. Tested with ./buildit --run_tests. OK for google 4.7 branch? thanks Carrot 2013-05-29 Guozhi Wei car...@google.com * powerpc64-grtev3-linux-gnu.xfail (*** g++): Remove passed records. Index: powerpc64-grtev3-linux-gnu.xfail === --- powerpc64-grtev3-linux-gnu.xfail (revision 199411) +++ powerpc64-grtev3-linux-gnu.xfail (working copy) @@ -131,19 +131,8 @@ FAIL: g++.dg/ext/cleanup-9.C -std=gnu++11 execution test FAIL: g++.dg/warn/Wself-assign-2.C -std=gnu++11 (test for warnings, line 12) FAIL: g++.dg/tree-prof/lipo/vcall1_0.C scan-ipa-dump-times profile Indirect call - direct call 2 -# b/8397853, a LIPO bug, causing compilation to fail. -FAIL: g++.dg/tree-prof/mversn15.C compilation, -fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15.C execution,-fprofile-generate -#FAIL: g++.dg/tree-prof/mversn15.C execution,-fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15.C compilation, -fprofile-use -UNRESOLVED: g++.dg/tree-prof/mversn15.C execution,-fprofile-use FAIL: g++.dg/tree-prof/mversn15.C scan-tree-dump optimized return 0 FAIL: g++.dg/tree-prof/mversn15.C scan-tree-dump optimized main_clone -FAIL: g++.dg/tree-prof/mversn15a.C compilation, -fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15a.C execution,-fprofile-generate -#FAIL: g++.dg/tree-prof/mversn15a.C execution,-fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15a.C compilation, -fprofile-use -UNRESOLVED: g++.dg/tree-prof/mversn15a.C execution,-fprofile-use # Fortran failures are not important to us so far. *** gfortran:
Re: new port: msp430-elf, revision 2
Are there any further comments on this? Do I need to post a final patch? Can it be committed yet?
Re: [GOOGLE] Remove records in powerpc64-grtev3-linux-gnu.xfail
ok. David On Wed, May 29, 2013 at 5:18 PM, Carrot Wei car...@google.com wrote: Hi Since b/8397853 has been fixed, the related tests now passed, so we can remove them from powerpc64-grtev3-linux-gnu.xfail now. Tested with ./buildit --run_tests. OK for google 4.7 branch? thanks Carrot 2013-05-29 Guozhi Wei car...@google.com * powerpc64-grtev3-linux-gnu.xfail (*** g++): Remove passed records. Index: powerpc64-grtev3-linux-gnu.xfail === --- powerpc64-grtev3-linux-gnu.xfail (revision 199411) +++ powerpc64-grtev3-linux-gnu.xfail (working copy) @@ -131,19 +131,8 @@ FAIL: g++.dg/ext/cleanup-9.C -std=gnu++11 execution test FAIL: g++.dg/warn/Wself-assign-2.C -std=gnu++11 (test for warnings, line 12) FAIL: g++.dg/tree-prof/lipo/vcall1_0.C scan-ipa-dump-times profile Indirect call - direct call 2 -# b/8397853, a LIPO bug, causing compilation to fail. -FAIL: g++.dg/tree-prof/mversn15.C compilation, -fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15.C execution,-fprofile-generate -#FAIL: g++.dg/tree-prof/mversn15.C execution,-fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15.C compilation, -fprofile-use -UNRESOLVED: g++.dg/tree-prof/mversn15.C execution,-fprofile-use FAIL: g++.dg/tree-prof/mversn15.C scan-tree-dump optimized return 0 FAIL: g++.dg/tree-prof/mversn15.C scan-tree-dump optimized main_clone -FAIL: g++.dg/tree-prof/mversn15a.C compilation, -fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15a.C execution,-fprofile-generate -#FAIL: g++.dg/tree-prof/mversn15a.C execution,-fprofile-generate -UNRESOLVED: g++.dg/tree-prof/mversn15a.C compilation, -fprofile-use -UNRESOLVED: g++.dg/tree-prof/mversn15a.C execution,-fprofile-use # Fortran failures are not important to us so far. *** gfortran:
Re: [patch] Hash table changes from cxx-conversion branch - config part
On 5/21/13, Diego Novillo dnovi...@google.com wrote: On May 13, 2013 Lawrence Crowl cr...@googlers.com wrote: I still have not heard from i386 or ia64 folks. Anyone? The i386 bits look fine to me as well. Please wait 48 hours to give the i386 maintainers a chance to object. Committed. -- Lawrence Crowl