Re: [PATCH] Fix handle_char_store in strlen pass (PR tree-optimization/57230)
On Fri, 10 May 2013, Jakub Jelinek wrote:

Hi!

I've apparently missed one spot, where store of zero value was assumed, but not actually verified.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8/4.7?

Ok.

Thanks,
Richard.

2013-05-10  Jakub Jelinek  ja...@redhat.com

	PR tree-optimization/57230
	* tree-ssa-strlen.c (handle_char_store): Add missing integer_zerop check.

	* gcc.dg/strlenopt-23.c: New test.

--- gcc/tree-ssa-strlen.c.jj	2013-04-26 08:49:53.0 +0200
+++ gcc/tree-ssa-strlen.c	2013-05-10 08:57:20.654523288 +0200
@@ -1703,7 +1703,7 @@ handle_char_store (gimple_stmt_iterator
 	     its length may be decreased.  */
 	  adjust_last_stmt (si, stmt, false);
 	}
-      else if (si != NULL)
+      else if (si != NULL && integer_zerop (gimple_assign_rhs1 (stmt)))
 	{
 	  si = unshare_strinfo (si);
 	  si->length = build_int_cst (size_type_node, 0);
--- gcc/testsuite/gcc.dg/strlenopt-23.c.jj	2013-05-10 09:01:27.808152595 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-23.c	2013-05-10 09:02:08.042931124 +0200
@@ -0,0 +1,15 @@
+/* PR tree-optimization/57230 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "strlenopt.h"
+
+int
+main ()
+{
+  char p[] = "hello world";
+  p[0] = (char) (strlen (p) - 1);
+  if (strlen (p) != 11)
+    abort ();
+  return 0;
+}

	Jakub

--
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [RFC] Make the new var decl STATIC in function dw2_output_indirect_constant_1
On Fri, May 10, 2013 at 8:27 PM, Carrot Wei car...@google.com wrote:

Hi

In function dw2_output_indirect_constant_1 a new var decl is created. Only when the variable is not PUBLIC is it allocated static storage. Does anybody know why the variable is not allocated static storage by marking TREE_STATIC when it is PUBLIC?

Because TREE_STATIC says whether it gets static or external storage. Not all combinations of TREE_STATIC/TREE_PUBLIC/DECL_EXTERNAL make sense (though you'll see very weird combinations from the C++ frontend which also has DECL_REALLY_EXTERN).

The following patch marks the STATIC flag in all cases. It can pass bootstrap and regression test on x86-64. Any comments?

Ok.

Thanks,
Richard.

thanks
Carrot

2013-05-09  Guozhi Wei  car...@google.com

	* dwarf2asm.c (dw2_output_indirect_constant_1): Mark new decl STATIC.

Index: dwarf2asm.c
===================================================================
--- dwarf2asm.c	(revision 198794)
+++ dwarf2asm.c	(working copy)
@@ -906,6 +906,7 @@
   DECL_IGNORED_P (decl) = 1;
   DECL_INITIAL (decl) = decl;
   TREE_READONLY (decl) = 1;
+  TREE_STATIC (decl) = 1;

   if (TREE_PUBLIC (id))
     {
@@ -914,8 +915,6 @@
       if (USE_LINKONCE_INDIRECT)
 	DECL_VISIBILITY (decl) = VISIBILITY_HIDDEN;
     }
-  else
-    TREE_STATIC (decl) = 1;

   sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym);
   assemble_variable (decl, 1, 1, 1);
Re: [patch] struct resources: remove unch_memory member
On Sat, May 11, 2013 at 1:22 AM, Steven Bosscher stevenb@gmail.com wrote:

Hello,

This unch_memory in struct resources is a left-over from RTX_UNCHANGING_P, but it looks like the change-over to MEM_READONLY_P was done incorrectly: The resource_conflicts_p code now reports conflicts for insns reading readonly memory and insns reading normal memory or no memory at all.

Spotted by checking why reorg.c failed to fill some slots that my sched-deps based delay slot scheduler managed to fill.

Bootstrapped&tested on sparc64-unknown-linux-gnu. OK for trunk?

Ok.

Thanks,
Richard.

Ciao!
Steven
Re: [patch] fix PR52139 correctly
On Sat, Apr 13, 2013 at 08:21:46PM +0200, Steven Bosscher wrote: * cfgrtl.c (cfg_layout_merge_blocks): Revert r184005, implement correct fix by moving header and footer insn to the footer of the merged basic block. Clear BB_END of the merged-away block. Unfortunately this caused PR57257. Jakub
Re: cfgexpand.c patch for [was new port: msp430-elf]
On Sat, May 11, 2013 at 1:41 AM, DJ Delorie d...@redhat.com wrote:

Note that I had to make a few changes (fixes?) in the MI portions of gcc to avoid problems I encountered, I don't know if these changes are correct or if there are better ways to avoid those cases. Those

In any case, they should best be posted in separate messages, each one with its own rationale.

Here's the first of those... The patch assumes that, by definition, a partial int mode has fewer bits than an int mode of the same size, and thus truncation should be used to go from the int mode to the partial int mode.

Can you add that (partial int modes have fewer bits than int modes) as verification to genmodes.c:make_partial_integer_mode?

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	(revision 198591)
+++ gcc/cfgexpand.c	(working copy)
@@ -3090,13 +3090,17 @@ expand_debug_expr (tree exp)
 	     size_t, we need to check for mis-matched modes and correct
 	     the addend.  */
 	  if (op0 && op1
 	      && GET_MODE (op0) != VOIDmode && GET_MODE (op1) != VOIDmode
 	      && GET_MODE (op0) != GET_MODE (op1))
 	    {
-	      if (GET_MODE_BITSIZE (GET_MODE (op0)) < GET_MODE_BITSIZE (GET_MODE (op1)))

I wonder if this should not use GET_MODE_PRECISION - after all it is the precision that determines whether we have to extend / truncate? Or is precision a so much unused term on RTL that this would cause problems?

Thus,

+	      if (GET_MODE_BITSIZE (GET_MODE (op0)) < GET_MODE_BITSIZE (GET_MODE (op1))

  if (GET_MODE_PRECISION (GET_MODE (op0)) < GET_MODE_PRECISION (GET_MODE (op1)))
    op1 = simplify_gen_unary (TRUNCATE, GET_MODE (op0), op1, GET_MODE (op1));

?

Richard.

+		  /* Don't try to sign-extend SImode to PSImode, for example.  */
+		  || (GET_MODE_BITSIZE (GET_MODE (op0)) == GET_MODE_BITSIZE (GET_MODE (op1))
+		      && GET_MODE_CLASS (GET_MODE (op0)) == MODE_PARTIAL_INT
+		      && GET_MODE_CLASS (GET_MODE (op1)) == MODE_INT))
 		op1 = simplify_gen_unary (TRUNCATE, GET_MODE (op0), op1,
 					  GET_MODE (op1));
 	      else
 		/* We always sign-extend, regardless of the signedness of
 		   the operand, because the operand is always unsigned
 		   here even if the original C expression is signed.  */
Re: Prefer scalar offset in vector shifts
On Sun, May 12, 2013 at 2:04 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

this patch passes bootstrap+testsuite on x86_64-linux-gnu. When moving uniform_vector_p, I only added the gcc_assert. Note that the fold_binary patch helps for constant vectors, but not for {n,n,n,n}, which will require some help in forwprop for instance. This transformation is already done by the vector lowering pass, but that's too late in my opinion.

Ok.

Thanks,
Richard.

2013-05-13  Marc Glisse  marc.gli...@inria.fr

gcc/
	* tree-vect-generic.c (uniform_vector_p): Move ...
	* tree.c (uniform_vector_p): ... here.
	* tree.h (uniform_vector_p): Declare it.
	* fold-const.c (fold_binary_loc) <shift>: Turn the second argument into a scalar.

gcc/testsuite/
	* gcc.dg/vector-shift-2.c: New testcase.

--
Marc Glisse

Index: gcc/testsuite/gcc.dg/vector-shift-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-shift-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-shift-2.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-ccp1" } */
+
+typedef unsigned vec __attribute__ ((vector_size (16)));
+void
+f (vec *a)
+{
+  vec s = { 5, 5, 5, 5 };
+  *a = *a << s;
+}
+
+/* { dg-final { scan-tree-dump "<< 5" "ccp1" } } */
+/* { dg-final { cleanup-tree-dump "ccp1" } } */

Property changes on: gcc/testsuite/gcc.dg/vector-shift-2.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 198803)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -319,66 +319,20 @@ expand_vector_addition (gimple_stmt_iter
       && parts_per_word >= 4
       && TYPE_VECTOR_SUBPARTS (type) >= 4)
     return expand_vector_parallel (gsi, f_parallel,
 				   type, a, b, code);
   else
     return expand_vector_piecewise (gsi, f,
 				    type, TREE_TYPE (type),
 				    a, b, code);
 }

-/* Check if vector VEC consists of all the equal elements and
-   that the number of elements corresponds to the type of VEC.
-   The function returns first element of the vector
-   or NULL_TREE if the vector is not uniform.  */
-static tree
-uniform_vector_p (tree vec)
-{
-  tree first, t;
-  unsigned i;
-
-  if (vec == NULL_TREE)
-    return NULL_TREE;
-
-  if (TREE_CODE (vec) == VECTOR_CST)
-    {
-      first = VECTOR_CST_ELT (vec, 0);
-      for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
-	if (!operand_equal_p (first, VECTOR_CST_ELT (vec, i), 0))
-	  return NULL_TREE;
-
-      return first;
-    }
-
-  else if (TREE_CODE (vec) == CONSTRUCTOR)
-    {
-      first = error_mark_node;
-
-      FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (vec), i, t)
-        {
-          if (i == 0)
-            {
-              first = t;
-              continue;
-            }
-	  if (!operand_equal_p (first, t, 0))
-	    return NULL_TREE;
-        }
-      if (i != TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)))
-	return NULL_TREE;
-
-      return first;
-    }
-
-  return NULL_TREE;
-}
-
 /* Try to expand vector comparison expression OP0 CODE OP1 by
    querying optab if the following expression:
 	VEC_COND_EXPR< OP0 CODE OP1, {-1,...}, {0,...}>
    can be expanded.  */
 static tree
 expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
                           tree op1, enum tree_code code)
 {
   tree t;
   if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 198803)
+++ gcc/tree.c	(working copy)
@@ -10126,20 +10126,68 @@ initializer_zerop (const_tree init)
 	    return false;
 	return true;
       }

     default:
       return false;
     }
 }

+/* Check if vector VEC consists of all the equal elements and
+   that the number of elements corresponds to the type of VEC.
+   The function returns first element of the vector
+   or NULL_TREE if the vector is not uniform.  */
+tree
+uniform_vector_p (const_tree vec)
+{
+  tree first, t;
+  unsigned i;
+
+  if (vec == NULL_TREE)
+    return NULL_TREE;
+
+  gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
+
+  if (TREE_CODE (vec) == VECTOR_CST)
+    {
+      first = VECTOR_CST_ELT (vec, 0);
+      for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
+	if (!operand_equal_p (first, VECTOR_CST_ELT (vec, i), 0))
+	  return NULL_TREE;
+
+      return first;
+    }
+
+  else if (TREE_CODE (vec) == CONSTRUCTOR)
+    {
+      first = error_mark_node;
+
+      FOR_EACH_CONSTRUCTOR_VALUE
Re: rtl expansion without zero/sign extension based on VRP
On Mon, May 13, 2013 at 5:45 AM, Kugan kugan.vivekanandara...@linaro.org wrote:

Hi,

This patch removes some of the redundant sign/zero extensions using value range information generated by VRP.

When GIMPLE_ASSIGN stmts with LHS type smaller than word are expanded to rtl, if we can prove that the RHS expression value can always fit in the LHS type and there is no sign conversion, truncation and extension to fit the type is redundant. Subreg and zero/sign extensions are therefore redundant.

For example, when an expression is evaluated and its value is assigned to a variable of type short, the generated rtl would look something like the following.

(set (reg:SI 110)
     (zero_extend:SI (subreg:HI (reg:SI 117) 0)))

However, if during value range propagation we can say for certain that the value of the expression which is present in register 117 is within the limits of short and there is no sign conversion, we don't need to perform the subreg and zero_extend; instead we can generate the following rtl.

(set (reg:SI 110)
     (reg:SI 117))

Attached patch adds value range information to gimple stmts during the value range propagation pass and expands rtl without subreg and zero/sign extension if the value range is within the limits of its type.

This change improves the geomean of the spec2k int benchmark with ref by about 3.5% on an arm chromebook.

Tested on X86_64 and ARM.

I would like review comments on this.

A few comments on the way you preserve this information.

--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -191,6 +191,11 @@ struct GTY((chain_next ("%h.next"))) gimple_statement_base {
      in there.  */
   unsigned int subcode		: 16;

+  /* if an assignment gimple statement has RHS expression that can fit
+     LHS type, zero/sign extension to truncate is redundant.
+     Set this if we detect extension as redundant during VRP.  */
+  unsigned sign_zero_ext_redundant : 1;
+

this enlarges all gimple statements by 8 bytes, so it's out of the question.

My original plan to preserve value-range information was to re-use SSA_NAME_PTR_INFO which is where we already store value-range information for pointers:

struct GTY(()) ptr_info_def
{
  /* The points-to solution.  */
  struct pt_solution pt;

  /* Alignment and misalignment of the pointer in bytes.  Together
     align and misalign specify low known bits of the pointer.
     ptr & (align - 1) == misalign.  */

  /* When known, this is the power-of-two byte alignment of the object this
     pointer points into.  This is usually DECL_ALIGN_UNIT for decls and
     MALLOC_ABI_ALIGNMENT for allocated storage.  When the alignment is not
     known, it is zero.  Do not access directly but use functions
     get_ptr_info_alignment, set_ptr_info_alignment,
     mark_ptr_info_alignment_unknown and similar.  */
  unsigned int align;

  /* When alignment is known, the byte offset this pointer differs from the
     above alignment.  Access only through the same helper functions as align
     above.  */
  unsigned int misalign;
};

what you'd do is to make the ptr_info_def reference from tree_ssa_name a reference to a union of the above (for pointer SSA names) and an alternate info like

struct range_info_def {
  double_int min;
  double_int max;
};

or a more compressed format (a reference to a mode or type it fits for example).

The very specific case in question also points at the fact that we have two conversion tree codes, NOP_EXPR and CONVERT_EXPR, and we've tried to unify them (but didn't finish up that task)...

+static void
+process_stmt_for_ext_elimination ()
+{

we already do all the analysis when looking for opportunities to eliminate the intermediate cast in (T2)(T1)x in simplify_conversion_using_ranges, that's the place that should handle this case.

Richard.

Thanks,
Kugan

2013-05-09  Kugan Vivekanandarajah  kug...@linaro.org

	* gcc/gimple.h (gimple_is_exp_fit_lhs, gimple_set_exp_fit_lhs): New function.
	* gcc/tree-vrp.c (process_stmt_for_ext_elimination): Likewise.
	* (is_msb_set, range_fits_type): Likewise.
	* (vrp_finalize): Call process_stmt_for_ext_elimination.
	* gcc/dojump.c (do_compare_and_jump): generates rtl without zero/sign extension if process_stmt_for_ext_elimination tells so.
	* gcc/cfgexpand.c (expand_gimple_stmt_1): Likewise.
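
The condition this thread turns on, "the value range fits the narrower type", can be sketched in plain C. This is only an illustration under stated assumptions: range_fits_in_bits is a hypothetical helper invented here, not a function from the patch or from tree-vrp.c, it models ranges with int64_t rather than double_int, and it ignores the separate "no sign conversion" requirement mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the patch's code: return true when every value
   in the inclusive range [min, max] is representable in an integer type
   that is BITS wide and has the given signedness.  Widths outside 1..62
   are rejected to keep the 64-bit arithmetic below free of overflow.  */
static bool
range_fits_in_bits (int64_t min, int64_t max, unsigned bits, bool is_unsigned)
{
  if (bits == 0 || bits > 62 || min > max)
    return false;

  if (is_unsigned)
    return min >= 0 && max <= (INT64_C (1) << bits) - 1;

  int64_t half = INT64_C (1) << (bits - 1);
  return min >= -half && max <= half - 1;
}

/* A few spot checks for a 16-bit destination type.  */
static void
range_fits_self_check (void)
{
  assert (range_fits_in_bits (0, 1000, 16, false));
  assert (!range_fits_in_bits (0, 40000, 16, false));
  assert (range_fits_in_bits (0, 40000, 16, true));
  assert (!range_fits_in_bits (-1, 10, 16, true));
}
```

When such a check succeeds for the destination type, truncating and re-extending the value cannot change it, which is the case where the subreg and zero_extend pair in the example rtl would be redundant.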
Re: C++/v3 PATCH to add/throw std::bad_array_new_length
On 05/06/2013 05:56 PM, Jason Merrill wrote:
On 05/06/2013 08:46 AM, Florian Weimer wrote:
On 05/06/2013 02:39 PM, Jason Merrill wrote:
On 05/06/2013 05:46 AM, Florian Weimer wrote:

Nice, this is simpler than expected. However, it makes the call sites even more bloated.

Hmm, perhaps the checking should be wrapped in an inline function, so that the inliner can decide whether or not to expand it at the call site...

Or we could call __cxa_vec_new[23] and rely on the check there

True. The problem with using those is the indirect calls to the (possibly inline) constructors, though it might be worth doing conditionally.

Would you be interested in working on that change?

Yes, but it's probably better if you commit your patch right away.

(in most cases; for new T[a][b], we'd still need a separate overflow check).

But new T[a][b] is ill-formed, so we don't need to handle that case.

I meant with one of a or b as a constant (I can't remember which it is). We still have to perform one multiplication inline, to get the total number of elements, and that needs overflow checking as well.

--
Florian Weimer / Red Hat Product Security Team
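
The two multiplications Florian mentions (element count, then byte size) can be sketched in portable C. This is a minimal illustration only: mul_overflows and array_new_bytes are hypothetical names invented here, not libstdc++ or front-end functions, and any array cookie is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store X * Y in *RESULT and return false, or return
   true (leaving *RESULT untouched) when the product does not fit in
   size_t.  */
static bool
mul_overflows (size_t x, size_t y, size_t *result)
{
  if (y != 0 && x > SIZE_MAX / y)
    return true;
  *result = x * y;
  return false;
}

/* Byte size of an N_OUTER x N_INNER array of ELEM_SIZE-byte elements.
   The first multiplication yields the total element count, the second the
   byte size; each one needs its own overflow check.  Returns false on
   overflow.  */
static bool
array_new_bytes (size_t n_outer, size_t n_inner, size_t elem_size,
                 size_t *bytes)
{
  size_t count;
  if (mul_overflows (n_outer, n_inner, &count))
    return false;
  return !mul_overflows (count, elem_size, bytes);
}

/* Spot checks: a small array, an empty array, and an element count that
   overflows before the element size is even considered.  */
static void
array_new_self_check (void)
{
  size_t bytes = 0;
  assert (array_new_bytes (3, 4, 8, &bytes) && bytes == 96);
  assert (array_new_bytes (0, 5, 8, &bytes) && bytes == 0);
  assert (!array_new_bytes (SIZE_MAX / 2 + 1, 2, 1, &bytes));
}
```

In the scenario discussed above, a false result is the point at which the caller would throw std::bad_array_new_length instead of calling the allocation function.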
Re: [PATCH] Use types_compatible_p in get_binfo_at_offset
On Sat, May 11, 2013 at 1:39 PM, Jan Hubicka hubi...@ucw.cz wrote:

But glancing over the dumps, I see many of them just have different name spaces. Do we even attempt to merge namespace_decl? How are types from the same namespaces in different units supposed to match?

We do not merge namespace decls, which is likely the issue here. My in-progress tree merging should eventually fix this...

Yep, I think namespaces should be merged. They are however (always?) referred to by TYPE_CONTEXT and I do not see that to be part of type merging rules. So I do not think it prevents us from merging types from different namespace decls.

Note that only TYPE_DECLs are siblings of NAMESPACE_DECLs and we do stream DECL_CONTEXT (and also factor in that at least for types). So in the end we should only merge same types from the same namespace which means we _should_ factor in the types overall context, even if we fail to do that now.

On the other hand, I always saw a lot of duplicates of the same structure/class getting unmerged even without namespaces. I am not sure how much of those are just bugs and how much we keep for valid reasons (such as slightly different debug info attached to them because of different #include order or so)

Bugs are certainly possible ;)

I uploaded the list of false negatives when compiling Mozilla's JS interpreter to http://atrey.karlin.mff.cuni.cz/~hubicka/binfo.txt

I suppose any bigger C++ project will give such a list and it looks like a good list of cases to analyze. Can we reasonably expect those to be all merged? Perhaps C++ one decl rule allows us to simply match names+namespaces (after merging) here?

But we cannot assume C++ rules for types.

Richard.

Honza

Richard.

Honza

Richard.

Honza
RE: [PATCH,i386] Default alignment for AMD BD and BT
Thank you Uros! Committed r198820.

Regards
Ganesh

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com]
Sent: Tuesday, May 07, 2013 6:22 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH,i386] Default alignment for AMD BD and BT

On Tue, May 7, 2013 at 9:16 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote:

The patch updates the alignment values for AMD BD and BT architectures. make -k check passes. Is it OK for upstream?

2013-05-07  Ganesh Gopalasubramanian  ganesh.gopalasubraman...@amd.com

	* config/i386/i386.c (processor_target_table): Modified default alignment values for AMD BD and BT architectures.

The value 11 indeed looks a bit weird, but it means: align to 16 byte boundary only if this can be done by skipping 10 bytes or less.

The patch is OK for mainline.

Thanks,
Uros.
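
The "align only if the padding stays within a limit" rule from the message above can be sketched as plain arithmetic. This is an illustration, not GCC or assembler code: padding_needed and align_with_max_skip are hypothetical names invented here, and the addresses in the checks are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of padding needed to bring ADDR up to the next multiple of ALIGN.
   ALIGN must be a power of two.  */
static size_t
padding_needed (size_t addr, size_t align)
{
  return (align - (addr & (align - 1))) & (align - 1);
}

/* Return ADDR rounded up to ALIGN when that costs at most MAX_SKIP bytes
   of padding; otherwise return ADDR unchanged (no alignment applied).  */
static size_t
align_with_max_skip (size_t addr, size_t align, size_t max_skip)
{
  size_t pad = padding_needed (addr, align);
  return pad <= max_skip ? addr + pad : addr;
}

/* Spot checks for a 16-byte boundary with at most 10 bytes skipped.  */
static void
align_self_check (void)
{
  assert (align_with_max_skip (0x1006, 16, 10) == 0x1010); /* pad of 10 */
  assert (align_with_max_skip (0x1005, 16, 10) == 0x1005); /* pad of 11 */
  assert (align_with_max_skip (0x1010, 16, 10) == 0x1010); /* aligned */
}
```

With a limit of 10, a location 11 or more bytes short of the next 16-byte boundary is left where it is, which trades alignment for less padding in the instruction stream.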
Re: section anchors and weak hidden symbols
On 05/09/13 07:02, Nathan Sidwell wrote:
On 05/08/13 18:47, Jan Hubicka wrote:

Thinking about it again, isn't decl_replaceable_p the thing you are looking for here?

that looks promising. I'll try !decl_replaceable_p in the section anchor hook.

It does indeed seem to be the right predicate. tested with a ppc-linux target, ok?

nathan

2013-05-13  Nathan Sidwell  nat...@codesourcery.com

gcc/
	* varasm.c (default_use_anchors_for_symbol_p): Use decl_replaceable_p.

gcc/testsuite/
	* gcc.dg/visibility-21.c: New.

Index: varasm.c
===================================================================
--- varasm.c	(revision 198771)
+++ varasm.c	(working copy)
@@ -6582,10 +6582,18 @@ default_use_anchors_for_symbol_p (const_
 {
   /* Don't use section anchors for decls that might be defined by
      other modules.  */
-  if (!targetm.binds_local_p (decl))
+  if (decl_replaceable_p (decl))
     return false;

   /* Don't use section anchors for decls that will be placed in a
Index: testsuite/gcc.dg/visibility-21.c
===================================================================
--- testsuite/gcc.dg/visibility-21.c	(revision 0)
+++ testsuite/gcc.dg/visibility-21.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test visibility attribute on function definition. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsection-anchors" } */
+/* { dg-require-visibility "" } */
+/* { dg-require-weak "" } */
+/* { dg-final { scan-assembler-not "ANCHOR" } } */
+
+int __attribute__((weak, visibility("hidden"))) weak_hidden[3];
+
+int *f_weak_hidden ()
+{
+  return weak_hidden;
+}
Re: section anchors and weak hidden symbols
On 05/09/13 07:02, Nathan Sidwell wrote:
On 05/08/13 18:47, Jan Hubicka wrote:

Thinking about it again, isn't decl_replaceable_p the thing you are looking for here?

that looks promising. I'll try !decl_replaceable_p in the section anchor hook.

It does indeed seem to be the right predicate. tested with a ppc-linux target, ok?

Looks good to me. We eventually ought to do some renaming and move all the predicates to a single place. It is really hard to make sense of them as they are organized currently.

Honza

nathan

2013-05-13  Nathan Sidwell  nat...@codesourcery.com

gcc/
	* varasm.c (default_use_anchors_for_symbol_p): Use decl_replaceable_p.

gcc/testsuite/
	* gcc.dg/visibility-21.c: New.

Index: varasm.c
===================================================================
--- varasm.c	(revision 198771)
+++ varasm.c	(working copy)
@@ -6582,10 +6582,18 @@ default_use_anchors_for_symbol_p (const_
 {
   /* Don't use section anchors for decls that might be defined by
      other modules.  */
-  if (!targetm.binds_local_p (decl))
+  if (decl_replaceable_p (decl))
     return false;

   /* Don't use section anchors for decls that will be placed in a
Index: testsuite/gcc.dg/visibility-21.c
===================================================================
--- testsuite/gcc.dg/visibility-21.c	(revision 0)
+++ testsuite/gcc.dg/visibility-21.c	(revision 0)
@@ -0,0 +1,13 @@
+/* Test visibility attribute on function definition. */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsection-anchors" } */
+/* { dg-require-visibility "" } */
+/* { dg-require-weak "" } */
+/* { dg-final { scan-assembler-not "ANCHOR" } } */
+
+int __attribute__((weak, visibility("hidden"))) weak_hidden[3];
+
+int *f_weak_hidden ()
+{
+  return weak_hidden;
+}
Re: More vector folding
On Sat, May 11, 2013 at 11:38 AM, Marc Glisse marc.gli...@inria.fr wrote:

Second try.

I removed the fold_single_bit_test thing (I thought I'd handle it, so I started by the easy part, and never did the rest).

Adapting invert_truthvalue_loc for vectors, I thought: calling fold_truth_not_expr and build1 if it fails is just the same as fold_build1. Except that it wasn't: fold_unary_loc fold_convert to boolean before calling fold_truth_not_expr and then back to the required type. And instead of simply changing the type of an EQ_EXPR, fold_convert introduces a NOP_EXPR (one that STRIP_NOPS doesn't remove), which hides the comparison from many other parts of the front-end (affects warnings) and folding. I hesitated between removing this cast and enhancing fold_convert, and chose the one that removes code. As a side benefit, I got an XPASS :-)

Passes bootstrap+testsuite on x86_64-linux-gnu.

2013-05-11  Marc Glisse  marc.gli...@inria.fr

gcc/
	* fold-const.c (fold_negate_expr): Handle vectors.
	(fold_truth_not_expr): Make it static.
	(invert_truthvalue_loc): Handle vectors. Do not call fold_truth_not_expr directly.
	(fold_unary_loc) <BIT_NOT_EXPR>: Handle vector comparisons.
	<TRUTH_NOT_EXPR>: Do not cast to boolean.
	(fold_comparison): Handle vector constants.
	(fold_ternary_loc) <VEC_COND_EXPR>: Adapt more COND_EXPR optimizations.
	* tree.h (fold_truth_not_expr): Remove declaration.

gcc/testsuite/
	* g++.dg/ext/vector22.C: New testcase.
	* gcc.dg/binop-xor3.c: Remove xfail.

--
Marc Glisse

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 198796)
+++ gcc/fold-const.c	(working copy)
@@ -519,21 +519,21 @@ fold_negate_expr (location_t loc, tree t
 {
   tree type = TREE_TYPE (t);
   tree tem;

   switch (TREE_CODE (t))
     {
     /* Convert - (~A) to A + 1.  */
     case BIT_NOT_EXPR:
       if (INTEGRAL_TYPE_P (type))
         return fold_build2_loc (loc, PLUS_EXPR, type, TREE_OPERAND (t, 0),
-                            build_int_cst (type, 1));
+                            build_one_cst (type));
       break;

     case INTEGER_CST:
       tem = fold_negate_const (t, type);
       if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
 	  || !TYPE_OVERFLOW_TRAPS (type))
 	return tem;
       break;

     case REAL_CST:
@@ -3078,21 +3078,21 @@ omit_two_operands_loc (location_t loc, t
 }


 /* Return a simplified tree node for the truth-negation of ARG.  This
    never alters ARG itself.  We assume that ARG is an operation that
    returns a truth value (0 or 1).

    FIXME: one would think we would fold the result, but it causes
    problems with the dominator optimizer.  */

-tree
+static tree
 fold_truth_not_expr (location_t loc, tree arg)
 {
   tree type = TREE_TYPE (arg);
   enum tree_code code = TREE_CODE (arg);
   location_t loc1, loc2;

   /* If this is a comparison, we can simply invert it, except for
      floating-point non-equality comparisons, in which case we just
      enclose a TRUTH_NOT_EXPR around what we have.  */

@@ -3215,38 +3215,33 @@ fold_truth_not_expr (location_t loc, tre
       return build1_loc (loc, CLEANUP_POINT_EXPR, type,
 			 invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0)));

     default:
       return NULL_TREE;
     }
 }

 /* Return a simplified tree node for the truth-negation of ARG.  This
    never alters ARG itself.  We assume that ARG is an operation that
-   returns a truth value (0 or 1).
-
-   FIXME: one would think we would fold the result, but it causes
-   problems with the dominator optimizer.  */
+   returns a truth value (0 or 1 for scalars, 0 or -1 for vectors).  */

 tree
 invert_truthvalue_loc (location_t loc, tree arg)
 {
-  tree tem;
-
   if (TREE_CODE (arg) == ERROR_MARK)
     return arg;

-  tem = fold_truth_not_expr (loc, arg);
-  if (!tem)
-    tem = build1_loc (loc, TRUTH_NOT_EXPR, TREE_TYPE (arg), arg);
-
-  return tem;
+  tree type = TREE_TYPE (arg);
+  return fold_build1_loc (loc, VECTOR_TYPE_P (type)
+			       ? BIT_NOT_EXPR
+			       : TRUTH_NOT_EXPR,
+			  type, arg);
 }

 /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
    operands are another bit-wise operation with a common input.  If so,
    distribute the bit operations to save an operation and possibly two if
    constants are involved.  For example, convert
      (A | B) & (A | C) into A | (B & C)
    Further simplification will occur if B and C are constants.

    If this optimization cannot be done, 0 will be returned.  */

@@ -8274,28 +8269,34 @@ fold_unary_loc (location_t loc, enum tre
 	    {
 	      elem = VECTOR_CST_ELT (arg0, i);
 	      elem = fold_unary_loc (loc, BIT_NOT_EXPR, TREE_TYPE
[AARCH64] Refactor simd_mov split
Hi, This patch refactors the simd_mov split and fixes a few coding style issues. Tested successfully on a full aarch64-elf regression run. OK for trunk? Thanks Sofiane aarch64-refactor-simd-mov.diff Description: Binary data
[PATCH] Fix PR57235
This fixes a virtual SSA updating problem with sinking clobbers. Namely when sinking into a block with multiple predecessors and no virtual use we lack a convenient PHI node that serves as a merge of the virtual operands from the predecessors.

The following patch gives up sinking clobbers in this case (alternatively we can remove all clobbers).

Any preference here? I'm wondering when sinking a clobber into a block with multiple preds is good.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2013-05-13  Richard Biener  rguent...@suse.de

	PR middle-end/57235
	* tree-eh.c (sink_clobbers): Give up for successors with multiple predecessors and no virtual uses.

	* g++.dg/torture/pr57235.C: New testcase.

Index: gcc/tree-eh.c
===================================================================
--- gcc/tree-eh.c	(revision 198815)
+++ gcc/tree-eh.c	(working copy)
@@ -3360,6 +3360,11 @@ sink_clobbers (basic_block bb)
 	}
     }

+  /* ??? If we have multiple predecessors but no virtual PHI we cannot
+     easily update virtual SSA form.  */
+  if (!vphi && !single_pred_p (succbb))
+    return 0;
+
   dgsi = gsi_after_labels (succbb);
   gsi = gsi_last_bb (bb);
   for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
Index: gcc/testsuite/g++.dg/torture/pr57235.C
===================================================================
--- gcc/testsuite/g++.dg/torture/pr57235.C	(revision 0)
+++ gcc/testsuite/g++.dg/torture/pr57235.C	(working copy)
@@ -0,0 +1,156 @@
+// { dg-do compile }
+
+namespace std
+{
+  template < class _Elem > struct char_traits
+  {
+  };
+  struct _Container_base
+  {
+  };
+    template < class _Ty > struct _Allocator_base
+  {
+  };
+    template < class _Ty > class allocator:public _Allocator_base < _Ty >
+  {
+  };
+  class _String_base:public _Container_base
+  {
+  };
+template < class _Ty, class _Alloc > class _String_val:public _String_base
+  {
+  };
+template < class _Elem, class _Traits, class _Ax > class basic_string:public _String_val < _Elem,
+    _Ax
+    >
+  {
+  public:typedef basic_string < _Elem, _Traits, _Ax > _Myt;
+    typedef _String_val < _Elem, _Ax > _Mybase;
+    basic_string (const _Elem * _Ptr):_Mybase ()
+    {
+    }
+  };
+  typedef basic_string < char, char_traits < char >,
+    allocator < char > >string;
+}
+
+
+namespace google
+{
+  namespace protobuf
+  {
+    namespace internal
+    {
+      template < class C > class scoped_ptr
+      {
+      public:typedef C element_type;
+      explicit scoped_ptr (C * p = __null):ptr_ (p)
+	{
+	}
+	 ~scoped_ptr ()
+	{
+	  delete ptr_;
+	}
+	C *get () const
+	{
+	  return ptr_;
+	}
+      private:  C * ptr_;
+      };
+    }
+    using internal::scoped_ptr;
+    enum LogLevel
+    {
+      LOGLEVEL_INFO, LOGLEVEL_WARNING, LOGLEVEL_ERROR, LOGLEVEL_FATAL,
+	LOGLEVEL_DFATAL = LOGLEVEL_ERROR
+    };
+    namespace internal
+    {
+      class LogMessage
+      {
+      public:LogMessage (LogLevel level, const char *filename,
+		    int line);
+	 ~LogMessage ();
+	  LogMessage & operator<< (const std::string & value);
+      };
+      class LogFinisher
+      {
+      public:void operator= (LogMessage & other);
+      };
+    }
+    using namespace std;
+    class Descriptor
+    {
+    };
+    class FieldDescriptor
+    {
+    public:
+      const Descriptor *message_type () const;
+      string DebugString () const;
+    };
+    class MessageLite
+    {
+    };
+    class Message:public MessageLite
+    {
+    public:inline Message ()
+      {
+      }
+      virtual ~ Message ();
+      virtual Message *New () const = 0;
+    };
+    class MessageFactory
+    {
+    };
+    class UnknownFieldSet
+    {
+    };
+    class DynamicMessageFactory:public MessageFactory
+    {
+    public:DynamicMessageFactory ();
+      const Message *GetPrototype (const Descriptor * type);
+    };
+    namespace io
+    {
+      class ErrorCollector
+      {
+      public:inline ErrorCollector ()
+	{
+	}
+	virtual ~ ErrorCollector ();
+      };
+    }
+    class DescriptorBuilder
+    {
+      class OptionInterpreter
+      {
+	bool SetAggregateOption (const FieldDescriptor * option_field,
+				 UnknownFieldSet * unknown_fields);
+	DynamicMessageFactory dynamic_factory_;
+      };
+    };
+    namespace
+    {
+      class AggregateErrorCollector:public io::ErrorCollector
+      {
+      };
+    }
+    bool DescriptorBuilder::OptionInterpreter::
+      SetAggregateOption (const FieldDescriptor * option_field,
+			  UnknownFieldSet * unknown_fields)
+    {
+      const Descriptor *type = option_field->message_type ();
+      scoped_ptr < Message >
+	dynamic (dynamic_factory_.GetPrototype (type)->New ());
+      !(!(dynamic.get () !=
+	  __null)) ? (void) 0 : ::google::protobuf::internal::
+	LogFinisher ()
Re: [PATCH] Fix up rotate expansion (take 2)
On Sat, 11 May 2013, Jakub Jelinek wrote: On Sat, May 11, 2013 at 09:05:52AM +0200, Jakub Jelinek wrote: Seems that we ought to have a testcase, even though it probably means scanning the tree dumps to pick up the undefined behaviour. Approved with a testcase. I have added lots of testcases recently, for rotation by zero perhaps something similar to rotate-1a.c from above can be added as rotate-2b.c and rotate-4b.c, and test zero rotation. Thanks for forcing me to do more testcases, I've actually found a serious bug in my recent patch. The (X Y) OP (X ((-Y) (B - 1))) style patterns can only be recognized as rotates if OP is |, because while they act as rotates for Y != 0, they act differently for Y == 0. For (X Y) OP (X (B - Y)) that is not an issue, because for Y == 0 they trigger undefined behavior. Fixed thusly, plus added coverage for rotates by 0. And rotate-5.c testcase is to test the expmed.c change. Ok. Thanks, Richard. 2013-05-10 Jakub Jelinek ja...@redhat.com PR tree-optimization/45216 PR tree-optimization/57157 * tree-ssa-forwprop.c (simplify_rotate): Only recognize the (-Y) (B - 1) variant if OP is |. * expmed.c (expand_shift_1): For rotations by const0_rtx just return shifted. Use (-op1) (prec - 1) as other_amount instead of prec - op1. * c-c++-common/rotate-1.c: Add 32 tests with +. * c-c++-common/rotate-1a.c: Adjust. * c-c++-common/rotate-2.c: Add 32 tests with +, expect only 48 rotates. * c-c++-common/rotate-2b.c: New test. * c-c++-common/rotate-3.c: Add 32 tests with +. * c-c++-common/rotate-4.c: Add 32 tests with +, expect only 48 rotates. * c-c++-common/rotate-4b.c: New test. * c-c++-common/rotate-5.c: New test. 
--- gcc/tree-ssa-forwprop.c.jj2013-05-10 10:39:13.0 +0200 +++ gcc/tree-ssa-forwprop.c 2013-05-11 09:57:39.627194037 +0200 @@ -2135,10 +2135,10 @@ simplify_bitwise_binary (gimple_stmt_ite (X (int) Y) OP (X (int) (B - Y)) ((T) ((T2) X Y)) OP ((T) ((T2) X (B - Y))) ((T) ((T2) X (int) Y)) OP ((T) ((T2) X (int) (B - Y))) - (X Y) OP (X ((-Y) (B - 1))) - (X (int) Y) OP (X (int) ((-Y) (B - 1))) - ((T) ((T2) X Y)) OP ((T) ((T2) X ((-Y) (B - 1 - ((T) ((T2) X (int) Y)) OP ((T) ((T2) X (int) ((-Y) (B - 1 + (X Y) | (X ((-Y) (B - 1))) + (X (int) Y) | (X (int) ((-Y) (B - 1))) + ((T) ((T2) X Y)) | ((T) ((T2) X ((-Y) (B - 1 + ((T) ((T2) X (int) Y)) | ((T) ((T2) X (int) ((-Y) (B - 1 and transform these into: X r CNT1 @@ -2293,7 +2293,8 @@ simplify_rotate (gimple_stmt_iterator *g host_integerp (cdef_arg2[i], 0) tree_low_cst (cdef_arg2[i], 0) == TYPE_PRECISION (rtype) - 1 - TREE_CODE (cdef_arg1[i]) == SSA_NAME) + TREE_CODE (cdef_arg1[i]) == SSA_NAME + gimple_assign_rhs_code (stmt) == BIT_IOR_EXPR) { tree tem; enum tree_code code; --- gcc/expmed.c.jj 2013-05-07 10:26:46.0 +0200 +++ gcc/expmed.c 2013-05-11 09:11:54.087412982 +0200 @@ -2166,7 +2166,8 @@ expand_shift_1 (enum tree_code code, enu { /* If we have been unable to open-code this by a rotation, do it as the IOR of two shifts. I.e., to rotate A - by N bits, compute (A N) | ((unsigned) A (C - N)) + by N bits, compute + (A N) | ((unsigned) A ((-N) (C - 1))) where C is the bitsize of A. 
It is theoretically possible that the target machine might @@ -2181,14 +2182,22 @@ expand_shift_1 (enum tree_code code, enu rtx temp1; new_amount = op1; - if (CONST_INT_P (op1)) + if (op1 == const0_rtx) + return shifted; + else if (CONST_INT_P (op1)) other_amount = GEN_INT (GET_MODE_BITSIZE (mode) - INTVAL (op1)); else - other_amount - = simplify_gen_binary (MINUS, GET_MODE (op1), - GEN_INT (GET_MODE_PRECISION (mode)), - op1); + { + other_amount + = simplify_gen_unary (NEG, GET_MODE (op1), + op1, GET_MODE (op1)); + other_amount + = simplify_gen_binary (AND, GET_MODE (op1), +other_amount, +GEN_INT (GET_MODE_PRECISION (mode) + - 1)); + } shifted = force_reg (mode, shifted); --- gcc/testsuite/c-c++-common/rotate-1.c.jj 2013-05-10 10:39:13.0 +0200 +++
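
A minimal sketch of the Y == 0 problem discussed in this message, assuming a 32-bit operand (B = 32). The helper names rotl_or and rotl_plus are made up for illustration and do not appear in GCC. With |, the masked-negate form is a rotate for every count; with +, it is a rotate only for non-zero counts, because at Y == 0 both halves are X and the sum is 2 * X.

```c
#include <stdint.h>

/* Rotate left written as (X << Y) | (X >> ((-Y) & (B - 1))) with B = 32.
   Well defined for every y in [0, 31]: for y == 0 both shift counts are 0
   and X | X is X again.  */
static uint32_t
rotl_or (uint32_t x, unsigned y)
{
  return (x << y) | (x >> ((-y) & 31u));
}

/* The same shape with + instead of |.  The two halves have no bits in
   common when y != 0, so the result matches rotl_or there, but for
   y == 0 it computes X + X instead of X.  */
static uint32_t
rotl_plus (uint32_t x, unsigned y)
{
  return (x << y) + (x >> ((-y) & 31u));
}
```

For counts in 1..31 the two helpers agree, which is why the + and ^ variants are still fine for the (B - Y) form, where a zero count is undefined behavior anyway.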
Re: [ada, build] Restore Solaris/amd64 Ada bootstrap (PR ada/57188)
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

As described in the PR, amd64-pc-solaris2.1[01] Ada bootstrap was failing for some time. It has turned out that this patch is the culprit:

2013-04-23  Eric Botcazou  ebotca...@adacore.com
	    Pascal Obry  o...@adacore.com

	* gcc-interface/Makefile.in (targ): Fix target name check.

[...]

I couldn't find the gcc-patches posting for this patch, thus I'm missing the rationale for it. It seems rather counterintuitive and fragile to me, replacing the canonical $target by the far more varied $target_alias.

If there's really a good reason to keep that patch nonetheless, the following patch fixes Solaris/x64 bootstrap.

Bootstrapped without regression on amd64-pc-solaris2.10 and i386-pc-solaris2.11. Ok for mainline?

The patch was approved by Arno in the PR, so I've installed it.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: More vector folding
On Mon, 13 May 2013, Richard Biener wrote: On Sat, May 11, 2013 at 11:38 AM, Marc Glisse marc.gli...@inria.fr wrote: Second try. I removed the fold_single_bit_test thing (I thought I'd handle it, so I started by the easy part, and never did the rest). Adapting invert_truthvalue_loc for vectors, I thought: calling fold_truth_not_expr and build1 if it fails is just the same as fold_build1. Except that it wasn't: fold_unary_loc fold_convert to boolean before calling fold_truth_not_expr and then back to the required type. And instead of simply changing the type of an EQ_EXPR, fold_convert introduces a NOP_EXPR (one that STRIP_NOPS doesn't remove), which hides the comparison from many other parts of the front-end (affects warnings) and folding. I hesitated between removing this cast and enhancing fold_convert, and chose the one that removes code. As a side benefit, I got an XPASS :-) Passes bootstrap+testsuite on x86_64-linux-gnu. 2013-05-11 Marc Glisse marc.gli...@inria.fr gcc/ * fold-const.c (fold_negate_expr): Handle vectors. (fold_truth_not_expr): Make it static. (invert_truthvalue_loc): Handle vectors. Do not call fold_truth_not_expr directly. (fold_unary_loc) BIT_NOT_EXPR: Handle vector comparisons. TRUTH_NOT_EXPR: Do not cast to boolean. (fold_comparison): Handle vector constants. (fold_ternary_loc) VEC_COND_EXPR: Adapt more COND_EXPR optimizations. * tree.h (fold_truth_not_expr): Remove declaration. gcc/testsuite/ * g++.dg/ext/vector22.C: New testcase. * gcc.dg/binop-xor3.c: Remove xfail. -- Marc Glisse Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 198796) +++ gcc/fold-const.c(working copy) @@ -519,21 +519,21 @@ fold_negate_expr (location_t loc, tree t { tree type = TREE_TYPE (t); tree tem; switch (TREE_CODE (t)) { /* Convert - (~A) to A + 1. 
*/ case BIT_NOT_EXPR: if (INTEGRAL_TYPE_P (type)) return fold_build2_loc (loc, PLUS_EXPR, type, TREE_OPERAND (t, 0), -build_int_cst (type, 1)); +build_one_cst (type)); break; case INTEGER_CST: tem = fold_negate_const (t, type); if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t) || !TYPE_OVERFLOW_TRAPS (type)) return tem; break; case REAL_CST: @@ -3078,21 +3078,21 @@ omit_two_operands_loc (location_t loc, t } /* Return a simplified tree node for the truth-negation of ARG. This never alters ARG itself. We assume that ARG is an operation that returns a truth value (0 or 1). FIXME: one would think we would fold the result, but it causes problems with the dominator optimizer. */ -tree +static tree fold_truth_not_expr (location_t loc, tree arg) { tree type = TREE_TYPE (arg); enum tree_code code = TREE_CODE (arg); location_t loc1, loc2; /* If this is a comparison, we can simply invert it, except for floating-point non-equality comparisons, in which case we just enclose a TRUTH_NOT_EXPR around what we have. */ @@ -3215,38 +3215,33 @@ fold_truth_not_expr (location_t loc, tre return build1_loc (loc, CLEANUP_POINT_EXPR, type, invert_truthvalue_loc (loc1, TREE_OPERAND (arg, 0))); default: return NULL_TREE; } } /* Return a simplified tree node for the truth-negation of ARG. This never alters ARG itself. We assume that ARG is an operation that - returns a truth value (0 or 1). - - FIXME: one would think we would fold the result, but it causes - problems with the dominator optimizer. */ + returns a truth value (0 or 1 for scalars, 0 or -1 for vectors). */ tree invert_truthvalue_loc (location_t loc, tree arg) { - tree tem; - if (TREE_CODE (arg) == ERROR_MARK) return arg; - tem = fold_truth_not_expr (loc, arg); - if (!tem) -tem = build1_loc (loc, TRUTH_NOT_EXPR, TREE_TYPE (arg), arg); - - return tem; + tree type = TREE_TYPE (arg); + return fold_build1_loc (loc, VECTOR_TYPE_P (type) + ? 
BIT_NOT_EXPR + : TRUTH_NOT_EXPR, + type, arg); } /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both operands are another bit-wise operation with a common input. If so, distribute the bit operations to save an operation and possibly two if constants are involved. For example, convert (A | B) (A | C) into A | (B C) Further simplification will occur if B and C are constants. If this optimization cannot be done, 0 will be returned. */ @@ -8274,28 +8269,34 @@ fold_unary_loc (location_t loc, enum tre { elem = VECTOR_CST_ELT (arg0, i); elem = fold_unary_loc (loc, BIT_NOT_EXPR, TREE_TYPE (type), elem); if (elem == NULL_TREE)
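
A small sketch of the vector truth-value convention the patch relies on (0 or -1 per lane, rather than 0 or 1 as for scalars), written with the GNU C vector extensions. The type name v4si and the helper names are illustrative assumptions and not part of the patch. It shows why inverting a vector comparison is a bitwise NOT, which is the BIT_NOT_EXPR branch chosen in invert_truthvalue_loc for vector types.

```c
#include <stdint.h>

/* Four 32-bit signed lanes (GNU C vector extension).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Lane-wise equality: every lane of the result is either -1 (all bits
   set) or 0.  */
static v4si
lanes_equal (v4si a, v4si b)
{
  return a == b;
}

/* Inverting a vector truth value is a bitwise NOT of every lane:
   ~(-1) == 0 and ~0 == -1.  */
static v4si
invert_truth (v4si mask)
{
  return ~mask;
}
```

Applying invert_truth to the result of lanes_equal gives the same lanes as a direct a != b comparison would.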
[PATCH,i386] FSGSBASE for AMD bdver3
Hi

The patch enables FSGSBASE instruction generation for AMD bdver3 architectures. make -k check passes. Is it OK for upstream?

Regards
Ganesh

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 198821)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2013-05-13  Ganesh Gopalasubramanian  ganesh.gopalasubraman...@amd.com
+
+	* config/i386/i386.c (processor_alias_table): Add instruction
+	FSGSBASE for AMD bdver3 architecture.
+
 2013-05-13  Martin Jambor  mjam...@suse.cz

 	PR middle-end/42371
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 198821)
+++ gcc/config/i386/i386.c (working copy)
@@ -3000,7 +3000,7 @@
 	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_XOP | PTA_LWP
 	| PTA_BMI | PTA_TBM | PTA_F16C | PTA_FMA | PTA_PRFCHW | PTA_FXSR
 	| PTA_XSAVE
-	| PTA_XSAVEOPT},
+	| PTA_XSAVEOPT | PTA_FSGSBASE},
       {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
Re: [PATCH,i386] FSGSBASE for AMD bdver3
On Mon, May 13, 2013 at 1:54 PM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: The patch enables FSGSBASE instruction generation for AMD bdver3 architectures. make -k check passes. Is it OK for upstream? OK. Please also check for missing PTA_PRFCHW and PTA_FXSR for AMD processors in processor_alias_table. Thanks, Uros.
Re: PATCH to implement C++14 VLA semantics
On 05/09/2013 06:41 PM, Jason Merrill wrote: At the last C++ standards meeting, we agreed to add VLAs to the language. But they're significantly different from GNU/C99 VLAs: you can't form a pointer to a VLA, or take its sizeof, or really anything other than directly use it. We also need to throw an exception if we try to create one with a negative or too large bound. I'm not sure if we should throw the exception in case of large size_t values. Even with the checks in place, there is still a wide gap where the definition triggers undefined behavior due to stack overflow. This whole feature seems rather poorly designed to me. The code size increase due to official VLA support in C++11y might come a bit as a surprise. But rereading N3639, there's no way around it, at least for expressions of signed types. -- Florian Weimer / Red Hat Product Security Team
Profile predicates housekeeping
Hi,
this patch makes i386.c use the correct predicates for hot/cold decisions. It also makes mode-switching set the correct RTL profile when emitting the initialization code.

Honza

Bootstrapped/regtested x86_64-linux, committed.

	* mode-switching.c (optimize_mode_switching): Set correct RTL profile.
	* config/i386/i386.c (ix86_compute_frame_layout,
	ix86_expand_epilogue, emit_i387_cw_initialization,
	ix86_expand_vector_move_misalign, ix86_fp_comparison_strategy,
	ix86_local_alignment): Fix use of size/speed predicates.

Index: mode-switching.c
===================================================================
*** mode-switching.c	(revision 198821)
--- mode-switching.c	(working copy)
*************** optimize_mode_switching (void)
*** 667,676 ****
--- 667,678 ----
            REG_SET_TO_HARD_REG_SET (live_at_edge, df_get_live_out (src_bb));

+           rtl_profile_for_edge (eg);
            start_sequence ();
            EMIT_MODE_SET (entity_map[j], mode, live_at_edge);
            mode_set = get_insns ();
            end_sequence ();
+           default_rtl_profile ();

            /* Do not bother to insert empty sequence.  */
            if (mode_set == NULL_RTX)
*************** optimize_mode_switching (void)
*** 713,718 ****
--- 715,721 ----
                  {
                    rtx mode_set;

+                   rtl_profile_for_bb (bb);
                    start_sequence ();
                    EMIT_MODE_SET (entity_map[j], ptr->mode, ptr->regs_live);
                    mode_set = get_insns ();
*************** optimize_mode_switching (void)
*** 727,732 ****
--- 730,737 ----
                    else
                      emit_insn_before (mode_set, ptr->insn_ptr);
                  }
+
+               default_rtl_profile ();
              }

            free (ptr);
Index: config/i386/i386.c
===================================================================
*** config/i386/i386.c	(revision 198821)
--- config/i386/i386.c	(working copy)
*************** ix86_compute_frame_layout (struct ix86_f
*** 9003,9009 ****
       Recompute the value as needed.  Do not recompute when amount of registers
       didn't change as reload does multiple calls to the function and does not
       expect the decision to change within single iteration.  */
!   else if (!optimize_function_for_size_p (cfun)
             && cfun->machine->use_fast_prologue_epilogue_nregs != frame->nregs)
      {
        int count = frame->nregs;
--- 9003,9009 ----
       Recompute the value as needed.
Do not recompute when amount of registers didn't change as reload does multiple calls to the function and does not expect the decision to change within single iteration. */ ! else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR) cfun-machine-use_fast_prologue_epilogue_nregs != frame-nregs) { int count = frame-nregs; *** ix86_expand_epilogue (int style) *** 11071,11077 /* Leave results in shorter dependency chains on CPUs that are able to grok it fast. */ else if (TARGET_USE_LEAVE ! || optimize_function_for_size_p (cfun) || !cfun-machine-use_fast_prologue_epilogue) ix86_emit_leave (); else --- 11071,11077 /* Leave results in shorter dependency chains on CPUs that are able to grok it fast. */ else if (TARGET_USE_LEAVE ! || optimize_bb_for_size_p (EXIT_BLOCK_PTR) || !cfun-machine-use_fast_prologue_epilogue) ix86_emit_leave (); else *** emit_i387_cw_initialization (int mode) *** 15668,15674 emit_move_insn (reg, copy_rtx (stored_mode)); if (TARGET_64BIT || TARGET_PARTIAL_REG_STALL ! || optimize_function_for_size_p (cfun)) { switch (mode) { --- 15668,15674 emit_move_insn (reg, copy_rtx (stored_mode)); if (TARGET_64BIT || TARGET_PARTIAL_REG_STALL ! || optimize_insn_for_size_p ()) { switch (mode) { *** ix86_expand_vector_move_misalign (enum m *** 16426,16432 if (TARGET_AVX || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL ! || optimize_function_for_size_p (cfun)) { /* We will eventually emit movups based on insn attributes. */ emit_insn (gen_sse2_loadupd (op0, op1)); --- 16426,16432 if (TARGET_AVX || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL ! || optimize_insn_for_size_p ()) { /* We will eventually emit movups based on insn attributes. */ emit_insn (gen_sse2_loadupd (op0, op1)); *** ix86_expand_vector_move_misalign (enum m *** 16463,16469 if (TARGET_AVX || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
[gomp4] #pragma omp declare simd C parsing
Hi! As expected, the C FE in this case was easier than C++, for #pragma omp declare reduction it will be the other way around, C will be much harder. Committed to gomp-4_0-branch. 2013-05-13 Jakub Jelinek ja...@redhat.com * c-tree.h (c_finish_omp_declare_simd): New prototype. * c-typeck.c (c_finish_omp_clauses): Handle OMP_CLAUSE_LINEAR_STEP adjustments for pointer-types here. Diagnose inbranch notinbranch being used together. (c_finish_omp_declare_simd): New function. * c-parser.c (enum pragma_context): Add pragma_struct and pragma_param. (c_parser_declaration_or_fndef): Add omp_declare_simd_clauses argument. Call c_finish_omp_declare_simd if needed. (c_parser_external_declaration, c_parser_compound_statement_nostart, c_parser_label, c_parser_for_statement, c_parser_objc_methodprotolist, c_parser_omp_for_loop): Adjust c_parser_declaration_or_fndef callers. (c_parser_struct_or_union_specifier): Use pragma_struct instead of pragma_external. (c_parser_parameter_declaration): Use pragma_param instead of pragma_external. (c_parser_pragma): Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt != pragma_compound test. (c_parser_omp_variable_list): Add declare_simd argument. Don't lookup vars if it is true, just store identifiers. (c_parser_omp_var_list_parens, c_parser_omp_clause_depend, c_parser_omp_clause_map): Adjust callers. (c_parser_omp_clause_reduction, c_parser_omp_clause_aligned): Add declare_simd argument, pass it through to c_parser_omp_variable_list. (c_parser_omp_clause_linear): Likewise. Don't handle OMP_CLAUSE_LINEAR_STEP adjustements for pointer-types here. (c_parser_omp_clause_uniform): Call c_parser_omp_variable_list instead of c_parser_omp_var_list_parens to pass true as declare_simd. (c_parser_omp_all_clauses): Add declare_simd argument, pass it through clause parsing routines as needed. Don't call c_finish_omp_clauses if set. 
(c_parser_omp_simd, c_parser_omp_for, c_parser_omp_sections, c_parser_omp_parallel, c_parser_omp_single, c_parser_omp_task, c_parser_omp_cancel, c_parser_omp_cancellation_point): Adjust callers. (OMP_DECLARE_SIMD_CLAUSE_MASK): Define. (c_parser_omp_declare_simd, c_parser_omp_declare): New functions. * gcc.dg/gomp/declare-simd-1.c: New test. * gcc.dg/gomp/declare-simd-2.c: New test. --- gcc/c/c-tree.h.jj 2013-04-24 16:53:21.0 +0200 +++ gcc/c/c-tree.h 2013-05-13 12:50:51.536759893 +0200 @@ -642,6 +642,7 @@ extern void c_finish_omp_taskgroup (loca extern void c_finish_omp_cancel (location_t, tree); extern void c_finish_omp_cancellation_point (location_t, tree); extern tree c_finish_omp_clauses (tree); +extern void c_finish_omp_declare_simd (tree, tree, vectree); extern tree c_build_va_arg (location_t, tree, tree); extern tree c_finish_transaction (location_t, tree, int); extern bool c_tree_equal (tree, tree); --- gcc/c/c-typeck.c.jj 2013-04-24 17:51:21.0 +0200 +++ gcc/c/c-typeck.c2013-05-13 13:59:23.654460967 +0200 @@ -10670,6 +10670,7 @@ c_finish_omp_clauses (tree clauses) bitmap_head aligned_head; tree c, t, *pc = clauses; const char *name; + bool branch_seen = false; bitmap_obstack_initialize (NULL); bitmap_initialize (generic_head, bitmap_default_obstack); @@ -10774,6 +10775,17 @@ c_finish_omp_clauses (tree clauses) remove = true; break; } + if (TREE_CODE (TREE_TYPE (OMP_CLAUSE_DECL (c))) == POINTER_TYPE) + { + tree s = OMP_CLAUSE_LINEAR_STEP (c); + s = pointer_int_sum (OMP_CLAUSE_LOCATION (c), PLUS_EXPR, + OMP_CLAUSE_DECL (c), s); + s = fold_build2_loc (OMP_CLAUSE_LOCATION (c), MINUS_EXPR, + sizetype, s, OMP_CLAUSE_DECL (c)); + if (s == error_mark_node) + s = size_one_node; + OMP_CLAUSE_LINEAR_STEP (c) = s; + } goto check_dup_generic; check_dup_generic: @@ -10919,8 +10931,6 @@ c_finish_omp_clauses (tree clauses) case OMP_CLAUSE_SIMDLEN: case OMP_CLAUSE_DEVICE: case OMP_CLAUSE_DIST_SCHEDULE: - case OMP_CLAUSE_INBRANCH: - case OMP_CLAUSE_NOTINBRANCH: case 
OMP_CLAUSE_PARALLEL: case OMP_CLAUSE_FOR: case OMP_CLAUSE_SECTIONS: @@ -10929,6 +10939,20 @@ c_finish_omp_clauses (tree clauses) pc = OMP_CLAUSE_CHAIN (c); continue; + case OMP_CLAUSE_INBRANCH: + case OMP_CLAUSE_NOTINBRANCH: + if (branch_seen) + { + error_at (OMP_CLAUSE_LOCATION (c), + %inbranch% clause is incompatible with +
Re: PATCH to implement C++14 VLA semantics
On Mon, May 13, 2013 at 7:25 AM, Florian Weimer fwei...@redhat.com wrote: On 05/09/2013 06:41 PM, Jason Merrill wrote: At the last C++ standards meeting, we agreed to add VLAs to the language. But they're significantly different from GNU/C99 VLAs: you can't form a pointer to a VLA, or take its sizeof, or really anything other than directly use it. We also need to throw an exception if we try to create one with a negative or too large bound. I'm not sure if we should throw the exception in case of large size_t values. Even with the checks in place, there is still a wide gap where the definition triggers undefined behavior due to stack overflow. This whole feature seems rather poorly designed to me. The code size increase due to official VLA support in C++11y might come a bit as a surprise. But rereading N3639, there's no way around it, at least for expressions of signed types. I think there is a general mood of unsympathetic views towards liberal undefined behavior. Of course, implementations are always free to offer switches to programmers who don't want checks. -- Gaby
Re: PATCH to implement C++14 VLA semantics
On 05/13/2013 03:06 PM, Gabriel Dos Reis wrote: This whole feature seems rather poorly designed to me. The code size increase due to official VLA support in C++11y might come a bit as a surprise. But rereading N3639, there's no way around it, at least for expressions of signed types. I think there is a general mood of unsympathetic views towards liberal undefined behavior. Of course, implementations are always free to offer switches to programmers who don't want checks. And usually I'm in that crowd as well. But in this case, we add a check which only covers a tiny fraction of the problem. It's like bounds checking for arrays which only fails if the index is at least twice as large as the array length, IMHO. -- Florian Weimer / Red Hat Product Security Team
Re: section anchors and weak hidden symbols
Index: varasm.c
===================================================================
--- varasm.c (revision 198771)
+++ varasm.c (working copy)
@@ -6582,10 +6582,18 @@ default_use_anchors_for_symbol_p (const_
 {
   /* Don't use section anchors for decls that might be defined by other
      modules.  */
-  if (!targetm.binds_local_p (decl))
+  if (decl_replaceable_p (decl))
     return false;

Actually looking more into this, I think decl_replaceable_p is still not the correct predicate:

bool
decl_replaceable_p (tree decl)
{
  gcc_assert (DECL_P (decl));
  if (!TREE_PUBLIC (decl) || DECL_COMDAT (decl))
    return false;
  return !decl_binds_to_current_def_p (decl);
}

I think DECL_COMDAT is not what you really want to return true for. So perhaps you really want (TREE_PUBLIC (decl) && decl_binds_to_current_def_p)?

Honza
Re: [patch] Small emit-rtl.c / reorg.c cleanup
On 05/11/2013 01:29 PM, Steven Bosscher wrote: Hello, This just removes one unused function, and moves two functions from emit-rtl.c to reorg.c which is the only place where they're used. Will commit in a few days, barring objections. Ciao! Steven * rtl.h (next_label, skip_consecutive_labels, link_cc0_insns): Remove prototypes. * emit-rtl.c (next_label): Remove unused function. (skip_consecutive_labels, link_cc0_insns): Move to ... * reorg.c (skip_consecutive_labels, link_cc0_insns): ... here, the only place where these functions are used. OK. Jeff
Re: [google] Disable RDRAND bits when building with Clang
friendly ping On Mon, Apr 22, 2013 at 5:23 PM, Evgeniy Stepanov euge...@google.com wrote: Hi, this patch disables rdrand in c++11/random.cc when building with Clang compiler. Current Clang misses a number of definitions needed to build that. Is it OK for google/gcc-4_8 and google/main (or google/integration?) ?
Re: [AARCH64] Refactor simd_mov split
OK /Marcus On 13 May 2013 11:38, Sofiane Naci sofiane.n...@arm.com wrote: Hi, This patch refactors the simd_mov split and fixes a few coding style issues. Tested successfully on a full aarch64-elf regression run. OK for trunk? Thanks Sofiane
Re: new port: msp430-elf, revision 2
2013/5/13 Steven Bosscher stevenb@gmail.com:

On Sun, May 12, 2013 at 8:56 PM, Gerald Pfeifer wrote:

On Fri, 10 May 2013, DJ Delorie wrote:

Index: MAINTAINERS
===================================================================
+msp430 port		DJ Delorie		d...@redhat.com
+msp430 port		Nick Clifton		ni...@redhat.com

I'll bring this up on the steering committee (not that I expect a lot of discussion :-).

Actually I hope for some discussion: Should new ports be allowed in if they rely so heavily on reload?

Ciao!
Steven

I have a new port, the 'nds32' target, and I am going to contribute some patches to the mailing list soon. We are glad that we eventually decided to enable LRA rather than relying on reload. In my experience, it is much easier to design a new port with LRA enabled. We don't have to implement a ton of reload target hooks, and we can still get satisfactory allocation results. For future GCC development and target maintenance, IMHO it would be great to have a 'less reload' requirement for new ports.

Best regards,
jasonwucj
Re: [google] Disable RDRAND bits when building with Clang
re-sending the patch On Mon, May 13, 2013 at 5:54 PM, Evgeniy Stepanov euge...@google.com wrote: friendly ping On Mon, Apr 22, 2013 at 5:23 PM, Evgeniy Stepanov euge...@google.com wrote: Hi, this patch disables rdrand in c++11/random.cc when building with Clang compiler. Current Clang misses a number of definitions needed to build that. Is it OK for google/gcc-4_8 and google/main (or google/integration?) ? rdrand.patch Description: Binary data
Re: PATCH to implement C++14 VLA semantics
On 05/13/2013 09:09 AM, Florian Weimer wrote: And usually I'm in that crowd as well. But in this case, we add a check which only covers a tiny fraction of the problem. It's like bounds checking for arrays which only fails if the index is at least twice as large as the array length, IMHO. The document is still in flux; if you have ideas about ways to improve the specification, I would be happy to submit them as a comment on the public draft. Jason
Re: [ping][patch, ARM] Fix PR42017, LR not used in leaf functions
On 13/5/13 11:15 AM, Kugan wrote:

Hi,

Ping this patch by Chung-Lin.
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01179.html

This patch allows the lr register to be used in leaf functions for ARM. There were some concerns about a performance regression in Thumb-2 mode for CoreMark. However, looking at the code further shows that this regression is due to an alignment issue with the core_state_transition function, which as a result takes longer to execute. In fact, there isn’t any change in the code generated for core_state_transition with and without the patch. Adding alignment to this function gives better performance than without the patch.

Just curious, were changes to enforce the alignment added already? (I'm quite out of ARM-specific context lately).

Chung-Lin
[PING]RE: [patch] cilkplus: Array notation for C patch
Can someone please review this patch for us? Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Monday, May 06, 2013 11:32 AM To: Joseph S. Myers Cc: 'Aldy Hernandez'; 'gcc-patches' Subject: RE: [patch] cilkplus: Array notation for C patch Attached, please find a fixed patch. My responses to your comments are inline below: Thanks, Balaji V. Iyer. Here are the ChangeLog entries: gcc/ChangeLog +2013-05-06 Balaji V. Iyer balaji.v.i...@intel.com + + * doc/extend.texi (C Extensions): Added documentation about Cilk Plus + array notation built-in reduction functions. + * doc/passes.texi (Passes): Added documentation about changes done + for Cilk Plus. + * doc/invoke.texi (C Dialect Options): Added documentation about + the -fcilkplus flag. + * doc/generic.texi (Storage References): Added documentation for + ARRAY_NOTATION_REF storage. + * Makefile.in (C_COMMON_OBJS): Added c-family/array-notation- common.o. + (BUILTINS_DEF): Depend on cilkplus.def. + * builtins.def: Include cilkplus.def. Define DEF_CILKPLUS_BUILTIN. + * builtin-types.def: Define BT_FN_INT_PTR_PTR_PTR. + * cilkplus.def: New file. gcc/c-family/ChangeLog +2013-05-06 Balaji V. Iyer balaji.v.i...@intel.com + + * c-common.c (c_define_builtins): When cilkplus is enabled, the + function array_notation_init_builtins is called. + (c_common_init_ts): Added ARRAY_NOTATION_REF as typed. + * c-common.def (ARRAY_NOTATION_REF): New tree. + * c-common.h (build_array_notation_expr): New function declaration. + (build_array_notation_ref): Likewise. + (extract_sec_implicit_index_arg): New extern declaration. + (is_sec_implicit_index_fn): Likewise. + (ARRAY_NOTATION_CHECK): New define. + (ARRAY_NOTATION_ARRAY): Likewise. + (ARRAY_NOTATION_START): Likewise. + (ARRAY_NOTATION_LENGTH): Likewise. + (ARRAY_NOTATION_STRIDE): Likewise. + (ARRAY_NOTATION_TYPE): Likewise. + * c-pretty-print.c (pp_c_postifix_expression): Added a new case for + ARRAY_NOTATION_REF. + (pp_c_expression): Likewise. 
+ * c.opt (flag_enable_cilkplus): New flag. + * array-notation-common.c: New file. gcc/c/ChangeLog +2013-05-06 Balaji V. Iyer balaji.v.i...@intel.com + + * c-typeck.c (build_array_ref): Added a check to see if array's + index is greater than one. If true, then emit an error. + (build_function_call_vec): Exclude error reporting and checking + for builtin array-notation functions. + (convert_arguments): Likewise. + (c_finish_return): Added a check for array notations as a return + expression. If true, then emit an error. + (c_finish_loop): Added a check for array notations in a loop + condition. If true then emit an error. + (lvalue_p): Added a ARRAY_NOTATION_REF case. + (build_binary_op): Added a check for array notation expr inside + op1 and op0. If present, we call another function to find correct + type. + * Make-lang.in (C_AND_OBJC_OBJS): Added c-array-notation.o. + * c-parser.c (c_parser_compound_statement): Check if array + notation code is used in tree, if so, then transform them into + appropriate C code. + (c_parser_expr_no_commas): Check if array notation is used in LHS + or RHS, if so, then build array notation expression instead of + regular modify. + (c_parser_postfix_expression_after_primary): Added a check for + colon(s) after square braces, if so then handle it like an array + notation. Also, break up array notations in unary op if found. + (c_parser_direct_declarator_inner): Added a check for array + notation. + (c_parser_compound_statement): Added a check for array notation in + a stmt. If one is present, then expand array notation expr. + (c_parser_if_statement): Likewise. + (c_parser_switch_statement): Added a check for array notations in + a switch statement's condition. If true, then output an error. + (c_parser_while_statement): Similarly, but for a while. + (c_parser_do_statement): Similarly, but for a do-while. + (c_parser_for_statement): Similarly, but for a for-loop. 
+ (c_parser_unary_expression): Check if array notation is used in a + pre-increment or pre-decrement expression. If true, then expand + them. + (c_parser_array_notation): New function. + * c-array-notation.c: New file. + * c-tree.h (is_cilkplus_reduce_builtin): Protoize. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Joseph S. Myers Sent: Monday, April 29, 2013 6:55 PM To: Iyer, Balaji V Cc: 'Aldy Hernandez'; 'gcc-patches' Subject: RE: [patch] cilkplus: Array notation for C patch Here's a review of the changes to the compiler proper in this patch. I don't think much more will
Re: Prefer scalar offset in vector shifts
On Sun, May 12, 2013 at 02:04:52PM +0200, Marc Glisse wrote:

this patch passes bootstrap+testsuite on x86_64-linux-gnu. When moving uniform_vector_p, I only added the gcc_assert. Note that the fold_binary patch helps for constant vectors, but not for {n,n,n,n}, which will require some help in forwprop for instance. This transformation is already done by the vector lowering pass, but that's too late in my opinion.

2013-05-13  Marc Glisse  marc.gli...@inria.fr

gcc/
	* tree-vect-generic.c (uniform_vector_p): Move ...
	* tree.c (uniform_vector_p): ... here.
	* tree.h (uniform_vector_p): Declare it.
	* fold-const.c (fold_binary_loc) <shift>: Turn the second argument
	into a scalar.

gcc/testsuite/
	* gcc.dg/vector-shift-2.c: New testcase.

The testcase is UNSUPPORTED everywhere, because ccp1 dump isn't produced at -O0. Did you mean to add -O2 (or -O or -O3 etc.) to dg-options?

--- gcc/testsuite/gcc.dg/vector-shift-2.c (revision 0)
+++ gcc/testsuite/gcc.dg/vector-shift-2.c (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-ccp1" } */
+
+typedef unsigned vec __attribute__ ((vector_size (16)));
+void
+f (vec *a)
+{
+  vec s = { 5, 5, 5, 5 };
+  *a = *a << s;
+}
+
+/* { dg-final { scan-tree-dump "<< 5" "ccp1" } } */
+/* { dg-final { cleanup-tree-dump "ccp1" } } */

	Jakub
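
A short sketch of the equivalence that the fold_binary change exploits, using the same vec typedef as the testcase and assuming a left shift. The two helper names are hypothetical. Shifting by a uniform vector (every lane holds the same count) gives the same lanes as shifting by that count as a scalar, so the compiler is free to prefer the scalar form.

```c
/* Four unsigned lanes, as in the testcase above.  */
typedef unsigned vec __attribute__ ((vector_size (16)));

/* Shift every lane by the matching lane of a uniform vector.  */
static vec
shift_by_uniform_vector (vec a)
{
  vec s = { 5, 5, 5, 5 };
  return a << s;
}

/* Shift every lane by one scalar count; the GNU C vector extensions
   accept a scalar as the second operand of a vector shift.  */
static vec
shift_by_scalar (vec a)
{
  return a << 5;
}
```

Both functions should produce identical lanes for any input, including lanes whose high bits are shifted out.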
[PATCH] Fix up WIDEN_MULT_EXPR (PR middle-end/57251)
Hi!

On the following testcase, the widening multiply etc. is introduced so late that no forwprop or ccp follows it. The WIDEN_MULT_EXPR expansion tries hard not to use a widening multiply if both arguments are INTEGER_CSTs, but if they are e.g. SSA_NAMEs that expand_normal into CONST_INT, we can still ICE. The following patch attempts to handle those cases.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

2013-05-13  Jakub Jelinek  ja...@redhat.com

	PR middle-end/57251
	* expr.c (expand_expr_real_2) <case WIDEN_MULT_EXPR>: Handle
	the case when both op0 and op1 have VOIDmode.

	* gcc.dg/torture/pr57251.c: New test.

--- gcc/expr.c.jj	2013-05-07 10:27:07.0 +0200
+++ gcc/expr.c	2013-05-13 12:01:49.339087536 +0200
@@ -8390,6 +8390,15 @@ expand_expr_real_2 (sepops ops, rtx targ
                   else
                     expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0,
                                      EXPAND_NORMAL);
+                  /* op0 and op1 might still be constant, despite the above
+                     != INTEGER_CST check.  Handle it.  */
+                  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
+                    {
+                      op0 = convert_modes (innermode, mode, op0, true);
+                      op1 = convert_modes (innermode, mode, op1, false);
+                      return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1,
+                                                            target, unsignedp));
+                    }
                   goto binop3;
                 }
             }
@@ -8412,6 +8421,19 @@ expand_expr_real_2 (sepops ops, rtx targ
                 {
                   expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
                                    EXPAND_NORMAL);
+                  /* op0 and op1 might still be constant, despite the above
+                     != INTEGER_CST check.  Handle it.  */
+                  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
+                    {
+                     widen_mult_const:
+                      op0 = convert_modes (innermode, mode, op0, zextend_p);
+                      op1
+                        = convert_modes (innermode, mode, op1,
+                                         TYPE_UNSIGNED (TREE_TYPE (treeop1)));
+                      return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1,
+                                                            target,
+                                                            unsignedp));
+                    }
                   temp = expand_widening_mult (mode, op0, op1, target,
                                                unsignedp, this_optab);
                   return REDUCE_BIT_FIELD (temp);
@@ -8424,9 +8446,14 @@ expand_expr_real_2 (sepops ops, rtx targ
                   op0 = expand_normal (treeop0);
                   if (TREE_CODE (treeop1) == INTEGER_CST)
                     op1 = convert_modes (innermode, mode,
-                                         expand_normal (treeop1), unsignedp);
+                                         expand_normal (treeop1),
+                                         TYPE_UNSIGNED (TREE_TYPE (treeop1)));
                   else
                     op1 = expand_normal (treeop1);
+                  /* op0 and op1 might still be constant, despite the above
+                     != INTEGER_CST check.  Handle it.  */
+                  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
+                    goto widen_mult_const;
                   temp = expand_binop (mode, other_optab, op0, op1, target,
                                        unsignedp, OPTAB_LIB_WIDEN);
                   hipart = gen_highpart (innermode, temp);
--- gcc/testsuite/gcc.dg/torture/pr57251.c.jj	2013-05-13 11:46:34.937151578 +0200
+++ gcc/testsuite/gcc.dg/torture/pr57251.c	2013-05-13 11:54:23.070414122 +0200
@@ -0,0 +1,12 @@
+/* PR middle-end/57251 */
+/* { dg-do compile } */
+/* { dg-options "-ftracer" } */
+
+short a, b;
+int
+f (void)
+{
+  long long i = 2;
+  a ? f () ? : 0 : b--;
+  b = i *= a |= 0;
+}

	Jakub
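
For readers less familiar with widening multiplies, here is a hedged sketch in plain C of what the constant fallback in the patch computes. It assumes 16-bit operands and a 32-bit result; the function name widen_mult_su is made up for illustration. Each operand is first extended to the wider type according to its own signedness, which is the role convert_modes plays in the patch, and only then multiplied.

```c
#include <stdint.h>

/* 16 x 16 -> 32 bit widening multiply of a signed by an unsigned operand.
   Each operand is extended according to its own signedness before the
   multiplication is carried out in the wider type, so the product cannot
   overflow.  */
static int32_t
widen_mult_su (int16_t a, uint16_t b)
{
  int32_t wide_a = (int32_t) a;   /* sign extension */
  int32_t wide_b = (int32_t) b;   /* zero extension */
  return wide_a * wide_b;
}
```

If the unsigned operand were sign-extended by mistake, 0xFFFF would be treated as -1 and the product would have the wrong sign, which is why the signedness flag is taken per operand.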
[PATCH] Improve rotation by mode bitsize - 1 (take 2)
On Fri, May 10, 2013 at 07:15:38PM +0200, Jan Hubicka wrote:
> It seems to me that it is not different from normalizing reg-10 into reg+(-10)
> we do for years (and for good reason).  It is still target preference when use
> add and when sub to perform the arithmetic, but it makes sense to keep single
> canonical form of the expression both in RTL and Gimple.
>
> For example we may want to be able to prove that
> (rotate reg 31) == (rotatert reg 1)
> is true or
> (rotate reg 30) == (rotatert reg 2)
> is also true or cross jump both variants into one instruction.

Ok, this patch reverts my earlier patch and does the canonicalization,
for now for RTL only.  Bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2013-05-13  Jakub Jelinek  ja...@redhat.com

        * expmed.c (expand_shift_1): Canonicalize rotates by
        constant bitsize / 2 to bitsize - 1.
        * simplify-rtx.c (simplify_binary_operation_1) <case ROTATE,
        case ROTATERT>: Likewise.

        Revert:
        2013-05-10  Jakub Jelinek  ja...@redhat.com

        * config/i386/i386.md (rotateinv): New code attr.
        (*<rotate_insn><mode>3_1, *<rotate_insn>si3_1_zext,
        *<rotate_insn>qi3_1_slp): Emit rorl %eax instead of
        roll $31, %eax, etc.

--- gcc/expmed.c.jj 2013-05-13 13:03:31.0 +0200
+++ gcc/expmed.c 2013-05-13 15:22:39.456194286 +0200
@@ -2122,6 +2122,20 @@ expand_shift_1 (enum tree_code code, enu
         op1 = SUBREG_REG (op1);
     }
 
+  /* Canonicalize rotates by constant amount.  If op1 is bitsize / 2,
+     prefer left rotation, if op1 is from bitsize / 2 + 1 to
+     bitsize - 1, use other direction of rotate with 1 .. bitsize / 2 - 1
+     amount instead.  */
+  if (rotate
+      && CONST_INT_P (op1)
+      && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (mode) / 2 + left,
+                   GET_MODE_BITSIZE (mode) - 1))
+    {
+      op1 = GEN_INT (GET_MODE_BITSIZE (mode) - INTVAL (op1));
+      left = !left;
+      code = left ? LROTATE_EXPR : RROTATE_EXPR;
+    }
+
   if (op1 == const0_rtx)
     return shifted;
 
--- gcc/simplify-rtx.c.jj 2013-05-02 12:42:25.0 +0200
+++ gcc/simplify-rtx.c 2013-05-13 15:48:31.171182716 +0200
@@ -3250,6 +3250,18 @@ simplify_binary_operation_1 (enum rtx_co
 
     case ROTATERT:
     case ROTATE:
+      /* Canonicalize rotates by constant amount.  If op1 is bitsize / 2,
+         prefer left rotation, if op1 is from bitsize / 2 + 1 to
+         bitsize - 1, use other direction of rotate with 1 .. bitsize / 2 - 1
+         amount instead.  */
+      if (CONST_INT_P (trueop1)
+          && IN_RANGE (INTVAL (trueop1),
+                       GET_MODE_BITSIZE (mode) / 2 + (code == ROTATE),
+                       GET_MODE_BITSIZE (mode) - 1))
+        return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
+                                    mode, op0, GEN_INT (GET_MODE_BITSIZE (mode)
+                                                        - INTVAL (trueop1)));
+      /* FALLTHRU */
     case ASHIFTRT:
       if (trueop1 == CONST0_RTX (mode))
         return op0;
--- gcc/config/i386/i386.md.jj 2013-05-13 09:44:51.675494325 +0200
+++ gcc/config/i386/i386.md 2013-05-13 15:09:37.461637593 +0200
@@ -762,9 +762,6 @@ (define_code_attr rotate_insn [(rotate
 ;; Base name for insn mnemonic.
 (define_code_attr rotate [(rotate "rol") (rotatert "ror")])
 
-;; Base name for insn mnemonic of rotation in the other direction.
-(define_code_attr rotateinv [(rotate "ror") (rotatert "rol")])
-
 ;; Mapping of abs neg operators
 (define_code_iterator absneg [abs neg])
 
@@ -9755,15 +9752,11 @@ (define_insn "*<rotate_insn><mode>3_1"
       return "#";
 
     default:
-      if (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
-        {
-          if (operands[2] == const1_rtx)
-            return "<rotate>{<imodesuffix>}\t%0";
-          if (CONST_INT_P (operands[2])
-              && INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode) - 1)
-            return "<rotateinv>{<imodesuffix>}\t%0";
-        }
-      return "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
+      if (operands[2] == const1_rtx
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+        return "<rotate>{<imodesuffix>}\t%0";
+      else
+        return "<rotate>{<imodesuffix>}\t{%2, %0|%0, %2}";
     }
 }
   [(set_attr "isa" "*,bmi2")
@@ -9825,14 +9818,11 @@ (define_insn "*<rotate_insn>si3_1_zext"
       return "#";
 
     default:
-      if (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
-        {
-          if (operands[2] == const1_rtx)
-            return "<rotate>{l}\t%k0";
-          if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 31)
-            return "<rotateinv>{l}\t%k0";
-        }
-      return "<rotate>{l}\t{%2, %k0|%k0, %2}";
+      if (operands[2] == const1_rtx
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+        return "<rotate>{l}\t%k0";
+      else
+        return "<rotate>{l}\t{%2, %k0|%k0, %2}";
     }
 }
   [(set_attr "isa" "*,bmi2")
@@ -9879,15 +9869,11 @@ (define_insn "*<rotate_insn><mode>3_1"
    (clobber (reg:CC
Re: Prefer scalar offset in vector shifts
On Mon, 13 May 2013, Jakub Jelinek wrote:

> On Sun, May 12, 2013 at 02:04:52PM +0200, Marc Glisse wrote:
>> this patch passes bootstrap+testsuite on x86_64-linux-gnu.  When moving
>> uniform_vector_p, I only added the gcc_assert.  Note that the fold_binary
>> patch helps for constant vectors, but not for {n,n,n,n}, which will
>> require some help in forwprop for instance.  This transformation is
>> already done by the vector lowering pass, but that's too late in my
>> opinion.
>>
>> 2013-05-13  Marc Glisse  marc.gli...@inria.fr
>>
>> gcc/
>>         * tree-vect-generic.c (uniform_vector_p): Move ...
>>         * tree.c (uniform_vector_p): ... here.
>>         * tree.h (uniform_vector_p): Declare it.
>>         * fold-const.c (fold_binary_loc) <shift>: Turn the second
>>         argument into a scalar.
>>
>> gcc/testsuite/
>>         * gcc.dg/vector-shift-2.c: New testcase.
>
> The testcase is UNSUPPORTED everywhere, because ccp1 dump isn't produced
> at -O0.

Argh! contrib/compare_tests only showed me:

New tests that PASS:

gcc.dg/vector-shift-2.c (test for excess errors)

And I happily went ahead...

> Did you mean to add -O2 (or -O or -O3 etc.) to dg-options?

-O is enough. Could you commit that please? Or I can do it in a couple hours.

Thanks for the notice and sorry for the breakage,

-- 
Marc Glisse
Re: Prefer scalar offset in vector shifts
On Mon, May 13, 2013 at 06:46:00PM +0200, Marc Glisse wrote:
> -O is enough. Could you commit that please? Or I can do it in a couple hours.

Done.

        Jakub
Re: [PATCH] Improve rotation by mode bitsize - 1 (take 2)
On Mon, May 13, 2013 at 6:43 PM, Jakub Jelinek ja...@redhat.com wrote:
> On Fri, May 10, 2013 at 07:15:38PM +0200, Jan Hubicka wrote:
>> It seems to me that it is not different from normalizing reg-10 into
>> reg+(-10) we do for years (and for good reason).  It is still target
>> preference when use add and when sub to perform the arithmetic, but it
>> makes sense to keep single canonical form of the expression both in RTL
>> and Gimple.
>>
>> For example we may want to be able to prove that
>> (rotate reg 31) == (rotatert reg 1)
>> is true or
>> (rotate reg 30) == (rotatert reg 2)
>> is also true or cross jump both variants into one instruction.
>
> Ok, this patch reverts my earlier patch and does the canonicalization,
> for now for RTL only.  Bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?
>
> 2013-05-13  Jakub Jelinek  ja...@redhat.com
>
>         * expmed.c (expand_shift_1): Canonicalize rotates by
>         constant bitsize / 2 to bitsize - 1.
>         * simplify-rtx.c (simplify_binary_operation_1) <case ROTATE,
>         case ROTATERT>: Likewise.
>
>         Revert:
>         2013-05-10  Jakub Jelinek  ja...@redhat.com
>
>         * config/i386/i386.md (rotateinv): New code attr.
>         (*<rotate_insn><mode>3_1, *<rotate_insn>si3_1_zext,
>         *<rotate_insn>qi3_1_slp): Emit rorl %eax instead of
>         roll $31, %eax, etc.

You can revert your own patch without approval, so the patch approval
depends solely on the approval from ME maintainer.

Thanks,
Uros.
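The canonicalization rule discussed in this thread can be sketched in plain C. This is only an illustration under stated assumptions, not the RTL code: it fixes the width at 32 bits for the rotate helpers, and `struct rot` and `canonicalize_rot` are hypothetical names. The range test mirrors the `IN_RANGE (..., bitsize / 2 + left, bitsize - 1)` check in the patch, so a count of exactly bitsize / 2 stays a left rotate.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by N bits; N is reduced modulo 32.  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Rotate a 32-bit value right by N bits; N is reduced modulo 32.  */
static uint32_t
rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Hypothetical description of a rotate by a constant amount.  */
struct rot
{
  int left;        /* Nonzero for a left rotate, zero for a right rotate.  */
  unsigned count;  /* Constant rotate amount, 0 .. bitsize - 1.  */
};

/* Sketch of the rule from the patch: a left rotate by
   bitsize / 2 + 1 .. bitsize - 1, or a right rotate by
   bitsize / 2 .. bitsize - 1, is rewritten as a rotate in the other
   direction by bitsize - count.  */
static struct rot
canonicalize_rot (struct rot r, unsigned bitsize)
{
  unsigned low = bitsize / 2 + (r.left ? 1 : 0);

  if (r.count >= low && r.count <= bitsize - 1)
    {
      r.count = bitsize - r.count;
      r.left = !r.left;
    }
  return r;
}
```

With this rule both `(rotate reg 31)` and `(rotatert reg 1)` end up as the same form, a right rotate by 1, which is what makes the equivalences quoted above provable by simple comparison.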
Re: [PATCH,RFC] Make libbacktrace more standalone
On Wed, 8 May 2013, Ian Lance Taylor wrote: +#ifdef IN_GCC Where is IN_GCC defined? I've amended configure.ac to provide a new configuration flag ('--enable-standalone') and add -DIN_GCC to EXTRA_FLAGS unless that flag is given. Previously I've misread grep output and thought that IN_GCC is defined globally. This isn't right. Using test -n ${with_target_subdir} tests whether libbacktrace is being built as a target library, using the newly built compiler. It does not test whether it is being used in a standalone build. with_target_subdir will be empty when building libbacktrace as part of the host compiler. In that case we still want to use include/dwarf2.def, and we do not want to give an error if the system does not have dwarf.h. Sorry. I'm amended that sequence to test $enable_standalone instead. +#include backtrace.h +#include internal.h Please keep these after the #include of the other header files. Done. +#ifdef IN_GCC #include dwarf2.h #include filenames.h +#else +#include dwarf.h +typedef int dwarf_attribute; +typedef int dwarf_form; +typedef int dwarf_tag; -#include backtrace.h -#include internal.h +#define IS_ABSOLUTE_PATH(f) ((f)[0] == '/') +#endif In the case where IN_GCC is defined, where are the types dwarf_attribute, dwarf_form, and dwarf_tag defined? In GCC's own dwarf2.h as enum tags; dwarf.h uses anonymous enums. When IN_GCC is defined, something needs to ensure that HAVE_DWARF2_FISSION and HAVE_DWARF2_DWZ_MULTIFILE are defined. It is ensured by defining have_dwarf2_* unconditionally to yes in that case. Updated patch below. Thanks! libbacktrace/Changelog: 2013-05-13 Alexander Monakov amona...@ispras.ru * btest.c: [!IN_GCC] (IS_DIR_SEPARATOR): Define. * configure.ac: (standalone): New configuration flag. (EXTRA_FLAGS): Add -DIN_GCC unless building standalone. (HAVE_DWARF2_FISSION, HAVE_DWARF2_DWZ_MULTIFILE): New tests. Use ... * dwarf.c: (read_attribute): ... here. [!IN_GCC] Use system dwarf.h. 
[!IN_GCC] (dwarf_attribute, dwarf_form, dwarf_tag): Typedef to int. Update all uses. [!IN_GCC] (IS_ABSOLUTE_PATH): Define. (read_line_program): Avoid use of DW_LNS_extended_op. * configure: Regenerate. * config.h.in: Regenerate. * Makefile.in: Regenerate. diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c index cc647b8..1516099 100644 --- a/libbacktrace/btest.c +++ b/libbacktrace/btest.c @@ -38,7 +38,11 @@ POSSIBILITY OF SUCH DAMAGE. */ #include stdlib.h #include string.h +#ifdef IN_GCC #include filenames.h +#else +#define IS_DIR_SEPARATOR(c) ((c) == '/') +#endif #include backtrace.h #include backtrace-supported.h diff --git a/libbacktrace/configure.ac b/libbacktrace/configure.ac index 28b2a1c..ae23da4 100644 --- a/libbacktrace/configure.ac +++ b/libbacktrace/configure.ac @@ -58,6 +58,9 @@ AM_MAINTAINER_MODE AC_ARG_WITH(target-subdir, [ --with-target-subdir=SUBDIR Configuring in a subdirectory for target]) +AC_ARG_ENABLE(standalone, +[ --enable-standalone Do not use internal GCC headers]) + # We must force CC to /not/ be precious variables; otherwise # the wrong, non-multilib-adjusted value will be used in multilibs. # As a side effect, we have to subst CFLAGS ourselves. 
@@ -72,7 +75,7 @@ AC_PROG_RANLIB AC_PROG_AWK case $AWK in -) AC_MSG_ERROR([can't build without awk]) ;; +) AC_MSG_ERROR([cannot build without awk]) ;; esac LT_INIT([disable-shared]) @@ -125,6 +128,10 @@ else EXTRA_FLAGS=$EXTRA_FLAGS -frandom-seed=\$@ fi fi + +if test ${enable_standalone} != yes; then + EXTRA_FLAGS=$EXTRA_FLAGS -DIN_GCC +fi AC_SUBST(EXTRA_FLAGS) ACX_PROG_CC_WARNING_OPTS([-W -Wall -Wwrite-strings -Wstrict-prototypes \ @@ -314,6 +321,40 @@ if test $have_getexecname = yes; then AC_DEFINE(HAVE_GETEXECNAME, 1, [Define if getexecname is available.]) fi +# Check for DWARF2 extensions +if test ${enable_standalone} != yes; then + have_dwarf2_fission=yes + have_dwarf2_dwz_multifile=yes +else + AC_CHECK_HEADER([dwarf.h], +[ + AC_MSG_CHECKING([for DW_FORM_GNU_addr_index]) + AC_COMPILE_IFELSE( + [AC_LANG_PROGRAM( + [#include dwarf.h], + [int i = DW_FORM_GNU_addr_index;])], + [have_dwarf2_fission=yes], + [have_dwarf2_fission=no]) + AC_MSG_RESULT([$have_dwarf2_fission]) + AC_MSG_CHECKING([for DW_FORM_GNU_ref_alt]) + AC_COMPILE_IFELSE( + [AC_LANG_PROGRAM( + [#include dwarf.h], + [int i = DW_FORM_GNU_ref_alt;])], + [have_dwarf2_dwz_multifile=yes], + [have_dwarf2_dwz_multifile=no]) + AC_MSG_RESULT([$have_dwarf2_dwz_multifile])], +[AC_MSG_ERROR([dwarf.h required when building standalone])]) +fi +if test $have_dwarf2_fission = yes; then + AC_DEFINE(HAVE_DWARF2_FISSION, 1, + [Define if DWARF2 Fission enumeration values are defined.]) +fi +if test
Re: cfgexpand.c patch for [was new port: msp430-elf]
> Can you add that (partial int modes have fewer bits than int modes)
> as verification to genmodes.c:make_partial_integer_mode?

I could, but it would be a no-op for PARTIAL_INT_MODE()

> I wonder if this should not use GET_MODE_PRECISION - after all it is
> the precision that determines whether we have to extend / truncate?
> Or is precision a so much unused term on RTL that this would cause
> problems?

The problem is, the precision of PSImode *is* the same as SImode, if
you just use PARTIAL_INT_MODE() in *-modes.def
Re: [Patch:RL78] Fix hardware multiply on G13 target
> Please find below an updated patch. Let me know if ok to commit.

Yup, this is OK to commit.  Thanks!

> Regards,
> Kaushik
>
> 2013-05-13  Kaushik Phatak  kaushik.pha...@kpitcummins.com
>
>         * config/rl78/rl78.md (mulsi3_g13): Add additional 'nop' required in
>         multiply-accumulate mode
[4.7 PATCH, i386]: Fix PR57264, cld not emitted when string instructions used, and '-mcld' on command line
Hello!

This PR again exposes the problem when STOS instruction is generated from
the combine pass. The insn is not generated from corresponding expander, so
ix86_current_function_needs_cld flag never gets set. Consequently, the CLD
insn is not emitted in the prologue.

The problematic combination is prevented by Jakub's PR55686 patch in 4.8+
branches, so attached patch backports it to 4.7 branch.

2013-05-13  Uros Bizjak  ubiz...@gmail.com

        PR target/57264
        Backport from mainline
        2013-01-22  Jakub Jelinek  ja...@redhat.com

        PR target/55686
        * config/i386/i386.md (UNSPEC_STOS): New.
        (strset_singleop, *strsetdi_rex_1, *strsetsi_1, *strsethi_1,
        *strsetqi_1): Add UNSPEC_STOS.

testsuite/ChangeLog:

2013-05-13  Uros Bizjak  ubiz...@gmail.com

        PR target/57264
        * gcc.target/i386/pr57264.c: New test.

Patch was tested on x86_64-pc-linux-gnu {-m32} on 4.7 branch, and committed
to 4.7 branch. The testcase will be forward ported to 4.8 and mainline SVN.

Uros.

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 198835)
+++ config/i386/i386.md (working copy)
@@ -109,6 +109,7 @@
   UNSPEC_CALL_NEEDS_VZEROUPPER
   UNSPEC_PAUSE
   UNSPEC_LEA_ADDR
+  UNSPEC_STOS
 
   ;; For SSE/MMX support:
   UNSPEC_FIX_NOTRUNC
@@ -15912,7 +15913,8 @@
   [(parallel [(set (match_operand 1 "memory_operand" "")
                    (match_operand 2 "register_operand" ""))
               (set (match_operand 0 "register_operand" "")
-                   (match_operand 3 "" ""))])]
+                   (match_operand 3 "" ""))
+              (unspec [(const_int 0)] UNSPEC_STOS)])]
   ""
   "ix86_current_function_needs_cld = 1;")
 
@@ -15921,7 +15923,8 @@
         (match_operand:DI 2 "register_operand" "a"))
    (set (match_operand:DI 0 "register_operand" "=D")
         (plus:DI (match_dup 1)
-                 (const_int 8)))]
+                 (const_int 8)))
+   (unspec [(const_int 0)] UNSPEC_STOS)]
   "TARGET_64BIT
    && !(fixed_regs[AX_REG] || fixed_regs[DI_REG])"
   "stosq"
@@ -15934,7 +15937,8 @@
         (match_operand:SI 2 "register_operand" "a"))
    (set (match_operand:P 0 "register_operand" "=D")
         (plus:P (match_dup 1)
-                (const_int 4)))]
+                (const_int 4)))
+   (unspec [(const_int 0)] UNSPEC_STOS)]
   "!(fixed_regs[AX_REG] || fixed_regs[DI_REG])"
   "stos{l|d}"
   [(set_attr "type" "str")
@@ -15946,7 +15950,8 @@
         (match_operand:HI 2 "register_operand" "a"))
    (set (match_operand:P 0 "register_operand" "=D")
         (plus:P (match_dup 1)
-                (const_int 2)))]
+                (const_int 2)))
+   (unspec [(const_int 0)] UNSPEC_STOS)]
   "!(fixed_regs[AX_REG] || fixed_regs[DI_REG])"
   "stosw"
   [(set_attr "type" "str")
@@ -15958,7 +15963,8 @@
         (match_operand:QI 2 "register_operand" "a"))
    (set (match_operand:P 0 "register_operand" "=D")
         (plus:P (match_dup 1)
-                (const_int 1)))]
+                (const_int 1)))
+   (unspec [(const_int 0)] UNSPEC_STOS)]
   "!(fixed_regs[AX_REG] || fixed_regs[DI_REG])"
   "stosb"
   [(set_attr "type" "str")
Index: testsuite/gcc.target/i386/pr57264.c
===================================================================
--- testsuite/gcc.target/i386/pr57264.c (revision 0)
+++ testsuite/gcc.target/i386/pr57264.c (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -mcld" } */
+
+void test (int x, int **pp)
+{
+  while (x)
+    {
+      int *ip = *pp;
+      int *op = *pp;
+      while (*ip)
+        {
+          int v = *ip++;
+          *op++ = v + 1;
+        }
+    }
+}
+
+/* { dg-final { scan-assembler-not "stosl" } } */
[Patch, Fortran] PR48858 - COMMON - Fix global/local identifier issues with C binding
First, it adds a missing -std=f95 check for Fortran 2003's BIND(C) statement. Secondly, it honors the COMMON identifier changes of Fortran 2008. In Fortran 2003, one has: The name of a program unit, common block, or external procedure is a global identifier and shall not be the same as the name of any other such global entity in the same program. (16.1 Scope of global identifiers) In Fortran 2008 it has been modified to: The name of a common block with no binding label, external procedure with no binding label, or program unit that is not a submodule is a global identifier. (16.2 Scope of global identifiers) Thus, this patch only generates a gsym if a common block either has no binding label or -std=f2003 is used. Additionally, it ensures in trans-common.c that this is correctly handled. Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias PS: Still to be done is a similar change for procedures. Except that for procedures even more changes are required, e.g. having two identical INTERFACE (with different Fortran name but same binding name) is valid. 2013-05-13 Tobias Burnus bur...@net-b.de PR fortran/48858 * decl.c (gfc_match_bind_c_stmt): Add gfc_notify_std. * match.c (gfc_match_common): Don't add commons to gsym. * resolve.c (resolve_common_blocks): Add to gsym and add checks. (resolve_bind_c_comms): Remove. (resolve_types): Remove call to the latter. * trans-common.c (gfc_common_ns): Remove static var. (gfc_map_of_all_commons): Add static var. (build_common_decl): Correctly handle binding label. 2013-05-13 Tobias Burnus bur...@net-b.de PR fortran/48858 * gfortran.dg/test_common_binding_labels.f03: Update dg-error. * gfortran.dg/test_common_binding_labels_2_main.f03: Ditto. * gfortran.dg/test_common_binding_labels_3_main.f03: Ditto. * gfortran.dg/common_18.f90: New. * gfortran.dg/common_19.f90: New. * gfortran.dg/common_20.f90: New. * gfortran.dg/common_21.f90: New. 
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 6ae51e2..06a049c 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -4208,6 +4208,9 @@ gfc_match_bind_c_stmt (void) if (found_match == MATCH_YES) { + if (!gfc_notify_std (GFC_STD_F2003, BIND(C) statement at %C)) + return MATCH_ERROR; + /* Look for the :: now, but it is not required. */ gfc_match ( :: ); diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c index 07f8f63..b44d815 100644 --- a/gcc/fortran/match.c +++ b/gcc/fortran/match.c @@ -4332,7 +4332,6 @@ gfc_match_common (void) gfc_array_spec *as; gfc_equiv *e1, *e2; match m; - gfc_gsymbol *gsym; old_blank_common = gfc_current_ns-blank_common.head; if (old_blank_common) @@ -4349,23 +4348,6 @@ gfc_match_common (void) if (m == MATCH_ERROR) goto cleanup; - gsym = gfc_get_gsymbol (name); - if (gsym-type != GSYM_UNKNOWN gsym-type != GSYM_COMMON) - { - gfc_error (Symbol '%s' at %C is already an external symbol that - is not COMMON, name); - goto cleanup; - } - - if (gsym-type == GSYM_UNKNOWN) - { - gsym-type = GSYM_COMMON; - gsym-where = gfc_current_locus; - gsym-defined = 1; - } - - gsym-used = 1; - if (name[0] == '\0') { t = gfc_current_ns-blank_common; diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index e27b23b..06fa301 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -947,6 +947,7 @@ static void resolve_common_blocks (gfc_symtree *common_root) { gfc_symbol *sym; + gfc_gsymbol * gsym; if (common_root == NULL) return; @@ -958,6 +959,84 @@ resolve_common_blocks (gfc_symtree *common_root) resolve_common_vars (common_root-n.common-head, true); + /* The common name is a global name - in Fortran 2003 also if it has a + C binding name, since Fortran 2008 only the C binding name is a global + identifier. 
*/ + if (!common_root-n.common-binding_label + || gfc_notification_std (GFC_STD_F2008)) +{ + gsym = gfc_find_gsymbol (gfc_gsym_root, + common_root-n.common-name); + + if (gsym gfc_notification_std (GFC_STD_F2008) + gsym-type == GSYM_COMMON + ((common_root-n.common-binding_label + (!gsym-binding_label + || strcmp (common_root-n.common-binding_label, + gsym-binding_label) != 0)) + || (!common_root-n.common-binding_label + gsym-binding_label))) + { + gfc_error (In Fortran 2003 COMMON '%s' block at %L is a global + identifier and must thus have the same binding name + as the same-named COMMON block at %L: %s vs %s, + common_root-n.common-name, common_root-n.common-where, + gsym-where, + common_root-n.common-binding_label + ? common_root-n.common-binding_label : (blank), + gsym-binding_label ? gsym-binding_label : (blank)); + return; + } + + if (gsym gsym-type != GSYM_COMMON + !common_root-n.common-binding_label) + { + gfc_error (COMMON block '%s' at %L uses the same global
RFA: fix avr compile/limits-externdecl.c failures
All the gcc.c-torture/compile/limits-externdecl.c tests currently give an
error: size of array is too large, followed by an ICE in
avr_encode_section_info, which goes on to try to find the address space of
error_mark_node.

Given the size of the array, it makes sense for the test to give an error
where POINTER_SIZE is 16 bit, but then, we should mark this as an expected
error for this target.  Moreover, we shouldn't ICE after the error.

The attached patch implements both these changes.

regression tested for i686-pc-linux-gnu X avr, Running target atmega128-sim

2013-05-13  Joern Rennecke  joern.renne...@embecosm.com

gcc:
        * config/avr/avr.c (avr_encode_section_info): Bail out if the type
        is error_mark_node.

gcc/testsuite:
        * testsuite/gcc.c-torture/compile/limits-externdecl.c
        [target avr-*-*]: Expect size of array is too large error.

Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c (revision 198829)
+++ config/avr/avr.c (working copy)
@@ -8324,7 +8324,11 @@ avr_encode_section_info (tree decl, rtx
       && SYMBOL_REF == GET_CODE (XEXP (rtl, 0)))
    {
       rtx sym = XEXP (rtl, 0);
-      addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl));
+      tree type = TREE_TYPE (decl);
+
+      if (type == error_mark_node)
+        return;
+      addr_space_t as = TYPE_ADDR_SPACE (type);
 
       /* PSTR strings are in generic space but located in flash:
          patch address space.  */
Index: testsuite/gcc.c-torture/compile/limits-externdecl.c
===================================================================
--- testsuite/gcc.c-torture/compile/limits-externdecl.c (revision 198829)
+++ testsuite/gcc.c-torture/compile/limits-externdecl.c (working copy)
@@ -52,4 +52,4 @@ #define LIM6(x) LIM5(x##0) LIM5(x##1) LI
 REFERENCE references[] = {
   LIM5 (X)
   0
-};
+}; /* { dg-error "size of array is too large" { target avr-*-* } } */
compare_tests
So, it turns out that comm on Mac OS X doesn't like long lines.  Don't know
why that is, kinda unfortunate, but life goes on.

2013-05-13  Mike Stump  mikest...@comcast.net

        * compare_tests: Limit lines to 2000 characters as comm on Mac OS X
        10.8.3 doesn't like long lines (those 2055 characters or more).

Index: compare_tests
===================================================================
--- compare_tests (revision 198796)
+++ compare_tests (working copy)
@@ -2,6 +2,9 @@
 # This script automatically test the given tool with the tool's test cases,
 # reporting anything of interest.
 
+# Written by Mike Stump m...@cygnus.com
+# Subdir comparison added by Quentin Neill quentin.ne...@amd.com
+
 usage()
 {
         if [ -n $1 ] ; then
@@ -29,9 +32,6 @@ EOUSAGE
         exit 2
 }
 
-# Written by Mike Stump m...@cygnus.com
-# Subdir comparison added by Quentin Neill quentin.ne...@amd.com
-
 export LC_ALL=C
 
 tool=gxx
@@ -107,8 +107,8 @@ elif [ -d $1 -o -d $2 ] ; then
         usage Must specify either two directories or two files
 fi
 
-sed 's/^XFAIL/FAIL/; s/^XPASS/PASS/' $1 | awk '/^Running target / {target = $3} { if (target != unix) { sub(/: /, target: ); }; print $0; }' $tmp1
-sed 's/^XFAIL/FAIL/; s/^XPASS/PASS/' $2 | awk '/^Running target / {target = $3} { if (target != unix) { sub(/: /, target: ); }; print $0; }' $tmp2
+sed 's/^XFAIL/FAIL/; s/^XPASS/PASS/' $1 | awk '/^Running target / {target = $3} { if (target != unix) { sub(/: /, target: ); }; print $0; }' | cut -c1-2000 $tmp1
+sed 's/^XFAIL/FAIL/; s/^XPASS/PASS/' $2 | awk '/^Running target / {target = $3} { if (target != unix) { sub(/: /, target: ); }; print $0; }' | cut -c1-2000 $tmp2
 
 before=$tmp1
 now=$tmp2
Re: [PATCH] Dynamic dispatch of multiversioned functions and CPU mocks for code coverage.
The MV testing support includes 3 logical parts:

1) runtime APIs to check mocked CPU types and features
   (__builtin_mock_cpu_supports ..)
2) runtime APIs to do CPU mocking;
3) compile time option to do lazy dispatching (instead of using IFUNC).

3) can be used to also support target without IFUNC support, but it
should be handled differently -- for instance, it does not need an
option, nor should it use the mock version of the feature testing.

I like the flexibility the patch provides for testing -- it allows
global mocking via environment variable, and fine grain mocking at
each callsite. The former is good for application testing, and latter
is suitable for unit testing.

What is the design of the environment variable used to control the
behavior of __builtin_mock_cpu...? They are part of the user
interface and should be documented somewhere.

thanks,

David

On Thu, May 9, 2013 at 7:30 PM, Sriraman Tallam tmsri...@google.com wrote:
> Hi,
>
> This patch is an enhancement to the Function Multiversioning
> feature. This patch achieves two things:
>
> * Primarily, this patch makes it easy to test for code coverage of
>   multiversioned functions.
> * Secondarily, it makes function multiversioning work when there is no
>   ifunc support. Since it invokes the dispatcher for every call, it is
>   possible to execute different function versions every time. This
>   incurs a performance penalty.
>
> This patch makes it easy to test for code coverage of multiversioned
> functions. Here is a motivating example:
>
> __attribute__((target ("default"))) int foo () { ... return 0; }
> __attribute__((target ("sse"))) int foo () { ... return 1; }
> __attribute__((target ("popcnt"))) int foo () { ... return 2; }
>
> int main ()
> {
>   return foo();
> }
>
> Let's say your test CPU supports popcnt. A run of this program will
> invoke the popcnt version of foo (). Then, how do we test the sse
> version of foo()? To do that for the above example, we need to run
> this code on a CPU that has sse support but no popcnt support.
> Otherwise, we need to comment out the popcnt version and run this
> example. This can get painful when there are many versions. The same
> argument applies to testing the default version of foo.
>
> So, I am introducing the ability to mock a CPU. If the CPU you are
> testing on supports sse, you should be able to test the sse version.
>
> First, I have introduced a new flag called
> -fmultiversion-dynamic-dispatch. This patch invokes the function
> version dispatcher every time a call to a foo () is made. Without that
> flag, the version dispatch happens once at startup time via the IFUNC
> mechanism.
>
> Also, with -fmultiversion-dynamic-dispatch, the version dispatcher uses
> the two new builtins __builtin_mock_cpu_is and
> __builtin_mock_cpu_supports to check the cpu type and cpu isa.
>
> Then, I plan to add the following hooks to libgcc (in a different patch):
>
> int set_mock_cpu_is (const char *cpu);
> int set_mock_cpu_supports (const char *isa);
> int init_mock_cpu (); // Clear the values of the mock cpu.
>
> With this support, here is how you can test for code coverage of the
> sse version and default version of foo in the above example:
>
> int main ()
> {
>   // Test SSE version.
>   if (__builtin_cpu_supports ("sse"))
>     {
>       init_mock_cpu();
>       set_mock_cpu_supports ("sse");
>       assert (foo () == 1);
>     }
>   // Test default version.
>   init_mock_cpu();
>   assert (foo () == 0);
> }
>
> Invoking a multiversioned binary several times with appropriate mock
> cpu values for the various ISAs and CPUs will give the complete code
> coverage desired. Of course, the underlying platform should be able to
> support the various features.
>
> Note that the above test will work only with
> -fmultiversion-dynamic-dispatch as the dispatcher must be invoked on
> every multiversioned call to be able to dynamically change the version.
>
> Multiple ISA features can be set in the mock cpu by calling
> set_mock_cpu_supports several times with different ISA names.
> Calling init_mock_cpu will clear all the values. set_mock_cpu_is
> will set the CPU type.
>
> This patch only includes the gcc changes. I will separately prepare a
> patch for the libgcc changes. Right now, since the libgcc changes are
> not available the two new mock cpu builtins check the real CPU like
> __builtin_cpu_is and __builtin_cpu_supports.
>
> Patch attached. Please look at mv14_debug_code_coverage.C for an
> exhaustive example of testing for code coverage in the presence of
> multiple versions.
>
> This patch was already discussed when sent earlier to the google/gcc-4_7
> branch. That is here:
> http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00557.html
>
> Some of the alternatives suggested there are:
>
> * Lazy IFUNC relocation, which got shot down due to problems with bad
>   interactions with other shared libraries.
> * Using environment variables to mock CPU architectures: This may
>   still be plausible. For instance:
>
>   LD_CPU_FEATURES=sse,sse2 ./a.out  # run as if only sse and sse2 are available
>
>   However, with dynamic
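The mocking idea described in this thread can be modelled in a few lines of plain C. This is a toy sketch only: the real hooks are proposed for libgcc and were not available at the time of the mail, so `init_mock_cpu` and `set_mock_cpu_supports` below are local stand-ins written for this example, the feature bits are invented, and `foo` is a hand-written dispatcher rather than compiler-generated multiversioning.

```c
#include <assert.h>
#include <string.h>

/* Invented feature bits for the mock CPU; not a GCC or libgcc API.  */
enum { MOCK_SSE = 1 << 0, MOCK_POPCNT = 1 << 1 };

/* Features the mock CPU currently claims to support.  */
static unsigned mock_features;

/* Stand-in for the proposed hook: clear all mocked features.  */
static int
init_mock_cpu (void)
{
  mock_features = 0;
  return 0;
}

/* Stand-in for the proposed hook: add one ISA to the mock CPU.
   Returns 0 on success, -1 for an ISA name this toy does not know.  */
static int
set_mock_cpu_supports (const char *isa)
{
  if (strcmp (isa, "sse") == 0)
    mock_features |= MOCK_SSE;
  else if (strcmp (isa, "popcnt") == 0)
    mock_features |= MOCK_POPCNT;
  else
    return -1;
  return 0;
}

/* The three "versions" of foo from the motivating example.  */
static int foo_default (void) { return 0; }
static int foo_sse (void) { return 1; }
static int foo_popcnt (void) { return 2; }

/* Dispatcher consulted on every call, which is the behaviour the mail
   attributes to -fmultiversion-dynamic-dispatch.  The most specific
   supported version wins.  */
static int
foo (void)
{
  if (mock_features & MOCK_POPCNT)
    return foo_popcnt ();
  if (mock_features & MOCK_SSE)
    return foo_sse ();
  return foo_default ();
}
```

Because the dispatcher runs on each call, changing the mocked features between calls changes which version executes. A one-time IFUNC resolution at startup could not do that, which is why the coverage test in the mail depends on dynamic dispatch.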
Re: [Patch][google/gcc-4_8] Backport trunk@198547 for pr target/56732
OK for google branches.

On Thu, May 9, 2013 at 1:40 PM, Han Shen(沈涵) shen...@google.com wrote:
> Hi, I'm to backport trunk patch @198547 for pr target/56732 to google
> branch google/gcc-4_8. This patch fixes arm ICE.
>
> Ok for google/gcc-4_8?
>
> [patch attached]
>
> H.
Using GS for TLS on x86-64 for target RDOS
I would need a way to use the GS segment register instead of FS for x86-64
for target RDOS, since RDOS cannot use FS for TLS.

It seems like the code related to this is concentrated in two places in
gcc/config/i386/i386.c:

11677:seg = TARGET_64BIT ? SEG_FS : SEG_GS;

13526:  if (ix86_decompose_address (x, &addr) == 0
            || addr.seg != (TARGET_64BIT ? SEG_FS : SEG_GS)
            || addr.disp == NULL_RTX
            || GET_CODE (addr.disp) != CONST)

Especially the second reference would become hard to read if more
conditionals are added to it.

Perhaps the code could be changed to something like this:

#ifdef TARGET_RDOS
#define GET_TLS_SEG_REG  SEG_GS
#else
#define GET_TLS_SEG_REG  TARGET_64BIT ? SEG_FS : SEG_GS
#endif

Then the above could be patched to:

11677:seg = GET_TLS_SEG_REG;

13526:  if (ix86_decompose_address (x, &addr) == 0
            || addr.seg != (GET_TLS_SEG_REG)
            || addr.disp == NULL_RTX
            || GET_CODE (addr.disp) != CONST)

Thoughts?

Regards,
Leif Ekblad
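One detail of the proposed macro can be shown with a small stand-alone C sketch. The names below only imitate the ones in the mail: `enum seg`, the `target_64bit` variable and `is_tls_segment` are stand-ins written for this example, not i386.c code. The sketch puts the parentheses inside the macro definition, so that a use such as `s == GET_TLS_SEG_REG` does not parse as `(s == TARGET_64BIT) ? SEG_FS : SEG_GS`.

```c
#include <assert.h>

/* Stand-in for the segment enumeration used in the i386 backend.  */
enum seg { SEG_DEFAULT, SEG_FS, SEG_GS };

/* Stand-in for the target macro; in GCC, TARGET_64BIT depends on the
   target flags.  A variable is used here so both cases can be tried.  */
static int target_64bit = 1;
#define TARGET_64BIT target_64bit

#ifdef TARGET_RDOS
# define GET_TLS_SEG_REG SEG_GS
#else
/* Parenthesized so the macro is safe inside larger expressions.  */
# define GET_TLS_SEG_REG (TARGET_64BIT ? SEG_FS : SEG_GS)
#endif

/* Returns nonzero if S is the segment register used for TLS.  */
static int
is_tls_segment (enum seg s)
{
  return s == GET_TLS_SEG_REG;
}
```

With the parentheses in the definition, call sites no longer need to wrap the macro themselves, as the second patched reference in the mail does with `(GET_TLS_SEG_REG)`.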
C++ PATCH for 3 ref-qualifier issues
3 places that hadn't yet been updated to handle ref-qualifiers. Tested x86_64-pc-linux-gnu, applying to trunk and 4.8. commit e4f9e32f7e7b9e508f713ac15a9edbd275d4bd85 Author: Jason Merrill ja...@redhat.com Date: Mon May 13 10:45:41 2013 -0400 PR c++/57252 * decl.c (decls_match): Compare ref-qualifiers. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index d8363b2..faa2911 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -1024,6 +1024,7 @@ decls_match (tree newdecl, tree olddecl) else types_match = compparms (p1, p2) + type_memfn_rqual (f1) == type_memfn_rqual (f2) (TYPE_ATTRIBUTES (TREE_TYPE (newdecl)) == NULL_TREE || comp_type_attributes (TREE_TYPE (newdecl), TREE_TYPE (olddecl)) != 0); diff --git a/gcc/testsuite/g++.dg/cpp0x/ref-qual10.C b/gcc/testsuite/g++.dg/cpp0x/ref-qual10.C new file mode 100644 index 000..1b6c54f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/ref-qual10.C @@ -0,0 +1,13 @@ +// PR c++/57252 +// { dg-require-effective-target c++11 } + +struct foo { + void bar() {} + void bar() {} +}; + +int main() +{ + auto p = foo::bar; // { dg-error } + (foo{}.*p)(); +} commit d6cad04704766c07324719463aac32c1ea428659 Author: Jason Merrill ja...@redhat.com Date: Mon May 13 10:54:47 2013 -0400 PR c++/57253 * decl.c (grokdeclarator): Apply ref-qualifier in the TYPENAME case. 
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index faa2911..3854135 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -10284,7 +10284,7 @@ grokdeclarator (const cp_declarator *declarator, type = void_type_node; } } - else if (memfn_quals) + else if (memfn_quals || rqual) { if (ctype == NULL_TREE TREE_CODE (type) == METHOD_TYPE) diff --git a/gcc/testsuite/g++.dg/cpp0x/ref-qual11.C b/gcc/testsuite/g++.dg/cpp0x/ref-qual11.C new file mode 100644 index 000..15dd049 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/ref-qual11.C @@ -0,0 +1,10 @@ +// PR c++/57253 +// { dg-require-effective-target c++11 } + +templatetypename T struct foo; + +template struct foovoid() {}; +template struct foovoid() {}; + +int main() +{} commit 1f114e1a2911396b76dfecda214e132e10b41816 Author: Jason Merrill ja...@redhat.com Date: Mon May 13 11:26:25 2013 -0400 PR c++/57254 * typeck.c (merge_types): Propagate ref-qualifier in METHOD_TYPE case. diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c index b8ea555..fb75847 100644 --- a/gcc/cp/typeck.c +++ b/gcc/cp/typeck.c @@ -851,6 +851,7 @@ merge_types (tree t1, tree t2) tree raises = merge_exception_specifiers (TYPE_RAISES_EXCEPTIONS (t1), TYPE_RAISES_EXCEPTIONS (t2), NULL_TREE); + cp_ref_qualifier rqual = type_memfn_rqual (t1); tree t3; /* If this was a member function type, get back to the @@ -864,6 +865,7 @@ merge_types (tree t1, tree t2) t3 = build_method_type_directly (basetype, TREE_TYPE (t3), TYPE_ARG_TYPES (t3)); t1 = build_exception_variant (t3, raises); + t1 = build_ref_qualified_type (t1, rqual); break; } diff --git a/gcc/testsuite/g++.dg/cpp0x/ref-qual12.C b/gcc/testsuite/g++.dg/cpp0x/ref-qual12.C new file mode 100644 index 000..b0a16fe --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/ref-qual12.C @@ -0,0 +1,22 @@ +// PR c++/57254 +// { dg-require-effective-target c++11 } + +struct foo { +templatetypename T +void bar(T) ; + +templatetypename T +void bar(T) ; +}; + +templatetypename T +void foo::bar(T) {} + +templatetypename T +void foo::bar(T) {} 
+ +int main() +{ + foo f; + f.bar(0); +}
C++ PATCH for c++/57041 (ICE with designated initializer)
If we don't like the designator, we should fail pleasantly.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.

commit be2b8cf9756a4ca4be6583f65fc4388ada2a5b77
Author: Jason Merrill ja...@redhat.com
Date:   Mon May 13 13:26:12 2013 -0400

    PR c++/57041
    * decl.c (reshape_init_class): Handle error_mark_node.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 3854135..acc8371 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5200,6 +5200,9 @@ reshape_init_class (tree type, reshape_iter *d, bool first_initializer_p,
       /* Handle designated initializers, as an extension.  */
       if (d->cur->index)
         {
+          if (d->cur->index == error_mark_node)
+            return error_mark_node;
+
           if (TREE_CODE (d->cur->index) == INTEGER_CST)
             {
               if (complain & tf_error)
diff --git a/gcc/testsuite/g++.dg/ext/desig6.C b/gcc/testsuite/g++.dg/ext/desig6.C
new file mode 100644
index 000..30882a6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/desig6.C
@@ -0,0 +1,18 @@
+// PR c++/57041
+// { dg-options "-std=gnu++11" }
+// { dg-prune-output "error:" }
+
+template<typename T>
+union u {
+  T a;
+  char b;
+};
+
+template<typename T>
+u<T> make_u(T t) {
+  return { .a = t };
+}
+
+int main() {
+  return make_u<int>(1).a;
+}
[Ping] Re: [Patch] Fix PR56780: --disable-install-libiberty still installs libiberty.a
Hi,

Is anyone able to review the below please (original patch attached to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56780 and first posted at http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00167.html)?

Thanks,

Matt.

On Wed, 2013-04-03 at 15:03 +0100, Matt Burgess wrote:
> Hi,
>
> Please find attached a patch that fixes PR56780. Build tested on
> x86_64-linux. I've also attached it to the bug.
>
> Regards,
>
> Matt Burgess
>
> 2013-04-03  Matt Burgess  matt...@linuxfromscratch.org
>
>     other/PR56780
>     * libiberty/configure.ac: Move test for --enable-install-libiberty
>     outside of the 'with_target_subdir' test so that it actually gets
>     run.  Add output messages to show the test result.
>     * libiberty/configure: Regenerate.
>     * libiberty/Makefile.in (install_to_libdir): Place the
>     installation of the libiberty library in the same guard as that
>     used for the headers to prevent it being installed unless
>     requested via --enable-install-libiberty.
C++ PATCH for c++/56998 (ICE with call in C++03 mode)
In this testcase, since one of the overloads takes a pointer, we need to check whether the argument is a valid null pointer constant. In C++11 only an integer literal can be a null pointer constant of integral type, but in C++03 we need to handle other integer constant expressions. Here we were passing the call expression into value_dependent_expression_p, but in C++03 mode that function expects its argument to be a constant expression, and the expression wasn't filtered out by potential_constant_expression.

The simplest solution is to just check TREE_SIDE_EFFECTS since in C++03 mode an expression with side-effects can't be a constant expression (unlike in C++11, where constexpr function substitution can produce a constant expression from one with TREE_SIDE_EFFECTS).

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.

commit bd25166552f6aa94b8da3ffea0201213e8057b17
Author: Jason Merrill ja...@redhat.com
Date:   Mon May 13 13:34:31 2013 -0400

    PR c++/56998
    * call.c (null_ptr_cst_p): An expression with side-effects can't
    be a C++03 null pointer constant.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index bd8f531..9f3a50d 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -554,7 +554,7 @@ null_ptr_cst_p (tree t)
   if (CP_INTEGRAL_TYPE_P (TREE_TYPE (t)))
     {
       /* Core issue 903 says only literal 0 is a null pointer constant.  */
-      if (cxx_dialect < cxx0x)
+      if (cxx_dialect < cxx0x && !TREE_SIDE_EFFECTS (t))
         t = maybe_constant_value (fold_non_dependent_expr_sfinae (t, tf_none));
       STRIP_NOPS (t);
       if (integer_zerop (t) && !TREE_OVERFLOW (t))
diff --git a/gcc/testsuite/g++.dg/template/overload13.C b/gcc/testsuite/g++.dg/template/overload13.C
new file mode 100644
index 000..d41ccd0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/overload13.C
@@ -0,0 +1,16 @@
+// PR c++/56998
+
+class Secret;
+char IsNullLiteralHelper(Secret* p);
+char (&IsNullLiteralHelper(...))[2];
+
+struct C
+{
+  int val() { return 42; }
+};
+
+template <typename T>
+unsigned f()
+{
+  return sizeof(IsNullLiteralHelper(C().val()));
+}
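For readers unfamiliar with the idiom in the testcase, here is a minimal standalone sketch of how the two IsNullLiteralHelper overloads tell a null pointer constant apart from an ordinary int. The wrapper functions size_for_literal_zero and size_for_call are names made up for this illustration; they are not part of the patch. Because sizeof does not evaluate its operand, the helpers only need declarations.

```cpp
#include <cassert>

class Secret;

// Chosen when the argument is a null pointer constant: it converts to Secret*.
char IsNullLiteralHelper(Secret* p);
// Chosen for everything else; returns a reference to a 2-byte array.
char (&IsNullLiteralHelper(...))[2];

struct C
{
  int val() { return 42; }
};

// Hypothetical wrappers, for illustration only.
// The literal 0 is a null pointer constant, so the Secret* overload wins
// and sizeof sees a plain char.
inline unsigned size_for_literal_zero()
{
  return sizeof(IsNullLiteralHelper(0));
}

// A call result is not a null pointer constant, so the ellipsis overload
// wins and sizeof sees char[2].
inline unsigned size_for_call()
{
  return sizeof(IsNullLiteralHelper(C().val()));
}
```

Overload resolution has to ask whether C().val() could be a null pointer constant before it can rule out the Secret* overload, which is the query that reached null_ptr_cst_p in the report.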
Re: C++ PATCH for c++/56998 (ICE with call in C++03 mode)
On Mon, May 13, 2013 at 03:25:11PM -0400, Jason Merrill wrote:
> In this testcase, since one of the overloads takes a pointer, we need to
> check whether the argument is a valid null pointer constant. In C++11
> only an integer literal can be a null pointer constant of integral type,
> but in C++03 we need to handle other integer constant expressions. Here
> we were passing the call expression into value_dependent_expression_p,
> but in C++03 mode that function expects its argument to be a constant
> expression, and the expression wasn't filtered out by
> potential_constant_expression.
>
> The simplest solution is to just check TREE_SIDE_EFFECTS since in C++03
> mode an expression with side-effects can't be a constant expression
> (unlike in C++11, where constexpr function substitution can produce a
> constant expression from one with TREE_SIDE_EFFECTS).

What about the 4 other maybe_constant_value on fold_non_dependent_expr_sfinae (something, tf_none) calls in typeck.c (two for -Wdiv-by-zero and two for shift diagnostics)? Should that be

  if (cxx_dialect >= cxx0x || !TREE_SIDE_EFFECTS (t))

guarded (or wrapped into some helper function that will do all of

  tree
  some_good_name (tree t)
  {
    if (cxx_dialect < cxx0x && TREE_SIDE_EFFECTS (t))
      return t;
    t = fold_non_dependent_expr_sfinae (t, tf_none);
    return maybe_constant_value (t);
  }

)? Plus there is one guarded by ENABLE_CHECKING in pt.c.

        Jakub
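The guard proposed for such a helper can be modeled outside the compiler. The sketch below is only a toy: Dialect, Expr, fold_to_constant and fold_if_maybe_constant are hypothetical stand-ins invented here, not GCC types or functions. It shows nothing more than the control flow, namely that folding is skipped only when the dialect is older than C++11 and the expression has side effects.

```cpp
#include <cassert>

// Hypothetical stand-ins for cxx_dialect values; order matters for '<'.
enum Dialect { cxx98, cxx0x };

// Toy expression: one flag models TREE_SIDE_EFFECTS, the other records
// whether the folding step ran.
struct Expr
{
  bool side_effects;
  bool folded;
};

// Stand-in for fold_non_dependent_expr_sfinae + maybe_constant_value.
inline Expr fold_to_constant(Expr e)
{
  e.folded = true;
  return e;
}

// Mirrors the shape of the proposed helper: in pre-C++11 mode an expression
// with side effects cannot be a constant expression, so leave it alone.
inline Expr fold_if_maybe_constant(Dialect dialect, Expr e)
{
  if (dialect < cxx0x && e.side_effects)
    return e;
  return fold_to_constant(e);
}
```

With these definitions, a side-effecting expression is returned untouched in the older dialect and folded in the newer one, which matches the condition `cxx_dialect >= cxx0x || !TREE_SIDE_EFFECTS (t)` for running the fold.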
[PATCH] New switch optimization pass (PR tree-optimization/54742)
Here is the latest version of my SSA optimization pass to do the switch statement optimization described in PR 54742 (core_state_transition from coremark). I have tested this optimization with a x86 bootstrap and GCC test run with no errors and tested the MIPS cross compiler with no errors. Because of that I decided to submit it as a statically linked optimization pass instead of a dynamically loaded one, though I did keep the ifdefs for using it as a dynamic pass in the code. They could be removed if this patch is approved as a statically linked pass. Also, while this patch shows the optimization only being turned on with the -ftree-switch-shortcut flag, my bootstrap and testing had it turned on for all -O2 optimizations in order to maximize the testing. We may want to turn this on for -O3 and/or for -fexpensive-optimizations. I had to make one change to dominance.c in order to avoid some compiler aborts where it was dereferencing a null pointer. I believe this was happening because I am calling gimple_duplicate_sese_region with regions that are not really SESE. Because I am doing this, I regenerate the cfg and SSA information after each call, but I also had to change iterate_fix_dominators to fix the abort. Another way we might want to fix this would be to pass a flag to gimple_duplicate_sese_region that tells it whether or not we want it to recalculate the dominance information at all. If set to false, it would assume the caller will take care of it. Opinions? OK to checkin? Steve Ellcey sell...@imgtec.com 2013-05-13 Steve Ellcey sell...@imgtec.com PR tree-optimization/54742 * Makefile.in (OBJS): Add tree-switch-shortcut.o. * common.opt (ftree-switch-shortcut): New. * dominance.c (iterate_fix_dominators): Add null check. * params.def (PARAM_MAX_SWITCH_INSNS): New. (PARAM_MAX_SWITCH_PATHS): New. * passes.c (init_optimization_passes): Add pass_switch_shortcut. * timevar.def (TV_SWITCH_SHORTCUT): New. * tree-pass.c (pass_switch_shortcut): New. 
* tree-switch-shortcut.c: New file. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 903125e..db0ffcb 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1399,6 +1399,7 @@ OBJS = \ tree-scalar-evolution.o \ tree-sra.o \ tree-switch-conversion.o \ + tree-switch-shortcut.o \ tree-ssa-address.o \ tree-ssa-alias.o \ tree-ssa-ccp.o \ diff --git a/gcc/common.opt b/gcc/common.opt index 4c7933e..e028e2d 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2160,6 +2160,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +ftree-switch-shortcut +Common Report Var(flag_tree_switch_shortcut) Init(0) Optimization +Do fancy switch statement shortcutting + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA-normal pass diff --git a/gcc/dominance.c b/gcc/dominance.c index 5c96dad..d858ad1 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -1251,6 +1251,7 @@ iterate_fix_dominators (enum cdi_direction dir, vecbasic_block bbs, struct pointer_map_t *map; int *parent, *son, *brother; unsigned int dir_index = dom_convert_dir_to_idx (dir); + void **slot; /* We only support updating dominators. There are some problems with updating postdominators (need to add fake edges from infinite loops @@ -1357,7 +1358,10 @@ iterate_fix_dominators (enum cdi_direction dir, vecbasic_block bbs, if (dom == bb) continue; - dom_i = (size_t) *pointer_map_contains (map, dom); + slot = pointer_map_contains (map, dom); + if (slot == NULL) + continue; + dom_i = (size_t) *slot; /* Do not include parallel edges to G. */ if (!bitmap_set_bit ((bitmap) g-vertices[dom_i].data, i)) diff --git a/gcc/params.def b/gcc/params.def index 3c52651..bdabe07 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1020,6 +1020,20 @@ DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN, strength reduction, 50, 1, 99) +/* Maximum number of instructions to duplicate when shortcutting a switch. 
*/ +DEFPARAM (PARAM_MAX_SWITCH_INSNS, + max-switch-insns, + Maximum number of instructions to duplicate when + shortcutting a switch statement, + 100, 1, 99) + +/* Maximum number of paths to duplicate when shortcutting a switch. */ +DEFPARAM (PARAM_MAX_SWITCH_PATHS, + max-switch-paths, + Maximum number of new paths to create when + shortcutting a switch statement, + 50, 1, 99) + /* Local variables: mode:c diff --git a/gcc/passes.c b/gcc/passes.c index fd67ee6..0fb826c 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -1416,6 +1416,7 @@ init_optimization_passes (void) NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_tree_ifcombine); + NEXT_PASS (pass_switch_shortcut); NEXT_PASS (pass_phiopt); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_ch); diff --git a/gcc/timevar.def
Re: [PATCH] New switch optimization pass (PR tree-optimization/54742)
On 05/13/2013 02:16 PM, Steve Ellcey wrote: Here is the latest version of my SSA optimization pass to do the switch statement optimization described in PR 54742 (core_state_transition from coremark). I have tested this optimization with a x86 bootstrap and GCC test run with no errors and tested the MIPS cross compiler with no errors. Because of that I decided to submit it as a statically linked optimization pass instead of a dynamically loaded one, though I did keep the ifdefs for using it as a dynamic pass in the code. They could be removed if this patch is approved as a statically linked pass. Also, while this patch shows the optimization only being turned on with the -ftree-switch-shortcut flag, my bootstrap and testing had it turned on for all -O2 optimizations in order to maximize the testing. We may want to turn this on for -O3 and/or for -fexpensive-optimizations. I had to make one change to dominance.c in order to avoid some compiler aborts where it was dereferencing a null pointer. I believe this was happening because I am calling gimple_duplicate_sese_region with regions that are not really SESE. Because I am doing this, I regenerate the cfg and SSA information after each call, but I also had to change iterate_fix_dominators to fix the abort. Another way we might want to fix this would be to pass a flag to gimple_duplicate_sese_region that tells it whether or not we want it to recalculate the dominance information at all. If set to false, it would assume the caller will take care of it. Opinions? OK to checkin? Steve Ellcey sell...@imgtec.com 2013-05-13 Steve Ellcey sell...@imgtec.com PR tree-optimization/54742 * Makefile.in (OBJS): Add tree-switch-shortcut.o. * common.opt (ftree-switch-shortcut): New. * dominance.c (iterate_fix_dominators): Add null check. * params.def (PARAM_MAX_SWITCH_INSNS): New. (PARAM_MAX_SWITCH_PATHS): New. * passes.c (init_optimization_passes): Add pass_switch_shortcut. * timevar.def (TV_SWITCH_SHORTCUT): New. 
* tree-pass.c (pass_switch_shortcut): New.
* tree-switch-shortcut.c: New file.

I was looking at this last week (stuck for hours on tarmac at BWI).

I think we should fix this in the threader rather than doing a special purpose pass. This is primarily because we get it for free if we address one limitation in the threader (some of the other issues I was concerned about don't apply).

Specifically if we look at thread_around_empty_block we have:

  /* This block must have a single predecessor (E->dest).  */
  if (!single_pred_p (bb))
    return NULL;

  /* This block must have more than one successor.  */
  if (single_succ_p (bb))
    return NULL;

  /* This block can have no PHI nodes.  This is overly conservative.  */
  if (!gsi_end_p (gsi_start_phis (bb)))
    return NULL;

The tests that the block has more than one successor and no PHI nodes are the keys.

;; basic block 17, loop depth 1
;;    pred:       9
;;                13
;;                16
;;                15
;;                4
;;                12
;;                7
;;                14
;;                10
;;                5
;;                6
;;                8
;;                11
  # state_1 = PHI <0(9), 2(13), 3(16), 3(15), state_36(4), 1(12), 0(7), 2(14), 1(10), 1(5), 0(6), 2(8), 3(11)>
  # .MEM_4 = PHI <.MEM_29(9), .MEM_20(13), .MEM_24(16), .MEM_29(15), .MEM_29(4), .MEM_29(12), .MEM_12(7), .MEM_29(14), .MEM_16(10), .MEM_29(5), .MEM_29(6), .MEM_29(8), .MEM_29(11)>
<L32>:
  str_25 = str_37 + 1;
  # VUSE <.MEM_4>
  _8 = MEM[base: str_25, offset: 0B];
  if (_8 != 0)
    goto <bb 18>;
  else
    goto <bb 19>;
;;    succ:       18
;;                19

;; basic block 18, loop depth 1
;;    pred:       17
  goto <bb 4>;
;;    succ:       4

bb will be block #18. In theory we just do

  if (single_succ_p (bb))
    bb = single_succ_edge (bb)->dest;

And allow PHI nodes in this specific instance. That's enough to allow existing code to identify the potential jump threading candidates -- and more importantly, it's much more general.

The updater needs to copy bb17 and bb4. bb17' will transfer to bb4' which will directly transfer to the destination of the switch.

The advantage of doing this in the threader is it'll help more than just the switch statement case.
It's probably highly likely that we're missing cases to thread through empty blocks with PHIs. I've started cobbling together some of the updating code. So far it doesn't look too terrible. Jeff
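As a rough source-level picture of what threading the switch buys, the sketch below writes the same tiny state machine twice. It is a hand-made illustration, not output of the threader or of the proposed pass, and all names in it (count_runs_switch, count_runs_threaded, is_digit, the two states) are invented for the example. In the first form every iteration returns to the switch even though the next state is a known constant on each incoming path (compare the state_1 PHI in the dump). In the second form the loop latch is duplicated on each path so control jumps straight to the code for the next state.

```cpp
#include <cassert>

inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

enum State { START, IN_NUM };

// Counts runs of digits. Every iteration re-dispatches through the switch.
inline int count_runs_switch(const char* s)
{
  State state = START;
  int count = 0;
  for (; *s; ++s)
    {
      switch (state)
        {
        case START:
          if (is_digit(*s)) { state = IN_NUM; ++count; }
          break;
        case IN_NUM:
          if (!is_digit(*s)) state = START;
          break;
        }
    }
  return count;
}

// Same behaviour with the dispatch threaded by hand: the "advance and test
// for end" step is copied onto each path, which then jumps directly to the
// code for the state it has just selected.
inline int count_runs_threaded(const char* s)
{
  int count = 0;
  if (!*s) return count;
start:
  if (is_digit(*s))
    {
      ++count;
      ++s;
      if (!*s) return count;
      goto in_num;
    }
  ++s;
  if (!*s) return count;
  goto start;
in_num:
  if (!is_digit(*s))
    {
      ++s;
      if (!*s) return count;
      goto start;
    }
  ++s;
  if (!*s) return count;
  goto in_num;
}
```

The duplicated latch code is the price of the transformation, which is presumably why the posted patch bounds it with the max-switch-insns and max-switch-paths parameters.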
Re: [PATCH,RFC] Make libbacktrace more standalone
>>>>> "Alexander" == Alexander Monakov amona...@ispras.ru writes:

Alexander> In GCC's own dwarf2.h as enum tags; dwarf.h uses anonymous enums.

I'm curious to know which dwarf2.h you are testing against and/or intend to support. I think there is more than one.

Tom
Re: [PATCH,RFC] Make libbacktrace more standalone
On Tue, May 14, 2013 at 12:45 AM, Tom Tromey tro...@redhat.com wrote:
> Alexander == Alexander Monakov amona...@ispras.ru writes:
>
> Alexander In GCC's own dwarf2.h as enum tags; dwarf.h uses anonymous enums.
>
> I'm curious to know which dwarf2.h you are testing against and/or
> intend to support. I think there is more than one.

Either dwarf.h from libdwarf (the official one if I understand correctly), or dwarf.h provided by elfutils or libdw. Right now I'm testing against /usr/include/dwarf.h provided by the elfutils package on Gentoo.

As far as I know, those are very similar except for the presence of some enum values, hence the configure checks.

Alexander
Re: [PATCH] Fix up WIDEN_MULT_EXPR (PR middle-end/57251)
On 05/13/2013 09:39 AM, Jakub Jelinek wrote:
> 2013-05-13  Jakub Jelinek  ja...@redhat.com
>
>     PR middle-end/57251
>     * expr.c (expand_expr_real_2) <case WIDEN_MULT_EXPR>: Handle
>     the case when both op0 and op1 have VOIDmode.
>
>     * gcc.dg/torture/pr57251.c: New test.

Ok.

r~
Re: GCC does not support *mmintrin.h with function specific opts
Ping. On Thu, May 9, 2013 at 2:20 PM, Sriraman Tallam tmsri...@google.com wrote: cc:Diego On Tue, May 7, 2013 at 2:41 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Thu, May 2, 2013 at 3:51 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Mon, Apr 29, 2013 at 10:47 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Apr 25, 2013 at 12:41 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Tue, 16 Apr 2013, Sriraman Tallam wrote: Ok, it is on by default now. There is a way to turn it off, with -mno-generate-builtins. Any new option needs documenting in invoke.texi. Added and new patch attached. Thanks Sri -- Joseph S. Myers jos...@codesourcery.com
Re: GCC does not support *mmintrin.h with function specific opts
On Mon, May 13, 2013 at 2:21 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Thu, May 9, 2013 at 2:20 PM, Sriraman Tallam tmsri...@google.com wrote: cc:Diego On Tue, May 7, 2013 at 2:41 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Thu, May 2, 2013 at 3:51 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Mon, Apr 29, 2013 at 10:47 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Apr 25, 2013 at 12:41 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Tue, 16 Apr 2013, Sriraman Tallam wrote: Ok, it is on by default now. There is a way to turn it off, with -mno-generate-builtins. Any new option needs documenting in invoke.texi. Added and new patch attached. Thanks Sri -- Joseph S. Myers jos...@codesourcery.com It looks good to me. But I can't approve it. Add Uros, -- H.J.
[v3] libsupc++ bad_array_* build fixes
Some cleanup, no Makefile.am was checked in on the bad_array_* additions. The exports make the intent very clear, so this just fills in the blanks. tested x86/linux -benjamin2013-05-13 Benjamin Kosnik b...@redhat.com * libsupc++/Makefile.am (sources): Add bad_array_length.cc, bad_array_new.cc. * libsupc++/Makefile.in: Regenerate. * libsupc++/bad_array_length.cc: Tweak. * libsupc++/bad_array_new.cc: Tweak. diff --git a/libstdc++-v3/libsupc++/Makefile.am b/libstdc++-v3/libsupc++/Makefile.am index 25c58fb..b4e86f5 100644 --- a/libstdc++-v3/libsupc++/Makefile.am +++ b/libstdc++-v3/libsupc++/Makefile.am @@ -48,6 +48,8 @@ sources = \ atexit_arm.cc \ atexit_thread.cc \ bad_alloc.cc \ + bad_array_length.cc \ + bad_array_new.cc \ bad_cast.cc \ bad_typeid.cc \ class_type_info.cc \ @@ -107,6 +109,21 @@ cp-demangle.o: cp-demangle.c # Use special rules for the C++11 sources so that the proper flags are passed. +bad_array_length.lo: bad_array_length.cc + $(LTCXXCOMPILE) -std=gnu++11 -c $ +bad_array_length.o: bad_array_length.cc + $(CXXCOMPILE) -std=gnu++11 -c $ + +bad_array_new.lo: bad_array_new.cc + $(LTCXXCOMPILE) -std=gnu++11 -c $ +bad_array_new.o: bad_array_new.cc + $(CXXCOMPILE) -std=gnu++11 -c $ + +eh_aux_runtime.lo: eh_aux_runtime.cc + $(LTCXXCOMPILE) -std=gnu++11 -c $ +eh_aux_runtime.o: eh_aux_runtime.cc + $(CXXCOMPILE) -std=gnu++11 -c $ + eh_ptr.lo: eh_ptr.cc $(LTCXXCOMPILE) -std=gnu++11 -c $ eh_ptr.o: eh_ptr.cc diff --git a/libstdc++-v3/libsupc++/bad_array_length.cc b/libstdc++-v3/libsupc++/bad_array_length.cc index a63d660..76afd30 100644 --- a/libstdc++-v3/libsupc++/bad_array_length.cc +++ b/libstdc++-v3/libsupc++/bad_array_length.cc @@ -23,14 +23,13 @@ #include new -namespace std { +namespace std +{ bad_array_length::~bad_array_length() _GLIBCXX_USE_NOEXCEPT { } const char* bad_array_length::what() const _GLIBCXX_USE_NOEXCEPT -{ - return std::bad_array_length; -} +{ return std::bad_array_length; } } // namespace std diff --git 
a/libstdc++-v3/libsupc++/bad_array_new.cc b/libstdc++-v3/libsupc++/bad_array_new.cc index 5282f52..224e4f7 100644 --- a/libstdc++-v3/libsupc++/bad_array_new.cc +++ b/libstdc++-v3/libsupc++/bad_array_new.cc @@ -23,14 +23,13 @@ #include new -namespace std { +namespace std +{ bad_array_new_length::~bad_array_new_length() _GLIBCXX_USE_NOEXCEPT { } const char* bad_array_new_length::what() const _GLIBCXX_USE_NOEXCEPT -{ - return std::bad_array_new_length; -} +{ return std::bad_array_new_length; } } // namespace std
Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition
Ping. On Thu, May 9, 2013 at 2:22 PM, Sriraman Tallam tmsri...@google.com wrote: cc:Diego On Tue, May 7, 2013 at 2:41 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Thu, Apr 25, 2013 at 4:50 PM, Sriraman Tallam tmsri...@google.com wrote: Attaching an updated patch. Thanks Sri On Thu, Apr 25, 2013 at 4:42 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Apr 23, 2013 at 9:59 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Apr 23, 2013 at 03:58:06PM -0700, Sriraman Tallam wrote: This patch generates labels for cold function parts that are split when using the option -freorder-blocks-and-partition. The cold label name is generated by suffixing .cold to the assembler name of the hot function. This is useful when getting back traces from gdb when the cold function part does get executed. * final.c (final_scan_insn): Generate cold label name by suffixing .cold to function's assembler name. * gcc.dg/tree-prof/cold_partition_label.c: New test. This doesn't honor NO_DOT_IN_LABEL (and NO_DOLLAR_IN_LABEL). Fixed, by calling clean_symbol_name Also, don't some function start in cold section and then switch into hot section? I am not able to generate a test where this happens. However, I fixed this problem by generating the cold label only when the first function block is not cold. Patch attached, please see if this is ok. Thanks Sri Jakub
Re: [patch] Hash table changes from cxx-conversion branch - config part
I still have not heard from i386 or ia64 folks. Anyone? On 4/24/13, Lawrence Crowl cr...@googlers.com wrote: This patch is a consolodation of the hash_table patches to the cxx-conversion branch for files under gcc/config. Recipients: config/arm/arm.c - ni...@redhat.com, ramana.radhakrish...@arm.com config/ia64/ia64.c - wil...@tuliptree.org, sell...@mips.com config/mips/mips.c - rdsandif...@googlemail.com config/sol2.c - r...@cebitec.uni-bielefeld.de config/i386/winnt.c - c...@gcc.gnu.org, kti...@redhat.com global - rguent...@suse.de, dnovi...@google.com Update various hash tables from htab_t to hash_table. Modify types and calls to match. * config/arm/arm.c'arm_libcall_uses_aapcs_base::libcall_htab Fold libcall_eq and libcall_hash into new struct libcall_hasher. * config/ia64/ia64.c'bundle_state_table Fold bundle_state_hash and bundle_state_eq_p into new struct bundle_state_hasher. * config/mips/mips.c'mips_offset_table Fold mips_lo_sum_offset_hash and mips_lo_sum_offset_eq into new struct mips_lo_sum_offset_hasher. In mips_reorg_process_insns, change call to for_each_rtx to pass a pointer to the hash_table rather than a htab_t. This change requires then dereferencing that pointer in mips_record_lo_sum to obtain the hash_table. * config/sol2.c'solaris_comdat_htab Fold comdat_hash and comdat_eq into new struct comdat_entry_hasher. * config/i386/winnt.c'i386_pe_section_type_flags::htab * config/i386/winnt.c'i386_find_on_wrapper_list::wrappers Fold wrapper_strcmp into new struct wrapped_symbol_hasher. Tested on x86_64. Tested with config-list.mk. Index: gcc/ChangeLog 2013-04-24 Lawrence Crowl cr...@google.com * config/arm/t-arm: Update for below. * config/arm/arm.c (arm_libcall_uses_aapcs_base::libcall_htab): Change type to hash_table. Update dependent calls and types. * config/i386/t-cygming: Update for below. * config/i386/t-interix: Update for below. * config/i386/winnt.c (i386_pe_section_type_flags::htab): Change type to hash_table. 
Update dependent calls and types. (i386_find_on_wrapper_list::wrappers): Likewise. * config/ia64/t-ia64: Update for below. * config/ia64/ia64.c (bundle_state_table): Change type to hash_table. Update dependent calls and types. * config/mips/mips.c (mips_reorg_process_insns::htab): Change type to hash_table. Update dependent calls and types. * config/sol2.c (solaris_comdat_htab): Change type to hash_table. Update dependent calls and types. * config/t-sol2: Update for above. Index: gcc/config/ia64/ia64.c === --- gcc/config/ia64/ia64.c(revision 198213) +++ gcc/config/ia64/ia64.c(working copy) @@ -47,7 +47,7 @@ along with GCC; see the file COPYING3. #include target-def.h #include common/common-target.h #include tm_p.h -#include hashtab.h +#include hash-table.h #include langhooks.h #include gimple.h #include intl.h @@ -257,8 +257,6 @@ static struct bundle_state *get_free_bun static void free_bundle_state (struct bundle_state *); static void initiate_bundle_states (void); static void finish_bundle_states (void); -static unsigned bundle_state_hash (const void *); -static int bundle_state_eq_p (const void *, const void *); static int insert_bundle_state (struct bundle_state *); static void initiate_bundle_state_table (void); static void finish_bundle_state_table (void); @@ -8528,18 +8526,21 @@ finish_bundle_states (void) } } -/* Hash table of the bundle states. The key is dfa_state and insn_num - of the bundle states. */ +/* Hashtable helpers. */ -static htab_t bundle_state_table; +struct bundle_state_hasher : typed_noop_remove bundle_state +{ + typedef bundle_state value_type; + typedef bundle_state compare_type; + static inline hashval_t hash (const value_type *); + static inline bool equal (const value_type *, const compare_type *); +}; /* The function returns hash of BUNDLE_STATE. 
*/ -static unsigned -bundle_state_hash (const void *bundle_state) +inline hashval_t +bundle_state_hasher::hash (const value_type *state) { - const struct bundle_state *const state -= (const struct bundle_state *) bundle_state; unsigned result, i; for (result = i = 0; i dfa_state_size; i++) @@ -8550,19 +8551,20 @@ bundle_state_hash (const void *bundle_st /* The function returns nonzero if the bundle state keys are equal. */ -static int -bundle_state_eq_p (const void *bundle_state_1, const void *bundle_state_2) +inline bool +bundle_state_hasher::equal (const value_type *state1, + const compare_type *state2) { - const struct bundle_state *const state1 -= (const struct bundle_state *) bundle_state_1; - const struct bundle_state *const state2 -= (const struct bundle_state *) bundle_state_2; - return
Re: new port: msp430-elf, revision 2
> run-time, according to my reading of
> http://gcc.gnu.org/codingconventions.html .

I changed it, but I note that runtime is used much more often than run-time and run time in the docs. However, runtime is preferred for libraries and system support present at run time, which is what this option controls.

> Here I am wondering whether to simply omit instead of decimal? (If
> not, somehow that part of the sentence comes across as a bit odd, but
> then I'm not a native speaker.)

New text:

  Force assembly output to always use hex constants. Normally such constants are signed decimals, but this option is available for testsuite and/or aesthetic purposes.

> What are the other legitimate options beyond these two?

We're still discussing that with TI; historical non-fsf versions of the msp-gcc allow the full chip number here (i.e. msp430F5438). Ideally, such would still be allowed, but the community is talking about switching to -mcpu= for the ISA and using -mmcu= only for the linker scripts.

For now, we allow chip numbers but only look for the 'x' to determine the ISA. We might allow for arbitrary chip numbers, to be passed to the assembler/linker or to be used as filenames for linker scripts, headers, or libraries.

So, I only documented the use I know is currently supported.

> +@item -mrelax
> +@opindex mrelax
> +Perform link-time opcode relaxing.
>
> Will everyone know what opcode relaxing is?

New text:

  This option is passed to the assembler and linker, and allows the linker to perform certain optimizations that cannot be done until the final link.

> restricted 64k range of constants, what's that? And kBit or kByte?

New text:

  Memory references which do not require an extended MOVX instruction.

> The doc changes look fine modulo the above, and I assume you'll want
> to add a note to the release notes htdocs/gcc-4.9/changes.html and
> a news item to htdocs/index.html .

I included index.html, I thought... I didn't see changes.html in the file list I had, though, but I can certainly add that too.
[gomp4/cilkplus] C parsing for Cilk Plus #pragma simd
Hi Jakub. Hi folks. Attached is a patch against the gomp4 branch implementing Cilk's #pragma simd construct with the gomp4 infrastructure. I emit OMP_SIMD as previously mentioned, and let omp-low.c do the rest. The reason for this, is that Cilk Plus and OMP4 behave pretty much the same way, so there's no need to implement differing trees. If in the future they diverge too much, I guess we could add different tree family opcodes for Cilk Plus and gimplify them to a common representation. For now, they're pretty much identical. I am working on a private branch in aldyh/cilk-in-gomp, but I thought I'd post what I've done so far, since it's pretty much complete, and Balaji is about to start the C++ parsing bits based on this work. Let me know if you see anything obviously wrong, or would like things in any way different. Aldy diff --git a/gcc/ChangeLog.cilkplus b/gcc/ChangeLog.cilkplus new file mode 100644 index 000..5e9fb79 --- /dev/null +++ b/gcc/ChangeLog.cilkplus @@ -0,0 +1,27 @@ +2013-05-13 Aldy Hernandez al...@redhat.com + + * Makefile.in (C_COMMON_OBJS): Depend on c-family/c-cilkplus.o. + (c-cilkplus.o): New dependency. + +c-family/ + * c-cilkplus.c: New. + * c-pragma.c (init_pragma): Register simd pragma. + * c-pragma.h (enum pragma_kind): Add PRAGMA_CILK_SIMD enum. + (enum pragma_cilk_clause): New. + * c.opt (fcilkplus): New flag. + +c/ + * c-parser.c (c_parser_pragma): Add case for PRAGMA_CILK_SIMD. + (c_parser_cilk_verify_simd): New. + (c_parser_cilk_clause_vectorlength): New. + (c_parser_cilk_clause_linear): New. + (c_parser_cilk_clause_name): New. + (c_parser_cilk_all_clauses): New. + (c_parser_cilk_for_statement): New. + (c_parser_cilk_simd_construct): New. + * c-tree.h (c_finish_cilk_simd_loop): Protoize. + (c_finish_cilk_clauses): Same. + * c-typeck.c (c_finish_bc_stmt): Add case for _Cilk_for loops. + +testsuite/ + * gcc.dg/cilk-plus: New directory and associated infrastructure. 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 7be55ae..6a75f30 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1142,6 +1142,7 @@ C_COMMON_OBJS = c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o \ c-family/c-format.o c-family/c-gimplify.o c-family/c-lex.o \ c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o \ c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \ + c-family/c-cilkplus.o \ c-family/c-semantics.o c-family/c-ada-spec.o tree-mudflap.o # Language-independent object files. @@ -1971,6 +1972,9 @@ c-family/c-lex.o : c-family/c-lex.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ c-family/c-omp.o : c-family/c-omp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(TREE_H) $(C_COMMON_H) $(GIMPLE_H) langhooks.h +c-family/c-cilkplus.o : c-family/c-cilkplus.c $(CONFIG_H) $(SYSTEM_H) \ + coretypes.h $(TREE_H) $(C_COMMON_H) langhooks.h + CFLAGS-c-family/c-opts.o += @TARGET_SYSTEM_ROOT_DEFINE@ c-family/c-opts.o : c-family/c-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(TREE_H) $(C_PRAGMA_H) $(FLAGS_H) toplev.h langhooks.h \ diff --git a/gcc/c-family/c-cilkplus.c b/gcc/c-family/c-cilkplus.c new file mode 100644 index 000..40284fe --- /dev/null +++ b/gcc/c-family/c-cilkplus.c @@ -0,0 +1,442 @@ +/* This file contains routines to construct and validate Cilk Plus + constructs within the C and C++ front ends. + + Copyright (C) 2011-2013 Free Software Foundation, Inc. + Contributed by Balaji V. Iyer balaji.v.i...@intel.com, + Aldy Hernandez al...@redhat.com. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it +under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +General Public License for more details. 
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "c-common.h"
+
+/* Helper function for c_check_cilk_loop.
+
+   Validate the increment in a _Cilk_for construct or a #pragma simd
+   for loop.
+
+   LOC is the location of the `for' keyword.  DECL is the induction
+   variable.  INCR is the original increment expression.
+
+   Returns the canonicalized increment expression for an OMP_FOR_INCR.
+   If there is a validation error, returns error_mark_node.  */
+
+static tree
+c_check_cilk_loop_incr (location_t loc, tree decl, tree incr)
+{
+  if (EXPR_HAS_LOCATION (incr))
+    loc = EXPR_LOCATION (incr);
+
+  if (!incr)
+    {
+      error_at (loc, "missing increment");
+
Re: RFC: PATCH to avoid linking multiple front ends at once with parallel make
On May 9, 2013, Jason Merrill ja...@redhat.com wrote:

On 05/01/2013 07:40 PM, Mike Stump wrote:

$ bash -c trap 'echo remove lock' 0; true; echo $?

Thanks for the suggestion. Here's a revised patch:

I like the idea of the patch, for I feel your pain too ;-)

However, rather than implementing the locking in Makefiles, I'm thinking it might be wiser to do so in a script that takes the lock name and the command to run while holding the lock.

ifeq (@DO_LINK_MUTEX@,true)
LLINKER = $(SHELL) $(srcdir)/lock-and-run linkfe.lck $(LINKER)
else
LLINKER = $(LINKER)
endif

lock-and-run:

#! /bin/sh
lockdir=$1 prog=$2; shift 2 || exit 1
status=1
trap 'rmdir $lockdir; exit $status' 0 1 2 15
until mkdir $lockdir; do sleep 1; done
$prog $@
status=$?
rmdir $lockdir
exit $status

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
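
For readers who want to try the mkdir-based locking idea from the message above, here is a minimal sketch written as a shell function. It is not the script that ended up in GCC. The function name lock_and_run, the argument check, and the choice to install the cleanup trap only after the lock has been taken are assumptions made for illustration. The double quotes around the expansions are also an addition here.

```sh
#!/bin/sh
# Sketch of a lock-and-run style helper. The name and layout are
# illustrative assumptions, not the script committed to GCC.
#
# Usage: lock_and_run LOCKDIR COMMAND [ARGS...]
lock_and_run() (
  # Need at least a lock directory name and a command.
  [ $# -ge 2 ] || exit 1
  lockdir=$1
  shift

  # mkdir either creates the directory or fails, atomically, so only one
  # caller at a time gets past this loop. Errors are silenced to avoid
  # noise while waiting; a real script might add a timeout instead.
  until mkdir "$lockdir" 2>/dev/null; do
    sleep 1
  done

  # Install the cleanup only once the lock is ours, so an interrupted
  # waiter never removes a lock that another process holds.
  trap 'rmdir "$lockdir"' 0
  trap 'exit 1' 1 2 15

  # Run the command while holding the lock and pass its status through.
  "$@"
  status=$?
  exit "$status"
)
```

Under these assumptions, a link step could be wrapped as lock_and_run linkfe.lck followed by the linker command and its arguments. The parentheses around the function body run it in a subshell, so the traps and the exit stay local to the helper and do not affect the calling shell.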
Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition
Hi,

I would prefer to encode section changes within a function in the debug info. I will explain later today.

Ciao!
Steven

On 5/13/13, Sriraman Tallam tmsri...@google.com wrote:

Ping.

On Thu, May 9, 2013 at 2:22 PM, Sriraman Tallam tmsri...@google.com wrote:

cc:Diego

On Tue, May 7, 2013 at 2:41 PM, Sriraman Tallam tmsri...@google.com wrote:

Ping.

On Thu, Apr 25, 2013 at 4:50 PM, Sriraman Tallam tmsri...@google.com wrote:

Attaching an updated patch.

Thanks
Sri

On Thu, Apr 25, 2013 at 4:42 PM, Sriraman Tallam tmsri...@google.com wrote:

On Tue, Apr 23, 2013 at 9:59 PM, Jakub Jelinek ja...@redhat.com wrote:

On Tue, Apr 23, 2013 at 03:58:06PM -0700, Sriraman Tallam wrote:

This patch generates labels for cold function parts that are split when using the option -freorder-blocks-and-partition. The cold label name is generated by suffixing .cold to the assembler name of the hot function. This is useful when getting back traces from gdb when the cold function part does get executed.

* final.c (final_scan_insn): Generate cold label name by suffixing .cold to function's assembler name.
* gcc.dg/tree-prof/cold_partition_label.c: New test.

This doesn't honor NO_DOT_IN_LABEL (and NO_DOLLAR_IN_LABEL).

Fixed by calling clean_symbol_name.

Also, don't some functions start in the cold section and then switch into the hot section?

I am not able to generate a test where this happens. However, I fixed this problem by generating the cold label only when the first function block is not cold.

Patch attached, please see if this is ok.

Thanks
Sri

Jakub