[PING][PATCH, ARM] Fix PR63718, Thumb1 bootstrap -- disable fuse-caller-save for Thumb1
On 20-11-14 12:54, Tom de Vries wrote: Richard, This patch fixes PR63718, which currently breaks Thumb1 bootstrap. The problem is that in Thumb1 mode, we emit the epilogue in RTL, but the last insn - epilogue_insns - does not accurately model the corresponding insns emitted in the asm file. F.i., the asm file may contain an insn: ... pop {r0} while the corresponding RTL pattern looks like this: ... (jump_insn (unspec_volatile [ (return) ] VUNSPEC_EPILOGUE)) ... As a consequence, the epilogue may clobber registers without fuse-caller-save being able to analyze that. Adding the missing clobbers to epilogue_insns is not trivial, and probably not a good idea for stage3. The patch works around the problem by disabling fuse-caller-save in Thumb1 mode. Build and reg-tested on arm-none-eabi. OK for stage3? Ping. Thanks, - Tom
Re: [PATCH] Fix PR64126
On Mon, 1 Dec 2014, Marc Glisse wrote: On Mon, 1 Dec 2014, Richard Biener wrote: The following fixes PR64126. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-12-01 Richard Biener rguent...@suse.de PR middle-end/64126 * match.pd: Allow conversions in ~A + 1 - -A, add -A - 1 - ~A and -1 - A - ~A. * fold-const.c (fold_binary_loc): Remove transforms here. Index: gcc/match.pd === --- gcc/match.pd(revision 218144) +++ gcc/match.pd(working copy) @@ -484,8 +522,22 @@ (define_operator_list inverted_tcc_compa /* ~A + 1 - -A */ (simplify - (plus (bit_not @0) integer_each_onep) - (negate @0)) + (plus (convert? (bit_not @0)) integer_each_onep) + (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) +(negate (convert @0 + + /* -A - 1 - ~A */ + (simplify + (minus (convert? (negate @0)) integer_each_onep) + (if (!TYPE_OVERFLOW_TRAPS (type) I don't understand why TYPE_OVERFLOW_TRAPS is tested for this one but not the others. No idea - I just copied from fold-const.c. The whole -ftrapv and now -fsanitize=overflow stuff is just very inconsistently checked. +tree_nop_conversion_p (type, TREE_TYPE (@0))) +(bit_not (convert @0 + + /* -1 - A - ~A */ + (simplify + (minus integer_all_onesp @0) + (if (TREE_CODE (type) != COMPLEX_TYPE) +(bit_not @0))) It should also be true for COMPLEX_TYPE where integer_all_onesp tests for -1-i. Yeah, I have no idea why it was disabled for complex. Maybe expansion doesn't deal with ~complex properly (on some targets?). Not sure. Didn't bother to investigate as ... (I know you are just copying from fold-const) ... I was just copying from fold-const.c ;) Richard.
Re: [PATCH] Fix PR64137
On Mon, 1 Dec 2014, Richard Biener wrote: On Mon, 1 Dec 2014, FX wrote: Your change is OK (we don’t want to use the type of the result, but the type of the argument indeed). Index: gcc/fortran/trans-intrinsic.c === --- gcc/fortran/trans-intrinsic.c (revision 218211) +++ gcc/fortran/trans-intrinsic.c (working copy) @@ -3729,7 +3729,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * s possible value is HUGE in both cases. */ if (op == GT_EXPR) tmp = fold_build1_loc (input_location, NEGATE_EXPR, TREE_TYPE (tmp), tmp); - if (op == GT_EXPR expr-ts.type == BT_INTEGER) + if (op == GT_EXPR arrayexpr-ts.type == BT_INTEGER) tmp = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (tmp), tmp, build_int_cst (type, 1)); Logic would dictate that it is build_int_cst (TREE_TYPE (tmp), 1)” instead of build_int_cst (type, 1)” in that last line. Probably doesn’t matter much, as it will be all folded into the same value anyway, but could you test and commit that change together, while you’re at it? Sure, will re-test and commit tomorrow. The following is what I have applied after bootstrap regtest on x86_64-unknown-linux-gnu. Richard. 2014-12-02 Richard Biener rguent...@suse.de PR fortran/64137 * trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc): Check proper expressions type, use proper type for computing -Huge - 1. Index: gcc/fortran/trans-intrinsic.c === --- gcc/fortran/trans-intrinsic.c (revision 218225) +++ gcc/fortran/trans-intrinsic.c (working copy) @@ -3729,9 +3729,9 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * s possible value is HUGE in both cases. */ if (op == GT_EXPR) tmp = fold_build1_loc (input_location, NEGATE_EXPR, TREE_TYPE (tmp), tmp); - if (op == GT_EXPR expr-ts.type == BT_INTEGER) + if (op == GT_EXPR arrayexpr-ts.type == BT_INTEGER) tmp = fold_build2_loc (input_location, MINUS_EXPR, TREE_TYPE (tmp), tmp, - build_int_cst (type, 1)); + build_int_cst (TREE_TYPE (tmp), 1)); gfc_add_modify (se-pre, limit, tmp);
Re: [PATCH] Fix PR15346
On Mon, 1 Dec 2014, Joseph Myers wrote: On Mon, 1 Dec 2014, Richard Biener wrote: +/* Combine two successive divisions. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) This doesn't seem correct for all kinds of division and signedness of arguments. TRUNC_DIV_EXPR (C division) and EXACT_DIV_EXPR should be OK regardless. But for CEIL_DIV_EXPR and FLOOR_DIV_EXPR, in ((a / b) / c) you need c to be positive so that both roundings are in the same direction (consider e.g. 3 FLOOR_DIV -2 FLOOR_DIV -2 == 1, but 3 FLOOR_DIV 4 == 0). And for ROUND_DIV_EXPR, it can be incorrect whatever the signs (e.g. 2 ROUND_DIV 3 ROUND_DIV 2 == 1, but 2 ROUND_DIV 6 == 0). I'm not sure if these forms of division actually occur in places where this could cause a problem, but it does look like Ada may enable you to generate ROUND_DIV_EXPR. Hmm. I thought I was following what extract_muldiv_1 does (but of course that function is quite hard to follow). case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR: case FLOOR_DIV_EXPR: case ROUND_DIV_EXPR: case EXACT_DIV_EXPR: ... /* If these are the same operation types, we can associate them assuming no overflow. */ if (tcode == code) { bool overflow_p = false; bool overflow_mul_p; signop sign = TYPE_SIGN (ctype); wide_int mul = wi::mul (op1, c, sign, overflow_mul_p); overflow_p = TREE_OVERFLOW (c) | TREE_OVERFLOW (op1); if (overflow_mul_p ((sign == UNSIGNED tcode != MULT_EXPR) || sign == SIGNED)) overflow_p = true; if (!overflow_p) return fold_build2 (tcode, ctype, fold_convert (ctype, op0), wide_int_to_tree (ctype, mul)); but yeah, I see you are correct. I'll restrict the pattern to TRUNC_DIV_EXPR and EXACT_DIV_EXPR which are the only ones I've ever seen generated from C. I don't know how to generate testcases for the other division opcodes - eventually we should have builtins just for the sake of being able to generate testcases... Committed as obvoious. Thanks, Richard. 2014-12-02 Richard Biener rguent...@suse.de * match.pd: Restrict division combining to trunc_div and exact_div. Index: gcc/match.pd === --- gcc/match.pd(revision 218258) +++ gcc/match.pd(working copy) @@ -129,8 +129,9 @@ (define_operator_list inverted_tcc_compa TYPE_UNSIGNED (type)) (trunc_div @0 @1))) -/* Combine two successive divisions. */ -(for div (trunc_div ceil_div floor_div round_div exact_div) +/* Combine two successive divisions. Note that combining ceil_div + and floor_div is trickier and combining round_div even more so. */ +(for div (trunc_div exact_div) (simplify (div (div @0 INTEGER_CST@1) INTEGER_CST@2) (with {
Re: [PATCH] Fix PR15346
On Mon, 1 Dec 2014, Marc Glisse wrote: On Mon, 1 Dec 2014, Richard Biener wrote: The following fixes the oldest bug in my list of assigned PRs, namely combining A / CST / CST' to A / (CST * CST'). extract_muldiv does this in fold-const.c (but it fails to optimize to zero when CST * CST' overflows). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2014-12-01 Richard Biener rguent...@suse.de PR tree-optimization/15346 * Makefile.in (gimple-match.o-warn): Remove -Wno-unused-parameter, add -Wno-unused-but-set-variable. * match.pd: Combine two successive divisions. * gcc.dg/tree-ssa/forwprop-32.c: New testcase. Index: gcc/match.pd === --- gcc/match.pd(revision 218140) +++ gcc/match.pd(working copy) @@ -129,6 +129,19 @@ (define_operator_list inverted_tcc_compa TYPE_UNSIGNED (type)) (trunc_div @0 @1))) +/* Combine two successive divisions. */ +(for div (trunc_div ceil_div floor_div round_div exact_div) + (simplify + (div (div @0 INTEGER_CST@1) INTEGER_CST@2) + (with { +bool overflow_p; +wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), overflow_p); + } + (if (!overflow_p) +(div @0 { wide_int_to_tree (type, mul); })) + (if (overflow_p) +{ build_zero_cst (type); } + Can't you have something like: INT_MIN / 2 / (INT_MIN / -2) == -1 where 2 * (INT_MIN / -2) overflows? Hmm, right. It should work if overflow_p mul != INT_MIN as far as I can see. I am testing the following. Thanks, Richard. 2014-12-02 Richard Biener rguent...@suse.de * match.pd: When combining divisions exclude the degenerate case involving INT_MIN from overflow handling. * gcc.dg/torture/20141202-1.c: New testcase. Index: gcc/match.pd === --- gcc/match.pd(revision 218260) +++ gcc/match.pd(working copy) @@ -140,7 +140,9 @@ (define_operator_list inverted_tcc_compa } (if (!overflow_p) (div @0 { wide_int_to_tree (type, mul); })) - (if (overflow_p) + (if (overflow_p + (TYPE_UNSIGNED (type) + || mul != wi::min_value (TYPE_PRECISION (type), SIGNED))) { build_zero_cst (type); } /* Optimize A / A to 1.0 if we don't care about Index: gcc/testsuite/gcc.dg/torture/20141202-1.c === --- gcc/testsuite/gcc.dg/torture/20141202-1.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/20141202-1.c (working copy) @@ -0,0 +1,15 @@ +/* { dg-do run } */ + +extern void abort (void); + +int foo (int x) +{ + return (x / 2) / ((-__INT_MAX__ - 1) / -2); +} + +int main() +{ + if (foo (- __INT_MAX__ - 1) != -1) +abort (); + return 0; +}
Re: [PATCH] Adjust gimple_build_assign* gimple.texi documentation
On Tue, 2 Dec 2014, Jakub Jelinek wrote: Hi! Here is an attempt to adjust gimple_build_assign* documentation. Ok for trunk? Ok. Thanks, Richard. Note, apparently the documentation has not been adjusted for the gimple - gassign * etc. changes, David, can you please work on adjusting gimple.texi to match the reality after your changes? 2014-12-02 Jakub Jelinek ja...@redhat.com * doc/gimple.texi (gimple_build_assign_with_ops): Remove. (gimple_build_assign): Document the new overloads. --- gcc/doc/gimple.texi.jj2014-11-20 17:06:24.0 +0100 +++ gcc/doc/gimple.texi 2014-12-02 08:27:06.898386806 +0100 @@ -1124,9 +1124,35 @@ already have a tree expression that you tuple. However, try to avoid building expression trees for the sole purpose of calling this function. If you already have the operands in separate trees, it is better to use -@code{gimple_build_assign_with_ops}. +@code{gimple_build_assign} with @code{enum tree_code} argument and separate +arguments for each operand. @end deftypefn +@deftypefn {GIMPLE function} gimple gimple_build_assign @ +(tree lhs, enum tree_code subcode, tree op1, tree op2, tree op3) +This function is similar to two operand @code{gimple_build_assign}, +but is used to build a @code{GIMPLE_ASSIGN} statement when the operands of the +right-hand side of the assignment are already split into +different operands. + +The left-hand side is an lvalue passed in lhs. Subcode is the +@code{tree_code} for the right-hand side of the assignment. Op1, op2 and op3 +are the operands. +@end deftypefn + +@deftypefn {GIMPLE function} gimple gimple_build_assign @ +(tree lhs, enum tree_code subcode, tree op1, tree op2) +Like the above 5 operand @code{gimple_build_assign}, but with the last +argument @code{NULL} - this overload should not be used for +@code{GIMPLE_TERNARY_RHS} assignments. +@end deftypefn + +@deftypefn {GIMPLE function} gimple gimple_build_assign @ +(tree lhs, enum tree_code subcode, tree op1) +Like the above 4 operand @code{gimple_build_assign}, but with the last +argument @code{NULL} - this overload should be used only for +@code{GIMPLE_UNARY_RHS} and @code{GIMPLE_SINGLE_RHS} assignments. +@end deftypefn @deftypefn {GIMPLE function} gimple gimplify_assign (tree dst, tree src, gimple_seq *seq_p) Build a new @code{GIMPLE_ASSIGN} tuple and append it to the end of @@ -1139,19 +1165,6 @@ case they will be converted to a gimple This function returns the newly created @code{GIMPLE_ASSIGN} tuple. -@deftypefn {GIMPLE function} gimple gimple_build_assign_with_ops @ -(enum tree_code subcode, tree lhs, tree op1, tree op2) -This function is similar to @code{gimple_build_assign}, but is used to -build a @code{GIMPLE_ASSIGN} statement when the operands of the -right-hand side of the assignment are already split into -different operands. - -The left-hand side is an lvalue passed in lhs. Subcode is the -@code{tree_code} for the right-hand side of the assignment. Op1 and op2 -are the operands. If op2 is null, subcode must be a @code{tree_code} -for a unary expression. -@end deftypefn - @deftypefn {GIMPLE function} {enum tree_code} gimple_assign_rhs_code (gimple g) Return the code of the expression computed on the @code{RHS} of assignment statement @code{G}. Jakub -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
[match-and-simplify] Merge from trunk
2014-12-02 Richard Biener rguent...@suse.de Merge from trunk r217561 through r218261.
[PATCH, alpha]: Fix PR 64113, Error: No lda !gpdisp!282 was found
Hello! As mentioned in the PR, we need accurate gp live analysis to split __tls_get_addr call. Move split from post-reload splitter to peephole2 pass, as is the case with other call insns. 2014-12-02 Uros Bizjak ubiz...@gmail.com PR target/64113 * config/alpha/alpha.md (call_value_osf_tlsgd): Do not split insn using post-reload splitter. Use peephole2 pass instead. (call_value_osf_tlsldm): Ditto. (TLS_CALL): New int iterator. (tls): New int attribute. (call_value_osf_tls): Merge insn pattern from call_value_osf_tlsgd and call_value_tlsldm using TLS_CALL int iterator. Bootstrapped and regression tested on alphaev68-linux-gnu, also tested by building systemd-217 that failed to compile without the patch. Committed to mainline SVN, will be committed to 4.9 branch. Uros. Index: config/alpha/alpha.md === --- config/alpha/alpha.md (revision 218160) +++ config/alpha/alpha.md (working copy) @@ -5984,16 +5984,38 @@ [(set_attr type jsr) (set_attr length *,*,8)]) -(define_insn_and_split call_value_osf_tlsgd +(define_int_iterator TLS_CALL + [UNSPEC_TLSGD_CALL +UNSPEC_TLSLDM_CALL]) + +(define_int_attr tls + [(UNSPEC_TLSGD_CALL tlsgd) +(UNSPEC_TLSLDM_CALL tlsldm)]) + +(define_insn call_value_osf_tls [(set (match_operand 0) (call (mem:DI (match_operand:DI 1 symbolic_operand)) (const_int 0))) - (unspec [(match_operand:DI 2 const_int_operand)] UNSPEC_TLSGD_CALL) + (unspec [(match_operand:DI 2 const_int_operand)] TLS_CALL) (use (reg:DI 29)) (clobber (reg:DI 26))] HAVE_AS_TLS - # - reload_completed + ldq $27,%1($29)\t\t!literal!%2\;jsr $26,($27),%1\t\t!lituse_tls!%2\;ldah $29,0($26)\t\t!gpdisp!%*\;lda $29,0($29)\t\t!gpdisp!%* + [(set_attr type jsr) + (set_attr length 16)]) + +;; We must use peep2 instead of a split because we need accurate life +;; information for $gp. +(define_peephole2 + [(parallel +[(set (match_operand 0) + (call (mem:DI (match_operand:DI 1 symbolic_operand)) + (const_int 0))) + (unspec [(match_operand:DI 2 const_int_operand)] TLS_CALL) + (use (reg:DI 29)) + (clobber (reg:DI 26))])] + HAVE_AS_TLS reload_completed +peep2_regno_dead_p (1, 29) [(set (match_dup 3) (unspec:DI [(match_dup 5) (match_dup 1) @@ -6001,10 +6023,9 @@ (parallel [(set (match_dup 0) (call (mem:DI (match_dup 3)) (const_int 0))) - (set (match_dup 5) - (unspec:DI [(match_dup 5) (match_dup 4)] UNSPEC_LDGP1)) + (use (match_dup 5)) (use (match_dup 1)) - (use (unspec [(match_dup 2)] UNSPEC_TLSGD_CALL)) + (use (unspec [(match_dup 2)] TLS_CALL)) (clobber (reg:DI 26))]) (set (match_dup 5) (unspec:DI [(match_dup 5) (match_dup 4)] UNSPEC_LDGP2))] @@ -6012,19 +6033,18 @@ operands[3] = gen_rtx_REG (Pmode, 27); operands[4] = GEN_INT (alpha_next_sequence_number++); operands[5] = pic_offset_table_rtx; -} - [(set_attr type multi)]) +}) -(define_insn_and_split call_value_osf_tlsldm - [(set (match_operand 0) - (call (mem:DI (match_operand:DI 1 symbolic_operand)) - (const_int 0))) - (unspec [(match_operand:DI 2 const_int_operand)] UNSPEC_TLSLDM_CALL) - (use (reg:DI 29)) - (clobber (reg:DI 26))] - HAVE_AS_TLS - # - reload_completed +(define_peephole2 + [(parallel +[(set (match_operand 0) + (call (mem:DI (match_operand:DI 1 symbolic_operand)) + (const_int 0))) + (unspec [(match_operand:DI 2 const_int_operand)] TLS_CALL) + (use (reg:DI 29)) + (clobber (reg:DI 26))])] + HAVE_AS_TLS reload_completed +!peep2_regno_dead_p (1, 29) [(set (match_dup 3) (unspec:DI [(match_dup 5) (match_dup 1) @@ -6035,7 +6055,7 @@ (set (match_dup 5) (unspec:DI [(match_dup 5) (match_dup 4)] UNSPEC_LDGP1)) (use (match_dup 1)) - (use (unspec [(match_dup 2)] UNSPEC_TLSLDM_CALL)) + (use (unspec [(match_dup 2)] TLS_CALL)) (clobber (reg:DI 26))]) (set (match_dup 5) (unspec:DI [(match_dup 5) (match_dup 4)] UNSPEC_LDGP2))] @@ -6043,8 +6063,7 @@ operands[3] = gen_rtx_REG (Pmode, 27); operands[4] = GEN_INT (alpha_next_sequence_number++); operands[5] = pic_offset_table_rtx; -} - [(set_attr type multi)]) +}) (define_insn *call_value_osf_1 [(set (match_operand 0)
Re: [Patch] Improving jump-thread pass for PR 54742
On Mon, Dec 1, 2014 at 10:06 PM, Jeff Law l...@redhat.com wrote: On 11/25/14 14:16, Sebastian Pop wrote: Sebastian Pop wrote: I will bootstrap and regression test this patch on x86_64-linux and powerpc64-linux. I will also run it on our internal benchmarks, coremark, and the llvm test-suite. I will also include a longer testcase that makes sure we do not regress on coremark. Done all the above. Attached is the new patch with a new testcase. I have also added verify_seme inspired by the recent patch adding verify_sese. Sebastian 0001-extend-jump-thread-for-finite-state-automata-PR-5474.patch From ca222d5222fb976c7aa258d3e3c04e593f42f7a2 Mon Sep 17 00:00:00 2001 From: Sebastian Pops@samsung.com Date: Fri, 26 Sep 2014 14:54:20 -0500 Subject: [PATCH] extend jump thread for finite state automata PR 54742 Adapted from a patch from James Greenhalgh. * params.def (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): New. * doc/invoke.texi (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): Documented. * tree-cfg.c (split_edge_bb_loc): Export. * tree-cfg.h (split_edge_bb_loc): Declared extern. * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the original value of cond when simplification fails. (fsm_find_thread_path): New. (fsm_find_control_statement_thread_paths): New. (fsm_thread_through_normal_block): Call find_control_statement_thread_paths. * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print EDGE_START_FSM_THREAD. (verify_seme): New. (duplicate_seme_region): New. (thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD edges calling gimple_duplicate_sese_region. * tree-ssa-threadupdate.h (jump_thread_edge_type): Add EDGE_START_FSM_THREAD. * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New. * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New. --- gcc/doc/invoke.texi | 12 ++ gcc/params.def | 15 ++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 43 + gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 127 + gcc/tree-cfg.c |2 +- gcc/tree-cfg.h |1 + gcc/tree-ssa-threadedge.c| 215 +- gcc/tree-ssa-threadupdate.c | 201 +++- gcc/tree-ssa-threadupdate.h |1 + 9 files changed, 614 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 89edddb..074183f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -10624,6 +10624,18 @@ large and significantly increase compile time at optimization level @option{-O1} and higher. This parameter is a maximum nubmer of statements in a single generated constructor. Default value is 5000. +@item max-fsm-thread-path-insns +Maximum number of instructions to copy when duplicating blocks on a +finite state automaton jump thread path. The default is 100. + +@item max-fsm-thread-length +Maximum number of basic blocks on a finite state automaton jump thread +path. The default is 10. + +@item max-fsm-thread-paths +Maximum number of new jump thread paths to create for a finite state +automaton. The default is 50. Has there been any tuning on these defaults. I don't have any strong opinions about what they ought to be, this is more to get any such information recorded on the lists for historical purposes. I think it's worth a note in the debug dump anytime you abort threading when you hit a limit. I'm a bit worried about compile-time impacts of the all the recursion, but I'm willing to wait and see if it turns out to be a problem in practice. Please consider restricting it to -fexpensive-optimizations (-O2+). Richard. diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 8b0b7b8..c9fe212 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see #include params.h #include tree-ssa-threadedge.h #include builtins.h +#include cfg.h +#include cfganal.h /* To avoid code explosion due to jump threading, we limit the number of statements we are going to copy. This variable @@ -661,6 +663,7 @@ simplify_control_stmt_condition (edge e, rather than use a relational operator. These are simpler to handle. */ if (TREE_CODE (cond) == SSA_NAME) { + tree original_lhs = cond; cached_lhs = cond; /* Get the
Re: [testsuite] Fix multiple definitions of _init
On Mon, Dec 1, 2014 at 11:48 PM, Oleg Endo oleg.e...@t-online.de wrote: On Mon, 2014-12-01 at 12:09 +0100, Richard Biener wrote: On Mon, Dec 1, 2014 at 8:21 AM, Oleg Endo oleg.e...@t-online.de wrote: Hi, When running the testsuite on a sh-elf configuration, some test cases fail due to multiple definitions of the function '_init'. This is because on sh-elf every function is automatically prefixed with a '_' char. When defining a C function 'init' in some code, the actual name becomes '_init' and this conflicts with the _init function in libgcc (crti.S). While the behavior of adding the '_' prefix is debatable, the testsuite failures can be easily fixed by the attached patch. Is this OK for trunk? The testcases are certainly valid C testcases and thus sh-elf is buggy here. As '_init' is a reserved identifier it shouldn't mangle 'init' as '_init'. Do you never run into real applications naming a function 'init'? Looking for USER_LABEL_PREFIX shows that the '_' prefix mangling scheme is being used by quite some targets/configs. So it's not just sh-elf, but other bare metal configs and their ABIs, where 'init' gets mangled to '_init' and there's a '.global _init' in libgcc. Yes, it does break valid standard C code 'void init (void)', but it's been doing that for quite a while. Oh, and I wonder why the global init isn't then __init (two underscores), as obviously on those targets symbols with _ prefix are not in the implementation namespace. I didn't mean to open a can of worms because of 4 test cases showing up in sh-elf sim tests. From what I can see, those 4 cases would also pop up for other configs, not only sh-elf. Anyway, the function names in those test cases don't matter at all, do they? No, but it looks like papering over a problem. After all we could have a valid testcase checking for exactly this situation - that a C function named 'init' works fine. Joseph may have more experience with how targets should setup USER_LABEL_PREFIX to avoid this situation. Richard. Cheers, Oleg
[PATCH, REPOST] Fix PR fortran/60718
Hello Tobias, a long time ago, I posted this patch, but it got forgotten. However the described problem is still unsolved, so I thought my patch should be re-posted now. Boot-strapped and regression-tested on arm-linux-gnueabihf. OK for trunk? Thanks Bernd. On Tue, 15 Apr 2014 13:49:37, Bernd Edlinger wrote: Hi Tobias, On Fri, 11 Apr 2014 16:04:51, Tobias Burnus wrote: Hi Tobias, On Fri, Apr 11, 2014 at 02:39:57PM +0200, Bernd Edlinger wrote: On Fri, 11 Apr 2014 13:37:46, Tobias Burnus wrote: Hmm, I was hoping somehow that only that test case is broken, and needs to be fixed. The target attribute is somehow simple, it implies intent(in) and the actual value will in most cases be a pointer, as in the example. I think that passing another nonpointer TARGET to a dummy argument which has a TARGET attribute is at least as common as passing a POINTER to a TARGET. TARGET is roughtly the opposite to the restrict qualifier. By default any nonpointer variable does not alias with something else, unless it has the TARGET attribute; if it has, it (its address) can then be assigned to a pointer. POINTER intrinsically alias and cannot have the TARGET attribute. Pointer - Nonalloc Allocatable - Noalloc Nonallocatable*/Allocatable* - Pointer with intent(in) Well, this approach does not handle intent(inout) at all. Now I have created a test case for the different aliasing issues with may arise with scalar objects. As you pointed out, also conversions of allocatable - nonalloc, allocatable - pointer and nonalloc - pointer turn out to violate the strict aliasing rules. However, conversions of arrays of objects with different attributes seem to be safe. I have not been able to find an example where it would be necessary to write the modified class object back to the original location. But I am not really a Fortran expert. Unfortunately there are also conversions of optional allocatable - optional pointer, which complicate the whole thing quite a lot. I have found these in class_optional_2.f90. Boot-strapped and regression-tested on x86_64-linux-gnu. OK for trunk? Thanks Bernd. 2014-04-15 Bernd Edlinger bernd.edlin...@hotmail.de PR fortran/60718 * trans-expr.c (gfc_conv_procedure_call): Fix a strict aliasing violation when passing a class object to a formal parameter which has different pointer or allocatable attributes. testsuite: 2014-04-14 Bernd Edlinger bernd.edlin...@hotmail.de PR fortran/60718 * gfortran.dg/class_alias.f90: New. patch-pr60718.diff Description: Binary data
Re: PR 13631 Problems in messages
On 01/12/14 22:42 +0100, François Dumont wrote: Hi Here is another proposal that consider all your remarks except one. I finally prefer to go with std::vector of pointers. Dynamically allocating Catalog_info allow to avoid numerous copies of locale when we find this pointer from the catalog info. + result_type +_M_get(catalog_id __c) const +{ + __gnu_cxx::__scoped_lock lock(_M_mutex); + + const catalog_info* __entry = +lower_bound(_M_map, _M_map + _M_nb_entry, __c, _Comp()); + if (__entry != _M_map + _M_nb_entry __entry-_M_id == __c) +return result_type(__entry-_M_domain, __entry-_M_locale); + return result_type(0, locale()); I assume the second return was just for testing, it should be removed. I don't get this one. I still need to return something when I can't find the catalog so the 2 return statements. I thought I remembered seeing two returns at the same level of indentation, but obviously they are not! Maybe I was looking it it in gmail's attachment viewer which eats leading whitespace. Ignore that comment, sorry. Index: config/locale/gnu/messages_members.h === --- config/locale/gnu/messages_members.h(revision 218027) +++ config/locale/gnu/messages_members.h(working copy) @@ -83,22 +83,6 @@ _S_destroy_c_locale(_M_c_locale_messages); } - templatetypename _CharT -typename messages_CharT::catalog - messages_CharT::do_open(const basic_stringchar __s, - const locale) const -{ - // No error checking is done, assume the catalog exists and can - // be used. - textdomain(__s.c_str()); - return 0; -} - - templatetypename _CharT -void-messages_CharT::do_close(catalog) const -{ } - // messages_byname templatetypename _CharT messages_byname_CharT::messages_byname(const char* __s, size_t __refs) Unless I'm misreading this patch, you've removed the definitions of messages_CharT::do_open() and messages_CharT::do_close() for the primary template. They would stil be needed if users instantiate e.g. messageschar16_t or messagessigned char. Yes but do you confirm that it is already the same problem with do_get ? Yes, I'm just wondering why make the situation worse than it already is. Does leaving the default definitions cause a problem? If some user has got code that uses messagessigned char and they provide a definition for messagessigned char::do_get() their code will break if do_open and do_close disappear. (Realistically I doubt anyone is doing that though, it may not even work.) In my opinion we could provide template implementations of all those methods relying on codecvt_CharT, char, mbstate_t, even for do_get. But in this case some implementation details will be exposed in the header files and additional symbols will have to be exported I think. In fact I have already started doing something like that but then start facing issue accessing nl_langinfo_l. Shall I go further and provide a patch doing this ? IMHO no. +catalog +_M_add(const string __domain, locale __l) +{ + __gnu_cxx::__scoped_lock lock(_M_mutex); + + // The counter is not likely to roll unless catalogs keep on being + // opened/closed which is consider as an application mistake for the + // moment. + if (_M_catalog_counter == numeric_limitscatalog::max()) + return -1; + + _M_infos.push_back(new Catalog_info(_M_catalog_counter++, __domain, __l)); + return _M_infos.back()-_M_id; This is not exception safe. You can use auto_ptr to fix it: std::auto_ptrCatalog_info info(new Catalog_info(_M_catalog_counter++, __domain, __l)); _M_infos.push_back(info.get()); return info.release()-_M_id;
Re: [testsuite] Fix multiple definitions of _init
On Tue, 2014-12-02 at 11:22 +0100, Richard Biener wrote: On Mon, Dec 1, 2014 at 11:48 PM, Oleg Endo oleg.e...@t-online.de wrote: On Mon, 2014-12-01 at 12:09 +0100, Richard Biener wrote: On Mon, Dec 1, 2014 at 8:21 AM, Oleg Endo oleg.e...@t-online.de wrote: Hi, When running the testsuite on a sh-elf configuration, some test cases fail due to multiple definitions of the function '_init'. This is because on sh-elf every function is automatically prefixed with a '_' char. When defining a C function 'init' in some code, the actual name becomes '_init' and this conflicts with the _init function in libgcc (crti.S). While the behavior of adding the '_' prefix is debatable, the testsuite failures can be easily fixed by the attached patch. Is this OK for trunk? The testcases are certainly valid C testcases and thus sh-elf is buggy here. As '_init' is a reserved identifier it shouldn't mangle 'init' as '_init'. Do you never run into real applications naming a function 'init'? Looking for USER_LABEL_PREFIX shows that the '_' prefix mangling scheme is being used by quite some targets/configs. So it's not just sh-elf, but other bare metal configs and their ABIs, where 'init' gets mangled to '_init' and there's a '.global _init' in libgcc. Yes, it does break valid standard C code 'void init (void)', but it's been doing that for quite a while. Oh, and I wonder why the global init isn't then __init (two underscores), as obviously on those targets symbols with _ prefix are not in the implementation namespace. My guess: Because it's written in asm, and there it's not known whether a _ prefix should be used or not. It's an ABI/convention. BTW it's not only the init function, but potentially several others. E.g. newlib/libc/sys/sh/crt0.S refers to other symbols such as _stack, _edata, _end, _main, _exit. Thus, having something global named stack in a C program will potentially clash. So while we maybe could use USER_LABEL_PREFIX in libgcc to add the prefix, there would still be issues with things defined in e.g. newlib or the linker script. For bare metal configs / programs having some additional reserved/prohibited names is not a big deal, as there are often other restrictions which are even worse. I guess the issue isn't that bad. Out of all the gcc test cases, I hit only 4. In the context of gcc sim testing, one option could be to use a special toolchain config, which does not do the _ prefixing. However, there's no such configure option and the only way to change it is to change the USER_LABEL_PREFIX definition, or add a subtarget which doesn't prefix _. Another option could be adding something to the global configure stuff so that the option -fno-leading-underscore can be turned on by default for the resulting compiler. Cheers, Oleg
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
On 05 Nov 11:00, Uros Bizjak wrote: On Wed, Nov 5, 2014 at 10:35 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, Having stage1 close to end, may we make some decision regarding this patch? Having a couple of working variants, may we choose and use one of them? I propose to wait for Vlad for an update about his plans on register preference algorythm that would fix this (and other Ya*r-type issues). In the absence of the fix, we'll go with Yr,*x. Uros. Hi, Here is an updated patch which uses Yr,*x option to avoid long prefixes for Silvermont. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * config/i386/constraints.md (Yr): New. * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives which use only NO_REX_SSE_REGS. (vec_setmode_0): Likewise. (*vec_setv4sf_sse4_1): Likewise. (sse4_1_insertps): Likewise. (*sse4_1_extractps): Likewise. (*sse4_1_mulv2siv2di3mask_name): Likewise. (*sse4_1_avx2_mulmode3mask_name): Likewise. (*sse4_1_codemode3mask_name): Likewise. (*sse4_1_codemode3): Likewise. (*sse4_1_eqv2di3): Likewise. (sse4_2_gtv2di3): Likewise. (*vec_extractv4si): Likewise. (*vec_concatv2si_sse4_1): Likewise. (vec_concatv2di): Likewise. (sse4_1_blendssemodesuffixavxsizesuffix): Likewise. (sse4_1_blendvssemodesuffixavxsizesuffix): Likewise. (sse4_1_dpssemodesuffixavxsizesuffix): Likewise. (vi8_sse4_1_avx2_avx512_movntdqa): Likewise. (sse4_1_avx2_mpsadbw): Likewise. (sse4_1_avx2packusdwmask_name): Likewise. (sse4_1_avx2_pblendvb): Likewise. (sse4_1_pblendw): Likewise. (sse4_1_phminposuw): Likewise. (sse4_1_codev8qiv8hi2mask_name): Likewise. (sse4_1_codev4qiv4si2mask_name): Likewise. (sse4_1_codev4hiv4si2mask_name): Likewise. (sse4_1_codev2qiv2di2mask_name): Likewise. (sse4_1_codev2hiv2di2mask_name): Likewise. (sse4_1_codev2siv2di2mask_name): Likewise. (sse4_1_ptest): Likewise. (sse4_1_roundssemodesuffixavxsizesuffix): Likewise. (sse4_1_roundssescalarmodesuffix): Likewise. * config/i386/subst.md (mask_prefix4): New. * config/i386/x86-tune.def (X86_TUNE_AVOID_4BYTE_PREFIXES): New. gcc/testsuites/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * gcc.target/i386/sse2-init-v2di-2.c: Adjust to changed vec_concatv2di template. diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 4e07d70..72925cd 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -108,6 +108,8 @@ ;; d Integer register when integer DFmode moves are enabled ;; x Integer register when integer XFmode moves are enabled ;; f x87 register when 80387 floating point arithmetic is enabled +;; r SSE regs not requiring REX prefix when prefixes avoidance is enabled +;; and all SSE regs otherwise (define_register_constraint Yz TARGET_SSE ? SSE_FIRST_REG : NO_REGS First SSE register (@code{%xmm0}).) @@ -150,6 +152,10 @@ (ix86_fpmath FPMATH_387) ? FLOAT_REGS : NO_REGS @internal Any x87 register when 80387 FP arithmetic is enabled.) +(define_register_constraint Yr + TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS + @internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.) + ;; We use the B prefix to denote any number of internal operands: ;; s Sibcall memory operand, not valid for TARGET_X32 ;; w Call memory operand, not valid for TARGET_X32 diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 3f5f979..6d9df65 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1311,6 +1311,7 @@ enum reg_class FP_TOP_REG, FP_SECOND_REG, /* %st(0) %st(1) */ FLOAT_REGS, SSE_FIRST_REG, + NO_REX_SSE_REGS, SSE_REGS, EVEX_SSE_REGS, BND_REGS, @@ -1369,6 +1370,7 @@ enum reg_class FP_TOP_REG, FP_SECOND_REG, \ FLOAT_REGS, \ SSE_FIRST_REG,\ + NO_REX_SSE_REGS, \ SSE_REGS, \ EVEX_SSE_REGS,\ BND_REGS, \ @@ -1409,6 +1411,7 @@ enum reg_class { 0x0200, 0x0,0x0 }, /* FP_SECOND_REG */ \ { 0xff00, 0x0,0x0 }, /* FLOAT_REGS */\ { 0x20, 0x0,0x0 }, /* SSE_FIRST_REG */ \ +{ 0x1fe0, 0x00,0x0 }, /* NO_REX_SSE_REGS */ \ { 0x1fe0, 0x1fe000,0x0 }, /* SSE_REGS */ \ { 0x0,0xffe0, 0x1f },
Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target
On Tue, Dec 2, 2014 at 12:08 PM, Ilya Enkovich enkovich@gmail.com wrote: Having stage1 close to end, may we make some decision regarding this patch? Having a couple of working variants, may we choose and use one of them? I propose to wait for Vlad for an update about his plans on register preference algorythm that would fix this (and other Ya*r-type issues). In the absence of the fix, we'll go with Yr,*x. Uros. Hi, Here is an updated patch which uses Yr,*x option to avoid long prefixes for Silvermont. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * config/i386/constraints.md (Yr): New. * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives which use only NO_REX_SSE_REGS. (vec_setmode_0): Likewise. (*vec_setv4sf_sse4_1): Likewise. (sse4_1_insertps): Likewise. (*sse4_1_extractps): Likewise. (*sse4_1_mulv2siv2di3mask_name): Likewise. (*sse4_1_avx2_mulmode3mask_name): Likewise. (*sse4_1_codemode3mask_name): Likewise. (*sse4_1_codemode3): Likewise. (*sse4_1_eqv2di3): Likewise. (sse4_2_gtv2di3): Likewise. (*vec_extractv4si): Likewise. (*vec_concatv2si_sse4_1): Likewise. (vec_concatv2di): Likewise. (sse4_1_blendssemodesuffixavxsizesuffix): Likewise. (sse4_1_blendvssemodesuffixavxsizesuffix): Likewise. (sse4_1_dpssemodesuffixavxsizesuffix): Likewise. (vi8_sse4_1_avx2_avx512_movntdqa): Likewise. (sse4_1_avx2_mpsadbw): Likewise. (sse4_1_avx2packusdwmask_name): Likewise. (sse4_1_avx2_pblendvb): Likewise. (sse4_1_pblendw): Likewise. (sse4_1_phminposuw): Likewise. (sse4_1_codev8qiv8hi2mask_name): Likewise. (sse4_1_codev4qiv4si2mask_name): Likewise. (sse4_1_codev4hiv4si2mask_name): Likewise. (sse4_1_codev2qiv2di2mask_name): Likewise. (sse4_1_codev2hiv2di2mask_name): Likewise. (sse4_1_codev2siv2di2mask_name): Likewise. (sse4_1_ptest): Likewise. (sse4_1_roundssemodesuffixavxsizesuffix): Likewise. (sse4_1_roundssescalarmodesuffix): Likewise. * config/i386/subst.md (mask_prefix4): New. * config/i386/x86-tune.def (X86_TUNE_AVOID_4BYTE_PREFIXES): New. gcc/testsuites/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * gcc.target/i386/sse2-init-v2di-2.c: Adjust to changed vec_concatv2di template. OK for mainline with a change below. @@ -6544,13 +6550,14 @@ }) (define_insn_and_split *sse4_1_extractps - [(set (match_operand:SF 0 nonimmediate_operand =rm,x,x) + [(set (match_operand:SF 0 nonimmediate_operand =*rm,rm,x,x) (vec_select:SF Please do not change preferences of non-SSE registers. Please check the patch that similar changes didn't creep in. Thanks, Uros.
RE: [PATCH] Improve spillcost of literal pool loads
Jeff Law wrote: OK with the appropropriate ChangeLog entires. THe original for ira-costs.c was fine, so you just need the trivial one for the testcase. ChangeLog below - Jiong, could you commit for me please? 2014-12-02 Wilco Dijkstra wdijk...@arm.com * gcc/ira-costs.c (scan_one_insn): Improve spill cost adjustment. * gcc/testsuite/gcc.target/aarch64/remat1.c: New testcase. --- gcc/ira-costs.c | 5 ++--- gcc/testsuite/gcc.target/aarch64/remat1.c | 20 2 files changed, 22 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/remat1.c diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c index 122815b..95d266e 100644 --- a/gcc/ira-costs.c +++ b/gcc/ira-costs.c @@ -1462,12 +1462,11 @@ scan_one_insn (rtx_insn *insn) ((MEM_P (XEXP (note, 0)) !side_effects_p (SET_SRC (set))) || (CONSTANT_P (XEXP (note, 0)) - targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)), - XEXP (note, 0)) + (! flag_pic || LEGITIMATE_PIC_OPERAND_P (XEXP (note, 0))) REG_N_SETS (REGNO (SET_DEST (set))) == 1)) general_operand (SET_SRC (set), GET_MODE (SET_SRC (set { - enum reg_class cl = GENERAL_REGS; + enum reg_class cl = ALL_REGS; rtx reg = SET_DEST (set); int num = COST_INDEX (REGNO (reg)); diff --git a/gcc/testsuite/gcc.target/aarch64/remat1.c b/gcc/testsuite/gcc.target/aarch64/remat1.c new file mode 100644 index 000..999577e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/remat1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fomit-frame-pointer -fcaller-saves -ffixed-d8 -ffixed-d9 -ffixed-d10 -ffixed-d11 -ffixed-d12 -ffixed-d13 -ffixed-d14 -ffixed-d15 } */ + +/* Under high register pressure FP immediates should be rematerialized + as literal loads rather than being caller-saved to the stack. */ + +void +g (void); + +float +f (float x) +{ + x += 3.1f; + g (); + x *= 3.1f; + return x; +} + +/* { dg-final { scan-assembler-times ldr\ts\[0-9]+, .LC0 2 } } */ + -- 1.9.1
Re: [PATCH, ARM] Fix PR63718, Thumb1 bootstrap -- disable fuse-caller-save for Thumb1
On 20/11/14 11:54, Tom de Vries wrote: Richard, This patch fixes PR63718, which currently breaks Thumb1 bootstrap. The problem is that in Thumb1 mode, we emit the epilogue in RTL, but the last insn - epilogue_insns - does not accurately model the corresponding insns emitted in the asm file. F.i., the asm file may contain an insn: ... pop {r0} while the corresponding RTL pattern looks like this: ... (jump_insn (unspec_volatile [ (return) ] VUNSPEC_EPILOGUE)) ... As a consequence, the epilogue may clobber registers without fuse-caller-save being able to analyze that. Adding the missing clobbers to epilogue_insns is not trivial, and probably not a good idea for stage3. The patch works around the problem by disabling fuse-caller-save in Thumb1 mode. Build and reg-tested on arm-none-eabi. OK for stage3? This is OK, can you make sure that a bug is created to reflect the work required to make this work in the backend. Sorry about the delay in review. regards Ramana Thanks, - Tom
Re: [PATCH PR59593] [arm] Backport r217772 r217826 to 4.8 4.9
On 29/11/14 06:50, Chen Shanyao wrote: I've backported this fix to 4.8 4.9 branch. These patches have been tested for armeb-none-eabi-gcc/g++ with qemu, and both the test results were ok. The Changelog should mention all authors of the original patches i.e. include my name. Otherwise this is OK unless an RM objects in the next 24 hours. regards Ramana
Re: [rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1
On 21/11/2014 14:08, Alan Hayward alan.hayw...@arm.com wrote: On 14/11/2014 16:48, Alan Hayward alan.hayw...@arm.com wrote: This is a new version of my BE patch from a few weeks ago. This is part 1 and covers rtlanal.c. The second part will be aarch64 specific. When combined with the second patch, It fixes up movoi/ci/xi for Big Endian, so that we end up with the lab of a big-endian integer to be in the low byte of the highest-numbered register. This will apply cleanly by itself and no regressions were seen when testing aarch64 and x86_64 on make check. Changelog: 2014-11-14 Alan Hayward alan.hayw...@arm.com * rtlanal.c (subreg_get_info): Exit early for simple and common cases Alan. Hi, The second part to this patch (aarch64 specific) has been approved. Could someone review this one please. Thanks, Alan. Ping. Thanks, Alan. 0001-BE-fix-load-stores.-Common-code.patch Description: Binary data
Re: [PATCH 1/2, PR 63814] Strengthen cgraph_edge_brings_value_p
On Fri, Nov 21, 2014 at 10:59 AM, Martin Jambor mjam...@suse.cz wrote: Hi, PR 63814 is caused by cgraph_edge_brings_value_p misidentifying an edge to an expanded artificial thunk as an edge to the original node, which then leads to crazy double-cloning and doubling the thunks along the call. This patch fixes the bug by strengthening the predicate so that it knows where the value is supposed to go and can check that it goes there and not anywhere else. It also adds an extra availability check that was probably missing in it. Bootstrapped and tested on x86_64-linux, and i686-linux. OK for trunk? Thanks, Martin 2014-11-20 Martin Jambor mjam...@suse.cz PR ipa/63814 * ipa-cp.c (same_node_or_its_all_contexts_clone_p): New function. (cgraph_edge_brings_value_p): New parameter dest, use same_node_or_its_all_contexts_clone_p and check availability. (cgraph_edge_brings_value_p): Likewise. (get_info_about_necessary_edges): New parameter dest, pass it to cgraph_edge_brings_value_p. Update caller. (gather_edges_for_value): Likewise. (perhaps_add_new_callers): Use cgraph_edge_brings_value_p to check both the destination and availability. I checked in this testcase: diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index a17cc6c..1410f10 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-12-02 H.J. Lu hongjiu...@intel.com + + PR ipa/63814 + * g++.dg/ipa/pr63814.C: New test. + 2014-12-02 Wilco Dijkstra wilco.dijks...@arm.com * gcc.target/aarch64/remat1.c: New testcase. diff --git a/gcc/testsuite/g++.dg/ipa/pr63814.C b/gcc/testsuite/g++.dg/ipa/pr63814.C new file mode 100644 index 000..15a7dda --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/pr63814.C @@ -0,0 +1,29 @@ +// { dg-do run { target fpic } } +// { dg-options -O3 -fpic } + +struct CBase { + virtual void BaseFunc () {} +}; + +struct MMixin { + virtual void * MixinFunc (int, int) = 0; +}; + +struct CExample: CBase, public MMixin +{ + void *MixinFunc (int arg, int arg2) + { +return this; + } +}; + +void *test (MMixin anExample) +{ + return anExample.MixinFunc (0, 0); +} + +int main () +{ + CExample c; + return (test (c) != c); +} H.J.
[PATCH] rs6000: Fix PR target/64115
Hello, as discussed in the PR, this patch removes an invalid ENABLE_CHECKING sanity check in rs6000_delegitimize_address, fixing the ICE. Tested on powerpc64-linux. OK for mainline / 4.9 / 4.8? Bye, Ulrich ChangeLog: PR target/64115 * config/rs6000/rs6000.c (rs6000_delegitimize_address): Remove invalid UNSPEC_TOCREL sanity check under ENABLE_CHECKING. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 218210) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -7033,24 +7033,6 @@ if (GET_CODE (y) == UNSPEC XINT (y, 1) == UNSPEC_TOCREL) { -#ifdef ENABLE_CHECKING - if (REG_P (XVECEXP (y, 0, 1)) - REGNO (XVECEXP (y, 0, 1)) == TOC_REGISTER) - { - /* All good. */ - } - else if (GET_CODE (XVECEXP (y, 0, 1)) == DEBUG_EXPR) - { - /* Weirdness alert. df_note_compute can replace r2 with a -debug_expr when this unspec is in a debug_insn. -Seen in gcc.dg/pr51957-1.c */ - } - else - { - debug_rtx (orig_x); - abort (); - } -#endif y = XVECEXP (y, 0, 0); #ifdef HAVE_AS_TLS -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH fortran/linemap] Add enough column hint to fit any possible offset
On 2 December 2014 at 07:50, Tobias Burnus bur...@net-b.de wrote: Hi Manuel, Manuel López-Ibáñez wrote: This patch actually does not touch linemap but I will appreciate Dodji's comments about the approach. The problem is that in case of long lines, the column hint of 120 might be too small, thus we do not have enough locations within one line to point to a higher column (like 132 in the testcase). Giving as column hint the length of the line seems the right fix. [...] My fix here is to create a dummy location for line_len -1 (the last column in the line). This forces the next map to start after this last column's location. My feeling is that that doesn't work for -ffree-line-length-none/-ffixed-line-length-none. In that case, gfc_option.free_line_length == 0 and there is no limit to the number of characters per line. Thus, I fear that b-location - = linemap_line_start (line_table, current_file-line++, 120); + = linemap_line_start (line_table, current_file-line++, line_len); might even set the line length to 0 - but I haven't checked. For all other values, it should work. My understanding of this piece of code is that Fortran reads the whole line into a buffer, and that line_len is the length of the line just read. Am I wrong? If so, is there any way to compute the length of the current line? A small optimization would be to use all that information about truncation limits to give a smaller column hint when possible, but someone more knowledgeable in Fortran can do that as a follow-up. I think setting it to the actual length line will fix the ICEs and only waste a bit of memory if the lines are quite long. Cheers, Manuel.
Re: [PATCH] DWARF add DW_AT_noreturn on noreturn function subprogram.
On Mon, 2014-12-01 at 14:31 -0700, Jeff Law wrote: On 11/25/14 15:57, Mark Wielaard wrote: This implements the DWARFv5 noreturn proposal: http://dwarfstd.org/ShowIssue.php?issue=140331.1 TREE_THIS_VOLATILE on a FUNCTION_DECL node means the function does not return normally. This catches the traditional noreturn GNU attribute, the C11 _Noreturn keyword and the C++11 [[noreturn]] attribute. This relies on the DW_AT_noreturn constant defined in the DWARFv5 DRAFT: http://www.dwarfstd.org/doc/dwarf5.20141029.pdf We could also use a new GNU extension if we don't trust these constants to be stable. Presumably we don't have any sense when the values will be assigned, right? Any chance we could speed that along a bit? Sadly I have no idea. Officially the DWARF committee hasn't accepted new proposals since March 2014 for DWARFv5. But some of my proposals have been accepted anyway as late as last month. There is not really a way to know when which proposal is submitted or considered for incorporation by people outside the committee. There is the above draft that contains the currently assigned constants. But I don't know when the committee will ask for comments on the draft. Or how long the comment period will be. I have just asked on the DWARF mailinglist. I don't mind using GNU vendor constants instead. But people on the committee have convinced me it should be fine to use the constants from the current draft. Given that Jakub, Jason and Cary are all on the DWARF committee I assume they would yell and scream at me if they believed using these constants were not stable. Most usage is gated behind the experimental GCC5 -gdwarf-5 switch anyway (although not this attribute, which is also emitted in non-strict mode). BTW. I do make sure to update bintuils, gdb, elfutils and valgrind to match the GCC5 output, so the constants should at least be consistent across the GNU toolchain and friends. All GNU and DWARFv5 extensions are documented: https://fedorahosted.org/elfutils/wiki/DwarfExtensions Cheers, Mark
Re: PATCH: ICE: SIGSEGV in decide_alg() with -mmemset-strategy=libcall:-1:align -minline-all-stringops
On Sat, Nov 29, 2014 at 3:00 PM, H.J. Lu hjl.to...@gmail.com wrote: When searching for an usable algorithm with -minline-all-stringops, decide_alg stops when it sees libcall even if there is a usable algorithm. It goes into an infinite loop. This patch changes decide_alg to stop searching only if there aren't any usable algorithms. Testd on Linux/x86-64. OK for trunk. H.J. gcc/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * config/i386/i386.c (decide_alg): Stop only if there aren't any usable algorithms. gcc/testsuite/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * gcc.target/i386/memset-strategy-2.c: New test. OK with a change below. Please wait a couple of days before backporting the fix - the fix looks kind of obvious, but this part of the code is somehow complex and has biten us in the past. Thanks, Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2493130..d789635 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24464,7 +24464,9 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size, *noalign = alg_noalign; return alg; } - break; + /* Stop only if there aren't any usable algorithms. */ + if (!any_alg_usable_p) + break; Please use else if ... and remove unneded argument. IMO, the code speaks for itself. } else if (alg_usable_p (candidate, memset)) { diff --git a/gcc/testsuite/gcc.target/i386/memset-strategy-2.c b/gcc/testsuite/gcc.target/i386/memset-strategy-2.c new file mode 100644 index 000..aafa54d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/memset-strategy-2.c @@ -0,0 +1,10 @@ +/* PR target/64108 */ +/* { dg-do compile } */ +/* { dg-options -O2 -march=atom -mmemset-strategy=libcall:-1:align -minline-all-stringops } */ + +char a[2048]; +void t (void) +{ + __builtin_memset (a, 1, 2048); +} +
[patch] Update libstdc++ status docs
Update the docs. Committed to trunk. commit 388d7943d85f505381f6e818d2de07978d022498 Author: Jonathan Wakely jwak...@redhat.com Date: Tue Dec 2 13:13:33 2014 + * doc/xml/manual/status_cxx2011.xml: Update. * doc/xml/manual/status_cxx2014.xml: Update. * doc/html/manual/status.html: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml index 455c704..c4eec2c 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml @@ -2434,11 +2434,10 @@ particular release. entry/ /row row - ?dbhtml bgcolor=#B0B0B0 ? entry30.5/entry entryCondition variables/entry - entryPartial/entry - entryMissing notify_all_at_thread_exit/entry + entryY/entry + entry/ /row row entry30.5.1/entry @@ -2483,11 +2482,10 @@ particular release. entry/ /row row - ?dbhtml bgcolor=#B0B0B0 ? entry30.6.5/entry entryClass template codepromise/code/entry - entryPartial/entry - entryMissing set_*_at_thread_exit/entry + entryY/entry + entry/ /row row entry30.6.6/entry @@ -2508,11 +2506,10 @@ particular release. entry/ /row row - ?dbhtml bgcolor=#B0B0B0 ? entry30.6.9/entry entryClass template codepackaged_task/code/entry - entryPartial/entry - entryMissing make_ready_at_thread_exit/entry + entryY/entry + entry/ /row row entry diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml index f7d18a2..f716739 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml @@ -283,18 +283,6 @@ not in any particular release. /row row - ?dbhtml bgcolor=#C8B0B0 ? - entry - link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3783.pdf; - N3783 - /link - /entry - entryNetwork byte order conversion/entry - entryN/entry - entryLibrary Fundamentals TS/entry -/row - -row entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3793.html; N3793 @@ -397,14 +385,13 @@ not in any particular release. /row row - ?dbhtml bgcolor=#C8C8B0 ? entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3932.htm; N3932 /link /entry entryVariable Templates For Type Traits/entry - entryPartial/entry + entryY/entry entryLibrary Fundamentals TS/entry /row
Re: [PATCH 2/3] Extended if-conversion
On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi Richard, I resend you patch1 and patch2 with minor changes: 1. I renamed flag_force_vectorize to aggressive_if_conv. 2. Use static cast for the first argument of gimple_phi_arg_edge. I also very sorry that I sent you bad patch. Now let me answer on your questions related to second patch. 1. Why we need both predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi? Let's consider the following simple test-case: #pragma omp simd safelen(8) for (i=0; i512; i++) { float t = a[i]; if (t 0.0f t 1.0e+17f) if (c[i] != 0) /* c is integer array. */ res += 1; } we can see the following phi node correspondent to res: # res_1 = PHI res_15(3), res_15(4), res_10(5) It is clear that we can optimize it to phi node with 2 arguments only and only one check can be used for phi predication (for reduction in our case), namely predicate of bb_5. In general case we can't do it even if we sort all phi argument values since we still have to produce a chain of cond expressions to perform phi predication (see comments for predicate_arbitrary_scalar_phi). How so? We can always use !(condition) for the last value, thus treat it as an 'else' case. That even works for # res_1 = PHI res_15(3), res_15(4), res_10(5), res_10(7) where the condition for edges 5 and 7 can be computed as ! (condition for 3 || condition for 4). Of course it is worthwhile to also sort single-occurances first so your case gets just the condiiton for edge 5 and its inversion used for edges 3 and 4 combined. 2. Why we need to introduce find_insertion_point? Let's consider another test-case extracted from 175.vpr ( t5.c is attached) and we can see that bb_7 and bb_9 containig phi nodes has only critical incoming edges and both contain code computing edge predicates, e.g. bb 7: # xmax_edge_30 = PHI xmax_edge_36(6), xmax_edge_18(5) _46 = xmax_17 == xmax_37; _47 = xmax_17 == xmax_27; _48 = _46 _47; _53 = xmax_17 == xmax_37; _54 = ~_53; _55 = xmax_17 == xmax_27; _56 = _54 _55; _57 = _48 | _56; xmax_edge_19 = xmax_edge_39 + 1; goto bb 11; It is evident that we can not put phi predication at the block beginning but need to put it after predicate computations. Note also that if there are no critical edges for phi arguments insertion point will be after labels Note also that phi result can have use in this block too, so we can't put predication code to the block end. So the issue is that predicate insertion for edge predicates does not happen on the edge but somewhere else (generally impossible for critical edges unless you split them). I think I've told you before that I prefer simple solutions to such issues, like splitting the edge! Certainly not involving a function walking GENERIC expressions. Thanks, Richard. Let me know if you still have any questions. Best regards. Yuri. 2014-11-28 15:43 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is the second patch related to extended predication. Few comments which explain a main goal of design. 1. I don't want to insert any critical edge splitting since it may lead to less efficient binaries. 2. One special case of extended PHI node predication was introduced when #arguments is more than 2 but only two arguments are different and one argument has the only occurrence. For such PHI conditional scalar reduction is applied. This is correspondent to the following statement: if (q1 q2 q3) var++ New function phi_has_two_different_args was introduced to detect such phi. 3. Original algorithm for PHI predication used assumption that at least one incoming edge for blocks containing PHI is not critical - it guarantees that all computations related to predicate of normal edge are already inserted above this block and code related to PHI predication can be inserted at the beginning of block. But this is not true for critical edges for which predicate computations are in the block where code for phi predication must be inserted. So new function find_insertion_point is introduced which is simply found out the last statement in block defining predicates correspondent to all incoming edges and insert phi predication code after it (with some minor exceptions). Unfortunately the patch doesn't apply for me - I get patch: malformed patch at line 505: @@ -1720,6 +2075,8 @@ predicate_all_scalar_phis (struct loop *loop) a few remarks nevertheless. I don't see how we need both predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. Couldn't we simply sort an array of (edge, value) pairs after value and handle equal values specially in predicate_extended_scalar_phi? That would even make PHI a, a, b, c, c more optimal. I don't understand the need for find_insertion_point. All SSA names required for the
Re: [PATCH linemap] Make some asserts fail gracefully
Manuel López-Ibáñez lopeziba...@gmail.com writes: +/* Assert that becomes a conditional expression when not checking. For the sake of clarity towards newcomers, I'd say: Assert that becomes a conditional expression when checking is disabled at compilation time. + Use this for conditions that should not happen but if they happen, it + is better to handle them gracefully than ICE. Usage: Similarly, I'd say: Use this for conditions that should not happen but if they happen, it is better to handle them gracefully rather than crash randomly later. Usage: OK with those changes and thank for your time and dedication. Cheers, -- Dodji
[PATCH, CHKP] Don't generate bndret for not instrumented calls
Hi, Currently bndret is generated each time we need to get bounds for returned pointer. It causes bndret generated for not instrumented calls incuding builtin function calls. Troubles appear when such builtin call is optimized out - bndret needs to be handled appropriately. Since we don't want not instrumented builtin calls optimizers to be affected by instrumentation, we better avoid bndret for not instrumented calls. This patch uses zero bounds when we don't expect call to return bounds. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_call_returns_bounds_p): New. (chkp_build_returned_bound): Use zero bounds as returned by calls not returning bounds. gcc/testsuite/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * gcc.target/i386/chkp-bndret.c: New. * gcc.target/i386/chkp-strchr.c: New. diff --git a/gcc/testsuite/gcc.target/i386/chkp-bndret.c b/gcc/testsuite/gcc.target/i386/chkp-bndret.c new file mode 100644 index 000..3498058 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/chkp-bndret.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options -fcheck-pointer-bounds -mmpx -O2 -fdump-tree-chkp } */ +/* { dg-final { scan-tree-dump-not bndret chkp } } */ +/* { dg-final { cleanup-tree-dump chkp } } */ + +#include string.h + +extern int *test1 (int *p) __attribute__((bnd_legacy)); + +int * +test2 (int *p) +{ + return test1 (p); +} diff --git a/gcc/testsuite/gcc.target/i386/chkp-strchr.c b/gcc/testsuite/gcc.target/i386/chkp-strchr.c new file mode 100644 index 000..94a5eaa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/chkp-strchr.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target mpx } */ +/* { dg-options -fcheck-pointer-bounds -mmpx -O2 } */ + +#include string.h + +static char * +test1 (char *str) +{ + return strrchr (str, '_'); +} + +char * +test2 () +{ + return test1 (test_string); +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 3e38691..f7def51 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -2080,6 +2080,38 @@ chkp_get_nonpointer_load_bounds (void) return chkp_get_zero_bounds (); } +/* Return 1 if may use bndret call to get bounds for pointer + returned by CALL. */ +static bool +chkp_call_returns_bounds_p (gcall *call) +{ + if (gimple_call_internal_p (call)) +return false; + + tree fndecl = gimple_call_fndecl (call); + + if (fndecl DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD) +return false; + + if (fndecl + lookup_attribute (bnd_legacy, DECL_ATTRIBUTES (fndecl))) +return false; + + if (fndecl DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL) +{ + if (chkp_instrument_normal_builtin (fndecl)) + return true; + + if (!lookup_attribute (always_inline, DECL_ATTRIBUTES (fndecl))) + return false; + + struct cgraph_node *clone = chkp_maybe_create_clone (fndecl); + return (clone gimple_has_body_p (clone-decl)); +} + + return true; +} + /* Build bounds returned by CALL. */ static tree chkp_build_returned_bound (gcall *call) @@ -2154,7 +2186,7 @@ chkp_build_returned_bound (gcall *call) bounds = chkp_find_bounds (gimple_call_arg (call, argno), iter); } - else + else if (chkp_call_returns_bounds_p (call)) { gcc_assert (TREE_CODE (gimple_call_lhs (call)) == SSA_NAME); @@ -2172,6 +2204,8 @@ chkp_build_returned_bound (gcall *call) update_stmt (stmt); } + else +bounds = chkp_get_zero_bounds (); if (dump_file (dump_flags TDF_DETAILS)) {
[PATCH, CHKP] Don't try to optimize bounds returned by strchr
Hi, For strchr calls bounds of the first argument are considered as returned which is wrong because NULL may be returned. This patch fixes that. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_build_returned_bound): Don't predict return bounds for strchr calls. diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index f7def51..1d91aa6 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -2163,8 +2163,7 @@ chkp_build_returned_bound (gcall *call) } /* Do not use retbnd when returned bounds are equal to some of passed bounds. */ - else if ((gimple_call_return_flags (call) ERF_RETURNS_ARG) - || gimple_call_builtin_p (call, BUILT_IN_STRCHR)) + else if (gimple_call_return_flags (call) ERF_RETURNS_ARG) { gimple_stmt_iterator iter = gsi_for_stmt (call); unsigned int retarg = 0, argno;
Re: [PATCH] rs6000: Fix PR target/64115
On Tue, Dec 2, 2014 at 7:56 AM, Ulrich Weigand uweig...@de.ibm.com wrote: Hello, as discussed in the PR, this patch removes an invalid ENABLE_CHECKING sanity check in rs6000_delegitimize_address, fixing the ICE. Tested on powerpc64-linux. OK for mainline / 4.9 / 4.8? Bye, Ulrich ChangeLog: PR target/64115 * config/rs6000/rs6000.c (rs6000_delegitimize_address): Remove invalid UNSPEC_TOCREL sanity check under ENABLE_CHECKING. Okay. Thanks, David
Re: [PATCH] Improve spillcost of literal pool loads
On Tue, Dec 2, 2014 at 3:34 AM, Wilco Dijkstra wdijk...@arm.com wrote: Jeff Law wrote: OK with the appropropriate ChangeLog entires. THe original for ira-costs.c was fine, so you just need the trivial one for the testcase. ChangeLog below - Jiong, could you commit for me please? 2014-12-02 Wilco Dijkstra wdijk...@arm.com * gcc/ira-costs.c (scan_one_insn): Improve spill cost adjustment. * gcc/testsuite/gcc.target/aarch64/remat1.c: New testcase. How did you test on Linux/ia32? It introduced many regressions: FAIL: gcc.dg/hoist-register-pressure-3.c scan-rtl-dump hoist PRE/HOIST: end of bb .* copying expression FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O1 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O1 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-loops -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-loops -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -g -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -g -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -Os -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -Os -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-loops -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-loops -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -g -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -g -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -Os -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -Os -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O1 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O1 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-loops -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-loops -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer
[PATCH, CHKP] Keep instrumentation references consistency when merging cgraph nodes
Hi, Currently cgraph nodes merge may break instrumented_version references. It depends on order nodes are read and merged and therefore problem is not nicely reproducible. I couldn't make a small reproducer. This patch fixes problem and 253.perlbmk benchmark build with '-O3 -flto -fcheck-pointer-bounds -mmpx'. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * lto-cgraph.c (input_cgraph_1): Don't break existing instrumentation clone references. * lto/lto-symtab.c (lto_cgraph_replace_node): Redirect instrumented_version references appropriately. diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index d424e14..b310b44 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -1562,7 +1562,18 @@ input_cgraph_1 (struct lto_file_decl_data *file_data, cnode-instrumented_version = cgraph_node::get (cnode-orig_decl); if (cnode-instrumented_version) - cnode-instrumented_version-instrumented_version = cnode; + { + /* We may have multiple nodes for a single function which +will be merged later. To have a proper merge we need +to keep instrumentation_version reference between nodes +consistent: each instrumented_version reference should +have proper reverse reference. Thus don't break existing +instrumented_version reference if it already exists. */ + if (cnode-instrumented_version-instrumented_version) + cnode-instrumented_version = NULL; + else + cnode-instrumented_version-instrumented_version = cnode; + } /* Restore decl names reference. */ if (IDENTIFIER_TRANSPARENT_ALIAS (DECL_ASSEMBLER_NAME (cnode-decl)) diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c index 4c4e48a..f5d82a7 100644 --- a/gcc/lto/lto-symtab.c +++ b/gcc/lto/lto-symtab.c @@ -99,6 +99,20 @@ lto_cgraph_replace_node (struct cgraph_node *node, /* Redirect incomming references. */ prevailing_node-clone_referring (node); + /* Fix instrumentation references. */ + if (node-instrumented_version) +{ + gcc_assert (node-instrumentation_clone + == prevailing_node-instrumentation_clone); + node-instrumented_version-instrumented_version = prevailing_node; + if (!prevailing_node-instrumented_version) + prevailing_node-instrumented_version = node-instrumented_version; + /* Need to reset node-instrumented_version to NULL, +otherwise node removal code would reset +node-instrumented_version-instrumented_version. */ + node-instrumented_version = NULL; +} + ipa_merge_profiles (prevailing_node, node); lto_free_function_in_decl_state_for_node (node);
Re: [PATCH] Improve spillcost of literal pool loads
On Tue, Dec 2, 2014 at 5:51 AM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Dec 2, 2014 at 3:34 AM, Wilco Dijkstra wdijk...@arm.com wrote: Jeff Law wrote: OK with the appropropriate ChangeLog entires. THe original for ira-costs.c was fine, so you just need the trivial one for the testcase. ChangeLog below - Jiong, could you commit for me please? 2014-12-02 Wilco Dijkstra wdijk...@arm.com * gcc/ira-costs.c (scan_one_insn): Improve spill cost adjustment. * gcc/testsuite/gcc.target/aarch64/remat1.c: New testcase. How did you test on Linux/ia32? It introduced many regressions: FAIL: gcc.dg/hoist-register-pressure-3.c scan-rtl-dump hoist PRE/HOIST: end of bb .* copying expression FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O1 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O1 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O2 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-loops -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -funroll-loops -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -fomit-frame-pointer -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -g -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -O3 -g -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -Os -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-1.c -Os -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O2 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-loops -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -funroll-loops -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -fomit-frame-pointer -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -g -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O3 -g -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -Os -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -Os -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O1 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O1 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O2 -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -mforce-drap -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c -O3 -fomit-frame-pointer -funroll-loops -fpic execution test FAIL: gcc.dg/torture/stackalign/setjmp-4.c
Re: [arm][patch] fix arm_neon_ok check on !arm_arch7
On 23/09/14 09:27, James Greenhalgh wrote: On Mon, Sep 15, 2014 at 11:56:03AM +0100, Andrew Stubbs wrote: On 15/09/14 10:46, Richard Earnshaw wrote: Hmm, I wonder if arm_override_options should reject neon + (arch 7). Is this more to your taste? Is this really such a good idea? It causes carnage throughout the testsuite if you have configured with support for Neon and the testcase is written with dg-options for a pre-armv7-a -march value. For example in: testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c Which forces -march=armv5. Perhaps you just have to fix the effective-target-ok tests - but then we lose swathes of test coverage. This also causes subtle Linux kernel compile failures. Over there they use make rules where they check if the compiler supports -march=armv5te and if not use -march=armv4t. With this patch if the compiler is configured with something like --with-fpu=neon the test will fail with your error message, even though the compiler supports -march=armv5te. Kyrill Thanks, James Andrew P.S. arm_override_options was renamed in 2010. 2014-09-15 Andrew Stubbs a...@codesourcery.com * gcc/config/arm/arm.c (arm_option_override): Reject -mfpu=neon when architecture is older than ARMv7. Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 215228) +++ gcc/config/arm/arm.c(working copy) @@ -2845,6 +2845,9 @@ arm_fpu_desc = all_fpus[arm_fpu_index]; + if (TARGET_NEON !arm_arch7) +error (target CPU does not support NEON); + switch (arm_fpu_desc-model) { case ARM_FP_MODEL_VFP:
Re: PATCH: ICE: SIGSEGV in decide_alg() with -mmemset-strategy=libcall:-1:align -minline-all-stringops
On Tue, Dec 2, 2014 at 5:25 AM, Uros Bizjak ubiz...@gmail.com wrote: On Sat, Nov 29, 2014 at 3:00 PM, H.J. Lu hjl.to...@gmail.com wrote: When searching for an usable algorithm with -minline-all-stringops, decide_alg stops when it sees libcall even if there is a usable algorithm. It goes into an infinite loop. This patch changes decide_alg to stop searching only if there aren't any usable algorithms. Testd on Linux/x86-64. OK for trunk. H.J. gcc/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * config/i386/i386.c (decide_alg): Stop only if there aren't any usable algorithms. gcc/testsuite/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * gcc.target/i386/memset-strategy-2.c: New test. OK with a change below. Please wait a couple of days before backporting the fix - the fix looks kind of obvious, but this part of the code is somehow complex and has biten us in the past. Will do. Thanks, Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2493130..d789635 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24464,7 +24464,9 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size, *noalign = alg_noalign; return alg; } - break; + /* Stop only if there aren't any usable algorithms. */ + if (!any_alg_usable_p) + break; Please use else if ... and remove unneded argument. IMO, the code speaks for itself. Which unneeded argument to remove? All arguments of decide_alg are used. } else if (alg_usable_p (candidate, memset)) { diff --git a/gcc/testsuite/gcc.target/i386/memset-strategy-2.c b/gcc/testsuite/gcc.target/i386/memset-strategy-2.c new file mode 100644 index 000..aafa54d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/memset-strategy-2.c @@ -0,0 +1,10 @@ +/* PR target/64108 */ +/* { dg-do compile } */ +/* { dg-options -O2 -march=atom -mmemset-strategy=libcall:-1:align -minline-all-stringops } */ + +char a[2048]; +void t (void) +{ + __builtin_memset (a, 1, 2048); +} + -- H.J.
Re: PATCH: ICE: SIGSEGV in decide_alg() with -mmemset-strategy=libcall:-1:align -minline-all-stringops
On Tue, Dec 2, 2014 at 3:03 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Dec 2, 2014 at 5:25 AM, Uros Bizjak ubiz...@gmail.com wrote: On Sat, Nov 29, 2014 at 3:00 PM, H.J. Lu hjl.to...@gmail.com wrote: When searching for an usable algorithm with -minline-all-stringops, decide_alg stops when it sees libcall even if there is a usable algorithm. It goes into an infinite loop. This patch changes decide_alg to stop searching only if there aren't any usable algorithms. Testd on Linux/x86-64. OK for trunk. H.J. gcc/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * config/i386/i386.c (decide_alg): Stop only if there aren't any usable algorithms. gcc/testsuite/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * gcc.target/i386/memset-strategy-2.c: New test. OK with a change below. Please wait a couple of days before backporting the fix - the fix looks kind of obvious, but this part of the code is somehow complex and has biten us in the past. Will do. Thanks, Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2493130..d789635 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24464,7 +24464,9 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size, *noalign = alg_noalign; return alg; } - break; + /* Stop only if there aren't any usable algorithms. */ + if (!any_alg_usable_p) + break; Please use else if ... and remove unneded argument. IMO, the code speaks for itself. Which unneeded argument to remove? All arguments of decide_alg are used. Eh, slip of the tongue. Unnedded *comment*, the one your patch is adding. Uros.
Re: PATCH: ICE: SIGSEGV in decide_alg() with -mmemset-strategy=libcall:-1:align -minline-all-stringops
On Tue, Dec 2, 2014 at 6:05 AM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Dec 2, 2014 at 3:03 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Dec 2, 2014 at 5:25 AM, Uros Bizjak ubiz...@gmail.com wrote: On Sat, Nov 29, 2014 at 3:00 PM, H.J. Lu hjl.to...@gmail.com wrote: When searching for an usable algorithm with -minline-all-stringops, decide_alg stops when it sees libcall even if there is a usable algorithm. It goes into an infinite loop. This patch changes decide_alg to stop searching only if there aren't any usable algorithms. Testd on Linux/x86-64. OK for trunk. H.J. gcc/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * config/i386/i386.c (decide_alg): Stop only if there aren't any usable algorithms. gcc/testsuite/ 2014-11-29 H.J. Lu hongjiu...@intel.com PR target/64108 * gcc.target/i386/memset-strategy-2.c: New test. OK with a change below. Please wait a couple of days before backporting the fix - the fix looks kind of obvious, but this part of the code is somehow complex and has biten us in the past. Will do. Thanks, Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2493130..d789635 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24464,7 +24464,9 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size, *noalign = alg_noalign; return alg; } - break; + /* Stop only if there aren't any usable algorithms. */ + if (!any_alg_usable_p) + break; Please use else if ... and remove unneded argument. IMO, the code speaks for itself. Which unneeded argument to remove? All arguments of decide_alg are used. Eh, slip of the tongue. Unnedded *comment*, the one your patch is adding. Uros. This is what I checked in. I will wait for a few days before backporting it to 4.9. Thanks. -- H.J. --- diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3eacc5a..c746d0f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2014-12-02 H.J. Lu hongjiu...@intel.com + + PR target/64108 + * config/i386/i386.c (decide_alg): Stop only if there aren't + any usable algorithms. + 2014-12-02 Tom de Vries t...@codesourcery.com PR rtl-optimization/63718 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 3397167..211c9e6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24464,7 +24464,8 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size, *noalign = alg_noalign; return alg; } - break; + else if (!any_alg_usable_p) +break; } else if (alg_usable_p (candidate, memset)) { diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 00da0bd..5d66849 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-12-02 H.J. Lu hongjiu...@intel.com + + PR target/64108 + * gcc.target/i386/memset-strategy-2.c: New test. + 2014-12-02 Richard Biener rguent...@suse.de * gcc.dg/torture/20141202-1.c: New testcase. diff --git a/gcc/testsuite/gcc.target/i386/memset-strategy-2.c b/gcc/testsuite/gcc.target/i386/memset-strategy-2.c new file mode 100644 index 000..aafa54d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/memset-strategy-2.c @@ -0,0 +1,10 @@ +/* PR target/64108 */ +/* { dg-do compile } */ +/* { dg-options -O2 -march=atom -mmemset-strategy=libcall:-1:align -minline-all-stringops } */ + +char a[2048]; +void t (void) +{ + __builtin_memset (a, 1, 2048); +} +
Re: [C++ PATCH] Detect UB in shifts in constexpr functions
On Fri, Nov 28, 2014 at 09:28:03AM -0500, Jason Merrill wrote: I was thinking even more detailed: one diagnostic for negative count, one diagnostic for count larger than the precision of the lhs, and then a third for overflow. Alright, done. + /* For signed x y the following: + (unsigned) x ((prec (lhs) - 1) - y) + if 1, is undefined. */ I meant briefly explaining where that formula comes from. This was a little bit problematic, because I don't really know where that formula comes from. We use it in c-ubsan.c, so I've just taken it from there. It was Jakub who came with that formula, so maybe he remembers its origin. I wrote something more to the comment describing it, even though that might not be enough. Another change I made is that I've added missing cxx11 check for the signed x case. Bootstrapped/regtested on ppc64-linux, ok for trunk? 2014-12-02 Marek Polacek pola...@redhat.com * constexpr.c (cxx_eval_check_shift_p): New function. (cxx_eval_binary_expression): Call it. * g++.dg/cpp0x/constexpr-shift1.C: New test. * g++.dg/cpp1y/constexpr-shift1.C: New test. * g++.dg/ubsan/pr63956.C: Add dg-errors. diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c index 2184a2f..556c422 100644 --- gcc/cp/constexpr.c +++ gcc/cp/constexpr.c @@ -1470,6 +1470,73 @@ verify_constant (tree t, bool allow_non_constant, bool *non_constant_p, return *non_constant_p; } +/* Check whether the shift operation with code CODE and type TYPE on LHS + and RHS is undefined. If it is, explain why and give an error. */ + +static void +cxx_eval_check_shift_p (location_t loc, enum tree_code code, tree type, + tree lhs, tree rhs) +{ + if ((code != LSHIFT_EXPR code != RSHIFT_EXPR) + || TREE_CODE (lhs) != INTEGER_CST + || TREE_CODE (rhs) != INTEGER_CST) +return; + + tree lhstype = TREE_TYPE (lhs); + unsigned HOST_WIDE_INT uprec = TYPE_PRECISION (TREE_TYPE (lhs)); + + /* [expr.shift] The behavior is undefined if the right operand + is negative, or greater than or equal to the length in bits + of the promoted left operand. */ + if (tree_int_cst_sgn (rhs) == -1) +{ + error_at (loc, right operand of shift expression %q+E is negative, + build2_loc (loc, code, type, lhs, rhs)); + return; +} + if (compare_tree_int (rhs, uprec) = 0) +{ + error_at (loc, right operand of shift expression %q+E is = than + the precision of the left operand, + build2_loc (loc, code, type, lhs, rhs)); + return; +} + + /* The value of E1 E2 is E1 left-shifted E2 bit positions; [...] + if E1 has a signed type and non-negative value, and E1x2^E2 is + representable in the corresponding unsigned type of the result type, + then that value, converted to the result type, is the resulting value; + otherwise, the behavior is undefined. */ + if (code == LSHIFT_EXPR !TYPE_UNSIGNED (lhstype) + (cxx_dialect = cxx11)) +{ + if (tree_int_cst_sgn (lhs) == -1) + { + error_at (loc, left operand of shift expression %q+E is negative, + build2_loc (loc, code, type, lhs, rhs)); + return; + } + /* For signed x y the following: +(unsigned) x ((prec (lhs) - 1) - y) +if 1, is undefined. The right-hand side of this formula +is the highest bit of the LHS that can be set (starting from 0), +so that the shift doesn't overflow. We then right-shift the LHS +to see whether any other bit is set making the original shift +undefined -- the result is not representable in the corresponding +unsigned type. */ + tree t = build_int_cst (unsigned_type_node, uprec - 1); + t = fold_build2 (MINUS_EXPR, unsigned_type_node, t, rhs); + tree ulhs = fold_convert (unsigned_type_for (lhstype), lhs); + t = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ulhs), ulhs, t); + if (tree_int_cst_lt (integer_one_node, t)) + { + error_at (loc, shift expression %q+E overflows, + build2_loc (loc, code, type, lhs, rhs)); + return; + } +} +} + /* Subroutine of cxx_eval_constant_expression. Attempt to reduce the unary expression tree T to a compile time value. If successful, return the value. Otherwise issue a diagnostic @@ -1532,6 +1599,8 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t, else r = build2_loc (loc, code, type, lhs, rhs); } + else +cxx_eval_check_shift_p (loc, code, type, lhs, rhs); VERIFY_CONSTANT (r); return r; } diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C index e69de29..1f4ee73 100644 --- gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C +++ gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C @@ -0,0 +1,73 @@ +// { dg-do compile { target c++11 } } + +constexpr int +fn1 (int i, int j) +{ +
Re: [PATCH fortran/linemap] Add enough column hint to fit any possible offset
Hello Manuel, Tobias, Manuel López-Ibáñez lopeziba...@gmail.com writes: This patch actually does not touch linemap but I will appreciate Dodji's comments about the approach. Thanks :-) The problem is that in case of long lines, the column hint of 120 might be too small, thus we do not have enough locations within one line to point to a higher column (like 132 in the testcase). Giving as column hint the length of the line seems the right fix. I see. The other issue that triggers with this testcase is that, even though the column hint allows having line 6 + column 132, since we never generate a location except for 6:0, if a new map is created for the new line, then it will consider that the last location was 6:0 and the new map for line 7 will start at location+1, hence, no offset within line 6 can be represented. Right. This is done by design to allow for a kind of 'compression' of the theoretical source location space. That is, the number of source locations known to the line map sub-system roughly becomes (roughly) the number of tokens issued by the tokenizer, rather than the total number of columns multiplied by the number of lines of the input program. And of course, we are we today willing something in-between, I guess. :-) My fix here is to create a dummy location for line_len -1 (the last column in the line). This forces the next map to start after this last column's location. I think this makes sense. I'll get to the theoretically unlimited line length case that Tobias alludes to later below. I'm not sure if this latter fix is the approach we want to take. I think this approach is fine. If so, then we may want to change linemap.c itself to force new maps to reserve enough locations for any offset in the line instead of doing last_location+1. Hmmh, we must keep in mind that this whole line map thing should keep working for cases where the primary way to interact with the source location management sub-system is basically cpp_get_token(). So if that condition is satisfied, then, why not. I guess I'd need to look at specifics of what you are proposing here before talking :-) [...] Tobias Burnus bur...@net-b.de writes: [...] My fix here is to create a dummy location for line_len -1 (the last column in the line). This forces the next map to start after this last column's location. My feeling is that that doesn't work for -ffree-line-length-none/-ffixed-line-length-none. In that case, gfc_option.free_line_length == 0 and there is no limit to the number of characters per line. Thus, I fear that b-location - = linemap_line_start (line_table, current_file-line++, 120); + = linemap_line_start (line_table, current_file-line++, line_len); might even set the line length to 0 – but I haven't checked. For all other values, it should work. I am not sure what the value of line_len is when gfc_option.free_line_length == 0. If it's not zero, then we are good. If it's zero (with the semantics of that zero meaning that the line might be very big), then we might just get the actual length of the line in terms of column count and pass that to linemap_line_start. We must just keep in mind that the line subsystem has a hard limit on column numbers (I think it's 100 000 columns at the moment). Passed that limit, we give up tracking column numbers in source locations; that means that all columns numbers are set to zero. I hope this is useful. Cheers, -- Dodji
[PATCH, CHKP] Fix instrumentation clones privatization
Hi, Currently symbol names privatization doesn't work for instrumentation clones because clones assembler name is transparent alias and therefore alias target should be privatized instead. This patch does it. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * lto/lto-partition.c (privatize_symbol_name): Correctly privatize instrumentation clones. gcc/testsuite/ 2014-12-02 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/lto/lto.exp: Load mpx-dg.exp. * gcc.dg/lto/chkp-privatize_0.c: New. * gcc.dg/lto/chkp-privatize_1.c: New. diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 65f0582..809a493 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -787,8 +787,16 @@ static bool privatize_symbol_name (symtab_node *node) { tree decl = node-decl; - const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); - cgraph_node *cnode; + cgraph_node *cnode = dyn_cast cgraph_node * (node); + const char *name; + + /* If we want to privatize instrumentation clone + then we need to change original function name + which is used via transparent alias chain. */ + if (cnode cnode-instrumentation_clone) +decl = cnode-orig_decl; + + name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); /* Our renaming machinery do not handle more than one change of assembler name. We should not need more than one anyway. */ @@ -821,15 +829,20 @@ privatize_symbol_name (symtab_node *node) (DECL_ASSEMBLER_NAME (decl))); /* We could change name which is a target of transparent alias chain of instrumented function name. Fix alias chain if so .*/ - if ((cnode = dyn_cast cgraph_node * (node)) - !cnode-instrumentation_clone - cnode-instrumented_version - cnode-instrumented_version-orig_decl == decl) + if (cnode) { - tree iname = DECL_ASSEMBLER_NAME (cnode-instrumented_version-decl); - - gcc_assert (IDENTIFIER_TRANSPARENT_ALIAS (iname)); - TREE_CHAIN (iname) = DECL_ASSEMBLER_NAME (decl); + tree iname = NULL_TREE; + if (cnode-instrumentation_clone) + iname = DECL_ASSEMBLER_NAME (cnode-decl); + else if (cnode-instrumented_version + cnode-instrumented_version-orig_decl == decl) + iname = DECL_ASSEMBLER_NAME (cnode-instrumented_version-decl); + + if (iname) + { + gcc_assert (IDENTIFIER_TRANSPARENT_ALIAS (iname)); + TREE_CHAIN (iname) = DECL_ASSEMBLER_NAME (decl); + } } if (symtab-dump_file) fprintf (symtab-dump_file, diff --git a/gcc/testsuite/gcc.dg/lto/chkp-privatize_0.c b/gcc/testsuite/gcc.dg/lto/chkp-privatize_0.c new file mode 100644 index 000..4c899e8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/chkp-privatize_0.c @@ -0,0 +1,18 @@ +/* { dg-lto-do link } */ +/* { dg-require-effective-target mpx { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-lto-options { { -fPIC -flto -flto-partition=max -fcheck-pointer-bounds -mmpx } } } */ + +static int +__attribute__ ((noinline)) +func1 (int i) +{ + return i + 2; +} + +extern int func2 (int i); + +int +main (int argc, char **argv) +{ + return func1 (argc) + func2 (argc) + 1; +} diff --git a/gcc/testsuite/gcc.dg/lto/chkp-privatize_1.c b/gcc/testsuite/gcc.dg/lto/chkp-privatize_1.c new file mode 100644 index 000..db39e7f --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/chkp-privatize_1.c @@ -0,0 +1,8 @@ +int func1 = 10; + +int +func2 (int i) +{ + func1++; + return i + func1; +} diff --git a/gcc/testsuite/gcc.dg/lto/lto.exp b/gcc/testsuite/gcc.dg/lto/lto.exp index 2586297..797c02f 100644 --- a/gcc/testsuite/gcc.dg/lto/lto.exp +++ b/gcc/testsuite/gcc.dg/lto/lto.exp @@ -30,6 +30,7 @@ if $tracelevel then { # Load procedures from common libraries. load_lib standard.exp load_lib gcc.exp +load_lib mpx-dg.exp # Load the language-independent compabibility support procedures. load_lib lto.exp
[PATCH][1/n] Fix PR14541
This is a (tiny) piece of fixing PR14541 which basically complains that most math function simplification doesn't happen on GIMPLE. This is of course because nobody re-builds the nested GENERIC call expressions that would be necessary to trigger the various simplifications implemneted in builtins.c. The following patch thus moves the simplifications from fold_builtin_logarithm to match.pd patterns, removing the corresponding code from builtins.c. Note that this results in those patterns no longer applying at -O0 (or in C/C++ constant expression simplification) as currently there is no builtin folding implemented in the GENERIC code-path of match-and-simplify. I am considering that premature optimization (even more so at -O0...). Note that the machinery only omits support for outermost calls, thus we cannot simplify log (exp (x)) but we can simplify x * pow (x, CST) to pow (x, CST+1) for example. The simplifications also won't apply during gimplification but only after SSA is built. On the branch there are also complete implementations of the fold_builtin_pow and fold_builtin_sqrt simplifications. I'd like to get a little bit coverage of builtin simplification in the wild and wonder if the following is a piece that would be acceptable for trunk at this point (well, it fixes part of a bug which is fine in stage3). Eventually also merging the builtin bits from the branch. Are there otherwise any opinions on not handling these for GENERIC? Of course in the end I was just lazy and support shouldn't be too difficult. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Thanks, Richard. 2014-12-02 Richard Biener rguent...@suse.de PR middle-end/14541 * builtins.c (fold_builtin_logarithm): Implement simplifications ... * match.pd: ... here as patterns. Index: gcc/match.pd === --- gcc/match.pd(revision 218265) +++ gcc/match.pd(working copy) @@ -916,3 +916,90 @@ (define_operator_list inverted_tcc_compa (icmp @0 @1)) (if (ic == ncmp) (ncmp @0 @1) + + +/* Simplification of math builtins. */ + +(define_operator_list LOG BUILT_IN_LOGF BUILT_IN_LOG BUILT_IN_LOGL) +(define_operator_list EXP BUILT_IN_EXPF BUILT_IN_EXP BUILT_IN_EXPL) +(define_operator_list LOG2 BUILT_IN_LOG2F BUILT_IN_LOG2 BUILT_IN_LOG2L) +(define_operator_list EXP2 BUILT_IN_EXP2F BUILT_IN_EXP2 BUILT_IN_EXP2L) +(define_operator_list LOG10 BUILT_IN_LOG10F BUILT_IN_LOG10 BUILT_IN_LOG10L) +(define_operator_list EXP10 BUILT_IN_EXP10F BUILT_IN_EXP10 BUILT_IN_EXP10L) +(define_operator_list POW BUILT_IN_POWF BUILT_IN_POW BUILT_IN_POWL) +(define_operator_list POW10 BUILT_IN_POW10F BUILT_IN_POW10 BUILT_IN_POW10L) +(define_operator_list SQRT BUILT_IN_SQRTF BUILT_IN_SQRT BUILT_IN_SQRTL) +(define_operator_list CBRT BUILT_IN_CBRTF BUILT_IN_CBRT BUILT_IN_CBRTL) + + +/* fold_builtin_logarithm */ +(if (flag_unsafe_math_optimizations) + /* Special case, optimize logN(expN(x)) = x. */ + (for logs (LOG LOG2 LOG10) + exps (EXP EXP2 EXP10) + (simplify + (logs (exps @0)) +@0)) + /* Optimize logN(func()) for various exponential functions. We +want to determine the value x and the power exponent in +order to transform logN(x**exponent) into exponent*logN(x). */ + (for logs (LOG LOG LOG LOG +LOG2 LOG2 LOG2 LOG2 + LOG10 LOG10 LOG10 LOG10) + exps (EXP EXP2 EXP10 POW10) + (simplify + (logs (exps @0)) + (with { + tree x; + switch (exps) + { + CASE_FLT_FN (BUILT_IN_EXP): + /* Prepare to do logN(exp(exponent) - exponent*logN(e). */ +x = build_real (type, real_value_truncate (TYPE_MODE (type), + dconst_e ())); + break; + CASE_FLT_FN (BUILT_IN_EXP2): + /* Prepare to do logN(exp2(exponent) - exponent*logN(2). */ + x = build_real (type, dconst2); + break; + CASE_FLT_FN (BUILT_IN_EXP10): + CASE_FLT_FN (BUILT_IN_POW10): +/* Prepare to do logN(exp10(exponent) - exponent*logN(10). */ +{ + REAL_VALUE_TYPE dconst10; + real_from_integer (dconst10, VOIDmode, 10, SIGNED); + x = build_real (type, dconst10); +} + break; + } + } +(mult (logs { x; }) @0 + (for logs (LOG LOG +LOG2 LOG2 + LOG10 LOG10) + exps (SQRT CBRT) + (simplify + (logs (exps @0)) + (with { + tree x; + switch (exps) + { + CASE_FLT_FN (BUILT_IN_SQRT): +/* Prepare to do logN(sqrt(x) - 0.5*logN(x). */ +x = build_real (type, dconsthalf); + break; + CASE_FLT_FN (BUILT_IN_CBRT): +/* Prepare to do logN(cbrt(x) - (1/3)*logN(x). */ + x = build_real (type, real_value_truncate (TYPE_MODE (type), + dconst_third ())); + break; + } + } +(mult { x; } (logs @0) + /*
Re: [C++ PATCH] Detect UB in shifts in constexpr functions
On 12/02/2014 09:14 AM, Marek Polacek wrote: On Fri, Nov 28, 2014 at 09:28:03AM -0500, Jason Merrill wrote: I was thinking even more detailed: one diagnostic for negative count, one diagnostic for count larger than the precision of the lhs, and then a third for overflow. Alright, done. Thanks. These errors also need to be conditional on (!ctx-quiet), and if we see one of these conditions we need to set *non_constant_p. This was a little bit problematic, because I don't really know where that formula comes from. We use it in c-ubsan.c, so I've just taken it from there. It was Jakub who came with that formula, so maybe he remembers its origin. I wrote something more to the comment describing it, even though that might not be enough. Looks good. Jason
Re: [PATCH 2/3] Extended if-conversion
Thanks Richard for your quick reply! 1. I agree that we can combine predicate_extended_ and predicate_arbitrary_ to one function as you proposed. 2. What is your opinion about using more simple decision about insertion point - if bb has use of phi result insert phi predication before it and at the bb end otherwise. I assume that critical edge splitting is not a good decision. Best regards. Yuri. 2014-12-02 16:28 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi Richard, I resend you patch1 and patch2 with minor changes: 1. I renamed flag_force_vectorize to aggressive_if_conv. 2. Use static cast for the first argument of gimple_phi_arg_edge. I also very sorry that I sent you bad patch. Now let me answer on your questions related to second patch. 1. Why we need both predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi? Let's consider the following simple test-case: #pragma omp simd safelen(8) for (i=0; i512; i++) { float t = a[i]; if (t 0.0f t 1.0e+17f) if (c[i] != 0) /* c is integer array. */ res += 1; } we can see the following phi node correspondent to res: # res_1 = PHI res_15(3), res_15(4), res_10(5) It is clear that we can optimize it to phi node with 2 arguments only and only one check can be used for phi predication (for reduction in our case), namely predicate of bb_5. In general case we can't do it even if we sort all phi argument values since we still have to produce a chain of cond expressions to perform phi predication (see comments for predicate_arbitrary_scalar_phi). How so? We can always use !(condition) for the last value, thus treat it as an 'else' case. That even works for # res_1 = PHI res_15(3), res_15(4), res_10(5), res_10(7) where the condition for edges 5 and 7 can be computed as ! (condition for 3 || condition for 4). Of course it is worthwhile to also sort single-occurances first so your case gets just the condiiton for edge 5 and its inversion used for edges 3 and 4 combined. 2. Why we need to introduce find_insertion_point? Let's consider another test-case extracted from 175.vpr ( t5.c is attached) and we can see that bb_7 and bb_9 containig phi nodes has only critical incoming edges and both contain code computing edge predicates, e.g. bb 7: # xmax_edge_30 = PHI xmax_edge_36(6), xmax_edge_18(5) _46 = xmax_17 == xmax_37; _47 = xmax_17 == xmax_27; _48 = _46 _47; _53 = xmax_17 == xmax_37; _54 = ~_53; _55 = xmax_17 == xmax_27; _56 = _54 _55; _57 = _48 | _56; xmax_edge_19 = xmax_edge_39 + 1; goto bb 11; It is evident that we can not put phi predication at the block beginning but need to put it after predicate computations. Note also that if there are no critical edges for phi arguments insertion point will be after labels Note also that phi result can have use in this block too, so we can't put predication code to the block end. So the issue is that predicate insertion for edge predicates does not happen on the edge but somewhere else (generally impossible for critical edges unless you split them). I think I've told you before that I prefer simple solutions to such issues, like splitting the edge! Certainly not involving a function walking GENERIC expressions. Thanks, Richard. Let me know if you still have any questions. Best regards. Yuri. 2014-11-28 15:43 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is the second patch related to extended predication. Few comments which explain a main goal of design. 1. I don't want to insert any critical edge splitting since it may lead to less efficient binaries. 2. One special case of extended PHI node predication was introduced when #arguments is more than 2 but only two arguments are different and one argument has the only occurrence. For such PHI conditional scalar reduction is applied. This is correspondent to the following statement: if (q1 q2 q3) var++ New function phi_has_two_different_args was introduced to detect such phi. 3. Original algorithm for PHI predication used assumption that at least one incoming edge for blocks containing PHI is not critical - it guarantees that all computations related to predicate of normal edge are already inserted above this block and code related to PHI predication can be inserted at the beginning of block. But this is not true for critical edges for which predicate computations are in the block where code for phi predication must be inserted. So new function find_insertion_point is introduced which is simply found out the last statement in block defining predicates correspondent to all incoming edges and insert phi predication code after it (with some minor exceptions). Unfortunately the patch doesn't apply for me - I get patch: malformed patch at line
Re: [C++ PATCH] Detect UB in shifts in constexpr functions
On Tue, Dec 02, 2014 at 09:59:26AM -0500, Jason Merrill wrote: Thanks. These errors also need to be conditional on (!ctx-quiet), and if we see one of these conditions we need to set *non_constant_p. Ah, sorry. Fixed in the following. Regtested on ppc64-linux, bootstrap in progress. 2014-12-02 Marek Polacek pola...@redhat.com * constexpr.c (cxx_eval_check_shift_p): New function. (cxx_eval_binary_expression): Call it. Set NON_CONSTANT_P if it returns true. * g++.dg/cpp0x/constexpr-shift1.C: New test. * g++.dg/cpp1y/constexpr-shift1.C: New test. * g++.dg/ubsan/pr63956.C: Add dg-errors. diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c index 2184a2f..48bc8f1 100644 --- gcc/cp/constexpr.c +++ gcc/cp/constexpr.c @@ -1470,6 +1470,79 @@ verify_constant (tree t, bool allow_non_constant, bool *non_constant_p, return *non_constant_p; } +/* Check whether the shift operation with code CODE and type TYPE on LHS + and RHS is undefined. If it is, give an error with an explanation, + and return true; return false otherwise. */ + +static bool +cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx, + enum tree_code code, tree type, tree lhs, tree rhs) +{ + if ((code != LSHIFT_EXPR code != RSHIFT_EXPR) + || TREE_CODE (lhs) != INTEGER_CST + || TREE_CODE (rhs) != INTEGER_CST) +return false; + + tree lhstype = TREE_TYPE (lhs); + unsigned HOST_WIDE_INT uprec = TYPE_PRECISION (TREE_TYPE (lhs)); + + /* [expr.shift] The behavior is undefined if the right operand + is negative, or greater than or equal to the length in bits + of the promoted left operand. */ + if (tree_int_cst_sgn (rhs) == -1) +{ + if (!ctx-quiet) + error_at (loc, right operand of shift expression %q+E is negative, + build2_loc (loc, code, type, lhs, rhs)); + return true; +} + if (compare_tree_int (rhs, uprec) = 0) +{ + if (!ctx-quiet) + error_at (loc, right operand of shift expression %q+E is = than + the precision of the left operand, + build2_loc (loc, code, type, lhs, rhs)); + return true; +} + + /* The value of E1 E2 is E1 left-shifted E2 bit positions; [...] + if E1 has a signed type and non-negative value, and E1x2^E2 is + representable in the corresponding unsigned type of the result type, + then that value, converted to the result type, is the resulting value; + otherwise, the behavior is undefined. */ + if (code == LSHIFT_EXPR !TYPE_UNSIGNED (lhstype) + (cxx_dialect = cxx11)) +{ + if (tree_int_cst_sgn (lhs) == -1) + { + if (!ctx-quiet) + error_at (loc, left operand of shift expression %q+E is negative, + build2_loc (loc, code, type, lhs, rhs)); + return true; + } + /* For signed x y the following: +(unsigned) x ((prec (lhs) - 1) - y) +if 1, is undefined. The right-hand side of this formula +is the highest bit of the LHS that can be set (starting from 0), +so that the shift doesn't overflow. We then right-shift the LHS +to see whether any other bit is set making the original shift +undefined -- the result is not representable in the corresponding +unsigned type. */ + tree t = build_int_cst (unsigned_type_node, uprec - 1); + t = fold_build2 (MINUS_EXPR, unsigned_type_node, t, rhs); + tree ulhs = fold_convert (unsigned_type_for (lhstype), lhs); + t = fold_build2 (RSHIFT_EXPR, TREE_TYPE (ulhs), ulhs, t); + if (tree_int_cst_lt (integer_one_node, t)) + { + if (!ctx-quiet) + error_at (loc, shift expression %q+E overflows, + build2_loc (loc, code, type, lhs, rhs)); + return true; + } +} + return false; +} + /* Subroutine of cxx_eval_constant_expression. Attempt to reduce the unary expression tree T to a compile time value. If successful, return the value. Otherwise issue a diagnostic @@ -1532,6 +1605,8 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t, else r = build2_loc (loc, code, type, lhs, rhs); } + else if (cxx_eval_check_shift_p (loc, ctx, code, type, lhs, rhs)) +*non_constant_p = true; VERIFY_CONSTANT (r); return r; } diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C index e69de29..1f4ee73 100644 --- gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C +++ gcc/testsuite/g++.dg/cpp0x/constexpr-shift1.C @@ -0,0 +1,73 @@ +// { dg-do compile { target c++11 } } + +constexpr int +fn1 (int i, int j) +{ + return i j; // { dg-error is negative } +} + +constexpr int i1 = fn1 (1, -1); + +constexpr int +fn2 (int i, int j) +{ + return i j; // { dg-error is = than the precision of the left operand } +} + +constexpr int i2 = fn2 (1, 200); + +constexpr int +fn3 (int i, int j) +{ + return i
[patch, 4.9] backport parallel header changes to the 4.9 branch
Backport these header changes to the 4.9 branch. No regression, but reported to the Debian BTS. Safe to backport according to Jonathan Wakely. No regressions in the testsuite on x86_64-linux-gnu. Ok for the branch? libstdc++-v3/ 2014-12-02 Matthias Klose d...@ubuntu.com PR libstdc++/64103 Backport from mainline 2014-11-03 Paolo Carlini paolo.carl...@oracle.com * include/parallel/algo.h: Do not use default arguments in function template redeclarations (definitions). * include/parallel/numeric.h: Likewise. libstdc++-v3/ 2014-12-02 Matthias Klose d...@ubuntu.com PR libstdc++/64103 Backport from mainline 2014-11-03 Paolo Carlini paolo.carl...@oracle.com * include/parallel/algo.h: Do not use default arguments in function template redeclarations (definitions). * include/parallel/numeric.h: Likewise. Index: libstdc++-v3/include/parallel/algo.h === --- libstdc++-v3/include/parallel/algo.h (revision 218277) +++ libstdc++-v3/include/parallel/algo.h (working copy) @@ -81,9 +81,8 @@ templatetypename _RAIter, typename _Function _Function __for_each_switch(_RAIter __begin, _RAIter __end, -_Function __f, random_access_iterator_tag, -__gnu_parallel::_Parallelism __parallelism_tag -= __gnu_parallel::parallel_balanced) +_Function __f, random_access_iterator_tag, +__gnu_parallel::_Parallelism __parallelism_tag) { if (_GLIBCXX_PARALLEL_CONDITION( static_cast__gnu_parallel::_SequenceIndex(__end - __begin) @@ -896,8 +895,7 @@ typename iterator_traits_RAIter::difference_type __count_switch(_RAIter __begin, _RAIter __end, const _Tp __value, random_access_iterator_tag, - __gnu_parallel::_Parallelism __parallelism_tag - = __gnu_parallel::parallel_unbalanced) + __gnu_parallel::_Parallelism __parallelism_tag) { typedef iterator_traits_RAIter _TraitsType; typedef typename _TraitsType::value_type _ValueType; @@ -966,8 +964,7 @@ typename iterator_traits_RAIter::difference_type __count_if_switch(_RAIter __begin, _RAIter __end, _Predicate __pred, random_access_iterator_tag, -__gnu_parallel::_Parallelism __parallelism_tag -= __gnu_parallel::parallel_unbalanced) +__gnu_parallel::_Parallelism __parallelism_tag) { typedef iterator_traits_RAIter _TraitsType; typedef typename _TraitsType::value_type _ValueType; @@ -1225,8 +1222,7 @@ __transform1_switch(_RAIter1 __begin, _RAIter1 __end, _RAIter2 __result, _UnaryOperation __unary_op, random_access_iterator_tag, random_access_iterator_tag, - __gnu_parallel::_Parallelism __parallelism_tag - = __gnu_parallel::parallel_balanced) + __gnu_parallel::_Parallelism __parallelism_tag) { if (_GLIBCXX_PARALLEL_CONDITION( static_cast__gnu_parallel::_SequenceIndex(__end - __begin) @@ -1315,8 +1311,7 @@ _RAIter3 __result, _BinaryOperation __binary_op, random_access_iterator_tag, random_access_iterator_tag, random_access_iterator_tag, - __gnu_parallel::_Parallelism __parallelism_tag - = __gnu_parallel::parallel_balanced) + __gnu_parallel::_Parallelism __parallelism_tag) { if (_GLIBCXX_PARALLEL_CONDITION( (__end1 - __begin1) = @@ -1422,8 +1417,7 @@ __replace_switch(_RAIter __begin, _RAIter __end, const _Tp __old_value, const _Tp __new_value, random_access_iterator_tag, - __gnu_parallel::_Parallelism __parallelism_tag - = __gnu_parallel::parallel_balanced) + __gnu_parallel::_Parallelism __parallelism_tag) { // XXX parallel version is where? replace(__begin, __end, __old_value, __new_value, @@ -1478,8 +1472,7 @@ __replace_if_switch(_RAIter __begin, _RAIter __end, _Predicate __pred, const _Tp __new_value, random_access_iterator_tag, - __gnu_parallel::_Parallelism __parallelism_tag - = __gnu_parallel::parallel_balanced) + __gnu_parallel::_Parallelism __parallelism_tag) { if (_GLIBCXX_PARALLEL_CONDITION( static_cast__gnu_parallel::_SequenceIndex(__end - __begin) @@ -1544,8 +1537,7 @@ void __generate_switch(_RAIter __begin, _RAIter __end, _Generator __gen, random_access_iterator_tag, -__gnu_parallel::_Parallelism __parallelism_tag -
Re: [patch, 4.9] backport parallel header changes to the 4.9 branch
On 02/12/14 16:24 +, Jonathan Wakely wrote: On 02/12/14 17:20 +0100, Matthias Klose wrote: Backport these header changes to the 4.9 branch. No regression, but reported to the Debian BTS. Safe to backport according to Jonathan Wakely. No regressions in the testsuite on x86_64-linux-gnu. Ok for the branch? You might as well include the additional fixes in r217069 which fix more instances of the same problem. Which you already did! Sorry. OK for the 4.9 branch, thanks for doing this.
[PATCH][wwwdocs] Add note about removal of -mwords-little-endian option to changes.html for GCC 5
Hi all, Back in July this deprecated option was removed. This patch adds a note to changes.html for that. Looked at the result in firefox, looked ok to me. Ok to commit? Thanks, Kyrill Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.44 diff -U 3 -r1.44 changes.html --- htdocs/gcc-5/changes.html 28 Nov 2014 12:20:01 - 1.44 +++ htdocs/gcc-5/changes.html 2 Dec 2014 15:36:03 - @@ -406,6 +406,9 @@ code-mcpu=cortex-a17.cortex-a7 /code and code-mtune=cortex-a17.cortex-a7/code options. /li + li The deprecated option code-mwords-little-endian/code + has been removed. + /li /ul
Re: [patch, 4.9] backport parallel header changes to the 4.9 branch
On 02/12/14 17:20 +0100, Matthias Klose wrote: Backport these header changes to the 4.9 branch. No regression, but reported to the Debian BTS. Safe to backport according to Jonathan Wakely. No regressions in the testsuite on x86_64-linux-gnu. Ok for the branch? You might as well include the additional fixes in r217069 which fix more instances of the same problem.
Re: [PATCH 08/08] PR/64003 workaround (uninit memory in i386.md)
On Wed, 2014-11-26 at 10:41 -0700, Jeff Law wrote: On 11/25/14 18:39, David Malcolm wrote: I suspect this is papering over a real problem, but I've been applying this workaround locally to keep my valgrind output clean. gcc/ChangeLog: PR/64003 * final.c (shorten_branches): Allocate insn_lengths with XCNEWVEC rather than XNEWVEC; remove subsequent per-element initialization. I'd like more information on this one. I've spent some time trying to track this down, and I've added detailed notes to the bug. In short, I believe the problem occurs with a *jcc_1 insn that jumps forwards, but the full details are in the bug. My first thought is that something must be creating a new insn after shorten_branches is complete or an existing insn that was not on the chain when we called shorten-branches, but got threaded onto the chain later. Either would be considered bad in various ways, so we'd like to fix it. I don't think either of these are the case. I believe it's due to the size of the jcc_1 insn being affected by the distance to the jump target, which for a forward jump is a bit of a chicken-and-egg issue, since that distance is affected by the size of the jcc_1 insn itself. It looks like align_fuzz exists in order to cope with this kind of circular definition, but the issue seems to occur inside align_fuzz itself. Presumably you're not doing an out-of-range access into insn_lengths? Right? Do you have a reasonable testcase for this? I've added a more minimal reproducer to the bug (8 lines); see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003#c7 I see it with latest trunk (r218254) on x86_64. See my notes in the bug. Dave
[PING] [PATCH][AARCH64]Use selected cpu's tuning when no tuning parameter is specified.
On 27/11/14 11:27, Renlin Li wrote: Hi all, We have the following code in aarch64_override_options() function. /* The selected cpu may be an architecture, so lookup tuning by core ID. */ if (!selected_tune) selected_tune = all_cores[selected_cpu-core]; However, the logic here is not right any more according to our current code. selected_cpu will never be an architecture at this point. This patch will correct the behaviour and use selected_cpu's tuning directly if selected_tune is still not decided at this point. We have the following consequence after the change. Previously we have the following tuning settings with the following command line options: -march=armv8-a -- selected_cpu = generic, aarch64_tune_params = cortexa53_tunings Nothing specified and selected_cpu == generic -- aarch64_tune_params = cortexa53_tunings After the change, we got something different: -march=armv8-a -- selected_cpu = generic, aarch64_tune_params = generic_tunings Nothing specified and selected_cpu == generic -- aarch64_tune_params = generic_tunings All other configuration(-march, -mcpu, -mtune) combinations should be unaffected. aarch64-none-elf has been built and tested on the model, no issue. Is it Okay for trunk? gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com * config/aarch64/aarch64.c (aarch64_parse_cpu): Don't define selected_tune. (aarch64_override_options): Use selected_cpu's tuning. PING~ Regards, Renlin
Re: [C++ PATCH] Detect UB in shifts in constexpr functions
OK, thanks. Jason
[PATCH, testsuite, i386]: Add some missing defines to AVX512 testcases
Hello! 2014-12-02 Uros Bizjak ubiz...@gmail.com * gcc.target/i386/avx512ifma-vpmaddhuq-2.c: Define AVX512IFMA. * gcc.target/i386/avx512ifma-vpmaddluq-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpermb-2.c: Define AVX512VBMI. * gcc.target/i386/avx512vbmi-vpermi2b-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpermt2b-2.c: Ditto. * gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c: Ditto. Tested on x86_64-linux-gnu and committed to mainline SVN. Uros. Index: gcc.target/i386/avx512ifma-vpmaddhuq-2.c === --- gcc.target/i386/avx512ifma-vpmaddhuq-2.c(revision 218276) +++ gcc.target/i386/avx512ifma-vpmaddhuq-2.c(working copy) @@ -2,6 +2,8 @@ /* { dg-options -O2 -mavx512ifma } */ /* { dg-require-effective-target avx512ifma } */ +#define AVX512IFMA + #include avx512f-helper.h #define SIZE (AVX512F_LEN / 64) Index: gcc.target/i386/avx512ifma-vpmaddluq-2.c === --- gcc.target/i386/avx512ifma-vpmaddluq-2.c(revision 218276) +++ gcc.target/i386/avx512ifma-vpmaddluq-2.c(working copy) @@ -2,6 +2,8 @@ /* { dg-options -O2 -mavx512ifma } */ /* { dg-require-effective-target avx512ifma } */ +#define AVX512IFMA + #include avx512f-helper.h #define SIZE (AVX512F_LEN / 64) Index: gcc.target/i386/avx512vbmi-vpermb-2.c === --- gcc.target/i386/avx512vbmi-vpermb-2.c (revision 218276) +++ gcc.target/i386/avx512vbmi-vpermb-2.c (working copy) @@ -2,6 +2,8 @@ /* { dg-options -O2 -mavx512vbmi } */ /* { dg-require-effective-target avx512vbmi } */ +#define AVX512VBMI + #include avx512f-helper.h #define SIZE (AVX512F_LEN / 8) Index: gcc.target/i386/avx512vbmi-vpermi2b-2.c === --- gcc.target/i386/avx512vbmi-vpermi2b-2.c (revision 218276) +++ gcc.target/i386/avx512vbmi-vpermi2b-2.c (working copy) @@ -2,6 +2,8 @@ /* { dg-options -O2 -mavx512vbmi } */ /* { dg-require-effective-target avx512vbmi } */ +#define AVX512VBMI + #include avx512f-helper.h #define SIZE (AVX512F_LEN / 8) Index: gcc.target/i386/avx512vbmi-vpermt2b-2.c === --- gcc.target/i386/avx512vbmi-vpermt2b-2.c (revision 218276) +++ gcc.target/i386/avx512vbmi-vpermt2b-2.c (working copy) @@ -2,6 +2,8 @@ /* { dg-options -O2 -mavx512vbmi } */ /* { dg-require-effective-target avx512vbmi } */ +#define AVX512VBMI + #include avx512f-helper.h #define SIZE (AVX512F_LEN / 8) Index: gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c === --- gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c (revision 218276) +++ gcc.target/i386/avx512vbmi-vpmultishiftqb-2.c (working copy) @@ -2,6 +2,8 @@ /* { dg-options -O2 -mavx512vbmi } */ /* { dg-require-effective-target avx512vbmi } */ +#define AVX512VBMI + #include avx512f-helper.h #define SIZE (AVX512F_LEN / 8)
[Patch]: Check __gthread_setspecific return
Hi, Underlying pthread_setspecific can return non-zero with ENOMEM or EINVAL. 2014-12-02 Ryan Mansfield rmansfi...@qnx.com * emutls.c (__emutls_get_address): Check __gthread_setspecific returns. OK? Regards, Ryan Mansfield Index: libgcc/emutls.c === --- libgcc/emutls.c (revision 218278) +++ libgcc/emutls.c (working copy) @@ -160,7 +160,8 @@ if (arr == NULL) abort (); arr-size = size; - __gthread_setspecific (emutls_key, (void *) arr); + if (__gthread_setspecific (emutls_key, (void *) arr) != 0) + abort (); } else if (__builtin_expect (offset arr-size, 0)) { @@ -174,7 +175,8 @@ arr-size = size; memset (arr-data + orig_size, 0, (size - orig_size) * sizeof (void *)); - __gthread_setspecific (emutls_key, (void *) arr); + if (__gthread_setspecific (emutls_key, (void *) arr) != 0) + abort (); } void *ret = arr-data[offset - 1];
[C++ Patch] PR 60978
Hi, another simple issue: this one argues that the warning is overeager when anonymous enums are involved, which often are just used as named constants. Details: maybe write the conditional in a different way; maybe even use instead of ||. Tested x86_64-linux. Thanks, Paolo. // /cp 2014-12-02 Paolo Carlini paolo.carl...@oracle.com PR c++/60978 * call.c (build_conditional_expr_1): Do not -Wenum-compare warn for anonymous enums. /testsuite 2014-12-02 Paolo Carlini paolo.carl...@oracle.com PR c++/60978 * g++.dg/warn/Wenum-compare-2.C: New. Index: cp/call.c === --- cp/call.c (revision 218278) +++ cp/call.c (working copy) @@ -5014,6 +5014,10 @@ build_conditional_expr_1 (location_t loc, tree arg DECL_CONTEXT (orig_arg2) == DECL_CONTEXT (orig_arg3)) /* Two enumerators from the same enumeration can have different types when the enumeration is still being defined. */; + else if (TYPE_ANONYMOUS_P (arg2_type) + || TYPE_ANONYMOUS_P (arg3_type)) + /* Avoid warning for anonymous enums, probably just used as + named constants. */; else if (complain tf_warning) warning_at (loc, OPT_Wenum_compare, enumeral mismatch in conditional expression: %qT vs %qT, Index: testsuite/g++.dg/warn/Wenum-compare-2.C === --- testsuite/g++.dg/warn/Wenum-compare-2.C (revision 0) +++ testsuite/g++.dg/warn/Wenum-compare-2.C (working copy) @@ -0,0 +1,10 @@ +// PR c++/60978 +// { dg-options -Wenum-compare } + +enum { A }; +enum { B }; + +int foo(int x) +{ + return x ? A : B; +}
asan: support for globals in kernel
Hi, The following patch adds support for instrumentation of globals for Linux kernel (-fsanitize=kernel-address). Kernel only supports constructors with default priority, but the rest works fine. OK for trunk? https://codereview.appspot.com/176570043 Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218280) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-12-02 Dmitry Vyukov dvyu...@google.com + + * asan.c: (asan_finish_file): Use default priority for constructors + in kernel mode. + 2014-12-02 Ulrich Weigand ulrich.weig...@de.ibm.com PR target/64115 Index: gcc/asan.c === --- gcc/asan.c (revision 218280) +++ gcc/asan.c (working copy) @@ -1348,7 +1348,9 @@ the var that is selected by the linker will have padding or not. */ || DECL_ONE_ONLY (decl) - /* Similarly for common vars. People can use -fno-common. */ + /* Similarly for common vars. People can use -fno-common. + Note: Linux kernel is built with -fno-common, so we do instrument + globals there even if it is C. */ || (DECL_COMMON (decl) TREE_PUBLIC (decl)) /* Don't protect if using user section, often vars placed into user section from multiple TUs are then assumed @@ -2440,6 +2442,7 @@ { varpool_node *vnode; unsigned HOST_WIDE_INT gcount = 0; + int priority; if (shadow_ptr_types[0] == NULL_TREE) asan_init_shadow_ptr_types (); @@ -2448,6 +2451,13 @@ nor after .LASAN* array. */ flag_sanitize = ~SANITIZE_ADDRESS; + /* For user-space we want asan constructors to run first. + Linux kernel does not support priorities other than default, and the only + other user of constructors is coverage. So we run with the default + priority. */ + priority = flag_sanitize SANITIZE_USER_ADDRESS ? + MAX_RESERVED_INIT_PRIORITY - 1 : DEFAULT_INIT_PRIORITY; + if (flag_sanitize SANITIZE_USER_ADDRESS) { tree fn = builtin_decl_implicit (BUILT_IN_ASAN_INIT); @@ -2503,12 +2513,10 @@ build_fold_addr_expr (var), gcount_tree), dtor_statements); - cgraph_build_static_cdtor ('D', dtor_statements, - MAX_RESERVED_INIT_PRIORITY - 1); + cgraph_build_static_cdtor ('D', dtor_statements, priority); } if (asan_ctor_statements) -cgraph_build_static_cdtor ('I', asan_ctor_statements, - MAX_RESERVED_INIT_PRIORITY - 1); +cgraph_build_static_cdtor ('I', asan_ctor_statements, priority); flag_sanitize |= SANITIZE_ADDRESS; }
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
Ping. On Mon, Nov 10, 2014 at 3:22 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Mon, Oct 6, 2014 at 1:43 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Mon, Sep 29, 2014 at 10:57 AM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Fri, Sep 19, 2014 at 2:11 PM, Sriraman Tallam tmsri...@google.com wrote: Hi Richard, I also ran the gcc testsuite with RUNTESTFLAGS=--tool_opts=-mcopyrelocs to check for issues. The only test that failed was g++.dg/tsan/default_options.C. It uses -fpie -pie and BFD ld to link. Since BFD ld does not support copy relocations with -pie, it does not link. I linked with gold to make the test pass. Could you please take another look at this patch? Thanks Sri On Mon, Sep 8, 2014 at 3:19 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Sep 2, 2014 at 1:40 PM, Richard Henderson r...@redhat.com wrote: On 06/20/2014 05:17 PM, Sriraman Tallam wrote: Index: config/i386/i386.c === --- config/i386/i386.c(revision 211826) +++ config/i386/i386.c(working copy) @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp) return true; } else if (!SYMBOL_REF_FAR_ADDR_P (op0) - SYMBOL_REF_LOCAL_P (op0) + (SYMBOL_REF_LOCAL_P (op0) +|| (TARGET_64BIT ix86_copyrelocs flag_pie + !SYMBOL_REF_FUNCTION_P (op0))) ix86_cmodel != CM_LARGE_PIC) return true; break; This is the wrong place to patch. You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified TARGET_BINDS_LOCAL_P. I have done this in the new attached patch, I added a new function i386_binds_local_p which will check for this and call default_binds_local_p otherwise. Note in particular that I believe that you are doing the wrong thing with weak and COMMON symbols, in that you probably ought not force a copy reloc there. I added an extra check to not do this for WEAK symbols. I also added a check for DECL_EXTERNAL so I believe this will also not be called for COMMON symbols. Note the complexity of default_binds_local_p_1, and the fact that all you really want to modify is /* If PIC, then assume that any global name can be overridden by symbols resolved from other modules. */ else if (shlib) local_p = false; near the bottom of that function. I did not understand what you mean here? Were you suggesting an alternative way of doing this? Thanks for reviewing Sri r~
Re: asan: support for globals in kernel
On Tue, Dec 02, 2014 at 09:56:36PM +0400, Dmitry Vyukov wrote: --- gcc/ChangeLog (revision 218280) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-12-02 Dmitry Vyukov dvyu...@google.com + + * asan.c: (asan_finish_file): Use default priority for constructors + in kernel mode. Seems your MUA is eating tabs, I'll assume they are where I'd expect them to be. @@ -2440,6 +2442,7 @@ { varpool_node *vnode; unsigned HOST_WIDE_INT gcount = 0; + int priority; if (shadow_ptr_types[0] == NULL_TREE) asan_init_shadow_ptr_types (); No need to declare this separately, @@ -2448,6 +2451,13 @@ nor after .LASAN* array. */ flag_sanitize = ~SANITIZE_ADDRESS; + /* For user-space we want asan constructors to run first. + Linux kernel does not support priorities other than default, and the only + other user of constructors is coverage. So we run with the default + priority. */ + priority = flag_sanitize SANITIZE_USER_ADDRESS ? + MAX_RESERVED_INIT_PRIORITY - 1 : DEFAULT_INIT_PRIORITY; formatting plus declare it here too. This should be int priority = flag_sanitize SANITIZE_USER_ADDRESS ? MAX_RESERVED_INIT_PRIORITY - 1 : DEFAULT_INIT_PRIORITY; (? should be on the second line and aligned below flag_sanitize). Ok for trunk with that change. Shall we backport it to 4.9 branch too? Jakub
Re: asan: support for globals in kernel
On Tue, Dec 2, 2014 at 9:04 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Dec 02, 2014 at 09:56:36PM +0400, Dmitry Vyukov wrote: --- gcc/ChangeLog (revision 218280) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-12-02 Dmitry Vyukov dvyu...@google.com + + * asan.c: (asan_finish_file): Use default priority for constructors + in kernel mode. Seems your MUA is eating tabs, I'll assume they are where I'd expect them to be. @@ -2440,6 +2442,7 @@ { varpool_node *vnode; unsigned HOST_WIDE_INT gcount = 0; + int priority; if (shadow_ptr_types[0] == NULL_TREE) asan_init_shadow_ptr_types (); No need to declare this separately, @@ -2448,6 +2451,13 @@ nor after .LASAN* array. */ flag_sanitize = ~SANITIZE_ADDRESS; + /* For user-space we want asan constructors to run first. + Linux kernel does not support priorities other than default, and the only + other user of constructors is coverage. So we run with the default + priority. */ + priority = flag_sanitize SANITIZE_USER_ADDRESS ? + MAX_RESERVED_INIT_PRIORITY - 1 : DEFAULT_INIT_PRIORITY; formatting plus declare it here too. This should be int priority = flag_sanitize SANITIZE_USER_ADDRESS ? MAX_RESERVED_INIT_PRIORITY - 1 : DEFAULT_INIT_PRIORITY; (? should be on the second line and aligned below flag_sanitize). Ok for trunk with that change. Shall we backport it to 4.9 branch too? If it's doable, it would be nice. Thanks. When do we expect next 4.9 release? Yes, my mail client eats tabs, here is proper version: https://codereview.appspot.com/176570043/diff/20001/gcc/ChangeLog Sorry. And here is updated text version with eaten tabs: Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218280) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-12-02 Dmitry Vyukov dvyu...@google.com + + * asan.c: (asan_finish_file): Use default priority for constructors + in kernel mode. + 2014-12-02 Ulrich Weigand ulrich.weig...@de.ibm.com PR target/64115 Index: gcc/asan.c === --- gcc/asan.c (revision 218280) +++ gcc/asan.c (working copy) @@ -1348,7 +1348,9 @@ the var that is selected by the linker will have padding or not. */ || DECL_ONE_ONLY (decl) - /* Similarly for common vars. People can use -fno-common. */ + /* Similarly for common vars. People can use -fno-common. + Note: Linux kernel is built with -fno-common, so we do instrument + globals there even if it is C. */ || (DECL_COMMON (decl) TREE_PUBLIC (decl)) /* Don't protect if using user section, often vars placed into user section from multiple TUs are then assumed @@ -2448,6 +2450,13 @@ nor after .LASAN* array. */ flag_sanitize = ~SANITIZE_ADDRESS; + /* For user-space we want asan constructors to run first. + Linux kernel does not support priorities other than default, and the only + other user of constructors is coverage. So we run with the default + priority. */ + int priority = flag_sanitize SANITIZE_USER_ADDRESS + ? MAX_RESERVED_INIT_PRIORITY - 1 : DEFAULT_INIT_PRIORITY; + if (flag_sanitize SANITIZE_USER_ADDRESS) { tree fn = builtin_decl_implicit (BUILT_IN_ASAN_INIT); @@ -2503,12 +2512,10 @@ build_fold_addr_expr (var), gcount_tree), dtor_statements); - cgraph_build_static_cdtor ('D', dtor_statements, - MAX_RESERVED_INIT_PRIORITY - 1); + cgraph_build_static_cdtor ('D', dtor_statements, priority); } if (asan_ctor_statements) -cgraph_build_static_cdtor ('I', asan_ctor_statements, - MAX_RESERVED_INIT_PRIORITY - 1); +cgraph_build_static_cdtor ('I', asan_ctor_statements, priority); flag_sanitize |= SANITIZE_ADDRESS; }
[PING][PATCH V2] plugin event for C/C++ function definitions
Hi, this patch adds a new plugin event PLUGIN_START_PARSE_FUNCTION and PLUGIN_FINISH_PARSE_FUNCTION that are invoked at start_function and finish_function respectively in the C and C++ frontends. PLUGIN_START_PARSE_FUNCTION is called before parsing the function body. PLUGIN_FINISH_PARSE_FUNCTION is called after parsing a function definition. Since I have no write privileges please commit this for me if ok. changelog: gcc/c/c-decl.c: Invoke callbacks in start_function and finish_function. gcc/cp/decl.c: Invoke callbacks in start_function and finish_function. gcc/doc/plugins.texi: Add documentation about PLUGIN_START_FUNCTION and PLUGIN_FINISH_FUNCTION gcc/plugin.def: Add events for start_function and finish_function. gcc/plugin.c (register_callback, invoke_plugin_callbacks): Same. gcc/testsuite/g++.dg/plugin/def_plugin.c: New test plugin. gcc/testsuite/g++.dg/plugin/def-plugin-test.C: Testcase for above plugin. gcc/testsuite/g++.dg/plugin/plugin.exp diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index 6413e6f..d9d922c 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -4449,6 +4449,7 @@ start_decl (struct c_declarator *declarator, struct c_declspecs *declspecs, decl = grokdeclarator (declarator, declspecs, NORMAL, initialized, NULL, attributes, expr, NULL, deprecated_state); + invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl); if (!decl) return 0; @@ -9031,6 +9032,7 @@ finish_function (void) It's still in DECL_STRUCT_FUNCTION, and we'll restore it in tree_rest_of_compilation. */ set_cfun (NULL); + invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, current_function_decl); current_function_decl = NULL; } diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 716ab5f..6adf2e4 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -13671,6 +13671,7 @@ start_function (cp_decl_specifier_seq *declspecs, tree decl1; decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, attrs); + invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1); if (decl1 == error_mark_node) return false; /* If the declarator is not suitable for a function definition, @@ -14301,6 +14302,7 @@ finish_function (int flags) vec_free (deferred_mark_used_calls); } + invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, fndecl); return fndecl; } diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi index 4a839b8..1c9e074 100644 --- a/gcc/doc/plugins.texi +++ b/gcc/doc/plugins.texi @@ -174,6 +174,8 @@ Callbacks can be invoked at the following pre-determined events: @smallexample enum plugin_event @{ + PLUGIN_START_PARSE_FUNCTION, /* Called before parsing the body of a function. */ + PLUGIN_FINISH_PARSE_FUNCTION, /* After finishing parsing a function. */ PLUGIN_PASS_MANAGER_SETUP,/* To hook into pass manager. */ PLUGIN_FINISH_TYPE, /* After finishing parsing a type. */ PLUGIN_FINISH_DECL, /* After finishing parsing a declaration. */ diff --git a/gcc/plugin.c b/gcc/plugin.c index 8debc09..f7a8b64 100644 --- a/gcc/plugin.c +++ b/gcc/plugin.c @@ -433,6 +433,8 @@ register_callback (const char *plugin_name, return; } /* Fall through. */ + case PLUGIN_START_PARSE_FUNCTION: + case PLUGIN_FINISH_PARSE_FUNCTION: case PLUGIN_FINISH_TYPE: case PLUGIN_FINISH_DECL: case PLUGIN_START_UNIT: @@ -511,6 +513,8 @@ invoke_plugin_callbacks_full (int event, void *gcc_data) gcc_assert (event = PLUGIN_EVENT_FIRST_DYNAMIC); gcc_assert (event event_last); /* Fall through. */ + case PLUGIN_START_PARSE_FUNCTION: + case PLUGIN_FINISH_PARSE_FUNCTION: case PLUGIN_FINISH_TYPE: case PLUGIN_FINISH_DECL: case PLUGIN_START_UNIT: diff --git a/gcc/plugin.def b/gcc/plugin.def index df5d383..4b7f6ef 100644 --- a/gcc/plugin.def +++ b/gcc/plugin.def @@ -17,6 +17,11 @@ You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ +/* Called before parsing the body of a function. */ +DEFEVENT (PLUGIN_START_PARSE_FUNCTION) + +/* After finishing parsing a function. */ +DEFEVENT (PLUGIN_FINISH_PARSE_FUNCTION) /* To hook into pass manager. */ DEFEVENT (PLUGIN_PASS_MANAGER_SETUP) diff --git a/gcc/testsuite/g++.dg/plugin/def-plugin-test.C b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C new file mode 100644 index 000..b7f2d3d --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C @@ -0,0 +1,13 @@ +int global = 12; + +int function1(void); + +int function2(int a) // { dg-warning Start fndef function2 } +{ + return function1() + a; +} // { dg-warning Finish fndef function2 } + +int function1(void) // { dg-warning Start fndef function1 } +{ +
Re: asan: support for globals in kernel
On Tue, Dec 02, 2014 at 10:12:30PM +0400, Dmitry Vyukov wrote: Shall we backport it to 4.9 branch too? If it's doable, it would be nice. Thanks. Bet the same patch will just apply there. When do we expect next 4.9 release? Probably in March/April timeframe or so, 4.9.2 has been released just a month ago. And here is updated text version with eaten tabs: Ok. Jakub
Re: [testsuite] Fix multiple definitions of _init
On Nov 30, 2014, at 11:21 PM, Oleg Endo oleg.e...@t-online.de wrote: When running the testsuite on a sh-elf configuration, some test cases fail due to multiple definitions of the function '_init'. This is because on sh-elf every function is automatically prefixed with a '_' char. When defining a C function 'init' in some code, the actual name becomes '_init' and this conflicts with the _init function in libgcc (crti.S). While the behavior of adding the '_' prefix is debatable, the testsuite failures can be easily fixed by the attached patch. Is this OK for trunk? No. It is reasonable for the test suite to fail when the implementation of gcc is wrong (unclean) or newlib startup code is wrong (unclean). Since that is what happened, the fix is to fix the cleanliness problem. I’ve read through the thread… I think the sh newlib port has a trivial bug and should be fixed. Either, add an underscore or remove one, just like _all_ the other ports. There is no complexity here, just a typo (or someone copied the wrong port). When newlib/libc/sys/sh/crt0.S is fixed, and libgcc/config/sh/crt1.S is fixed, then these test cases will pass. USER_LABEL_PREFIX is not the issue. Let me quote from the code: .long _atexit .long _init This is wrong. Also, _fini needs to be fixed as well. It is unfortunate that the fix involves two packages, but, such is life.
Re: asan: support for globals in kernel
On Tue, Dec 2, 2014 at 9:17 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Dec 02, 2014 at 10:12:30PM +0400, Dmitry Vyukov wrote: Shall we backport it to 4.9 branch too? If it's doable, it would be nice. Thanks. Bet the same patch will just apply there. Do I need to do anything additional for this? Or you will backport it? Committed in revision 218281. When do we expect next 4.9 release? Probably in March/April timeframe or so, 4.9.2 has been released just a month ago. And here is updated text version with eaten tabs: Ok. Jakub
Re: [PATCH] Fix PR15346
On Tue, 2 Dec 2014, Richard Biener wrote: I'm not sure if these forms of division actually occur in places where this could cause a problem, but it does look like Ada may enable you to generate ROUND_DIV_EXPR. Hmm. I thought I was following what extract_muldiv_1 does (but of course that function is quite hard to follow). I don't claim existing optimizations are correct for all these forms of division. In fact even wide-int doesn't handle ROUND_DIV_EXPR correctly (round to nearest, ties away from 0) - at a glance, without testing, ROUND_MOD_EXPR has the same bug (wi::ges_p (wi::abs (remainder), wi::lrshift (wi::abs (y), 1)) is not the right condition for rounding away). The following Ada testcase test_round_div.adb will generate a ROUND_DIV_EXPR which is executed correctly at -O0, but fails at -O3. (The separate division function seems necessary to stop the front end from doing its own constant folding, though there may for all I know be more idiomatic Ada ways of doing that.) procedure Test_Round_Div is type Fixed is delta 1.0 range -2147483648.0 .. 2147483647.0; A : Fixed := 1.0; B : Fixed := 3.0; C : Integer; function Divide (X, Y : Fixed) return Integer is begin return Integer (X / Y); end; begin C := Divide (A, B); if C /= 0 then raise Program_Error; end if; end Test_Round_Div; -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Mon, Sep 8, 2014 at 3:19 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Sep 2, 2014 at 1:40 PM, Richard Henderson r...@redhat.com wrote: On 06/20/2014 05:17 PM, Sriraman Tallam wrote: Index: config/i386/i386.c === --- config/i386/i386.c(revision 211826) +++ config/i386/i386.c(working copy) @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx disp) return true; } else if (!SYMBOL_REF_FAR_ADDR_P (op0) - SYMBOL_REF_LOCAL_P (op0) + (SYMBOL_REF_LOCAL_P (op0) +|| (TARGET_64BIT ix86_copyrelocs flag_pie + !SYMBOL_REF_FUNCTION_P (op0))) ix86_cmodel != CM_LARGE_PIC) return true; break; This is the wrong place to patch. You ought to be adjusting SYMBOL_REF_LOCAL_P, by providing a modified TARGET_BINDS_LOCAL_P. I have done this in the new attached patch, I added a new function i386_binds_local_p which will check for this and call default_binds_local_p otherwise. Note in particular that I believe that you are doing the wrong thing with weak and COMMON symbols, in that you probably ought not force a copy reloc there. I added an extra check to not do this for WEAK symbols. I also added a check for DECL_EXTERNAL so I believe this will also not be called for COMMON symbols. Note the complexity of default_binds_local_p_1, and the fact that all you really want to modify is /* If PIC, then assume that any global name can be overridden by symbols resolved from other modules. */ else if (shlib) local_p = false; near the bottom of that function. I did not understand what you mean here? Were you suggesting an alternative way of doing this? Thanks for reviewing I'd like to see a few testcases: 1. One test to show it does the right thing for external variable. 2. One test to show it does the right thing for common symbol. 3. One test to show it does the right thing for weak symbol. 4. One test to show it does the right thing for external function. Thanks. -- H.J.
Re: [PATCH] Fix PR15346
On Tue, 2 Dec 2014, Joseph Myers wrote: (y), 1)) is not the right condition for rounding away). The following Ada testcase test_round_div.adb will generate a ROUND_DIV_EXPR which is I should add that I'm not sure if Ada requires correct rounding for fixed-point division converted to integer like this, or, if not, whether GNU Ada nevertheless guarantees it; it's simply a case that does generate such a ROUND_DIV_EXPR and so can be used to test how ROUND_DIV_EXPR behaves (in the absence of built-in functions). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
Hello! Ping. Ping. Ping. Ping. It would probably help reviewers if you pointed to actual path submission [1], which unfortunately contains the explanation in the patch itself [2], which further explains that this functionality is currently only supported with gold, patched with [3]. [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html After a bit of the above detective work, I think that new gcc option is not necessary. The configure should detect if new functionality is supported in the linker, and auto-configure gcc to use it when appropriate. I have also added a couple of linker experts in the CC. Uros.
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Tue, Dec 2, 2014 at 11:19 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Ping. Ping. Ping. Ping. It would probably help reviewers if you pointed to actual path submission [1], which unfortunately contains the explanation in the patch itself [2], which further explains that this functionality is currently only supported with gold, patched with [3]. [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html After a bit of the above detective work, I think that new gcc option is not necessary. The configure should detect if new functionality is supported in the linker, and auto-configure gcc to use it when appropriate. I have also added a couple of linker experts in the CC. I don't think i386_binds_local_p is correct. What does it return for hidden external variable? I think it should be bool local = default_binds_local_p (exp); if (!local) local = ... return local; H.J.
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Tue, Dec 2, 2014 at 11:19 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Ping. Ping. Ping. Ping. It would probably help reviewers if you pointed to actual path submission [1], which unfortunately contains the explanation in the patch itself [2], which further explains that this functionality is currently only supported with gold, patched with [3]. [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html After a bit of the above detective work, I think that new gcc option is not necessary. The configure should detect if new functionality is supported in the linker, and auto-configure gcc to use it when appropriate. I think GCC option is needed since one can use -fuse-ld= to change linker. -- H.J.
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Tue, Dec 2, 2014 at 8:40 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Dec 2, 2014 at 11:19 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Ping. Ping. Ping. Ping. It would probably help reviewers if you pointed to actual path submission [1], which unfortunately contains the explanation in the patch itself [2], which further explains that this functionality is currently only supported with gold, patched with [3]. [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html After a bit of the above detective work, I think that new gcc option is not necessary. The configure should detect if new functionality is supported in the linker, and auto-configure gcc to use it when appropriate. I think GCC option is needed since one can use -fuse-ld= to change linker. IMO, nobody will use this highly special x86_64-only option. It would be best for gnu-ld to reach feature parity with gold as far as this functionality is concerned. In this case, the optimization would be auto-configured, and would fire automatically, without any user intervention. Uros.
Re: [PATCH] Make IPA-CP propagate alignment information of pointers
Hi, thanks for the comments. On Mon, Dec 01, 2014 at 11:40:59PM +0100, Jan Hubicka wrote: 2014-11-19 Martin Jambor mjam...@suse.cz * ipa-prop.h (ipa_alignment): New type. (ipa_jump_func): New field alignment. (ipcp_transformation_summary) New type. (ipcp_grow_transformations_if_necessary): Declare. (ipa_node_agg_replacements): Removed. (ipcp_transformations): Declare. (ipcp_get_transformation_summary): New function. (ipa_get_agg_replacements_for_node): Use it. * ipa-cp.c (ipcp_param_lattices): New field alignment. (print_all_lattices): Also print alignment. (alignment_bottom_p): New function. (set_alignment_to_bottom): Likewise. (set_all_contains_variable): Also set alignment to bottom. (initialize_node_lattices): Likewise. (propagate_alignment_accross_jump_function): New function. (propagate_constants_accross_call): Call it. (ipcp_store_alignment_results): New function. (ipcp_driver): Call it. * ipa-prop.c (ipa_node_agg_replacements): Removed. (ipcp_transformations): New. (ipa_print_node_jump_functions_for_edge): Also print alignment. (ipa_set_jf_unknown): New function. (detect_type_change_from_memory_writes): Use ipa_set_jf_unknown. (ipa_compute_jump_functions_for_edge): Also calculate alignment. (update_jump_functions_after_inlining): Use ipa_set_jf_unknown. (ipcp_grow_transformations_if_necessary): New function. (ipa_set_node_agg_value_chain): Use ipcp_transformations. (ipa_node_removal_hook): Likewise. (ipa_node_duplication_hook): Also duplicate alignment results. (ipa_write_jump_function): Also stream alignments. (ipa_read_jump_function): Use ipa_set_jf_unknown, also stream alignments. (write_agg_replacement_chain): Renamed to write_ipcp_transformation_info, also stream alignments. (read_agg_replacement_chain): Renamed to read_ipcp_transformation_info, also stream alignments. (ipa_prop_write_all_agg_replacement): Renamed to ipcp_write_transformation_summaries. Stream always. (ipa_prop_read_all_agg_replacement): Renamed to ipcp_read_transformation_summaries. (ipcp_update_alignments): New function. (ipcp_transform_function): Call it, free also alignments. In longer term I think we should just propagate value range and known to be zero/nonzero bits. This is stronger and more universal than inventing propagation for subproblems (like non-NULL would be useful). I think it would be also good excuse to do (simplified) early VRP to populate the vlaue ranges locally. This seems to make quite some difference on C++. Well, it would essentially need some sort of early VRP. So yes, hopefully in the next major version we can do that. But for 5.0 it seems like resonable thing to do. Since ipa-prop itself is built as propaation engine for multiple types of values (scalars, aggregates, contexts and now value ranges). I believe that if we start propagating value ranges, this would make propagating scalars redundant? Also it would need some rethinking of the ipa-cp propagation heuristics. I wonder if the code can not be organized by type of propagated value and perhaps split into multiple file to make it more readable. Splitting off of some part to extra files might help but first it is necessary to figure out how to reorganize the contents of the file in general. Jump functions should probably be re-thought a little bit, especially once we add information like reference mapping from my aggregate propagation patches. It would also be beneficial to do some analysis together with generating the inline summaries and tweak ipa-analysis summary conditions so that they can tell which value ranges would lead to the same results. And so on and so forth. I think the way we split polymorphic-call-context lattice operations into separate file quite works but more of code separation would be nice. Well, a bit like df.c infrastructure is done. + else if (jfunc-type == IPA_JF_PASS_THROUGH +|| jfunc-type == IPA_JF_ANCESTOR) +{ + struct ipa_node_params *caller_info = IPA_NODE_REF (cs-caller); + struct ipcp_param_lattices *src_lats; + HOST_WIDE_INT offset = 0; + int src_idx; + We probably chould avoid new places that rely on tree_fits_shwi_p and use wide_int instead. On IRC we agreed that this making change here alone is of very limited benefit, unlike using wide_int more in IPA data structures overall, something which should be done in 6.0. + if (POINTER_TYPE_P (TREE_TYPE(arg))) + { + unsigned HOST_WIDE_INT hwi_bitpos; + unsigned align; + + if (get_pointer_alignment_1 (arg, align, hwi_bitpos) +align BITS_PER_UNIT) OK, so here we get the nonzero/zero bits propagation from CCP used, right? + ipcp_transformation_summary *ts = ipcp_get_transformation_summary
Re: [PATCH 5/6] combine: handle REG_UNUSED in reg_dead_at_p (PR59278)
On 12/01/14 16:30, Oleg Endo wrote: On Mon, 2014-12-01 at 10:38 -0700, Jeff Law wrote: On 11/27/14 18:44, Segher Boessenkool wrote: Currently reg_dead_at_p returns 0 for registers that are set but never used. This patch repairs that oversight. This fixes PR59278. 2014-11-27 Segher Boessenkool seg...@kernel.crashing.org gcc/ PR rtl-optimization/59278 combine (reg_dead_at_p): Consider REG_UNUSED notes. OK with a testcase. ISTM you can just create an SH specific testcase from the code in the BZ. Pre-approved with suitable testcase. jeff Attached is the SH test case for the PR, which passes with the combine patch applied for make -k check-gcc RUNTESTFLAGS=sh.exp=pr59278.c --target_board=sh-sim\{-m4/-ml,-m4/-mb,-m2a/-mb,-m2/-mb} Cheers, Oleg testsuite/ChangeLog: * gcc.target/sh/pr59278.c: New. OK, please commit after Segher or together as a unified patch+testcase. jeff
Re: [Patch] Improving jump-thread pass for PR 54742
On 12/02/14 03:15, Richard Biener wrote: I'm a bit worried about compile-time impacts of the all the recursion, but I'm willing to wait and see if it turns out to be a problem in practice. Please consider restricting it to -fexpensive-optimizations (-O2+). Yea, let's go ahead and do that. jeff
[PATCH] Update docs to reflect use of gimple subclasses
On Tue, 2014-12-02 at 08:49 +0100, Jakub Jelinek wrote: Hi! Here is an attempt to adjust gimple_build_assign* documentation. Ok for trunk? Note, apparently the documentation has not been adjusted for the gimple - gassign * etc. changes, David, can you please work on adjusting gimple.texi to match the reality after your changes? I went through cfg.texi and gimple.texi updating all of the documentation to reflect the changed signatures. gimple.texi has numerous overlong lines; I wordwrapped any that I touched. make html succeeds; I visually inspected and sanity-checked the built HTML. This will have merger conflicts with Jakub's patch, but it should be trivial to fix them up. OK for trunk? gcc/ChangeLog: * doc/cfg.texi (GIMPLE statement iterators): Add note about gphi_iterator, and use one in the example. * doc/gimple.texi (Tuple specific accessors): Add missing GIMPLE_GOTO section and menu item. (gimple_build_asm, gimple gimple_build_assign_with_ops) gimple_call_mark_uninlinable, gimple_call_cannot_inline_p): Remove description of removed functions. (gimple_build_assign, gimple_build_bind, gimple_build_call, gimple_build_call_from_tree, gimple_build_call_vec, gimple_build_catch, gimple_build_cond, gimple_build_cond_from_tree, gimple_build_debug_bind, gimple_build_eh_filter, gimple_build_label, gimple_build_goto, gimple_build_omp_atomic_load, gimple_build_omp_atomic_store, gimple_build_omp_continue, gimple_build_omp_critical, gimple_build_omp_for, gimple_build_omp_parallel, gimple_build_omp_sections, gimple_build_omp_single, gimple_build_return, gimple_build_resx, gimple_build_switch, gimple_build_try): Update return type within description to reflect changes in gimple.h to using gimple subclasses. (gimple_build_asm_vec): Update return type, params and description. (gimple_asm_ninputs): Update param. (gimple_asm_noutputs, gimple_asm_nclobbers, gimple_asm_input_op gimple_asm_set_input_op, gimple_asm_output_op gimple_asm_set_output_op, gimple_asm_clobber_op, gimple_asm_set_clobber_op, gimple_asm_string, gimple_asm_volatile_p, gimple_asm_set_volatile, gimple_bind_vars, gimple_bind_set_vars, gimple_bind_append_vars, gimple_bind_body, gimple_bind_set_body, gimple_bind_add_stmt, gimple_bind_add_seq, gimple_bind_block, gimple_bind_set_block, gimple_call_set_fn, gimple_call_return_type, gimple_call_set_chain, gimple_call_set_tail, gimple_call_tail_p, gimple_call_copy_skip_args, gimple_catch_types, gimple_catch_types_ptr, gimple_catch_handler, gimple_catch_set_types, gimple_catch_set_handler, gimple_cond_set_code, gimple_cond_set_lhs, gimple_cond_set_rhs, gimple_cond_true_label, gimple_cond_set_true_label, gimple_cond_set_false_label, gimple_cond_false_label, gimple_cond_make_false, gimple_cond_make_true, gimple_eh_filter_set_types, gimple_eh_filter_set_failure, gimple_eh_must_not_throw_fndecl, gimple_eh_must_not_throw_set_fndecl, gimple_label_label, gimple_label_set_label, gimple_goto_set_dest, gimple_omp_atomic_load_set_lhs, gimple_omp_atomic_load_lhs, gimple_omp_atomic_load_set_rhs, gimple_omp_atomic_load_rhs, gimple_omp_atomic_store_set_val, gimple_omp_atomic_store_val, gimple_omp_continue_control_def, gimple_omp_continue_control_def_ptr, gimple_omp_continue_set_control_def, gimple_omp_continue_control_use, gimple_omp_continue_control_use_ptr, gimple_omp_continue_set_control_use, gimple_omp_critical_name, gimple_omp_critical_name_ptr, gimple_omp_critical_set_name, gimple_omp_parallel_clauses_ptr, gimple_omp_parallel_set_clauses, gimple_omp_parallel_child_fn, gimple_omp_parallel_child_fn_ptr, gimple_omp_parallel_set_child_fn, gimple_omp_parallel_data_arg, gimple_omp_parallel_data_arg_ptr, gimple_omp_parallel_set_data_arg, gimple_omp_single_set_clauses, gimple_phi_set_result, gimple_phi_set_arg, gimple_resx_region, gimple_resx_set_region, gimple_return_retval, gimple_return_set_retval, gimple_switch_num_labels, gimple_switch_set_num_labels, gimple_switch_index, gimple_switch_set_index, gimple_switch_label, gimple_switch_set_label, gimple_switch_default_label, gimple_switch_set_default_label, gimple_try_set_eval, gimple_try_set_cleanup): Update initial param within description to reflect changes in gimple.h to using gimple subclasses. (Adding a new GIMPLE statement code): Update to reflect gimple statement subclassing. --- gcc/doc/cfg.texi| 8 +- gcc/doc/gimple.texi | 354 +--- 2 files changed, 204
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Tue, Dec 02, 2014 at 12:16:09PM -0800, H.J. Lu wrote: IMO, nobody will use this highly special x86_64-only option. It would be best for gnu-ld to reach feature parity with gold as far as this functionality is concerned. In this case, the optimization would be auto-configured, and would fire automatically, without any user intervention. I will implement it in ld after its support is checked into GCC. I think it would be better to do it the other way around, so that gcc can be configured against the right ld from the start. Jakub
Re: [PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks
Hi, On Mon, Dec 01, 2014 at 10:43:19PM +0100, Jan Hubicka wrote: On Fri, Nov 21, 2014 at 08:18:12PM +0100, Martin Jambor wrote: Hi, when debugging PR 63814 I noticed that when cgraph_node::create_clone was using redirect_edge_duplicating_thunks to redirect two edges to a thunk of a clone, two thunks were created, one for each edge. The reason is that even though duplicate_thunk_for_node attempts to locate an already created thunk, it does so by looking for a caller with thunk.thunk_p set and the previously created one does not have it set because (on i686) expand_thunk has expanded the thunk to gimple and cleared the flag. This patch fixes the issue by marking such expanded thunks with yet another flag and then uses the flag to identify such expanded thunks. Bootstrapped and tested on x86_64-linux and i686-linux. Honza, do you think this is a good approach? Is the patch OK for trunk? Hmm, I was sort of hoping to drop the whole .thunk structure into summary and do not keep it for functions with Gimple body, so this change will imply the need to keep this datastructure for thunks that are already expanded. moreover we do not stream the flag, so not all expanded thunks are handled, only those that was introduced during WPA. It may be cleaner to simply donot expand thunks proactively before all the clonning is finished? I did not really like it either (although I must say that in general I really dislike the need to call expand_thunk but telling it to not really expand anything if possible). So here is another approach, which works by delaying expansion of thunks but only after all redirections within create_clone are done. This has the advantage that the new method expand_all_artificial_thunks is not exposed to every (indirect) user of cgraph_node::create_clone (as opposed to requiring them all to call that function at some point after making any redirections at all). However, it also has the disadvantage that in the unlikeliest of circumstances (strongly connected components in IPA-CP value dependencies connected by thunks and these thunks ceasing to be assembly thunks in the course of cloning at the same time), IPA-CP might still create extra gimple-expanded thunks. The alternative not suffering from this problem would be either exposing expand_all_artificial_thunks to all users and requiring them to call it, with IPA-CP doing it for all created clones at the very end of propagation, or teaching the infrastructure and pass manager about this and then expanding them at some convenient time (before/after inlining?). However, the first option seems to be too ugly for very little benefit (but I am willing to implement that) and the second does not seem to be stage3 material. What do you think? Bootstrapped and tested on x86_64-linux. Martin 2014-12-02 Martin Jambor mjam...@suse.cz * cgraph.h (cgraph_node): New method expand_all_artificial_thunks. (cgraph_edge): New method redirect_callee_duplicating_thunks. * cgraphclones.c (duplicate_thunk_for_node): Donot expand newly created thunks. (redirect_edge_duplicating_thunks): Turned into edge method redirect_callee_duplicating_thunks. (cgraph_node::expand_all_artificial_thunks): New method. (create_clone): Call expand_all_artificial_thunks. * ipa-cp.c (perhaps_add_new_callers): Call redirect_callee_duplicating_thunks instead of redirect_callee. Also call expand_all_artificial_thunks. Index: src/gcc/cgraph.h === --- src.orig/gcc/cgraph.h +++ src/gcc/cgraph.h @@ -908,6 +908,10 @@ public: thunks that are not lowered. */ bool expand_thunk (bool output_asm_thunks, bool force_gimple_thunk); + /* Call expand_thunk on all callers that are thunks and if analyze those + nodes that were expanded. */ + void expand_all_artificial_thunks (); + /* Assemble thunks and aliases associated to node. */ void assemble_thunks_and_aliases (void); @@ -1477,6 +1481,12 @@ struct GTY((chain_next (%h.next_caller call expression. */ void redirect_callee (cgraph_node *n); + /* If the edge does not lead to a thunk, simply redirect it to N. Otherwise + create one or more equivalent thunks for N and redirect E to the first in + the chain. Note that it is then necessary to call + n-expand_all_artificial_thunks once all callers are redirected. */ + void redirect_callee_duplicating_thunks (cgraph_node *n); + /* Make an indirect edge with an unknown callee an ordinary edge leading to CALLEE. DELTA is an integer constant that is to be added to the this pointer (first parameter) to compensate for skipping Index: src/gcc/cgraphclones.c === --- src.orig/gcc/cgraphclones.c +++ src/gcc/cgraphclones.c @@ -370,28 +370,47 @@
[PATCH, PR 64153] Check type sizes before V_C_Eing in evaluate_conditions_for_known_args
Hi, apparently it is necessary to check that type sizes match before attempting to fold-V_C_E them in evaluate_conditions_for_known_args. So this patch does this. It passes bootstrap and testing on x86_64-linux and I have verified with a cross compiler that the reported bug is fixed (the generated assembly is the same as before the commit which introduced the problem). OK for trunk and (after a bootstrap and testing there) the 4.9 branch? Thanks, Martin 2014-12-02 Martin Jambor mjam...@suse.cz PR ipa/64153 * ipa-inline-analysis.c (evaluate_conditions_for_known_args): Check type sizes before view_converting. Index: src/gcc/ipa-inline-analysis.c === --- src.orig/gcc/ipa-inline-analysis.c +++ src/gcc/ipa-inline-analysis.c @@ -880,12 +880,19 @@ evaluate_conditions_for_known_args (stru } if (c-code == IS_NOT_CONSTANT || c-code == CHANGED) continue; - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c-val), val); - res = val - ? fold_binary_to_constant (c-code, boolean_type_node, val, c-val) - : NULL; - if (res integer_zerop (res)) - continue; + + if (operand_equal_p (TYPE_SIZE (TREE_TYPE (c-val)), + TYPE_SIZE (TREE_TYPE (val)), 0)) + { + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c-val), val); + + res = val + ? fold_binary_to_constant (c-code, boolean_type_node, val, c-val) + : NULL; + + if (res integer_zerop (res)) + continue; + } clause |= 1 (i + predicate_first_dynamic_condition); } return clause;
Re: [PATCH 08/08] PR/64003 workaround (uninit memory in i386.md)
On 12/02/14 09:20, David Malcolm wrote: In short, I believe the problem occurs with a *jcc_1 insn that jumps forwards, but the full details are in the bug. My first thought is that something must be creating a new insn after shorten_branches is complete or an existing insn that was not on the chain when we called shorten-branches, but got threaded onto the chain later. Either would be considered bad in various ways, so we'd like to fix it. I don't think either of these are the case. I believe it's due to the size of the jcc_1 insn being affected by the distance to the jump target, which for a forward jump is a bit of a chicken-and-egg issue, since that distance is affected by the size of the jcc_1 insn itself. It looks like align_fuzz exists in order to cope with this kind of circular definition, but the issue seems to occur inside align_fuzz itself. Sorry, I didn't look at the BZ, you had already put a fair amount of analysis in there. My bad. This feels so familiar. Jeff
Re: attribute handler oddness in MEP and STORMY16 ports
My memories of why I did MeP the way I did are... vague. I recall it had to do with getting the attributes to apply to C++ objects correctly, since C++ objects tend to be complicated and gcc didn't always pass me what I expected. think they are suppose to. They build, but I cant test them... can some with that ability run them and see if they are correct? MeP and stormy16 are both cgen/sid simulators.
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Tue, Dec 2, 2014 at 12:01 PM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Dec 2, 2014 at 8:40 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Dec 2, 2014 at 11:19 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Ping. Ping. Ping. Ping. It would probably help reviewers if you pointed to actual path submission [1], which unfortunately contains the explanation in the patch itself [2], which further explains that this functionality is currently only supported with gold, patched with [3]. [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html After a bit of the above detective work, I think that new gcc option is not necessary. The configure should detect if new functionality is supported in the linker, and auto-configure gcc to use it when appropriate. I think GCC option is needed since one can use -fuse-ld= to change linker. IMO, nobody will use this highly special x86_64-only option. It would be best for gnu-ld to reach feature parity with gold as far as this functionality is concerned. In this case, the optimization would be auto-configured, and would fire automatically, without any user intervention. I will implement it in ld after its support is checked into GCC. -- H.J.
Re: [PATCH, PR 64153] Check type sizes before V_C_Eing in evaluate_conditions_for_known_args
Hi, apparently it is necessary to check that type sizes match before attempting to fold-V_C_E them in evaluate_conditions_for_known_args. So this patch does this. It passes bootstrap and testing on x86_64-linux and I have verified with a cross compiler that the reported bug is fixed (the generated assembly is the same as before the commit which introduced the problem). OK for trunk and (after a bootstrap and testing there) the 4.9 branch? Thanks, Martin 2014-12-02 Martin Jambor mjam...@suse.cz PR ipa/64153 * ipa-inline-analysis.c (evaluate_conditions_for_known_args): Check type sizes before view_converting. OK, Honza Index: src/gcc/ipa-inline-analysis.c === --- src.orig/gcc/ipa-inline-analysis.c +++ src/gcc/ipa-inline-analysis.c @@ -880,12 +880,19 @@ evaluate_conditions_for_known_args (stru } if (c-code == IS_NOT_CONSTANT || c-code == CHANGED) continue; - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c-val), val); - res = val - ? fold_binary_to_constant (c-code, boolean_type_node, val, c-val) - : NULL; - if (res integer_zerop (res)) - continue; + + if (operand_equal_p (TYPE_SIZE (TREE_TYPE (c-val)), +TYPE_SIZE (TREE_TYPE (val)), 0)) + { + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c-val), val); + + res = val + ? fold_binary_to_constant (c-code, boolean_type_node, val, c-val) + : NULL; + + if (res integer_zerop (res)) + continue; + } clause |= 1 (i + predicate_first_dynamic_condition); } return clause;
[PATCH] fix PR testsuite/64145
The attached patch fixes the regression in the gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c testcase caused by the accidental removal of -fgraphite-identity from dg-options at r217315. Okay for gcc trunk? Jack 2014-12-01 Jack Howarth howa...@bromo.med.uc.edu PR testsuite/64145 * gcc.dg/graphite/isl-codegen-loop-dumping.c: Restore -fgraphite-identity. Index: gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c === --- gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c(revision 218285) +++ gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c(working copy) @@ -1,4 +1,4 @@ -/* { dg-options -O2 -fdump-tree-graphite-all } */ +/* { dg-options -O2 -fgraphite-identity -fdump-tree-graphite-all } */ int main (int n, int *a)
Fwd: [PATCH] fix PR testsuite/64145
Looks good to me. Thank you. 2014-12-03 2:09 GMT+05:00 Jack Howarth howarth.at@gmail.com: The attached patch fixes the regression in the gcc/testsuite/gcc.dg/graphite/isl-codegen-loop-dumping.c testcase caused by the accidental removal of -fgraphite-identity from dg-options at r217315. Okay for gcc trunk? Jack -- Cheers, Roman Gareev.
Re: [PATCH 08/08] PR/64003 workaround (uninit memory in i386.md)
On 12/02/14 09:20, David Malcolm wrote: I've spent some time trying to track this down, and I've added detailed notes to the bug. In short, I believe the problem occurs with a *jcc_1 insn that jumps forwards, but the full details are in the bug. My first thought is that something must be creating a new insn after shorten_branches is complete or an existing insn that was not on the chain when we called shorten-branches, but got threaded onto the chain later. Either would be considered bad in various ways, so we'd like to fix it. I don't think either of these are the case. I believe it's due to the size of the jcc_1 insn being affected by the distance to the jump target, which for a forward jump is a bit of a chicken-and-egg issue, since that distance is affected by the size of the jcc_1 insn itself. I thought we had code somewhere that made worst case assumptions about the lengths, to get those arrays initialized with reasonable value without looking forward. But that may have also been in place prior to the addition of align_fuzz. It looks like align_fuzz exists in order to cope with this kind of circular definition, but the issue seems to occur inside align_fuzz itself. align_fuzz is meant to estimate padding due to alignments. Joern and I went round and round on that code and I never managed to wrap my head around it or its correctness. Jim W. might have stepped in at some point to arbitrate. I may have to pull out my archives and review that history (IIRC is predates egcs). Interestingly there was a great discussion around the code in 2009 which mirrored some of my initial concerns with this code. Ultimately Bernd recommended removing it, but nobody reviewed/approved the patch. I think you should ping Joern to chime in here about the proper course of action to take if we're going to keep this code in GCC. You could just assign in the BZ to him. FWIW, I can't convince myself initializing the value to zero is actually safe or not. jeff
Re: [PATCH][wwwdocs] Add note about removal of -mwords-little-endian option to changes.html for GCC 5
On Tuesday 2014-12-02 16:34, Kyrill Tkachov wrote: Back in July this deprecated option was removed. This patch adds a note to changes.html for that. Looked at the result in firefox, looked ok to me. Ok to commit? Yep. And I very comfortable for you to commit changes like this without asking for approval. Mind, I am happy to review (except for Dec 6h-17th where I expect I'll be off-offline), I just don't want to enforce too much process on you. Gerald
Re: [arm][patch] fix arm_neon_ok check on !arm_arch7
On Tue, Dec 2, 2014 at 2:01 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 23/09/14 09:27, James Greenhalgh wrote: On Mon, Sep 15, 2014 at 11:56:03AM +0100, Andrew Stubbs wrote: On 15/09/14 10:46, Richard Earnshaw wrote: Hmm, I wonder if arm_override_options should reject neon + (arch 7). Is this more to your taste? Is this really such a good idea? It causes carnage throughout the testsuite if you have configured with support for Neon and the testcase is written with dg-options for a pre-armv7-a -march value. For example in: testsuite/gcc.target/arm/di-longlong64-sync-withhelpers.c Which forces -march=armv5. Perhaps you just have to fix the effective-target-ok tests - but then we lose swathes of test coverage. This also causes subtle Linux kernel compile failures. Over there they use make rules where they check if the compiler supports -march=armv5te and if not use -march=armv4t. With this patch if the compiler is configured with something like --with-fpu=neon the test will fail with your error message, even though the compiler supports -march=armv5te. I've spent some time this evening pondering over your patch. Firstly it appears that the current behaviour is going to cause more breakage than originally expected. If this is to go in we'd have a number of users having to add -mfloat-abi=soft to the command line option to ensure that -march=armv5te works just fine on the files where march=armv5te in the first places. I'm not sure that the original patch is enough. The tools have always allowed us to drop down the arch to march=armv5te along with using -mfpu=neon. We are now changing command line behaviour, so an inform in terms of diagnostics to the user would be useful as it states that we don't really have mfpu=neon generating neon code any more because of this particular case. If we are to do this then the original patch is probably not enough as it then doesn't handle the case of TARGET_VFP3 / TARGET_VFP5 / TARGET_NEON_FP16 / TARGET_FP16 / TARGET_FPU_ARMV8 etc. etc. etc. regards Ramana Kyrill Thanks, James Andrew P.S. arm_override_options was renamed in 2010. 2014-09-15 Andrew Stubbs a...@codesourcery.com * gcc/config/arm/arm.c (arm_option_override): Reject -mfpu=neon when architecture is older than ARMv7. Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c(revision 215228) +++ gcc/config/arm/arm.c(working copy) @@ -2845,6 +2845,9 @@ arm_fpu_desc = all_fpus[arm_fpu_index]; + if (TARGET_NEON !arm_arch7) +error (target CPU does not support NEON); + switch (arm_fpu_desc-model) { case ARM_FP_MODEL_VFP:
Re: [PATCH] DWARF add DW_AT_noreturn on noreturn function subprogram.
On 12/01/14 14:02, Cary Coutant wrote: [+cc Michael Eager] Rather than having to lobby to keep it unchanged because we jumped the gun, can we lobby to get the number assigned in the near future rather than in the potentially far future? That feels more cooperative to me :-) Would that make Michael happier? I'm pretty confident that Michael will say, don't rely on the value until the final spec is published. (He's told me exactly that in the past. I've been guilty of jumping the gun on a couple of DW_LANG codes and a few DW_AT values.) But we should let Michael answer for himself... As Cary said, the values are not fixed until the spec is released. As Jason said, the values are unlikely to change. The standards process can be a bit messy at times, particularly when we realize that something we added to a draft needs to be changed. We try to avoid making unnecessary changes, naturally. On the other hand, we don't want something in an internal draft of the DWARF specification to prevent us from making necessary changes. We will avoid changing the value of the DW_AT_noreturn or any other new attribute, if for no reason than it makes additional work for us. We're also aware that changes may affect work on GCC, GDB, or other tools, which we would like to avoid. The most conservative approach is to wait for the DWARF Version 5 spec to be released. While there is no guarantee that attribute values will not change in the final spec, the risk if this happening seems small. Michael, for your reference, here's the start of this thread: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03195.html and its continuation into December: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00099.html Michael, are you OK with sharing your target dates for publishing the spec? We will likely have a public review draft available in April or May with a 60 period for comments, with final release most likely in July. -- Michael Eagerea...@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Re: [testsuite] Fix multiple definitions of _init
On Tue, 2 Dec 2014, Richard Biener wrote: Joseph may have more experience with how targets should setup USER_LABEL_PREFIX to avoid this situation. See e.g. config/arm/lib1funcs.S: #define SYM(x) CONCAT1 (__USER_LABEL_PREFIX__, x) (and the associated macro definition of CONCAT1 that uses, etc.). If you have .S sources that may be used on targets both with and without a prefix, you should do something similar. (The ELF gABI says External C symbols have the same names in C and object files' symbol tables., so ELF targets using leading underscores on C symbol names are going against the gABI.) -- Joseph S. Myers jos...@codesourcery.com
Re: [C++ Patch] PR 60978
See my comment on the PR; I think the testcase illustrates why we still want this warning for anonymous enums. Jason
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Tue, Dec 2, 2014 at 12:19 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Dec 02, 2014 at 12:16:09PM -0800, H.J. Lu wrote: IMO, nobody will use this highly special x86_64-only option. It would be best for gnu-ld to reach feature parity with gold as far as this functionality is concerned. In this case, the optimization would be auto-configured, and would fire automatically, without any user intervention. I will implement it in ld after its support is checked into GCC. I think it would be better to do it the other way around, so that gcc can be configured against the right ld from the start. Consider it is done. I will check it into binutils ths week. -- H.J.
Re: [C++ Patch] PR 60978
Hi, On 12/02/2014 11:01 PM, Jason Merrill wrote: See my comment on the PR; I think the testcase illustrates why we still want this warning for anonymous enums. Ok... Thus, barring further discussion, I will simply close the Bug as invalid. Thanks! Paolo.
Re: [PATCH 0/2, AArch64, v3] APM X-Gene 1 cost-table and pipeline model
On Fri, Nov 21, 2014 at 6:44 PM, Philipp Tomsich philipp.toms...@theobroma-systems.com wrote: The following patch-series adds optimized support for the APM X-Gene 1 by providing a cost-model and pipeline-model. The pipeline-model has a few long reservation-chains, but looking at the stats for the generated NDA shows that it's well below other AArch64 cores (e.g. Cortex-A53) in overall size. This includes all the requested enhancements and cleans up the naming of the various states and reservations in 'xgene1.md'. Even though it isn't wired into the 32bit ARM backend yet, we've decided to keep the machine-description in config/arm... after all, the X-Gene family is backwards compatible with ARMv7 and our benchmarking has shown good potential for performance improvements from improving the instruction selection and scheduling when using ARMv7 code (after all, X-Gene 1 is a 4-way superscalar design). Good, this being the case I would strongly suggest that -mcpu=xgene1 is wired up properly in the ARM backend by the end of stage3. It's better to do this sooner rather than later in order to minimize issues late in the release cycle. Afterwards its just bug fixing / tuning specifically for xgene. As I've said in the past I would strongly discourage an options break between the 2 backends, it will only come back to haunt the backends. i.e. please wire up xgene1 in config/arm/arm-cores.def Please also submit a patch to changes.html for AArch64 and ARM backends once this patch is reviewed / queued. Ramana After having a few further discussions with my colleagues regarding the latencies and modelling of divides in the pipeline, we've readjusted the modelling of the divides another time... even though it doesn't make a difference in real-world benchmarks. Thanks to everyone who took the time to review and comment. Philipp Tomsich (2): Core definition for APM XGene-1 and associated cost-table. Pipeline model for APM XGene-1. gcc/ChangeLog| 14 + gcc/config/aarch64/aarch64-cores.def | 1 + gcc/config/aarch64/aarch64-tune.md | 2 +- gcc/config/aarch64/aarch64.c | 62 gcc/config/aarch64/aarch64.md| 3 +- gcc/config/arm/aarch-cost-tables.h | 101 +++ gcc/config/arm/xgene1.md | 532 +++ gcc/doc/invoke.texi | 3 +- 8 files changed, 715 insertions(+), 3 deletions(-) create mode 100644 gcc/config/arm/xgene1.md -- 1.9.1
Re: [PATCH 1/2] Core definition for APM XGene-1 and associated cost-table.
On Fri, Nov 21, 2014 at 6:44 PM, Philipp Tomsich philipp.toms...@theobroma-systems.com wrote: To keep this change separately buildable from the pipeline model, this patch directs the APM XGene-1 to use the generic scheduling model. --- gcc/ChangeLog| 8 +++ gcc/config/aarch64/aarch64-cores.def | 1 + gcc/config/aarch64/aarch64-tune.md | 2 +- gcc/config/aarch64/aarch64.c | 62 + gcc/config/arm/aarch-cost-tables.h | 101 +++ gcc/doc/invoke.texi | 3 +- 6 files changed, 175 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2fa58ca..c9ac0d9 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2014-11-19 Philipp Tomsich philipp.toms...@theobroma-systems.com + + * config/aarch64/aarch64-cores.def (xgene1): Update/add the + xgene1 (APM XGene-1) core definition. + * gcc/config/aarch64/aarch64.c: Add cost tables for APM XGene-1 + * config/arm/aarch-cost-tables.h: Add cost tables for APM XGene-1 + * doc/invoke.texi: Document -mcpu=xgene1. Changes to config/arm/ files are ok as long as you've built a cross-compiler for arm-none-eabi and ensured that a build succeeds. regards Ramana + 2014-11-18 Maciej W. Rozycki ma...@codesourcery.com * config/mips/mips.md (compression): Add `micromips32' setting. diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 312941f..e553e50 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -37,6 +37,7 @@ AARCH64_CORE(cortex-a53, cortexa53, cortexa53, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC, cortexa53) AARCH64_CORE(cortex-a57, cortexa15, cortexa15, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC, cortexa57) AARCH64_CORE(thunderx,thunderx, thunderx, 8, AARCH64_FL_FPSIMD | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx) +AARCH64_CORE(xgene1, xgene1,xgene1,8, AARCH64_FL_FPSIMD, xgene1) /* V8 big.LITTLE implementations. */ diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index c717ea8..6409082 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr tune - cortexa53,cortexa15,thunderx,cortexa57cortexa53 + cortexa53,cortexa15,thunderx,xgene1,cortexa57cortexa53 (const (symbol_ref ((enum attr_tune) aarch64_tune diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4fec21e..9b92527 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -226,6 +226,27 @@ static const struct cpu_addrcost_table cortexa57_addrcost_table = #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ #endif +static const struct cpu_addrcost_table xgene1_addrcost_table = +{ +#if HAVE_DESIGNATED_INITIALIZERS + .addr_scale_costs = +#endif +{ + NAMED_PARAM (hi, 1), + NAMED_PARAM (si, 0), + NAMED_PARAM (di, 0), + NAMED_PARAM (ti, 1), +}, + NAMED_PARAM (pre_modify, 1), + NAMED_PARAM (post_modify, 0), + NAMED_PARAM (register_offset, 0), + NAMED_PARAM (register_extend, 1), + NAMED_PARAM (imm_offset, 0), +}; + +#if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 +__extension__ +#endif static const struct cpu_regmove_cost generic_regmove_cost = { NAMED_PARAM (GP2GP, 1), @@ -262,6 +283,17 @@ static const struct cpu_regmove_cost thunderx_regmove_cost = NAMED_PARAM (FP2FP, 4) }; +static const struct cpu_regmove_cost xgene1_regmove_cost = +{ + NAMED_PARAM (GP2GP, 1), + NAMED_PARAM (GP2FP, 8), + NAMED_PARAM (FP2GP, 8), + /* We currently do not provide direct support for TFmode Q-Q move. + Therefore we need to raise the cost above 2 in order to have + reload handle the situation. */ + NAMED_PARAM (FP2FP, 4) +}; + /* Generic costs for vector insn classes. */ #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ @@ -302,6 +334,26 @@ static const struct cpu_vector_cost cortexa57_vector_cost = NAMED_PARAM (cond_not_taken_branch_cost, 1) }; +/* Generic costs for vector insn classes. */ +#if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 +__extension__ +#endif +static const struct cpu_vector_cost xgene1_vector_cost = +{ + NAMED_PARAM (scalar_stmt_cost, 1), + NAMED_PARAM (scalar_load_cost, 5), + NAMED_PARAM (scalar_store_cost, 1), + NAMED_PARAM (vec_stmt_cost, 2), + NAMED_PARAM (vec_to_scalar_cost, 4), + NAMED_PARAM (scalar_to_vec_cost, 4), + NAMED_PARAM (vec_align_load_cost, 10), + NAMED_PARAM (vec_unalign_load_cost, 10), + NAMED_PARAM (vec_unalign_store_cost, 2), + NAMED_PARAM (vec_store_cost, 2), + NAMED_PARAM (cond_taken_branch_cost, 2),
Re: [PATCH 2/2] Pipeline model for APM XGene-1.
On Fri, Nov 21, 2014 at 6:44 PM, Philipp Tomsich philipp.toms...@theobroma-systems.com wrote: --- gcc/ChangeLog | 6 + gcc/config/aarch64/aarch64.md | 3 +- gcc/config/arm/xgene1.md | 532 ++ 3 files changed, 540 insertions(+), 1 deletion(-) create mode 100644 gcc/config/arm/xgene1.md diff --git a/gcc/ChangeLog b/gcc/ChangeLog index c9ac0d9..dad2278 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,11 @@ 2014-11-19 Philipp Tomsich philipp.toms...@theobroma-systems.com + * config/aarch64/aarch64.md: Include xgene1.md. + (generic_sched): Set to no for xgene1. + * config/arm/xgene1.md: New file. + config/arm/xgene1.md is fine as long as you've built a cross-compiler (building cc1 / cc1plus is enough) to arm-none-eabi to ensure no build breakage. Please wait for Marcus or Richard to approve the AArch64 bits. regards Ramana +2014-11-19 Philipp Tomsich philipp.toms...@theobroma-systems.com + * config/aarch64/aarch64-cores.def (xgene1): Update/add the xgene1 (APM XGene-1) core definition. * gcc/config/aarch64/aarch64.c: Add cost tables for APM XGene-1 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 597ff8c..1b36384 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -191,7 +191,7 @@ (define_attr generic_sched yes,no (const (if_then_else - (eq_attr tune cortexa53,cortexa15,thunderx) + (eq_attr tune cortexa53,cortexa15,thunderx,xgene1) (const_string no) (const_string yes @@ -199,6 +199,7 @@ (include ../arm/cortex-a53.md) (include ../arm/cortex-a15.md) (include thunderx.md) +(include ../arm/xgene1.md) ;; --- ;; Jumps and other miscellaneous insns diff --git a/gcc/config/arm/xgene1.md b/gcc/config/arm/xgene1.md new file mode 100644 index 000..563a959 --- /dev/null +++ b/gcc/config/arm/xgene1.md @@ -0,0 +1,532 @@ +;; Machine description for AppliedMicro xgene1 core. +;; Copyright (C) 2012-2014 Free Software Foundation, Inc. +;; Contributed by Theobroma Systems Design und Consulting GmbH. +;;See http://www.theobroma-systems.com for more info. +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, but +;; WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;; General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; http://www.gnu.org/licenses/. + +;; Pipeline description for the xgene1 micro-architecture + +(define_automaton xgene1) + +(define_cpu_unit xgene1_decode_out0 xgene1) +(define_cpu_unit xgene1_decode_out1 xgene1) +(define_cpu_unit xgene1_decode_out2 xgene1) +(define_cpu_unit xgene1_decode_out3 xgene1) + +(define_cpu_unit xgene1_divide xgene1) +(define_cpu_unit xgene1_fp_divide xgene1) +(define_cpu_unit xgene1_fsu xgene1) +(define_cpu_unit xgene1_fcmp xgene1) + +(define_reservation xgene1_decode1op +( xgene1_decode_out0 ) +|( xgene1_decode_out1 ) +|( xgene1_decode_out2 ) +|( xgene1_decode_out3 ) +) +(define_reservation xgene1_decode2op +( xgene1_decode_out0 + xgene1_decode_out1 ) +|( xgene1_decode_out0 + xgene1_decode_out2 ) +|( xgene1_decode_out0 + xgene1_decode_out3 ) +|( xgene1_decode_out1 + xgene1_decode_out2 ) +|( xgene1_decode_out1 + xgene1_decode_out3 ) +|( xgene1_decode_out2 + xgene1_decode_out3 ) +) +(define_reservation xgene1_decodeIsolated +( xgene1_decode_out0 + xgene1_decode_out1 + xgene1_decode_out2 + xgene1_decode_out3 ) +) + +(define_insn_reservation xgene1_branch 1 + (and (eq_attr tune xgene1) + (eq_attr type branch)) + xgene1_decode1op) + +(define_insn_reservation xgene1_nop 1 + (and (eq_attr tune xgene1) + (eq_attr type no_insn)) + xgene1_decode1op) + +(define_insn_reservation xgene1_call 1 + (and (eq_attr tune xgene1) + (eq_attr type call)) + xgene1_decode2op) + +(define_insn_reservation xgene1_f_load 10 + (and (eq_attr tune xgene1) + (eq_attr type f_loadd,f_loads)) + xgene1_decode2op) + +(define_insn_reservation xgene1_f_store 4 + (and (eq_attr tune xgene1) + (eq_attr type f_stored,f_stores)) + xgene1_decode2op) + +(define_insn_reservation xgene1_fmov 2 + (and (eq_attr tune xgene1) + (eq_attr type fmov,fconsts,fconstd)) + xgene1_decode1op)
Re: [Ping]Re: [PR63762][4.9] Backport the patch which fixes GCC generates UNPREDICTABLE STR with Rn = Rt for arm
CCing release maintainers as well as they need to approve this backport if Vlad is happy with it. Vlad - is this ok to go back as it fixes a bug for ARM in the 4.9 tree that came up in building bits of debian. Ramana On Mon, Dec 1, 2014 at 5:24 PM, Renlin Li renlin...@arm.com wrote: On 01/12/14 15:58, H.J. Lu wrote: On Thu, Nov 27, 2014 at 8:38 AM, Renlin Li renlin...@arm.com wrote: On 27/11/14 15:37, H.J. Lu wrote: On Thu, Nov 27, 2014 at 7:32 AM, Renlin Li renlin...@arm.com wrote: On 26/11/14 18:12, H.J. Lu wrote: On Wed, Nov 26, 2014 at 10:09 AM, Renlin Li renlin...@arm.com wrote: On 26/11/14 12:16, H.J. Lu wrote: On Wed, Nov 26, 2014 at 4:07 AM, Renlin Li renlin...@arm.com wrote: On 20/11/14 16:17, Renlin Li wrote: Hi all, This is a backport for gcc-4_9-branch of the patch [PR63762]GCC generates UNPREDICTABLE STR with Rn = Rt for arm posted in: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html arm-none-eabi has been test on the model, no new issues. bootstrapping and regression tested on x86, no new issues. Is it Okay for gcc-4_9-branch? gcc/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-20 Renlin Li renlin...@arm.com PR middle-end/63762 * gcc.dg/pr63762.c: New. Ping for it. Please verify if it is the real fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63661 If yes, please add a testcase for PR 63661 and mention it in your ChangeLog entry. Thanks. Hi H.J. Yes, I have verified that, this patch additionally fixes PR 63661. I observed the same behaviour as I saw on arm backend. It will be great if you can double check they are caused by exactly the same reason. I will ask our people to take a look. A new testcase has been added, ChangeLog has been updated to reflect the change. Updated patch has bee attached. Okay for gcc-4_9-branch? Regards, Renlin Li gcc/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63762 PR middle-end/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-26 Renlin Li renlin...@arm.com PR middle-end/63661 PR middle-end/63762 * testsuite/gcc.dg/pr63661.c: New. * testsuite/gcc.dg/pr63762.c: New. pr63661.c should be moved to gcc.target/i386 and run it on PIC target. Thanks. Hi H.J. The patch has been adjusted according to your suggestion. gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * testsuite/gcc.dg/pr63762.c: New. * testsuite/gcc.target/i386/pr63661.c: New. ^^^ No testsuite/ Hi H.J. gcc.target/i386/pr63661.c should be checked into trunk first. A separate patch is sent to mailing list for this. ChangeLog has been corrected. Regards, Renlin Li gcc/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-11-27 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * gcc.dg/pr63762.c: New. * gcc.target/i386/pr63661.c: New. You need to update gcc.target/i386/pr63661.c from trunk. Update the test case for pr63661 from trunk. Okay for gcc-4_9-branch? gcc/ChangeLog: 2014-12-01 Renlin Li renlin...@arm.com PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ChangeLog: 2014-12-01 Renlin Li renlin...@arm.com H.J Lu hongjiu...@intel.com PR middle-end/63762 PR target/63661 * gcc.dg/pr63762.c: New. * gcc.target/i386/pr63661.c: New.
[Patch, committed] contrib/download_prerequisites - download ISL 0.14
Committed as Rev. 218294, now that ISL 0.14 is supported by GCC 5 and ISL 0.14 is in the infrastructure directory. Tobias Index: contrib/ChangeLog === --- contrib/ChangeLog (Revision 218293) +++ contrib/ChangeLog (Arbeitskopie) @@ -1,3 +1,7 @@ +2014-12-02 Tobias Burnus bur...@net-b.de + + * download_prerequisites: Download ISL 0.14 instead of 0.12.2. + 2014-11-25 Tom de Vries t...@codesourcery.com Peter Bergner berg...@vnet.ibm.com Index: contrib/download_prerequisites === --- contrib/download_prerequisites (Revision 218293) +++ contrib/download_prerequisites (Arbeitskopie) @@ -43,7 +43,7 @@ ln -sf $MPC mpc || exit 1 # Necessary to build GCC with the Graphite loop optimizations. if [ $GRAPHITE_LOOP_OPT = yes ] ; then - ISL=isl-0.12.2 + ISL=isl-0.14 wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1 tar xjf $ISL.tar.bz2 || exit 1
Re: PR 13631 Problems in messages
So here is another proposal with all your remarks implemented. 2014-12-02 François Dumont fdum...@gcc.gnu.org DR libstdc++/13631 * include/bits/codecvt.h (codecvtchar, char, mbstate_t): friend class std::messageschar. (codecvtwchar_t, char, mbstate_t): friend class std::messageswchar_t. * config/locale/gnu/messages_member.h (messageschar::do_open): Specialized. (messageschar::do_close): Likewise. (messageswchar_t::do_open): Likewise. (messageswchar_t::do_close): Likewise. * config/locale/gnu/messages_member.cc: (messageschar::do_open): Implement. Use bind_textdomain_codeset based on codecvtchar, char, mbstate_t._M_c_locale_codecvt code set. Use internal cache to keep opened domain name with locale information. (messageswchar_t::do_open): Likewise with codecvtwchar_t, char, mbstate_t. (messageschar::do_close): Implement. Clean cache information. (messageswchar_t::do_close): Likewise. (get_glibc_msg): New. Use dgettext rather than gettext using cached domain name associated to catalog id. (messageschar::do_get): Use latter. (messageswchar_t::do_get): Likewise and use also cached locale codecvtwchar_t, char, mbstate_t facet to convert wchar_t default value to char and the result back to wchar_t. * testsuite/22_locale/messages/13631.cc: New. * testsuite/22_locale/messages/members/char/2.cc: Use also fr_FR locale for charset conversion to get the expected accented character. Tested under Linux x86_64. Ok to commit ? François On 02/12/2014 11:55, Jonathan Wakely wrote: If some user has got code that uses messagessigned char and they provide a definition for messagessigned char::do_get() their code will break if do_open and do_close disappear. (Realistically I doubt anyone is doing that though, it may not even work.) I restored default implementation, it does no harm. This is not exception safe. You can use auto_ptr to fix it: std::auto_ptrCatalog_info info(new Catalog_info(_M_catalog_counter++, __domain, __l)); _M_infos.push_back(info.get()); return info.release()-_M_id Good catch. Index: config/locale/gnu/messages_members.cc === --- config/locale/gnu/messages_members.cc (revision 218285) +++ config/locale/gnu/messages_members.cc (working copy) @@ -31,20 +31,123 @@ #include locale #include bits/c++locale_internal.h -namespace std _GLIBCXX_VISIBILITY(default) +#include limits +#include algorithm +#include vector + +#include backward/auto_ptr.h +#include ext/concurrence.h + +namespace { -_GLIBCXX_BEGIN_NAMESPACE_VERSION + using namespace std; - // Specializations. - template -string -messageschar::do_get(catalog, int, int, const string __dfault) const + typedef messages_base::catalog catalog; + + struct Catalog_info { +Catalog_info(catalog __id, const string __domain, locale __loc) + : _M_id(__id), _M_domain(__domain), _M_locale(__loc) +{ } + +catalog _M_id; +string _M_domain; +locale _M_locale; + }; + + class Catalogs + { + public: +Catalogs() : _M_catalog_counter(0) { } + +~Catalogs() +{ + for (vectorCatalog_info*::iterator __it = _M_infos.begin(); + __it != _M_infos.end(); ++__it) + delete *__it; +} + +catalog +_M_add(const string __domain, locale __l) +{ + __gnu_cxx::__scoped_lock lock(_M_mutex); + + // The counter is not likely to roll unless catalogs keep on being + // opened/closed which is consider as an application mistake for the + // moment. + if (_M_catalog_counter == numeric_limitscatalog::max()) + return -1; + + std::auto_ptrCatalog_info info(new Catalog_info(_M_catalog_counter++, + __domain, __l)); + _M_infos.push_back(info.get()); + return info.release()-_M_id; +} + +void +_M_erase(catalog __c) +{ + __gnu_cxx::__scoped_lock lock(_M_mutex); + + vectorCatalog_info*::iterator __res = + lower_bound(_M_infos.begin(), _M_infos.end(), __c, _Comp()); + if (__res == _M_infos.end() || (*__res)-_M_id != __c) + return; + + delete *__res; + _M_infos.erase(__res); + + // Just in case closed catalog was the last open. + if (__c == _M_catalog_counter - 1) + --_M_catalog_counter; +} + +const Catalog_info* +_M_get(catalog __c) const +{ + __gnu_cxx::__scoped_lock lock(_M_mutex); + + vectorCatalog_info*::const_iterator __res = + lower_bound(_M_infos.begin(), _M_infos.end(), __c, _Comp()); + + if (__res != _M_infos.end() (*__res)-_M_id == __c) + return *__res; + + return 0; +} + + private: +struct _Comp +{ + bool operator()(catalog __cat, const Catalog_info* __info) const + { return __cat __info-_M_id; } + + bool operator()(const Catalog_info* __info, catalog __cat) const + { return __info-_M_id __cat; } +
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? if (current_tune-fuseable_ops ARM_FUSE_MOVW_MOVT) +{ + /* We are trying to fuse + movw imm / movt imm + instructions as a group that gets scheduled together. */ + A comment here about the insn structure would be useful. + set_dest = SET_DEST (curr_set); + if (GET_CODE (set_dest) == ZERO_EXTRACT) +{ + if (CONST_INT_P (SET_SRC (curr_set)) + CONST_INT_P (SET_SRC (prev_set)) + REG_P (XEXP (set_dest, 0)) + REG_P (SET_DEST (prev_set)) + REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set))) +return true; +} + else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM +REG_P (SET_DEST (curr_set)) +REG_P (SET_DEST (prev_set)) +GET_CODE (SET_SRC (prev_set)) == HIGH +REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set))) +{ + return true; +} Can we add a fast path exit to be if (GET_MODE (set_dest) != SImode) return false; I did think whether we wanted to use reg_overlap_mentioned_p as that may simplify the logic a bit but that's overkill here as we still want to restrict it to the cases above. Otherwise OK. Ramana +} + return false; Thanks, Kyrill 2014-11-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm-protos.h (tune_params): Add fuseable_ops field. * config/arm/arm.c (arm_macro_fusion_p): New function. (arm_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (ARM_FUSE_NOTHING): Likewise. (ARM_FUSE_MOVW_MOVT): Likewise. (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune, arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune, arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune arm_cortex_a5_tune): Specify fuseable_ops value.
Re: [PATCH driver/diagnostics] init color earlier, add color to driver
On Tue, 2 Dec 2014, Manuel López-Ibáñez wrote: 2014-12-02 Manuel López-Ibáñez m...@gcc.gnu.org * diagnostic.c (diagnostic_color_init): New. * diagnostic.h: Declare. * gcc.c (driver::global_initializations): Use it. (driver_handle_option): Handle -fdiagnostics-color_. * toplev.c: Do not include diagnostic-color.h. (process_options): Do not initialize color diagnostics here. * common.opt (fdiagnostics-color=): Add Driver. * opts-global.c (init_options_once): Initialize color here. The gcc.c / common.opt / opts-global.c changes are OK. -- Joseph S. Myers jos...@codesourcery.com
[PATCH obvious/diagnostics] Honor override_column when printing the caret
libcpp uses diagnostic-override_column to give a custom column number to diagnostics. This is taken into account when building the prefix, but it was missing when placing the caret. Before: /home/manuel/override_column.c:1:4: warning: /* within comment [-Wcomment] /* /* */ ^ After: /home/manuel/override_column.c:1:4: warning: /* within comment [-Wcomment] /* /* */ ^ Committed as obvious in r218295. 2014-12-03 Manuel López-Ibáñez m...@gcc.gnu.org * diagnostic.c (diagnostic_show_locus): Honor override_column when placing the caret. --- gcc/diagnostic.c(revision 218278) +++ gcc/diagnostic.c(working copy) @@ -308,10 +308,12 @@ diagnostic_show_locus (diagnostic_contex || diagnostic-location == context-last_location) return; context-last_location = diagnostic-location; s = expand_location_to_spelling_point (diagnostic-location); + if (diagnostic-override_column) +s.column = diagnostic-override_column; line = location_get_source_line (s, line_width); if (line == NULL || s.column line_width) return;
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
On Tue, Dec 2, 2014 at 2:14 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Dec 2, 2014 at 12:19 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Dec 02, 2014 at 12:16:09PM -0800, H.J. Lu wrote: IMO, nobody will use this highly special x86_64-only option. It would be best for gnu-ld to reach feature parity with gold as far as this functionality is concerned. In this case, the optimization would be auto-configured, and would fire automatically, without any user intervention. I will implement it in ld after its support is checked into GCC. I think it would be better to do it the other way around, so that gcc can be configured against the right ld from the start. Consider it is done. I will check it into binutils ths week. It is on binutils master branch now: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=9a926d55ab4b6667f6c35b518d59b902fe490d9d -- H.J.