Re: [PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY
On 2014/6/18 上午 06:26, Ramana Radhakrishnan wrote: On Sun, Jun 8, 2014 at 12:27 PM, Chung-Lin Tang clt...@codesourcery.com wrote: Hi Richard, Ramana, Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb -march=armv6-m to be exact). Basically under those conditions, the thunk is in Thumb mode, so the subtraction should be 4 rather than 8. Yep, this is OK with a minor change to the comment to make it more explicit. + /* Output .word .LTHUNKn-[37]-.LTHUNKPCn. */ s/37/3,7/ Ok with that change and if no regressions. OK for release branches unless the RM's object in 24 hours. Re-tested on a recent trunk, verified g++.jason/thunk2.C resolved with patch and no regressions. Committed on trunk and backported to 4.8, 4.9 branches. Thanks, Chung-Lin
Re: [PATCH, ARM] Cortex-A9 MPCore volatile load workaround
Ping. On 2014/6/9 10:03 PM, Chung-Lin Tang wrote: Hi Richard, As we talked about earlier, here's a patch to add a compiler option to work around Cortex-A9 MPCore errata 761319: http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf What the option does basically, is to scan for volatile loads during reorg, and add a dmb barrier after it. It also strives to make dmb conditionally executed under TARGET_THUMB2, which means a new Thumb-2 specific *memory_barrier_t2 pattern in sync.md, with adjusted conds/predicable attributes and %? in output strings. Patch originally written by Julian, with additions by Meador, and finally a few trivial adjustments by me. Again, we've been carrying this fix for a release or two. Okay for trunk? Thanks, Chung-Lin 2014-06-09 Julian Brown jul...@codesourcery.com Meador Inge mead...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com * config/arm/arm.c (arm_option_override): Emit warning if -mfix-cortex-a9-volatile-hazards is used on an incompatible CPU. (any_volatile_loads_p): New. (arm_cortex_a9_errata_reorg): New. (arm_reorg): Call arm_cortex_a9_errata_reorg. * config/arm/arm.opt (mfix-cortex-a9-volatile-hazards): Add option. * config/arm/sync.md (*memory_barrier): Don't use on Thumb-2. (*memory_barrier_t2): New, allow conditional execution on Thumb-2. * doc/invoke.texi (-mfix-cortex-a9-volatile-hazards): Add documentation. testsuite/ * lib/target-supports.exp (check_effective_target_arm_dmb): New. * gcc.target/arm/a9-volatile-ordering-erratum-1.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-2.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-3.c: New test. * gcc.target/arm/a9-volatile-ordering-erratum-4.c: New test.
Move DECL_INIT_PRIORITY/FINI_PRIORITY to symbol table
Hi, this patch moves init and fini priorities to symbol table instead of trees. They are already in on-side hashtables, but the hashtables are now maintaned by symbol table. This is needed for correctness with LTO. Currently tree merging may load declaration with priority and then ggc_free it creating a stale entry in the hashtable. This is usually not problem, because ctor declarations are usually static, but it is not safe. I would really like to have template for such a sparse annotations to symbols (writting our old school pch friendly hashtable is not a fun) but I am not sure I can get it done in GGC/PCH safe way. Is user marking working for PCH? Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza * cgraph.h (struct symtab_node): Add field in_init_priority_hash (set_init_priority, get_init_priority, set_fini_priority, get_fini_priority): New methods. * tree.c (init_priority_for_decl): Remove. (init_ttree): Do not initialize init priority. (decl_init_priority_lookup, decl_fini_priority_lookup): Rewrite. (decl_priority_info): Remove. (decl_init_priority_insert): Rewrite. (decl_fini_priority_insert): Rewrite. * tree.h (tree_priority_map_eq, tree_priority_map_hash, tree_priority_map_marked_p): Remove. * lto-cgraph.c (lto_output_node, input_node): Stream init priorities. * lto-streamer-out.c (hash_tree): Do not hash priorities. * tree-streamer-out.c (pack_ts_decl_with_vis_value_fields): Do not output priorities. (pack_ts_function_decl_value_fields): Likewise. * tree-streamer-in.c (unpack_ts_decl_with_vis_value_fields): Do not input priorities. (unpack_ts_function_decl_value_fields): Likewise. * symtab.c (symbol_priority_map): Declare. (init_priority_hash): Declare. (symtab_unregister_node): Unregister from priority hash, too. (symtab_node::get_init_priority, cgraph_node::get_fini_priority): New methods. (symbol_priority_map_eq, symbol_priority_map_hash): New functions. (symbol_priority_info): New function. (symtab_node::set_init_priority, cgraph_node::set_fini_priority): New methods. * tree-core.h (tree_priority_map): Remove. * lto.c (compare_tree_sccs_1): Do not compare priorities. Index: cgraph.h === --- cgraph.h(revision 211831) +++ cgraph.h(working copy) @@ -130,6 +130,8 @@ public: /* Set when symbol has address taken. */ unsigned address_taken : 1; + /* Set when init priority is set. */ + unsigned in_init_priority_hash : 1; /* Ordering of all symtab entries. */ @@ -163,6 +165,7 @@ public: return x_comdat_group; } + /* Return comdat group as identifier_node. */ tree get_comdat_group_id () { if (x_comdat_group TREE_CODE (x_comdat_group) != IDENTIFIER_NODE) @@ -208,6 +211,9 @@ public: /* Set section for symbol and its aliases. */ void set_section (const char *section); void set_section_for_node (const char *section); + + void set_init_priority (priority_type priority); + priority_type get_init_priority (); }; enum availability @@ -497,6 +503,9 @@ public: /* True if this decl calls a COMDAT-local function. This is set up in compute_inline_parameters and inline_call. */ unsigned calls_comdat_local : 1; + + void set_fini_priority (priority_type priority); + priority_type get_fini_priority (); }; Index: tree.c === --- tree.c (revision 211831) +++ tree.c (working copy) @@ -219,10 +219,6 @@ static GTY ((if_marked (tree_decl_map_m static GTY ((if_marked (tree_vec_map_marked_p), param_is (struct tree_vec_map))) htab_t debug_args_for_decl; -static GTY ((if_marked (tree_priority_map_marked_p), -param_is (struct tree_priority_map))) - htab_t init_priority_for_decl; - static void set_type_quals (tree, int); static int type_hash_eq (const void *, const void *); static hashval_t type_hash_hash (const void *); @@ -573,8 +569,6 @@ init_ttree (void) value_expr_for_decl = htab_create_ggc (512, tree_decl_map_hash, tree_decl_map_eq, 0); - init_priority_for_decl = htab_create_ggc (512, tree_priority_map_hash, - tree_priority_map_eq, 0); int_cst_hash_table = htab_create_ggc (1024, int_cst_hash_hash, int_cst_hash_eq, NULL); @@ -6492,13 +6486,12 @@ tree_decl_map_hash (const void *item) priority_type decl_init_priority_lookup (tree decl) { - struct tree_priority_map *h; - struct tree_map_base in; + symtab_node *snode = symtab_get_node (decl); - gcc_assert (VAR_OR_FUNCTION_DECL_P (decl)); - in.from = decl; - h = (struct tree_priority_map *) htab_find
Delay RTL initialization until it is really needed
Hi, IRA initialization shows high in profiles even when building lto objects. This patch simply delays RTL backend initialization until we really decide to output a function. In some cases this avoids the initialization completely (like in the case of LTO but also user target attributes) and there is some hope for better cache locality. Basic idea is to have two flags saying whether lang and target dependent bits needs initialization and check it when starting function codegen. Bootstrapped/regtested x86_64-linux, testing also at AIX. Ok if it passes? Honza * toplev.c (backend_init_target): Move init_emit_regs and init_regs to... (backend_init) ... here; skip ira_init_once and backend_init_target. (target_reinit) ... and here; clear this_target_rtl-lang_dependent_initialized. (lang_dependent_init_target): Clear this_target_rtl-lang_dependent_initialized; break out rtl initialization to ... (initialize_rtl): ... here; call also backend_init_target and ira_init_once. * toplev.h (initialize_rtl): New function. * function.c: Include toplev.h (init_function_start): Call initialize_rtl. * rtl.h (target_rtl): Add target_specific_initialized, lang_dependent_initialized. Index: toplev.c === --- toplev.c(revision 211837) +++ toplev.c(working copy) @@ -1583,14 +1583,6 @@ backend_init_target (void) /* Initialize alignment variables. */ init_alignments (); - /* This reinitializes hard_frame_pointer, and calls init_reg_modes_target() - to initialize reg_raw_mode[]. */ - init_emit_regs (); - - /* This invokes target hooks to set fixed_reg[] etc, which is - mode-dependent. */ - init_regs (); - /* This depends on stack_pointer_rtx. */ init_fake_stack_mems (); @@ -1632,9 +1624,13 @@ backend_init (void) init_varasm_once (); save_register_info (); - /* Initialize the target-specific back end pieces. */ - ira_init_once (); - backend_init_target (); + /* Middle end needs this initialization for default mem attributes + used by early calls to make_decl_rtl. */ + init_emit_regs (); + + /* Middle end needs this initialization for mode tables used to assign + modes to vector variables. */ + init_regs (); } /* Initialize excess precision settings. */ @@ -1686,6 +1682,31 @@ lang_dependent_init_target (void) front end is initialized. It also depends on the HAVE_xxx macros generated from the target machine description. */ init_optabs (); + this_target_rtl-lang_dependent_initialized = false; +} + +/* Perform initializations that are lang-dependent or target-dependent. + but matters only for late optimizations and RTL generation. */ + +void +initialize_rtl (void) +{ + static int initialized_once; + + /* Initialization done just once per compilation, but delayed + till code generation. */ + if (!initialized_once) +ira_init_once (); + initialized_once = true; + + /* Target specific RTL backend initialization. */ + if (!this_target_rtl-target_specific_initialized) +backend_init_target (); + this_target_rtl-target_specific_initialized = true; + + if (this_target_rtl-lang_dependent_initialized) +return; + this_target_rtl-lang_dependent_initialized = true; /* The following initialization functions need to generate rtl, so provide a dummy function context for them. */ @@ -1784,8 +1805,15 @@ target_reinit (void) regno_reg_rtx = NULL; } - /* Reinitialize RTL backend. */ - backend_init_target (); + this_target_rtl-lang_dependent_initialized = false; + + /* This initializes hard_frame_pointer, and calls init_reg_modes_target() + to initialize reg_raw_mode[]. */ + init_emit_regs (); + + /* This invokes target hooks to set fixed_reg[] etc, which is + mode-dependent. */ + init_regs (); /* Reinitialize lang-dependent parts. */ lang_dependent_init_target (); Index: toplev.h === --- toplev.h(revision 211837) +++ toplev.h(working copy) @@ -77,4 +77,6 @@ extern bool set_src_pwd (const c extern HOST_WIDE_INT get_random_seed (bool); extern const char *set_random_seed (const char *); +extern void initialize_rtl (void); + #endif /* ! GCC_TOPLEV_H */ Index: function.c === --- function.c (revision 211837) +++ function.c (working copy) @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3. #include params.h #include bb-reorder.h #include shrink-wrap.h +#include toplev.h /* So we can assign to cfun in this file. */ #undef cfun @@ -4630,6 +4631,10 @@ init_function_start (tree subr) set_cfun (DECL_STRUCT_FUNCTION (subr)); else allocate_struct_function (subr, false); + + /* Initialize backend, if needed. */ + initialize_rtl (); +
Re: [PATCH][AArch64] Fix some saturating math NEON intrinsics types
On 16 June 2014 15:26, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, I noticed that a few saturating math intrinsics in arm_neon.h for aarch64 have the wrong types, i.e. not what's mandated by the ACLE spec. This patch fixes that by adjusting the types of the builtin functions that those intrinsics map to (and in the process cleaning up the VCON iterator) and adding tests for the affected intrinsics. I realise it's quite big, but the changes are mostly uniform. Bootstrapped and tested aarch64-none-linux-gnu. Ok for trunk? OK, can you prepare a 4.9 backport? Cheers /Marcus
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Thu, Jun 19, 2014 at 07:19:31PM +0200, Jakub Jelinek wrote: + case IFN_UBSAN_BOUNDS: + ubsan_expand_bounds_btn (gsi); + break; default: Why *_btn instead of *_ifn ? Remnant from when I was using __builtin.ubsan instead of the internal call. Fixed. +static tree +ubsan_walk_array_refs_r (tree *tp, int *walk_subtrees, void *data) +{ + struct pointer_set_t *pset = (struct pointer_set_t *) data; + + if (TREE_CODE (*tp) == BIND_EXPR) +{ I think it would be worth adding here a comment why do you handle BIND_EXPR here, that it doesn't walk the vars, but only their initializers etc. and thus in order to prevent walking DECL_INITIAL of TREE_STATIC decls we have to duplicate this part of walk_tree. Done. + for (tree decl = BIND_EXPR_VARS (*tp); decl; decl = DECL_CHAIN (decl)) + { + if (TREE_STATIC (decl)) + { + *walk_subtrees = 0; + continue; + } + walk_tree (DECL_INITIAL (decl), ubsan_walk_array_refs_r, NULL, pset); + walk_tree (DECL_SIZE (decl), ubsan_walk_array_refs_r, NULL, pset); + walk_tree (DECL_SIZE_UNIT (decl), ubsan_walk_array_refs_r, NULL, pset); Shouldn't that use pset, pset); or data, pset); ? Also, too long lines (at least the last one, first one likely too). Oops, fixed. + tree bound = TYPE_MAX_VALUE (domain); + if (ignore_off_by_one) +bound = fold_build2 (PLUS_EXPR, TREE_TYPE (bound), bound, +build_int_cst (TREE_TYPE (bound), 1)); + + /* Detect flexible array members and suchlike. */ + tree base = get_base_address (array); + if (base TREE_CODE (base) == INDIRECT_REF) I'd check also == MEM_REF here, while the FEs often use INDIRECT_REFs, there are already spots where it creates MEM_REFs. Fixed. +void +ubsan_maybe_instrument_array_ref (tree *expr_p, bool ignore_off_by_one) +{ + if (!ubsan_array_ref_instrumented_p (*expr_p) + current_function_decl != 0 Please use != NULL_TREE. Ok. + !lookup_attribute (no_sanitize_undefined, + DECL_ATTRIBUTES (current_function_decl))) +{ + tree t = copy_node (*expr_p); + tree op0 = TREE_OPERAND (t, 0); + tree op1 = TREE_OPERAND (t, 1); Please don't call copy_node until you know you want to instrument it. I.e. tree op0 = TREE_OPERAND (*expr_p, 0); tree op1 = TREE_OPERAND (*expr_p, 1); + tree e = ubsan_instrument_bounds (EXPR_LOCATION (t), op0, op1, s/t/*expr_p/ above. + ignore_off_by_one); + if (e != NULL_TREE) + { and only here add: tree t = copy_node (*expr_p); + TREE_OPERAND (t, 1) = build2 (COMPOUND_EXPR, TREE_TYPE (op1), + e, op1); + *expr_p = t; + } +} +} Fixed as well. --- gcc/tree-pretty-print.c +++ gcc/tree-pretty-print.c @@ -3218,6 +3218,13 @@ print_call_name (pretty_printer *buffer, tree node, int flags) { tree op0 = node; + if (node == NULL_TREE) +{ + /* TODO Print builtin name. */ + pp_string (buffer, internal function call); Use internal_fn_name function? Uh, not sure how I missed that. I print the internal function before calling print_call_name, since that gets NULL node - and we can't determine CALL_EXPR_IFN. I added some docs as well, as promised. 2014-06-20 Marek Polacek pola...@redhat.com * asan.c (pass_sanopt::execute): Handle IFN_UBSAN_BOUNDS. * flag-types.h (enum sanitize_code): Add SANITIZE_BOUNDS and or it into SANITIZE_UNDEFINED. * doc/invoke.texi: Describe -fsanitize=bounds. * gimplify.c (gimplify_call_expr): Add gimplification of internal functions created in the FEs. * internal-fn.c: Move internal-fn.h after tree.h. (expand_UBSAN_BOUNDS): New function. * internal-fn.def (UBSAN_BOUNDS): New internal function. * internal-fn.h: Don't define internal functions here. * opts.c (common_handle_option): Add -fsanitize=bounds. * sanitizer.def (BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS, BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT): Add. * tree-core.h: Define internal functions here. (struct tree_base): Add ifn field. * tree-pretty-print.c: Include internal-fn.h. (dump_generic_node): Handle functions without CALL_EXPR_FN. * tree.c (get_callee_fndecl): Likewise. (build_call_expr_internal_loc): New function. * tree.def (CALL_EXPR): Update description. * tree.h (CALL_EXPR_IFN): Define. (build_call_expr_internal_loc): Declare. * ubsan.c (get_ubsan_type_info_for_type): Return 0 for non-arithmetic types. (ubsan_type_descriptor): Change bool parameter to enum ubsan_print_style. Adjust the code. Add handling of UBSAN_PRINT_ARRAY.
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Fri, Jun 20, 2014 at 10:43:04AM +0200, Marek Polacek wrote: + +/* Internal function code. */ +ENUM_BITFIELD(internal_fn) ifn : 5; Any reason for the : 5 here? I mean, the union also contains unsigned int, so it doesn't hurt if you use full 32 bits for it there, and it should be faster and you won't run into problems when we'll have more than 32 internal functions. Otherwise the patch looks good to me, but please wait for comments from Joseph and/or Jason. Jakub
[linaro/gcc-4_9-branch] AArch64 costs model backports
Hi all, we have backported a set of AArch64 costs model related revisions in the linaro/gcc-4_9-branch at r211843. The backported revisions are: 210493 : [AArch64 costs 1/18] Refactor aarch64_address_costs. 210494 : [AArch64 costs 2/18] Add cost tables for Cortex-A57 210495 : [AArch64 costs 3/18] Wrap aarch64_rtx_costs to dump verbose output 210496 : [AArch64 costs 4/18] Better estimate cost of building a constant 210497 : [AArch64 costs 5/18] Factor out common MULT cases 210498 : [AArch64 costs 6/18] Set default costs and handle vector modes. 210499 : [AArch64 costs 7/18] Improve SET cost. 210500 : [AArch64 costs 8/18] Cost memory accesses using address costs 210501 : [AArch64 costs 9/18] Better cost logical operations 210502 : [AArch64 costs 10/18] Improve costs for sign/zero extend operations 210503 : [AArch64 costs 11/18] Improve costs for rotate and shift operations. 210504 : [AArch64 costs 12/18] Improve costs for sign/zero extracts 210505 : [AArch64 costs 13/18] Improve costs for div/mod 210506 : [AArch64 costs 14/18] Cost comparisons, flag setting operators and IF_THEN_ELSE 210507 : [AArch64 costs 15/18] Cost more Floating point RTX. 210508 : [AArch64 costs 16/18] Cost TRUNCATE 210509 : [AArch64 costs 17/18] Cost for SYMBOL_REF, HIGH and LO_SUM 210510 : [AArch64 costs 18/18] Dump a message if we are unable to cost an insn. 210512 : [AArch64 costs] Fixup to costing of FNMUL 211205 : aarch64_if_then_else_costs refactor 211206 : aarch64_if_then_else_costs Thanks, Yvan
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Fri, Jun 20, 2014 at 10:57:47AM +0200, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 10:43:04AM +0200, Marek Polacek wrote: + +/* Internal function code. */ +ENUM_BITFIELD(internal_fn) ifn : 5; Any reason for the : 5 here? I mean, the union also contains unsigned int, so it doesn't hurt if you use full 32 bits for it there, and it should be faster and you won't run into problems when we'll have more than 32 internal functions. The sole reason was that all other ENUM_BITFIELDs have it - on the other hand, they're not in a union and here the bit-field is pointless. I'll drop it. Otherwise the patch looks good to me, but please wait for comments from Joseph and/or Jason. Thanks for all your help! The following is the same patch with only : 5 dropped. 2014-06-20 Marek Polacek pola...@redhat.com * asan.c (pass_sanopt::execute): Handle IFN_UBSAN_BOUNDS. * flag-types.h (enum sanitize_code): Add SANITIZE_BOUNDS and or it into SANITIZE_UNDEFINED. * doc/invoke.texi: Describe -fsanitize=bounds. * gimplify.c (gimplify_call_expr): Add gimplification of internal functions created in the FEs. * internal-fn.c: Move internal-fn.h after tree.h. (expand_UBSAN_BOUNDS): New function. * internal-fn.def (UBSAN_BOUNDS): New internal function. * internal-fn.h: Don't define internal functions here. * opts.c (common_handle_option): Add -fsanitize=bounds. * sanitizer.def (BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS, BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT): Add. * tree-core.h: Define internal functions here. (struct tree_base): Add ifn field. * tree-pretty-print.c: Include internal-fn.h. (dump_generic_node): Handle functions without CALL_EXPR_FN. * tree.c (get_callee_fndecl): Likewise. (build_call_expr_internal_loc): New function. * tree.def (CALL_EXPR): Update description. * tree.h (CALL_EXPR_IFN): Define. (build_call_expr_internal_loc): Declare. * ubsan.c (get_ubsan_type_info_for_type): Return 0 for non-arithmetic types. (ubsan_type_descriptor): Change bool parameter to enum ubsan_print_style. Adjust the code. Add handling of UBSAN_PRINT_ARRAY. (ubsan_expand_bounds_ifn): New function. (ubsan_expand_null_ifn): Adjust ubsan_type_descriptor call. (ubsan_build_overflow_builtin): Likewise. (instrument_bool_enum_load): Likewise. (ubsan_instrument_float_cast): Likewise. * ubsan.h (enum ubsan_print_style): New enum. (ubsan_expand_bounds_ifn): Declare. (ubsan_type_descriptor): Adjust declaration. Use a default parameter. c-family/ * c-gimplify.c: Include c-ubsan.h and pointer-set.h. (ubsan_walk_array_refs_r): New function. (c_genericize): Instrument array bounds. * c-ubsan.c: Include internal-fn.h. (ubsan_instrument_division): Mark instrumented arrays as having side effects. Adjust ubsan_type_descriptor call. (ubsan_instrument_shift): Likewise. (ubsan_instrument_vla): Adjust ubsan_type_descriptor call. (ubsan_instrument_bounds): New function. (ubsan_array_ref_instrumented_p): New function. (ubsan_maybe_instrument_array_ref): New function. * c-ubsan.h (ubsan_instrument_bounds): Declare. (ubsan_array_ref_instrumented_p): Declare. (ubsan_maybe_instrument_array_ref): Declare. testsuite/ * c-c++-common/ubsan/bounds-1.c: New test. * c-c++-common/ubsan/bounds-2.c: New test. * c-c++-common/ubsan/bounds-3.c: New test. * c-c++-common/ubsan/bounds-4.c: New test. * c-c++-common/ubsan/bounds-5.c: New test. * c-c++-common/ubsan/bounds-6.c: New test. diff --git gcc/asan.c gcc/asan.c index 281a795..b7c76cf 100644 --- gcc/asan.c +++ gcc/asan.c @@ -2761,6 +2761,9 @@ pass_sanopt::execute (function *fun) case IFN_UBSAN_NULL: ubsan_expand_null_ifn (gsi); break; + case IFN_UBSAN_BOUNDS: + ubsan_expand_bounds_ifn (gsi); + break; default: break; } @@ -2771,6 +2774,10 @@ pass_sanopt::execute (function *fun) print_gimple_stmt (dump_file, stmt, 0, dump_flags); fprintf (dump_file, \n); } + + /* ubsan_expand_bounds_ifn might move us to the end of the BB. */ + if (gsi_end_p (gsi)) + break; } } return 0; diff --git gcc/c-family/c-gimplify.c gcc/c-family/c-gimplify.c index 737be4d..c797d99 100644 --- gcc/c-family/c-gimplify.c +++ gcc/c-family/c-gimplify.c @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3. If not see #include c-pretty-print.h #include cgraph.h #include cilk.h +#include c-ubsan.h +#include pointer-set.h /* The gimplification pass converts the language-dependent
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Fri, Jun 20, 2014 at 11:34:26AM +0200, Marek Polacek wrote: On Fri, Jun 20, 2014 at 10:57:47AM +0200, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 10:43:04AM +0200, Marek Polacek wrote: + +/* Internal function code. */ +ENUM_BITFIELD(internal_fn) ifn : 5; Any reason for the : 5 here? I mean, the union also contains unsigned int, so it doesn't hurt if you use full 32 bits for it there, and it should be faster and you won't run into problems when we'll have more than 32 internal functions. The sole reason was that all other ENUM_BITFIELDs have it - on the other hand, they're not in a union and here the bit-field is pointless. I'll drop it. Well, no point to use ENUM_BITFIELD either, just use enum internal_fn ifn; ? Jakub
Re: [RFC] optimize x - y cmp 0 with undefined overflow
[I'm at last back to this...] With [1, -x + INF] as the resulting range? But it can be bogus if x is itself equal to +INF (unlike the input range [x + 1, +INF] which is always correct) Hmm, indeed. so this doesn't look valid to me. I don't see how we can get away without a +INF(OVF) here, but I can compute it in extract_range_from_binary_expr_1 if you prefer and try only [op0,op0] and [op1,op1]. Yeah, I'd prefer that. To recap, the range of y is [x + 1, +INF] and we're trying to evaluate the range of y - x, in particular we want to prove that y - x 0. We compute the range of y - x as [1, -x + INF] by combining [x + 1, +INF] with [x, x] and we want to massage it because compare_values will rightly choke. If overflow is undefined, we can simply change it to [1, +INF (OVF)] and be done with that. But if overflow is defined, we need to prove that -x + INF cannot wrap around (which is true if the type is unsigned) and the simplest way to do that in the general case is to recursively invoke the machinery of extract_range_from_binary_expr_1 on range_of(-x) + INF and analyze the result. This looks much more complicated implementation-wise (and would very likely buy us nothing in practice) than my scheme, where we just approximate range_of(-x) by [-INF,+INF] and don't need to implement the recursion at all. However I can change extract_range_from_binary_expr so that only one range among [-INF,x], [x,+INF] and [x,x] is tried instead of the 3 ranges in a row. I initially didn't want to do that because this breaks the separation between extract_range_from_binary_expr_1 and extract_range_from_binary_expr but, given their names, this is very likely acceptable. What do you think? -- Eric Botcazou
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Fri, Jun 20, 2014 at 11:39:23AM +0200, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 11:34:26AM +0200, Marek Polacek wrote: On Fri, Jun 20, 2014 at 10:57:47AM +0200, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 10:43:04AM +0200, Marek Polacek wrote: + +/* Internal function code. */ +ENUM_BITFIELD(internal_fn) ifn : 5; Any reason for the : 5 here? I mean, the union also contains unsigned int, so it doesn't hurt if you use full 32 bits for it there, and it should be faster and you won't run into problems when we'll have more than 32 internal functions. The sole reason was that all other ENUM_BITFIELDs have it - on the other hand, they're not in a union and here the bit-field is pointless. I'll drop it. Well, no point to use ENUM_BITFIELD either, just use enum internal_fn ifn; Works as well. 2014-06-20 Marek Polacek pola...@redhat.com * asan.c (pass_sanopt::execute): Handle IFN_UBSAN_BOUNDS. * flag-types.h (enum sanitize_code): Add SANITIZE_BOUNDS and or it into SANITIZE_UNDEFINED. * doc/invoke.texi: Describe -fsanitize=bounds. * gimplify.c (gimplify_call_expr): Add gimplification of internal functions created in the FEs. * internal-fn.c: Move internal-fn.h after tree.h. (expand_UBSAN_BOUNDS): New function. * internal-fn.def (UBSAN_BOUNDS): New internal function. * internal-fn.h: Don't define internal functions here. * opts.c (common_handle_option): Add -fsanitize=bounds. * sanitizer.def (BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS, BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT): Add. * tree-core.h: Define internal functions here. (struct tree_base): Add ifn field. * tree-pretty-print.c: Include internal-fn.h. (dump_generic_node): Handle functions without CALL_EXPR_FN. * tree.c (get_callee_fndecl): Likewise. (build_call_expr_internal_loc): New function. * tree.def (CALL_EXPR): Update description. * tree.h (CALL_EXPR_IFN): Define. (build_call_expr_internal_loc): Declare. * ubsan.c (get_ubsan_type_info_for_type): Return 0 for non-arithmetic types. (ubsan_type_descriptor): Change bool parameter to enum ubsan_print_style. Adjust the code. Add handling of UBSAN_PRINT_ARRAY. (ubsan_expand_bounds_ifn): New function. (ubsan_expand_null_ifn): Adjust ubsan_type_descriptor call. (ubsan_build_overflow_builtin): Likewise. (instrument_bool_enum_load): Likewise. (ubsan_instrument_float_cast): Likewise. * ubsan.h (enum ubsan_print_style): New enum. (ubsan_expand_bounds_ifn): Declare. (ubsan_type_descriptor): Adjust declaration. Use a default parameter. c-family/ * c-gimplify.c: Include c-ubsan.h and pointer-set.h. (ubsan_walk_array_refs_r): New function. (c_genericize): Instrument array bounds. * c-ubsan.c: Include internal-fn.h. (ubsan_instrument_division): Mark instrumented arrays as having side effects. Adjust ubsan_type_descriptor call. (ubsan_instrument_shift): Likewise. (ubsan_instrument_vla): Adjust ubsan_type_descriptor call. (ubsan_instrument_bounds): New function. (ubsan_array_ref_instrumented_p): New function. (ubsan_maybe_instrument_array_ref): New function. * c-ubsan.h (ubsan_instrument_bounds): Declare. (ubsan_array_ref_instrumented_p): Declare. (ubsan_maybe_instrument_array_ref): Declare. testsuite/ * c-c++-common/ubsan/bounds-1.c: New test. * c-c++-common/ubsan/bounds-2.c: New test. * c-c++-common/ubsan/bounds-3.c: New test. * c-c++-common/ubsan/bounds-4.c: New test. * c-c++-common/ubsan/bounds-5.c: New test. * c-c++-common/ubsan/bounds-6.c: New test. diff --git gcc/asan.c gcc/asan.c index 281a795..b7c76cf 100644 --- gcc/asan.c +++ gcc/asan.c @@ -2761,6 +2761,9 @@ pass_sanopt::execute (function *fun) case IFN_UBSAN_NULL: ubsan_expand_null_ifn (gsi); break; + case IFN_UBSAN_BOUNDS: + ubsan_expand_bounds_ifn (gsi); + break; default: break; } @@ -2771,6 +2774,10 @@ pass_sanopt::execute (function *fun) print_gimple_stmt (dump_file, stmt, 0, dump_flags); fprintf (dump_file, \n); } + + /* ubsan_expand_bounds_ifn might move us to the end of the BB. */ + if (gsi_end_p (gsi)) + break; } } return 0; diff --git gcc/c-family/c-gimplify.c gcc/c-family/c-gimplify.c index 737be4d..c797d99 100644 --- gcc/c-family/c-gimplify.c +++ gcc/c-family/c-gimplify.c @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3. If not see #include c-pretty-print.h #include cgraph.h #include cilk.h +#include c-ubsan.h +#include
[PATCH] Fix ICE in Asan
Hi, This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61530. Tested on x86_64-unknown-linux-gnu, no regressions. Ok to commit? -Maxim gcc/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com * asan.c (build_check_stmt): Add condition. gcc/testsuite/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com * c-c++-common/asan/pr61530.c: New test. diff --git a/gcc/asan.c b/gcc/asan.c index 281a795..4d87dad 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -1654,6 +1654,7 @@ build_check_stmt (location_t location, tree base, tree len, if (size_in_bytes 1) { if ((size_in_bytes (size_in_bytes - 1)) != 0 + || !is_scalar_access || size_in_bytes 16) size_in_bytes = -1; else if (align align size_in_bytes * BITS_PER_UNIT) diff --git a/gcc/testsuite/c-c++-common/asan/pr61530.c b/gcc/testsuite/c-c++-common/asan/pr61530.c new file mode 100644 index 000..e306a71 --- /dev/null +++ b/gcc/testsuite/c-c++-common/asan/pr61530.c @@ -0,0 +1,17 @@ +/* { dg-do run } */ +/* { dg-shouldfail asan } */ + +__attribute__((noinline,noclone)) void +foo (char *a, char *b) { + a[0] = b[0] = 0; + __builtin_memcpy(a, b, 4); +} + +int +main () { + char a, b; + foo (a, b); + return 0; +} + +/* { dg-output ERROR: AddressSanitizer: stack-buffer-overflow } */
Re: [PATCH] Fix ICE in Asan
On Fri, Jun 20, 2014 at 02:04:25PM +0400, Maxim Ostapenko wrote: This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61530. Tested on x86_64-unknown-linux-gnu, no regressions. Ok to commit? -Maxim gcc/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com * asan.c (build_check_stmt): Add condition. Please align your name below Yury's (i.e. tab + 4 spaces indent). Also, please add PR sanitizer/61530 to both changelog entries. Ok with those changes. gcc/testsuite/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com * c-c++-common/asan/pr61530.c: New test. Jakub
Re: [PATCH] Fix ICE in Asan
On 06/20/2014 02:07 PM, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 02:04:25PM +0400, Maxim Ostapenko wrote: This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61530. Tested on x86_64-unknown-linux-gnu, no regressions. Ok to commit? -Maxim gcc/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com * asan.c (build_check_stmt): Add condition. Please align your name below Yury's (i.e. tab + 4 spaces indent). Also, please add PR sanitizer/61530 to both changelog entries. Ok with those changes. gcc/testsuite/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com * c-c++-common/asan/pr61530.c: New test. Jakub Thanks, done in r211846. -Maxim
RE: [PATCH] Fix PR61375: cancel bswap optimization when value doesn't fit in a HOST_WIDE_INT
From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, June 10, 2014 5:05 PM Backports are welcome - please post a patch. Sorry for the delay. Here you are: diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61375.c b/gcc/testsuite/gcc.c-torture/execute/pr61375.c new file mode 100644 index 000..d3b54a8 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61375.c @@ -0,0 +1,35 @@ +#ifdef __UINT64_TYPE__ +typedef __UINT64_TYPE__ uint64_t; +#else +typedef unsigned long long uint64_t; +#endif + +#ifndef __SIZEOF_INT128__ +#define __int128 long long +#endif + +/* Some version of bswap optimization would ICE when analyzing a mask constant + too big for an HOST_WIDE_INT (PR61375). */ + +__attribute__ ((noinline, noclone)) uint64_t +uint128_central_bitsi_ior (unsigned __int128 in1, uint64_t in2) +{ + __int128 mask = (__int128)0x 56; + return ((in1 mask) 56) | in2; +} + +int +main (int argc) +{ + __int128 in = 1; +#ifdef __SIZEOF_INT128__ + in = 64; +#endif + if (sizeof (uint64_t) * __CHAR_BIT__ != 64) +return 0; + if (sizeof (unsigned __int128) * __CHAR_BIT__ != 128) +return 0; + if (uint128_central_bitsi_ior (in, 2) != 0x102) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 9ff857c..9d64205 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1741,6 +1741,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit) if (n-size % BITS_PER_UNIT != 0) return NULL_TREE; n-size /= BITS_PER_UNIT; + if (n-size (int)sizeof (unsigned HOST_WIDEST_INT)) + return NULL_TREE; n-n = (sizeof (HOST_WIDEST_INT) 8 ? 0 : (unsigned HOST_WIDEST_INT)0x08070605 32 | 0x04030201); @@ -1781,6 +1783,8 @@ find_bswap_1 (gimple stmt, struct symbolic_number *n, int limit) type_size = TYPE_PRECISION (gimple_expr_type (stmt)); if (type_size % BITS_PER_UNIT != 0) return NULL_TREE; + if (type_size (int)HOST_BITS_PER_WIDEST_INT) + return NULL_TREE; if (type_size / BITS_PER_UNIT (int)(sizeof (HOST_WIDEST_INT))) { Ok for GCC 4.8 and GCC 4.9 branches? Best regards, Thomas
[PATCH] Fix for invalid sanitization of trailing byte in __builtin_strlen
This patch fixes invalid sanitization of trailing byte in __builtin_strlen ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61547). Tested on x86_64-unknown-linux-gnu, no regressions. Ok to commit? -Maxim gcc/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com PR sanitizer/61547 * asan.c (instrument_strlen_call): Fixed instrumentation of trailing byte. gcc/testsuite/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com PR sanitizer/61547 * c-c++-common/asan/strlen-overflow-1.c: New test. diff --git a/gcc/asan.c b/gcc/asan.c index 281a795..71c063b 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -2036,19 +2036,19 @@ instrument_strlen_call (gimple_stmt_iterator *iter) build_check_stmt (loc, gimple_assign_lhs (str_arg_ssa), NULL_TREE, 1, iter, /*non_zero_len_p*/true, /*before_p=*/true, - /*is_store=*/false, /*is_scalar_access*/false, /*align*/0); + /*is_store=*/false, /*is_scalar_access*/true, /*align*/0); - gimple stmt = -gimple_build_assign_with_ops (PLUS_EXPR, - make_ssa_name (TREE_TYPE (len), NULL), - len, - build_int_cst (TREE_TYPE (len), 1)); - gimple_set_location (stmt, loc); - gsi_insert_after (iter, stmt, GSI_NEW_STMT); + gimple g = +gimple_build_assign_with_ops (POINTER_PLUS_EXPR, + make_ssa_name (cptr_type, NULL), + gimple_assign_lhs (str_arg_ssa), + len); + gimple_set_location (g, loc); + gsi_insert_after (iter, g, GSI_NEW_STMT); - build_check_stmt (loc, gimple_assign_lhs (stmt), len, 1, iter, + build_check_stmt (loc, gimple_assign_lhs (g), NULL_TREE, 1, iter, /*non_zero_len_p*/true, /*before_p=*/false, - /*is_store=*/false, /*is_scalar_access*/false, /*align*/0); + /*is_store=*/false, /*is_scalar_access*/true, /*align*/0); return true; } diff --git a/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c new file mode 100644 index 000..4c4585e --- /dev/null +++ b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-skip-if { *-*-* } { -flto } { } } */ +/* { dg-shouldfail asan } */ + +#include sanitizer/asan_interface.h + +char a[2] = 0; + +#ifdef __cplusplus +extern C +#endif +__SIZE_TYPE__ strlen (const char *p) { + /* Simulate error */ + if (p == a) +return 1; + + __SIZE_TYPE__ n = 0; + for (; *p; ++n, ++p); + return n; +} + +int main () { + char *p = a[0]; + asm ( : +r(p)); + __asan_poison_memory_region ((char *)a[1], 1); + return __builtin_strlen (a); +} + +/* { dg-output READ of size 1 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r) } */ +/* { dg-output #0 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*strlen-overflow-1.c:26|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r) } */ +/* { dg-output \[^\n\r]*0x\[0-9a-f\]+ is located 1 bytes inside of global variable } */
[PATCH 2/N] allow storing values directly in hash tables
From: Trevor Saunders tsaund...@mozilla.com Hi, this patch allows you to define the type the hash table stores as elements instead of the type elements point at by having your hash descriptor define the type store_values_directly. It turns out trying to implement both cases with the same code is really confusing, so I ended up providing one partial specialization for each case. Its a lot of coppying, but I'm hoping the next patch will get rid of many direct users of hash_table, and the rest can all get converted to tell the hash table the type entries should have at which point the dupplication can be removed. bootstrapped + regtested without regression on x86_64-unknown-linux-gnu, ok? Trev gcc/ * hash-table.h: Add a template arg to choose between storing values and storing pointers to values, and then provide partial specializations for both. * tree-browser.c (tree_upper_hasher): Provide the type the hash table should store, not the type values should point to. * tree-into-ssa.c (var_info_hasher): Likewise. * tree-ssa-dom.c (expr_elt_hasher): Likewise. * tree-complex.c: Adjust. * tree-hasher.h (int_tree_hasher): store int_tree_map in the hash table instead of int_tree_map *. * tree-parloops.c: Adjust. * tree-ssa-reassoc.c (ocount_hasher): Don't lie to hash_map about what type is being stored. * tree-vectorizer.c: Adjust. diff --git a/gcc/hash-table.h b/gcc/hash-table.h index 41cc19e..22af12f 100644 --- a/gcc/hash-table.h +++ b/gcc/hash-table.h @@ -272,19 +272,18 @@ typed_noop_remove Type::remove (Type *p ATTRIBUTE_UNUSED) template typename Type struct pointer_hash : typed_noop_remove Type { - typedef Type value_type; - typedef Type compare_type; + typedef Type *value_type; + typedef Type *compare_type; + typedef int store_values_directly; - static inline hashval_t - hash (const value_type *); + static inline hashval_t hash (const value_type ); - static inline int - equal (const value_type *existing, const compare_type *candidate); + static inline bool equal (const value_type existing, const compare_type candidate); }; template typename Type inline hashval_t -pointer_hash Type::hash (const value_type *candidate) +pointer_hash Type::hash (const value_type candidate) { /* This is a really poor hash function, but it is what the current code uses, so I am reusing it to avoid an additional axis in testing. */ @@ -292,9 +291,9 @@ pointer_hash Type::hash (const value_type *candidate) } template typename Type -inline int -pointer_hash Type::equal (const value_type *existing, - const compare_type *candidate) +inline bool +pointer_hash Type::equal (const value_type existing, + const compare_type candidate) { return existing == candidate; } @@ -319,10 +318,147 @@ extern unsigned int hash_table_higher_prime_index (unsigned long n); extern hashval_t hash_table_mod1 (hashval_t hash, unsigned int index); extern hashval_t hash_table_mod2 (hashval_t hash, unsigned int index); +/* The below is some template meta programming to decide if we should use the + hash table partial specialization that directly stores value_type instead of + pointers to value_type. If the Descriptor type defines the type + Descriptor::store_values_directly then values are stored directly otherwise + pointers to them are stored. */ +templatetypename T struct notype { typedef void type; }; + +templatetypename T, typename = void +struct storage_tester +{ + static const bool value = false; +}; + +templatetypename T +struct storage_testerT, typename notypetypename +T::store_values_directly::type +{ + static const bool value = true; +}; + + templatetypename Traits + struct has_is_deleted +{ + templatetypename U, bool (*)(U ) struct helper {}; + templatetypename U static char test (helperU, U::is_deleted *); + templatetypename U static int test (...); + static const bool value = sizeof (testTraits (0)) == sizeof (char); +}; + +templatetypename Type, typename Traits, bool = has_is_deletedTraits::value +struct is_deleted_helper +{ + static inline bool + call (Type v) + { +return Traits::is_deleted (v); + } +}; + +templatetypename Type, typename Traits +struct is_deleted_helperType *, Traits, false +{ + static inline bool + call (Type *v) + { +return v == HTAB_DELETED_ENTRY; + } +}; + + templatetypename Traits + struct has_is_empty +{ + templatetypename U, bool (*)(U ) struct helper {}; + templatetypename U static char test (helperU, U::is_empty *); + templatetypename U static int test (...); + static const bool value = sizeof (testTraits (0)) == sizeof (char); +}; + +templatetypename Type, typename Traits, bool = has_is_deletedTraits::value +struct is_empty_helper +{ + static inline bool + call (Type v) + { +return Traits::is_empty (v); + } +}; +
[PATCH 3/3] add hash_map class
From: Trevor Saunders tsaund...@mozilla.com Hi, This patch adds a hash_map class so we can consolidate the boiler plate around using hash_table as a map, it also allows us to get rid of pointer_map which I do in this patch by converting its users to hash_map. bootstrapped + regtested without regression on x86_64-unknown-linux-gnu, ok? Trev gcc/ * alloc-pool.c (alloc_pool_hash): Use hash_map instead of hash_table. * dominance.c (iterate_fix_dominators): Use hash_map instead of pointer_map. * hash-map.h: New file. * ipa-comdats.c: Use hash_map instead of pointer_map. * lto-section-out.c: Adjust. * lto-streamer.h: Replace pointer_map with hash_map. * symtab.c (verify_symtab): Likewise. * tree-ssa-strlen.c (decl_to_stridxlist_htab): Likewise. * tree-ssa-uncprop.c (val_ssa_equiv): Likewise. * tree-streamer.h: Likewise. * tree-streamer.c: Adjust. * pointer-set.h: Remove pointer_map. lto/ * lto.c (canonical_type_hash_cache): Use hash_map instead of pointer_map. diff --git a/gcc/alloc-pool.c b/gcc/alloc-pool.c index 49209ee..0d31835 100644 --- a/gcc/alloc-pool.c +++ b/gcc/alloc-pool.c @@ -22,6 +22,7 @@ along with GCC; see the file COPYING3. If not see #include system.h #include alloc-pool.h #include hash-table.h +#include hash-map.h #define align_eight(x) (((x+7) 3) 3) @@ -69,7 +70,6 @@ static ALLOC_POOL_ID_TYPE last_id; size for that pool. */ struct alloc_pool_descriptor { - const char *name; /* Number of pools allocated. */ unsigned long created; /* Gross allocated storage. */ @@ -82,48 +82,17 @@ struct alloc_pool_descriptor int elt_size; }; -/* Hashtable helpers. */ -struct alloc_pool_hasher : typed_noop_remove alloc_pool_descriptor -{ - typedef alloc_pool_descriptor value_type; - typedef char compare_type; - static inline hashval_t hash (const alloc_pool_descriptor *); - static inline bool equal (const value_type *, const compare_type *); -}; - -inline hashval_t -alloc_pool_hasher::hash (const value_type *d) -{ - return htab_hash_pointer (d-name); -} - -inline bool -alloc_pool_hasher::equal (const value_type *d, - const compare_type *p2) -{ - return d-name == p2; -} - /* Hashtable mapping alloc_pool names to descriptors. */ -static hash_tablealloc_pool_hasher *alloc_pool_hash; +static hash_mapconst char *, alloc_pool_descriptor *alloc_pool_hash; /* For given name, return descriptor, create new if needed. */ static struct alloc_pool_descriptor * allocate_pool_descriptor (const char *name) { - struct alloc_pool_descriptor **slot; - if (!alloc_pool_hash) -alloc_pool_hash = new hash_tablealloc_pool_hasher (10); - - slot = alloc_pool_hash-find_slot_with_hash (name, - htab_hash_pointer (name), - INSERT); - if (*slot) -return *slot; - *slot = XCNEW (struct alloc_pool_descriptor); - (*slot)-name = name; - return *slot; +alloc_pool_hash = new hash_mapconst char *, alloc_pool_descriptor (10); + + return alloc_pool_hash-get_or_insert (name); } /* Create a pool of things of size SIZE, with NUM in each block we @@ -375,23 +344,22 @@ struct output_info unsigned long total_allocated; }; -/* Called via hash_table.traverse. Output alloc_pool descriptor pointed out by +/* Called via hash_map.traverse. Output alloc_pool descriptor pointed out by SLOT and update statistics. */ -int -print_alloc_pool_statistics (alloc_pool_descriptor **slot, +bool +print_alloc_pool_statistics (const char *const name, +const alloc_pool_descriptor d, struct output_info *i) { - struct alloc_pool_descriptor *d = *slot; - - if (d-allocated) + if (d.allocated) { fprintf (stderr, %-22s %6d %10lu %10lu(%10lu) %10lu(%10lu) %10lu(%10lu)\n, - d-name, d-elt_size, d-created, d-allocated, - d-allocated / d-elt_size, d-peak, d-peak / d-elt_size, - d-current, d-current / d-elt_size); - i-total_allocated += d-allocated; - i-total_created += d-created; + name, d.elt_size, d.created, d.allocated, + d.allocated / d.elt_size, d.peak, d.peak / d.elt_size, + d.current, d.current / d.elt_size); + i-total_allocated += d.allocated; + i-total_created += d.created; } return 1; } diff --git a/gcc/dominance.c b/gcc/dominance.c index 7adec4f..be0a439 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -43,6 +43,7 @@ #include diagnostic-core.h #include et-forest.h #include timevar.h +#include hash-map.h #include pointer-set.h #include graphds.h #include bitmap.h @@ -1258,7 +1259,6 @@ iterate_fix_dominators (enum cdi_direction dir, vecbasic_block bbs, size_t dom_i; edge e; edge_iterator ei; - pointer_mapint *map; int *parent, *son,
Re: Move DECL_INIT_PRIORITY/FINI_PRIORITY to symbol table
On Fri, Jun 20, 2014 at 08:41:22AM +0200, Jan Hubicka wrote: Hi, this patch moves init and fini priorities to symbol table instead of trees. They are already in on-side hashtables, but the hashtables are now maintaned by symbol table. This is needed for correctness with LTO. Currently tree merging may load declaration with priority and then ggc_free it creating a stale entry in the hashtable. This is usually not problem, because ctor declarations are usually static, but it is not safe. I would really like to have template for such a sparse annotations to symbols (writting our old school pch friendly hashtable is not a fun) but I am not sure I can get it done in GGC/PCH safe way. Is user marking working for PCH? hm, so thinking about this a little more I wonder if you can just use hash_table, and add a dtor to symtab_node that removes its entry from the hash table. ggc should invoke your dtor as a finalizer for the node then. Trev Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza * cgraph.h (struct symtab_node): Add field in_init_priority_hash (set_init_priority, get_init_priority, set_fini_priority, get_fini_priority): New methods. * tree.c (init_priority_for_decl): Remove. (init_ttree): Do not initialize init priority. (decl_init_priority_lookup, decl_fini_priority_lookup): Rewrite. (decl_priority_info): Remove. (decl_init_priority_insert): Rewrite. (decl_fini_priority_insert): Rewrite. * tree.h (tree_priority_map_eq, tree_priority_map_hash, tree_priority_map_marked_p): Remove. * lto-cgraph.c (lto_output_node, input_node): Stream init priorities. * lto-streamer-out.c (hash_tree): Do not hash priorities. * tree-streamer-out.c (pack_ts_decl_with_vis_value_fields): Do not output priorities. (pack_ts_function_decl_value_fields): Likewise. * tree-streamer-in.c (unpack_ts_decl_with_vis_value_fields): Do not input priorities. (unpack_ts_function_decl_value_fields): Likewise. * symtab.c (symbol_priority_map): Declare. (init_priority_hash): Declare. (symtab_unregister_node): Unregister from priority hash, too. (symtab_node::get_init_priority, cgraph_node::get_fini_priority): New methods. (symbol_priority_map_eq, symbol_priority_map_hash): New functions. (symbol_priority_info): New function. (symtab_node::set_init_priority, cgraph_node::set_fini_priority): New methods. * tree-core.h (tree_priority_map): Remove. * lto.c (compare_tree_sccs_1): Do not compare priorities. Index: cgraph.h === --- cgraph.h (revision 211831) +++ cgraph.h (working copy) @@ -130,6 +130,8 @@ public: /* Set when symbol has address taken. */ unsigned address_taken : 1; + /* Set when init priority is set. */ + unsigned in_init_priority_hash : 1; /* Ordering of all symtab entries. */ @@ -163,6 +165,7 @@ public: return x_comdat_group; } + /* Return comdat group as identifier_node. */ tree get_comdat_group_id () { if (x_comdat_group TREE_CODE (x_comdat_group) != IDENTIFIER_NODE) @@ -208,6 +211,9 @@ public: /* Set section for symbol and its aliases. */ void set_section (const char *section); void set_section_for_node (const char *section); + + void set_init_priority (priority_type priority); + priority_type get_init_priority (); }; enum availability @@ -497,6 +503,9 @@ public: /* True if this decl calls a COMDAT-local function. This is set up in compute_inline_parameters and inline_call. */ unsigned calls_comdat_local : 1; + + void set_fini_priority (priority_type priority); + priority_type get_fini_priority (); }; Index: tree.c === --- tree.c(revision 211831) +++ tree.c(working copy) @@ -219,10 +219,6 @@ static GTY ((if_marked (tree_decl_map_m static GTY ((if_marked (tree_vec_map_marked_p), param_is (struct tree_vec_map))) htab_t debug_args_for_decl; -static GTY ((if_marked (tree_priority_map_marked_p), - param_is (struct tree_priority_map))) - htab_t init_priority_for_decl; - static void set_type_quals (tree, int); static int type_hash_eq (const void *, const void *); static hashval_t type_hash_hash (const void *); @@ -573,8 +569,6 @@ init_ttree (void) value_expr_for_decl = htab_create_ggc (512, tree_decl_map_hash, tree_decl_map_eq, 0); - init_priority_for_decl = htab_create_ggc (512, tree_priority_map_hash, - tree_priority_map_eq, 0); int_cst_hash_table = htab_create_ggc (1024, int_cst_hash_hash, int_cst_hash_eq, NULL); @@ -6492,13 +6486,12 @@
Re: [PATCH] Fix for invalid sanitization of trailing byte in __builtin_strlen
On Fri, Jun 20, 2014 at 02:49:12PM +0400, Maxim Ostapenko wrote: This patch fixes invalid sanitization of trailing byte in __builtin_strlen ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61547). Tested on x86_64-unknown-linux-gnu, no regressions. + +#include sanitizer/asan_interface.h + +char a[2] = 0; + +#ifdef __cplusplus +extern C +#endif +__SIZE_TYPE__ strlen (const char *p) { + /* Simulate error */ + if (p == a) +return 1; Why this? Can't you instead just use __attribute__((no_sanitize_address, noinline)) on it instead? Otherwise looks good to me Jakub
Re: [PATCH] Fix for invalid sanitization of trailing byte in __builtin_strlen
On Fri, Jun 20, 2014 at 2:57 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Jun 20, 2014 at 02:49:12PM +0400, Maxim Ostapenko wrote: This patch fixes invalid sanitization of trailing byte in __builtin_strlen ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61547). Tested on x86_64-unknown-linux-gnu, no regressions. What about bootstrap though? +__SIZE_TYPE__ strlen (const char *p) { + /* Simulate error */ + if (p == a) +return 1; Why this? Can't you instead just use __attribute__((no_sanitize_address, noinline)) on it instead? Yeah, good point. -Y
Re: [PATCH] Change default for --param allow-...-data-races to off
Hi, On Thu, Jun 19, 2014 at 06:18:47PM +0200, Bernd Edlinger wrote: Hi, from a recent discussion on g...@gcc.gnu.org I have learned that the default of --param allow-store-data-races is still 1, and it is causing problems. Therefore I would like to suggest to change the default of this option to 0. I was about to propose a similar patch but I intended to leave the parameter set to one when -Ofast is specified so that benchmarks are not hurt by this and as a nice pointer for people exploring our options to really squeeze out 100% performance (which would of course mean documenting it too). Thanks, Martin Boot-strapped and regression tested on x86_64-linux-gnu. Ok for trunk? Thanks Bernd. gcc/ChangeLog: 2014-06-19 Bernd Edlinger bernd.edlin...@hotmail.de Set default for --param allow-...-data-races to off. * params.def (PARAM_ALLOW_LOAD_DATA_RACES, PARAM_ALLOW_STORE_DATA_RACES, PARAM_ALLOW_PACKED_LOAD_DATA_RACES, PARAM_ALLOW_PACKED_STORE_DATA_RACES): Set default to off. testsuite/ChangeLog: 2014-06-19 Bernd Edlinger bernd.edlin...@hotmail.de Adjust to new default for --param allow-...-data-races. * c-c++-common/cxxbitfields-3.c: Adjust. * c-c++-common/cxxbitfields-6.c: Adjust. * c-c++-common/simulate-thread/bitfields-1.c: Adjust. * c-c++-common/simulate-thread/bitfields-2.c: Adjust. * c-c++-common/simulate-thread/bitfields-3.c: Adjust. * c-c++-common/simulate-thread/bitfields-4.c: Adjust. * g++.dg/simulate-thread/bitfields.C: Adjust. * g++.dg/simulate-thread/bitfields-2.C: Adjust. * gcc.dg/lto/pr52097_0.c: Adjust. * gcc.dg/simulate-thread/speculative-store.c: Adjust. * gcc.dg/simulate-thread/speculative-store-2.c: Adjust. * gcc.dg/simulate-thread/speculative-store-3.c: Adjust. * gcc.dg/simulate-thread/speculative-store-4.c: Adjust. * gcc.dg/simulate-thread/strict-align-global.c: Adjust. * gcc.dg/simulate-thread/subfields.c: Adjust. * gcc.dg/tree-ssa/20050314-1.c: Adjust.
Re: [PATCH] dwarf2out.c: Pass DWARF type modifiers around as flags argument.
While adding some new type modifiers I did find a typo in my original patch, so I decided to add some testcases to make sure no regressions were introduced. This is the same patch as the original, but with the typo in modified_type_die fixed that could accidentally add an extra layer of type modifiers and a new test based on the guality.exp testsuite, but tweaked to inspect the type of a variable instead of its value. From b3140b1ee59560b33d08e2583c20be5a615e588b Mon Sep 17 00:00:00 2001 From: Mark Wielaard m...@redhat.com Date: Wed, 18 Jun 2014 22:41:38 +0200 Subject: [PATCH] dwarf2out.c: Pass type modifiers as flags arguments. Add guality type test. modified_type_die and add_type_attribute take two separate arguments for whether the type should be const and/or volatile. To help add more type modifiers pass the requested modifiers as one flag value to these functions. And introduce helper functions dw_mod_type_flags and dw_mod_decl_flags to easily extract the modifiers from type and declaration trees. Add a new type:var variant to the guality.exp testsuite to check that gdb gets the correct type for a variable and use it to make sure the change doesn't cause any regressions. DWARFv3 added restrict_type [PR debug/59051] and DWARFv5 has proposals for atomic_type and aligned_type. Which will hopefully be easier to implement based on this change. gcc/ChangeLog * dwarf2out.h (enum dw_mod_flag): New enum. * dwarf2out.c (dw_mod_decl_flags): New function. (dw_mod_type_flags): Likewise. (modified_type_die): Take one modifiers flag argument instead of one for const and one for volatile. (add_type_attribute): Likewise. (generic_parameter_die): Call add_type_attribute with one modifier argument. (base_type_for_mode): Likewise. (add_bounds_info): Likewise. (add_subscript_info): Likewise. (gen_array_type_die): Likewise. (gen_descr_array_type_die): Likewise. (gen_entry_point_die): Likewise. (gen_enumeration_type_die): Likewise. (gen_formal_parameter_die): Likewise. (gen_subprogram_die): Likewise. (gen_variable_die): Likewise. (gen_const_die): Likewise. (gen_field_die): Likewise. (gen_pointer_type_die): Likewise. (gen_reference_type_die): Likewise. (gen_ptr_to_mbr_type_die): Likewise. (gen_inheritance_die): Likewise. (gen_subroutine_type_die): Likewise. (gen_typedef_die): Likewise. (force_type_die): Likewise. gcc/testsuite/ChangeLog * lib/gcc-gdb-test.exp (gdb-test): Handle type:var for gdb ptype matching. * gcc/testsuite/gcc.dg/guality/const-volatile.c: New test. --- gcc/ChangeLog | 30 ++ gcc/dwarf2out.c | 133 +--- gcc/dwarf2out.h |8 ++ gcc/testsuite/ChangeLog |6 + gcc/testsuite/gcc.dg/guality/const-volatile.c | 83 +++ gcc/testsuite/lib/gcc-gdb-test.exp| 45 - 6 files changed, 241 insertions(+), 64 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/guality/const-volatile.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2d0a07c..4dfd9a5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,33 @@ +2014-06-20 Mark Wielaard m...@redhat.com + + * dwarf2out.h (enum dw_mod_flag): New enum. + * dwarf2out.c (dw_mod_decl_flags): New function. + (dw_mod_type_flags): Likewise. + (modified_type_die): Take one modifiers flag argument instead of + one for const and one for volatile. + (add_type_attribute): Likewise. + (generic_parameter_die): Call add_type_attribute with one modifier + argument. + (base_type_for_mode): Likewise. + (add_bounds_info): Likewise. + (add_subscript_info): Likewise. + (gen_array_type_die): Likewise. + (gen_descr_array_type_die): Likewise. + (gen_entry_point_die): Likewise. + (gen_enumeration_type_die): Likewise. + (gen_formal_parameter_die): Likewise. + (gen_subprogram_die): Likewise. + (gen_variable_die): Likewise. + (gen_const_die): Likewise. + (gen_field_die): Likewise. + (gen_pointer_type_die): Likewise. + (gen_reference_type_die): Likewise. + (gen_ptr_to_mbr_type_die): Likewise. + (gen_inheritance_die): Likewise. + (gen_subroutine_type_die): Likewise. + (gen_typedef_die): Likewise. + (force_type_die): Likewise. + 2014-06-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm_neon.h (vadd_f32): Change #ifdef to __FAST_MATH. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 933ec62..3d3508d 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3140,7 +3140,9 @@ static void output_file_names (void); static dw_die_ref base_type_die (tree); static int is_base_type (tree); static dw_die_ref subrange_type_die (tree, tree, tree, dw_die_ref); -static dw_die_ref modified_type_die (tree, int, int, dw_die_ref); +static int dw_mod_decl_flags (const_tree); +static int dw_mod_type_flags (const_tree); +static dw_die_ref modified_type_die (tree, int, dw_die_ref); static dw_die_ref generic_parameter_die (tree, tree, bool, dw_die_ref); static dw_die_ref
Re: [PATCH AArch64 1/2] PR/60825 Make float64x1_t in arm_neon.h a proper vector type
On 19 June 2014 13:27, Alan Lawrence alan.lawre...@arm.com wrote: This updates the .md files to generate V1DFmode patterns instead of DFmode for create and reinterpret, and the corresponding __builtins. The various other float64x1_t intrinsics can then be rewritten, generally I've tried to use gcc vector extensions rather than unnecessary/custom builtins where possible, and have started adding some range checking using __builtin_aarch64_im_lane_boundsi. Finally, rewrite the cases in arm_neon.h and various tests, that relied on float64[x1]_t being assignment-compatible, including arm_neon.h vfma functions which had the wrong (but previously equivalent) type signature; and add some new ABI tests. OK /Marcus
Re: [PATCH AArch64 2/2] PR/60825 Make {int,uint}64x1_t in arm_neon.h a proper vector type
On 19 June 2014 13:30, Alan Lawrence alan.lawre...@arm.com wrote: Similarly, this makes int64x1_t a proper vector type, updating arm_neon.h with many explicit vector construction/destruction operations (also including some range checking using __builtin_aarch64_im_lane_boundsi). Change the vabs_s64 intrinsic from using __builtin_llabs to __builtin_aarch64_absdi, the latter is consistent with other intrinsics and should have different behaviour (aarch64_abs on the minimum negative value should be defined to return said minimum negative value rather than undefined). This __builtin was previously being generated as a binary operator (but this was not noticed as it was unused), so I've tweaked the qualifiers to force unary ops to unary. Ok /Marcus
Re: [PATCH] Fix for invalid sanitization of trailing byte in __builtin_strlen
What about bootstrap though? Bootstrap in progress. +__SIZE_TYPE__ strlen (const char *p) { + /* Simulate error */ + if (p == a) +return 1; Why this? Can't you instead just use __attribute__((no_sanitize_address, noinline)) on it instead? Done. Ok to commit if bootstrap will succeed? -Maxim gcc/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com PR sanitizer/61547 * asan.c (instrument_strlen_call): Fixed instrumentation of trailing byte. gcc/testsuite/ChangeLog: 2014-06-20 Yury Gribov y.gri...@samsung.com Max Ostapenko m.ostape...@partner.samsung.com PR sanitizer/61547 * c-c++-common/asan/strlen-overflow-1.c: New test. diff --git a/gcc/asan.c b/gcc/asan.c index 281a795..71c063b 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -2036,19 +2036,19 @@ instrument_strlen_call (gimple_stmt_iterator *iter) build_check_stmt (loc, gimple_assign_lhs (str_arg_ssa), NULL_TREE, 1, iter, /*non_zero_len_p*/true, /*before_p=*/true, - /*is_store=*/false, /*is_scalar_access*/false, /*align*/0); + /*is_store=*/false, /*is_scalar_access*/true, /*align*/0); - gimple stmt = -gimple_build_assign_with_ops (PLUS_EXPR, - make_ssa_name (TREE_TYPE (len), NULL), - len, - build_int_cst (TREE_TYPE (len), 1)); - gimple_set_location (stmt, loc); - gsi_insert_after (iter, stmt, GSI_NEW_STMT); + gimple g = +gimple_build_assign_with_ops (POINTER_PLUS_EXPR, + make_ssa_name (cptr_type, NULL), + gimple_assign_lhs (str_arg_ssa), + len); + gimple_set_location (g, loc); + gsi_insert_after (iter, g, GSI_NEW_STMT); - build_check_stmt (loc, gimple_assign_lhs (stmt), len, 1, iter, + build_check_stmt (loc, gimple_assign_lhs (g), NULL_TREE, 1, iter, /*non_zero_len_p*/true, /*before_p=*/false, - /*is_store=*/false, /*is_scalar_access*/false, /*align*/0); + /*is_store=*/false, /*is_scalar_access*/true, /*align*/0); return true; } diff --git a/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c new file mode 100644 index 000..4c4585e --- /dev/null +++ b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c @@ -0,0 +1,31 @@ +/* { dg-do run } */ +/* { dg-skip-if { *-*-* } { -flto } { } } */ +/* { dg-shouldfail asan } */ + +#include sanitizer/asan_interface.h + +char a[2] = 0; + +#ifdef __cplusplus +extern C +#endif + +__attribute__((no_sanitize_address, noinline)) __SIZE_TYPE__ +strlen (const char *p) { + + __SIZE_TYPE__ n = 0; + for (; *p; ++n, ++p); + return n; +} + +int main () { + char *p = a[0]; + asm ( : +r(p)); + __asan_poison_memory_region ((char *)a[1], 1); + return __builtin_strlen (a); +} + +/* { dg-output READ of size 1 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r) } */ +/* { dg-output #0 0x\[0-9a-f\]+ (in _*main (\[^\n\r]*strlen-overflow-1.c:24|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r) } */ +/* { dg-output \[^\n\r]*0x\[0-9a-f\]+ is located 1 bytes inside of global variable } */
Re: [PATCH] Fix for invalid sanitization of trailing byte in __builtin_strlen
On Fri, Jun 20, 2014 at 04:55:43PM +0400, Maxim Ostapenko wrote: What about bootstrap though? Bootstrap in progress. +__SIZE_TYPE__ strlen (const char *p) { + /* Simulate error */ + if (p == a) +return 1; Why this? Can't you instead just use __attribute__((no_sanitize_address, noinline)) on it instead? Done. Ok to commit if bootstrap will succeed? Ok, thanks. Jakub
Re: C++ PATCH for c++/59296 (rvalue object and lvalue ref-qualifier)
On 06/19/2014 01:29 PM, Marc Glisse wrote: That looks weird to me. The const version is a better match than the const, so we should pick that one in overload resolution, but if we remove the const version, the other one seems valid to me Hmm, you're right, I was confused. Jason
Re: [PATCH] Fix for invalid sanitization of trailing byte in __builtin_strlen
On 06/20/2014 04:59 PM, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 04:55:43PM +0400, Maxim Ostapenko wrote: What about bootstrap though? Bootstrap in progress. +__SIZE_TYPE__ strlen (const char *p) { + /* Simulate error */ + if (p == a) +return 1; Why this? Can't you instead just use __attribute__((no_sanitize_address, noinline)) on it instead? Done. Ok to commit if bootstrap will succeed? Ok, thanks. Jakub Thanks, done in r211849. -Maxim
Re: [PATCH, Testsuite, AArch64] Make aapcs64.exp Tests Big-Endian Friendly
On 19 June 2014 14:32, Yufeng Zhang yufeng.zh...@arm.com wrote: Hi, This patch updates a number of aapcs64 tests to make them big-endian friendly. Changes are mainly: * checking the W regs instead of X regs for integral arguments less than 8 bytes * correcting the corresponding stack location checks in big-endian mode With this patch, make check-gcc RUNTESTFLAGS=aapcs64.exp gives a clean result on aarch64_be-none-elf. OK for trunk? OK thanks. /Marcus
Re: [RFC] Add a .gitattributes file for use with git-merge-changelog
Samuel == Samuel Bronson naes...@gmail.com writes: Samuel [Am I really supposed to CC this to gcc@ like binutils/MAINTAINERS Samuel says I should?] I think just for files that are intended to be put in both trees and shared. Samuel Individual users will still have to: Samuel 1. Install git-merge-changelog Samuel 2. Set up the merge driver in their git config What happens if they do not? Tom
Re: [PATCH][AArch64] Fix some saturating math NEON intrinsics types
On 20/06/14 09:41, Marcus Shawcroft wrote: On 16 June 2014 15:26, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, I noticed that a few saturating math intrinsics in arm_neon.h for aarch64 have the wrong types, i.e. not what's mandated by the ACLE spec. This patch fixes that by adjusting the types of the builtin functions that those intrinsics map to (and in the process cleaning up the VCON iterator) and adding tests for the affected intrinsics. I realise it's quite big, but the changes are mostly uniform. Bootstrapped and tested aarch64-none-linux-gnu. Ok for trunk? OK, can you prepare a 4.9 backport? Sure, but it depends on https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html. Is it ok to backport that one as well? It passes regtest on aarch64-none-elf and aarch64_be-none-elf. Kyrill Cheers /Marcus
[PATCH][AArch64] Implement vfma_f64, vmla_f64, vfms_f64, vmls_f64 intrinsics
Hi all, Now that Alan fixed the float64x1_t machinery, this patch implements some low-hanging intrinsics in arm_neon.h. Tested aarch64-none-elf and bootstrapped on aarch64-linux. Ok for trunk? Thanks, Kyrill 2014-06-20 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vfma_f64): New intrinsic. (vmla_f64): Likewise. (vfms_f64): Likewise. (vmls_f64): Likewise. 2014-06-20 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vfma_f64.c: New test. * gcc.target/aarch64/simd/vmla_f64.c: Likewise. * gcc.target/aarch64/simd/vfms_f64.c: Likewise. * gcc.target/aarch64/simd/vmls_f64.c: Likewise. commit ffb5a3efe38e50c0d410b5517e030aa37cad88b7 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Wed Jun 18 14:25:17 2014 +0100 [AArch64] Implement vfma_f64, vmla_f64, vfms_f64, vmls_f64 in arm_neon.h diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index fd520f5..2809b3e 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -16701,6 +16701,14 @@ vextq_u64 (uint64x2_t __a, uint64x2_t __b, __const int __c) #endif } +/* vfma */ + +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vfma_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c) +{ + return (float64x1_t) {__builtin_fma (__b[0], __c[0], __a[0])}; +} + /* vfma_lane */ __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) @@ -16804,6 +16812,14 @@ vfmaq_laneq_f64 (float64x2_t __a, float64x2_t __b, __a); } +/* vfms */ + +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vfms_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c) +{ + return (float64x1_t) {__builtin_fma (-__b[0], __c[0], __a[0])}; +} + /* vfms_lane */ __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) @@ -18432,6 +18448,12 @@ vmla_f32 (float32x2_t a, float32x2_t b, float32x2_t c) return a + b * c; } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vmla_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c) +{ + return __a + __b * __c; +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vmlaq_f32 (float32x4_t a, float32x4_t b, float32x4_t c) { @@ -18600,6 +18622,12 @@ vmls_f32 (float32x2_t a, float32x2_t b, float32x2_t c) return a - b * c; } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vmls_f64 (float64x1_t __a, float64x1_t __b, float64x1_t __c) +{ + return __a - __b * __c; +} + __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) vmlsq_f32 (float32x4_t a, float32x4_t b, float32x4_t c) { diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vfma_f64.c b/gcc/testsuite/gcc.target/aarch64/simd/vfma_f64.c new file mode 100644 index 000..d6bcf1c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vfma_f64.c @@ -0,0 +1,41 @@ +/* Test the vfma_f64 AArch64 SIMD intrinsic. */ + +/* { dg-do run } */ +/* { dg-options -save-temps -O3 } */ + +#include arm_neon.h +#include stdio.h + +#define EPS 1.0e-15 + + +extern void abort (void); + +int +main (void) +{ + float64x1_t arg1; + float64x1_t arg2; + float64x1_t arg3; + + float64_t expected; + float64_t actual; + + arg1 = vcreate_f64 (0x3fe3955382d35b0eULL); + arg2 = vcreate_f64 (0x3fa88480812d6670ULL); + arg3 = vcreate_f64 (0x3fd5791ae2a92572ULL); + + expected = 0.6280448184360076; + actual = vget_lane_f64 (vfma_f64 (arg1, arg2, arg3), 0); + + if (__builtin_fabs (expected - actual) EPS) +{ + fprintf (stderr, Expected: %lf, got %lf\n, expected, actual); + abort (); +} + + return 0; +} + +/* { dg-final { scan-assembler-times fmadd\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+\n 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vfms_f64.c b/gcc/testsuite/gcc.target/aarch64/simd/vfms_f64.c new file mode 100644 index 000..3f34758 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vfms_f64.c @@ -0,0 +1,41 @@ +/* Test the vfms_f64 AArch64 SIMD intrinsic. */ + +/* { dg-do run } */ +/* { dg-options -save-temps -O3 } */ + +#include arm_neon.h +#include stdio.h + +#define EPS 1.0e-15 + + +extern void abort (void); + +int +main (void) +{ + float64x1_t arg1; + float64x1_t arg2; + float64x1_t arg3; + + float64_t expected; + float64_t actual; + + arg1 = vcreate_f64 (0x3fe730af8db9e6f7ULL); + arg2 = vcreate_f64 (0x3fe6b78680fa29ceULL); + arg3 = vcreate_f64 (0x3feea3cbf921fbe0ULL); + + expected = 4.4964705746355915e-2; + actual = vget_lane_f64 (vfms_f64 (arg1, arg2, arg3), 0); + + if (__builtin_fabs (expected - actual) EPS) +{ + fprintf (stderr, Expected: %lf, got %lf\n, expected, actual); + abort (); +} + + return 0; +} + +/* { dg-final { scan-assembler-times fmsub\[ \t\]+\[dD\]\[0-9\]+, ?\[dD\]\[0-9\]+,
[linaro/gcc-4_8-branch] Merge from gcc-4_8-branch
Hi, we have merged the gcc-4_8-branch into linaro/gcc-4_8-branch up to revision 210799 as r211850. Thanks Yvan
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
I don't have any comments on this patch. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 5/5] add libcc1
Trevor == Trevor Saunders tsaund...@mozilla.com writes: Trevor I'm curious, what is the reason you choose not to write this in C++11 or Trevor later? Its distributed with gcc, so the only case where you aren't Trevor building with the in tree compiler and libraries is when your cross Trevor compiling gcc, and it doesn't seem particularly important to support Trevor building the plugin or library in that configuration. So istm you could Trevor go all the way and assume you are being built with trunk gcc and Trevor libraries. The plugin has to be ABI compatible with GCC itself, and my understanding was that C++11 and however GCC is built are not necessarily compatible. Switching to C++11 would be an improvement -- variadic templates would simplify the RPC code (with a complicated caveat). So if it is possible I am interested. Trevor I'm going to use this as an excuse to bring up something I've wanted to Trevor discuss for a while. Trevor So can we add C++ stuff to libiberty and allow building Trevor libiberty without it for binutils / gdb, or can we do something Trevor else to avoid this kind of stuff? One way would be to just make a new top-level directory for a new library. Trevor This question also arises in the case of templating splay_tree, and I Trevor imagine if gdb switches to C++ some day they'll want to reuse vec.h. While I would like that to happen, I think the odds are very long now. +connection (int fd) + : m_fd (fd), +m_aux_fd (-1), +m_callbacks () Trevor Personally I'd leave that to the compiler to write, but I guess there's Trevor something to be said for being explicit. I can't recall if I did this in response to a warning or if it was just because I wanted to be explicit. I'm inclined to leave it, but I suppose only out of inertia. + void print (const char *buf) Trevor explicitly mark it as virtual? Good idea, done. +// This is a wrapper function that is called by the RPC system and +// that then forwards the call to the library user. Note that the +// return value is not used; the type cannot be 'void' due to +// limitations in our simple RPC. +gcc_address Trevor looks like this one probably is used? Thanks, fixed. + char **argv = new (std::nothrow) char *[self-args.size () + 1]; Trevor What's the point of making this no throw? you don't null check it so Trevor you'll crash anyway afaict. Thanks. I changed it to do a NULL check. It's nothrow because nothing in libcc1 or gdb is prepared for a C++ exception. While I like exceptions (gdb uses its own longjmp-based exception system extensively), my understanding is that they aren't currently used in gcc. +cc1_plugin::status +cc1_plugin::unmarshall (connection *conn, char **result) +{ + unsigned long long len; + + if (!conn-require ('s')) +return FAIL; + if (!conn-get (len, sizeof (len))) +return FAIL; + + if (len == -1ULL) +{ + *result = NULL; + return OK; +} + + char *str = new (std::nothrow) char[len + 1]; Trevor It'd be really nice if the type of the out arg forced the caller to deal Trevor with deleting the string like unique_ptrchar, it would be even nicer Trevor if you could stick a random buffer in a std::string, but I guess you Trevor can't :( Yeah, it's all quite simplistic. I suppose it could be upgraded, there just didn't seem to be a need. Trevor Also where does this array get deleted? The unmarshalling methods are generally called via argument_wrappers. An example is in connection.cc: // Use an argument_wrapper here to simplify management // of the string's lifetime. argument_wrapperchar * method_name; if (!method_name.unmarshall (this)) return FAIL; Then in rpc.hh: // Specialization for string types. template class argument_wrapperconst char * { public: argument_wrapper () : m_object (NULL) { } ~argument_wrapper () { delete[] m_object; } Tom
Re: [PATCH] Implement -fsanitize=bounds and internal calls in FEs
On Thu, Jun 19, 2014 at 07:47:54PM +0200, Jakub Jelinek wrote: On Thu, Jun 19, 2014 at 04:56:53PM +0200, Marek Polacek wrote: + /* Don't instrument this FMA-like array in non-strict Also, please don't use FMA to mean flexible member array, it is flexible array member, but more importantly, FMA is used for fused multiply-add, so IMHO it is better to spell it without acronym. Just for the record, I've fixed this up in some earlier patch. Marek
Re: [patch i386]: Combine memory and indirect jump
I tested variant to use additional the split pass for memory combining for indirect jumps. By this we don't need to add a second peephole2 pass. Other advantage of this version is that even for -O1 we do combining. ChangeLog 2014-06-20 Kai Tietz kti...@redhat.com PR target/39284 * config/i386/i386.md (peehole2): To combine indirect jump with memory. (split2): Likewise. 2014-06-20 Kai Tietz kti...@redhat.com * gcc.target/i386/indjmp-1.c: New test. Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: testsuite/gcc.target/i386/indjmp-1.c === --- testsuite/gcc.target/i386/indjmp-1.c(Revision 0) +++ testsuite/gcc.target/i386/indjmp-1.c(Arbeitskopie) @@ -0,0 +1,23 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options -O2 } */ + +#define ADVANCE_AND_DISPATCH() goto *addresses[*pc++] + +void +Interpret(const unsigned char *pc) +{ +static const void *const addresses[] = { + l0, l1, l2 +}; + +l0: +ADVANCE_AND_DISPATCH(); + +l1: +ADVANCE_AND_DISPATCH(); + +l2: +return; +} + +/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */ Index: config/i386/i386.md === --- config/i386/i386.md(Revision 211850) +++ config/i386/i386.md(Arbeitskopie) @@ -11466,6 +11466,24 @@ (match_dup 4))) (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) +;; Combining simple memory jump instruction + +(define_peephole2 + [(set (match_operand:W 0 register_operand) +(match_operand:W 1 memory_operand)) + (set (pc) (match_dup 0))] + !TARGET_X32 peep2_reg_dead_p (2, operands[0]) + [(set (pc) (match_dup 1))]) + +;; For avoiding a second pass for peephole, we use here split pass + +(define_split + [(set (match_operand:W 0 register_operand) +(match_operand:W 1 memory_operand)) + (set (pc) (match_dup 0))] + !TARGET_X32 peep2_reg_dead_p (2, operands[0]) + [(set (pc) (match_dup 1))]) + ;; Call subroutine, returning value in operand 0 (define_expand call_value
[PATCH] Remove bogus include path with in-tree cloog
Hi, I have noticed there is a minor flaw with the include path when cloog is installed in-tree. That is, the cloog-include directory is added twice, first with absolute path, and then again with relative path, but with one ../ to little, so this is useless when compiling sources in the gcc directory. For example, if I call ../gcc-4.10-20140608/configure, the following is added to each invocation of xg++: -I/absolute_path/gcc-4.10-20140608/cloog/include -I../gcc-4.10-20140608/cloog/include The attached patch removes the bogus relative include path for in-tree cloog/include. Boot-strapped and regression-tested on x86_64-linux-gnu. OK for trunk? Thanks Bernd. 2014-06-20 Bernd Edlinger bernd.edlin...@hotmail.de Fix include path for in-tree cloog. * config/cloog.m4 (CLOOG_INIT_FLAGS): Remove bogus include path. * configure: Regenerate. patch-cloog.diff Description: Binary data
Re: [patch i386]: Combine memory and indirect jump
On 06/20/2014 08:56 AM, Kai Tietz wrote: +(define_split + [(set (match_operand:W 0 register_operand) +(match_operand:W 1 memory_operand)) + (set (pc) (match_dup 0))] + !TARGET_X32 peep2_reg_dead_p (2, operands[0]) + [(set (pc) (match_dup 1))]) + Huh? You can't use peep2 data structures in split passes. r~
[PATCH] Fix arrays in rtx.u + add minor rtx verification
When implementing -fsanitize=bounds I noticed a whole slew of errors about accessing u.fld[] field in rtx_def. Turned out this is indeed a bug, the array should have a size of 8; u.hwint[] array had similar issue. Thus fixed, plus I added some verification code to genpreds.c (can't do it in gengenrtl.c as that doesn't include rtl.h) so this won't happen again. Verified that the bootstrap crashes by bootstrapping with changed RTX_FLD_WIDTH/RTX_HWINT_WIDTH, otherwise the bootstrapp passes. Ok for trunk? 2014-06-20 Marek Polacek pola...@redhat.com * genpreds.c (verify_rtx_codes): New function. (main): Call it. * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define. (struct rtx_def): Use them. diff --git gcc/genpreds.c gcc/genpreds.c index b14a4ac..3826757 100644 --- gcc/genpreds.c +++ gcc/genpreds.c @@ -1471,6 +1471,40 @@ parse_option (const char *opt) return 0; } +/* Verify RTX codes. We can't call fatal_error here, so call + gcc_unreachable after error to really abort. */ + +static void +verify_rtx_codes (void) +{ + unsigned int i, j; + + for (i = 0; i NUM_RTX_CODE; i++) +if (strchr (GET_RTX_FORMAT (i), 'w') == NULL) + { + if (strlen (GET_RTX_FORMAT (i)) RTX_FLD_WIDTH) + { + error (insufficient size of RTX_FLD_WIDTH); + gcc_unreachable (); + } + } +else + { + const size_t len = strlen (GET_RTX_FORMAT (i)); + for (j = 0; j len; j++) + if (GET_RTX_FORMAT (i)[j] != 'w') + { + error (rtx format does not contain only hwint entries); + gcc_unreachable (); + } + if (len RTX_HWINT_WIDTH) + { + error (insufficient size of RTL_MAX_HWINT_WIDTH); + gcc_unreachable (); + } + } +} + /* Master control. */ int main (int argc, char **argv) @@ -1518,5 +1552,7 @@ main (int argc, char **argv) if (have_error || ferror (stdout) || fflush (stdout) || fclose (stdout)) return FATAL_EXIT_CODE; + verify_rtx_codes (); + return SUCCESS_EXIT_CODE; } diff --git gcc/rtl.h gcc/rtl.h index 6ec91a8..3f2e774 100644 --- gcc/rtl.h +++ gcc/rtl.h @@ -264,6 +264,12 @@ struct GTY((variable_size)) hwivec_def { #define CWI_PUT_NUM_ELEM(RTX, NUM) \ (RTL_FLAG_CHECK1(CWI_PUT_NUM_ELEM, (RTX), CONST_WIDE_INT)-u2.num_elem = (NUM)) +/* The maximum number of entries in the FLD array in rtx. */ +#define RTX_FLD_WIDTH 8 + +/* The maximum number of entries in the HWINT array in rtx. */ +#define RTX_HWINT_WIDTH (MAX (REAL_WIDTH, 3)) + /* RTL expression (rtx). */ struct GTY((chain_next (RTX_NEXT (%h)), @@ -378,8 +384,8 @@ struct GTY((chain_next (RTX_NEXT (%h)), The number of operands and their types are controlled by the `code' field, according to rtl.def. */ union u { -rtunion fld[1]; -HOST_WIDE_INT hwint[1]; +rtunion fld[RTX_FLD_WIDTH]; +HOST_WIDE_INT hwint[RTX_HWINT_WIDTH]; struct block_symbol block_sym; struct real_value rv; struct fixed_value fv; Marek
Re: [patch i386]: Combine memory and indirect jump
2014-06-20 19:23 GMT+02:00 Richard Henderson r...@redhat.com: On 06/20/2014 08:56 AM, Kai Tietz wrote: +(define_split + [(set (match_operand:W 0 register_operand) +(match_operand:W 1 memory_operand)) + (set (pc) (match_dup 0))] + !TARGET_X32 peep2_reg_dead_p (2, operands[0]) + [(set (pc) (match_dup 1))]) + Huh? You can't use peep2 data structures in split passes. r~ Duh, you are right ... that shouldn't work, nevertheless it bootstrapped fine. Well, so we will need second peephole2 pass. I will come with patch for that soon. Kai
Re: [patch i386]: Combine memory and indirect jump
So revert to use a second peephole2 pass before final split before sched2 pass. Ok for apply ChangeLog 2014-06-20 Kai Tietz kti...@redhat.com PR target/39284 * passes.def (peephole2): Add second peephole2 pass before split before sched2 pass. * config/i386/i386.md (peehole2): To combine indirect jump with memory. (split2): Likewise. 2014-06-20 Kai Tietz kti...@redhat.com * gcc.target/i386/indjmp-1.c: New test. Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu running. Ok for apply when boostraps are passing? Regards, Kai Index: testsuite/gcc.target/i386/indjmp-1.c === --- testsuite/gcc.target/i386/indjmp-1.c(Revision 0) +++ testsuite/gcc.target/i386/indjmp-1.c(Arbeitskopie) @@ -0,0 +1,23 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options -O2 } */ + +#define ADVANCE_AND_DISPATCH() goto *addresses[*pc++] + +void +Interpret(const unsigned char *pc) +{ +static const void *const addresses[] = { + l0, l1, l2 +}; + +l0: +ADVANCE_AND_DISPATCH(); + +l1: +ADVANCE_AND_DISPATCH(); + +l2: +return; +} + +/* { dg-final { scan-assembler-not jmp\[ \t\]*.%eax } } */ Index: passes.def === --- passes.def(Revision 211850) +++ passes.def(Arbeitskopie) @@ -393,6 +393,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_reorder_blocks); NEXT_PASS (pass_branch_target_load_optimize2); NEXT_PASS (pass_leaf_regs); + NEXT_PASS (pass_peephole2); NEXT_PASS (pass_split_before_sched2); NEXT_PASS (pass_sched2); NEXT_PASS (pass_stack_regs); Index: config/i386/i386.md === --- config/i386/i386.md(Revision 211850) +++ config/i386/i386.md(Arbeitskopie) @@ -11466,6 +11466,15 @@ (match_dup 4))) (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) +;; Combining simple memory jump instruction + +(define_peephole2 + [(set (match_operand:W 0 register_operand) +(match_operand:W 1 memory_operand)) + (set (pc) (match_dup 0))] + !TARGET_X32 peep2_reg_dead_p (2, operands[0]) + [(set (pc) (match_dup 1))]) + ;; Call subroutine, returning value in operand 0 (define_expand call_value
Re: [patch i386]: Combine memory and indirect jump
On 06/20/2014 10:52 AM, Kai Tietz wrote: 2014-06-20 Kai Tietz kti...@redhat.com PR target/39284 * passes.def (peephole2): Add second peephole2 pass before split before sched2 pass. * config/i386/i386.md (peehole2): To combine indirect jump with memory. (split2): Likewise. Why are we adding a second pass instead of just moving the one? r~
Re: [patch i386]: Combine memory and indirect jump
2014-06-20 19:55 GMT+02:00 Richard Henderson r...@redhat.com: On 06/20/2014 10:52 AM, Kai Tietz wrote: 2014-06-20 Kai Tietz kti...@redhat.com PR target/39284 * passes.def (peephole2): Add second peephole2 pass before split before sched2 pass. * config/i386/i386.md (peehole2): To combine indirect jump with memory. (split2): Likewise. Why are we adding a second pass instead of just moving the one? r~ I told that in a prior mail in that thread to Jeff. IIRC there are some conversion of impossible pushes then done too late, additional some patterns about split dieing register too. Means we produce weaker code. Kai
Re: [patch i386]: Combine memory and indirect jump
On 06/20/14 12:07, Kai Tietz wrote: 2014-06-20 19:55 GMT+02:00 Richard Henderson r...@redhat.com: On 06/20/2014 10:52 AM, Kai Tietz wrote: 2014-06-20 Kai Tietz kti...@redhat.com PR target/39284 * passes.def (peephole2): Add second peephole2 pass before split before sched2 pass. * config/i386/i386.md (peehole2): To combine indirect jump with memory. (split2): Likewise. Why are we adding a second pass instead of just moving the one? r~ I told that in a prior mail in that thread to Jeff. IIRC there are some conversion of impossible pushes then done too late, additional some patterns about split dieing register too. Means we produce weaker code. So let's dig into this deeper. Examples explanations would help. I know it feels like a bit of a runaround, but avoiding adding the pass would be good. jeff
Re: C++ PATCH for c++/59296 (rvalue object and lvalue ref-qualifier)
On 06/20/2014 03:11 PM, Jason Merrill wrote: On 06/19/2014 01:29 PM, Marc Glisse wrote: That looks weird to me. The const version is a better match than the const, so we should pick that one in overload resolution, but if we remove the const version, the other one seems valid to me Hmm, you're right, I was confused. Here's a patch that fixes this properly. Tested x86_64-pc-linux-gnu, applying to trunk. commit cc1e903c60c452ad7b618f6a9ff25ae85741424e Author: Jason Merrill ja...@redhat.com Date: Fri Jun 20 14:51:21 2014 +0200 PR c++/59296 * call.c (add_function_candidate): Avoid special 'this' handling if we have a ref-qualifier. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index da91433..4847c3a 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -2025,9 +2025,9 @@ add_function_candidate (struct z_candidate **candidates, object parameter has reference type. */ bool rv = FUNCTION_RVALUE_QUALIFIED (TREE_TYPE (fn)); parmtype = cp_build_reference_type (parmtype, rv); - /* Don't bind an rvalue to a const lvalue ref-qualifier. */ - if (!rv) - lflags |= LOOKUP_NO_RVAL_BIND|LOOKUP_NO_TEMP_BIND; + /* The special handling of 'this' conversions in compare_ics + does not apply if there is a ref-qualifier. */ + is_this = false; } else { @@ -8597,10 +8597,11 @@ compare_ics (conversion *ics1, conversion *ics2) /* [over.ics.rank] --S1 and S2 are reference bindings (_dcl.init.ref_) and neither refers - to an implicit object parameter, and either S1 binds an lvalue reference - to an lvalue and S2 binds an rvalue reference or S1 binds an rvalue - reference to an rvalue and S2 binds an lvalue reference - (C++0x draft standard, 13.3.3.2) + to an implicit object parameter of a non-static member function + declared without a ref-qualifier, and either S1 binds an lvalue + reference to an lvalue and S2 binds an rvalue reference or S1 binds an + rvalue reference to an rvalue and S2 binds an lvalue reference (C++0x + draft standard, 13.3.3.2) --S1 and S2 are reference bindings (_dcl.init.ref_), and the types to which the references refer are the same type except for
C++ PATCH for c++/61556 (constexpr member function)
Now that we're calling build_this in build_over_call, it needs to happen on the template path as well. Tested x86_64-pc-linux-gnu, applying to trunk. commit daf445b34181c222baa792e7310fc4af2d26ec3c Author: Jason Merrill ja...@redhat.com Date: Thu Jun 19 14:29:51 2014 +0200 PR c++/61556 * call.c (build_over_call): Call build_this in template path. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index e147abd..da91433 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -6896,7 +6896,7 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain) ++nargs; alcarray = XALLOCAVEC (tree, nargs); - alcarray[0] = first_arg; + alcarray[0] = build_this (first_arg); FOR_EACH_VEC_SAFE_ELT (args, ix, arg) alcarray[ix + 1] = arg; argarray = alcarray; diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-template7.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-template7.C new file mode 100644 index 000..e835dbf --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-template7.C @@ -0,0 +1,32 @@ +// PR c++/61556 +// { dg-do compile { target c++11 } } + +class ValueType { +public: +constexpr operator int() const {return m_ID;}; +constexpr ValueType(const int v) +: m_ID(v) {} +private: +int m_ID; +}; + +class ValueTypeEnum { +public: +static constexpr ValueType doubleval = ValueType(1); +}; + +template int format +class ValueTypeInfo { +}; + +template typename Format +class FillFunctor { +public: +FillFunctor() { +ValueTypeInfoValueTypeEnum::doubleval v; +} +}; + +int main() { +ValueTypeInfoValueTypeEnum::doubleval v; +}
Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64
On 19-06-14 20:41, Richard Henderson wrote: On 06/19/2014 11:25 AM, Tom de Vries wrote: On 19-06-14 05:53, Richard Henderson wrote: On 06/01/2014 03:00 AM, Tom de Vries wrote: +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM)); Actually, I'd like to know more about how this is supposed to work. Why are you only marking the two registers that would be used by a PLT entry, but not those clobbered by the ld.so trampoline, or indeed the unknown function that would be called from the PLT. Oh, I see, looking at the code we do actually follow the cgraph and make sure it is a direct call with a known destination. So, in fact, it's only the registers that could be clobbered by ld branch islands (so these two are still correct for aarch64). This means the documentation is actually wrong when it mentions PLTs at all. Yes, if we go from the point of view that the TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS hooks sole purpose is to enable the fuse-caller-save optimization. How about this updated definition ? OK for trunk if re-testing on arm succeeds ? I did like the doc including mention of stubs, because they're easy to forget. How about Set to true if each call that binds to a local definition explicitly clobbers or sets all non-fixed registers modified by performing the call. That is, by the call pattern itself, or by code that might be inserted by the linker (e.g. stubs, veneers, branch islands), but not including those modifiable by the callee. The affected registers may be mentioned explicitly in the call pattern, or included as clobbers in CALL_INSN_FUNCTION_USAGE. The default version of this hook is set to false. The purpose of this hook is to enable the fuse-caller-save optimization. Looks good to me. Bootstrapped and committed as attached. Thanks, - Tom 2014-06-20 Tom de Vries t...@codesourcery.com * target.def (call_fusage_contains_non_callee_clobbers): Update definition. * doc/tm.texi: Regenerate. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c272630..45281ae 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -4884,14 +4884,14 @@ Whether this target supports splitting the stack when the options described in @ @cindex miscellaneous register hooks @deftypevr {Target Hook} bool TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS -set to true if all the calls in the current function contain clobbers in -CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call -rather than by the callee, and are not already set or clobbered in the call -pattern. Examples of such registers are registers used in PLTs and stubs, -and temporary registers used in the call instruction but not present in the -rtl pattern. Another way to formulate it is the registers not present in the -rtl pattern that are clobbered by the call assuming the callee does not -clobber any register. The default version of this hook is set to false. +Set to true if each call that binds to a local definition explicitly +clobbers or sets all non-fixed registers modified by performing the call. +That is, by the call pattern itself, or by code that might be inserted by the +linker (e.g. stubs, veneers, branch islands), but not including those +modifiable by the callee. The affected registers may be mentioned explicitly +in the call pattern, or included as clobbers in CALL_INSN_FUNCTION_USAGE. +The default version of this hook is set to false. The purpose of this hook +is to enable the fuse-caller-save optimization. @end deftypevr @node Varargs diff --git a/gcc/target.def b/gcc/target.def index e455211..ee250e6 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -5128,18 +5128,18 @@ FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM, and the PIC_OFFSET_TABLE_REGNUM., hook_void_bitmap) /* Targets should define this target hook to mark that non-callee clobbers are - present in CALL_INSN_FUNCTION_USAGE for all the calls in the current - function. */ + present in CALL_INSN_FUNCTION_USAGE for all the calls that bind to a local + definition. */ DEFHOOKPOD (call_fusage_contains_non_callee_clobbers, - set to true if all the calls in the current function contain clobbers in\n\ -CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call\n\ -rather than by the callee, and are not already set or clobbered in the call\n\ -pattern. Examples of such registers are registers used in PLTs and stubs,\n\ -and temporary registers used in the call instruction but not present in the\n\ -rtl pattern. Another way to formulate it is the registers not present in the\n\ -rtl pattern that are clobbered by the call assuming the callee does not\n\ -clobber any register. The default version of this hook is set to false., + Set to true if each call that binds to a local definition explicitly\n\
Re: [PATCH] Power/GCC: Remove trailing NOP from byte-swap code
014-06-20 Maciej W. Rozycki ma...@codesourcery.com gcc/ * config/rs6000/rs6000.md: Append `DONE' to preparation statements of `bswap' pattern splitters. Okay. Thanks, David
Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification
On Fri, Jun 20, 2014 at 07:36:41PM +0200, Marek Polacek wrote: 2014-06-20 Marek Polacek pola...@redhat.com * genpreds.c (verify_rtx_codes): New function. (main): Call it. * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define. (struct rtx_def): Use them. Note, e.g. Coverity also complains about this stuff loudly too, apparently not just in the problematic case where rtx_def is used in a middle of structure, but also when used in flexible array like spot. Most RTLs are allocated through rtx_alloc and the size is determined from RTX_HDR_SIZE (i.e. offsetof) and/or RTX_CODE_SIZE, so your rtl.h change IMHO shouldn't affect anything but make the expmed.c init_expmed_rtl structure somewhat longer. --- gcc/genpreds.c +++ gcc/genpreds.c @@ -1471,6 +1471,40 @@ parse_option (const char *opt) return 0; } +/* Verify RTX codes. We can't call fatal_error here, so call + gcc_unreachable after error to really abort. */ + +static void +verify_rtx_codes (void) +{ + unsigned int i, j; + + for (i = 0; i NUM_RTX_CODE; i++) +if (strchr (GET_RTX_FORMAT (i), 'w') == NULL) + { + if (strlen (GET_RTX_FORMAT (i)) RTX_FLD_WIDTH) + { + error (insufficient size of RTX_FLD_WIDTH); + gcc_unreachable (); I think it would be nice to be more verbose, i.e. say which rtx has longer format string than RTX_FLD_WIDTH, and perhaps also the size and RTX_FLD_WIDTH value. Also, can't you use internal_error instead of error + gcc_unreachable ? So perhaps internal_error (%s format %s longer than RTX_FLD_WIDTH %d\n, GET_RTX_NAME (i), GET_RTX_FORMAT (i), (int) RTX_FLD_WIDTH); ? + } + } +else + { + const size_t len = strlen (GET_RTX_FORMAT (i)); + for (j = 0; j len; j++) + if (GET_RTX_FORMAT (i)[j] != 'w') + { + error (rtx format does not contain only hwint entries); + gcc_unreachable (); + } + if (len RTX_HWINT_WIDTH) + { + error (insufficient size of RTL_MAX_HWINT_WIDTH); + gcc_unreachable (); + } + } And similarly here. Otherwise, LGTM, but as I've been discussing this with Marek, I'd prefer somebody else to review it. Jakub
Re: [PATCH] Power/GCC: Remove trailing NOP from byte-swap code
On Fri, 20 Jun 2014, David Edelsohn wrote: gcc/ * config/rs6000/rs6000.md: Append `DONE' to preparation statements of `bswap' pattern splitters. Okay. Committed, thanks for your review. Maciej
Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification
On 06/20/14 13:01, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 07:36:41PM +0200, Marek Polacek wrote: 2014-06-20 Marek Polacek pola...@redhat.com * genpreds.c (verify_rtx_codes): New function. (main): Call it. * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define. (struct rtx_def): Use them. Note, e.g. Coverity also complains about this stuff loudly too, Yes it does. IIRC, these warnings from Coverity were the cause of GCC reaching the #1 or #2 position across Red Hat's packages in terms of Coverity warnings. Not a good position to be in. like spot. Most RTLs are allocated through rtx_alloc and the size is determined from RTX_HDR_SIZE (i.e. offsetof) and/or RTX_CODE_SIZE, so your rtl.h change IMHO shouldn't affect anything but make the expmed.c init_expmed_rtl structure somewhat longer. Right. This comment was actually very helpful in that I wasn't aware of precisely which cases Marek was trying to address. Presumably the [1] sizing is what prevents any compile-time checking of this? No strong opinion on the internal_error vs error+unreachable. --- gcc/genpreds.c +++ gcc/genpreds.c @@ -1471,6 +1471,40 @@ parse_option (const char *opt) return 0; } +/* Verify RTX codes. We can't call fatal_error here, so call + gcc_unreachable after error to really abort. */ + +static void +verify_rtx_codes (void) +{ + unsigned int i, j; + + for (i = 0; i NUM_RTX_CODE; i++) +if (strchr (GET_RTX_FORMAT (i), 'w') == NULL) + { + if (strlen (GET_RTX_FORMAT (i)) RTX_FLD_WIDTH) + { + error (insufficient size of RTX_FLD_WIDTH); + gcc_unreachable (); I think it would be nice to be more verbose, i.e. say which rtx has longer format string than RTX_FLD_WIDTH, and perhaps also the size and RTX_FLD_WIDTH value. Also, can't you use internal_error instead of error + gcc_unreachable ? Agreed, definitely indicate which RTX code is problematical. Jeff
[PATCH] Fix 61565 -- cmpelim vs non-call exceptions
There aren't too many users of the cmpelim pass, and previously they were all small embedded targets without an FPU. I'm a bit surprised that Ramana decided to enable this pass for aarch64, as that target is not so limited as the block comment for the pass describes. Honestly, whatever is being deleted here ought to have been found earlier, either via combine or cse. We ought to find out why any changes are made during this pass for aarch64. That said, this PR does demonstrate a bug in the handling of fp comparisons in the presence of -fnon-call-exceptions, so I go ahead and fix that regardless of what we do with the aarch64 port longer term. Bootstrap still in progress, but the original testcase is resolved. r~ * compare-elim.c (struct comparison): Add eh_note. (find_comparison_dom_walker::before_dom_children): Don't eliminate a redundant comparison in a different EH region. Purge EH edges if necessary. diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c index 2fbb75b..4ecdd48 100644 --- a/gcc/compare-elim.c +++ b/gcc/compare-elim.c @@ -100,6 +100,9 @@ struct comparison constants. */ rtx in_a, in_b; + /* The REG_EH_REGION of the comparison. */ + rtx eh_note; + /* Information about how this comparison is used. */ struct comparison_use uses[MAX_CMP_USE]; @@ -262,6 +265,7 @@ find_comparison_dom_walker::before_dom_children (basic_block bb) struct comparison *last_cmp; rtx insn, next, last_clobber; bool last_cmp_valid; + bool need_purge = false; bitmap killed; killed = BITMAP_ALLOC (NULL); @@ -303,44 +307,60 @@ find_comparison_dom_walker::before_dom_children (basic_block bb) if (src) { enum machine_mode src_mode = GET_MODE (src); + rtx eh_note = NULL; - /* Eliminate a compare that's redundant with the previous. */ - if (last_cmp_valid - rtx_equal_p (last_cmp-in_a, XEXP (src, 0)) - rtx_equal_p (last_cmp-in_b, XEXP (src, 1))) - { - rtx flags, x; - enum machine_mode new_mode - = targetm.cc_modes_compatible (last_cmp-orig_mode, src_mode); + if (flag_non_call_exceptions) + eh_note = find_reg_note (insn, REG_EH_REGION, NULL); - /* New mode is incompatible with the previous compare mode. */ - if (new_mode == VOIDmode) - continue; + if (!last_cmp_valid) + goto dont_delete; - if (new_mode != last_cmp-orig_mode) - { - flags = gen_rtx_REG (src_mode, targetm.flags_regnum); + /* Take care that it's in the same EH region. */ + if (flag_non_call_exceptions + !rtx_equal_p (eh_note, last_cmp-eh_note)) + goto dont_delete; - /* Generate new comparison for substitution. */ - x = gen_rtx_COMPARE (new_mode, XEXP (src, 0), XEXP (src, 1)); - x = gen_rtx_SET (VOIDmode, flags, x); + /* Make sure the compare is redundant with the previous. */ + if (!rtx_equal_p (last_cmp-in_a, XEXP (src, 0)) + || !rtx_equal_p (last_cmp-in_b, XEXP (src, 1))) + goto dont_delete; - if (!validate_change (last_cmp-insn, - PATTERN (last_cmp-insn), x, false)) - continue; + /* New mode must be compatible with the previous compare mode. */ + { + enum machine_mode new_mode + = targetm.cc_modes_compatible (last_cmp-orig_mode, src_mode); + if (new_mode == VOIDmode) + goto dont_delete; - last_cmp-orig_mode = new_mode; - } + if (new_mode != last_cmp-orig_mode) + { + rtx x, flags = gen_rtx_REG (src_mode, targetm.flags_regnum); - delete_insn (insn); - continue; - } + /* Generate new comparison for substitution. */ + x = gen_rtx_COMPARE (new_mode, XEXP (src, 0), XEXP (src, 1)); + x = gen_rtx_SET (VOIDmode, flags, x); + if (!validate_change (last_cmp-insn, + PATTERN (last_cmp-insn), x, false)) + goto dont_delete; + + last_cmp-orig_mode = new_mode; + } + } + + /* All tests and substitutions succeeded! */ + if (eh_note) + need_purge = true; + delete_insn (insn); + continue; + + dont_delete: last_cmp = XCNEW (struct comparison); last_cmp-insn = insn; last_cmp-prev_clobber = last_clobber; last_cmp-in_a = XEXP (src, 0); last_cmp-in_b = XEXP (src, 1); + last_cmp-eh_note = eh_note; last_cmp-orig_mode = src_mode; all_compares.safe_push (last_cmp); @@ -404,6 +424,11 @@ find_comparison_dom_walker::before_dom_children
testsuite allocators patch
Hi I would like to finally propose this patch before the one on _Rb_tree, as a separate one. I have adopted the same evolution on the tracker_allocator with even a perfect forwarding constructor to allow its usage on top of the uneq_allocator which take a personality parameter. Doing so I realized that move_assign_neg.cc tests were not accurate enough as they needed a non move propagating allocator and the uneq_allocator were not explicitly non propagating. 2014-06-21 François Dumont fdum...@gcc.gnu.org * testsuite/util/testsuite_allocator.h (tracker_allocator_counter::allocate): Remove new invocation, only collect information. (tracker_allocator_counter::deallocate): Remove delete invocation, only collect information. (check_inconsistent_alloc_value_type): New. (tracker_allocator): Transform as a facade for any allocator type. (uneq_allocator): Likewise. (propagating_allocator): Likewise. * testsuite/23_containers/forward_list/debug/move_assign_neg.cc: Use an explicitly non propagating allocator. * testsuite/23_containers/map/debug/move_assign_neg.cc: Likewise. * testsuite/23_containers/multimap/debug/move_assign_neg.cc: likewise. * testsuite/23_containers/multiset/debug/move_assign_neg.cc: Likewise. * testsuite/23_containers/set/debug/move_assign_neg.cc: Likewise. * testsuite/23_containers/unordered_map/debug/move_assign_neg.cc: Likewise. * testsuite/23_containers/unordered_multimap/debug/move_assign_neg.cc: Likewise. * testsuite/23_containers/unordered_multiset/debug/move_assign_neg.cc: Likewise. * testsuite/23_containers/unordered_set/debug/move_assign_neg.cc: Likewise. * testsuite/23_containers/vector/debug/move_assign_neg.cc: Likewise. Tested under Linux x86_64. Ok to commit ? François
[i386] logical shift right in shrd
Hello, as reported in PR 61503, there seems to be a typo in the shrd pattern. I think it is quite unlikely to cause any problem, because the pattern is 1 instruction too long for combine to recognize it (by the way, if someone has suggestions for PR 55583...). But it is still better to fix it. Bootstrap+testsuite on x86_64-linux-gnu. 2014-06-21 Marc Glisse marc.gli...@inria.fr PR target/61503 * config/i386/i386.md (x86_64_shrd, x86_shrd): Replace ashiftrt with lshiftrt. -- Marc GlisseIndex: gcc/config/i386/i386.md === --- gcc/config/i386/i386.md (revision 211856) +++ gcc/config/i386/i386.md (working copy) @@ -9601,37 +9601,37 @@ (match_operand:DWI 1 register_operand) (match_operand:QI 2 nonmemory_operand))) (clobber (reg:CC FLAGS_REG))]) (match_dup 3)] TARGET_CMOVE [(const_int 0)] ix86_split_shift_insn (operands, operands[3], DWImode); DONE;) (define_insn x86_64_shrd [(set (match_operand:DI 0 nonimmediate_operand +r*m) -(ior:DI (ashiftrt:DI (match_dup 0) +(ior:DI (lshiftrt:DI (match_dup 0) (match_operand:QI 2 nonmemory_operand Jc)) (ashift:DI (match_operand:DI 1 register_operand r) (minus:QI (const_int 64) (match_dup 2) (clobber (reg:CC FLAGS_REG))] TARGET_64BIT shrd{q}\t{%s2%1, %0|%0, %1, %2} [(set_attr type ishift) (set_attr prefix_0f 1) (set_attr mode DI) (set_attr athlon_decode vector) (set_attr amdfam10_decode vector) (set_attr bdver1_decode vector)]) (define_insn x86_shrd [(set (match_operand:SI 0 nonimmediate_operand +r*m) -(ior:SI (ashiftrt:SI (match_dup 0) +(ior:SI (lshiftrt:SI (match_dup 0) (match_operand:QI 2 nonmemory_operand Ic)) (ashift:SI (match_operand:SI 1 register_operand r) (minus:QI (const_int 32) (match_dup 2) (clobber (reg:CC FLAGS_REG))] shrd{l}\t{%s2%1, %0|%0, %1, %2} [(set_attr type ishift) (set_attr prefix_0f 1) (set_attr mode SI) (set_attr pent_pair np)
Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification
On Fri, Jun 20, 2014 at 09:01:14PM +0200, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 07:36:41PM +0200, Marek Polacek wrote: 2014-06-20 Marek Polacek pola...@redhat.com * genpreds.c (verify_rtx_codes): New function. (main): Call it. * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define. (struct rtx_def): Use them. Note, e.g. Coverity also complains about this stuff loudly too, apparently not just in the problematic case where rtx_def is used in a middle of structure, but also when used in flexible array like spot. Most RTLs are allocated through rtx_alloc and the size is determined from RTX_HDR_SIZE (i.e. offsetof) and/or RTX_CODE_SIZE, so your rtl.h change IMHO shouldn't affect anything but make the expmed.c init_expmed_rtl structure somewhat longer. --- gcc/genpreds.c +++ gcc/genpreds.c @@ -1471,6 +1471,40 @@ parse_option (const char *opt) return 0; } +/* Verify RTX codes. We can't call fatal_error here, so call + gcc_unreachable after error to really abort. */ + +static void +verify_rtx_codes (void) +{ + unsigned int i, j; + + for (i = 0; i NUM_RTX_CODE; i++) +if (strchr (GET_RTX_FORMAT (i), 'w') == NULL) + { + if (strlen (GET_RTX_FORMAT (i)) RTX_FLD_WIDTH) + { + error (insufficient size of RTX_FLD_WIDTH); + gcc_unreachable (); I think it would be nice to be more verbose, i.e. say which rtx has longer format string than RTX_FLD_WIDTH, and perhaps also the size and RTX_FLD_WIDTH value. Also, can't you use internal_error instead of error + gcc_unreachable ? So perhaps internal_error (%s format %s longer than RTX_FLD_WIDTH %d\n, GET_RTX_NAME (i), GET_RTX_FORMAT (i), (int) RTX_FLD_WIDTH); ? Ok, that's much better. I actually can use internal_error - we have that function in both diagnostic.c and errors.c, while fatal_error is only in diagnostic.c. + } + } +else + { + const size_t len = strlen (GET_RTX_FORMAT (i)); + for (j = 0; j len; j++) + if (GET_RTX_FORMAT (i)[j] != 'w') + { + error (rtx format does not contain only hwint entries); + gcc_unreachable (); + } + if (len RTX_HWINT_WIDTH) + { + error (insufficient size of RTL_MAX_HWINT_WIDTH); + gcc_unreachable (); + } + } And similarly here. Fixed. Otherwise, LGTM, but as I've been discussing this with Marek, I'd prefer somebody else to review it. Sure. So can anybody look at this, please? 2014-06-20 Marek Polacek pola...@redhat.com * genpreds.c (verify_rtx_codes): New function. (main): Call it. * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define. (struct rtx_def): Use them. diff --git gcc/genpreds.c gcc/genpreds.c index b14a4ac..7e62124 100644 --- gcc/genpreds.c +++ gcc/genpreds.c @@ -1471,6 +1471,36 @@ parse_option (const char *opt) return 0; } +/* Verify RTX codes. */ + +static void +verify_rtx_codes (void) +{ + unsigned int i, j; + + for (i = 0; i NUM_RTX_CODE; i++) +if (strchr (GET_RTX_FORMAT (i), 'w') == NULL) + { + if (strlen (GET_RTX_FORMAT (i)) RTX_FLD_WIDTH) + internal_error (%s format %s longer than RTX_FLD_WIDTH %d\n, + GET_RTX_NAME (i), GET_RTX_FORMAT (i), + (int) RTX_FLD_WIDTH); + } +else + { + const size_t len = strlen (GET_RTX_FORMAT (i)); + for (j = 0; j len; j++) + if (GET_RTX_FORMAT (i)[j] != 'w') + internal_error (%s format %s should contain only w, but + has %c\n, GET_RTX_NAME (i), GET_RTX_FORMAT (i), + GET_RTX_FORMAT (i)[j]); + if (len RTX_HWINT_WIDTH) + internal_error (%s format %s longer than RTX_HWINT_WIDTH %d\n, + GET_RTX_NAME (i), GET_RTX_FORMAT (i), + (int) RTX_HWINT_WIDTH); + } +} + /* Master control. */ int main (int argc, char **argv) @@ -1518,5 +1548,7 @@ main (int argc, char **argv) if (have_error || ferror (stdout) || fflush (stdout) || fclose (stdout)) return FATAL_EXIT_CODE; + verify_rtx_codes (); + return SUCCESS_EXIT_CODE; } diff --git gcc/rtl.h gcc/rtl.h index 6ec91a8..3f2e774 100644 --- gcc/rtl.h +++ gcc/rtl.h @@ -264,6 +264,12 @@ struct GTY((variable_size)) hwivec_def { #define CWI_PUT_NUM_ELEM(RTX, NUM) \ (RTL_FLAG_CHECK1(CWI_PUT_NUM_ELEM, (RTX), CONST_WIDE_INT)-u2.num_elem = (NUM)) +/* The maximum number of entries in the FLD array in rtx. */ +#define RTX_FLD_WIDTH 8 + +/* The maximum number of entries in the HWINT array in rtx. */ +#define RTX_HWINT_WIDTH (MAX (REAL_WIDTH, 3)) + /* RTL expression (rtx). */ struct GTY((chain_next (RTX_NEXT (%h)), @@ -378,8 +384,8 @@ struct GTY((chain_next (RTX_NEXT (%h)), The number of operands and
Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification
On Fri, Jun 20, 2014 at 01:55:41PM -0600, Jeff Law wrote: like spot. Most RTLs are allocated through rtx_alloc and the size is determined from RTX_HDR_SIZE (i.e. offsetof) and/or RTX_CODE_SIZE, so your rtl.h change IMHO shouldn't affect anything but make the expmed.c init_expmed_rtl structure somewhat longer. Right. This comment was actually very helpful in that I wasn't aware of precisely which cases Marek was trying to address. Presumably the [1] sizing is what prevents any compile-time checking of this? First version of Marek's patch did that (never instrumented [], [0] and [1] arrays, no matter where they appeared, and instrumented everything else). Latest patch only never instruments [] (which, by definition can only appear at the end of structure), other arrays (no matter what size) aren't instrumented if they aren't followed by any fields, or if the base of the handled components is not INDIRECT_REF/MEM_REF (so, typically is a decl). u.fld[1] array is the last field, so we don't warn for that, but when rtx_def appears in another structure (in expmed.c) or if e.g. even some code had a rtx_def typed variable and accessed say u.fld[1] in there, it would be instrumented. Whether we should have a strict array bounds mode where we would instrument even arrays at the end of structures (with the exception of []) is something to be discussed. Jakub
Re: [Committed] [PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
On Fri, Jun 20, 2014 at 01:14:52PM +0800, Hale Wang wrote: 2014-06-20 Hale Wang hale.w...@arm.com * gcc.target/arm/lto/: New folder to verify the LTO option. * gcc.target/arm/lto/pr61123-enum-size_0.c: New test case. * gcc.target/arm/lto/pr61123-enum-size_1.c: Likewise. * gcc.target/arm/lto/lto.exp: New exp file used to test LTO option. * lib/lto.exp (object-readelf): New procedure. This FAILs on non-arm targets. I've committed following fix as obvious. On the other side, if you don't plan to add too many arm LTO tests, supposedly putting it into gcc.dg/lto and just using arm*-*-* target selector might be better. 2014-06-20 Jakub Jelinek ja...@redhat.com * gcc.target/arm/lto/lto.exp: Exit immediately if not arm*-*-* target. --- gcc/testsuite/gcc.target/arm/lto/lto.exp.jj 2014-06-20 08:02:50.0 +0200 +++ gcc/testsuite/gcc.target/arm/lto/lto.exp2014-06-20 23:19:33.850043692 +0200 @@ -16,6 +16,10 @@ # # Contributed by Diego Novillo dnovi...@google.com +# Exit immediately if this isn't an ARM target. +if ![istarget arm*-*-*] then { + return +} # Test link-time optimization across multiple files. # Jakub
Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification
On Fri, Jun 20, 2014 at 11:10:18PM +0200, Jakub Jelinek wrote: On Fri, Jun 20, 2014 at 01:55:41PM -0600, Jeff Law wrote: like spot. Most RTLs are allocated through rtx_alloc and the size is determined from RTX_HDR_SIZE (i.e. offsetof) and/or RTX_CODE_SIZE, so your rtl.h change IMHO shouldn't affect anything but make the expmed.c init_expmed_rtl structure somewhat longer. Right. This comment was actually very helpful in that I wasn't aware of precisely which cases Marek was trying to address. Presumably the [1] sizing is what prevents any compile-time checking of this? First version of Marek's patch did that (never instrumented [], [0] and [1] arrays, no matter where they appeared, and instrumented everything else). Latest patch only never instruments [] (which, by definition can only appear at the end of structure), other arrays (no matter what size) aren't instrumented if they aren't followed by any fields, or if the base of the handled components is not INDIRECT_REF/MEM_REF (so, typically is a decl). u.fld[1] array is the last field, so we don't warn for that, but when rtx_def appears in another structure (in expmed.c) or if e.g. even some code had a rtx_def typed variable and accessed say u.fld[1] in there, it would be instrumented. Yeah - init_expmed in expmed.c has XEXP (all.plus, 1) = all.reg; which is expanded to (((all.plus)-u.fld[1]).rt_rtx) = all.reg; but since the expression (A)-B is the same as A.B (if A is a valid pointer expression), the above was turned into all.plus.u.fld[1].rt_rtx) = all.reg; and that doesn't contain any INDIRECT_REF/MEM_REFs - it's being instrumented. With this patch I don't see any -fsanitize=bounds errors when doing bootstrap-ubsan. Whether we should have a strict array bounds mode where we would instrument even arrays at the end of structures (with the exception of []) is something to be discussed. This should be basically just about adding a new option - e.g. -fsanitize=bounds-strict. Marek
[PATCH] Fix up -march=native handling under KVM (PR target/61570)
Hi! As mentioned in the PR, some? KVM versions disable some CPU flags, supposedly so that it can be migrated to any other x86-64 hw. Thus, it announces only sse2 and lm, but already not sse3, ssse3 nor 3dnow, and (unfortunately) identifies itself as GenuineIntel family 6 model 13. There is no 64-bit CPU that actually is the lowest common denominator of x86-64 CPUs (AMDs had 3dNOW etc., Intel first x86-64 CPUs had already SSSE3). This patch just makes sure we can use -march=native -m64 on such hosts, without that gcc complains that the selected CPU doesn't support 64-bit mode, because -march=native gives -march=pentium-m. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? 2014-06-20 Jakub Jelinek ja...@redhat.com PR target/61570 * config/i386/driver-i386.c (host_detect_local_cpu): For unknown model family 6 CPU with has_longmode never use a CPU without 64-bit support. --- gcc/config/i386/driver-i386.c.jj2014-05-14 14:45:54.0 +0200 +++ gcc/config/i386/driver-i386.c 2014-06-20 18:59:57.805006358 +0200 @@ -745,6 +745,11 @@ const char *host_detect_local_cpu (int a /* Assume Core 2. */ cpu = core2; } + else if (has_longmode) + /* Perhaps some emulator? Assume x86-64, otherwise gcc + -march=native would be unusable for 64-bit compilations, + as all the CPUs below are 32-bit only. */ + cpu = x86-64; else if (has_sse3) /* It is Core Duo. */ cpu = pentium-m; Jakub
Re: [patch i386]: Combine memory and indirect jump
2014-06-20 20:14 GMT+02:00 Jeff Law l...@redhat.com: On 06/20/14 12:07, Kai Tietz wrote: 2014-06-20 19:55 GMT+02:00 Richard Henderson r...@redhat.com: On 06/20/2014 10:52 AM, Kai Tietz wrote: 2014-06-20 Kai Tietz kti...@redhat.com PR target/39284 * passes.def (peephole2): Add second peephole2 pass before split before sched2 pass. * config/i386/i386.md (peehole2): To combine indirect jump with memory. (split2): Likewise. Why are we adding a second pass instead of just moving the one? r~ I told that in a prior mail in that thread to Jeff. IIRC there are some conversion of impossible pushes then done too late, additional some patterns about split dieing register too. Means we produce weaker code. So let's dig into this deeper. Examples explanations would help. I know it feels like a bit of a runaround, but avoiding adding the pass would be good. jeff I dug into it a bit. And couldn't find any significant difference for x64 target for existing testcases. I am still a bit concerned - I can't reproduce it for x86/x86_64 targets - that we might cause regressions for targets by moving peephole2 pass too close before the sched2 pass. Therefore I searched for the closest place to the prior place of the peephole2 pass, which solves still the indirect jump optimization on memory. By testing for x86/x64 the pass needs to be run directly after the reorder blocks pass. So I suggest following change of passes.def: Index: passes.def === --- passes.def (Revision 211850) +++ passes.def (Arbeitskopie) @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_rtl_dse2); NEXT_PASS (pass_stack_adjustments); NEXT_PASS (pass_jump2); - NEXT_PASS (pass_peephole2); NEXT_PASS (pass_if_after_reload); NEXT_PASS (pass_regrename); NEXT_PASS (pass_cprop_hardreg); @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_fast_rtl_dce); NEXT_PASS (pass_duplicate_computed_gotos); NEXT_PASS (pass_reorder_blocks); + NEXT_PASS (pass_peephole2); NEXT_PASS (pass_branch_target_load_optimize2); NEXT_PASS (pass_leaf_regs); NEXT_PASS (pass_split_before_sched2); Kai
Re: [PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion
On Tue, Jun 17, 2014 at 11:29:01PM +0100, Ramana Radhakrishnan wrote: On Sun, May 18, 2014 at 10:23 PM, Aurelien Jarno aurel...@aurel32.net wrote: On ARM soft-float, the float to double conversion doesn't convert a sNaN to qNaN as the IEEE Std 754 standard mandates: Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN. Given the soft float ARM code ignores exceptions and always provides a result, a float to double conversion of a signaling NaN should return a quiet NaN. Fix this in extendsfdf2. 2014-05-18 Aurelien Jarno aurel...@aurel32.net PR target/61219 * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN. Ok if no regressions along with a testcase to catch this case please and fixing the PR number I have added such a testcase in the new version below. I didn't add it in the target specific subdirectory, as other architectures might be affected. Actually aarch64 is also affected, though for different reasons. I have tested that the testcase correctly catch the issue, and that the whole patch doesn't cause any regression. Please find the new patch below. If it is fine, I would appreciate if someone can commit the patch, as I don't have SVN write access (though I have done the copyright assignment stuff). Sorry about the slow review. No problem, I am also very often short on time. Aurelien gcc/testsuite/ChangeLog 2014-06-18 Aurelien Jarno aurel...@aurel32.net PR target/59833 * gcc.dg/pr59833.c: New testcase. Index: gcc/testsuite/gcc.dg/pr59833.c === --- gcc/testsuite/gcc.dg/pr59833.c (revision 0) +++ gcc/testsuite/gcc.dg/pr59833.c (working copy) @@ -0,0 +1,15 @@ +/* PR target/59833 */ +/* { dg-do run } */ +/* { dg-options -lm } */ + +extern int __issignaling (double); + +int +main () +{ + float sNaN = __builtin_nansf (); + double x = (double) sNaN; + if (__issignaling (x)) + __builtin_abort (); + return 0; +} libgcc/ChangeLog 2014-06-18 Aurelien Jarno aurel...@aurel32.net PR target/59833 * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN. Index: libgcc/config/arm/ieee754-df.S === --- libgcc/config/arm/ieee754-df.S (revision 211428) +++ libgcc/config/arm/ieee754-df.S (working copy) @@ -473,11 +473,15 @@ eorne xh, xh, #0x3800 @ fixup exponent otherwise. RETc(ne)@ and return it. - teq r2, #0 @ if actually 0 - do_it ne, e - teqne r3, #0xff00 @ or INF or NAN + bicsr2, r2, #0xff00 @ isolate mantissa + do_it eq @ if 0, that is ZERO or INF RETc(eq)@ we are done already. + teq r3, #0xff00 @ check for NAN + do_it eq, t + orreq xh, xh, #0x0008 @ change to quiet NAN + RETc(eq)@ and return it. + @ value was denormalized. We can normalize it now. do_push {r4, r5, lr} mov r4, #0x380 @ setup corresponding exponent -- Aurelien Jarno GPG: 4096R/1DDD8C9B aurel...@aurel32.net http://www.aurel32.net
Re: [PATCH] Fix up -march=native handling under KVM (PR target/61570)
On Fri, Jun 20, 2014 at 2:42 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! As mentioned in the PR, some? KVM versions disable some CPU flags, supposedly so that it can be migrated to any other x86-64 hw. Thus, it announces only sse2 and lm, but already not sse3, ssse3 nor 3dnow, and (unfortunately) identifies itself as GenuineIntel family 6 model 13. There is no 64-bit CPU that actually is the lowest common denominator of x86-64 CPUs (AMDs had 3dNOW etc., Intel first x86-64 CPUs had already SSSE3). This patch just makes sure we can use -march=native -m64 on such hosts, without that gcc complains that the selected CPU doesn't support 64-bit mode, because -march=native gives -march=pentium-m. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.9? 2014-06-20 Jakub Jelinek ja...@redhat.com PR target/61570 * config/i386/driver-i386.c (host_detect_local_cpu): For unknown model family 6 CPU with has_longmode never use a CPU without 64-bit support. --- gcc/config/i386/driver-i386.c.jj2014-05-14 14:45:54.0 +0200 +++ gcc/config/i386/driver-i386.c 2014-06-20 18:59:57.805006358 +0200 @@ -745,6 +745,11 @@ const char *host_detect_local_cpu (int a /* Assume Core 2. */ cpu = core2; } + else if (has_longmode) + /* Perhaps some emulator? Assume x86-64, otherwise gcc + -march=native would be unusable for 64-bit compilations, + as all the CPUs below are 32-bit only. */ + cpu = x86-64; else if (has_sse3) /* It is Core Duo. */ cpu = pentium-m; Jakub host_detect_local_cpu guesses the cpu based on the real processors. It doesn't work with emulators due to some conflicts. This isn't the only only place which has the same issue. I prefer something like this. -- H.J. --- diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c index 3e8a995..2aada71 100644 --- a/gcc/config/i386/driver-i386.c +++ b/gcc/config/i386/driver-i386.c @@ -415,6 +415,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) bool arch; unsigned int l2sizekb = 0; + unsigned int arch_64bit = 1; if (argc 1) return NULL; @@ -656,11 +657,14 @@ const char *host_detect_local_cpu (int argc, const char **argv) { case PROCESSOR_I386: /* Default. */ + arch_64bit = 0; break; case PROCESSOR_I486: + arch_64bit = 0; cpu = i486; break; case PROCESSOR_PENTIUM: + arch_64bit = 0; if (arch has_mmx) cpu = pentium-mmx; else @@ -745,21 +749,25 @@ const char *host_detect_local_cpu (int argc, const char **argv) /* Assume Core 2. */ cpu = core2; } - else if (has_sse3) - /* It is Core Duo. */ - cpu = pentium-m; - else if (has_sse2) - /* It is Pentium M. */ - cpu = pentium-m; - else if (has_sse) - /* It is Pentium III. */ - cpu = pentium3; - else if (has_mmx) - /* It is Pentium II. */ - cpu = pentium2; else - /* Default to Pentium Pro. */ - cpu = pentiumpro; + { + arch_64bit = 0; + if (has_sse3) +/* It is Core Duo. */ +cpu = pentium-m; + else if (has_sse2) +/* It is Pentium M. */ +cpu = pentium-m; + else if (has_sse) +/* It is Pentium III. */ +cpu = pentium3; + else if (has_mmx) +/* It is Pentium II. */ +cpu = pentium2; + else +/* Default to Pentium Pro. */ +cpu = pentiumpro; + } } else /* For -mtune, we default to -mtune=generic. */ @@ -773,21 +781,30 @@ const char *host_detect_local_cpu (int argc, const char **argv) if (has_longmode) cpu = nocona; else -cpu = prescott; +{ + cpu = prescott; + arch_64bit = 0; +} } else - cpu = pentium4; + { + cpu = pentium4; + arch_64bit = 0; + } break; case PROCESSOR_GEODE: + arch_64bit = 0; cpu = geode; break; case PROCESSOR_K6: + arch_64bit = 0; if (arch has_3dnow) cpu = k6-3; else cpu = k6; break; case PROCESSOR_ATHLON: + arch_64bit = 0; if (arch has_sse) cpu = athlon-4; else @@ -896,6 +913,9 @@ const char *host_detect_local_cpu (int argc, const char **argv) const char *xsavec = has_xsavec ? -mxsavec : -mno-xsavec; const char *xsaves = has_xsaves ? -mxsaves : -mno-xsaves; + if (arch_64bit == 0 has_sse2 has_longmode) + cpu = x86-64; + options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3, sse4a, cx16, sahf, movbe, aes, sha, pclmul, popcnt, abm, lwp, fma, fma4, xop, bmi, bmi2,
Re: [PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion
On Sat, 21 Jun 2014, Aurelien Jarno wrote: Index: gcc/testsuite/gcc.dg/pr59833.c === --- gcc/testsuite/gcc.dg/pr59833.c(revision 0) +++ gcc/testsuite/gcc.dg/pr59833.c(working copy) @@ -0,0 +1,15 @@ +/* PR target/59833 */ +/* { dg-do run } */ +/* { dg-options -lm } */ + +extern int __issignaling (double); __issignaling is a recent glibc addition, not a standard C library function; you can't assume it's available in a test without using a suitable effective-target that tests for whether it's available. I believe linking with -lm is the default in the testsuite so shouldn't need specifying explicitly. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Emit DW_tag_restrict_type for restrict-qualified pointers.
The following patch is based on and extends the guality test from my earlier Pass type modifiers as flags arguments. Add guality type test. patch. gcc/ChangeLog PR debug/59051 * dwarf2out.h (enum dw_mod_flag): Add dw_mod_restrict. * dwarf2out.c (dw_mod_decl_flags): Handle TYPE_RESTRICT. (dw_mod_type_flags): Likewise. (dw_mods_to_quals): New function. (dw_mod_qualified_type): Likewise. (modified_type_die): Handle dw_mod_restrict. gcc/testsuite/ChangeLog PR debug/59051 * gcc.dg/guality/restrict.c: New test. --- gcc/ChangeLog | 10 gcc/dwarf2out.c | 76 +- gcc/dwarf2out.h |1 + gcc/testsuite/ChangeLog |4 ++ gcc/testsuite/gcc.dg/guality/restrict.c | 48 +++ 5 files changed, 126 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/guality/restrict.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4dfd9a5..06a8767 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,15 @@ 2014-06-20 Mark Wielaard m...@redhat.com + PR debug/59051 + * dwarf2out.h (enum dw_mod_flag): Add dw_mod_restrict. + * dwarf2out.c (dw_mod_decl_flags): Handle TYPE_RESTRICT. + (dw_mod_type_flags): Likewise. + (dw_mods_to_quals): New function. + (dw_mod_qualified_type): Likewise. + (modified_type_die): Handle dw_mod_restrict. + +2014-06-20 Mark Wielaard m...@redhat.com + * dwarf2out.h (enum dw_mod_flag): New enum. * dwarf2out.c (dw_mod_decl_flags): New function. (dw_mod_type_flags): Likewise. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 3d3508d..b99d1b9 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -10504,14 +10504,52 @@ static int dw_mod_decl_flags (const_tree decl) { return ((TREE_READONLY (decl) ? dw_mod_const : dw_mod_none) - | (TREE_THIS_VOLATILE (decl) ? dw_mod_volatile : dw_mod_none)); + | (TREE_THIS_VOLATILE (decl) ? dw_mod_volatile : dw_mod_none) + | ((POINTER_TYPE_P (TREE_TYPE (decl)) + TYPE_RESTRICT (TREE_TYPE (decl))) +? dw_mod_restrict : dw_mod_none)); } static int dw_mod_type_flags (const_tree type) { return ((TYPE_READONLY (type) ? dw_mod_const : dw_mod_none) - | (TYPE_VOLATILE (type) ? dw_mod_volatile : dw_mod_none)); + | (TYPE_VOLATILE (type) ? dw_mod_volatile : dw_mod_none) + | ((POINTER_TYPE_P (type) TYPE_RESTRICT (type)) +? dw_mod_restrict : dw_mod_none)); +} + +static int +dw_mods_to_quals (int mods) +{ + return (((mods dw_mod_const) ? TYPE_QUAL_CONST : 0) + | ((mods dw_mod_volatile) ? TYPE_QUAL_VOLATILE : 0) + | ((mods dw_mod_restrict) ? TYPE_QUAL_RESTRICT : 0)); +} + +/* Returns true if there is a qualified type with at least one + modifier given in mods. Returns false if mods == dw_mod_none or + there is no qualified type with at least one of the given mods. */ + +static bool +dw_mod_qualified_type (tree type, int mods) +{ + if (mods == dw_mod_none) +return false; + + if (get_qualified_type (type, dw_mods_to_quals (mods)) != NULL_TREE) +return true; + + if (mods dw_mod_const) +return dw_mod_qualified_type (type, mods ~dw_mod_const); + + if (mods dw_mod_volatile) +return dw_mod_qualified_type (type, mods ~dw_mod_volatile); + + if (mods dw_mod_restrict) +return dw_mod_qualified_type (type, mods ~dw_mod_restrict); + + gcc_unreachable (); } /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging @@ -10531,13 +10569,15 @@ modified_type_die (tree type, int mods, dw_die_ref context_die) if (code == ERROR_MARK) return NULL; + /* Don't emit DW_TAG_restrict_type for DWARFv2, since it is a type + tag modifier (and not an attribute) old consumers won't be able + to handle it. */ + if (dwarf_version 3) +mods = ~dw_mod_restrict; + /* See if we already have the appropriately qualified variant of this type. */ - qualified_type -= get_qualified_type (type, (((mods dw_mod_const) - ? TYPE_QUAL_CONST : 0) -| ((mods dw_mod_volatile) - ? TYPE_QUAL_VOLATILE : 0))); + qualified_type = get_qualified_type (type, dw_mods_to_quals (mods)); if (qualified_type == sizetype TYPE_NAME (qualified_type) @@ -10577,8 +10617,10 @@ modified_type_die (tree type, int mods, dw_die_ref context_die) } else if ((mods dw_mod_const) TYPE_READONLY (dtype) || (mods dw_mod_volatile) TYPE_VOLATILE (dtype) + || (mods dw_mod_restrict) TYPE_RESTRICT (dtype) || ((mods dw_mod_const) = TYPE_READONLY (dtype) (mods dw_mod_volatile) = TYPE_VOLATILE (dtype) + (mods dw_mod_restrict) = TYPE_RESTRICT (dtype)
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
Patch Updated. Sri On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam tmsri...@google.com wrote: Optimize access to globals with -fpie, x86_64 only: Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module using the GOT. This is two instructions, one to get the address of the global from the GOT and the other to get the value. If it turns out that the global gets defined in the executable at link-time, it still needs to go through the GOT as it is too late then to generate a direct access. Examples: foo.cc -- int a_glob; int main () { return a_glob; // defined in this file } With -O2 -fpie -pie, the generated code directly accesses the global via PC-relative insn: 5e0 main: mov0x165a(%rip),%eax# 1c40 a_glob foo.cc -- extern int a_glob; int main () { return a_glob; // defined in this file } With -O2 -fpie -pie, the generated code accesses global via GOT using two memory loads: 6f0 main: mov0x1609(%rip),%rax # 1d00 _DYNAMIC+0x230 mov(%rax),%eax This is true even if in the latter case the global was defined in the executable through a different file. Some experiments on google benchmarks shows that the extra memory loads affects performance by 1% to 5%. Solution - Copy Relocations: When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (come from shared objects), the linker will create copy relocations and have them defined in the executable. Result is that no global access needs to go through the GOT and hence improves performance. This patch to the gold linker : https://sourceware.org/ml/binutils/2014-05/msg00092.html submitted recently allows gold to generate copy relocations for -pie mode when necessary. I have added option -mld-pie-copyrelocs which when combined with -fpie would do this. Note that the BFD linker does not support pie copyrelocs yet and this option cannot be used there. Please review. ChangeLog: * config/i386/i36.opt (mld-pie-copyrelocs): New option. * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this address is still legitimate in the presence of copy relocations and -fpie. * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test. * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test. Patch attached. Thanks Sri Optimize access to globals with -fpie, x86_64 only: Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module using the GOT. This is two instructions, one to get the address of the global from the GOT and the other to get the value. If it turns out that the global gets defined in the executable at link-time, it still needs to go through the GOT as it is too late then to generate a direct access. Examples: foo.cc -- int a_glob; int main () { return a_glob; // defined in this file } With -O2 -fpie -pie, the generated code directly accesses the global via PC-relative insn: 5e0 main: mov 0x165a(%rip),%eax # 1c40 a_glob foo.cc -- extern int a_glob; int main () { return a_glob; // defined in this file } With -O2 -fpie -pie, the generated code accesses global via GOT using two memory loads: 6f0 main: mov 0x1609(%rip),%rax # 1d00 _DYNAMIC+0x230 mov (%rax),%eax This is true even if in the latter case the global was defined in the executable through a different file. Some experiments on google benchmarks shows that the extra memory loads affects performance by 1% to 5%. Solution - Copy Relocations: When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (come from shared objects), the linker will create copy relocations and have them defined in the executable. Result is that no global access needs to go through the GOT and hence improves performance. This patch to the gold linker : https://sourceware.org/ml/binutils/2014-05/msg00092.html submitted recently allows gold to generate copy relocations for -pie mode when necessary. I have added option -mcopyrelocs which when combined with -fpie would do this. Note that the BFD linker does not support pie copyrelocs yet and this option cannot be used there. Please review. ChangeLog: * config/i386/i36.opt (mcopyrelocs): New option. * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this address is still legitimate in the presence of copy relocations and -fpie. * doc/invoke.texi (mcopyrelocs): Document. * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test. * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test. Index:
Re: Another AIX Bootstrap failure
Hello, after some lengthly investigation it turned out that aliases on AIX doesn't behave in the way we expect. In particular creating a static alias of a global symbol has no effect. This is somewhat special behaviour of AIX's .set pseudo-op I think I can get this fixed by simply emitting alternative symbols into every definition instead of relying on semantic of assembler's .set. This patch disables aliases for !SUPPORTS_ONE_ONLY targets (I hope this to be turned back soon after we fix AIX output macros) and adds testcase for weird behavoiur of aliases so we can possibly catch other targets that do not behave as expected. Bootstrapped/regtested x86_64-linux. * gcc.dg/localalias.c: New testcase. * gcc.dg/localalias-2.c: New testcase. * gcc.dg/globalalias.c: New testcase. * gcc.dg/globalalias-2.c: New testcase. * ipa-visibility.c (function_and_variable_visibility): Disable temporarily local aliases for some targets. Index: testsuite/gcc.dg/localalias.c === --- testsuite/gcc.dg/localalias.c (revision 0) +++ testsuite/gcc.dg/localalias.c (revision 0) @@ -0,0 +1,42 @@ +/* This test checks that local aliases behave sanely. This is necessary for code correctness + of aliases introduced by ipa-visibility pass. + + If this test fails either aliases needs to be disabled on given target on aliases with + proper semantic needs to be implemented. This is problem with e.g. AIX .set pseudo-op + that implementes alias syntactically (by substituting in assembler) rather as alternative + symbol defined on a target's location. */ + +/* { dg-do run } + { dg-options -Wstrict-aliasing=2 -fstrict-aliasing } + { dg-require-alias } + { dg-xfail-if { powerpc-ibm-aix* } { * } { } } + { dg-additional-sources localalias-2.c } */ +extern void abort (void); +extern int test2count; +int testcount; +__attribute__ ((weak,noinline)) +void test(void) +{ + testcount++; +} +__attribute ((alias(test))) +static void test2(void); + +void main() +{ + test2(); + /* This call must bind locally. */ + if (!testcount) +abort (); + test(); + /* Depending on linker choice, this one may bind locally + or to the other unit. */ + if (!testcount !test2count) +abort(); + tt(); + + if ((testcount != 1 || test2count != 3) + (testcount != 3 || test2count != 1)) +abort (); + reutrn 0; +} Index: testsuite/gcc.dg/globalalias.c === --- testsuite/gcc.dg/globalalias.c (revision 0) +++ testsuite/gcc.dg/globalalias.c (revision 0) @@ -0,0 +1,42 @@ +/* This test checks that local aliases behave sanely. This is necessary for code correctness + of aliases introduced by ipa-visibility pass. + + This test expose weird behaviour of AIX's .set pseudo-op where the global symbol is created, + but all uses of the alias are syntactically replaced by uses of the target. This means that + both counters are increased to 2. */ + +/* { dg-do run } + { dg-options -O2 } + { dg-require-alias } + { dg-xfail-if { powerpc-ibm-aix* } { * } { } } + { dg-additional-sources globalalias-2.c } */ +extern int test2count; +extern void abort (void); +int testcount; +static +void test(void) +{ + testcount++; +} +__attribute__ ((weak,noinline)) +__attribute ((alias(test))) +void test2(void); + +void main() +{ + test(); + /* This call must bind locally. */ + if (!testcount) +abort (); + test2(); + /* Depending on linker choice, this one may bind locally + or to the other unit. */ + if (!testcount !test2count) +abort(); + tt(); + + if ((testcount != 1 || test2count != 3) + (testcount != 3 || test2count != 1)) +abort (); + return 0; +} Index: testsuite/gcc.dg/globalalias-2.c === --- testsuite/gcc.dg/globalalias-2.c(revision 0) +++ testsuite/gcc.dg/globalalias-2.c(revision 0) @@ -0,0 +1,20 @@ +int test2count; +extern void abort (void); +static +void test(void) +{ + test2count++; +} +__attribute__ ((weak,noinline)) +__attribute ((alias(test))) +void test2(void); + +void tt() +{ + int prev = test2count; + /* This call must bind locally. */ + test(); + if (test2count == prev) +abort(); + test2(); + } Index: testsuite/gcc.dg/localalias-2.c === --- testsuite/gcc.dg/localalias-2.c (revision 0) +++ testsuite/gcc.dg/localalias-2.c (revision 0) @@ -0,0 +1,19 @@ +extern void abort (void); +int test2count; +__attribute__ ((weak,noinline)) +void test(void) +{ + test2count++; +} +__attribute ((alias(test))) +static void test2(void); + +void tt() +{ + int prev = test2count; + /* This call must bind locally. */ + test2(); + if (test2count == prev) +abort(); + test(); + } Index: ipa-visibility.c