Re: [PATCH][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model
Andreas Schwab sch...@linux-m68k.org writes: domi...@lps.ens.fr (Dominique Dhumieres) writes: Here it is. Committed as r218662 to get things going again. The same change should be done for libitm and libatomic. And boehm-gc. Right, fixed as follows and installed on mainline after testing on i386-pc-solaris2.11 and sparc-sun-solaris2.11. I must say I'm less than impressed how this patch tested actually. Rainer 2014-12-17 Rainer Orth r...@cebitec.uni-bielefeld.de * testsuite/lib/boehm-gc.exp: Load target-utils.exp. # HG changeset patch # Parent db99d18efa0cce3ad66a0cff02fedf946af2a842 Include target-utils.exp in boehm-gc testing diff --git a/boehm-gc/testsuite/lib/boehm-gc.exp b/boehm-gc/testsuite/lib/boehm-gc.exp --- a/boehm-gc/testsuite/lib/boehm-gc.exp +++ b/boehm-gc/testsuite/lib/boehm-gc.exp @@ -27,6 +27,7 @@ load_gcc_lib wrapper.exp load_gcc_lib target-supports.exp # For dg-skip-if. load_gcc_lib target-supports-dg.exp +load_gcc_lib target-utils.exp # For ${tool}_exit. load_gcc_lib gcc-defs.exp # For prune_gcc_output. -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model
On 18/12/14 09:43, Rainer Orth wrote: Andreas Schwab sch...@linux-m68k.org writes: domi...@lps.ens.fr (Dominique Dhumieres) writes: Here it is. Committed as r218662 to get things going again. The same change should be done for libitm and libatomic. And boehm-gc. Right, fixed as follows and installed on mainline after testing on i386-pc-solaris2.11 and sparc-sun-solaris2.11. I must say I'm less than impressed how this patch tested actually. Hi Rainer, Sorry about this. I had grepped for includes of gcc-defs.exp in lib*, which missed boehm-gc. Thanks for fixing this, I was away earlier this week. I'll make sure to check the libraries explicitly if I do any testsuite refactoring work in the future Kyrill Rainer 2014-12-17 Rainer Orth r...@cebitec.uni-bielefeld.de * testsuite/lib/boehm-gc.exp: Load target-utils.exp.
Re: [PATCH][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model
Hi Kyrill, Sorry about this. I had grepped for includes of gcc-defs.exp in lib*, which missed boehm-gc. Thanks for fixing this, I was away earlier this week. I'll make sure to check the libraries explicitly if I do any testsuite refactoring work in the future great, thanks. I know all too well that it needs a large amount of this do get away from the current copy-and-paste programming pervasive throughout the testsuite. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[testsuite, i386] Change mpx effective-target test into link test
Currently, the new gcc.dg/lto/chkp-privatize test FAILs on Solaris/x86 with gas and ld for 64-bit: UNRESOLVED: gcc.dg/lto/chkp-privatize c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o execute -fPIC -flto -flto-partition=max -fcheck-pointer-bounds -mmpx FAIL: gcc.dg/lto/chkp-privatize c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o link, -fPIC -flto -flto-partition=max -fcheck-pointer-bounds -mmpx output is: ld: fatal: relocation error: file c_lto_chkp-privatize_0.o: section [4].rela.text: invalid relocation type: 0x28 ld: fatal: symbol referencing errors collect2: error: ld returned 1 exit status This relocation type 0x28 (40) is R_X86_64_PLT32_BND, which Solaris 10 and 11 ld know nothing about. Given that there's no test whatsoever in gcc/configure.ac that the whole toolchain supports the necessary mnemonics and relocs, the logical place seems to be the testsuite. The following patch does just that, turning the mpx effective-target test into a link test, thus checking both assembler and linker used. Tested with the appropriate runtest invocations on i386-pc-solaris2.11 with as/ld (both 32 and 64-bit tests unsupported due to missing assembler support), gas/ld (32-bit tests pass, 64-bit tests unsupported due to missing linker support), and x86_64-unknown-linux-gnu (both 32 and 64-bit tests pass). I wonder what's the point of having mpx-dg.exp at all: I'd rather move the single test to target-supports.exp, avoiding the whole flurry of mpx-dg.exp inclusions all over gcc/testsuite. Ilya? However that may be, it can be done as a followup. Ok for mainline? Rainer 2014-12-18 Rainer Orth r...@cebitec.uni-bielefeld.de * lib/mpx-dg.exp (check_effective_target_mpx): Change into link test. Add main. # HG changeset patch # Parent bf453dbd938f27cf7eced4b5df3c29f5e452a55f Change mpx effective-target test into link test diff --git a/gcc/testsuite/lib/mpx-dg.exp b/gcc/testsuite/lib/mpx-dg.exp --- a/gcc/testsuite/lib/mpx-dg.exp +++ b/gcc/testsuite/lib/mpx-dg.exp @@ -18,7 +18,8 @@ # error-free for trivial code, 0 otherwise. proc check_effective_target_mpx {} { -return [check_no_compiler_messages mpx object { +return [check_no_compiler_messages mpx executable { int *foo (int *arg) { return arg; } + int main (void) { return foo ((void *)0) == 0; } } -fcheck-pointer-bounds -mmpx] } -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[patch 1/2][ARM]: New CPU support for Marvell Whitney
Hi, This patch contains Marvell Whitney core's pipeline description. Test on arm-linux-gnueabi and no new regression are found. Is it OK for trunk? Regards, Xingxing 2014-12-18 Xingxing Pan xxing...@marvell.com * config/arm/arm-cores.def: Add new core marvell-whitney. * config/arm/arm-protos.h: (marvell_whitney_vector_element_size_is_byte): Declare. (marvell_whitney_non_shift_with_shift_operand): Ditto. * config/arm/arm-tables.opt: Regenerated. * config/arm/arm-tune.md: Regenerated. * config/arm/arm.c (arm_marvell_whitney_tune): New structure. (arm_issue_rate): Add marvell_whitney. (marvell_whitney_vector_element_size_is_byte): New function. (marvell_whitney_non_shift_with_shift_operand): Ditto. * config/arm/arm.md: Include marvell-whitney.md. (generic_sched): Add marvell_whitney. (generic_vfp): Ditto. * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney. * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md. * config/arm/marvell-whitney.md: New file. * doc/invoke.texi: Document marvell-whitney. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 423ee9e..b0ffbe1 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -159,6 +159,7 @@ ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) +ARM_CORE(marvell-whitney,marvell_whitney, marvell_whitney, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney) /* V7 big.LITTLE implementations */ ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 20cfa9f..e86db1e 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -231,6 +231,9 @@ extern void arm_order_regs_for_local_alloc (void); extern int arm_max_conditional_execute (); +extern bool marvell_whitney_vector_element_size_is_byte (rtx insn); +extern bool marvell_whitney_non_shift_with_shift_operand (rtx insn); + /* Vectorizer cost model implementation. */ struct cpu_vec_costs { const int scalar_stmt_cost; /* Cost of any scalar operation, excluding diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index 9b1886e..3371ce3 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -298,6 +298,9 @@ EnumValue Enum(processor_type) String(marvell-pj4) Value(marvell_pj4) EnumValue +Enum(processor_type) String(marvell-whitney) Value(marvell_whitney) + +EnumValue Enum(processor_type) String(cortex-a15.cortex-a7) Value(cortexa15cortexa7) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index d300c51..c73c33c 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -28,9 +28,10 @@ cortexm1smallmultiply,cortexm0smallmultiply,cortexm0plussmallmultiply, genericv7a,cortexa5,cortexa7, cortexa8,cortexa9,cortexa12, - cortexa15,cortexa17,cortexr4,cortexr4f, - cortexr5,cortexr7,cortexm7, - cortexm4,cortexm3,marvell_pj4, - cortexa15cortexa7,cortexa17cortexa7,cortexa53, - cortexa57,cortexa57cortexa53 + cortexa15,cortexa17,cortexr4, + cortexr4f,cortexr5,cortexr7, + cortexm7,cortexm4,cortexm3, + marvell_pj4,marvell_whitney,cortexa15cortexa7, + cortexa17cortexa7,cortexa53,cortexa57, + cortexa57cortexa53 (const (symbol_ref ((enum attr_tune) arm_tune diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 0ec526b..183da4c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1914,6 +1914,25 @@ const struct tune_params arm_cortex_a9_tune = 8 /* Maximum insns to inline memset. */ }; +const struct tune_params arm_marvell_whitney_tune = +{ + arm_9e_rtx_costs, + cortexa9_extra_costs, + cortex_a9_sched_adjust_cost, + 1, /* Constant limit. */ + 5, /* Max cond insns. */ + ARM_PREFETCH_BENEFICIAL(4,32,32), + false, /* Prefer constant pool. */ + arm_default_branch_cost, + false, /* Prefer LDRD/STRD. */ + {true, true},/* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false,
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
On Wed, Dec 17, 2014 at 11:48:01PM +0100, Thomas Schwinge wrote: I have another suggestion: on gomp-4_0-branch, we had already started using a libgomp-test-support.exp file for a similar purpose. I now changed your code on gomp-4_0-branch in r218845 as follows (though, not very much tested). The advantage here is that people who are not using GCC build-tree testing, but are directly invoking runtest, then don't have to set various environment variables, but instead just have to copy this one libgomp-test-support.exp file from the build tree to the testing tree. Does this make sense? commit de01639bf47e29a20959771e587fb6f30c372a45 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Dec 17 22:45:05 2014 + libgomp: Route offloading data through the libgomp-test-support.exp file. libgomp/ * testsuite/Makefile.am: Don't export OFFLOAD_TARGETS, OFFLOAD_ADDITIONAL_OPTIONS, and OFFLOAD_ADDITIONAL_LIB_PATHS... * testsuite/libgomp-test-support.exp.in: ..., and instead set offload_targets, offload_additional_options, and offload_additional_lib_paths here. Update all users. LGTM. Jakub
Re: OpenACC middle end changes
Hi! On Thu, 13 Nov 2014 19:09:49 +0100, Jakub Jelinek ja...@redhat.com wrote: On Thu, Nov 13, 2014 at 05:59:11PM +0100, Thomas Schwinge wrote: --- gcc/builtins.c +++ gcc/builtins.c +/* Expand OpenACC acc_on_device. + + This has to happen late (that is, not in early folding; expand_builtin_*, + rather than fold_builtin_*), as we have to act differently for host and + acceleration device (ACCEL_COMPILER conditional). */ + +static rtx +expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED) +{ + if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE)) +return NULL_RTX; + + tree arg, v1, v2, ret; + location_t loc; + + arg = CALL_EXPR_ARG (exp, 0); + arg = builtin_save_expr (arg); + loc = EXPR_LOCATION (exp); + + /* Build: (arg == v1 || arg == v2) ? 1 : 0. */ + +#ifdef ACCEL_COMPILER + v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_not_host */ 3); + v2 = build_int_cst (TREE_TYPE (arg), ACCEL_COMPILER_acc_device); +#else + v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_none */ 0); + v2 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_host */ 2); +#endif + + v1 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v1); + v2 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v2); + + /* Can't use TRUTH_ORIF_EXPR, as that is not supported by + expand_expr_real*. */ + ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, v1, v1, v2); + ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, +ret, integer_one_node, integer_zero_node); + + return expand_normal (ret); If you can't fold it late (which is indeed a problem for -O0), then I'd suggest to implement this more RTL-ish. So, avoid the builtin_save_expr, instead rtx op = expand_normal (arg); Don't build v1/v2 as trees (and, please fix the TODOs), but rtxes, (acc_device_* TODOs already resolved earlier on.) just rtx v1 = GEN_INT (...); rtx v2 = GEN_INT (...); machine_mode mode = TYPE_MODE (TREE_TYPE (arg)); rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node)); emit_move_insn (ret, const0_rtx); rtx_code_label *done_label = gen_label_rtx (); emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_move_insn (ret, const1_rtx); emit_label (done_label); return ret; or similar. Thanks for the review/suggestion/code! Note, it would still be worthwhile to fold the builtin, at least when optimizing, after IPA. Dunno if we have some property you can check, and Richard B. could suggest where it would be most appropriate (if GIMPLE guarded match.pd entry, or what), gimple_fold, etc. I'll make a note to have a look at that later on. I bet I should handle omp_is_initial_device (); similarly. Yeah. Committed to gomp-4_0-branch in r218858: commit da5ad5aec1c0f9b230ecb2dc00620a5598de5066 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Thu Dec 18 10:42:30 2014 + OpenACC acc_on_device: Make builtin expansion more RTXy. gcc/ * builtins.c (expand_builtin_acc_on_device): Make more RTXy. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218858 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 5 + gcc/builtins.c | 44 +--- 2 files changed, 26 insertions(+), 23 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index b370616..a3650c5 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,8 @@ +2014-12-18 Thomas Schwinge tho...@codesourcery.com + Jakub Jelinek ja...@redhat.com + + * builtins.c (expand_builtin_acc_on_device): Make more RTXy. + 2014-12-17 Thomas Schwinge tho...@codesourcery.com Bernd Schmidt ber...@codesourcery.com diff --git gcc/builtins.c gcc/builtins.c index fcf3f53..e946521 100644 --- gcc/builtins.c +++ gcc/builtins.c @@ -5889,38 +5889,36 @@ expand_stack_save (void) acceleration device (ACCEL_COMPILER conditional). */ static rtx -expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED) +expand_builtin_acc_on_device (tree exp, rtx target) { if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE)) return NULL_RTX; - tree arg, v1, v2, ret; - location_t loc; - - arg = CALL_EXPR_ARG (exp, 0); - arg = builtin_save_expr (arg); - loc = EXPR_LOCATION (exp); - - /* Build: (arg == v1 || arg == v2) ? 1 : 0. */ + tree arg = CALL_EXPR_ARG (exp, 0); + /* Return (arg == v1 || arg == v2) ? 1 : 0. */ + machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg)); + rtx v = expand_normal (arg), v1, v2; #ifdef ACCEL_COMPILER - v1 = build_int_cst (TREE_TYPE (arg), GOMP_DEVICE_NOT_HOST); - v2 = build_int_cst (TREE_TYPE (arg),
Re: OpenACC middle end changes
Hi Jakub! Many thanks for the review comments! The very most have been addresed, here are just a few comments. If you feel strongly/differently about any, I'll address those, too. On Thu, 13 Nov 2014 19:09:49 +0100, Jakub Jelinek ja...@redhat.com wrote: On Thu, Nov 13, 2014 at 05:59:11PM +0100, Thomas Schwinge wrote: --- gcc/gimple-pretty-print.c +++ gcc/gimple-pretty-print.c @@ -1136,18 +1136,21 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags) case GF_OMP_FOR_KIND_FOR: kind = ; break; - case GF_OMP_FOR_KIND_SIMD: - kind = simd; - break; - case GF_OMP_FOR_KIND_CILKSIMD: - kind = cilksimd; - break; case GF_OMP_FOR_KIND_DISTRIBUTE: kind = distribute; break; case GF_OMP_FOR_KIND_CILKFOR: kind = _Cilk_for; break; + case GF_OMP_FOR_KIND_OACC_LOOP: + kind = oacc_loop; + break; + case GF_OMP_FOR_KIND_SIMD: + kind = simd; + break; + case GF_OMP_FOR_KIND_CILKSIMD: + kind = cilksimd; + break; Why the reshuffling? The result isn't alphabetically sorted anyway. I'd just add new stuff at the end ;) It's the order in which the GF_OMP_FOR_KIND_* are defined. At least for my mind ;-) that makes it very easy to grasp that all of them are covered. @@ -1684,7 +1718,12 @@ gimple_copy (gimple stmt) gimple_try_set_cleanup (copy, new_seq); break; + case GIMPLE_OACC_KERNELS: + case GIMPLE_OACC_PARALLEL: + gcc_unreachable (); + case GIMPLE_OMP_FOR: + gcc_assert (!is_gimple_omp_oacc_specifically (stmt)); new_seq = gimple_seq_copy (gimple_omp_for_pre_body (stmt)); gimple_omp_for_set_pre_body (copy, new_seq); t = unshare_expr (gimple_omp_for_clauses (stmt)); @@ -1754,6 +1793,7 @@ gimple_copy (gimple stmt) case GIMPLE_OMP_TASKGROUP: case GIMPLE_OMP_ORDERED: copy_omp_body: + gcc_assert (!is_gimple_omp_oacc_specifically (stmt)); new_seq = gimple_seq_copy (gimple_omp_body (stmt)); gimple_omp_set_body (copy, new_seq); break; Why are you so afraid to support oacc in gimple_copy? Not afraid, but have never run into this, so didn't want to claim such code paths have been tested. Anyway, with GIMPLE_OACC_* now merged into GIMPLE_OMP_TARGET, this should -- famous last words? ;-) -- now just work, so I removed all these asserts. (And, no doubt, usage of assert/gcc_unreachable was not exactly correct, for code paths that could actually legitimately be reached; probably should've used sorry instead.) +/* GIMPLE_OACC_PARALLEL */ +struct GTY((tag(GSS_OMP_PARALLEL_LAYOUT))) + gimple_statement_oacc_parallel : public gimple_statement_omp_parallel_layout +{ +/* No extra fields; adds invariant: + stmt-code == GIMPLE_OACC_PARALLEL. */ Formatting, there is too big indentation. That said, this will be replaced with GF_OMP_TARGET_KIND_*, right? +static inline tree +gimple_oacc_kernels_clauses (const_gimple gs) +{ + const gimple_statement_oacc_kernels *oacc_kernels_stmt = +as_a const gimple_statement_oacc_kernels * (gs); Wrong formatting (many times, = goes on the second line. Seems you probably just copied it from preexisting bad formatting, we should ping Dave Malcolm to fix up his scripts. Yes, yes, and yes. (And, now gone.) +/* Return true if STMT is any of the OpenACC types specifically. */ + +static inline bool +is_gimple_omp_oacc_specifically (const_gimple stmt) Why not is_gimple_oacc or gimple_oacc_p ? The idea is to make it clear in the name that STMT must be an OMP one. Now renamed to the shorter is_gimple_omp_oacc. @@ -99,15 +103,16 @@ enum gimplify_omp_var_data enum omp_region_type { - ORT_WORKSHARE = 0, - ORT_SIMD = 1, - ORT_PARALLEL = 2, - ORT_COMBINED_PARALLEL = 3, - ORT_TASK = 4, - ORT_UNTIED_TASK = 5, - ORT_TEAMS = 8, - ORT_TARGET_DATA = 16, - ORT_TARGET = 32 + /* An undefined region type. */ + ORT_INVALID = 0, + + ORT_WORKSHARE, + ORT_SIMD, + ORT_PARALLEL, + ORT_COMBINED_PARALLEL, + ORT_TASK, + ORT_TEAMS, + ORT_TARGET }; If you want to shift from bitmasks in the enum to extra on the side bits (why?), then combined for parallel is another thing. Right, but I've now dropped (reverted) this and further gimplification changes. Maybe this is material for the next stage 1, but maybe not useful enough. [...] +static inline tree +lookup_reduction (const char *id, omp_context *ctx) Can't you use oacc_ in the name of OpenACC specific functions? [...] Review comments for OpenACC reduction code have been addressed by Cesar. (Thanks!) case OMP_CLAUSE_DEFAULT: + gcc_assert (!is_gimple_omp_oacc_specifically (ctx-stmt)); ctx-default_kind = OMP_CLAUSE_DEFAULT_KIND (c); break; For clauses which are not parsed for OpenACC
Re: [C++ Patch] PR 60955
Hi, On 12/17/2014 09:37 PM, Jason Merrill wrote: I'm uncomfortable with setting TREE_NO_WARNING on a decl just because we don't want a warning for one particular use of it. How about suppressing warnings across the call to build_static_cast? Sure. The below uses the c_inhibit_evaluation_warnings mechanism and passes testing. I wondered if in such cases we could alternately use the warning_sentinel mechanism (moved to cp-tree.h of course) ? Thanks, Paolo. /// Index: cp/semantics.c === --- cp/semantics.c (revision 218857) +++ cp/semantics.c (working copy) @@ -1660,7 +1660,9 @@ force_paren_expr (tree expr) tree type = unlowered_expr_type (expr); bool rval = !!(kind clk_rvalueref); type = cp_build_reference_type (type, rval); + ++c_inhibit_evaluation_warnings; expr = build_static_cast (type, expr, tf_error); + --c_inhibit_evaluation_warnings; if (expr != error_mark_node) REF_PARENTHESIZED_P (expr) = true; } Index: cp/typeck.c === --- cp/typeck.c (revision 218857) +++ cp/typeck.c (working copy) @@ -6047,7 +6047,8 @@ cxx_mark_addressable (tree exp) (address of explicit register variable %qD requested, x); return false; } - else if (extra_warnings) + else if (extra_warnings + c_inhibit_evaluation_warnings == 0) warning (OPT_Wextra, address requested for %qD, which is declared %register%, x); } Index: testsuite/g++.dg/warn/register-parm-1.C === --- testsuite/g++.dg/warn/register-parm-1.C (revision 0) +++ testsuite/g++.dg/warn/register-parm-1.C (working copy) @@ -0,0 +1,9 @@ +// PR c++/60955 +// { dg-options -Wextra } + +unsigned int erroneous_warning(register int a) { +if ((a) 0xff) return 1; else return 0; +} +unsigned int no_erroneous_warning(register int a) { +if (a 0xff) return 1; else return 0; +}
Re: [testsuite, i386] Change mpx effective-target test into link test
On Thu, Dec 18, 2014 at 11:12 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Currently, the new gcc.dg/lto/chkp-privatize test FAILs on Solaris/x86 with gas and ld for 64-bit: UNRESOLVED: gcc.dg/lto/chkp-privatize c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o execute -fPIC -flto -flto-partition=max -fcheck-pointer-bounds -mmpx FAIL: gcc.dg/lto/chkp-privatize c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o link, -fPIC -flto -flto-partition=max -fcheck-pointer-bounds -mmpx output is: ld: fatal: relocation error: file c_lto_chkp-privatize_0.o: section [4].rela.text: invalid relocation type: 0x28 ld: fatal: symbol referencing errors collect2: error: ld returned 1 exit status This relocation type 0x28 (40) is R_X86_64_PLT32_BND, which Solaris 10 and 11 ld know nothing about. Given that there's no test whatsoever in gcc/configure.ac that the whole toolchain supports the necessary mnemonics and relocs, the logical place seems to be the testsuite. The following patch does just that, turning the mpx effective-target test into a link test, thus checking both assembler and linker used. Tested with the appropriate runtest invocations on i386-pc-solaris2.11 with as/ld (both 32 and 64-bit tests unsupported due to missing assembler support), gas/ld (32-bit tests pass, 64-bit tests unsupported due to missing linker support), and x86_64-unknown-linux-gnu (both 32 and 64-bit tests pass). I wonder what's the point of having mpx-dg.exp at all: I'd rather move the single test to target-supports.exp, avoiding the whole flurry of mpx-dg.exp inclusions all over gcc/testsuite. Ilya? However that may be, it can be done as a followup. Ok for mainline? Rainer 2014-12-18 Rainer Orth r...@cebitec.uni-bielefeld.de * lib/mpx-dg.exp (check_effective_target_mpx): Change into link test. Add main. OK. Thanks, Uros.
Re: OpenACC middle end changes
On Thu, Dec 18, 2014 at 11:46:00AM +0100, Thomas Schwinge wrote: just rtx v1 = GEN_INT (...); rtx v2 = GEN_INT (...); machine_mode mode = TYPE_MODE (TREE_TYPE (arg)); rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node)); emit_move_insn (ret, const0_rtx); rtx_code_label *done_label = gen_label_rtx (); emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_move_insn (ret, const1_rtx); emit_label (done_label); return ret; or similar. Thanks for the review/suggestion/code! Note, as I found later, emit_cmp_and_jump_insns is good enough only for certain modes on certain architectures (in particular, for cases where can_compare_p returns true). So it is better to use do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns, because it handles also the cases which emit_cmp_and_jump_insns silently mishandles. You'll need to reorder the arguments a little bit and add one NULL_RTX argument. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63848#c4 Jakub
Re: OpenACC middle end changes
On Thu, Dec 18, 2014 at 12:07:01PM +0100, Thomas Schwinge wrote: - case GF_OMP_FOR_KIND_SIMD: - kind = simd; - break; - case GF_OMP_FOR_KIND_CILKSIMD: - kind = cilksimd; - break; case GF_OMP_FOR_KIND_DISTRIBUTE: kind = distribute; break; case GF_OMP_FOR_KIND_CILKFOR: kind = _Cilk_for; break; + case GF_OMP_FOR_KIND_OACC_LOOP: + kind = oacc_loop; + break; + case GF_OMP_FOR_KIND_SIMD: + kind = simd; + break; + case GF_OMP_FOR_KIND_CILKSIMD: + kind = cilksimd; + break; Why the reshuffling? The result isn't alphabetically sorted anyway. I'd just add new stuff at the end ;) It's the order in which the GF_OMP_FOR_KIND_* are defined. At least for my mind ;-) that makes it very easy to grasp that all of them are covered. Ok. +/* Return true if STMT is any of the OpenACC types specifically. */ + +static inline bool +is_gimple_omp_oacc_specifically (const_gimple stmt) Why not is_gimple_oacc or gimple_oacc_p ? The idea is to make it clear in the name that STMT must be an OMP one. Now renamed to the shorter is_gimple_omp_oacc. Ok. If you want to shift from bitmasks in the enum to extra on the side bits (why?), then combined for parallel is another thing. Right, but I've now dropped (reverted) this and further gimplification changes. Maybe this is material for the next stage 1, but maybe not useful enough. Ack. Do you want me to repost the OpenACC Middle End changes patch, or would you be OK with reviewing the code on gomp-4_0-branch, diffing against the last trunk merge point, 0fcfaa33cbf333ac69cc2b01a7277e5272ff8a3d, r218679? So, is what is on the gomp-4_0-branch now all that you'd like to merge to trunk now? Has it been tested on nvptx? I guess we should test it with XeonPhi offloading too to make sure it doesn't break. And then you or together with your coworkers should write the summary ChangeLogs (i.e. what changed compared to trunk in a single giant entry for each ChangeLog file as opposed to many ChangeLog.gomp change entries). Jakub
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 17/12/14 00:04, Joseph Myers wrote: On Mon, 15 Dec 2014, James Greenhalgh wrote: @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) return __builtin_aarch64_sqrtv4sf (a); } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vsqrt_f64 (float64x1_t a) +{ + return (float64x1_t) { __builtin_sqrt (a[0]) }; +} Hi Kyrill, Does this introduce an implicit need to link against a maths library if we want arm_neon.h to work correctly? If so, I think we need to take a different approach. At O0 I've started to see: undefined reference to `sqrt' When checking a large arm_neon.h testcase. It does seem strange that the mid-end would convert a __builtin_sqrt back to a library call at O0 when the target has an optab for it, so perhaps there is a bug there to go hunt? __builtin_sqrt has the same semantics as the sqrt library function. This includes setting errno for negative arguments (other than -0 and -NaN). The semantics also include that it's always OK to expand to a call to that library function (generally, __builtin_foo may always expand to a call to foo, if there is such a library function). Thanks for the explanation. I see that there are some intrinsics implemented in terms of __builtin_fabsf, presumable they can be at 'risk' too of having a library call emitted? Kyrill
Re: OpenACC middle end changes
On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote: So, is what is on the gomp-4_0-branch now all that you'd like to merge to trunk now? Has it been tested on nvptx? I guess we should test it with XeonPhi offloading too to make sure it doesn't break. And then you or together with your coworkers should write the summary ChangeLogs (i.e. what changed compared to trunk in a single giant entry for each ChangeLog file as opposed to many ChangeLog.gomp change entries). Also, it would be nice to update wiki/Offloading to give details on all the steps how to configure nvptx offloading (how to grab nvptx-newlib, nvptx-tools, how to configure nvptx-none compiler, in what order to build those etc.). Jakub
[PATCH] Fix -fsanitize=float-cast-overflow with C FE (PR sanitizer/64344)
On Tue, Dec 16, 2014 at 06:58:47PM +, Joseph Myers wrote: On Fri, 12 Dec 2014, Jakub Jelinek wrote: Hi! -fsanitize=float-cast-overflow sanitization is done in convert.c and calls there save_expr. Unfortunately, save_expr is a no-go for the C FE, we need c_save_expr, but as convert.c is shared by all FEs, the only way to arrange that would be a new langhook. This patch attempts to fix it the same way as PR54428 did (the other save_expr in c-convert.c). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Unfortunately this broke stuff, as reported by Tobias. The problems are: 1) sometimes convert is called after c_fully_fold, so we want in_late_binary_op set in that case and use save_expr instead of c_save_expr 2) if not in_late_binary_op, ubsan_instrument_float_cast wants to use the result of c_save_expr in comparisons (ok) and also as argument to a function call; but c_fully_fold will handle the former, but not look into CALL_EXPR arguments, so those need to be c_fully_folded and, preexisting problems, where constant is required, things didn't work well, because while the check was optimized out, we ended up with COMPOUND_EXPR of 0 and some constant. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-12-18 Jakub Jelinek ja...@redhat.com PR sanitizer/64344 * ubsan.h (ubsan_instrument_float_cast): Add ARG argument. * ubsan.c (ubsan_instrument_float_cast): Add ARG argument, pass it to libubsan handler instead of EXPR. Fold comparisons earlier, if the result is integer_zerop, return NULL_TREE. * convert.c (convert_to_integer): Pass expr as ARG. c/ * c-typeck.c (convert_for_assignment, c_finish_return): For -fsanitize=float-cast-overflow casts from REAL_TYPE to integer/enum types also set in_late_binary_op around convert call. * c-convert.c (convert): For -fsanitize=float-cast-overflow REAL_TYPE to integral type casts, if not in_late_binary_op, pass c_fully_fold result on expr as last argument to ubsan_instrument_float_cast, if in_late_binary_op, don't use c_save_expr but save_expr. testsuite/ * c-c++-common/ubsan/pr64344-1.c: New test. * c-c++-common/ubsan/pr64344-2.c: New test. --- gcc/ubsan.h.jj 2014-12-03 16:33:05.0 +0100 +++ gcc/ubsan.h 2014-12-18 09:57:12.934732565 +0100 @@ -47,7 +47,7 @@ extern tree ubsan_type_descriptor (tree, extern tree ubsan_encode_value (tree, bool = false); extern bool is_ubsan_builtin_p (tree); extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree, tree); -extern tree ubsan_instrument_float_cast (location_t, tree, tree); +extern tree ubsan_instrument_float_cast (location_t, tree, tree, tree); extern tree ubsan_get_source_location_type (void); #endif /* GCC_UBSAN_H */ --- gcc/ubsan.c.jj 2014-12-03 16:33:05.0 +0100 +++ gcc/ubsan.c 2014-12-18 10:47:00.045039211 +0100 @@ -1252,10 +1252,11 @@ instrument_bool_enum_load (gimple_stmt_i } /* Instrument float point-to-integer conversion. TYPE is an integer type of - destination, EXPR is floating-point expression. */ + destination, EXPR is floating-point expression. ARG is what to pass + the libubsan call as value, often EXPR itself. */ tree -ubsan_instrument_float_cast (location_t loc, tree type, tree expr) +ubsan_instrument_float_cast (location_t loc, tree type, tree expr, tree arg) { tree expr_type = TREE_TYPE (expr); tree t, tt, fn, min, max; @@ -1348,6 +1349,12 @@ ubsan_instrument_float_cast (location_t else return NULL_TREE; + t = fold_build2 (UNLE_EXPR, boolean_type_node, expr, min); + tt = fold_build2 (UNGE_EXPR, boolean_type_node, expr, max); + t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, tt); + if (integer_zerop (t)) +return NULL_TREE; + if (flag_sanitize_undefined_trap_on_error) fn = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0); else @@ -1364,14 +1371,10 @@ ubsan_instrument_float_cast (location_t fn = builtin_decl_explicit (bcode); fn = build_call_expr_loc (loc, fn, 2, build_fold_addr_expr_loc (loc, data), - ubsan_encode_value (expr, false)); + ubsan_encode_value (arg, false)); } - t = fold_build2 (UNLE_EXPR, boolean_type_node, expr, min); - tt = fold_build2 (UNGE_EXPR, boolean_type_node, expr, max); - return fold_build3 (COND_EXPR, void_type_node, - fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, tt), - fn, integer_zero_node); + return fold_build3 (COND_EXPR, void_type_node, t, fn, integer_zero_node); } /* Instrument values passed to function arguments with nonnull attribute. */ --- gcc/convert.c.jj2014-12-12 16:32:29.0 +0100 +++ gcc/convert.c 2014-12-18 09:57:27.742475996 +0100 @@
Re: OpenACC middle end changes
On Thu, Dec 18, 2014 at 01:02:22PM +0100, Jakub Jelinek wrote: On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote: So, is what is on the gomp-4_0-branch now all that you'd like to merge to trunk now? Has it been tested on nvptx? I guess we should test it with XeonPhi offloading too to make sure it doesn't break. And then you or together with your coworkers should write the summary ChangeLogs (i.e. what changed compared to trunk in a single giant entry for each ChangeLog file as opposed to many ChangeLog.gomp change entries). Also, it would be nice to update wiki/Offloading to give details on all the steps how to configure nvptx offloading (how to grab nvptx-newlib, nvptx-tools, how to configure nvptx-none compiler, in what order to build those etc.). FYI, just tried to build gomp-4_0-branch with: ../configure --build=x86_64-intelmicemul-linux-gnu --host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap make -j16 and the build failed with: ../../gcc/builtins.c: In function ‘rtx_def* expand_builtin_acc_on_device(tree, rtx)’: ../../gcc/builtins.c:5904:17: error: ‘ACCEL_COMPILER_acc_device’ was not declared in this scope v2 = GEN_INT (ACCEL_COMPILER_acc_device); ^ ../../gcc/rtl.h:3186:51: note: in definition of macro ‘GEN_INT’ #define GEN_INT(N) gen_rtx_CONST_INT (VOIDmode, (N)) ^ Where is ACCEL_COMPILER_acc_device macro supposed to be defined? Jakub
Re: OpenACC middle end changes
Hi Jakub! On Thu, 18 Dec 2014 13:15:38 +0100, Jakub Jelinek ja...@redhat.com wrote: On Thu, Dec 18, 2014 at 01:02:22PM +0100, Jakub Jelinek wrote: On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote: So, is what is on the gomp-4_0-branch now all that you'd like to merge to trunk now? Basically, yes. Only basically, because there are still a few unaddressed review issues in the front ends -- which I'll look into now. (Meaning really: now.) :-) Doing the merge as one big commit on trunk will be the easiest approach, of course. Is that OK, or is there any requirement to single out any of the changes, such as the libgomp/testsuite/libgomp-test-support.exp file just discussed, or the libgomp »Offloading and Multi Processing Runtime Library« renaming, or anything else? Has it been tested on nvptx? I have always been testing on gomp-4_0-branch with ACC_DEVICE_TYPE=host and ACC_DEVICE_TYPE=host_nonshm, plus with ACC_DEVICE_TYPE=nvidia in an internal branch. This branch corresponds to gomp-4_0-branch, but also includes a few patches related to offloading that Bernd has posted for trunk approval, but has not yet gotten approved. I guess we should test it with XeonPhi offloading too to make sure it doesn't break. Right. Do you happen to be set up for such testing? I have not yet managed to properly change my build/test scripts for x86_64-intelmicemul-linux-gnu. And then you or together with your coworkers should write the summary ChangeLogs (i.e. what changed compared to trunk in a single giant entry for each ChangeLog file as opposed to many ChangeLog.gomp change entries). Right, I'll do that once it's time to merge -- otherwise it'll be too cumbersome to keep those up to date. Also, it would be nice to update wiki/Offloading to give details on all the steps how to configure nvptx offloading (how to grab nvptx-newlib, nvptx-tools, how to configure nvptx-none compiler, in what order to build those etc.). Right. FYI, just tried to build gomp-4_0-branch with: ../configure --build=x86_64-intelmicemul-linux-gnu --host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap make -j16 and the build failed with: ../../gcc/builtins.c: In function ‘rtx_def* expand_builtin_acc_on_device(tree, rtx)’: ../../gcc/builtins.c:5904:17: error: ‘ACCEL_COMPILER_acc_device’ was not declared in this scope v2 = GEN_INT (ACCEL_COMPILER_acc_device); ^ ../../gcc/rtl.h:3186:51: note: in definition of macro ‘GEN_INT’ #define GEN_INT(N) gen_rtx_CONST_INT (VOIDmode, (N)) ^ Where is ACCEL_COMPILER_acc_device macro supposed to be defined? From b6781092de7cc9fc8c24600815e0a1223e1241f5 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge tho...@codesourcery.com Date: Wed, 17 Dec 2014 10:10:53 +0100 Subject: [PATCH] Intel MIC offloading. --- gcc/config.gcc | 1 + gcc/config/i386/intelmic-offload.h | 35 +++ 2 files changed, 36 insertions(+) create mode 100644 gcc/config/i386/intelmic-offload.h diff --git gcc/config.gcc gcc/config.gcc index 8541274..faad47d 100644 --- gcc/config.gcc +++ gcc/config.gcc @@ -2906,6 +2906,7 @@ esac case ${target} in *-intelmic-* | *-intelmicemul-*) tmake_file=${tmake_file} i386/t-intelmic + tm_file=${tm_file} i386/intelmic-offload.h ;; esac diff --git gcc/config/i386/intelmic-offload.h gcc/config/i386/intelmic-offload.h new file mode 100644 index 000..dc346c7 --- /dev/null +++ gcc/config/i386/intelmic-offload.h @@ -0,0 +1,35 @@ +/* Definitions for Intel MIC offloading. + + Copyright (C) 2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + http://www.gnu.org/licenses/. */ + +#ifndef INTELMIC_OFFLOAD_H +#define INTELMIC_OFFLOAD_H + +/* Support for OpenACC acc_on_device. */ + +#include gomp-constants.h + +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC + +#endif -- 1.9.1 Grüße, Thomas
Re: OpenACC middle end changes
On Thu, Dec 18, 2014 at 01:24:20PM +0100, Thomas Schwinge wrote: Hi Jakub! On Thu, 18 Dec 2014 13:15:38 +0100, Jakub Jelinek ja...@redhat.com wrote: On Thu, Dec 18, 2014 at 01:02:22PM +0100, Jakub Jelinek wrote: On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote: So, is what is on the gomp-4_0-branch now all that you'd like to merge to trunk now? Basically, yes. Only basically, because there are still a few unaddressed review issues in the front ends -- which I'll look into now. (Meaning really: now.) :-) Doing the merge as one big commit on trunk will be the easiest approach, of course. Is that OK, or is there any requirement to single out any of the changes, such as the libgomp/testsuite/libgomp-test-support.exp file just discussed, or the libgomp »Offloading and Multi Processing Runtime Library« renaming, or anything else? Doing one big merge is ok with me. If one needs to bisect something, they can look at the gomp-4_0-branch. Has it been tested on nvptx? I have always been testing on gomp-4_0-branch with ACC_DEVICE_TYPE=host and ACC_DEVICE_TYPE=host_nonshm, plus with ACC_DEVICE_TYPE=nvidia in an internal branch. This branch corresponds to gomp-4_0-branch, but also includes a few patches related to offloading that Bernd has posted for trunk approval, but has not yet gotten approved. Do you have a list of them (URLs)? Have they been pinged? Are they show stoppers for the offloading, or just some tests fail because of that? I guess we should test it with XeonPhi offloading too to make sure it doesn't break. Right. Do you happen to be set up for such testing? I have not yet managed to properly change my build/test scripts for x86_64-intelmicemul-linux-gnu. Anyone with x86_64-linux should be able to test that (i.e. the emulation), for real offloading (that ix x86_64-intelmic-linux-gnu) you supposedly need 2 libraries, some kernel module, some distro installed on the offloading device and most importantly the hw. --- gcc/config.gcc +++ gcc/config.gcc @@ -2906,6 +2906,7 @@ esac case ${target} in *-intelmic-* | *-intelmicemul-*) tmake_file=${tmake_file} i386/t-intelmic + tm_file=${tm_file} i386/intelmic-offload.h ;; esac diff --git gcc/config/i386/intelmic-offload.h gcc/config/i386/intelmic-offload.h new file mode 100644 index 000..dc346c7 --- /dev/null +++ gcc/config/i386/intelmic-offload.h @@ -0,0 +1,35 @@ +/* Definitions for Intel MIC offloading. + + Copyright (C) 2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + http://www.gnu.org/licenses/. */ + +#ifndef INTELMIC_OFFLOAD_H +#define INTELMIC_OFFLOAD_H + +/* Support for OpenACC acc_on_device. */ + +#include gomp-constants.h + +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC + +#endif LGTM. Jakub
Re: OpenACC middle end changes
On Thu, Dec 18, 2014 at 01:31:45PM +0100, Jakub Jelinek wrote: --- gcc/config.gcc +++ gcc/config.gcc @@ -2906,6 +2906,7 @@ esac case ${target} in *-intelmic-* | *-intelmicemul-*) tmake_file=${tmake_file} i386/t-intelmic + tm_file=${tm_file} i386/intelmic-offload.h ;; esac +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC + +#endif Oh, and where is this defined for nvptx-none target? Jakub
[PATCH] X86-64: Add -mskip-rax-setup
The Linux kernel never passes floating point arguments around, vararg functions or not. Hence no vector registers are ever used when calling a vararg function. But gcc still dutifully emits an xor %eax,%eax before each and every call of a vararg function. Since no callee use that for anything, these instructions are redundant. This patch adds the -mskip-rax-setup option to skip setting up RAX register when SSE is disabled and there are no variable arguments passed in vector registers. Since RAX register is used to avoid unnecessarily saving vector registers on stack when passing variable arguments, the impacts of this option are callees may waste some stack space, misbehave or jump to a random location. GCC 4.4 or newer don't those issues, regardless the RAX register value since they don't check the RAX register value when SSE is disabled, regardless the RAX register value: https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html I used it on kernel 3.17.7: textdata bss dec hexfilename 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? Thanks. H.J. --- gcc/ * config/i386/i386.c (ix86_expand_call): Skip setting up RAX register for -mskip-rax-setup when there are no parameters passed in vector registers. * config/i386/i386.opt (mskip-rax-setup): New option. * doc/invoke.texi: Document -mskip-rax-setup. gcc/testsuite/ * gcc.target/i386/amd64-abi-7.c: New tests. * gcc.target/i386/amd64-abi-8.c: Likwise. * gcc.target/i386/amd64-abi-9.c: Likwise. --- gcc/ChangeLog | 8 + gcc/config/i386/i386.c | 7 - gcc/config/i386/i386.opt| 4 +++ gcc/doc/invoke.texi | 13 gcc/testsuite/ChangeLog | 6 gcc/testsuite/gcc.target/i386/amd64-abi-7.c | 46 + gcc/testsuite/gcc.target/i386/amd64-abi-8.c | 18 +++ gcc/testsuite/gcc.target/i386/amd64-abi-9.c | 18 +++ 8 files changed, 119 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-7.c create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-8.c create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-9.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 24a252a..de7907a 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2014-12-18 H.J. Lu hongjiu...@intel.com + + * config/i386/i386.c (ix86_expand_call): Skip setting up RAX + register for -mskip-rax-setup when there are no parameters + passed in vector registers. + * config/i386/i386.opt (mskip-rax-setup): New option. + * doc/invoke.texi: Document -mskip-rax-setup. + 2014-12-18 Martin Liska mli...@suse.cz PR ipa/64146 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 17ef751..122a350 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -25461,7 +25461,12 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, } } - if (TARGET_64BIT INTVAL (callarg2) = 0) + /* Skip setting up RAX register for -mskip-rax-setup when there are no + parameters passed in vector registers. */ + if (TARGET_64BIT + (INTVAL (callarg2) 0 + || (INTVAL (callarg2) == 0 + (TARGET_SSE || !flag_skip_rax_setup { rtx al = gen_rtx_REG (QImode, AX_REG); emit_move_insn (al, callarg2); diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 3d54bfa..6dc4da2 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -831,6 +831,10 @@ Target Report Var(flag_nop_mcount) Init(0) Generate mcount/__fentry__ calls as nops. To activate they need to be patched in. +mskip-rax-setup +Target Report Var(flag_skip_rax_setup) Init(0) +Skip setting up RAX register when passing variable arguments. + m8bit-idiv Target Report Mask(USE_8BIT_IDIV) Save Expand 32bit/64bit integer divide into 8bit unsigned integer divide with run-time check diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 15068da..33a7ed2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -16256,6 +16256,19 @@ the profiling functions as nops. This is useful when they should be patched in later dynamically. This is likely only useful together with @option{-mrecord-mcount}. +@item -mskip-rax-setup +@itemx -mno-skip-rax-setup +@opindex mskip-rax-setup +When generating code for the x86-64 architecture with SSE extensions +disabled, @option{-skip-rax-setup} can be used to skip setting up RAX +register when there are no variable arguments passed in vector registers. + +@strong{Warning:} Since RAX register is used to avoid unnecessarily +saving vector
Re: OpenACC middle end changes
Hi Jakub! On Thu, 18 Dec 2014 12:33:11 +0100, Jakub Jelinek ja...@redhat.com wrote: On Thu, Dec 18, 2014 at 11:46:00AM +0100, Thomas Schwinge wrote: just rtx v1 = GEN_INT (...); rtx v2 = GEN_INT (...); machine_mode mode = TYPE_MODE (TREE_TYPE (arg)); rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node)); emit_move_insn (ret, const0_rtx); rtx_code_label *done_label = gen_label_rtx (); emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_move_insn (ret, const1_rtx); emit_label (done_label); return ret; or similar. Thanks for the review/suggestion/code! Note, as I found later, emit_cmp_and_jump_insns is good enough only for certain modes on certain architectures (in particular, for cases where can_compare_p returns true). So it is better to use do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns, because it handles also the cases which emit_cmp_and_jump_insns silently mishandles. You'll need to reorder the arguments a little bit and add one NULL_RTX argument. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63848#c4 Thanks again; committed to gomp-4_0-branch in r218862: commit a58e1475324e6dd6c34a95883f5efc854e204fde Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Thu Dec 18 13:13:06 2014 + OpenACC acc_on_device: Harden builtin expansion. gcc/ * builtins.c (expand_builtin_acc_on_device): Use do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218862 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 3 +++ gcc/builtins.c | 8 2 files changed, 7 insertions(+), 4 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index a3650c5..1e6df5f 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,6 +1,9 @@ 2014-12-18 Thomas Schwinge tho...@codesourcery.com Jakub Jelinek ja...@redhat.com + * builtins.c (expand_builtin_acc_on_device): Use + do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns. + * builtins.c (expand_builtin_acc_on_device): Make more RTXy. 2014-12-17 Thomas Schwinge tho...@codesourcery.com diff --git gcc/builtins.c gcc/builtins.c index e946521..33025a5 100644 --- gcc/builtins.c +++ gcc/builtins.c @@ -5911,10 +5911,10 @@ expand_builtin_acc_on_device (tree exp, rtx target) target = gen_reg_rtx (target_mode); emit_move_insn (target, const0_rtx); rtx_code_label *done_label = gen_label_rtx (); - emit_cmp_and_jump_insns (v, v1, NE, NULL_RTX, v_mode, - false, done_label, PROB_EVEN); - emit_cmp_and_jump_insns (v, v2, NE, NULL_RTX, v_mode, - false, done_label, PROB_EVEN); + do_compare_rtx_and_jump (v, v1, NE, false, v_mode, NULL_RTX, + NULL_RTX, done_label, PROB_EVEN); + do_compare_rtx_and_jump (v, v2, NE, false, v_mode, NULL_RTX, + NULL_RTX, done_label, PROB_EVEN); emit_move_insn (target, const1_rtx); emit_label (done_label); Grüße, Thomas pgpybie4WoKNR.pgp Description: PGP signature
Re: OpenACC middle end changes
Hi Jakub! On Thu, 18 Dec 2014 13:36:16 +0100, Jakub Jelinek ja...@redhat.com wrote: On Thu, Dec 18, 2014 at 01:31:45PM +0100, Jakub Jelinek wrote: --- gcc/config.gcc +++ gcc/config.gcc @@ -2906,6 +2906,7 @@ esac case ${target} in *-intelmic-* | *-intelmicemul-*) tmake_file=${tmake_file} i386/t-intelmic + tm_file=${tm_file} i386/intelmic-offload.h ;; esac +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC + +#endif Oh, and where is this defined for nvptx-none target? One of the pending commits (together with the nvptx mkoffload); I now applied the following to gomp-4_0-branch in r218863: commit c3f63a62eb4332f651be5fd377d2d289c8c949f5 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Thu Dec 18 13:13:17 2014 + Support for OpenACC acc_on_device in offloading configurations. gcc/ * config/i386/intelmic-offload.h: New file. * config/nvptx/offload.h: Likewise. * config.gcc *-intelmic-*, *-intelmicemul-*, nvptx-*: Point to them via tm_file. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218863 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 7 +++ gcc/config.gcc | 2 ++ gcc/config/i386/intelmic-offload.h | 35 +++ gcc/config/nvptx/offload.h | 35 +++ 4 files changed, 79 insertions(+) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 1e6df5f..a744ebf 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,4 +1,11 @@ 2014-12-18 Thomas Schwinge tho...@codesourcery.com + + * config/i386/intelmic-offload.h: New file. + * config/nvptx/offload.h: Likewise. + * config.gcc *-intelmic-*, *-intelmicemul-*, nvptx-*: Point to + them via tm_file. + +2014-12-18 Thomas Schwinge tho...@codesourcery.com Jakub Jelinek ja...@redhat.com * builtins.c (expand_builtin_acc_on_device): Use diff --git gcc/config.gcc gcc/config.gcc index 8541274..1e453e9 100644 --- gcc/config.gcc +++ gcc/config.gcc @@ -2178,6 +2178,7 @@ nios2-*-*) nvptx-*) tm_file=${tm_file} newlib-stdint.h tmake_file=nvptx/t-nvptx + tm_file=${tm_file} nvptx/offload.h ;; pdp11-*-*) tm_file=${tm_file} newlib-stdint.h @@ -2906,6 +2907,7 @@ esac case ${target} in *-intelmic-* | *-intelmicemul-*) tmake_file=${tmake_file} i386/t-intelmic + tm_file=${tm_file} i386/intelmic-offload.h ;; esac diff --git gcc/config/i386/intelmic-offload.h gcc/config/i386/intelmic-offload.h new file mode 100644 index 000..bea18ed --- /dev/null +++ gcc/config/i386/intelmic-offload.h @@ -0,0 +1,35 @@ +/* Support for Intel MIC offloading. + + Copyright (C) 2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + http://www.gnu.org/licenses/. */ + +#ifndef INTELMIC_OFFLOAD_H +#define INTELMIC_OFFLOAD_H + +/* Support for OpenACC acc_on_device. */ + +#include gomp-constants.h + +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC + +#endif diff --git gcc/config/nvptx/offload.h gcc/config/nvptx/offload.h new file mode 100644 index 000..63f9a02 --- /dev/null +++ gcc/config/nvptx/offload.h @@ -0,0 +1,35 @@ +/* Support for Nvidia PTX offloading. + + Copyright (C) 2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 2:11 PM, H.J. Lu hongjiu...@intel.com wrote: The Linux kernel never passes floating point arguments around, vararg functions or not. Hence no vector registers are ever used when calling a vararg function. But gcc still dutifully emits an xor %eax,%eax before each and every call of a vararg function. Since no callee use that for anything, these instructions are redundant. This patch adds the -mskip-rax-setup option to skip setting up RAX register when SSE is disabled and there are no variable arguments passed in vector registers. Since RAX register is used to avoid unnecessarily saving vector registers on stack when passing variable arguments, the impacts of this option are callees may waste some stack space, misbehave or jump to a random location. GCC 4.4 or newer don't those issues, regardless the RAX register value since they don't check the RAX register value when SSE is disabled, regardless the RAX register value: https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html I used it on kernel 3.17.7: textdata bss dec hexfilename 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? How about skipping RAX setup unconditionally for !TARGET_SSE? Please see ix86_conditional_register_usage, where SSE registers are squashed for !TARGET_SSE, so it is not possible to use them even in the inline asm. Uros.
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On Thu, 18 Dec 2014, Kyrill Tkachov wrote: I see that there are some intrinsics implemented in terms of __builtin_fabsf, presumable they can be at 'risk' too of having a library call emitted? The semantics of fabsf/fabs/fabsl are to clear the sign bit, never raise any exceptions (even for signaling NaN arguments, subject to any limitations imposed by calling conventions / floating-point load conventions on the processor) and never set errno, as per IEEE 754-2008 (and the C bindings thereto, TS 18661-1:2014). So there is certainly no need to call a library function for __builtin_fabsf, but I don't know whether the implementation guarantees never to generate such a call. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Fix for PR64330
Hello. Following patch is fix for PR64330, is pre-approved by Honza and I'm going to install it. No regression was seen on x86_64-linux-pc. Thanks, Martin From 6f9ed17a2897be6a26e2c64f8e581771c216e41b Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Wed, 17 Dec 2014 19:11:48 +0100 Subject: [PATCH] Fix for PR64330. gcc/ChangeLog: 2014-12-17 Martin Liska mli...@suse.cz PR tree-optimization/64330 * ipa-icf.c (sem_variable::parse): Add checking for externally visible symbols and do not introduce an alias for an external declaration. --- gcc/ipa-icf.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c index b193200..ad106e5 100644 --- a/gcc/ipa-icf.c +++ b/gcc/ipa-icf.c @@ -1131,10 +1131,14 @@ sem_variable::parse (varpool_node *node, bitmap_obstack *stack) tree decl = node-decl; bool readonly = TYPE_P (decl) ? TYPE_READONLY (decl) : TREE_READONLY (decl); - bool can_handle = readonly (DECL_VIRTUAL_P (decl) - || !TREE_ADDRESSABLE (decl)); + if (!readonly) +return NULL; + + bool can_handle = DECL_VIRTUAL_P (decl) + || flag_merge_constants = 2 + || (!TREE_ADDRESSABLE (decl) !node-externally_visible); - if (!can_handle) + if (!can_handle || DECL_EXTERNAL (decl)) return NULL; tree ctor = ctor_for_folding (decl); -- 2.1.2
Re: [PATCH] IPA ICF: refactoring + fix for PR ipa/63569
On 12/17/2014 04:23 PM, Richard Biener wrote: On Wed, Dec 17, 2014 at 12:17 PM, Martin Liška mli...@suse.cz wrote: On 12/11/2014 01:37 PM, Richard Biener wrote: On Wed, Dec 10, 2014 at 1:18 PM, Martin Liška mli...@suse.cz wrote: Hello. As suggested by Richard, I split compare_operand functions to various functions related to a specific comparison. Apart from that I added fast check for volatility flag that caused miscompilation mentioned in PR63569. Patch can bootstrap on x86_64-linux-pc without any regression seen and I was able to build Firefox with LTO. Ready for trunk? Hmm, I don't think the dispatch to compare_memory_operand is at the correct place. It should be called from places where currently compare_operand is called and it should recurse to compare_operand. That is, it is more high-level. Can you please fix the volatile issue separately? It's also not necessary to do that check on every operand but just on memory operands. Hello Richard. I hope the following patch is the finally the right approach we want to do ;) In the first patch, I did just refactoring for high-level memory operand comparison and the second of fixes problem connected to missing volatile attribute comparison. Patch was retested and can bootstrap. What do you think about it? +bool +func_checker::compare_cst_or_decl (tree t1, tree t2) +{ ... +case INTEGER_CST: + { + ret = compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2)) + wi::to_offset (t1) == wi::to_offset (t2); + + return return_with_debug (ret); + } +case COMPLEX_CST: +case VECTOR_CST: +case STRING_CST: +case REAL_CST: + { + ret = operand_equal_p (t1, t2, OEP_ONLY_CONST); + return return_with_debug (ret); why does the type matter for INTEGER_CSTs but not for other constants? Also comparing INTEGER_CSTs via to_offset is certainly wrong. Please use tree_int_cst_equal_p (t1, t2) instead, or operand_equal_p which would end up calling tree_int_cst_equal_p. That said, I'd have commonized all _CST nodes by ret = compatible_types_p () operand_equal_p (...); +case CONST_DECL: +case BIT_FIELD_REF: + { I'm quite sure BIT_FIELD_REF is spurious here. Now to the meat ... + tree load1 = get_base_loadstore (t1, false); + tree load2 = get_base_loadstore (t2, false); + + if (load1 load2 !compare_memory_operand (t1, t2)) + return return_false_with_msg (memory operands are different); + else if ((load1 !load2) || (!load1 load2)) + return return_false_with_msg (); + and the similar code for assigns. The way you introduce the unpack_handled_component flag to get_base_loadstore makes it treat x.field as non-memory operand. That's wrong. I wonder why you think you needed that. Why does tree load1= get_base_loadstore (t1); not work? Btw, I'd have avoided get_base_loadstore and simply did ao_ref_init (r1, t1); ao_ref_init (r2, t2); tree base1 = ao_ref_base (r1); tree base2 = ao_ref_base (r2); if ((DECL_P (base1) || (TREE_CODE (base1) == MEM_REF || TREE_CODE (base1) == TARGET_MEM_REF)) (DECL_P (base2) || (...)) ... or rather put this into compare_memory_operand and call that on all operands that may be memory (TREE_CODE () != SSA_NAME !is_gimple_min_invariant ()). I also miss where you handle memory operands on the LHS for calls and for assignments. The compare_ssa_name refactoring part is ok to install. Can you please fix the volatile bug before the refactoring as it looks like we're going to iterate some more here unless I can find the time to give it a quick try myself. Hi. Sure, there's minimalistic patch which fixes the PR. Can I install this change? I'm going to apply your notes that are orthogonal to the PR. Thanks, Martin Thanks, Richard. Thank you, Martin Thanks, Richard. Thanks, Martin From b984eaae263eedc8beb2413f4739d3574f46d63d Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Thu, 18 Dec 2014 14:16:12 +0100 Subject: [PATCH 1/2] Fix for PR ipa/63569. gcc/testsuite/ChangeLog: 2014-12-18 Martin Liska mli...@suse.cz PR ipa/63569 * gcc.dg/ipa/pr63569.c: New test. gcc/ChangeLog: 2014-12-18 Martin Liska mli...@suse.cz PR ipa/63569 * ipa-icf-gimple.c (func_checker::compare_operand): Add missing comparison for volatile flag. --- gcc/ipa-icf-gimple.c | 3 +++ gcc/testsuite/gcc.dg/ipa/pr63569.c | 32 2 files changed, 35 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr63569.c diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c index ec0290a..fa2c353 100644 --- a/gcc/ipa-icf-gimple.c +++ b/gcc/ipa-icf-gimple.c @@ -230,6 +230,9 @@ func_checker::compare_operand (tree t1, tree t2) tree tt1 = TREE_TYPE (t1); tree tt2 = TREE_TYPE (t2); + if (TREE_THIS_VOLATILE (t1) != TREE_THIS_VOLATILE (t2)) +return return_false_with_msg (different operand volatility); + if
Re: [PATCH 2/3] Extended if-conversion
SSA names required for the predicates are defined upward - and the complex CFG is squashed to a single basic-block, thus the defs will dominate the inserted code if you insert after labels just like for the other case. Or what am I missing? (flattening of the basic-blocks of course needs to happen in dominator order - but I guess that happens already?) I'd like the extended PHI handling to be enablable by a flag even for !force-vectorization - I've seen cases with 3 PHI args multiple times that would have been nice to vectorize. I suggest to add -ftree-loop-if-convert-aggressive for this. We can do this as followup, but please rename the local flag_force_vectorize flag to something less looking like a flag, like simply 'aggressive'. Otherwise patch 2 looks ok to me. Richard. ChangeLog: 2014-10-24 Yuri Rumyantsev ysrum...@gmail.com * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE instead of loop flag. (if_convertible_bb_p): Allow bb has more than 2 predecessors if FLAG_FORCE_VECTORIZE is true. (if_convertible_bb_p): Delete check that bb has at least one non-critical incoming edge. (phi_has_two_different_args): New function. (is_cond_scalar_reduction): Add argument EXTENDED to choose access to phi arguments. Invoke phi_has_two_different_args to get phi arguments if EXTENDED is true. Change check that block containing reduction statement candidate is predecessor of phi-block since phi may have more than two arguments. (convert_scalar_cond_reduction): Add argument BEFORE to insert statement before/after gsi point. (predicate_scalar_phi): Add argument false (which means non-extended predication) to call of is_cond_scalar_reduction. Add argument true (which correspondent to argument BEFORE) to call of convert_scalar_cond_reduction. (get_predicate_for_edge): New function. (predicate_arbitrary_scalar_phi): New function. (predicate_extended_scalar_phi): New function. (find_insertion_point): New function. (predicate_all_scalar_phis): Add two boolean variables EXTENDED and BEFORE. Initialize EXTENDED to true if BB containing phi has more than 2 predecessors or both incoming edges are critical. Invoke find_phi_replacement_condition and predicate_scalar_phi or find_insertion_point and predicate_extended_scalar_phi depending on EXTENDED value. (insert_gimplified_predicates): Add check that non-predicated block may have statements to insert. Insert predicate of BB just after label if FLAG_FORCE_VECTORIZE is true. (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which is copy of inner or outer loop field force_vectorize. patch.20141218 Description: Binary data
Pragma parsing (was: [PATCH] OpenACC for C++ front end)
Hi! On Thu, 13 Nov 2014 14:02:37 +0100, Jakub Jelinek ja...@redhat.com wrote: void init_pragma (void) { + if (flag_openacc) +{ + const int n_oacc_pragmas + = sizeof (oacc_pragmas) / sizeof (*oacc_pragmas); + int i; + + for (i = 0; i n_oacc_pragmas; ++i) + cpp_register_deferred_pragma (parse_in, acc, oacc_pragmas[i].name, + oacc_pragmas[i].id, true, true); +} + if (flag_openmp) { const int n_omp_pragmas = sizeof (omp_pragmas) / sizeof (*omp_pragmas); Is -fopenmp -fopenacc tested not to run out of number of supported pragmas by libcpp? We're running (some limited amount of) combined OpenACC/OpenMP compile testing, yes. -/* All clauses defined by OpenMP 2.5, 3.0, 3.1 and 4.0. +/* All clauses defined by OpenACC 2.0, and OpenMP 2.5, 3.0, 3.1, and 4.0. Used internally by both C and C++ parsers. */ typedef enum pragma_omp_clause { PRAGMA_OMP_CLAUSE_NONE = 0, PRAGMA_OMP_CLAUSE_ALIGNED, + PRAGMA_OMP_CLAUSE_ASYNC, PRAGMA_OMP_CLAUSE_COLLAPSE, + PRAGMA_OMP_CLAUSE_COPY, PRAGMA_OMP_CLAUSE_COPYIN, + PRAGMA_OMP_CLAUSE_COPYOUT, PRAGMA_OMP_CLAUSE_COPYPRIVATE, + PRAGMA_OMP_CLAUSE_CREATE, PRAGMA_OMP_CLAUSE_DEFAULT, + PRAGMA_OMP_CLAUSE_DELETE, PRAGMA_OMP_CLAUSE_DEPEND, PRAGMA_OMP_CLAUSE_DEVICE, + PRAGMA_OMP_CLAUSE_DEVICEPTR, PRAGMA_OMP_CLAUSE_DIST_SCHEDULE, PRAGMA_OMP_CLAUSE_FINAL, PRAGMA_OMP_CLAUSE_FIRSTPRIVATE, PRAGMA_OMP_CLAUSE_FOR, PRAGMA_OMP_CLAUSE_FROM, + PRAGMA_OMP_CLAUSE_HOST, PRAGMA_OMP_CLAUSE_IF, PRAGMA_OMP_CLAUSE_INBRANCH, PRAGMA_OMP_CLAUSE_LASTPRIVATE, @@ -90,16 +106,24 @@ typedef enum pragma_omp_clause { PRAGMA_OMP_CLAUSE_MERGEABLE, PRAGMA_OMP_CLAUSE_NOTINBRANCH, PRAGMA_OMP_CLAUSE_NOWAIT, + PRAGMA_OMP_CLAUSE_NUM_GANGS, PRAGMA_OMP_CLAUSE_NUM_TEAMS, PRAGMA_OMP_CLAUSE_NUM_THREADS, + PRAGMA_OMP_CLAUSE_NUM_WORKERS, PRAGMA_OMP_CLAUSE_ORDERED, PRAGMA_OMP_CLAUSE_PARALLEL, + PRAGMA_OMP_CLAUSE_PRESENT, + PRAGMA_OMP_CLAUSE_PRESENT_OR_COPY, + PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYIN, + PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYOUT, + PRAGMA_OMP_CLAUSE_PRESENT_OR_CREATE, PRAGMA_OMP_CLAUSE_PRIVATE, PRAGMA_OMP_CLAUSE_PROC_BIND, PRAGMA_OMP_CLAUSE_REDUCTION, PRAGMA_OMP_CLAUSE_SAFELEN, PRAGMA_OMP_CLAUSE_SCHEDULE, PRAGMA_OMP_CLAUSE_SECTIONS, + PRAGMA_OMP_CLAUSE_SELF, PRAGMA_OMP_CLAUSE_SHARED, PRAGMA_OMP_CLAUSE_SIMDLEN, PRAGMA_OMP_CLAUSE_TASKGROUP, @@ -107,6 +131,8 @@ typedef enum pragma_omp_clause { PRAGMA_OMP_CLAUSE_TO, PRAGMA_OMP_CLAUSE_UNIFORM, PRAGMA_OMP_CLAUSE_UNTIED, + PRAGMA_OMP_CLAUSE_VECTOR_LENGTH, + PRAGMA_OMP_CLAUSE_WAIT, Like for CILK, I'd strongly prefer if for the clauses that are specific to OpenACC only you'd use PRAGMA_OACC_CLAUSE_* instead, and put them after the PRAGMA_CILK_* enum values. If you want to have PRAGMA_OACC_CLAUSE_ aliases also for the clauses shared in between OpenMP and OpenACC, feel free to add aliases like Cilk+ has them. It is unfortunately lots of new clauses and we are getting close to the 64 clauses limit :( when they are used in bitmasks. I still don't like this very much, because we'll then get a mess of PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*. I understand, for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric representation of the reduction string -- and then, it doesn't make too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*. Yet, that's just the point that I'm making, so I'll defer if there's no consensus to be found here. Chung-Lin also had a suggestion to make; what do you think of this? --- c-family/c-common.h (revision 442892) +++ c-family/c-common.h (working copy) @@ -1076,138 +1076,17 @@ extern void pp_dir_change (cpp_reader *, const cha extern bool check_missing_format_attribute (tree, tree); /* In c-omp.c */ -#if HOST_BITS_PER_WIDE_INT = 64 -typedef unsigned HOST_WIDE_INT omp_clause_mask; -# define OMP_CLAUSE_MASK_1 ((omp_clause_mask) 1) -#else -struct omp_clause_mask +#include bitset +typedef std::bitset128 omp_clause_mask; +/* Provide '' before full transition to set() method. */ +static inline omp_clause_mask +operator (const omp_clause_mask a, const omp_clause_mask b) { - inline omp_clause_mask (); - inline omp_clause_mask (unsigned HOST_WIDE_INT l); - inline omp_clause_mask (unsigned HOST_WIDE_INT l, - unsigned HOST_WIDE_INT h); - inline omp_clause_mask operator = (omp_clause_mask); - inline omp_clause_mask operator |= (omp_clause_mask); - inline omp_clause_mask operator ~ () const; - inline omp_clause_mask operator (omp_clause_mask) const; - inline omp_clause_mask operator | (omp_clause_mask)
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 2:24 PM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, Dec 18, 2014 at 2:11 PM, H.J. Lu hongjiu...@intel.com wrote: The Linux kernel never passes floating point arguments around, vararg functions or not. Hence no vector registers are ever used when calling a vararg function. But gcc still dutifully emits an xor %eax,%eax before each and every call of a vararg function. Since no callee use that for anything, these instructions are redundant. This patch adds the -mskip-rax-setup option to skip setting up RAX register when SSE is disabled and there are no variable arguments passed in vector registers. Since RAX register is used to avoid unnecessarily saving vector registers on stack when passing variable arguments, the impacts of this option are callees may waste some stack space, misbehave or jump to a random location. GCC 4.4 or newer don't those issues, regardless the RAX register value since they don't check the RAX register value when SSE is disabled, regardless the RAX register value: https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html I used it on kernel 3.17.7: textdata bss dec hexfilename 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? How about skipping RAX setup unconditionally for !TARGET_SSE? Please see ix86_conditional_register_usage, where SSE registers are squashed for !TARGET_SSE, so it is not possible to use them even in the inline asm. ... when -ffreestanding is in effect, of course. Uros.
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 02:24:06PM +0100, Uros Bizjak wrote: It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? How about skipping RAX setup unconditionally for !TARGET_SSE? Please see ix86_conditional_register_usage, where SSE registers are squashed for !TARGET_SSE, so it is not possible to use them even in the inline asm. I'd say a problem is if a -mno-sse TU calls a vararg function (obviously it can't pass any float/double arguments) to a function in a TU compiled with -msse2 or higher where the stdarg pass can't figure out anything, say #include stdarg.h extern void bar (int, va_list); void foo (int x, ...) { va_list ap; va_start (ap, x); bar (x, ap); va_end (ap); } If foo is compiled with gcc 4.4 and earlier, it might crash when called from -mno-sse caller that would not xor %eax,%eax. If foo is compiled with gcc 4.5? and higher, then it might just randomly save all the xmm registers to stack (as the test is %al != 0, I think it will be more likely that it will save it unnecessarily than not). So I view H.J.'s new option as a user guarantee the callee will be also -mno-sse. Jakub
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 5:51 AM, Jakub Jelinek ja...@redhat.com wrote: On Thu, Dec 18, 2014 at 02:24:06PM +0100, Uros Bizjak wrote: It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? How about skipping RAX setup unconditionally for !TARGET_SSE? Please see ix86_conditional_register_usage, where SSE registers are squashed for !TARGET_SSE, so it is not possible to use them even in the inline asm. I'd say a problem is if a -mno-sse TU calls a vararg function (obviously it can't pass any float/double arguments) to a function in a TU compiled with -msse2 or higher where the stdarg pass can't figure out anything, say #include stdarg.h extern void bar (int, va_list); void foo (int x, ...) { va_list ap; va_start (ap, x); bar (x, ap); va_end (ap); } If foo is compiled with gcc 4.4 and earlier, it might crash when called from This was checked into GCC 4.4: commit d5d9458afdae9056a9624ae9c332dbcdc2f383be Author: jakub jakub@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Sep 2 19:49:41 2008 + * config/i386/i386.c (X86_64_VARARGS_SIZE): Removed. (setup_incoming_varargs_64): Assume cum != NULL. Set/check ix86_varargs_gpr_size and ix86_varargs_fpr_size. Use ix86_varargs_gpr_size instead of X86_64_REGPARM_MAX. Don't set ix86_save_varrargs_registers. (ix86_setup_incoming_varargs): Assume cum != NULL. (ix86_va_start): Check ix86_varargs_gpr_size and ix86_varargs_fpr_size instead of cfun-va_list_gpr_size and cfun-va_list_fpr_size, respectively. Subtract 8*X86_64_REGPARM_MAX from frame pointer if ix86_varargs_gpr_size == 0. (ix86_compute_frame_layout): Updated. * config/i386/i386.h (ix86_save_varrargs_registers): Removed. (ix86_varargs_gpr_size): Define. (ix86_varargs_fpr_size): Likewise. (machine_function): Remove save_varrargs_registers. Add varargs_gpr_size and varargs_fpr_size. * gcc.target/i386/amd64-abi-3.c: New test. * gcc.target/i386/amd64-abi-4.c: Likewise. * gcc.target/i386/amd64-abi-5.c: Likewise. * gcc.target/i386/amd64-abi-6.c: Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@139910 138bc75d-0d04-0410-961f-82ee72b054a4 -mno-sse caller that would not xor %eax,%eax. If foo is compiled with gcc 4.5? and higher, then it might just randomly save all the xmm registers to GCC 4.4 is fine. stack (as the test is %al != 0, I think it will be more likely that it will save it unnecessarily than not). So I view H.J.'s new option as a user guarantee the callee will be also -mno-sse. If foo may be compiled by GCC 4.3 or older, we can't skip setting up RAX in foo's callers. Such an option is useful. -- H.J.
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 2:49 PM, Uros Bizjak ubiz...@gmail.com wrote: The Linux kernel never passes floating point arguments around, vararg functions or not. Hence no vector registers are ever used when calling a vararg function. But gcc still dutifully emits an xor %eax,%eax before each and every call of a vararg function. Since no callee use that for anything, these instructions are redundant. This patch adds the -mskip-rax-setup option to skip setting up RAX register when SSE is disabled and there are no variable arguments passed in vector registers. Since RAX register is used to avoid unnecessarily saving vector registers on stack when passing variable arguments, the impacts of this option are callees may waste some stack space, misbehave or jump to a random location. GCC 4.4 or newer don't those issues, regardless the RAX register value since they don't check the RAX register value when SSE is disabled, regardless the RAX register value: https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html I used it on kernel 3.17.7: textdata bss dec hexfilename 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? How about skipping RAX setup unconditionally for !TARGET_SSE? Please see ix86_conditional_register_usage, where SSE registers are squashed for !TARGET_SSE, so it is not possible to use them even in the inline asm. ... when -ffreestanding is in effect, of course. Ops, this is not the unconditional default kernel compile flag. It is defined only for 32bit builds, where: # temporary until string.h is fixed KBUILD_CFLAGS += -ffreestanding Yes, it looks to me that new option is the way to go. Uros.
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 6:03 AM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, Dec 18, 2014 at 2:49 PM, Uros Bizjak ubiz...@gmail.com wrote: The Linux kernel never passes floating point arguments around, vararg functions or not. Hence no vector registers are ever used when calling a vararg function. But gcc still dutifully emits an xor %eax,%eax before each and every call of a vararg function. Since no callee use that for anything, these instructions are redundant. This patch adds the -mskip-rax-setup option to skip setting up RAX register when SSE is disabled and there are no variable arguments passed in vector registers. Since RAX register is used to avoid unnecessarily saving vector registers on stack when passing variable arguments, the impacts of this option are callees may waste some stack space, misbehave or jump to a random location. GCC 4.4 or newer don't those issues, regardless the RAX register value since they don't check the RAX register value when SSE is disabled, regardless the RAX register value: https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html I used it on kernel 3.17.7: textdata bss dec hexfilename 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? How about skipping RAX setup unconditionally for !TARGET_SSE? Please see ix86_conditional_register_usage, where SSE registers are squashed for !TARGET_SSE, so it is not possible to use them even in the inline asm. ... when -ffreestanding is in effect, of course. Ops, this is not the unconditional default kernel compile flag. It is defined only for 32bit builds, where: # temporary until string.h is fixed KBUILD_CFLAGS += -ffreestanding Yes, it looks to me that new option is the way to go. Is this an OK? Some really old gcc versions used an indirect jump based on the eax input and they didn't zero extend first. So with those compilers, you could actually jump to a random location. You can enable it only when you can compile everything with a newer GCC. Thanks. -- H.J.
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 3:09 PM, H.J. Lu hjl.to...@gmail.com wrote: The Linux kernel never passes floating point arguments around, vararg functions or not. Hence no vector registers are ever used when calling a vararg function. But gcc still dutifully emits an xor %eax,%eax before each and every call of a vararg function. Since no callee use that for anything, these instructions are redundant. This patch adds the -mskip-rax-setup option to skip setting up RAX register when SSE is disabled and there are no variable arguments passed in vector registers. Since RAX register is used to avoid unnecessarily saving vector registers on stack when passing variable arguments, the impacts of this option are callees may waste some stack space, misbehave or jump to a random location. GCC 4.4 or newer don't those issues, regardless the RAX register value since they don't check the RAX register value when SSE is disabled, regardless the RAX register value: https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html I used it on kernel 3.17.7: textdata bss dec hexfilename 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig It removed 14309 redundant xor %eax,%eax instructions and saved about 27KB. I am currently running the new kernel without any problem. OK for trunk? How about skipping RAX setup unconditionally for !TARGET_SSE? Please see ix86_conditional_register_usage, where SSE registers are squashed for !TARGET_SSE, so it is not possible to use them even in the inline asm. ... when -ffreestanding is in effect, of course. Ops, this is not the unconditional default kernel compile flag. It is defined only for 32bit builds, where: # temporary until string.h is fixed KBUILD_CFLAGS += -ffreestanding Yes, it looks to me that new option is the way to go. Is this an OK? In principle, I'm OK with the patch approach, but let's wait for eventual comments from Linux people. Some really old gcc versions used an indirect jump based on the eax input and they didn't zero extend first. So with those compilers, you could actually jump to a random location. You can enable it only when you can compile everything with a newer GCC. Uros.
Re: [C++ Patch] PR 60955
On 12/18/2014 06:17 AM, Paolo Carlini wrote: Sure. The below uses the c_inhibit_evaluation_warnings mechanism and passes testing. I wondered if in such cases we could alternately use the warning_sentinel mechanism (moved to cp-tree.h of course) ? That would make sense to me. Jason
Re: OpenACC middle end changes
On Thu, Dec 18, 2014 at 12:07:01PM +0100, Thomas Schwinge wrote: Many thanks for the review comments! The very most have been addresed, here are just a few comments. If you feel strongly/differently about any, I'll address those, too. So, with your latest change both compilers build: mkdir objmic; cd objmic ../configure --build=x86_64-intelmicemul-linux-gnu --host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap make -j16 make DESTDIR=`pwd`/../objinst install mkdir ../obj; cd ../obj ../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=x86_64-intelmicemul-linux-gnu=/usr/src/gomp-4.0/objmic --disable-bootstrap make -j16 But there are issues during make check. I first did: make -j16 -k check RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} gomp.exp goacc.exp goacc-gomp.exp' and that shows: Making a new config file... echo set tmpdir /usr/src/gomp-4.0/obj/gcc/testsuite ./site.tmp rm -rf testsuite/gcc-parallel rm -rf testsuite/g++-parallel rm -rf testsuite/gfortran-parallel rm -rf testsuite/objc-parallel mkdir: cannot create directory ‘testsuite’: File exists make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc' make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc' make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc' make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc' mkdir: cannot create directory ‘plugin’: File exists mkdir: cannot create directory ‘plugin’mkdir: : File existscannot create directory ‘plugin’ : File exists mkdir: cannot create directory ‘plugin’: File exists Makefile:3787: recipe for target 'check-parallel-gcc_1' failed make[1]: [check-parallel-gcc_1] Error 1 (ignored) Makefile:3787: recipe for target 'check-parallel-gcc_2' failed make[1]: [check-parallel-gcc_2] Error 1 (ignored) Makefile:3787: recipe for target 'check-parallel-gcc_3' failed make[1]: [check-parallel-gcc_3] Error 1 (ignored) Makefile:3787: recipe for target 'check-parallel-gcc_4' failed make[1]: [check-parallel-gcc_4] Error 1 (ignored) Clearly preexisting problem even on trunk, so not a show stopper for this. And in libgomp the testing fails completely: Making check in testsuite make[1]: Entering directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' make check-DEJAGNU make[2]: Entering directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' Making a new site.exp file... srcdir=`CDPATH=${ZSH_VERSION+.}: cd ../../../../libgomp/testsuite pwd`; export srcdir; \ EXPECT=expect; export EXPECT; \ runtest=runtest ; \ if /bin/sh -c $runtest --version /dev/null 21; then \ exit_status=0; l='libgomp'; for tool in $l; do \ if $runtest --tool $tool --srcdir $srcdir ; \ then :; else exit_status=1; fi; \ done; \ else echo WARNING: could not find \`runtest' 12; :;\ fi; \ exit $exit_status WARNING: Couldn't find the global config file. ERROR: tcl error sourcing libgomp-test-support.exp. can't read (target_alias): no such variable while executing set offload_additional_options -B/usr/src/gomp-4.0/objmic/libexec/gcc/$(target_alias)/$(gcc_version) -B/usr/src/gomp-4.0/objmic/bin (file libgomp-test-support.exp line 5) invoked from within source libgomp-test-support.exp (uplevel body line 1) invoked from within uplevel #0 source libgomp-test-support.exp invoked from within catch uplevel #0 source $file Makefile:277: recipe for target 'check-DEJAGNU' failed make[2]: *** [check-DEJAGNU] Error 1 make[2]: Leaving directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' Makefile:314: recipe for target 'check-am' failed make[1]: *** [check-am] Error 2 make[1]: Leaving directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' Makefile:856: recipe for target 'check-recursive' failed make: *** [check-recursive] Error 1 So clearly the *.exp files need to be taught where to look for libgomp-test-support.exp. Jakub
Re: Pragma parsing (was: [PATCH] OpenACC for C++ front end)
On Thu, Dec 18, 2014 at 02:48:32PM +0100, Thomas Schwinge wrote: Like for CILK, I'd strongly prefer if for the clauses that are specific to OpenACC only you'd use PRAGMA_OACC_CLAUSE_* instead, and put them after the PRAGMA_CILK_* enum values. If you want to have PRAGMA_OACC_CLAUSE_ aliases also for the clauses shared in between OpenMP and OpenACC, feel free to add aliases like Cilk+ has them. It is unfortunately lots of new clauses and we are getting close to the 64 clauses limit :( when they are used in bitmasks. I still don't like this very much, because we'll then get a mess of PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*. I understand, for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric representation of the reduction string -- and then, it doesn't make too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*. Yet, that's just the point that I'm making, so I'll defer if there's no consensus to be found here. The point is that we now have lots of clauses, and making it clear what from those clauses are Cilk+, what are OpenACC, what are OpenMP will help with code readability. Especially if some clauses diverge in their meaning between different standards. Chung-Lin also had a suggestion to make; what do you think of this? We don't need it yet, and I'm not very keen on making GCC internals use STL more than it does, HWI is certainly more efficient than bitset. So I'd defer any decision here until we have to. Jakub
Re: [PATCH, ARM] attribute target (thumb,arm) [6/6] - [7/7]
Hello Ramana, I don't know if you have started to look at it, but the attribute support fails after upgrading. This patch aims to catch up on the changes around the fipa_ra -masm-syntax-unified options since the initial posting. They were not tested/supported with the attribute, and of course generated new failures after an update, as they needs resetting depending on thumb mode. It also fixes various minor thinkos, one that prevented global flags (e.g -fschedule_insns) depending on thumb mode to be correctly reset on situation were -mthumb was passed on the command line and thumb/arm attribute was used alternatively. The other is around the -mflip-thumb option setting. Last, a minor fix in the test attr_arm-err.c that had false failures with conflicting -march. Aside, a funny thing: The ACLE doc (https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html#ARM-C-Language-Extensions-_0028ACLE_0029) already as a visionary description of the attribute. I might have missed something but this time the doc comes earlier than the implementation :-) without the historical background. Best Regards Christian 2014-12-14 Christian Bruel christian.br...@st.com * config/arm/arm.c (arm_option_override_internal): add opts_set param. Use it to set restrict_it. Handle flag_ipa_ra and inline_asm_unified. (arm_valid_target_attribute_tree): Add opts_set param. Initialize init_optimize. (init_optimize): New static variable. (thumb_flipper): Set. ( arm_valid_target_attribute_p): Rewrite. config/arm/arm-protos.h (arm_valid_target_attribute_tree): New opts_set param * config/arm/arm-c.c (arm_pragma_target_parse): Likewiese. * config/arm/arm.opt (inline_asm_unified): Save. 2014-12-14 Christian Bruel christian.br...@st.com * gcc.target/arm/attr_arm-err.c: Check conflicting -march options. diff '--exclude=*~' '--exclude=.svn' -ru a/gcc/gcc/config/arm/arm.c b/gcc/gcc/config/arm/arm.c --- a/gcc/gcc/config/arm/arm.c 2014-12-18 14:36:06.0 +0100 +++ b/gcc/gcc/config/arm/arm.c 2014-12-18 14:35:12.0 +0100 @@ -2625,9 +2625,16 @@ } } +/* True if -mflip-thumb should next add an attribute for the default + mode, false if it should next add an attribute for the opposite mode. */ +static GTY(()) bool thumb_flipper; + +static GTY(()) tree init_optimize; + /* Reset options between modes that the user has specified. */ static void -arm_option_override_internal (struct gcc_options *opts) +arm_option_override_internal (struct gcc_options *opts, + struct gcc_options *opts_set) { if (TREE_TARGET_THUMB (opts) !(insn_flags FL_THUMB)) { @@ -2646,13 +2653,13 @@ if (TREE_TARGET_THUMB (opts) TARGET_CALLEE_INTERWORKING) opts-x_target_flags |= MASK_INTERWORK; - if (restrict_default) + if (! opts_set-x_arm_restrict_it) opts-x_arm_restrict_it = arm_arch8; if (!TREE_TARGET_THUMB2 (opts)) opts-x_arm_restrict_it = 0; - if (TREE_TARGET_THUMB1 (opts) opts-x_flag_schedule_insns) + if (TREE_TARGET_THUMB1 (opts)) { /* Don't warn since it's on by default in -O2. */ opts-x_flag_schedule_insns = 0; @@ -2663,9 +2670,21 @@ if (optimize_function_for_size_p (cfun) TREE_TARGET_THUMB2 (opts)) opts-x_flag_shrink_wrap = false; + /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn + - epilogue_insns - does not accurately model the corresponding insns + emitted in the asm file. In particular, see the comment in thumb_exit + 'Find out how many of the (return) argument registers we can corrupt'. + As a consequence, the epilogue may clobber registers without fipa-ra + finding out about it. Therefore, disable fipa-ra in Thumb1 mode. + TODO: Accurately model clobbers for epilogue_insns and reenable + fipa-ra. */ + if (TREE_TARGET_THUMB1 (opts)) +opts-x_flag_ipa_ra = 0; + /* Thumb2 inline assembly code should always use unified syntax. This will apply to ARM and Thumb1 eventually. */ - opts-x_inline_asm_unified = TREE_TARGET_THUMB2 (opts); + if (TREE_TARGET_THUMB2 (opts)) + opts-x_inline_asm_unified = 1; } /* Fix up any incompatible options that the user has specified. */ @@ -3127,27 +3146,17 @@ if (target_slow_flash_data) arm_disable_literal_pool = true; - /* Override flags, but not the user's one. */ - restrict_default = (arm_restrict_it == 2); - /* Disable scheduling fusion by default if it's not armv7 processor or doesn't prefer ldrd/strd. */ if (flag_schedule_fusion == 2 (!arm_arch7 || !current_tune-prefer_ldrd_strd)) flag_schedule_fusion = 0; - /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn - - epilogue_insns - does not accurately model the corresponding insns - emitted in the asm file. In particular, see the comment in thumb_exit - 'Find out how many of the (return) argument registers we can corrupt'. - As a consequence, the epilogue may clobber
libgomp offloading testing (was: OpenACC middle end changes)
Hi Jakub! On Thu, 18 Dec 2014 15:20:42 +0100, Jakub Jelinek ja...@redhat.com wrote: So, with your latest change both compilers build: mkdir objmic; cd objmic ../configure --build=x86_64-intelmicemul-linux-gnu --host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap make -j16 make DESTDIR=`pwd`/../objinst install mkdir ../obj; cd ../obj ../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-offload-targets=x86_64-intelmicemul-linux-gnu=/usr/src/gomp-4.0/objmic --disable-bootstrap make -j16 Thanks; I'll look into reproducing such a build. And in libgomp the testing fails completely: What happens, in my understanding, is: Making check in testsuite make[1]: Entering directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' make check-DEJAGNU make[2]: Entering directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' Making a new site.exp file... srcdir=`CDPATH=${ZSH_VERSION+.}: cd ../../../../libgomp/testsuite pwd`; export srcdir; \ EXPECT=expect; export EXPECT; \ runtest=runtest ; \ if /bin/sh -c $runtest --version /dev/null 21; then \ exit_status=0; l='libgomp'; for tool in $l; do \ if $runtest --tool $tool --srcdir $srcdir ; \ then :; else exit_status=1; fi; \ done; \ else echo WARNING: could not find \`runtest' 12; :;\ fi; \ exit $exit_status WARNING: Couldn't find the global config file. ERROR: tcl error sourcing libgomp-test-support.exp. can't read (target_alias): no such variable The variable $(target_alias) is not available when in the line: while executing set offload_additional_options -B/usr/src/gomp-4.0/objmic/libexec/gcc/$(target_alias)/$(gcc_version) -B/usr/src/gomp-4.0/objmic/bin ... in the file: (file libgomp-test-support.exp line 5) invoked from within source libgomp-test-support.exp ... it is being parsed. Should target_alias and gcc_version be instantiated (AC_SUBST) by Autoconf already, when creating the libgomp-test-support.exp file from libgomp/testsuite/libgomp-test-support.exp.in? Or, should those be written (in libgomp/plugin/configfrag.ac) in TCL syntax, and evaluated only once libgomp-test-support.exp is sourced? target_alias is being provided in site.exp (as generated by libgomp/testsuite/Makefile), but gcc_version is not. (uplevel body line 1) invoked from within uplevel #0 source libgomp-test-support.exp invoked from within catch uplevel #0 source $file Makefile:277: recipe for target 'check-DEJAGNU' failed make[2]: *** [check-DEJAGNU] Error 1 make[2]: Leaving directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' Makefile:314: recipe for target 'check-am' failed make[1]: *** [check-am] Error 2 make[1]: Leaving directory '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite' Makefile:856: recipe for target 'check-recursive' failed make: *** [check-recursive] Error 1 So clearly the *.exp files need to be taught where to look for libgomp-test-support.exp. That's not the problem, if I'm understanding correctly. Grüße, Thomas pgpQUlVKSWcfG.pgp Description: PGP signature
[patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.
Hi, this patch adds handling of aliases within templates and tries to resolve specialization for them. ChangeLog 2014-12-18 Kai Tietz kti...@redhat.com PR c++/61198 * pt.c (retrieve_specialization): Handle using. Tested on x86_64-w64-mingw32. Ok for apply? Regards, Kai ChangeLog testsuite 2014-12-18 Kai Tietz kti...@redhat.com PR c++/61198 * g++.dg/template/using29.C: New file. // { dg-do compile } templateint herp, typename derp_t struct broken { templatetypename target_t using rebind = brokenherp, target_ }; templatetypename derp_t struct broken2, derp_t { templatetypename target_t using rebind = broken2, target_t; }; int main(int argc, char **argv) { broken2, float::rebinddouble u; return 0; } Index: pt.c === --- pt.c(Revision 218832) +++ pt.c(Arbeitskopie) @@ -1033,6 +1033,8 @@ optimize_specialization_lookup_p (tree tmpl) static tree retrieve_specialization (tree tmpl, tree args, hashval_t hash) { + tree tmpl_parms; + if (tmpl == NULL_TREE) return NULL_TREE; @@ -1044,10 +1046,20 @@ retrieve_specialization (tree tmpl, tree args, has /* There should be as many levels of arguments as there are levels of parameters. */ - gcc_assert (TMPL_ARGS_DEPTH (args) - == (TREE_CODE (tmpl) == TEMPLATE_DECL - ? TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (tmpl)) - : template_class_depth (DECL_CONTEXT (tmpl; + if (TREE_CODE (tmpl) == TEMPLATE_DECL) +{ + tmpl_parms = DECL_TEMPLATE_PARMS (tmpl); + if (TMPL_ARGS_DEPTH (args) TMPL_PARMS_DEPTH (tmpl_parms)) +args = get_innermost_template_args +(args, TMPL_PARMS_DEPTH (tmpl_parms)); + gcc_assert (TMPL_ARGS_DEPTH (args) + == TMPL_PARMS_DEPTH (tmpl_parms)); +} + else +{ + tmpl_parms = DECL_CONTEXT (tmpl); + gcc_assert (TMPL_ARGS_DEPTH (args) == template_class_depth (tmpl_parms)); +} if (optimize_specialization_lookup_p (tmpl)) {
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
Hi! On Thu, 30 Oct 2014 14:40:01 +0300, Ilya Verbin iver...@gmail.com wrote: This patch allows to run non-fallback 'make check-target-libgomp'. It passes to the host compiler additional -B options with the paths to the offload compilers, since non-installed host compiler doesn't know where to find mkoffload tools. Also in case of intelmic offload targets it appends paths to liboffloadmic lib. --- a/libgomp/configure.ac +++ b/libgomp/configure.ac @@ -280,9 +280,13 @@ else multilib_arg= fi +# Get accel target and path to install tree of accel compiler +offload_additional_options= +offload_additional_lib_paths= offload_targets= if test x$enable_offload_targets != x; then for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do +tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'` tgt=`echo $tgt | sed 's/=.*//'` case $tgt in *-intelmic-* | *-intelmicemul-*) @@ -295,10 +299,20 @@ if test x$enable_offload_targets != x; then else offload_targets=$offload_targets,$tgt_name fi +if test x$tgt_dir != x; then + offload_additional_options=$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin + offload_additional_lib_paths=$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib +else + offload_additional_options=$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir) + offload_additional_lib_paths=$offload_additional_lib_paths:$toolexeclibdir +fi done fi Hmm, maybe I'm seeing a problem where there isn't one, but in case I'm not: how will this work if there is more than one offloading compiler configured? Won't you get conflicting -B paths added to offload_additional_options in this case? --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -169,6 +186,10 @@ proc libgomp_init { args } { # Disable color diagnostics lappend ALWAYS_CFLAGS additional_flags=-fdiagnostics-color=never + +# Required to support non-fallback testing of '#pragma omp target'. +# Help GCC to find target mkoffload. +lappend ALWAYS_CFLAGS additional_flags=${offload_additional_options} } What we're doing in OpenACC offloading testing (gomp-4_0-branch), is in the libgomp.oacc-c/c.exp (etc.) file cycle through all the available offloading devices, and then in there -- I think -- such options should be set, that are specific to one particular offloading device/compiler? Grüße, Thomas signature.asc Description: PGP signature
Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Ping. Thanks, Kyrill On 11/12/14 15:06, Kyrill Tkachov wrote: Ping. https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00340.html Thanks, Kyrill On 04/12/14 09:19, Kyrill Tkachov wrote: On 02/12/14 22:58, Ramana Radhakrishnan wrote: On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? if (current_tune-fuseable_ops ARM_FUSE_MOVW_MOVT) +{ + /* We are trying to fuse + movw imm / movt imm + instructions as a group that gets scheduled together. */ + A comment here about the insn structure would be useful. Done. It's similar to the aarch64 adrp+add case. It does make it easier to read, thanks. 2014-12-04 Kyrylo Tkachov kyrylo.tkac...@arm.com\ * config/arm/arm-protos.h (tune_params): Add fuseable_ops field. * config/arm/arm.c (arm_macro_fusion_p): New function. (arm_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (ARM_FUSE_NOTHING): Likewise. (ARM_FUSE_MOVW_MOVT): Likewise. (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune, arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune, arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune arm_cortex_a5_tune): Specify fuseable_ops value. + set_dest = SET_DEST (curr_set); + if (GET_CODE (set_dest) == ZERO_EXTRACT) +{ + if (CONST_INT_P (SET_SRC (curr_set)) + CONST_INT_P (SET_SRC (prev_set)) + REG_P (XEXP (set_dest, 0)) + REG_P (SET_DEST (prev_set)) + REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set))) +return true; +} + else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM +REG_P (SET_DEST (curr_set)) +REG_P (SET_DEST (prev_set)) +GET_CODE (SET_SRC (prev_set)) == HIGH +REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set))) +{ + return true; +} Can we add a fast path exit to be if (GET_MODE (set_dest) != SImode) return false; Done, but if/when we extend the function to handle more fusion cases it will need to be refactored, since we will want to just bail out of this MOVW+MOVT case rather than the whole function. I did think whether we wanted to use reg_overlap_mentioned_p as that may simplify the logic a bit but that's overkill here as we still want to restrict it to the cases above. Otherwise OK. Here's the updated patch. I've tested on arm-none-eabi and made sure that the fusion still happens on the benchmarks I looked at. Ok? Thanks, Kyrill Ramana +} + return false; Thanks, Kyrill 2014-11-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm-protos.h (tune_params): Add fuseable_ops field. * config/arm/arm.c (arm_macro_fusion_p): New function. (arm_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (ARM_FUSE_NOTHING): Likewise. (ARM_FUSE_MOVW_MOVT): Likewise. (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune, arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune, arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune arm_cortex_a5_tune): Specify fuseable_ops value.
RE: [PATCH] Disable -fuse-caller-save when -pg is active
Patch has been tested with DejaGnu gcc test suite on mips32r2 cross compiler and bootstrapped and tested on x86_64 native compiler. Radovan From: Jeff Law [l...@redhat.com] Sent: Monday, November 17, 2014 1:18 PM To: Radovan Obradovic; gcc-patches@gcc.gnu.org Cc: Petar Jovanovic Subject: Re: [PATCH] Disable -fuse-caller-save when -pg is active On 11/14/14 10:10, Radovan Obradovic wrote: Thank you for the quick reply. Please repost after updating to test HAVE_prologue and HAVE_epilogue and adding a testcase. I have managed to reproduce the problem on the small test case on mips32, but the test is architecture independent and should probably fail on many other ports without this patch. The optimization is also disabled if macros HAVE_prologue and HAVE_epilogue are not defined or have false value. Thanks. This looks good. I forgot to ask in my prior message, has this patch been bootstrapped and regression tested? If so, on what platform? Jeff
Re: OpenACC middle end changes
Hi Jakub! On Thu, 13 Nov 2014 19:09:49 +0100, Jakub Jelinek ja...@redhat.com wrote: --- gcc/builtins.c +++ gcc/builtins.c +static rtx +expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED) +{ + if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE)) +return NULL_RTX; + + tree arg, v1, v2, ret; + location_t loc; + + arg = CALL_EXPR_ARG (exp, 0); + arg = builtin_save_expr (arg); + loc = EXPR_LOCATION (exp); + + /* Build: (arg == v1 || arg == v2) ? 1 : 0. */ + +#ifdef ACCEL_COMPILER + v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_not_host */ 3); + v2 = build_int_cst (TREE_TYPE (arg), ACCEL_COMPILER_acc_device); +#else + v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_none */ 0); + v2 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_host */ 2); +#endif + + v1 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v1); + v2 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v2); + + /* Can't use TRUTH_ORIF_EXPR, as that is not supported by + expand_expr_real*. */ + ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, v1, v1, v2); + ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, +ret, integer_one_node, integer_zero_node); + + return expand_normal (ret); If you can't fold it late (which is indeed a problem for -O0), then I'd suggest to implement this more RTL-ish. So, avoid the builtin_save_expr, instead rtx op = expand_normal (arg); Don't build v1/v2 as trees (and, please fix the TODOs), but rtxes, just rtx v1 = GEN_INT (...); rtx v2 = GEN_INT (...); machine_mode mode = TYPE_MODE (TREE_TYPE (arg)); rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node)); emit_move_insn (ret, const0_rtx); rtx_code_label *done_label = gen_label_rtx (); emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode, false, done_label, PROB_EVEN); emit_move_insn (ret, const1_rtx); emit_label (done_label); return ret; or similar. ;-) Yes, similar, as I've now found; committed to gomp-4_0-branch in r218869: commit cb37a039eb7a7375d074bc092457349312c5a2e2 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Thu Dec 18 16:07:23 2014 + OpenACC acc_on_device: Fix logic error introduced in an earlier change. ... but which didn't show up in testing until after a libgomp rebuild, because of the caching of the acc_on_device builtin that is being done in libgomp/oacc-init.c:acc_on_device. gcc/ * builtins.c (expand_builtin_acc_on_device): Fix logic error. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218869 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 2 ++ gcc/builtins.c | 8 2 files changed, 6 insertions(+), 4 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index a744ebf..a21fd92 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,7 @@ 2014-12-18 Thomas Schwinge tho...@codesourcery.com + * builtins.c (expand_builtin_acc_on_device): Fix logic error. + * config/i386/intelmic-offload.h: New file. * config/nvptx/offload.h: Likewise. * config.gcc *-intelmic-*, *-intelmicemul-*, nvptx-*: Point to diff --git gcc/builtins.c gcc/builtins.c index 33025a5..6891229 100644 --- gcc/builtins.c +++ gcc/builtins.c @@ -5909,13 +5909,13 @@ expand_builtin_acc_on_device (tree exp, rtx target) machine_mode target_mode = TYPE_MODE (integer_type_node); if (!REG_P (target) || GET_MODE (target) != target_mode) target = gen_reg_rtx (target_mode); - emit_move_insn (target, const0_rtx); + emit_move_insn (target, const1_rtx); rtx_code_label *done_label = gen_label_rtx (); - do_compare_rtx_and_jump (v, v1, NE, false, v_mode, NULL_RTX, + do_compare_rtx_and_jump (v, v1, EQ, false, v_mode, NULL_RTX, NULL_RTX, done_label, PROB_EVEN); - do_compare_rtx_and_jump (v, v2, NE, false, v_mode, NULL_RTX, + do_compare_rtx_and_jump (v, v2, EQ, false, v_mode, NULL_RTX, NULL_RTX, done_label, PROB_EVEN); - emit_move_insn (target, const1_rtx); + emit_move_insn (target, const0_rtx); emit_label (done_label); return target; Grüße, Thomas pgp6P0PultR2w.pgp Description: PGP signature
Re: [C++ Patch] PR 60955
Hi, On 12/18/2014 03:20 PM, Jason Merrill wrote: On 12/18/2014 06:17 AM, Paolo Carlini wrote: Sure. The below uses the c_inhibit_evaluation_warnings mechanism and passes testing. I wondered if in such cases we could alternately use the warning_sentinel mechanism (moved to cp-tree.h of course) ? That would make sense to me. Good. Then I'm finishing testing the below. Thanks, Paolo. Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 218857) +++ cp/cp-tree.h(working copy) @@ -1149,6 +1149,18 @@ struct processing_template_decl_sentinel } }; +/* RAII sentinel to disable certain warnings during template substitution + and elsewhere. */ + +struct warning_sentinel +{ + int flag; + int val; + warning_sentinel(int flag, bool suppress=true) +: flag(flag), val(flag) { if (suppress) flag = 0; } + ~warning_sentinel() { flag = val; } +}; + /* The cached class binding level, from the most recently exited class, or NULL if none. */ Index: cp/pt.c === --- cp/pt.c (revision 218857) +++ cp/pt.c (working copy) @@ -14438,16 +14438,6 @@ tsubst_non_call_postfix_expression (tree t, tree a return t; } -/* Sentinel to disable certain warnings during template substitution. */ - -struct warning_sentinel { - int flag; - int val; - warning_sentinel(int flag, bool suppress=true) -: flag(flag), val(flag) { if (suppress) flag = 0; } - ~warning_sentinel() { flag = val; } -}; - /* Like tsubst but deals with expressions and performs semantic analysis. FUNCTION_P is true if T is the F in F (ARGS). */ Index: cp/semantics.c === --- cp/semantics.c (revision 218857) +++ cp/semantics.c (working copy) @@ -1660,6 +1660,7 @@ force_paren_expr (tree expr) tree type = unlowered_expr_type (expr); bool rval = !!(kind clk_rvalueref); type = cp_build_reference_type (type, rval); + warning_sentinel s (extra_warnings); expr = build_static_cast (type, expr, tf_error); if (expr != error_mark_node) REF_PARENTHESIZED_P (expr) = true; Index: testsuite/g++.dg/warn/register-parm-1.C === --- testsuite/g++.dg/warn/register-parm-1.C (revision 0) +++ testsuite/g++.dg/warn/register-parm-1.C (working copy) @@ -0,0 +1,9 @@ +// PR c++/60955 +// { dg-options -Wextra } + +unsigned int erroneous_warning(register int a) { +if ((a) 0xff) return 1; else return 0; +} +unsigned int no_erroneous_warning(register int a) { +if (a 0xff) return 1; else return 0; +}
Re: [PATCH 2/2] [C++] pr31397 - implement -Wsuggest-override
On Wed, Dec 17, 2014 at 04:21:31PM -0500, Jason Merrill wrote: On 11/27/2014 01:28 AM, tsaund...@mozilla.com wrote: + if (warn_override DECL_VIRTUAL_P (decl) !DECL_OVERRIDE_P (decl) Why check DECL_VIRTUAL_P here? I think it was to avoid warning when decl is hiding a non virtual method on the parent class, but I guess that won't happen anyway? so I'll remove it. Trev +warning_at(DECL_SOURCE_LOCATION (decl), OPT_Wsuggest_override, Space before the (. Jason
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
On 17/12/14 15:54, Richard Biener wrote: On Mon, Dec 15, 2014 at 4:29 PM, Jiong Wang jiong.w...@arm.com wrote: On 15/12/14 15:28, Jiong Wang wrote: On 04/12/14 19:32, Jiong Wang wrote: On 04/12/14 11:07, Richard Biener wrote: On Thu, Dec 4, 2014 at 12:07 PM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang jiong.w...@arm.com wrote: which means re-associate the constant imm with the virtual frame pointer. transform RA - fixed_reg + RC RD - MEM (RA + const_offset) into: RA - fixed_reg + const_offset RD - MEM (RA + RC) then RA - fixed_reg + const_offset is actually loop invariant, so the later RTL GCSE PRE pass could catch it and do the hoisting, and thus ameliorate what tree level ivopts could not sort out. There is a LIM pass after gimple ivopts - if the invariantness is already visible there why not handle it there similar to the special-cases in rewrite_bittest and rewrite_reciprocal? maybe, needs further check. And of course similar tricks could be applied on the RTL level to RTL invariant motion? Thanks. I just checked the code, yes, loop invariant motion pass is the natural place to integrate such multi-insns invariant analysis trick. those code could be integrated into loop-invariant.c cleanly, but then I found although the invariant detected successfully but it's not moved out of loop because of cost issue, and looks like the patch committed to fix PR33928 is trying to prevent such cheap address be hoisted to reduce register pressure. 805 /* ??? Try to determine cheapness of address computation. Unfortunately 806 the address cost is only a relative measure, we can't really compare 807 it with any absolute number, but only with other address costs. 808 But here we don't have any other addresses, so compare with a magic 809 number anyway. It has to be large enough to not regress PR33928 810 (by avoiding to move reg+8,reg+16,reg+24 invariants), but small 811 enough to not regress 410.bwaves either (by still moving reg+reg 812 invariants). 813 See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html . */ 814 inv-cheap_address = address_cost (SET_SRC (set), word_mode, 815 ADDR_SPACE_GENERIC, speed) 3; I think that maybe necessary for x86 which is short of register, while for RISC, it may not be that necessary, especially the whole register pressure is not big. currently, what I could think of is for this transformation below, we should increase the costs: A == RA - virtual_stack_var + RC RD - MEM (RA + const_offset) into: B == RA - virtual_stack_var + const_offset --B RD - MEM (RA + RC) because the cost is not that cheap, if there is not re-assocation of virtual_stack_var with const_offset, then lra elimination will create another instruction to hold the elimination result, so format A will actually be RT - real_stack_pointer + elimination_offset RA - RT + RC RD - MEM (RA + const_offset) so, the re-assocation and later hoisting of invariant B could actually save two instructions in the loop, this is why there are 15% perf gap for bzip2 under some situation. updated patch. moved this instruction shuffling trick to rtl loop invariant pass. as described above, this patch tries to transform A to B, so that after the transformation: * RA - virtual_stack_var + const_offset could be hoisted out of the loop * easy the work of lra elimination as virtual_stack_var is associated with const_offset that the elimination offset could be combined with const_offset automatically. current rtl loop invariant pass treat reg - reg + off as cheap address, while although reg - virtual_stack_var + offset fall into the same format, but it's not that cheap as we could also save one lra elimination instruction. so this patch will mark reg - virtual_stack_var + offset transformed from A to be expensive, so that it could be hoisted later. after patch, pr62173 is fixed on powerpc64, while *still not on aarch64*. because there are one glitch in aarch64_legitimize_address which cause unnecessary complex instructions sequences generated when legitimize some addresses. and if we fix that, we will get cheaper address for those cases which is generally good, and the cheaper address will cause tree level IVOPT do more IVOPT optimization which is generally good also, but from the speck2k result, there are actually around 1% code size regression on two cases, the reason is for target support post-index address, doing IVOPT may not always be the best choice because we lost the advantage of using post-index addressing. on aarch64, for the following testcase, the ivopted version is complexer than not ivopted version. while (oargc--) *(nargv++)
Re: [PATCH] X86-64: Add -mskip-rax-setup
On 12/18/2014 06:12 AM, Uros Bizjak wrote: # temporary until string.h is fixed KBUILD_CFLAGS += -ffreestanding Yes, it looks to me that new option is the way to go. Is this an OK? In principle, I'm OK with the patch approach, but let's wait for eventual comments from Linux people. Acked-by: H. Peter Anvin h...@linux.intel.com H.J. already coordinated with us; we are more than happy with this approach. Thank you! -hpa
Aproved patches
Please, can somebody help me? Where do I find GCC approved/accepted patch? Is there some patch directory in repository, if is please where? Thanks and best regards, Ricardo
Re: [PATCH][ARM][doc] Remove mention of Advanced RISC Machines
Ping. https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00947.html Thanks, Kyrill On 10/12/14 16:50, Kyrill Tkachov wrote: Hi all, The company ARM is not Advanced RISC Machines anymore and anyway the ARM architecture is distinct from the company. This patch adjusts the documentation accordingly. Ok? Thanks, Kyrill 2014-12-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * doc/invoke.texi (ARM options): Remove mention of Advanced RISC Machines.
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 9:08 AM, H. Peter Anvin h...@zytor.com wrote: On 12/18/2014 06:12 AM, Uros Bizjak wrote: # temporary until string.h is fixed KBUILD_CFLAGS += -ffreestanding Yes, it looks to me that new option is the way to go. Is this an OK? In principle, I'm OK with the patch approach, but let's wait for eventual comments from Linux people. Acked-by: H. Peter Anvin h...@linux.intel.com H.J. already coordinated with us; we are more than happy with this approach. Thank you! I am checking it in now. Thanks. -- H.J.
Re: [C++ Patch] PR 60955
On 12/18/2014 11:31 AM, Paolo Carlini wrote: + warning_sentinel s (extra_warnings); Let's add a comment about which warning we're avoiding here. OK with that change. Jason
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 6:08 PM, H. Peter Anvin h...@zytor.com wrote: On 12/18/2014 06:12 AM, Uros Bizjak wrote: # temporary until string.h is fixed KBUILD_CFLAGS += -ffreestanding Yes, it looks to me that new option is the way to go. Is this an OK? In principle, I'm OK with the patch approach, but let's wait for eventual comments from Linux people. Acked-by: H. Peter Anvin h...@linux.intel.com H.J. already coordinated with us; we are more than happy with this approach. Great, I'm glad that this specialized option has uses. Patch is OK. Thanks, Uros.
Re: [patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.
On 12/18/2014 10:10 AM, Kai Tietz wrote: + if (TMPL_ARGS_DEPTH (args) TMPL_PARMS_DEPTH (tmpl_parms)) +args = get_innermost_template_args +(args, TMPL_PARMS_DEPTH (tmpl_parms)); It seems unlikely to be correct to arbitrarily strip off outer template args. Where did the mismatch come from? Jason
Re: Aproved patches
On Thu, Dec 18, 2014 at 03:13:32PM -0200, Ricardo Sardano Guanciale wrote: Please, can somebody help me? Where do I find GCC approved/accepted patch? Is there some patch directory in repository, if is please where? Approved patches are usually just committed to the SVN repository, there is no extra repository for patches. Jakub
Re: [PATCH][ARM][doc] Remove mention of Advanced RISC Machines
On Thu, Dec 18, 2014 at 05:18:18PM +, Kyrill Tkachov wrote: Ping. https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00947.html This patch almost certainly falls under the obvious rule: will the person who objects to my work the most be able to find a fault with my fix? ARM hasn't stood for Advanced RISC Machines since 1998, so I can't see even your most fervent nemesis objecting to this one! Cheers, James Thanks, Kyrill On 10/12/14 16:50, Kyrill Tkachov wrote: Hi all, The company ARM is not Advanced RISC Machines anymore and anyway the ARM architecture is distinct from the company. This patch adjusts the documentation accordingly. Ok? Thanks, Kyrill 2014-12-10 Kyrylo Tkachov kyrylo.tkac...@arm.com * doc/invoke.texi (ARM options): Remove mention of Advanced RISC Machines.
Re: [PATCH] IPA ICF: refactoring + fix for PR ipa/63569
On 12/17/2014 04:23 PM, Richard Biener wrote: On Wed, Dec 17, 2014 at 12:17 PM, Martin Liška mli...@suse.cz wrote: On 12/11/2014 01:37 PM, Richard Biener wrote: On Wed, Dec 10, 2014 at 1:18 PM, Martin Liška mli...@suse.cz wrote: Hello. As suggested by Richard, I split compare_operand functions to various functions related to a specific comparison. Apart from that I added fast check for volatility flag that caused miscompilation mentioned in PR63569. Patch can bootstrap on x86_64-linux-pc without any regression seen and I was able to build Firefox with LTO. Ready for trunk? Hmm, I don't think the dispatch to compare_memory_operand is at the correct place. It should be called from places where currently compare_operand is called and it should recurse to compare_operand. That is, it is more high-level. Can you please fix the volatile issue separately? It's also not necessary to do that check on every operand but just on memory operands. Hello Richard. I hope the following patch is the finally the right approach we want to do ;) In the first patch, I did just refactoring for high-level memory operand comparison and the second of fixes problem connected to missing volatile attribute comparison. Patch was retested and can bootstrap. What do you think about it? +bool +func_checker::compare_cst_or_decl (tree t1, tree t2) +{ ... +case INTEGER_CST: + { + ret = compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2)) + wi::to_offset (t1) == wi::to_offset (t2); + + return return_with_debug (ret); + } +case COMPLEX_CST: +case VECTOR_CST: +case STRING_CST: +case REAL_CST: + { + ret = operand_equal_p (t1, t2, OEP_ONLY_CONST); + return return_with_debug (ret); why does the type matter for INTEGER_CSTs but not for other constants? Also comparing INTEGER_CSTs via to_offset is certainly wrong. Please use tree_int_cst_equal_p (t1, t2) instead, or operand_equal_p which would end up calling tree_int_cst_equal_p. That said, I'd have commonized all _CST nodes by ret = compatible_types_p () operand_equal_p (...); +case CONST_DECL: +case BIT_FIELD_REF: + { I'm quite sure BIT_FIELD_REF is spurious here. Now to the meat ... + tree load1 = get_base_loadstore (t1, false); + tree load2 = get_base_loadstore (t2, false); + + if (load1 load2 !compare_memory_operand (t1, t2)) + return return_false_with_msg (memory operands are different); + else if ((load1 !load2) || (!load1 load2)) + return return_false_with_msg (); + and the similar code for assigns. The way you introduce the unpack_handled_component flag to get_base_loadstore makes it treat x.field as non-memory operand. That's wrong. I wonder why you think you needed that. Why does tree load1= get_base_loadstore (t1); not work? Btw, I'd have avoided get_base_loadstore and simply did ao_ref_init (r1, t1); ao_ref_init (r2, t2); tree base1 = ao_ref_base (r1); tree base2 = ao_ref_base (r2); if ((DECL_P (base1) || (TREE_CODE (base1) == MEM_REF || TREE_CODE (base1) == TARGET_MEM_REF)) (DECL_P (base2) || (...)) ... or rather put this into compare_memory_operand and call that on all operands that may be memory (TREE_CODE () != SSA_NAME !is_gimple_min_invariant ()). I also miss where you handle memory operands on the LHS for calls and for assignments. The compare_ssa_name refactoring part is ok to install. Can you please fix the volatile bug before the refactoring as it looks like we're going to iterate some more here unless I can find the time to give it a quick try myself. Thanks, Richard. Hi. There's second part of the patch set. Thanks, Martin Thank you, Martin Thanks, Richard. Thanks, Martin From 288fb5fa6c78cf5a323c7667b180aed2b4d7e346 Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Thu, 18 Dec 2014 14:22:51 +0100 Subject: [PATCH 2/2] IPA ICF: compare_operand is split to multiple functions. gcc/ChangeLog 2014-12-12 Martin Liska mli...@suse.cz * ipa-icf-gimple.c (func_checker::compare_ssa_name): Enhance SSA name comparison. (func_checker::compare_memory_operand): New function. (func_checker::compare_operand): Split case to newly added functions. (func_checker::compare_cst_or_decl): New function. (func_checker::compare_gimple_call): Identify memory operands. (func_checker::compare_gimple_assign): Likewise. * ipa-icf-gimple.h: New function. --- gcc/ipa-icf-gimple.c | 243 --- gcc/ipa-icf-gimple.h | 9 +- 2 files changed, 142 insertions(+), 110 deletions(-) diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c index fa2c353..2530a19 100644 --- a/gcc/ipa-icf-gimple.c +++ b/gcc/ipa-icf-gimple.c @@ -110,6 +110,9 @@ func_checker::~func_checker () bool func_checker::compare_ssa_name (tree t1, tree t2) { + gcc_assert (TREE_CODE (t1) == SSA_NAME); + gcc_assert (TREE_CODE (t2) == SSA_NAME); +
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
Hi, 2014-12-18 16:27 GMT+01:00 Thomas Schwinge tho...@codesourcery.com: Hi! On Thu, 30 Oct 2014 14:40:01 +0300, Ilya Verbin iver...@gmail.com wrote: This patch allows to run non-fallback 'make check-target-libgomp'. It passes to the host compiler additional -B options with the paths to the offload compilers, since non-installed host compiler doesn't know where to find mkoffload tools. Also in case of intelmic offload targets it appends paths to liboffloadmic lib. --- a/libgomp/configure.ac +++ b/libgomp/configure.ac @@ -280,9 +280,13 @@ else multilib_arg= fi +# Get accel target and path to install tree of accel compiler +offload_additional_options= +offload_additional_lib_paths= offload_targets= if test x$enable_offload_targets != x; then for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do +tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'` tgt=`echo $tgt | sed 's/=.*//'` case $tgt in *-intelmic-* | *-intelmicemul-*) @@ -295,10 +299,20 @@ if test x$enable_offload_targets != x; then else offload_targets=$offload_targets,$tgt_name fi +if test x$tgt_dir != x; then + offload_additional_options=$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin + offload_additional_lib_paths=$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib +else + offload_additional_options=$offload_additional_options -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir) + offload_additional_lib_paths=$offload_additional_lib_paths:$toolexeclibdir +fi done fi Hmm, maybe I'm seeing a problem where there isn't one, but in case I'm not: how will this work if there is more than one offloading compiler configured? Won't you get conflicting -B paths added to offload_additional_options in this case? --- a/libgomp/testsuite/lib/libgomp.exp +++ b/libgomp/testsuite/lib/libgomp.exp @@ -169,6 +186,10 @@ proc libgomp_init { args } { # Disable color diagnostics lappend ALWAYS_CFLAGS additional_flags=-fdiagnostics-color=never + +# Required to support non-fallback testing of '#pragma omp target'. +# Help GCC to find target mkoffload. +lappend ALWAYS_CFLAGS additional_flags=${offload_additional_options} } What we're doing in OpenACC offloading testing (gomp-4_0-branch), is in the libgomp.oacc-c/c.exp (etc.) file cycle through all the available offloading devices, and then in there -- I think -- such options should be set, that are specific to one particular offloading device/compiler? Grüße, Thomas I have not tested a compiler, configured to support 2 different offloading targets, so there might be some corner cases. In this place I don't see any problems, at least for the case with installed offloading compilers. One -B allows to find mkoffload in lto-wrapper:compile_offload_image. This function tries to open all paths + '/accel/target_name/mkoffload' suffix. So, there should be no conflicts. Another -B allows mkoffload to find target driver. It tries to open 'host_name-accel-target_name-gcc', so, there also should be no conflicts. However, I still did not tried to enable 'make check' with non-installed offloading compilers. Probably such target specific paths can help there. -- Ilya
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 9:23 AM, Uros Bizjak ubiz...@gmail.com wrote: On Thu, Dec 18, 2014 at 6:08 PM, H. Peter Anvin h...@zytor.com wrote: On 12/18/2014 06:12 AM, Uros Bizjak wrote: # temporary until string.h is fixed KBUILD_CFLAGS += -ffreestanding Yes, it looks to me that new option is the way to go. Is this an OK? In principle, I'm OK with the patch approach, but let's wait for eventual comments from Linux people. Acked-by: H. Peter Anvin h...@linux.intel.com H.J. already coordinated with us; we are more than happy with this approach. Great, I'm glad that this specialized option has uses. Patch is OK. It is in now. Here is the GCC 5 change entry. OK to install? Peter, please feel free to use my kernel patch or create a different one. Thanks. -- H.J. --- Index: gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.52 diff -u -p -r1.52 changes.html --- gcc-5/changes.html 15 Dec 2014 19:55:08 - 1.52 +++ gcc-5/changes.html 18 Dec 2014 17:42:18 - @@ -444,6 +444,10 @@ void operator delete[] (void *, std::siz place of the __fentry__ or mcount call, so that a call per function can be later patched in. This can be used for low overhead tracing or hot code patching./li + liThe new code-mskip-rax-setup/code option to skip setting + up RAX register when SSE is disabled and there are no variable + arguments passed in vector registers. This can be used to + optimize Linux kernel./li /ul h3 id=shSH/h3
Re: [PATCH] X86-64: Add -mskip-rax-setup
On 12/18/2014 09:43 AM, H.J. Lu wrote: Peter, please feel free to use my kernel patch or create a different one. Great, thanks! -hpa
Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing
On Thu, Dec 18, 2014 at 06:41:18PM +0100, Ilya Verbin wrote: What we're doing in OpenACC offloading testing (gomp-4_0-branch), is in the libgomp.oacc-c/c.exp (etc.) file cycle through all the available offloading devices, and then in there -- I think -- such options should be set, that are specific to one particular offloading device/compiler? I have not tested a compiler, configured to support 2 different offloading targets, so there might be some corner cases. In this place I don't see any problems, at least for the case with installed offloading compilers. One -B allows to find mkoffload in lto-wrapper:compile_offload_image. This function tries to open all paths + '/accel/target_name/mkoffload' suffix. So, there should be no conflicts. Another -B allows mkoffload to find target driver. It tries to open 'host_name-accel-target_name-gcc', so, there also should be no conflicts. However, I still did not tried to enable 'make check' with non-installed offloading compilers. Probably such target specific paths can help there. Yeah, generally we want to be able to support all 3 (or 4 etc.) compilers installed into the same DESTDIR and prefix, so extra -B arguments in there should not break anything, for each of the offloading compilers you'd just add what -B options it needs. Jakub
Re: [C++ Patch] PR 60955
Hi, On 12/18/2014 06:21 PM, Jason Merrill wrote: On 12/18/2014 11:31 AM, Paolo Carlini wrote: + warning_sentinel s (extra_warnings); Let's add a comment about which warning we're avoiding here. OK with that change. Thanks. I'm attaching what I just committed. This is a regression, I suppose the patch is ok for 4_9-branch too, right? Thanks, Paolo. / /cp 2014-12-18 Paolo Carlini paolo.carl...@oracle.com PR c++/60955 * pt.c (struct warning_sentinel): Move it... * cp-tree.h: ... here. * semantics.c (force_paren_expr): Use it. /testsuite 2014-12-18 Paolo Carlini paolo.carl...@oracle.com PR c++/60955 * g++.dg/warn/register-parm-1.C: New. Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 218870) +++ cp/cp-tree.h(working copy) @@ -1149,6 +1149,18 @@ struct processing_template_decl_sentinel } }; +/* RAII sentinel to disable certain warnings during template substitution + and elsewhere. */ + +struct warning_sentinel +{ + int flag; + int val; + warning_sentinel(int flag, bool suppress=true) +: flag(flag), val(flag) { if (suppress) flag = 0; } + ~warning_sentinel() { flag = val; } +}; + /* The cached class binding level, from the most recently exited class, or NULL if none. */ Index: cp/pt.c === --- cp/pt.c (revision 218870) +++ cp/pt.c (working copy) @@ -14438,16 +14438,6 @@ tsubst_non_call_postfix_expression (tree t, tree a return t; } -/* Sentinel to disable certain warnings during template substitution. */ - -struct warning_sentinel { - int flag; - int val; - warning_sentinel(int flag, bool suppress=true) -: flag(flag), val(flag) { if (suppress) flag = 0; } - ~warning_sentinel() { flag = val; } -}; - /* Like tsubst but deals with expressions and performs semantic analysis. FUNCTION_P is true if T is the F in F (ARGS). */ Index: cp/semantics.c === --- cp/semantics.c (revision 218870) +++ cp/semantics.c (working copy) @@ -1660,6 +1660,9 @@ force_paren_expr (tree expr) tree type = unlowered_expr_type (expr); bool rval = !!(kind clk_rvalueref); type = cp_build_reference_type (type, rval); + /* This inhibits warnings in, eg, cxx_mark_addressable +(c++/60955). */ + warning_sentinel s (extra_warnings); expr = build_static_cast (type, expr, tf_error); if (expr != error_mark_node) REF_PARENTHESIZED_P (expr) = true; Index: testsuite/g++.dg/warn/register-parm-1.C === --- testsuite/g++.dg/warn/register-parm-1.C (revision 0) +++ testsuite/g++.dg/warn/register-parm-1.C (working copy) @@ -0,0 +1,9 @@ +// PR c++/60955 +// { dg-options -Wextra } + +unsigned int erroneous_warning(register int a) { +if ((a) 0xff) return 1; else return 0; +} +unsigned int no_erroneous_warning(register int a) { +if (a 0xff) return 1; else return 0; +}
Re: Pragma parsing
Hi Jakub! On Thu, 18 Dec 2014 15:29:42 +0100, Jakub Jelinek ja...@redhat.com wrote: On Thu, Dec 18, 2014 at 02:48:32PM +0100, Thomas Schwinge wrote: Like for CILK, I'd strongly prefer if for the clauses that are specific to OpenACC only you'd use PRAGMA_OACC_CLAUSE_* instead, and put them after the PRAGMA_CILK_* enum values. If you want to have PRAGMA_OACC_CLAUSE_ aliases also for the clauses shared in between OpenMP and OpenACC, feel free to add aliases like Cilk+ has them. It is unfortunately lots of new clauses and we are getting close to the 64 clauses limit :( when they are used in bitmasks. OK, so there is this limit. But, I fail to understand how merely moving the OpenACC-only PRAGMA_*_CLAUSE_* to the end of enum pragma_omp_clause will help overcome that? Or have I now completely confused myself, and I'm not even understanding the problem anymore? :-| The only way this would make sense (in my confused mind) is, if we then did PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION for the clauses whose names exist in both standards, and PRAGMA_OACC_CLAUSE_PRESENT = [some PRAGMA_OMP_CLAUSE_* whose name does not exist in OpenACC], taking care that no two PRAGMA_OACC_CLAUSE_* are assigned the same enum value. Wouldn't this become very cumbersome for future maintenance, however? I still don't like this very much, because we'll then get a mess of PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*. I understand, for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric representation of the reduction string -- and then, it doesn't make too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*. Yet, that's just the point that I'm making, so I'll defer if there's no consensus to be found here. The point is that we now have lots of clauses, and making it clear what from those clauses are Cilk+, what are OpenACC, what are OpenMP will help with code readability. The idea then is, I guess, to split parsing of OpenACC's PRAGMA_OACC_CLAUSE_* out of gcc/c/c-parser.c:c_parser_omp_clause_name into a new c_parser_oacc_clause_name, just like we already have a separate c_parser_oacc_all_clauses and c_parser_omp_all_clauses? (The motivation for the latter is that this is where the disambiguation between the (possibly) different semantics of OpenACC and OpenMP clauses happens, that is, where the names PRAGMA_OMP_CLAUSE_* are resolved to actual gcc/tree-core.h:enum omp_clause_code's OMP_CLAUSE_*. For example, there a OpenACC copy clause is translated into a OMP_CLAUSE_MAP with tofrom kind.) Especially if some clauses diverge in their meaning between different standards. When you say clauses here, do you mean gcc/c-family/c-pragma.h:enum pragma_omp_clause's PRAGMA_*_CLAUSE_*, or gcc/tree-core.h:enum omp_clause_code's OMP_CLAUSE_*, or both? That is, are you suggesting that we should also be adding new OACC_CLAUSE_* next to the existing OMP_CLAUSE_* in gcc/tree-core.h:enum omp_clause_code? (Which would then require separate handling all over the middle end?) I thought the (long-term) goal is to unfiy all this even more? Or, as above, I have now completely confused myself, and I'm not even understanding the problem anymore? :-| Or, alternatively, wouldn't it also work if we did have a separate gcc/c-family/c-pragma.h:enum pragma_oacc_clause (instead of adding to enum pragma_omp_clause), and then the OpenACC clauses at the parsing level live in their own namespace? (The same should then be done for Cilk+, too, even though the problem there is not as severe as for OpenACC, as its only two handful PRAGMA_CILK_CLAUSE_*.) This means we can't support combined OpenACC/OpenMP pragmas -- but that doesn't make much sense anyway, so no problem. But I'm not sure that is actually the approach you've been suggesting? Grüße, Thomas pgpdeo0OMWERo.pgp Description: PGP signature
Re: Pragma parsing
On Thu, Dec 18, 2014 at 07:00:09PM +0100, Thomas Schwinge wrote: OK, so there is this limit. But, I fail to understand how merely moving the OpenACC-only PRAGMA_*_CLAUSE_* to the end of enum pragma_omp_clause will help overcome that? Or have I now completely confused myself, and I'm not even understanding the problem anymore? :-| The only way this would make sense (in my confused mind) is, if we then did PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION for the clauses whose names exist in both standards, and PRAGMA_OACC_CLAUSE_PRESENT = [some PRAGMA_OMP_CLAUSE_* whose name does not exist in OpenACC], taking care that no two PRAGMA_OACC_CLAUSE_* are assigned the same enum value. Wouldn't this become very cumbersome for future maintenance, however? Well, if you e.g. put PRAGMA_OMP_CLAUSE_* that isn't relevant to OpenACC first, then pragmas shared by both, then OpenACC pragmas, then perhaps you could use biased bitmasks for the OpenACC constructs. Anyway, guess keep the ordering as is. I still don't like this very much, because we'll then get a mess of PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*. I understand, for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric representation of the reduction string -- and then, it doesn't make too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*. Yet, that's just the point that I'm making, so I'll defer if there's no consensus to be found here. The point is that we now have lots of clauses, and making it clear what from those clauses are Cilk+, what are OpenACC, what are OpenMP will help with code readability. The idea then is, I guess, to split parsing of OpenACC's PRAGMA_OACC_CLAUSE_* out of gcc/c/c-parser.c:c_parser_omp_clause_name into a new c_parser_oacc_clause_name, just like we already have a separate c_parser_oacc_all_clauses and c_parser_omp_all_clauses? (The motivation for the latter is that this is where the disambiguation between the (possibly) different semantics of OpenACC and OpenMP clauses happens, that is, where the names PRAGMA_OMP_CLAUSE_* are resolved to actual gcc/tree-core.h:enum omp_clause_code's OMP_CLAUSE_*. For example, there a OpenACC copy clause is translated into a OMP_CLAUSE_MAP with tofrom kind.) I don't think that is needed. It is the same parsing, so it can be shared, just use PRAGMA_OMP_* for OpenMP clauses or clauses shared between OpenMP and OpenACC and PRAGMA_OACC_* for OpenACC specific pragmas in the code that works with both, and in OpenACC specific FE code the PRAGMA_OACC_* aliases to PRAGMA_OMP_* values for the shared clauses. I just want to be able to clearly see what is OpenACC specific and what is not, similarly how Cilk+ already does it. Jakub
Re: [patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.
2014-12-18 18:26 GMT+01:00 Jason Merrill ja...@redhat.com: On 12/18/2014 10:10 AM, Kai Tietz wrote: + if (TMPL_ARGS_DEPTH (args) TMPL_PARMS_DEPTH (tmpl_parms)) +args = get_innermost_template_args +(args, TMPL_PARMS_DEPTH (tmpl_parms)); It seems unlikely to be correct to arbitrarily strip off outer template args. Where did the mismatch come from? Jason Hmm, we are comming from #2 tsubst_decl (cp/pt.c:11278) #3 instantiate_template_1 (cp/pt.c:15945) #4 instantiate_template (cp/pt.c:15995) #5 instantiate_alias_template (cp/pt.c:11827) #6: lookup_template_class_1 #7: lookup_template_class #8: finish_template_type #9: cp_parser_template_id Well, in general I would have assumed to be able to get alias decl of tmpl. Wasn't able to find a simple way to get it. So, by looking into source I found that most cases handling args tmpl-args by using inner_most_template_args instead. Defaulting to innermost_template_args looks indeed a bit wrong, but seemed to work. Kai
Including a file from include/ in gcc/*.h (was: [gomp4] Use include/gomp-constants.h more actively)
Hi! On Wed, 17 Dec 2014 23:26:53 +0100, I wrote: Committed to gomp-4_0-branch in r218840: commit febcd8dfdb10fa80edff0880973d1915ca2fef74 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Dec 17 22:26:24 2014 + Use include/gomp-constants.h more actively. diff --git gcc/tree-core.h gcc/tree-core.h index 743bc0d..fc61b88 100644 --- gcc/tree-core.h +++ gcc/tree-core.h @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3. If not see #include alias.h #include flags.h #include symtab.h +#include gomp-constants.h /* This file contains all the data structures that define the 'tree' type. There are no accessor macros nor functions in this file. Only the Is it actually OK to #include gomp-constants.h (living in [GCC]/include/) from gcc/tree-core.h? Isn't the tree-core.h file getting installed, for later use by plugins -- which then need to be able to find gomp-constants, too, but that is not being installed? --- include/gomp-constants.h +++ include/gomp-constants.h Grüße, Thomas pgpwBcuIrVxlU.pgp Description: PGP signature
Re: Including a file from include/ in gcc/*.h (was: [gomp4] Use include/gomp-constants.h more actively)
On Thu, Dec 18, 2014 at 07:25:03PM +0100, Thomas Schwinge wrote: Hi! On Wed, 17 Dec 2014 23:26:53 +0100, I wrote: Committed to gomp-4_0-branch in r218840: commit febcd8dfdb10fa80edff0880973d1915ca2fef74 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Dec 17 22:26:24 2014 + Use include/gomp-constants.h more actively. diff --git gcc/tree-core.h gcc/tree-core.h index 743bc0d..fc61b88 100644 --- gcc/tree-core.h +++ gcc/tree-core.h @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3. If not see #include alias.h #include flags.h #include symtab.h +#include gomp-constants.h /* This file contains all the data structures that define the 'tree' type. There are no accessor macros nor functions in this file. Only the Is it actually OK to #include gomp-constants.h (living in [GCC]/include/) from gcc/tree-core.h? Isn't the tree-core.h file getting installed, for later use by plugins -- which then need to be able to find gomp-constants, too, but that is not being installed? Generally, it must be possible to include include/ headers from gcc/ headers, even when they are used by plugins. Otherwise system.h including libiberty.h and safe-ctype.h etc. wouldn't work. Perhaps you need to add gomp-constants.h to some Makefile variable or something, look at how is safe-ctype.h etc. handled. That said, including gomp-constants.h from tree-core.h is I think very much against all the Andrew's efforts to flatten headers (which is something I'm not very happy with generally, but in this case, I think only the very few files that really need the constants should include it). Jakub
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18 2014, H. Peter Anvin h...@zytor.com wrote: On 12/18/2014 09:43 AM, H.J. Lu wrote: Peter, please feel free to use my kernel patch or create a different one. Great, thanks! Thanks for accepting this idea, and especially to H.J. for already having done all the work. Minor thing: If it's not too late, I'd appreciate a 'Suggested-by' or similar mention in the kernel change log. Thanks, Rasmus
Re: [PATCH] X86-64: Add -mskip-rax-setup
On 12/18/2014 10:37 AM, Rasmus Villemoes wrote: Minor thing: If it's not too late, I'd appreciate a 'Suggested-by' or similar mention in the kernel change log. I think we can get that. -hpa
Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length
Hi all, here is my next try on proposing a patch for the issue in pr60255. It took me quite some time to understand the intricacies with handling variables associated in a select type. I think I got most of the issues fixed now: - Added generation of _len component for each unlimited polymorphic pointer. - Removed (my own) _len component creation routine. - Removed the double underscore in get_len_component(). - Associating an unlimited polymorphic entity to a deferred char array now lets the deferred char array use the actual string length from the '_len' component of the unlimited polymorphic entity for the charlen instead of the size component of the vptr. - Removed: Generating a special vtab name for deferred strings. A deferred string assigned to the unlimited polymorphic entity is now stored as having charlen zero again. - Basic support for char array arrays (No stuttering here) in u-poly variables. Bootstraps ok on x86_64-linux-gnu. Comparing regtests I get a difference in unlimited_polymorphic_2.f90 that I don't understand yet. May be that is only, because one error message disappeared. Attached is the full patch for trunk and a delta patch for those of you who already have my pr60255_3 added. I don't provide a changelog entry yet, because I think review will find some issues still to fix. So, comments welcome! Regards, Andre On Tue, 9 Dec 2014 14:16:05 +0100 Dominique d'Humières domi...@lps.ens.fr wrote: Dear Andre, The patch causes an ICE for the test gfortran.dg/unlimited_polymorphic_1.f03: f951: internal compiler error: in gfc_add_component_ref, at fortran/class.c:236 f951: internal compiler error: Abort trap: 6 gfc: internal compiler error: Abort trap: 6 (program f951) Abort Reduced test for which the ICE is triggered by ‘len(w)' MODULE m contains subroutine bar (arg, res) class(*) :: arg character(100) :: res select type (w = arg) type is (character(*)) write (res, '(I2)') len(w) end select end subroutine END MODULE Note that with your patch at https://gcc.gnu.org/ml/fortran/2014-08/msg00022.html, I get the same ICE for the Mikael’s test at https://gcc.gnu.org/ml/fortran/2014-08/msg00055.html (before your patch for pr60255, it used to give a wrong length: 80 instead of 20 AFAICR). Note that the assert at fortran/class.c:236 is also triggered for pr61115. Thanks for working on these issues, Dominique On 8 December 2014 at 18:38, Andre Vehreschild ve...@gmx.de wrote: Hi all, please find attached a more elaborate patch for pr60255. I totally agree that my first attempt was just scratching the surface of the work needed. This patch also is *not* complete, but because I am really new to gfortran patching, I don't want to present a final patch only to learn then, that I have violated design rules, common practice or the like. Therefore please comment and direct me to any sources/ideas to improve the patch. Topic: The pr 60255 is about assigning a char array to an unlimited polymorphic entity. In the comments the concern about the lost length information is raised. The patch adds a _len component to the unlimited polymorphic entity (after _data and _vtab) and adds an assignment of the string length to _len when a string is pointer assigned to the unlimited poly entity. Furthermore is the intrinsic len(unlimited poly pointing to a string) resolved to give the _len component. Yet missing: - assign _len component back to deferred char array length component - transport length along chains of unlimited poly entities, i.e., a = b; c = a where all objects are unlimited poly and b is a string. - allocate() in this context Patch dependencies: none Comments, concerns, candy welcome! Regards, Andre -- Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen Tel.: +49 241 9291018 * Email: ve...@gmx.de diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c index 29e31e1..f5a815c 100644 --- a/gcc/fortran/class.c +++ b/gcc/fortran/class.c @@ -34,6 +34,12 @@ along with GCC; see the file COPYING3. If not see (pointer/allocatable/dimension/...). * _vptr: A pointer to the vtable entry (see below) of the dynamic type. +Only for unlimited polymorphic classes: +* _len: An integer(4) to store the string length when the unlimited + polymorphic pointer is used to point to a char array. The '_len' + component will be zero when no character array is stored in + '_data'. + For each derived type we set up a vtable entry, i.e. a structure with the following fields: * _hash: A hash value serving as a unique identifier for this type. @@ -544,10 +550,41 @@ gfc_intrinsic_hash_value (gfc_typespec *ts) } +/* Get the _len component from a class/derived object storing a string. */ + +gfc_expr * +gfc_get_len_component (gfc_expr *e) +{ + gfc_expr
Re: [PATCH 0/9] Enable LRA on SH
On 2014-12-17 7:56 PM, Kaz Kojima wrote: This patch series is to make SH target use LRA and is discussed in the PR target/55212: [SH] Switch to LRA https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212 I think there are issues which should be solved before defaulting LRA on SH and the helps from the experts are needed. The series are constructed as [PATCH 1/9] LRA: Take account implicit usage of pseudo reg in mem arg [PATCH 2/9] LRA: Swap base_term and index_term for some case [PATCH 3/9] Add TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV target macro [PATCH 4/9] Add TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT target macro [PATCH 5/9] [SH] Add -mlra option [PATCH 6/9] [SH] Miscellaneous changes for LRA [PATCH 7/9] [SH] Modify movsi_ie movsf_ie patterns for LRA [PATCH 8/9] [SH] Add splitter to addsi3_compact [PATCH 9/9] [SH] Split QI/HImode load/store via r0 Patch 1-4 touch generic files and 5-9 are SH specific. 1-3, 6 and 7 are for getting rid of ICEs. 4, 8 and 9 are to fix some serious code regressions. The patches are tested with bootstrap and the top level make -k check on i686-pc-linux-gnu with no new failures. -mlra makes ~0.3% code size regression on the current sh-lra branch. It solves a few known failures and causes a few new failures with make -k check on sh4-unknown-linux-gnu. Thank you very much for your work on porting LRA to SH! I know it is quite a challenge to do any RA work for this target. And sorry, I had no time to pay attention to this work. To be honest I thought that SH will be one of the last target ported to LRA. It is a surprise for me, that porting SH to LRA does not need many changes in LRA itself. I'll try to review the first 4 patches as soon as possible although I'll be on 2 week vacation since tomorrow.
Re: [PATCH 0/9] Enable LRA on SH
On Thu, 2014-12-18 at 13:42 -0500, Vladimir Makarov wrote: Thank you very much for your work on porting LRA to SH! I know it is quite a challenge to do any RA work for this target. The R0-ness (both, GP reg r0 and FP reg fr0) of the ISA taxes the patience sometimes. Too often, I've been thinking about hiding the r0/fr0 regs from the RA process and do some custom things pre-RA/post-RA ... hopefully LRA will stop me from thinking more about that :) Cheers, Oleg
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Dec 18, 2014, at 9:43 AM, H.J. Lu hjl.to...@gmail.com wrote: + arguments passed in vector registers. This can be used to + optimize Linux kernel./li English, to optimize _the_ Linux kernel?
Re: [debug-early] reuse variable DIEs and fix their context
Hi Jason. It's embarrassing that I just got to this now. I hope you don't repay the favor and take as long responding to me :(. On 09/12/14 08:15, Jason Merrill wrote: On 09/11/2014 08:51 PM, Aldy Hernandez wrote: /* Generate hidden aliases for Java. */ - if (candidates) + if (java_hidden_aliases) { - build_java_method_aliases (candidates); - delete candidates; + build_java_method_aliases (java_hidden_aliases); + delete java_hidden_aliases; } Didn't it work to move this before finalize? I think the VTV stuff is all that really needs to come after it, and that can move out of the front end if this hook is a problem (which I don't really think it is). I can't move the call to build_java_method_aliases until the compilation proper has run because said function iterates through all the functions with FOR_EACH_FUNCTION, and the set of available functions at parse time and after the optimization passes are quite different, presumably because we've pruned all the unused functions. If I move the call to build_java_method_aliases before the optimization passes, we end up emitting many more hidden aliases which cause handle_alias_pairs() in cgraphunit.c to bark with: error: 'void _ZGAN8__JArrayC4Ev()' aliased to external symbol '_ZN8__JArrayC4Ev' So we either leave building Java method aliases after the optimization passes, or we somehow tighten the candidate selection in collect_candidates_for_java_method_aliases(): if (DECL_CLASS_SCOPE_P (fndecl) TYPE_FOR_JAVA (DECL_CONTEXT (fndecl)) TARGET_USE_LOCAL_THUNK_ALIAS_P (fndecl)) What do you suggest? I can, however, move all the other non-VTV stuff to the parser in a subsequent patch, while you respond to this query. Thanks. Aldy
Re: [PATCH 0/9] Enable LRA on SH
On Dec 18, 2014, at 10:42 AM, Vladimir Makarov vmaka...@redhat.com wrote: Thank you very much for your work on porting LRA to SH! I’ve ported my target to LRA, and I have not found any LRA bits I needed to change.
Re: [PATCH] X86-64: Add -mskip-rax-setup
On Thu, Dec 18, 2014 at 11:22 AM, Mike Stump mikest...@comcast.net wrote: On Dec 18, 2014, at 9:43 AM, H.J. Lu hjl.to...@gmail.com wrote: + arguments passed in vector registers. This can be used to + optimize Linux kernel./li English, to optimize _the_ Linux kernel? Updated. -- H.J. --- Index: gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.52 diff -u -p -r1.52 changes.html --- gcc-5/changes.html 15 Dec 2014 19:55:08 - 1.52 +++ gcc-5/changes.html 18 Dec 2014 19:35:28 - @@ -444,6 +444,10 @@ void operator delete[] (void *, std::siz place of the __fentry__ or mcount call, so that a call per function can be later patched in. This can be used for low overhead tracing or hot code patching./li + liThe new code-mskip-rax-setup/code option to skip setting + up RAX register when SSE is disabled and there are no variable + arguments passed in vector registers. This can be used to + optimize the Linux kernel./li /ul h3 id=shSH/h3
Re: [PATCH 2/4] New data structure for cgraph_summary introduced.
2014-12-08 Martin Liska mli...@suse.cz * cgraph.h (symbol_table::allocate_cgraph_symbol): Summary UID is filled up. * symbol-summary.h: New file. * gengtype.c (open_base_files): Add symbol-summary.h. * toplev.c (general_init): Call constructor of symbol_table. --- gcc/cgraph.h | 8 ++ gcc/gengtype.c | 4 +- gcc/symbol-summary.h | 281 +++ gcc/toplev.c | 3 +- 4 files changed, 293 insertions(+), 3 deletions(-) create mode 100644 gcc/symbol-summary.h diff --git a/gcc/cgraph.h b/gcc/cgraph.h index a5c5f56..1664bd7 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1237,6 +1237,8 @@ public: int count_materialization_scale; /* Unique id of the node. */ int uid; + /* Summary unique id of the node. */ + int summary_uid; Hmm, can't we just use uid here? I guess the only difference is that summary uid is not kept dense, unlike the uid. This should not propagate into much of trouble simply because the summary datastructure should safely remove the summaires via removal hook. + +/* We want to pass just pointer types as argument for function_summary + template class. */ + +template class T +class function_summary +{ +private: + function_summary(); +}; + +template class T +class GTY((user)) function_summary T * Eventually I would like this to allow attaching summaries to variables (and symbols in general), too. But currently we do not have use for it, so we can care about this later. The patch is OK. Preferrably with summary_uid replaced by uid if it is easily doable. Honza
Re: [PATCH 3/4] First usage of cgraph_summary in ipa-prop pass.
gcc/lto/ChangeLog: 2014-12-08 Martin Liska mli...@suse.cz * lto-partition.c: Include of symbol-summary.h is added. * lto-symtab.c: Likewise. * lto.c: Likewise. I must say I am not friend of flattening. Hope this will be resolved for 5.1 gcc/ChangeLog: 2014-12-08 Martin Liska mli...@suse.cz * auto-profile.c: Include of symtab-summary.h is added. * cgraph.c: Likewise. * cgraphbuild.c: Likewise. * cgraphclones.c: Likewise. * cgraphunit.c: Likewise. * ipa-cp.c: Likewise. * ipa-devirt.c: Likewise. * ipa-icf.c: Likewise. * ipa-inline-analysis.c (evaluate_properties_for_edge): New ipa_node_params_d data structure is used. (inline_node_duplication_hook): Likewise. (estimate_function_body_sizes): Likewise. (remap_edge_change_prob): Likewise. (inline_merge_summary): Likewise. * ipa-inline-transform.c: Include of symtab-summary.h is added. * ipa-inline.c (early_inliner): New ipa_node_params_d data structure is used. * ipa-polymorphic-call.c: Include of symtab-summary.h is added. * ipa-profile.c: Include of symtab-summary.h is added. * ipa-prop.c (ipa_propagate_indirect_call_infos): New ipa_node_params_d data structure is used. (ipa_node_params::~ipa_node_params): New function. (ipa_free_all_node_params): Destruction is simplified. (ipa_node_removal_hook): Removed. (ipa_add_new_function): Renamed from ipa_node_duplication_hook. (ipa_node_params_t::duplicate): New function. (ipa_register_cgraph_hooks): Few hooks are removed. (ipa_unregister_cgraph_hooks): Likewise. (ipa_prop_write_jump_functions): New ipa_node_params_d is used. * ipa-prop.h (struct ipa_node_params): Destructor instroduced for the structure. (ipa_check_create_node_params): Vector for ipa_node_params is replaced with function_summary. * ipa-split.c: Include of symtab-summary.h is added. * ipa-utils.c: Include of symtab-summary.h is added. * ipa.c: Include of symtab-summary.h is added. * omp-low.c: Include of symtab-summary.h is added. * tree-inline.c: Include of symtab-summary.h is added. * tree-sra.c: Include of symtab-summary.h is added. * tree-ssa-pre.c: Include of symtab-summary.h is added. OK. Honza
Re: [PATCH 4/4] Data structure is used for inline_summary struct.
2014-12-08 Martin Liska mli...@suse.cz * lto-partition.c (add_symbol_to_partition_1): New inline_summary_d is used. (undo_partition): Likewise. (lto_balanced_map): Likewise. gcc/ChangeLog: 2014-12-08 Martin Liska mli...@suse.cz * cgraphunit.c (symbol_table::process_new_functions): New inline_summary_d is used. * ipa-cp.c (ipcp_cloning_candidate_p): Likewise. (devirtualization_time_bonus): Likewise. (estimate_local_effects): Likewise. (ipcp_propagate_stage): Likewise. * ipa-inline-analysis.c (evaluate_conditions_for_known_args): Likewise. (evaluate_properties_for_edge): Likewise. (inline_summary_alloc): Likewise. (reset_inline_summary): New inline_summary argument is introduced. (inline_summary_t::remove): New function. (inline_summary_t::duplicate): Likewise. (dump_inline_edge_summary): New inline_summary_d is used. (dump_inline_summary): Likewise. (estimate_function_body_sizes): Likewise. (compute_inline_parameters): Likewise. (estimate_edge_devirt_benefit): Likewise. (estimate_node_size_and_time): Likewise. (inline_update_callee_summaries): Likewise. (inline_merge_summary): Likewise. (inline_update_overall_summary): Likewise. (simple_edge_hints): Likewise. (do_estimate_edge_time): Likewise. (estimate_time_after_inlining): Likewise. (estimate_size_after_inlining): Likewise. (do_estimate_growth): Likewise. (growth_likely_positive): Likewise. (inline_generate_summary): Likewise. (inline_read_section): Likewise. (inline_read_summary): Likewise. (inline_write_summary): Likewise. (inline_free_summary): Likewise. * ipa-inline-transform.c (clone_inlined_nodes): Likewise. (inline_call): Likewise. * ipa-inline.c (caller_growth_limits): Likewise. (can_inline_edge_p): Likewise. (want_early_inline_function_p): Likewise. (compute_uninlined_call_time): Likewise. (compute_inlined_call_time): Likewise. (big_speedup_p): Likewise. (want_inline_small_function_p): Likewise. (edge_badness): Likewise. (update_caller_keys): Likewise. (update_callee_keys): Likewise. (recursive_inlining): Likewise. (inline_small_functions): Likewise. (inline_to_all_callers): Likewise. (dump_overall_stats): Likewise. (early_inline_small_functions): Likewise. * ipa-inline.h: New class inline_summary_t replaces vecinline_summary_t. * ipa-split.c (execute_split_functions): New inline_summary_d is used. * ipa.c (walk_polymorphic_call_targets): Likewise. * tree-sra.c (ipa_sra_preliminary_function_checks): Likewise. Patch is OK. I only changed mind about inline_summary_d. It does not look good used with inline_summary_d-get() especially because _d prefix was used in C++ication days for type names only. Probably inline_summaries-get() is better. OK with this change. Thanks, Honza
[debug-early] C++: moving more things before the compilation proper
Hi Jason. As promised, I am moving most everything out of cxx_post_compilation_parsing_cleanups() which is the hook that will be called after the compilation proper has run (symtab-finalize_compilation_unit()). Apart from the Java hidden alias support which I just asked you about, and the VTV stuff which will remain in the aforementioned hook, the only remaining bits are: input_location = locus_at_end_of_parsing; #ifdef ENABLE_CHECKING validate_conversion_obstack (); #endif /* ENABLE_CHECKING */ I am keeping the input_location bit because the final VTV stuff has some error() calls which presumably will need a location. And I am keeping validate_conversion_obstack() because I *think* the VTV stuff will create calls? If not, let me know and I can move this bit too. In any case, things are pretty deserted in cxx_post_compilation_parsing_cleanups(), which should make everybody happy :-). I am committing the following to the branch. Let me know if you have any suggestions. Thanks. Aldy commit 0cf41add70fad5defc4ff07b76f95e3aaadc1b05 Author: Aldy Hernandez al...@redhat.com Date: Thu Dec 18 11:47:19 2014 -0800 * cp/decl2.c (cxx_post_compilation_parsing_cleanups): Move some final checks... (c_parse_final_cleanups): ...here. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index f9d1028..6441ab2 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -4723,6 +4723,21 @@ c_parse_final_cleanups (void) pending_statics-length (), EMIT_DEBUG_EARLY); } + + perform_deferred_noexcept_checks (); + + finish_repo (); + + /* The entire file is now complete. If requested, dump everything + to a file. */ + dump_tu (); + + if (flag_detailed_statistics) +{ + dump_tree_statistics (); + dump_time_statistics (); +} + timevar_stop (TV_PHASE_DBGINFO); timevar_start (TV_PHASE_PARSING); } @@ -4746,8 +4761,6 @@ cxx_post_compilation_parsing_cleanups (void) vtv_generate_init_routine (); } - perform_deferred_noexcept_checks (); - /* Generate hidden aliases for Java. */ if (java_hidden_aliases) { @@ -4755,17 +4768,6 @@ cxx_post_compilation_parsing_cleanups (void) delete java_hidden_aliases; } - finish_repo (); - - /* The entire file is now complete. If requested, dump everything - to a file. */ - dump_tu (); - - if (flag_detailed_statistics) -{ - dump_tree_statistics (); - dump_time_statistics (); -} input_location = locus_at_end_of_parsing; #ifdef ENABLE_CHECKING
[PATCH, rs6000] PR59708, Fix operand ordering for 128-bit andc/orc
The following patch fixes testsuite failures builtin-arith-overflow-[10,11].c. In the case where rs6000_split_logical() was being called the operands were swapped such that the wrong operand was being complemented. Bootstrap/tested on powerpc64-linux with no new regressions. Ok for trunk (and 4.9/4.8 after bootstrap/regtest)? 2014-12-18 Pat Haugen pthau...@us.ibm.com * config/rs6000/rs6000.md (boolcmode3_internal1): Fix operand ordering. Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 218827) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -7554,19 +7554,21 @@ (define_insn_and_split *boolmode3_int (const_string 16]) ;; 128-bit ANDC/ORC +;; In the case where rs6000_split_logical is called, the NOT'd operand +;; must be opnd1 in order for the split insns to be recognized. (define_insn_and_split *boolcmode3_internal1 [(set (match_operand:BOOL_128 0 vlogical_operand =BOOL_REGS_OUTPUT) (match_operator:BOOL_128 3 boolean_operator [(not:BOOL_128 - (match_operand:BOOL_128 2 vlogical_operand BOOL_REGS_OP1)) - (match_operand:BOOL_128 1 vlogical_operand BOOL_REGS_OP2)]))] + (match_operand:BOOL_128 1 vlogical_operand BOOL_REGS_OP1)) + (match_operand:BOOL_128 2 vlogical_operand BOOL_REGS_OP2)]))] TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND) { if (TARGET_VSX vsx_register_operand (operands[0], MODEmode)) -return xxl%q3 %x0,%x1,%x2; +return xxl%q3 %x0,%x2,%x1; if (TARGET_ALTIVEC altivec_register_operand (operands[0], MODEmode)) -return v%q3 %0,%1,%2; +return v%q3 %0,%2,%1; return #; }
patch to solve PR64291
The following patch solves https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64291 It is a bug in a new rematerialization subpass of LRA. The patch was bootstrapped on x86/x86-64. Committed as rev. 218874. 2014-12-18 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/64291 * lra-remat.c (bad_for_rematerialization_p): Add UNPSEC_VLOATILE. (create_cands): Process only output reload insn with potential cands. 2014-12-18 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/64291 * testsuite/gcc.target/i386/pr64291-[12].c: New tests. Index: lra-remat.c === --- lra-remat.c (revision 218685) +++ lra-remat.c (working copy) @@ -350,12 +350,12 @@ finish_cand_table (void) -/* Return true if X contains memory or UNSPEC. We can not just check - insn operands as memory or unspec might be not an operand itself - but contain an operand. Insn with memory access is not profitable - for rematerialization. Rematerialization of UNSPEC might result in - wrong code generation as the UNPEC effect is unknown - (e.g. generating a label). */ +/* Return true if X contains memory or some UNSPEC. We can not just + check insn operands as memory or unspec might be not an operand + itself but contain an operand. Insn with memory access is not + profitable for rematerialization. Rematerialization of UNSPEC + might result in wrong code generation as the UNPEC effect is + unknown (e.g. generating a label). */ static bool bad_for_rematerialization_p (rtx x) { @@ -363,7 +363,7 @@ bad_for_rematerialization_p (rtx x) const char *fmt; enum rtx_code code; - if (MEM_P (x) || GET_CODE (x) == UNSPEC) + if (MEM_P (x) || GET_CODE (x) == UNSPEC || GET_CODE (x) == UNSPEC_VOLATILE) return true; code = GET_CODE (x); fmt = GET_RTX_FORMAT (code); @@ -406,7 +406,7 @@ operand_to_remat (rtx_insn *insn) if (reg-regno == STACK_POINTER_REGNUM frame_pointer_needed) return -1; else if (reg-type == OP_OUT ! reg-subreg_p -find_regno_note (insn, REG_UNUSED, reg-regno) == NULL) + find_regno_note (insn, REG_UNUSED, reg-regno) == NULL) { /* We permits only one spilled reg. */ if (found_reg != NULL) @@ -508,11 +508,14 @@ create_cands (void) if ((set = single_set (insn)) != NULL REG_P (SET_SRC (set)) REG_P (SET_DEST (set)) -(src_regno = REGNO (SET_SRC (set))) = FIRST_PSEUDO_REGISTER +((src_regno = REGNO (SET_SRC (set))) + = lra_constraint_new_regno_start) (dst_regno = REGNO (SET_DEST (set))) = FIRST_PSEUDO_REGISTER reg_renumber[dst_regno] 0 (insn2 = regno_potential_cand[src_regno].insn) != NULL BLOCK_FOR_INSN (insn2) == BLOCK_FOR_INSN (insn)) + /* It is an output reload insn after insn can be +rematerialized (potential candidate). */ create_cand (insn2, regno_potential_cand[src_regno].nop, dst_regno); if (nop 0) goto fail; Index: testsuite/gcc.target/i386/pr64291-1.c === --- testsuite/gcc.target/i386/pr64291-1.c (revision 0) +++ testsuite/gcc.target/i386/pr64291-1.c (working copy) @@ -0,0 +1,51 @@ +/* { dg-options -O2 } */ +/* { dg-additional-sources pr64291-2.c } */ +/* { dg-do run } */ +void f(void*,...); +void g(void*,long,long); +int nnn=0; +long test=0; + +typedef struct +{ + int _mp_size; + unsigned long *_mp_d; +} __mpz_struct; +typedef __mpz_struct mpz_t[1]; + +int main () +{ + mpz_t n, d; + long nn, dn; + unsigned long *np, *dup, *dnp, *qp; + long alloc, itch; + + f (n); + f (d); + qp = (unsigned long*)__builtin_alloca(4099*8) + 1; + dnp = (unsigned long*)__builtin_alloca (2049*8); + alloc = 1; + for (test = 0; test 1; test++) +{ + dn = d-_mp_size; + dup = d-_mp_d; + f (dnp, dup, dn); + dnp[dn - 1] |= 1UL63; + f (0); + nn = nnn; + np = n-_mp_d; + qp[-1] = -757136820; + qp[nn - dn + 1] = 14883681; + f (0); + if (dn = 6) + f (0); + itch = nn + 1; + if (itch + 1 alloc) + { + g(0,alloc*8,(itch+1)*8); + alloc = itch + 1; + } + f (np, nn); +} + return 0; +} Index: testsuite/gcc.target/i386/pr64291-2.c === --- testsuite/gcc.target/i386/pr64291-2.c (revision 0) +++ testsuite/gcc.target/i386/pr64291-2.c (working copy) @@ -0,0 +1,4 @@ +/* { dg-do compile } */ +extern void abort (void); +void f(void*p,...){} +void g(void*p,long a,long b){if (a!=8) abort();}
Re: [RFC PATCH 3/9] Add TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV target macro
On 2014-12-17 8:00 PM, Kaz Kojima wrote: This was discussed in PR55212 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c25 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c58 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c69 and is to fix another ICEs in assign_by_spills. This patch introduces a new target hook to disable replacing with memory equivalence if the macro returns true. The problem happens for r0 again. SH ISA has many instructions with r0-parity. The index register is r0 only, only r0 can be used as QI/HImode load/store target/source, and so on. It seems that replacing mem equiv makes register pressure high especially on r0 and fails in spilling. Also replacing mem equiv doesn't win CSiBE in average on this target. Please see the above trail #c69 for CSiBE numbers. I think the one design goal of LRA is to avoid target macros if possible. There might be some way to handle the above situation without target macros, though it's over my head. I don't think it will be easy to solve this problem by following reg pressure. I guess R0 will be in the same pressure reg class. I believe some hook is an adequate solution here. -- * lra-constraints.c (get_equiv): Don't return memory equivalence when targetm.cannot_substitute_mem_equiv_p is true. * target.def (cannot_substitute_mem_equiv_p): New hook. * config/sh/sh.c (sh_cannot_substitute_mem_equiv_p): New function. (TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): Define. * doc/tm.texi.in (TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): New hook. * doc/tm.texi: Regenerate. LRA part of the patch is ok. I found one typo in and the speed on avarage working sets. (average). But I'd recommend to try different hook implementation checking r0 in the address as still equiv memory substitution might be profitable for SH in some cases.
Re: [RFC PATCH 2/9] LRA: Swap base_term and index_term for some case
On 2014-12-17 7:59 PM, Kaz Kojima wrote: This was discussed in PR55212 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c52 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c53 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c55 and is to fix ICE in assign_by_spills. In that case, rtlanal.c:decompose_mem_address guesses base and index regs when INDEX_REG_CLASS is assigned to that base candidate already. This will be OK for almost targets but is bad for SH which has only one register r0 in INDEX_REG_CLASS. The patch fix-ups it with swapping these candidates. -- * lra-constraints.c (process_address_1): Swap base_term and index_term if INDEX_REG_CLASS is assigned to base_term already when INDEX_REG_CLASS is a single register class. The patch is ok. Although I will probably try to generalize it in future. Thanks, again.
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
2014-12-18 17:00 GMT+00:00 Jiong Wang jiong.w...@arm.com: ok for trunk? gcc/ PR62173 loop-invariant.c.c (expensive_addr): New hash_table. (need_expensive_addr_check_p): New bool. (find_exits): Rename to find_exists_and_reshuffle. Support re-shuffle instructions for better loop invariant hoisting. another question is, is it safe to re-use REG_DEAD info here without calling df_note_add_problem and df_analysis first? am I using those info passed down from the previous pass which calculated these info and maybe broken?
Re: [PATCH 0/9] Enable LRA on SH
On 2014-12-18 2:26 PM, Mike Stump wrote: On Dec 18, 2014, at 10:42 AM, Vladimir Makarov vmaka...@redhat.com wrote: Thank you very much for your work on porting LRA to SH! I’ve ported my target to LRA, and I have not found any LRA bits I needed to change. Thanks, Mike. When I did a few first ports, it never happened to me. First steps are hard but next ones are easier. Back in Cygnus times SH was always a problem for reload and RA. A lot of regs but one reg is used for many purposes, tiny and mode dependent displacements. A lot of optimizations were implemented specifically for SH. SH optimizations were one reason why reload became so complicated.
Re: [RFC PATCH 4/9] Add TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT target macro
On 2014-12-17 8:00 PM, Kaz Kojima wrote: This patch adds a target macro discussed in PR55212 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c76 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c78 is to avoid bad codes on SH. SH has very limited base+displacement addressing and it looks that the old reload has some mechanism so to split Yes, that is probably last thing of reload missed in LRA. As I remember Joern wrote this very long ago specifically for SH. postreload pass also has some code to deal with different displacements and their costs (if there are no constraints on the constants). reg := mem[base + bad_displacement] to new_base := base + (bad_displacement - legitimate_displacement) reg := mem[new_base + legitimate_displacement] The new target macro is to give same effect on LRA. This helps SH especially for the address like stack_pointer + offset where offset has 4-bit except for SH2A. I've tried to define this target macro for ARM thumb and found only one CSiBE test replaypc-0.4.0.preproc/build-ndx is improved at 4 bytes with -Os. I guess that 8-bit offset field for sp+offset addressing on ARM thumb is very good for this issue. Again it would be better if something can cover this case without new target macros. -- * lra-constraints.c (process_address_1): Try if target can split displacement with targetm.legitimize_address_displacement. * target.def (legitimize_address_displacement): New hook. * targhooks.c (default_legitimize_address_displacement): New function. * targhooks.h (default_legitimize_address_displacement): Declare. * config/sh/sh.c (sh_legitimize_address_displacement): New function. (TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT): Define. * doc/tm.texi.in (TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT): New hook. * doc/tm.texi: Regenerate. It is an interesting solution. The patch is ok for me. One more thing though. The patches you propose affect very sensitive parts of LRA. They might break other ports currently using LRA. If it happens, please, revert a breaking patch as I can not be a help next 2 weeks.
Re: [RFC PATCH 1/9] LRA: Take account implicit usage of pseudo reg in mem arg
On 2014-12-17 7:57 PM, Kaz Kojima wrote: This patch is discussed in PR55212 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c47 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c48 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c49 and is to avoid segfault in remove_pseudos. In the problematic case in #c47, a call_insn has a note for an argument via memory: (expr_list:DF (use (mem:DF (reg/f:SI 699) [0 S8 A32])) and this usage of pseudo 699 was missed by LRA. spill_hard_reg[699] doesn't have correct value and causes a segfault in remove_pseudos. -- * lra.c (lra_update_insn_regno_info): Take account of arguments via memory which could be implicit usage of pseudo regs. It could be a solution but I believe the problem should be solved in some other way. The patch will generate a correct code but it might also generate worse code. The regs mentioned in insn data are always processed for liveness and adding the pseudo can make live range longer potentially worsening allocation of other pseudos. So I guess we need to find another solution for the problem. I suspect spill_hard_reg is undefined as the pseudo 699 is present only in CALL_INSN_USAGE. Otherwise, spill_hard_reg would be NULL. A solution would be to figure out why there is only one reference, what does it mean, and what we expect at the end of RA instead of pseudo (e.g. anything, const or some register). And add code correspondingly in remove_pseudos for case nrefs==0 or scratch pseudo (cases when spill_hard_reg is not initialized). May be such code is the result of wrong transformations (e.g. incomplete reg equiv substitution chain), then it should be fixed somewhere else. It would be nice to have a small test case for this problem with compiler options to reproduce it.
Re: [PATCH] Fix -fsanitize=float-cast-overflow with C FE (PR sanitizer/64344)
On Thu, 18 Dec 2014, Jakub Jelinek wrote: c/ * c-typeck.c (convert_for_assignment, c_finish_return): For -fsanitize=float-cast-overflow casts from REAL_TYPE to integer/enum types also set in_late_binary_op around convert call. * c-convert.c (convert): For -fsanitize=float-cast-overflow REAL_TYPE to integral type casts, if not in_late_binary_op, pass c_fully_fold result on expr as last argument to ubsan_instrument_float_cast, if in_late_binary_op, don't use c_save_expr but save_expr. The C front-end changes are OK. -- Joseph S. Myers jos...@codesourcery.com
patch fixing a typo in LRA
The following patch fixes a typo in usage of register_move_cost. For most targets, the change does not matter as the cost function is symmetric relative to args. The patch was bootstrapped on x86-64. Committed as rev. 218875. 2014-12-18 Vladimir Makarov vmaka...@redhat.com * lra-constraints.c (lra-constraints.c): Exchange places of sclass and dclass. Index: lra-constraints.c === --- lra-constraints.c (revision 218835) +++ lra-constraints.c (working copy) @@ -3197,7 +3197,7 @@ simple_move_p (void) (sclass = get_op_class (src)) != NO_REGS /* The backend guarantees that register moves of cost 2 never need reloads. */ - targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2); + targetm.register_move_cost (GET_MODE (src), sclass, dclass) == 2); } /* Swap operands NOP and NOP + 1. */
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
On Thu, Dec 18, 2014 at 05:00:01PM +, Jiong Wang wrote: On 17/12/14 15:54, Richard Biener wrote: ick. I realize we don't have SSA form on RTL but doesn't DF provide at least some help in looking up definition statements for pseudos? In fact we want to restrict the transform to single-use pseudos, thus hopefully it can at least tell us that... (maybe not and this is what LOG_LINKS are for in combine...?) At least loop-invariant alreadly computes df_chain with DF_UD_CHAIN which seems exactly what is needed (apart from maybe getting at single-use info). thanks very much for these inspiring questions. yes, we want to restrict the transformation on single-use pseudo only, and it's better the transformation could re-use existed info and helper function to avoid increase compile time. but I haven't found anything I can reuse at the stage the transformation happen. the info similar as LOG_LINKS is what I want, but maybe simpler. I'd study the code about build LOG_LINKS, and try to see if we can do some factor out. LOG_LINKs in combine are just historical. combine should be converted to use DF fully. LOG_LINKs have nothing to do with single use; they point from the _first_ use to its corresponding def. You might want to look at what fwprop does instead. Segher
C++ PATCH for c++/64251 (unnecessary template instantiation)
Here we were seeing errors because of template instantiation triggered by mark_used within fold_non_dependent_expr. We don't want to instantiate templates based on uses in other uninstantiated templates, and we were already trying to avoid that; let's add another check to avoid it. Tested x86_64-pc-linux-gnu, applying to trunk. commit 619400bca71f10d05e6adfaaef53c67997fb3f57 Author: Jason Merrill ja...@redhat.com Date: Thu Dec 18 12:10:38 2014 -0500 PR c++/64251 * decl2.c (mark_used): Don't mark if in_template_function. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index b2123f2..69201b0 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -5017,7 +5017,7 @@ mark_used (tree decl, tsubst_flags_t complain) --function_depth; } - if (processing_template_decl) + if (processing_template_decl || in_template_function ()) return true; /* Check this too in case we're within instantiate_non_dependent_expr. */ diff --git a/gcc/testsuite/g++.dg/template/non-dependent14.C b/gcc/testsuite/g++.dg/template/non-dependent14.C new file mode 100644 index 000..b257d9b --- /dev/null +++ b/gcc/testsuite/g++.dg/template/non-dependent14.C @@ -0,0 +1,7 @@ +// PR c++/64251 + +class DictionaryValue {}; +template typename T void CreateValue(T) { + DictionaryValue(0); + CreateValue(0); +}
C++ PATCH for c++/64352 (decltype and SFINAE)
We were just missing a case where we want to pass complain down to avoid a hard error in SFINAE context. Tested x86_64-pc-linux-gnu, applying to trunk. commit 83bea8460ce8e14e5b9d1c0dc33a0d22b63b566d Author: Jason Merrill ja...@redhat.com Date: Thu Dec 18 12:42:34 2014 -0500 PR c++/64352 * pt.c (tsubst_copy_and_build): Pass complain to mark_used. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 8a663d9..3ddee25 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -15125,7 +15125,7 @@ tsubst_copy_and_build (tree t, /* Remember that there was a reference to this entity. */ if (DECL_P (function)) - mark_used (function); + mark_used (function, complain); /* Put back tf_decltype for the actual call. */ complain |= decltype_flag; diff --git a/gcc/testsuite/g++.dg/cpp0x/deleted9.C b/gcc/testsuite/g++.dg/cpp0x/deleted9.C new file mode 100644 index 000..af97be7 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/deleted9.C @@ -0,0 +1,31 @@ +// PR c++/64352 +// { dg-do compile { target c++11 } } + +templatebool B struct bool_type +{ static constexpr bool value = B; }; + +using true_type = bool_typetrue; +using false_type = bool_typefalse; + +templatetypename T T declval(); + +templatetypename... struct void_ { using type = void; }; +templatetypename... I using void_t = typename void_I...::type; + +templatetypename _Tp, typename = void +struct _Has_addressof_free: false_type { }; + +templatetypename _Tp +struct _Has_addressof_free +_Tp, void_tdecltype( operator(declvalconst _Tp()) ) +: true_type { }; + +struct foo {}; +void operator(foo) = delete; + +int main() +{ +static_assert( !_Has_addressof_freeint::value, ); +// error: use of deleted function 'void operator(foo)' +static_assert( !_Has_addressof_freefoo::value, ); +}
Re: [patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.
On 12/18/2014 01:16 PM, Kai Tietz wrote: Well, in general I would have assumed to be able to get alias decl of tmpl. Wasn't able to find a simple way to get it. So, by looking into source I found that most cases handling args tmpl-args by using inner_most_template_args instead. Defaulting to innermost_template_args looks indeed a bit wrong, but seemed to work. It's wrong, it just papers over the bug. There shouldn't be a mismatch at this point. The problem seems to be that most_general_template isn't actually returning the most general template, so we're trying to instantiate the partially-instantiated alias template with a full set of arguments: thus the mismatch. Jason
Re: flatten expr.h
Hi, I had a discussion with Michael, and would like to put this patch on hold for now till we have tree.h and rtl.h flattening patches checked in to avoid conflicts. Thankyou, Prathamesh On 16 December 2014 at 17:13, Prathamesh Kulkarni prathamesh.kulka...@linaro.org wrote: Hi, The attached patch flattens expr.h. Moved stmt.c prototypes to stmt.h and expmed.c prototypes to expmed.h. Rest of the patch involves restructuring includes. Thank you, Prathamesh
Speedup and cleanup hash-table.h
Hi, this patch started as experiment moving hash_table_mod1 inline because it shows high in streaming profiles and it represents a branch-less code that is good to schedule to surrounding instructions. While looking at hash-tab.h I however noticed that it was not updated very well while it was moved from libiberty.h. It still assumes lack of 64bit integers to be possible and also it uses abort() calls instead of gcc_asserts. Fixed thus. Bootstrapped/regtested x86_64-linux, also checked that cc1plus binary shirnks by about 15kb despite the extra inlining. Will commit it shortly. Honza * hash-table.h (struct pointer_hash): Fix formating. (hash_table_higher_prime_index): Declare pure. (hash_table_mod2, hash_table_mod1, mul_mod): Move inline; assume that uint64_t always exists. (hash_tableDescriptor, Allocator, false): Use gcc_checking_assert. (hash_tableDescriptor, Allocator, false::expand ()): Fix formating. (hash_tableDescriptor, Allocator, false::clear_slot (value_type **slot)): Use checking assert. * hash-table.c: Remvoe #if 0 code. (hash_table_higher_prime_index): Use gcc_assert. (mul_mod, hash-table_mod1, hash_table_mod2): move to hash-table.h Index: hash-table.h === --- hash-table.h(revision 218795) +++ hash-table.h(working copy) @@ -282,7 +282,8 @@ struct pointer_hash : typed_noop_remove static inline hashval_t hash (const value_type ); - static inline bool equal (const value_type existing, const compare_type candidate); + static inline bool equal (const value_type existing, + const compare_type candidate); }; template typename Type @@ -388,9 +389,54 @@ extern struct prime_ent const prime_tab[ /* Functions for computing hash table indexes. */ -extern unsigned int hash_table_higher_prime_index (unsigned long n); -extern hashval_t hash_table_mod1 (hashval_t hash, unsigned int index); -extern hashval_t hash_table_mod2 (hashval_t hash, unsigned int index); +extern unsigned int hash_table_higher_prime_index (unsigned long n) + ATTRIBUTE_PURE; + +/* Return X % Y using multiplicative inverse values INV and SHIFT. + + The multiplicative inverses computed above are for 32-bit types, + and requires that we be able to compute a highpart multiply. + + FIX: I am not at all convinced that + 3 loads, 2 multiplications, 3 shifts, and 3 additions + will be faster than + 1 load and 1 modulus + on modern systems running a compiler. */ + +inline hashval_t +mul_mod (hashval_t x, hashval_t y, hashval_t inv, int shift) +{ + hashval_t t1, t2, t3, t4, q, r; + + t1 = ((uint64_t)x * inv) 32; + t2 = x - t1; + t3 = t2 1; + t4 = t1 + t3; + q = t4 shift; + r = x - (q * y); + + return r; +} + +/* Compute the primary table index for HASH given current prime index. */ + +inline hashval_t +hash_table_mod1 (hashval_t hash, unsigned int index) +{ + const struct prime_ent *p = prime_tab[index]; + gcc_checking_assert (sizeof (hashval_t) * CHAR_BIT = 32); +return mul_mod (hash, p-prime, p-inv, p-shift); +} + +/* Compute the secondary table index for HASH given current prime index. */ + +inline hashval_t +hash_table_mod2 (hashval_t hash, unsigned int index) +{ + const struct prime_ent *p = prime_tab[index]; + gcc_checking_assert (sizeof (hashval_t) * CHAR_BIT = 32); + return 1 + mul_mod (hash, p-prime - 2, p-inv_m2, p-shift); +} /* The below is some template meta programming to decide if we should use the hash table partial specialization that directly stores value_type instead of @@ -748,8 +794,7 @@ hash_tableDescriptor, Allocator, false if (*slot == HTAB_EMPTY_ENTRY) return slot; - else if (*slot == HTAB_DELETED_ENTRY) -abort (); + gcc_checking_assert (*slot != HTAB_DELETED_ENTRY); hash2 = hash_table_mod2 (hash, m_size_prime_index); for (;;) @@ -761,8 +806,7 @@ hash_tableDescriptor, Allocator, false slot = m_entries + index; if (*slot == HTAB_EMPTY_ENTRY) return slot; - else if (*slot == HTAB_DELETED_ENTRY) -abort (); + gcc_checking_assert (*slot != HTAB_DELETED_ENTRY); } } @@ -773,7 +817,7 @@ hash_tableDescriptor, Allocator, false table entries is changed. If memory allocation fails, this function will abort. */ - templatetypename Descriptor, templatetypename Type class Allocator +templatetypename Descriptor, templatetypename Type class Allocator void hash_tableDescriptor, Allocator, false::expand () { @@ -862,9 +906,9 @@ templatetypename Descriptor, templatet void hash_tableDescriptor, Allocator, false::clear_slot (value_type **slot) { - if (slot m_entries || slot = m_entries + size () - || *slot == HTAB_EMPTY_ENTRY || *slot == HTAB_DELETED_ENTRY) -abort (); + gcc_checking_assert (!(slot m_entries || slot = m_entries + size () +
C++ PATCH for c++/64105 (ICE on generic lambda with -std=c++11)
Since generic lambdas don't seem to work properly in C++11 mode, let's upgrade the pedwarn to an error. Tested x86_64-pc-linux-gnu, applying to trunk. commit 6c7d0ebb35b4918be3200625ef0cbb1816ddd7e9 Author: Jason Merrill ja...@redhat.com Date: Thu Dec 18 16:56:33 2014 -0500 PR c++/64105 * parser.c (cp_parser_simple_type_specifier): Make auto parameter before -std=c++14 an error. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 0e7ba7a..8ff16ed 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -14862,23 +14862,26 @@ cp_parser_simple_type_specifier (cp_parser* parser, maybe_warn_cpp0x (CPP0X_AUTO); if (parser-auto_is_implicit_function_template_parm_p) { - type = synthesize_implicit_template_parm (parser); + if (cxx_dialect = cxx14) + type = synthesize_implicit_template_parm (parser); + else + type = error_mark_node; if (current_class_type LAMBDA_TYPE_P (current_class_type)) { if (cxx_dialect cxx14) - pedwarn (location_of (type), 0, + error_at (token-location, use of %auto% in lambda parameter declaration only available with -std=c++14 or -std=gnu++14); } else if (cxx_dialect cxx14) - pedwarn (location_of (type), 0, + error_at (token-location, use of %auto% in parameter declaration only available with -std=c++14 or -std=gnu++14); else - pedwarn (location_of (type), OPT_Wpedantic, + pedwarn (token-location, OPT_Wpedantic, ISO C++ forbids use of %auto% in parameter declaration); } @@ -14971,6 +14974,9 @@ cp_parser_simple_type_specifier (cp_parser* parser, /* Consume the token. */ cp_lexer_consume_token (parser-lexer); + if (type == error_mark_node) + return error_mark_node; + /* There is no valid C++ program where a non-template type is followed by a . That usually indicates that the user thought that the type was a template. */ diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic2.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic2.C new file mode 100644 index 000..2808aa6 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic2.C @@ -0,0 +1,22 @@ +// PR c++/64105 +// This test was ICEing on pre-C++14 mode. + +#include functional + +using F = std::functionvoid(void); + +struct X +{ + template typename T + static void f(T t) + { +g(t); + } + + static void g(F) {} +}; + +int main() +{ + X::f([](auto... xs){}); // { dg-error { target { ! cxx14 } } } +};
Flatten tree.h and tree-core.h
This patch flattens tree.h and tree-core.h. This work is part of the GCC Re-Architecture effort being led by Andrew MacLeod. I removed the includes in tree.h and tree-core.h except for the include of tree-core.h in tree.h. I modified genattrtab.c, genautomata.c, genemit.c, gengtype.c, gengtype.c, genoptinit.c, genoutput.c, genpeep.c, genpreds.c, and optc-save-gen-awk to include the the necessary include files removed from tree.h and tree-core.h when generating their respective files. All other changes include the necessary include files removed from tree.h and tree-core.h. Note the patches modifies all the front-ends. I bootstrapped on x86 with all languages. I also bootstrapped on all targets listed in contrib/config-list.mk with c and c++ enabled. Is this okay for trunk? 2014-12-18 Michael Collison michael.colli...@linaro.org * genattrtab.c (write_header): Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating insn-attrtab.c. * genautomata.c (main) : Include hash-set.h, macInclude hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating insn-automata.c. * genemit.c (main): Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating insn-emit.c. * gengtype.c (open_base_files): Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating gtype-desc.c. * genopinit.c (main): Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating insn-opinit.c. * genoutput.c (output_prologue): Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating insn-output.c. * genpeep.c (main): Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating insn-peep.c. * genpreds.c (write_insn_preds_c): Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating insn-preds.c. * optc-save-gen-awk: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating options-save.c. * opth-gen.awk: Change include guard from GCC_C_COMMON_H to GCC_C_COMMON_C when generating options.h. 2014-12-18 Michael Collison michael.colli...@linaro.org * ada/gcc-interface/cuintp.c: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, tree-core.h, fold-const.h, wide-int.h, and inchash.h due to flattening of tree.h. * ada/gcc-interface/decl.c: ditto. * ada/gcc-interface/misc.c: ditto. * ada/gcc-interface/targtyps.c: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h, tree-core.h, fold-const.h, wide-int.h, and inchash.h due to flattening of tree.h. * ada/gcc-interface/trans.c: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, real.h, tree-core.h, fold-const.h, wide-int.h, inchash.h due to flattening of tree.h. * ada/gcc-interface/utils.c: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, tree-core.h, fold-const.h, wide-int.h, and inchash.h due to flattening of tree.h. * ada/gcc-interface/utils2.c: ditto. * alias.c: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h due to flattening of tree.h. * asan.c: ditto. * attribs.c: ditto. * auto-inc-dec.c: ditto. * auto-profile.c: ditto * bb-reorder.c: ditto. * bt-load.c: Include symtab.h due to flattening of tree.h. * builtins.c: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, and inchash.h due to flattening of tree.h. * c/c-array-notation.c: ditto. * c/c-aux-info.c: ditto. * c/c-convert.c: ditto. * c/c-decl.c: ditto. * c/c-errors.c: ditto. * c/c-lang.c: dittoxs. * c/c-objc-common.c: ditto. * c/c-parser.c: ditto. * c/c-typeck.c: Include hash-set.h, machmode.h, vec.h, double-int.h, input.h, alias.h, symtab.h, options.h tree-core.h, fold-const.h, wide-int.h, inchash.h, real.h and
Re: Speedup and cleanup hash-table.h
Jan Hubicka hubi...@ucw.cz writes: Hi, this patch started as experiment moving hash_table_mod1 inline because it shows high in streaming profiles and it represents a branch-less code that is good to schedule to surrounding instructions. FWIW with a good modern hash function it shouldn't be needed to have prime hash table sizes anymore. Without that just a power of two size can be used, so it would be just a mask. I considered this last time I messed with hashes, but I didn't actually see this function as beening hot. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] -fsanitize-recover=list
Hi Jakub, On Tue, Nov 18, 2014 at 11:36 AM, Alexey Samsonov samso...@google.com wrote: On Mon, Nov 17, 2014 at 11:53 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Nov 17, 2014 at 11:39:47PM -0800, Alexey Samsonov wrote: That is not what I think we've agreed on and what is implemented in GCC. -fsanitize-recover only enables recovery of the undefined plus undefined-like sanitizers, in particular it doesn't enable recover from kernel-address, because -fsanitize-recover should be a deprecated option and kernel-address never used it before. Hm, indeed, I messed it up. In the older thread we agree that plain -f(no-)sanitize-recover should be a deprecated synonym for the current meaning of these flags, which only specify recoverability for UBSan-related checks :-/ Has GCC already shipped with existing implementation? I'm just curious if it's convenient to have flags that would enable/disable recovery for all sanitizers (analogous to -Werror flags which exist in a standalone form, but may accept specific warnings, so that one can write $(CC) foo.cc -Wno-error -Werror=sign-compare Well, no sanitizer is on by default, so you can just use the same list for -fno-sanitize-recover= or -fsanitize-recover= as for -fsanitize= if you want. Yeah, but it may look somewhat redundant. Also, re-using the exact same list may lead to diagnostic messages (if you write -fsanitize=unreachable,null -fsanitize-recover=unreachable,null). Reviving this thread. What do you think of the following idea: 1) we keep -fsanitize-recover and -fno-sanitize-recover as deprecated synonyms for -f(no-)?sanitize=ubsan-like checks 2) we introduce -fsanitize-recover=list and -fno-sanitize-recover=list, where list may contain specific sanitizers (address) or group of sanitizers (undefined). 3) we introduce group of sanitizers called all, which is, well, all available sanitizers. It can *not* be used in -fsanitize=all flag, but can be used to easily disable all previously enabled sanitizers (-fno-sanitize=all), or to enable/disable recovery for all enabled sanitizers (-fsanitize-recover=all / -fno-sanitize-recover=all). This would be a handy shortcut, and will save users from copying the same list twice for -fsanitize= and -fsanitize-recover= flags, and from potential diagnostic problems I mention. So, in GCC -fsanitize-recover stands for -fsanitize-recover=undefined,float-divide-by-zero,float-cast-overflow and -fno-sanitize-recover stands for -fno-sanitize-recover=undefined,float-divide-by-zero,float-cast-overflow * -fsanitize-recover=list: Enable recovery for all selected checks or group of checks. It is forbidden to list unrecoverable sanitizers here (e.g., -fsanitize-recover=address will produce an error). We only error on -fsanitize-recover=address -fsanitize-recover=thread -fsanitize-recover=leak Right, it's a good idea to error on sanitizer kinds we don't have recovery for. I will add the errors for TSan/MSan/LSan etc. but not say on -fsanitize-recover=unreachable which is part of undefined; unreachable isn't recoverable silently. Likewise -fsanitize-recover=return. Otherwise one couldn't use -fsanitize-recover=undefined which is useful. Can't this be fixed by checking the actual values of -fsanitize-recover= flag before expanding the sanitizer groups (i.e. turning undefined into null,unreachable,return,)? We should probably be able to distinguish between -fsanitize-recover=undefined, and -fsanitize-recover=unreachable,another checks from undefined. In the first case we can enable recovery for all parts of undefined that support it. In the second, we can produce an error as user explicitly tried to enable recovery for sanitizer that doesn't support it. But in that case you would need to diagnose it already when parsing the option, or add code that would remember what bits have been set from other option sets. In the former case, you'd diagnose even -fsanitize-recover=unreachable -fno-sanitize-recover=undefined even when in that case unreachable in the end is not recoverable. We don't error out for -fsanitize-recover=address -fno-sanitize-recover=address because in the end address is not recovered. OK, that's a good question: should we diagnose -fsanitize-recover=address if it was later disabled by -fno-sanitize-recover=address? There is an argument for doing this: -fsanitize-recover=address is unsupported, so this flag shouldn't have been provided in the first place. It makes much more sense to delete it rather than override it. It couldn't be passed down from some global project-wide configuration (like CFLAGS), because it wouldn't work anywhere. The only case where overriding might come in handy is re-using the same list for -fsanitize= and -fsanitize-recover= flags that you mention: # SANITIZERS is a list of sanitizers we want to enable.