Re: copyright dates in binutils (and includes/)
On Thu, Feb 27, 2014 at 06:47:17PM +, Joseph S. Myers wrote:
On Thu, 27 Feb 2014, Joel Brobecker wrote:

I should mention, however, that for us to use ranges like this, the FSF asked us to add a note explaining that the copyright years could be abbreviated into a range. See gdb/README (at the end). I suspect that you'll need the same note for binutils.

Thanks Joel. I'll copy that or the gcc wording.

And, where a gap in the years is being implicitly filled in by conversion to a range, make sure that either (a) there was a public version control repository for binutils during that year, or (b) there was a release (including beta releases, Cygnus releases etc., not just official releases) during that year.

It looks like the earliest binutils files that are edited by update-copyright.py have copyright dates starting at 1985. Of those, quite a few have skipped years. eg. binutils/filemode.c is Copyright 1985, 1990,...

So, CVS goes back to 1991, and there are copies of old binutils releases for all years from 1988 to 2002 except for 1999 at ftp://sourceware.org/pub/binutils/old-releases/

Joseph, do you know why implicitly adding years to the claimed copyright years is a problem? I'm guessing the file needs to be published somewhere for each year claimed.

--
Alan Modra
Australia Development Lab, IBM
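The (a)/(b) condition above can be checked mechanically. The following is only a sketch: the year figures are the ones reported in this message (CVS from 1991, archived releases for 1988 to 2002 except 1999), the function names are made up for illustration, and it assumes the version control history is continuous from 1991 onward, which the message does not state explicitly.

```c
#include <stdbool.h>

/* Figures taken from the message above; treat them as assumptions. */
enum {
  CVS_START_YEAR = 1991,        /* "CVS goes back to 1991" */
  OLD_RELEASES_FIRST = 1988,    /* archived releases: 1988..2002 */
  OLD_RELEASES_LAST = 2002,
  OLD_RELEASES_MISSING = 1999   /* no archived release for this year */
};

/* Condition (a): a public version control repository existed that year. */
static bool has_public_vcs(int year)
{
  return year >= CVS_START_YEAR;
}

/* Condition (b): some release is known for that year. */
static bool has_release(int year)
{
  return year >= OLD_RELEASES_FIRST && year <= OLD_RELEASES_LAST
         && year != OLD_RELEASES_MISSING;
}

/* A year may be claimed implicitly if (a) or (b) holds. */
bool year_is_covered(int year)
{
  return has_public_vcs(year) || has_release(year);
}

/* True when every year strictly between two listed copyright years
   is covered, so "from, to" could be written as a range.  */
bool gap_can_be_filled(int from, int to)
{
  for (int y = from + 1; y < to; y++)
    if (!year_is_covered(y))
      return false;
  return true;
}
```

Under these assumptions, the gap in "Copyright 1985, 1990" cannot be filled, because 1986 and 1987 have neither a repository nor an archived release, while 1999 is still covered by CVS.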
RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
With the locality value received in the instruction pattern, I think it would be safe to handle them in prefetch instruction. This helps especially AArch64 has prefetch instructions that can handle this locality. +(define_insn prefetch + [(prefetch (match_operand:DI 0 address_operand r) +(match_operand:QI 1 const_int_operand n) +(match_operand:QI 2 const_int_operand n))] + + * +{ + int locality = INTVAL (operands[2]); + + gcc_assert (IN_RANGE (locality, 0, 3)); + + if (locality == 0) + /* non temporal locality */ + return (INTVAL(operands[1])) ? \prfm\\tPSTL1STRM, [%0, #0]\ : \prfm\\tPLDL1STRM, [%0, #0]\; + + /* temporal locality */ + return (INTVAL(operands[1])) ? \prfm\\tPSTL%2KEEP, [%0, #0]\ : \prfm\\tPLDL%2KEEP, [%0, #0]\; +} + [(set_attr type prefetch)] +) + I also have attached a patch that implements * Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). Added a predicate for this. * Prefetch with immediate offset - in the range -256 to 255 (Gets generated only when we have a negative offset. Generates prfum instruction). Added a predicate for this. * Prefetch with register offset. (modified for printing the locality) Regards Ganesh -Original Message- From: Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Wednesday, February 19, 2014 2:40 AM To: gcc-patches@gcc.gnu.org Cc: philipp.toms...@theobroma-systems.com Subject: [AArch64 05/14] Add AArch64 'prefetch'-pattern. 
--- gcc/config/aarch64/aarch64.md | 17 + gcc/config/arm/types.md | 2 ++ 2 files changed, 19 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 99a6ac8..b972a1b 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -293,6 +293,23 @@ [(set_attr type no_insn)] ) +(define_insn prefetch + [(prefetch (match_operand:DI 0 register_operand r) +(match_operand:QI 1 const_int_operand n) +(match_operand:QI 2 const_int_operand n))] + + * +{ + if (INTVAL(operands[2]) == 0) + /* no temporal locality */ + return (INTVAL(operands[1])) ? \prfm\\tPSTL1STRM, [%0, #0]\ : +\prfm\\tPLDL1STRM, [%0, #0]\; + + /* temporal locality */ + return (INTVAL(operands[1])) ? \prfm\\tPSTL1KEEP, [%0, #0]\ : +\prfm\\tPLDL1KEEP, [%0, #0]\; } + [(set_attr type prefetch)] +) + (define_insn trap [(trap_if (const_int 1) (const_int 8))] diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index cc39cd1..1d1280d 100644 --- a/gcc/config/arm/types.md +++ b/gcc/config/arm/types.md @@ -117,6 +117,7 @@ ; mvn_shift_reg inverting move instruction, shifted operand by a register. ; no_insnan insn which does not represent an instruction in the ;final output, thus having no impact on scheduling. +; prefetch a prefetch instruction ; rbit reverse bits. ; revreverse bytes. ; sdiv signed division. @@ -553,6 +554,7 @@ call,\ clz,\ no_insn,\ + prefetch,\ csel,\ crc,\ extend,\ -- 1.9.0 prefetchdiff.log Description: prefetchdiff.log
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. This change: if (dest-loop_father-header == dest) - return false; + { +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + bb-loop_father-header != dest) + return false; + +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) + bb-loop_father-header == dest) + return false; + } } miscompiled 435.gromacs in SPEC CPU 2006 on x32 with -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver -fuse-linker-plugin This patch changes loops without LOOPS_HAVE_PREHEADERS nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning true. I don't have a small testcase. 
But this patch: diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c index b5c384b..2ba673c 100644 --- a/gcc/tree-cfgcleanup.c +++ b/gcc/tree-cfgcleanup.c @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted) if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) bb-loop_father-header == dest) return false; + +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)) + return false; } } fixes the regression. Does it make any senses? I think the preheader test isn't fully correct (bb may be in an inner loop for example). So a more conservative variant would be Index: gcc/tree-cfgcleanup.c === --- gcc/tree-cfgcleanup.c (revision 208169) +++ gcc/tree-cfgcleanup.c (working copy) @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb, /* Protect loop preheaders and latches if requested. */ if (dest-loop_father-header == dest) { - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) - bb-loop_father-header != dest) - return false; - - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) - bb-loop_father-header == dest) - return false; + if (bb-loop_father == dest-loop_father) + return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES); + else if (bb-loop_father == loop_outer (dest-loop_father)) + return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS); + /* Always preserve other edges into loop headers that are +not simple latches or preheaders. */ + return false; } } that makes sure we can properly update loop information. It's also a more conservative change at this point which should still successfully remove simple latches and preheaders created by loop discovery. Does it fix 435.gromacs? Thanks, Richard. -- H.J.
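To make the three cases in Richard's conservative variant easier to follow, here is a small stand-alone model of the decision. It is a sketch, not GCC code: the structs, the flag values and the function name may_remove_forwarder are simplified stand-ins invented for illustration, and only the loop-header check is modeled, not the rest of tree_forwarder_block_p.

```c
#include <stdbool.h>
#include <stddef.h>

struct basic_block;

/* Simplified stand-ins for GCC's loop and basic block structures. */
struct loop {
  struct loop *outer;            /* enclosing loop, NULL for the root */
  struct basic_block *header;    /* header block of this loop */
};

struct basic_block {
  struct loop *loop_father;      /* innermost loop containing the block */
};

/* Hypothetical flag values standing in for the loops state bits. */
enum {
  HAVE_PREHEADERS = 1 << 0,
  HAVE_SIMPLE_LATCHES = 1 << 1
};

/* Returns true when the loop-header check does not object to removing
   the forwarder block BB whose single successor is DEST.  */
bool may_remove_forwarder(const struct basic_block *bb,
                          const struct basic_block *dest,
                          unsigned loops_state)
{
  /* DEST is not a loop header: this check has nothing to protect. */
  if (dest->loop_father->header != dest)
    return true;

  /* BB is in the same loop as the header, so it acts as a latch. */
  if (bb->loop_father == dest->loop_father)
    return !(loops_state & HAVE_SIMPLE_LATCHES);

  /* BB is in the immediately enclosing loop, so it acts as a preheader. */
  if (bb->loop_father == dest->loop_father->outer)
    return !(loops_state & HAVE_PREHEADERS);

  /* Any other edge into a loop header is always preserved. */
  return false;
}
```

The last case is the one the original preheader test missed: a block from some other loop (for example an inner loop) that happens to jump to the header is neither a latch nor a preheader, so it is kept regardless of the loops state.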
RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
Avoided top-posting and resending. + /* temporal locality */ + return (INTVAL(operands[1])) ? \prfm\\tPSTL1KEEP, [%0, #0]\ : +\prfm\\tPLDL1KEEP, [%0, #0]\; } + [(set_attr type prefetch)] +) + With the locality value received in the instruction pattern, I think it would be safe to handle them in prefetch instruction. This helps especially AArch64 has prefetch instructions that can handle this locality. +(define_insn prefetch + [(prefetch (match_operand:DI 0 address_operand r) +(match_operand:QI 1 const_int_operand n) +(match_operand:QI 2 const_int_operand n))] + + * +{ + int locality = INTVAL (operands[2]); + + gcc_assert (IN_RANGE (locality, 0, 3)); + + if (locality == 0) + /* non temporal locality */ + return (INTVAL(operands[1])) ? \prfm\\tPSTL1STRM, [%0, #0]\ : \prfm\\tPLDL1STRM, [%0, #0]\; + + /* temporal locality */ + return (INTVAL(operands[1])) ? \prfm\\tPSTL%2KEEP, [%0, #0]\ : \prfm\\tPLDL%2KEEP, [%0, #0]\; +} + [(set_attr type prefetch)] +) + I also have attached a patch that implements the following. * Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). Added a predicate for this. * Prefetch with immediate offset - in the range -256 to 255 (Gets generated only when we have a negative offset. Generates prfum instruction). Added a predicate for this. * Prefetch with register offset. (modified for printing the locality) Regards Ganesh prefetchdiff.log Description: prefetchdiff.log
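As a reading aid, the operand selection described above can be written as plain C. This is a sketch under assumptions: the helper names and the prf_form enum are invented here, the locality mapping follows the pattern quoted in this message (locality 0 selects the streaming hint, 1 to 3 are printed directly as the cache level), and the offset rules are taken from the bullet list rather than from the attached patch, which is not shown.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical classification of the addressing forms listed above. */
typedef enum { PRF_NONE, PRF_PRFM, PRF_PRFUM } prf_form;

/* Writes the prefetch operation name for the given access kind and
   locality into BUF.  Returns the snprintf result, or -1 when the
   locality is outside 0..3 (the pattern asserts on that range).  */
int prefetch_op_name(bool is_write, int locality, char *buf, size_t len)
{
  if (locality < 0 || locality > 3)
    return -1;

  const char *kind = is_write ? "PST" : "PLD";

  /* Locality 0: no temporal locality, use the streaming hint. */
  if (locality == 0)
    return snprintf(buf, len, "%sL1STRM", kind);

  /* Localities 1..3 are printed as the cache level, as with %2 above. */
  return snprintf(buf, len, "%sL%dKEEP", kind, locality);
}

/* Picks the instruction form for an immediate offset.  */
prf_form prefetch_offset_form(long offset)
{
  /* prfm: 0 to 32760, multiple of 8. */
  if (offset >= 0 && offset <= 32760 && offset % 8 == 0)
    return PRF_PRFM;

  /* prfum: range -256 to 255, but per the message it is generated
     only for negative offsets.  */
  if (offset < 0 && offset >= -256)
    return PRF_PRFUM;

  /* Other offsets are not described in the message. */
  return PRF_NONE;
}
```

For example, a write prefetch with locality 2 yields PSTL2KEEP, and an offset of -8 selects the prfum form.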
[gomp4 1/2] Initial support for the OpenACC kernels construct: GIMPLE_OACC_KERNELS.
From: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 gcc/ * gimple.def (GIMPLE_OACC_KERNELS): New code. * doc/gimple.texi: Document it. * gimple.h (gimple_has_substatements, CASE_GIMPLE_OMP) (is_gimple_omp_oacc_specifically): Handle it. (gimple_statement_oacc_kernels): New struct. (gimple_build_oacc_kernels): New prototype. (gimple_oacc_kernels_clauses, gimple_oacc_kernels_clauses_ptr) (gimple_oacc_kernels_set_clauses, gimple_oacc_kernels_child_fn) (gimple_oacc_kernels_child_fn_ptr) (gimple_oacc_kernels_set_child_fn, gimple_oacc_kernels_data_arg) (gimple_oacc_kernels_data_arg_ptr) (gimple_oacc_kernels_set_data_arg): New inline functions. * gimple.c (gimple_build_oacc_kernels): New function. (gimple_copy): Handle GIMPLE_OACC_KERNELS. * gimple-low.c (lower_stmt): Likewise. * gimple-walk.c (walk_gimple_op, walk_gimple_stmt): Likewise. * gimple-pretty-print.c (pp_gimple_stmt_1): Likewise. (dump_gimple_oacc_parallel): Rename to dump_gimple_oacc_offload. Also handle GIMPLE_OACC_KERNELS. Update all callers. * gimplify.c (gimplify_omp_workshare, gimplify_expr): Handle OACC_KERNELS. * oacc-builtins.def (BUILT_IN_GOACC_KERNELS): New builtin. * omp-low.c (scan_oacc_parallel, expand_oacc_parallel) (lower_oacc_parallel): Rename to scan_oacc_offload, expand_oacc_offload, and lower_oacc_offload. Also handle GIMPLE_OACC_KERNELS. Update all callers. (scan_sharing_clauses, scan_omp_1_stmt, expand_omp, lower_omp_1) (diagnose_sb_0, diagnose_sb_1, diagnose_sb_2) (make_gimple_omp_edges): Handle GIMPLE_OACC_KERNELS. * tree-inline.c (remap_gimple_stmt, estimate_num_insns): Likewise. * tree-nested.c (convert_nonlocal_reference_stmt) (convert_local_reference_stmt, convert_tramp_reference_stmt) (convert_gimple_call): Likewise. libgomp/ * libgomp.map (GOACC_2.0): Add GOACC_kernels. * libgomp_g.h (GOACC_kernels): New prototype. * oacc-parallel.c (GOACC_kernels): New function. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208215 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp| 36 + gcc/doc/gimple.texi | 7 +++ gcc/gimple-low.c | 1 + gcc/gimple-pretty-print.c | 48 - gcc/gimple-walk.c | 16 ++ gcc/gimple.c | 18 +++ gcc/gimple.def| 22 +++- gcc/gimple.h | 130 -- gcc/gimplify.c| 6 ++- gcc/oacc-builtins.def | 6 ++- gcc/omp-low.c | 116 - gcc/tree-inline.c | 2 + gcc/tree-nested.c | 4 ++ libgomp/ChangeLog.gomp| 6 +++ libgomp/libgomp.map | 1 + libgomp/libgomp_g.h | 6 ++- libgomp/oacc-parallel.c | 12 - 17 files changed, 389 insertions(+), 48 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 3d9b06d..79030d6 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,39 @@ +2014-02-28 Thomas Schwinge tho...@codesourcery.com + + * gimple.def (GIMPLE_OACC_KERNELS): New code. + * doc/gimple.texi: Document it. + * gimple.h (gimple_has_substatements, CASE_GIMPLE_OMP) + (is_gimple_omp_oacc_specifically): Handle it. + (gimple_statement_oacc_kernels): New struct. + (gimple_build_oacc_kernels): New prototype. + (gimple_oacc_kernels_clauses, gimple_oacc_kernels_clauses_ptr) + (gimple_oacc_kernels_set_clauses, gimple_oacc_kernels_child_fn) + (gimple_oacc_kernels_child_fn_ptr) + (gimple_oacc_kernels_set_child_fn, gimple_oacc_kernels_data_arg) + (gimple_oacc_kernels_data_arg_ptr) + (gimple_oacc_kernels_set_data_arg): New inline functions. + * gimple.c (gimple_build_oacc_kernels): New function. + (gimple_copy): Handle GIMPLE_OACC_KERNELS. + * gimple-low.c (lower_stmt): Likewise. + * gimple-walk.c (walk_gimple_op, walk_gimple_stmt): Likewise. + * gimple-pretty-print.c (pp_gimple_stmt_1): Likewise. + (dump_gimple_oacc_parallel): Rename to dump_gimple_oacc_offload. + Also handle GIMPLE_OACC_KERNELS. Update all callers. + * gimplify.c (gimplify_omp_workshare, gimplify_expr): Handle + OACC_KERNELS. + * oacc-builtins.def (BUILT_IN_GOACC_KERNELS): New builtin. 
+ * omp-low.c (scan_oacc_parallel, expand_oacc_parallel) + (lower_oacc_parallel): Rename to scan_oacc_offload, + expand_oacc_offload, and lower_oacc_offload. Also handle + GIMPLE_OACC_KERNELS. Update all callers. + (scan_sharing_clauses, scan_omp_1_stmt, expand_omp, lower_omp_1) + (diagnose_sb_0, diagnose_sb_1, diagnose_sb_2) + (make_gimple_omp_edges): Handle
[gomp4 2/2] Initial support for the OpenACC kernels construct in the C front end.
From: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 gcc/c-family/ * c-pragma.c (oacc_pragmas): Add kernels. * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_KERNELS. gcc/c/ * c-parser.c (OACC_KERNELS_CLAUSE_MASK): New macro definition. (c_parser_oacc_kernels): New function. (c_parser_omp_construct): Handle PRAGMA_OACC_KERNELS. * c-tree.h (c_finish_oacc_kernels): New prototype. * c-typeck.c (c_finish_oacc_kernels): New function. gcc/testsuite/ * c-c++-common/goacc-gomp/nesting-fail-1.c: Extend for OpenACC kernels construct. * c-c++-common/goacc/clauses-fail.c: Likewise. * c-c++-common/goacc/data-clause-duplicate-1.c: Likewise. * c-c++-common/goacc/deviceptr-1.c: Likewise. * c-c++-common/goacc/nesting-fail-1.c: Likewise. * c-c++-common/goacc/kernels-1.c: New file. * gcc.dg/goacc/parallel-sb-1.c: Rename to... * gcc.dg/goacc/sb-1.c: ... this new file, and extend for OpenACC kernels and data constructs. * gcc.dg/goacc/parallel-sb-2.c: Rename to... * gcc.dg/goacc/sb-2.c: ... this new file, and extend for OpenACC kernels and data constructs. libgomp/ * testsuite/libgomp.oacc-c/goacc_kernels.c: New file. * testsuite/libgomp.oacc-c/kernels-1.c: Likewise. * testsuite/libgomp.oacc-c/parallel-1.c: Add one missing test. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208216 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/c-family/ChangeLog.gomp| 5 + gcc/c-family/c-pragma.c| 1 + gcc/c-family/c-pragma.h| 1 + gcc/c/ChangeLog.gomp | 8 + gcc/c/c-parser.c | 42 + gcc/c/c-tree.h | 1 + gcc/c/c-typeck.c | 19 +++ gcc/testsuite/ChangeLog.gomp | 16 ++ .../c-c++-common/goacc-gomp/nesting-fail-1.c | 84 ++ gcc/testsuite/c-c++-common/goacc/clauses-fail.c| 3 + .../c-c++-common/goacc/data-clause-duplicate-1.c | 4 +- gcc/testsuite/c-c++-common/goacc/deviceptr-1.c | 18 +-- gcc/testsuite/c-c++-common/goacc/kernels-1.c | 6 + gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c | 20 +++ gcc/testsuite/gcc.dg/goacc/parallel-sb-1.c | 22 --- gcc/testsuite/gcc.dg/goacc/parallel-sb-2.c | 10 -- gcc/testsuite/gcc.dg/goacc/sb-1.c | 54 +++ gcc/testsuite/gcc.dg/goacc/sb-2.c | 22 +++ libgomp/ChangeLog.gomp | 4 + libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c | 25 +++ libgomp/testsuite/libgomp.oacc-c/kernels-1.c | 170 + libgomp/testsuite/libgomp.oacc-c/parallel-1.c | 14 ++ 22 files changed, 506 insertions(+), 43 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-1.c delete mode 100644 gcc/testsuite/gcc.dg/goacc/parallel-sb-1.c delete mode 100644 gcc/testsuite/gcc.dg/goacc/parallel-sb-2.c create mode 100644 gcc/testsuite/gcc.dg/goacc/sb-1.c create mode 100644 gcc/testsuite/gcc.dg/goacc/sb-2.c create mode 100644 libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c create mode 100644 libgomp/testsuite/libgomp.oacc-c/kernels-1.c diff --git gcc/c-family/ChangeLog.gomp gcc/c-family/ChangeLog.gomp index 3da377f..3b4a335 100644 --- gcc/c-family/ChangeLog.gomp +++ gcc/c-family/ChangeLog.gomp @@ -1,3 +1,8 @@ +2014-02-28 Thomas Schwinge tho...@codesourcery.com + + * c-pragma.c (oacc_pragmas): Add kernels. + * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_KERNELS. + 2014-02-21 Thomas Schwinge tho...@codesourcery.com * c-pragma.c (oacc_pragmas): Add data. 
diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c index 08374aa..ee0ee93 100644 --- gcc/c-family/c-pragma.c +++ gcc/c-family/c-pragma.c @@ -1170,6 +1170,7 @@ static vecpragma_ns_name registered_pp_pragmas; struct omp_pragma_def { const char *name; unsigned int id; }; static const struct omp_pragma_def oacc_pragmas[] = { { data, PRAGMA_OACC_DATA }, + { kernels, PRAGMA_OACC_KERNELS }, { parallel, PRAGMA_OACC_PARALLEL }, }; static const struct omp_pragma_def omp_pragmas[] = { diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h index d092f9f..d55a511 100644 --- gcc/c-family/c-pragma.h +++ gcc/c-family/c-pragma.h @@ -28,6 +28,7 @@ typedef enum pragma_kind { PRAGMA_NONE = 0, PRAGMA_OACC_DATA, + PRAGMA_OACC_KERNELS, PRAGMA_OACC_PARALLEL, PRAGMA_OMP_ATOMIC, PRAGMA_OMP_BARRIER, diff --git gcc/c/ChangeLog.gomp gcc/c/ChangeLog.gomp index 9b95725..0551026 100644 --- gcc/c/ChangeLog.gomp +++ gcc/c/ChangeLog.gomp @@ -1,3 +1,11 @@ +2014-02-28 Thomas Schwinge tho...@codesourcery.com + + * c-parser.c (OACC_KERNELS_CLAUSE_MASK): New macro definition. +
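For readers unfamiliar with the construct this series adds to the C front end, here is a minimal user-level sketch of an OpenACC kernels region. The function name and the data clauses are illustrative only; which clauses the branch accepted at this revision is not shown in the excerpt above. A compiler without OpenACC support ignores the pragma (possibly with an unknown-pragma warning) and runs the loop on the host.

```c
/* Adds two vectors.  With OpenACC enabled, the region below may be
   offloaded; otherwise the pragma is ignored and the loop runs as
   ordinary host code.  */
void vector_add(int n, const float *a, const float *b, float *c)
{
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```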
Re: [AArch64 05/14] Add AArch64 'prefetch'-pattern.
Ganesh,

On 28 Feb 2014, at 10:13 , Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote:

I also have attached a patch that implements the following.
* Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). Added a predicate for this.
* Prefetch with immediate offset - in the range -256 to 255 (Gets generated only when we have a negative offset. Generates prfum instruction). Added a predicate for this.
* Prefetch with register offset. (modified for printing the locality)

These changes look good to me. We'll try them out on the benchmarks that caused us to add prefetching in the first place.

Best,
Philipp.
[Patch AArch64] Define TARGET_FLAGS_REGNUM
Hi,

This defines TARGET_FLAGS_REGNUM for AArch64 to be CC_REGNUM. This turns on the cmpelim pass after reload; in a few examples and a couple of benchmarks I noticed a number of comparisons getting deleted. A similar patch for AArch32 is being tested.

Tested cross with aarch64-none-elf on a model with no regressions.

Ok for stage1 ?

regards
Ramana

DATE Ramana Radhakrishnan ramana.radhakrish...@arm.com

* config/aarch64/aarch64.c (TARGET_FLAGS_REGNUM): Define.

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 901ad3de793c2dd6ca3a2458dc6268e56322400a..617f4de494b1c9fa366dcf4a9fc7f22e7d11642a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8536,6 +8536,9 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS aarch64_fixed_condition_code_regs
 
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM CC_REGNUM
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
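The kind of source that can benefit looks roughly like the function below. This is an illustrative sketch, not one of the examples or benchmarks mentioned in the message: the subtraction can be emitted as a flag-setting instruction, after which a separate compare against zero is redundant. Whether the comparison is actually removed depends on the compiler version, the options and the surrounding code.

```c
#include <stdbool.h>

/* Computes a - b, stores it through DIFF, and reports whether the
   result is zero.  On a target with a flags register, the subtract
   itself can set the flags that the zero test needs.  */
bool subtract_and_test(unsigned a, unsigned b, unsigned *diff)
{
  unsigned d = a - b;   /* candidate for a flag-setting subtract */
  *diff = d;
  return d == 0;        /* the compare that may be eliminated */
}
```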
Re: [PATCH/AARCH64 1/3] Add AARCH64 ILP32 PCH support
On 26/02/14 02:25, Andrew Pinski wrote:

Hi,
Just like most of the targets out there we should define TRY_EMPTY_VM_SPACE to have better PCH support.

OK? Built and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

* config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32 definition.
---
 gcc/ChangeLog |5 +
 gcc/config/host-linux.c |4 +++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 616d8ec..fd2b6cd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-25 Andrew Pinski apin...@cavium.com
+
+ * config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
+ definition.
+
 2014-02-25 Vladimir Makarov vmaka...@redhat.com
 
 PR rtl-optimization/60317
diff --git a/gcc/config/host-linux.c b/gcc/config/host-linux.c
index 17048d7..b298a17 100644
--- a/gcc/config/host-linux.c
+++ b/gcc/config/host-linux.c
@@ -86,8 +86,10 @@
 # define TRY_EMPTY_VM_SPACE 0x6000
 #elif defined(__mc68000__)
 # define TRY_EMPTY_VM_SPACE 0x4000
-#elif defined(__aarch64__)
+#elif defined(__aarch64__) && defined(__LP64__)
 # define TRY_EMPTY_VM_SPACE 0x10
+#elif defined(__aarch64__)
+# define TRY_EMPTY_VM_SPACE 0x6000
 #elif defined(__ARM_EABI__)
 # define TRY_EMPTY_VM_SPACE 0x6000
 #elif defined(__mips__) && defined(__LP64__)

I'd prefer to see this written as:

-#elif defined(__aarch64__)
+#elif defined(__aarch64__) && defined(__ILP32__)
 # define TRY_EMPTY_VM_SPACE 0x6000
+#elif defined(__aarch64__)
+# define TRY_EMPTY_VM_SPACE 0x10

Since I'd expect there to be a much higher likelihood of another variant that uses 64-bit pointers (eg LLP64) than of there being another variant that uses 32-bit.

R.
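The ordering argument can be seen in a stand-alone form. The sketch below uses the same predefined macros as the patch, but the function name and the returned strings are invented for illustration. Testing the narrow case (ILP32) first means any future AArch64 data model with 64-bit pointers, such as LLP64, falls through to the 64-bit branch without further edits.

```c
/* Reports which branch of the preprocessor chain the host selects.
   The ILP32 case is tested first; every other aarch64 data model
   falls through to the 64-bit pointer branch.  */
const char *host_address_model(void)
{
#if defined(__aarch64__) && defined(__ILP32__)
  return "aarch64, 32-bit pointers (ILP32)";
#elif defined(__aarch64__)
  return "aarch64, 64-bit pointers";
#else
  return "not aarch64";
#endif
}
```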
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. This change: if (dest-loop_father-header == dest) - return false; + { +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + bb-loop_father-header != dest) + return false; + +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) + bb-loop_father-header == dest) + return false; + } } miscompiled 435.gromacs in SPEC CPU 2006 on x32 with -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver -fuse-linker-plugin This patch changes loops without LOOPS_HAVE_PREHEADERS nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning true. I don't have a small testcase. 
But this patch: diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c index b5c384b..2ba673c 100644 --- a/gcc/tree-cfgcleanup.c +++ b/gcc/tree-cfgcleanup.c @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted) if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) bb-loop_father-header == dest) return false; + +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)) + return false; } } fixes the regression. Does it make any senses? I think the preheader test isn't fully correct (bb may be in an inner loop for example). So a more conservative variant would be Index: gcc/tree-cfgcleanup.c === --- gcc/tree-cfgcleanup.c (revision 208169) +++ gcc/tree-cfgcleanup.c (working copy) @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb, /* Protect loop preheaders and latches if requested. */ if (dest-loop_father-header == dest) { - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) - bb-loop_father-header != dest) - return false; - - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) - bb-loop_father-header == dest) - return false; + if (bb-loop_father == dest-loop_father) + return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES); + else if (bb-loop_father == loop_outer (dest-loop_father)) + return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS); + /* Always preserve other edges into loop headers that are +not simple latches or preheaders. */ + return false; } } that makes sure we can properly update loop information. It's also a more conservative change at this point which should still successfully remove simple latches and preheaders created by loop discovery. I think the patch makes sense anyway and thus I'll install it once it passed bootstrap / regtesting. Another fix that may make sense is to restrict it to !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup itself can end up setting that ... which we eventually should fix if it still happens. 
That is, check if

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c (revision 208169)
+++ gcc/tree-cfgcleanup.c (working copy)
@@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)
 
   timevar_pop (TV_TREE_CLEANUP_CFG);
 
-  if (changed && current_loops)
-    loops_state_set (LOOPS_NEED_FIXUP);
+  if (changed && current_loops
+      && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+    verify_loop_structure ();
 
   return changed;
 }

trips anywhere (and apply fixes). That's of course not appropriate at this stage.

Does it fix 435.gromacs?

I can't see
Re: [AArch64] 64-bit float vreinterpret implemention
On 25/02/14 18:15, Richard Henderson wrote: On 02/25/2014 09:02 AM, Alex Velenko wrote: +(define_expand aarch64_reinterpretdfmode + [(match_operand:DF 0 register_operand ) + (match_operand:VD_RE 1 register_operand )] + TARGET_SIMD +{ + aarch64_simd_reinterpret (operands[0], operands[1]); + DONE; +}) I believe you want to implement these in aarch64_fold_builtin to fold to a VIEW_CONVERT_EXPR. No sense in leaving these opaque until rtl expansion. r~ Hi Richard, Thank you for your suggestion. Attached is a patch that includes implementation of your proposition. A testsuite was run on LE and BE compilers with no regressions. Here is the description of the patch: This patch introduces vreinterpret implementation for vectors with 64-bit float lanes and adds testcase for those intrinsics. Thanks, Alex gcc/ 2014-02-28 Alex Velenko alex.vele...@arm.com * config/aarch64/aarch64-builtins.c (TYPES_REINTERP): Removed. (aarch64_types_signed_unsigned_qualifiers): Qualifier added. (aarch64_types_signed_poly_qualifiers): Likewise. (aarch64_types_unsigned_signed_qualifiers): Likewise. (aarch64_types_poly_signed_qualifiers): Likewise. (TYPES_REINTERP_SS): Type macro added. (TYPES_REINTERP_SU): Likewise. (TYPES_REINTERP_SP): Likewise. (TYPES_REINTERP_US): Likewise. (TYPES_REINTERP_PS): Likewise. (aarch64_fold_builtin): New expression folding added. * config/aarch64/aarch64-simd-builtins.def (REINTERP): Declarations removed. (REINTERP_SS): Declarations added. (REINTERP_US): Likewise. (REINTERP_PS): Likewise. (REINTERP_SU): Likewise. (REINTERP_SP): Likewise. * config/aarch64/arm_neon.h (vreinterpret_p8_f64): Implemented. (vreinterpretq_p8_f64): Likewise. (vreinterpret_p16_f64): Likewise. (vreinterpretq_p16_f64): Likewise. (vreinterpret_f32_f64): Likewise. (vreinterpretq_f32_f64): Likewise. (vreinterpret_f64_f32): Likewise. (vreinterpret_f64_p8): Likewise. (vreinterpret_f64_p16): Likewise. (vreinterpret_f64_s8): Likewise. (vreinterpret_f64_s16): Likewise. 
(vreinterpret_f64_s32): Likewise. (vreinterpret_f64_s64): Likewise. (vreinterpret_f64_u8): Likewise. (vreinterpret_f64_u16): Likewise. (vreinterpret_f64_u32): Likewise. (vreinterpret_f64_u64): Likewise. (vreinterpretq_f64_f32): Likewise. (vreinterpretq_f64_p8): Likewise. (vreinterpretq_f64_p16): Likewise. (vreinterpretq_f64_s8): Likewise. (vreinterpretq_f64_s16): Likewise. (vreinterpretq_f64_s32): Likewise. (vreinterpretq_f64_s64): Likewise. (vreinterpretq_f64_u8): Likewise. (vreinterpretq_f64_u16): Likewise. (vreinterpretq_f64_u32): Likewise. (vreinterpretq_f64_u64): Likewise. (vreinterpret_s64_f64): Likewise. (vreinterpretq_s64_f64): Likewise. (vreinterpret_u64_f64): Likewise. (vreinterpretq_u64_f64): Likewise. (vreinterpret_s8_f64): Likewise. (vreinterpretq_s8_f64): Likewise. (vreinterpret_s16_f64): Likewise. (vreinterpretq_s16_f64): Likewise. (vreinterpret_s32_f64): Likewise. (vreinterpretq_s32_f64): Likewise. (vreinterpret_u8_f64): Likewise. (vreinterpretq_u8_f64): Likewise. (vreinterpret_u16_f64): Likewise. (vreinterpretq_u16_f64): Likewise. (vreinterpret_u32_f64): Likewise. (vreinterpretq_u32_f64): Likewise. 
gcc/testsuite/ 2014-02-28 Alex Velenko alex.vele...@arm.com * gcc.target/aarch64/vreinterpret_f64_1.c: new_testcase diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 5e0e9b94653deb1530955d62d9842c39da95058a..8241f918e3fcfb71144daf1c873ba1ed481a4385 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -147,6 +147,23 @@ aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned }; #define TYPES_UNOPU (aarch64_types_unopu_qualifiers) #define TYPES_CREATE (aarch64_types_unop_qualifiers) +#define TYPES_REINTERP_SS (aarch64_types_unop_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_unop_su_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned }; +#define TYPES_REINTERP_SU (aarch64_types_unop_su_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_unop_sp_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_poly }; +#define TYPES_REINTERP_SP (aarch64_types_unop_sp_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_unop_us_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none }; +#define TYPES_REINTERP_US (aarch64_types_unop_us_qualifiers) +static enum aarch64_type_qualifiers
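A reinterpret changes only the type through which the bits are viewed, which is why folding it to a VIEW_CONVERT_EXPR is natural. The scalar sketch below shows the same idea in portable C without arm_neon.h; the function names are made up for illustration, and the bit pattern quoted in the comment assumes IEEE 754 binary64 doubles.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "sketch assumes a 64-bit double");

/* Returns the bits of VALUE unchanged, viewed as a 64-bit integer.
   No numeric conversion takes place.  */
uint64_t reinterpret_u64_f64(double value)
{
  uint64_t bits;
  memcpy(&bits, &value, sizeof bits);
  return bits;
}

/* The inverse view: the same bits read back as a double.  */
double reinterpret_f64_u64(uint64_t bits)
{
  double value;
  memcpy(&value, &bits, sizeof value);
  return value;
}
```

With IEEE 754 doubles, 1.0 is viewed as 0x3FF0000000000000, and converting back yields exactly 1.0 again.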
Re: [C++ Patch] PR 60314 (ICE with decltype(auto))
Hi,

On 02/27/2014 08:29 PM, Jason Merrill wrote:
> On 02/25/2014 05:03 AM, Paolo Carlini wrote:
>> here we ICE exactly as we did in c++/53756: the only difference is the
>> use of decltype(auto) instead of auto. Now, if we compare is_cxx_auto
>> to is_auto (the front-end helper), evidently there is an inconsistency
>> about the handling of decltype(auto) and the below fixes the ICE.
>> However, also clearly the patchlet needs a review, because an out of
>> class decltype(auto) is already fine. Also, I'm not 100% sure we don't
>> need a decltype_auto_die, etc.
>
> I think we do need a decltype_auto_die.

Ok, then I tested on x86_64-linux the below.

Thanks!
Paolo.

///

2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

    PR c++/60314
    * dwarf2out.c (decltype_auto_die): New static.
    (gen_subprogram_die): Handle 'decltype(auto)' like 'auto'.
    (gen_type_die_with_usage): Handle 'decltype(auto)'.
    (is_cxx_auto): Likewise.

/testsuite
2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

    PR c++/60314
    * g++.dg/cpp1y/auto-fn24.C: New.

Index: dwarf2out.c
===================================================================
--- dwarf2out.c (revision 208214)
+++ dwarf2out.c (working copy)
@@ -250,6 +250,9 @@ static GTY(()) section *cold_text_section;
 /* The DIE for C++1y 'auto' in a function return type.  */
 static GTY(()) dw_die_ref auto_die;
 
+/* The DIE for C++1y 'decltype(auto)' in a function return type.  */
+static GTY(()) dw_die_ref decltype_auto_die;
+
 /* Forward declarations for functions defined in this file.  */
 
 static char *stripattributes (const char *);
@@ -10230,7 +10233,8 @@ is_cxx_auto (tree type)
       tree name = TYPE_NAME (type);
       if (TREE_CODE (name) == TYPE_DECL)
         name = DECL_NAME (name);
-      if (name == get_identifier ("auto"))
+      if (name == get_identifier ("auto")
+          || name == get_identifier ("decltype(auto)"))
         return true;
     }
   return false;
@@ -18022,10 +18026,11 @@ gen_subprogram_die (tree decl, dw_die_ref context_
           if (get_AT_unsigned (old_die, DW_AT_decl_line) != (unsigned) s.line)
             add_AT_unsigned (subr_die, DW_AT_decl_line, s.line);
 
-          /* If the prototype had an 'auto' return type, emit the real
-             type on the definition die.  */
+          /* If the prototype had an 'auto' or 'decltype(auto)' return type,
+             emit the real type on the definition die.  */
           if (is_cxx() && debug_info_level > DINFO_LEVEL_TERSE
-              && get_AT_ref (old_die, DW_AT_type) == auto_die)
+              && (get_AT_ref (old_die, DW_AT_type) == auto_die
+                  || get_AT_ref (old_die, DW_AT_type) == decltype_auto_die))
             add_type_attribute (subr_die, TREE_TYPE (TREE_TYPE (decl)),
                                 0, 0, context_die);
         }
@@ -19852,13 +19857,18 @@ gen_type_die_with_usage (tree type, dw_die_ref con
     default:
       if (is_cxx_auto (type))
         {
-          if (!auto_die)
+          tree name = TYPE_NAME (type);
+          if (TREE_CODE (name) == TYPE_DECL)
+            name = DECL_NAME (name);
+          dw_die_ref *die = (name == get_identifier ("auto")
+                             ? &auto_die : &decltype_auto_die);
+          if (!*die)
             {
-              auto_die = new_die (DW_TAG_unspecified_type,
-                                  comp_unit_die (), NULL_TREE);
-              add_name_attribute (auto_die, "auto");
+              *die = new_die (DW_TAG_unspecified_type,
+                              comp_unit_die (), NULL_TREE);
+              add_name_attribute (*die, IDENTIFIER_POINTER (name));
             }
-          equate_type_number_to_die (type, auto_die);
+          equate_type_number_to_die (type, *die);
           break;
         }
       gcc_unreachable ();
Index: testsuite/g++.dg/cpp1y/auto-fn24.C
===================================================================
--- testsuite/g++.dg/cpp1y/auto-fn24.C (revision 0)
+++ testsuite/g++.dg/cpp1y/auto-fn24.C (working copy)
@@ -0,0 +1,12 @@
+// PR c++/60314
+// { dg-options "-std=c++1y -g" }
+
+// fine
+decltype(auto) qux() { return 42; }
+
+struct foo
+{
+  // also ICEs if not static
+  static decltype(auto) bar()
+  { return 42; }
+};
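For readers less familiar with the two placeholder return types that the patch above distinguishes in the debug info, here is a minimal sketch of how they differ at the language level. It is only an illustration and not part of the patch; the names counter, by_auto and by_decltype_auto are made up for the example, and foo::bar mirrors the in-class case from the new test.

```cpp
#include <cassert>
#include <type_traits>

static int counter = 42;  // illustrative global, not taken from the patch

// Plain 'auto' deduces like a by-value template parameter: the result is int.
auto by_auto() { return (counter); }

// 'decltype(auto)' keeps the type of the returned expression. The
// parenthesized lvalue (counter) has type int&, so a reference is returned.
decltype(auto) by_decltype_auto() { return (counter); }

// Same shape as the in-class case in the new test; a prvalue 42 deduces int.
struct foo
{
  static decltype(auto) bar() { return 42; }
};

static_assert(std::is_same<decltype(by_auto()), int>::value,
              "auto drops the reference");
static_assert(std::is_same<decltype(by_decltype_auto()), int&>::value,
              "decltype(auto) preserves the reference");
static_assert(std::is_same<decltype(foo::bar()), int>::value,
              "decltype(auto) on a prvalue deduces a value type");
```

This needs -std=c++14 (spelled -std=c++1y at the time of this thread). Because the two placeholders can deduce different types for the same return statement, it seems reasonable that they are given separate names in the emitted debug info.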
Re: [PATCH v4] PR middle-end/60281
On 2014-02-28 15:58, lin zuojian wrote:
> Hi Bernd,
> I agree with you about the mode problem. And I have not changed the stack alignment. What I changed is the alignment of the virtual register base. Realignment must be done on !STRICT_ALIGNMENT machines, or emitting efficient code is impossible.

Sorry, it should be: realignment must be done on STRICT_ALIGNMENT machines.

> For example, four stores of the form set mem:QI X, REG:QI Y will not be combined into one set mem:SI X1, REG:SI Y1 if X is not known to be SImode aligned. To make sure X is SImode aligned, the virtual register base must be realigned. With this patch I only intend to make it right. Making it best is the next task.
> --
> Regards
> lin zuojian.
>
> On 2014-02-28 15:47, Bernd Edlinger wrote:
>> Hi,
>> I see the problem too. But I think it is not necessary to change the stack alignment to solve the problem. It appears to me that the code in asan_emit_stack_protection is just wrong. It uses SImode when the memory is not aligned enough for that mode. This would not happen if that code were rewritten to use get_best_mode, and by the way, even on x86_64 the emitted code is not optimal, because that target could work with DImode more efficiently. So, to fix that, it would be better to concentrate on that function, use word_mode instead of SImode, and let get_best_mode choose the required mode.
>> Regards
>> Bernd Edlinger.
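The trade-off discussed above (wide stores are only usable when the address is known to be sufficiently aligned) can be sketched with a toy model. This is not GCC's get_best_mode; widest_aligned_store and stores_needed are hypothetical helpers, and the model assumes the base address is aligned to known_alignment bytes and that store widths are powers of two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: widest power-of-two store width (in bytes) that fits
// in a machine word, does not exceed the bytes still to be written, and is
// permitted by the alignment the compiler can prove for the base address.
std::size_t widest_aligned_store(std::size_t known_alignment,
                                 std::size_t remaining,
                                 std::size_t word_bytes)
{
  std::size_t width = 1;
  while (width * 2 <= word_bytes
         && width * 2 <= remaining
         && known_alignment % (width * 2) == 0)
    width *= 2;
  return width;
}

// Number of store instructions needed to write `total` bytes, always
// taking the widest store that is allowed at each step.
std::size_t stores_needed(std::size_t known_alignment,
                          std::size_t total,
                          std::size_t word_bytes)
{
  std::size_t count = 0;
  while (total > 0)
    {
      total -= widest_aligned_store(known_alignment, total, word_bytes);
      ++count;
    }
  return count;
}
```

In this model, writing 4 bytes takes four byte stores when only byte alignment is known and one store when 4-byte alignment is known, which is the QImode versus SImode situation described in the message. With an 8-byte word and 8-byte alignment, 8 bytes take a single store, matching the remark about DImode on x86_64.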
Re: [patch] [arm] Fix PR60169 - thumb1 far jump
On Fri, Feb 28, 2014 at 2:42 AM, Joey Ye joey...@arm.com wrote:
> Ping. OK for trunk and 4.8?

Ok if no regressions.

Ramana

> -Original Message-
> From: Joey Ye [mailto:joey...@arm.com]
> Sent: 21 February 2014 19:32
> To: gcc-patches@gcc.gnu.org
> Subject: [patch] [arm] Fix PR60169 - thumb1 far jump
>
> Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced this ICE:
> 1. thumb1 estimates whether far_jump is used based on the function's insn size.
> 2. During reload, after the stack layout has been finalized, it does reload_as_needed. This, however, increases the insn size, which changes the far_jump estimate, which in turn requires saving lr and changing the stack layout again. Since there is no chance to change the layout at that point, GCC crashes.
>
> Solution:
> Do not change the far_jump estimate if reload_in_progress or reload_completed is true.
>
> Not likely to need a fix for lra according to Vlad:
> http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html
>
> ChangeLog:
> * config/arm/arm.c (thumb_far_jump_used_p): Don't change if reload in progress or completed.
> * gcc.target/arm/thumb1-far-jump-3.c: New case.
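The idea of the fix, as described in the quoted message, is to stop revising a size-based estimate once later passes depend on it. A toy sketch of that pattern follows. It is not the code in arm.c; FarJumpEstimate, far_jump_used and the 2048-byte threshold are assumptions made up for this illustration.

```cpp
#include <cassert>

// Assumed branch range for the sketch only; the real threshold used by
// the ARM backend is not given in the message.
constexpr int kAssumedBranchRange = 2048;

struct FarJumpEstimate
{
  bool valid = false;  // has an estimate been computed yet?
  bool used = false;   // the estimate itself
};

// Recompute the estimate from the current function size, unless reload has
// started and an estimate already exists. In that case the earlier answer
// is returned unchanged, so the stack layout that was derived from it
// stays consistent.
bool far_jump_used(FarJumpEstimate &estimate, int function_size,
                   bool reload_started)
{
  if (reload_started && estimate.valid)
    return estimate.used;

  estimate.used = function_size > kAssumedBranchRange;
  estimate.valid = true;
  return estimate.used;
}
```

The cost of freezing is that the answer may become pessimistic or optimistic relative to the final size. The message does not say how the backend copes with growth after the freeze, so that part is left out of the sketch.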
[PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.
Hello,
This is a relatively obvious patch which eliminates the comparison of infinities in the exp2 AVX-512 test and properly compares floats in avx512f-sqrtps-2.c.

Tests pass.

Is it ok for trunk?

gcc/testsuite/
    * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent argument to avoid inf values.
    * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with UNION_FP_CHECK machinery.

--
Thanks, K

---
 gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
index 06ef68c..ab911c0 100644
--- a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
@@ -25,7 +25,7 @@ avx512er_test (void)
   for (i = 0; i < 16; i++)
     {
-      src.a[i] = 179.345 - 6.5645 * i;
+      src.a[i] = 79.345 - 6.5645 * i;
       res2.a[i] = DEFAULT_VALUE;
     }
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
index 5249bbd..f5a7b78 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
@@ -46,10 +46,10 @@ TEST (void)
     abort ();
   MASK_MERGE () (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN,) (res2, res_ref))
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
     abort ();
   MASK_ZERO () (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN,) (res3, res_ref))
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
     abort ();
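To see why lowering the constant from 179.345 to 79.345 avoids infinities, the following sketch evaluates the same sixteen inputs with the standard library's exp2 in single precision. It only assumes that the test operates on float elements, as the "ps" in the file name suggests; count_infinite_results is a name made up for this illustration and is not part of the testsuite.

```cpp
#include <cassert>
#include <cmath>

// Count how many of the 16 inputs base - 6.5645 * i (i = 0..15) overflow
// to infinity when 2^x is computed in single precision. The largest finite
// float is just under 2^128, so any input above 128 overflows.
int count_infinite_results(double base)
{
  int infinite = 0;
  for (int i = 0; i < 16; i++)
    {
      float src = static_cast<float>(base - 6.5645 * i);
      if (std::isinf(std::exp2(src)))
        ++infinite;
    }
  return infinite;
}
```

With the old constant the inputs run from about 179.3 down to about 80.9, so the first eight (i = 0..7, all above 128) overflow. With the new constant the largest input is 79.345 and every result is finite.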
Re: [PATCH] [libgcc,arm] Fix PR 60166 - NAN fraction bits
On Fri, Feb 28, 2014 at 7:16 AM, Joey Ye joey...@arm.com wrote:
> This patch is a mirror copy from approved patch in glibc:
> http://sourceware.org/ml/libc-alpha/2014-02/msg00741.html
>
> OK to trunk, 4.8 and 4.7?

OK everywhere.

Ramana

> ChangeLog.libgcc:
> * config/arm/sfp-machine.h (_FP_NANFRAC_H, _FP_NANFRAC_S, _FP_NANFRAC_D, _FP_NANFRAC_Q): Set to zero.
>
> diff --git a/libgcc/config/arm/sfp-machine.h b/libgcc/config/arm/sfp-machine.h
> index bb34895..8d45320 100644
> --- a/libgcc/config/arm/sfp-machine.h
> +++ b/libgcc/config/arm/sfp-machine.h
> @@ -19,10 +19,12 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
>  #define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y)
>  #define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
> -#define _FP_NANFRAC_H ((_FP_QNANBIT_H << 1) - 1)
> -#define _FP_NANFRAC_S ((_FP_QNANBIT_S << 1) - 1)
> -#define _FP_NANFRAC_D ((_FP_QNANBIT_D << 1) - 1), -1
> -#define _FP_NANFRAC_Q ((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
> +/* According to RTABI, QNAN is only with the most significant bit of the
> +   significand set, and all other significand bits zero.  */
> +#define _FP_NANFRAC_H 0
> +#define _FP_NANFRAC_S 0
> +#define _FP_NANFRAC_D 0, 0
> +#define _FP_NANFRAC_Q 0, 0, 0, 0
>  #define _FP_NANSIGN_H 0
>  #define _FP_NANSIGN_S 0
>  #define _FP_NANSIGN_D 0
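The comment added by the patch says that a default quiet NaN should have only the most significant significand bit set. The sketch below compares the two single-precision bit patterns involved: all fraction bits set, which is what ((_FP_QNANBIT_S << 1) - 1) describes, and the quiet bit alone. How soft-fp combines the _FP_NANFRAC_* value with the quiet bit is not shown in the message, so the sketch builds the patterns directly; the constant and function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// IEEE 754 binary32 layout: 1 sign bit, 8 exponent bits, 23 fraction bits.
constexpr std::uint32_t kExponentMask = 0x7F800000u;
constexpr std::uint32_t kFractionMask = 0x007FFFFFu;
constexpr std::uint32_t kQuietBit     = 0x00400000u;  // top fraction bit

// Reinterpret a 32-bit pattern as a float without violating aliasing rules.
float float_from_bits(std::uint32_t bits)
{
  float value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

// Quiet NaN with every fraction bit set: (quiet_bit << 1) - 1 == 0x7FFFFF.
std::uint32_t qnan_all_fraction_bits()
{
  return kExponentMask | ((kQuietBit << 1) - 1);
}

// Quiet NaN with only the most significant fraction bit set.
std::uint32_t qnan_quiet_bit_only()
{
  return kExponentMask | kQuietBit;
}
```

Both patterns (0x7FFFFFFF and 0x7FC00000) are NaNs as far as std::isnan is concerned. They differ only in the payload bits, which matters when results are compared bit for bit against hardware or another library.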
[PATCH] Restrict and fix the PR60280 fix
This narrows down the effect of the PR60280 fix (removing more forwarder blocks during cfg-cleanup when loops are present) to only remove forwarders of the kind loop_optimizer_init would create.

It also fixes the loop latch updating in remove_forwarder_block (though that doesn't have any immediate effect as we fix up loops anyway) - it was set to the wrong loop. Which also made me figure that we don't honor !LOOPS_MAY_HAVE_MULTIPLE_LATCHES properly (also fixed).

Maybe any of the above will fix the gromacs miscompare HJ is seeing (can't reproduce it).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-02-28  Richard Biener  rguent...@suse.de

    PR target/60280
    * tree-cfgcleanup.c (tree_forwarder_block_p): Restrict previous fix and only allow to remove trivial pre-headers and latches. Also honor LOOPS_MAY_HAVE_MULTIPLE_LATCHES.
    (remove_forwarder_block): Properly update the latch of a loop.

Index: gcc/tree-cfgcleanup.c
===================================================================
--- gcc/tree-cfgcleanup.c (revision 208216)
+++ gcc/tree-cfgcleanup.c (working copy)
@@ -316,13 +316,22 @@ tree_forwarder_block_p (basic_block bb,
       /* Protect loop preheaders and latches if requested.  */
       if (dest->loop_father->header == dest)
         {
-          if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
-              && bb->loop_father->header != dest)
-            return false;
-
-          if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
-              && bb->loop_father->header == dest)
-            return false;
+          if (bb->loop_father == dest->loop_father)
+            {
+              if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
+                return false;
+              /* If bb doesn't have a single predecessor we'd make this
+                 loop have multiple latches.  Don't do that if that
+                 would in turn require disambiguating them.  */
+              return (single_pred_p (bb)
+                      || loops_state_satisfies_p
+                           (LOOPS_MAY_HAVE_MULTIPLE_LATCHES));
+            }
+          else if (bb->loop_father == loop_outer (dest->loop_father))
+            return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
+          /* Always preserve other edges into loop headers that are
+             not simple latches or preheaders.  */
+          return false;
         }
     }
 
@@ -417,6 +426,10 @@ remove_forwarder_block (basic_block bb)
   can_move_debug_stmts = MAY_HAVE_DEBUG_STMTS && single_pred_p (dest);
 
+  basic_block pred = NULL;
+  if (single_pred_p (bb))
+    pred = single_pred (bb);
+
   /* Redirect the edges.  */
   for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
     {
@@ -510,7 +523,7 @@ remove_forwarder_block (basic_block bb)
   /* Adjust latch infomation of BB's parent loop as otherwise
      the cfg hook has a hard time not to kill the loop.  */
   if (current_loops && bb->loop_father->latch == bb)
-    bb->loop_father->latch = dest;
+    bb->loop_father->latch = pred;
 
   /* And kill the forwarder block.  */
   delete_basic_block (bb);
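The new decision logic in tree_forwarder_block_p can be read as a small truth table. The sketch below restates it over a toy data model so the three cases (latch, preheader, anything else) are easy to check by hand. It is an illustration only: Loop, Block, the flag names and may_remove_forwarder_into_header are simplified stand-ins made up here, not GCC's types or API.

```cpp
#include <cassert>

// Simplified stand-ins for the loop-state flags named in the patch.
enum LoopStateFlag : unsigned
{
  HAVE_PREHEADERS           = 1u << 0,
  HAVE_SIMPLE_LATCHES       = 1u << 1,
  MAY_HAVE_MULTIPLE_LATCHES = 1u << 2,
};

struct Loop
{
  const Loop *outer;  // enclosing loop, or nullptr for the outermost one
};

struct Block
{
  const Loop *loop_father;  // innermost loop containing the block
  bool single_pred;         // does the block have exactly one predecessor?
};

// May the forwarder block `bb`, whose only successor is the header of
// `dest_loop`, be removed under the given loop-state flags?
bool may_remove_forwarder_into_header(const Block &bb, const Loop &dest_loop,
                                      unsigned state)
{
  if (bb.loop_father == &dest_loop)
    {
      // bb sits inside the loop it jumps to the header of: it is a latch.
      if (state & HAVE_SIMPLE_LATCHES)
        return false;
      // Removing a latch with several predecessors creates several
      // latches, which is only fine if that state is permitted.
      return bb.single_pred || (state & MAY_HAVE_MULTIPLE_LATCHES) != 0;
    }
  if (bb.loop_father == dest_loop.outer)
    // bb sits in the immediately enclosing loop: it is a preheader.
    return (state & HAVE_PREHEADERS) == 0;
  // Any other edge into a loop header is preserved.
  return false;
}
```

Compared with the earlier version of the check, a block belonging to some unrelated loop nest is never removed here, which appears to be the "restrict" part of the patch title.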
Re: [PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.
On Fri, Feb 28, 2014 at 1:14 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
> Hello,
> This is a relatively obvious patch which eliminates the comparison of infinities in the exp2 AVX-512 test and properly compares floats in avx512f-sqrtps-2.c.
>
> Tests pass.
>
> Is it ok for trunk?
>
> gcc/testsuite/
> * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent argument to avoid inf values.
> * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with UNION_FP_CHECK machinery.

You are talking about avx512f-sqrtps-2.c, the ChangeLog refers to avx512er-vexp2ps-2.c, but the patch is modifying avx512f-vdivps-2.c.

Uros.
Re: copyright dates in binutils (and includes/)
> Joseph, do you know why implicitly adding years to the claimed copyright years is a problem? I'm guessing the file needs to be published somewhere for each year claimed.

IANAL, but from 2 discussions with copyright-clerk:

1. We start claiming copyright the year the file was committed to a medium (hard drive), not the year it was published.
2. As long as we have evidence of a copyrightable change each year, we can include that year in the list of copyright years in all files' headers.

For (2), this is how I asked the FSF:

| My question is: As we have evidence of copyrightable changes to the GDB project every year since 1986, is it acceptable to fix the copyright headers to fill in the missing years? And if yes, is it acceptable to go straight to the next step, which is reducing the copyright years to a single range, even if the original list had holes in it? (we will make sure that the first year of the range is always 1986 or later, or else investigate to make sure that the range is correct).
|
| For example, we would reduce:
|
|     Copyright (C) 1986, 1988-1989, 1991-1993, 1999-2000, 2007-2012 Free Software Foundation, Inc.
|
| into:
|
|     1986-2012 Free Software Foundation, Inc.
|
| Naturally, if the initial year was 1995, then it would be the year used as the start of the range!

... to which they answered that it would be acceptable.

Does it mean that the sources needed to be made public that year for us to be able to claim copyright that year? It did not seem so to me. But you could ask the FSF (copyright DASH clerk AT fsf DOT org).

--
Joel
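The mechanical part of the reduction described above (a year list with holes collapsed into one range) can be sketched as follows. This is not update-copyright.py; collapse_years is a hypothetical helper, and it deliberately does not check the precondition discussed in this thread, namely that there is evidence of a copyrightable change, release or public repository for every year the range ends up covering.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Collapse a year list such as "1986, 1988-1989, 2007-2012" into the single
// range "1986-2012". Every run of digits is treated as a year; separators
// (commas, spaces, hyphens) are ignored. Returns an empty string when the
// input contains no year at all.
std::string collapse_years(const std::string &list)
{
  int first = 0;
  int last = 0;
  bool seen = false;

  for (std::size_t i = 0; i < list.size();)
    {
      if (!std::isdigit(static_cast<unsigned char>(list[i])))
        {
          ++i;
          continue;
        }
      int year = 0;
      while (i < list.size()
             && std::isdigit(static_cast<unsigned char>(list[i])))
        year = year * 10 + (list[i++] - '0');

      first = seen ? std::min(first, year) : year;
      last = seen ? std::max(last, year) : year;
      seen = true;
    }

  if (!seen)
    return "";
  if (first == last)
    return std::to_string(first);
  return std::to_string(first) + "-" + std::to_string(last);
}
```

Applied to the example from the message, the list 1986, 1988-1989, 1991-1993, 1999-2000, 2007-2012 becomes 1986-2012, and a list whose earliest year is 1995 yields a range starting at 1995.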
[jit] New API entrypoint: gcc_jit_block_get_function
Committed to branch dmalcolm/jit: gcc/jit/ * libgccjit.h (gcc_jit_block_get_function): New. * libgccjit.map (gcc_jit_block_get_function): New. * libgccjit++.h (gccjit::block::get_function): New method. * libgccjit.c (gcc_jit_block_get_function): New. --- gcc/jit/ChangeLog.jit | 7 +++ gcc/jit/libgccjit++.h | 8 gcc/jit/libgccjit.c | 8 gcc/jit/libgccjit.h | 4 gcc/jit/libgccjit.map | 1 + 5 files changed, 28 insertions(+) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index c7b2395..6c43ce9 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,10 @@ +2014-02-28 David Malcolm dmalc...@redhat.com + + * libgccjit.h (gcc_jit_block_get_function): New. + * libgccjit.map (gcc_jit_block_get_function): New. + * libgccjit++.h (gccjit::block::get_function): New method. + * libgccjit.c (gcc_jit_block_get_function): New. + 2014-02-27 David Malcolm dmalc...@redhat.com * libgccjit.h (gcc_jit_label): Delete in favor of... diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h index a8801a3..7c1c3be 100644 --- a/gcc/jit/libgccjit++.h +++ b/gcc/jit/libgccjit++.h @@ -316,6 +316,8 @@ namespace gccjit gcc_jit_block *get_inner_block () const; +function get_function () const; + void add_eval (rvalue rvalue, location loc = location ()); @@ -1109,6 +,12 @@ function::new_local (type type_, name.c_str ())); } +inline function +block::get_function () const +{ + return function (gcc_jit_block_get_function ( get_inner_block ())); +} + inline void block::add_eval (rvalue rvalue, location loc) diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 1146261..ce7987c 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -591,6 +591,14 @@ gcc_jit_block_as_object (gcc_jit_block *block) return static_cast gcc_jit_object * (block-as_object ()); } +gcc_jit_function * +gcc_jit_block_get_function (gcc_jit_block *block) +{ + RETURN_NULL_IF_FAIL (block, NULL, NULL block); + + return static_cast gcc_jit_function * (block-get_function ()); +} + gcc_jit_lvalue * 
gcc_jit_context_new_global (gcc_jit_context *ctxt, gcc_jit_location *loc, diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h index c24fddd..f00d672 100644 --- a/gcc/jit/libgccjit.h +++ b/gcc/jit/libgccjit.h @@ -503,6 +503,10 @@ gcc_jit_function_new_block (gcc_jit_function *func, extern gcc_jit_object * gcc_jit_block_as_object (gcc_jit_block *block); +/* Which function is this block within? */ +extern gcc_jit_function * +gcc_jit_block_get_function (gcc_jit_block *block); + /** lvalues, rvalues and expressions. **/ diff --git a/gcc/jit/libgccjit.map b/gcc/jit/libgccjit.map index 48fd9d2..9f6a466 100644 --- a/gcc/jit/libgccjit.map +++ b/gcc/jit/libgccjit.map @@ -11,6 +11,7 @@ gcc_jit_block_end_with_jump; gcc_jit_block_end_with_return; gcc_jit_block_end_with_void_return; +gcc_jit_block_get_function; gcc_jit_context_acquire; gcc_jit_context_compile; gcc_jit_context_dump_to_file; -- 1.7.11.7
Re: [C++ Patch] PR 60314 (ICE with decltype(auto))
OK, thanks. Jason
Re: [C++ Patch] PR 58610
OK. Jason
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: * Functions and variables now go into different tables, otherwise intermixing between them could be a problem that causes tables to go out of sync between host and target (imagine one big table being generated by ptx lto1/mkoffload, and multiple small table fragments being linked together on the host side). If you need 2 different tables for funcs and vars, we can also use them. But I still don't understand how it will help synchronization between host and target tables. * I've put the begin/end fragments for the host tables into crtstuff, which seems like the standard way of doing things. Our plan was that the host side descriptor __OPENMP_TARGET__ will contain (in addition to func/var table) pointers to the images for all enabled accelerators (e.g. omp_image_nvptx_start and omp_image_intelmic_start), therefore we generated it in the lto-wrapper. But if the number of accelerators and their types/names will be defined during configuration, then it's ok to generate the descriptor in crtstuff. * Is there a reason to call a register function for the host tables? The way I've set it up, we register a target function/variable table while also passing a pointer to the __OPENMP_TARGET__ symbol which holds information about the host side tables. In our case we can't register target table with a call to libgomp, it can be obtained only from the accelerator. Therefore we propose a target-independent approach: during device initialization libgomp calls 2 functions from the plugin (or this can be implemented by a single function): 1. devicep-device_load_image_func, which will load target image (its pointer will be taken from the host descriptor); 2. devicep-device_get_table_func, which in our case connects to the device and receives its table. And in your case it will return func_mappings and var_mappings. Will it work for you? 
* An offload compiler is built with --enable-as-accelerator-for=, which eliminates the need for -fopenmp-target, and changes install paths so that the host compiler knows where to find it. No need for OFFLOAD_TARGET_COMPILERS anymore. Unfortunately I don't fully understand this configure magic... When a user specifies 2 or 3 accelerators during configuration with --enable-accelerators, will several different accel-gccs be built? Thanks, -- Ilya
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. This change: if (dest-loop_father-header == dest) - return false; + { +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + bb-loop_father-header != dest) + return false; + +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) + bb-loop_father-header == dest) + return false; + } } miscompiled 435.gromacs in SPEC CPU 2006 on x32 with -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver -fuse-linker-plugin This patch changes loops without LOOPS_HAVE_PREHEADERS nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning true. I don't have a small testcase. 
But this patch: diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c index b5c384b..2ba673c 100644 --- a/gcc/tree-cfgcleanup.c +++ b/gcc/tree-cfgcleanup.c @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted) if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) bb-loop_father-header == dest) return false; + +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)) + return false; } } fixes the regression. Does it make any senses? I think the preheader test isn't fully correct (bb may be in an inner loop for example). So a more conservative variant would be Index: gcc/tree-cfgcleanup.c === --- gcc/tree-cfgcleanup.c (revision 208169) +++ gcc/tree-cfgcleanup.c (working copy) @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb, /* Protect loop preheaders and latches if requested. */ if (dest-loop_father-header == dest) { - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) - bb-loop_father-header != dest) - return false; - - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) - bb-loop_father-header == dest) - return false; + if (bb-loop_father == dest-loop_father) + return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES); + else if (bb-loop_father == loop_outer (dest-loop_father)) + return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS); + /* Always preserve other edges into loop headers that are +not simple latches or preheaders. */ + return false; } } that makes sure we can properly update loop information. It's also a more conservative change at this point which should still successfully remove simple latches and preheaders created by loop discovery. I think the patch makes sense anyway and thus I'll install it once it passed bootstrap / regtesting. Another fix that may make sense is to restrict it to !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup itself can end up setting that ... which we eventually should fix if it still happens. 
That is, check if Index: gcc/tree-cfgcleanup.c === --- gcc/tree-cfgcleanup.c (revision 208169) +++ gcc/tree-cfgcleanup.c (working copy) @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void) timevar_pop (TV_TREE_CLEANUP_CFG); - if (changed current_loops) -loops_state_set (LOOPS_NEED_FIXUP); + if (changed current_loops + !loops_state_satisfies_p (LOOPS_NEED_FIXUP)) +verify_loop_structure (); return changed; } trips
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 02/28/2014 05:09 PM, Ilya Verbin wrote: 2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: * Functions and variables now go into different tables, otherwise intermixing between them could be a problem that causes tables to go out of sync between host and target (imagine one big table being generated by ptx lto1/mkoffload, and multiple small table fragments being linked together on the host side). If you need 2 different tables for funcs and vars, we can also use them. But I still don't understand how it will help synchronization between host and target tables. I think it won't help that much - I still think this entire scheme is likely to fail on nvptx. I'll try to construct an example at some point. One other thing about the split tables is that we don't have to write a useless size of 1 for functions. * I've put the begin/end fragments for the host tables into crtstuff, which seems like the standard way of doing things. Our plan was that the host side descriptor __OPENMP_TARGET__ will contain (in addition to func/var table) pointers to the images for all enabled accelerators (e.g. omp_image_nvptx_start and omp_image_intelmic_start), therefore we generated it in the lto-wrapper. The concept of image is likely to vary somewhat between accelerators. For ptx, it's just a string and it can't really be generated the same way as for your target where you can manipulate ELF images. So I think it is better to have a call to a gomp registration function for every offload target. That should also give you the ordering you said you wanted between shared libraries. * Is there a reason to call a register function for the host tables? The way I've set it up, we register a target function/variable table while also passing a pointer to the __OPENMP_TARGET__ symbol which holds information about the host side tables. In our case we can't register target table with a call to libgomp, it can be obtained only from the accelerator. 
Therefore we propose a target-independent approach: during device initialization libgomp calls 2 functions from the plugin (or this can be implemented by a single function): 1. devicep-device_load_image_func, which will load target image (its pointer will be taken from the host descriptor); 2. devicep-device_get_table_func, which in our case connects to the device and receives its table. And in your case it will return func_mappings and var_mappings. Will it work for you? Probably. I think the constructor call to the gomp registration function would contain an opaque pointer to whatever data the target wants, so it can arrange its image/table data in whatever way it likes. It would help to see the code you have on the libgomp side, I don't believe that's been posted yet? Unfortunately I don't fully understand this configure magic... When a user specifies 2 or 3 accelerators during configuration with --enable-accelerators, will several different accel-gccs be built? No - the idea is that --enable-accelerator= is likely specific to ptx, where we really just want to build a gcc and no target libraries, so building it alongside the host in an accel-gcc subdirectory is ideal. For your use case, I'd imagine the offload compiler would be built relatively normally as a full build with --enable-as-accelerator-for=x86_64-linux, which would install it into locations where the host will eventually be able to find it. Then the host compiler would be built with another new configure option (as yet unimplemented in my patch set) --enable-offload-targets=mic,... which would tell the host compiler about the pre-built offload target compilers. On the ptx side, --enable-accelerator=ptx would then also add ptx to the list of --enable-offload-targets. Naming of all these configure options can be discussed, I have no real preference for any of them. Bernd
Re: [PATCH] Fix epilogue bb expansion (PR middle-end/60175)
On 02/17/2014 11:45 AM, Jakub Jelinek wrote: Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-02-17 Jakub Jelinek ja...@redhat.com PR middle-end/60175 * function.c (expand_function_end): Don't emit clobber_return_register sequence if clobber_after is a BARRIER. * cfgexpand.c (construct_exit_block): Append instructions before return_label to prev_bb. Ok. r~
[wwwdocs] GSoC2014 and POWER8 News items
I added a news item for GSoC2014. I also realized that POWER8 support had not been added to the News announcements, so I inserted an item. Thanks, David Index: index.html === RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v retrieving revision 1.905 diff -c -p -r1.905 index.html *** index.html 17 Feb 2014 08:28:36 - 1.905 --- index.html 28 Feb 2014 16:41:17 - *** mission statement/a./p *** 48,58 td style=width: 50%; padding-right: 8px; valign=top - h2 style=margin-top:0pt; id=newsNews/h2 dl class=news dtspanIntel AVX-512 support/span span class=date[2014-02-17]/span/dt ddIntel AVX-512 support was added to GCC. That includes inline --- 48,63 td style=width: 50%; padding-right: 8px; valign=top h2 style=margin-top:0pt; id=newsNews/h2 dl class=news + dtspanGCC Google Summer of Code 2014/span + span class=date[2014-02-24]/span/dt + ddGCC has been accepted as a + a href=http://www.google-melange.com/gsoc/org2/google/gsoc2014/gcc;Goog le Summer of Code 2014 project/a. + Students, mentors and project ideas welcome!/dd + dtspanIntel AVX-512 support/span span class=date[2014-02-17]/span/dt ddIntel AVX-512 support was added to GCC. That includes inline *** mission statement/a./p *** 109,114 --- 114,126 a href=https://plus.google.com/108467477471815191158; rel=publisher ta rget=_blankGoogle+/a to help developers stay informed of progress./dd + dtspanIBM POWER8 support/span + span class=date[2013-07-15]/span/dt + ddSupport for the POWER8 processor has been contributed by IBM. + This includes new VSX, HTM and atomic instructions, new intrinsics, + and scheduling improvements. Little Endian support also has been + enhanced, including control over vector element endianness./dd + dtspana href=gcc-4.8/GCC 4.8.1/a released/span span class=date[2013-05-31]/span/dt dd/dd
Re: [PATCH] Handle more COMDAT profiling issues
Here's the new patch. The only changes from the earlier patch are in handle_missing_profiles, where we now get the counts off of the entry and call stmt bbs, and in tree_profiling, where we call handle_missing_profiles earlier and I have removed the outlined cgraph rebuilding code since it doesn't need to be reinvoked. Honza, does this look ok for trunk when stage 1 reopens? David, I can send a similar patch for review to google-4_8 if it looks good. Thanks, Teresa ... Spec testing of my earlier patch hit an issue with the call to gimple_bb in this routine, since the caller was a thunk and therefore the edge did not have a call_stmt set. I've attached a slightly modified patch that guards the call by a check to cgraph_function_with_gimple_body_p. Regression and spec testing are clean. Teresa Ping - Honza, does this patch look ok for stage 1? Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Fri, Feb 28, 2014 at 8:11 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener richard.guent...@gmail.com wrote:
On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener richard.guent...@gmail.com wrote:
On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote:

Hi,

This patch is to fix the regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests have already been fixed by recent patches; I went through and fixed the others.

One case that needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removes a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a basic block that is being removed in stage 1 if it sounds reasonable. For now I just twisted the case itself.

Bootstrap and tested on x86_64 and arm_a15. Is it OK?

2014-02-25  Bin Cheng  bin.ch...@arm.com

    PR target/60280
    * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed.
    * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS.

This change:

       if (dest->loop_father->header == dest)
-        return false;
+        {
+          if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
+              && bb->loop_father->header != dest)
+            return false;
+
+          if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
+              && bb->loop_father->header == dest)
+            return false;
+        }
     }

miscompiled 435.gromacs in SPEC CPU 2006 on x32 with

-O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver -fuse-linker-plugin

This patch changes loops without LOOPS_HAVE_PREHEADERS nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning true. I don't have a small testcase. But this patch:

diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
index b5c384b..2ba673c 100644
--- a/gcc/tree-cfgcleanup.c
+++ b/gcc/tree-cfgcleanup.c
@@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
           if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
               && bb->loop_father->header == dest)
             return false;
+
+          if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
+              && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
+            return false;
         }
     }

fixes the regression. Does it make any sense?

I think the preheader test isn't fully correct (bb may be in an inner loop for example). So a more conservative variant would be

Index: gcc/tree-cfgcleanup.c
===================================================================
--- gcc/tree-cfgcleanup.c (revision 208169)
+++ gcc/tree-cfgcleanup.c (working copy)
@@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
       /* Protect loop preheaders and latches if requested.  */
       if (dest->loop_father->header == dest)
         {
-          if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
-              && bb->loop_father->header != dest)
-            return false;
-
-          if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
-              && bb->loop_father->header == dest)
-            return false;
+          if (bb->loop_father == dest->loop_father)
+            return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
+          else if (bb->loop_father == loop_outer (dest->loop_father))
+            return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
+          /* Always preserve other edges into loop headers that are
+             not simple latches or preheaders.  */
+          return false;
         }
     }

that makes sure we can properly update loop information. It's also a more conservative change at this point which should still successfully remove simple latches and preheaders created by loop discovery.

I think the patch makes sense anyway and thus I'll install it once it has passed bootstrap / regtesting.

Another fix that may make sense is to restrict it to !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup itself can end up setting that ... which we eventually should fix if it still happens. That is, check if

Index: gcc/tree-cfgcleanup.c
===================================================================
--- gcc/tree-cfgcleanup.c (revision 208169)
+++ gcc/tree-cfgcleanup.c (working copy)
@@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)
   timevar_pop (TV_TREE_CLEANUP_CFG);
-  if (changed && current_loops)
-    loops_state_set (LOOPS_NEED_FIXUP);
+  if (changed && current_loops
+      && !loops_state_satisfies_p
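
As a rough illustration of the more conservative variant quoted above, the header-edge decision can be modeled outside GCC. This is only a toy sketch under stated assumptions: `Loop`, `Block`, `LoopState` and `may_remove_forwarder_into_header` are made-up names, not GCC types or functions, and the `outer` pointer merely stands in for what `loop_outer` returns. It mirrors just the three-way test from the patch, for a forwarder block `bb` whose destination `dest` is a loop header.

```cpp
// Toy model, not GCC's real data structures.
struct Loop {
  const Loop *outer;  // enclosing loop, or nullptr for the outermost one
};

struct Block {
  const Loop *loop_father;  // innermost loop containing this block
};

struct LoopState {
  bool have_preheaders;      // stands in for LOOPS_HAVE_PREHEADERS
  bool have_simple_latches;  // stands in for LOOPS_HAVE_SIMPLE_LATCHES
};

// Assumes dest is the header of dest.loop_father.  Returns true when the
// forwarder block bb may be removed.
bool may_remove_forwarder_into_header(const Block &bb, const Block &dest,
                                      const LoopState &state) {
  // bb sits in the same loop as the header: it plays the role of a latch.
  if (bb.loop_father == dest.loop_father)
    return !state.have_simple_latches;
  // bb sits in the immediately enclosing loop: it plays the role of a
  // preheader.
  if (bb.loop_father == dest.loop_father->outer)
    return !state.have_preheaders;
  // Any other edge into the header is always preserved.
  return false;
}
```

With neither property requested, both the latch-like and the preheader-like forwarder are removable, while a forwarder coming from an unrelated loop is kept; that last case is the one the original `header != dest` test did not distinguish.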
[PATCH, AArch64] Sync merge libffi - fix call frame information in ffi_closure_SYSV
Hi,

The attached patch fixes a bug in ./src/aarch64/sysv.S:ffi_closure_SYSV where stack unwinding information was not generated correctly. The change has been reviewed, approved and merged into the stand-alone libffi release tree**.

OK for the trunk?

Thanks,
Yufeng

** http://github.com/atgreen/libffi

2014-02-28  Yufeng Zhang  yufeng.zh...@arm.com

    * src/aarch64/sysv.S (ffi_closure_SYSV): Use x29 as the main CFA reg; update cfi_rel_offset.

diff --git a/libffi/src/aarch64/sysv.S b/libffi/src/aarch64/sysv.S
index b8cd421..ffb16f8 100644
--- a/libffi/src/aarch64/sysv.S
+++ b/libffi/src/aarch64/sysv.S
@@ -231,13 +231,13 @@ ffi_closure_SYSV:
         cfi_rel_offset (x30, 8)
         mov     x29, sp
+        cfi_def_cfa_register (x29)
         sub     sp, sp, #ffi_closure_SYSV_FS
-        cfi_adjust_cfa_offset (ffi_closure_SYSV_FS)
         stp     x21, x22, [x29, #-16]
-        cfi_rel_offset (x21, 0)
-        cfi_rel_offset (x22, 8)
+        cfi_rel_offset (x21, -16)
+        cfi_rel_offset (x22, -8)
         /* Load x21 with call_context.  */
         mov     x21, sp
@@ -295,7 +295,7 @@ ffi_closure_SYSV:
         cfi_restore (x22)
         mov     sp, x29
-        cfi_adjust_cfa_offset (-ffi_closure_SYSV_FS)
+        cfi_def_cfa_register (sp)
         ldp     x29, x30, [sp], #16
         cfi_adjust_cfa_offset (-16)
Re: copyright dates in binutils (and includes/)
On Fri, 28 Feb 2014, Joel Brobecker wrote:

Joseph, do you know why implicitly adding years to the claimed copyright years is a problem? I'm guessing the file needs to be published somewhere for each year claimed.

IANAL, but from 2 discussions with copyright-clerk:
1. We start claiming copyright the year the file was committed to a medium (hard drive), not the year it was published.

I don't think it counts unless the version in question got published at some point. The question is about versions that weren't published at the time, but were published later when the version control history was released.

There was a discussion on bug-standards starting Jan 2012. Karl's revised wording from 11 May 2012 seems to indicate that if a version was committed to a version control history that was later released, the dates from that history count as copyrightable years (so reducing the number of cases where it may not be possible to fill in gaps) - but that revised wording doesn't seem to have been committed.

--
Joseph S. Myers
jos...@codesourcery.com
Re: [C++ Patch] PR 58610
On 02/28/2014 04:57 PM, Jason Merrill wrote:
OK.

Thanks. I'm going to commit as obvious the additional lambda.c hunk below, which removes another now redundant STRIP_TEMPLATE use.

Thanks,
Paolo.

/cp
2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

    PR c++/58610
    * cp-tree.h (DECL_DELETED_FN): Use LANG_DECL_FN_CHECK.
    * call.c (print_z_candidate): Remove STRIP_TEMPLATE use.
    * lambda.c (maybe_add_lambda_conv_op): Likewise.

/testsuite
2014-02-28  Paolo Carlini  paolo.carl...@oracle.com

    PR c++/58610
    * g++.dg/cpp0x/constexpr-ice11.C: New.

Index: cp/call.c
===================================================================
--- cp/call.c (revision 208224)
+++ cp/call.c (working copy)
@@ -3237,7 +3237,7 @@ print_z_candidate (location_t loc, const char *msg
     inform (cloc, "%s%T <conversion>", msg, candidate->fn);
   else if (candidate->viable == -1)
     inform (cloc, "%s%#D <near match>", msg, candidate->fn);
-  else if (DECL_DELETED_FN (STRIP_TEMPLATE (candidate->fn)))
+  else if (DECL_DELETED_FN (candidate->fn))
     inform (cloc, "%s%#D <deleted>", msg, candidate->fn);
   else
     inform (cloc, "%s%#D", msg, candidate->fn);
Index: cp/cp-tree.h
===================================================================
--- cp/cp-tree.h (revision 208224)
+++ cp/cp-tree.h (working copy)
@@ -3222,7 +3222,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a
 /* Nonzero if DECL was declared with '= delete'.  */
 #define DECL_DELETED_FN(DECL) \
-  (DECL_LANG_SPECIFIC (FUNCTION_DECL_CHECK (DECL))->u.base.threadprivate_or_deleted_p)
+  (LANG_DECL_FN_CHECK (DECL)->min.base.threadprivate_or_deleted_p)
 /* Nonzero if DECL was declared with '= default' (maybe implicitly).  */
 #define DECL_DEFAULTED_FN(DECL) \
Index: cp/lambda.c
===================================================================
--- cp/lambda.c (revision 208224)
+++ cp/lambda.c (working copy)
@@ -975,7 +975,7 @@ maybe_add_lambda_conv_op (tree type)
      the conversion op is used.  */
   if (varargs_function_p (callop))
     {
-      DECL_DELETED_FN (STRIP_TEMPLATE (fn)) = 1;
+      DECL_DELETED_FN (fn) = 1;
       return;
     }
Index: testsuite/g++.dg/cpp0x/constexpr-ice11.C
===================================================================
--- testsuite/g++.dg/cpp0x/constexpr-ice11.C (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ice11.C (working copy)
@@ -0,0 +1,9 @@
+// PR c++/58610
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  template<typename> A();
+};
+
+constexpr A a;  // { dg-error "literal|matching" }
Re: [AArch64] Improve vst4_lane intrinsics
On 13 February 2014 16:03, James Greenhalgh james.greenha...@arm.com wrote: Hi, This patch rewrites the vst4_lane intrinsics in terms of RTL builtins. Tested on aarch64-none-elf with no issues. OK to queue for Stage 1? OK for stage 1 /Marcus
Re: RFA: RL78: Add missing instruction patterns
    * config/rl78/rl78-real.md (cbranchsi4_real_signed): Add anti-canonical alternatives.
    (negandhi3_real): New pattern.
    * config/rl78/rl78-virt.md (negandhi3_virt): New pattern.

These are fine, although I don't know why gcc would require a negandhi3 pattern...
Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
Hi,

Thanks. Here is the updated patch.

2014-02-26  Tobias Grosser  tob...@grosser.es
            Mircea Namolaru  mircea.namol...@inria.fr

    PR tree-optimization/58028
    * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar dimensions.

Index: gcc/graphite-clast-to-gimple.c
===================================================================
--- gcc/graphite-clast-to-gimple.c (revision 207298)
+++ gcc/graphite-clast-to-gimple.c (working copy)
@@ -1522,6 +1522,13 @@
      variables.  */
   options->save_domains = 1;
+  /* Do not remove scalar dimensions.  CLooG by default removes scalar
+     dimensions very early from the input schedule.  However, they are
+     necessary to correctly derive from the saved domains
+     (options->save_domains) the relationship between the generated loops
+     and the schedule dimensions they are generated from.  */
+  options->noscalars = 1;
+
   /* Disable optimizations and make cloog generate source code closer to the
      input.  This is useful for debugging, but later we want the optimized
      code.

Mircea
RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)
Multiple large C++ projects (KDE and libreoffice, at least) have been breaking when GCC speculatively devirtualizes a call to an implicitly-declared virtual destructor, because this leads to references to base destructors and vtables that might be hidden in another DSO. This patch avoids this problem by avoiding speculative devirtualization of calls to implicitly-declared functions.

Tested x86_64-pc-linux-gnu. OK for trunk?

commit 94eb5df9fb20c796d09151d7293ae89ac012ae79
Author: Jason Merrill ja...@redhat.com
Date: Fri Feb 28 14:03:19 2014 -0500

    PR c++/58678
    * ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared function.

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 21649cb..27dc27d 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1710,7 +1710,7 @@ ipa_devirt (void)
   int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
   int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
-  int nwrong = 0, nok = 0, nexternal = 0;;
+  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
   FOR_EACH_DEFINED_FUNCTION (n)
     {
@@ -1820,6 +1820,16 @@ ipa_devirt (void)
                 nexternal++;
                 continue;
               }
+            /* Don't use an implicitly-declared destructor (c++/58678).  */
+            struct cgraph_node *real_target
+              = cgraph_function_node (likely_target);
+            if (DECL_ARTIFICIAL (real_target->decl))
+              {
+                if (dump_file)
+                  fprintf (dump_file, "Target is implicitly declared\n\n");
+                nartificial++;
+                continue;
+              }
             if (cgraph_function_body_availability (likely_target)
                 <= AVAIL_OVERWRITABLE
                 && symtab_can_be_discarded (likely_target))
@@ -1862,10 +1872,10 @@ ipa_devirt (void)
              " %i speculatively devirtualized, %i cold\n"
              "%i have multiple targets, %i overwritable,"
              " %i already speculated (%i agree, %i disagree),"
-             " %i external, %i not defined\n",
+             " %i external, %i not defined, %i artificial\n",
              npolymorphic, ndevirtualized, nconverted, ncold,
              nmultiple, noverwritable, nspeculated, nok, nwrong,
-             nexternal, nnotdefined);
+             nexternal, nnotdefined, nartificial);
   return ndevirtualized ? TODO_remove_functions : 0;
 }
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-28.C b/gcc/testsuite/g++.dg/ipa/devirt-28.C
new file mode 100644
index 000..35c8df1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/devirt-28.C
@@ -0,0 +1,17 @@
+// PR c++/58678
+// { dg-options "-O3 -fdump-ipa-devirt" }
+
+struct A {
+  virtual ~A();
+};
+struct B : A {
+  virtual int m_fn1();
+};
+void fn1(B* b) {
+  delete b;
+}
+
+// { dg-final { scan-assembler-not "_ZN1AD2Ev" } }
+// { dg-final { scan-assembler-not "_ZN1BD0Ev" } }
+// { dg-final { scan-ipa-dump "Target is implicitly declared" "devirt" } }
+// { dg-final { cleanup-ipa-dump "devirt" } }
Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)
Multiple large C++ projects (KDE and libreoffice, at least) have been breaking when GCC speculatively devirtualizes a call to an implicitly-declared virtual destructor, because this leads to references to base destructors and vtables that might be hidden in another DSO. This patch avoids this problem by avoiding speculative devirtualization of calls to implicitly-declared functions.

Tested x86_64-pc-linux-gnu. OK for trunk?

commit 94eb5df9fb20c796d09151d7293ae89ac012ae79
Author: Jason Merrill ja...@redhat.com
Date: Fri Feb 28 14:03:19 2014 -0500

    PR c++/58678
    * ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared function.

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 21649cb..27dc27d 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1710,7 +1710,7 @@ ipa_devirt (void)
   int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
   int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
-  int nwrong = 0, nok = 0, nexternal = 0;;
+  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
   FOR_EACH_DEFINED_FUNCTION (n)
     {
@@ -1820,6 +1820,16 @@ ipa_devirt (void)
                 nexternal++;
                 continue;
               }
+            /* Don't use an implicitly-declared destructor (c++/58678).  */
+            struct cgraph_node *real_target
+              = cgraph_function_node (likely_target);
+            if (DECL_ARTIFICIAL (real_target->decl))

I think we can safely test here DECL_ARTIFICIAL && (DECL_EXTERNAL || DECL_COMDAT). If the dtor is going to be output anyway, we are safe to use it.

Are those programs valid by C++ standard? (I believe it is not valid to include stuff whose implementation you do not link with.) If we just want to avoid breaking python and libreoffice (I fixed libreoffice part however), we may just go with the ipa-devirt change as you propose (with external && comdat check).

If this is a correctness issue, I think we want to be safe that other optimizations won't do the same. In that case your check seems misplaced. If DECL_ARTIFICIAL destructors are not safe to inline, I would add it into function_attribute_inlinable_p. If the dtor is not safe to refer, then I would add it into can_refer_decl_in_current_unit_p.

Both such changes would however inhibit quite some optimization, since artificial destructors are quite common case, right? Or is there some reason why only speculative devirtualization count possibly work out reference to these?

Honza
Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)
On 02/28/2014 03:56 PM, Jan Hubicka wrote:
I think we can safely test here DECL_ARTIFICIAL && (DECL_EXTERNAL || DECL_COMDAT). If the dtor is going to be output anyway, we are safe to use it.

We already skipped DECL_EXTERNAL decls, and artificial members are always DECL_COMDAT, but I'll add the COMDAT check.

Are those programs valid by C++ standard? (I believe it is not valid to include stuff whose implementation you do not link with.)

Symbol visibility is outside the scope of the standard.

If we just want to avoid breaking python and libreoffice (I fixed libreoffice part however), we may just go with the ipa-devirt change as you propose (with external && comdat check). If this is a correctness issue, I think we want to be safe that other optimizations won't do the same. In that case your check seems misplaced. If DECL_ARTIFICIAL destructors are not safe to inline, I would add it into function_attribute_inlinable_p. If the dtor is not safe to refer, then I would add it into can_refer_decl_in_current_unit_p. Both such changes would however inhibit quite some optimization, since artificial destructors are quite common case, right? Or is there some reason why only speculative devirtualization count possibly work out reference to these?

Normally, it's fine to inline destructors, and refer to them. The problem comes when we turn what had been a virtual call (which goes through the vtable that is hidden in the DSO) into a direct call to a hidden function. We don't do that for user-defined virtual functions because the user controls whether or not they are defined in the header, and we don't devirtualize if no definition is available, but implicitly-declared functions are different because the user has no way to prevent the definition from being available.

This also isn't a problem for cprop devirtualization, because in that situation we must have already referred to the vtable.

Jason

commit 2a05a09c268ce3abb373aa86cf731d20aac8dd7a
Author: Jason Merrill ja...@redhat.com
Date: Fri Feb 28 14:03:19 2014 -0500

    PR c++/58678
    * ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared function.

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 21649cb..2f84f17 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1710,7 +1710,7 @@ ipa_devirt (void)
   int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
   int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
-  int nwrong = 0, nok = 0, nexternal = 0;;
+  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
   FOR_EACH_DEFINED_FUNCTION (n)
     {
@@ -1820,6 +1820,17 @@ ipa_devirt (void)
                 nexternal++;
                 continue;
               }
+            /* Don't use an implicitly-declared destructor (c++/58678).  */
+            struct cgraph_node *non_thunk_target
+              = cgraph_function_node (likely_target);
+            if (DECL_ARTIFICIAL (non_thunk_target->decl)
+                && DECL_COMDAT (non_thunk_target->decl))
+              {
+                if (dump_file)
+                  fprintf (dump_file, "Target is artificial\n\n");
+                nartificial++;
+                continue;
+              }
             if (cgraph_function_body_availability (likely_target)
                 <= AVAIL_OVERWRITABLE
                 && symtab_can_be_discarded (likely_target))
@@ -1862,10 +1873,10 @@ ipa_devirt (void)
              " %i speculatively devirtualized, %i cold\n"
              "%i have multiple targets, %i overwritable,"
              " %i already speculated (%i agree, %i disagree),"
-             " %i external, %i not defined\n",
+             " %i external, %i not defined, %i artificial\n",
              npolymorphic, ndevirtualized, nconverted, ncold,
              nmultiple, noverwritable, nspeculated, nok, nwrong,
-             nexternal, nnotdefined);
+             nexternal, nnotdefined, nartificial);
   return ndevirtualized ? TODO_remove_functions : 0;
 }
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-28.C b/gcc/testsuite/g++.dg/ipa/devirt-28.C
new file mode 100644
index 000..e18b818
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/devirt-28.C
@@ -0,0 +1,17 @@
+// PR c++/58678
+// { dg-options "-O3 -fdump-ipa-devirt" }
+
+struct A {
+  virtual ~A();
+};
+struct B : A {
+  virtual int m_fn1();
+};
+void fn1(B* b) {
+  delete b;
+}
+
+// { dg-final { scan-assembler-not "_ZN1AD2Ev" } }
+// { dg-final { scan-assembler-not "_ZN1BD0Ev" } }
+// { dg-final { scan-ipa-dump "Target is artificial" "devirt" } }
+// { dg-final { cleanup-ipa-dump "devirt" } }
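
For readers less familiar with the terms in this thread, here is a small self-contained sketch of what an implicitly-declared virtual destructor looks like and where the virtual call sits. It is modeled loosely on devirt-28.C but is not that testcase: the `Member` type and the `destroyed` log are additions made purely so the effect is observable, and the example does not attempt to reproduce the hidden-symbol/DSO failure itself.

```cpp
#include <string>
#include <type_traits>

// Log of destructor activity, so the effect of `delete` can be observed.
static std::string destroyed;

struct Member {
  ~Member() { destroyed += "m"; }
};

struct A {
  virtual ~A() { destroyed += "A"; }  // user-provided virtual destructor
};

struct B : A {
  Member m;  // gives the implicit destructor some visible work to do
  virtual int m_fn1() { return 1; }
  // No destructor is declared here, so the compiler implicitly declares
  // B::~B(); it is virtual because A::~A() is virtual.
};

static_assert(std::has_virtual_destructor<B>::value,
              "B's implicitly-declared destructor is virtual");

// `delete b` is a virtual call: it is dispatched through B's vtable.
void fn1(B *b) { delete b; }
```

Calling `fn1(new B)` runs the implicit `B::~B()`, which destroys `m` and then the `A` base. The author of `B` never wrote that destructor and so cannot choose to keep its definition out of the header, which is the asymmetry with user-defined virtual functions that the message above points out.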
Re: [C++ patch] for C++/52369
On 02/28/2014 04:03 PM, Fabien Chêne wrote:
The first two lines are fine in my opinion. The third line should actually be split into an error + an inform. By doing that, I think we also need to reformulate the error message like this:

testsuite/g++.dg/init/pr44086.C:4:8: error: 'struct A' needs its non-static const members to be initialized
testsuite/g++.dg/init/pr44086.C:6:19: note: 'A::i' should be initialized

What do you think ? (before I bother adjusting the testsuite)

Let's change the C++11 diagnostic to match the C++98 diagnostic. So, "uninitialized const member in %q#T" + "%qD should be initialized".

Incidentally, while moving the diagnostic concerning the uninitialized field from an error to an inform, I realized that the syntactic sugar %q#D is no longer honored and is treated as %qD, is it expected ?

No, how do you mean?

Jason
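
As a small aside on what these diagnostics are about, the sketch below shows a class with a non-static const member that is initialized properly. It is a hypothetical illustration, not the contents of pr44086.C; `read_member` is a made-up helper.

```cpp
// Hypothetical illustration; not the contents of pr44086.C.
struct A {
  const int i;                 // non-static const member
  explicit A(int v) : i(v) {}  // initializes i, so A can be constructed
};

// Without a constructor (or default member initializer) that sets `i`,
// default-initializing an A, as in `A a;`, is ill-formed.  That is the
// situation the error and the accompanying note are meant to describe.
int read_member(int v) {
  A a(v);
  return a.i;
}
```

Splitting the message into an error on the type plus a note on the member, as proposed above, would point at `struct A` first and then at `A::i`.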
[jit] New API entrypoint: gcc_jit_context_new_cast
Committed to branch dmalcolm/jit:

gcc/jit/
    * libgccjit.h (gcc_jit_context_new_cast): New.
    * libgccjit.map (gcc_jit_context_new_cast): New.
    * libgccjit++.h (gccjit::context::new_cast): New method.
    * libgccjit.c (gcc_jit_context_new_cast): New.
    * internal-api.h (gcc::jit::recording::context::new_cast): New method.
    (gcc::jit::recording::cast): New subclass of rvalue.
    (gcc::jit::playback::context::new_cast): New method.
    (gcc::jit::playback::context::build_cast): New method.
    * internal-api.c (convert): New.
    (gcc::jit::recording::context::new_cast): New.
    (gcc::jit::recording::cast::replay_into): New.
    (gcc::jit::recording::cast::make_debug_string): New.
    (gcc::jit::playback::context::build_cast): New.
    (gcc::jit::playback::context::new_cast): New.
    * TODO.rst: Update.

gcc/testsuite/
    * jit.dg/test-expressions.c (make_test_of_cast): New, to test new entrypoint gcc_jit_context_new_cast.
    (make_tests_of_casts): New.
    (create_code): Add call to make_tests_of_casts.
    (verify_code): Add call to verify_casts.
---
 gcc/jit/ChangeLog.jit                   |  21 ++
 gcc/jit/TODO.rst                        |   9 +--
 gcc/jit/internal-api.c                  | 102 ++
 gcc/jit/internal-api.h                  |  33 +
 gcc/jit/libgccjit++.h                   |  15
 gcc/jit/libgccjit.c                     |  13
 gcc/jit/libgccjit.h                     |  11 +++
 gcc/jit/libgccjit.map                   |   1 +
 gcc/testsuite/ChangeLog.jit             |   8 ++
 gcc/testsuite/jit.dg/test-expressions.c | 126
 10 files changed, 332 insertions(+), 7 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 6c43ce9..625e01a 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,26 @@
 2014-02-28  David Malcolm  dmalc...@redhat.com
+    * libgccjit.h (gcc_jit_context_new_cast): New.
+    * libgccjit.map (gcc_jit_context_new_cast): New.
+    * libgccjit++.h (gccjit::context::new_cast): New method.
+    * libgccjit.c (gcc_jit_context_new_cast): New.
+
+    * internal-api.h (gcc::jit::recording::context::new_cast): New method.
+    (gcc::jit::recording::cast): New subclass of rvalue.
+    (gcc::jit::playback::context::new_cast): New method.
+    (gcc::jit::playback::context::build_cast): New method.
+
+    * internal-api.c (convert): New.
+    (gcc::jit::recording::context::new_cast): New.
+    (gcc::jit::recording::cast::replay_into): New.
+    (gcc::jit::recording::cast::make_debug_string): New.
+    (gcc::jit::playback::context::build_cast): New.
+    (gcc::jit::playback::context::new_cast): New.
+
+    * TODO.rst: Update.
+
+2014-02-28  David Malcolm  dmalc...@redhat.com
+
     * libgccjit.h (gcc_jit_block_get_function): New.
     * libgccjit.map (gcc_jit_block_get_function): New.
     * libgccjit++.h (gccjit::block::get_function): New method.
diff --git a/gcc/jit/TODO.rst b/gcc/jit/TODO.rst
index 227113a..8a2308e 100644
--- a/gcc/jit/TODO.rst
+++ b/gcc/jit/TODO.rst
@@ -23,13 +23,6 @@ Initial Release
 * expose the statements in the API? (mostly so they can be stringified?)
-* explicit casts::
-
-    extern gcc_jit_rvalue *
-    gcc_jit_rvalue_cast (gcc_jit_rvalue *, gcc_jit_type *);
-
-  e.g. (void*) to (struct foo*)
-
 * support more arithmetic ops and comparison modes
 * access to a function by address::
@@ -119,6 +112,8 @@ Initial Release
   have each block have its own stmt_list, avoiding the need for this
   traversal, and having the block structure show up within tree dumps.
+* Implement more kinds of casts e.g. pointers
+
 Bugs
 * INTERNAL functions don't seem to work (see e.g. test-quadratic, on trying
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c
index fa08e56..573dc67 100644
--- a/gcc/jit/internal-api.c
+++ b/gcc/jit/internal-api.c
@@ -16,12 +16,29 @@
 #include "diagnostic-core.h"
 #include "dumpfile.h"
 #include "tree-cfg.h"
+#include "target.h"
+#include "convert.h"
 #include <pthread.h>
 #include "internal-api.h"
 #include "jit-builtins.h"
+/* gcc::jit::playback::context::build_cast uses the convert.h API,
+   which in turn requires the frontend to provide a convert
+   function, apparently as a fallback.
+
+   Hence we provide this dummy one, with the requirement that any casts
+   are handled before reaching this.  */
+extern tree convert (tree type, tree expr);
+
+tree
+convert (tree /*type*/, tree /*expr*/)
+{
+  error ("unhandled conversion");
+  return error_mark_node;
+}
+
 namespace gcc {
 namespace jit {
@@ -474,6 +491,16 @@ recording::context::new_comparison (recording::location *loc,
 }
 recording::rvalue *
+recording::context::new_cast (recording::location *loc,
+                              recording::rvalue *expr,
+                              recording::type
Re: [jit] Major API change: blocks rather than labels
On Thu, 2014-02-27 at 17:25 -0500, David Malcolm wrote: On Thu, 2014-02-27 at 17:11 -0500, David Malcolm wrote: [...] With this commit, the API changes to using basic blocks instead: blocks are created within functions, and statements are added to blocks, rather than to functions. [...] I've also ported the jittest example to the new API, as of this commit: https://github.com/davidmalcolm/jittest/commit/af66efe0386e52a9292b7527174ae402c0af5e43 (though currently it falls foul of type-checking, due to int vs bool issues in conditionals; upon hacking out the type-checking from libgccjit it compiles and runs OK). jittest is now fixed, as of: https://github.com/davidmalcolm/jittest/commit/7af0765c018e15d600016d41f7b444273cc0389a
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 02/28/2014 05:21 PM, Bernd Schmidt wrote: On 02/28/2014 05:09 PM, Ilya Verbin wrote: Unfortunately I don't fully understand this configure magic... When a user specifies 2 or 3 accelerators during configuration with --enable-accelerators, will several different accel-gccs be built? No - the idea is that --enable-accelerator= is likely specific to ptx, where we really just want to build a gcc and no target libraries, so building it alongside the host in an accel-gcc subdirectory is ideal. For your use case, I'd imagine the offload compiler would be built relatively normally as a full build with --enable-as-accelerator-for=x86_64-linux, which would install it into locations where the host will eventually be able to find it. Then the host compiler would be built with another new configure option (as yet unimplemented in my patch set) --enable-offload-targets=mic,... which would tell the host compiler about the pre-built offload target compilers. On the ptx side, --enable-accelerator=ptx would then also add ptx to the list of --enable-offload-targets. Naming of all these configure options can be discussed, I have no real preference for any of them. IOW, something like the following on top of the other patches. Ideally we'd also add error checking to make sure the offload compilers exist in the places we'll be looking for them. Bernd Index: gomp-4_0-branch/gcc/config.in === --- gomp-4_0-branch.orig/gcc/config.in +++ gomp-4_0-branch/gcc/config.in @@ -1748,6 +1748,12 @@ #endif +/* Define to hold the list of target names suitable for offloading. */ +#ifndef USED_FOR_TARGET +#undef OFFLOAD_TARGETS +#endif + + /* Define to the address where bug reports for this package should be sent. 
*/ #ifndef USED_FOR_TARGET #undef PACKAGE_BUGREPORT Index: gomp-4_0-branch/gcc/configure === --- gomp-4_0-branch.orig/gcc/configure +++ gomp-4_0-branch/gcc/configure @@ -908,6 +908,7 @@ with_bugurl enable_languages enable_accelerator enable_as_accelerator_for +enable_offload_targets with_multilib_list enable_rpath with_libiconv_prefix @@ -1618,6 +1619,8 @@ Optional Features: --enable-acceleratorbuild accelerator [ARG={no,device-triplet}] --enable-as-accelerator-for build compiler as accelerator target for given host + --enable-offload-targets=LIST + enable offloading to devices from LIST --disable-rpath do not hardcode runtime library paths --enable-sjlj-exceptions arrange to use setjmp/longjmp exception handling @@ -7299,12 +7302,14 @@ else fi +offload_targets= # Check whether --enable-accelerator was given. if test ${enable_accelerator+set} = set; then : enableval=$enable_accelerator; case $enable_accelerator in no) ;; *) +offload_targets=$enable_accelerator $as_echo #define ENABLE_OFFLOADING 1 confdefs.h @@ -7343,6 +7348,31 @@ fi +# Check whether --enable-offload-targets was given. +if test ${enable_offload_targets+set} = set; then : + enableval=$enable_offload_targets; + if test x$enable_offload_targets = x; then +as_fn_error no offload targets specified $LINENO 5 + else +if test x$offload_targets = x; then + offload_targets=$enable_offload_targets +else + offload_targets=$offload_targets,$enable_offload_targets +fi + fi + +else + enable_accelerator=no +fi + + +offload_targets=`echo $offload_targets | sed -e 's#,#:#'` + +cat confdefs.h _ACEOF +#define OFFLOAD_TARGETS $offload_targets +_ACEOF + + # Check whether --with-multilib-list was given. 
if test ${with_multilib_list+set} = set; then : @@ -17983,7 +18013,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 17986 configure +#line 18016 configure #include confdefs.h #if HAVE_DLFCN_H @@ -18089,7 +18119,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18092 configure +#line 18122 configure #include confdefs.h #if HAVE_DLFCN_H Index: gomp-4_0-branch/gcc/configure.ac === --- gomp-4_0-branch.orig/gcc/configure.ac +++ gomp-4_0-branch/gcc/configure.ac @@ -839,12 +839,14 @@ AC_ARG_ENABLE(languages, esac], [enable_languages=c]) +offload_targets= AC_ARG_ENABLE(accelerator, [AS_HELP_STRING([--enable-accelerator], [build accelerator @:@ARG={no,device-triplet}@:@])], [ case $enable_accelerator in no) ;; *) +offload_targets=$enable_accelerator AC_DEFINE(ENABLE_OFFLOADING, 1, [Define this to enable support for offloading.]) AC_DEFINE_UNQUOTED(ACCEL_TARGET,${enable_accelerator}, @@ -871,6 +873,25 @@ AC_ARG_ENABLE(as-accelerator-for, ], [enable_as_accelerator=no]) AC_SUBST(enable_as_accelerator)
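The configure fragment above ends up defining OFFLOAD_TARGETS as a single string in which the target names are meant to be separated by colons. (As posted, the sed expression has no g flag, so it would appear to convert only the first comma.) The patch does not show how the host compiler consumes that string. As a minimal sketch only, assuming a colon-separated list and using a made-up helper name, split_offload_targets, that is not part of the patch, a consumer in C could split it like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: split a colon-separated
   target list such as "ptx:mic" in place.  Stores up to MAX pointers to
   the individual names in OUT and returns how many were stored.  Empty
   fields (as in "a::b") are skipped.  */
static size_t
split_offload_targets (char *list, char **out, size_t max)
{
  size_t n = 0;
  char *p = list;

  assert (list != NULL && out != NULL);

  while (*p != '\0' && n < max)
    {
      /* Terminate the current field at the next colon, if any.  */
      char *end = strchr (p, ':');
      if (end != NULL)
        *end = '\0';
      if (*p != '\0')
        out[n++] = p;
      if (end == NULL)
        break;
      p = end + 1;
    }
  return n;
}
```

The function modifies its argument, so it should be given a writable copy of the configured string rather than the string literal itself.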
Re: [C++ patch] for C++/52369
2014-02-28 22:52 GMT+01:00 Fabien Chêne fabien.ch...@gmail.com: Incidentally, while moving the diagnostic concerning the uninitialized field from an error to an inform, I realized that the syntactic sugar %q#D is no longer honored and is treated as %qD; is that expected? No, how do you mean? I must be tired, false alarm, sorry. I guess my mistake comes from the fact that %q#D is not present in the C++98 diagnostic. Shall we homogenize that as well, in favor of %q#D? -- Fabien
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 28 Feb 17:21, Bernd Schmidt wrote: It would help to see the code you have on the libgomp side, I don't believe that's been posted yet? It was posted here: http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01777.html And below is the updated version. --- libgomp/libgomp.map |1 + libgomp/target.c| 138 --- 2 files changed, 132 insertions(+), 7 deletions(-) diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map index cb52e45..d33673d 100644 --- a/libgomp/libgomp.map +++ b/libgomp/libgomp.map @@ -208,6 +208,7 @@ GOMP_3.0 { GOMP_4.0 { global: + GOMP_register_lib; GOMP_barrier_cancel; GOMP_cancel; GOMP_cancellation_point; diff --git a/libgomp/target.c b/libgomp/target.c index a6a5505..7fafa9a 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -84,6 +84,19 @@ struct splay_tree_key_s { bool copy_from; }; +enum library_descr { + DESCR_TABLE_START, + DESCR_TABLE_END, + DESCR_IMAGE_START, + DESCR_IMAGE_END +}; + +/* Array of pointers to target shared library descriptors. */ +static void **libraries; + +/* Total number of target shared libraries. */ +static int num_libraries; + /* Array of descriptors of all available devices. */ static struct gomp_device_descr *devices; @@ -107,6 +120,12 @@ splay_compare (splay_tree_key x, splay_tree_key y) #include splay-tree.h +struct target_table_s +{ + void **entries; + int num_entries; +}; + /* This structure describes accelerator device. It contains name of the corresponding libgomp plugin, function handlers for interaction with the device, ID-number of the device, and information about @@ -117,15 +136,21 @@ struct gomp_device_descr TARGET construct. */ int id; + /* Set to true when device is initialized. */ + bool is_initialized; + /* Plugin file handler. */ void *plugin_handle; /* Function handlers. 
*/ - bool (*device_available_func) (void); + bool (*device_available_func) (int); + void (*device_init_func) (int); + struct target_table_s (*device_load_image_func) (void *, int); void *(*device_alloc_func) (size_t); void (*device_free_func) (void *); void *(*device_dev2host_func)(void *, const void *, size_t); void *(*device_host2dev_func)(void *, const void *, size_t); + void (*device_run_func) (void *, void *); /* Splay tree containing information about mapped memory regions. */ struct splay_tree_s dev_splay_tree; @@ -471,6 +496,80 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, gomp_mutex_unlock (devicep-dev_env_lock); } +void +GOMP_register_lib (const void *openmp_target) +{ + libraries = realloc (libraries, (num_libraries + 1) * sizeof (void *)); + + if (libraries == NULL) +return; + + libraries[num_libraries] = (void *) openmp_target; + + num_libraries++; +} + +static void +gomp_init_device (struct gomp_device_descr *devicep) +{ + /* Initialize the target device. */ + devicep-device_init_func (devicep-id); + + /* Load shared libraries into target device and + perform host-target address mapping. */ + int i; + for (i = 0; i num_libraries; i++) +{ + /* Get the pointer to the target image from the library descriptor. */ + void **lib = libraries[i]; + + /* FIXME: Select the proper target image, if there are several. */ + void *target_image = lib[DESCR_IMAGE_START]; + int target_img_size = lib[DESCR_IMAGE_END] - lib[DESCR_IMAGE_START]; + + /* Calculate the size of host address table. */ + void **host_table_start = lib[DESCR_TABLE_START]; + void **host_table_end = lib[DESCR_TABLE_END]; + int host_table_size = host_table_end - host_table_start; + + /* Load library into target device and receive its address table. 
*/ + struct target_table_s target_table + = devicep-device_load_image_func (target_image, target_img_size); + + if (host_table_size != target_table.num_entries) + gomp_fatal (Can't map target objects); + + void **host_entry, **target_entry; + for (host_entry = host_table_start, target_entry = target_table.entries; + host_entry host_table_end; host_entry += 2, target_entry += 2) + { + struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt)); + tgt-refcount = 1; + tgt-array = gomp_malloc (sizeof (*tgt-array)); + tgt-tgt_start = (uintptr_t) *target_entry; + tgt-tgt_end = tgt-tgt_start + *((uint64_t *) target_entry + 1); + tgt-to_free = NULL; + tgt-list_count = 0; + tgt-device_descr = devicep; + splay_tree_node node = tgt-array; + splay_tree_key k = node-key; + k-host_start = (uintptr_t) *host_entry; + k-host_end = k-host_start + *((uint64_t *) host_entry + 1); + k-tgt_offset = 0; + k-tgt = tgt; + node-left = NULL; + node-right = NULL; + splay_tree_insert (devicep-dev_splay_tree, node); + } + +
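One detail of GOMP_register_lib in the patch above: it assigns the result of realloc straight back to libraries, so if realloc fails the previous array is no longer reachable and the function returns silently. A minimal sketch of the usual alternative follows, assuming the same two file-scope variables; the name register_lib and the bool return value are illustrative choices, not what libgomp does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for the file-scope variables in the patch.  */
static void **libraries;
static int num_libraries;

/* Hypothetical variant of GOMP_register_lib: append DESCR to the array,
   keeping the old array intact if realloc fails.  Returns true on
   success.  */
static bool
register_lib (const void *descr)
{
  assert (num_libraries >= 0);

  void **grown = realloc (libraries,
                          ((size_t) num_libraries + 1) * sizeof (void *));
  if (grown == NULL)
    return false;   /* LIBRARIES and NUM_LIBRARIES are unchanged.  */

  libraries = grown;
  libraries[num_libraries++] = (void *) descr;
  return true;
}
```

Whether a failed registration should be reported, ignored, or treated as fatal is a design decision for the runtime; the sketch only shows how to avoid losing the existing entries.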
[GOOGLE] Remove size check when loop is very hot
This patch removes the size limit for loop unroll/peel when the loop is truly hot. This makes the implementation easily maintainable between FDO and AutoFDO.

Bootstrapped and loadtest perf show neutral impact.

OK for google-4_8?

Thanks,
Dehao

Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c   (revision 208233)
+++ gcc/loop-unroll.c   (working copy)
@@ -347,11 +347,9 @@ code_size_limit_factor(struct loop *loop)
   /* Next, set the value of the codesize-based unroll factor divisor which in
      most loops will need to be set to a value that will reduce or eliminate
      unrolling/peeling.  */
-  if (num_hot_counters < size_threshold * 2
-      && loop->header->count > 0)
+  if (loop->header->count > 0)
     {
-      /* For applications that are less than twice the codesize limit, allow
-         limited unrolling for very hot loops.  */
+      /* Allow limited unrolling for very hot loops.  */
       sum_to_header_ratio = profile_info->sum_all / loop->header->count;
       hotness_ratio_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
       /* When the profile count sum to loop entry header ratio is smaller than
Re: [GOOGLE] Remove size check when loop is very hot
Looks good to me.

Thanks,
Teresa

On Fri, Feb 28, 2014 at 2:17 PM, Dehao Chen de...@google.com wrote:
> This patch removes the size limit for loop unroll/peel when the loop is
> truly hot. This makes the implementation easily maintainable between
> FDO and AutoFDO.
>
> Bootstrapped and loadtest perf show neutral impact.
>
> OK for google-4_8?
>
> Thanks,
> Dehao
>
> Index: gcc/loop-unroll.c
> ===================================================================
> --- gcc/loop-unroll.c   (revision 208233)
> +++ gcc/loop-unroll.c   (working copy)
> @@ -347,11 +347,9 @@ code_size_limit_factor(struct loop *loop)
>    /* Next, set the value of the codesize-based unroll factor divisor which in
>       most loops will need to be set to a value that will reduce or eliminate
>       unrolling/peeling.  */
> -  if (num_hot_counters < size_threshold * 2
> -      && loop->header->count > 0)
> +  if (loop->header->count > 0)
>      {
> -      /* For applications that are less than twice the codesize limit, allow
> -         limited unrolling for very hot loops.  */
> +      /* Allow limited unrolling for very hot loops.  */
>        sum_to_header_ratio = profile_info->sum_all / loop->header->count;
>        hotness_ratio_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
>        /* When the profile count sum to loop entry header ratio is smaller than

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
calloc = malloc + memset
Hello, this is a stage 1 patch, and I'll ping it then, but if you have comments now... Passes bootstrap+testsuite on x86_64-linux-gnu. 2014-02-28 Marc Glisse marc.gli...@inria.fr PR tree-optimization/57742 gcc/ * tree-ssa-forwprop.c (simplify_malloc_memset): New function. (simplify_builtin_call): Call it. gcc/testsuite/ * g++.dg/tree-ssa/calloc.C: New testcase. * gcc.dg/tree-ssa/calloc.c: Likewise. -- Marc GlisseIndex: gcc/testsuite/g++.dg/tree-ssa/calloc.C === --- gcc/testsuite/g++.dg/tree-ssa/calloc.C (revision 0) +++ gcc/testsuite/g++.dg/tree-ssa/calloc.C (working copy) @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-options -std=gnu++11 -O3 -fdump-tree-optimized } */ + +#include new +#include vector +#include cstdlib + +void g(void*); +inline void* operator new(std::size_t sz) _GLIBCXX_THROW (std::bad_alloc) +{ + void *p; + + if (sz == 0) +sz = 1; + + // Slightly modified from the libsupc++ version, that one has 2 calls + // to malloc which makes it too hard to optimize. + while ((p = std::malloc (sz)) == 0) +{ + std::new_handler handler = std::get_new_handler (); + if (! 
handler) +_GLIBCXX_THROW_OR_ABORT(std::bad_alloc()); + handler (); +} + return p; +} + +void f(void*p,int n){ + new(p)std::vectorint(n); +} + +/* { dg-final { scan-tree-dump-times calloc 1 optimized } } */ +/* { dg-final { scan-tree-dump-not malloc optimized } } */ +/* { dg-final { scan-tree-dump-not memset optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Property changes on: gcc/testsuite/g++.dg/tree-ssa/calloc.C ___ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Index: gcc/testsuite/gcc.dg/tree-ssa/calloc.c === --- gcc/testsuite/gcc.dg/tree-ssa/calloc.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/calloc.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ + +#include stdlib.h +#include string.h + +extern int a; +extern int* b; +int n; +void* f(long*q){ + int*p=malloc(n); + ++*q; + if(p){ +++*q; +a=2; +memset(p,0,n); +*b=3; + } + return p; +} +void* g(void){ + float*p=calloc(8,4); + return memset(p,0,32); +} + +/* { dg-final { scan-tree-dump-times calloc 2 optimized } } */ +/* { dg-final { scan-tree-dump-not malloc optimized } } */ +/* { dg-final { scan-tree-dump-not memset optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/calloc.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-forwprop.c === --- gcc/tree-ssa-forwprop.c (revision 208224) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -1487,20 +1487,149 @@ constant_pointer_difference (tree p1, tr } for (i = 0; i cnt[0]; i++) for (j = 0; j cnt[1]; j++) if (exps[0][i] == exps[1][j]) return size_binop (MINUS_EXPR, offs[0][i], offs[1][j]); return NULL_TREE; } +/* Optimize + ptr = malloc (n); + memset (ptr, 0, 
n); + into + ptr = calloc (n); + gsi_p is known to point to a call to __builtin_memset. */ +static bool +simplify_malloc_memset (gimple_stmt_iterator *gsi_p) +{ + /* First make sure we have: + ptr = malloc (n); + memset (ptr, 0, n); */ + gimple stmt2 = gsi_stmt (*gsi_p); + if (!integer_zerop (gimple_call_arg (stmt2, 1))) +return false; + tree ptr1, ptr2 = gimple_call_arg (stmt2, 0); + tree size = gimple_call_arg (stmt2, 2); + if (TREE_CODE (ptr2) != SSA_NAME) +return false; + gimple stmt1 = SSA_NAME_DEF_STMT (ptr2); + tree callee1; + /* Handle the case where STMT1 is a unary PHI, which happends + for instance with: + while (!(p = malloc (n))) { ... } + memset (p, 0, n); */ + if (!stmt1) +return false; + if (gimple_code (stmt1) == GIMPLE_PHI + gimple_phi_num_args (stmt1) == 1) +{ + ptr1 = gimple_phi_arg_def (stmt1, 0); + if (TREE_CODE (ptr1) != SSA_NAME) + return false; + stmt1 = SSA_NAME_DEF_STMT (ptr1); +} + else +ptr1 = ptr2; + if (!stmt1 + || !is_gimple_call (stmt1) + || !(callee1 = gimple_call_fndecl (stmt1))) +return false; + + bool is_calloc; + if (DECL_FUNCTION_CODE (callee1) == BUILT_IN_MALLOC) +{ + is_calloc = false; + if (!operand_equal_p (gimple_call_arg (stmt1, 0), size, 0)) + return false; +} + else if (DECL_FUNCTION_CODE
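For readers who want the source-level picture of what simplify_malloc_memset is aiming at, here is a small sketch in C. It is an illustration of the before and after shapes, not the GIMPLE the pass produces; the function names are made up, and writing the calloc call as (n, 1) is an assumption, since the comment in the patch abbreviates it as calloc (n).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: allocate N bytes, then zero them if the allocation succeeded.  */
static void *
alloc_zeroed_before (size_t n)
{
  void *p = malloc (n);
  if (p != NULL)
    memset (p, 0, n);
  return p;
}

/* After: one call that returns zeroed memory.  calloc takes an element
   count and an element size, so N bytes is written as (N, 1) here.  */
static void *
alloc_zeroed_after (size_t n)
{
  return calloc (n, 1);
}

/* Returns 1 if the first N bytes at P are all zero, 0 otherwise.  */
static int
all_zero (const unsigned char *p, size_t n)
{
  assert (p != NULL);
  for (size_t i = 0; i < n; i++)
    if (p[i] != 0)
      return 0;
  return 1;
}
```

Both functions return either a null pointer or a block whose first n bytes are zero, which is why the pair of calls can be folded into one when the sizes match.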
Re: [C++ patch] for C++/52369
On 02/28/2014 05:04 PM, Fabien Chêne wrote:
> I guess my mistake comes from the fact that %q#D is not present in the
> C++98 diagnostic. Shall we homogenize that as well, in favor of %q#D?

OK.

Jason
[jit] Add typechecking to binary ops and comparisons
Committed to branch dmalcolm/jit: gcc/jit/ * libgccjit.c (gcc_jit_context_new_binary_op): Check that the operands have the same type. (gcc_jit_context_new_comparison): Likewise. --- gcc/jit/ChangeLog.jit | 6 ++ gcc/jit/libgccjit.c | 18 ++ 2 files changed, 24 insertions(+) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 625e01a..f2fea8c 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,11 @@ 2014-02-28 David Malcolm dmalc...@redhat.com + * libgccjit.c (gcc_jit_context_new_binary_op): Check that the + operands have the same type. + (gcc_jit_context_new_comparison): Likewise. + +2014-02-28 David Malcolm dmalc...@redhat.com + * libgccjit.h (gcc_jit_context_new_cast): New. * libgccjit.map (gcc_jit_context_new_cast): New. * libgccjit++.h (gccjit::context::new_cast): New method. diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 6c078ce..d9f63cf 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -752,6 +752,15 @@ gcc_jit_context_new_binary_op (gcc_jit_context *ctxt, RETURN_NULL_IF_FAIL (result_type, ctxt, NULL result_type); RETURN_NULL_IF_FAIL (a, ctxt, NULL a); RETURN_NULL_IF_FAIL (b, ctxt, NULL b); + RETURN_NULL_IF_FAIL_PRINTF4 ( +a-get_type () == b-get_type (), +ctxt, +mismatching types for binary op: + a: %s (type: %s) b: %s (type: %s), +a-get_debug_string (), +a-get_type ()-get_debug_string (), +b-get_debug_string (), +b-get_type ()-get_debug_string ()); return (gcc_jit_rvalue *)ctxt-new_binary_op (loc, op, result_type, a, b); } @@ -766,6 +775,15 @@ gcc_jit_context_new_comparison (gcc_jit_context *ctxt, /* op is checked by the inner function. 
*/ RETURN_NULL_IF_FAIL (a, ctxt, NULL a); RETURN_NULL_IF_FAIL (b, ctxt, NULL b); + RETURN_NULL_IF_FAIL_PRINTF4 ( +a-get_type () == b-get_type (), +ctxt, +mismatching types for comparison: + a: %s (type: %s) b: %s (type: %s), +a-get_debug_string (), +a-get_type ()-get_debug_string (), +b-get_debug_string (), +b-get_type ()-get_debug_string ()); return (gcc_jit_rvalue *)ctxt-new_comparison (loc, op, a, b); } -- 1.7.11.7
[PATCH, rs6000] Restrict reload use of FLOAT_REGS
Hi,

We've encountered a rare bug that occurs when attempting to reload for an unaligned store in DImode. For an unaligned store, using stfd gets preference over std since stfd doesn't have an alignment restriction and therefore the "m" constraint matches. However, when there is not a register available for the REG to be stored, register elimination can replace the REG with its REG_EQUIV. When this is a PLUS, we end up with an attempt to compute an integer add into a floating-point register, and things rapidly go downhill.

We had some internal discussion and determined the best way to fix this is to avoid ever using FLOAT_REGS for a PLUS in rs6000_preferred_reload_class, similar to what's currently done to avoid loading constants into FLOAT_REGS. Uli Weigand pointed out that this existing test is actually a bit too strong, as rclass could be ALL_REGS and this prevents us from using GENERAL_REGS in that case. So I've relaxed that test to only look for superclasses of FLOAT_REGS. (If you feel this is too risky, I can avoid that change.)

The patch below fixes the one case where we've observed this bug in the wild (it occurred for a particular snapshot of code for an internal build that doesn't match any public branch). Because it's dependent on register spill, it is very difficult to try to produce a test case that isn't too fragile, so I haven't tried to add one.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu (--with-cpu=power8) and powerpc64-unknown-linux-gnu (--with-cpu=power7) with no regressions. Is this ok for trunk?

Thanks,
Bill


2014-02-28  Bill Schmidt  wschm...@linux.vnet.ibm.com

    * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
    PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
    constraint on constants to only prevent them from being reloaded
    into a superset of FLOAT_REGS.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 208207)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -16751,7 +16751,8 @@ rs6000_preferred_reload_class (rtx x, enum reg_cla
       && easy_vector_constant (x, mode))
     return ALTIVEC_REGS;
 
-  if (CONSTANT_P (x) && reg_classes_intersect_p (rclass, FLOAT_REGS))
+  if ((CONSTANT_P (x) || GET_CODE (x) == PLUS)
+      && reg_class_subset_p (FLOAT_REGS, rclass))
     return NO_REGS;
 
   if (GET_MODE_CLASS (mode) == MODE_INT && rclass == NON_SPECIAL_REGS)
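As a rough illustration of how the two predicates in the hunk above differ, here is a toy model in C that treats a register class as a bitmask of hard registers. Everything in it is an assumption made for the example: the toy_ names, the masks, and in particular TOY_MIXED_REGS, which is an invented class and does not correspond to any rs6000 register class.

```c
#include <stdbool.h>

/* Toy model: a register class is a bitmask of hard registers.  The
   names and masks below are illustrative, not the rs6000 definitions.  */
typedef unsigned int regset;

#define TOY_GENERAL_REGS  0x0000ffffu
#define TOY_FLOAT_REGS    0x00ff0000u
#define TOY_ALL_REGS      0xffffffffu
/* A made-up class that shares only some registers with TOY_FLOAT_REGS.  */
#define TOY_MIXED_REGS    0x000f00ffu

/* True if A and B have at least one register in common.  */
static bool
toy_classes_intersect_p (regset a, regset b)
{
  return (a & b) != 0;
}

/* True if every register in A is also in B.  */
static bool
toy_class_subset_p (regset a, regset b)
{
  return (a & ~b) == 0;
}
```

In this model an intersection test against TOY_FLOAT_REGS accepts TOY_MIXED_REGS, while the subset test with TOY_FLOAT_REGS as the first argument does not; both tests accept TOY_ALL_REGS. The subset form is therefore the narrower of the two conditions.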
[PATCH, rs6000] Document reserved use of wc constraint
Hi,

Hal Finkel requested that we define a constraint for representing individual CR bits. We agreed to reserve "wc" for this purpose to maintain compatibility with LLVM. This patch documents that use.

A pro-forma regstrap is in progress. Assuming no problems, is this ok for trunk?

Thanks,
Bill


2014-02-28  Bill Schmidt  wschm...@linux.vnet.ibm.com

    * config/rs6000/constraints.md: Document reserved use of "wc".


Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md    (revision 208237)
+++ gcc/config/rs6000/constraints.md    (working copy)
@@ -56,6 +56,9 @@
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "Any VSX register if the -mvsx option was used or NO_REGS.")
 
+;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits.
+;; It is currently used for that purpose in LLVM.
+
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")
[PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8
I'd like to ask for permission to backport the following two LIBITM bug fixes to the FSF 4.8 branch. Although these are not technically fixing regressions, they do fix the libitm.c/reentrant.c testsuite failure on s390 and powerpc (or at least it will when we finally get our power8 code backported to FSF 4.8). It also fixes a real bug on x86 that is latent because we don't currently have a test case that warms up the x86's RTM hardware enough such that its xbegin succeeds exposing the bug. I'd like this backport so that the 4.8 based distros won't need to carry this as an add-on patch. It should also be fairly safe as well, since the fixed code is limited to the arches (x86, s390 and powerpc) that define USE_HTM_FASTPATH, so all others definitely won't see a difference. I'll note I CC'd some of the usual suspects interested in TM as well as the normal RMs, because LIBITM doesn't seem to have a maintainer or reviewer listed in the MAINTAINERS file. Is that an oversight or??? Peter Backport from mainline 2013-06-20 Torvald Riegel trie...@redhat.com * query.cc (_ITM_inTransaction): Abort when using the HTM fastpath. (_ITM_getTransactionId): Same. * config/x86/target.h (htm_transaction_active): New. 2013-06-20 Torvald Riegel trie...@redhat.com PR libitm/57643 * beginend.cc (gtm_thread::begin_transaction): Handle reentrancy in the HTM fastpath. Index: libitm/beginend.cc === --- libitm/beginend.cc (revision 208151) +++ libitm/beginend.cc (working copy) @@ -197,6 +197,8 @@ // We are executing a transaction now. // Monitor the writer flag in the serial-mode lock, and abort // if there is an active or waiting serial-mode transaction. + // Note that this can also happen due to an enclosing + // serial-mode transaction; we handle this case below. 
if (unlikely(serial_lock.is_write_locked())) htm_abort(); else @@ -219,6 +221,14 @@ tx = new gtm_thread(); set_gtm_thr(tx); } + // Check whether there is an enclosing serial-mode transaction; + // if so, we just continue as a nested transaction and don't + // try to use the HTM fastpath. This case can happen when an + // outermost relaxed transaction calls unsafe code that starts + // a transaction. + if (tx-nesting 0) + break; + // Another thread is running a serial-mode transaction. Wait. serial_lock.read_lock(tx); serial_lock.read_unlock(tx); // TODO We should probably reset the retry count t here, unless Index: libitm/config/x86/target.h === --- libitm/config/x86/target.h (revision 208151) +++ libitm/config/x86/target.h (working copy) @@ -125,6 +125,13 @@ { return begin_ret _XABORT_RETRY; } + +/* Returns true iff a hardware transaction is currently being executed. */ +static inline bool +htm_transaction_active () +{ + return _xtest() != 0; +} #endif Index: libitm/query.cc === --- libitm/query.cc (revision 208151) +++ libitm/query.cc (working copy) @@ -43,6 +43,15 @@ _ITM_howExecuting ITM_REGPARM _ITM_inTransaction (void) { +#if defined(USE_HTM_FASTPATH) + // If we use the HTM fastpath, we cannot reliably detect whether we are + // in a transaction because this function can be called outside of + // a transaction and thus we can't deduce this by looking at just the serial + // lock. This function isn't used in practice currently, so the easiest + // way to handle it is to just abort. + if (htm_fastpath htm_transaction_active()) +htm_abort(); +#endif struct gtm_thread *tx = gtm_thr(); if (tx (tx-nesting 0)) { @@ -58,6 +67,11 @@ _ITM_transactionId_t ITM_REGPARM _ITM_getTransactionId (void) { +#if defined(USE_HTM_FASTPATH) + // See ITM_inTransaction. + if (htm_fastpath htm_transaction_active()) +htm_abort(); +#endif struct gtm_thread *tx = gtm_thr(); return (tx (tx-nesting 0)) ? tx-id : _ITM_noTransactionId; }
Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)
I went ahead and checked in my patch so that the regression is fixed over the weekend. Jason
Re: [PATCH, rs6000] Document reserved use of wc constraint
On Fri, Feb 28, 2014 at 7:23 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote:
> Hi,
>
> Hal Finkel requested that we define a constraint for representing
> individual CR bits. We agreed to reserve "wc" for this purpose to
> maintain compatibility with LLVM. This patch documents that use.
>
> A pro-forma regstrap is in progress. Assuming no problems, is this ok
> for trunk?

You're not going to implement the new register class?

Okay

Thanks, David
Re: [PATCH, rs6000] Restrict reload use of FLOAT_REGS
On Fri, Feb 28, 2014 at 7:11 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote:
> Hi,
>
> We've encountered a rare bug that occurs when attempting to reload for
> an unaligned store in DImode. For an unaligned store, using stfd gets
> preference over std since stfd doesn't have an alignment restriction
> and therefore the "m" constraint matches. However, when there is not a
> register available for the REG to be stored, register elimination can
> replace the REG with its REG_EQUIV. When this is a PLUS, we end up with
> an attempt to compute an integer add into a floating-point register,
> and things rapidly go downhill.
>
> We had some internal discussion and determined the best way to fix this
> is to avoid ever using FLOAT_REGS for a PLUS in
> rs6000_preferred_reload_class, similar to what's currently done to
> avoid loading constants into FLOAT_REGS. Uli Weigand pointed out that
> this existing test is actually a bit too strong, as rclass could be
> ALL_REGS and this prevents us from using GENERAL_REGS in that case. So
> I've relaxed that test to only look for superclasses of FLOAT_REGS.
> (If you feel this is too risky, I can avoid that change.)
>
> The patch below fixes the one case where we've observed this bug in the
> wild (it occurred for a particular snapshot of code for an internal
> build that doesn't match any public branch). Because it's dependent on
> register spill, it is very difficult to try to produce a test case that
> isn't too fragile, so I haven't tried to add one.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu
> (--with-cpu=power8) and powerpc64-unknown-linux-gnu (--with-cpu=power7)
> with no regressions. Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2014-02-28  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>     * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
>     PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
>     constraint on constants to only prevent them from being reloaded
>     into a superset of FLOAT_REGS.

This is okay with me. Uli is the best one to comment if this is the right test.

Thanks, David
Re: [PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.
Hello Uroš,

On 28 Feb 13:55, Uros Bizjak wrote:
> On Fri, Feb 28, 2014 at 1:14 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
> > Hello,
> > This is a relatively obvious patch which eliminates the comparison of
> > infinities in the exp2 AVX-512 test and properly compares floats in
> > avx512f-sqrtps-2.c.
> >
> > Tests pass.
> >
> > Is it ok for trunk?
> >
> > gcc/testsuite/
> >     * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
> >     argument to avoid inf values.
> >     * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with
> >     UNION_FP_CHECK machinery.
>
> You are talking about avx512f-sqrtps-2.c, the ChangeLog refers to
> avx512er-vexp2ps-2.c, but the patch is modifying avx512f-vdivps-2.c.

Sorry for the mess. The broken test was avx512f-vdivps-2.c.

Updated testsuite/ChangeLog:
    * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
    argument to avoid inf values.
    * gcc.target/i386/avx512f-vdivps-2.c: Compare results with
    UNION_FP_CHECK machinery.

--
Thanks, K