[PATCH] Windows libiberty: Don't quote args unnecessarily
We only quote arguments that contain spaces, \t or " characters to
prevent wasting 2 characters per argument of the CreateProcess()
32,768 limit.
---
 libiberty/pex-win32.c | 46 +-
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/libiberty/pex-win32.c b/libiberty/pex-win32.c
index eae72c5..8b9d4f0 100644
--- a/libiberty/pex-win32.c
+++ b/libiberty/pex-win32.c
@@ -340,17 +340,25 @@ argv_to_cmdline (char *const *argv)
   char *p;
   size_t cmdline_len;
   int i, j, k;
+  int needs_quotes;
 
   cmdline_len = 0;
   for (i = 0; argv[i]; i++)
     {
-      /* We quote every last argument.  This simplifies the problem;
-         we need only escape embedded double-quotes and immediately
+      /* We only quote arguments that contain spaces, \t or " characters to
+         prevent wasting 2 chars per argument of the CreateProcess 32k char
+         limit.  We need only escape embedded double-quotes and immediately
          preceeding backslash characters.  A sequence of backslach characters
          that is not follwed by a double quote character will not be
          escaped.  */
+      needs_quotes = 0;
       for (j = 0; argv[i][j]; j++)
         {
+          if (argv[i][j] == ' ' || argv[i][j] == '\t' || argv[i][j] == '"')
+            {
+              needs_quotes = 1;
+            }
+
           if (argv[i][j] == '"')
             {
               /* Escape preceeding backslashes.  */
@@ -362,16 +370,33 @@ argv_to_cmdline (char *const *argv)
         }
       /* Trailing backslashes also need to be escaped because they will be
          followed by the terminating quote.  */
-      for (k = j - 1; k >= 0 && argv[i][k] == '\\'; k--)
-        cmdline_len++;
+      if (needs_quotes)
+        {
+          for (k = j - 1; k >= 0 && argv[i][k] == '\\'; k--)
+            cmdline_len++;
+        }
       cmdline_len += j;
-      cmdline_len += 3;  /* for leading and trailing quotes and space */
+      /* for leading and trailing quotes and space */
+      cmdline_len += needs_quotes * 2 + 1;
     }
   cmdline = XNEWVEC (char, cmdline_len);
   p = cmdline;
   for (i = 0; argv[i]; i++)
     {
-      *p++ = '"';
+      needs_quotes = 0;
+      for (j = 0; argv[i][j]; j++)
+        {
+          if (argv[i][j] == ' ' || argv[i][j] == '\t' || argv[i][j] == '"')
+            {
+              needs_quotes = 1;
+              break;
+            }
+        }
+
+      if (needs_quotes)
+        {
+          *p++ = '"';
+        }
       for (j = 0; argv[i][j]; j++)
         {
           if (argv[i][j] == '"')
@@ -382,9 +407,12 @@ argv_to_cmdline (char *const *argv)
             }
           *p++ = argv[i][j];
         }
-      for (k = j - 1; k >= 0 && argv[i][k] == '\\'; k--)
-        *p++ = '\\';
-      *p++ = '"';
+      if (needs_quotes)
+        {
+          for (k = j - 1; k >= 0 && argv[i][k] == '\\'; k--)
+            *p++ = '\\';
+          *p++ = '"';
+        }
       *p++ = ' ';
     }
   p[-1] = '\0';
-- 
1.9.2
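
The length accounting in the patch above can be sketched in isolation. This is only an illustration, not the libiberty code: `arg_needs_quotes` and `arg_cmdline_len` are hypothetical helper names, and the handling of embedded double quotes (one extra byte per quote plus one per immediately preceding backslash) is assumed from the comment in the patch, since those lines are elided from the hunks.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libiberty): nonzero if ARG contains
   a space, a tab or a double quote, the characters the patch treats as
   requiring surrounding quotes.  */
static int
arg_needs_quotes (const char *arg)
{
  size_t j;

  for (j = 0; arg[j]; j++)
    if (arg[j] == ' ' || arg[j] == '\t' || arg[j] == '"')
      return 1;
  return 0;
}

/* Hypothetical helper: bytes ARG occupies on the command line,
   including the separating space (or the final NUL), following the
   accounting of the patched length loop.  */
static size_t
arg_cmdline_len (const char *arg)
{
  size_t len = 0;
  size_t j;
  int k;
  int needs_quotes = arg_needs_quotes (arg);

  for (j = 0; arg[j]; j++)
    if (arg[j] == '"')
      {
        /* Assumed: each backslash right before the quote is doubled.  */
        for (k = (int) j - 1; k >= 0 && arg[k] == '\\'; k--)
          len++;
        /* One backslash to escape the quote itself.  */
        len++;
      }

  /* Trailing backslashes are doubled only if a closing quote follows.  */
  if (needs_quotes)
    for (k = (int) j - 1; k >= 0 && arg[k] == '\\'; k--)
      len++;

  len += j;                     /* the characters of ARG themselves */
  len += needs_quotes * 2 + 1;  /* optional quote pair, plus the space */
  return len;
}
```

Under these assumptions, a plain argument such as `foo` costs 4 bytes instead of the 6 it cost when every argument was quoted, which is the saving the patch is after.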
[PATCH] Windows libiberty: Don't quote args unnecessarily (v2)
We only quote arguments that contain spaces, \t or " characters to
prevent wasting 2 characters per argument of the CreateProcess()
32,768 limit.

libiberty/
	* pex-win32.c (argv_to_cmdline): Don't quote args unnecessarily

Ray Donnelly (1):
  Windows libiberty: Don't quote args unnecessarily

 libiberty/pex-win32.c | 46 +-
 1 file changed, 37 insertions(+), 9 deletions(-)

-- 
1.9.2
Re: we are starting the wide int merge
Christophe Lyon <christophe.l...@linaro.org> writes:

> It also looks like the git-svn-id property is now wrong/incomplete.
> For instance, commit 9a5942c1d4d9116ab74b0741cfe3894a89fd17fb has:
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/wide-int@201706 138bc75d-0d04-0410-961f-82ee72b054a4
>
> How does it map to the SVN commit in trunk?

This is a commit on the wide-int branch (the one that created it).

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
[PATCH, nds32] Committed: Enable HONOR_REG_ALLOC_ORDER when optimizing for size.
Hi, all,

There was a patch to turn HONOR_REG_ALLOC_ORDER into a C expression:
  http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01546.html
  http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00048.html

This is very helpful to the nds32 port, since we can now decide when to
apply HONOR_REG_ALLOC_ORDER based on the trade-off between code size and
performance. Currently, HONOR_REG_ALLOC_ORDER only benefits code size in
the nds32 port.

ChangeLog and patch are as below, committed as Rev.210137:

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 210135)
+++ gcc/ChangeLog (revision 210137)
@@ -1,3 +1,8 @@
+2014-05-07  Chung-Ju Wu  <jasonw...@gmail.com>
+
+	* config/nds32/nds32.h (HONOR_REG_ALLOC_ORDER): Have it in favor
+	of using optimize_size.
+
 2014-05-06  Mike Stump  <mikest...@comcast.net>
 
 	* wide-int.h (wi::int_traits <HOST_WIDE_INT>): Always define.
Index: gcc/config/nds32/nds32.h
===================================================================
--- gcc/config/nds32/nds32.h (revision 210135)
+++ gcc/config/nds32/nds32.h (revision 210137)
@@ -553,7 +553,7 @@
 
 /* Tell IRA to use the order we define rather than messing it up with its
    own cost calculations.  */
-#define HONOR_REG_ALLOC_ORDER 1
+#define HONOR_REG_ALLOC_ORDER optimize_size
 
 /* The number of consecutive hard regs needed starting at
    reg "regno" for holding a value of mode "mode".  */

Best regards,
jasonwucj
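
As a standalone illustration of what a C-expression macro allows, the sketch below evaluates the macro each time it is used rather than fixing it at build time. It is not GCC code: `optimize_size` here is a local stand-in for GCC's global of the same name, and `use_defined_alloc_order` is a hypothetical consumer, not the real IRA logic.

```c
/* Stand-in for GCC's global flag; nonzero when optimizing for size.  */
static int optimize_size;

/* As in the nds32.h change: the macro expands to an expression that is
   evaluated at each use, not to a compile-time constant.  */
#define HONOR_REG_ALLOC_ORDER optimize_size

/* Hypothetical consumer: reports whether the port-defined register
   allocation order should be honored for the current settings.  */
static int
use_defined_alloc_order (void)
{
  return HONOR_REG_ALLOC_ORDER != 0;
}
```

With the previous definition (`#define HONOR_REG_ALLOC_ORDER 1`) the answer would be the same regardless of optimization settings.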
[PATCH][4.7] Fix PR57864
This backports a piece of

2012-09-24  Richard Guenther  <rguent...@suse.de>

	* tree-ssa-pre.c (bitmap_find_leader, create_expression_by_pieces,
	find_or_generate_expression): Remove dominating stmt argument.
	(find_leader_in_sets, phi_translate_1, bitmap_find_leader,
	create_component_ref_by_pieces_1, create_component_ref_by_pieces,
	do_regular_insertion, do_partial_partial_insertion): Adjust.
	(compute_avail): Do not set uids.

to the 4.7 branch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to the
branch (and the testcase added to 4.8, 4.9 and trunk).

Richard.

2014-05-06  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/57864
	* tree-ssa-pre.c (phi_translate_1): Backport NAME case
	simplification from mainline.  Do not lookup the VN value-number here.

	* gcc.dg/torture/pr57864.c: New testcase.

Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c (revision 210104)
--- gcc/tree-ssa-pre.c (working copy)
*************** phi_translate_1 (pre_expr expr, bitmap_s
*** 1756,1794 ****
  
      case NAME:
        {
-         gimple phi = NULL;
-         edge e;
-         gimple def_stmt;
          tree name = PRE_EXPR_NAME (expr);
! 
!         def_stmt = SSA_NAME_DEF_STMT (name);
          if (gimple_code (def_stmt) == GIMPLE_PHI
              && gimple_bb (def_stmt) == phiblock)
-           phi = def_stmt;
-         else
-           return expr;
- 
-         e = find_edge (pred, gimple_bb (phi));
-         if (e)
            {
!             tree def = PHI_ARG_DEF (phi, e->dest_idx);
!             pre_expr newexpr;
! 
!             if (TREE_CODE (def) == SSA_NAME)
!               def = VN_INFO (def)->valnum;
  
              /* Handle constant.  */
              if (is_gimple_min_invariant (def))
                return get_or_alloc_expr_for_constant (def);
  
!             if (TREE_CODE (def) == SSA_NAME && ssa_undefined_value_p (def))
!               return NULL;
! 
!             newexpr = get_or_alloc_expr_for_name (def);
!             return newexpr;
            }
        }
-       return expr;
  
      default:
        gcc_unreachable ();
--- 1756,1781 ----
  
      case NAME:
        {
          tree name = PRE_EXPR_NAME (expr);
!         gimple def_stmt = SSA_NAME_DEF_STMT (name);
!         /* If the SSA name is defined by a PHI node in this block,
!            translate it.  */
          if (gimple_code (def_stmt) == GIMPLE_PHI
              && gimple_bb (def_stmt) == phiblock)
            {
!             edge e = find_edge (pred, gimple_bb (def_stmt));
!             tree def = PHI_ARG_DEF (def_stmt, e->dest_idx);
  
              /* Handle constant.  */
              if (is_gimple_min_invariant (def))
                return get_or_alloc_expr_for_constant (def);
  
!             return get_or_alloc_expr_for_name (def);
            }
+         /* Otherwise return it unchanged - it will get cleaned if its
+            value is not available in PREDs AVAIL_OUT set of expressions.  */
+         return expr;
        }
  
      default:
        gcc_unreachable ();
Index: gcc/testsuite/gcc.dg/torture/pr57864.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/pr57864.c (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr57864.c (working copy)
***************
*** 0 ****
--- 1,37 ----
+ /* { dg-do compile } */
+ 
+ union U {
+   double val;
+   union U *ptr;
+ };
+ 
+ union U *d;
+ double a;
+ int b;
+ int c;
+ 
+ static void fn1(union U *p1, int p2, _Bool p3)
+ {
+     union U *e;
+ 
+     if (p2 == 0)
+       a = ((union U*)((unsigned long)p1 & ~1))->val;
+ 
+     if (b) {
+         e = p1;
+     } else if (c) {
+         e = ((union U*)((unsigned long)p1 & ~1))->ptr;
+         d = e;
+     } else {
+         e = 0;
+         d = ((union U*)0)->ptr;
+     }
+ 
+     fn1 (e, 0, 0);
+     fn1 (0, 0, p3);
+ }
+ 
+ void fn2 (void)
+ {
+   fn1 (0, 0, 0);
+ }
Re: [PATCH] Change HONOR_REG_ALLOC_ORDER to a macro for C expression
2014-05-02 14:41 GMT+08:00 Kito Cheng <kito.ch...@gmail.com>:
> Hi Jeff:
>> I fixed up some minor whitespace issues and committed your patch.
> Thanks for your help :)

Hi,

I noticed the commit date in ChangeLog was incorrect for the patch.
Fixed it as obvious. Committed into Rev.210138.

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 210137)
+++ gcc/ChangeLog (revision 210138)
@@ -1092,7 +1092,7 @@
 
 	* doc/invoke.texi: Describe -fsanitize=float-divide-by-zero.
 
-2014-02-26  Kito Cheng  <k...@0xlab.org>
+2014-05-02  Kito Cheng  <k...@0xlab.org>
 
 	* defaults.h (HONOR_REG_ALLOC_ORDER): Change HONOR_REG_ALLOC_ORDER
 	to a C expression marco.

Best regards,
jasonwucj
patch1.diff updated + test results Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
On Tue, 2014-05-06 at 15:26 +0200, Samuel Thibault wrote:
> Svante Signell, on Tue 06 May 2014 15:25:38 +0200, wrote:
> > On Tue, 2014-05-06 at 15:07 +0200, Samuel Thibault wrote:
> > > Svante Signell, on Tue 06 May 2014 15:05:20 +0200, wrote:
> > > > On Tue, 2014-05-06 at 14:51 +0200, Samuel Thibault wrote:
> > > > > Just to explicitly ask for it:
> > > > >
> > > > > Svante Signell, on Tue 06 May 2014 10:06:49 +0200, wrote:
> > > > > > For some (yet) unknown reason all libgo tests fails with a
> > > > > > segfault when run in the build tree: make, sh or something
> > > > > > else, the test commands are rather hard to track.
> > > > >
> > > > > Doesn't that dump a core?
> > >
> > > Do you have /servers/crash properly pointing to
> > > /servers/crash-dump-core and ulimit -u set to unlimited?

More good news:
- Installing the modified libpthread.so.0.3 made the segfault go away. I
  could now run the check from the build tree :-)
- Adding #define TARGET_THREAD_SSP_OFFSET 0x14 to patch1.diff and
  building gcc-4.9.0-2 the test results are summarised as follows :-)

                === libgo Summary ===

# of expected passes            101
# of unexpected failures        21

I think some of the remaining failures are rather easy to fix.

Attached is an updated patch1.diff.

Remains to solve the problem with patch8.diff:
Adding arch specific code to: src/libgo/mksysinfo.sh

--- a/src/gcc/config/i386/gnu.h
+++ b/src/gcc/config/i386/gnu.h
@@ -37,11 +37,14 @@
 
 #ifdef TARGET_LIBC_PROVIDES_SSP
 
-/* Not supported yet.  */
-# undef TARGET_THREAD_SSP_OFFSET
-
-/* Not supported yet.  */
-# undef TARGET_CAN_SPLIT_STACK
-# undef TARGET_THREAD_SPLIT_STACK_OFFSET
+/* i386 glibc provides __stack_chk_guard in %gs:0x14.  */
+#define TARGET_THREAD_SSP_OFFSET        0x14
+/* We only build the -fsplit-stack support in libgcc if the
+   assembler has full support for the CFI directives.  */
+#if HAVE_GAS_CFI_PERSONALITY_DIRECTIVE
+#define TARGET_CAN_SPLIT_STACK
+#endif
+/* We steal the last transactional memory word.  */
+#define TARGET_THREAD_SPLIT_STACK_OFFSET 0x30
 
 #endif
[PATCH] [PING^2] Fix for PR libstdc++/60758
-------- Original Message --------
Subject: [PING] [PATCH] Fix for PR libstdc++/60758
Date: Thu, 17 Apr 2014 17:48:12 +0400
From: Alexey Merzlyakov <alexey.merzlya...@samsung.com>
To: Ramana Radhakrishnan <ramra...@arm.com>
CC: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>, Viacheslav Garbuzov <v.garbu...@samsung.com>, Yury Gribov <y.gri...@samsung.com>

Hi,

This fixes infinite backtrace in __cxa_end_cleanup().
Regtest was finished with no regressions on arm-linux-gnueabi(sf).

The patch posted at:
  http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html

Thanks in advance.

Best regards,
Merzlyakov Alexey

2014-05-07  Alexey Merzlyakov  <alexey.merzlya...@samsung.com>

	PR libstdc++/60758
	* libsupc++/eh_arm.cc (__cxa_end_cleanup): Change r4 to lr in save/restore
	and add unwind directives.

diff --git a/libstdc++-v3/libsupc++/eh_arm.cc b/libstdc++-v3/libsupc++/eh_arm.cc
index aa453dd..6a45af5 100644
--- a/libstdc++-v3/libsupc++/eh_arm.cc
+++ b/libstdc++-v3/libsupc++/eh_arm.cc
@@ -199,27 +199,33 @@ asm (".global __cxa_end_cleanup\n"
 "	nop		5\n");
 #else
 // Assembly wrapper to call __gnu_end_cleanup without clobbering r1-r3.
-// Also push r4 to preserve stack alignment.
+// Also push lr to preserve stack alignment and to allow backtracing.
 #ifdef __thumb__
 asm ("  .pushsection .text.__cxa_end_cleanup\n"
 "	.global __cxa_end_cleanup\n"
 "	.type __cxa_end_cleanup, \"function\"\n"
 "	.thumb_func\n"
 "__cxa_end_cleanup:\n"
-"	push\t{r1, r2, r3, r4}\n"
+"	.fnstart\n"
+"	push\t{r1, r2, r3, lr}\n"
+"	.save\t{r1, r2, r3, lr}\n"
 "	bl\t__gnu_end_cleanup\n"
-"	pop\t{r1, r2, r3, r4}\n"
+"	pop\t{r1, r2, r3, lr}\n"
 "	bl\t_Unwind_Resume @ Never returns\n"
+"	.fnend\n"
 "	.popsection\n");
 #else
 asm ("  .pushsection .text.__cxa_end_cleanup\n"
 "	.global __cxa_end_cleanup\n"
 "	.type __cxa_end_cleanup, \"function\"\n"
 "__cxa_end_cleanup:\n"
-"	stmfd\tsp!, {r1, r2, r3, r4}\n"
+"	.fnstart\n"
+"	stmfd\tsp!, {r1, r2, r3, lr}\n"
+"	.save\t{r1, r2, r3, lr}\n"
 "	bl\t__gnu_end_cleanup\n"
-"	ldmfd\tsp!, {r1, r2, r3, r4}\n"
+"	ldmfd\tsp!, {r1, r2, r3, lr}\n"
 "	bl\t_Unwind_Resume @ Never returns\n"
+"	.fnend\n"
 "	.popsection\n");
 #endif
 #endif
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
Hi,

On 05/07/2014 10:19 AM, Yury Gribov wrote:
> -------- Original Message --------
> Subject: [PING] [PATCH] Fix for PR libstdc++/60758
> Date: Thu, 17 Apr 2014 17:48:12 +0400
> From: Alexey Merzlyakov <alexey.merzlya...@samsung.com>
> To: Ramana Radhakrishnan <ramra...@arm.com>
> CC: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>, Viacheslav Garbuzov <v.garbu...@samsung.com>, Yury Gribov <y.gri...@samsung.com>
>
> Hi,
>
> This fixes infinite backtrace in __cxa_end_cleanup().
> Regtest was finished with no regressions on arm-linux-gnueabi(sf).
>
> The patch posted at:
>   http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html

I think you want an ARM maintainer for this. I'm adding some in CC.
Also, remember to send patches touching the C++ library to the mailing
list too.

Paolo.
[C++ Patch] PR 61080
Hi,

thus I prepared this simple patch. Tested x86_64-linux.

Thanks,
Paolo.

/cp
2014-05-07  Paolo Carlini  <paolo.carl...@oracle.com>

	PR c++/61080
	* pt.c (instantiate_decl): Avoid generating the body of a
	deleted function.

/testsuite
2014-05-07  Paolo Carlini  <paolo.carl...@oracle.com>

	PR c++/61080
	* g++.dg/cpp0x/deleted7.C: New.

Index: cp/pt.c
===================================================================
--- cp/pt.c (revision 210140)
+++ cp/pt.c (working copy)
@@ -19542,6 +19542,7 @@ instantiate_decl (tree d, int defer_ok,
   int saved_unevaluated_operand = cp_unevaluated_operand;
   int saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
   bool external_p;
+  bool deleted_p;
   tree fn_context;
   bool nested;
 
@@ -19623,11 +19624,17 @@ instantiate_decl (tree d, int defer_ok,
   args = gen_args;
 
   if (TREE_CODE (d) == FUNCTION_DECL)
-    pattern_defined = (DECL_SAVED_TREE (code_pattern) != NULL_TREE
-                       || DECL_DEFAULTED_OUTSIDE_CLASS_P (code_pattern)
-                       || DECL_DELETED_FN (code_pattern));
+    {
+      deleted_p = DECL_DELETED_FN (code_pattern);
+      pattern_defined = (DECL_SAVED_TREE (code_pattern) != NULL_TREE
+                         || DECL_DEFAULTED_OUTSIDE_CLASS_P (code_pattern)
+                         || deleted_p);
+    }
   else
-    pattern_defined = ! DECL_IN_AGGR_P (code_pattern);
+    {
+      deleted_p = false;
+      pattern_defined = ! DECL_IN_AGGR_P (code_pattern);
+    }
 
   /* We may be in the middle of deferred access check.  Disable it now.  */
   push_deferring_access_checks (dk_no_deferred);
@@ -19671,7 +19678,10 @@ instantiate_decl (tree d, int defer_ok,
          elsewhere, we don't want to instantiate the entire data
          member, but we do want to instantiate the initializer so that
          we can substitute that elsewhere.  */
-      || (external_p && VAR_P (d)))
+      || (external_p && VAR_P (d))
+      /* Handle here a deleted function too, avoid generating
+         its body (c++/61080).  */
+      || deleted_p)
     {
       /* The definition of the static data member is now required so
          we must substitute the initializer.  */
@@ -19867,17 +19877,14 @@ instantiate_decl (tree d, int defer_ok,
                        tf_warning_or_error, tmpl,
                        /*integral_constant_expression_p=*/false);
 
-          if (DECL_STRUCT_FUNCTION (code_pattern))
-            {
-              /* Set the current input_location to the end of the function
-                 so that finish_function knows where we are.  */
-              input_location
-                = DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus;
+          /* Set the current input_location to the end of the function
+             so that finish_function knows where we are.  */
+          input_location
+            = DECL_STRUCT_FUNCTION (code_pattern)->function_end_locus;
 
-              /* Remember if we saw an infinite loop in the template.  */
-              current_function_infinite_loop
-                = DECL_STRUCT_FUNCTION (code_pattern)->language->infinite_loop;
-            }
+          /* Remember if we saw an infinite loop in the template.  */
+          current_function_infinite_loop
+            = DECL_STRUCT_FUNCTION (code_pattern)->language->infinite_loop;
         }
 
       /* We don't need the local specializations any more.  */
Index: testsuite/g++.dg/cpp0x/deleted7.C
===================================================================
--- testsuite/g++.dg/cpp0x/deleted7.C (revision 0)
+++ testsuite/g++.dg/cpp0x/deleted7.C (working copy)
@@ -0,0 +1,36 @@
+// PR c++/61080
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wreturn-type" }
+
+struct AAA
+{
+  int a1, a2, a3;
+  void *p;
+};
+
+template <typename K, typename V>
+class WeakMapPtr
+{
+  public:
+    WeakMapPtr() : ptr(nullptr) {};
+    bool init(AAA *cx);
+  private:
+    void *ptr;
+    WeakMapPtr(const WeakMapPtr &wmp) = delete;
+    WeakMapPtr &operator=(const WeakMapPtr &wmp) = delete;
+};
+
+template <typename K, typename V>
+bool WeakMapPtr<K, V>::init(AAA *cx)
+{
+    ptr = cx->p;
+    return true;
+}
+
+struct JSObject
+{
+  int blah;
+  float meh;
+};
+
+template class WeakMapPtr<JSObject*, JSObject*>;
PR 61084: SPARC fallout from wide-int merge
The DImode constant splitter assigned the result of trunc_int_for_mode
to an unsigned int rather than a HOST_WIDE_INT. This then produced
const_ints that were zero-extended rather than sign-extended and tripped
the assert:

  gcc_checking_assert (INTVAL (x.first)
                       == sext_hwi (INTVAL (x.first), precision)
                       || (x.second == BImode && INTVAL (x.first) == 1));

The other hunks are just by inspection, but I think gen_int_mode is
preferred over GEN_INT when the mode is obvious.

Tested by Rainer, who says that the bootstrap now completes. OK to install?

Thanks,
Richard

gcc/
	PR target/61084
	* config/sparc/sparc.md: Fix types of low and high in DI constant
	splitter.  Use gen_int_mode in some other splitters.

Index: gcc/config/sparc/sparc.md
===================================================================
--- gcc/config/sparc/sparc.md 2014-05-07 10:15:23.051156294 +0100
+++ gcc/config/sparc/sparc.md 2014-05-07 10:15:27.922201361 +0100
@@ -1886,7 +1886,7 @@ (define_split
   emit_insn (gen_movsi (gen_lowpart (SImode, operands[0]),
                         operands[1]));
 #else
-  unsigned int low, high;
+  HOST_WIDE_INT low, high;
 
   low = trunc_int_for_mode (INTVAL (operands[1]), SImode);
   high = trunc_int_for_mode (INTVAL (operands[1]) >> 32, SImode);
@@ -4822,7 +4822,7 @@ (define_split
   [(set (match_dup 3) (match_dup 4))
    (set (match_dup 0) (ior:SI (not:SI (match_dup 3)) (match_dup 1)))]
 {
-  operands[4] = GEN_INT (~INTVAL (operands[2]));
+  operands[4] = gen_int_mode (~INTVAL (operands[2]), SImode);
 })
 
 (define_insn_and_split "*or_not_di_sp32"
@@ -4899,7 +4899,7 @@ (define_split
   [(set (match_dup 3) (match_dup 4))
    (set (match_dup 0) (not:SI (xor:SI (match_dup 3) (match_dup 1))))]
 {
-  operands[4] = GEN_INT (~INTVAL (operands[2]));
+  operands[4] = gen_int_mode (~INTVAL (operands[2]), SImode);
 })
 
 (define_split
@@ -4911,7 +4911,7 @@ (define_split
   [(set (match_dup 3) (match_dup 4))
    (set (match_dup 0) (xor:SI (match_dup 3) (match_dup 1)))]
 {
-  operands[4] = GEN_INT (~INTVAL (operands[2]));
+  operands[4] = gen_int_mode (~INTVAL (operands[2]), SImode);
 })
 
 ;; Split DImode logical operations requiring two instructions.
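
The difference between the two declarations can be reproduced outside GCC. The sketch below is an assumption-laden stand-in, not GCC code: `sext64` is a simplified substitute for sext_hwi, `int64_t` plays the role of a 64-bit HOST_WIDE_INT, `uint32_t` plays the role of the old `unsigned int` variable, and the `low_via_*` helpers are hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for sext_hwi: sign-extend the low PREC bits of
   SRC to 64 bits.  Assumes 0 < PREC <= 64.  */
static int64_t
sext64 (int64_t src, unsigned int prec)
{
  if (prec == 64)
    return src;
  uint64_t mask = (UINT64_C (1) << prec) - 1;
  uint64_t sign = UINT64_C (1) << (prec - 1);
  uint64_t low = (uint64_t) src & mask;
  /* (low ^ sign) - sign copies the sign bit into the upper bits; the
     final conversion relies on the usual two's-complement behaviour.  */
  return (int64_t) ((low ^ sign) - sign);
}

/* Old behaviour: the truncated value passes through a 32-bit unsigned
   variable, so widening it again zero-extends.  */
static int64_t
low_via_unsigned (int64_t value)
{
  uint32_t low = (uint32_t) sext64 (value, 32);
  return low;
}

/* New behaviour: the value stays in a 64-bit signed variable and
   remains sign-extended.  */
static int64_t
low_via_hwi (int64_t value)
{
  int64_t low = sext64 (value, 32);
  return low;
}
```

For an input whose low 32 bits are all ones, `low_via_hwi` yields -1 while `low_via_unsigned` yields 4294967295; only the former equals its own 32-bit sign extension, which is the property the quoted assert checks.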
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
On 05/07/14 09:19, Yury Gribov wrote:
> -------- Original Message --------
> Subject: [PING] [PATCH] Fix for PR libstdc++/60758
> Date: Thu, 17 Apr 2014 17:48:12 +0400
> From: Alexey Merzlyakov <alexey.merzlya...@samsung.com>
> To: Ramana Radhakrishnan <ramra...@arm.com>
> CC: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>, Viacheslav Garbuzov <v.garbu...@samsung.com>, Yury Gribov <y.gri...@samsung.com>
>
> Hi,
>
> This fixes infinite backtrace in __cxa_end_cleanup().
> Regtest was finished with no regressions on arm-linux-gnueabi(sf).
>
> The patch posted at:
>   http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html

This is OK to apply if no regressions.

Thanks,
Ramana

> Thanks in advance.
>
> Best regards,
> Merzlyakov Alexey
[PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
Hi,

Currently GCC only emits DWARF debug information (DW_TAG_lexical_block
DIEs) for compound statements containing significant local declarations.
However, code coverage tools that process the DWARF debug information to
implement block/path coverage need more complete lexical block
information.

This patch adds the necessary functionality under the control of a new
command line argument: -fforce-dwarf-lexical-blocks.

When this flag is set, a DW_TAG_lexical_block DIE will be emitted for
every function body, loop body, switch body, case statement, if-then and
if-else statement, even if the body is a single statement.

Likewise, a lexical block will be emitted for the first label of a
labeled statement. This block ends at the end of the current lexical
scope, or when a break, continue, goto or return statement is
encountered at the same lexical scope level. Consequently, any case in a
switch statement that does not flow through to the next case will have
its own DWARF lexical block.

The complete change proposal contains 4 patches (the first 3 are
attached):
1. Add command line option -fforce-dwarf-lexical-blocks
2. Use of flag_force_dwarf_blocks
3. Create label scopes

A fourth patch, extending the proposed functionality to C++, will be
submitted in a separate message.

Attached are the proposed ChangeLog additions, named according to the
directory each one belongs to.

Best regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch

Attachments:
  gcc_c_ChangeLog
  gcc_c-family_ChangeLog
  gcc_ChangeLog
  0001-Add-command-line-option-fforce_dwarf_lexical_blocks.patch
  0002-Use-flag_force_dwarf_blocks.patch
  0003-Create-label-scopes.patch
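
As a rough illustration of the scopes described in this proposal, consider the small C function below. The comments mark where, going by the description above, a DW_TAG_lexical_block DIE would be expected with -fforce-dwarf-lexical-blocks. This is an annotated reading of the proposal, not verified compiler output, and `classify` is a made-up example.

```c
/* Made-up example: none of the nested statements declares a local
   variable, so today only the function itself would carry scope
   information.  */
int
classify (int n)
{                       /* function body: expected block */
  int score = 0;

  if (n < 0)
    score = -1;         /* if-then, single statement: expected block */
  else
    score = 1;          /* if-else, single statement: expected block */

  while (n > 100)
    n /= 2;             /* loop body: expected block */

  switch (n)
    {                   /* switch body: expected block */
    case 0:             /* ends in break, so it gets its own block */
      score = 0;
      break;
    case 1:             /* no break: the block opened at this label is
                           assumed to run on into the default case */
      score += 10;
    default:
      score += 1;
    }

  return score;
}
```

A coverage tool could then map each executed address range to one of these blocks even though the source has no braces or declarations at those points.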
Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
On May 7, 2014, at 2:32 AM, Herman, Andrei <andrei_her...@codesourcery.com> wrote:

> Hi,
>
> Currently GCC only emits DWARF debug information (DW_TAG_lexical_block
> DIEs) for compound statements containing significant local
> declarations. However, code coverage tools that process the DWARF debug
> information to implement block/path coverage need more complete lexical
> block information.
>
> This patch adds the necessary functionality under the control of a new
> command line argument: -fforce-dwarf-lexical-blocks.
>
> When this flag is set, a DW_TAG_lexical_block DIE will be emitted for
> every function body, loop body, switch body, case statement, if-then
> and if-else statement, even if the body is a single statement.
>
> Likewise, a lexical block will be emitted for the first label of a
> labeled statement. This block ends at the end of the current lexical
> scope, or when a break, continue, goto or return statement is
> encountered at the same lexical scope level. Consequently, any case in
> a switch statement that does not flow through to the next case, will
> have its own dwarf lexical block.
>
> The complete change proposal contains 4 patches (attached first 3):
> 1. Add command line option -fforce-dwarf-lexical-blocks

This option since it is specific to the c frontend should go into c.opt
instead of common.opt. Unless you are going to extend this to Ada, Java
and fortran.

Thanks,
Andrew

> 2. Use of flag_force_dwarf_blocks
> 3. Create label scopes
>
> A fourth patch, extending the proposed functionality to C++ will be
> submitted in a separate message.
>
> Attached are the proposed ChangeLog additions, named according to the
> directory each one belongs to.
>
> Best regards,
> Andrei Herman
> Mentor Graphics Corporation
> Israel branch
>
> gcc_c_ChangeLog
> gcc_c-family_ChangeLog
> gcc_ChangeLog
> 0001-Add-command-line-option-fforce_dwarf_lexical_blocks.patch
> 0002-Use-flag_force_dwarf_blocks.patch
> 0003-Create-label-scopes.patch
Re: we are starting the wide int merge
On 7 May 2014 09:48, Andreas Schwab <sch...@suse.de> wrote:
> Christophe Lyon <christophe.l...@linaro.org> writes:
>
>> It also looks like the git-svn-id property is now wrong/incomplete.
>> For instance, commit 9a5942c1d4d9116ab74b0741cfe3894a89fd17fb has:
>> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/wide-int@201706 138bc75d-0d04-0410-961f-82ee72b054a4
>>
>> How does it map to the SVN commit in trunk?
>
> This is a commit on the wide-int branch (the one that created it).

I had a bug in my script while parsing the output of git log, hopefully
fixed now.
RE: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
Thanks for the note. I will make the needed changes and resubmit.

Regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch

-----Original Message-----
From: pins...@gmail.com [mailto:pins...@gmail.com]
Sent: Wednesday, May 07, 2014 12:37 PM
To: Herman, Andrei
Cc: gcc-patches@gcc.gnu.org; herman_and...@mentor.com
Subject: Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option

On May 7, 2014, at 2:32 AM, Herman, Andrei <andrei_her...@codesourcery.com> wrote:

> Hi,
>
> Currently GCC only emits DWARF debug information (DW_TAG_lexical_block
> DIEs) for compound statements containing significant local
> declarations. However, code coverage tools that process the DWARF debug
> information to implement block/path coverage need more complete lexical
> block information.
>
> This patch adds the necessary functionality under the control of a new
> command line argument: -fforce-dwarf-lexical-blocks.
>
> When this flag is set, a DW_TAG_lexical_block DIE will be emitted for
> every function body, loop body, switch body, case statement, if-then
> and if-else statement, even if the body is a single statement.
>
> Likewise, a lexical block will be emitted for the first label of a
> labeled statement. This block ends at the end of the current lexical
> scope, or when a break, continue, goto or return statement is
> encountered at the same lexical scope level. Consequently, any case in
> a switch statement that does not flow through to the next case, will
> have its own dwarf lexical block.
>
> The complete change proposal contains 4 patches (attached first 3):
> 1. Add command line option -fforce-dwarf-lexical-blocks

This option since it is specific to the c frontend should go into c.opt
instead of common.opt. Unless you are going to extend this to Ada, Java
and fortran.

Thanks,
Andrew

> 2. Use of flag_force_dwarf_blocks
> 3. Create label scopes
>
> A fourth patch, extending the proposed functionality to C++ will be
> submitted in a separate message.
>
> Attached are the proposed ChangeLog additions, named according to the
> directory each one belongs to.
>
> Best regards,
> Andrei Herman
> Mentor Graphics Corporation
> Israel branch
>
> gcc_c_ChangeLog
> gcc_c-family_ChangeLog
> gcc_ChangeLog
> 0001-Add-command-line-option-fforce_dwarf_lexical_blocks.patch
> 0002-Use-flag_force_dwarf_blocks.patch
> 0003-Create-label-scopes.patch
[PATCH][1/n] Always-64bit HWI cleanups
This removes the need_64bit_hwint logic, nothing else (well, brings
libcpp in line with gcc).

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Just as I promised to send this before committing the "let's try this"
patch (which is now said to fix wide-int fallout).

Richard.

2014-05-07  Richard Biener  <rguent...@suse.de>

	gcc/
	* config.gcc: Remove need_64bit_hwint.
	* configure.ac: Do not define NEED_64BIT_HOST_WIDE_INT.
	* hwint.h: Do not check NEED_64BIT_HOST_WIDE_INT but assume
	it to be true.
	* config.in: Regenerate.
	* configure: Likewise.

	libcpp/
	* configure.ac: Copy gcc logic of detecting a 64bit type.
	Remove HOST_WIDE_INT define.
	* include/cpplib.h: typedef cpp_num_part to a 64bit type,
	similar to how hwint.h does it.
	* config.in: Regenerate.
	* configure: Likewise.

Index: trunk/gcc/config.gcc
===================================================================
*** trunk.orig/gcc/config.gcc 2014-04-30 10:16:58.491135331 +0200
--- trunk/gcc/config.gcc 2014-04-30 10:24:43.902103288 +0200
***************
*** 164,176 ****
  # gas                   Set to yes or no depending on whether the target
  #                       system normally uses GNU as.
  #
- # need_64bit_hwint      Set to yes if HOST_WIDE_INT must be 64 bits wide
- #                       for this target.  This is true if this target
- #                       supports long or wchar_t wider than 32 bits,
- #                       or BITS_PER_WORD is wider than 32 bits.
- #                       The setting made here must match the one made in
- #                       other locations such as libcpp/configure.ac
- #
  # configure_default_options
  #                       Set to an initializer for configure_default_options
  #                       in configargs.h, based on --with-cpu et cetera.
--- 164,169 *** gnu_ld=$gnu_ld_flag *** 233,239 default_use_cxa_atexit=no default_gnu_indirect_function=no target_gtfiles= - need_64bit_hwint=yes need_64bit_isa= native_system_header_dir=/usr/include target_type_format_char='@' --- 226,231 *** m32c*-*-*) *** 310,323 ;; aarch64*-*-*) cpu_type=aarch64 - need_64bit_hwint=yes extra_headers=arm_neon.h extra_objs=aarch64-builtins.o aarch-common.o target_has_targetm_common=yes ;; alpha*-*-*) cpu_type=alpha - need_64bit_hwint=yes extra_options=${extra_options} g.opt ;; am33_2.0-*-linux*) --- 302,313 *** arm*-*-*) *** 333,339 target_type_format_char='%' c_target_objs=arm-c.o cxx_target_objs=arm-c.o - need_64bit_hwint=yes extra_options=${extra_options} arm/arm-tables.opt ;; avr-*-*) --- 323,328 *** i[34567]86-*-*) *** 363,369 cpu_type=i386 c_target_objs=i386-c.o cxx_target_objs=i386-c.o - need_64bit_hwint=yes extra_options=${extra_options} fused-madd.opt extra_headers=cpuid.h mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h pmmintrin.h tmmintrin.h ammintrin.h smmintrin.h --- 352,357 *** x86_64-*-*) *** 393,403 adxintrin.h fxsrintrin.h xsaveintrin.h xsaveoptintrin.h avx512cdintrin.h avx512erintrin.h avx512pfintrin.h shaintrin.h - need_64bit_hwint=yes ;; ia64-*-*) extra_headers=ia64intrin.h - need_64bit_hwint=yes extra_options=${extra_options} g.opt fused-madd.opt ;; hppa*-*-*) --- 381,389 *** microblaze*-*-*) *** 420,426 ;; mips*-*-*) cpu_type=mips - need_64bit_hwint=yes extra_headers=loongson.h extra_options=${extra_options} g.opt mips/mips-tables.opt ;; --- 406,411 *** picochip-*-*) *** 438,444 powerpc*-*-*) cpu_type=rs6000 extra_headers=ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h - need_64bit_hwint=yes case x$with_cpu in xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[345678]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|Xe6500) cpu_is_64bit=yes --- 423,428 *** powerpc*-*-*) *** 447,453 extra_options=${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt ;; 
rs6000*-*-*) - need_64bit_hwint=yes extra_options=${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt ;; score*-*-*) --- 431,436 *** sparc*-*-*) *** 459,480 c_target_objs=sparc-c.o cxx_target_objs=sparc-c.o extra_headers=visintrin.h - need_64bit_hwint=yes ;; spu*-*-*) cpu_type=spu - need_64bit_hwint=yes ;; s390*-*-*)
Re: [PATCH][RFC] Always require a 64bit HWI
On Wed, 30 Apr 2014, Richard Biener wrote:

> On Tue, 29 Apr 2014, Jeff Law wrote:
>
> > On 04/29/14 05:21, Richard Biener wrote:
> > >
> > > The following patch forces the availability of a 64bit HWI
> > > (without applying the cleanups that result from this). I propose
> > > this exact patch for a short time to get those that are affected
> > > and do not want to be affected scream.
> > >
> > > But honestly I don't see any important host architecture that
> > > not already requires a 64bit HWI.
> > >
> > > Another concern is that the host compiler may not provide a
> > > 64bit type. I'm not sure that this is an issue nowadays
> > > (even though C++98 doesn't have 'long long', so it's maybe
> > > more an issue now with C++ than it was previously with
> > > requiring C89). But given that it wasn't an issue for
> > > the existing 64bit HWI requiring host archs it shouldn't be
> > > an issue now.
> > >
> > > The benefit of this change is obviously the cleanup that
> > > can result from it - especially getting rid of code
> > > generation dependences on the host (!need_64bit_hwi
> > > doesn't mean we force a 32bit hwi). As followup
> > > we can replace HOST_WIDE_INT and its friends with
> > > int64_t variants and appear less confusing to
> > > newcomers (and it's also less characters to type! yay!).
> > >
> > > We'd still retain HOST_WIDEST_FAST_INT, and as Kenny
> > > said elsewhere wide-int should internally operate on
> > > that, not on the eventually slow int64_t. But that's
> > > a separate issue.
> > >
> > > So - any objections?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2014-04-29  Richard Biener  <rguent...@suse.de>
> > >
> > > 	libcpp/
> > > 	* configure.ac: Always set need_64bit_hwint to yes.
> > > 	* configure: Regenerated.
> > >
> > > 	* config.gcc: Always set need_64bit_hwint to yes.
> >
> > No objections. The requirement for 64 bit HWINT traces its origins
> > back to the MIPS R5900 target IIRC. It's probably well past the time
> > when we should just bite the bullet and make HWINT 64 bits across
> > the board.
> >
> > If the host compiler doesn't support 64-bit HWINT, then it seems to
> > me the host compiler can be used to bootstrap 4.9, which can then be
> > used to bootstrap more modern GCCs.
> >
> > And like you I suspect it's really not going to be an issue in
> > practice.
>
> I realized I forgot to copy gcc-patches, so done now (patch copied
> below again for reference). I propose to apply the patch after
> the wide-int merge for a short period of time and then followup
> with a patch to remove the need_64bit_hwint code (I'll make sure
> to send that out for review before applying this one).
>
> Testing coverage for non-64bit hwi configs is really low these
> days (I know of only 32bit hppa-*-* that is still built and
> tested semi-regularly - Dave, I suppose the host compiler
> has a 64bit long long type there, right?).

I have now applied the patch (as it is said to fix wide-int merge
fallout). The plan is to go forward with cleanups that are possible
after this throughout stage1 (I sent the first cleanup patch already,
but further ones should wait until we released 4.9.1 to not make
backports harder than necessary).

Richard.

> Thanks,
> Richard.
>
> 2014-04-29  Richard Biener  <rguent...@suse.de>
>
> 	libcpp/
> 	* configure.ac: Always set need_64bit_hwint to yes.
> 	* configure: Regenerated.
>
> 	* config.gcc: Always set need_64bit_hwint to yes.
>
> Index: libcpp/configure.ac
> ===================================================================
> --- libcpp/configure.ac (revision 209890)
> +++ libcpp/configure.ac (working copy)
> @@ -200,7 +200,7 @@ case $target in
>  	tilegx*-*-* | tilepro*-*-* )
>  		need_64bit_hwint=yes ;;
>  	*)
> -		need_64bit_hwint=no ;;
> +		need_64bit_hwint=yes ;;
>  esac
>  
>  case $need_64bit_hwint:$ac_cv_sizeof_long in
> Index: gcc/config.gcc
> ===================================================================
> --- gcc/config.gcc (revision 209890)
> +++ gcc/config.gcc (working copy)
> @@ -233,7 +233,7 @@ gnu_ld=$gnu_ld_flag
>  default_use_cxa_atexit=no
>  default_gnu_indirect_function=no
>  target_gtfiles=
> -need_64bit_hwint=
> +need_64bit_hwint=yes
>  need_64bit_isa=
>  native_system_header_dir=/usr/include
>  target_type_format_char='@'
Re: we are starting the wide int merge
Jan-Benedict Glaw jbg...@lug-owl.de writes: On Tue, 2014-05-06 12:20:54 -0700, Mike Stump mikest...@comcast.net wrote: On May 6, 2014, at 8:19 AM, Kenneth Zadeck zad...@naturalbridge.com wrote: please hold off on committing patches for the next couple of hours as we have a very large merge to do. thanks. All done… It is in. Just found one more: g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/vaxbuild/repos/gcc/gcc -I/home/vaxbuild/repos/gcc/gcc/. -I/home/vaxbuild/repos/gcc/gcc/../include -I/home/vaxbuild/repos/gcc/gcc/../libcpp/include -I/home/vaxbuild/repos/gcc/gcc/../libdecnumber -I/home/vaxbuild/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/vaxbuild/repos/gcc/gcc/../libbacktrace-o loop-iv.o -MT loop-iv.o -MMD -MP -MF ./.deps/loop-iv.TPo /home/vaxbuild/repos/gcc/gcc/loop-iv.c In file included from /home/vaxbuild/repos/gcc/gcc/real.h:25:0, from /home/vaxbuild/repos/gcc/gcc/rtl.h:27, from /home/vaxbuild/repos/gcc/gcc/loop-iv.c:54: /home/vaxbuild/repos/gcc/gcc/wide-int.h: In instantiation of ‘fixed_wide_int_storageN::fixed_wide_int_storage(const T) [with T = long long unsigned int; int N = 160]’: /home/vaxbuild/repos/gcc/gcc/wide-int.h:724:15: required from ‘generic_wide_intT::generic_wide_int(const T) [with T = long long unsigned int; storage = fixed_wide_int_storage160]’ /home/vaxbuild/repos/gcc/gcc/loop-iv.c:2628:48: required from here /home/vaxbuild/repos/gcc/gcc/wide-int.h:1172:45: error: incomplete type ‘wi::int_traitslong long unsigned int’ used in nested name specifier WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; ^ /home/vaxbuild/repos/gcc/gcc/wide-int.h:1173:47: error: incomplete type ‘wi::int_traitslong long unsigned int’ used in nested name specifier 
wi::copy (*this, WIDE_INT_REF_FOR (T) (x, N)); ^ make[1]: *** [loop-iv.o] Error 1 Looks like this is specific to 32-bit HOST_WIDE_INTs. The problem was that loop-iv.c was using HOST_WIDEST_INT and no template specialisations were defined for that. Richard B's patch to force HOST_WIDE_INT to 64 bits will fix this. Thanks, Richard
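The "incomplete type ... used in nested name specifier" error above is what a traits template produces when it is declared for every type but only specialised for some of them. A minimal sketch of that situation follows; it is not the real wide-int.h, and `int_traits_sketch` and `has_int_traits` are hypothetical names introduced only for illustration (C++11 assumed).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a traits class such as wi::int_traits:
// the primary template is declared but never defined.
template <typename T> struct int_traits_sketch;

// Specialisations exist only for the types the author thought of.
template <> struct int_traits_sketch<long> { typedef long type; };
template <> struct int_traits_sketch<unsigned long> { typedef unsigned long type; };

// Detects whether int_traits_sketch<T> is a complete type at this point.
// sizeof on an incomplete type is a substitution failure, so the
// partial specialisation is discarded and the primary (false) is used.
template <typename T, typename = void>
struct has_int_traits : std::false_type {};

template <typename T>
struct has_int_traits<T, decltype(void(sizeof(int_traits_sketch<T>)))>
  : std::true_type {};

// Using a member of the unspecialised case directly would not compile:
//   typename int_traits_sketch<unsigned long long>::type x;
//   error: incomplete type used in nested name specifier
```

With these definitions `has_int_traits<long>::value` is true and `has_int_traits<unsigned long long>::value` is false, which mirrors the missing specialisation for the type used in loop-iv.c.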
Re: [AArch64] Fix integer vabs intrinsics
On 05/05/14 09:04, Richard Biener wrote: On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw rearn...@arm.com wrote: On 02/05/14 11:28, James Greenhalgh wrote: On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: On May 2, 2014, at 2:21 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the neon intrinsics vabs intrinsics should behave as the hardware. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. If anything, I think that puts us in a worse position. Not if you cast it back. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: Yes but if add a cast from the unsigned type to the signed type gcc does not optimize that. If it does it is a bug since the overflow is defined there. I'm not sure I understand, are you saying I want to fold to: t1 = VIEW_CONVERT_EXPR (x, unsigned) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (t2, signed) Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each other out leading to an overall NOP? It might just be Friday morning and a lack of coffee talking, but I think I need you to spell this one out to me in big letters! I agree. I think what you need is a type widening so that you get t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VEC_NARROW (t2) This then guarantees that the ABS expression cannot be undefined. I'm less sure, however about the narrow causing a change in 'sign'. Has it just punted the problem? 
Maybe you need Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED result type, thus do abs(int) -> unsigned (what we have as absu_hwi). That is, have an ABS_EXPR that doesn't have the undefined issue (at expense of optimization in case the result is immediately cast back to signed) Yes, that would make more sense, and is, in effect, what the ARM VABS instruction is doing (producing an unsigned result with no undefined behaviour). I'm not sure I understand your 'at expense of optimization' comment, though. Surely a cast back to signed is essentially a no-op, since there's no representational change in the value (at least, not on 2's complement machines)? Richard. t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (x, unsigned) t4 = VEC_NARROW (t3) t5 = VIEW_CONVERT_EXPR (t4, signed) !!! How you capture this into RTL during expand, though, is another thing. R. (unsigned >= 0) == TRUE a = vabs_s8 (vdup_n_s8 (-128)); assert (a >= 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a separate issue. For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. This sounds like it would be problematic for unsigned types and not just for vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 instead. Since in rtl overflow and underflow is defined to be wrapping. 
There are no vabs_u8/vabd_u8 so I don't see how we can reach this point with unsigned types. Further, I have never thought of RTL having signed and unsigned types, just a bag of bits. We'll want to use unspec for the intrinsic version of vabd_s8 - but we'll want to specify the (abs (minus (reg) (reg))) behaviour so that auto-vectorized code can pick it up. So in the end we'll have these patterns: (abs (abs (reg))) (intrinsic_abs (unspec [(reg)] UNSPEC_ABS)) (abd (abs (minus (reg) (reg)))) (intrinsic_abd (unspec [(reg) (reg)] UNSPEC_ABD)) (aba (plus (abs (minus (reg) (reg))) (reg))) (intrinsic_aba (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) which should give us reasonable auto-vectorized code without triggering any of the issues mapping the semantics of the instructions to intrinsics. Thanks, James Thanks, Andrew Pinski Thanks, James
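The two hazards discussed in this message (abs (-128) staying negative, and vabs_s8 (vsub_s8 (x, y)) differing from vabd_s8 (x, y) when the subtraction wraps) can be checked with a scalar model of one 8-bit lane. This is only a sketch: the helper names are hypothetical, no NEON intrinsics are used, and `model_abd_s8` assumes that SABD computes the difference exactly and then keeps the low 8 bits, which is my reading of the instruction rather than something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Reinterpret the low 8 bits of v as a two's complement signed byte,
// without relying on implementation-defined narrowing conversions.
inline std::int8_t to_s8(unsigned v) {
  v &= 0xFFu;
  return static_cast<std::int8_t>(v < 128u ? static_cast<int>(v)
                                           : static_cast<int>(v) - 256);
}

// One lane of a wrapping subtract (models vsub_s8).
inline std::int8_t wrap_sub_s8(std::int8_t x, std::int8_t y) {
  return to_s8(static_cast<unsigned>(x) - static_cast<unsigned>(y));
}

// One lane of a wrapping absolute value (models vabs_s8):
// negating -128 wraps back to -128.
inline std::int8_t wrap_abs_s8(std::int8_t x) {
  return x < 0 ? to_s8(0u - static_cast<unsigned>(x)) : x;
}

// One lane of an absolute difference computed exactly, then truncated
// to 8 bits (assumed model of vabd_s8).
inline std::int8_t model_abd_s8(std::int8_t x, std::int8_t y) {
  int d = static_cast<int>(x) - static_cast<int>(y);
  return to_s8(static_cast<unsigned>(d < 0 ? -d : d));
}
```

Under this model `wrap_abs_s8(-128)` is -128, so a fold of `abs (x) >= 0` to true would be wrong for the intrinsic. For x = 127 and y = -128 the wrapped subtraction gives -1, whose absolute value is 1, while the exact difference 255 truncates to -1, so the two sequences disagree.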
Re: [AArch64] Fix integer vabs intrinsics
On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw rearn...@arm.com wrote: On 05/05/14 09:04, Richard Biener wrote: On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw rearn...@arm.com wrote: On 02/05/14 11:28, James Greenhalgh wrote: On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: On May 2, 2014, at 2:21 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the neon intrinsics vabs intrinsics should behave as the hardware. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. If anything, I think that puts us in a worse position. Not if you cast it back. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: Yes but if add a cast from the unsigned type to the signed type gcc does not optimize that. If it does it is a bug since the overflow is defined there. I'm not sure I understand, are you saying I want to fold to: t1 = VIEW_CONVERT_EXPR (x, unsigned) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (t2, signed) Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each other out leading to an overall NOP? It might just be Friday morning and a lack of coffee talking, but I think I need you to spell this one out to me in big letters! I agree. I think what you need is a type widening so that you get t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VEC_NARROW (t2) This then guarantees that the ABS expression cannot be undefined. I'm less sure, however about the narrow causing a change in 'sign'. 
Has it just punted the problem? Maybe you need Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED result type, thus do abs(int) - unsigned (what we have as absu_hwi). That is, have an ABS_EXPR that doesn't have the undefined issue (at expense of optimization in case the result is immediately casted back to signed) Yes, that would make more sense, and is, in effect, what the ARM VABS instruction is doing (producing an unsigned result with no undefined behaviour). I'm not sure I understand your 'at expense of optimization' comment, though. Surely a cast back to signed is essentially a no-op, since there's no representational change in the value (at least, not on 2's complement machines)? We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR. Richard. Richard. t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (x, unsigned) t4 = VEC_NARROW (t3) t5 = VIEW_CONVERT_EXPR (t4, signed) !!! How you capture this into RTL during expand, though, is another thing. R. (unsigned = 0) == TRUE a = vabs_s8 (vdup_n_s8 (-128)); assert (a = 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a seperate issue. For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. This sounds like it would problematic for unsigned types and not just for vabs_s8 with vsub_s8. 
So I think you should be using unspec for vabd_s8 instead. Since in rtl overflow and underflow is defined to be wrapping. There are no vabs_u8/vabd_u8 so I don't see how we can reach this point with unsigned types. Further, I have never thought of RTL having signed and unsigned types, just a bag of bits. We'll want to use unspec for the intrinsic version of vabd_s8 - but we'll want to specify the (abs (minus (reg) (reg))) behaviour so that auto-vectorized code can pick it up. So in the end we'll have these patterns: (abs (abs (reg))) (intrinsic_abs (unspec [(reg)] UNSPEC_ABS)) (abd (abs (minus (reg) (reg)))) (intrinsic_abd (unspec [(reg) (reg)] UNSPEC_ABD)) (aba (plus (abs (minus (reg) (reg))) (reg))) (intrinsic_aba (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) which should give us reasonable auto-vectorized code without
Re: [PATCH][RFC] Remove RTL loop unswitching
Hi! On Tue, 15 Apr 2014 11:26:29 +0200 (CEST), Richard Biener rguent...@suse.de wrote: This removes RTL loop unswitching 2014-04-15 Richard Biener rguent...@suse.de * Makefile.in (OBJS): Remove loop-unswitch.o. * loop-unswitch.c: Delete. * tree-pass.h (make_pass_rtl_unswitch): Remove. * passes.def (pass_rtl_unswitch): Likewise. * loop-init.c (gate_rtl_unswitch): Likewise. (rtl_unswitch): Likewise. (pass_data_rtl_unswitch): Likewise. (pass_rtl_unswitch): Likewise. (make_pass_rtl_unswitch): Likewise. * rtl.h (reversed_condition): Likewise. (compare_and_jump_seq): Likewise. * loop-iv.c (reversed_condition): Move here from loop-unswitch.c and make static. * loop-unroll.c (compare_and_jump_seq): Likewise. After checking with Richard on IRC, I applied the following in r210150: commit 81283dac62a91d2fbdf154fe51e9f84e0b1db816 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed May 7 10:31:26 2014 + Really delete gcc/loop-unswitch.c. gcc/ * loop-unswitch.c: Delete. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@210150 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git gcc/ChangeLog gcc/ChangeLog index d5e6a0a..e5033a0 100644 --- gcc/ChangeLog +++ gcc/ChangeLog @@ -1,3 +1,7 @@ +2014-05-07 Thomas Schwinge tho...@codesourcery.com + + * loop-unswitch.c: Delete. + 2014-05-07 Richard Biener rguent...@suse.de * config.gcc: Always set need_64bit_hwint to yes. @@ -2294,7 +2298,6 @@ 2014-04-23 Richard Biener rguent...@suse.de * Makefile.in (OBJS): Remove loop-unswitch.o. - * loop-unswitch.c: Delete. * tree-pass.h (make_pass_rtl_unswitch): Remove. * passes.def (pass_rtl_unswitch): Likewise. * loop-init.c (gate_rtl_unswitch): Likewise. diff --git gcc/loop-unswitch.c gcc/loop-unswitch.c deleted file mode 100644 index fff0fd1..000 Grüße, Thomas
Re: [AArch64] Fix integer vabs intrinsics
On 07/05/14 11:32, Richard Biener wrote: On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw rearn...@arm.com wrote: On 05/05/14 09:04, Richard Biener wrote: On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw rearn...@arm.com wrote: On 02/05/14 11:28, James Greenhalgh wrote: On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: On May 2, 2014, at 2:21 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh james.greenha...@arm.com wrote: Hi, Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as undefined/impossible, the neon intrinsics vabs intrinsics should behave as the hardware. That is to say, the pseudo-code sequence: Only for signed integer types. You should be able to use an unsigned integer type here instead. If anything, I think that puts us in a worse position. Not if you cast it back. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: Yes but if add a cast from the unsigned type to the signed type gcc does not optimize that. If it does it is a bug since the overflow is defined there. I'm not sure I understand, are you saying I want to fold to: t1 = VIEW_CONVERT_EXPR (x, unsigned) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (t2, signed) Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each other out leading to an overall NOP? It might just be Friday morning and a lack of coffee talking, but I think I need you to spell this one out to me in big letters! I agree. I think what you need is a type widening so that you get t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VEC_NARROW (t2) This then guarantees that the ABS expression cannot be undefined. 
I'm less sure, however about the narrow causing a change in 'sign'. Has it just punted the problem? Maybe you need Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED result type, thus do abs(int) - unsigned (what we have as absu_hwi). That is, have an ABS_EXPR that doesn't have the undefined issue (at expense of optimization in case the result is immediately casted back to signed) Yes, that would make more sense, and is, in effect, what the ARM VABS instruction is doing (producing an unsigned result with no undefined behaviour). I'm not sure I understand your 'at expense of optimization' comment, though. Surely a cast back to signed is essentially a no-op, since there's no representational change in the value (at least, not on 2's complement machines)? We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR. Unless you're assuming that ABS_EXPR(INT_MIN) will always trap, then if you can derive it for ABS_EXPR (which really returns [0, INT_MAX]+UNSPECIFIED, I don't really see why you can't derive it for (int)ABSU_EXPR, which returns [0, INT_MAX]+INT_MIN, since the latter is a subset of the former). R. Richard. Richard. t1 = VEC_WIDEN (x) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (x, unsigned) t4 = VEC_NARROW (t3) t5 = VIEW_CONVERT_EXPR (t4, signed) !!! How you capture this into RTL during expand, though, is another thing. R. (unsigned = 0) == TRUE a = vabs_s8 (vdup_n_s8 (-128)); assert (a = 0); does not hold. As in hardware abs (-128) == -128 Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid it. In fact, we have to be even more careful than that, and keep the integer vabs intrinsics as an unspec in the back end. No it is not. The mistake is to use signed integer types here. Just add a conversion to an unsigned integer vector and it will work correctly. In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a seperate issue. 
For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. This sounds like it would problematic for unsigned types and not just for vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 instead. Since in rtl overflow and underflow is defined to be wrapping. There are no vabs_u8/vabd_u8 so I don't see how we can reach this point with unsigned types. Further, I have never thought of RTL having signed and unsigned types, just a bag of bits. We'll want to use unspec for the intrinsic version of vabd_s8 - but we'll want to specify the (abs (minus (reg) (reg))) behaviour so that auto-vectorized code can pick it up. So in the end we'll have these patterns: (abs (abs
[patch] libstdc++/61086 - fix ubsan errors in std::vector
The testcase in the PR calls __position._M_const_cast() to get a mutable iterator and that dereferences the pointer as suggested in http://gcc.gnu.org/ml/libstdc++/2013-05/msg00031.html That's invalid because the pointer is not dereferenceable (in this case it's null but is past-the-end at all times). I played around with changing the __normal_iterator so we would do __position._M_const_cast(begin()) then decided we don't need it at all and can just as easily obtain a mutable iterator using: auto __pos = begin() + (__position - cbegin()); I plan to commit the attached patch to trunk and 4.9 soon. I've tested it on x86_64-linux but not added a testcase because we don't test with -fsanitize (though we should do) and it only shows up with Clang anyway. commit 566623def309c70387e41da2346ff89aa7619b13 Author: Jonathan Wakely jwak...@redhat.com Date: Wed May 7 12:17:41 2014 +0100 PR libstdc++/61086 * include/bits/stl_iterator.h (__normal_iterator::_M_const_cast): Remove. * include/bits/stl_vector.h (vector::insert, vector::erase): Use arithmetic to obtain a mutable iterator from const_iterator. * include/bits/vector.tcc (vector::insert): Likewise. * include/debug/vector (vector::erase): Likewise. * testsuite/23_containers/vector/requirements/dr438/assign_neg.cc: Adjust dg-error line number. * testsuite/23_containers/vector/requirements/dr438/constructor_1_neg.cc: Likewise. * testsuite/23_containers/vector/requirements/dr438/constructor_2_neg.cc: Likewise. * testsuite/23_containers/vector/requirements/dr438/insert_neg.cc: Likewise. 
diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 16f992c..f4522a4 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -736,21 +736,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Container::__type __i) _GLIBCXX_NOEXCEPT : _M_current(__i.base()) { } -#if __cplusplus = 201103L - __normal_iteratortypename _Container::pointer, _Container - _M_const_cast() const noexcept - { - using _PTraits = std::pointer_traitstypename _Container::pointer; - return __normal_iteratortypename _Container::pointer, _Container - (_PTraits::pointer_to(const_casttypename _PTraits::element_type -(*_M_current))); - } -#else - __normal_iterator - _M_const_cast() const - { return *this; } -#endif - // Forward iterator requirements reference operator*() const _GLIBCXX_NOEXCEPT diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h index 3d3a2cf..0a56c65 100644 --- a/libstdc++-v3/include/bits/stl_vector.h +++ b/libstdc++-v3/include/bits/stl_vector.h @@ -1051,7 +1051,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER insert(const_iterator __position, size_type __n, const value_type __x) { difference_type __offset = __position - cbegin(); - _M_fill_insert(__position._M_const_cast(), __n, __x); + _M_fill_insert(begin() + __offset, __n, __x); return begin() + __offset; } #else @@ -1096,7 +1096,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _InputIterator __last) { difference_type __offset = __position - cbegin(); - _M_insert_dispatch(__position._M_const_cast(), + _M_insert_dispatch(begin() + __offset, __first, __last, __false_type()); return begin() + __offset; } @@ -1144,10 +1144,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER iterator #if __cplusplus = 201103L erase(const_iterator __position) + { return _M_erase(begin() + (__position - cbegin())); } #else erase(iterator __position) + { return _M_erase(__position); } #endif - { return _M_erase(__position._M_const_cast()); } /** * @brief 
Remove a range of elements. @@ -1170,10 +1171,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER iterator #if __cplusplus = 201103L erase(const_iterator __first, const_iterator __last) + { + const auto __beg = begin(); + const auto __cbeg = cbegin(); + return _M_erase(__beg + (__first - __cbeg), __beg + (__last - __cbeg)); + } #else erase(iterator __first, iterator __last) + { return _M_erase(__first, __last); } #endif - { return _M_erase(__first._M_const_cast(), __last._M_const_cast()); } /** * @brief Swaps data with another %vector. diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc index 299e614..5c3dfae 100644 --- a/libstdc++-v3/include/bits/vector.tcc +++ b/libstdc++-v3/include/bits/vector.tcc @@ -121,14 +121,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER else { #if __cplusplus = 201103L + const auto __pos = begin() + (__position - cbegin()); if (this-_M_impl._M_finish != this-_M_impl._M_end_of_storage) { _Tp __x_copy = __x; - _M_insert_aux(__position._M_const_cast(), std::move(__x_copy)); +
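The replacement technique in this patch, obtaining a mutable iterator by adding the offset of the const_iterator to begin(), can be tried outside the library. The sketch below uses only the public std::vector interface; `to_mutable` is a hypothetical helper name and is not part of libstdc++ (C++11 assumed).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert a const_iterator into the vector to a
// mutable iterator using arithmetic only. Nothing is dereferenced, so
// this is valid for past-the-end positions and for empty vectors.
template <typename T>
typename std::vector<T>::iterator
to_mutable(std::vector<T>& v, typename std::vector<T>::const_iterator pos)
{
  return v.begin() + (pos - v.cbegin());
}
```

Because the second parameter is a non-deduced context, `T` is deduced from the vector argument alone, so a call such as `to_mutable(v, v.cend())` needs no explicit template argument and yields `v.end()`.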
Re: [patch] libstdc++/61086 - fix ubsan errors in std::vector
On 05/07/2014 02:07 PM, Jonathan Wakely wrote: The testcase in the PR calls __position._M_const_cast() to get a mutable iterator and that dereferences the pointer as suggested in http://gcc.gnu.org/ml/libstdc++/2013-05/msg00031.html That's invalid because the pointer is not dereferenceable (in this case it's null but is past-the-end at all times). Uhmm, I see, at the time I scratched my head a bit. Nice that we can avoid the whole thing. Are we sure we don't have something similar elsewhere? Paolo.
Re: [patch] libstdc++/61086 - fix ubsan errors in std::vector
On 07/05/14 14:21 +0200, Paolo Carlini wrote: On 05/07/2014 02:07 PM, Jonathan Wakely wrote: The testcase in the PR calls __position._M_const_cast() to get a mutable iterator and that dereferences the pointer as suggested in http://gcc.gnu.org/ml/libstdc++/2013-05/msg00031.html That's invalid because the pointer is not dereferenceable (in this case it's null but is past-the-end at all times). Uhmm, I see, at the time I scratched my head a bit. Nice that we can avoid the whole thing. Are we sure we don't have something similar elsewhere? Yes, I checked. deque::const_iterator, list::const_iterator, vector<bool>::const_iterator and the _Rb_tree_const_iterator types all have _M_const_cast but they do not dereference anything. It only really affected std::vector because that's the only one of our containers that correctly supports custom pointer types (when my fixes for PR57272 are ready I'll need to deal with the issue again and will be careful about dereferencing).
Re: [PATCH][RFC] Remove RTL loop unswitching
Hi! On Tue, 15 Apr 2014 11:26:29 +0200 (CEST), Richard Biener rguent...@suse.de wrote: This removes RTL loop unswitching 2014-04-15 Richard Biener rguent...@suse.de * Makefile.in (OBJS): Remove loop-unswitch.o. * loop-unswitch.c: Delete. * tree-pass.h (make_pass_rtl_unswitch): Remove. * passes.def (pass_rtl_unswitch): Likewise. * loop-init.c (gate_rtl_unswitch): Likewise. (rtl_unswitch): Likewise. (pass_data_rtl_unswitch): Likewise. (pass_rtl_unswitch): Likewise. (make_pass_rtl_unswitch): Likewise. * rtl.h (reversed_condition): Likewise. (compare_and_jump_seq): Likewise. * loop-iv.c (reversed_condition): Move here from loop-unswitch.c and make static. * loop-unroll.c (compare_and_jump_seq): Likewise. I found some more; OK to commit? Is a non-bootstrap build enough for this, or is a full bootstrap build and test needed? commit 8a703b1e7adc6001f665a12f93601382e3eea806 Author: Thomas Schwinge tho...@codesourcery.com Date: Wed May 7 13:01:47 2014 +0200 More gcc/loop-unswitch.c cleanup. gcc/ * cfgloop.h (unswitch_loops): Remove. * doc/passes.texi: Remove references to loop-unswitch.c * timevar.def (TV_LOOP_UNSWITCH): Remove. diff --git gcc/cfgloop.h gcc/cfgloop.h index ab8b809..62a656a 100644 --- gcc/cfgloop.h +++ gcc/cfgloop.h @@ -711,8 +711,6 @@ extern void loop_optimizer_init (unsigned); extern void loop_optimizer_finalize (void); /* Optimization passes. */ -extern void unswitch_loops (void); - enum { UAP_PEEL = 1,/* Enables loop peeling. */ diff --git gcc/doc/passes.texi gcc/doc/passes.texi index 2727b2c..fb064db 100644 --- gcc/doc/passes.texi +++ gcc/doc/passes.texi @@ -474,10 +474,7 @@ merging and induction variable elimination. The pass is implemented in Loop unswitching. This pass moves the conditional jumps that are invariant out of the loops. To achieve this, a duplicate of the loop is created for each possible outcome of conditional jump(s). The pass is implemented in -@file{tree-ssa-loop-unswitch.c}. 
This pass should eventually replace the -RTL level loop unswitching in @file{loop-unswitch.c}, but currently -the RTL level pass is not completely redundant yet due to deficiencies -in tree level alias analysis. +@file{tree-ssa-loop-unswitch.c}. The optimizations also use various utility functions contained in @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and @@ -793,8 +790,8 @@ The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain generic loop analysis and manipulation code. Initialization and finalization of loop structures is handled by @file{loop-init.c}. A loop invariant motion pass is implemented in @file{loop-invariant.c}. -Basic block level optimizations---unrolling, peeling and unswitching loops--- -are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}. +Basic block level optimizations---unrolling, and peeling loops--- +are implemented in @file{loop-unroll.c}. Replacing of the exit condition of loops by special machine-dependent instructions is handled by @file{loop-doloop.c}. diff --git gcc/timevar.def gcc/timevar.def index 9faf98b..2db1943 100644 --- gcc/timevar.def +++ gcc/timevar.def @@ -207,7 +207,6 @@ DEFTIMEVAR (TV_DSE2 , dead store elim2) DEFTIMEVAR (TV_LOOP , loop analysis) DEFTIMEVAR (TV_LOOP_INIT, loop init) DEFTIMEVAR (TV_LOOP_MOVE_INVARIANTS , loop invariant motion) -DEFTIMEVAR (TV_LOOP_UNSWITCH , loop unswitching) DEFTIMEVAR (TV_LOOP_UNROLL , loop unrolling) DEFTIMEVAR (TV_LOOP_DOLOOP , loop doloop) DEFTIMEVAR (TV_LOOP_FINI, loop fini) Grüße, Thomas
Re: [PATCH][RFC] Remove RTL loop unswitching
On Wed, 7 May 2014, Thomas Schwinge wrote: Hi! On Tue, 15 Apr 2014 11:26:29 +0200 (CEST), Richard Biener rguent...@suse.de wrote: This removes RTL loop unswitching 2014-04-15 Richard Biener rguent...@suse.de * Makefile.in (OBJS): Remove loop-unswitch.o. * loop-unswitch.c: Delete. * tree-pass.h (make_pass_rtl_unswitch): Remove. * passes.def (pass_rtl_unswitch): Likewise. * loop-init.c (gate_rtl_unswitch): Likewise. (rtl_unswitch): Likewise. (pass_data_rtl_unswitch): Likewise. (pass_rtl_unswitch): Likewise. (make_pass_rtl_unswitch): Likewise. * rtl.h (reversed_condition): Likewise. (compare_and_jump_seq): Likewise. * loop-iv.c (reversed_condition): Move here from loop-unswitch.c and make static. * loop-unroll.c (compare_and_jump_seq): Likewise. I found some more; OK to commit? Is a non-bootstrap build enough for this, or is a full bootstrap build and test needed? That's enough. Ok. Thanks, Richard. commit 8a703b1e7adc6001f665a12f93601382e3eea806 Author: Thomas Schwinge tho...@codesourcery.com Date: Wed May 7 13:01:47 2014 +0200 More gcc/loop-unswitch.c cleanup. gcc/ * cfgloop.h (unswitch_loops): Remove. * doc/passes.texi: Remove references to loop-unswitch.c * timevar.def (TV_LOOP_UNSWITCH): Remove. diff --git gcc/cfgloop.h gcc/cfgloop.h index ab8b809..62a656a 100644 --- gcc/cfgloop.h +++ gcc/cfgloop.h @@ -711,8 +711,6 @@ extern void loop_optimizer_init (unsigned); extern void loop_optimizer_finalize (void); /* Optimization passes. */ -extern void unswitch_loops (void); - enum { UAP_PEEL = 1, /* Enables loop peeling. */ diff --git gcc/doc/passes.texi gcc/doc/passes.texi index 2727b2c..fb064db 100644 --- gcc/doc/passes.texi +++ gcc/doc/passes.texi @@ -474,10 +474,7 @@ merging and induction variable elimination. The pass is implemented in Loop unswitching. This pass moves the conditional jumps that are invariant out of the loops. To achieve this, a duplicate of the loop is created for each possible outcome of conditional jump(s). 
The pass is implemented in -@file{tree-ssa-loop-unswitch.c}. This pass should eventually replace the -RTL level loop unswitching in @file{loop-unswitch.c}, but currently -the RTL level pass is not completely redundant yet due to deficiencies -in tree level alias analysis. +@file{tree-ssa-loop-unswitch.c}. The optimizations also use various utility functions contained in @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and @@ -793,8 +790,8 @@ The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain generic loop analysis and manipulation code. Initialization and finalization of loop structures is handled by @file{loop-init.c}. A loop invariant motion pass is implemented in @file{loop-invariant.c}. -Basic block level optimizations---unrolling, peeling and unswitching loops--- -are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}. +Basic block level optimizations---unrolling, and peeling loops--- +are implemented in @file{loop-unroll.c}. Replacing of the exit condition of loops by special machine-dependent instructions is handled by @file{loop-doloop.c}. diff --git gcc/timevar.def gcc/timevar.def index 9faf98b..2db1943 100644 --- gcc/timevar.def +++ gcc/timevar.def @@ -207,7 +207,6 @@ DEFTIMEVAR (TV_DSE2 , dead store elim2) DEFTIMEVAR (TV_LOOP , loop analysis) DEFTIMEVAR (TV_LOOP_INIT , loop init) DEFTIMEVAR (TV_LOOP_MOVE_INVARIANTS , loop invariant motion) -DEFTIMEVAR (TV_LOOP_UNSWITCH , loop unswitching) DEFTIMEVAR (TV_LOOP_UNROLL , loop unrolling) DEFTIMEVAR (TV_LOOP_DOLOOP , loop doloop) DEFTIMEVAR (TV_LOOP_FINI , loop fini) Grüße, Thomas -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
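As an illustration of the transformation that the passes.texi text describes, here is a minimal hand-written sketch of loop unswitching. It is not taken from GCC; the function names `sum_original` and `sum_unswitched` are made up for this note. The loop-invariant condition `negate` is hoisted out, and one copy of the loop is kept for each outcome of the condition.

```cpp
#include <vector>

// Original form: the branch on `negate` is loop-invariant but is
// evaluated on every iteration.
long sum_original(const std::vector<int>& values, bool negate) {
  long sum = 0;
  for (int v : values) {
    if (negate)
      sum -= v;
    else
      sum += v;
  }
  return sum;
}

// Unswitched form: the invariant test is done once, and the loop is
// duplicated for each possible outcome of the condition.
long sum_unswitched(const std::vector<int>& values, bool negate) {
  long sum = 0;
  if (negate) {
    for (int v : values)
      sum -= v;
  } else {
    for (int v : values)
      sum += v;
  }
  return sum;
}
```

Both functions compute the same result; the second trades code size for a branch-free loop body, which is the trade-off an unswitching pass makes.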
Re: [patch] libstdc++/61086 - fix ubsan errors in std::vector
Hi,

On 05/07/2014 02:33 PM, Jonathan Wakely wrote:
> Yes, I checked. deque::const_iterator, list::const_iterator,
> vector<bool>::const_iterator and the _Rb_tree_const_iterator types all
> have _M_const_cast but they do not dereference anything. It only
> really affected std::vector because that's the only one of our
> containers that correctly supports custom pointer types (when my fixes
> for PR57272 are ready I'll need to deal with the issue again and will
> be careful about dereferencing).

Excellent. Thanks again!

Paolo.
Re: debug container patch
On Wed, May 7, 2014 at 2:13 AM, Paolo Carlini paolo.carl...@oracle.com wrote:
> -- Francois, remember to regenerate and commit the Makefile.in changes.

Can someone regenerate and commit the Makefile.in changes soon? I'm seeing testsuite failures thanks to missing debug/safe_container.h on arm-none-linux-gnueabihf.

I don't have access to a machine right now with the right versions of autoconf and automake that can do this easily.

Ramana

> Thanks,
> Paolo.
[PATCH][1/n] Fix PR61034
The following fixes part of PR61034 - we are hindered by false clobbering during FRE/PRE on paths we try to look through by means of the alias walker. The following makes us also consider lattice-based disambiguation there and in particular also try harder to disambiguate against builtins. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2014-05-07 Richard Biener rguent...@suse.de PR tree-optimization/61034 * tree-ssa-alias.c (call_may_clobber_ref_p_1): Export. (maybe_skip_until): Use translate to take into account lattices when trying to do disambiguations. (get_continuation_for_phi_1): Likewise. (get_continuation_for_phi): Adjust for added translate arguments. (walk_non_aliased_vuses): Likewise. * tree-ssa-alias.h (get_continuation_for_phi): Adjust prototype. (walk_non_aliased_vuses): Likewise. (call_may_clobber_ref_p_1): Declare. * tree-ssa-sccvn.c (vn_reference_lookup_3): Also disambiguate against calls. Stop early if we are only supposed to disambiguate. * tree-ssa-pre.c (translate_vuse_through_block): Adjust. * g++.dg/tree-ssa/pr61034.C: New testcase. Index: gcc/tree-ssa-alias.c === *** gcc/tree-ssa-alias.c.orig 2014-05-07 13:53:47.015599960 +0200 --- gcc/tree-ssa-alias.c2014-05-07 14:07:09.087544738 +0200 *** ref_maybe_used_by_stmt_p (gimple stmt, t *** 1835,1841 /* If the call in statement CALL may clobber the memory reference REF return true, otherwise return false. */ ! static bool call_may_clobber_ref_p_1 (gimple call, ao_ref *ref) { tree base; --- 1835,1841 /* If the call in statement CALL may clobber the memory reference REF return true, otherwise return false. */ ! bool call_may_clobber_ref_p_1 (gimple call, ao_ref *ref) { tree base; *** stmt_kills_ref_p (gimple stmt, tree ref) *** 2318,2324 static bool maybe_skip_until (gimple phi, tree target, ao_ref *ref, tree vuse, unsigned int *cnt, bitmap *visited, ! 
bool abort_on_visited) { basic_block bb = gimple_bb (phi); --- 2318,2326 static bool maybe_skip_until (gimple phi, tree target, ao_ref *ref, tree vuse, unsigned int *cnt, bitmap *visited, ! bool abort_on_visited, ! void *(*translate)(ao_ref *, tree, void *, bool), ! void *data) { basic_block bb = gimple_bb (phi); *** maybe_skip_until (gimple phi, tree targe *** 2338,2344 if (bitmap_bit_p (*visited, SSA_NAME_VERSION (PHI_RESULT (def_stmt return !abort_on_visited; vuse = get_continuation_for_phi (def_stmt, ref, cnt, ! visited, abort_on_visited); if (!vuse) return false; continue; --- 2340,2347 if (bitmap_bit_p (*visited, SSA_NAME_VERSION (PHI_RESULT (def_stmt return !abort_on_visited; vuse = get_continuation_for_phi (def_stmt, ref, cnt, ! visited, abort_on_visited, ! translate, data); if (!vuse) return false; continue; *** maybe_skip_until (gimple phi, tree targe *** 2350,2356 /* A clobbering statement or the end of the IL ends it failing. */ ++*cnt; if (stmt_may_clobber_ref_p_1 (def_stmt, ref)) ! return false; } /* If we reach a new basic-block see if we already skipped it in a previous walk that ended successfully. */ --- 2353,2365 /* A clobbering statement or the end of the IL ends it failing. */ ++*cnt; if (stmt_may_clobber_ref_p_1 (def_stmt, ref)) ! { ! if (translate ! (*translate) (ref, vuse, data, true) == NULL) ! ; ! else ! return false; ! } } /* If we reach a new basic-block see if we already skipped it in a previous walk that ended successfully. */ *** maybe_skip_until (gimple phi, tree targe *** 2372,2378 static tree get_continuation_for_phi_1 (gimple phi, tree arg0, tree arg1, ao_ref *ref, unsigned int *cnt, ! bitmap *visited, bool abort_on_visited) { gimple def0 = SSA_NAME_DEF_STMT (arg0); gimple def1 = SSA_NAME_DEF_STMT (arg1); --- 2381,2389 static tree get_continuation_for_phi_1 (gimple phi, tree arg0, tree arg1, ao_ref *ref, unsigned int *cnt, ! bitmap *visited, bool abort_on_visited, ! void *(*translate)(ao_ref *, tree, void *, bool), !
Re: debug container patch
On 07/05/14 14:17 +0100, Ramana Radhakrishnan wrote:
> Can someone regenerate and commit the Makefile.in changes soon ? I'm
> seeing testsuite failures thanks to missing debug/safe_container.h on
> arm-none-linux-gnueabihf

It was done hours ago by http://gcc.gnu.org/ml/gcc-cvs/2014-05/msg00170.html
Re: debug container patch
On Wed, May 7, 2014 at 2:22 PM, Jonathan Wakely jwak...@redhat.com wrote:
> On 07/05/14 14:17 +0100, Ramana Radhakrishnan wrote:
>> Can someone regenerate and commit the Makefile.in changes soon ? I'm
>> seeing testsuite failures thanks to missing debug/safe_container.h on
>> arm-none-linux-gnueabihf
>
> It was done hours ago by http://gcc.gnu.org/ml/gcc-cvs/2014-05/msg00170.html

Sorry about the noise. I realized that just after I had hit send. Not enough coffee today.

Ramana
Re: [C++ Patch] PR 61080
OK. Jason
[patch] libstdc++/61023 - copy comparison functor in RB tree move assignment
As noted in the PR, the standard doesn't actually say what containers should do with their functors on move construction/assignment. Our unordered containers currently move the hash and predicate functions. Our RB trees copy the comparison function in the move constructor but do nothing with it in the move assignment.

I think moving in both cases is probably correct, but rather than change the existing move constructor this patch just makes the move assignment copy the function, for consistency. When the standard is clarified we can review whether we should be moving instead of copying.

Tested x86_64-linux, committed to trunk and the 4.9 branch.

commit 42ea108aeb7528ff3b41f7c1b9d11f3a8ba1bae8
Author: Jonathan Wakely jwak...@redhat.com
Date:   Wed May 7 14:25:48 2014 +0100

    PR libstdc++/61023
    * include/bits/stl_tree.h (_Rb_tree::_M_move_assign): Copy the
    comparison function.
    * testsuite/23_containers/set/cons/61023.cc: New.

diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h
index 288c9fa..ce43ab8 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -1073,6 +1073,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
     _M_move_assign(_Rb_tree& __x)
     {
+      _M_impl._M_key_compare = __x._M_impl._M_key_compare;
       if (_Alloc_traits::_S_propagate_on_move_assign()
 	  || _Alloc_traits::_S_always_equal()
 	  || _M_get_Node_allocator() == __x._M_get_Node_allocator())
diff --git a/libstdc++-v3/testsuite/23_containers/set/cons/61023.cc b/libstdc++-v3/testsuite/23_containers/set/cons/61023.cc
new file mode 100644
index 000..087b9cc
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/set/cons/61023.cc
@@ -0,0 +1,56 @@
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <set>
+#include <stdexcept>
+
+struct Comparator
+{
+  Comparator() : valid(false) { }
+  explicit Comparator(bool) : valid(true) { }
+
+  bool operator()(int i, int j) const
+  {
+    if (!valid)
+      throw std::logic_error("Comparator is invalid");
+    return i < j;
+  }
+
+private:
+  bool valid;
+};
+
+int main()
+{
+  using test_type = std::set<int, Comparator>;
+
+  Comparator cmp{true};
+
+  test_type good{cmp};
+
+  test_type s1;
+  s1 = good; // copy-assign
+  s1.insert(1);
+  s1.insert(2);
+
+  test_type s2;
+  s2 = std::move(good); // move-assign
+  s2.insert(1);
+  s2.insert(2);
+}
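A complementary way to observe the behaviour discussed in this message is to give the comparator a visible piece of state and read it back through `key_comp()` after a move assignment. The sketch below is an illustration only and is not part of the committed patch; `TaggedLess` and `tag_after_move_assign` are hypothetical names, and the result depends on the standard library in use. With a library that transfers the comparator on move assignment (by copy, as in this patch, or by move) the destination should report the source's tag.

```cpp
#include <set>
#include <utility>

// Hypothetical comparator carrying an observable tag.
struct TaggedLess {
  int tag;
  explicit TaggedLess(int t = 0) : tag(t) {}
  bool operator()(int a, int b) const { return a < b; }
};

// Returns the tag held by the destination set's comparator after the
// destination has been move-assigned from a set built with `source_tag`.
int tag_after_move_assign(int source_tag) {
  TaggedLess cmp(source_tag);
  std::set<int, TaggedLess> source(cmp);
  source.insert(1);

  std::set<int, TaggedLess> destination;  // default comparator, tag 0
  destination = std::move(source);
  return destination.key_comp().tag;
}
```

If the move assignment left the comparator untouched, as the pre-patch RB tree did, the function would return 0 instead of the source tag.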
Re: [Patch ARM 1/3] Neon intrinsics TLC : Replace intrinsics with GNU C implementations where possible.
On 28/04/14 14:01, Ramana Radhakrishnan wrote:
> On Mon, Apr 28, 2014 at 12:44 PM, Julian Brown jul...@codesourcery.com wrote:
>> On Mon, 28 Apr 2014 11:44:01 +0100 Ramana Radhakrishnan ramra...@arm.com wrote:
>>> I've special cased the ffast-math case for the _f32 intrinsics to prevent the auto-vectorizer from coming along and vectorizing addv2sf and addv4sf type operations which we don't want to happen by default. Patch 1/3 causes apparent regressions in the rather ineffective neon intrinsics tests that we currently carry soon hopefully to be replaced by Christophe Lyon's rewrite that is being reviewed. On the whole I deem this patch stack to be safe to go in if necessary. These regressions are for -O0 with the vbic and vorn intrinsics which don't now get combined and well, so be it.
>>
>> I think reimplementing these intrinsics in C is a mistake if we ever hope to make big-endian mode work properly, and fixing the generated header file by bypassing the generator makes it harder to accurately perform the sweeping changes that will probably be necessary to do that.
>>
>> Recall e.g. the discussion around: http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00161.html
>
> Well, it would help if the generator were written in a better language than ML :) . While I don't mind the different language in the backend once in a while the problem is that every time anyone needs to make a change to this file, we spend far more time relearning ML than actually doing the change :(.

I agree: it's time the ML files went. They're an impediment to maintenance these days.

When the ML description was added it did three things: generated arm_neon.h, generated the testsuite and generated a pipeline description for Cortex-A8. As we've progressed the second and third of these have gone away (or at least, are about to in the case of the testsuite), leaving only the arm_neon.h generation.
I don't see any real merit in having that file generated from the ML file; we might as well just maintain the existing code directly and that brings about the chance to have more people actively work on fixing issues there without having to learn ML first. R.
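For readers unfamiliar with what "GNU C implementations" of intrinsics means in this thread, the following is a rough sketch using the GCC/Clang generic vector extension rather than the real arm_neon.h types. The type `v4si` and the functions `bic_v4si` and `orn_v4si` are stand-ins invented for this note, not the actual header contents; they only show how operations such as vbic (and-not) and vorn (or-not) can be written with plain operators instead of builtins.

```cpp
// Requires GCC or Clang: uses the vector_size attribute extension.
typedef int v4si __attribute__((vector_size(16)));

// Stand-in for a vbic-style intrinsic: a AND (NOT b), element-wise.
v4si bic_v4si(v4si a, v4si b) {
  return a & ~b;
}

// Stand-in for a vorn-style intrinsic: a OR (NOT b), element-wise.
v4si orn_v4si(v4si a, v4si b) {
  return a | ~b;
}
```

Written this way the compiler sees ordinary bitwise operations, which explains the remark in the thread that at -O0 the separate NOT and AND/OR are no longer combined into a single instruction.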
[PATCH, PR 60897] Clear DECL_LANG_SPECIFIC when creating ISRA clones
Hi,

I nearly forgot about this patch to fix PR 60897 where we get a mangled name in a warning for IPA-SRA functions because IPA-SRA currently does not clear DECL_LANG_SPECIFIC when it messes with formal parameters and the front-end then does not look at abstract origin when it is not NULL.

Bootstrapped and tested on x86_64-linux. OK for trunk? Also, although I have not tested it there yet, I suppose this should also be committed to the 4.9 branch.

Thanks,

Martin

2014-04-22  Martin Jambor  mjam...@suse.cz

	PR ipa/60897
	* ipa-prop.c (ipa_modify_formal_parameters): Reset DECL_LANG_SPECIFIC.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 9f144fa..0bc44d3 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3650,6 +3650,7 @@ ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments)
 
   TREE_TYPE (fndecl) = new_type;
   DECL_VIRTUAL_P (fndecl) = 0;
+  DECL_LANG_SPECIFIC (fndecl) = NULL;
   otypes.release ();
   oparms.release ();
 }
Re: [PATCH] Fix GDB PR15559 (inferior calls using thiscall calling convention)
Tom> The usual approach is some appropriate text somewhere on the GCC wiki
Tom> (though I suppose a note in the mail archives would do in a pinch)
Tom> along with a URL in a comment in the appropriate file (dwarf2.h or
Tom> dwarf2.def).

Tom> Could you please do that?

Julian> How's this, as a first attempt?
Julian> http://gcc.gnu.org/wiki/GNUDwarfExtensions

Sorry I didn't reply to this sooner. That page looks great. Thanks for doing this.

Tom
[C++ PATCH] demangler fix
Hi all, A patch I committed to libiberty last year [1, 2] caused a regression that caused the demangler to segfault on certain symbols [3, 4, 5, 6]. The attached patch fixes, and adds regression tests for all symbols referenced in those bugs. Ok to commit? Thanks, Gary -- http://gbenson.net/ [1] http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01299.html [2] http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01755.html [3] https://sourceware.org/bugzilla/show_bug.cgi?id=14963 [4] https://sourceware.org/bugzilla/show_bug.cgi?id=16593 [5] https://sourceware.org/bugzilla/show_bug.cgi?id=16752 [6] https://sourceware.org/bugzilla/show_bug.cgi?id=16845 2014-05-07 Gary Benson gben...@redhat.com * cp-demangle.c (struct d_component_stack): New structure. (struct d_print_info): New field component_stack. (d_print_init): Initialize the above. (d_print_comp_inner): Renamed from d_print_comp. Do not restore template stack if it would cause a loop. (d_print_comp): New function. * testsuite/demangle-expected: New test cases. diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index bf2ffa9..41c86c7 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -275,6 +275,16 @@ struct d_growable_string int allocation_failure; }; +/* Stack of components, innermost first, used to avoid loops. */ + +struct d_component_stack +{ + /* This component. */ + const struct demangle_component *dc; + /* This component's parent. */ + const struct d_component_stack *parent; +}; + /* A demangle component and some scope captured when it was first traversed. */ @@ -327,6 +337,8 @@ struct d_print_info int pack_index; /* Number of d_print_flush calls so far. */ unsigned long int flush_count; + /* Stack of components, innermost first, used to avoid loops. */ + const struct d_component_stack *component_stack; /* Array of saved scopes for evaluating substitutions. */ struct d_saved_scope *saved_scopes; /* Index of the next unused saved scope in the above array. 
*/ @@ -3934,6 +3946,8 @@ d_print_init (struct d_print_info *dpi, demangle_callbackref callback, dpi-demangle_failure = 0; + dpi-component_stack = NULL; + dpi-saved_scopes = NULL; dpi-next_saved_scope = 0; dpi-num_saved_scopes = 0; @@ -4269,8 +4283,8 @@ d_get_saved_scope (struct d_print_info *dpi, /* Subroutine to handle components. */ static void -d_print_comp (struct d_print_info *dpi, int options, - const struct demangle_component *dc) +d_print_comp_inner (struct d_print_info *dpi, int options, + const struct demangle_component *dc) { /* Magic variable to let reference smashing skip over the next modifier without needing to modify *dc. */ @@ -4673,11 +4687,30 @@ d_print_comp (struct d_print_info *dpi, int options, } else { + const struct d_component_stack *dcse; + int found_self_or_parent = 0; + /* This traversal is reentering SUB as a substition. - Restore the original templates temporarily. */ - saved_templates = dpi-templates; - dpi-templates = scope-templates; - need_template_restore = 1; + If we are not beneath SUB or DC in the tree then we + need to restore SUB's template stack temporarily. */ + for (dcse = dpi-component_stack; dcse != NULL; +dcse = dcse-parent) + { + if (dcse-dc == sub + || (dcse-dc == dc +dcse != dpi-component_stack)) + { + found_self_or_parent = 1; + break; + } + } + + if (!found_self_or_parent) + { + saved_templates = dpi-templates; + dpi-templates = scope-templates; + need_template_restore = 1; + } } a = d_lookup_template_argument (dpi, sub); @@ -5316,6 +5349,21 @@ d_print_comp (struct d_print_info *dpi, int options, } } +static void +d_print_comp (struct d_print_info *dpi, int options, + const struct demangle_component *dc) +{ + struct d_component_stack self; + + self.dc = dc; + self.parent = dpi-component_stack; + dpi-component_stack = self; + + d_print_comp_inner (dpi, options, dc); + + dpi-component_stack = self.parent; +} + /* Print a Java dentifier. For Java we try to handle encoded extended Unicode characters. 
The C++ ABI doesn't mention Unicode encoding, so we don't it for C++. Characters are encoded as diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected index 3ff08e6..453f9a3 100644 --- a/libiberty/testsuite/demangle-expected +++ b/libiberty/testsuite/demangle-expected @@
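The core idea of the patch above is a stack of "components currently being printed", threaded through the recursion so that re-entering a component can be detected. The sketch below illustrates that idea in isolation and is not the libiberty code: `Node`, `Frame`, `on_stack` and `depth_until_loop` are hypothetical names, and the stack is passed as a parameter here, whereas the patch stores it in `struct d_print_info`. Each frame lives in the caller's stack frame, so no allocation is needed.

```cpp
// Hypothetical tree node; `child` may point back to an ancestor.
struct Node {
  const Node* child = nullptr;
};

// One entry per node currently being visited, innermost first.
struct Frame {
  const Node* node;
  const Frame* parent;
};

// True if `n` is already being visited somewhere up the recursion.
bool on_stack(const Frame* top, const Node* n) {
  for (const Frame* f = top; f != nullptr; f = f->parent)
    if (f->node == n)
      return true;
  return false;
}

// Counts how many distinct nodes are visited before the walk either
// ends or would loop back onto a node that is still being visited.
int depth_until_loop(const Node* n, const Frame* top = nullptr) {
  if (n == nullptr || on_stack(top, n))
    return 0;
  Frame self{n, top};  // push: lives only for this call
  return 1 + depth_until_loop(n->child, &self);
}
```

Because the frame is popped automatically when the call returns, the scheme mirrors the wrapper added in the patch, which saves the parent pointer before the inner call and restores it afterwards.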
Re: PR 61084: SPARC fallout from wide-int merge
On May 7, 2014, at 2:26 AM, Richard Sandiford rdsandif...@googlemail.com wrote:
> The DImode constant splitter assigned the result of trunc_int_for_mode
> to an unsigned int rather than a HOST_WIDE_INT. This then produced
> const_ints that were zero-extended rather than sign-extended and
> tripped the assert:
>
>   gcc_checking_assert (INTVAL (x.first)
>                        == sext_hwi (INTVAL (x.first), precision)
>                        || (x.second == BImode && INTVAL (x.first) == 1));
>
> The other hunks are just by inspection, but I think gen_int_mode
> is preferred over GEN_INT when the mode is obvious.
>
> Tested by Rainer, who says that the bootstrap now completes. OK to install?

Ok.
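The failure mode described in the quoted message can be reproduced outside GCC. The sketch below is only a model: `sext`, `is_canonical` and `through_unsigned32` are names invented for this note, with `sext` playing the role of `sext_hwi` and a 64-bit integer standing in for HOST_WIDE_INT. It shows that a negative 32-bit constant stored through a 32-bit unsigned temporary comes back zero-extended and therefore no longer equals its own sign extension, which is the condition the assert checks.

```cpp
#include <cstdint>

// Sign-extend the low `precision` bits of `value` (1 <= precision <= 64).
std::int64_t sext(std::int64_t value, unsigned precision) {
  if (precision >= 64)
    return value;
  const std::uint64_t mask = (std::uint64_t{1} << precision) - 1;
  const std::uint64_t sign = std::uint64_t{1} << (precision - 1);
  const std::uint64_t low = static_cast<std::uint64_t>(value) & mask;
  return static_cast<std::int64_t>((low ^ sign) - sign);
}

// A constant is in canonical form if it equals its own sign extension.
bool is_canonical(std::int64_t value, unsigned precision) {
  return value == sext(value, precision);
}

// Models the bug: the value passes through a 32-bit unsigned temporary,
// so widening it again zero-extends instead of sign-extending.
std::int64_t through_unsigned32(std::int64_t value) {
  const std::uint32_t narrowed = static_cast<std::uint32_t>(value);
  return narrowed;
}
```

Keeping the intermediate in a signed 64-bit variable, as the fix does with HOST_WIDE_INT, preserves the sign-extended form.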
[Committed] Add myself to MAINTAINERS
Committed as r210164.

Index: MAINTAINERS
===================================================================
--- MAINTAINERS (revision 210161)
+++ MAINTAINERS (working copy)
@@ -315,6 +315,7 @@
 Simon Baldwin sim...@google.com
 Scott Bambrough sco...@netwinder.org
 Wolfgang Bangerth bange...@dealii.org
+Charles Baylis charles.bay...@linaro.org
 Tejas Belagod tejas.bela...@arm.com
 Andrey Belevantsev a...@ispras.ru
 Jon Beniston j...@beniston.com
Re: [C++ PATCH] demangler fix
OK, thanks. Jason
[PATCH] copyprop_hardreg_forward needs to check HARD_REGNO_CALL_PART_CLOBBERED
The MIPS O32 FPXX ABI exposes a bug in regcprop where call part clobbered information is not checked when calculating clobbered registers. This is only one of many places that regs_invalidated_by_call is used without also checking HARD_REGNO_CALL_PART_CLOBBERED.

This patch ensures that a part clobbered register is treated as if fully clobbered. Other places where this same issue occurs are not so easily fixed as they do not always have mode information available when calculating clobbered registers. A solution to the larger problem will be significantly more involved.

Exposed in a testcase as part of: http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00401.html

Regards,
Matthew

2014-05-07  Matthew Fortune  matthew.fort...@imgtec.com

gcc/
	* regcprop.c (copyprop_hardreg_forward_1): Account for
	HARD_REGNO_CALL_PART_CLOBBERED.

Attachment: 0001-copyprop-part-clobbered.patch
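To make the distinction between "fully clobbered" and "part clobbered" concrete, here is a toy model. It is not GCC code and does not describe the real MIPS register file: the eight-register target, `CallClobberInfo`, `call_part_clobbered` and `value_survives_call` are all assumptions made for this note. The second predicate stands in for HARD_REGNO_CALL_PART_CLOBBERED, and the last function shows the conservative rule the patch applies, namely that a part-clobbered register is treated as clobbered.

```cpp
#include <bitset>

// Hypothetical toy target: 8 hard registers, each 8 bytes wide.
constexpr int kNumRegs = 8;

struct CallClobberInfo {
  // Registers a call destroys entirely (models regs_invalidated_by_call).
  std::bitset<kNumRegs> fully_clobbered;
  // Registers for which the callee preserves only the low 4 bytes.
  std::bitset<kNumRegs> low_half_preserved;
};

// Stand-in for HARD_REGNO_CALL_PART_CLOBBERED (regno, mode): a value
// wider than the preserved part loses its upper bytes across a call.
bool call_part_clobbered(const CallClobberInfo& info, int regno,
                         int mode_bytes) {
  return info.low_half_preserved[regno] && mode_bytes > 4;
}

// Conservative rule: a part-clobbered register counts as clobbered.
bool value_survives_call(const CallClobberInfo& info, int regno,
                         int mode_bytes) {
  if (info.fully_clobbered[regno])
    return false;
  return !call_part_clobbered(info, regno, mode_bytes);
}
```

The model also shows why the message calls the wider problem harder: the answer depends on `mode_bytes`, so code that only has a register number cannot ask the question.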
Re: PR 61084: SPARC fallout from wide-int merge
Mike Stump mikest...@comcast.net writes:
> On May 7, 2014, at 2:26 AM, Richard Sandiford rdsandif...@googlemail.com wrote:
>> The DImode constant splitter assigned the result of trunc_int_for_mode
>> to an unsigned int rather than a HOST_WIDE_INT. This then produced
>> const_ints that were zero-extended rather than sign-extended and
>> tripped the assert:
>>
>>   gcc_checking_assert (INTVAL (x.first)
>>                        == sext_hwi (INTVAL (x.first), precision)
>>                        || (x.second == BImode && INTVAL (x.first) == 1));
>>
>> The other hunks are just by inspection, but I think gen_int_mode
>> is preferred over GEN_INT when the mode is obvious.
>>
>> Tested by Rainer, who says that the bootstrap now completes. OK to install?
>
> Ok.

I think this needs a backend maintainer. Although it was exposed by the wide-int assert, it isn't really wide-int-related as such.

Thanks,
Richard
Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
On May 7, 2014, at 2:32 AM, Herman, Andrei andrei_her...@codesourcery.com wrote:
> However, code coverage tools that process the DWARF debug information to implement block/path coverage need more complete lexical block information.

So, it would be nice to give a hint in the actual documentation, why a user might use the flag, or for a maintainer to be able to predict exactly what was desired in some obscure corner of DWARF semantics given the documentation. I think it can be as simple as “This option is useful for code coverage tools that utilize the DWARF debug information.”

A user, upon seeing that, would then ask, do I have such a tool, say no, and then know they don’t have to contemplate the goodness of the option further. If one is writing a coverage tool, upon seeing the documentation, they might then ask themselves, how might I use that flag profitably for my users.
Re: [PATCH] rs6000: New attributes for load/store: sign_extend, update and indexed
On Sun, May 4, 2014 at 10:13 PM, Segher Boessenkool seg...@kernel.crashing.org wrote: The new attributes replace the instruction types *_ext*, *_u, *_ux. This simplifies all code that does not care about the addressing modes, putting the burden on the code that does care (mostly the scheduling descriptions for certain CPUs). It fixes a few minor bugs in the process. The update and indexed attributes are automatic for any insn that has a MEM as operand 0 or 1. Other insns have to set it manually, if they do not like the default (which is no). Insns that are type load/store/fpload/fpstore but have fewer than two operands need to set it too, or the compiler will crash. There are very few of those. This tries not to change semantics anywhere; in particular, the string and multiple instructions set both update and indexed (although they are neither). Bootstrapped on powerpc64-linux c,c++,fortran,ada,go; tested {-m64,-m64/-mcpu=power8,-m32,-m32/-mpowerpc64}, no regressions. OK for mainline? Segher gcc/ 2014-05-04 Segher Boessenkool seg...@kernel.crashing.org * config/rs6000/predicates.md (indexed_address_mem): New. * config/rs6000/rs6000.md (type): Remove load_ext, load_ext_u, load_ext_ux, load_ux, load_u, store_ux, store_u, fpload_ux, fpload_u, fpstore_ux, fpstore_u. (sign_extend, indexed, update): New. (cell_micro): Adjust. 
(*zero_extendmodedi2_internal1, *zero_extendsidi2_lfiwzx, *extendsidi2_lfiwax, *extendsidi2_nocell, *extendsfdf2_fpr, *movsi_internal1, *movsi_internal1_single, *movhi_internal, *movqi_internal, *movcc_internal1, movmode_hardfloat, *movmode_softfloat, *movmode_hardfloat32, *movmode_hardfloat64, *movmode_softfloat64, *movdi_internal32, *movdi_internal64, *movmode_string, *ldmsi8, *ldmsi7, *ldmsi6, *ldmsi5, *ldmsi4, *ldmsi3, *stmsi8, *stmsi7, *stmsi6, *stmsi5, *stmsi4, *stmsi3, *movdi_update1, movdi_mode_update, movdi_mode_update_stack, *movsi_update1, *movsi_update2, movsi_update, movsi_update_stack, *movhi_update1, *movhi_update2, *movhi_update3, *movhi_update4, *movqi_update1, *movqi_update2, *movqi_update3, *movsf_update1, *movsf_update2, *movsf_update3, *movsf_update4, *movdf_update1, *movdf_update2, load_toc_aix_si, load_toc_aix_di, probe_stack_mode, *stmw, *lmw, as well as 10 anonymous patterns): Adjust. * config/rs6000/dfp.md (movsd_store, movsd_load): Adjust. * config/rs6000/vsx.md (*vsx_movti_32bit, *vsx_extract_mode_load, *vsx_extract_mode_store): Adjust. * config/rs6000/rs6000.c (rs6000_adjust_cost, is_microcoded_insn, is_cracked_insn, insn_must_be_first_in_group, insn_must_be_last_in_group): Adjust. * config/rs6000/40x.md (ppc403-load, ppc403-store, ppc405-float): Adjust. * config/rs6000/440.md (ppc440-load, ppc440-store, ppc440-fpload, ppc440-fpstore): Adjust. * config/rs6000/476.md (ppc476-load, ppc476-store, ppc476-fpload, ppc476-fpstore): Adjust. * config/rs6000/601.md (ppc601-load, ppc601-store, ppc601-fpload, ppc601-fpstore): Adjust. * config/rs6000/603.md (ppc603-load, ppc603-store, ppc603-fpload): Adjust. * config/rs6000/6xx.md (ppc604-load, ppc604-store, ppc604-fpload): Adjust. * config/rs6000/7450.md (ppc7450-load, ppc7450-store, ppc7450-fpload, ppc7450-fpstore): Adjust. * config/rs6000/7xx.md (ppc750-load, ppc750-store): Adjust. * config/rs6000/8540.md (ppc8540_load, ppc8540_store): Adjust. 
* config/rs6000/a2.md (ppca2-load, ppca2-fp-load, ppca2-fp-store): Adjust. * config/rs6000/cell.md (cell-load, cell-load-ux, cell-load-ext, cell-fpload, cell-fpload-update, cell-store, cell-store-update, cell-fpstore, cell-fpstore-update): Adjust. * config/rs6000/e300c2c3.md (ppce300c3_load, ppce300c3_fpload, ppce300c3_store, ppce300c3_fpstore): Adjust. * config/rs6000/e500mc.md (e500mc_load, e500mc_fpload, e500mc_store, e500mc_fpstore): Adjust. * config/rs6000/e500mc64.md (e500mc64_load, e500mc64_fpload, e500mc64_store, e500mc64_fpstore): Adjust. * config/rs6000/e5500.md (e5500_load, e5500_fpload, e5500_store, e5500_fpstore): Adjust. * config/rs6000/e6500.md (e6500_load, e6500_fpload, e6500_store, e6500_fpstore): Adjust. * config/rs6000/mpc.md (mpccore-load, mpccore-store, mpccore-fpload): Adjust. * config/rs6000/power4.md (power4-load, power4-load-ext, power4-load-ext-update, power4-load-ext-update-indexed, power4-load-update-indexed, power4-load-update, power4-fpload, power4-fpload-update, power4-store, power4-store-update, power4-store-update-indexed, power4-fpstore,
[4.7] Various backports
Hi! I've backported some fixes I've committed (plus one support change from Jason and one fix from Marek) to 4.8 branch in the last year or so to 4.7 branch, after bootstrapping/regtesting them on x86_64-linux and i686-linux. Sorry for the delay. Jakub 2014-05-07 Jakub Jelinek ja...@redhat.com Backported from mainline 2013-06-27 Jakub Jelinek ja...@redhat.com PR target/57623 * config/i386/i386.md (bmi2_bzhi_mode3): Swap AND arguments to match RTL canonicalization. Swap predicates and constraints of operand 1 and 2. * gcc.target/i386/bmi2-bzhi-1.c: New test. --- gcc/config/i386/i386.md (revision 200477) +++ gcc/config/i386/i386.md (revision 200478) @@ -12174,9 +12174,9 @@ (define_insn *bmi_blsr_mode ;; BMI2 instructions. (define_insn bmi2_bzhi_mode3 [(set (match_operand:SWI48 0 register_operand =r) - (and:SWI48 (match_operand:SWI48 1 register_operand r) - (lshiftrt:SWI48 (const_int -1) - (match_operand:SWI48 2 nonimmediate_operand rm + (and:SWI48 (lshiftrt:SWI48 (const_int -1) + (match_operand:SWI48 2 register_operand r)) + (match_operand:SWI48 1 nonimmediate_operand rm))) (clobber (reg:CC FLAGS_REG))] TARGET_BMI2 bzhi\t{%2, %1, %0|%0, %1, %2} --- gcc/testsuite/gcc.target/i386/bmi2-bzhi-1.c (revision 0) +++ gcc/testsuite/gcc.target/i386/bmi2-bzhi-1.c (revision 200478) @@ -0,0 +1,31 @@ +/* PR target/57623 */ +/* { dg-do assemble { target bmi2 } } */ +/* { dg-options -O2 -mbmi2 } */ + +#include x86intrin.h + +unsigned int +f1 (unsigned int x, unsigned int *y) +{ + return _bzhi_u32 (x, *y); +} + +unsigned int +f2 (unsigned int *x, unsigned int y) +{ + return _bzhi_u32 (*x, y); +} + +#ifdef __x86_64__ +unsigned long long +f3 (unsigned long long x, unsigned long long *y) +{ + return _bzhi_u64 (x, *y); +} + +unsigned long long +f4 (unsigned long long *x, unsigned long long y) +{ + return _bzhi_u64 (*x, y); +} +#endif 2014-05-07 Jakub Jelinek ja...@redhat.com Backported from mainline 2013-06-27 Jakub Jelinek ja...@redhat.com PR target/57623 * config/i386/i386.md 
(bmi_bextr_mode): Swap predicates and constraints of operand 1 and 2. * gcc.target/i386/bmi-bextr-3.c: New test. --- gcc/config/i386/i386.md (revision 200479) +++ gcc/config/i386/i386.md (revision 200480) @@ -12077,8 +12077,8 @@ (define_insn bmi_bextr_mode [(set (match_operand:SWI48 0 register_operand =r) -(unspec:SWI48 [(match_operand:SWI48 1 register_operand r) - (match_operand:SWI48 2 nonimmediate_operand rm)] +(unspec:SWI48 [(match_operand:SWI48 1 nonimmediate_operand rm) + (match_operand:SWI48 2 register_operand r)] UNSPEC_BEXTR)) (clobber (reg:CC FLAGS_REG))] TARGET_BMI --- gcc/testsuite/gcc.target/i386/bmi-bextr-3.c (revision 0) +++ gcc/testsuite/gcc.target/i386/bmi-bextr-3.c (revision 200480) @@ -0,0 +1,31 @@ +/* PR target/57623 */ +/* { dg-do assemble { target bmi } } */ +/* { dg-options -O2 -mbmi } */ + +#include x86intrin.h + +unsigned int +f1 (unsigned int x, unsigned int *y) +{ + return __bextr_u32 (x, *y); +} + +unsigned int +f2 (unsigned int *x, unsigned int y) +{ + return __bextr_u32 (*x, y); +} + +#ifdef __x86_64__ +unsigned long long +f3 (unsigned long long x, unsigned long long *y) +{ + return __bextr_u64 (x, *y); +} + +unsigned long long +f4 (unsigned long long *x, unsigned long long y) +{ + return __bextr_u64 (*x, y); +} +#endif 2014-05-07 Jakub Jelinek ja...@redhat.com Backported from mainline 2013-07-03 Jakub Jelinek ja...@redhat.com PR target/5 * config/i386/predicates.md (vsib_address_operand): Disallow SYMBOL_REF or LABEL_REF in parts.disp if TARGET_64BIT flag_pic. * gcc.target/i386/pr5.c: New test. --- gcc/config/i386/predicates.md (revision 200649) +++ gcc/config/i386/predicates.md (revision 200650) @@ -835,19 +835,28 @@ (define_predicate vsib_address_operand return false; /* VSIB addressing doesn't support (%rip). 
*/ - if (parts.disp GET_CODE (parts.disp) == CONST) + if (parts.disp) { - disp = XEXP (parts.disp, 0); - if (GET_CODE (disp) == PLUS) - disp = XEXP (disp, 0); - if (GET_CODE (disp) == UNSPEC) - switch (XINT (disp, 1)) - { - case UNSPEC_GOTPCREL: - case UNSPEC_PCREL: - case UNSPEC_GOTNTPOFF: - return false; - } + disp = parts.disp; + if (GET_CODE (disp) == CONST) + { + disp = XEXP (disp, 0); + if (GET_CODE (disp) == PLUS) + disp = XEXP (disp, 0); + if (GET_CODE (disp) == UNSPEC) + switch (XINT (disp, 1)) + { + case UNSPEC_GOTPCREL: + case UNSPEC_PCREL: + case
Re: [PATCH, PR 60897] Clear DECL_LANG_SPECIFIC when creating ISRA clones
On May 7, 2014 5:30:53 PM CEST, Martin Jambor mjam...@suse.cz wrote:
> Hi,
>
> I nearly forgot about this patch to fix PR 60897 where we get a mangled name in a warning for IPA-SRA functions because IPA-SRA currently does not clear DECL_LANG_SPECIFIC when it messes with formal parameters and the front-end then does not look at abstract origin when it is not NULL.
>
> Bootstrapped and tested on x86_64-linux. OK for trunk? Also, although I have not tested it there yet, I suppose this should also be committed to the 4.9 branch.

OK for both.

Thanks,
Richard.

> Thanks,
>
> Martin
>
> 2014-04-22  Martin Jambor  mjam...@suse.cz
>
> 	PR ipa/60897
> 	* ipa-prop.c (ipa_modify_formal_parameters): Reset DECL_LANG_SPECIFIC.
>
> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index 9f144fa..0bc44d3 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -3650,6 +3650,7 @@ ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments)
>
>    TREE_TYPE (fndecl) = new_type;
>    DECL_VIRTUAL_P (fndecl) = 0;
> +  DECL_LANG_SPECIFIC (fndecl) = NULL;
>    otypes.release ();
>    oparms.release ();
>  }
Re: [SH, committed] Fix PR 61026 sh-*-* Fails to Compile on FreeBSD
On Sat, 3 May 2014, Oleg Endo wrote: +#include <sstream> +#include <vector> +#include <algorithm> + #include "config.h" It's never OK to include any system headers (C or C++) before config.h. config.h may define feature test macros such as _FILE_OFFSET_BITS that affect system headers in various ways and are only effective if defined before any system headers are included, and if different files in GCC are built with different settings of such feature test macros then they may expect incompatible choices of ABI for C library types. (This is a general principle for any software using autoconf, at least if it uses any of the autoconf macros that can define feature test macros - which GCC does - not just for GCC.) -- Joseph S. Myers jos...@codesourcery.com
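The ordering rule above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `#define` stands in for what a generated config.h might contain (real code would simply put `#include "config.h"` first), and `off_t_bits` is a hypothetical helper, not anything in GCC.

```c
/* Stand-in for #include "config.h" (assumption, to keep the sketch
   self-contained): an autoconf-generated header may define feature
   test macros like this one.  It has to appear before any system
   header, otherwise the headers have already chosen their types.  */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>

/* Hypothetical helper: width in bits of off_t as this translation
   unit sees it (assuming 8-bit bytes).  */
int
off_t_bits (void)
{
  return (int) (sizeof (off_t) * 8);
}
```

If the define were moved below the `#include`, a 32-bit glibc build could see a 32-bit off_t here while other files see a 64-bit one, which is the kind of ABI mismatch described above.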
Re: [patch] change specific int128 - generic intN
On Sun, 4 May 2014, DJ Delorie wrote: I'm not aware of any reason those macros need to have decimal values. I'd suggest removing the precomputed table and printing them in hex, which is easy for values of any precision. Here's an independent change that removes the decimal table and replaces it with generated hex values. I included the relevant output of gcc -E -dM also. OK (presuming the usual bootstrap and regression test, which should provide a reasonably thorough test of this code through the stdint.h tests). -- Joseph S. Myers jos...@codesourcery.com
Re: [DOC PATCH] Rewrite docs for inline asm
On Mon, 5 May 2014, Gerald Pfeifer wrote: I've changed this to @code{=}. Is that what you meant? This is a question for Joseph. I see how a single character under @code{} won't work, yet @code{=} doesn't feel right, either. Perhaps ``@code{=}''? If you are referring to an actual string constant = in the user's source code, then @code{=} is correct. If you are referring just to the single character = in the user's source code, whether as a token on its own or as part of a larger token, then @samp{=} is the way to get it quoted (with the character being in a fixed-width font, but the quotes around it not being in such a font). -- Joseph S. Myers jos...@codesourcery.com
Re: [C PATCH] Don't reject valid code with _Alignas (PR c/61053)
On Mon, 5 May 2014, Marek Polacek wrote: In this PR the issue is that we reject (valid) code such as _Alignas (long long) long long foo; with -m32, because we trip this condition: alignas_align = 1U << declspecs->align_log; if (alignas_align < TYPE_ALIGN_UNIT (type)) { if (name) error_at (loc, "%<_Alignas%> specifiers cannot reduce alignment of %qE", name); and error later on, since alignas_align is 4 (correct, see PR52023 for why), but TYPE_ALIGN_UNIT of long long is 8. I think TYPE_ALIGN_UNIT is wrong here as that won't give us minimal alignment required. In c_sizeof_or_alignof_type we already have the code to compute such minimal alignment so I just moved the code to a separate function and used that instead of TYPE_ALIGN_UNIT. Note that the test is run only on i?86 and x86_64, because we can't (?) easily determine which target requires what alignment. Regtested/bootstrapped on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu, ok for trunk? OK, though I'm not sure if the lp64 conditions are right in the testcase (i.e. if x32 has the same peculiarity as -m32 here, which is what's implied by the use of lp64). -- Joseph S. Myers jos...@codesourcery.com
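For reference, the declaration under discussion can be exercised in a small C11 sketch. `foo_is_aligned` is a hypothetical helper added only for illustration; the point is that asking for the alignment a type already requires never reduces it, so the declaration should be accepted.

```c
#include <stdint.h>

/* The declaration from the PR: the requested alignment is exactly
   _Alignof (long long), so it cannot reduce the alignment of foo.  */
_Alignas (long long) long long foo;

/* Hypothetical helper: nonzero if foo's address satisfies the
   alignment that _Alignof reports for long long.  */
int
foo_is_aligned (void)
{
  return (uintptr_t) &foo % _Alignof (long long) == 0;
}
```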
RE: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
Thanks for the suggestion. The current patch includes the following text added in gcc/doc/invoke.texi: @item -fforce-dwarf-lexical-blocks Produce debug information (a DW_TAG_lexical_block) for every function body, loop body, switch body, case statement, if-then and if-else statement, even if the body is a single statement. Likewise, a lexical block will be emitted for the first label of a statement. This block ends at the end of the current lexical scope, or when a break, continue, goto or return statement is encountered at the same lexical scope level. This option is available when using DWARF Version 4 or higher. I can add the suggested sentence at the beginning of the description, to save time for users not interested in the more detailed explanation. Regards, Andrei Herman Mentor Graphics Corporation Israel branch -Original Message- From: Mike Stump [mailto:mikest...@comcast.net] Sent: Wednesday, May 07, 2014 7:00 PM To: Herman, Andrei Cc: gcc-patches@gcc.gnu.org; herman_and...@mentor.com Subject: Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option On May 7, 2014, at 2:32 AM, Herman, Andrei andrei_her...@codesourcery.com wrote: However, code coverage tools that process the DWARF debug information to implement block/path coverage need more complete lexical block information. So, it would be nice to give a hint in the actual documentation, why a user might use the flag, or for a maintainer to be able to predict exactly what was desired in some obscure corner of dwarf semantics given the documentation. I think it can be as simple as This option is useful for code coverage tools that utilize the dwarf debug information. A user, upon seeing that, would then ask, do I have such a tool, say no, and then know they don't have to contemplate the goodness of the option further. If one is writing a coverage tool, upon seeing the documentation, they might then ask themselves, how might I use that flag profitably for my users.
Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
On Wed, May 7, 2014 at 10:19 AM, Herman, Andrei andrei_her...@codesourcery.com wrote: Thanks for the suggestion. The current patch includes the following text added in gcc/doc/invoke.texi: @item -fforce-dwarf-lexical-blocks Produce debug information (a DW_TAG_lexical_block) for every function body, loop body, switch body, case statement, if-then and if-else statement, even if the body is a single statement. Likewise, a lexical block will be emitted for the first label of a statement. This block ends at the end of the current lexical scope, or when a break, continue, goto or return statement is encountered at the same lexical scope level. This option is available when using DWARF Version 4 or higher. I can add the suggested sentence at the beginning of the description, to save time for users not interested in the more detailed explanation. Also be explicit that the option only applies to C/C++ code in the documentation. Thanks, Andrew Pinski Regards, Andrei Herman Mentor Graphics Corporation Israel branch -Original Message- From: Mike Stump [mailto:mikest...@comcast.net] Sent: Wednesday, May 07, 2014 7:00 PM To: Herman, Andrei Cc: gcc-patches@gcc.gnu.org; herman_and...@mentor.com Subject: Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option On May 7, 2014, at 2:32 AM, Herman, Andrei andrei_her...@codesourcery.com wrote: However, code coverage tools that process the DWARF debug information to implement block/path coverage need more complete lexical block information. So, it would be nice to give a hint in the actual documentation, why a user might use the flag, or for a maintainer to be able to predict exactly what was desired in some obscure corner of dwarf semantics given the documentation. I think it can be as simple as This option is useful for code coverage tools that utilize the dwarf debug information. 
A user, upon seeing that, would then ask, do I have such a tool, say no, and then know they don't have to contemplate the goodness of the option further. If one is writing a coverage tool, upon seeing the documentation, they might then ask themselves, how might I use that flag profitably for my users.
Re: [C PATCH] Warn about variadic main (PR c/60156)
On Tue, 6 May 2014, Marek Polacek wrote: On Thu, May 01, 2014 at 11:37:58PM +, Joseph S. Myers wrote: As a matter of QoI we should also diagnose use of _Atomic in the return type or argument types of main (something I deferred doing in the initial _Atomic support). Ok, I opened PR61077 and I'm taking it. But I wonder if I should diagnose if the second parameter is e.g.: _Atomic char **argv; char *_Atomic *argv; Yes, those should be diagnosed (remember that _Atomic char is allowed to be bigger than char, so those certainly aren't reasonable types for arguments to main). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
On Wed, 7 May 2014, Herman, Andrei wrote: When this flag is set, a DW_TAG_lexical_block DIE will be emitted for every function body, loop body, switch body, case statement, if-then and if-else statement, even if the body is a single statement. Likewise, a lexical block will be emitted for the first label of a labeled statement. This block ends at the end of the current lexical scope, or when a break, continue, goto or return statement is encountered at the same lexical scope level. Consequently, any case in a switch statement that does not flow through to the next case, will have its own dwarf lexical block. The documentation appears to suggest it's purely about debug info and has no effect on language semantics. However, the implementation appears to force C99 scoping rules. I don't think it's appropriate for a debug info option to have that effect; that is, gcc.dg/c90-scope-1.c should still pass even with the option enabled (more generally, the whole C testsuite should be verified to work with the option enabled). I suspect the changes adding scopes for labels would also affect language semantics; it's valid in C to have a declaration (not having variably modified type) after one case in a switch statement that gets used in another case even when control does not flow through. If you can't avoid affecting language semantics then you need to be very clear in the documentation that the option makes some invalid programs valid and vice versa and changes the semantics of some valid programs (even if you then assert the affected cases are uncommon in real C code). -- Joseph S. Myers jos...@codesourcery.com
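The switch-statement concern above can be made concrete with a short sketch. `classify` is a hypothetical function; it shows a declaration that follows one case label and is used under another, which is valid C99 and later because the whole switch body is a single block. Giving each case its own language-level scope would change the meaning of code like this.

```c
/* Hypothetical example: 'result' is declared after 'case 0:' but is
   still in scope under 'case 1:', even though control never flows
   from one case into the other.  */
int
classify (int n)
{
  switch (n)
    {
    case 0:
      ;                 /* a label needs a statement before the declaration */
      int result;       /* not variably modified, so jumping past it is fine */
      result = 10;
      return result;
    case 1:
      result = 20;      /* same object, same scope */
      return result;
    default:
      return -1;
    }
}
```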
Re: [PATCH GCC]Add 'force-dwarf-lexical-blocks' command line option
On May 7, 2014, at 10:19 AM, Herman, Andrei andrei_her...@codesourcery.com wrote: Thanks for the suggestion. I can add the suggested sentence at the beginning of the description, to save time for users not interested in the more detailed explanation. I’d put it at the end… I think the description you have is more important.
[committed] PR 61095: tsan fallout from wide-int merge
This PR was due to code in which -(int) foo was supposed to be sign-extended, but was being ORed with an unsigned int and so ended up being zero-extended. Fixed by using the proper-width type.

Tested on x86_64-linux-gnu and applied as obvious. Sorry for the breakage.

Thanks,
Richard

gcc/
	PR tree-optimization/61095
	* tree-ssanames.c (get_nonzero_bits): Fix type extension in wi::shwi.

Index: gcc/tree-ssanames.c
===================================================================
--- gcc/tree-ssanames.c	2014-05-07 16:50:15.136064484 +0100
+++ gcc/tree-ssanames.c	2014-05-07 16:50:15.422063737 +0100
@@ -271,7 +271,8 @@ get_nonzero_bits (const_tree name)
     {
       struct ptr_info_def *pi = SSA_NAME_PTR_INFO (name);
       if (pi && pi->align)
-	return wi::shwi (-(int) pi->align | pi->misalign, precision);
+	return wi::shwi (-(HOST_WIDE_INT) pi->align
+			 | (HOST_WIDE_INT) pi->misalign, precision);
 
       return wi::shwi (-1, precision);
     }
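The extension problem can be reproduced outside GCC with a minimal sketch. The function names are hypothetical, int64_t stands in for HOST_WIDE_INT, and a 32-bit int is assumed.

```c
#include <stdint.h>

/* Mirrors the old expression: -(int) align has type int, but ORing it
   with an unsigned int converts it to unsigned int, so the widening
   to 64 bits zero-extends.  */
int64_t
mask_narrow (unsigned int align, unsigned int misalign)
{
  return -(int) align | misalign;
}

/* Mirrors the fix: both operands are widened to a signed 64-bit type
   first, so the negated alignment keeps its sign.  */
int64_t
mask_wide (unsigned int align, unsigned int misalign)
{
  return -(int64_t) align | (int64_t) misalign;
}
```

With align 8 and misalign 0, the first form gives 0x00000000fffffff8 and the second gives -8, which is the mask the caller wanted.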
Re: [C PATCH] Don't reject valid code with _Alignas (PR c/61053)
On Wed, May 7, 2014 at 10:15 AM, Joseph S. Myers jos...@codesourcery.com wrote: On Mon, 5 May 2014, Marek Polacek wrote: In this PR the issue is that we reject (valid) code such as _Alignas (long long) long long foo; with -m32, because we trip this condition: alignas_align = 1U << declspecs->align_log; if (alignas_align < TYPE_ALIGN_UNIT (type)) { if (name) error_at (loc, "%<_Alignas%> specifiers cannot reduce alignment of %qE", name); and error later on, since alignas_align is 4 (correct, see PR52023 for why), but TYPE_ALIGN_UNIT of long long is 8. I think TYPE_ALIGN_UNIT is wrong here as that won't give us minimal alignment required. In c_sizeof_or_alignof_type we already have the code to compute such minimal alignment so I just moved the code to a separate function and used that instead of TYPE_ALIGN_UNIT. Note that the test is run only on i?86 and x86_64, because we can't (?) easily determine which target requires what alignment. Regtested/bootstrapped on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu, ok for trunk? OK, though I'm not sure if the lp64 conditions are right in the testcase It should be !ia32 instead of lp64. (i.e. if x32 has the same peculiarity as -m32 here, which is what's implied by the use of lp64). Alignments of long long and long double on x32 are the same as x86-64. -- H.J.
Re: [C++ Patch] PR 61083
On 05/07/2014 01:15 PM, Paolo Carlini wrote: curiously, convert_nontype_argument still has most of its error calls not protected by complain & tf_error. The obvious fix works for this SFINAE issue. Not a regression, but could be safe for the branch too? Sure, OK for trunk and 4.9. Jason
[patch libgcc]: Fix PR c++/57440
Hi,

this patch adds for Windows targets the define _GTHREAD_USE_MUTEX_INIT_FUNC, which is necessary because the pthread emulation for those targets only handles pthread_mutex_init and pthread_mutex_destroy properly.

ChangeLog libgcc

2014-05-07  Kai Tietz  kti...@redhat.com

	PR c++/57440
	* gthr-posix.h (_GTHREAD_USE_MUTEX_INIT_FUNC): Define for native
	windows targets.

The patch has already passed regression testing for x86_64-unknown-linux-gnu. The test for i686-w64-mingw32 is still running (with posix-threading model). Ok to apply this patch after the last test passes?

Regards,
Kai

Index: gthr-posix.h
===================================================================
--- gthr-posix.h	(revision 210070)
+++ gthr-posix.h	(working copy)
@@ -34,6 +34,10 @@ see the files COPYING3 and COPYING.RUNTIME respect
 
 #include <pthread.h>
 
+#if defined (_WIN32) && !defined (__CYGWIN__)
+#define _GTHREAD_USE_MUTEX_INIT_FUNC 1
+#endif
+
 #if ((defined(_LIBOBJC) || defined(_LIBOBJC_WEAK)) \
      || !defined(_GTHREAD_USE_MUTEX_TIMEDLOCK))
 # include <unistd.h>
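As a rough sketch of what the define is about (not the gthr-posix.h code itself): a mutex can be set up either with the static PTHREAD_MUTEX_INITIALIZER or by calling pthread_mutex_init at run time. The hypothetical function below uses only the run-time path, which is the one the message says the Windows emulation handles properly.

```c
#include <pthread.h>

/* Deliberately not initialised with PTHREAD_MUTEX_INITIALIZER.  */
static pthread_mutex_t lock;

/* Hypothetical helper: initialise, use and destroy the mutex through
   the function-call interface only.  Returns 0 on success, otherwise
   the first error code encountered.  */
int
with_runtime_init (void)
{
  int rc = pthread_mutex_init (&lock, NULL);
  if (rc != 0)
    return rc;

  rc = pthread_mutex_lock (&lock);
  if (rc == 0)
    rc = pthread_mutex_unlock (&lock);

  int rc2 = pthread_mutex_destroy (&lock);
  return rc != 0 ? rc : rc2;
}
```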
Re: [PATCH, MIPS] Alter default number of single-precision registers
Matthew Fortune matthew.fort...@imgtec.com writes: diff --git a/gcc/testsuite/gcc.target/mips/oddspreg-6.c b/gcc/testsuite/gcc.target/mips/oddspreg-6.c new file mode 100644 index 000..2d1b129 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/oddspreg-6.c @@ -0,0 +1,15 @@ +/* Check that we disable odd-numbered single precision registers and can + still generate code. */ +/* { dg-options -mabi=64 -mno-odd-spreg -mhard-float } */ Check that we enable odd-numbered single precision registers. for this one? OK otherwise once the copyright is sorted out, thanks. Richard
[jit] Add a soname
gcc/jit/ * Make-lang.in (LIBGCCJIT_LINKER_NAME): New. (LIBGCCJIT_VERSION_NUM): New. (LIBGCCJIT_MINOR_NUM): New. (LIBGCCJIT_RELEASE_NUM): New. (LIBGCCJIT_SONAME): New. (LIBGCCJIT_FILENAME): New. (LIBGCCJIT_LINKER_NAME_SYMLINK): New. (LIBGCCJIT_SONAME_SYMLINK): New. (jit): Add symlink targets. (libgccjit.so): Convert to... (LIBGCCJIT_FILENAME): ...and add a soname. (jit.install-common): Install the library with a soname, and symlinks. Install libgccjit++.h. --- gcc/jit/ChangeLog.jit | 16 gcc/jit/Make-lang.in | 38 +- 2 files changed, 49 insertions(+), 5 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index ccf8a10..f5c4742 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,19 @@ +2014-05-07 David Malcolm dmalc...@redhat.com + + * Make-lang.in (LIBGCCJIT_LINKER_NAME): New. + (LIBGCCJIT_VERSION_NUM): New. + (LIBGCCJIT_MINOR_NUM): New. + (LIBGCCJIT_RELEASE_NUM): New. + (LIBGCCJIT_SONAME): New. + (LIBGCCJIT_FILENAME): New. + (LIBGCCJIT_LINKER_NAME_SYMLINK): New. + (LIBGCCJIT_SONAME_SYMLINK): New. + (jit): Add symlink targets. + (libgccjit.so): Convert to... + (LIBGCCJIT_FILENAME): ...and add a soname. + (jit.install-common): Install the library with a soname, and + symlinks. Install libgccjit++.h. + 2014-04-25 David Malcolm dmalc...@redhat.com * internal-api.c (gcc::jit::playback::context::compile): Put diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in index 776ee81..ce0cdc5 100644 --- a/gcc/jit/Make-lang.in +++ b/gcc/jit/Make-lang.in @@ -40,7 +40,18 @@ # into the jit rule, but that needs a little bit of work # to do the right thing within all.cross. 
-jit: libgccjit.so +LIBGCCJIT_LINKER_NAME = libgccjit.so +LIBGCCJIT_VERSION_NUM = 0 +LIBGCCJIT_MINOR_NUM = 0 +LIBGCCJIT_RELEASE_NUM = 1 +LIBGCCJIT_SONAME = $(LIBGCCJIT_LINKER_NAME).$(LIBGCCJIT_VERSION_NUM) +LIBGCCJIT_FILENAME = \ + $(LIBGCCJIT_SONAME).$(LIBGCCJIT_MINOR_NUM).$(LIBGCCJIT_RELEASE_NUM) + +LIBGCCJIT_LINKER_NAME_SYMLINK = $(LIBGCCJIT_LINKER_NAME) +LIBGCCJIT_SONAME_SYMLINK = $(LIBGCCJIT_SONAME) + +jit: $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SYMLINK) $(LIBGCCJIT_LINKER_NAME_SYMLINK) # Tell GNU make to ignore these if they exist. .PHONY: jit @@ -53,14 +64,21 @@ jit-warn = $(STRICT_WARN) # We avoid using $(BACKEND) from Makefile.in in order to avoid pulling # in main.o -libgccjit.so: $(jit_OBJS) \ +$(LIBGCCJIT_FILENAME): $(jit_OBJS) \ libbackend.a libcommon-target.a libcommon.a \ $(CPPLIB) $(LIBDECNUMBER) \ $(LIBDEPS) $(srcdir)/jit/libgccjit.map +$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ -shared \ $(jit_OBJS) libbackend.a libcommon-target.a libcommon.a \ $(CPPLIB) $(LIBDECNUMBER) $(LIBS) $(BACKENDLIBS) \ --Wl,--version-script=$(srcdir)/jit/libgccjit.map +-Wl,--version-script=$(srcdir)/jit/libgccjit.map \ +-Wl,-soname,$(LIBGCCJIT_SONAME) + +$(LIBGCCJIT_SONAME_SYMLINK): $(LIBGCCJIT_FILENAME) + ln -sf $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SONAME_SYMLINK) + +$(LIBGCCJIT_LINKER_NAME_SYMLINK): $(LIBGCCJIT_SONAME_SYMLINK) + ln -sf $(LIBGCCJIT_SONAME_SYMLINK) $(LIBGCCJIT_LINKER_NAME_SYMLINK) # # Build hooks: @@ -87,8 +105,18 @@ jit.srcman: # # Install hooks: jit.install-common: installdirs - $(INSTALL_PROGRAM) libgccjit.so $(DESTDIR)/$(libdir)/libgccjit.so - $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit.h $(DESTDIR)/$(includedir)/libgccjit.h + $(INSTALL_PROGRAM) $(LIBGCCJIT_FILENAME) \ + $(DESTDIR)/$(libdir)/$(LIBGCCJIT_FILENAME) + ln -sf \ + $(LIBGCCJIT_FILENAME) \ + $(DESTDIR)/$(libdir)/$(LIBGCCJIT_SONAME_SYMLINK) + ln -sf \ + $(LIBGCCJIT_SONAME_SYMLINK)\ + $(DESTDIR)/$(libdir)/$(LIBGCCJIT_LINKER_NAME_SYMLINK) + $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit.h \ + 
$(DESTDIR)/$(includedir)/libgccjit.h + $(INSTALL_PROGRAM) $(srcdir)/jit/libgccjit++.h \ + $(DESTDIR)/$(includedir)/libgccjit++.h jit.install-man: -- 1.8.5.3
RE: [PATCH, MIPS] Alter default number of single-precision registers
Richard Sandiford rdsandif...@googlemail.com writes: Matthew Fortune matthew.fort...@imgtec.com writes: diff --git a/gcc/testsuite/gcc.target/mips/oddspreg-6.c b/gcc/testsuite/gcc.target/mips/oddspreg-6.c new file mode 100644 index 000..2d1b129 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/oddspreg-6.c @@ -0,0 +1,15 @@ +/* Check that we disable odd-numbered single precision registers and can + still generate code. */ +/* { dg-options -mabi=64 -mno-odd-spreg -mhard-float } */ Check that we enable odd-numbered single precision registers. for this one? Yes. OK otherwise once the copyright is sorted out, thanks. Richard
Committed: [PATCH 19/89] Const-correctness of gimple_call_builtin_p
On Mon, 2014-04-21 at 12:56 -0400, David Malcolm wrote: gcc/ * gimple.h (gimple_builtin_call_types_compatible_p): Accept a const_gimple, rather than a gimple. (gimple_call_builtin_p): Likewise, for the three variants. * gimple.c (gimple_builtin_call_types_compatible_p): Likewise. (gimple_call_builtin_p): Likewise, for the three variants. --- gcc/gimple.c | 8 gcc/gimple.h | 8 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/gimple.c b/gcc/gimple.c index 13c5a08..943fa7c 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -2383,7 +2383,7 @@ validate_type (tree type1, tree type2) a decl of a builtin function. */ bool -gimple_builtin_call_types_compatible_p (gimple stmt, tree fndecl) +gimple_builtin_call_types_compatible_p (const_gimple stmt, tree fndecl) { gcc_checking_assert (DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN); @@ -2412,7 +2412,7 @@ gimple_builtin_call_types_compatible_p (gimple stmt, tree fndecl) /* Return true when STMT is builtins call. */ bool -gimple_call_builtin_p (gimple stmt) +gimple_call_builtin_p (const_gimple stmt) { tree fndecl; if (is_gimple_call (stmt) @@ -2425,7 +2425,7 @@ gimple_call_builtin_p (gimple stmt) /* Return true when STMT is builtins call to CLASS. */ bool -gimple_call_builtin_p (gimple stmt, enum built_in_class klass) +gimple_call_builtin_p (const_gimple stmt, enum built_in_class klass) { tree fndecl; if (is_gimple_call (stmt) @@ -2438,7 +2438,7 @@ gimple_call_builtin_p (gimple stmt, enum built_in_class klass) /* Return true when STMT is builtins call to CODE of CLASS. 
*/ bool -gimple_call_builtin_p (gimple stmt, enum built_in_function code) +gimple_call_builtin_p (const_gimple stmt, enum built_in_function code) { tree fndecl; if (is_gimple_call (stmt) diff --git a/gcc/gimple.h b/gcc/gimple.h index a8a8d72..62f9756 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -1458,10 +1458,10 @@ extern tree gimple_unsigned_type (tree); extern tree gimple_signed_type (tree); extern alias_set_type gimple_get_alias_set (tree); extern bool gimple_ior_addresses_taken (bitmap, gimple); -extern bool gimple_builtin_call_types_compatible_p (gimple, tree); -extern bool gimple_call_builtin_p (gimple); -extern bool gimple_call_builtin_p (gimple, enum built_in_class); -extern bool gimple_call_builtin_p (gimple, enum built_in_function); +extern bool gimple_builtin_call_types_compatible_p (const_gimple, tree); +extern bool gimple_call_builtin_p (const_gimple); +extern bool gimple_call_builtin_p (const_gimple, enum built_in_class); +extern bool gimple_call_builtin_p (const_gimple, enum built_in_function); extern bool gimple_asm_clobbers_memory_p (const_gimple); extern void dump_decl_set (FILE *, bitmap); extern bool nonfreeing_call_p (gimple); Successfully bootstrapped & regtested on its own on x86_64-unknown-linux-gnu (Fedora 20). Committed to trunk as r210185 (this is just fixing const-correctness, and so it falls under Jeff's preapproval for such fixes here: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01240.html )
Re: [patch libgcc]: Fix PR c++/57440
On 7 May 2014 20:06, Kai Tietz wrote: PR c++/57440 N.B. that should be libstdc++/57440 in the ChangeLog
[SH, committed] PR 60884 - reduce code size of inlined strlen
Hi, The attached patch reduces the code size of inlined builtin strlen functions on SH a little bit. Tested on r210083 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures, except for gcc.target/sh/pr53976-1.c on SH2 and SH2A. Using builtin strlen for checking the sett/clrt optimization pass was a bit inappropriate in this case. Committed as r210187. Cheers, Oleg gcc/ChangeLog: PR target/60884 * config/sh/sh-mem.cc (sh_expand_strlen): Use loop when emitting unrolled byte insns. Emit address increments after move insns. gcc/testsuite/ChangeLog: PR target/60884 * gcc.target/sh/pr53976-1.c (test_02): Remove inappropriate test case. (test_03): Rename to test_02. Index: gcc/testsuite/gcc.target/sh/pr53976-1.c === --- gcc/testsuite/gcc.target/sh/pr53976-1.c (revision 210185) +++ gcc/testsuite/gcc.target/sh/pr53976-1.c (working copy) @@ -24,15 +24,8 @@ } int -test_02 (const char* a) +test_02 (int a, int b, int c, int d) { - /* Must not see a sett after the inlined strlen. */ - return __builtin_strlen (a); -} - -int -test_03 (int a, int b, int c, int d) -{ /* One of the blocks should have a sett and the other one should not. */ if (d 4) return a + b + 1; Index: gcc/config/sh/sh-mem.cc === --- gcc/config/sh/sh-mem.cc (revision 210185) +++ gcc/config/sh/sh-mem.cc (working copy) @@ -568,7 +568,7 @@ addr1 = adjust_automodify_address (addr1, SImode, current_addr, 0); - /*start long loop. */ + /* start long loop. */ emit_label (L_loop_long); /* tmp1 is aligned, OK to load. */ @@ -589,29 +589,15 @@ addr1 = adjust_address (addr1, QImode, 0); /* unroll remaining bytes. 
*/ - emit_insn (gen_extendqisi2 (tmp1, addr1)); - emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx)); - jump = emit_jump_insn (gen_branch_true (L_return)); - add_int_reg_note (jump, REG_BR_PROB, prob_likely); + for (int i = 0; i 4; ++i) +{ + emit_insn (gen_extendqisi2 (tmp1, addr1)); + emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1)); + emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx)); + jump = emit_jump_insn (gen_branch_true (L_return)); + add_int_reg_note (jump, REG_BR_PROB, prob_likely); +} - emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1)); - - emit_insn (gen_extendqisi2 (tmp1, addr1)); - emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx)); - jump = emit_jump_insn (gen_branch_true (L_return)); - add_int_reg_note (jump, REG_BR_PROB, prob_likely); - - emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1)); - - emit_insn (gen_extendqisi2 (tmp1, addr1)); - emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx)); - jump = emit_jump_insn (gen_branch_true (L_return)); - add_int_reg_note (jump, REG_BR_PROB, prob_likely); - - emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1)); - - emit_insn (gen_extendqisi2 (tmp1, addr1)); - jump = emit_jump_insn (gen_jump_compact (L_return)); emit_barrier_after (jump); /* start byte loop. */ @@ -626,10 +612,9 @@ /* end loop. */ - emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1))); - emit_label (L_return); + emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1))); emit_insn (gen_subsi3 (operands[0], current_addr, start_addr)); return true;
Re: [patch libgcc]: Fix PR c++/57440
2014-05-07 21:41 GMT+02:00 Jonathan Wakely jwakely@gmail.com: On 7 May 2014 20:06, Kai Tietz wrote: PR c++/57440 N.B. that should be libstdc++/57440 in the ChangeLog Oh, yes of course. Thanks. Kai
RFC: Faster for_each_rtx-like iterators
I noticed for_each_rtx showing up in profiles and thought I'd have a go at using worklist-based iterators instead. So far I have three:

  FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
  FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
  FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *

with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.

I made FOR_EACH_SUBRTX the default (unsuffixed) version because most walks really don't modify the structure. I think we should encourage const_rtxes to be used wherever possible. E.g. it might make it easier to have non-GC storage for temporary rtxes in future.

I've locally replaced all for_each_rtx calls in the generic code with these iterators and they make things reproducibly faster. The speed-up on full --enable-checking=release ./cc1 and ./cc1plus times is only about 1%, but maybe that's enough to justify the churn.

Implementation-wise, the main observation is that most subrtxes are part of a single contiguous sequence of "e" fields. E.g. when compiling an oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the subrtxes of 7,636,542 rtxes. Of those:

  (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
  (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and no "E" fields, and
  (C) 43,532 (0.6%) are more complicated.

(A) is really a special case of (B) in which the block has zero length. Those are the only two cases that really need to be handled inline. The implementation does this by having a mapping from an rtx code to the bounds of its "e" sequence, in the form of a start index and count.

Out of (C), the vast majority (43,509) are PARALLELs. However, as you'd probably expect, bloating the inline code with that case made things slower rather than faster.

The vast majority (in fact all in the combine.ii run above) of iterations can be done with a 16-element stack worklist. We obviously still need a heap fallback for the pathological cases though.
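The worklist idea above can be sketched in plain C. This is a simplified illustration, not the proposed implementation: `struct node` is a hypothetical stand-in for an rtx, `sum_tree` is a made-up walk, and only the 16-slot stack worklist with a heap fallback is modelled (no code-to-bounds table, no skipping of constants).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree node standing in for an rtx.  */
struct node
{
  int value;
  int nkids;
  struct node **kids;
};

#define LOCAL_SLOTS 16

/* Sum every value in the tree rooted at ROOT (assumed non-null)
   without recursion or callbacks.  Pending nodes live in a small
   stack array; the heap is touched only if that array overflows.  */
long
sum_tree (const struct node *root)
{
  const struct node *local[LOCAL_SLOTS];
  const struct node **work = local;  /* current worklist storage */
  const struct node **heap = NULL;   /* heap fallback, rarely needed */
  size_t cap = LOCAL_SLOTS, len = 0;
  long total = 0;

  work[len++] = root;
  while (len > 0)
    {
      const struct node *n = work[--len];
      total += n->value;
      for (int i = 0; i < n->nkids; i++)
        {
          if (len == cap)
            {
              /* Worklist full: move to a heap buffer twice the size.  */
              size_t new_cap = cap * 2;
              const struct node **bigger = malloc (new_cap * sizeof *bigger);
              if (!bigger)
                abort ();
              memcpy (bigger, work, len * sizeof *work);
              free (heap);           /* no-op while still on the stack array */
              heap = work = bigger;
              cap = new_cap;
            }
          work[len++] = n->kids[i];
        }
    }
  free (heap);
  return total;
}
```

Keeping the heap pointer separate from the stack array means the common case pays only for one pointer initialisation and one `free (NULL)`, which is the trade-off the message argues for.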
I spent a bit of time trying different iterator implementations and seeing which produced the best code. Specific results from that were: - The storage used for the worklist is separate from the iterator, in order to avoid capturing iterator fields. - Although the natural type of the storage would be auto_vec ..., 16, that produced some overhead compared with a separate stack array and heap vector pointer. With the heap vector pointer, the only overhead is an assignment in the constructor and an if (x) release (x)-style sequence in the destructor. I think the extra complication over auto_vec is worth it because in this case the heap version is so very rarely needed. - Several existing for_each_rtx callbacks have something like: if (GET_CODE (x) == CONST) return -1; or: if (CONSTANT_P (x)) return -1; to avoid walking subrtxes of constants. That can be done without extra code checks and branches by having a separate code-bound mapping in which all constants are treated as leaf rtxes. This usage should be common enough to outweigh the cache penalty of two arrays. The choice between iterating over constants or not is given in the final parameter of the FOR_EACH_* iterator. - The maximum number of fields in (B)-type rtxes is 3. We get better code by making that explicit rather than having a general loop. - (C) codes map to an e count of UCHAR_MAX, so we can use a single check to test for that and for cases where the stack worklist is too small. To give an example: /* Callback for for_each_rtx, that returns 1 upon encountering a VALUE whose UID is greater than the int uid that D points to. */ static int refs_newer_value_cb (rtx *x, void *d) { if (GET_CODE (*x) == VALUE CSELIB_VAL_PTR (*x)-uid *(int *)d) return 1; return 0; } /* Return TRUE if EXPR refers to a VALUE whose uid is greater than that of V. 
*/ static bool refs_newer_value_p (rtx expr, rtx v) { int minuid = CSELIB_VAL_PTR (v)-uid; return for_each_rtx (expr, refs_newer_value_cb, minuid); } becomes: /* Return TRUE if EXPR refers to a VALUE whose uid is greater than that of V. */ static bool refs_newer_value_p (const_rtx expr, rtx v) { int minuid = CSELIB_VAL_PTR (v)-uid; subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, expr, NONCONST) if (GET_CODE (*iter) == VALUE CSELIB_VAL_PTR (*iter)-uid minuid) return true; return false; } The iterator also allows subrtxes of a specific rtx to be skipped; this is the equivalent of returning -1 from a for_each_rtx callback. It also allows the current rtx to be replaced in the worklist by another. E.g.: static void mark_constants_in_pattern (rtx insn) { subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL) { const_rtx x = *iter; if (GET_CODE (x) == SYMBOL_REF) { if (CONSTANT_POOL_ADDRESS_P (x))
genattrtab error reporting
genattrtab loses track of which file the given rtl came from during error reporting. A port that uses multiple .md files will tend to list the last .md file processed instead of the correct .md file. We preserve the filename upon read, and during post processing, we reset the filename to the right context, as we process that context.

Ok?

2014-05-07  Mike Stump  mikest...@comcast.net

	* genattrtab.c (struct insn_def): Add filename.
	(convert_set_attr_alternative): Improve error message.
	(check_defs): Ensure read_md_filename is set appropriately.
	(gen_insn): Save read_md_filename.

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 99b1b83..0f14b4d 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -139,6 +139,7 @@ struct insn_def
   rtx def;			/* The DEFINE_... */
   int insn_code;		/* Instruction number. */
   int insn_index;		/* Expression number in file, for errors. */
+  const char *filename;	/* Filename. */
   int lineno;			/* Line number. */
   int num_alternatives;	/* Number of alternatives. */
   int vec_idx;			/* Index of attribute vector in `def'. */
@@ -1066,7 +1067,8 @@ convert_set_attr_alternative (rtx exp, struct insn_def *id)
   if (XVECLEN (exp, 1) != num_alt)
     {
       error_with_line (id->lineno,
-		       "bad number of entries in SET_ATTR_ALTERNATIVE");
+		       "bad number of entries in SET_ATTR_ALTERNATIVE, was %d expected %d",
+		       XVECLEN (exp, 1), num_alt);
       return NULL_RTX;
     }
 
@@ -1137,6 +1139,7 @@ check_defs (void)
       if (XVEC (id->def, id->vec_idx) == NULL)
 	continue;
 
+      read_md_filename = id->filename;
       for (i = 0; i < XVECLEN (id->def, id->vec_idx); i++)
 	{
 	  value = XVECEXP (id->def, id->vec_idx, i);
@@ -3280,6 +3283,7 @@ gen_insn (rtx exp, int lineno)
   id->next = defs;
   defs = id;
   id->def = exp;
+  id->filename = read_md_filename;
   id->lineno = lineno;
 
   switch (GET_CODE (exp))
Re: RFC: Faster for_each_rtx-like iterators
On May 7, 2014, at 1:52 PM, Richard Sandiford rdsandif...@googlemail.com wrote:
> I've locally replaced all for_each_rtx calls in the generic code with
> these iterators and they make things reproducibly faster.  The speed-up
> on full --enable-checking=release ./cc1 and ./cc1plus times is only about
> 1%, but maybe that's enough to justify the churn.

100 1% fixes would make the compiler 100% faster.  :-)  I think 1% is actually a really good improvement.

If you have times for -O0, it would be interesting to see what they are.
Re: [PATCH] AutoFDO patch for trunk
Have you announced the autofdo profile tool to the gcc list?

David

On Wed, May 7, 2014 at 2:24 PM, Dehao Chen de...@google.com wrote:
> Hi,
>
> I'm planning to port the AutoFDO patch upstream. Attached is the
> prepared patch. You can also find the patch in
> http://codereview.appspot.com/99010043
>
> I've tested the patch with SPECCPU2006. For the CINT2006 benchmarks,
> the speedup comparison between O2, FDO and AutoFDO is as follows:
>
> Reference: o2
> (1): auto_fdo
> (2): fdo
>
> Benchmark                          Base:Reference    (1)        (2)
> -------------------------------------------------------------------
> spec/2006/int/C++/471.omnetpp          23.18        +3.11%     +5.09%
> spec/2006/int/C++/473.astar            21.15        +6.79%     +9.80%
> spec/2006/int/C++/483.xalancbmk        36.68       +11.56%    +14.47%
> spec/2006/int/C/400.perlbench          34.57        +6.59%    +18.56%
> spec/2006/int/C/401.bzip2              23.17        +0.95%     +2.49%
> spec/2006/int/C/403.gcc                32.33        +8.27%     +9.76%
> spec/2006/int/C/429.mcf                42.13        +4.72%     +5.23%
> spec/2006/int/C/445.gobmk              26.53        -1.39%     +0.05%
> spec/2006/int/C/456.hmmer              23.72        +7.12%     +7.87%
> spec/2006/int/C/458.sjeng              26.17        +4.65%     +6.04%
> spec/2006/int/C/462.libquantum         57.23        +4.04%     +1.42%
> spec/2006/int/C/464.h264ref            46.3         +1.07%     +8.97%
> geometric mean                                      +4.73%     +7.36%
>
> The majority of the performance difference between AutoFDO and FDO
> comes from the lack of instruction level discriminator support. Cary
> Coutant is planning to port that patch upstream too.
>
> Please let me know if you have any question about this patch, and
> thanks in advance for reviewing such a huge patch.
>
> Dehao
libgo patch committed: Define CLONE flags in syscall package
Dominik Vogt pointed out that the gccgo syscall package does not define the CLONE flags.  This patch defines them.  Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline and 4.9 branch.

Ian

diff -r c8ae29f0c4c6 libgo/configure.ac
--- a/libgo/configure.ac Tue May 06 12:23:00 2014 -0700
+++ b/libgo/configure.ac Wed May 07 14:40:49 2014 -0700
@@ -475,7 +475,7 @@
     ;;
 esac
 
-AC_CHECK_HEADERS(sys/file.h sys/mman.h syscall.h sys/epoll.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/reboot.h netinet/icmp6.h netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h)
+AC_CHECK_HEADERS(sched.h sys/file.h sys/mman.h syscall.h sys/epoll.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/reboot.h netinet/icmp6.h netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h)
 
 AC_CHECK_HEADERS([linux/filter.h linux/if_addr.h linux/if_ether.h linux/if_tun.h linux/netlink.h linux/rtnetlink.h], [], [],
 [#ifdef HAVE_SYS_SOCKET_H
diff -r c8ae29f0c4c6 libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh Tue May 06 12:23:00 2014 -0700
+++ b/libgo/mksysinfo.sh Wed May 07 14:40:49 2014 -0700
@@ -163,6 +163,9 @@
 #if defined(HAVE_NETINET_ICMP6_H)
 #include <netinet/icmp6.h>
 #endif
+#if defined(HAVE_SCHED_H)
+#include <sched.h>
+#endif
 
 /* Constants that may only be defined as expressions on some systems,
    expressions too complex for -fdump-go-spec to handle.  These are
@@ -1130,6 +1133,10 @@
       -e 's/\[0\]byte/[0]int8/' \
     >> ${OUT}
 
+# The GNU/Linux CLONE flags.
+grep '^const _CLONE_' gen-sysinfo.go | \
+  sed -e 's/^\(const \)_\(CLONE_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
+
 # The Solaris 11 Update 1 _zone_net_addr_t struct.
 grep '^type _zone_net_addr_t ' gen-sysinfo.go | \
     sed -e 's/_in6_addr/[16]byte/' \
libgo patch committed: Define more TIOC constants
This patch to libgo defines more TIOC constants: ones whose definitions are non-trivial expressions on GNU/Linux systems.  Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline and 4.9 branch.

Ian

diff -r bbf6c7c22954 libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh Wed May 07 14:42:39 2014 -0700
+++ b/libgo/mksysinfo.sh Wed May 07 14:58:48 2014 -0700
@@ -180,6 +180,18 @@
 #ifdef TIOCSCTTY
   TIOCSCTTY_val = TIOCSCTTY,
 #endif
+#ifdef TIOCGPTN
+  TIOCGPTN_val = TIOCGPTN,
+#endif
+#ifdef TIOCSPTLCK
+  TIOCSPTLCK_val = TIOCSPTLCK,
+#endif
+#ifdef TIOCGDEV
+  TIOCGDEV_val = TIOCGDEV,
+#endif
+#ifdef TIOCSIG
+  TIOCSIG_val = TIOCSIG,
+#endif
 };
 EOF
 
@@ -778,6 +790,26 @@
     echo 'const TIOCSCTTY = _TIOCSCTTY_val' >> ${OUT}
   fi
 fi
+if ! grep '^const TIOCGPTN' ${OUT} > /dev/null 2>&1; then
+  if grep '^const _TIOCGPTN_val' ${OUT} > /dev/null 2>&1; then
+    echo 'const TIOCGPTN = _TIOCGPTN_val' >> ${OUT}
+  fi
+fi
+if ! grep '^const TIOCSPTLCK' ${OUT} > /dev/null 2>&1; then
+  if grep '^const _TIOCSPTLCK_val' ${OUT} > /dev/null 2>&1; then
+    echo 'const TIOCSPTLCK = _TIOCSPTLCK_val' >> ${OUT}
+  fi
+fi
+if ! grep '^const TIOCGDEV' ${OUT} > /dev/null 2>&1; then
+  if grep '^const _TIOCGDEV_val' ${OUT} > /dev/null 2>&1; then
+    echo 'const TIOCGDEV = _TIOCGDEV_val' >> ${OUT}
+  fi
+fi
+if ! grep '^const TIOCSIG' ${OUT} > /dev/null 2>&1; then
+  if grep '^const _TIOCSIG_val' ${OUT} > /dev/null 2>&1; then
+    echo 'const TIOCSIG = _TIOCSIG_val' >> ${OUT}
+  fi
+fi
 
 # The ioctl flags for terminal control
 grep '^const _TC[GS]ET' gen-sysinfo.go | \
AutoFDO profile toolchain is open-sourced
We have open-sourced the AutoFDO profile toolchain at:

https://github.com/google/autofdo

For GCC developers, the most important tool is create_gcov, which converts a sampling-based profile into a GCC-readable profile. Please refer to the readme file (https://raw.githubusercontent.com/google/autofdo/master/README) for more details.

To use the profile, one needs to check out https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on porting AutoFDO to trunk (http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).

We have limited documentation inside the open-sourced package, and we are planning to add more content to the wiki page (https://github.com/google/autofdo/wiki). Feel free to email me or discuss on GitHub if you have any questions.

Cheers,
Dehao
Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call
This is the updated patch of pr58066-3.patch.

The calls added in the templates of tls_local_dynamic_base_32 and tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2 from moving the sp setting across implicit tls calls, but those calls make the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so the optimization in tls_local_dynamic_32_once that converts local_dynamic to global_dynamic mode for a single tls reference cannot take effect. In the updated patch, I remove those calls from the insn templates and add reg:SI SP_REG explicitly in the templates of UNSPEC_TLS_GD and UNSPEC_TLS_LD_BASE. This solves the sched2 and combine problems above, and now the optimization in tls_local_dynamic_32_once works.

Bootstrapped OK on x86_64-linux-gnu; regression testing is in progress. Is it OK if regression testing passes?

Thanks.
Wei.

ChangeLog:

gcc/
2014-05-07  Wei Mi  w...@google.com

        * config/i386/i386.c (ix86_compute_frame_layout):
        preferred_stack_boundary updated for tls expanded call.
        * config/i386/i386.md: Set
        ix86_tls_descriptor_calls_expanded_in_cfun.

gcc/testsuite/
2014-05-07  Wei Mi  w...@google.com

        * gcc.target/i386/pr58066.c: New test.

Index: testsuite/gcc.target/i386/pr58066.c
===================================================================
--- testsuite/gcc.target/i386/pr58066.c (revision 0)
+++ testsuite/gcc.target/i386/pr58066.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-fPIC -O2" } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return &ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return &ccc2;
+}
+
+/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 209979)
+++ config/i386/i386.c (working copy)
@@ -9485,20 +9485,30 @@ ix86_compute_frame_layout (struct ix86_f
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
 
-  stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
-  preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-
   /* 64-bit MS ABI seem to require stack alignment to be always 16 except for
      function prologues and leaf.  */
-  if ((TARGET_64BIT_MS_ABI && preferred_alignment < 16)
+  if ((TARGET_64BIT_MS_ABI && crtl->preferred_stack_boundary < 128)
       && (!crtl->is_leaf || cfun->calls_alloca != 0
           || ix86_current_function_calls_tls_descriptor))
     {
-      preferred_alignment = 16;
-      stack_alignment_needed = 16;
       crtl->preferred_stack_boundary = 128;
       crtl->stack_alignment_needed = 128;
     }
+  /* preferred_stack_boundary is never updated for call
+     expanded from tls descriptor. Update it here. We don't update it in
+     expand stage because according to the comments before
+     ix86_current_function_calls_tls_descriptor, tls calls may be optimized
+     away.  */
+  else if (ix86_current_function_calls_tls_descriptor
+           && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+    {
+      crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+      if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
+        crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+    }
+
+  stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
+  preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
 
   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 209979)
+++ config/i386/i386.md (working copy)
@@ -12530,7 +12530,8 @@
         (unspec:SI
          [(match_operand:SI 1 "register_operand" "b")
           (match_operand 2 "tls_symbolic_operand")
-          (match_operand 3 "constant_call_address_operand" "z")]
+          (match_operand 3 "constant_call_address_operand" "z")
+          (reg:SI SP_REG)]
          UNSPEC_TLS_GD))
    (clobber (match_scratch:SI 4 "=d"))
    (clobber (match_scratch:SI 5 "=c"))
@@ -12555,11 +12556,14 @@
     [(set (match_operand:SI 0 "register_operand")
           (unspec:SI [(match_operand:SI 2 "register_operand")
                       (match_operand 1 "tls_symbolic_operand")
-                      (match_operand 3 "constant_call_address_operand")]
+                      (match_operand 3 "constant_call_address_operand")
+                      (reg:SI SP_REG)]
                      UNSPEC_TLS_GD))
      (clobber (match_scratch:SI 4))
      (clobber (match_scratch:SI 5))
-     (clobber (reg:CC FLAGS_REG))])])
+     (clobber (reg:CC FLAGS_REG))])]
+  ""
+  "ix86_tls_descriptor_calls_expanded_in_cfun = true;")
 
 (define_insn "*tls_global_dynamic_64_<mode>"
   [(set (match_operand:P 0 "register_operand" "=a")
@@ -12614,13 +12618,15 @@
                    (const_int 0)))
Re: genattrtab error reporting
On Wed, May 7, 2014 at 2:21 PM, Mike Stump mikest...@comcast.net wrote:
> genattrtab loses track of which file the given rtl came from during
> error reporting.  A port that uses multiple .md files will tend to list
> the last .md file processed instead of the correct .md file.
>
> We preserve the filename upon read, and during post processing we reset
> the filename to the right context as we process that context.

Does this fix

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778

-- 
H.J.
Re: genattrtab error reporting
On May 7, 2014, at 5:22 PM, H.J. Lu hjl.to...@gmail.com wrote:
> On Wed, May 7, 2014 at 2:21 PM, Mike Stump mikest...@comcast.net wrote:
>> genattrtab loses track of which file the given rtl came from during
>> error reporting.  A port that uses multiple .md files will tend to list
>> the last .md file processed instead of the correct .md file.
>>
>> We preserve the filename upon read, and during post processing we reset
>> the filename to the right context as we process that context.
>
> Does this fix
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778

Only if it is applied to the tree!  :-)  Yes.
[v3] Mini-tweak to acinclude.m4
Hi,

I don't think we have any reason to trigger a -Wwrite-strings warning; thus, barring objections, I'm going to commit the below.

Thanks,
Paolo.

///

2014-05-08  Paolo Carlini  paolo.carl...@oracle.com

        * acinclude.m4 ([GLIBCXX_ENABLE_C99]): Avoid -Wwrite-strings warning.
        * configure: Regenerate.

Index: acinclude.m4
===================================================================
--- acinclude.m4 (revision 210183)
+++ acinclude.m4 (working copy)
@@ -1052,8 +1052,8 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
               vscanf("%i", args);
               vsnprintf(fmt, 0, "%i", args);
               vsscanf(fmt, "%i", args);
-            }],
-          [snprintf("12", 0, "%i");],
+              snprintf(fmt, 0, "%i");
+            }], [],
           [glibcxx_cv_c99_stdio=yes], [glibcxx_cv_c99_stdio=no])
   ])
   AC_MSG_RESULT($glibcxx_cv_c99_stdio)
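For readers unfamiliar with the idiom being probed, here is a small sketch (not the configure test itself) of calling snprintf with a size of 0 and a writable buffer instead of a string literal, which is the change that avoids the -Wwrite-strings diagnostic. The helper name formatted_length is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return how many characters "%i" would need for
   VALUE.  With a size of 0, snprintf stores nothing and only reports
   the length that would have been written.  */
static int
formatted_length (int value)
{
  char buf[1];                 /* Writable storage, unlike a literal.  */
  int n = snprintf (buf, 0, "%i", value);

  assert (n >= 0);
  return n;
}
```

Passing a literal such as "12" as the destination compiles in C, but the destination parameter is a plain char *, so -Wwrite-strings flags the conversion; a real buffer sidesteps that without changing what the probe checks.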
Fix some tests for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
Having fixed TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL to apply only to 128-bit vectors, some --with-arch=bdver3 --with-cpu=bdver3 scan-assembler failures relating to that tuning remain, because of different choices of instructions for 128-bit vectors from the choices expected by the tests.  This patch fixes affected tests to allow the different instruction choices seen in this case.

Tested for x86_64-linux-gnu (--with-arch=bdver3 --with-cpu=bdver3).  OK to commit?

2014-05-07  Joseph Myers  jos...@codesourcery.com

        * gcc.target/i386/avx256-unaligned-load-2.c,
        gcc.target/i386/pr49002-1.c, gcc.target/i386/pr53712.c,
        gcc.target/i386/pr53907.c, gcc.target/i386/pr59539-1.c: Allow
        packed-single instructions.

Index: gcc/testsuite/gcc.target/i386/pr59539-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr59539-1.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr59539-1.c (working copy)
@@ -13,4 +13,4 @@
   return _mm_movemask_epi8 (result);
 }
 
-/* { dg-final { scan-assembler-times "vmovdqu" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqu|vmovups" 1 } } */
Index: gcc/testsuite/gcc.target/i386/pr53712.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr53712.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr53712.c (working copy)
@@ -10,4 +10,4 @@
   return __builtin_ia32_pcmpistri128 (s1chars, s2chars, 0);
 }
 
-/* { dg-final { scan-assembler-times "movdqu" 1 } } */
+/* { dg-final { scan-assembler-times "movdqu|movups" 1 } } */
Index: gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c (working copy)
@@ -11,5 +11,5 @@
 }
 
 /* { dg-final { scan-assembler-not "(avx_loaddqu256|vmovdqu\[^\n\r]*movv32qi_internal)" } } */
-/* { dg-final { scan-assembler "(sse2_loaddqu|vmovdqu\[^\n\r]*movv16qi_internal)" } } */
+/* { dg-final { scan-assembler "(sse2_loaddqu|(vmovdqu|vmovups)\[^\n\r]*movv16qi_internal)" } } */
 /* { dg-final { scan-assembler "vinsert.128" } } */
Index: gcc/testsuite/gcc.target/i386/pr49002-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr49002-1.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr49002-1.c (working copy)
@@ -13,4 +13,4 @@
 
 /* Ensure we load into xmm, not ymm.  */
 /* { dg-final { scan-assembler-not "vmovapd\[\t \]*\[^,\]*,\[\t \]*%ymm" } } */
-/* { dg-final { scan-assembler "vmovapd\[\t \]*\[^,\]*,\[\t \]*%xmm" } } */
+/* { dg-final { scan-assembler "vmovap\[ds\]\[\t \]*\[^,\]*,\[\t \]*%xmm" } } */
Index: gcc/testsuite/gcc.target/i386/pr53907.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr53907.c (revision 210124)
+++ gcc/testsuite/gcc.target/i386/pr53907.c (working copy)
@@ -13,4 +13,4 @@
   return sz;
 }
 
-/* { dg-final { scan-assembler "movdqa" } } */
+/* { dg-final { scan-assembler "movdqa|movaps" } } */

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: genattrtab error reporting
>> Does this fix
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778
>
> Only if it is applied to the tree!  :-)  Yes.

It also is PR57062.  Thanks for fixing it!

Segher
Re: RFC: Faster for_each_rtx-like iterators
On Wed, May 07, 2014 at 09:52:49PM +0100, Richard Sandiford wrote:
> I noticed for_each_rtx showing up in profiles and thought I'd have a go
> at using worklist-based iterators instead.  So far I have three:
>
>   FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
>   FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
>   FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *
>
> with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.
>
> I made FOR_EACH_SUBRTX the default (unsuffixed) version because most
> walks really don't modify the structure.  I think we should encourage
> const_rtxes to be used wherever possible.  E.g. it might make it
> easier to have non-GC storage for temporary rtxes in future.
>
> I've locally replaced all for_each_rtx calls in the generic code with
> these iterators and they make things reproducibly faster.  The speed-up
> on full --enable-checking=release ./cc1 and ./cc1plus times is only about
> 1%, but maybe that's enough to justify the churn.

seems pretty nice, and it seems like it'll make code a little more readable too :)

> Implementation-wise, the main observation is that most subrtxes are part
> of a single contiguous sequence of "e" fields.  E.g. when compiling an
> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
> subrtxes of 7,636,542 rtxes.  Of those:
>
> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
>                       no "E" fields, and
> (C)    43,532 (00.6%) are more complicated.
>
> (A) is really a special case of (B) in which the block has zero length.
> Those are the only two cases that really need to be handled inline.
> The implementation does this by having a mapping from an rtx code to the
> bounds of its "e" sequence, in the form of a start index and count.
>
> Out of (C), the vast majority (43,509) are PARALLELs.  However, as you'd
> probably expect, bloating the inline code with that case made things
> slower rather than faster.
>
> The vast majority (in fact all in the combine.ii run above) of iterations
> can be done with a 16-element stack worklist.  We obviously still need a
> heap fallback for the pathological cases though.
>
> I spent a bit of time trying different iterator implementations and
> seeing which produced the best code.  Specific results from that were:
>
> - The storage used for the worklist is separate from the iterator,
>   in order to avoid capturing iterator fields.
>
> - Although the natural type of the storage would be auto_vec <..., 16>,
>   that produced some overhead compared with a separate stack array and heap
>   vector pointer.  With the heap vector pointer, the only overhead is an
>   assignment in the constructor and an "if (x) release (x)"-style sequence
>   in the destructor.  I think the extra complication over auto_vec is worth
>   it because in this case the heap version is so very rarely needed.

hm, where does the overhead come from exactly? it seems like if it's faster to use vec<T, va_heap, vl_embed> *foo; we should fix something about vectors since this isn't the only place it could matter.  does it matter if you use vec<T, va_heap, vl_embed> * or vec<T> ?  the second is basically just a wrapper around the former, which I'd expect has no effect.  I'm not saying you're doing the wrong thing here, but if we can make generic vectors faster we probably should ;) or is the issue the __builtin_expect()s you can add?

> - Several existing for_each_rtx callbacks have something like:
>
>     if (GET_CODE (x) == CONST)
>       return -1;
>
>   or:
>
>     if (CONSTANT_P (x))
>       return -1;
>
>   to avoid walking subrtxes of constants.  That can be done without
>   extra code checks and branches by having a separate code->bound
>   mapping in which all constants are treated as leaf rtxes.  This usage
>   should be common enough to outweigh the cache penalty of two arrays.
>
>   The choice between iterating over constants or not is given in the
>   final parameter of the FOR_EACH_* iterator.

less repetition \O/

> - The maximum number of fields in (B)-type rtxes is 3.  We get better
>   code by making that explicit rather than having a general loop.
>
> - (C) codes map to an "e" count of UCHAR_MAX, so we can use a single
>   check to test for that and for cases where the stack worklist is
>   too small.

can we use uint8_t?

> To give an example:
>
> /* Callback for for_each_rtx, that returns 1 upon encountering a VALUE
>    whose UID is greater than the int uid that D points to.  */
>
> static int
> refs_newer_value_cb (rtx *x, void *d)
> {
>   if (GET_CODE (*x) == VALUE && CSELIB_VAL_PTR (*x)->uid > *(int *)d)
>     return 1;
>
>   return 0;
> }
>
> /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
>    that of V.  */
>
> static bool
> refs_newer_value_p (rtx expr, rtx v)
> {
>   int minuid = CSELIB_VAL_PTR (v)->uid;
>
>   return for_each_rtx (&expr, refs_newer_value_cb, &minuid);
> }
>
> becomes:
>
> /* Return TRUE if EXPR refers to a VALUE whose
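To make the worklist idea discussed in this thread concrete, here is a minimal sketch in plain C rather than GCC's rtx types. It assumes a hypothetical struct node with at most three children (echoing the observation that (B)-type rtxes have at most 3 fields) and a 16-slot on-stack worklist with a heap fallback for the rare overflow. The names node, any_value_above, LOCAL_SLOTS and MAX_KIDS are illustrative and are not taken from the patch, which uses C++ iterators and per-code bound tables instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_SLOTS 16         /* On-stack worklist size, as in the mail.  */
#define MAX_KIDS 3

struct node
{
  int value;
  int nkids;                   /* Number of valid entries in KID.  */
  const struct node *kid[MAX_KIDS];
};

/* Return 1 if any node reachable from ROOT has a value greater than
   LIMIT, walking with an explicit worklist instead of recursion or a
   callback.  */
static int
any_value_above (const struct node *root, int limit)
{
  const struct node *local[LOCAL_SLOTS];
  const struct node **work = local;   /* Storage currently in use.  */
  const struct node **heap = NULL;    /* Set only after an overflow.  */
  size_t cap = LOCAL_SLOTS, len = 0;
  int found = 0;

  assert (root != NULL);
  work[len++] = root;
  while (len > 0)
    {
      const struct node *x = work[--len];
      if (x->value > limit)
        {
          found = 1;
          break;
        }
      if (len + (size_t) x->nkids > cap)
        {
          /* Rare path: move the worklist to the heap and double it.  */
          size_t newcap = cap * 2;
          const struct node **bigger = malloc (newcap * sizeof *bigger);
          if (bigger == NULL)
            abort ();
          memcpy (bigger, work, len * sizeof *bigger);
          free (heap);
          heap = work = bigger;
          cap = newcap;
        }
      /* Push children in reverse so they are visited left to right.  */
      for (int i = x->nkids - 1; i >= 0; i--)
        work[len++] = x->kid[i];
    }
  free (heap);
  return found;
}
```

The common case never touches malloc; the only fixed cost of supporting the fallback is the NULL initialisation and the final free (heap), which is the same trade-off the mail describes for the separate stack array and heap vector pointer.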
[RS6000] Fix PR61098, Poor code setting count register
On powerpc64, to set a large loop count we have code like the following after split1:

(insn 67 14 68 4 (set (reg:DI 160)
        (const_int 99942400 [0x5f50000])) /home/amodra/unaligned_load.c:14 -1
     (nil))
(insn 68 67 42 4 (set (reg:DI 160)
        (ior:DI (reg:DI 160)
            (const_int 57600 [0xe100]))) /home/amodra/unaligned_load.c:14 -1
     (expr_list:REG_EQUAL (const_int 100000000 [0x5f5e100])
        (nil)))

and then test for loop exit with:

(jump_insn 65 31 45 5 (parallel [
            (set (pc)
                (if_then_else (ne (reg:DI 160)
                        (const_int 1 [0x1]))
                    (label_ref:DI 42)
                    (pc)))
            (set (reg:DI 160)
                (plus:DI (reg:DI 160)
                    (const_int -1 [0xffffffffffffffff])))
            (clobber (scratch:CC))
            (clobber (scratch:DI))
        ]) /home/amodra/unaligned_load.c:15 800 {*ctrdi_internal1}
     (int_list:REG_BR_PROB 9899 (nil))
 -> 42)

The jump_insn of course is meant for use with bdnz, which implies a strong preference for reg 160 to live in the count register.  Trouble is, the count register doesn't do arithmetic.  So, use a new pseudo for intermediate results.

On looking at this, I noticed the !TARGET_POWERPC64 code in rs6000_emit_set_long_const was broken, apparently expecting c1 and c2 to be the high and low 32 bits of the constant.  That's no longer true, so I've fixed that as well.  Bootstrapped and regression tested powerpc64-linux.  OK for mainline and branches?

        PR target/61098
        * config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded
        params and return value.  Simplify.  Update comment.
        (rs6000_emit_set_long_const): Remove unneeded param and return
        value.  Correct !TARGET_POWERPC64 handling of constants > 2G.
        If we can, use a new pseudo for intermediate calculations.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 209926)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -1068,7 +1069,7 @@ static tree rs6000_handle_longcall_attribute (tree
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
-static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
 static int rs6000_memory_move_cost (enum machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, int, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, enum machine_mode, addr_space_t,
@@ -7826,53 +7811,36 @@ rs6000_conditional_register_usage (void)
 }
 
-/* Try to output insns to set TARGET equal to the constant C if it can
-   be done in less than N insns.  Do all computations in MODE.
-   Returns the place where the output has been placed if it can be
-   done and the insns have been emitted.  If it would take more than N
-   insns, zero is returned and no insns and emitted.  */
+/* Output insns to set DEST equal to the constant SOURCE.  */
 
-rtx
-rs6000_emit_set_const (rtx dest, enum machine_mode mode,
-                       rtx source, int n ATTRIBUTE_UNUSED)
+void
+rs6000_emit_set_const (rtx dest, rtx source)
 {
-  rtx result, insn, set;
-  HOST_WIDE_INT c0, c1;
+  enum machine_mode mode = GET_MODE (dest);
+  rtx temp, insn, set;
+  HOST_WIDE_INT c;
 
+  gcc_checking_assert (CONST_INT_P (source));
+  c = INTVAL (source);
   switch (mode)
     {
-      case QImode:
+    case QImode:
     case HImode:
-      if (dest == NULL)
-        dest = gen_reg_rtx (mode);
       emit_insn (gen_rtx_SET (VOIDmode, dest, source));
-      return dest;
+      return;
 
     case SImode:
-      result = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode);
+      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode);
 
-      emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (result),
-                              GEN_INT (INTVAL (source)
-                                       & (~ (HOST_WIDE_INT) 0xffff))));
+      emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (temp),
+                              GEN_INT (c & (~ (HOST_WIDE_INT) 0xffff))));
       emit_insn (gen_rtx_SET (VOIDmode, dest,
-                              gen_rtx_IOR (SImode, copy_rtx (result),
-                                           GEN_INT (INTVAL (source) & 0xffff))));
-      result = dest;
+                              gen_rtx_IOR (SImode, copy_rtx (temp),
+                                           GEN_INT (c & 0xffff))));
       break;
 
     case DImode:
-      switch (GET_CODE (source))
-        {
-        case CONST_INT:
-          c0 = INTVAL (source);
-          c1 = -(c0 < 0);
-          break;
-
-        default:
-          gcc_unreachable ();
-        }
-
-      result = rs6000_emit_set_long_const (dest, c0, c1);
+
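The two-step constant build shown in the RTL dump above (99942400, then IOR with 57600, giving 100000000) can be checked with a few lines of plain C. This is only an arithmetic sketch: split_const and build_in_temp are hypothetical names, int64_t stands in for HOST_WIDE_INT, and the local variable temp merely models the "new pseudo" the patch introduces so that the destination is written exactly once.

```c
#include <assert.h>
#include <stdint.h>

struct const_parts
{
  int64_t high;                /* C with its low 16 bits cleared.  */
  int64_t low;                 /* The low 16 bits of C.  */
};

/* Split C the way the SImode case in the patch does: mask off the low
   16 bits for the first insn, keep them for the IOR.  */
static struct const_parts
split_const (int64_t c)
{
  struct const_parts p;

  p.high = c & ~(int64_t) 0xffff;
  p.low = c & 0xffff;
  assert ((p.high | p.low) == c);
  return p;
}

/* Build C with the intermediate value held in a temporary, so that the
   destination is only ever assigned the finished constant.  */
static int64_t
build_in_temp (int64_t c)
{
  struct const_parts p = split_const (c);
  int64_t temp = p.high;       /* Models a fresh pseudo.  */
  int64_t dest = temp | p.low; /* The single write to the destination.  */

  return dest;
}
```

For 100000000 (0x5f5e100) the split yields 0x5f50000 and 0xe100, matching the two const_int operands in insns 67 and 68.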
Re: [RS6000] PR60737, expand_block_clear uses word stores
On Wed, May 07, 2014 at 01:39:50PM -0400, David Edelsohn wrote:
> On Tue, May 6, 2014 at 4:32 AM, Alan Modra amo...@gmail.com wrote:
>
> > BTW, the latest patch in my tree has a slight refinement, the
> > reload-by-hand addition.
> >
> >         PR target/60737
> >         * config/rs6000/rs6000.c (expand_block_move): Allow 64-bit
> >         loads and stores when -mno-strict-align at any alignment.
> >         (expand_block_clear): Similarly.  Also correct calculation of
> >         instruction count.
>
> Based on results of your experiment, the revised patch is okay.
>
> You did not include gcc-patches in the distribution list for the revised patch.

Thanks, David.  Patch copied here for gcc-patches and committed revision 210201.

        PR target/60737
        * config/rs6000/rs6000.c (expand_block_move): Allow 64-bit
        loads and stores when -mno-strict-align at any alignment.
        (expand_block_clear): Similarly.  Also correct calculation of
        instruction count.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 210200)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -15443,7 +15443,7 @@ expand_block_clear (rtx operands[])
      load zero and three to do clearing.  */
   if (TARGET_ALTIVEC && align >= 128)
     clear_step = 16;
-  else if (TARGET_POWERPC64 && align >= 32)
+  else if (TARGET_POWERPC64 && (align >= 64 || !STRICT_ALIGNMENT))
     clear_step = 8;
   else if (TARGET_SPE && align >= 64)
     clear_step = 8;
@@ -15471,12 +15471,27 @@ expand_block_clear (rtx operands[])
           mode = V2SImode;
         }
       else if (bytes >= 8 && TARGET_POWERPC64
-               /* 64-bit loads and stores require word-aligned
-                  displacements.  */
-               && (align >= 64 || (!STRICT_ALIGNMENT && align >= 32)))
+               && (align >= 64 || !STRICT_ALIGNMENT))
         {
           clear_bytes = 8;
           mode = DImode;
+          if (offset == 0 && align < 64)
+            {
+              rtx addr;
+
+              /* If the address form is reg+offset with offset not a
+                 multiple of four, reload into reg indirect form here
+                 rather than waiting for reload.  This way we get one
+                 reload, not one per store.  */
+              addr = XEXP (orig_dest, 0);
+              if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+                  && GET_CODE (XEXP (addr, 1)) == CONST_INT
+                  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+                {
+                  addr = copy_addr_to_reg (addr);
+                  orig_dest = replace_equiv_address (orig_dest, addr);
+                }
+            }
         }
       else if (bytes >= 4 && (align >= 32 || !STRICT_ALIGNMENT))
         {                       /* move 4 bytes */
@@ -15604,13 +15619,36 @@ expand_block_move (rtx operands[])
           gen_func.movmemsi = gen_movmemsi_4reg;
         }
       else if (bytes >= 8 && TARGET_POWERPC64
-               /* 64-bit loads and stores require word-aligned
-                  displacements.  */
-               && (align >= 64 || (!STRICT_ALIGNMENT && align >= 32)))
+               && (align >= 64 || !STRICT_ALIGNMENT))
         {
           move_bytes = 8;
           mode = DImode;
           gen_func.mov = gen_movdi;
+          if (offset == 0 && align < 64)
+            {
+              rtx addr;
+
+              /* If the address form is reg+offset with offset not a
+                 multiple of four, reload into reg indirect form here
+                 rather than waiting for reload.  This way we get one
+                 reload, not one per load and/or store.  */
+              addr = XEXP (orig_dest, 0);
+              if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+                  && GET_CODE (XEXP (addr, 1)) == CONST_INT
+                  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+                {
+                  addr = copy_addr_to_reg (addr);
+                  orig_dest = replace_equiv_address (orig_dest, addr);
+                }
+              addr = XEXP (orig_src, 0);
+              if ((GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM)
+                  && GET_CODE (XEXP (addr, 1)) == CONST_INT
+                  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+                {
+                  addr = copy_addr_to_reg (addr);
+                  orig_src = replace_equiv_address (orig_src, addr);
+                }
+            }
         }
       else if (TARGET_STRING && bytes > 4 && !TARGET_POWERPC64)
         {                       /* move up to 8 bytes at a time */

-- 
Alan Modra
Australia Development Lab, IBM
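The displacement test in the patch, (INTVAL (XEXP (addr, 1)) & 3) != 0, is a low-bits check for "not a multiple of four". A stand-alone sketch follows, with int64_t standing in for HOST_WIDE_INT; the function name needs_reg_indirect is hypothetical and the sketch covers only the arithmetic, not the address rewriting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if OFFSET is not a multiple of four, i.e. its two low
   bits are not both zero.  int64_t is a two's complement type, so the
   test gives the expected answer for negative offsets as well.  */
static bool
needs_reg_indirect (int64_t offset)
{
  bool misaligned = (offset & 3) != 0;

  /* Cross-check against the definition using the remainder.  */
  assert (misaligned == (offset % 4 != 0));
  return misaligned;
}
```

When the test is true for a reg+offset address, the patch copies the whole address into a register once, up front, so that each subsequent 8-byte access does not trigger its own reload.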
Re: genattrtab error reporting
On May 7, 2014, at 6:12 PM, Segher Boessenkool seg...@kernel.crashing.org wrote:
>>> Does this fix
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31778
>>
>> Only if it is applied to the tree!  :-)  Yes.
>
> It also is PR57062.  Thanks for fixing it!

Thanks, marked as dup.
Re: [patch] change specific int128 - generic intN
> OK (presuming the usual bootstrap and regression test, which should
> provide a reasonably thorough test of this code through the stdint.h
> tests).

Bootstrapped with and without the patch on x86-64, no regressions.  Committed.  Thanks!