Re: [PATCH] Honour DriverOnly for enum values in error messages.
On 07/27/2015 01:45 PM, Dominik Vogt wrote:
> /gcc/ChangeLog
>
> 	* opts-common.c (read_cmdline_option): List DriverOnly enum
> 	values as valid only in the error message of the driver, not in
> 	the messages of the language compilers.

Applied.  Thanks!

-Andreas-
[PATCH] Refactoring masked built-in decls
Hello,

This patch converts the mask type for masked builtins from signed to unsigned. Furthermore, several redundant builtin definitions were removed. Please have a look. Is it ok for trunk?

Thanks,
Petr

2015-07-27  Petr Murzin  petr.mur...@intel.com

	* config/i386/i386.c (bdesc_special_args): Convert mask type from
	signed to unsigned for masked builtins.
	(ix86_expand_args_builtin): Do not handle UINT_FTYPE_V2DF,
	UINT64_FTYPE_V2DF, UINT64_FTYPE_V4SF, V16QI_FTYPE_V8DI,
	V16HI_FTYPE_V16SI, V16SI_FTYPE_V16SI, V16SF_FTYPE_FLOAT,
	V8HI_FTYPE_V8DI, V8UHI_FTYPE_V8UHI, V8SI_FTYPE_V8DI,
	V8SF_FTYPE_V8DF, V8DI_FTYPE_INT64, V8DI_FTYPE_V4DI,
	V8DI_FTYPE_V8DI, V8DF_FTYPE_DOUBLE, V8DF_FTYPE_V8SI,
	V16SI_FTYPE_V16SI_V16SI, V16SF_FTYPE_V16SF_V16SI,
	V8DI_FTYPE_V8DI_V8DI, V8DF_FTYPE_V8DF_V8DI,
	V4SI_FTYPE_V4SF_V4SF, V4SF_FTYPE_V4SF_UINT64,
	V2UDI_FTYPE_V4USI_V4USI, V2DI_FTYPE_V2DF_V2DF,
	V2DF_FTYPE_V2DF_UINT64, V4UDI_FTYPE_V8USI_V8USI,
	QI_FTYPE_V8DI_V8DI, HI_FTYPE_V16SI_V16SI, HI_FTYPE_HI_INT,
	V16SF_FTYPE_V16SF_V16SF_V16SF, V16SF_FTYPE_V16SF_V16SI_V16SF,
	V16SF_FTYPE_V16SI_V16SF_HI, V16SF_FTYPE_V16SI_V16SF_V16SF,
	V16SI_FTYPE_V16SF_V16SI_HI, V8DI_FTYPE_V8SF_V8DI_QI,
	V8SF_FTYPE_V8DI_V8SF_QI, V8DI_FTYPE_PV4DI,
	V8DF_FTYPE_V8DI_V8DF_QI, V16SI_FTYPE_V16SI_V16SI_V16SI,
	V2DI_FTYPE_V2DI_V2DI_V2DI, V8DI_FTYPE_V8DF_V8DI_QI,
	V8DF_FTYPE_PV4DF, V8SI_FTYPE_V8SI_V8SI_V8SI,
	V8DF_FTYPE_V8DF_V8DF_V8DF, UINT_FTYPE_V4SF,
	V8DF_FTYPE_V8DF_V8DI_V8DF, V8DF_FTYPE_V8DI_V8DF_V8DF,
	V8DF_FTYPE_V8SF_V8DF_QI, V8DI_FTYPE_V8DI_V8DI_V8DI,
	V16SF_FTYPE_PV4SF, V8SF_FTYPE_V8DF_V8SF_QI,
	V8SI_FTYPE_V8DF_V8SI_QI, V16SI_FTYPE_PV4SI,
	V2DF_FTYPE_V2DF_V4SF_V2DF_QI, V4SF_FTYPE_V4SF_V2DF_V4SF_QI,
	V8DI_FTYPE_V8DI_SI_V8DI_V8DI, QI_FTYPE_V8DF_V8DF_INT_QI,
	HI_FTYPE_V16SF_V16SF_INT_HI, V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI,
	VOID_FTYPE_PDOUBLE_V2DF_QI, VOID_FTYPE_PFLOAT_V4SF_QI,
	V2DF_FTYPE_PCDOUBLE_V2DF_QI, V4SF_FTYPE_PCFLOAT_V4SF_QI.
	* config/i386/i386-builtin-types.def (V16QI_FTYPE_V16SI): Remove.
	(V8DF_FTYPE_V8SI): Ditto.
	(V8HI_FTYPE_V8DI): Ditto.
(V8SI_FTYPE_V8DI): Ditto. (V8SF_FTYPE_V8DF): Ditto. (V8SF_FTYPE_V8DF_V8SF_QI): Ditto. (V16HI_FTYPE_V16SI): Ditto. (V16SF_FTYPE_V16HI): Ditto. (V16SF_FTYPE_V16HI_V16SF_HI): Ditto. (V16SF_FTYPE_V16SI): Ditto. (V4DI_FTYPE_V4DI): Ditto. (V16SI_FTYPE_V16SF): Ditto. (V16SF_FTYPE_FLOAT): Ditto. (V8DF_FTYPE_DOUBLE): Ditto. (V8DI_FTYPE_INT64): Ditto. (V8DI_FTYPE_V4DI): Ditto. (V16QI_FTYPE_V8DI): Ditto. (UINT_FTYPE_V4SF): Ditto. (UINT64_FTYPE_V4SF): Ditto. (UINT_FTYPE_V2DF): Ditto. (UINT64_FTYPE_V2DF): Ditto. (V16SI_FTYPE_V16SI): Ditto. (V8DI_FTYPE_V8DI): Ditto. (V16SI_FTYPE_PV4SI): Ditto. (V16SF_FTYPE_PV4SF): Ditto. (V8DI_FTYPE_PV2DI): Ditto. (V8DF_FTYPE_PV2DF): Ditto. (V4DI_FTYPE_PV2DI): Ditto. (V4DF_FTYPE_PV2DF): Ditto. (V16SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV2SF): Ditto. (V8DI_FTYPE_PV4DI): Ditto. (V8DF_FTYPE_PV4DF): Ditto. (V8SF_FTYPE_FLOAT): Ditto. (V4SF_FTYPE_FLOAT): Ditto. (V4DF_FTYPE_DOUBLE): Ditto. (V8SF_FTYPE_PV4SF): Ditto. (V8SI_FTYPE_PV4SI): Ditto. (V4SI_FTYPE_PV2SI): Ditto. (V8SF_FTYPE_PV2SF): Ditto. (V8SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV8SF): Ditto. (V16SI_FTYPE_PV8SI): Ditto. (V8DI_FTYPE_V8SF): Ditto. (V4DI_FTYPE_V4SF): Ditto. (V2DI_FTYPE_V4SF): Ditto. (V64QI_FTYPE_QI): Ditto. (V32HI_FTYPE_HI): Ditto. (V8UHI_FTYPE_V8UHI): Ditto. (V16UHI_FTYPE_V16UHI): Ditto. (V32UHI_FTYPE_V32UHI): Ditto. (V2UDI_FTYPE_V2UDI): Ditto. (V4UDI_FTYPE_V4UDI): Ditto. (V8UDI_FTYPE_V8UDI): Ditto. (V4USI_FTYPE_V4USI): Ditto. (V8USI_FTYPE_V8USI): Ditto. (V16USI_FTYPE_V16USI): Ditto. (V2DF_FTYPE_V2DF_UINT64): Ditto. (V2DI_FTYPE_V2DF_V2DF): Ditto. (V2UDI_FTYPE_V4USI_V4USI): Ditto. (V8DF_FTYPE_V8DF_V8DI): Ditto. (V4SF_FTYPE_V4SF_UINT64): Ditto. (V4SI_FTYPE_V4SF_V4SF): Ditto. (V16SF_FTYPE_V16SF_V16SI): Ditto. (V64QI_FTYPE_V32HI_V32HI): Ditto. (V32HI_FTYPE_V16SI_V16SI): Ditto. (V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI): Ditto. (V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI): Ditto. (V32HI_FTYPE_V64QI_V64QI): Ditto. (V32HI_FTYPE_V32HI_V32HI): Ditto. (V16HI_FTYPE_V16HI_V16HI_INT_V16HI_HI): Ditto. 
(V16SI_FTYPE_V16SI_V4SI): Ditto. (V16SI_FTYPE_V16SI_V16SI): Ditto. (V16SI_FTYPE_V32HI_V32HI): Ditto. (V16SI_FTYPE_V16SI_SI): Ditto. (V8DI_FTYPE_V8DI_V8DI): Ditto. (V4UDI_FTYPE_V8USI_V8USI): Ditto. (V8DI_FTYPE_V16SI_V16SI): Ditto. (V8DI_FTYPE_V8DI_V2DI): Ditto. (QI_FTYPE_QI): Ditto. (SI_FTYPE_SI): Ditto. (DI_FTYPE_DI): Ditto. (QI_FTYPE_QI_QI): Ditto. (QI_FTYPE_QI_INT): Ditto. (HI_FTYPE_HI_INT): Ditto. (SI_FTYPE_SI_INT): Ditto. (DI_FTYPE_DI_INT): Ditto. (HI_FTYPE_V16QI_V16QI): Ditto. (SI_FTYPE_V32QI_V32QI): Ditto. (DI_FTYPE_V64QI_V64QI): Ditto. (QI_FTYPE_V8HI_V8HI): Ditto. (HI_FTYPE_V16HI_V16HI): Ditto. (SI_FTYPE_V32HI_V32HI): Ditto. (QI_FTYPE_V4SI_V4SI): Ditto. (QI_FTYPE_V8SI_V8SI): Ditto. (QI_FTYPE_V2DI_V2DI): Ditto. (QI_FTYPE_V4DI_V4DI): Ditto. (QI_FTYPE_V8DI_V8DI): Ditto. (HI_FTYPE_V16SI_V16SI): Ditto. (HI_FTYPE_V16SI_V16SI_INT_HI): Ditto. (QI_FTYPE_V8DF_V8DF_INT_QI): Ditto. (HI_FTYPE_V16SF_V16SF_INT_HI): Ditto. (V32HI_FTYPE_V32HI_V32HI_V32HI): Ditto. (V4SF_FTYPE_V4SF_V2DF_V4SF_QI):
Re: [wwwdocs] Mention -Wshift-overflow
On Mon, Jul 27, 2015 at 12:34:06AM +0200, Gerald Pfeifer wrote:
> Hi Marek,
>
> On Tue, 21 Jul 2015, Marek Polacek wrote:
> > +<li>A new command-line option <code>-Wshift-overflow</code> has been
> > +  added for the C and C++ compilers, which warns about left shift
> > +  overflows.  <code>-Wshift-overflow=2</code> also warns about
> > +  left-shifting 1 into the sign bit.</li>
>
> from what I can tell, this is enabled by default for modern dialects
> of C and C++ (so many people will see new warnings when upgrading).
> Shouldn't this be mentioned here?

Ok, fixed.  Thanks.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.15
diff -u -r1.15 changes.html
--- changes.html	21 Jul 2015 09:30:14 -	1.15
+++ changes.html	27 Jul 2015 11:14:11 -
@@ -46,7 +46,8 @@
   <li>A new command-line option <code>-Wshift-overflow</code> has been
     added for the C and C++ compilers, which warns about left shift
     overflows.  <code>-Wshift-overflow=2</code> also warns about
-    left-shifting 1 into the sign bit.</li>
+    left-shifting 1 into the sign bit.  This warning is enabled by
+    default.</li>
 </ul>

 <h3 id="c">C</h3>

	Marek
RFA: RL78: Add an optimization to the addsi3 pattern
Hi DJ,

This patch adds a small optimization to the RL78 addsi3 pattern so that it can skip adding in the high part of a constant symbolic address when -mes0 is active.  Under these circumstances we know that the address has to be in low memory and that it is invalid to attempt to access anything in high memory relative to this symbol.

Tested with no regressions on an rl78-elf toolchain.

OK to apply?

Cheers
Nick

gcc/ChangeLog
2015-07-27  Nick Clifton  ni...@redhat.com

	* config/rl78/rl78.c (rl78_addsi3_internal): New function.
	Optimizes the case where -mes0 is active and a constant symbolic
	address is used.
	* config/rl78/rl78-protos.h: Prototype the new function.
	* config/rl78/rl78.md (addsi3_internal_real): Call new function.

Index: gcc/config/rl78/rl78-protos.h
===
--- gcc/config/rl78/rl78-protos.h	(revision 226240)
+++ gcc/config/rl78/rl78-protos.h	(working copy)
@@ -18,6 +18,7 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */

+const char * rl78_addsi3_internal (rtx *, unsigned int);
 void rl78_emit_eh_epilogue (rtx);
 void rl78_expand_compare (rtx *);
 void rl78_expand_movsi (rtx *);

Index: gcc/config/rl78/rl78.c
===
--- gcc/config/rl78/rl78.c	(revision 226240)
+++ gcc/config/rl78/rl78.c	(working copy)
@@ -4638,6 +4641,34 @@
   return res;
 }

+
+const char *
+rl78_addsi3_internal (rtx * operands, unsigned int alternative)
+{
+  /* Issue 5905133:
+     If we are adding in a constant symbolic address when -mes0
+     is active then we know that the address must be < 64K and
+     that it is invalid to access anything above 64K relative to
+     this address.  So we can skip adding in the high bytes.  */
+  if (TARGET_ES0
+      && GET_CODE (operands[2]) == SYMBOL_REF
+      && TREE_CODE (SYMBOL_REF_DECL (operands[2])) == VAR_DECL
+      && TREE_READONLY (SYMBOL_REF_DECL (operands[2]))
+      && ! TREE_SIDE_EFFECTS (SYMBOL_REF_DECL (operands[2])))
+    return "movw ax, %h1\n\taddw ax, %h2\n\tmovw %h0, ax";
+
+  switch (alternative)
+    {
+    case 0:
+    case 1:
+      return "movw ax, %h1\n\taddw ax, %h2\n\tmovw %h0, ax\n\tmovw ax, %H1\n\tsknc\n\tincw ax\n\taddw ax, %H2\n\tmovw %H0, ax";
+    case 2:
+      return "movw ax, %h1\n\taddw ax,%h2\n\tmovw bc, ax\n\tmovw ax, %H1\n\tsknc\n\tincw ax\n\taddw ax, %H2\n\tmovw %H0, ax\n\tmovw ax, bc\n\tmovw %h0, ax";
+    default:
+      gcc_unreachable ();
+    }
+}
+
 #undef  TARGET_PREFERRED_RELOAD_CLASS
 #define TARGET_PREFERRED_RELOAD_CLASS rl78_preferred_reload_class

Index: gcc/config/rl78/rl78.md
===
--- gcc/config/rl78/rl78.md	(revision 226240)
+++ gcc/config/rl78/rl78.md	(working copy)
@@ -243,10 +243,7 @@
    (clobber (reg:HI BC_REG))
   ]
   "rl78_real_insns_ok ()"
-  "@
-   movw ax,%h1 \;addw ax,%h2 \;movw %h0, ax \;movw ax,%H1 \;sknc \;incw ax \;addw ax,%H2 \;movw %H0,ax
-   movw ax,%h1 \;addw ax,%h2 \;movw %h0, ax \;movw ax,%H1 \;sknc \;incw ax \;addw ax,%H2 \;movw %H0,ax
-   movw ax,%h1 \;addw ax,%h2 \;movw bc, ax \;movw ax,%H1 \;sknc \;incw ax \;addw ax,%H2 \;movw %H0,ax \;movw ax,bc \;movw %h0, ax
+  { return rl78_addsi3_internal (operands, which_alternative); }
   [(set_attr "valloc" "macax")]
 )
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
Ramana Radhakrishnan wrote:
> I haven't seen the patch yet but here are my thoughts on where this
> should be going. Thus in summary -
>
> 1. -mfpu=neon implies the presence of the float16x(4/8) types and all
>    the intrinsics that treat these values as bags of bits.
> 2. -mfpu=neon-fp16 implies the presence of the vcvt* intrinsics that
>    are needed for the float16 types.

So I think the problems are statements in ACLE that (a) we should only have float16x(4/8)_t types when we have scalar types as well; (b) whenever we have a scalar __fp16 type, we should have one or other of the __ARM_FP16_FORMAT_(IEEE/ALTERNATIVE) macros defined to indicate the format that's in use. Sadly these seem to forbid the current situation whereby we expose hardware conversion instructions (that work with either fp16 format, according to the status of the FPSCR bit) and allow compiling a binary that will work with either format :(.

The situation is further complicated by GCC's support for the alternative format (not mandated by ACLE), by the fact that we can support either format in the absence of any hardware (as we have software emulation routines for scalar conversions in either format, as long as we know which at compile time), and by the fact that object files compiled with different -mfp16-format cannot be linked together (the ABI attributes conflict).

However, I think we can still go with Ramana's point 1, albeit _only_ when a -mfp16-format is specified; _2_ similarly (i.e. -mfpu=neon-fp16 will not provide any additional intrinsics unless an -mfp16-format is specified). I'll repost the patch series shortly with those changes implemented.

In the meantime: are patches 1 & 2 (ARM __builtin_arm_neon_lane_bounds and qualifier_lane_index) OK to commit? These contain nothing float16-specific, and would unblock Charles Baylis' work on PR63870 (https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00545.html). I'll ping the AArch64 changes separately.

Cheers,
Alan
Re: [gomp4] Additional tests for routine directive
Hi!

On Fri, 24 Jul 2015 15:43:36 -0500, James Norris jnor...@codesourcery.com wrote:
> The attached patch adds additional test for the routine directive for
> C/C++/Fortran. Committed to gomp-4_0-branch.

Thanks, but I see a number of FAILs, including the following:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 compilation failed to produce executable

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match foo not found

FAIL: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 (test for excess errors)
UNRESOLVED: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 compilation failed to produce executable
[same for other torture testing flags]

PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 (test for excess errors)
PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 execution test
FAIL: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 output pattern test, is , should match not found
[same for other torture testing flags]

(I have not reviewed your test case changes.)

Grüße,
 Thomas
Re: [gomp4] acc routines bugfix
Hi Cesar!

On Fri, 24 Jul 2015 08:05:00 -0700, Cesar Philippidis ce...@codesourcery.com wrote:
> [...] I couldn't think of a way to test the lto error message because
> that involves having two compilers present. I wonder if it's ok to
> have libgomp check for compiler expected compiler errors? However,
> that's more of a gcc/testsuite type of check.

Yes, such tests need to live in libgomp, even if they're compilation tests and not execution tests: the libgomp testsuite is the only place in the GCC testsuite where offloading (compilation) testing is possible.

> applied this patch to gomp-4_0-branch for now.

Thanks!

Grüße,
 Thomas
Re: OMP. More constification
On 07/27/15 04:23, Jakub Jelinek wrote:
> On Sun, Jul 26, 2015 at 11:01:09AM -0400, Nathan Sidwell wrote:
>> I found some more missing consts.  The size, kind, var and function
>> arrays emitted by omp-low are read only, but are not so marked.  This
>> patch
> First of all, the hostaddrs array is going to be written by the
> library side for GOMP_MAP_FIRSTPRIVATE, so can't be const and
> shouldn't be const on the compiler emitted side either.  Similarly for
> the use_device_ptr clause on GOMP_target_data* handling.
> The size array is const from the library side, sure, but on the
> compiler side is only conditionally static (static only if all the
> values written to it are INTEGER_CSTs), so not sure if we want to make
> it use const qualified type unconditionally, rather than only change
> it to the const qualified type if it is TREE_STATIC at the end.
> The kinds array I'm afraid might very well soon follow the size array
> model.

Ok, thanks for the explanation.  I'm puzzled though about writing the sizes and kinds arrays.  They're global objects, so that doesn't seem thread-safe?  Or is it ill-formed to have multiple host threads execute the same potentially offloaded function concurrently?

nathan
[PATCH][AArch64] Change aarch64 vector cost to match vectorizer
Hello,

I've applied the changes for the AArch64 vectorizer config suggested by Richard Biener (here https://gcc.gnu.org/ml/gcc/2015-04/msg00086.html) and checked the effect on spec2006 and geekbench benchmarks. Benchmarks were compiled with and without the patch, in both cases with and without '-mcpu=cortex-a57'.

As it turns out, there is no effect of this change on the generated code. With the cortex-a57 flag, the mentioned loop is not vectorized before or after the patch, and the binaries for the benchmark test cases are identical. Without the flag the loop is vectorized in both cases. However, this change makes the aarch64 vector costs consistent with the vectorizer code, while the approach based on loop nesting level clearly underestimates the cost of those statements.

Thanks,
Pawel

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 10df325..ffafc3f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-07-27  Pawel Kupidura  pawel.kupid...@arm.com
+
+	* config/aarch64/aarch64.c: Changed inner loop statement cost
+	to be consistent with vectorizer code.
+
 2015-07-26  Uros Bizjak  ubiz...@gmail.com

 	* config/alpha/alpha.c: Use SUBREG_P predicate.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 020f63c..3b6f8c5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7079,15 +7079,9 @@ aarch64_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,

       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
-	 a function (linear for now) of the loop nest level.  */
+	 arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
-	{
-	  loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
-	  struct loop *loop =  LOOP_VINFO_LOOP (loop_info);
-	  unsigned nest_level = loop_depth (loop);
-
-	  count *= nest_level;
-	}
+	count *= 50; /*  FIXME  */

       retval = (unsigned) (count * stmt_cost);
       cost[where] += retval;
[PATCH] Honour DriverOnly for enum values in error messages.
The attached patch fixes a glitch in the error message generated for invalid values of enum options. When a DriverOnly option was passed directly to the compiler, it would still be listed as valid in the error message, e.g. on s390:

  $ cc1 ... -march=native
  cc1: error: unrecognized argument in option ‘-march=native’
  cc1: note: valid arguments to ‘-march=’ are: g5 g6 native z10 z13 z196 z9-109 z9-ec z900 z990 zEC12

The patched code prints DriverOnly enum values only when the driver generates the error message.

Ciao

Dominik ^_^  ^_^

--
Dominik Vogt
IBM Germany

/gcc/ChangeLog
	* opts-common.c (read_cmdline_option): List DriverOnly enum
	values as valid only in the error message of the driver, not in
	the messages of the language compilers.

From 6d59f56f33804e70ab5be6ced734ed2e95eeea7b Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Mon, 27 Jul 2015 12:39:30 +0100
Subject: [PATCH] Honour DriverOnly for enum values in error messages.

---
 gcc/opts-common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index 8e51974..3bcbaf1 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -1079,6 +1079,8 @@ read_cmdline_option (struct gcc_options *opts,
 	  p = s;
 	  for (i = 0; e->values[i].arg != NULL; i++)
 	    {
+	      if (!enum_arg_ok_for_language (&e->values[i], lang_mask))
+		continue;
 	      size_t arglen = strlen (e->values[i].arg);
 	      memcpy (p, e->values[i].arg, arglen);
 	      p[arglen] = ' ';
-- 
2.3.0
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
On Fri, Jul 24, 2015 at 03:10:17PM -0600, Jeff Law wrote:
> OK.  Note I think the pt.c change will need trivial updating as it
> conflicts with another change of yours that I approved recently.

Committed after resolving the trivial merge conflict.  Thanks for the review,

	Marek
Re: [PATCH][RFT] Reduce gimple-match.c compile-time(?)
On Fri, 24 Jul 2015, Richard Biener wrote:
> The following patch implements the simplest approach of splitting the
> huge functions in gimple-match.c (not yet generic-match.c).
> ...
> Can people who reported huge compile-times test this patch and report
> back?

Positive feedback on IRC and thus I am applying the following simplified variant.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2015-07-27  Richard Biener  rguent...@suse.de

	* genmatch.c (decision_tree::gen_gimple): Split out large subtrees
	into separate functions.
	(decision_tree::gen_generic): Likewise.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c	(revision 226240)
+++ gcc/genmatch.c	(working copy)
@@ -2951,6 +2964,32 @@ decision_tree::gen_gimple (FILE *f)
   for (unsigned n = 1; n <= 3; ++n)
     {
+      /* First generate split-out functions.  */
+      for (unsigned i = 0; i < root->kids.length (); i++)
+	{
+	  dt_operand *dop = static_cast <dt_operand *> (root->kids[i]);
+	  expr *e = static_cast <expr *> (dop->op);
+	  if (e->ops.length () != n)
+	    continue;
+
+	  fprintf (f, "\nstatic bool\n"
+		   "gimple_simplify_%s (code_helper *res_code, tree *res_ops,\n"
+		   "                 gimple_seq *seq, tree (*valueize)(tree) "
+		   "ATTRIBUTE_UNUSED,\n"
+		   "                 code_helper ARG_UNUSED (code), tree "
+		   "ARG_UNUSED (type)\n",
+		   e->operation->id);
+	  for (unsigned i = 0; i < n; ++i)
+	    fprintf (f, ", tree op%d", i);
+	  fprintf (f, ")\n");
+	  fprintf (f, "{\n");
+	  dop->gen_kids (f, 2, true);
+	  fprintf (f, "  return false;\n");
+	  fprintf (f, "}\n");
+	}
+
+      /* Then generate the main entry with the outermost switch and
+	 tail-calls to the split-out functions.  */
       fprintf (f, "\nstatic bool\n"
	        "gimple_simplify (code_helper *res_code, tree *res_ops,\n"
	        "                 gimple_seq *seq, tree (*valueize)(tree),\n"
@@ -2976,10 +3015,11 @@ decision_tree::gen_gimple (FILE *f)
	    fprintf (f, "    case %s%s:\n",
		     is_a <fn_id *> (e->operation) ? "-" : "",
		     e->operation->id);
-	    fprintf (f, "      {\n");
-	    dop->gen_kids (f, 8, true);
-	    fprintf (f, "        break;\n");
-	    fprintf (f, "      }\n");
+	    fprintf (f, "      return gimple_simplify_%s (res_code, res_ops, "
+		     "seq, valueize, code, type", e->operation->id);
+	    for (unsigned i = 0; i < n; ++i)
+	      fprintf (f, ", op%d", i);
+	    fprintf (f, ");\n");
	}
       fprintf (f, "    default:;\n"
	        "    }\n");
@@ -3003,6 +3043,34 @@ decision_tree::gen_generic (FILE *f)
   for (unsigned n = 1; n <= 3; ++n)
     {
+      /* First generate split-out functions.  */
+      for (unsigned i = 0; i < root->kids.length (); i++)
+	{
+	  dt_operand *dop = static_cast <dt_operand *> (root->kids[i]);
+	  expr *e = static_cast <expr *> (dop->op);
+	  if (e->ops.length () != n
+	      /* Builtin simplifications are somewhat premature on
+		 GENERIC.  The following drops patterns with outermost
+		 calls.  It's easy to emit overloads for function code
+		 though if necessary.  */
+	      || e->operation->kind != id_base::CODE)
+	    continue;
+
+	  fprintf (f, "\nstatic tree\n"
+		   "generic_simplify_%s (location_t ARG_UNUSED (loc), enum "
+		   "tree_code ARG_UNUSED (code), tree ARG_UNUSED (type)",
+		   e->operation->id);
+	  for (unsigned i = 0; i < n; ++i)
+	    fprintf (f, ", tree op%d", i);
+	  fprintf (f, ")\n");
+	  fprintf (f, "{\n");
+	  dop->gen_kids (f, 2, false);
+	  fprintf (f, "  return NULL_TREE;\n");
+	  fprintf (f, "}\n");
+	}
+
+      /* Then generate the main entry with the outermost switch and
+	 tail-calls to the split-out functions.  */
       fprintf (f, "\ntree\n"
	        "generic_simplify (location_t loc, enum tree_code code, "
	        "tree type ATTRIBUTE_UNUSED");
@@ -3030,10 +3098,11 @@ decision_tree::gen_generic (FILE *f)
	        fprintf (f, "    CASE_CONVERT:\n");
	      else
	        fprintf (f, "    case %s:\n", e->operation->id);
-	      fprintf (f, "      {\n");
-	      dop->gen_kids (f, 8, false);
-	      fprintf (f, "        break;\n"
-		       "      }\n");
+	      fprintf (f, "      return generic_simplify_%s (loc, code, type",
+		       e->operation->id);
+	      for (unsigned i = 0; i < n; ++i)
+		fprintf (f, ", op%d", i);
+	      fprintf (f, ");\n");
	}
       fprintf (f, "    default:;\n"
	        "    }\n");
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
On Mon, 27 Jul 2015, Richard Biener wrote:

I noticed that the code we generate for a simple

  gimple_assign_rhs1 (stmt)

is quite bad as we have two checking pieces.  The implementation is now

static inline tree
gimple_assign_rhs1 (const_gimple gs)
{
  GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
  return gimple_op (gs, 1);
}

and the hidden checking is due to gimple_op being

static inline tree *
gimple_ops (gimple gs)
{
  size_t off;

  /* All the tuples have their operand vector at the very bottom
     of the structure.  Note that those structures that do not
     have an operand vector have a zero offset.  */
  off = gimple_ops_offset_[gimple_statement_structure (gs)];
  gcc_gimple_checking_assert (off != 0);

  return (tree *) ((char *) gs + off);

(ugly in itself considering that we just verified we have a gassign which has an operand vector as member...).  gimple_op incurs two additional memory references to compute gimple_statement_structure and gimple_ops_offset_ plus the checking bits.

The simplest thing would be to use

  GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
  return as_a <const gassign *> (gs)->op[1];

but that's not optimal either because we check we have a gassign again (not optimized in stage1).  So I decided to invent a fancy as_a <> which reports errors similar to GIMPLE_CHECK and makes the code look like

  const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs);
  return ass->op[1];

for some reason I needed to overload is_a_helper for const_gimple (I thought the is_a machinery would transparently support qualified types?).  I'm missing a fallback for !ENABLE_GIMPLE_CHECKING as well as an overload for 'gimple' I guess.  I just changed gimple_assign_rhs1 for now.  So this is a RFC.

I understand in the end the whole GCC may use gassign everywhere we access gimple_assign_rhs1 but I don't see this happening soon and this simple proof-of-concept patch already reduces unoptimized text size of gimple-match.c from 836080 to 836359 (too bad) and optimized from 585731 to 193190 bytes (yay!) on x86_64 using trunk.

Some inlining must go very differently - we just have 544 calls to gimple_assign_rhs1 in gimple-match.c.  .optimized shows all calls remaining as

  bb 89:
  _ZL18gimple_assign_rhs1PK21gimple_statement_base.part.32 (def_stmt_47);

which is

tree_node* gimple_assign_rhs1(const_gimple) (const struct gimple_statement_base * gs)
{
  bb 2:
  gimple_check_failed (gs_1(D), "/space/rguenther/tramp3d/trunk/gcc/gimple.h"[0], 2381, "gimple_assign_rhs1"[0], 6, 0);
}

so eventually we just do a better job with profile estimation.

Ok, numbers are off because of a typo I introduced late.

+  if (ret)
+    gimple_check_failed (gs, file, line, fun, T ()->code_, ERROR_MARK);

should be "if (!ret)" of course ...

Optimized code size after the patch is 588878 (we still avoid a lot of code and the checking fails are still split out).  Not sure where the code size increase is from.  Unoptimized code size after the patch is 836233.

Richard.

I think inliner-wise we are even as we get rid of the gimple_op () inline but instead get the GIMPLE_CHECK2 inline.

So - any comments?

Thanks,
Richard.

2015-07-27  Richard Biener  rguent...@suse.de

	* gimple.h (GIMPLE_CHECK2): New inline template function.
	(gassign::code_): New constant static member.
	(is_a_helper <const gassign *>): Add.
	(gimple_assign_rhs1): Use GIMPLE_CHECK2 and a cheaper way
	to access operand 1.
	* gimple.c (gassign::code_): Define.

Index: gcc/gimple.h
===
--- gcc/gimple.h	(revision 226240)
+++ gcc/gimple.h	(working copy)
@@ -51,6 +51,16 @@ extern void gimple_check_failed (const_g
     gimple_check_failed (__gs, __FILE__, __LINE__, __FUNCTION__,	\
 			 (CODE), ERROR_MARK);				\
   } while (0)
+template <typename T>
+static inline T
+GIMPLE_CHECK2 (const_gimple gs, const char *file = __builtin_FILE (),
+	       int line = __builtin_LINE (),
+	       const char *fun = __builtin_FUNCTION ())
+{
+  T ret = dyn_cast <T> (gs);
+  if (ret)
+    gimple_check_failed (gs, file, line, fun, T ()->code_, ERROR_MARK);
+  return ret;
+}
 #else  /* not ENABLE_GIMPLE_CHECKING  */
 #define gcc_gimple_checking_assert(EXPR) ((void)(0 && (EXPR)))
 #define GIMPLE_CHECK(GS, CODE)			(void)0
@@ -832,6 +842,7 @@ struct GTY((tag("GSS_WITH_OPS")))

 struct GTY((tag("GSS_WITH_MEM_OPS")))
   gassign : public gimple_statement_with_memory_ops
 {
+  static const enum gimple_code code_ = GIMPLE_ASSIGN;
   /* no additional fields; this uses the layout for GSS_WITH_MEM_OPS.  */
 };

@@ -864,6 +875,14 @@ is_a_helper <gassign *>::test (gimple gs

 template <>
 template <>
 inline bool
+is_a_helper <const gassign *>::test (const_gimple gs)
+{
+  return gs->code == GIMPLE_ASSIGN;
+}
+
+template <>
+template <>
+inline bool
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
On Mon, 27 Jul 2015, Richard Biener wrote:

On Mon, 27 Jul 2015, Richard Biener wrote:

I noticed that the code we generate for a simple gimple_assign_rhs1 (stmt) is quite bad as we have two checking pieces.  The implementation is now

  static inline tree
  gimple_assign_rhs1 (const_gimple gs)
  {
    GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
    return gimple_op (gs, 1);
  }

and the hidden checking is due to gimple_op being

  static inline tree *
  gimple_ops (gimple gs)
  {
    size_t off;

    /* All the tuples have their operand vector at the very bottom
       of the structure.  Note that those structures that do not
       have an operand vector have a zero offset.  */
    off = gimple_ops_offset_[gimple_statement_structure (gs)];
    gcc_gimple_checking_assert (off != 0);

    return (tree *) ((char *) gs + off);

(ugly in itself considering that we just verified we have a gassign which has an operand vector as member...).  gimple_op incurs two additional memory references to compute gimple_statement_structure and gimple_ops_offset_ plus the checking bits.

The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1).  So I decided to invent a fancy as_a <> which reports errors similar to GIMPLE_CHECK and makes the code look like

  const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs);
  return ass->op[1];

for some reason I needed to overload is_a_helper for const_gimple (I thought the is_a machinery would transparently support qualified types?).  I'm missing a fallback for !ENABLE_GIMPLE_CHECKING as well as an overload for 'gimple' I guess.  I just changed gimple_assign_rhs1 for now.  So this is a RFC.

I understand in the end the whole GCC may use gassign everywhere we access gimple_assign_rhs1 but I don't see this happening soon and this simple proof-of-concept patch already reduces unoptimized text size of gimple-match.c from 836080 to 836359 (too bad) and optimized from 585731 to 193190 bytes (yay!) on x86_64 using trunk.

Some inlining must go very differently - we just have 544 calls to gimple_assign_rhs1 in gimple-match.c.  .optimized shows all calls remaining as

  bb 89:
  _ZL18gimple_assign_rhs1PK21gimple_statement_base.part.32 (def_stmt_47);

which is

  tree_node* gimple_assign_rhs1(const_gimple) (const struct gimple_statement_base * gs)
  {
    bb 2:
    gimple_check_failed (gs_1(D), "/space/rguenther/tramp3d/trunk/gcc/gimple.h"[0], 2381, "gimple_assign_rhs1"[0], 6, 0);
  }

so eventually we just do a better job with profile estimation.

Ok, numbers are off because of a typo I introduced late.

+  if (ret)
+    gimple_check_failed (gs, file, line, fun, T ()->code_, ERROR_MARK);

should be "if (!ret)" of course ...

Optimized code size after the patch is 588878 (we still avoid a lot of code and the checking fails are still split out).  Not sure where the code size increase is from.  Unoptimized code size after the patch is 836233.

And here is a more complete variant (covering all op accesses of gimple assigns).  stage2 cc1plus size drops from

    text	   data	    bss	     dec	    hex	filename
24970991	  73776	1392952	26437719	1936857	/abuild/rguenther/obj2/prev-gcc/cc1plus

to

    text	   data	    bss	     dec	    hex	filename
24809991	  69608	1388536	26268135	190d1e7	../obj/prev-gcc/cc1plus

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Thanks,
Richard.

2015-07-27  Richard Biener  rguent...@suse.de

	* gimple.h (GIMPLE_CHECK2): New inline template function.
	(gassign::code_): New constant static member.
	(is_a_helper <const gassign *>): Add.
	(gimple_assign_lhs): Use GIMPLE_CHECK2 and a cheaper way
	to access the operand.
	(gimple_assign_lhs_ptr): Likewise.
	(gimple_assign_set_lhs): Likewise.
	(gimple_assign_rhs1, gimple_assign_rhs1_ptr,
	gimple_assign_set_rhs1): Likewise.
	(gimple_assign_rhs2, gimple_assign_rhs2_ptr,
	gimple_assign_set_rhs2): Likewise.
	(gimple_assign_rhs3, gimple_assign_rhs3_ptr,
	gimple_assign_set_rhs3): Likewise.
	(gimple_assign_rhs_code): Likewise.
	* gimple.c (gassign::code_): Define.

Index: gcc/gimple.h
===
--- gcc/gimple.h	(revision 226240)
+++ gcc/gimple.h	(working copy)
@@ -51,9 +51,57 @@ extern void gimple_check_failed (const_g
     gimple_check_failed (__gs, __FILE__, __LINE__, __FUNCTION__,	\
 			 (CODE), ERROR_MARK);				\
   } while (0)
+template <typename T>
+static inline T
+GIMPLE_CHECK2 (const_gimple gs,
+#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)
+	       const char *file = __builtin_FILE
Re: OMP. More constification
On Mon, Jul 27, 2015 at 07:33:59AM -0400, Nathan Sidwell wrote:
> On 07/27/15 04:23, Jakub Jelinek wrote:
>> On Sun, Jul 26, 2015 at 11:01:09AM -0400, Nathan Sidwell wrote:
>>> I found some more missing consts.  The size, kind, var and function
>>> arrays emitted by omp-low are read only, but are not so marked.
>>> This patch
>> First of all, the hostaddrs array is going to be written by the
>> library side for GOMP_MAP_FIRSTPRIVATE, so can't be const and
>> shouldn't be const on the compiler emitted side either.  Similarly
>> for the use_device_ptr clause on GOMP_target_data* handling.
>> The size array is const from the library side, sure, but on the
>> compiler side is only conditionally static (static only if all the
>> values written to it are INTEGER_CSTs), so not sure if we want to
>> make it use const qualified type unconditionally, rather than only
>> change it to the const qualified type if it is TREE_STATIC at the
>> end.
>> The kinds array I'm afraid might very well soon follow the size
>> array model.
> Ok, thanks for the explanation.  I'm puzzled though about writing the
> sizes and kinds arrays.  They're global objects, so that doesn't seem
> thread-safe?  Or is it ill-formed to have multiple host threads
> execute the same potentially offloaded function concurrently?

They are only global objects if they are filled with constants.  Right now for the kinds array that is true always, for the sizes array if no VLAs nor variable length array sections are used.  They are initially TREE_STATIC (var) = 1; but we later on clear it with TREE_STATIC (var) = 0; if it is constant (this is still before the decls are finalized).

	Jakub
Re: OMP. More constification
On 07/27/15 07:43, Jakub Jelinek wrote: On Mon, Jul 27, 2015 at 07:33:59AM -0400, Nathan Sidwell wrote: They are only global objects if they are filled with constants. Right now for the kinds array that is true always, for the sizes array only if no VLAs nor variable length array sections are used. They are initially TREE_STATIC (var) = 1; but we later on clear it with TREE_STATIC (var) = 0; if it is not constant (this is still before the decls are finalized). Ok, I'd missed that. thanks.
Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing
On 26/07/15 05:20 -0700, Tim Shen wrote: @@ -389,7 +391,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif } - void + _StringT _M_add_collating_element(const _StringT& __s) { auto __st = _M_traits.lookup_collatename(__s.data(), Changing this return type is an ABI change. When compiled with older versions of GCC this function will not return anything, so code compiled with newer versions will read garbage off the stack if linked against the old definition. (This isn't a problem for _M_expression_term because the return type is part of the mangled name for function templates). One solution would be to rename the function, so that code compiled with the new GCC will use the new function, not link to the old version that doesn't return anything.
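The mangling distinction Jonathan mentions is easy to demonstrate with the ABI demangler: an ordinary member function's mangled name does not encode its return type, while a template instantiation's does. A small sketch (GCC/Clang-specific cxxabi.h; the symbol names are hand-written Itanium-ABI manglings chosen for illustration, not taken from libstdc++):

```cpp
#include <cxxabi.h>
#include <cassert>
#include <cstdlib>
#include <string>

// Wrap abi::__cxa_demangle with sane memory handling.
std::string demangle(const char *mangled)
{
  int status = 0;
  char *s = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string r = (status == 0 && s) ? s : "<error>";
  std::free(s);
  return r;
}

// "_ZN1C3addEi" mangles C::add(int) *whatever* its return type is, so
// changing void -> _StringT silently reuses the old symbol at link time.
// "_Z3fooIiET_v" mangles foo<int>(), whose return type T is part of the
// name -- which is why _M_expression_term is safe.
```

`demangle("_ZN1C3addEi")` yields `C::add(int)` with no return type; `demangle("_Z3fooIiET_v")` yields `int foo<int>()`, return type included.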
[RFC AArch64][PR 63304] Handle literal pools for functions > 1 MiB in size.
Hi, This patch appears to fix the issue in PR63304 where we have functions that are > 1MiB. The idea is to use adrp / ldr or adrp / add instructions to address the literal pools under the use of a command line option. The patch as attached turns this feature on by default as that is the mode I've used for testing. My thought is that we turn this on by default on trunk but keep this disabled by default for the release branches in order to get some serious testing for this feature while it bakes on trunk. As a follow-up I would like to try and see if estimate_num_insns or something else can give us a heuristic to turn this on for large functions. After all the number of incidences of this are quite low in real life, so maybe we should look to restrict this use as much as possible on the grounds that this code generation implies an extra integer register for addressing for every floating point and vector constant and I don't think that's great in code that already may have high register pressure. Tested on aarch64-none-elf with no regressions. Bootstrapped and regression tested on aarch64-none-linux-gnu. Additionally a test run of SPEC2k showed no issues other than a miscompare for eon which also appeared in the base run. Thus for now I'm confident this is reasonably sane for the minute. I will also note that this patch will also need rebasing on top of Kyrill's work with target attributes and thus cannot be final for trunk - however this version is still applicable for all release branches. More testing is still underway but in the meanwhile I'd like to put this up for some comments please. regards Ramana DATE Ramana Radhakrishnan ramana.radhakrish...@arm.com PR target/63304 * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Handle nopcrelative_literal_loads. (aarch64_classify_address): Likewise. (aarch64_constant_pool_reload_icode): Define. (aarch64_secondary_reload): Handle secondary reloads for literal pools. 
(aarch64_override_options): Handle nopcrelative_literal_loads. (aarch64_classify_symbol): Handle nopcrelative_literal_loads. * config/aarch64/aarch64.md (aarch64_reload_movcpALLTF:modeP:mode): Define. (aarch64_reload_movcpVALL:modeP:mode): Likewise. * config/aarch64/aarch64.opt: New option mnopc-relative-literal-loads * config/aarch64/predicates.md (aarch64_constant_pool_symref): New predicate. * doc/invoke.texi (mnopc-relative-literal-loads): Document. --- gcc/config/aarch64/aarch64.c | 102 +-- gcc/config/aarch64/aarch64.md| 26 ++ gcc/config/aarch64/aarch64.opt | 4 ++ gcc/config/aarch64/predicates.md | 4 ++ gcc/doc/invoke.texi | 8 +++ 5 files changed, 140 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 020f63c..f37a031 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1612,11 +1612,27 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) aarch64_emit_move (dest, base); return; } + mem = force_const_mem (ptr_mode, imm); gcc_assert (mem); + + /* If we aren't generating PC relative literals, then +we need to expand the literal pool access carefully. +This is something that needs to be done in a number +of places, so could well live as a separate function. 
*/ + if (nopcrelative_literal_loads) + { + gcc_assert (can_create_pseudo_p ()); + base = gen_reg_rtx (ptr_mode); + aarch64_expand_mov_immediate (base, XEXP (mem, 0)); + mem = gen_rtx_MEM (ptr_mode, base); + } + if (mode != ptr_mode) mem = gen_rtx_ZERO_EXTEND (mode, mem); + emit_insn (gen_rtx_SET (dest, mem)); + return; case SYMBOL_SMALL_TLSGD: @@ -3728,9 +3744,10 @@ aarch64_classify_address (struct aarch64_address_info *info, rtx sym, addend; split_const (x, &sym, &addend); - return (GET_CODE (sym) == LABEL_REF - || (GET_CODE (sym) == SYMBOL_REF - && CONSTANT_POOL_ADDRESS_P (sym))); + return ((GET_CODE (sym) == LABEL_REF + || (GET_CODE (sym) == SYMBOL_REF + && CONSTANT_POOL_ADDRESS_P (sym) + && !nopcrelative_literal_loads))); } return false; @@ -4918,12 +4935,69 @@ aarch64_legitimize_reload_address (rtx *x_p, } +/* Return the reload icode required for a constant pool in mode. */ +static enum insn_code +aarch64_constant_pool_reload_icode (machine_mode mode) +{ + switch (mode) +{ +case SFmode: + return CODE_FOR_aarch64_reload_movcpsfdi; + +case DFmode: + return CODE_FOR_aarch64_reload_movcpdfdi; + +case TFmode: + return
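The two addressing schemes the patch switches between can be sketched as follows (the assembly in the comments illustrates the sequences described above; it is not the patch's literal output):

```cpp
#include <cassert>

// A floating-point constant the compiler will typically place in the
// literal pool rather than synthesize with integer moves:
double scale(double x) { return x * 3.141592653589793; }

// Default PC-relative literal load -- the ldr-literal form only reaches
// +/- 1 MiB from the instruction, which is what overflows in the very
// large functions of PR 63304:
//     ldr   d1, .LC0
//
// With the proposed option the address is materialized first via
// adrp (+/- 4 GiB reach), at the cost of an extra integer register
// per constant -- the register-pressure concern raised above:
//     adrp  x0, .LC0
//     ldr   d1, [x0, #:lo12:.LC0]
```

The C function itself behaves identically under either scheme; only the generated addressing sequence (and its range) differs.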
Re: Fix logic error in Fortran OpenACC parsing
Hi! On Fri, 8 May 2015 14:24:15 +0300, Ilmir Usmanov i.usma...@samsung.com wrote: On 06.05.2015 14:38, Thomas Schwinge wrote: On Tue, 5 May 2015 15:38:03 -0400, David Malcolm dmalc...@redhat.com wrote: On Wed, 2015-04-29 at 14:10 +0200, Mikael Morin wrote: Le 29/04/2015 02:02, David Malcolm a écrit : diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 2c7c554..30e4eab 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -4283,7 +4283,7 @@ parse_oacc_structured_block (gfc_statement acc_st) unexpected_eof (); else if (st != acc_end_st) gfc_error (Expecting %s at %C, gfc_ascii_statement (acc_end_st)); -reject_statement (); + reject_statement (); } while (st != acc_end_st); I think this one is a bug; there should be braces around 'gfc_error' and 'reject_statement'. If 'st' is 'acc_end_st', as it shall be, the statement is rejected. So, this is a bug. At least that's the pattern in 'parse_oacc_loop', and how the 'unexpected_statement' function is used. FWIW, Jeff had approved that patch, so I've committed the patch to trunk (as r222823), making the indentation reflect the block structure. Thomas: should the reject_statement (); call in the above be guarded by the else if (st != acc_end_st) clause? Indeed, this seems to be a bug that has been introduced very early in the OpenACC Fortran front end development -- see how the parse_oacc_structured_block function evolved in the patches posted in http://news.gmane.org/find-root.php?message_id=%3C52E1595D.907%40samsung.com%3E and following (Ilmir, CCed just in case). I also see that the corresponding OpenMP code, parse_omp_structured_block, just calls unexpected_statement, which Ilmir's initial patch also did, but at some point, he then changed this to the current code: gfc_error followed by reject_statement, as cited above -- I would guess for the reason to get a better error message? (Tobias, should this thus also be done for OpenMP, and/or extend unexpected_statement accordingly?) That's true. 
I've checked the abandoned openacc-1_0-branch and I used unexpected_statement there (there is still the odd *_acc_* naming present instead of the new-and-shiny *_oacc_* one), but, as you mentioned, I've changed this for better error reporting... and introduced the bug. And then, I'm a bit confused: is it OK that despite this presumed logic error, which affects all (?) valid executions of this parsing code, we're not running into any issues with the OpenACC Fortran front end test cases? I think this is OK, since this is an !$ACC END _smth_ statement and it shall not be present in the AST. So, it is abandoned later anyway ;) (if I remember correctly, during the gfc_clear_new_st call). Although the bug does not affect the logic, it is still a bug. OK for trunk? From my point of view, OK. Thanks for your review! Nobody else had any comments, so I have now committed my patch to trunk in r226246: commit 1ed4ddb2615c807c239e1b3eb214655a4cc87f1a Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Jul 27 14:26:41 2015 + Fix logic error in Fortran OpenACC parsing gcc/fortran/ * parse.c (parse_oacc_structured_block): Fix logic error. Reported by Mikael Morin mikael.mo...@sfr.fr. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@226246 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/fortran/ChangeLog |5 + gcc/fortran/parse.c |6 -- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git gcc/fortran/ChangeLog gcc/fortran/ChangeLog index 0ed6b9b..e5b7681 100644 --- gcc/fortran/ChangeLog +++ gcc/fortran/ChangeLog @@ -1,3 +1,8 @@ +2015-07-27 Thomas Schwinge tho...@codesourcery.com + + * parse.c (parse_oacc_structured_block): Fix logic error. + Reported by Mikael Morin mikael.mo...@sfr.fr. 
+ 2015-07-24 Mikael Morin mik...@gcc.gnu.org PR fortran/64986 diff --git gcc/fortran/parse.c gcc/fortran/parse.c index 45ad63f..04b4c80 100644 --- gcc/fortran/parse.c +++ gcc/fortran/parse.c @@ -4383,8 +4383,10 @@ parse_oacc_structured_block (gfc_statement acc_st) if (st == ST_NONE) unexpected_eof (); else if (st != acc_end_st) - gfc_error ("Expecting %s at %C", gfc_ascii_statement (acc_end_st)); - reject_statement (); + { + gfc_error ("Expecting %s at %C", gfc_ascii_statement (acc_end_st)); + reject_statement (); + } } while (st != acc_end_st); Grüße, Thomas
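The bug fixed here is the classic misleading-indentation trap: without braces, only the first statement is governed by the `else if`, and the second runs unconditionally. The same shape in a self-contained sketch (hypothetical handle/handle_fixed names):

```cpp
#include <cassert>
#include <string>

static std::string log_;

void handle(bool ok)        // the pre-r226246 shape
{
  if (!ok)
    log_ += "error;";
    log_ += "rejected;";    // indented like the error path, but always runs
}

void handle_fixed(bool ok)  // the committed fix: braces around both calls
{
  if (!ok)
    {
      log_ += "error;";
      log_ += "rejected;";  // now reached only on the error path
    }
}
```

As Ilmir notes, the stray call happened to be harmless in this parser; GCC's later -Wmisleading-indentation warning targets exactly this pattern.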
Re: [gomp4] fiuxup openacc default handling
On 26/07/15 19:09, Nathan Sidwell wrote: I've committed this update to my earlier breakout of default handling. After complaining about something because of 'none', we should fall through to the default handling, to prevent ICEing later (on a patch series I'm working on). This matches the OMP default handling. Also tweaked the setting of GOVD_ flags slightly, to make the firstprivate handling I'm working on less invasive. Hi, this causes PR 67027 - [gomp4] FAIL: gfortran.dg/goacc/modules.f95 -O (internal compiler error). Thanks, - Tom nathan gomp4-gimp-fix.patch 2015-07-26 Nathan Sidwell nat...@codesourcery.com * gimplify.c (oacc_default_clause): Fallthrough to unspecified handling. Propagate mapping from outer scope. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 226226) +++ gcc/gimplify.c (working copy) @@ -5930,7 +5930,7 @@ oacc_default_clause (struct gimplify_omp DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind); error_at (ctx->location, "enclosing OpenACC %s construct", rkind); } -break; + /* FALLTHRU. */ case OMP_CLAUSE_DEFAULT_UNSPECIFIED: { @@ -5944,33 +5944,39 @@ oacc_default_clause (struct gimplify_omp continue; if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET))) break; - if (splay_tree_lookup (octx->variables, (splay_tree_key) decl)) + splay_tree_node n2 + = splay_tree_lookup (octx->variables, (splay_tree_key) decl); + if (n2) + { + flags |= n2->value & GOVD_MAP; goto found_outer; + } } } - { - tree type = TREE_TYPE (decl); - /* Should this be REFERENCE_TYPE_P? */ - if (POINTER_TYPE_P (type)) - type = TREE_TYPE (type); + if (is_global_var (decl) && device_resident_p (decl)) + flags |= GOVD_MAP_TO_ONLY | GOVD_MAP; + /* Scalars under kernels are default 'copy'. */ + else if (ctx->acc_region_kind == ARK_KERNELS) + flags |= GOVD_FORCE_MAP | GOVD_MAP; + else if (ctx->acc_region_kind == ARK_PARALLEL) + { + tree type = TREE_TYPE (decl); + + /* Should this be REFERENCE_TYPE_P? 
*/ + if (POINTER_TYPE_P (type)) + type = TREE_TYPE (type); - /* For OpenACC regions, array and aggregate variables -default to present_or_copy, while scalar variables -by default are firstprivate (gang-local) in parallel. */ - if (!AGGREGATE_TYPE_P (type)) - { - if (is_global_var (decl) && device_resident_p (decl)) - flags |= GOVD_MAP_TO_ONLY; - else if (ctx->acc_region_kind == ARK_PARALLEL) - flags |= (GOVD_GANGLOCAL | GOVD_MAP_TO_ONLY); - /* Scalars under kernels are default 'copy'. */ - else if (ctx->acc_region_kind == ARK_KERNELS) - flags |= GOVD_FORCE_MAP; - else - gcc_unreachable (); - } + if (AGGREGATE_TYPE_P (type)) + /* Aggregates default to 'copy'. This should really +include GOVD_FORCE_MAP. */ + flags |= GOVD_MAP; + else + /* Scalars default to 'firstprivate'. */ + flags |= GOVD_GANGLOCAL | GOVD_MAP_TO_ONLY | GOVD_MAP; } + else + gcc_unreachable (); found_outer:; } break; @@ -6020,7 +6026,8 @@ omp_notice_variable (struct gimplify_omp if (is_oacc) flags = oacc_default_clause (ctx, decl, in_code, flags); - flags |= GOVD_MAP; + else + flags |= GOVD_MAP; if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (decl), is_oacc)) {
Re: [gomp4] fiuxup openacc default handling
On 07/27/15 11:21, Tom de Vries wrote: On 26/07/15 19:09, Nathan Sidwell wrote: I've committed this update to my earlier breakout of default handling. After complaining about something because of 'none', we should fall through to the default handling, to prevent ICEing later (on a patch series I'm working on). This matches the OMP default handling. Also tweaked the setting of GOVD_ flags slightly, to make the firstprivate handling I'm working on less invasive. Hi, this causes PR 67027 - [gomp4] FAIL: gfortran.dg/goacc/modules.f95 -O (internal compiler error). Will take a look ...
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
On 27/07/15 16:59, Kyrill Tkachov wrote: Hi Marek, On 27/07/15 13:41, Marek Polacek wrote: On Fri, Jul 24, 2015 at 03:10:17PM -0600, Jeff Law wrote: OK. Note I think the pt.c change will need trivial updating as it conflicts with another change of yours that I approved recently. Committed after resolving trivial merge conflict. This caused an arm bootstrap failure with -Werror. https://gcc.gnu.org/bugzilla/process_bug.cgi Err, the correct link is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67030 Thanks, Kyrill Thanks for review, Marek
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working from the assumption that memoization ought not fail, but it seems that was a bad assumption on my part. So given noce_try_cmove_arith can change the last insn and make it non-recognizable, this code seems reasonable. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. 
I'll look into that, but what is the machinery by which one moves insns? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. jeff
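The two-pass scheme Kyrill describes — record the registers set in BB_A, then walk every sub-rtx of each SET_SRC in BB_B — can be modelled with plain data structures. In this sketch `Insn` is a toy stand-in for real RTL and the inner loop stands in for the FOR_EACH_SUBRTX walk:

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Insn
{
  int dest;               // register number written by this insn's SET
  std::vector<int> srcs;  // register numbers read by its SET_SRC
};

// Return true if no insn in bb_b reads a register that bb_a writes,
// i.e. the blocks have no cross-dependence blocking if-conversion.
bool no_cross_dependence(const std::vector<Insn> &bb_a,
                         const std::vector<Insn> &bb_b)
{
  std::set<int> set_in_a;                 // pass 1: defs of BB_A
  for (const Insn &insn : bb_a)
    set_in_a.insert(insn.dest);

  for (const Insn &insn : bb_b)           // pass 2: uses of BB_B
    for (int reg : insn.srcs)
      if (set_in_a.count(reg))
        return false;                     // BB_B reads a BB_A result
  return true;
}
```

The cost profile matches the discussion: one linear walk over each block, with the per-operand scan being the unavoidable "full look at those rtxs".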
RE: [PATCH][AArch64] Improve spill code - swap order in shl pattern
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 27 April 2015 14:37 To: GCC Patches Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shl pattern generates far more optimal spill code. A few more patterns are the wrong way around, which I'll address in a separate patch. I'm also looking into fixing lra-constraints to generate the expected code by taking the allocno class into account in the cost calculations during spilling. 2015-04-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_<mode>3): Place integer variant first. 
--- gcc/config/aarch64/aarch64.md | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 7163025..baef56a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3334,17 +3334,17 @@ ;; Logical left shift using SISD or Integer instruction (define_insn *aarch64_ashl_sisd_or_int_mode3 - [(set (match_operand:GPI 0 register_operand =w,w,r) + [(set (match_operand:GPI 0 register_operand =r,w,w) (ashift:GPI - (match_operand:GPI 1 register_operand w,w,r) - (match_operand:QI 2 aarch64_reg_or_shift_imm_mode Uscmode,w,rUscmode)))] + (match_operand:GPI 1 register_operand r,w,w) + (match_operand:QI 2 aarch64_reg_or_shift_imm_mode rUscmode,Uscmode,w)))] @ + lsl\t%w0, %w1, %w2 shl\t%rtn0vas, %rtn1vas, %2 - ushl\t%rtn0vas, %rtn1vas, %rtn2vas - lsl\t%w0, %w1, %w2 - [(set_attr simd yes,yes,no) - (set_attr type neon_shift_immq, neon_shift_regq,shift_reg)] + ushl\t%rtn0vas, %rtn1vas, %rtn2vas + [(set_attr simd no,yes,yes) + (set_attr type shift_reg,neon_shift_immq, neon_shift_regq)] ) ;; Logical right shift using SISD or Integer instruction -- 1.9.1
Re: [gomp4, fortran] Patch to fix continuation checks of OpenACC and OpenMP directives
Hi! On Tue, 30 Jun 2015 03:39:42 +0300, Ilmir Usmanov m...@ilmir.us wrote: 08.06.2015, 17:59, Cesar Philippidis ce...@codesourcery.com: On 06/07/2015 02:05 PM, Ilmir Usmanov wrote: 08.06.2015, 00:01, Ilmir Usmanov m...@ilmir.us: This patch fixes checks of OpenMP and OpenACC continuations in case if someone mixes them (i.e. continues OpenMP directive with !$ACC sentinel or vice versa). Thanks for working on this! OK for gomp branch? The same applies to GCC trunk, as far as I can tell -- any reason not to apply the patch to trunk? Thanks for working on this. Does this fix PR63858 by any chance? No problem. I had a feeling that something is wrong in the scanner since I've committed an initial support of OpenACC ver. 1.0 to gomp branch (more than a year ago). Now it does fix the PR, because I've added support of fixed form to the patch. BTW, your test in the PR has a wrong continuation. Fixed test added to the patch. I understand correctly that your patch does fix https://gcc.gnu.org/PR63858, right? In that case, please assign the PR to yourself, and close it as fixed, once it is fixed in trunk. --- a/gcc/fortran/ChangeLog.gomp +++ b/gcc/fortran/ChangeLog.gomp @@ -1,3 +1,13 @@ +2015-06-30 Ilmir Usmanov m...@ilmir.us + + PR/63858 The format to use is PR [component]/[number], so here: PR fortran/63858. --- a/gcc/testsuite/ChangeLog.gomp +++ b/gcc/testsuite/ChangeLog.gomp @@ -1,3 +1,10 @@ +2015-06-30 Ilmir Usmanov m...@ilmir.us + + PR/63858 Same. --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/omp-fixed.f @@ -0,0 +1,32 @@ +! { dg-do compile } +! { dg-additional-options -fopenmp } For C, C++, we have a goacc-gomp directory, where both OpenACC and OpenMP are enabled -- does that make sense to have for Fortran, too? Grüße, Thomas signature.asc Description: PGP signature
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
Hi, On Mon, 27 Jul 2015, Richard Biener wrote: static inline tree gimple_assign_rhs1 (const_gimple gs) { GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return gimple_op (gs, 1); } and the hidden checking is due to gimple_op being static inline tree * gimple_ops (gimple gs) { size_t off; /* All the tuples have their operand vector at the very bottom of the structure. Note that those structures that do not have an operand vector have a zero offset. */ off = gimple_ops_offset_[gimple_statement_structure (gs)]; gcc_gimple_checking_assert (off != 0); I once experimented with making the multiple checking and removing the actual memory accesses by exposing the real content of gss_for_code_[] (for gss_for_code(), used by gimple_statement_structure()) and gimple_ops_offset_[] to the inlined functions, i.e. by defining them in gimple.h, not just declaring them. Still wouldn't help the -O0 case, but with -O2 only the first check for being GIMPLE_ASSIGN remains, everything else is constant folded. The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1). So I decided to invent a fancy as_a which reports errors similar to GIMPLE_CHECK and makes the code look like I'm confused. as_a already does checking: as_a (U *p) { gcc_checking_assert (is_a <T> (p)); return is_a_helper <T>::cast (p); } (where the is_a variant here has the same test as GIMPLE_CHECK). So, if you have the as_a call, you can as well simply remove the GIMPLE_CHECK, and be done with it, without ... const gassign *ass = GIMPLE_CHECK2<const gassign *> (gs); ... introducing this. What am I missing? Ciao, Michael.
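The shape of the `as_a` machinery under discussion, reduced to a self-contained sketch (toy type names; GCC's real versions live in is-a.h and gimple.h): the predicate is tested once at the downcast, and later accesses through the typed pointer need no further checking — which is exactly the duplication a GIMPLE_CHECK followed by `as_a` reintroduces.

```cpp
#include <cassert>

enum stmt_code { CODE_ASSIGN, CODE_CALL };

struct stmt        { stmt_code code; };
struct assign_stmt : stmt { int rhs1; };

// Predicate in the style of GCC's is_a<>: one cheap code comparison.
template <typename T> bool is_a (const stmt *);

template <> bool is_a<assign_stmt> (const stmt *s)
{
  return s->code == CODE_ASSIGN;
}

// Checked downcast in the style of GCC's as_a<>: assert the predicate
// once, then hand back a typed pointer so follow-up accessors need no
// re-check -- letting the optimizer fold away repeated checks.
template <typename T> const T *as_a (const stmt *s)
{
  assert (is_a<T> (s));
  return static_cast<const T *> (s);
}
```

With optimization on, once the typed pointer exists, accessors like a hypothetical `rhs1 ()` on the derived type compile to plain field loads with no residual code check.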
Re: [PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.
Kyrill Tkachov wrote: Hi Alan, Can you please add a comment on top of this saying that this builtin only exists to perform the lane check, just to make it explicit for the future. Done, and pushed as r226252. Charles, thanks for your patience, and I hope this lets you move forwards. I realize this leaves you slightly picking up the pieces in terms of what's there and what isn't, but in theory it corresponds to what we've done previously in AArch64 - give me a shout if you are unclear how to proceed! Thanks, Alan
Re: [PATCH][AArch64] Improve spill code - swap order in shr patterns
On Mon, Jul 27, 2015 at 03:38:55PM +0100, Wilco Dijkstra wrote: This is basically the same as the shl version but for the shr patterns. Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen eventhough some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shr pattern generates far more optimal spill code. 2015-07-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_mode3): Place integer variant first. (aarch64_ashr_sisd_or_int_mode3): Likewise. This ChangeLog looks odd to me. It should include details of the new alternative added to aarch64_lshr_sis_or_int_mode3, and also be formatted as so: * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_mode3): Place integer variant first. (aarch64_ashr_sisd_or_int_mode3): Likewise. Otherwise, this patch looks OK to me, so I've applied it on your behalf as revision 226253. 
Thanks, James --- gcc/config/aarch64/aarch64.md | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index baef56a..06256e3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3349,17 +3349,18 @@ ;; Logical right shift using SISD or Integer instruction (define_insn *aarch64_lshr_sisd_or_int_mode3 - [(set (match_operand:GPI 0 register_operand =w,w,r) + [(set (match_operand:GPI 0 register_operand =r,w,w,w) (lshiftrt:GPI - (match_operand:GPI 1 register_operand w,w,r) - (match_operand:QI 2 aarch64_reg_or_shift_imm_mode Uscmode,w,rUscmode)))] + (match_operand:GPI 1 register_operand r,w,w,w) + (match_operand:QI 2 aarch64_reg_or_shift_imm_mode rUscmode,Uscmode,w,0)))] @ + lsr\t%w0, %w1, %w2 ushr\t%rtn0vas, %rtn1vas, %2 # - lsr\t%w0, %w1, %w2 - [(set_attr simd yes,yes,no) - (set_attr type neon_shift_immq,neon_shift_regq,shift_reg)] + # + [(set_attr simd no,yes,yes,yes) + (set_attr type shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq)] ) (define_split @@ -3394,18 +3395,18 @@ ;; Arithmetic right shift using SISD or Integer instruction (define_insn *aarch64_ashr_sisd_or_int_mode3 - [(set (match_operand:GPI 0 register_operand =w,w,w,r) + [(set (match_operand:GPI 0 register_operand =r,w,w,w) (ashiftrt:GPI - (match_operand:GPI 1 register_operand w,w,w,r) - (match_operand:QI 2 aarch64_reg_or_shift_imm_di Uscmode,w,0,rUscmode)))] + (match_operand:GPI 1 register_operand r,w,w,w) + (match_operand:QI 2 aarch64_reg_or_shift_imm_di rUscmode,Uscmode,w,0)))] @ + asr\t%w0, %w1, %w2 sshr\t%rtn0vas, %rtn1vas, %2 # - # - asr\t%w0, %w1, %w2 - [(set_attr simd yes,yes,yes,no) - (set_attr type neon_shift_immq,neon_shift_regq,neon_shift_regq,shift_reg)] + # + [(set_attr simd no,yes,yes,yes) + (set_attr type shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq)] ) (define_split -- 1.9.1
Re: [PATCH] Refactoring masked built-in decls
On Mon, Jul 27, 2015 at 2:46 PM, Petr Murzin petrmurz...@gmail.com wrote: Hello, This patch converts mask type for masked builtins from signed to unsigned. Furthermore, several redundant builtins definitions were removed. Please have a look. It it ok for trunk? I gave the patch a quick look and didn't find anything problematic. Assuming the patch is bootstrapped and properly regression tested, it is OK for trunk. Thanks, Uros. 2015-07-27 Petr Murzin petr.mur...@intel.com * config/i386/i386.c (bdesc_special_args): Convert mask type from signed to unsigned for masked builtins. (ix86_expand_args_builtin): Do not handle UINT_FTYPE_V2DF, UINT64_FTYPE_V2DF, UINT64_FTYPE_V4SF, V16QI_FTYPE_V8DI, V16HI_FTYPE_V16SI, V16SI_FTYPE_V16SI, V16SF_FTYPE_FLOAT, V8HI_FTYPE_V8DI, V8UHI_FTYPE_V8UHI, V8SI_FTYPE_V8DI, V8SF_FTYPE_V8DF, V8DI_FTYPE_INT64, V8DI_FTYPE_V4DI, V8DI_FTYPE_V8DI, V8DF_FTYPE_DOUBLE, V8DF_FTYPE_V8SI, V16SI_FTYPE_V16SI_V16SI, V16SF_FTYPE_V16SF_V16SI, V8DI_FTYPE_V8DI_V8DI, V8DF_FTYPE_V8DF_V8DI, V4SI_FTYPE_V4SF_V4SF, V4SF_FTYPE_V4SF_UINT64, V2UDI_FTYPE_V4USI_V4USI, V2DI_FTYPE_V2DF_V2DF, V2DF_FTYPE_V2DF_UINT64, V4UDI_FTYPE_V8USI_V8USI, QI_FTYPE_V8DI_V8DI, HI_FTYPE_V16SI_V16SI, HI_FTYPE_HI_INT, V16SF_FTYPE_V16SF_V16SF_V16SF, V16SF_FTYPE_V16SF_V16SI_V16SF, V16SF_FTYPE_V16SI_V16SF_HI, V16SF_FTYPE_V16SI_V16SF_V16SF, V16SI_FTYPE_V16SF_V16SI_HI, V8DI_FTYPE_V8SF_V8DI_QI, V8SF_FTYPE_V8DI_V8SF_QI, V8DI_FTYPE_PV4DI, V8DF_FTYPE_V8DI_V8DF_QI, V16SI_FTYPE_V16SI_V16SI_V16SI, V2DI_FTYPE_V2DI_V2DI_V2DI, V8DI_FTYPE_V8DF_V8DI_QI, V8DF_FTYPE_PV4DF, V8SI_FTYPE_V8SI_V8SI_V8SI, V8DF_FTYPE_V8DF_V8DF_V8DF, UINT_FTYPE_V4SF, V8DF_FTYPE_V8DF_V8DI_V8DF, V8DF_FTYPE_V8DI_V8DF_V8DF, V8DF_FTYPE_V8SF_V8DF_QI, V8DI_FTYPE_V8DI_V8DI_V8DI, V16SF_FTYPE_PV4SF, V8SF_FTYPE_V8DF_V8SF_QI, V8SI_FTYPE_V8DF_V8SI_QI, V16SI_FTYPE_PV4SI, V2DF_FTYPE_V2DF_V4SF_V2DF_QI, V4SF_FTYPE_V4SF_V2DF_V4SF_QI, V8DI_FTYPE_V8DI_SI_V8DI_V8DI, QI_FTYPE_V8DF_V8DF_INT_QI, HI_FTYPE_V16SF_V16SF_INT_HI, V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI, 
VOID_FTYPE_PDOUBLE_V2DF_QI, VOID_FTYPE_PFLOAT_V4SF_QI, V2DF_FTYPE_PCDOUBLE_V2DF_QI, V4SF_FTYPE_PCFLOAT_V4SF_QI. * config/i386/i386-builtin-types.def (V16QI_FTYPE_V16SI): Remove. (V8DF_FTYPE_V8SI): Ditto. (V8HI_FTYPE_V8DI): Ditto. (V8SI_FTYPE_V8DI): Ditto. (V8SF_FTYPE_V8DF): Ditto. (V8SF_FTYPE_V8DF_V8SF_QI): Ditto. (V16HI_FTYPE_V16SI): Ditto. (V16SF_FTYPE_V16HI): Ditto. (V16SF_FTYPE_V16HI_V16SF_HI): Ditto. (V16SF_FTYPE_V16SI): Ditto. (V4DI_FTYPE_V4DI): Ditto. (V16SI_FTYPE_V16SF): Ditto. (V16SF_FTYPE_FLOAT): Ditto. (V8DF_FTYPE_DOUBLE): Ditto. (V8DI_FTYPE_INT64): Ditto. (V8DI_FTYPE_V4DI): Ditto. (V16QI_FTYPE_V8DI): Ditto. (UINT_FTYPE_V4SF): Ditto. (UINT64_FTYPE_V4SF): Ditto. (UINT_FTYPE_V2DF): Ditto. (UINT64_FTYPE_V2DF): Ditto. (V16SI_FTYPE_V16SI): Ditto. (V8DI_FTYPE_V8DI): Ditto. (V16SI_FTYPE_PV4SI): Ditto. (V16SF_FTYPE_PV4SF): Ditto. (V8DI_FTYPE_PV2DI): Ditto. (V8DF_FTYPE_PV2DF): Ditto. (V4DI_FTYPE_PV2DI): Ditto. (V4DF_FTYPE_PV2DF): Ditto. (V16SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV2SF): Ditto. (V8DI_FTYPE_PV4DI): Ditto. (V8DF_FTYPE_PV4DF): Ditto. (V8SF_FTYPE_FLOAT): Ditto. (V4SF_FTYPE_FLOAT): Ditto. (V4DF_FTYPE_DOUBLE): Ditto. (V8SF_FTYPE_PV4SF): Ditto. (V8SI_FTYPE_PV4SI): Ditto. (V4SI_FTYPE_PV2SI): Ditto. (V8SF_FTYPE_PV2SF): Ditto. (V8SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV8SF): Ditto. (V16SI_FTYPE_PV8SI): Ditto. (V8DI_FTYPE_V8SF): Ditto. (V4DI_FTYPE_V4SF): Ditto. (V2DI_FTYPE_V4SF): Ditto. (V64QI_FTYPE_QI): Ditto. (V32HI_FTYPE_HI): Ditto. (V8UHI_FTYPE_V8UHI): Ditto. (V16UHI_FTYPE_V16UHI): Ditto. (V32UHI_FTYPE_V32UHI): Ditto. (V2UDI_FTYPE_V2UDI): Ditto. (V4UDI_FTYPE_V4UDI): Ditto. (V8UDI_FTYPE_V8UDI): Ditto. (V4USI_FTYPE_V4USI): Ditto. (V8USI_FTYPE_V8USI): Ditto. (V16USI_FTYPE_V16USI): Ditto. (V2DF_FTYPE_V2DF_UINT64): Ditto. (V2DI_FTYPE_V2DF_V2DF): Ditto. (V2UDI_FTYPE_V4USI_V4USI): Ditto. (V8DF_FTYPE_V8DF_V8DI): Ditto. (V4SF_FTYPE_V4SF_UINT64): Ditto. (V4SI_FTYPE_V4SF_V4SF): Ditto. (V16SF_FTYPE_V16SF_V16SI): Ditto. (V64QI_FTYPE_V32HI_V32HI): Ditto. 
(V32HI_FTYPE_V16SI_V16SI): Ditto. (V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI): Ditto. (V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI): Ditto. (V32HI_FTYPE_V64QI_V64QI): Ditto. (V32HI_FTYPE_V32HI_V32HI): Ditto. (V16HI_FTYPE_V16HI_V16HI_INT_V16HI_HI): Ditto. (V16SI_FTYPE_V16SI_V4SI): Ditto. (V16SI_FTYPE_V16SI_V16SI): Ditto. (V16SI_FTYPE_V32HI_V32HI): Ditto. (V16SI_FTYPE_V16SI_SI): Ditto. (V8DI_FTYPE_V8DI_V8DI): Ditto. (V4UDI_FTYPE_V8USI_V8USI): Ditto. (V8DI_FTYPE_V16SI_V16SI): Ditto. (V8DI_FTYPE_V8DI_V2DI): Ditto. (QI_FTYPE_QI): Ditto. (SI_FTYPE_SI): Ditto. (DI_FTYPE_DI): Ditto. (QI_FTYPE_QI_QI): Ditto. (QI_FTYPE_QI_INT): Ditto. (HI_FTYPE_HI_INT): Ditto. (SI_FTYPE_SI_INT): Ditto. (DI_FTYPE_DI_INT): Ditto. (HI_FTYPE_V16QI_V16QI): Ditto. (SI_FTYPE_V32QI_V32QI): Ditto. (DI_FTYPE_V64QI_V64QI): Ditto. (QI_FTYPE_V8HI_V8HI): Ditto. (HI_FTYPE_V16HI_V16HI): Ditto. (SI_FTYPE_V32HI_V32HI): Ditto.
Re: [PATCH][RFC][match.pd] optimize (X C) == N when C is power of 2
On Mon, Jul 27, 2015 at 09:11:12AM +0100, Kyrill Tkachov wrote: On 25/07/15 03:19, Segher Boessenkool wrote: On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote: This transformation folds (X % C) == N into (X & ((1 << (size - 1)) | (C - 1))) == N for constants C and N where N is positive and C is a power of 2. For N = 0 you can transform it to ((unsigned)X % C) == N and for 0 < N < C you can transform it to X > 0 && ((unsigned)X % C) == N (or X >= 0) and for -C < N < 0 it is X < 0 && ((unsigned)X % C) == N + C (or X <= 0) and for other N it is 0. For N not a constant, well, do you really care? :-) (That second case might eventually fold to your original expression). Yeah, these avoid the potentially expensive mask, Fun fact: the current code ends up using the exact same mask, for some targets. but introduce more operations, which I believe may not be desirable at this stage. It is getting rid of the (expensive) division/modulo. In many cases it could get rid of the sign test, or hoist it to some outer structure, hard to test here though (at least, I have no idea how to do that). Unless these transformations are ok for match.pd I'll try to implement this transformation at RTL expansion time. If you have to do conditional jumps, the RTL optimisers will not be able to do very much :-( Segher
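The mask form and the modulo form can be cross-checked by brute force for one instance (C = 4, N = 1, assuming 32-bit int; the sign-bit mask is written in unsigned arithmetic to avoid shifting into the sign bit):

```cpp
#include <cassert>

// (X % C) == N with C a power of 2 and 0 < N < C ...
bool via_mod (int x)  { return x % 4 == 1; }

// ... equals (X & ((1 << (size - 1)) | (C - 1))) == N: the sign bit in
// the mask rejects negative X, whose truncated remainder is <= 0, while
// (C - 1) extracts the low bits that form the remainder of positive X.
bool via_mask (int x)
{
  return (static_cast<unsigned> (x) & (0x80000000u | 3u)) == 1u;
}
```

For example, via_mod(-3) is false (C++ truncating division gives -3 % 4 == -3), and the mask form agrees because the sign bit survives the AND.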
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
Hi Marek, On 27/07/15 13:41, Marek Polacek wrote: On Fri, Jul 24, 2015 at 03:10:17PM -0600, Jeff Law wrote: OK. Note I think the pt.c change will need trivial updating as it conflicts with another change of yours that I approved recently. Committed after resolving trivial merge conflict. This caused an arm bootstrap failure with -Werror. https://gcc.gnu.org/bugzilla/process_bug.cgi Thanks, Kyrill Thanks for review, Marek
Re: [PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.
Hi Alan, On 07/07/15 13:34, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01334.html + if (fcode == ARM_BUILTIN_NEON_LANE_CHECK) +{ + tree nlanes = CALL_EXPR_ARG (exp, 0); + gcc_assert (TREE_CODE (nlanes) == INTEGER_CST); + rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1)); + if (CONST_INT_P (lane_idx)) + neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp); + else + error ("%Klane index must be a constant immediate", exp); + /* Don't generate any RTL. */ + return const0_rtx; +} Can you please add a comment on top of this saying that this builtin only exists to perform the lane check, just to make it explicit for the future. Ok with a comment added. Thanks, Kyrill
Re: [PATCH 1/16][ARM] PR/63870 Add qualifier to check lane bounds in expand
Hi Alan, On 07/07/15 13:34, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html (While this falls under PR/63870, and I will link to that in the ChangeLog, it is only a small step towards fixing that PR.) This is ok for trunk. Thanks, Kyrill
Re: [gomp4] Remove device-specific filtering during parsing for OpenACC
On Fri, 17 Jul 2015 14:57:14 +0200 Thomas Schwinge tho...@codesourcery.com wrote: In combination with the equivalent change to gcc/cp/parser.c:cp_parser_oacc_all_clauses, gcc/c-family/c-omp.c:c_oacc_filter_device_types, and transitively also the struct identifier_hasher and c_oacc_extract_device_id function preceding it, are now unused. (Not an exhaustive list; have not checked which other auxiliary functions etc. Cesar has added in his device_type changes.) Does it make any sense to keep these for later, or dump them now? The attached patch removes this dead code... --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -12568,6 +12568,10 @@ c_finish_omp_clauses (tree clauses, bool oacc) pc = OMP_CLAUSE_CHAIN (c); continue; +case OMP_CLAUSE_DEVICE_TYPE: + pc = &OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c); + continue; + case OMP_CLAUSE_INBRANCH: case OMP_CLAUSE_NOTINBRANCH: if (branch_seen) From a quick glance only, this seems to be different from the C++ front end (have not checked Fortran). I have not looked at what the front end parsing is now actually doing; is it just attaching any clauses following a device_type clause to the latter? (The same should be done for all front ends, obviously. Even if it's not important right now, because of the sorry diagnostic that will be emitted later on as soon as there is one device_type clause, this should best be addressed now, while you still remember what's going on here ;-) so that there will be no bad surprises once we actually implement the handling in OMP lowering/streaming/device compilers.) Do we need to manually take care to finalize (c_finish_omp_clauses et al.) such masked clause chains, or will the right thing happen automatically? ...and fixes the C and C++ front ends to finalize parsed device_type clauses properly (although so far finalization doesn't do anything for the clauses that can be associated with a device_type clause anyway, so there's no actual change in behaviour).
I haven't moved the sorry reporting for the unsupported device_type clause to scan_sharing_clauses because it doesn't seem to be a particularly more logical place, and doing so breaks the tests that scan the omp-low dumps. I will apply to gomp4 branch as obvious, shortly. Thanks, Julian ChangeLog gcc/ * c-family/c-omp.c (c_oacc_extract_device_id, identifier_hasher) (c_oacc_filter_device_types): Remove dead code. * c/c-typeck.c (c_finish_omp_clauses): Add scanning for sub-clauses of device_type clause. * cp/semantics.c (finish_omp_clauses): Likewise. commit e24a9cd14d4b8b5dab8b37218b29844787809648 Author: Julian Brown jul...@codesourcery.com Date: Mon Jul 27 07:31:10 2015 -0700 Clause finalization cleanups and dead code removal. diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index b76de69..10190d7 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -1081,132 +1081,6 @@ c_omp_predetermined_sharing (tree decl) return OMP_CLAUSE_DEFAULT_UNSPECIFIED; } -/* Return a numerical code representing the device_type. Currently, - only device_type(nvidia) is supported. All device_type parameters - are treated as case-insensitive keywords. */ - -static int -c_oacc_extract_device_id (const char *device) -{ - if (!strcasecmp (device, "nvidia")) -return GOMP_DEVICE_NVIDIA_PTX; - else if (!strcmp (device, "*")) -return GOMP_DEVICE_DEFAULT; - return GOMP_DEVICE_NONE; -} - -struct identifier_hasher : ggc_cache_ptr_hash<tree_node> -{ - static hashval_t hash (tree t) { return htab_hash_pointer (t); } - static bool equal (tree a, tree b) - { -return !strcmp (IDENTIFIER_POINTER (a), IDENTIFIER_POINTER (b)); - } -}; - -/* Filter out the list of unsupported OpenACC device_types. */ - -tree -c_oacc_filter_device_types (tree clauses) -{ - tree c, prev; - tree dtype = NULL_TREE; - tree seen_nvidia = NULL_TREE; - tree seen_default = NULL_TREE; - hash_table<identifier_hasher> *dt_htab -= hash_table<identifier_hasher>::create_ggc (10); - - /* First scan for all device_type clauses.
*/ - for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) -{ - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE) - { - tree t; - - for (t = OMP_CLAUSE_DEVICE_TYPE_DEVICES (c); t; t = TREE_CHAIN (t)) - { - if (dt_htab->find (t)) - { - error_at (OMP_CLAUSE_LOCATION (c), - "duplicate device_type (%s)", - IDENTIFIER_POINTER (t)); - goto filter_dtype; - } - - int code = c_oacc_extract_device_id (IDENTIFIER_POINTER (t)); - - if (code == GOMP_DEVICE_DEFAULT) - seen_default = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c); - else if (code == GOMP_DEVICE_NVIDIA_PTX) - seen_nvidia = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c); - else - { - /* The OpenACC technical committee advises compilers - to silently ignore unknown devices. */ - } - - tree *slot = dt_htab->find_slot (t, INSERT); - *slot = t; - }
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
On July 27, 2015 5:18:55 PM GMT+02:00, Michael Matz m...@suse.de wrote: Hi, On Mon, 27 Jul 2015, Richard Biener wrote: static inline tree gimple_assign_rhs1 (const_gimple gs) { GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return gimple_op (gs, 1); } and the hidden checking is due to gimple_op being static inline tree * gimple_ops (gimple gs) { size_t off; /* All the tuples have their operand vector at the very bottom of the structure. Note that those structures that do not have an operand vector have a zero offset. */ off = gimple_ops_offset_[gimple_statement_structure (gs)]; gcc_gimple_checking_assert (off != 0); I once experimented with making the multiple checking and removing the actual memory accesses by exposing the real content of gss_for_code_[] (for gss_for_code(), used by gimple_statement_structure()) and gimple_ops_offset_[] to the inlined functions, i.e. by defining them in gimple.h, not just declaring them. Still wouldn't help the -O0 case, but with -O2 only the first check for being GIMPLE_ASSIGN remains, everything else is constant folded. The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1). So I decided to invent a fancy as_a which reports errors similar to GIMPLE_CHECK and makes the code look like I'm confused. as_a already does checking: as_a <T> (U *p) { gcc_checking_assert (is_a <T> (p)); return is_a_helper <T>::cast (p); } (where the is_a variant here has the same test as GIMPLE_CHECK). So, if you have the as_a call, you can as well simply remove the GIMPLE_CHECK, and be done with it, without ... const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs); ... introducing this. What am I missing? It would remove the fancy checking fail message in favor of a simple ICE. Otherwise nothing, of course. Oh, and the ability to separately turn on/off GIMPLE checking. Richard. Ciao, Michael.
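For readers unfamiliar with the idiom under discussion, here is a toy sketch of an is_a/as_a-style checked downcast (illustrative names, not GCC's actual gimple classes, which live in is-a.h and gimple.h):

```cpp
#include <cassert>

// A tiny statement hierarchy mimicking the gimple/gassign relationship.
enum stmt_code { CODE_NOP, CODE_ASSIGN };
struct stmt { stmt_code code; };
struct assign_stmt : stmt { int rhs1; };

// is_a-style predicate: does this stmt have the assign kind?
inline bool is_assign (const stmt *s) { return s->code == CODE_ASSIGN; }

// as_a-style checked cast: assert the kind, then downcast.  With checking
// enabled this is exactly one kind test; a GIMPLE_CHECK followed by a
// second as_a would test the kind twice -- the redundancy debated above.
inline const assign_stmt *as_assign (const stmt *s)
{
  assert (is_assign (s));
  return static_cast<const assign_stmt *> (s);
}
```

The design question in the thread is only whether the single check should report a verbose failure message (GIMPLE_CHECK style) or a plain assertion failure (as_a style).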
Re: offload data version number
On 07/24/15 09:32, Nathan Sidwell wrote: On 07/21/15 11:21, Nathan Sidwell wrote: I committed a version to gomp4 branch, but would still like to get this to trunk ASAP. I committed this update to the gomp4 branch to match the updated version currently under review for trunk. nathan 2015-07-27 Nathan Sidwell nat...@codesourcery.com include/ * gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Restore to zero. libgomp/ * libgomp.map: Rename GOMP_offload_{,un}register_2 to GOMP_offload_{,un}register_ver. * target.c (gomp_load_image_to_device): Check version function to determine interface to plugin. (gomp_unload_image_from_device): Likewise. (GOMP_offload_register_2): Rename to ... (GOMP_offload_register_ver): ... here. (GOMP_offload_unregister_2): Rename to ... (GOMP_offload_unregister_ver): ... here. (GOMP_offload_register, GOMP_offload_unregister): Adjust forwarding. (gomp_load_plugin_for_device): Reimplement DLSYM DLSYM_OPT macros. Look for versioning function and check it. Fetch versioning loader and unloader if available. * libgomp.h (gomp_device_descr): Add version function field. Put loader and unloader fields in unions. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_version): New. (GOMP_OFFLOAD_load_image_2): Renamed to ... (GOMP_OFFLOAD_load_image_ver): ... here. (GOMP_OFFLOAD_unload_image_2): Renamed to ... (GOMP_OFFLOAD_unload_image_ver): ... here. (GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): Delete. * oacc-host.c (host_dispatch): Adjust. gcc/ * config/nvptx/mkoffload.c (process): Rename _2 functions to _ver. 
Index: libgomp/libgomp.map === --- libgomp/libgomp.map (revision 226237) +++ libgomp/libgomp.map (working copy) @@ -236,8 +236,8 @@ GOMP_4.0.1 { GOMP_4.0.2 { global: - GOMP_offload_register_2; - GOMP_offload_unregister_2; + GOMP_offload_register_ver; + GOMP_offload_unregister_ver; } GOMP_4.0.1; OACC_2.0 { Index: libgomp/target.c === --- libgomp/target.c (revision 226237) +++ libgomp/target.c (working copy) @@ -671,17 +671,17 @@ gomp_load_image_to_device (unsigned vers struct addr_pair *target_table = NULL; int i, num_target_entries; - if (devicep->load_image_2_func) + if (devicep->version_func) num_target_entries - = devicep->load_image_2_func (version, devicep->target_id, -target_data, target_table); + = devicep->load_image.ver_func (version, devicep->target_id, + target_data, target_table); else if (GOMP_VERSION_DEV (version)) gomp_fatal ("Plugin too old for offload data (0 < %u)", GOMP_VERSION_DEV (version)); else num_target_entries - = devicep->load_image_func (devicep->target_id, - target_data, target_table); + = devicep->load_image.unver_func (devicep->target_id, + target_data, target_table); if (num_target_entries != num_funcs + num_vars) { @@ -790,11 +790,11 @@ gomp_unload_image_from_device (unsigned node = splay_tree_lookup (devicep->mem_map, k); } - if (devicep->unload_image_2_func) -devicep->unload_image_2_func (version, - devicep->target_id, target_data); + if (devicep->version_func) +devicep->unload_image.ver_func (version, +devicep->target_id, target_data); else -devicep->unload_image_func (devicep->target_id, target_data); +devicep->unload_image.unver_func (devicep->target_id, target_data); /* Remove mappings from splay tree. */ for (j = 0; j < num_funcs; j++) @@ -823,8 +823,8 @@ gomp_unload_image_from_device (unsigned the target, and TARGET_DATA needed by target plugin.
*/ void -GOMP_offload_register_2 (unsigned version, const void *host_table, - int target_type, const void *target_data) +GOMP_offload_register_ver (unsigned version, const void *host_table, + int target_type, const void *target_data) { int i; @@ -863,7 +863,7 @@ void GOMP_offload_register (const void *host_table, int target_type, const void *target_data) { - GOMP_offload_register_2 (0, host_table, target_type, target_data); + GOMP_offload_register_ver (0, host_table, target_type, target_data); } /* This function should be called from every offload image while unloading. @@ -871,8 +871,8 @@ GOMP_offload_register (const void *host_ the target, and TARGET_DATA needed by target plugin. */ void -GOMP_offload_unregister_2 (unsigned version, const void *host_table, - int target_type, const void *target_data) +GOMP_offload_unregister_ver (unsigned version, const void *host_table, + int target_type, const void *target_data) { int i; @@ -904,7 +904,7 @@ void GOMP_offload_unregister (const void *host_table, int target_type, const void *target_data) { - GOMP_offload_unregister_2 (0, host_table, target_type, target_data); + GOMP_offload_unregister_ver (0, host_table, target_type, target_data); } /* This function initializes the target device, specified
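The dispatch scheme in gomp_load_image_to_device can be sketched in isolation (simplified, hypothetical types; the real descriptor lives in libgomp.h): the presence of a version entry point selects which member of the loader union is valid.

```cpp
#include <cassert>
#include <cstddef>

// Simplified plugin descriptor: old plugins export only the unversioned
// loader; new ones also export a version query, in which case the
// versioned union member is the one that may be called.
typedef int (*load_ver_fn) (unsigned version, int device, const void *data);
typedef int (*load_unver_fn) (int device, const void *data);

struct device_descr
{
  unsigned (*version_func) (void);  // null for old-style plugins
  union { load_ver_fn ver_func; load_unver_fn unver_func; } load_image;
};

// Mirrors the devicep->version_func test in gomp_load_image_to_device.
int load_image (const device_descr *dev, unsigned version,
                int device, const void *data)
{
  if (dev->version_func)
    return dev->load_image.ver_func (version, device, data);
  return dev->load_image.unver_func (device, data);
}

// Dummy plugin entry points for demonstration only.
unsigned demo_version (void) { return 2; }
int demo_ver_loader (unsigned v, int, const void *) { return 100 + (int) v; }
int demo_unver_loader (int, const void *) { return -1; }
```

The union is safe precisely because the version query acts as the discriminant: exactly one member is ever stored and read for a given plugin.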
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. Trevor, maybe a quick note on the motivation would help here... jeff
[wwwdocs] Add -Wtautological-compare to changes.html
Another one. Committed to CVS. Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.17 diff -u -r1.17 changes.html --- changes.html 27 Jul 2015 11:16:15 - 1.17 +++ changes.html 27 Jul 2015 14:35:59 - @@ -48,6 +48,10 @@ overflows. <code>-Wshift-overflow=2</code> also warns about left-shifting 1 into the sign bit. This warning is enabled by default.</li> +<li>A new command-line option <code>-Wtautological-compare</code> has been + added for the C and C++ compilers, which warns if a self-comparison + always evaluates to true or false. This warning is enabled by + <code>-Wall</code>.</li> </ul> <h3 id="c">C</h3> Marek
[PATCH][AArch64] Improve spill code - swap order in shr patterns
This is basically the same as the shl version but for the shr patterns. Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shr pattern generates far more optimal spill code. 2015-07-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_<mode>3): Place integer variant first. (aarch64_ashr_sisd_or_int_<mode>3): Likewise. --- gcc/config/aarch64/aarch64.md | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index baef56a..06256e3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3349,17 +3349,18 @@ ;; Logical right shift using SISD or Integer instruction (define_insn "*aarch64_lshr_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=w,w,r") + [(set (match_operand:GPI 0 "register_operand" "=r,w,w,w") (lshiftrt:GPI - (match_operand:GPI 1 "register_operand" "w,w,r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))] + (match_operand:GPI 1 "register_operand" "r,w,w,w") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "rUs<cmode>,Us<cmode>,w,0")))] "@ + lsr\t%<w>0, %<w>1, %<w>2 ushr\t%<rtn>0<vas>, %<rtn>1<vas>, %2 # - lsr\t%<w>0, %<w>1, %<w>2 - [(set_attr "simd" "yes,yes,no") - (set_attr "type" "neon_shift_immq,neon_shift_regq,shift_reg")] + # + [(set_attr "simd" "no,yes,yes,yes") + (set_attr "type" "shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq")] ) (define_split @@ -3394,18 +3395,18 @@ ;; Arithmetic right shift using SISD or Integer instruction (define_insn "*aarch64_ashr_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=w,w,w,r") + [(set (match_operand:GPI 0 "register_operand" "=r,w,w,w") (ashiftrt:GPI - (match_operand:GPI 1 "register_operand" "w,w,w,r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "Us<cmode>,w,0,rUs<cmode>")))] + (match_operand:GPI 1 "register_operand" "r,w,w,w") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "rUs<cmode>,Us<cmode>,w,0")))] "@ + asr\t%<w>0, %<w>1, %<w>2 sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2 # - # - asr\t%<w>0, %<w>1, %<w>2 - [(set_attr "simd" "yes,yes,yes,no") - (set_attr "type" "neon_shift_immq,neon_shift_regq,neon_shift_regq,shift_reg")] + # + [(set_attr "simd" "no,yes,yes,yes") + (set_attr "type" "shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq")] ) (define_split -- 1.9.1
Re: [PATCH][AArch64] Improve spill code - swap order in shl pattern
On Mon, Jul 27, 2015 at 03:38:40PM +0100, Wilco Dijkstra wrote: ping Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shl pattern generates far more optimal spill code. A few more patterns are the wrong way around, which I'll address in a separate patch. I'm also looking into fixing lra-constraints to generate the expected code by taking the allocno class into account in the cost calculations during spilling. 2015-04-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_<mode>3): Place integer variant first. I approved this in April ( https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01652.html ). I've committed it on your behalf as revision 226247.
Thanks, James -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 27 April 2015 14:37 To: GCC Patches Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern --- gcc/config/aarch64/aarch64.md | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 7163025..baef56a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3334,17 +3334,17 @@ ;; Logical left shift using SISD or Integer instruction (define_insn "*aarch64_ashl_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=w,w,r") + [(set (match_operand:GPI 0 "register_operand" "=r,w,w") (ashift:GPI - (match_operand:GPI 1 "register_operand" "w,w,r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))] + (match_operand:GPI 1 "register_operand" "r,w,w") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "rUs<cmode>,Us<cmode>,w")))] "@ + lsl\t%<w>0, %<w>1, %<w>2 shl\t%<rtn>0<vas>, %<rtn>1<vas>, %2 - ushl\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas> - lsl\t%<w>0, %<w>1, %<w>2 - [(set_attr "simd" "yes,yes,no") - (set_attr "type" "neon_shift_immq, neon_shift_regq,shift_reg")] + ushl\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas> + [(set_attr "simd" "no,yes,yes") - (set_attr "type" "shift_reg,neon_shift_immq, neon_shift_regq")] ) ;; Logical right shift using SISD or Integer instruction -- 1.9.1
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
On Mon, Jul 27, 2015 at 05:01:30PM +0100, Kyrill Tkachov wrote: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67030 I hope the patch I've posted in the PR helps. Can you please check? Marek
Re: [PATCH][RFC][match.pd] optimize (X % C) == N when C is power of 2
On 07/27/2015 01:36 AM, Richard Biener wrote: I think it also aligns with the idea of pushing back RTL expansion and expose some target specifics after another GIMPLE lowering phase. I'm also thinking of addressing-mode selection and register promotion. So at least if we think of that target specific match.pd pass as containing all RTL expansion tricks done with TER only then it should be quite simple to make it work. Yea -- I was also thinking it might allow us to clean up some of the actual expansion code as well. It seems promising enough for some experimentation. jeff
Re: [PATCH] Honour DriverOnly for enum values in error messages.
On 07/27/2015 05:45 AM, Dominik Vogt wrote: The attached patch fixes a glitch in the error message generated for invalid values of enum options. When a DriverOnly option was passed directly to the compiler, it would still list that as valid in the error message, e.g. on s390: $ cc1 ... -march=native cc1: error: unrecognized argument in option ‘-march=native’ cc1: note: valid arguments to ‘-march=’ are: g5 g6 native z10 z13 z196 z9-109 z9-ec z900 z990 zEC12 The patched code prints DriverOnly enum values only when the driver generates the error message. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany 0001-ChangeLog /gcc/ChangeLog * opts-common.c (read_cmdline_option): List DriverOnly enum values as valid only in the error message of the driver, not in the messages of the language compilers. OK. jeff
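The fix can be pictured with a small sketch (illustrative names, not the actual opts-common.c code): when building the list of valid arguments for the diagnostic, values flagged DriverOnly are included only if the error is being reported by the driver.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One accepted value of an Enum option, with its DriverOnly flag.
struct enum_value { std::string name; bool driver_only; };

// Collect the names to print in the "valid arguments to ... are:" note.
// `in_driver` distinguishes gcc (the driver) from cc1 and friends.
std::vector<std::string>
valid_args (const std::vector<enum_value> &vals, bool in_driver)
{
  std::vector<std::string> out;
  for (const enum_value &v : vals)
    if (in_driver || !v.driver_only)
      out.push_back (v.name);
  return out;
}
```

With this filtering, cc1 no longer advertises "native" (which only the driver can expand) as a valid -march= argument.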
Re: [PATCH] Use lowpart_subreg instead of simplify_gen_subreg
On 07/26/2015 02:02 PM, Anatoliy Sokolov wrote: Hello. This patch replaces calls to simplify_gen_subreg (omode, x, imode, subreg_lowpart_offset (omode, imode)) with lowpart_subreg (omode, x, imode) and moves the lowpart_subreg function from loop-iv.c to simplify-rtx.c. Bootstrapped and reg-tested on x86_64-unknown-linux-gnu. OK for trunk? Anatoliy. 2015-07-26 Anatoly Sokolov ae...@post.ru * rtl.h (lowpart_subreg): Move in file. * loop-iv.c (lowpart_subreg): Move to... * simplify-rtx.c (lowpart_subreg): ...here. (simplify_binary_operation_1): Use lowpart_subreg instead of simplify_gen_subreg. * expr.c (expand_expr_real_2): Ditto. * emit-rtl.c (gen_lowpart_common): Ditto. * combine.c (gen_lowpart_for_combine): Ditto. * cfgexpand.c (convert_debug_memory_address, expand_debug_expr, expand_debug_source_expr): Ditto. OK. jeff
Re: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode
On 07/27/2015 04:25 AM, David Sherwood wrote: Hi, Part 1 of this change is a clean-up. I have changed calls to GET_MODE_INNER (m) so that it returns m in cases where there is no inner mode. This simplifies some of the calling code by removing the need to check for VOIDmode and allows calling it unconditionally. I also removed element_precision () as it was only called in one place and thought it neater to call GET_MODE_PRECISION explicitly. Parts 2-4 will include further tidy-ups and optimisations based on [1/N]. Good to go? Regards, David Sherwood. 2015-07-17 David Sherwood david.sherw...@arm.com gcc/ * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call GET_MODE_INNER unconditionally. * config/spu/spu.c (arith_immediate_p): Likewise. * config/i386/i386.c (ix86_build_signbit_mask): Likewise. New variable. * expmed.c (synth_mult): Remove check for VOIDmode result from GET_MODE_INNER. (expand_mult_const): Likewise. * fold-const.c (): Replace call to element_precision with call to GET_MODE_PRECISION. * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with m->name. (emit_mode_inner): Likewise. * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER result check. * machmode.h (GET_MODE_UNIT_SIZE): Simplify. (GET_MODE_UNIT_PRECISION): Likewise. * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally. * simplify-rtx.c (simplify_immed_subreg): Likewise. * stor-layout.c (bitwise_type_for_mode): Update assert. (element_precision): Remove. Somehow my brain kept translating INNER into NARROWER. Naturally I was having considerable trouble seeing how the patch could be correct ;-) Looking at insn-modes.h cleared things up quickly. In a lot of ways this makes GET_MODE_INNER act more like GET_MODE_NUNITS, which is probably good. You need to update the comment for GET_MODE_INNER in machmode.h to reflect the change in its return value for non-vector modes. With that update, this patch is fine. jeff
Re: [gomp4] Additional tests for routine directive
Thomas, The attached patch adds XFAILs so as to quiet the errors until such time as the development is complete. Committed to gomp-4_0-branch. Thank you, thank you Jim On 07/27/2015 08:36 AM, Thomas Schwinge wrote: Hi! On Fri, 24 Jul 2015 15:43:36 -0500, James Norris jnor...@codesourcery.com wrote: The attached patch adds additional test for the routine directive for C/C++/Fortran. Committed to gomp-4_0-branch. Thanks, but I see a number of FAILs, including the following: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors) UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 compilation failed to produce executable PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors) PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match foo not found FAIL: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 (test for excess errors) UNRESOLVED: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 compilation failed to produce executable [same for other torture testing flags] PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 (test for excess errors) PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 execution test FAIL: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 output pattern test, is , should match not found [same for other torture testing flags] (I have not reviewed your test case changes.) 
Grüße, Thomas diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c index 73ca528..a191758 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c @@ -1,5 +1,6 @@ /* { dg-do run } */ +/* { dg-xfail-if "foo not found" { openacc_host_selected } } */ #include <stdlib.h> @@ -28,5 +29,3 @@ main() return 0; } - -/* { dg-output "foo not found" { target openacc_host_selected } } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c index 6d0fbe3..4e34f78 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c @@ -1,7 +1,9 @@ /* { dg-do run } */ +/* { dg-warning "TODO implicit" { xfail *-*-* } 17 } */ +/* { dg-warning "TODO implicit" { xfail *-*-* } 27 } */ +/* { dg-xfail-if "unresolved symbol" { *-*-* } } */ -#include <stdio.h> #include <stdlib.h> #pragma acc routine bind (foo) diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 index 4b7b707..9ba6da8 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 @@ -1,4 +1,5 @@ ! { dg-do run } +! { dg-xfail-if "not found" { openacc_host_selected } } program main integer :: a, n @@ -25,4 +26,3 @@ end function end program main -! { dg-output "not found" { target openacc_host_selected } } diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 index 2060740..5c58b43 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 @@ -1,5 +1,6 @@ ! { dg-do run } +! { dg-error "Invalid TODO" { xfail *-*-* } 51 } program main integer, parameter :: n = 10
Re: [PATCH] Optimize certain end of loop conditions into min/max operation
On 07/27/2015 03:25 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison michael.colli...@linaro.org wrote: This patch is designed to optimize end-of-loop conditions of the form i < x && i < y into i < min (x, y). Loop conditions involving '>' are handled similarly using max (x, y). As an example: #define N 1024 int a[N], b[N], c[N]; void add (unsigned int m, unsigned int n) { unsigned int i, bound = (m < n) ? m : n; for (i = 0; i < m && i < n; ++i) a[i] = b[i] + c[i]; } Performed bootstrap and make check on: x86_64_unknown-linux-gnu, arm-linux-gnueabihf, and aarch64-linux-gnu. Okay for trunk? So this works only for && that has been lowered to non-CFG form (I suppose phiopt would catch that? If not, ifcombine would be the place to implement it I guess). phiopt is supposed to be generating MIN/MAX expressions for us. If it isn't it'd be good to see any testcases where it isn't. I think that raises a general question though. Does it make more sense to capture MIN/MAX (and others) in phiopt or in the match.pd framework? Jeff
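The claimed equivalence is easy to state and check directly: for unsigned i, m, n, the compound exit test equals a single comparison against the minimum (sketch with illustrative helper names):

```cpp
#include <algorithm>
#include <cassert>

// The loop's original exit condition.
bool cond_direct (unsigned i, unsigned m, unsigned n)
{
  return i < m && i < n;
}

// The optimized form: one comparison against min (m, n).
bool cond_min (unsigned i, unsigned m, unsigned n)
{
  return i < std::min (m, n);
}
```

The win is that the min can be hoisted out of the loop, leaving a single comparison per iteration instead of two comparisons and a logical AND.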
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 27/07/15 17:09, Jeff Law wrote: On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. I think it is initialized (I look at df_get_live_out earlier on in the call chain). I suppose what we want is for the live in regs for BB_B to not include any of the set regs in BB_A? snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. 
OK, I was working from the assumption that memoization ought not fail, but it seems that was a bad assumption on my part. So given that noce_try_cmove_arith can change the last insn and make it non-recognizable, this code seems reasonable. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. That's the SET_PREV_INSN, SET_NEXT_INSN functions, right? The current way the top-level noce_process_if_block is structured it expects the various ifcvt functions (like noce_try_cmove_arith) to generate a sequence, then it takes it, unshares it and removes the empty basic blocks. If we're to instead move insns around we'd need to further modify noce_process_if_block to handle differently this one case where we move insns instead of re-emitting them. I think this would make that function more convoluted than it needs to be. With the current approach we always call unshare_all_rtl_in_chain on the emitted sequence which should take care of any RTL sharing issues and in practice I don't expect to have more than 3-4 insns in these sequences since they will be guarded by the branch cost. So I would rather argue for re-emitting insns in this case to keep consistent with the dozen or so similar functions in ifcvt.c that already work that way. Thanks, Kyrill jeff
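The bitmap scheme described above can be sketched outside of RTL (simplified, hypothetical insn representation): mark every register set in BB_A, then reject if any source operand in BB_B reads one of them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy insn: one destination register and the registers its SET_SRC reads.
struct insn { int set_reg; std::vector<int> used_regs; };

// Returns true when no insn in B reads a register written in A, mirroring
// the walk over BB_A's SETs and the FOR_EACH_SUBRTX walk of BB_B's sources.
bool no_read_after_write (const std::vector<insn> &a,
                          const std::vector<insn> &b)
{
  std::uint64_t set_in_a = 0;   // one bit per (pseudo) register number
  for (const insn &i : a)
    set_in_a |= UINT64_C (1) << i.set_reg;
  for (const insn &i : b)
    for (int r : i.used_regs)
      if (set_in_a & (UINT64_C (1) << r))
        return false;
  return true;
}
```

This is one pass over each block, so the cost is linear in the number of operands, which matters little here since the branch-cost guard keeps these sequences to a handful of insns.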
[committed, PATCH]: Add more tests for PR target/66232
Index: ChangeLog === --- ChangeLog (revision 226252) +++ ChangeLog (working copy) @@ -1,3 +1,10 @@ +2015-07-27 H.J. Lu hongjiu...@intel.com + + * gcc.target/i386/pr66232-6.c: New tests. + * gcc.target/i386/pr66232-7.c: Likewise. + * gcc.target/i386/pr66232-8.c: Likewise. + * gcc.target/i386/pr66232-9.c: Likewise. + 2015-07-27 Marek Polacek pola...@redhat.com PR c++/66555 Index: gcc.target/i386/pr66232-6.c === --- gcc.target/i386/pr66232-6.c (revision 0) +++ gcc.target/i386/pr66232-6.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern void bar (void) __attribute__((visibility(hidden))); + +void +foo (void) +{ + bar (); +} + +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOTPCREL { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOT { target ia32 } } } */ Index: gcc.target/i386/pr66232-7.c === --- gcc.target/i386/pr66232-7.c (revision 0) +++ gcc.target/i386/pr66232-7.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern void bar (void) __attribute__((visibility(hidden))); + +int +foo (void) +{ + bar (); + return 0; +} + +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOTPCREL { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOT { target ia32 } } } */ Index: gcc.target/i386/pr66232-8.c === --- gcc.target/i386/pr66232-8.c (revision 0) +++ gcc.target/i386/pr66232-8.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern int bar (void) __attribute__((visibility(hidden))); + +int +foo (void) +{ + return bar (); +} + +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOTPCREL { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOT { target ia32 } } } */ Index: gcc.target/i386/pr66232-9.c === --- gcc.target/i386/pr66232-9.c (revision 0) +++ gcc.target/i386/pr66232-9.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern int bar (void) __attribute__((visibility(hidden))); + +int +foo (void) +{ + return bar () + 1; +} + +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOTPCREL { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOT { target ia32 } } } */
Re: [MOXIE] Hookize PRINT_OPERAND and PRINT_OPERAND_ADDRESS
On 07/26/2015 02:02 PM, Anatoliy Sokolov wrote: Hi. This patch removes obsolete PRINT_OPERAND and PRINT_OPERAND_ADDRESS macros from the MOXIE back end in the GCC and introduces equivalent TARGET_PRINT_OPERAND and TARGET_PRINT_OPERAND_ADDRESS target hooks. Regression tested on moxie-unknown-elf. OK for trunk? Anatoly. 2015-07-26 Anatoly Sokolov ae...@post.ru * config/moxie/moxie.h (PRINT_OPERAND, PRINT_OPERAND_ADDRESS): Remove macros. * config/moxie/moxie-protos.h (moxie_print_operand, moxie_print_operand_address): Remove declaration. * config/moxie/moxie.c (TARGET_PRINT_OPERAND, TARGET_PRINT_OPERAND_ADDRESS): Define. (moxie_print_operand, moxie_print_operand_address): Make static. OK. jeff
Re: RFA: RL78: Add an optimization to the addsi3 pattern
Ok, but please don't put non-public issue numbers in the comments.
Re: C++ delayed folding branch review
I've trimmed this to the previously mentioned issues that still need to be addressed; I'll do another full review after these are dealt with. On 06/13/2015 12:15 AM, Jason Merrill wrote: On 06/12/2015 12:11 PM, Kai Tietz wrote: @@ -1052,6 +1054,9 @@ adjust_temp_type (tree type, tree temp) { if (TREE_TYPE (temp) == type) return temp; + STRIP_NOPS (temp); + if (TREE_TYPE (temp) == type) +return temp; @@ -1430,6 +1438,8 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, bool reduced_constant_expression_p (tree t) { + /* Make sure we remove useless initial NOP_EXPRs. */ + STRIP_NOPS (t); Within the constexpr code we should be folding away NOPs as they are generated, they shouldn't live this long. Well, we might see them on overflows ... We shouldn't within the constexpr code. NOPs for expressions that are non-constant due to overflow are added in cxx_eval_outermost_constant_expr, so we shouldn't see them in the middle of constexpr evaluation. @@ -1088,7 +1093,10 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t, is_dummy_object (x)) { x = ctx-object; - x = cp_build_addr_expr (x, tf_warning_or_error); + if (x) + x = cp_build_addr_expr (x, tf_warning_or_error); + else + x = get_nth_callarg (t, i); This still should not be necessary. Yeah, most likely. But I got initially here some issues, so I don't see that this code would worsen things. If this code path is hit, that means something has broken my design, and I don't want to just paper over that. Please revert this change. case SIZEOF_EXPR: + if (processing_template_decl + (!COMPLETE_TYPE_P (TREE_TYPE (t)) + || TREE_CODE (TYPE_SIZE (TREE_TYPE (t))) != INTEGER_CST)) + return t; Why is this necessary? We don't want to resolve SIZEOF_EXPR within template-declarations for incomplete types, of if its size isn't fixed. Issue is that we otherwise get issues about expressions without existing type (as usual within template-declarations for some expressions). 
Yes, but we shouldn't have gotten this far with a dependent sizeof; maybe_constant_value just returns if instantiation_dependent_expression_p is true. @@ -3391,8 +3431,23 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, case CONVERT_EXPR: case VIEW_CONVERT_EXPR: case NOP_EXPR: +case UNARY_PLUS_EXPR: { + enum tree_code tcode = TREE_CODE (t); tree oldop = TREE_OPERAND (t, 0); + + if (tcode == NOP_EXPR TREE_TYPE (t) == TREE_TYPE (oldop) TREE_OVERFLOW_P (oldop)) + { + if (!ctx-quiet) + permerror (input_location, overflow in constant expression); + /* If we're being permissive (and are in an enforcing + context), ignore the overflow. */ + if (!flag_permissive) + *overflow_p = true; + *non_constant_p = true; + + return t; + } tree op = cxx_eval_constant_expression (ctx, oldop, Why doesn't the call to cxx_eval_constant_expression at the bottom here handle oldop having TREE_OVERFLOW set? I just handled the case that we see here a wrapping NOP_EXPR around an overflow. As this isn't handled by cxx_eval_constant_expression. How does it need to be handled? A NOP_EXPR wrapped around an overflow is there to indicated that the expression is non-constant, and it can't be simplified any farther. Please give an example of what was going wrong. @@ -565,6 +571,23 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p) switch (code) { +case SIZEOF_EXPR: + if (SIZEOF_EXPR_TYPE_P (*expr_p)) + *expr_p = cxx_sizeof_or_alignof_type (TREE_TYPE (TREE_OPERAND (*expr_p, + 0)), + SIZEOF_EXPR, false); + else if (TYPE_P (TREE_OPERAND (*expr_p, 0))) + *expr_p = cxx_sizeof_or_alignof_type (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + else + *expr_p = cxx_sizeof_or_alignof_expr (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + if (*expr_p == error_mark_node) + *expr_p = size_one_node; + + *expr_p = maybe_constant_value (*expr_p); + ret = GS_OK; + break; Why are these surviving until gimplification time? This might be still necessary. 
I will retest when bootstrap works. As we now added SIZEOF_EXPR folding to cp_fold, if we catch all expressions in which a sizeof can occur, this shouldn't be necessary anymore. AFAIR I saw here some issues about initialization of global variables, which weren't caught. Hmm, I wonder why you would see issues with global initializers that aren't seen on trunk? In any case, if the issue is with global initializers, they should be handled sooner, not here. @@ -608,9 +608,13 @@ cp_fold_convert (tree
Re: [committed] Typo in ipa-devirt.c
On July 27, 2015 9:35:35 PM GMT+02:00, Marek Polacek pola...@redhat.com wrote: This patch fixes an obvious typo in types_same_for_odr detected by the new -Wtautological-compare warning. Bootstrapped/regtested on x86_64-linux, applying to trunk. Might want to back port this. Looks like a latent wrong-code bug to me. Richard. 2015-07-27 Marek Polacek pola...@redhat.com * ipa-devirt.c (types_same_for_odr): Fix typo. diff --git gcc/ipa-devirt.c gcc/ipa-devirt.c index b7afc3b..0a92768 100644 --- gcc/ipa-devirt.c +++ gcc/ipa-devirt.c @@ -550,7 +550,7 @@ types_same_for_odr (const_tree type1, const_tree type2, bool strict) return false; if (TREE_CODE (type1) == RECORD_TYPE && (TYPE_BINFO (type1) == NULL_TREE) -!= (TYPE_BINFO (type1) == NULL_TREE)) +!= (TYPE_BINFO (type2) == NULL_TREE)) return false; if (TREE_CODE (type1) == RECORD_TYPE && TYPE_BINFO (type1) && (BINFO_VTABLE (TYPE_BINFO (type1)) == NULL_TREE) Marek
[committed] Update outer-4.c and uns-outer-4.c
Hi, this patch cleans up testcases autopar/outer-4.c and autopar/uns-outer-4.c. Committed as obvious. Thanks, - Tom Update outer-4.c and uns-outer-4.c 2015-07-27 Tom de Vries t...@codesourcery.com * gcc.dg/autopar/outer-4.c (parloop): Remove superfluous noinline attribute. Update comment. (main): Remove. Add scan for not parallelizing inner loop. * gcc.dg/autopar/uns-outer-4.c (parloop): Remove superfluous noinline attribute. (main): Remove. --- gcc/testsuite/gcc.dg/autopar/outer-4.c | 19 +++ gcc/testsuite/gcc.dg/autopar/uns-outer-4.c | 11 +-- 2 files changed, 8 insertions(+), 22 deletions(-) diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c index 2027499..681cf85 100644 --- a/gcc/testsuite/gcc.dg/autopar/outer-4.c +++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c @@ -6,15 +6,16 @@ void abort (void); int g_sum=0; int x[500][500]; -__attribute__((noinline)) -void parloop (int N) +void +parloop (int N) { int i, j; int sum; - /* Double reduction is currently not supported, outer loop is not - parallelized. Inner reduction is detected, inner loop is - parallelized. */ + /* The inner reduction is not recognized as reduction because we cannot assume + that int wraps on overflow. The way to fix this is to implement the + reduction operation in unsigned type, but we've not yet implemented + this. 
*/ sum = 0; for (i = 0; i N; i++) for (j = 0; j N; j++) @@ -23,13 +24,7 @@ void parloop (int N) g_sum = sum; } -int main(void) -{ - parloop(500); - - return 0; -} - +/* { dg-final { scan-tree-dump-times parallelizing inner loop 0 parloops } } */ /* { dg-final { scan-tree-dump-times parallelizing outer loop 1 parloops { xfail *-*-* } } } */ /* { dg-final { scan-tree-dump-times loopfn 4 optimized { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c index 8365a89..30ead25 100644 --- a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c +++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c @@ -6,7 +6,7 @@ void abort (void); unsigned int g_sum=0; unsigned int x[500][500]; -void __attribute__((noinline)) +void parloop (int N) { int i, j; @@ -23,14 +23,5 @@ parloop (int N) g_sum = sum; } -int -main (void) -{ - parloop (500); - - return 0; -} - - /* { dg-final { scan-tree-dump-times parallelizing outer loop 1 parloops { xfail *-*-* } } } */ /* { dg-final { scan-tree-dump-times loopfn 4 optimized } } */ -- 1.9.1
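The updated comment in outer-4.c turns on a subtle point: a signed int accumulation is not a wrap-around reduction, because signed overflow is undefined behaviour, while the same additions carried out in an unsigned type are defined to wrap modulo 2^N. A minimal sketch of the workaround the comment mentions (the function name is invented here; this is illustration, not the parloops implementation):

```c
#include <assert.h>

/* Perform the additions in unsigned arithmetic, which wraps
   modulo 2^N, then convert back.  The conversion back to int is
   implementation-defined for out-of-range values; GCC defines it
   as the expected modulo reduction.  */
static int
sum_wrapping (const int *a, int n)
{
  unsigned int s = 0;
  for (int i = 0; i < n; i++)
    s += (unsigned int) a[i];  /* Well-defined wrap-around.  */
  return (int) s;
}
```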
[C/C++ PATCH, committed] Fix -Wtautological-compare (PR bootstrap/67030)
This PR exposed a defect in my -Wtautological-compare warning: we don't want to warn when the expression comes from a macro expansion. Bootstrapped/regtested on x86_64-linux, applying to trunk. 2015-07-27 Marek Polacek pola...@redhat.com PR bootstrap/67030 * c-common.c (warn_tautological_cmp): Don't warn for macro expansion. * c-c++-common/Wtautological-compare-2.c: New test. diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c index 6a79b95..caa801e 100644 --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -1890,6 +1890,12 @@ warn_tautological_cmp (location_t loc, enum tree_code code, tree lhs, tree rhs) if (TREE_CODE_CLASS (code) != tcc_comparison) return; + /* Don't warn for various macro expansions. */ + if (from_macro_expansion_at (loc) + || from_macro_expansion_at (EXPR_LOCATION (lhs)) + || from_macro_expansion_at (EXPR_LOCATION (rhs))) +return; + /* We do not warn for constants because they are typical of macro expansions that test for features, sizeof, and similar. */ if (CONSTANT_CLASS_P (lhs) || CONSTANT_CLASS_P (rhs)) diff --git gcc/testsuite/c-c++-common/Wtautological-compare-2.c gcc/testsuite/c-c++-common/Wtautological-compare-2.c index e69de29..c8aecef 100644 --- gcc/testsuite/c-c++-common/Wtautological-compare-2.c +++ gcc/testsuite/c-c++-common/Wtautological-compare-2.c @@ -0,0 +1,15 @@ +/* PR bootstrap/67030 */ +/* { dg-do compile } */ +/* { dg-options -Wtautological-compare } */ + +extern int foo (void); + +#define A a +#define B A +#define FOO (A B) + +void +fn1 (int a) +{ + if (FOO); +} Marek
[committed] Typo in ipa-devirt.c
This patch fixes an obvious typo in types_same_for_odr detected by the new -Wtautological-compare warning. Bootstrapped/regtested on x86_64-linux, applying to trunk. 2015-07-27 Marek Polacek pola...@redhat.com * ipa-devirt.c (types_same_for_odr): Fix typo. diff --git gcc/ipa-devirt.c gcc/ipa-devirt.c index b7afc3b..0a92768 100644 --- gcc/ipa-devirt.c +++ gcc/ipa-devirt.c @@ -550,7 +550,7 @@ types_same_for_odr (const_tree type1, const_tree type2, bool strict) return false; if (TREE_CODE (type1) == RECORD_TYPE && (TYPE_BINFO (type1) == NULL_TREE) - != (TYPE_BINFO (type1) == NULL_TREE)) + != (TYPE_BINFO (type2) == NULL_TREE)) return false; if (TREE_CODE (type1) == RECORD_TYPE && TYPE_BINFO (type1) && (BINFO_VTABLE (TYPE_BINFO (type1)) == NULL_TREE) Marek
Re: [committed] Typo in ipa-devirt.c
On Mon, Jul 27, 2015 at 09:45:19PM +0200, Richard Biener wrote: On July 27, 2015 9:35:35 PM GMT+02:00, Marek Polacek pola...@redhat.com wrote: This patch fixes an obvious typo in types_same_for_odr detected by the new -Wtautological-compare warning. Bootstrapped/regtested on x86_64-linux, applying to trunk. Might want to back port this. Looks like a latent wrong-code bug to me. Yeah, I'm going to put this into 5 as well. 4.9 probably doesn't have this problem. Marek
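The class of bug the warning caught here, comparing a subexpression against itself instead of against its counterpart, can be modelled with a simplified stand-in for the TYPE_BINFO check (the function and parameter names are inventions for this sketch; plain pointers stand in for TYPE_BINFO):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two types should be reported as differing if exactly one of them
   lacks binfo.  */

static bool
binfo_mismatch_buggy (const void *binfo1, const void *binfo2)
{
  (void) binfo2;
  /* The typo: both sides test binfo1, so the comparison is
     tautologically false and the mismatch is never detected.  */
  return (binfo1 == NULL) != (binfo1 == NULL);
}

static bool
binfo_mismatch_fixed (const void *binfo1, const void *binfo2)
{
  return (binfo1 == NULL) != (binfo2 == NULL);
}
```

-Wtautological-compare flags exactly the buggy variant, which is how the latent wrong-code issue above surfaced.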
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
Jeff Law l...@redhat.com writes: On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. Trevor, maybe a quick note on the motivation would help here... I think at least we should use data hooks rather than function hooks, since this value should be a constant within a subtarget. It should only change for target_reinit. Alternatively we could have a new target_globals structure that is initialised with the result of calling the hook. If we do that though, it might make sense to consolidate the hooks rather than have one for every value. E.g. having one function for UNITS_PER_WORD, one for POINTER_SIZE, one for Pmode, etc., would lead to some very verbose target code. Perhaps the main problem with these approaches is ensuring that the value is set up early enough. Thanks, Richard
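The caching alternative raised above, calling each (possibly expensive) hook once per target switch and storing the result in a globals structure so hot code reads a field instead of making an indirect call every time, can be illustrated with a toy model. Every name below is invented for this sketch; in GCC the corresponding pieces are the targetm hook vector, target_globals and target_reinit.

```c
#include <assert.h>

/* Per-target hook vector: one function per value.  */
struct toy_target_hooks
{
  unsigned int (*pointer_size) (void);
  unsigned int (*units_per_word) (void);
};

/* Cached values, filled in once per target (re)initialisation.  */
struct toy_target_globals
{
  unsigned int pointer_size;
  unsigned int units_per_word;
};

/* Hooks for a hypothetical 64-bit subtarget.  */
static unsigned int toy_64_pointer_size (void) { return 64; }
static unsigned int toy_64_units_per_word (void) { return 8; }

static const struct toy_target_hooks toy_targetm =
{
  toy_64_pointer_size,
  toy_64_units_per_word
};

static struct toy_target_globals toy_globals;

/* Run once from a target_reinit analogue; afterwards all users read
   toy_globals directly instead of calling through the hook vector.  */
static void
toy_init_target_globals (const struct toy_target_hooks *hooks)
{
  toy_globals.pointer_size = hooks->pointer_size ();
  toy_globals.units_per_word = hooks->units_per_word ();
}
```

Consolidating several values into one structure, as suggested in the thread, keeps per-target code from turning into dozens of one-line hook functions.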
Re: [PATCH 1/4] convert ASM_OUTPUT_ASCII to a hook
tbsaunde+...@tbsaunde.org writes: +/* The routine used to output sequences of byte values. We use a special + version of this for most svr4 targets because doing so makes the + generated assembly code more compact (and thus faster to assemble) + as well as more readable. Note that if we find subparts of the + character sequence which end with NUL (and which are shorter than + ELF_STRING_LIMIT) we output those using ASM_OUTPUT_LIMITED_STRING. */ + +void +ix86_elf_output_ascii (FILE *f, const char *string, size_t length) +{ + const unsigned char *_ascii_bytes = (const unsigned char *) string; + const unsigned char *limit = _ascii_bytes + length; + unsigned bytes_in_chunk = 0; + for (; _ascii_bytes limit; _ascii_bytes++) +{ + const unsigned char *p; + if (bytes_in_chunk = 64) + { + fputc ('\n', f); + bytes_in_chunk = 0; + } + for (p = _ascii_bytes; p limit *p != '\0'; p++) + continue; + if (p limit (p - _ascii_bytes) = (long) ELF_STRING_LIMIT) + { + if (bytes_in_chunk 0) + { + fputc ('\n', f); + bytes_in_chunk = 0; + } + ASM_OUTPUT_LIMITED_STRING (f, (const char *) _ascii_bytes); + _ascii_bytes = p; + } + else + { + if (bytes_in_chunk == 0) + fputs (ASM_BYTE, f); + else + fputc (',', f); + fprintf (f, 0x%02x, *_ascii_bytes); \ + bytes_in_chunk += 5; + } _ascii_bytes = p; onwards seems to be indented too far. +DEFHOOK +(output_ascii, + A target hook to output an assembly instruction to assemble a string\n\ + constant containing the @var{length} bytes at @var{str}.\n\ +\n\ +The defalt hook uses the @code{.ascii} pseudo op as found in the Berkely Unix assembler., + void, (FILE *f, const char *str, size_t length), + default_output_ascii) Preexisting, but how about s/an assembly instruction/assembly instructions/ (or assembly directives?), just to emphasise that more than one instruction can be used. @@ -7586,6 +7586,37 @@ make_debug_expr_from_rtl (const_rtx exp) return dval; } +/* Default implementation of TARGET_ASM_OUTPUT_ASCII using .ascii. 
*/ + +void +default_output_ascii (FILE *f, const char *str, size_t length) +{ + const unsigned char *p = (const unsigned char *) str; + + fprintf (f, \t.ascii \); +for (unsigned int i = 0; i length; i++) + { Some strange indentation here too. Thanks, Richard
Re: [PR64164] drop copyrename, integrate into expand
On Jul 24, 2015, David Edelsohn dje@gmail.com wrote: On Fri, Jul 24, 2015 at 4:02 PM, Alexandre Oliva aol...@redhat.com wrote: On Jul 23, 2015, David Edelsohn dje@gmail.com wrote: I request that this patch be reverted (again). Might I kindly ask you to please do so for me. I've just found out that, after yesterday's memory upgrade on my local build machine, the filesystem that I normally use for GCC development got corrupted, and I don't want to mess with it before running an fsck which will take me a while. I have reverted the patch. Thank you very much. Long story short, the filesystem got corrupted beyond repair before I realized something was wrong, so I spent my weekend backing up the bits I still could and recreating it all from scratch. *fun* :-/ I even ran memtest before booting up, but everything was fine in the single-threaded tests it runs by default. It was only with all cores actively using memory intensely that something overheated (memory modules? chipset? cpu? no clue) and started randomly corrupting bits. So, I'm now back at lower memory clock speeds, and everything appears to be rock solid again. Phew! So, I'm back to debugging the newly-reported problems and thinking how much further I should extend testing coverage so that the next round doesn't have to be reverted again ;-) Thanks again, -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PR64164] drop copyrename, integrate into expand
On Mon, Jul 27, 2015 at 2:22 PM, Alexandre Oliva aol...@redhat.com wrote: On Jul 24, 2015, David Edelsohn dje@gmail.com wrote: On Fri, Jul 24, 2015 at 4:02 PM, Alexandre Oliva aol...@redhat.com wrote: On Jul 23, 2015, David Edelsohn dje@gmail.com wrote: I request that this patch be reverted (again). Might I kindly ask you to please do so for me. I've just found out that, after yesterday's memory upgrade on my local build machine, the filesystem that I normally use for GCC development got corrupted, and I don't want to mess with it before running an fsck which will take me a while. I have reverted the patch. Thank you very much. Long story short, the filesystem got corrupted beyond repair before I realized something was wrong, so I spend my weekend backing up the bits I still could and recreating it all from scratch. *fun* :-/ I even ran memtest before booting up, but everything was fine in the single-threaded tests it runs by default. It was only with all cores actively using memory intensely that something overheated (memory modules? chipset? cpu? no clue) and started randomly corrupting bits. So, I'm now back at lower memory clock speeds, and everything appears to be rock solid again. Phew! So, I'm back to debugging the The exactly same thing happened to my machine. It took me several weeks before I lowered memory clock. My machine has been running fine for over a year under very heavy load. BTW, this is what I use to test ia32 on Intel64: PATH=/usr/local32/bin:/bin:/usr/bin .../configure --prefix=/usr/6.0.0 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-libmpx i686-linux --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc where /usr/local32/bin has ia32 binutils. -- H.J.
Eliminate duplication of get_catalogs in different abi
Hi This is the patch to get rid of the duplication of the get_catalogs functions in the .so. I used c++locale_internal.h that seems to be there for this kind of purpose. * config/locale/gnu/messages_members.cc (Catalog_info, Catalogs): Move... * config/locale/gnu/c++locale_internal.h: ...here in std namespace. * config/locale/gnu/c_locale.cc: Move implementation of latter here. Tested under linux x86_64. Ok to commit ? François diff --git libstdc++-v3/config/locale/gnu/c++locale_internal.h libstdc++-v3/config/locale/gnu/c++locale_internal.h index f1959d6..eeac620 100644 --- libstdc++-v3/config/locale/gnu/c++locale_internal.h +++ libstdc++-v3/config/locale/gnu/c++locale_internal.h @@ -36,8 +36,13 @@ #include cstddef #include langinfo.h +#include vector +#include string.h // ::strdup + +#include ext/concurrence.h + #if __GLIBC__ 2 || (__GLIBC__ == 2 __GLIBC_MINOR__ 2) - + extern C __typeof(nl_langinfo_l) __nl_langinfo_l; extern C __typeof(strcoll_l) __strcoll_l; extern C __typeof(strftime_l) __strftime_l; @@ -61,3 +66,49 @@ extern C __typeof(wctype_l) __wctype_l; #endif #endif // GLIBC 2.3 and later + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + struct Catalog_info + { +Catalog_info(messages_base::catalog __id, const char* __domain, + locale __loc) + : _M_id(__id), _M_domain(strdup(__domain)), _M_locale(__loc) +{ } + +~Catalog_info() +{ free(_M_domain); } + +messages_base::catalog _M_id; +char* _M_domain; +locale _M_locale; + }; + + class Catalogs + { + public: +Catalogs() : _M_catalog_counter(0) { } +~Catalogs(); + +messages_base::catalog +_M_add(const char* __domain, locale __l); + +void +_M_erase(messages_base::catalog __c); + +const Catalog_info* +_M_get(messages_base::catalog __c) const; + + private: +mutable __gnu_cxx::__mutex _M_mutex; +messages_base::catalog _M_catalog_counter; +vectorCatalog_info* _M_infos; + }; + + Catalogs + get_catalogs(); + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace diff --git 
libstdc++-v3/config/locale/gnu/c_locale.cc libstdc++-v3/config/locale/gnu/c_locale.cc index 4612c64..0d6d204 100644 --- libstdc++-v3/config/locale/gnu/c_locale.cc +++ libstdc++-v3/config/locale/gnu/c_locale.cc @@ -31,9 +31,12 @@ #include locale #include stdexcept #include limits +#include algorithm #include langinfo.h #include bits/c++locale_internal.h +#include backward/auto_ptr.h + namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -170,6 +173,85 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __changed; } + struct _CatalogIdComp + { +bool +operator()(messages_base::catalog __cat, const Catalog_info* __info) const +{ return __cat __info-_M_id; } + +bool +operator()(const Catalog_info* __info, messages_base::catalog __cat) const +{ return __info-_M_id __cat; } + }; + + Catalogs::~Catalogs() + { +for (vectorCatalog_info*::iterator __it = _M_infos.begin(); + __it != _M_infos.end(); ++__it) + delete *__it; + } + + messages_base::catalog + Catalogs::_M_add(const char* __domain, locale __l) + { +__gnu_cxx::__scoped_lock lock(_M_mutex); + +// The counter is not likely to roll unless catalogs keep on being +// opened/closed which is consider as an application mistake for the +// moment. +if (_M_catalog_counter == numeric_limitsmessages_base::catalog::max()) + return -1; + +auto_ptrCatalog_info info(new Catalog_info(_M_catalog_counter++, + __domain, __l)); + +// Check if we managed to allocate memory for domain. +if (!info-_M_domain) + return -1; + +_M_infos.push_back(info.get()); +return info.release()-_M_id; + } + + void + Catalogs::_M_erase(messages_base::catalog __c) + { +__gnu_cxx::__scoped_lock lock(_M_mutex); + +vectorCatalog_info*::iterator __res = + lower_bound(_M_infos.begin(), _M_infos.end(), __c, _CatalogIdComp()); +if (__res == _M_infos.end() || (*__res)-_M_id != __c) + return; + +delete *__res; +_M_infos.erase(__res); + +// Just in case closed catalog was the last open. 
+if (__c == _M_catalog_counter - 1) + --_M_catalog_counter; + } + + const Catalog_info* + Catalogs::_M_get(messages_base::catalog __c) const + { +__gnu_cxx::__scoped_lock lock(_M_mutex); + +vectorCatalog_info*::const_iterator __res = + lower_bound(_M_infos.begin(), _M_infos.end(), __c, _CatalogIdComp()); + +if (__res != _M_infos.end() (*__res)-_M_id == __c) + return *__res; + +return 0; + } + + Catalogs + get_catalogs() + { +static Catalogs __catalogs; +return __catalogs; + } + _GLIBCXX_END_NAMESPACE_VERSION } // namespace diff --git libstdc++-v3/config/locale/gnu/messages_members.cc libstdc++-v3/config/locale/gnu/messages_members.cc index 2e6122d..90f4b8d
[committed] Simplify gather_scalar_reductions
Hi, this patch adds an early-out to gather_scalar_reductions, and further simplifies the function structure. Bootstrapped on x86_64. Reg-tested with relevant testcases. Committed as obvious. Thanks, - Tom Simplify gather_scalar_reductions 2015-07-27 Tom de Vries t...@codesourcery.com * tree-parloops.c (gather_scalar_reductions): Simplify function structure. --- gcc/tree-parloops.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 6e75cb1..daf23f2 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2370,6 +2370,8 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list loop_vec_info simple_loop_info; simple_loop_info = vect_analyze_loop_form (loop); + if (simple_loop_info == NULL) +return; for (gsi = gsi_start_phis (loop-header); !gsi_end_p (gsi); gsi_next (gsi)) { @@ -2381,15 +2383,16 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list if (virtual_operand_p (res)) continue; - if (!simple_iv (loop, loop, res, iv, true) - simple_loop_info) - { - gimple reduc_stmt - = vect_force_simple_reduction (simple_loop_info, phi, true, - double_reduc, true); - if (reduc_stmt !double_reduc) - build_new_reduction (reduction_list, reduc_stmt, phi); -} + if (simple_iv (loop, loop, res, iv, true)) + continue; + + gimple reduc_stmt + = vect_force_simple_reduction (simple_loop_info, phi, true, + double_reduc, true); + if (!reduc_stmt || double_reduc) + continue; + + build_new_reduction (reduction_list, reduc_stmt, phi); } destroy_loop_vec_info (simple_loop_info, true); -- 1.9.1
[committed] Remove unused decl in a test
Applied. 2015-07-27 Marek Polacek pola...@redhat.com * c-c++-common/Wtautological-compare-2.c: Remove unused declaration. diff --git gcc/testsuite/c-c++-common/Wtautological-compare-2.c gcc/testsuite/c-c++-common/Wtautological-compare-2.c index c8aecef..260d9f7 100644 --- gcc/testsuite/c-c++-common/Wtautological-compare-2.c +++ gcc/testsuite/c-c++-common/Wtautological-compare-2.c @@ -2,8 +2,6 @@ /* { dg-do compile } */ /* { dg-options -Wtautological-compare } */ -extern int foo (void); - #define A a #define B A #define FOO (A B) Marek
Re: PATCH: PR bootstrap/66978: [6 Regression] bootstrap failure with --with-multilib-list=m32,m64,mx32
On Jul 24, 2015, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Jul 24, 2015 at 10:53 AM, Alexandre Oliva aol...@redhat.com wrote: On Jul 24, 2015, H.J. Lu hongjiu...@intel.com wrote: Static chain returned from get_rtl_for_parm_ssa_default_def may not have Pmode. This patch converts static chain to Pmode if needed. Tested on Linux/x86-64. OK for trunk? Yeah, this looks good to me, thanks. Even the static chain parm_decl undergoes promotion through promote_decl_mode. I'd have placed the test and convert_to_mode call as an else to the if (!local), though. Ok with that change too. I tried it and it doesn't work due to mark_reg_pointer (local, TYPE_ALIGN (TREE_TYPE (TREE_TYPE (parm; mark_reg_pointer won't take a SUBREG. That is why I placed it after mark_reg_pointer . Is my orginal patch OK? Aah! Yes, thanks, I will fold it into a future updated version of the patch for PR64164. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: C++ delayed folding branch review
2015-07-27 18:51 GMT+02:00 Jason Merrill ja...@redhat.com: I've trimmed this to the previously mentioned issues that still need to be addressed; I'll do another full review after these are dealt with. Thanks for doing this summary of missing parts of prior review. On 06/13/2015 12:15 AM, Jason Merrill wrote: On 06/12/2015 12:11 PM, Kai Tietz wrote: @@ -1052,6 +1054,9 @@ adjust_temp_type (tree type, tree temp) { if (TREE_TYPE (temp) == type) return temp; + STRIP_NOPS (temp); + if (TREE_TYPE (temp) == type) +return temp; @@ -1430,6 +1438,8 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, bool reduced_constant_expression_p (tree t) { + /* Make sure we remove useless initial NOP_EXPRs. */ + STRIP_NOPS (t); Within the constexpr code we should be folding away NOPs as they are generated, they shouldn't live this long. Well, we might see them on overflows ... We shouldn't within the constexpr code. NOPs for expressions that are non-constant due to overflow are added in cxx_eval_outermost_constant_expr, so we shouldn't see them in the middle of constexpr evaluation. @@ -1088,7 +1093,10 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t, is_dummy_object (x)) { x = ctx-object; - x = cp_build_addr_expr (x, tf_warning_or_error); + if (x) + x = cp_build_addr_expr (x, tf_warning_or_error); + else + x = get_nth_callarg (t, i); This still should not be necessary. Yeah, most likely. But I got initially here some issues, so I don't see that this code would worsen things. If this code path is hit, that means something has broken my design, and I don't want to just paper over that. Please revert this change. case SIZEOF_EXPR: + if (processing_template_decl + (!COMPLETE_TYPE_P (TREE_TYPE (t)) + || TREE_CODE (TYPE_SIZE (TREE_TYPE (t))) != INTEGER_CST)) + return t; Why is this necessary? We don't want to resolve SIZEOF_EXPR within template-declarations for incomplete types, of if its size isn't fixed. 
Issue is that we otherwise get issues about expressions without existing type (as usual within template-declarations for some expressions). Yes, but we shouldn't have gotten this far with a dependent sizeof; maybe_constant_value just returns if instantiation_dependent_expression_p is true. @@ -3391,8 +3431,23 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, case CONVERT_EXPR: case VIEW_CONVERT_EXPR: case NOP_EXPR: +case UNARY_PLUS_EXPR: { + enum tree_code tcode = TREE_CODE (t); tree oldop = TREE_OPERAND (t, 0); + + if (tcode == NOP_EXPR TREE_TYPE (t) == TREE_TYPE (oldop) TREE_OVERFLOW_P (oldop)) + { + if (!ctx-quiet) + permerror (input_location, overflow in constant expression); + /* If we're being permissive (and are in an enforcing + context), ignore the overflow. */ + if (!flag_permissive) + *overflow_p = true; + *non_constant_p = true; + + return t; + } tree op = cxx_eval_constant_expression (ctx, oldop, Why doesn't the call to cxx_eval_constant_expression at the bottom here handle oldop having TREE_OVERFLOW set? I just handled the case that we see here a wrapping NOP_EXPR around an overflow. As this isn't handled by cxx_eval_constant_expression. How does it need to be handled? A NOP_EXPR wrapped around an overflow is there to indicated that the expression is non-constant, and it can't be simplified any farther. Please give an example of what was going wrong. 
@@ -565,6 +571,23 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p) switch (code) { + case SIZEOF_EXPR: + if (SIZEOF_EXPR_TYPE_P (*expr_p)) + *expr_p = cxx_sizeof_or_alignof_type (TREE_TYPE (TREE_OPERAND (*expr_p, + 0)), + SIZEOF_EXPR, false); + else if (TYPE_P (TREE_OPERAND (*expr_p, 0))) + *expr_p = cxx_sizeof_or_alignof_type (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + else + *expr_p = cxx_sizeof_or_alignof_expr (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + if (*expr_p == error_mark_node) + *expr_p = size_one_node; + + *expr_p = maybe_constant_value (*expr_p); + ret = GS_OK; + break; Why are these surviving until gimplification time? This might still be necessary. I will retest when bootstrap works. As we now added SIZEOF_EXPR folding to cp_fold, and if we catch all expressions in which a sizeof can occur, this shouldn't be necessary anymore. AFAIR I saw here some issues about initialization of global variables, which weren't caught.
Re: [patch] [fixincludes] Ignore .DS_Store junk files when running make check
On 7/6/15, Jeff Law l...@redhat.com wrote: On 07/05/2015 04:58 PM, Eric Gallager wrote: I was just matching the code that was already used there... should the lines to ignore the CVS and .svn folders be re-written into the style you propose, too? Might as well have a consistent style. Embedding them into the find ought to be marginally faster than a pipeline of fgrep processes. jeff Okay, I tried embedding ! -name CVS/ ! -name .svn/ into the find command, too, but that just led to warnings such as: find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `CVS/'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `CVS/''. find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `.svn/'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `.svn/''. I'm not really sure which of these alternatives to go with... cc-ing the fixincludes maintainer to see which he wants to go with...
Re: [PING][PATCH, PR66851] Handle double reduction in parloops
On 24/07/15 12:30, Tom de Vries wrote: On 13/07/15 16:55, Tom de Vries wrote: Hi, this patch fixes PR66851. In parloops, we manage to parallelize outer loops, but not if the inner loop contains a reduction. There is an xfail in autopar/outer-4.c for this: ... /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */ ... This patch allows outer loops with a reduction in the inner loop to be parallelized. Updated patch checks that we actually have an inner reduction that we can parallelize. So, uns-outer-4.c with unsigned int reduction will be parallelized, while outer-4.c with signed int reduction will not be parallelized. Bootstrapped on x86_64, reg-test in progress. OK for trunk? Thanks, - Tom Handle double reduction in parloops 2015-07-27 Tom de Vries t...@codesourcery.com * tree-parloops.c (reduc_stmt_res): New function. (initialize_reductions, add_field_for_reduction) (create_phi_for_local_result, create_loads_for_reductions) (create_stores_for_reduction, build_new_reduction): Handle case that reduc_stmt is a phi. (gather_scalar_reductions): Allow double_reduc reductions. * gcc.dg/autopar/uns-outer-4.c: Remove xfail on scan for parallelizing outer loop. * testsuite/libgomp.c/uns-outer-4.c: New test. --- gcc/testsuite/gcc.dg/autopar/uns-outer-4.c | 6 +-- gcc/tree-parloops.c| 73 ++ libgomp/testsuite/libgomp.c/uns-outer-4.c | 36 +++ 3 files changed, 102 insertions(+), 13 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/uns-outer-4.c diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c index 30ead25..5eb67ea 100644 --- a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c +++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c @@ -12,9 +12,7 @@ parloop (int N) int i, j; unsigned int sum; - /* Double reduction is currently not supported, outer loop is not - parallelized. Inner reduction is detected, inner loop is - parallelized.
*/ + /* Double reduction is detected, outer loop is parallelized. */ sum = 0; for (i = 0; i < N; i++) for (j = 0; j < N; j++) @@ -23,5 +21,5 @@ parloop (int N) g_sum = sum; } -/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */ /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */ diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index daf23f2..b06265c 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -549,6 +549,14 @@ take_address_of (tree obj, tree type, edge entry, return name; } +static tree +reduc_stmt_res (gimple stmt) +{ + return (gimple_code (stmt) == GIMPLE_PHI + ? gimple_phi_result (stmt) + : gimple_assign_lhs (stmt)); +} + /* Callback for htab_traverse. Create the initialization statement for reduction described in SLOT, and place it at the preheader of the loop described in DATA. */ @@ -575,7 +583,7 @@ initialize_reductions (reduction_info **slot, struct loop *loop) c = build_omp_clause (gimple_location (reduc->reduc_stmt), OMP_CLAUSE_REDUCTION); OMP_CLAUSE_REDUCTION_CODE (c) = reduc->reduction_code; - OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (gimple_assign_lhs (reduc->reduc_stmt)); + OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (reduc_stmt_res (reduc->reduc_stmt)); init = omp_reduction_init (c, TREE_TYPE (bvar)); reduc->init = init; @@ -982,7 +990,7 @@ add_field_for_reduction (reduction_info **slot, tree type) { struct reduction_info *const red = *slot; - tree var = gimple_assign_lhs (red->reduc_stmt); + tree var = reduc_stmt_res (red->reduc_stmt); tree field = build_decl (gimple_location (red->reduc_stmt), FIELD_DECL, SSA_NAME_IDENTIFIER (var), TREE_TYPE (var)); @@ -1042,12 +1050,12 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop) e = EDGE_PRED (store_bb, 1); else e = EDGE_PRED (store_bb, 0); - local_res = copy_ssa_name (gimple_assign_lhs (reduc->reduc_stmt)); + tree lhs = reduc_stmt_res (reduc->reduc_stmt);
+ local_res = copy_ssa_name (lhs); locus = gimple_location (reduc->reduc_stmt); new_phi = create_phi_node (local_res, store_bb); add_phi_arg (new_phi, reduc->init, e, locus); - add_phi_arg (new_phi, gimple_assign_lhs (reduc->reduc_stmt), - FALLTHRU_EDGE (loop->latch), locus); + add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus); reduc->new_phi = new_phi; return 1; @@ -1140,7 +1148,7 @@ create_loads_for_reductions (reduction_info **slot, struct clsn_data *clsn_data) struct reduction_info *const red = *slot; gimple stmt; gimple_stmt_iterator gsi; - tree type = TREE_TYPE (gimple_assign_lhs (red->reduc_stmt)); + tree type = TREE_TYPE (reduc_stmt_res (red->reduc_stmt)); tree load_struct; tree name; tree x; @@ -1205,7 +1213,7 @@ create_stores_for_reduction (reduction_info **slot,
Re: [PATCH] Cleaning up incomplete type warning.
On 5 January 2015 at 21:14, Jeff Law l...@redhat.com wrote: On 12/24/14 11:07, Luis Felipe Strano Moraes wrote: This removes an unnecessary static variable from the code and also makes it a single warning instead of two. Patch originally proposed by Manuel López-Ibáñez. Luis Strano clean_incomplete_warning.log gcc/c/ChangeLog: 2014-12-24 Luis Felipe Strano Moraes luis.str...@gmail.com * c-decl.c (get_parm_info): Clean up incomplete type warning. OK for the trunk assuming it passed bootstrap and regression testing. This was never committed. I regtested it and committed as https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=226274 Cheers, Manuel.
Re: [patch] [fixincludes] Ignore .DS_Store junk files when running make check
Eric Gallager eg...@gwmail.gwu.edu writes: Okay, I tried embedding ! -name CVS/ ! -name .svn/ into the find -name does an exact match, so you don't need the slash. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
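Putting Andreas's point together with the earlier warnings, a sketch of embedding the exclusions directly in find (hypothetical demo paths, not the fixincludes layout): -name matches a single path component, so no trailing slash is needed, and -prune skips the version-control directories entirely instead of piping through a chain of fgrep -v processes.

```shell
# Build a small demo tree (hypothetical paths).
mkdir -p /tmp/fixinc_demo/sub/CVS /tmp/fixinc_demo/sub/.svn
touch /tmp/fixinc_demo/sub/a.h /tmp/fixinc_demo/sub/CVS/Entries \
      /tmp/fixinc_demo/sub/.DS_Store

# Prune CVS/.svn directories and skip .DS_Store junk files in one find.
find /tmp/fixinc_demo \( -name CVS -o -name .svn \) -prune \
     -o -type f ! -name .DS_Store -print
```

Run as written, this prints only /tmp/fixinc_demo/sub/a.h.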
Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation
On Tue, Jun 23, 2015 at 3:05 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Mon, Jun 22, 2015 at 2:56 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Mon, Jun 22, 2015 at 1:29 PM, Jeff Law l...@redhat.com wrote: On 06/09/2015 11:31 AM, Patrick Palka wrote: This patch refactors the entry point of -Wmisleading-indentation from: void warn_for_misleading_indentation (location_t guard_loc, location_t body_loc, location_t next_stmt_loc, enum cpp_ttype next_tok_type, const char *guard_kind); to struct token_indent_info { location_t location; cpp_ttype type; rid keyword; }; void warn_for_misleading_indentation (const token_indent_info &guard_tinfo, const token_indent_info &body_tinfo, const token_indent_info &next_tinfo); The purpose of this refactoring is to expose more information to the -Wmisleading-indentation implementation to allow for more advanced heuristics and for better coverage. (I decided to keep the usage of const references because nobody seems to mind. Also I added a new header file, c-indentation.h.) gcc/c-family/ChangeLog: * c-indentation.h (struct token_indent_info): Define. (get_token_indent_info): Define. (warn_for_misleading_indentation): Declare. * c-common.h (warn_for_misleading_indentation): Remove. * c-indentation.c (warn_for_misleading_indentation): Change declaration to take three token_indent_infos. Adjust accordingly. * c-indentation.c (should_warn_for_misleading_indentation): Likewise. Bail out early if the body is a compound statement. (guard_tinfo_to_string): Define. gcc/c/ChangeLog: * c-parser.c (c_parser_if_body): Take token_indent_info argument. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. (c_parser_else_body): Likewise. (c_parser_if_statement): Likewise. (c_parser_while_statement): Likewise. (c_parser_for_statement): Likewise.
gcc/cp/ChangeLog: * parser.c (cp_parser_selection_statement): Move handling of semicolon body to ... (cp_parser_implicitly_scoped_statement): .. here. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. Take token_indent_info argument. (cp_parser_already_scoped_statement): Likewise. (cp_parser_selection_statement, cp_parser_iteration_statement): Extract a token_indent_info corresponding to the guard token. The only question in my mind is bootstrap regression testing. From reading the thread for the earlier version of this patch I got the impression you had bootstrapped and regression tested earlier versions. If you could confirm that you've bootstrapped and regression tested this version it'd be appreciated. You can do it on the individual patches or the set as a whole. I think I successfully bootstrapped + regtested this exact version but I'm not sure. I was going to do so again before committing anyway. I will fire off a build tonight and confirm the results tomorrow. Bootstrap + regtest on x86_64-linux-gnu was successful with no new regressions. Jeff Hi Jeff. Sorry for the late reply. I would like to commit this series now (after another bootstrap + regtest cycle) to work on further refinements to this warning. You clearly approved the last two out of three patches, but it is not clear whether you approved this first patch. Could you clarify your stance on this patch? Thanks again for reviewing.
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Hi Jeff, On 24/07/15 19:43, Jeff Law wrote: On 07/24/2015 03:31 AM, Kyrill Tkachov wrote: Wouldn't it be better to walk BB_A, gathering the set of all the registers modified, then do a single walk through BB_B testing for uses of those registers? I think so, yes. I'll try that. You might look at resource.c -- I haven't looked at it in a long time, but it might have many of the interfaces you need to do this kind of book keeping. I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. My other attempt was to go over BB_A and mark the set registers in a bitmap, then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working from the assumption that memoization ought not fail, but it seems that was a bad assumption on my part. So given noce_try_cmove_arith can change the last insn and make it non-recognizable, this code seems reasonable. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? Thanks, Kyrill 2.
Investigate a more efficient means to find set/use conflicts between the two blocks, possibly using resource.[ch] jeff
[PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode
Hi, Part 1 of this change is a clean-up. I have changed calls to GET_MODE_INNER (m) so that it returns m in cases where there is no inner mode. This simplifies some of the calling code by removing the need to check for VOIDmode and allows calling it unconditionally. I also removed element_precision () as it was only called in one place and thought it neater to call GET_MODE_PRECISION explicitly. Parts 2-4 will include further tidy-ups and optimisations based on [1/N]. Good to go? Regards, David Sherwood. 2015-07-17 David Sherwood david.sherw...@arm.com gcc/ * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call GET_MODE_INNER unconditionally. * config/spu/spu.c (arith_immediate_p): Likewise. * config/i386/i386.c (ix86_build_signbit_mask): Likewise. New variable. * expmed.c (synth_mult): Remove check for VOIDmode result from GET_MODE_INNER. (expand_mult_const): Likewise. * fold-const.c (): Replace call to element_precision with call to GET_MODE_PRECISION. * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with m->name. (emit_mode_inner): Likewise. * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER result check. * machmode.h (GET_MODE_UNIT_SIZE): Simplify. (GET_MODE_UNIT_PRECISION): Likewise. * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally. * simplify-rtx.c (simplify_immed_subreg): Likewise. * stor-layout.c (bitwise_type_for_mode): Update assert. (element_precision): Remove. mode_inner1.patch Description: Binary data
Re: [PATCH 1/4][ARM][PR target/65697][5.1] Backport stronger barriers for __sync fetch-op builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:12, Matthew Wahab wrote: The __sync builtins are implemented using barriers that are too weak for ARMv8 targets; this has been fixed on trunk for the ARM back-end. Since GCC-5.1 is also generating the incorrect code, it should also be fixed. This patch backports the changes made to strengthen the barriers emitted for the __sync fetch-and-op/op-and-fetch builtins. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01410.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01235.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-06-29 Matthew Wahab matthew.wa...@arm.com PR target/65697 * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with final barrier. From 0c2f209f869aead3475fe491f08cf7640d2bc8fe Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon, 29 Jun 2015 16:03:34 + Subject: [PATCH 1/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport PR target/65697 * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with final barrier.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225132 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/ChangeLog Change-Id: I2074541794ecad8847ada04690cd9132a51b6404 --- gcc/config/arm/arm.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 614ff0d..f694e74 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -27822,6 +27822,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, rtx_code_label *label; rtx x; + bool is_armv8_sync = arm_arch8 && is_mm_sync (model); + bool use_acquire = TARGET_HAVE_LDACQ && !(is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model)); @@ -27830,6 +27832,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, !(is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model)); + /* For ARMv8, a load-acquire is too weak for __sync memory orders. Instead, + a full barrier is emitted after the store-release. */ + if (is_armv8_sync) +use_acquire = false; + /* Checks whether a barrier is needed and emits one accordingly. */ if (!(use_acquire || use_release)) arm_pre_atomic_barrier (model); @@ -27900,7 +27907,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label)); /* Checks whether a barrier is needed and emits one accordingly. */ - if (!(use_acquire || use_release)) + if (is_armv8_sync + || !(use_acquire || use_release)) arm_post_atomic_barrier (model); } -- 1.9.1
Re: [PATCH 3/4][ARM][PR target/65697][5.1] Add tests for __sync_builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:17, Matthew Wahab wrote: This patch backports the tests added for code generated by the ARM back-end for the __sync builtins. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01412.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01237.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew gcc/testsuite 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-06-29 Matthew Wahab matthew.wa...@arm.com PR target/65697 * gcc.target/arm/armv8-sync-comp-swap.c: New. * gcc.target/arm/armv8-sync-op-acquire.c: New. * gcc.target/arm/armv8-sync-op-full.c: New. * gcc.target/arm/armv8-sync-op-release.c: New. From d1a53325eb47a5da55a8267deb5fbe168f7db4de Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon, 29 Jun 2015 16:12:12 + Subject: [PATCH 3/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport PR target/65697 * gcc.target/arm/armv8-sync-comp-swap.c: New. * gcc.target/arm/armv8-sync-op-acquire.c: New. * gcc.target/arm/armv8-sync-op-full.c: New. * gcc.target/arm/armv8-sync-op-release.c: New.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225134 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/ChangeLog Change-Id: I16c02786765bbbfbb287fba863ba27fb6a56ddc5 --- gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c | 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c | 8 4 files changed, 38 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c new file mode 100644 index 000..f96c81a --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-comp-swap.x" + +/* { dg-final { scan-assembler-times "ldrex" 2 } } */ +/* { dg-final { scan-assembler-times "stlex" 2 } } */ +/* { dg-final { scan-assembler-times "dmb" 2 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c new file mode 100644 index 000..8d6659b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-op-acquire.x" + +/* { dg-final { scan-assembler-times "ldrex" 1 } } */ +/* { dg-final { scan-assembler-times "stlex" 1 } } */ +/* { dg-final { scan-assembler-times "dmb" 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c new file mode 100644 index 000..a5ad3bd --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-op-full.x" + +/* { dg-final { scan-assembler-times "ldrex" 12 } } */ +/* { dg-final { scan-assembler-times "stlex" 12 } } */ +/* { dg-final { scan-assembler-times "dmb" 12 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c new file mode 100644 index 000..0d3be7b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-op-release.x" + +/* { dg-final { scan-assembler-times "stl" 1 } } */ -- 1.9.1
RE: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode
Hi, Sorry, I forgot to mention I tested this on: aarch64 and aarch64_be - no regressions in gcc testsuite x86_64 - bootstrap build, no testsuite regressions arm-none-eabi - no regressions in gcc testsuite I will also make sure that I can do cross builds on a variety of different targets. Regards, David. -Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 27 July 2015 11:25 To: gcc-patches@gcc.gnu.org Subject: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode Hi, Part 1 of this change is a clean-up. I have changed calls to GET_MODE_INNER (m) so that it returns m in cases where there is no inner mode. This simplifies some of the calling code by removing the need to check for VOIDmode and allows calling it unconditionally. I also removed element_precision () as it was only called in one place and thought it neater to call GET_MODE_PRECISION explicitly. Parts 2-4 will include further tidy-ups and optimisations based on [1/N]. Good to go? Regards, David Sherwood. 2015-07-17 David Sherwood david.sherw...@arm.com gcc/ * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call GET_MODE_INNER unconditionally. * config/spu/spu.c (arith_immediate_p): Likewise. * config/i386/i386.c (ix86_build_signbit_mask): Likewise. New variable. * expmed.c (synth_mult): Remove check for VOIDmode result from GET_MODE_INNER. (expand_mult_const): Likewise. * fold-const.c (): Replace call to element_precision with call to GET_MODE_PRECISION. * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with m->name. (emit_mode_inner): Likewise. * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER result check. * machmode.h (GET_MODE_UNIT_SIZE): Simplify. (GET_MODE_UNIT_PRECISION): Likewise. * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally. * simplify-rtx.c (simplify_immed_subreg): Likewise. * stor-layout.c (bitwise_type_for_mode): Update assert. (element_precision): Remove.
Re: [PATCH 2/4][ARM][PR target/65697][5.1] Backport stronger barriers for __sync,compare-and-swap builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:15, Matthew Wahab wrote: This patch backports the changes made to strengthen the barriers emitted for the __sync compare-and-swap builtins. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01411.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01236.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-06-29 Matthew Wahab matthew.wa...@arm.com PR target/65697 * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with final barrier. From fdcde1aa0b852f2a01bb45115e28f694b0225fcf Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon, 29 Jun 2015 16:09:10 + Subject: [PATCH 2/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport PR target/65697 * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with final barrier.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225133 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/ChangeLog Change-Id: Ifab505d792d6227c7d2231813dfb2e7826f0f450 --- gcc/config/arm/arm.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f694e74..1e67a73 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -27757,6 +27757,8 @@ arm_split_compare_and_swap (rtx operands[]) scratch = operands[7]; mode = GET_MODE (mem); + bool is_armv8_sync = arm_arch8 && is_mm_sync (mod_s); + bool use_acquire = TARGET_HAVE_LDACQ && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s) || is_mm_release (mod_s)); @@ -27765,6 +27767,11 @@ arm_split_compare_and_swap (rtx operands[]) !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s) || is_mm_acquire (mod_s)); + /* For ARMv8, the load-acquire is too weak for __sync memory orders. Instead, + a full barrier is emitted after the store-release. */ + if (is_armv8_sync) +use_acquire = false; + /* Checks whether a barrier is needed and emits one accordingly. */ if (!(use_acquire || use_release)) arm_pre_atomic_barrier (mod_s); @@ -27805,7 +27812,8 @@ arm_split_compare_and_swap (rtx operands[]) emit_label (label2); /* Checks whether a barrier is needed and emits one accordingly. */ - if (!(use_acquire || use_release)) + if (is_armv8_sync + || !(use_acquire || use_release)) arm_post_atomic_barrier (mod_s); if (is_mm_relaxed (mod_f)) -- 1.9.1
Re: [PATCH 4/4][ARM][PR target/65697][5.1] Fix tests for __sync_builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:18, Matthew Wahab wrote: This patch backports fixes for the __sync builtin tests. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00031.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-07/msg00025.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew gcc/testsuite 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-07-01 Matthew Wahab matthew.wa...@arm.com * gcc.target/arm/armv8-sync-comp-swap.c: Replace 'do-require-effective-target' with 'dg-require-effective-target'. * gcc.target/arm/armv8-sync-op-full.c: Likewise. * gcc.target/arm/armv8-sync-op-release.c: Likewise. * gcc.target/arm/armv8-sync-op-acquire.c: Likewise. Also, replace 'stlex' with 'strex' as the expected output. From d058686fe1027927a5fdfbb81a83526e3f9b9d6d Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed, 1 Jul 2015 12:16:01 + Subject: [PATCH 4/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport * gcc.target/arm/armv8-sync-comp-swap.c: Replace 'do-require-effective-target' with 'dg-require-effective-target'. * gcc.target/arm/armv8-sync-op-full.c: Likewise. * gcc.target/arm/armv8-sync-op-release.c: Likewise. * gcc.target/arm/armv8-sync-op-acquire.c: Likewise. Also, replace 'stlex' with 'strex' as the expected output. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225241 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/testsuite/ChangeLog Change-Id: I19f2013f7bdd2dd035f36f0f7c9829cf6a86fb8e --- gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c | 2 +- gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 4 ++-- gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 2 +- gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c index f96c81a..0e95986 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c index 8d6659b..c448599 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c @@ -1,10 +1,10 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ #include "../aarch64/sync-op-acquire.x" /* { dg-final { scan-assembler-times "ldrex" 1 } } */ -/* { dg-final { scan-assembler-times "stlex" 1 } } */ +/* { dg-final { scan-assembler-times "strex" 1 } } */ /* { dg-final { scan-assembler-times "dmb" 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c index a5ad3bd..cce9e00 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c index 0d3be7b..502a266 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ -- 1.9.1
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote: On Jul 27, 2015, at 2:26 AM, Jiong Wang jiong.w...@arm.com wrote: Andrew Pinski writes: On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote: James Greenhalgh writes: On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: Current IRA still uses both target macros in a few places. Tell IRA to use the order we defined rather than its own cost calculation. Allocate caller-saved registers first, then callee-saved. This is especially useful for LR/x30, as it's free to allocate and is pure caller-saved when used in a leaf function. Haven't noticed significant impact on benchmarks, but by grepping some keywords like Spilling, Push.*spill etc in the IRA rtl dump, the number is smaller. OK for trunk? OK, sorry for the delay. It might be mail client mangling, but please check that the trailing slashes line up in the version that gets committed. Thanks, James 2015-05-19 Jiong Wang jiong.w...@arm.com gcc/ PR 63521 * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define. (HONOR_REG_ALLOC_ORDER): Define. Patch reverted. I did not see a reason why this patch was reverted. Maybe I am missing an email or something. There are several execution regressions in the gcc testsuite; as far as I can see this patch exposed hidden bugs in those testcases, but there might be one other issue, so to be conservative I temporarily reverted this patch. If you are talking about: gcc.target/aarch64/aapcs64/func-ret-2.c execution etc., these test cases are too dependent on the original register allocation order and really can be safely ignored. Really these three tests should be moved or written in a more sane way. Yup, completely agreed - but the testcases do throw up something interesting. If we are allocating registers to hold 128-bit values, and we pick x7 as highest preference, we implicitly allocate x8 along with it.
I think we probably see the same thing if the first thing we do in a function is a structure copy through a back-end expanded movmem, which will likely begin with a 128-bit LDP using x7, x8. If the argument for this patch is that we prefer to allocate x7-x0 first, followed by x8, then we've potentially made a sub-optimal decision, our allocation order for 128-bit values is x7,x8,x5,x6 etc. My hunch is that we *might* get better code generation in this corner case out of some permutation of the allocation order for argument registers. I'm thinking something along the lines of {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... } I asked Jiong to take a look at that, and I agree with his decision to reduce the churn on trunk and just revert the patch until we've come to a conclusion based on some evidence - rather than just my hunch! I agree that it would be harmless on trunk from a testing point of view, but I think Jiong is right to revert the patch until we better understand the code-generation implications. Of course, it might be that I am completely wrong! If you've already taken a look at using a register allocation order like the example I gave and have something to share, I'd be happy to read your advice! Thanks, James
Re: [ARM] Correct spelling of references to ARMv6KZ
On 27/07/15 10:47, Matthew Wahab wrote: On 23/07/15 12:04, Kyrill Tkachov wrote: GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made visible by the command line option -march=armv6zk and by the predefined macro __ARM_ARCH_6ZK__. This patch corrects the spelling internally and adds -march=armv6kz. To preserve existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the architecture. Use of -march=arm6kz will need to wait for binutils to be updated,[..] diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index c715bb7..7873606 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -35,6 +35,7 @@ static struct vendor_cpu arm_cpu_table[] = { {0xb02, armv6k, mpcore}, {0xb36, armv6j, arm1136j-s}, {0xb56, armv6t2, arm1156t2-s}, +{0xb76, armv6kz, arm1176jz-s}, {0xb76, armv6zk, arm1176jz-s}, {0xc05, armv7-a, cortex-a5}, {0xc07, armv7ve, cortex-a7}, This table is scanned from beginning to end, checking for the first field. You introduce a duplicate here, so the second form will never be reached. I'd suggest removing the wrong spelling from here, but the re-written march string will be passed to the assembler, so if the assembler is old and doesn't support the correct spelling we'll get errors. So it seems like in order to preserve backwards compatibility we don't want to put the correctly spelled entry here :( But definitely add a comment here mentioning the deliberate oversight. Respun patch attached. I've removed armv6kz entry from config/arm/driver-arm.c and replaced it with a comment for the armv6zk entry. Tested for arm-none-linux-gnueabihf with native bootstrap and make check. Matthew gcc/ 2015-07-27 Matthew Wahab matthew.wa...@arm.com * config/arm/arm-arches.def: Add armv6kz. Replace 6ZK with 6KZ and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ. 
* config/arm/arm-c.c (arm_cpu_builtins): Emit __ARM_ARCH_6ZK__ for armv6kz targets. * config/arm/arm-cores.def: Replace 6ZK with 6KZ. * config/arm/arm-protos.h (FL_ARCH6KZ): New. (FL_FOR_ARCH6ZK): Remove. (FL_FOR_ARCH6KZ): New. (arm_arch6zk): New declaration. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.c (arm_arch6kz): New. (arm_option_override): Set arm_arch6kz. * config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ. * config/arm/driver-arm.c: Add comment to armv6zk entry. * doc/invoke.texi: Replace armv6zk with armv6kz. This is ok for trunk. Thanks, Kyrill
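The shadowing problem Kyrill points out, a linear first-match scan over the table, can be seen in a toy model (names and values here are illustrative, not GCC's actual structure):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy stand-in for the vendor/cpu table in driver-arm.c.  */
struct cpu_entry { const char *part; const char *arch; };

static const struct cpu_entry table[] = {
  { "0xb76", "armv6kz" },   /* hypothetical duplicate key...          */
  { "0xb76", "armv6zk" },   /* ...so this entry can never be reached  */
  { "0xc05", "armv7-a" },
  { NULL, NULL }
};

static const char *lookup (const char *part)
{
  for (const struct cpu_entry *p = table; p->part; p++)
    if (strcmp (p->part, part) == 0)
      return p->arch;   /* first match wins */
  return NULL;
}
```

Because the scan stops at the first matching part number, the second "0xb76" row is dead code; that is why the respun patch keeps only one spelling in the table and documents the deliberate omission in a comment.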
[PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
I noticed that the code we generate for a simple gimple_assign_rhs1 (stmt) is quite bad as we have two checking pieces. The implementation is now static inline tree gimple_assign_rhs1 (const_gimple gs) { GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return gimple_op (gs, 1); } and the hidden checking is due to gimple_op being static inline tree * gimple_ops (gimple gs) { size_t off; /* All the tuples have their operand vector at the very bottom of the structure. Note that those structures that do not have an operand vector have a zero offset. */ off = gimple_ops_offset_[gimple_statement_structure (gs)]; gcc_gimple_checking_assert (off != 0); return (tree *) ((char *) gs + off); } (ugly in itself considering that we just verified we have a gassign which has an operand vector as member...). gimple_op incurs two additional memory references to compute gimple_statement_structure and gimple_ops_offset_, plus the checking bits. The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1). So I decided to invent a fancy as_a which reports errors similar to GIMPLE_CHECK and makes the code look like const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs); return ass->op[1]; for some reason I needed to overload is_a_helper for const_gimple (I thought the is_a machinery would transparently support qualified types?). I'm missing a fallback for !ENABLE_GIMPLE_CHECKING as well as an overload for 'gimple' I guess. I just changed gimple_assign_rhs1 for now. So this is a RFC. I understand in the end the whole of GCC may use gassign everywhere we access gimple_assign_rhs1 but I don't see this happening soon, and this simple proof-of-concept patch already grows the unoptimized text size of gimple-match.c from 836080 to 836359 bytes (too bad) and shrinks the optimized size from 585731 to 193190 bytes (yay!) on x86_64 using trunk.
Some inlining must go very differently - we just have 544 calls to gimple_assign_rhs1 in gimple-match.c. .optimizes shows all calls remaining as bb 89: _ZL18gimple_assign_rhs1PK21gimple_statement_base.part.32 (def_stmt_47); which is tree_node* gimple_assign_rhs1(const_gimple) (const struct gimple_statement_base * gs) { bb 2: gimple_check_failed (gs_1(D), /space/rguenther/tramp3d/trunk/gcc/gimple.h[0], 2381, gimple_assign_rhs1[0], 6, 0); } so eventually we just do a better job with profile estimation. I think inliner-wise we are even as we get rid of the gimple_op () inline but instead get the GIMPLE_CHECK2 inline. So - any comments? Thanks, Richard. 2015-07-27 Richard Biener rguent...@suse.de * gimple.h (GIMPLE_CHECK2): New inline template function. (gassign::code_): New constant static member. (is_a_helperconst gassign *): Add. (gimple_assign_rhs1): Use GIMPLE_CHECK2 and a cheaper way to access operand 1. * gimple.c (gassign::code_): Define. Index: gcc/gimple.h === --- gcc/gimple.h(revision 226240) +++ gcc/gimple.h(working copy) @@ -51,6 +51,16 @@ extern void gimple_check_failed (const_g gimple_check_failed (__gs, __FILE__, __LINE__, __FUNCTION__, \ (CODE), ERROR_MARK); \ } while (0) +template typename T +static inline T +GIMPLE_CHECK2(const_gimple gs, const char *file = __builtin_FILE (), + int line = __builtin_LINE (), const char *fun = __builtin_FUNCTION ()) +{ + T ret = dyn_cast T (gs); + if (ret) +gimple_check_failed (gs, file, line, fun, T()-code_, ERROR_MARK); + return ret; +} #else /* not ENABLE_GIMPLE_CHECKING */ #define gcc_gimple_checking_assert(EXPR) ((void)(0 (EXPR))) #define GIMPLE_CHECK(GS, CODE) (void)0 @@ -832,6 +842,7 @@ struct GTY((tag(GSS_WITH_OPS))) struct GTY((tag(GSS_WITH_MEM_OPS))) gassign : public gimple_statement_with_memory_ops { + static const enum gimple_code code_ = GIMPLE_ASSIGN; /* no additional fields; this uses the layout for GSS_WITH_MEM_OPS. 
*/ }; @@ -864,6 +875,14 @@ is_a_helper gassign *::test (gimple gs template template inline bool +is_a_helper const gassign *::test (const_gimple gs) +{ + return gs-code == GIMPLE_ASSIGN; +} + +template +template +inline bool is_a_helper gbind *::test (gimple gs) { return gs-code == GIMPLE_BIND; @@ -2359,8 +2378,8 @@ gimple_assign_set_lhs (gimple gs, tree l static inline tree gimple_assign_rhs1 (const_gimple gs) { - GIMPLE_CHECK (gs, GIMPLE_ASSIGN); - return gimple_op (gs, 1); + const gassign *ass = GIMPLE_CHECK2const gassign * (gs); + return ass-op[1]; } Index: gcc/gimple.c === --- gcc/gimple.c(revision 226240) +++ gcc/gimple.c(working copy) @@ -89,6 +89,10 @@ static const char * const gimple_alloc_k
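Outside of GCC's class hierarchy, the checked-downcast idiom the RFC introduces, a dyn_cast whose failure path reports the caller's source location through __builtin_FILE/__builtin_LINE default arguments, can be sketched like this; every class and function name below is invented for illustration:

```cpp
#include <cassert>
#include <cstdio>

// Toy stand-ins for GCC's gimple / gassign statement classes.
struct stmt { int code; };
struct assign : stmt { enum { code_ = 1 }; };

// A checked downcast in the spirit of GIMPLE_CHECK2: the default
// arguments capture the *caller's* file and line, so a failure is
// reported where the bad access sits, not inside this helper.
template <typename T>
static const T *
check_cast (const stmt *s,
            const char *file = __builtin_FILE (),
            int line = __builtin_LINE ())
{
  if (s->code != T::code_)
    {
      std::fprintf (stderr, "%s:%d: statement code %d, expected %d\n",
                    file, line, s->code, (int) T::code_);
      return nullptr;   // a real implementation would abort here
    }
  return static_cast<const T *> (s);
}
```

On success the caller gets a pointer of the derived type and can touch its fields directly, which is exactly what lets the rewritten gimple_assign_rhs1 read op[1] without a second round of checking.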
Re: [PATCHv2] [fixincludes] Ignore .DS_Store junk files when running make check
On 7/27/15, Andreas Schwab sch...@linux-m68k.org wrote: Eric Gallager eg...@gwmail.gwu.edu writes: Okay, I tried embedding ! -name 'CVS/' ! -name '.svn/' into the find invocation. -name does an exact match, so you don't need the slash. Andreas. Okay, attached a new version of the patch; make check for fixincludes still passes with it. Could someone commit it for me? I don't have commit access. It's still pretty trivial, so I shouldn't need a copyright assignment for it, should I? Eric. fixincludes/check.tpl | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fixincludes/check.tpl b/fixincludes/check.tpl index 0d1f444..ffd2b66 100644 --- a/fixincludes/check.tpl +++ b/fixincludes/check.tpl @@ -141,9 +141,8 @@ echo $exitok` cd $TESTBASE -find * -type f -print | \ -fgrep -v 'CVS/' | \ -fgrep -v '.svn/' > ${TESTDIR}/LIST +find * -type f ! -name .DS_Store ! -name CVS ! -name .svn -print \ + > ${TESTDIR}/LIST exitok=` exec < ${TESTDIR}/LIST
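The effect of the revised find invocation, exact-name pruning instead of piping through fgrep, can be checked in isolation; the scratch file names below are illustrative:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkdir sub
touch keep.h sub/keep2.h sub/.DS_Store sub/CVS
# Exact-name exclusions, as in the patch; no trailing slash needed.
find * -type f ! -name .DS_Store ! -name CVS ! -name .svn -print | sort
# prints keep.h and sub/keep2.h; the junk files are skipped
```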
Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing
On Mon, Jul 27, 2015 at 4:45 AM, Jonathan Wakely jwak...@redhat.com wrote: On 26/07/15 05:20 -0700, Tim Shen wrote: @@ -389,7 +391,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif } - void + _StringT _M_add_collating_element(const _StringT __s) { auto __st = _M_traits.lookup_collatename(__s.data(), Changing this return type is an ABI change. When compiled with older versions of GCC this function will not return anything, so code compiled with newer versions that links to that older code will read garbage off the stack if linked to the old version. (This isn't a problem for _M_expression_term because the return type is part of the mangled name for function templates). One solution would be to rename the function, so that code compiled with the new GCC will use the new function, not link to the old version that doesn't return anything. Done by s/_M_add_collating_element/_M_add_collate_element/. -- Regards, Tim Shen commit 4a3b4ab285b0ecabb706a776551dabf5a37059aa Author: Tim Shen tims...@google.com Date: Sun Jul 26 04:37:45 2015 -0700 PR libstdc++/67015 * include/bits/regex_compiler.h (_Compiler::_M_expression_term, _BracketMatcher::_M_add_collating_element): Change signature to make checking the and of bracket expression easier. * include/bits/regex_compiler.tcc (_Compiler::_M_expression_term): Treat '-' as a valid literal if it's at the end of bracket expression. * testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc: New testcases. diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h index 4472116..0cb0c04 100644 --- a/libstdc++-v3/include/bits/regex_compiler.h +++ b/libstdc++-v3/include/bits/regex_compiler.h @@ -116,8 +116,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION void _M_insert_bracket_matcher(bool __neg); + // Returns true if successfully matched one term and should continue. + // Returns false if the compiler should move on. 
templatebool __icase, bool __collate - void + bool _M_expression_term(pairbool, _CharT __last_char, _BracketMatcher_TraitsT, __icase, __collate __matcher); @@ -389,8 +391,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif } - void - _M_add_collating_element(const _StringT __s) + _StringT + _M_add_collate_element(const _StringT __s) { auto __st = _M_traits.lookup_collatename(__s.data(), __s.data() + __s.size()); @@ -400,6 +402,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #ifdef _GLIBCXX_DEBUG _M_is_ready = false; #endif + return __st; } void diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc index 33d7118..9a62311 100644 --- a/libstdc++-v3/include/bits/regex_compiler.tcc +++ b/libstdc++-v3/include/bits/regex_compiler.tcc @@ -424,8 +424,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __last_char.first = true; __last_char.second = _M_value[0]; } - while (!_M_match_token(_ScannerT::_S_token_bracket_end)) - _M_expression_term(__last_char, __matcher); + while (_M_expression_term(__last_char, __matcher)); __matcher._M_ready(); _M_stack.push(_StateSeqT( *_M_nfa, @@ -434,21 +433,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION templatetypename _TraitsT templatebool __icase, bool __collate -void +bool _Compiler_TraitsT:: _M_expression_term(pairbool, _CharT __last_char, _BracketMatcher_TraitsT, __icase, __collate __matcher) { + if (_M_match_token(_ScannerT::_S_token_bracket_end)) + return false; + if (_M_match_token(_ScannerT::_S_token_collsymbol)) - __matcher._M_add_collating_element(_M_value); + { + auto __symbol = __matcher._M_add_collate_element(_M_value); + if (__symbol.size() == 1) + { + __last_char.first = true; + __last_char.second = __symbol[0]; + } + } else if (_M_match_token(_ScannerT::_S_token_equiv_class_name)) __matcher._M_add_equivalence_class(_M_value); else if (_M_match_token(_ScannerT::_S_token_char_class_name)) __matcher._M_add_character_class(_M_value, false); - // POSIX doesn't permit '-' as a start-range char (say [a-z--0]), - // 
except when the '-' is the first character in the bracket expression - // ([--0]). ECMAScript treats all '-' after a range as a normal character. - // Also see above, where _M_expression_term gets called. + // POSIX doesn't allow '-' as a start-range char (say [a-z--0]), + // except when the '-' is the first or last character in the bracket + // expression ([--0]). ECMAScript treats all '-' after a range as a
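The rule under discussion, that '-' is a literal when it is the first or last character of a bracket expression and a range delimiter otherwise, can be exercised with a small helper (this uses std::regex's default ECMAScript grammar, which already treats a leading or trailing '-' as a literal):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the single character c matches the bracket
// expression given in re.
static bool matches (const std::string &re, char c)
{
  return std::regex_match (std::string (1, c), std::regex (re));
}
```

For example, "[a-c-]" matches '-' and 'b' but not 'z', while in "[a-c]" the '-' only delimits the a-to-c range and does not match itself.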
Re: [PATCH] warn for unsafe calls to __builtin_return_address
So, my suggestion would be to warn for any call with a nonzero value. The current documentation says that you should only use nonzero values for debug purposes. A warning would help yes, how many people read the manual after all :-) Thank you both for the feedback. Attached is a simplified patch to issue a warning for all builtin_xxx_address calls with any non-zero argument. Martin gcc/ChangeLog 2015-07-27 Martin Sebor mse...@redhat.com * c-family/c.opt (-Wbuiltin-address): New warning option. * doc/invoke.texi (Wbuiltin-address): Document it. * doc/extend.texi (__builtin_frame_addrress, __builtin_return_addrress): Clarify possible effects of calling the functions with non-zero arguments and mention -Wbuiltin-address. * builtins.c (expand_builtin_frame_address): Handle -Wbuiltin-address. gcc/testsuite/ChangeLog 2015-07-27 Martin Sebor mse...@redhat.com * g++.dg/Wbuiltin-address-in-Wall.C: New test. * g++.dg/Wbuiltin-address.C: New test. * g++.dg/Wno-builtin-address.C: New test. * gcc.dg/Wbuiltin-address-in-Wall.c: New test. * gcc.dg/Wbuiltin-address.c: New test. * gcc.dg/Wno-builtin-address.c: New test. diff --git a/gcc/builtins.c b/gcc/builtins.c index e8fe3db..48379f8 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -4564,34 +4564,38 @@ expand_builtin_frame_address (tree fndecl, tree exp) { /* The argument must be a nonnegative integer constant. It counts the number of frames to scan up the stack. - The value is the return address saved in that frame. */ + The value is either the frame pointer value or the return + address saved in that frame. */ if (call_expr_nargs (exp) == 0) /* Warning about missing arg was already issued. */ return const0_rtx; else if (! 
tree_fits_uhwi_p (CALL_EXPR_ARG (exp, 0))) { - if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS) - error (invalid argument to %__builtin_frame_address%); - else - error (invalid argument to %__builtin_return_address%); + error (invalid argument to %qD, fndecl); return const0_rtx; } else { - rtx tem - = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl), - tree_to_uhwi (CALL_EXPR_ARG (exp, 0))); + /* Number of frames to scan up the stack. */ + const unsigned HOST_WIDE_INT count = tree_to_uhwi (CALL_EXPR_ARG (exp, 0)); + + rtx tem = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl), count); /* Some ports cannot access arbitrary stack frames. */ if (tem == NULL) { - if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS) - warning (0, unsupported argument to %__builtin_frame_address%); - else - warning (0, unsupported argument to %__builtin_return_address%); + warning (0, invalid argument to %qD, fndecl); return const0_rtx; } + if (0 count) + { + /* Warn since no effort is made to ensure that any frame + beyond the current one exists or can be safely reached. */ + warning (OPT_Wbuiltin_address, calling %qD with + a non-zero argument is unsafe, fndecl); + } + /* For __builtin_frame_address, return what we've got. 
*/ if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS) return tem; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 285952e..bdff624 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -295,6 +295,10 @@ Wbool-compare C ObjC C++ ObjC++ Var(warn_bool_compare) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Warn about boolean expression compared with an integer value different from true/false +Wbuiltin-address +C ObjC C++ ObjC++ Var(warn_builtin_address) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) +Warn when __builtin_frame_address or __builtin_return_address is used unsafely + Wbuiltin-macro-redefined C ObjC C++ ObjC++ CPP(warn_builtin_macro_redefined) CppReason(CPP_W_BUILTIN_MACRO_REDEFINED) Var(cpp_warn_builtin_macro_redefined) Init(1) Warning Warn when a built-in preprocessor macro is undefined or redefined diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 9ad2b68..1b4596c 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8562,8 +8562,11 @@ to determine if the top of the stack has been reached. Additional post-processing of the returned value may be needed, see @code{__builtin_extract_return_addr}. -This function should only be used with a nonzero argument for debugging -purposes. +Calling this function with a nonzero argument can have unpredictable +effects, including crashing the calling program. As a result, calls +that are considered unsafe are diagnosed when the @option{-Wbuiltin-address} +option is in effect. Such calls are typically only useful in debugging +situations. @end deftypefn @deftypefn {Built-in Function} {void *} __builtin_extract_return_addr (void *@var{addr}) @@ -8601,8 +8604,11 @@ any function other than the current one; in such cases, or when the top of the stack has been reached, this
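A minimal illustration of the distinction the warning draws (the wrapper name here is made up): level 0 asks for the current frame's return address, which is always available, while nonzero levels walk caller frames that may not exist or be safely reachable.

```c
#include <assert.h>

/* Level 0 is the only argument the documentation blesses outside of
   debugging, and the only one the proposed -Wbuiltin-address warning
   stays quiet about.  noinline keeps the frame real.  */
__attribute__ ((noinline)) void *
current_return_address (void)
{
  return __builtin_return_address (0);
}

/* By contrast, __builtin_return_address (1) or higher scans up the
   stack with no guarantee the frames exist; that is the unsafe case
   the patch diagnoses.  */
```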
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On Mon, Jul 27, 2015 at 10:14:41AM -0600, Jeff Law wrote: On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. So my some what more intermediate goal was to have less files including target specific headers, eventually getting to the point we might be able to build a large part of the compiler independent of the target. Of course I think in the end really switchable targets would be nice. Trev Trevor, maybe a quick note on the motivation would help here... jeff
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On Mon, Jul 27, 2015 at 09:05:08PM +0100, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. Trevor, maybe a quick note on the motivation would help here... I think at least we should use data hooks rather than function hooks, since this value should be a constant within a subtarget. It should only change for target_reinit. I agree in principle, but I wasn't sure where all I might need to change the values of the hooks, and of course I wondered if there might be some crazy target where that's not good enough. Alternatively we could have a new target_globals structure that is initialised with the result of calling the hook. If we do that though, it might make sense to consolidate the hooks rather than have one for every value. E.g. having one function for UNITS_PER_WORD, one for POINTER_SIZE, one for Pmode, etc., would lead to some very verbose target code. so something like struct target_types { unsigned long pointer_size; ... }; const target_types &targetm.get_type_data () ? that seems pretty reasonable, and I wouldn't expect too many ordering issues, but who knows.
It's too bad nobody has taken on the big job of turning targetm into a class so we can hope for some devirtualization help from the compiler. thanks! Trev Perhaps the main problem with these approaches is ensuring that the value is set up early enough. Thanks, Richard
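Trevor's consolidated "data hook" idea, one indirect call returning a struct of per-target constants instead of one call per value, could look roughly like this; all of the names below are hypothetical, not actual GCC interfaces:

```c
#include <assert.h>

/* All per-target type constants fetched through a single hook.  Hot
   code can cache the pointer once and pay a field load per access
   instead of an indirect call per value.  */
struct target_type_data
{
  unsigned int pointer_size;    /* in bits */
  unsigned int units_per_word;
};

struct target_hooks
{
  const struct target_type_data *(*get_type_data) (void);
};

/* What a hypothetical LP64 target's implementation might provide.  */
static const struct target_type_data lp64_data = { 64, 8 };

static const struct target_type_data *
lp64_get_type_data (void)
{
  return &lp64_data;
}

static const struct target_hooks targetm = { lp64_get_type_data };
```

A switchable target would swap the struct pointer (or the hook) at target_reinit time, which is the ordering concern Richard raises at the end of the thread.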
Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function
On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote: On Sat, Jul 25, 2015 at 4:37 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h, config/ia64/ia64-protos.h, config/ia64/ia64.c, config/ia64/ia64.h, config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h, config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h, config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to the name of a function. * output.h (default_output_label): New prototype. * varasm.c (default_output_label): New function. * vmsdbgout.c: Include tm_p.h. * xcoffout.c: Likewise. Just a general remark - the GCC output machinery is known to be slow, adding indirect calls might be not the very best idea without refactoring some of it. ah, I wasn't aware of that. On the other hand some of these hooks seem to generally be big so the call over head might not matter that much. I suppose if this is something we really care about we might want to consider pushing the libgas project farther so we can avoid all this text formatting all together. Did you do any performance measurements for artificial testcases exercising the specific bits you change? no since I wasn't aware it was a concern. I can try to do that in the next couple days. thanks! Trev Richard. 
--- gcc/config/arc/arc.h | 3 +-- gcc/config/bfin/bfin.h| 5 + gcc/config/frv/frv.h | 6 +- gcc/config/ia64/ia64-protos.h | 1 + gcc/config/ia64/ia64.c| 11 +++ gcc/config/ia64/ia64.h| 8 +--- gcc/config/lm32/lm32.h| 3 +-- gcc/config/mep/mep.h | 8 +--- gcc/config/mmix/mmix.h| 3 +-- gcc/config/pa/pa-protos.h | 1 + gcc/config/pa/pa.c| 12 gcc/config/pa/pa.h| 9 + gcc/config/rs6000/rs6000-protos.h | 1 + gcc/config/rs6000/rs6000.c| 8 gcc/config/rs6000/xcoff.h | 3 +-- gcc/config/spu/spu.h | 3 +-- gcc/config/visium/visium.h| 3 +-- gcc/defaults.h| 6 +- gcc/output.h | 3 +++ gcc/varasm.c | 9 + gcc/vmsdbgout.c | 1 + gcc/xcoffout.c| 1 + 22 files changed, 60 insertions(+), 48 deletions(-) diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index d98cce1..d3747b9 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -1245,8 +1245,7 @@ do { \ /* This is how to output the definition of a user-level label named NAME, such as the label on a static function or variable NAME. */ -#define ASM_OUTPUT_LABEL(FILE, NAME) \ -do { assemble_name (FILE, NAME); fputs (:\n, FILE); } while (0) +#define ASM_OUTPUT_LABEL default_output_label #define ASM_NAME_P(NAME) ( NAME[0]=='*') diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h index 26ba7c2..08906aa 100644 --- a/gcc/config/bfin/bfin.h +++ b/gcc/config/bfin/bfin.h @@ -1044,10 +1044,7 @@ typedef enum directives { ASM_OUTPUT_LABEL(FILE, NAME); \ } while (0) -#define ASM_OUTPUT_LABEL(FILE, NAME)\ - do { assemble_name (FILE, NAME);\ -fputs (:\n,FILE);\ - } while (0) +#define ASM_OUTPUT_LABEL default_output_label #define ASM_OUTPUT_LABELREF(FILE,NAME) \ do { fprintf (FILE, _%s, NAME); \ diff --git a/gcc/config/frv/frv.h b/gcc/config/frv/frv.h index b0d66fd..1d25974 100644 --- a/gcc/config/frv/frv.h +++ b/gcc/config/frv/frv.h @@ -1668,11 +1668,7 @@ do { \ `assemble_name (STREAM, NAME)' to output the name itself; before and after that, output the additional assembler syntax for defining the name, and a newline. 
*/ -#define ASM_OUTPUT_LABEL(STREAM, NAME) \ -do { \ - assemble_name (STREAM, NAME); \ - fputs (:\n, STREAM); \ -} while (0) +#define ASM_OUTPUT_LABEL default_output_label /* Globalizing directive for a label. */ #define GLOBAL_ASM_OP \t.globl diff --git a/gcc/config/ia64/ia64-protos.h b/gcc/config/ia64/ia64-protos.h index 29fc714..8e540e4 100644 --- a/gcc/config/ia64/ia64-protos.h +++ b/gcc/config/ia64/ia64-protos.h @@ -72,6 +72,7 @@ extern rtx ia64_expand_builtin (tree, rtx, rtx, machine_mode, int); extern rtx ia64_va_arg (tree, tree); #endif /*
Re: [PATCH][RFC][match.pd] optimize (X % C) == N when C is power of 2
On Fri, 24 Jul 2015, Jeff Law wrote: On 07/24/2015 03:44 AM, Ramana Radhakrishnan wrote: In expr.c, with TER you can detect such patterns, in this case when expanding the comparison, but perhaps we want a *.pd file that would have rules that would be only GIMPLE and only enabled in a special pass right before (or very close to) expansion, that would perform such instruction selection. Wild idea, but could it be considered to have target-specific match.pd files that can be included in the main match.pd? That way, targets would get the benefit of getting the target-specific folding they benefit from at the very beginning of compilation without stepping on other targets toes. The downside is preventing duplication, potentially reducing generic improvements and a maintenance headache for gimple optimizers. So how about wedding the two ideas that have sprouted out of this discussion. Specifically having a pass apply a target specific match.pd, but only do so at the end of the gimple optimization pipeline? The design goal would (of course) be to change representations in ways that allow the gimple-rtl expanders to generate more efficient code for the target. It avoids introducing the target bits early in the gimple pipeline, but still gives a clean way for targets to rewrite gimple for the benefit of gimple-rtl expansion. I think it also aligns with the idea of pushing back RTL expansion and expose some target specifics after another GIMPLE lowering phase. I'm also thinking of addressing-mode selection and register promotion. So at least if we think of that target specific match.pd pass as containing all RTL expansion tricks done with TER only then it should be quite simple to make it work. Richard.
Re: [ARM] Optimize compare against smin/umin
Hi Michael,

On 26/07/15 23:54, Michael Collison wrote: Here is an updated patch that addresses the issues you mentioned:

2015-07-24 Michael Collison michael.colli...@linaro.org
	   Ramana Radhakrishnan ramana.radhakrish...@arm.com

	* gcc/config/arm/arm.md (*arm_smin_cmp): New pattern.
	(*arm_umin_cmp): Likewise.
	* gcc.target/arm/mincmp.c: Test min compare idiom.

Just "New test." would be a better ChangeLog entry for this.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0be70a8..361c292 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3455,6 +3455,44 @@
    (set_attr "type" "multiple,multiple")]
 )
 
+;; t = (s/u)min (x, y)
+;; cc = cmp (t, z)
+;; is the same as
+;; cmp x, z
+;; cmpge(u) y, z
+
+(define_insn_and_split "*arm_smin_cmp"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	 (smin:SI (match_operand:SI 0 "s_register_operand" "r")
+		  (match_operand:SI 1 "s_register_operand" "r"))
+	 (match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_32BIT"
+  "#"
+  "&& reload_completed"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC (match_dup 0) (match_dup 2)))
+   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (reg:CC CC_REGNUM)
+		   (compare:CC (match_dup 1) (match_dup 2))))]
+)
+
+(define_insn_and_split "*arm_umin_cmp"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	 (umin:SI (match_operand:SI 0 "s_register_operand" "r")
+		  (match_operand:SI 1 "s_register_operand" "r"))
+	 (match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_32BIT"
+  "#"
+  "&& reload_completed"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC (match_dup 0) (match_dup 2)))
+   (cond_exec (geu:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (reg:CC CC_REGNUM)
+		   (compare:CC (match_dup 1) (match_dup 2))))]
+)
+
 (define_expand "umaxsi3"
   [(parallel [
     (set (match_operand:SI 0 "s_register_operand" "")

diff --git a/gcc/testsuite/gcc.target/arm/mincmp.c b/gcc/testsuite/gcc.target/arm/mincmp.c
new file mode 100644
index 000..2a55c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mincmp.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm32 } */
+
+#define min(x, y) ((x) <= (y)) ? (x) : (y)
+
+unsigned int foo (unsigned int i, unsigned int x ,unsigned int y)

Comma after "unsigned int x" in the wrong place.

+{
+  return i < (min (x, y));
+}
+
+int bar (int i, int x, int y)
+{
+  return i < (min (x, y));
+}

Can you please use GNU style for the testcase. New line after return type, function name and arguments on their own line.

Ok with these changes.

Thanks,
Kyrill
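The equivalence the splitter above relies on can be sanity-checked in plain C. The following is an illustrative sketch, not GCC code: it exercises the identity min (x, y) >= z <=> (x >= z && y >= z), which is why a single "cmp t, z" on t = smin (x, y) can be split into "cmp x, z" followed by a conditional "cmpge y, z".

```c
/* Illustrative check (plain C, not GCC code) of the identity behind
   *arm_smin_cmp:  min (x, y) >= z  is equivalent to  x >= z && y >= z.  */
static int
min_ge_direct (int x, int y, int z)
{
  int t = x < y ? x : y;    /* t = smin (x, y) */
  return t >= z;            /* cmp t, z */
}

static int
min_ge_split (int x, int y, int z)
{
  return x >= z && y >= z;  /* cmp x, z; cmpge y, z */
}
```

The unsigned (umin/cmpgeu) case is analogous, with unsigned comparisons throughout.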
Re: OMP. More constification
On Sun, Jul 26, 2015 at 11:01:09AM -0400, Nathan Sidwell wrote: I found some more missing consts. The size, kind, var and function arrays emitted by omp-low are read only, but are not so marked. This patch

First of all, the hostaddrs array is going to be written by the library side for GOMP_MAP_FIRSTPRIVATE, so it can't be const and shouldn't be const on the compiler-emitted side either. Similarly for the use_device_ptr clause on GOMP_target_data* handling. The size array is const from the library side, sure, but on the compiler side it is only conditionally static (static only if all the values written to it are INTEGER_CSTs), so I am not sure we want to make it use a const-qualified type unconditionally, rather than only changing it to the const-qualified type if it is TREE_STATIC at the end. The kinds array, I'm afraid, might very well soon follow the size array model. Jakub
Re: [PATCH][RFC][match.pd] optimize (X % C) == N when C is power of 2
On 25/07/15 03:19, Segher Boessenkool wrote: On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote: This transformation folds (X % C) == N into (X & ((1 << (size - 1)) | (C - 1))) == N for constants C and N where N is positive and C is a power of 2. For N = 0 you can transform it to ((unsigned)X % C) == N, and for 0 < N < C you can transform it to X > 0 && ((unsigned)X % C) == N (or X >= 0), and for -C < N < 0 it is X < 0 && ((unsigned)X % C) == N + C (or X <= 0), and for other N it is 0. For N not a constant, well, do you really care? :-) (That second case might eventually fold to your original expression). Yeah, these avoid the potentially expensive mask, but introduce more operations, which I believe may not be desirable at this stage. Unless these transformations are ok for match.pd I'll try to implement this transformation at RTL expansion time. Thanks, Kyrill Segher
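The fold under discussion can be sanity-checked in plain C. Below is an illustrative brute-force check, not the match.pd code itself, assuming 32-bit two's-complement int (so "size - 1" is 31): the mask keeps the sign bit plus the low bits of the remainder, and for a constant N > 0 the masked compare agrees with the remainder compare for every X.

```c
/* Illustrative check (not GCC code): for C a power of 2 and constant
   N > 0, on 32-bit two's-complement int,
     (X % C) == N   is equivalent to   (X & ((1 << 31) | (C - 1))) == N.
   For X >= 0 the low bits are exactly X % C; for X < 0 the sign bit in
   the mask makes the masked value too large to equal N.  */
static int
mod_eq_naive (int x, int c, int n)
{
  return x % c == n;
}

static int
mod_eq_masked (int x, int c, int n)
{
  unsigned mask = (1u << 31) | ((unsigned) c - 1);
  return ((unsigned) x & mask) == (unsigned) n;
}
```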
[PATCH, PING] PR debug/53927: fix value for DW_AT_static_link
On 07/20/2015 09:39 AM, Pierre-Marie de Rodat wrote: This patch fixes the static link description in DWARF to comply with the specification. In order to do so, it appends a field to all FRAME objects to hold the frame base address (DW_AT_frame_base) so that the nested subprograms can directly reference this field. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53927 for the context (in particular why we need this additional field in FRAME objects). Ping for the patch submitted in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01629.html. -- Pierre-Marie de Rodat
[PATCH, PING] Track indirect calls for call site information in debug info.
On 07/20/2015 02:45 PM, Pierre-Marie de Rodat wrote: On PowerPC targets with -mlongcall, most subprogram calls are turned into indirect calls: the call target is read from a register even though it is compile-time known. This makes it difficult for machine code static analysis engines to recover the callee information. The attached patch is an attempt to help such engines, generating DW_AT_abstract_origin attributes for all DW_TAG_GNU_call_site we are interested in. Ping for the patch submitted in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01641.html. -- Pierre-Marie de Rodat
Re: [PATCH] Use single shared memory block pool for all pool allocators
On Sun, Jul 26, 2015 at 9:09 PM, pins...@gmail.com wrote: On Jul 26, 2015, at 11:50 AM, Andi Kleen a...@firstfloor.org wrote: Mikhail Maltsev malts...@gmail.com writes: Hi, all! Recently I did some profiling of GCC to find hotspots and areas of possible performance improvement among them. glibc malloc(3) is one of (perhaps known)

I've been compiling gcc with tcmalloc to do a similar speedup. It would be interesting to compare that to your patch. Another useful optimization is to adjust the allocation size to be >= 2MB. Then modern Linux kernels can often give you a large page, which cuts down TLB overhead. I did similar changes some time ago for the garbage collector.

Unless you are running with 64k pages, which I do all the time on my armv8 system. This can be a host-configurable value of course.

But first of all (without looking at the patch but just reading the description) this sounds like a good idea. Maybe still allow pools to use their own backing if the object size is larger than the block size of the caching pool?

Thanks, Richard.

Thanks, Andrew

BTW I saw big differences in larger LTO builds. -Andi -- a...@linux.intel.com -- Speaking for myself only
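For readers following the thread, the idea being discussed — several fixed-size object pools drawing their backing memory from one shared cache of equally-sized blocks, so a block released by any pool can be recycled by every other pool instead of going back to malloc — can be sketched as below. All names and sizes here are invented for illustration; this is not GCC's actual pool-allocator interface.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: object pools carve their storage out of ONE shared
   cache of same-sized blocks.  Invented names, not GCC's interface.  */

#define BLOCK_SIZE 4096

struct block { struct block *next; };

static struct block *free_blocks;  /* the shared block cache */

static void *
block_alloc (void)
{
  if (free_blocks)
    {
      struct block *b = free_blocks;
      free_blocks = b->next;
      return b;
    }
  return malloc (BLOCK_SIZE);
}

static void
block_free (void *b)
{
  /* Returned blocks go back to the shared cache, not to malloc.  */
  ((struct block *) b)->next = free_blocks;
  free_blocks = (struct block *) b;
}

typedef struct object_pool
{
  size_t obj_size;        /* rounded up so an object can hold a pointer */
  char *cursor, *limit;   /* unused tail of the current block */
  void *free_list;        /* objects returned to this pool */
} object_pool;

void
pool_init (object_pool *p, size_t size)
{
  p->obj_size = size < sizeof (void *) ? sizeof (void *) : size;
  p->cursor = p->limit = NULL;
  p->free_list = NULL;
}

void *
pool_alloc (object_pool *p)
{
  if (p->free_list)
    {
      /* Pop a previously freed object; its first bytes hold the link.  */
      void *o = p->free_list;
      memcpy (&p->free_list, o, sizeof (void *));
      return o;
    }
  if (p->cursor == NULL || (size_t) (p->limit - p->cursor) < p->obj_size)
    {
      p->cursor = block_alloc ();
      p->limit = p->cursor + BLOCK_SIZE;
    }
  void *o = p->cursor;
  p->cursor += p->obj_size;
  return o;
}

void
pool_free (object_pool *p, void *o)
{
  memcpy (o, &p->free_list, sizeof (void *));
  p->free_list = o;
}
```

A full version would also hand exhausted blocks back via block_free when a pool is released, and — per Richard's comment — fall back to a private backing when the object size exceeds BLOCK_SIZE.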
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok?

With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually).

Richard.

Trev

Trevor Saunders (9):
  remove POINTER_SIZE_UNITS macro
  add pointer_size target hook
  target.h: change to use targetm.pointer_size instead of POINTER_SIZE
  varasm.c: switch from POINTER_SIZE to targetm.pointer_size ()
  ubsan.c: switch from POINTER_SIZE to targetm.pointer_size ()
  tree-chkp.c: switch to targetm.pointer_size ()
  stor-layout.c: switch to targetm.pointer_size ()
  tree.c: switch to targetm.pointer_size ()
  emit-rtl.c: switch to targetm.pointer_size ()

 gcc/c-family/c-cppbuiltin.c |  2 +-
 gcc/defaults.h              |  3 ---
 gcc/doc/tm.texi             |  7 +++
 gcc/doc/tm.texi.in          |  2 ++
 gcc/dwarf2asm.c             |  4 ++--
 gcc/emit-rtl.c              |  5 +++--
 gcc/lto/lto-object.c        |  3 ++-
 gcc/stor-layout.c           |  9 +
 gcc/target.def              |  8 
 gcc/target.h                |  8 
 gcc/targhooks.c             |  8 
 gcc/targhooks.h             |  1 +
 gcc/tree-chkp.c             | 14 --
 gcc/tree.c                  |  3 ++-
 gcc/ubsan.c                 |  3 ++-
 gcc/varasm.c                | 12 ++--
 16 files changed, 65 insertions(+), 27 deletions(-)
--
2.4.0
Re: [PATCH 4/9] varasm.c: switch from POINTER_SIZE to targetm.pointer_size ()
On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org

gcc/ChangeLog:

2015-07-26  Trevor Saunders  tbsaunde+...@tbsaunde.org

	* varasm.c (assemble_addr_to_section): Call targetm.pointer_size ().
	(dump_tm_clone_pairs): Likewise.
---
 gcc/varasm.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 8cb2ec9..238ae39 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1543,8 +1543,8 @@ void
 assemble_addr_to_section (rtx symbol, section *sec)
 {
   switch_to_section (sec);
-  assemble_align (POINTER_SIZE);
-  assemble_integer (symbol, pointer_size_units (), POINTER_SIZE, 1);
+  assemble_align (targetm.pointer_size ());
+  assemble_integer (symbol, pointer_size_units (), targetm.pointer_size (), 1);

That'll invoke the pointer size hook twice. Refactoring assemble_integer to avoid this would be nice. Also caching the values in places like this.

Richard.

 }
 
 /* Return the numbered .ctors.N (if CONSTRUCTOR_P) or .dtors.N (if
@@ -5870,14 +5870,14 @@ dump_tm_clone_pairs (vec<tm_alias_pair> tm_alias_pairs)
       if (!switched)
 	{
 	  switch_to_section (targetm.asm_out.tm_clone_table_section ());
-	  assemble_align (POINTER_SIZE);
+	  assemble_align (targetm.pointer_size ());
 	  switched = true;
 	}
       assemble_integer (XEXP (DECL_RTL (src), 0),
-			pointer_size_units (), POINTER_SIZE, 1);
+			pointer_size_units (), targetm.pointer_size (), 1);
       assemble_integer (XEXP (DECL_RTL (dst), 0),
-			pointer_size_units (), POINTER_SIZE, 1);
+			pointer_size_units (), targetm.pointer_size (), 1);
 }
--
2.4.0
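Richard's point — that the rewritten code now invokes the hook twice where one call would do — is the classic hoist-the-call refactor. A hypothetical stand-alone illustration follows; the counter and function names are invented for the example (a plain function stands in for the targetm.pointer_size () hook):

```c
/* Invented stand-in for a target hook; the counter shows how many
   times the hook is invoked.  */
static int hook_calls;

static int
pointer_size (void)
{
  hook_calls++;
  return 64;
}

/* As in the patch: each use of the value invokes the hook again.  */
static void
emit_addr_uncached (int *align, int *size)
{
  *align = pointer_size ();
  *size = pointer_size ();
}

/* As the review suggests: call the hook once and reuse the result.  */
static void
emit_addr_cached (int *align, int *size)
{
  int psize = pointer_size ();
  *align = psize;
  *size = psize;
}
```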
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
Andrew Pinski writes: On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote: James Greenhalgh writes: On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: Current IRA still uses both target macros in a few places. Tell IRA to use the order we defined rather than its own cost calculation. Allocate caller-saved registers first, then callee-saved. This is especially useful for LR/x30, as it's free to allocate and is purely caller-saved when used in a leaf function. Haven't noticed significant impact on benchmarks, but by grepping some keywords like Spilling, Push.*spill etc. in the IRA RTL dump, the number is smaller. OK for trunk? OK, sorry for the delay. It might be mail client mangling, but please check that the trailing slashes line up in the version that gets committed. Thanks, James

2015-05-19 Jiong Wang jiong.w...@arm.com

gcc/
	PR 63521
	* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
	(HONOR_REG_ALLOC_ORDER): Define.

Patch reverted. I did not see a reason why this patch was reverted. Maybe I am missing an email or something. There are several execution regressions in the gcc testsuite; as far as I can see this patch exposed hidden bugs in those testcases, but there might be another issue, so to be conservative I temporarily reverted this patch. Thanks, Andrew -- Regards, Jiong
Re: [ARM] Correct spelling of references to ARMv6KZ
On 23/07/15 12:04, Kyrill Tkachov wrote: GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made visible by the command line option -march=armv6zk and by the predefined macro __ARM_ARCH_6ZK__. This patch corrects the spelling internally and adds -march=armv6kz. To preserve existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the architecture. Use of -march=armv6kz will need to wait for binutils to be updated,[..]

diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index c715bb7..7873606 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -35,6 +35,7 @@ static struct vendor_cpu arm_cpu_table[] = {
     {0xb02, "armv6k", "mpcore"},
     {0xb36, "armv6j", "arm1136j-s"},
     {0xb56, "armv6t2", "arm1156t2-s"},
+    {0xb76, "armv6kz", "arm1176jz-s"},
     {0xb76, "armv6zk", "arm1176jz-s"},
     {0xc05, "armv7-a", "cortex-a5"},
     {0xc07, "armv7ve", "cortex-a7"},

This table is scanned from beginning to end, checking the first field. You introduce a duplicate here, so the second form will never be reached. I'd suggest removing the wrong spelling from here, but the re-written march string will be passed to the assembler, so if the assembler is old and doesn't support the correct spelling we'll get errors. So it seems like in order to preserve backwards compatibility we don't want to put the correctly spelled entry here :( But definitely add a comment here mentioning the deliberate oversight.

Respun patch attached. I've removed the armv6kz entry from config/arm/driver-arm.c and replaced it with a comment for the armv6zk entry. Tested for arm-none-linux-gnueabihf with native bootstrap and make check.

Matthew

gcc/
2015-07-27  Matthew Wahab  matthew.wa...@arm.com

	* config/arm/arm-arches.def: Add armv6kz. Replace 6ZK with 6KZ
	and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ.
	* config/arm/arm-c.c (arm_cpu_builtins): Emit __ARM_ARCH_6ZK__
	for armv6kz targets.
	* config/arm/arm-cores.def: Replace 6ZK with 6KZ.
	* config/arm/arm-protos.h (FL_ARCH6KZ): New.
	(FL_FOR_ARCH6ZK): Remove.
	(FL_FOR_ARCH6KZ): New.
	(arm_arch6kz): New declaration.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch6kz): New.
	(arm_option_override): Set arm_arch6kz.
	* config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ.
	* config/arm/driver-arm.c: Add comment to armv6zk entry.
	* doc/invoke.texi: Replace armv6zk with armv6kz.

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..3dafaa5 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -44,7 +44,8 @@
 ARM_ARCH("armv6", arm1136js, 6, FL_CO_PROC | FL_FOR_ARCH6)
 ARM_ARCH("armv6j", arm1136js, 6J, FL_CO_PROC | FL_FOR_ARCH6J)
 ARM_ARCH("armv6k", mpcore, 6K, FL_CO_PROC | FL_FOR_ARCH6K)
 ARM_ARCH("armv6z", arm1176jzs, 6Z, FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
+ARM_ARCH("armv6kz", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
+ARM_ARCH("armv6zk", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
 ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
 ARM_ARCH("armv6-m", cortexm1, 6M, FL_FOR_ARCH6M)
 ARM_ARCH("armv6s-m", cortexm1, 6M, FL_FOR_ARCH6M)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 297995b..9bf3973 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -167,6 +167,11 @@ arm_cpu_builtins (struct cpp_reader* pfile, int flags)
     }
   if (arm_arch_iwmmxt2)
     builtin_define ("__IWMMXT2__");
+  /* ARMv6KZ was originally identified as the misspelled __ARM_ARCH_6ZK__.  To
+     preserve the existing behaviour, the misspelled feature macro must still
+     be defined.  */
+  if (arm_arch6kz)
+    builtin_define ("__ARM_ARCH_6ZK__");
   if (TARGET_AAPCS_BASED)
     {
       if (arm_pcs_default == ARM_PCS_AAPCS_VFP)

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..9d47fcf 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -125,8 +125,8 @@
 ARM_CORE("arm1026ej-s", arm1026ejs, arm1026ejs, 5TEJ, FL_LDSCHED, 9e)
 /* V6 Architecture Processors */
 ARM_CORE("arm1136j-s", arm1136js, arm1136js, 6J, FL_LDSCHED, 9e)
 ARM_CORE("arm1136jf-s", arm1136jfs, arm1136jfs, 6J, FL_LDSCHED | FL_VFPV2, 9e)
-ARM_CORE("arm1176jz-s", arm1176jzs, arm1176jzs, 6ZK, FL_LDSCHED, 9e)
-ARM_CORE("arm1176jzf-s", arm1176jzfs, arm1176jzfs, 6ZK, FL_LDSCHED | FL_VFPV2, 9e)
+ARM_CORE("arm1176jz-s", arm1176jzs, arm1176jzs, 6KZ, FL_LDSCHED, 9e)
+ARM_CORE("arm1176jzf-s", arm1176jzfs, arm1176jzfs, 6KZ, FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE("mpcorenovfp", mpcorenovfp, mpcorenovfp, 6K, FL_LDSCHED, 9e)
 ARM_CORE("mpcore", mpcore, mpcore,
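The review point about the driver table — first match wins, so the duplicate 0xb76 entry is dead — can be demonstrated with a hypothetical reduction of the driver-arm.c lookup. The struct and scan mirror the shape of the code under review; the table contents below reproduce the pre-respin state being criticized, not the committed patch:

```c
#include <stddef.h>

/* Reduced sketch of the driver-arm.c lookup: the table is scanned from
   the start and the FIRST entry whose part number matches wins, so a
   second entry with the same part number (0xb76) is unreachable.  */
struct vendor_cpu
{
  unsigned part;
  const char *arch;
  const char *core;
};

static const struct vendor_cpu arm_cpu_table[] = {
  {0xb76, "armv6zk", "arm1176jz-s"},   /* kept so old assemblers work */
  {0xb76, "armv6kz", "arm1176jz-s"},   /* unreachable duplicate */
  {0, NULL, NULL}
};

static const char *
lookup_arch (unsigned part)
{
  for (const struct vendor_cpu *p = arm_cpu_table; p->arch; p++)
    if (p->part == part)
      return p->arch;
  return NULL;
}
```

This is exactly why the respun patch drops the armv6kz entry and documents the deliberate misspelling with a comment instead.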
Re: [V850] Hookize LIBCALL_VALUE
Hi Anatoliy, OK for trunk?

2015-07-26 Anatoly Sokolov ae...@post.ru

	* config/v850/v850.h (LIBCALL_VALUE): Remove macro.
	* config/v850/v850.md (RV_REGNUM): New constant.
	* config/v850/v850.c (v850_libcall_value): New function.
	(v850_function_value_regno_p, v850_function_value): Use RV_REGNUM.
	(TARGET_LIBCALL_VALUE): Define.

Approved - please apply.

Cheers
Nick
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
On Jul 27, 2015, at 2:26 AM, Jiong Wang jiong.w...@arm.com wrote: Andrew Pinski writes: On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote: James Greenhalgh writes: On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: Current IRA still uses both target macros in a few places. Tell IRA to use the order we defined rather than its own cost calculation. Allocate caller-saved registers first, then callee-saved. This is especially useful for LR/x30, as it's free to allocate and is purely caller-saved when used in a leaf function. Haven't noticed significant impact on benchmarks, but by grepping some keywords like Spilling, Push.*spill etc. in the IRA RTL dump, the number is smaller. OK for trunk? OK, sorry for the delay. It might be mail client mangling, but please check that the trailing slashes line up in the version that gets committed. Thanks, James

2015-05-19 Jiong Wang jiong.w...@arm.com

gcc/
	PR 63521
	* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
	(HONOR_REG_ALLOC_ORDER): Define.

Patch reverted. I did not see a reason why this patch was reverted. Maybe I am missing an email or something. There are several execution regressions in the gcc testsuite; as far as I can see this patch exposed hidden bugs in those testcases, but there might be another issue, so to be conservative I temporarily reverted this patch.

If you are talking about: gcc.target/aarch64/aapcs64/func-ret-2.c execution etc., these test cases are too dependent on the original register allocation order and really can be safely ignored. Really these three tests should be moved or rewritten in a more sane way. Thanks, Andrew

Thanks, Andrew -- Regards, Jiong