Re: [Patch, fortran] PR 37131, inline matmul
Le 05/05/2015 08:25, Thomas Koenig a écrit : Hello world, this is an update of the matmul inline patch. The only difference to the last version is that it has the ubound simplification taken out. Sorry, I delayed this, but it wasn't (yet) forgotten. Below a few answers to https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01247.html and documentation fixes. * simplify.c (simplify_bound): Simplify the case of the lower bound of an assumed-shape argument. Entry to be removed. ;-) Index: fortran/array.c === --- fortran/array.c (Revision 18) +++ fortran/array.c (Arbeitskopie) @@ -338,6 +338,9 @@ gfc_resolve_array_spec (gfc_array_spec *as, int ch if (as == NULL) return true; + if (as-resolved) +return true; + Why this? Because you get regressions otherwise. Not resolving an array spec twice should do no harm, and resolving it twice does so - I hit the error message in check_restricted. I'm not sure what is wrong, maybe PR 23466 was not fully fixed, but this works. Hum, it seems to work without it here. Can you double check? @@ -524,29 +542,11 @@ constant_string_length (gfc_expr *e) } -/* Returns a new expression (a variable) to be used in place of the old one, - with an assignment statement before the current statement to set - the value of the variable. Creates a new BLOCK for the statement if - that hasn't already been done and puts the statement, plus the - newly created variables, in that block. Special cases: If the - expression is constant or a temporary which has already - been created, just copy it. */ - -static gfc_expr* -create_var (gfc_expr * e) Keep a comment here. Still exists, further down. I don't mind the comment being moved around together with create_var. :-) I was asking for a comment for the new function insert_block. + gfc_simplify_expr (ar-start[i], 0); + } + else if (was_fullref) + { + ar-dimen_type[i] = DIMEN_RANGE; + ar-start[i] = NULL; + ar-end[i] = NULL; + ar-stride[i] = NULL; + } Is this reachable ? Not in the current incarnation, I wanted to keep it around for a full segment later. I can also remove this. At least add an assert, a comment, something telling that it's not used. I would rather remove it until it's actually used, but I can live with it, if it becomes used shortly. [...] Index: fortran/options.c === --- fortran/options.c (Revision 18) +++ fortran/options.c (Arbeitskopie) [...] + + if (flag_external_blas flag_inline_matmul_limit 0) +flag_inline_matmul_limit = flag_blas_matmul_limit; Hum, shouldn't we do something for flag_inline_matmul_limit 0 as well? This is done automatically, by the options machinery. That is cool Huh? is it? I was talking about this: Using -fblas-matmul-limit=10, one gets inlining until 10. Using -fblas-matmul-limit=10 -finline-matmul-limit=20 the inlining limit is _increased_ to 20. It is of course questionable for a user to specify a low limit for blas being lower than the high limit for inlining, so that we have a contradictory specification for the matrix sizes in between. But I think it would make more sense to have blas take precedence, that is have flag_inline_matmul_limit clamped to flag_blas_matmul_limit, even for flag_inline_matmul_limit 0. Is this done by the options machinery? Either way, I don't mind too much, this is a corner case. Index: invoke.texi === --- invoke.texi (Revision 18) +++ invoke.texi (Arbeitskopie) @@ -1537,6 +1538,20 @@ geometric mean of the dimensions of the argument a The default value for @var{n} is 30. +@item -finline-matmul-limit=@var{n} +@opindex @code{finline-matmul-limit} +When front-end optimiztion is active, optimization some calls to the @code{MATMUL} +intrinsic function will be inlined. s/some calls ... inlined/calls ... inlined for small matrix sizes/ or something like that. Setting +@code{-finline-matmul-limit=0} will disable inlining in all cases. +Setting this option it to a specified value will call the library +routines for matrices with size larger than @var{n}. s/it // Setting ... value @var{n} will produce inline code for matrix sizes up to @var{n}; the library routines will be called for bigger matrices. Maybe add something about code bloat here. Suggestion: This may result in code size increase if the matrix size can't be determined at compile time, as code for both cases is generated. If the matrices +involved are not square, the size comparison is performed using the +geometric mean of the dimensions of the argument and result matrices. + +The default value for @var{n} is the value specified for
Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
Hi James, Thanks for your comment. Seems we need a 'dup' before 'fmul' if we use the GCC vector extension syntax way. Example: dup v1.2s, v1.s[0] fmulv0.2s, v1.2s, v0.2s And we need another pattern to combine this two insns into 'fmul %0.2s,%1.2s,%2.s[0]', which is kind of complex. BTW: maybe it's better to reconsider this issue after this patch, right? Thanks. Jiang jiji On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote: Hi, This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html Regtested with aarch64-linux-gnu on QEMU. This patch has no regressions for aarch64_be-linux-gnu big-endian target too. OK for the trunk? Thanks. Jiang jiji -- Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics Hi, Kyrill Thank you for your suggestion. I fixed it and regtested with aarch64-linux-gnu on QEMU. This patch has no regressions for aarch64_be-linux-gnu big-endian target too. OK for the trunk? Hi Jiang, I'm sorry that I've taken so long to get to this, I've been out of office for several weeks. I have one comment. +__extension__ static __inline float32x2_t __attribute__ +((__always_inline__)) +vmul_n_f32 (float32x2_t __a, float32_t __b) { + return __builtin_aarch64_mul_nv2sf (__a, __b); } + For vmul_n_* intrinsics, is there a reason we don't want to use the GCC vector extension syntax to allow us to write these as: __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vmul_n_f32 (float32x2_t __a, float32_t __b) { return __a * __b; } It would be great if we could make that work. Thanks, James
Re: [Bug target/66015] New: align directives not propagated after __attribute__ ((__optimize__ (O2)))
Hi Jim, Steve, Andreas Please find here a fix for the issue reported by Andreas https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64835 for ia64. same than x86 and aarch64. I don't the environment to run the testsuite for ia64. would you mind giving it a try and verify that it fixes the issue ? many thanks Christian 2015-05-05 Christian Bruel christian.br...@st.com PR target/66015 * config/ia64/ia64.c (ia64_option_override): Move align_loops, and align_functions into ia64_override_options_after_change. 2015-05-05 Christian Bruel christian.br...@st.com PR target/66015 * gcc.target/ia64/iinline-attr-1.c: New test. Index: gcc/config/ia64/ia64.c === --- gcc/config/ia64/ia64.c (revision 222803) +++ gcc/config/ia64/ia64.c (working copy) @@ -6051,10 +6051,6 @@ init_machine_status = ia64_init_machine_status; - if (align_functions = 0) -align_functions = 64; - if (align_loops = 0) -align_loops = 32; if (TARGET_ABI_OPEN_VMS) flag_no_common = 1; @@ -6066,6 +6062,11 @@ static void ia64_override_options_after_change (void) { + if (align_functions = 0) +align_functions = 64; + if (align_loops = 0) +align_loops = 32; + if (optimize = 3 !global_options_set.x_flag_selective_scheduling !global_options_set.x_flag_selective_scheduling2) Index: gcc/testsuite/gcc.target/ia64/iinline-attr-1.c === --- gcc/testsuite/gcc.target/ia64/iinline-attr-1.c (revision 0) +++ gcc/testsuite/gcc.target/ia64/iinline-attr-1.c (working copy) @@ -0,0 +1,28 @@ +/* Verify that alignment flags are set when attribute __optimize is used. */ +/* { dg-do compile } */ + +extern void non_existent(int); + +__attribute__ ((__optimize__ (O2))) +static void hooray () +{ + non_existent (1); +} + +__attribute__ ((__optimize__ (O2))) +static void hiphip (void (*f)()) +{ + non_existent (2); + f (); +} + +__attribute__ ((__optimize__ (O2))) +int test (void) +{ + hiphip (hooray); + return 0; +} + +/* { dg-final { scan-assembler .align 64 } } */ + +
Re: [PATCH, i386]: Switch x86 to TARGET_SUPPORTS_WIDE_INT
Hi HJ, I've checked spec2000 performance. Only few spec binaries differ. Anyway performance is unchanged. Thanks, Evgeny On Thu, Apr 30, 2015 at 10:08 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Apr 30, 2015 at 12:01 PM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Attached patch switches x86 to TARGET_SUPPORTS_WIDE_INT. The patch builds on the fact that build requires HOST_BITS_PER_WIDE_INT = 64 capable host. Taking this in account, noticeable blocks of code can be removed, and all but one immed_double_const can be removed. The only wide-int mode that remains is TImode. 2015-04-30 Uros Bizjak ubiz...@gmail.com * config/i386/i386.h (TARGET_SUPPORTS_WIDE_INT): New define. * config/i386/i386.c (ix86_legitimate_constant_p): Handle TImode as CONST_WIDE_INT, not CONST_DOUBLE. (ix86_cannot_force_const_mem): Handle CONST_WIDE_INT. (output_pic_addr_const): Do not handle VOIDmode CONST_DOUBLEs. (ix86_find_base_term): Do not check for CONST_DOUBLE. (ix86_print_operand): Do not handle non-FPmode CONST_DOUBLEs. (ix86_build_signbit_mask): Rewrite using wide ints. (ix86_split_to_parts) [HOST_BITS_PER_WIDE_INT 64]: Remove. (ix86_rtx_costs): Handle CONST_WIDE_INT. (find_constant): Ditto. * config/i386/i386.md (bts, btr, btc peepholes): Rewrite using gen_int_mode. * config/i386/predicates.md (x86_64_immediate_operand) case CONST_INT: Remove HOST_BITS_PER_WIDE_INT == 32 code. (x86_64_zext_immediate_operand): Remove CONST_DOUBLE handling. case CONST_INT: Remove HOST_BITS_PER_WIDE_INT == 32 code. (const0_operand): Also match const_wide_int. (constm1_operand): Ditto. (const1_operand): Ditto. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and i686-linux-gnu. I won't be able to commit the patch until Monday. H.J., can you please test it on your SPEC testers, so there won't be any surprises w.r.t. performance issues. Hi Igor, Can your team run SPEC CPU on this patch? Thanks. -- H.J.
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 13:37, Richard Biener wrote: On May 5, 2015 1:01:59 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 05/05/15 11:54, Richard Earnshaw wrote: On 05/05/15 08:32, Jakub Jelinek wrote: On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c 2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY -|| (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; I don't think this is right (though I suspect the existing code has the same problem). We should only look at mode if there is no type information. The problem is that GCC has a nasty habit of assigning real machine modes to things that are really BLKmode and we've run into several cases where this has royally screwed things up. So for consistency in the ARM back-end we are careful to only use mode when type is NULL (= it's a libcall). + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } It ought to be possible to re-order this, though, to static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (type != NULL_TREE) +{ + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; +} + return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY); } Either way, this would need careful cross-testing against an existing compiler. It looks as though either patch would cause ABI incompatibility for typedef int alignedint __attribute__((aligned((8; int __attribute__((weak)) foo (int a, alignedint b) {return b;} void bar (alignedint x) { foo (1, x); } Where currently gcc uses r2 as the argument register for b in foo. And for foo (1,2) or an int typed 2nd arg? Richard. R. As is int foo2 (int a, int b); void bar2 (alignedint x) { foo2 (1, x); } ... R
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 13:46, Richard Earnshaw wrote: On 05/05/15 13:37, Richard Biener wrote: On May 5, 2015 1:01:59 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 05/05/15 11:54, Richard Earnshaw wrote: On 05/05/15 08:32, Jakub Jelinek wrote: On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj 2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c 2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; I don't think this is right (though I suspect the existing code has the same problem). We should only look at mode if there is no type information. The problem is that GCC has a nasty habit of assigning real machine modes to things that are really BLKmode and we've run into several cases where this has royally screwed things up. So for consistency in the ARM back-end we are careful to only use mode when type is NULL (= it's a libcall). + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } It ought to be possible to re-order this, though, to static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY -|| (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (type != NULL_TREE) +{ + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; +} + return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY); } Either way, this would need careful cross-testing against an existing compiler. It looks as though either patch would cause ABI incompatibility for typedef int alignedint __attribute__((aligned((8; int __attribute__((weak)) foo (int a, alignedint b) {return b;} void bar (alignedint x) { foo (1, x); } Where currently gcc uses r2 as the argument register for b in foo. And for foo (1,2) or an int typed 2nd arg? Richard. R. As is int foo2 (int a, int b); void bar2 (alignedint x) { foo2 (1, x); } The real question here is why is TYPE the type of the value, rather than the type of the formal as expressed by the prototype (or implicit prototype in the case of variadics or KR)? Surely this is the mid-end passing the wrong information to the back-end. R. ... R
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Tue, May 05, 2015 at 01:49:55PM +0100, Richard Earnshaw wrote: The real question here is why is TYPE the type of the value, rather than the type of the formal as expressed by the prototype (or implicit prototype in the case of variadics or KR)? Surely this is the mid-end passing the wrong information to the back-end. There is nothing else for unnamed arguments (KR, stdarg). For named arguments, the backend has the option to save the fntype in CUMULATIVE_ARGS and look it up when it needs that. But, that will still mean KR and stdarg will be just broken on arm. Jakub
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Tue, May 05, 2015 at 02:02:19PM +0100, Richard Earnshaw wrote: In a literal sense, yes. However, even KR stdarg have standard promotion and conversion rules (size int = int, floats promoted to double, etc). What are those rules for GCC's overaligned types (ie where in the docs does it say what happens and how a back-end should interpret them)? Shouldn't the mid-end be doing that work so as to For the middle-end, the TYPE_ALIGN info on expression types is considered useless, you can get there anything. There is no conversion rule to what you get for myalignedint + int, or (myalignedint) int, or (int) myalignedint, etc. create a consistent view of the values passed into the back-end? It seems to me that at present the back-end has to be psychic as to what is really happening. No, the backend just shouldn't consider TYPE_ALIGN on the scalars, and it seems only arm ever looks at that. Jakub
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Tue, May 05, 2015 at 12:01:59PM +0100, Richard Earnshaw wrote: Either way, this would need careful cross-testing against an existing compiler. It looks as though either patch would cause ABI incompatibility for typedef int alignedint __attribute__((aligned((8; int __attribute__((weak)) foo (int a, alignedint b) {return b;} void bar (alignedint x) { foo (1, x); } Where currently gcc uses r2 as the argument register for b in foo. That is true, but as you can't call it with almost anything else: void bar2 (alignedint x) { foo (1, (alignedint) 1); } void bar3 (alignedint x) { foo (1, x + (alignedint) 1); } alignedint q; void bar4 (alignedint x) { foo (1, q); } eextern int baz (int, int); void bar5 (alignedint x) { baz (1, x); } is all broken, I'd seriously doubt anybody is actually using it successfully. Having an ABI feature that most of the time doesn't work at all, and occassionally happens to work, is just unsupportable. You could add a -Wpsabi warning in the callee, warning if any of the non-aggregate arguments has TYPE_ALIGN (type) != TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) to let users know that this really didn't work in = 5.1. And supposedly we need the tree-sra.c patch then so that eipa_sra doesn't create the over-aligned parameters, because -Wpsabi then would warn about those. In any case, this is a P1 issue that needs resolving RSN, not just on the trunk, but also on the branch, and the tree-sra.c patch I've posted isn't anywhere close to resolving it without fixing the backend. I'm in the middle of bootstrapping/regtesting this on armv7hl (both profiledbootstrap to see if it fixes it and normal one). Jakub
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
Today I have received the news that the Copyright Assignment was completed with the FSF. On Thu, Apr 30, 2015 at 8:10 AM, Nuno Diegues n...@ist.utl.pt wrote: Patch looks good to me now. It would be perhaps nice to have an environment variable to turn the adaptive algorithm off for tests, but that's not critical. Yes, that makes perfect sense. It would be also nice to test it on something else, but I understand it's difficult to find other software using the STM syntax. Indeed. I'll try to find some time to work on that, but it may take a while. I can't approve the patch however. I believe it's big enough that you may need a copy right assignment. I have signed a Form Assignment from the Free Software Foundation to deal exactly with those matters for this patch to the libitm. Torvald Riegel had advised me to do so. I have not, however, received any further information; so I'm left wondering if it went through or if it is still hanging. I will ping back to FSF to check that out perhaps? Best regards, -- Nuno Diegues -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 13:54, Jakub Jelinek wrote: On Tue, May 05, 2015 at 01:49:55PM +0100, Richard Earnshaw wrote: The real question here is why is TYPE the type of the value, rather than the type of the formal as expressed by the prototype (or implicit prototype in the case of variadics or KR)? Surely this is the mid-end passing the wrong information to the back-end. There is nothing else for unnamed arguments (KR, stdarg). For named arguments, the backend has the option to save the fntype in CUMULATIVE_ARGS and look it up when it needs that. But, that will still mean KR and stdarg will be just broken on arm. Jakub In a literal sense, yes. However, even KR stdarg have standard promotion and conversion rules (size int = int, floats promoted to double, etc). What are those rules for GCC's overaligned types (ie where in the docs does it say what happens and how a back-end should interpret them)? Shouldn't the mid-end be doing that work so as to create a consistent view of the values passed into the back-end? It seems to me that at present the back-end has to be psychic as to what is really happening. R.
Re: [PATCH, PR65915] Fix float conversion split.
On Tue, May 5, 2015 at 3:03 PM, Ilya Tocar tocarip.in...@gmail.com wrote: +++ b/gcc/testsuite/gcc.target/i386/pr65915.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-options -O2 -mavx512f -fpic -mcmodel=medium } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target lp64 } */ + +#include avx512f-vrndscalepd-2.c Missing testcases for FAIL: gcc.target/i386/avx512f-vrndscaleps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512vl-vrndscaleps-2.c (internal compiler error) The attached test is OK, since these two would test for the same problem. as well as ChangeLog entries. ChangeLog is missing. Please add PR number and describe *each* change accurately. You can say (vector convert to float spltiter) for this particular nameless splitter. Please repost the patch with updated ChangeLog. ChangeLog PR c/65915 * config/i386/i386.md (vector convert to float spltiter): Check for xmm16+, when splitting scalar float conversion. * config/i386/sse.md (sse2_cvtsi2sd): Support EVEX version. And for tests PR c/65915 * gcc.target/i386/pr65915.c: New. OK for mainline. Thanks, Uros.
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 14:06, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:02:19PM +0100, Richard Earnshaw wrote: In a literal sense, yes. However, even KR stdarg have standard promotion and conversion rules (size int = int, floats promoted to double, etc). What are those rules for GCC's overaligned types (ie where in the docs does it say what happens and how a back-end should interpret them)? Shouldn't the mid-end be doing that work so as to For the middle-end, the TYPE_ALIGN info on expression types is considered useless, you can get there anything. There is no conversion rule to what you get for myalignedint + int, or (myalignedint) int, or (int) myalignedint, etc. create a consistent view of the values passed into the back-end? It seems to me that at present the back-end has to be psychic as to what is really happening. No, the backend just shouldn't consider TYPE_ALIGN on the scalars, and it seems only arm ever looks at that. Nothing in the specification for TARGET_FUNCTION_ARG (or any of the related functions) makes any mention of this... R. Jakub
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On May 5, 2015 1:01:59 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 05/05/15 11:54, Richard Earnshaw wrote: On 05/05/15 08:32, Jakub Jelinek wrote: On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj 2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; I don't think this is right (though I suspect the existing code has the same problem). We should only look at mode if there is no type information. The problem is that GCC has a nasty habit of assigning real machine modes to things that are really BLKmode and we've run into several cases where this has royally screwed things up. So for consistency in the ARM back-end we are careful to only use mode when type is NULL (= it's a libcall). + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } It ought to be possible to re-order this, though, to static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (type != NULL_TREE) +{ + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; +} + return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY); } Either way, this would need careful cross-testing against an existing compiler. It looks as though either patch would cause ABI incompatibility for typedef int alignedint __attribute__((aligned((8; int __attribute__((weak)) foo (int a, alignedint b) {return b;} void bar (alignedint x) { foo (1, x); } Where currently gcc uses r2 as the argument register for b in foo. And for foo (1,2) or an int typed 2nd arg? Richard. R.
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 13:37, Richard Biener wrote: On May 5, 2015 1:01:59 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 05/05/15 11:54, Richard Earnshaw wrote: On 05/05/15 08:32, Jakub Jelinek wrote: On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c 2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY -|| (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; I don't think this is right (though I suspect the existing code has the same problem). We should only look at mode if there is no type information. The problem is that GCC has a nasty habit of assigning real machine modes to things that are really BLKmode and we've run into several cases where this has royally screwed things up. So for consistency in the ARM back-end we are careful to only use mode when type is NULL (= it's a libcall). + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } It ought to be possible to re-order this, though, to static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (type != NULL_TREE) +{ + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; +} + return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY); } Either way, this would need careful cross-testing against an existing compiler. It looks as though either patch would cause ABI incompatibility for typedef int alignedint __attribute__((aligned((8; int __attribute__((weak)) foo (int a, alignedint b) {return b;} void bar (alignedint x) { foo (1, x); } Where currently gcc uses r2 as the argument register for b in foo. And for foo (1,2) or an int typed 2nd arg? Yep, looks like that's horribly broken too ;-( R. Richard. R.
Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests
On 2015-05-05 5:43 AM, Thomas Schwinge wrote: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-62.c -DACC_DEVICE_TYPE_hos t=1 -DACC_MEM_SHARED=1 output pattern test, is , should match invalid size With this one I'll need your help: please cite from libgomp.log (or, from a manual run) the actual output message that you're getting. There's no output message: # ./lib-62.exe Segmentation fault (core dumped) (gdb) r Starting program: /test/gnu/gcc/objdir/hppa2.0w-hp-hpux11.11/libgomp/testsuite/lib-62.exe warning: Private mapping of shared library text was not specified by the executable; setting a breakpoint in a shared library which is not privately mapped will not work. See the HP-UX 11i v3 chatr manpage for methods to privately map shared library text. Program received signal SIGSEGV, Segmentation fault. acc_init (d=acc_device_nvidia) at ../../../gcc/libgomp/oacc-init.c:178 178 ndevs = base_dev-get_num_devices_func (); (gdb) disass $pc-16,$pc+16 Dump of assembler code from 0xc12731c8 to 0xc12731e8: 0xc12731c8 acc_init+64:copy r4,r19 0xc12731cc acc_init+68:copy r6,r26 0xc12731d0 acc_init+72:b,l 0xc1272af0 resolve_device,rp 0xc12731d4 acc_init+76:copy r19,r4 = 0xc12731d8 acc_init+80:ldw 1c(ret0),r22 0xc12731dc acc_init+84:copy r4,r19 0xc12731e0 acc_init+88:copy ret0,r3 0xc12731e4 acc_init+92:copy r19,r4 End of assembler dump. (gdb) p/x $ret0 $1 = 0x0 It would seem resolve_device returned 0. Dave -- John David Anglin dave.ang...@bell.net
Re: [PATCH, PR target/66015]: Fix alignments with attribute_optimize for aarch64
On 5 May 2015 at 12:07, Christian Bruel christian.br...@st.com wrote: This fixes PR target/66015 and a latent issue revealed by gcc.dg/ipa/iinline-attr.c since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01609.html Regtested on aarch64-linux-gnu by Linaro. OK for trunk ? OK. Is this issue present in gcc-5? If so can you backport? Thanks /Marcus
Re: [PATCH, PR65915] Fix float conversion split.
+++ b/gcc/testsuite/gcc.target/i386/pr65915.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-options -O2 -mavx512f -fpic -mcmodel=medium } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target lp64 } */ + +#include avx512f-vrndscalepd-2.c Missing testcases for FAIL: gcc.target/i386/avx512f-vrndscaleps-2.c (test for excess errors) FAIL: gcc.target/i386/avx512vl-vrndscaleps-2.c (internal compiler error) The attached test is OK, since these two would test for the same problem. as well as ChangeLog entries. ChangeLog is missing. Please add PR number and describe *each* change accurately. You can say (vector convert to float spltiter) for this particular nameless splitter. Please repost the patch with updated ChangeLog. ChangeLog PR c/65915 * config/i386/i386.md (vector convert to float spltiter): Check for xmm16+, when splitting scalar float conversion. * config/i386/sse.md (sse2_cvtsi2sd): Support EVEX version. And for tests PR c/65915 * gcc.target/i386/pr65915.c: New. Reposted patch below. --- gcc/config/i386/i386.md | 8 ++-- gcc/config/i386/sse.md | 6 +++--- gcc/testsuite/gcc.target/i386/pr65915.c | 6 ++ 3 files changed, 15 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr65915.c diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 937871a..af1cd9b 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4897,7 +4897,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_USE_VECTOR_CONVERTS optimize_function_for_speed_p (cfun) reload_completed SSE_REG_P (operands[0]) -(MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC) +(MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC) +(!EXT_REX_SSE_REG_P (operands[0]) + || TARGET_AVX512VL) [(const_int 0)] { operands[3] = simplify_gen_subreg (ssevecmodemode, operands[0], @@ -4921,7 +4923,9 @@ TARGET_SSE2 TARGET_SSE_MATH TARGET_SSE_PARTIAL_REG_DEPENDENCY optimize_function_for_speed_p (cfun) -reload_completed SSE_REG_P (operands[0]) +reload_completed SSE_REG_P (operands[0]) +(!EXT_REX_SSE_REG_P (operands[0]) + || TARGET_AVX512VL) [(const_int 0)] { const machine_mode vmode = MODEF:ssevecmodemode; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9b7009a..c61098d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4258,11 +4258,11 @@ (set_attr mode TI)]) (define_insn sse2_cvtsi2sd - [(set (match_operand:V2DF 0 register_operand =x,x,x) + [(set (match_operand:V2DF 0 register_operand =x,x,v) (vec_merge:V2DF (vec_duplicate:V2DF (float:DF (match_operand:SI 2 nonimmediate_operand r,m,rm))) - (match_operand:V2DF 1 register_operand 0,0,x) + (match_operand:V2DF 1 register_operand 0,0,v) (const_int 1)))] TARGET_SSE2 @ @@ -4275,7 +4275,7 @@ (set_attr amdfam10_decode vector,double,*) (set_attr bdver1_decode double,direct,*) (set_attr btver2_decode double,double,double) - (set_attr prefix orig,orig,vex) + (set_attr prefix orig,orig,maybe_evex) (set_attr mode DF)]) (define_insn sse2_cvtsi2sdqround_name diff --git a/gcc/testsuite/gcc.target/i386/pr65915.c b/gcc/testsuite/gcc.target/i386/pr65915.c new file mode 100644 index 000..990c5aa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr65915.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-options -O2 -mavx512f -fpic -mcmodel=medium } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target lp64 } */ + +#include avx512f-vrndscalepd-2.c -- 1.8.3.1
C++ PATCH for self-referential variable template
A variable template or partial specialization trying to refer to the template in its initializer got confused because name lookup found the VAR_DECL rather than the template. Fixed by avoiding pushing the VAR_DECL. Tested x86_64-pc-linux-gnu, applying to trunk and 5. commit fb2e3bd25d2b8ce5cde5facf156941c7eae5d643 Author: Jason Merrill ja...@redhat.com Date: Mon May 4 21:09:11 2015 -0500 * decl.c (start_decl): Don't push the plain VAR_DECL for a variable template. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 6ec1579..261a12d 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4825,8 +4825,11 @@ start_decl (const cp_declarator *declarator, was_public = TREE_PUBLIC (decl); - /* Enter this declaration into the symbol table. */ - decl = maybe_push_decl (decl); + /* Enter this declaration into the symbol table. Don't push the plain + VAR_DECL for a variable template. */ + if (!template_parm_scope_p () + || TREE_CODE (decl) != VAR_DECL) +decl = maybe_push_decl (decl); if (processing_template_decl) decl = push_template_decl (decl); diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ24.C b/gcc/testsuite/g++.dg/cpp1y/var-templ24.C new file mode 100644 index 000..d8f7cbf --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ24.C @@ -0,0 +1,5 @@ +// { dg-do compile { target c++14 } } + +template class T bool Foo = Fooint; +template bool Fooint = true; +int i = Foochar;
Re: [PATCH 5/8] always define HAVE_simple_return and HAVE_return
On Tue, May 05, 2015 at 06:12:50PM -0700, Mike Stump wrote: On Apr 26, 2015, at 10:55 PM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org gcc/ChangeLog: 2015-04-27 Trevor Saunders tbsaunde+...@tbsaunde.org * bb-reorder.c (HAVE_return): Don't check if its undefined. * defaults.h (gen_simple_return): New function. (gen_simple_return): Likewise. (HAVE_return): Add default definition to false. (HAVE_simple_return): Likewise. * cfgrtl.c (force_nonfallthru_and_redirect): Remove checks if HAVE_return and HAVE_simple_return are defined. * function.c (gen_return_pattern): Likewise. (convert_jumps_to_returns): Likewise. (thread_prologue_and_epilogue_insns): Likewise. * reorg.c (find_end_label): Likewise. (dbr_schedule): Likewise. * shrink-wrap.c: Likewise. * shrink-wrap.h: Likewise. I’m seeing: In file included from ./tm.h:30:0, from ../../gcc/gcc/c-family/c-semantics.c:24: ../../gcc/gcc/defaults.h: In function ‘rtx_def* gen_simple_return()’: ../../gcc/gcc/defaults.h:1422:1: error: redefinition of ‘rtx_def* gen_simple_return()’ gen_simple_return () ^ In file included from ./tm.h:22:0, from ../../gcc/gcc/c-family/c-semantics.c:24: ./insn-flags.h:1744:1: error: ‘rtx_def* gen_simple_return()’ previously defined here gen_simple_return(void) ^ in my port. I have a simple_return and a return that is “0” enabled. defaults.h has: #ifndef HAVE_simple_return #define HAVE_simple_return 0 static inline rtx gen_simple_return () { gcc_unreachable (); return NULL; } #endif and insn-flags.h has: static inline rtx gen_simple_return (void); static inline rtx gen_simple_return(void) { return 0; } If I change the enable to “1” or “” then it compiles better. Also, I can delete the pattern or change the name of the pattern and it works ok as well. If they both did #ifndef HAVE_simple_return, and then insn-flags did #define HAVE_simple_return 0, I think it might work better. I’ve not thought about it much. hrm, I didn't realize you could sometimes get a dummy function in insn-flags.h (especially without it defining the macro that seems kind of broken on genflags' part). What I think we could actually do is change genflags to #define HAVE_x 0 if it is going to emit the static function and some how have a list of insns that always get emited so you don't need the define_insn with enabled = 0 in every target's md file. The one big trick with this is that all the #ifdef HAVE_foo need to be updated so it'll take a little time. Trev
Re: [PATCH] PR debug/61352 back port from mainline
On Apr 12, 2015, at 11:11 AM, Jack Howarth howarth.at@gmail.com wrote: The attached patch is a back port of the change from https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=211067 for gcc-4_9-branch. Bootstrap and regression tested on x86_64-apple-darwin14 with Xcode 6.3. Okay for gcc-4_9-branch? Jack PR61352_backport.diff Ok. Committed revision 222835.
C++ PATCH for DR 1518 (explicit default ctor and copy-list-initialization)
I went to draft standardese for DR 1518 this week and realized that it had already been resolved by DR 1630, which clarified that default-initialization can call explicit default constructors even in the context of copy-list-initialization. So this patch adjusts my earlier approximation to reflect that. Tested x86_64-pc-linux-gnu, applying to trunk. commit 733826123adfc74ffb07e423fb8cd9311c8d6a86 Author: Jason Merrill ja...@redhat.com Date: Mon May 4 06:45:03 2015 -0500 DR 1518 DR 1630 PR c++/54835 PR c++/60417 * call.c (convert_like_real): Check value-initialization before explicit. * typeck2.c (process_init_constructor_record): Don't set CONSTRUCTOR_IS_DIRECT_INIT. (process_init_constructor_array): Likewise. * init.c (build_vec_init): Likewise. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 7bdf236..b77f69a 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -6243,19 +6243,9 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum, tree convfn = cand-fn; unsigned i; - /* When converting from an init list we consider explicit - constructors, but actually trying to call one is an error. */ - if (DECL_NONCONVERTING_P (convfn) DECL_CONSTRUCTOR_P (convfn) - /* Unless this is for direct-list-initialization. */ - !DIRECT_LIST_INIT_P (expr)) - { - if (!(complain tf_error)) - return error_mark_node; - error (converting to %qT from initializer list would use - explicit constructor %qD, totype, convfn); - } - - /* If we're initializing from {}, it's value-initialization. */ + /* If we're initializing from {}, it's value-initialization. Note + that under the resolution of core 1630, value-initialization can + use explicit constructors. */ if (BRACE_ENCLOSED_INITIALIZER_P (expr) CONSTRUCTOR_NELTS (expr) == 0 TYPE_HAS_DEFAULT_CONSTRUCTOR (totype)) @@ -6271,6 +6261,18 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum, return expr; } + /* When converting from an init list we consider explicit + constructors, but actually trying to call one is an error. */ + if (DECL_NONCONVERTING_P (convfn) DECL_CONSTRUCTOR_P (convfn) + /* Unless this is for direct-list-initialization. */ + !DIRECT_LIST_INIT_P (expr)) + { + if (!(complain tf_error)) + return error_mark_node; + error (converting to %qT from initializer list would use + explicit constructor %qD, totype, convfn); + } + expr = mark_rvalue_use (expr); /* Set user_conv_p on the argument conversions, so rvalue/base diff --git a/gcc/cp/init.c b/gcc/cp/init.c index a4fc9ff..c41e30c 100644 --- a/gcc/cp/init.c +++ b/gcc/cp/init.c @@ -3720,7 +3720,6 @@ build_vec_init (tree base, tree maxindex, tree init, if (cxx_dialect = cxx11 AGGREGATE_TYPE_P (type)) { init = build_constructor (init_list_type_node, NULL); - CONSTRUCTOR_IS_DIRECT_INIT (init) = true; } else { diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index c0df823..6e0c777 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -1285,7 +1285,6 @@ process_init_constructor_array (tree type, tree init, we can't rely on the back end to do it for us, so make the initialization explicit by list-initializing from T{}. */ next = build_constructor (init_list_type_node, NULL); - CONSTRUCTOR_IS_DIRECT_INIT (next) = true; next = massage_init_elt (TREE_TYPE (type), next, complain); if (initializer_zerop (next)) /* The default zero-initialization is fine for us; don't @@ -1396,9 +1395,6 @@ process_init_constructor_record (tree type, tree init, for us, so build up TARGET_EXPRs. If the type in question is a class, just build one up; if it's an array, recurse. */ next = build_constructor (init_list_type_node, NULL); - /* Call this direct-initialization pending DR 1518 resolution so - that explicit default ctors don't break valid C++03 code. */ - CONSTRUCTOR_IS_DIRECT_INIT (next) = true; next = massage_init_elt (TREE_TYPE (field), next, complain); /* Warn when some struct elements are implicitly initialized. */ diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-dr1518.C b/gcc/testsuite/g++.dg/cpp0x/initlist-dr1518.C new file mode 100644 index 000..9f8ee0b --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/initlist-dr1518.C @@ -0,0 +1,27 @@ +// DR 1518 +// { dg-do compile { target c++11 } } + +struct A { + explicit A() = default; +}; + +struct B : A { + explicit B() = default; +}; + +struct C { + explicit C(); +}; + +struct D : A { + C c; + explicit D() = default; +}; + +templatetypename T void f() { + T t = {}; +} +templatetypename T void g() { + void x(T t); + x({}); +} diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist40.C b/gcc/testsuite/g++.dg/cpp0x/initlist40.C index 6e6a11a..de2e19d 100644 --- a/gcc/testsuite/g++.dg/cpp0x/initlist40.C +++ b/gcc/testsuite/g++.dg/cpp0x/initlist40.C @@
Three C++ PATCHes to fix testsuite regressions with cxx_dialect defaulting to cxx11
These patches fix several issues I found by switching the C++ dialect default to C++11. The first patch fixes obj-c++.dg/encode-10.mm: When build_non_dependent_expr tries to get a constant value for a non-dependent expression, we need to know that AT_ENCODE_EXPR can't be a constant expression. The second patch fixes the GDB test gdb.cp/anon-struct.cc. In C++11 mode we declare the default constructor before we see the typedef which gives the anonymous struct a name for linkage purposes, and as a result the DWARF output for the constructor had no name. The patch causes us to ignore the constructor if we see it before it has a name, and update the name of the constructor when the type gets one. The third patch fixes g++.dg/torture/Wsizeof-memaccess2.C. We shouldn't emit those warnings when we're in SFINAE mode. Tested x86_64-pc-linux-gnu, applying to trunk. commit 3e0151efa351acdbb0a294a5d9c36b187c798d19 Author: Jason Merrill ja...@redhat.com Date: Tue May 5 20:55:56 2015 -0500 Fix obj-c++.dg/encode-10.mm with cxx_dialect == cxx11. * constexpr.c (potential_constant_expression_1) [AT_ENCODE_EXPR]: Return false. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 9ebb640..f35ec1e 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -4159,6 +4159,7 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, case OBJ_TYPE_REF: case TRANSACTION_EXPR: case ASM_EXPR: +case AT_ENCODE_EXPR: fail: if (flags tf_error) error (expression %qE is not a constant-expression, t); commit 2d097581fbd5c0e3de866d879f2296d68753caff Author: Jason Merrill ja...@redhat.com Date: Tue May 5 20:52:13 2015 -0500 Fix gdb.cp/anon-struct.cc with -std=c++11. gcc/ * dwarf2out.c (gen_member_die): Don't emit anything for an anonymous class constructor. gcc/cp/ * decl2.c (reset_type_linkage_2): Update the DECL_NAME of a maybe-in-charge constructor. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index b2251d8..0d47847 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -2648,7 +2648,12 @@ reset_type_linkage_2 (tree type) if (TREE_CODE (m) == VAR_DECL) reset_decl_linkage (m); for (tree m = TYPE_METHODS (type); m; m = DECL_CHAIN (m)) - reset_decl_linkage (m); + { + reset_decl_linkage (m); + if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (m)) + /* Also update its name, for cxx_dwarf_name. */ + DECL_NAME (m) = TYPE_IDENTIFIER (type); + } binding_table_foreach (CLASSTYPE_NESTED_UTDS (type), bt_reset_linkage_2, NULL); } diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index ddca2a8..cb2656c 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -19951,6 +19951,10 @@ gen_member_die (tree type, dw_die_ref context_die) /* Don't include clones in the member list. */ if (DECL_ABSTRACT_ORIGIN (member)) continue; + /* Nor constructors for anonymous classes. */ + if (DECL_ARTIFICIAL (member) + dwarf2_name (member, 0) == NULL) + continue; child = lookup_decl_die (member); if (child) commit 2fe1e2823adecf5b62248f996391b787e4b62926 Author: Jason Merrill ja...@redhat.com Date: Tue May 5 20:55:00 2015 -0500 Fix g++.dg/torture/Wsizeof-pointer-memaccess2.C with -std=c++11 * semantics.c (finish_call_expr): Check complain. diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 0fc08b5f..701a8eb 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -2377,6 +2377,7 @@ finish_call_expr (tree fn, vectree, va_gc **args, bool disallow_virtual, if (!result) { if (warn_sizeof_pointer_memaccess + (complain tf_warning) !vec_safe_is_empty (*args) !processing_template_decl) {
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
On Mon, May 04, 2015 at 11:42:20AM -0600, Jeff Law wrote: On 05/04/2015 11:39 AM, Jakub Jelinek wrote: On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote: On 05/04/2015 10:37 AM, Alexander Monakov wrote: This patch introduces option -fno-plt that allows to expand calls that would go via PLT to load the address of the function immediately at call site (which introduces a GOT load). Cover letter explains the motivation for this patch. New option documentation for invoke.texi is missing from the patch; if this is accepted I'll be happy to send a v2 with documentation added. * calls.c (prepare_call_address): Transform PLT call to GOT lookup and indirect call by forcing address into a pseudo with -fno-plt. * common.opt (flag_plt): New option. OK once you cobble together the invoke.texi changes. Isn't what Michael/Alan suggested better? I mean as/ld/compiler changes to inline the plt slot's first part, then lazy binding will work fine. I must have missed Alan/Michael's message. ISTM the win here is that by going through the GOT, you can CSE the GOT reference and possibly get some more register allocation freedom. Is that still the case with Alan/Michael's approach? There are many advantages to 'going through the GOT'. CSE'ing the reference is just one. The biggest (IMO) is that you can avoid the bad PLT ABI that most targets have, where making a call to a PLT slot requires the GOT address to be pre-loaded into a fixed, call-saved register. This precludes sibcalls and forces many functions which otherwise would not need their own stack frames to create one for saving the old value of the GOT register. See my blog entry on the topic here: http://ewontfix.com/18/ Anyone who really wants lazy binding can use -fplt (which is presumably still the default; I didn't check) but lazy binding should largely be considered deprecated anyway since effective use of relro protection requires -z now too, in which case you're paying all the costs (which are considerable!) for lazy binding support even though you won't get it. Rich
Re: [AArch64][PR65375] Fix RTX cost for vector SET
On 05/05/15 16:17, James Greenhalgh wrote: On Sat, Apr 25, 2015 at 12:26:16AM +0100, Kugan wrote: Thanks for the review. I have updated the patch based on the comments with some other minor changes. Bootstrapped and regression tested on aarch64-none-linux-gnu with no-new regressions. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2015-04-24 Kugan Vivekanandarajah kug...@linaro.org Jim Wilson jim.wil...@linaro.org * config/arm/aarch-common-protos.h (struct mem_cost_table): Added new fields loadv and storev. * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs): Initialize loadv and storev. * config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise. (cortexa53_extra_costs): Likewise. (cortexa57_extra_costs): Likewise. (xgene1_extra_costs): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector rtx_costs. Hi Kugan, Just a few syle comments, regarding the placements of comments in single-line if statements. I know the current code does not neccesarily always follow the comments below, I'll write a patch cleaning that up at some point when I'm back at my desk. Thanks, James @@ -5667,6 +5668,14 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED, case NEG: op0 = XEXP (x, 0); + if (VECTOR_MODE_P (mode)) +{ + if (speed) +/* FNEG. */ +*cost += extra_cost-vect.alu; + return false; +} + if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT) { if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE Personally, I find commented if statements without braces hard to quickly parse. Something like this is much faster for me: if (speed) { /* FNEG. */ *cost += extra_cost-vect.alu; } @@ -5844,7 +5872,10 @@ cost_minus: if (speed) { -if (GET_MODE_CLASS (mode) == MODE_INT) +if (VECTOR_MODE_P (mode)) + /* Vector SUB. */ + *cost += extra_cost-vect.alu; +else if (GET_MODE_CLASS (mode) == MODE_INT) /* SUB(S). */ *cost += extra_cost-alu.arith; else if (GET_MODE_CLASS (mode) == MODE_FLOAT) As above. @@ -5888,7 +5919,6 @@ cost_plus: { if (speed) *cost += extra_cost-alu.arith_shift; - *cost += rtx_cost (XEXP (XEXP (op0, 0), 0), (enum rtx_code) GET_CODE (op0), 0, speed); Drop this whitespace change. @@ -5913,7 +5943,10 @@ cost_plus: if (speed) { -if (GET_MODE_CLASS (mode) == MODE_INT) +if (VECTOR_MODE_P (mode)) + /* Vector ADD. */ + *cost += extra_cost-vect.alu; +else if (GET_MODE_CLASS (mode) == MODE_INT) /* ADD. */ *cost += extra_cost-alu.arith; else if (GET_MODE_CLASS (mode) == MODE_FLOAT) As above. @@ -6013,10 +6061,15 @@ cost_plus: return false; case NOT: - /* MVN. */ if (speed) -*cost += extra_cost-alu.logical; - +{ + /* Vector NOT. */ + if (VECTOR_MODE_P (mode)) +*cost += extra_cost-vect.alu; + /* MVN. */ + else +*cost += extra_cost-alu.logical; +} /* The logical instruction could have the shifted register form, but the cost is the same if the shift is processed as a separate instruction, so we don't bother with it here. */ As above. @@ -6055,10 +6108,15 @@ cost_plus: return true; } - /* UXTB/UXTH. */ if (speed) -*cost += extra_cost-alu.extend; - +{ + if (VECTOR_MODE_P (mode)) +/* UMOV. */ +*cost += extra_cost-vect.alu; + else +/* UXTB/UXTH. */ +*cost += extra_cost-alu.extend; +} return false; ca§se SIGN_EXTEND: And again :) @@ -6087,10 +6150,16 @@ cost_plus: if (CONST_INT_P (op1)) { - /* LSL (immediate), UBMF, UBFIZ and friends. These are all - aliases. */ if (speed) -*cost += extra_cost-alu.shift; +{ + /* Vector shift (immediate). */ + if (VECTOR_MODE_P (mode)) +*cost += extra_cost-vect.alu; + /* LSL (immediate), UBMF, UBFIZ and friends. These are all + aliases. */ + else +*cost += extra_cost-alu.shift; +} /* We can incorporate zero/sign extend for free. */ if (GET_CODE (op0) == ZERO_EXTEND Again, the comment here makes it very difficult to spot the form of the if/else statement. @@ -6102,10 +6171,15 @@ cost_plus: } else { - /* LSLV. */ if (speed) -*cost += extra_cost-alu.shift_reg; - +{ + /* Vector shift
Re: [PATCH 5/8] always define HAVE_simple_return and HAVE_return
On Apr 26, 2015, at 10:55 PM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org gcc/ChangeLog: 2015-04-27 Trevor Saunders tbsaunde+...@tbsaunde.org * bb-reorder.c (HAVE_return): Don't check if its undefined. * defaults.h (gen_simple_return): New function. (gen_simple_return): Likewise. (HAVE_return): Add default definition to false. (HAVE_simple_return): Likewise. * cfgrtl.c (force_nonfallthru_and_redirect): Remove checks if HAVE_return and HAVE_simple_return are defined. * function.c (gen_return_pattern): Likewise. (convert_jumps_to_returns): Likewise. (thread_prologue_and_epilogue_insns): Likewise. * reorg.c (find_end_label): Likewise. (dbr_schedule): Likewise. * shrink-wrap.c: Likewise. * shrink-wrap.h: Likewise. I’m seeing: In file included from ./tm.h:30:0, from ../../gcc/gcc/c-family/c-semantics.c:24: ../../gcc/gcc/defaults.h: In function ‘rtx_def* gen_simple_return()’: ../../gcc/gcc/defaults.h:1422:1: error: redefinition of ‘rtx_def* gen_simple_return()’ gen_simple_return () ^ In file included from ./tm.h:22:0, from ../../gcc/gcc/c-family/c-semantics.c:24: ./insn-flags.h:1744:1: error: ‘rtx_def* gen_simple_return()’ previously defined here gen_simple_return(void) ^ in my port. I have a simple_return and a return that is “0” enabled. defaults.h has: #ifndef HAVE_simple_return #define HAVE_simple_return 0 static inline rtx gen_simple_return () { gcc_unreachable (); return NULL; } #endif and insn-flags.h has: static inline rtx gen_simple_return (void); static inline rtx gen_simple_return(void) { return 0; } If I change the enable to “1” or “” then it compiles better. Also, I can delete the pattern or change the name of the pattern and it works ok as well. If they both did #ifndef HAVE_simple_return, and then insn-flags did #define HAVE_simple_return 0, I think it might work better. I’ve not thought about it much.
[PATCH] Add SPECIAL_FLOAT_MODE to enable adding IEEE 128-bit floating point to PowerPC
This patch contains the machine independent changes that I will need to add IEEE 128-bit floating point to the GCC compiler for PowerPC. It adds a new mode defintion macro: SPECIAL_FLOAT_MODE that is similar to FLOAT_MODE. The difference is the conversion system will skip SPECIAL_FLOAT_MODE types when it is looking for a larger type. The problem is the PowerPC will have 2 128-bit floating point types, one using the IBM double-double format (a pair of doubles to give the user more mantissa bits, but no greater exponent range), and IEEE 128-bit floating point. I don't want DFmode to automatically convert to KFmode (IEEE 128-bit floating point), but to TFmode (IBM double-double). With these patches applied, in the places where we are setting up types, we include the special floating point types, but for the normal GET_MODE_WIDER_MODE usage we do not want to use the special floating point mode. I found some of the GET_MODE_WIDER_MODE loops needed to add a check for running out of modes if the mode type was special. For those loops, I added a test for mode not being VOIDmode. 2015-05-05 Michael Meissner meiss...@linux.vnet.ibm.com * machmode.h (GET_MODE_WIDER_MODE_SPECIAL): New macro to get the wider modes that are normally skipped by default. * rtlanal.c (init_num_sign_bit_copies_in_rep): In going from narrow to wider modes, check whether we receive VOIDmode, since special floating point types are not listed in the normal widening tables. When doing initializations, go through all modes in a class, even special modes, that are normally skipped by default widening. * cse.c (cse_insn): Likewise. * expr.c (init_expr_target): Likewise. (compress_float_constant): Likewise. * dse.c (find_shift_sequence): Likewise. * emit-rtl.c (init_derived_machine_modes): Likewise. (init_emit_once): Likewise. * combine.c (simplify_comparison): Likewise. * machmode.def (SPECIAL_FLOAT_MODE): New type of floating point that is special, and is not automatically part of the normal widening rules. * genmodes.c (struct mode_data): Add special field. (blank_mode): Initialize special. (complete_mode): Complex and vector types inherit the special mode class. (FLOAT_MODE): Add special field for floating point to sort special nodes higher than normal nodes for the same size. The intention is to allow __float128 on PowerPC (KFmode) to be higher than long double (TFmode), so that automatic widening uses the long double type. (FRACTIONAL_FLOAT_MODE): Likewise. (SPECIAL_FLOAT_MODE): Likewise. (FLOAT_MODE_INTERNAL): Likewise. (make_float_mode): Likewise. (emit_mode_wider): Likewise. These patches have been part of my IEEE 128-bit floating point branch for some time, and I have bootstrapped that branch many times. At present, I am just submitting these patches to add the necessary infrastructure for the patches I will be contributing later. I have done an x86 bootstrap on the IEEE 128-bit floating point branch to make sure it does not cause problems for ports that do not define SPECIAL_FLOAT_MODE types. Are these patches ok to commit to the trunk? I would also be open to alternative ways of creating a mode that is not part of the normal wider conversion methods or in adding options to control the order the widening types are searched. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/cse.c === --- gcc/cse.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/cse.c) (revision 222823) +++ gcc/cse.c (.../gcc/cse.c) (working copy) @@ -4856,7 +4856,7 @@ cse_insn (rtx_insn *insn) rtx new_and = gen_rtx_AND (VOIDmode, NULL_RTX, XEXP (src, 1)); for (tmode = GET_MODE_WIDER_MODE (mode); - GET_MODE_SIZE (tmode) = UNITS_PER_WORD; + GET_MODE_SIZE (tmode) = UNITS_PER_WORD tmode != VOIDmode; tmode = GET_MODE_WIDER_MODE (tmode)) { rtx inner = gen_lowpart (tmode, XEXP (src, 0)); @@ -4908,7 +4908,7 @@ cse_insn (rtx_insn *insn) XEXP (memory_extend_rtx, 0) = src; for (tmode = GET_MODE_WIDER_MODE (mode); - GET_MODE_SIZE (tmode) = UNITS_PER_WORD; + GET_MODE_SIZE (tmode) = UNITS_PER_WORD tmode != VOIDmode; tmode = GET_MODE_WIDER_MODE (tmode)) { struct table_elt *larger_elt; Index: gcc/expr.c === --- gcc/expr.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/expr.c) (revision 222823) +++ gcc/expr.c (.../gcc/expr.c)(working copy) @@ -261,11 +261,11 @@
C++ PATCH to add -Wterminate
C++11 changed destructors to be noexcept by default; as a result, code with throwing destructors that works in C++98 immediately calls terminate in C++11. When testing changing the default for cxx_dialect to C++11 I noticed that this was causing one of the tests to fail, and decided to add a warning for a throw-expression in a noexcept function without an intervening try block. This patch also adds the related warning to -Wc++11-compat. Tested x86_64-pc-linux-gnu, applying to trunk. commit 1ec3d94aa682212752ff3e1cef13ac118a71aa78 Author: Jason Merrill ja...@redhat.com Date: Tue May 5 14:01:08 2015 -0500 gcc/c-family/ * c.opt (Wterminate): New. gcc/cp/ * cp-gimplify.c (cp_genericize_r): Track TRY_BLOCK and MUST_NOT_THROW_EXPR, warn about a THROW_EXPR directly within a MUST_NOT_THROW_EXPR. (cp_genericize_data): Add try_block field. (cp_genericize_tree): Initialize it. * except.c (expand_end_catch_block): Set TREE_NO_WARNING on implicit rethrow. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 983f4a8..8ef0cea 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -829,6 +829,10 @@ Wsystem-headers C ObjC C++ ObjC++ Warning ; Documented in common.opt +Wterminate +C++ ObjC++ Warning Var(warn_terminate) Init(1) +Warn if a throw expression will always result in a call to terminate() + Wtraditional C ObjC CPP(cpp_warn_traditional) CppReason(CPP_W_TRADITIONAL) Var(warn_traditional) Init(0) Warning Warn about features not present in traditional C diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c index 70645b5..35749ef 100644 --- a/gcc/cp/cp-gimplify.c +++ b/gcc/cp/cp-gimplify.c @@ -905,6 +905,7 @@ struct cp_genericize_data hash_settree *p_set; vectree bind_expr_stack; struct cp_genericize_omp_taskreg *omp_ctx; + tree try_block; }; /* Perform any pre-gimplification lowering of C++ front end trees to @@ -1193,6 +1194,54 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data) wtd-omp_ctx = omp_ctx.outer; splay_tree_delete (omp_ctx.variables); } + else if (TREE_CODE (stmt) == TRY_BLOCK) +{ + *walk_subtrees = 0; + tree try_block = wtd-try_block; + wtd-try_block = stmt; + cp_walk_tree (TRY_STMTS (stmt), cp_genericize_r, data, NULL); + wtd-try_block = try_block; + cp_walk_tree (TRY_HANDLERS (stmt), cp_genericize_r, data, NULL); +} + else if (TREE_CODE (stmt) == MUST_NOT_THROW_EXPR) +{ + /* MUST_NOT_THROW_COND might be something else with TM. */ + if (MUST_NOT_THROW_COND (stmt) == NULL_TREE) + { + *walk_subtrees = 0; + tree try_block = wtd-try_block; + wtd-try_block = stmt; + cp_walk_tree (TREE_OPERAND (stmt, 0), cp_genericize_r, data, NULL); + wtd-try_block = try_block; + } +} + else if (TREE_CODE (stmt) == THROW_EXPR) +{ + location_t loc = location_of (stmt); + if (TREE_NO_WARNING (stmt)) + /* Never mind. */; + else if (wtd-try_block) + { + if (TREE_CODE (wtd-try_block) == MUST_NOT_THROW_EXPR + warning_at (loc, OPT_Wterminate, + throw will always call terminate()) + cxx_dialect = cxx11 + DECL_DESTRUCTOR_P (current_function_decl)) + inform (loc, in C++11 destructors default to noexcept); + } + else + { + if (warn_cxx0x_compat cxx_dialect cxx11 + DECL_DESTRUCTOR_P (current_function_decl) + (TYPE_RAISES_EXCEPTIONS (TREE_TYPE (current_function_decl)) + == NULL_TREE) + (get_defaulted_eh_spec (current_function_decl) + == empty_except_spec)) + warning_at (loc, OPT_Wc__0x_compat, + in C++11 this throw will terminate because + destructors default to noexcept); + } +} else if (TREE_CODE (stmt) == CONVERT_EXPR) gcc_assert (!CONVERT_EXPR_VBASE_PATH (stmt)); else if (TREE_CODE (stmt) == FOR_STMT) @@ -1269,6 +1318,7 @@ cp_genericize_tree (tree* t_p) wtd.p_set = new hash_settree; wtd.bind_expr_stack.create (0); wtd.omp_ctx = NULL; + wtd.try_block = NULL_TREE; cp_walk_tree (t_p, cp_genericize_r, wtd, NULL); delete wtd.p_set; wtd.bind_expr_stack.release (); diff --git a/gcc/cp/except.c b/gcc/cp/except.c index 3ff1ce6..614f2e9 100644 --- a/gcc/cp/except.c +++ b/gcc/cp/except.c @@ -579,7 +579,11 @@ expand_end_catch_block (void) if (in_function_try_handler (DECL_CONSTRUCTOR_P (current_function_decl) || DECL_DESTRUCTOR_P (current_function_decl))) -finish_expr_stmt (build_throw (NULL_TREE)); +{ + tree rethrow = build_throw (NULL_TREE); + TREE_NO_WARNING (rethrow) = true; + finish_expr_stmt (rethrow); +} } tree diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index debd8ed..9c8aa99 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -2882,6 +2882,12 @@ Warn when overload resolution chooses a promotion from unsigned or enumerated type to a signed type, over a conversion to an unsigned type of the same size. Previous versions of G++ tried to preserve
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Tue, May 05, 2015 at 09:32:28AM +0200, Jakub Jelinek wrote: 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. Note, the patch fixed the profiledbootstrap that has been broken in 5+. Jakub
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 15:29, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:20:43PM +0100, Richard Earnshaw wrote: On 05/05/15 14:06, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:02:19PM +0100, Richard Earnshaw wrote: In a literal sense, yes. However, even KR stdarg have standard promotion and conversion rules (size int = int, floats promoted to double, etc). What are those rules for GCC's overaligned types (ie where in the docs does it say what happens and how a back-end should interpret them)? Shouldn't the mid-end be doing that work so as to For the middle-end, the TYPE_ALIGN info on expression types is considered useless, you can get there anything. There is no conversion rule to what you get for myalignedint + int, or (myalignedint) int, or (int) myalignedint, etc. create a consistent view of the values passed into the back-end? It seems to me that at present the back-end has to be psychic as to what is really happening. No, the backend just shouldn't consider TYPE_ALIGN on the scalars, and it seems only arm ever looks at that. Nothing in the specification for TARGET_FUNCTION_ARG (or any of the related functions) makes any mention of this... While this requirement isn't documented, it is clearly common sense or at least something any kind of testing would reveal immediately. Then clearly no such tests exist in the testsuite :-( R. And it is nothing broken recently (except that with the SRA changes it hits much more often), looking e.g. at GCC 3.2, I'm seeing that expand_call is on that testcase also called with pretty random TYPE_ALIGN on the argument types; we didn't have GIMPLE then, so it is nothing that GIMPLE brought in. Jakub
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 15:06, Richard Biener wrote: On May 5, 2015 2:49:55 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 05/05/15 13:46, Richard Earnshaw wrote: On 05/05/15 13:37, Richard Biener wrote: On May 5, 2015 1:01:59 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 05/05/15 11:54, Richard Earnshaw wrote: On 05/05/15 08:32, Jakub Jelinek wrote: On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj 2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; I don't think this is right (though I suspect the existing code has the same problem). We should only look at mode if there is no type information. The problem is that GCC has a nasty habit of assigning real machine modes to things that are really BLKmode and we've run into several cases where this has royally screwed things up. So for consistency in the ARM back-end we are careful to only use mode when type is NULL (= it's a libcall). + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } It ought to be possible to re-order this, though, to static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (type != NULL_TREE) +{ + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; +} + return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY); } Either way, this would need careful cross-testing against an existing compiler. It looks as though either patch would cause ABI incompatibility for typedef int alignedint __attribute__((aligned((8; int __attribute__((weak)) foo (int a, alignedint b) {return b;} void bar (alignedint x) { foo (1, x); } Where currently gcc uses r2 as the argument register for b in foo. And for foo (1,2) or an int typed 2nd arg? Richard. R. As is int foo2 (int a, int b); void bar2 (alignedint x) { foo2 (1, x); } The real question here is why is TYPE the type of the value, rather than the type of the formal as expressed by the prototype (or implicit prototype in the case of variadics or KR)? Surely this is the mid-end passing the wrong information to the back-end. No - the issue is you are looking at the type of the value at all, instead of at the type of the formal. And I would argue that passing the type of the acutal is stupid in all cases except when the type of the formal does not exist. It's just asking for problems like these. R. Richard. R. ... R
RFA: RL78: Save the frame pointer if it is used.
Hi DJ, Below is a small patch to fix a small problem with the need_to_save() function in the RL78 backend. It was only marking the frame pointer register as needing a save if the frame pointer was in use. But since the frame pointer is call-saved it needs to be saved any time it is used, even if it is not being used as a frame pointer. This problem has not arisen before because the frame pointer is also marked as being fixed. But it turns out that it can still be used inside some inline assembler, and so the prologue and epilogue code still need to save and restore it. Tested with no regressions on an rl78-elf toolchain. OK to apply ? Cheers Nick gcc/ChangeLog 2015-05-05 Nick Clifton ni...@redhat.com * config/rl78/rl78.c (need_to_save): Save the frame pointer any time that it is used. Index: gcc/config/rl78/rl78.c === --- gcc/config/rl78/rl78.c (revision 222796) +++ gcc/config/rl78/rl78.c (working copy) @@ -687,7 +687,8 @@ return df_regs_ever_live_p (regno); } - if (regno == FRAME_POINTER_REGNUM frame_pointer_needed) + if (regno == FRAME_POINTER_REGNUM + (frame_pointer_needed || df_regs_ever_live_p (regno))) return true; if (fixed_regs[regno]) return false;
PING: [PATCH] PR target/65846: Optimize data access in PIE with copy reloc
On Wed, Apr 22, 2015 at 9:34 AM, H.J. Lu hongjiu...@intel.com wrote: Normally, with PIE, GCC accesses globals that are extern to the module using GOT. This is two instructions, one to get the address of the global from GOT and the other to get the value. Examples: --- extern int a_glob; int main () { return a_glob; } --- With PIE, the generated code accesses global via GOT using two memory loads: movqa_glob@GOTPCREL(%rip), %rax movl(%rax), %eax for 64-bit or movla_glob@GOT(%ecx), %eax movl(%eax), %eax for 32-bit. Some experiments on google and SPEC CPU benchmarks show that the extra instruction affects performance by 1% to 5%. Solution - Copy Relocations: When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (come from shared objects), the linker will create copy relocations and have them defined in the executable. Result is that no global access needs to go through GOT and hence improves performance. We can generate movla_glob(%rip), %eax for 64-bit and movla_glob@GOTOFF(%eax), %eax for 32-bit. This optimization only applies to undefined non-weak non-TLS global data. Undefined weak global or TLS data access still must go through GOT. This patch reverts legitimate_pic_address_disp_p change made in revision 218397, which only applies to x86-64. Instead, this patch updates targetm.binds_local_p to indicate if undefined non-weak non-TLS global data is defined locally in PIE. It also introduces a new target hook, binds_tls_local_p to distinguish TLS variable from non-TLS variable. By default, binds_tls_local_p is the same as binds_local_p. This patch checks if 32-bit and 64-bit linkers support PIE with copy reloc at configure time. 64-bit linker is enabled in binutils 2.25 and 32-bit linker is enabled in binutils 2.26. This optimization is enabled only if the linker support is available. Tested on Linux/x86-64 with -m32 and -m64, using linkers with and without support for copy relocation in PIE. OK for trunk? Thanks. H.J. --- gcc/ PR target/65846 * configure.ac (HAVE_LD_PIE_COPYRELOC): Renamed to ... (HAVE_LD_64BIT_PIE_COPYRELOC): This. (HAVE_LD_32BIT_PIE_COPYRELOC): New. Defined to 1 if Linux/ia32 linker supports PIE with copy reloc. * output.h (default_binds_tls_local_p): New. (default_binds_local_p_3): Add 2 bool arguments. * target.def (binds_tls_local_p): New target hook. * varasm.c (decl_default_tls_model): Replace targetm.binds_local_p with targetm.binds_tls_local_p. (default_binds_local_p_3): Add a bool argument to indicate TLS variable and a bool argument to indicate if an undefined non-TLS non-weak data is local. Double check TLS variable. If an undefined non-TLS non-weak data is local, treat it as defined locally. (default_binds_local_p): Pass false and false to default_binds_local_p_3. (default_binds_local_p_2): Likewise. (default_binds_local_p_1): Likewise. (default_binds_tls_local_p): New. * config.in: Regenerated. * configure: Likewise. * doc/tm.texi: Likewise. * config/i386/i386.c (legitimate_pic_address_disp_p): Don't check HAVE_LD_PIE_COPYRELOC here. (ix86_binds_local): New. (ix86_binds_tls_local_p): Likewise. (ix86_binds_local_p): Use it. (TARGET_BINDS_TLS_LOCAL_P): New. * doc/tm.texi.in (TARGET_BINDS_TLS_LOCAL_P): New hook. gcc/testsuite/ PR target/65846 * gcc.target/i386/pie-copyrelocs-1.c: Updated for ia32. * gcc.target/i386/pie-copyrelocs-2.c: Likewise. * gcc.target/i386/pie-copyrelocs-3.c: Likewise. * gcc.target/i386/pie-copyrelocs-4.c: Likewise. * gcc.target/i386/pr32219-9.c: Likewise. * gcc.target/i386/pr32219-10.c: New file. * lib/target-supports.exp (check_effective_target_pie_copyreloc): Check HAVE_LD_64BIT_PIE_COPYRELOC and HAVE_LD_32BIT_PIE_COPYRELOC instead of HAVE_LD_64BIT_PIE_COPYRELOC. Richard, Jeff, Can you review this patch: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01331.html Thanks. -- H.J.
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 15:33, Richard Earnshaw wrote: On 05/05/15 15:29, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:20:43PM +0100, Richard Earnshaw wrote: On 05/05/15 14:06, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:02:19PM +0100, Richard Earnshaw wrote: In a literal sense, yes. However, even KR stdarg have standard promotion and conversion rules (size int = int, floats promoted to double, etc). What are those rules for GCC's overaligned types (ie where in the docs does it say what happens and how a back-end should interpret them)? Shouldn't the mid-end be doing that work so as to For the middle-end, the TYPE_ALIGN info on expression types is considered useless, you can get there anything. There is no conversion rule to what you get for myalignedint + int, or (myalignedint) int, or (int) myalignedint, etc. create a consistent view of the values passed into the back-end? It seems to me that at present the back-end has to be psychic as to what is really happening. No, the backend just shouldn't consider TYPE_ALIGN on the scalars, and it seems only arm ever looks at that. Nothing in the specification for TARGET_FUNCTION_ARG (or any of the related functions) makes any mention of this... While this requirement isn't documented, it is clearly common sense or at least something any kind of testing would reveal immediately. Then clearly no such tests exist in the testsuite :-( Or, more precisely, no such tests existed in 2008, when the code went in. R. R. And it is nothing broken recently (except that with the SRA changes it hits much more often), looking e.g. at GCC 3.2, I'm seeing that expand_call is on that testcase also called with pretty random TYPE_ALIGN on the argument types; we didn't have GIMPLE then, so it is nothing that GIMPLE brought in. Jakub
Re: Next set of OpenACC changes: C family
On 05/05/2015 07:18 AM, Jakub Jelinek wrote: On Tue, May 05, 2015 at 10:57:28AM +0200, Thomas Schwinge wrote: --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -809,7 +809,7 @@ const struct attribute_spec c_common_attribute_table[] = handle_omp_declare_simd_attribute, false }, { cilk simd function, 0, -1, true, false, false, handle_omp_declare_simd_attribute, false }, - { omp declare target, 0, 0, true, false, false, + { omp declare target, 0, -1, true, false, false, handle_omp_declare_target_attribute, false }, { alloc_align,1, 1, false, true, true, handle_alloc_align_attribute, false }, Can you explain this change? omp declare target doesn't take any arguments, so 0, 0, looks right to me. Because we are using that attribute for oacc routines, and routines may have contain clauses. Thinking about this some more, we could probably revert this change. I have another patch to disable exception handling inside openacc accelerated regions because the nvptx target doesn't support them. In that patch I introduced a new oacc function attribute. Maybe we should attach the acc routine clauses on that oacc function attribute. @@ -823,6 +823,7 @@ const struct attribute_spec c_common_attribute_table[] = handle_bnd_legacy, false }, { bnd_instrument, 0, 0, true, false, false, handle_bnd_instrument, false }, + { oacc declare, 0, -1, true, false, false, NULL, false }, { NULL, 0, 0, false, false, false, NULL, false } If oacc declare is different, then supposedly you shouldn't reuse omp declare target attribute for the OpenACC thingie. I'm not sure about this one. Oacc has enough quirks where it may be justifiable though. I'll find out who wrote this patch. --- gcc/c-family/c-omp.c +++ gcc/c-family/c-omp.c @@ -1087,3 +1087,108 @@ c_omp_predetermined_sharing (tree decl) return OMP_CLAUSE_DEFAULT_UNSPECIFIED; } + +/* Return a numerical code representing the device_type. Currently, + only device_type(nvidia) is supported. All device_type parameters + are treated as case-insensitive keywords. */ + +int +oacc_extract_device_id (const char *device) +{ + if (!strcasecmp (device, nvidia)) +return GOMP_DEVICE_NVIDIA_PTX; + return GOMP_DEVICE_NONE; +} Why do you support just one particular device_type? That sounds broken. You should just have some table with names - GOMP_DEVICE_* mappings. I kind of wanted to keep this patch local in gomp-4_0-branch until it was a little more functional. Adding proper support for device_type is going to be more involved. For instance, the the tile clause changes the shape of a loop, so if you have #pragma acc loop tile (2, 4) device_type (nvidia) tile (5, 5) \ device_type (something_else) tile (1, 4) we're going to have to generate three different versions of that parallel region. Then we'd have to teach the compiler to the offload regions with the proper number of gangs, workers and vectors, etc. For our initial implementation, we just decided to support device_type (nvidia), since openacc is really only working on nvptx and host devices. And the runtime is rigged up to ignore num_gangs, num_workers and vector_length for the host anyway. So that's why I filtered out the device_type clauses in the front end. Also, for full disclosure, we're parsing the tile clause, but we're not actually tiling the loops yet. We're still in the process of getting the oacc execution model working on the nvptx target. Things which are easy to do in cpu threads (barriers and synchronization, global memory, etc.) are not as straightforward on gpus, unfortunately. + if (code (1 GOMP_DEVICE_NVIDIA_PTX)) +{ + if (seen_nvidia) +{ + seen_nvidia = NULL_TREE; + error_at (OMP_CLAUSE_LOCATION (c), +duplicate device_type (nvidia)); + goto filter_error; +} + else +seen_nvidia = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c); Again, I must say I don't like the hardcoding of one particular device type here. Doesn't Intel want to support OpenACC for XeonPhi? What about HSA eventually, etc.? @@ -4624,7 +4657,7 @@ c_parser_compound_statement_nostart (c_parser *parser) last_label = false; mark_valid_location_for_stdc_pragma (false); c_parser_declaration_or_fndef (parser, true, true, true, true, - true, NULL, vNULL); + true, NULL, vNULL, NULL_TREE, false); Wouldn't default arguments be in order here? Though, even those will mean compile time cost of passing all the zeros almost all the time. I'll check who did this. -/* OpenMP 2.5: +/* OpenACC: + num_gangs (
Re: [Bug target/66015] New: align directives not propagated after __attribute__ ((__optimize__ (O2)))
On Tue, 2015-05-05 at 14:19 +0200, Christian Bruel wrote: I don't the environment to run the testsuite for ia64. would you mind giving it a try and verify that it fixes the issue ? I don't have ia64 hardware anymore. The GCC compile farm advertises ia64 hardware https://gcc.gnu.org/wiki/CompileFarm though the web site suggests that all ia64 machines are offline, so I don't know if that is going to help. Jim
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Tue, May 05, 2015 at 02:20:43PM +0100, Richard Earnshaw wrote: On 05/05/15 14:06, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:02:19PM +0100, Richard Earnshaw wrote: In a literal sense, yes. However, even KR stdarg have standard promotion and conversion rules (size int = int, floats promoted to double, etc). What are those rules for GCC's overaligned types (ie where in the docs does it say what happens and how a back-end should interpret them)? Shouldn't the mid-end be doing that work so as to For the middle-end, the TYPE_ALIGN info on expression types is considered useless, you can get there anything. There is no conversion rule to what you get for myalignedint + int, or (myalignedint) int, or (int) myalignedint, etc. create a consistent view of the values passed into the back-end? It seems to me that at present the back-end has to be psychic as to what is really happening. No, the backend just shouldn't consider TYPE_ALIGN on the scalars, and it seems only arm ever looks at that. Nothing in the specification for TARGET_FUNCTION_ARG (or any of the related functions) makes any mention of this... While this requirement isn't documented, it is clearly common sense or at least something any kind of testing would reveal immediately. And it is nothing broken recently (except that with the SRA changes it hits much more often), looking e.g. at GCC 3.2, I'm seeing that expand_call is on that testcase also called with pretty random TYPE_ALIGN on the argument types; we didn't have GIMPLE then, so it is nothing that GIMPLE brought in. Jakub
Commit: MSP430: Add -mcode-region= and -mdata-region= options
Hi Guys, I am applying the attached patch to add two new command line options to the MSP430 backend. The -mcode-region= and -mdata-region= options allow the user to specify whether functions and data should be placed into low memory (below 64K) or high memory. This only applies to the MSP430X, and only when the specific MSP430 part actually has both low and high memory regions. An interesting ability of this new feature is to select a memory region of either, which tells the linker to place the object into low memory if there is room available and into high memory otherwise. Three new function and data attributes have also been added - lower, either and upper - which allow the programmer to explicitly state which memory region an object should use. Tested with no regressions and 77 less unexpected failures on an msp430-elf toolchain. Cheers Nick gcc/ChangeLog 2015-05-05 Nick Clifton ni...@redhat.com * config/msp430/msp430-opts.h (enum msp430_regions): New. * config/msp430/msp430.c (msp430_override_options): Complain if -mcode-region or -mdata-region is used on a non MSP430X. (msp430_section_attr): New function. Checks lower, upper and either attributes. (msp430_attribute_table): Add lower, upper and either. (gen_prefix): New function. Generates a prefix for a section name. (msp430_select_section): New function - handles the choice of section for an object. Takes into account memory region attributes and options. (msp430_function_section): Use gen_prefix. (TARGET_SECTION_TYPE_FLAGS): Define. (msp430_section_type_flags): New function. (TARGET_ASM_UNIQUE_SECTION): Define. (msp430_unique_section): New function. (msp430_output_aligned_decl_common): New function. (msp430_do_not_relax_short_jumps): New function. * config/msp430/msp430.h (USE_SELECT_SECTION_FOR_FUNCTIONS): Define. (ASM_OUTPUT_ALIGNED_DECL_COMMON): Define. * config/msp430/msp430-protos.h (msp430_do_not_relax_short_jumps): New prototype. (msp430_output_aligned_decl_common): New prototype. * config/msp430/msp430.md (length): New attribute. (cbranchhi4_real): If msp430_do_not_relax_short_jumps is true then use a long code sequence for short jumps. * config/msp430/msp430.opt (mcode-region): New. (mdata-region): New. * doc/invoke.texi: Document new options. * doc/extend.texi: Document new attributes. Index: gcc/config/msp430/msp430-opts.h === --- gcc/config/msp430/msp430-opts.h (revision 222796) +++ gcc/config/msp430/msp430-opts.h (working copy) @@ -29,4 +29,12 @@ F5SERIES }; +enum msp430_regions +{ + ANY, + EITHER, + LOWER, + UPPER +}; + #endif Index: gcc/config/msp430/msp430-protos.h === --- gcc/config/msp430/msp430-protos.h (revision 222796) +++ gcc/config/msp430/msp430-protos.h (working copy) @@ -21,6 +21,7 @@ #ifndef GCC_MSP430_PROTOS_H #define GCC_MSP430_PROTOS_H +bool msp430_do_not_relax_short_jumps (void); rtx msp430_eh_return_stackadj_rtx (void); void msp430_expand_eh_return (rtx); void msp430_expand_epilogue (int); @@ -40,6 +41,7 @@ const char * msp430x_logical_shift_right (rtx); const char * msp430_mcu_name (void); bool msp430_modes_tieable_p (machine_mode, machine_mode); +voidmsp430_output_aligned_decl_common (FILE *, const tree, const char *, unsigned HOST_WIDE_INT, unsigned); void msp430_output_labelref (FILE *, const char *); void msp430_register_pragmas (void); rtx msp430_return_addr_rtx (int); Index: gcc/config/msp430/msp430.c === --- gcc/config/msp430/msp430.c (revision 222796) +++ gcc/config/msp430/msp430.c (working copy) @@ -244,6 +244,11 @@ if (TARGET_LARGE !msp430x) error (-mlarge requires a 430X-compatible -mmcu=); + if (msp430_code_region == UPPER ! msp430x) +error (-mcode-region=upper requires 430X-compatible cpu); + if (msp430_data_region == UPPER ! msp430x) +error (-mdata-region=upper requires 430X-compatible cpu); + if (flag_exceptions || flag_non_call_exceptions || flag_unwind_tables || flag_asynchronous_unwind_tables) flag_omit_frame_pointer = false; @@ -1146,46 +1151,234 @@ + cfun-machine-framesize_outgoing); } +/* Attribute Handling. */ + +const char * const ATTR_INTR = interrupt; +const char * const ATTR_WAKEUP = wakeup; +const char * const ATTR_NAKED = naked; +const char * const ATTR_REENT = reentrant; +const char * const ATTR_CRIT = critical; +const char * const ATTR_LOWER = lower; +const char * const ATTR_UPPER = upper; +const char * const ATTR_EITHER = either; + static inline bool -is_attr_func (const char * attr) +has_attr (const char
RFA: Doc update: Describe new MSP430 feature
Hi Gerald, Hi Joesph, Hi Sandra, I recently contributes a patch to enhance the MSP430 backend with the ability to automatically distribute code and data between high and low memory regions. Below is a patch to update the html documentation with a description of this new feature. Is it OK to apply ? Cheers Nick Index: htdocs/gcc-5/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.120 diff -u -3 -p -r1.120 changes.html --- htdocs/gcc-5/changes.html 1 May 2015 22:44:45 - 1.120 +++ htdocs/gcc-5/changes.html 5 May 2015 15:31:03 - @@ -859,6 +859,27 @@ here/a./p when compiling for soft-float targets./li /ul +h3 id=msp430MSP430/h3 + ul +lipThe MSP430 compiler now has the ability to automatically distribute code + and data between low (less than 64K) memory and high memory. This only + applies parts that actually have both memory regions and only if the + linker script for the part has been specifically set up to support this + feature./p + + pA new attribute of codeeither/code can be applied to both functions + and data, and this tells the compiler to place the object into low memory + if there is room and into high memory otherwise. Two other new attributes + - codelower/code and codeupper/code - can be used to explicitly + state that an object should be placed in the specified memory region. If + there is not enough left in that region the compilation will fail./p + + pTwo new command line options - code-mcode-region=[lower|upper|either]/code + and code-mdata-region=[lower|upper|either]/code - can be used to tell + the compiler what to do with objects that do not have one of these new + attributes./p/li + /ul + h3 id=nds32NDS32/h3 ul liThe variadic function ABI implementation is now compatible with
ping**n re [patch, ARM] Add support for crtfastmath.o
This patch I posted last fall: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00711.html still has not been reviewed, in spite of me pinging it several times before GCC 5 went into stage 4. Now that we're back in stage 1 again, can somebody please take a look at it? I've just re-tested it on mainline head. FAOD, NVIDIA assigned the copyright on their original version of the patch to Mentor Graphics so there should not be any legal issue with it. -Sandra
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On May 5, 2015 4:33:58 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 05/05/15 15:33, Richard Earnshaw wrote: On 05/05/15 15:29, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:20:43PM +0100, Richard Earnshaw wrote: On 05/05/15 14:06, Jakub Jelinek wrote: On Tue, May 05, 2015 at 02:02:19PM +0100, Richard Earnshaw wrote: In a literal sense, yes. However, even KR stdarg have standard promotion and conversion rules (size int = int, floats promoted to double, etc). What are those rules for GCC's overaligned types (ie where in the docs does it say what happens and how a back-end should interpret them)? Shouldn't the mid-end be doing that work so as to For the middle-end, the TYPE_ALIGN info on expression types is considered useless, you can get there anything. There is no conversion rule to what you get for myalignedint + int, or (myalignedint) int, or (int) myalignedint, etc. create a consistent view of the values passed into the back-end? It seems to me that at present the back-end has to be psychic as to what is really happening. No, the backend just shouldn't consider TYPE_ALIGN on the scalars, and it seems only arm ever looks at that. Nothing in the specification for TARGET_FUNCTION_ARG (or any of the related functions) makes any mention of this... While this requirement isn't documented, it is clearly common sense or at least something any kind of testing would reveal immediately. Then clearly no such tests exist in the testsuite :-( Or, more precisely, no such tests existed in 2008, when the code went in. That just means that the code was the first that make this matter. BTW, what do other compilers do (I suppose llvm supports the gcc extensions for alignment)? Can llvm and gcc interoperate properly here? Do the cited testcases work with llvm? Richard. R. R. And it is nothing broken recently (except that with the SRA changes it hits much more often), looking e.g. at GCC 3.2, I'm seeing that expand_call is on that testcase also called with pretty random TYPE_ALIGN on the argument types; we didn't have GIMPLE then, so it is nothing that GIMPLE brought in. Jakub
[wwwdocs] Skeleton for GCC 6 release notes
I started working on this over the weekend, and then Jason wondered about it yesterday, so I completed and committed the following skeleton for the GCC 6 release notes yesterday. Gerald Index: gcc-6/changes.html === RCS file: gcc-6/changes.html diff -N gcc-6/changes.html --- /dev/null 1 Jan 1970 00:00:00 - +++ gcc-6/changes.html 5 May 2015 19:17:11 - @@ -0,0 +1,81 @@ +html + +head +titleGCC 6 Release Series mdash; Changes, New Features, and Fixes/title +/head + +!-- GCC maintainers, please do not hesitate to update/contribute entries + concerning those part of GCC you maintain! 2002-03-23, Gerald. +-- + +body +h1GCC 6 Release Seriesbr /Changes, New Features, and Fixes/h1 + +!-- .. -- +h2Caveats/h2 + + +!-- .. -- +!-- h2 id=generalGeneral Optimizer Improvements/h2 -- + + +!-- .. -- +h2 id=languagesNew Languages and Language specific improvements/h2 + +!-- h3 id=adaAda/h3 -- + +!-- h3 id=c-familyC family/h3 -- + +h3 id=cxxC++/h3 + +!-- h3 id=goGo/h3 -- + +!-- h3 id=javaJava (GCJ)/h3 -- + + +!-- .. -- +!-- h2 id=jitlibgccjit/h2 -- + + +!-- .. -- +!-- h2 id=targetsNew Targets and Target Specific Improvements/h2 -- + +!-- h3 id=aarch64AArch64/h3 -- + +!-- h3 id=armARM/h3 -- + +!-- h3 id=avrAVR/h3 -- + +!-- h3 id=x86IA-32/x86-64/h3 -- + +!-- h3 id=mipsMIPS/h3 -- + +!-- h3 id=nds32NDS32/h3 -- + +!-- h3 id=powerpcPowerPC / PowerPC64 / RS6000/h3 -- + +!-- h3 id=rxRX/h3 -- + +!-- h3 id=shSH/h3 -- + + +!-- .. -- +!-- h2 id=osOperating Systems/h2 -- + +!-- h3 id=dragonflyDragonFly BSD/h3 -- + +!-- h3 id=freebsdFreeBSD/h3 -- + +!-- h3 id=vxmilsVxWorks MILS/h3 -- + + +!-- .. -- +!-- h2Documentation improvements/h2 -- + + +!-- .. -- +!-- h2Other significant improvements/h2 -- + + +/body +/html
Re: RFA: Doc update: Describe new MSP430 feature
Hi Nick, On Tue, 5 May 2015, Nick Clifton wrote: I recently contributes a patch to enhance the MSP430 backend with the ability to automatically distribute code and data between high and low memory regions. Below is a patch to update the html documentation with a description of this new feature. Is it OK to apply ? as maintainer of the port you do not need anyone else's approval. That said, I am always happy to provide a second pair of eyes, so here are some comments: Index: htdocs/gcc-5/changes.html === +lipThe MSP430 compiler now has the ability to automatically distribute code + and data between low (less than 64K) memory and high memory. Should that be addresses below ... or something like that, since the system would still have more than 64K of memory, not less? + This only + applies parts Is a to missing here? + pTwo new command line options - code-mcode-region=[lower|upper|either]/code command-line options Does this really apply to GCC 5, or GCC 6 and the (new) gcc-6/changes.html? Gerald
RE: [RFC]: Remove Mem/address type assumption in combiner
Hi Segher, Thank you for testing the patch and approving it. Before I commit it, I wanted to check with you on the changelog entry. Please find the updated patch with the changelog entry. I have just removed the comments that says checks for PLUS/MINUS RTX is a hack. Let me know if it ok. Regards, Venkat. Change Log --- 2015-05-05 Venkataramanan Kumar venkataramanan.ku...@amd.com * combine.c (make_compound_operation): Remove checks for PLUS/MINUS rtx type. ---patch--- diff --git a/gcc/combine.c b/gcc/combine.c index c04146a..9e3eb03 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7723,9 +7723,8 @@ extract_left_shift (rtx x, int count) We try, as much as possible, to re-use rtl expressions to save memory. IN_CODE says what kind of expression we are processing. Normally, it is - SET. In a memory address (inside a MEM, PLUS or minus, the latter two - being kludges), it is MEM. When processing the arguments of a comparison - or a COMPARE against zero, it is COMPARE. */ + SET. In a memory address it is MEM. When processing the arguments of + a comparison or a COMPARE against zero, it is COMPARE. */ rtx make_compound_operation (rtx x, enum rtx_code in_code) @@ -7745,8 +7744,6 @@ make_compound_operation (rtx x, enum rtx_code in_code) but once inside, go back to our default of SET. */ next_code = (code == MEM ? MEM - : ((code == PLUS || code == MINUS) - SCALAR_INT_MODE_P (mode)) ? MEM : ((code == COMPARE || COMPARISON_P (x)) XEXP (x, 1) == const0_rtx) ? COMPARE : in_code == COMPARE ? SET : in_code); ---patch--- -Original Message- From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] Sent: Friday, May 01, 2015 8:48 PM To: Kumar, Venkataramanan Cc: Jeff Law (l...@redhat.com); gcc-patches@gcc.gnu.org; maxim.kuvyr...@linaro.org Subject: Re: [RFC]: Remove Mem/address type assumption in combiner On Wed, Apr 29, 2015 at 12:03:35PM -0500, Segher Boessenkool wrote: On Wed, Apr 29, 2015 at 09:25:21AM +, Kumar, Venkataramanan wrote: diff --git a/gcc/combine.c b/gcc/combine.c index 5c763b4..945abdb 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7703,8 +7703,6 @@ make_compound_operation (rtx x, enum rtx_code in_code) but once inside, go back to our default of SET. */ next_code = (code == MEM ? MEM - : ((code == PLUS || code == MINUS) - SCALAR_INT_MODE_P (mode)) ? MEM : ((code == COMPARE || COMPARISON_P (x)) XEXP (x, 1) == const0_rtx) ? COMPARE : in_code == COMPARE ? SET : in_code); On X86_64, it passes bootstrap and regression tests. But on Aarch64 the test in PR passed, but I got a few test case failures. There are few patterns based on multiplication operations in Aarch64 backend which are used to match with the pattern combiner generated. Now those patterns have to be fixed to use SHIFTS. Also need to see any impact on other targets. Right. It would be good if you could find out for what targets it matters. The thing is, if a target expects only the patterns as combine used to make them, it will regress (as you've seen on aarch64); but it will regress _silently_. Which isn't so nice. But before that I wanted to check if the assumption in combiner, can simply be removed ? Seeing for what targets / patterns it makes a difference would tell us the answer to that, too :-) I'll run some tests with and without your patch. So I ran the tests. These are text sizes for vmlinux built for most architectures (before resp. after the patch). Results are good, but they show many targets need some work... BIGGER 5410496 5432744 alpha 3848987 3849143 arm 2190276 2190292 blackfin 2188431 2191503 c6x (but data shrank by more than text grew) 2212322 2212706 cris 10896454 10896965 i386 7471891 7488771 parisc 6168799 6195391 parisc64 1545840 1545872 shnommu 1946649 1954149 xtensa I looked at some of those. Alpha regresses with s4add things, arm with add ,lsl things, parisc with shladd things. I tried to fix the parisc one (it seemed simplest), and the 32-bit kernel got most (but not all) of the size difference back; and the 64-bit wouldn't build (an assert in the kernel failed, and it uses a floating point reg where an integer one is needed -- I gave up). SMALLER 8688171 8688003 powerpc 20252605 20251797 powerpc64 11425334 11422558 s390 12300135 12299767 x86_64 The PowerPC differences are mostly what first was rlwinm;addi;rlwinm;lwz and now is rlwinm;lwz; i.e. the add is moved after the two rotates, the rotates are merged, and the add is made part of the offset in the load. NO DIFF 3321492 3321492 frv 3252747 3252747 m32r 4708232 4708232 microblaze 3949145 3949145 mips 5693019
[PATCH, i386]: Fix PR65990, unrecognizable insn with -mmemcpy-strategy=rep_8byte
Hello! Attached patch fixes PR65990. rep_8byte is not a valid stringop strategy for 32bit targets. 2015-05-05 Uros Bizjak ubiz...@gmail.com PR target/65990 * config/i386/i386.c (ix86_parse_stringop_strategy_string): Error out if rep_8byte stringop strategy was specified for 32-bit target. testsuite/ChangeLog: 2015-05-05 Uros Bizjak ubiz...@gmail.com PR target/65990 * gcc.target/i386/pr65990.c: New test. Tested on x86_64-linux-gnu{,-m32} and committed to mainline SVN. RMs, should this patch be backported to release branches? Uros. Index: testsuite/gcc.target/i386/pr65990.c === --- testsuite/gcc.target/i386/pr65990.c (revision 0) +++ testsuite/gcc.target/i386/pr65990.c (revision 0) @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options -mtune=btver2 -mmemcpy-strategy=rep_8byte:-1:noalign } + +/* { dg-error stringop strategy name rep_8byte specified for option -mmemcpy_strategy= not supported for 32-bit code { target ia32 } 0 } */ + +struct U9 +{ + unsigned a[9]; +}; + +struct U9 u9; + +void +foo () +{ + u9 = (struct U9) { +.a = { + 0xFF, + 0xFF, + 0xFF, + 0xFF, + 0xFF, + 0xFF, + 0xFF, + 0xFF, + 0xFF +} + }; +} Index: config/i386/i386.c === --- config/i386/i386.c (revision 222804) +++ config/i386/i386.c (working copy) @@ -2988,6 +2988,17 @@ ix86_parse_stringop_strategy_string (char *strateg return; } + if ((stringop_alg) i == rep_prefix_8_byte + !TARGET_64BIT) + { + /* rep; movq isn't available in 32-bit code. */ + error (stringop strategy name %s specified for option %s +not supported for 32-bit code, + alg_name, + is_memset ? -mmemset_strategy= : -mmemcpy_strategy=); + return; + } + input_ranges[n].max = maxs; input_ranges[n].alg = (stringop_alg) i; if (!strcmp (align, align))
Re: [PATCH, i386]: Fix PR65990, unrecognizable insn with -mmemcpy-strategy=rep_8byte
On Tue, May 05, 2015 at 06:58:08PM +0200, Uros Bizjak wrote: RMs, should this patch be backported to release branches? I think so (4.9+, I think 4.8 doesn't have -mmemcpy-strategy=). Jakub
[patch 0/27] RFC: Use automake-1.11.6 across the tree
Hello build machinery maintainers, following up http://thread.gmane.org/gmane.comp.gcc.patches/331902/focus=334462 http://thread.gmane.org/gmane.comp.gcc.patches/332160 On 01/25/2015 08:42 PM, Jan-Benedict Glaw wrote: On Sun, 2015-01-04 20:20:40 +0100, Michael Haubenwallner michael.haubenwall...@ssi-schaefer.com wrote: Updating to 1.14 might require more work like updating some .in files as well. I've seen automake-1.11 being explicitly used, so for now we really want 1.11.6 eventually? Probably yes, we may want to stick to a well-known version. (Maybe another way could be to really upgrade to current versions, with is, I guess, more work than just sync files and rerun. That might be fruitful (ie. to not stick to older versions), but this is a change I don't see in the current stage.) To cut a long story short: * Do we actually see *problems* from the version inconsistencies already introduced by me and/or others? There's a problem for gcc-developers: When I need to import a libtool upstream patch, by its nature it affects each library. As I prefer to avoid mixing these diffs with an automake version change in one commit, I need to bootstrap different libraries with different automake versions. Even if I probably need to split this change into one commit per library anyway, the need for multiple automake versions still feels pointless. * ...and: Do we want to stick to known versions, or update if? (Probably not in such a late stage, though...) Now that gcc-5 is out, what about an automake-1.11.6 update for gcc-6? BTW, the actual commands I use to re-run automake for everything (I found) is: $ export AUTOMAKE='automake-1.11 --add-missing --copy --force-missing' $ /src/gcc-trunk/configure --prefix=/install \ --enable-languages=c,c++,fortran,go,java,lto,objc,obj-c++ \ --enable-liboffloadmic=target \ --enable-libmpx \ --enable-maintainer-mode $ make bootstrap Thanks! /haubi/
libgo patch committed: Permit nil Func in Func.Name
The gc toolchain permits Func to be nil when calling the runtime.Func.Name method, returning an empty string. This patch changes libgo to do the same, rather than crashing. This is GCC PR 66016. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.9 branch. Ian diff -r f3c36747f019 libgo/runtime/go-caller.c --- a/libgo/runtime/go-caller.c Thu Apr 30 13:40:30 2015 -0700 +++ b/libgo/runtime/go-caller.c Tue May 05 07:12:06 2015 -0700 @@ -231,6 +231,8 @@ String runtime_funcname_go (Func *f) { + if (f == NULL) +return runtime_gostringnocopy ((const byte *) ); return f-name; }
Re: [RFC]: Remove Mem/address type assumption in combiner
On Tue, May 05, 2015 at 04:14:03PM +, Kumar, Venkataramanan wrote: Hi Segher, Thank you for testing the patch and approving it. Before I commit it, I wanted to check with you on the changelog entry. Please find the updated patch with the changelog entry. I have just removed the comments that says checks for PLUS/MINUS RTX is a hack. Let me know if it ok. Regards, Venkat. Change Log --- 2015-05-05 Venkataramanan Kumar venkataramanan.ku...@amd.com * combine.c (make_compound_operation): Remove checks for PLUS/MINUS rtx type. Yes, this is okay for trunk. Segher
[PING^2, AArch64] Add long-call attribute and pragma interfaces
Patch ping ... On 04/02/2015 11:59 PM, Yangfei (Felix) wrote: Patch ping: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01148.html This patch needs documentation for the new attributes and pragmas before it can be committed. (Since this is a new feature I think it has to wait until stage 1, too, but that's not my call.) -Sandra Sure, I will update the docs when this patch is approved by the port maintainers. I didn't get any feedback from Richard for the v2 patch. Thanks.
[PATCH 2/4 v2: part 1] Move linemap_assert higher up within the file
--- libcpp/include/line-map.h | 71 --- 1 file changed, 36 insertions(+), 35 deletions(-) diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h index e1681ea..7001552 100644 --- a/libcpp/include/line-map.h +++ b/libcpp/include/line-map.h @@ -279,6 +279,42 @@ struct GTY(()) line_map { } GTY((desc (%1.reason == LC_ENTER_MACRO))) d; }; +#if defined ENABLE_CHECKING (GCC_VERSION = 2007) + +/* Assertion macro to be used in line-map code. */ +#define linemap_assert(EXPR) \ + do {\ +if (! (EXPR)) \ + abort (); \ + } while (0) + +/* Assert that becomes a conditional expression when checking is disabled at + compilation time. Use this for conditions that should not happen but if + they happen, it is better to handle them gracefully rather than crash + randomly later. + Usage: + + if (linemap_assert_fails(EXPR)) handle_error(); */ +#define linemap_assert_fails(EXPR) __extension__ \ + ({linemap_assert (EXPR); false;}) + +/* Assert that MAP encodes locations of tokens that are not part of + the replacement-list of a macro expansion. */ +#define linemap_check_ordinary(LINE_MAP) __extension__ \ + ({linemap_assert (!linemap_macro_expansion_map_p (LINE_MAP)); \ +(LINE_MAP);}) +#else +/* Include EXPR, so that unused variable warnings do not occur. */ +#define linemap_assert(EXPR) ((void)(0 (EXPR))) +#define linemap_assert_fails(EXPR) (! (EXPR)) +#define linemap_check_ordinary(LINE_MAP) (LINE_MAP) +#endif + +/* Return TRUE if MAP encodes locations coming from a macro + replacement-list at macro expansion point. */ +bool +linemap_macro_expansion_map_p (const struct line_map *); + #define MAP_START_LOCATION(MAP) (MAP)-start_location #define ORDINARY_MAP_FILE_NAME(MAP) \ @@ -568,10 +604,6 @@ extern const struct line_map *linemap_lookup macro expansion, FALSE otherwise. */ bool linemap_tracks_macro_expansion_locs_p (struct line_maps *); -/* Return TRUE if MAP encodes locations coming from a macro - replacement-list at macro expansion point. */ -bool linemap_macro_expansion_map_p (const struct line_map *); - /* Return the name of the macro associated to MACRO_MAP. */ const char* linemap_map_get_macro_name (const struct line_map*); @@ -638,37 +670,6 @@ bool linemap_location_from_macro_expansion_p (const struct line_maps *, #define MAIN_FILE_P(MAP) \ ((linemap_check_ordinary (MAP)-d.ordinary.included_from 0)) -#if defined ENABLE_CHECKING (GCC_VERSION = 2007) - -/* Assertion macro to be used in line-map code. */ -#define linemap_assert(EXPR) \ - do { \ -if (! (EXPR)) \ - abort ();\ - } while (0) - -/* Assert that becomes a conditional expression when checking is disabled at - compilation time. Use this for conditions that should not happen but if - they happen, it is better to handle them gracefully rather than crash - randomly later. - Usage: - - if (linemap_assert_fails(EXPR)) handle_error(); */ -#define linemap_assert_fails(EXPR) __extension__ \ - ({linemap_assert (EXPR); false;}) - -/* Assert that MAP encodes locations of tokens that are not part of - the replacement-list of a macro expansion. */ -#define linemap_check_ordinary(LINE_MAP) __extension__ \ - ({linemap_assert (!linemap_macro_expansion_map_p (LINE_MAP)); \ -(LINE_MAP);}) -#else -/* Include EXPR, so that unused variable warnings do not occur. */ -#define linemap_assert(EXPR) ((void)(0 (EXPR))) -#define linemap_assert_fails(EXPR) (! (EXPR)) -#define linemap_check_ordinary(LINE_MAP) (LINE_MAP) -#endif - /* Encode and return a source_location from a column number. The source line considered is the last source line used to call linemap_line_start, i.e, the last source line which a location was -- 1.8.5.3
[PATCH 2/4 v2: part 2] libcpp: Replace macro usage with C++ constructs
libcpp/ChangeLog: * include/line-map.h (MAX_SOURCE_LOCATION): Convert from a macro to a const source_location. (RESERVED_LOCATION_COUNT): Likewise. (linemap_check_ordinary): Convert from a macro to a pair of inline functions, for const/non-const arguments. (MAP_START_LOCATION): Likewise. (ORDINARY_MAP_STARTING_LINE_NUMBER): Likewise. (ORDINARY_MAP_INCLUDER_FILE_INDEX): Likewise. (ORDINARY_MAP_IN_SYSTEM_HEADER_P): Likewise. (ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): Convert from a macro to a pair of inline functions, for const/non-const arguments, where the latter is named... (SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): New function. (ORDINARY_MAP_FILE_NAME): Convert from a macro to a pair of inline functions, for const/non-const arguments. (MACRO_MAP_MACRO): Likewise. (MACRO_MAP_NUM_MACRO_TOKENS): Likewise. (MACRO_MAP_LOCATIONS): Likewise. (MACRO_MAP_EXPANSION_POINT_LOCATION): Likewise. (LINEMAPS_MAP_INFO): Likewise. (LINEMAPS_MAPS): Likewise. (LINEMAPS_ALLOCATED): Likewise. (LINEMAPS_USED): Likewise. (LINEMAPS_CACHE): Likewise. (LINEMAPS_ORDINARY_CACHE): Likewise. (LINEMAPS_MACRO_CACHE): Likewise. (LINEMAPS_MAP_AT): Convert from a macro to an inline function. (LINEMAPS_LAST_MAP): Likewise. (LINEMAPS_LAST_ALLOCATED_MAP): Likewise. (LINEMAPS_ORDINARY_MAPS): Likewise. (LINEMAPS_ORDINARY_MAP_AT): Likewise. (LINEMAPS_ORDINARY_ALLOCATED): Likewise. (LINEMAPS_ORDINARY_USED): Likewise. (LINEMAPS_LAST_ORDINARY_MAP): Likewise. (LINEMAPS_LAST_ALLOCATED_ORDINARY_MAP): Likewise. (LINEMAPS_MACRO_MAPS): Likewise. (LINEMAPS_MACRO_MAP_AT): Likewise. (LINEMAPS_MACRO_ALLOCATED): Likewise. (LINEMAPS_MACRO_USED): Likewise. (LINEMAPS_MACRO_LOWEST_LOCATION): Likewise. (LINEMAPS_LAST_MACRO_MAP): Likewise. (LINEMAPS_LAST_ALLOCATED_MACRO_MAP): Likewise. (IS_ADHOC_LOC): Likewise. (COMBINE_LOCATION_DATA): Likewise. (SOURCE_LINE): Likewise. (SOURCE_COLUMN): Likewise. (LAST_SOURCE_LINE_LOCATION): Likewise. (LAST_SOURCE_LINE): Likewise. (LAST_SOURCE_COLUMN): Likewise. (LAST_SOURCE_LINE_LOCATION) (INCLUDED_FROM): Likewise. (MAIN_FILE_P): Likewise. (LINEMAP_FILE): Likewise. (LINEMAP_LINE): Likewise. (LINEMAP_SYSP): Likewise. (linemap_location_before_p): Likewise. * line-map.c (linemap_check_files_exited): Make local map const. (linemap_add): Use SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS. (linemap_line_start): Likewise. --- libcpp/include/line-map.h | 562 +- libcpp/line-map.c | 6 +- 2 files changed, 455 insertions(+), 113 deletions(-) diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h index 7001552..67a64e9 100644 --- a/libcpp/include/line-map.h +++ b/libcpp/include/line-map.h @@ -155,7 +155,7 @@ struct GTY(()) line_map_ordinary { /* This is the highest possible source location encoded within an ordinary or macro map. */ -#define MAX_SOURCE_LOCATION 0x7FFF +const source_location MAX_SOURCE_LOCATION = 0x7FFF; struct cpp_hashnode; @@ -298,16 +298,10 @@ struct GTY(()) line_map { #define linemap_assert_fails(EXPR) __extension__ \ ({linemap_assert (EXPR); false;}) -/* Assert that MAP encodes locations of tokens that are not part of - the replacement-list of a macro expansion. */ -#define linemap_check_ordinary(LINE_MAP) __extension__ \ - ({linemap_assert (!linemap_macro_expansion_map_p (LINE_MAP)); \ -(LINE_MAP);}) #else /* Include EXPR, so that unused variable warnings do not occur. */ #define linemap_assert(EXPR) ((void)(0 (EXPR))) #define linemap_assert_fails(EXPR) (! (EXPR)) -#define linemap_check_ordinary(LINE_MAP) (LINE_MAP) #endif /* Return TRUE if MAP encodes locations coming from a macro @@ -315,30 +309,196 @@ struct GTY(()) line_map { bool linemap_macro_expansion_map_p (const struct line_map *); -#define MAP_START_LOCATION(MAP) (MAP)-start_location +/* Assert that MAP encodes locations of tokens that are not part of + the replacement-list of a macro expansion. */ +inline struct line_map * +linemap_check_ordinary (struct line_map *map) +{ + linemap_assert (!linemap_macro_expansion_map_p (map)); + return map; +} + +/* Assert that MAP encodes locations of tokens that are not part of + the replacement-list of a macro expansion. */ + +inline const struct line_map * +linemap_check_ordinary (const struct line_map *map) +{ + linemap_assert (!linemap_macro_expansion_map_p (map)); + return map; +} + +/* Read the start location of MAP, as an rvalue. */ + +inline source_location +MAP_START_LOCATION (const line_map *map) +{ + return map-start_location;
[PATCH 2/4 v2] libcpp: Replace macro usage with C++ constructs)
On Mon, 2015-05-04 at 13:15 -0600, Jeff Law wrote: On 05/01/2015 06:56 PM, David Malcolm wrote: libcpp makes extensive use of the C preprocessor. Whilst this has a pleasingly self-referential quality, I find the code hard-to-read; implementing source location support in my JIT branch was much harder than I felt it should have been. In an attempt at making the code easier to follow, and to build towards a followup patch, this patch converts most of these macros to C++ equivalents: using const for compile-time constants, and inline functions where macros aren't used as lvalues. This effectively documents the expected types of the params, and makes them available from the debugger e.g.: (gdb) p LINEMAP_FILE ($3) $1 = 0x13b8b37 command-line and indeed the constants also: (gdb) p IS_ADHOC_LOC(MAX_SOURCE_LOCATION) $2 = false (gdb) p IS_ADHOC_LOC(MAX_SOURCE_LOCATION + 1) $3 = true [I didn't mark the inline functions as static; should they be?] [FWIW, I posted a reduced version of this patch about a year ago as: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01092.html which covered a smaller subset of the macros]. libcpp/ChangeLog: * include/line-map.h (MAX_SOURCE_LOCATION): Convert from a macro to a const source_location. (RESERVED_LOCATION_COUNT): Likewise. (linemap_check_ordinary): Convert from a macro to a pair of inline functions, for const/non-const arguments. (MAP_START_LOCATION): Likewise. (ORDINARY_MAP_STARTING_LINE_NUMBER): Likewise. (ORDINARY_MAP_INCLUDER_FILE_INDEX): Likewise. (ORDINARY_MAP_IN_SYSTEM_HEADER_P): Likewise. (ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): Convert from a macro to a pair of inline functions, for const/non-const arguments, where the latter is named... (SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): New function. (ORDINARY_MAP_FILE_NAME): Convert from a macro to a pair of inline functions, for const/non-const arguments. (MACRO_MAP_MACRO): Likewise. (MACRO_MAP_NUM_MACRO_TOKENS): Likewise. (MACRO_MAP_LOCATIONS): Likewise. (MACRO_MAP_EXPANSION_POINT_LOCATION): Likewise. (LINEMAPS_MAP_INFO): Likewise. (LINEMAPS_MAPS): Likewise. (LINEMAPS_ALLOCATED): Likewise. (LINEMAPS_USED): Likewise. (LINEMAPS_CACHE): Likewise. (LINEMAPS_ORDINARY_CACHE): Likewise. (LINEMAPS_MACRO_CACHE): Likewise. (LINEMAPS_MAP_AT): Convert from a macro to an inline function. (LINEMAPS_LAST_MAP): Likewise. (LINEMAPS_LAST_ALLOCATED_MAP): Likewise. (LINEMAPS_ORDINARY_MAPS): Likewise. (LINEMAPS_ORDINARY_MAP_AT): Likewise. (LINEMAPS_ORDINARY_ALLOCATED): Likewise. (LINEMAPS_ORDINARY_USED): Likewise. (LINEMAPS_LAST_ORDINARY_MAP): Likewise. (LINEMAPS_LAST_ALLOCATED_ORDINARY_MAP): Likewise. (LINEMAPS_MACRO_MAPS): Likewise. (LINEMAPS_MACRO_MAP_AT): Likewise. (LINEMAPS_MACRO_ALLOCATED): Likewise. (LINEMAPS_MACRO_USED): Likewise. (LINEMAPS_MACRO_LOWEST_LOCATION): Likewise. (LINEMAPS_LAST_MACRO_MAP): Likewise. (LINEMAPS_LAST_ALLOCATED_MACRO_MAP): Likewise. (IS_ADHOC_LOC): Likewise. (COMBINE_LOCATION_DATA): Likewise. (SOURCE_LINE): Likewise. (SOURCE_COLUMN): Likewise. (LAST_SOURCE_LINE_LOCATION): Likewise. (LAST_SOURCE_LINE): Likewise. (LAST_SOURCE_COLUMN): Likewise. (LAST_SOURCE_LINE_LOCATION) (INCLUDED_FROM): Likewise. (MAIN_FILE_P): Likewise. (LINEMAP_FILE): Likewise. (LINEMAP_LINE): Likewise. (LINEMAP_SYSP): Likewise. (linemap_location_before_p): Likewise. * line-map.c (linemap_check_files_exited): Make local map const. (linemap_add): Use SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS. (linemap_line_start): Likewise. --- -#define MAP_START_LOCATION(MAP) (MAP)-start_location +#if defined ENABLE_CHECKING (GCC_VERSION = 2007) + +/* Assertion macro to be used in line-map code. */ +#define linemap_assert(EXPR) \ + do {\ +if (! (EXPR)) \ + abort (); \ + } while (0) + +/* Assert that becomes a conditional expression when checking is disabled at + compilation time. Use this for conditions that should not happen but if + they happen, it is better to handle them gracefully rather than crash + randomly later. + Usage: + + if (linemap_assert_fails(EXPR)) handle_error(); */ +#define linemap_assert_fails(EXPR) __extension__ \ + ({linemap_assert (EXPR); false;}) + +#else +/* Include EXPR, so that unused variable warnings do not occur. */ +#define linemap_assert(EXPR) ((void)(0 (EXPR))) +#define linemap_assert_fails(EXPR) (! (EXPR)) +#endif I moved linemap_assert and linemap_assert_fails higher up within the file so that
Re: Patch ping
On 05/05/2015 08:52 PM, Jakub Jelinek wrote: Hi! http://gcc.gnu.org/ml/gcc-patches/2015-04/msg01432.html - this got approved for arm and aarch64, but not for s390{,x} Ok for trunk? Yes. Thanks! -Andreas-
Re: [PATCH 3/3] Fix indentation issues seen by -Wmisleading-indentation
On Wed, 2015-04-29 at 14:10 +0200, Mikael Morin wrote: Hello, Le 29/04/2015 02:02, David Malcolm a écrit : diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 2c7c554..30e4eab 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -4283,7 +4283,7 @@ parse_oacc_structured_block (gfc_statement acc_st) unexpected_eof (); else if (st != acc_end_st) gfc_error (Expecting %s at %C, gfc_ascii_statement (acc_end_st)); - reject_statement (); + reject_statement (); } while (st != acc_end_st); I think this one is a bug; there should be braces around 'gfc_error' and 'reject_statement'. At least that's the pattern in 'parse_oacc_loop', and how the 'unexpected_statement' function is used. Author CC'ed in any case. Thanks. FWIW, Jeff had approved that patch, so I've committed the patch to trunk (as r222823), making the indentation reflect the block structure. Thomas: should the reject_statement (); call in the above be guarded by the else if (st != acc_end_st) clause? Thanks Dave
Re: RFA: RL78: Save the frame pointer if it is used.
OK to apply ? Ok, but... - if (regno == FRAME_POINTER_REGNUM frame_pointer_needed) + if (regno == FRAME_POINTER_REGNUM + (frame_pointer_needed || df_regs_ever_live_p (regno))) Do we want regs_ever_live or regs_ever_written_to ? I seem to recall changing a port... mep perhaps... to only save registers that are changed, not registers that are used but read-only.
Re: [AArch64][PR65375] Fix RTX cost for vector SET
On Sat, Apr 25, 2015 at 12:26:16AM +0100, Kugan wrote: Thanks for the review. I have updated the patch based on the comments with some other minor changes. Bootstrapped and regression tested on aarch64-none-linux-gnu with no-new regressions. Is this OK for trunk? Thanks, Kugan gcc/ChangeLog: 2015-04-24 Kugan Vivekanandarajah kug...@linaro.org Jim Wilson jim.wil...@linaro.org * config/arm/aarch-common-protos.h (struct mem_cost_table): Added new fields loadv and storev. * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs): Initialize loadv and storev. * config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise. (cortexa53_extra_costs): Likewise. (cortexa57_extra_costs): Likewise. (xgene1_extra_costs): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector rtx_costs. Hi Kugan, Just a few syle comments, regarding the placements of comments in single-line if statements. I know the current code does not neccesarily always follow the comments below, I'll write a patch cleaning that up at some point when I'm back at my desk. Thanks, James @@ -5667,6 +5668,14 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED, case NEG: op0 = XEXP (x, 0); + if (VECTOR_MODE_P (mode)) + { + if (speed) + /* FNEG. */ + *cost += extra_cost-vect.alu; + return false; + } + if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT) { if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE Personally, I find commented if statements without braces hard to quickly parse. Something like this is much faster for me: if (speed) { /* FNEG. */ *cost += extra_cost-vect.alu; } @@ -5844,7 +5872,10 @@ cost_minus: if (speed) { - if (GET_MODE_CLASS (mode) == MODE_INT) + if (VECTOR_MODE_P (mode)) + /* Vector SUB. */ + *cost += extra_cost-vect.alu; + else if (GET_MODE_CLASS (mode) == MODE_INT) /* SUB(S). */ *cost += extra_cost-alu.arith; else if (GET_MODE_CLASS (mode) == MODE_FLOAT) As above. @@ -5888,7 +5919,6 @@ cost_plus: { if (speed) *cost += extra_cost-alu.arith_shift; - *cost += rtx_cost (XEXP (XEXP (op0, 0), 0), (enum rtx_code) GET_CODE (op0), 0, speed); Drop this whitespace change. @@ -5913,7 +5943,10 @@ cost_plus: if (speed) { - if (GET_MODE_CLASS (mode) == MODE_INT) + if (VECTOR_MODE_P (mode)) + /* Vector ADD. */ + *cost += extra_cost-vect.alu; + else if (GET_MODE_CLASS (mode) == MODE_INT) /* ADD. */ *cost += extra_cost-alu.arith; else if (GET_MODE_CLASS (mode) == MODE_FLOAT) As above. @@ -6013,10 +6061,15 @@ cost_plus: return false; case NOT: - /* MVN. */ if (speed) - *cost += extra_cost-alu.logical; - + { + /* Vector NOT. */ + if (VECTOR_MODE_P (mode)) + *cost += extra_cost-vect.alu; + /* MVN. */ + else + *cost += extra_cost-alu.logical; + } /* The logical instruction could have the shifted register form, but the cost is the same if the shift is processed as a separate instruction, so we don't bother with it here. */ As above. @@ -6055,10 +6108,15 @@ cost_plus: return true; } - /* UXTB/UXTH. */ if (speed) - *cost += extra_cost-alu.extend; - + { + if (VECTOR_MODE_P (mode)) + /* UMOV. */ + *cost += extra_cost-vect.alu; + else + /* UXTB/UXTH. */ + *cost += extra_cost-alu.extend; + } return false; ca§se SIGN_EXTEND: And again :) @@ -6087,10 +6150,16 @@ cost_plus: if (CONST_INT_P (op1)) { - /* LSL (immediate), UBMF, UBFIZ and friends. These are all - aliases. */ if (speed) - *cost += extra_cost-alu.shift; + { + /* Vector shift (immediate). */ + if (VECTOR_MODE_P (mode)) + *cost += extra_cost-vect.alu; + /* LSL (immediate), UBMF, UBFIZ and friends. These are all + aliases. */ + else + *cost += extra_cost-alu.shift; + } /* We can incorporate zero/sign extend for free. */ if (GET_CODE (op0) == ZERO_EXTEND Again, the comment here makes it very difficult to spot the form of the if/else statement. @@ -6102,10 +6171,15 @@ cost_plus: } else { - /* LSLV. */ if (speed) - *cost += extra_cost-alu.shift_reg; - + { + /* Vector
Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote: Hi, This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html Regtested with aarch64-linux-gnu on QEMU. This patch has no regressions for aarch64_be-linux-gnu big-endian target too. OK for the trunk? Thanks. Jiang jiji -- Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics Hi, Kyrill Thank you for your suggestion. I fixed it and regtested with aarch64-linux-gnu on QEMU. This patch has no regressions for aarch64_be-linux-gnu big-endian target too. OK for the trunk? Hi Jiang, I'm sorry that I've taken so long to get to this, I've been out of office for several weeks. I have one comment. +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmul_n_f32 (float32x2_t __a, float32_t __b) +{ + return __builtin_aarch64_mul_nv2sf (__a, __b); +} + For vmul_n_* intrinsics, is there a reason we don't want to use the GCC vector extension syntax to allow us to write these as: __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vmul_n_f32 (float32x2_t __a, float32_t __b) { return __a * __b; } It would be great if we could make that work. Thanks, James
[Patch, fortran] PR 37131, inline matmul
Hello world, this is an update of the matmul inline patch. The only difference to the last version is that it has the ubound simplification taken out. Any further comments? OK for trunk? Thomas 2015-05-05 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/37131 * gfortran.h (gfc_isym_id): Add GFC_ISYM_FE_RUNTIME_ERROR. (gfc_array_spec): Add resolved flag. (gfc_intrinsic_sym): Add vararg. * intrinsic.h (gfc_check_fe_runtime_error): Add prototype. (gfc_resolve_re_runtime_error): Likewise. Add prototype for gfc_is_reallocatable_lhs. * array.c (gfc_resolve_array_spec): Do not resolve if it has already been resolved. * trans-array.h (gfc_is_reallocatable_lhs): Remove prototype. * check.c (gfc_check_fe_runtime_error): New function. * intrinsic.c (add_sym_1p): New function. (make_vararg): New function. (add_subroutines): Add fe_runtime_error. (gfc_intrinsic_sub_interface): Skip sorting for variable number of arguments. * iresolve.c (gfc_resolve_fe_runtime_error): New function. * lang.opt (inline-matmul-limit): New option. (gfc_post_options): If no inline matmul limit has been set and BLAS is called externally, use the BLAS limit. * frontend-passes.c: Include intrinsic.h. (var_num): New global counter for naming temporary variablbles. (matrix_case): Enum for differentiating the different matmul cases. (realloc_string_callback): Add trim to the variable name. (create_var): Add optional argument vname as part of the name. Use var_num. Set dimension of result correctly. Split off block creation into (insert_block): New function. (cfe_expr_0): Use fcn as part of temporary variable name. (optimize_namesapce): Also set gfc_current_ns. Call inline_matmul_assign. (combine_array_constructor): Use constr as part of temporary name. (get_array_inq_function): New function. (build_logical_expr): New function. (get_operand): new function. (inline_limit_check): New function. (runtime_error_ne): New function. (matmul_lhs_realloc): New function. (is_functino_or_op): New function. (has_function_or_op): New function. (freeze_expr): New function. (freeze_references): New function. (convert_to_index_kind): New function. (create_do_loop): New function. (get_size_m1): New function. (scalarized_expr): New function. (inline_matmul_assign): New function. * simplify.c (simplify_bound): Simplify the case of the lower bound of an assumed-shape argument. 2015-05-05 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/37131 * gfortran.dg/dependency_26.f90: Add option to suppress inlining matmul. * gfortran.dg/function_optimize_1.f90: Likewise. * gfortran.dg/function_optimize_2.f90: Likewise. * gfortran.dg/function_optimize_5.f90: Likewise. * gfortran.dg/function_optimize_7.f90: Likewise. * gfortran.dg/inline_matmul_1.f90: New test. * gfortran.dg/inline_matmul_2.f90: New test. * gfortran.dg/inline_matmul_3.f90: New test. * gfortran.dg/inline_matmul_4.f90: New test. * gfortran.dg/inline_matmul_5.f90: New test. Index: fortran/array.c === --- fortran/array.c (Revision 222771) +++ fortran/array.c (Arbeitskopie) @@ -338,6 +338,9 @@ gfc_resolve_array_spec (gfc_array_spec *as, int ch if (as == NULL) return true; + if (as-resolved) +return true; + for (i = 0; i as-rank + as-corank; i++) { e = as-lower[i]; @@ -364,6 +367,8 @@ gfc_resolve_array_spec (gfc_array_spec *as, int ch } } + as-resolved = true; + return true; } Index: fortran/check.c === --- fortran/check.c (Revision 222771) +++ fortran/check.c (Arbeitskopie) @@ -5527,7 +5527,37 @@ gfc_check_random_seed (gfc_expr *size, gfc_expr *p return true; } +bool +gfc_check_fe_runtime_error (gfc_actual_arglist *a) +{ + gfc_expr *e; + int len, i; + int num_percent, nargs; + e = a-expr; + if (e-expr_type != EXPR_CONSTANT) +return true; + + len = e-value.character.length; + if (e-value.character.string[len-1] != '\0') +gfc_internal_error (fe_runtime_error string must be null terminated); + + num_percent = 0; + for (i=0; ilen-1; i++) +if (e-value.character.string[i] == '%') + num_percent ++; + + nargs = 0; + for (; a; a = a-next) +nargs ++; + + if (nargs -1 != num_percent) +gfc_internal_error (fe_runtime_error: Wrong number of arguments (%d instead of %d), + nargs, num_percent++); + + return true; +} + bool gfc_check_second_sub (gfc_expr *time) {
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj 2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } --- gcc/testsuite/gcc.c-torture/execute/pr65956.c.jj2015-05-01 10:32:34.730150257 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr65956.c 2015-05-01 10:32:13.0 +0200 @@ -0,0 +1,67 @@ +/* PR target/65956 */ + +struct A { char *a; int b; long long c; }; +char v[3]; + +__attribute__((noinline, noclone)) void +fn1 (char *x, char *y) +{ + if (x != v[1] || y != v[2]) +__builtin_abort (); + v[1]++; +} + +__attribute__((noinline, noclone)) int +fn2 (char *x) +{ + asm volatile ( : +g (x) : : memory); + return x == v[0]; +} + +__attribute__((noinline, noclone)) void +fn3 (const char *x) +{ + if (x[0] != 0) +__builtin_abort (); +} + +static struct A +foo (const char *x, struct A y, struct A z) +{ + struct A r = { 0, 0, 0 }; + if (y.b z.b) +{ + if (fn2 (y.a) fn2 (z.a)) + switch (x[0]) + { + case '|': + break; + default: + fn3 (x); + } + fn1 (y.a, z.a); +} + return r; +} + +__attribute__((noinline, noclone)) int +bar (int x, struct A *y) +{ + switch (x) +{ +case 219: + foo (+, y[-2], y[0]); +case 220: + foo (-, y[-2], y[0]); +} +} + +int +main () +{ + struct A a[3] = { { v[1], 1, 1LL }, { v[0], 0, 0LL }, { v[2], 2, 2LL } }; + bar (220, a + 2); + if (v[1] != 1) +__builtin_abort (); + return 0; +} Jakub
Re: [Patch, Fortran, PR58586, v3] ICE with derived type with allocatable component passed by value
Hi Mikael, hi all, Mikael, thanks for the review so far. I have inserted the changes requested and updated this patch to trunk. For the e-expr_type == EXPR_FUNCTION I have choose !DECL_P(parmse.expr) to not only prevent VAR_DECLs aliasing, but also prevent aliasing for PARM_DECLs and similar. Bootstraps and regtests ok on x86_64-linux-gnu/F21. Note, this patch also fixes the regression in alloc_comp_constructor_1.f90 introduced by my patch for pr44672, v4. The memory loss is also fixed again. Ok for trunk? Regards, Andre On Sun, 26 Apr 2015 12:22:56 +0200 Mikael Morin mikael.mo...@sfr.fr wrote: Hello, I'm reviewing the original patch only for now. The added bits in v2 will have to wait. Le 23/04/2015 20:00, Andre Vehreschild a écrit : diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 9e6432f..80dfed1 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -5344,8 +5344,19 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, (e-expr_type != EXPR_VARIABLE !e-rank)) { int parm_rank; - tmp = build_fold_indirect_ref_loc (input_location, - parmse.expr); + /* It is known the e returns a structure type with at least one + allocatable component. When e is a function, ensure that the + function is called once only by using a temporary variable. */ + if (e-expr_type == EXPR_FUNCTION) + parmse.expr = gfc_evaluate_now_loc (input_location, + parmse.expr, se-pre); You need not limit this to functions only. I think you can even do this without condition. Yes, one could do that, but then an unnecessary temporary variable in the - for my taste - already too clobbered pseudo code is introduced. Furthermore, is the penalty on doing the check for a function negligible. I therefore have not changed that. All right; would you mind writing it either if (e-expr_type != EXPR_VARIABLE) or if (!DECL_P (parmse.expr)) or if (!VAR_P (parmse.expr)) instead? + if (POINTER_TYPE_P (TREE_TYPE (parmse.expr))) This distinguishes arguments with/without value attribute, right? I think it's better to use the frontend information here (fsym-attr.value). Changed to check for value. Please check fsym fsym-attr.value and add the following testcase (it fails with the patch). module m type :: t integer, allocatable :: comp end type type :: u type(t), allocatable :: comp end type end module m use m call sub(u()) end OK with these changes. Ah, and don't forget to provide a ChangeLog entry with it. The Changelog entry comes in an additional attachment. It doesn't appear inline in my mailer as its content type is application/octet-stream, so I missed it. Sorry. Thanks for the patch. I will review the rest later. Mikael -- Andre Vehreschild * Email: vehre ad gmx dot de gcc/testsuite/ChangeLog: 2015-05-05 Andre Vehreschild ve...@gmx.de * gfortran.dg/alloc_comp_class_3.f03: New test. * gfortran.dg/alloc_comp_class_4.f03: New test. gcc/fortran/ChangeLog: 2015-04-23 Andre Vehreschild ve...@gmx.de PR fortran/58586 * resolve.c (resolve_symbol): Non-private functions in modules with allocatable or pointer components are marked referenced now. Furthermore is the default init especially for those components now done in gfc_conf_procedure_call preventing duplicate code. * trans-decl.c (build_function_decl): Set the result decl when the function symbol and the result symbol are equal. (gfc_generate_function_code): Generate a fake result decl for functions returning an object with allocatable components and initialize them. * trans-expr.c (gfc_conv_procedure_call): For value typed trees use the tree without indirect ref. And for non-decl trees add a temporary variable to prevent evaluating the tree multiple times (prevent multiple function evaluations). * trans.h: Made gfc_trans_structure_assign () protoype available, which is now needed by trans-decl.c:gfc_generate_ function_code(), too. gcc/fortran/ChangeLog: 2015-05-05 Andre Vehreschild ve...@gmx.de * resolve.c (resolve_symbol): * trans-decl.c (build_function_decl): (gfc_generate_function_code): * trans-expr.c (gfc_conv_procedure_call): * trans.h: diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 2ac4689..72df35e 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -14093,10 +14093,15 @@ resolve_symbol (gfc_symbol *sym) if ((!a-save !a-dummy !a-pointer !a-in_common !a-use_assoc - (a-referenced || a-result) - !(a-function sym != sym-result)) + !a-result !a-function) || (a-dummy a-intent ==
Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook
Hi Christian, On 4 May 2015 at 11:29, Christian Bruel christian.br...@st.com wrote: Hi Christian, I noticed case gcc.dg/ipa/iinline-attr.c failed on aarch64. The original patch is x86 specific, while the case is added as general one. Could you please have a look at this? FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline hooray[^\\n]*inline copy in test that is the same latent bug for aarch64: alignment flags are not propagated with attribute optimize (O2). testing attached patch The patch looks good to me, maybe you can just fix the original typo on optimizing in the comments while moving the code. I've bootstrapped and regtested it on aarch64-linux-gnu with the same target testcase you added on i386 (as we discussed offline) and everything is ok (gcc.dg/ipa/iinline-attr.c now PASS). Notice that I can't approve your patch. Cheers, Yvan
[PATCH,testsuite] Xfail gcc.dg/tree-ssa/stdarg-2.c f15 scans
Hi, after checking in the 'postpone expanding va_arg till pass_stdarg' patch series, some scans related to function f15 in gcc.dg/tree-ssa/stdarg-2.c have started failing. [ The committed patch series contained a modification of stdarg-2.c, but that seems to be not complete and not correct. ] F.i., for s390 we find at https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg00507.html: ... FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg f15: va_list escapes 0, needs to save 1 GPR units and 2 FPR units ... Furthermore, at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64950#c11 it was noted that: ... Unfortunately, the gcc.dg/tree-ssa/stdarg-2.c part of the patch is wrong: the test now FAILs on i686-unknown-linux-gnu, i686-apple-darwin, and i386-pc-solaris with -m64: both dumps (i.e. -m32 and -m64) contain m32/stdarg-2.c.084t.stdarg:f15: va_list escapes 1, needs to save all GPR units and all FPR units. m64/stdarg-2.c.084t.stdarg:f15: va_list escapes 1, needs to save all GPR units and all FPR units. ... I've filed two PRs: - PR66010 '[6 Regression] Missed optimization after inlining va_list parameter' https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66010 - PR66013 'Missed optimization after inlining va_list parameter, -m32 case' https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66013 to track these errors. AFAIU now from the investigations in those PRs, we can expect all f15 scans that check for the presence of 'va_list escapes 0' to fail. I'd like to commit two patches. - The first patch undoes the modification of stdarg-2.c as committed in the original patch series (omitted). - The second patch adds appropriate xfails (attached). OK for trunk? Thanks, - Tom [PATCH 2/2] Xfail gcc.dg/tree-ssa/stdarg-2.c f15 scans 2015-05-05 Tom de Vries t...@codesourcery.com * gcc.dg/tree-ssa/stdarg-2.c: Xfail f15 scans which test for presence of 'va_list escapes 0'. --- gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c b/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c index fe39da3..f09b5de 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c @@ -288,10 +288,18 @@ f15 (int i, ...) f15_1 (ap); va_end (ap); } -/* { dg-final { scan-tree-dump f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units stdarg { target { { i?86-*-* x86_64-*-* } { ! { ia32 || llp64 } } } } } } */ -/* { dg-final { scan-tree-dump f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units stdarg { target { powerpc*-*-linux* { powerpc_fprs ilp32 } } } } } */ + +/* Following three dg-finals are marked as xfail due to PR66010/PR66013. */ +/* Was: { target { { i?86-*-* x86_64-*-* } { ! { ia32 || llp64 } } } }. */ +/* { dg-final { scan-tree-dump f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units stdarg { xfail *-*-* } } } */ +/* Was: { target { powerpc*-*-linux* { powerpc_fprs ilp32 } } }. */ +/* { dg-final { scan-tree-dump f15: va_list escapes 0, needs to save \[148\] GPR units and \[1-9\]\[0-9\]* FPR units stdarg { xfail *-*-* } } } */ +/* Was: { target s390*-*-linux* }. */ +/* { dg-final { scan-tree-dump f15: va_list escapes 0, needs to save 1 GPR units and 2 FPR units stdarg { xfail *-*-* } } } */ + +/* We may be able to improve upon this after fixing PR66010/PR66013. */ /* { dg-final { scan-tree-dump f15: va_list escapes 1, needs to save all GPR units and all FPR units stdarg { target alpha*-*-linux* } } } */ -/* { dg-final { scan-tree-dump f15: va_list escapes 0, needs to save 1 GPR units and 2 FPR units stdarg { target s390*-*-linux* } } } */ + /* { dg-final { scan-tree-dump-not f15: va_list escapes 0, needs to save 0 GPR units stdarg { target { { i?86-*-* x86_64-*-* } ia32 } } } } */ /* { dg-final { scan-tree-dump-not f15: va_list escapes 0, needs to save 0 GPR units stdarg { target ia64-*-* } } } */ /* { dg-final { scan-tree-dump-not f15: va_list escapes 0, needs to save 0 GPR units stdarg { target { powerpc*-*-* lp64 } } } } */ -- 1.9.1
Re: [CHKP, PATCH] Fix LTO cgraph merge for instrumented functions
Ping 2015-04-14 12:14 GMT+03:00 Ilya Enkovich enkovich@gmail.com: On 10 Apr 03:15, Jan Hubicka wrote: References are not streamed out for nodes which are referenced in a partition but don't belong to it ('continue' condition in output_refs loop). Yeah, but it already special cases aliases (because we now always preserve IPA_REF_ALIAS link in the boundary, so I think making IPA_REF_CHKP to be rpeserved should go the same path) so we can special case the instrumentation thunks too? Any cgraph_node having instrumented clone should have it, not only instrumentation thunks. Surely we can make an exception for IPA_REF_CHKP. I suppose we still need to 1) write_symbol and lto-symtab should know that transparent aliases are not real symbols and ignore them. We have predicate symtab_node::real_symbol_p, I suppose it should return true. I think cgraph_merge and varpool_merge in lto-symtab needs to know what to do when merging two symbols with transparent aliases. 2) At some point we need to populate transparent alias links as discussed in the other thread. 3) For safety, I guess we want symbol_table::change_decl_assembler_name to either sanity check there are no trasparent alias links pointing to old assembler names or update it. Wouldn't search for possible referring via transparent aliases nodes be too expensive? Changing of decl_assembler_name is not very common and if we set up the links only after renaming of statics, i think we are safe. But it would be better to keep verify it at some point. I suppose verify_node check that there is no transparent alias link on symbols were it is not supposed to be and that there is link when it is supposed to be (i.e. symtab_state is =EXPANSION and symbol is either weakref on tragets w/o native weakrefs or instrumentation cone. Would you please mind adding the test and making sure it triggers when things are out of sync? OK, but I suppose it should be a part of transparent alias links rework discussed in another thread. BTW are you going to intoroduce transparent aliases in cgraph soon? 4) I think we want verify_node to check that transparent alias links and chkp points as intended and only on those symbols where they need to be There is also logic in lto-partition to always insert alias target OK, so you have the chkp references that links to instrumented_version. You do not stream them for boundary symbols but for some reason they are needed, but when you can end up with wrong CHKP link? Wrong links may appear when cgraph nodes are merged. I see, I think you really want to fix these at merging time as sugested in 1). In this case even the node-instrumented_version pointer may be out of date and point to a node that lost in merging? node-instrumented_version is redirected and never points to lost symbol. But during merge we may have situations when (node-instrumented_version-instrumented_version != node) because of multiple not merged (yet) symbols referencing the same merged target. OK, which should not be a problem if we stream the links instead of rebuilding them, right? IPA_REF_CHKP refs don't affect instrumented_version merging. And I actually don't see here any problems, instrumented_version links always come into consistent state when symbol merging finishes. Here is a new patch version. It has no chkp_maybe_fix_chkp_ref function, IPA_REF_CHKP references are streamed out instead. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? I also want to port it to GCC 5 branch after release. Thanks, Ilya -- gcc/ 2015-04-14 Ilya Enkovich ilya.enkov...@intel.com * ipa.c (symbol_table::remove_unreachable_nodes): Don't remove instumentation thunks calling reachable functions. * lto-cgraph.c (output_refs): Always output IPA_REF_CHKP. * lto/lto-partition.c (privatize_symbol_name_1): New. (privatize_symbol_name): Privatize both decl and orig_decl names for instrumented functions. gcc/testsuite/ 2015-04-14 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/lto/chkp-privatize-1_0.c: New. * gcc.dg/lto/chkp-privatize-1_1.c: New. * gcc.dg/lto/chkp-privatize-2_0.c: New. * gcc.dg/lto/chkp-privatize-2_1.c: New. diff --git a/gcc/ipa.c b/gcc/ipa.c index b3752de..3054afe 100644 --- a/gcc/ipa.c +++ b/gcc/ipa.c @@ -492,7 +492,22 @@ symbol_table::remove_unreachable_nodes (FILE *file) } else if (cnode-thunk.thunk_p) enqueue_node (cnode-callees-callee, first, reachable); - + + /* For instrumentation clones we always need original +function node for proper LTO privatization. */ + if (cnode-instrumentation_clone + reachable.contains
Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
Ping 2015-04-14 17:35 GMT+03:00 Ilya Enkovich enkovich@gmail.com: On 10 Apr 03:27, Jan Hubicka wrote: + /* We might propagate instrumented function pointer into + not instrumented function and vice versa. In such a + case we need to either fix function declaration or + remove bounds from call statement. */ + if (flag_check_pointer_bounds callee) +skip_bounds = chkp_redirect_edge (e); I think this gets wrong the case where the edge is speculative and the new direct call needs adjustement. You probably need to do the right think in the if (e-speculative) branch so direct call is updated by indirect is not or at least give an explanation why this is not needed :) The speculative edge handling works in a way that the runtime conditoinal is built and then the edge is updated to the direct path, so perhaps you can just move all this after the ocnditoinal? I think you are right, it should be OK to move it after apeculative call processing. diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 404cb68..ffb6ad7 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -273,6 +273,7 @@ merge_and_complain (struct cl_decoded_option **decoded_options, case OPT_fwrapv: case OPT_fopenmp: case OPT_fopenacc: + case OPT_fcheck_pointer_bounds: /* For selected options we can merge conservatively. */ for (j = 0; j *decoded_options_count; ++j) if ((*decoded_options)[j].opt_index == foption-opt_index) @@ -503,6 +504,7 @@ append_compiler_options (obstack *argv_obstack, struct cl_decoded_option *opts, case OPT_Ofast: case OPT_Og: case OPT_Os: + case OPT_fcheck_pointer_bounds: Hmm, will this make mixed units contaiing some check bounds and some non-check bounds to work? Perhaps these should be function specific? Does things like inlining bounds checked function into non-checked function work? This actually should make it work better because solves a possible problem with uninitialized static bounds data (chkp static constructors are generated only when OPT_fcheck_pointer_bounds is passed). Inlining of instrumentation thunks is not supported (similar to all other thunks). But we may have a not instrumented call in an instrumented function and do inlining for it. Otherwise the patch seems resonable. Honza Here is a fixed version with chkp redirection moved. Bootstrap and testing passed. Is it OK for trunk and later for GCC 5? Thanks, Ilya -- gcc/ 2015-04-14 Ilya Enkovich ilya.enkov...@intel.com PR target/65527 * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Add redirection for instrumented calls. * lto-wrapper.c (merge_and_complain): Merge -fcheck-pointer-bounds. (append_compiler_options): Append -fcheck-pointer-bounds. * tree-chkp.h (chkp_copy_call_skip_bounds): New. (chkp_redirect_edge): New. * tree-chkp.c (chkp_copy_call_skip_bounds): New. (chkp_redirect_edge): New. gcc/testsuite/ 2015-04-14 Ilya Enkovich ilya.enkov...@intel.com PR target/65527 * gcc.target/i386/mpx/chkp-fix-calls-1.c: New. * gcc.target/i386/mpx/chkp-fix-calls-2.c: New. * gcc.target/i386/mpx/chkp-fix-calls-3.c: New. * gcc.target/i386/mpx/chkp-fix-calls-4.c: New. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 85531c8..38e71fc 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1281,6 +1281,7 @@ cgraph_edge::redirect_call_stmt_to_callee (void) tree lhs = gimple_call_lhs (e-call_stmt); gcall *new_stmt; gimple_stmt_iterator gsi; + bool skip_bounds = false; #ifdef ENABLE_CHECKING cgraph_node *node; #endif @@ -1389,8 +1390,16 @@ cgraph_edge::redirect_call_stmt_to_callee (void) } } + /* We might propagate instrumented function pointer into + not instrumented function and vice versa. In such a + case we need to either fix function declaration or + remove bounds from call statement. */ + if (flag_check_pointer_bounds e-callee) +skip_bounds = chkp_redirect_edge (e); + if (e-indirect_unknown_callee - || decl == e-callee-decl) + || (decl == e-callee-decl + !skip_bounds)) return e-call_stmt; #ifdef ENABLE_CHECKING @@ -1415,13 +1424,19 @@ cgraph_edge::redirect_call_stmt_to_callee (void) } } - if (e-callee-clone.combined_args_to_skip) + if (e-callee-clone.combined_args_to_skip + || skip_bounds) { int lp_nr; - new_stmt - = gimple_call_copy_skip_args (e-call_stmt, - e-callee-clone.combined_args_to_skip); + new_stmt = e-call_stmt; + if (e-callee-clone.combined_args_to_skip) + new_stmt + = gimple_call_copy_skip_args (new_stmt, + e-callee-clone.combined_args_to_skip); + if (skip_bounds)
[PATCH][ARM] Fix PR 65955: Do not take REGNO on non-REG operand in movcond_addsi
Hi all, As the PR says, the movcond_addsi pattern takes the REGNO of an operand that may be a CONST_INT. The fix for that is rather simple (perhaps even obvious?). Unfortunately the testcase is in ada, and I'm not sure how to integrate this into the testsuite. Does anyone know how to add an ada testcase? The patch that introduced this faulty check was added for GCC 5 and backported to 4.9, so this patch should be backported everywhere as well. I'll be testing it on those branches shortly. Tested on arm-none-eabi. Bootstrapped on arm-none-linux-gnueabihf with arm and thumb default mode. I think this is the right thing to do in any case since the current code is clearly doing the wrong thing if operands[2] is a CONST_INT. Ok for trunk? Thanks, Kyrill P.S. I'm attaching the patch with a larger than usual context so that the definition of operands[2] can be seen more clearly. 2015-05-05 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/65955 * config/arm/arm.md (movcond_addsi): Check that operands[2] is a REG before taking its REGNO. commit e88f515909e185e7add7429b11672ccc3ad17be9 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Fri May 1 10:01:13 2015 +0100 [ARM] PR 65955: check for REG_P before taking REGNO in movcond_addsi diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index b2f6b4f..e439e7a 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -9413,51 +9413,51 @@ (define_insn_and_split movcond_addsi (match_operator 5 comparison_operator [(plus:SI (match_operand:SI 3 s_register_operand r,r,r) (match_operand:SI 4 arm_add_operand rIL,rIL,rIL)) (const_int 0)]) (match_operand:SI 1 arm_rhs_operand rI,rPy,r) (match_operand:SI 2 arm_rhs_operand rI,rPy,r))) (clobber (reg:CC CC_REGNUM))] TARGET_32BIT # reload_completed [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV (plus:SI (match_dup 3) (match_dup 4)) (const_int 0))) (set (match_dup 0) (match_dup 1)) (cond_exec (match_dup 6) (set (match_dup 0) (match_dup 2)))] { machine_mode mode = SELECT_CC_MODE (GET_CODE (operands[5]), operands[3], operands[4]); enum rtx_code rc = GET_CODE (operands[5]); operands[6] = gen_rtx_REG (mode, CC_REGNUM); gcc_assert (!(mode == CCFPmode || mode == CCFPEmode)); -if (REGNO (operands[2]) != REGNO (operands[0])) +if (!REG_P (operands[2]) || REGNO (operands[2]) != REGNO (operands[0])) rc = reverse_condition (rc); else std::swap (operands[1], operands[2]); operands[6] = gen_rtx_fmt_ee (rc, VOIDmode, operands[6], const0_rtx); } [(set_attr conds clob) (set_attr enabled_for_depr_it no,yes,yes) (set_attr type multiple)] ) (define_insn movcond [(set (match_operand:SI 0 s_register_operand =r,r,r) (if_then_else:SI (match_operator 5 arm_comparison_operator [(match_operand:SI 3 s_register_operand r,r,r) (match_operand:SI 4 arm_add_operand rIL,rIL,rIL)]) (match_operand:SI 1 arm_rhs_operand 0,rI,?rI) (match_operand:SI 2 arm_rhs_operand rI,0,rI))) (clobber (reg:CC CC_REGNUM))] TARGET_ARM * if (GET_CODE (operands[5]) == LT (operands[4] == const0_rtx))
Re: [PATCH, AArch64] [4.9] Backport PR64304 fix (miscompilation with -mgeneral-regs-only )
On 4 May 2015 at 10:13, Chen Shanyao chenshan...@huawei.com wrote: According to your opinion, I split the backports of pr64304 into 2 emails, and this one is for 4.9 branch. This patch backport the fix of PR target/64304 , miscompilation with -mgeneral-regs-only, to the 4.9 branch from trunk r219844. Tested on x86_64 by using qemu of aarch64. OK for 4.9? diff -rupN gcc-4.9-20150225/gcc/ChangeLog gcc-4.9-20150225.pr64304//gcc/ChangeLog --- gcc-4.9-20150225/gcc/ChangeLog2015-03-04 20:48:30.0 -0500 +++ gcc-4.9-20150225.pr64304//gcc/ChangeLog2015-03-04 20:55:59.0 -0500 @@ -1,3 +1,13 @@ +2015-03-05 Shanyao Chen chenshan...@huawei.com + +Backported from mainline +2015-01-19 Jiong Wang jiong.w...@arm.com +Andrew Pinski apin...@cavium.com + +PR target/64304 +* config/aarch64/aarch64.md (define_insn *ashlmode3_insn): Deleted. +(ashlmode3): Don't expand if operands[2] is not constant. + OK /Marcus
Next set of OpenACC changes
Hi! In follow-up messages, I'll be posting the separated parts (for easier review) of a next set of OpenACC changes that we'd like to commit. ChangeLog updates not yet written; will do that before commit, obviously. Overall diffstat: gcc/c-family/c-common.c|3 +- gcc/c-family/c-common.h|2 + gcc/c-family/c-omp.c | 105 ++ gcc/c-family/c-pragma.c|4 + gcc/c-family/c-pragma.h| 14 +- gcc/c/c-parser.c | 1353 gcc/c/c-tree.h |3 +- gcc/c/c-typeck.c | 112 +- gcc/cp/cp-gimplify.c |3 +- gcc/cp/cp-tree.h |3 +- gcc/cp/parser.c| 1382 + gcc/cp/parser.h|4 + gcc/cp/pt.c| 43 +- gcc/cp/semantics.c | 151 +- gcc/fortran/dump-parse-tree.c | 12 +- gcc/fortran/gfortran.h | 50 +- gcc/fortran/match.h|1 + gcc/fortran/openmp.c | 581 +-- gcc/fortran/parse.c| 65 +- gcc/fortran/parse.h|2 +- gcc/fortran/resolve.c |5 + gcc/fortran/st.c |7 + gcc/fortran/trans-decl.c | 62 +- gcc/fortran/trans-openmp.c | 66 +- gcc/fortran/trans-stmt.c |7 +- gcc/fortran/trans-stmt.h |2 +- gcc/fortran/trans.c|2 + gcc/gimplify.c | 16 +- gcc/omp-low.c | 11 +- gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c | 46 + .../c-c++-common/goacc-gomp/nesting-fail-1.c | 25 - gcc/testsuite/c-c++-common/goacc/asyncwait-1.c |4 +- gcc/testsuite/c-c++-common/goacc/data-2.c | 12 +- gcc/testsuite/c-c++-common/goacc/declare-1.c | 84 + gcc/testsuite/c-c++-common/goacc/declare-2.c | 67 + gcc/testsuite/c-c++-common/goacc/dtype-1.c | 113 ++ gcc/testsuite/c-c++-common/goacc/dtype-2.c | 31 + gcc/testsuite/c-c++-common/goacc/host_data-1.c | 14 + gcc/testsuite/c-c++-common/goacc/host_data-2.c | 14 + gcc/testsuite/c-c++-common/goacc/host_data-3.c | 16 + gcc/testsuite/c-c++-common/goacc/host_data-4.c | 15 + gcc/testsuite/c-c++-common/goacc/kernels-1.c |6 - gcc/testsuite/c-c++-common/goacc/kernels-empty.c |6 + gcc/testsuite/c-c++-common/goacc/kernels-eternal.c | 11 + .../c-c++-common/goacc/kernels-noreturn.c | 12 + gcc/testsuite/c-c++-common/goacc/loop-1.c |2 - gcc/testsuite/c-c++-common/goacc/parallel-1.c |6 - gcc/testsuite/c-c++-common/goacc/parallel-empty.c |6 + .../c-c++-common/goacc/parallel-eternal.c | 11 + .../c-c++-common/goacc/parallel-noreturn.c | 12 + gcc/testsuite/c-c++-common/goacc/reduction-1.c | 25 +- gcc/testsuite/c-c++-common/goacc/reduction-2.c | 22 +- gcc/testsuite/c-c++-common/goacc/reduction-3.c | 22 +- gcc/testsuite/c-c++-common/goacc/reduction-4.c | 40 +- gcc/testsuite/c-c++-common/goacc/routine-1.c | 35 + gcc/testsuite/c-c++-common/goacc/routine-2.c | 36 + gcc/testsuite/c-c++-common/goacc/routine-3.c | 52 + gcc/testsuite/c-c++-common/goacc/routine-4.c | 87 ++ gcc/testsuite/c-c++-common/goacc/tile.c| 26 + gcc/testsuite/g++.dg/goacc/template-reduction.C| 100 ++ gcc/testsuite/g++.dg/goacc/template.C | 131 ++ gcc/testsuite/gfortran.dg/goacc/cache-1.f95|1 - gcc/testsuite/gfortran.dg/goacc/coarray.f95|2 +- gcc/testsuite/gfortran.dg/goacc/coarray_2.f90 |1 + gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 |2 +- gcc/testsuite/gfortran.dg/goacc/cray.f95 |1 - gcc/testsuite/gfortran.dg/goacc/declare-1.f95 |3 +- gcc/testsuite/gfortran.dg/goacc/declare-2.f95 | 44 + gcc/testsuite/gfortran.dg/goacc/default.f95| 17 + gcc/testsuite/gfortran.dg/goacc/dtype-1.f95| 161 ++ gcc/testsuite/gfortran.dg/goacc/dtype-2.f95| 39 + gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 |2 +- gcc/testsuite/gfortran.dg/goacc/loop-1.f95 |1 - gcc/testsuite/gfortran.dg/goacc/loop-2.f95 | 26 +- gcc/testsuite/gfortran.dg/goacc/modules.f95| 55 + gcc/testsuite/gfortran.dg/goacc/parameter.f95 |1 - gcc/testsuite/gfortran.dg/goacc/update.f95 |
RE: [RFC: Patch, PR 60158] gcc/varasm.c : Pass actual alignment value to output_constant_pool_2
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Thursday, April 30, 2015 9:14 PM On 04/30/2015 09:34 AM, rohitarul...@freescale.com wrote: -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Thursday, April 30, 2015 8:32 PM To: Dharmakan Rohit-B30502; gcc-patches@gcc.gnu.org; rguent...@suse.de; Jakub Jelinek Cc: Alan Modra; David Edelsohn; Wienskoski Edmar-RA8797 Subject: Re: [RFC: Patch, PR 60158] gcc/varasm.c : Pass actual alignment value to output_constant_pool_2 On 04/29/2015 04:30 AM, rohitarul...@freescale.com wrote: Jeff, I have made the changes as per your comments and attached the patch. If the patch is OK, I will proceed with the regression tests. This patch refers back to 60158 and based on what I see in 60158, it appears I should be looking for a .data.rel.ro.local section which contains the address of a string constant. But the constants are being put into .rodata.str1.4. And if the issue is we're putting bits into the wrong section and don't have an appropriate .fixup section, then ISTM that the test should be compiled, then objdump used to verify the sections and/or relocations. An additional concern is that I get the same code for the included testcase with or without your changes. This is with a powerpc-softfloat-linux-gnuspe configured compiler -- which matches what I saw in pr 60158. So while the patch seems reasonable, I'm concerned that I've been unable to show it changing anything. Thoughts? Jeff, the issue is still reproducible with GCC v4.8 branch but not with GCC v4.9 or trunk. Was it fixed by some other approach or has the bug gone latent? Obviously if the former, then the patch is only relevant to gcc-4.8 if the latter, then we'll still want to get it fixed on the trunk and possibly in the release branches. Can you please investigate if the bug has been fixed by other means or if it's just gone latent on the trunk? Jeff, Just to summarize: By default in GCC v4.7.x, all the constants are put into '.rodata.str1.4' section. In GCC v4.8.x from r192719 onwards, one of the move instruction of the string constant .LC0 is getting spilled. The reload pass, for any constants that aren't allowed and can't be reloaded in to registers tries to change them into memory references. Then while emitting that string constant to asm code (A:varasm.c: output_constant_pool_1), it explicitly passes the alignment as 1 which prevents the generation of fix-up table entries in 'B: rs6000.c:rs6000_assemble_integer' because the data is considered unaligned now. The bug seems to have gone latent with an unrelated trunk commit r204695 [* tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Refactor the code. Handle type conversion.]. This commit chooses different spill candidates hence all the string constants are being put in to '.rodata.str1.4´section. The check I had in the test case is that if there is a '.data.rel.ro.local', then there should be '.fixup' section generated. Please let me know if you need any other details. Regards, Rohit
Re: [PATCH, AArch64] Add Cortex-A53 erratum 843419 configure-time option
On 4 May 2015 at 09:58, Yvan Roux yvan.r...@linaro.org wrote: Yes this is a better formulation. +corresponding flag to the linker. It can be explicitly disabled during +compilation by passing the @option{-mno-fix-cortex-a53-835769} option. Copy paste error here with the previous errata number. Here is the patch with the modifications. Is it needed to backport it into 4.9 and 5.1 branches ? This is OK for trunk. The back ports make sense, but leave it 48 hours, if there are no objections then go ahead. Cheers /Marcus
Re: [PATCH, ARM] Fix testcases that require Thumb2 effective target.
On 04/05/15 10:14, Yvan Roux wrote: Hi, This patch fixes two ARM testcases that require target to be Thumb2 effective. One is built for Cortex-m3, the purpose of the second one is to generate thumb2_addsi3_compare0_scratch insn and both are failing when compiled for armv5t for instance. Built and regtested, is it OK for trunk ? Ok. Thanks, Kyrill Thanks, Yvan 2015-05-04 Yvan Roux yvan.r...@linaro.org * gcc.target/arm/pr65067.c: Require Thumb2 effective target. * gcc.target/arm/pr65924.c: Likewise.
Next set of OpenACC changes: middle end, libgomp
Hi! On Tue, 05 May 2015 10:54:02 +0200, I wrote: In follow-up messages, I'll be posting the separated parts (for easier review) of a next set of OpenACC changes that we'd like to commit. ChangeLog updates not yet written; will do that before commit, obviously. gcc/gimplify.c | 16 +- gcc/omp-low.c | 11 +- gcc/tree-core.h| 14 +- gcc/tree-pretty-print.c|6 + gcc/tree.c | 13 +- gcc/tree.h | 21 +- include/gomp-constants.h |4 + libgomp/oacc-mem.c |3 + libgomp/oacc-ptx.h | 28 + libgomp/plugin/plugin-nvptx.c | 10 + diff --git gcc/gimplify.c gcc/gimplify.c index bda62ce..12efdc8 100644 --- gcc/gimplify.c +++ gcc/gimplify.c @@ -6385,6 +6385,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_CLAUSE_MERGEABLE: case OMP_CLAUSE_PROC_BIND: case OMP_CLAUSE_SAFELEN: + case OMP_CLAUSE_TILE: break; case OMP_CLAUSE_ALIGNED: @@ -6770,6 +6771,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p) case OMP_CLAUSE_VECTOR: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_TILE: break; default: @@ -8410,21 +8412,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, break; case OACC_KERNELS: - if (OACC_KERNELS_COMBINED (*expr_p)) - sorry (directive not yet implemented); - else - gimplify_omp_workshare (expr_p, pre_p); - ret = GS_ALL_DONE; - break; - case OACC_PARALLEL: - if (OACC_PARALLEL_COMBINED (*expr_p)) - sorry (directive not yet implemented); - else - gimplify_omp_workshare (expr_p, pre_p); - ret = GS_ALL_DONE; - break; - case OACC_DATA: case OMP_SECTIONS: case OMP_SINGLE: diff --git gcc/omp-low.c gcc/omp-low.c index 34e2e5c..6ec5145 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -1928,6 +1928,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_BIND: + case OMP_CLAUSE_NOHOST: + case OMP_CLAUSE_TILE: sorry (Clause not supported yet); break; @@ -2055,6 +2058,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_BIND: + case OMP_CLAUSE_NOHOST: + case OMP_CLAUSE_TILE: sorry (Clause not supported yet); break; @@ -2742,7 +2748,10 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx) { for (omp_context *ctx_ = ctx; ctx_ != NULL; ctx_ = ctx_-outer) if (is_gimple_omp (ctx_-stmt) -is_gimple_omp_oacc (ctx_-stmt)) +is_gimple_omp_oacc (ctx_-stmt) + /* Except for atomic codes that we share with OpenMP. */ +! (gimple_code (stmt) == GIMPLE_OMP_ATOMIC_LOAD + || gimple_code (stmt) == GIMPLE_OMP_ATOMIC_STORE)) { error_at (gimple_location (stmt), non-OpenACC construct inside of OpenACC region); diff --git gcc/tree-core.h gcc/tree-core.h index ad1bb23..ffbccda 100644 --- gcc/tree-core.h +++ gcc/tree-core.h @@ -390,7 +390,19 @@ enum omp_clause_code { OMP_CLAUSE_NUM_WORKERS, /* OpenACC clause: vector_length (integer-expression). */ - OMP_CLAUSE_VECTOR_LENGTH + OMP_CLAUSE_VECTOR_LENGTH, + + /* OpenACC clause: bind ( identifer | string ). */ + OMP_CLAUSE_BIND, + + /* OpenACC clause: nohost. */ + OMP_CLAUSE_NOHOST, + + /* OpenACC clause: tile ( size-expr-list ). */ + OMP_CLAUSE_TILE, + + /* OpenACC clause: device_type ( device-type-list). */ + OMP_CLAUSE_DEVICE_TYPE }; #undef DEFTREESTRUCT diff --git gcc/tree-pretty-print.c gcc/tree-pretty-print.c index d7c049f..5eb4daf 100644 --- gcc/tree-pretty-print.c +++ gcc/tree-pretty-print.c @@ -799,6 +799,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags) case OMP_CLAUSE_INDEPENDENT: pp_string (pp, independent); break; +case OMP_CLAUSE_TILE: + pp_string (pp, tile(); + dump_generic_node (pp, OMP_CLAUSE_TILE_LIST (clause), +spc, flags, false); + pp_right_paren (pp); + break; default: /* Should never happen. */ diff --git gcc/tree.c gcc/tree.c index daf0292..43f80b7 100644 --- gcc/tree.c +++ gcc/tree.c @@ -369,6 +369,10 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_NUM_GANGS */ 1, /* OMP_CLAUSE_NUM_WORKERS */ 1, /*
Re: [PATCH, i386]: Switch x86 to TARGET_SUPPORTS_WIDE_INT
On Tue, May 5, 2015 at 8:10 PM, Richard Biener richard.guent...@gmail.com wrote: I've checked spec2000 performance. Only few spec binaries differ. Anyway performance is unchanged. Thanks, that confirms expectations. Are changes in generated code expected at all? The change by itself should have no effect on the code, but it is a big change, so some unrelated perturbations in the code are expected. Uros.
Go patch committed: Don't make temporaries for string concatenation of constants
This patch from Chris Manghane changes the Go frontend to not use temporaries for constants when doing string concatenation. This fixes http://golang.org/issue/10642. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and GCC 5 branch. Ian diff -r b2bef4b0764a go/expressions.cc --- a/go/expressions.cc Tue May 05 09:38:04 2015 -0700 +++ b/go/expressions.cc Tue May 05 13:50:49 2015 -0700 @@ -5120,13 +5120,15 @@ if (this-left_-type()-is_string_type() this-op_ == OPERATOR_PLUS) { - if (!this-left_-is_variable()) + if (!this-left_-is_variable() + !this-left_-is_constant()) { temp = Statement::make_temporary(NULL, this-left_, loc); inserter-insert(temp); this-left_ = Expression::make_temporary_reference(temp, loc); } - if (!this-right_-is_variable()) + if (!this-right_-is_variable() + !this-right_-is_constant()) { temp = Statement::make_temporary(this-left_-type(), this-right_, loc);
Re: libgo patch committed: Permit nil Func in Func.Name
On Tue, May 5, 2015 at 10:35 AM, Andreas Schwab sch...@linux-m68k.org wrote: Ian Lance Taylor i...@golang.org writes: Committed to mainline and 4.9 branch. What about the 5 branch? Oh yeah. Thanks. Committing to GCC 5 branch shortly, after tests complete. Ian
Re: [PATCH, i386]: Switch x86 to TARGET_SUPPORTS_WIDE_INT
On May 5, 2015 6:16:14 PM GMT+02:00, Uros Bizjak ubiz...@gmail.com wrote: On Tue, May 5, 2015 at 2:23 PM, Evgeny Stupachenko evstu...@gmail.com wrote: I've checked spec2000 performance. Only few spec binaries differ. Anyway performance is unchanged. Thanks, that confirms expectations. Are changes in generated code expected at all? Richard. Patch committed. Uros.
Patch ping
Hi! http://gcc.gnu.org/ml/gcc-patches/2015-04/msg01432.html - this got approved for arm and aarch64, but not for s390{,x} Ok for trunk? Jakub
[PATCH 5/4] libcpp: Eliminate most of the non-const/reference-returning inline fns
gcc/java/ChangeLog: * jcf-parse.c (set_source_filename): Replace write through ORDINARY_MAP_FILE_NAME with direct access to to_file. libcpp/ChangeLog: * include/line-map.h (MAP_START_LOCATION): Eliminate the non-const variant, and tweak comment for the const variant. (ORDINARY_MAP_STARTING_LINE_NUMBER): Drop the non-const variant. (ORDINARY_MAP_INCLUDER_FILE_INDEX): Likewise. (ORDINARY_MAP_IN_SYSTEM_HEADER_P): Likewise. (SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): Delete. (ORDINARY_MAP_FILE_NAME): Drop the non-const variant. (MACRO_MAP_MACRO): Likewise. (MACRO_MAP_NUM_MACRO_TOKENS): Likewise. (MACRO_MAP_LOCATIONS): Likewise. (MACRO_MAP_EXPANSION_POINT_LOCATION): Likewise. * line-map.c (linemap_add): Replace writes through macros with direct field accesses. (linemap_enter_macro): Likewise. (linemap_line_start): Likewise. --- gcc/java/jcf-parse.c | 2 +- libcpp/include/line-map.h | 86 +-- libcpp/line-map.c | 29 3 files changed, 16 insertions(+), 101 deletions(-) diff --git a/gcc/java/jcf-parse.c b/gcc/java/jcf-parse.c index e609331..d1c9fc4 100644 --- a/gcc/java/jcf-parse.c +++ b/gcc/java/jcf-parse.c @@ -374,7 +374,7 @@ set_source_filename (JCF *jcf, int index) } sfname = find_sourcefile (sfname); - ORDINARY_MAP_FILE_NAME (LINEMAPS_LAST_ORDINARY_MAP (line_table)) = sfname; + LINEMAPS_LAST_ORDINARY_MAP (line_table)-to_file = sfname; if (current_class == main_class) main_input_filename = sfname; } diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h index 95a1d85..0530072 100644 --- a/libcpp/include/line-map.h +++ b/libcpp/include/line-map.h @@ -343,7 +343,7 @@ linemap_check_macro (const line_map *map) return (const line_map_macro *)map; } -/* Read the start location of MAP, as an rvalue. */ +/* Read the start location of MAP. */ inline source_location MAP_START_LOCATION (const line_map *map) @@ -351,15 +351,6 @@ MAP_START_LOCATION (const line_map *map) return map-start_location; } -/* Access the start location of MAP as a reference - (e.g. as an lvalue). */ - -inline source_location -MAP_START_LOCATION (line_map *map) -{ - return map-start_location; -} - /* Get the starting line number of ordinary map MAP. */ inline linenum_type @@ -368,15 +359,6 @@ ORDINARY_MAP_STARTING_LINE_NUMBER (const line_map_ordinary *ord_map) return ord_map-to_line; } -/* Access the starting line number of ordinary map MAP by - reference (e.g. as an lvalue). */ - -inline linenum_type -ORDINARY_MAP_STARTING_LINE_NUMBER (line_map_ordinary *ord_map) -{ - return ord_map-to_line; -} - /* Get the index of the ordinary map at whose end ordinary map MAP was included. @@ -388,14 +370,6 @@ ORDINARY_MAP_INCLUDER_FILE_INDEX (const line_map_ordinary *ord_map) return ord_map-included_from; } -/* As above, but by reference (e.g. as an lvalue). */ - -inline int -ORDINARY_MAP_INCLUDER_FILE_INDEX (line_map_ordinary *ord_map) -{ - return ord_map-included_from; -} - /* Return a positive value if map encodes locations from a system header, 0 otherwise. Returns 1 if ordinary map MAP encodes locations in a system header and 2 if it encodes locations in a C system header @@ -407,14 +381,6 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map) return ord_map-sysp; } -/* As above, but by reference (e.g. as an lvalue). */ - -inline unsigned char -ORDINARY_MAP_IN_SYSTEM_HEADER_P (line_map_ordinary *ord_map) -{ - return ord_map-sysp; -} - /* Get the number of the low-order source_location bits used for a column number within ordinary map MAP. */ @@ -424,16 +390,6 @@ ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (const line_map_ordinary *ord_map) return ord_map-column_bits; } -/* Set the number of the low-order source_location bits used for a - column number within ordinary map MAP. */ - -inline void -SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS (line_map_ordinary *ord_map, - int col_bits) -{ - ord_map-column_bits = col_bits; -} - /* Get the filename of ordinary map MAP. */ inline const char * @@ -442,14 +398,6 @@ ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map) return ord_map-to_file; } -/* As above, but by reference (e.g. as an lvalue). */ - -inline const char * -ORDINARY_MAP_FILE_NAME (line_map_ordinary *ord_map) -{ - return ord_map-to_file; -} - /* Get the cpp macro whose expansion gave birth to macro map MAP. */ inline cpp_hashnode * @@ -458,14 +406,6 @@ MACRO_MAP_MACRO (const line_map_macro *macro_map) return macro_map-macro; } -/* As above, but by reference (e.g. as an lvalue). */ - -inline cpp_hashnode * -MACRO_MAP_MACRO (line_map_macro *macro_map) -{ - return macro_map-macro; -} - /* Get the number of tokens inside the replacement-list of the
Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook
thanks for the validation and the confirmation that iinline-attr.c is now fixed on aarch64. I can now send the patch for submission request (this one was just illustrative). thanks Christian On 05/05/2015 11:10 AM, Yvan Roux wrote: Hi Christian, On 4 May 2015 at 11:29, Christian Bruel christian.br...@st.com wrote: Hi Christian, I noticed case gcc.dg/ipa/iinline-attr.c failed on aarch64. The original patch is x86 specific, while the case is added as general one. Could you please have a look at this? FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline hooray[^\\n]*inline copy in test that is the same latent bug for aarch64: alignment flags are not propagated with attribute optimize (O2). testing attached patch The patch looks good to me, maybe you can just fix the original typo on optimizing in the comments while moving the code. I've bootstrapped and regtested it on aarch64-linux-gnu with the same target testcase you added on i386 (as we discussed offline) and everything is ok (gcc.dg/ipa/iinline-attr.c now PASS). Notice that I can't approve your patch. Cheers, Yvan
Re: [PATCH,testsuite] Xfail gcc.dg/tree-ssa/stdarg-2.c f15 scans
On May 5, 2015 11:15:11 AM GMT+02:00, Tom de Vries tom_devr...@mentor.com wrote: Hi, after checking in the 'postpone expanding va_arg till pass_stdarg' patch series, some scans related to function f15 in gcc.dg/tree-ssa/stdarg-2.c have started failing. [ The committed patch series contained a modification of stdarg-2.c, but that seems to be not complete and not correct. ] F.i., for s390 we find at https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg00507.html: ... FAIL: gcc.dg/tree-ssa/stdarg-2.c scan-tree-dump stdarg f15: va_list escapes 0, needs to save 1 GPR units and 2 FPR units ... Furthermore, at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64950#c11 it was noted that: ... Unfortunately, the gcc.dg/tree-ssa/stdarg-2.c part of the patch is wrong: the test now FAILs on i686-unknown-linux-gnu, i686-apple-darwin, and i386-pc-solaris with -m64: both dumps (i.e. -m32 and -m64) contain m32/stdarg-2.c.084t.stdarg:f15: va_list escapes 1, needs to save all GPR units and all FPR units. m64/stdarg-2.c.084t.stdarg:f15: va_list escapes 1, needs to save all GPR units and all FPR units. ... I've filed two PRs: - PR66010 '[6 Regression] Missed optimization after inlining va_list parameter' https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66010 - PR66013 'Missed optimization after inlining va_list parameter, -m32 case' https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66013 to track these errors. AFAIU now from the investigations in those PRs, we can expect all f15 scans that check for the presence of 'va_list escapes 0' to fail. I'd like to commit two patches. - The first patch undoes the modification of stdarg-2.c as committed in the original patch series (omitted). - The second patch adds appropriate xfails (attached). OK for trunk? OK. Thanks, Richard. Thanks, - Tom
Re: [PATCH][AArch64] Add branch-cost to cpu tuning information.
On 01/05/15 10:18, Marcus Shawcroft wrote: On 21 April 2015 at 15:00, Matthew Wahab matthew.wa...@arm.com wrote: +int aarch64_branch_cost (bool, bool); + You would never guess looking at this .h today, but long ago there was something close to alphabetical order by function name in place. Please lift this definition between aarch64_bitmask_imm and aarch64_classify_symbolic_expression. +int +aarch64_branch_cost (bool speed_p, bool predictable_p) +{ Add an appropriate comment before the function please. Attached reworked patch: - Moved declaration of aarch64_branch_cost to after aarch64_bitmask_imm. - Added comment before definition of aarch64_branch_cost. Tested aarch64-none-linux-gnu with gcc-check. Ok for trunk? Matthew 2015-05-05 Matthew Wahab matthew.wa...@arm.com * gcc/config/aarch64-protos.h (struct cpu_branch_cost): New. (tune_params): Add field branch_costs. (aarch64_branch_cost): Declare. * gcc/config/aarch64.c (generic_branch_cost): New. (generic_tunings): Set field cpu_branch_cost to generic_branch_cost. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (thunderx_tunings): Likewise. (xgene1_tunings): Likewise. (aarch64_branch_cost): Define. * gcc/config/aarch64/aarch64.h (BRANCH_COST): Redefine. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 08ce5f1..931c8b8 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -162,12 +162,20 @@ struct cpu_vector_cost const int cond_not_taken_branch_cost; /* Cost of not taken branch. */ }; +/* Branch costs. */ +struct cpu_branch_cost +{ + const int predictable;/* Predictable branch or optimizing for size. */ + const int unpredictable; /* Unpredictable branch or optimizing for speed. */ +}; + struct tune_params { const struct cpu_cost_table *const insn_extra_cost; const struct cpu_addrcost_table *const addr_cost; const struct cpu_regmove_cost *const regmove_cost; const struct cpu_vector_cost *const vec_costs; + const struct cpu_branch_cost *const branch_costs; const int memmov_cost; const int issue_rate; const unsigned int fuseable_ops; @@ -184,6 +192,7 @@ struct tune_params HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); int aarch64_get_condition_code (rtx); bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode); +int aarch64_branch_cost (bool, bool); enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx, enum aarch64_symbol_context); bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 374b0a9..7bc28ae 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -340,12 +340,20 @@ static const struct cpu_vector_cost xgene1_vector_cost = #define AARCH64_FUSE_ADRP_LDR (1 3) #define AARCH64_FUSE_CMP_BRANCH (1 4) +/* Generic costs for branch instructions. */ +static const struct cpu_branch_cost generic_branch_cost = +{ + 2, /* Predictable. */ + 2 /* Unpredictable. */ +}; + static const struct tune_params generic_tunings = { cortexa57_extra_costs, generic_addrcost_table, generic_regmove_cost, generic_vector_cost, + generic_branch_cost, 4, /* memmov_cost */ 2, /* issue_rate */ AARCH64_FUSE_NOTHING, /* fuseable_ops */ @@ -365,6 +373,7 @@ static const struct tune_params cortexa53_tunings = generic_addrcost_table, cortexa53_regmove_cost, generic_vector_cost, + generic_branch_cost, 4, /* memmov_cost */ 2, /* issue_rate */ (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD @@ -385,6 +394,7 @@ static const struct tune_params cortexa57_tunings = cortexa57_addrcost_table, cortexa57_regmove_cost, cortexa57_vector_cost, + generic_branch_cost, 4, /* memmov_cost */ 3, /* issue_rate */ (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD @@ -405,6 +415,7 @@ static const struct tune_params thunderx_tunings = generic_addrcost_table, thunderx_regmove_cost, generic_vector_cost, + generic_branch_cost, 6, /* memmov_cost */ 2, /* issue_rate */ AARCH64_FUSE_CMP_BRANCH, /* fuseable_ops */ @@ -424,6 +435,7 @@ static const struct tune_params xgene1_tunings = xgene1_addrcost_table, xgene1_regmove_cost, xgene1_vector_cost, + generic_branch_cost, 6, /* memmov_cost */ 4, /* issue_rate */ AARCH64_FUSE_NOTHING, /* fuseable_ops */ @@ -5409,6 +5421,23 @@ aarch64_address_cost (rtx x, return cost; } +/* Return the cost of a branch. If SPEED_P is true then the compiler is + optimizing for speed. If PREDICTABLE_P is true then the branch is predicted + to be taken. */ + +int +aarch64_branch_cost (bool speed_p, bool predictable_p) +{ + /* When optimizing for speed, use the cost of unpredictable branches. */ + const struct cpu_branch_cost *branch_costs = +
Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests
Hi! On Mon, 4 May 2015 10:20:14 -0400, John David Anglin dave.ang...@bell.net wrote: On 2015-05-04 4:32 AM, Thomas Schwinge wrote: Dave, would you please test the following patch, and report the regression status compared to before r222620? (Compared to your existing r222021 results, as posted in the PR, for example.) With patch, we have the following fails on hppa2.0w-hp-hpux11.11: Thanks for testing! FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-3.c -DACC_DEVICE_TYPE_host =1 -DACC_MEM_SHARED=1 output pattern test, is libgomp: no device found , should match device [0-9]+\([0-9]+\) is initialized FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-42.c -DACC_DEVICE_TYPE_hos t=1 -DACC_MEM_SHARED=1 output pattern test, is , should match \[[0-9a-fA-FxX]+,2 56\] is not mapped These are known issues. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-62.c -DACC_DEVICE_TYPE_hos t=1 -DACC_MEM_SHARED=1 output pattern test, is , should match invalid size With this one I'll need your help: please cite from libgomp.log (or, from a manual run) the actual output message that you're getting. Committed in r222799: commit 92d46f1ab3a56ffa0deb412141fbbcbe7036f61d Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue May 5 09:39:29 2015 + [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests In dg-output, don't expect 0x prefix for %p format specifier, don't expect (nil) for NULL pointer. PR testsuite/65205 PR libgomp/65993 libgomp/ * testsuite/libgomp.oacc-c-c++-common/clauses-2.c: In dg-output, don't expect 0x prefix for %p format specifier, don't expect (nil) for NULL pointer. * testsuite/libgomp.oacc-c-c++-common/lib-16.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-17.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-20.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-21.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-22.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-25.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-26.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-27.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-28.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-29.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-30.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-34.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-35.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-36.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-39.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-40.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-42.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-43.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-44.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-48.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-52.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-53.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-54.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-57.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-58.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/data-already-1.c: More accurately specify what we're looking for. * testsuite/libgomp.oacc-c-c++-common/data-already-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/data-already-8.c: Likewise. * testsuite/libgomp.oacc-fortran/data-already-1.f: Likewise. * testsuite/libgomp.oacc-fortran/data-already-2.f: Likewise. * testsuite/libgomp.oacc-fortran/data-already-8.f: Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@222799 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog | 43 .../libgomp.oacc-c-c++-common/clauses-2.c |2 +- .../libgomp.oacc-c-c++-common/data-already-1.c |4 +- .../libgomp.oacc-c-c++-common/data-already-2.c |4 +- .../libgomp.oacc-c-c++-common/data-already-8.c |4 +- .../testsuite/libgomp.oacc-c-c++-common/lib-16.c |2 +- .../testsuite/libgomp.oacc-c-c++-common/lib-17.c |2 +- .../testsuite/libgomp.oacc-c-c++-common/lib-18.c |2 +- .../testsuite/libgomp.oacc-c-c++-common/lib-20.c |2 +- .../testsuite/libgomp.oacc-c-c++-common/lib-21.c |2 +- .../testsuite/libgomp.oacc-c-c++-common/lib-22.c |2 +- .../testsuite/libgomp.oacc-c-c++-common/lib-23.c |2 +-
Re: [PATCH 0/3][AArch64] DImode vector compares
Alan Lawrence wrote: Hi, Comparing 64x1 vector types (defined by hand or from arm_neon.h) using GCC vector extensions currently generates very poor assembly code, for example uint64x1_t foo (uint64x1_t a, uint64x1_t b) { return a = b; } generates (at -O3): fmov x0, d0 // 22 movdi_aarch64/12 [length = 4] fmov x1, d1 // 23 movdi_aarch64/12 [length = 4] cmp x0, x1 // 10 cmpdi/1 [length = 4] csinv x0, xzr, xzr, cc // 17 cmovdi_insn/3 [length = 4] fmov d0, x0 // 24 *movdi_aarch64/11 [length = 4] ret // 27 simple_return [length = 4] Meaning that arm_neon.h instead has to use rather awkward forms like return (uint64x1_t) {__a[0] = __b[0] ? -1ll : 0ll}; to produce the desired assembly cmhs d0, d0, d1 ret This series adds vcond(u?)didi patterns for AArch64, to generate appropriate RTL from direct comparisons of 64x1 vectors (which are of DImode). However, as things stand, adding a vconddidi pattern causes an ICE in vector_compare_rtx (maybe_legitimize_operands), because a DImode constant-zero (vector or otherwise) is expanded as const0_rtx, which has mode VOIDmode. I tried quite a bit to generate an RTL const_vector, or even just something with mode DImode, but without success, hence the first patch fixes vector_compare_rtx to use the mode from the tree if necessary. (DImode vectors are specifically allowed by stor-layout.c, but no other platform defines vconddidi.) Can I ping the AArch64 parts of this (patches 2+3)? These then provide the testcases requested by Jeff Law in his approval of the first patch (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00076.html). Thanks, Alan
Re: Next set of OpenACC changes: Fortran
On 5 May 2015 at 10:58, Thomas Schwinge tho...@codesourcery.com wrote: Hi! +/* Node in the linked list used for storing !$oacc declare constructs. */ The clause is called $ACC declare, isn't it? + for (oc = new_oc; oc; oc = oc-next) +{ + c = oc-clauses; + for (n = c-lists[OMP_LIST_MAP]; n != NULL; n = n-next) + n-sym-mark = 0; +} + + for (oc = new_oc; oc; oc = oc-next) +{ + c = oc-clauses; + for (n = c-lists[OMP_LIST_MAP]; n != NULL; n = n-next) + { + if (n-sym-mark) + { + gfc_error (Symbol %qs present on multiple clauses at %C, +n-sym-name); + return MATCH_ERROR; + } + else + n-sym-mark = 1; + } +} + + for (oc = new_oc; oc; oc = oc-next) +{ + c = oc-clauses; + for (n = c-lists[OMP_LIST_MAP]; n != NULL; n = n-next) + n-sym-mark = 1; +} Much code for setting n-sym-mark = 1. What am i missing? + + ns-oacc_declare = new_oc; + return MATCH_YES; } @@ -1304,10 +1580,21 @@ match gfc_match_oacc_update (void) { gfc_omp_clauses *c; - if (gfc_match_omp_clauses (c, OACC_UPDATE_CLAUSES, false, false, true) + locus here = gfc_current_locus; + + if (gfc_match_omp_clauses (c, OACC_UPDATE_CLAUSES, +OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK, false, +false, true) != MATCH_YES) return MATCH_ERROR; + if (!c-lists[OMP_LIST_MAP]) +{ + gfc_error (%acc update% must contain at least one +%device% or %host/self% clause at %L, here); + return MATCH_ERROR; $ACC UPDATE instead of %acc update % ? - else if (code-ext.omp_clauses-gang - code-ext.omp_clauses-worker - code-ext.omp_clauses-vector) + if (code-ext.omp_clauses-tile_list code-ext.omp_clauses-gang + code-ext.omp_clauses-worker code-ext.omp_clauses-vector) conditions on separate lines, please. - for (list = OMP_LIST_DEVICE_RESIDENT; - list = OMP_LIST_DEVICE_RESIDENT; list++) -for (n = ns-oacc_declare_clauses-lists[list]; n; n = n-next) - { - n-sym-mark = 0; - if (n-sym-attr.flavor == FL_PARAMETER) - gfc_error (PARAMETER object %qs is not allowed at %L, n-sym-name, loc); - } + for (list = OMP_LIST_DEVICE_RESIDENT; + list = OMP_LIST_DEVICE_RESIDENT; list++) + for (n = oc-clauses-lists[list]; n; n = n-next) + { + n-sym-mark = 0; + if (n-sym-attr.flavor == FL_PARAMETER) + gfc_error (PARAMETER object %qs is not allowed at %L, +n-sym-name, loc); + } - for (list = OMP_LIST_DEVICE_RESIDENT; - list = OMP_LIST_DEVICE_RESIDENT; list++) -for (n = ns-oacc_declare_clauses-lists[list]; n; n = n-next) - { - if (n-sym-mark) - gfc_error (Symbol %qs present on multiple clauses at %L, -n-sym-name, loc); - else - n-sym-mark = 1; - } + for (list = OMP_LIST_DEVICE_RESIDENT; + list = OMP_LIST_DEVICE_RESIDENT; list++) + for (n = oc-clauses-lists[list]; n; n = n-next) + { + if (n-sym-mark) + gfc_error (Symbol %qs present on multiple clauses at %L, +n-sym-name, loc); + else + n-sym-mark = 1; + } - for (n = ns-oacc_declare_clauses-lists[OMP_LIST_DEVICE_RESIDENT]; n; - n = n-next) -check_array_not_assumed (n-sym, loc, DEVICE_RESIDENT); + for (n = oc-clauses-lists[OMP_LIST_DEVICE_RESIDENT]; n; n = n-next) + check_array_not_assumed (n-sym, loc, DEVICE_RESIDENT); + + for (n = oc-clauses-lists[OMP_LIST_MAP]; n; n = n-next) + { + if (n-expr n-expr-ref-type == REF_ARRAY) + gfc_error (Subarray: %qs not allowed in $!ACC DECLARE at %L, +n-sym-name, loc); + } +} } The -mark setting looks complicated (as noted above)? thanks,
[C++ Patch/RFC] PR 53184
Hi, per the audit trail, this issue appears to boil down to two separate issues: - The warning doesn't appear universally useful, thus it would be nice to give it a name in order to enable disabling it. - As shown by the testcase, sometimes the wording is misleading: it talks about 'anonymous namespace', where, as clarified by Jason in the trail, the issue is really about a type with no linkage, no namespace involved. - The former is easy done, I picked: -Wsubobject-linkage. Makes sense? - The latter is a little more tricky, because it doesn't seem always easy to tell one case from the other, in particular when templates are involved (eg, g++.dg/warn/anonymous-namespace-3.C) and the linkage issue involves template arguments. Given that the warning doesn't seem terribly important (as another data point, clang doesn't have it), so far I have conditionals which reliably figure out cases of anonymous namespace and cases of no linkage (per the testcase at issue, for example) and otherwise fall back to an 'or' wording. I hope the improvement is good enough. Alternately, I suppose the warning could use a completely different, more generic, wording, but in that case testcases like anonymous-namespace-3.C will need adjustment. Tested x86_64-linux. Thanks, Paolo. Index: c-family/c.opt === --- c-family/c.opt (revision 222822) +++ c-family/c.opt (working copy) @@ -913,6 +913,11 @@ Wuseless-cast C++ ObjC++ Var(warn_useless_cast) Warning Warn about useless casts +Wsubobject-linkage +C++ ObjC++ Var(warn_subobject_linkage) Warning Init(1) +Warn if a class type has a base or a field whose type uses the anonymous +namespace or depends on a type with no linkage + ansi C ObjC C++ ObjC++ A synonym for -std=c89 (for C) or -std=c++98 (for C++) Index: cp/decl2.c === --- cp/decl2.c (revision 222822) +++ cp/decl2.c (working copy) @@ -2554,9 +2554,22 @@ constrain_class_visibility (tree type) if (subvis == VISIBILITY_ANON) { if (!in_main_input_context ()) - warning (0, \ + { + if (MAYBE_CLASS_TYPE_P (ftype) +decl_anon_ns_mem_p (TYPE_MAIN_DECL (ftype))) + warning (OPT_Wsubobject_linkage, \ %qT has a field %qD whose type uses the anonymous namespace, type, t); + else if (no_linkage_check (ftype, /*relaxed_p=*/false)) + warning (OPT_Wsubobject_linkage, \ +%qT has a field %qD whose type depends on a type with no linkage, + type, t); + else + warning (OPT_Wsubobject_linkage, \ +%qT has a field %qD whose type uses the anonymous namespace or depends on a + type with no linkage, + type, t); + } } else if (MAYBE_CLASS_TYPE_P (ftype) vis VISIBILITY_HIDDEN @@ -2574,9 +2587,21 @@ constrain_class_visibility (tree type) if (subvis == VISIBILITY_ANON) { if (!in_main_input_context()) - warning (0, \ + { + if (decl_anon_ns_mem_p (TYPE_MAIN_DECL (TREE_TYPE (t + warning (OPT_Wsubobject_linkage, \ %qT has a base %qT whose type uses the anonymous namespace, -type, TREE_TYPE (t)); +type, TREE_TYPE (t)); + else if (no_linkage_check (TREE_TYPE (t), /*relaxed_p=*/false)) + warning (OPT_Wsubobject_linkage, \ +%qT has a base %qT whose type depends on a type with no linkage, +type, TREE_TYPE (t)); + else + warning (OPT_Wsubobject_linkage, \ +%qT has a base %qT whose type uses the anonymous namespace or depends on a +type with no linkage, +type, TREE_TYPE (t)); + } } else if (vis VISIBILITY_HIDDEN subvis = VISIBILITY_HIDDEN) Index: doc/invoke.texi === --- doc/invoke.texi (revision 222822) +++ doc/invoke.texi (working copy) @@ -278,7 +278,7 @@ Objective-C and Objective-C++ Dialects}. -Wstrict-aliasing=n @gol -Wstrict-overflow -Wstrict-overflow=@var{n} @gol -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol -Wsuggest-final-types @gol -Wsuggest-final-methods -Wsuggest-override @gol --Wmissing-format-attribute @gol +-Wmissing-format-attribute -Wsubobject-linkage @gol -Wswitch -Wswitch-default -Wswitch-enum -Wswitch-bool -Wsync-nand @gol -Wsystem-headers -Wtrampolines -Wtrigraphs -Wtype-limits -Wundef @gol -Wuninitialized -Wunknown-pragmas -Wno-pragmas @gol @@ -4801,6 +4801,13 @@ types. @option{-Wconversion-null} is enabled by de Warn when a literal '0' is used as null pointer
Re: [debug-early] fix problem with template parameter packs
On 05/04/2015 09:29 PM, Aldy Hernandez wrote: The code handling parameter DIEs needed a little tweaking for variable length template arguments. I've relaxed the original assert, but this may require tweaking at branch review time-- hopefully later this week. What testcase motivated this? We're within a formal_parameter_pack, but DECL_ABSTRACT is set, so I guess the earlier parm_die was from a declaration? If we're going to re-use the individual parms, I'd think we want to reuse the pack as well. Jason
Re: [debug-early] fix problem with template parameter packs
On 05/05/2015 02:08 PM, Jason Merrill wrote: On 05/04/2015 09:29 PM, Aldy Hernandez wrote: The code handling parameter DIEs needed a little tweaking for variable length template arguments. I've relaxed the original assert, but this may require tweaking at branch review time-- hopefully later this week. What testcase motivated this? We're within a formal_parameter_pack, but Pretty much every other test in the libstdc++-v3 testsuite was failing with the ICE I elided in my patch. I wasn't able to narrow it down to a tiny test, but I can do so if you want. ?? Aldy
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 08:32, Jakub Jelinek wrote: On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj 2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c 2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; I don't think this is right (though I suspect the existing code has the same problem). We should only look at mode if there is no type information. The problem is that GCC has a nasty habit of assigning real machine modes to things that are really BLKmode and we've run into several cases where this has royally screwed things up. So for consistency in the ARM back-end we are careful to only use mode when type is NULL (= it's a libcall). + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } It ought to be possible to re-order this, though, to static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (type != NULL_TREE) +{ + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; +} + return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY); } Either way, this would need careful cross-testing against an existing compiler. R. --- gcc/testsuite/gcc.c-torture/execute/pr65956.c.jj 2015-05-01 10:32:34.730150257 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr65956.c 2015-05-01 10:32:13.0 +0200 @@ -0,0 +1,67 @@ +/* PR target/65956 */ + +struct A { char *a; int b; long long c; }; +char v[3]; + +__attribute__((noinline, noclone)) void +fn1 (char *x, char *y) +{ + if (x != v[1] || y != v[2]) +__builtin_abort (); + v[1]++; +} + +__attribute__((noinline, noclone)) int +fn2 (char *x) +{ + asm volatile ( : +g (x) : : memory); + return x == v[0]; +} + +__attribute__((noinline, noclone)) void +fn3 (const char *x) +{ + if (x[0] != 0) +__builtin_abort (); +} + +static struct A +foo (const char *x, struct A y, struct A z) +{ + struct A r = { 0, 0, 0 }; + if (y.b z.b) +{ + if (fn2 (y.a) fn2 (z.a)) + switch (x[0]) + { + case '|': + break; + default: + fn3 (x); + } + fn1 (y.a, z.a); +} + return r; +} + +__attribute__((noinline, noclone)) int +bar (int x, struct A *y) +{ + switch (x) +{ +case 219: + foo (+, y[-2], y[0]); +case 220: + foo (-, y[-2], y[0]); +} +} + +int +main () +{ + struct A a[3] = { { v[1], 1, 1LL }, { v[0], 0, 0LL }, { v[2], 2, 2LL } }; + bar (220, a + 2); + if (v[1] != 1) +__builtin_abort (); + return 0; +} Jakub
Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)
On 05/05/15 11:54, Richard Earnshaw wrote: On 05/05/15 08:32, Jakub Jelinek wrote: On Mon, May 04, 2015 at 05:00:11PM +0200, Jakub Jelinek wrote: So at least changing arm_needs_doubleword_align for non-aggregates would likely not break anything that hasn't been broken already and would unbreak the majority of cases. Attached (untested so far). It indeed changes code generated for over-aligned va_arg, but as I believe you can't properly pass those in the ... caller, this should just fix it so that va_arg handling matches the caller (and likewise for callees for named argument passing). The following testcase shows that eipa_sra changes alignment even for the aggregates. Change aligned (8) to aligned (4) to see another possibility. Actually I misread it, for the aggregates esra actually doesn't change anything, which is the reason why the testcase doesn't fail. The problem with the scalars is that esra first changed it to the over-aligned MEM_REFs and then later on eipa_sra used the types of the MEM_REFs created by esra. 2015-05-05 Jakub Jelinek ja...@redhat.com PR target/65956 * config/arm/arm.c (arm_needs_doubleword_align): For non-aggregate types check TYPE_ALIGN of TYPE_MAIN_VARIANT rather than type itself. * gcc.c-torture/execute/pr65956.c: New test. --- gcc/config/arm/arm.c.jj 2015-05-04 21:51:42.0 +0200 +++ gcc/config/arm/arm.c 2015-05-05 09:20:52.481693337 +0200 @@ -6063,8 +6063,13 @@ arm_init_cumulative_args (CUMULATIVE_ARG static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY) +return true; I don't think this is right (though I suspect the existing code has the same problem). We should only look at mode if there is no type information. The problem is that GCC has a nasty habit of assigning real machine modes to things that are really BLKmode and we've run into several cases where this has royally screwed things up. So for consistency in the ARM back-end we are careful to only use mode when type is NULL (= it's a libcall). + if (type == NULL_TREE) +return false; + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; } It ought to be possible to re-order this, though, to static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (type != NULL_TREE) +{ + if (AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (type) PARM_BOUNDARY; + return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; +} + return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY); } Either way, this would need careful cross-testing against an existing compiler. It looks as though either patch would cause ABI incompatibility for typedef int alignedint __attribute__((aligned((8; int __attribute__((weak)) foo (int a, alignedint b) {return b;} void bar (alignedint x) { foo (1, x); } Where currently gcc uses r2 as the argument register for b in foo. R.
[PATCH, i386]: Use HOST_WIDE_INT_1{,U} some more
Hello! No functional changes. 2015-05-05 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c: Use HOST_WIDE_INT_1 instead of (HOST_WIDE_INT) 1 and HOST_WIDE_INT_1U instead of (unsigned HOST_WIDE_INT) 1. * config/i386/i386.md: Ditto. * config/i386/winnt.c: Ditto. Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN: Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 222796) +++ config/i386/i386.md (working copy) @@ -6087,7 +6087,7 @@ GET_MODE (operands[0]) == GET_MODE (operands[1]) ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) - 1 3 ((unsigned HOST_WIDE_INT) INTVAL (operands[3]) -((unsigned HOST_WIDE_INT) 1 INTVAL (operands[2]))) +(HOST_WIDE_INT_1U INTVAL (operands[2]))) # reload_completed [(const_int 0)] Index: config/i386/winnt.c === --- config/i386/winnt.c (revision 222796) +++ config/i386/winnt.c (working copy) @@ -599,7 +599,7 @@ i386_pe_asm_output_aligned_decl_common (FILE *stre assemble_name (stream, name); if (use_pe_aligned_common) fprintf (stream, , HOST_WIDE_INT_PRINT_DEC , %d\n, - size ? size : (HOST_WIDE_INT) 1, + size ? size : HOST_WIDE_INT_1, exact_log2 (align) - exact_log2 (CHAR_BIT)); else fprintf (stream, , HOST_WIDE_INT_PRINT_DEC \t ASM_COMMENT_START Index: config/i386/i386.c === --- config/i386/i386.c (revision 222796) +++ config/i386/i386.c (working copy) @@ -19366,7 +19366,7 @@ ix86_expand_adjust_ufix_to_sfix_si (rtx val, rtx * OPTAB_DIRECT); else { - rtx two31 = GEN_INT ((unsigned HOST_WIDE_INT) 1 31); + rtx two31 = GEN_INT (HOST_WIDE_INT_1U 31); two31 = ix86_build_const_vector (intmode, 1, two31); *xorp = expand_simple_binop (intmode, AND, gen_lowpart (intmode, tmp[0]), @@ -24772,7 +24772,7 @@ ix86_expand_set_or_movmem (rtx dst, rtx src, rtx c } /* Make sure we don't need to care about overflow later on. */ - if (count ((unsigned HOST_WIDE_INT) 1 30)) + if (count (HOST_WIDE_INT_1U 30)) return false; /* Step 0: Decide on preferred algorithm, desired alignment and @@ -47734,7 +47734,7 @@ expand_vec_perm_interleave2 (struct expand_vec_per /* Examine from whence the elements come. */ contents = 0; for (i = 0; i nelt; ++i) -contents |= ((unsigned HOST_WIDE_INT) 1) d-perm[i]; +contents |= HOST_WIDE_INT_1U d-perm[i]; memset (remap, 0xff, sizeof (remap)); dremap = *d; @@ -47744,7 +47744,7 @@ expand_vec_perm_interleave2 (struct expand_vec_per unsigned HOST_WIDE_INT h1, h2, h3, h4; /* Split the two input vectors into 4 halves. */ - h1 = (((unsigned HOST_WIDE_INT) 1) nelt2) - 1; + h1 = (HOST_WIDE_INT_1U nelt2) - 1; h2 = h1 nelt2; h3 = h2 nelt2; h4 = h3 nelt2; @@ -47826,7 +47826,7 @@ expand_vec_perm_interleave2 (struct expand_vec_per unsigned int nonzero_halves[4]; /* Split the two input vectors into 8 quarters. */ - q[0] = (((unsigned HOST_WIDE_INT) 1) nelt4) - 1; + q[0] = (HOST_WIDE_INT_1U nelt4) - 1; for (i = 1; i 8; ++i) q[i] = q[0] (nelt4 * i); for (i = 0; i 4; ++i)
Re: [PATCH, PR target/65103, 2/3] Propagate address constants into loops for i386
2015-04-21 8:52 GMT+03:00 Jeff Law l...@redhat.com: On 04/17/2015 02:34 AM, Ilya Enkovich wrote: On 15 Apr 14:07, Ilya Enkovich wrote: 2015-04-14 8:22 GMT+03:00 Jeff Law l...@redhat.com: On 03/15/2015 02:30 PM, Richard Sandiford wrote: Ilya Enkovich enkovich@gmail.com writes: This patch allows propagation of loop invariants for i386 if propagated value is a constant to be used in address operand. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk or stage 1? Is it necessary for this to be a target hook? The concept doesn't seem particularly target-specific. We should only propagate into the address if the new cost is no greater than the old cost, but if the address meets that condition and if propagating at this point in the pipeline is a win on x86, then wouldn't it be a win for other targets too? I agree with Richard here. I can't see a strong reason why this should be a target hook. Perhaps part of the issue here is the address costing metrics may not have enough context to make good decisions. In which case what context do they need? At this point I don't insist on a target hook. The main reasoning was to not affect other targets. If we extend propagation for non constant values different aspects may appear. E.g. possible register pressure changes may significantly affect ia32. I just wanted to have an instrument to play with a propagation on x86 not affecting other targets. I don't have an opportunity to test possible performance implications on non-x86 targets. Don't expect (significant) regressions there but who knows... I'll remove the hook from this patch. Will probably introduce it later if some target specific cases are found. Thanks, Ilya Jeff Here is a version with no hook. Bootstrapped and tested on x86_64-unknown-linux-gnu. Is it OK for trunk? Thanks, Ilya -- gcc/ 2015-04-17 Ilya Enkovich ilya.enkov...@intel.com PR target/65103 * fwprop.c (forward_propagate_into): Propagate loop invariants if a target says so. gcc/testsuite/ 2015-04-17 Ilya Enkovich ilya.enkov...@intel.com PR target/65103 * gcc.target/i386/pr65103-2.c: New. It seems to me there's a key piece missing here -- metrics. When is this profitable, when is it not profitable. Just blindly undoing LICM seems wrong here. The first thought is to look at register pressure through the loop. I thought we had some infrastructure for this kind of query available. It'd probably be wise to re-use it. In fact, one might reasonably ask if LICM should have hoisted the expression to start with. I'd also think the cost of the constant may come into play here. A really cheap constant probably should not have been hoisted by LICM to start with -- but the code may have been written in such a way that some low cost constants are pulled out as loop invariants at the source level. So this isn't strictly an issue of un-doing bad LICM So I think to go forward we need to be working on solving the when is this a profitable transformation to make. This patch doesn't force propagation. The patch just allows propagation and regular fwprop cost estimation is used to compute if this is profitable. For i386 I don't see cases when we shouldn't propagate. We remove instruction, reduce register pressure and having constant in memory operand is free which is reflected in address_cost hook. Ilya jeff
Re: [PATCH 2/3][AArch64] Add vcond(u?)didi pattern
On 17 April 2015 at 16:40, Alan Lawrence alan.lawre...@arm.com wrote: This just adds the necessary patterns used for comparisons of DImode vectors. Used as part of arm_neon.h, in next/final patch. Tested on aarch64-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_vcond_internalmodemode, vcondmodemode, vcondumode,mode): Add DImode variant. OK /Marcus
Re: [PATCH 3/3][AArch64] Idiomatic 64x1 comparisons in arm_neon.h
On 17 April 2015 at 16:41, Alan Lawrence alan.lawre...@arm.com wrote: This also makes the existing intrinsics tests apply to the new patterns. Tested on aarch64-none-elf. gcc/ChangeLog: * config/aarch64/arm_neon.h (vceq_s64, vceq_u64, vceqz_s64, vceqz_u64, vcge_s64, vcge_u64, vcgez_s64, vcgt_s64, vcgt_u64, vcgtz_s64, vcle_s64, vcle_u64, vclez_s64, vclt_s64, vclt_u64, vcltz_s64, vtst_s64, vtst_u64): Rewrite using gcc vector extensions. gcc/testsuite/ChangeLog: * gcc.target/aarch64/singleton_intrinsics_1.c: Generalize regex to allow cmlt or sshr. OK /Marcus