Re: [PATCH 3/3] Handle const_vector in mulv4si3 for pre-sse4.1.
On Mon, Jun 18, 2012 at 10:06 PM, Richard Henderson r...@redhat.com wrote:

Please note that you will probably hit PR33329, this is the reason that we expand multiplications after reload. Please see [1] for further explanation. There is the gcc.target/i386/pr33329.c test to cover this issue, but it is not effective anymore since the simplification happens at the tree level.

[1] http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00668.html

Well, even with the test case changed (s/*2/*12345/) so that it continues to use a multiply instead of devolving to a shift, it does not fail. There have been a lot of changes since 2007; I might hope that the underlying bug has been fixed.

Should we also change mulVI1_AVX23 and mulVI8_AVX23 from a pre-reload splitter to an expander in the same way?

Uros.
Re: [patch] Deal with #ident without
On Wed, Jun 20, 2012 at 2:21 AM, Hans-Peter Nilsson h...@bitrange.com wrote:
On Tue, 19 Jun 2012, Steven Bosscher wrote:

I've now committed this, see r188791.

Breaking cris-elf. Just try rebuilding cc1:

./gcc/gcc/../libdecnumber/dpd -I../libdecnumber \
  /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c -o cris.o
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'cris_asm_output_ident':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 'cgraph_state' undeclared (first use in this function)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: (Each undeclared identifier is reported only once
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: for each function it appears in.)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 'CGRAPH_STATE_PARSING' undeclared (first use in this function)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2478: warning: unused variable 'buf'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2477: warning: unused variable 'size'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2476: warning: unused variable 'section_asm_op'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'cris_option_override':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2538: error: 'flag_no_gcc_ident' undeclared (first use in this function)
make[2]: *** [cris.o] Error 1

Grr. A merge f*ck-up. This was in my testing tree on the compile farm but not in the patch I committed:

Index: config/cris/cris.c
===================================================================
--- config/cris/cris.c	(revision 188808)
+++ config/cris/cris.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.
 #include "optabs.h"
 #include "df.h"
 #include "opts.h"
+#include "cgraph.h"
 
 /* Usable when we have an amount to add or subtract, and want
    the optimal size of the insn.  */
@@ -2533,10 +2534,6 @@ cris_asm_output_case_end (FILE *stream,
 static void
 cris_option_override (void)
 {
-  /* We don't want an .ident for gcc.
-     It isn't really clear anymore why not.  */
-  flag_no_gcc_ident = true;
-
   if (cris_max_stackframe_str)
     {
       cris_max_stackframe = atoi (cris_max_stackframe_str);

I'm building a cross to cris-elf now, to be sure, and I'll commit this ASAP after that. Sorry for this...

Ciao!
Steven
Re: [patch committed testsuite] Tweak gcc.dg/stack-usage-1.c on SH
I've applied the attached patch, a tiny SH-specific change to the gcc.dg/stack-usage-1.c test. Tested on sh-linux and i686-pc-linux-gnu.

This is wrong, please remove the dg-options line and do like the other targets.

-- 
Eric Botcazou
RFA: PATCH to Makefile.def/tpl to add libgomp to make check-c++
The recent regression in libgomp leads me to want to add libgomp tests to the check-c++ target. OK for trunk?

commit 3eaa6c5b268115cbf4ab762b5d7b50022389ef25
Author: Jason Merrill ja...@redhat.com
Date:   Tue Jun 19 18:16:34 2012 -0700

    * Makefile.tpl (check-target-libgomp-c++): New.
    * Makefile.def (c++): Add it.
    * Makefile.in: Regenerate.

diff --git a/Makefile.def b/Makefile.def
index 1449a50..2a0b8fa 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -518,7 +518,8 @@ dependencies = { module=configure-target-libgfortran; on=all-target-libquadmath;
 languages = { language=c; gcc-check-target=check-gcc; };
 languages = { language=c++; gcc-check-target=check-c++;
 	lib-check-target=check-target-libstdc++-v3;
-	lib-check-target=check-target-libmudflap-c++; };
+	lib-check-target=check-target-libmudflap-c++;
+	lib-check-target=check-target-libgomp-c++; };
 languages = { language=fortran; gcc-check-target=check-fortran;
 	lib-check-target=check-target-libquadmath;
 	lib-check-target=check-target-libgfortran; };
diff --git a/Makefile.in b/Makefile.in
index def860e..9cf3543 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -41116,6 +41116,13 @@ check-target-libmudflap-c++:
 
 @endif target-libmudflap
 
+@if target-libgomp
+.PHONY: check-target-libgomp-c++
+check-target-libgomp-c++:
+	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) c++.exp" check-target-libgomp
+
+@endif target-libgomp
+
 # --
 # GCC module
 # --
@@ -41150,7 +41157,7 @@ check-gcc-c++:
 	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
 	$(HOST_EXPORTS) \
 	(cd gcc && $(MAKE) $(GCC_FLAGS_TO_PASS) check-c++);
-check-c++: check-gcc-c++ check-target-libstdc++-v3 check-target-libmudflap-c++
+check-c++: check-gcc-c++ check-target-libstdc++-v3 check-target-libmudflap-c++ check-target-libgomp-c++
 
 .PHONY: check-gcc-fortran check-fortran
 check-gcc-fortran:
diff --git a/Makefile.tpl b/Makefile.tpl
index 371c3b6..f06a7ce 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1415,6 +1415,13 @@ check-target-libmudflap-c++:
 
 @endif target-libmudflap
 
+@if target-libgomp
+.PHONY: check-target-libgomp-c++
+check-target-libgomp-c++:
+	$(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) c++.exp" check-target-libgomp
+
+@endif target-libgomp
+
 # --
 # GCC module
 # --
Re: [patch committed testsuite] Tweak gcc.dg/stack-usage-1.c on SH
Eric Botcazou ebotca...@adacore.com wrote: This is wrong, please remove the dg-options line and do like the other targets. I'll revert that line and use my patch in the trail #11 of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53621. Regards, kaz
Re: [PATCH] C++11, grammar fix for late-specified return types and virt-specifiers
Applied, thanks. Note that your dg-error regexp doesn't make much sense:

// { dg-error "expected type-specifier before 'final'||expected ';'||declaration doesn't declare anything" }

Regular expression "or" uses a single |, so with the empty alternatives this ends up being a long way of writing

// { dg-error "" }

I adjusted the dg-error lines to check only for the "expected type-specifier" error, and used dg-prune-output to discard the extra errors.

Jason
Re: C++ PATCH for c++/53484 (wrong auto in template)
On 06/15/2012 02:59 PM, Dominique Dhumieres wrote: Back when we added C++11 auto deduction, I thought we could shortcut the normal deduction in some templates, when the type is adequately describable (thus the late, unlamented function describable_type). Over time various problems with this have arisen, of which this is the most recent; as a result, I'm giving up the attempt as a bad idea and just deferring auto deduction if the initializer is type-dependent. ... This has caused http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00085.html Did you mean to put a bugzilla link here? Jason
Re: [patch committed testsuite] Tweak gcc.dg/stack-usage-1.c on SH
This is wrong, please remove the dg-options line and do like the other targets.

I'll revert that line and use my patch in the trail #11 of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53621.

I've applied the patch below. I'll backport it to release branches.

Regards,
kaz
--
2012-06-19  Kaz Kojima  <kkoj...@gcc.gnu.org>

	* gcc.dg/stack-usage-1.c: Remove dg-options line for sh targets
	and add __sh__ case.

--- ORIG/trunk/gcc/testsuite/gcc.dg/stack-usage-1.c	2012-06-20 10:01:51.0 +0900
+++ trunk/gcc/testsuite/gcc.dg/stack-usage-1.c	2012-06-20 16:28:31.0 +0900
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-fstack-usage" } */
-/* { dg-options "-fstack-usage -fomit-frame-pointer" { target { sh*-*-* } } } */
 
 /* This is aimed at testing basic support for -fstack-usage in the back-ends.
    See the SPARC back-end for example (grep flag_stack_usage_info in sparc.c).
@@ -61,6 +60,8 @@
 #  define SIZE (256 - __EPIPHANY_STACK_OFFSET__)
 #elif defined (__RL78__)
 #  define SIZE 254
+#elif defined (__sh__)
+#  define SIZE 252
 #else
 #  define SIZE 256
 #endif
Re: [patch committed testsuite] Tweak gcc.dg/stack-usage-1.c on SH
I've applied the patch below. I'll backport it to release branches.

Thanks!

-- 
Eric Botcazou
Re: [PATCH] C++11, grammar fix for late-specified return types and virt-specifiers
On 20 June 2012 10:35, Jason Merrill ja...@redhat.com wrote:

Applied, thanks. Note that your dg-error regexp doesn't make much sense:
// { dg-error "expected type-specifier before 'final'||expected ';'||declaration doesn't declare anything" }
Regular expression "or" uses a single |, so this ends up being a long way of writing
// { dg-error "" }

Funny. The testcase-writing GCC wiki page at http://gcc.gnu.org/wiki/TestCaseWriting suggests a double pipe. Quoth the Raven: "Should a line produce two errors, the regular expression should include an || (ie. a regular expression OR) between the possible message fragments." If a single pipe is indeed to be used, perhaps we want to correct that piece of documentation, lest fools follow its advice. :)
Re: [PING ARM Patches] PR53447: optimizations of 64bit ALU operation with constant
Hi Michael,

It seems the wiki page describes 64bit operations on NEON only. My patches improve 64bit operations on core registers only. I touched the neon patterns simply because those DI mode operations are enabled separately according to the TARGET_NEON value, so in the neon patterns I duplicated the alternatives of the normal cases.

thanks
Carrot

On Wed, Jun 20, 2012 at 9:58 AM, Michael Hope michael.h...@linaro.org wrote:
On 18 June 2012 22:17, Carrot Wei car...@google.com wrote:

Hi, could ARM maintainers review the following patches?

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00497.html 64bit add/sub constants.
http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01834.html 64bit and with constants.
http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01974.html 64bit xor with constants.
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00287.html 64bit ior with constants.

Hi Carrot. Out of interest, how do these interact with the 64 bit in NEON patches that Andrew has been doing? They seem to touch many of the same patterns, and I'm concerned that they'd cause GCC to prefer core registers instead of NEON, especially as the constant values you can use in a vmov are limited. There's an (in progress) summary of the current state for the standard C operators here: https://wiki.linaro.org/MichaelHope/Sandbox/64BitOperations

-- Michael
Re: [PATCH] Fix PR53708
On Tue, 19 Jun 2012, Iain Sandoe wrote:
On 19 Jun 2012, at 22:41, Mike Stump wrote:
On Jun 19, 2012, at 12:22 PM, Iain Sandoe i...@codesourcery.com wrote:
On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote:
On Tue, 19 Jun 2012, Richard Guenther wrote:
Richard Guenther rguent...@suse.de writes:

We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way.

I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further.

That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?).

A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)).

That makes sense. I had a quick look at the ObjC code, and it appears that the explicit ALIGNs were never committed to trunk. Thus, the question becomes: what should ObjC (or any other) FE do to ensure that specific ABI (upper) alignment constraints are met?

Hum, upper is easy... I thought the issue was that extra alignment would kill it? I know that extra alignment does kill some of the objc metadata.

Clearly, ambiguous phrasing on my part. I mean when we want to say "no more than this much".

I think the only way would be to lay out things inside a structure. Otherwise, if extra alignment can break things, cannot re-ordering of symbols break them, too? Or can you elaborate on how extra alignment breaks stuff here?

Thanks,
Richard.
Re: C++ PATCH for c++/53484 (wrong auto in template)
Did you mean to put a bugzilla link here?

Yes ;-( http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53565 (copy and paste from the wrong window).

Dominique
Re: [PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)
On Tue, Jun 19, 2012 at 11:36 PM, Mikael Pettersson mi...@it.uu.se wrote:
Richard Guenther writes:
On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
Richard Guenther wrote:
On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand uweig...@de.ibm.com wrote:

However, there is a second case where we need to check every pass: if we're not actually vectorizing any loop, but are performing basic-block SLP. In this case, it would appear that we need the same check as described in the comment above, i.e. to verify that the stride is a multiple of the vector size. The patch below adds this check, and this indeed fixes the invalid access I was seeing in the test case (in the final assembler, we now get a vld1.16 instead of vldr). Tested on arm-linux-gnueabi with no regressions. OK for mainline?

Ok.

Thanks for the quick review; I've checked this in to mainline now. I just noticed that the test case also crashes on 4.7, but not on 4.6. Would a backport to 4.7 also be OK, once testing passes?

Yes. Please leave it on mainline a few days to catch fallout from autotesters.

This patch caused

FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1

on sparc64-linux. Comparing the pre and post patch dumps for that file shows:

 22: vect_compute_data_ref_alignment:
 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B]
 22: vect_compute_data_ref_alignment:
-22: force alignment of arr[i_87]
-22: misalign = 0 bytes of ref arr[i_87]
+22: SLP: step doesn't divide the vector-size.
+22: Unknown alignment for access: arr

(lots of stuff that's simply gone)

-22: BASIC BLOCK VECTORIZED
-
-22: basic block vectorized using SLP
+22: not vectorized: unsupported unaligned store.arr[i_87]
+22: not vectorized: unsupported alignment in basic block.

In this testcase the alignment of arr[i] should be irrelevant - it is not part of the stmts that are going to be vectorized.
But of course this may be simply an ordering issue in how we analyze data-references / statements in basic-block vectorization (thus we possibly did not yet declare the arr[i] = i statement as not taking part in the vectorization). The line

-22: force alignment of arr[i_87]

is odd, too - as said, we do not need to touch arr when vectorizing the basic-block. Ulrich, can you look into this, or do you want me to take a look here? Mikael - please open a bugreport for this.

Thanks,
Richard.

/Mikael
Re: [Patch] Adjustments for Windows x64 SEH
On Jun 19, 2012, at 6:47 PM, Richard Henderson wrote:
On 2012-06-18 05:22, Tristan Gingold wrote:

+  /* Win64 SEH, very large frames need a frame-pointer as maximum stack
+     allocation is 4GB (add a safety guard for saved registers).  */
+  if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE)
+    return true;

Elsewhere you say this is an upper bound for stack use by the prologue. It's clearly a wild guess. The maximum stack use is 10*sse + 8*int registers saved, which is a lot less than 4096. That said, I'm ok with *using* 4096 so long that the comment clearly states that it's a large over-estimate. I do suggest, however, folding this into the SEH_MAX_FRAME_SIZE value, and expanding on the comment there. I see no practical difference between 0x8000 and 0x7fffe000 being the limit.

Here is the new comment. I have reduced the estimation to 256.

/* According to Windows x64 software convention, the maximum stack
   allocatable in the prologue is 4G - 8 bytes.  Furthermore, there is
   a limited set of instructions allowed to adjust the stack pointer in
   the epilog, forcing the use of frame pointer for frames larger than
   2 GB.  This theoretical limit is reduced by 256, an over-estimated
   upper bound for the stack use by the prologue.
   We define only one threshold for both the prolog and the epilog.  When
   the frame size is larger than this threshold, we allocate the area to
   save SSE regs, then save them, and then allocate the remaining.  There
   is no SEH unwind info for this later allocation.  */
#define SEH_MAX_FRAME_SIZE ((2U << 30) - 256)

+/* Output assembly code to get the establisher frame (Windows x64 only).
+   This corresponds to what will be computed by Windows from Frame Register
+   and Frame Register Offset fields of the UNWIND_INFO structure.  Since
+   these values are computed very late (by ix86_expand_prologue), we cannot
+   express this using only RTL.
*/
+
+const char *
+ix86_output_establisher_frame (rtx target)
+{
+  if (!frame_pointer_needed)
+    {
+      /* Note that we have advertized an lea operation.  */
+      output_asm_insn ("lea{q}\t{0(%%rsp), %0|%0, 0[rsp]}", &target);
+    }
+  else
+    {
+      rtx xops[3];
+      struct ix86_frame frame;
+
+      /* Recompute the frame layout here.  */
+      ix86_compute_frame_layout (&frame);
+
+      /* Closely follow how the frame pointer is set in
+         ix86_expand_prologue.  */
+      xops[0] = target;
+      xops[1] = hard_frame_pointer_rtx;
+      if (frame.hard_frame_pointer_offset == frame.reg_save_offset)
+        xops[2] = GEN_INT (0);
+      else
+        xops[2] = GEN_INT (-(frame.stack_pointer_offset
+                             - frame.hard_frame_pointer_offset));
+      output_asm_insn ("lea{q}\t{%a2(%1), %0|%0, %a2[%1]}", xops);

This is what register elimination is for; the value substitution happens during reload. Now, one *could* add a new pseudo-hard-register for this (we support as many register eliminations as needed), but before we do that we need to decide if we can adjust the soft frame pointer to be the value required. If so, you can then rely on the existing __builtin_frame_address. Which is a very attractive sounding solution.

I'm 99% sure moving the sfp will work. Thank you for this idea. I am trying to implement it.

Tristan.
[patch][RFC] Move the C front end to gcc/c/
Hello, Attached is a concept patch to move the C front end to its own sub-directory of the main gcc directory. Things like updates of sourcebuild.texi are not yet included. I'm posting this as an RFC: Does this look like the right approach? Have I overlooked other things than just documentation updates? I hope this would not cause too much trouble for branches like the cxx-conversion branch? Bootstrapped on x86_64-unknown-linux-gnu with c,objc,c++,obj-c++ enabled, FWIW. Thanks, Ciao! Steven move_C_fe.diff Description: Binary data
Re: RFA: PATCH to Makefile.def/tpl to add libgomp to make check-c++
On Wed, Jun 20, 2012 at 9:26 AM, Jason Merrill ja...@redhat.com wrote: The recent regression in libgomp leads me to want to add libgomp tests to the check-c++ target. OK for trunk? Ok. (what about libitm?) Thanks, Richard.
[PATCH] Adjust call stmt cost for tailcalls
Tailcalls have no argument setup cost and no return value cost. This patch adjusts estimate_num_insns to reflect that. Honza, does this look correct?

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2012-06-20  Richard Guenther  <rguent...@suse.de>

	* tree-inline.c (estimate_num_insns): Estimate call cost for
	tailcalls properly.

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	(revision 188817)
+++ gcc/tree-inline.c	(working copy)
@@ -3611,12 +3611,15 @@ estimate_num_insns (gimple stmt, eni_wei
 	  }
 	cost = node ? weights->call_cost : weights->indirect_call_cost;
-	if (gimple_call_lhs (stmt))
-	  cost += estimate_move_cost (TREE_TYPE (gimple_call_lhs (stmt)));
-	for (i = 0; i < gimple_call_num_args (stmt); i++)
+	if (!gimple_call_tail_p (stmt))
 	  {
-	    tree arg = gimple_call_arg (stmt, i);
-	    cost += estimate_move_cost (TREE_TYPE (arg));
+	    if (gimple_call_lhs (stmt))
+	      cost += estimate_move_cost (TREE_TYPE (gimple_call_lhs (stmt)));
+	    for (i = 0; i < gimple_call_num_args (stmt); i++)
+	      {
+		tree arg = gimple_call_arg (stmt, i);
+		cost += estimate_move_cost (TREE_TYPE (arg));
+	      }
 	  }
 	break;
       }
Re: [PATCH] Fix PR53708
On Tue, 19 Jun 2012, Dominique Dhumieres wrote:
On Tue, 19 Jun 2012, Richard Guenther wrote:
Richard Guenther rguent...@suse.de writes:

We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way.

I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further.

That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?).

A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)).

Thus, the following. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-06-20  Richard Guenther  <rguent...@suse.de>

	* tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Allow
	adjusting alignment of user-aligned decls again.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 188817)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -4731,10 +4720,9 @@ vect_can_force_dr_alignment_p (const_tre
   if (TREE_ASM_WRITTEN (decl))
     return false;
 
-  /* Do not override explicit alignment set by the user or the alignment
-     as specified by the ABI when the used attribute is set.  */
-  if (DECL_USER_ALIGN (decl)
-      || DECL_PRESERVE_P (decl))
+  /* Do not override the alignment as specified by the ABI when the used
+     attribute is set.  */
+  if (DECL_PRESERVE_P (decl))
    return false;
 
   if (TREE_STATIC (decl))
[patch][m32c] Remove unnecessary includes from m32c-pragma.c
Hello,

m32c-pragma.c doesn't need the includes that the patch below removes. Tested with a cross from powerpc64-unknown-linux-gnu to m32c-elf. Will commit as obvious unless someone objects.

Ciao!
Steven

	* config/m32c/m32c-pragma.c: Remove unnecessary includes.

Index: config/m32c/m32c-pragma.c
===================================================================
--- config/m32c/m32c-pragma.c	(revision 188820)
+++ config/m32c/m32c-pragma.c	(working copy)
@@ -27,13 +27,7 @@
 #include "c-family/c-common.h"
 #include "diagnostic-core.h"
 #include "cpplib.h"
-#include "hard-reg-set.h"
-#include "output.h"
 #include "m32c-protos.h"
-#include "function.h"
-#define MAX_RECOG_OPERANDS 10
-#include "reload.h"
-#include "target.h"
 
 /* Implements the GCC "memregs" pragma.  This pragma takes only an
    integer, and is semantically identical to the -memregs= command
Re: [Patch] PR 51938: extend ifcombine
On Sun, Jun 10, 2012 at 4:16 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello, currently tree-ssa-ifcombine handles pairs of nested ifs that share the same then branch, or the same else branch. There is no particular reason why it couldn't also handle the case where the then branch of one is the else branch of the other, which is what I do here. Any comments?

The general idea looks good, but I think the patch is too invasive. As far as I can see, the only callers with a non-zero 'inv' argument come from ifcombine_ifnotorif and ifcombine_ifnotandif (and both with inv == 2). I would rather see a more localized patch that makes use of invert_tree_comparison to perform the inversion on the call arguments of maybe_fold_and/or_comparisons. Is there any reason that would not work? At least

+  if (inv & 1)
+    lcompcode2 = COMPCODE_TRUE - lcompcode2;

looks as if it were not semantically correct - you cannot simply invert floating-point comparisons (see the restrictions invert_tree_comparison has).

Thanks,
Richard.

2012-06-10  Marc Glisse  <marc.gli...@inria.fr>

gcc/
	PR tree-optimization/51938
	* fold-const.c (combine_comparisons): Extra argument.  Handle
	inverted conditions.
	(fold_truth_andor_1): Update call to combine_comparisons.
	* gimple-fold.c (swap12): New function.
	(and_comparisons_1): Extra argument.  Handle inverted conditions.
	(and_var_with_comparison_1): Update call to and_comparisons_1.
	(maybe_fold_and_comparisons): Extra argument.  Update call to
	and_comparisons_1.
	(or_comparisons_1): Extra argument.  Handle inverted conditions.
	(or_var_with_comparison_1): Update call to or_comparisons_1.
	(maybe_fold_or_comparisons): Extra argument.  Update call to
	or_comparisons_1.
	* tree-ssa-ifcombine.c (ifcombine_ifnotandif): New function.
	(ifcombine_ifnotorif): New function.
	(tree_ssa_ifcombine_bb): Call them.
	(ifcombine_iforif): Update call to maybe_fold_or_comparisons.
	(ifcombine_ifandif): Update call to maybe_fold_and_comparisons.
	* tree-ssa-reassoc.c (eliminate_redundant_comparison): Update calls
	to maybe_fold_or_comparisons and maybe_fold_and_comparisons.
	* tree-if-conv.c (fold_or_predicates): Update call to
	maybe_fold_or_comparisons.
	* gimple.h (maybe_fold_and_comparisons): Match gimple-fold.c
	prototype.
	(maybe_fold_or_comparisons): Likewise.
	* tree.h (combine_comparisons): Match fold-const.c prototype.

gcc/testsuite/
	PR tree-optimization/51938
	* gcc.dg/tree-ssa/ssa-ifcombine-8.c: New testcase.
	* gcc.dg/tree-ssa/ssa-ifcombine-9.c: New testcase.

-- 
Marc Glisse
Re: [patch][ARM] Do not include output.h in arm-c.c
On 19/06/12 23:44, Steven Bosscher wrote: Hello, Only a few front-end files to go that need output.h, and some of them are in the c_target_objs: arm, mep, m32c, and rl78. This patch tackles the ARM case. arm-c.c needs output.h because EMIT_EABI_ATTRIBUTE wants to print to asm_out_file. Solved by replacing EMIT_EABI_ATTRIBUTE with a function arm.c:arm_emit_eabi_attribute. Tested by building a cross-compiler from powerpc64-unknown-linux-gnu X arm-eabi, and comparing assembly on a set of files. OK for trunk? Ciao! Steven= arm_C_no_output_h.diff OK. R.
[patch] Implement -fcallgraph-info option
Hi,

this is a repost of http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html earlier in the development cycle, so with hopefully more time for discussion.

The command line option -fcallgraph-info is added and makes the compiler generate another output file (xxx.ci) for each compilation unit, which is a valid VCG file (you can launch your favorite VCG viewer on it unmodified) and contains the "final" callgraph of the unit. "final" is a bit of a misnomer, as this is actually the callgraph at RTL expansion time, but since most high-level optimizations are done at the Tree level and RTL doesn't usually fiddle with calls, it's final in almost all cases.

Moreover, the nodes can be decorated with additional info: -fcallgraph-info=su adds stack usage info and -fcallgraph-info=da dynamic allocation info. This is useful for embedded applications with stringent requirements in terms of memory usage, for example. You can provide the .ci files (or an aggregated version) for pre-compiled libraries. This is again strictly orthogonal to code and debug info generation.

There are a few non-obvious changes to libfuncs.h, builtins.c, expr.c and optabs.c to deal with quirks of the RTL expander, but this mostly removes dead code.

This version takes into account Joseph's comments on the first submission. As for Richard's comments, the implementation is low-level by design because we want to be able to trust it, and the IPA callgraph isn't suitable for this, as the RTL expander can introduce function calls that need to be accounted for.

Tested on x86_64-suse-linux.

2012-06-20  Eric Botcazou  <ebotca...@adacore.com>

	Callgraph info support
	* common.opt (-fcallgraph-info[=]): New option.
	* doc/invoke.texi (Debugging options): Document it.
	* opts.c (common_handle_option): Handle it.
	* flag-types.h (enum callgraph_info_type): New type.
	* builtins.c (set_builtin_user_assembler_name): Do not initialize
	memcpy_libfunc and memset_libfunc.
	* calls.c (expand_call): If -fcallgraph-info, record the call.
	(emit_library_call_value_1): Likewise.
	* cgraph.h (struct cgraph_final_info): New structure.
	(struct cgraph_dynamic_alloc): Likewise.
	(cgraph_final_edge): Likewise.
	(cgraph_node): Add 'final' field.
	(dump_cgraph_final_vcg): Declare.
	(cgraph_final_record_call): Likewise.
	(cgraph_final_record_dynamic_alloc): Likewise.
	(cgraph_final_info): Likewise.
	* cgraph.c: Include expr.h and output.h.
	(cgraph_create_empty_node): Initialize 'final' field.
	(final_create_edge): New static function.
	(cgraph_final_record_call): New global function.
	(cgraph_final_record_dynamic_alloc): Likewise.
	(cgraph_final_info): Likewise.
	(dump_cgraph_final_indirect_call_node_vcg): New static function.
	(dump_cgraph_final_edge_vcg): Likewise.
	(dump_cgraph_final_node_vcg): Likewise.
	(external_node_needed_p): Likewise.
	(dump_cgraph_final_vcg): New global function.
	* expr.c (emit_block_move_via_libcall): Set input_location on the call.
	(set_storage_via_libcall): Likewise.
	(block_move_fn): Make global.  Do not include gt-expr.h.
	* expr.h (block_move_fn): Declare.
	* gimplify.c (gimplify_decl_expr): Record dynamically-allocated object
	by calling cgraph_final_record_dynamic_alloc if -fcallgraph-info=da.
	* libfuncs.h (enum libfunc_index): Delete LTI_memcpy and LTI_memset.
	(memcpy_libfunc): Delete.
	(memset_libfunc): Likewise.
	* optabs.c (init_one_libfunc): Do not zap the SYMBOL_REF_DECL.
	(init_optabs): Do not initialize memcpy_libfunc and memset_libfunc.
	* print-tree.c (print_decl_identifier): New function.
	* output.h (enum stack_usage_kind_type): New type.
	(stack_usage_qual): Declare.
	* toplev.c (callgraph_info_file): New global variable.
	(stack_usage_qual): Likewise.
	(output_stack_usage): If -fcallgraph-info=su, set stack_usage_kind
	and stack_usage of associated callgraph node.  If -fstack-usage, use
	print_decl_identifier for pretty-printing.
	(lang_dependent_init): Open file if -fcallgraph-info.
	(finalize): If callgraph_info_file is not null, invoke dump_cgraph_vcg
	and close file.
	* tree.h (print_decl_identifier): Declare.
	(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
	* Makefile.in (expr.o): Remove gt-expr.h.
	(cgraph.o): Add $(EXPR_H) and output.h.
	* config/picochip/picochip.c: Adjust comment.

-- 
Eric Botcazou

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 188651)
+++ doc/invoke.texi	(working copy)
@@ -328,7 +328,7 @@ Objective-C and Objective-C++ Dialects}.
 -feliminate-unused-debug-symbols -femit-class-debug-always @gol
Re: [PR49888, VTA] don't keep VALUEs bound to modified MEMs
On Wed, Jun 20, 2012 at 12:52:12AM -0300, Alexandre Oliva wrote:
On Jun 16, 2012, H.J. Lu hjl.to...@gmail.com wrote:

from  Alexandre Oliva  <aol...@redhat.com>

	PR debug/53671
	PR debug/49888
	* alias.c (memrefs_conflict_p): Improve handling of AND for
	alignment.

from  Alexandre Oliva  <aol...@redhat.com>

	PR debug/53671
	PR debug/49888
	* var-tracking.c (vt_initialize): Record initial offset between
	arg pointer and stack pointer.

from  Alexandre Oliva  <aol...@redhat.com>

	PR debug/53671
	PR debug/49888
	* var-tracking.c (vt_get_canonicalize_base): New.
	(vt_canonicalize_addr, vt_stack_offset_p): New.
	(vt_canon_true_dep): New.
	(drop_overlapping_mem_locs): Use vt_canon_true_dep.
	(clobber_overlaping_mems): Use vt_canonicalize_addr.

from  Alexandre Oliva  <aol...@redhat.com>

	PR debug/53671
	PR debug/49888
	* var-tracking.c (vt_init_cfa_base): Drop redundant recording of
	CFA base.

Ok, thanks.

	Jakub
Re: [PR debug/53682] avoid crash in cselib promote_debug_loc
On Wed, Jun 20, 2012 at 12:39:29AM -0300, Alexandre Oliva wrote:

When promote_debug_loc was first introduced, it would never be called with a NULL loc list. However, because of the strategy of temporarily resetting loc lists before recursion introduced a few months ago in alias.c, the earlier assumption no longer holds. This patch adjusts promote_debug_loc to deal with this case.

The thing I'm worried about is what will happen with -g0 in that case. If the loc list is temporarily reset, it will be restored again; won't that mean that for -g0 we'll then have a loc that is, in the corresponding -g compilation, referenced by DEBUG_INSNs only (and thus non-promoted)?

for gcc/ChangeLog
from  Alexandre Oliva  <aol...@redhat.com>

	PR debug/53682
	* cselib.c (promote_debug_loc): Don't crash on NULL argument.

Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c.orig	2012-06-17 22:52:27.740087279 -0300
+++ gcc/cselib.c	2012-06-18 08:55:32.948832112 -0300
@@ -322,7 +322,7 @@ new_elt_loc_list (cselib_val *val, rtx l
 static inline void
 promote_debug_loc (struct elt_loc_list *l)
 {
-  if (l->setting_insn && DEBUG_INSN_P (l->setting_insn)
+  if (l && l->setting_insn && DEBUG_INSN_P (l->setting_insn)
       && (!cselib_current_insn || !DEBUG_INSN_P (cselib_current_insn)))
     {
       n_debug_values--;

	Jakub
Re: [patch] Implement -fcallgraph-info option
On Wed, Jun 20, 2012 at 12:40 PM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Jun 20, 2012 at 12:30 PM, Eric Botcazou ebotca...@adacore.com wrote: * cgraph.c: Include expr.h and output.h. What for? Never mind, I see why you need this.
[patch] Poison removed target macros ASM_OUTPUT_IDENT and IDENT_ASM_OP
Hello, These macros are now gone and should be poisoned. I'll commit this later today unless someone objects. Ciao! Steven * system.h: Poison ASM_OUTPUT_IDENT and IDENT_ASM_OP. Index: system.h === --- system.h(revision 188825) +++ system.h(working copy) @@ -815,7 +815,7 @@ LABEL_ALIGN_AFTER_BARRIER_MAX_SKIP JUMP_ALIGN_MAX_SKIP \ CAN_DEBUG_WITHOUT_FP UNLIKELY_EXECUTED_TEXT_SECTION_NAME\ HOT_TEXT_SECTION_NAME LEGITIMATE_CONSTANT_P ALWAYS_STRIP_DOTDOT \ - OUTPUT_ADDR_CONST_EXTRA SMALL_REGISTER_CLASSES + OUTPUT_ADDR_CONST_EXTRA SMALL_REGISTER_CLASSES ASM_OUTPUT_IDENT /* Target macros only used for code built for the target, that have moved to libgcc-tm.h or have never been present elsewhere. */ @@ -887,7 +887,8 @@ SETJMP_VIA_SAVE_AREA FORBIDDEN_INC_DEC_CLASSES \ PREFERRED_OUTPUT_RELOAD_CLASS SYSTEM_INCLUDE_DIR \ STANDARD_INCLUDE_DIR STANDARD_INCLUDE_COMPONENT\ - LINK_ELIMINATE_DUPLICATE_LDIRECTORIES MIPS_DEBUGGING_INFO + LINK_ELIMINATE_DUPLICATE_LDIRECTORIES MIPS_DEBUGGING_INFO \ + IDENT_ASM_OP /* Hooks that are no longer used. */ #pragma GCC poison LANG_HOOKS_FUNCTION_MARK LANG_HOOKS_FUNCTION_FREE \
[PATCH, i386]: Some more int iterator macroizations
Hello! 2012-06-20 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (rounding_insn): New int attribute. (rounding_insnxf2): Macroize insn from {floor,ceil,btrunc}xf2 using FRNDINT_ROUNDING int iterator. (lrounding_insnxfmode2): Rename from lroundingxfmode2. 2012-06-20 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (IEEE_MAXMIN): New int iterator. (ieee_maxmin): New int attribute. (*ieee_sieee_maxminmode3): Macroize insn from *ieee_s{max,min}mode3 using IEEE_MAXMIN mode iterator. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: i386.md === --- i386.md (revision 188808) +++ i386.md (working copy) @@ -15108,6 +15108,14 @@ [UNSPEC_FIST_FLOOR UNSPEC_FIST_CEIL]) +;; Base name for define_insn +(define_int_attr rounding_insn + [(UNSPEC_FRNDINT_FLOOR floor) +(UNSPEC_FRNDINT_CEIL ceil) +(UNSPEC_FRNDINT_TRUNC btrunc) +(UNSPEC_FIST_FLOOR floor) +(UNSPEC_FIST_CEIL ceil)]) + (define_int_attr rounding [(UNSPEC_FRNDINT_FLOOR floor) (UNSPEC_FRNDINT_CEIL ceil) @@ -15161,17 +15169,14 @@ (set_attr i387_cw rounding) (set_attr mode XF)]) -(define_expand floorxf2 - [(use (match_operand:XF 0 register_operand)) - (use (match_operand:XF 1 register_operand))] +(define_expand rounding_insnxf2 + [(parallel [(set (match_operand:XF 0 register_operand) + (unspec:XF [(match_operand:XF 1 register_operand)] + FRNDINT_ROUNDING)) + (clobber (reg:CC FLAGS_REG))])] TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations -{ - if (optimize_insn_for_size_p ()) -FAIL; - emit_insn (gen_frndintxf2_floor (operands[0], operands[1])); - DONE; -}) +flag_unsafe_math_optimizations +!optimize_insn_for_size_p ()) (define_expand floormode2 [(use (match_operand:MODEF 0 register_operand)) @@ -15213,18 +15218,6 @@ DONE; }) -(define_expand ceilxf2 - [(use (match_operand:XF 0 register_operand)) - (use (match_operand:XF 1 register_operand))] - TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations -{ - if (optimize_insn_for_size_p ()) -FAIL; - emit_insn 
(gen_frndintxf2_ceil (operands[0], operands[1])); - DONE; -}) - (define_expand ceilmode2 [(use (match_operand:MODEF 0 register_operand)) (use (match_operand:MODEF 1 register_operand))] @@ -15265,18 +15258,6 @@ DONE; }) -(define_expand btruncxf2 - [(use (match_operand:XF 0 register_operand)) - (use (match_operand:XF 1 register_operand))] - TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations -{ - if (optimize_insn_for_size_p ()) -FAIL; - emit_insn (gen_frndintxf2_trunc (operands[0], operands[1])); - DONE; -}) - (define_expand btruncmode2 [(use (match_operand:MODEF 0 register_operand)) (use (match_operand:MODEF 1 register_operand))] @@ -15357,14 +15338,12 @@ (set_attr mode XF)]) (define_expand nearbyintxf2 - [(use (match_operand:XF 0 register_operand)) - (use (match_operand:XF 1 register_operand))] + [(parallel [(set (match_operand:XF 0 register_operand) + (unspec:XF [(match_operand:XF 1 register_operand)] + UNSPEC_FRNDINT_MASK_PM)) + (clobber (reg:CC FLAGS_REG))])] TARGET_USE_FANCY_MATH_387 -flag_unsafe_math_optimizations -{ - emit_insn (gen_frndintxf2_mask_pm (operands[0], operands[1])); - DONE; -}) +flag_unsafe_math_optimizations) (define_expand nearbyintmode2 [(use (match_operand:MODEF 0 register_operand)) @@ -15531,7 +15510,7 @@ (use (match_dup 2)) (use (match_dup 3))])]) -(define_expand lroundingxfmode2 +(define_expand lrounding_insnxfmode2 [(parallel [(set (match_operand:SWI248x 0 nonimmediate_operand) (unspec:SWI248x [(match_operand:XF 1 register_operand)] FIST_ROUNDING)) @@ -16616,31 +16595,24 @@ ;; Their operands are not commutative, and thus they may be used in the ;; presence of -0.0 and NaN. 
-(define_insn *ieee_sminmode3 - [(set (match_operand:MODEF 0 register_operand =x,x) - (unspec:MODEF - [(match_operand:MODEF 1 register_operand 0,x) - (match_operand:MODEF 2 nonimmediate_operand xm,xm)] -UNSPEC_IEEE_MIN))] - SSE_FLOAT_MODE_P (MODEmode) TARGET_SSE_MATH - @ - minssemodesuffix\t{%2, %0|%0, %2} - vminssemodesuffix\t{%2, %1, %0|%0, %1, %2} - [(set_attr isa noavx,avx) - (set_attr prefix orig,vex) - (set_attr type sseadd) - (set_attr mode MODE)]) +(define_int_iterator IEEE_MAXMIN + [UNSPEC_IEEE_MAX +UNSPEC_IEEE_MIN]) -(define_insn *ieee_smaxmode3 +(define_int_attr ieee_maxmin + [(UNSPEC_IEEE_MAX max) +(UNSPEC_IEEE_MIN min)]) + +(define_insn *ieee_sieee_maxminmode3 [(set (match_operand:MODEF 0 register_operand
Re: [Patch ping] Strength reduction
On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Pro forma ping. :) ;) I notice (with all of these functions)

+unsigned
+negate_cost (enum machine_mode mode, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES];
+  rtx seq;
+  unsigned cost;
+
+  if (costs[mode])
+    return costs[mode];
+
+  start_sequence ();
+  force_operand (gen_rtx_fmt_e (NEG, mode,
+				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
+		 NULL_RTX);
+  seq = get_insns ();
+  end_sequence ();
+
+  cost = seq_cost (seq, speed);
+  if (!cost)
+    cost = 1;

that the costs[] array is independent of the speed argument. Thus whatever comes first determines the cost. Odd, and probably not good. A fix would be appreciated (even for the current code ...) - simply make the array costs[NUM_MACHINE_MODES][2]. As for the renaming - can you name the functions consistently? Thus the above would be negate_reg_cost? And maybe rename the other FIXME function, too?

Index: gcc/tree-ssa-strength-reduction.c
===
--- gcc/tree-ssa-strength-reduction.c	(revision 0)
+++ gcc/tree-ssa-strength-reduction.c	(revision 0)
@@ -0,0 +1,1611 @@
+/* Straight-line strength reduction.
+   Copyright (C) 2012 Free Software Foundation, Inc.

I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;) So, please name it gimple-ssa-strength-reduction.c.

+  /* Access to the statement for subsequent modification.  Cached to
+     save compile time.  */
+  gimple_stmt_iterator cand_gsi;

this is an iterator for cand_stmt? Then caching it is no longer necessary as the iterator is the stmt itself after recent infrastructure changes.

+/* Hash table embodying a mapping from statements to candidates.  */
+static htab_t stmt_cand_map;
...
+static hashval_t
+stmt_cand_hash (const void *p)
+{
+  return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt);
+}

use a pointer-map instead.

+/* Callback to produce a hash value for a candidate chain header.
*/
+
+static hashval_t
+base_cand_hash (const void *p)
+{
+  tree ssa_name = ((const_cand_chain_t) p)->base_name;
+
+  if (TREE_CODE (ssa_name) != SSA_NAME)
+    return (hashval_t) 0;
+
+  return (hashval_t) SSA_NAME_VERSION (ssa_name);
+}

does it ever happen that ssa_name is not an SSA_NAME? I'm not sure the memory savings over simply using a fixed-size (num_ssa_names) array indexed by SSA_NAME_VERSION pointing to the chain is worth using a hashtable for this?

+  node = (cand_chain_t) pool_alloc (chain_pool);
+  node->base_name = c->base_name;

If you never free pool entries it's more efficient to use an obstack. alloc-pool only pays off if you get freed item re-use.

+  switch (gimple_assign_rhs_code (gs))
+    {
+    case MULT_EXPR:
+      rhs2 = gimple_assign_rhs2 (gs);
+
+      if (TREE_CODE (rhs2) == INTEGER_CST)
+	return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
+
+      if (TREE_CODE (rhs1) == INTEGER_CST)
+	return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed);

In theory all commutative statements should have constant operands only at rhs2 ... Also you do not verify that the constant fits in a host-wide-int - but maybe you do not care? Thus, I'd do

  if (host_integerp (rhs2, 0))
    return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);

or make multiply_by[_const?]_cost take a double-int instead. Likewise below for add.

+    case MODIFY_EXPR:
+      /* Be suspicious of assigning costs to copies that may well go away.  */
+      return 0;

MODIFY_EXPR is never a gimple_assign_rhs_code. Simple copies have a code of SSA_NAME for example. But as you assert if you get to an unhandled code I wonder why you needed the above ...

+static slsr_cand_t
+base_cand_from_table (tree base_in)
+{
+  slsr_cand mapping_key;
+
+  gimple def = SSA_NAME_DEF_STMT (base_in);
+  if (!def)
+    return (slsr_cand_t) NULL;
+
+  mapping_key.cand_stmt = def;
+  return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key);

isn't that reachable via the base-name -> chain mapping for base_in?
+  if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0))
+    return;

SSA_NAMEs can be compared by pointer equality, thus the above is equivalent to

  if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2)

or even just

  if (rhs1 == rhs2)

applies elsewhere as well.

+/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
+   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
+   unchanged.  */
+/* ??? - Should this be moved to double-int.c?  */

I think so.

+static bool
+double_int_multiple_of (double_int product, double_int factor,
+			bool unsigned_p, double_int *multiple)
+{
+  double_int remainder;
+  double_int quotient = double_int_divmod (product, factor, unsigned_p,
+					   TRUNC_DIV_EXPR, &remainder);
+  if
Re: [patch] Implement -fcallgraph-info option
On Wed, Jun 20, 2012 at 12:30 PM, Eric Botcazou ebotca...@adacore.com wrote: Hi, this is a repost of http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html earlier in the development cycle, so with hopefully more time for discussion. The command line option -fcallgraph-info is added and makes the compiler generate another output file (xxx.ci) for each compilation unit, which is a valid VCG file (you can launch your favorite VCG viewer on it unmodified) and contains the final callgraph of the unit. final is a bit of a misnomer as this is actually the callgraph at RTL expansion time, but since most high-level optimizations are done at the Tree level and RTL doesn't usually fiddle with calls, it's final in almost all cases. Moreover, the nodes can be decorated with additional info: -fcallgraph-info=su adds stack usage info and -fcallgraph-info=da dynamic allocation info. This is useful for embedded applications with stringent requirements in terms of memory usage for example. You can provide the .ci files (or an aggregated version) for pre-compiled libraries. This is again strictly orthogonal to code and debug info generation. There are a few non-obvious changes to libfuncs.h, builtins.c, expr.c and optabs.c to deal with quirks of the RTL expander, but this mostly removes dead code. This version takes into account Joseph's comments for the first submission. As for Richard's comments, the implementation is low-level by design because we want to be able to trust it and the IPA callgraph isn't suitable for this, as the RTL expander can introduce function calls that need to be accounted for. Hmm. I wonder why we cannot do the following (some of this might be already the case): 1) preserve cgraph nodes of functions we expanded 2) at expand time, remove call edges from the cgraph node of the currently expanding function (they are not kept up-to-date anyway) 3) add cgraph edges to the regular cgraph when expand_call expands a call (what about indirect calls? 
you seem to ignore those ...) 4) for dynamic allocs simply record an edge to a callgraph node for alloca Thus, go away with the notion of a final cgraph. And make it possible to specify the dump should be emitted at other useful points of the compilation (LTO WPA phase comes to my mind, similar the callgraph as constructed by the frontend). Thus, make the whole thing a little less special case and more useful in general? (yeah, I still detest VCG and prefer DOT ... maybe we can think of a simple abstraction layer that would allow switching the output format ...) Thanks, Richard. Tested on x86_64-suse-linux. 2012-06-20 Eric Botcazou ebotca...@adacore.com Callgraph info support * common.opt (-fcallgraph-info[=]): New option. * doc/invoke.texi (Debugging options): Document it. * opts.c (common_handle_option): Handle it. * flag-types.h (enum callgraph_info_type): New type. * builtins.c (set_builtin_user_assembler_name): Do not initialize memcpy_libfunc and memset_libfunc. * calls.c (expand_call): If -fcallgraph-info, record the call. (emit_library_call_value_1): Likewise. * cgraph.h (struct cgraph_final_info): New structure. (struct cgraph_dynamic_alloc): Likewise. (cgraph_final_edge): Likewise. (cgraph_node): Add 'final' field. (dump_cgraph_final_vcg): Declare. (cgraph_final_record_call): Likewise. (cgraph_final_record_dynamic_alloc): Likewise. (cgraph_final_info): Likewise. * cgraph.c: Include expr.h and output.h. (cgraph_create_empty_node): Initialize 'final' field. (final_create_edge): New static function. (cgraph_final_record_call): New global function. (cgraph_final_record_dynamic_alloc): Likewise. (cgraph_final_info): Likewise. (dump_cgraph_final_indirect_call_node_vcg): New static function. (dump_cgraph_final_edge_vcg): Likewise. (dump_cgraph_final_node_vcg): Likewise. (external_node_needed_p): Likewise. (dump_cgraph_final_vcg): New global function. * expr.c (emit_block_move_via_libcall): Set input_location on the call. (set_storage_via_libcall): Likewise. 
(block_move_fn): Make global. Do not include gt-expr.h. * expr.h (block_move_fn): Declare. * gimplify.c (gimplify_decl_expr): Record dynamically-allocated object by calling cgraph_final_record_dynamic_alloc if -fcallgraph-info=da. * libfuncs.h (enum libfunc_index): Delete LTI_memcpy and LTI_memset. (memcpy_libfunc): Delete. (memset_libfunc): Likewise. * optabs.c (init_one_libfunc): Do not zap the SYMBOL_REF_DECL. (init_optabs): Do not initialize memcpy_libfunc and memset_libfunc. * print-tree.c (print_decl_identifier): New function. * output.h (enum stack_usage_kind_type): New type. (stack_usage_qual):
Re: [Patch ARM] PR51980 / PR49081 Improve Neon permute intrinsics.
On Wed, 20 Jun 2012 11:56:39 +0100 Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote: Hi, This patch helps use the __builtin_shuffle intrinsics to implement the Neon permute intrinsics following on from Julian's and my patch last week. It needed support for __builtin_shuffle in the C++ frontend, which is now in and has been for the past few days, so I'm a little happier with this going in now. The changes to Julian's patch are to drop the mask generation and now this directly generates the vector constants instead. A small stylistic point I noticed: in,

     let rec print_lines = function
         [] -> ()
-      | [line] -> Format.printf "%s" line
-      | line::lines -> Format.printf "%s@," line; print_lines lines in
+      | [line] -> if line <> "" then Format.printf "%s" line else ()
+      | line::lines -> (if line <> "" then Format.printf "%s@," line);
+                       print_lines lines in
     print_lines body;
     close_braceblock ffmt;
     end_function ffmt

You can use constant strings in pattern matches, so this can be just:

  let rec print_lines = function
      [] | ""::_ -> ()
    | [line] -> Format.printf...
    | line::lines -> Format.printf...

You didn't need the brackets () around the if, btw. It's semantically quite like C: only a single statement after the then is conditional. If you want multiple statements conditionalised, the idiomatic way to do it is use begin...end (equivalent to { } in C) after the then keyword. HTH, Julian
Re: [PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)
Richard Guenther writes: On Tue, Jun 19, 2012 at 11:36 PM, Mikael Pettersson mi...@it.uu.se wrote: Richard Guenther writes: On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand uweig...@de.ibm.com wrote: Richard Guenther wrote: On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand uweig...@de.ibm.com wrote: However, there is a second case where we need to check every pass: if we're not actually vectorizing any loop, but are performing basic-block SLP. In this case, it would appear that we need the same check as described in the comment above, i.e. to verify that the stride is a multiple of the vector size. The patch below adds this check, and this indeed fixes the invalid access I was seeing in the test case (in the final assembler, we now get a vld1.16 instead of vldr). Tested on arm-linux-gnueabi with no regressions. OK for mainline? Ok. Thanks for the quick review; I've checked this in to mainline now. I just noticed that the test case also crashes on 4.7, but not on 4.6. Would a backport to 4.7 also be OK, once testing passes? Yes. Please leave it on mainline a few days to catch fallout from autotesters. This patch caused FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp basic block vectorized using SLP 1 on sparc64-linux. Comparing the pre and post patch dumps for that file shows 22: vect_compute_data_ref_alignment: 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B] 22: vect_compute_data_ref_alignment: -22: force alignment of arr[i_87] -22: misalign = 0 bytes of ref arr[i_87] +22: SLP: step doesn't divide the vector-size. +22: Unknown alignment for access: arr (lots of stuff that's simply gone) -22: BASIC BLOCK VECTORIZED - -22: basic block vectorized using SLP +22: not vectorized: unsupported unaligned store.arr[i_87] +22: not vectorized: unsupported alignment in basic block. In this testcase the alignment of arr[i] should be irrelevant - it is not part of the stmts that are going to be vectorized. 
But of course this may be simply an ordering issue in how we analyze data-references / statements in basic-block vectorization (thus we possibly did not yet declare the arr[i] = i statement as not taking part in the vectorization). The line

-22: force alignment of arr[i_87]

is odd, too - as said we do not need to touch arr when vectorizing the basic-block. Ulrich, can you look into this or do you want me to take a look here? Mikael - please open a bugreport for this. I opened PR53729 for this, with an update saying that powerpc64-linux also has this regression. /Mikael
[PATCH] Fix PR30318 - handle more cases of + in VRP
This concludes the VRP and anti-ranges series for now (well, it was the motivation for this patch which was pending for quite some time). This re-implements PLUS_EXPR support on integer ranges to cover all cases, even those that generate an anti-range as result. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2012-06-20 Richard Guenther rguent...@suse.de PR tree-optimization/30318 * tree-vrp.c (range_int_cst_p): Do not reject overflowed constants here. (range_int_cst_singleton_p): But explicitely here. (zero_nonzero_bits_from_vr): And here. (extract_range_from_binary_expr_1): Re-implement PLUS_EXPR to cover all cases we can perform arbitrary precision arithmetic with double-ints. (intersect_ranges): Handle adjacent anti-ranges. * gcc.dg/tree-ssa/vrp69.c: New testcase. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c.orig 2012-06-19 16:58:53.0 +0200 --- gcc/tree-vrp.c 2012-06-19 17:16:31.569517561 +0200 *** range_int_cst_p (value_range_t *vr) *** 844,852 { return (vr-type == VR_RANGE TREE_CODE (vr-max) == INTEGER_CST ! TREE_CODE (vr-min) == INTEGER_CST ! !TREE_OVERFLOW (vr-max) ! !TREE_OVERFLOW (vr-min)); } /* Return true if VR is a INTEGER_CST singleton. */ --- 844,850 { return (vr-type == VR_RANGE TREE_CODE (vr-max) == INTEGER_CST ! TREE_CODE (vr-min) == INTEGER_CST); } /* Return true if VR is a INTEGER_CST singleton. */ *** static inline bool *** 855,860 --- 853,860 range_int_cst_singleton_p (value_range_t *vr) { return (range_int_cst_p (vr) + !TREE_OVERFLOW (vr-min) + !TREE_OVERFLOW (vr-max) tree_int_cst_equal (vr-min, vr-max)); } *** zero_nonzero_bits_from_vr (value_range_t *** 1970,1976 { *may_be_nonzero = double_int_minus_one; *must_be_nonzero = double_int_zero; ! if (!range_int_cst_p (vr)) return false; if (range_int_cst_singleton_p (vr)) --- 1970,1978 { *may_be_nonzero = double_int_minus_one; *must_be_nonzero = double_int_zero; ! if (!range_int_cst_p (vr) ! || TREE_OVERFLOW (vr-min) ! 
|| TREE_OVERFLOW (vr-max)) return false; if (range_int_cst_singleton_p (vr)) *** extract_range_from_binary_expr_1 (value_ *** 2376,2414 range and see what we end up with. */ if (code == PLUS_EXPR) { ! /* If we have a PLUS_EXPR with two VR_ANTI_RANGEs, drop to !VR_VARYING. It would take more effort to compute a precise !range for such a case. For example, if we have op0 == 1 and !op1 == -1 with their ranges both being ~[0,0], we would have !op0 + op1 == 0, so we cannot claim that the sum is in ~[0,0]. !Note that we are guaranteed to have vr0.type == vr1.type at !this point. */ ! if (vr0.type == VR_ANTI_RANGE) { set_value_range_to_varying (vr); return; } - - /* For operations that make the resulting range directly -proportional to the original ranges, apply the operation to -the same end of each range. */ - min = vrp_int_const_binop (code, vr0.min, vr1.min); - max = vrp_int_const_binop (code, vr0.max, vr1.max); - - /* If both additions overflowed the range kind is still correct. -This happens regularly with subtracting something in unsigned -arithmetic. - ??? See PR30318 for all the cases we do not handle. */ - if ((TREE_OVERFLOW (min) !is_overflow_infinity (min)) - (TREE_OVERFLOW (max) !is_overflow_infinity (max))) - { - min = build_int_cst_wide (TREE_TYPE (min), - TREE_INT_CST_LOW (min), - TREE_INT_CST_HIGH (min)); - max = build_int_cst_wide (TREE_TYPE (max), - TREE_INT_CST_LOW (max), - TREE_INT_CST_HIGH (max)); - } } else if (code == MIN_EXPR || code == MAX_EXPR) --- 2378,2538 range and see what we end up with. */ if (code == PLUS_EXPR) { ! /* If we have a PLUS_EXPR with two VR_RANGE integer constant ! ranges compute the precise range for such case if possible. */ ! if (range_int_cst_p (vr0) ! range_int_cst_p (vr1) ! /* We attempt to do infinite precision signed integer arithmetic, !thus we need two more bits than the possibly unsigned inputs. */ ! TYPE_PRECISION (expr_type) HOST_BITS_PER_DOUBLE_INT - 1) ! { ! double_int min0 = tree_to_double_int (vr0.min); ! 
double_int max0 = tree_to_double_int (vr0.max); ! double_int min1 = tree_to_double_int (vr1.min); !
Re: [PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)
Richard Guenther wrote: In this testcase the alignment of arr[i] should be irrelevant - it is not part of the stmts that are going to be vectorized. Agreed. But of course this may be simply an odering issue in how we analyze data-references / statements in basic-block vectorization (thus we possibly did not yet declare the arr[i] = i statement as not taking part in the vectorization). The line -22: force alignment of arr[i_87] is odd, too - as said we do not need to touch arr when vectorizing the basic-block. Ulrich, can you look into this or do you want me to take a look here? I'll have a look. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [patch][RFC] Move the C front end to gcc/c/
On Wed, Jun 20, 2012 at 4:44 AM, Steven Bosscher stevenb@gmail.com wrote: I'm posting this as an RFC: Does this look like the right approach? Have I overlooked other things than just documentation updates? I hope this would not cause too much trouble for branches like the cxx-conversion branch? It should be fine. SVN knows about moves and, as it happens, I don't think we have changes in any of the c-* files. Thanks for doing this. Diego.
[patch testsuite]: Fix two testcases for x86_64-*-mingw* target
Hi, ChangeLog 2012-06-20 Kai Tietz * gcc.target/i386/pr23943.c (size_t): Use compatible type-definition for LLP64 targets. * gcc.target/i386/pr38988.c: Likewise. Regression-tested for x86_64-w64-mingw32, and x86_64-unknown-linux-gnu. Ok for apply? Regards, Kai Index: gcc/testsuite/gcc.target/i386/pr23943.c === --- gcc/testsuite/gcc.target/i386/pr23943.c (revision 188753) +++ gcc/testsuite/gcc.target/i386/pr23943.c (working copy) @@ -4,7 +4,7 @@ /* { dg-require-effective-target fpic } */ /* { dg-options -O2 -fPIC } */ -typedef long unsigned int size_t; +__extension__ typedef __SIZE_TYPE__ size_t; extern size_t strlen (__const char *__s) __attribute__ ((__nothrow__)) __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1))); Index: gcc/testsuite/gcc.target/i386/pr38988.c === --- gcc/testsuite/gcc.target/i386/pr38988.c (revision 188753) +++ gcc/testsuite/gcc.target/i386/pr38988.c (working copy) @@ -3,7 +3,7 @@ /* { dg-require-effective-target fpic } */ /* { dg-options -O2 -fpic -mcmodel=large } */ -typedef long unsigned int size_t; +__extension__ typedef __SIZE_TYPE__ size_t; typedef void (*func_ptr) (void); static func_ptr __DTOR_LIST__[1] = { (func_ptr) (-1) };
Re: [Patch] Don't test for pr53425 on mingw
On Mon, Jun 18, 2012 at 10:09 AM, Kai Tietz ktiet...@googlemail.com wrote: 2012/6/18 JonY jo...@users.sourceforge.net: Hi, I am told that this ABI test does not apply to mingw targets. OK to apply? Hi JonY, The test doesn't apply to x64 Windows targets, as SSE is part of its ABI. As the test already checks for !ia32, we could simply check for x86_64/i?86-*-mingw* targets instead. We don't need to check for ilp32 here again.

The test needs to be skipped if the target is:

  x86_64-*-mingw*
  i*86-*-mingw* with -m64 multilib option

and it needs to run if the target is:

  i*86-*-mingw*
  x86_64-*-mingw* with -m32 multilib option

Does anyone know how to make that happen?
Re: [Patch] Don't test for pr53425 on mingw
As both tests are checking already for !ia32, there is no additiona check beside the targets necessary. Cheers, Kai
[Patch ARM] Improve vdup_n intrinsics.
Hi, This improves the vdup_n intrinsics where one tries to form constant vectors. This uses targetm.fold_builtin to fold these vector initializations to actual vector constants. The vdup_n cases are fine with either endianness, as the vector constant is just duplicated. In addition I've made the *neon_vmov patterns take a const_zero vector to allow the compiler to generate vmov.i32 reg, #0 for vdup_n_f32 (0.0f) type operations. It has the nice side effect that zero initialization of FP vectors for Neon doesn't need a load from the literal pool. I will point out that the vcreate and a lot of the other intrinsics could be improved in a similar vein (caveat big-endian). This helps in a number of cases where we were initially generating a mov of a constant into an integer register and then dupping it over, and indeed helps the tree optimizers recognize the value for the constant vector that it is. This also needed some work with making a testcase for vabd more robust, which just showed that the folding works! In the process I've also cleaned up a few prototypes where that was obvious. Tested cross on arm-linux-gnueabi with no regressions. Ok (to commit as 2 separate patches, one for the prototype cleanup and the other for the vdup case)? regards, Ramana 2012-06-20 Ramana Radhakrishnan ramana.radhakrish...@linaro.org * config/arm/arm.c (arm_vector_alignment_reachable): Fix declaration. (arm_builtin_support_vector_misalignment): Likewise. (arm_preferred_rename_class): Likewise. (arm_vectorize_vec_perm_const_ok): Likewise. (arm_fold_builtin): New. (TARGET_FOLD_BUILTIN): New. * config/arm/neon.md (*neon_movmode:VDX, VQX): Add Dz alternative. testsuite/ * gcc.target/arm/neon-combine-sub-abs-into-abd.c: Make test more robust. vmovzero.patch Description: Binary data
Re: [Target maintainers]: Please update libjava/sysdep/*/locks.h with new atomic builtins
Alan and I both re-implemented the locks and settled on the following patch. This uses the __atomic intrinsics, not the __sync instrinsics, to avoid generating expensive instructions for a memory model that is stricter than necessary. If these intrinsics correctly represent the semantics of the libjava barriers, it probably can be used as a generic implementation for targets that support the __atomic intrinsics. - David 2012-06-20 David Edelsohn dje@gmail.com Alan Modra amo...@gmail.com * sysdep/powerpc/locks.h (compare_and_swap): Use GCC atomic intrinsics. (release_set): Same. (compare_and_swap_release): Same. (read_barrier): Same. (write_barrier): Same. Index: locks.h === --- locks.h (revision 188778) +++ locks.h (working copy) @@ -11,87 +11,63 @@ #ifndef __SYSDEP_LOCKS_H__ #define __SYSDEP_LOCKS_H__ -#ifdef __LP64__ -#define _LARX ldarx -#define _STCX stdcx. -#else -#define _LARX lwarx -#ifdef __PPC405__ -#define _STCX sync; stwcx. -#else -#define _STCX stwcx. -#endif -#endif - typedef size_t obj_addr_t; /* Integer type big enough for object */ /* address. */ +// Atomically replace *addr by new_val if it was initially equal to old. +// Return true if the comparison succeeded. +// Assumed to have acquire semantics, i.e. later memory operations +// cannot execute before the compare_and_swap finishes. + inline static bool -compare_and_swap (volatile obj_addr_t *addr, obj_addr_t old, +compare_and_swap (volatile obj_addr_t *addr, + obj_addr_t old, obj_addr_t new_val) { - obj_addr_t ret; - - __asm__ __volatile__ ( - _LARX %0,0,%1 \n -xor. %0,%3,%0\n -bne $+12\n - _STCX %2,0,%1\n -bne- $-16\n - : =r (ret) - : r (addr), r (new_val), r (old) - : cr0, memory); - - /* This version of __compare_and_swap is to be used when acquiring - a lock, so we don't need to worry about whether other memory - operations have completed, but we do need to be sure that any loads - after this point really occur after we have acquired the lock. 
*/ - __asm__ __volatile__ (isync : : : memory); - return ret == 0; + return __atomic_compare_exchange_n (addr, old, new_val, 0, + __ATOMIC_ACQUIRE, __ATOMIC_RELAXED); } + +// Set *addr to new_val with release semantics, i.e. making sure +// that prior loads and stores complete before this +// assignment. + inline static void release_set (volatile obj_addr_t *addr, obj_addr_t new_val) { - __asm__ __volatile__ (sync : : : memory); - *addr = new_val; + __atomic_store_n(addr, val, __ATOMIC_RELEASE); } + +// Compare_and_swap with release semantics instead of acquire semantics. + inline static bool compare_and_swap_release (volatile obj_addr_t *addr, obj_addr_t old, obj_addr_t new_val) { - obj_addr_t ret; - - __asm__ __volatile__ (sync : : : memory); - - __asm__ __volatile__ ( - _LARX %0,0,%1 \n -xor. %0,%3,%0\n -bne $+12\n - _STCX %2,0,%1\n -bne- $-16\n - : =r (ret) - : r (addr), r (new_val), r (old) - : cr0, memory); - - return ret == 0; + return __atomic_compare_exchange_n (addr, old, new_val, 0, + __ATOMIC_RELEASE, __ATOMIC_RELAXED); } + // Ensure that subsequent instructions do not execute on stale // data that was loaded from memory before the barrier. + inline static void read_barrier () { - __asm__ __volatile__ (isync : : : memory); + __atomic_thread_fence (__ATOMIC_ACQUIRE); } + // Ensure that prior stores to memory are completed with respect to other // processors. + inline static void write_barrier () { - __asm__ __volatile__ (sync : : : memory); + __atomic_thread_fence (__ATOMIC_RELEASE); } #endif
Re: [Target maintainers]: Please update libjava/sysdep/*/locks.h with new atomic builtins
On Wed, Jun 20, 2012 at 09:10:44AM -0400, David Edelsohn wrote:
>  inline static void
>  release_set (volatile obj_addr_t *addr, obj_addr_t new_val)
>  {
> -  __asm__ __volatile__ ("sync" : : : "memory");
> -  *addr = new_val;
> +  __atomic_store_n(addr, val, __ATOMIC_RELEASE);

A typo seems to have crept in here.  s/val/new_val/

-- 
Alan Modra
Australia Development Lab, IBM
Re: [PATCH] ARM/NEON: vld1q_dup_s64 builtin
On 06.06.2012 11:00, Ramana Radhakrishnan wrote: Ok with those changes. Ramana . Hi Ramana, How about this version? Christophe. commit f57ce4b63ca1c30ee88e8c1a431d6e90ffbecb82 Author: Christophe Lyon christophe.l...@st.com Date: Wed Jun 20 15:30:50 2012 +0200 2012-06-20 Christophe Lyon christophe.l...@st.com * gcc/config/arm/neon.md (UNSPEC_VLD1_DUP): Remove. (neon_vld1_dup): Restrict to VQ operands. (neon_vld1_dupv2di): New, fixes vld1q_dup_s64. * gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c: New test. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 4568dea..b3b925c 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -45,7 +45,6 @@ UNSPEC_VHADD UNSPEC_VHSUB UNSPEC_VLD1 - UNSPEC_VLD1_DUP UNSPEC_VLD1_LANE UNSPEC_VLD2 UNSPEC_VLD2_DUP @@ -4381,8 +4380,7 @@ (define_insn neon_vld1_dupmode [(set (match_operand:VDX 0 s_register_operand =w) -(unspec:VDX [(match_operand:V_elem 1 neon_struct_operand Um)] -UNSPEC_VLD1_DUP))] +(vec_duplicate:VDX (match_operand:V_elem 1 neon_struct_operand Um)))] TARGET_NEON { if (GET_MODE_NUNITS (MODEmode) 1) @@ -4397,20 +4395,30 @@ ) (define_insn neon_vld1_dupmode - [(set (match_operand:VQX 0 s_register_operand =w) -(unspec:VQX [(match_operand:V_elem 1 neon_struct_operand Um)] -UNSPEC_VLD1_DUP))] + [(set (match_operand:VQ 0 s_register_operand =w) +(vec_duplicate:VQ (match_operand:V_elem 1 neon_struct_operand Um)))] TARGET_NEON { - if (GET_MODE_NUNITS (MODEmode) 2) -return vld1.V_sz_elem\t{%e0[], %f0[]}, %A1; - else -return vld1.V_sz_elem\t%h0, %A1; + return vld1.V_sz_elem\t{%e0[], %f0[]}, %A1; } - [(set (attr neon_type) - (if_then_else (gt (const_string V_mode_nunits) (const_string 1)) -(const_string neon_vld2_2_regs_vld1_vld2_all_lanes) -(const_string neon_vld1_1_2_regs)))] + [(set_attr neon_type neon_vld2_2_regs_vld1_vld2_all_lanes)] +) + +(define_insn_and_split neon_vld1_dupv2di + [(set (match_operand:V2DI 0 s_register_operand =w) +(vec_duplicate:V2DI (match_operand:DI 1 neon_struct_operand Um)))] + 
TARGET_NEON + # +reload_completed + [(const_int 0)] + { +rtx tmprtx = gen_lowpart (DImode, operands[0]); +emit_insn (gen_neon_vld1_dupdi (tmprtx, operands[1])); +emit_move_insn (gen_highpart (DImode, operands[0]), tmprtx ); +DONE; +} + [(set_attr length 8) + (set_attr neon_type neon_vld2_2_regs_vld1_vld2_all_lanes)] ) (define_expand vec_store_lanesmodemode diff --git a/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c b/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c new file mode 100644 index 000..b5793bf --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/neon-vld1_dupQ.c @@ -0,0 +1,24 @@ +/* Test the `vld1q_s64' ARM Neon intrinsic. */ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-options -O0 } */ +/* { dg-add-options arm_neon } */ + +#include arm_neon.h +#include stdlib.h + +int main (void) +{ + int64x1_t input[2] = {(int64x1_t)0x0123456776543210LL, + (int64x1_t)0x89abcdeffedcba90LL}; + int64x1_t output[2] = {0, 0}; + int64x2_t var = vld1q_dup_s64(input); + + vst1q_s64(output, var); + if (output[0] != (int64x1_t)0x0123456776543210LL) +abort(); + if (output[1] != (int64x1_t)0x0123456776543210LL) +abort(); + return 0; +}
Re: [PATCH] Fix PR53708
Hi, On 20 Jun 2012, at 09:23, Richard Guenther wrote: On Tue, 19 Jun 2012, Iain Sandoe wrote: On 19 Jun 2012, at 22:41, Mike Stump wrote: On Jun 19, 2012, at 12:22 PM, Iain Sandoe i...@codesourcery.com wrote: On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote: On Tue, 19 Jun 2012, Richard Guenther wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitly aligned or that are used in an unknown way. I thought attribute((__aligned__)) only set a minimum alignment for variables? Most uses I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)) That makes sense, I had a quick look at the ObjC code, and it appears that the explicit ALIGNs were never committed to trunk. Thus, the question becomes: what should ObjC (or any other) FE do to ensure that specific ABI (upper) alignment constraints are met? Hum, upper is easy... I thought the issue was that extra alignment would kill it? I know that extra alignment does kill some of the objc metadata. Clearly, ambiguous phrasing on my part. I mean when we want to say no more than this much. I think the only way would be to lay out things inside a structure. Otherwise, if extra alignment can break things, cannot re-ordering of symbols break, too? Or can you elaborate on how extra alignment breaks stuff here? 
The NeXT runtime meta-data are anonymous and stored in named sections; the names of the sections are known to the runtime, which looks up the data directly. In the case that's failing we have class references (which are, effectively, pointers to strings). The (m32, v0 or v1) runtime knows that the number of names is the section size / sizeof(long). Bumping up the alignment of these items makes it look like there are more name pointers present. For later versions of (Darwin) ld, this is caught by the linker; for earlier versions the exe will fail. The order of the names or other items is not significant (or has already been handled by the runtime). [It might be possible to rejig the class ref. list as an array, or a structure containing only longs, but I'll need to look at that later]. However, it seems reasonable that ABIs could require both upper and lower limits on alignment; are we saying that the only way to handle the 'upper' is by declaring things 'packed' and putting them into a crafted structure? thanks Iain
Re: [Target maintainers]: Please update libjava/sysdep/*/locks.h with new atomic builtins
On Wed, Jun 20, 2012 at 9:35 AM, Alan Modra amo...@gmail.com wrote: On Wed, Jun 20, 2012 at 09:10:44AM -0400, David Edelsohn wrote: inline static void release_set (volatile obj_addr_t *addr, obj_addr_t new_val) { - __asm__ __volatile__ (sync : : : memory); - *addr = new_val; + __atomic_store_n(addr, val, __ATOMIC_RELEASE); A typo seems to have crept in here. s/val/new_val/ Fixed. Thanks, David
Re: [cxx-conversion] Remove option to build without a C++ compiler (issue6296093)
dnovi...@google.com (Diego Novillo) writes: Ian, could you please take a look to double check I have not missed anything? There was more code dealing with it than I was expecting. It all looks plausible to me. Ian
Re: [PATCH] Fix PR53708
On Wed, 20 Jun 2012, Richard Guenther wrote: On Wed, 20 Jun 2012, Iain Sandoe wrote: Hi, On 20 Jun 2012, at 09:23, Richard Guenther wrote: On Tue, 19 Jun 2012, Iain Sandoe wrote: On 19 Jun 2012, at 22:41, Mike Stump wrote: On Jun 19, 2012, at 12:22 PM, Iain Sandoe i...@codesourcery.com wrote: On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote: On Tue, 19 Jun 2012, Richard Guenther wrote: Richard Guenther rguent...@suse.de writes: We are too eager to bump alignment of some decls when vectorizing. The fix is to not bump alignment of decls the user explicitely aligned or that are used in an unknown way. I thought attribute((__aligned__)) only set a minimum alignment for variables? Most usees I've seen have been trying to get better performance from higher alignment, so it might not go down well if the attribute stopped the vectoriser from increasing the alignment still further. That's what the documentation says indeed. I'm not sure which part of the patch fixes the ObjC failures where the alignment is part of the ABI (and I suppose ObjC then mis-uses the aligned attribute?). A quick test shows that if (DECL_PRESERVE_P (decl)) alone is enough to fix the objc failures, while they are still there if one uses only if (DECL_USER_ALIGN (decl)) That makes sense, I had a quick look at the ObjC code, and it appears that the explicit ALIGNs were never committed to trunk. Thus, the question becomes; what should ObjC (or any other) FE do to ensure that specific ABI (upper) alignment constraints are met? Hum, upper is easy... I thought the issue was that extra alignment would kill it? I know that extra alignment does kill some of the objc metadata. clearly, ambiguous phrasing on my part. I mean when we want to say no more than this much. I think the only way would be to lay out things inside a structure. Otherwise if extra alignment can break things cannot re-ordering of symbols break, too? Or can you elaborate on how extra alignment breaks stuff here? 
The NeXT runtime meta-data are anonymous and stored in named sections; the names of the sections are known to the runtime, which looks up the data directly. In the case that's failing we have class references (which are, effectively, pointers to strings). The (m32, v0 or v1) runtime knows that the number of names is the section size / sizeof(long). Bumping up the alignment of these items makes it look like there are more name pointers present. For later versions of (Darwin) ld, this is caught by the linker; for earlier versions the exe will fail. The order of the names or other items is not significant (or has already been handled by the runtime). [It might be possible to rejig the class ref. list as an array, or a structure containing only longs, but I'll need to look at that later]. However, it seems reasonable that ABIs could require both upper and lower limits on alignment; are we saying that the only way to handle the 'upper' is by declaring things 'packed' and putting them into a crafted structure? Yes, I think so. It would also be reasonable to have __attribute__((aligned(8),packed)) specify that's a hard alignment requirement, not a lower bound. Not sure if that works to that effect though. At least int c __attribute__((aligned(8),packed)); tells you that the packed attribute is ignored. If you pack things inside an array the array itself might still get larger alignment (though of course not its elements). So if you rely on section concatenation not producing gaps even the packed structure may not be a good solution - its start address can still get a bigger alignment. So I think there does not exist a way to tell GCC that the start address of an object ought not to be aligned in another way than the ABI specifies (though the ABIs I know only specify minimum alignments, not maximum ones ...). From reading your description of the issue again I think that an array of names is what you want. Richard.
Re: [cxx-conversion] Remove option to build without a C++ compiler (issue6296093)
On Wed, Jun 20, 2012 at 1:08 AM, Diego Novillo dnovi...@google.com wrote: diff --git a/configure.ac b/configure.ac index 071b5e2..2a2a0c6 100644 --- a/configure.ac +++ b/configure.ac @@ -1667,7 +1653,7 @@ ACX_ELF_TARGET_IFELSE([# ELF platforms build the lto-plugin always. ]) -# By default, C is the only stage 1 language. +# By default, C and C++ are the only stage 1 languages. stage1_languages=,c, So shouldn't you add c++ here? # Target libraries that we bootstrap. @@ -1705,15 +1691,14 @@ if test -d ${srcdir}/gcc; then ;; esac - # If bootstrapping, then using --enable-build-with-cxx or - # --enable-build-poststage1-with-cxx requires enabling C++. - case ,$enable_languages,:,$ENABLE_BUILD_WITH_CXX,$ENABLE_BUILD_POSTSTAGE1_WITH_CXX,:$enable_bootstrap in - *,c++,*:*:*) ;; - *:*,yes,*:yes) + # If bootstrapping, C++ must be enabled. Hmn, perhaps I misunderstand, but shouldn't C++ also be enabled if not bootstrapping? + case ,$enable_languages,:$enable_bootstrap in + *,c++,*:*) ;; + *:yes) if test -f ${srcdir}/gcc/cp/config-lang.in; then enable_languages=${enable_languages},c++ else - AC_MSG_ERROR([bootstrapping with --enable-build-with-cxx or --enable-build-poststage1-with-cxx requires c++ sources]) + AC_MSG_ERROR([bootstrapping requires c++ sources]) fi ;; esac @@ -1808,10 +1793,7 @@ if test -d ${srcdir}/gcc; then fi if test $language = c++; then - if test $ENABLE_BUILD_WITH_CXX = yes \ - || test $ENABLE_BUILD_POSTSTAGE1_WITH_CXX = yes; then - boot_language=yes - fi + boot_language=yes fi This shouldn't be necessary if you add c++ to stage1_languages case ,${enable_languages}, in @@ -3198,26 +3180,6 @@ case $build in esac ;; esac You can also remove the lang_requires_boot_languages machinery again. It is only used by Go to enable c++ for bootstrapping the Go front end, but with c++ enabled by default, there is no need for this hack for Go anymore. Ciao! Steven
Re: [cxx-conversion] Remove option to build without a C++ compiler (issue6296093)
On Wed, Jun 20, 2012 at 4:10 PM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Jun 20, 2012 at 1:08 AM, Diego Novillo dnovi...@google.com wrote: diff --git a/configure.ac b/configure.ac index 071b5e2..2a2a0c6 100644 --- a/configure.ac +++ b/configure.ac @@ -1667,7 +1653,7 @@ ACX_ELF_TARGET_IFELSE([# ELF platforms build the lto-plugin always. ]) -# By default, C is the only stage 1 language. +# By default, C and C++ are the only stage 1 languages. stage1_languages=,c, So shouldn't you add c++ here? If you are not bootstrapping you only need frontends to build target libraries - unless that includes a C++ library by default no, you only need a C++ host compiler then. I think stage1_languages should be empty and Makefile.def should be properly set-up to add frontends required for required target libraries. Richard. # Target libraries that we bootstrap. @@ -1705,15 +1691,14 @@ if test -d ${srcdir}/gcc; then ;; esac - # If bootstrapping, then using --enable-build-with-cxx or - # --enable-build-poststage1-with-cxx requires enabling C++. - case ,$enable_languages,:,$ENABLE_BUILD_WITH_CXX,$ENABLE_BUILD_POSTSTAGE1_WITH_CXX,:$enable_bootstrap in - *,c++,*:*:*) ;; - *:*,yes,*:yes) + # If bootstrapping, C++ must be enabled. Hmn, perhaps I misunderstand, but shouldn't C++ also be enabled if not bootstrapping? 
+ case ,$enable_languages,:$enable_bootstrap in + *,c++,*:*) ;; + *:yes) if test -f ${srcdir}/gcc/cp/config-lang.in; then enable_languages=${enable_languages},c++ else - AC_MSG_ERROR([bootstrapping with --enable-build-with-cxx or --enable-build-poststage1-with-cxx requires c++ sources]) + AC_MSG_ERROR([bootstrapping requires c++ sources]) fi ;; esac @@ -1808,10 +1793,7 @@ if test -d ${srcdir}/gcc; then fi if test $language = c++; then - if test $ENABLE_BUILD_WITH_CXX = yes \ - || test $ENABLE_BUILD_POSTSTAGE1_WITH_CXX = yes; then - boot_language=yes - fi + boot_language=yes fi This shouldn't be necessary if you add c++ to stage1_languages case ,${enable_languages}, in @@ -3198,26 +3180,6 @@ case $build in esac ;; esac You can also remove the lang_requires_boot_languages machinery again. It is only used by Go to enable c++ for bootstrapping the Go front end, but with c++ enabled by default, there is no need for this hack for Go anymore. Ciao! Steven
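[For what it's worth, the enabling logic under discussion can be modelled in isolation like this - a toy stand-in for the configure fragment, not the actual configure.ac text: C++ is forced on only when bootstrapping, since a non-bootstrap build compiles GCC with the host C++ compiler anyway.]

```shell
#!/bin/sh
# Toy model: append c++ to the language list only when bootstrapping
# and c++ is not already enabled.
maybe_enable_cxx () {
  enable_languages=$1
  enable_bootstrap=$2
  case ,$enable_languages,:$enable_bootstrap in
    *,c++,*:*) ;;                                      # already there
    *:yes) enable_languages="$enable_languages,c++" ;; # bootstrap needs it
  esac
  echo "$enable_languages"
}

test "$(maybe_enable_cxx c,fortran yes)" = c,fortran,c++ || exit 1
test "$(maybe_enable_cxx c,c++ yes)" = c,c++ || exit 1
test "$(maybe_enable_cxx c no)" = c || exit 1
echo ok
```

The comma-padding on both sides of the list is what lets the glob `*,c++,*` match c++ anywhere in the list without also matching, say, a hypothetical language whose name merely contains "c++".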
Re: [arm] Remove obsolete FPA support (1/n): obsolete target removal
On 06/13/2012 02:51 PM, Richard Earnshaw wrote: This patch is the first of a series to remove support for the now obsolete FPA and Maverick co-processors. This patch removes those targets and configuration options that were marked as deprecated in GCC-4.7 and removes the config fragments that depended on them. * config.gcc (unsupported): Move obsoleted FPA-based configurations here from ... (obsolete): ... here. [...] I am not sure, but I think the libgcc/config.host needs some cleanup too? -- Sebastian Huber, embedded brains GmbH Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany Phone : +49 89 18 90 80 79-6 Fax : +49 89 18 90 80 79-9 E-Mail : sebastian.hu...@embedded-brains.de PGP : Public key available on request. This message is not a commercial notification within the meaning of the EHUG.
Re: RFA: Fix PR53688
Hi, On Tue, 19 Jun 2012, Richard Guenther wrote: The MEM_REF is acceptable to the tree oracle and it can extract points-to information from it. Thus for simplicity unconditionally building the above is the best. But it doesn't work, as refs_may_alias_p_1 only accepts certain operands in MEM_REFs. So, I opted to check the operand for is_gimple_mem_ref_addr after it's built, and if not acceptable at least build a mem-ref for the base object, if possible. In order not to lose info we had before the patch I had to improve get_base_address a little to not give up on MEM_REFs like MEM[&p.c]. Regstrapped on x86_64-linux, no regressions. Okay for trunk? Ciao, Michael. PR middle-end/53688 * gimple.c (get_base_address): Strip components also from inner arguments to MEM_REFs. * builtins.c (get_memory_rtx): Always build an all-aliasing MEM_REF with correct size. testsuite/ * gcc.c-torture/execute/pr53688.c: New test. Index: gimple.c === --- gimple.c(revision 188772) +++ gimple.c(working copy) @@ -2911,7 +2911,11 @@ get_base_address (tree t) if ((TREE_CODE (t) == MEM_REF || TREE_CODE (t) == TARGET_MEM_REF) && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR) -t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); +{ + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + while (handled_component_p (t)) + t = TREE_OPERAND (t, 0); +} if (TREE_CODE (t) == SSA_NAME || DECL_P (t) Index: builtins.c === --- builtins.c (revision 188772) +++ builtins.c (working copy) @@ -1252,7 +1252,6 @@ get_memory_rtx (tree exp, tree len) { tree orig_exp = exp; rtx addr, mem; - HOST_WIDE_INT off; /* When EXP is not resolved SAVE_EXPR, MEM_ATTRS can be still derived from its expression, for expr->a.b only variable.a.b is recorded. 
*/ @@ -1269,114 +1268,30 @@ get_memory_rtx (tree exp, tree len) POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (exp, 0 exp = TREE_OPERAND (exp, 0); - off = 0; - if (TREE_CODE (exp) == POINTER_PLUS_EXPR - TREE_CODE (TREE_OPERAND (exp, 0)) == ADDR_EXPR - host_integerp (TREE_OPERAND (exp, 1), 0) - (off = tree_low_cst (TREE_OPERAND (exp, 1), 0)) 0) -exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0); - else if (TREE_CODE (exp) == ADDR_EXPR) -exp = TREE_OPERAND (exp, 0); - else if (POINTER_TYPE_P (TREE_TYPE (exp))) -exp = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (exp)), exp); - else -exp = NULL; - - /* Honor attributes derived from exp, except for the alias set - (as builtin stringops may alias with anything) and the size - (as stringops may access multiple array elements). */ - if (exp) + /* Build a MEM_REF representing the whole accessed area as a byte blob, + (as builtin stringops may alias with anything). */ + exp = fold_build2 (MEM_REF, +build_array_type (char_type_node, + build_range_type (sizetype, +size_one_node, len)), +exp, build_int_cst (ptr_type_node, 0)); + + /* If the MEM_REF has no acceptable address, try to get the base object, + and build an all-aliasing unknown-sized access to that one. */ + if (!is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)) + (exp = get_base_address (exp))) { - set_mem_attributes (mem, exp, 0); - - if (off) - mem = adjust_automodify_address_nv (mem, BLKmode, NULL, off); - - /* Allow the string and memory builtins to overflow from one -field into another, see http://gcc.gnu.org/PR23561. -Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole -memory accessed by the string or memory builtin will fit -within the field. 
*/ - if (MEM_EXPR (mem) TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF) - { - tree mem_expr = MEM_EXPR (mem); - HOST_WIDE_INT offset = -1, length = -1; - tree inner = exp; - - while (TREE_CODE (inner) == ARRAY_REF -|| CONVERT_EXPR_P (inner) -|| TREE_CODE (inner) == VIEW_CONVERT_EXPR -|| TREE_CODE (inner) == SAVE_EXPR) - inner = TREE_OPERAND (inner, 0); - - gcc_assert (TREE_CODE (inner) == COMPONENT_REF); - - if (MEM_OFFSET_KNOWN_P (mem)) - offset = MEM_OFFSET (mem); - - if (offset = 0 len host_integerp (len, 0)) - length = tree_low_cst (len, 0); - - while (TREE_CODE (inner) == COMPONENT_REF) - { - tree field = TREE_OPERAND (inner, 1); - gcc_assert (TREE_CODE (mem_expr) == COMPONENT_REF); - gcc_assert (field == TREE_OPERAND (mem_expr, 1)); - - /* Bitfields are generally not byte-addressable. */ - gcc_assert (!DECL_BIT_FIELD
Re: RFA: Fix PR53688
On Wed, Jun 20, 2012 at 4:57 PM, Michael Matz m...@suse.de wrote: Hi, On Tue, 19 Jun 2012, Richard Guenther wrote: The MEM_REF is acceptable to the tree oracle and it can extract points-to information from it. Thus for simplicity unconditionally building the above is the best. But it doesn't work, as refs_may_alias_p_1 only accepts certain operands in MEM_REFs. So, I opted to check the operand for is_gimple_mem_ref_addr after it's built, and if not acceptable at least build a mem-ref for the base object, if possible. In order not to loose info we had before the patch I had to improve get_base_address a little to not give up on MEM_REFs like MEM[p.c]. Regstrapped on x86_64-linux, no regressions. Okay for trunk? Hrm ... Ciao, Michael. PR middle-end/53688 * gimple.c (get_base_address): Strip components also from inner arguments to MEM_REFs. * builtins.c (get_memory_rtx): Always build an all-aliasing MEM_REF with correct size. testsuite/ * gcc.c-torture/execute/pr53688.c: New test. Index: gimple.c === --- gimple.c (revision 188772) +++ gimple.c (working copy) @@ -2911,7 +2911,11 @@ get_base_address (tree t) if ((TREE_CODE (t) == MEM_REF || TREE_CODE (t) == TARGET_MEM_REF) TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR) - t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + { + t = TREE_OPERAND (TREE_OPERAND (t, 0), 0); + while (handled_component_p (t)) + t = TREE_OPERAND (t, 0); + } if (TREE_CODE (t) == SSA_NAME || DECL_P (t) Index: builtins.c === --- builtins.c (revision 188772) +++ builtins.c (working copy) @@ -1252,7 +1252,6 @@ get_memory_rtx (tree exp, tree len) { tree orig_exp = exp; rtx addr, mem; - HOST_WIDE_INT off; /* When EXP is not resolved SAVE_EXPR, MEM_ATTRS can be still derived from its expression, for expr-a.b only variable.a.b is recorded. 
*/ @@ -1269,114 +1268,30 @@ get_memory_rtx (tree exp, tree len) POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (exp, 0 exp = TREE_OPERAND (exp, 0); - off = 0; - if (TREE_CODE (exp) == POINTER_PLUS_EXPR - TREE_CODE (TREE_OPERAND (exp, 0)) == ADDR_EXPR - host_integerp (TREE_OPERAND (exp, 1), 0) - (off = tree_low_cst (TREE_OPERAND (exp, 1), 0)) 0) - exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0); - else if (TREE_CODE (exp) == ADDR_EXPR) - exp = TREE_OPERAND (exp, 0); - else if (POINTER_TYPE_P (TREE_TYPE (exp))) - exp = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (exp)), exp); - else - exp = NULL; - - /* Honor attributes derived from exp, except for the alias set - (as builtin stringops may alias with anything) and the size - (as stringops may access multiple array elements). */ - if (exp) + /* Build a MEM_REF representing the whole accessed area as a byte blob, + (as builtin stringops may alias with anything). */ + exp = fold_build2 (MEM_REF, + build_array_type (char_type_node, + build_range_type (sizetype, + size_one_node, len)), + exp, build_int_cst (ptr_type_node, 0)); + + /* If the MEM_REF has no acceptable address, try to get the base object, + and build an all-aliasing unknown-sized access to that one. */ + if (!is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)) + (exp = get_base_address (exp))) The get_base_address massaging should be not necessary if you'd use the original exp here, not the built MEM_REF. Otherwise looks ok. Thanks, Richard. { - set_mem_attributes (mem, exp, 0); - - if (off) - mem = adjust_automodify_address_nv (mem, BLKmode, NULL, off); - - /* Allow the string and memory builtins to overflow from one - field into another, see http://gcc.gnu.org/PR23561. - Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole - memory accessed by the string or memory builtin will fit - within the field. 
*/ - if (MEM_EXPR (mem) TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF) - { - tree mem_expr = MEM_EXPR (mem); - HOST_WIDE_INT offset = -1, length = -1; - tree inner = exp; - - while (TREE_CODE (inner) == ARRAY_REF - || CONVERT_EXPR_P (inner) - || TREE_CODE (inner) == VIEW_CONVERT_EXPR - || TREE_CODE (inner) == SAVE_EXPR) - inner = TREE_OPERAND (inner, 0); - - gcc_assert (TREE_CODE (inner) == COMPONENT_REF); - - if (MEM_OFFSET_KNOWN_P (mem)) - offset = MEM_OFFSET (mem); - - if (offset = 0 len host_integerp (len, 0)) - length = tree_low_cst (len, 0); - - while (TREE_CODE (inner) ==
Re: [cxx-conversion] Remove option to build without a C++ compiler (issue6296093)
On 12-06-20 10:10 , Steven Bosscher wrote: -# By default, C is the only stage 1 language. +# By default, C and C++ are the only stage 1 languages. stage1_languages=,c, So shouldn't you add c++ here? That was a bad change on the comment. We only need C for stage1. Thanks for spotting it. - # If bootstrapping, then using --enable-build-with-cxx or - # --enable-build-poststage1-with-cxx requires enabling C++. - case ,$enable_languages,:,$ENABLE_BUILD_WITH_CXX,$ENABLE_BUILD_POSTSTAGE1_WITH_CXX,:$enable_bootstrap in -*,c++,*:*:*) ;; -*:*,yes,*:yes) + # If bootstrapping, C++ must be enabled. Hmn, perhaps I misunderstand, but shouldn't C++ also be enabled if not bootstrapping? It's only needed if we are building C++ code. Everything else uses the host compiler. You can also remove the lang_requires_boot_languages machinery again. It is only used by Go to enable c++ for bootstrapping the Go front end, but with c++ enabled by default, there is no need for this hack for Go anymore. Good point. I'll send a separate patch for that. Diego.
Re: RFA: Fix PR53688
Hi, On Wed, 20 Jun 2012, Richard Guenther wrote: + exp = fold_build2 (MEM_REF, + build_array_type (char_type_node, + build_range_type (sizetype, + size_one_node, len)), + exp, build_int_cst (ptr_type_node, 0)); + + /* If the MEM_REF has no acceptable address, try to get the base object, + and build an all-aliasing unknown-sized access to that one. */ + if (!is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)) + (exp = get_base_address (exp))) The get_base_address massaging should be not necessary if you'd use the original exp here, not the built MEM_REF. Hmm? The original expression is an address, I have to build a MEM_REF out of that, and the is_gimple_mem_ref_addr() just checked that that very address (after going through fold) is not acceptable as MEM_REF operand. So how could I avoid the massaging of the address to make it an acceptable operand? Ciao, Michael.
Re: RFA: Fix PR53688
On Wed, Jun 20, 2012 at 5:09 PM, Michael Matz m...@suse.de wrote: Hi, On Wed, 20 Jun 2012, Richard Guenther wrote: + exp = fold_build2 (MEM_REF, + build_array_type (char_type_node, + build_range_type (sizetype, + size_one_node, len)), + exp, build_int_cst (ptr_type_node, 0)); + + /* If the MEM_REF has no acceptable address, try to get the base object, + and build an all-aliasing unknown-sized access to that one. */ + if (!is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)) + && (exp = get_base_address (exp))) The get_base_address massaging should not be necessary if you'd use the original exp here, not the built MEM_REF. Hmm? The original expression is an address, I have to build a MEM_REF out of that, and the is_gimple_mem_ref_addr() just checked that that very address (after going through fold) is not acceptable as MEM_REF operand. So how could I avoid the massaging of the address to make it an acceptable operand? Not change get_base_address and use if (!is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)) && (exp = get_base_address (TREE_OPERAND (orig_exp, 0 Richard.
Re: [Patch ping] Strength reduction
On Wed, 2012-06-20 at 13:11 +0200, Richard Guenther wrote: On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt wschm...@linux.vnet.ibm.com wrote: Pro forma ping. :) ;) I notice (with all of these functions) +unsigned +negate_cost (enum machine_mode mode, bool speed) +{ + static unsigned costs[NUM_MACHINE_MODES]; + rtx seq; + unsigned cost; + + if (costs[mode]) +return costs[mode]; + + start_sequence (); + force_operand (gen_rtx_fmt_e (NEG, mode, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)), + NULL_RTX); + seq = get_insns (); + end_sequence (); + + cost = seq_cost (seq, speed); + if (!cost) +cost = 1; that the cost[] array is independent on the speed argument. Thus whatever comes first determines the cost. Odd, and probably not good. A fix would be appreciated (even for the current code ...) - simply make the array costs[NUM_MACHINE_MODES][2]. As for the renaming - can you name the functions consistently? Thus the above would be negate_reg_cost? And maybe rename the other FIXME function, too? I agree with all this. I'll prepare all the cost model changes as a separate preliminaries patch. Index: gcc/tree-ssa-strength-reduction.c === --- gcc/tree-ssa-strength-reduction.c (revision 0) +++ gcc/tree-ssa-strength-reduction.c (revision 0) @@ -0,0 +1,1611 @@ +/* Straight-line strength reduction. + Copyright (C) 2012 Free Software Foundation, Inc. I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;) So, please name it gimple-ssa-strength-reduction.c. Will do. Vive la revolution? ;) + /* Access to the statement for subsequent modification. Cached to + save compile time. */ + gimple_stmt_iterator cand_gsi; this is a iterator for cand_stmt? Then caching it is no longer necessary as the iterator is the stmt itself after recent infrastructure changes. Oh yeah, I remember seeing that go by. Nice. Will change. +/* Hash table embodying a mapping from statements to candidates. */ +static htab_t stmt_cand_map; ... 
+static hashval_t +stmt_cand_hash (const void *p) +{ + return htab_hash_pointer (((const_slsr_cand_t) p)-cand_stmt); +} use a pointer-map instead. +/* Callback to produce a hash value for a candidate chain header. */ + +static hashval_t +base_cand_hash (const void *p) +{ + tree ssa_name = ((const_cand_chain_t) p)-base_name; + + if (TREE_CODE (ssa_name) != SSA_NAME) +return (hashval_t) 0; + + return (hashval_t) SSA_NAME_VERSION (ssa_name); +} does it ever happen that ssa_name is not an SSA_NAME? Not in this patch, but when I introduce CAND_REF in a later patch it could happen since the base field of a CAND_REF is a MEM_REF. It's a safety valve in case of misuse. I'll think about this some more. I'm not sure the memory savings over simply using a fixed-size (num_ssa_names) array indexed by SSA_NAME_VERSION pointing to the chain is worth using a hashtable for this? That's reasonable. I'll do that. + node = (cand_chain_t) pool_alloc (chain_pool); + node-base_name = c-base_name; If you never free pool entries it's more efficient to use an obstack. alloc-pool only pays off if you get freed item re-use. OK. I'll change both cand_pool and chain_pool to obstacks. + switch (gimple_assign_rhs_code (gs)) +{ +case MULT_EXPR: + rhs2 = gimple_assign_rhs2 (gs); + + if (TREE_CODE (rhs2) == INTEGER_CST) + return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed); + + if (TREE_CODE (rhs1) == INTEGER_CST) + return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed); In theory all commutative statements should have constant operands only at rhs2 ... I'm glad I'm not the only one who thought that was the theory. ;) I wasn't sure, and I've seen violations of this come up in practice. Should I assert when that happens instead, and track down the offending optimizations? Also you do not verify that the constant fits in a host-wide-int - but maybe you do not care? 
Thus, I'd do if (host_integerp (rhs2, 0)) return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed); or make multiply_by[_const?]_cost take a double-int instead. Likewise below for add. Ok. Name change looks good also, I'll include that in the cost model changes. +case MODIFY_EXPR: + /* Be suspicious of assigning costs to copies that may well go away. */ + return 0; MODIFY_EXPR is never a gimple_assign_rhs_code. Simple copies have a code of SSA_NAME for example. But as you assert if you get to an unhandled code I wonder why you needed the above ... I'll remove this, and document that we are deliberately not touching copies (which was my original intent). +static slsr_cand_t +base_cand_from_table (tree base_in) +{ + slsr_cand mapping_key;
Re: [PATCH] C++11, grammar fix for late-specified return types and virt-specifiers
On 06/20/2012 12:57 AM, Ville Voutilainen wrote: If a single pipe is indeed to be used, perhaps we want to correct that piece of documentation, lest fools follow its advice. :) Done, thanks. Jason
Re: [PATCH] ARM: exclude fixed_regs for stack-alignment save/restore
On Mon, Jun 18, 2012 at 9:34 AM, Roland McGrath mcgra...@google.com wrote: OK then. If you like the original patch, would you like to commit it for me? ping?
Re: [arm] Remove obsolete FPA support (1/n): obsolete target removal
On 20/06/12 15:41, Sebastian Huber wrote: On 06/13/2012 02:51 PM, Richard Earnshaw wrote: This patch is the first of a series to remove support for the now obsolete FPA and Maverick co-processors. This patch removes those targets and configuration options that were marked as deprecated in GCC-4.7 and removes the config fragments that depended on them. * config.gcc (unsupported): Move obsoleted FPA-based configurations here from ... (obsolete): ... here. [...] I am not sure, but I think the libgcc/config.host needs some cleanup too? Undoubtedly. But I'm not finished yet... R.
[PATCH, i386]: Macroize remaining rounding expanders
Hello!

2012-06-20  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.md (<rounding_insn><mode>2): Macroize expander
	from {floor,ceil,btrunc}<mode>2 using FIST_ROUNDING int iterator.
	(l<rounding_insn><MODEF:mode><SWI48:mode>2): Macroize expander
	from l{floor,ceil}<MODEF:mode><SWI48:mode>2 using FIST_ROUNDING
	int iterator.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: i386.md
===================================================================
--- i386.md	(revision 188837)
+++ i386.md	(working copy)
@@ -15178,9 +15178,11 @@
    && flag_unsafe_math_optimizations
    && !optimize_insn_for_size_p ())
 
-(define_expand "floor<mode>2"
-  [(use (match_operand:MODEF 0 "register_operand"))
-   (use (match_operand:MODEF 1 "register_operand"))]
+(define_expand "<rounding_insn><mode>2"
+  [(parallel [(set (match_operand:MODEF 0 "register_operand")
+		   (unspec:MODEF [(match_operand:MODEF 1 "register_operand")]
+				 FRNDINT_ROUNDING))
+	      (clobber (reg:CC FLAGS_REG))])]
   "(TARGET_USE_FANCY_MATH_387
     && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
	 || TARGET_MIX_SSE_I387)
@@ -15193,53 +15195,31 @@
 {
   if (TARGET_ROUND)
     emit_insn (gen_sse4_1_round<mode>2
-	       (operands[0], operands[1], GEN_INT (ROUND_FLOOR)));
+	       (operands[0], operands[1], GEN_INT (ROUND_<ROUNDING>)));
   else if (optimize_insn_for_size_p ())
-    FAIL;
-  else if (TARGET_64BIT || (<MODE>mode != DFmode))
-    ix86_expand_floorceil (operands[0], operands[1], true);
-  else
-    ix86_expand_floorceildf_32 (operands[0], operands[1], true);
-}
-  else
-    {
-      rtx op0, op1;
-
-      if (optimize_insn_for_size_p ())
	FAIL;
-
-      op0 = gen_reg_rtx (XFmode);
-      op1 = gen_reg_rtx (XFmode);
-      emit_insn (gen_extend<mode>xf2 (op1, operands[1]));
-      emit_insn (gen_frndintxf2_floor (op0, op1));
-
-      emit_insn (gen_truncxf<mode>2_i387_noop (operands[0], op0));
-    }
-  DONE;
-})
-
-(define_expand "ceil<mode>2"
-  [(use (match_operand:MODEF 0 "register_operand"))
-   (use (match_operand:MODEF 1 "register_operand"))]
-  "(TARGET_USE_FANCY_MATH_387
-    && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
-	 || TARGET_MIX_SSE_I387)
-    && flag_unsafe_math_optimizations)
-   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-       && !flag_trapping_math)"
-{
-  if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-      && !flag_trapping_math)
-    {
-      if (TARGET_ROUND)
-	emit_insn (gen_sse4_1_round<mode>2
-		   (operands[0], operands[1], GEN_INT (ROUND_CEIL)));
-      else if (optimize_insn_for_size_p ())
-	FAIL;
       else if (TARGET_64BIT || (<MODE>mode != DFmode))
-	ix86_expand_floorceil (operands[0], operands[1], false);
+	{
+	  if (ROUND_<ROUNDING> == ROUND_FLOOR)
+	    ix86_expand_floorceil (operands[0], operands[1], true);
+	  else if (ROUND_<ROUNDING> == ROUND_CEIL)
+	    ix86_expand_floorceil (operands[0], operands[1], false);
+	  else if (ROUND_<ROUNDING> == ROUND_TRUNC)
+	    ix86_expand_trunc (operands[0], operands[1]);
+	  else
+	    gcc_unreachable ();
+	}
       else
-	ix86_expand_floorceildf_32 (operands[0], operands[1], false);
+	{
+	  if (ROUND_<ROUNDING> == ROUND_FLOOR)
+	    ix86_expand_floorceildf_32 (operands[0], operands[1], true);
+	  else if (ROUND_<ROUNDING> == ROUND_CEIL)
+	    ix86_expand_floorceildf_32 (operands[0], operands[1], false);
+	  else if (ROUND_<ROUNDING> == ROUND_TRUNC)
+	    ix86_expand_truncdf_32 (operands[0], operands[1]);
+	  else
+	    gcc_unreachable ();
+	}
     }
   else
     {
@@ -15251,53 +15231,13 @@
       op0 = gen_reg_rtx (XFmode);
       op1 = gen_reg_rtx (XFmode);
       emit_insn (gen_extend<mode>xf2 (op1, operands[1]));
-      emit_insn (gen_frndintxf2_ceil (op0, op1));
+      emit_insn (gen_frndintxf2_<rounding> (op0, op1));
 
       emit_insn (gen_truncxf<mode>2_i387_noop (operands[0], op0));
     }
   DONE;
 })
 
-(define_expand "btrunc<mode>2"
-  [(use (match_operand:MODEF 0 "register_operand"))
-   (use (match_operand:MODEF 1 "register_operand"))]
-  "(TARGET_USE_FANCY_MATH_387
-    && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
-	 || TARGET_MIX_SSE_I387)
-    && flag_unsafe_math_optimizations)
-   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-       && !flag_trapping_math)"
-{
-  if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
-      && !flag_trapping_math)
-    {
-      if (TARGET_ROUND)
-	emit_insn (gen_sse4_1_round<mode>2
-		   (operands[0], operands[1], GEN_INT (ROUND_TRUNC)));
-      else if (optimize_insn_for_size_p ())
-	FAIL;
-      else if (TARGET_64BIT || (<MODE>mode != DFmode))
-	ix86_expand_trunc (operands[0], operands[1]);
-      else
-	ix86_expand_truncdf_32 (operands[0], operands[1]);
-    }
-  else
-    {
-
Re: [PATCH] add DECL_SOURCE_COLUMN to tree.h (trivial)
On 12-06-20 13:43, Rüdiger Sonderfeld wrote: The patch is extremely trivial and probably doesn't need copyright assignment.  However, I have signed a copyright assignment for Emacs, and maybe that will work too (not sure if this has to be signed for every project).

It does, unfortunately.

gcc/ChangeLog
2012-06-20  Rüdiger Sonderfeld  <ruedi...@c-plusplus.de>

	* tree.h (DECL_SOURCE_COLUMN): New accessor.

OK.  I suppose you do not have write access to the repo, so I will commit it for you.

Diego.
Re: [PATCH] add DECL_SOURCE_COLUMN to tree.h (trivial)
On 12-06-20 13:50, Diego Novillo wrote: OK. I suppose you do not have write access to the repo, so I will commit it for you.

Committed r188841.

Diego.
Re: [Patch ping] Strength reduction
On 06/20/2012 04:11 AM, Richard Guenther wrote:

I notice (with all of these functions)

+unsigned
+negate_cost (enum machine_mode mode, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES];
+  rtx seq;
+  unsigned cost;
+
+  if (costs[mode])
+    return costs[mode];
+
+  start_sequence ();
+  force_operand (gen_rtx_fmt_e (NEG, mode,
+				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
+		 NULL_RTX);

I don't suppose there's any way to share data with what init_expmed computes?  Not, strictly speaking, the cleanest thing to include expmed.h here, but surely a tad better than re-computing identical data (and without the clever rtl garbage avoidance tricks).

r~
Re: [Patch] PR 51938: extend ifcombine
On Wed, 20 Jun 2012, Richard Guenther wrote: On Sun, Jun 10, 2012 at 4:16 PM, Marc Glisse <marc.gli...@inria.fr> wrote:

Hello,

currently, tree-ssa-ifcombine handles pairs of imbricated ifs that share the same then branch, or the same else branch.  There is no particular reason why it couldn't also handle the case where the then branch of one is the else branch of the other, which is what I do here.  Any comments?

The general idea looks good, but I think the patch is too invasive.  As far as I can see, the only callers with a non-zero 'inv' argument come from ifcombine_ifnotorif and ifcombine_ifnotandif (and both with inv == 2).  I would rather see a more localized patch that makes use of invert_tree_comparison to perform the inversion on the call arguments of maybe_fold_and/or_comparisons.  Is there any reason that would not work?

invert_tree_comparison is useless for floating point (the case I am most interested in) unless we specify -fno-trapping-math (writing this patch taught me to add this flag to my default flags, but I can't expect everyone to do the same).  An issue is that gcc mixes the behaviors of qnan and snan (it is not really an issue, it just means that !(comparison) can't be represented as comparison2).

At least

+  if (inv & 1)
+    lcompcode2 = COMPCODE_TRUE - lcompcode2;

looks as if it were not semantically correct -- you cannot simply invert floating-point comparisons (see the restrictions invert_tree_comparison has).

I don't remember all the details, but I specifically thought of that, and the trapping behavior is handled a few lines below.

-- 
Marc Glisse
[PATCH, i386]: Macroize with int iterators remaining insn patterns
Hello!

2012-06-20  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.md (SINCOS): New int iterator.
	(sincos): New int attribute.
	(*<sincos>xf2_i387): Macroize insn from *{sin,cos}xf2_i387 using
	SINCOS int iterator.
	(*<sincos>_extend<mode>xf2_i387): Macroize insn from
	*{sin,cos}_extend<mode>xf2_i387 using SINCOS int iterator.

2012-06-20  Uros Bizjak  <ubiz...@gmail.com>

	* config/i386/i386.md (RDFSGSBASE): New int iterator.
	(WRFSGSBASE): Ditto.
	(fsgs): New int attribute.
	(rd<fsgs>base<mode>): Macroize insn from rd<fsgs>base<mode> using
	RDFSGSBASE int iterator.
	(wr<fsgs>base<mode>): Macroize insn from wr<fsgs>base<mode> using
	WRFSGSBASE int iterator.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: i386.md
===================================================================
--- i386.md	(revision 188840)
+++ i386.md	(working copy)
@@ -13863,47 +13863,34 @@
   DONE;
 })
 
-(define_insn "*sinxf2_i387"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-	(unspec:XF [(match_operand:XF 1 "register_operand" "0")] UNSPEC_SIN))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations"
-  "fsin"
-  [(set_attr "type" "fpspc")
-   (set_attr "mode" "XF")])
+(define_int_iterator SINCOS
+	[UNSPEC_SIN
+	 UNSPEC_COS])
 
-(define_insn "*sin_extend<mode>xf2_i387"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-	(unspec:XF [(float_extend:XF
-		      (match_operand:MODEF 1 "register_operand" "0"))]
-		   UNSPEC_SIN))]
-  "TARGET_USE_FANCY_MATH_387
-   && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
-       || TARGET_MIX_SSE_I387)
-   && flag_unsafe_math_optimizations"
-  "fsin"
-  [(set_attr "type" "fpspc")
-   (set_attr "mode" "XF")])
+(define_int_attr sincos
+	[(UNSPEC_SIN "sin")
+	 (UNSPEC_COS "cos")])
 
-(define_insn "*cosxf2_i387"
+(define_insn "*<sincos>xf2_i387"
   [(set (match_operand:XF 0 "register_operand" "=f")
-	(unspec:XF [(match_operand:XF 1 "register_operand" "0")] UNSPEC_COS))]
+	(unspec:XF [(match_operand:XF 1 "register_operand" "0")]
+		   SINCOS))]
   "TARGET_USE_FANCY_MATH_387
    && flag_unsafe_math_optimizations"
-  "fcos"
+  "f<sincos>"
   [(set_attr "type" "fpspc")
   (set_attr "mode" "XF")])
 
-(define_insn "*cos_extend<mode>xf2_i387"
+(define_insn "*<sincos>_extend<mode>xf2_i387"
   [(set (match_operand:XF 0 "register_operand" "=f")
	(unspec:XF [(float_extend:XF
		      (match_operand:MODEF 1 "register_operand" "0"))]
-		   UNSPEC_COS))]
+		   SINCOS))]
   "TARGET_USE_FANCY_MATH_387
    && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
       || TARGET_MIX_SSE_I387)
    && flag_unsafe_math_optimizations"
-  "fcos"
+  "f<sincos>"
   [(set_attr "type" "fpspc")
   (set_attr "mode" "XF")])
 
@@ -18087,38 +18074,36 @@
   (set (attr "length")
	(symbol_ref "ix86_attr_length_address_default (insn) + 9"))])
 
-(define_insn "rdfsbase<mode>"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
-	(unspec_volatile:SWI48 [(const_int 0)] UNSPECV_RDFSBASE))]
-  "TARGET_64BIT && TARGET_FSGSBASE"
-  "rdfsbase %0"
-  [(set_attr "type" "other")
-   (set_attr "prefix_extra" "2")])
+(define_int_iterator RDFSGSBASE
+	[UNSPECV_RDFSBASE
+	 UNSPECV_RDGSBASE])
 
-(define_insn "rdgsbase<mode>"
+(define_int_iterator WRFSGSBASE
+	[UNSPECV_WRFSBASE
+	 UNSPECV_WRGSBASE])
+
+(define_int_attr fsgs
+	[(UNSPECV_RDFSBASE "fs")
+	 (UNSPECV_RDGSBASE "gs")
+	 (UNSPECV_WRFSBASE "fs")
+	 (UNSPECV_WRGSBASE "gs")])
+
+(define_insn "rd<fsgs>base<mode>"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
-	(unspec_volatile:SWI48 [(const_int 0)] UNSPECV_RDGSBASE))]
+	(unspec_volatile:SWI48 [(const_int 0)] RDFSGSBASE))]
   "TARGET_64BIT && TARGET_FSGSBASE"
-  "rdgsbase %0"
+  "rd<fsgs>base\t%0"
   [(set_attr "type" "other")
   (set_attr "prefix_extra" "2")])
 
-(define_insn "wrfsbase<mode>"
+(define_insn "wr<fsgs>base<mode>"
   [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")]
-		    UNSPECV_WRFSBASE)]
+		    WRFSGSBASE)]
   "TARGET_64BIT && TARGET_FSGSBASE"
-  "wrfsbase %0"
+  "wr<fsgs>base\t%0"
   [(set_attr "type" "other")
   (set_attr "prefix_extra" "2")])
 
-(define_insn "wrgsbase<mode>"
-  [(unspec_volatile [(match_operand:SWI48 0 "register_operand" "r")]
-		    UNSPECV_WRGSBASE)]
-  "TARGET_64BIT && TARGET_FSGSBASE"
-  "wrgsbase %0"
-  [(set_attr "type" "other")
-   (set_attr "prefix_extra" "2")])
-
 (define_insn "rdrand<mode>_1"
   [(set (match_operand:SWI248 0 "register_operand" "=r")
	(unspec_volatile:SWI248 [(const_int 0)] UNSPECV_RDRAND))
[PATCH] PR c/53702: Fix -Wunused-local-typedefs with nested functions
Hi,

A few weeks ago I submitted a fix for a garbage collection issue I ran into involving -Wunused-local-typedefs [1].  The analysis for that patch still stands, but unfortunately the patch is wrong.  The problem is that the allocation reuse can't be removed, otherwise the information about local typedefs for a parent function is lost after a nested function is parsed.  I obviously missed that distinction the first time.  This patch restores the previous behavior and just clears the 'x_cur_stmt_list' field to avoid the GC issue.

The patch was tested by building mips-linux-gnu (to verify that the GC crash that I originally encountered is still fixed) and by bootstrapping and running the full test suite for i686-pc-linux-gnu.  OK?

P.S. If it is OK, then can someone commit for me (I don't have write access)?

[1] http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01936.html

gcc/

2012-06-20  Meador Inge  <mead...@codesourcery.com>

	PR c/53702
	* c-decl.c (c_push_function_context): Restore the behavior to reuse
	the language function allocated for -Wunused-local-typedefs.
	(c_pop_function_context): If necessary, clear the language function
	created in c_push_function_context.  Always clear out the
	x_cur_stmt_list field of the restored language function.

gcc/testsuite/

2012-06-20  Meador Inge  <mead...@codesourcery.com>

	PR c/53702
	* gcc.dg/Wunused-local-typedefs.c: New testcase.

Index: gcc/testsuite/gcc.dg/Wunused-local-typedefs.c
===================================================================
--- gcc/testsuite/gcc.dg/Wunused-local-typedefs.c	(revision 0)
+++ gcc/testsuite/gcc.dg/Wunused-local-typedefs.c	(revision 0)
@@ -0,0 +1,36 @@
+/* Origin PR c/53702
+   { dg-options "-Wunused-local-typedefs" }
+   { dg-do compile }
+*/
+
+/* Only test nested functions for C.  More tests that work for C and C++
+   can be found in c-c++-common.
+*/
+
+void
+test0 ()
+{
+  typedef int foo; /* { dg-warning "locally defined but not used" } */
+  void f ()
+  {
+  }
+}
+
+void
+test1 ()
+{
+  void f ()
+  {
+    typedef int foo; /* { dg-warning "locally defined but not used" } */
+  }
+}
+
+
+void
+test2 ()
+{
+  void f ()
+  {
+  }
+  typedef int foo; /* { dg-warning "locally defined but not used" } */
+}
Index: gcc/c-decl.c
===================================================================
--- gcc/c-decl.c	(revision 188841)
+++ gcc/c-decl.c	(working copy)
@@ -8579,9 +8579,11 @@ check_for_loop_decls (location_t loc, bo
 void
 c_push_function_context (void)
 {
-  struct language_function *p;
-  p = ggc_alloc_language_function ();
-  cfun->language = p;
+  struct language_function *p = cfun->language;
+  /* cfun->language might have been already allocated by the use of
+     -Wunused-local-typedefs.  In that case, just re-use it.  */
+  if (p == NULL)
+    cfun->language = p = ggc_alloc_cleared_language_function ();
 
   p->base.x_stmt_tree = c_stmt_tree;
   c_stmt_tree.x_cur_stmt_list
@@ -8607,7 +8609,12 @@ c_pop_function_context (void)
   pop_function_context ();
   p = cfun->language;
-  cfun->language = NULL;
+
+  /* When -Wunused-local-typedefs is in effect, cfun->language is
+     used to store data throughout the life time of the current cfun,
+     so don't deallocate it.  */
+  if (!warn_unused_local_typedefs)
+    cfun->language = NULL;
 
   if (DECL_STRUCT_FUNCTION (current_function_decl) == 0
       && DECL_SAVED_TREE (current_function_decl) == NULL_TREE)
@@ -8620,6 +8627,7 @@ c_pop_function_context (void)
     }
 
   c_stmt_tree = p->base.x_stmt_tree;
+  p->base.x_stmt_tree.x_cur_stmt_list = NULL;
   c_break_label = p->x_break_label;
   c_cont_label = p->x_cont_label;
   c_switch_stack = p->x_switch_stack;
Re: [Patch ping] Strength reduction
On Wed, 2012-06-20 at 11:52 -0700, Richard Henderson wrote: On 06/20/2012 04:11 AM, Richard Guenther wrote: I notice (with all of these functions)

+unsigned
+negate_cost (enum machine_mode mode, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES];
+  rtx seq;
+  unsigned cost;
+
+  if (costs[mode])
+    return costs[mode];
+
+  start_sequence ();
+  force_operand (gen_rtx_fmt_e (NEG, mode,
+				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
+		 NULL_RTX);

I don't suppose there's any way to share data with what init_expmed computes?  Not, strictly speaking, the cleanest thing to include expmed.h here, but surely a tad better than re-computing identical data (and without the clever rtl garbage avoidance tricks).

Interesting.  I was building on what ivopts already has; not sure of the history there.  It looks like there is some overlap in function, but expmed doesn't have everything ivopts uses today (particularly the hash table of costs for multiplies by various constants).  The stuff I need for type promotion/demotion is also not present (which I'm computing on demand for whatever mode pairs are encountered).  Not sure how great it would be to precompute that for all pairs, and obviously precomputing the costs of multiplying by all constants isn't going to work.  So if the two functionalities were to be combined, it would seem to require some modification to how expmed works.

Thanks,
Bill

r~
[Patch, mips] Fix warning when using --with-synci
This patch addresses the problem of building GCC for MIPS with the '--with-synci' configure option.  If you do that and then compile a program with GCC and specify an architecture that does not support synci (such as with the -mips32 option), GCC will issue a warning that synci is not supported.  This results in many problems, including an inability to build a multilib version of GCC that includes -mips32.

This patch changes the gcc driver to pass -msynci-if-supported to cc1 instead of -msynci, so that cc1 will only turn on synci if it is supported on the architecture being compiled for, and will leave it off (and not generate a warning) if it is not supported on that architecture.  If the user specifically uses -msynci, they still get the warning.

Tested on mips-linux-gnu.  OK for checkin?

Steve Ellcey
sell...@mips.com

2012-06-20  Steve Ellcey  <sell...@mips.com>

	* config.gcc: Set with_synci to synci-if-supported instead of synci.
	* config/mips/mips.c (mips_option_override): Check
	TARGET_SYNCI_IF_SUPPORTED and update target_flags.
	* config/mips/mips.opt (msynci-if-supported): New.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f2b0936..58ee3e9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3281,7 +3281,7 @@ case ${target} in
 		case ${with_synci} in
 		yes)
-			with_synci=synci
+			with_synci=synci-if-supported
 			;;
 		"" | no)
 			# No is the default.
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..f17d39b 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -16172,6 +16172,9 @@ mips_option_override (void)
 	     : !TARGET_BRANCHLIKELY))
     sorry ("%qs requires branch-likely instructions", "-mfix-r10000");
 
+  if (TARGET_SYNCI_IF_SUPPORTED && !TARGET_SYNCI && ISA_HAS_SYNCI)
+    target_flags |= MASK_SYNCI;
+
   if (TARGET_SYNCI && !ISA_HAS_SYNCI)
     {
       warning (0, "the %qs architecture does not support the synci"
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index e3294a7..1dbce65 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -338,6 +338,9 @@ msynci
 Target Report Mask(SYNCI)
 Use synci instruction to invalidate i-cache
 
+msynci-if-supported
+Target Mask(SYNCI_IF_SUPPORTED) RejectNegative Undocumented
+
 mtune=
 Target RejectNegative Joined Var(mips_tune_option) ToLower Enum(mips_arch_opt_value)
 -mtune=PROCESSOR	Optimize the output for PROCESSOR
[Patch, fortran] PR 39654 FTELL intrinsic
Hi, the attached patch makes the FTELL intrinsic function work on offsets larger than 2 GB on 32-bit systems that support large files. As this is an ABI change the old library function is left untouched, to be removed when/if the library ABI is bumped. Regtested on x86_64-unknown-linux-gnu, Ok for trunk? frontend ChangeLog: 2012-06-21 Janne Blomqvist j...@gcc.gnu.org PR fortran/39654 * iresolve.c (gfc_resolve_ftell): Fix result kind and use new library function. library ChangeLog: 2012-06-21 Janne Blomqvist j...@gcc.gnu.org PR fortran/39654 * io/intrinsics.c (ftell2): New function. * gfortran.map (_gfortran_ftell2): Export function. -- Janne Blomqvist ftell.diff Description: Binary data
Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)
+      /* If we're putting types in their own .debug_types sections,
+	 the .debug_pubtypes table will still point to the compile
+	 unit (not the type unit), so we want to use the offset of
+	 the skeleton DIE (if there is one).  */
+      if (pub->die->comdat_type_p && names == pubtype_table)
+	{
+	  comdat_type_node_ref type_node = pub->die->die_id.die_type_node;
+
+	  if (type_node != NULL && type_node->skeleton_die != NULL)
+	    die_offset = type_node->skeleton_die->die_offset;
+	}

I think we had agreed that if there is no skeleton, we should use an offset of 0.

You're right, I forgot to handle that case.  How's this look?

  if (type_node != NULL)
    die_offset = (type_node->skeleton_die != NULL
		  ? type_node->skeleton_die->die_offset
		  : 0);

Is that OK if it passes regression tests?

-cary
New option to turn off stack reuse for temporaries
One of the most common runtime errors we have seen in gcc-4_7 is caused by dangling references to temporaries whose lifetime has ended, e.g.,

   const A &a = foo();

or

   foo (A());  // where the temp's address is saved and used after foo.

Of course this is a user error according to the standard, but triaging bugs like this is pretty time consuming.  This patch introduces an option to disable stack reuse for temporaries, which can be used for debugging purposes.

Is this good for trunk?

thanks,

David

2012-06-20  Xinliang David Li  <davi...@google.com>

	* common.opt: New -ftemp-stack-reuse option.
	* gimplify.c (gimplify_target_expr): Check new flag.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 188362)
+++ doc/invoke.texi	(working copy)
@@ -1003,6 +1003,7 @@ See S/390 and zSeries Options.
 -fstack-limit-register=@var{reg} -fstack-limit-symbol=@var{sym} @gol
 -fno-stack-limit -fsplit-stack @gol
 -fleading-underscore -ftls-model=@var{model} @gol
+-ftemp-stack-reuse @gol
 -ftrapv -fwrapv -fbounds-check @gol
 -fvisibility -fstrict-volatile-bitfields}
 @end table
@@ -19500,6 +19501,10 @@ indices used to access arrays are within
 currently only supported by the Java and Fortran front ends, where
 this option defaults to true and false respectively.
 
+@item -ftemp-stack-reuse
+@opindex ftemp_stack_reuse
+This option enables stack space reuse for temporaries.  The default is on.
+
 @item -ftrapv
 @opindex ftrapv
 This option generates traps for signed overflow on addition, subtraction,
Index: gimplify.c
===================================================================
--- gimplify.c	(revision 188362)
+++ gimplify.c	(working copy)
@@ -5487,7 +5487,8 @@ gimplify_target_expr (tree *expr_p, gimp
       /* Add a clobber for the temporary going out of scope, like
	 gimplify_bind_expr.  */
       if (gimplify_ctxp->in_cleanup_point_expr
-	  && needs_to_live_in_memory (temp))
+	  && needs_to_live_in_memory (temp)
+	  && flag_temp_stack_reuse)
	{
	  tree clobber = build_constructor (TREE_TYPE (temp), NULL);
	  TREE_THIS_VOLATILE (clobber) = true;
Index: common.opt
===================================================================
--- common.opt	(revision 188362)
+++ common.opt	(working copy)
@@ -1322,6 +1322,10 @@ fif-conversion2
 Common Report Var(flag_if_conversion2) Optimization
 Perform conversion of conditional jumps to conditional execution
 
+ftemp-stack-reuse
+Common Report Var(flag_temp_stack_reuse) Init(1)
+Enable stack reuse for compiler generated temps
+
 ftree-loop-if-convert
 Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
 Convert conditional jumps in innermost loops to branchless equivalents
[wwwdocs] Fix typo in gcc-4.7/changes.html
Applied.

Gerald

Index: gcc-4.7/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.119
diff -u -3 -p -r1.119 changes.html
--- gcc-4.7/changes.html	14 Jun 2012 17:57:18 -0000	1.119
+++ gcc-4.7/changes.html	20 Jun 2012 23:32:58 -0000
@@ -834,7 +834,7 @@ void set_portb (uint8_t value)
     functions when the user switches the target machine using the
     <code>#pragma GCC target</code> or
     <code>__attribute__ ((__target__ (<em>target</em>)))</code>
-    code sequences.  In additon, the target macros are updated.
+    code sequences.  In addition, the target macros are updated.
     However, due to the way the <code>-save-temps</code> switch is
     implemented, you won't see the effect of these additional macros
     being defined in preprocessor output.</li>
Re: [PR debug/53682] avoid crash in cselib promote_debug_loc
On Jun 20, 2012, Jakub Jelinek <ja...@redhat.com> wrote: On Wed, Jun 20, 2012 at 12:39:29AM -0300, Alexandre Oliva wrote:

When promote_debug_loc was first introduced, it would never be called with a NULL loc list.  However, because of the strategy of temporarily resetting loc lists before recursion, introduced a few months ago in alias.c, the earlier assumption no longer holds.  This patch adjusts promote_debug_loc to deal with this case.

The thing I'm worried about is what will happen with -g0 in that case.  If the loc list is temporarily reset, it will be restored again; won't that mean that for -g0 we'll then have a loc that is in the corresponding -g compilation referenced by DEBUG_INSNs only (and thus non-promoted)?

I don't see how.  If we get to a NULL loc list, it means it's not the first time we visit that node (it was visited upstack), so if it needed promotion, it would have already been promoted then.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [Patch ARM/ configury] Add fall-back check for gnu_unique_object
On 10 April 2012 10:11, Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote: The patch with correct configure output is ok. Thanks - this is what I committed. Is this something that can be considered for backporting to release branches ? This patch technically doesn't fix a regression but brings in line behaviour of the normal bootstrap for %gnu_unique_object ? Ping - Is this ok to backport to the 4.7 and 4.6 branches ? Ramana
Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)
OK. Jason
Re: [wwwdocs] Make codingconventions.html pass W3 validator.
Lawrence, you ask a number of awfully good questions. :-) First of all you made me realize that we were missing a cross-link from http://gcc.gnu.org/projects/web.html to http://gcc.gnu.org/contribute.html#webchanges which the first patch included below does now. On Tue, 5 Jun 2012, Lawrence Crowl wrote: Where do these prepended pages come from? How do I test the page as it will appear? This is covered in http://gcc.gnu.org/contribute.html#webchanges . I guess maybe I'm asking for the makefile that produces what one would see. I want to validate that. This is now documented via the second patch below. BTW, part of the problem is that the pages are complete enough as they are to be considered complete. I.e. they are not obviously fragments. Would it be better to make them clearly fragments? The idea was for them to be basic HTML, so that people can view them in their browsers and use some clever editors without problems. So far this has generally worked well. Is there something we can tweak to make it better for you? Doesn't the prepending prevent incremental migration to new standards? This is true, though we can mitigate this by adding separate tags or annotations to either old or new pages during such a transition. The last time I did such a transition, it was not a big issue, though, and I expect web standards to be more incremental and usually quite compatible. But, you are right. Since you ran into this, I would like to document this better. Would http://gcc.gnu.org/projects/web.html be a good place, or do you have a different suggestion? My entry point was http://gcc.gnu.org/cvs.html, so at a minimum it need to be cross linked with http://gcc.gnu.org/projects/web.html. Done via the patch below, which also shortens the cvs.html page to make it easier to consume (and move/integrate somewhere else later on). Anything else I can answer / document, let me know! 
Gerald

Index: projects/web.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/web.html,v
retrieving revision 1.11
diff -u -3 -p -r1.11 web.html
--- projects/web.html	30 Mar 2008 18:59:30 -0000	1.11
+++ projects/web.html	20 Jun 2012 23:50:38 -0000
@@ -8,6 +8,9 @@
 
 <h1>GCC: Web Pages</h1>
 
+<p><a href="../contribute.html#webchanges">Contributing changes to
+our web pages</a> is simple.</p>
+
 <p>Our web pages are managed via CVS and can be accessed using the
 directions for <a href="../cvs.html">our CVS setup</a>.</p>
 
Index: projects/web.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/web.html,v
retrieving revision 1.12
diff -u -3 -p -r1.12 web.html
--- projects/web.html	20 Jun 2012 23:55:35 -0000	1.12
+++ projects/web.html	21 Jun 2012 00:02:02 -0000
@@ -14,6 +14,13 @@ our web pages</a> is simple.</p>
 <p>Our web pages are managed via CVS and can be accessed using the
 directions for <a href="../cvs.html">our CVS setup</a>.</p>
 
+<p>As changes are checked in, the respective pages are preprocessed
+via the script <code>wwwdocs/bin/preprocess</code> which in turn
+uses a tool called MetaHTML.  Among others, this preprocessing
+adds CSS style sheets, XML and HTML headers, and our standard
+footer.  The MetaHTML style sheet is in
+<code>wwwdocs/htdocs/style.mhtml</code>.</p>
+
 <h2>TODO</h2>
 
 <p>Any help concerning open issues is highly welcome, as are
 
Index: cvs.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/cvs.html,v
retrieving revision 1.220
diff -u -3 -p -r1.220 cvs.html
--- cvs.html	3 Apr 2011 13:00:43 -0000	1.220
+++ cvs.html	21 Jun 2012 00:25:42 -0000
@@ -12,7 +12,8 @@
 <p>Our web pages and related scripts are available via our CVS
 repository.
 You can also <a href="http://gcc.gnu.org/cgi-bin/cvsweb.cgi/wwwdocs/">browse them
-online</a>.</p>
+online</a> or view <a href="projects/web.html">details on the
+setup</a>.</p>
 
 <h2>Using the CVS repository</h2>
 
@@ -28,8 +29,6 @@ and SSH installed, you can check out the
 <p>For anonymous access, use
 <code>-d :pserver:c...@gcc.gnu.org:/cvs/gcc</code> instead.</p>
 
-<p>Patches should be marked with the tag [wwwdocs] in the subject line.</p>
-
 <hr />
 
 <h2><a name="checkin">Checking in a change</a></h2>
 
@@ -37,25 +36,21 @@ and SSH installed, you can check out the
 <p>When you check in changes to our web pages, they will automatically
 be checked out into the web server's data area.</p>
 
-<p>The following is meant to provide a very quick overview of how
+<p>The following is a very quick overview of how
 to check in a change.  We recommend you list files explicitly to
 avoid accidental checkins and prefer that each checkin be of a
 complete, single logical change.</p>
 
 <ol>
 <li>Sync your sources with the master repository via <code>cvs
-update</code> before attempting a checkin; this will save you a little
-time if someone else has modified that file since the last time you
-synced your sources.  It will also identify any files in your local
Re: New option to turn off stack reuse for temporaries
The documentation needs to explain more what the option controls, and why you might want it on or off. Other than that it looks fine. Jason
[gimplefe] creating individual gimple_assign statements
Hi, This patch creates basic gimple_assign statements. It is a little raw not considering all types of gimple_assign statements for which I have already started working. Here is the Changelog. 2012-06-09 Sandeep Soni soni.sande...@gmail.com * parser.c (gimple_symtab_get): New. (gimple_symtab_get_token): New. (gp_parse_expect_lhs): Returns tree node. (gp_parse_expect_rhs_op): Returns the op as tree node. (gp_parse_assign_stmt) : Builds gimple_assign statement. Index: gcc/gimple/parser.c === --- gcc/gimple/parser.c (revision 188546) +++ gcc/gimple/parser.c (working copy) @@ -105,6 +105,7 @@ gimple_symtab_eq_hash, NULL); } + /* Registers DECL with the gimple symbol table as having identifier ID. */ static void @@ -123,6 +124,41 @@ *slot = new_entry; } + +/* Gets the tree node for the corresponding identifier ID */ + +static tree +gimple_symtab_get (tree id) +{ + struct gimple_symtab_entry_def temp; + gimple_symtab_entry_t entry; + void **slot; + + gimple_symtab_maybe_init_hash_table(); + temp.id = id; + slot = htab_find_slot (gimple_symtab, temp, NO_INSERT); + if (slot) +{ + entry = (gimple_symtab_entry_t) *slot; + return entry-decl; +} + else +return NULL; +} + + +/* Gets the tree node of token TOKEN from the global gimple symbol table. */ + +static tree +gimple_symtab_get_token (const gimple_token *token) +{ + const char *name = gl_token_as_text(token); + tree id = get_identifier(name); + tree decl = gimple_symtab_get (id); + return decl; +} + + /* Return the string representation of token TOKEN. */ static const char * @@ -360,10 +396,11 @@ /* Helper for gp_parse_assign_stmt. The token read from reader PARSER should be the lhs of the tuple. */ -static void +static tree gp_parse_expect_lhs (gimple_parser *parser) { const gimple_token *next_token; + tree lhs; /* Just before the name of the identifier we might get the symbol of dereference too. 
     If we do get it then consume that token, else
@@ -372,18 +409,22 @@
   if (next_token->type == CPP_MULT)
     next_token = gl_consume_token (parser->lexer);
 
-  gl_consume_expected_token (parser->lexer, CPP_NAME);
+  next_token = gl_consume_token (parser->lexer);
+  lhs = gimple_symtab_get_token (next_token);
   gl_consume_expected_token (parser->lexer, CPP_COMMA);
+  return lhs;
+
 }
 
 /* Helper for gp_parse_assign_stmt.  The token read from reader PARSER
    should be the first operand in rhs of the tuple.  */
 
-static void
+static tree
 gp_parse_expect_rhs_op (gimple_parser *parser)
 {
   const gimple_token *next_token;
+  tree rhs = NULL_TREE;
 
   next_token = gl_peek_token (parser->lexer);
 
@@ -402,11 +443,13 @@
     case CPP_NUMBER:
     case CPP_STRING:
       next_token = gl_consume_token (parser->lexer);
+      rhs = gimple_symtab_get_token (next_token);
       break;
 
     default:
       break;
     }
+
 }
 
@@ -420,9 +463,10 @@
   gimple_token *optoken;
   enum tree_code opcode;
   enum gimple_rhs_class rhs_class;
+  tree op1 = NULL_TREE, op2 = NULL_TREE, op3 = NULL_TREE;
 
   opcode = gp_parse_expect_subcode (parser, optoken);
-  gp_parse_expect_lhs (parser);
+  tree lhs = gp_parse_expect_lhs (parser);
 
   rhs_class = get_gimple_rhs_class (opcode);
   switch (rhs_class)
@@ -436,16 +480,16 @@
     case GIMPLE_UNARY_RHS:
     case GIMPLE_BINARY_RHS:
     case GIMPLE_TERNARY_RHS:
-      gp_parse_expect_rhs_op (parser);
+      op1 = gp_parse_expect_rhs_op (parser);
       if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
        {
          gl_consume_expected_token (parser->lexer, CPP_COMMA);
-         gp_parse_expect_rhs_op (parser);
+         op2 = gp_parse_expect_rhs_op (parser);
        }
       if (rhs_class == GIMPLE_TERNARY_RHS)
        {
          gl_consume_expected_token (parser->lexer, CPP_COMMA);
-         gp_parse_expect_rhs_op (parser);
+         op3 = gp_parse_expect_rhs_op (parser);
        }
       break;
 
@@ -454,6 +498,9 @@
     }
 
   gl_consume_expected_token (parser->lexer, CPP_GREATER);
+
+  gimple stmt = gimple_build_assign_with_ops (code, lhs, op1, op2, op3);
+  gcc_assert (verify_gimple_stmt (stmt));
 }
 
 /* Helper for gp_parse_cond_stmt.
    The token read from reader PARSER should

-- 
Cheers
Sandy
Re: New option to turn off stack reuse for temporaries
I modified the documentation and it now looks like this:

@item -ftemp-stack-reuse
@opindex ftemp-stack-reuse
This option enables stack space reuse for temporaries.  The default is on.
The lifetime of a compiler-generated temporary is well defined by the C++
standard.  When the lifetime of a temporary ends, and if the temporary
lives in memory, an optimizing compiler is free to reuse its stack space
for other temporaries or for scoped local variables whose live ranges do
not overlap with it.  However, some legacy code relies on the behavior of
older compilers in which a temporary's stack space is not reused; for such
code, aggressive stack reuse can lead to runtime errors.  This option is
used to control the temporary stack reuse optimization.

Does it look ok?

thanks,

David

On Wed, Jun 20, 2012 at 5:29 PM, Jason Merrill ja...@redhat.com wrote:
> The documentation needs to explain more what the option controls, and
> why you might want it on or off.  Other than that it looks fine.
>
> Jason
Re: RFA: PATCH to Makefile.def/tpl to add libgomp to make check-c++
On Jun 20, 2012, at 12:26 AM, Jason Merrill ja...@redhat.com wrote:
> The recent regression in libgomp leads me to want to add libgomp tests
> to the check-c++ target.

I'm fine with the idea...