Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On 13/10/2012 00:25, Steven Bosscher wrote: On Fri, Oct 12, 2012 at 11:16 PM, Jan Hubicka hubi...@ucw.cz wrote: On Fri, Oct 12, 2012 at 10:44 PM, Jan Hubicka hubi...@ucw.cz wrote: 1) computing liveness with REG_EQUAL included prior RD that means a lot of shuffling of REG_DEAD notes I was already working on a patch for this. I'll send it here later tonight. Great, thanks! This is probably most sensible approach even if we will need to recompute liveness before/after webizer. I don't think we have to touch the liveness sets. We can compute an extra set of registers live only for REG_EQUAL/REG_EQUIV notes. Attached is what I had in mind. Untested, etc. it's late (and the Yankees are playing) so I'll get back to properly testing this tomorrow. Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV notes that refer to a dead pseudo? Paolo
Re: [i386] scalar ops that preserve the high part of a vector
On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, this patch provides an alternate pattern to let combine recognize scalar operations that preserve the high part of a vector. If the strategy is all right, I could do the same for more operations (mul, div, ...). Something similar is also possible for V4SF (different pattern though), but probably not as useful. But, we _do_ have vec_merge pattern that describes the operation. Adding another one to each operation just to satisfy combine is IMO not correct approach. I'd rather see generic RTX simplification that simplifies your proposed pattern to vec_merge pattern. Also, as you mention in PR54855, Comment #5, the approach is too fragile... Uros.
Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling
Today appears to be RTL loop optimizer patch day, because here's another patch... The problem here is that variable expansion does not update REG_EQUAL notes when it performs replacement of the renamed register. I fixed this by using validate_replace_rtx_group(). There is already code in analyze_insn_to_expand_var() to make sure that the to-be-replaced register is only used to accumulate into, so I think that using validate_replace_rtx_group is safe. Could use a 2nd pair of eyes to make sure, though. At least the comments in there make it clear that's indeed the intent. Tested with a bootstrapped compiler. Test coverage isn't great, because variable expansion is not enabled by default. OK, thanks (if you also add the testcase to gcc.dg with the special options). -- Eric Botcazou
Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc
On 5 October 2012 11:35, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Oct 5, 2012 at 12:07 PM, Iain Buclaw ibuc...@ubuntu.com wrote: On 5 October 2012 01:06, Joseph S. Myers jos...@codesourcery.com wrote: On Thu, 4 Oct 2012, Iain Buclaw wrote: The only patches to gcc proper are documentation-related and adding the D frontend / libphobos to configure and make files. I would have thought that these would typically only be included with the actual front-end? Looking back at my previous review comments, I suggested that you might need to split up c-common.[ch] so that certain parts of attribute handling could be shared with D, because duplicate code copied from elsewhere in GCC was not an appropriate implementation approach. Have you then eliminated the duplicate code in some other way that does not involve splitting up those files so code can be shared? Ah, no; thanks for reminding me of this. The code duplicated from c-common.[ch] are the handlers for C __attributes__, however gdc doesn't use all of them because some just don't have a fitting place eg: gnu_inline, artificial. Would the best approach be to move all handle_* functions and any helper functions into a new source file that can be shared between frontends, and define two new frontend hooks, LANG_HOOK_ATTRIBUTE_TABLE and LANG_HOOK_FORMAT_ATTRIBUTE_TABLE ? Btw, the LTO frontend also has most of the stuff duplicated ... (see lto/lto-lang.c). Not sure why ... Richard. Looks like LTO's frontend has the relevant attributes duplicated in order to support the attributes used for GCC builtins (const, pure, nothrow, transaction_pure, etc...). Probably only these handlers that could move to a common frontend location, and keep the rest as part of c-family. Regards, -- Iain Buclaw *(p e ? p++ : p) = (c 0x0f) + '0';
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sat, Oct 13, 2012 at 11:12 PM, Vladimir Makarov vmaka...@redhat.com wrote: Ok for the idea. If we have a problem later, we could fix it. I'll look at the next version of the patch when you send it to give you the final approval. Great, thanks! Here is the updated patch, tested in the same way as the previous version. Ciao! Steven ira-speedup-3.diff Description: Binary data
Re: encoding all aliases options in .opt files
Manuel López-Ibáñez lopeziba...@gmail.com writes: aux-infoFILE /* we could accept this to be compatible with some options like -B */ Concatenated option arguments (without separators like '=' or '-') should only ever be used for single character options. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [i386] scalar ops that preserve the high part of a vector
On Sun, 14 Oct 2012, Uros Bizjak wrote: On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse marc.gli...@inria.fr wrote: Hello, this patch provides an alternate pattern to let combine recognize scalar operations that preserve the high part of a vector. If the strategy is all right, I could do the same for more operations (mul, div, ...). Something similar is also possible for V4SF (different pattern though), but probably not as useful. But, we _do_ have vec_merge pattern that describes the operation. Adding another one to each operation just to satisfy combine is IMO not correct approach. At some point I wondered about _replacing_ the existing pattern, so there would only be one ;-) The vec_merge pattern takes as argument 2 vectors instead of a vector and a scalar, and describes the operation as a vector operation where we drop half of the result, instead of a scalar operation where we re-add the top half of the vector. I don't know if that's the most convenient choice. Adding code in simplify-rtx to replace vec_merge with vec_concat / vec_select might be easier than the other way around. If the middle-end somehow gave us: (plus X (vec_concat Y 0)) it would seem a bit strange to add an optimization that turns it into: (vec_merge (plus X (subreg:V2DF Y)) X 1) but then producing: (vec_concat (plus (vec_select X 0) Y) (vec_select X 1)) would be strange as well. (ignoring the signed zero issues here) I'd rather see generic RTX simplification that simplifies your proposed pattern to vec_merge pattern. Ok, I'll see what I can do. Also, as you mention in PR54855, Comment #5, the approach is too fragile... I am not sure I can make the RTX simplification much less fragile... Whenever I see (vec_concat X (vec_select Y 1)), I would have to check whether X is some (possibly large) tree of scalar computations involving Y[0], move it all to vec_merge computations, and fix other users of some of those scalars to now use S[0]. 
Seems too hard, I would stop at single-operation X that is used only once. Besides, the gain is larger in proportion when there is a single operation :-) Thank you for your comments, -- Marc Glisse
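To make the two descriptions discussed above concrete, here is a minimal sketch in plain C rather than RTL. The `v2df` struct and the function names are made up for illustration and are not GCC code. One function models the vec_merge view (a full vector operation whose high half is then replaced by the original), the other the vec_concat / vec_select view (a scalar operation on element 0 with the high element re-attached).

```c
#include <assert.h>

/* Toy stand-in for a V2DF value; element 0 is the low (scalar) part.
   This type is an assumption for the sketch, not a GCC type.  */
typedef struct { double e[2]; } v2df;

/* vec_merge view: add the whole vectors, then keep element 0 of the
   sum and element 1 of the original X.  */
v2df add_low_vec_merge(v2df x, v2df y_vec)
{
    v2df sum = { { x.e[0] + y_vec.e[0], x.e[1] + y_vec.e[1] } };
    v2df out = { { sum.e[0], x.e[1] } };
    return out;
}

/* vec_concat / vec_select view: add the scalar Y to element 0 of X
   and concatenate the result with the untouched element 1 of X.  */
v2df add_low_vec_concat(v2df x, double y)
{
    v2df out = { { x.e[0] + y, x.e[1] } };
    return out;
}

/* Check that both views give the same vector when the scalar Y is
   widened as (vec_concat Y 0).  */
void check_same_result(v2df x, double y)
{
    v2df y_vec = { { y, 0.0 } };
    v2df a = add_low_vec_merge(x, y_vec);
    v2df b = add_low_vec_concat(x, y);
    assert(a.e[0] == b.e[0] && a.e[1] == b.e[1]);
}
```

Both functions compute the same value; the difference is only in which shape a pattern matcher has to recognize, which is the point of contention in the thread.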
Re: encoding all aliases options in .opt files
On 14 October 2012 13:38, Andreas Schwab sch...@linux-m68k.org wrote: Manuel López-Ibáñez lopeziba...@gmail.com writes: aux-infoFILE /* we could accept this to be compatible with some options like -B */ Concatenated option arguments (without separators like '=' or '-') should only ever be used for single character options. We could make that rule explicit in the options-handling machinery. Cheers, Manuel.
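As a rough illustration of what making that rule explicit could look like, the sketch below accepts an argument glued directly onto an option only when the option is a single character (such as -B), and otherwise requires an '=' separator. The function `joined_argument` is hypothetical and is not part of GCC's option-handling code; it only models the rule stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if TOKEN is option OPT followed by an argument,
   return a pointer to the argument inside TOKEN, else NULL.
   A concatenated argument without a separator is accepted only for
   single-character options such as "-B".  */
const char *joined_argument(const char *token, const char *opt)
{
    size_t n;

    assert(token != NULL && opt != NULL);
    n = strlen(opt);

    /* TOKEN must start with OPT and have something after it.  */
    if (strncmp(token, opt, n) != 0 || token[n] == '\0')
        return NULL;

    /* An explicit '=' separator is fine for any option.  */
    if (token[n] == '=')
        return token + n + 1;

    /* No separator: only allowed for "-X" style options.  */
    if (n == 2 && opt[0] == '-')
        return token + n;

    return NULL;
}
```

Under this sketch "-Bdir" yields "dir", "-aux-info=FILE" yields "FILE", and "-aux-infoFILE" is rejected.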
[PATCH, alpha]: Remove empty predicates and/or constraints from .md files
Hello! 2012-10-14 Uros Bizjak ubiz...@gmail.com * config/alpha/alpha.md: Remove empty predicates and/or constraints. * config/alpha/sync.md: Ditto. Tested on alphaev68-pc-linux-gnu, committed to mainline SVN. Uros. a.diff.txt.gz Description: GNU Zip compressed data
[C++ testcase] PR 52643
Hi,

testcase added, issue closed as fixed.

Tested x86_64-linux.

Thanks,
Paolo.

2012-10-14  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/52643
	* g++.dg/opt/pr52643.C: New.

Index: g++.dg/opt/pr52643.C
===
--- g++.dg/opt/pr52643.C (revision 0)
+++ g++.dg/opt/pr52643.C (working copy)
@@ -0,0 +1,64 @@
+// PR c++/52643
+// { dg-options "-O" }
+
+template<class T> class already_AddRefd {};
+
+template<class T>
+class ObjRef
+{
+public:
+  ObjRef() {}
+
+  ObjRef(const already_AddRefd<T> aar) {}
+
+  ~ObjRef()
+  {
+    T* mPtr;
+    mPtr->release_ref();
+  }
+
+  operator T* () const
+  {
+    return __null;
+  }
+
+  template<class U>
+  void operator= (const already_AddRefd<U> newAssign) {}
+};
+
+class MyRetClass {
+public:
+  void release_ref();
+};
+
+class MyClass
+{
+  void appendChild();
+  void getTripleOutOfByPredicate();
+  already_AddRefd<MyRetClass> getNextTriple();
+};
+
+void
+MyClass::getTripleOutOfByPredicate()
+{
+  ObjRef<MyRetClass> t (getNextTriple());
+
+  if (t == __null)
+    throw MyRetClass();
+}
+
+void
+MyClass::appendChild()
+{
+  while (1)
+  {
+    try
+    {
+      ObjRef<MyRetClass> t (getNextTriple());
+      continue;
+    }
+    catch (MyRetClass)
+    {
+    }
+  }
+}
Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling
Hello, Today appears to be RTL loop optimizer patch day, because here's another patch... The problem here is that variable expansion does not update REG_EQUAL notes when it performs replacement of the renamed register. Hehe, or rather REG_EQUAL patch day :) It makes me wonder how much of the REG_EQUAL machinery we still make good use of. I fixed this by using validate_replace_rtx_group(). There is already code in analyze_insn_to_expand_var() to make sure that the to-be-replaced register is only used to accumulate into, so I think that using validate_replace_rtx_group is safe. Could use a 2nd pair of eyes to make sure, though. Tested with a bootstrapped compiler. Test coverage isn't great, because variable expansion is not enabled by default. Are there particular reasons not to enable it? It seems like a useful optimization. Honza
Re: Propagate profile counts during switch expansion
Hi, Index: optabs.c === --- optabs.c(revision 191879) +++ optabs.c(working copy) @@ -4249,7 +4249,7 @@ prepare_operand (enum insn_code icode, rtx x, int we can do the branch. */ static void -emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label) +emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label, int prob) { enum machine_mode optab_mode; enum mode_class mclass; @@ -4261,7 +4261,16 @@ static void gcc_assert (icode != CODE_FOR_nothing); gcc_assert (insn_operand_matches (icode, 0, test)); - emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label)); + rtx insn = emit_insn ( + GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label)); I think we did not change to style of mixing declaration and code yet. So please put declaration ahead. I think you want to keep emit_jump_insn. Also do nothing when profile_status == PROFILE_ABSENT. Index: cfgbuild.c === --- cfgbuild.c (revision 191879) +++ cfgbuild.c (working copy) @@ -559,8 +559,11 @@ compute_outgoing_frequencies (basic_block b) f-count = b-count - e-count; return; } + else +{ + guess_outgoing_edge_probabilities (b); +} Add comment here that we rely on multiway BBs having sane probabilities already. You still want to do guessing when the edges out are EH. Those also can be many. Index: expr.h === --- expr.h (revision 191879) +++ expr.h (working copy) @@ -190,7 +190,7 @@ extern int have_sub2_insn (rtx, rtx); /* Emit a pair of rtl insns to compare two rtx's and to jump to a label if the comparison is true. */ extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx, -enum machine_mode, int, rtx); +enum machine_mode, int, rtx, int prob=-1); Hmm, probably first appreance of this C++ construct. I suppose it is OK. 
+static inline void
+reset_out_edges_aux (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE(e, ei, bb->succs)
+    e->aux = (void *)0;
+}
+static inline void
+compute_cases_per_edge (gimple stmt)
+{
+  basic_block bb = gimple_bb (stmt);
+  reset_out_edges_aux (bb);
+  int ncases = gimple_switch_num_labels (stmt);
+  for (int i = ncases - 1; i >= 1; --i)
+    {
+      tree elt = gimple_switch_label (stmt, i);
+      tree lab = CASE_LABEL (elt);
+      basic_block case_bb = label_to_block_fn (cfun, lab);
+      edge case_edge = find_edge (bb, case_bb);
+      case_edge->aux = (void *)((long)(case_edge->aux) + 1);
+    }
+}

Comments and newlines per coding standard. With these changes, the patch is OK.

Thanks,
Honza
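The counting step in compute_cases_per_edge can be pictured without any GCC data structures. The sketch below is an illustration under assumptions, not the patch's code: each case label is represented only by the index of the outgoing edge it jumps to, label 0 is taken to be the default label (which is why the loop stops at 1), and a plain array replaces the edge aux field.

```c
#include <assert.h>

/* Count how many non-default case labels jump through each outgoing
   edge.  LABEL_EDGE[i] is the index of the edge taken by label i;
   label 0 is assumed to be the default label and is skipped.
   CASES_PER_EDGE plays the role of the per-edge aux counter.  */
void count_cases_per_edge(const int *label_edge, int nlabels,
                          int *cases_per_edge, int nedges)
{
    int e, i;

    /* Reset all counters first, like reset_out_edges_aux.  */
    for (e = 0; e < nedges; e++)
        cases_per_edge[e] = 0;

    /* Walk the labels from last to first, excluding the default.  */
    for (i = nlabels - 1; i >= 1; --i) {
        assert(label_edge[i] >= 0 && label_edge[i] < nedges);
        cases_per_edge[label_edge[i]]++;
    }
}
```

With per-edge counts in hand, a branch probability can be spread over the cases that share an edge, which is presumably what the counters are used for later in the patch.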
Re: Use conditional casting with symtab_node
On Fri, Oct 12, 2012 at 4:22 AM, Richard Biener richard.guent...@gmail.com wrote: I also think that instead of if (cgraph_node *q = p->cast_to <cgraph_node *> ()) we want if ((q = cast_to <cgraph_node *> (p)) I see absolutely no good reason to make cast_to a member, given that the language has static_cast, const_cast and stuff. cast_to would simply be our equivalent to dynamic_cast within our OO model. Then I'd call it *_cast instead of cast_*, so, why not gcc_cast ? Or dyn_cast (). That way if ((q = dyn_cast <function *> (p)) This looks fine to me. Diego.
Fix estimated number of iterations for loops with multiple exits
Hi,

the update of the RTL optimizers to use SCEV's loop bounds made them unexpectedly active. One of the reasons is invalid. For the loop

int *a;
int t()
{
  int i;
  for (i=0;i<100;i++)
    if (a[i])
      return 1;
  return 0;
}

we get a realistic estimate of the number of iterations of 999. This is quite wrong. We could however still predict the loop:

int t2()
{
  int i;
  for (i=0;i<300;i++)
    if (a[i])
      abort ();
  return 0;
}

This patch implements that by making estimate_numbers_of_iterations_loop save the realistic estimate only when all other exits out of the loop are unlikely (i.e. EH or predicted by NORETURN or a similarly strong heuristic).

Bootstrapped/regtested x86_64-linux, committed.

Honza

	* tree-ssa-loop-niter.c (estimate_numbers_of_iterations_loop): Do not
	predict loops with multiple exits realistically.
	* cfgloopanal.c (single_likely_exit): New function.

	* gcc.dg/unroll_5.c: New testcase.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c (revision 192432)
+++ tree-ssa-loop-niter.c (working copy)
@@ -2965,6 +2965,7 @@ estimate_numbers_of_iterations_loop (str
   struct tree_niter_desc niter_desc;
   edge ex;
   double_int bound;
+  edge likely_exit;
 
   /* Give up if we already have tried to compute an estimation.  */
   if (loop->estimate_state != EST_NOT_COMPUTED)
@@ -2975,6 +2976,7 @@ estimate_numbers_of_iterations_loop (str
   loop->any_estimate = false;
 
   exits = get_loop_exit_edges (loop);
+  likely_exit = single_likely_exit (loop);
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
     {
       if (!number_of_iterations_exit (loop, ex, &niter_desc, false))
@@ -2988,7 +2990,7 @@ estimate_numbers_of_iterations_loop (str
 			   niter);
       record_estimate (loop, niter, niter_desc.max,
 		       last_stmt (ex->src),
-		       true, true, true);
+		       true, ex == likely_exit, true);
     }
   VEC_free (edge, heap, exits);
 
Index: cfgloopanal.c
===
--- cfgloopanal.c (revision 192432)
+++ cfgloopanal.c (working copy)
@@ -446,3 +446,40 @@ mark_loop_exit_edges (void)
     }
 }
 
+/* Return exit edge if loop has only one exit that is likely
+   to be executed on runtime (i.e. it is not EH or leading
+   to noreturn call.  */
+
+edge
+single_likely_exit (struct loop *loop)
+{
+  edge found = single_exit (loop);
+  VEC (edge, heap) *exits;
+  unsigned i;
+  edge ex;
+
+  if (found)
+    return found;
+  exits = get_loop_exit_edges (loop);
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    {
+      if (ex->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
+	continue;
+      /* The constant of 5 is set in a way so noreturn calls are
+	 ruled out by this test.  The static branch prediction algorithm
+	 will not assign such a low probability to conditionals for usual
+	 reasons.  */
+      if (profile_status != PROFILE_ABSENT
+	  && ex->probability < 5 && !ex->count)
+	continue;
+      if (!found)
+	found = ex;
+      else
+	{
+	  VEC_free (edge, heap, exits);
+	  return NULL;
+	}
+    }
+  VEC_free (edge, heap, exits);
+  return found;
+}
Index: testsuite/gcc.dg/unroll_5.c
===
--- testsuite/gcc.dg/unroll_5.c (revision 0)
+++ testsuite/gcc.dg/unroll_5.c (revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-loop2_unroll -funroll-loops" } */
+void abort (void);
+int *a;
+int t()
+{
+  int i;
+  for (i=0;i<100;i++)
+    if (a[i])
+      return 1;
+  return 0;
+}
+int t2()
+{
+  int i;
+  for (i=0;i<300;i++)
+    if (a[i])
+      abort ();
+  return 0;
+}
+/* { dg-final { scan-rtl-dump-times "upper bound: 99" 1 "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump-not "realistic bound: 99" "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump-times "upper bound: 299" 1 "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump-times "realistic bound: 299" 1 "loop2_unroll" } } */
+/* { dg-final { cleanup-rtl-dump loop2_unroll } } */
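The selection rule behind single_likely_exit can be tried out on a toy model. The sketch below is only an approximation under stated assumptions: `struct toy_exit`, its fields and `toy_single_likely_exit` are invented for illustration, the EH/abnormal flags are collapsed into one boolean, and the early return for loops with a single exit is left out because it falls out of the general loop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy exit edge: not GCC's edge type.  */
struct toy_exit {
    int is_eh_or_abnormal;  /* models EDGE_EH | EDGE_ABNORMAL_CALL */
    int probability;        /* static branch probability of the exit */
    long count;             /* profile count, 0 if never executed */
};

/* Return the only exit that is likely to be taken at run time, or
   NULL when zero or several exits are likely.  HAVE_PROFILE models
   profile_status != PROFILE_ABSENT.  */
const struct toy_exit *toy_single_likely_exit(const struct toy_exit *exits,
                                              size_t n, int have_profile)
{
    const struct toy_exit *found = NULL;
    size_t i;

    assert(exits != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct toy_exit *ex = &exits[i];

        /* Exception and abnormal-call exits never count as likely.  */
        if (ex->is_eh_or_abnormal)
            continue;
        /* Very low probability with no count: e.g. a noreturn call.  */
        if (have_profile && ex->probability < 5 && ex->count == 0)
            continue;
        /* A second likely exit means no single answer exists.  */
        if (found != NULL)
            return NULL;
        found = ex;
    }
    return found;
}
```

In terms of the examples in the message, t2 has one ordinary exit plus an exit into abort (), so the ordinary exit is returned; t has two ordinary exits, so the result is NULL and no realistic estimate is recorded.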
LangEnabledBy with arguments
Bootstrapped and regression tested on x86_64-linux-gnu. The additional testcase was not failing before, but tests for something that the current testsuite does not. OK? 2012-10-14 Manuel López-Ibáñez m...@gcc.gnu.org PR c/53063 PR c/40989 gcc/ * optc-gen.awk: Handle new form of LangEnabledBy. * opts.c (set_Wstrict_aliasing): Declare here. Make static. * common.opt (Wstrict-aliasing=,Wstrict-overflow=): Do not use Init. * doc/options.texi (LangEnabledBy): Document new form. * flags.h (set_Wstrict_aliasing): Do not declare. c-family/ * c.opt (Wstrict-aliasing=,Wstrict-overflow=): Use LangEnabledBy. * c-opts.c (c_common_handle_option): Do not set them here. Add comment. (c_common_post_options): Likewise. testsuite/ * gcc.dg/Wstrict-overflow-24.c: New. lang-enabled-by-with-args2.diff Description: Binary data
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On 12-10-14 6:16 AM, Steven Bosscher wrote: On Sat, Oct 13, 2012 at 11:12 PM, Vladimir Makarov vmaka...@redhat.com wrote: Ok for the idea. If we have a problem later, we could fix it. I'll look at the next version of the patch when you send it to give you the final approval. Great, thanks! Here is the updated patch, tested in the same way as the previous version. Thanks, Steven. IRA part is ok for me to commit.
Re: [lra] patch from Richard Sandiford's review of lra-assigns.c
On 12-10-12 11:00 AM, Richard Sandiford wrote: Vladimir Makarov vmaka...@redhat.com writes: The following patch implements most Richard's proposals for LRA lra-spills.c and lra-coalesce.c files. The patch was successfully bootstrapped on x86/x86-64. Committed as rev. 192389. Thanks for the updates. Looks good to me. Just one comment though: @@ -125,7 +136,7 @@ process_copy_to_form_thread (int regno1, last = regno_assign_info[last].next) regno_assign_info[last].first = regno1_first; regno_assign_info[last].next = regno_assign_info[regno1_first].next; - regno_assign_info[regno1_first].first = regno2_first; + regno_assign_info[regno1_first].next = regno2_first; regno_assign_info[regno1_first].freq += regno_assign_info[regno2_first].freq; } I still think this is missing a: regno_assign_info[last].first = regno1_first; Thanks, Richard. I fixed in my today patch.
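The point about the missing assignment is easier to see on a small array-based model of the thread lists. The sketch below is not the LRA source: `struct regno_info` and `merge_threads` are made-up names, and it assumes each entry stores the head of its thread in `first`, the next member in `next` (negative at the end), and that the head's `freq` holds the thread total. The walk over the second thread stops at its last member without entering the loop body for it, so that member needs its own `first` update.

```c
#include <assert.h>

/* Toy version of the per-register thread bookkeeping.  */
struct regno_info {
    int first;  /* head of the thread this register belongs to */
    int next;   /* next register in the thread, or -1 at the end */
    int freq;   /* for a head: accumulated frequency of the thread */
};

/* Splice the thread headed by REGNO2_FIRST into the thread headed by
   REGNO1_FIRST, right after the head of thread 1.  */
void merge_threads(struct regno_info *info, int regno1_first,
                   int regno2_first)
{
    int last;

    assert(regno1_first != regno2_first);

    /* Re-point every member of thread 2 except the last one.  */
    for (last = regno2_first; info[last].next >= 0; last = info[last].next)
        info[last].first = regno1_first;

    /* The loop exits before touching the last member, so it has to
       be updated separately (the assignment discussed above).  */
    info[last].first = regno1_first;

    info[last].next = info[regno1_first].next;
    info[regno1_first].next = regno2_first;
    info[regno1_first].freq += info[regno2_first].freq;
}
```

Without the separate assignment, the tail of the spliced thread would still name the old head, and any later lookup through `first` on that register would find the wrong thread.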
[lra] new hint * interpretation.
The following patch adds a new interpretation of hint * for LRA. 2012-10-14 Vladimir Makarov vmaka...@redhat.com * doc/tm.texi: Add new interpretation of hint * for LRA. Committed as rev. 192436. Index: doc/md.texi === --- doc/md.texi (revision 192325) +++ doc/md.texi (working copy) @@ -1,5 +1,5 @@ @c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1996, 1998, 1999, 2000, 2001, -@c 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 +@c 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012 @c Free Software Foundation, Inc. @c This is part of the GCC manual. @c For copying conditions, see the file gcc.texi. @@ -1606,7 +1606,9 @@ @item * Says that the following character should be ignored when choosing register preferences. @samp{*} has no effect on the meaning of the -constraint as a constraint, and no effect on reloading. +constraint as a constraint, and no effect on reloading. For LRA +@samp{*} additionally disparages slightly the alternative if the +following character matches the operand. @ifset INTERNALS Here is an example: the 68000 has an instruction to sign-extend a
Re: [lra] patch to fix GCC crash on a SPEC2006 test
On 12-10-13 11:37 AM, Peter Bergner wrote: On Thu, 2012-10-11 at 23:53 -0400, Vladimir Makarov wrote: Is the following comment better? Presence of any pseudo in CALL_INSN_FUNCTION_USAGE does not affect value of insn_bitmap of the corresponding lra_reg_info. That is because we don't need to reload pseudos in CALL_INSN_FUNCTION_USAGEs. So if we process only insns in the insn_bitmap of given pseudo here, we can miss the pseudo in some CALL_INSN_FUNCTION_USAGEs. Sure, that's better. Thanks. Ok. Fixed.
Re: [SH] PR 34777 - Add test case
On Wed, 2012-10-10 at 07:46 +0900, Kaz Kojima wrote: Oleg Endo oleg.e...@t-online.de wrote: Uhm, yes, I forgot to add the -fschedule-insns and -mprefergot options. Regarding the -Os option, I think it's better to test this one at multiple optimization levels, just in case. I've looked through gcc.c-torture/compile and found some target specific test cases there, so I thought it would be OK to do the same :) Some targets also have their own torture subdir. If it's better, I could also create gcc.target/sh/torture. Maybe. For this specific test, I thought that -Os -fschedule-insns -fPIC -mprefergot would be enough because empirically these options will give high R0 register pressure which had caused that PR. Sorry for the delayed reply. The attached patch adds gcc.target/sh/torture and puts the test there. The torture subdir might be also useful in the future. Tested on rev 192417 with make -k check-gcc RUNTESTFLAGS=--target_board=sh-sim\{-m2/-ml} OK? Cheers, Oleg testsuite/ChangeLog: PR target/34777 * gcc.target/sh/torture/sh-torture.exp: New. * gcc.target/sh/torture/pr34777.c: New. 
Index: gcc/testsuite/gcc.target/sh/torture/pr34777.c === --- gcc/testsuite/gcc.target/sh/torture/pr34777.c (revision 0) +++ gcc/testsuite/gcc.target/sh/torture/pr34777.c (revision 0) @@ -0,0 +1,30 @@ +/* { dg-do compile { target sh*-*-* } } */ +/* { dg-additional-options -fschedule-insns -fPIC -mprefergot } */ +/* { dg-skip-if { sh*-*-* } { -m5* } { } } */ + +static __inline __attribute__ ((__always_inline__)) void * +_dl_mmap (void * start, int length, int prot, int flags, int fd, + int offset) +{ + register long __sc3 __asm__ (r3) = 90; + register long __sc4 __asm__ (r4) = (long) start; + register long __sc5 __asm__ (r5) = (long) length; + register long __sc6 __asm__ (r6) = (long) prot; + register long __sc7 __asm__ (r7) = (long) flags; + register long __sc0 __asm__ (r0) = (long) fd; + register long __sc1 __asm__ (r1) = (long) offset; + __asm__ __volatile__ (trapa %1 + : =z (__sc0) + : i (0x10 + 6), 0 (__sc0), r (__sc4), + r (__sc5), r (__sc6), r (__sc7), + r (__sc3), r (__sc1) + : memory ); +} + +extern int _dl_pagesize; +void +_dl_dprintf(int fd, const char *fmt, ...) +{ + static char *buf; + buf = _dl_mmap ((void *) 0, _dl_pagesize, 0x1 | 0x2, 0x02 | 0x20, -1, 0); +} Index: gcc/testsuite/gcc.target/sh/torture/sh-torture.exp === --- gcc/testsuite/gcc.target/sh/torture/sh-torture.exp (revision 0) +++ gcc/testsuite/gcc.target/sh/torture/sh-torture.exp (revision 0) @@ -0,0 +1,41 @@ +# Copyright (C) 2012 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. + +# GCC testsuite that uses the `gcc-dg.exp' driver, looping over +# optimization options. + +# Exit immediately if this isn't a SH target. +if { ![istarget sh*-*-*] } then { + return +} + +# Load support procs. +load_lib gcc-dg.exp + +# If a testcase doesn't have special options, use these. +global DEFAULT_CFLAGS +if ![info exists DEFAULT_CFLAGS] then { +set DEFAULT_CFLAGS -ansi -pedantic-errors +} + +# Initialize `dg'. +dg-init + +# Main loop. +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] $DEFAULT_CFLAGS + +# All done. +dg-finish
[C++ testcase] PR 53581
Hi,

testcase added, issue closed as fixed.

Tested x86_64-linux.

Thanks,
Paolo.

2012-10-14  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/53581
	* g++.dg/template/crash113.C: New.

Index: g++.dg/template/crash113.C
===
--- g++.dg/template/crash113.C (revision 0)
+++ g++.dg/template/crash113.C (working copy)
@@ -0,0 +1,50 @@
+// PR c++/53581
+
+template<class A, int M, int N>
+class Child;
+
+template<class A, int M, int N>
+class Base
+{
+public:
+  Child<A, M, N> operator-(const Base<A, M, N> m) const
+  {
+    Child<A, M, N> diff;
+    return diff;
+  }
+
+  A test() const
+  {
+    return 0;
+  }
+
+private:
+  A values[M * N];
+};
+
+template<class A, int N>
+class Ops
+{
+public:
+  virtual ~Ops() {}
+
+  bool bar() const
+  {
+    Child<A, N, N> mat;
+    return (*static_cast<const Child<A, N, N>*>(this) - mat).test();
+  }
+};
+
+
+template<class A, int N>
+class Child<A, N, N> : public Base<A, N, N>, public Ops<A, N> {};
+
+class ImageWarp
+{
+  bool bar() const
+  {
+    return foo.bar();
+  }
+
+  Child<float, 3, 3> foo;
+};
Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling
On Sun, Oct 14, 2012 at 4:18 PM, Jan Hubicka wrote: Tested with a bootstrapped compiler. Test coverage isn't great, because variable expansion is not enabled by default. Are there particular reasons not to enable it? It seems like a useful optimization. I don't know of any reason not to enable it, but I have no access to fancy benchmarks to see what happens if the option is enabled. Wouldn't hurt to throw this at SPEC2k6 or something like that, just to see what happens. Ciao! Steven
Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling
On Sun, Oct 14, 2012 at 11:11 AM, Eric Botcazou wrote: OK, thanks (if you also add the testcase to gcc.dg with the special options). Thanks, committed as trunk r192439. Ciao! Steven
Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)
On Sun, Oct 14, 2012 at 7:19 PM, Vladimir Makarov wrote: Thanks, Steven. IRA part is ok for me to commit. Thanks, I've committed this as trunk r192440. I'm aware I'm on the hook for fixing any fall-out :-) Ciao! Steven
Tidy store_bit_field_1 co.
insv, extv and extzv have an unusual interface: the structure operand is supposed to have word_mode if stored in registers or byte_mode if stored in memory. Andrew's patch to try different insv modes: http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00126.html prompted me to try making the patterns more like other optabs. The use of word and byte units for registers and memory respectively is pretty deeply engrained into the current expand routines, even in the parts that don't deal directly with the .md patterns. E.g. the bitnum parameter to store_bit_field always counts from the leftmost bit of OP0, but store_bit_field_1 internally converts it to a trio of unit, offset (number of whole units) and bitpos (position within a unit). The latter two are then also used in the interface to store_fixed_bit_field, with the unit being implicit. store_split_bit_field uses the original bitnum-style parameter instead. This patch makes the code use the original bitnum throughout, and only separate into units where locally useful. Also, if the field spans two words of a register OP0, store_bit_field_1 reduces OP0 to just the first word. It then makes sure that we fall through to store_fixed_bit_field, which in turn calls store_split_bit_field, which knows that OP0 is only partial. I think this is dangerous: it's the only time that store_bit_field_1 trims OP0 to cover only part of the field, and so adds another special case for the rest of the function to handle and ignore. It also makes the interface to store_fixed_bit_field more complicated. The patch instead makes store_bit_field_1 call store_split_bit_field directly where appropriate. diffstat for this patch and the one I'm about to post says: expmed.c | 640 +-- 1 file changed, 261 insertions(+), 379 deletions(-) so I'd like to submit them as clean ups regardless of whether I ever get around to the main patterns change. The patch is probably quite hard to review, sorry. 
I've made the changelog a bit more detailed than usual in order to list the individual points. Tested on x86_64-linux-gnu, powerpc64-linux-gnu, mipsisa64-elf (both -EL and -EB) and mipsisa32-elf (also both -EL and -EB). OK to install? Richard gcc/ * expmed.c (store_bit_field_1): Remove unit, offset, bitpos and byte_offset from the outermost scope. Express conditions in terms of bitnum rather than offset, bitpos and byte_offset. Split the plain move cases into two, one for memory accesses and one for register accesses. Allow simplify_gen_subreg to fail rather than calling validate_subreg. Move the handling of multiword OP0s after the code that coerces VALUE to an integer mode. Use simplify_gen_subreg for this case and assert that it succeeds. If the field still spans several words, pass it directly to store_split_bit_field. Assume after that point that both sources and register targets fit within a word. Replace x-prefixed variables with non-prefixed forms. Compute the bitpos for insv register operands directly in the chosen unit size, rather than going through an intermediate BITS_PER_WORD unit size. Update the call to store_fixed_bit_field. (store_fixed_bit_field): Replace the bitpos and offset parameters with a single bitnum parameter, of the same form as store_bit_field. Assume that OP0 contains the full field. Simplify the memory offset calculation. Assert that the processed OP0 has an integral mode. (store_split_bit_field): Update the call to store_fixed_bit_field. 
Index: gcc/expmed.c === --- gcc/expmed.c2012-10-13 19:46:00.862780569 +0100 +++ gcc/expmed.c2012-10-14 11:41:48.692695324 +0100 @@ -49,7 +49,6 @@ static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT, rtx); static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, @@ -409,15 +408,9 @@ store_bit_field_1 (rtx str_rtx, unsigned enum machine_mode fieldmode, rtx value, bool fallback_p) { - unsigned int unit -= (MEM_P (str_rtx)) ? BITS_PER_UNIT : BITS_PER_WORD; - unsigned HOST_WIDE_INT offset, bitpos; rtx op0 = str_rtx; - int byte_offset; rtx orig_value; - enum machine_mode op_mode = mode_for_extraction (EP_insv, 3); - while (GET_CODE (op0) == SUBREG) { /* The following line once was done only if WORDS_BIG_ENDIAN, @@ -427,8 +420,7 @@ store_bit_field_1 (rtx str_rtx, unsigned always get higher addresses. */ int
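The relationship between the single bitnum parameter and the unit / offset / bitpos trio described in the message can be written down in a few lines. This is a sketch under assumptions, not code from expmed.c: the struct and function names are invented, and `unit` stands for whatever unit size is locally convenient (for example 8 for memory or the word size for registers).

```c
#include <assert.h>

/* Position of a bit expressed in whole units plus a remainder.  */
struct bit_location {
    unsigned long offset;  /* number of whole units before the bit */
    unsigned long bitpos;  /* position of the bit within its unit */
};

/* Split BITNUM, counted from the start of the operand, into an
   offset and a bit position for the given UNIT size in bits.  */
struct bit_location split_bitnum(unsigned long bitnum, unsigned int unit)
{
    struct bit_location loc;

    assert(unit > 0);
    loc.offset = bitnum / unit;
    loc.bitpos = bitnum % unit;
    return loc;
}

/* Nonzero if a field of BITSIZE bits starting at BITNUM crosses a
   unit boundary, i.e. the case that needs the split-field path.  */
int field_spans_units(unsigned long bitnum, unsigned long bitsize,
                      unsigned int unit)
{
    assert(unit > 0);
    return bitnum % unit + bitsize > unit;
}
```

Keeping only bitnum in the interfaces and deriving offset and bitpos at the point of use means the same value can be split with different unit sizes in different places, which is the flexibility the message argues for.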
Re: Make try_unroll_loop_completely to use loop bounds recorded
Hi,

here is an updated patch. The idea of splitting the loopback edge did not fly. We then remove the edge in cfgcleanup prior to demolishing the loop and we lose track of which basic blocks need updating, because we can no longer get the loop body.

The good news, however, is that I do not need the changed loop depth walking. The infinite recursion I was running into before disappeared. I guess it was another bug that I later fixed properly.

I also looked into what unroll and loop_depth are doing, and they use the IRREDUCIBLE flags only to set the irred_invalidated flag. Also, the use of the IRREDUCIBLE flag within the unrolling itself (to locate the last exit of the loop) is safe WRT the updates we do, so we only need to recompute it once, after all the changes are done. This solves the quadratic time issue.

The pass also works when canonicalization is done on all loops, not just innermost ones, but I would like to enable this separately from this change.

I also updated Java and Fortran for the builtin_unreachable macro. Those are the only constructing builtin_expect that is also used internally. I also noticed that the builtin is missing the CONST flag (it is looping const, which is possible to declare by a combination of const and noreturn), but I will do that incrementally. I am honestly not sure what Ada and Go do here to get around duplicating all this mess, but they don't seem to handle other similar cases either.

The patch now adds a regression on a Fortran testcase that simplifies into:

! { dg-do run }
! Program to check corner cases for DO statements.
program do_1
  implicit none
  integer i, j

  ! limit=HUGE(i), step 1
  j = 0
  do i = HUGE(i) - 10, HUGE(i), 1
    j = j + 1
  end do
  if (j .ne. 11) call abort
end program

Here the loop iterates up to INT_MAX and compiles as:

<bb 3>:
  # i_8 = PHI <2147483637(2), i_9(3)>
  # j_6 = PHI <0(2), j_7(3)>
  # DEBUG j => j_6
  # DEBUG i => i_8
  j_7 = j_6 + 1;
  # DEBUG j => j_7
  i_9 = i_8 + 1;
  # DEBUG i => i_9
  if (i_8 == 2147483647)
    goto <bb 4>;
  else
    goto <bb 3>;

Now we try to estimate the number of iterations as:

Statement i_9 = i_8 + 1; is executed at most 9 (bounded by 9) + 1 times in loop 1.
Loop 1 iterates at most 9 times.

This is one iteration fewer than it ought to be. The problem is that the result of i_9=i_8+1 is undefined on the last iteration, but the program is still valid because the value is not used (it is used only by the PHI on i_8). So this seems like another semi-latent bug in tree-ssa-niter. Any ideas what to do here? I think we need to prove that the value is used in something that matters, i.e. a loop exit test or a memory access, and only bound the number of executions of statements using them.

The patch will also need updating in the gcc.target/i386/l_fma_* testcases. The reason is that we peel the vectorized prologues/epilogues, which was in fact the motivation for this whole patch. The testcases count the number of instructions appearing in them and need compensation for the different cost models of the patch, so I plan to do it for the final version only.

Bootstrapped/regtested x86_64-linux (modulo the regressions above) and also tested with an -O3 bootstrap that passes with -Wno-error.

Honza

	* gcc.dg/tree-ssa/cunroll-1.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-2.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-3.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-4.c: New testcase.
	* gcc.dg/tree-ssa/cunroll-5.c: New testcase.

	* cfgloopmanip.c (unloop): Export.
	* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Estimate also with unknown exit conditional.
(try_unroll_loop_completely): Use max_loop_iterations_int to unroll also loops with low upper bound; handle unlooping of the last loop even when exit conditional is not known; unloop loop that do not loop even if they are not innermost. (canonicalize_loop_induction_variables): Record niter bounds known; try unrolling even if number of iterations is not known; (canonicalize_induction_variables): Handle updating of irreducible loops (tree_unroll_loops_completely): Likewise. * cfgloop.h (unloop): Declare. * f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable. Index: java/builtins.c === *** java/builtins.c (revision 192432) --- java/builtins.c (working copy) *** VMSupportsCS8_builtin (tree method_retur *** 453,458 --- 453,460 #define BUILTIN_NOTHROW 1 #define BUILTIN_CONST 2 + #define BUILTIN_NORETURN 4 + #define BUILTIN_LEAF 4 /* Define a single builtin. */ static void define_builtin (enum built_in_function val, *** define_builtin (enum built_in_function v *** 475,480 --- 477,487 TREE_NOTHROW (decl) = 1; if (flags BUILTIN_CONST) TREE_READONLY (decl) = 1; + if (flags BUILTIN_NORETURN) + TREE_THIS_VOLATILE (decl) = 1; + if (flags
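To make the iteration count in the Fortran testcase concrete, here is a small C++ sketch (not part of the patch). It walks the same range, INT_MAX - 10 up to INT_MAX, with the exit test on the pre-increment value. The names `LoopCounts` and `run_do_loop_sketch` are made up for illustration, and unlike the GIMPLE above the increment is placed after the exit test so that the sketch itself never overflows. Assuming "iterates N times" refers to how often the back edge is taken, the sketch shows 10 back-edge executions for 11 body executions, so a bound of 9 is one too low.

```cpp
#include <climits>

// Hypothetical helper type: how often the loop body ran and how often
// control went back to the top of the loop.
struct LoopCounts
{
  int body_runs;
  int back_edges;
};

// Walks i from INT_MAX - 10 to INT_MAX inclusive, like the DO loop in
// the testcase.  The exit test looks at the current value of i, so the
// increment is skipped on the final iteration and never overflows.
LoopCounts run_do_loop_sketch ()
{
  LoopCounts c = { 0, 0 };
  int i = INT_MAX - 10;
  for (;;)
    {
      c.body_runs++;          // corresponds to j = j + 1
      if (i == INT_MAX)       // exit test on the pre-increment value
        break;
      i = i + 1;              // safe: i < INT_MAX here
      c.back_edges++;
    }
  return c;
}
```

Calling `run_do_loop_sketch ()` gives 11 body runs, matching the `j .ne. 11` check in the testcase.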
Tidy extract_bit_field_1 & co.
Partnering the store_bit_field_1 patch that I just posted, this patch tidies up the extract_bit_field code in the same way.

There is one deliberate behavioural change here. The old code had a single check for cases where the extraction could be done as a simple move. It started:

  if (((bitsize >= BITS_PER_WORD && bitsize == GET_MODE_BITSIZE (mode)
        && bitpos % BITS_PER_WORD == 0)
       || (mode1 != BLKmode
           /* ??? The big endian test here is wrong.  This is correct
              if the value is in a register, and if mode_for_size is not
              the same mode as op0.  This causes us to get unnecessarily
              inefficient code from the Thumb port when -mbig-endian.  */
           && (BYTES_BIG_ENDIAN
               ? bitpos + bitsize == BITS_PER_WORD
               : bitpos == 0)))

The BYTES_BIG_ENDIAN check didn't make sense for memory operands though, because bitpos was based on byte units in that case. That might well be what the comment was complaining about; I'm not sure.

Also, I made the MODE1 computation take failures of mode_for_size into account.

Tested on x86_64-linux-gnu, powerpc64-linux-gnu, mipsisa64-elf (both -EL and -EB) and mipsisa32-elf (also both -EL and -EB). OK to install?

Richard

gcc/ * expmed.c (store_split_bit_field): Update the calls to extract_fixed_bit_field. In the big-endian case, always use the mode of OP0 to count the number of significant bits. (extract_bit_field_1): Remove unit, offset, bitpos and byte_offset from the outermost scope. Express conditions in terms of bitnum rather than offset, bitpos and byte_offset. Move the computation of MODE1 to the block that needs it. Use MODE unless the TMODE-based mode_for_size calculation succeeds. Split the plain move cases into two, one for memory accesses and one for register accesses. Generalize the memory case, freeing it from the old register-based endian checks. Move the INT_MODE calculation above the code that needs it. Use simplify_gen_subreg to handle multiword OP0s. If the field still spans several words, pass it directly to extract_split_bit_field. 
Assume after that point that both targets and register sources fit within a word. Replace x-prefixed variables with non-prefixed forms. Compute the bitpos for ext(z)v register operands directly in the chosen unit size, rather than going through an intermediate BITS_PER_WORD unit size. Simplify the containment check used when forcing OP0 into a register. Update the call to extract_fixed_bit_field. (extract_fixed_bit_field): Replace the bitpos and offset parameters with a single bitnum parameter, of the same form as extract_bit_field. Assume that OP0 contains the full field. Simplify the memory offset calculation and containment check for volatile bitfields. Make the offset explicit when volatile bitfields force a misaligned access. Remove WARNED and fix long lines. Assert that the processed OP0 has an integral mode. (store_split_bit_field): Update the call to store_fixed_bit_field. Index: gcc/expmed.c === --- gcc/expmed.c2012-10-14 11:44:27.359686486 +0100 +++ gcc/expmed.c2012-10-14 11:44:41.770685683 +0100 @@ -57,7 +57,6 @@ static void store_split_bit_field (rtx, rtx); static rtx extract_fixed_bit_field (enum machine_mode, rtx, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, rtx, int, bool); static rtx mask_rtx (enum machine_mode, int, int, int); static rtx lshift_value (enum machine_mode, rtx, int, int); @@ -1114,28 +1113,21 @@ store_split_bit_field (rtx op0, unsigned if (BYTES_BIG_ENDIAN) { - int total_bits; - - /* We must do an endian conversion exactly the same way as it is -done in extract_bit_field, so that the two calls to -extract_fixed_bit_field will have comparable arguments. */ - if (!MEM_P (value) || GET_MODE (value) == BLKmode) - total_bits = BITS_PER_WORD; - else - total_bits = GET_MODE_BITSIZE (GET_MODE (value)); - /* Fetch successively less significant portions. 
*/ if (CONST_INT_P (value)) part = GEN_INT (((unsigned HOST_WIDE_INT) (INTVAL (value)) (bitsize - bitsdone - thissize)) (((HOST_WIDE_INT) 1 thissize) - 1)); else - /* The args are chosen so that the last part includes the - lsb. Give extract_bit_field the value it needs (with - endianness compensation) to fetch the piece we want. */ - part =
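The "fetch successively less significant portions" step for a constant value in the hunk above can be illustrated with a standalone C++ sketch. This is only an illustration under simplifying assumptions: the function name `split_msb_first_sketch` is made up, a fixed `chunk` size stands in for the per-step `thissize` that the real code derives from word and alignment boundaries, and the shift and mask operators are the ones the expression appears to need. It requires 1 <= chunk <= 63 and bitsize <= 64.

```cpp
#include <cstdint>
#include <vector>

// Splits the low BITSIZE bits of VALUE into pieces of at most CHUNK bits,
// most significant piece first.  Each piece is obtained by shifting the
// not-yet-fetched low bits away and masking to the piece size.
// Preconditions (not checked): 1 <= chunk <= 63 and bitsize <= 64.
std::vector<std::uint64_t>
split_msb_first_sketch (std::uint64_t value, unsigned bitsize, unsigned chunk)
{
  std::vector<std::uint64_t> parts;
  unsigned bitsdone = 0;
  while (bitsdone < bitsize)
    {
      unsigned remaining = bitsize - bitsdone;
      unsigned thissize = remaining < chunk ? remaining : chunk;
      std::uint64_t mask = (std::uint64_t (1) << thissize) - 1;
      parts.push_back ((value >> (bitsize - bitsdone - thissize)) & mask);
      bitsdone += thissize;
    }
  return parts;
}
```

For example, a 16-bit value 0xABCD split into 8-bit chunks yields 0xAB followed by 0xCD; a field whose size is not a multiple of the chunk size ends with a shorter final piece that contains the least significant bit.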
[patch] Back-port ifcvt.c changes from PR54146
Hello,

This patch is a back-port of one of the scalability improvements I made to perform, well, maybe not well but at least not so poorly on the test case of PR54146, which has an extremely large function.

The problem in ifcvt.c has two parts. The first is that clearing several arrays of size(max_reg_num) for every basic block slowed things down. The second is that this memory was being allocated with alloca, so that a sufficiently large function could blow out the stack.

The latter problem has now also been found by a user trying to compile a sensible and well-known piece of software (see http://gcc.gnu.org/ml/gcc/2012-10/msg00202.html). This code compiles with older GCC releases, so this problem is a regression. To fix the problem in GCC 4.7, I'd like to propose this back-port.

Bootstrapped and tested with release and default development checking on x86_64-unknown-linux-gnu and on powerpc64-unknown-linux-gnu. The patch has also already spent more than two months on the trunk without problems.

OK for the GCC 4.7 release branch? Maybe also for the GCC 4.6 branch after testing?

Ciao!
Steven

Attachment: PR54146_ifcvt_47.diff
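The attached patch is not shown in this archive, so the following C++ sketch is not a description of what it does. It only illustrates one generic way to avoid both problems named above: the scratch array lives on the heap and is allocated once (no alloca), and a reset between basic blocks clears only the entries that were actually touched rather than the whole max_reg_num-sized array. The class name `RegScratchSketch` and all of its members are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-register scratch table, reused across basic blocks.
class RegScratchSketch
{
  std::vector<int> value_;            // heap storage, sized once
  std::vector<std::size_t> touched_;  // indices that may be non-zero

public:
  explicit RegScratchSketch (std::size_t max_reg) : value_ (max_reg, 0) {}

  void set (std::size_t regno, int v)
  {
    // Record the index the first time it leaves the cleared state.
    if (value_[regno] == 0 && v != 0)
      touched_.push_back (regno);
    value_[regno] = v;
  }

  int get (std::size_t regno) const { return value_[regno]; }

  // Cost is proportional to the number of touched entries, not to the
  // total number of registers.
  void reset ()
  {
    for (std::size_t k = 0; k < touched_.size (); ++k)
      value_[touched_[k]] = 0;
    touched_.clear ();
  }

  std::size_t touched_count () const { return touched_.size (); }
};
```

With a table sized for a million registers, a block that sets two entries pays for two stores on reset instead of a million.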
[SH] Document function attributes
Hello, The attached patch adds documentation for SH specific function attributes which haven't been documented yet. Tested with 'make info dvi pdf'. OK? Cheers, Oleg gcc/ChangeLog: * config/sh/sh.c: Update function attribute comments. * doc/extend.texi (function_vector): Rephrase SH2A specific part. (nosave_low_regs, renesas, trapa_handler): Document SH specific attributes. (sp_switch, trap_exit): Add to index. Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c (revision 192417) +++ gcc/config/sh/sh.c (working copy) @@ -9451,30 +9451,42 @@ return; } -/* Supported attributes: +/*-- +/* Target specific attributes + Supported attributes are: - interrupt_handler -- specifies this function is an interrupt handler. + * interrupt_handler + Specifies this function is an interrupt handler. - trapa_handler - like above, but don't save all registers. + * trapa_handler + Like interrupt_handler, but don't save all registers. - sp_switch -- specifies an alternate stack for an interrupt handler - to run on. + * sp_switch + Specifies an alternate stack for an interrupt handler to run on. - trap_exit -- use a trapa to exit an interrupt function instead of - an rte instruction. + * trap_exit + Use a trapa to exit an interrupt function instead of rte. - nosave_low_regs - don't save r0..r7 in an interrupt handler. - This is useful on the SH3 and upwards, - which has a separate set of low regs for User and Supervisor modes. - This should only be used for the lowest level of interrupts. Higher levels - of interrupts must save the registers in case they themselves are - interrupted. + * nosave_low_regs + Don't save r0..r7 in an interrupt handler function. + This is useful on SH3* and SH4*, which have a separate set of low + regs for user and privileged modes. + This is mainly to be used for non-reentrant interrupt handlers (i.e. + those that run with interrupts disabled and thus can't be + interrupted thenselves). 
- renesas -- use Renesas calling/layout conventions (functions and - structures). + * renesas + Use Renesas calling/layout conventions (functions and structures). - resbank -- In case of an ISR, use a register bank to save registers - R0-R14, MACH, MACL, GBR and PR. This is useful only on SH2A targets. + * resbank + In case of an interrupt handler function, use a register bank to + save registers R0-R14, MACH, MACL, GBR and PR. + This is available only on SH2A targets. + + * function_vector + Declares a function to be called using the TBR relative addressing + mode. Takes an argument that specifies the slot number in the table + where this function can be looked up by the JSR/N @@(disp8,TBR) insn. */ /* Handle a 'resbank' attribute. */ Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 192417) +++ gcc/doc/extend.texi (working copy) @@ -2682,17 +2682,16 @@ the function vector has a limited size (maximum 128 entries on the H8/300 and 64 entries on the H8/300H and H8S) and shares space with the interrupt vector. -In SH2A target, this attribute declares a function to be called using the +On SH2A targets, this attribute declares a function to be called using the TBR relative addressing mode. The argument to this attribute is the entry number of the same function in a vector table containing all the TBR -relative addressable functions. For the successful jump, register TBR -should contain the start address of this TBR relative vector table. -In the startup routine of the user application, user needs to care of this -TBR register initialization. The TBR relative vector table can have at -max 256 function entries. The jumps to these functions will be generated -using a SH2A specific, non delayed branch instruction JSR/N @@(disp8,TBR). -You must use GAS and GLD from GNU binutils version 2.7 or later for -this attribute to work correctly. +relative addressable functions. 
For correct operation the TBR must be setup +accordingly to point to the start of the vector table before any functions with +this attribute are invoked. Usually a good place to do the initialization is +the startup routine. The TBR relative vector table can have at max 256 function +entries. The jumps to these functions will be generated using a SH2A specific, +non delayed branch instruction JSR/N @@(disp8,TBR). You must use GAS and GLD +from GNU binutils version 2.7 or later for this attribute to work correctly. Please refer the example of M16C target, to see the use of this attribute while declaring a function, @@ -3251,6 +3250,13 @@ take function pointer arguments. The @code{nothrow} attribute is not implemented in GCC versions earlier than 3.3. +@item nosave_low_regs +@cindex
[patch] Fix PR rtl-optimization/54870
Hi,

This is the execution failure of gfortran.dg/array_constructor_4.f90 in 64-bit mode on SPARC/Solaris at -O3. The dse2 dump for the reduced testcase reads:

dse: local deletions = 0, global deletions = 1, spill deletions = 0
starting the processing of deferred insns
deleting insn with uid = 25.
ending the processing of deferred insns

but the memory location stored to:

(insn 25 27 154 2 (set (mem/c:SI (plus:DI (reg/f:DI 30 %fp)
                (const_int 2039 [0x7f7])) [6 A.1+16 S4 A64])
        (reg:SI 1 %g1 [136])) array_constructor_4.f90:4 61 {*movsi_insn}
     (nil))

is read by a subsequent call to memcpy. It turns out that this memcpy call is generated for an aggregate assignment:

MEM[(c_char * {ref-all})i] = MEM[(c_char * {ref-all})A.17];

Note the A.1 in the store and the A.17 in the load. A.1 and A.17 are aggregate variables sharing the same stack slot. A.17 is correctly marked as addressable because of the call to memcpy, but A.1 isn't since its address isn't taken, and DSE can optimize away (since 4.7) stores if their MEM_EXPR doesn't escape.

The store is reaching the load because an intermediate store into A.17:

(insn 78 76 82 6 (set (mem/c:SI (plus:DI (reg/f:DI 30 %fp)
                (const_int 2039 [0x7f7])) [6 A.17+16 S4 A64])
        (reg:SI 1 %g1 [136])) array_constructor_4.f90:14 61 {*movsi_insn}
     (nil))

has been deleted by postreload as a no-op (because redundant), thus making A.1 partially escape without marking it as addressable.

The attached patch uses cfun->gimple_df->escaped.vars to plug the hole: when mark_addressable is called during RTL expansion and the decl is partitioned, all the variables in the partition are added to the bitmap. Then can_escape is changed to additionally test cfun->gimple_df->escaped.vars.

Tested on x86-64/Linux and SPARC64/Solaris, OK for mainline and 4.7 branch?

2012-10-14 Eric Botcazou ebotca...@adacore.com

	PR rtl-optimization/54870
	* dse.c (can_escape): Test cfun->gimple_df->escaped.vars as well.
	* gimplify.c (mark_addressable): If this is a partition decl, add all the variables in the partition to cfun->gimple_df->escaped.vars.

-- 
Eric Botcazou

Index: dse.c
===
--- dse.c (revision 192353)
+++ dse.c (working copy)
@@ -990,6 +990,7 @@ delete_dead_store_insn (insn_info_t insn
 }
 
 /* Check if EXPR can possibly escape the current function scope. */
+
 static bool
 can_escape (tree expr)
 {
@@ -998,7 +999,10 @@ can_escape (tree expr)
     return true;
   base = get_base_address (expr);
   if (DECL_P (base)
-      && !may_be_aliased (base))
+      && !may_be_aliased (base)
+      && !(cfun->gimple_df->escaped.vars
+           && bitmap_bit_p (cfun->gimple_df->escaped.vars,
+                            DECL_PT_UID (base))))
     return false;
   return true;
 }
Index: gimplify.c
===
--- gimplify.c (revision 192353)
+++ gimplify.c (working copy)
@@ -116,6 +116,26 @@ mark_addressable (tree x)
       && TREE_CODE (x) != RESULT_DECL)
     return;
   TREE_ADDRESSABLE (x) = 1;
+
+  /* If this is a partitioned decl, we need to mark all the variables in the
+     partition as escaped. This is needed because a store into one of them
+     can be replaced with a store into another, and this may not change the
+     outcome of the escape analysis for DSE to work properly. */
+  if (TREE_CODE (x) == VAR_DECL
+      && !TREE_STATIC (x)
+      && cfun->gimple_df != NULL
+      && cfun->gimple_df->decls_to_pointers != NULL)
+    {
+      void *namep
+        = pointer_map_contains (cfun->gimple_df->decls_to_pointers, x);
+      if (namep)
+        {
+          struct ptr_info_def *pi = get_ptr_info (*(tree *)namep);
+          if (cfun->gimple_df->escaped.vars == NULL)
+            cfun->gimple_df->escaped.vars = BITMAP_GGC_ALLOC ();
+          bitmap_ior_into (cfun->gimple_df->escaped.vars, pi->pt.vars);
+        }
+    }
 }
 
 /* Return a hash value for a formal temporary table entry. */
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Sun, Oct 14, 2012 at 9:02 AM, Paolo Bonzini wrote:
> Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV notes that refer to a dead pseudo?

I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this:

a = b + 3 // b dies here
c = a {REG_EQUAL b+3}

The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic.

I also don't want to compute DF_LR taking EQ_USES into account as real uses for liveness, because that involves recomputing and enlarging the DF_LR sets (all of them, both globally and locally) before LRRD and after LRRD. That's why I implemented the quick-and-dirty liveness computation for the notes: It's non-intrusive on DF_LR and it's cheap.

Ciao!
Steven
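The two-insn example can be run through a toy liveness scan to see what a "drop notes that mention a dead pseudo" rule would do. The C++ sketch below is a stand-in only: `InsnSketch` and `notes_with_dead_regs` are invented names, registers are plain strings, and the scan handles a single straight-line sequence with a given live-out set, which is far simpler than GCC's DF framework. It flags every insn whose note mentions a register that is not live just before that insn when only real uses count.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy instruction: one defined register, the registers it really uses,
// and the registers mentioned only in its REG_EQUAL-style note.
struct InsnSketch
{
  std::string def;
  std::vector<std::string> uses;
  std::vector<std::string> note_uses;
};

// Backward scan over a straight-line sequence.  LIVE starts as the set of
// registers live after the last insn.  Returns, per insn, whether its note
// mentions a register that is dead at that insn.
std::vector<bool>
notes_with_dead_regs (const std::vector<InsnSketch> &insns,
                      std::set<std::string> live)
{
  std::vector<bool> flagged (insns.size (), false);
  for (std::size_t k = insns.size (); k-- > 0; )
    {
      const InsnSketch &insn = insns[k];
      live.erase (insn.def);
      live.insert (insn.uses.begin (), insn.uses.end ());
      // LIVE now holds the registers live just before this insn.
      for (std::size_t n = 0; n < insn.note_uses.size (); ++n)
        if (live.count (insn.note_uses[n]) == 0)
          flagged[k] = true;
    }
  return flagged;
}
```

For `a = b + 3` followed by `c = a {REG_EQUAL b+3}` with only `c` live afterwards, the second insn is flagged because `b` is no longer live there. That is exactly the note described above as valid and worth keeping, so a purely liveness-based rule would be pessimistic for this case.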
Committed, MMIX: fix INCOMING_REGNO / OUTGOING_REGNO for return-value
Back then, I must've missed that INCOMING_REGNO / OUTGOING_REGNO are used to map return-value-register/s too. Fixes:

FAIL: gcc.dg/builtin-apply4.c execution test
...
FAIL: gcc.dg/builtin-return-1.c execution test
...
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -O0 execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -O1 execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -O3 -fomit-frame-pointer execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -Os execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O0 execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O1 execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O3 -fomit-frame-pointer execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O3 -g execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -Os execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test

Committed.

	* config/mmix/mmix.c (mmix_opposite_regno): Handle the return-value register too.

--- gcc/config/mmix/mmix.c.prev 2012-10-09 02:00:51.0 +0200
+++ gcc/config/mmix/mmix.c 2012-10-14 00:59:54.0 +0200
@@ -392,15 +392,33 @@ mmix_conditional_register_usage (void)
 
 /* INCOMING_REGNO and OUTGOING_REGNO worker function.
    Those two macros must only be applied to function argument
-   registers. FIXME: for their current use in gcc, it'd be better
-   with an explicit specific additional FUNCTION_INCOMING_ARG_REGNO_P
-   a'la TARGET_FUNCTION_ARG / TARGET_FUNCTION_INCOMING_ARG instead of
+   registers and the function return value register for the opposite
+   use. FIXME: for their current use in gcc, it'd be better with an
+   explicit specific additional FUNCTION_INCOMING_ARG_REGNO_P a'la
+   TARGET_FUNCTION_ARG / TARGET_FUNCTION_INCOMING_ARG instead of
    forcing the target to commit to a fixed mapping and for any
-   unspecified register use. */
+   unspecified register use. Particularly when thinking about the
+   return-value, it is better to imagine INCOMING_REGNO and
+   OUTGOING_REGNO as named CALLEE_TO_CALLER_REGNO and INNER_REGNO as
+   named CALLER_TO_CALLEE_REGNO because the direction. The incoming
+   and outgoing is from the perspective of the parameter-registers,
+   but the same macro is (must be, lacking an alternative like
+   suggested above) used to map the return-value-register from the
+   same perspective. To make directions even more confusing, the macro
+   MMIX_OUTGOING_RETURN_VALUE_REGNUM holds the number of the register
+   in which to return a value, i.e. INCOMING_REGNO for the return-value-
+   register as received from a called function; the return-value on the
+   way out. */
 
 int
 mmix_opposite_regno (int regno, int incoming)
 {
+  if (incoming && regno == MMIX_OUTGOING_RETURN_VALUE_REGNUM)
+    return MMIX_RETURN_VALUE_REGNUM;
+
+  if (!incoming && regno == MMIX_RETURN_VALUE_REGNUM)
+    return MMIX_OUTGOING_RETURN_VALUE_REGNUM;
+
   if (!mmix_function_arg_regno_p (regno, incoming))
     return regno;

brgds, H-P
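The return-value part of the mapping added by this patch can be tried in isolation with the C++ sketch below. It is a simplified stand-in, not the MMIX back end: the two register numbers are invented placeholders (the real values live in the MMIX headers and are not shown here), the function name is made up, and the argument-register handling that follows in the real function is left out, so every other register is returned unchanged.

```cpp
// Placeholder register numbers, chosen only for illustration.
const int SKETCH_RETURN_VALUE_REGNUM = 15;
const int SKETCH_OUTGOING_RETURN_VALUE_REGNUM = 231;

// Maps the return-value register between its two numberings, following
// the two conditions added by the patch.  Argument-register mapping is
// deliberately omitted; any other register maps to itself here.
int opposite_regno_sketch (int regno, bool incoming)
{
  if (incoming && regno == SKETCH_OUTGOING_RETURN_VALUE_REGNUM)
    return SKETCH_RETURN_VALUE_REGNUM;

  if (!incoming && regno == SKETCH_RETURN_VALUE_REGNUM)
    return SKETCH_OUTGOING_RETURN_VALUE_REGNUM;

  return regno;
}
```

Applying the function once in each direction returns the starting register, which is the round trip the builtin-apply and builtin-return tests depend on.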
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
> I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this:
>
> a = b + 3 // b dies here
> c = a {REG_EQUAL b+3}
>
> The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic.

But if you have a REG_DEAD note for b on the first insn, then you cannot rematerialize the REG_EQUAL note after it, otherwise bad things can happen. See PR rtl-optimization/51505 for an example.

-- 
Eric Botcazou
Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)
On Sun, Oct 14, 2012 at 11:25 PM, Eric Botcazou wrote:
>> I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a pseudo that isn't live and still be valid. Consider a simple example like this:
>>
>> a = b + 3 // b dies here
>> c = a {REG_EQUAL b+3}
>>
>> The REG_EQUAL note is valid and may help optimization. Removing it just because b is dead at that point would be unnecessarily pessimistic.
>
> But if you have a REG_DEAD note for b on the first insn, then you cannot rematerialize the REG_EQUAL note after it, otherwise bad things can happen. See PR rtl-optimization/51505 for an example.

That's not the case here. The register is only dead because the webizer renamed one of its live ranges but forgot to rename the EQ_NOTE use.

Ciao!
Steven
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On Sat, Oct 13, 2012 at 4:26 PM, Tobias Schlüter tobias.schlue...@physik.uni-muenchen.de wrote:
> Hi,
>
> first a question also to non-gfortraners: if I want to use std::map, where do I "#include <map>"? In system.h?
>
> Now to the patch-specific part: in this PR, module files are produced with random changes because the order in which symbols are written can depend on the memory layout. This patch fixes this by recording which symbols need to be written and then processing them in order. The patch doesn't make the more involved effort of putting all symbols into the module in an easily predicted order; instead it only makes sure that the order remains fixed across compiler invocations. The reason why the former is difficult is that during the process of writing a symbol, it can turn out that other symbols will have to be written as well (say, because they appear in array specifications). Since the module-writing code determines which symbols to output while actually writing the file, recording all the symbols that need to be written before writing to the file would mean a lot of surgery.
>
> I'm putting forward two patches. One uses a C++ map to very concisely build up and handle the ordered list of symbols. This has three problems:
> 1) gfortran maintainers may not want C++isms (even though in this case it's very localized, and in my opinion very transparent), and
> 2) it can't be backported to old release branches which are still compiled as C. Joost expressed interest in a backport.
> 3) I don't know where to #include <map> (see above)
>
> Therefore I also propose a patch where I added the necessary ~50 lines of boilerplate code and added the necessary traversal function to use gfortran's GFC_BBT to maintain the ordered tree of symbols.
>
> Both patches pass the testsuite and Joost confirms that they fix the problem with CP2K. I also verified with a few examples that they both produce identical .mod files, as they should.
>
> Is the C++ patch, modified to do the #include correctly, ok for the trunk? If not, the C-only patch? Can I put the C-only patch on the release branches? And which?

Hi,

I'm pleasantly surprised that you managed to fix this PR with so little code!

- Personally, I'd prefer the C++ version; the C++ standard library is widely used and documented, and using it in favour of rolling our own is IMHO a good idea.

- I'd be wary wrt backporting; in my experience the module.c code is somewhat fragile and easily causes regressions. In any case, AFAICS PR 51727 is not a regression.

- I think one could go a step further and get rid of the BBT stuff in pointer_info, replacing it with two file-level maps

std::map<void*, pointer_info*> pmap; // Or could be std::unordered_map if available
std::map<int, pointer_info*> imap;

So when writing a module, use pmap similar to how the pointer_info BBT is used now, and then use imap to get a consistent order per your patch. When reading, lookup/create mostly via imap, creating a pmap entry also when creating a new imap entry; this avoids having to do a brute-force search when looking up via pointer when reading (see find_pointer2()).

(This 3rd point is mostly an idea for further work, and is not meant as a requirement for accepting the patch)

Ok for trunk, although wait for a few days in case there is a storm of protest on the C vs. C++ issue from other gfortran maintainers.

-- 
Janne Blomqvist
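The two-map idea in the third point can be sketched in a few lines of C++. This is only an illustration of the data-structure shape, not gfortran code: `PointerInfoSketch` is a cut-down stand-in for pointer_info, `PointerTableSketch` and its member functions are invented names, and here the integer-keyed map owns the records while the pointer-keyed map refers into it. The point it demonstrates is that iterating the integer-keyed map gives an order that does not depend on the pointer values, and hence not on memory layout.

```cpp
#include <map>
#include <vector>

// Cut-down stand-in for gfortran's pointer_info.
struct PointerInfoSketch
{
  int integer;
  const void *pointer;
};

// Hypothetical table holding both lookup directions.  Not meant to be
// copied: pmap stores addresses of the records owned by imap.
class PointerTableSketch
{
  std::map<int, PointerInfoSketch> imap;
  std::map<const void *, PointerInfoSketch *> pmap;
  int next_integer;

public:
  PointerTableSketch () : next_integer (1) {}

  // Writing side: look up by pointer, assigning the next integer to a
  // pointer that has not been seen before.
  int get_integer (const void *p)
  {
    std::map<const void *, PointerInfoSketch *>::iterator it = pmap.find (p);
    if (it != pmap.end ())
      return it->second->integer;
    int id = next_integer++;
    PointerInfoSketch &info = imap[id];
    info.integer = id;
    info.pointer = p;
    pmap[p] = &info;
    return id;
  }

  // Reading side: look up by integer without any brute-force search.
  const void *get_pointer (int id) const
  {
    std::map<int, PointerInfoSketch>::const_iterator it = imap.find (id);
    return it == imap.end () ? 0 : it->second.pointer;
  }

  // The order in which records would be written: ascending integer.
  std::vector<int> write_order () const
  {
    std::vector<int> order;
    for (std::map<int, PointerInfoSketch>::const_iterator it = imap.begin ();
         it != imap.end (); ++it)
      order.push_back (it->first);
    return order;
  }
};
```

Because integers are handed out in the order symbols are first referenced, the write order is the same from one compiler invocation to the next even if the addresses of the underlying objects differ.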
Re: PR fortran/51727: make module files reproducible, question on C++ in gcc
On Mon, Oct 15, 2012 at 12:35:27AM +0300, Janne Blomqvist wrote:
> On Sat, Oct 13, 2012 at 4:26 PM, Tobias Schlüter
>> I'm putting forward two patches. One uses a C++ map to very concisely build up and handle the ordered list of symbols. This has three problems:
>> 1) gfortran maintainers may not want C++isms (even though in this case it's very localized, and in my opinion very transparent), and

Even if you prefer C++isms, why don't you go for hash-table.h? std::map, at least with the default allocator, will just crash the compiler if malloc returns NULL (remember that we build with -fno-exceptions), while when you use hash-table.h (or hashtab.h) you get proper OOM diagnostics.

Jakub
Re: [testsuite] gcc.target/arm/div64-unwinding.c: xfail for linux
On 10 October 2012 22:57, Richard Earnshaw rearn...@arm.com wrote:
> On 10/10/12 03:11, Janis Johnson wrote:
>> On 10/09/2012 07:39 AM, Richard Earnshaw wrote:
>>> On 27/09/12 01:02, Janis Johnson wrote:
>>>> Test gcc.target/arm/div64-unwinding.c is known to fail for GNU/Linux targets, as described in PR54732. This patch adds an XFAIL.
>>>>
>>>> Tested on arm-none-eabi and arm-none-linux-gnueabi, checked in on trunk.
>>>>
>>>> Janis
>>>>
>>>> gcc-20120926-5
>>>>
>>>> 2012-09-26 Janis Johnson jani...@codesourcery.com
>>>>
>>>> 	* gcc.target/arm/div64-unwinding.c: XFAIL for GNU/Linux.
>>>>
>>>> Index: gcc.target/arm/div64-unwinding.c
>>>> ===
>>>> --- gcc.target/arm/div64-unwinding.c(revision 191765)
>>>> +++ gcc.target/arm/div64-unwinding.c(working copy)
>>>> @@ -1,6 +1,7 @@
>>>>  /* Performing a 64-bit division should not pull in the unwinder. */
>>>>  
>>>> -/* { dg-do run } */
>>>> +/* The test is expected to fail for GNU/Linux; see PR54723. */
>>>> +/* { dg-do run { xfail *-*-linux* } } */
>>>>  /* { dg-options "-O0" } */
>>>>  
>>>>  #include <stdlib.h>
>>>
>>> I don't like this. To me, XFAIL means "there's a bug here, but we're not too worried about it". The behaviour on linux targets is correct, so this test should either PASS or be skipped.
>>
>> Richard,
>>
>> The impression I got from Julian is "there's a bug here, but we're not too worried about it". If you think it should be skipped instead then I'll gladly change the test.
>>
>> Janis
>
> I don't believe there's a bug here. The ARM EABI defines __aeabi_idiv0 as a hook that will be called if division by zero occurs. While the default implementation simply raises SIGFPE on linux, it is perfectly possible to provide your own definition of this hook and then throw() a C++ exception. In order to do that you'd need unwind information in the divdi implementation ([u]divsi tailcalls the hook).
>
> Technically you could argue the same for bare metal, but in that case the arguments against the code bloat outweigh this very small corner case and users wanting this will have to rebuild their support code. On linux, I think the presence of the unwind information is correct, since the code bloat problem is very much a secondary concern.
>
> So yes, please could you make the test be skipped on linux.

Julian's patch turns off the unwinding information for all ARM systems including Linux. The test currently fails as something else (glibc?) ends up pulling in the unwinder.

-- Michael
[lra] merged with trunk @192442
LRA branch was merged with trunk @192442. Committed as rev. 192446.
Re: [PATCH] Fix gcov handling directories with periods
On Sat, Oct 13, 2012 at 1:11 PM, Andreas Schwab sch...@linux-m68k.org wrote:
> Ian Lance Taylor i...@google.com writes:
>> Suppose you drop this into include/libiberty.h:
>>
>> #ifdef __cplusplus
>> inline char *lbasename(char *s) { return const_cast<char*>(lbasename (s)); }
>> #endif
>
> That doesn't work:
>
> ../../gcc/libcpp/../include/libiberty.h: In function ‘char* lbasename(char*)’:
> ../../gcc/libcpp/../include/libiberty.h:123:31: error: declaration of C function ‘char* lbasename(char*)’ conflicts with
> ../../gcc/libcpp/../include/libiberty.h:121:20: error: previous declaration ‘const char* lbasename(const char*)’ here

Hmmm, of course. OK, your patch with CONST_CAST is OK.

Thanks.

Ian
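The error above comes from trying to overload a function that has C linkage; two functions with the same name cannot both be `extern "C"`. A minimal C++ sketch of the alternative, casting the result at the point of use through a differently named wrapper, is shown below. Everything here is illustrative: `sketch_lbasename` is a simplified stand-in that only understands '/' separators (the real libiberty lbasename is not reproduced), and `sketch_lbasename_mutable` is a made-up name, not what the accepted patch uses.

```cpp
#include <cstring>

// Simplified stand-in with C linkage: returns a pointer to the part of
// NAME after the last '/', or NAME itself if there is no '/'.
extern "C" const char *sketch_lbasename (const char *name)
{
  const char *slash = std::strrchr (name, '/');
  return slash ? slash + 1 : name;
}

// Wrapper for callers holding a writable buffer.  It has a different
// name, so it does not clash with the C-linkage declaration, and the
// cast is sound because the result points into the caller's own buffer.
inline char *sketch_lbasename_mutable (char *name)
{
  return const_cast<char *> (sketch_lbasename (name));
}
```

With a writable buffer holding "dir.d/file.c", the wrapper returns a pointer to "file.c" inside that same buffer; the period in the directory name plays no part in finding the base name.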
Ping^2: RFA: Process '*' in '@'-output-template alternatives
The following patch is still awaiting review: 2011-09-19 Jorn Rennecke joern.renne...@arc.com * genoutput.c (process_template): Process '*' in '@' alternatives. * doc/md.texi (node Output Statement): Provide example for the above. http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01422.html
Ping: RFA: Improve doloop_begin support
2012-09-26 Jorn Rennecke joern.renne...@arc.com * loop-doloop.c (doloop_modify): Pass doloop_end pattern to gen_doloop_begin. * loop-doloop.c (doloop_optimize): Pass flag to indicate if loop is entered at top to gen_doloop_end. * config/arm/thumb2.md (doloop_end): Accept extra operand. * config/bfin/bfin.md (doloop_end): Likewise. * config/c6x/c6x.md (doloop_end): Likewise. * config/ia64/ia64.md (doloop_end): Likewise. * config/mep/mep.md (doloop_begin, doloop_end): Likewise. * config/rs6000/rs6000.md (doloop_end): Likewise. * config/s390/s390.md (doloop_end): Likewise. * config/sh/sh.md (doloop_end): Likewise. * config/spu/spu.md (doloop_end): Likewise. * config/tilegx/tilegx.md (doloop_end): Likewise. * config/tilepro/tilepro.md (doloop_end): Likewise. * doc/md.texi (doloop_end): Document new operand. http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01807.html
Ping: RFA: Fix OP_INOUT handling of web.c:union_match_dups
2012-10-02 Joern Rennecke joern.renne...@embecosm.com * web.c (union_match_dups): Properly handle OP_INOUT match_dups. http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00189.html
Ping: RFA: add lock_length attribute to break branch-shortening cycles
2012-10-04 Joern Rennecke joern.renne...@embecosm.com * final.c (get_attr_length_1): Use direct recursion rather than calling get_attr_length. (get_attr_lock_length): New function. (INSN_VARIABLE_LENGTH_P): Define. (shorten_branches): Take HAVE_ATTR_lock_length into account. Don't overwrite non-delay slot insn lengths with the lengths of delay slot insns with same uid. * genattrtab.c (lock_length_str): New variable. (make_length_attrs): New parameter base. (main): Initialize lock_length_str. Generate lock_lengths attributes. * genattr.c (gen_attr): Emit declarations for lock_length attribute related functions. * doc/md.texi (node Insn Lengths): Document lock_length attribute. http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00383.html