Re: failed attempt: retain identifier length from frontend to backend
On Mon, Aug 20, 2012 at 7:03 PM, Dimitrios Apostolou ji...@gmx.net wrote:

Hello, my last attempt at improving something serious was about three weeks ago: trying to keep the lengths of all strings parsed in the frontend for the whole compilation phase, until the assembly output. I was hoping that would help us use faster hashes (knowing the length allows hashing 4 or 8 bytes per iteration), quicker strcmp in various places, and fewer strlen() calls, which show up especially on -g3 compilations that store huge macro strings.

I'll post no patch here, since what I currently have is a mess in 3 different branches and most of it doesn't even compile. I tried various approaches. First I tried adding an extra length parameter to all relevant functions, starting from the assembly generation and working my way upwards. This got too complex, and I'd really like to ask if you see any merit in changing target-specific hooks and macros to actually accept length as an argument (e.g. ASM_OUTPUT_*) or return it (e.g. ASM_GENERATE_*). The changes seemed too intrusive for me to continue.

But seeing that the identifier length is there inside struct ht_identifier (or cpp_hashnode) and not lost, I tried the approach of having the length at str[-4] for all identifiers. To achieve this I changed ht_identifier to store str with the flexible array hack. Unfortunately I hadn't noticed that ht_identifier is part of tree_node and also part of too many other structs, so changing all those structs to have variable size was not without side effects. In the end it compiled, but I got crashes all over, and I'm sure I didn't do things right, since I broke things like the static assert in libcpp/identifiers.c, which I don't even understand:

/* We don't need a proxy since the hash table's identifier comes first
   in cpp_hashnode.  However, in case this is ever changed, we have a
   static assertion for it.  */
extern char proxy_assertion_broken[offsetof (struct cpp_hashnode, ident) == 0 ? 1 : -1];

Anyway, the last attempt was to decouple ht_identifier completely from trees and other structs by storing a pointer to it, but I was pretty worn out and quickly gave up after getting errors in gengtype-generated files that I didn't even know how to handle. Was all this project too ambitious? I'd appreciate any input.

I think the proper thing would indeed have been to pass down the length of the string to the relevant functions.

Richard.

Thanks, Dimitris
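The layout Dimitrios describes — the length stored adjacent to the characters via a flexible array member, so a hash can consume a word at a time instead of scanning for the NUL — can be sketched as follows. This is a hypothetical standalone illustration (a made-up `ident` struct and an FNV-1a-style hash), not GCC's actual ht_identifier code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical identifier node using the flexible-array trick from the
   mail: the length lives directly before the characters.  */
struct ident {
  uint32_t len;
  char str[];                   /* flexible array member (C99) */
};

static struct ident *ident_new (const char *s)
{
  size_t len = strlen (s);
  struct ident *id = malloc (sizeof *id + len + 1);
  id->len = (uint32_t) len;
  memcpy (id->str, s, len + 1);
  return id;
}

/* With the length known we can hash 4 bytes per iteration instead of
   one byte at a time while looking for the terminating NUL.  The point
   of the mail is this access pattern; the concrete hash function here
   (FNV-1a over 32-bit words) is just an example.  */
static uint32_t ident_hash (const struct ident *id)
{
  uint32_t h = 2166136261u;
  uint32_t i, w;
  for (i = 0; i + 4 <= id->len; i += 4)
    {
      memcpy (&w, id->str + i, 4);      /* alignment-safe 4-byte load */
      h = (h ^ w) * 16777619u;
    }
  for (; i < id->len; i++)              /* trailing 1-3 bytes */
    h = (h ^ (unsigned char) id->str[i]) * 16777619u;
  return h;
}
```

The same trick is what makes faster strcmp possible too: with both lengths known, unequal-length identifiers can be rejected without touching the characters at all.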
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 10:29 PM, Jan Hubicka hubi...@ucw.cz wrote:

On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka hubi...@ucw.cz wrote:

Xinliang David Li davi...@google.com writes: Process-level synchronization problems can happen when two processes (running the instrumented binary) exit at the same time. The updated/merged counters from one process may be overwritten by another process -- this is true for both counter data and summary data. Solution 3) does not introduce any new problems.

You could just use lockf()?

The issue here is holding locks for all the files (and there can be many) versus the limit on the number of locks; that opens possibilities for deadlocking (mind that updating may happen in different orders on the same files for different programs built from the same objects).

For David: there is no thread-safety code in mainline for the counters. Long ago Zdenek implemented poor man's TLS for counters (before TLS was invented) http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down as too memory-expensive per thread. We could optionally do atomic updates like ICC, or a combination of both as discussed in the thread. So far no one has implemented it, since the coverage fixups seem to work well enough in practice for multithreaded programs, where reproducibility does not seem to be _that_ important. For a GCC profiled bootstrap, however, I would like the output binary to be reproducible.

We really ought to make profile updates safe for multiple processes. Trashing a whole process run is worse than racing on an increment. There is a good chance that one of the runs is more important than the others, and it will get trashed.

I do not think we have serious update problems in the summaries at the moment. We lock individual files as we update them. The summary is simple enough to be safe: sum_all is summed, max_all is the maximum over the individual runs. Even when you combine multiple programs, the summary will end up the same. Everything except for max_all is ignored anyway.

Solution 2 (i.e. histogram streaming) will also have the property that it is safe WRT multiple programs, just like sum_all. I think the sum_all-based scaling of the working-set entries also has this property. What is your opinion on saving the histogram in the summary and merging histograms together as best as possible, compared to the alternative of saving the working-set information as now and scaling it up by the ratio between the new and old sum_all when merging?

I think the scaling will have at least round-off issues WRT different merging orders.

So far I like this option best. But David seems to lean towards the third option with whole-file locking. I see it may prove more extensible in the future. At the moment I do not understand two things: 1) why do we need info on the number of counters above a given threshold, since the hot/cold decisions usually depend purely on the count cutoff? Removing those will solve the merging issues with variant 2, and then it would probably be a good solution.

This is useful for large applications with a long tail. The instruction working set for those applications is very large, the inliner and unroller need to be aware of that, and good heuristics can be developed to throttle aggressive code-bloating transformations. For the inliner, it is kind of like the global budget but more application-dependent. In the long run, we will collect more advanced FDO summaries regarding the working set -- it will be the working-set size for each code region (locality region).

2) Do we plan to add some features in the near future that will require global locking anyway? I guess LIPO itself does not count, since it streams its data into an independent file as you mentioned earlier, and locking the LIPO file is not that hard. Does LIPO stream everything into that common file, or does it use a combination of gcda files and a common summary?

Actually, LIPO module grouping information is stored in gcda files.
It is also stored in a separate .imports file (one per object) --- this is primarily used by our build system for dependence information.

What other stuff does Google plan to merge? (In general I would be curious about merging plans WRT profile stuff, so we get more synchronized and effective at getting patches in. We have about two months to get it done in stage 1, and it would be nice to get as much in as possible. Obviously some of the patches will need a bit of discussion, like this one. Hope you do not find it frustrating; I actually think this is an important feature.)

We plan to merge in the new LIPO implementation based on LTO streaming. Rong Xu finished this in the 4.6-based compiler, and he needs to port it to 4.8.

thanks, David

I also realized today that the common value counters (used by switch, indirect call and div/mod value profiling) are
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
This is useful for large applications with a long tail. The instruction working set for those applications is very large, the inliner and unroller need to be aware of that, and good heuristics can be developed to throttle aggressive code-bloating transformations. For the inliner, it is kind of like the global budget but more application-dependent. In the long run, we will collect more advanced FDO summaries regarding the working set -- it will be the working-set size for each code region (locality region).

I see, so you use it to estimate the size of the working set and the effect of bloating optimizations on cache size. This sounds interesting. What are your experiences with this? What concerns me is that it is greatly inaccurate - you have no idea how many instructions a given counter is guarding, and it can differ quite a lot. Also, inlining/optimization makes working sets significantly different (by a factor of 100 for tramp3d). But on the other hand, any solution at this level will be greatly inaccurate. So I am curious how reliable the data you can get from this is, and how you take it into account in the heuristics.

It seems to me that for this use, perhaps the simple logic in histogram merging of maximizing the number of BBs for a given bucket will work well. It is inaccurate, but we are working with greatly inaccurate data anyway. Except for degenerate cases, the small and unimportant runs will have small BB counts, while large runs will have larger counts, and those are the ones we optimize for anyway.

2) Do we plan to add some features in the near future that will require global locking anyway? I guess LIPO itself does not count, since it streams its data into an independent file as you mentioned earlier, and locking the LIPO file is not that hard. Does LIPO stream everything into that common file, or does it use a combination of gcda files and a common summary?

Actually, LIPO module grouping information is stored in gcda files.
It is also stored in a separate .imports file (one per object) --- this is primarily used by our build system for dependence information.

I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO behave on GCC bootstrap? (I.e. it does a lot more work in the libgcov module per invocation, so I am curious if it is practically useful at all.) With an LTO-based solution, a lot can probably be pushed to link time? Before the actual GCC starts from the linker plugin, the LIPO module could read gcov CFGs from gcda files and do all the merging/updating/CFG construction that is currently performed at runtime, right?

What other stuff does Google plan to merge? (In general I would be curious about merging plans WRT profile stuff, so we get more synchronized and effective at getting patches in. We have about two months to get it done in stage 1, and it would be nice to get as much in as possible. Obviously some of the patches will need a bit of discussion, like this one. Hope you do not find it frustrating; I actually think this is an important feature.)

We plan to merge in the new LIPO implementation based on LTO streaming. Rong Xu finished this in the 4.6-based compiler, and he needs to port it to 4.8.

Good. Looks like a lot of work ahead. It would be nice if we could perhaps start by merging the libgcov infrastructure updates prior to the LIPO changes. From what I saw on the LIPO branch some time ago, it has a lot of stuff that is not exactly LIPO-specific.

Honza

thanks, David

I also realized today that the common value counters (used by switch, indirect call and div/mod value profiling) are non-stable WRT different merging orders (i.e. parallel make in the train run). I do not think there is an actual solution to that, except for not merging counter sections of this type in libgcov and merging them in some canonical order at profile-feedback time. Perhaps we just want to live with this, since the discrepancy here is small (i.e. these counters are quite rare and their outcome has just a local effect on the final binary, unlike the global summaries/edge counters).

Honza

Thanks, Teresa

Honza -Andi -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: Merge C++ conversion into trunk (0/6 - Overview)
On Tue, Aug 21, 2012 at 3:31 AM, Lawrence Crowl cr...@google.com wrote:

On 8/20/12, H.J. Lu hjl.to...@gmail.com wrote: The C++ merge caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332 GCC memory usage is more than doubled, from < 3GB to > 10GB. Is this a known issue?

The two memory stat reports show no differences. Are you sure you didn't splice in the wrong report?

Well, easy things such as a messed-up hashtab conversion (no freeing) or vec conversion (no freeing) can cause this, or even the gengtype change causing GC issues (which is why those should have been different revisions).

Richard.

-- Lawrence Crowl
Loop iterations inline hint
Hi, this patch adds a hint that when inlining makes the bounds on loop iterations known, inlining is probably a good idea. This primarily targets Fortran's array descriptors, but should be generally useful. Fortran will still need a bit more work: often we disregard inlining because we think the call is cold (because it comes from MAIN), so the inlining heuristics will need more updating, and apparently we will also need to update for PHI conditionals as done in Martin's patch 3/3. At the moment the hint is interpreted the same way as the indirect_call hint from the previous patch.

Martin: I think ipa-cp should also make use of this hint. Resolving the number of loop iterations is an important enough reason to specialize in many cases. I think it already has logic for devirtualization, but perhaps it should be made more aggressive? I was sort of surprised that for Mozilla the inlining hint makes us catch 20 times more cases than before. Most of the cases sound like good ipa-cp candidates. Also, can you please try to finally make param notes usable by the virtual-clones machinery, and thus make it possible for ipa-cp to specialize for known aggregate parameters? This should make a lot of difference for Fortran, I think.

Bootstrapped/regtested x86_64-linux, will commit it after a bit more testing.

Honza

	* gcc.dg/ipa/inlinehint-1.c: New.

	PR fortran/48636
	* ipa-inline.c (want_inline_small_function_p): Take loop_iterations hint.
	(edge_badness): Likewise.
	* ipa-inline.h (inline_hints_vals): Add INLINE_HINT_loop_iterations.
	(inline_summary): Add loop_iterations.
	* ipa-inline-analysis.c: Include tree-scalar-evolution.h.
	(dump_inline_hints): Dump loop_iterations.
	(reset_inline_summary): Free loop_iterations.
	(inline_node_duplication_hook): Update loop_iterations.
	(dump_inline_summary): Dump loop_iterations.
	(will_be_nonconstant_expr_predicate): New function.
	(estimate_function_body_sizes): Analyze loops.
	(estimate_node_size_and_time): Set hint loop_iterations.
	(inline_merge_summary): Merge loop_iterations.
	(inline_read_section): Stream in loop_iterations.
	(inline_write_summary): Stream out loop_iterations.

Index: testsuite/gcc.dg/ipa/inlinehint-1.c
===================================================================
*** testsuite/gcc.dg/ipa/inlinehint-1.c	(revision 0)
--- testsuite/gcc.dg/ipa/inlinehint-1.c	(revision 0)
***************
*** 0 ****
--- 1,16 ----
+ /* { dg-options "-O3 -c -fdump-ipa-inline-details -fno-early-inlining -fno-ipa-cp" } */
+ test (int a)
+ {
+    int i;
+    for (i = 0; i < a; i++)
+      {
+        test2 (a);
+        test2 (a);
+      }
+ }
+ m ()
+ {
+   test (10);
+ }
+ /* { dg-final { scan-ipa-dump "loop_iterations" "inline" } } */
+ /* { dg-final { cleanup-ipa-dump "inline" } } */

Index: ipa-inline.c
===================================================================
*** ipa-inline.c	(revision 190510)
--- ipa-inline.c	(working copy)
*************** want_inline_small_function_p (struct cgr
*** 480,486 ****
  	 hints suggests that inlining given function is very profitable.  */
        else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
  	       && growth >= MAX_INLINE_INSNS_SINGLE
! 	       && !(hints & INLINE_HINT_indirect_call))
  	{
  	  e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
  	  want_inline = false;
--- 480,487 ----
  	 hints suggests that inlining given function is very profitable.  */
        else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
  	       && growth >= MAX_INLINE_INSNS_SINGLE
! 	       && !(hints & (INLINE_HINT_indirect_call
! 			     | INLINE_HINT_loop_iterations)))
  	{
  	  e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
  	  want_inline = false;
*************** edge_badness (struct cgraph_edge *edge,
*** 863,869 ****
        if (dump)
  	fprintf (dump_file, "Badness overflow\n");
      }
!   if (hints & INLINE_HINT_indirect_call)
      badness /= 8;
    if (dump)
      {
--- 864,871 ----
        if (dump)
  	fprintf (dump_file, "Badness overflow\n");
      }
!   if (hints & (INLINE_HINT_indirect_call
! 	       | INLINE_HINT_loop_iterations))
      badness /= 8;
    if (dump)
      {
Index: ipa-inline.h
===================================================================
*** ipa-inline.h	(revision 190510)
--- ipa-inline.h	(working copy)
*************** typedef struct GTY(()) condition
*** 45,51 ****
  /* Inline hints are reasons why inline heuristics should prefer inlining given
     function.  They are represented as a bitmap of the following values.  */
  enum inline_hints_vals {
!   INLINE_HINT_indirect_call = 1
  };
  typedef int inline_hints;
--- 45,52 ----
  /* Inline hints are reasons why inline heuristics should prefer inlining given
     function.  They are represented as a bitmap of the
Re: [wwwdocs] Update Fortran section in 4.8/changes.html
Gerald Pfeifer wrote: I went ahead and made some smaller changes, patch below.

Thanks. I noticed you are using <q>...</q>, as in <q><code>e</code></q>, which we usually don't. Why that?

My impression was that a one-letter <code> didn't stand out enough and looked rather odd; if you think it improves consistency or readability, feel free to change it.

* * *

I intend to commit the attached patch to document two new warning flags, which were recently added. (Suggested in ISO/IEC Technical Report 24772 "Guidance for Avoiding Vulnerabilities through Language Selection and Use".)

Tobias

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.17
diff -u -r1.17 changes.html
--- changes.html	20 Aug 2012 12:23:39 -0000	1.17
+++ changes.html	21 Aug 2012 06:56:55 -0000
@@ -92,6 +92,21 @@
     (re)allocation in hot loops.  (For arrays, replacing <q><code>var=</code></q>
     by <q><code>var(:)=</code></q> disables the automatic reallocation.)</li>
+    <li>The <a
+    href="http://gcc.gnu.org/onlinedocs/gfortran/Error-and-Warning-Options.html">
+    <code>-Wcompare-reals</code></a> flag has been added.  When this flag is set,
+    warnings are issued when comparing <code>REAL</code> or
+    <code>COMPLEX</code> types for equality and inequality; consider replacing
+    <code>a == b</code> by <code>abs(a&minus;b) &lt; eps</code> with a suitable
+    <code>eps</code>.  The -Wcompare-reals flag is enabled by
+    <code>-Wall</code>.</li>
+
+    <li>The <a
+    href="http://gcc.gnu.org/onlinedocs/gfortran/Error-and-Warning-Options.html">
+    <code>-Wtarget-lifetime</code></a> flag has been added (enabled with
+    <code>-Wall</code>), which warns if the pointer in a pointer assignment
+    might outlive its target.</li>
+
     <li><p>Reading floating point numbers which use <q><code>q</code></q> for
     the exponential (such as <code>4.0q0</code>) is now supported as a vendor
     extension for better compatibility with old data files. It is strongly
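The advice the new -Wcompare-reals documentation gives — replace an exact `==` with a tolerance test — applies beyond Fortran, since accumulated floating-point values rarely reproduce a constant exactly. A minimal C illustration of the same idea (a hypothetical `nearly_equal` helper; the flag itself is a gfortran diagnostic, unrelated to this code):

```c
#include <assert.h>

/* Compare two doubles against a tolerance instead of with ==, as the
   gfortran documentation above suggests for REAL/COMPLEX.  The abs()
   is written out by hand to keep the sketch free of libm.  */
static int nearly_equal (double a, double b, double eps)
{
  double d = a - b;
  return (d < 0 ? -d : d) < eps;
}
```

For example, summing 0.1 ten times in binary floating point gives a value just below 1.0, so `sum == 1.0` is false while `nearly_equal (sum, 1.0, 1e-9)` is true. Choosing a suitable `eps` is problem-specific (absolute vs. relative tolerance), which is why the warning leaves it to the programmer.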
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Mon, Aug 20, 2012 at 11:33 PM, Jan Hubicka hubi...@ucw.cz wrote:

This is useful for large applications with a long tail. The instruction working set for those applications is very large, the inliner and unroller need to be aware of that, and good heuristics can be developed to throttle aggressive code-bloating transformations. For the inliner, it is kind of like the global budget but more application-dependent. In the long run, we will collect more advanced FDO summaries regarding the working set -- it will be the working-set size for each code region (locality region).

I see, so you use it to estimate the size of the working set and the effect of bloating optimizations on cache size. This sounds interesting. What are your experiences with this?

Teresa has done some tuning for the unroller so far. The inliner tuning is the next step.

What concerns me is that it is greatly inaccurate - you have no idea how many instructions a given counter is guarding, and it can differ quite a lot. Also, inlining/optimization makes working sets significantly different (by a factor of 100 for tramp3d).

The pre-ipa-inline working set is the one that is needed for ipa inliner tuning. For post-ipa-inline code-increasing transformations, some update is probably needed.

But on the other hand, any solution at this level will be greatly inaccurate. So I am curious how reliable the data you can get from this is, and how you take it into account in the heuristics.

This effort is just the first step to allow good heuristics to develop.

It seems to me that for this use, perhaps the simple logic in histogram merging of maximizing the number of BBs for a given bucket will work well. It is inaccurate, but we are working with greatly inaccurate data anyway. Except for degenerate cases, the small and unimportant runs will have small BB counts, while large runs will have larger counts, and those are the ones we optimize for anyway.

The working-set curve for each type of application contains lots of information that can be mined.
The inaccuracy can also be mitigated by more data 'calibration'.

2) Do we plan to add some features in the near future that will require global locking anyway? I guess LIPO itself does not count, since it streams its data into an independent file as you mentioned earlier, and locking the LIPO file is not that hard. Does LIPO stream everything into that common file, or does it use a combination of gcda files and a common summary?

Actually, LIPO module grouping information is stored in gcda files. It is also stored in a separate .imports file (one per object) --- this is primarily used by our build system for dependence information.

I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO behave on GCC bootstrap?

We have not tried GCC bootstrap with LIPO. GCC compile time is not the main problem for application builds -- the link time (for debug builds) is.

(I.e. it does a lot more work in the libgcov module per invocation, so I am curious if it is practically useful at all.) With an LTO-based solution, a lot can probably be pushed to link time? Before the actual GCC starts from the linker plugin, the LIPO module could read gcov CFGs from gcda files and do all the merging/updating/CFG construction that is currently performed at runtime, right?

The dynamic cgraph build and analysis is still done at runtime. However, with the new implementation, the FE is no longer involved. The GCC driver is modified to understand module grouping, and LTO is used to merge the streamed output from aux modules.

David

What other stuff does Google plan to merge? (In general I would be curious about merging plans WRT profile stuff, so we get more synchronized and effective at getting patches in. We have about two months to get it done in stage 1, and it would be nice to get as much in as possible. Obviously some of the patches will need a bit of discussion, like this one. Hope you do not find it frustrating; I actually think this is an important feature.)
We plan to merge in the new LIPO implementation based on LTO streaming. Rong Xu finished this in the 4.6-based compiler, and he needs to port it to 4.8.

Good. Looks like a lot of work ahead. It would be nice if we could perhaps start by merging the libgcov infrastructure updates prior to the LIPO changes. From what I saw on the LIPO branch some time ago, it has a lot of stuff that is not exactly LIPO-specific.

Honza

thanks, David

I also realized today that the common value counters (used by switch, indirect call and div/mod value profiling) are non-stable WRT different merging orders (i.e. parallel make in the train run). I do not think there is an actual solution to that, except for not merging counter sections of this type in libgcov and merging them in some canonical order at profile-feedback time. Perhaps we just want to live with this, since the discrepancy here is small. (i.e.
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
Teresa has done some tuning for the unroller so far. The inliner tuning is the next step.

What concerns me is that it is greatly inaccurate - you have no idea how many instructions a given counter is guarding, and it can differ quite a lot. Also, inlining/optimization makes working sets significantly different (by a factor of 100 for tramp3d).

The pre-ipa-inline working set is the one that is needed for ipa inliner tuning. For post-ipa-inline code-increasing transformations, some update is probably needed.

But on the other hand, any solution at this level will be greatly inaccurate. So I am curious how reliable the data you can get from this is, and how you take it into account in the heuristics.

This effort is just the first step to allow good heuristics to develop.

It seems to me that for this use, perhaps the simple logic in histogram merging of maximizing the number of BBs for a given bucket will work well. It is inaccurate, but we are working with greatly inaccurate data anyway. Except for degenerate cases, the small and unimportant runs will have small BB counts, while large runs will have larger counts, and those are the ones we optimize for anyway.

The working-set curve for each type of application contains lots of information that can be mined. The inaccuracy can also be mitigated by more data 'calibration'.

Sure. I think I am leaning towards trying solution 2) with maximizing-counter-count merging (it would probably make sense to rename it from "BB count", since it is not really a BB count and the name is thus misleading), and we will see how well it works in practice. We get the benefit of far fewer issues with profile locking/unlocking, and we lose a bit of precision on BB counts. I tend to believe the error will not be that important in practice. Another loss is more histogram streaming into each gcda file, but with skipping of zero entries it should not be a major overhead problem, I hope. What do you think?
2) Do we plan to add some features in the near future that will require global locking anyway? I guess LIPO itself does not count, since it streams its data into an independent file as you mentioned earlier, and locking the LIPO file is not that hard. Does LIPO stream everything into that common file, or does it use a combination of gcda files and a common summary?

Actually, LIPO module grouping information is stored in gcda files. It is also stored in a separate .imports file (one per object) --- this is primarily used by our build system for dependence information.

I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO behave on GCC bootstrap?

We have not tried GCC bootstrap with LIPO. GCC compile time is not the main problem for application builds -- the link time (for debug builds) is.

I was primarily curious how LIPO's runtime analysis fares in a situation where you do very many small train runs on a rather large app (sure, GCC is small compared to Google's use cases ;).

(I.e. it does a lot more work in the libgcov module per invocation, so I am curious if it is practically useful at all.) With an LTO-based solution, a lot can probably be pushed to link time? Before the actual GCC starts from the linker plugin, the LIPO module could read gcov CFGs from gcda files and do all the merging/updating/CFG construction that is currently performed at runtime, right?

The dynamic cgraph build and analysis is still done at runtime. However, with the new implementation, the FE is no longer involved. The GCC driver is modified to understand module grouping, and LTO is used to merge the streamed output from aux modules.

I see. Are there any fundamental reasons why it cannot be done at link time, when all gcda files are available? Why is the grouping not done inside the linker plugin?

Honza

David
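The merging rules this thread converges on — sum_all summed, max_all maximized, and each histogram bucket merged by keeping the larger counter count so the result does not depend on merge order — can be sketched as follows. This is a toy model with a made-up struct and bucket count, not GCC's gcov summary code:

```c
#include <assert.h>
#include <stdint.h>

#define N_BUCKETS 8   /* toy size; a real count histogram has many buckets */

struct summary {
  uint64_t sum_all;            /* summed across runs */
  uint64_t max_all;            /* maximum across runs */
  uint32_t bucket[N_BUCKETS];  /* # of counters falling in each count range */
};

/* Merge RHS into LHS: sum_all adds, max_all takes the maximum, and
   each bucket keeps the larger counter count ("maximizing-counter-count
   merging").  All three operations are commutative and associative, so
   parallel runs merging in any order yield the same summary.  */
static void summary_merge (struct summary *lhs, const struct summary *rhs)
{
  lhs->sum_all += rhs->sum_all;
  if (rhs->max_all > lhs->max_all)
    lhs->max_all = rhs->max_all;
  for (int i = 0; i < N_BUCKETS; i++)
    if (rhs->bucket[i] > lhs->bucket[i])
      lhs->bucket[i] = rhs->bucket[i];
}
```

The order-independence is the point: it sidesteps both the round-off concern with sum_all-ratio scaling and the whole-file-locking requirement of merging histograms exactly.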
Re: [PATCH, ARM] Don't pull in unwinder for 64-bit division routines
On Fri, Aug 17, 2012 at 9:13 AM, Ian Lance Taylor i...@google.com wrote: Looks fine to me. Ian

Will backport to arm/embedded-4_7-branch. Not sure if it is appropriate for the 4.7 branch, since it is not a stability problem.

- Joey
Fix Solaris 9/x86 bootstrap
Solaris 9/x86 bootstrap was broken after the cxx-conversion merge:

In file included from /vol/gcc/src/hg/trunk/local/gcc/gengtype.c:957:
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected identifier before numeric constant
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected '}' before numeric constant

This happens since g++, unlike gcc, defines __EXTENSIONS__, which exposes the equivalent of

#define PC 14

Initially I tried to avoid this by having gengtype.c include rtl.h, which already has the #undef, but this produced so much fallout that I decided it's better to just replicate it here. The patch allowed an i386-pc-solaris2.9 bootstrap to finish. I think this counts as obvious, unless someone prefers the rtl.h route nonetheless. Ok for mainline?

Rainer

2012-08-20  Rainer Orth  r...@cebitec.uni-bielefeld.de

	* gengtype.c (PC): Undef.

# HG changeset patch
# Parent cf74f0e72cab4965ba20bf236eac2fac2b87064e
Fix Solaris 9 bootstrap

diff --git a/gcc/gengtype.c b/gcc/gengtype.c
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -35,6 +35,8 @@
 #include "gengtype.h"
 #include "filenames.h"

+#undef PC /* Some systems predefine this symbol; don't let it interfere.  */
+
 /* Data types, macros, etc. used only in this file.  */

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
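The failure mode being fixed is easy to reproduce in isolation: once a header defines a short object-like macro such as PC, any later enumerator of that name expands to a number before the compiler ever sees the declaration. A minimal standalone sketch (hypothetical `enum reg`, not rtl.def's actual contents):

```c
#include <assert.h>

/* Stand-in for what the Solaris system headers do under
   __EXTENSIONS__: define PC as a plain number.  */
#define PC 14

/* At this point a declaration like
       enum reg { SP, PC };
   would be preprocessed into
       enum reg { SP, 14 };
   which is exactly the "expected identifier before numeric constant"
   error from rtl.def.  Undefining the macro first, as the gengtype.c
   patch does, restores PC as an ordinary identifier: */
#undef PC

enum reg { SP, PC };   /* now parses fine; SP == 0, PC == 1 */
```

This is also why the #undef must come after the offending #include but before the file that spells out the enumerators (rtl.def, included at line 957 of gengtype.c in the report above).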
Re: Fix Solaris 9/x86 bootstrap
On Tue, Aug 21, 2012 at 10:53 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote:

Solaris 9/x86 bootstrap was broken after the cxx-conversion merge:

In file included from /vol/gcc/src/hg/trunk/local/gcc/gengtype.c:957:
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected identifier before numeric constant
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected '}' before numeric constant

This happens since g++, unlike gcc, defines __EXTENSIONS__, which exposes the equivalent of

#define PC 14

Initially I tried to avoid this by having gengtype.c include rtl.h, which already has the #undef, but this produced so much fallout that I decided it's better to just replicate it here. The patch allowed an i386-pc-solaris2.9 bootstrap to finish. I think this counts as obvious, unless someone prefers the rtl.h route nonetheless. Ok for mainline?

Doesn't that belong in system.h instead? And removed from rtl.h?

Thanks, Richard.

Rainer

2012-08-20  Rainer Orth  r...@cebitec.uni-bielefeld.de

	* gengtype.c (PC): Undef.

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
[SH] PR 39423 - Add support for SH2A movu.w insn
Hello,

This adds support for SH2A's movu.w insn for memory addressing cases as described in the PR. Tested on rev 190546 with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures. OK?

Cheers, Oleg

ChangeLog:

	PR target/39423
	* config/sh/sh.md (*movhi_index_disp): Add support for SH2A movu.w insn.

testsuite/ChangeLog:

	PR target/39423
	* gcc.target/sh/pr39423-2.c: New.

Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md	(revision 190459)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5667,12 +5667,35 @@
    (clobber (reg:SI T_REG))]
   "TARGET_SH1"
   "#"
-  "&& 1"
-  [(parallel [(set (match_dup 0) (sign_extend:SI (match_dup 1)))
-	      (clobber (reg:SI T_REG))])
-   (set (match_dup 0) (zero_extend:SI (match_dup 2)))]
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
 {
-  operands[2] = gen_lowpart (HImode, operands[0]);
+  rtx mem = operands[1];
+  rtx plus0_rtx = XEXP (mem, 0);
+  rtx plus1_rtx = XEXP (plus0_rtx, 0);
+  rtx mult_rtx = XEXP (plus1_rtx, 0);
+
+  rtx op_1 = XEXP (mult_rtx, 0);
+  rtx op_2 = GEN_INT (exact_log2 (INTVAL (XEXP (mult_rtx, 1))));
+  rtx op_3 = XEXP (plus1_rtx, 1);
+  rtx op_4 = XEXP (plus0_rtx, 1);
+  rtx op_5 = gen_reg_rtx (SImode);
+  rtx op_6 = gen_reg_rtx (SImode);
+  rtx op_7 = replace_equiv_address (mem, gen_rtx_PLUS (SImode, op_6, op_4));
+
+  emit_insn (gen_ashlsi3 (op_5, op_1, op_2));
+  emit_insn (gen_addsi3 (op_6, op_5, op_3));
+
+  /* On SH2A the movu.w insn can be used for zero extending loads.  */
+  if (TARGET_SH2A)
+    emit_insn (gen_zero_extendhisi2 (operands[0], op_7));
+  else
+    {
+      emit_insn (gen_extendhisi2 (operands[0], op_7));
+      emit_insn (gen_zero_extendhisi2 (operands[0],
+				       gen_lowpart (HImode, operands[0])));
+    }
+  DONE;
})

(define_insn_and_split "*movsi_index_disp"

Index: gcc/testsuite/gcc.target/sh/pr39423-2.c
===================================================================
--- gcc/testsuite/gcc.target/sh/pr39423-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr39423-2.c	(revision 0)
@@ -0,0 +1,14 @@
+/* Check that displacement addressing is used for indexed addresses with a
+   small offset, instead of re-calculating the index, and that the movu.w
+   instruction is used on SH2A.  */
+/* { dg-do compile { target sh*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-skip-if { sh*-*-* } { * } { -m2a* } } */
+/* { dg-final { scan-assembler-not "add\t#1" } } */
+/* { dg-final { scan-assembler "movu.w" } } */
+
+int
+test_00 (unsigned short tab[], int index)
+{
+  return tab[index + 1];
+}
Re: [PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base
On Mon, 20 Aug 2012, Richard Guenther wrote: This shrinks TREE_VEC from 40 bytes to 32 bytes and SSA_NAME from 80 bytes to 72 bytes on a 64-bit host. Both structures suffer from the fact that they need storage for an integer (length and version), which leaves unused padding. Neither data structure requires as many flag bits as we keep in tree_base, though, so they can conveniently use the upper 4 bytes of the 8-byte tree_base to store the length / version. I added a union to tree_base to divide up the space between flags (possibly) used for all tree kinds and flags that are not used for those codes that choose to re-use the upper 4 bytes of tree_base for something else. This supersedes the patch that moved the C++-specific usage of TREE_CHAIN on TREE_VECs to tree_base (same savings, but TREE_VEC isn't any closer to being based on tree_base only). Due to re-use of flags from frontends, definitive checking for flag accesses is not always possible (TREE_NOTHROW for example). Where appropriate I added TREE_NOT_CHECK (NODE, TREE_VEC) instead, to catch mis-uses by the C++ frontend. Changed ARGUMENT_PACK_INCOMPLETE_P to use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0, which it used on TREE_VECs. We are very lazy adjusting flag usage documentation :/ Bootstrap and regtest pending on x86_64-unknown-linux-gnu. After discussion on IRC I added !SSA_NAME checking to the lang flag accessors. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-08-21 Richard Guenther rguent...@suse.de cp/ * cp-tree.h (TREE_INDIRECT_USING): Use TREE_LANG_FLAG_0 accessor. (ATTR_IS_DEPENDENT): Likewise. (ARGUMENT_PACK_INCOMPLETE_P): Use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0 on TREE_VECs. * tree.h (struct tree_base): Add union to make it possible to re-use the upper 4 bytes for tree codes that do not need as many flags as others. Move visited and default_def_flag to common bits section in exchange for saturating_flag and unsigned_flag.
Add SSA name version and tree vec length fields here. (struct tree_vec): Remove length field here. (struct tree_ssa_name): Remove version field here. Index: trunk/gcc/cp/cp-tree.h === *** trunk.orig/gcc/cp/cp-tree.h 2012-08-20 12:47:47.0 +0200 --- trunk/gcc/cp/cp-tree.h 2012-08-20 13:53:05.212969994 +0200 *** struct GTY((variable_size)) lang_decl { *** 2520,2530 /* In a TREE_LIST concatenating using directives, indicate indirect directives */ ! #define TREE_INDIRECT_USING(NODE) (TREE_LIST_CHECK (NODE)-base.lang_flag_0) /* In a TREE_LIST in an attribute list, indicates that the attribute must be applied at instantiation time. */ ! #define ATTR_IS_DEPENDENT(NODE) (TREE_LIST_CHECK (NODE)-base.lang_flag_0) extern tree decl_shadowed_for_var_lookup (tree); extern void decl_shadowed_for_var_insert (tree, tree); --- 2520,2530 /* In a TREE_LIST concatenating using directives, indicate indirect directives */ ! #define TREE_INDIRECT_USING(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE)) /* In a TREE_LIST in an attribute list, indicates that the attribute must be applied at instantiation time. */ ! #define ATTR_IS_DEPENDENT(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE)) extern tree decl_shadowed_for_var_lookup (tree); extern void decl_shadowed_for_var_insert (tree, tree); *** extern void decl_shadowed_for_var_insert *** 2881,2887 arguments will be placed into the beginning of the argument pack, but additional arguments might still be deduced. */ #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\ ! TREE_LANG_FLAG_0 (ARGUMENT_PACK_ARGS (NODE)) /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template arguments used to fill this pack. */ --- 2881,2887 arguments will be placed into the beginning of the argument pack, but additional arguments might still be deduced. */ #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\ ! TREE_ADDRESSABLE (ARGUMENT_PACK_ARGS (NODE)) /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template arguments used to fill this pack. 
*/ Index: trunk/gcc/tree.h === *** trunk.orig/gcc/tree.h 2012-08-20 12:47:47.0 +0200 --- trunk/gcc/tree.h2012-08-21 10:32:47.717394657 +0200 *** enum omp_clause_code *** 417,423 so all nodes have these fields. See the accessor macros, defined below, for documentation of the !fields. */ struct GTY(()) tree_base { ENUM_BITFIELD(tree_code) code : 16; --- 417,424 so all nodes have these fields. See the accessor macros, defined below, for documentation of the !fields, and the table below which connects the fileds and the !
Re: [PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base
On 21 August 2012 10:58, Richard Guenther rguent...@suse.de wrote: Index: trunk/gcc/tree.h === *** trunk.orig/gcc/tree.h 2012-08-20 12:47:47.0 +0200 --- trunk/gcc/tree.h2012-08-21 10:32:47.717394657 +0200 *** enum omp_clause_code *** 417,423 so all nodes have these fields. See the accessor macros, defined below, for documentation of the !fields. */ struct GTY(()) tree_base { ENUM_BITFIELD(tree_code) code : 16; --- 417,424 so all nodes have these fields. See the accessor macros, defined below, for documentation of the !fields, and the table below which connects the fileds and the !accessor macros. */ Typo fileds. Jay.
Re: [PATCH] Add valgrind support to alloc-pool.c
On Sat, Aug 18, 2012 at 9:56 AM, Richard Guenther richard.guent...@gmail.com wrote: On Sat, Aug 18, 2012 at 6:17 AM, Andrew Pinski pins...@gmail.com wrote: Hi, I implemented this patch almost 6 years ago when the df branch was being worked on. It adds valgrind support to alloc-pool.c to catch cases of using memory after it has been freed. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Ok. It doesn't work. Did you check with valgrind checking? /space/rguenther/tramp3d/trunk/gcc/alloc-pool.c: In function 'void* pool_alloc(alloc_pool)': /space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected primary-expression before 'int' /space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected ')' before 'int' /space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected ')' before ';' token /space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:263:3: error: 'size' was not declared in this scope /space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:303:7: error: 'size' was not declared in this scope that's because VALGRIND_DISCARD is not what you think it is. Testing a fix ... Richard. Thanks, Richard. Thanks, Andrew Pinski ChangeLog: * alloc-pool.c (pool_alloc): Add valgrind markers. (pool_free): Likewise.
[C++ PATCH] Add overflow checking to __cxa_vec_new[23]
I don't think there are any callers out there, but let's fix this for completeness. A compiler emitting code to call this function would still have to perform overflow checks for the new T[n][m] case, so this interface is not as helpful as it looks at first glance. Tested on x86_64-redhat-linux-gnu. -- Florian Weimer / Red Hat Product Security Team 2012-08-21 Florian Weimer fwei...@redhat.com * libsupc++/vec.cc (compute_size): New function. (__cxa_vec_new2, __cxa_vec_new3): Use it. 2012-08-21 Florian Weimer fwei...@redhat.com * g++.old-deja/g++.abi/cxa_vec.C (test5, test6): New. diff --git a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C index f3d602f..e2b82e7 100644 --- a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C +++ b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C @@ -8,7 +8,7 @@ // Avoid use of non-overridable new/delete operators in shared // { dg-options "-static" { target *-*-mingw* } } // Test __cxa_vec routines -// Copyright (C) 2000, 2005 Free Software Foundation, Inc. +// Copyright (C) 2000-2012 Free Software Foundation, Inc. // Contributed by Nathan Sidwell 7 Apr 2000 nathan@nat...@codesourcery.com #if defined (__GXX_ABI_VERSION) && __GXX_ABI_VERSION >= 100 @@ -255,6 +255,80 @@ void test4 () return; } +static const std::size_t large_size = std::size_t(1) << (sizeof(std::size_t) * 8 - 2); + +// allocate an array whose size causes an overflow during multiplication +void test5 () +{ + static bool started = false; + + if (!started) +{ + started = true; + std::set_terminate (test0); + + ctor_count = dtor_count = 1; + dtor_repeat = false; + blocks = 0; + + try +{ + void *ary = abi::__cxa_vec_new (large_size, 4, padding, ctor, dtor); + longjmp (jump, 1); +} + catch (std::bad_alloc) + { + if (ctor_count != 1) + longjmp (jump, 4); + } + catch (...)
+{ + longjmp (jump, 2); +} +} + else +{ + longjmp (jump, 3); +} + return; +} + +// allocate an array whose size causes an overflow during addition +void test6 () +{ + static bool started = false; + + if (!started) +{ + started = true; + std::set_terminate (test0); + + ctor_count = dtor_count = 1; + dtor_repeat = false; + blocks = 0; + + try +{ + void *ary = abi::__cxa_vec_new (std::size_t(-1) / 4, 4, padding, ctor, dtor); + longjmp (jump, 1); +} + catch (std::bad_alloc) + { + if (ctor_count != 1) + longjmp (jump, 4); + } + catch (...) +{ + longjmp (jump, 2); +} +} + else +{ + longjmp (jump, 3); +} + return; +} + static void (*tests[])() = { test0, @@ -262,6 +336,8 @@ static void (*tests[])() = test2, test3, test4, + test5, + test6, NULL }; diff --git a/libstdc++-v3/libsupc++/vec.cc b/libstdc++-v3/libsupc++/vec.cc index 700c5ef..bfce117 100644 --- a/libstdc++-v3/libsupc++/vec.cc +++ b/libstdc++-v3/libsupc++/vec.cc @@ -1,6 +1,6 @@ // New abi Support -*- C++ -*- -// Copyright (C) 2000, 2001, 2003, 2004, 2009, 2011 +// Copyright (C) 2000-2012 // Free Software Foundation, Inc. // // This file is part of GCC. @@ -59,6 +59,19 @@ namespace __cxxabiv1 globals->caughtExceptions = p->nextException; globals->uncaughtExceptions += 1; } + +// Compute the total size with overflow checking. +std::size_t compute_size(std::size_t element_count, + std::size_t element_size, + std::size_t padding_size) +{ + if (element_size && element_count > std::size_t(-1) / element_size) + throw std::bad_alloc(); + std::size_t size = element_count * element_size; + if (size + padding_size < size) + throw std::bad_alloc(); + return size + padding_size; +} } // Allocate and construct array.
@@ -83,7 +96,8 @@ namespace __cxxabiv1 void *(*alloc) (std::size_t), void (*dealloc) (void *)) { -std::size_t size = element_count * element_size + padding_size; +std::size_t size + = compute_size(element_count, element_size, padding_size); char *base = static_cast<char *> (alloc (size)); if (!base) return base; @@ -124,7 +138,8 @@ namespace __cxxabiv1 void *(*alloc) (std::size_t), void (*dealloc) (void *, std::size_t)) { -std::size_t size = element_count * element_size + padding_size; +std::size_t size + = compute_size(element_count, element_size, padding_size); char *base = static_cast<char *> (alloc (size)); if (!base) return base;
[SH] Use more multi-line asm outputs
Hello, This mainly converts the asm outputs to multi-line strings and uses tab chars instead of '\\t' in the asm strings, in the hope to make stuff easier to read and a bit more consistent. Tested on rev 190546 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb} and no new failures. OK? Cheers, Oleg ChangeLog: * config/sh/sh.md (cmpeqdi_t, cmpgtdi_t, cmpgedi_t, cmpgeudi_t, cmpgtudi_t, *movsicc_t_false, *movsicc_t_true, divsi_inv20, negsi_cond, truncdihi2, ic_invalidate_line_i, ic_invalidate_line_sh4a, ic_invalidate_line_media, movdf_i4, calli_pcrel, call_valuei, call_valuei_pcrel, sibcalli_pcrel, sibcall_compact, sibcall_valuei_pcrel, sibcall_value_compact, casesi_worker_1, casesi_worker_2, bandreg_m2a, borreg_m2a, bxorreg_m2a, sp_switch_1, sp_switch_2, stack_protect_set_si, stack_protect_set_si_media, stack_protect_set_di_media, stack_protect_test_si, stack_protect_test_si_media, stack_protect_test_di_media): Convert to multi-line asm output strings. (divsi_inv_qitable, divsi_inv_hitable): Use single-alternative asm output. (*andsi3_bclr, rotldi3_mextr, rotrdi3_mextr, calli, call_valuei_tbr_rel, movml_push_banked, movml_pop_banked, bclr_m2a, bclrmem_m2a, bset_m2a, bsetmem_m2a, bst_m2a, bld_m2a, bldsign_m2a, bld_reg, *bld_regqi, band_m2a, bor_m2a, bxor_m2a, mextr_rl, *mextr_lr, ): Use tab char instead of '\\t'. (iordi3): Use braced string. (*movsi_pop): Use tab chars instead of spaces. Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 190546) +++ gcc/config/sh/sh.md (working copy) @@ -541,12 +541,10 @@ ;; On the SH and SH2, the rte instruction reads the return pc from the stack, ;; and thus we can't put a pop instruction in its delay slot. -;; ??? On the SH3, the rte instruction does not use the stack, so a pop +;; On the SH3 and SH4, the rte instruction does not use the stack, so a pop ;; instruction can go in the delay slot. 
- ;; Since a normal return (rts) implicitly uses the PR register, ;; we can't allow PR register loads in an rts delay slot. - (define_delay (eq_attr type return) [(and (eq_attr in_delay_slot yes) @@ -1154,9 +1152,21 @@ (eq:SI (match_operand:DI 0 arith_reg_operand r,r) (match_operand:DI 1 arith_reg_or_0_operand N,r)))] TARGET_SH1 - @ - tst %S0,%S0\;bf %,Ldi%=\;tst %R0,%R0\\n%,Ldi%=: - cmp/eq %S1,%S0\;bf %,Ldi%=\;cmp/eq %R1,%R0\\n%,Ldi%=: +{ + static const char* alt[] = + { + tst %S0,%S0 \n + bf 0f \n + tst %R0,%R0 \n +0:, + + cmp/eq %S1,%S0 \n + bf 0f \n + cmp/eq %R1,%R0 \n +0: + }; + return alt[which_alternative]; +} [(set_attr length 6) (set_attr type arith3b)]) @@ -1189,9 +1199,23 @@ (gt:SI (match_operand:DI 0 arith_reg_operand r,r) (match_operand:DI 1 arith_reg_or_0_operand r,N)))] TARGET_SH2 - @ - cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/gt\\t%S1,%S0\;cmp/hi\\t%R1,%R0\\n%,Ldi%=: - tst\\t%S0,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/pl\\t%S0\;cmp/hi\\t%S0,%R0\\n%,Ldi%=: +{ + static const char* alt[] = + { + cmp/eq %S1,%S0 \n + bf{.|/}s 0f \n + cmp/gt %S1,%S0 \n + cmp/hi %R1,%R0 \n +0:, + +tst %S0,%S0 \n + bf{.|/}s 0f \n + cmp/pl %S0 \n + cmp/hi %S0,%R0 \n +0: + }; + return alt[which_alternative]; +} [(set_attr length 8) (set_attr type arith3)]) @@ -1200,9 +1224,19 @@ (ge:SI (match_operand:DI 0 arith_reg_operand r,r) (match_operand:DI 1 arith_reg_or_0_operand r,N)))] TARGET_SH2 - @ - cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/ge\\t%S1,%S0\;cmp/hs\\t%R1,%R0\\n%,Ldi%=: - cmp/pz\\t%S0 +{ + static const char* alt[] = + { + cmp/eq %S1,%S0 \n + bf{.|/}s 0f \n + cmp/ge %S1,%S0 \n + cmp/hs %R1,%R0 \n +0:, + + cmp/pz %S0 + }; + return alt[which_alternative]; +} [(set_attr length 8,2) (set_attr type arith3,mt_group)]) @@ -1215,7 +1249,13 @@ (geu:SI (match_operand:DI 0 arith_reg_operand r) (match_operand:DI 1 arith_reg_operand r)))] TARGET_SH2 - cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/hs\\t%S1,%S0\;cmp/hs\\t%R1,%R0\\n%,Ldi%=: +{ + return cmp/eq %S1,%S0 \n + bf{.|/}s 0f \n + cmp/hs 
%S1,%S0 \n + cmp/hs %R1,%R0 \n + 0:; +} [(set_attr length 8) (set_attr type arith3)]) @@ -1224,7 +1264,13 @@ (gtu:SI (match_operand:DI 0 arith_reg_operand r) (match_operand:DI 1 arith_reg_operand r)))] TARGET_SH2 - cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/hi\\t%S1,%S0\;cmp/hi\\t%R1,%R0\\n%,Ldi%=: +{ + return cmp/eq %S1,%S0 \n + bf{.|/}s 0f \n + cmp/hi %S1,%S0 \n + cmp/hi %R1,%R0 \n + 0:; +} [(set_attr length 8) (set_attr type arith3)]) @@ -1276,7 +1322,7 @@ cmpgtu %N1, %N2, %0 [(set_attr type cmp_media)]) -; These two patterns
[PATCH] Fix more leaks
This fixes a few more heap leaks. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2012-08-21 Richard Guenther rguent...@suse.de * tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free the affine expansion cache. * tree-ssa-dom.c (free_expr_hash_elt_contents): New function, split out from ... (free_expr_hash_elt): ... this one. (record_cond): Properly free a not needed hashtable element. (lookup_avail_expr): Likewise. * tree-into-ssa.c (init_ssa_renamer): Specify a free function for the var_infos hashtable. (update_ssa): Likewise. Index: gcc/tree-ssa-loop-im.c === *** gcc/tree-ssa-loop-im.c (revision 190533) --- gcc/tree-ssa-loop-im.c (working copy) *** tree_ssa_lim_finalize (void) *** 2634,2640 VEC_free (bitmap, heap, memory_accesses.all_refs_stored_in_loop); if (memory_accesses.ttae_cache) ! pointer_map_destroy (memory_accesses.ttae_cache); } /* Moves invariants from loops. Only expensive invariants are moved out -- --- 2634,2640 VEC_free (bitmap, heap, memory_accesses.all_refs_stored_in_loop); if (memory_accesses.ttae_cache) ! free_affine_expand_cache (memory_accesses.ttae_cache); } /* Moves invariants from loops. Only expensive invariants are moved out -- Index: gcc/tree-ssa-dom.c === *** gcc/tree-ssa-dom.c (revision 190533) --- gcc/tree-ssa-dom.c (working copy) *** print_expr_hash_elt (FILE * stream, cons *** 649,667 } } ! /* Delete an expr_hash_elt and reclaim its storage. */ static void ! free_expr_hash_elt (void *elt) { - struct expr_hash_elt *element = ((struct expr_hash_elt *)elt); - if (element-expr.kind == EXPR_CALL) free (element-expr.ops.call.args); ! ! if (element-expr.kind == EXPR_PHI) free (element-expr.ops.phi.args); free (element); } --- 649,672 } } ! /* Delete variable sized pieces of the expr_hash_elt ELEMENT. */ static void ! free_expr_hash_elt_contents (struct expr_hash_elt *element) { if (element-expr.kind == EXPR_CALL) free (element-expr.ops.call.args); ! 
else if (element-expr.kind == EXPR_PHI) free (element-expr.ops.phi.args); + } + + /* Delete an expr_hash_elt and reclaim its storage. */ + static void + free_expr_hash_elt (void *elt) + { + struct expr_hash_elt *element = ((struct expr_hash_elt *)elt); + free_expr_hash_elt_contents (element); free (element); } *** lookup_avail_expr (gimple stmt, bool ins *** 2404,2412 slot = htab_find_slot_with_hash (avail_exprs, element, element.hash, (insert ? INSERT : NO_INSERT)); if (slot == NULL) ! return NULL_TREE; ! ! if (*slot == NULL) { struct expr_hash_elt *element2 = XNEW (struct expr_hash_elt); *element2 = element; --- 2409,2419 slot = htab_find_slot_with_hash (avail_exprs, element, element.hash, (insert ? INSERT : NO_INSERT)); if (slot == NULL) ! { ! free_expr_hash_elt_contents (element); ! return NULL_TREE; ! } ! else if (*slot == NULL) { struct expr_hash_elt *element2 = XNEW (struct expr_hash_elt); *element2 = element; *** lookup_avail_expr (gimple stmt, bool ins *** 2422,2427 --- 2429,2436 VEC_safe_push (expr_hash_elt_t, heap, avail_exprs_stack, element2); return NULL_TREE; } + else + free_expr_hash_elt_contents (element); /* Extract the LHS of the assignment so that it can be used as the current definition of another variable. */ Index: gcc/tree-into-ssa.c === *** gcc/tree-into-ssa.c (revision 190533) --- gcc/tree-into-ssa.c (working copy) *** init_ssa_renamer (void) *** 2291,2297 /* Allocate memory for the DEF_BLOCKS hash table. */ gcc_assert (var_infos == NULL); var_infos = htab_create (VEC_length (tree, cfun-local_decls), ! var_info_hash, var_info_eq, NULL); bitmap_obstack_initialize (update_ssa_obstack); } --- 2291,2297 /* Allocate memory for the DEF_BLOCKS hash table. */ gcc_assert (var_infos == NULL); var_infos = htab_create (VEC_length (tree, cfun-local_decls), ! 
var_info_hash, var_info_eq, free); bitmap_obstack_initialize (update_ssa_obstack); } *** update_ssa (unsigned update_flags) *** 3170,3176 { /* If we rename bare symbols initialize the mapping to auxiliar info we need to keep track of. */ ! var_infos = htab_create (47, var_info_hash, var_info_eq, NULL); /* If we have to rename some symbols from
[PATCH] Document tree.h flags more, fixup valgrind alloc-pool.c
Testing in progress. Richard. 2012-08-21 Richard Guenther rguent...@suse.de * alloc-pool.c (pool_alloc): Fix valgrind annotation. * tree.h: Complete flags documentation. (CLEANUP_EH_ONLY): Check documented allowed tree codes. Index: gcc/alloc-pool.c === --- gcc/alloc-pool.c (revision 190558) +++ gcc/alloc-pool.c (working copy) @@ -247,7 +247,9 @@ void * pool_alloc (alloc_pool pool) { alloc_pool_list header; - VALGRIND_DISCARD (int size); +#ifdef ENABLE_VALGRIND_CHECKING + int size; +#endif if (GATHER_STATISTICS) { @@ -260,7 +262,9 @@ pool_alloc (alloc_pool pool) } gcc_checking_assert (pool); - VALGRIND_DISCARD (size = pool->elt_size - offsetof (allocation_object, u.data)); +#ifdef ENABLE_VALGRIND_CHECKING + size = pool->elt_size - offsetof (allocation_object, u.data); +#endif /* If there are no more free elements, make some more!. */ if (!pool->returned_free_list) Index: gcc/tree.h === --- gcc/tree.h (revision 190558) +++ gcc/tree.h (working copy) @@ -417,7 +417,7 @@ enum omp_clause_code so all nodes have these fields. See the accessor macros, defined below, for documentation of the - fields, and the table below which connects the fileds and the + fields, and the table below which connects the fields and the accessor macros. */ struct GTY(()) tree_base { @@ -494,6 +494,9 @@ struct GTY(()) tree_base { CASE_LOW_SEEN in CASE_LABEL_EXPR + PREDICT_EXPR_OUTCOME in + PREDICT_EXPR + static_flag: TREE_STATIC in @@ -576,12 +579,16 @@ struct GTY(()) tree_base { OMP_PARALLEL_COMBINED in OMP_PARALLEL + OMP_CLAUSE_PRIVATE_OUTER_REF in OMP_CLAUSE_PRIVATE TYPE_REF_IS_RVALUE in REFERENCE_TYPE + ENUM_IS_OPAQUE in + ENUMERAL_TYPE + protected_flag: TREE_PROTECTED in @@ -1117,7 +1124,8 @@ extern void omp_clause_range_check_faile /* In a TARGET_EXPR or WITH_CLEANUP_EXPR, means that the pertinent cleanup should only be executed if an exception is thrown, not on normal exit of its scope.
*/ -#define CLEANUP_EH_ONLY(NODE) ((NODE)->base.static_flag) +#define CLEANUP_EH_ONLY(NODE) \ + (TREE_CHECK2 (NODE, TARGET_EXPR, WITH_CLEANUP_EXPR)->base.static_flag) /* In a TRY_CATCH_EXPR, means that the handler should be considered a separate cleanup in honor_protect_cleanup_actions. */
Re: [PATCH] Document tree.h flags more, fixup valgrind alloc-pool.c
On Tue, 21 Aug 2012, Richard Guenther wrote: Testing in progress. Richard. 2012-08-21 Richard Guenther rguent...@suse.de * alloc-pool.c (pool_alloc): Fix valgrind annotation. * tree.h: Complete flags documentation. (CLEANUP_EH_ONLY): Check documented allowed tree codes. I have instead applied the following - the C++ frontend uses CLEANUP_EH_ONLY on C++ specific trees. Bootstrapped on x86_64-unknown-linux-gnu. Richard. 2012-08-21 Richard Guenther rguent...@suse.de * alloc-pool.c (pool_alloc): Fix valgrind annotation. * tree.h: Fix typo and complete flags documentation. Index: gcc/alloc-pool.c === --- gcc/alloc-pool.c(revision 190558) +++ gcc/alloc-pool.c(working copy) @@ -247,7 +247,9 @@ void * pool_alloc (alloc_pool pool) { alloc_pool_list header; - VALGRIND_DISCARD (int size); +#ifdef ENABLE_VALGRIND_CHECKING + int size; +#endif if (GATHER_STATISTICS) { @@ -260,7 +262,9 @@ pool_alloc (alloc_pool pool) } gcc_checking_assert (pool); - VALGRIND_DISCARD (size = pool-elt_size - offsetof (allocation_object, u.data)); +#ifdef ENABLE_VALGRIND_CHECKING + size = pool-elt_size - offsetof (allocation_object, u.data); +#endif /* If there are no more free elements, make some more!. */ if (!pool-returned_free_list) Index: gcc/tree.h === --- gcc/tree.h (revision 190558) +++ gcc/tree.h (working copy) @@ -417,7 +417,7 @@ enum omp_clause_code so all nodes have these fields. See the accessor macros, defined below, for documentation of the - fields, and the table below which connects the fileds and the + fields, and the table below which connects the fields and the accessor macros. 
*/ struct GTY(()) tree_base { @@ -494,6 +494,9 @@ struct GTY(()) tree_base { CASE_LOW_SEEN in CASE_LABEL_EXPR + PREDICT_EXPR_OUTCOME in + PREDICT_EXPR + static_flag: TREE_STATIC in @@ -576,12 +579,16 @@ struct GTY(()) tree_base { OMP_PARALLEL_COMBINED in OMP_PARALLEL + OMP_CLAUSE_PRIVATE_OUTER_REF in OMP_CLAUSE_PRIVATE TYPE_REF_IS_RVALUE in REFERENCE_TYPE + ENUM_IS_OPAQUE in + ENUMERAL_TYPE + protected_flag: TREE_PROTECTED in
Re: [PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function
On Wed, Aug 15, 2012 at 05:21:04PM +0200, Martin Jambor wrote: Hi, On Fri, Aug 10, 2012 at 04:57:41PM +0200, Eric Botcazou wrote: - ada/gcc-interface/utils.c:rest_of_subprog_body_compilation calls dump_function which in turns calls dump_function_to_file which calls push_cfun. But Ada front end has its idea of the current_function_decl and there is no cfun which is an inconsistency which makes push_cfun assert fail. I solved it by temporarily setting current_function_decl to NULL_TREE. It's just dumping and I thought that dump_function should be considered middle-end and thus middle-end invariants should apply. If you think that calling dump_function from rest_of_subprog_body_compilation is a layering violation, I don't have a problem with replacing it with a more manual scheme like the one in c-family/c-gimplify.c:c_genericize, provided that this yields roughly the same output. Richi suggested on IRC that I remove the push/pop_cfun calls from dump_function_to_file. The only problem seems to be dump_histograms_for_stmt Yesterday I actually tried and it is not the only problem. Another one is dump_function_to_file-dump_bb-maybe_hot_bb_p which uses cfun to read profile_status. There may be others, this one just blew up first when I set cfun to NULL. And in future someone is quite likely to need cfun to dump something new too. At the same time, re-implementing dumping c-family/c-gimplify.c:c_genericize when dump_function suffices seems ugly to me. So I am going to declare dump_function a front-end interface and use set_cfun in my original patch in dump_function_to_file like we do in other such functions. I hope that will be OK. Thanks, Martin PS: Each of various alternatives proposed in this thread had someone who opposed it. If there is a consensus that some of them should be implemented anyway (like global value profiling hash), I am willing to do that, I just do not want to end up bickering about the result.
Re: [PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function
On Tue, Aug 21, 2012 at 1:27 PM, Martin Jambor mjam...@suse.cz wrote: On Wed, Aug 15, 2012 at 05:21:04PM +0200, Martin Jambor wrote: Hi, On Fri, Aug 10, 2012 at 04:57:41PM +0200, Eric Botcazou wrote: - ada/gcc-interface/utils.c:rest_of_subprog_body_compilation calls dump_function which in turns calls dump_function_to_file which calls push_cfun. But Ada front end has its idea of the current_function_decl and there is no cfun which is an inconsistency which makes push_cfun assert fail. I solved it by temporarily setting current_function_decl to NULL_TREE. It's just dumping and I thought that dump_function should be considered middle-end and thus middle-end invariants should apply. If you think that calling dump_function from rest_of_subprog_body_compilation is a layering violation, I don't have a problem with replacing it with a more manual scheme like the one in c-family/c-gimplify.c:c_genericize, provided that this yields roughly the same output. Richi suggested on IRC that I remove the push/pop_cfun calls from dump_function_to_file. The only problem seems to be dump_histograms_for_stmt Yesterday I actually tried and it is not the only problem. Another one is dump_function_to_file-dump_bb-maybe_hot_bb_p which uses cfun to read profile_status. There may be others, this one just blew up first when I set cfun to NULL. And in future someone is quite likely to need cfun to dump something new too. At the same time, re-implementing dumping c-family/c-gimplify.c:c_genericize when dump_function suffices seems ugly to me. So I am going to declare dump_function a front-end interface and use set_cfun in my original patch in dump_function_to_file like we do in other such functions. I hope that will be OK. Thanks, Setting cfun has side-effects of switching target stuff which might have code-generation side-effects because of implementation issues we have with target/optimize attributes. So I don't think cfun should be changed just for dumping. 
Can you instead just set current_function_decl and access struct function via DECL_STRUCT_FUNCTION in the dumpers then? After all, if it is a front-end interface, the front-end way of saying "this is the current function" is to set current_function_decl, not the middle-end cfun. Richard. Martin PS: Each of various alternatives proposed in this thread had someone who opposed it. If there is a consensus that some of them should be implemented anyway (like global value profiling hash), I am willing to do that, I just do not want to end up bickering about the result.
[Patch,testsuite] Break gcc.dg/fixed-point/convert.c into manageable parts
Just as the title says: gcc.dg/fixed-point/convert.c is much too big to run on embedded targets like AVR. Note that embedded systems are a main audience of ISO/IEC TR 18037, and that these systems might have limited resources. The original convert.c inflates to thousands of functions and set -O0. Some targets need to emulate *everything*, even integer multiplication, and the executable is much too fat. The patch breaks up convert.c in parts so that an AVR ATmega103 device with 128KiB for executable code (.text + .data + .rodata) can run them. Ok for trunk? Johann * gcc.dg/fixed-point/convert.c: Split into more manageable parts: * gcc.dg/fixed-point/convert-1.c: New. * gcc.dg/fixed-point/convert-2.c: New. * gcc.dg/fixed-point/convert-3.c: New. * gcc.dg/fixed-point/convert-4.c: New. * gcc.dg/fixed-point/convert-float-1.c: New. * gcc.dg/fixed-point/convert-float-2.c: New. * gcc.dg/fixed-point/convert-float-3.c: New. * gcc.dg/fixed-point/convert-float-4.c: New. * gcc.dg/fixed-point/convert-accum-neg.c: New. * gcc.dg/fixed-point/convert-sat.c: New. * gcc.dg/fixed-point/convert.h: New. Index: gcc/testsuite/gcc.dg/fixed-point/convert-sat.c === --- gcc/testsuite/gcc.dg/fixed-point/convert-sat.c (revision 0) +++ gcc/testsuite/gcc.dg/fixed-point/convert-sat.c (revision 0) @@ -0,0 +1,45 @@ +/* { dg-do run } */ +/* { dg-options -std=gnu99 -O0 } */ + +/* C99 6.3 Conversions. + + Check conversions involving fixed-point. 
*/ + +extern void abort (void); + +#include convert.h + +int main () +{ + SAT_CONV1 (short _Accum, hk); + SAT_CONV1 (_Accum, k); + SAT_CONV1 (long _Accum, lk); + SAT_CONV1 (long long _Accum, llk); + + SAT_CONV2 (unsigned short _Accum, uhk); + SAT_CONV2 (unsigned _Accum, uk); + SAT_CONV2 (unsigned long _Accum, ulk); + SAT_CONV2 (unsigned long long _Accum, ullk); + + SAT_CONV3 (short _Fract, hr); + SAT_CONV3 (_Fract, r); + SAT_CONV3 (long _Fract, lr); + SAT_CONV3 (long long _Fract, llr); + + SAT_CONV4 (signed char); + SAT_CONV4 (short); + SAT_CONV4 (int); + SAT_CONV4 (long); + SAT_CONV4 (long long); + + SAT_CONV5 (unsigned char); + SAT_CONV5 (unsigned short); + SAT_CONV5 (unsigned int); + SAT_CONV5 (unsigned long); + SAT_CONV5 (unsigned long long); + + SAT_CONV6 (float); + SAT_CONV6 (double); + + return 0; +} Index: gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c === --- gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c (revision 0) +++ gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c (revision 0) @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options -std=gnu99 -O0 } */ + +/* C99 6.3 Conversions. + + Check conversions involving fixed-point. 
*/ + +extern void abort (void); + +#include convert.h + +int main () +{ + ALL_ACCUM_CONV (short _Accum, hk); + ALL_ACCUM_CONV (_Accum, k); + ALL_ACCUM_CONV (long _Accum, lk); + ALL_ACCUM_CONV (long long _Accum, llk); + ALL_ACCUM_CONV (unsigned short _Accum, uhk); + ALL_ACCUM_CONV (unsigned _Accum, uk); + ALL_ACCUM_CONV (unsigned long _Accum, ulk); + ALL_ACCUM_CONV (unsigned long long _Accum, ullk); + + NEG_CONV (short _Fract, hr); + NEG_CONV (_Fract, r); + NEG_CONV (long _Fract, lr); + NEG_CONV (long long _Fract, llr); + NEG_CONV (short _Accum, hk); + NEG_CONV (_Accum, k); + NEG_CONV (long _Accum, lk); + NEG_CONV (long long _Accum, llk); + + return 0; +} Index: gcc/testsuite/gcc.dg/fixed-point/convert-1.c === --- gcc/testsuite/gcc.dg/fixed-point/convert-1.c (revision 0) +++ gcc/testsuite/gcc.dg/fixed-point/convert-1.c (revision 0) @@ -0,0 +1,20 @@ +/* { dg-do run } */ +/* { dg-options -std=gnu99 -O0 } */ + +/* C99 6.3 Conversions. + + Check conversions involving fixed-point. */ + +extern void abort (void); + +#include convert.h + +int main () +{ + ALL_CONV (short _Fract, hr); + ALL_CONV (_Fract, r); + ALL_CONV (long _Fract, lr); + ALL_CONV (long long _Fract, llr); + + return 0; +} Index: gcc/testsuite/gcc.dg/fixed-point/convert-2.c === --- gcc/testsuite/gcc.dg/fixed-point/convert-2.c (revision 0) +++ gcc/testsuite/gcc.dg/fixed-point/convert-2.c (revision 0) @@ -0,0 +1,20 @@ +/* { dg-do run } */ +/* { dg-options -std=gnu99 -O0 } */ + +/* C99 6.3 Conversions. + + Check conversions involving fixed-point. */ + +extern void abort (void); + +#include convert.h + +int main () +{ + ALL_CONV (unsigned short _Fract, uhr); + ALL_CONV (unsigned _Fract, ur); + ALL_CONV (unsigned long _Fract, ulr); + ALL_CONV (unsigned long long _Fract, ullr); + + return 0; +} Index: gcc/testsuite/gcc.dg/fixed-point/convert.c === --- gcc/testsuite/gcc.dg/fixed-point/convert.c (revision 190558) +++
Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches
On 20 August 2012 16:45, Joseph S. Myers jos...@codesourcery.com wrote:
On Mon, 20 Aug 2012, Simon Baldwin wrote:
OPT_* for Fortran options only exist when the Fortran front-end is in the source tree (whether or not enabled).
I think we try to avoid knowingly breaking use cases where people remove some front ends from the source tree, although we don't actively test them and no longer provide split-up source tarballs.

Thanks for the update. Which fix should move forwards?
I think the approach using a new option flag is the way to go, though the patch needs (at least) documentation for the new flag in options.texi.

Updated version appended below. Okay for 4.8 trunk?

--

Omit OPT_cpp_ from the DWARF producer string in gfortran.

Gfortran uses -cpp=<temporary file> internally, and with -grecord-gcc-switches this command line switch is stored by default in object files. This causes problems with build and packaging systems that care about gcc binary reproducibility and file checksums; the temporary file is different on each compiler invocation.

Fixed by adding a new opt marker NoDWARFRecord and associated flag, filtering out options with this flag set when writing the producer string, and setting the flag for the Fortran -cpp=<temporary file> option.

Tested for Fortran (suppresses -cpp=...) and C (no effect).

gcc/ChangeLog
2012-08-21  Simon Baldwin  sim...@google.com

	* dwarf2out.c (gen_producer_string): Omit command line switch if
	CL_NO_DWARF_RECORD flag set.
	* opts.c (print_specific_help): Add CL_NO_DWARF_RECORD handling.
	* opts.h (CL_NO_DWARF_RECORD): New.
	* opt-functions.awk (switch_flags): Add NoDWARFRecord.
	* doc/options.texi: Document NoDWARFRecord option flag.
	* doc/invoke.texi: Document --help=nodwarfrecord.

gcc/fortran/ChangeLog
2012-08-21  Simon Baldwin  sim...@google.com

	* lang.opt (-cpp=): Mark flag NoDWARFRecord.
Index: gcc/doc/options.texi === --- gcc/doc/options.texi(revision 190535) +++ gcc/doc/options.texi(working copy) @@ -468,4 +468,8 @@ of @option{-@var{opt}}, if not explicitl specify several different languages. Each @var{language} must have been declared by an earlier @code{Language} record. @xref{Option file format}. + +@item NoDWARFRecord +The option is added to the list of those omitted from the producer string +written by @option{-grecord-gcc-switches}. @end table Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 190535) +++ gcc/doc/invoke.texi (working copy) @@ -1330,6 +1330,10 @@ sign in the same continuous piece of tex @item @samp{separate} Display options taking an argument that appears as a separate word following the original option, such as: @samp{-o output-file}. + +@item @samp{nodwarfrecord} +Display only those options that are marked for addition to the list of +options omitted from @option{-grecord-gcc-switches}. @end table Thus for example to display all the undocumented target-specific Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 190535) +++ gcc/dwarf2out.c (working copy) @@ -18101,6 +18101,9 @@ gen_producer_string (void) /* Ignore these. 
 */
	  continue;
	default:
+	  if (cl_options[save_decoded_options[j].opt_index].flags
+	      & CL_NO_DWARF_RECORD)
+	    continue;
	  gcc_checking_assert (save_decoded_options[j].canonical_option[0][0]
			       == '-');
	  switch (save_decoded_options[j].canonical_option[0][1])
Index: gcc/opts.c
===
--- gcc/opts.c (revision 190535)
+++ gcc/opts.c (working copy)
@@ -1186,7 +1186,9 @@ print_specific_help (unsigned int includ
 {
   if (any_flags == 0)
     {
-      if (include_flags & CL_UNDOCUMENTED)
+      if (include_flags & CL_NO_DWARF_RECORD)
+	description = _("The following options are not recorded by DWARF");
+      else if (include_flags & CL_UNDOCUMENTED)
	description = _("The following options are not documented");
      else if (include_flags & CL_SEPARATE)
	description = _("The following options take separate arguments");
@@ -1292,7 +1294,7 @@ common_handle_option (struct gcc_options
  /* Walk along the argument string, parsing each word in turn.
     The format is:
     arg = [^]{word}[,{arg}]
-     word = {optimizers|target|warnings|undocumented|
+     word = {optimizers|target|warnings|undocumented|nodwarfrecord|
	     params|common|language}  */
  while (* a != 0)
    {
@@ -1307,6 +1309,7 @@ common_handle_option (struct gcc_options
	  { "target", CL_TARGET },
	  { "warnings", CL_WARNING },
	  { "undocumented", CL_UNDOCUMENTED
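The producer-string filtering in this patch reduces to a per-option flag test before the switch text is appended. A standalone sketch of that idea follows; the CL_NO_DWARF_RECORD value, the struct, and the function names here are made up for illustration and are not GCC's actual declarations.

```c
#include <string.h>

/* Hypothetical flag bit; GCC's real CL_* values live in opts.h.  */
#define CL_NO_DWARF_RECORD 0x1000u

struct opt_sketch
{
  const char *text;     /* the switch as it appeared on the command line */
  unsigned int flags;   /* CL_* flag bits for this option */
};

/* Append every recordable switch to BUF, skipping those marked
   CL_NO_DWARF_RECORD -- the same shape as gen_producer_string's loop.  */
static void
build_producer (char *buf, size_t size,
                const struct opt_sketch *opts, size_t n)
{
  buf[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      if (opts[i].flags & CL_NO_DWARF_RECORD)
        continue;
      if (buf[0] != '\0')
        strncat (buf, " ", size - strlen (buf) - 1);
      strncat (buf, opts[i].text, size - strlen (buf) - 1);
    }
}
```

With a flagged `-cpp=` switch in the list, only the reproducible switches survive into the producer string, which is exactly what makes the object files checksum-stable.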
Re: patch for machine independent rtl section to hide case statements for different types of constants.
Now that I have had a chance to talk to Richard, I have now done everything that he requested in his email. Here is the new patch and changelog. Everything was tested on x86-64. 2012-08-21 Kenneth Zadeck zad...@naturalbridge.com * alias.c (rtx_equal_for_memref_p): Convert constant cases. * combine.c (find_single_use_1, mark_used_regs_combine): Ditto. * cse.c (exp_equiv_p, canon_reg, fold_rtx, cse_process_notes_1, count_reg_usage): Ditto. * cselib.c (cselib_expand_value_rtx_1): Convert to CASE_CONST_ANY. (cselib_subst_to_values): Convert constant cases. * df-scan.c (df_uses_record): Ditto. * dse.c (const_or_frame_p): Convert case statements to explicit if-then-else using mode classes. * emit-rtl.c (verify_rtx_sharing, copy_insn_1): Convert constant cases. * explow.c (convert_memory_address_addr_space): Ditto. * gcse.c (want_to_gcse_p, oprs_unchanged_p, compute_transp): Ditto. * genattrtab.c (attr_copy_rtx, clear_struct_flag): Ditto. * ira.c (equiv_init_varies_p, contains_replace_regs, memref_referenced_p, rtx_moveable_p): Ditto. * jump.c (mark_jump_label_1): Remove constant cases. (rtx_renumbered_equal_p): Convert to CASE_CONST_UNIQUE. * loop-invariant.c (check_maybe_invariant): Convert constant cases. (hash_invariant_expr_1,invariant_expr_equal_p): Convert to CASE_CONST_ALL. * postreload-gcse.c (oprs_unchanged_p): Convert constant cases. * reginfo.c (reg_scan_mark_refs): Ditto. * regrename.c (scan_rtx): Ditto. * reload1.c (eliminate_regs_1, elimination_effects, scan_paradoxical_subregs): Ditto. * reload.c (operands_match_p, subst_reg_equivs): Ditto. * resource.c (mark_referenced_resources, mark_set_resources): Ditto. * rtlanal.c (rtx_unstable_p, rtx_varies_p, count_occurrences) (reg_mentioned_p, modified_between_p, modified_in_p) (volatile_insn_p, volatile_refs_p, side_effects_p, may_trap_p_1, inequality_comparisons_p, computed_jump_p_1): Ditto. * rtl.c (copy_rtx, rtx_equal_p_cb, rtx_equal_p): Ditto. * sched-deps.c (sched_analyze_2): Ditto. 
* valtrack.c (cleanup_auto_inc_dec): Ditto. * rtl.h: (CASE_CONST_SCALAR_INTEGER, CASE_CONST_UNIQUE, CASE_CONST_ANY): New macros. I plan to commit this in a few days unless someone has some comments. This is a mostly trivial patch and the changes from that are Richard Sandiford's and he is an rtl maintainer. kenny On 08/20/2012 09:58 AM, Kenneth Zadeck wrote: I of course meant the machine independent not dependent On 08/20/2012 09:50 AM, Kenneth Zadeck wrote: This patch started out to be a purely mechanical change to the switch statements so that the ones that are used to take apart constants can be logically grouped. This is important for the next patch that I will submit this week that frees the rtl level from only being able to represent large integer constants with two HWIs. I sent the patch to Richard Sandiford and when the comments came back from him, this patch turned into something that actually has real semantic changes. (His comments are enclosed below.) I did almost all of Richard's changes because he is generally right about such things, but it does mean that the patch has to be more carefully reviewed. Richard does not count his comments as a review. The patch has, of course, been properly tested on x86-64. Any comments? Ok for commit? Kenny diff -upNr '--exclude=.svn' gccBaseline/gcc/alias.c gccWCase/gcc/alias.c --- gccBaseline/gcc/alias.c 2012-08-17 09:35:24.794195890 -0400 +++ gccWCase/gcc/alias.c 2012-08-19 09:48:33.666509880 -0400 @@ -1486,9 +1486,7 @@ rtx_equal_for_memref_p (const_rtx x, con return XSTR (x, 0) == XSTR (y, 0); case VALUE: -case CONST_INT: -case CONST_DOUBLE: -case CONST_FIXED: +CASE_CONST_UNIQUE: /* There's no need to compare the contents of CONST_DOUBLEs or CONST_INTs because pointer equality is a good enough comparison for these nodes. 
*/ diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c --- gccBaseline/gcc/combine.c 2012-08-17 09:35:24.802195795 -0400 +++ gccWCase/gcc/combine.c 2012-08-20 15:43:34.659362244 -0400 @@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc) switch (code) { -case CONST_INT: case CONST: case LABEL_REF: case SYMBOL_REF: -case CONST_DOUBLE: -case CONST_VECTOR: +CASE_CONST_UNIQUE: case CLOBBER: return 0; @@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x) { case LABEL_REF: case SYMBOL_REF: -case CONST_INT: case CONST: -case CONST_DOUBLE: -case CONST_VECTOR: +CASE_CONST_UNIQUE: case PC: case ADDR_VEC: case ADDR_DIFF_VEC: diff -upNr '--exclude=.svn' gccBaseline/gcc/cse.c gccWCase/gcc/cse.c --- gccBaseline/gcc/cse.c 2012-07-27 16:58:24.829691705 -0400 +++ gccWCase/gcc/cse.c 2012-08-20 15:47:26.924501205 -0400 @@ -2623,9 +2623,7 @@
Re: [wwwdocs] Document Runtime CPU detection builtins
On 2012-08-20 22:41 , Sriraman Tallam wrote: Hi Gerald / Diego, I have made all the mentioned changes. I also shortened the description like Diego mentioned by removing all the strings but kept the caveats. I have not added a reference to the documentation because i do not know what link to reference. The builtins are completely documented in extend.texi. Referring to the user's manual is OK, I think. +pCaveat: If these built-in functions are called before any static +constructors are invoked, like during IFUNC initialization, then the CPU +detection initialization must be explicity run using this newly provided s/explicity/explicitly/ Other than that, it looks fine to me. Diego.
Re: [PATCH] Combine location with block using block_locations
On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen de...@google.com wrote: ping Conceptually I like the change. Can a libcpp maintainer please have a 2nd look? Dehao, did you do any compile-time and memory-usage benchmarks? Thanks, Richard. Thanks, Dehao On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen de...@google.com wrote: Hi, Dodji, Thanks for the review. I've fixed all the addressed issues. I'm attaching the related changes: Thanks, Dehao libcpp/ChangeLog: 2012-08-01 Dehao Chen de...@google.com * include/line-map.h (MAX_SOURCE_LOCATION): New value. (location_adhoc_data_init): New. (location_adhoc_data_fini): New. (get_combined_adhoc_loc): New. (get_data_from_adhoc_loc): New. (get_location_from_adhoc_loc): New. (COMBINE_LOCATION_DATA): New. (IS_ADHOC_LOC): New. (expanded_location): New field. * line-map.c (location_adhoc_data): New. (location_adhoc_data_htab): New. (curr_adhoc_loc): New. (location_adhoc_data): New. (allocated_location_adhoc_data): New. (location_adhoc_data_hash): New. (location_adhoc_data_eq): New. (location_adhoc_data_update): New. (get_combined_adhoc_loc): New. (get_data_from_adhoc_loc): New. (get_location_from_adhoc_loc): New. (location_adhoc_data_init): New. (location_adhoc_data_fini): New. (linemap_lookup): Change to use new location. (linemap_ordinary_map_lookup): Likewise. (linemap_macro_map_lookup): Likewise. (linemap_macro_map_loc_to_def_point): Likewise. (linemap_macro_map_loc_unwind_toward_spel): Likewise. (linemap_get_expansion_line): Likewise. (linemap_get_expansion_filename): Likewise. (linemap_location_in_system_header_p): Likewise. (linemap_location_from_macro_expansion_p): Likewise. (linemap_macro_loc_to_spelling_point): Likewise. (linemap_macro_loc_to_def_point): Likewise. (linemap_macro_loc_to_exp_point): Likewise. (linemap_resolve_location): Likewise. (linemap_unwind_toward_expansion): Likewise. (linemap_unwind_to_first_non_reserved_loc): Likewise. (linemap_expand_location): Likewise. (linemap_dump_location): Likewise. 
Index: libcpp/line-map.c
===
--- libcpp/line-map.c (revision 190209)
+++ libcpp/line-map.c (working copy)
@@ -25,6 +25,7 @@
 #include "line-map.h"
 #include "cpplib.h"
 #include "internal.h"
+#include "hashtab.h"

 static void trace_include (const struct line_maps *, const struct line_map *);
 static const struct line_map * linemap_ordinary_map_lookup (struct line_maps *,
@@ -50,6 +51,135 @@
 extern unsigned num_expanded_macros_counter;
 extern unsigned num_macro_tokens_counter;

+/* Data structure to associate an arbitrary data to a source location.  */
+struct location_adhoc_data {
+  source_location locus;
+  void *data;
+};
+
+/* The following data structure encodes a location with some adhoc data
+   and maps it to a new unsigned integer (called an adhoc location)
+   that replaces the original location to represent the mapping.
+
+   The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the
+   highest bit is 1, then the number is adhoc_loc.  Otherwise, it serves as
+   the original location.  Once identified as the adhoc_loc, the lower 31
+   bits of the integer is used to index the location_adhoc_data array,
+   in which the locus and associated data is stored.  */
+
+static htab_t location_adhoc_data_htab;
+static source_location curr_adhoc_loc;
+static struct location_adhoc_data *location_adhoc_data;
+static unsigned int allocated_location_adhoc_data;
+
+/* Hash function for location_adhoc_data hashtable.  */
+
+static hashval_t
+location_adhoc_data_hash (const void *l)
+{
+  const struct location_adhoc_data *lb =
+      (const struct location_adhoc_data *) l;
+  return (hashval_t) lb->locus + (size_t) lb->data;
+}
+
+/* Compare function for location_adhoc_data hashtable.  */
+
+static int
+location_adhoc_data_eq (const void *l1, const void *l2)
+{
+  const struct location_adhoc_data *lb1 =
+      (const struct location_adhoc_data *) l1;
+  const struct location_adhoc_data *lb2 =
+      (const struct location_adhoc_data *) l2;
+  return lb1->locus == lb2->locus && lb1->data == lb2->data;
+}
+
+/* Update the hashtable when location_adhoc_data is reallocated.  */
+
+static int
+location_adhoc_data_update (void **slot, void *data)
+{
+  *((char **) slot) += ((char *) location_adhoc_data - (char *) data);
+  return 1;
+}
+
+/* Combine LOCUS and DATA to a combined adhoc loc.  */
+
+source_location
+get_combined_adhoc_loc (source_location locus, void *data)
+{
+  struct location_adhoc_data lb;
+  struct location_adhoc_data **slot;
+
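Since the comment block in the patch fully specifies the encoding, the tag/untag operations can be modeled in a few lines. This is an illustrative sketch only; libcpp's real IS_ADHOC_LOC and COMBINE_LOCATION_DATA also maintain the side array and the hash table shown in the patch.

```c
typedef unsigned int source_location;

/* Highest bit set means "this value is an ad-hoc location".  */
#define ADHOC_FLAG 0x80000000u

static int
is_adhoc_loc (source_location loc)
{
  return (loc & ADHOC_FLAG) != 0;
}

/* Turn an index into the location_adhoc_data side array into an
   ad-hoc location.  The index must fit in the low 31 bits.  */
static source_location
combine_location (unsigned int index)
{
  return (source_location) index | ADHOC_FLAG;
}

/* Recover the side-array index from an ad-hoc location.  */
static unsigned int
adhoc_index (source_location loc)
{
  return loc & ~ADHOC_FLAG;
}
```

The design keeps ordinary locations bit-compatible with the old scheme: any value with the high bit clear still means what it always meant, so only code that explicitly tests the tag needs to change.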
[PATCH][4.7] Backport recent heap leak fixes
This backports the obvious heap leak fixes that have accumulated sofar. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2012-08-21 Richard Guenther rguent...@suse.de Backport from mainline 2012-08-16 Richard Guenther rguent...@suse.de PR middle-end/54146 * tree-ssa-loop-niter.c (find_loop_niter_by_eval): Free the exit vector. * ipa-pure-const.c (analyze_function): Use FOR_EACH_LOOP_BREAK. * cfgloop.h (FOR_EACH_LOOP_BREAK): Fix. * tree-ssa-structalias.c (handle_lhs_call): Properly free rhsc. * tree-ssa-loop-im.c (analyze_memory_references): Adjust. (tree_ssa_lim_finalize): Free all mem_refs. * tree-ssa-sccvn.c (extract_and_process_scc_for_name): Free scc when bailing out. * modulo-sched.c (sms_schedule): Use FOR_EACH_LOOP_BREAK. * ira-build.c (loop_with_complex_edge_p): Free loop exit vector. * graphite-sese-to-poly.c (scop_ivs_can_be_represented): Use FOR_EACH_LOOP_BREAK. 2012-08-17 Richard Guenther rguent...@suse.de * tree-sra.c (modify_function): Free redirect_callers vector. * ipa-split.c (split_function): Free args_to_pass vector. * tree-vect-stmts.c (vectorizable_operation): Do not pre-allocate vec_oprnds. (new_stmt_vec_info): Do not pre-allocate STMT_VINFO_SAME_ALIGN_REFS. * tree-vect-slp.c (vect_free_slp_instance): Free the instance. (vect_analyze_slp_instance): Free everything. (destroy_bb_vec_info): Free the SLP instances. 2012-08-17 Richard Guenther rguent...@suse.de * params.def (integer-share-limit): Decrease from 256 to 251, add rationale. 2012-08-21 Richard Guenther rguent...@suse.de * tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free the affine expansion cache. Index: gcc/tree-ssa-loop-niter.c === --- gcc/tree-ssa-loop-niter.c (revision 190560) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -2290,7 +2290,10 @@ find_loop_niter_by_eval (struct loop *lo /* Loops with multiple exits are expensive to handle and less important. 
 */
   if (!flag_expensive_optimizations
       && VEC_length (edge, exits) > 1)
-    return chrec_dont_know;
+    {
+      VEC_free (edge, heap, exits);
+      return chrec_dont_know;
+    }

   FOR_EACH_VEC_ELT (edge, exits, i, ex)
     {
Index: gcc/ipa-pure-const.c
===
--- gcc/ipa-pure-const.c (revision 190560)
+++ gcc/ipa-pure-const.c (working copy)
@@ -803,7 +803,7 @@ end:
	  if (dump_file)
	    fprintf (dump_file, "    can not prove finiteness of loop %i\n", loop->num);
	  l->looping = true;
-	  break;
+	  FOR_EACH_LOOP_BREAK (li);
	}
      scev_finalize ();
    }
Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 190560)
+++ gcc/ipa-split.c (working copy)
@@ -1239,6 +1239,7 @@ split_function (struct split_point *spli
     }
   call = gimple_build_call_vec (node->decl, args_to_pass);
   gimple_set_block (call, DECL_INITIAL (current_function_decl));
+  VEC_free (tree, heap, args_to_pass);

   /* We avoid address being taken on any variable used by split part,
      so return slot optimization is always possible.  Moreover this is
Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 190560)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -3229,6 +3229,7 @@ scop_ivs_can_be_represented (scop_p scop
   loop_iterator li;
   loop_p loop;
   gimple_stmt_iterator psi;
+  bool result = true;

   FOR_EACH_LOOP (li, loop, 0)
     {
@@ -3244,11 +3245,16 @@ scop_ivs_can_be_represented (scop_p scop
	if (TYPE_UNSIGNED (type)
	    && TYPE_PRECISION (type) >= TYPE_PRECISION (long_long_integer_type_node))
-	  return false;
+	  {
+	    result = false;
+	    break;
+	  }
      }
+      if (!result)
+	FOR_EACH_LOOP_BREAK (li);
    }

-  return true;
+  return result;
 }

 /* Builds the polyhedral representation for a SESE region.  */
Index: gcc/cfgloop.h
===
--- gcc/cfgloop.h (revision 190560)
+++ gcc/cfgloop.h (working copy)
@@ -629,7 +629,7 @@ fel_init (loop_iterator *li, loop_p *loo
 #define FOR_EACH_LOOP_BREAK(LI) \
   { \
-    VEC_free (int, heap, (LI)->to_visit); \
+    VEC_free (int, heap, (LI).to_visit); \
     break; \
   }
Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c (revision 190560)
+++ gcc/tree-ssa-structalias.c
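All the hunks in this backport fix one recurring shape of leak: a heap-allocated vector with an early exit path that skips the free. Here is a distilled C sketch of the tree-ssa-loop-niter.c case, using plain malloc/free in place of GCC's VEC API; all names are illustrative.

```c
#include <stdlib.h>

struct exit_vec
{
  int *edges;
  size_t len;
};

static struct exit_vec *
get_loop_exits (size_t n)
{
  struct exit_vec *v = malloc (sizeof *v);
  v->edges = calloc (n, sizeof *v->edges);
  v->len = n;
  return v;
}

static void
exit_vec_free (struct exit_vec *v)
{
  free (v->edges);
  free (v);
}

/* Mirrors find_loop_niter_by_eval: bail out on multi-exit loops unless
   expensive optimizations are enabled.  Before the fix, the early
   return leaked the vector.  Returns -1 for "don't know".  */
static long
niter_by_eval (size_t n_exits, int expensive_ok)
{
  struct exit_vec *exits = get_loop_exits (n_exits);
  if (!expensive_ok && exits->len > 1)
    {
      exit_vec_free (exits);  /* the fix: free on the bail-out path too */
      return -1;
    }
  long n = (long) exits->len;
  exit_vec_free (exits);
  return n;
}
```

The graphite hunk is the same fix in a slightly harder setting: because the early exit sits inside a loop iterator, it has to record `result = false` and fall through to `FOR_EACH_LOOP_BREAK`, which is the iterator's own cleanup-then-break idiom.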
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
The issue here is holding locks for all the files (there can be many); the number of locks held raises the possibility of deadlocking (mind that updating may happen in different orders on the same files for different programs built from the same objects).

lockf typically has a deadlock detector, and will error out.

-Andi
RE: [AARCH64] [PATCH 1/3] AArch64 Port
Hi, Thanks for the feedback. I respond here to the remaining issues: Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 187870) +++ gcc/doc/extend.texi (working copy) @@ -935,7 +935,8 @@ Not all targets support additional floating point types. @code{__float80} and @code{__float128} types are supported on i386, x86_64 and ia64 targets. -The @code{__float128} type is supported on hppa HP-UX targets. +The @code{__float128} type is supported on hppa HP-UX targets and ARM AArch64 +targets. I don't see any good reason to support it on AArch64, since it's the same as long double there. (It's on PA HP-UX as a workaround for libquadmath requiring the type rather than being able to with with a type called either long double or __float128 - libquadmath being used on PA HP-UX as a workaround for the system libm lacking much long double support. But that shouldn't be an issue for new targets such as AArch64 GNU/Linux. And my understanding from N1582 is that the C bindings for IEEE 754-2008, being worked on for a five-part ISO/IEC TS, are expected to use names such as _Float128, not __float128, as standard names for supported IEEE floating-point types.) Support for __float128 has been removed. Fixed in: r189655 | sofiane | 2012-07-19 13:24:57 +0100 (Thu, 19 Jul 2012) | 19 lines [AArch64] Remove __float128 support. +@opindex mbig-endian +Generate big-endian code. This is the default when GCC is configured for an +@samp{aarch64*be-*-*} target. In general, throughout Texinfo changes, two spaces after . at the end of a sentence. +@item -march=@var{name} +@opindex march +Specify the name of the target architecture, optionally suffixed by one or +more feature modifiers. This option has the form +@samp{-march=arch[+[no]feature]}, where the only value for @samp{arch} +is @samp{armv8}, and the possible values for @samp{feature} are +@samp{crypto}, @samp{fp}, @samp{simd}. 
It's unfortunate that you've chosen this complicated syntax that means the generic support for enumerated option arguments cannot be used (and so --help information cannot list supported CPUs and features). A simpler syntax where -march takes just an architecture name and features have separate options would seem better, and more in line with most other architectures supported by GCC. There are several Texinfo problems above. Instead of feature you should use @var{feature}, and since the '[' and ']' are not literal text they should be inside @r{} - the proper way of writing @samp{-march=arch[+[no]feature]} would be @option{-march=@var{arch}@r{[}+@r{[}no@r{]}@var{feature}@r{]}}. Also, could you document what the feature names mean? Documentation formatting has been fixed to conform to the required styling. Also the documentation has been updated to clarify ambiguous parts or add missing ones. Fixed in: r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines [AArch64] Fix documentation layout. +@item -mcpu=@var{name} +@opindex mcpu +Specify the name of the target processor, optionally suffixed by one or more +feature modifiers. This option has the form @samp{- cpu=cpu[+[no]feature]}, +where the possible values for @samp{cpu} are @samp{generic}, @samp{large}, +and the possible values for @samp{feature} are @samp{crypto}, @samp{fp}, +@samp{simd}. Same comments apply. Same as above. Fixed in: r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines [AArch64] Fix documentation layout. +This option is very similar to the -mcpu= option, except that instead of @option{-mcpu=}. And does -mtune= take feature names or just plain CPU names? Same as above. Fixed in: r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines [AArch64] Fix documentation layout. 
+  if (mvn == 0)
+    {
+      if (widthc != 'd')
+	sprintf (templ,"movi\t%%0.%d%c, %%1, lsl %d" ,(64/width),
+		 widthc, shift);
+      else
+	sprintf (templ,"movi\t%%d0, %%1");
+    }
+  else
+    sprintf (templ,"mvni\t%%0.%d%c, %%1, lsl %d",(64/width),
+	     widthc, shift);

Presumably you have some logic for why the 40-byte buffer size is enough, but could you use snprintf with sizeof (templ) specified in the call to protect against any mistakes in that logic? Also, spaces after commas and around the / in the division, and the second line in the function call should be lined up immediately after the opening '(', not further right. (Check for and fix all these issues elsewhere in the port as well; I've just pointed out a representative instance of them.)

sprintf has been replaced with snprintf and sizeof (templ) as appropriate.

Fixed in:
r188896 | belagod | 2012-06-22 18:32:35 +0100 (Fri, 22 Jun
PATCH: PR target/54347: REAL_VALUE_TO_TARGET_LONG_DOUBLE shouldn't be used in i386
Hi, long double may not be 80-bit on i386. We can't use REAL_VALUE_TO_TARGET_LONG_DOUBLE for XFmode. This patch replaces REAL_VALUE_TO_TARGET_LONG_DOUBLE with real_to_target. OK to install? Thanks. H.J. --- 2012-08-21 H.J. Lu hongjiu...@intel.com PR target/54347 * config/i386/i386.c (ix86_split_to_parts): Replace REAL_VALUE_TO_TARGET_LONG_DOUBLE with real_to_target. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 5da4da2..a6fc45b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -20743,7 +20743,9 @@ ix86_split_to_parts (rtx operand, rtx *parts, enum machine_mode mode) parts[2] = gen_int_mode (l[2], SImode); break; case XFmode: - REAL_VALUE_TO_TARGET_LONG_DOUBLE (r, l); + /* We can't use REAL_VALUE_TO_TARGET_LONG_DOUBLE since +long double may not be 80-bit. */ + real_to_target (l, r, mode); parts[2] = gen_int_mode (l[2], SImode); break; case DFmode:
RE: [AARCH64] [PATCH 2/3] AArch64 Port
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Joseph S. Myers Sent: 25 May 2012 15:24 To: Marcus Shawcroft Cc: gcc-patches@gcc.gnu.org Subject: Re: [AARCH64] [PATCH 2/3] AArch64 Port On Fri, 25 May 2012, Marcus Shawcroft wrote: Index: gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x === --- gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x (revision 0) +++ gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x (revision 0) @@ -0,0 +1,5 @@ +if { [istarget aarch64_be-*-*] } then { + return 1 +} + +return 0 This isn't a suitable way of enabling a test only for one endianness, since a test may be run with -mbig-endian or -mlittle-endian with a compiler defaulting to the other endianness. You need to test an effective-target keyword instead. Index: gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x === --- gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x (revision 0) +++ gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x (revision 0) @@ -0,0 +1,5 @@ +if { [istarget aarch64_be-*-*] } then { + return 1 +} + +return 0 Likewise. Thanks. This is now fixed in: r190482 | sofiane | 2012-08-17 16:02:20 +0100 (Fri, 17 Aug 2012) | 9 lines [AArch64] Use effective-target to check for big endian Sofiane
Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches
On Tue, 21 Aug 2012, Simon Baldwin wrote:

Index: gcc/doc/options.texi
===
--- gcc/doc/options.texi (revision 190535)
+++ gcc/doc/options.texi (working copy)
@@ -468,4 +468,8 @@ of @option{-@var{opt}}, if not explicitl
 specify several different languages.  Each @var{language} must have
 been declared by an earlier @code{Language} record.  @xref{Option file format}.
+
+@item NoDWARFRecord
+The option is added to the list of those omitted from the producer string
+written by @option{-grecord-gcc-switches}.

Remove "added to the list of those" (which seems unnecessarily verbose).

+@item @samp{nodwarfrecord}
+Display only those options that are marked for addition to the list of
+options omitted from @option{-grecord-gcc-switches}.

I don't think there's any need for special --help support for options with this flag; this flag is really an implementation detail. (Thus, I think all the opts.c changes are unnecessary.)

-- Joseph S. Myers jos...@codesourcery.com
Re: [google/gcc-4_7] Fix regression - SUBTARGET_EXTRA_SPECS overridden by LINUX_GRTE_EXTRA_SPECS
Hi Jing, the crosstool test passed. You can start the review, thanks! -Han On Wed, Aug 15, 2012 at 3:11 PM, Han Shen(沈涵) shen...@google.com wrote: Hi Jing, ping? On Mon, Aug 13, 2012 at 10:58 AM, Han Shen(沈涵) shen...@google.com wrote: Hi, the google/gcc-4_7 fails to linking anything (on x86-generic), by looking into specs file, it seems that 'link_emulation' section is missing in specs. The problem is in config/i386/linux.h, SUBTARGET_EXTRA_SPECS (which is not empty for chrome x86-generic) is overridden by LINUX_GRTE_EXTRA_SPECS. My fix is to prepend LINUX_GRTE_EXTRA_SPECS to SUBTARGET_EXTRA_SPECS in linux.h Jing, could you take a look at this? -- Han Shen 2012-08-13 Han Shen shen...@google.com * gcc/config/i386/gnu-user.h (SUBTARGET_EXTRA_SPECS): Compute new value of LINUX_GRTE_EXTRA_SPECS by pre-pending LINUX_GRTE_EXTRA_SPECS to its origin value. * gcc/config/i386/gnu-user.h (SUBTARGET_EXTRA_SPECS_STR): Add new MACRO to hold value of SUBTARET_EXTRA_SPECS so that SUBTARET_EXTRA_SPECS could be replaced later in gnu-user.h --- a/gcc/config/i386/gnu-user.h +++ b/gcc/config/i386/gnu-user.h @@ -92,11 +92,14 @@ along with GCC; see the file COPYING3. If not see #define ASM_SPEC \ --32 %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}} -#undef SUBTARGET_EXTRA_SPECS -#define SUBTARGET_EXTRA_SPECS \ +#undef SUBTARGET_EXTRA_SPECS_STR +#define SUBTARGET_EXTRA_SPECS_STR \ { link_emulation, GNU_USER_LINK_EMULATION },\ { dynamic_linker, GNU_USER_DYNAMIC_LINKER } +#undef SUBTARGET_EXTRA_SPECS +#define SUBTARGET_EXTRA_SPECS SUBTARGET_EXTRA_SPECS_STR + #undef LINK_SPEC #define LINK_SPEC -m %(link_emulation) %{shared:-shared} \ %{!shared: \ --- a/gcc/config/i386/linux.h +++ b/gcc/config/i386/linux.h @@ -32,5 +32,11 @@ along with GCC; see the file COPYING3. 
If not see #endif #undef SUBTARGET_EXTRA_SPECS +#ifndef SUBTARGET_EXTRA_SPECS_STR #define SUBTARGET_EXTRA_SPECS \ LINUX_GRTE_EXTRA_SPECS +#else +#define SUBTARGET_EXTRA_SPECS \ + LINUX_GRTE_EXTRA_SPECS \ + SUBTARGET_EXTRA_SPECS_STR +#endif -- Han Shen | Software Engineer | shen...@google.com | +1-650-440-3330 -- Han Shen | Software Engineer | shen...@google.com | +1-650-440-3330
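The guard-macro pattern used in this fix can be shown in isolation: the first header publishes its contribution under a second name (SUBTARGET_EXTRA_SPECS_STR), so a later header can test for that name and combine rather than clobber. In this sketch the spec strings are placeholder fragments, and the macro bodies mirror (but do not reproduce) the ones in gnu-user.h and linux.h.

```c
#include <string.h>

/* Stands in for gnu-user.h: publish the specs under a guard name.  */
#define SUBTARGET_EXTRA_SPECS_STR "link_emulation;dynamic_linker;"

/* Stands in for linux.h: it always contributes LINUX_GRTE_EXTRA_SPECS,
   and keeps the earlier entries whenever the guard macro exists.  */
#define LINUX_GRTE_EXTRA_SPECS "grte;"

#undef SUBTARGET_EXTRA_SPECS
#ifndef SUBTARGET_EXTRA_SPECS_STR
#define SUBTARGET_EXTRA_SPECS LINUX_GRTE_EXTRA_SPECS
#else
#define SUBTARGET_EXTRA_SPECS \
  LINUX_GRTE_EXTRA_SPECS SUBTARGET_EXTRA_SPECS_STR
#endif

/* String-literal concatenation yields the combined spec list.  */
static const char *subtarget_extra_specs = SUBTARGET_EXTRA_SPECS;
```

Before the fix, linux.h's unconditional `#define SUBTARGET_EXTRA_SPECS LINUX_GRTE_EXTRA_SPECS` behaved like the `#ifndef` branch regardless, silently dropping the link_emulation and dynamic_linker entries, which is exactly the missing-specs symptom described above.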
Merge from gcc 4.7 branch to gccgo branch
I've merged gcc 4.7 branch revision 190560 to the gccgo branch. Ian
Re: [Patch,AVR] PR54222: Add fixed point support
2012/8/13 Georg-Johann Lay a...@gjlay.de:
Denis Chertykov wrote:
2012/8/11 Georg-Johann Lay a...@gjlay.de:
Weddington, Eric schrieb:
From: Georg-Johann Lay

The first step would be to bisect and find the patch that led to PR53923. It was not a change in the avr BE, so the question goes to the authors of the respective patch.

Up to now I didn't even try to bisect; that would take years on the host that I have available...

My only real concern is that this is a major feature addition and the AVR port is currently broken.

I don't know if it's the avr port or some parts of the middle end that don't cooperate with avr.

I would really, really love to see fixed point support added in, especially since I know that Sean has worked on it for quite a while, and you've also done a lot of work in getting the patches in shape to get them committed. But, if the AVR port is currently broken (by whomever, and whatever patch) and a major feature like this can't be tested to make sure it doesn't break anything else in the AVR backend, then I'm hesitant to approve (even though I really want to approve).

I don't understand enough of DF to fix PR53923. The insn that leads to the ICE is (in df-problems.c:dead_debug_insert_temp):

Today I have updated the GCC svn tree and successfully compiled avr-gcc. The libgcc2 mulsc3.c also compiled without bugs.

Denis.

PS: Maybe I'm doing something wrong? (I had too long vacations)

I am configuring with --target=avr --disable-nls --with-dwarf2 --enable-languages=c,c++ --enable-target-optspace=yes --enable-checking=yes,rtl

Build GCC is gcc version 4.3.2. Build and host are i686-pc-linux-gnu. Maybe it's different on a 64-bit computer, but I only have a 32-bit host.

I have been debugging PR53923 and in my opinion it's not an AVR port bug. Please commit the fixed point support.

Denis.

PS: sorry for the delay
Re: patch for machine independent rtl section to hide case statements for different types of constants.
Kenneth Zadeck zad...@naturalbridge.com writes:
I plan to commit this in a few days unless someone has some comments. This is a mostly trivial patch and the changes from that are Richard Sandiford's and he is an rtl maintainer.

Please don't do this. Patches need to be sent for review in their final form. Obviously, having got this far with the patch, you're free to beat me up if I don't review it. :-)

Anyway, please do call it CASE_CONST_SCALAR_INT rather than CASE_CONST_SCALAR_INTEGER. Like I said in my original mail, CASE_CONST_SCALAR_INT chimes nicely with SCALAR_INT_MODE_P, etc., and (as I didn't say) it'd be better not to have two spellings of the same thing.

diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
--- gccBaseline/gcc/combine.c 2012-08-17 09:35:24.802195795 -0400
+++ gccWCase/gcc/combine.c 2012-08-20 15:43:34.659362244 -0400
@@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)
   switch (code)
     {
-    case CONST_INT:
     case CONST:
     case LABEL_REF:
     case SYMBOL_REF:
-    case CONST_DOUBLE:
-    case CONST_VECTOR:
+    CASE_CONST_UNIQUE:
     case CLOBBER:
       return 0;

@@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
     {
     case LABEL_REF:
     case SYMBOL_REF:
-    case CONST_INT:
     case CONST:
-    case CONST_DOUBLE:
-    case CONST_VECTOR:
+    CASE_CONST_UNIQUE:
     case PC:
     case ADDR_VEC:
     case ADDR_DIFF_VEC:

These were supposed to be CASE_CONST_ANY. The omission of CONST_FIXED looks like an oversight.

   switch (code)
     {
-    case CONST_INT:
-    case CONST_DOUBLE:
-    case CONST_FIXED:
+    CASE_CONST_UNIQUE:
     case SYMBOL_REF:
     case CONST:
     case LABEL_REF:

This was supposed to be CASE_CONST_ANY too. The omission of CONST_VECTOR looks like an oversight.

+/* Match CONST_*s for which pointer equality corresponds to value
+equality.  */

Should be:

/* Match CONST_*s for which pointer equality corresponds to value
   equality.  */

(probably an artefact of my work mailer, sorry)

+
+
+

Rather a lot of whitespace there. One line seems enough, since we're just before the definition of CONST_INT_P.
OK with those changes, thanks. Richard
Re: patch for machine independent rtl section to hide case statements for different types of constants.
Richard Sandiford rdsandif...@googlemail.com writes: switch (code) { -case CONST_INT: -case CONST_DOUBLE: -case CONST_FIXED: +CASE_CONST_UNIQUE: case SYMBOL_REF: case CONST: case LABEL_REF: This was supposed to be CASE_CONST_ANY too. The omission of CONST_VECTOR looks like an oversight. Sorry, snipped the all-important: --- gccBaseline/gcc/loop-invariant.c 2012-07-22 16:55:01.239982968 -0400 +++ gccWCase/gcc/loop-invariant.c 2012-08-20 16:02:30.013430970 -0400 @@ -203,9 +203,7 @@ check_maybe_invariant (rtx x) Richard
Re: patch for machine independent rtl section to hide case statements for different types of constants.
It would have been tough without the second snippet. On 08/21/2012 01:02 PM, Richard Sandiford wrote: Richard Sandiford rdsandif...@googlemail.com writes: switch (code) { -case CONST_INT: -case CONST_DOUBLE: -case CONST_FIXED: +CASE_CONST_UNIQUE: case SYMBOL_REF: case CONST: case LABEL_REF: This was supposed to be CASE_CONST_ANY too. The omission of CONST_VECTOR looks like an oversight. Sorry, snipped the all-important: --- gccBaseline/gcc/loop-invariant.c 2012-07-22 16:55:01.239982968 -0400 +++ gccWCase/gcc/loop-invariant.c 2012-08-20 16:02:30.013430970 -0400 @@ -203,9 +203,7 @@ check_maybe_invariant (rtx x) Richard
Re: [Patch,testsuite] Break gcc.dg/fixed-point/convert.c into manageable parts
On Aug 21, 2012, at 4:32 AM, Georg-Johann Lay wrote: The patch breaks up convert.c in parts so that an AVR ATmega103 device with 128KiB for executable code (.text + .data + .rodata) can run them. Ok for trunk? Ok, but watch out for any comments from the fixed-point or the C front-end folks.
Re: patch for machine independent rtl section to hide case statements for different types of constants.
I am certainly not going to check it in if there are any issues with the patch. However, this was basically a trivial lexicographical cleanup, and if no one has any comments on it after a reasonable amount of time, then I do feel this is OK. Obviously, if anyone has any comments, that is completely a different issue. I named it this way CASE_CONST_SCALAR_INTEGER because I need to introduce in the next patch a predicate that looks like /* Predicate yielding true iff X is an rtx for an integer const. */ #if TARGET_SUPPORTS_WIDE_INT == 1 #define CONST_INTEGER_P(X) \ (CONST_INT_P (X) || CONST_WIDE_INT_P (X)) #else #define CONST_INTEGER_P(X) \ (CONST_INT_P (X) || CONST_DOUBLE_AS_INT_P (X)) #endif for all of the rtxs that represent an integer. And this name was consistent with that. It may be that you have a suggestion for the name of the predicate as well, but it seemed to make more sense to have the rtxs be consistent rather than rtx/mode consistent. kenny On 08/21/2012 12:56 PM, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: I plan to commit this in a few days unless someone has some comments. This is a mostly trivial patch and the changes from that are Richard Sandiford's and he is an rtl maintainer. Please don't do this. Patches need to be sent for review in their final form. Obviously, having got this far with the patch, you're free to beat me up if I don't review it. :-) Anyway, please do call it CASE_CONST_SCALAR_INT rather than CASE_CONST_SCALAR_INTEGER. Like I said in my original mail, CASE_CONST_SCALAR_INT chimes nicely with SCALAR_INT_MODE_P, etc., and (as I didn't say) it'd be better not to have two spellings of the same thing. 
diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c --- gccBaseline/gcc/combine.c 2012-08-17 09:35:24.802195795 -0400 +++ gccWCase/gcc/combine.c 2012-08-20 15:43:34.659362244 -0400 @@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc) switch (code) { -case CONST_INT: case CONST: case LABEL_REF: case SYMBOL_REF: -case CONST_DOUBLE: -case CONST_VECTOR: +CASE_CONST_UNIQUE: case CLOBBER: return 0; @@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x) { case LABEL_REF: case SYMBOL_REF: -case CONST_INT: case CONST: -case CONST_DOUBLE: -case CONST_VECTOR: +CASE_CONST_UNIQUE: case PC: case ADDR_VEC: case ADDR_DIFF_VEC: These were supposed to be CASE_CONST_ANY. The omission of CONST_FIXED looks like an oversight. switch (code) { -case CONST_INT: -case CONST_DOUBLE: -case CONST_FIXED: +CASE_CONST_UNIQUE: case SYMBOL_REF: case CONST: case LABEL_REF: This was supposed to be CASE_CONST_ANY too. The omission of CONST_VECTOR looks like an oversight. +/* Match CONST_*s for which pointer equality corresponds to value +equality. */ Should be: /* Match CONST_*s for which pointer equality corresponds to value equality. */ (probably an artefact of my work mailer, sorry) + + + Rather a lot of whitespace there. One line seems enough, since we're just before the definition of CONST_INT_P. OK with those changes, thanks. Richard
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Tue, Aug 21, 2012 at 12:34 AM, Jan Hubicka hubi...@ucw.cz wrote: Teresa has done some tunings for the unroller so far. The inliner tuning is the next step. What concerns me is that it is greatly inaccurate - you have no idea how many instructions a given counter is guarding and it can differ quite a lot. Also inlining/optimization makes working sets significantly different (by a factor of 100 for tramp3d). The pre ipa-inline working set is the one that is needed for ipa inliner tuning. For post-ipa inline code increase transformations, some update is probably needed. But on the other hand any solution at this level will be greatly inaccurate. So I am curious how reliable data you can get from this? How do you take this into account for the heuristics? This effort is just the first step to allow good heuristics to develop. It seems to me that for this use perhaps the simple logic in histogram merging maximizing the number of BBs for a given bucket will work well? It is inaccurate, but we are working with greatly inaccurate data anyway. Except for degenerated cases, the small and unimportant runs will have small BB counts, while large runs will have larger counts and those are the ones we optimize for anyway. The working set curve for each type of application contains lots of information that can be mined. The inaccuracy can also be mitigated by more data 'calibration'. Sure, I think I am leaning towards trying the solution 2) with maximizing counter count merging (probably it would make sense to rename it from BB count since it is not really a BB count and thus it is misleading) and we will see how well it works in practice. We have the benefit of much fewer issues with profile locking/unlocking and we lose a bit of precision on BB counts. I tend to believe that the error will not be that important in practice. Another loss is more histogram streaming into each gcda file, but with skipping zero entries it should not be a major overhead problem I hope. What do you think? 
2) Do we plan to add some features in the near future that will anyway require global locking? I guess LIPO itself does not count since it streams its data into an independent file as you mentioned earlier and locking the LIPO file is not that hard. Does LIPO stream everything into that common file, or does it use a combination of gcda files and a common summary? Actually, LIPO module grouping information is stored in gcda files. It is also stored in a separate .imports file (one per object) --- this is primarily used by our build system for dependence information. I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO behave on GCC bootstrap? We have not tried gcc bootstrap with LIPO. Gcc compile time is not the main problem for application build -- the link time (for debug build) is. I was primarily curious how LIPO's runtime analysis fares in the situation where you do very many small train runs on a rather large app (sure, GCC is small compared to google's use case ;). There will be a race, but as Teresa mentioned, there is a big chance that the process which finishes the merge last is also the final overrider of the LIPO summary data. (i.e. it does a lot more work in the libgcov module per each invocation, so I am curious if it is practically useful at all). With an LTO based solution a lot can probably be pushed to link time? Before actual GCC starts from the linker plugin, the LIPO module can read gcov CFGs from gcda files and do all the merging/updating/CFG construction that is currently performed at runtime, right? The dynamic cgraph build and analysis is still done at runtime. However, with the new implementation, the FE is no longer involved. The gcc driver is modified to understand module grouping, and lto is used to merge the streamed output from aux modules. I see. Are there any fundamental reasons why it can not be done at link-time when all gcda files are available? 
For build parallelism, the decision should be made as early as possible -- that is what makes LIPO 'light'. Why is the grouping not done inside the linker plugin? It is not delayed until link time. In fact the linker plugin is not even involved. David Honza David
Re: [wwwdocs] Document Runtime CPU detection builtins
Committed after making the changes. One small problem, I am not sure how to fix this: The hyper link I referenced is : http://gcc.gnu.org/onlinedocs/gcc/X86-Built_002din-Functions.html#X86-Built_002din-Functions whereas the committed changes.html is pointing to: http://gcc.gnu.org/onlinedocs/gcc/X86-Built-in-Functions.html#X86-Built-in-Functions Please note that the _002din is missing. This makes the link broken, did I miss anything? I verified that I submitted the right link. Thanks, -Sri. On Tue, Aug 21, 2012 at 5:41 AM, Diego Novillo dnovi...@google.com wrote: On 2012-08-20 22:41 , Sriraman Tallam wrote: Hi Gerald / Diego, I have made all the mentioned changes. I also shortened the description like Diego mentioned by removing all the strings but kept the caveats. I have not added a reference to the documentation because i do not know what link to reference. The builtins are completely documented in extend.texi. Referring to the user's manual is OK, I think. +pCaveat: If these built-in functions are called before any static +constructors are invoked, like during IFUNC initialization, then the CPU +detection initialization must be explicity run using this newly provided s/explicity/explicitly/ Other than that, it looks fine to me. Diego.
Re: C++ PATCH for c++/51675 (more constexpr unions)
On Wed, Feb 8, 2012 at 1:23 AM, Jason Merrill ja...@redhat.com wrote: More traffic on PR 51675 demonstrates that my earlier patch didn't fix the whole problem. This patch improves handling of user-defined constructors. Tested x86_64-pc-linux-gnu, applying to trunk. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54341 -- H.J.
[lra] patch to remove -flra option
The following patch mostly removes the -flra option by defining a machine-dependent hook lra_p. If the hook returns true, LRA is used. Otherwise, the reload pass is used. By default the hook returns false. It returns true for the 8 targets to which LRA was ported (i386, rs6000, arm, s390, ia64, sparc, mips, pa). The patch was successfully bootstrapped on x86/x86-64. Committed as rev. 190564. 2012-08-21 Vladimir Makarov vmaka...@redhat.com * targhooks.h (default_lra_p): Declare. * targhooks.c (default_lra_p): New function. * target.def (lra_p): New hook. * ira.h (ira_use_lra_p): New external. * ira.c (ira_init_once, ira_init, ira_finish_once): Call lra_start_once, lra_init, lra_finish_once anyway. (ira_setup_eliminable_regset, setup_reg_renumber): Use ira_use_lra_p instead of flag_lra. (ira_use_lra_p): Define. (ira): Set up ira_use_lra_p. Use ira_use_lra_p instead of flag_lra. * dwarf2out.c: Add ira.h. (based_loc_descr, compute_frame_pointer_to_fb_displacement): Use ira_use_lra_p instead of flag_lra. * rtlanal.c (simplify_subreg_regno): Add comments. * Makefile.in (dwarf2out.c): Add dependence ira.h. * doc/passes.texi: Change LRA pass description. * doc/tm.texi.in: Add TARGET_LRA_P. * doc/tm.texi: Update. * doc/invoke.texi: Remove -flra option. * common.opt: Remove flra option. Add description for flra-reg-spill. * reginfo.c (allocate_reg_info): Fix a comment typo. * config/arm/arm.c (TARGET_LRA_P): Define. (arm_lra_p): New function. * config/sparc/sparc.c (TARGET_LRA_P): Define. (sparc_lra_p): New function. * config/s390/s390.c (TARGET_LRA_P): Define. (s390_lra_p): New function. * config/i386/i386.c (TARGET_LRA_P): Define. (ix86_lra_p): New function. * config/rs6000/rs6000.c (TARGET_LRA_P): Define. (rs6000_lra_p): New function. * config/mips/mips.c (TARGET_LRA_P): Define. (mips_lra_p): New function. * config/pa/pa.c (TARGET_LRA_P): Define. (pa_lra_p): New function. * config/ia64/ia64.c (TARGET_LRA_P): Define. (ia64_lra_p): New function. 
Index: targhooks.c === --- targhooks.c (revision 190448) +++ targhooks.c (working copy) @@ -840,6 +840,12 @@ default_branch_target_register_class (vo return NO_REGS; } +extern bool +default_lra_p (void) +{ + return false; +} + int default_register_bank (int hard_regno ATTRIBUTE_UNUSED) { Index: target.def === --- target.def (revision 190448) +++ target.def (working copy) @@ -2332,6 +2332,16 @@ DEFHOOK tree, (tree type, tree expr), hook_tree_tree_tree_null) +/* Return true if we use LRA instead of reload. */ +DEFHOOK +(lra_p, + A target hook which returns true if we use LRA instead of reload pass.\ + It means that LRA was ported to the target.\ + \ + The default version of this target hook returns always false., + bool, (void), + default_lra_p) + /* Return register bank of given hard regno for the current target. */ DEFHOOK (register_bank, Index: targhooks.h === --- targhooks.h (revision 190448) +++ targhooks.h (working copy) @@ -132,6 +132,7 @@ extern rtx default_static_chain (const_t extern void default_trampoline_init (rtx, tree, rtx); extern int default_return_pops_args (tree, tree, int); extern reg_class_t default_branch_target_register_class (void); +extern bool default_lra_p (void); extern int default_register_bank (int); extern bool default_different_addr_displacement_p (void); extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t, Index: rtlanal.c === --- rtlanal.c (revision 190448) +++ rtlanal.c (working copy) @@ -3501,6 +3501,7 @@ simplify_subreg_regno (unsigned int xreg if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode) + /* We can use mode change in LRA for some transformations. */ ! lra_in_progress) return -1; #endif @@ -3511,6 +3512,8 @@ simplify_subreg_regno (unsigned int xreg return -1; if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM + /* We should convert arg register in LRA after the elimination + if it is possible. 
*/ xregno == ARG_POINTER_REGNUM ! lra_in_progress) return -1; Index: ira.c === --- ira.c (revision 190448) +++ ira.c (working copy) @@ -1634,8 +1634,7 @@ void ira_init_once (void) { ira_init_costs_once (); - if (flag_lra) -lra_init_once (); + lra_init_once (); } /* Free
[patch] two more bitmap obstacks
Hello, Two more bitmap obstacks, this time in tree-ssa-coalesce.c. The advantage isn't so much in having the bitmaps on the non-default obstack, but more in that the bitmaps can be freed all at once by simply releasing the obstack. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Ciao! Steven more_obs.diff Description: Binary data
Re: patch for machine independent rtl section to hide case statements for different types of constants.
Kenneth Zadeck zad...@naturalbridge.com writes: I named it this way CASE_CONST_SCALAR_INTEGER because I need to introduce in the next patch a predicate that looks like /* Predicate yielding true iff X is an rtx for an integer const. */ #if TARGET_SUPPORTS_WIDE_INT == 1 #define CONST_INTEGER_P(X) \ (CONST_INT_P (X) || CONST_WIDE_INT_P (X)) #else #define CONST_INTEGER_P(X) \ (CONST_INT_P (X) || CONST_DOUBLE_AS_INT_P (X)) #endif for all of the rtxs that represent an integer. And this name was consistent with that. It may be that you have a suggestion for the name of the predicate as well Good guess. but it seemed to make more sense to have the rtxs be consistent rather than rtx/mode consistent. Yeah, I think CONST_SCALAR_INT_P would be better here too. INTEGER just isn't distinct enough from INT for the difference to be obvious. It also doesn't indicate that complex integers and vector integers are excluded. SCALAR_INT seems a bit more precise, as well as having precedent. BTW, the == 1 above looks redundant. Richard
[PATCH] fix wrong-code bug for -fstrict-volatile-bitfields
This patch is a followup to the addition of support for -fstrict-volatile-bitfields (required by the ARM EABI); see this thread http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01889.html for discussion of the original patch. That patch only addressed the behavior when extracting the value of a volatile bit field, but the same problems affect storing values into a volatile bit field (or a field of a packed structure, which is effectively implemented as a bit field). This patch makes the code for bitfield stores mirror that for bitfield loads. Although the fix is in target-independent code, it's really for ARM; hence the test case, which (without this patch) generates wrong code. Code to determine the access width was correctly preserving the user-specified width, but was incorrectly falling through to code that assumes word mode. As well as regression testing on arm-none-eabi, I've bootstrapped and regression-tested this patch on x86_64 Linux. Earlier versions of this patch have also been present in our local 4.5 and 4.6 GCC trees for some time, so it's been well-tested on a variety of other platforms. OK to check in on mainline? -Sandra 2012-08-21 Paul Brook p...@codesourcery.com Joseph Myers jos...@codesourcery.com Sandra Loosemore san...@codesourcery.com gcc/ * expr.h (store_bit_field): Add packedp parameter to prototype. * expmed.c (store_bit_field, store_bit_field_1): Add packedp parameter. Adjust all callers. (warn_misaligned_bitfield): New function, split from extract_fixed_bit_field. (store_fixed_bit_field): Add packedp parameter. Fix wrong-code behavior for the combination of misaligned bitfield and -fstrict-volatile-bitfields. Use warn_misaligned_bitfield. (extract_fixed_bit_field): Use warn_misaligned_bitfield. * expr.c: Adjust calls to store_bit_field. (expand_assignment): Identify accesses to packed structures. (store_field): Add packedp parameter. Adjust callers. * calls.c: Adjust calls to store_bit_field. * ifcvt.c: Likewise. 
* config/s390/s390.c: Likewise. gcc/testsuite/ * gcc.target/arm/volatile-bitfields-5.c: New test case. Index: gcc/expr.h === --- gcc/expr.h (revision 190541) +++ gcc/expr.h (working copy) @@ -693,7 +693,7 @@ extern void store_bit_field (rtx, unsign unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - enum machine_mode, rtx); + bool, enum machine_mode, rtx); extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, int, bool, rtx, enum machine_mode, enum machine_mode); Index: gcc/expmed.c === --- gcc/expmed.c (revision 190541) +++ gcc/expmed.c (working copy) @@ -50,7 +50,7 @@ static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - rtx); + rtx, bool); static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, @@ -406,7 +406,7 @@ store_bit_field_1 (rtx str_rtx, unsigned unsigned HOST_WIDE_INT bitnum, unsigned HOST_WIDE_INT bitregion_start, unsigned HOST_WIDE_INT bitregion_end, - enum machine_mode fieldmode, + bool packedp, enum machine_mode fieldmode, rtx value, bool fallback_p) { unsigned int unit @@ -638,7 +638,7 @@ store_bit_field_1 (rtx str_rtx, unsigned if (!store_bit_field_1 (op0, new_bitsize, bitnum + bit_offset, bitregion_start, bitregion_end, - word_mode, + false, word_mode, value_word, fallback_p)) { delete_insns_since (last); @@ -859,7 +859,7 @@ store_bit_field_1 (rtx str_rtx, unsigned tempreg = copy_to_reg (xop0); if (store_bit_field_1 (tempreg, bitsize, xbitpos, bitregion_start, bitregion_end, - fieldmode, orig_value, false)) + false, fieldmode, orig_value, false)) { emit_move_insn (xop0, tempreg); return true; @@ -872,7 +872,7 @@ store_bit_field_1 (rtx str_rtx, unsigned return false; store_fixed_bit_field (op0, offset, bitsize, bitpos, - bitregion_start, bitregion_end, value); + bitregion_start, bitregion_end, value, packedp); return true; } @@ -885,6 +885,8 @@ store_bit_field_1 (rtx str_rtx, 
unsigned These two fields are 0, if the C++ memory model does not apply, or we are not interested in keeping track of bitfield regions. + PACKEDP is true for fields with the packed attribute. + FIELDMODE is the machine-mode of the FIELD_DECL node for this field. */ void @@ -892,6 +894,7 @@ store_bit_field (rtx str_rtx, unsigned H unsigned HOST_WIDE_INT bitnum, unsigned HOST_WIDE_INT bitregion_start,
Re: [PATCH, MIPS] fix MIPS16 jump table overflow
Sandra Loosemore san...@codesourcery.com writes: In config/mips/mips.h, there is presently this comment: /* ??? 16-bit offsets can overflow in large functions. */ #define TARGET_MIPS16_SHORT_JUMP_TABLES TARGET_MIPS16_TEXT_LOADS A while ago we had a bug report where a big switch statement did, in fact, overflow the range of 16-bit offsets, causing a runtime error. GCC already has generic machinery to shorten offset tables for switch statements that does the necessary range checking, but it only works with casesi, not the lower-level tablejump expansion. So, this patch provides a casesi expander to handle this situation. Nice. This patch has been in use on our local 4.5 and 4.6 branches for about a year now. When testing it on mainline, I found it tripped over the recent change to add MIPS16 branch overflow checking in other situations, causing it to get into an infinite loop. I think telling it to ignore these new jump insns it doesn't know how to process is the right thing to do, but I'm not sure if there's a better way to restrict the condition or make mips16_split_long_branches more robust. Richard, since that's your code I assume you'll suggest an alternative if this doesn't meet with your approval. Changing it to: if (JUMP_P (insn) && USEFUL_INSN_P (insn) && get_attr_length (insn) > 8 && (any_condjump_p (insn) || any_uncond_jump_p (insn))) should be OK. @@ -5937,6 +5933,91 @@ [(set_attr type jump) (set_attr mode none)]) +;; For MIPS16, we don't know whether a given jump table will use short or +;; word-sized offsets until late in compilation, when we are able to determine +;; the sizes of the insns which comprise the containing function. This +;; necessitates the use of the casesi rather than the tablejump pattern, since +;; the latter tries to calculate the index of the offset to jump through early +;; in compilation, i.e. at expand time, when nothing is known about the +;; eventual function layout. 
+ +(define_expand casesi + [(match_operand:SI 0 register_operand ); index to jump on + (match_operand:SI 1 const_int_operand ) ; lower bound + (match_operand:SI 2 const_int_operand ) ; total range + (match_operand:SI 3 ); table label + (match_operand:SI 4 )] ; out of range label The last two are Pmode rather than SImode. Since there aren't different case* patterns for different Pmodes, we can't use :P instead, so let's just drop the modes on 4 and 5. Would be nice to add a compile test for -mabi=64 just to make sure that Pmode == DImode works. A copy of an existing test like code-readable-1.c would be fine. +(define_insn casesi_internal_mips16 + [(set (pc) + (if_then_else + (leu (match_operand:SI 0 register_operand d) + (match_operand:SI 1 arith_operand dI)) + (mem:SI (plus:SI (mult:SI (match_dup 0) (const_int 4)) + (label_ref (match_operand 2 + (label_ref (match_operand 3 + (clobber (match_scratch:SI 4 =d)) + (clobber (match_scratch:SI 5 =d)) + (clobber (reg:SI MIPS16_T_REGNUM)) + (use (label_ref (match_dup 2)))] Although this is descriptive, the MEM is probably more trouble than it's worth. A hard-coded MEM like this will alias a lot of things, whereas we're only reading from the function itself. I think an unspec would be better. This pattern should have :P for operands 4 and 5, with the pattern name becoming: casesi_internal_mips16_mode PMODE_INSN should make it easy to wrap up the difference. There shouldn't be any need for the final USE. Let me know if you found otherwise, because that sounds like a bug. 
+ TARGET_MIPS16_SHORT_JUMP_TABLES +{ + rtx diff_vec = PATTERN (next_real_insn (operands[2])); + + gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC); + + output_asm_insn (sltu\t%0, %1, operands); + output_asm_insn (bteqz\t%3, operands); + output_asm_insn (la\t%4, %2, operands); + + switch (GET_MODE (diff_vec)) +{ +case HImode: + output_asm_insn (sll\t%5, %0, 1, operands); + output_asm_insn (addu\t%5, %4, %5, operands); + output_asm_insn (lh\t%5, 0(%5), operands); + break; + +case SImode: + output_asm_insn (sll\t%5, %0, 2, operands); + output_asm_insn (addu\t%5, %4, %5, operands); + output_asm_insn (lw\t%5, 0(%5), operands); + break; + +default: + gcc_unreachable (); +} + + output_asm_insn (addu\t%4, %4, %5, operands); + + return j\t%4; +} + [(set_attr length 32)]) The addus here ought to be daddus after the :P change. I think we can avoid the earlyclobber on operand 4 by moving the LA after the SLL. +#define CASE_VECTOR_MODE ptr_mode + +/* Only use short offsets if their range will not overflow. */ +#define CASE_VECTOR_SHORTEN_MODE(MIN, MAX, BODY) \ + (TARGET_MIPS16_SHORT_JUMP_TABLES ((MIN)
[PATCH] Fix some leaks and one uninitialized var read
Hi! The recent change in find_assert_locations from XCNEWVEC to XNEWVEC caused a valgrind warning, because bb_rpo[ENTRY_BLOCK] used to be accessed, but was never initialized. Fixed by ignoring edges from ENTRY_BLOCK altogether. The rest are a couple of memory leak fixes. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-08-21 Jakub Jelinek ja...@redhat.com * tree-vrp.c (find_assert_locations): Skip also edges from the entry block. * tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Call free_stmt_vec_info on orig_cond after gsi_removing it. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Always free body_cost_vec vector. (vect_analyze_data_refs): If gather is unsuccessful, free_data_ref (dr). * tree-inline.c (tree_function_versioning): Free old_transforms_to_apply vector. --- gcc/tree-vrp.c.jj 2012-08-20 20:56:01.0 +0200 +++ gcc/tree-vrp.c 2012-08-21 12:15:32.501753048 +0200 @@ -5596,7 +5596,7 @@ find_assert_locations (void) FOR_EACH_EDGE (e, ei, bb->preds) { int pred = e->src->index; - if (e->flags & EDGE_DFS_BACK) + if ((e->flags & EDGE_DFS_BACK) || pred == ENTRY_BLOCK) continue; if (!live[pred]) --- gcc/tree-vect-loop-manip.c.jj 2012-08-15 10:55:24.0 +0200 +++ gcc/tree-vect-loop-manip.c 2012-08-21 15:01:02.600750196 +0200 @@ -788,6 +788,7 @@ slpeel_make_loop_iterate_ntimes (struct /* Remove old loop exit test: */ gsi_remove (&loop_cond_gsi, true); + free_stmt_vec_info (orig_cond); loop_loc = find_loop_location (loop); if (dump_file && (dump_flags & TDF_DETAILS)) --- gcc/tree-vect-data-refs.c.jj 2012-08-20 11:09:45.0 +0200 +++ gcc/tree-vect-data-refs.c 2012-08-21 16:32:13.631428796 +0200 @@ -1934,10 +1934,9 @@ vect_enhance_data_refs_alignment (loop_v gcc_assert (stat); return stat; } - else - VEC_free (stmt_info_for_cost, heap, body_cost_vec); } + VEC_free (stmt_info_for_cost, heap, body_cost_vec); /* (2) Versioning to force alignment. 
*/ @@ -3313,6 +3312,8 @@ vect_analyze_data_refs (loop_vec_info lo gather = false; if (!gather) { + STMT_VINFO_DATA_REF (stmt_info) = NULL; + free_data_ref (dr); if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) { fprintf (vect_dump, --- gcc/tree-inline.c.jj2012-08-15 10:55:33.0 +0200 +++ gcc/tree-inline.c 2012-08-21 17:28:24.181069515 +0200 @@ -5089,6 +5089,7 @@ tree_function_versioning (tree old_decl, VEC_index (ipa_opt_pass, old_transforms_to_apply, i)); + VEC_free (ipa_opt_pass, heap, old_transforms_to_apply); } id.copy_decl = copy_decl_no_change; Jakub
libgcc patch committed: Increase non-split stack space
When a -fsplit-stack function calls a non-split-stack function, the gold linker automatically redirects the call to __morestack to call __morestack_non_split instead. I wrote __morestack_non_split to always allocate at least 0x4000 bytes. However, that was unclear thinking; 0x4000 bytes is sufficient for calling into libc, but it is not sufficient for calling a general function. This value leads to stack overruns in ordinary code. The default thread stack size on x86 and x86_64 is 0x800000 bytes. This patch significantly increases the stack size allocated for non-split code, to less than the default but still larger, 0x100000 bytes. Probably the program should have a way to control this, but I'm not yet sure what the right API would be for that. In any case the default should be larger. Bootstrapped and ran Go testsuite and split-stack tests on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian 2012-08-21 Ian Lance Taylor i...@google.com * config/i386/morestack.S (__morestack_non_split): Increase amount of space allocated for non-split code stack. Index: config/i386/morestack.S === --- config/i386/morestack.S (revision 190572) +++ config/i386/morestack.S (working copy) @@ -83,6 +83,9 @@ #endif +# The amount of space we ask for when calling non-split-stack code. +#define NON_SPLIT_STACK 0x100000 + # This entry point is for split-stack code which calls non-split-stack # code. When the linker sees this case, it converts the call to # __morestack to call __morestack_non_split instead. We just bump the @@ -109,7 +112,7 @@ __morestack_non_split: movl %esp,%eax # Current stack, subl 8(%esp),%eax # less required stack frame size, - subl $0x4000,%eax # less space for non-split code. + subl $NON_SPLIT_STACK,%eax # less space for non-split code. cmpl %gs:0x30,%eax # See if we have enough space. jb 2f # Get more space if we need it. 
- addl $0x5000+BACKOFF,4(%esp) # Increment space we request. + # Increment space we request. + addl $NON_SPLIT_STACK+0x1000+BACKOFF,4(%esp) # Fall through into morestack. @@ -186,7 +190,7 @@ __morestack_non_split: movq %rsp,%rax # Current stack, subq %r10,%rax # less required stack frame size, - subq $0x4000,%rax # less space for non-split code. + subq $NON_SPLIT_STACK,%rax # less space for non-split code. #ifdef __LP64__ cmpq %fs:0x70,%rax # See if we have enough space. @@ -219,7 +223,8 @@ __morestack_non_split: .cfi_adjust_cfa_offset -8 # Adjust for popped register. - addq $0x5000+BACKOFF,%r10 # Increment space we request. + # Increment space we request. + addq $NON_SPLIT_STACK+0x1000+BACKOFF,%r10 # Fall through into morestack.
PATCH: PR middle-end/54332: [4.8 Regression] 481.wrf in SPEC CPU 2006 takes 10GB memory to compile
Hi, This patch restores df_free_collection_rec call inside the insn traversal loop and removes the stack allocation check in vec_reserve. It has been approved in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332#c25 It has been tested on Linux/x86-64 and checked in. Thanks. H.J. --- 2012-08-21 H.J. Lu hongjiu...@intel.com PR middle-end/54332 * df-scan.c (df_bb_verify): Restore df_free_collection_rec call inside the insn traversal loop. * vec.h (vec_reserve): Remove the stack allocation check. diff --git a/gcc/df-scan.c b/gcc/df-scan.c index 55492fa..df90365 100644 --- a/gcc/df-scan.c +++ b/gcc/df-scan.c @@ -4448,6 +4448,7 @@ df_bb_verify (basic_block bb) if (!INSN_P (insn)) continue; df_insn_refs_verify (collection_rec, bb, insn, true); + df_free_collection_rec (collection_rec); } /* Do the artificial defs and uses. */ diff --git a/gcc/vec.h b/gcc/vec.h index 5fdb859..1922616 100644 --- a/gcc/vec.h +++ b/gcc/vec.h @@ -1099,21 +1099,9 @@ vec_reserve (vec_t<T> *vec_, int reserve MEM_STAT_DECL) sizeof (T), false PASS_MEM_STAT); else -{ - /* Only allow stack vectors when re-growing them. The initial -allocation of stack vectors must be done with the -VEC_stack_alloc macro, because it uses alloca() for the -allocation. */ - if (vec_ == NULL) - { - fprintf (stderr, Stack vectors must be initially allocated - with VEC_stack_alloc.\n); - gcc_unreachable (); - } - return (vec_t<T> *) vec_stack_o_reserve (vec_, reserve, - offsetof (vec_t<T>, vec), - sizeof (T) PASS_MEM_STAT); -}
 +return (vec_t<T> *) vec_stack_o_reserve (vec_, reserve, +offsetof (vec_t<T>, vec), +sizeof (T) PASS_MEM_STAT); }
Another merge from gcc 4.7 branch to gccgo branch
I merged revision 190574 from the gcc 4.7 branch to the gccgo branch. Ian
Re: [PATCH] Allow dg-skip-if to use compiler flags specified through set_board_info cflags
On Aug 11, 2012, at 10:39 AM, Senthil Kumar Selvaraj wrote: This patch allows cflags set in board config files using set_board_info cflags to be used in the selectors of dg-skip-if and other dejagnu commands that use the check-flags proc. Ok.
[Patch, Fortran, committed] free gfc_code of EXEC_END_PROCEDURE
Background: There is currently a memory leakage cleanup in the middle end – and fixing PR 54332 would probably also have been easier without FE leaks. I think we should join in and try to remove some leakage – and try not to introduce new ones.

* * *

Committed: For EXEC_END_PROCEDURE, I have committed one fix as obvious (parse.c). However, I have a test case where parse_contained still leaks memory; possibly another, similar patch is needed in addition.

* * *

There are also plenty of leaks related to the freeing of gfc_ss. I attached a draft patch (trans-expr.c, trans-intrinsic.c), which is probably okay, but not yet regtested. OK with a changelog (and if it regtests)? Note: The patch is incomplete, e.g. argss of gfc_conv_procedure_call is not (or not always) freed. Ditto for rss of gfc_trans_assignment_1; ditto for lss and rss of gfc_trans_pointer_assignment.

* * *

Additionally, there is a memory leak when generating more than one procedure per TU: The memory is allocated but not freed via gfc_generate_function_code -> (generate_coarray_init or trans_function_start) -> init_function_start -> prepare_function_start -> init_emit. The memory should be freed via (backend_init_target or lang_dependent_init_target) -> expand_dummy_function_end -> free_after_compilation. The latter seems to operate on cfun – hence, it only frees the last cfun and not all. However, despite some longer debugging (e.g. using a main program which calls create_main_function), I failed to find the problem.

* * *

And module.c can also leak plenty of memory ...

Tobias

Index: gcc/fortran/ChangeLog
===================================================================
--- gcc/fortran/ChangeLog (Revision 190574)
+++ gcc/fortran/ChangeLog (Arbeitskopie)
@@ -1,3 +1,8 @@
+2012-08-21  Tobias Burnus  bur...@net-b.de
+
+	* parse.c (parse_contained): Include EXEC_END_PROCEDURE
+	in ns->code to make sure the gfc_code is freed.
+
 2012-08-20  Tobias Burnus  bur...@net-b.de

 	PR fortran/54301

Index: gcc/fortran/parse.c
===================================================================
--- gcc/fortran/parse.c (Revision 190574)
+++ gcc/fortran/parse.c (Arbeitskopie)
@@ -4075,6 +4075,7 @@ parse_contained (int module)
	case ST_END_PROGRAM:
	case ST_END_SUBROUTINE:
	  accept_statement (st);
+	  gfc_current_ns->code = s1.head;
	  break;

	default:

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 4f7d026..cfb0862 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -533,6 +533,7 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems)
       loop.to[0] = nelems;
       gfc_trans_scalarizing_loops (&loop, &loopbody);
       gfc_add_block_to_block (&body, &loop.pre);
+      gfc_cleanup_loop (&loop);
       tmp = gfc_finish_block (&body);
     }
   else
@@ -6770,6 +6771,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
       if (!expr2->value.function.isym)
	{
	  realloc_lhs_loop_for_fcn_call (&se, &expr1->where, &ss, &loop);
+	  gfc_cleanup_loop (&loop);
	  ss->is_alloc_lhs = 1;
	}
      else
@@ -6778,6 +6780,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
   gfc_conv_function_expr (&se, expr2);
   gfc_add_block_to_block (&se.pre, &se.post);
+  gfc_free_ss (se.ss);
   return gfc_finish_block (&se.pre);
 }

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index fac29c7..d0aebe9 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1328,6 +1328,7 @@ gfc_conv_intrinsic_rank (gfc_se *se, gfc_expr *expr)
   argse.descriptor_only = 1;
   gfc_conv_expr_descriptor (&argse, expr->value.function.actual->expr, ss);
+  gfc_free_ss (ss);
   gfc_add_block_to_block (&se->pre, &argse.pre);
   gfc_add_block_to_block (&se->post, &argse.post);
Re: [PATCH, ARM] Don't pull in unwinder for 64-bit division routines
On 17 August 2012 07:29, Julian Brown jul...@codesourcery.com wrote: On Thu, 16 Aug 2012 19:56:52 +0100 Ramana Radhakrishnan ramra...@arm.com wrote: On 07/24/12 13:27, Julian Brown wrote: On Fri, 20 Jul 2012 11:15:27 +0100 Julian Brown jul...@codesourcery.com wrote: Anyway: this revised version of the patch removes the strange libgcc Makefile-fragment changes, the equivalent of which have since been incorporated into mainline GCC now anyway, so the patch is somewhat more straightforward than it was previously. Joey Ye contacted me offlist and suggested that the t-divmod-ef fragment might be better integrated into t-bpabi instead. Doing that makes the patch somewhat smaller/cleaner. Minimally re-tested, looks OK. The original submission makes no mention of testing ? The ARM specific portions look OK to me modulo no regressions. Thanks -- I'm sure I did test the patch, but just omitted to mention that fact in the mail :-O. We've also been carrying a version of this patch in our local source base for many years now. Hi Julian. The test case fails on arm-linux-gnueabi: http://gcc.gnu.org/ml/gcc-testresults/2012-08/msg02100.html FAIL: gcc.target/arm/div64-unwinding.c execution test The test aborts as _Unwind_RaiseException is not null. _divdi3.o itself looks fine and no longer pulls in the unwinder so I assume something else in the environment is. I've put the binaries up at http://people.linaro.org/~michaelh/incoming/div64-unwinding.exe and http://people.linaro.org/~michaelh/incoming/_divdi3.o if that helps. -- Michael
Re: [SH] Use more multi-line asm outputs
Oleg Endo oleg.e...@t-online.de wrote:

This mainly converts the asm outputs to multi-line strings and uses tab chars instead of '\t' in the asm strings, in the hope of making stuff easier to read and a bit more consistent. Tested on rev 190546 with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
and no new failures. OK?

OK.

Regards,
kaz
Re: [SH] PR 39423 - Add support for SH2A movu.w insn
Oleg Endo oleg.e...@t-online.de wrote:

This adds support for SH2A's movu.w insn for memory addressing cases as described in the PR. Tested on rev 190546 with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
and no new failures. OK?

OK.

Regards,
kaz
Re: [PATCH] Combine location with block using block_locations
On Tue, Aug 21, 2012 at 6:25 AM, Richard Guenther richard.guent...@gmail.com wrote:

On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen de...@google.com wrote: ping

Conceptually I like the change. Can a libcpp maintainer please have a second look? Dehao, did you do any compile-time and memory-usage benchmarks?

I don't have memory benchmarks at hand, but I've tested it on some huge apps, each of which takes more than an hour to build on a modern machine. None of them showed a noticeable memory footprint or compile-time increase.

Thanks,
Dehao

Thanks, Richard.

Thanks, Dehao

On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen de...@google.com wrote: Hi, Dodji, Thanks for the review. I've fixed all the addressed issues. I'm attaching the related changes:

Thanks,
Dehao

libcpp/ChangeLog:

2012-08-01  Dehao Chen  de...@google.com

	* include/line-map.h (MAX_SOURCE_LOCATION): New value.
	(location_adhoc_data_init): New.
	(location_adhoc_data_fini): New.
	(get_combined_adhoc_loc): New.
	(get_data_from_adhoc_loc): New.
	(get_location_from_adhoc_loc): New.
	(COMBINE_LOCATION_DATA): New.
	(IS_ADHOC_LOC): New.
	(expanded_location): New field.
	* line-map.c (location_adhoc_data): New.
	(location_adhoc_data_htab): New.
	(curr_adhoc_loc): New.
	(location_adhoc_data): New.
	(allocated_location_adhoc_data): New.
	(location_adhoc_data_hash): New.
	(location_adhoc_data_eq): New.
	(location_adhoc_data_update): New.
	(get_combined_adhoc_loc): New.
	(get_data_from_adhoc_loc): New.
	(get_location_from_adhoc_loc): New.
	(location_adhoc_data_init): New.
	(location_adhoc_data_fini): New.
	(linemap_lookup): Change to use new location.
	(linemap_ordinary_map_lookup): Likewise.
	(linemap_macro_map_lookup): Likewise.
	(linemap_macro_map_loc_to_def_point): Likewise.
	(linemap_macro_map_loc_unwind_toward_spel): Likewise.
	(linemap_get_expansion_line): Likewise.
	(linemap_get_expansion_filename): Likewise.
	(linemap_location_in_system_header_p): Likewise.
	(linemap_location_from_macro_expansion_p): Likewise.
	(linemap_macro_loc_to_spelling_point): Likewise.
	(linemap_macro_loc_to_def_point): Likewise.
	(linemap_macro_loc_to_exp_point): Likewise.
	(linemap_resolve_location): Likewise.
	(linemap_unwind_toward_expansion): Likewise.
	(linemap_unwind_to_first_non_reserved_loc): Likewise.
	(linemap_expand_location): Likewise.
	(linemap_dump_location): Likewise.

Index: libcpp/line-map.c
===================================================================
--- libcpp/line-map.c (revision 190209)
+++ libcpp/line-map.c (working copy)
@@ -25,6 +25,7 @@
 #include "line-map.h"
 #include "cpplib.h"
 #include "internal.h"
+#include "hashtab.h"

 static void trace_include (const struct line_maps *, const struct line_map *);
 static const struct line_map * linemap_ordinary_map_lookup (struct line_maps *,
@@ -50,6 +51,135 @@
 extern unsigned num_expanded_macros_counter;
 extern unsigned num_macro_tokens_counter;

+/* Data structure to associate an arbitrary data to a source location.  */
+struct location_adhoc_data {
+  source_location locus;
+  void *data;
+};
+
+/* The following data structure encodes a location with some adhoc data
+   and maps it to a new unsigned integer (called an adhoc location)
+   that replaces the original location to represent the mapping.
+
+   The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the
+   highest bit is 1, then the number is adhoc_loc.  Otherwise, it serves as
+   the original location.  Once identified as the adhoc_loc, the lower 31
+   bits of the integer is used to index the location_adhoc_data array,
+   in which the locus and associated data is stored.  */
+
+static htab_t location_adhoc_data_htab;
+static source_location curr_adhoc_loc;
+static struct location_adhoc_data *location_adhoc_data;
+static unsigned int allocated_location_adhoc_data;
+
+/* Hash function for location_adhoc_data hashtable.  */
+
+static hashval_t
+location_adhoc_data_hash (const void *l)
+{
+  const struct location_adhoc_data *lb =
+      (const struct location_adhoc_data *) l;
+  return (hashval_t) lb->locus + (size_t) lb->data;
+}
+
+/* Compare function for location_adhoc_data hashtable.  */
+
+static int
+location_adhoc_data_eq (const void *l1, const void *l2)
+{
+  const struct location_adhoc_data *lb1 =
+      (const struct location_adhoc_data *) l1;
+  const struct location_adhoc_data *lb2 =
+      (const struct location_adhoc_data *) l2;
+  return lb1->locus == lb2->locus && lb1->data == lb2->data;
+}
+
+/* Update the hashtable when location_adhoc_data is reallocated.  */
+
+static int
+location_adhoc_data_update (void **slot,
Build static libgcc with hidden visibility even with --disable-shared
As discussed in http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01219.html, it is desirable for the libgcc build with inhibit_libc defined and --disable-shared to be similar enough to that build without inhibit_libc and --enable-shared to be usable to build glibc, producing the same results as if glibc were built with a toolchain that already included a shared libgcc and was built against previously built glibc. One source of differences noted there was functions in libgcc.a being hidden only if shared libgcc is also being built. This patch changes the logic so that the way libgcc.a is built in the static-only case is more similar to how it is built when shared libgcc is being built as well; in particular, libgcc symbols are generally given hidden visibility (if supported) in the static libgcc. Tested with cross to arm-none-linux-gnueabi that it fixes the previously observed differences; rebuilding glibc with the second GCC now produces identical stripped binaries to the results of building with the first (static-only) GCC, except for the cases of nscd and static libraries which differ between multiple glibc builds even with identical compilers (in both cases because of embedded timestamps). Also bootstrapped with no regressions on x86_64-unknown-linux-gnu as a sanity check. OK to commit? 2012-08-21 Joseph Myers jos...@codesourcery.com * Makefile.in (vis_hide, gen-hide-list): Do not make definitions depend on --enable-shared. ($(lib1asmfuncs-o)): Use %.vis files independent of --enable-shared. * static-object.mk ($(base)$(objext), $(base).vis) ($(base)_s$(objext)): Use same rules for visibility handling as in shared-object.mk. Index: libgcc/Makefile.in === --- libgcc/Makefile.in (revision 190577) +++ libgcc/Makefile.in (working copy) @@ -363,6 +363,7 @@ ifneq ($(LIBUNWIND),) install-libunwind = install-libunwind endif +endif # For -fvisibility=hidden. 
We need both a -fvisibility=hidden on
# the command line, and a #define to prevent libgcc2.h etc from
@@ -386,11 +387,8 @@
 gen-hide-list = echo > $@
 endif

-else
-# Not enable_shared.
+ifneq ($(enable_shared),yes)
 iterator = $(srcdir)/empty.mk $(patsubst %,$(srcdir)/static-object.mk,$(iter-items))
-vis_hide =
-gen-hide-list = echo > \$@
 endif

 LIB2ADD += enable-execute-stack.c
@@ -439,7 +437,6 @@
	 $(LIB2_DIVMOD_FUNCS))

 # Build libgcc1 (assembly) components.

-ifeq ($(enable_shared),yes)
 lib1asmfuncs-o = $(patsubst %,%$(objext),$(LIB1ASMFUNCS))
 $(lib1asmfuncs-o): %$(objext): $(srcdir)/config/$(LIB1ASMSRC) %.vis
@@ -451,15 +448,10 @@
 lib1asmfuncs-s-o = $(patsubst %,%_s$(objext),$(LIB1ASMFUNCS))
 $(lib1asmfuncs-s-o): %_s$(objext): $(srcdir)/config/$(LIB1ASMSRC)
	$(gcc_s_compile) -DL$* -xassembler-with-cpp -c $<
+ifeq ($(enable_shared),yes)
 libgcc-s-objects += $(lib1asmfuncs-s-o)
-else
-
-lib1asmfuncs-o = $(patsubst %,%$(objext),$(LIB1ASMFUNCS))
-$(lib1asmfuncs-o): %$(objext): $(srcdir)/config/$(LIB1ASMSRC)
-	$(gcc_compile) -DL$* -xassembler-with-cpp -c $<
-libgcc-objects += $(lib1asmfuncs-o)
-
 endif

 # Build lib2funcs.  For the static library also include LIB2FUNCS_ST.

Index: libgcc/static-object.mk
===================================================================
--- libgcc/static-object.mk (revision 190577)
+++ libgcc/static-object.mk (working copy)
@@ -24,7 +24,13 @@
 endif
 endif

-$(base)$(objext): $o
-	$(gcc_compile) -c -xassembler-with-cpp $<
+$(base)$(objext): $o $(base).vis
+	$(gcc_compile) -c -xassembler-with-cpp -include $*.vis $<
+
+$(base).vis: $(base)_s$(objext)
+	$(gen-hide-list)
+
+$(base)_s$(objext): $o
+	$(gcc_s_compile) -c -xassembler-with-cpp $<
+
 endif

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: Build static libgcc with hidden visibility even with --disable-shared
On Tue, Aug 21, 2012 at 5:33 PM, Joseph S. Myers jos...@codesourcery.com wrote: 2012-08-21 Joseph Myers jos...@codesourcery.com * Makefile.in (vis_hide, gen-hide-list): Do not make definitions depend on --enable-shared. ($(lib1asmfuncs-o)): Use %.vis files independent of --enable-shared. * static-object.mk ($(base)$(objext), $(base).vis) ($(base)_s$(objext)): Use same rules for visibility handling as in shared-object.mk. This is OK. Thanks. Ian
[Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode
Hi,

Due to ARM UAL, Thumb1 and Thumb2 mode use the LSRS instruction while ARM mode uses the MOVS instruction, so the following case is updated accordingly. Is it OK for trunk?

BR,
Terry

2012-08-21  Terry Guo  terry@arm.com

	* gcc.target/arm/combine-movs.c: Check movs for ARM mode
	and lsrs for other modes.

diff --git a/gcc/testsuite/gcc.target/arm/combine-movs.c b/gcc/testsuite/gcc.target/arm/combine-movs.c
index 4209a33..fbef9df 100644
--- a/gcc/testsuite/gcc.target/arm/combine-movs.c
+++ b/gcc/testsuite/gcc.target/arm/combine-movs.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-skip-if "" { arm_thumb1 } } */
 /* { dg-options "-O" } */

 void foo (unsigned long r[], unsigned int d)
@@ -9,4 +8,5 @@ void foo (unsigned long r[], unsigned int d)
     r[i] = 0;
 }

-/* { dg-final { scan-assembler "movs\tr\[0-9\]" } } */
+/* { dg-final { scan-assembler "movs\tr\[0-9\]" { target arm_nothumb } } } */
+/* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target { ! arm_nothumb } } } } */
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
I can go ahead with the histogram approach. There is some roundoff error from the working set scaling approach that can affect different merging orders as you note, although I think this only really affects the small counter values. The other place where saving/merging the histogram

Do you have any intuition on why simple maximalization merging (that is safe wrt ordering) would be a bad idea? We care only about working set size around the top of the histogram and I would say that we should sort of optimize for the largest (in the number of blocks in the hot area) of the train runs. One way where things will get messed up is when the working set is about the same but the runs are of different size, so all the blocks get accounted into two different buckets. But in general I do not think there is a reasonably accurate way to merge the profiles without actually streaming out all counter IDs in every bucket, so perhaps this will work well enough. If not, we can in future introduce a per-program global summary file that will contain the counters to be merged accurately.

would help is when the distribution of counter values in the histogram varies between runs, say for example, the hottest loop is much hotter in a subsequent run, but the rest of the counter values stay largely consistent. Note, however, that if the hotspots are different in different runs, then merging either the histogram or the working set will have issues. The only way to detect this is to recompute the histogram/working set from scratch from the merged counters.

I wonder, in practice, even when there are a lot of simultaneous runs going on, like in a gcc bootstrap, if we could get reasonably accurate summary recomputation without global locking.
The reason is that as long as the actual counter updates are safe as they are now due to the individual file locking, the inaccuracy in the recomputed summary information will not grow over time, and there is a reasonable chance that the last run to complete and merge will recompute the summary from the final merged counter values and get it right (or only be off by a little bit if there are a couple of runs completing simultaneously at the end). But this can be investigated as a follow-on to this patch.

The main concern is probably the build reproducibility in gcc bootstrap with FDO. Hmm, you mean in the first pass update every file with new counters and in the second pass just update the summaries? OK, so I guess we went through
1) two pass updating with a race in between passes.
2) two pass updating with the first pass updating counters and the second having a race only for the summary update (i.e. no races for counters).
3) two pass updating with flocking (and some way to handle detected deadlocks).
4) one pass updating with histogram merging + maximalization of the working set. (We do not really need to scale the buckets; we can simply merge the histograms and then multiply them by nruns before comparing to actual counters. This assumes that working sets of all runs are about the same, but it should work reasonably in practice I think.)

I guess 3/4 are acceptable WRT bootstrap reproducibility. I have no experience with flocking a large number of files and the portability of this solution, i.e. to Windows. If you think that 2) would be too inaccurate in practice and 3) has a chance to be portable, we could go for this. It will solve the precision problems and will also work for LIPO summaries. I would be curious about the effect on profiledbootstrap time if you implement it.

Honza

David

Thanks,
Teresa

2) Do we plan to add some features in the near future that will anyway require global locking?
I guess LIPO itself does not count since it streams its data into an independent file as you mentioned earlier, and locking the LIPO file is not that hard. Does LIPO stream everything into that common file, or does it use a combination of gcda files and a common summary?

Actually, LIPO module grouping information is stored in gcda files. It is also stored in a separate .imports file (one per object) --- this is primarily used by our build system for dependence information.

I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO behave on GCC bootstrap?

We have not tried gcc bootstrap with LIPO. Gcc compile time is not the main problem for application build -- the link time (for debug build) is.

I was primarily curious how LIPO's runtime analysis fares in the situation where you do very many small train runs on a rather large app (sure, GCC is small compared to Google's use case ;). (I.e. it does a lot more work in the libgcov module per invocation, so I am curious if it is practically useful at all.)

With an LTO-based solution a lot can probably be pushed to link time? Before actual GCC starts from the linker
Re: [PATCH 4/4] Reduce the size of optabs representation
On Jul 19, 2012, at 11:24 AM, Richard Henderson wrote:

+# genopinit produces two files.
+insn-opinit.c insn-opinit.h: s-opinit ; @true
+s-opinit: $(MD_DEPS) build/genopinit$(build_exeext) insn-conditions.md
+	$(RUN_GEN) build/genopinit$(build_exeext) $(md_file) \
+	  insn-conditions.md -htmp-opinit.h -ctmp-opinit.c
+	$(SHELL) $(srcdir)/../move-if-change tmp-opinit.h insn-opinit.h
+	$(SHELL) $(srcdir)/../move-if-change tmp-opinit.c insn-opinit.c
+	$(STAMP) s-opinit

Breaks my port without the attached patch...

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 67f1d66..bd31c9b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3484,7 +3484,7 @@ s-attrtab : $(MD_DEPS) build/genattrtab$(build_exeext) \
 # genopinit produces two files.
 insn-opinit.c insn-opinit.h: s-opinit ; @true
 s-opinit: $(MD_DEPS) build/genopinit$(build_exeext) insn-conditions.md
-	$(RUN_GEN) build/genopinit$(build_exeext) $(md_file) \
+	$(RUN_GEN) build/genopinit$(build_exeext) $(MD_INCS) $(md_file) \
	  insn-conditions.md -htmp-opinit.h -ctmp-opinit.c
	$(SHELL) $(srcdir)/../move-if-change tmp-opinit.h insn-opinit.h
	$(SHELL) $(srcdir)/../move-if-change tmp-opinit.c insn-opinit.c
Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)
On Tue, Aug 21, 2012 at 6:56 PM, Jan Hubicka hubi...@ucw.cz wrote:

I can go ahead with the histogram approach. There is some roundoff error from the working set scaling approach that can affect different merging orders as you note, although I think this only really affects the small counter values. The other place where saving/merging the histogram

Do you have any intuition on why simple maximalization merging (that is safe wrt ordering) would be a bad idea?

When you say maximalization merging are you talking about the histogram merging approach I mentioned a few emails back (my response on Aug 19) where we assume the same relative order of hotness in the counters between runs, and accumulate the counter values in the histogram in that order? This would be inaccurate if different runs exercised different areas of the code, and thus the counters would be ordered differently in the histogram.

We care only about working set size around the top of the histogram and I would say

For optimizations that care about the boundary between hot and cold, such as code layout, I think we will also care about the smaller values in the histogram (to have a good idea of what constitutes a cold block counter value).

that we should sort of optimize for the largest (in the number of blocks in the hot area) of the train runs. One way where things will get messed up is when the working set is about the same but the runs are of different size, so all the blocks get accounted into two different buckets.

I'm not sure I understand the last sentence - is this something that would not get handled by merging the histogram entries as I described earlier? Or it sounds like you might have a different merging approach in mind?

But in general I do not think there is a reasonably accurate way to merge the profiles without actually streaming out all counter IDs in every bucket, so perhaps this will work well enough.
If not, we can in future introduce a per-program global summary file that will contain the counters to be merged accurately.

Sounds good.

would help is when the distribution of counter values in the histogram varies between runs, say for example, the hottest loop is much hotter in a subsequent run, but the rest of the counter values stay largely consistent. Note, however, that if the hotspots are different in different runs, then merging either the histogram or the working set will have issues. The only way to detect this is to recompute the histogram/working set from scratch from the merged counters.

I wonder, in practice, even when there are a lot of simultaneous runs going on, like in a gcc bootstrap, if we could get reasonably accurate summary recomputation without global locking. The reason is that as long as the actual counter updates are safe as they are now due to the individual file locking, the inaccuracy in the recomputed summary information will not grow over time, and there is a reasonable chance that the last run to complete and merge will recompute the summary from the final merged counter values and get it right (or only be off by a little bit if there are a couple of runs completing simultaneously at the end). But this can be investigated as a follow-on to this patch.

The main concern is probably the build reproducibility in gcc bootstrap with FDO. Hmm, you mean in the first pass update every file with new counters and in the second pass just update the summaries?

Right, that's what I had in mind (what you have described in #2 below).

OK, so I guess we went through
1) two pass updating with a race in between passes.
2) two pass updating with the first pass updating counters and the second having a race only for the summary update (i.e. no races for counters).
3) two pass updating with flocking (and some way to handle detected deadlocks).
4) one pass updating with histogram merging + maximalization of the working set.
(We do not really need to scale the buckets; we can simply merge the histograms and then multiply them by nruns before comparing to actual counters.

By merging the histograms (and accumulating the counter values stored there as we merge), I don't think we need to multiply the counter values by nruns, do we?

This assumes that working sets of all runs are about the same, but it should work reasonably in practice I think.)

I guess 3/4 are acceptable WRT bootstrap reproducibility. I have no experience with flocking a large number of files and the portability of this solution, i.e. to Windows. If you think that 2) would be too inaccurate in practice and 3) has a chance to be portable, we could go for this. It will solve the precision problems and will also work for LIPO summaries. I would be curious about the effect on profiledbootstrap time if you implement it.

I'm hoping that 2) will be accurate enough in practice, but it will need some investigation.

Thanks,
Teresa

Honza

David

Thanks,
Teresa
[Patch, Fortran, committed] Free loop and gfc_ss data
Committed as Rev. 190586 after successful regtesting. That's the version I also had attached to http://gcc.gnu.org/ml/fortran/2012-08/msg00118.html; as written there: The patch is incomplete, e.g. argss of gfc_conv_procedure_call is not (or not always) freed. Ditto for rss of gfc_trans_assignment_1; ditto for lss and rss of gfc_trans_pointer_assignment.

Tobias

Index: gcc/fortran/trans-expr.c
===================================================================
--- gcc/fortran/trans-expr.c (Revision 190585)
+++ gcc/fortran/trans-expr.c (Arbeitskopie)
@@ -533,6 +533,7 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems)
       loop.to[0] = nelems;
       gfc_trans_scalarizing_loops (&loop, &loopbody);
       gfc_add_block_to_block (&body, &loop.pre);
+      gfc_cleanup_loop (&loop);
       tmp = gfc_finish_block (&body);
     }
   else
@@ -6770,6 +6771,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
       if (!expr2->value.function.isym)
	{
	  realloc_lhs_loop_for_fcn_call (&se, &expr1->where, &ss, &loop);
+	  gfc_cleanup_loop (&loop);
	  ss->is_alloc_lhs = 1;
	}
      else
@@ -6778,6 +6780,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
   gfc_conv_function_expr (&se, expr2);
   gfc_add_block_to_block (&se.pre, &se.post);
+  gfc_free_ss (se.ss);
   return gfc_finish_block (&se.pre);
 }

Index: gcc/fortran/trans-intrinsic.c
===================================================================
--- gcc/fortran/trans-intrinsic.c (Revision 190585)
+++ gcc/fortran/trans-intrinsic.c (Arbeitskopie)
@@ -1328,6 +1328,7 @@ gfc_conv_intrinsic_rank (gfc_se *se, gfc_expr *expr)
   argse.descriptor_only = 1;
   gfc_conv_expr_descriptor (&argse, expr->value.function.actual->expr, ss);
+  gfc_free_ss (ss);
   gfc_add_block_to_block (&se->pre, &argse.pre);
   gfc_add_block_to_block (&se->post, &argse.post);

Index: gcc/fortran/ChangeLog
===================================================================
--- gcc/fortran/ChangeLog (Revision 190585)
+++ gcc/fortran/ChangeLog (Arbeitskopie)
@@ -1,3 +1,9 @@
+2012-08-22  Tobias Burnus  bur...@net-b.de
+
+	* trans-expr.c (gfc_copy_class_to_class,
+	gfc_trans_arrayfunc_assign): Free loop and ss data.
+	* trans-intrinsic.c (gfc_conv_intrinsic_rank): Free ss data.
+
 2012-08-21  Tobias Burnus  bur...@net-b.de

 	* parse.c (parse_contained): Include EXEC_END_PROCEDURE