Re: [middle-end] Add machine_mode to address_cost target hook
Joern Rennecke: David Edelsohn: Oleg Endo: Hmm .. the ACK status so far is: Not sure if we are supposed to acknowledge all the straightforward argument additions... at any rate, the epiphany hunk is OK. I think I'll make use of the new functionality eventually, but prefer to be able to test such a functional change separately, so I'm fine with the approach to just introduce the infrastructure first. Similar for avr: The costs actually depend on the address space, but I think it is reasonable to model that in separate changes. Johann
Re: [Patch, PR 54128] ira.c change to fix mips bootstrap
On Fri, Aug 31, 2012 at 10:58:51AM -0700, Steve Ellcey wrote: Here is my patch to fix the bootstrap comparison failure (PR 54128) on MIPS. The reason for the comparison failure was a difference in register usage and I tracked it down to build_insn_chain which checked all instructions for register usage in order to set the dead_or_set and live_relevant_regs bitmaps instead of checking only non-debug instructions. Changing INSN_P to NONDEBUG_INSN_P in build_insn_chain allowed me to bootstrap and caused no regressions. The debug insns generally shouldn't extend the lifetime of pseudos (see the valtrack.c stuff), so if you hit this, there is probably some earlier bug that didn't reset/adjust the debug insns in question. I'm not saying the ira.c patch is absolutely a bad idea, but it would be good if you could investigate where those debug insns started extending lifetime of pseudos. 2012-08-31 Steve Ellcey sell...@mips.com PR bootstrap/54128 * ira.c (build_insn_chain): Check only NONDEBUG instructions for register usage. Jakub
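A minimal sketch of why counting debug insns can make register usage differ between a build with debug info and one without, which is the kind of difference a bootstrap comparison flags. Only the INSN_P / NONDEBUG_INSN_P distinction comes from the patch; the types and helpers below (Insn, insn_p, nondebug_insn_p, relevant_regs) are hypothetical stand-ins, not GCC's RTL API.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for GCC's insn kinds; not the real RTL types.
enum class InsnKind { Normal, Debug, Note };

struct Insn {
  InsnKind kind;
  std::vector<int> uses;  // pseudo register numbers read by this insn
};

// Rough analogue of INSN_P: true for every instruction, debug insns included.
inline bool insn_p(const Insn& i) { return i.kind != InsnKind::Note; }

// Rough analogue of NONDEBUG_INSN_P: true only for insns that generate code.
inline bool nondebug_insn_p(const Insn& i) { return i.kind == InsnKind::Normal; }

// Collect the registers that the chosen predicate treats as relevant.
template <typename Pred>
std::set<int> relevant_regs(const std::vector<Insn>& chain, Pred pred) {
  std::set<int> regs;
  for (const Insn& i : chain)
    if (pred(i))
      regs.insert(i.uses.begin(), i.uses.end());
  return regs;
}
```

With the nondebug predicate the result is the same whether or not the chain contains debug insns; with the broader predicate it is not. As Jakub notes, this says nothing about why the debug insn still referenced the pseudo in the first place.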
Re: [PATCH] Add counter histogram to fdo summary (issue6465057)
On 2012.09.04 at 14:23 -0700, Teresa Johnson wrote: I just committed the patch (included below). I implemented the occupancy bit vector approach for recording non-zero histogram entries, and a few issues uncovered with the merging in a profiled bootstrap. Passes both bootstrap and profiledbootstrap builds and regression tests. This commit causes: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487 -- Markus
Re: [PATCH] Set correct source location for deallocator calls
On 09/04/2012 09:31 PM, Dehao Chen wrote: Looks like even with addr2line properly installed, the gcj generated code cannot get the correct source file/lineno. Do I need to pass in #javac stacktrace.java #java stacktrace stacktrace.e(stacktrace.java:42) stacktrace.d(stacktrace.java:38) stacktrace.c(stacktrace.java:31) stacktrace.b(stacktrace.java:26) stacktrace.a(stacktrace.java:19) stacktrace.main(stacktrace.java:12) #gcj *.class -o stacktrace.exe #./stacktrace.exe stacktrace.e(stacktrace.exe:-1) stacktrace.d(stacktrace.exe:-1) stacktrace.c(stacktrace.exe:-1) stacktrace.b(stacktrace.exe:-1) stacktrace.a(stacktrace.exe:-1) stacktrace.main(stacktrace.exe:-1) Works for me: [aph@nighthawk ~]$ gcj stacktrace.java --main=stacktrace -g [aph@nighthawk ~]$ ./a.out stacktrace.e(stacktrace.java:42) stacktrace.d(stacktrace.java:38) stacktrace.c(stacktrace.java:31) stacktrace.b(stacktrace.java:26) stacktrace.a(stacktrace.java:19) stacktrace.main(stacktrace.java:12) Aren't you just compiling without -g ? There is no debuginfo. Andrew.
Re: [PATCH] Set correct source location for deallocator calls
On Wed, Sep 5, 2012 at 12:29 AM, Andrew Haley a...@redhat.com wrote: On 09/04/2012 09:31 PM, Dehao Chen wrote: Looks like even with addr2line properly installed, the gcj generated code cannot get the correct source file/lineno. Do I need to pass in #javac stacktrace.java #java stacktrace stacktrace.e(stacktrace.java:42) stacktrace.d(stacktrace.java:38) stacktrace.c(stacktrace.java:31) stacktrace.b(stacktrace.java:26) stacktrace.a(stacktrace.java:19) stacktrace.main(stacktrace.java:12) #gcj *.class -o stacktrace.exe #./stacktrace.exe stacktrace.e(stacktrace.exe:-1) stacktrace.d(stacktrace.exe:-1) stacktrace.c(stacktrace.exe:-1) stacktrace.b(stacktrace.exe:-1) stacktrace.a(stacktrace.exe:-1) stacktrace.main(stacktrace.exe:-1) Works for me: [aph@nighthawk ~]$ gcj stacktrace.java --main=stacktrace -g [aph@nighthawk ~]$ ./a.out stacktrace.e(stacktrace.java:42) stacktrace.d(stacktrace.java:38) stacktrace.c(stacktrace.java:31) stacktrace.b(stacktrace.java:26) stacktrace.a(stacktrace.java:19) stacktrace.main(stacktrace.java:12) Aren't you just compiling without -g ? There is no debuginfo. The other thing that might be needed is a newer addr2line which works correctly with the dwarf2(4) that GCC outputs. Thanks, Andrew
Re: [PATCH, M68K] Fix ICE from scheduler improvement
Maxim Kuvyrkov ma...@codesourcery.com writes: No. This hunk makes m68k scheduling support pick up the new state. Can you reformulate the comment to clarify that? Ok with that change. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [PATCH, libstdc++] Add proper OpenBSD support
I'll look at this more carefully later today when I get back from a holiday, but it looks ok after a quick glance.

I've CC'd gcc-patches, as required for all patches.

Thanks.

On Sep 4, 2012 10:27 PM, Mark Kettenis mark.kette...@xs4all.nl wrote:

Fixes a few testcases. Mostly based on the existing NetBSD/FreeBSD/Darwin code.

2012-09-04  Mark Kettenis  kette...@openbsd.org

	* configure.host (*-*-openbsd*) Set cpu_include_dir.
	* config/os/bsd/openbsd/ctype_base.h: New file.
	* config/os/bsd/openbsd/ctype_configure_char.cc: New file.
	* config/os/bsd/openbsd/ctype_inline.h: New file.
	* config/os/bsd/openbsd/os_defines.h: New file.

Index: configure.host
===================================================================
--- configure.host	(revision 190863)
+++ configure.host	(working copy)
@@ -270,6 +270,9 @@
   netbsd*)
     os_include_dir=os/bsd/netbsd
     ;;
+  openbsd*)
+    os_include_dir=os/bsd/openbsd
+    ;;
   qnx6.[12]*)
     os_include_dir=os/qnx/qnx6.1
     c_model=c
Index: config/os/bsd/openbsd/ctype_base.h
===================================================================
--- config/os/bsd/openbsd/ctype_base.h	(revision 0)
+++ config/os/bsd/openbsd/ctype_base.h	(working copy)
@@ -0,0 +1,59 @@
+// Locale support -*- C++ -*-
+
+// Copyright (C) 2000, 2009, 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// http://www.gnu.org/licenses/.
+
+//
+// ISO C++ 14882: 22.1  Locales
+//
+
+// Information as gleaned from /usr/include/ctype.h on OpenBSD.
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /// @brief  Base class for ctype.
+  struct ctype_base
+  {
+    // Non-standard typedefs.
+    typedef const short* __to_type;
+
+    // NB: Offsets into ctype<char>::_M_table force a particular size
+    // on the mask type. Because of this, we don't use an enum.
+    typedef char mask;
+
+    static const mask upper = _U;
+    static const mask lower = _L;
+    static const mask alpha = _U | _L;
+    static const mask digit = _N;
+    static const mask xdigit = _N | _X;
+    static const mask space = _S;
+    static const mask print = _P | _U | _L | _N | _B;
+    static const mask graph = _P | _U | _L | _N;
+    static const mask cntrl = _C;
+    static const mask punct = _P;
+    static const mask alnum = _U | _L | _N;
+  };
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace
Index: config/os/bsd/openbsd/os_defines.h
===================================================================
--- config/os/bsd/openbsd/os_defines.h	(revision 0)
+++ config/os/bsd/openbsd/os_defines.h	(working copy)
@@ -0,0 +1,41 @@
+// Specific definitions for OpenBSD -*- C++ -*-
+
+// Copyright (C) 2000, 2002, 2009, 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// http://www.gnu.org/licenses/.
+
+/** @file bits/os_defines.h
+ *  This is an internal header file, included by other library headers.
+ *
Re: [middle-end] Add machine_mode to address_cost target hook
Updated ACK table:

[x] target-independent bits
[ ] alpha       [x] arm         [ ] avr         [ ] bfin
[ ] cr16        [x] cris        [x] epiphany    [ ] i386
[ ] ia64        [x] iq2000      [ ] lm32        [ ] m32c
[x] m32r        [x] mcore       [ ] mep         [x] microblaze
[x] mips        [x] mmix        [x] mn10300     [ ] pa
[x] rs6000      [x] rx          [ ] s390        [ ] score
[x] sh          [ ] sparc       [x] spu         [x] stormy16
[x] v850        [ ] vax         [ ] xtensa

On Tue, 2012-09-04 at 21:33 -0400, Hans-Peter Nilsson wrote:
> It helps if you CC port maintainers where approval is requested; you missed at least me.

Right. Did that for the remaining targets. (BTW, there's no maintainer listed for the cr16 port)
For the reference, the current version of the patch is here: http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00210.html

> Still, considering that we're C++ now, you can presumably drop the function parameter name instead of adding ATTRIBUTE_UNUSED.

Yes, that has been mentioned before and I did drop the parameter names in a place or two, but not in the target pieces -- on purpose. Personally I prefer reading 'int bleh ATTRIBUTE_UNUSED' instead of 'int,int,bool,char,int,void*'. Makes it somehow easier to follow and/or fill in the blanks.

> PS. please consider building cc1 for each target using contrib/config-list.mk or a similar script, to eliminate breaking typos.

Thanks for the hint!

Cheers, Oleg
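The two styles discussed above can be compared side by side. This is only a sketch: the hook signatures are simplified and hypothetical (the real address_cost hook takes RTL types), and SKETCH_UNUSED is a local stand-in for GCC's ATTRIBUTE_UNUSED macro.

```cpp
#include <cassert>

// Local stand-in for ATTRIBUTE_UNUSED; GCC defines the real macro in ansidecl.h.
#if defined(__GNUC__)
#define SKETCH_UNUSED __attribute__((unused))
#else
#define SKETCH_UNUSED
#endif

// Style 1: keep the parameter names and mark the unused ones.
// The names still document what each argument means.
inline int address_cost_named(int addr, int mode SKETCH_UNUSED,
                              bool speed SKETCH_UNUSED) {
  return addr > 255 ? 2 : 1;
}

// Style 2: C++ allows a definition to leave unused parameters unnamed.
// A comment can carry the name if the reader should still see it.
inline int address_cost_unnamed(int addr, int /* mode */, bool /* speed */) {
  return addr > 255 ? 2 : 1;
}
```

Both compile without unused-parameter warnings and behave identically; the difference is purely one of readability, which is the point Oleg makes.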
Re: [middle-end] Add machine_mode to address_cost target hook
On Wed, Sep 5, 2012 at 10:16 AM, Oleg Endo oleg.e...@t-online.de wrote: Updated ACK table: [x] target-independent bits [ ] alpha [x] arm [ ] avr [ ] bfin [ ] cr16 [x] cris [x] epiphany[ ] i386 [ ] ia64 [x] iq2000[ ] lm32[ ] m32c [x] m32r [x] mcore [ ] mep [x] microblaze [x] mips [x] mmix [x] mn10300 [ ] pa [x] rs6000[x] rx[ ] s390[ ] score [x] sh[ ] sparc [x] spu [x] stormy16 [x] v850 [ ] vax [ ] xtensa On Tue, 2012-09-04 at 21:33 -0400, Hans-Peter Nilsson wrote: It helps if you CC port maintainers where approval is requested; you missed at least me. Right. Did that for the remaining targets. (BTW, there's no maintainer listed for the cr16 port) This is OK for x86. Thanks, Uros.
Re: [PATCH][RFC] Add -Og
On Sep 4, 2012, at 22:42 , Hans-Peter Nilsson h...@bitrange.com wrote: Please, no inlining. Think of stack back-traces and their use when debugging. But, there was a talk at the GNU Tools Cauldron with related contents - http://gcc.gnu.org/wiki/cauldron2012#Control-flow_preservation_in_GCC_for_safety-critical_uses. It'll be Very Nice if control flow preservation was part of -Og or at least they played nice together. A general comment first: I think the compromise Richard aims at satisfying is a very useful one: It aims at providing fast compilation, a superior debugging experience and reasonable runtime performance. While there are commonalities with preserve-control-flow, indeed, there are differences in target concerns as well. We did a lot of work to support inlining for example :-) Our experience is that compromises of this kind are extremely sensitive to micro details spread out in multiple places, so I'm afraid having one part of the other might be hard. Olivier
Re: [PATCH][RFC] Add -Og
On 4 September 2012 21:42, Hans-Peter Nilsson h...@bitrange.com wrote: On Mon, 3 Sep 2012, Richard Guenther wrote: On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther rguent...@suse.de wrote: This adds a new optimization level, -Og, as previously discussed. It aims at providing fast compilation, a superior debugging experience and reasonable runtime performance. Instead of making -O1 this optimization level this adds a new -Og. It's a first cut, highlighting that our fixed pass pipeline and simply enabling/disabling individual passes (but not pass copies for example) doesn't scale to properly differentiate between -Og and -O[23]. -O1 should get similar treatment, eventually just building on -Og but not focusing on debugging experience. That is, I expect that in the end we will at least have two post-IPA optimization pipelines. It also means that you cannot enable PRE or VRP with -Og at the moment because these passes are not anywhere scheduled (similar to the situation with -O0). It has some funny effect on dump-file naming of the pass copies though, which hints at that the current setup is too static. For that reason the new queue comes after the old, to not confuse too many testcases. It also does not yet disable any of the early optimizations that make debugging harder (SRA comes to my mind here, as does switch-conversion and partial inlining). Please, no inlining. Think of stack back-traces and their use when debugging. I would argue [without sufficient knowledge of how easy this would actually be to do in a real compiler :-)] that this is a debugger problem and not a compiler issue. With DWARF as the debug info format it should certainly be possible to produce a view that looked like: $ bt #0b baz (...) #0a inlined into bar (...) #0 inlined into foo (...) #1 do_something #2 main This would involve reading the .debug_frame, and then looking up the inlined subroutines via .debug_info. 
I personally would like as much optimisation as possible at -Og that doesn't break a defined level of debug illusion. I have seen too many cases where people debug at -O0/-O1 and then build a release with a -O2/-O3 build and get bitten by undefined behaviour issues. The more optimised -Og code is the less the reason to release a build at a higher optimisation level. Thanks, Matt -- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-d...@linaro.org
Re: [PATCH][RFC] Add -Og
On Wed, Sep 5, 2012 at 10:46 AM, Matthew Gretton-Dann matthew.gretton-d...@linaro.org wrote: Please, no inlining. Think of stack back-traces and their use when debugging. I would argue [without sufficient knowledge of how easy this would actually be to do in a real compiler :-)] that this is a debugger problem and not a compiler issue. It's also a compiler issue if you take inlining of clones into account, or scheduling such that the inlined body is scattered all over in the caller's body. The compiler can tell the debugger only so much... Ciao! Steven
Re: [PATCH][RFC] Add -Og
On 5 September 2012 09:55, Steven Bosscher stevenb@gmail.com wrote: On Wed, Sep 5, 2012 at 10:46 AM, Matthew Gretton-Dann matthew.gretton-d...@linaro.org wrote: Please, no inlining. Think of stack back-traces and their use when debugging. I would argue [without sufficient knowledge of how easy this would actually be to do in a real compiler :-)] that this is a debugger problem and not a compiler issue. It's also a compiler issue if you take inlining of clones into account, or scheduling such that the inlined body is scattered all over in the caller's body. The compiler can tell the debugger only so much... But that's not a problem with inlining, that's a problem with allowing things to happen out of order (for some definition of things and order) - which in my understanding -Og is going to tie down. Thanks, Matt -- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-d...@linaro.org
Re: [PATCH][RFC] Add -Og
On Wed, 5 Sep 2012, Matthew Gretton-Dann wrote: On 5 September 2012 09:55, Steven Bosscher stevenb@gmail.com wrote: On Wed, Sep 5, 2012 at 10:46 AM, Matthew Gretton-Dann matthew.gretton-d...@linaro.org wrote: Please, no inlining. Think of stack back-traces and their use when debugging. I would argue [without sufficient knowledge of how easy this would actually be to do in a real compiler :-)] that this is a debugger problem and not a compiler issue. It's also a compiler issue if you take inlining of clones into account, or scheduling such that the inlined body is scattered all over in the caller's body. The compiler can tell the debugger only so much... But that's not a problem with inlining, that's a problem with allowing things to happen out of order (for some definition of things and order) - which in my understanding -Og is going to tie down. Yes, the goal is definitely to avoid the jumping back and forth on source lines you can see when debugging optimized programs. Btw, the patch only lays the foundation for all the goals that accumulated in this thread to be eventually fulfilled. I only tried to collect the minimal set of optimizations that do not automatically defeat any of them ;) Any comments on the implementation details btw? Thanks, Richard.
Re: [PATCH, libstdc++] Add proper OpenBSD support
Date: Wed, 5 Sep 2012 10:55:27 +0300 From: Jonathan Wakely jwakely@gmail.com I'll look at this more carefully later today when I get back from a holiday, but it looks ok after a quick glance. Great! I've CC'd gcc-patches, as required for all patches. Sorry 'bout that. Bit of a brain fart on my side; sent it to gcc-patc...@openbsd.org instead of gcc-patches@gcc.gnu.org.
Re: [PATCH][RFC] Add -Og
On Wed, Sep 05, 2012 at 11:07:17AM +0200, Richard Guenther wrote: But that's not a problem with inlining, that's a problem with allowing things to happen out of order (for some definition of things and order) - which in my understanding -Og is going to tie down. Yes, the goal is definitely to avoid the jumping back and forth on source lines you can see when debugging optimized programs. Btw, the patch only lays the foundation for all the goals that accumulated in this thread to be eventually fulfilled. I only tried to collect the minimal set of optimizations that do not automatically defeat any of them ;) The jumping back and forth could be fixed by limiting scheduling only to insns with the same locus with -Og or something similar (if doing scheduling at all). Of course the fewer gimple passes that reorder stmts the better too. Guess some inlining ought to be allowed, but e.g. IPA cloning should be definitely disabled. Jakub
Re: [PATCH] Clarify gcc-{ar,nm,ranlib} usage in the documentation
On Tue, Sep 4, 2012 at 6:42 PM, Andi Kleen a...@firstfloor.org wrote:
> From: Andi Kleen a...@linux.intel.com
>
> Make it clear in the documentation that with -fno-fat-lto-objects the gcc-* wrappers should be used to pass the linker plugin.

Ok.

Thanks, Richard.

gcc/:

2012-09-04  Andi Kleen  a...@linux.intel.com

	* doc/invoke.texi (-ffat-lto-objects): Clarify that gcc-ar et.al. should be used.
---
 gcc/doc/invoke.texi |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6cf7cec..197803d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8138,7 +8138,9 @@ requires the complete toolchain to be aware of LTO. It requires a linker with
 linker plugin support for basic functionality.  Additionally,
 @command{nm}, @command{ar} and @command{ranlib}
 need to support linker plugins to allow a full-featured build environment
-(capable of building static libraries etc).
+(capable of building static libraries etc).  gcc provides the @command{gcc-ar},
+@command{gcc-nm}, @command{gcc-ranlib} wrappers to pass the right options
+to these tools. With non fat LTO makefiles need to be modified to use them.
 
 The default is @option{-ffat-lto-objects} but this default is intended to
 change in future releases when linker plugin enabled environments become more
-- 
1.7.7
Re: [middle-end] Add machine_mode to address_cost target hook
On 05/09/12 09:16, Oleg Endo wrote: Updated ACK table: [x] target-independent bits [ ] alpha [x] arm [ ] avr [ ] bfin [ ] cr16 [x] cris [x] epiphany[ ] i386 [ ] ia64 [x] iq2000[ ] lm32[ ] m32c [x] m32r [x] mcore [ ] mep [x] microblaze [x] mips [x] mmix [x] mn10300 [ ] pa [x] rs6000[x] rx[ ] s390[ ] score [x] sh[ ] sparc [x] spu [x] stormy16 [x] v850 [ ] vax [ ] xtensa On Tue, 2012-09-04 at 21:33 -0400, Hans-Peter Nilsson wrote: It helps if you CC port maintainers where approval is requested; you missed at least me. Right. Did that for the remaining targets. (BTW, there's no maintainer listed for the cr16 port) For the reference, the current version of the patch is here: http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00210.html Still, considering that we're C++ now, you can presumably drop the function parameter name instead of adding ATTRIBUTE_UNUSED. Yes, that has been mentioned before and I did drop the parameter names in a place or two, but not in the target pieces -- on purpose. Personally I prefer reading 'int bleh ATTRIBUTE_UNUSED' instead of 'int,int,bool,char,int,void*'. Makes it somehow easier to follow and/or fill in the blanks. PS. please consider building cc1 for each target using contrib/config-list.mk or a similar script, to eliminate breaking typos. Thanks for the hint! Cheers, Oleg Since they're not complicated: sparc, vax, ia64, mep, cr16, bfin, s390, pa, xtensa, m32c, score, i386, lm32 and alpha are all ok. R.
Re: [middle-end] Add machine_mode to address_cost target hook
2012/9/5 Oleg Endo oleg.e...@t-online.de: Updated ACK table: [x] target-independent bits [ ] alpha [x] arm [ ] avr [ ] bfin [ ] cr16 [x] cris [x] epiphany[ ] i386 [ ] ia64 [x] iq2000[ ] lm32[ ] m32c [x] m32r [x] mcore [ ] mep [x] microblaze [x] mips [x] mmix [x] mn10300 [ ] pa [x] rs6000[x] rx[ ] s390[ ] score [x] sh[ ] sparc [x] spu [x] stormy16 [x] v850 [ ] vax [ ] xtensa avr - ok Denis.
Re: [PATCH] Reduce memory usage for storing LTO decl resolutions
On Wed, Sep 5, 2012 at 12:49 AM, Steven Bosscher stevenb@gmail.com wrote: On Tue, Sep 4, 2012 at 6:43 PM, Andi Kleen a...@firstfloor.org wrote: +/* Compact representation of a index - resolution pair. Unpacked to an + vector later. */ +struct res_pair +{ + ld_plugin_symbol_resolution_t res; + unsigned index; +}; +typedef struct res_pair res_pair; + +DEF_VEC_P(res_pair); +DEF_VEC_ALLOC_P(res_pair, heap); Did you mean to use DEF_VEC_O here? (Not sure it matters after the vec rewrite for c++) Should be indeed _O. Ok with that change (double-check it still works with the recent VEC C++ changes). Ok to backport, check again if it works properly though! Thanks, Richard. Ciao! Steven
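A minimal sketch of the "compact pairs now, dense vector later" idea behind res_pair, written with std::vector rather than GCC's VEC macros. The Resolution enumerators and the unpack_resolutions helper are assumptions made for illustration; the patch itself uses ld_plugin_symbol_resolution_t and its own unpacking code, which is not shown in this thread.

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-in for ld_plugin_symbol_resolution_t; enumerators are assumed.
enum Resolution { kUnknown = 0, kPrevailingDef, kPreemptedReg };

// Compact (index, resolution) pair, mirroring the shape of res_pair in the patch.
// It is a small value type, so it is stored by value, not through a pointer.
struct ResPair {
  Resolution res;
  unsigned index;
};

// Expand the sparse pairs into a dense vector indexed by declaration index.
// Slots that no pair mentions stay kUnknown; out-of-range pairs are ignored.
std::vector<Resolution> unpack_resolutions(const std::vector<ResPair>& pairs,
                                           unsigned max_index) {
  std::vector<Resolution> dense(max_index + 1, kUnknown);
  for (const ResPair& p : pairs)
    if (p.index <= max_index)
      dense[p.index] = p.res;
  return dense;
}
```

Storing the struct by value is also what the DEF_VEC_O remark is about: as far as the old VEC macros go, the _O variants hold objects while the _P variants hold pointers, so a struct element type wants _O.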
Re: faster random number engine
On 08/30/2012 06:37 PM, Benjamin De Kosnik wrote: Nice! Thanks. Here's a small patchlet to set the abi version to .18. With this, check-abi will pass. tested x86/linux Benjamin, is this still uncommitted? I'm seeing abi_check failing... Thanks, Paolo.
RE: Ping: [PATCH] Enable bbro for -Os
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 04, 2012 6:31 PM To: Zhenqiang Chen Cc: Steven Bosscher; gcc-patches@gcc.gnu.org Subject: Re: Ping: [PATCH] Enable bbro for -Os On Wed, Aug 29, 2012 at 10:42 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: -Original Message- From: Steven Bosscher [mailto:stevenb@gmail.com] Sent: Friday, August 24, 2012 8:17 PM To: Zhenqiang Chen Cc: gcc-patches@gcc.gnu.org Subject: Re: Ping: [PATCH] Enable bbro for -Os On Wed, Aug 22, 2012 at 8:49 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: The patch is to enable bbro for -Os. When optimizing for size, it * avoid duplicating block. * keep its original order if there is no chance to fall through. * ignore edge frequency and probability. * handle predecessor first if its index is smaller to break long trace. You do this by inserting the index as a key. I don't fully understand this change. You're assuming that a block with a lower index has a lower pre- order number in the CFG's DFS spanning tree, IIUC (i.e. the blocks are numbered sequentially)? I'm not sure that's always true. I think you should add an explanation for this heuristic. Thank you for the comments. cleanup_cfg is called at the end cfg_layout_initialize before reorder_basic_blocks. cleanup_cfg does lots of optimization on cfg and renumber the basic blocks. After cleanup_cfg, the blocks are roughly numbered sequentially. Well, sequentially in their current order which is not in any way flow- controlled. Yip. The order is not flow-controlled. During debugging, I found the order is quite good for code size. Logs show I have code size improvement only with cleanup_cfg and without reorder_basic_blocks. But the order is not the final result. It is changed after cfg_layout_finalize. The patch tries to keep the order and connect some fall through edges. 
@@ -530,10 +544,11 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th,
 	}

       /* Edge that cannot be fallthru or improbable or infrequent
-	 successor (i.e. it is unsuitable successor).  */
+	 successor (i.e. it is unsuitable successor).
+	 For size, ignore the frequency and probability.  */
       if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
-	  || prob < branch_th || EDGE_FREQUENCY (e) < exec_th
-	  || e->count < count_th)
+	  || (prob < branch_th || EDGE_FREQUENCY (e) < exec_th
+	      || e->count < count_th) && !for_size)
 	continue;

why that change? It seems you do re-orderings that would not be done with -Os even though your goal was to preserve the original ordering.

The change just ignores the frequency and probability for -Os. Connecting EDGE_CAN_FALLTHRU edges does have more chance to reduce size. No reason to skip it.

+	  /* Wait for the predecessors.  */
+	  if ((e == best_edge) && for_size
+	      && (EDGE_COUNT (best_edge->dest->succs) > 1
+		  || EDGE_COUNT (best_edge->dest->preds) > 1))
+	    {
+	      best_edge = NULL;
+	    }

I don't understand this (well, I'm not very familiar with bb-reorder), doesn't that mean you rather want to push this block to the next round?

The change is to break long trace to reduce long jump. It does not connect the best_edge immediately. Put the block to the next round to give a chance for its predecessor which index is smaller to be handled first. Take if-then-else as an example:

    A
   / \
  B   C
   \ /
    D

Without this change, the final order might be
* ABD ... C // C is at the end of the program, might need longjump C-->D
* ACD ... B // B is at the end of the program, might need longjump B-->D

But from code size view, ABCD/ACBD is better since it reduces the possibility of longjump. The change is to generate such code. In this example we ignore best_edge B-->D and C-->D. So put D to next round, then the block with less index is selected. (I'm not familiar with cfg-cleanup. My logs show D's index is always greater than B's and C's)

Overall I think this patch looks like adding hacks into the heuristics to make it sane for -Os instead of doing what the comment suggests:

I think I follow the comments:
* minimize the combined size: In the old heuristics, there is a long performance sensitive trace from entry to exit. In my patch, it is broken into small ones. Logs show with the patch, most traces with only one or two blocks; few have 3 blocks.
* more or less automatically remove extra jumps: By ignoring the frequency and probability, the patch have more chance to follow through. When there is no chance, just keep its original order.
* long jumps: As mentioned above the patch has less chance to introduce long jumps.

- /* Don't reorder
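The edge filter under discussion can be sketched as a standalone predicate. This assumes the condition in the quoted hunk groups the three threshold tests together and applies them only when not optimizing for size; EdgeInfo, Thresholds and the flag constants are hypothetical simplifications, not the bb-reorder.c data structures.

```cpp
#include <cassert>

// Hypothetical flag bits standing in for EDGE_CAN_FALLTHRU and EDGE_COMPLEX.
constexpr unsigned kCanFallthru = 1u << 0;
constexpr unsigned kComplex = 1u << 1;

// Simplified view of the per-edge data the heuristic looks at.
struct EdgeInfo {
  unsigned flags;
  int prob;    // branch probability
  int freq;    // execution frequency
  long count;  // profile count
};

struct Thresholds {
  int branch_th;
  int exec_th;
  long count_th;
};

// True when the edge should be skipped as a trace successor.
// Structural checks always apply; the profile checks are ignored for size.
inline bool unsuitable_successor(const EdgeInfo& e, const Thresholds& t,
                                 bool for_size) {
  if (!(e.flags & kCanFallthru) || (e.flags & kComplex))
    return true;
  const bool below_threshold =
      e.prob < t.branch_th || e.freq < t.exec_th || e.count < t.count_th;
  return below_threshold && !for_size;
}
```

Under this reading, an improbable but fallthru-capable edge is rejected when optimizing for speed and accepted when optimizing for size, which matches the stated intent of connecting as many fallthru edges as possible for -Os.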
Re: [PATCH] Set correct source location for deallocator calls
On Tue, 2012-09-04 at 18:17 +0100, Bryce McKinlay wrote: libgcj wouldn't actually use it for unwinding, we already have all that. We'd just use it to read DWARF debug info and give us the source code line numbers. Casey Marshell did also write that part some time ago, but it was never finished/integrated. http://gcc.gnu.org/ml/java-patches/2004-q3/msg00350.html Cheers, Mark
Re: [patch] PR bootstrap/54453 (libstdc++ doesn't build)
On 09/01/2012 08:10 PM, Steven Bosscher wrote: Hello, r190783 breaks bootstrap on powerpc64-unknown-linux-gnu. The problem is caused by a regexp that used to check for space|tab before the patch but now only looks for tab. The attached patch fixes this problem for me, but I'm not sure why (I haven't tried to look into the details of the problem, I've simply reverted parts of the commit that broke things for me). Can a libstdc++ maintainer please have a look? Uli, I'm not sure I understand why that commit of yours changed that specific regexp; was it an unintended change? Please provide feedback asap, otherwise, in order to restore bootstrap on powerpc64, Steven please just install your fix (but please double check that things are still Ok on x86_64-linux!) Paolo.
[PATCH,i386] fma4 addition for bdver2
Hello,

FMA4 and FMA3 ISA are implemented in bdver2 target. FMA3 is selected by default. This patch supports the use of FMA4 intrinsics for bdver2 targets.

Is it OK for trunk?

Regards
Ganesh

2012-09-05  Ganesh Gopalasubramanian  ganesh.gopalasubraman...@amd.com

	* config/i386/i386.md : Comments on fma4 instruction selection reflect requirement on register pressure based cost model.
	* config/i386/driver-i386.c (host_detect_local_cpu): fma4 flag is set-reset as informed by the cpuid flag.
	* config/i386/i386.c (processor_alias_table): fma4 flag is enabled for bdver2.

Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 190830)
+++ gcc/config/i386/i386.md	(working copy)
@@ -659,9 +659,11 @@
 	 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
 	 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 	 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
-	 ;; Disable generation of FMA4 instructions for generic code
-	 ;; since FMA3 is preferred for targets that implement both
-	 ;; instruction sets.
+	 ;; Fma instruction selection has to be done based on
+	 ;; register pressure. For generating fma4, a cost model
+	 ;; based on register pressure is required. Till then,
+	 ;; fma4 instruction is disabled for targets that implement
+	 ;; both fma and fma4 instruction sets.
 	 (eq_attr "isa" "fma4")
 	   (symbol_ref "TARGET_FMA4 && !TARGET_FMA")
 	]
Index: gcc/config/i386/driver-i386.c
===================================================================
--- gcc/config/i386/driver-i386.c	(revision 190830)
+++ gcc/config/i386/driver-i386.c	(working copy)
@@ -483,8 +483,6 @@
       has_abm = ecx & bit_ABM;
       has_lwp = ecx & bit_LWP;
       has_fma4 = ecx & bit_FMA4;
-      if (vendor == SIG_AMD && has_fma4 && has_fma)
-	has_fma4 = 0;
       has_xop = ecx & bit_XOP;
       has_tbm = ecx & bit_TBM;
       has_lzcnt = ecx & bit_LZCNT;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 190830)
+++ gcc/config/i386/i386.c	(working copy)
@@ -3164,7 +3164,7 @@
       {"bdver2", PROCESSOR_BDVER2, CPU_BDVER2,
 	PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
 	| PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
-	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4
 	| PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C
 	| PTA_FMA},
       {"btver1", PROCESSOR_BTVER1, CPU_GENERIC64,

Regards
Ganesh
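A rough model of what the patch changes and what it leaves alone, assuming the i386.md condition reads TARGET_FMA4 && !TARGET_FMA. CpuFlags and the three helpers are hypothetical; they only mimic the flag handling in driver-i386.c and the "isa" attribute test, and do not reproduce GCC code.

```cpp
#include <cassert>

// Hypothetical feature record; GCC tracks these as separate has_* variables.
struct CpuFlags {
  bool fma;
  bool fma4;
};

// Before the patch: on AMD parts reporting both extensions, fma4 was cleared.
inline CpuFlags detect_before(bool is_amd, bool cpuid_fma, bool cpuid_fma4) {
  CpuFlags f{cpuid_fma, cpuid_fma4};
  if (is_amd && f.fma4 && f.fma)
    f.fma4 = false;
  return f;
}

// After the patch: the flags simply mirror what cpuid reports.
inline CpuFlags detect_after(bool /* is_amd */, bool cpuid_fma, bool cpuid_fma4) {
  return CpuFlags{cpuid_fma, cpuid_fma4};
}

// Mirrors the "isa" attribute condition: FMA4 patterns are picked
// automatically only when FMA3 is not also available.
inline bool fma4_patterns_enabled(const CpuFlags& f) {
  return f.fma4 && !f.fma;
}
```

On a bdver2-like CPU the after-patch flags keep fma4 set, so the FMA4 intrinsics become usable, while the pattern condition still prefers FMA3 for automatically generated code.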
Re: [C++ Patch] PR 54191
On 08/28/2012 01:49 AM, Jason Merrill wrote: OK. Thanks Jason. I have now committed the patch together with the three additional tests which depended on instantiation_dependent_p. Thanks again, Paolo.
Re: [PATCH,i386] fma4 addition for bdver2
On Wed, Sep 5, 2012 at 12:10 PM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: FMA4 and FMA3 ISA are implemented in bdver2 target. FMA3 is selected by default. This patch supports the use of FMA4 intrinsics for bdver2 targets. Is it OK for trunk? OK. I will backport this patch, together with my previous FMA patch to 4.7 branch. Thanks, Uros.
Re: faster random number engine
On 09/05/2012 11:53 AM, Paolo Carlini wrote: On 08/30/2012 06:37 PM, Benjamin De Kosnik wrote: Nice! Thanks. Here's a small patchlet to set the abi version to .18. With this, check-abi will pass. tested x86/linux Benjamin, is this still uncommitted? I'm seeing abi_check failing... Ok, now I see that the patch is in and a default configured build is fine abi-wise. Sorry about the false alarm. If you are curious, the reason why I sent the message is that adding --enable-libstdcxx-time=rt (which should be more or less the default in the C++11 era) used not to lead to abi_check failures and it does now. Should look more into it... Paolo.
RE: [PATCH] PR45070: Fix wrong epilogue code for cortex-m0/Os
-Original Message- From: Ramana Radhakrishnan Sent: Tuesday, September 04, 2012 4:03 PM To: Bin Cheng Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] PR45070: Fix wrong epilogue code for cortex-m0/Os I ran regression test with/without Os for cortex-m0 and everything is ok. Ok for trunk and 4.7/4.6 release branches? OK for trunk. committed. Ok to backport if no release manager objects in 24 hours and if it tests without regressions there. no objections. committed after regression test. Thanks.
Re: [PATCH] Reduce memory usage for storing LTO decl resolutions
On 2012-09-04 18:49 , Steven Bosscher wrote: Did you mean to use DEF_VEC_O here? (Not sure it matters after the vec rewrite for c++) It doesn't anymore. But it will matter for backports. Diego.
[RFA 1/n] Fix if conversion interactions with block partitioning
All,

This is the first patch in a series with the ultimate aim of enabling -freorder-blocks-and-partition in the ARM backend. However, whilst working on this I have come across a number of midend issues which should be fixed individually.

This patch fixes an ICE during if-conversion. The problem is that when we encounter a CFG that looks like:

[ASCII CFG diagram garbled in the archive: block 167 (COLD) branches to blocks 168 (COLD) and 169 (COLD), with block 170 (HOT) below them]

The 'ce3' phase merges blocks 167, 168, and 169, and eventually calls rtl_tidy_fallthru_edge to convert the edge from 167 to 170 into a fallthru one. This causes verify_flow_info to fail as you can't have a fallthru edge between different partitions.

The fix I have implemented is to have rtl_tidy_fallthru not do anything if the fallthru edge crosses a partition boundary.

OK?

Thanks,

Matt

gcc/ChangeLog:

2012-09-05  Matthew Gretton-Dann  matthew.gretton-d...@linaro.org

    * cfgrtl.c (rtl_tidy_fallthru_edge): Don't tidy edges which
    cross partitions.

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index c62b5bc..341ea9e 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1572,6 +1572,11 @@ rtl_tidy_fallthru_edge (edge e)
     if (INSN_P (q))
       return;
 
+  /* If the two blocks are in different partitions we do not want to mark
+     this as a fallthru edge.  */
+  if (BB_PARTITION (b) != BB_PARTITION (c))
+    return;
+
   /* Remove what will soon cease being the jump insn from the source block.
      If block B consisted only of this single jump, turn it into a deleted
      note.  */

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
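The guard added to rtl_tidy_fallthru_edge can be illustrated with a small toy model. This is only a sketch under stated assumptions: `struct block_model`, `struct edge_model` and `tidy_fallthru_edge_model` are hypothetical names that stand in for GCC's basic_block, edge and the real function, and the model ignores everything except the partition check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hot/cold partition of a block.  */
enum partition_model { PART_UNSET, PART_HOT, PART_COLD };

struct block_model
{
  int index;
  enum partition_model part;
};

struct edge_model
{
  struct block_model *src;
  struct block_model *dest;
  bool fallthru;
};

/* Mark E as a fallthru edge unless it crosses a partition boundary.
   Returns true if the edge was changed.  */
bool
tidy_fallthru_edge_model (struct edge_model *e)
{
  /* A fallthru edge between different partitions would be rejected
     by the flow verifier, so leave such an edge alone.  */
  if (e->src->part != e->dest->part)
    return false;

  e->fallthru = true;
  return true;
}
```

In this model an edge from a cold block 167 to a hot block 170 is left untouched, while an edge between two cold blocks is converted.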
[RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c
All,

When implementing ARM/Thumb support for -freorder-blocks-and-partition I encountered the following silent code generation fault.

Given the following CFG:

       |            |
       93           97
       |            |
  (FALLTHRU)   (CROSSING)
        \          /
         \        /
          \      /
             94
             |

Basic Block 94 has the following insn in it which we want to lift into blocks 93 and 97:

(insn/v 62 767 63 94 (set (reg:SI 3 r3 [orig:1468 ivtmp.85 ] [1468])
        (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                (const_int 20 [0x14])) [7 %sfp+-52 S4 A32]))

For block 93 this becomes a move from r0 to r3 - and everything is OK.

For block 97 there is no appropriate move so the compiler tries to copy the load, and insert it on the edge from 97 to 94. This edge is a crossing edge, and so is implemented by an indirect jump:

(insn 2795 2590 3940 97 (set (reg:SI 3 r3 [2464])
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC19") [flags 0x2]) [2 S4 A32])) 634 {*arm_movsi_vfp}
     (insn_list:REG_LABEL_OPERAND 887 (expr_list:REG_EQUIV (label_ref:SI 887)
            (nil))))
(jump_insn 2796 3940 2593 97 (set (pc)
        (reg:SI 3 r3 [2464])) 264 {*arm_indirect_jump}
     (expr_list:REG_CROSSING_JUMP (nil)
        (nil)))

The compiler tries to insert the copy of insn 62 (in this case it becomes insn 3940) immediately before the jump_insn - which because this is a crossing edge is implemented as an indirect jump using a register in the ARM backend:

(insn 2795 2590 3940 97 (set (reg:SI 3 r3 [2464])
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC19") [flags 0x2]) [2 S4 A32])) 634 {*arm_movsi_vfp}
     (insn_list:REG_LABEL_OPERAND 887 (expr_list:REG_EQUIV (label_ref:SI 887)
            (nil))))
(insn 3940 2795 2796 97 (set (reg:SI 3 r3 [orig:1468 ivtmp.85 ] [1468])
        (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                (const_int 20 [0x14])) [7 %sfp+-52 S4 A32])) -1
     (nil))
(jump_insn 2796 3940 2593 97 (set (pc)
        (reg:SI 3 r3 [2464])) 264 {*arm_indirect_jump}
     (expr_list:REG_CROSSING_JUMP (nil)
        (nil)))

However, this is incorrect as insn 3940 sets r3, and the jump_insn 2796 wants to use r3 (as set by 2795).

The patch fixes this by checking that the register set by the load is not used by the jump before allowing the load to be lifted.

Whilst this fix works for this particular case I am not sure it is the best fix for the general issue, and so if others have a better idea how to fix this I would be very happy.

In particular I wonder whether we should be defining TARGET_CANNOT_MODIFY_JUMPS_P for the ARM backend as indirect jumps use registers in a similar way to the SH backend. Not that this would have helped in this particular instance.

Tested cross arm-linux-none-gnueabi with in progress ARM -freorder-blocks-and-partition enabling patch.

OK?

Thanks,

Matt

gcc/ChangeLog:

2012-09-05  Matthew Gretton-Dann  matthew.gretton-d...@linaro.org

    * postreload-gcse.c (eliminate_partially_redundant_load): Ensure
    that loads are not lifted over branches which use the register
    loaded.

diff --git a/gcc/postreload-gcse.c b/gcc/postreload-gcse.c
index b464d1f..85fb9b3 100644
--- a/gcc/postreload-gcse.c
+++ b/gcc/postreload-gcse.c
@@ -1048,6 +1048,13 @@ eliminate_partially_redundant_load (basic_block bb, rtx
           /* Adding a load on a critical edge will cause a split.  */
           if (EDGE_CRITICAL_P (pred))
             critical_edge_split = true;
+
+          /* If the destination register is used at the BB end we can not
+             insert the load.  */
+          if (reg_used_between_p (dest, PREV_INSN (BB_END (pred_bb)),
+                                  next_pred_bb_end))
+            goto cleanup;
+
           not_ok_count += pred->count;
           unoccr = (struct unoccr *) obstack_alloc (&unoccr_obstack,
                                                     sizeof (struct unoccr));

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
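The hazard described above can be reduced to a very small model: a load may only be placed directly in front of a block-ending jump if the jump does not read the register the load writes. The sketch below is an illustration only. `struct jump_model`, `NO_REG` and `load_can_precede_jump` are hypothetical names, and the real patch uses reg_used_between_p on RTL rather than a single register number.

```c
#include <stdbool.h>

/* Marker for a jump that does not read any register (a direct jump).  */
#define NO_REG (-1)

/* Hypothetical model of a block-ending jump: the hard register that
   holds the jump target, or NO_REG for a direct jump.  */
struct jump_model
{
  int target_reg;
};

/* Return true if a load writing LOAD_DEST_REG may be inserted
   immediately before JUMP without clobbering the jump target.  */
bool
load_can_precede_jump (int load_dest_reg, const struct jump_model *jump)
{
  if (jump->target_reg != NO_REG && jump->target_reg == load_dest_reg)
    return false;   /* The load would overwrite the jump target.  */
  return true;
}
```

For the case in the mail, the load writes r3 and the indirect jump reads r3, so the model refuses the insertion. A load into a different register, or a direct jump, is accepted.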
Re: [Test] contrib/test_installed modified to set specific gcov
Ok for trunk, 4.7, 4.6?

2012/8/17 Anna Tikhonova anna.m.tikhon...@gmail.com:
> Ping one more time.
>
> 2012/8/10 Anna Tikhonova anna.m.tikhon...@gmail.com:
>> Ping.
>>
>> 2012/8/8 Anna Tikhonova anna.m.tikhon...@gmail.com:
>>> Hi,
>>>
>>> while running check for the Android NDK compiler (I've used contrib/test_installed for it) I've noticed that the gcov name is hardcoded in g++.dg/gcov/gcov.exp. All NDK x86 tools have a prefix like i686-linux-android-, so testing will fail to spawn gcov.
>>>
>>> The attached patch introduces a --with-gcov flag for contrib/test_installed so one could set a specific gcov for testing.
>>>
>>> As a workaround we could create a wrapper script named 'gcov' pointing to the specific gcov in the directory where GCC_UNDER_TEST resides.
>>>
>>> What do you think of this patch? Do you find it useful?
>>>
>>> Thanks in advance.
>>> Anna
Re: [RFA 1/n] Fix if conversion interactions with block partitioning
On Wed, Sep 5, 2012 at 1:25 PM, Matthew Gretton-Dann wrote: + /* If the two blocks are in different partitions we do not want to mark + this as a fallthru edge. */ + if (BB_PARTITION (b) != BB_PARTITION (c)) +return; + I think you should look for a REG_CROSSING_JUMP note on BB_END instead of comparing BB_PARTITION. Ciao! Steven
Re: [RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c
On Wed, Sep 5, 2012 at 1:42 PM, Matthew Gretton-Dann wrote:
> Whilst this fix works for this particular case I am not sure it is the best
> fix for the general issue, and so if others have a better idea how to fix
> this I would be very happy.

postreload-gcse.c is broken in interesting ways. Look at this gem for example:

static bool
reg_changed_after_insn_p (rtx x, int cuid)
{
  unsigned int regno, end_regno;

  regno = REGNO (x);
  end_regno = END_HARD_REGNO (x);
  do
    if (reg_avail_info[regno] > cuid)
      return true;
  while (++regno < end_regno);
  return false;
}

So the more conservative the fix, the better :-)

The patch looks correct to me. But perhaps the pass should just punt on blocks not ending in a simple jump in bb_has_well_behaved_predecessors?

Ciao!
Steven
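For readers unfamiliar with the quoted helper, the following is a self-contained toy version of the same loop shape. It assumes that the comparison operators lost from the archived text were `>` and `<`. The table `reg_last_change`, the constant `N_HARD_REGS` and the function `reg_range_changed_after` are hypothetical stand-ins, not GCC names.

```c
#include <stdbool.h>

#define N_HARD_REGS 16

/* Hypothetical table: for each hard register, the cuid of the last
   insn that changed it, or 0 if it was never changed.  */
int reg_last_change[N_HARD_REGS];

/* Return true if any register in [REGNO, END_REGNO) was changed by an
   insn with a cuid greater than CUID.  As in the quoted code, the
   do/while form inspects REGNO at least once even for an empty range.  */
bool
reg_range_changed_after (unsigned int regno, unsigned int end_regno, int cuid)
{
  do
    if (reg_last_change[regno] > cuid)
      return true;
  while (++regno < end_regno);
  return false;
}
```

A value that occupies several hard registers counts as changed as soon as any one of them was written after the given insn.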
Re: [RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c
On 05/09/12 13:02, Steven Bosscher wrote:
> On Wed, Sep 5, 2012 at 1:42 PM, Matthew Gretton-Dann wrote:
>> Whilst this fix works for this particular case I am not sure it is the best
>> fix for the general issue, and so if others have a better idea how to fix
>> this I would be very happy.
>
> postreload-gcse.c is broken in interesting ways. Look at this gem for example:
>
> static bool
> reg_changed_after_insn_p (rtx x, int cuid)
> {
>   unsigned int regno, end_regno;
>
>   regno = REGNO (x);
>   end_regno = END_HARD_REGNO (x);
>   do
>     if (reg_avail_info[regno] > cuid)
>       return true;
>   while (++regno < end_regno);
>   return false;
> }
>
> So the more conservative the fix, the better :-)
>
> The patch looks correct to me. But perhaps the pass should just punt
> on blocks not ending in a simple jump in bb_has_well_behaved_predecessors?
>
> Ciao!
> Steven

That sort of makes sense. Why would we ever want to hoist an insn out of a cold block into a hot one? I could see it making sense to do the reverse on occasion, but clearly care is needed here.

R.
Re: [patch] PR bootstrap/54453 (libstdc++ doesn't build)
On Wed, Sep 5, 2012 at 6:08 AM, Paolo Carlini paolo.carl...@oracle.com wrote: Uli, I'm not sure to understand why that commit of yours changed that specific regexp, Completely unintended, I thought I mentioned this already. The problem is emacs's whitespace mode which made those changes.
Re: [RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c
On 5 September 2012 13:45, Richard Earnshaw rearn...@arm.com wrote:
> On 05/09/12 13:02, Steven Bosscher wrote:
>> On Wed, Sep 5, 2012 at 1:42 PM, Matthew Gretton-Dann wrote:
>>> Whilst this fix works for this particular case I am not sure it is the best
>>> fix for the general issue, and so if others have a better idea how to fix
>>> this I would be very happy.
>>
>> postreload-gcse.c is broken in interesting ways. Look at this gem for example:
>>
>> static bool
>> reg_changed_after_insn_p (rtx x, int cuid)
>> {
>>   unsigned int regno, end_regno;
>>
>>   regno = REGNO (x);
>>   end_regno = END_HARD_REGNO (x);
>>   do
>>     if (reg_avail_info[regno] > cuid)
>>       return true;
>>   while (++regno < end_regno);
>>   return false;
>> }
>>
>> So the more conservative the fix, the better :-)

I suppose removing the pass is too conservative :-)

>> The patch looks correct to me. But perhaps the pass should just punt
>> on blocks not ending in a simple jump in bb_has_well_behaved_predecessors?

By 'simple jump' you mean any block with at most only EDGE_FALLTHRU on the edge?

> That sort of makes sense. Why would we ever want to hoist an insn out
> of a cold block into a hot one? I could see it making sense to do the
> reverse on occasion, but clearly care is needed here.

So whilst testing -freorder-blocks-and-partition has caused this behaviour to be exhibited, I believe there is nothing stopping this happening with any indirect jump - not just crossing jumps.

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org
[PATCH] Avoid repeated SSA updates during loop unrolling
This cuts another 250s off the testcase in PR46590 by calling update_ssa from complete unrolling only after all innermost loops are processed once.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2012-09-05  Richard Guenther  rguent...@suse.de

    PR tree-optimization/46590
    * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not
    update SSA form here.
    (canonicalize_induction_variables): Assert we do not need to
    update SSA form.
    (tree_unroll_loops_completely): Update SSA form here.
    * tree-ssa-loop-manip.c (gimple_duplicate_loop_to_header_edge): Do
    not verify loop-closed SSA form if SSA form is not up-to-date.

Index: gcc/tree-ssa-loop-ivcanon.c
===================================================================
--- gcc/tree-ssa-loop-ivcanon.c (revision 190968)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -414,7 +414,6 @@ try_unroll_loop_completely (struct loop
   else
     gimple_cond_make_false (cond);
   update_stmt (cond);
-  update_ssa (TODO_update_ssa);

   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
@@ -493,6 +492,7 @@ canonicalize_induction_variables (void)
                                         true, UL_SINGLE_ITER, true);
     }
+  gcc_assert (!need_ssa_update_p (cfun));

   /* Clean up the information about numbers of iterations, since brute force
      evaluation could reveal new information.  */
@@ -536,6 +536,8 @@ tree_unroll_loops_completely (bool may_i

       if (changed)
         {
+          update_ssa (TODO_update_ssa);
+
           /* This will take care of removing completely unrolled loops
              from the loop structures so we can continue unrolling now
              innermost loops.  */
Index: gcc/tree-ssa-loop-manip.c
===================================================================
--- gcc/tree-ssa-loop-manip.c (revision 190968)
+++ gcc/tree-ssa-loop-manip.c (working copy)
@@ -752,7 +752,13 @@ gimple_duplicate_loop_to_header_edge (st
     return false;

 #ifdef ENABLE_CHECKING
-  if (loops_state_satisfies_p (LOOP_CLOSED_SSA))
+  /* ??? This forces needless update_ssa calls after processing each
+     loop instead of just once after processing all loops.  We should
+     instead verify that loop-closed SSA form is up-to-date for LOOP
+     only (and possibly SSA form).  For now just skip verifying if
+     there are to-be renamed variables.  */
+  if (!need_ssa_update_p (cfun)
+      && loops_state_satisfies_p (LOOP_CLOSED_SSA))
     verify_loop_closed_ssa (true);
 #endif
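The idea behind the patch, paying for one whole-function update per round instead of one per unrolled loop, can be shown with a toy cost model. Everything in this sketch is hypothetical: `update_ssa_model`, `try_unroll_model` and the rule that even-numbered loops are unrollable exist only to count how often the expensive step runs.

```c
#include <stdbool.h>

/* Counters used to compare the two strategies.  */
struct unroll_stats
{
  int unrolled;      /* loops that were unrolled */
  int ssa_updates;   /* calls to the expensive update */
};

/* Hypothetical stand-in for the expensive whole-function update.  */
static void
update_ssa_model (struct unroll_stats *s)
{
  s->ssa_updates++;
}

/* Pretend every loop with an even index can be unrolled completely.
   If UPDATE_NOW is set, run the expensive update right away.  */
static bool
try_unroll_model (int loop, struct unroll_stats *s, bool update_now)
{
  if (loop % 2 != 0)
    return false;
  s->unrolled++;
  if (update_now)
    update_ssa_model (s);
  return true;
}

/* Old strategy: update after each unrolled loop.  */
struct unroll_stats
unroll_eager (int n_loops)
{
  struct unroll_stats s = { 0, 0 };
  for (int i = 0; i < n_loops; i++)
    try_unroll_model (i, &s, true);
  return s;
}

/* New strategy: process all loops, then update once if anything changed.  */
struct unroll_stats
unroll_batched (int n_loops)
{
  struct unroll_stats s = { 0, 0 };
  bool changed = false;
  for (int i = 0; i < n_loops; i++)
    changed |= try_unroll_model (i, &s, false);
  if (changed)
    update_ssa_model (&s);
  return s;
}
```

For ten loops the eager variant performs five updates and the batched variant performs one, with the same number of loops unrolled. The price, as the ENABLE_CHECKING hunk shows, is that intermediate verification has to tolerate SSA form that is not yet up to date.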
[fortran, committed] PR 54474 assumed rank testsuite fix (was: Re: [Patch, Fortran, committed] Fix PR54467 (TBP ICEs due to _final wrapper disabling))
On 04/09/2012 00:19, Dominique Dhumieres wrote:
> Hi Tobias,
>
> The lines 6 and 12 of gfortran.dg/coarray_poly_3.f90 need some adjustment
> along the line:

For what's worth, the testsuite change was part of patch (b) at http://gcc.gnu.org/ml/fortran/2012-08/msg00201.html while it should have been part of patch (a), (or the corresponding frontend change should have been part of patch (b)).

Anyway, I committed it (revision 190977).

Mikael

Index: gfortran.dg/coarray_poly_3.f90
===================================================================
--- gfortran.dg/coarray_poly_3.f90 (revision 190976)
+++ gfortran.dg/coarray_poly_3.f90 (revision 190977)
@@ -3,13 +3,13 @@
 !

-subroutine cont1(x) ! { dg-error "has the CONTIGUOUS attribute but is not an array pointer or an assumed-shape array" }
+subroutine cont1(x) ! { dg-error "has the CONTIGUOUS attribute but is not an array pointer or an assumed-shape or assumed-rank array" }
   type t
   end type t
   class(t), contiguous, allocatable :: x(:)
 end

-subroutine cont2(x) ! { dg-error "has the CONTIGUOUS attribute but is not an array pointer or an assumed-shape array" }
+subroutine cont2(x) ! { dg-error "has the CONTIGUOUS attribute but is not an array pointer or an assumed-shape or assumed-rank array" }
   type t
   end type t
   class(t), contiguous, allocatable :: x(:)[:]
Index: ChangeLog
===================================================================
--- ChangeLog (revision 190976)
+++ ChangeLog (revision 190977)
@@ -1,3 +1,8 @@
+2012-09-05  Dominique Dhumieres  domi...@lps.ens.fr
+
+    PR fortran/54474
+    * gfortran.dg/coarray_poly_3.f90: Adjust error messages.
+
 2012-09-05  Paolo Carlini  paolo.carl...@oracle.com

     PR c++/54191
Re: Minor reorganization in bb-reorder.c
The file contains 3 RTL optimization passes, the gate and worker functions of which are strangely intertwined. More cosmetic changes: a few clarifications in the head comment of the file, a handful of long lines and other formatting nits. No functional changes. Tested on x86_64-suse-linux, applied on the mainline. 2012-09-05 Eric Botcazou ebotca...@adacore.com * bb-reorder.c: Clarify a few points in the head comment and fix long lines in other comments. (find_traces): Fix long line. (find_traces_1_round): Likewise. (better_edge_p): Likewise. (connect_traces): Likewise. (duplicate_computed_gotos): Likewise. (find_rarely_executed_basic_blocks_and_cr): Remove trailing spaces. (fix_up_fall_thru_edges): Fix formatting. -- Eric Botcazou Index: bb-reorder.c === --- bb-reorder.c (revision 190948) +++ bb-reorder.c (working copy) @@ -20,41 +20,41 @@ /* This (greedy) algorithm constructs traces in several rounds. The construction starts from seeds. The seed for the first round - is the entry point of function. When there are more than one seed - that one is selected first that has the lowest key in the heap - (see function bb_to_key). Then the algorithm repeatedly adds the most - probable successor to the end of a trace. Finally it connects the traces. + is the entry point of the function. When there are more than one seed, + the one with the lowest key in the heap is selected first (see bb_to_key). + Then the algorithm repeatedly adds the most probable successor to the end + of a trace. Finally it connects the traces. There are two parameters: Branch Threshold and Exec Threshold. - If the edge to a successor of the actual basic block is lower than - Branch Threshold or the frequency of the successor is lower than - Exec Threshold the successor will be the seed in one of the next rounds. 
+ If the probability of an edge to a successor of the current basic block is + lower than Branch Threshold or its frequency is lower than Exec Threshold, + then the successor will be the seed in one of the next rounds. Each round has these parameters lower than the previous one. - The last round has to have these parameters set to zero - so that the remaining blocks are picked up. + The last round has to have these parameters set to zero so that the + remaining blocks are picked up. The algorithm selects the most probable successor from all unvisited successors and successors that have been added to this trace. The other successors (that has not been sent to the next round) will be - other seeds for this round and the secondary traces will start in them. - If the successor has not been visited in this trace it is added to the trace - (however, there is some heuristic for simple branches). - If the successor has been visited in this trace the loop has been found. - If the loop has many iterations the loop is rotated so that the - source block of the most probable edge going out from the loop - is the last block of the trace. + other seeds for this round and the secondary traces will start from them. + If the successor has not been visited in this trace, it is added to the + trace (however, there is some heuristic for simple branches). + If the successor has been visited in this trace, a loop has been found. + If the loop has many iterations, the loop is rotated so that the source + block of the most probable edge going out of the loop is the last block + of the trace. If the loop has few iterations and there is no edge from the last block of - the loop going out from loop the loop header is duplicated. - Finally, the construction of the trace is terminated. + the loop going out of the loop, the loop header is duplicated. - When connecting traces it first checks whether there is an edge from the - last block of one trace to the first block of another trace. 
+ When connecting traces, the algorithm first checks whether there is an edge + from the last block of a trace to the first block of another trace. When there are still some unconnected traces it checks whether there exists - a basic block BB such that BB is a successor of the last bb of one trace - and BB is a predecessor of the first block of another trace. In this case, - BB is duplicated and the traces are connected through this duplicate. + a basic block BB such that BB is a successor of the last block of a trace + and BB is a predecessor of the first block of another trace. In this case, + BB is duplicated, added at the end of the first trace and the traces are + connected through it. The rest of traces are simply connected so there will be a jump to the - beginning of the rest of trace. + beginning of the rest of traces. References: @@ -89,11 +89,11 @@ /* The number of
Re: [PATCH] Enable bbro for -Os
Basic block reordering is disabled for -Os from gcc 4.7 since the pass will lead to big code size regression. But benchmarks logs also show there are lots of regression due to poor code layout compared with 4.6. The patch is to enable bbro for -Os. When optimizing for size, it * avoid duplicating block. * keep its original order if there is no chance to fall through. * ignore edge frequency and probability. * handle predecessor first if its index is smaller to break long trace. * only connect Trace n with Trace n + 1 to reduce long jump. Here are the CSiBE code size benchmark results: * For ARM, code size reduces 0.21%. * For MIPS, code size reduces 0.25%. * For PPC, code size reduces 0.33%. * For X86, code size reduces 0.22%. Interesting figures. The patch looks good overall but, since the -Os path deviates substantially from the original algorithm, it needs to be clearly documented in the comment at the top of the file (before Reference), e.g. The above description is for the full algorithm, which is used when the function is optimized for speed. When the function is optimized for size, in order to ...insert reasons here..., the algorithm is modified as follows: ...list modifications here... More details comments: /* Edge that cannot be fallthru or improbable or infrequent -successor (i.e. it is unsuitable successor). */ +successor (i.e. it is unsuitable successor). +For size, ignore the frequency and probability. */ if (!(e-flags EDGE_CAN_FALLTHRU) || (e-flags EDGE_COMPLEX) - || prob branch_th || EDGE_FREQUENCY (e) exec_th - || e-count count_th) + || (prob branch_th || EDGE_FREQUENCY (e) exec_th + || e-count count_th) !for_size) continue; ...ignore the probability and frequency. @@ -558,6 +564,14 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th, /* Add all non-selected successors to the heaps. */ FOR_EACH_EDGE (e, ei, bb-succs) { + /* Wait for the predecessors. 
*/ + if ((e == best_edge) for_size + (EDGE_COUNT (best_edge-dest-succs) 1 + || EDGE_COUNT (best_edge-dest-preds) 1)) + { + best_edge = NULL; + } + if (e == best_edge || e-dest == EXIT_BLOCK_PTR || bb_visited_trace (e-dest)) I don't really understand what this means and why this is done here. If you don't want to add the best destination in some cases, why not do it just before the loop and explicitly state the reason? And you don't need parentheses. @@ -596,11 +610,12 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th, { /* When partitioning hot/cold basic blocks, make sure the cold blocks (and only the cold blocks) all get -pushed to the last round of trace collection. */ +pushed to the last round of trace collection. +Do not push to next round when optimizing for size. Trailing spaces in comment. @@ -681,6 +696,8 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th, (i.e. 2 * B-frequency = EDGE_FREQUENCY (AC) ) Best ordering is then A B C. + For size, A B C is always the best order. + This situation is created for example by: if (A) B; When optimizing for size, A B C is always the best ordering. @@ -864,6 +886,13 @@ better_edge_p (const_basic_block bb, const_edge e, int prob, int freq, int best_ int diff_prob = best_prob / 10; int diff_freq = best_freq / 10; + if (optimize_function_for_size_p (cfun)) +{ + /* The smaller one is better to keep the original order. */ + return !cur_best_edge +|| cur_best_edge-dest-index e-dest-index; +} Move the comment out of the block, add When optimizing for size and remove the parentheses. +/* Return true when the edge E is better than the temporary best edge + CUR_BEST_EDGE. If SRC_INDEX_P is true, the function compares the src bb of + E and CUR_BEST_EDGE; otherwise it will compare the dest bb. + BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE. + TRACES record the information about traces. + When optimizing for size, the edge with smaller index is better. 
+ When optimizing for speed, the edge with bigger probability or longer trace + is better. */ + +static bool +connect_better_edge_p (const_edge e, bool src_index_p, int best_len, + const_edge cur_best_edge, struct trace *traces) +{ + int e_index; + int b_index; + + if (!cur_best_edge) +return true; + + if (optimize_function_for_size_p (cfun)) +{ +
Add a configure option to disable system header canonicalizations (issue6495088)
Add a configure option to disable system header canonicalizations. Libcpp may canonicalize system header paths with lrealpath() for diagnostics, dependency output, and similar. If gcc is held in a symlink farm the canonicalized paths may be meaningless to users, and will also conflict with build frameworks that (for example) disallow absolute paths to header files. This change adds --[en/dis]able-canonical-system-headers, allowing configure to select whether or not to implement r186991. See also PR c++/52974. Tested for regressions with bootstrap builds of C and C++, both with and without configure --disable-canonical-system-headers. Okay for trunk? libcpp/ChangeLog.google-integration 2012-09-05 Simon Baldwin sim...@google.com * files.c (maybe_shorter_path): Suppress function definition if ENABLE_CANONICAL_SYSTEM_HEADERS is not defined. * (find_file_in_dir): Call maybe_shorter_path() only if ENABLE_CANONICAL_SYSTEM_HEADERS is defined. * configure.ac: Add new --enable-canonical-system-headers. * configure: Regenerate. * config.in: Regenerate. gcc/ChangeLog.google-integration 2012-09-05 Simon Baldwin sim...@google.com * doc/install.texi: Document --enable-canonical-system-headers. Index: gcc/doc/install.texi === --- gcc/doc/install.texi(revision 190968) +++ gcc/doc/install.texi(working copy) @@ -1710,6 +1710,14 @@ link time when @option{-fuse-linker-plug This linker should have plugin support such as gold starting with version 2.20 or GNU ld starting with version 2.21. See @option{-fuse-linker-plugin} for details. + +@item --enable-canonical-system-headers +@itemx --disable-canonical-system-headers +Enable system header path canonicalization for @file{libcpp}. This can +produce shorter header file paths in diagnostics and dependency output +files, but these changed header paths may conflict with some compilation +environments. Enabled by default, and may be disabled using +@option{--disable-canonical-system-headers}. 
@end table @subheading Cross-Compiler-Specific Options Index: libcpp/configure === --- libcpp/configure(revision 190968) +++ libcpp/configure(working copy) @@ -700,6 +700,7 @@ enable_rpath with_libiconv_prefix enable_maintainer_mode enable_checking +enable_canonical_system_headers ' ac_precious_vars='build_alias host_alias @@ -1333,6 +1334,8 @@ Optional Features: --disable-rpath do not hardcode runtime library paths --enable-maintainer-mode enable rules only needed by maintainers --enable-checking enable expensive run-time checks + --enable-canonical-system-headers + enable or disable system headers canonicalization Optional Packages: --with-PACKAGE[=ARG]use PACKAGE [ARG=yes] @@ -7094,6 +7097,19 @@ $as_echo #define ENABLE_CHECKING 1 c fi +# Check whether --enable-canonical-system-headers was given. +if test ${enable_canonical_system_headers+set} = set; then : + enableval=$enable_canonical_system_headers; +else + enable_canonical_system_headers=yes +fi + +if test $enable_canonical_system_headers != no; then + +$as_echo #define ENABLE_CANONICAL_SYSTEM_HEADERS 1 confdefs.h + +fi + case $target in alpha*-*-* | \ Index: libcpp/files.c === --- libcpp/files.c (revision 190968) +++ libcpp/files.c (working copy) @@ -345,6 +345,7 @@ pch_open_file (cpp_reader *pfile, _cpp_f shorter, otherwise return NULL. This function does NOT free the memory pointed by FILE. */ +#ifdef ENABLE_CANONICAL_SYSTEM_HEADERS static char * maybe_shorter_path (const char * file) { @@ -359,6 +360,7 @@ maybe_shorter_path (const char * file) return NULL; } } +#endif /* Try to open the path FILE-name appended to FILE-dir. This is where remap and PCH intercept the file lookup process. Return true @@ -384,6 +386,7 @@ find_file_in_dir (cpp_reader *pfile, _cp char *copy; void **pp; +#ifdef ENABLE_CANONICAL_SYSTEM_HEADERS /* We try to canonicalize system headers. 
*/ if (file-dir-sysp) { @@ -396,6 +399,7 @@ find_file_in_dir (cpp_reader *pfile, _cp path = canonical_path; } } +#endif hv = htab_hash_string (path); if (htab_find_with_hash (pfile-nonexistent_file_hash, path, hv) != NULL) Index: libcpp/configure.ac === --- libcpp/configure.ac (revision 190968) +++ libcpp/configure.ac (working copy) @@ -132,6 +132,16 @@ if test $enable_checking != no ; then [Define if you want more run-time sanity checks.]) fi +AC_ARG_ENABLE(canonical-system-headers, +[ --enable-canonical-system-headers + enable or disable system headers
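As a rough illustration of what the patch makes optional, here is a sketch of a "use the canonical path only if it is shorter" helper. It is an assumption-laden stand-in, not the libcpp code: libcpp is described above as using lrealpath(), whereas this sketch swaps in POSIX realpath() so that it is self-contained, and the name `shorter_canonical_path` is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* for realpath() with a NULL buffer */
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd canonical form of FILE if it is strictly shorter
   than FILE itself, otherwise NULL.  FILE is never freed here; the
   caller owns both FILE and any returned string.  */
char *
shorter_canonical_path (const char *file)
{
  /* POSIX.1-2008: a NULL second argument makes realpath allocate
     the result buffer.  */
  char *resolved = realpath (file, NULL);
  if (resolved == NULL)
    return NULL;              /* Path could not be resolved.  */

  if (strlen (resolved) < strlen (file))
    return resolved;

  free (resolved);
  return NULL;
}
```

With a symlink farm the resolved path can point somewhere unrelated to what the user passed in, which is the situation the configure option is meant to avoid by compiling this step out entirely.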
Re: Add a configure option to disable system header canonicalizations (issue6495088)
On Wed, Sep 5, 2012 at 6:56 AM, Simon Baldwin sim...@google.com wrote:
> Add a configure option to disable system header canonicalizations.

Why should this be a configure option rather than a command-line option?

Ian
Re: Add a configure option to disable system header canonicalizations (issue6495088)
On 5 September 2012 16:03, Ian Lance Taylor i...@google.com wrote:
> On Wed, Sep 5, 2012 at 6:56 AM, Simon Baldwin sim...@google.com wrote:
>
>> Add a configure option to disable system header canonicalizations.
>
> Why should this be a configure option rather than a command-line option?

The underlying problem is a niche one, likely to affect a (vanishingly?) small number of users. It is hard in widely distributed build systems that combine make and non-make schemes to ensure that a given flag is passed to every compiler invocation every time. A configure option is a relatively non-invasive libcpp change. Gcc already has too many command line flags.

Do you have a strong reason for why this should be a command line option and not a configure flag?
Re: Scheduler: Allow breaking dependencies by modifying patterns
On 08/03/2012 02:05 PM, Bernd Schmidt wrote:
> This patch allows us to change
>
>   rn++
>   rm=[rn]
>
> into
>
>   rm=[rn + 4]
>   rn++

Ping.

Bernd
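For readers unfamiliar with the notation, the rewrite quoted above can be sketched in plain C. This is only an illustration of why the two orderings are equivalent, not the scheduler code from the patch; the function names are hypothetical, and a 4-byte element type is assumed so that one pointer step matches the `+ 4` offset.

```c
#include <stdint.h>

/* Original order: the load depends on the increment before it.
   Hypothetical helper, not part of the patch.  */
int32_t load_after_increment(const int32_t **rn)
{
    *rn += 1;       /* rn++       (advances by 4 bytes) */
    return **rn;    /* rm = [rn]                        */
}

/* Rewritten order: the increment is folded into the load's address,
   so the load no longer has to wait for the increment.  */
int32_t load_before_increment(const int32_t **rn)
{
    int32_t rm = (*rn)[1];  /* rm = [rn + 4] */
    *rn += 1;               /* rn++          */
    return rm;
}
```

Both functions return the same value and leave the pointer in the same place; only the dependency between the two operations differs.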
Re: Add a configure option to disable system header canonicalizations (issue6495088)
On Wed, Sep 5, 2012 at 7:23 AM, Simon Baldwin sim...@google.com wrote:
> On 5 September 2012 16:03, Ian Lance Taylor i...@google.com wrote:
>> Why should this be a configure option rather than a command-line option?
>
> The underlying problem is a niche one, likely to affect a (vanishingly?) small number of users. It is hard in widely distributed build systems that combine make and non-make schemes to ensure that a given flag is passed to every compiler invocation every time. A configure option is a relatively non-invasive libcpp change. Gcc already has too many command line flags.
>
> Do you have a strong reason for why this should be a command line option and not a configure flag?

I don't know if it's a strong reason, but the problem seems to be one that is characteristic of a specific invocation of a compiler, rather than characteristic of the compiler in general. The same compiler may be invoked in multiple ways. In only some of those ways is it appropriate to avoid canonicalizing paths.

Ian
Re: [PATCH] Add counter histogram to fdo summary (issue6465057)
Sorry about that. I am right now trying to reproduce the profiledbootstrap problem that H.J. reported, which is on x86_64-unknown-linux-gnu where I had successfully done a profiledbootstrap before my commit. Unfortunately after svn updating my client I am hitting an unrelated build problem with my profiledbootstrap (it can't find the rule to make libc++11convenience.la) which is slowing me down.

I don't have access to a gentoo system so I might need your help to track down the error you reported if it doesn't turn out to be the same as the one H.J. hit. If you have a bad .gcda file that you could send me I can start with that. Sounds like there is a profile merging problem that I didn't see, that is corrupting the gcda files in both cases.

Thanks,
Teresa

On Wed, Sep 5, 2012 at 12:12 AM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> On 2012.09.04 at 14:23 -0700, Teresa Johnson wrote:
>> I just committed the patch (included below). I implemented the occupancy bit vector approach for recording non-zero histogram entries, and a few issues uncovered with the merging in a profiled bootstrap. Passes both bootstrap and profiledbootstrap builds and regression tests.
>
> This commit causes:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54487
>
> --
> Markus

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[PATCH, i386]: Remove reg_not_xmm0_operand and similar hacks
Hello!

The attached patch removes various *_not_xmm0_operand hacks. These were used to prevent combine from moving the xmm0 register to the wrong place, but we recently implemented a better approach.

2012-09-05  Uros Bizjak  ubiz...@gmail.com

    * config/i386/sse.md (<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>):
    Use register_operand instead of reg_not_xmm0_operand{,_maybe_avx}.
    Use nonimmediate_operand instead of
    nonimm_not_xmm0_operand{,_maybe_avx}.
    (<sse4_1_avx2>_pblendvb): Ditto.
    (sse4_2_pcmpestr): Ditto.
    (*sse4_2_pcmpestr_unaligned): Ditto.
    (sse4_2_pcmpistr): Ditto.
    (*sse4_2_pcmpistr_unaligned): Ditto.
    * config/i386/predicates.md (reg_not_xmm0_operand): Remove predicate.
    (nonimm_not_xmm0_operand): Ditto.
    (reg_not_xmm0_operand_maybe_avx): Ditto.
    (nonimm_not_xmm0_operand_maybe_avx): Ditto.
    * config/i386/i386.md (rdpmc): Do not force operand 1 into ecx.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: predicates.md
===================================================================
--- predicates.md	(revision 190984)
+++ predicates.md	(working copy)
@@ -93,33 +93,6 @@
        (match_test "TARGET_64BIT")
        (match_test "REGNO (op) > BX_REG")))
 
-;; Return true if op is not xmm0 register.
-(define_predicate "reg_not_xmm0_operand"
-  (match_operand 0 "register_operand")
-{
-  if (GET_CODE (op) == SUBREG)
-    op = SUBREG_REG (op);
-
-  return !REG_P (op) || REGNO (op) != FIRST_SSE_REG;
-})
-
-;; As above, but also allow memory operands.
-(define_predicate "nonimm_not_xmm0_operand"
-  (ior (match_operand 0 "memory_operand")
-       (match_operand 0 "reg_not_xmm0_operand")))
-
-;; Return true if op is not xmm0 register, but only for non-AVX targets.
-(define_predicate "reg_not_xmm0_operand_maybe_avx"
-  (if_then_else (match_test "TARGET_AVX")
-    (match_operand 0 "register_operand")
-    (match_operand 0 "reg_not_xmm0_operand")))
-
-;; As above, but also allow memory operands.
-(define_predicate "nonimm_not_xmm0_operand_maybe_avx"
-  (if_then_else (match_test "TARGET_AVX")
-    (match_operand 0 "nonimmediate_operand")
-    (match_operand 0 "nonimm_not_xmm0_operand")))
-
 ;; Return true if VALUE can be stored in a sign extended immediate field.
 (define_predicate "x86_64_immediate_operand"
   (match_code "const_int,symbol_ref,label_ref,const")
Index: sse.md
===================================================================
--- sse.md	(revision 190984)
+++ sse.md	(working copy)
@@ -8593,10 +8593,10 @@
    (set_attr "mode" "<MODE>")])
 
 (define_insn "<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>"
-  [(set (match_operand:VF 0 "reg_not_xmm0_operand_maybe_avx" "=x,x")
+  [(set (match_operand:VF 0 "register_operand" "=x,x")
         (unspec:VF
-          [(match_operand:VF 1 "reg_not_xmm0_operand_maybe_avx" "0,x")
-           (match_operand:VF 2 "nonimm_not_xmm0_operand_maybe_avx" "xm,xm")
+          [(match_operand:VF 1 "register_operand" "0,x")
+           (match_operand:VF 2 "nonimmediate_operand" "xm,xm")
            (match_operand:VF 3 "register_operand" "Yz,x")]
           UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
@@ -8691,10 +8691,10 @@
    (set_attr "mode" "TI")])
 
 (define_insn "<sse4_1_avx2>_pblendvb"
-  [(set (match_operand:VI1_AVX2 0 "reg_not_xmm0_operand" "=x,x")
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "=x,x")
         (unspec:VI1_AVX2
-          [(match_operand:VI1_AVX2 1 "reg_not_xmm0_operand_maybe_avx" "0,x")
-           (match_operand:VI1_AVX2 2 "nonimm_not_xmm0_operand_maybe_avx" "xm,xm")
+          [(match_operand:VI1_AVX2 1 "register_operand" "0,x")
+           (match_operand:VI1_AVX2 2 "nonimmediate_operand" "xm,xm")
            (match_operand:VI1_AVX2 3 "register_operand" "Yz,x")]
           UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
@@ -9164,9 +9164,9 @@
 (define_insn_and_split "sse4_2_pcmpestr"
   [(set (match_operand:SI 0 "register_operand" "=c,c")
         (unspec:SI
-          [(match_operand:V16QI 2 "reg_not_xmm0_operand" "x,x")
+          [(match_operand:V16QI 2 "register_operand" "x,x")
            (match_operand:SI 3 "register_operand" "a,a")
-           (match_operand:V16QI 4 "nonimm_not_xmm0_operand" "x,m")
+           (match_operand:V16QI 4 "nonimmediate_operand" "x,m")
            (match_operand:SI 5 "register_operand" "d,d")
            (match_operand:SI 6 "const_0_to_255_operand" "n,n")]
           UNSPEC_PCMPESTR))
@@ -9224,7 +9224,7 @@
 (define_insn_and_split "*sse4_2_pcmpestr_unaligned"
   [(set (match_operand:SI 0 "register_operand" "=c")
         (unspec:SI
-          [(match_operand:V16QI 2 "reg_not_xmm0_operand" "x")
+          [(match_operand:V16QI 2 "register_operand" "x")
            (match_operand:SI 3 "register_operand" "a")
            (unspec:V16QI
              [(match_operand:V16QI 4 "memory_operand" "m")]
@@ -9365,8 +9365,8 @@
 (define_insn_and_split "sse4_2_pcmpistr"
   [(set (match_operand:SI 0 "register_operand" "=c,c")
         (unspec:SI
-          [(match_operand:V16QI 2 "reg_not_xmm0_operand" "x,x")
-           (match_operand:V16QI 3 "nonimm_not_xmm0_operand" "x,m")
Re: [RFA 2/n] Don't lift loads above register using jumps in postreload-gcse.c
On Wed, Sep 5, 2012 at 3:18 PM, Matthew Gretton-Dann matthew.gretton-d...@linaro.org wrote:
> On 5 September 2012 13:45, Richard Earnshaw rearn...@arm.com wrote:
>> On 05/09/12 13:02, Steven Bosscher wrote:
>>> On Wed, Sep 5, 2012 at 1:42 PM, Matthew Gretton-Dann wrote:
>>>> Whilst this fix works for this particular case I am not sure it is the best fix for the general issue, and so if others have a better idea how to fix this I would be very happy.
>>>
>>> postreload-gcse.c is broken in interesting ways. Look at this gem for example:
>>>
>>> static bool
>>> reg_changed_after_insn_p (rtx x, int cuid)
>>> {
>>>   unsigned int regno, end_regno;
>>>
>>>   regno = REGNO (x);
>>>   end_regno = END_HARD_REGNO (x);
>>>   do
>>>     if (reg_avail_info[regno] > cuid)
>>>       return true;
>>>   while (++regno < end_regno);
>>>
>>>   return false;
>>> }
>>>
>>> So the more conservative the fix, the better :-)
>>
>> I suppose removing the pass is too conservative :-)
>>
>>> The patch looks correct to me. But perhaps the pass should just punt on blocks not ending in a simple jump in bb_has_well_behaved_predecessors?
>
> By 'simple jump' you mean any block with at most only EDGE_FALLTHRU on the edge?

No, I mean using the onlyjump_p predicate.

Ciao!
Steven
[PATCH] Fix strspn/strcspn builtin folding (PR middle-end/54486)
Hi!

fold_builtin_str*spn in this case returns a sizetype-typed constant instead of size_t, which makes -Wformat warn.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.7?

2012-09-05  Jakub Jelinek  ja...@redhat.com

    PR middle-end/54486
    * builtins.c (fold_builtin_strspn, fold_builtin_strcspn): Use
    build_int_cst with size_type_node instead of size_int.

    * c-c++-common/pr54486.c: New test.

--- gcc/builtins.c.jj	2012-08-10 12:57:21.0 +0200
+++ gcc/builtins.c	2012-09-05 09:00:50.640530273 +0200
@@ -11890,7 +11890,7 @@ fold_builtin_strspn (location_t loc, tre
       if (p1 && p2)
         {
           const size_t r = strspn (p1, p2);
-          return size_int (r);
+          return build_int_cst (size_type_node, r);
         }
 
       /* If either argument is "", return NULL_TREE.  */
@@ -11935,7 +11935,7 @@ fold_builtin_strcspn (location_t loc, tr
       if (p1 && p2)
         {
           const size_t r = strcspn (p1, p2);
-          return size_int (r);
+          return build_int_cst (size_type_node, r);
         }
 
       /* If the first argument is "", return NULL_TREE.  */
--- gcc/testsuite/c-c++-common/pr54486.c.jj	2012-09-05 09:14:13.017748249 +0200
+++ gcc/testsuite/c-c++-common/pr54486.c	2012-09-05 09:13:35.0 +0200
@@ -0,0 +1,32 @@
+/* PR middle-end/54486 */
+/* { dg-do compile } */
+/* { dg-options "-Wformat" } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+typedef __SIZE_TYPE__ size_t;
+extern int printf (const char *, ...);
+extern size_t strspn (const char *, const char *);
+extern size_t strcspn (const char *, const char *);
+extern size_t strlen (const char *);
+#ifdef __cplusplus
+}
+#endif
+
+void
+foo (void)
+{
+  printf ("%zu\n", strspn ("abc", "abcdefg"));
+  printf ("%zu\n", (size_t) strspn ("abc", "abcdefg"));
+  printf ("%zu\n", __builtin_strspn ("abc", "abcdefg"));
+  printf ("%zu\n", (size_t) __builtin_strspn ("abc", "abcdefg"));
+  printf ("%zu\n", strcspn ("abc", "abcdefg"));
+  printf ("%zu\n", (size_t) strcspn ("abc", "abcdefg"));
+  printf ("%zu\n", __builtin_strcspn ("abc", "abcdefg"));
+  printf ("%zu\n", (size_t) __builtin_strcspn ("abc", "abcdefg"));
+  printf ("%zu\n", strlen ("abc"));
+  printf ("%zu\n", (size_t) strlen ("abc"));
+  printf ("%zu\n", __builtin_strlen ("abc"));
+  printf ("%zu\n", (size_t) __builtin_strlen ("abc"));
+}

	Jakub
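As a user-level illustration of what the test above exercises, here is a minimal sketch in standard C. It assumes only the C library; the helper name `format_span_lengths` is hypothetical and does not appear in the patch. Both `strspn` and `strcspn` return `size_t`, so `%zu` is the matching conversion, and a compiler that folds the calls to a constant of a different type would make `-Wformat` complain about code like this.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the strspn and strcspn results for S
   against SET into BUF, using %zu because both results are size_t.
   Returns whatever snprintf returns.  */
int format_span_lengths(char *buf, size_t buflen,
                        const char *s, const char *set)
{
    size_t span = strspn(s, set);    /* length of leading run made of SET     */
    size_t cspan = strcspn(s, set);  /* length of leading run avoiding SET    */
    return snprintf(buf, buflen, "%zu %zu", span, cspan);
}
```

With the strings from the test case, `"abc"` against `"abcdefg"`, every character is in the set, so the first number is 3 and the second is 0.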
[PATCH] Fix sel-sched ICE with asm goto (PR rtl-optimization/54455)
Hi!

As discussed in the PR, sel-sched doesn't correctly handle tidying of empty blocks if the fallthru predecessor ends with an asm goto that has some labels on the empty block in addition to the fallthru edge. cfgrtl.c can handle that, so this patch just gives up on it on the sel-sched side.

The testcase is new since the patch in the PR, tested with unpatched and patched gcc (fails vs. works).

Bootstrapped/regtested on x86_64-linux and i686-linux (as usual, with rtl checking). Ok for trunk?

2012-09-05  Jakub Jelinek  ja...@redhat.com

    PR rtl-optimization/54455
    * sel-sched-ir.c (maybe_tidy_empty_bb): Give up if previous fallthru
    bb ends up with asm goto referencing bb's label.

    * gcc.dg/54455.c: New test.

--- gcc/sel-sched-ir.c.jj	2012-08-15 10:55:30.0 +0200
+++ gcc/sel-sched-ir.c	2012-09-03 09:56:59.352233243 +0200
@@ -3686,6 +3686,22 @@ maybe_tidy_empty_bb (basic_block bb)
   FOR_EACH_EDGE (e, ei, bb->preds)
     if (e->flags & EDGE_COMPLEX)
       return false;
+    else if (e->flags & EDGE_FALLTHRU)
+      {
+        rtx note;
+        /* If prev bb ends with asm goto, see if any of the
+           ASM_OPERANDS_LABELs don't point to the fallthru
+           label.  Do not attempt to redirect it in that case.  */
+        if (JUMP_P (BB_END (e->src))
+            && (note = extract_asm_operands (PATTERN (BB_END (e->src)))))
+          {
+            int i, n = ASM_OPERANDS_LABEL_LENGTH (note);
+
+            for (i = 0; i < n; ++i)
+              if (XEXP (ASM_OPERANDS_LABEL (note, i), 0) == BB_HEAD (bb))
+                return false;
+          }
+      }
 
   free_data_sets (bb);
 
--- gcc/testsuite/gcc.dg/54455.c.jj	2012-06-15 19:53:34.312404791 +0200
+++ gcc/testsuite/gcc.dg/54455.c	2012-09-05 15:05:02.328728962 +0200
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/54455 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fschedule-insns -fselective-scheduling --param max-sched-extend-regions-iters=8" } */
+
+extern void fn1 (void), fn2 (void);
+
+static inline __attribute__((always_inline)) int
+foo (int *x, long y)
+{
+  asm goto ("" : : "r" (x), "r" (y) : "memory" : lab);
+  return 0;
+lab:
+  return 1;
+}
+
+void
+bar (int *x)
+{
+  if (foo (x, 23))
+    fn1 ();
+  else
+    fn2 ();
+
+  foo (x, 2);
+}

	Jakub
Re: [PATCH] Add counter histogram to fdo summary (issue6465057)
On Wed, Sep 5, 2012 at 8:09 AM, Teresa Johnson tejohn...@google.com wrote:
> Sorry about that. I am right now trying to reproduce the profiledbootstrap problem that H.J. reported, which is on x86_64-unknown-linux-gnu where I had successfully done a profiledbootstrap before my commit. Unfortunately after svn updating my client I am hitting an unrelated build problem with my profiledbootstrap (it can't find the rule to make libc++11convenience.la) which is slowing me down.

I can reproduce it with revision 190982 on Fedora 18/x86-64 with 8-core.

--
H.J.
Re: [PATCH] Add counter histogram to fdo summary (issue6465057)
On Wed, Sep 5, 2012 at 8:44 AM, H.J. Lu hjl.to...@gmail.com wrote:
> On Wed, Sep 5, 2012 at 8:09 AM, Teresa Johnson tejohn...@google.com wrote:
>> Sorry about that. I am right now trying to reproduce the profiledbootstrap problem that H.J. reported, which is on x86_64-unknown-linux-gnu where I had successfully done a profiledbootstrap before my commit. Unfortunately after svn updating my client I am hitting an unrelated build problem with my profiledbootstrap (it can't find the rule to make libc++11convenience.la) which is slowing me down.
>
> I can reproduce it with revision 190982 on Fedora 18/x86-64 with 8-core.

Ok, thanks. I am being blocked by an unrelated error:

libtool: compile: /home/tejohnson/extra/gcc_trunk_4_validate/tmp/./gcc/xgcc -shared-libgcc -B/home/tejohnson/extra/gcc_trunk_4_validate/tmp/./gcc -nostdinc++ -L/home/tejohnson/extra/gcc_trunk_4_validate/tmp/x86_64-unknown-linux-gnu/libstdc++-v3/src -L/home/tejohnson/extra/gcc_trunk_4_validate/tmp/x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/usr/local/x86_64-unknown-linux-gnu/bin/ -B/usr/local/x86_64-unknown-linux-gnu/lib/ -isystem /usr/local/x86_64-unknown-linux-gnu/include -isystem /usr/local/x86_64-unknown-linux-gnu/sys-include -I/home/tejohnson/extra/gcc_trunk_4/libstdc++-v3/../libgcc -I/home/tejohnson/extra/gcc_trunk_4_validate/tmp/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu -I/home/tejohnson/extra/gcc_trunk_4_validate/tmp/x86_64-unknown-linux-gnu/libstdc++-v3/include -I/home/tejohnson/extra/gcc_trunk_4/libstdc++-v3/libsupc++ -std=gnu++11 -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=random.lo -g -O2 -D_GNU_SOURCE -c /home/tejohnson/extra/gcc_trunk_4/libstdc++-v3/src/c++11/random.cc -fPIC -DPIC -o random.o
/tmp/ccOm0d5x.s: Assembler messages:
/tmp/ccOm0d5x.s:33: Error: no such instruction: `rdrand %eax'
make[6]: *** [random.lo] Error 1

Looks like I am being hit by:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54419

I'm going to try backing out all the changes related to this bug to see if I can make progress on the profiledbootstrap.

Teresa

> --
> H.J.

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Patch ARM] implement bswap16
Hi,

This patch implements __builtin_bswap16() on ARM (v6 and above) using revsh with a signed input and rev16 with an unsigned input.

It is pretty much equal to the patch posted some time ago http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00962.html, but it's hard to write such patterns differently ;-)

I have added a testcase.

OK for trunk?

Christophe.

bswap16.patch
Description: Binary data
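Since the patch itself is only attached as binary data, the following is not taken from it. It is a portable C sketch of the two operations the message describes, under the assumption that the unsigned form corresponds to a plain 16-bit byte swap and the signed form to a byte swap whose result is read as a signed 16-bit value. The function names are hypothetical.

```c
#include <stdint.h>

/* Byte-swap an unsigned 16-bit value (the unsigned-input case).  */
uint16_t bswap16_unsigned(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Byte-swap a signed 16-bit value (the signed-input case).  The swap is
   done on the unsigned bit pattern; converting the result back to int16_t
   relies on the usual two's-complement wraparound, which is
   implementation-defined in older C standards but universal in practice.  */
int16_t bswap16_signed(int16_t x)
{
    uint16_t u = (uint16_t)x;
    return (int16_t)(uint16_t)((u << 8) | (u >> 8));
}
```

On compilers that provide it, `__builtin_bswap16` should give the same bit pattern as `bswap16_unsigned`.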
Re: [PATCH] Add counter histogram to fdo summary (issue6465057)
On Wed, Sep 5, 2012 at 8:50 AM, Teresa Johnson tejohn...@google.com wrote:
> Ok, thanks. I am being blocked by an unrelated error:
>
> /tmp/ccOm0d5x.s: Assembler messages:
> /tmp/ccOm0d5x.s:33: Error: no such instruction: `rdrand %eax'
> make[6]: *** [random.lo] Error 1
>
> Looks like I am being hit by:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54419
>
> I'm going to try backing out all the changes related to this bug to see if I can make progress on the profiledbootstrap.

You can install the latest binutils and put it in your PATH.

--
H.J.
Re: [PATCH] Fix strspn/strcspn builtin folding (PR middle-end/54486)
On 2012-09-05 11:40, Jakub Jelinek wrote:
> 2012-09-05  Jakub Jelinek  ja...@redhat.com
>
>     PR middle-end/54486
>     * builtins.c (fold_builtin_strspn, fold_builtin_strcspn): Use
>     build_int_cst with size_type_node instead of size_int.
>
>     * c-c++-common/pr54486.c: New test.

OK.
Patch ping^2
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01100.html - C++ -Wsizeof-pointer-memaccess support (C is already in) Jakub
Re: [PATCH] Add counter histogram to fdo summary (issue6465057)
On Wed, Sep 5, 2012 at 9:13 AM, H.J. Lu hjl.to...@gmail.com wrote:
> On Wed, Sep 5, 2012 at 8:50 AM, Teresa Johnson tejohn...@google.com wrote:
>> Looks like I am being hit by:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54419
>>
>> I'm going to try backing out all the changes related to this bug to see if I can make progress on the profiledbootstrap.
>
> You can install the latest binutils and put it in your PATH.

Ok, I just backed out the libstdc++ patches which fixed it for now. I am trying a couple different profiledbootstraps. One with just --with-build-config=bootstrap-lto and one with --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --enable-languages=c,c++ --enable-gnu-indirect-function --with-fpmath=sse.

Thanks,
Teresa

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Fix argument spelling in uninstantiated vec_t function
Caught by Clang (which also checks uninstantiated templates).

    PR bootstrap/54484
    * vec.h (vec_t::lower_bound): Fix spelling of LESSTHAN argument.

diff --git a/gcc/vec.h b/gcc/vec.h
index 441c9b5..fbf95d2 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1075,7 +1075,7 @@ vec_t<T>::lower_bound (T obj, bool (*lessthan)(T, T)) const
 template<typename T>
 unsigned
 vec_t<T>::lower_bound (const T *ptr,
-                       bool (*lessthan_)(const T *, const T *)) const
+                       bool (*lessthan)(const T *, const T *)) const
 {
   unsigned int len = VEC_length (T, this);
   unsigned int half, middle;
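The body of the function is not shown in the diff, so the following is only a generic sketch of a lower-bound search driven by a `lessthan` callback, written in C over an `int` array rather than GCC's `vec_t`. The names `lower_bound_int` and `int_lessthan` are hypothetical; only `lessthan`, `len`, `half` and `middle` echo identifiers visible in the patch context.

```c
/* Return the first index in the sorted array V[0..LEN) whose element is
   not less than OBJ, according to LESSTHAN.  Returns LEN if every
   element is less than OBJ.  Sketch only; not GCC's implementation.  */
unsigned lower_bound_int(const int *v, unsigned len, int obj,
                         int (*lessthan)(int, int))
{
    unsigned first = 0;

    while (len > 0) {
        unsigned half = len / 2;
        unsigned middle = first + half;

        if (lessthan(v[middle], obj)) {
            /* Everything up to and including MIDDLE is too small.  */
            first = middle + 1;
            len = len - half - 1;
        } else {
            /* MIDDLE is a candidate; keep searching to its left.  */
            len = half;
        }
    }
    return first;
}

/* Example ordering callback: plain integer less-than.  */
int int_lessthan(int a, int b)
{
    return a < b;
}
```

A misspelled parameter name such as the one fixed above would stop a function like this from compiling as soon as the body refers to `lessthan`, which is why a compiler that checks uninstantiated templates reports it early.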
Re: [C++ Patch] PR 18747
Hi,

On 08/31/2012 11:32 PM, Jason Merrill wrote:
> Since you're traveling, I poked at this myself some more. The issue here is that there are too many template headers for the declaration, so we want to figure out what the right number is and give an appropriate message. Tested x86_64-pc-linux-gnu, applying to trunk.

Thanks for looking into this. Now I wonder if we made progress on a couple of long standing PRs where we weren't strict enough by one with the number of 'template <>'. Let me check...

Paolo.
Re: [C++ Patch] PR 18747
On 09/05/2012 06:41 PM, Paolo Carlini wrote:
> Thanks for looking into this. Now I wonder if we made progress on a couple of long standing PRs where we weren't strict enough by one with the number of 'template <>'. Let me check...

Nope, apparently c++/24314 is still there. But maybe it's easier to fix now (I suspect however that the problem has to do with the logic of num_template_headers_for_class, which I think you didn't change)

Paolo.
[Patch, Fortran, committed] PR54462 - fix ICE on invalid
Rather obvious fix. gfc_undo_symbols segfaulted when the COMMON statement aborted before the common symtree was created.

Committed as Rev. 190989.

Hopefully, that's the last fallout of my memory cleanup patch. The hopefully last issue with the current FINAL patch has already been fixed by Mikael. Thanks!

Tobias

Index: gcc/fortran/ChangeLog
===================================================================
--- gcc/fortran/ChangeLog	(Revision 190985)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2012-09-04  Tobias Burnus  bur...@net-b.de
+
+	PR fortran/54462
+	* symbol.c (gfc_undo_symbols): Avoid NULL pointer dereference.
+
 2012-09-04  Janus Weil  ja...@gcc.gnu.org
 
 	PR fortran/54435
Index: gcc/fortran/symbol.c
===================================================================
--- gcc/fortran/symbol.c	(Revision 190985)
+++ gcc/fortran/symbol.c	(Arbeitskopie)
@@ -2919,10 +2919,12 @@ gfc_undo_symbols (void)
               gfc_symtree st, *st0;
               st0 = find_common_symtree (p->ns->common_root,
                                          p->common_block);
-
-              st.name = st0->name;
-              gfc_delete_bbt (&p->ns->common_root, &st, compare_symtree);
-              free (st0);
+              if (st0)
+                {
+                  st.name = st0->name;
+                  gfc_delete_bbt (&p->ns->common_root, &st, compare_symtree);
+                  free (st0);
+                }
             }
 
           if (p->common_block->head == p)
Re: [middle-end] Add machine_mode to address_cost target hook
On Wed, 2012-09-05 at 13:44 +0400, Denis Chertykov wrote:
2012/9/5 Oleg Endo oleg.e...@t-online.de:

Updated ACK table:

[x] target-independent bits
[ ] alpha        [x] arm          [ ] avr          [ ] bfin
[ ] cr16         [x] cris         [x] epiphany     [ ] i386
[ ] ia64         [x] iq2000       [ ] lm32         [ ] m32c
[x] m32r         [x] mcore        [ ] mep          [x] microblaze
[x] mips         [x] mmix         [x] mn10300      [ ] pa
[x] rs6000       [x] rx           [ ] s390         [ ] score
[x] sh           [ ] sparc        [x] spu          [x] stormy16
[x] v850         [ ] vax          [ ] xtensa

avr - ok

Thanks, that was the last piece. I've committed the patch as rev 190990.

Cheers,
Oleg
Re: Change double_int calls to new interface.
On 9/5/12, Richard Guenther rguent...@suse.de wrote: On Tue, 4 Sep 2012, Lawrence Crowl wrote: Modify gcc/*.[hc] double_int call sites to use the new interface. This change entailed adding a few new methods to double_int. Other changes will happen in separate patches. Once all uses of the old interface are gone, they will be removed. The change results in a 0.163% time improvement with a 70% confidence. Tested on x86_64. Index: gcc/ChangeLog - double_int_lshift - (double_int_one, -TREE_INT_CST_LOW (vr1.min), -TYPE_PRECISION (expr_type), -false)); + double_int_one + .llshift (TREE_INT_CST_LOW (vr1.min), + TYPE_PRECISION (expr_type))); Ick - is that what our coding-conventions say? I mean the .llshift on the next line. Our conventions say nothing about that, but method calls seem somewhat analogous to binary operators, and hence this formatting was probably the least objectionable. Otherwise ok. As in you want me to do something else? The tmin.cmp (tmax, uns) 0 kind of things look odd - definitely methods like tmin.gt (tmax, uns) would be nice to have. Or even better, get rid of the 'uns' parameters and provide a struct double_int_with_signedness { double_int val; bool uns; }; struct double_uint : double_int_with_signedness { double_uint (double_int); }; ... and comparison operators which take double_uint/sint. It would, I think, be better to have separate signed and unsigned types. That change was significantly structural, and I don't know where the wide_int work sits in relation to that choice. You didn't remove any of the old interfaces, so I think we are going to bitrot quickly again. I couldn't remove the old interface yet because I haven't updated all the code yet. -- Lawrence Crowl
Re: [C++ Patch] PR 18747
On 09/05/2012 06:52 PM, Paolo Carlini wrote: On 09/05/2012 06:41 PM, Paolo Carlini wrote: Thanks for looking into this. Now I wonder if we made progress on a couple of long standing PRs where we weren't strict enough by one with the number of 'template <>'. Let me check... Nope, apparently c++/24314 is still there. But maybe it's easier to fix now (I suspect however that the problem has to do with the logic of num_template_headers_for_class, which I think you didn't change)

In fact, something seems weird earlier, in cp_parser_check_template_parameters. It has:

  /* If there are the same number of template classes and parameter
     lists, that's OK.  */
  if (parser->num_template_parameter_lists == num_templates)
    return true;
  /* If there are more, but only one more, then we are referring to a
     member template.  That's OK too.  */
  if (parser->num_template_parameter_lists == num_templates + 1)
    return true;

but note that for:

  template <class T> struct A { int select() { return 0; } };

we have parser->num_template_parameter_lists == 1 and num_templates == 0. Thus it seems that the case 'num_templates + 1' isn't (just) about member templates...

Paolo.
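The observation above can be checked with a small stand-alone model of the two tests. This is only a sketch: template_header_count_ok is a made-up name, and the plain int parameters replace the parser state used in the real function.

```c
#include <stdbool.h>

/* Simplified model of the two quoted tests.  The function name and the
   plain-int parameters are illustrative only, not the parser's API.  */
bool
template_header_count_ok (int num_template_parameter_lists,
                          int num_templates)
{
  /* Same number of template classes and parameter lists.  */
  if (num_template_parameter_lists == num_templates)
    return true;
  /* Exactly one more parameter list than template classes.  */
  if (num_template_parameter_lists == num_templates + 1)
    return true;
  return false;
}
```

With the counts reported for the struct A example (one parameter list, zero templates), the call template_header_count_ok (1, 0) is accepted through the second test, even though no member template is involved.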
Re: [middle-end] Add machine_mode to address_cost target hook
I don't feel the m32c change needs my specific ack, it's a harmless change that goes with the ack for the feature itself. However, I will note that m32c does have different costs for addresses in different address spaces, at least when -Os.
Re: [middle-end] Add machine_mode to address_cost target hook
On Wed, 2012-09-05 at 14:39 -0400, DJ Delorie wrote: I don't feel the m32c change needs my specific ack, it's a harmless change that goes with the ack for the feature itself. However, I will note that m32c does have different costs for addresses in different address spaces, at least when -Os. I have created http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54496 for this. Cheers, Oleg
Re: Change double_int calls to new interface.
On Wed, 5 Sep 2012, Lawrence Crowl wrote: On 9/5/12, Richard Guenther rguent...@suse.de wrote: The tmin.cmp (tmax, uns) 0 kind of things look odd - definitely methods like tmin.gt (tmax, uns) would be nice to have. Or even better, get rid of the 'uns' parameters and provide a struct double_int_with_signedness { double_int val; bool uns; }; struct double_uint : double_int_with_signedness { double_uint (double_int); }; ... and comparison operators which take double_uint/sint. It would, I think, be better to have separate signed and unsigned types. That change was significantly structural, and I don't know where the wide_int work sits in relation to that choice. Note that in tree-vrp.c, if I remember correctly, I used both signed and unsigned operations on the same object (emulating arbitrary precision is a pain). -- Marc Glisse
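To make the discussion of the 'uns' parameter concrete, here is a rough sketch of a two-word integer whose comparison takes a signedness flag, so the same object can be compared both ways. The struct layout and the names two_word, two_word_cmp and two_word_gt are assumptions for illustration; this is not the double_int implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-word integer, loosely modelled on the idea behind
   double_int.  The field names are made up.  */
struct two_word
{
  uint64_t low;
  int64_t high;
};

/* Three-way compare: returns -1, 0 or 1.  UNS selects whether the high
   word is interpreted as unsigned or as two's complement signed.  */
int
two_word_cmp (struct two_word a, struct two_word b, bool uns)
{
  if (uns)
    {
      uint64_t ah = (uint64_t) a.high;
      uint64_t bh = (uint64_t) b.high;
      if (ah != bh)
        return ah < bh ? -1 : 1;
    }
  else if (a.high != b.high)
    return a.high < b.high ? -1 : 1;

  /* The low word is always compared as unsigned.  */
  if (a.low != b.low)
    return a.low < b.low ? -1 : 1;
  return 0;
}

/* Convenience wrapper in the spirit of the suggested tmin.gt (tmax, uns).  */
bool
two_word_gt (struct two_word a, struct two_word b, bool uns)
{
  return two_word_cmp (a, b, uns) > 0;
}
```

The all-ones bit pattern compares below 1 when read as signed and above 1 when read as unsigned, which is the situation where a single object needs both kinds of operation.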
Re: [v3] libstdc++/54296
On 09/05/2012 11:58 AM, Paolo Carlini wrote: Hi, On 09/04/2012 10:08 PM, François Dumont wrote: Hi I managed to do the test with Valgrind and so confirm the fix with the attached patch (unmodified since last proposal). Patch is Ok, thanks for your patience and thanks again for all your great work on the unordered containers! Paolo. Attached patch applied. No problem Paolo, this is your job as maintainers to challenge the patches, no big deal. And being now able to run programs through Valgrind or Gdb is definitely more comfortable for me. 2012-09-05 François Dumont fdum...@gcc.gnu.org PR libstdc++/54296 * include/bits/hashtable.h (_M_erase(size_type, __node_base*, __node_type*)): New. (erase(const_iterator)): Use latter. (_M_erase(std::true_type, const key_type)): New, likewise. (_M_erase(std::false_type, const key_type)): New. Find all nodes matching the key before deallocating them so that the key doesn't get invalidated. (erase(const key_type)): Use the new member functions. * testsuite/23_containers/unordered_map/erase/54296.cc: New. * testsuite/23_containers/unordered_multimap/erase/54296.cc: New. Index: include/bits/hashtable.h === --- include/bits/hashtable.h (revision 190990) +++ include/bits/hashtable.h (working copy) @@ -612,6 +612,15 @@ iterator _M_insert(_Arg, std::false_type); + size_type + _M_erase(std::true_type, const key_type); + + size_type + _M_erase(std::false_type, const key_type); + + iterator + _M_erase(size_type __bkt, __node_base* __prev_n, __node_type* __n); + public: // Emplace templatetypename... _Args @@ -636,7 +645,8 @@ { return erase(const_iterator(__it)); } size_type - erase(const key_type); + erase(const key_type __k) + { return _M_erase(__unique_keys(), __k); } iterator erase(const_iterator, const_iterator); @@ -1430,7 +1440,21 @@ // is why we need buckets to contain the before begin to make // this research fast. 
__node_base* __prev_n = _M_get_previous_node(__bkt, __n); - if (__n == _M_bucket_begin(__bkt)) + return _M_erase(__bkt, __prev_n, __n); +} + + templatetypename _Key, typename _Value, + typename _Alloc, typename _ExtractKey, typename _Equal, + typename _H1, typename _H2, typename _Hash, typename _RehashPolicy, + typename _Traits +typename _Hashtable_Key, _Value, _Alloc, _ExtractKey, _Equal, + _H1, _H2, _Hash, _RehashPolicy, + _Traits::iterator +_Hashtable_Key, _Value, _Alloc, _ExtractKey, _Equal, + _H1, _H2, _Hash, _RehashPolicy, _Traits:: +_M_erase(size_type __bkt, __node_base* __prev_n, __node_type* __n) +{ + if (__prev_n == _M_buckets[__bkt]) _M_remove_bucket_begin(__bkt, __n-_M_next(), __n-_M_nxt ? _M_bucket_index(__n-_M_next()) : 0); else if (__n-_M_nxt) @@ -1457,7 +1481,7 @@ _Traits::size_type _Hashtable_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits:: -erase(const key_type __k) +_M_erase(std::true_type, const key_type __k) { __hash_code __code = this-_M_hash_code(__k); std::size_t __bkt = _M_bucket_index(__k, __code); @@ -1466,43 +1490,67 @@ __node_base* __prev_n = _M_find_before_node(__bkt, __k, __code); if (!__prev_n) return 0; + + // We found a matching node, erase it. 
__node_type* __n = static_cast__node_type*(__prev_n-_M_nxt); - bool __is_bucket_begin = _M_buckets[__bkt] == __prev_n; + _M_erase(__bkt, __prev_n, __n); + return 1; +} - // We found a matching node, start deallocation loop from it - std::size_t __next_bkt = __bkt; - __node_type* __next_n = __n; + templatetypename _Key, typename _Value, + typename _Alloc, typename _ExtractKey, typename _Equal, + typename _H1, typename _H2, typename _Hash, typename _RehashPolicy, + typename _Traits +typename _Hashtable_Key, _Value, _Alloc, _ExtractKey, _Equal, + _H1, _H2, _Hash, _RehashPolicy, + _Traits::size_type +_Hashtable_Key, _Value, _Alloc, _ExtractKey, _Equal, + _H1, _H2, _Hash, _RehashPolicy, _Traits:: +_M_erase(std::false_type, const key_type __k) +{ + __hash_code __code = this-_M_hash_code(__k); + std::size_t __bkt = _M_bucket_index(__k, __code); + + // Look for the node before the first matching node. + __node_base* __prev_n = _M_find_before_node(__bkt, __k, __code); + if (!__prev_n) + return 0; + + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 526. Is it undefined if a function in the standard changes + // in parameters? + // We use one loop to find all matching nodes and another to deallocate + // them so that the key stays valid during the first loop. It might be + // invalidated indirectly when destroying nodes. + __node_type* __n =
[PATCH, libgfortran]: Use __builtin_ia32_{stmxcsr,ldmxcsr} intrinsics in config/fpu-i387.h
Hello!

This patch substitutes volatile asms with equivalent intrinsics.

2012-09-05 Uros Bizjak ubiz...@gmail.com

* config/fpu-387.h (set_fpu): Use __builtin_ia32_stmxcsr and __builtin_ia32_ldmxcsr intrinsics.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline and 4.7 branch.

Uros.

Index: config/fpu-387.h === --- config/fpu-387.h(revision 190984) +++ config/fpu-387.h(working copy) @@ -118,7 +118,7 @@ void set_fpu (void) { unsigned int cw_sse; - asm volatile (stmxcsr %0 : =m (cw_sse)); + cw_sse = __builtin_ia32_stmxcsr (); cw_sse = 0x; cw_sse |= (_FPU_MASK_IM | _FPU_MASK_DM | _FPU_MASK_ZM | _FPU_MASK_OM @@ -131,6 +131,6 @@ void set_fpu (void) if (options.fpe GFC_FPE_UNDERFLOW) cw_sse = ~(_FPU_MASK_UM 7); if (options.fpe GFC_FPE_INEXACT) cw_sse = ~(_FPU_MASK_PM 7); - asm volatile (ldmxcsr %0 : : m (cw_sse)); + __builtin_ia32_ldmxcsr (cw_sse); } }
Re: Change double_int calls to new interface.
On 9/5/12, Marc Glisse marc.gli...@inria.fr wrote: On Wed, 5 Sep 2012, Lawrence Crowl wrote: On 9/5/12, Richard Guenther rguent...@suse.de wrote: The tmin.cmp (tmax, uns) 0 kind of things look odd - definitely methods like tmin.gt (tmax, uns) would be nice to have. Or even better, get rid of the 'uns' parameters and provide a struct double_int_with_signedness { double_int val; bool uns; }; struct double_uint : double_int_with_signedness { double_uint (double_int); }; ... and comparison operators which take double_uint/sint. It would, I think, be better to have separate signed and unsigned types. That change was significantly structural, and I don't know where the wide_int work sits in relation to that choice. Note that in tree-vrp.c, if I remember correctly, I used both signed and unsigned operations on the same object (emulating arbitrary precision is a pain). Presumably the wide_int work will address that issue. -- Lawrence Crowl
[patch] Random cleanups
Hello,

Just some cleanups I did while working on something bigger. OK for trunk?

Ciao!
Steven

* graphite.c (print_global_statistics): Use EDGE_COUNT instead of VEC_length.
(print_graphite_scop_statistics): Likewise.
* graphite-scop-detection.c (get_bb_type): Use single_succ_p.
(print_graphite_scop_statistics): Use EDGE_COUNT, not VEC_length.
(canonicalize_loop_closed_ssa): Use single_pred_p.
* alias.c (reg_seen): Make this an sbitmap.
(record_set, init_alias_analysis): Update.

Attachment: random_cleanups.diff
[patch] Fix bitmap_last_set_bit
Hi,

bitmap.c:bitmap_last_set_bit() is not used by any code in the current GCC trunk, but I'm using it and I noticed it returns an incorrect result. This patch rewrites most of the function to return the correct result.

Not sure how to test this other than to say that my code, which uses this function, works with the patch and breaks without it. I've also unleashed bitmap_last_set_bit (and bitmap_first_set_bit) on a large number of randomly generated bitmaps, checked by rather expensive verification code that doesn't accept the results of the pre-patch bitmap_last_set_bit and is happy with my new implementation.

OK for trunk?

Ciao!
Steven

* bitmap.c (bitmap_last_set_bit): Rewrite to return the correct bit.

Attachment: bitmap_last_bit_set.diff
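The verification approach described above (a fast routine checked against a slow but obviously correct one) can be sketched on a plain array of words. This is not GCC's linked-list bitmap from bitmap.c; last_set_bit, last_set_bit_slow and WORD_BITS are names invented for this example.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)

/* Return the index of the highest set bit in WORDS[0..NWORDS-1], where
   bit I lives in word I / WORD_BITS, or -1 if no bit is set.  */
long
last_set_bit (const unsigned long *words, size_t nwords)
{
  size_t w = nwords;

  /* Scan from the most significant word down to the first nonzero one.  */
  while (w-- > 0)
    if (words[w] != 0)
      {
        unsigned long word = words[w];
        long bit = 0;

        /* Count how often the word can be shifted right before it
           becomes zero; that count is the position of its top bit.  */
        while (word >>= 1)
          bit++;
        return (long) (w * WORD_BITS) + bit;
      }
  return -1;
}

/* Slow reference implementation: test every bit from the top down.  */
long
last_set_bit_slow (const unsigned long *words, size_t nwords)
{
  long i;

  for (i = (long) (nwords * WORD_BITS) - 1; i >= 0; i--)
    if ((words[i / WORD_BITS] >> (i % WORD_BITS)) & 1)
      return i;
  return -1;
}
```

Feeding both functions the same randomly generated arrays and comparing the results is one way to gain confidence in the fast version without trusting it in advance.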
[PATCH] Fix PR 54494, removal of volatile store in strlen optimization
Hi,

The problem here is that the strlen optimization tries to remove a null character store because we have already done one, but it does so for a volatile store, which is not a valid thing to do. This patch fixes the problem by ignoring statements which have volatile operands.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* tree-ssa-strlen.c (strlen_optimize_stmt): Don't look at statements which have volatile operands.

testsuite/ChangeLog:
* gcc.dg/tree-ssa/strlen-1.c: New testcase.

Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c (revision 190993)
+++ gcc/tree-ssa-strlen.c (working copy)
@@ -1782,7 +1782,8 @@ strlen_optimize_stmt (gimple_stmt_iterat
 	    break;
 	  }
     }
-  else if (is_gimple_assign (stmt))
+  else if (is_gimple_assign (stmt)
+	   && !gimple_has_volatile_ops (stmt))
     {
       tree lhs = gimple_assign_lhs (stmt);
Index: gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c (revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+extern const unsigned long base;
+static inline void wreg(unsigned char val, unsigned long addr) __attribute__((always_inline));
+static inline void wreg(unsigned char val, unsigned long addr)
+{
+  *((volatile unsigned char *) (__SIZE_TYPE__) (base + addr)) = val;
+}
+void wreg_twice(void)
+{
+  wreg(0, 42);
+  wreg(0, 42);
+}
+
+/* We should not remove the second null character store to (base+42) address. */
+/* { dg-final { scan-tree-dump-times "={v} 0;" 2 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Re: [patch, mips] New mips triplet for multilib linux builds
On Tue, 2012-09-04 at 23:55 +0100, Richard Sandiford wrote: If we do that, then your DRIVER_SELF_SPECS can further have: MIPS_ISA_SYNCI_SPEC where the definition: /* Infer a -msynci setting from a -mips argument, on the assumption that -msynci is desired where possible. */ #define MIPS_ISA_SYNCI_SPEC \ %{msynci|mno-synci:;%{mips32r2|mips64r2:-msynci:-mno-synci}} can go in mips.h. OPTION_DEFAULT_SPECS would then handle synci in just the same way as the other options, without the special SYNCI_SPEC macro. I am having trouble with this part. The newly built compiler is choking on this config spec when building libgcc and I am not sure how to read it. I tried looking in gcc/doc to find a description of the spec syntax but I couldn't find where it was documented. I don't know what the semicolon does and I have never seen a 3 part spec like %{mips32r2|mips64r2:-msynci:-mno-synci} Is this an 'if-then-else' usage? I have only ever seen two part usages like: %{mips32r2|mips64r2:-msynci} Steve Ellcey sell...@mips.com
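On the if-then-else question: the GCC manual's section on spec files documents a conditional of the form %{S:X; T:Y; :D}, meaning "if S was given substitute X, else if T was given substitute Y, else substitute D", and %{S|P:X} for "either S or P". Assuming that form is what was intended here (the punctuation of the macro as quoted above differs slightly, and this sketch does not claim to know which spelling was meant), the decision logic would correspond to the following C function. The function name and boolean parameters are invented for illustration.

```c
#include <stdbool.h>

/* Models the decision a spec such as
     %{msynci|mno-synci:;:%{mips32r2|mips64r2:-msynci;:-mno-synci}}
   would make under the documented %{S:X; :D} semantics.  This is an
   assumption about the intended spec, not a quote from mips.h.  */
const char *
infer_synci (bool have_msynci, bool have_mno_synci,
             bool have_mips32r2, bool have_mips64r2)
{
  /* Outer "%{msynci|mno-synci:;": the user chose explicitly, add nothing.  */
  if (have_msynci || have_mno_synci)
    return "";

  /* Inner conditional: then-branch for the r2 ISAs, else-branch otherwise.  */
  if (have_mips32r2 || have_mips64r2)
    return "-msynci";
  return "-mno-synci";
}
```

Read this way, the three-part construct is indeed an if-then-else, with the empty text after the first colon standing for "substitute nothing".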
Re: [PATCH, libgfortran]: Use __builtin_ia32_{stmxcsr,ldmxcsr} intrinsics in config/fpu-i387.h
On Wed, Sep 5, 2012 at 9:53 PM, Uros Bizjak ubiz...@gmail.com wrote: This patch substitutes volatile asms with equivalent intrinsics. 2012-09-05 Uros Bizjak ubiz...@gmail.com * config/fpu-387.h (set_fpu): Use __builtin_ia32_stmxcsr and __builtin_ia32_ldmxcsr intrinsics. Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline and 4.7 branch.

I forgot that these builtins are enabled for SSE only (and x86_64 bootstrap enables SSE2 by default), so the following addition is needed:

--cut here-- Index: config/fpu-387.h === --- config/fpu-387.h(revision 190992) +++ config/fpu-387.h(working copy) @@ -96,7 +96,11 @@ #define _FPU_MASK_UM 0x10 #define _FPU_MASK_PM 0x20 -void set_fpu (void) +void +#ifndef __SSE__ +__attribute__((__target__(sse))) +#endif +set_fpu (void) { unsigned short cw; --cut here--

Re-tested on x86_64-pc-linux-gnu and committed.

Uros.
Re: [PATCH, libgfortran]: Use __builtin_ia32_{stmxcsr,ldmxcsr} intrinsics in config/fpu-i387.h
On Wed, Sep 5, 2012 at 11:30 PM, Uros Bizjak ubiz...@gmail.com wrote: * config/fpu-387.h (set_fpu): Use __builtin_ia32_stmxcsr and __builtin_ia32_ldmxcsr intrinsics. Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline and 4.7 branch. I forgot that these builtins are enabled for SSE only (and x86_64 bootstrap enables SSE2 by default), so the following addition is needed: --cut here-- Index: config/fpu-387.h === --- config/fpu-387.h(revision 190992) +++ config/fpu-387.h(working copy) @@ -96,7 +96,11 @@ #define _FPU_MASK_UM 0x10 #define _FPU_MASK_PM 0x20 -void set_fpu (void) +void +#ifndef __SSE__ +__attribute__((__target__(sse))) +#endif +set_fpu (void) { unsigned short cw; --cut here-- Re-tested on x86_64-pc-linux-gnu and committed.

... Not really. This option enables cmove, which should not be used on plain x86_32. In the end, let's revert back to assembly, with the following change that was intended from the beginning:

Index: config/fpu-387.h === --- config/fpu-387.h(revision 190992) +++ config/fpu-387.h(working copy) @@ -112,7 +112,7 @@ if (options.fpe GFC_FPE_UNDERFLOW) cw = ~_FPU_MASK_UM; if (options.fpe GFC_FPE_INEXACT) cw = ~_FPU_MASK_PM; - asm volatile (fldcw %0 : : m (cw)); + asm volatile (%vstmxcsr %0 : =m (cw_sse)); if (has_sse()) { @@ -131,6 +131,6 @@ if (options.fpe GFC_FPE_UNDERFLOW) cw_sse = ~(_FPU_MASK_UM 7); if (options.fpe GFC_FPE_INEXACT) cw_sse = ~(_FPU_MASK_PM 7); - __builtin_ia32_ldmxcsr (cw_sse); + asm volatile (%vldmxcsr %0 : : m (cw_sse)); } }

Sorry for the trouble,
Uros.
Re: [PATCH] Fix PR 54494, removal of volatile store in strlen optimization
On Wed, Sep 05, 2012 at 02:10:03PM -0700, Andrew Pinski wrote: The problem here is the strlen optimization tries to remove a null character store as we already have done it but it does it for a volatile store which is not a valid thing to do. This patch fixes the problem by ignoring statements which have volatile operands. That should be caught by the !TREE_SIDE_EFFECTS (lhs) check a few lines later. Isn't the bug instead that remap_gimple_op_r copies over TREE_THIS_VOLATILE flag, but doesn't copy over TREE_SIDE_EFFECTS? * tree-ssa-strlen.c (strlen_optimize_stmt): Don't look at statements which have volatile operands. testsuite/ChangeLog: * gcc.dg/tree-ssa/strlen-1.c: New testcase. Index: gcc/tree-ssa-strlen.c === --- gcc/tree-ssa-strlen.c (revision 190993) +++ gcc/tree-ssa-strlen.c (working copy) @@ -1782,7 +1782,8 @@ strlen_optimize_stmt (gimple_stmt_iterat break; } } - else if (is_gimple_assign (stmt)) + else if (is_gimple_assign (stmt) + !gimple_has_volatile_ops (stmt)) { tree lhs = gimple_assign_lhs (stmt); Index: gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c === --- gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c (revision 0) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ +extern const unsigned long base; +static inline void wreg(unsigned char val, unsigned long addr) __attribute__((always_inline)); +static inline void wreg(unsigned char val, unsigned long addr) +{ + *((volatile unsigned char *) (__SIZE_TYPE__) (base + addr)) = val; +} +void wreg_twice(void) +{ + wreg(0, 42); + wreg(0, 42); +} + +/* We should not remove the second null character store to (base+42) address. */ +/* { dg-final { scan-tree-dump-times ={v} 0; 2 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Jakub
[Patch, Fortran] PR54463 - fix -fexternal-matmul with -fdefault-real-8
Rather obvious fix. Build on x86-64-linux. OK for the trunk when regtesting has succeeded? Tobias 2012-09-06 Tobias Burnus PR fortran/54463 * trans-intrinsic.c (gfc_conv_intrinsic_funcall): Fix matmul call to BLAS if the default-kind has been promoted. 2012-09-06 Tobias Burnus PR fortran/54463 * gfortran.dg/promotion_2.f90: New. diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index add4baa..4b268b3 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -2362,21 +2362,20 @@ gfc_conv_intrinsic_funcall (gfc_se * se, gfc_expr * expr) if (gfc_option.flag_external_blas (sym-ts.type == BT_REAL || sym-ts.type == BT_COMPLEX) - (sym-ts.kind == gfc_default_real_kind - || sym-ts.kind == gfc_default_double_kind)) + (sym-ts.kind == 4 || sym-ts.kind == 8)) { tree gemm_fndecl; if (sym-ts.type == BT_REAL) { - if (sym-ts.kind == gfc_default_real_kind) + if (sym-ts.kind == 4) gemm_fndecl = gfor_fndecl_sgemm; else gemm_fndecl = gfor_fndecl_dgemm; } else { - if (sym-ts.kind == gfc_default_real_kind) + if (sym-ts.kind == 4) gemm_fndecl = gfor_fndecl_cgemm; else gemm_fndecl = gfor_fndecl_zgemm; --- /dev/null 2012-09-03 07:48:56.919718426 +0200 +++ gcc/gcc/testsuite/gfortran.dg/promotion_2.f90 2012-09-05 19:52:17.0 +0200 @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-options -fdefault-real-8 -fexternal-blas -fdump-tree-original } +! +! PR fortran/54463 +! +! Contributed by Simon Reinhardt +! +program test + implicit none + real, dimension(3,3) :: A + A = matmul(A,A) +end program test + +! { dg-final { scan-tree-dump-times sgemm_ 0 original } } +! { dg-final { scan-tree-dump-times dgemm_ 1 original } } +! { dg-final { cleanup-tree-dump original } }
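Why comparing against the default kinds goes wrong can be shown with a small model of the routine selection. The sketch assumes that -fdefault-real-8 makes the default real kind 8 and, where possible, promotes the default double kind to 16; the enum and function names are invented for illustration and do not come from trans-intrinsic.c.

```c
/* Which BLAS gemm variant to call; GEMM_NONE means "do not use BLAS".  */
enum gemm { GEMM_NONE, SGEMM, DGEMM, CGEMM, ZGEMM };
enum base_type { TYPE_REAL, TYPE_COMPLEX };

/* Old selection: compare KIND against the default kinds, which change
   when the defaults are promoted.  */
enum gemm
gemm_by_default_kind (enum base_type type, int kind,
                      int default_real_kind, int default_double_kind)
{
  if (kind != default_real_kind && kind != default_double_kind)
    return GEMM_NONE;
  if (type == TYPE_REAL)
    return kind == default_real_kind ? SGEMM : DGEMM;
  return kind == default_real_kind ? CGEMM : ZGEMM;
}

/* New selection: compare KIND against the literal kinds 4 and 8, which
   match the data the single and double precision routines operate on.  */
enum gemm
gemm_by_fixed_kind (enum base_type type, int kind)
{
  if (kind != 4 && kind != 8)
    return GEMM_NONE;
  if (type == TYPE_REAL)
    return kind == 4 ? SGEMM : DGEMM;
  return kind == 4 ? CGEMM : ZGEMM;
}
```

Under the stated assumption, a promoted REAL array has kind 8, so the old test picks the single precision routine for 8-byte data while the new test picks the double precision one. That is consistent with the testcase expecting zero matches for sgemm_ and one for dgemm_.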
Re: [Patch, Fortran] PR54463 - fix -fexternal-matmul with -fdefault-real-8
On Wed, Sep 05, 2012 at 11:55:54PM +0200, Tobias Burnus wrote: Rather obvious fix. Build on x86-64-linux. OK for the trunk when regtesting has succeeded? Yes. -- Steve
Re: [PATCH][RFC] Add -Og
On Sep 5, 2012, Richard Guenther rguent...@suse.de wrote: Yes, the goal is definitely to avoid the jumping back and forth on source lines you can see when debugging optimized programs.

Hmm... If that's the goal, how about adding to the mix the Statement Frontier Notes proposal I advanced in some GCC Summit? Its goal is precisely to mark user-relevant viewpoints for a debugger to stop so as to get a consistent and progressive view of the computation. Sure enough, there are optimizations that don't make much room for that, and so it would make sense for -Og to disable those, to get excellent results, rather than just -gO0d ones ;-)

Any comments on the implementation details btw?

Not really, sorry.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
[Patch ARM] big-endian support for Neon vext tests
Hello,

Although the recent optimization I have committed to use the Neon vext instruction for suitable builtin_shuffle calls does not support big-endian yet, I have written a patch to the existing testcases such that they now support big-endian mode. I think it's worth improving these tests since writing the right masks for big-endian (such that the program computes the same results as in little-endian) is not always straightforward.

In particular:
* I have added some comments in a few tests where it took me a while to find the right mask.
* In the case of the test which is executed, I had to force the noinline attribute on the helper functions, otherwise the computed results are wrong in big-endian. It is probably an overkill workaround but it works :-) I am going to file a bugzilla for this problem.

I have checked that replacing calls to builtin_shuffle by the expected Neon vext variant produces the expected results in big-endian mode, and I arranged the big-endian masks to get the same results.

Christophe.

Attachment: neon-vext-big-endian-tests.patch
Re: [PATCH] Fix valtrack ICE (PR debug/53923)
On Aug 20, 2012, Jakub Jelinek ja...@redhat.com wrote: On the testcase from this PR on AVR (from libgcc, thus not including it into testsuite/) we ICE, because dead_debug_insert_temp is called several times on the same insn, for multi-register hard register for each regno in it (except the first which doesn't seem to be dead). In the first call dead_debug_insert_temp changes *DF_REF_REAL_LOC to DEBUG_EXPR, and in the next call for the next consecutive hard register we set reg variable to *DF_REF_REAL_LOC and rely on it to be a REG, when it is a DEBUG_EXPR already instead.

This scenario sounds awfully familiar. (looks at personal notes) Yeah, PR53740. Does it look like the same problem, perhaps incompletely fixed there?

I'm just concerned whether the multi-register HW reg should have been handled differently elsewhere, and this patch of yours is treating the symptom rather than the underlying issue. Do you have more recollections on whether the multi-reg refs made sense there? I ask because, when I debugged 53740, what I found out was that it didn't, we had regs marked as requiring dead_debug handling that really didn't.

Now, if you didn't examine that possibility or don't recall the details (sorry it took me so long to get back to you), I'll be glad to have a look myself.

2012-08-20 Jakub Jelinek ja...@redhat.com PR debug/53923 * valtrack.c (dead_debug_insert_temp): Drop non-reg uses from the chain.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [PATCH] Improve debug info if tree DCE removes stores (PR debug/50317, fallout)
Hi, Richi,

Sorry if this comes in late. I'd saved your message for careful analysis and I only got back to it now.

On Aug 1, 2012, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Dec 2, 2011 at 8:28 PM, Jakub Jelinek ja...@redhat.com wrote:

PR debug/50317
* tree-ssa.c (target_for_debug_bind): Also allow is_gimple_reg_type vars that aren't referenced.
* tree-ssa-live.c (remove_unused_locals): Don't clear TREE_ADDRESSABLE of unreferenced local vars.
* cfgexpand.c (expand_debug_expr): For DEBUG_IMPLICIT_PTR allow also TREE_ADDRESSABLE vars that satisfy target_for_debug_bind.

But we do _not_ want to have used non-register (but register type) variables tracked because we do not track aliases?

The reasoning, back when I wrote target_for_debug_bind, was that it didn't make sense to enable VTA for variables that were addressable, because the location of addressable variables isn't subject to change. Since their location is a constant address, we're better off avoiding all the VTA location wrangling and just using their unchanging MEM location.

-  if (!is_gimple_reg (var))
-    {
-      if (is_gimple_reg_type (TREE_TYPE (var))
-	  && referenced_var_lookup (cfun, DECL_UID (var)) == NULL_TREE)
-	return var;
-      return NULL_TREE;
-    }
+  /* var-tracking only tracks registers.  */
+  if (!is_gimple_reg_type (TREE_TYPE (var)))
+    return NULL_TREE;

This change, although not incorrect, would subject lots of variables to unnecessary VTA treatment, growing memory use and compile time.

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [PATCH] Fix PR 54494, removal of volatile store in strlen optimization
On Wed, Sep 5, 2012 at 2:36 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Sep 05, 2012 at 02:10:03PM -0700, Andrew Pinski wrote: The problem here is the strlen optimization tries to remove a null character store as we already have done it but it does it for a volatile store which is not a valid thing to do. This patch fixes the problem by ignoring statements which have volatile operands. That should be caught by the !TREE_SIDE_EFFECTS (lhs) check a few lines later. Isn't the bug instead that remap_gimple_op_r copies over TREE_THIS_VOLATILE flag, but doesn't copy over TREE_SIDE_EFFECTS? Yes it should be doing the copy. In fact I have touched this area before. I will submit a new patch. Thanks, Andrew Pinski * tree-ssa-strlen.c (strlen_optimize_stmt): Don't look at statements which have volatile operands. testsuite/ChangeLog: * gcc.dg/tree-ssa/strlen-1.c: New testcase. Index: gcc/tree-ssa-strlen.c === --- gcc/tree-ssa-strlen.c (revision 190993) +++ gcc/tree-ssa-strlen.c (working copy) @@ -1782,7 +1782,8 @@ strlen_optimize_stmt (gimple_stmt_iterat break; } } - else if (is_gimple_assign (stmt)) + else if (is_gimple_assign (stmt) + !gimple_has_volatile_ops (stmt)) { tree lhs = gimple_assign_lhs (stmt); Index: gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c === --- gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/strlen-1.c (revision 0) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-optimized } */ +extern const unsigned long base; +static inline void wreg(unsigned char val, unsigned long addr) __attribute__((always_inline)); +static inline void wreg(unsigned char val, unsigned long addr) +{ + *((volatile unsigned char *) (__SIZE_TYPE__) (base + addr)) = val; +} +void wreg_twice(void) +{ + wreg(0, 42); + wreg(0, 42); +} + +/* We should not remove the second null character store to (base+42) address. 
*/ +/* { dg-final { scan-tree-dump-times ={v} 0; 2 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Jakub
Re: [Patch, PR 54128] ira.c change to fix mips bootstrap
On Wed, 2012-09-05 at 08:15 +0200, Jakub Jelinek wrote: The debug insns generally shouldn't extend the lifetime of pseudos (see the valtrack.c stuff), so if you hit this, there is probably some earlier bug that didn't reset/adjust the debug insns in question. I'm not saying the ira.c patch is absolutely a bad idea, but it would be good if you could investigate where those debug insns started extending lifetime of pseudos.

I am not sure I know how to do that. I am also not sure whether the problem is with extending the life of a pseudo register or with recognizing that a hard register is dead. $2, the register that doesn't get reused when generating debug code, is the register used to return values. In this case I am returning a 64 bit integer value (step_c) that is split across two registers ($2 and $3). In the ira dump file I don't see any debug instructions referring to $3, but I do have one for $2. The debug_insn for $2 first shows up in the cse1 phase and there is no debug_insn for $3, perhaps because we only use the lower half of the return value.

(debug_insn 73 25 72 5 (var_location:SI D#1 (reg:SI 2 $2)) -1 (nil))
(insn 72 73 27 5 (set (reg:SI 224 [ step_c+4 ]) (reg:SI 3 $3 [orig:2+4 ] [2])) x.i:58 282 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 $3 [orig:2+4 ] [2]) (nil)))
(debug_insn 27 72 28 5 (var_location:DI step_c (concatn/v:DI [ (debug_expr:SI D#1) (reg:SI 224 [ step_c+4 ]) ])) x.i:58 -1 (nil))

It seems odd to have a concatn where one element is a debug_expr and the other is a register. But I don't know if this is a problem or a normal way of handling functions that return a value in two registers.

Steve Ellcey s...@cup.hp.com
[PATCH] Merging Cilk Plus to GCC (2 of approximately 22)
Hello Everyone, Attached, please find a patch that will add regression test cases for elemental function implementation in C. Here are the Changelog entries: === gcc/testsuite/ChangeLog 2012-09-05 Balaji V. Iyer balaji.v.i...@intel.com * gcc.dg/cilk-plus/elem_fn_tests/32bit/test1.c: New test. * gcc.dg/cilk-plus/elem_fn_tests/32bit/test10.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/32bit/test11.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/32bit/test12.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/32bit/test7.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/32bit/test8.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/32bit/test9.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/ctrl_flow.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/ctrl_flow2.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/switch_stmt.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test1.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test13.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test14.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test15.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test16.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test17.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test18.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test2.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test3.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test4.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test5.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/64bit/test6.c: Likewise. * gcc.dg/cilk-plus/elem_fn_tests/elem_fn.exp: New script. == Thanking You, Yours Sincerely, Balaji V. Iyer. 
diff --git gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test1.c gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test1.c new file mode 100644 index 000..a33ea3b --- /dev/null +++ gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test1.c @@ -0,0 +1,30 @@ +/* { dg-final { scan-assembler simdsimd } } */ + +/* This test will insert the clone for the function ef_add inside the function + * main (the non-masked version). + */ + +#include stdlib.h +#define My_Type float +__attribute__ ((vector(vectorlength(4), processor (pentium_4), uniform (x,y My_Type ef_add (My_Type x, My_Type y); + +My_Type vhx2[10]; +int +main (int argc, char **argv) +{ + My_Type vhx[10]; + int ii = 9; + + if (argc == 1) +for (ii = 0; ii 10; ii++) + vhx[ii] = argc; + + for (ii = 0; ii 10; ii++) +vhx2[ii] = ef_add(vhx[ii], vhx[ii]); + + for (ii = 0; ii 10; ii++) +if (vhx2[ii] != (argc + argc)) + abort (); + return 0; +} + diff --git gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test10.c gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test10.c new file mode 100644 index 000..477369e --- /dev/null +++ gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test10.c @@ -0,0 +1,12 @@ +/* { dg-final { scan-assembler simdsimd } } */ + +/* This test will create 2 clones of the function below, + * for the pentium4 with sse3 processor. + */ +#define My_Type float +__attribute__ ((vector(vectorlength(4), processor (pentium_4_sse3), linear(y), uniform (x +My_Type ef_add (My_Type x, My_Type y) + +{ + return x + y; +} diff --git gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test11.c gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test11.c new file mode 100644 index 000..197064b --- /dev/null +++ gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test11.c @@ -0,0 +1,12 @@ +/* { dg-final { scan-assembler simdsimd } } */ + +/* This test will create 1 clones of the function below, just mask + * for the pentium4 processor. 
+ */ +#define My_Type float +__attribute__ ((vector(vectorlength(4), mask, processor (pentium_4_sse3), linear(y), uniform (x +My_Type ef_add (My_Type x, My_Type y) + +{ + return x + y; +} diff --git gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test12.c gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test12.c new file mode 100644 index 000..1c78356 --- /dev/null +++ gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test12.c @@ -0,0 +1,12 @@ +/* { dg-final { scan-assembler simdsimd } } */ + +/* This test will create 1 clones of the function below, just no mask + * for the pentium4 with sse3 processor. + */ +#define My_Type float +__attribute__ ((vector(vectorlength(4), nomask, processor (pentium_4_sse3), linear(y), uniform (x +My_Type ef_add (My_Type x, My_Type y) + +{ + return x + y; +} diff --git gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test7.c gcc/testsuite/gcc.dg/cilk-plus/elem_fn_tests/32bit/test7.c new file mode 100644 index 000..6720a8c --- /dev/null
Re: [C++ Patch] PR 18747
On 09/05/2012 02:17 PM, Paolo Carlini wrote: In fact, something seems weird earlier, in cp_parser_check_template_parameters. It has: /* If there are the same number of template classes and parameter lists, that's OK. */ if (parser->num_template_parameter_lists == num_templates) return true; /* If there are more, but only one more, then we are referring to a member template. That's OK too. */ if (parser->num_template_parameter_lists == num_templates + 1) return true; Right. but note that for: template <class T> struct A { int select() { return 0; } }; we have parser->num_template_parameter_lists == 1 and num_templates == 0. Thus it seems that the case 'num_templates + 1' isn't (just) about member templates... That's odd, num_templates should be 1. And I notice that cp_parser_check_declarator_template_parameters has another copy of the num_template_headers_for_class logic; they should be merged. I think the problem with 24314 is that we try to decide how many template headers we want before we determine what declaration we're looking at. When we have a redefinition or specialization, we know exactly how many headers we want, and we should check accordingly rather than say N or N+1. Jason
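The difference between the "N or N+1" test quoted above and the exact check Jason suggests can be sketched in a few lines of C. The function names and the `expected_headers` parameter are hypothetical and only illustrate the idea; this is not the parser's actual code.

```c
#include <stdbool.h>

/* The lenient test as quoted in the mail: accept either the same number
   of template parameter lists as templates, or exactly one more.  */
static bool
toy_check_lenient (unsigned num_parameter_lists, unsigned num_templates)
{
  return (num_parameter_lists == num_templates
	  || num_parameter_lists == num_templates + 1);
}

/* The stricter idea: once the kind of declaration is known (for example
   a redefinition or a specialization), the caller knows exactly how many
   template headers are wanted and can compare against that number.
   EXPECTED_HEADERS is an assumed input computed elsewhere.  */
static bool
toy_check_exact (unsigned num_parameter_lists, unsigned expected_headers)
{
  return num_parameter_lists == expected_headers;
}
```

In the case Paolo describes (one parameter list, num_templates == 0), the lenient test passes only through the "+ 1" branch, whereas an exact check would need the correct expected count to be known first.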
Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
On Wed, 5 Sep 2012, Iyer, Balaji V wrote: Attached, please find the 1st of ~22 patches that implements Cilk Plus. This patch will implement Elemental Functions into the C compiler. Please check it in to the trunk if it looks OK. Below, I will give you a small example about what elemental function is and how it can be useful. Details about elemental function can be found in the following link (http://software.intel.com/en-us/articles/elemental-functions-writing-data-parallel-code-in-cc-using-intel-cilk-plus) That page says To continue reading the article, click on the link below. but I don't see such a link below. * c-cpp-elem-function.c (create_processor_attribute): Likewise. I don't see a ChangeLog entry for the addition of this file at all. When a new file is added, New file. is enough entry; you don't describe particular things within the file. This file includes tm.h and tm_p.h. Inclusion of these headers from front-end code is deprecated. If they are really needed, please put comments on the includes about exactly what target macros are being used in this front-end code. Similarly, use of hard-reg-set.h in front-end code is doubtful. Generally, please check all #includes in all new source files and make sure that each include is actually needed because some functionality from the relevant header is used in the source file; do not just copy the headers included by some existing source file. create_processor_attribute contains hardcoded references to x86-specific functionality. This is not OK; all such target dependencies need to be kept within the back ends, and handled from the rest of the compiler via target hooks (in most cases, new target dependencies must use target hooks not target macros). Please make sure every new function has a comment explicitly describing the semantics of every parameter and the return value as well as anything else the function does. 
Where there are alternative versions of functions/macros with/without explicit locations, please use the forms with explicit locations (e.g. build2_loc instead of build2), and try to link the locations to particular source code tokens and pass those locations down explicitly to each function as needed. There may be more issues; I'll await a revised patch before doing further review. -- Joseph S. Myers jos...@codesourcery.com
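The target-hook point in this review can be illustrated with a small sketch. Everything here is hypothetical: the hook name, the processor string "toy_cpu_a" and the returned vector length are invented for illustration and do not correspond to any real GCC target hook. The idea shown is only that front-end code asks a table of function pointers, and each back end supplies its own answer.

```c
#include <string.h>

/* Hypothetical hook table; real GCC hooks live in the targetm structure.  */
struct toy_target_hooks
{
  /* Return the vector length (in elements) to use for PROCESSOR,
     or 0 if the target has no opinion.  */
  int (*elem_fn_vectorlength) (const char *processor);
};

/* Default used by targets that do not override the hook.  */
static int
toy_default_vectorlength (const char *processor)
{
  (void) processor;
  return 0;
}

/* What one back end might provide; the mapping is made up.  */
static int
toy_backend_vectorlength (const char *processor)
{
  return strcmp (processor, "toy_cpu_a") == 0 ? 4 : 0;
}

/* Front-end code contains no target-specific names; it only calls
   through the hook table it is handed.  */
static int
toy_front_end_vectorlength (const struct toy_target_hooks *hooks,
			    const char *processor)
{
  return hooks->elem_fn_vectorlength (processor);
}
```

Adding support for another target then means providing another hook implementation in that back end, with no change to the front-end function.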
Re: [PATCH] Merging Cilk Plus to GCC (2 of approximately 22)
On Wed, 5 Sep 2012, Iyer, Balaji V wrote: Hello Everyone, Attached, please find a patch that will add regression test cases for elemental function implementation in C. Can the tests be arranged so that all functionality that isn't intrinsically target-specific is tested for all targets - or at least, so it's easy for testsuite support for extra targets to be added without needing to duplicate lots of tests? You do: # For 64 bit architectures, we can run both 32 bit and 64 bit tests. if { [istarget x86_64-*-*] } then { This is the wrong approach. i?86-*-* targets can also support 64-bit while defaulting to 32-bit. Instead, leave it up to the user to specify the multilib options with which to test - if they wish to run both 32-bit and 64-bit tests, they can run the whole testsuite in both modes. Again, there may be more issues; in particular I'll need to review whether the entirety of the actual implementation is sufficiently covered by testcases once a revised implementation patch is available. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Merging Cilk Plus into Trunk (Patch 1 of approximately 22)
On Wed, Sep 5, 2012 at 5:09 PM, Iyer, Balaji V balaji.v.i...@intel.com wrote: Hello Everyone, Attached, please find the 1st of ~22 patches that implements Cilk Plus. This patch will implement Elemental Functions into the C compiler. Please check it in to the trunk if it looks OK. Below, I will give you a small example about what elemental function is and how it can be useful. Details about elemental function can be found in the following link (http://software.intel.com/en-us/articles/elemental-functions-writing-data-parallel-code-in-cc-using-intel-cilk-plus) Let's say we have two for loops like this: int my_func (int x, int y); For (ii = 0; ii 1; ii++) X[ii] = my_func (Y[ii], Z[ii]); For (jj = 1; jj 1; jj++) { A[jj] = my_func (B[ii], A[jj-1]) + A[jj-1]; Assume that my_func's body is not visible during this compilation (e.g it is in some library). If a vectorized version for my_func is available, then the first for loop can be vectorized. However, even if such a version of my_func is available, the 2nd for loop cannot be vectorized. It would be beneficial if there is a vectorized version and a scalar version of my_func. This is where an elemental function comes to play. If we annotate *both* the function declaration and the function with the following attribute, the compiler will create a vector and scalar version of the function. __attribute__((vector)) my_func (int x, int y); __attribute__((vector)) my_func (int x, int y) { ... /* Body of the function. */ } 1. You should consider a different name for the attribute. 2. Considering this example, won't you get the same behaviour if my_func was declared with pure attribute? If not, why? -- Gaby
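The two loops in the quoted example can be sketched in plain C to show why only the first one can use a vector version of my_func. This sketch makes several assumptions: the body of my_func, the loop bound TOY_N and the vector length TOY_VLEN are invented, the "vector clone" is written by hand rather than generated from an attribute, and the second loop indexes B with jj.

```c
#include <stddef.h>

enum { TOY_N = 8, TOY_VLEN = 4 };	/* TOY_N is a multiple of TOY_VLEN.  */

/* Scalar version; the body is hypothetical.  */
static int
my_func (int x, int y)
{
  return x + 2 * y;
}

/* Hand-written stand-in for a vector clone: handles TOY_VLEN
   independent elements per call.  */
static void
my_func_v4 (const int *x, const int *y, int *out)
{
  for (size_t k = 0; k < TOY_VLEN; k++)
    out[k] = my_func (x[k], y[k]);
}

/* First loop, scalar form.  */
static void
loop1_scalar (int *X, const int *Y, const int *Z)
{
  for (size_t ii = 0; ii < TOY_N; ii++)
    X[ii] = my_func (Y[ii], Z[ii]);
}

/* First loop, using the clone: iterations are independent, so they can
   be grouped into chunks of TOY_VLEN.  */
static void
loop1_vector (int *X, const int *Y, const int *Z)
{
  for (size_t ii = 0; ii < TOY_N; ii += TOY_VLEN)
    my_func_v4 (&Y[ii], &Z[ii], &X[ii]);
}

/* Second loop: A[jj] needs A[jj-1] from the previous iteration, so the
   calls cannot be grouped and the scalar version is still required.  */
static void
loop2_scalar (int *A, const int *B)
{
  for (size_t jj = 1; jj < TOY_N; jj++)
    A[jj] = my_func (B[jj], A[jj - 1]) + A[jj - 1];
}
```

Both forms of the first loop produce the same array, while the second loop's dependence on A[jj-1] is what forces a scalar version to remain available alongside the vector one.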
[google] remove versioned symbols from libstdc++.a
This is a Google-local fix to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54482. When configured with --with-pic, libstdc++.a includes versioned symbols, preventing it from being linked into shared libraries. The ultimate cause of this is a misuse of -DPIC as a proxy for, I'm being compiled into a shared library. Unfortunately, there is no obvious alternative without first patching libtool. This patch provides a temporary workaround for the google/* branches. It creates new libtool options, -Xcompiler-static and -Xcompiler-shared, which pass flags only when compiling static or shared libraries, respectively. I then use the new machinery to pass -UPIC to the static library compilations. This has the effect of tricking libstdc++ into behaving properly. Ideally, a new macro should be used, since there are legitimate cases when PIC could be useful (e.g. in selecting between alternate assembly implementations). However, the current approach is less likely to break under future merge activity, since any new compatibility changes should just work. Long term, the correct solution is to: (a) convert this to a suitable libtool patch and push that upstream, (b) update GCC's libtool version, and (c) rework the libstdc++ source files to key off a more appropriate macro (e.g. SHARED_LIB). That's going to take some time, though, especially since upgrading libtool is a major (and rare) event. Okay for google/integration and google/gcc-4_7? Thanks, Ollie 2012-09-05 Ollie Wild a...@google.com * ltmain.sh (func_mode_compile): Add -Xcompiler-shared and -Xcompiler-static options. (func_mode_help): Document new options. * libstdc++/src/Makefile.am (LTCXXCOMPILE): Pass -UPIC when compiling static libraries. * libstdc++/src/Makefile.in: Regenerate. commit 7208cb10bcf3f1bfab77aa6756fc0b2672bd39fa Author: Ollie Wild a...@google.com Date: Tue Sep 4 14:35:19 2012 -0500 Add new libtool options -Xcompiler-shared and -Xcompiler-static. 
Use this to remove versioned symbols from libstdc++.a when configured with --with-pic. Google ref b/704 2012-09-05 Ollie Wild a...@google.com * ltmain.sh (func_mode_compile): Add -Xcompiler-shared and -Xcompiler-static options. (func_mode_help): Document new options. * libstdc++/src/Makefile.am (LTCXXCOMPILE): Pass -UPIC when compiling static libraries. * libstdc++/src/Makefile.in: Regenerate. diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am index a1eb04d..d166155 100644 --- a/libstdc++-v3/src/Makefile.am +++ b/libstdc++-v3/src/Makefile.am @@ -147,7 +147,8 @@ LTCXXCOMPILE = \ $(LIBTOOL) --tag CXX \ $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \ --mode=compile $(CXX) $(INCLUDES) \ - $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) + $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) \ + -Xcompiler-static -UPIC LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS)) diff --git a/libstdc++-v3/src/Makefile.in b/libstdc++-v3/src/Makefile.in index b10d853..e0578a2 100644 --- a/libstdc++-v3/src/Makefile.in +++ b/libstdc++-v3/src/Makefile.in @@ -406,7 +406,8 @@ LTCXXCOMPILE = \ $(LIBTOOL) --tag CXX \ $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \ --mode=compile $(CXX) $(INCLUDES) \ - $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) + $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS) \ + -Xcompiler-static -UPIC LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS)) diff --git a/ltmain.sh b/ltmain.sh index 6428631..3aac68f 100644 --- a/ltmain.sh +++ b/ltmain.sh @@ -1280,6 +1280,8 @@ func_mode_compile () $opt_debug # Get the compilation command and the source file. 
base_compile= +shared_compile= +static_compile= srcfile=$nonopt # always keep a non-empty value in srcfile suppress_opt=yes suppress_output= @@ -1303,6 +1305,20 @@ func_mode_compile () continue ;; + xcompiler-shared ) +arg_mode=normal + func_quote_for_eval $arg + shared_compile=$shared_compile $func_quote_for_eval_result + continue + ;; + + xcompiler-static ) +arg_mode=normal + func_quote_for_eval $arg + static_compile=$static_compile $func_quote_for_eval_result + continue + ;; + normal ) # Accept any command-line options. case $arg in @@ -1333,6 +1349,18 @@ func_mode_compile () continue # The current srcfile will either be retained or ;;# replaced later. I would guess that would be a bug. + -Xcompiler-shared) + arg_mode=xcompiler-shared # the next one goes into the +# shared_compile arg list + continue + ;; + + -Xcompiler-static) + arg_mode=xcompiler-static # the next
Re: [PATCH 1/2] gcc symbol database
Is there any progress on my patch?
[google/gcc-4_7] Backport patch for comdat types problem
This patch is for the google/gcc-4_7 branch. Backport the upstream patch to fix a problem where the type signature does not include the type's context. Tested with bootstrap and regression tests. 2012-07-19 Jason Merrill ja...@redhat.com PR debug/53235 * dwarf2out.c (get_die_parent): New. (generate_type_signature): Use it. Index: gcc/testsuite/g++.dg/debug/dwarf2/nested-4.C === --- gcc/testsuite/g++.dg/debug/dwarf2/nested-4.C(revision 0) +++ gcc/testsuite/g++.dg/debug/dwarf2/nested-4.C(revision 0) @@ -0,0 +1,14 @@ +// PR debug/53235 +// { dg-options -gdwarf-4 } +// { dg-final { scan-assembler-times debug_types 2 } } + +namespace E { + class O {}; + void f (O o) {} +} +namespace F { + class O {}; + void f (O fo) {} +} +E::O eo; +int main () {} Index: gcc/dwarf2out.c === --- gcc/dwarf2out.c (revision 190940) +++ gcc/dwarf2out.c (working copy) @@ -5066,6 +5066,23 @@ get_AT (dw_die_ref die, enum dwarf_attri return NULL; } +/* Returns the parent of the declaration of DIE. */ + +static dw_die_ref +get_die_parent (dw_die_ref die) +{ + dw_die_ref t; + + if (!die) +return NULL; + + if ((t = get_AT_ref (die, DW_AT_abstract_origin)) + || (t = get_AT_ref (die, DW_AT_specification))) +die = t; + + return die->die_parent; +} + /* Return the low pc attribute value, typically associated with a subprogram DIE. Return null if the low pc attribute is either not present, or if it cannot be represented as an assembler label identifier. */ @@ -6698,9 +6715,11 @@ generate_type_signature (dw_die_ref die, unsigned char checksum[16]; struct md5_ctx ctx; dw_die_ref decl; + dw_die_ref parent; name = get_AT_string (die, DW_AT_name); decl = get_AT_ref (die, DW_AT_specification); + parent = get_die_parent (die); /* First, compute a signature for just the type name (and its surrounding context, if any. This is stored in the type unit DIE for link-time @@ -6711,8 +6730,8 @@ generate_type_signature (dw_die_ref die, md5_init_ctx (&ctx); /* Checksum the names of surrounding namespaces and structures.
*/ - if (decl != NULL && decl->die_parent != NULL) -checksum_die_context (decl->die_parent, &ctx); + if (parent != NULL) +checksum_die_context (parent, &ctx); md5_process_bytes (&die->die_tag, sizeof (die->die_tag), &ctx); md5_process_bytes (name, strlen (name) + 1, &ctx); @@ -6728,8 +6747,8 @@ generate_type_signature (dw_die_ref die, die->die_mark = mark; /* Checksum the names of surrounding namespaces and structures. */ - if (decl != NULL && decl->die_parent != NULL) -checksum_die_context (decl->die_parent, &ctx); + if (parent != NULL) +checksum_die_context (parent, &ctx); /* Checksum the DIE and its children. */ die_checksum_ordered (die, &ctx, &mark);
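As a rough illustration of what the backported fix achieves, the sketch below hashes a type's enclosing context names together with its own name, so that two types called O in different namespaces (as in the nested-4.C test) get different signatures. It is a toy under stated assumptions: `struct toy_die` is an invented stand-in for a DIE, the hash is FNV-1a rather than the MD5 used by dwarf2out.c, and only the specification link is modeled (the patch also follows DW_AT_abstract_origin).

```c
#include <stddef.h>
#include <stdint.h>

/* Invented stand-in for a debug information entry.  */
struct toy_die
{
  const char *name;
  struct toy_die *parent;		/* Enclosing namespace or struct.  */
  struct toy_die *specification;	/* Declaration this DIE completes.  */
};

/* Toy analogue of get_die_parent: use the parent of the declaration
   when there is one, otherwise the DIE's own parent.  */
static struct toy_die *
toy_get_die_parent (struct toy_die *die)
{
  if (!die)
    return NULL;
  if (die->specification)
    die = die->specification;
  return die->parent;
}

/* FNV-1a over a NUL-terminated string, terminator included.  */
static uint64_t
toy_hash_str (uint64_t h, const char *s)
{
  do
    {
      h ^= (unsigned char) *s;
      h *= 1099511628211ULL;
    }
  while (*s++);
  return h;
}

/* Hash the names of all enclosing contexts, outermost first.  */
static uint64_t
toy_hash_context (uint64_t h, const struct toy_die *ctx)
{
  if (!ctx)
    return h;
  h = toy_hash_context (h, ctx->parent);
  return toy_hash_str (h, ctx->name);
}

/* Signature = hash of the context names followed by the type name.  */
static uint64_t
toy_type_signature (struct toy_die *die)
{
  uint64_t h = 14695981039346656037ULL;
  h = toy_hash_context (h, toy_get_die_parent (die));
  return toy_hash_str (h, die->name);
}
```

If the parent were ignored, E::O and F::O would hash only the name "O" and collide, which is the problem described in the patch summary.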
[wwwdocs] Fix a href vs a name confusion in codingrationale.html
That was an interesting one to find and understand: this was not supposed to be a link, but the setting of an anchor. Fix <a href> vs <a name> confusion in codingrationale.html; applied. Gerald Index: codingrationale.html === RCS file: /cvs/gcc/wwwdocs/htdocs/codingrationale.html,v retrieving revision 1.1 diff -u -3 -p -r1.1 codingrationale.html --- codingrationale.html16 Jul 2012 19:51:44 - 1.1 +++ codingrationale.html6 Sep 2012 02:52:54 - @@ -348,7 +348,7 @@ but the clarity in layout persists. <h3>Formatting Conventions</h3> -<h4><a href="names">Names</a></h4> +<h4><a name="names">Names</a></h4> <p> Naming data members with a trailing underscore
[patch, score] Remove TARGET_LEGITIMIZE_ADDRESS define
Remove the inconsistent code and macro definition in the score backend; use the GCC default code instead. --liqin ChangeLog: 2012-09-06 Chen Liqin liqin@gmail.com * config/score/score.c : Remove TARGET_LEGITIMIZE_ADDRESS define and score_legitimize_address function, use compiler default code instead. Index: gcc/config/score/score.c === --- gcc/config/score/score.c(revision 191002) +++ gcc/config/score/score.c(working copy) @@ -120,9 +120,6 @@ #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE score_option_override -#undef TARGET_LEGITIMIZE_ADDRESS -#define TARGET_LEGITIMIZE_ADDRESS score_legitimize_address - #undef TARGET_SCHED_ISSUE_RATE #define TARGET_SCHED_ISSUE_RATE score_issue_rate @@ -541,30 +538,6 @@ return gen_rtx_LO_SUM (Pmode, high, addr); } -/* This function is used to implement LEGITIMIZE_ADDRESS. If X can - be legitimized in a way that the generic machinery might not expect, - return the new address. */ -static rtx -score_legitimize_address (rtx x) -{ - enum score_symbol_type symbol_type; - - if (score_symbolic_constant_p (x, &symbol_type) - && symbol_type == SYMBOL_GENERAL) -return score_split_symbol (0, x); - - if (GET_CODE (x) == PLUS - && GET_CODE (XEXP (x, 1)) == CONST_INT) -{ - rtx reg = XEXP (x, 0); - if (!score_valid_base_register_p (reg, 0)) -reg = copy_to_mode_reg (Pmode, reg); - return score_add_offset (reg, INTVAL (XEXP (x, 1))); -} - - return x; -} - /* Fill INFO with information about a single argument. CUM is the cumulative state for earlier arguments. MODE is the mode of this argument and TYPE is its type (if known). NAMED is true if this
Re: [google/gcc-4_7] Backport patch for comdat types problem
On Wed, Sep 5, 2012 at 7:46 PM, Cary Coutant ccout...@google.com wrote: This patch is for the google/gcc-4_7 branch. Approved for google/gcc-4_7 branch. Thanks, -- Paul Pluzhnikov
Re: [google/gcc-4_7] Backport patch for comdat types problem
This patch is for the google/gcc-4_7 branch. Approved for google/gcc-4_7 branch. Thanks, committed at r191005. -cary
Re: [wwwdocs] PATCH for Re: Commit: XStormy16: Add support for -fstack-usage
On Mon, 3 Sep 2012, nick clifton wrote: I like the full xstormy16 as well. I think that the fact that the gcc backend sources are in a directory called stormy16 is just a historical curiosity... Now that we are using svn, you could do a mv and change this. :) And the following patch updated the web page per your preference. Gerald Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.27 diff -u -3 -p -r1.27 changes.html --- changes.html4 Sep 2012 16:20:29 - 1.27 +++ changes.html6 Sep 2012 03:38:50 - @@ -298,7 +298,7 @@ by this change./p liAdded optimized instruction scheduling for Niagara4./li /ul -h3 id=stormyXStormy16/h3 +h3 id=xstormy16XStormy16/h3 ul liThis target now supports the code-fstack-usage/code