RE: [Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode
-----Original Message-----
From: Richard Earnshaw
Sent: Wednesday, August 22, 2012 10:00 PM
To: Terry Guo
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode

On 22/08/12 12:16, Terry Guo wrote:

Due to the impact of ARM UAL, the Thumb1 and Thumb2 mode use LSRS instruction while the ARM mode uses MOVS instruction. So the following case is updated accordingly. Is it OK to trunk?

BR,
Terry

2012-08-21 Terry Guo terry@arm.com

* gcc.target/arm/combine-movs.c: Check movs for ARM mode and lsrs for other mode.

This can't be right. Thumb1 doesn't use unified syntax.

R.

oops. You are right. Sorry for making such obvious mistake. Here is patch updated to distinguish ARM and Thumb2. Tested for Thumb1, Thumb2 and ARM modes. No regression. Is it OK?

BR,
Terry

2012-08-21 Terry Guo terry@arm.com

* gcc.target/arm/combine-movs.c: Check movs for ARM mode and lsrs for Thumb2 mode.

OK.

R.

Hi Richard,

Is it ok to apply this fix to gcc 4.7 branch?

BR,
Terry
[Ping]RE: [Patch, test] Enable to prune warnings for tests defined in one exp file
Hi Mike,

Is it ok to document this feature in README.gcc? Is it ok to back port this feature to 4.7 branch? Thanks.

BR,
Terry

-----Original Message-----
From: Terry Guo [mailto:terry@arm.com]
Sent: Thursday, August 30, 2012 10:45 AM
To: 'Mike Stump'
Cc: gcc-patches@gcc.gnu.org; Richard Guenther
Subject: RE: [Patch, test] Enable to prune warnings for tests defined in one exp file

-----Original Message-----
From: Mike Stump [mailto:mikest...@comcast.net]
Sent: Tuesday, August 28, 2012 1:21 AM
To: Terry Guo
Cc: gcc-patches@gcc.gnu.org; Richard Guenther
Subject: Re: [Patch, test] Enable to prune warnings for tests defined in one exp file

On Aug 27, 2012, at 1:14 AM, Terry Guo wrote:

This patch intends to provide a chance to prune common warning messages for tests defined in an exp file. Is it OK to trunk?

Ok. If you can find where to document this... :-) That'd be nice.

I checked the texi files in gcc/doc folder, but can't find a suitable place. So I resort to README.gcc in gcc/testsuite which is claimed to list notes for those writing testcases and those writing expect scripts. Following is the patch. Is it OK?

BR,
Terry

2012-08-30 Terry Guo terry@arm.com

* README.gcc: Document new variable dg_runtest_extra_prunes.

Index: gcc/testsuite/README.gcc
===
--- gcc/testsuite/README.gcc (revision 190795)
+++ gcc/testsuite/README.gcc (working copy)
@@ -79,6 +79,11 @@
 If a test does not fit into the torture framework, use the dg framework.
 
+If some tests in an exp file need to skip same warning messages, just define
+variable dg_runtest_extra_prunes in this exp file and let it contain this warning
+message pattern. This can avoid duplicating dg-prune in these cases.
+Always remember to clear this variable when leave this exp file.
+
 
 Copyright (C) 1997, 1998, 2004 Free Software Foundation, Inc.
Re: [Patch,avr] PR54461: Better AVR-Libc integration
2012/9/4 Joerg Wunsch j...@uriah.heep.sax.de: As Georg-Johann Lay wrote: What do you propose? I'm fine with that option, and think it's a good idea. I have no objections to the patch. Denis
Re: [Patch,avr] Fix PR54220
2012/9/3 Georg-Johann Lay a...@gjlay.de: This implements TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS as obvious fix for PR54220. Ok to install? Johann PR target/54220 * config/avr/avr.c (TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS): New define to... (avr_allocate_stack_slots_for_args): ...this new static function. Approved. Denis.
Re: [Patch,avr] PR54461: Better AVR-Libc integration
On Mon, Sep 3, 2012 at 4:23 PM, Georg-Johann Lay a...@gjlay.de wrote:
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:

AVR-Libc comes with hand-optimized float support functions written in assembler. These functions use the same naming conventions as libgcc. There are situations where this name clash leads to a performance regression because the functions from libgcc are linked. One example is the new fixed-point support, which converts fixed-point to/from float and references float/int conversion functions from within libgcc. The float implementation in libm.a has been discussed several times, with the only result that it is very unlikely that the code will ever be integrated into libgcc because the original authors are no longer around. And it is much less work to add a new configure switch than to port and integrate the code, given there were no license issues. One point against such an extension was that such a change to the compiler establishes a dependency between the compiler and AVR-Libc, but this decision was made long ago by accepting code that actually should have been added to libgcc -- but was not, for whatever reason. This patch removes these performance regressions by removing the doubly implemented functions from libgcc by means of a new configure option --with-avrlibc.

as I stated yesterday, I do not understand why there needs to be yet another configure option. The NATURAL libc for AVR targets is AVR-libc. We should not need a switch for that.

There is also newlib that is used with avr-gcc. I know this because some bugs are only triggered for newlib. If there are users that report bugs if avr-gcc is built for newlib, I'd guess these users are actually interested in using newlib.

I did not say there was no other libc library. I said that the *natural* libc appears to be AVR-libc. We don't configure GCC/g++ saying --with-libstdc++.

That's a different story because these libraries support in-tree build just like newlib does. This is not true for AVR-Libc which does not support in-tree builds. I agree that AVR-Libc is the most common libc implementation used with avr-gcc and it has many advantages over other libc implementations (except that it does not support in-tree builds).

I think the in-tree builds thing is a red herring.

However, a --with-avrlibc is not needed to *get* the support from AVR-Libc, it's just used to fix some problems that arise in certain use cases.

so, let's make it the default -- see below.

Besides that, the proposed arrangement does not affect the configuration if the switch is *not* specified, thus the patch is appropriate to be backported. My intention is to backport it to 4.7 as indicated by the milestone, but if the change was unconditional I don't think the change is appropriate for a backport.

It is perfectly reasonable and OK to make the backport more guarded (e.g. by the configure option) than on mainline.

And after all it's just a *configure* option that some distribution maintainers can set if they want to.

yes, but it is still one more configure option.

The tool chain user is not bothered at all by the new option and won't even notice it. From the user perspective it's just as if some optimizations had been added to the tool chain. What do you propose? Use the setting per default and support a --with-avrlibc=no if the user wants full libgcc support and nothing removed from it?

Yes. Let's make the sane behaviour the default.

-- Gaby
[SH] Define NO_IMPLICIT_EXTERN_C for newlib targets
newlib uses extern "C" wrappers in its headers, so GCC can be told it is C++ compatible. This patch fixes:

FAIL: g++.dg/lookup/builtin5.C -std=c++11 scan-assembler _ZSt5atanhd t

Tested on the 4.7 and 4.8 branches, OK for both?

nb: newlib can be added to the list of runtimes that need it (see http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01164.html), in case this macro is removed in the future.

Thanks
Christian

2012-09-04 Christian Bruel christian.br...@st.com

* config/sh/newlib.h (NO_IMPLICIT_EXTERN_C): Define.

Index: config/sh/newlib.h
===
--- config/sh/newlib.h (revision 190714)
+++ config/sh/newlib.h (working copy)
@@ -23,3 +23,7 @@
 
 #undef LIB_SPEC
 #define LIB_SPEC "-lc -lgloss"
+
+#undef NO_IMPLICIT_EXTERN_C
+#define NO_IMPLICIT_EXTERN_C 1
+
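For readers unfamiliar with the wrappers the message refers to, the following is a minimal sketch of the usual guard pattern in a C library header. It is not taken from newlib; the function name `demo_checked_div` is hypothetical and only there to give the guard something to wrap.

```c
#include <assert.h>

/* Sketch of a header section.  A C++ compiler sees the extern "C" block and
   gives the declarations C linkage; a C compiler skips the guard lines.  */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical library function, not a newlib API.  */
int demo_checked_div (int num, int den);

#ifdef __cplusplus
}
#endif

/* Definition, which would normally live in the library's .c file.  */
int
demo_checked_div (int num, int den)
{
  assert (den != 0);
  return num / den;
}
```

When headers carry this guard themselves, the compiler does not need to wrap them implicitly, which is the situation the NO_IMPLICIT_EXTERN_C define describes.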
Re: [Patch,avr] PR54461: Better AVR-Libc integration
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:

AVR-Libc comes with hand-optimized float support functions written in assembler. These functions use the same naming conventions as libgcc. There are situations where this name clash leads to a performance regression because the functions from libgcc are linked. One example is the new fixed-point support, which converts fixed-point to/from float and references float/int conversion functions from within libgcc. The float implementation in libm.a has been discussed several times, with the only result that it is very unlikely that the code will ever be integrated into libgcc because the original authors are no longer around. And it is much less work to add a new configure switch than to port and integrate the code, given there were no license issues. One point against such an extension was that such a change to the compiler establishes a dependency between the compiler and AVR-Libc, but this decision was made long ago by accepting code that actually should have been added to libgcc -- but was not, for whatever reason. This patch removes these performance regressions by removing the doubly implemented functions from libgcc by means of a new configure option --with-avrlibc.

as I stated yesterday, I do not understand why there needs to be yet another configure option. The NATURAL libc for AVR targets is AVR-libc. We should not need a switch for that.

There is also newlib that is used with avr-gcc. I know this because some bugs are only triggered for newlib. If there are users that report bugs if avr-gcc is built for newlib, I'd guess these users are actually interested in using newlib.

I did not say there was no other libc library. I said that the *natural* libc appears to be AVR-libc. We don't configure GCC/g++ saying --with-libstdc++.

That's a different story because these libraries support in-tree build just like newlib does. This is not true for AVR-Libc which does not support in-tree builds. I agree that AVR-Libc is the most common libc implementation used with avr-gcc and it has many advantages over other libc implementations (except that it does not support in-tree builds).

I think the in-tree builds thing is a red herring.

I don't think so. If there was an in-tree build gcc could detect itself whether or not AVR-Libc is present. With the current setup the user has to specify that -- in whatever direction: that libc is there or that libc is not there depending on whatever is default.

However, a --with-avrlibc is not needed to *get* the support from AVR-Libc, it's just used to fix some problems that arise in certain use cases.

so, let's make it the default -- see below.

Besides that, the proposed arrangement does not affect the configuration if the switch is *not* specified, thus the patch is appropriate to be backported. My intention is to backport it to 4.7 as indicated by the milestone, but if the change was unconditional I don't think the change is appropriate for a backport.

It is perfectly reasonable and OK to make the backport more guarded (e.g. by the configure option) than on mainline.

And after all it's just a *configure* option that some distribution maintainers can set if they want to.

yes, but it is still one more configure option.

hmm. The configure machinery was not changed, it automatically sets with_foo if --with-foo is specified. It's just about who is to be blamed if he does not read the release notes ;-)

Whatever, I think we two are stuck now and enough arguments passed back and forth. Let the port maintainers decide.

And Jörg, would you check the excludes list in t-avrlibc?

Johann

The tool chain user is not bothered at all by the new option and won't even notice it. From the user perspective it's just as if some optimizations had been added to the tool chain. What do you propose? Use the setting per default and support a --with-avrlibc=no if the user wants full libgcc support and nothing removed from it?

Yes. Let's make the sane behaviour the default.

-- Gaby
Re: [Patch,avr] PR54461: Better AVR-Libc integration
On Tue, Sep 4, 2012 at 1:55 AM, Georg-Johann Lay a...@gjlay.de wrote:
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:
Gabriel Dos Reis wrote:
Georg-Johann Lay wrote:

AVR-Libc comes with hand-optimized float support functions written in assembler. These functions use the same naming conventions as libgcc. There are situations where this name clash leads to a performance regression because the functions from libgcc are linked. One example is the new fixed-point support, which converts fixed-point to/from float and references float/int conversion functions from within libgcc. The float implementation in libm.a has been discussed several times, with the only result that it is very unlikely that the code will ever be integrated into libgcc because the original authors are no longer around. And it is much less work to add a new configure switch than to port and integrate the code, given there were no license issues. One point against such an extension was that such a change to the compiler establishes a dependency between the compiler and AVR-Libc, but this decision was made long ago by accepting code that actually should have been added to libgcc -- but was not, for whatever reason. This patch removes these performance regressions by removing the doubly implemented functions from libgcc by means of a new configure option --with-avrlibc.

as I stated yesterday, I do not understand why there needs to be yet another configure option. The NATURAL libc for AVR targets is AVR-libc. We should not need a switch for that.

There is also newlib that is used with avr-gcc. I know this because some bugs are only triggered for newlib. If there are users that report bugs if avr-gcc is built for newlib, I'd guess these users are actually interested in using newlib.

I did not say there was no other libc library. I said that the *natural* libc appears to be AVR-libc. We don't configure GCC/g++ saying --with-libstdc++.

That's a different story because these libraries support in-tree build just like newlib does. This is not true for AVR-Libc which does not support in-tree builds. I agree that AVR-Libc is the most common libc implementation used with avr-gcc and it has many advantages over other libc implementations (except that it does not support in-tree builds).

I think the in-tree builds thing is a red herring.

I don't think so. If there was an in-tree build gcc could detect itself whether or not AVR-Libc is present. With the current setup the user has to specify that -- in whatever direction: that libc is there or that libc is not there depending on whatever is default.

obviously that situation isn't ideal, and we shouldn't build patches that are as if it should be perpetuated.

[...]

yes, but it is still one more configure option.

hmm. The configure machinery was not changed,

It is one more configure option for user to specify, no matter how the internal configury is implemented.

-- Gaby
[PATCH] Fix PR 54362 (COND_EXPR not understood by ITM)
Hi,

The problem here is that trans-mem.c does not take into account that COND_EXPR can happen for pointers. This patch modifies thread_private_new_memory to handle COND_EXPR the same way it handles PHI nodes. The testcase is a modified version of memopt-12.c with a loop, so that LIM and if-conversion together can change the conditional into a COND_EXPR.

I found this problem while writing a pass which does a full if-convert before expanding (well, changing the last phi-opt pass); it produces COND_EXPRs and memopt-12.c started to fail.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* trans-mem.c (thread_private_new_memory): Handle COND_EXPR also.

testsuite/ChangeLog:
* gcc.dg/tm/memopt-16.c: New testcase.

Index: testsuite/gcc.dg/tm/memopt-16.c
===
--- testsuite/gcc.dg/tm/memopt-16.c (revision 0)
+++ testsuite/gcc.dg/tm/memopt-16.c (revision 0)
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O3 -fdump-tree-tmmark" } */
+/* Like memopt-12.c but the phi is inside a loop which causes
+   it to be converted into a COND_EXPR.  */
+
+extern int test(void) __attribute__((transaction_safe));
+extern void *malloc (__SIZE_TYPE__) __attribute__((malloc,transaction_safe));
+
+struct large { int foo[500]; };
+
+int f(int j)
+{
+  int *p1, *p2, *p3;
+
+  p1 = malloc (sizeof (*p1)*5000);
+  __transaction_atomic {
+    _Bool t;
+    int i = 1;
+    *p1 = 0;
+
+    p2 = malloc (sizeof (*p2)*6000);
+    *p2 = 1;
+    t = test();
+
+    for (i = 0;i < j;i++)
+    {
+
+    /* p3 = PHI (p1, p2) */
+    if (t)
+      p3 = p1;
+    else
+      p3 = p2;
+
+    /* Since both p1 and p2 are thread-private, we can inherit the
+       logging already done.  No ITM_W* instrumentation necessary.  */
+    *p3 = 555;
+    }
+  }
+  return p3[something()];
+}
+
+/* { dg-final { scan-tree-dump-times "ITM_WU" 0 "tmmark" } } */
+/* { dg-final { cleanup-tree-dump "tmmark" } } */
Index: trans-mem.c
===
--- trans-mem.c (revision 190908)
+++ trans-mem.c (working copy)
@@ -1379,6 +1379,19 @@ thread_private_new_memory (basic_block e
           /* x = (cast*) foo ==> foo */
           else if (code == VIEW_CONVERT_EXPR || code == NOP_EXPR)
             x = gimple_assign_rhs1 (stmt);
+          /* x = c ? op1 : op2 == op1 or op2 just like a PHI */
+          else if (code == COND_EXPR)
+            {
+              tree op1 = gimple_assign_rhs2 (stmt);
+              tree op2 = gimple_assign_rhs3 (stmt);
+              enum thread_memory_type mem;
+              retval = thread_private_new_memory (entry_block, op1);
+              if (retval == mem_non_local)
+                goto new_memory_ret;
+              mem = thread_private_new_memory (entry_block, op2);
+              retval = MIN (retval, mem);
+              goto new_memory_ret;
+            }
           else
             {
               retval = mem_non_local;
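The merge rule in the COND_EXPR hunk (classify both arms, stop early on non-local memory, otherwise take the minimum) can be sketched outside GCC as follows. This is a toy model only: the `mem_kind` and `node` types and the `classify` function are hypothetical, and the sketch assumes, as the use of MIN in the patch suggests, that smaller enum values mean less private memory.

```c
#include <assert.h>
#include <stddef.h>

/* Ordering assumed for this sketch: smaller means "less private".  */
enum mem_kind { MEM_NON_LOCAL = 0, MEM_THREAD_LOCAL, MEM_TRANSACTION_LOCAL };

enum node_kind { NODE_LEAF, NODE_COND };

struct node
{
  enum node_kind kind;
  enum mem_kind leaf_kind;          /* Valid when kind == NODE_LEAF.  */
  const struct node *op1, *op2;     /* Arms, valid when kind == NODE_COND.  */
};

/* Classify the memory a pointer expression may refer to.  A conditional
   is only as private as its least private arm.  */
enum mem_kind
classify (const struct node *n)
{
  assert (n != NULL);
  if (n->kind == NODE_LEAF)
    return n->leaf_kind;

  enum mem_kind k1 = classify (n->op1);
  if (k1 == MEM_NON_LOCAL)
    return MEM_NON_LOCAL;           /* Early exit: the result cannot improve.  */
  enum mem_kind k2 = classify (n->op2);
  return k1 < k2 ? k1 : k2;
}
```

The early exit mirrors the `goto new_memory_ret` after the first arm in the patch: once one arm is non-local, the second arm does not need to be inspected.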
Re: [middle-end] Add machine_mode to address_cost target hook
On Mon, 2012-09-03 at 01:58 +0200, Oleg Endo wrote: OKOK -- I'll do it :) (within the next couple of days) And so I did. Attached is an updated patch that adds the address space parameter to the address_cost function. I hope that this change does not reset the ACKs so far: [x] target-independent bits [ ] alpha [ ] arm [ ] avr [ ] bfin [ ] cr16 [ ] cris [ ] epiphany[ ] i386 [ ] ia64 [ ] iq2000[ ] lm32[ ] m32c [ ] m32r [ ] mcore [ ] mep [x] microblaze [x] mips [ ] mmix [x] mn10300 [ ] pa [ ] rs6000[ ] rx[ ] s390[ ] score [x] sh[ ] sparc [ ] spu [ ] stormy16 [ ] v850 [ ] vax [ ] xtensa Tested with 'make all-gcc' on SH xgcc and i386 native build. No functional changes, except on MIPS, as requested by Richard Sandiford. Cheers, Oleg ChangeLog: * hooks.c (hook_int_rtx_mode_as_bool_0): New function. * hooks.h (hook_int_rtx_mode_as_bool_0): Declare it. * output.h (default_address_cost): Add machine_mode and address space arguments. * target.def (address_cost): Likewise. * rtlanal.c (address_cost): Pass mode and address space to target hook. (default_address_cost): Add unnamed machine_mode and address space arguments. * doc/tm.texi: Regenerate. * config/alpha/alpha.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/arm/arm.c (arm_address_cost): Add machine_mode and address space arguments. * config/avr/avr.c (avr_address_cost): Likewise. * config/bfin/bfin.c (bfin_address_cost): Likewise. * config/cr16/cr16.c (cr16_address_cost): Likewise. * config/cris/cris.c (cris_address_cost): Likewise. * config/epiphany/epiphany.c (epiphany_address_cost): Likewise. * config/i386/i386.c (ix86_address_cost): Likewise. * config/ia64/ia64.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/iq2000/iq2000.c (iq2000_address_cost): Add machine_mode and address space arguments. Pass them on in recursive invocation. 
* config/lm32/lm32.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/m32c/m32c.c (m32c_address_cost): Add machine_mode and address space arguments. * config/m32r/m32r.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/mcore/mcore.c (TARGET_ADDRESS_COST): Likewise. * config/mep/mep.c (mep_address_cost): Add machine_mode and address space arguments. * config/microblaze/microblaze.c (microblaze_address_cost): Likewise. * config/mips/mips.c (mips_address_cost): Likewise. * config/mmix/mmix.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/mn10300/mn10300.c (mn10300_address_cost): Add machine_mode and address space arguments. Use GET_MODE (x) and ADDR_SPACE_GENERIC in recursive invocation. * config/pa/pa.c (hppa_address_cost): Add machine_mode and address space arguments. * config/rs6000/rs6000.c (rs6000_debug_address_cost): Likewise. (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/rx/rx.c (rx_address_cost): Add machine_mode and address space arguments. * config/s390/s390.c (s390_address_cost): Likewise. * config/score/score-protos.h (score_address_cost): Likewise. * config/score/score.c (score_address_cost): Likewise. * config/sh/sh.c (sh_address_cost): Likewise. * config/sparc/sparc.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/spu/spu.c (TARGET_ADDRESS_COST): Likewise. * config/stormy16/stormy16.c (xstormy16_address_cost): Add machine_mode and address space arguments. * config/v850/v850.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/vax/vax.c (vax_address_cost): Add machine_mode and address space arguments. * config/xtensa/xtensa (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. 
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c (revision 190865)
+++ gcc/rtlanal.c (working copy)
@@ -3820,13 +3820,13 @@
   if (!memory_address_addr_space_p (mode, x, as))
     return 1000;
 
-  return targetm.address_cost (x, speed);
+  return targetm.address_cost (x, mode, as, speed);
 }
 
 /* If the target doesn't override, compute the cost as with arithmetic.  */
 
 int
-default_address_cost (rtx
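The shape of the change (a hook that gains mode and address space parameters, plus a "return 0" helper with the widened signature) can be sketched in plain C. All `toy_` names below are hypothetical stand-ins; GCC's real rtx, machine_mode and address space types are different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; GCC's real rtx, machine_mode and addr_space_t differ.  */
typedef const char *toy_rtx;
enum toy_mode { TOY_QImode, TOY_SImode };
typedef unsigned char toy_addr_space;

typedef int (*address_cost_fn) (toy_rtx, enum toy_mode, toy_addr_space, bool);

/* Counterpart of a "return 0" hook with the widened signature.  */
int
toy_hook_zero (toy_rtx x, enum toy_mode mode, toy_addr_space as, bool speed)
{
  (void) x; (void) mode; (void) as; (void) speed;
  return 0;
}

/* A hypothetical target that charges more for wide accesses.  */
int
toy_target_cost (toy_rtx x, enum toy_mode mode, toy_addr_space as, bool speed)
{
  (void) x; (void) as; (void) speed;
  return mode == TOY_SImode ? 2 : 1;
}

/* Generic caller: forwards mode and address space to the hook.  */
int
toy_address_cost (address_cost_fn hook, toy_rtx x, enum toy_mode mode,
                  toy_addr_space as, bool speed)
{
  assert (hook != NULL);
  return hook (x, mode, as, speed);
}
```

Targets that ignore the new arguments only need their signature updated, which is why most ChangeLog entries above read "Add machine_mode and address space arguments".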
Re: [PATCH, M68K] Fix ICE from scheduler improvement
Maxim Kuvyrkov maxim_kuvyr...@mentor.com writes: 2012-09-03 Maxim Kuvyrkov ma...@codesourcery.com * config/m68k/m68k.c (m68k_sched_dfa_post_advance_cycle): Fix ICE caused by save scheduler state patch. The change log entry should describe what was changed. Save scheduler state doesn't say anything to me. + { + /* The instruction buffer appears to be more filled than we + anticipated. We should have inheritted the state from s/inheritted/inherited/ + the previous basic block. Adjust buffer counter. */ + ++sched_ib.filled; + } The comment appears to suggest that this is rather a workaround for a deficiency elsewhere. Is that true? Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [middle-end] Add machine_mode to address_cost target hook
Hi Oleg, And so I did. Attached is an updated patch that adds the address space parameter to the address_cost function. I hope that this change does not reset the ACKs so far: [x] target-independent bits [ ] alpha [ ] arm [ ] avr [ ] bfin [ ] cr16 [ ] cris [ ] epiphany[ ] i386 [ ] ia64 [ ] iq2000[ ] lm32[ ] m32c [ ] m32r [ ] mcore [ ] mep [x] microblaze [x] mips [ ] mmix [x] mn10300 [ ] pa [ ] rs6000[ ] rx[ ] s390[ ] score [x] sh[ ] sparc [ ] spu [ ] stormy16 [ ] v850 [ ] vax [ ] xtensa Please add ACKs for: iq2000, m32r, mcore, rx, stormy16 and v850. Cheers Nick
Re: [PATCH] PR45070: Fix wrong epilogue code for cortex-m0/Os
I ran regression test with/without Os for cortex-m0 and everything is ok. Ok for trunk and 4.7/4.6 release branches? OK for trunk. Ok to backport if no release manager objects in 24 hours and if it tests without regressions there. Thanks, Ramana
Re: [SH] PR 51244 - Add CANONICALIZE_COMPARISON macro
On Mon, 2012-09-03 at 19:37 +0900, Kaz Kojima wrote:
Oleg Endo oleg.e...@t-online.de wrote:

In any case, I have no problem with changing the multi line comments to /* ... */. Just let me know.

Other than that, the patch is OK.

I've committed the attached version of the patch as rev 190909.

Cheers,
Oleg

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md (revision 190865)
+++ gcc/config/sh/sh.md (working copy)
@@ -881,10 +881,9 @@
   if (TARGET_SHMEDIA)
     emit_jump_insn (gen_cbranchint4_media (operands[0], operands[1],
                                            operands[2], operands[3]));
-  else if (TARGET_CBRANCHDI4)
-    expand_cbranchsi4 (operands, LAST_AND_UNUSED_RTX_CODE, -1);
   else
-    sh_emit_compare_and_branch (operands, SImode);
+    expand_cbranchsi4 (operands, LAST_AND_UNUSED_RTX_CODE, -1);
+
   DONE;
 })
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h (revision 190865)
+++ gcc/config/sh/sh-protos.h (working copy)
@@ -106,6 +106,9 @@
 extern rtx sh_gen_truncate (enum machine_mode, rtx, int);
 extern bool sh_vector_mode_supported_p (enum machine_mode);
 extern bool sh_cfun_trap_exit_p (void);
+extern void sh_canonicalize_comparison (enum rtx_code&, rtx&, rtx&,
+                                        enum machine_mode mode = VOIDmode);
+
 #endif /* RTX_CODE */
 
 extern const char *output_jump_label_table (void);
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c (revision 190865)
+++ gcc/config/sh/sh.c (working copy)
@@ -21,6 +21,12 @@
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+/* FIXME: This is a temporary hack, so that we can include <algorithm>
+   below.  <algorithm> will try to include <cstdlib> which will reference
+   malloc & co, which are poisoned by "system.h".  The proper solution is
+   to include <cstdlib> in "system.h" instead of <stdlib.h>.  */
+#include <cstdlib>
+
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -56,6 +62,7 @@
 #include "tm-constrs.h"
 #include "opts.h"
 
+#include <algorithm>
 
 int code_for_indirect_jump_scratch = CODE_FOR_indirect_jump_scratch;
 
@@ -1791,65 +1798,124 @@
     }
 }
 
-enum rtx_code
-prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
-                          enum rtx_code comparison)
+/* Implement the CANONICALIZE_COMPARISON macro for the combine pass.
+   This function is also re-used to canonicalize comparisons in cbranch
+   pattern expanders.  */
+void
+sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
+                            enum machine_mode mode)
 {
-  rtx op1;
-  rtx scratch = NULL_RTX;
+  /* When invoked from within the combine pass the mode is not specified,
+     so try to get it from one of the operands.  */
+  if (mode == VOIDmode)
+    mode = GET_MODE (op0);
+  if (mode == VOIDmode)
+    mode = GET_MODE (op1);
 
-  if (comparison == LAST_AND_UNUSED_RTX_CODE)
-    comparison = GET_CODE (operands[0]);
-  else
-    scratch = operands[4];
-  if (CONST_INT_P (operands[1])
-      && !CONST_INT_P (operands[2]))
+  // We need to have a mode to do something useful here.
+  if (mode == VOIDmode)
+    return;
+
+  // Currently, we don't deal with floats here.
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
+    return;
+
+  // Make sure that the constant operand is the second operand.
+  if (CONST_INT_P (op0) && !CONST_INT_P (op1))
     {
-      rtx tmp = operands[1];
+      std::swap (op0, op1);
+      cmp = swap_condition (cmp);
+    }
 
-      operands[1] = operands[2];
-      operands[2] = tmp;
-      comparison = swap_condition (comparison);
-    }
-  if (CONST_INT_P (operands[2]))
+  if (CONST_INT_P (op1))
     {
-      HOST_WIDE_INT val = INTVAL (operands[2]);
-      if ((val == -1 || val == -0x81)
-          && (comparison == GT || comparison == LE))
+      /* Try to adjust the constant operand in such a way that available
+         comparison insns can be utilized better and the constant can be
+         loaded with a 'mov #imm,Rm' insn.  This avoids a load from the
+         constant pool.  */
+      const HOST_WIDE_INT val = INTVAL (op1);
+
+      /* x >  -1     --> x >= 0
+         x >  0xFF7F --> x >= 0xFF80
+         x <= -1     --> x <  0
+         x <= 0xFF7F --> x <  0xFF80  */
+      if ((val == -1 || val == -0x81) && (cmp == GT || cmp == LE))
         {
-          comparison = (comparison == GT) ? GE : LT;
-          operands[2] = gen_int_mode (val + 1, mode);
+          cmp = cmp == GT ? GE : LT;
+          op1 = gen_int_mode (val + 1, mode);
+        }
+
+      /* x >= 1    --> x >  0
+         x >= 0x80 --> x >  0x7F
+         x <  1    --> x <= 0
+         x <  0x80 --> x <= 0x7F  */
+      else if ((val == 1 || val == 0x80) && (cmp == GE || cmp == LT))
+        {
+          cmp = cmp == GE ? GT : LE;
+          op1 = gen_int_mode (val - 1, mode);
         }
-      else if ((val == 1 || val == 0x80)
-               && (comparison == GE || comparison == LT))
+
+      /* unsigned x >= 1 --> x != 0
+         unsigned x <  1 --> x == 0  */
+      else if (val == 1 && (cmp == GEU ||
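The signed constant adjustments in the patch (GT becomes GE with the constant plus one, GE becomes GT with the constant minus one, and the matching LE/LT cases) preserve the truth value of the comparison for every integer x. The following toy sketch checks that on a small range. It is not GCC code: the `toy_` names are hypothetical and plain `long` values stand in for RTL constants.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_cmp { TOY_GT, TOY_GE, TOY_LT, TOY_LE };

/* Evaluate "x cmp val" for signed integers.  */
bool
toy_eval (enum toy_cmp cmp, long x, long val)
{
  switch (cmp)
    {
    case TOY_GT: return x > val;
    case TOY_GE: return x >= val;
    case TOY_LT: return x < val;
    case TOY_LE: return x <= val;
    }
  assert (0);
  return false;
}

/* Rewrite the comparison in place when one of the two signed cases from
   the patch applies; otherwise leave it unchanged.  */
void
toy_canonicalize (enum toy_cmp *cmp, long *val)
{
  assert (cmp != 0 && val != 0);
  if ((*val == -1 || *val == -0x81) && (*cmp == TOY_GT || *cmp == TOY_LE))
    {
      *cmp = *cmp == TOY_GT ? TOY_GE : TOY_LT;
      *val += 1;
    }
  else if ((*val == 1 || *val == 0x80) && (*cmp == TOY_GE || *cmp == TOY_LT))
    {
      *cmp = *cmp == TOY_GE ? TOY_GT : TOY_LE;
      *val -= 1;
    }
}

/* Return true if the rewritten comparison agrees with the original one
   for every x in [-300, 300].  */
bool
toy_rewrite_is_equivalent (enum toy_cmp cmp, long val)
{
  enum toy_cmp c2 = cmp;
  long v2 = val;
  toy_canonicalize (&c2, &v2);
  for (long x = -300; x <= 300; x++)
    if (toy_eval (cmp, x, val) != toy_eval (c2, x, v2))
      return false;
  return true;
}
```

The point of the rewrite in the patch is that the adjusted constants (0, -0x80, 0x7F) fit an immediate move, whereas the sketch only demonstrates that the rewrite is value-preserving.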
[Patch, Fortran, committed] PR 54435 54443
Hi all, I have just committed to trunk a trivial fix for two recent OOP regressions (affecting 4.7 and trunk), both of which originate from the same problem: http://gcc.gnu.org/viewcvs?view=revision&revision=190910 I will commit this fix also to the 4.7 branch in a few days. Cheers, Janus
[PATCH] Fix PR54458
This fixes PR54458 where DOM jump threading turns a loop into one with multiple latches but does not mark it so.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2012-09-04 Richard Guenther rguent...@suse.de

PR tree-optimization/54458
* tree-ssa-threadupdate.c (thread_through_loop_header): If we turn the loop into one with multiple latches mark it so.

* gcc.dg/torture/pr54458.c: New testcase.

Index: gcc/tree-ssa-threadupdate.c
===
--- gcc/tree-ssa-threadupdate.c (revision 190889)
+++ gcc/tree-ssa-threadupdate.c (working copy)
@@ -1037,11 +1037,21 @@ thread_through_loop_header (struct loop
     }
   free (bblocks);
 
+  /* If the new header has multiple latches mark it so.  */
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    if (e->src->loop_father == loop
+        && e->src != loop->latch)
+      {
+        loop->latch = NULL;
+        loops_state_set (LOOPS_MAY_HAVE_MULTIPLE_LATCHES);
+      }
+
   /* Cancel remaining threading requests that would make the
      loop a multiple entry loop.  */
   FOR_EACH_EDGE (e, ei, header->preds)
     {
       edge e2;
+
       if (e->aux == NULL)
         continue;
 
Index: gcc/testsuite/gcc.dg/torture/pr54458.c
===
--- gcc/testsuite/gcc.dg/torture/pr54458.c (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr54458.c (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+unsigned int a, b, c;
+
+void
+foo (unsigned int x)
+{
+  do
+    {
+      if (a == 0 ? 1 : 1 % a)
+        for (; b; b--)
+          lab:;
+      else
+        while (x)
+          ;
+      if (c)
+        goto lab;
+    }
+  while (1);
+}
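The check added to thread_through_loop_header can be modelled on a toy CFG as below. This is a sketch under stated assumptions, not GCC's data structures: blocks are plain integer ids, `toy_loop` and `toy_edge` are hypothetical types, and the returned flag stands in for setting LOOPS_MAY_HAVE_MULTIPLE_LATCHES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_LATCH (-1)

struct toy_loop
{
  int header;            /* Block id of the loop header.  */
  int latch;             /* Block id of the single latch, or TOY_NO_LATCH.  */
};

struct toy_edge
{
  int src;               /* Source block id (non-negative).  */
  bool src_in_loop;      /* Whether the source block belongs to the loop.  */
};

/* Scan the header's incoming edges.  If an edge comes from inside the loop
   but not from the recorded latch, the loop has more than one latch: clear
   the latch and return true so the caller can flag the loop state.  */
bool
toy_mark_multiple_latches (struct toy_loop *loop,
                           const struct toy_edge *preds, size_t n_preds)
{
  assert (loop != NULL && (preds != NULL || n_preds == 0));
  bool multiple = false;
  for (size_t i = 0; i < n_preds; i++)
    if (preds[i].src_in_loop && preds[i].src != loop->latch)
      {
        loop->latch = TOY_NO_LATCH;
        multiple = true;
      }
  return multiple;
}
```

Clearing the recorded latch matters because later passes that trust a single latch would otherwise work from stale information.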
Re: [PATCH 3/3] Compute predicates for phi node results in ipa-inline-analysis.c
On Mon, Sep 3, 2012 at 5:52 PM, Jan Hubicka hubi...@ucw.cz wrote: On Fri, Aug 31, 2012 at 7:24 PM, Martin Jambor mjam...@suse.cz wrote: Hi, On Thu, Aug 30, 2012 at 05:11:35PM +0200, Martin Jambor wrote: this is a new version of the patch which makes ipa analysis produce predicates for PHI node results, at least at the bottom of the simplest diamond and semi-diamond CFG subgraphs. This time I also analyze the conditions again rather than extracting information from CFG edges, which means I can reason about substantially more PHI nodes. This patch makes us produce loop bounds hint for the pr48636.f90 testcase. Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin 2012-08-29 Martin Jambor mjam...@suse.cz * ipa-inline-analysis.c (phi_result_unknown_predicate): New function. (predicate_for_phi_result): Likewise. (estimate_function_body_sizes): Use the above two functions. This patch, on top of the one doing loop calculations almost always, introduces a number of testsuite failures which somehow I had not caught during my testing. The problem is that either calculate_dominance_info or loop_optimizer_init introduce new SSA names for which there is no index in nonconstant_names which is allocated before the dominance and loop computations. I'm currently bootstrapping and testing the following fix which simply allocates the vector after doing the two computations. If it passes I will commit it straight away so that the regression is fixed before I leave for the weekend, I hope it's obvious enough for that. On the other hand, it would really be better if we did not change function bodies during IPA summary generation phase... Um ... we shouldn't do this. Can you track down where it happens? I suppose it might come from CFG manipulations loop_optimizer_init performs when not passing AVOID_CFG_MODIFICATIONS. I bet it come from loop noromalization :) (i.e. loop closed form or preheader construction both needs new SSA names.) 
I think it would be best to have the pass manager handle this and make loop normalization happen once before all SSA IPA analysis. And compute loops as well. Richard. Honza
Re: [PATCH, C] Mixed pointer types in call to streamer_tree_cache_lookup() in gcc/lto-streamer-out.c
On Mon, Sep 3, 2012 at 6:10 PM, Andris Pavenis andris.pave...@iki.fi wrote: On 09/03/2012 03:27 PM, Richard Guenther wrote: On Sat, Sep 1, 2012 at 2:21 PM, Andris Pavenis andris.pave...@iki.fi wrote: uint32_t * is used as the 3rd parameter in calls to streamer_tree_cache_lookup() in 2 places in gcc/lto-streamer-out.c, while the procedure prototype has unsigned *. They are not guaranteed to be the same for all targets (I got an error when building for DJGPP). Ok. I do not have SVN write access, so I cannot commit it myself. Hmm, OTOH your patch looks wrong as @@ -1131,7 +1131,7 @@ lto_output_decl_state_refs (struct output_block *ob, struct lto_out_decl_state *state) { unsigned i; - uint32_t ref; + unsigned ref; tree decl; /* Write reference to FUNCTION_DECL. If there is not function, conflicts with streamer_tree_cache_lookup (ob->writer_cache, decl, &ref); gcc_assert (ref != (unsigned)-1); lto_output_data_stream (out_stream, &ref, sizeof (uint32_t)); where the on-disk format expects uint32_t layout. Thus I think streamer_tree_cache_lookup should instead use uint32_t consistently. Richard. Andris Thanks, Richard. Andris ChangeLog entry 2012-09-01 Andris Pavenis andris.pave...@iki.fi * lto-streamer-out.c (write_global_references, lto_output_decl_state_refs): Fix parameter type in call to streamer_tree_cache_lookup
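The type concern in this thread can be illustrated with a minimal C sketch. It does not use the real LTO streamer API: `cache_lookup` and `write_ref` are hypothetical stand-ins. The point it tries to show is that when a value is written out as exactly `sizeof (uint32_t)` bytes, using `uint32_t` for both the out-parameter and the variable avoids any dependence on how wide `unsigned` is, or on which underlying type `uint32_t` is, on a given target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cache lookup.  The out-parameter is
   declared as uint32_t * so that a caller holding a uint32_t needs no
   pointer conversion, whatever `unsigned` looks like on the target.  */
static int
cache_lookup (const int *cache, uint32_t n, int key, uint32_t *ix_p)
{
  assert (cache != NULL && ix_p != NULL);
  for (uint32_t i = 0; i < n; i++)
    if (cache[i] == key)
      {
        *ix_p = i;
        return 1;
      }
  *ix_p = (uint32_t) -1;        /* "not found" marker */
  return 0;
}

/* Hypothetical writer: the record is exactly 4 bytes wide, so the value
   passed in has type uint32_t.  Returns the number of bytes written.  */
static size_t
write_ref (unsigned char *out, uint32_t ref)
{
  assert (out != NULL);
  memcpy (out, &ref, sizeof (uint32_t));
  return sizeof (uint32_t);
}
```

A caller would declare `uint32_t ref;`, pass `&ref` to the lookup and then hand `ref` to the writer, so the same type is used from lookup to output.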
Re: combine BIT_FIELD_REF and VEC_PERM_EXPR
On Mon, Sep 3, 2012 at 6:12 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 3 Sep 2012, Richard Guenther wrote: + if (code == VEC_PERM_EXPR) +{ + tree p, m, index, tem; + unsigned nelts; + m = gimple_assign_rhs3 (def_stmt); + if (TREE_CODE (m) != VECTOR_CST) + return false; + nelts = VECTOR_CST_NELTS (m); + idx = TREE_INT_CST_LOW (VECTOR_CST_ELT (m, idx)); + idx %= 2 * nelts; + if (idx < nelts) + { + p = gimple_assign_rhs1 (def_stmt); + } + else + { + p = gimple_assign_rhs2 (def_stmt); + idx -= nelts; + } + index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size); + tem = fold_build3 (BIT_FIELD_REF, TREE_TYPE (op), p, op1, index); This shouldn't simplify, so you can use build3 instead. I think that it is possible for p to be a VECTOR_CST, if the shuffle involves one constant and one non-constant vectors, no? Well, constant propagation should have handled it ... When it sees __builtin_shuffle(cst1,var,cst2)[cst3], CCP should basically do the same thing I am doing here, in the hope that the element will be part of cst1 instead of var? Yes, if CCP sees vec_1 = VEC_PERM <cst1, var, cst2>; scalar_2 = BIT_FIELD_REF <vec_1, cst3>; then if vec_1 ends up being constant it should figure out that vec_1 is constant and that scalar_2 is constant. Of course if we have vec_1 = VEC_PERM <cst1, var1, var2>; and var1/var2 are CONSTRUCTORS with some elements constants then it won't be able to do that and forwprop should do it. So I suppose handling constants in forwprop is fine (but it would be nice to double-check if in the first example CCP figures out that vec_1 and scalar_2 are constant). What if builtin_shuffle takes 2 constructors, one of which contains at least one constant? It looks easier to handle it here and let the next run of CCP notice the simplified expression. Or do you mean I should add the new function to CCP (or even fold) instead of forwprop? 
(wouldn't be the first time CCP does more than constant propagation) CCP should only do lattice-based constant propagation, other cases need to be handled in forwprop. If you use fold_build3 you need to check that the result is in expected form (a is_gimple_invariant or an SSA_NAME). Now that I look at this line, I wonder if I am missing some unshare_expr for p and/or op1. If either is a CONSTRUCTOR and its def stmt is not removed and it survives into tem then yes ... But the integer_cst doesn't need it. Ok, thanks. -- Marc Glisse
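The index arithmetic discussed in this thread (reduce the mask entry modulo twice the element count, then pick from the first or the second input vector) can be sketched in plain C. This is only an illustration on ordinary arrays, assuming a hypothetical helper `shuffle_element`; it is not GIMPLE or forwprop code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return element I of the permutation of V0 and V1
   (each NELTS long) selected by MASK.  Mask entries are reduced modulo
   2 * NELTS; values below NELTS select from V0, the rest from V1.  */
static int
shuffle_element (const int *v0, const int *v1, const unsigned *mask,
                 size_t nelts, size_t i)
{
  assert (v0 != NULL && v1 != NULL && mask != NULL);
  assert (nelts > 0 && i < nelts);

  size_t idx = mask[i] % (2 * nelts);
  if (idx < nelts)
    return v0[idx];
  return v1[idx - nelts];
}
```

With `x = {a, b}`, `y = {18, 42}` and `m = {0, 3}`, element 0 comes from `x` and element 1 is `y[1]`, that is 42.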
Re: [PATCH][RFC] Add -Og
On Mon, 3 Sep 2012, H.J. Lu wrote: On Mon, Sep 3, 2012 at 11:50 AM, rguent...@suse.de wrote: H.J. Lu hjl.to...@gmail.com wrote: On Mon, Sep 3, 2012 at 6:28 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther rguent...@suse.de wrote: This adds a new optimization level, -Og, as previously discussed. It aims at providing fast compilation, a superior debugging experience and reasonable runtime performance. Instead of making -O1 this optimization level this adds a new -Og. It's a first cut, highlighting that our fixed pass pipeline and simply enabling/disabling individual passes (but not pass copies for example) doesn't scale to properly differentiate between -Og and -O[23]. -O1 should get similar treatment, eventually just building on -Og but not focusing on debugging experience. That is, I expect that in the end we will at least have two post-IPA optimization pipelines. It also means that you cannot enable PRE or VRP with -Og at the moment because these passes are not anywhere scheduled (similar to the situation with -O0). It has some funny effect on dump-file naming of the pass copies though, which hints at that the current setup is too static. For that reason the new queue comes after the old, to not confuse too many testcases. It also does not yet disable any of the early optimizations that make debugging harder (SRA comes to my mind here, as does switch-conversion and partial inlining). The question arises if we want to support in any reasonable way using profile-feedback or LTO for -O[01g], thus if we rather want to delay some of the early opts to after IPA optimizations. Not bootstrapped or fully tested, but it works for the compile torture. Comments welcome, No comments? Then I'll drop this idea for 4.8. When I debug binutils, I have to use -O0 -g to get precise line and variable info. Also glibc has to be compiled with -O, which makes debug a challenge. Will -Og help bintils and glibc debug? 
I suppose so, but it is hard to tell without knowing more about the issues. The main issues are 1. I need to know precise values for all local variables at all times. That would certainly be a good design goal for -Og (but surely the first cut at it won't do it). 2. Compiler shouldn't inline a function or move lines around. Let's split that. 2. Compiler shouldn't inline a function well - we need to inline always_inline functions. And I am positively sure people want trivial C++ abstraction penalty to be removed even with -Og, thus getter/setter methods inlined. Let's say the compiler should not inline a function not declared inline and the compiler should not inline a function if that would increase code size even if it is declared inline? 3. Compiler shouldn't move lines around. A good goal as well, probably RTL pieces are least ready for this. 4. Generated code should be small and fast, compile-time and memory usage should be low. Unless either of it defeats 1. to 3. The patch only provides a starting point and from the GIMPLE side should be reasonably close to the goals above. Richard.
[Patch,avr,committed] Fix PR54476
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190920
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190921
Applied these obvious fixes for PR54476. Johann
--
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c (revision 190914)
+++ config/avr/avr.c (working copy)
@@ -10449,7 +10449,7 @@ avr_mem_clobber (void)
 static void avr_expand_delay_cycles (rtx operands0) {
- unsigned HOST_WIDE_INT cycles = UINTVAL (operands0);
+ unsigned HOST_WIDE_INT cycles = UINTVAL (operands0) & GET_MODE_MASK (SImode);
 unsigned HOST_WIDE_INT cycles_used; unsigned HOST_WIDE_INT loop_count;
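As a rough illustration of what masking a wide host integer with a mode mask does, here is a small C sketch. The helpers `mode_mask` and `truncate_to_mode` are hypothetical and only mimic the idea behind `GET_MODE_MASK`; the assumption is that the operand is held in a 64-bit host integer and may arrive sign-extended, so the upper bits have to be cleared before the value is treated as a 32-bit count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: all-ones mask for a mode that is BITS wide.  */
static uint64_t
mode_mask (unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  return bits == 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
}

/* Hypothetical helper: reinterpret VALUE as an unsigned BITS-bit
   quantity by clearing everything above the mode's width.  */
static uint64_t
truncate_to_mode (int64_t value, unsigned bits)
{
  return (uint64_t) value & mode_mask (bits);
}
```

For example, a value of -1 in the host integer becomes 0xFFFFFFFF once truncated to 32 bits, while small positive values are unchanged.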
Re: [middle-end] Add machine_mode to address_cost target hook
On 04/09/12 08:52, Oleg Endo wrote: On Mon, 2012-09-03 at 01:58 +0200, Oleg Endo wrote: OKOK -- I'll do it :) (within the next couple of days) And so I did. Attached is an updated patch that adds the address space parameter to the address_cost function. I hope that this change does not reset the ACKs so far: [x] target-independent bits [ ] alpha [ ] arm [ ] avr [ ] bfin [ ] cr16 [ ] cris [ ] epiphany[ ] i386 [ ] ia64 [ ] iq2000[ ] lm32[ ] m32c [ ] m32r [ ] mcore [ ] mep [x] microblaze [x] mips [ ] mmix [x] mn10300 [ ] pa [ ] rs6000[ ] rx[ ] s390[ ] score [x] sh[ ] sparc [ ] spu [ ] stormy16 [ ] v850 [ ] vax [ ] xtensa Tested with 'make all-gcc' on SH xgcc and i386 native build. No functional changes, except on MIPS, as requested by Richard Sandiford. Cheers, Oleg ChangeLog: * hooks.c (hook_int_rtx_mode_as_bool_0): New function. * hooks.h (hook_int_rtx_mode_as_bool_0): Declare it. * output.h (default_address_cost): Add machine_mode and address space arguments. * target.def (address_cost): Likewise. * rtlanal.c (address_cost): Pass mode and address space to target hook. (default_address_cost): Add unnamed machine_mode and address space arguments. * doc/tm.texi: Regenerate. * config/alpha/alpha.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/arm/arm.c (arm_address_cost): Add machine_mode and address space arguments. * config/avr/avr.c (avr_address_cost): Likewise. * config/bfin/bfin.c (bfin_address_cost): Likewise. * config/cr16/cr16.c (cr16_address_cost): Likewise. * config/cris/cris.c (cris_address_cost): Likewise. * config/epiphany/epiphany.c (epiphany_address_cost): Likewise. * config/i386/i386.c (ix86_address_cost): Likewise. * config/ia64/ia64.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/iq2000/iq2000.c (iq2000_address_cost): Add machine_mode and address space arguments. Pass them on in recursive invocation. 
* config/lm32/lm32.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/m32c/m32c.c (m32c_address_cost): Add machine_mode and address space arguments. * config/m32r/m32r.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/mcore/mcore.c (TARGET_ADDRESS_COST): Likewise. * config/mep/mep.c (mep_address_cost): Add machine_mode and address space arguments. * config/microblaze/microblaze.c (microblaze_address_cost): Likewise. * config/mips/mips.c (mips_address_cost): Likewise. * config/mmix/mmix.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/mn10300/mn10300.c (mn10300_address_cost): Add machine_mode and address space arguments. Use GET_MODE (x) and ADDR_SPACE_GENERIC in recursive invocation. * config/pa/pa.c (hppa_address_cost): Add machine_mode and address space arguments. * config/rs6000/rs6000.c (rs6000_debug_address_cost): Likewise. (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/rx/rx.c (rx_address_cost): Add machine_mode and address space arguments. * config/s390/s390.c (s390_address_cost): Likewise. * config/score/score-protos.h (score_address_cost): Likewise. * config/score/score.c (score_address_cost): Likewise. * config/sh/sh.c (sh_address_cost): Likewise. * config/sparc/sparc.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/spu/spu.c (TARGET_ADDRESS_COST): Likewise. * config/stormy16/stormy16.c (xstormy16_address_cost): Add machine_mode and address space arguments. * config/v850/v850.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. * config/vax/vax.c (vax_address_cost): Add machine_mode and address space arguments. * config/xtensa/xtensa (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0. The arm bits are OK. R.
Ping^2: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
-Original Message- From: Richard Earnshaw Sent: Thursday, July 26, 2012 9:19 PM To: Andrew Pinski Cc: Bin Cheng; gcc-patches@gcc.gnu.org Subject: Re: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c On 26/07/12 11:27, Andrew Pinski wrote: On Thu, Jul 26, 2012 at 3:20 AM, Bin Cheng bin.ch...@arm.com wrote: Hi, This patch removes the duplicate check on BRANCH_COST in fold_truth_andor. The BRANCH_COST condition removed is a duplicate of the default definition of LOGICAL_OP_NON_SHORT_CIRCUIT. All current targets (mips and rs6000) that provide non-default definitions of LOGICAL_OP_SHORT_CIRCUIT set it to 0, so this patch is therefore just a code cleanup and does not change behaviour in the compiler. I built mipsel-elf cross compiler and compared newlib/libstdc++ compiled by the patched/original compilers. Is it OK? Just some history here on this. The BRANCH COST check was there before LOGICAL_OP_NON_SHORT_CIRCUIT was added. I will be submitting a patch which changes the MIPS definition soon but it will not be based on the branch cost but rather than another option. So in the end it might not be redundant as it is currently. Thanks, Andrew You can always factor BRANCH_COST into LOGICAL_OP_NON_SHORT_CIRCUIT (as the default currently does), so there's no loss of functionality from removing this currently redundant check. However, the current definition is broken in that it makes it impossible to force the compiler to use this optimization when the branch cost is low. Hi, is this change ok? Or we need more discussion on it? Thanks very much.
Re: [Patch,avr] PR54461: Better AVR-Libc integration
On Tue, Sep 4, 2012 at 8:55 AM, Georg-Johann Lay a...@gjlay.de wrote: Gabriel Dos Reis schrieb: Georg-Johann Lay wrote: Gabriel Dos Reis schrieb: Georg-Johann Lay wrote: Gabriel Dos Reis schrieb: Georg-Johann Lay wrote: AVR-Libc comes with hand-optimized float support functions written in assembler. These functions use the same naming conventions like libgcc. There are situations where this name clashed lead to performance regression because the functions from libgcc are linked. One example are the new fixed-point support that convert fixed-point to/from float and reference float/int conversion functions from within libgcc. The float implementation in libm.a have been discussed several times with the only result that it is very unlikely that the code will ever be integrated into libgcc because the original authors are no more around. And is is much less work to add a new configure switch than to port and integrate the code, given there were no license issues. One point against such an extension was that such change to the compiler establishes a dependency between the compiler and AVR-Libc, but this decision has been made long ago by accepting code that actually should had been added to libgcc -- but was not for whatever reason. This patch removes that performance regressions by removing the doubly implemented functions from libgcc by means of a new configure option --with-avrlibc. as I stated yesterday, I do not understand why there needs to be yet another configure option. The NATURAL libc for ARV targets is ARV-libc. We should not need a switch for that. There is also newlib that is used with avr-gcc. I know this because some bugs are only triggered for newlib. If there are users that report bugs if avr-gcc is build for newlib, I'd guess these users are actually interested in using newlib. I did not say there was no other libc library. I said that the *natural* libc appears to be AVR-libc. We don't configure GCC/g++ saying --with-libstdc++. 
That's a different story because these libraries support in-tree builds just like newlib does. This is not true for AVR-Libc which does not support in-tree builds. I agree that AVR-Libc is the most common libc implementation used with avr-gcc and it has many advantages over other libc implementations (except that it does not support in-tree builds). I think the in-tree builds thing is a red herring. I don't think so. If there was an in-tree build gcc could detect by itself whether or not AVR-Libc is present. With the current setup the user has to specify that -- in whatever direction: that libc is there or that libc is not there depending on whatever is default. You can do a link check on whether -lc provides those functions and skip those that overlap with libgcc. Richard. However, a --with-avrlibc is not needed to *get* the support from AVR-Libc, it's just used to fix some problems that arise in certain use cases. so, let's make it the default -- see below. Besides that, the proposed arrangement does not affect the configuration if the switch is *not* specified, thus the patch is appropriate to be backported. My intention is to backport it to 4.7 as indicated by the milestone, but if the change were unconditional I don't think it would be appropriate for a backport. It is perfectly reasonable and OK to make the backport more guarded (e.g. by the configure option) than on mainline. And after all it's just a *configure* option that some distribution maintainers can set if they want to. yes, but it is still one more configure option. hmm. The configure machinery was not changed, it automatically sets with_foo if --with-foo is specified. It's just about who is to be blamed if he does not read the release notes ;-) Whatever, I think we two are stuck now and enough arguments have been passed back and forth. Let the port maintainers decide. And Jörg, would you check the excludes list in t-avrlibc? 
Johann The tool chain user is not bothered at all by the new option and won't even notice it. From the user's perspective it's just as if some optimizations had been added to the tool chain. What do you propose? Use the setting by default and support a --with-avrlibc=no if the user wants full libgcc support and nothing removed from it? Yes. Let's make the sane behaviour the default. -- Gaby
Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1
Hi, For thumb1, arm-gcc intentionally rewrites move insns into subtracts of ZERO in the peephole2 pass, then executes pass_if_after_reload/pass_regrename/pass_cprop_hardreg sequentially. In this scenario, copy propagation opportunities are missed because: 1. the move insns are rewritten. 2. pass_cprop_hardreg currently doesn't notice the subtract of ZERO. This patch fixes the problem and the logic is: 1. notice the plus/subtract of ZERO in pass_cprop_hardreg. 2. if the last insn providing information about condition codes is in the form of dest_reg = src_reg - 0, record the src_reg in the newly added field thumb1_cc_op0_src of structure machine_function. 3. in pattern cbranchsi4_insn, check thumb1_cc_op0_src along with thumb1_cc_op0 to save one comparison insn. I measured the patch on CSiBE: about 600 bytes are saved for both O2 and Os on cortex-m0 without any regression. I also tested the patch on arm-none-eabi+cortex-m0/arm-none-eabi+cortex-m3/i686-pc-linux and no regressions were introduced. So is it OK? Thanks 2012-08-13 Bin Cheng bin.ch...@arm.com * regcprop.c (copyprop_hardreg_forward_1): Notice copies in the form of subtract of ZERO. * config/arm/arm.h (thumb1_cc_op0_src): New field. * config/arm/arm.c (thumb1_final_prescan_insn): Record thumb1_cc_op0_src. * config/arm/arm.md (cbranchsi4_insn): Check thumb1_cc_op0_src along with thumb1_cc_op0. Ping? Hi Ramana, could you help me review this patch? Hi Eric, Richard, could you help me review the change in regcprop.c? Thanks very much
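The first step of the logic described above, treating an add or subtract of zero as a register copy, can be sketched on a toy instruction representation. Everything here (`toy_op`, `toy_insn`, `toy_copy_source`) is hypothetical and far simpler than RTL; it is only meant to show the shape of the check, not what regcprop.c actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set: dest = src, dest = src + imm,
   dest = src - imm.  */
enum toy_op { TOY_MOVE, TOY_PLUS, TOY_MINUS };

struct toy_insn
{
  enum toy_op op;
  int dest;     /* destination register number */
  int src;      /* source register number */
  long imm;     /* immediate operand, ignored for TOY_MOVE */
};

/* Return the source register if INSN behaves like a plain copy,
   or -1 if it does not.  An add or subtract of zero counts as a copy.  */
static int
toy_copy_source (const struct toy_insn *insn)
{
  assert (insn != NULL);
  switch (insn->op)
    {
    case TOY_MOVE:
      return insn->src;
    case TOY_PLUS:
    case TOY_MINUS:
      return insn->imm == 0 ? insn->src : -1;
    }
  return -1;
}
```

A copy-propagation pass built on such a helper could record `dest` as holding the same value as `src` for `dest = src - 0`, just as it would for a move.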
Re: Ping^2: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
On Tue, Sep 4, 2012 at 11:56 AM, Bin Cheng bin.ch...@arm.com wrote: -Original Message- From: Richard Earnshaw Sent: Thursday, July 26, 2012 9:19 PM To: Andrew Pinski Cc: Bin Cheng; gcc-patches@gcc.gnu.org Subject: Re: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c On 26/07/12 11:27, Andrew Pinski wrote: On Thu, Jul 26, 2012 at 3:20 AM, Bin Cheng bin.ch...@arm.com wrote: Hi, This patch removes the duplicate check on BRANCH_COST in fold_truth_andor. The BRANCH_COST condition removed is a duplicate of the default definition of LOGICAL_OP_NON_SHORT_CIRCUIT. All current targets (mips and rs6000) that provide non-default definitions of LOGICAL_OP_SHORT_CIRCUIT set it to 0, so this patch is therefore just a code cleanup and does not change behaviour in the compiler. I built mipsel-elf cross compiler and compared newlib/libstdc++ compiled by the patched/original compilers. Is it OK? Just some history here on this. The BRANCH COST check was there before LOGICAL_OP_NON_SHORT_CIRCUIT was added. I will be submitting a patch which changes the MIPS definition soon but it will not be based on the branch cost but rather than another option. So in the end it might not be redundant as it is currently. Thanks, Andrew You can always factor BRANCH_COST into LOGICAL_OP_NON_SHORT_CIRCUIT (as the default currently does), so there's no loss of functionality from removing this currently redundant check. However, the current definition is broken in that it makes it impossible to force the compiler to use this optimization when the branch cost is low. Hi, is this change ok? Or we need more discussion on it? It's not ok (I btw fail to see the patch in this thread). 
The current way LOGICAL_OP_NON_SHORT_CIRCUIT is implemented/used should instead be changed to always match the pattern LOGICAL_OP_NON_SHORT_CIRCUIT && (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) and the default value of LOGICAL_OP_NON_SHORT_CIRCUIT should be 1, defined in defaults.h (and the docs updated). Richard. Thanks very much.
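A small C sketch may help to picture the suggestion. It assumes the intended test has the shape `LOGICAL_OP_NON_SHORT_CIRCUIT && BRANCH_COST (...) >= 2` with a default of 1 for the macro; the function `want_non_short_circuit` and its `branch_cost` parameter are hypothetical stand-ins for the real target macros. The two `both_nonzero_*` functions show what the transformation amounts to for side-effect-free operands: a branchy `&&` versus a branchless `&`.

```c
#include <assert.h>

/* Assumed default, as proposed in the thread; a target could override it.  */
#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
#define LOGICAL_OP_NON_SHORT_CIRCUIT 1
#endif

/* Hypothetical predicate: use the non-short-circuit form only when the
   target allows it and branches are expensive enough.  */
static int
want_non_short_circuit (int branch_cost)
{
  assert (branch_cost >= 0);
  return LOGICAL_OP_NON_SHORT_CIRCUIT && branch_cost >= 2;
}

/* Short-circuit form: the second test is skipped if the first fails.  */
static int
both_nonzero_branchy (int a, int b)
{
  return a != 0 && b != 0;
}

/* Non-short-circuit form: both tests are always evaluated and combined
   with a bitwise AND, which is valid because neither has side effects.  */
static int
both_nonzero_branchless (int a, int b)
{
  return (a != 0) & (b != 0);
}
```

Under this shape a target that wants the branchless form even with a low branch cost cannot get it through the macro alone, which appears to be the objection raised later in the thread for ARM.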
Re: combine vec_perm_expr with constructor
On Mon, Sep 3, 2012 at 5:50 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 3 Sep 2012, Richard Guenther wrote: On Mon, Sep 3, 2012 at 4:00 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 3 Sep 2012, Richard Guenther wrote: You shouldn't need the VECTOR_CST handling - constant propagation should already ensure properly simplified code here (and is the more canonical place to handle this). IIRC, I added VECTOR_CST because of mixed constructor/vector_cst shuffles (and because it wasn't too hard). If I remove it (I can), I guess some of the testcases won't work anymore. I see. If you still have a testcase can you look if CCP does not do something it should? I think CCP is working fine, the fold_ternary patch you approved today tests some of that (without that patch, sometimes ccp1 does half the work and fre1 finishes it, and since forwprop1 is before fre1, I hit that case there). Is there a particular scenario you have in mind that might not be handled? No. In theory it should be handled just fine via gimple_fold_stmt_to_constant_1 dispatching to fold_ternary. Richard. Here I was concerned with: x={a,b}; // constructor y={18,42}; // vector_cst m={0,3}; __builtin_shuffle(x,y,m) // should be {a,42} -- Marc Glisse
Re: Ping: [PATCH] Enable bbro for -Os
On Wed, Aug 29, 2012 at 10:42 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: -Original Message- From: Steven Bosscher [mailto:stevenb@gmail.com] Sent: Friday, August 24, 2012 8:17 PM To: Zhenqiang Chen Cc: gcc-patches@gcc.gnu.org Subject: Re: Ping: [PATCH] Enable bbro for -Os On Wed, Aug 22, 2012 at 8:49 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote: The patch is to enable bbro for -Os. When optimizing for size, it * avoid duplicating block. * keep its original order if there is no chance to fall through. * ignore edge frequency and probability. * handle predecessor first if its index is smaller to break long trace. You do this by inserting the index as a key. I don't fully understand this change. You're assuming that a block with a lower index has a lower pre- order number in the CFG's DFS spanning tree, IIUC (i.e. the blocks are numbered sequentially)? I'm not sure that's always true. I think you should add an explanation for this heuristic. Thank you for the comments. cleanup_cfg is called at the end cfg_layout_initialize before reorder_basic_blocks. cleanup_cfg does lots of optimization on cfg and renumber the basic blocks. After cleanup_cfg, the blocks are roughly numbered sequentially. Well, sequentially in their current order which is not in any way flow-controlled. The heuristic bases on the result of cleanup_cfg. It just wants to keep the order of cleanup_cfg since logs show we will have code size improvement (by cleanup_cfg) even if we do not call reorder_basic_blocks. index as a key is a simple way keep the original order. That's true. Comments are added in the updated patch. * only connect Trace n with Trace n + 1 to reduce long jump. ... * bb-reorder.c (connect_better_edge_p): New added. (find_traces_1_round): When optimizing for size, ignore edge frequency and probability, and handle all in one round. (bb_to_key): Use bb-index as key for size. (better_edge_p): The smaller bb index is better for size. 
(connect_traces): Connect block n with block n + 1; connect trace m with trace m + 1 if falling through. (copy_bb_p): Avoid duplicating blocks. (gate_handle_reorder_blocks): Enable bbro when optimizing for -Os. This probably fixes PR54364. I tried the case in PR54364; the patch does remove several jmp instructions. @@ -1169,6 +1272,10 @@ copy_bb_p (const_basic_block bb, int code_may_grow) int max_size = uncond_jump_length; rtx insn; + /* Avoid duplicating blocks for size. */ + if (optimize_function_for_size_p (cfun)) + return false; + if (!bb->frequency) return false; This shouldn't be necessary, due to the CODE_MAY_GROW argument, and this change should result in a code size increase because jumps to conditional jumps aren't removed anymore. Why did you make this change? Do you have a test case where code size increases if you allow copy_bb_p to return true? Thanks. It is not necessary. Here is the updated ChangeLog. The updated patch is attached. ChangeLog 2012-08-29 Zhenqiang Chen zhenqiang.c...@arm.com PR middle-end/54364 * bb-reorder.c (connect_better_edge_p): New added. (find_traces_1_round): When optimizing for size, ignore edge frequency and probability, and handle all in one round. (bb_to_key): Use bb->index as key for size. (better_edge_p): The smaller bb index is better for size. (connect_traces): Connect block n with block n + 1; connect trace m with trace m + 1 if falling through. (gate_handle_reorder_blocks): Enable bbro when optimizing for -Os. @@ -530,10 +544,11 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th, } /* Edge that cannot be fallthru or improbable or infrequent -successor (i.e. it is unsuitable successor). */ +successor (i.e. it is unsuitable successor). +For size, ignore the frequency and probability. 
*/ if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX) - || prob < branch_th || EDGE_FREQUENCY (e) < exec_th - || e->count < count_th) + || (prob < branch_th || EDGE_FREQUENCY (e) < exec_th + || e->count < count_th) && !for_size) continue; Why that change? It seems you do re-orderings that would not be done with -Os even though your goal was to preserve the original ordering. + /* Wait for the predecessors. */ + if ((e == best_edge) && for_size + && (EDGE_COUNT (best_edge->dest->succs) > 1 + || EDGE_COUNT (best_edge->dest->preds) > 1)) + { + best_edge = NULL; + } I don't understand this (well, I'm not very familiar with bb-reorder), doesn't that mean you rather want to push this block to the next round? Overall I
Re: [middle-end] Add machine_mode to address_cost target hook
On 04/09/2012 09:52, Oleg Endo wrote: [x] target-independent bits [ ] alpha [ ] arm [ ] avr [ ] bfin [ ] cr16 [ ] cris [ ] epiphany [ ] i386 [ ] ia64 [ ] iq2000 [ ] lm32 [ ] m32c [ ] m32r [ ] mcore [ ] mep [x] microblaze [x] mips [ ] mmix [x] mn10300 [ ] pa [ ] rs6000 [ ] rx [ ] s390 [ ] score [x] sh [ ] sparc [ ] spu [ ] stormy16 [ ] v850 [ ] vax [ ] xtensa Tested with 'make all-gcc' on SH xgcc and i386 native build. No functional changes, except on MIPS, as requested by Richard Sandiford. I think you only need explicit approval for mn10300. All other changes are trivial. +hook_int_rtx_mode_as_bool_0 (rtx, enum machine_mode, addr_space_t, bool) So we're using C++ already? Or do we want ATTRIBUTE_UNUSED here? Paolo
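The question about unnamed parameters can be illustrated with a short C sketch. In C++ a function definition may leave parameters unnamed; in C (before C23, as far as I know) the names are required, so an unused parameter is usually named and marked as unused instead. The types and functions below (`toy_rtx`, `toy_mode`, `toy_addr_space`, `toy_hook_zero`, `toy_address_cost`) are hypothetical stand-ins, not GCC's real declarations, and the `ATTRIBUTE_UNUSED` definition is a simplified local version for the sake of a self-contained example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified local definition for this sketch only.  */
#ifndef ATTRIBUTE_UNUSED
# ifdef __GNUC__
#  define ATTRIBUTE_UNUSED __attribute__ ((unused))
# else
#  define ATTRIBUTE_UNUSED
# endif
#endif

/* Hypothetical stand-ins for rtx, enum machine_mode and addr_space_t.  */
typedef const void *toy_rtx;
typedef int toy_mode;
typedef unsigned char toy_addr_space;

typedef int (*toy_address_cost_fn) (toy_rtx, toy_mode, toy_addr_space, bool);

/* A default hook that ignores all of its arguments and returns 0.
   Every parameter is named and marked unused so the definition is
   valid C and does not trigger unused-parameter warnings.  */
static int
toy_hook_zero (toy_rtx x ATTRIBUTE_UNUSED, toy_mode mode ATTRIBUTE_UNUSED,
               toy_addr_space as ATTRIBUTE_UNUSED, bool speed ATTRIBUTE_UNUSED)
{
  return 0;
}

/* Hypothetical caller that forwards the extra mode and address-space
   arguments to whichever hook the target installed.  */
static int
toy_address_cost (toy_address_cost_fn hook, toy_rtx x, toy_mode mode,
                  toy_addr_space as, bool speed)
{
  assert (hook != NULL);
  return hook (x, mode, as, speed);
}
```

A target without a cost model of its own would install `toy_hook_zero`; a target with one would supply a function of the same type that actually inspects the mode and address space.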
Re: [PATCH][RFC] Add -Og
On 04/09/12 10:45, Richard Guenther wrote: On Mon, 3 Sep 2012, H.J. Lu wrote: On Mon, Sep 3, 2012 at 11:50 AM, rguent...@suse.de wrote: H.J. Lu hjl.to...@gmail.com wrote: On Mon, Sep 3, 2012 at 6:28 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther rguent...@suse.de wrote: This adds a new optimization level, -Og, as previously discussed. It aims at providing fast compilation, a superior debugging experience and reasonable runtime performance. Instead of making -O1 this optimization level this adds a new -Og. It's a first cut, highlighting that our fixed pass pipeline and simply enabling/disabling individual passes (but not pass copies for example) doesn't scale to properly differentiate between -Og and -O[23]. -O1 should get similar treatment, eventually just building on -Og but not focusing on debugging experience. That is, I expect that in the end we will at least have two post-IPA optimization pipelines. It also means that you cannot enable PRE or VRP with -Og at the moment because these passes are not anywhere scheduled (similar to the situation with -O0). It has some funny effect on dump-file naming of the pass copies though, which hints at that the current setup is too static. For that reason the new queue comes after the old, to not confuse too many testcases. It also does not yet disable any of the early optimizations that make debugging harder (SRA comes to my mind here, as does switch-conversion and partial inlining). The question arises if we want to support in any reasonable way using profile-feedback or LTO for -O[01g], thus if we rather want to delay some of the early opts to after IPA optimizations. Not bootstrapped or fully tested, but it works for the compile torture. Comments welcome, No comments? Then I'll drop this idea for 4.8. When I debug binutils, I have to use -O0 -g to get precise line and variable info. Also glibc has to be compiled with -O, which makes debug a challenge. 
Will -Og help binutils and glibc debug? I suppose so, but it is hard to tell without knowing more about the issues. The main issues are 1. I need to know precise values for all local variables at all times. That would certainly be a good design goal for -Og (but surely the first cut at it won't do it). 2. Compiler shouldn't inline a function or move lines around. Let's split that. 2. Compiler shouldn't inline a function. Well - we need to inline always_inline functions. And I am positively sure people want trivial C++ abstraction penalty to be removed even with -Og, thus getter/setter methods inlined. Let's say the compiler should not inline a function not declared inline and the compiler should not inline a function if that would increase code size even if it is declared inline? 3. Compiler shouldn't move lines around. A good goal as well, probably RTL pieces are least ready for this. 4. Generated code should be small and fast, compile-time and memory usage should be low. Unless either of them defeats 1. to 3. The patch only provides a starting point and from the GIMPLE side should be reasonably close to the goals above. Richard. I'd add 5. User variables don't have to live in memory (or in any single place), but there should only be one 'live' location at any one time. Changing the value of a variable at a sequence point/statement/line boundary (pick a definition and document it) should affect all subsequent uses of that variable. Values assigned to variables remain available until the declaration goes out of scope. R.
Re: Ping^2: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
On 04/09/12 11:11, Richard Guenther wrote: On Tue, Sep 4, 2012 at 11:56 AM, Bin Cheng bin.ch...@arm.com wrote: -Original Message- From: Richard Earnshaw Sent: Thursday, July 26, 2012 9:19 PM To: Andrew Pinski Cc: Bin Cheng; gcc-patches@gcc.gnu.org Subject: Re: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c On 26/07/12 11:27, Andrew Pinski wrote: On Thu, Jul 26, 2012 at 3:20 AM, Bin Cheng bin.ch...@arm.com wrote: Hi, This patch removes the duplicate check on BRANCH_COST in fold_truth_andor. The BRANCH_COST condition removed is a duplicate of the default definition of LOGICAL_OP_NON_SHORT_CIRCUIT. All current targets (mips and rs6000) that provide non-default definitions of LOGICAL_OP_SHORT_CIRCUIT set it to 0, so this patch is therefore just a code cleanup and does not change behaviour in the compiler. I built mipsel-elf cross compiler and compared newlib/libstdc++ compiled by the patched/original compilers. Is it OK? Just some history here on this. The BRANCH COST check was there before LOGICAL_OP_NON_SHORT_CIRCUIT was added. I will be submitting a patch which changes the MIPS definition soon but it will not be based on the branch cost but rather than another option. So in the end it might not be redundant as it is currently. Thanks, Andrew You can always factor BRANCH_COST into LOGICAL_OP_NON_SHORT_CIRCUIT (as the default currently does), so there's no loss of functionality from removing this currently redundant check. However, the current definition is broken in that it makes it impossible to force the compiler to use this optimization when the branch cost is low. Hi, is this change ok? Or we need more discussion on it? It's not ok (I btw fail to see the patch in this thread). 
The current way LOGICAL_OP_NON_SHORT_CIRCUIT is implemented/used should instead be changed to always match the pattern LOGICAL_OP_NON_SHORT_CIRCUIT && (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) and the default value of LOGICAL_OP_NON_SHORT_CIRCUIT should be 1, defined in defaults.h (and the docs updated). That's not going to work for modern ARM cores. We want to set BRANCH_COST to 1 but still have it generate the non-short-circuit code (because conditional compares are really cheap). R. Richard. Thanks very much.
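The disagreement above can be sketched in a few lines of C. This is only an illustrative model, not GCC code: `struct target_model`, its fields, and both function names are hypothetical stand-ins for the BRANCH_COST and LOGICAL_OP_NON_SHORT_CIRCUIT target macros. The first function models the macro being consulted alone (with a default derived from the branch cost), the second models the macro always being ANDed with the branch-cost test.

```c
#include <stdbool.h>

/* Hypothetical model of a target; these fields are illustrative and are
   not GCC data structures.  */
struct target_model {
  int branch_cost;   /* stands in for BRANCH_COST (...) */
  int nsc_override;  /* -1: target leaves the macro undefined; else 0 or 1 */
};

/* Scheme A: the macro alone decides.  A target definition is used as-is,
   and only the default definition factors in the branch cost.  */
static bool non_short_circuit_macro_only(const struct target_model *t)
{
  if (t->nsc_override >= 0)
    return t->nsc_override != 0;
  return t->branch_cost >= 2;
}

/* Scheme B: the macro (defaulting to 1) is always ANDed with the
   branch-cost test.  */
static bool non_short_circuit_always_and(const struct target_model *t)
{
  int nsc = (t->nsc_override >= 0) ? t->nsc_override : 1;
  return nsc != 0 && t->branch_cost >= 2;
}
```

With a branch cost of 1 and the macro forced to 1 (the ARM case described above), scheme A selects the non-short-circuit form while scheme B cannot, which is the objection raised in the reply.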
Re: [wwwdocs] PATCH for Re: [PATCH] Remove matrix-reorg
On Mon, 3 Sep 2012, Richard Guenther wrote: I'd not mention the command-line flags. I was thinking to point out what to not use any longer, in case. Doesn't that make sense for the release notes? They were not working correctly and they did not work with LTO which made them useless apart from for single-TU programs. How about the following? Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.26
diff -u -3 -p -r1.26 changes.html
--- changes.html 2 Sep 2012 15:56:24 - 1.26
+++ changes.html 4 Sep 2012 10:59:32 -
@@ -38,7 +38,7 @@ explicit use of vector types may be inco
 built with older versions of GCC. Auto-vectorized code is not affected
 by this change.</p>
-<h2>General Optimizer Improvements</h2>
+<h2>General Optimizer Improvements (and Changes)</h2>
 <ul>
 <li>A new option <code>-ftree-partial-pre</code> was added to control
@@ -46,6 +46,11 @@ by this change.</p>
 This option is enabled by default at the <code>-O3</code> optimization level,
 and it makes PRE more aggressive.
 </li>
+<li>The struct reorg and matrix reorg optimizations (command-line
+options <code>-fipa-struct-reorg</code> and
+<code>-fipa-matrix-reorg</code>) have been removed. They did not
+work correctly nor with link-time optimization (LTO), hence were only
+applicable to programs consisting of a single translation unit.</li>
 </ul>
Re: [Patch,avr] PR54461: Better AVR-Libc integration
Richard Guenther wrote: Georg-Johann Lay wrote: Gabriel Dos Reis wrote: Georg-Johann Lay wrote: Gabriel Dos Reis wrote: Georg-Johann Lay wrote: Gabriel Dos Reis wrote: Georg-Johann Lay wrote: AVR-Libc comes with hand-optimized float support functions written in assembler. These functions use the same naming conventions as libgcc. There are situations where these name clashes lead to performance regressions because the functions from libgcc are linked. One example is the new fixed-point support, which converts fixed-point to/from float and references float/int conversion functions from within libgcc. The float implementation in libm.a has been discussed several times, with the only result that it is very unlikely that the code will ever be integrated into libgcc because the original authors are no longer around. And it is much less work to add a new configure switch than to port and integrate the code, given there were no license issues. One point against such an extension was that such a change to the compiler establishes a dependency between the compiler and AVR-Libc, but this decision was made long ago by accepting code that actually should have been added to libgcc -- but was not, for whatever reason. This patch removes those performance regressions by removing the doubly implemented functions from libgcc by means of a new configure option --with-avrlibc. As I stated yesterday, I do not understand why there needs to be yet another configure option. The NATURAL libc for AVR targets is AVR-libc. We should not need a switch for that. There is also newlib that is used with avr-gcc. I know this because some bugs are only triggered for newlib. If there are users that report bugs if avr-gcc is built for newlib, I'd guess these users are actually interested in using newlib. I did not say there was no other libc library. I said that the *natural* libc appears to be AVR-libc. We don't configure GCC/g++ saying --with-libstdc++.
That's a different story because these libraries support in-tree builds just like newlib does. This is not true for AVR-Libc, which does not support in-tree builds. I agree that AVR-Libc is the most common libc implementation used with avr-gcc and it has many advantages over other libc implementations (except that it does not support in-tree builds). I think the in-tree builds thing is a red herring. I don't think so. If there was an in-tree build, gcc could detect by itself whether or not AVR-Libc is present. With the current setup the user has to specify that -- in whatever direction: that libc is there or that libc is not there, depending on whatever is the default. You can do a link check on whether -lc provides those functions and skip those that overlap with libgcc. Can you explain this? A typical build of avr tools goes like 1) configure, build and install binutils 2) configure, build and install the compiler 3) configure, build and install AVR-Libc so that in step 2 no checking is possible because there is no -lc yet. Or do you mean a check at run time (of the compiler)? Johann
Re: [PATCH][RFC] Add -Og
On 4 September 2012 10:45, Richard Guenther rguent...@suse.de wrote: On Mon, 3 Sep 2012, H.J. Lu wrote: On Mon, Sep 3, 2012 at 11:50 AM, rguent...@suse.de wrote: H.J. Lu hjl.to...@gmail.com wrote: On Mon, Sep 3, 2012 at 6:28 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther rguent...@suse.de wrote: This adds a new optimization level, -Og, as previously discussed. It aims at providing fast compilation, a superior debugging experience and reasonable runtime performance. Instead of making -O1 this optimization level this adds a new -Og. It's a first cut, highlighting that our fixed pass pipeline and simply enabling/disabling individual passes (but not pass copies for example) doesn't scale to properly differentiate between -Og and -O[23]. -O1 should get similar treatment, eventually just building on -Og but not focusing on debugging experience. That is, I expect that in the end we will at least have two post-IPA optimization pipelines. It also means that you cannot enable PRE or VRP with -Og at the moment because these passes are not anywhere scheduled (similar to the situation with -O0). It has some funny effect on dump-file naming of the pass copies though, which hints at that the current setup is too static. For that reason the new queue comes after the old, to not confuse too many testcases. It also does not yet disable any of the early optimizations that make debugging harder (SRA comes to my mind here, as does switch-conversion and partial inlining). The question arises if we want to support in any reasonable way using profile-feedback or LTO for -O[01g], thus if we rather want to delay some of the early opts to after IPA optimizations. Not bootstrapped or fully tested, but it works for the compile torture. Comments welcome, No comments? Then I'll drop this idea for 4.8. When I debug binutils, I have to use -O0 -g to get precise line and variable info. 
Also glibc has to be compiled with -O, which makes debug a challenge. Will -Og help binutils and glibc debug? I suppose so, but it is hard to tell without knowing more about the issues. The main issues are 1. I need to know precise values for all local variables at all times. That would certainly be a good design goal for -Og (but surely the first cut at it won't do it). 2. Compiler shouldn't inline a function or move lines around. Let's split that. 2. Compiler shouldn't inline a function. Well, we need to inline always_inline functions. And I am positively sure people want trivial C++ abstraction penalty to be removed even with -Og, thus getter/setter methods inlined. Let's say the compiler should not inline a function not declared inline and the compiler should not inline a function if that would increase code size even if it is declared inline? I don't see a problem with inlining functions under -Og - under a couple of assumptions: * The debug table format can correctly mark inlined functions (DWARF can - I don't know about other formats). * The compiler is executing sequence points in order - and so the function being inlined doesn't 'spill out' into the function it is inlined into. See below for further comments. This should provide enough information to the debugger to allow it to maintain the illusion that an inlined function is a separate function, and enable a user to set breakpoints on all calls to the function. 3. Compiler shouldn't move lines around. A good goal as well, probably RTL pieces are least ready for this. I would change this to say something like (using C language terms): The compiler should provide enough information to allow breakpoints to be set at each sequence point, and that the state of the machine is such that everything before that sequence point will have been completed and that nothing after that sequence point will have been started.
It is probably also possible to argue that there is a case for having points between sequence points where we say the code would be in a good state (let's call them observation points). So for instance we might want to say that in: int x, a, b, c; ... x = a + b * c; If we just say we only promise a known state at sequence points then the compiler is free to use some form of multiply-accumulate instruction here. But a user may want to see the multiply followed by addition split out. So we could define the observation points to be on the *, +, and =. 4. Generated code should be small and fast, compile-time and memory usage should be low. Unless either of these defeats 1. to 3. The patch only provides a starting point and from the GIMPLE side should be reasonably close to the goals above. Richard. Thanks, Matt -- Matthew Gretton-Dann Linaro Toolchain Working Group matthew.gretton-d...@linaro.org
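The observation-point idea above can be pictured with a small C sketch. It is purely illustrative: the function names `fused` and `stepped` and the temporaries `product` and `sum` are made up here, and whether a compiler really emits a multiply-accumulate instruction for the first form depends on the target and options.

```c
/* Sequence points only: the whole statement is one unit, so the compiler
   may evaluate it with a single multiply-accumulate instruction.  */
int fused(int a, int b, int c)
{
  int x;
  x = a + b * c;
  return x;
}

/* Observation points on '*', '+' and '=': each step is written out as its
   own statement, so a debugger could stop after each one.  */
int stepped(int a, int b, int c)
{
  int product = b * c;   /* observation point on '*' */
  int sum = a + product; /* observation point on '+' */
  int x = sum;           /* observation point on '=' */
  return x;
}
```

Both functions compute the same value; the second only makes explicit the intermediate states a user might want to see when stepping.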
Re: [PATCH 3/3] Compute predicates for phi node results in ipa-inline-analysis.c
On Tue, Sep 04, 2012 at 11:27:47AM +0200, Richard Guenther wrote: On Mon, Sep 3, 2012 at 5:52 PM, Jan Hubicka hubi...@ucw.cz wrote: On Fri, Aug 31, 2012 at 7:24 PM, Martin Jambor mjam...@suse.cz wrote: Hi, On Thu, Aug 30, 2012 at 05:11:35PM +0200, Martin Jambor wrote: this is a new version of the patch which makes ipa analysis produce predicates for PHI node results, at least at the bottom of the simplest diamond and semi-diamond CFG subgraphs. This time I also analyze the conditions again rather than extracting information from CFG edges, which means I can reason about substantially more PHI nodes. This patch makes us produce loop bounds hint for the pr48636.f90 testcase. Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin 2012-08-29 Martin Jambor mjam...@suse.cz * ipa-inline-analysis.c (phi_result_unknown_predicate): New function. (predicate_for_phi_result): Likewise. (estimate_function_body_sizes): Use the above two functions. This patch, on top of the one doing loop calculations almost always, introduces a number of testsuite failures which somehow I had not caught during my testing. The problem is that either calculate_dominance_info or loop_optimizer_init introduces new SSA names for which there is no index in nonconstant_names, which is allocated before the dominance and loop computations. I'm currently bootstrapping and testing the following fix which simply allocates the vector after doing the two computations. If it passes I will commit it straight away so that the regression is fixed before I leave for the weekend, I hope it's obvious enough for that. On the other hand, it would really be better if we did not change function bodies during IPA summary generation phase... Um ... we shouldn't do this. Can you track down where it happens? I suppose it might come from CFG manipulations loop_optimizer_init performs when not passing AVOID_CFG_MODIFICATIONS.
I bet it comes from loop normalization :) (i.e. loop closed form or preheader construction both need new SSA names.) I think it would be best to make the pass manager handle this and make loop normalization happen once before all SSA IPA analysis. And compute loops as well. OK, this is now PR 54477 so that we don't forget. Thanks, Martin
Re: [wwwdocs] PATCH for Re: [PATCH] Remove matrix-reorg
On Tue, 4 Sep 2012, Gerald Pfeifer wrote: On Mon, 3 Sep 2012, Richard Guenther wrote: I'd not mention the command-line flags. I was thinking to point out what to not use any longer, in case. Doesn't that make sense for the release notes? They were not working correctly and they did not work with LTO which made them useless apart from for single-TU programs. How about the following? Looks good to me. Thanks, Richard. Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.26
diff -u -3 -p -r1.26 changes.html
--- changes.html 2 Sep 2012 15:56:24 - 1.26
+++ changes.html 4 Sep 2012 10:59:32 -
@@ -38,7 +38,7 @@ explicit use of vector types may be inco
 built with older versions of GCC. Auto-vectorized code is not affected
 by this change.</p>
-<h2>General Optimizer Improvements</h2>
+<h2>General Optimizer Improvements (and Changes)</h2>
 <ul>
 <li>A new option <code>-ftree-partial-pre</code> was added to control
@@ -46,6 +46,11 @@ by this change.</p>
 This option is enabled by default at the <code>-O3</code> optimization level,
 and it makes PRE more aggressive.
 </li>
+<li>The struct reorg and matrix reorg optimizations (command-line
+options <code>-fipa-struct-reorg</code> and
+<code>-fipa-matrix-reorg</code>) have been removed. They did not
+work correctly nor with link-time optimization (LTO), hence were only
+applicable to programs consisting of a single translation unit.</li>
 </ul>

-- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH][RFC] Add -Og
On Tue, Sep 4, 2012 at 4:16 AM, Matthew Gretton-Dann matthew.gretton-d...@linaro.org wrote: On 4 September 2012 10:45, Richard Guenther rguent...@suse.de wrote: On Mon, 3 Sep 2012, H.J. Lu wrote: On Mon, Sep 3, 2012 at 11:50 AM, rguent...@suse.de wrote: H.J. Lu hjl.to...@gmail.com wrote: On Mon, Sep 3, 2012 at 6:28 AM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Aug 10, 2012 at 1:30 PM, Richard Guenther rguent...@suse.de wrote: This adds a new optimization level, -Og, as previously discussed. It aims at providing fast compilation, a superior debugging experience and reasonable runtime performance. Instead of making -O1 this optimization level this adds a new -Og. It's a first cut, highlighting that our fixed pass pipeline and simply enabling/disabling individual passes (but not pass copies for example) doesn't scale to properly differentiate between -Og and -O[23]. -O1 should get similar treatment, eventually just building on -Og but not focusing on debugging experience. That is, I expect that in the end we will at least have two post-IPA optimization pipelines. It also means that you cannot enable PRE or VRP with -Og at the moment because these passes are not anywhere scheduled (similar to the situation with -O0). It has some funny effect on dump-file naming of the pass copies though, which hints at that the current setup is too static. For that reason the new queue comes after the old, to not confuse too many testcases. It also does not yet disable any of the early optimizations that make debugging harder (SRA comes to my mind here, as does switch-conversion and partial inlining). The question arises if we want to support in any reasonable way using profile-feedback or LTO for -O[01g], thus if we rather want to delay some of the early opts to after IPA optimizations. Not bootstrapped or fully tested, but it works for the compile torture. Comments welcome, No comments? Then I'll drop this idea for 4.8. 
When I debug binutils, I have to use -O0 -g to get precise line and variable info. Also glibc has to be compiled with -O, which makes debug a challenge. Will -Og help binutils and glibc debug? I suppose so, but it is hard to tell without knowing more about the issues. The main issues are 1. I need to know precise values for all local variables at all times. That would certainly be a good design goal for -Og (but surely the first cut at it won't do it). It will be harder to use it to debug binutils. 2. Compiler shouldn't inline a function or move lines around. Let's split that. 2. Compiler shouldn't inline a function. Well, we need to inline always_inline functions. And I am positively sure people want trivial C++ abstraction penalty to be removed even with -Og, thus getter/setter methods inlined. Let's say the compiler should not inline a function not declared inline and the compiler should not inline a function if that would increase code size even if it is declared inline? I don't see a problem with inlining functions under -Og - under a couple of assumptions: * The debug table format can correctly mark inlined functions (DWARF can - I don't know about other formats). * The compiler is executing sequence points in order - and so the function being inlined doesn't 'spill out' into the function it is inlined into. See below for further comments. This should provide enough information to the debugger to allow it to maintain the illusion that an inlined function is a separate function, and enable a user to set breakpoints on all calls to the function. It works for me. 3. Compiler shouldn't move lines around. A good goal as well, probably RTL pieces are least ready for this.
I would change this to say something like (using C language terms): The compiler should provide enough information to allow breakpoints to be set at each sequence point, and that the state of the machine is such that everything before that sequence point will have been completed and that nothing after that sequence point will have been started. It is probably also possible to argue that there is a case for having points between sequence points where we say the code would be in a good state (let's call them observation points). So for instance we might want to say that in: int x, a, b, c; ... x = a + b * c; If we just say we only promise a known state at sequence points then the compiler is free to use some form of multiply-accumulate instruction here. But a user may want to see the multiply followed by addition split out. So we could define the observation points to be on the *, +, and =. The problem I run into is that "next" in gdb can go backward within the same function when compiled with optimization. It makes it harder for me to use breakpoints to track where/when the problem happens. -- H.J.
[PATCH] Fix bogus use of cfun in gen_subprogram_die and premark_used_types
Hi, while looking into how to remove push/pop_cfun from dwarf2out.c, I noticed the following wrong use of cfun in premark_used_types, which is the first thing called by gen_subprogram_die. What happens is that:

1. The early inliner calls dwarf2out_abstract_function; cfun corresponds to the function being inlined into, and the argument decl is the function being inlined.
2. dwarf2out_abstract_function calls gen_type_die_for_member to generate an in-class declaration DIE. It does this before changing cfun.
3. gen_type_die_for_member calls gen_subprogram_die because member is a function decl.
4. gen_subprogram_die calls premark_used_types to mark the DIEs of all types in cfun->used_types_hash as perennial. But cfun does not correspond to the decl it is supposed to be emitting a DIE for; instead, the used_types of the function that decl is being inlined into are marked as perennial.

Similarly, when dealing with nested functions, gen_subprogram_die can call itself through decls_for_scope, just with a different decl parameter but unchanged cfun. I was not able to produce a failing testcase similar to gcc.dg/20060410.c, mainly because dwarf2out_abstract_function then changes cfun and indirectly invokes gen_subprogram_die again, but still I believe the intention was to use DECL_STRUCT_FUNCTION(decl) rather than cfun in premark_used_types and everywhere in gen_subprogram_die. The patch below does exactly that and, as far as my experiments go, seems to work.

This patch also removes push/pop cfun from dwarf2out_abstract_function and only leaves the change of current_function_decl. Richi suggested that we push NULL cfun at this point but my goal is to enforce that cfun and current_function_decl match at each push_cfun, and since dwarf2out_abstract_function can call itself, that is not the case. Nevertheless, I also bootstrapped, tested and compiled Firefox with a version in which I do push_cfun(NULL) when cfun is not already NULL and there were no problems.
Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin

2012-08-30 Martin Jambor mjam...@suse.cz

 * dwarf2out.c (dwarf2out_abstract_function): Do not change cfun.
 (premark_used_types): New parameter fun, use it instead of cfun.
 (gen_subprogram_die): Use DECL_STRUCT_FUNCTION (decl) instead of cfun,
 also pass it to premark_used_types.

*** /tmp/3GCMxa_dwarf2out.c Tue Sep 4 15:10:23 2012
--- gcc/dwarf2out.c Mon Sep 3 14:48:02 2012
*** dwarf2out_abstract_function (tree decl)
*** 16765,16771
    /* Pretend we've just finished compiling this function. */
    save_fn = current_function_decl;
    current_function_decl = decl;
-   push_cfun (DECL_STRUCT_FUNCTION (decl));
    was_abstract = DECL_ABSTRACT (decl);
    set_decl_abstract_flags (decl, 1);
--- 16765,16770
*** dwarf2out_abstract_function (tree decl)
*** 16779,16785
    call_arg_locations = old_call_arg_locations;
    call_site_count = old_call_site_count;
    tail_call_site_count = old_tail_call_site_count;
-   pop_cfun ();
  }
  /* Helper function of premark_used_types() which gets called through
--- 16778,16783
*** premark_types_used_by_global_vars_helper
*** 16838,16847
  /* Mark all members of used_types_hash as perennial. */
  static void
! premark_used_types (void)
  {
!   if (cfun && cfun->used_types_hash)
!     htab_traverse (cfun->used_types_hash, premark_used_types_helper, NULL);
  }
  /* Mark all members of types_used_by_vars_entry as perennial. */
--- 16836,16845
  /* Mark all members of used_types_hash as perennial. */
  static void
! premark_used_types (struct function *fun)
  {
!   if (fun && fun->used_types_hash)
!     htab_traverse (fun->used_types_hash, premark_used_types_helper, NULL);
  }
  /* Mark all members of types_used_by_vars_entry as perennial. */
*** gen_subprogram_die (tree decl, dw_die_re
*** 16904,16910
    int declaration = (current_function_decl != decl
                       || class_or_namespace_scope_p (context_die));
!   premark_used_types ();
    /* It is possible to have both DECL_ABSTRACT and DECLARATION be true if we
       started to generate the abstract instance of an inline, decided to output
--- 16902,16908
    int declaration = (current_function_decl != decl
                       || class_or_namespace_scope_p (context_die));
!   premark_used_types (DECL_STRUCT_FUNCTION (decl));
    /* It is possible to have both DECL_ABSTRACT and DECLARATION be true if we
       started to generate the abstract instance of an inline, decided to output
*** gen_subprogram_die (tree decl, dw_die_re
*** 17067,17079
    else if (!DECL_EXTERNAL (decl))
      {
        HOST_WIDE_INT cfa_fb_offset;
        if (!old_die || !get_AT (old_die, DW_AT_inline))
          equate_decl_number_to_die (decl, subr_die);
        if (!flag_reorder_blocks_and_partition)
          {
!
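The cfun problem described in this message can be illustrated with a small stand-alone C model. Everything here is hypothetical: `struct function_model`, `struct decl_model`, `cfun_model` and the `*_old`/`*_new` function names are simplified stand-ins for GCC's `struct function`, tree decls, `cfun` and the patched functions, and the `marked` flag stands in for marking type DIEs as perennial.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's struct function and decl.  */
struct function_model {
  int has_used_types;  /* models a non-NULL used_types_hash */
  int marked;          /* set when this function's types were "premarked" */
};

struct decl_model {
  struct function_model *fn;  /* models DECL_STRUCT_FUNCTION (decl) */
};

/* Models the global "current function" pointer.  */
static struct function_model *cfun_model;

/* Before: silently operates on whatever the global points at.  */
static void premark_used_types_old(void)
{
  if (cfun_model && cfun_model->has_used_types)
    cfun_model->marked = 1;
}

/* After: the caller says which function it means.  */
static void premark_used_types_new(struct function_model *fun)
{
  if (fun && fun->has_used_types)
    fun->marked = 1;
}

static void gen_subprogram_die_old(struct decl_model *decl)
{
  (void) decl;  /* the decl is ignored, which is the bug being modelled */
  premark_used_types_old();
}

static void gen_subprogram_die_new(struct decl_model *decl)
{
  premark_used_types_new(decl->fn);
}
```

If the global still points at the function being inlined into while a DIE is generated for the inlined function, the old variant marks the wrong function's types; the new variant marks the types of the decl it was given.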
[PATCH] Simplify FRE parts of PRE, try to save some memory
Currently compute_avail consumes an unreasonable amount of memory in the FRE case for PR46590. The following patch makes some obvious adjustments but does not cure the underlying issue. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard.

2012-09-04 Richard Guenther rguent...@suse.de

 * tree-ssa-pre.c (add_to_exp_gen): Adjust.
 (make_values_for_phi): Do not add to PHI_GEN for FRE.
 (compute_avail): Stop processing after adding all defs to
 AVAIL_OUT for FRE.
 (init_pre): Do not allocate not needed bitmap sets for FRE.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c (revision 190918)
--- gcc/tree-ssa-pre.c (working copy)
*** insert (void)
*** 3786,3799
  static void
  add_to_exp_gen (basic_block block, tree op)
  {
!   if (!in_fre)
!     {
!       pre_expr result;
!       if (TREE_CODE (op) == SSA_NAME && ssa_undefined_value_p (op))
!         return;
!       result = get_or_alloc_expr_for_name (op);
!       bitmap_value_insert_into_set (EXP_GEN (block), result);
!     }
  }
  /* Create value ids for PHI in BLOCK. */
--- 3786,3800
  static void
  add_to_exp_gen (basic_block block, tree op)
  {
!   pre_expr result;
!
!   gcc_checking_assert (!in_fre);
!
!   if (TREE_CODE (op) == SSA_NAME && ssa_undefined_value_p (op))
!     return;
!
!   result = get_or_alloc_expr_for_name (op);
!   bitmap_value_insert_into_set (EXP_GEN (block), result);
  }
  /* Create value ids for PHI in BLOCK. */
*** make_values_for_phi (gimple phi, basic_b
*** 3805,3827
    /* We have no need for virtual phis, as they don't represent
       actual computations. */
!   if (!virtual_operand_p (result))
      {
!       pre_expr e = get_or_alloc_expr_for_name (result);
!       add_to_value (get_expr_value_id (e), e);
        bitmap_insert_into_set (PHI_GEN (block), e);
!       bitmap_value_insert_into_set (AVAIL_OUT (block), e);
!       if (!in_fre)
          {
!           unsigned i;
!           for (i = 0; i < gimple_phi_num_args (phi); ++i)
              {
!               tree arg = gimple_phi_arg_def (phi, i);
!               if (TREE_CODE (arg) == SSA_NAME)
!                 {
!                   e = get_or_alloc_expr_for_name (arg);
!                   add_to_value (get_expr_value_id (e), e);
!                 }
              }
          }
      }
--- 3806,3828
    /* We have no need for virtual phis, as they don't represent
       actual computations. */
!   if (virtual_operand_p (result))
!     return;
!
!   pre_expr e = get_or_alloc_expr_for_name (result);
!   add_to_value (get_expr_value_id (e), e);
!   bitmap_value_insert_into_set (AVAIL_OUT (block), e);
!   if (!in_fre)
      {
!       unsigned i;
        bitmap_insert_into_set (PHI_GEN (block), e);
!       for (i = 0; i < gimple_phi_num_args (phi); ++i)
          {
!           tree arg = gimple_phi_arg_def (phi, i);
!           if (TREE_CODE (arg) == SSA_NAME)
              {
!               e = get_or_alloc_expr_for_name (arg);
!               add_to_value (get_expr_value_id (e), e);
              }
          }
      }
*** compute_avail (void)
*** 3934,3939
--- 3935,3944
            bitmap_value_insert_into_set (AVAIL_OUT (block), e);
          }
+       /* That's all we need to do when doing FRE. */
+       if (in_fre)
+         continue;
+
        if (gimple_has_side_effects (stmt) || stmt_could_throw_p (stmt))
          continue;
*** compute_avail (void)
*** 3992,3999
            get_or_alloc_expression_id (result);
            add_to_value (get_expr_value_id (result), result);
!           if (!in_fre)
!             bitmap_value_insert_into_set (EXP_GEN (block), result);
          }
        continue;
      }
--- 3997,4003
            get_or_alloc_expression_id (result);
            add_to_value (get_expr_value_id (result), result);
!           bitmap_value_insert_into_set (EXP_GEN (block), result);
          }
        continue;
      }
*** compute_avail (void)
*** 4105,4112
        get_or_alloc_expression_id (result);
        add_to_value (get_expr_value_id (result), result);
!       if (!in_fre)
!         bitmap_value_insert_into_set (EXP_GEN (block), result);
        continue;
      }
--- 4109,4115
        get_or_alloc_expression_id (result);
        add_to_value (get_expr_value_id (result), result);
!       bitmap_value_insert_into_set (EXP_GEN (block), result);
        continue;
      }
*** my_rev_post_order_compute (int *post_ord
*** 4733,4747
        src = ei_edge (ei)->src;
        dest = ei_edge (ei)->dest;
!       /* Check if the edge destination has been visited yet. */
        if (src !=
MAINTAINERS (Write After Approval): Add myself.
Hi all, I have added my name in the Write After Approval section, with the attached patch. Christophe.

Index: ChangeLog
===
--- ChangeLog (revision 190926)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2012-09-04 Christophe Lyon christophe.l...@st.com
+
+ * MAINTAINERS (Write After Approval): Add myself.
+
 2012-09-03 Richard Guenther rguent...@suse.de

 PR bootstrap/54138
Index: MAINTAINERS
===
--- MAINTAINERS (revision 190926)
+++ MAINTAINERS (working copy)
@@ -439,6 +439,7 @@
 Manuel López-Ibáñez m...@gcc.gnu.org
 Martin v. Löwis loe...@informatik.hu-berlin.de
 H.J. Lu hjl.to...@gmail.com
 Xinliang David Li davi...@google.com
+Christophe Lyon christophe.l...@st.com
 Luis Machado luis...@br.ibm.com
 Ziga Mahkovec ziga.mahko...@klika.si
 Simon Martin simar...@users.sourceforge.net
Re: Re-implement VEC_* to be member functions of vec_t<T>
On Fri, Aug 24, 2012 at 2:32 PM, Diego Novillo dnovi...@google.com wrote: On 2012-08-23 23:08 , Diego Novillo wrote: I've tested this patch on x86_64 and ppc64 with all languages plus ada, go and obj-c++. I am going to be offline for several days starting on Saturday, so I will not commit it until I return. I've also done memory and time comparisons to make sure I didn't change behaviour. No differences. I've now committed the patch. Diego.
Fix bootstrap failure with Sun linker
The generated libstdc++-symbols.ver-sun cannot be parsed by the linker anymore. Bootstrapped on SPARC64/Solaris 9 and SPARC/Solaris 10, applied on the mainline.

2012-09-04 Eric Botcazou ebotca...@adacore.com

 * make_sunver.pl: Add missing newline at the end of extern "C++" block.

-- Eric Botcazou

Index: make_sunver.pl
===
--- make_sunver.pl (revision 190863)
+++ make_sunver.pl (working copy)
@@ -185,7 +185,7 @@ while (<F>) {
     $glob = 'glob';
     if ($in_extern) {
         $in_extern--;
-        print "$1##$2";
+        print "$1##$2\n";
     } else {
         print;
     }
Re: [Patch,avr] PR54461: Better AVR-Libc integration
On Tue, Sep 4, 2012 at 1:01 PM, Georg-Johann Lay a...@gjlay.de wrote: Richard Guenther wrote: Georg-Johann Lay wrote: Gabriel Dos Reis wrote: Georg-Johann Lay wrote: Gabriel Dos Reis wrote: Georg-Johann Lay wrote: Gabriel Dos Reis wrote: Georg-Johann Lay wrote: AVR-Libc comes with hand-optimized float support functions written in assembler. These functions use the same naming conventions as libgcc. There are situations where these name clashes lead to performance regressions because the functions from libgcc are linked. One example is the new fixed-point support, which converts fixed-point to/from float and references float/int conversion functions from within libgcc. The float implementation in libm.a has been discussed several times, with the only result that it is very unlikely that the code will ever be integrated into libgcc because the original authors are no longer around. And it is much less work to add a new configure switch than to port and integrate the code, given there were no license issues. One point against such an extension was that such a change to the compiler establishes a dependency between the compiler and AVR-Libc, but this decision was made long ago by accepting code that actually should have been added to libgcc -- but was not, for whatever reason. This patch removes those performance regressions by removing the doubly implemented functions from libgcc by means of a new configure option --with-avrlibc. As I stated yesterday, I do not understand why there needs to be yet another configure option. The NATURAL libc for AVR targets is AVR-libc. We should not need a switch for that. There is also newlib that is used with avr-gcc. I know this because some bugs are only triggered for newlib. If there are users that report bugs if avr-gcc is built for newlib, I'd guess these users are actually interested in using newlib. I did not say there was no other libc library. I said that the *natural* libc appears to be AVR-libc.
We don't configure GCC/g++ saying --with-libstdc++. That's a different story because these libraries support in-tree builds just like newlib does. This is not true for AVR-Libc, which does not support in-tree builds. I agree that AVR-Libc is the most common libc implementation used with avr-gcc and it has many advantages over other libc implementations (except that it does not support in-tree builds). I think the in-tree builds thing is a red herring. I don't think so. If there was an in-tree build, gcc could detect by itself whether or not AVR-Libc is present. With the current setup the user has to specify that -- in whatever direction: that libc is there or that libc is not there, depending on whatever is the default. You can do a link check on whether -lc provides those functions and skip those that overlap with libgcc. Can you explain this? A typical build of avr tools goes like 1) configure, build and install binutils 2) configure, build and install the compiler 3) configure, build and install AVR-Libc so that in step 2 no checking is possible because there is no -lc yet. Or do you mean a check at run time (of the compiler)? 4) build and install the real compiler, at which time you have AVR-libc available. At least that's how you bootstrap a glibc cross. Richard. Johann
Re: [PATCH] Simplify FRE parts of PRE, try to save some memory
On Tue, Sep 4, 2012 at 3:19 PM, Richard Guenther rguent...@suse.de wrote: Currently compute_avail consumes an unreasonable amount of memory in the FRE case for PR46590. The following patch makes some obvious adjustments but does not cure the underlying issue. I don't think there's any way to cure the underlying issue; it's just the result of having SSA form that so many values are available. You can improve the representation of the sets (e.g. something similar to the views of the tree-ssa-live machinery) but that's it. Ciao! Steven
Re: [PATCH] Simplify FRE parts of PRE, try to save some memory
On Tue, 4 Sep 2012, Steven Bosscher wrote: On Tue, Sep 4, 2012 at 3:19 PM, Richard Guenther rguent...@suse.de wrote: Currently compute_avail consumes an unreasonable amount of memory in the FRE case for PR46590. The following patch makes some obvious adjustments but does not cure the underlying issue. I don't think there's any way to cure the underlying issue; it's just the result of having SSA form that so many values are available. You can improve the representation of the sets (e.g. something similar to the views of the tree-ssa-live machinery) but that's it. One idea is that we only need AVAIL for the block we are currently doing elimination on (well, for FRE - PRE is another story). And we need AVAIL only for values we are going to replace. The first can be exploited by a domwalk unifying elimination and AVAIL computation for FRE; the 2nd one is harder ;) At least it sounds like a reason to finally split FRE from PRE ... Richard.
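To make the first idea above concrete, here is a toy sketch (not GCC code) of a dominator-tree walk that keeps availability only for the current dominator path. Everything in it is an assumption for illustration: the `toy_block` layout, the notion that each statement computes exactly one small integer "value number", and the function name `toy_eliminate`. The real pass works on SSA names and value handles and is considerably more involved.

```c
#include <assert.h>

/* Hypothetical toy IR: each statement computes exactly one value number.  */
enum { MAX_VALUES = 16, MAX_STMTS = 8, MAX_KIDS = 4 };

struct toy_block {
    int nstmts;
    int value_of[MAX_STMTS];           /* value number computed by stmt i */
    int nkids;
    struct toy_block *kids[MAX_KIDS];  /* children in the dominator tree */
};

/* Walk the dominator tree once.  avail[v] is nonzero while a leader for
   value v dominates the current point.  Entries added by a block are
   undone when the walk leaves that block, so only the current dominator
   path is ever recorded.  Returns the number of statements whose value
   already had a leader, i.e. the ones a real pass could replace.  */
static int toy_eliminate(const struct toy_block *bb,
                         unsigned char avail[MAX_VALUES])
{
    int added[MAX_STMTS];
    int nadded = 0;
    int eliminated = 0;

    assert(bb->nstmts <= MAX_STMTS && bb->nkids <= MAX_KIDS);

    for (int i = 0; i < bb->nstmts; i++) {
        int v = bb->value_of[i];
        assert(v >= 0 && v < MAX_VALUES);
        if (avail[v]) {
            eliminated++;             /* a dominating leader exists */
        } else {
            avail[v] = 1;             /* this statement becomes the leader */
            added[nadded++] = v;
        }
    }

    for (int k = 0; k < bb->nkids; k++)
        eliminated += toy_eliminate(bb->kids[k], avail);

    while (nadded > 0)                /* leave the block: drop its leaders */
        avail[added[--nadded]] = 0;

    return eliminated;
}
```

The point of the sketch is only the memory shape: one shared table that grows and shrinks with the walk, instead of a full AVAIL set stored per block.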
Re: Ping^2: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
On Tue, Sep 4, 2012 at 12:53 PM, Richard Earnshaw rearn...@arm.com wrote: On 04/09/12 11:11, Richard Guenther wrote: On Tue, Sep 4, 2012 at 11:56 AM, Bin Cheng bin.ch...@arm.com wrote: -Original Message- From: Richard Earnshaw Sent: Thursday, July 26, 2012 9:19 PM To: Andrew Pinski Cc: Bin Cheng; gcc-patches@gcc.gnu.org Subject: Re: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c On 26/07/12 11:27, Andrew Pinski wrote: On Thu, Jul 26, 2012 at 3:20 AM, Bin Cheng bin.ch...@arm.com wrote: Hi, This patch removes the duplicate check on BRANCH_COST in fold_truth_andor. The BRANCH_COST condition removed is a duplicate of the default definition of LOGICAL_OP_NON_SHORT_CIRCUIT. All current targets (mips and rs6000) that provide non-default definitions of LOGICAL_OP_NON_SHORT_CIRCUIT set it to 0, so this patch is just a code cleanup and does not change behaviour in the compiler. I built a mipsel-elf cross compiler and compared newlib/libstdc++ compiled by the patched/original compilers. Is it OK? Just some history here on this. The BRANCH_COST check was there before LOGICAL_OP_NON_SHORT_CIRCUIT was added. I will be submitting a patch which changes the MIPS definition soon, but it will not be based on the branch cost but rather on another option. So in the end it might not be redundant as it is currently. Thanks, Andrew You can always factor BRANCH_COST into LOGICAL_OP_NON_SHORT_CIRCUIT (as the default currently does), so there's no loss of functionality from removing this currently redundant check. However, the current definition is broken in that it makes it impossible to force the compiler to use this optimization when the branch cost is low. Hi, is this change ok? Or do we need more discussion on it? It's not ok (I btw fail to see the patch in this thread).
The current way LOGICAL_OP_NON_SHORT_CIRCUIT is implemented/used should instead be changed to always match the pattern LOGICAL_OP_NON_SHORT_CIRCUIT && (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) and the default value of LOGICAL_OP_NON_SHORT_CIRCUIT should be 1, defined in defaults.h (and the docs updated). That's not going to work for modern ARM cores. We want to set BRANCH_COST to 1 but still have it generate the non-short-circuit code (because conditional compares are really cheap). Then you define LOGICAL_OP_NON_SHORT_CIRCUIT to zero. The above would be an identity transform for all targets currently, so it is not working for modern ARM cores anyway. Richard. R. Richard. Thanks very much.
Re: [middle-end] Add machine_mode to address_cost target hook
On Tue, Sep 4, 2012 at 12:38 PM, Paolo Bonzini bonz...@gnu.org wrote:
On 04/09/2012 09:52, Oleg Endo wrote:

[x] target-independent bits
[ ] alpha       [ ] arm         [ ] avr         [ ] bfin
[ ] cr16        [ ] cris        [ ] epiphany    [ ] i386
[ ] ia64        [ ] iq2000      [ ] lm32        [ ] m32c
[ ] m32r        [ ] mcore       [ ] mep         [x] microblaze
[x] mips        [ ] mmix        [x] mn10300     [ ] pa
[ ] rs6000      [ ] rx          [ ] s390        [ ] score
[x] sh          [ ] sparc       [ ] spu         [ ] stormy16
[ ] v850        [ ] vax         [ ] xtensa

Tested with 'make all-gcc' on SH xgcc and i386 native build. No functional changes, except on MIPS, as requested by Richard Sandiford.

I think you only need explicit approval for mn10300. All other changes are trivial.

+hook_int_rtx_mode_as_bool_0 (rtx, enum machine_mode, addr_space_t, bool)

So we're using C++ already? Or do we want ATTRIBUTE_UNUSED here?

Use C++ where it is so nicely obvious an improvement ;)

Richard.

Paolo
Re: Ping^2: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
On 04/09/12 15:31, Richard Guenther wrote: It's not ok (I btw fail to see the patch in this thread).
The current way LOGICAL_OP_NON_SHORT_CIRCUIT is implemented/used should instead be changed to always match the pattern LOGICAL_OP_NON_SHORT_CIRCUIT && (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) and the default value of LOGICAL_OP_NON_SHORT_CIRCUIT should be 1, defined in defaults.h (and the docs updated). That's not going to work for modern ARM cores. We want to set BRANCH_COST to 1 but still have it generate the non-short-circuit code (because conditional compares are really cheap). Then you define LOGICAL_OP_NON_SHORT_CIRCUIT to zero. The above would be an identity transform for all targets currently, so it is not working for modern ARM cores anyway. No, that's backwards. That gives us branches around compares, not formation of or'ed cflag values that we can then transform into conditional compares. R. Richard. R. Richard. Thanks very much.
Re: Ping^2: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
On Tue, Sep 4, 2012 at 4:33 PM, Richard Earnshaw rearn...@arm.com wrote: On 04/09/12 15:31, Richard Guenther wrote: It's not ok (I btw fail to see the patch in this thread).
The current way LOGICAL_OP_NON_SHORT_CIRCUIT is implemented/used should instead be changed to always match the pattern LOGICAL_OP_NON_SHORT_CIRCUIT && (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) and the default value of LOGICAL_OP_NON_SHORT_CIRCUIT should be 1, defined in defaults.h (and the docs updated). That's not going to work for modern ARM cores. We want to set BRANCH_COST to 1 but still have it generate the non-short-circuit code (because conditional compares are really cheap). Then you define LOGICAL_OP_NON_SHORT_CIRCUIT to zero. The above would be an identity transform for all targets currently, so it is not working for modern ARM cores anyway. No, that's backwards. That gives us branches around compares, not formation of or'ed cflag values that we can then transform into conditional compares. I see. So I suppose for that reason the original patch is ok. Thanks, Richard. R. Richard. R. Richard. Thanks very much.
Re: [middle-end] Add machine_mode to address_cost target hook
On Sep 4, 2012, Oleg Endo oleg.e...@t-online.de wrote:

* config/mn10300/mn10300.c (mn10300_address_cost): Add machine_mode and address space arguments. Use GET_MODE (x) and ADDR_SPACE_GENERIC in recursive invocation.

Ok with a change, see below.

* config/sh/sh.c (sh_address_cost): Likewise.

Ok, thanks.

Index: gcc/config/mn10300/mn10300.c
-      total = mn10300_address_cost (XEXP (x, 0), speed);
+      total = mn10300_address_cost (XEXP (x, 0), GET_MODE (x),
+                                    ADDR_SPACE_GENERIC, speed);

Instead of ADDR_SPACE_GENERIC, this should be MEM_ADDR_SPACE (x), no?

--
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: [Patch,avr] PR54461: Better AVR-Libc integration
On Tue, Sep 4, 2012 at 9:17 AM, Richard Guenther richard.guent...@gmail.com wrote: Can you explain this? A typical build of avr tools goes like 1) configure, build and install binutils 2) configure, build and install the compiler 3) configure, build and install AVR-Libc so that in step 2 no checking is possible because there is no -lc yet. Or do you mean a check at run time (of the compiler)? 4) build and install the real compiler at which time you have AVR-libc available. At least that's how you bootstrap a glibc cross. avr-gcc has had a simplified build process for a while, as it almost never needed to have an avr-gcc hosted on an avr platform. It is usually built as a cross-compiler that always runs on the build platform. What I was suggesting earlier is that we shouldn't continue patching the AVR target as if the current state is almost ideal. Pick a libc -- avr-libc appears to be the natural implementation -- and make it the default as opposed to adding knobs. -- Gaby
RE: [Patch,avr] PR54461: Better AVR-Libc integration
-Original Message- From: dosr...@gmail.com On Behalf Of Gabriel Dos Reis Sent: Tuesday, September 04, 2012 9:08 AM To: Richard Guenther Cc: Georg-Johann Lay; gcc-patches@gcc.gnu.org; Denis Chertykov; Weddington, Eric; Joerg Wunsch Subject: Re: [Patch,avr] PR54461: Better AVR-Libc integration On Tue, Sep 4, 2012 at 9:17 AM, Richard Guenther richard.guent...@gmail.com wrote: Can you explain this? A typical build of avr tools goes like 1) configure, build and install binutils 2) configure, build and install the compiler 3) configure, build and install AVR-Libc so that in step 2 no checking is possible because there is no -lc yet. Or do you mean a check at run time (of the compiler)? 4) build and install the real compiler at which time you have AVR-libc available. At least that's how you bootstrap a glibc cross. avr-gcc has had a simplified build process for a while, as it almost never needed to have an avr-gcc hosted on an avr platform. It is usually built as a cross-compiler that always runs on the build platform. What I was suggesting earlier is that we shouldn't continue patching the AVR target as if the current state is almost ideal. Pick a libc -- avr-libc appears to be the natural implementation -- and make it the default as opposed to adding knobs. I also strongly agree with this. AFAIK, the only project that uses newlib as the C library for the AVR target is RTEMS, because, AIUI, they need to have the POSIX interface. The vast majority of AVR users have a toolchain that uses avr-libc. Eric Weddington
Fix bootstrap with release checking
PR bootstrap/54479
* vec.h (vec_t::copy): Add cast in call to reserve_exact.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3d20ebd..c605432 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2012-09-04  Diego Novillo  dnovi...@google.com
+
+        PR bootstrap/54479
+        * vec.h (vec_t::copy): Add cast in call to reserve_exact.
+
 2012-09-04  Richard Guenther  rguent...@suse.de

         * tree-ssa-pre.c (add_to_exp_gen): Adjust.
diff --git a/gcc/vec.h b/gcc/vec.h
index 74a85c7..ac426e9 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -699,7 +699,8 @@ vec_t<T>::copy (ALONE_MEM_STAT_DECL)

   if (len)
     {
-      new_vec = vec_t<T>::reserve_exact<A> (NULL, len PASS_MEM_STAT);
+      new_vec = vec_t<T>::reserve_exact<A> (static_cast<vec_t<T> *> (NULL),
+                                            len PASS_MEM_STAT);
       new_vec->embedded_init (len, len);
       memcpy (new_vec->address (), vec_, sizeof (T) * len);
     }
Re: Ping^2: [PATCH]Remove duplicate check on BRANCH_COST in fold-const.c
Sorry, I mis-sent this offline.

On Tue, Sep 4, 2012 at 11:00 PM, Bin.Cheng amker.ch...@gmail.com wrote:

It's not ok (I btw fail to see the patch in this thread). The current way LOGICAL_OP_NON_SHORT_CIRCUIT is implemented/used should instead be changed to always match the pattern LOGICAL_OP_NON_SHORT_CIRCUIT && (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2) and the default value of LOGICAL_OP_NON_SHORT_CIRCUIT should be 1, defined in defaults.h (and the docs updated).

That's not going to work for modern ARM cores. We want to set BRANCH_COST to 1 but still have it generate the non-short-circuit code (because conditional compares are really cheap).

Hi Richard,

For now, the LOGICAL_OP_NON_SHORT_CIRCUIT macro is defined as below, which duplicates the BRANCH_COST condition.

#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
#define LOGICAL_OP_NON_SHORT_CIRCUIT \
  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
                false) >= 2)
#endif

Recently we measured performance on some ARM processors and found it would be better to have the non-short-circuit optimization while setting BRANCH_COST to 1, which is impossible with the present code. So here comes this patch:

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c    (revision 189835)
+++ gcc/fold-const.c    (working copy)
@@ -8443,9 +8443,7 @@
   if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0)
     return tem;

-  if ((BRANCH_COST (optimize_function_for_speed_p (cfun),
-                    false) >= 2)
-      && LOGICAL_OP_NON_SHORT_CIRCUIT
+  if (LOGICAL_OP_NON_SHORT_CIRCUIT
       && (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
           || code == TRUTH_OR_EXPR

The purpose is to remove the duplicate check on BRANCH_COST. As Andrew pointed out, the patch may change behavior if some back ends define the macro independently of BRANCH_COST. After looking into the code, there are two uses of the macro in fold-const.c; each controls one kind of code transformation.

The first use is:

  else if (LOGICAL_OP_NON_SHORT_CIRCUIT
           && lhs != 0 && rhs != 0
           && (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
           && operand_equal_p (lhs, rhs, 0))

The second one is:

  if ((BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2)
      && LOGICAL_OP_NON_SHORT_CIRCUIT
      && (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
          || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR))

I am not sure why the 2nd condition is designed in the current way, and I haven't found any useful changelog on it. But considering that a back end can factor BRANCH_COST into LOGICAL_OP_NON_SHORT_CIRCUIT or not, we can conclude that the behavior will only change if some back end wants to control the two transformations differently. So the problem becomes whether the 2nd condition should be changed. Either way there is a scenario that cannot be covered. And for now, FTR, only two targets redefine L_O_N_S_C: mips and rs6000. Both set it to zero, so they won't be affected by this change.

--
Best Regards.
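The argument about when the extra BRANCH_COST test matters can be checked with a small model. The sketch below is not GCC code: `toy_target`, `toy_lonsc`, `toy_old_check` and `toy_new_check` are made-up names, and the model assumes a target either takes the default definition (branch cost >= 2) or overrides the macro with a constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target's settings; not a GCC structure.  */
struct toy_target {
    int branch_cost;      /* what BRANCH_COST (...) would evaluate to */
    bool has_override;    /* target defines LOGICAL_OP_NON_SHORT_CIRCUIT */
    bool override_value;  /* the value of that target definition */
};

/* Models LOGICAL_OP_NON_SHORT_CIRCUIT: the target's own definition if
   there is one, otherwise the default "branch cost >= 2".  */
static bool toy_lonsc(const struct toy_target *t)
{
    assert(t != NULL);
    return t->has_override ? t->override_value : t->branch_cost >= 2;
}

/* Models the second use before the patch: an explicit BRANCH_COST test
   in front of the macro.  */
static bool toy_old_check(const struct toy_target *t)
{
    return t->branch_cost >= 2 && toy_lonsc(t);
}

/* Models the second use after the patch: the macro alone.  */
static bool toy_new_check(const struct toy_target *t)
{
    return toy_lonsc(t);
}
```

In this model the two checks agree for every default target and for every target that overrides the macro to zero (the mips and rs6000 case). They differ only for an override of one combined with a branch cost below 2, which is the configuration the ARM measurements above are asking for.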
Re: [PATCH] Set correct source location for deallocator calls
On Thu, Aug 30, 2012 at 9:33 AM, Richard Henderson r...@redhat.com wrote: On 08/30/2012 08:20 AM, Andrew Haley wrote: Is the problem simply that the logic to scan the assembly code isn't present in the libgcj testsuite? Yes, exactly. For this case, I don't think that we want a testcase to rely on addr2line in the system. So how about we add a test when assembly scan is available in the libgcj testsuite? Thanks, Dehao r~
Re: [PATCH] Set correct source location for deallocator calls
On 09/04/2012 05:07 PM, Dehao Chen wrote: On Thu, Aug 30, 2012 at 9:33 AM, Richard Henderson r...@redhat.com wrote: On 08/30/2012 08:20 AM, Andrew Haley wrote: Is the problem simply that the logic to scan the assembly code isn't present in the libgcj testsuite? Yes, exactly. For this case, I don't think that we want a testcase to rely on addr2line in the system. So how about we add a test when assembly scan is available in the libgcj testsuite? Fine by me. I guess you can just copy the scanning code from the gcc testsuite. Andrew.
Re: [PATCH] Set correct source location for deallocator calls
On Tue, Sep 4, 2012 at 5:07 PM, Dehao Chen de...@google.com wrote: On Thu, Aug 30, 2012 at 9:33 AM, Richard Henderson r...@redhat.com wrote: On 08/30/2012 08:20 AM, Andrew Haley wrote: Is the problem simply that the logic to scan the assembly code isn't present in the libgcj testsuite? Yes, exactly. For this case, I don't think that we want a testcase to rely on addr2line in the system. So how about we add a test when assembly scan is available in the libgcj testsuite? Once Ian Lance Taylor's libbacktrace patch is integrated (see: http://gcc.gnu.org/ml/gcc/2012-08/msg00317.html), we'll be able to get rid of the code that calls addr2line from libgcj. So, I think it would be fine to write a Java test case using Throwable.getStackTrace(). Whichever approach is easiest for you is fine. Bryce
Re: [PATCH] Set correct source location for deallocator calls
On 09/04/2012 05:32 PM, Bryce McKinlay wrote: On Tue, Sep 4, 2012 at 5:07 PM, Dehao Chen de...@google.com wrote: On Thu, Aug 30, 2012 at 9:33 AM, Richard Henderson r...@redhat.com wrote: On 08/30/2012 08:20 AM, Andrew Haley wrote: Is the problem simply that the logic to scan the assembly code isn't present in the libgcj testsuite? Yes, exactly. For this case, I don't think that we want a testcase to rely on addr2line in the system. So how about we add a test when assembly scan is available in the libgcj testsuite? Once Ian Lance Taylor's libbacktrace patch is integrated (see: http://gcc.gnu.org/ml/gcc/2012-08/msg00317.html), we'll be able to get rid of the code that calls addr2line from libgcj. As I understand it, Ian Taylor's backtrace patch is intended for use in gcc development, and as he puts it, "Since its use in GCC would be purely for GCC developers, it's not essential that it be fully portable." Not for gcj runtime. Andrew.
[PATCH] Clarify gcc-{ar,nm,ranlib} usage in the documentation
From: Andi Kleen a...@linux.intel.com

Make it clear in the documentation that with -fno-fat-lto-objects the gcc-* wrappers should be used to pass the linker plugin.

gcc/:

2012-09-04  Andi Kleen  a...@linux.intel.com

        * doc/invoke.texi (-ffat-lto-objects): Clarify that gcc-ar et al. should be used.
---
 gcc/doc/invoke.texi |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6cf7cec..197803d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8138,7 +8138,9 @@ requires the complete toolchain to be aware of LTO. It requires a linker with
 linker plugin support for basic functionality. Additionally,
 @command{nm}, @command{ar} and @command{ranlib}
 need to support linker plugins to allow a full-featured build environment
-(capable of building static libraries etc).
+(capable of building static libraries etc). gcc provides the @command{gcc-ar},
+@command{gcc-nm}, @command{gcc-ranlib} wrappers to pass the right options
+to these tools. With non fat LTO makefiles need to be modified to use them.

 The default is @option{-ffat-lto-objects} but this default is intended to
 change in future releases when linker plugin enabled environments become more
--
1.7.7
[PATCH] Reduce memory usage for storing LTO decl resolutions
From: Andi Kleen a...@linux.intel.com

With a LTO build of a large project (11k subfiles incrementally linked) storing the LTO resolutions took over 0.5GB memory:

lto/lto.c:1087 (lto_resolution_read)    0: 0.0%    540398500    15903: 0.0%

The reason is that the declaration indexes are quite sparse, but every subfile got a full continuous vector for them. Since there are so many of them the many vectors add up.

This patch instead stores the resolutions initially in a compact (index, resolution) format. This is only expanded into a sparse vector for fast lookup when the subfile is actually read, but then immediately freed. This means only one vector is allocated at a time.

This brings the overhead for this down to less than 3MB for the test case:

lto/lto.c:1087 (lto_resolution_read)    0: 0.0%    2821456    42186: 0.0%

Passed bootstrap and test suite on x86_64-linux. Ok for 4.8 and possibly for 4.7?

-Andi

2012-09-04  Andi Kleen  a...@linux.intel.com

        * gcc/lto-streamer.h (res_pair): Add.
        (lto_file_decl_data): Replace resolutions with respairs. Add max_index.
        * gcc/lto/lto.c (lto_resolution_read): Remove max_index. Add rp. Initialize respairs.
        (lto_file_finalize): Set up resolutions vector lazily from respairs.
---
 gcc/lto-streamer.h | 15 ++-
 gcc/lto/lto.c      | 31 ++-
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index bed408a..19a35cb 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -513,6 +513,18 @@ typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
 DEF_VEC_P(lto_out_decl_state_ptr);
 DEF_VEC_ALLOC_P(lto_out_decl_state_ptr, heap);

+/* Compact representation of a index - resolution pair. Unpacked to an
+   vector later. */
+struct res_pair
+{
+  ld_plugin_symbol_resolution_t res;
+  unsigned index;
+};
+typedef struct res_pair res_pair;
+
+DEF_VEC_P(res_pair);
+DEF_VEC_ALLOC_P(res_pair, heap);
+
 /* One of these is allocated for each object file that being compiled
    by lto. This structure contains the tables that are needed by the
    serialized functions and ipa passes to connect themselves to the
@@ -548,7 +560,8 @@ struct GTY(()) lto_file_decl_data
   unsigned HOST_WIDE_INT id;

   /* Symbol resolutions for this file */
-  VEC(ld_plugin_symbol_resolution_t,heap) * GTY((skip)) resolutions;
+  VEC(res_pair, heap) * GTY((skip)) respairs;
+  unsigned max_index;

   struct gcov_ctr_summary GTY((skip)) profile_info;
 };
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index bd91c39..5da5412 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1012,7 +1012,6 @@ lto_resolution_read (splay_tree file_ids, FILE *resolution, lto_file *file)
   unsigned int num_symbols;
   unsigned int i;
   struct lto_file_decl_data *file_data;
-  unsigned max_index = 0;
   splay_tree_node nd = NULL;

   if (!resolution)
@@ -1054,13 +1053,12 @@ lto_resolution_read (splay_tree file_ids, FILE *resolution, lto_file *file)
       unsigned int j;
       unsigned int lto_resolution_str_len =
         sizeof (lto_resolution_str) / sizeof (char *);
+      res_pair rp;

       t = fscanf (resolution, "%u " HOST_WIDE_INT_PRINT_HEX_PURE " %26s %*[^\n]\n",
                   &index, &id, r_str);
       if (t != 3)
         internal_error ("invalid line in the resolution file");
-      if (index > max_index)
-        max_index = index;

       for (j = 0; j < lto_resolution_str_len; j++)
         {
@@ -1082,11 +1080,13 @@ lto_resolution_read (splay_tree file_ids, FILE *resolution, lto_file *file)
         }

       file_data = (struct lto_file_decl_data *)nd->value;
-      VEC_safe_grow_cleared (ld_plugin_symbol_resolution_t, heap,
-                             file_data->resolutions,
-                             max_index + 1);
-      VEC_replace (ld_plugin_symbol_resolution_t,
-                   file_data->resolutions, index, r);
+      /* The indexes are very sparse. To save memory save them in a compact
+         format that is only unpacked later when the subfile is processed. */
+      rp.res = r;
+      rp.index = index;
+      VEC_safe_push (res_pair, heap, file_data->respairs, rp);
+      if (file_data->max_index < index)
+        file_data->max_index = index;
     }
 }

@@ -1166,6 +1166,18 @@ lto_file_finalize (struct lto_file_decl_data *file_data, lto_file *file)
 {
   const char *data;
   size_t len;
+  VEC(ld_plugin_symbol_resolution_t,heap) *resolutions = NULL;
+  int i;
+  res_pair *rp;
+
+  /* Create vector for fast access of resolution. We do this lazily
+     to save memory. */
+  VEC_safe_grow_cleared (ld_plugin_symbol_resolution_t, heap,
+                         resolutions,
+                         file_data->max_index + 1);
+  for (i = 0; VEC_iterate (res_pair, file_data->respairs, i, rp); i++)
+    VEC_replace (ld_plugin_symbol_resolution_t, resolutions, rp->index, rp->res);
+  VEC_free
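The storage scheme described in this patch (keep compact pairs while reading, expand to a directly indexed table only when a subfile is processed) can be sketched outside of GCC's VEC machinery. The code below is only an illustration in plain C: `toy_pair`, `toy_file`, `toy_record` and `toy_expand` are hypothetical names, and an `int` stands in for `ld_plugin_symbol_resolution_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical compact record: one entry per symbol actually seen.  */
struct toy_pair {
    unsigned index;
    int res;                 /* stand-in for the resolution enum */
};

struct toy_file {
    struct toy_pair *pairs;  /* grows with the number of symbols seen */
    size_t npairs;
    size_t cap;
    unsigned max_index;      /* largest index recorded so far */
};

/* Record one (index, resolution) pair.  Cost is proportional to the
   number of symbols, not to the largest index.  Returns 0 on success,
   -1 if memory could not be allocated.  */
static int toy_record(struct toy_file *f, unsigned index, int res)
{
    assert(f != NULL);
    if (f->npairs == f->cap) {
        size_t ncap = f->cap ? f->cap * 2 : 8;
        struct toy_pair *p = realloc(f->pairs, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        f->pairs = p;
        f->cap = ncap;
    }
    f->pairs[f->npairs].index = index;
    f->pairs[f->npairs].res = res;
    f->npairs++;
    if (f->max_index < index)
        f->max_index = index;
    return 0;
}

/* Expand the pairs into a zero-filled table indexed directly by symbol
   index.  Meant to be called for one file at a time; the caller frees
   the result as soon as that file has been processed.  */
static int *toy_expand(const struct toy_file *f)
{
    assert(f != NULL);
    int *dense = calloc((size_t) f->max_index + 1, sizeof *dense);
    if (dense == NULL)
        return NULL;
    for (size_t i = 0; i < f->npairs; i++)
        dense[f->pairs[i].index] = f->pairs[i].res;
    return dense;
}
```

With many files and sparse indexes, the long-lived memory is the pair arrays; the large table exists only between `toy_expand` and the matching `free`.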
Re: [Ping]RE: [Patch, test] Enable to prune warnings for tests defined in one exp file
On Sep 3, 2012, at 11:05 PM, Terry Guo terry@arm.com wrote: Is it ok to document this feature in README.gcc? Sure. I was almost hoping someone had a pointer to a wiki page that had new bits... Is it ok to back port this feature to 4.7 branch? Ok.
Re: [PATCH] Set correct source location for deallocator calls
On Tue, Sep 4, 2012 at 5:39 PM, Andrew Haley a...@redhat.com wrote: As I understand it, Ian Taylor's backtrace patch is intended for use in gcc development, and as he puts it, "Since its use in GCC would be purely for GCC developers, it's not essential that it be fully portable." Not for gcj runtime. He's also planning to use it for libgo, and other gcc runtime libs have indicated interest. It doesn't have to work on all platforms, and I can't see how it would be any less portable than addr2line! Bryce
Re: [PATCH] Set correct source location for deallocator calls
On 09/04/2012 06:08 PM, Bryce McKinlay wrote: He's also planning to use it for libgo, and other gcc runtime libs have indicated interest. It doesn't have to work on all platforms, and I can't see how it would be any less portable than addr2line! I certainly can. Maybe once it's shaken down so it's at least as robust as what we have now it'll be OK. I suspect it hasn't had much testing with, for example, unwinding through signal handlers. Andrew.
Re: [PATCH] Set correct source location for deallocator calls
On Tue, Sep 4, 2012 at 6:12 PM, Andrew Haley a...@redhat.com wrote: He's also planning to use it for libgo, and other gcc runtime libs have indicated interest. It doesn't have to work on all platforms, and I can't see how it would be any less portable than addr2line! I certainly can. Maybe once it's shaken-down so it's at least as robust as what we have now it'll be OK. I suspect it hasn't had much testing with, for example, unwinding through signal handlers. libgcj wouldn't actually use it for unwinding, we already have all that. We'd just use it to read DWARF debug info and give us the source code line numbers.
re: [google/gcc-4_7] Fix GDB test suite regression with -fdebug-types-section patch
On Sat, Sep 1, 2012 at 1:44 PM, gcc-patches-digest-h...@gcc.gnu.org wrote: From: ccout...@google.com (Cary Coutant) To: d...@google.com, gcc-patches@gcc.gnu.org Cc: Date: Fri, 31 Aug 2012 16:55:40 -0700 (PDT) Subject: [google/gcc-4_7] Fix GDB test suite regression with -fdebug-types-section patch This patch is for the google/gcc-4_7 branch. This patch fixes a problem caused by the previous patch that removed the code to copy children of a DIE referenced by a type unit. I don't believe that it's necessary to copy the children of the class declaration at all, and this patch simply removes the code that copies those children. If there's a reference in the type unit to one of the children of that class, that one child will get copied in as needed. The problem was that it IS necessary to copy the children of a non-declaration -- such as a DW_TAG_array_type. I've restored the loop that calls clone_tree_partial, but placed it within a test for is_declaration_die. Bootstraps and passes regression tests. Also tested with parts of the GDB testsuite, and is still able to build a large internal test case that previously resulted in out-of-memory during compilation. Google ref b/7041390. 2012-08-31 Cary Coutant ccout...@google.com * gcc/dwarf2out.c (clone_tree_partial): Restore. (copy_decls_walk): Call clone_tree_partial to copy children of non-declaration DIEs. This is OK for google branches.
Re: [PATCH] Set correct source location for deallocator calls
On 09/04/2012 06:17 PM, Bryce McKinlay wrote: On Tue, Sep 4, 2012 at 6:12 PM, Andrew Haley a...@redhat.com wrote: He's also planning to use it for libgo, and other gcc runtime libs have indicated interest. It doesn't have to work on all platforms, and I can't see how it would be any less portable than addr2line! I certainly can. Maybe once it's shaken-down so it's at least as robust as what we have now it'll be OK. I suspect it hasn't had much testing with, for example, unwinding through signal handlers. libgcj wouldn't actually use it for unwinding, we already have all that. We'd just use it to read DWARF debug info and give us the source code line numbers. OK, as long as that's all it does. I think I was perhaps a bit misled by its description of a stack backtrace library. It certainly looks like a nicer approach than addr2line, but is going to be much less well-ported. I guess we'll see how it goes. Andrew.
Re: [PATCH][RFC] Fixing instability of -fschedule-insns for x86
On Mon, Aug 13, 2012 at 9:39 PM, Igor Zamyatin izamya...@gmail.com wrote: Main idea of this activity is mostly to provide user a possibility to safely turn on first scheduler for his codes. In some cases this could positively affect performance, especially for in-order Atom. It would be great to hear some feedback from the community about the change. I don't think it is necessary to set dependence for CALL_INSN arguments. It seems to me, that it is enough to set scheduling priority of moves to hard registers to zero, to schedule them as late as possible, presumably just before call insn. The attached patch builds on your idea of setting priorities of moves from hard registers to pseudos to maximum (these are moves from function arguments, they should be scheduled as soon as possible to free hard registers). Please note that it is enough to handle only likely spilled hard regs (for moves from and to registers), since these regs are causing all the troubles. The patch assumes that likely spilled hard regs didn't propagate to other instructions, and that other hard registers didn't propagate to operands with wrong constraints (recent x86 improvement). Unfortunately, the patch doesn't fix PR 54472 (the spill failure with selective scheduler). No matter what TARGET_SCHED_ADJUST_PRIORITY returns, the offending move to ax register always get scheduled before problematic string instruction. The patch however builds on promise from the documentation that: -- Target Hook: int TARGET_SCHED_ADJUST_PRIORITY (rtx INSN, int PRIORITY) This hook adjusts the integer scheduling priority PRIORITY of INSN. It should return the new priority. Increase the priority to execute INSN earlier, reduce the priority to execute INSN later. Do not define this hook if you do not need to adjust the scheduling priorities of insns. The patch is in RFC state, but survives quite some -fschedule-insns testing on current mainline, with and without added -fsched-pressure. Uros. 
Index: i386.c
===
--- i386.c (revision 190932)
+++ i386.c (working copy)
@@ -24314,6 +24314,49 @@ ix86_sched_reorder(FILE *dump, int sched_verbose,
   return issue_rate;
 }

+/* Before reload, adjust priority of moves to/from likely spilled
+   hard registers.  This reduces hard register life times and consequently
+   the chance of spill failures for enclosed instructions.  */
+static int
+ix86_adjust_priority (rtx insn, int priority)
+{
+  rtx set;
+
+  if (reload_completed)
+    return priority;
+
+  if (!NONJUMP_INSN_P (insn))
+    return priority;
+
+  set = single_set (insn);
+
+  if (set)
+    {
+      rtx tmp;
+
+      /* Set priority of moves from likely spilled hard registers to maximum,
+         to schedule them as soon as possible.  These are moves from
+         function argument registers at the top of the function entry.  */
+      tmp = SET_SRC (set);
+      if (REG_P (tmp)
+          && HARD_REGISTER_P (tmp)
+          && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (tmp))
+          && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (tmp))))
+        return current_sched_info->sched_max_insns_priority;
+
+      /* Set priority of moves to likely spilled hard registers to minimum,
+         to schedule them as late as possible.  These are moves to
+         function argument registers before function call.  */
+      tmp = SET_DEST (set);
+      if (REG_P (tmp)
+          && HARD_REGISTER_P (tmp)
+          && ! TEST_HARD_REG_BIT (fixed_reg_set, REGNO (tmp))
+          && targetm.class_likely_spilled_p (REGNO_REG_CLASS (REGNO (tmp))))
+        return 0;
+    }
+
+  return priority;
+}

 /* Model decoder of Core 2/i7.
@@ -39608,6 +39651,8 @@ ix86_enum_va_list (int idx, const char **pname, tr
 #define TARGET_SCHED_REASSOCIATION_WIDTH ix86_reassociation_width
 #undef TARGET_SCHED_REORDER
 #define TARGET_SCHED_REORDER ix86_sched_reorder
+#undef TARGET_SCHED_ADJUST_PRIORITY
+#define TARGET_SCHED_ADJUST_PRIORITY ix86_adjust_priority

 /* The size of the dispatch window is the total number of bytes of
    object code allowed in a window.  */
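For readers who want to experiment with the decision logic outside of GCC, here is a minimal standalone sketch of the rule the hook applies. It is only a model under stated assumptions: Operand, Move, SchedContext, is_spill_prone and adjust_priority are hypothetical names made up for illustration and do not exist in GCC, and the real hook works on RTL via single_set, REG_P and friends.

```cpp
// Hypothetical, simplified stand-ins for GCC's RTL operands.
struct Operand {
  bool is_hard_reg;           // models REG_P && HARD_REGISTER_P
  bool is_fixed;              // models membership in fixed_reg_set
  bool class_likely_spilled;  // models targetm.class_likely_spilled_p
};

// A single-set move; models what single_set () would hand back.
struct Move {
  Operand src;
  Operand dest;
};

struct SchedContext {
  bool reload_completed;
  int max_priority;  // models current_sched_info->sched_max_insns_priority
};

// True for the registers the patch cares about: non-fixed hard registers
// whose class is likely to be spilled.
static bool is_spill_prone(const Operand &op) {
  return op.is_hard_reg && !op.is_fixed && op.class_likely_spilled;
}

// Returns the adjusted priority.  MOVE is a null pointer when the insn is
// not a simple single-set instruction.
int adjust_priority(const SchedContext &ctx, const Move *move, int priority) {
  if (ctx.reload_completed || move == 0)
    return priority;
  if (is_spill_prone(move->src))
    return ctx.max_priority;  // copy out of the hard register as early as possible
  if (is_spill_prone(move->dest))
    return 0;                 // load the hard register as late as possible
  return priority;
}
```

In this model a move out of an argument register gets the maximum priority, a move into one gets priority 0, and everything else (including any insn after reload) keeps the priority it was given.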
[RFC] PowerPC / rs6000 call glue removal
Segher and I are planning to remove the machinery supporting RS6000_CALL_GLUE. In the AIX ABI, used by AIX, PowerOpen, PPC64 Linux and mcall-aixdesc, direct calls to named functions that may be external are followed by a special no-op instruction that the linker can replace with an instruction to restore the TOC addressability register. This no-op instruction changed over time prior to the PowerPC architecture: initially cror 15,15,15, then cror 31,31,31 and finally settling on the PowerPC nop instruction. PPC64 Linux only has ever used the PowerPC nop instruction. All versions of AIX targeting PowerPC use the PowerPC nop instruction. All current configurations of GCC targeting PPC64 Linux and AIX only generate the nop instruction. The machinery is not used for the PPC32 SVR4 ABI and eABI. With the recent removal of support for the original POWER instruction set from GCC, we propose to remove the machinery to control the no-op instruction emitted by GCC. The linkers will continue to handle the older no-op instructions in any old object files. However, the default value of RS6000_CALL_GLUE, which almost all targets override, is cror 31,31,31. The only place this potentially still could be used is in the old mcall-aixdesc embedded mode originally designed by Cygnus for a customer who desired an embedded mode with AIX compatibility. We do not believe there are any remaining users of this option, but we want to distribute this announcement widely to ensure that anyone affected has an opportunity to comment. http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01849.html As the saying goes: Speak now, or forever hold your peace. Thanks, David
Re: [Patch,avr] PR54461: Better AVR-Libc integration
Weddington, Eric wrote: Can you explain this? A typical build of avr tools goes like 1) configure, build and install binutils 2) configure, build and install the compiler 3) configure, build and install AVR-Libc so that in step 2 no checking is possible because there is no -lc yet. Or do you mean a check at run time (of the compiler)? 4) build and install the real compiler at which time you have AVR-libc available. AT least that's how you bootstrap a glibc cross. avr-gcc has had a simplified build process for a while, as it almost never needed to have a avr-gcc hosted on an avr platform. It is usually built as a cross-compiler that always run on the build platform. What I was suggesting earlier is that we shouldn't continue patching the AVR target as if the current state is almost ideal. Pick a libc -- avr- libc appears to be the natural implementation -- and make it the default as opposed to adding nobs. I also strongly agree with this. AFAIK, the only project that uses newlib as the C library for the AVR target is RTEMS, because, AIUI, they need to have the POSIX interface. The vast majority of AVR users have a toolchain that uses avr-libc. So here is an updated version of the patch. Instead of with_avrlibc = yes it does with_avrlibc != no. Just like the first version, --with-avrlibc[=*] is only recognized if avr-gcc is not configured for RTEMS, i.e. RTEMS users don't need to set --with-avrlibc=no in order to get a complete libgcc. Johann -- PR target/54461 * configure.ac (noconfigdirs,target=avr-*-*): Add target-newlib, target-libgloss if not configured --with-avrlibc=no. * configure: Regenerate. libgcc/ PR target/54461 * config.host (tmake_file,host=avr-*-*): Add avr/t-avrlibc if not configured --with-avrlibc=no. * config/avr/t-avrlibc: New file. * Makefile.in (FPBIT_FUNCS): filter-out LIB2FUNCS_EXCLUDE. (DPBIT_FUNCS): Ditto. (TPBIT_FUNCS): Ditto. gcc/ PR target/54461 * config.gcc (tm_file,target=avr-*-*): Add avr/avrlibc.h if not configured --with-avrlibc=no. 
(tm_defines,target=avr-*-*): Add WITH_AVRLIBC if not configured --with-avrlibc=no. * config/avr/avrlibc.h: New file. * config/avr/avr-c.c: Build-in define __WITH_AVRLIBC__ if not configured --with-avrlibc=no. Index: configure === --- configure (revision 190922) +++ configure (working copy) @@ -3499,6 +3499,13 @@ case ${target} in arm-*-riscix*) noconfigdirs=$noconfigdirs ld target-libgloss ;; + avr-*-rtems*) +;; + avr-*-*) +if test x${with_avrlibc} != xno; then + noconfigdirs=$noconfigdirs target-newlib target-libgloss +fi +;; c4x-*-* | tic4x-*-*) noconfigdirs=$noconfigdirs target-libgloss ;; Index: configure.ac === --- configure.ac (revision 190922) +++ configure.ac (working copy) @@ -891,6 +891,13 @@ case ${target} in arm-*-riscix*) noconfigdirs=$noconfigdirs ld target-libgloss ;; + avr-*-rtems*) +;; + avr-*-*) +if test x${with_avrlibc} != xno; then + noconfigdirs=$noconfigdirs target-newlib target-libgloss +fi +;; c4x-*-* | tic4x-*-*) noconfigdirs=$noconfigdirs target-libgloss ;; Index: libgcc/config/avr/t-avrlibc === --- libgcc/config/avr/t-avrlibc (revision 0) +++ libgcc/config/avr/t-avrlibc (revision 0) @@ -0,0 +1,66 @@ +# This file is used if not configured --with-avrlibc=no +# +# AVR-Libc comes with hand-optimized float routines. +# For historical reasons, these routines live in AVR-Libc +# and not in libgcc and use the same function names like libgcc. +# To get the best support, i.e. always use the routines from +# AVR-Libc, we remove these routines from libgcc. +# +# See also PR54461. 
+# +# +# Arithmetic: +# __addsf3 __subsf3 __divsf3 __mulsf3 __negsf2 +# +# Comparison: +# __cmpsf2 __unordsf2 +# __eqsf2 __lesf2 __ltsf2 __nesf2 __gesf2 __gtsf2 +# +# Conversion: +# __fixsfdi __fixunssfdi __floatdisf __floatundisf +# __fixsfsi __fixunssfsi __floatsisf __floatunsisf +# +# +# These functions are contained in modules: +# +# _addsub_sf.o: __addsf3 __subsf3 +# _mul_sf.o: __mulsf3 +# _div_sf.o: __divsf3 +# _negate_sf.o: __negsf2 +# +# _compare_sf.o: __cmpsf2 +# _unord_sf.o:__unordsf2 +# _eq_sf.o: __eqsf2 +# _ne_sf.o: __nesf2 +# _ge_sf.o: __gesf2 +# _gt_sf.o: __gtsf2 +# _le_sf.o: __lesf2 +# _lt_sf.o: __ltsf2 +# +# _fixsfdi.o: __fixsfdi +# _fixunssfdi.o: __fixunssfdi +# _fixunssfsi.o: __fixunssfsi +# _floatdisf.o: __floatdisf +# _floatundisf.o: __floatundisf +# _sf_to_si.o:__fixsfsi +# _si_to_sf.o:__floatsisf +# _usi_to_sf.o: __floatunsisf + + +# SFmode +LIB2FUNCS_EXCLUDE += \
RE: [Patch,avr] PR54461: Better AVR-Libc integration
-Original Message- From: Georg-Johann Lay [] Sent: Tuesday, September 04, 2012 12:00 PM To: Weddington, Eric Cc: Gabriel Dos Reis; Richard Guenther; gcc-patches@gcc.gnu.org; Denis Chertykov; Joerg Wunsch Subject: Re: [Patch,avr] PR54461: Better AVR-Libc integration So here is an updated version of the patch. Instead of with_avrlibc = yes it does with_avrlibc != no. Just like the first version, --with-avrlibc[=*] is only recognized if avr-gcc is not configured for RTEMS, i.e. RTEMS users don't need to set --with-avrlibc=no in order to get a complete libgcc. Sorry, I'm a bit confused. With your new patch... - If I build GCC, for the avr target (plain), without specifying the --with-avr-libc= switch, does it default to yes? - If I build GCC, for the avr-rtems target, without specifying the --with-avr-libc= switch, does it default to no? Because the above is what I would expect the default behavior to be. Doing that would certainly help with backwards compatibility for those building toolchain distributions. I would think that the user has to specify the --with-avr-libc= flag to explicitly deviate from common usage and practice. Eric Weddington
Re: [Patch,avr] PR54461: Better AVR-Libc integration
On Tue, Sep 4, 2012 at 1:00 PM, Georg-Johann Lay a...@gjlay.de wrote: Weddington, Eric wrote: Can you explain this? A typical build of avr tools goes like 1) configure, build and install binutils 2) configure, build and install the compiler 3) configure, build and install AVR-Libc so that in step 2 no checking is possible because there is no -lc yet. Or do you mean a check at run time (of the compiler)? 4) build and install the real compiler at which time you have AVR-libc available. AT least that's how you bootstrap a glibc cross. avr-gcc has had a simplified build process for a while, as it almost never needed to have a avr-gcc hosted on an avr platform. It is usually built as a cross-compiler that always run on the build platform. What I was suggesting earlier is that we shouldn't continue patching the AVR target as if the current state is almost ideal. Pick a libc -- avr- libc appears to be the natural implementation -- and make it the default as opposed to adding nobs. I also strongly agree with this. AFAIK, the only project that uses newlib as the C library for the AVR target is RTEMS, because, AIUI, they need to have the POSIX interface. The vast majority of AVR users have a toolchain that uses avr-libc. So here is an updated version of the patch. Instead of with_avrlibc = yes it does with_avrlibc != no. Just like the first version, --with-avrlibc[=*] is only recognized if avr-gcc is not configured for RTEMS, i.e. RTEMS users don't need to set --with-avrlibc=no in order to get a complete libgcc. Thanks! I am satisfied with this. -- Gaby
Fix PR 54478 - Work around g++ 4.3 parsing bug
This patch works around a parsing problem with g++ 4.3. The parser is failing to look up calls to the template function reserve when called from other member functions:

  vec_t<T>::reserve<A> (...)

The parser thinks that the '<' in reserve<A> is a less-than operation. This problem does not happen after 4.3. This code is going to change significantly, so this won't be needed soon.

Tested on x86_64 with g++ 4.3 and g++ 4.6.

Diego.

PR bootstrap/54478
* vec.h (vec_t::alloc): Remove explicit type specification in call to reserve.
(vec_t::copy): Likewise.
(vec_t::reserve): Likewise.
(vec_t::reserve_exact): Likewise.
(vec_t::safe_splice): Likewise.
(vec_t::safe_push): Likewise.
(vec_t::safe_grow): Likewise.
(vec_t::safe_grow_cleared): Likewise.
(vec_t::safe_insert): Likewise.

diff --git a/gcc/vec.h b/gcc/vec.h
index ac426e9..c0f1bb2 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -655,7 +655,7 @@ template<enum vec_allocation_t A>
 vec_t<T> *
 vec_t<T>::alloc (int nelems MEM_STAT_DECL)
 {
-  return vec_t<T>::reserve_exact<A> ((vec_t<T> *) NULL, nelems PASS_MEM_STAT);
+  return reserve_exact<A> ((vec_t<T> *) NULL, nelems PASS_MEM_STAT);
 }

 template<typename T>
@@ -699,8 +699,8 @@ vec_t<T>::copy (ALONE_MEM_STAT_DECL)

   if (len)
     {
-      new_vec = vec_t<T>::reserve_exact<A> (static_cast<vec_t<T> *> (NULL),
-                                             len PASS_MEM_STAT);
+      new_vec = reserve_exact<A> (static_cast<vec_t<T> *> (NULL),
+                                  len PASS_MEM_STAT);
       new_vec->embedded_init (len, len);
       memcpy (new_vec->address (), vec_, sizeof (T) * len);
     }
@@ -736,7 +736,7 @@ vec_t<T>::reserve (vec_t<T> **vec, int nelems VEC_CHECK_DECL MEM_STAT_DECL)
   bool extend = (*vec) ? !(*vec)->space (nelems VEC_CHECK_PASS) : nelems != 0;

   if (extend)
-    *vec = vec_t<T>::reserve<A> (*vec, nelems PASS_MEM_STAT);
+    *vec = reserve<A> (*vec, nelems PASS_MEM_STAT);

   return extend;
 }
@@ -755,7 +755,7 @@ vec_t<T>::reserve_exact (vec_t<T> **vec, int nelems VEC_CHECK_DECL
   bool extend = (*vec) ? !(*vec)->space (nelems VEC_CHECK_PASS) : nelems != 0;

   if (extend)
-    *vec = vec_t<T>::reserve_exact<A> (*vec, nelems PASS_MEM_STAT);
+    *vec = reserve_exact<A> (*vec, nelems PASS_MEM_STAT);

   return extend;
 }
@@ -796,8 +796,7 @@ vec_t<T>::safe_splice (vec_t<T> **dst, vec_t<T> *src VEC_CHECK_DECL
 {
   if (src)
     {
-      vec_t<T>::reserve_exact<A> (dst, VEC_length (T, src) VEC_CHECK_PASS
-                                  MEM_STAT_INFO);
+      reserve_exact<A> (dst, VEC_length (T, src) VEC_CHECK_PASS MEM_STAT_INFO);
       (*dst)->splice (src VEC_CHECK_PASS);
     }
 }
@@ -843,7 +842,7 @@ template<enum vec_allocation_t A>
 T &
 vec_t<T>::safe_push (vec_t<T> **vec, T obj VEC_CHECK_DECL MEM_STAT_DECL)
 {
-  vec_t<T>::reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
+  reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
   return (*vec)->quick_push (obj VEC_CHECK_PASS);
 }

@@ -858,7 +857,7 @@ template<enum vec_allocation_t A>
 T *
 vec_t<T>::safe_push (vec_t<T> **vec, const T *ptr VEC_CHECK_DECL MEM_STAT_DECL)
 {
-  vec_t<T>::reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
+  reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
   return (*vec)->quick_push (ptr VEC_CHECK_PASS);
 }

@@ -898,8 +897,8 @@ vec_t<T>::safe_grow (vec_t<T> **vec, int size VEC_CHECK_DECL MEM_STAT_DECL)
 {
   VEC_ASSERT (size >= 0 && VEC_length (T, *vec) <= (unsigned)size,
               "grow", T, A);
-  vec_t<T>::reserve_exact<A> (vec, size - (int)VEC_length (T, *vec)
-                              VEC_CHECK_PASS PASS_MEM_STAT);
+  reserve_exact<A> (vec, size - (int)VEC_length (T, *vec)
+                    VEC_CHECK_PASS PASS_MEM_STAT);
   (*vec)->prefix_.num_ = size;
 }

@@ -915,7 +914,7 @@ vec_t<T>::safe_grow_cleared (vec_t<T> **vec, int size VEC_CHECK_DECL MEM_STAT_DECL)
 {
   int oldsize = VEC_length (T, *vec);
-  vec_t<T>::safe_grow<A> (vec, size VEC_CHECK_PASS PASS_MEM_STAT);
+  safe_grow<A> (vec, size VEC_CHECK_PASS PASS_MEM_STAT);
   memset (&((*vec)->address ()[oldsize]), 0, sizeof (T) * (size - oldsize));
 }

@@ -972,7 +971,7 @@ void
 vec_t<T>::safe_insert (vec_t<T> **vec, unsigned ix, T obj VEC_CHECK_DECL
                        MEM_STAT_DECL)
 {
-  vec_t<T>::reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
+  reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
   (*vec)->quick_insert (ix, obj VEC_CHECK_PASS);
 }

@@ -988,7 +987,7 @@ void
 vec_t<T>::safe_insert (vec_t<T> **vec, unsigned ix, T *ptr VEC_CHECK_DECL
                        MEM_STAT_DECL)
 {
-  vec_t<T>::reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
+  reserve<A> (vec, 1 VEC_CHECK_PASS PASS_MEM_STAT);
   (*vec)->quick_insert (ix, ptr VEC_CHECK_PASS);
 }
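To see the two spellings side by side without the rest of vec.h, the following reduced sketch may help. It is not taken from GCC: small_vec, alloc_kind, push_qualified and push_unqualified are hypothetical names, and the claim that g++ 4.3 trips over the qualified form comes from the message above rather than from this example. Both forms should be accepted by current compilers.

```cpp
#include <cstddef>

// Hypothetical allocation tag, standing in for vec_allocation_t.
enum alloc_kind { on_heap, on_stack };

template <typename T>
struct small_vec {
  std::size_t capacity;

  // Static member template, loosely modelled on vec_t<T>::reserve<A>.
  template <alloc_kind A>
  static std::size_t reserve(small_vec *v, std::size_t extra) {
    v->capacity += extra;
    return v->capacity;
  }

  // Qualified spelling: the shape of call that was removed by the patch.
  template <alloc_kind A>
  static std::size_t push_qualified(small_vec *v) {
    return small_vec<T>::reserve<A>(v, 1);
  }

  // Unqualified spelling: the shape of call the patch switches to.
  template <alloc_kind A>
  static std::size_t push_unqualified(small_vec *v) {
    return reserve<A>(v, 1);
  }
};
```

Both helpers reach the same reserve<A> specialization; dropping the vec_t<T>:: qualifier only changes how the call is written, not what it calls.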
PATCH to configure.ac to fix --enable-languages=all
I configure GCC with --enable-languages=all,obj-c++ for testing, and this started breaking recently, because I ended up with 'c' twice in the list of languages, so reconfiguring breaks. This seems to be because r189080 didn't adjust the 'all' case along with the language name case.

Tested x86_64-pc-linux-gnu, applying to trunk as obvious.

commit 1e901c1bd7814cd5e3b6800fe035255ab6c3976f
Author: Jason Merrill ja...@redhat.com
Date: Tue Sep 4 13:25:21 2012 -0400

    * configure.ac: Fix --enable-languages=all.

diff --git a/configure b/configure
index 0f655b8..cd06e4e 100755
--- a/configure
+++ b/configure
@@ -6112,6 +6112,7 @@ if test -d ${srcdir}/gcc; then
           boot_language=yes
         fi

+        add_this_lang=no
         case ,${enable_languages}, in
           *,${language},*)
             # Language was explicitly selected; include it
@@ -6122,10 +6123,9 @@ if test -d ${srcdir}/gcc; then
             ;;
           *,all,*)
             # 'all' was selected, select it if it is a default language
-            add_this_lang=${build_by_default}
-            ;;
-          *)
-            add_this_lang=no
+            if test $language != c; then
+              add_this_lang=${build_by_default}
+            fi
             ;;
         esac

diff --git a/configure.ac b/configure.ac
index 02174b3..9bee624 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1758,6 +1758,7 @@ if test -d ${srcdir}/gcc; then
           boot_language=yes
         fi

+        add_this_lang=no
         case ,${enable_languages}, in
           *,${language},*)
             # Language was explicitly selected; include it
@@ -1768,10 +1769,9 @@ if test -d ${srcdir}/gcc; then
             ;;
           *,all,*)
             # 'all' was selected, select it if it is a default language
-            add_this_lang=${build_by_default}
-            ;;
-          *)
-            add_this_lang=no
+            if test $language != c; then
+              add_this_lang=${build_by_default}
+            fi
             ;;
         esac
Re: [middle-end] Add machine_mode to address_cost target hook
Oleg Endo oleg.e...@t-online.de writes:
On Mon, 2012-09-03 at 01:58 +0200, Oleg Endo wrote:
OKOK -- I'll do it :) (within the next couple of days)

And so I did. Attached is an updated patch that adds the address space parameter to the address_cost function. I hope that this change does not reset the ACKs so far:

[x] target-independent bits
[ ] alpha       [ ] arm         [ ] avr         [ ] bfin
[ ] cr16        [ ] cris        [ ] epiphany    [ ] i386
[ ] ia64        [ ] iq2000      [ ] lm32        [ ] m32c
[ ] m32r        [ ] mcore       [ ] mep         [x] microblaze
[x] mips        [ ] mmix        [x] mn10300     [ ] pa
[ ] rs6000      [ ] rx          [ ] s390        [ ] score
[x] sh          [ ] sparc       [ ] spu         [ ] stormy16
[ ] v850        [ ] vax         [ ] xtensa

Tested with 'make all-gcc' on SH xgcc and i386 native build. No functional changes, except on MIPS, as requested by Richard Sandiford.

Thanks, looks good to me. Hopefully a friendly global maintainer will approve the whole thing in one go (modulo Alex's comment) so that you don't need to get individual approvals for all targets.

Richard
[PATCH] Further OpenBSD/amd64 and OpenBSD/i386 improvements
Here are some additional fixes for OpenBSD that fix a fair number of failing testcases. I can split this up in smaller patches if that's preferred. I believe I submitted the openbsd-stdint.h bit before. We consistently use long long types for the *max_t types, on both 32-bit and 64-bit platforms, whereas GCC defaults to using long on 32-bit platforms and long long on 64-bit platforms. Hence the need for overrides.

libgcc/:

2012-09-02 Mark Kettenis kette...@gnu.org

* config.host (*-*-openbsd*): Add t-eh-dw2-dip to tmake_file.
(i[34567]86-*-openbsd* and x86_64-*-openbsd*): Add to list of i[34567]86-*-* and x86_64-*-* soft-fp targets.
* unwind-dw2-fde-dip.c: Don't include elf.h on OpenBSD.
(USE_PT_GNU_EH_FRAME): Define for OpenBSD.
(ElfW): Likewise.

gcc:/

2012-09-02 Mark Kettenis kette...@gnu.org

* config.gcc (*-*-openbsd4.[3-9]|*-*-openbsd[5-9]*): Set default_use_cxa_atexit to yes.
* config/openbsd-stdint.h (INTMAX_TYPE, UINTMAX_TYPE): Define.
* config/i386/openbsdelf.h (LIBGCC2_HAS_TF_MODE, LIBGCC2_TF_CEXT) (TF_SIZE): Define.

Index: libgcc/unwind-dw2-fde-dip.c
===
--- libgcc/unwind-dw2-fde-dip.c (revision 190863)
+++ libgcc/unwind-dw2-fde-dip.c (working copy)
@@ -33,7 +33,7 @@

 #include "tconfig.h"
 #include "tsystem.h"
-#ifndef inhibit_libc
+#if !defined(inhibit_libc) && !defined(__OpenBSD__)
 #include <elf.h> /* Get DT_CONFIG.  */
 #endif
 #include "coretypes.h"
@@ -65,6 +65,12 @@
 #endif

 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
+    && defined(__OpenBSD__)
+# define ElfW(type) Elf_##type
+# define USE_PT_GNU_EH_FRAME
+#endif
+
+#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
     && defined(TARGET_DL_ITERATE_PHDR) \
     && defined(__sun__) && defined(__svr4__)
 # define USE_PT_GNU_EH_FRAME
Index: libgcc/config.host
===
--- libgcc/config.host (revision 190863)
+++ libgcc/config.host (working copy)
@@ -213,7 +213,7 @@
   esac
   ;;
 *-*-openbsd*)
-  tmake_file="$tmake_file t-crtstuff-pic t-libgcc-pic"
+  tmake_file="$tmake_file t-crtstuff-pic t-libgcc-pic t-eh-dw2-dip"
   case ${target_thread_file} in
     posix)
       tmake_file="$tmake_file t-openbsd-thread"
@@ -1150,7 +1150,8 @@
   i[34567]86-*-gnu* | \
   i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]* | \
   i[34567]86-*-cygwin* | i[34567]86-*-mingw* | x86_64-*-mingw* | \
-  i[34567]86-*-freebsd* | x86_64-*-freebsd*)
+  i[34567]86-*-freebsd* | x86_64-*-freebsd* | \
+  i[34567]86-*-openbsd* | x86_64-*-openbsd*)
   tmake_file="${tmake_file} t-softfp-tf"
   if test ${host_address} = 32; then
     tmake_file="${tmake_file} i386/${host_address}/t-softfp"
Index: gcc/config.gcc
===
--- gcc/config.gcc (revision 190863)
+++ gcc/config.gcc (working copy)
@@ -708,6 +708,11 @@
     *-*-openbsd2.*|*-*-openbsd3.[012])
       tm_defines="${tm_defines} HAS_LIBC_R=1" ;;
   esac
+  case ${target} in
+    *-*-openbsd4.[3-9]|*-*-openbsd[5-9]*)
+      default_use_cxa_atexit=yes
+      ;;
+  esac
   ;;
 *-*-rtems*)
   case ${enable_threads} in
Index: gcc/config/i386/openbsdelf.h
===
--- gcc/config/i386/openbsdelf.h (revision 190863)
+++ gcc/config/i386/openbsdelf.h (working copy)
@@ -111,3 +111,9 @@
 #define OBSD_HAS_CORRECT_SPECS

 #define HAVE_ENABLE_EXECUTE_STACK
+
+/* Put all *tf routines in libgcc.  */
+#undef LIBGCC2_HAS_TF_MODE
+#define LIBGCC2_HAS_TF_MODE 1
+#define LIBGCC2_TF_CEXT q
+#define TF_SIZE 113
Index: gcc/config/openbsd-stdint.h
===
--- gcc/config/openbsd-stdint.h (revision 190863)
+++ gcc/config/openbsd-stdint.h (working copy)
@@ -26,6 +26,9 @@
 #define UINT_FAST16_TYPE unsigned int
 #define UINT_FAST32_TYPE unsigned int
 #define UINT_FAST64_TYPE long long unsigned int
+
+#define INTMAX_TYPE long long int
+#define UINTMAX_TYPE long long unsigned int

 #define INTPTR_TYPE long int
 #define UINTPTR_TYPE long unsigned int
C++ PATCH for c++/54437 (firefox build failure)
Here, the problem was that we were resolving the address of an overloaded function in the context of the template being called (which doesn't have access to the function) rather than the caller (which does). We need to massage explicit template arguments before we enter the callee's context.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit c17767b10d05c0ea47107a3b7f067da76cc5ad8d
Author: Jason Merrill ja...@redhat.com
Date: Tue Sep 4 11:27:03 2012 -0400

    PR c++/54437
    PR c++/51213
    * pt.c (fn_type_unification): Call coerce_template_parms before
    entering substitution context.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4a39427..6f6235c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -14591,11 +14591,22 @@ fn_type_unification (tree fn,
   static int deduction_depth;
   struct pending_template *old_last_pend = last_pending_template;
   struct tinst_level *old_error_tinst = last_error_tinst_level;
+  tree tparms = DECL_INNERMOST_TEMPLATE_PARMS (fn);
   tree tinst;
   tree r = error_mark_node;

-  if (excessive_deduction_depth)
-    return error_mark_node;
+  /* Adjust any explicit template arguments before entering the
+     substitution context.  */
+  if (explicit_targs)
+    {
+      explicit_targs
+        = (coerce_template_parms (tparms, explicit_targs, NULL_TREE,
+                                  complain,
+                                  /*require_all_args=*/false,
+                                  /*use_default_args=*/false));
+      if (explicit_targs == error_mark_node)
+        return error_mark_node;
+    }

   /* In C++0x, it's possible to have a function template whose type depends
      on itself recursively.  This is most obvious with decltype, but can also
@@ -14608,6 +14619,8 @@ fn_type_unification (tree fn,
      substitutions back up to the initial one.

      This is, of course, not reentrant.  */
+  if (excessive_deduction_depth)
+    return error_mark_node;
   tinst = build_tree_list (fn, targs);
   if (!push_tinst_level (tinst))
     {
@@ -14640,23 +14653,10 @@ fn_type_unification (tree fn,
          specified template argument values.  If a substitution in a
          template parameter or in the function type of the function
          template results in an invalid type, type deduction fails.  */
-      tree tparms = DECL_INNERMOST_TEMPLATE_PARMS (fn);
       int i, len = TREE_VEC_LENGTH (tparms);
       location_t loc = input_location;
-      tree converted_args;
       bool incomplete = false;

-      if (explicit_targs == error_mark_node)
-        goto fail;
-
-      converted_args
-        = (coerce_template_parms (tparms, explicit_targs, NULL_TREE,
-                                  complain,
-                                  /*require_all_args=*/false,
-                                  /*use_default_args=*/false));
-      if (converted_args == error_mark_node)
-        goto fail;
-
       /* Substitute the explicit args into the function type.  This is
          necessary so that, for instance, explicitly declared function
          arguments can match null pointed constants.  If we were given
@@ -14667,7 +14667,7 @@ fn_type_unification (tree fn,
         {
           tree parm = TREE_VALUE (TREE_VEC_ELT (tparms, i));
           bool parameter_pack = false;
-          tree targ = TREE_VEC_ELT (converted_args, i);
+          tree targ = TREE_VEC_ELT (explicit_targs, i);

           /* Dig out the actual parm.  */
           if (TREE_CODE (parm) == TYPE_DECL
@@ -14705,7 +14705,7 @@ fn_type_unification (tree fn,

       processing_template_decl += incomplete;
       input_location = DECL_SOURCE_LOCATION (fn);
-      fntype = tsubst (TREE_TYPE (fn), converted_args,
+      fntype = tsubst (TREE_TYPE (fn), explicit_targs,
                        complain | tf_partial, NULL_TREE);
       input_location = loc;
       processing_template_decl -= incomplete;
@@ -14714,8 +14714,8 @@ fn_type_unification (tree fn,
         goto fail;

       /* Place the explicitly specified arguments in TARGS.  */
-      for (i = NUM_TMPL_ARGS (converted_args); i--;)
-        TREE_VEC_ELT (targs, i) = TREE_VEC_ELT (converted_args, i);
+      for (i = NUM_TMPL_ARGS (explicit_targs); i--;)
+        TREE_VEC_ELT (targs, i) = TREE_VEC_ELT (explicit_targs, i);
     }

   /* Never do unification on the 'this' parameter.  */
diff --git a/gcc/testsuite/g++.dg/template/access24.C b/gcc/testsuite/g++.dg/template/access24.C
new file mode 100644
index 000..9f19226
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access24.C
@@ -0,0 +1,8 @@
+// PR c++/54437
+
+template <void (*P)()> void f();
+class A {
+  template <class T> static void g();
+  template <class T> static void h () { f<g<T> > (); }
+  static void i() { h<int>(); }
+};
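The test case in the patch is compile-only. For anyone who wants to watch the call chain actually execute, here is a runnable variant; it is a sketch that deviates from access24.C in ways that are purely assumptions of this example: f and g get bodies, a global counter named calls is added, and i is made public so it can be invoked from outside the class. The access question is unchanged: g stays private, and its address is named inside A::h, which is allowed to see it.

```cpp
// Counter added for illustration only; lets a caller observe that the
// private member template was really reached.
static int calls = 0;

// Function template taking a function pointer as a non-type argument.
// It has no special access to A.
template <void (*P)()> void f() { P(); }

class A {
  // Private: only members of A may name g<T>.
  template <class T> static void g() { ++calls; }

  // The address of g<T> is formed here, in a context that has access,
  // and handed to f as an explicit template argument.
  template <class T> static void h() { f<g<T> >(); }

public:
  // Public here (unlike in the original test) so callers can trigger the chain.
  static void i() { h<int>(); }
};
```

Calling A::i() once should leave calls at 1 on a compiler that checks access in the caller's context, as the fix describes.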
Re: [middle-end] Add machine_mode to address_cost target hook
On Tue, 2012-09-04 at 12:02 -0300, Alexandre Oliva wrote:
Index: gcc/config/mn10300/mn10300.c
- total = mn10300_address_cost (XEXP (x, 0), speed);
+ total = mn10300_address_cost (XEXP (x, 0), GET_MODE (x),
+                               ADDR_SPACE_GENERIC, speed);
Instead of ADDR_SPACE_GENERIC, this should be MEM_ADDR_SPACE (x), no?

Effectively, it actually doesn't matter, since the address space is not used in the cost function. But yeah, true, fixed thusly. The change log entry for this was also wrong. Fixed that, too. Thanks. Updated patch and change log below.

On Tue, 2012-09-04 at 12:38 +0200, Paolo Bonzini wrote:
I think you only need explicit approval for mn10300. All other changes are trivial.

On Tue, 2012-09-04 at 19:43 +0100, Richard Sandiford wrote:
Thanks, looks good to me. Hopefully a friendly global maintainer will approve the whole thing in one go (modulo Alex's comment) so that you don't need to get individual approvals for all targets.

Hmm .. the ACK status so far is:

[x] target-independent bits
[ ] alpha       [x] arm         [ ] avr         [ ] bfin
[ ] cr16        [ ] cris        [ ] epiphany    [ ] i386
[ ] ia64        [x] iq2000      [ ] lm32        [ ] m32c
[x] m32r        [x] mcore       [ ] mep         [x] microblaze
[x] mips        [ ] mmix        [x] mn10300     [ ] pa
[ ] rs6000      [x] rx          [ ] s390        [ ] score
[x] sh          [ ] sparc       [ ] spu         [x] stormy16
[x] v850        [ ] vax         [ ] xtensa

I think I'll wait until Friday. If there are no further objections until then, I'd like to install the patch even if some of the boxes should still remain unchecked. Would that be OK?

On Tue, 2012-09-04 at 16:32 +0200, Richard Guenther wrote:
+hook_int_rtx_mode_as_bool_0 (rtx, enum machine_mode, addr_space_t, bool)
So we're using C++ already? Or do we want ATTRIBUTE_UNUSED here?
Use C++ where it is so nicely obvious an improvement ;)

ProblemFactoryManagerListenerSingleton? ;)

Cheers, Oleg

ChangeLog:

* hooks.c (hook_int_rtx_mode_as_bool_0): New function.
* hooks.h (hook_int_rtx_mode_as_bool_0): Declare it.
* output.h (default_address_cost): Add machine_mode and address space arguments.
* target.def (address_cost): Likewise.
* rtlanal.c (address_cost): Pass mode and address space to target hook.
(default_address_cost): Add unnamed machine_mode and address space arguments.
* doc/tm.texi: Regenerate.
* config/alpha/alpha.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0.
* config/arm/arm.c (arm_address_cost): Add machine_mode and address space arguments.
* config/avr/avr.c (avr_address_cost): Likewise.
* config/bfin/bfin.c (bfin_address_cost): Likewise.
* config/cr16/cr16.c (cr16_address_cost): Likewise.
* config/cris/cris.c (cris_address_cost): Likewise.
* config/epiphany/epiphany.c (epiphany_address_cost): Likewise.
* config/i386/i386.c (ix86_address_cost): Likewise.
* config/ia64/ia64.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0.
* config/iq2000/iq2000.c (iq2000_address_cost): Add machine_mode and address space arguments. Pass them on in recursive invocation.
* config/lm32/lm32.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0.
* config/m32c/m32c.c (m32c_address_cost): Add machine_mode and address space arguments.
* config/m32r/m32r.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0.
* config/mcore/mcore.c (TARGET_ADDRESS_COST): Likewise.
* config/mep/mep.c (mep_address_cost): Add machine_mode and address space arguments.
* config/microblaze/microblaze.c (microblaze_address_cost): Likewise.
* config/mips/mips.c (mips_address_cost): Likewise.
* config/mmix/mmix.c (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0.
* config/mn10300/mn10300.c (mn10300_address_cost): Add machine_mode and address space arguments.
(mn10300_rtx_costs): Pass GET_MODE (x) and MEM_ADDR_SPACE (x) to mn10300_address_cost.
* config/pa/pa.c (hppa_address_cost): Add machine_mode and address space arguments.
* config/rs6000/rs6000.c (rs6000_debug_address_cost): Likewise.
(TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0.
* config/rx/rx.c (rx_address_cost): Add machine_mode and address space arguments.
* config/s390/s390.c (s390_address_cost): Likewise.
* config/score/score-protos.h (score_address_cost): Likewise.
* config/score/score.c (score_address_cost): Likewise.
*
Re: [Patch,avr] PR54461: Better AVR-Libc integration
Weddington, Eric wrote:
>> From: Georg-Johann Lay
>>
>> So here is an updated version of the patch. Instead of with_avrlibc = yes it does with_avrlibc != no. Just like the first version, --with-avrlibc[=*] is only recognized if avr-gcc is not configured for RTEMS, i.e. RTEMS users don't need to set --with-avrlibc=no in order to get a complete libgcc.
>
> Sorry, I'm a bit confused. With your new patch...
>
> - If I build GCC, for the avr target (plain), without specifying the --with-avr-libc= switch, does it default to yes?

Yes. Anything except an explicit no is treated like yes.

> - If I build GCC, for the avr-rtems target, without specifying the --with-avr-libc= switch, does it default to no?

Notice the switch is called --with-avrlibc. The option is ignored for avr-*-rtems*, thus is similar to no, thus yes for the question.

> Because the above is what I would expect the default behavior to be. Doing that would certainly help with backwards compatibility for those building toolchain distributions. I would think that the user has to specify the --with-avr-libc= flag to explicitly deviate from common usage and practice.

Yes, that's the case. Except for users that want avr-*-* without AVR-Libc and with newlib or some other libc flavor.

Johann
Fix PR rtl-optimization/54456
This patch fixes PR rtl-optimization/54456 by running the first scheduling pass only when optimizing, as is already done for the second scheduling pass.

Tested on x86_64-suse-linux, applied on the mainline.

2012-09-04  Eric Botcazou  ebotca...@adacore.com

    PR rtl-optimization/54456
    * sched-rgn.c (gate_handle_sched): Return 1 only if optimize > 0.

-- 
Eric Botcazou

Index: sched-rgn.c
===
--- sched-rgn.c (revision 190863)
+++ sched-rgn.c (working copy)
@@ -3473,7 +3473,7 @@ static bool
 gate_handle_sched (void)
 {
 #ifdef INSN_SCHEDULING
-  return flag_schedule_insns && dbg_cnt (sched_func);
+  return optimize > 0 && flag_schedule_insns && dbg_cnt (sched_func);
 #else
   return 0;
 #endif
RE: [Patch,avr] PR54461: Better AVR-Libc integration
-Original Message- From: Georg-Johann Lay [] Sent: Tuesday, September 04, 2012 1:03 PM To: Weddington, Eric Cc: Gabriel Dos Reis; Richard Guenther; gcc-patches@gcc.gnu.org; Denis Chertykov; Joerg Wunsch Subject: Re: [Patch,avr] PR54461: Better AVR-Libc integration I would think that the user has to specify the --with-avr-libc= flag to explicitly deviate from common usage and practice. Yes, that's the case. Except for users that want avr-*-* without AVR-Libc and with newlib or some other libc flavor. Excellent! Thanks for the detailed explanation, and sorry for my confusion. I'm good with the patch, then. Eric
Minor reorganization in bb-reorder.c
The file contains 3 RTL optimization passes, the gate and worker functions of which are strangely intertwined. Fixed thusly, tested on x86_64-suse-linux, applied on the mainline. 2012-09-04 Eric Botcazou ebotca...@adacore.com * bb-reorder.c (gate_handle_reorder_blocks): Move around. (rest_of_handle_reorder_blocks): Likewise. (pass_reorder_blocks): Likewise. (gate_handle_partition_blocks): Likewise. -- Eric Botcazou Index: bb-reorder.c === --- bb-reorder.c (revision 190863) +++ bb-reorder.c (working copy) @@ -2037,6 +2037,65 @@ insert_section_boundary_note (void) } } +static bool +gate_handle_reorder_blocks (void) +{ + if (targetm.cannot_modify_jumps_p ()) +return false; + /* Don't reorder blocks when optimizing for size because extra jump insns may + be created; also barrier may create extra padding. + + More correctly we should have a block reordering mode that tried to + minimize the combined size of all the jumps. This would more or less + automatically remove extra jumps, but would also try to use more short + jumps instead of long jumps. */ + if (!optimize_function_for_speed_p (cfun)) +return false; + return (optimize 0 + (flag_reorder_blocks || flag_reorder_blocks_and_partition)); +} + +static unsigned int +rest_of_handle_reorder_blocks (void) +{ + basic_block bb; + + /* Last attempt to optimize CFG, as scheduling, peepholing and insn + splitting possibly introduced more crossjumping opportunities. */ + cfg_layout_initialize (CLEANUP_EXPENSIVE); + + reorder_basic_blocks (); + cleanup_cfg (CLEANUP_EXPENSIVE); + + FOR_EACH_BB (bb) +if (bb-next_bb != EXIT_BLOCK_PTR) + bb-aux = bb-next_bb; + cfg_layout_finalize (); + + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ + insert_section_boundary_note (); + return 0; +} + +struct rtl_opt_pass pass_reorder_blocks = +{ + { + RTL_PASS, + bbro, /* name */ + gate_handle_reorder_blocks, /* gate */ + rest_of_handle_reorder_blocks,/* execute */ + NULL, /* sub */ + NULL, /* next */ + 0,/* static_pass_number */ + TV_REORDER_BLOCKS,/* tv_id */ + 0,/* properties_required */ + 0,/* properties_provided */ + 0,/* properties_destroyed */ + 0,/* todo_flags_start */ + TODO_verify_rtl_sharing, /* todo_flags_finish */ + } +}; + /* Duplicate the blocks containing computed gotos. This basically unfactors computed gotos that were factored early on in the compilation process to speed up edge based data flow. We used to not unfactoring them again, @@ -2178,6 +2237,21 @@ struct rtl_opt_pass pass_duplicate_compu } }; +static bool +gate_handle_partition_blocks (void) +{ + /* The optimization to partition hot/cold basic blocks into separate + sections of the .o file does not work well with linkonce or with + user defined section attributes. Don't call it if either case + arises. */ + return (flag_reorder_blocks_and_partition + optimize + /* See gate_handle_reorder_blocks. We should not partition if + we are going to omit the reordering. */ + optimize_function_for_speed_p (cfun) + !DECL_ONE_ONLY (current_function_decl) + !user_defined_section_attribute); +} /* This function is the main 'entrance' for the optimization that partitions hot and cold basic blocks into separate sections of the @@ -2346,83 +2420,6 @@ partition_hot_cold_basic_blocks (void) return TODO_verify_flow | TODO_verify_rtl_sharing; } - -static bool -gate_handle_reorder_blocks (void) -{ - if (targetm.cannot_modify_jumps_p ()) -return false; - /* Don't reorder blocks when optimizing for size because extra jump insns may - be created; also barrier may create extra padding. - - More correctly we should have a block reordering mode that tried to - minimize the combined size of all the jumps. 
This would more or less - automatically remove extra jumps, but would also try to use more short - jumps instead of long jumps. */ - if (!optimize_function_for_speed_p (cfun)) -return false; - return (optimize 0 - (flag_reorder_blocks || flag_reorder_blocks_and_partition)); -} - - -/* Reorder basic blocks. */ -static unsigned int -rest_of_handle_reorder_blocks (void) -{ - basic_block bb; - - /* Last attempt to optimize CFG, as scheduling, peepholing and insn - splitting possibly introduced more crossjumping opportunities. */ - cfg_layout_initialize (CLEANUP_EXPENSIVE); - - reorder_basic_blocks (); - cleanup_cfg (CLEANUP_EXPENSIVE); - - FOR_EACH_BB (bb) -if (bb-next_bb != EXIT_BLOCK_PTR) - bb-aux =
[PATCH, libstdc++] Add proper OpenBSD support
Fixes a few testcases. Mostly based on the existing NetBSD/FreeBSD/Darwin code. 2012-09-04 Mark Kettenis kette...@openbsd.org * configure.host (*-*-openbsd*) Set cpu_include_dir. * config/os/bsd/openbsd/ctype_base.h: New file. * config/os/bsd/openbsd/ctype_configure_char.cc: New file. * config/os/bsd/openbsd/ctype_inline.h: New file. * config/os/bsd/openbsd/os_defines.h: New file. Index: configure.host === --- configure.host (revision 190863) +++ configure.host (working copy) @@ -270,6 +270,9 @@ netbsd*) os_include_dir=os/bsd/netbsd ;; + openbsd*) +os_include_dir=os/bsd/openbsd +;; qnx6.[12]*) os_include_dir=os/qnx/qnx6.1 c_model=c Index: config/os/bsd/openbsd/ctype_base.h === --- config/os/bsd/openbsd/ctype_base.h (revision 0) +++ config/os/bsd/openbsd/ctype_base.h (working copy) @@ -0,0 +1,59 @@ +// Locale support -*- C++ -*- + +// Copyright (C) 2000, 2009, 2012 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// http://www.gnu.org/licenses/. 
+ +// +// ISO C++ 14882: 22.1 Locales +// + +// Information as gleaned from /usr/include/ctype.h on OpenBSD. + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + /// @brief Base class for ctype. + struct ctype_base + { +// Non-standard typedefs. +typedef const short* __to_type; + +// NB: Offsets into ctypechar::_M_table force a particular size +// on the mask type. Because of this, we don't use an enum. +typedef char mask; + +static const mask upper= _U; +static const mask lower= _L; +static const mask alpha= _U | _L; +static const mask digit= _N; +static const mask xdigit = _N | _X; +static const mask space= _S; +static const mask print= _P | _U | _L | _N | _B; +static const mask graph= _P | _U | _L | _N; +static const mask cntrl= _C; +static const mask punct= _P; +static const mask alnum= _U | _L | _N; + }; + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace Index: config/os/bsd/openbsd/os_defines.h === --- config/os/bsd/openbsd/os_defines.h (revision 0) +++ config/os/bsd/openbsd/os_defines.h (working copy) @@ -0,0 +1,41 @@ +// Specific definitions for OpenBSD -*- C++ -*- + +// Copyright (C) 2000, 2002, 2009, 2012 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. 
+ +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// http://www.gnu.org/licenses/. + +/** @file bits/os_defines.h + * This is an internal header file, included by other library headers. + * Do not attempt to use it directly. @headername{iosfwd} + */ + +#ifndef _GLIBCXX_OS_DEFINES +#define _GLIBCXX_OS_DEFINES 1 + +// System-specific #define, typedefs, corrections, etc, go here. This +// file will come before all others. + +#define _GLIBCXX_USE_C99_DYNAMIC (!(__ISO_C_VISIBLE = 1999)) +#define _GLIBCXX_USE_C99_LONG_LONG_DYNAMIC
C++ PATCH for c++/54198
My patch to change check_default_argument to call perform_implicit_conversion_flags in order to get the diagnostics we want there had the undesired side-effect of causing the instantiation of templates that would be used by that conversion, even though the conversion isn't really used. So this patch avoids that by setting cp_unevaluated_operand around the call.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit ce91c2a524880f727a114cc40e0ad94ac6755631
Author: Jason Merrill ja...@redhat.com
Date: Tue Sep 4 15:20:32 2012 -0400

    PR c++/54198
    * decl.c (check_default_argument): Set cp_unevaluated_operand
    around call to perform_implicit_conversion_flags.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 8b94e26..8024373 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10575,8 +10575,10 @@ check_default_argument (tree decl, tree arg)
      A default argument expression is implicitly converted to the
      parameter type.  */
+  ++cp_unevaluated_operand;
   perform_implicit_conversion_flags (decl_type, arg, tf_warning_or_error,
                                      LOOKUP_NORMAL);
+  --cp_unevaluated_operand;

   if (warn_zero_as_null_pointer_constant
       && c_inhibit_evaluation_warnings == 0
diff --git a/gcc/testsuite/g++.dg/template/defarg15.C b/gcc/testsuite/g++.dg/template/defarg15.C
new file mode 100644
index 000..fea3dee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/defarg15.C
@@ -0,0 +1,19 @@
+// PR c++/54198
+
+template <typename T> void
+refIfNotNull (T* p1)
+{
+    p1->ref;
+}
+template <typename T> struct A
+{
+    A (T* p1)
+    {
+        refIfNotNull (p1);
+    }
+};
+class B;
+class C
+{
+    void getParent (A <B> = 0);
+};
Re: [middle-end] Add machine_mode to address_cost target hook
On Tue, Sep 4, 2012 at 2:57 PM, Oleg Endo oleg.e...@t-online.de wrote:
> Hmm .. the ACK status so far is:
>
> [x] target-independent bits
> [ ] alpha     [x] arm       [ ] avr        [ ] bfin
> [ ] cr16      [ ] cris      [ ] epiphany   [ ] i386
> [ ] ia64      [x] iq2000    [ ] lm32       [ ] m32c
> [x] m32r      [x] mcore     [ ] mep        [x] microblaze
> [x] mips      [ ] mmix      [x] mn10300    [ ] pa
> [x] rs6000    [x] rx        [ ] s390       [ ] score
> [x] sh        [ ] sparc     [x] spu        [x] stormy16
> [x] v850      [ ] vax       [ ] xtensa
>
> * config/rs6000/rs6000.c (rs6000_debug_address_cost): Likewise.
> (TARGET_ADDRESS_COST): Use hook_int_rtx_mode_as_bool_0 instead of hook_int_rtx_bool_0.
> * config/spu/spu.c (TARGET_ADDRESS_COST): Likewise.

The rs6000 and spu bits are okay.

Thanks, David
Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC
On Wed, Aug 29, 2012 at 3:09 PM, Segher Boessenkool seg...@kernel.crashing.org wrote: For things that do mftb with high frequency, maybe you should also add a builtin that does just an mftb, i.e. returns a 32-bit result on 32-bit implementations. Are you thinking in a function that returns only the TBL? On 32-bit, just TBL; on 64-bit, the whole TB (there is no machine instruction to read just TBL on 64-bit, so it doesn't make much sense to have it return a 32-bit number). It sounds like you are asking for an additional interface for high-frequency events that only reads one register on both PPC32 and PPC64. I do not believe that interface currently exists for PPC in GLibc and that seems out of the scope of this patch. It could be a nice feature, but it's a new feature request that is not necessary for this round of patches. Thanks, David
Re: [middle-end] Add machine_mode to address_cost target hook
Quoting David Edelsohn dje@gmail.com: On Tue, Sep 4, 2012 at 2:57 PM, Oleg Endo oleg.e...@t-online.de wrote: Hmm .. the ACK status so far is: Not sure if we are supposed to acknowledge all the straightforward argument additions... at any rate, the epiphany hunk is OK. I think I'll make use of the new functionality eventually, but prefer to be able to test such a functional change separately, so I'm fine with the approach to just introduce the infrastructure first.
Re: [PATCH] Set correct source location for deallocator calls
Looks like even with addr2line properly installed, the gcj generated code cannot get the correct source file/lineno. Do I need to pass in anything to gcj to let it know where addr2line is?

Thanks, Dehao

#javac stacktrace.java
#java stacktrace
stacktrace.e(stacktrace.java:42)
stacktrace.d(stacktrace.java:38)
stacktrace.c(stacktrace.java:31)
stacktrace.b(stacktrace.java:26)
stacktrace.a(stacktrace.java:19)
stacktrace.main(stacktrace.java:12)

#gcj *.class -o stacktrace.exe
#./stacktrace.exe
stacktrace.e(stacktrace.exe:-1)
stacktrace.d(stacktrace.exe:-1)
stacktrace.c(stacktrace.exe:-1)
stacktrace.b(stacktrace.exe:-1)
stacktrace.a(stacktrace.exe:-1)
stacktrace.main(stacktrace.exe:-1)

The java code is shown below:

stacktrace.java

/* This test should test the stacktrace functionality.
   We only print ClassName and MethName since the other information
   like FileName and LineNumber are not consistent while building
   native or interpreted and we want to test the output inside the dejagnu
   test environment.
   Also, we have to make the methods public since they might be optimized away
   with inline's and then the -O3/-O2 execution might fail.
*/
public class stacktrace {
  public static void main(String args[]) {
    try {
      new stacktrace().a();
    } catch (TopException e) {
    }
  }

  public void a() throws TopException {
    try {
      b();
    } catch (MiddleException e) {
      throw new TopException(e);
    }
  }

  public void b() throws MiddleException {
    c();
  }

  public void c() throws MiddleException {
    try {
      d();
    } catch (BottomException e) {
      throw new MiddleException(e);
    }
  }

  public void d() throws BottomException {
    e();
  }

  public void e() throws BottomException {
    throw new BottomException();
  }
}

class TopException extends Exception {
  TopException(Throwable cause) {
    super(cause);
  }
}

class MiddleException extends Exception {
  MiddleException(Throwable cause) {
    super(cause);
  }
}

class BottomException extends Exception {
  BottomException() {
    StackTraceElement stack[] = this.getStackTrace();
    for (int i = 0; i < stack.length; i++) {
      String className = stack[i].getClassName();
      String methodName = stack[i].getMethodName();
      System.out.println(className + "." + methodName + "("
                         + stack[i].getFileName() + ":"
                         + stack[i].getLineNumber() + ")");
    }
  }
}
Re: [PATCH] Set correct source location for deallocator calls
On Tue, Sep 4, 2012 at 9:22 AM, Andrew Haley a...@redhat.com wrote: On 09/04/2012 05:07 PM, Dehao Chen wrote: On Thu, Aug 30, 2012 at 9:33 AM, Richard Henderson r...@redhat.com wrote: On 08/30/2012 08:20 AM, Andrew Haley wrote: Is the problem simply that the logic to scan the assembly code isn't present in the libgcj testsuite? Yes, exactly. For this case, I don't think that we want a testcase to rely on addr2line in the system. So how about we add a test once assembly scanning is available in the libgcj testsuite? Fine by me. I guess you can just copy the scanning code from the gcc testsuite. I tried that, but it is not trivial, and simply copying proc scan-assembler to libjava seems ugly. Do libjava people really think it's worth adding scan-assembler and the other primitives from the gcc testsuite to the libjava testsuite? If yes, I'll leave it on the TODO list. Thanks, Dehao Andrew.
Re: [PATCH] Combine location with block using block_locations
ping... Thanks, Dehao On Tue, Aug 21, 2012 at 4:54 PM, Dehao Chen de...@google.com wrote: On Tue, Aug 21, 2012 at 6:25 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen de...@google.com wrote: ping Conceptually I like the change. Can a libcpp maintainer please have a 2nd look? Dehao, did you do any compile-time and memory-usage benchmarks? I don't have a memory benchmarks at hand. But I've tested it through some huge apps, each of which takes more than 1 hour to build on a modern machine. None of them had observed noticeable memory footprint and compile time increase. Thanks, Dehao Thanks, Richard. Thanks, Dehao On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen de...@google.com wrote: Hi, Dodji, Thanks for the review. I've fixed all the addressed issues. I'm attaching the related changes: Thanks, Dehao libcpp/ChangeLog: 2012-08-01 Dehao Chen de...@google.com * include/line-map.h (MAX_SOURCE_LOCATION): New value. (location_adhoc_data_init): New. (location_adhoc_data_fini): New. (get_combined_adhoc_loc): New. (get_data_from_adhoc_loc): New. (get_location_from_adhoc_loc): New. (COMBINE_LOCATION_DATA): New. (IS_ADHOC_LOC): New. (expanded_location): New field. * line-map.c (location_adhoc_data): New. (location_adhoc_data_htab): New. (curr_adhoc_loc): New. (location_adhoc_data): New. (allocated_location_adhoc_data): New. (location_adhoc_data_hash): New. (location_adhoc_data_eq): New. (location_adhoc_data_update): New. (get_combined_adhoc_loc): New. (get_data_from_adhoc_loc): New. (get_location_from_adhoc_loc): New. (location_adhoc_data_init): New. (location_adhoc_data_fini): New. (linemap_lookup): Change to use new location. (linemap_ordinary_map_lookup): Likewise. (linemap_macro_map_lookup): Likewise. (linemap_macro_map_loc_to_def_point): Likewise. (linemap_macro_map_loc_unwind_toward_spel): Likewise. (linemap_get_expansion_line): Likewise. (linemap_get_expansion_filename): Likewise. 
(linemap_location_in_system_header_p): Likewise. (linemap_location_from_macro_expansion_p): Likewise. (linemap_macro_loc_to_spelling_point): Likewise. (linemap_macro_loc_to_def_point): Likewise. (linemap_macro_loc_to_exp_point): Likewise. (linemap_resolve_location): Likewise. (linemap_unwind_toward_expansion): Likewise. (linemap_unwind_to_first_non_reserved_loc): Likewise. (linemap_expand_location): Likewise. (linemap_dump_location): Likewise. Index: libcpp/line-map.c === --- libcpp/line-map.c (revision 190209) +++ libcpp/line-map.c (working copy) @@ -25,6 +25,7 @@ #include line-map.h #include cpplib.h #include internal.h +#include hashtab.h static void trace_include (const struct line_maps *, const struct line_map *); static const struct line_map * linemap_ordinary_map_lookup (struct line_maps *, @@ -50,6 +51,135 @@ extern unsigned num_expanded_macros_counter; extern unsigned num_macro_tokens_counter; +/* Data structure to associate an arbitrary data to a source location. */ +struct location_adhoc_data { + source_location locus; + void *data; +}; + +/* The following data structure encodes a location with some adhoc data + and maps it to a new unsigned integer (called an adhoc location) + that replaces the original location to represent the mapping. + + The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the + highest bit is 1, then the number is adhoc_loc. Otherwise, it serves as + the original location. Once identified as the adhoc_loc, the lower 31 + bits of the integer is used to index the location_adhoc_data array, + in which the locus and associated data is stored. */ + +static htab_t location_adhoc_data_htab; +static source_location curr_adhoc_loc; +static struct location_adhoc_data *location_adhoc_data; +static unsigned int allocated_location_adhoc_data; + +/* Hash function for location_adhoc_data hashtable. 
*/ + +static hashval_t +location_adhoc_data_hash (const void *l) +{ + const struct location_adhoc_data *lb = + (const struct location_adhoc_data *) l; + return (hashval_t) lb-locus + (size_t) lb-data; +} + +/* Compare function for location_adhoc_data hashtable. */ + +static int +location_adhoc_data_eq (const void *l1, const void *l2) +{ + const struct location_adhoc_data *lb1 = + (const struct location_adhoc_data *) l1; + const struct location_adhoc_data *lb2 = + (const struct location_adhoc_data *) l2; + return lb1-locus == lb2-locus lb1-data == lb2-data; +} + +/* Update the hashtable when
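The encoding described in the patch comment above (highest bit as the enabling bit, lower 31 bits as an index into a side table holding the locus and its associated data) can be illustrated with a small stand-alone sketch. This is not the patch's code: the sketch_* names, the std::vector table, and the fixed 32-bit source_location type are assumptions made for illustration, and the hash-table de-duplication that the patch performs is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: a 32-bit location type, as implied by "lower 31 bits".
typedef std::uint32_t source_location;

// One side-table entry: the original locus plus the arbitrary data.
struct sketch_adhoc_entry
{
  source_location locus;
  void *data;
};

// Highest bit set means "this value is an adhoc location".
static const source_location SKETCH_ADHOC_BIT = 0x80000000u;

// Hypothetical side table; the patch uses a growable array plus a hash table.
static std::vector<sketch_adhoc_entry> sketch_table;

static bool
sketch_is_adhoc (source_location loc)
{
  return (loc & SKETCH_ADHOC_BIT) != 0;
}

// Store (locus, data) and return an adhoc location that indexes the entry.
// No de-duplication is attempted in this sketch.
static source_location
sketch_combine (source_location locus, void *data)
{
  sketch_adhoc_entry e;
  e.locus = locus;
  e.data = data;
  sketch_table.push_back (e);
  return SKETCH_ADHOC_BIT
         | static_cast<source_location> (sketch_table.size () - 1);
}

// Recover the original locus; plain locations pass through unchanged.
static source_location
sketch_locus (source_location loc)
{
  if (sketch_is_adhoc (loc))
    return sketch_table[loc & ~SKETCH_ADHOC_BIT].locus;
  return loc;
}

// Recover the associated data; plain locations carry none.
static void *
sketch_data (source_location loc)
{
  if (sketch_is_adhoc (loc))
    return sketch_table[loc & ~SKETCH_ADHOC_BIT].data;
  return nullptr;
}
```

The point of the scheme is that every consumer keeps passing around a single integer; only code that cares about the extra data needs to check the top bit and look up the side table.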
Re: [google/integration] Add a configure option to disable system header canonicalizations (issue6489063)
On Fri, Aug 31, 2012 at 10:30 AM, Simon Baldwin sim...@google.com wrote: Yes. I meant --disable-canonical-prefixes. That is a gcc configure flag that we use to control the default setting for -[no-]canonical-prefixes where neither flag is supplied on the gcc command line. --disable/enable-canonical-prefixes is only in google branches. I did a little archaeology. AFAICT, there was no specific objection to pushing --disable-canonical-prefixes into upstream trunk. The feedback I see to your initial post was send us a trunk-based patch and here are some minor nits to cleanup. It basically sounds like upstream was neutral to the patch and would probably accept it if we actually sent something for review. I still think this is something that is both reasonable and feasible to push upstream. We should at least try to get some feedback first. While there aren't a lot of people using symlink farms, I'd be surprised if we were the only ones. Ollie
Fix bootstrap failure with clang++ (PR 54484)
Fix bootstrap failure with clang++.

This patch fixes a bootstrap failure when using clang as the host compiler. Default arguments for class template member functions should be added in the declaration, not the definition.

From Jason: 8.3.6 says "Default arguments for a member function of a class template shall be specified on the initial declaration of the member function within the class template."

2012-09-04  Diego Novillo  dnovi...@google.com

    PR bootstrap/54484
    * vec.h (vec_t::embedded_init): Move default argument value
    to function declaration.

diff --git a/gcc/vec.h b/gcc/vec.h
index c0f1bb2..441c9b5 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -171,7 +171,7 @@ struct GTY(()) vec_t
   T &last (ALONE_VEC_CHECK_DECL);
   const T &operator[] (unsigned) const;
   T &operator[] (unsigned);
-  void embedded_init (int, int);
+  void embedded_init (int, int = 0);

   template<enum vec_allocation_t A>
   vec_t<T> *copy (ALONE_MEM_STAT_DECL);
@@ -599,7 +599,7 @@ vec_t<T>::iterate (const vec_t<T> *vec, unsigned ix, T **ptr)
    final member):

    size_t vec_t<T>::embedded_size<T> (int reserve);
-   void v->embedded_init(int reserve, int active = 0);
+   void v->embedded_init(int reserve, int active);

    These allow the caller to perform the memory allocation.  */

@@ -616,7 +616,7 @@ vec_t<T>::embedded_size (int nelems)

 template<typename T>
 void
-vec_t<T>::embedded_init (int nelems, int active = 0)
+vec_t<T>::embedded_init (int nelems, int active)
 {
   prefix_.num_ = active;
   prefix_.alloc_ = nelems;
Re: Fix bootstrap failure with clang++ (PR 54484)
On Tue, Sep 4, 2012 at 11:07 PM, Diego Novillo dnovi...@google.com wrote: Fix bootstrap failure with clang++. This patch fixes a bootstrap failure when using clang as the host compiler. Default arguments for class template member functions should be added in the declaration, not the definition. From Jason: 8.3.6 says Default arguments for a member function of a class template shall be specified on the initial declaration of the member function within the class template. If GCC doesn't diagnose this, what is there to avoid this problem in the future? Ciao! Steven
Re: Fix bootstrap failure with clang++ (PR 54484)
On 2012-09-04 17:10 , Steven Bosscher wrote: On Tue, Sep 4, 2012 at 11:07 PM, Diego Novillo dnovi...@google.com wrote: Fix bootstrap failure with clang++. This patch fixes a bootstrap failure when using clang as the host compiler. Default arguments for class template member functions should be added in the declaration, not the definition. From Jason: 8.3.6 says Default arguments for a member function of a class template shall be specified on the initial declaration of the member function within the class template. If GCC doesn't diagnose this, what is there to avoid this problem in the future? I'm filing a separate PR for this. Diego
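For readers skimming this thread, the rule quoted from 8.3.6 can be shown with a minimal stand-alone sketch. The small_vec type below is a hypothetical stand-in for vec_t, reduced to the two fields the patch touches; it is meant only to show where the default argument has to live, not to mirror vec.h.

```cpp
#include <cassert>

// Hypothetical stand-in for vec_t<T>; only the declaration shape matters.
template<typename T>
struct small_vec
{
  int num_;
  int alloc_;

  // The default argument goes on the initial declaration,
  // inside the class template.
  void embedded_init (int nelems, int active = 0);
};

// The out-of-class definition repeats the parameter list
// without restating the default.
template<typename T>
void
small_vec<T>::embedded_init (int nelems, int active)
{
  num_ = active;
  alloc_ = nelems;
}
```

Calling embedded_init with one argument then picks up active = 0 from the declaration, which is the form both compilers accept.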
[patch, mips] New mips triplet for multilib linux builds
I would like to create a new mips target triplet (mips-mti-linux-gnu). This target would be multilib by default and would have --enable-synci on by default. It would mainly be used for building mips cross compilers with glibc. I hope to extend this target to support the n32 and 64-bit ABIs in the future and add a corresponding mips-mti-elf triplet that would be like mips-sde-elf but have fewer/different multilib versions.

Other than adding the new target, the only changes are to the --enable-synci default setting (enabled for mips-mti-linux-gnu, still disabled for other targets) and in mips.h to use a new macro SYNCI_SPEC so that I don't have to copy all of OPTION_DEFAULT_SPECS into mti-linux.h just to change the -msynci handling.

I tested the changes by building and running the testsuite with the qemu simulator. No glibc or binutils changes were needed for this.

OK to check in?

Steve Ellcey
sell...@mips.com

2012-09-04  Steve Ellcey  sell...@mips.com

    * config.gcc: Add mips*-mti-linux* target and make with_synci
    true by default for that target.
    * config/mips/mips.h (SYNCI_SPEC): New.
    (OPTION_DEFAULT_SPECS): Use new SYNCI_SPEC.
    * mti-linux.h: New file.
    * t-mti-linux: New file.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 9ec8a41..6923211 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1685,6 +1685,13 @@ mips*-*-netbsd*) # NetBSD/mips, either endian.
tm_file=elfos.h ${tm_file} mips/elf.h netbsd.h netbsd-elf.h mips/netbsd.h extra_options=${extra_options} netbsd.opt netbsd-elf.opt ;; +mips*-mti-linux*) + tm_file=dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h mips/mti-linux.h ${tm_file} mips/gnu-user.h mips/gnu-user64.h mips/linux64.h mips/linux-common.h + tmake_file=${tmake_file} mips/t-mti-linux + gnu_ld=yes + gas=yes + test x$with_llsc != x || with_llsc=yes + ;; mips64*-*-linux* | mipsisa64*-*-linux*) tm_file=dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file} mips/gnu-user.h mips/gnu-user64.h mips/linux64.h mips/linux-common.h tmake_file=${tmake_file} mips/t-linux64 @@ -3262,10 +3269,19 @@ case ${target} in yes) with_synci=synci ;; -| no) - # No is the default. + no) with_synci=no-synci ;; + ) + case ${target} in + mips*-mti-*) + with_synci=synci + ;; + *) + with_synci=no-synci + ;; + esac + ;; *) echo Unknown synci type used in --with-synci 12 exit 1 diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index 9ce466d..b98b434 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -748,6 +748,9 @@ struct mips_cpu_info { specified. --with-divide is ignored if -mdivide-traps or -mdivide-breaks are specified. */ +#ifndef SYNCI_SPEC +#define SYNCI_SPEC -m%(VALUE) +#endif #define OPTION_DEFAULT_SPECS \ {arch, %{ MIPS_ARCH_OPTION_SPEC :;: -march=%(VALUE)} }, \ {arch_32, %{ OPT_ARCH32 :%{ MIPS_ARCH_OPTION_SPEC :;: -march=%(VALUE)}} }, \ @@ -760,7 +763,7 @@ struct mips_cpu_info { {divide, %{!mdivide-traps:%{!mdivide-breaks:-mdivide-%(VALUE)}} }, \ {llsc, %{!mllsc:%{!mno-llsc:-m%(VALUE)}} }, \ {mips-plt, %{!mplt:%{!mno-plt:-m%(VALUE)}} }, \ - {synci, %{!msynci:%{!mno-synci:-m%(VALUE)}} } + {synci, %{!msynci:%{!mno-synci: SYNCI_SPEC }} } /* A spec that infers the -mdsp setting from an -march argument. 
*/ diff --git a/gcc/config/mips/mti-linux.h b/gcc/config/mips/mti-linux.h new file mode 100644 index 000..af3d71f --- /dev/null +++ b/gcc/config/mips/mti-linux.h @@ -0,0 +1,35 @@ +/* Target macros for mips*-mti-linux* targets. + Copyright (C) 2012 + Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +/* Use the (o)32 ABI and the mips32r2 architecture by default. */ +#undef MIPS_ABI_DEFAULT +#define MIPS_ABI_DEFAULT ABI_32 +#undef MIPS_ISA_DEFAULT +#define
Re: [PATCH] Add counter histogram to fdo summary (issue6465057)
I just committed the patch (included below). I implemented the occupancy bit vector approach for recording non-zero histogram entries, and fixed a few issues uncovered with the merging in a profiled bootstrap. Passes both bootstrap and profiledbootstrap builds and regression tests.

Thanks,
Teresa

Enhances the gcov program summary by adding a histogram of arc counter entries. This is used to compute working set information in the compiler for use by optimizations that need information on hot vs cold counter values or the rough working set size in terms of the number of counters. Each working set data point is the minimum counter value and number of counters required to reach a given percentage of the cumulative counter sum across the profiled execution (sum_all in the program summary).

2012-09-04  Teresa Johnson  tejohn...@google.com

    * libgcc/libgcov.c (struct gcov_summary_buffer): New structure.
    (gcov_histogram_insert): New function.
    (gcov_compute_histogram): Ditto.
    (gcov_exit): Invoke gcov_compute_histogram, and perform merging of
    histograms during summary merging.
    * gcc/gcov-io.c (gcov_write_summary): Write out non-zero histogram
    entries to function summary along with an occupancy bit vector.
    (gcov_read_summary): Read in the histogram entries.
    (gcov_histo_index): New function.
    (void gcov_histogram_merge): Ditto.
    * gcc/gcov-io.h (gcov_type_unsigned): New type.
    (struct gcov_bucket_type): Ditto.
    (struct gcov_ctr_summary): Include histogram.
    (GCOV_TAG_SUMMARY_LENGTH): Update to include histogram entries.
    (GCOV_HISTOGRAM_SIZE): New macro.
    (GCOV_HISTOGRAM_BITVECTOR_SIZE): Ditto.
    * gcc/profile.c (NUM_GCOV_WORKING_SETS): Ditto.
    (gcov_working_sets): New global variable.
    (compute_working_sets): New function.
    (find_working_set): Ditto.
    (get_exec_counts): Invoke compute_working_sets.
    * gcc/coverage.c (read_counts_file): Merge histograms, and
    fix bug with accessing summary info for non-summable counters.
    * gcc/basic-block.h (gcov_type_unsigned): New type.
(struct gcov_working_set_info): Ditto. (find_working_set): Declare. * gcc/gcov-dump.c (tag_summary): Dump out histogram. Index: libgcc/libgcov.c === --- libgcc/libgcov.c(revision 190950) +++ libgcc/libgcov.c(working copy) @@ -97,6 +97,12 @@ struct gcov_fn_buffer /* note gcov_fn_info ends in a trailing array. */ }; +struct gcov_summary_buffer +{ + struct gcov_summary_buffer *next; + struct gcov_summary summary; +}; + /* Chain of per-object gcov structures. */ static struct gcov_info *gcov_list; @@ -276,6 +282,76 @@ gcov_version (struct gcov_info *ptr, gcov_unsigned return 1; } +/* Insert counter VALUE into HISTOGRAM. */ + +static void +gcov_histogram_insert(gcov_bucket_type *histogram, gcov_type value) +{ + unsigned i; + + i = gcov_histo_index(value); + histogram[i].num_counters++; + histogram[i].cum_value += value; + if (value histogram[i].min_value) +histogram[i].min_value = value; +} + +/* Computes a histogram of the arc counters to place in the summary SUM. */ + +static void +gcov_compute_histogram (struct gcov_summary *sum) +{ + struct gcov_info *gi_ptr; + const struct gcov_fn_info *gfi_ptr; + const struct gcov_ctr_info *ci_ptr; + struct gcov_ctr_summary *cs_ptr; + unsigned t_ix, f_ix, ctr_info_ix, ix; + int h_ix; + + /* This currently only applies to arc counters. */ + t_ix = GCOV_COUNTER_ARCS; + + /* First check if there are any counts recorded for this counter. */ + cs_ptr = (sum-ctrs[t_ix]); + if (!cs_ptr-num) +return; + + for (h_ix = 0; h_ix GCOV_HISTOGRAM_SIZE; h_ix++) +{ + cs_ptr-histogram[h_ix].num_counters = 0; + cs_ptr-histogram[h_ix].min_value = cs_ptr-run_max; + cs_ptr-histogram[h_ix].cum_value = 0; +} + + /* Walk through all the per-object structures and record each of + the count values in histogram. 
*/ + for (gi_ptr = gcov_list; gi_ptr; gi_ptr = gi_ptr-next) +{ + if (!gi_ptr-merge[t_ix]) +continue; + + /* Find the appropriate index into the gcov_ctr_info array + for the counter we are currently working on based on the + existence of the merge function pointer for this object. */ + for (ix = 0, ctr_info_ix = 0; ix t_ix; ix++) +{ + if (gi_ptr-merge[ix]) +ctr_info_ix++; +} + for (f_ix = 0; f_ix != gi_ptr-n_functions; f_ix++) +{ + gfi_ptr = gi_ptr-functions[f_ix]; + + if (!gfi_ptr || gfi_ptr-key != gi_ptr) +continue; + + ci_ptr = gfi_ptr-ctrs[ctr_info_ix]; + for (ix = 0; ix ci_ptr-num; ix++) +gcov_histogram_insert (cs_ptr-histogram, ci_ptr-values[ix]); +} +} +} + /* Dump the coverage counts. We merge with existing counts when
Re: [patch,libgcc] fp-bit.c: filter-out LIB2FUNCS_EXCLUDE
On Mon, Sep 3, 2012 at 8:30 AM, Georg-Johann Lay a...@gjlay.de wrote:

>	* Makefile.in (FPBIT_FUNCS): filter-out LIB2FUNCS_EXCLUDE.
>	(DPBIT_FUNCS): Ditto.
>	(TPBIT_FUNCS): Ditto.

This is OK.

Thanks.

Ian
Re: Fix bootstrap failure with clang++ (PR 54484)
On 2012-09-04 17:10, Steven Bosscher wrote:
> On Tue, Sep 4, 2012 at 11:07 PM, Diego Novillo dnovi...@google.com wrote:
>> Fix bootstrap failure with clang++.
>>
>> This patch fixes a bootstrap failure when using clang as the host compiler. Default arguments for class template member functions should be added in the declaration, not the definition.
>>
>> From Jason: 8.3.6 says "Default arguments for a member function of a class template shall be specified on the initial declaration of the member function within the class template."
>
> If GCC doesn't diagnose this, what is there to avoid this problem in the future?

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54485

Diego.
Re: Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1
Bin Cheng bin.ch...@arm.com writes:
> Hi,
> For thumb1, arm-gcc intentionally rewrites move insns into a subtract of ZERO in the peephole2 pass, then executes pass_if_after_reload/pass_regrename/pass_cprop_hardreg sequentially. In this scenario, copy propagation opportunities are missed because:
>   1. the move insns are rewritten.
>   2. pass_cprop_hardreg currently doesn't notice the subtract of ZERO.
>
> This patch fixes the problem and the logic is:
>   1. notice the plus/subtract of ZERO in pass_cprop_hardreg.
>   2. if the last insn providing information about condition codes is in the form of dest_reg = src_reg - 0, record the src_reg in the newly added field thumb1_cc_op0_src of structure machine_function.
>   3. in pattern cbranchsi4_insn, check thumb1_cc_op0_src along with thumb1_cc_op0 to save one comparison insn.
>
> I measured the patch on CSiBE; about 600 bytes are saved for both O2 and Os on cortex-m0 without any regression. I also tested the patch on arm-none-eabi+cortex-m0/arm-none-eabi+cortex-m3/i686-pc-linux and no regressions were introduced.
>
> So is it OK?
>
> Thanks
>
> 2012-08-13  Bin Cheng  bin.ch...@arm.com
>
>	* regcprop.c (copyprop_hardreg_forward_1): Notice copies in the
>	form of subtract of ZERO.
>	* config/arm/arm.h (thumb1_cc_op0_src): New field.
>	* config/arm/arm.c (thumb1_final_prescan_insn): Record
>	thumb1_cc_op0_src.
>	* config/arm/arm.md (cbranchsi4_insn): Check thumb1_cc_op0_src
>	along with thumb1_cc_op0.
>
> Ping?
>
> Hi Ramana, could you help me review this patch?
> Hi Eric, Richard, could you help me review the change in regcprop.c?

Subtraction of zero isn't canonical rtl though. Passes after peephole2 would be well within their rights to simplify the expression back to a move. From that point of view, making the passes recognise (plus X 0) and (minus X 0) as special cases would be inconsistent.

Rather than make the Thumb 1 CC usage implicit in the rtl stream, and carry the current state around in cfun->machine, it seems like it would be better to get md_reorg to rewrite the instructions into a form that makes the use of condition codes explicit. md_reorg also sounds like a better place in the pipeline than peephole2 to be doing this kind of transformation, although I admit I have zero evidence to back that up...

Richard
Re: [PATCH] Reduce memory usage for storing LTO decl resolutions
On Tue, Sep 4, 2012 at 6:43 PM, Andi Kleen a...@firstfloor.org wrote:
> +/* Compact representation of a index - resolution pair. Unpacked to an
> +   vector later.  */
> +struct res_pair
> +{
> +  ld_plugin_symbol_resolution_t res;
> +  unsigned index;
> +};
> +typedef struct res_pair res_pair;
> +
> +DEF_VEC_P(res_pair);
> +DEF_VEC_ALLOC_P(res_pair, heap);

Did you mean to use DEF_VEC_O here? (Not sure it matters after the vec rewrite for c++)

Ciao!
Steven
Re: [PATCH] rs6000: Add a builtin to read the time base register on PowerPC
>> For things that do mftb with high frequency, maybe you should also add a builtin that does just an mftb, i.e. returns a 32-bit result on 32-bit implementations.
>
> Are you thinking of a function that returns only the TBL?

On 32-bit, just TBL; on 64-bit, the whole TB (there is no machine instruction to read just TBL on 64-bit, so it doesn't make much sense to have it return a 32-bit number).

> It sounds like you are asking for an additional interface for high-frequency events that only reads one register on both PPC32 and PPC64.

Yes. A builtin only makes sense for measuring very short intervals; the builtin is quite a hassle (the timebase is not part of the UISA, and as we see it actually differs a lot between implementations), and there is no advantage over having it in some library if you're measuring big intervals.

> I do not believe that interface currently exists for PPC in GLibc

Does glibc implement the timebase thing at all? I lost track of those patches.

> and that seems out of the scope of this patch. It could be a nice feature, but it's a new feature request that is not necessary for this round of patches.

Sure; on the other hand, it seems simple enough to implement. It was just a request.

Segher