Re: Revert PowerPC shrink-wrap support 3 of 3
On Thu, 10 Nov 2011, Hans-Peter Nilsson wrote: I think I need someone with appropriate write privileges to agree with that, and to also give 48h for someone to fix the problem. Sorry for not forthcoming on the second point. brgds, H-P PS. where is the policy written down, besides the mailing list archives? https://gcc.gnu.org/develop.html has had the following since 2003 or so: Patch Reversion If a patch is committed which introduces a regression on any target which the Steering Committee considers to be important and if: * the problem is reported to the original poster; * 48 hours pass without the original poster or any other party indicating that a fix will be forthcoming in the very near future; * two people with write privileges to the affected area of the compiler determine that the best course of action is to revert the patch; then they may revert the patch. (The list of important targets will be revised at the beginning of each release cycle, if necessary, and is part of the release criteria.) After the patch has been reverted, the poster may appeal the decision to the Steering Committee. Note that no distinction is made between patches which are themselves buggy and patches that expose latent bugs elsewhere in the compiler. This is there as part of the overall (release) methodology, though svnwrite.html would be at least as natural. Perhaps a reference there? Thoughts? Gerald PS: Yes, I am catching up with some older mails. ;-)
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
On 11/14/2011 11:56 AM, Alan Modra wrote: * function.c (thread_prologue_and_epilogue_insns): Guard emitting return with single_succ_p test. Ok. r~
CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
From: Bernd Schmidt ber...@codesourcery.com Date: Mon, 14 Nov 2011 10:51:56 +0100 On 11/11/11 20:13, Hans-Peter Nilsson wrote: AFAICT, your patch has got sufficiently testing now (on three targets to boot) to be considered safe to check in. Or is something amiss? (If it's the unchecked code quality you mentioned, that can be just as well dealt with having the tree in a working state; having the tree broken for some targets accomplishes nothing.) I briefly looked at that, and while it does tend to move stuff around compared to an unpatched compiler, the effects don't seem to be all that bad. If we find a testcase where it's a real problem, we could do more aggressive reordering after epilogue generation, when we no longer have exit fallthroguh edges. So I'd like to ask for an OK here. Looks like all we need is a positive review of http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a ChangeLog entry to unbreak three or more targets. Someone with approval rights: pretty please? brgds, H-P
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
Someone with approval rights: pretty please? Can I add my +1 pretty please as well here :) ? According to #c3 this fixes arm-linux-gnueabi cross-builds for C++ as well and potentially allows this to bootstrap again. I have kicked off a bootstrap and test run on arm-linux-gnueabi . cheers Ramana
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
Ramana Radhakrishnan ramana.radhakrish...@linaro.org writes: Someone with approval rights: pretty please? Can I add my +1 pretty please as well here :) ? According to #c3 this fixes arm-linux-gnueabi cross-builds for C++ as well and potentially allows this to bootstrap again. I have kicked off a bootstrap and test run on arm-linux-gnueabi . It's better than before, but Ada bootstrap (on Solaris/SPARC, cf. PR bootstrap/51086) still fails with the same ICE. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: Looks like all we need is a positive review of http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a ChangeLog entry to unbreak three or more targets. Someone with approval rights: pretty please? That patch is ok. r~
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote: On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: Looks like all we need is a positive review of http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a ChangeLog entry to unbreak three or more targets. Someone with approval rights: pretty please? That patch is ok. Sorry for the bootstrap problems. I believe the cause is my extraction of convert_jumps_to_returns and emit_return_for_exit. The new HAVE_return code misses a single_succ_p test (covered by the bitmap_bit_p (bb_flags, ...) test in the HAVE_simple_return case). I haven't really looked into what Bernd's fix does. I know this one fixes what I broke.. * function.c (thread_prologue_and_epilogue_insns): Guard emitting return with single_succ_p test. Index: gcc/function.c === --- gcc/function.c (revision 181188) +++ gcc/function.c (working copy) @@ -6230,7 +6230,8 @@ thread_prologue_and_epilogue_insns (void !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb))) convert_jumps_to_returns (last_bb, false, NULL); - if (EDGE_COUNT (exit_fallthru_edge-src-preds) != 0) + if (EDGE_COUNT (last_bb-preds) != 0 + single_succ_p (last_bb)) { last_bb = emit_return_for_exit (exit_fallthru_edge, false); epilogue_end = returnjump = BB_END (last_bb); -- Alan Modra Australia Development Lab, IBM
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
From: Richard Henderson r...@redhat.com Date: Mon, 14 Nov 2011 18:48:03 +0100 On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: Looks like all we need is a positive review of http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a ChangeLog entry to unbreak three or more targets. Someone with approval rights: pretty please? That patch is ok. Committed with this ChangeLog entry, obvious from the function header and the added comment. 2011-11-15 Bernd Schmidt ber...@codesourcery.com PR rtl-optimization/51051 * cfgrtl.c (cfg_layout_can_merge_blocks_p): Return FALSE if the move would cause fallthrough into the exit block. brgds, H-P
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
On 11/14/2011 11:56 AM, Alan Modra wrote: On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote: On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote: Looks like all we need is a positive review of http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a ChangeLog entry to unbreak three or more targets. Someone with approval rights: pretty please? That patch is ok. Sorry for the bootstrap problems. I believe the cause is my extraction of convert_jumps_to_returns and emit_return_for_exit. The new HAVE_return code misses a single_succ_p test (covered by the bitmap_bit_p (bb_flags, ...) test in the HAVE_simple_return case). I haven't really looked into what Bernd's fix does. I know this one fixes what I broke.. * function.c (thread_prologue_and_epilogue_insns): Guard emitting return with single_succ_p test. Hmm. This looks plausible too. Bernd's patch made sure that cfglayout didn't do something impossible. Of course, it's possible that his patch should merely be an assert, and the correct fix goes here. Thoughts? r~
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
On 11/15/11 01:43, Richard Henderson wrote: On 11/14/2011 11:56 AM, Alan Modra wrote: * function.c (thread_prologue_and_epilogue_insns): Guard emitting return with single_succ_p test. Hmm. This looks plausible too. Bernd's patch made sure that cfglayout didn't do something impossible. Of course, it's possible that his patch should merely be an assert, and the correct fix goes here. I haven't really looked at Alan's patch; I'm thinking there were probably two bugs - the one I found being latent before. Note that the problematic CFG exists way before thread_prologue_and_epilogue even runs, so an assert would still trigger. Bernd
Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3
From: Bernd Schmidt ber...@codesourcery.com Date: Tue, 15 Nov 2011 01:54:34 +0100 On 11/15/11 01:43, Richard Henderson wrote: On 11/14/2011 11:56 AM, Alan Modra wrote: * function.c (thread_prologue_and_epilogue_insns): Guard emitting return with single_succ_p test. Hmm. This looks plausible too. Bernd's patch made sure that cfglayout didn't do something impossible. Of course, it's possible that his patch should merely be an assert, and the correct fix goes here. I haven't really looked at Alan's patch; I'm thinking there were probably two bugs - the one I found being latent before. Note that the problematic CFG exists way before thread_prologue_and_epilogue even runs, so an assert would still trigger. I suspect that Alan's fix here will take care of the sparc ada bootstrap regression reported, and regardless if it's correct it should be installed.
Re: Revert PowerPC shrink-wrap support 3 of 3
From: Hans-Peter Nilsson h...@axis.com Date: Thu, 10 Nov 2011 18:52:39 +0100 From: Hans-Peter Nilsson h...@axis.com Date: Thu, 10 Nov 2011 15:12:54 +0100 From: Bernd Schmidt ber...@codesourcery.com Date: Thu, 10 Nov 2011 14:29:04 +0100 HP, can you run full tests? Cross-test to cris-elf in progress. Thanks! Works, no regressions compared to before the breakage (r181187). Thanks! According to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3 it fixes building for arm-unknown-linux-gnueabi too. AFAICT, your patch has got sufficiently testing now (on three targets to boot) to be considered safe to check in. Or is something amiss? (If it's the unchecked code quality you mentioned, that can be just as well dealt with having the tree in a working state; having the tree broken for some targets accomplishes nothing.) brgds, H-P
Revert PowerPC shrink-wrap support 3 of 3
From: Hans-Peter Nilsson h...@axis.com Date: Wed, 9 Nov 2011 09:55:59 +0100 From: Alan Modra amo...@gmail.com Date: Tue, 1 Nov 2011 16:33:40 +0100 On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. This (a revision in the range 181187:181189) broke build for cris-elf like so: See PR51051. Given that this also broke arm-linux-gnueabi, a primary platform, and Alan being absent until the 15th according to a message on IRC, I move to revert r181188. I think I need someone with appropriate write privileges to agree with that, and to also give 48h for someone to fix the problem. Sorry for not forthcoming on the second point. brgds, H-P PS. where is the policy written down, besides the mailing list archives?
Re: Revert PowerPC shrink-wrap support 3 of 3
On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote: From: Hans-Peter Nilsson h...@axis.com Date: Wed, 9 Nov 2011 09:55:59 +0100 From: Alan Modra amo...@gmail.com Date: Tue, 1 Nov 2011 16:33:40 +0100 On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. This (a revision in the range 181187:181189) broke build for cris-elf like so: See PR51051. Given that this also broke arm-linux-gnueabi, a primary platform, and Alan being absent until the 15th according to a message on IRC, I move to revert r181188. Is there a PR for the arm issue? I think I need someone with appropriate write privileges to agree with that, and to also give 48h for someone to fix the problem. Sorry for not forthcoming on the second point. Did you or somebody else try to look into the problem? To decide whether it's the best course of action it would be nice to know if it's a simple error in the patch that is easy to fix. brgds, H-P PS. where is the policy written down, besides the mailing list archives? http://gcc.gnu.org/develop.html
Re: Revert PowerPC shrink-wrap support 3 of 3
From: Richard Guenther richard.guent...@gmail.com Date: Thu, 10 Nov 2011 12:22:56 +0100 On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote: From: Hans-Peter Nilsson h...@axis.com Date: Wed, 9 Nov 2011 09:55:59 +0100 From: Alan Modra amo...@gmail.com Date: Tue, 1 Nov 2011 16:33:40 +0100 On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. This (a revision in the range 181187:181189) broke build for cris-elf like so: See PR51051. Given that this also broke arm-linux-gnueabi, a primary platform, and Alan being absent until the 15th according to a message on IRC, I move to revert r181188. Is there a PR for the arm issue? It's covered by the same PR, see comment #1. I've now updated the target field. I think I need someone with appropriate write privileges to agree with that, and to also give 48h for someone to fix the problem. Sorry for not forthcoming on the second point. Did you or somebody else try to look into the problem? To decide whether it's the best course of action it would be nice to know if it's a simple error in the patch that is easy to fix. Nope, not really. Wouldn't FWIW, de jure matter, me not having write privileges to the affected area. Though, I had a quick look at the patch and nothing stood out except its intrusiveness, and it seems the patch wasn't tested on a !simple_return target (just powerpc-linux according to the replied-to message). brgds, H-P
Re: Revert PowerPC shrink-wrap support 3 of 3
On Thu, Nov 10, 2011 at 12:43 PM, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote: From: Richard Guenther richard.guent...@gmail.com Date: Thu, 10 Nov 2011 12:22:56 +0100 On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson hans-peter.nils...@axis.com wrote: From: Hans-Peter Nilsson h...@axis.com Date: Wed, 9 Nov 2011 09:55:59 +0100 From: Alan Modra amo...@gmail.com Date: Tue, 1 Nov 2011 16:33:40 +0100 On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. This (a revision in the range 181187:181189) broke build for cris-elf like so: See PR51051. Given that this also broke arm-linux-gnueabi, a primary platform, and Alan being absent until the 15th according to a message on IRC, I move to revert r181188. Is there a PR for the arm issue? It's covered by the same PR, see comment #1. I've now updated the target field. I think I need someone with appropriate write privileges to agree with that, and to also give 48h for someone to fix the problem. Sorry for not forthcoming on the second point. Did you or somebody else try to look into the problem? To decide whether it's the best course of action it would be nice to know if it's a simple error in the patch that is easy to fix. Nope, not really. Wouldn't FWIW, de jure matter, me not having write privileges to the affected area. Though, I had a quick look at the patch and nothing stood out except its intrusiveness, and it seems the patch wasn't tested on a !simple_return target (just powerpc-linux according to the replied-to message). Fair enough. You can count me as one then, and I'll defer to Bernd to either provide a fix or ack the revert. Thanks, Richard. brgds, H-P
Re: Revert PowerPC shrink-wrap support 3 of 3
On 11/10/11 13:14, Richard Guenther wrote: Fair enough. You can count me as one then, and I'll defer to Bernd to either provide a fix or ack the revert. I'm trying to track it down. In 189r.outof_cfglayout, we have (insn 31 33 35 3 (use (reg/i:SI 0 r0)) ../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1 (nil)) ;; Successors: EXIT [100.0%] (fallthru) ;; lr out 0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp] ;; live out 0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp] followed by a number of other basic blocks, so that looks wrong to me. outof_cfglayout seems to assume that fallthrough edges to the exit block are OK and don't need fixing up, and changing that seems nontrivial at first glance. The situation is first created during cfgcleanup in into_cfglayout. The following patch makes the testcase compile by stopping the compiler from moving the exit fallthru block around, but I've not checked whether it has a negative effect on code quality. HP, can you run full tests? Bernd Index: ../baseline-trunk/gcc/cfgrtl.c === --- ../baseline-trunk/gcc/cfgrtl.c (revision 181252) +++ ../baseline-trunk/gcc/cfgrtl.c (working copy) @@ -2735,6 +2735,16 @@ cfg_layout_can_merge_blocks_p (basic_blo if (BB_PARTITION (a) != BB_PARTITION (b)) return false; + /* If we would end up moving B's instructions, make sure it doesn't fall + through into the exit block, since we cannot recover from a fallthrough + edge into the exit block occurring in the middle of a function. */ + if (NEXT_INSN (BB_END (a)) != BB_HEAD (b)) +{ + edge e = find_fallthru_edge (b-succs); + if (e e-dest == EXIT_BLOCK_PTR) + return false; +} + /* There must be exactly one edge in between the blocks. */ return (single_succ_p (a) single_succ (a) == b
Re: Revert PowerPC shrink-wrap support 3 of 3
From: Hans-Peter Nilsson h...@axis.com Date: Thu, 10 Nov 2011 15:12:54 +0100 From: Bernd Schmidt ber...@codesourcery.com Date: Thu, 10 Nov 2011 14:29:04 +0100 HP, can you run full tests? Cross-test to cris-elf in progress. Thanks! Works, no regressions compared to before the breakage (r181187). Thanks! According to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3 it fixes building for arm-unknown-linux-gnueabi too. brgds, H-P
Re: Revert PowerPC shrink-wrap support 3 of 3
On Thu, Nov 10, 2011 at 02:29:04PM +0100, Bernd Schmidt wrote: On 11/10/11 13:14, Richard Guenther wrote: Fair enough. You can count me as one then, and I'll defer to Bernd to either provide a fix or ack the revert. I'm trying to track it down. In 189r.outof_cfglayout, we have (insn 31 33 35 3 (use (reg/i:SI 0 r0)) ../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1 (nil)) ;; Successors: EXIT [100.0%] (fallthru) ;; lr out 0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp] ;; live out 0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp] followed by a number of other basic blocks, so that looks wrong to me. outof_cfglayout seems to assume that fallthrough edges to the exit block are OK and don't need fixing up, and changing that seems nontrivial at first glance. The situation is first created during cfgcleanup in into_cfglayout. The following patch makes the testcase compile by stopping the compiler from moving the exit fallthru block around, but I've not checked whether it has a negative effect on code quality. HP, can you run full tests? FWIW, I did bootstrap and make check with/without this patch, and it introduces no regressions in the PowerPC, but I haven't look at the code generated. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: PowerPC shrink-wrap support 3 of 3
On Wed, Nov 02, 2011 at 02:03:40AM +1030, Alan Modra wrote: Bootstrapped and regression tested powerpc-linux. OK to apply? (And I won't be posting any more versions of the patch until this is reviewed. Please excuse me for spamming the list.) Looks reasonable to me, appart from * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. Index: gcc/function.c === --- gcc/function.c(revision 180588) +++ gcc/function.c(working copy) @@ -65,6 +65,8 @@ along with GCC; see the file COPYING3. #include df.h #include timevar.h #include vecprim.h +#include params.h +#include bb-reorder.h This needs corresponding change in Makefile.in, function.o needs $(PARAMS_H) bb-reorder.h dependencies added. Ok for trunk with that change if no maintainer objects within 24 hours. Jakub
Re: PowerPC shrink-wrap support 3 of 3
On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote: Bits left to do - limit size of duplicated tails Done here. Also fixes a hole in that I took no notice of targetm.cannot_copy_insn_p when duplicating tails. One interesting result is that the tail duplication actually reduces the text size of libstdc++.so from 1074042 to 1073478 bytes on powerpc-linux. The reason being that a shrink-wrapped function that needs a prologue only on paths ending in a sibling call will lose one copy of the epilogue. That must happen enough to more than make up for duplicated tails. Bootstrapped and regression tested powerpc-linux. OK to apply? (And I won't be posting any more versions of the patch until this is reviewed. Please excuse me for spamming the list.) * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and HAVE_simple_return. * bb-reorder.c (get_uncond_jump_length): Make global. * bb-reorder.h (get_uncond_jump_length): Declare. * cfgrtl.c (rtl_create_basic_block): Comment typo fix. (rtl_split_edge): Likewise. Warning fix. (rtl_duplicate_bb): New function. (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block. Index: gcc/function.c === --- gcc/function.c (revision 180588) +++ gcc/function.c (working copy) @@ -65,6 +65,8 @@ along with GCC; see the file COPYING3. #include df.h #include timevar.h #include vecprim.h +#include params.h +#include bb-reorder.h /* So we can assign to cfun in this file. */ #undef cfun @@ -5290,8 +5292,6 @@ requires_stack_frame_p (rtx insn, HARD_R HARD_REG_SET hardregs; unsigned regno; - if (!INSN_P (insn) || DEBUG_INSN_P (insn)) -return false; if (CALL_P (insn)) return !SIBLING_CALL_P (insn); @@ -5514,23 +5514,186 @@ set_return_jump_label (rtx returnjump) JUMP_LABEL (returnjump) = ret_rtx; } -/* Return true if BB has any active insns. */ +#ifdef HAVE_simple_return +/* Create a copy of BB instructions and insert at BEFORE. Redirect + preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE. */ +static void +dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before, + bitmap_head *need_prologue) +{ + edge_iterator ei; + edge e; + rtx insn = BB_END (bb); + + /* We know BB has a single successor, so there is no need to copy a + simple jump at the end of BB. */ + if (simplejump_p (insn)) +insn = PREV_INSN (insn); + + start_sequence (); + duplicate_insn_chain (BB_HEAD (bb), insn); + if (dump_file) +{ + unsigned count = 0; + for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) + if (active_insn_p (insn)) + ++count; + fprintf (dump_file, Duplicating bb %d to bb %d, %u active insns.\n, + bb-index, copy_bb-index, count); +} + insn = get_insns (); + end_sequence (); + emit_insn_before (insn, before); + + /* Redirect all the paths that need no prologue into copy_bb. */ + for (ei = ei_start (bb-preds); (e = ei_safe_edge (ei)); ) +if (!bitmap_bit_p (need_prologue, e-src-index)) + { + redirect_edge_and_branch_force (e, copy_bb); + continue; + } +else + ei_next (ei); +} +#endif + +#if defined (HAVE_return) || defined (HAVE_simple_return) +/* Return true if there are any active insns between HEAD and TAIL. */ static bool -bb_active_p (basic_block bb) +active_insn_between (rtx head, rtx tail) { + while (tail) +{ + if (active_insn_p (tail)) + return true; + if (tail == head) + return false; + tail = PREV_INSN (tail); +} + return false; +} + +/* LAST_BB is a block that exits, and empty of active instructions. + Examine its predecessors for jumps that can be converted to + (conditional) returns. */ +static VEC (edge, heap) * +convert_jumps_to_returns (basic_block last_bb, bool simple_p, + VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED) +{ + int i; + basic_block bb; rtx label; + edge_iterator ei; + edge e; + VEC(basic_block,heap) *src_bbs; + + src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb-preds)); + FOR_EACH_EDGE (e, ei, last_bb-preds) +if (e-src != ENTRY_BLOCK_PTR) + VEC_quick_push (basic_block,
Re: PowerPC shrink-wrap support 3 of 3
So I'm at the point where I'm reasonably happy with this work. This patch doesn't do anything particularly clever regarding our shrink-wrap implementation. We still only insert one copy of the prologue, and one of the epilogue in thread_prologue_and_epilogue. All it really does is replaces Bernd's !last_bb_active code (allowing one tail block with no active insns to be shared by paths needing a prologue and paths not needing a prologue), with what I think is conceptually simpler, duplicating a shared tail block. Then I extend this to duplicating a chain of tail blocks. That leads to some simplification as all the special cases and restrictions of !last_bb_active disappear. For example, convert_jumps_to_returns looks much like the code in gcc-4.6. We also get many more functions being shrink-wrapped. Some numbers from my latest gcc bootstraps: powerpc-linux .../gcc-virgin/gcc grep 'Performing shrink' *.pro_and_epilogue | wc -l 453 .../gcc-curr/gcc grep 'Performing shrink' *.pro_and_epilogue | wc -l 648 i686-linux .../gcc-virgin/gcc$ grep 'Performing shrink' *pro_and_epilogue | wc -l 329 .../gcc-curr/gcc$ grep 'Performing shrink' *.pro_and_epilogue | wc -l 416 Bits left to do - limit size of duplicated tails - don't duplicate sibling call blocks, but instead split the block after the sibling call epilogue has been added, redirecting non-prologue paths past the epilogue. Is this OK to apply as is? * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. Index: gcc/function.c === *** gcc/function.c (revision 180588) --- gcc/function.c (working copy) *** set_return_jump_label (rtx returnjump) *** 5514,5535 JUMP_LABEL (returnjump) = ret_rtx; } ! /* Return true if BB has any active insns. */ static bool ! bb_active_p (basic_block bb) { rtx label; ! /* Test whether there are active instructions in BB. */ ! label = BB_END (bb); ! while (label !LABEL_P (label)) { ! if (active_insn_p (label)) ! break; ! label = PREV_INSN (label); } ! return BB_HEAD (bb) != label || !LABEL_P (label); } /* Generate the prologue and epilogue RTL if the machine supports it. Thread this into place with notes indicating where the prologue ends and where --- 5514,5698 JUMP_LABEL (returnjump) = ret_rtx; } ! #ifdef HAVE_simple_return ! /* Create a copy of BB instructions and insert at BEFORE. Redirect !preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE. */ ! static void ! dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before, ! bitmap_head *need_prologue) ! { ! edge_iterator ei; ! edge e; ! rtx insn = BB_END (bb); ! ! /* We know BB has a single successor, so there is no need to copy a ! simple jump at the end of BB. */ ! if (simplejump_p (insn)) ! insn = PREV_INSN (insn); ! ! start_sequence (); ! duplicate_insn_chain (BB_HEAD (bb), insn); ! if (dump_file) ! { ! unsigned count = 0; ! for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) ! if (active_insn_p (insn)) ! ++count; ! fprintf (dump_file, Duplicating bb %d to bb %d, %u active insns.\n, ! bb-index, copy_bb-index, count); ! } ! insn = get_insns (); ! end_sequence (); ! emit_insn_before (insn, before); ! ! /* Redirect all the paths that need no prologue into copy_bb. */ ! for (ei = ei_start (bb-preds); (e = ei_safe_edge (ei)); ) ! if (!bitmap_bit_p (need_prologue, e-src-index)) ! { ! redirect_edge_and_branch_force (e, copy_bb); ! continue; ! } ! else ! ei_next (ei); ! } ! #endif ! ! #if defined (HAVE_return) || defined (HAVE_simple_return) ! /* Return true if there are any active insns between HEAD and TAIL. */ static bool ! active_insn_between (rtx head, rtx tail) ! { ! while (tail) ! { ! if (active_insn_p (tail)) ! return true; ! if (tail == head) ! return false; ! tail = PREV_INSN (tail); ! } ! return false; ! } ! ! /* LAST_BB is a block that exits, and empty of active instructions. !Examine its predecessors for jumps that can be converted to !(conditional) returns. */ ! static VEC (edge, heap) * ! convert_jumps_to_returns (basic_block last_bb, bool simple_p, !
Re: PowerPC shrink-wrap support 3 of 3
On Thu, Oct 27, 2011 at 12:24:46AM +1030, Alan Modra wrote: more code than duplicating epilogues. From what I've seen, the duplicate tails are generally very small. I guess I should dump out some info so we can get a better idea. There were 545 occurrences of shrink-wrap in the gcc/ dir for a -O2 powerpc-linux bootstrap. I counted active insns in duplicated blocks. 244 had zero insns (all the ones I looked at were just uses of r3), 96 had one insn (all the ones I looked at were setting r3), with the remainder being: cfgrtl.c.199r.pro_and_epilogue:Duplicating bb 6, 2 insns. tail call function.c.199r.pro_and_epilogue:Duplicating bb 17, 2 insns. setting a value via pointer (*poffset in instantiate_new_reg) insn-automata.c.199r.pro_and_epilogue:Duplicating bb 229, 2 insns. tail call varasm.c.199r.pro_and_epilogue:Duplicating bb 48, 2 insns. setting a global var rs6000.c.199r.pro_and_epilogue:Duplicating bb 300, 3 insns. tail call rs6000.c.199r.pro_and_epilogue:Duplicating bb 8, 3 insns. tail call toplev.c.199r.pro_and_epilogue:Duplicating bb 5, 4 insns. loading two word return value from random_seed var. reginfo.c.199r.pro_and_epilogue:Duplicating bb 4, 7 insns. setting a two word global var and r3 dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns. tail call dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns. tail call Note that having 350 duplicated blocks doesn't mean we had 350 extra cases of shrink-wrapping, because some tails were two blocks, and I recall seeing a case when developing my code where one function had two duplicated tails. Certainly many more shrink wrapped functions though. :-) -- Alan Modra Australia Development Lab, IBM
Re: PowerPC shrink-wrap support 3 of 3
On Sun, Oct 16, 2011 at 02:51:01PM -0400, David Edelsohn wrote: The patch is okay, although I am not thrilled about the need to change the register allocation order. Committed revision 180522. It turns out that shrink-wrapping isn't as effective as it used to be with the 20110915 based sources I was using originally. povray Ray_In_Bound no longer gets the benefit of shrink wrap, likely due to some cfg optimization. We end up with a simple block that just does r3=1 then jumps to last_bb being reached from blocks that need a prologue as well as blocks that don't. That's enough to kill our current shrink wrap implementation. What we need is something to duplicate these tail blocks.. Patch here for comment. I haven't yet run benchmarks, but this passes bootstrap and regression test (on rev 180286, current virgin mainline fails bootstrap on powerpc-linux). * function.c (thread_prologue_and_epilogue_insns): Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. Index: gcc/function.c === --- gcc/function.c (revision 180467) +++ gcc/function.c (working copy) @@ -5697,11 +5697,11 @@ thread_prologue_and_epilogue_insns (void HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge; HARD_REG_SET set_up_by_prologue; rtx p_insn; - VEC(basic_block, heap) *vec; basic_block bb; bitmap_head bb_antic_flags; bitmap_head bb_on_list; + bitmap_head bb_tail; if (dump_file) fprintf (dump_file, Attempting shrink-wrapping optimization.\n); @@ -5732,6 +5732,7 @@ thread_prologue_and_epilogue_insns (void bitmap_initialize (bb_antic_flags, bitmap_default_obstack); bitmap_initialize (bb_on_list, bitmap_default_obstack); + bitmap_initialize (bb_tail, bitmap_default_obstack); /* Find the set of basic blocks that require a stack frame. */ @@ -5774,19 +5775,22 @@ thread_prologue_and_epilogue_insns (void } } + /* Save a copy of blocks that really need a prologue. */ + bitmap_copy (bb_antic_flags, bb_flags); + /* For every basic block that needs a prologue, mark all blocks reachable from it, so as to ensure they are also seen as requiring a prologue. */ while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; + FOR_EACH_EDGE (e, ei, tmp_bb-succs) if (e-dest != EXIT_BLOCK_PTR bitmap_set_bit (bb_flags, e-dest-index)) VEC_quick_push (basic_block, vec, e-dest); } + /* If the last basic block contains only a label, we'll be able to convert jumps to it to (potentially conditional) return insns later. This means we don't necessarily need a prologue @@ -5799,14 +5803,29 @@ thread_prologue_and_epilogue_insns (void goto fail_shrinkwrap; } + /* Find the set of basic blocks that need no prologue and only +go to the exit. */ + bitmap_set_bit (bb_tail, EXIT_BLOCK_PTR-index); + VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR); + while (!VEC_empty (basic_block, vec)) + { + basic_block tmp_bb = VEC_pop (basic_block, vec); + + FOR_EACH_EDGE (e, ei, tmp_bb-preds) + if (single_succ_p (e-src) +!bitmap_bit_p (bb_antic_flags, e-src-index) +bitmap_set_bit (bb_tail, e-src-index)) + VEC_quick_push (basic_block, vec, e-src); + } + /* Now walk backwards from every block that is marked as needing -a prologue to compute the bb_antic_flags bitmap. */ - bitmap_copy (bb_antic_flags, bb_flags); +a prologue to compute the bb_antic_flags bitmap. Exclude +tail blocks; They can be duplicated to be used on paths not +needing a prologue. */ + bitmap_and_compl (bb_antic_flags, bb_flags, bb_tail); FOR_EACH_BB (bb) { - edge e; - edge_iterator ei; - if (!bitmap_bit_p (bb_flags, bb-index)) + if (!bitmap_bit_p (bb_antic_flags, bb-index)) continue; FOR_EACH_EDGE (e, ei, bb-preds) if (!bitmap_bit_p (bb_antic_flags, e-src-index) @@ -5816,8 +5835,6 @@ thread_prologue_and_epilogue_insns (void while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; bool all_set = true; bitmap_clear_bit (bb_on_list, tmp_bb-index); @@ -5862,28 +5879,172 @@ thread_prologue_and_epilogue_insns (void
Re: PowerPC shrink-wrap support 3 of 3
On 10/26/11 14:27, Alan Modra wrote: Committed revision 180522. It turns out that shrink-wrapping isn't as effective as it used to be with the 20110915 based sources I was using originally. povray Ray_In_Bound no longer gets the benefit of shrink wrap, likely due to some cfg optimization. We end up with a simple block that just does r3=1 then jumps to last_bb being reached from blocks that need a prologue as well as blocks that don't. That's enough to kill our current shrink wrap implementation. What we need is something to duplicate these tail blocks.. Would it work to insert the epilogue on some edges to this R3=1 block, and not on the others? (How many edges of each kind are there?) Now that we have an initial patch in the tree and it mostly seems to work, we can think about making it a little stronger - the initial implementation is really quite conservative. Bernd
Re: PowerPC shrink-wrap support 3 of 3
On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote: On 10/26/11 14:27, Alan Modra wrote: Committed revision 180522. It turns out that shrink-wrapping isn't as effective as it used to be with the 20110915 based sources I was using originally. povray Ray_In_Bound no longer gets the benefit of shrink wrap, likely due to some cfg optimization. We end up with a simple block that just does r3=1 then jumps to last_bb being reached from blocks that need a prologue as well as blocks that don't. That's enough to kill our current shrink wrap implementation. What we need is something to duplicate these tail blocks.. Would it work to insert the epilogue on some edges to this R3=1 block, and not on the others? Wouldn't you need to modify all the target epilogue code? Our epilogues return. (How many edges of each kind are there?) In the povray case there was one edge of each kind, but I have seen other cases where there were 4 edges from blocks needing no prologue and 2 edges from blocks needing a prologue. I can't tell you what the testcase was now; It was something I looked at when ironing out bugs in my code. You wouldn't believe how many ways it is possible to write buggy cfg manipulation code.. I guess the tradeoff between the classic shrink-wrap epilogue scheme and my duplicate tail idea is whether duplicating tail blocks adds more code than duplicating epilogues. From what I've seen, the duplicate tails are generally very small. I guess I should dump out some info so we can get a better idea. -- Alan Modra Australia Development Lab, IBM
Re: PowerPC shrink-wrap support 3 of 3
On Wed, Oct 26, 2011 at 03:59:36PM +0200, Bernd Schmidt wrote: On 10/26/11 15:54, Alan Modra wrote: I guess the tradeoff between the classic shrink-wrap epilogue scheme and my duplicate tail idea is whether duplicating tail blocks adds more code than duplicating epilogues. From what I've seen, the duplicate tails are generally very small. I guess I should dump out some info so we can get a better idea. I suppose if one wanted to avoid inserting more than one epilogue for code-size reasons, one could make a new basic block containing the epilogue, and redirect edges as appropriate. Suppose you have a function that returns r3=0 in one tail block and r3=1 in another, and these blocks are reached both by paths needing a prologue and by paths not needing a prologue. Which seems a likely common case. I'm fairly certain that would require two copies of the normal epilogue, or duplicating the tail blocks. (But it's late here and I'm ready to nod off so may not be thinking straight.) -- Alan Modra Australia Development Lab, IBM
Re: PowerPC shrink-wrap support 3 of 3
On Wed, Sep 28, 2011 at 11:47 AM, Alan Modra amo...@gmail.com wrote: * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded declaration. (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp. (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx. Use simple_return in pattern, emit instruction, and set jump_label. (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx. Use simple_return rather than return. (emit_cfa_restores): New function. (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap. Add missing cfa_restores for SAVE_WORLD. Add missing LR cfa_restore when using out-of-line gpr restore. Add missing LR and FP regs cfa_restores for out-of-line fpr restore. Consolidate code setting up cfa_restores. Formatting. Use LR_REGNO define. (rs6000_output_mi_thunk): Use simple_return rather than return. * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise. (return_internal*): Likewise. (any_return, return_pred, return_str): New iterators. (return, conditional return insns): Provide both return and simple_return variants. * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define. (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13. Move r11 and r0 later to suit shrink-wrapping. Alan, The patch is okay, although I am not thrilled about the need to change the register allocation order. Thanks, David
Re: PowerPC shrink-wrap support 3 of 3
This supercedes http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01004.html and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01593.html, fixing the two regressions introduced by those patches. The first patch is unchanged except to leave all the out-of-line restore functions using return rather than simple_return. We don't want these being confused with a plain simple_return and perhaps used by the shrink- wrapping to return from pre-prologue code. The second of these two patches was way too simplistic. It was a real pain getting the cfa_restores correct. A lot were missing, or emitted at the wrong place (due to bug in rs6000_emit_stack_reset). I also had the real restore insn move past the cfa_restores (mtlr 0 insn scheduled over loads from stack). * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded declaration. (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp. (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx. Use simple_return in pattern, emit instruction, and set jump_label. (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx. Use simple_return rather than return. (emit_cfa_restores): New function. (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap. Add missing cfa_restores for SAVE_WORLD. Add missing LR cfa_restore when using out-of-line gpr restore. Add missing LR and FP regs cfa_restores for out-of-line fpr restore. Consolidate code setting up cfa_restores. Formatting. Use LR_REGNO define. (rs6000_output_mi_thunk): Use simple_return rather than return. * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise. (return_internal*): Likewise. (any_return, return_pred, return_str): New iterators. (return, conditional return insns): Provide both return and simple_return variants. * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define. (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13. Move r11 and r0 later to suit shrink-wrapping. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 178876) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c static void rs6000_set_default_type_attributes (tree); static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool); static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool); -static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int, - enum machine_mode, bool, bool, bool); static bool rs6000_reg_live_or_pic_offset_p (int); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); static tree rs6000_builtin_vectorized_function (tree, tree, tree); @@ -19704,8 +19728,10 @@ rs6000_emit_stack_reset (rs6000_stack_t if (sp_offset != 0) { rtx dest_reg = savres ? gen_rtx_REG (Pmode, 11) : sp_reg_rtx; - return emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, - GEN_INT (sp_offset))); + rtx insn = emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, + GEN_INT (sp_offset))); + if (!savres) + return insn; } else if (!savres) return emit_move_insn (sp_reg_rtx, frame_reg_rtx); @@ -19729,10 +19755,11 @@ rs6000_emit_stack_reset (rs6000_stack_t } /* Construct a parallel rtx describing the effect of a call to an - out-of-line register save/restore routine. */ + out-of-line register save/restore routine, and emit the insn + or jump_insn as appropriate. */ static rtx -rs6000_make_savres_rtx (rs6000_stack_t *info, +rs6000_emit_savres_rtx (rs6000_stack_t *info, rtx frame_reg_rtx, int save_area_offset, enum machine_mode reg_mode, bool savep, bool gpr, bool lr) @@ -19742,6 +19769,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * int reg_size = GET_MODE_SIZE (reg_mode); rtx sym; rtvec p; + rtx par, insn; offset = 0; start_reg = (gpr @@ -19755,7 +19783,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, offset++) = ret_rtx; RTVEC_ELT (p, offset++) -= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65)); += gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO)); sym = rs6000_savres_routine_sym (info, savep, gpr, lr); RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym); @@ -19788,7 +19816,16 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg); } - return gen_rtx_PARALLEL (VOIDmode, p); + par = gen_rtx_PARALLEL (VOIDmode, p); + + if (!savep lr) +{ + insn = emit_jump_insn (par); + JUMP_LABEL (insn) = ret_rtx; +
Re: PowerPC shrink-wrap support 3 of 3
This patch fixes an issue that limit opportunities for shrink-wrapping on PowerPC. The rs6000 REG_ALLOC_ORDER chooses r0 as the very first gpr to use in code, with r11 also having high priority. This means it is quite likely that r0 or r11 is live on the edge chosen for shrink-wrapping. That's unfortunate since rs6000 uses r0, r11, and r12 as temporaries in the prologue, and thus can't insert the prologue on that edge. I avoid this situation in most cases by simply changing the register allocation order, making r0 and r11 the last of the volatile gprs to be used. I also noticed that r12 is not used until after the non-volatile regs are allocated. According to the comment, the reason this is done is to avoid putting a DI into the r12,r13 pair (and as r13 is non-volatile would mean saving and restoring all the non-volatile gprs when not inlining the saves/restores). However, when r13 is a fixed reg this cannot occur, so moving r12 before the non-volatile regs makes another reg available before we need to start saving regs. After making the register allocation order changes I found gcc bootstrap failed. This turned out to be caused by shrink-wrapping and register renaming moving some pre-prologue code after the epilogue. To support this we need to ensure no prologue unwind info is left active after the epilogue. Bootstrapped and regression tested powerpc-linux. Two regressions appeared due to a problem in the shrink-wrap code. More on that later. * config/rs6000/rs6000.c (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap. * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define. (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13. Move r11 and r0 later to suit shrink-wrapping. diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn gcc-alanshrink1/gcc/config/rs6000/rs6000.c gcc-current/gcc/config/rs6000/rs6000.c --- gcc-alanshrink1/gcc/config/rs6000/rs6000.c 2011-09-26 15:51:19.0 +0930 +++ gcc-current/gcc/config/rs6000/rs6000.c 2011-09-23 11:54:00.0 +0930 @@ -21001,7 +21045,7 @@ rs6000_emit_epilogue (int sibcall) reg = gen_rtx_REG (V4SImode, i); emit_move_insn (reg, mem); - if (DEFAULT_ABI == ABI_V4) + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } @@ -21052,10 +21096,16 @@ rs6000_emit_epilogue (int sibcall) } /* Set LR here to try to overlap restores below. LR is always saved - above incoming stack, so it never needs REG_CFA_RESTORE. */ + above incoming stack, so it never needs REG_CFA_RESTORE except + when shrink-wrapping as in that case we may have blocks from + before the prologue being moved after the epilogue. */ if (restore_lr restoring_GPRs_inline) -emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO), - gen_rtx_REG (Pmode, 0)); +{ + rtx lr = gen_rtx_REG (Pmode, LR_REGNO); + emit_move_insn (lr, gen_rtx_REG (Pmode, 0)); + if (flag_shrink_wrap) + cfa_restores = alloc_reg_note (REG_CFA_RESTORE, lr, cfa_restores); +} /* Load exception handler data registers, if needed. */ if (crtl-calls_eh_return) @@ -21146,7 +21196,7 @@ rs6000_emit_epilogue (int sibcall) reg = gen_rtx_REG (reg_mode, info-first_gp_reg_save + i); insn = emit_move_insn (reg, mem); - if (DEFAULT_ABI == ABI_V4) + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) { if (frame_pointer_needed info-first_gp_reg_save + i @@ -21188,7 +21238,7 @@ rs6000_emit_epilogue (int sibcall) if (info-cr_save_p) { rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple); - if (DEFAULT_ABI == ABI_V4) + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) cfa_restores = alloc_reg_note (REG_CFA_RESTORE, gen_rtx_REG (SImode, CR2_REGNO), @@ -21217,7 +21267,7 @@ rs6000_emit_epilogue (int sibcall) return; } - if (DEFAULT_ABI == ABI_V4) + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) { if (frame_pointer_needed) { @@ -21246,7 +21296,7 @@ rs6000_emit_epilogue (int sibcall) rtx reg = gen_rtx_REG (reg_mode, info-first_gp_reg_save + i); RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, reg, mem); - if (DEFAULT_ABI == ABI_V4) + if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap) cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores); } @@ -21271,7 +21321,7 @@ rs6000_emit_epilogue (int sibcall) rtx reg = gen_rtx_REG (reg_mode, info-first_gp_reg_save + i); insn = emit_move_insn (reg,
Re: PowerPC shrink-wrap support 3 of 3
On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote: Two regressions appeared due to a problem in the shrink-wrap code. These two. +FAIL: g++.dg/torture/pr46111.C -O1 (internal compiler error) +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error) Both internal compiler error: in maybe_record_trace_start, at dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that needs no prologue. ie. thread_prologue_and_epilogue_insns gives code needing no prologue - prologue inserted here code needing prologue more no prologue code more code needing prologue epilogue That second block needing no prologue now wrongly tries to use unwind info from the prologue. -- Alan Modra Australia Development Lab, IBM
Re: PowerPC shrink-wrap support 3 of 3
On 09/27/11 00:32, Alan Modra wrote: On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote: Two regressions appeared due to a problem in the shrink-wrap code. These two. +FAIL: g++.dg/torture/pr46111.C -O1 (internal compiler error) +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error) Both internal compiler error: in maybe_record_trace_start, at dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that needs no prologue. ie. thread_prologue_and_epilogue_insns gives code needing no prologue - prologue inserted here code needing prologue more no prologue code more code needing prologue epilogue That second block needing no prologue now wrongly tries to use unwind info from the prologue. What's the actual CFG structure here? Bernd
Re: PowerPC shrink-wrap support 3 of 3
On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote: On 09/27/11 00:32, Alan Modra wrote: On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote: Two regressions appeared due to a problem in the shrink-wrap code. These two. +FAIL: g++.dg/torture/pr46111.C -O1 (internal compiler error) +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error) Both internal compiler error: in maybe_record_trace_start, at dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that needs no prologue. ie. thread_prologue_and_epilogue_insns gives code needing no prologue - prologue inserted here code needing prologue more no prologue code more code needing prologue epilogue That second block needing no prologue now wrongly tries to use unwind info from the prologue. What's the actual CFG structure here? Immediately after thread_prologue_and_epilogue_insns bb2 - simple_ret | bb3 /\ /\ bb4bb5 (pro)| | |--| bb7 | | (epi) bb6--| | | bb8-| | (s ret)| | |-bb9 -- Alan Modra Australia Development Lab, IBM
Re: PowerPC shrink-wrap support 3 of 3
On 09/27/11 02:11, Alan Modra wrote: On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote: On 09/27/11 00:32, Alan Modra wrote: On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote: Two regressions appeared due to a problem in the shrink-wrap code. These two. +FAIL: g++.dg/torture/pr46111.C -O1 (internal compiler error) +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error) Both internal compiler error: in maybe_record_trace_start, at dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that needs no prologue. ie. thread_prologue_and_epilogue_insns gives code needing no prologue - prologue inserted here code needing prologue more no prologue code more code needing prologue epilogue That second block needing no prologue now wrongly tries to use unwind info from the prologue. What's the actual CFG structure here? Immediately after thread_prologue_and_epilogue_insns bb2 - simple_ret | bb3 /\ /\ bb4bb5 (pro)| | |--| bb7 | | (epi) bb6--| | | bb8-| | (s ret)| | |-bb9 That looks perfectly reasonable, and even if it is ordered in the way shown above, dwarf2cfi still should be able to deal with it if the prologue and epilogue are annotated correctly. Again, at the crash site, examine the start insn to find out which basic block is the problem, and use dump_cfi_row on the two rows it tried to compare (and found an unexpected difference). Bernd
PowerPC shrink-wrap support 3 of 3
Finally, the powerpc backend changes. These are mostly just mechanical. I'll note that we need both simple_return and return variants of the conditional returns because they can only be used when no epilogue is required. The return variant must use direct_return() as a predicate to check this requirement. The simple_return variant is used by the shrink_wrap optimization to enable a conditional return before the function prologue, and should *not* use direct_return() because the optimization is valid for functions that do require an epilogue. All other instructions in the rs6000 backend that currently use return are generated from the epilogue or are sibling calls, so simple_return is appropriate as per md.texi. * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded declaration. (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx. Use simple_return in pattern, emit instruction, and set jump_label. (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx. Use simple_return rather than return. (rs6000_output_mi_thunk): Use simple_return rather than return. * config/rs6000/altivec.md (restore_world): Likewise. * config/rs6000/spe.md (return_and_restore_gpregs_spe): Likewise. * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise. (return_internal*, return_and_restore*): Likewise. (any_return, return_pred, return_str): New iterators. (return, conditional return insns): Provide both return and simple_return variants. diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/rs6000.c gcc-current/gcc/config/rs6000/rs6000.c --- gcc-bernd/gcc/config/rs6000/rs6000.c2011-09-16 11:52:15.0 +0930 +++ gcc-current/gcc/config/rs6000/rs6000.c 2011-09-15 22:25:50.0 +0930 @@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c static void rs6000_set_default_type_attributes (tree); static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool); static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool); -static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int, - enum machine_mode, bool, bool, bool); static bool rs6000_reg_live_or_pic_offset_p (int); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); static tree rs6000_builtin_vectorized_function (tree, tree, tree); @@ -19729,10 +19764,11 @@ rs6000_emit_stack_reset (rs6000_stack_t } /* Construct a parallel rtx describing the effect of a call to an - out-of-line register save/restore routine. */ + out-of-line register save/restore routine, and emit the insn + or jump_insn as appropriate. */ static rtx -rs6000_make_savres_rtx (rs6000_stack_t *info, +rs6000_emit_savres_rtx (rs6000_stack_t *info, rtx frame_reg_rtx, int save_area_offset, enum machine_mode reg_mode, bool savep, bool gpr, bool lr) @@ -19742,6 +19778,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * int reg_size = GET_MODE_SIZE (reg_mode); rtx sym; rtvec p; + rtx par, insn; offset = 0; start_reg = (gpr @@ -19752,7 +19789,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * p = rtvec_alloc ((lr ? 4 : 3) + n_regs); if (!savep lr) -RTVEC_ELT (p, offset++) = ret_rtx; +RTVEC_ELT (p, offset++) = simple_return_rtx; RTVEC_ELT (p, offset++) = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65)); @@ -19788,7 +19825,16 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg); } - return gen_rtx_PARALLEL (VOIDmode, p); + par = gen_rtx_PARALLEL (VOIDmode, p); + + if (!savep lr) +{ + insn = emit_jump_insn (par); + JUMP_LABEL (insn) = simple_return_rtx; +} + else +insn = emit_insn (par); + return insn; } /* Determine whether the gp REG is really used. */ @@ -20087,16 +20133,13 @@ rs6000_emit_prologue (void) } else if (!WORLD_SAVE_P (info) info-first_fp_reg_save != 64) { - rtx par; - - par = rs6000_make_savres_rtx (info, frame_reg_rtx, - info-fp_save_offset + sp_offset, - DFmode, - /*savep=*/true, /*gpr=*/false, - /*lr=*/(strategy -SAVE_NOINLINE_FPRS_SAVES_LR) - != 0); - insn = emit_insn (par); + insn = rs6000_emit_savres_rtx (info, frame_reg_rtx, +info-fp_save_offset + sp_offset, +DFmode, +/*savep=*/true, /*gpr=*/false, +/*lr=*/((strategy + SAVE_NOINLINE_FPRS_SAVES_LR) +