Re: Revert PowerPC shrink-wrap support 3 of 3

2015-04-08 Thread Gerald Pfeifer
On Thu, 10 Nov 2011, Hans-Peter Nilsson wrote:
 I think I need someone with appropriate write privileges to
 agree with that, and to also give 48h for someone to fix the
 problem.  Sorry for not forthcoming on the second point.
 
 brgds, H-P
 PS. where is the policy written down, besides the mailing list archives?

https://gcc.gnu.org/develop.html has had the following since 2003 or so:

  Patch Reversion

  If a patch is committed which introduces a regression on any target 
  which the Steering Committee considers to be important and if:

   * the problem is reported to the original poster;
   * 48 hours pass without the original poster or any other party 
 indicating that a fix will be forthcoming in the very near future;
   * two people with write privileges to the affected area of the compiler 
 determine that the best course of action is to revert the patch;

  then they may revert the patch.

  (The list of important targets will be revised at the beginning of each 
  release cycle, if necessary, and is part of the release criteria.)

  After the patch has been reverted, the poster may appeal the decision to 
  the Steering Committee.

  Note that no distinction is made between patches which are themselves 
  buggy and patches that expose latent bugs elsewhere in the compiler.

This is there as part of the overall (release) methodology, though
svnwrite.html would be at least as natural.  Perhaps a reference
there?  Thoughts?

Gerald

PS: Yes, I am catching up with some older mails. ;-)


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-15 Thread Richard Henderson
On 11/14/2011 11:56 AM, Alan Modra wrote:
   * function.c (thread_prologue_and_epilogue_insns): Guard
   emitting return with single_succ_p test.

Ok.


r~


CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Hans-Peter Nilsson
 From: Bernd Schmidt ber...@codesourcery.com
 Date: Mon, 14 Nov 2011 10:51:56 +0100

 On 11/11/11 20:13, Hans-Peter Nilsson wrote:
  AFAICT, your patch has got sufficiently testing now (on three
  targets to boot) to be considered safe to check in.  Or is
  something amiss?
  
  (If it's the unchecked code quality you mentioned, that can be
  just as well dealt with having the tree in a working state;
  having the tree broken for some targets accomplishes nothing.)
 
 I briefly looked at that, and while it does tend to move stuff around
 compared to an unpatched compiler, the effects don't seem to be all that
 bad. If we find a testcase where it's a real problem, we could do more
 aggressive reordering after epilogue generation, when we no longer have
 exit fallthroguh edges. So I'd like to ask for an OK here.

Looks like all we need is a positive review of
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a
ChangeLog entry to unbreak three or more targets.

Someone with approval rights: pretty please?

brgds, H-P


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Ramana Radhakrishnan

 Someone with approval rights: pretty please?

Can I add my +1 pretty please as well here :) ? According to #c3
this fixes arm-linux-gnueabi cross-builds for C++ as well and
potentially allows this to bootstrap again. I have kicked off a
bootstrap and test run on arm-linux-gnueabi .

cheers
Ramana


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Rainer Orth
Ramana Radhakrishnan ramana.radhakrish...@linaro.org writes:

 Someone with approval rights: pretty please?

 Can I add my +1 pretty please as well here :) ? According to #c3
 this fixes arm-linux-gnueabi cross-builds for C++ as well and
 potentially allows this to bootstrap again. I have kicked off a
 bootstrap and test run on arm-linux-gnueabi .

It's better than before, but Ada bootstrap (on Solaris/SPARC, cf. PR
bootstrap/51086) still fails with the same ICE.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Richard Henderson
On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
 Looks like all we need is a positive review of
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a
 ChangeLog entry to unbreak three or more targets.
 
 Someone with approval rights: pretty please?

That patch is ok.


r~


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Alan Modra
On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote:
 On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
  Looks like all we need is a positive review of
  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a
  ChangeLog entry to unbreak three or more targets.
  
  Someone with approval rights: pretty please?
 
 That patch is ok.

Sorry for the bootstrap problems.  I believe the cause is my
extraction of convert_jumps_to_returns and emit_return_for_exit.  The
new HAVE_return code misses a single_succ_p test (covered by the
bitmap_bit_p (bb_flags, ...) test in the HAVE_simple_return case).

I haven't really looked into what Bernd's fix does.  I know this one
fixes what I broke..

* function.c (thread_prologue_and_epilogue_insns): Guard
emitting return with single_succ_p test.

Index: gcc/function.c
===
--- gcc/function.c  (revision 181188)
+++ gcc/function.c  (working copy)
@@ -6230,7 +6230,8 @@ thread_prologue_and_epilogue_insns (void
   !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb)))
convert_jumps_to_returns (last_bb, false, NULL);
 
- if (EDGE_COUNT (exit_fallthru_edge-src-preds) != 0)
+ if (EDGE_COUNT (last_bb-preds) != 0
+  single_succ_p (last_bb))
{
  last_bb = emit_return_for_exit (exit_fallthru_edge, false);
  epilogue_end = returnjump = BB_END (last_bb);

-- 
Alan Modra
Australia Development Lab, IBM


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Hans-Peter Nilsson
 From: Richard Henderson r...@redhat.com
 Date: Mon, 14 Nov 2011 18:48:03 +0100

 On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
  Looks like all we need is a positive review of
  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a
  ChangeLog entry to unbreak three or more targets.
  
  Someone with approval rights: pretty please?
 
 That patch is ok.

Committed with this ChangeLog entry, obvious from the function
header and the added comment.

2011-11-15  Bernd Schmidt  ber...@codesourcery.com

PR rtl-optimization/51051
* cfgrtl.c (cfg_layout_can_merge_blocks_p): Return FALSE if the
move would cause fallthrough into the exit block.

brgds, H-P


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Richard Henderson
On 11/14/2011 11:56 AM, Alan Modra wrote:
 On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote:
 On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
 Looks like all we need is a positive review of
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html and a
 ChangeLog entry to unbreak three or more targets.

 Someone with approval rights: pretty please?

 That patch is ok.
 
 Sorry for the bootstrap problems.  I believe the cause is my
 extraction of convert_jumps_to_returns and emit_return_for_exit.  The
 new HAVE_return code misses a single_succ_p test (covered by the
 bitmap_bit_p (bb_flags, ...) test in the HAVE_simple_return case).
 
 I haven't really looked into what Bernd's fix does.  I know this one
 fixes what I broke..
 
   * function.c (thread_prologue_and_epilogue_insns): Guard
   emitting return with single_succ_p test.


Hmm.  This looks plausible too.  

Bernd's patch made sure that cfglayout didn't do something impossible.
Of course, it's possible that his patch should merely be an assert,
and the correct fix goes here.

Thoughts?


r~


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread Bernd Schmidt
On 11/15/11 01:43, Richard Henderson wrote:
 On 11/14/2011 11:56 AM, Alan Modra wrote:
  * function.c (thread_prologue_and_epilogue_insns): Guard
  emitting return with single_succ_p test.
 
 Hmm.  This looks plausible too.  
 
 Bernd's patch made sure that cfglayout didn't do something impossible.
 Of course, it's possible that his patch should merely be an assert,
 and the correct fix goes here.

I haven't really looked at Alan's patch; I'm thinking there were
probably two bugs - the one I found being latent before. Note that the
problematic CFG exists way before thread_prologue_and_epilogue even
runs, so an assert would still trigger.


Bernd


Re: CFG review needed for fix of PowerPC shrink-wrap support 3 of 3

2011-11-14 Thread David Miller
From: Bernd Schmidt ber...@codesourcery.com
Date: Tue, 15 Nov 2011 01:54:34 +0100

 On 11/15/11 01:43, Richard Henderson wrote:
 On 11/14/2011 11:56 AM, Alan Modra wrote:
 * function.c (thread_prologue_and_epilogue_insns): Guard
 emitting return with single_succ_p test.
 
 Hmm.  This looks plausible too.  
 
 Bernd's patch made sure that cfglayout didn't do something impossible.
 Of course, it's possible that his patch should merely be an assert,
 and the correct fix goes here.
 
 I haven't really looked at Alan's patch; I'm thinking there were
 probably two bugs - the one I found being latent before. Note that the
 problematic CFG exists way before thread_prologue_and_epilogue even
 runs, so an assert would still trigger.

I suspect that Alan's fix here will take care of the sparc ada bootstrap
regression reported, and regardless if it's correct it should be installed.


Re: Revert PowerPC shrink-wrap support 3 of 3

2011-11-11 Thread Hans-Peter Nilsson
 From: Hans-Peter Nilsson h...@axis.com
 Date: Thu, 10 Nov 2011 18:52:39 +0100

  From: Hans-Peter Nilsson h...@axis.com
  Date: Thu, 10 Nov 2011 15:12:54 +0100
 
   From: Bernd Schmidt ber...@codesourcery.com
   Date: Thu, 10 Nov 2011 14:29:04 +0100
  
   HP, can you run full tests?
  
  Cross-test to cris-elf in progress.
  Thanks!
 
 Works, no regressions compared to before the breakage (r181187).
 Thanks!  According to
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3 it fixes
 building for arm-unknown-linux-gnueabi too.

AFAICT, your patch has got sufficiently testing now (on three
targets to boot) to be considered safe to check in.  Or is
something amiss?

(If it's the unchecked code quality you mentioned, that can be
just as well dealt with having the tree in a working state;
having the tree broken for some targets accomplishes nothing.)

brgds, H-P


Revert PowerPC shrink-wrap support 3 of 3

2011-11-10 Thread Hans-Peter Nilsson
 From: Hans-Peter Nilsson h...@axis.com
 Date: Wed, 9 Nov 2011 09:55:59 +0100

  From: Alan Modra amo...@gmail.com
  Date: Tue, 1 Nov 2011 16:33:40 +0100
 
  On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
 
  * function.c (bb_active_p): Delete.
  (dup_block_and_redirect, active_insn_between): New functions.
  (convert_jumps_to_returns, emit_return_for_exit): New functions,
  split out from..
  (thread_prologue_and_epilogue_insns): ..here.  Delete
  shadowing variables.  Don't do prologue register clobber tests
  when shrink wrapping already failed.  Delete all last_bb_active
  code.  Instead compute tail block candidates for duplicating
  exit path.  Remove these from antic set.  Duplicate tails when
  reached from both blocks needing a prologue/epilogue and
  blocks not needing such.
  * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
  HAVE_simple_return.
  * bb-reorder.c (get_uncond_jump_length): Make global.
  * bb-reorder.h (get_uncond_jump_length): Declare.
  * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
  (rtl_split_edge): Likewise.  Warning fix.
  (rtl_duplicate_bb): New function.
  (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.
 
 This (a revision in the range 181187:181189) broke build for
 cris-elf like so:
 See PR51051.

Given that this also broke arm-linux-gnueabi, a primary
platform, and Alan being absent until the 15th according to a
message on IRC, I move to revert r181188.

I think I need someone with appropriate write privileges to
agree with that, and to also give 48h for someone to fix the
problem.  Sorry for not forthcoming on the second point.

brgds, H-P
PS. where is the policy written down, besides the mailing list archives?


Re: Revert PowerPC shrink-wrap support 3 of 3

2011-11-10 Thread Richard Guenther
On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson
hans-peter.nils...@axis.com wrote:
 From: Hans-Peter Nilsson h...@axis.com
 Date: Wed, 9 Nov 2011 09:55:59 +0100

  From: Alan Modra amo...@gmail.com
  Date: Tue, 1 Nov 2011 16:33:40 +0100

  On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:

          * function.c (bb_active_p): Delete.
          (dup_block_and_redirect, active_insn_between): New functions.
          (convert_jumps_to_returns, emit_return_for_exit): New functions,
          split out from..
          (thread_prologue_and_epilogue_insns): ..here.  Delete
          shadowing variables.  Don't do prologue register clobber tests
          when shrink wrapping already failed.  Delete all last_bb_active
          code.  Instead compute tail block candidates for duplicating
          exit path.  Remove these from antic set.  Duplicate tails when
          reached from both blocks needing a prologue/epilogue and
          blocks not needing such.
          * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
          HAVE_simple_return.
          * bb-reorder.c (get_uncond_jump_length): Make global.
          * bb-reorder.h (get_uncond_jump_length): Declare.
          * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
          (rtl_split_edge): Likewise.  Warning fix.
          (rtl_duplicate_bb): New function.
          (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.

 This (a revision in the range 181187:181189) broke build for
 cris-elf like so:
 See PR51051.

 Given that this also broke arm-linux-gnueabi, a primary
 platform, and Alan being absent until the 15th according to a
 message on IRC, I move to revert r181188.

Is there a PR for the arm issue?

 I think I need someone with appropriate write privileges to
 agree with that, and to also give 48h for someone to fix the
 problem.  Sorry for not forthcoming on the second point.

Did you or somebody else try to look into the problem?  To decide
whether it's the best course of action it would be nice to know if
it's a simple error in the patch that is easy to fix.

 brgds, H-P
 PS. where is the policy written down, besides the mailing list archives?

http://gcc.gnu.org/develop.html


Re: Revert PowerPC shrink-wrap support 3 of 3

2011-11-10 Thread Hans-Peter Nilsson
 From: Richard Guenther richard.guent...@gmail.com
 Date: Thu, 10 Nov 2011 12:22:56 +0100

 On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson
 hans-peter.nils...@axis.com wrote:
  From: Hans-Peter Nilsson h...@axis.com
  Date: Wed, 9 Nov 2011 09:55:59 +0100
 
   From: Alan Modra amo...@gmail.com
   Date: Tue, 1 Nov 2011 16:33:40 +0100
 
   On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
 
           * function.c (bb_active_p): Delete.
           (dup_block_and_redirect, active_insn_between): New functions.
           (convert_jumps_to_returns, emit_return_for_exit): New functions,
           split out from..
           (thread_prologue_and_epilogue_insns): ..here.  Delete
           shadowing variables.  Don't do prologue register clobber tests
           when shrink wrapping already failed.  Delete all last_bb_active
           code.  Instead compute tail block candidates for duplicating
           exit path.  Remove these from antic set.  Duplicate tails when
           reached from both blocks needing a prologue/epilogue and
           blocks not needing such.
           * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
           HAVE_simple_return.
           * bb-reorder.c (get_uncond_jump_length): Make global.
           * bb-reorder.h (get_uncond_jump_length): Declare.
           * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
           (rtl_split_edge): Likewise.  Warning fix.
           (rtl_duplicate_bb): New function.
           (rtl_cfg_hooks): Enable can_duplicate_block_p and 
   duplicate_block.
 
  This (a revision in the range 181187:181189) broke build for
  cris-elf like so:
  See PR51051.
 
  Given that this also broke arm-linux-gnueabi, a primary
  platform, and Alan being absent until the 15th according to a
  message on IRC, I move to revert r181188.
 
 Is there a PR for the arm issue?

It's covered by the same PR, see comment #1.
I've now updated the target field.

  I think I need someone with appropriate write privileges to
  agree with that, and to also give 48h for someone to fix the
  problem.  Sorry for not forthcoming on the second point.
 
 Did you or somebody else try to look into the problem?  To decide
 whether it's the best course of action it would be nice to know if
 it's a simple error in the patch that is easy to fix.

Nope, not really.  Wouldn't FWIW, de jure matter, me not having
write privileges to the affected area.  Though, I had a quick
look at the patch and nothing stood out except its
intrusiveness, and it seems the patch wasn't tested on a
!simple_return target (just powerpc-linux according to the
replied-to message).

brgds, H-P


Re: Revert PowerPC shrink-wrap support 3 of 3

2011-11-10 Thread Richard Guenther
On Thu, Nov 10, 2011 at 12:43 PM, Hans-Peter Nilsson
hans-peter.nils...@axis.com wrote:
 From: Richard Guenther richard.guent...@gmail.com
 Date: Thu, 10 Nov 2011 12:22:56 +0100

 On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson
 hans-peter.nils...@axis.com wrote:
  From: Hans-Peter Nilsson h...@axis.com
  Date: Wed, 9 Nov 2011 09:55:59 +0100
 
   From: Alan Modra amo...@gmail.com
   Date: Tue, 1 Nov 2011 16:33:40 +0100
 
   On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
 
           * function.c (bb_active_p): Delete.
           (dup_block_and_redirect, active_insn_between): New functions.
           (convert_jumps_to_returns, emit_return_for_exit): New functions,
           split out from..
           (thread_prologue_and_epilogue_insns): ..here.  Delete
           shadowing variables.  Don't do prologue register clobber tests
           when shrink wrapping already failed.  Delete all last_bb_active
           code.  Instead compute tail block candidates for duplicating
           exit path.  Remove these from antic set.  Duplicate tails when
           reached from both blocks needing a prologue/epilogue and
           blocks not needing such.
           * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
           HAVE_simple_return.
           * bb-reorder.c (get_uncond_jump_length): Make global.
           * bb-reorder.h (get_uncond_jump_length): Declare.
           * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
           (rtl_split_edge): Likewise.  Warning fix.
           (rtl_duplicate_bb): New function.
           (rtl_cfg_hooks): Enable can_duplicate_block_p and 
   duplicate_block.
 
  This (a revision in the range 181187:181189) broke build for
  cris-elf like so:
  See PR51051.
 
  Given that this also broke arm-linux-gnueabi, a primary
  platform, and Alan being absent until the 15th according to a
  message on IRC, I move to revert r181188.

 Is there a PR for the arm issue?

 It's covered by the same PR, see comment #1.
 I've now updated the target field.

  I think I need someone with appropriate write privileges to
  agree with that, and to also give 48h for someone to fix the
  problem.  Sorry for not forthcoming on the second point.

 Did you or somebody else try to look into the problem?  To decide
 whether it's the best course of action it would be nice to know if
 it's a simple error in the patch that is easy to fix.

 Nope, not really.  Wouldn't FWIW, de jure matter, me not having
 write privileges to the affected area.  Though, I had a quick
 look at the patch and nothing stood out except its
 intrusiveness, and it seems the patch wasn't tested on a
 !simple_return target (just powerpc-linux according to the
 replied-to message).

Fair enough.  You can count me as one then, and I'll defer to Bernd
to either provide a fix or ack the revert.

Thanks,
Richard.

 brgds, H-P



Re: Revert PowerPC shrink-wrap support 3 of 3

2011-11-10 Thread Bernd Schmidt
On 11/10/11 13:14, Richard Guenther wrote:
 Fair enough.  You can count me as one then, and I'll defer to Bernd
 to either provide a fix or ack the revert.

I'm trying to track it down.

In 189r.outof_cfglayout, we have

(insn 31 33 35 3 (use (reg/i:SI 0 r0))
../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1
 (nil))

;; Successors:  EXIT [100.0%]  (fallthru)
;; lr  out   0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp]
;; live  out 0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp]

followed by a number of other basic blocks, so that looks wrong to me.
outof_cfglayout seems to assume that fallthrough edges to the exit block
are OK and don't need fixing up, and changing that seems nontrivial at
first glance.

The situation is first created during cfgcleanup in into_cfglayout. The
following patch makes the testcase compile by stopping the compiler from
moving the exit fallthru block around, but I've not checked whether it
has a negative effect on code quality. HP, can you run full tests?


Bernd
Index: ../baseline-trunk/gcc/cfgrtl.c
===
--- ../baseline-trunk/gcc/cfgrtl.c  (revision 181252)
+++ ../baseline-trunk/gcc/cfgrtl.c  (working copy)
@@ -2735,6 +2735,16 @@ cfg_layout_can_merge_blocks_p (basic_blo
   if (BB_PARTITION (a) != BB_PARTITION (b))
 return false;
 
+  /* If we would end up moving B's instructions, make sure it doesn't fall
+ through into the exit block, since we cannot recover from a fallthrough
+ edge into the exit block occurring in the middle of a function.  */
+  if (NEXT_INSN (BB_END (a)) != BB_HEAD (b))
+{
+  edge e = find_fallthru_edge (b-succs);
+  if (e  e-dest == EXIT_BLOCK_PTR)
+   return false;
+}
+
   /* There must be exactly one edge in between the blocks.  */
   return (single_succ_p (a)
   single_succ (a) == b


Re: Revert PowerPC shrink-wrap support 3 of 3

2011-11-10 Thread Hans-Peter Nilsson
 From: Hans-Peter Nilsson h...@axis.com
 Date: Thu, 10 Nov 2011 15:12:54 +0100

  From: Bernd Schmidt ber...@codesourcery.com
  Date: Thu, 10 Nov 2011 14:29:04 +0100
 
  HP, can you run full tests?
 
 Cross-test to cris-elf in progress.
 Thanks!

Works, no regressions compared to before the breakage (r181187).
Thanks!  According to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3 it fixes
building for arm-unknown-linux-gnueabi too.

brgds, H-P


Re: Revert PowerPC shrink-wrap support 3 of 3

2011-11-10 Thread Michael Meissner
On Thu, Nov 10, 2011 at 02:29:04PM +0100, Bernd Schmidt wrote:
 On 11/10/11 13:14, Richard Guenther wrote:
  Fair enough.  You can count me as one then, and I'll defer to Bernd
  to either provide a fix or ack the revert.
 
 I'm trying to track it down.
 
 In 189r.outof_cfglayout, we have
 
 (insn 31 33 35 3 (use (reg/i:SI 0 r0))
 ../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1
  (nil))
 
 ;; Successors:  EXIT [100.0%]  (fallthru)
 ;; lr  out   0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp]
 ;; live  out 0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp]
 
 followed by a number of other basic blocks, so that looks wrong to me.
 outof_cfglayout seems to assume that fallthrough edges to the exit block
 are OK and don't need fixing up, and changing that seems nontrivial at
 first glance.
 
 The situation is first created during cfgcleanup in into_cfglayout. The
 following patch makes the testcase compile by stopping the compiler from
 moving the exit fallthru block around, but I've not checked whether it
 has a negative effect on code quality. HP, can you run full tests?

FWIW, I did bootstrap and make check with/without this patch, and it introduces
no regressions in the PowerPC, but I haven't look at the code generated.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899



Re: PowerPC shrink-wrap support 3 of 3

2011-11-07 Thread Jakub Jelinek
On Wed, Nov 02, 2011 at 02:03:40AM +1030, Alan Modra wrote:
 Bootstrapped and regression tested powerpc-linux.  OK to apply?
 (And I won't be posting any more versions of the patch until this is
 reviewed.  Please excuse me for spamming the list.)

Looks reasonable to me, appart from

   * function.c (bb_active_p): Delete.
   (dup_block_and_redirect, active_insn_between): New functions.
   (convert_jumps_to_returns, emit_return_for_exit): New functions,
   split out from..
   (thread_prologue_and_epilogue_insns): ..here.  Delete
   shadowing variables.  Don't do prologue register clobber tests
   when shrink wrapping already failed.  Delete all last_bb_active
   code.  Instead compute tail block candidates for duplicating
   exit path.  Remove these from antic set.  Duplicate tails when
   reached from both blocks needing a prologue/epilogue and
   blocks not needing such.
   * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
   HAVE_simple_return.
   * bb-reorder.c (get_uncond_jump_length): Make global.
   * bb-reorder.h (get_uncond_jump_length): Declare.
   * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
   (rtl_split_edge): Likewise.  Warning fix.
   (rtl_duplicate_bb): New function.
   (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.
 
 Index: gcc/function.c
 ===
 --- gcc/function.c(revision 180588)
 +++ gcc/function.c(working copy)
 @@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.  
  #include df.h
  #include timevar.h
  #include vecprim.h
 +#include params.h
 +#include bb-reorder.h

This needs corresponding change in Makefile.in, function.o
needs $(PARAMS_H) bb-reorder.h dependencies added.

Ok for trunk with that change if no maintainer objects within 24 hours.

Jakub


Re: PowerPC shrink-wrap support 3 of 3

2011-11-01 Thread Alan Modra
On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
 Bits left to do
 - limit size of duplicated tails

Done here.  Also fixes a hole in that I took no notice of
targetm.cannot_copy_insn_p when duplicating tails.

One interesting result is that the tail duplication actually reduces
the text size of libstdc++.so from 1074042 to 1073478 bytes on
powerpc-linux.  The reason being that a shrink-wrapped function that
needs a prologue only on paths ending in a sibling call will lose one
copy of the epilogue.  That must happen enough to more than make up
for duplicated tails.

Bootstrapped and regression tested powerpc-linux.  OK to apply?
(And I won't be posting any more versions of the patch until this is
reviewed.  Please excuse me for spamming the list.)

* function.c (bb_active_p): Delete.
(dup_block_and_redirect, active_insn_between): New functions.
(convert_jumps_to_returns, emit_return_for_exit): New functions,
split out from..
(thread_prologue_and_epilogue_insns): ..here.  Delete
shadowing variables.  Don't do prologue register clobber tests
when shrink wrapping already failed.  Delete all last_bb_active
code.  Instead compute tail block candidates for duplicating
exit path.  Remove these from antic set.  Duplicate tails when
reached from both blocks needing a prologue/epilogue and
blocks not needing such.
* ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
HAVE_simple_return.
* bb-reorder.c (get_uncond_jump_length): Make global.
* bb-reorder.h (get_uncond_jump_length): Declare.
* cfgrtl.c (rtl_create_basic_block): Comment typo fix.
(rtl_split_edge): Likewise.  Warning fix.
(rtl_duplicate_bb): New function.
(rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.

Index: gcc/function.c
===
--- gcc/function.c  (revision 180588)
+++ gcc/function.c  (working copy)
@@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.  
 #include df.h
 #include timevar.h
 #include vecprim.h
+#include params.h
+#include bb-reorder.h
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -5290,8 +5292,6 @@ requires_stack_frame_p (rtx insn, HARD_R
   HARD_REG_SET hardregs;
   unsigned regno;
 
-  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
-return false;
   if (CALL_P (insn))
 return !SIBLING_CALL_P (insn);
 
@@ -5514,23 +5514,186 @@ set_return_jump_label (rtx returnjump)
 JUMP_LABEL (returnjump) = ret_rtx;
 }
 
-/* Return true if BB has any active insns.  */
+#ifdef HAVE_simple_return
+/* Create a copy of BB instructions and insert at BEFORE.  Redirect
+   preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
+static void
+dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before,
+   bitmap_head *need_prologue)
+{
+  edge_iterator ei;
+  edge e;
+  rtx insn = BB_END (bb);
+
+  /* We know BB has a single successor, so there is no need to copy a
+ simple jump at the end of BB.  */
+  if (simplejump_p (insn))
+insn = PREV_INSN (insn);
+
+  start_sequence ();
+  duplicate_insn_chain (BB_HEAD (bb), insn);
+  if (dump_file)
+{
+  unsigned count = 0;
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+   if (active_insn_p (insn))
+ ++count;
+  fprintf (dump_file, Duplicating bb %d to bb %d, %u active insns.\n,
+  bb-index, copy_bb-index, count);
+}
+  insn = get_insns ();
+  end_sequence ();
+  emit_insn_before (insn, before);
+
+  /* Redirect all the paths that need no prologue into copy_bb.  */
+  for (ei = ei_start (bb-preds); (e = ei_safe_edge (ei)); )
+if (!bitmap_bit_p (need_prologue, e-src-index))
+  {
+   redirect_edge_and_branch_force (e, copy_bb);
+   continue;
+  }
+else
+  ei_next (ei);
+}
+#endif
+
+#if defined (HAVE_return) || defined (HAVE_simple_return)
+/* Return true if there are any active insns between HEAD and TAIL.  */
 static bool
-bb_active_p (basic_block bb)
+active_insn_between (rtx head, rtx tail)
 {
+  while (tail)
+{
+  if (active_insn_p (tail))
+   return true;
+  if (tail == head)
+   return false;
+  tail = PREV_INSN (tail);
+}
+  return false;
+}
+
+/* LAST_BB is a block that exits, and empty of active instructions.
+   Examine its predecessors for jumps that can be converted to
+   (conditional) returns.  */
+static VEC (edge, heap) *
+convert_jumps_to_returns (basic_block last_bb, bool simple_p,
+ VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED)
+{
+  int i;
+  basic_block bb;
   rtx label;
+  edge_iterator ei;
+  edge e;
+  VEC(basic_block,heap) *src_bbs;
+
+  src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb-preds));
+  FOR_EACH_EDGE (e, ei, last_bb-preds)
+if (e-src != ENTRY_BLOCK_PTR)
+  VEC_quick_push (basic_block, 

Re: PowerPC shrink-wrap support 3 of 3

2011-10-31 Thread Alan Modra
So I'm at the point where I'm reasonably happy with this work.  This
patch doesn't do anything particularly clever regarding our
shrink-wrap implementation.  We still only insert one copy of the
prologue, and one of the epilogue in thread_prologue_and_epilogue.
All it really does is replaces Bernd's !last_bb_active code (allowing
one tail block with no active insns to be shared by paths needing a
prologue and paths not needing a prologue), with what I think is
conceptually simpler, duplicating a shared tail block.  Then I extend
this to duplicating a chain of tail blocks.

That leads to some simplification as all the special cases and
restrictions of !last_bb_active disappear.  For example,
convert_jumps_to_returns looks much like the code in gcc-4.6.  We also
get many more functions being shrink-wrapped.  Some numbers from my
latest gcc bootstraps:

powerpc-linux
.../gcc-virgin/gcc grep 'Performing shrink' *.pro_and_epilogue | wc -l
453
.../gcc-curr/gcc grep 'Performing shrink' *.pro_and_epilogue | wc -l
648

i686-linux
.../gcc-virgin/gcc$ grep 'Performing shrink' *pro_and_epilogue | wc -l
329
.../gcc-curr/gcc$ grep 'Performing shrink' *.pro_and_epilogue | wc -l
416

Bits left to do
- limit size of duplicated tails
- don't duplicate sibling call blocks, but instead split the block
  after the sibling call epilogue has been added, redirecting
  non-prologue paths past the epilogue.

Is this OK to apply as is?

* function.c (bb_active_p): Delete.
(dup_block_and_redirect, active_insn_between): New functions.
(convert_jumps_to_returns, emit_return_for_exit): New functions,
split out from..
(thread_prologue_and_epilogue_insns): ..here.  Delete
shadowing variables.  Don't do prologue register clobber tests
when shrink wrapping already failed.  Delete all last_bb_active
code.  Instead compute tail block candidates for duplicating
exit path.  Remove these from antic set.  Duplicate tails when
reached from both blocks needing a prologue/epilogue and
blocks not needing such.

Index: gcc/function.c
===
*** gcc/function.c  (revision 180588)
--- gcc/function.c  (working copy)
*** set_return_jump_label (rtx returnjump)
*** 5514,5535 
  JUMP_LABEL (returnjump) = ret_rtx;
  }
  
! /* Return true if BB has any active insns.  */
  static bool
! bb_active_p (basic_block bb)
  {
rtx label;
  
!   /* Test whether there are active instructions in BB.  */
!   label = BB_END (bb);
!   while (label  !LABEL_P (label))
  {
!   if (active_insn_p (label))
!   break;
!   label = PREV_INSN (label);
  }
!   return BB_HEAD (bb) != label || !LABEL_P (label);
  }
  
  /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
 this into place with notes indicating where the prologue ends and where
--- 5514,5698 
  JUMP_LABEL (returnjump) = ret_rtx;
  }
  
! #ifdef HAVE_simple_return
! /* Create a copy of BB instructions and insert at BEFORE.  Redirect
!preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
! static void
! dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before,
!   bitmap_head *need_prologue)
! {
!   edge_iterator ei;
!   edge e;
!   rtx insn = BB_END (bb);
! 
!   /* We know BB has a single successor, so there is no need to copy a
!  simple jump at the end of BB.  */
!   if (simplejump_p (insn))
! insn = PREV_INSN (insn);
! 
!   start_sequence ();
!   duplicate_insn_chain (BB_HEAD (bb), insn);
!   if (dump_file)
! {
!   unsigned count = 0;
!   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
!   if (active_insn_p (insn))
! ++count;
!   fprintf (dump_file, Duplicating bb %d to bb %d, %u active insns.\n,
!  bb-index, copy_bb-index, count);
! }
!   insn = get_insns ();
!   end_sequence ();
!   emit_insn_before (insn, before);
! 
!   /* Redirect all the paths that need no prologue into copy_bb.  */
!   for (ei = ei_start (bb-preds); (e = ei_safe_edge (ei)); )
! if (!bitmap_bit_p (need_prologue, e-src-index))
!   {
!   redirect_edge_and_branch_force (e, copy_bb);
!   continue;
!   }
! else
!   ei_next (ei);
! }
! #endif
! 
! #if defined (HAVE_return) || defined (HAVE_simple_return)
! /* Return true if there are any active insns between HEAD and TAIL.  */
  static bool
! active_insn_between (rtx head, rtx tail)
! {
!   while (tail)
! {
!   if (active_insn_p (tail))
!   return true;
!   if (tail == head)
!   return false;
!   tail = PREV_INSN (tail);
! }
!   return false;
! }
! 
! /* LAST_BB is a block that exits, and empty of active instructions.
!Examine its predecessors for jumps that can be converted to
!(conditional) returns.  */
! static VEC (edge, heap) *
! convert_jumps_to_returns (basic_block last_bb, bool simple_p,
! 

Re: PowerPC shrink-wrap support 3 of 3

2011-10-27 Thread Alan Modra
On Thu, Oct 27, 2011 at 12:24:46AM +1030, Alan Modra wrote:
 more code than duplicating epilogues.  From what I've seen, the
 duplicate tails are generally very small.  I guess I should dump out
 some info so we can get a better idea.

There were 545 occurrences of shrink-wrap in the gcc/ dir for a
-O2 powerpc-linux bootstrap.  I counted active insns in duplicated
blocks.  244 had zero insns (all the ones I looked at were just uses
of r3), 96 had one insn (all the ones I looked at were setting r3),
with the remainder being:

cfgrtl.c.199r.pro_and_epilogue:Duplicating bb 6, 2 insns.
  tail call
function.c.199r.pro_and_epilogue:Duplicating bb 17, 2 insns.
  setting a value via pointer (*poffset in instantiate_new_reg)
insn-automata.c.199r.pro_and_epilogue:Duplicating bb 229, 2 insns.
  tail call
varasm.c.199r.pro_and_epilogue:Duplicating bb 48, 2 insns.
  setting a global var
rs6000.c.199r.pro_and_epilogue:Duplicating bb 300, 3 insns.
  tail call
rs6000.c.199r.pro_and_epilogue:Duplicating bb 8, 3 insns.
  tail call
toplev.c.199r.pro_and_epilogue:Duplicating bb 5, 4 insns.
  loading two word return value from random_seed var.
reginfo.c.199r.pro_and_epilogue:Duplicating bb 4, 7 insns.
  setting a two word global var and r3
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call

Note that having 350 duplicated blocks doesn't mean we had 350 extra
cases of shrink-wrapping, because some tails were two blocks, and I
recall seeing a case when developing my code where one function had
two duplicated tails.  Certainly many more shrink wrapped functions
though.  :-)

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-10-26 Thread Alan Modra
On Sun, Oct 16, 2011 at 02:51:01PM -0400, David Edelsohn wrote:
 The patch is okay, although I am not thrilled about the need to change
 the register allocation order.

Committed revision 180522.  It turns out that shrink-wrapping isn't as
effective as it used to be with the 20110915 based sources I was using
originally.  povray Ray_In_Bound no longer gets the benefit of shrink
wrap, likely due to some cfg optimization.  We end up with a simple
block that just does r3=1 then jumps to last_bb being reached from
blocks that need a prologue as well as blocks that don't.  That's
enough to kill our current shrink wrap implementation.  What we need
is something to duplicate these tail blocks..

Patch here for comment.  I haven't yet run benchmarks, but this passes
bootstrap and regression test (on rev 180286, current virgin mainline
fails bootstrap on powerpc-linux).

* function.c (thread_prologue_and_epilogue_insns): Delete
shadowing variables.  Don't do prologue register clobber tests
when shrink wrapping already failed.  Compute tail block
candidates for duplicating exit path.  Remove these from
antic set.  Duplicate tails when reached from both blocks
needing a prologue/epilogue and blocks not needing such.

Index: gcc/function.c
===
--- gcc/function.c  (revision 180467)
+++ gcc/function.c  (working copy)
@@ -5697,11 +5697,11 @@ thread_prologue_and_epilogue_insns (void
   HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
   HARD_REG_SET set_up_by_prologue;
   rtx p_insn;
-
   VEC(basic_block, heap) *vec;
   basic_block bb;
   bitmap_head bb_antic_flags;
   bitmap_head bb_on_list;
+  bitmap_head bb_tail;
 
   if (dump_file)
fprintf (dump_file, Attempting shrink-wrapping optimization.\n);
@@ -5732,6 +5732,7 @@ thread_prologue_and_epilogue_insns (void
 
   bitmap_initialize (bb_antic_flags, bitmap_default_obstack);
   bitmap_initialize (bb_on_list, bitmap_default_obstack);
+  bitmap_initialize (bb_tail, bitmap_default_obstack);
 
   /* Find the set of basic blocks that require a stack frame.  */
 
@@ -5774,19 +5775,22 @@ thread_prologue_and_epilogue_insns (void
}
}
 
+  /* Save a copy of blocks that really need a prologue.  */
+  bitmap_copy (bb_antic_flags, bb_flags);
+
   /* For every basic block that needs a prologue, mark all blocks
 reachable from it, so as to ensure they are also seen as
 requiring a prologue.  */
   while (!VEC_empty (basic_block, vec))
{
  basic_block tmp_bb = VEC_pop (basic_block, vec);
- edge e;
- edge_iterator ei;
+
  FOR_EACH_EDGE (e, ei, tmp_bb-succs)
if (e-dest != EXIT_BLOCK_PTR
 bitmap_set_bit (bb_flags, e-dest-index))
  VEC_quick_push (basic_block, vec, e-dest);
}
+
   /* If the last basic block contains only a label, we'll be able
 to convert jumps to it to (potentially conditional) return
 insns later.  This means we don't necessarily need a prologue
@@ -5799,14 +5803,29 @@ thread_prologue_and_epilogue_insns (void
goto fail_shrinkwrap;
}
 
+  /* Find the set of basic blocks that need no prologue and only
+go to the exit.  */
+  bitmap_set_bit (bb_tail, EXIT_BLOCK_PTR-index);
+  VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR);
+  while (!VEC_empty (basic_block, vec))
+   {
+ basic_block tmp_bb = VEC_pop (basic_block, vec);
+
+ FOR_EACH_EDGE (e, ei, tmp_bb-preds)
+   if (single_succ_p (e-src)
+!bitmap_bit_p (bb_antic_flags, e-src-index)
+bitmap_set_bit (bb_tail, e-src-index))
+ VEC_quick_push (basic_block, vec, e-src);
+   }
+
   /* Now walk backwards from every block that is marked as needing
-a prologue to compute the bb_antic_flags bitmap.  */
-  bitmap_copy (bb_antic_flags, bb_flags);
+a prologue to compute the bb_antic_flags bitmap.  Exclude
+tail blocks; They can be duplicated to be used on paths not
+needing a prologue.  */
+  bitmap_and_compl (bb_antic_flags, bb_flags, bb_tail);
   FOR_EACH_BB (bb)
{
- edge e;
- edge_iterator ei;
- if (!bitmap_bit_p (bb_flags, bb-index))
+ if (!bitmap_bit_p (bb_antic_flags, bb-index))
continue;
  FOR_EACH_EDGE (e, ei, bb-preds)
if (!bitmap_bit_p (bb_antic_flags, e-src-index)
@@ -5816,8 +5835,6 @@ thread_prologue_and_epilogue_insns (void
   while (!VEC_empty (basic_block, vec))
{
  basic_block tmp_bb = VEC_pop (basic_block, vec);
- edge e;
- edge_iterator ei;
  bool all_set = true;
 
  bitmap_clear_bit (bb_on_list, tmp_bb-index);
@@ -5862,28 +5879,172 @@ thread_prologue_and_epilogue_insns (void
  

Re: PowerPC shrink-wrap support 3 of 3

2011-10-26 Thread Bernd Schmidt
On 10/26/11 14:27, Alan Modra wrote:
 Committed revision 180522.  It turns out that shrink-wrapping isn't as
 effective as it used to be with the 20110915 based sources I was using
 originally.  povray Ray_In_Bound no longer gets the benefit of shrink
 wrap, likely due to some cfg optimization.  We end up with a simple
 block that just does r3=1 then jumps to last_bb being reached from
 blocks that need a prologue as well as blocks that don't.  That's
 enough to kill our current shrink wrap implementation.  What we need
 is something to duplicate these tail blocks..

Would it work to insert the epilogue on some edges to this R3=1 block,
and not on the others? (How many edges of each kind are there?)

Now that we have an initial patch in the tree and it mostly seems to
work, we can think about making it a little stronger - the initial
implementation is really quite conservative.


Bernd


Re: PowerPC shrink-wrap support 3 of 3

2011-10-26 Thread Alan Modra
On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote:
 On 10/26/11 14:27, Alan Modra wrote:
  Committed revision 180522.  It turns out that shrink-wrapping isn't as
  effective as it used to be with the 20110915 based sources I was using
  originally.  povray Ray_In_Bound no longer gets the benefit of shrink
  wrap, likely due to some cfg optimization.  We end up with a simple
  block that just does r3=1 then jumps to last_bb being reached from
  blocks that need a prologue as well as blocks that don't.  That's
  enough to kill our current shrink wrap implementation.  What we need
  is something to duplicate these tail blocks..
 
 Would it work to insert the epilogue on some edges to this R3=1 block,
 and not on the others?

Wouldn't you need to modify all the target epilogue code?  Our
epilogues return.

 (How many edges of each kind are there?)
In the povray case there was one edge of each kind, but I have seen
other cases where there were 4 edges from blocks needing no prologue
and 2 edges from blocks needing a prologue.  I can't tell you what the
testcase was now;  It was something I looked at when ironing out bugs
in my code.  You wouldn't believe how many ways it is possible to
write buggy cfg manipulation code..

I guess the tradeoff between the classic shrink-wrap epilogue scheme
and my duplicate tail idea is whether duplicating tail blocks adds
more code than duplicating epilogues.  From what I've seen, the
duplicate tails are generally very small.  I guess I should dump out
some info so we can get a better idea.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-10-26 Thread Alan Modra
On Wed, Oct 26, 2011 at 03:59:36PM +0200, Bernd Schmidt wrote:
 On 10/26/11 15:54, Alan Modra wrote:
  I guess the tradeoff between the classic shrink-wrap epilogue scheme
  and my duplicate tail idea is whether duplicating tail blocks adds
  more code than duplicating epilogues.  From what I've seen, the
  duplicate tails are generally very small.  I guess I should dump out
  some info so we can get a better idea.
 
 I suppose if one wanted to avoid inserting more than one epilogue for
 code-size reasons, one could make a new basic block containing the
 epilogue, and redirect edges as appropriate.

Suppose you have a function that returns r3=0 in one tail block and
r3=1 in another, and these blocks are reached both by paths needing
a prologue and by paths not needing a prologue.  Which seems a likely
common case.  I'm fairly certain that would require two copies of the
normal epilogue, or duplicating the tail blocks.  (But it's late here
and I'm ready to nod off so may not be thinking straight.)

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-10-16 Thread David Edelsohn
On Wed, Sep 28, 2011 at 11:47 AM, Alan Modra amo...@gmail.com wrote:

        * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded
        declaration.
        (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp.
        (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.  Use
        simple_return in pattern, emit instruction, and set jump_label.
        (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.  Use
        simple_return rather than return.
        (emit_cfa_restores): New function.
        (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap.
        Add missing cfa_restores for SAVE_WORLD.  Add missing LR cfa_restore
        when using out-of-line gpr restore.  Add missing LR and FP regs
        cfa_restores for out-of-line fpr restore.  Consolidate code setting
        up cfa_restores.  Formatting.  Use LR_REGNO define.
        (rs6000_output_mi_thunk): Use simple_return rather than return.
        * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
        (return_internal*): Likewise.
        (any_return, return_pred, return_str): New iterators.
        (return, conditional return insns): Provide both return and
        simple_return variants.
        * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
        (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
        Move r11 and r0 later to suit shrink-wrapping.

Alan,

The patch is okay, although I am not thrilled about the need to change
the register allocation order.

Thanks, David


Re: PowerPC shrink-wrap support 3 of 3

2011-09-28 Thread Alan Modra
This supercedes http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01004.html
and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01593.html, fixing
the two regressions introduced by those patches.  The first patch is
unchanged except to leave all the out-of-line restore functions using
return rather than simple_return.  We don't want these being
confused with a plain simple_return and perhaps used by the shrink-
wrapping to return from pre-prologue code.

The second of these two patches was way too simplistic.  It was a real
pain getting the cfa_restores correct.  A lot were missing, or emitted
at the wrong place (due to bug in rs6000_emit_stack_reset).  I also
had the real restore insn move past the cfa_restores (mtlr 0 insn
scheduled over loads from stack).

* config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded
declaration.
(rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp.
(rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.  Use
simple_return in pattern, emit instruction, and set jump_label.
(rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.  Use
simple_return rather than return.
(emit_cfa_restores): New function.
(rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap.
Add missing cfa_restores for SAVE_WORLD.  Add missing LR cfa_restore
when using out-of-line gpr restore.  Add missing LR and FP regs
cfa_restores for out-of-line fpr restore.  Consolidate code setting
up cfa_restores.  Formatting.  Use LR_REGNO define.
(rs6000_output_mi_thunk): Use simple_return rather than return.
* config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
(return_internal*): Likewise.
(any_return, return_pred, return_str): New iterators.
(return, conditional return insns): Provide both return and
simple_return variants.
* gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
(REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
Move r11 and r0 later to suit shrink-wrapping.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 178876)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
 static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
-static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
-  enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -19704,8 +19728,10 @@ rs6000_emit_stack_reset (rs6000_stack_t 
   if (sp_offset != 0)
{
  rtx dest_reg = savres ? gen_rtx_REG (Pmode, 11) : sp_reg_rtx;
- return emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx,
-  GEN_INT (sp_offset)));
+ rtx insn = emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx,
+  GEN_INT (sp_offset)));
+ if (!savres)
+   return insn;
}
   else if (!savres)
return emit_move_insn (sp_reg_rtx, frame_reg_rtx);
@@ -19729,10 +19755,11 @@ rs6000_emit_stack_reset (rs6000_stack_t 
 }
 
 /* Construct a parallel rtx describing the effect of a call to an
-   out-of-line register save/restore routine.  */
+   out-of-line register save/restore routine, and emit the insn
+   or jump_insn as appropriate.  */
 
 static rtx
-rs6000_make_savres_rtx (rs6000_stack_t *info,
+rs6000_emit_savres_rtx (rs6000_stack_t *info,
rtx frame_reg_rtx, int save_area_offset,
enum machine_mode reg_mode,
bool savep, bool gpr, bool lr)
@@ -19742,6 +19769,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   int reg_size = GET_MODE_SIZE (reg_mode);
   rtx sym;
   rtvec p;
+  rtx par, insn;
 
   offset = 0;
   start_reg = (gpr
@@ -19755,7 +19783,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
 RTVEC_ELT (p, offset++) = ret_rtx;
 
   RTVEC_ELT (p, offset++)
-= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
+= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO));
 
   sym = rs6000_savres_routine_sym (info, savep, gpr, lr);
   RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym);
@@ -19788,7 +19816,16 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg);
 }
 
-  return gen_rtx_PARALLEL (VOIDmode, p);
+  par = gen_rtx_PARALLEL (VOIDmode, p);
+
+  if (!savep  lr)
+{
+  insn = emit_jump_insn (par);
+  JUMP_LABEL (insn) = ret_rtx;
+

Re: PowerPC shrink-wrap support 3 of 3

2011-09-26 Thread Alan Modra
This patch fixes an issue that limit opportunities for shrink-wrapping
on PowerPC.  The rs6000 REG_ALLOC_ORDER chooses r0 as the very first
gpr to use in code, with r11 also having high priority.  This means it
is quite likely that r0 or r11 is live on the edge chosen for
shrink-wrapping.  That's unfortunate since rs6000 uses r0, r11, and
r12 as temporaries in the prologue, and thus can't insert the prologue
on that edge.

I avoid this situation in most cases by simply changing the register
allocation order, making r0 and r11 the last of the volatile gprs to
be used.  I also noticed that r12 is not used until after the
non-volatile regs are allocated.  According to the comment, the reason
this is done is to avoid putting a DI into the r12,r13 pair (and as
r13 is non-volatile would mean saving and restoring all the
non-volatile gprs when not inlining the saves/restores).  However,
when r13 is a fixed reg this cannot occur, so moving r12 before the
non-volatile regs makes another reg available before we need to start
saving regs.

After making the register allocation order changes I found gcc
bootstrap failed.  This turned out to be caused by shrink-wrapping
and register renaming moving some pre-prologue code after the
epilogue.  To support this we need to ensure no prologue unwind info
is left active after the epilogue.

Bootstrapped and regression tested powerpc-linux.  Two regressions
appeared due to a problem in the shrink-wrap code.  More on that
later.

* config/rs6000/rs6000.c (rs6000_emit_epilogue): Emit cfa_restores
when flag_shrink_wrap.
* gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
(REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
Move r11 and r0 later to suit shrink-wrapping.

diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn 
gcc-alanshrink1/gcc/config/rs6000/rs6000.c 
gcc-current/gcc/config/rs6000/rs6000.c
--- gcc-alanshrink1/gcc/config/rs6000/rs6000.c  2011-09-26 15:51:19.0 
+0930
+++ gcc-current/gcc/config/rs6000/rs6000.c  2011-09-23 11:54:00.0 
+0930
@@ -21001,7 +21045,7 @@ rs6000_emit_epilogue (int sibcall)
 
reg = gen_rtx_REG (V4SImode, i);
emit_move_insn (reg, mem);
-   if (DEFAULT_ABI == ABI_V4)
+   if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
  cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
 cfa_restores);
  }
@@ -21052,10 +21096,16 @@ rs6000_emit_epilogue (int sibcall)
 }
 
   /* Set LR here to try to overlap restores below.  LR is always saved
- above incoming stack, so it never needs REG_CFA_RESTORE.  */
+ above incoming stack, so it never needs REG_CFA_RESTORE except
+ when shrink-wrapping as in that case we may have blocks from
+ before the prologue being moved after the epilogue.  */
   if (restore_lr  restoring_GPRs_inline)
-emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO),
-   gen_rtx_REG (Pmode, 0));
+{
+  rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
+  emit_move_insn (lr, gen_rtx_REG (Pmode, 0));
+  if (flag_shrink_wrap)
+   cfa_restores = alloc_reg_note (REG_CFA_RESTORE, lr, cfa_restores);
+}
 
   /* Load exception handler data registers, if needed.  */
   if (crtl-calls_eh_return)
@@ -21146,7 +21196,7 @@ rs6000_emit_epilogue (int sibcall)
reg = gen_rtx_REG (reg_mode, info-first_gp_reg_save + i);
 
insn = emit_move_insn (reg, mem);
-   if (DEFAULT_ABI == ABI_V4)
+   if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
  {
if (frame_pointer_needed
 info-first_gp_reg_save + i
@@ -21188,7 +21238,7 @@ rs6000_emit_epilogue (int sibcall)
  if (info-cr_save_p)
{
  rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
- if (DEFAULT_ABI == ABI_V4)
+ if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
cfa_restores
  = alloc_reg_note (REG_CFA_RESTORE,
gen_rtx_REG (SImode, CR2_REGNO),
@@ -21217,7 +21267,7 @@ rs6000_emit_epilogue (int sibcall)
  return;
}
 
-  if (DEFAULT_ABI == ABI_V4)
+  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
{
  if (frame_pointer_needed)
{
@@ -21246,7 +21296,7 @@ rs6000_emit_epilogue (int sibcall)
  rtx reg = gen_rtx_REG (reg_mode, info-first_gp_reg_save + i);
 
  RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, reg, mem);
- if (DEFAULT_ABI == ABI_V4)
+ if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
   cfa_restores);
}
@@ -21271,7 +21321,7 @@ rs6000_emit_epilogue (int sibcall)
rtx reg = gen_rtx_REG (reg_mode, info-first_gp_reg_save + i);
 
insn = emit_move_insn (reg, 

Re: PowerPC shrink-wrap support 3 of 3

2011-09-26 Thread Alan Modra
On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
 Two regressions appeared due to a problem in the shrink-wrap code.

These two.
+FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
+FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)

Both internal compiler error: in maybe_record_trace_start, at
dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that
needs no prologue.  ie. thread_prologue_and_epilogue_insns gives


code needing no prologue
  - prologue inserted here
code needing prologue

more no prologue code

more code needing prologue

epilogue

That second block needing no prologue now wrongly tries to use unwind
info from the prologue.

-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-09-26 Thread Bernd Schmidt
On 09/27/11 00:32, Alan Modra wrote:
 On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
 Two regressions appeared due to a problem in the shrink-wrap code.
 
 These two.
 +FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
 +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)
 
 Both internal compiler error: in maybe_record_trace_start, at
 dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that
 needs no prologue.  ie. thread_prologue_and_epilogue_insns gives
 
 
 code needing no prologue
   - prologue inserted here
 code needing prologue
 
 more no prologue code
 
 more code needing prologue
 
 epilogue
 
 That second block needing no prologue now wrongly tries to use unwind
 info from the prologue.

What's the actual CFG structure here?


Bernd



Re: PowerPC shrink-wrap support 3 of 3

2011-09-26 Thread Alan Modra
On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote:
 On 09/27/11 00:32, Alan Modra wrote:
  On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
  Two regressions appeared due to a problem in the shrink-wrap code.
  
  These two.
  +FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
  +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)
  
  Both internal compiler error: in maybe_record_trace_start, at
  dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that
  needs no prologue.  ie. thread_prologue_and_epilogue_insns gives
  
  
  code needing no prologue
    - prologue inserted here
  code needing prologue
  
  more no prologue code
  
  more code needing prologue
  
  epilogue
  
  That second block needing no prologue now wrongly tries to use unwind
  info from the prologue.
 
 What's the actual CFG structure here?

Immediately after thread_prologue_and_epilogue_insns

bb2 - simple_ret
 |
bb3
/\
  /\
bb4bb5
   (pro)|
 |  |--|
bb7 |   |
   (epi)   bb6--|
 |  |
bb8-|  |
  (s ret)|  |
 |-bb9


-- 
Alan Modra
Australia Development Lab, IBM


Re: PowerPC shrink-wrap support 3 of 3

2011-09-26 Thread Bernd Schmidt
On 09/27/11 02:11, Alan Modra wrote:
 On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote:
 On 09/27/11 00:32, Alan Modra wrote:
 On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
 Two regressions appeared due to a problem in the shrink-wrap code.

 These two.
 +FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
 +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)

 Both internal compiler error: in maybe_record_trace_start, at
 dwarf2cfi.c:2243, caused by disjoint blocks in the set of blocks that
 needs no prologue.  ie. thread_prologue_and_epilogue_insns gives

 
 code needing no prologue
   - prologue inserted here
 code needing prologue
 
 more no prologue code
 
 more code needing prologue
 
 epilogue

 That second block needing no prologue now wrongly tries to use unwind
 info from the prologue.

 What's the actual CFG structure here?
 
 Immediately after thread_prologue_and_epilogue_insns
 
 bb2 - simple_ret
  |
 bb3
 /\
   /\
 bb4bb5
(pro)|
  |  |--|
 bb7 |   |
(epi)   bb6--|
  |  |
 bb8-|  |
   (s ret)|  |
  |-bb9

That looks perfectly reasonable, and even if it is ordered in the way
shown above, dwarf2cfi still should be able to deal with it if the
prologue and epilogue are annotated correctly.

Again, at the crash site, examine the start insn to find out which basic
block is the problem, and use dump_cfi_row on the two rows it tried to
compare (and found an unexpected difference).


Bernd


PowerPC shrink-wrap support 3 of 3

2011-09-17 Thread Alan Modra
Finally, the powerpc backend changes.  These are mostly just
mechanical.  I'll note that we need both simple_return and return
variants of the conditional returns because they can only be used when
no epilogue is required.  The return variant must use
direct_return() as a predicate to check this requirement.  The
simple_return variant is used by the shrink_wrap optimization to
enable a conditional return before the function prologue, and should
*not* use direct_return() because the optimization is valid for
functions that do require an epilogue.  All other instructions in the
rs6000 backend that currently use return are generated from the
epilogue or are sibling calls, so simple_return is appropriate as
per md.texi.

* config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded
declaration.
(rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.  Use
simple_return in pattern, emit instruction, and set jump_label.
(rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.  Use
simple_return rather than return.
(rs6000_output_mi_thunk): Use simple_return rather than return.
* config/rs6000/altivec.md (restore_world): Likewise.
* config/rs6000/spe.md (return_and_restore_gpregs_spe): Likewise.
* config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
(return_internal*, return_and_restore*): Likewise.
(any_return, return_pred, return_str): New iterators.
(return, conditional return insns): Provide both return and
simple_return variants.

diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/rs6000.c 
gcc-current/gcc/config/rs6000/rs6000.c
--- gcc-bernd/gcc/config/rs6000/rs6000.c2011-09-16 11:52:15.0 
+0930
+++ gcc-current/gcc/config/rs6000/rs6000.c  2011-09-15 22:25:50.0 
+0930
@@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
 static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
-static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
-  enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -19729,10 +19764,11 @@ rs6000_emit_stack_reset (rs6000_stack_t 
 }
 
 /* Construct a parallel rtx describing the effect of a call to an
-   out-of-line register save/restore routine.  */
+   out-of-line register save/restore routine, and emit the insn
+   or jump_insn as appropriate.  */
 
 static rtx
-rs6000_make_savres_rtx (rs6000_stack_t *info,
+rs6000_emit_savres_rtx (rs6000_stack_t *info,
rtx frame_reg_rtx, int save_area_offset,
enum machine_mode reg_mode,
bool savep, bool gpr, bool lr)
@@ -19742,6 +19778,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   int reg_size = GET_MODE_SIZE (reg_mode);
   rtx sym;
   rtvec p;
+  rtx par, insn;
 
   offset = 0;
   start_reg = (gpr
@@ -19752,7 +19789,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   p = rtvec_alloc ((lr ? 4 : 3) + n_regs);
 
   if (!savep  lr)
-RTVEC_ELT (p, offset++) = ret_rtx;
+RTVEC_ELT (p, offset++) = simple_return_rtx;
 
   RTVEC_ELT (p, offset++)
 = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
@@ -19788,7 +19825,16 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg);
 }
 
-  return gen_rtx_PARALLEL (VOIDmode, p);
+  par = gen_rtx_PARALLEL (VOIDmode, p);
+
+  if (!savep  lr)
+{
+  insn = emit_jump_insn (par);
+  JUMP_LABEL (insn) = simple_return_rtx;
+}
+  else
+insn = emit_insn (par);
+  return insn;
 }
 
 /* Determine whether the gp REG is really used.  */
@@ -20087,16 +20133,13 @@ rs6000_emit_prologue (void)
 }
   else if (!WORLD_SAVE_P (info)  info-first_fp_reg_save != 64)
 {
-  rtx par;
-
-  par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-   info-fp_save_offset + sp_offset,
-   DFmode,
-   /*savep=*/true, /*gpr=*/false,
-   /*lr=*/(strategy
-SAVE_NOINLINE_FPRS_SAVES_LR)
-  != 0);
-  insn = emit_insn (par);
+  insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+info-fp_save_offset + sp_offset,
+DFmode,
+/*savep=*/true, /*gpr=*/false,
+/*lr=*/((strategy
+  SAVE_NOINLINE_FPRS_SAVES_LR)
+