RE: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-25 Thread Zhenqiang Chen


 -Original Message-
 From: Jiong Wang [mailto:jiong.w...@arm.com]
 Sent: Thursday, September 25, 2014 2:13 AM
 To: Jeff Law; Zhenqiang Chen
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop: split
 live_edge
 
 
 On 22/09/14 18:51, Jeff Law wrote:
  On 09/22/14 04:24, Jiong Wang wrote:
  Great.  Can you send an updated patchkit for review.
  patch attached.
 
  please review, thanks.
 
  gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the
  live-in of new created BB as the intersection of live-in from
  old_dest and live-out from bb.
  Looks good.  However, before committing we need a couple things.
 
  1. Bootstrap  regression test this variant of the patch.  I know you
  tested an earlier one, but please test this one just to be sure.
 
  2. Testcase.  I think you could test for either the reduction in the
  live-in set of the newly created block or that you're shrink wrapping
  one or more functions you didn't previously shrink-wrap.  I think it's
  fine if this test is target specific.
 
   bootstrap ok based on revision 215515.
 
   while the x86 regression result is interesting. there is no regression on
 check-g++, while there is four regression on check-gcc:
 
 FAIL: gcc.dg/tree-ssa/loadpre10.c (internal compiler error)
 FAIL: gcc.dg/tree-ssa/loadpre10.c (test for excess errors)
 FAIL: gcc.dg/tree-ssa/pr21417.c (internal compiler error)
 FAIL: gcc.dg/tree-ssa/pr21417.c (test for excess errors)
 
this is caused by our improving the accuracy of live-in for new created 
 basic
 block. Now we will split
more than one edge for the above two testcase. thus trigger the following
 assert in move_insn_for_shrink_wrap:
 
/* We should not split more than once for a function.  */
gcc_assert (!(*split_p));

According to the algorithm, it is impossible to split one edge twice. It's 
possible to split two different edges. But for such cases, the control flow is 
too complex to perform shrink-wrapping.

Anyway, your patch improves the accuracy. You can replace the gcc_assert to 
return; or change split_p to splitted_edge then you can check one edge is 
not splitted twice.

Thanks!
-Zhenqiang
 
   take pr21417.c for example, after the patch, two edges will be split,
 
 before this patch
 =
 .L2:
  movq%rdi, %rax
  cmpl$142, (%rdi)
  jne .L13
 .L4:
  all insns sinked here  -- the only split
  ...
  ...
 
  popq%rbx
  popq%rbp
 .L13:
  ret
 
 after this patch
 
 .L2:
 
  cmpl$142, (%rdi)
  jne .L13
 .L4:
  part of insns sinked into here  -- first split
  
  
 
  popq%rbx
  popq%rbp
  ret
 
 .L13:
  movq%rdi, %rax  -- second split and one instruction moved here
  ret
 
 I don't know why there is a assert to prevent multi split.
 
 after I remove that assert, pass bootstrap and no regression.
 
 and for pr21417.c, the multi split more cause one extra ret instruction, but
 the performance is better, because there
 is no need to execute movq%rdi, %rax if we go down to L4.
 
 any comments?
 
 BTW: I updated the patch with testcase which could not be shrink-wrapped
 before this patch.
 
 thanks.
 
 -- Jiong
 
 
  Jeff
 
 






Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-25 Thread Jiong Wang


On 25/09/14 09:52, Zhenqiang Chen wrote:



-Original Message-
From: Jiong Wang [mailto:jiong.w...@arm.com]
Sent: Thursday, September 25, 2014 2:13 AM
To: Jeff Law; Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop: split
live_edge


On 22/09/14 18:51, Jeff Law wrote:

On 09/22/14 04:24, Jiong Wang wrote:

Great.  Can you send an updated patchkit for review.

patch attached.

please review, thanks.

gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the
live-in of new created BB as the intersection of live-in from
old_dest and live-out from bb.

Looks good.  However, before committing we need a couple things.

1. Bootstrap  regression test this variant of the patch.  I know you
tested an earlier one, but please test this one just to be sure.

2. Testcase.  I think you could test for either the reduction in the
live-in set of the newly created block or that you're shrink wrapping
one or more functions you didn't previously shrink-wrap.  I think it's
fine if this test is target specific.

   bootstrap ok based on revision 215515.

   while the x86 regression result is interesting. there is no regression on
check-g++, while there is four regression on check-gcc:

FAIL: gcc.dg/tree-ssa/loadpre10.c (internal compiler error)
FAIL: gcc.dg/tree-ssa/loadpre10.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/pr21417.c (internal compiler error)
FAIL: gcc.dg/tree-ssa/pr21417.c (test for excess errors)

this is caused by our improving the accuracy of live-in for new created 
basic
block. Now we will split
more than one edge for the above two testcase. thus trigger the following
assert in move_insn_for_shrink_wrap:

/* We should not split more than once for a function.  */
gcc_assert (!(*split_p));

According to the algorithm, it is impossible to split one edge twice. It's 
possible to split two different edges. But for such cases, the control flow is 
too complex to perform shrink-wrapping.

Anyway, your patch improves the accuracy. You can replace the gcc_assert to return; or change 
split_p to splitted_edge then you can check one edge is not splitted twice.


thanks for the explanation.

actually, the old bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb)); will 
let any dest reg
in entry block alive in the new splitted block. If there is another block which 
dest also set in live_in, then
dest alive in two blocks, then those code in live_edge_for_reg will always 
return NULL, thus the old
inaccurate data flow will actually never make split two different edges 
happen... thus assert never triggered.

as from the whole x86 boostrap, and regression test, only two cases trigger 
split two different edges, I think it's
trival case, thus prefer to be conservative to keep the old logic, as suggested, just replace 
gcc_assert into return false.

or if we want to allow multi split, I think just remove the assert is OK, because 
EDGE_COUNT (next_block-preds) == 2
will guarantee split one edge twice never happen.

new patch updated.

pass bootstrap and no regression, both check-gcc and check-g++, on the x86.

OK for trunk?

thanks.

gcc/
   * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of
   new created BB as the intersection of live-in from old_dest and live-out
   from bb.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index af23f02..bd4813c 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -250,16 +250,21 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   if (!df_live)
 	return false;

+  basic_block old_dest = live_edge-dest;
   next_block = split_edge (live_edge);

   /* We create a new basic block.  Call df_grow_bb_info to make sure
 	 all data structures are allocated.  */
   df_grow_bb_info (df_live);
-  bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb));
+
+  bitmap_and (df_get_live_in (next_block), df_get_live_out (bb),
+		  df_get_live_in (old_dest));
   df_set_bb_dirty (next_block);

   /* We should not split more than once for a function.  */
-  gcc_assert (!(*split_p));
+  if (*split_p)
+	return false;
+
   *split_p = true;
 }

diff --git a/gcc/testsuite/gcc.target/i386/shrink_wrap_1.c b/gcc/testsuite/gcc.target/i386/shrink_wrap_1.c
new file mode 100644
index 000..47f2468
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/shrink_wrap_1.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-rtl-pro_and_epilogue } */
+
+enum machine_mode
+{
+  FAKE_0,
+  FAKE_1,
+  FAKE_2,
+  FAKE_3,
+  FAKE_4,
+  FAKE_5,
+  NUM_MACHINE_MODES,
+};
+
+typedef int *rtx;
+typedef long unsigned int size_t;
+extern unsigned char mode_size[NUM_MACHINE_MODES];
+
+extern rtx c_readstr (const char *, enum machine_mode);
+extern rtx convert_to_mode (enum machine_mode, rtx, int);
+extern rtx expand_mult (enum machine_mode, rtx, rtx, rtx, int);
+extern rtx force_reg (enum machine_mode, rtx

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-25 Thread Jeff Law

On 09/25/14 09:04, Jiong Wang wrote:


On 25/09/14 09:52, Zhenqiang Chen wrote:



-Original Message-
From: Jiong Wang [mailto:jiong.w...@arm.com]
Sent: Thursday, September 25, 2014 2:13 AM
To: Jeff Law; Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop:
split
live_edge


On 22/09/14 18:51, Jeff Law wrote:

On 09/22/14 04:24, Jiong Wang wrote:

Great.  Can you send an updated patchkit for review.

patch attached.

please review, thanks.

gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the
live-in of new created BB as the intersection of live-in from
old_dest and live-out from bb.

Looks good.  However, before committing we need a couple things.

1. Bootstrap  regression test this variant of the patch.  I know you
tested an earlier one, but please test this one just to be sure.

2. Testcase.  I think you could test for either the reduction in the
live-in set of the newly created block or that you're shrink wrapping
one or more functions you didn't previously shrink-wrap.  I think it's
fine if this test is target specific.

   bootstrap ok based on revision 215515.

   while the x86 regression result is interesting. there is no
regression on
check-g++, while there is four regression on check-gcc:

FAIL: gcc.dg/tree-ssa/loadpre10.c (internal compiler error)
FAIL: gcc.dg/tree-ssa/loadpre10.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/pr21417.c (internal compiler error)
FAIL: gcc.dg/tree-ssa/pr21417.c (test for excess errors)

this is caused by our improving the accuracy of live-in for new
created basic
block. Now we will split
more than one edge for the above two testcase. thus trigger the
following
assert in move_insn_for_shrink_wrap:

/* We should not split more than once for a function.  */
gcc_assert (!(*split_p));

According to the algorithm, it is impossible to split one edge twice.
It's possible to split two different edges. But for such cases, the
control flow is too complex to perform shrink-wrapping.

Anyway, your patch improves the accuracy. You can replace the
gcc_assert to return; or change split_p to splitted_edge then
you can check one edge is not splitted twice.


thanks for the explanation.

actually, the old bitmap_copy (df_get_live_in (next_block),
df_get_live_out (bb)); will let any dest reg
in entry block alive in the new splitted block. If there is another
block which dest also set in live_in, then
dest alive in two blocks, then those code in live_edge_for_reg will
always return NULL, thus the old
inaccurate data flow will actually never make split two different edges
happen... thus assert never triggered.

as from the whole x86 boostrap, and regression test, only two cases
trigger split two different edges, I think it's
trival case, thus prefer to be conservative to keep the old logic, as
suggested, just replace gcc_assert into return false.

or if we want to allow multi split, I think just remove the assert is
OK, because EDGE_COUNT (next_block-preds) == 2
will guarantee split one edge twice never happen.

new patch updated.

pass bootstrap and no regression, both check-gcc and check-g++, on the x86.

OK for trunk?

thanks.

gcc/
* shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of
new created BB as the intersection of live-in from old_dest and
live-out
from bb.

Please include a ChangeLog entry for the testsuite.  Something like:

* gcc.target/i386/shrink_wrap_1.c: New test.

With that addition, OK for the trunk.

Jeff




Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-24 Thread Jiong Wang


On 22/09/14 18:51, Jeff Law wrote:

On 09/22/14 04:24, Jiong Wang wrote:

Great.  Can you send an updated patchkit for review.

patch attached.

please review, thanks.

gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the
live-in of new created BB as the intersection of live-in from
old_dest and live-out from bb.

Looks good.  However, before committing we need a couple things.

1. Bootstrap  regression test this variant of the patch.  I know you
tested an earlier one, but please test this one just to be sure.

2. Testcase.  I think you could test for either the reduction in the
live-in set of the newly created block or that you're shrink wrapping
one or more functions you didn't previously shrink-wrap.  I think it's
fine if this test is target specific.


 bootstrap ok based on revision 215515.

 while the x86 regression result is interesting. there is no regression on 
check-g++, while there is four regression on check-gcc:

FAIL: gcc.dg/tree-ssa/loadpre10.c (internal compiler error)
FAIL: gcc.dg/tree-ssa/loadpre10.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/pr21417.c (internal compiler error)
FAIL: gcc.dg/tree-ssa/pr21417.c (test for excess errors)

  this is caused by our improving the accuracy of live-in for new created basic 
block. Now we will split
  more than one edge for the above two testcase. thus trigger the following 
assert in move_insn_for_shrink_wrap:

  /* We should not split more than once for a function.  */
  gcc_assert (!(*split_p));

 take pr21417.c for example, after the patch, two edges will be split,

before this patch
=
.L2:
movq%rdi, %rax
cmpl$142, (%rdi)
jne .L13
.L4:
all insns sinked here  -- the only split
...
...

popq%rbx
popq%rbp
.L13:
ret

after this patch

.L2:

cmpl$142, (%rdi)
jne .L13
.L4:
part of insns sinked into here  -- first split



popq%rbx
popq%rbp
ret

.L13:   
movq%rdi, %rax  -- second split and one instruction moved here
ret

I don't know why there is a assert to prevent multi split.

after I remove that assert, pass bootstrap and no regression.

and for pr21417.c, the multi split more cause one extra ret instruction, but 
the performance is better, because there
is no need to execute movq%rdi, %rax if we go down to L4.

any comments?

BTW: I updated the patch with testcase which could not be shrink-wrapped before 
this patch.

thanks.

-- Jiong



Jeff


diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..63deadf 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -217,12 +217,15 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   if (!df_live)
 	return false;

+  basic_block old_dest = live_edge-dest;
   next_block = split_edge (live_edge);

   /* We create a new basic block.  Call df_grow_bb_info to make sure
 	 all data structures are allocated.  */
   df_grow_bb_info (df_live);
-  bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb));
+
+  bitmap_and (df_get_live_in (next_block), df_get_live_out (bb),
+		  df_get_live_in (old_dest));
   df_set_bb_dirty (next_block);

   /* We should not split more than once for a function.  */
diff --git a/gcc/testsuite/gcc.target/i386/shrink_wrap_1.c b/gcc/testsuite/gcc.target/i386/shrink_wrap_1.c
new file mode 100644
index 000..47f2468
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/shrink_wrap_1.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-rtl-pro_and_epilogue } */
+
+enum machine_mode
+{
+  FAKE_0,
+  FAKE_1,
+  FAKE_2,
+  FAKE_3,
+  FAKE_4,
+  FAKE_5,
+  NUM_MACHINE_MODES,
+};
+
+typedef int *rtx;
+typedef long unsigned int size_t;
+extern unsigned char mode_size[NUM_MACHINE_MODES];
+
+extern rtx c_readstr (const char *, enum machine_mode);
+extern rtx convert_to_mode (enum machine_mode, rtx, int);
+extern rtx expand_mult (enum machine_mode, rtx, rtx, rtx, int);
+extern rtx force_reg (enum machine_mode, rtx);
+extern void *memset (void *__s, int __c, size_t __n);
+
+rtx
+builtin_memset_gen_str (void *data, long offset __attribute__ ((__unused__)),
+			enum machine_mode mode)
+{
+  rtx target, coeff;
+  size_t size;
+  char *p;
+
+  size = ((unsigned short) (__builtin_constant_p (mode)
+			? mode_size_inline (mode) : mode_size[mode]));
+  if (size == 1)
+return (rtx) data;
+
+  p = ((char *) __builtin_alloca(sizeof (char) * (size)));
+  memset (p, 1, size);
+  coeff = c_readstr (p, mode);
+
+  target = convert_to_mode (mode, (rtx) data, 1);
+  target = expand_mult (mode, target, coeff, (rtx) 0, 1);
+  return force_reg (mode, target);
+}
+
+/* { dg-final { scan-rtl-dump Performing shrink-wrapping pro_and_epilogue  } } */
+/* { dg-final { cleanup-rtl-dump pro_and_epilogue } } */

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-22 Thread Jiong Wang

On 19/09/14 17:19, Jeff Law wrote:


On 09/19/14 10:02, Jiong Wang wrote:

On 19/09/14 16:49, Jeff Law wrote:


Probably.  Though I'd be a bit concerned with next_block-next_bb.
Wouldn't it be safer to stash away the relevant basic block prior to the
call to split_edge, then use that saved copy.  Something like this
(untested):

basic_block old_dest = live_edge-dest;
next_block = split_edge (live_edge);

/* We create a new basic block.  Call df_grow_bb_info to make sure
  all data structures are allocated.  */
df_grow_bb_info (df_live);
bitmap_and (df_get_live_in (next_block),
   df_get_live_out (bb),
   df_get_live_in (old_dest));


The idea being we don't want to depend on the precise ordering blocks in
the block chain.

Could you try that and see if it does what you need?

Jeff,

Thanks, verified, it works.

Great.  Can you send an updated patchkit for review.


patch attached.

please review, thanks.

gcc/
  * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of
  new created BB as the intersection of live-in from old_dest and live-out
  from bb.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..63deadf 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -217,12 +217,15 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   if (!df_live)
 	return false;

+  basic_block old_dest = live_edge-dest;
   next_block = split_edge (live_edge);

   /* We create a new basic block.  Call df_grow_bb_info to make sure
 	 all data structures are allocated.  */
   df_grow_bb_info (df_live);
-  bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb));
+
+  bitmap_and (df_get_live_in (next_block), df_get_live_out (bb),
+		  df_get_live_in (old_dest));
   df_set_bb_dirty (next_block);

   /* We should not split more than once for a function.  */

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-22 Thread Jeff Law

On 09/22/14 04:24, Jiong Wang wrote:

Great.  Can you send an updated patchkit for review.


patch attached.

please review, thanks.

gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the
live-in of new created BB as the intersection of live-in from
old_dest and live-out from bb.

Looks good.  However, before committing we need a couple things.

1. Bootstrap  regression test this variant of the patch.  I know you 
tested an earlier one, but please test this one just to be sure.


2. Testcase.  I think you could test for either the reduction in the 
live-in set of the newly created block or that you're shrink wrapping 
one or more functions you didn't previously shrink-wrap.  I think it's 
fine if this test is target specific.


Jeff



Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-19 Thread Jiong Wang


On 19/09/14 06:51, Jeff Law wrote:

On 09/16/14 00:55, Zhenqiang Chen wrote:



-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Jiong Wang
Sent: Monday, September 15, 2014 11:28 PM
To: Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org; Jeff Law
Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop: split
live_edge

On 08/05/14 09:07, Zhenqiang Chen wrote:

static bool
move_insn_for_shrink_wrap (basic_block bb, rtx insn,
  const HARD_REG_SET uses,
  const HARD_REG_SET defs,
-  HARD_REG_SET *last_uses)
+  HARD_REG_SET *last_uses,
+  bool *split_p)
{
  rtx set, src, dest;
  bitmap live_out, live_in, bb_uses, bb_defs;
  unsigned int i, dregno, end_dregno, sregno, end_sregno;
  basic_block next_block;
+  edge live_edge;

  /* Look for a simple register copy.  */
  set = single_set (insn);
@@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb,

rtx insn,

  || overlaps_hard_reg_set_p (defs, GET_MODE (dest), dregno))
return false;

-  /* See whether there is a successor block to which we could move
INSN.  */
-  next_block = next_block_for_reg (bb, dregno, end_dregno);
-  if (!next_block)
+  live_edge = next_block_for_reg (bb, dregno, end_dregno);  if
+ (!live_edge)
 return false;

+  next_block = live_edge-dest;
+
  /* If the destination register is referred in later insn,
 try to forward it.  */
  if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest), dregno)
   !try_copy_prop (bb, insn, src, dest, last_uses))
return false;

+  /* Create a new basic block on the edge.  */  if (EDGE_COUNT
+ (next_block-preds) == 2)
+{
+  next_block = split_edge (live_edge);
+
+  bitmap_copy (df_get_live_in (next_block), df_get_live_out
+ (bb));

(re-sent, looks like the first send not received by the server...)

for the function _IO_wdefault_xsputn in glibc wgenops.c (.dot file
attached)

174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block because
of the sink of move instruction ax:DI = 0 and split edge.

but the live_in of this BB is copied from live_out of BB 2 which is too 
big, and
actually prevent the later sink of 16: r12:DI=dx:DI.

Should it be better to copy live_in from next_block-next_bb instead of
live_out from bb, as it will model what's needed more accurately?

According to the algorithm, next_block-next_bb (which is live_edge-dest) should 
have two predecessors. Live_in of next_block-next_bb would include live_out of the 
other edge,  which is not necessary accurately. To be more accurate, you may need an 
intersection of df_get_live_out (bb) and df_get_live_in (next_block-next_bb).


Presumably we can't recompute the liveness info here as it's too
expensive to do that each time we split an edge to sink a statement?
ie, we need the more accurate liveness within this pass so that we can
sink further instructions?


for a single case, x86 bootstrap, looks like these code are not compile time 
critical
as for all 4762 live edges which could sink instructions

4139: EDGE_COUNT (next_block-preds) == 1
270: EDGE_COUNT (next_block-preds) == 2
353: EDGE_COUNT (next_block-preds)  2

and if we make the live info more accurate here, just very few additional 
functions ( 10)
shrink-wrapped from glibc build  gcc bootstrap test.

So, is it OK we simply change bitmap_copy  (df_get_live_in  (next_block),  
df_get_live_out  (bb));
into bitmap_and  (df_get_live_in  (next_block),  df_get_live_out  (bb),  
df_get_live_in  (next_block-next_bb)); ?

-- Jiong



jeff







Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-19 Thread Jeff Law

On 09/19/14 05:27, Jiong Wang wrote:


On 19/09/14 06:51, Jeff Law wrote:

On 09/16/14 00:55, Zhenqiang Chen wrote:



-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Jiong Wang
Sent: Monday, September 15, 2014 11:28 PM
To: Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org; Jeff Law
Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop:
split
live_edge

On 08/05/14 09:07, Zhenqiang Chen wrote:

static bool
move_insn_for_shrink_wrap (basic_block bb, rtx insn,
  const HARD_REG_SET uses,
  const HARD_REG_SET defs,
-  HARD_REG_SET *last_uses)
+  HARD_REG_SET *last_uses,
+  bool *split_p)
{
  rtx set, src, dest;
  bitmap live_out, live_in, bb_uses, bb_defs;
  unsigned int i, dregno, end_dregno, sregno, end_sregno;
  basic_block next_block;
+  edge live_edge;

  /* Look for a simple register copy.  */
  set = single_set (insn);
@@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb,

rtx insn,

  || overlaps_hard_reg_set_p (defs, GET_MODE (dest), dregno))
return false;

-  /* See whether there is a successor block to which we could move
INSN.  */
-  next_block = next_block_for_reg (bb, dregno, end_dregno);
-  if (!next_block)
+  live_edge = next_block_for_reg (bb, dregno, end_dregno);  if
+ (!live_edge)
 return false;

+  next_block = live_edge-dest;
+
  /* If the destination register is referred in later insn,
 try to forward it.  */
  if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest),
dregno)
   !try_copy_prop (bb, insn, src, dest, last_uses))
return false;

+  /* Create a new basic block on the edge.  */  if (EDGE_COUNT
+ (next_block-preds) == 2)
+{
+  next_block = split_edge (live_edge);
+
+  bitmap_copy (df_get_live_in (next_block), df_get_live_out
+ (bb));

(re-sent, looks like the first send not received by the server...)

for the function _IO_wdefault_xsputn in glibc wgenops.c (.dot
file
attached)

174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block
because
of the sink of move instruction ax:DI = 0 and split edge.

but the live_in of this BB is copied from live_out of BB 2 which
is too big, and
actually prevent the later sink of 16: r12:DI=dx:DI.

Should it be better to copy live_in from next_block-next_bb
instead of
live_out from bb, as it will model what's needed more accurately?

According to the algorithm, next_block-next_bb (which is
live_edge-dest) should have two predecessors. Live_in of
next_block-next_bb would include live_out of the other edge,  which
is not necessary accurately. To be more accurate, you may need an
intersection of df_get_live_out (bb) and df_get_live_in
(next_block-next_bb).


Presumably we can't recompute the liveness info here as it's too
expensive to do that each time we split an edge to sink a statement?
ie, we need the more accurate liveness within this pass so that we can
sink further instructions?


for a single case, x86 bootstrap, looks like these code are not compile
time critical
as for all 4762 live edges which could sink instructions

4139: EDGE_COUNT (next_block-preds) == 1
270: EDGE_COUNT (next_block-preds) == 2
353: EDGE_COUNT (next_block-preds)  2

and if we make the live info more accurate here, just very few
additional functions ( 10)
shrink-wrapped from glibc build  gcc bootstrap test.

Perhaps I wasn't clear.

The whole point behind the proposed changes is to enable additional 
sinking that is currently blocked because of the overly large set of 
live objects.


So my question was an attempt to understand why we didn't just a normal 
liveness update -- I speculated that we aren't doing a normal liveness 
update because of the compile-time cost.






So, is it OK we simply change bitmap_copy  (df_get_live_in
(next_block),  df_get_live_out  (bb));
into bitmap_and  (df_get_live_in  (next_block),  df_get_live_out
(bb),  df_get_live_in  (next_block-next_bb)); ?
Probably.  Though I'd be a bit concerned with next_block-next_bb. 
Wouldn't it be safer to stash away the relevant basic block prior to the 
call to split_edge, then use that saved copy.  Something like this 
(untested):


basic_block old_dest = live_edge-dest;
next_block = split_edge (live_edge);

/* We create a new basic block.  Call df_grow_bb_info to make sure
   all data structures are allocated.  */
df_grow_bb_info (df_live);
bitmap_and (df_get_live_in (next_block),
df_get_live_out (bb),
df_get_live_in (old_dest));


The idea being we don't want to depend on the precise ordering blocks in 
the block chain.


Could you try that and see if it does what you need?

jeff



Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-19 Thread Jiong Wang


On 19/09/14 16:49, Jeff Law wrote:

On 09/19/14 05:27, Jiong Wang wrote:

On 19/09/14 06:51, Jeff Law wrote:

On 09/16/14 00:55, Zhenqiang Chen wrote:

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Jiong Wang
Sent: Monday, September 15, 2014 11:28 PM
To: Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org; Jeff Law
Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop:
split
live_edge

On 08/05/14 09:07, Zhenqiang Chen wrote:

 static bool
 move_insn_for_shrink_wrap (basic_block bb, rtx insn,
   const HARD_REG_SET uses,
   const HARD_REG_SET defs,
-  HARD_REG_SET *last_uses)
+  HARD_REG_SET *last_uses,
+  bool *split_p)
{
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
   unsigned int i, dregno, end_dregno, sregno, end_sregno;
   basic_block next_block;
+  edge live_edge;

   /* Look for a simple register copy.  */
   set = single_set (insn);
@@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb,

rtx insn,

   || overlaps_hard_reg_set_p (defs, GET_MODE (dest), dregno))
 return false;

-  /* See whether there is a successor block to which we could move
INSN.  */
-  next_block = next_block_for_reg (bb, dregno, end_dregno);
-  if (!next_block)
+  live_edge = next_block_for_reg (bb, dregno, end_dregno);  if
+ (!live_edge)
  return false;

+  next_block = live_edge-dest;
+
   /* If the destination register is referred in later insn,
  try to forward it.  */
   if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest),
dregno)
!try_copy_prop (bb, insn, src, dest, last_uses))
 return false;

+  /* Create a new basic block on the edge.  */  if (EDGE_COUNT
+ (next_block-preds) == 2)
+{
+  next_block = split_edge (live_edge);
+
+  bitmap_copy (df_get_live_in (next_block), df_get_live_out
+ (bb));

(re-sent, looks like the first send not received by the server...)

 for the function _IO_wdefault_xsputn in glibc wgenops.c (.dot
file
attached)

 174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block
because
of the sink of move instruction ax:DI = 0 and split edge.

 but the live_in of this BB is copied from live_out of BB 2 which
is too big, and
actually prevent the later sink of 16: r12:DI=dx:DI.

 Should it be better to copy live_in from next_block-next_bb
instead of
live_out from bb, as it will model what's needed more accurately?

According to the algorithm, next_block-next_bb (which is
live_edge-dest) should have two predecessors. Live_in of
next_block-next_bb would include live_out of the other edge,  which
is not necessary accurately. To be more accurate, you may need an
intersection of df_get_live_out (bb) and df_get_live_in
(next_block-next_bb).


Presumably we can't recompute the liveness info here as it's too
expensive to do that each time we split an edge to sink a statement?
ie, we need the more accurate liveness within this pass so that we can
sink further instructions?

for a single case, x86 bootstrap, looks like these code are not compile
time critical
as for all 4762 live edges which could sink instructions

4139: EDGE_COUNT (next_block-preds) == 1
270: EDGE_COUNT (next_block-preds) == 2
353: EDGE_COUNT (next_block-preds)  2

and if we make the live info more accurate here, just very few
additional functions ( 10)
shrink-wrapped from glibc build  gcc bootstrap test.

Perhaps I wasn't clear.

The whole point behind the proposed changes is to enable additional
sinking that is currently blocked because of the overly large set of
live objects.

So my question was an attempt to understand why we didn't just a normal
liveness update -- I speculated that we aren't doing a normal liveness
update because of the compile-time cost.




So, is it OK we simply change bitmap_copy  (df_get_live_in
(next_block),  df_get_live_out  (bb));
into bitmap_and  (df_get_live_in  (next_block),  df_get_live_out
(bb),  df_get_live_in  (next_block-next_bb)); ?

Probably.  Though I'd be a bit concerned with next_block-next_bb.
Wouldn't it be safer to stash away the relevant basic block prior to the
call to split_edge, then use that saved copy.  Something like this
(untested):

basic_block old_dest = live_edge-dest;
next_block = split_edge (live_edge);

/* We create a new basic block.  Call df_grow_bb_info to make sure
 all data structures are allocated.  */
df_grow_bb_info (df_live);
bitmap_and (df_get_live_in (next_block),
  df_get_live_out (bb),
  df_get_live_in (old_dest));


The idea being we don't want to depend on the precise ordering blocks in
the block chain.

Could you try that and see if it does what you need?

Jeff,

  Thanks, verified, it works.

Regards,
Jiong


jeff







Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-19 Thread Jeff Law

On 09/19/14 10:02, Jiong Wang wrote:


On 19/09/14 16:49, Jeff Law wrote:

On 09/19/14 05:27, Jiong Wang wrote:

On 19/09/14 06:51, Jeff Law wrote:

On 09/16/14 00:55, Zhenqiang Chen wrote:

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Jiong Wang
Sent: Monday, September 15, 2014 11:28 PM
To: Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org; Jeff Law
Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop:
split
live_edge

On 08/05/14 09:07, Zhenqiang Chen wrote:

 static bool
 move_insn_for_shrink_wrap (basic_block bb, rtx insn,
   const HARD_REG_SET uses,
   const HARD_REG_SET defs,
-  HARD_REG_SET *last_uses)
+  HARD_REG_SET *last_uses,
+  bool *split_p)
{
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
   unsigned int i, dregno, end_dregno, sregno, end_sregno;
   basic_block next_block;
+  edge live_edge;

   /* Look for a simple register copy.  */
   set = single_set (insn);
@@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb,

rtx insn,

   || overlaps_hard_reg_set_p (defs, GET_MODE (dest),
dregno))
 return false;

-  /* See whether there is a successor block to which we could move
INSN.  */
-  next_block = next_block_for_reg (bb, dregno, end_dregno);
-  if (!next_block)
+  live_edge = next_block_for_reg (bb, dregno, end_dregno);  if
+ (!live_edge)
  return false;

+  next_block = live_edge-dest;
+
   /* If the destination register is referred in later insn,
  try to forward it.  */
   if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest),
dregno)
!try_copy_prop (bb, insn, src, dest, last_uses))
 return false;

+  /* Create a new basic block on the edge.  */  if (EDGE_COUNT
+ (next_block-preds) == 2)
+{
+  next_block = split_edge (live_edge);
+
+  bitmap_copy (df_get_live_in (next_block), df_get_live_out
+ (bb));

(re-sent, looks like the first send not received by the server...)

 for the function _IO_wdefault_xsputn in glibc wgenops.c (.dot
file
attached)

 174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block
because
of the sink of move instruction ax:DI = 0 and split edge.

 but the live_in of this BB is copied from live_out of BB 2 which
is too big, and
actually prevent the later sink of 16: r12:DI=dx:DI.

 Should it be better to copy live_in from next_block-next_bb
instead of
live_out from bb, as it will model what's needed more accurately?

According to the algorithm, next_block-next_bb (which is
live_edge-dest) should have two predecessors. Live_in of
next_block-next_bb would include live_out of the other edge,  which
is not necessary accurately. To be more accurate, you may need an
intersection of df_get_live_out (bb) and df_get_live_in
(next_block-next_bb).


Presumably we can't recompute the liveness info here as it's too
expensive to do that each time we split an edge to sink a statement?
ie, we need the more accurate liveness within this pass so that we can
sink further instructions?

for a single case, x86 bootstrap, looks like these code are not compile
time critical
as for all 4762 live edges which could sink instructions

4139: EDGE_COUNT (next_block-preds) == 1
270: EDGE_COUNT (next_block-preds) == 2
353: EDGE_COUNT (next_block-preds)  2

and if we make the live info more accurate here, just very few
additional functions ( 10)
shrink-wrapped from glibc build  gcc bootstrap test.

Perhaps I wasn't clear.

The whole point behind the proposed changes is to enable additional
sinking that is currently blocked because of the overly large set of
live objects.

So my question was an attempt to understand why we didn't just a normal
liveness update -- I speculated that we aren't doing a normal liveness
update because of the compile-time cost.




So, is it OK we simply change bitmap_copy  (df_get_live_in
(next_block),  df_get_live_out  (bb));
into bitmap_and  (df_get_live_in  (next_block),  df_get_live_out
(bb),  df_get_live_in  (next_block-next_bb)); ?

Probably.  Though I'd be a bit concerned with next_block-next_bb.
Wouldn't it be safer to stash away the relevant basic block prior to the
call to split_edge, then use that saved copy.  Something like this
(untested):

basic_block old_dest = live_edge-dest;
next_block = split_edge (live_edge);

/* We create a new basic block.  Call df_grow_bb_info to make sure
 all data structures are allocated.  */
df_grow_bb_info (df_live);
bitmap_and (df_get_live_in (next_block),
  df_get_live_out (bb),
  df_get_live_in (old_dest));


The idea being we don't want to depend on the precise ordering blocks in
the block chain.

Could you try that and see if it does what you need?

Jeff,

   Thanks, verified, it works.

Great.  Can you send an updated patchkit

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-18 Thread Jeff Law

On 09/16/14 00:55, Zhenqiang Chen wrote:




-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Jiong Wang
Sent: Monday, September 15, 2014 11:28 PM
To: Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org; Jeff Law
Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop: split
live_edge

On 08/05/14 09:07, Zhenqiang Chen wrote:

   static bool
   move_insn_for_shrink_wrap (basic_block bb, rtx insn,
 const HARD_REG_SET uses,
 const HARD_REG_SET defs,
-  HARD_REG_SET *last_uses)
+  HARD_REG_SET *last_uses,
+  bool *split_p)
{
 rtx set, src, dest;
 bitmap live_out, live_in, bb_uses, bb_defs;
 unsigned int i, dregno, end_dregno, sregno, end_sregno;
 basic_block next_block;
+  edge live_edge;

 /* Look for a simple register copy.  */
 set = single_set (insn);
@@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb,

rtx insn,

 || overlaps_hard_reg_set_p (defs, GET_MODE (dest), dregno))
   return false;

-  /* See whether there is a successor block to which we could move
INSN.  */
-  next_block = next_block_for_reg (bb, dregno, end_dregno);
-  if (!next_block)
+  live_edge = next_block_for_reg (bb, dregno, end_dregno);  if
+ (!live_edge)
return false;

+  next_block = live_edge-dest;
+
 /* If the destination register is referred in later insn,
try to forward it.  */
 if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest), dregno)
  !try_copy_prop (bb, insn, src, dest, last_uses))
   return false;

+  /* Create a new basic block on the edge.  */  if (EDGE_COUNT
+ (next_block-preds) == 2)
+{
+  next_block = split_edge (live_edge);
+
+  bitmap_copy (df_get_live_in (next_block), df_get_live_out
+ (bb));


(re-sent, looks like the first send not received by the server...)

   for the function _IO_wdefault_xsputn in glibc wgenops.c (.dot file
attached)

   174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block because
of the sink of move instruction ax:DI = 0 and split edge.

   but the live_in of this BB is copied from live_out of BB 2 which is too big, 
and
actually prevent the later sink of 16: r12:DI=dx:DI.

   Should it be better to copy live_in from next_block-next_bb instead of
live_out from bb, as it will model what's needed more accurately?


According to the algorithm, next_block-next_bb (which is live_edge-dest) should 
have two predecessors. Live_in of next_block-next_bb would include live_out of the 
other edge,  which is not necessary accurately. To be more accurate, you may need an 
intersection of df_get_live_out (bb) and df_get_live_in (next_block-next_bb).

Presumably we can't recompute the liveness info here as it's too 
expensive to do that each time we split an edge to sink a statement? 
ie, we need the more accurate liveness within this pass so that we can 
sink further instructions?


jeff



RE: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-16 Thread Zhenqiang Chen


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jiong Wang
 Sent: Monday, September 15, 2014 11:28 PM
 To: Zhenqiang Chen
 Cc: gcc-patches@gcc.gnu.org; Jeff Law
 Subject: Re: [PATCH, 2/2] shrink wrap a function with a single loop: split
 live_edge
 
 On 08/05/14 09:07, Zhenqiang Chen wrote:
static bool
move_insn_for_shrink_wrap (basic_block bb, rtx insn,
  const HARD_REG_SET uses,
  const HARD_REG_SET defs,
  -  HARD_REG_SET *last_uses)
  +  HARD_REG_SET *last_uses,
  +  bool *split_p)
  {
  rtx set, src, dest;
  bitmap live_out, live_in, bb_uses, bb_defs;
  unsigned int i, dregno, end_dregno, sregno, end_sregno;
  basic_block next_block;
  +  edge live_edge;
 
  /* Look for a simple register copy.  */
  set = single_set (insn);
  @@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb,
 rtx insn,
  || overlaps_hard_reg_set_p (defs, GET_MODE (dest), dregno))
return false;
 
  -  /* See whether there is a successor block to which we could move
  INSN.  */
  -  next_block = next_block_for_reg (bb, dregno, end_dregno);
  -  if (!next_block)
  +  live_edge = next_block_for_reg (bb, dregno, end_dregno);  if
  + (!live_edge)
 return false;
 
  +  next_block = live_edge-dest;
  +
  /* If the destination register is referred in later insn,
 try to forward it.  */
  if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest), dregno)
   !try_copy_prop (bb, insn, src, dest, last_uses))
return false;
 
  +  /* Create a new basic block on the edge.  */  if (EDGE_COUNT
  + (next_block-preds) == 2)
  +{
  +  next_block = split_edge (live_edge);
  +
  +  bitmap_copy (df_get_live_in (next_block), df_get_live_out
  + (bb));
 
 (re-sent, looks like the first send not received by the server...)
 
   for the function _IO_wdefault_xsputn in glibc wgenops.c (.dot file
 attached)
 
   174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block because
 of the sink of move instruction ax:DI = 0 and split edge.
 
   but the live_in of this BB is copied from live_out of BB 2 which is too 
 big, and
 actually prevent the later sink of 16: r12:DI=dx:DI.
 
   Should it be better to copy live_in from next_block-next_bb instead of
 live_out from bb, as it will model what's needed more accurately?

According to the algorithm, next_block-next_bb (which is live_edge-dest) 
should have two predecessors. Live_in of next_block-next_bb would include 
live_out of the other edge,  which is not necessary accurately. To be more 
accurate, you may need an intersection of df_get_live_out (bb) and 
df_get_live_in (next_block-next_bb).

Thanks!
-Zhenqiang
 
 +  bitmap_copy (df_get_live_in (next_block), df_get_live_in
 + (next_block-next_bb));
 
   After this modification, pass x86-64 bootstrap, and this function could be
 shrink-wrapped.
 
 -- Jiong
 
  +  df_set_bb_dirty (next_block);
  +
  +  /* We should not split more than once for a function.  */
  +  gcc_assert (!(*split_p));
  +  *split_p = true;
  +}
  +
  /* At this point we are committed to moving INSN, but let's try to
 move it as far as we can.  */
  do
  @@ -5610,7 +5631,10 @@ move_insn_for_shrink_wrap (basic_block bb,
 rtx insn,
   {
 for (i = dregno; i  end_dregno; i++)
   {
  - if (REGNO_REG_SET_P (bb_uses, i) || REGNO_REG_SET_P
 (bb_defs, i)
  +
  + if (*split_p
  + || REGNO_REG_SET_P (bb_uses, i)
  + || REGNO_REG_SET_P (bb_defs, i)
 || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)-gen, i))
   next_block = NULL;
 CLEAR_REGNO_REG_SET (live_out, i); @@ -5621,7 +5645,8
  @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
Either way, SRC is now live on entry.  */
 for (i = sregno; i  end_sregno; i++)
   {
  - if (REGNO_REG_SET_P (bb_defs, i)
  + if (*split_p
  + || REGNO_REG_SET_P (bb_defs, i)
 || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)-gen, i))
   next_block = NULL;
 SET_REGNO_REG_SET (live_out, i); @@ -5650,21 +5675,31
  @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
  /* If we don't need to add the move to BB, look for a single
successor block.  */
  if (next_block)
  -   next_block = next_block_for_reg (next_block, dregno, end_dregno);
  +   {
  + live_edge = next_block_for_reg (next_block, dregno, end_dregno);
  + if (!live_edge || EDGE_COUNT (live_edge-dest-preds)  1)
  +   break;
  + next_block = live_edge-dest;
  +   }
}
  while (next_block);
 
  -  /* BB now defines DEST.  It only

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-09-15 Thread Jiong Wang

On 08/05/14 09:07, Zhenqiang Chen wrote:

  static bool
  move_insn_for_shrink_wrap (basic_block bb, rtx insn,
const HARD_REG_SET uses,
const HARD_REG_SET defs,
-  HARD_REG_SET *last_uses)
+  HARD_REG_SET *last_uses,
+  bool *split_p)
{
rtx set, src, dest;
bitmap live_out, live_in, bb_uses, bb_defs;
unsigned int i, dregno, end_dregno, sregno, end_sregno;
basic_block next_block;
+  edge live_edge;

/* Look for a simple register copy.  */
set = single_set (insn);
@@ -5582,17 +5589,31 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
|| overlaps_hard_reg_set_p (defs, GET_MODE (dest), dregno))
  return false;

-  /* See whether there is a successor block to which we could move INSN.  */
-  next_block = next_block_for_reg (bb, dregno, end_dregno);
-  if (!next_block)
+  live_edge = next_block_for_reg (bb, dregno, end_dregno);
+  if (!live_edge)
   return false;

+  next_block = live_edge-dest;
+
/* If the destination register is referred in later insn,
   try to forward it.  */
if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest), dregno)
 !try_copy_prop (bb, insn, src, dest, last_uses))
  return false;

+  /* Create a new basic block on the edge.  */
+  if (EDGE_COUNT (next_block-preds) == 2)
+{
+  next_block = split_edge (live_edge);
+
+  bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb));


(re-sent, looks like the first send not received by the server...)

 for the function _IO_wdefault_xsputn in glibc wgenops.c (.dot file attached)

 174: NOTE_INSN_BASIC_BLOCK 21 are the new created basic block because of the 
sink of move
instruction ax:DI = 0 and split edge.

 but the live_in of this BB is copied from live_out of BB 2 which is too big, 
and
actually prevent the later sink of 16: r12:DI=dx:DI.

 Should it be better to copy live_in from next_block-next_bb instead of live_out from 
bb,
as it will model what's needed more accurately?

+  bitmap_copy (df_get_live_in (next_block), df_get_live_in 
(next_block-next_bb));

 After this modification, pass x86-64 bootstrap, and this function could be 
shrink-wrapped.

-- Jiong


+  df_set_bb_dirty (next_block);
+
+  /* We should not split more than once for a function.  */
+  gcc_assert (!(*split_p));
+  *split_p = true;
+}
+
/* At this point we are committed to moving INSN, but let's try to
   move it as far as we can.  */
do
@@ -5610,7 +5631,10 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
 {
   for (i = dregno; i  end_dregno; i++)
 {
- if (REGNO_REG_SET_P (bb_uses, i) || REGNO_REG_SET_P (bb_defs, i)
+
+ if (*split_p
+ || REGNO_REG_SET_P (bb_uses, i)
+ || REGNO_REG_SET_P (bb_defs, i)
   || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)-gen, i))
 next_block = NULL;
   CLEAR_REGNO_REG_SET (live_out, i);
@@ -5621,7 +5645,8 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
  Either way, SRC is now live on entry.  */
   for (i = sregno; i  end_sregno; i++)
 {
- if (REGNO_REG_SET_P (bb_defs, i)
+ if (*split_p
+ || REGNO_REG_SET_P (bb_defs, i)
   || REGNO_REG_SET_P (DF_LIVE_BB_INFO (bb)-gen, i))
 next_block = NULL;
   SET_REGNO_REG_SET (live_out, i);
@@ -5650,21 +5675,31 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
/* If we don't need to add the move to BB, look for a single
  successor block.  */
if (next_block)
-   next_block = next_block_for_reg (next_block, dregno, end_dregno);
+   {
+ live_edge = next_block_for_reg (next_block, dregno, end_dregno);
+ if (!live_edge || EDGE_COUNT (live_edge-dest-preds)  1)
+   break;
+ next_block = live_edge-dest;
+   }
  }
while (next_block);

-  /* BB now defines DEST.  It only uses the parts of DEST that overlap SRC
- (next loop).  */
-  for (i = dregno; i  end_dregno; i++)
+  /* For the new created basic block, there is no dataflow info at all.
+ So skip the following dataflow update and check.  */
+  if (!(*split_p))
  {
-  CLEAR_REGNO_REG_SET (bb_uses, i);
-  SET_REGNO_REG_SET (bb_defs, i);
-}
+  /* BB now defines DEST.  It only uses the parts of DEST that overlap SRC
+(next loop).  */
+  for (i = dregno; i  end_dregno; i++)
+   {
+ CLEAR_REGNO_REG_SET (bb_uses, i);
+ SET_REGNO_REG_SET (bb_defs, i);
+   }

-  /* BB now uses SRC.  */
-  for (i = sregno; i  end_sregno; i++)
-SET_REGNO_REG_SET (bb_uses, i);
+  /* BB now uses SRC.  */
+  for (i = sregno; i  end_sregno; i++)
+   SET_REGNO_REG_SET (bb_uses, i);
+}

emit_insn_after (PATTERN (insn), bb_note 

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-05-14 Thread Jeff Law

On 05/13/14 03:49, Zhenqiang Chen wrote:

On 9 May 2014 14:08, Jeff Law l...@redhat.com wrote:

On 05/08/14 02:07, Zhenqiang Chen wrote:


Hi,

The patch splits the live_edge for move_insn_for_shrink_wrap to sink
the copy out of the entry block.

Bootstrap and no make check regression on X86-64 and ARM.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-05-08  Zhenqiang Chen  zhenqiang.c...@linaro.org

  * function.c (next_block_for_reg): Allow live_edge-dest has two
  predecessors.
  (move_insn_for_shrink_wrap): Split live_edge.
  (prepre_shrink_wrap): One more parameter for
move_insn_for_shrink_wrap.

OK.
jeff



Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-05-13 Thread Zhenqiang Chen
On 9 May 2014 14:08, Jeff Law l...@redhat.com wrote:
 On 05/08/14 02:07, Zhenqiang Chen wrote:

 Hi,

 The patch splits the live_edge for move_insn_for_shrink_wrap to sink
 the copy out of the entry block.

 Bootstrap and no make check regression on X86-64 and ARM.

 OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2014-05-08  Zhenqiang Chen  zhenqiang.c...@linaro.org

  * function.c (next_block_for_reg): Allow live_edge-dest has two
  predecessors.
  (move_insn_for_shrink_wrap): Split live_edge.
  (prepre_shrink_wrap): One more parameter for
 move_insn_for_shrink_wrap.


 diff --git a/gcc/function.c b/gcc/function.c
 index 764ac82..0be58e2 100644
 --- a/gcc/function.c
 +++ b/gcc/function.c
 @@ -5381,7 +5381,7 @@ requires_stack_frame_p (rtx insn, HARD_REG_SET
 prologue_used,
  and if BB is its only predecessor.  Return that block if so,
  otherwise return null.  */

 -static basic_block
 +static edge
   next_block_for_reg (basic_block bb, int regno, int end_regno)

 Comment for this function needs to be changed.  You're no longer returning a
 block, but the edge leading to the block.  It also seems the name of the
 function ought to change.

The comment and function name are updated.

 This looks basically OK.  I'd like to see the requested cleanups made, then
 the resulting new patch reposted for a final review.

Here is the updated patch based on the cleaned shrink-wrapping code:

ChangeLog:
2014-05-13  Zhenqiang Chen  zhenqiang.c...@linaro.org

* shrink-wrap.c: Update comment.
(next_block_for_reg): Rename to live_edge_for_reg.
(live_edge_for_reg): Allow live_edge-dest has two predecessors.
(move_insn_for_shrink_wrap): Split live_edge.
(prepre_shrink_wrap): One more parameter for move_insn_for_shrink_wrap.

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b302777..f11e920 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -1,4 +1,4 @@
-/* Expands front end tree to back end RTL for GCC.
+/* Shrink-wrapping related optimizations.
Copyright (C) 1987-2014 Free Software Foundation, Inc.

 This file is part of GCC.
@@ -110,12 +110,12 @@ requires_stack_frame_p (rtx insn, HARD_REG_SET
prologue_used,
   return false;
 }

-/* See whether BB has a single successor that uses [REGNO, END_REGNO),
-   and if BB is its only predecessor.  Return that block if so,
-   otherwise return null.  */
+/* See whether there has a single live edge from BB, which dest uses
+   [REGNO, END_REGNO).  Return the live edge if its dest bb has
+   one or two predecessors.  Otherwise return NULL.  */

-static basic_block
-next_block_for_reg (basic_block bb, int regno, int end_regno)
+static edge
+live_edge_for_reg (basic_block bb, int regno, int end_regno)
 {
   edge e, live_edge;
   edge_iterator ei;
@@ -148,25 +148,30 @@ next_block_for_reg (basic_block bb, int regno,
int end_regno)
   if (live_edge-flags  EDGE_ABNORMAL)
 return NULL;

-  if (EDGE_COUNT (live_edge-dest-preds)  1)
+  /* When live_edge-dest-preds == 2, we can create a new block on
+ the edge to make it meet the requirement.  */
+  if (EDGE_COUNT (live_edge-dest-preds)  2)
 return NULL;

-  return live_edge-dest;
+  return live_edge;
 }

 /* Try to move INSN from BB to a successor.  Return true on success.
USES and DEFS are the set of registers that are used and defined
-   after INSN in BB.  */
+   after INSN in BB.  SPLIT_P indicates whether a live edge from BB
+   is splitted or not.  */

static bool
 move_insn_for_shrink_wrap (basic_block bb, rtx insn,
   const HARD_REG_SET uses,
-  const HARD_REG_SET defs)
+  const HARD_REG_SET defs,
+  bool *split_p)
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
   unsigned int i, dregno, end_dregno, sregno, end_sregno;
   basic_block next_block;
+  edge live_edge;

   /* Look for a simple register copy.  */
   set = single_set (insn);
@@ -191,10 +196,24 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
 return false;

   /* See whether there is a successor block to which we could move INSN.  */
-  next_block = next_block_for_reg (bb, dregno, end_dregno);
-  if (!next_block)
+  live_edge = live_edge_for_reg (bb, dregno, end_dregno);
+  if (!live_edge)
 return false;

+  next_block = live_edge-dest;
+  /* Create a new basic block on the edge.  */
+  if (EDGE_COUNT (next_block-preds) == 2)
+{
+  next_block = split_edge (live_edge);
+
+  bitmap_copy (df_get_live_in (next_block), df_get_live_out (bb));
+  df_set_bb_dirty (next_block);
+
+  /* We should not split more than once for a function.  */
+  gcc_assert (!(*split_p));
+  *split_p = true;
+}
+
   /* At this point we are committed to moving INSN, but let's try to
  move it as far as we can.  */
   do
@@ -212,7 +231,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn,
{
  for (i = 

Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-05-09 Thread Jeff Law

On 05/08/14 02:07, Zhenqiang Chen wrote:

Hi,

The patch splits the live_edge for move_insn_for_shrink_wrap to sink
the copy out of the entry block.

Bootstrap and no make check regression on X86-64 and ARM.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-05-08  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * function.c (next_block_for_reg): Allow live_edge-dest has two
 predecessors.
 (move_insn_for_shrink_wrap): Split live_edge.
 (prepre_shrink_wrap): One more parameter for move_insn_for_shrink_wrap.


diff --git a/gcc/function.c b/gcc/function.c
index 764ac82..0be58e2 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5381,7 +5381,7 @@ requires_stack_frame_p (rtx insn, HARD_REG_SET
prologue_used,
 and if BB is its only predecessor.  Return that block if so,
 otherwise return null.  */

-static basic_block
+static edge
  next_block_for_reg (basic_block bb, int regno, int end_regno)
Comment for this function needs to be changed.  You're no longer 
returning a block, but the edge leading to the block.  It also seems the 
name of the function ought to change.


This looks basically OK.  I'd like to see the requested cleanups made, 
then the resulting new patch reposted for a final review.


Jeff


Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-05-09 Thread Zhenqiang Chen
On 9 May 2014 14:08, Jeff Law l...@redhat.com wrote:
 On 05/08/14 02:07, Zhenqiang Chen wrote:

 Hi,

 The patch splits the live_edge for move_insn_for_shrink_wrap to sink
 the copy out of the entry block.

 Bootstrap and no make check regression on X86-64 and ARM.

 OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog:
 2014-05-08  Zhenqiang Chen  zhenqiang.c...@linaro.org

  * function.c (next_block_for_reg): Allow live_edge-dest has two
  predecessors.
  (move_insn_for_shrink_wrap): Split live_edge.
  (prepre_shrink_wrap): One more parameter for
 move_insn_for_shrink_wrap.


 diff --git a/gcc/function.c b/gcc/function.c
 index 764ac82..0be58e2 100644
 --- a/gcc/function.c
 +++ b/gcc/function.c
 @@ -5381,7 +5381,7 @@ requires_stack_frame_p (rtx insn, HARD_REG_SET
 prologue_used,
  and if BB is its only predecessor.  Return that block if so,
  otherwise return null.  */

 -static basic_block
 +static edge
   next_block_for_reg (basic_block bb, int regno, int end_regno)

 Comment for this function needs to be changed.  You're no longer returning a
 block, but the edge leading to the block.  It also seems the name of the
 function ought to change.

 This looks basically OK.  I'd like to see the requested cleanups made, then
 the resulting new patch reposted for a final review.

Thank you for the comments. I will follow Steven's comments to
separate shrink-wrapping code from function.c to shrink-wrap.c.

-Zhenqiang


Re: [PATCH, 2/2] shrink wrap a function with a single loop: split live_edge

2014-05-08 Thread Steven Bosscher
On Thu, May 8, 2014 at 10:07 AM, Zhenqiang Chen wrote:
 The patch splits the live_edge for move_insn_for_shrink_wrap to sink
 the copy out of the entry block.

Maybe also time to take the shrink-wrapping code out of function.c and
put it in its own file?

Ciao!
Steven