Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-10-02 Thread Andrew Pinski
On Fri, Sep 19, 2014 at 1:43 PM, Jeff Law l...@redhat.com wrote:
 On 09/15/14 08:33, Jiong Wang wrote:


 Jeff,

   thanks, I partially understand your meaning here.

take the function ira_implicitly_set_insn_hard_regs in ira-lives.c
 for example,

when generating address rtl, gcc will automatically generate const
 operator to prefix
the address expression, like the following. so a simple CONSTANT_P
 check is enough in
case there is no embedded register.

(insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
  (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]
 var_decl 0x2b2c2ff91510 recog_data)
  (const_int 480 [0x1e0] -1


but for architecture like aarch64, the following instruction
 sequences to forming address
may be generated

 (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
  (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
 0x7ff755a60900 stats))) 35 {*movdi_aarch64}
   (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags
 0xc0]  var_decl 0x7ff755a60900 stats))
  (nil)))

 (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
  (lo_sum:DI (reg/f:DI 20 x20 [99])
  (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
 0x7ff755a60900 stats))) {add_losym_di}
   (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]
 var_decl 0x7ff755a60900 stats)
  (nil)))

   while CONSTANT_P could not catch the latter lo_sum case, as the
 RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,

 Hmm, it's been ~15 years since I regularly worked on a target that uses
 HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as
 well, but reading the docs for CONST, that clearly isn't the case.

Could we add a check for lo_sum since it is an RTX_OBJ rather than
RTX_COMM_ARITH or RTX_BIN_ARITH?
I am testing the patch for that to fix the above issue.  It shows up
with the testcase Jiong added but only with -mabi=ilp32 enabled.

Thanks,
Andrew Pinski


 Sorry for that.  Can you (re) send your current patch for this for review?

 Jeff


Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-10-02 Thread Jiong Wang


On 02/10/14 09:49, Andrew Pinski wrote:

On Fri, Sep 19, 2014 at 1:43 PM, Jeff Law l...@redhat.com wrote:

On 09/15/14 08:33, Jiong Wang wrote:


Jeff,

   thanks, I partially understand your meaning here.

take the function ira_implicitly_set_insn_hard_regs in ira-lives.c
for example,

when generating address rtl, gcc will automatically generate const
operator to prefix
the address expression, like the following. so a simple CONSTANT_P
check is enough in
case there is no embedded register.

(insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
  (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]
var_decl 0x2b2c2ff91510 recog_data)
  (const_int 480 [0x1e0] -1


but for architecture like aarch64, the following instruction
sequences to forming address
may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
  (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) 35 {*movdi_aarch64}
   (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags
0xc0]  var_decl 0x7ff755a60900 stats))
  (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
  (lo_sum:DI (reg/f:DI 20 x20 [99])
  (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) {add_losym_di}
   (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]
var_decl 0x7ff755a60900 stats)
  (nil)))

   while CONSTANT_P could not catch the latter lo_sum case, as the
RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,

Hmm, it's been ~15 years since I regularly worked on a target that uses
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as
well, but reading the docs for CONST, that clearly isn't the case.

Could we add a check for lo_sum since it is an RTX_OBJ rather than
RTX_COMM_ARITH or RTX_BIN_ARITH?


thanks, agree, that's exactly what I want to catch, while missed it during the 
patch re-write. I was been stupid!
anyway, I think we need to solve pr63404, 
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02568.html firstly, as it's a 
correctness regression.

--
Jiong


I am testing the patch for that to fix the above issue.  It shows up
with the testcase Jiong added but only with -mabi=ilp32 enabled.

Thanks,
Andrew Pinski


Sorry for that.  Can you (re) send your current patch for this for review?

Jeff





Re: [COMMITTED][PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-25 Thread Christophe Lyon
On 24 September 2014 20:32, Jiong Wang jiong.w...@arm.com wrote:

 On 22/09/14 19:01, Jeff Law wrote:

 On 09/22/14 04:29, Jiong Wang wrote:

 On 19/09/14 21:43, Jeff Law wrote:

 patch attached.

 please review, thanks.

 gcc/
 * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when
 !REG_P (src) to
 release more instruction sink opportunities.

 gcc/testsuite/
 * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.

 Thanks.  Please verify this version passes a bootstrap  regression
 test.  Assuming it does it is OK for the trunk.


 pass bootstrap and on regression on x86 based on revision 215515.

 committed as revision 215563.

 -- Jiong


I have observed regressions in the g++ testsuite: pr49847 now FAILs
after this patch.

Here is what I have in my logs:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../xg++
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C
-fno-diagnostics-show-caret -fdiagnostics-color=never  -nostdinc++
-I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf
-I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include
-I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++
-I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/backward
-I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/testsuite/util
-fmessage-length=0  -std=gnu++98 -O -fnon-call-exceptions  -S -o
pr49847.s(timeout = 800)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C: In
function 'int f(float)':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1:
error: missing REG_EH_REGION note at the end of bb 2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1:
internal compiler error: verify_flow_info failed
0x82f8ba verify_flow_info()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:260

0x840cd3 commit_edge_insertions()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgrtl.c:2068
0x9bf243 thread_prologue_and_epilogue_insns
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:5852
0x9bfa52 rest_of_handle_thread_prologue_and_epilogue
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6245
0x9bfa52 execute
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6283

As per
http://cbuild.validation.linaro.org/build/cross-validation/gcc/trunk/215563/report-build-info.html
I've noticed this on targets:
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
aarch64-none-elf
aarch64_be-none-elf
aarch64-none-linux-gnu
but NOT on
arm-none-eabi
arm-none-linux-gnueabi

Christophe.


Re: [COMMITTED][PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-25 Thread Jiong Wang


On 25/09/14 12:25, Christophe Lyon wrote:

On 24 September 2014 20:32, Jiong Wang jiong.w...@arm.com wrote:

On 22/09/14 19:01, Jeff Law wrote:

On 09/22/14 04:29, Jiong Wang wrote:

On 19/09/14 21:43, Jeff Law wrote:

patch attached.

please review, thanks.

gcc/
 * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when
!REG_P (src) to
 release more instruction sink opportunities.

gcc/testsuite/
 * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.

Thanks.  Please verify this version passes a bootstrap  regression
test.  Assuming it does it is OK for the trunk.


pass bootstrap and on regression on x86 based on revision 215515.

committed as revision 215563.

-- Jiong


I have observed regressions in the g++ testsuite: pr49847 now FAILs
after this patch.

no.

even without my patch, the regression still happen.

or you could specify -fno-shrink-wrap, gcc still crash.

so, this regression should caused by other commits which haven't exposed 
on x86 regression test.


-- Jiong



Here is what I have in my logs:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../xg++
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C
-fno-diagnostics-show-caret -fdiagnostics-color=never  -nostdinc++
-I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf
-I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include
-I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++
-I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/backward
-I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/testsuite/util
-fmessage-length=0  -std=gnu++98 -O -fnon-call-exceptions  -S -o
pr49847.s(timeout = 800)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C: In
function 'int f(float)':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1:
error: missing REG_EH_REGION note at the end of bb 2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1:
internal compiler error: verify_flow_info failed
0x82f8ba verify_flow_info()
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:260

0x840cd3 commit_edge_insertions()
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgrtl.c:2068
0x9bf243 thread_prologue_and_epilogue_insns
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:5852
0x9bfa52 rest_of_handle_thread_prologue_and_epilogue
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6245
0x9bfa52 execute
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6283

As per
http://cbuild.validation.linaro.org/build/cross-validation/gcc/trunk/215563/report-build-info.html
I've noticed this on targets:
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
aarch64-none-elf
aarch64_be-none-elf
aarch64-none-linux-gnu
but NOT on
arm-none-eabi
arm-none-linux-gnueabi

Christophe.






[COMMITTED][PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-24 Thread Jiong Wang


On 22/09/14 19:01, Jeff Law wrote:

On 09/22/14 04:29, Jiong Wang wrote:

On 19/09/14 21:43, Jeff Law wrote:

patch attached.

please review, thanks.

gcc/
* shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when
!REG_P (src) to
release more instruction sink opportunities.

gcc/testsuite/
* gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.

Thanks.  Please verify this version passes a bootstrap  regression
test.  Assuming it does it is OK for the trunk.


pass bootstrap and on regression on x86 based on revision 215515.

committed as revision 215563.

-- Jiong



jeff









Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-22 Thread Jiong Wang

On 19/09/14 21:43, Jeff Law wrote:


On 09/15/14 08:33, Jiong Wang wrote:

Jeff,

   thanks, I partially understand your meaning here.

take the function ira_implicitly_set_insn_hard_regs in ira-lives.c
for example,

when generating address rtl, gcc will automatically generate const
operator to prefix
the address expression, like the following. so a simple CONSTANT_P
check is enough in
case there is no embedded register.

(insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
  (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]
var_decl 0x2b2c2ff91510 recog_data)
  (const_int 480 [0x1e0] -1


but for architecture like aarch64, the following instruction
sequences to forming address
may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
  (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) 35 {*movdi_aarch64}
   (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags
0xc0]  var_decl 0x7ff755a60900 stats))
  (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
  (lo_sum:DI (reg/f:DI 20 x20 [99])
  (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) {add_losym_di}
   (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]
var_decl 0x7ff755a60900 stats)
  (nil)))

   while CONSTANT_P could not catch the latter lo_sum case, as the
RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,

Hmm, it's been ~15 years since I regularly worked on a target that uses
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST
as well, but reading the docs for CONST, that clearly isn't the case.

Sorry for that.  Can you (re) send your current patch for this for review?


patch attached.

please review, thanks.

gcc/
  * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P 
(src) to
  release more instruction sink opportunities.

gcc/testsuite/
  * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..739e957 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include bb-reorder.h
 #include shrink-wrap.h
 #include regcprop.h
+#include rtl-iter.h

 #ifdef HAVE_simple_return

@@ -169,7 +170,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
-  unsigned int i, dregno, end_dregno, sregno, end_sregno;
+  unsigned int i, dregno, end_dregno;
+  unsigned int sregno = FIRST_PSEUDO_REGISTER;
+  unsigned int end_sregno = FIRST_PSEUDO_REGISTER;
   basic_block next_block;
   edge live_edge;

@@ -179,7 +182,34 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
-  if (!REG_P (dest) || !REG_P (src)
+
+  if (!REG_P (src))
+{
+  unsigned int reg_num = 0;
+  unsigned int nonconstobj_num = 0;
+  rtx src_inner = NULL_RTX;
+
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
+	{
+	  rtx x = *iter;
+	  if (REG_P (x))
+	{
+	  reg_num++;
+	  src_inner = x;
+	}
+	  else if (!CONSTANT_P (x)  OBJECT_P (x))
+	nonconstobj_num++;
+	}
+
+  if (nonconstobj_num  0
+	  || reg_num  1)
+	src = NULL_RTX;
+  else if (reg_num == 1)
+	src = src_inner;
+}
+
+  if (!REG_P (dest) || src == NULL_RTX
   /* STACK or FRAME related adjustment might be part of prologue.
 	 So keep them in the entry block.  */
   || dest == stack_pointer_rtx
@@ -188,10 +218,13 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;

   /* Make sure that the source register isn't defined later in BB.  */
-  sregno = REGNO (src);
-  end_sregno = END_REGNO (src);
-  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
-return false;
+  if (REG_P (src))
+{
+  sregno = REGNO (src);
+  end_sregno = END_REGNO (src);
+  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
+	return false;
+}

   /* Make sure that the destination register isn't referenced later in BB.  */
   dregno = REGNO (dest);
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
new file mode 100644
index 000..ad2e588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-rtl-pro_and_epilogue } */
+
+extern char *asm_out_file;
+extern void default_elf_asm_output_ascii (char *, const char *, int);
+
+void
+assemble_string (const char *p, int size)
+{
+  int pos = 0;
+  int maximum = 2000;
+
+  while (pos  size)
+{
+  int thissize = size - pos;
+
+  if (thissize  maximum)
+	thissize = maximum;
+
+  default_elf_asm_output_ascii (asm_out_file, p, 

Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-22 Thread Jeff Law

On 09/22/14 04:29, Jiong Wang wrote:

On 19/09/14 21:43, Jeff Law wrote:


On 09/15/14 08:33, Jiong Wang wrote:

Jeff,

   thanks, I partially understand your meaning here.

take the function ira_implicitly_set_insn_hard_regs in ira-lives.c
for example,

when generating address rtl, gcc will automatically generate const
operator to prefix
the address expression, like the following. so a simple CONSTANT_P
check is enough in
case there is no embedded register.

(insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
  (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]
var_decl 0x2b2c2ff91510 recog_data)
  (const_int 480 [0x1e0] -1


but for architecture like aarch64, the following instruction
sequences to forming address
may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
  (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) 35 {*movdi_aarch64}
   (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags
0xc0]  var_decl 0x7ff755a60900 stats))
  (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
  (lo_sum:DI (reg/f:DI 20 x20 [99])
  (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) {add_losym_di}
   (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]
var_decl 0x7ff755a60900 stats)
  (nil)))

   while CONSTANT_P could not catch the latter lo_sum case, as the
RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,

Hmm, it's been ~15 years since I regularly worked on a target that uses
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST
as well, but reading the docs for CONST, that clearly isn't the case.

Sorry for that.  Can you (re) send your current patch for this for
review?


patch attached.

please review, thanks.

gcc/
   * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when
!REG_P (src) to
   release more instruction sink opportunities.

gcc/testsuite/
   * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase.
Thanks.  Please verify this version passes a bootstrap  regression 
test.  Assuming it does it is OK for the trunk.


jeff



Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-19 Thread Jeff Law

On 09/15/14 08:33, Jiong Wang wrote:


Jeff,

  thanks, I partially understand your meaning here.

   take the function ira_implicitly_set_insn_hard_regs in ira-lives.c
for example,

   when generating address rtl, gcc will automatically generate const
operator to prefix
   the address expression, like the following. so a simple CONSTANT_P
check is enough in
   case there is no embedded register.

   (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
 (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]
var_decl 0x2b2c2ff91510 recog_data)
 (const_int 480 [0x1e0] -1


   but for architecture like aarch64, the following instruction
sequences to forming address
   may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
 (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) 35 {*movdi_aarch64}
  (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags
0xc0]  var_decl 0x7ff755a60900 stats))
 (nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
 (lo_sum:DI (reg/f:DI 20 x20 [99])
 (symbol_ref:DI (global_a) [flags 0xc0]  var_decl
0x7ff755a60900 stats))) {add_losym_di}
  (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]
var_decl 0x7ff755a60900 stats)
 (nil)))

  while CONSTANT_P could not catch the latter lo_sum case, as the
RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ,
Hmm, it's been ~15 years since I regularly worked on a target that uses 
HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST 
as well, but reading the docs for CONST, that clearly isn't the case.


Sorry for that.  Can you (re) send your current patch for this for review?

Jeff


Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-15 Thread Jiong Wang


On 11/09/14 21:39, Jeff Law wrote:

On 09/08/14 07:57, Jiong Wang wrote:

Conceptually OK.  Some questions/concerns about the implementation.

Hi Jeff,

thanks very much for your review.

It seems to me that what you're trying to describe on the RHS is
REG_P || CONSTANT_P

yes.

and actually I am trying to detect all rtx which contains any number
of RTX_CONST_OBJs and no more than one REG.

Note that CONSTANT_P will catch things like (high (symbol_ref)) and
(lo_sum (reg) (symbol_ref)) which are often used to build addresses on
some targets.

With that in mind, rather than using a for_each_rtx callback, test
if (REG_P (src) || CONSTANT_P (src))

Note that SRC, when it is a CONSTANT_P,  may have a variety of forms
with embedded registers.  You'll need to verify that all of those
registers are not assigned after the insn with the CONSTANT_P source
operand.  Right now you only perform that check when SRC is a REG_P.

I am using the for_each_rtx because I want to scan src so that
any sub-operands are checked,  the number of REG and non-constant
objects are record in reg_found and nonconst_found.  the embedded
register found also record in the reg field of the structure
rtx_search_arg,

Constants in this context are going to satisfy CONSTANT_P, you don't
need to manually verify them.  It will include simple constants and
constants which are built up out of multiple instructions (symbolic
constants in particular).


Jeff,

 thanks, I partially understand your meaning here.

  take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for 
example,

  when generating address rtl, gcc will automatically generate const operator 
to prefix
  the address expression, like the following. so a simple CONSTANT_P check is 
enough in
  case there is no embedded register.

  (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94])
(const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40]  var_decl 
0x2b2c2ff91510 recog_data)
(const_int 480 [0x1e0] -1


  but for architecture like aarch64, the following instruction sequences to 
forming address
  may be generated

(insn 73 14 74 4 (set (reg/f:DI 20 x20 [99])
(high:DI (symbol_ref:DI (global_a) [flags 0xc0]  var_decl 
0x7ff755a60900 stats))) 35 {*movdi_aarch64}
 (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0]  
var_decl 0x7ff755a60900 stats))
(nil)))

(insn 17 30 25 5 (set (reg/f:DI 4 x4 [83])
(lo_sum:DI (reg/f:DI 20 x20 [99])
(symbol_ref:DI (global_a) [flags 0xc0]  var_decl 0x7ff755a60900 
stats))) {add_losym_di}
 (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0]  var_decl 
0x7ff755a60900 stats)
(nil)))

 while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of 
lo_sum is RTX_OBJ not RTX_CONST_OBJ,
 so the

 the manually verification

  FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
{
  rtx x = *iter;
  if (REG_P (x))
{
  reg_num++;
  src_inner = x;
}
  else if (!CONSTANT_P (x)  OBJECT_P (x))
nonconstobj_num++;
}

 is to catch cases like the lo_sum insn above which contains no more than 1 
register and only contains
 RTX_CONST_OBJ.


I suspect you still need the callback to verify the # of registers is
just 1 so that the later tests work.  However, please don't use
for_each_rtx, we're moving away from that to a more efficient walker
FOR_EACH_SUBRTX.


 
  fixed, patch updated, please review.


  thanks.

  BTW, when I verifying this patch on gilbc x86-64 build, I found there may be 
another hiding
  issue when calculating live_in when splitting edge in 
move_insn_for_shrink_wrap, I will reply
  on that historical email thread directly.

Regards,
Jiong



Jeff



diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fd24135..739e957 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include bb-reorder.h
 #include shrink-wrap.h
 #include regcprop.h
+#include rtl-iter.h
 
 #ifdef HAVE_simple_return
 
@@ -169,7 +170,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
-  unsigned int i, dregno, end_dregno, sregno, end_sregno;
+  unsigned int i, dregno, end_dregno;
+  unsigned int sregno = FIRST_PSEUDO_REGISTER;
+  unsigned int end_sregno = FIRST_PSEUDO_REGISTER;
   basic_block next_block;
   edge live_edge;
 
@@ -179,7 +182,34 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
-  if (!REG_P (dest) || !REG_P (src)
+
+  if (!REG_P (src))
+{
+  unsigned int reg_num = 0;
+  unsigned int nonconstobj_num = 0;
+  rtx src_inner = NULL_RTX;
+
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
+	{
+	  rtx x = *iter;
+	  if 

Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-15 Thread Jeff Law

On 09/11/14 14:58, Wang Jiong wrote:


the reg_found will record the # of registers, and there will be a
check to make sure no register or just 1 register in the src.
so if reg_found is bigger than 1, then src will be assigned to
NULL_RTX, that we just stop the move.

   +  if (arg.nonconst_found
   +  || arg.reg_found  1)
   +src = NULL_RTX;

   am I missing something here?
No, I was just reiterating that you are going to need that code, so some 
kind of walker would still be necessary.  Sorry I wasn't clear about that.


jeff



Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-11 Thread Jeff Law

On 09/08/14 07:57, Jiong Wang wrote:

Conceptually OK.  Some questions/concerns about the implementation.

Hi Jeff,

   thanks very much for your review.

It seems to me that what you're trying to describe on the RHS is
REG_P || CONSTANT_P

   yes.

   and actually I am trying to detect all rtx which contains any number
   of RTX_CONST_OBJs and no more than one REG.


Note that CONSTANT_P will catch things like (high (symbol_ref)) and
(lo_sum (reg) (symbol_ref)) which are often used to build addresses on
some targets.

With that in mind, rather than using a for_each_rtx callback, test
if (REG_P (src) || CONSTANT_P (src))

Note that SRC, when it is a CONSTANT_P,  may have a variety of forms
with embedded registers.  You'll need to verify that all of those
registers are not assigned after the insn with the CONSTANT_P source
operand.  Right now you only perform that check when SRC is a REG_P.


   I am using the for_each_rtx because I want to scan src so that
any sub-operands are checked,  the number of REG and non-constant
objects are record in reg_found and nonconst_found.  the embedded
register found also record in the reg field of the structure
rtx_search_arg,
Constants in this context are going to satisfy CONSTANT_P, you don't 
need to manually verify them.  It will include simple constants and 
constants which are built up out of multiple instructions (symbolic 
constants in particular).


I suspect you still need the callback to verify the # of registers is 
just 1 so that the later tests work.  However, please don't use 
for_each_rtx, we're moving away from that to a more efficient walker 
FOR_EACH_SUBRTX.


Jeff


Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-11 Thread Wang Jiong
2014-09-11 21:39 GMT+01:00 Jeff Law l...@redhat.com:

 On 09/08/14 07:57, Jiong Wang wrote:

 Conceptually OK.  Some questions/concerns about the implementation.

 Hi Jeff,

thanks very much for your review.

 It seems to me that what you're trying to describe on the RHS is
 REG_P || CONSTANT_P

yes.

and actually I am trying to detect all rtx which contains any number
of RTX_CONST_OBJs and no more than one REG.


 Note that CONSTANT_P will catch things like (high (symbol_ref)) and
 (lo_sum (reg) (symbol_ref)) which are often used to build addresses on
 some targets.

 With that in mind, rather than using a for_each_rtx callback, test
 if (REG_P (src) || CONSTANT_P (src))

 Note that SRC, when it is a CONSTANT_P,  may have a variety of forms
 with embedded registers.  You'll need to verify that all of those
 registers are not assigned after the insn with the CONSTANT_P source
 operand.  Right now you only perform that check when SRC is a REG_P.


I am using the for_each_rtx because I want to scan src so that
 any sub-operands are checked,  the number of REG and non-constant
 objects are record in reg_found and nonconst_found.  the embedded
 register found also record in the reg field of the structure
 rtx_search_arg,


Hi Jeff,

  Constants in this context are going to satisfy CONSTANT_P, you
don't need to manually verify them.  It will include simple
constants and constants which are built up out of multiple
instructions (symbolic constants in particular).

===  thanks, will investigate this tomorrow.


 I suspect you still need the callback to verify the # of registers is just 1 
 so that the later tests work.

===  the reg_found will record the # of registers, and there will be
a check to make sure no register or just 1 register in the src.
===  so if reg_found is bigger than 1, then src will be assigned to
NULL_RTX, that we just stop the move.

  +  if (arg.nonconst_found
  +  || arg.reg_found  1)
  +src = NULL_RTX;

=== am I missing something here?

However, please don't use for_each_rtx, we're moving away from that to a more 
efficient walker FOR_EACH_SUBRTX.

=== OK, will fix.  Thanks very much for review.

Regards,
Jiong


 Jeff


Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-08 Thread Jiong Wang

On 05/09/14 20:48, Jeff Law wrote:

On 09/04/14 08:15, Jiong Wang wrote:

this patch relax the restriction on src to accept any one of the following:

+ REG
+ CONST_OBJ, like SYMBOL_REF
+ combination of single REG and any other CONST_OBJs.
  (reg def/use calculation will not affected by CONST_OBJs)

RISC backend may benefit more from this relax, although there still
be minor improvements on x86. for example, there are 17 more functions
shrink-wrapped during x86-64 bootstrap, like sort_bucket in ira-color.c.

test done
=
no regression on aarch64-none-elf bare-metal.
no regression on x86-64 check-gcc.
both aarch64 and x86-64 bootstrap OK.

ok for install?

2014-09-04 Jiong Wangjiong.w...@arm.com

gcc/
* shrink-wrap.c (rtx_search_arg): New structure type.
(rtx_search_arg_p): New typedef.
(count_reg_const): New callback function.
(move_insn_for_shrink_wrap): Relax the restriction on src operand.


Conceptually OK.  Some questions/concerns about the implementation.

Hi Jeff,

  thanks very much for your review.

It seems to me that what you're trying to describe on the RHS is
REG_P || CONSTANT_P

  yes.

  and actually I am trying to detect all rtx which contains any number
  of RTX_CONST_OBJs and no more than one REG.


Note that CONSTANT_P will catch things like (high (symbol_ref)) and
(lo_sum (reg) (symbol_ref)) which are often used to build addresses on
some targets.

With that in mind, rather than using a for_each_rtx callback, test
if (REG_P (src) || CONSTANT_P (src))

Note that SRC, when it is a CONSTANT_P,  may have a variety of forms
with embedded registers.  You'll need to verify that all of those
registers are not assigned after the insn with the CONSTANT_P source
operand.  Right now you only perform that check when SRC is a REG_P.


  I am using the for_each_rtx because I want to scan src so that
any sub-operands are checked,  the number of REG and non-constant
objects are record in reg_found and nonconst_found.  the embedded
register found also record in the reg field of the structure 
rtx_search_arg,

and there is a assignment:

  else if (arg.reg_found == 1)
src = arg.reg;

  so the logic of my patch is:
 * if REG_P (src), nothing new, just follow the original logic.
 * if !REG_P (src),  scan src, to make sure there is no more than 
one REG,

   and there is no non-constant objects.
   if any embedded REG found, assign it to src  to go through all 
code written for REG_P (src).


 And I do found there is one glitch in my patch:

  +  else if (!CONSTANT_P (x))

should be

  +  else if (!CONSTANT_P (x)  OBJECT_P (x)) to skip those operator 
rtx like plus, minus etc.


so, does my explanation make sense?

BTW, there will be 2547 functions shrink-wrapped during x86-64 bootstrap 
compared with 2501

before this improvement,  46 more,  +1.84%.

thanks.

-- Jiong




Jeff








Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-05 Thread Jeff Law

On 09/04/14 08:15, Jiong Wang wrote:

currently, the instruction sink in prepare_shrink_wrap is a bit
conservative
that some further optimization opportunities have been missed.

given the prologue use register A by:

   (store A, [sp + offset])

then given the entry_basic_block contains a simply register copy like:

   (move A, B)

current prepare_shrink_wrap will sink the move instruction as deep as
it can,
then the entry_basic_block could be marked as don't need prologue.

while if we replace (move A, B) into either one of

   * (move B, CONST_K),
   * (move B, (plus A, CONST_K))

we still could do the same sink optimization, but *current gcc do not*.

pattern like (move B, CONST_K) are very normal for some RISC targets.

for example on AArch64, we could have the following pair:

   adrpx22, global_data_a
   add x0, x22, :lo12:global_data_a

if adrp be scheduled into the entry_basic_block then the write of x22 may
prevent shrink-wrap happen.

when judge whether one instruction is sink-able,
move_insn_for_shrink_wrap only
accept simply reg copy that both dest and src are REG_P, while the second
operand of adrp is actually a SYMBOL_REF, thus it's reject by the
optimization.

this patch relax the restriction on src to accept any one of the following:

   + REG
   + CONST_OBJ, like SYMBOL_REF
   + combination of single REG and any other CONST_OBJs.
 (reg def/use calculation will not affected by CONST_OBJs)

RISC backend may benefit more from this relax, although there still
be minor improvements on x86. for example, there are 17 more functions
shrink-wrapped during x86-64 bootstrap, like sort_bucket in ira-color.c.

test done
=
   no regression on aarch64-none-elf bare-metal.
   no regression on x86-64 check-gcc.
   both aarch64 and x86-64 bootstrap OK.

ok for install?

2014-09-04 Jiong Wangjiong.w...@arm.com

gcc/
   * shrink-wrap.c (rtx_search_arg): New structure type.
   (rtx_search_arg_p): New typedef.
   (count_reg_const): New callback function.
   (move_insn_for_shrink_wrap): Relax the restriction on src operand.


Conceptually OK.  Some questions/concerns about the implementation.

It seems to me that what you're trying to describe on the RHS is
REG_P || CONSTANT_P

Note that CONSTANT_P will catch things like (high (symbol_ref)) and 
(lo_sum (reg) (symbol_ref)) which are often used to build addresses on 
some targets.


With that in mind, rather than using a for_each_rtx callback, test
if (REG_P (src) || CONSTANT_P (src))

Note that SRC, when it is a CONSTANT_P,  may have a variety of forms 
with embedded registers.  You'll need to verify that all of those 
registers are not assigned after the insn with the CONSTANT_P source 
operand.  Right now you only perform that check when SRC is a REG_P.


Jeff


[PATCH] Improve prepare_shrink_wrap to sink more instructions

2014-09-04 Thread Jiong Wang

currently, the instruction sink in prepare_shrink_wrap is a bit conservative
that some further optimization opportunities have been missed.

given the prologue use register A by:

  (store A, [sp + offset])

then given the entry_basic_block contains a simply register copy like:

  (move A, B)

current prepare_shrink_wrap will sink the move instruction as deep as it can,
then the entry_basic_block could be marked as don't need prologue.

while if we replace (move A, B) into either one of

  * (move B, CONST_K),
  * (move B, (plus A, CONST_K))

we still could do the same sink optimization, but *current gcc do not*.

pattern like (move B, CONST_K) are very normal for some RISC targets.

for example on AArch64, we could have the following pair:

  adrpx22, global_data_a
  add x0, x22, :lo12:global_data_a

if adrp be scheduled into the entry_basic_block then the write of x22 may
prevent shrink-wrap happen.

when judge whether one instruction is sink-able, move_insn_for_shrink_wrap only
accept simply reg copy that both dest and src are REG_P, while the second
operand of adrp is actually a SYMBOL_REF, thus it's reject by the optimization.

this patch relax the restriction on src to accept any one of the following:

  + REG
  + CONST_OBJ, like SYMBOL_REF
  + combination of single REG and any other CONST_OBJs.
(reg def/use calculation will not affected by CONST_OBJs)

RISC backend may benefit more from this relax, although there still
be minor improvements on x86. for example, there are 17 more functions
shrink-wrapped during x86-64 bootstrap, like sort_bucket in ira-color.c.

test done
=
  no regression on aarch64-none-elf bare-metal.
  no regression on x86-64 check-gcc.
  both aarch64 and x86-64 bootstrap OK.

ok for install?

2014-09-04 Jiong Wangjiong.w...@arm.com

gcc/
  * shrink-wrap.c (rtx_search_arg): New structure type.
  (rtx_search_arg_p): New typedef.
  (count_reg_const): New callback function.
  (move_insn_for_shrink_wrap): Relax the restriction on src operand.

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 0938f2c..5b5ca85 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -156,6 +156,37 @@ live_edge_for_reg (basic_block bb, int regno, int end_regno)
   return live_edge;
 }
 
+struct rtx_search_arg
+{
+  unsigned int reg_found;
+  unsigned int nonconst_found;
+  rtx reg;
+};
+
+typedef struct rtx_search_arg *rtx_search_arg_p;
+
+/* A for_each_rtx callback used by move_insn_for_shrink_wrap to count the
+   numbers of register and non-constant objects.  */
+
+static int
+count_reg_const (rtx *loc, void *arg)
+{
+  rtx_search_arg_p p = (rtx_search_arg_p) arg;
+  rtx x;
+
+  x = *loc;
+
+  if (REG_P (x))
+{
+  p-reg_found++;
+  p-reg = x;
+}
+  else if (! CONSTANT_P (x))
+p-nonconst_found++;
+
+  return 0;
+}
+
 /* Try to move INSN from BB to a successor.  Return true on success.
USES and DEFS are the set of registers that are used and defined
after INSN in BB.  SPLIT_P indicates whether a live edge from BB
@@ -169,7 +200,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
-  unsigned int i, dregno, end_dregno, sregno, end_sregno;
+  unsigned int i, dregno, end_dregno;
+  unsigned int sregno = FIRST_PSEUDO_REGISTER;
+  unsigned int end_sregno = FIRST_PSEUDO_REGISTER;
   basic_block next_block;
   edge live_edge;
 
@@ -179,7 +212,25 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
-  if (!REG_P (dest) || !REG_P (src)
+
+  if (!REG_P (src))
+{
+  struct rtx_search_arg arg;
+
+  arg.reg_found = 0;
+  arg.nonconst_found = 0;
+  arg.reg = NULL_RTX;
+
+  for_each_rtx (src, count_reg_const, (void *) arg);
+
+  if (arg.nonconst_found
+	  || arg.reg_found  1)
+	src = NULL_RTX;
+  else if (arg.reg_found == 1)
+	src = arg.reg;
+}
+
+  if (!REG_P (dest) || src == NULL_RTX
   /* STACK or FRAME related adjustment might be part of prologue.
 	 So keep them in the entry block.  */
   || dest == stack_pointer_rtx
@@ -188,10 +238,13 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
 return false;
 
   /* Make sure that the source register isn't defined later in BB.  */
-  sregno = REGNO (src);
-  end_sregno = END_REGNO (src);
-  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
-return false;
+  if (REG_P (src))
+{
+  sregno = REGNO (src);
+  end_sregno = END_REGNO (src);
+  if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno))
+	return false;
+}
 
   /* Make sure that the destination register isn't referenced later in BB.  */
   dregno = REGNO (dest);
diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c
new file mode 100644
index 000..ad2e588
--- /dev/null
+++