Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On Fri, Sep 19, 2014 at 1:43 PM, Jeff Law l...@redhat.com wrote: On 09/15/14 08:33, Jiong Wang wrote: Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, Hmm, it's been ~15 years since I regularly worked on a target that uses HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as well, but reading the docs for CONST, that clearly isn't the case. Could we add a check for lo_sum since it is an RTX_OBJ rather than RTX_COMM_ARITH or RTX_BIN_ARITH? I am testing the patch for that to fix the above issue. It shows up with the testcase Jiong added but only with -mabi=ilp32 enabled. Thanks, Andrew Pinski Sorry for that. Can you (re) send your current patch for this for review? Jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 02/10/14 09:49, Andrew Pinski wrote: On Fri, Sep 19, 2014 at 1:43 PM, Jeff Law l...@redhat.com wrote: On 09/15/14 08:33, Jiong Wang wrote: Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, Hmm, it's been ~15 years since I regularly worked on a target that uses HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as well, but reading the docs for CONST, that clearly isn't the case. Could we add a check for lo_sum since it is an RTX_OBJ rather than RTX_COMM_ARITH or RTX_BIN_ARITH? thanks, agree, that's exactly what I want to catch, while missed it during the patch re-write. I was been stupid! anyway, I think we need to solve pr63404, https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02568.html firstly, as it's a correctness regression. -- Jiong I am testing the patch for that to fix the above issue. It shows up with the testcase Jiong added but only with -mabi=ilp32 enabled. Thanks, Andrew Pinski Sorry for that. Can you (re) send your current patch for this for review? Jeff
Re: [COMMITTED][PATCH] Improve prepare_shrink_wrap to sink more instructions
On 24 September 2014 20:32, Jiong Wang jiong.w...@arm.com wrote: On 22/09/14 19:01, Jeff Law wrote: On 09/22/14 04:29, Jiong Wang wrote: On 19/09/14 21:43, Jeff Law wrote: patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P (src) to release more instruction sink opportunities. gcc/testsuite/ * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase. Thanks. Please verify this version passes a bootstrap regression test. Assuming it does it is OK for the trunk. pass bootstrap and on regression on x86 based on revision 215515. committed as revision 215563. -- Jiong I have observed regressions in the g++ testsuite: pr49847 now FAILs after this patch. Here is what I have in my logs: /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../xg++ -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../ /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ -I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf -I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include -I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++ -I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/backward -I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=gnu++98 -O -fnon-call-exceptions -S -o pr49847.s(timeout = 800) /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C: In function 'int f(float)': /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1: error: missing REG_EH_REGION note at the end of bb 2 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1: internal compiler error: verify_flow_info failed 0x82f8ba verify_flow_info() /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:260 0x840cd3 commit_edge_insertions() /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgrtl.c:2068 0x9bf243 thread_prologue_and_epilogue_insns /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:5852 0x9bfa52 rest_of_handle_thread_prologue_and_epilogue /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6245 0x9bfa52 execute /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6283 As per http://cbuild.validation.linaro.org/build/cross-validation/gcc/trunk/215563/report-build-info.html I've noticed this on targets: arm-none-linux-gnueabihf armeb-none-linux-gnueabihf aarch64-none-elf aarch64_be-none-elf aarch64-none-linux-gnu but NOT on arm-none-eabi arm-none-linux-gnueabi Christophe.
Re: [COMMITTED][PATCH] Improve prepare_shrink_wrap to sink more instructions
On 25/09/14 12:25, Christophe Lyon wrote: On 24 September 2014 20:32, Jiong Wang jiong.w...@arm.com wrote: On 22/09/14 19:01, Jeff Law wrote: On 09/22/14 04:29, Jiong Wang wrote: On 19/09/14 21:43, Jeff Law wrote: patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P (src) to release more instruction sink opportunities. gcc/testsuite/ * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase. Thanks. Please verify this version passes a bootstrap regression test. Assuming it does it is OK for the trunk. pass bootstrap and on regression on x86 based on revision 215515. committed as revision 215563. -- Jiong I have observed regressions in the g++ testsuite: pr49847 now FAILs after this patch. no. even without my patch, the regression still happen. or you could specify -fno-shrink-wrap, gcc still crash. so, this regression should caused by other commits which haven't exposed on x86 regression test. -- Jiong Here is what I have in my logs: /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../xg++ -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/testsuite/g++/../../ /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C -fno-diagnostics-show-caret -fdiagnostics-color=never -nostdinc++ -I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf -I/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/arm-none-linux-gnueabihf/libstdc++-v3/include -I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/libsupc++ -I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/backward -I/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=gnu++98 -O -fnon-call-exceptions -S -o pr49847.s(timeout = 800) /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C: In function 'int f(float)': /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1: error: missing REG_EH_REGION note at the end of bb 2 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/pr49847.C:7:1: internal compiler error: verify_flow_info failed 0x82f8ba verify_flow_info() /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:260 0x840cd3 commit_edge_insertions() /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgrtl.c:2068 0x9bf243 thread_prologue_and_epilogue_insns /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:5852 0x9bfa52 rest_of_handle_thread_prologue_and_epilogue /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6245 0x9bfa52 execute /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/function.c:6283 As per http://cbuild.validation.linaro.org/build/cross-validation/gcc/trunk/215563/report-build-info.html I've noticed this on targets: arm-none-linux-gnueabihf armeb-none-linux-gnueabihf aarch64-none-elf aarch64_be-none-elf aarch64-none-linux-gnu but NOT on arm-none-eabi arm-none-linux-gnueabi Christophe.
[COMMITTED][PATCH] Improve prepare_shrink_wrap to sink more instructions
On 22/09/14 19:01, Jeff Law wrote: On 09/22/14 04:29, Jiong Wang wrote: On 19/09/14 21:43, Jeff Law wrote: patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P (src) to release more instruction sink opportunities. gcc/testsuite/ * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase. Thanks. Please verify this version passes a bootstrap regression test. Assuming it does it is OK for the trunk. pass bootstrap and on regression on x86 based on revision 215515. committed as revision 215563. -- Jiong jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 19/09/14 21:43, Jeff Law wrote: On 09/15/14 08:33, Jiong Wang wrote: Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, Hmm, it's been ~15 years since I regularly worked on a target that uses HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as well, but reading the docs for CONST, that clearly isn't the case. Sorry for that. Can you (re) send your current patch for this for review? patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P (src) to release more instruction sink opportunities. gcc/testsuite/ * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase. diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index fd24135..739e957 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include bb-reorder.h #include shrink-wrap.h #include regcprop.h +#include rtl-iter.h #ifdef HAVE_simple_return @@ -169,7 +170,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, { rtx set, src, dest; bitmap live_out, live_in, bb_uses, bb_defs; - unsigned int i, dregno, end_dregno, sregno, end_sregno; + unsigned int i, dregno, end_dregno; + unsigned int sregno = FIRST_PSEUDO_REGISTER; + unsigned int end_sregno = FIRST_PSEUDO_REGISTER; basic_block next_block; edge live_edge; @@ -179,7 +182,34 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, return false; src = SET_SRC (set); dest = SET_DEST (set); - if (!REG_P (dest) || !REG_P (src) + + if (!REG_P (src)) +{ + unsigned int reg_num = 0; + unsigned int nonconstobj_num = 0; + rtx src_inner = NULL_RTX; + + subrtx_var_iterator::array_type array; + FOR_EACH_SUBRTX_VAR (iter, array, src, ALL) + { + rtx x = *iter; + if (REG_P (x)) + { + reg_num++; + src_inner = x; + } + else if (!CONSTANT_P (x) OBJECT_P (x)) + nonconstobj_num++; + } + + if (nonconstobj_num 0 + || reg_num 1) + src = NULL_RTX; + else if (reg_num == 1) + src = src_inner; +} + + if (!REG_P (dest) || src == NULL_RTX /* STACK or FRAME related adjustment might be part of prologue. So keep them in the entry block. */ || dest == stack_pointer_rtx @@ -188,10 +218,13 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, return false; /* Make sure that the source register isn't defined later in BB. */ - sregno = REGNO (src); - end_sregno = END_REGNO (src); - if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno)) -return false; + if (REG_P (src)) +{ + sregno = REGNO (src); + end_sregno = END_REGNO (src); + if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno)) + return false; +} /* Make sure that the destination register isn't referenced later in BB. */ dregno = REGNO (dest); diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c new file mode 100644 index 000..ad2e588 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-rtl-pro_and_epilogue } */ + +extern char *asm_out_file; +extern void default_elf_asm_output_ascii (char *, const char *, int); + +void +assemble_string (const char *p, int size) +{ + int pos = 0; + int maximum = 2000; + + while (pos size) +{ + int thissize = size - pos; + + if (thissize maximum) + thissize = maximum; + + default_elf_asm_output_ascii (asm_out_file, p,
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 09/22/14 04:29, Jiong Wang wrote: On 19/09/14 21:43, Jeff Law wrote: On 09/15/14 08:33, Jiong Wang wrote: Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, Hmm, it's been ~15 years since I regularly worked on a target that uses HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as well, but reading the docs for CONST, that clearly isn't the case. Sorry for that. Can you (re) send your current patch for this for review? patch attached. please review, thanks. gcc/ * shrink-wrap.c (move_insn_for_shrink_wrap): Add further check when !REG_P (src) to release more instruction sink opportunities. gcc/testsuite/ * gcc.target/aarch64/shrink_wrap_symbol_ref_1.c: New testcase. Thanks. Please verify this version passes a bootstrap regression test. Assuming it does it is OK for the trunk. jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 09/15/14 08:33, Jiong Wang wrote: Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, Hmm, it's been ~15 years since I regularly worked on a target that uses HIGH/LO_SUM, I thought we wrapped the LO_SUM expression inside a CONST as well, but reading the docs for CONST, that clearly isn't the case. Sorry for that. Can you (re) send your current patch for this for review? Jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 11/09/14 21:39, Jeff Law wrote: On 09/08/14 07:57, Jiong Wang wrote: Conceptually OK. Some questions/concerns about the implementation. Hi Jeff, thanks very much for your review. It seems to me that what you're trying to describe on the RHS is REG_P || CONSTANT_P yes. and actually I am trying to detect all rtx which contains any number of RTX_CONST_OBJs and no more than one REG. Note that CONSTANT_P will catch things like (high (symbol_ref)) and (lo_sum (reg) (symbol_ref)) which are often used to build addresses on some targets. With that in mind, rather than using a for_each_rtx callback, test if (REG_P (src) || CONSTANT_P (src)) Note that SRC, when it is a CONSTANT_P, may have a variety of forms with embedded registers. You'll need to verify that all of those registers are not assigned after the insn with the CONSTANT_P source operand. Right now you only perform that check when SRC is a REG_P. I am using the for_each_rtx because I want to scan src so that any sub-operands are checked, the number of REG and non-constant objects are record in reg_found and nonconst_found. the embedded register found also record in the reg field of the structure rtx_search_arg, Constants in this context are going to satisfy CONSTANT_P, you don't need to manually verify them. It will include simple constants and constants which are built up out of multiple instructions (symbolic constants in particular). Jeff, thanks, I partially understand your meaning here. take the function ira_implicitly_set_insn_hard_regs in ira-lives.c for example, when generating address rtl, gcc will automatically generate const operator to prefix the address expression, like the following. so a simple CONSTANT_P check is enough in case there is no embedded register. (insn 309 310 308 3 (set (reg:DI 44 r15 [orig:94 ivtmp.674 ] [94]) (const:DI (plus:DI (symbol_ref:DI (recog_data) [flags 0x40] var_decl 0x2b2c2ff91510 recog_data) (const_int 480 [0x1e0] -1 but for architecture like aarch64, the following instruction sequences to forming address may be generated (insn 73 14 74 4 (set (reg/f:DI 20 x20 [99]) (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) 35 {*movdi_aarch64} (expr_list:REG_EQUIV (high:DI (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats)) (nil))) (insn 17 30 25 5 (set (reg/f:DI 4 x4 [83]) (lo_sum:DI (reg/f:DI 20 x20 [99]) (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats))) {add_losym_di} (expr_list:REG_EQUIV (symbol_ref:DI (global_a) [flags 0xc0] var_decl 0x7ff755a60900 stats) (nil))) while CONSTANT_P could not catch the latter lo_sum case, as the RTX_CLASS of lo_sum is RTX_OBJ not RTX_CONST_OBJ, so the the manually verification FOR_EACH_SUBRTX_VAR (iter, array, src, ALL) { rtx x = *iter; if (REG_P (x)) { reg_num++; src_inner = x; } else if (!CONSTANT_P (x) OBJECT_P (x)) nonconstobj_num++; } is to catch cases like the lo_sum insn above which contains no more than 1 register and only contains RTX_CONST_OBJ. I suspect you still need the callback to verify the # of registers is just 1 so that the later tests work. However, please don't use for_each_rtx, we're moving away from that to a more efficient walker FOR_EACH_SUBRTX. fixed, patch updated, please review. thanks. BTW, when I verifying this patch on gilbc x86-64 build, I found there may be another hiding issue when calculating live_in when splitting edge in move_insn_for_shrink_wrap, I will reply on that historical email thread directly. Regards, Jiong Jeff diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index fd24135..739e957 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include bb-reorder.h #include shrink-wrap.h #include regcprop.h +#include rtl-iter.h #ifdef HAVE_simple_return @@ -169,7 +170,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, { rtx set, src, dest; bitmap live_out, live_in, bb_uses, bb_defs; - unsigned int i, dregno, end_dregno, sregno, end_sregno; + unsigned int i, dregno, end_dregno; + unsigned int sregno = FIRST_PSEUDO_REGISTER; + unsigned int end_sregno = FIRST_PSEUDO_REGISTER; basic_block next_block; edge live_edge; @@ -179,7 +182,34 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, return false; src = SET_SRC (set); dest = SET_DEST (set); - if (!REG_P (dest) || !REG_P (src) + + if (!REG_P (src)) +{ + unsigned int reg_num = 0; + unsigned int nonconstobj_num = 0; + rtx src_inner = NULL_RTX; + + subrtx_var_iterator::array_type array; + FOR_EACH_SUBRTX_VAR (iter, array, src, ALL) + { + rtx x = *iter; + if
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 09/11/14 14:58, Wang Jiong wrote: the reg_found will record the # of registers, and there will be a check to make sure no register or just 1 register in the src. so if reg_found is bigger than 1, then src will be assigned to NULL_RTX, that we just stop the move. + if (arg.nonconst_found + || arg.reg_found 1) +src = NULL_RTX; am I missing something here? No, I was just reiterating that you are going to need that code, so some kind of walker would still be necessary. Sorry I wasn't clear about that. jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 09/08/14 07:57, Jiong Wang wrote: Conceptually OK. Some questions/concerns about the implementation. Hi Jeff, thanks very much for your review. It seems to me that what you're trying to describe on the RHS is REG_P || CONSTANT_P yes. and actually I am trying to detect all rtx which contains any number of RTX_CONST_OBJs and no more than one REG. Note that CONSTANT_P will catch things like (high (symbol_ref)) and (lo_sum (reg) (symbol_ref)) which are often used to build addresses on some targets. With that in mind, rather than using a for_each_rtx callback, test if (REG_P (src) || CONSTANT_P (src)) Note that SRC, when it is a CONSTANT_P, may have a variety of forms with embedded registers. You'll need to verify that all of those registers are not assigned after the insn with the CONSTANT_P source operand. Right now you only perform that check when SRC is a REG_P. I am using the for_each_rtx because I want to scan src so that any sub-operands are checked, the number of REG and non-constant objects are record in reg_found and nonconst_found. the embedded register found also record in the reg field of the structure rtx_search_arg, Constants in this context are going to satisfy CONSTANT_P, you don't need to manually verify them. It will include simple constants and constants which are built up out of multiple instructions (symbolic constants in particular). I suspect you still need the callback to verify the # of registers is just 1 so that the later tests work. However, please don't use for_each_rtx, we're moving away from that to a more efficient walker FOR_EACH_SUBRTX. Jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
2014-09-11 21:39 GMT+01:00 Jeff Law l...@redhat.com: On 09/08/14 07:57, Jiong Wang wrote: Conceptually OK. Some questions/concerns about the implementation. Hi Jeff, thanks very much for your review. It seems to me that what you're trying to describe on the RHS is REG_P || CONSTANT_P yes. and actually I am trying to detect all rtx which contains any number of RTX_CONST_OBJs and no more than one REG. Note that CONSTANT_P will catch things like (high (symbol_ref)) and (lo_sum (reg) (symbol_ref)) which are often used to build addresses on some targets. With that in mind, rather than using a for_each_rtx callback, test if (REG_P (src) || CONSTANT_P (src)) Note that SRC, when it is a CONSTANT_P, may have a variety of forms with embedded registers. You'll need to verify that all of those registers are not assigned after the insn with the CONSTANT_P source operand. Right now you only perform that check when SRC is a REG_P. I am using the for_each_rtx because I want to scan src so that any sub-operands are checked, the number of REG and non-constant objects are record in reg_found and nonconst_found. the embedded register found also record in the reg field of the structure rtx_search_arg, Hi Jeff, Constants in this context are going to satisfy CONSTANT_P, you don't need to manually verify them. It will include simple constants and constants which are built up out of multiple instructions (symbolic constants in particular). === thanks, will investigate this tomorrow. I suspect you still need the callback to verify the # of registers is just 1 so that the later tests work. === the reg_found will record the # of registers, and there will be a check to make sure no register or just 1 register in the src. === so if reg_found is bigger than 1, then src will be assigned to NULL_RTX, that we just stop the move. + if (arg.nonconst_found + || arg.reg_found 1) +src = NULL_RTX; === am I missing something here? However, please don't use for_each_rtx, we're moving away from that to a more efficient walker FOR_EACH_SUBRTX. === OK, will fix. Thanks very much for review. Regards, Jiong Jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 05/09/14 20:48, Jeff Law wrote: On 09/04/14 08:15, Jiong Wang wrote: this patch relax the restriction on src to accept any one of the following: + REG + CONST_OBJ, like SYMBOL_REF + combination of single REG and any other CONST_OBJs. (reg def/use calculation will not affected by CONST_OBJs) RISC backend may benefit more from this relax, although there still be minor improvements on x86. for example, there are 17 more functions shrink-wrapped during x86-64 bootstrap, like sort_bucket in ira-color.c. test done = no regression on aarch64-none-elf bare-metal. no regression on x86-64 check-gcc. both aarch64 and x86-64 bootstrap OK. ok for install? 2014-09-04 Jiong Wangjiong.w...@arm.com gcc/ * shrink-wrap.c (rtx_search_arg): New structure type. (rtx_search_arg_p): New typedef. (count_reg_const): New callback function. (move_insn_for_shrink_wrap): Relax the restriction on src operand. Conceptually OK. Some questions/concerns about the implementation. Hi Jeff, thanks very much for your review. It seems to me that what you're trying to describe on the RHS is REG_P || CONSTANT_P yes. and actually I am trying to detect all rtx which contains any number of RTX_CONST_OBJs and no more than one REG. Note that CONSTANT_P will catch things like (high (symbol_ref)) and (lo_sum (reg) (symbol_ref)) which are often used to build addresses on some targets. With that in mind, rather than using a for_each_rtx callback, test if (REG_P (src) || CONSTANT_P (src)) Note that SRC, when it is a CONSTANT_P, may have a variety of forms with embedded registers. You'll need to verify that all of those registers are not assigned after the insn with the CONSTANT_P source operand. Right now you only perform that check when SRC is a REG_P. I am using the for_each_rtx because I want to scan src so that any sub-operands are checked, the number of REG and non-constant objects are record in reg_found and nonconst_found. the embedded register found also record in the reg field of the structure rtx_search_arg, and there is a assignment: else if (arg.reg_found == 1) src = arg.reg; so the logic of my patch is: * if REG_P (src), nothing new, just follow the original logic. * if !REG_P (src), scan src, to make sure there is no more than one REG, and there is no non-constant objects. if any embedded REG found, assign it to src to go through all code written for REG_P (src). And I do found there is one glitch in my patch: + else if (!CONSTANT_P (x)) should be + else if (!CONSTANT_P (x) OBJECT_P (x)) to skip those operator rtx like plus, minus etc. so, does my explanation make sense? BTW, there will be 2547 functions shrink-wrapped during x86-64 bootstrap compared with 2501 before this improvement, 46 more, +1.84%. thanks. -- Jiong Jeff
Re: [PATCH] Improve prepare_shrink_wrap to sink more instructions
On 09/04/14 08:15, Jiong Wang wrote: currently, the instruction sink in prepare_shrink_wrap is a bit conservative that some further optimization opportunities have been missed. given the prologue use register A by: (store A, [sp + offset]) then given the entry_basic_block contains a simply register copy like: (move A, B) current prepare_shrink_wrap will sink the move instruction as deep as it can, then the entry_basic_block could be marked as don't need prologue. while if we replace (move A, B) into either one of * (move B, CONST_K), * (move B, (plus A, CONST_K)) we still could do the same sink optimization, but *current gcc do not*. pattern like (move B, CONST_K) are very normal for some RISC targets. for example on AArch64, we could have the following pair: adrpx22, global_data_a add x0, x22, :lo12:global_data_a if adrp be scheduled into the entry_basic_block then the write of x22 may prevent shrink-wrap happen. when judge whether one instruction is sink-able, move_insn_for_shrink_wrap only accept simply reg copy that both dest and src are REG_P, while the second operand of adrp is actually a SYMBOL_REF, thus it's reject by the optimization. this patch relax the restriction on src to accept any one of the following: + REG + CONST_OBJ, like SYMBOL_REF + combination of single REG and any other CONST_OBJs. (reg def/use calculation will not affected by CONST_OBJs) RISC backend may benefit more from this relax, although there still be minor improvements on x86. for example, there are 17 more functions shrink-wrapped during x86-64 bootstrap, like sort_bucket in ira-color.c. test done = no regression on aarch64-none-elf bare-metal. no regression on x86-64 check-gcc. both aarch64 and x86-64 bootstrap OK. ok for install? 2014-09-04 Jiong Wangjiong.w...@arm.com gcc/ * shrink-wrap.c (rtx_search_arg): New structure type. (rtx_search_arg_p): New typedef. (count_reg_const): New callback function. (move_insn_for_shrink_wrap): Relax the restriction on src operand. Conceptually OK. Some questions/concerns about the implementation. It seems to me that what you're trying to describe on the RHS is REG_P || CONSTANT_P Note that CONSTANT_P will catch things like (high (symbol_ref)) and (lo_sum (reg) (symbol_ref)) which are often used to build addresses on some targets. With that in mind, rather than using a for_each_rtx callback, test if (REG_P (src) || CONSTANT_P (src)) Note that SRC, when it is a CONSTANT_P, may have a variety of forms with embedded registers. You'll need to verify that all of those registers are not assigned after the insn with the CONSTANT_P source operand. Right now you only perform that check when SRC is a REG_P. Jeff
[PATCH] Improve prepare_shrink_wrap to sink more instructions
currently, the instruction sink in prepare_shrink_wrap is a bit conservative that some further optimization opportunities have been missed. given the prologue use register A by: (store A, [sp + offset]) then given the entry_basic_block contains a simply register copy like: (move A, B) current prepare_shrink_wrap will sink the move instruction as deep as it can, then the entry_basic_block could be marked as don't need prologue. while if we replace (move A, B) into either one of * (move B, CONST_K), * (move B, (plus A, CONST_K)) we still could do the same sink optimization, but *current gcc do not*. pattern like (move B, CONST_K) are very normal for some RISC targets. for example on AArch64, we could have the following pair: adrpx22, global_data_a add x0, x22, :lo12:global_data_a if adrp be scheduled into the entry_basic_block then the write of x22 may prevent shrink-wrap happen. when judge whether one instruction is sink-able, move_insn_for_shrink_wrap only accept simply reg copy that both dest and src are REG_P, while the second operand of adrp is actually a SYMBOL_REF, thus it's reject by the optimization. this patch relax the restriction on src to accept any one of the following: + REG + CONST_OBJ, like SYMBOL_REF + combination of single REG and any other CONST_OBJs. (reg def/use calculation will not affected by CONST_OBJs) RISC backend may benefit more from this relax, although there still be minor improvements on x86. for example, there are 17 more functions shrink-wrapped during x86-64 bootstrap, like sort_bucket in ira-color.c. test done = no regression on aarch64-none-elf bare-metal. no regression on x86-64 check-gcc. both aarch64 and x86-64 bootstrap OK. ok for install? 2014-09-04 Jiong Wangjiong.w...@arm.com gcc/ * shrink-wrap.c (rtx_search_arg): New structure type. (rtx_search_arg_p): New typedef. (count_reg_const): New callback function. (move_insn_for_shrink_wrap): Relax the restriction on src operand. diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index 0938f2c..5b5ca85 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -156,6 +156,37 @@ live_edge_for_reg (basic_block bb, int regno, int end_regno) return live_edge; } +struct rtx_search_arg +{ + unsigned int reg_found; + unsigned int nonconst_found; + rtx reg; +}; + +typedef struct rtx_search_arg *rtx_search_arg_p; + +/* A for_each_rtx callback used by move_insn_for_shrink_wrap to count the + numbers of register and non-constant objects. */ + +static int +count_reg_const (rtx *loc, void *arg) +{ + rtx_search_arg_p p = (rtx_search_arg_p) arg; + rtx x; + + x = *loc; + + if (REG_P (x)) +{ + p-reg_found++; + p-reg = x; +} + else if (! CONSTANT_P (x)) +p-nonconst_found++; + + return 0; +} + /* Try to move INSN from BB to a successor. Return true on success. USES and DEFS are the set of registers that are used and defined after INSN in BB. SPLIT_P indicates whether a live edge from BB @@ -169,7 +200,9 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, { rtx set, src, dest; bitmap live_out, live_in, bb_uses, bb_defs; - unsigned int i, dregno, end_dregno, sregno, end_sregno; + unsigned int i, dregno, end_dregno; + unsigned int sregno = FIRST_PSEUDO_REGISTER; + unsigned int end_sregno = FIRST_PSEUDO_REGISTER; basic_block next_block; edge live_edge; @@ -179,7 +212,25 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, return false; src = SET_SRC (set); dest = SET_DEST (set); - if (!REG_P (dest) || !REG_P (src) + + if (!REG_P (src)) +{ + struct rtx_search_arg arg; + + arg.reg_found = 0; + arg.nonconst_found = 0; + arg.reg = NULL_RTX; + + for_each_rtx (src, count_reg_const, (void *) arg); + + if (arg.nonconst_found + || arg.reg_found 1) + src = NULL_RTX; + else if (arg.reg_found == 1) + src = arg.reg; +} + + if (!REG_P (dest) || src == NULL_RTX /* STACK or FRAME related adjustment might be part of prologue. So keep them in the entry block. */ || dest == stack_pointer_rtx @@ -188,10 +238,13 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn, return false; /* Make sure that the source register isn't defined later in BB. */ - sregno = REGNO (src); - end_sregno = END_REGNO (src); - if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno)) -return false; + if (REG_P (src)) +{ + sregno = REGNO (src); + end_sregno = END_REGNO (src); + if (overlaps_hard_reg_set_p (defs, GET_MODE (src), sregno)) + return false; +} /* Make sure that the destination register isn't referenced later in BB. */ dregno = REGNO (dest); diff --git a/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c b/gcc/testsuite/gcc.target/aarch64/shrink_wrap_symbol_ref_1.c new file mode 100644 index 000..ad2e588 --- /dev/null +++