Re: Optimize calls to functions that return one of their arguments
Ok, that looks like three votes in favour. I've checked in the following variant with a few minor changes. I've added strcpy and memset to the list of functions, and split off a new function in ira-lives. Changes to subdirectories must be documented in the subdirectory's ChangeLog (ada, c-family, lto) and not in the main ChangeLog. -- Eric Botcazou
Re: Optimize calls to functions that return one of their arguments
On 05/15/2012 09:12 AM, Eric Botcazou wrote: Ok, that looks like three votes in favour. I've checked in the following variant with a few minor changes. I've added strcpy and memset to the list of functions, and split off a new function in ira-lives. Changes to subdirectories must be documented in the subdirectory's ChangeLog (ada, c-family, lto) and not in the main ChangeLog. I checked in a fix. Bernd
Re: Optimize calls to functions that return one of their arguments
I checked in a fix. Thanks! -- Eric Botcazou
Re: Optimize calls to functions that return one of their arguments
On 05/09/2012 05:53 PM, Vladimir Makarov wrote: It is pretty interesting and original idea. My only major objection to the patch is about treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM. I think it should be accumulated as ALLOCNO_CALLS_CROSSED_NUM. Changed (places in ira-buld.c). I don't expect that this micro-optimization improves SPEC2000, but it will improve some tests. So it is good to have it. It would be really interesting to see the optimization impact on SPEC2000. I think I'll do it for myself in a week. So IRA part of the patch is ok for me if you modify treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM as it is done for ALLOCNO_CALLS_CROSSED_NUM (when upper region allocnos accumulate the values from the corresponding allocnos from its sub-regions). Ok, that looks like three votes in favour. I've checked in the following variant with a few minor changes. I've added strcpy and memset to the list of functions, and split off a new function in ira-lives. Bernd Index: gcc/attribs.c === --- gcc/attribs.c (revision 187411) +++ gcc/attribs.c (working copy) @@ -312,8 +312,9 @@ decl_attributes (tree *node, tree attrib if (spec == NULL) { - warning (OPT_Wattributes, %qE attribute directive ignored, - name); + if (!(flags (int) ATTR_FLAG_BUILT_IN)) + warning (OPT_Wattributes, %qE attribute directive ignored, + name); continue; } else if (list_length (args) spec-min_length Index: gcc/doc/rtl.texi === --- gcc/doc/rtl.texi (revision 187411) +++ gcc/doc/rtl.texi (working copy) @@ -3455,20 +3455,26 @@ unpredictably. @code{call_insn} insns have the same extra fields as @code{insn} insns, accessed in the same way and in addition contain a field @code{CALL_INSN_FUNCTION_USAGE}, which contains a list (chain of -@code{expr_list} expressions) containing @code{use} and @code{clobber} -expressions that denote hard registers and @code{MEM}s used or -clobbered by the called function. +@code{expr_list} expressions) containing @code{use}, @code{clobber} and +sometimes @code{set} expressions that denote hard registers and +@code{mem}s used or clobbered by the called function. -A @code{MEM} generally points to a stack slots in which arguments passed +A @code{mem} generally points to a stack slot in which arguments passed to the libcall by reference (@pxref{Register Arguments, TARGET_PASS_BY_REFERENCE}) are stored. If the argument is caller-copied (@pxref{Register Arguments, TARGET_CALLEE_COPIES}), -the stack slot will be mentioned in @code{CLOBBER} and @code{USE} -entries; if it's callee-copied, only a @code{USE} will appear, and the -@code{MEM} may point to addresses that are not stack slots. +the stack slot will be mentioned in @code{clobber} and @code{use} +entries; if it's callee-copied, only a @code{use} will appear, and the +@code{mem} may point to addresses that are not stack slots. -@code{CLOBBER}ed registers in this list augment registers specified in -@code{CALL_USED_REGISTERS} (@pxref{Register Basics}). +Registers occurring inside a @code{clobber} in this list augment +registers specified in @code{CALL_USED_REGISTERS} (@pxref{Register +Basics}). + +If the list contains a @code{set} involving two registers, it indicates +that the function returns one of its arguments. Such a @code{set} may +look like a no-op if the same register holds the argument and the return +value. @findex code_label @findex CODE_LABEL_NUMBER Index: gcc/c-family/c-common.c === --- gcc/c-family/c-common.c (revision 187411) +++ gcc/c-family/c-common.c (working copy) @@ -4603,11 +4603,13 @@ enum built_in_attribute { #define DEF_ATTR_NULL_TREE(ENUM) ENUM, #define DEF_ATTR_INT(ENUM, VALUE) ENUM, +#define DEF_ATTR_STRING(ENUM, VALUE) ENUM, #define DEF_ATTR_IDENT(ENUM, STRING) ENUM, #define DEF_ATTR_TREE_LIST(ENUM, PURPOSE, VALUE, CHAIN) ENUM, #include builtin-attrs.def #undef DEF_ATTR_NULL_TREE #undef DEF_ATTR_INT +#undef DEF_ATTR_STRING #undef DEF_ATTR_IDENT #undef DEF_ATTR_TREE_LIST ATTR_LAST @@ -5926,6 +5928,8 @@ c_init_attributes (void) built_in_attributes[(int) ENUM] = NULL_TREE; #define DEF_ATTR_INT(ENUM, VALUE)\ built_in_attributes[(int) ENUM] = build_int_cst (integer_type_node, VALUE); +#define DEF_ATTR_STRING(ENUM, VALUE)\ + built_in_attributes[(int) ENUM] = build_string (strlen (VALUE), VALUE); #define DEF_ATTR_IDENT(ENUM, STRING)\ built_in_attributes[(int) ENUM] = get_identifier (STRING); #define DEF_ATTR_TREE_LIST(ENUM, PURPOSE, VALUE, CHAIN) \ Index: gcc/postreload.c === --- gcc/postreload.c (revision 187411) +++ gcc/postreload.c (working copy) @@ -1358,8 +1358,10 @@ reload_combine (void) for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1)) { -
Re: Optimize calls to functions that return one of their arguments
On Mon, May 14, 2012 at 02:08:34PM +0200, Bernd Schmidt wrote: --- gcc/builtins.def (revision 187411) +++ gcc/builtins.def (working copy) @@ -532,10 +532,10 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_BZERO, DEF_EXT_LIB_BUILTIN(BUILT_IN_INDEX, index, BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_MEMCHR, memchr, BT_FN_PTR_CONST_PTR_INT_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_MEMCMP, memcmp, BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN(BUILT_IN_MEMCPY, memcpy, BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN(BUILT_IN_MEMMOVE, memmove, BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_LIB_BUILTIN(BUILT_IN_MEMCPY, memcpy, BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) +DEF_LIB_BUILTIN(BUILT_IN_MEMMOVE, memmove, BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMPCPY, mempcpy, BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN(BUILT_IN_MEMSET, memset, BT_FN_PTR_PTR_INT_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_LIB_BUILTIN(BUILT_IN_MEMSET, memset, BT_FN_PTR_PTR_INT_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_RINDEX, rindex, BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STPCPY, stpcpy, BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, stpncpy, BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_NOTHROW_NONNULL_LEAF) @@ -543,7 +543,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_STRCASE DEF_LIB_BUILTIN(BUILT_IN_STRCAT, strcat, BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_STRCHR, strchr, BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_STRCMP, strcmp, BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN(BUILT_IN_STRCPY, strcpy, BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) +DEF_LIB_BUILTIN(BUILT_IN_STRCPY, strcpy, BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_STRCSPN, strcspn, BT_FN_SIZE_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STRDUP, strdup, BT_FN_STRING_CONST_STRING, ATTR_MALLOC_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STRNDUP, strndup, BT_FN_STRING_CONST_STRING_SIZE, ATTR_MALLOC_NOTHROW_NONNULL_LEAF) @@ -757,8 +757,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_A /* Object size checking builtins. */ DEF_GCC_BUILTIN (BUILT_IN_OBJECT_SIZE, object_size, BT_FN_SIZE_CONST_PTR_INT, ATTR_PURE_NOTHROW_LEAF_LIST) -DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMCPY_CHK, __memcpy_chk, BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMMOVE_CHK, __memmove_chk, BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMCPY_CHK, __memcpy_chk, BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMMOVE_CHK, __memmove_chk, BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMPCPY_CHK, __mempcpy_chk, BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMSET_CHK, __memset_chk, BT_FN_PTR_PTR_INT_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STPCPY_CHK, __stpcpy_chk, BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_NOTHROW_NONNULL_LEAF) You are missing also BUILT_IN_STRNCPY, BUILT_IN_MEMSET_CHK, BUILT_IN_STRCPY_CHK and BUILT_IN_STRNCPY_CHK. BUILT_IN_ASSUME_ALIGNED also does, but it should be optimized away far before expansion already. Jakub
Re: Optimize calls to functions that return one of their arguments
On 05/14/2012 08:08 AM, Bernd Schmidt wrote: On 05/09/2012 05:53 PM, Vladimir Makarov wrote: It is pretty interesting and original idea. My only major objection to the patch is about treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM. I think it should be accumulated as ALLOCNO_CALLS_CROSSED_NUM. Changed (places in ira-buld.c). I don't expect that this micro-optimization improves SPEC2000, but it will improve some tests. So it is good to have it. It would be really interesting to see the optimization impact on SPEC2000. I think I'll do it for myself in a week. So IRA part of the patch is ok for me if you modify treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM as it is done for ALLOCNO_CALLS_CROSSED_NUM (when upper region allocnos accumulate the values from the corresponding allocnos from its sub-regions). Ok, that looks like three votes in favour. I've checked in the following variant with a few minor changes. I've added strcpy and memset to the list of functions, and split off a new function in ira-lives. IRA part is ok for me now. Thanks, Bernd.
Re: Optimize calls to functions that return one of their arguments
On 04/28/2012 11:31 AM, Bernd Schmidt wrote: This patch allows us to recognize that even if the argument to memcpy lives across the call, we can allocate it to a call-used register by reusing the return value of the function. First, the patch sets the existing fn spec attribute for memcpy/memmove. This is translated to a new form of CALL_INSN_FUNCTION_USAGE, a (set (returnreg) (argreg)). This is recognized by IRA to adjust costs, and for communicating to caller-save that the register can be restored cheaply. The optimization only triggers if the argument is passed in a register, which should be the case in the majority of sane ABIs. The effect on the new testcase: pushq%rbx |subq$8, %rsp movslq%edx, %rdxmovslq%edx, %rdx movq%rdi, %rbx callmemcpycallmemcpy movq%rbx, %rax |addq$8, %rsp popq%rbx retret Bernd, sorry for some delay. I needed to think about the patch. It is pretty interesting and original idea. My only major objection to the patch is about treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM. I think it should be accumulated as ALLOCNO_CALLS_CROSSED_NUM. Otherwise, I am afraid you will have a degradation in many cases instead of improvement. IRA is a regional allocator. The first it tries to do coloring in whole function (seeing a whole picture), then it tries to improve allocation in subregions. When you treat ALLOCNO_CHEAP_CALLS_CROSSED_NUM not accumulated (it means not taking subregions into account) you mislead allocation in the region containing subregions. For example, if the single call is in a subregion, ALLOCNO_CHEAP_CALLS_CROSSED_NUM for the subregion allocno will be 1 but in whole program allocno will be 0. RA in whole function will tend to allocate callee-saved hard register and RA in the subregion will tend to allocate caller-saved hard register. That will increase a possibility to create additional shuffle insns on the subregion borders and as consequence will degrade the code. I don't expect that this micro-optimization improves SPEC2000, but it will improve some tests. So it is good to have it. It would be really interesting to see the optimization impact on SPEC2000. I think I'll do it for myself in a week. So IRA part of the patch is ok for me if you modify treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM as it is done for ALLOCNO_CALLS_CROSSED_NUM (when upper region allocnos accumulate the values from the corresponding allocnos from its sub-regions).
Re: Optimize calls to functions that return one of their arguments
On 05/03/2012 07:47 PM, Richard Sandiford wrote: Richard Guentherrichard.guent...@gmail.com writes: On Sat, Apr 28, 2012 at 5:31 PM, Bernd Schmidtber...@codesourcery.com wrote: This patch allows us to recognize that even if the argument to memcpy lives across the call, we can allocate it to a call-used register by reusing the return value of the function. Non-IRA RTL bits look good to me FWIW. Vlad, any opinions on the IRA bits? Bernd
Re: Optimize calls to functions that return one of their arguments
On 05/07/2012 10:30 AM, Bernd Schmidt wrote: On 05/03/2012 07:47 PM, Richard Sandiford wrote: Richard Guentherrichard.guent...@gmail.com writes: On Sat, Apr 28, 2012 at 5:31 PM, Bernd Schmidtber...@codesourcery.com wrote: This patch allows us to recognize that even if the argument to memcpy lives across the call, we can allocate it to a call-used register by reusing the return value of the function. Non-IRA RTL bits look good to me FWIW. Vlad, any opinions on the IRA bits? Sorry, Bernd. I was on vacation last week. I'll look at this today.
Re: Optimize calls to functions that return one of their arguments
Richard Guenther richard.guent...@gmail.com writes: On Sat, Apr 28, 2012 at 5:31 PM, Bernd Schmidt ber...@codesourcery.com wrote: This patch allows us to recognize that even if the argument to memcpy lives across the call, we can allocate it to a call-used register by reusing the return value of the function. First, the patch sets the existing fn spec attribute for memcpy/memmove. This is translated to a new form of CALL_INSN_FUNCTION_USAGE, a (set (returnreg) (argreg)). This is recognized by IRA to adjust costs, and for communicating to caller-save that the register can be restored cheaply. The optimization only triggers if the argument is passed in a register, which should be the case in the majority of sane ABIs. The effect on the new testcase: pushq %rbx | subq $8, %rsp movslq %edx, %rdx movslq %edx, %rdx movq %rdi, %rbx call memcpy call memcpy movq %rbx, %rax | addq $8, %rsp popq %rbx ret ret Bootstrapped with all languages on i686-linux, and bootstrapped and tested minus Ada on x86_64-linux. There's one Go test which seems to fail randomly both with and without the patch: FAIL: go.test/test/stack.go execution, -O2 -g Ok? Heh, interesting idea. I suppose you chose memcpy/memmove because the middle-end emits those for block moves? I see you've gone all the way generalizing support for this instead of a special hack (an RTL flag on the call, directly set by the expander?) for this case? I've chickened out at adding fnspec annotations to builtins because of the explosion of various strings that would need ;) That said, I like it but will leave the RTL / IRA parts to someone else to look at. Non-IRA RTL bits look good to me FWIW. Just for the record: I wondered whether a reg note would be better than the CALL_INSN_FUNCTION_USAGE SET, but while it would probably be enough for this one case (and probably less invasive), I agree the SET has a nice feel of generality about it. It would work for functions that return two values, etc. It also makes the information self-contained; no grubbing around in the PATTERN to find the return register. I looked through the walks of CALL_INSN_FUNCTION_USAGE and agree the ones you changed seem to be the only ones that need to change. Nit, but it'd be good to fix a preexisting typo in the docs: -A @code{MEM} generally points to a stack slots in which arguments passed +A @code{mem} generally points to a stack slots in which arguments passed (s/slots/slot/) too. There were also a couple of long lines in the IRA part. Richard
Re: Optimize calls to functions that return one of their arguments
On Sat, Apr 28, 2012 at 5:31 PM, Bernd Schmidt ber...@codesourcery.com wrote: This patch allows us to recognize that even if the argument to memcpy lives across the call, we can allocate it to a call-used register by reusing the return value of the function. First, the patch sets the existing fn spec attribute for memcpy/memmove. This is translated to a new form of CALL_INSN_FUNCTION_USAGE, a (set (returnreg) (argreg)). This is recognized by IRA to adjust costs, and for communicating to caller-save that the register can be restored cheaply. The optimization only triggers if the argument is passed in a register, which should be the case in the majority of sane ABIs. The effect on the new testcase: pushq %rbx | subq $8, %rsp movslq %edx, %rdx movslq %edx, %rdx movq %rdi, %rbx call memcpy call memcpy movq %rbx, %rax | addq $8, %rsp popq %rbx ret ret Bootstrapped with all languages on i686-linux, and bootstrapped and tested minus Ada on x86_64-linux. There's one Go test which seems to fail randomly both with and without the patch: FAIL: go.test/test/stack.go execution, -O2 -g Ok? Heh, interesting idea. I suppose you chose memcpy/memmove because the middle-end emits those for block moves? I see you've gone all the way generalizing support for this instead of a special hack (an RTL flag on the call, directly set by the expander?) for this case? I've chickened out at adding fnspec annotations to builtins because of the explosion of various strings that would need ;) That said, I like it but will leave the RTL / IRA parts to someone else to look at. Thanks, Richard. Bernd