Re: Optimize calls to functions that return one of their arguments

2012-05-15 Thread Eric Botcazou
 Ok, that looks like three votes in favour. I've checked in the following
 variant with a few minor changes. I've added strcpy and memset to the
 list of functions, and split off a new function in ira-lives.

Changes to subdirectories must be documented in the subdirectory's ChangeLog 
(ada, c-family, lto) and not in the main ChangeLog.

-- 
Eric Botcazou


Re: Optimize calls to functions that return one of their arguments

2012-05-15 Thread Bernd Schmidt

On 05/15/2012 09:12 AM, Eric Botcazou wrote:

Ok, that looks like three votes in favour. I've checked in the following
variant with a few minor changes. I've added strcpy and memset to the
list of functions, and split off a new function in ira-lives.


Changes to subdirectories must be documented in the subdirectory's ChangeLog
(ada, c-family, lto) and not in the main ChangeLog.


I checked in a fix.


Bernd




Re: Optimize calls to functions that return one of their arguments

2012-05-15 Thread Eric Botcazou
 I checked in a fix.

Thanks!

-- 
Eric Botcazou


Re: Optimize calls to functions that return one of their arguments

2012-05-14 Thread Bernd Schmidt

On 05/09/2012 05:53 PM, Vladimir Makarov wrote:

It is pretty interesting and original idea. My only major objection to
the patch is about treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM. I think
it should be accumulated as ALLOCNO_CALLS_CROSSED_NUM.


Changed (places in ira-buld.c).


I don't expect that this micro-optimization improves SPEC2000, but it
will improve some tests. So it is good to have it. It would be really
interesting to see the optimization impact on SPEC2000. I think I'll do
it for myself in a week.

So IRA part of the patch is ok for me if you modify treatment of
ALLOCNO_CHEAP_CALLS_CROSSED_NUM as it is done for
ALLOCNO_CALLS_CROSSED_NUM (when upper region allocnos accumulate the
values from the corresponding allocnos from its sub-regions).


Ok, that looks like three votes in favour. I've checked in the following 
variant with a few minor changes. I've added strcpy and memset to the 
list of functions, and split off a new function in ira-lives.



Bernd
Index: gcc/attribs.c
===
--- gcc/attribs.c	(revision 187411)
+++ gcc/attribs.c	(working copy)
@@ -312,8 +312,9 @@ decl_attributes (tree *node, tree attrib
 
   if (spec == NULL)
 	{
-	  warning (OPT_Wattributes, %qE attribute directive ignored,
-		   name);
+	  if (!(flags  (int) ATTR_FLAG_BUILT_IN))
+	warning (OPT_Wattributes, %qE attribute directive ignored,
+		 name);
 	  continue;
 	}
   else if (list_length (args)  spec-min_length
Index: gcc/doc/rtl.texi
===
--- gcc/doc/rtl.texi	(revision 187411)
+++ gcc/doc/rtl.texi	(working copy)
@@ -3455,20 +3455,26 @@ unpredictably.
 @code{call_insn} insns have the same extra fields as @code{insn} insns,
 accessed in the same way and in addition contain a field
 @code{CALL_INSN_FUNCTION_USAGE}, which contains a list (chain of
-@code{expr_list} expressions) containing @code{use} and @code{clobber}
-expressions that denote hard registers and @code{MEM}s used or
-clobbered by the called function.
+@code{expr_list} expressions) containing @code{use}, @code{clobber} and
+sometimes @code{set} expressions that denote hard registers and
+@code{mem}s used or clobbered by the called function.
 
-A @code{MEM} generally points to a stack slots in which arguments passed
+A @code{mem} generally points to a stack slot in which arguments passed
 to the libcall by reference (@pxref{Register Arguments,
 TARGET_PASS_BY_REFERENCE}) are stored.  If the argument is
 caller-copied (@pxref{Register Arguments, TARGET_CALLEE_COPIES}),
-the stack slot will be mentioned in @code{CLOBBER} and @code{USE}
-entries; if it's callee-copied, only a @code{USE} will appear, and the
-@code{MEM} may point to addresses that are not stack slots.
+the stack slot will be mentioned in @code{clobber} and @code{use}
+entries; if it's callee-copied, only a @code{use} will appear, and the
+@code{mem} may point to addresses that are not stack slots.
 
-@code{CLOBBER}ed registers in this list augment registers specified in
-@code{CALL_USED_REGISTERS} (@pxref{Register Basics}).
+Registers occurring inside a @code{clobber} in this list augment
+registers specified in @code{CALL_USED_REGISTERS} (@pxref{Register
+Basics}).
+
+If the list contains a @code{set} involving two registers, it indicates
+that the function returns one of its arguments.  Such a @code{set} may
+look like a no-op if the same register holds the argument and the return
+value.
 
 @findex code_label
 @findex CODE_LABEL_NUMBER
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 187411)
+++ gcc/c-family/c-common.c	(working copy)
@@ -4603,11 +4603,13 @@ enum built_in_attribute
 {
 #define DEF_ATTR_NULL_TREE(ENUM) ENUM,
 #define DEF_ATTR_INT(ENUM, VALUE) ENUM,
+#define DEF_ATTR_STRING(ENUM, VALUE) ENUM,
 #define DEF_ATTR_IDENT(ENUM, STRING) ENUM,
 #define DEF_ATTR_TREE_LIST(ENUM, PURPOSE, VALUE, CHAIN) ENUM,
 #include builtin-attrs.def
 #undef DEF_ATTR_NULL_TREE
 #undef DEF_ATTR_INT
+#undef DEF_ATTR_STRING
 #undef DEF_ATTR_IDENT
 #undef DEF_ATTR_TREE_LIST
   ATTR_LAST
@@ -5926,6 +5928,8 @@ c_init_attributes (void)
   built_in_attributes[(int) ENUM] = NULL_TREE;
 #define DEF_ATTR_INT(ENUM, VALUE)\
   built_in_attributes[(int) ENUM] = build_int_cst (integer_type_node, VALUE);
+#define DEF_ATTR_STRING(ENUM, VALUE)\
+  built_in_attributes[(int) ENUM] = build_string (strlen (VALUE), VALUE);
 #define DEF_ATTR_IDENT(ENUM, STRING)\
   built_in_attributes[(int) ENUM] = get_identifier (STRING);
 #define DEF_ATTR_TREE_LIST(ENUM, PURPOSE, VALUE, CHAIN)	\
Index: gcc/postreload.c
===
--- gcc/postreload.c	(revision 187411)
+++ gcc/postreload.c	(working copy)
@@ -1358,8 +1358,10 @@ reload_combine (void)
 	  for (link = CALL_INSN_FUNCTION_USAGE (insn); link;
 	   link = XEXP (link, 1))
 	{
-	 

Re: Optimize calls to functions that return one of their arguments

2012-05-14 Thread Jakub Jelinek
On Mon, May 14, 2012 at 02:08:34PM +0200, Bernd Schmidt wrote:
 --- gcc/builtins.def  (revision 187411)
 +++ gcc/builtins.def  (working copy)
 @@ -532,10 +532,10 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_BZERO,
  DEF_EXT_LIB_BUILTIN(BUILT_IN_INDEX, index, 
 BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF)
  DEF_LIB_BUILTIN(BUILT_IN_MEMCHR, memchr, 
 BT_FN_PTR_CONST_PTR_INT_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF)
  DEF_LIB_BUILTIN(BUILT_IN_MEMCMP, memcmp, 
 BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF)
 -DEF_LIB_BUILTIN(BUILT_IN_MEMCPY, memcpy, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
 -DEF_LIB_BUILTIN(BUILT_IN_MEMMOVE, memmove, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
 +DEF_LIB_BUILTIN(BUILT_IN_MEMCPY, memcpy, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF)
 +DEF_LIB_BUILTIN(BUILT_IN_MEMMOVE, memmove, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMPCPY, mempcpy, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
 -DEF_LIB_BUILTIN(BUILT_IN_MEMSET, memset, BT_FN_PTR_PTR_INT_SIZE, 
 ATTR_NOTHROW_NONNULL_LEAF)
 +DEF_LIB_BUILTIN(BUILT_IN_MEMSET, memset, BT_FN_PTR_PTR_INT_SIZE, 
 ATTR_RET1_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_RINDEX, rindex, 
 BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_STPCPY, stpcpy, 
 BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, stpncpy, 
 BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
 @@ -543,7 +543,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_STRCASE
  DEF_LIB_BUILTIN(BUILT_IN_STRCAT, strcat, 
 BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF)
  DEF_LIB_BUILTIN(BUILT_IN_STRCHR, strchr, 
 BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF)
  DEF_LIB_BUILTIN(BUILT_IN_STRCMP, strcmp, 
 BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF)
 -DEF_LIB_BUILTIN(BUILT_IN_STRCPY, strcpy, 
 BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF)
 +DEF_LIB_BUILTIN(BUILT_IN_STRCPY, strcpy, 
 BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF)
  DEF_LIB_BUILTIN(BUILT_IN_STRCSPN, strcspn, 
 BT_FN_SIZE_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_STRDUP, strdup, 
 BT_FN_STRING_CONST_STRING, ATTR_MALLOC_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_STRNDUP, strndup, 
 BT_FN_STRING_CONST_STRING_SIZE, ATTR_MALLOC_NOTHROW_NONNULL_LEAF)
 @@ -757,8 +757,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_A
  
  /* Object size checking builtins.  */
  DEF_GCC_BUILTIN (BUILT_IN_OBJECT_SIZE, object_size, 
 BT_FN_SIZE_CONST_PTR_INT, ATTR_PURE_NOTHROW_LEAF_LIST)
 -DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMCPY_CHK, __memcpy_chk, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
 -DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMMOVE_CHK, __memmove_chk, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
 +DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMCPY_CHK, __memcpy_chk, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF)
 +DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMMOVE_CHK, __memmove_chk, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMPCPY_CHK, __mempcpy_chk, 
 BT_FN_PTR_PTR_CONST_PTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_MEMSET_CHK, __memset_chk, 
 BT_FN_PTR_PTR_INT_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
  DEF_EXT_LIB_BUILTIN(BUILT_IN_STPCPY_CHK, __stpcpy_chk, 
 BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_NOTHROW_NONNULL_LEAF)

You are missing also BUILT_IN_STRNCPY, BUILT_IN_MEMSET_CHK,
BUILT_IN_STRCPY_CHK and BUILT_IN_STRNCPY_CHK.
BUILT_IN_ASSUME_ALIGNED also does, but it should be optimized away
far before expansion already.

Jakub


Re: Optimize calls to functions that return one of their arguments

2012-05-14 Thread Vladimir Makarov

On 05/14/2012 08:08 AM, Bernd Schmidt wrote:

On 05/09/2012 05:53 PM, Vladimir Makarov wrote:

It is pretty interesting and original idea. My only major objection to
the patch is about treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM. I think
it should be accumulated as ALLOCNO_CALLS_CROSSED_NUM.


Changed (places in ira-buld.c).


I don't expect that this micro-optimization improves SPEC2000, but it
will improve some tests. So it is good to have it. It would be really
interesting to see the optimization impact on SPEC2000. I think I'll do
it for myself in a week.

So IRA part of the patch is ok for me if you modify treatment of
ALLOCNO_CHEAP_CALLS_CROSSED_NUM as it is done for
ALLOCNO_CALLS_CROSSED_NUM (when upper region allocnos accumulate the
values from the corresponding allocnos from its sub-regions).


Ok, that looks like three votes in favour. I've checked in the 
following variant with a few minor changes. I've added strcpy and 
memset to the list of functions, and split off a new function in 
ira-lives.



IRA part is ok for me now.  Thanks, Bernd.


Re: Optimize calls to functions that return one of their arguments

2012-05-09 Thread Vladimir Makarov

On 04/28/2012 11:31 AM, Bernd Schmidt wrote:
This patch allows us to recognize that even if the argument to memcpy 
lives across the call, we can allocate it to a call-used register by 
reusing the return value of the function.


First, the patch sets the existing fn spec attribute for 
memcpy/memmove. This is translated to a new form of 
CALL_INSN_FUNCTION_USAGE, a (set (returnreg) (argreg)). This is 
recognized by IRA to adjust costs, and for communicating to 
caller-save that the register can be restored cheaply.


The optimization only triggers if the argument is passed in a 
register, which should be the case in the majority of sane ABIs. The 
effect on the new testcase:


pushq%rbx  |subq$8, %rsp
movslq%edx, %rdxmovslq%edx, %rdx
movq%rdi, %rbx 
callmemcpycallmemcpy
movq%rbx, %rax  |addq$8, %rsp
popq%rbx 
retret


Bernd, sorry for some delay.  I needed to think about the patch.

It is pretty interesting and original idea.  My only major objection to 
the patch is about treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM.  I 
think it should be accumulated as ALLOCNO_CALLS_CROSSED_NUM. Otherwise, 
I am afraid you will have a degradation in many cases instead of 
improvement.


IRA is a regional allocator.  The first it tries to do coloring in whole 
function (seeing a whole picture), then it tries to improve allocation 
in subregions.  When you treat ALLOCNO_CHEAP_CALLS_CROSSED_NUM not 
accumulated (it means not taking subregions into account) you mislead 
allocation in the region containing subregions.


For example, if the single call is in a subregion, 
ALLOCNO_CHEAP_CALLS_CROSSED_NUM for the subregion allocno will be 1 but 
in whole program allocno will be 0.  RA in whole function will tend to 
allocate callee-saved hard register and RA in the subregion will tend to 
allocate caller-saved hard register.  That will increase a possibility 
to create additional shuffle insns on the subregion borders and as 
consequence will degrade the code.


I don't expect that this micro-optimization improves SPEC2000, but it 
will improve some tests.  So it is good to have it.  It would be really 
interesting to see the optimization impact on SPEC2000.  I think I'll do 
it for myself in a week.


So IRA part of the patch is ok for me if you modify treatment of 
ALLOCNO_CHEAP_CALLS_CROSSED_NUM as it is done for 
ALLOCNO_CALLS_CROSSED_NUM (when upper region allocnos accumulate the 
values from the corresponding allocnos from its sub-regions).




Re: Optimize calls to functions that return one of their arguments

2012-05-07 Thread Bernd Schmidt

On 05/03/2012 07:47 PM, Richard Sandiford wrote:

Richard Guentherrichard.guent...@gmail.com  writes:

On Sat, Apr 28, 2012 at 5:31 PM, Bernd Schmidtber...@codesourcery.com  wrote:

This patch allows us to recognize that even if the argument to memcpy lives
across the call, we can allocate it to a call-used register by reusing the
return value of the function.



Non-IRA RTL bits look good to me FWIW.


Vlad, any opinions on the IRA bits?


Bernd



Re: Optimize calls to functions that return one of their arguments

2012-05-07 Thread Vladimir Makarov

On 05/07/2012 10:30 AM, Bernd Schmidt wrote:

On 05/03/2012 07:47 PM, Richard Sandiford wrote:

Richard Guentherrichard.guent...@gmail.com  writes:
On Sat, Apr 28, 2012 at 5:31 PM, Bernd 
Schmidtber...@codesourcery.com  wrote:
This patch allows us to recognize that even if the argument to 
memcpy lives
across the call, we can allocate it to a call-used register by 
reusing the

return value of the function.



Non-IRA RTL bits look good to me FWIW.


Vlad, any opinions on the IRA bits?



Sorry, Bernd. I was on vacation last week.  I'll look at this today.



Re: Optimize calls to functions that return one of their arguments

2012-05-03 Thread Richard Sandiford
Richard Guenther richard.guent...@gmail.com writes:
 On Sat, Apr 28, 2012 at 5:31 PM, Bernd Schmidt ber...@codesourcery.com 
 wrote:
 This patch allows us to recognize that even if the argument to memcpy lives
 across the call, we can allocate it to a call-used register by reusing the
 return value of the function.

 First, the patch sets the existing fn spec attribute for memcpy/memmove.
 This is translated to a new form of CALL_INSN_FUNCTION_USAGE, a (set
 (returnreg) (argreg)). This is recognized by IRA to adjust costs, and for
 communicating to caller-save that the register can be restored cheaply.

 The optimization only triggers if the argument is passed in a register,
 which should be the case in the majority of sane ABIs. The effect on the new
 testcase:

        pushq   %rbx              |     subq    $8, %rsp
        movslq  %edx, %rdx              movslq  %edx, %rdx
        movq    %rdi, %rbx        
        call    memcpy                  call    memcpy
        movq    %rbx, %rax        |     addq    $8, %rsp
        popq    %rbx              
        ret                                     ret

 Bootstrapped with all languages on i686-linux, and bootstrapped and tested
 minus Ada on x86_64-linux. There's one Go test which seems to fail randomly
 both with and without the patch:
 FAIL: go.test/test/stack.go execution,  -O2 -g

 Ok?

 Heh, interesting idea.  I suppose you chose memcpy/memmove because the
 middle-end emits those for block moves?  I see you've gone all the way
 generalizing support for this instead of a special hack (an RTL flag on the
 call, directly set by the expander?) for this case?

 I've chickened out at adding fnspec annotations to builtins because of the
 explosion of various strings that would need ;)

 That said, I like it but will leave the RTL / IRA parts to someone else to 
 look
 at.

Non-IRA RTL bits look good to me FWIW.  Just for the record: I wondered
whether a reg note would be better than the CALL_INSN_FUNCTION_USAGE SET,
but while it would probably be enough for this one case (and probably
less invasive), I agree the SET has a nice feel of generality about it.
It would work for functions that return two values, etc.  It also makes
the information self-contained; no grubbing around in the PATTERN to
find the return register.

I looked through the walks of CALL_INSN_FUNCTION_USAGE and agree the
ones you changed seem to be the only ones that need to change.

Nit, but it'd be good to fix a preexisting typo in the docs:

-A @code{MEM} generally points to a stack slots in which arguments passed
+A @code{mem} generally points to a stack slots in which arguments passed

(s/slots/slot/) too.  There were also a couple of long lines in the
IRA part.

Richard


Re: Optimize calls to functions that return one of their arguments

2012-05-02 Thread Richard Guenther
On Sat, Apr 28, 2012 at 5:31 PM, Bernd Schmidt ber...@codesourcery.com wrote:
 This patch allows us to recognize that even if the argument to memcpy lives
 across the call, we can allocate it to a call-used register by reusing the
 return value of the function.

 First, the patch sets the existing fn spec attribute for memcpy/memmove.
 This is translated to a new form of CALL_INSN_FUNCTION_USAGE, a (set
 (returnreg) (argreg)). This is recognized by IRA to adjust costs, and for
 communicating to caller-save that the register can be restored cheaply.

 The optimization only triggers if the argument is passed in a register,
 which should be the case in the majority of sane ABIs. The effect on the new
 testcase:

        pushq   %rbx              |     subq    $8, %rsp
        movslq  %edx, %rdx              movslq  %edx, %rdx
        movq    %rdi, %rbx        
        call    memcpy                  call    memcpy
        movq    %rbx, %rax        |     addq    $8, %rsp
        popq    %rbx              
        ret                                     ret

 Bootstrapped with all languages on i686-linux, and bootstrapped and tested
 minus Ada on x86_64-linux. There's one Go test which seems to fail randomly
 both with and without the patch:
 FAIL: go.test/test/stack.go execution,  -O2 -g

 Ok?

Heh, interesting idea.  I suppose you chose memcpy/memmove because the
middle-end emits those for block moves?  I see you've gone all the way
generalizing support for this instead of a special hack (an RTL flag on the
call, directly set by the expander?) for this case?

I've chickened out at adding fnspec annotations to builtins because of the
explosion of various strings that would need ;)

That said, I like it but will leave the RTL / IRA parts to someone else to look
at.

Thanks,
Richard.


 Bernd