Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-20 Thread Tom de Vries

On 19-06-14 20:41, Richard Henderson wrote:

On 06/19/2014 11:25 AM, Tom de Vries wrote:

On 19-06-14 05:53, Richard Henderson wrote:

On 06/01/2014 03:00 AM, Tom de Vries wrote:

+aarch64_emit_call_insn (rtx pat)
+{
+  rtx insn = emit_call_insn (pat);
+
+  rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn);
+  clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
+  clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));

Actually, I'd like to know more about how this is supposed to work.

Why are you only marking the two registers that would be used by a PLT entry,
but not those clobbered by the ld.so trampoline, or indeed the unknown function
that would be called from the PLT.

Oh, I see, looking at the code we do actually follow the cgraph and make sure
it is a direct call with a known destination.  So, in fact, it's only the
registers that could be clobbered by ld branch islands (so these two are still
correct for aarch64).

This means the documentation is actually wrong when it mentions PLTs at all.


Yes, if we go from the point of view that the
TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS hooks sole purpose is to enable
the fuse-caller-save optimization.

How about this updated definition ? OK for trunk if re-testing on arm succeeds ?


I did like the doc including mention of stubs, because they're easy to
forget.  How about

Set to true if each call that binds to a local definition explicitly clobbers
or sets all non-fixed registers modified by performing the call.  That is, by
the call pattern itself, or by code that might be inserted by the linker
(e.g. stubs, veneers, branch islands), but not including those modifiable by
the callee.  The affected registers may be mentioned explicitly in the
call pattern, or included as clobbers in CALL_INSN_FUNCTION_USAGE.
The default version of this hook is set to false.  The purpose of this hook
is to enable the fuse-caller-save optimization.




Looks good to me.  Bootstrapped and committed as attached.

Thanks,
- Tom

2014-06-20  Tom de Vries  t...@codesourcery.com

	* target.def (call_fusage_contains_non_callee_clobbers): Update
	definition.
	* doc/tm.texi: Regenerate.

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c272630..45281ae 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4884,14 +4884,14 @@ Whether this target supports splitting the stack when the options described in @
 @cindex miscellaneous register hooks
 
 @deftypevr {Target Hook} bool TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS
-set to true if all the calls in the current function contain clobbers in
-CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call
-rather than by the callee, and are not already set or clobbered in the call
-pattern.  Examples of such registers are registers used in PLTs and stubs,
-and temporary registers used in the call instruction but not present in the
-rtl pattern.  Another way to formulate it is the registers not present in the
-rtl pattern that are clobbered by the call assuming the callee does not
-clobber any register.  The default version of this hook is set to false.
+Set to true if each call that binds to a local definition explicitly
+clobbers or sets all non-fixed registers modified by performing the call.
+That is, by the call pattern itself, or by code that might be inserted by the
+linker (e.g. stubs, veneers, branch islands), but not including those
+modifiable by the callee.  The affected registers may be mentioned explicitly
+in the call pattern, or included as clobbers in CALL_INSN_FUNCTION_USAGE.
+The default version of this hook is set to false.  The purpose of this hook
+is to enable the fuse-caller-save optimization.
 @end deftypevr
 
 @node Varargs
diff --git a/gcc/target.def b/gcc/target.def
index e455211..ee250e6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5128,18 +5128,18 @@ FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM, and the PIC_OFFSET_TABLE_REGNUM.,
  hook_void_bitmap)
 
 /* Targets should define this target hook to mark that non-callee clobbers are
-   present in CALL_INSN_FUNCTION_USAGE for all the calls in the current
-   function.  */
+   present in CALL_INSN_FUNCTION_USAGE for all the calls that bind to a local
+   definition.  */
 DEFHOOKPOD
 (call_fusage_contains_non_callee_clobbers,
- set to true if all the calls in the current function contain clobbers in\n\
-CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call\n\
-rather than by the callee, and are not already set or clobbered in the call\n\
-pattern.  Examples of such registers are registers used in PLTs and stubs,\n\
-and temporary registers used in the call instruction but not present in the\n\
-rtl pattern.  Another way to formulate it is the registers not present in the\n\
-rtl pattern that are clobbered by the call assuming the callee does not\n\
-clobber any register.  The default version of this hook is set to false.,
+ Set to true if each call that binds to a local definition explicitly\n\

Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-19 Thread Tom de Vries

On 19-06-14 05:53, Richard Henderson wrote:

Do we in fact make sure this isn't an ifunc resolver?  I don't immediately see
how those get wired up in the cgraph...


Richard,

using the patch below I changed the 
gcc/testsuite/gcc.target/i386/fuse-caller-save.c testcase to use an ifunc 
resolver, and observed that the fuse-caller-save optimization didn't work.


The reason the optimization doesn't work in this case is that 
default_binds_local_p_1 checks the ifunc attribute:

...
  /* Weakrefs may not bind locally, even though the weakref itself is always
 static and therefore local.  Similarly, the resolver for ifunc functions
 might resolve to a non-local function.
 FIXME: We can resolve the weakref case more curefuly by looking at the
 weakref alias.  */
  else if (lookup_attribute (weakref, DECL_ATTRIBUTES (exp))
   || (TREE_CODE (exp) == FUNCTION_DECL
lookup_attribute (ifunc, DECL_ATTRIBUTES (exp
local_p = false;
...

The default_binds_local_p_1 function is used via this path in the optimization:
get_call_reg_set_usage - get_call_cgraph_rtl_info - 
decl_binds_to_current_def_p - default_binds_local_p - default_binds_local_p_1 .


Thanks,
- Tom


diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
index 4ec4995..012dc12 100644
--- a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
+++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
@@ -5,11 +5,18 @@
 /* Testing -fuse-caller-save optimization option.  */
 
 static int __attribute__((noinline))
-bar (int x)
+my_bar (int x)
 {
   return x + 3;
 }
 
+static void (*resolve_bar (void)) (void)
+{
+  return (void*) my_bar;
+}
+
+static int __attribute__((noinline)) __attribute__((ifunc (resolve_bar))) bar (int x);
+
 int __attribute__((noinline))
 foo (int y)
 {
-- 
1.9.1



Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-19 Thread Tom de Vries

On 19-06-14 05:21, Richard Henderson wrote:

On 06/01/2014 03:00 AM, Tom de Vries wrote:

+/* Emit call insn with PAT and do aarch64-specific handling.  */
+
+bool
+aarch64_emit_call_insn (rtx pat)
+{
+  rtx insn = emit_call_insn (pat);
+
+  rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn);
+  clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
+  clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));
+}
+


Which can't have been bootstrapped, since this has no return stmt.
Why the bool return type anyway?  Nothing appears to use it.



Richard,

Indeed, the return type should be void, this patch fixes that.

I have no setup to bootstrap this on aarch64. I've build an aarch64 compiler and 
ran the gcc.target/aarch64/fuse-caller-save.c testcase.


Committed as obvious.

Thanks,
- Tom
2014-06-19  Tom de Vries  t...@codesourcery.com

	* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Change
	return type to void.
	* config/aarch64/aarch64.c (aarch64_emit_call_insn): Same.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 213c8dc..53023ba 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -245,7 +245,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
 void aarch64_init_expanders (void);
 void aarch64_print_operand (FILE *, rtx, char);
 void aarch64_print_operand_address (FILE *, rtx);
-bool aarch64_emit_call_insn (rtx);
+void aarch64_emit_call_insn (rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b2d005b..f0aafbd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3395,7 +3395,7 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
 
 /* Emit call insn with PAT and do aarch64-specific handling.  */
 
-bool
+void
 aarch64_emit_call_insn (rtx pat)
 {
   rtx insn = emit_call_insn (pat);
-- 
1.9.1



Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-19 Thread Richard Henderson
On 06/19/2014 01:39 AM, Tom de Vries wrote:
 On 19-06-14 05:53, Richard Henderson wrote:
 Do we in fact make sure this isn't an ifunc resolver?  I don't immediately 
 see
 how those get wired up in the cgraph...
 
 Richard,
 
 using the patch below I changed the
 gcc/testsuite/gcc.target/i386/fuse-caller-save.c testcase to use an ifunc
 resolver, and observed that the fuse-caller-save optimization didn't work.
 
 The reason the optimization doesn't work in this case is that
 default_binds_local_p_1 checks the ifunc attribute:
 ...
   /* Weakrefs may not bind locally, even though the weakref itself is always
  static and therefore local.  Similarly, the resolver for ifunc functions
  might resolve to a non-local function.
  FIXME: We can resolve the weakref case more curefuly by looking at the
  weakref alias.  */
   else if (lookup_attribute (weakref, DECL_ATTRIBUTES (exp))
|| (TREE_CODE (exp) == FUNCTION_DECL
 lookup_attribute (ifunc, DECL_ATTRIBUTES (exp
 local_p = false;
 ...
 
 The default_binds_local_p_1 function is used via this path in the 
 optimization:
 get_call_reg_set_usage - get_call_cgraph_rtl_info -
 decl_binds_to_current_def_p - default_binds_local_p - 
 default_binds_local_p_1 .

Excellent.  Thanks for doing the digging I was too lazy to finish last night.


r~



Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-19 Thread Tom de Vries

On 19-06-14 05:53, Richard Henderson wrote:

On 06/01/2014 03:00 AM, Tom de Vries wrote:

+aarch64_emit_call_insn (rtx pat)
+{
+  rtx insn = emit_call_insn (pat);
+
+  rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn);
+  clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
+  clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));

Actually, I'd like to know more about how this is supposed to work.

Why are you only marking the two registers that would be used by a PLT entry,
but not those clobbered by the ld.so trampoline, or indeed the unknown function
that would be called from the PLT.

Oh, I see, looking at the code we do actually follow the cgraph and make sure
it is a direct call with a known destination.  So, in fact, it's only the
registers that could be clobbered by ld branch islands (so these two are still
correct for aarch64).

This means the documentation is actually wrong when it mentions PLTs at all.


Yes, if we go from the point of view that the 
TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS hooks sole purpose is to enable 
the fuse-caller-save optimization.


How about this updated definition ? OK for trunk if re-testing on arm succeeds ?

Thanks,
- Tom


2014-06-19  Tom de Vries  t...@codesourcery.com

	* config/arm/arm.c (arm_emit_call_insn): Remove clobber of CC_REGNUM.
	* target.def: Update defition.
	* doc/tm.texi: Regenerate.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d293b5b..178f08b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17642,11 +17642,11 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
   if (TARGET_AAPCS_BASED)
 {
   /* For AAPCS, IP and CC can be clobbered by veneers inserted by the
-	 linker.  We need to add these to allow setting
-	 TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true.  */
+	 linker.  We need to add IP to allow setting
+	 TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true.  CC is not
+	 needed since it's a fixed register.  */
   rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn);
   clobber_reg (fusage, gen_rtx_REG (word_mode, IP_REGNUM));
-  clobber_reg (fusage, gen_rtx_REG (word_mode, CC_REGNUM));
 }
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c272630..b0a8dbd 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4884,14 +4884,11 @@ Whether this target supports splitting the stack when the options described in @
 @cindex miscellaneous register hooks
 
 @deftypevr {Target Hook} bool TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS
-set to true if all the calls in the current function contain clobbers in
-CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call
-rather than by the callee, and are not already set or clobbered in the call
-pattern.  Examples of such registers are registers used in PLTs and stubs,
-and temporary registers used in the call instruction but not present in the
-rtl pattern.  Another way to formulate it is the registers not present in the
-rtl pattern that are clobbered by the call assuming the callee does not
-clobber any register.  The default version of this hook is set to false.
+Set to true if each call that binds to a local definition contain clobbers
+in CALL_INSN_FUNCTION_USAGE for the non-fixed registers that are clobbered by
+the call rather than by the callee, and are not already set or clobbered in
+the call pattern.  The default version of this hook is set to false.  The
+purpose of this hook it to enable the fuse-caller-save optimization.
 @end deftypevr
 
 @node Varargs
diff --git a/gcc/target.def b/gcc/target.def
index e455211..b738281 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5128,18 +5128,15 @@ FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM, and the PIC_OFFSET_TABLE_REGNUM.,
  hook_void_bitmap)
 
 /* Targets should define this target hook to mark that non-callee clobbers are
-   present in CALL_INSN_FUNCTION_USAGE for all the calls in the current
-   function.  */
+   present in CALL_INSN_FUNCTION_USAGE for all the calls that bind to a local
+   definition.  */
 DEFHOOKPOD
 (call_fusage_contains_non_callee_clobbers,
- set to true if all the calls in the current function contain clobbers in\n\
-CALL_INSN_FUNCTION_USAGE for the registers that are clobbered by the call\n\
-rather than by the callee, and are not already set or clobbered in the call\n\
-pattern.  Examples of such registers are registers used in PLTs and stubs,\n\
-and temporary registers used in the call instruction but not present in the\n\
-rtl pattern.  Another way to formulate it is the registers not present in the\n\
-rtl pattern that are clobbered by the call assuming the callee does not\n\
-clobber any register.  The default version of this hook is set to false.,
+ Set to true if each call that binds to a local definition contain clobbers\n\
+in CALL_INSN_FUNCTION_USAGE for the non-fixed registers that are clobbered by\n\
+the call rather than by the callee, and are not already set or clobbered in\n\
+the call pattern.  

Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-19 Thread Richard Henderson
On 06/19/2014 11:25 AM, Tom de Vries wrote:
 On 19-06-14 05:53, Richard Henderson wrote:
 On 06/01/2014 03:00 AM, Tom de Vries wrote:
 +aarch64_emit_call_insn (rtx pat)
 +{
 +  rtx insn = emit_call_insn (pat);
 +
 +  rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn);
 +  clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
 +  clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));
 Actually, I'd like to know more about how this is supposed to work.

 Why are you only marking the two registers that would be used by a PLT entry,
 but not those clobbered by the ld.so trampoline, or indeed the unknown 
 function
 that would be called from the PLT.

 Oh, I see, looking at the code we do actually follow the cgraph and make sure
 it is a direct call with a known destination.  So, in fact, it's only the
 registers that could be clobbered by ld branch islands (so these two are 
 still
 correct for aarch64).

 This means the documentation is actually wrong when it mentions PLTs at all.
 
 Yes, if we go from the point of view that the
 TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS hooks sole purpose is to 
 enable
 the fuse-caller-save optimization.
 
 How about this updated definition ? OK for trunk if re-testing on arm 
 succeeds ?

I did like the doc including mention of stubs, because they're easy to
forget.  How about

Set to true if each call that binds to a local definition explicitly clobbers
or sets all non-fixed registers modified by performing the call.  That is, by
the call pattern itself, or by code that might be inserted by the linker
(e.g. stubs, veneers, branch islands), but not including those modifiable by
the callee.  The affected registers may be mentioned explicitly in the
call pattern, or included as clobbers in CALL_INSN_FUNCTION_USAGE.
The default version of this hook is set to false.  The purpose of this hook
is to enable the fuse-caller-save optimization.


r~


Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-18 Thread Richard Henderson
On 06/01/2014 03:00 AM, Tom de Vries wrote:
 +/* Emit call insn with PAT and do aarch64-specific handling.  */
 +
 +bool
 +aarch64_emit_call_insn (rtx pat)
 +{
 +  rtx insn = emit_call_insn (pat);
 +
 +  rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn);
 +  clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
 +  clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));
 +}
 +

Which can't have been bootstrapped, since this has no return stmt.
Why the bool return type anyway?  Nothing appears to use it.


r~


Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-18 Thread Richard Henderson
On 06/01/2014 03:00 AM, Tom de Vries wrote:
 +aarch64_emit_call_insn (rtx pat)
 +{
 +  rtx insn = emit_call_insn (pat);
 +
 +  rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn);
 +  clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM));
 +  clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM));

Actually, I'd like to know more about how this is supposed to work.

Why are you only marking the two registers that would be used by a PLT entry,
but not those clobbered by the ld.so trampoline, or indeed the unknown function
that would be called from the PLT.

Oh, I see, looking at the code we do actually follow the cgraph and make sure
it is a direct call with a known destination.  So, in fact, it's only the
registers that could be clobbered by ld branch islands (so these two are still
correct for aarch64).

This means the documentation is actually wrong when it mentions PLTs at all.

Do we in fact make sure this isn't an ifunc resolver?  I don't immediately see
how those get wired up in the cgraph...


r~


Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-18 Thread Marcus Shawcroft
On 1 June 2014 11:00, Tom de Vries tom_devr...@mentor.com wrote:
 Richard,

 This patch:
 - adds the for TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS required
   clobbers in CALL_INSN_FUNCTION_USAGE,
 - sets TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true, which
 enables
   the fuse-caller-save optimisation, and
 - adds an aarch64 fuse-caller-save test-case.

 Build and tested on aarch64-linux-gnu.

 OK for trunk?

 Thanks,
 - Tom


OK
/Marcus