Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 07, 2011 at 03:14:02PM -0400, David Edelsohn wrote:
> On Thu, Jul 7, 2011 at 11:53 AM, Richard Guenther richard.guent...@gmail.com wrote:
>
> > Well, that's up to the target maintainers to decide, maybe
> > -mno-nested-functions instead?
>
> Is -mno-nested-functions or -mno-nested-function-pointers too C-centric or
> GCC-centric? I don't know what wording would be more informative, but the
> functionality is available in Pascal, PL/I, Ada, GCC extensions and other
> languages. We're open to suggestions.
>
> > The compiler certainly can't figure out in _all_ cases - but it should be
> > able to handle most of the cases (with LTO even more cases) ok, no?
>
> -mno-r11 is an assertion to the compiler that no function calls through
> pointers will require the static chain. However, I agree that the compiler
> conservatively should be able to figure out some cases itself, which would
> be a good enhancement.

I changed the switch to -mno-pointers-to-nested-functions, as David requested
in private communication.

[gcc]
2011-07-13  Michael Meissner  meiss...@linux.vnet.ibm.com

	* config/rs6000/rs6000.opt (-mpointers-to-nested-functions):
	Rename -mr11.
	* config/rs6000/rs6000.c (rs6000_trampoline_init): Ditto.
	(rs6000_call_indirect_aix): Ditto.
	* config/rs6000/rs6000.md (call_indirect_aix<ptrsize>): Ditto.
	(call_indirect_aix<ptrsize>_internal): Ditto.
	(call_indirect_aix<ptrsize>_nor11): Ditto.
	(call_indirect_aix<ptrsize>_internal2): Ditto.
	(call_value_indirect_aix<ptrsize>): Ditto.
	(call_value_indirect_aix<ptrsize>_internal): Ditto.
	(call_value_indirect_aix<ptrsize>_nor11): Ditto.
	(call_value_indirect_aix<ptrsize>_internal2): Ditto.
	* doc/invoke.texi (RS/6000 and PowerPC Options): Ditto.

[gcc/testsuite]
2011-07-13  Michael Meissner  meiss...@linux.vnet.ibm.com

	* gcc.target/powerpc/no-r11-1.c: Change -mno-r11 to
	-mno-pointers-to-nested-functions.
	* gcc.target/powerpc/no-r11-2.c: Ditto.
	* gcc.target/powerpc/no-r11-3.c: Ditto.
-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com	fax +1 (978) 399-6899

Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 176251)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -521,9 +521,9 @@ mxilinx-fpu
 Target Var(rs6000_xilinx_fpu) Save
 Specify Xilinx FPU.
 
-mr11
-Target Report Var(TARGET_R11) Init(1) Save
-Use/do not use r11 to hold the static link in calls.
+mpointers-to-nested-functions
+Target Report Var(TARGET_POINTERS_TO_NESTED_FUNCTIONS) Init(1) Save
+Use/do not use r11 to hold the static link in calls to functions via pointers.
 
 msave-toc-indirect
 Target Undocumented Var(TARGET_SAVE_TOC_INDIRECT) Save Init(1)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 176251)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -24409,7 +24409,7 @@ rs6000_trampoline_init (rtx m_tramp, tre
       {
         rtx fnmem, fn_reg, toc_reg;
 
-        if (!TARGET_R11)
+        if (!TARGET_POINTERS_TO_NESTED_FUNCTIONS)
           error ("-mno-r11 must not be used if you have trampolines");
 
         fnmem = gen_const_mem (Pmode, force_reg (Pmode, fnaddr));
@@ -27741,7 +27741,7 @@ rs6000_call_indirect_aix (rtx value, rtx
       stack_toc_offset = GEN_INT (TOC_SAVE_OFFSET_32BIT);
       func_toc_offset = GEN_INT (AIX_FUNC_DESC_TOC_32BIT);
       func_sc_offset = GEN_INT (AIX_FUNC_DESC_SC_32BIT);
-      if (TARGET_R11)
+      if (TARGET_POINTERS_TO_NESTED_FUNCTIONS)
         {
           call_func = gen_call_indirect_aix32bit;
           call_value_func = gen_call_value_indirect_aix32bit;
@@ -27757,7 +27757,7 @@ rs6000_call_indirect_aix (rtx value, rtx
       stack_toc_offset = GEN_INT (TOC_SAVE_OFFSET_64BIT);
       func_toc_offset = GEN_INT (AIX_FUNC_DESC_TOC_64BIT);
       func_sc_offset = GEN_INT (AIX_FUNC_DESC_SC_64BIT);
-      if (TARGET_R11)
+      if (TARGET_POINTERS_TO_NESTED_FUNCTIONS)
         {
           call_func = gen_call_indirect_aix64bit;
           call_value_func = gen_call_value_indirect_aix64bit;
@@ -27800,7 +27800,7 @@ rs6000_call_indirect_aix (rtx value, rtx
                                          func_toc_offset));
 
   /* If we have a static chain, load it up.  */
-  if (TARGET_R11)
+  if (TARGET_POINTERS_TO_NESTED_FUNCTIONS)
     {
       func_sc_mem = gen_rtx_MEM (Pmode,
                                  gen_rtx_PLUS (Pmode,
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 176251)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -12386,7 +12386,7 @@ (define_insn_and_split "call_indirect_ai
          (use (match_operand:P 3 "memory_operand" "m,m"))
          (use (reg:P STATIC_CHAIN_REGNUM))
          (clobber
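The offsets that rs6000_call_indirect_aix selects in the diff above (AIX_FUNC_DESC_TOC_* for the callee's TOC and AIX_FUNC_DESC_SC_* for the static chain) index into the AIX-style function descriptor that a function pointer refers to. A minimal C sketch of that layout, assuming the usual three-word descriptor (entry point, TOC pointer, environment word). The struct and helper names are hypothetical, and the offsets are what this assumed layout implies, not values copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit AIX/ELFv1 function descriptor:
   three doublewords, in this order.  */
struct func_desc64 {
  uint64_t entry; /* address of the function's first instruction */
  uint64_t toc;   /* callee's TOC pointer, loaded into r2 */
  uint64_t env;   /* static chain, loaded into r11 when required */
};

/* The same assumed layout with 32-bit words.  */
struct func_desc32 {
  uint32_t entry;
  uint32_t toc;
  uint32_t env;
};

/* Byte offset of the TOC word: the role AIX_FUNC_DESC_TOC_* plays.  */
static size_t desc_toc_offset(int is_64bit) {
  return is_64bit ? offsetof(struct func_desc64, toc)
                  : offsetof(struct func_desc32, toc);
}

/* Byte offset of the static-chain word: the role AIX_FUNC_DESC_SC_*
   plays.  */
static size_t desc_sc_offset(int is_64bit) {
  return is_64bit ? offsetof(struct func_desc64, env)
                  : offsetof(struct func_desc32, env);
}
```

Under this layout the 64-bit offsets are 8 and 16 and the 32-bit offsets are 4 and 8. With -mno-pointers-to-nested-functions, the load from the static-chain offset is simply not emitted.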
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 7, 2011 at 4:19 PM, Richard Guenther richard.guent...@gmail.com wrote:

> Does XLC have a similar switch whose name we can use?

The IBM XL compiler team is discussing a similar feature, but it is not
implemented yet and does not have a formal command-line option name.

- David
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:
> This patch adds an option to not load the static chain (r11) for 64-bit
> PowerPC calls through function pointers (or virtual functions). Most of the
> languages on the PowerPC do not need the static chain to be loaded when
> called, and adding this instruction can slow down code that calls very short
> functions.
>
> In addition, if the function does not call alloca, setjmp or deal with
> exceptions where the stack is modified, the compiler can move the store of
> the TOC value for the current function to the prologue of the function,
> instead of storing it at each call site.
>
> The effect of these patches is to speed up 464.h264ref in the SPEC 2006
> benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but the
> save of the TOC register is hoisted). I believe this is due to the load of
> the current function's TOC (r2) having to wait until the store queue is
> drained with the store just before the call. Unfortunately, I do see a 3%
> slowdown in 429.mcf, whose cause I don't know.
>
> I have bootstrapped the compiler and saw no regressions in make check. Is it
> OK to install in the trunk?

Hum. Can't the compiler figure this out itself per call site? At least the
name of the command-line switch -m[no-]r11 is meaningless to me. Points-to
information should be able to tell you if the function pointer points to a
nested function.

Richard.

> [gcc]
> 2011-07-06  Michael Meissner  meiss...@linux.vnet.ibm.com
> [...]
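The TOC-save hoisting in the proposal can be sketched as a predicate plus a count of the TOC stores executed during one invocation of the caller. This is a minimal model that tracks only the conditions the proposal names (alloca, setjmp, and exception handling that modifies the stack). The struct and its field names are hypothetical and are not GCC's machine_function fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function facts a compiler might collect.  */
struct func_facts {
  bool calls_alloca;
  bool calls_setjmp;
  bool eh_modifies_stack;  /* exception paths that adjust the stack */
  int indirect_call_sites; /* static count of calls through pointers */
};

/* The TOC may be saved once in the prologue only when there is an
   indirect call to protect and nothing can reshape the frame between
   the prologue and a call site.  */
static bool can_save_toc_in_prologue(const struct func_facts *f) {
  return f->indirect_call_sites > 0 && !f->calls_alloca &&
         !f->calls_setjmp && !f->eh_modifies_stack;
}

/* Number of TOC stores executed in one invocation, when the indirect
   call sites run `calls_executed` times in total (e.g. in a loop).  */
static long toc_stores(const struct func_facts *f, long calls_executed) {
  if (f->indirect_call_sites == 0)
    return 0;
  return can_save_toc_in_prologue(f) ? 1 : calls_executed;
}
```

In this model a loop that makes 1000 indirect calls executes one TOC store instead of 1000, which is the effect the proposal attributes the h264ref speedup to. Any alloca in the function falls back to a store at every call.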
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
> Hum. Can't the compiler figure this out itself per call site? At least the
> name of the command-line switch -m[no-]r11 is meaningless to me. Points-to
> information should be able to tell you if the function pointer points to a
> nested function.

Yeah. E.g. for C++ virtual method calls I believe all function pointers in
vtables should always ignore the static chain pointer, etc., because you can't
have a nested method.

	Jakub
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 7, 2011 at 11:03 AM, Jakub Jelinek ja...@redhat.com wrote:
> On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
> > Hum. Can't the compiler figure this out itself per call site? At least
> > the name of the command-line switch -m[no-]r11 is meaningless to me.
> > Points-to information should be able to tell you if the function pointer
> > points to a nested function.
>
> Yeah. E.g. for C++ virtual method calls I believe all function pointers in
> vtables should always ignore the static chain pointer, etc., because you
> can't have a nested method.

For this kind of FE-specific info you could use a flag on the CALL_EXPR as
well.

Richard.
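Jakub's and Richard's suggestion (a front end marking individual calls whose target cannot be nested, such as calls through a C++ vtable) could combine with the global switch roughly as follows. The struct, field and function names are hypothetical stand-ins, not GCC's tree or RTL API. The sketch only shows the decision: the global "no" wins outright, and otherwise a per-call hint can still drop the r11 load.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what is known at one indirect call site.  */
struct call_site {
  /* Set by a front end that knows the callee cannot be a nested
     function, e.g. a call through a C++ vtable slot.  */
  bool fe_no_static_chain;
};

/* `pointers_to_nested_functions` stands in for the global option:
   true by default, false under -mno-pointers-to-nested-functions.  */
static bool load_static_chain(bool pointers_to_nested_functions,
                              const struct call_site *cs) {
  if (!pointers_to_nested_functions)
    return false;                 /* user asserted: never needed */
  return !cs->fe_no_static_chain; /* otherwise stay conservative */
}
```

With the default setting, a virtual call skips the load while a plain function-pointer call keeps it. With the switch turned off, neither call loads r11.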
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
> On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:
> > This patch adds an option to not load the static chain (r11) for 64-bit
> > PowerPC calls through function pointers (or virtual functions). [...]
>
> Hum. Can't the compiler figure this out itself per call site? At least the
> name of the command-line switch -m[no-]r11 is meaningless to me. Points-to
> information should be able to tell you if the function pointer points to a
> nested function.

No, the compiler cannot figure it out. Consider the case where a function is
passed a pointer to a function, such as the standard library function qsort.
The call may come from any random module that isn't part of the compilation
suite, such as when the function being passed the pointer is in a shared
library. You don't know whether the function pointed to uses the static chain
(i.e. a nested function called through a trampoline, a call to PL/I, or
another language that does use the static chain, which is part of the ABI).

The point of the switch is similar to -ffast-math, where you say you are
willing to ignore some corner cases in the standard in order to get better
performance.

I certainly can call the switch -mno-static-chain, which is perhaps more
meaningful (at least to us compiler folk; I'm not sure "static chain" means
much to the normal programmer).

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com	fax +1 (978) 399-6899
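Michael's qsort argument can be illustrated with a small C model. A simplified descriptor pairs an entry point with an environment pointer standing in for the static chain. apply_to_all plays the part of a library routine compiled separately from its callers, so it cannot tell which kind of function it was handed. Passing the chain as an explicit parameter is an assumption of the model: on PowerPC the caller loads it into r11 from the descriptor, and when that load is suppressed the callee sees whatever r11 happens to hold, modeled here as a pointer to an unrelated stale value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* In this model, entry points receive the static chain explicitly.  */
typedef long (*entry_fn)(void *static_chain, long x);

/* Simplified function descriptor: entry point plus environment.  */
struct desc {
  entry_fn entry;
  void *env; /* NULL for ordinary top-level functions */
};

/* Whatever r11 happens to point at when the load is skipped.  */
static long stale_r11_target = 0;

/* Models one indirect call; `load_chain` says whether the caller
   emitted the load of the static chain from the descriptor.  */
static long call_through(const struct desc *d, long x, bool load_chain) {
  void *chain = load_chain ? d->env : (void *)&stale_r11_target;
  return d->entry(chain, x);
}

/* A "library" routine: sums f(x) over xs without knowing what f is.  */
static long apply_to_all(const struct desc *d, const long *xs, size_t n,
                         bool load_chain) {
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += call_through(d, xs[i], load_chain);
  return sum;
}

/* An ordinary function: ignores the static chain.  */
static long square(void *chain, long x) {
  (void)chain;
  return x * x;
}

/* A "nested" function: reads a variable of its enclosing frame
   through the static chain.  */
static long add_offset(void *chain, long x) {
  return x + *(const long *)chain;
}
```

With xs = {1, 2, 3} and an enclosing offset of 10, the ordinary function sums to 14 either way. The nested one gives 36 only when the chain is loaded and silently gives 6 otherwise. That silent failure is why the switch is an assertion by the user rather than something the library side can check.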
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 7, 2011 at 5:47 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:
> On Thu, Jul 07, 2011 at 10:59:36AM +0200, Richard Guenther wrote:
> > On Thu, Jul 7, 2011 at 12:29 AM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:
> > > This patch adds an option to not load the static chain (r11) for 64-bit
> > > PowerPC calls through function pointers (or virtual functions). [...]
> >
> > Hum. Can't the compiler figure this out itself per call site? At least
> > the name of the command-line switch -m[no-]r11 is meaningless to me.
> > Points-to information should be able to tell you if the function pointer
> > points to a nested function.
>
> No, the compiler cannot figure it out. Consider the case where a function
> is passed a pointer to a function, such as the standard library function
> qsort. The call may come from any random module that isn't part of the
> compilation suite, such as when the function being passed the pointer is in
> a shared library. You don't know whether the function pointed to uses the
> static chain (i.e. a nested function called through a trampoline, a call to
> PL/I, or another language that does use the static chain, which is part of
> the ABI).
>
> The point of the switch is similar to -ffast-math, where you say you are
> willing to ignore some corner cases in the standard in order to get better
> performance.

Well, I guess you don't propose to build glibc with -mno-r11?

The compiler certainly can't figure out in _all_ cases - but it should be able
to handle most of the cases (with LTO even more cases) ok, no?

I also wonder why loading a register is so expensive compared to the actual
call ...

> I certainly can call the switch -mno-static-chain, which is perhaps more
> meaningful (at least to us compiler folk; I'm not sure "static chain" means
> much to the normal programmer).

Well, that's up to the target maintainers to decide, maybe
-mno-nested-functions instead?

Richard.
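Richard's point that the compiler could handle most call sites, and Michael's that it cannot handle all of them, amount to a conservative per-call-site test over the set of possible targets that points-to analysis computes. A sketch with a hypothetical target representation (not GCC's points-to data structures): the r11 load can be dropped only when the set is known to be complete and contains no nested function. A pointer that may arrive from code the compiler cannot see, as in the qsort example, leaves the set incomplete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one possible call target.  */
enum target_kind {
  TARGET_TOPLEVEL, /* ordinary function: ignores the static chain */
  TARGET_NESTED,   /* needs its static chain (trampoline, Ada, PL/I) */
};

/* Hypothetical points-to result for one function pointer.  */
struct pts_set {
  bool complete;                 /* false if the pointer may come from
                                    code the compiler cannot see */
  const enum target_kind *kinds; /* the known targets */
  size_t n;
};

/* True if this call site must still load the static chain.  */
static bool call_needs_static_chain(const struct pts_set *s) {
  if (!s->complete)
    return true; /* unknown targets: stay conservative */
  for (size_t i = 0; i < s->n; i++)
    if (s->kinds[i] == TARGET_NESTED)
      return true;
  return false;
}
```

Whole-program information (for example with LTO) would turn more sets from incomplete into complete, which matches Richard's remark that LTO should handle even more cases. A library like glibc, which receives callbacks from arbitrary callers, would keep loading r11.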
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
[...]
On Jul 7, 2011, at 5:53 PM, Richard Guenther wrote:
> On Thu, Jul 7, 2011 at 5:47 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote:
> > I certainly can call the switch -mno-static-chain, which is perhaps more
> > meaningful (at least to us compiler folk; I'm not sure "static chain"
> > means much to the normal programmer).
>
> Well, that's up to the target maintainers to decide, maybe
> -mno-nested-functions instead?

Isn't that an issue of pointers to nested functions rather than nested
functions? So -mno-nested-function-pointers would be more accurate.

That's somewhat important from an Ada point of view, as nested subprograms are
common, but access/pointer to nested subprograms is not very usual.

My two cents.

Tristan.
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 7, 2011 at 11:53 AM, Richard Guenther richard.guent...@gmail.com wrote:

> Well, that's up to the target maintainers to decide, maybe
> -mno-nested-functions instead?

Is -mno-nested-functions or -mno-nested-function-pointers too C-centric or
GCC-centric? I don't know what wording would be more informative, but the
functionality is available in Pascal, PL/I, Ada, GCC extensions and other
languages. We're open to suggestions.

> The compiler certainly can't figure out in _all_ cases - but it should be
> able to handle most of the cases (with LTO even more cases) ok, no?

-mno-r11 is an assertion to the compiler that no function calls through
pointers will require the static chain. However, I agree that the compiler
conservatively should be able to figure out some cases itself, which would be
a good enhancement.

Thanks,
David
Re: [PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
On Thu, Jul 7, 2011 at 9:14 PM, David Edelsohn dje@gmail.com wrote:
> On Thu, Jul 7, 2011 at 11:53 AM, Richard Guenther richard.guent...@gmail.com wrote:
>
> > Well, that's up to the target maintainers to decide, maybe
> > -mno-nested-functions instead?
>
> Is -mno-nested-functions or -mno-nested-function-pointers too C-centric or
> GCC-centric? I don't know what wording would be more informative, but the
> functionality is available in Pascal, PL/I, Ada, GCC extensions and other
> languages. We're open to suggestions.
>
> > The compiler certainly can't figure out in _all_ cases - but it should be
> > able to handle most of the cases (with LTO even more cases) ok, no?
>
> -mno-r11 is an assertion to the compiler that no function calls through
> pointers will require the static chain. However, I agree that the compiler
> conservatively should be able to figure out some cases itself, which would
> be a good enhancement.

Does XLC have a similar switch whose name we can use?

Richard.

> Thanks,
> David
[PATCH] Add -mno-r11 option to suppress load of ppc64 static chain in indirect calls
This patch adds an option to not load the static chain (r11) for 64-bit
PowerPC calls through function pointers (or virtual functions). Most of the
languages on the PowerPC do not need the static chain to be loaded when
called, and adding this instruction can slow down code that calls very short
functions.

In addition, if the function does not call alloca, setjmp or deal with
exceptions where the stack is modified, the compiler can move the store of the
TOC value for the current function to the prologue of the function, instead
of storing it at each call site.

The effect of these patches is to speed up 464.h264ref in the SPEC 2006
benchmark by about 7% if -mno-r11 is used, and 5% if it is not used (but the
save of the TOC register is hoisted). I believe this is due to the load of the
current function's TOC (r2) having to wait until the store queue is drained
with the store just before the call. Unfortunately, I do see a 3% slowdown in
429.mcf, whose cause I don't know.

I have bootstrapped the compiler and saw no regressions in make check. Is it
OK to install in the trunk?

[gcc]
2011-07-06  Michael Meissner  meiss...@linux.vnet.ibm.com

	* config/rs6000/rs6000-protos.h (rs6000_call_indirect_aix): New
	declaration.
	(rs6000_save_toc_in_prologue_p): Ditto.

	* config/rs6000/rs6000.opt (-mr11): New switch to disable loading
	up the static chain (r11) during indirect function calls.
	(-msave-toc-indirect): New undocumented debug switch.

	* config/rs6000/rs6000.c (struct machine_function): Add
	save_toc_in_prologue field to note whether the prologue needs to
	save the TOC value in the reserved stack location.
	(rs6000_emit_prologue): Use TOC_REGNUM instead of 2. If we need
	to save the TOC in the prologue, do so.
	(rs6000_trampoline_init): Don't allow creating AIX style
	trampolines if -mno-r11 is in effect.
	(rs6000_call_indirect_aix): New function to create AIX style
	indirect calls, adding support for -mno-r11 to suppress loading
	the static chain, and saving the TOC in the prologue instead of
	the call body.
	(rs6000_save_toc_in_prologue_p): Return true if we are saving the
	TOC in the prologue.

	* config/rs6000/rs6000.md (STACK_POINTER_REGNUM): Add more fixed
	register numbers.
	(TOC_REGNUM): Ditto.
	(STATIC_CHAIN_REGNUM): Ditto.
	(ARG_POINTER_REGNUM): Ditto.
	(SFP_REGNO): Delete, unused.
	(TOC_SAVE_OFFSET_32BIT): Add constants for AIX TOC save and
	function descriptor offsets.
	(TOC_SAVE_OFFSET_64BIT): Ditto.
	(AIX_FUNC_DESC_TOC_32BIT): Ditto.
	(AIX_FUNC_DESC_TOC_64BIT): Ditto.
	(AIX_FUNC_DESC_SC_32BIT): Ditto.
	(AIX_FUNC_DESC_SC_64BIT): Ditto.
	(ptrload): New mode attribute for the appropriate load of a
	pointer.
	(call_indirect_aix32): Delete, rewrite AIX indirect function
	calls.
	(call_indirect_aix64): Ditto.
	(call_value_indirect_aix32): Ditto.
	(call_value_indirect_aix64): Ditto.
	(call_indirect_nonlocal_aix32_internal): Ditto.
	(call_indirect_nonlocal_aix32): Ditto.
	(call_indirect_nonlocal_aix64_internal): Ditto.
	(call_indirect_nonlocal_aix64): Ditto.
	(call): Rewrite AIX indirect function calls. Add support for
	eliminating the static chain, and for moving the save of the TOC
	to the function prologue.
	(call_value): Ditto.
	(call_indirect_aix<ptrsize>): Ditto.
	(call_indirect_aix<ptrsize>_internal): Ditto.
	(call_indirect_aix<ptrsize>_internal2): Ditto.
	(call_indirect_aix<ptrsize>_nor11): Ditto.
	(call_value_indirect_aix<ptrsize>): Ditto.
	(call_value_indirect_aix<ptrsize>_internal): Ditto.
	(call_value_indirect_aix<ptrsize>_internal2): Ditto.
	(call_value_indirect_aix<ptrsize>_nor11): Ditto.
	(call_nonlocal_aix32): Relocate in the rs6000.md file.
	(call_nonlocal_aix64): Ditto.

	* doc/invoke.texi (RS/6000 and PowerPC Options): Add -mr11 and
	-mno-r11 documentation.

[gcc/testsuite]
2011-07-06  Michael Meissner  meiss...@linux.vnet.ibm.com

	* gcc.target/powerpc/no-r11-1.c: New test for -mr11, -mno-r11.
	* gcc.target/powerpc/no-r11-2.c: Ditto.
	* gcc.target/powerpc/no-r11-3.c: Ditto.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com	fax +1 (978) 399-6899

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 175921)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -171,6 +171,8 @@ extern unsigned int rs6000_dbx_register_
 extern void rs6000_emit_epilogue (int);
 extern void rs6000_emit_eh_reg_restore (rtx, rtx);
 extern const char * output_isel (rtx *);
+extern void
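The two changes in this patch can be made concrete with a sketch that builds the 64-bit indirect-call sequence and shows which instructions each change removes. It assumes the callee's descriptor address is in r9, the TOC save slot is at 40(r1), and the descriptor holds the TOC at offset 8 and the static chain at offset 16. This is roughly the shape of the sequence GCC has emitted for 64-bit AIX-style calls, but the exact registers and instruction order are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Fills `out` with the instructions for one indirect call and returns
   their count (at most 7).  Assumes r9 holds the descriptor address.  */
static int emit_indirect_call(bool pointers_to_nested_functions,
                              bool toc_saved_in_prologue,
                              const char *out[8]) {
  int n = 0;
  out[n++] = "ld 0,0(9)";      /* entry address from the descriptor */
  if (!toc_saved_in_prologue)
    out[n++] = "std 2,40(1)";  /* save the caller's TOC at the call site */
  out[n++] = "mtctr 0";        /* branch target into the count register */
  if (pointers_to_nested_functions)
    out[n++] = "ld 11,16(9)";  /* static chain into r11 */
  out[n++] = "ld 2,8(9)";      /* callee's TOC into r2 */
  out[n++] = "bctrl";          /* call through the count register */
  out[n++] = "ld 2,40(1)";     /* restore the caller's TOC */
  return n;
}

/* True if `insn` appears among the first `n` entries of `insns`.  */
static bool has_insn(const char *insns[], int n, const char *insn) {
  for (int i = 0; i < n; i++)
    if (strcmp(insns[i], insn) == 0)
      return true;
  return false;
}
```

In this model the default sequence has 7 instructions. -mno-r11 drops the r11 load (6), and hoisting the TOC save into the prologue also drops the per-call store (5), leaving one store per function invocation instead of one per call.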