Re: [PATCH] lower-subreg and IBM long double
On Mon, Aug 19, 2013 at 8:47 AM, Alan Modra wrote: gcc/ ... * doc/tm.texi.in (TARGET_INIT_LOWER_SUBREG): Document. ChangeLog update needed here, the new hook is documented in target.def. +static void +rs6000_init_lower_subreg (void *data) +{ Functions should start with a leading comment. Index: gcc/target.def === --- gcc/target.def (revision 201834) +++ gcc/target.def (working copy) @@ -5110,6 +5110,14 @@ comparison code or operands., void, (int *code, rtx *op0, rtx *op1, bool op0_preserve_value), default_canonicalize_comparison) +/* Allow modification of subreg choices. */ +DEFHOOK +(init_lower_subreg, + This hook allows modification of the choices the lower_subreg pass \ +will make for particular subreg modes. @var{data} is a pointer to a \ +@code{struct target_lower_subreg}., + void, (void *data), NULL) + Bah, another void* pointer... I would have preferred a hook that takes a machine_mode and let it be called in the first loop of compute_costs, something like the code below. Your solution gives the target a lot more power to control the split decisions (adjust individual costs, review overall choice of modes to split, etc.) but I don't think any target really needs that. Do you think the target should have more control over splitting than just on per machine_mode basis? Ciao! Steven Index: lower-subreg.c === --- lower-subreg.c (revision 201887) +++ lower-subreg.c (working copy) @@ -208,7 +208,18 @@ compute_costs (bool speed_p, struct cost_rtxes *rt if (factor 1) { int mode_move_cost; - + bool split_profitable_mode = true; + + if (targetm.split_profitable_mode + ! targetm.split_profitable_mode ((mode))) + { + split_profitable_mode = false; + if (LOG_COSTS) + fprintf (stderr, +%s move: target signals splitting is not profitable%s\n, +GET_MODE_NAME (mode), FORCE_LOWERING ? (forcing) : ); + } + PUT_MODE (rtxes-target, mode); PUT_MODE (rtxes-source, mode); mode_move_cost = set_rtx_cost (rtxes-set, speed_p); @@ -218,7 +229,9 @@ compute_costs (bool speed_p, struct cost_rtxes *rt GET_MODE_NAME (mode), mode_move_cost, word_move_cost, factor); - if (FORCE_LOWERING || mode_move_cost = word_move_cost * factor) + if (FORCE_LOWERING + || ((mode_move_cost = word_move_cost * factor) + split_profitable_mode)) { choices[speed_p].move_modes_to_split[i] = true; choices[speed_p].something_to_do = true;
[PATCH] lower-subreg and IBM long double
On Tue, Jun 11, 2013 at 09:56:11AM +0930, Alan Modra wrote: [snip] It isn't hard to see why we are going wrong. IBM long double is really a two element array of double, and the rs6000 backend uses subregs to access the elements. The problem is that lower-subreg lowers to word_mode, so we get DImode. word_mode makes sense for most targets where subregs of FP modes might be used to narrow an access for bit-twiddling operations on the sign bit. It doesn't make sense for us. We want DFmode for FP operations. An example is the expander used by the testcase. This is a repost of http://gcc.gnu.org/ml/gcc/2013-06/msg00053.html, taking into account Joseph's doc review comments and adding a testcase. David has already acked the rs6000.c changes. Bootstrapped and regression tested powerpc64-linux. OK to apply? gcc/ * rs6000.c (TARGET_INIT_LOWER_SUBREG): Define. (rs6000_init_lower_subreg): New function. * lower-subreg.c (init_lower_subreg): Call targetm.init_lower_subreg. * target.def (init_lower_subreg): New. * doc/tm.texi.in (TARGET_INIT_LOWER_SUBREG): Document. * doc/tm.texi: Regenerate. gcc/testsuite/ * gcc.target/powerpc/fabsl.c: New test. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 201834) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -59,6 +59,7 @@ #include opts.h #include tree-vectorizer.h #include dumpfile.h +#include lower-subreg.h #if TARGET_XCOFF #include xcoffout.h /* get declarations of xcoff_*_section_name */ #endif @@ -1364,6 +1365,8 @@ static const struct attribute_spec rs6000_attribut #define TARGET_RTX_COSTS rs6000_rtx_costs #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST hook_int_rtx_mode_as_bool_0 +#undef TARGET_INIT_LOWER_SUBREG +#define TARGET_INIT_LOWER_SUBREG rs6000_init_lower_subreg #undef TARGET_DWARF_REGISTER_SPAN #define TARGET_DWARF_REGISTER_SPAN rs6000_dwarf_register_span @@ -27971,6 +27974,20 @@ rs6000_memory_move_cost (enum machine_mode mode, r return ret; } +static void +rs6000_init_lower_subreg (void *data) +{ + if (!TARGET_IEEEQUAD + TARGET_HARD_FLOAT + (TARGET_FPRS || TARGET_E500_DOUBLE) + TARGET_LONG_DOUBLE_128) +{ + struct target_lower_subreg *info = (struct target_lower_subreg *) data; + info-x_choices[0].move_modes_to_split[TFmode] = false; + info-x_choices[1].move_modes_to_split[TFmode] = false; +} +} + /* Returns a code for a target-specific builtin that implements reciprocal of the function, or NULL_TREE if not available. */ Index: gcc/lower-subreg.c === --- gcc/lower-subreg.c (revision 201834) +++ gcc/lower-subreg.c (working copy) @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see #include tree-pass.h #include df.h #include lower-subreg.h +#include target.h #ifdef STACK_GROWS_DOWNWARD # undef STACK_GROWS_DOWNWARD @@ -287,6 +288,9 @@ init_lower_subreg (void) if (LOG_COSTS) fprintf (stderr, \nSpeed costs\n===\n\n); compute_costs (true, rtxes); + + if (targetm.init_lower_subreg) +targetm.init_lower_subreg (this_target_lower_subreg); } static bool Index: gcc/target.def === --- gcc/target.def (revision 201834) +++ gcc/target.def (working copy) @@ -5110,6 +5110,14 @@ comparison code or operands., void, (int *code, rtx *op0, rtx *op1, bool op0_preserve_value), default_canonicalize_comparison) +/* Allow modification of subreg choices. */ +DEFHOOK +(init_lower_subreg, + This hook allows modification of the choices the lower_subreg pass \ +will make for particular subreg modes. @var{data} is a pointer to a \ +@code{struct target_lower_subreg}., + void, (void *data), NULL) + DEFHOOKPOD (atomic_test_and_set_trueval, This value should be set if the result written by\ Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.texi.in (revision 201834) +++ gcc/doc/tm.texi.in (working copy) @@ -4939,6 +4939,8 @@ Define this macro if a non-short-circuit operation @hook TARGET_ADDRESS_COST +@hook TARGET_INIT_LOWER_SUBREG + @node Scheduling @section Adjusting the Instruction Scheduler -- Alan Modra Australia Development Lab, IBM
Re: lower-subreg and IBM long double
On Tue, Jun 11, 2013 at 04:58:07PM +, Joseph S. Myers wrote: It's preferred to put the hook documentation in the doc string in target.def, so tm.texi.in then only contains the @hook line indicating where the documentation will go in tm.texi. Ah, thanks. Revised patch with testcase. gcc/ * rs6000.c (TARGET_INIT_LOWER_SUBREG): Define. (rs6000_init_lower_subreg): New function. * lower-subreg.c (init_lower_subreg): Call targetm.init_lower_subreg. * target.def (init_lower_subreg): New. * doc/tm.texi.in (TARGET_INIT_LOWER_SUBREG): Document. * doc/tm.texi: Regenerate. gcc/testsuite/ * gcc.target/powerpc/fabsl.c: New test. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 199975) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -59,6 +59,7 @@ #include opts.h #include tree-vectorizer.h #include dumpfile.h +#include lower-subreg.h #if TARGET_XCOFF #include xcoffout.h /* get declarations of xcoff_*_section_name */ #endif @@ -1357,6 +1358,8 @@ static const struct attribute_spec rs6000_attribut #define TARGET_RTX_COSTS rs6000_rtx_costs #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST hook_int_rtx_mode_as_bool_0 +#undef TARGET_INIT_LOWER_SUBREG +#define TARGET_INIT_LOWER_SUBREG rs6000_init_lower_subreg #undef TARGET_DWARF_REGISTER_SPAN #define TARGET_DWARF_REGISTER_SPAN rs6000_dwarf_register_span @@ -27430,6 +27523,20 @@ rs6000_memory_move_cost (enum machine_mode mode, r return ret; } +static void +rs6000_init_lower_subreg (void *data) +{ + if (!TARGET_IEEEQUAD + TARGET_HARD_FLOAT + (TARGET_FPRS || TARGET_E500_DOUBLE) + TARGET_LONG_DOUBLE_128) +{ + struct target_lower_subreg *info = (struct target_lower_subreg *) data; + info-x_choices[0].move_modes_to_split[TFmode] = false; + info-x_choices[1].move_modes_to_split[TFmode] = false; +} +} + /* Returns a code for a target-specific builtin that implements reciprocal of the function, or NULL_TREE if not available. */ Index: gcc/lower-subreg.c === --- gcc/lower-subreg.c (revision 199975) +++ gcc/lower-subreg.c (working copy) @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see #include tree-pass.h #include df.h #include lower-subreg.h +#include target.h #ifdef STACK_GROWS_DOWNWARD # undef STACK_GROWS_DOWNWARD @@ -287,6 +288,9 @@ init_lower_subreg (void) if (LOG_COSTS) fprintf (stderr, \nSpeed costs\n===\n\n); compute_costs (true, rtxes); + + if (targetm.init_lower_subreg) +targetm.init_lower_subreg (this_target_lower_subreg); } static bool Index: gcc/target.def === --- gcc/target.def (revision 199975) +++ gcc/target.def (working copy) @@ -2926,6 +2926,14 @@ DEFHOOK void, (int *code, rtx *op0, rtx *op1, bool op0_preserve_value), default_canonicalize_comparison) +/* Allow modification of subreg choices. */ +DEFHOOK +(init_lower_subreg, + This hook allows modification of the choices the lower_subreg pass \ +will make for particular subreg modes. @var{data} is a pointer to a \ +@code{struct target_lower_subreg}., + void, (void *data), NULL) + DEFHOOKPOD (atomic_test_and_set_trueval, This value should be set if the result written by\ Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.texi.in (revision 199975) +++ gcc/doc/tm.texi.in (working copy) @@ -6384,6 +6384,8 @@ should probably only be given to addresses with di registers on machines with lots of registers. @end deftypefn +@hook TARGET_INIT_LOWER_SUBREG + @node Scheduling @section Adjusting the Instruction Scheduler Index: gcc/testsuite/gcc.target/powerpc/fabsl.c === --- gcc/testsuite/gcc.target/powerpc/fabsl.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/fabsl.c(revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile { target { powerpc*-*-darwin* powerpc*-*-aix* rs6000-*-* powerpc*-*-linux* } } } */ +/* { dg-options -O2 -mlong-double-128 } */ +/* { dg-final { scan-assembler-not lwz } } */ +/* { dg-final { scan-assembler-not ld } } */ + +long double +_fabsl (long double x) +{ + return __builtin_fabsl (x); +} -- Alan Modra Australia Development Lab, IBM
Re: lower-subreg and IBM long double
On Tue, 11 Jun 2013, Alan Modra wrote: Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.texi.in(revision 199781) +++ gcc/doc/tm.texi.in(working copy) @@ -6375,6 +6375,12 @@ registers on machines with lots of registers. @end deftypefn +@hook TARGET_INIT_LOWER_SUBREG +This hook allows modification of the choices the lower_subreg pass +will make for particular subreg modes. @var{data} is a pointer to a +@code{struct target_lower_subreg}. +@end deftypefn It's preferred to put the hook documentation in the doc string in target.def, so tm.texi.in then only contains the @hook line indicating where the documentation will go in tm.texi. -- Joseph S. Myers jos...@codesourcery.com
lower-subreg and IBM long double
Should lower-subreg be disabled for IBM long double TFmode? On powerpc64-linux, this testcase long double ld_abs (long double x) { return __builtin_fabsl (x); } compiled with -m64 -O2 -S generates the horrible code shown on the left. The code on the right is ideal, as generated by gcc-4.2. We regressed with gcc-4.3.0, ie. with lower-subreg. .L.ld_abs: .L.ld_abs: fabs 0,1fmr 0,1 stfd 2,-32(1) fabs 1,1 fcmpu 7,1,0 fcmpu 7,0,1 ori 2,2,0 beqlr 7 ld 10,-32(1)fneg 2,2 mr 9,10 blr beq 7,.L2 std 10,-24(1) ori 2,2,0 lfd 13,-24(1) fneg 13,13 stfd 13,-24(1) ori 2,2,0 ld 9,-24(1) .L2: stfd 0,-32(1) std 9,-8(1) ori 2,2,0 ld 8,-32(1) lfd 2,-8(1) std 8,-16(1) ori 2,2,0 lfd 1,-16(1) blr It isn't hard to see why we are going wrong. IBM long double is really a two element array of double, and the rs6000 backend uses subregs to access the elements. The problem is that lower-subreg lowers to word_mode, so we get DImode. word_mode makes sense for most targets where subregs of FP modes might be used to narrow an access for bit-twiddling operations on the sign bit. It doesn't make sense for us. We want DFmode for FP operations. An example is the expander used by the testcase. (define_expand abstf2_internal [(set (match_operand:TF 0 gpc_reg_operand ) (match_operand:TF 1 gpc_reg_operand )) (set (match_dup 3) (match_dup 5)) (set (match_dup 5) (abs:DF (match_dup 5))) (set (match_dup 4) (compare:CCFP (match_dup 3) (match_dup 5))) (set (pc) (if_then_else (eq (match_dup 4) (const_int 0)) (label_ref (match_operand 2 )) (pc))) (set (match_dup 6) (neg:DF (match_dup 6)))] !TARGET_IEEEQUAD TARGET_HARD_FLOAT TARGET_FPRS TARGET_DOUBLE_FLOAT TARGET_LONG_DOUBLE_128 { const int hi_word = LONG_DOUBLE_WORDS_BIG_ENDIAN ? 0 : GET_MODE_SIZE (DFmode); const int lo_word = LONG_DOUBLE_WORDS_BIG_ENDIAN ? GET_MODE_SIZE (DFmode) : 0; operands[3] = gen_reg_rtx (DFmode); operands[4] = gen_reg_rtx (CCFPmode); operands[5] = simplify_gen_subreg (DFmode, operands[0], TFmode, hi_word); operands[6] = simplify_gen_subreg (DFmode, operands[0], TFmode, lo_word); }) The following patch disables lower-subreg for double double TFmode, bootstrap and regression tests are OK, but I'm a little unsure whether this is the right thing to do. * rs6000.c (TARGET_INIT_LOWER_SUBREG): Define. (rs6000_init_lower_subreg): New function. * lower-subreg.c (init_lower_subreg): Call targetm.init_lower_subreg. * target.def (init_lower_subreg): New. * doc/tm.texi.in (TARGET_INIT_LOWER_SUBREG): Document. * doc/tm.texi: Regenerate. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 199781) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -59,6 +59,7 @@ #include opts.h #include tree-vectorizer.h #include dumpfile.h +#include lower-subreg.h #if TARGET_XCOFF #include xcoffout.h /* get declarations of xcoff_*_section_name */ #endif @@ -1317,6 +1318,8 @@ #define TARGET_RTX_COSTS rs6000_rtx_costs #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST hook_int_rtx_mode_as_bool_0 +#undef TARGET_INIT_LOWER_SUBREG +#define TARGET_INIT_LOWER_SUBREG rs6000_init_lower_subreg #undef TARGET_DWARF_REGISTER_SPAN #define TARGET_DWARF_REGISTER_SPAN rs6000_dwarf_register_span @@ -26865,6 +26955,20 @@ return ret; } +static void +rs6000_init_lower_subreg (void *data) +{ + if (!TARGET_IEEEQUAD + TARGET_HARD_FLOAT + (TARGET_FPRS || TARGET_E500_DOUBLE) + TARGET_LONG_DOUBLE_128) +{ + struct target_lower_subreg *info = (struct target_lower_subreg *) data; + info-x_choices[0].move_modes_to_split[TFmode] = false; + info-x_choices[1].move_modes_to_split[TFmode] = false; +} +} + /* Returns a code for a target-specific builtin that implements reciprocal of the function, or NULL_TREE if not available. */ Index: gcc/lower-subreg.c === --- gcc/lower-subreg.c (revision 199781) +++ gcc/lower-subreg.c (working copy) @@ -39,6 +39,7 @@ #include tree-pass.h #include df.h #include lower-subreg.h +#include target.h #ifdef STACK_GROWS_DOWNWARD # undef STACK_GROWS_DOWNWARD @@ -287,6 +288,9 @@ if (LOG_COSTS) fprintf (stderr, \nSpeed costs\n===\n\n); compute_costs (true, rtxes); + + if (targetm.init_lower_subreg) +targetm.init_lower_subreg (this_target_lower_subreg); } static bool Index: gcc/target.def === --- gcc/target.def (revision 199781) +++ gcc/target.def
Re: lower-subreg and IBM long double
On Mon, Jun 10, 2013 at 8:26 PM, Alan Modra amo...@gmail.com wrote: The following patch disables lower-subreg for double double TFmode, bootstrap and regression tests are OK, but I'm a little unsure whether this is the right thing to do. * rs6000.c (TARGET_INIT_LOWER_SUBREG): Define. (rs6000_init_lower_subreg): New function. * lower-subreg.c (init_lower_subreg): Call targetm.init_lower_subreg. * target.def (init_lower_subreg): New. * doc/tm.texi.in (TARGET_INIT_LOWER_SUBREG): Document. * doc/tm.texi: Regenerate. I agree with the rs6000 bits. You need someone else to approve the common bits. This also needs a testcase. Thanks, David
Re: lower-subreg and IBM long double
On Mon, Jun 10, 2013 at 6:00 PM, David Edelsohn dje@gmail.com wrote: On Mon, Jun 10, 2013 at 8:26 PM, Alan Modra amo...@gmail.com wrote: The following patch disables lower-subreg for double double TFmode, bootstrap and regression tests are OK, but I'm a little unsure whether this is the right thing to do. * rs6000.c (TARGET_INIT_LOWER_SUBREG): Define. (rs6000_init_lower_subreg): New function. * lower-subreg.c (init_lower_subreg): Call targetm.init_lower_subreg. * target.def (init_lower_subreg): New. * doc/tm.texi.in (TARGET_INIT_LOWER_SUBREG): Document. * doc/tm.texi: Regenerate. I agree with the rs6000 bits. You need someone else to approve the common bits. This also needs a testcase. I thought there was a way already to disable lower subreg already for some modes. Thanks, Andrew Thanks, David
Re: lower-subreg and IBM long double
On Mon, Jun 10, 2013 at 06:31:55PM -0700, Andrew Pinski wrote: On Mon, Jun 10, 2013 at 6:00 PM, David Edelsohn dje@gmail.com wrote: On Mon, Jun 10, 2013 at 8:26 PM, Alan Modra amo...@gmail.com wrote: The following patch disables lower-subreg for double double TFmode, bootstrap and regression tests are OK, but I'm a little unsure whether this is the right thing to do. * rs6000.c (TARGET_INIT_LOWER_SUBREG): Define. (rs6000_init_lower_subreg): New function. * lower-subreg.c (init_lower_subreg): Call targetm.init_lower_subreg. * target.def (init_lower_subreg): New. * doc/tm.texi.in (TARGET_INIT_LOWER_SUBREG): Document. * doc/tm.texi: Regenerate. I agree with the rs6000 bits. You need someone else to approve the common bits. This also needs a testcase. I thought there was a way already to disable lower subreg already for some modes. There is, via rtx_costs. In fact that was my first approach, with the following in rs6000_rtx_costs, but this potentially affects other areas of the compiler. case SET: if (GET_MODE (SET_DEST (x)) == TFmode !TARGET_IEEEQUAD TARGET_HARD_FLOAT (TARGET_FPRS || TARGET_E500_DOUBLE) TARGET_LONG_DOUBLE_128) /* This hack is to persuade lower_subreg to not lower TFmode regs to DImode. */ *total = COSTS_N_INSNS (2) - 1; break; -- Alan Modra Australia Development Lab, IBM