Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
On Fri, Jun 16, 2017 at 05:55:35PM -0400, Michael Meissner wrote:
> Here is the latest patch that restricts the optimization to 64-bit (due to
> needing VSX small integers). I've done a full bootstrap/make check on a little
> endian power8 system, and a build without bootstrap and make check on a little
> endian power9 system. Neither the power8 nor the power9 systems had any
> regressions. I'm also running a test on a big endian power7 system for
> completeness.
>
> Assuming the power7 test finishes without any regressions, can I check this
> patch into the trunk and later the GCC 7 branch.
>
> The main change was to restrict the optimization to 64-bit PowerPC that have
> VSX small integer support turned on (default for 64-bit). I did shorten the
> one line in the testsuite that you mentioned.

Okay for both. Thanks!


Segher


> 2017-06-16  Michael Meissner
>
>         PR target/79799
>         * config/rs6000/rs6000.c (rs6000_expand_vector_init): Add support
>         for doing vector set of SFmode on ISA 3.0.
>         * config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
>         (vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
>         element.
>         (vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
>         SFmode value into a V4SF variable that was extracted from another
>         V4SF variable without converting the element to double precision
>         and back to single precision vector format.
>         (vsx_insert_extract_v4sf_p9_2): Likewise.
>
> [gcc/testsuite]
> 2017-06-16  Michael Meissner
>
>         PR target/79799
>         * gcc.target/powerpc/pr79799-1.c: New test.
>         * gcc.target/powerpc/pr79799-2.c: Likewise.
>         * gcc.target/powerpc/pr79799-3.c: Likewise.
>         * gcc.target/powerpc/pr79799-4.c: Likewise.
>         * gcc.target/powerpc/pr79799-5.c: Likewise.
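For readers following the ChangeLog, the three source-level shapes it names can be sketched in C as below. This is a hedged illustration, not one of the pr79799 tests: it assumes GCC's generic vector extension (`vector_size`) instead of the `vec_insert`/`vec_extract` built-ins, and the function names are hypothetical. Which of the new patterns the compiler actually selects for each function is an assumption based on the ChangeLog wording.

```c
#include <assert.h>

/* Four packed floats: the C-level counterpart of a V4SFmode value.
   The typedef name is chosen for this sketch only.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Insert a scalar float into one element; presumably the general
   vsx_set_v4sf_p9 case on power9.  */
v4sf
insert_elt1 (v4sf v, float f)
{
  v[1] = f;
  return v;
}

/* Insert the constant 0.0f; presumably the vsx_set_v4sf_p9_zero case,
   where no scalar-to-vector format conversion is needed.  */
v4sf
clear_elt2 (v4sf v)
{
  v[2] = 0.0f;
  return v;
}

/* Extract an element from one vector and insert it into another;
   presumably the vsx_insert_extract_v4sf_p9 case, which avoids the
   round trip through double precision.  */
v4sf
copy_elt (v4sf dst, v4sf src)
{
  dst[0] = src[3];
  return dst;
}
```

Compiling these with `-mcpu=power9 -O2` on a 64-bit PowerPC target and reading the assembly is one way to see whether the optimization fired; on other targets the functions still compile and behave the same at the source level.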
Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
On Fri, Jun 16, 2017 at 04:30:48PM -0500, Segher Boessenkool wrote:
> On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > > +  "&& reload_completed"
> > >
> > > I still don't think it is such a good idea to do all of this not until
> > > after reload. It does of course allow you to play tricks with changing
> > > register mode at will, like you do ;-)
> >
> > The problem is MODES_TIEABLE_P. V4S{I,F}mode and SImode cannot be tied
> > together (i.e. use gen_lowpart to change the mode and use a SUBREG). So after
> > reload, we can just use gen_rtx_REG (...) to change the register type, but
> > before reload, by creating the SUBREG, it can lead to various aborts if rtl
> > checking is turned on.
>
> That sounds like a problem elsewhere? Hrm.
>
> > > All these unspecs are a similar problem: the RTL optimisers cannot do
> > > much at all with it.
> >
> > I don't think there is a good way to represent a vec_insert. And vec_extract
> > can't represent a variable extract either.
>
> Yeah. But especially for all this lane shuffling etc. the generic
> optimisers could do a good job, if only they knew how. Maybe we need
> some new RTL codes.
>
> > > > +  [(set_attr "type" "vecperm")
> >
> > > Is that a good type for this? I think the convert is more expensive
> > > than the permutes? If so, that would be better (of course it only
> > > matters for sched1, not super important).
> >
> > I generally use the type of the last insn. I am open to other suggestions.
>
> It should describe the resulting insns as a whole. Picking the type of
> the most expensive insn is often a reasonable approximation; for integer
> insns "two" or "three" can be okay.
>
> I don't think we can do much better currently.

Here is the latest patch that restricts the optimization to 64-bit (due to
needing VSX small integers). I've done a full bootstrap/make check on a little
endian power8 system, and a build without bootstrap and make check on a little
endian power9 system.
Neither the power8 nor the power9 systems had any regressions. I'm also
running a test on a big endian power7 system for completeness.

Assuming the power7 test finishes without any regressions, can I check this
patch into the trunk and later the GCC 7 branch.

The main change was to restrict the optimization to 64-bit PowerPC that have
VSX small integer support turned on (default for 64-bit). I did shorten the
one line in the testsuite that you mentioned.

[gcc]
2017-06-16  Michael Meissner

        PR target/79799
        * config/rs6000/rs6000.c (rs6000_expand_vector_init): Add support
        for doing vector set of SFmode on ISA 3.0.
        * config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
        (vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
        element.
        (vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
        SFmode value into a V4SF variable that was extracted from another
        V4SF variable without converting the element to double precision
        and back to single precision vector format.
        (vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-16  Michael Meissner

        PR target/79799
        * gcc.target/powerpc/pr79799-1.c: New test.
        * gcc.target/powerpc/pr79799-2.c: Likewise.
        * gcc.target/powerpc/pr79799-3.c: Likewise.
        * gcc.target/powerpc/pr79799-4.c: Likewise.
        * gcc.target/powerpc/pr79799-5.c: Likewise.
-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 249175)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -7451,6 +7451,8 @@ rs6000_expand_vector_set (rtx target, rt
            insn = gen_vsx_set_v8hi_p9 (target, target, val, elt_rtx);
          else if (mode == V16QImode)
            insn = gen_vsx_set_v16qi_p9 (target, target, val, elt_rtx);
+         else if (mode == V4SFmode)
+           insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
        }
 
       if (insn)
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md    (revision 249175)
+++ gcc/config/rs6000/vsx.md    (working copy)
@@ -3012,6 +3012,134 @@ (define_insn "vsx_set__p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (match_operand:SF 2 "gpc_reg_operand" "ww")
+          (match_operand:QI 3 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "="))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   &&
Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > +  "&& reload_completed"
> >
> > I still don't think it is such a good idea to do all of this not until
> > after reload. It does of course allow you to play tricks with changing
> > register mode at will, like you do ;-)
>
> The problem is MODES_TIEABLE_P. V4S{I,F}mode and SImode cannot be tied
> together (i.e. use gen_lowpart to change the mode and use a SUBREG). So after
> reload, we can just use gen_rtx_REG (...) to change the register type, but
> before reload, by creating the SUBREG, it can lead to various aborts if rtl
> checking is turned on.

That sounds like a problem elsewhere? Hrm.

> > All these unspecs are a similar problem: the RTL optimisers cannot do
> > much at all with it.
>
> I don't think there is a good way to represent a vec_insert. And vec_extract
> can't represent a variable extract either.

Yeah. But especially for all this lane shuffling etc. the generic
optimisers could do a good job, if only they knew how. Maybe we need
some new RTL codes.

> > > +  [(set_attr "type" "vecperm")
>
> > Is that a good type for this? I think the convert is more expensive
> > than the permutes? If so, that would be better (of course it only
> > matters for sched1, not super important).
>
> I generally use the type of the last insn. I am open to other suggestions.

It should describe the resulting insns as a whole. Picking the type of
the most expensive insn is often a reasonable approximation; for integer
insns "two" or "three" can be okay.

I don't think we can do much better currently.


Segher
Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
On Fri, Jun 16, 2017 at 02:52:46PM -0500, Segher Boessenkool wrote:
> Hi Mike,
>
> On Thu, Jun 15, 2017 at 10:10:28PM -0400, Michael Meissner wrote:
> > +(define_insn_and_split "vsx_set_v4sf_p9"
> > +  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
> > +        (unspec:V4SF
> > +         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
> > +          (match_operand:SF 2 "gpc_reg_operand" "ww")
> > +          (match_operand:QI 3 "const_0_to_3_operand" "n")]
> > +         UNSPEC_VSX_SET))
> > +   (clobber (match_scratch:SI 4 "="))]
> > +  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
> > +  "#"
> > +  "&& reload_completed"
>
> I still don't think it is such a good idea to do all of this not until
> after reload. It does of course allow you to play tricks with changing
> register mode at will, like you do ;-)

The problem is MODES_TIEABLE_P. V4S{I,F}mode and SImode cannot be tied
together (i.e. use gen_lowpart to change the mode and use a SUBREG). So after
reload, we can just use gen_rtx_REG (...) to change the register type, but
before reload, by creating the SUBREG, it can lead to various aborts if rtl
checking is turned on.

> All these unspecs are a similar problem: the RTL optimisers cannot do
> much at all with it.

I don't think there is a good way to represent a vec_insert. And vec_extract
can't represent a variable extract either.

> > +  [(set_attr "type" "vecperm")

I generally use the type of the last insn. I am open to other suggestions.

> Is that a good type for this? I think the convert is more expensive
> than the permutes? If so, that would be better (of course it only
> matters for sched1, not super important).
>
> > --- gcc/testsuite/gcc.target/powerpc/pr79799-1.c  (nonexistent)
> > +++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c  (working copy)
> > @@ -0,0 +1,43 @@
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
>
> Why not powerpc*-*-*?

Well as it turns out, it aborts in 32-bit, because -mvsx-small-integer is not
enabled, and we can't have SImode in vector registers.
I'll have to add some additional tests and resubmit the patch.

> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-options "-mcpu=power9 -O2" } */
> > +
> > +#include
> > +
> > +/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
> > +   ISA 3.0 (power9) systems. */
>
> That first line is a bit long.

Ok.

> The patch is okay for trunk and 7 with the testsuite nits taken care of.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
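The mode change discussed in this message (viewing the same register as V4SF or as V4SI and then setting one 32-bit lane) has a rough source-level analogy, sketched below. It is only an analogy under stated assumptions: it uses GCC's generic vector extension and `memcpy` to reinterpret bits, the type and function names are hypothetical, and it says nothing about how `gen_rtx_REG` or `MODES_TIEABLE_P` behave inside the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The same 128 bits viewed as four floats or as four 32-bit integers.
   Both typedef names are chosen for this sketch only.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Return the raw 32-bit image of a float, with no value conversion.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Insert F into lane ELT of V by reinterpreting V as four integers,
   storing the float's bit image into one lane, and reinterpreting the
   result as floats again.  No float-to-double conversion takes place.  */
v4sf
insert_via_int_view (v4sf v, float f, unsigned int elt)
{
  v4si bits;
  int32_t fbits;

  assert (elt < 4);
  memcpy (&bits, &v, sizeof bits);    /* V4SF bits viewed as V4SI.  */
  memcpy (&fbits, &f, sizeof fbits);  /* Scalar float viewed as SI.  */
  bits[elt] = fbits;                  /* Integer lane insert.  */
  memcpy (&v, &bits, sizeof v);       /* Back to the V4SF view.  */
  return v;
}
```

Because the lane is moved as an integer, the bit pattern of the inserted value is preserved exactly; for example, inserting -0.0f leaves 0x80000000 in the target lane.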
Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
Hi Mike,

On Thu, Jun 15, 2017 at 10:10:28PM -0400, Michael Meissner wrote:
> +(define_insn_and_split "vsx_set_v4sf_p9"
> +  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
> +        (unspec:V4SF
> +         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
> +          (match_operand:SF 2 "gpc_reg_operand" "ww")
> +          (match_operand:QI 3 "const_0_to_3_operand" "n")]
> +         UNSPEC_VSX_SET))
> +   (clobber (match_scratch:SI 4 "="))]
> +  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
> +  "#"
> +  "&& reload_completed"

I still don't think it is such a good idea to do all of this not until
after reload. It does of course allow you to play tricks with changing
register mode at will, like you do ;-)

All these unspecs are a similar problem: the RTL optimisers cannot do
much at all with it.

> +  [(set_attr "type" "vecperm")

Is that a good type for this? I think the convert is more expensive
than the permutes? If so, that would be better (of course it only
matters for sched1, not super important).

> --- gcc/testsuite/gcc.target/powerpc/pr79799-1.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c  (working copy)
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */

Why not powerpc*-*-*?

> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mcpu=power9 -O2" } */
> +
> +#include
> +
> +/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
> +   ISA 3.0 (power9) systems. */

That first line is a bit long.

The patch is okay for trunk and 7 with the testsuite nits taken care of.
Thanks,


Segher
[PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
On Thu, Jun 15, 2017 at 07:39:39PM -0400, Michael Meissner wrote:
> I thought the patch was fine as I posted. I had an optimization I thought
> about (optimizing for inserting 0.0f) and I noticed some problems with it.
> However, even in backing out the change, there are some problems. So, I will
> hopefully reissue the patch tomorrow.

Ok, the problem was that I need to patch the compiler with a workaround to run
code on the current alpha hardware, and in backing out the patches of the code
I was working on, I backed out the workaround as well.

This patch replaces the first patch. It adds an optimization so that if you
set a field in a V4SFmode vector to 0.0f, the compiler will know it can just
clear the field, and it doesn't have to convert the 0.0 in internal scalar
format to vector format with the XSCVDPSPN instruction.

As before, I have bootstrapped this patch on a little endian power8 system, and
I had no regressions in the test suite. The new tests pr79799-{1,2,3,5}.c all
generate the appropriate code.

I have also done a non-bootstrap build and make check on the alpha power9
hardware with --with-cpu=power9, and there are no regressions. The executable
test (pr79799-4.c) runs fine.

Can I install this change to the trunk? After a week of burn-in, can I install
this on the GCC 7.x branch? Note, it will not work on previous branches.

[gcc]
2017-06-15  Michael Meissner

        PR target/79799
        * config/rs6000/rs6000.c (rs6000_expand_vector_init): Add support
        for doing vector set of SFmode on ISA 3.0.
        * config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
        (vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
        element.
        (vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
        SFmode value into a V4SF variable that was extracted from another
        V4SF variable without converting the element to double precision
        and back to single precision vector format.
        (vsx_insert_extract_v4sf_p9_2): Likewise.
[gcc/testsuite]
2017-06-15  Michael Meissner

        PR target/79799
        * gcc.target/powerpc/pr79799-1.c: New test.
        * gcc.target/powerpc/pr79799-2.c: Likewise.
        * gcc.target/powerpc/pr79799-3.c: Likewise.
        * gcc.target/powerpc/pr79799-4.c: Likewise.
        * gcc.target/powerpc/pr79799-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 249175)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -7442,6 +7442,9 @@ rs6000_expand_vector_set (rtx target, rt
       else if (mode == V2DImode)
        insn = gen_vsx_set_v2di (target, target, val, elt_rtx);
 
+      else if (TARGET_P9_VECTOR && mode == V4SFmode)
+       insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
+
       else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
               && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
        {
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md    (revision 249175)
+++ gcc/config/rs6000/vsx.md    (working copy)
@@ -3012,6 +3012,130 @@ (define_insn "vsx_set__p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (match_operand:SF 2 "gpc_reg_operand" "ww")
+          (match_operand:QI 3 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "="))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+        (unspec:V4SF [(match_dup 2)]
+                     UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+                   (vec_select:SI (match_dup 6)
+                                  (parallel [(match_dup 7)])))
+              (clobber (scratch:SI))])
+   (set (match_dup 8)
+        (unspec:V4SI [(match_dup 8)
+                      (match_dup 4)
+                      (match_dup 3)]
+                     UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Special case setting 0.0f to a V4SF element
+(define_insn_and_split "*vsx_set_v4sf_p9_zero"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (match_operand:SF 2