Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

2017-06-19 Thread Segher Boessenkool
On Fri, Jun 16, 2017 at 05:55:35PM -0400, Michael Meissner wrote:
> Here is the latest patch that restricts the optimization to 64-bit (due to
> needing VSX small integers).  I've done a full bootstrap/make check on a little
> endian power8 system, and a build without bootstrap and make check on a little
> endian power9 system.  Neither the power8 nor the power9 systems had any
> regressions.  I'm also running a test on a big endian power7 system for
> completeness.
> 
> Assuming the power7 test finishes without any regressions, can I check this
> patch into the trunk and later the GCC 7 branch?
> 
> The main change was to restrict the optimization to 64-bit PowerPC that have
> VSX small integer support turned on (default for 64-bit).  I did shorten the
> one line in the testsuite that you mentioned.

Okay for both.  Thanks!


Segher


> 2017-06-16  Michael Meissner  
> 
>   PR target/79799
>   * config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
>   for doing vector set of SFmode on ISA 3.0.
>   * config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
>   (vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
>   element.
>   (vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
>   SFmode value into a V4SF variable that was extracted from another
>   V4SF variable without converting the element to double precision
>   and back to single precision vector format.
>   (vsx_insert_extract_v4sf_p9_2): Likewise.
> 
> [gcc/testsuite]
> 2017-06-16  Michael Meissner  
> 
>   PR target/79799
>   * gcc.target/powerpc/pr79799-1.c: New test.
>   * gcc.target/powerpc/pr79799-2.c: Likewise.
>   * gcc.target/powerpc/pr79799-3.c: Likewise.
>   * gcc.target/powerpc/pr79799-4.c: Likewise.
>   * gcc.target/powerpc/pr79799-5.c: Likewise.


Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

2017-06-16 Thread Michael Meissner
On Fri, Jun 16, 2017 at 04:30:48PM -0500, Segher Boessenkool wrote:
> On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > > +  "&& reload_completed"
> > > 
> > > I still don't think it is such a good idea to do all of this not until
> > > after reload.  It does of course allow you to play tricks with changing
> > > register mode at will, like you do ;-)
> > 
> > The problem is MODES_TIEABLE_P.  V4S{I,F}mode and SImode cannot be tied
> > > together (i.e. use gen_lowpart to change the mode and use a SUBREG).  So after
> > reload, we can just use gen_rtx_REG (...) to change the register type, but
> > before reload, by creating the SUBREG, it can lead to various aborts if rtl
> > checking is turned on.
> 
> That sounds like a problem elsewhere?  Hrm.
> 
> > > All these unspecs are a similar problem: the RTL optimisers cannot do
> > > much at all with it.
> > 
> > > I don't think there is a good way to represent a vec_insert.  And vec_extract
> > can't represent a variable extract either.
> 
> Yeah.  But especially for all this lane shuffling etc. the generic
> optimisers could do a good job, if only they knew how.  Maybe we need
> some new RTL codes.
> 
> > > > +  [(set_attr "type" "vecperm")
> > 
> > > Is that a good type for this?  I think the convert is more expensive
> > > than the permutes?  If so, that would be better (of course it only
> > > matters for sched1, not super important).
> > 
> > I generally use the type of the last insn.  I am open to other suggestions.
> 
> It should describe the resulting insns as a whole.  Picking the type of
> the most expensive insn is often a reasonable approximation; for integer
> insns "two" or "three" can be okay.
> 
> I don't think we can do much better currently.

Here is the latest patch that restricts the optimization to 64-bit (due to
needing VSX small integers).  I've done a full bootstrap/make check on a little
endian power8 system, and a build without bootstrap and make check on a little
endian power9 system.  Neither the power8 nor the power9 systems had any
regressions.  I'm also running a test on a big endian power7 system for
completeness.

Assuming the power7 test finishes without any regressions, can I check this
patch into the trunk and later the GCC 7 branch?

The main change was to restrict the optimization to 64-bit PowerPC that have
VSX small integer support turned on (default for 64-bit).  I did shorten the
one line in the testsuite that you mentioned.

[gcc]
2017-06-16  Michael Meissner  

PR target/79799
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
for doing vector set of SFmode on ISA 3.0.
* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
element.
(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
SFmode value into a V4SF variable that was extracted from another
V4SF variable without converting the element to double precision
and back to single precision vector format.
(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-16  Michael Meissner  

PR target/79799
* gcc.target/powerpc/pr79799-1.c: New test.
* gcc.target/powerpc/pr79799-2.c: Likewise.
* gcc.target/powerpc/pr79799-3.c: Likewise.
* gcc.target/powerpc/pr79799-4.c: Likewise.
* gcc.target/powerpc/pr79799-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 249175)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -7451,6 +7451,8 @@ rs6000_expand_vector_set (rtx target, rt
insn = gen_vsx_set_v8hi_p9 (target, target, val, elt_rtx);
  else if (mode == V16QImode)
insn = gen_vsx_set_v16qi_p9 (target, target, val, elt_rtx);
+ else if (mode == V4SFmode)
+   insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
}
 
   if (insn)
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 249175)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -3012,6 +3012,134 @@ (define_insn "vsx_set_<mode>_p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+   (unspec:V4SF
+[(match_operand:V4SF 1 "gpc_reg_operand" "0")
+ (match_operand:SF 2 "gpc_reg_operand" "ww")
+ (match_operand:QI 3 "const_0_to_3_operand" "n")]
+UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "="))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && 

Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

2017-06-16 Thread Segher Boessenkool
On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > +  "&& reload_completed"
> > 
> > I still don't think it is such a good idea to do all of this not until
> > after reload.  It does of course allow you to play tricks with changing
> > register mode at will, like you do ;-)
> 
> The problem is MODES_TIEABLE_P.  V4S{I,F}mode and SImode cannot be tied
> together (i.e. use gen_lowpart to change the mode and use a SUBREG).  So after
> reload, we can just use gen_rtx_REG (...) to change the register type, but
> before reload, by creating the SUBREG, it can lead to various aborts if rtl
> checking is turned on.

That sounds like a problem elsewhere?  Hrm.

> > All these unspecs are a similar problem: the RTL optimisers cannot do
> > much at all with it.
> 
> I don't think there is a good way to represent a vec_insert.  And vec_extract
> can't represent a variable extract either.

Yeah.  But especially for all this lane shuffling etc. the generic
optimisers could do a good job, if only they knew how.  Maybe we need
some new RTL codes.

> > > +  [(set_attr "type" "vecperm")
> 
> > Is that a good type for this?  I think the convert is more expensive
> > than the permutes?  If so, that would be better (of course it only
> > matters for sched1, not super important).
> 
> I generally use the type of the last insn.  I am open to other suggestions.

It should describe the resulting insns as a whole.  Picking the type of
the most expensive insn is often a reasonable approximation; for integer
insns "two" or "three" can be okay.

I don't think we can do much better currently.


Segher


Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

2017-06-16 Thread Michael Meissner
On Fri, Jun 16, 2017 at 02:52:46PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Thu, Jun 15, 2017 at 10:10:28PM -0400, Michael Meissner wrote:
> > +(define_insn_and_split "vsx_set_v4sf_p9"
> > +  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
> > +   (unspec:V4SF
> > +[(match_operand:V4SF 1 "gpc_reg_operand" "0")
> > + (match_operand:SF 2 "gpc_reg_operand" "ww")
> > + (match_operand:QI 3 "const_0_to_3_operand" "n")]
> > +UNSPEC_VSX_SET))
> > +   (clobber (match_scratch:SI 4 "="))]
> > +  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
> > +  "#"
> > +  "&& reload_completed"
> 
> I still don't think it is such a good idea to do all of this not until
> after reload.  It does of course allow you to play tricks with changing
> register mode at will, like you do ;-)

The problem is MODES_TIEABLE_P.  V4S{I,F}mode and SImode cannot be tied
together (i.e. use gen_lowpart to change the mode and use a SUBREG).  So after
reload, we can just use gen_rtx_REG (...) to change the register type, but
before reload, by creating the SUBREG, it can lead to various aborts if rtl
checking is turned on.

> All these unspecs are a similar problem: the RTL optimisers cannot do
> much at all with it.

I don't think there is a good way to represent a vec_insert.  And vec_extract
can't represent a variable extract either.
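
For readers following the thread, here is a minimal source-level sketch of the
two operations being discussed, with the lane number known only at run time.
It is an illustration, not code from the patch: it uses the GCC/Clang
vector_size extension rather than the altivec.h vec_insert/vec_extract
built-ins so that it compiles on any target, and the helper names insert_lane
and extract_lane are hypothetical.

```c
#include <assert.h>

/* Four single-precision lanes; GCC's RTL name for this mode is V4SF.
   vector_size is a GCC/Clang extension (an assumption of this sketch).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: return VEC with lane LANE replaced by VAL.
   LANE is a run-time value, i.e. a "variable insert".  */
v4sf
insert_lane (v4sf vec, float val, unsigned int lane)
{
  assert (lane < 4);
  vec[lane] = val;
  return vec;
}

/* Hypothetical helper: return lane LANE of VEC, again with a
   run-time lane number ("variable extract").  */
float
extract_lane (v4sf vec, unsigned int lane)
{
  assert (lane < 4);
  return vec[lane];
}
```

Note that the vsx_set_v4sf_p9 pattern in this patch only accepts a constant
lane (const_0_to_3_operand); the run-time lane shown here is the case the
comment about vec_extract refers to.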

> > +  [(set_attr "type" "vecperm")

I generally use the type of the last insn.  I am open to other suggestions.

> Is that a good type for this?  I think the convert is more expensive
> than the permutes?  If so, that would be better (of course it only
> matters for sched1, not super important).
> 
> > --- gcc/testsuite/gcc.target/powerpc/pr79799-1.c(nonexistent)
> > +++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c(working copy)
> > @@ -0,0 +1,43 @@
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> 
> Why not powerpc*-*-*?

Well as it turns out, it aborts in 32-bit, because -mvsx-small-integer is not
enabled, and we can't have SImode in vector registers.  I'll have to add some
additional tests and resubmit the patch.

> 
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-options "-mcpu=power9 -O2" } */
> > +
> > +#include 
> > +
> > +/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
> > +   ISA 3.0 (power9) systems.  */
> 
> That first line is a bit long.

Ok.

> The patch is okay for trunk and 7 with the testsuite nits taken care of.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

2017-06-16 Thread Segher Boessenkool
Hi Mike,

On Thu, Jun 15, 2017 at 10:10:28PM -0400, Michael Meissner wrote:
> +(define_insn_and_split "vsx_set_v4sf_p9"
> +  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
> + (unspec:V4SF
> +  [(match_operand:V4SF 1 "gpc_reg_operand" "0")
> +   (match_operand:SF 2 "gpc_reg_operand" "ww")
> +   (match_operand:QI 3 "const_0_to_3_operand" "n")]
> +  UNSPEC_VSX_SET))
> +   (clobber (match_scratch:SI 4 "="))]
> +  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
> +  "#"
> +  "&& reload_completed"

I still don't think it is such a good idea to do all of this not until
after reload.  It does of course allow you to play tricks with changing
register mode at will, like you do ;-)

All these unspecs are a similar problem: the RTL optimisers cannot do
much at all with it.

> +  [(set_attr "type" "vecperm")

Is that a good type for this?  I think the convert is more expensive
than the permutes?  If so, that would be better (of course it only
matters for sched1, not super important).

> --- gcc/testsuite/gcc.target/powerpc/pr79799-1.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c  (working copy)
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */

Why not powerpc*-*-*?

> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mcpu=power9 -O2" } */
> +
> +#include 
> +
> +/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
> +   ISA 3.0 (power9) systems.  */

That first line is a bit long.


The patch is okay for trunk and 7 with the testsuite nits taken care of.

Thanks,


Segher


[PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)

2017-06-15 Thread Michael Meissner
On Thu, Jun 15, 2017 at 07:39:39PM -0400, Michael Meissner wrote:
> I thought the patch was fine as I posted.  I had an optimization I thought
> about (optimizing for inserting 0.0f) and I noticed some problems with it.
> However, even in backing out the change, there are some problems.  So, I will
> hopefully reissue the patch tomorrow.

OK, the problem was that I need to patch the compiler with a workaround to run
code on the current alpha hardware, and in backing out the patches for the code
I was working on, I backed out the workaround as well.

This patch replaces the first patch.  It adds an optimization so that if you
set a field in a V4SFmode vector to 0.0f, the compiler will know it can just
clear the field, and it doesn't have to convert the 0.0 in internal scalar
format to vector format with the XSCVDPSPN instruction.
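
As an illustration of the source-level shape this special case is aimed at,
the sketch below stores 0.0f into one lane of a four-float vector.  It is a
sketch under stated assumptions, not a test from the patch: it uses the
GCC/Clang vector_size extension instead of altivec.h so it builds on any
target, and clear_lane1 and check_clear_lane1 are hypothetical names.  It
shows only the behaviour (one lane cleared, the rest untouched), not which
instructions a particular compiler emits.

```c
#include <assert.h>

/* Four single-precision lanes (V4SF in GCC's RTL terminology).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: set lane 1 of VEC to 0.0f and leave the other
   lanes unchanged.  The lane number is a compile-time constant.  */
v4sf
clear_lane1 (v4sf vec)
{
  vec[1] = 0.0f;
  return vec;
}

/* Small self-check of the behaviour described above.  */
void
check_clear_lane1 (void)
{
  v4sf v = { 1.0f, 2.0f, 3.0f, 4.0f };

  v = clear_lane1 (v);
  assert (v[0] == 1.0f && v[1] == 0.0f);
  assert (v[2] == 3.0f && v[3] == 4.0f);
}
```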

As before, I have bootstrapped this patch on a little endian power8 system, and
I had no regressions in the test suite.  The new tests pr79799-{1,2,3,5}.c all
generate the appropriate code.  I have also done a non-bootstrap build and make
check on the alpha power9 hardware with --with-cpu=power9, and there are no
regressions.  The executable test (pr79799-4.c) runs fine.

Can I install this change to the trunk?  After a week of burn-in, can I install
this on the GCC 7.x branch?  Note, it will not work on previous branches.

[gcc]
2017-06-15  Michael Meissner  

PR target/79799
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
for doing vector set of SFmode on ISA 3.0.
* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
element.
(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
SFmode value into a V4SF variable that was extracted from another
V4SF variable without converting the element to double precision
and back to single precision vector format.
(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-15  Michael Meissner  

PR target/79799
* gcc.target/powerpc/pr79799-1.c: New test.
* gcc.target/powerpc/pr79799-2.c: Likewise.
* gcc.target/powerpc/pr79799-3.c: Likewise.
* gcc.target/powerpc/pr79799-4.c: Likewise.
* gcc.target/powerpc/pr79799-5.c: Likewise.
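
The vsx_insert_extract_v4sf_p9 entry above describes inserting an element that
was itself extracted from another V4SF vector.  A minimal sketch of that
source-level shape follows.  It is an assumption-laden illustration rather
than one of the pr79799 tests: it uses the GCC/Clang vector_size extension in
place of altivec.h, the lane numbers are arbitrary, and copy_lane and
check_copy_lane are hypothetical names.

```c
#include <assert.h>

/* Four single-precision lanes (V4SF in GCC's RTL terminology).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: copy lane 2 of SRC into lane 1 of DST.  The
   value moves from one vector to another without ever being needed
   as a stand-alone scalar.  */
v4sf
copy_lane (v4sf dst, v4sf src)
{
  dst[1] = src[2];
  return dst;
}

/* Small self-check: only lane 1 of the destination changes.  */
void
check_copy_lane (void)
{
  v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf b = { 10.0f, 20.0f, 30.0f, 40.0f };

  a = copy_lane (a, b);
  assert (a[0] == 1.0f && a[1] == 30.0f);
  assert (a[2] == 3.0f && a[3] == 4.0f);
}
```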

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 249175)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -7442,6 +7442,9 @@ rs6000_expand_vector_set (rtx target, rt
   else if (mode == V2DImode)
insn = gen_vsx_set_v2di (target, target, val, elt_rtx);
 
+  else if (TARGET_P9_VECTOR && mode == V4SFmode)
+   insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
+
   else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
{
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 249175)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -3012,6 +3012,130 @@ (define_insn "vsx_set_<mode>_p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+   (unspec:V4SF
+[(match_operand:V4SF 1 "gpc_reg_operand" "0")
+ (match_operand:SF 2 "gpc_reg_operand" "ww")
+ (match_operand:QI 3 "const_0_to_3_operand" "n")]
+UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "="))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+   (unspec:V4SF [(match_dup 2)]
+UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+  (vec_select:SI (match_dup 6)
+ (parallel [(match_dup 7)])))
+ (clobber (scratch:SI))])
+   (set (match_dup 8)
+   (unspec:V4SI [(match_dup 8)
+ (match_dup 4)
+ (match_dup 3)]
+UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Special case setting 0.0f to a V4SF element
+(define_insn_and_split "*vsx_set_v4sf_p9_zero"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+   (unspec:V4SF
+[(match_operand:V4SF 1 "gpc_reg_operand" "0")
+ (match_operand:SF 2