Re: [PATCH], Tweak PowerPC movdi constraints

2016-11-21 Thread Michael Meissner
On Mon, Nov 21, 2016 at 12:51:38PM -0600, Segher Boessenkool wrote:
> 
> Okay, if you change the changelog to say what the patch actually does ;-)
> And please watch for fallout.

This is the ChangeLog entry I checked in.

2016-11-21  Michael Meissner  
 
* config/rs6000/rs6000.md (movdi_internal32): Change constraints
so that DImode can be allocated to FP/vector registers in more
cases, and we can avoid direct move operations.  If the register
needs reloading, prefer GPRs over FP/vector registers.  In the
case of FPR vs. Altivec registers, prefer FPR registers unless we
have the ISA 3.0 reg+offset scalar instructions.
(movdi_internal64): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Tweak PowerPC movdi constraints

2016-11-21 Thread Segher Boessenkool
On Mon, Nov 21, 2016 at 01:27:59PM -0500, Michael Meissner wrote:
> On Fri, Nov 18, 2016 at 05:07:21PM -0600, Segher Boessenkool wrote:
> > On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote:
> > > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> > > > Could you also test with reload please?  Just LE is enough I guess.
> > > > We'd like to keep reload working for GCC 7 at least, and these cost
> > > > prefixes tend to break mov patterns :-/
> > > 
> > > Argh, I guess you are right, but then if reload doesn't work, I will 
> > > likely
> > > submit the patch where there are three different movdi's (one for 32-bit
> > > without the change, one for 64-bit with reload, and one for 64-bit with 
> > > lra).
> > > I would prefer not to do that.
> > 
> > Let's hope it just works :-)
> 
> I did test it over the weekend.
> 
> 29 of the 30 spec 2006 benchmarks currently build with reload (gamess fails).
> The same 29 build and run with the new patch.  Like the patch under LRA, there
> are no regressions in performance, and one FP benchmark is faster.
> 
> Under LRA, sphinx3 is 2.5% faster (compared to LRA without the patch).
> 
> Under reload, sphinx3 is roughly the same performance, but calculix is 3.8%
> faster.

Great, thanks for testing.

> Can I apply the patch?

Okay, if you change the changelog to say what the patch actually does ;-)
And please watch for fallout.


Segher


Re: [PATCH], Tweak PowerPC movdi constraints

2016-11-21 Thread Michael Meissner
On Fri, Nov 18, 2016 at 05:07:21PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote:
> > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> > > Could you also test with reload please?  Just LE is enough I guess.
> > > We'd like to keep reload working for GCC 7 at least, and these cost
> > > prefixes tend to break mov patterns :-/
> > 
> > Argh, I guess you are right, but then if reload doesn't work, I will likely
> > submit the patch where there are three different movdi's (one for 32-bit
> > without the change, one for 64-bit with reload, and one for 64-bit with 
> > lra).
> > I would prefer not to do that.
> 
> Let's hope it just works :-)

I did test it over the weekend.

29 of the 30 spec 2006 benchmarks currently build with reload (gamess fails).
The same 29 build and run with the new patch.  Like the patch under LRA, there
are no regressions in performance, and one FP benchmark is faster.

Under LRA, sphinx3 is 2.5% faster (compared to LRA without the patch).

Under reload, sphinx3 is roughly the same performance, but calculix is 3.8%
faster.

Can I apply the patch?

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Tweak PowerPC movdi constraints

2016-11-18 Thread Segher Boessenkool
On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote:
> On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> > On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> > > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> > > constraints instead of "?*".  This allows the register allocator to more 
> > > often
> > > allocate DImode to a floating point/vector register when it is desirable 
> > > to do
> > > so.
> > 
> > It also changes some plain "?" to "^" or "$" or even "*" (which cannot
> > work for multi-character constraints, it skips one character, not one
> > constraint).  Wrong version of the patch?
> 
> Note, if '*' does not work with multi-character prefixes, that is a bug.  All
> of rs6000.md assumes that ?*wa means that the register allocator will not
> consider VSX vector registers for when calculating register preferences.

The documentation is out of date.

>From ira-costs.c:

  /* Scan all the constraint letters.  See if the operand
 matches any of the constraints.  Collect the valid
 register classes and see if this operand accepts
 memory.  */
  while ((c = *p))
{
  switch (c)
{
case '*':
  /* Ignore the next letter for this pass.  */
  c = *++p;
  break;

and then 83 lines later:

  p += CONSTRAINT_LEN (c, p);
  if (c == ',')
break;
}

so it does in fact work.

Neither the patch description nor the changelog says you are doing these
changes though.

> > > I built bootstrap compilers and did make check with no regressions on:
> > > 1)Little endian power8, --with-cpu=power8
> > > 2)Big endian power8, --with-cpu=power8 (no 32-bit support)
> > > 3)Big endian power7, --with-cpu=power7 (both 32/64-bit support)
> > 
> > Could you also test with reload please?  Just LE is enough I guess.
> > We'd like to keep reload working for GCC 7 at least, and these cost
> > prefixes tend to break mov patterns :-/
> 
> Argh, I guess you are right, but then if reload doesn't work, I will likely
> submit the patch where there are three different movdi's (one for 32-bit
> without the change, one for 64-bit with reload, and one for 64-bit with lra).
> I would prefer not to do that.

Let's hope it just works :-)


Segher


Re: [PATCH], Tweak PowerPC movdi constraints

2016-11-18 Thread Michael Meissner
On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> > constraints instead of "?*".  This allows the register allocator to more 
> > often
> > allocate DImode to a floating point/vector register when it is desirable to 
> > do
> > so.
> 
> It also changes some plain "?" to "^" or "$" or even "*" (which cannot
> work for multi-character constraints, it skips one character, not one
> constraint).  Wrong version of the patch?

Note, if '*' does not work with multi-character prefixes, that is a bug.  All
of rs6000.md assumes that ?*wa means that the register allocator will not
consider VSX vector registers for when calculating register preferences.

> > I built bootstrap compilers and did make check with no regressions on:
> > 1)  Little endian power8, --with-cpu=power8
> > 2)  Big endian power8, --with-cpu=power8 (no 32-bit support)
> > 3)  Big endian power7, --with-cpu=power7 (both 32/64-bit support)
> 
> Could you also test with reload please?  Just LE is enough I guess.
> We'd like to keep reload working for GCC 7 at least, and these cost
> prefixes tend to break mov patterns :-/

Argh, I guess you are right, but then if reload doesn't work, I will likely
submit the patch where there are three different movdi's (one for 32-bit
without the change, one for 64-bit with reload, and one for 64-bit with lra).
I would prefer not to do that.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Tweak PowerPC movdi constraints

2016-11-18 Thread Segher Boessenkool
On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> constraints instead of "?*".  This allows the register allocator to more often
> allocate DImode to a floating point/vector register when it is desirable to do
> so.

It also changes some plain "?" to "^" or "$" or even "*" (which cannot
work for multi-character constraints, it skips one character, not one
constraint).  Wrong version of the patch?

> I built bootstrap compilers and did make check with no regressions on:
> 1)Little endian power8, --with-cpu=power8
> 2)Big endian power8, --with-cpu=power8 (no 32-bit support)
> 3)Big endian power7, --with-cpu=power7 (both 32/64-bit support)

Could you also test with reload please?  Just LE is enough I guess.
We'd like to keep reload working for GCC 7 at least, and these cost
prefixes tend to break mov patterns :-/


Segher


[PATCH], Tweak PowerPC movdi constraints

2016-11-18 Thread Michael Meissner
This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
constraints instead of "?*".  This allows the register allocator to more often
allocate DImode to a floating point/vector register when it is desirable to do
so.

I did a full Spec 2006 run with this patch installed, and most of the
benchmarks were neutral in terms of performance.  The 482.sphinx3 benchmark had
a 2.5% performance boost with these patches.  There were no benchmarks that
regressed with this patch.

I built bootstrap compilers and did make check with no regressions on:
1)  Little endian power8, --with-cpu=power8
2)  Big endian power8, --with-cpu=power8 (no 32-bit support)
3)  Big endian power7, --with-cpu=power7 (both 32/64-bit support)

Can I check this patch into the trunk?

[gcc]
2016-11-18  Michael Meissner  

* config/rs6000/rs6000.md (movdi_internal32): Change FPR/VSX "?*"
load/store constraints to "^" if the instruction allows d-form
addressing or "$" if it only allows x-form addressing.  Change
FPR/VSX move constraints to "^".
(movdi_internal64): Likewise.

[gcc/testsuite]
2016-11-18  Michael Meissner  

* gcc.target/powerpc/ppc-round2.c: Allow XSCVDPSXWS and XSCVDPUXWS
to be generated instead of FCTIWUZ or FCTIWZ.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 242556)
+++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy)
@@ -8118,10 +8118,10 @@ (define_insn "p8_mfvsrd_4_disf"
 
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "rs6000_nonimmediate_operand"
- "=Y,r, r, ?m,?*d,?*d,
-  r, ?wY,   ?Z,?*wb,  ?*wv,   ?wi,
-  ?wo,   ?wo,   ?wv,   ?wi,   ?wi,?wv,
-  ?wv")
+ "=Y,r, r, ^m,^d, ^d,
+  r, ^wY,   $Z,^wb,   $wv,^wi,
+  *wo,   *wo,   *wv,   *wi,   *wi,*wv,
+  *wv")
 
(match_operand:DI 1 "input_operand"
   "r,Y, r, d, m,  d,
@@ -8195,9 +8195,9 @@ (define_split
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
"=Y,r, r, r, r,  r,
-?m,?*d,   ?*d,   ?wY,   ?Z, ?*wb,
-?*wv,  ?wi,   ?wo,   ?wo,   ?wv,?wi,
-?wi,   ?wv,   ?wv,   r, *h, *h,
+^m,^d,^d,^Y,$Z, $wb,
+$wv,   ^wi,   *wo,   *wo,   *wv,*wi,
+*wi,   *wv,   *wv,   r, *h, *h,
 ?*r,   ?*wg,  ?*r,   ?*wj")
 
(match_operand:DI 1 "input_operand"
Index: gcc/testsuite/gcc.target/powerpc/ppc-round2.c
===
--- gcc/testsuite/gcc.target/powerpc/ppc-round2.c   
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)
 (revision 242556)
+++ gcc/testsuite/gcc.target/powerpc/ppc-round2.c   
(.../gcc/testsuite/gcc.target/powerpc)  (working copy)
@@ -5,8 +5,8 @@
 /* { dg-options "-O2 -mcpu=power8" } */
 /* { dg-final { scan-assembler-times "fcfid "  2 } } */
 /* { dg-final { scan-assembler-times "fcfids " 2 } } */
-/* { dg-final { scan-assembler-times "fctiwuz "2 } } */
-/* { dg-final { scan-assembler-times "fctiwz " 2 } } */
+/* { dg-final { scan-assembler-times "fctiwuz \|xscvdpuxws " 2 } } */
+/* { dg-final { scan-assembler-times "fctiwz \|xscvdpsxws "  2 } } */
 /* { dg-final { scan-assembler-times "mfvsrd " 4 } } */
 /* { dg-final { scan-assembler-times "mtvsrwa "2 } } */
 /* { dg-final { scan-assembler-times "mtvsrwz "2 } } */