Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-15 Thread Segher Boessenkool
On Mon, Jul 11, 2016 at 06:54:44AM -0400, David Edelsohn wrote:
> Should we backport this?  At least Alan's UNSPEC_DOLOOP part?

Alan backported this to 6 (I unfortunately removed gcc-patches
from cc:).


Segher


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-11 Thread Segher Boessenkool
On Mon, Jul 11, 2016 at 06:54:44AM -0400, David Edelsohn wrote:
> Should we backport this?  At least Alan's UNSPEC_DOLOOP part?

The *wi is a bugfix; I'll backport it, just like the *d (already did
that one, 6 and 5; do we want 4.9 as well?)

Alan's last patch would be good to have as well, it is arguably a bugfix,
and it avoids most of the problematic cases.


Segher


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-11 Thread David Edelsohn
Should we backport this?  At least Alan's UNSPEC_DOLOOP part?

- David


On Mon, Jul 11, 2016 at 6:30 AM, Alan Modra  wrote:
> On Fri, Jul 08, 2016 at 04:17:36AM -0500, Segher Boessenkool wrote:
>> Maybe just UNSPEC_BDZ?  UNSPEC_DOLOOP?
>
> Committed revision 238207, using UNSPEC_DOLOOP.
>
> --
> Alan Modra
> Australia Development Lab, IBM


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-11 Thread Alan Modra
On Fri, Jul 08, 2016 at 04:17:36AM -0500, Segher Boessenkool wrote:
> Maybe just UNSPEC_BDZ?  UNSPEC_DOLOOP?

Committed revision 238207, using UNSPEC_DOLOOP.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 01:28:05AM +0930, Alan Modra wrote:
> BTW, both pr70098 and pr71763 are triggered by combine, not
> loop-doloop as I was thinking earlier.  See rtl dumps for the
> testcases.  I doubt the "optimization" done by combine here is worth
> keeping, since loop-doloop.c ought to already handle the benficial
> inner loop use of ctr.  Elsewhere we typically end up with an insn
> that needs splitting back to the original sequence.  So we could avoid
> creating trouble for ourselves with the following patch.

I agree on the approach; if there are any missed optimisation because
of it, it doesn't weigh up to the frequentish pessimisation we have now.

One case it will prevent it bdz before a bdnz loop (for a loop count of
zero), but we usually do not generate that anyway, and it isn't obvious
it is faster anyway (or smaller, even).

> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 7d9c660..b2d1118 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -148,6 +148,7 @@
> UNSPEC_IEEE128_MOVE
> UNSPEC_IEEE128_CONVERT
> UNSPEC_SIGNBIT
> +   UNSPEC_DONT_COMBINE
>])

That is a pretty horrible name.  Combine can combine such insns just fine;
it won't make up the unspec out of thin air though.

It seems you want to use this in other cases that should not be invented
by combine as well, but that won't work: combine could then morph one of
those into another kind.

Maybe just UNSPEC_BDZ?  UNSPEC_DOLOOP?

> -(define_insn "*ctr_internal5"
> +(define_insn "*ctr_internal3"

Please don't rename the patterns, not if you don't make better names.


Thanks, this should be a nice improvement,


Segher


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 12:37:55PM +0930, Alan Modra wrote:
> The regression tests passed.  I've been looking at differences in
> gcc/*.o and find many cases like the following.
> 
> orig/combine.o
> 1508: 01 00 3f 2c cmpdi   r31,1
> 150c: ff ff ff 3b addir31,r31,-1
> 1510: dc fe 82 41 beq 13ec
> patched/combine.o
> 1508: ff ff ff 37 addic.  r31,r31,-1
> 150c: e0 fe 82 41 beq 13ec
> 
> Combine transforms the first sequence to the second, then further
> transforms that to a bdz (ctr).  When that fails to get ctr
> allocated, the splitter takes us all the way back to the three insn
> sequence..

It used to do the addic. insn.  When I made the carry bit exposed to GCC,
it no longer was possible to always split to addic. though (CA might be
live there already).  Since the splitter should seldomly be used at all,
it now never splits to addic. (and addic. also is slower on some machines,
it is cracked, longer latency than you get with the compare to 1).

> With the patch we use ctr for the inner loop.  With unpatched gcc
> combine generates ctr for the outer loop, which of course uses
> ctr and isn't profitable with an inner loop using ctr.  Vagaries of
> the register allocator result in the outer loop using ctr with the
> inner one losing.  Oops, we generally want inner loops to be more
> highly optimized.

Lovely :-)


Segher


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-07 Thread Alan Modra
On Fri, Jul 08, 2016 at 01:28:05AM +0930, Alan Modra wrote:
> BTW, both pr70098 and pr71763 are triggered by combine, not
> loop-doloop as I was thinking earlier.  See rtl dumps for the
> testcases.  I doubt the "optimization" done by combine here is worth
> keeping, since loop-doloop.c ought to already handle the benficial
> inner loop use of ctr.  Elsewhere we typically end up with an insn
> that needs splitting back to the original sequence.  So we could avoid
> creating trouble for ourselves with the following patch.
> 
> Bootstrap and regression test powerpc64le-linux and powerpc64-linux in
> progress.
> 
>   * config/rs6000/rs6000.md (UNSPEC_DONT_COMBINE): New unspec.
>   (ctr): Add unspec.
>   (ctr_internal* and splitters): Likewise.  Renumber.

The regression tests passed.  I've been looking at differences in
gcc/*.o and find many cases like the following.

orig/combine.o
1508:   01 00 3f 2c cmpdi   r31,1
150c:   ff ff ff 3b addir31,r31,-1
1510:   dc fe 82 41 beq 13ec
patched/combine.o
1508:   ff ff ff 37 addic.  r31,r31,-1
150c:   e0 fe 82 41 beq 13ec

Combine transforms the first sequence to the second, then further
transforms that to a bdz (ctr).  When that fails to get ctr
allocated, the splitter takes us all the way back to the three insn
sequence..

There's also a quite interesting case involving this nested loop in
real.c:real_to_hexadecimal.

  for (i = SIGSZ - 1; i >= 0; --i)
for (j = HOST_BITS_PER_LONG - 4; j >= 0; j -= 4)
  {
*p++ = "0123456789abcdef"[(r->sig[i] >> j) & 15];
if (--digits == 0)
  goto out;
  }

With the patch we use ctr for the inner loop.  With unpatched gcc
combine generates ctr for the outer loop, which of course uses
ctr and isn't profitable with an inner loop using ctr.  Vagaries of
the register allocator result in the outer loop using ctr with the
inner one losing.  Oops, we generally want inner loops to be more
highly optimized.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-07 Thread Alan Modra
On Thu, Jul 07, 2016 at 02:58:17AM +, Segher Boessenkool wrote:
> Similar to PR70098, which is about integers in floating point registers,
> we can have the completely analogous problem with vector registers as well
> now that we allow integers in vector registers.  So, this patch solves it
> in the same way.  This only works for targets with direct move.
> 
> To recap: register allocation can decide to put an integer mode value in
> a floating point or vector register.  If that register is used in a bd*z
> instruction, which is a jump instruction, reload can not do an output
> reload on it (it does not do output reloads on any jump insns), so the
> float or vector register will remain, and we have to allow it here or
> recog will ICE.  Later on we will split this to valid instructions,
> including a move from that fp/vec register to an int register; it is this
> move that will still fail (PR70098) if we do not have direct move enabled.

BTW, both pr70098 and pr71763 are triggered by combine, not
loop-doloop as I was thinking earlier.  See rtl dumps for the
testcases.  I doubt the "optimization" done by combine here is worth
keeping, since loop-doloop.c ought to already handle the benficial
inner loop use of ctr.  Elsewhere we typically end up with an insn
that needs splitting back to the original sequence.  So we could avoid
creating trouble for ourselves with the following patch.

Bootstrap and regression test powerpc64le-linux and powerpc64-linux in
progress.

* config/rs6000/rs6000.md (UNSPEC_DONT_COMBINE): New unspec.
(ctr): Add unspec.
(ctr_internal* and splitters): Likewise.  Renumber.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7d9c660..b2d1118 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -148,6 +148,7 @@
UNSPEC_IEEE128_MOVE
UNSPEC_IEEE128_CONVERT
UNSPEC_SIGNBIT
+   UNSPEC_DONT_COMBINE
   ])
 
 ;;
@@ -12185,6 +12186,7 @@
  (set (match_dup 0)
   (plus:P (match_dup 0)
(const_int -1)))
+ (unspec [(const_int 0)] UNSPEC_DONT_COMBINE)
  (clobber (match_scratch:CC 2 ""))
  (clobber (match_scratch:P 3 ""))])]
   ""
@@ -12205,6 +12207,7 @@
(set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_DONT_COMBINE)
(clobber (match_scratch:CC 3 "=X,,,"))
(clobber (match_scratch:P 4 "=X,X,,r"))]
   ""
@@ -12229,6 +12232,7 @@
(set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_DONT_COMBINE)
(clobber (match_scratch:CC 3 "=X,,,"))
(clobber (match_scratch:P 4 "=X,X,,r"))]
   ""
@@ -12246,7 +12250,7 @@
 
 ;; Similar but use EQ
 
-(define_insn "*ctr_internal5"
+(define_insn "*ctr_internal3"
   [(set (pc)
(if_then_else (eq (match_operand:P 1 "register_operand" "c,*b,*b,*b")
  (const_int 1))
@@ -12255,6 +12259,7 @@
(set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_DONT_COMBINE)
(clobber (match_scratch:CC 3 "=X,,,"))
(clobber (match_scratch:P 4 "=X,X,,r"))]
   ""
@@ -12270,7 +12275,7 @@
   [(set_attr "type" "branch")
(set_attr "length" "*,16,20,20")])
 
-(define_insn "*ctr_internal6"
+(define_insn "*ctr_internal4"
   [(set (pc)
(if_then_else (eq (match_operand:P 1 "register_operand" "c,*b,*b,*b")
  (const_int 1))
@@ -12279,6 +12284,7 @@
(set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_DONT_COMBINE)
(clobber (match_scratch:CC 3 "=X,,,"))
(clobber (match_scratch:P 4 "=X,X,,r"))]
   ""
@@ -12305,6 +12311,7 @@
  (match_operand 6 "" "")))
(set (match_operand:P 0 "int_reg_operand" "")
(plus:P (match_dup 1) (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_DONT_COMBINE)
(clobber (match_scratch:CC 3 ""))
(clobber (match_scratch:P 4 ""))]
   "reload_completed"
@@ -12330,6 +12337,7 @@
  (match_operand 6 "" "")))
(set (match_operand:P 0 "nonimmediate_operand" "")
(plus:P (match_dup 1) (const_int -1)))
+   (unspec [(const_int 0)] UNSPEC_DONT_COMBINE)
(clobber (match_scratch:CC 3 ""))
(clobber (match_scratch:P 4 ""))]
   "reload_completed && ! gpc_reg_operand (operands[0], SImode)"

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-06 Thread Segher Boessenkool
Whoops, left off a bit:

> 2016-07-06  Segher Boessenkool  
> 
>   PR target/70098
>   PR target/71763
>   * config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
>   *ctr_internal5, *ctr_internal6): 

* config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
*ctr_internal5, *ctr_internal6): Add *wi to the output
constraint.


Segher


[PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-06 Thread Segher Boessenkool
Similar to PR70098, which is about integers in floating point registers,
we can have the completely analogous problem with vector registers as well
now that we allow integers in vector registers.  So, this patch solves it
in the same way.  This only works for targets with direct move.

To recap: register allocation can decide to put an integer mode value in
a floating point or vector register.  If that register is used in a bd*z
instruction, which is a jump instruction, reload can not do an output
reload on it (it does not do output reloads on any jump insns), so the
float or vector register will remain, and we have to allow it here or
recog will ICE.  Later on we will split this to valid instructions,
including a move from that fp/vec register to an int register; it is this
move that will still fail (PR70098) if we do not have direct move enabled.

Bootstrapped and tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra);
testing on powerpc64le-linux in progress.  Also tested the new testcase
separately.  And bootstrapped/tested on powerp64le-linux.

This will need to go to the 6 branch together with the int-in-vector
patches (and the previous 70098 patch if that isn't there yet).

Committing to trunk.


Segher


2016-07-06  Segher Boessenkool  

PR target/70098
PR target/71763
* config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
*ctr_internal5, *ctr_internal6): 

gcc/testsuite/
PR target/70098
PR target/71763
* gcc.target/powerpc/pr71763.c: New file.

---
 gcc/config/rs6000/rs6000.md|  8 
 gcc/testsuite/gcc.target/powerpc/pr71763.c | 27 +++
 2 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr71763.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a7615b1..fcb70e5 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12151,7 +12151,7 @@ (define_insn "*ctr_internal1"
  (const_int 1))
  (label_ref (match_operand 0 "" ""))
  (pc)))
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
@@ -12175,7 +12175,7 @@ (define_insn "*ctr_internal2"
  (const_int 1))
  (pc)
  (label_ref (match_operand 0 "" ""
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
@@ -12201,7 +12201,7 @@ (define_insn "*ctr_internal5"
  (const_int 1))
  (label_ref (match_operand 0 "" ""))
  (pc)))
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
@@ -12225,7 +12225,7 @@ (define_insn "*ctr_internal6"
  (const_int 1))
  (pc)
  (label_ref (match_operand 0 "" ""
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
diff --git a/gcc/testsuite/gcc.target/powerpc/pr71763.c 
b/gcc/testsuite/gcc.target/powerpc/pr71763.c
new file mode 100644
index 000..7910a90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr71763.c
@@ -0,0 +1,27 @@
+// PR target/71763
+// { dg-do compile }
+// { dg-options "-O1 -mvsx" }
+// { dg-xfail-if "PR70098" { lp64 && powerpc64_no_dm } }
+// { dg-prune-output ".*internal compiler error.*" }
+
+int a, b;
+float c;
+
+void fn2(void);
+
+void fn1(void)
+{
+long d;
+
+for (d = 3; d; d--) {
+for (a = 0; a <= 1; a++) {
+b &= 1;
+if (b) {
+for (;;) {
+fn2();
+c = d;
+}
+}
+}
+}
+}
-- 
1.9.3