Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-08 Thread Michael Meissner
On Thu, Feb 08, 2018 at 06:10:31PM -0500, Hans-Peter Nilsson wrote:
> On Wed, 7 Feb 2018, Segher Boessenkool wrote:
> > Hi Mike,
> >
> > On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> > > Here is the patch reworked.  It bootstraps on both little/big endian 
> > > power8,
> > > and all of the tests run.  Can I install this into trunk now, and into 
> > > GCC 7
> > > after a soak period (along with the previous patch)?
> >
> > > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
> >
> > "If we have"?
> >
> > > +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> > > + (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> > > +   (clobber (match_scratch:SI 2 "=wa"))]
> > > +  "((mode == SImode && TARGET_P8_VECTOR)
> > > +|| (mode != SImode && TARGET_P9_VECTOR))"
> >
> > This is the same as
> >
> >   "(mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"
> 
> Umm, sorry for chiming in here with zero rs6000 knowledge and I
> might be missing something trivial but...what wouldn't that misfire for
>  "mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ?
> 
> (Is that invalid or not applicable or don't care or something?)

TARGET_P9_VECTOR requires TARGET_P8_VECTOR.

Basically when we are converting SF/DFmode to SImode, we want to allow it on
ISA 2.07 (-mcpu=power8).  If we are converting to SF/DFmode to HI/QImode, we
require ISA 3.0 (-mcpu=power9).

The reason is that we don't have the instructions to do 32-bit integer store
and 32-bit integer sign/zero extended load instructions to all of the vector
and floating point registers until ISA 2.07.  Because of that, we don't allow
SImode in the vector and floating point registers until ISA 2.07.  In
processors before power8, we had to do a store 64-bit integer on the stack and
then load up the 32-bit value into the GPR registers.

However, ISA 2.07 does not have instructions to store or load 8/16-bit values
that can be conveniently used.  ISA 3.0 added 8/16-bit store, 8/16-bit zero
extended load, and 8/16-bit sign extend instructions.  So in ISA 3.0, we allow
QI/HImode to go in vector registers.

In ISA 2.06 (-mcpu=power7) we had to use UNSPECs to hide the convert floating
point scalar to 32-bit signed/unsigned instruction instructions because we
didn't allow the base type into the register.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-08 Thread Hans-Peter Nilsson
On Wed, 7 Feb 2018, Segher Boessenkool wrote:
> Hi Mike,
>
> On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> > Here is the patch reworked.  It bootstraps on both little/big endian power8,
> > and all of the tests run.  Can I install this into trunk now, and into GCC 7
> > after a soak period (along with the previous patch)?
>
> > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
>
> "If we have"?
>
> > +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> > +   (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> > +   (clobber (match_scratch:SI 2 "=wa"))]
> > +  "((mode == SImode && TARGET_P8_VECTOR)
> > +|| (mode != SImode && TARGET_P9_VECTOR))"
>
> This is the same as
>
>   "(mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"

Umm, sorry for chiming in here with zero rs6000 knowledge and I
might be missing something trivial but...what wouldn't that misfire for
 "mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ?

(Is that invalid or not applicable or don't care or something?)

brgds, H-P


Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-07 Thread Segher Boessenkool
Hi Mike,

On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> Here is the patch reworked.  It bootstraps on both little/big endian power8,
> and all of the tests run.  Can I install this into trunk now, and into GCC 7
> after a soak period (along with the previous patch)?

> +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR

"If we have"?

> +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> + (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> +   (clobber (match_scratch:SI 2 "=wa"))]
> +  "((mode == SImode && TARGET_P8_VECTOR)
> +|| (mode != SImode && TARGET_P9_VECTOR))"

This is the same as

  "(mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"

which is a bit easier to understand I think.

Okay (for trunk as well as 7) with those trivialities improved.  Thanks!


Segher


Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-06 Thread Michael Meissner
On Tue, Feb 06, 2018 at 11:15:21AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote:
> > > We have
> > > 
> > > (define_code_attr su [(sign_extend  "s")
> > >   (zero_extend  "u")
> > >   (fix  "s")
> > >   (unsigned_fix "s")
> > >   (float"s")
> > >   (unsigned_float   "u")])
> > > 
> > > and "s" for unsigned_fix seems like it should be "u".  Very surprising
> > > otherwise (if this is needed in some case, it should just write "s" there
> > > instead of "").
> 
> What about this?

As we discussed before, this was a bug, and I've already committed the fix
separately.

> > I found as long as I avoid putting the  or  in the output template
> > (i.e. use an output statement instead of a template) it works.  It only
> > seems to fail in the IEEE128 case, and not in the SFDF case.  I will submit 
> > a
> > bug report  on it after this gets checked in, as it will be simpler to 
> > provide
> > a patch that people can test.
> 
> Thanks.
> 
> > +(define_insn_and_split "fix_trunc2"
> > +  [(set (match_operand: 0 "gpc_reg_operand" "=wJ,wJwK,r")
> > +   (any_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa")))
> > +   (clobber (match_scratch:SI 2 "=X,X,wi"))]
> > +  "TARGET_DIRECT_MOVE"
> > +  "@
> > +   fctiwz %0,%1
> > +   xscvdpxws %x0,%x1
> 
> This one cannot work with the  definition above, for example.
> 
> > +(define_insn "fix_2_hw"
> > +  [(set (match_operand:SDI 0 "altivec_register_operand" "=v")
> > +   (any_fix:SDI (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
> > +  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> > +{
> > +  return ( == UNSIGNED_FIX) ? "xscvqpuz %0,%1" : "xscvqpsz 
> > %0,%1";
> > +}
> 
> And it may well be why you need this.
> 
> Rest looks good :-)

Here is the patch reworked.  It bootstraps on both little/big endian power8,
and all of the tests run.  Can I install this into trunk now, and into GCC 7
after a soak period (along with the previous patch)?

[gcc]
2018-02-06  Michael Meissner  

PR target/84154
* config/rs6000/rs6000.md (fix_trunc2):
Convert from define_expand to be define_insn_and_split.  Rework
float/double/_Float128 conversions to QI/HI/SImode to work with
both ISA 2.07 (power8) or ISA 3.0 (power9).  Fix regression where
conversions to QI/HImode types did a store and then a load to
truncate the value.  For conversions to VSX registers, don't split
the insn, instead emit the code directly.  Use the code iterator
any_fix to combine signed and unsigned conversions.
(fix_truncsi2_p8): Likewise.
(fixuns_trunc2): Likewise.
(fix_trunc2): Likewise.
(fix_trunc2): Likewise.
(fix_di2_hw): Likewise.
(fixuns_di2_hw): Likewise.
(fix_si2_hw): Likewise.
(fixuns_si2_hw): Likewise.
(fix_2_hw): Likewise.
(fix_trunc2): Likewise.
(fctiwz__smallint): Rename fctiwz__smallint to
fix_truncsi2_p8.
(fix_trunc2_internal): Delete, no longer
used.
(fixuns_trunc2_internal): Likewise.
(fix__mem): Likewise.
(fctiwz__mem): Likewise.
(fix__mem): Likewise.
(fix_trunc2_mem): On ISA 3.0, prevent
the register allocator from doing a direct move to the GPRs to do
a store, and instead use the ISA 3.0 store byte/half-word from
vector register instruction.  For IEEE 128-bit floating point,
also optimize stores of 32-bit ints.
(fix_trunc2_mem): Likewise.

[gcc/testsuite]
2018-02-06  Michael Meissner  

PR target/84154
* gcc.target/powerpc/pr84154-1.c: New tests.
* gcc.target/powerpc/pr84154-2.c: Likewise.
* gcc.target/powerpc/pr84154-3.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 257269)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -5700,43 +5700,60 @@ (define_insn "*fix_truncdi2_fctidz
xscvdpsxds %x0,%x1"
   [(set_attr "type" "fp")])
 
-(define_expand "fix_trunc2"
-  [(parallel [(set (match_operand: 0 "nonimmediate_operand")
-  (fix:QHI (match_operand:SFDF 1 "gpc_reg_operand")))
- (clobber (match_scratch:DI 2))])]
-  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT"
+;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
+;; registers.  If we have ISA 2.07, we don't allow QI/HImode values in the
+;; vector registers, so we need to do direct moves to the GPRs, but SImode
+;; values can go in VSX registers.  Keeping the direct move part 

Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-06 Thread Michael Meissner
On Tue, Feb 06, 2018 at 11:15:21AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote:
> > > We have
> > > 
> > > (define_code_attr su [(sign_extend  "s")
> > >   (zero_extend  "u")
> > >   (fix  "s")
> > >   (unsigned_fix "s")
> > >   (float"s")
> > >   (unsigned_float   "u")])
> > > 
> > > and "s" for unsigned_fix seems like it should be "u".  Very surprising
> > > otherwise (if this is needed in some case, it should just write "s" there
> > > instead of "").
> 
> What about this?

You were right.  I kept missing this.  Sorry.

Here is the patch for just this fix.  As we've discussed, I'll apply this patch
directly as being obvious, while waiting for the other builds.

[gcc]
2018-02-06  Michael Meissner  

PR target/84154
* config/rs6000/rs6000.md (su code attribute): Use "u" for
unsigned_fix, not "s".

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 257269)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -550,7 +550,7 @@ (define_code_attr u  [(sign_extend  "")
 (define_code_attr su [(sign_extend "s")
  (zero_extend  "u")
  (fix  "s")
- (unsigned_fix "s")
+ (unsigned_fix "u")
  (float"s")
  (unsigned_float   "u")])
 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-06 Thread Segher Boessenkool
Hi!

On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote:
> > We have
> > 
> > (define_code_attr su [(sign_extend  "s")
> >   (zero_extend  "u")
> >   (fix  "s")
> >   (unsigned_fix "s")
> >   (float"s")
> >   (unsigned_float   "u")])
> > 
> > and "s" for unsigned_fix seems like it should be "u".  Very surprising
> > otherwise (if this is needed in some case, it should just write "s" there
> > instead of "").

What about this?

> I found as long as I avoid putting the  or  in the output template
> (i.e. use an output statement instead of a template) it works.  It only
> seems to fail in the IEEE128 case, and not in the SFDF case.  I will submit a
> bug report  on it after this gets checked in, as it will be simpler to provide
> a patch that people can test.

Thanks.

> +(define_insn_and_split "fix_trunc2"
> +  [(set (match_operand: 0 "gpc_reg_operand" "=wJ,wJwK,r")
> + (any_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa")))
> +   (clobber (match_scratch:SI 2 "=X,X,wi"))]
> +  "TARGET_DIRECT_MOVE"
> +  "@
> +   fctiwz %0,%1
> +   xscvdpxws %x0,%x1

This one cannot work with the  definition above, for example.

> +(define_insn "fix_2_hw"
> +  [(set (match_operand:SDI 0 "altivec_register_operand" "=v")
> + (any_fix:SDI (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
> +  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> +{
> +  return ( == UNSIGNED_FIX) ? "xscvqpuz %0,%1" : "xscvqpsz 
> %0,%1";
> +}

And it may well be why you need this.

Rest looks good :-)


Segher


Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-05 Thread Michael Meissner
On Mon, Feb 05, 2018 at 08:01:32AM -0600, Segher Boessenkool wrote:
> On Mon, Feb 05, 2018 at 07:54:58AM -0500, Michael Meissner wrote:
> > On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote:
> > > On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> > > > This patch fixes the optimization regression that occurred on GCC 7 
> > > > where
> > > > conversions from the various floating point types to small integers 
> > > > would at
> > > > times generate a store and a load.
> > > 
> > > [ snip big explanation; thanks for that! ]
> > > 
> > > Could you merge the signed and unsigned patterns, using any_fix?  Or is
> > > there a reason that cannot work (other than that  unsigned_fix seems
> > > buggy, it should say "u")?
> > 
> > Well that's the way the instructions are.  For traditional FPR instructions 
> > we
> > have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS
> > vs. XSCVDPUXWS.  If you mean the name of the insn, I can change it if 
> > desired,
> > but originally it was based on the FPR insn name.
> 
> We have
> 
> (define_code_attr su [(sign_extend  "s")
>   (zero_extend  "u")
>   (fix  "s")
>   (unsigned_fix "s")
>   (float"s")
>   (unsigned_float   "u")])
> 
> and "s" for unsigned_fix seems like it should be "u".  Very surprising
> otherwise (if this is needed in some case, it should just write "s" there
> instead of "").
> 
> > > Okay for trunk even without that (but please try).  Also okay for 7 after
> > > looking for fallout.
> > 
> > In the past, I have found that combining code iterators with two mode 
> > iterators
> > has a bug where it would use the wrong code iterator, so I just avoided 
> > doing
> > that.  I'll see if it is still a bug.
> 
> Hrm.  If you have multiple iterators you often need to use ":" syntax,
> and you might want that anyway because the precedence rules are non-obvious;
> but you are hitting something else?  Please open a PR if so :-)

I already do use the : syntax.

I found as long as I avoid putting the  or  in the output template
(i.e. use an output statement instead of a template) it works.  It only
seems to fail in the IEEE128 case, and not in the SFDF case.  I will submit a
bug report  on it after this gets checked in, as it will be simpler to provide
a patch that people can test.

This patch cleans up all of the {,unsigned_}fix patterns that convert to QImode
or HImode.  It adds the memory combiner to include SImode to the mix.  While I
was at it, I combined the IEEE insns to use any_fix.

I have checked this patch on both little endian and big endian power8 systems,
and I have have hand checked the code for power7 and power9 systems.  There
were no regressions in the test suite, the new tests were verified to run, and
I did bootstrap builds on both systems.  The big endian power8 system also
32-bit tests as well as 64-bit tests.  Can I apply this patch to the trunk and
after a waiting period, apply the patch to the GCC 7 branch?

[gcc]
2018-02-05  Michael Meissner  

PR target/84154
* config/rs6000/rs6000.md (fix_trunc2):
Convert from define_expand to be define_insn_and_split.  Rework
float/double/_Float128 conversions to QI/HI/SImode to work with
both ISA 2.07 (power8) or ISA 3.0 (power9).  Fix regression where
conversions to QI/HImode types did a store and then a load to
truncate the value.  For conversions to VSX registers, don't split
the insn, instead emit the code directly.  Use the code iterator
any_fix to combine signed and unsigned conversions.
(fix_truncsi2_p8): Likewise.
(fixuns_trunc2): Likewise.
(fix_trunc2): Likewise.
(fix_trunc2): Likewise.
(fix_di2_hw): Likewise.
(fixuns_di2_hw): Likewise.
(fix_si2_hw): Likewise.
(fixuns_si2_hw): Likewise.
(fix_2_hw): Likewise.
(fix_trunc2): Likewise.
(fctiwz__smallint): Rename fctiwz__smallint to
fix_truncsi2_p8.
(fix_trunc2_internal): Delete, no longer
used.
(fixuns_trunc2_internal): Likewise.
(fix__mem): Likewise.
(fctiwz__mem): Likewise.
(fix__mem): Likewise.
(fix_trunc2_mem): On ISA 3.0, prevent
the register allocator from doing a direct move to the GPRs to do
a store, and instead use the ISA 3.0 store byte/half-word from
vector register instruction.  For IEEE 128-bit floating point,
also optimize stores of 32-bit ints.
(fix_trunc2_mem): Likewise.

[gcc/testsuite]
2018-02-05  Michael Meissner  

PR target/84154
* gcc.target/powerpc/pr84154-1.c: New tests.
* gcc.target/powerpc/pr84154-2.c: Likewise.
* gcc.target/powerpc/pr84154-3.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 

Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-05 Thread Segher Boessenkool
On Mon, Feb 05, 2018 at 07:54:58AM -0500, Michael Meissner wrote:
> On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote:
> > On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> > > This patch fixes the optimization regression that occurred on GCC 7 where
> > > conversions from the various floating point types to small integers would 
> > > at
> > > times generate a store and a load.
> > 
> > [ snip big explanation; thanks for that! ]
> > 
> > Could you merge the signed and unsigned patterns, using any_fix?  Or is
> > there a reason that cannot work (other than that  unsigned_fix seems
> > buggy, it should say "u")?
> 
> Well that's the way the instructions are.  For traditional FPR instructions we
> have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS
> vs. XSCVDPUXWS.  If you mean the name of the insn, I can change it if desired,
> but originally it was based on the FPR insn name.

We have

(define_code_attr su [(sign_extend  "s")
  (zero_extend  "u")
  (fix  "s")
  (unsigned_fix "s")
  (float"s")
  (unsigned_float   "u")])

and "s" for unsigned_fix seems like it should be "u".  Very surprising
otherwise (if this is needed in some case, it should just write "s" there
instead of "").

> > Okay for trunk even without that (but please try).  Also okay for 7 after
> > looking for fallout.
> 
> In the past, I have found that combining code iterators with two mode 
> iterators
> has a bug where it would use the wrong code iterator, so I just avoided doing
> that.  I'll see if it is still a bug.

Hrm.  If you have multiple iterators you often need to use ":" syntax,
and you might want that anyway because the precedence rules are non-obvious;
but you are hitting something else?  Please open a PR if so :-)


Segher


Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-05 Thread Michael Meissner
On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote:
> On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> > This patch fixes the optimization regression that occurred on GCC 7 where
> > conversions from the various floating point types to small integers would at
> > times generate a store and a load.
> 
> [ snip big explanation; thanks for that! ]
> 
> Could you merge the signed and unsigned patterns, using any_fix?  Or is
> there a reason that cannot work (other than that  unsigned_fix seems
> buggy, it should say "u")?

Well that's the way the instructions are.  For traditional FPR instructions we
have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS
vs. XSCVDPUXWS.  If you mean the name of the insn, I can change it if desired,
but originally it was based on the FPR insn name.

> Okay for trunk even without that (but please try).  Also okay for 7 after
> looking for fallout.

In the past, I have found that combining code iterators with two mode iterators
has a bug where it would use the wrong code iterator, so I just avoided doing
that.  I'll see if it is still a bug.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-05 Thread Segher Boessenkool
On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> This patch fixes the optimization regression that occurred on GCC 7 where
> conversions from the various floating point types to small integers would at
> times generate a store and a load.

[ snip big explanation; thanks for that! ]

Could you merge the signed and unsigned patterns, using any_fix?  Or is
there a reason that cannot work (other than that  unsigned_fix seems
buggy, it should say "u")?

Okay for trunk even without that (but please try).  Also okay for 7 after
looking for fallout.

Thanks!


Segher


[PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-01 Thread Michael Meissner
This patch fixes the optimization regression that occurred on GCC 7 where
conversions from the various floating point types to small integers would at
times generate a store and a load.

For example, converting from double to unsigned char generated the following
code on GCC 6 for -mcpu=power8:

fctiwuz 1,1
mfvsrd 3,1
rlwinm 3,3,0,0xff

on GCC 7 and 8 it generates:

fctiwuz 0,1
mfvsrwz 9,0
stw 9,-16(1)
ori 2,2,0
lbz 3,-16(1)

The insns before register allocation are:

(insn 7 8 13 2 (set (subreg:SI (reg:QI 157) 0)
(unsigned_fix:SI (reg:SF 33)))

(insn 13 7 14 2 (set (reg/i:DI 3 3)
 (zero_extend:DI (reg:QI 157

After reload, the insns are:

(insn 7 8 19 2 (set (reg:SI 32 0 [160])
(unsigned_fix:SI (reg:SF 33

(insn 19 7 18 2 (set (reg:SI 9 9 [160])
 (reg:SI 32 0 [160])))

(insn 18 19 13 2 (set (mem/c:SI (plus:DI (reg/f:DI 1 1)
 (const_int -16))
  (reg:SI 9

(insn 13 18 14 2 (set (reg/i:DI 3 3)
  (zero_extend:DI (mem/c:HI (plus:DI (reg/f:DI 1 1)
 (const_int 
-16))

ISA 3.0 (Power9) did not have this problem, because it already had a
fixuns_truncdfqi2 pattern, since QI/HImode values are allowed in vector
registers.  Previous versions of the ISA did not allow QI/HImode into vector
registers, because there wasn't load or store byte/half-word operations.

I extended ISA 3.0 conversion patterns to handle ISA 2.07, using splitters to
move the 32-bit int parts back to the GPR to do sign/zero extension or stores.

I also moved the optimization to prevent the register allocator from doing a
direct move on ISA 3.0 to do an offsettable store via the GPR register to a
separate insn, like I had previously done for SImode.  The rationale for this
is to prevent some places where the register allocator decided to do change a
store into a move (and then later store).

I have tested this patch on a little endian power8 system (64-bit) and a big
endian power8 system (both 32-bit and 64-bit executables).  There were no
regressions in the test suite and the compiler bootstrapped fine.  I added some
tests, and verified they ran in all 3 environments.  Can I check this into the
trunk?  Given this is a regression in GCC 7 as well, can I check the patch if
it applies cleanly into GCC 7 after a burn-in period.

[gcc]
2018-02-01  Michael Meissner  

PR target/84154
* config/rs6000/rs6000.md (fix_trunc2):
Convert from define_expand to be define_insn_and_split.  Rework
float/double/_Float128 conversions to QI/HI/SImode to work with
both ISA 2.07 (power8) or ISA 3.0 (power9).  Fix regression where
conversions to QI/HImode types did a store and then a load to
truncate the value.  For conversions to VSX registers, don't split
the insn, instead emit the code directly.
(fixuns_trunc2): Likewise.
(fix_trunc2): Likewise.
(fix_trunc2_internal): Delete, no longer
used.
(fixuns_trunc2_internal): Likewise.
(fix__mem): Likewise.
(fix_trunc2_mem): On ISA 3.0, prevent the
register allocator from doing a direct move to the GPRs to do a
store, and instead use the ISA 3.0 store byte/half-word from
vector register instruction.  For IEEE 128-bit floating point,
also optimize stores of 32-bit ints.
(fixuns_trunc2_mem): Likewise.
(fix_trunc2_mem): Likewise.
(fixuns_trunc2_mem): Likewise.

[gcc/testsuite]
2018-02-01  Michael Meissner  

PR target/84154
* gcc.target/powerpc/pr84154-1.c: New tests.
* gcc.target/powerpc/pr84154-2.c: Likewise.
* gcc.target/powerpc/pr84154-3.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 257269)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -5700,43 +5700,48 @@ (define_insn "*fix_truncdi2_fctidz
xscvdpsxds %x0,%x1"
   [(set_attr "type" "fp")])
 
-(define_expand "fix_trunc2"
-  [(parallel [(set (match_operand: 0 "nonimmediate_operand")
-  (fix:QHI (match_operand:SFDF 1 "gpc_reg_operand")))
- (clobber (match_scratch:DI 2))])]
-  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT"
+;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
+;; registers.  If we have ISA 2.07, we don't allow QI/HImode values in the
+;; vector registers, so we need to do direct moves to