Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2023-08-26 Thread Michael Meissner via Gcc-patches
On Thu, Aug 24, 2023 at 09:19:51PM -0500, Peter Bergner wrote:
> On 8/24/23 12:35 PM, Michael Meissner wrote:
> > On Thu, Jul 20, 2023 at 10:05:28AM +0530, jeevitha wrote:
> >> gcc/
> >>   PR target/110411
> >>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
> >>   to hold PTImode type.
> >>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
> >>   for PTImode type.
> > 
> > It is good as far as it goes, but I suspect we will eventually need to
> > extend it.  In particular, the reason people need PTImode is they need the
> > even/odd register layout.  What you've done enables users to declare this
> > value.
> 
> Sure, it could be extended, but that is not what this patch is about.
> It's purely to allow the kernel team access to the guaranteed even/odd
> register layout for some inline asm code.  Any extension would be a
> follow-on patch to this.

I think we need to get the intended users to try the compiler out, and see if
something else happens down the road that we didn't think of.

I tend to think of these things like the children's story "If you give a mouse
a cookie", which in turn leads to another thing and then another.

As I said, I expect it would be tempting to start using these types with
either 8 byte atomic built-ins, or to do masks, etc.

In a way, it sort of reminds me of OOmode, where we have this opaque type to
load two vectors, but when you start trying to access the two separate
registers, you get all sorts of moves, because the compiler underneath the
covers doesn't really know it is two registers.

In general, I find that people don't tend to have a full 128-bit integer, but
instead have a structure with two fields, one is the lock and one is the
data.  So you want to load things together (or do compare/swap, etc.), then
split the value into 2 pieces, and later combine them back into the PTImode
container.
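
To make the split/combine pattern concrete, here is a rough sketch in plain C.
The names (lock_and_data, container_t, combine, split) are hypothetical, and
unsigned __int128 merely stands in for a PTImode value; putting the lock in the
high half is an assumption for illustration, not something the patch defines.

```c
#include <stdint.h>

/* Hypothetical lock-plus-data record that a user might want to move as one
   128-bit unit.  The names are illustrative only.  */
struct lock_and_data
{
  uint64_t lock;
  uint64_t data;
};

/* Stand-in for the 128-bit container; a real user would declare a PTImode
   value here to get the even/odd register pair.  */
typedef unsigned __int128 container_t;

/* Combine the two fields into one container.  Placing the lock in the high
   half is an assumption made for this sketch.  */
static container_t
combine (struct lock_and_data v)
{
  return ((container_t) v.lock << 64) | v.data;
}

/* Split the container back into its two 64-bit fields.  */
static struct lock_and_data
split (container_t c)
{
  struct lock_and_data v;
  v.lock = (uint64_t) (c >> 64);
  v.data = (uint64_t) c;
  return v;
}
```

Each combine or split here is the kind of operation that could turn into extra
register moves if the compiler does not know the container is really two
registers.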

I really wish we had a constraint that matched the 2nd register in a multi-word
register (not an output operation, a constraint so that the register allocator
could know that you want to overlap a register).

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2023-08-24 Thread Michael Meissner via Gcc-patches
On Thu, Jul 20, 2023 at 10:05:28AM +0530, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> When the user specifies PTImode as an attribute, it breaks. Created
> a tree node to handle PTImode types. PTImode attribute helps in generating
> even/odd register pairs on 128 bits.
> 
> 2023-07-20  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110411
>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>   to hold PTImode type.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>   for PTImode type.
> 
> gcc/testsuite/
>   PR target/106895
>   * gcc.target/powerpc/pr106895.c: New testcase.

It is good as far as it goes, but I suspect we will eventually need to extend
it.  In particular, the reason people need PTImode is they need the even/odd
register layout.  What you've done enables users to declare this value.

However, it is likely the users (kernel users mostly) will want to use it with
the atomic built-in functions that take 16 byte values.  So I suspect we will
need to add overloads for those built-ins to allow either TImode or PTImode to
be used.  Note, the PTImode built-in would bypass the TImode parts where they
convert a TImode into PTImode.

This is the reason PTImode was created in the first place.  Due to the calling
sequence, TImode could be passed in odd/even (as well as even/odd) register
pairs, but the atomic insns and lq/stq need even/odd register pairs.  But if
you are calling a built-in with PTImode, you don't have to convert it to
PTImode.
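
As a small illustration of the register constraint (a sketch only, with a
hypothetical helper name), a 128-bit value held in GPRs REGNO and REGNO + 1 is
usable by lq/stq and the quad-word atomic insns only when the first register of
the pair is even:

```c
#include <stdbool.h>

/* Illustrative check only: a 128-bit value in GPRs occupies registers
   REGNO and REGNO + 1.  lq/stq and the quad-word atomic insns need the
   first register of the pair to be even.  */
static bool
even_odd_pair_p (unsigned int regno)
{
  return (regno & 1) == 0;
}
```

So a TImode value that happened to land in a pair starting at an odd register
would have to be copied before use, while a PTImode value would not.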

But then the next problem is what happens when people start using it.  Do we
need to add all of the TImode insns (Add, subtract, and, ior, xor, shifts at
the very least)?  These are the things I expect people might want to do for
memory accessed via atomic insns.

Then we get to the thorny problems of load/store on little endian systems, and
whether we define the order of the two registers.  Unfortunately, the lq/stq
instructions will load words in the opposite order from plq/pstq.  I imagine the
kernel folk want to use lq/stq, but we may have to figure out exactly what they
want.

If we define any form of operation on PTImode, we likely need to define whether
register 0 has the high bits or low bits.

Sorry to be so negative, but those are a lot of the issues that might come up
as people use it.


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] Fix typo in insn name.

2023-07-26 Thread Michael Meissner via Gcc-patches
On Wed, Jul 26, 2023 at 01:54:01PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> on 2023/7/11 03:59, Michael Meissner wrote:
> > In doing other work, I noticed that there was an insn:
> > 
> > vsx_extract_v4sf_<mode>_load
> > 
> > Which did not have an iterator.  I removed the useless <mode>.
> 
> It actually has a mode iterator, the "P" is used for clobber.
> 
> The whole pattern of this define_insn_and_split is
> 
> (define_insn_and_split "*vsx_extract_v4sf_<mode>_load"
>   [(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
>   (vec_select:SF
>(match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
>(parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
>(clobber (match_scratch:P 3 "=,,,"))] <== *P used here*
> 
> Its definition is:
> 
> (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
> 
> I guess we can just leave it there?
> 
> BR,
> Kewen

Yes, I didn't notice the :P in the insn.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH] Fix typo in insn name.

2023-07-24 Thread Michael Meissner via Gcc-patches
Ping clean-up patch.

| Date: Mon, 10 Jul 2023 15:59:44 -0400
| From: Michael Meissner 
| Subject: [PATCH] Fix typo in insn name.
| Message-ID: 

As I said in the reply, the only thing this patch does is to rename
vsx_extract_v4sf_<mode>_load to vsx_extract_v4sf_load since the insn does not
use a mode iterator.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH] Improve 64->128 bit zero extension on PowerPC (PR target/108958)

2023-07-24 Thread Michael Meissner via Gcc-patches
Ping patch.

| Date: Mon, 10 Jul 2023 15:51:56 -0400
| From: Michael Meissner 
| Subject: [PATCH] Improve 64->128 bit zero extension on PowerPC (PR target/108958)
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH] Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)

2023-07-24 Thread Michael Meissner via Gcc-patches
Ping patch:

| Date: Mon, 10 Jul 2023 15:50:47 -0400
| From: Michael Meissner 
| Subject: [PATCH] Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] Fix typo in insn name.

2023-07-10 Thread Michael Meissner via Gcc-patches
On Mon, Jul 10, 2023 at 03:10:21PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jul 10, 2023 at 03:59:44PM -0400, Michael Meissner wrote:
> > In doing other work, I noticed that there was an insn:
> > 
> > vsx_extract_v4sf_<mode>_load
> > 
> > Which did not have an iterator.  I removed the useless <mode>.
> 
> This patch does that, you mean.
> 
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -3576,7 +3576,7 @@ (define_insn_and_split "vsx_extract_v4sf"
> >[(set_attr "length" "8")
> > (set_attr "type" "fp")])
> >  
> > -(define_insn_and_split "*vsx_extract_v4sf_<mode>_load"
> > +(define_insn_and_split "*vsx_extract_v4sf_load"
> >[(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
> > (vec_select:SF
> >  (match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
> 
> Does this fix any ICEs?  Or do you have some example that makes better
> machine code after this change?  Or would a better change perhaps be to
> just remove this pattern completely, if it doesn't do anything useful?
> 
> I.e., please include a new testcase.

There is absolutely no code change.  It is purely a cleanup patch.  In doing
other patches, I just noticed that pattern had a _<mode> in it when it didn't
have an iterator.  I just cleaned up the code removing _<mode>.  I probably
should have changed it to vsx_extract_v4sf_sf_load.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] Improve 64->128 bit zero extension on PowerPC (PR target/108958)

2023-07-10 Thread Michael Meissner via Gcc-patches
I forgot to add:

I have tested this patch on the following systems and there was no degradation.
Can I check it into the trunk branch?

*   Power10, LE, --with-cpu=power10, IBM 128-bit long double
*   Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
*   Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
*   Power9,  LE, --with-cpu=power9,  64-bit default long double
*   Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
*   Power8,  BE, --with-cpu=power8,  IBM 128-bit long double

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)

2023-07-10 Thread Michael Meissner via Gcc-patches
I forgot to add:

I have tested this patch on the following systems and there was no degradation.
Can I check it into the trunk branch?

*   Power10, LE, --with-cpu=power10, IBM 128-bit long double
*   Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
*   Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
*   Power9,  LE, --with-cpu=power9,  64-bit default long double
*   Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
*   Power8,  BE, --with-cpu=power8,  IBM 128-bit long double

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] Fix typo in insn name.

2023-07-10 Thread Michael Meissner via Gcc-patches
In doing other work, I noticed that there was an insn:

vsx_extract_v4sf_<mode>_load

Which did not have an iterator.  I removed the useless <mode>.

I have tested this patch on the following systems and there was no degradation.
Can I check it into the trunk branch?

*   Power10, LE, --with-cpu=power10, IBM 128-bit long double
*   Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
*   Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
*   Power9,  LE, --with-cpu=power9,  64-bit default long double
*   Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
*   Power8,  BE, --with-cpu=power8,  IBM 128-bit long double

2023-07-10  Michael Meissner  

gcc/

* config/rs6000/vsx.md (vsx_extract_v4sf_load): Rename from
vsx_extract_v4sf_<mode>_load.
---
 gcc/config/rs6000/vsx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index d34c3b21abe..aed450e31ec 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3576,7 +3576,7 @@ (define_insn_and_split "vsx_extract_v4sf"
   [(set_attr "length" "8")
(set_attr "type" "fp")])
 
-(define_insn_and_split "*vsx_extract_v4sf_<mode>_load"
+(define_insn_and_split "*vsx_extract_v4sf_load"
   [(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
(vec_select:SF
 (match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
-- 
2.41.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] Improve 64->128 bit zero extension on PowerPC (PR target/108958)

2023-07-10 Thread Michael Meissner via Gcc-patches
If we are converting an unsigned DImode to a TImode value, and the TImode value
will go in a vector register, GCC currently does the DImode to TImode conversion
in GPR registers, and then moves the value to the vector register via a mtvsrdd
instruction.

This patch adds a new zero_extendditi2 insn which optimizes moving a GPR to a
vector register using the mtvsrdd instruction with RA=0, and using lxvrdx to
load a 64-bit value into the bottom 64-bits of the vector register.
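
For reference, the operation being optimized is just the ordinary C conversion
sketched below (the function name is hypothetical).  The result has the input
in the low 64 bits and 0 in the high 64 bits, which is what mtvsrdd with RA=0
and lxvrdx produce directly in a vector register.

```c
#include <stdint.h>

/* Zero extend a 64-bit value to 128 bits: the low half is the input and
   the high half is 0.  */
static unsigned __int128
zero_extend_di_to_ti (uint64_t a)
{
  return (unsigned __int128) a;
}
```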

2023-07-10  Michael Meissner  

gcc/

PR target/108958
* config/rs6000/rs6000.md (zero_extendditi2): New insn.

gcc/testsuite/

PR target/108958
* gcc.target/powerpc/pr108958.c: New test.
---
 gcc/config/rs6000/rs6000.md | 52 +++
 gcc/testsuite/gcc.target/powerpc/pr108958.c | 57 +
 2 files changed, 109 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108958.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..1a3d6316eab 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -987,6 +987,58 @@ (define_insn_and_split "*zero_extendsi<mode>2_dot2"
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
+(define_insn_and_split "zero_extendditi2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=r,r,wa,wa,wa")
+   (zero_extend:TI
+(match_operand:DI 1 "reg_or_mem_operand" "r,m,b,Z,wa")))
+   (clobber (match_scratch:DI 2 "=X,X,X,X,"))]
+  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
+  "@
+   #
+   #
+   mtvsrdd %x0,0,%1
+   lxvrdx %x0,%y1
+   #"
+  "&& reload_completed
+   && (int_reg_operand (operands[0], TImode)
+   || (vsx_register_operand (operands[0], TImode)
+  && vsx_register_operand (operands[1], DImode)))"
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 3) (const_int 0))]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+
+  /* If we are converting a VSX DImode to VSX TImode, we need to move the upper
+ 64-bits (DImode) to the lower 64-bits.  We can't just do a xxpermdi
+ instruction to swap the two 64-bit words, because can't rely on the bottom
+ 64-bits of the VSX register being 0.  Instead we create a 0 and do the
+ xxpermdi operation to combine the two registers.  */
+  if (vsx_register_operand (dest, TImode)
+  && vsx_register_operand (src, DImode))
+{
+  rtx tmp = operands[2];
+  emit_move_insn (tmp, const0_rtx);
+
+  rtx hi = tmp;
+  rtx lo = src;
+  if (!BYTES_BIG_ENDIAN)
+   std::swap (hi, lo);
+
+  rtx dest_v2di = gen_rtx_REG (V2DImode, reg_or_subregno (dest));
+  emit_insn (gen_vsx_concat_v2di (dest_v2di, hi, lo));
+  DONE;
+}
+
+  /* If we are zero extending to a GPR register either from a GPR register,
+ a VSX register or from memory, do the zero extend operation to the
+ lower DI register, and set the upper DI register to 0.  */
+  operands[2] = gen_lowpart (DImode, dest);
+  operands[3] = gen_highpart (DImode, dest);
+}
+  [(set_attr "type" "*,load,vecexts,vecload,vecperm")
+   (set_attr "isa" "*,*,p9v,p10,*")
+   (set_attr "length" "8,8,*,*,8")])
 
 (define_insn "extendqi<mode>2"
   [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*v")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108958.c b/gcc/testsuite/gcc.target/powerpc/pr108958.c
new file mode 100644
index 000..85ea0976f91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108958.c
@@ -0,0 +1,57 @@
+/* { dg-require-effective-target int128 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* This patch makes sure the various optimization and code paths are done for
+   zero extending DImode to TImode on power10 (PR target/pr108958).  */
+
+__uint128_t
+gpr_to_gpr (unsigned long long a)
+{
+  return a;  /* li 4,0.  */
+}
+
+__uint128_t
+mem_to_gpr (unsigned long long *p)
+{
+  return *p;   /* ld 3,0(3); li 4,0.  */
+}
+
+__uint128_t
+vsx_to_gpr (double d)
+{
+  return (unsigned long long)d;  /* fctiduz 0,1; li 4,0; mfvsrd 3,0.  */
+}
+
+void
+gpr_to_vsx (__uint128_t *p, unsigned long long a)
+{
+  __uint128_t b = a;   /* mtvsrdd 0,0,4; stxv 0,0(3).  */
+  __asm__ (" # %x0" : "+wa" (b));
+  *p = b;
+}
+
+void
+mem_to_vsx (__uint128_t *p, unsigned long long *q)
+{
+  __uint128_t a = *q;  /* lxvrdx 0,0,4; stxv 0,0(3).  */
+  __asm__ (" # %x0" : "+wa" (a));
+  *p = a;
+}
+
+void
+vsx_to_vsx (__uint128_t *p, double d)
+{
+  /* fctiduz 1,1; xxspltib 0,0; xxpermdi 0,0,1,0; stxv 0,0(3).  */
+  __uint128_t a = (unsigned long long)d;
+  __asm__ (" # %x0" : "+wa" (a));
+  *p = a;
+}
+
+/* { dg-final { scan-assembler-times {\mld\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mli\M}   3 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mmfvsrd\M}   1 } } */
+/* { dg-final 

[PATCH] Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)

2023-07-10 Thread Michael Meissner via Gcc-patches
This patch optimizes cases like:

vector double v1, v2;
/* ... */
v2 = vec_splats (vec_extract (v1, 0));   /* or  */
v2 = vec_splats (vec_extract (v1, 1));

Previously:

vector long long
splat_dup_l_0 (vector long long v)
{
  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
}

would generate:

mfvsrld 9,34
mtvsrdd 34,9,9
blr

With this patch, GCC generates:

xxpermdi 34,34,34,3
blr
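
To show why the single xxpermdi is enough, here is a small C model of the
instruction and of the DM selection the new insn performs.  This is only a
sketch: struct vsx_reg and the helper names are hypothetical, and doublewords
are numbered the way the ISA numbers them (dword 0 is the most significant).

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 128-bit VSX register as two doublewords, numbered the way the
   ISA numbers them: dword[0] is the most significant.  */
struct vsx_reg
{
  uint64_t dword[2];
};

/* Model of "xxpermdi XT,XA,XB,DM": the high bit of DM picks the doubleword
   taken from XA, and the low bit picks the doubleword taken from XB.  */
static struct vsx_reg
xxpermdi (struct vsx_reg xa, struct vsx_reg xb, int dm)
{
  struct vsx_reg xt;
  xt.dword[0] = xa.dword[(dm >> 1) & 1];
  xt.dword[1] = xb.dword[dm & 1];
  return xt;
}

/* Pick the DM value for splatting vector element ELEMENT (0 or 1),
   following the endian adjustment done in the patch.  */
static int
splat_dm (int element, bool big_endian)
{
  int which_word = element;
  if (!big_endian)
    which_word = 1 - which_word;
  return which_word ? 3 : 0;
}
```

On little endian, element 0 lives in the least significant doubleword, so
splat_dm (0, false) gives 3, matching the "xxpermdi 34,34,34,3" shown above.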

2023-07-10  Michael Meissner  

gcc/

PR target/99293
* config/rs6000/vsx.md (vsx_splat_extract_<mode>): New combiner
insn.

gcc/testsuite/

PR target/99293
* gcc.target/powerpc/pr99293.c: New test.
* gcc.target/powerpc/builtins-1.c: Update insn count.
---
 gcc/config/rs6000/vsx.md  | 18 ++
 gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr99293.c| 55 +++
 3 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99293.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0c269e4e8d9..d34c3b21abe 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4600,6 +4600,24 @@ (define_insn "vsx_splat_<mode>_mem"
   "lxvdsx %x0,%y1"
   [(set_attr "type" "vecload")])
 
+;; Optimize SPLAT of an extract from a V2DF/V2DI vector with a constant element
+(define_insn "*vsx_splat_extract_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_duplicate:VSX_D
+(vec_select:
+ (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+ (parallel [(match_operand 2 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  int which_word = INTVAL (operands[2]);
+  if (!BYTES_BIG_ENDIAN)
+which_word = 1 - which_word;
+
+  operands[3] = GEN_INT (which_word ? 3 : 0);
+  return "xxpermdi %x0,%x1,%x1,%3";
+}
+  [(set_attr "type" "vecperm")])
+
 ;; V4SI splat support
 (define_insn "vsx_splat_v4si"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1.c b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
index 28cd1aa6b1a..98783668bce 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
@@ -1035,4 +1035,4 @@ foo156 (vector unsigned short usa)
 /* { dg-final { scan-assembler-times {\mvmrglb\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mvmrgew\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mvsplth|xxsplth\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mxxpermdi\M} 44 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 42 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr99293.c b/gcc/testsuite/gcc.target/powerpc/pr99293.c
new file mode 100644
index 000..e5f44bd7346
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr99293.c
@@ -0,0 +1,55 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -mpower8-vector" } */
+
+/* Test for PR 99293, which wants to do:
+   __builtin_vec_splats (__builtin_vec_extract (v, n))
+
+   where v is a V2DF or V2DI vector and n is either 0 or 1.  Previously the GCC
+   compiler would do a direct move to the GPR registers to select the item and a
+   direct move from the GPR registers to do the splat.
+
+   Before the patch, splat_dup_ll_0 or splat_dup_dbl_0 below would generate:
+
+mfvsrld 9,34
+mtvsrdd 34,9,9
+blr
+
+   and now it generates:
+
+xxpermdi 34,34,34,3
+blr  */
+
+#include <altivec.h>
+
+vector long long
+splat_dup_ll_0 (vector long long v)
+{
+  /* xxpermdi 34,34,34,3 */
+  return __builtin_vec_splats (vec_extract (v, 0));
+}
+
+vector double
+splat_dup_dbl_0 (vector double v)
+{
+  /* xxpermdi 34,34,34,3 */
+  return __builtin_vec_splats (vec_extract (v, 0));
+}
+
+vector long long
+splat_dup_ll_1 (vector long long v)
+{
+  /* xxpermdi 34,34,34,0 */
+  return __builtin_vec_splats (vec_extract (v, 1));
+}
+
+vector double
+splat_dup_dbl_1 (vector double v)
+{
+  /* xxpermdi 34,34,34,0 */
+  return __builtin_vec_splats (vec_extract (v, 1));
+}
+
+/* { dg-final { scan-assembler-times "xxpermdi" 4 } } */
+/* { dg-final { scan-assembler-not   "mfvsrd" } } */
+/* { dg-final { scan-assembler-not   "mfvsrld"} } */
+/* { dg-final { scan-assembler-not   "mtvsrdd"} } */
-- 
2.41.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: PING^3 [PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2023-07-06 Thread Michael Meissner via Gcc-patches
I get the following warning which prevents gcc from bootstrapping due to
-Werror:

/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc: In function ‘void {anonymous}::process_chain_from_load(gimple*)’:
/home/meissner/fsf-src/work124-sfsplat/gcc/config/rs6000/rs6000-p10sfopt.cc:505:30: warning: zero-length gcc_dump_printf format string [-Wformat-zero-length]
  505 |   dump_printf (MSG_NOTE, "");
  |  ^~

I just commented out the dump_printf call.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325

2023-06-13 Thread Michael Meissner via Gcc-patches
This patch fixes an issue where if you use the -fstack-protector and
-mcpu=power10 options and you have a large stack frame, the GCC compiler will
generate a LWA instruction with a large offset.

Unlike the previous versions of this patch, I dug into it, and I found it was
much more complex than I originally thought.

The important thing in the bug is that -fstack-protector is used, but it could
potentially happen with fused load-compare to any stack location when the stack
frame is larger than 32K without -fstack-protector.

Here is the initial fused insn that was created.  It refers to the
stack location based off of the virtual frame pointer:

(insn 6 5 7 2 (parallel [
(set (reg:CC 119)
 (compare:CC (mem/c:SI (plus:DI (reg/f:DI 110 sfp)
(const_int -4))
 (const_int 0 [0])))
(clobber (scratch:DI))
])
 (nil))

After the stack size is finalized, the frame pointer removed, and the post
reload phase is run, the insn is now:

(insn 6 5 7 2 (parallel [
(set (reg:CC 100 0 [119])
 (compare:CC (mem/c:SI (plus:DI (reg/f:DI 1 1)
(const_int 40044))
 (const_int 0 [0])))
(clobber (reg:DI 9 9 [120]))
])
 (nil))

When the split2 pass is run after reload has finished, the ds_form_mem_operand
predicate that was used for lwa and ld no longer returns true.  This means that
since the operand no longer matches its predicate, the insn won't be split.
Thus, it goes all of the way to final.  The automatic prefix instruction
support was not run because the type was changed from "load" to
"fused_load_cmpi".  This meant that it was assumed that the insn was only 8
bytes, and that we did not need to prefix the lwa with a 'p'.
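
The offset limits involved can be sketched in C as follows.  The helper names
are hypothetical and this is not the code of the predicates in the compiler; it
only restates the encoding limits: a DS-form load such as ld or lwa takes a
signed 16-bit displacement with the bottom two bits zero, while a prefixed load
takes a signed 34-bit displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* A DS-form load (ld, lwa) encodes a signed 16-bit displacement whose
   bottom two bits must be zero.  */
static bool
ds_form_offset_p (int64_t offset)
{
  return offset >= -32768 && offset <= 32767 && (offset & 3) == 0;
}

/* A prefixed load (pld, plwa) encodes a signed 34-bit displacement.  */
static bool
prefixed_offset_p (int64_t offset)
{
  const int64_t limit = (int64_t) 1 << 33;
  return offset >= -limit && offset < limit;
}
```

With these limits, the offset -4 from the first insn fits a DS-form load, but
the offset 40044 from the post-reload insn does not, so the lwa has to become a
prefixed plwa.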

The solution involves:

1)  Don't use ds_form_mem_operand for ld and lwa, always use
non_update_memory_operand.

2)  Delete ds_form_mem_operand since it is no longer used.

3)  Use the "YZ" constraints for ld/lwa instead of "m".

4)  If we don't need to sign extend the lwa, convert it to lwz, and use
cmpwi instead of cmpdi.  Adjust the insn name to reflect the code
generated.

5)  Ensure that the insn using lwa will be recognized as having a prefixed
operand (and hence the instruction length is 16 bytes instead of 8
bytes).

5a) Set the prefixed and maybe_prefix attributes to know that
fused_load_cmpi are also load insns;

5b) In the case where we are just setting CC and not using the memory
afterward, set the clobber to use a DI register, and put an
explicit sign_extend operation in the split;

5c) Set the sign_extend attribute to "yes".

5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
ensure that lwa is treated as a ds-form instruction and not as
a d-form instruction (i.e. lwz).

6)  Add a new test case for this case.

7)  Adjust the insn counts in fusion-p10-ldcmpi.c.  Because we are no
longer using ds_form_mem_operand, the ld and lwa instructions will fuse
x-form (reg+reg) addresses in addition to ds-form (reg+offset or reg).

I have built bootstrap compilers and tested them on the following environments.
There were no regressions in any of the runs.

Little endian power10, long double is IBM 128-bit
Little endian power9, long double is IBM 128-bit
Little endian power9, long double is IEEE 128-bit
Big endian power8, long double is IBM 128-bit (32/64-bit tests run)

Can I check this patch into the master GCC branch?  After a waiting period, once
the previous changes to genfusion.pl are checked in, can I install this patch in
previous GCC compilers?

2023-06-12   Michael Meissner  

gcc/

* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
allowed prefixed lwa to be generated.
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/predicates.md (ds_form_mem_operand): Delete.
* config/rs6000/rs6000.md (prefixed attribute): Add support for load
plus compare immediate fused insns.
(maybe_prefixed): Likewise.

gcc/testsuite/

* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn
counts.
---
 gcc/config/rs6000/fusion.md   | 27 +++---
 gcc/config/rs6000/genfusion.pl| 36 +++
 gcc/config/rs6000/predicates.md   | 14 
 gcc/config/rs6000/rs6000.md   |  4 +--
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 26 ++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c| 16 +
 6 files changed, 81 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff 

Ping: [PATCH V5] PR target/105325: Fix constraint issue with power10 fusion

2023-05-15 Thread Michael Meissner via Gcc-patches
Ping both patches:

Patch #1, rewrite genfusion.pl's code for load and compare immediate fusion to
be more readable.  This patch produces the same output as the current sources.

| Date: Wed, 10 May 2023 11:38:55 -0400
| Subject: Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.
| Message-ID: 

Patch #2, implement the fix for PR target/105325:

| Date: Wed, 10 May 2023 11:40:00 -0400
| Subject: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.

2023-05-10 Thread Michael Meissner via Gcc-patches
This patch applies stricter predicates and constraints for LD and LWA
instructions with power10 fusion.  These instructions are DS-form instructions,
which means that the bottom 2 bits of the address must be 0.  In the past, we
did not use the stricter predicates and constraints, and if the user used the
-fstack-protector option, it would generate a non-prefixed load instruction
whose offset was too big if the stack is large.

This patch has been tested on:

* Little endian power9 with both IEEE and IBM long double
* Little endian power10
* Big endian power8 using both 32-bit and 64-bit code generation.

Can I check this into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

2023-05-10   Michael Meissner  

gcc/

PR target/105325
* config/rs6000/genfusion.pl (print_ld_cmpi_p10): Use "YZ" constraints
for DS-form loads.  Set the sign_extend attribute for loads that do sign
extension.  Use the lwa_operand predicate for the LWA instruction.
* config/rs6000/fusion.md: Regenerate.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/fusion.md   | 17 +++-
 gcc/config/rs6000/genfusion.pl| 20 +++---
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 26 +++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 4 files changed, 54 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 81ba4b33940..836dbd20948 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -129,6 +129,12 @@ sub print_ld_cmpi_p10
   print "  \"\"\n";
   print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
   print "   (set_attr \"cost\" \"8\")\n";
+
+  if ($extend eq "sign")
+{
+  print "   (set_attr \"sign_extend\" \"yes\")\n";
+}
+
   print "   (set_attr \"length\" \"8\")])\n";
   print "\n";
 }
@@ -147,9 +153,9 @@ sub gen_ld_cmpi_p10
   "HI" => "lhz",
   "QI" => "lbz");
 
-  # Memory predicate to use.
+  # Memory predicate to use.  For LWA, use the special LWA_OPERAND.
   my %signed_memory_predicate = ("DI" => "ds_form_mem_operand",
-"SI" => "ds_form_mem_operand",
+"SI" => "lwa_operand",
 "HI" => "non_update_memory_operand");
 
   my %unsigned_memory_predicate = ("DI" => "ds_form_mem_operand",
@@ -161,6 +167,10 @@ sub gen_ld_cmpi_p10
   my %np = ("ds" => "NON_PREFIXED_DS",
"d"  => "NON_PREFIXED_D");
 
+  # Constraint to use.
+  my %constraint = ("ds" => "YZ",
+   "d"  => "m");
+
   # Result modes to use. Clobber is used when you are comparing the load to
   # -1/0/1, but you are not using it otherwise.  EXTDI does not exist. We
   # cannot directly use HI/QI results because we only have word and double word
@@ -189,7 +199,8 @@ sub gen_ld_cmpi_p10
 
  print_ld_cmpi_p10 ($lmode, $result, "CC", "",
 "const_m1_to_1_operand", $extend,
-$signed_load{$lmode}, $np{$mem_format}, "m",
+$signed_load{$lmode}, $np{$mem_format},
+$constraint{$mem_format},
 $signed_memory_predicate{$lmode});
}
 
@@ -204,7 +215,8 @@ sub gen_ld_cmpi_p10
 
  print_ld_cmpi_p10 ($lmode, $result, "CCUNS", "l",
 "const_0_to_1_operand", $extend,
-$unsigned_load{$lmode}, $np{$mem_format}, "m",
+$unsigned_load{$lmode}, $np{$mem_format},
+$constraint{$mem_format},
 $unsigned_memory_predicate{$lmode});
}
 }
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 

Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.

2023-05-10 Thread Michael Meissner via Gcc-patches
This patch rewrites the gen_ld_cmpi_p10 function in genfusion.pl to be clearer.
The resulting fusion.md file that this patch generates is exactly the same
output that the previous version of genfusion.pl generated.  The next patch in
this series will fix PR target/105325 (provide correct predicates and
constraints for power10 fusion of load and compare immediate).

This patch has been tested on:

* Little endian power9 with both IEEE and IBM long double
* Little endian power10
* Big endian power8 using both 32-bit and 64-bit code generation.

Can I check this into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

2023-05-10   Michael Meissner  

gcc/

PR target/105325
* config/rs6000/genfusion.pl (mode_to_ldst_char): Delete.
(print_ld_cmpi_p10): New function, split off from gen_ld_cmpi_p10.
(gen_ld_cmpi_p10): Rewrite completely.
---
 gcc/config/rs6000/genfusion.pl | 248 +
 1 file changed, 157 insertions(+), 91 deletions(-)

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352e0ce..81ba4b33940 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -45,103 +45,169 @@ print <<'EOF';
 
 EOF
 
-sub mode_to_ldst_char
+# Print the insns for load and compare with -1/0/1.
+# Arguments:
+# lmode  -- Integer mode ("DI", "SI", "HI", or "QI").
+# result -- "clobber", "GPR", or $lmode
+# ccmode -- Sign vs. unsigned ("CC" or "CCUNS").
+# mem_format -- Memory format ("d" or "ds").
+# cmpl   -- Suffix for compare ("l" or "")
+# const_pred -- Predicate for constant (i.e. -1/0/1 or 0/1).
+# extend -- "sign", "zero", or "none".
+# echr   -- Suffix for load ("a", "z", or "").
+# load   -- Load instruction (i.e. "ld", "lwa", "lwz", etc.)
+# np -- enum non_prefixed_form for memory type
+# constraint -- constraint to use
+# mem_pred   -- predicate for the memory operation
+
+sub print_ld_cmpi_p10
 {
-my ($mode) = @_;
-my %x = (DI => 'd', SI => 'w', HI => 'h', QI => 'b');
-return $x{$mode} if exists $x{$mode};
-return '?';
+  my ($lmode, $result, $ccmode, $cmpl, $const_pred,
+  $extend, $load, $np, $constraint, $mem_pred) = @_;
+
+  # For clobber, we need a SI/DI reg in case we split because we have to
+  # sign/zero extend.
+  my $clobbermode = ($lmode =~ m/^[HQ]I$/) ? "GPR" : $lmode;
+
+  # Break long print statements into smaller lines.
+  my $info = join (" ",
+  "load mode is ${lmode} result mode is ${result}",
+  "compare mode is ${ccmode} extend is ${extend}");
+
+  my $name = join ("",
+  "${load}_cmp${cmpl}di_cr0_${lmode}",
+  "_${result}_${ccmode}_${extend}");
+
+  my $cmp_op1 = "(match_operand:${lmode} 1 \"${mem_pred}\" \"${constraint}\")";
+
+  my $spaces = " " x (length ($ccmode) + 18);
+
+  print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+  print ";; ${info}\n";
+  print "(define_insn_and_split \"*${name}\"\n";
+  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+  print "(compare:${ccmode} ${cmp_op1}\n";
+  print "${spaces}(match_operand:${lmode} 3 \"${const_pred}\" \"n\")))\n";
+
+  if ($result eq "clobber")
+{
+  print "   (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+}
+
+  else
+{
+  my $load_op0 = "(match_operand:${result} 0 \"gpc_reg_operand\" \"=r\")";
+  my $load_op1 = (($result eq $lmode)
+ ? "(match_dup 1)"
+ : "(${extend}_extend:${result} (match_dup 1))");
+  print "   (set ${load_op0} ${load_op1})]\n";
+}
+
+  # Do not match prefixed loads.  The machine only fuses non-prefixed loads
+  # with compare immediate.  Take into account whether the load is a ds-form
+  # or a d-form instruction.
+  print "  \"(TARGET_P10_FUSION)\"\n";
+  print "  \"${load}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3\"\n";
+  print "  \"&& reload_completed\n";
+  print "   && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+  print "   || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),\n";
+  print "  ${lmode}mode, ${np}))\"\n";
+
+  if ($extend eq "none")
+{
+  print "  [(set (match_dup 0) (match_dup 1))\n";
+}
+
+  else
+{
+  my $resultmode = ($result eq "clobber") ? $clobbermode : $result;
+  print "  [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+}
+
+  print "   (set (match_dup 2)\n";
+  print "(compare:${ccmode} (match_dup 0) (match_dup 3)))]\n";
+  print "  \"\"\n";
+  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
+  print "   (set_attr \"cost\" \"8\")\n";
+  print "   (set_attr \"length\" \"8\")])\n";
+  print "\n";
 }
 
 sub gen_ld_cmpi_p10
 {
-my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-   $mempred, $ccmode, $np, $extend, 

[PATCH V5, 0/2] PR target/105325: Fix constraint issue with power10 fusion

2023-05-10 Thread Michael Meissner via Gcc-patches
I have posted 4 previous versions of this patch (April 26th, March 28th, March
24th, and March 21st).

In this patch, rather than just add changes to the existing code in
genfusion.pl, I rewrote the function completely.  There are two patches within
this patch set:

* The first patch rewrites the perl function to be more readable.  This
  patch produces the same output for fusion.md that the current version
  generates.

* The second patch then builds on the rewrite in the first patch and adds
  the changes that fix the problem.

The issue with the original bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code when the -fstack-protector option
is used.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is that the constraints for the individual loads in the fusion
did not match the machine.  In particular, LWA is a ds format instruction when
it is unprefixed.  The code also did not set the prefixed attribute correctly.
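
As a rough illustration of why the load form matters, the displacement rules
can be sketched in C.  The limits below are the Power ISA ones (signed 16-bit
displacement for D-form, the same range with the low two bits zero for DS-form,
signed 34-bit displacement for prefixed loads).  The function names are
hypothetical and are not taken from GCC, and register-indexed (X-form)
addresses are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* D-form loads (lwz, lha, lhz, lbz) take a signed 16-bit displacement.  */
bool
fits_d_form (int64_t offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* DS-form loads (ld, lwa) use the same range, but the low two bits of the
   displacement must be zero.  */
bool
fits_ds_form (int64_t offset)
{
  return fits_d_form (offset) && (offset & 3) == 0;
}

/* Prefixed loads take a signed 34-bit displacement.  */
bool
fits_prefixed (int64_t offset)
{
  const int64_t limit = (int64_t) 1 << 33;
  return offset >= -limit && offset < limit;
}

/* Hypothetical helper: can this reg+offset address be used by the
   non-prefixed form of the load?  */
bool
non_prefixed_offset_ok (int64_t offset, bool ds_form)
{
  return ds_form ? fits_ds_form (offset) : fits_d_form (offset);
}
```

Under this model an offset such as 6 is fine for lwz but not for lwa, which is
the kind of case where a plain "m" constraint accepts an address that the
non-prefixed instruction cannot encode.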

These patches have been tested on:

* Little endian power9 with both IEEE and IBM long double
* Little endian power10
* Big endian power8 using both 32-bit and 64-bit code generation.

Can I check these into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.

2023-05-03 Thread Michael Meissner via Gcc-patches
On Tue, May 02, 2023 at 05:32:04PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 26, 2023 at 12:18:36PM -0400, Michael Meissner wrote:
> > * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
> > of the ld and lwa instructions which use the DS encoding instead of D.
> > Use the YZ constraint for these loads.  Handle prefixed loads better.
> 
> Don't use tabs in the middle of a line.
> 
> "Handle prefixed loads better" is not what the patch does, and/or is so
> vague as to be useless.

Ok.

> > --- a/gcc/config/rs6000/genfusion.pl
> > +++ b/gcc/config/rs6000/genfusion.pl
> > @@ -56,7 +56,7 @@ sub mode_to_ldst_char
> >  sub gen_ld_cmpi_p10
> >  {
> >  my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
> > -   $mempred, $ccmode, $np, $extend, $resultmode);
> > +   $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
> >LMODE: foreach $lmode ('DI','SI','HI','QI') {
> >$ldst = mode_to_ldst_char($lmode);
> >$clobbermode = $lmode;
> > @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
> >CCMODE: foreach $ccmode ('CC','CCUNS') {
> >   $np = "NON_PREFIXED_D";
> >   $mempred = "non_update_memory_operand";
> > + $constraint = "m";
> >   if ( $ccmode eq 'CC' ) {
> >   next CCMODE if $lmode eq 'QI';
> > - if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
> > + if ( $lmode eq 'HI' ) {
> > + $np = "NON_PREFIXED_D";
> > + $mempred = "non_update_memory_operand";
> > + $echr = "a";
> > + } elsif ( $lmode eq 'SI' ) {
> > + # ld and lwa are both DS-FORM.
> > + $np = "NON_PREFIXED_DS";
> > + $mempred = "lwa_operand";
> > + $echr = "a";
> > + $constraint = "YZ";
> > + } elsif ( $lmode eq 'DI' ) {
> >   # ld and lwa are both DS-FORM.
> >   $np = "NON_PREFIXED_DS";
> >   $mempred = "ds_form_mem_operand";
> > + $echr = "";
> > + $constraint = "YZ";
> >   }
> >   $cmpl = "";
> > - $echr = "a";
> >   $constpred = "const_m1_to_1_operand";
> >   } else {
> >   if ( $lmode eq 'DI' ) {
> >   # ld is DS-form, but lwz is not.
> >   $np = "NON_PREFIXED_DS";
> >   $mempred = "ds_form_mem_operand";
> > + $constraint = "YZ";
> >   }
> >   $cmpl = "l";
> >   $echr = "z";
> > @@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
> >  
> >   print "(define_insn_and_split 
> > \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
> >   print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" 
> > \"=x\")\n";
> > - print "(compare:${ccmode} (match_operand:${lmode} 1 
> > \"${mempred}\" \"m\")\n";
> > + print "(compare:${ccmode} (match_operand:${lmode} 1 
> > \"${mempred}\" \"${constraint}\")\n";
> >   if ($ccmode eq 'CCUNS') { print "   "; }
> >   print "(match_operand:${lmode} 3 \"${constpred}\" 
> > \"n\")))\n";
> >   if ($result eq 'clobber') {
> > @@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10
> >   print "  \"\"\n";
> >   print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
> >   print "   (set_attr \"cost\" \"8\")\n";
> > +
> > + if ($extend eq "sign") {
> > + print "   (set_attr \"sign_extend\" \"yes\")\n";
> > + }
> > +
> >   print "   (set_attr \"length\" \"8\")])\n";
> >   print "\n";
> >}
> 
> This already was a 90-line function that did too many things.  Now it is
> bigger and does more things, and the patch is unintelligible.
> 
> Please first factor things.  There are many more things terrible Perl
> code style here (like all of the quoting), but where to start :-/

Note, I didn't write the original patch nor the original code (Aaron did), but
without a lot of rewrites it will take more time to get it done.

> I once again spent many hours trying to review this, and once again
> failed.  Please write better code, and please make better patches.
> 
> > index ec783803820..7d6c94aee5b 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
> >   (eq_attr "maybe_prefixed" "no"))
> >  (const_string "no")
> >  
> > -(eq_attr "type" "load,fpload,vecload")
> > +(eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")
> 
> Don't duplicate vecload.

Ok.

> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
> > @@ -0,0 +1,25 @@
> > +/* { dg-do assemble } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-require-effective-target powerpc_prefixed_addr } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
> 
> The power10_ok selector still is terribly broken (it allows only some
> variants of 64-bit Linux and nothing 

[PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.

2023-04-26 Thread Michael Meissner via Gcc-patches
I posted a version of this patch on March 21st, a second version on March 24th,
and the third version on March 28th.

The V4 patch just adds a new condition to the new test case.  Previously, I was
using 'powerpc_prefixed_addr' to determine whether the GCC compiler would
automatically generate prefixed addresses.  The V4 version also adds a check
for 'power10_ok'.  Power10_ok is needed in case the compiler could generate
prefixed addresses, but the assembler does not support prefixed instructions.

The V3 patch makes some code changes to genfusion.pl that were suggested for
the last 2 patch submissions.  The fusion.md that is produced by genfusion.pl
is the same in all 3 versions.

In V3, I changed genfusion.pl to match the suggestion for code layout.  I
also used the correct comment for each of the instructions (in the 2nd patch,
when I rewrote the comments about ld and lwa being DS format instructions,
I had put the ld comment in the section handling lwa, and vice versa).

In V3, I also removed lp64 from the new test.  When I first added the prefixed
code, it was only done for 64-bit, but now it is allowed for 32-bit.  However,
the case that shows up (lwa) would not hit in 32-bit, since it only generates
lwz and not lwa.  It also would not generate ld.  But the test does pass when
it is built with -m32.

The issue with the original bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is that the constraints for the individual loads in the fusion
did not match the machine.  In particular, LWA is a ds format instruction when
it is unprefixed.  The code also did not set the prefixed attribute correctly.

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it considers load + compare
immediate patterns to be like the normal load insns in checking whether
operand[1] is a prefixed instruction.
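
The choice of load, form, and constraint described above can be summarized as a
small lookup.  This is only a sketch in C of the table that the quoted
genfusion.pl changes appear to implement.  The type and function names are
hypothetical, and the generator itself is a Perl script.

```c
#include <stdbool.h>
#include <stddef.h>

enum load_mode { MODE_QI, MODE_HI, MODE_SI, MODE_DI };

struct fused_load
{
  const char *mnemonic;    /* NULL when no fused pattern is generated.  */
  bool ds_form;            /* True for DS-form loads (ld, lwa).  */
  const char *constraint;  /* "YZ" for DS-form loads, "m" otherwise.  */
};

/* is_signed selects the CC (signed) compare, otherwise CCUNS.  */
struct fused_load
fused_load_for (enum load_mode mode, bool is_signed)
{
  struct fused_load r = { NULL, false, "m" };

  switch (mode)
    {
    case MODE_DI:
      r.mnemonic = "ld";
      r.ds_form = true;
      break;
    case MODE_SI:
      r.mnemonic = is_signed ? "lwa" : "lwz";
      r.ds_form = is_signed;   /* lwa is DS-form, lwz is D-form.  */
      break;
    case MODE_HI:
      r.mnemonic = is_signed ? "lha" : "lhz";
      break;
    case MODE_QI:
      /* The signed QI case is skipped by the generator.  */
      r.mnemonic = is_signed ? NULL : "lbz";
      break;
    }

  if (r.ds_form)
    r.constraint = "YZ";
  return r;
}
```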

I have tested this code on a power9 little endian system (with long double
being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
power8 big endian system, testing both 32-bit and 64-bit code generation.

For the V4 changes I also built the compiler on a big endian system with an
older assembler, and I verified that the pr105325.C test was listed as
unsupported.

Can I put this code into the master branch, and after a waiting period, apply
it to the GCC 12 and GCC 11 branches (the bug does show up in those branches,
and the patch applies without change)?

2023-04-26   Michael Meissner  

gcc/

PR target/105325
* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
of the ld and lwa instructions which use the DS encoding instead of D.
Use the YZ constraint for these loads.  Handle prefixed loads better.
Set the sign_extend attribute as appropriate.
* gcc/config/rs6000/fusion.md: Regenerate.
* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
instructions to the list of instructions that might have a prefixed load
instruction.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/fusion.md   | 17 +++-
 gcc/config/rs6000/genfusion.pl| 26 ---
 gcc/config/rs6000/rs6000.md   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 25 ++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 5 files changed, 60 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-(compare:CCUNS (match_operand:DI 

Ping #2: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-04-12 Thread Michael Meissner via Gcc-patches
Ping for patch 105325.  I believe patch V3 answers the objections raised
earlier.  Can I check this patch into master?  Then can I apply this patch to
GCC 12 and 11 after appropriate delays?

| Date: Mon, 27 Mar 2023 23:19:55 -0400
| Subject: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-12 Thread Michael Meissner via Gcc-patches
On Wed, Apr 12, 2023 at 01:31:46PM +0800, Jiufu Guo wrote:
> I understand that QP insns (e.g. xscmpexpqp) is valid if the system
> meets ISA3.0, no matter BE/LE, 32-bit/64-bit.
> I think option -mfloat128-hardware is designed for QP insns.
> 
> While there is one issue, on BE machine, when compiling with options
> "-mfloat128-hardware -m32", an error message is generated:
> "error: '%<-mfloat128-hardware%>' requires '-m64'"
> 
> (I'm wondering if we need to relax this limitation.)

In the past, the machine independent portion of the compiler demanded that for
scalar mode, there be an integer mode of the same size, since sometimes moves
are converted to using an int RTL mode.  Since we don't have TImode support in
32-bit, you would get various errors because something tried to do a TImode
move for KFmode types, and the TImode wasn't available.
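
To make the "integer mode of the same size" point concrete, here is a loose
user-level analogy in C, not compiler code.  A 16-byte value is moved through a
128-bit integer when the target provides one (GCC predefines
`__SIZEOF_INT128__` in that case) and through two 64-bit halves otherwise.  The
struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128-bit scalar such as a KFmode value.  */
struct scalar128
{
  unsigned char bytes[16];
};

struct scalar128
move_scalar128 (struct scalar128 src)
{
  struct scalar128 dst;
#ifdef __SIZEOF_INT128__
  /* A 128-bit integer type exists, so one integer-sized move suffices.  */
  __extension__ typedef unsigned __int128 u128;
  u128 tmp;
  memcpy (&tmp, src.bytes, sizeof tmp);
  memcpy (dst.bytes, &tmp, sizeof tmp);
#else
  /* No 128-bit integer type: fall back to two 64-bit pieces.  */
  uint64_t half[2];
  memcpy (half, src.bytes, sizeof half);
  memcpy (dst.bytes, half, sizeof half);
#endif
  return dst;
}
```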

If somebody wants to verify that this now works on 32-bit and/or implements
TImode on 32-bit, then we can relax the restriction.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH, V3] PR target/70243 - Do not generate vmaddfp or vnmsubdp

2023-04-08 Thread Michael Meissner via Gcc-patches
This is version 3 of the patch.  This is essentially version 1 with the removal
of changes to altivec.md, and cleanup of the comments.

Version 2 generated the vmaddfp and vnmsubfp instructions if -Ofast was used,
and those changes are deleted in this patch.

The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
these instructions seems to break Eigen on big endian systems.
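
One way the results can differ, going by the VSCR[NJ] discussion elsewhere in
this thread, is the handling of denormals.  The C sketch below only simulates
that effect on the host: `flush_denormal` is a hypothetical helper that models
non-Java mode by replacing a denormal with a zero of the same sign.  It does
not execute vmaddfp or xvmaddasp, and it ignores the single rounding of a real
fused multiply-add.

```c
#include <float.h>

/* Hypothetical model of non-Java mode: denormals become signed zero.  */
float
flush_denormal (float x)
{
  float mag = x < 0.0f ? -x : x;
  if (mag != 0.0f && mag < FLT_MIN)
    return x < 0.0f ? -0.0f : 0.0f;
  return x;
}

/* Multiply-add that keeps denormal results.  */
float
madd_keep_denormals (float a, float b, float c)
{
  return a * b + c;
}

/* Multiply-add with denormal inputs and results flushed to zero.  */
float
madd_flush_denormals (float a, float b, float c)
{
  a = flush_denormal (a);
  b = flush_denormal (b);
  c = flush_denormal (c);
  return flush_denormal (a * b + c);
}
```

With a = 0x1p-100f, b = 0x1p-40f and c = 0.0f the first function returns a
nonzero denormal, while the second returns zero.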

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).  I have also done the builds and test on a power8
big endian system (testing both 32-bit and 64-bit code generation).  Chip has
verified that it fixes the problem that Eigen encountered.  Can I check this
into the master GCC branch?  After a burn-in period, can I check this patch
into the active GCC branches?

Thanks in advance.

2023-04-07   Michael Meissner  

gcc/

PR target/70243
* config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp.
(vsx_nfmsv4sf4): Do not generate vnmsubfp.

gcc/testsuite/

PR target/70243
* gcc.target/powerpc/pr70243.c: New test.
---
 gcc/config/rs6000/vsx.md   | 31 
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++
 2 files changed, 55 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..c4c503cacad 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@ (define_insn "*vsx_tsqrt2_internal"
   "xtsqrtp %0,%x1"
   [(set_attr "type" "")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
-
+;; Fused vector multiply/add instructions. Do not generate the Altivec versions
+;; of fma (vmaddfp and vnmsubfp).  These instructions allows the target to be a
+;; separate register from the 3 inputs, but they have different rounding
+;; behaviors than the VSX instructions.
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
(fma:V4SF
- (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
- (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
- (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+ (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+ (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
+ (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
xvmaddasp %x0,%x1,%x2
-   xvmaddmsp %x0,%x1,%x3
-   vmaddfp %0,%1,%2,%3"
+   xvmaddmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
@@ -2066,18 +2064,17 @@ (define_insn "*vsx_nfma4"
   [(set_attr "type" "")])
 
 (define_insn "*vsx_nfmsv4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
(neg:V4SF
 (fma:V4SF
-  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
+  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
   (neg:V4SF
-(match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")]
+(match_operand:V4SF 3 "vsx_register_operand" "0,wa")]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
xvnmsubasp %x0,%x1,%x2
-   xvnmsubmsp %x0,%x1,%x3
-   vnmsubfp %0,%1,%2,%3"
+   xvnmsubmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_nfmsv2df4"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 000..18a5ce78792
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70243, Make sure we don't generate vmaddfp or vnmsubfp.  These
+   instructions have different rounding modes than the VSX instructions
+   xvmaddsp and xvnmsubsp.  These tests are written where the 3 inputs and
+   target are all separate registers.  Because vmaddfp and vnmsubfp are no
+   longer generated the compiler will have to generate an xvmaddsp or xvnmsubsp
+   instruction followed by a move operation.  */
+
+#include 
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+

[PATCH, V2] PR target/70243: Do not generate vmaddfp and vnmsubfp

2023-04-07 Thread Michael Meissner via Gcc-patches
This is version 2 of the patch.  The first version was posted on April 6th.

In this version, I eliminated the changes to altivec.md that added checks to
altivec_fmav4sf4 and altivec_vnmsubfp.  After writing the code, I remembered
that VECTOR_UNIT_ALTIVEC_P that is used by those insns will not be true if the
VSX instruction set is enabled, so no additional test is needed.

As we discussed in a private chat room, I modified the code to generate vmaddfp
and vnmsubfp if -Ofast (-ffast-math) is used.  This allows the compiler to
eliminate the extra move if the user does not care about strict floating point
code generation, but it generates only the VSX instructions in the normal
case.
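
The gating described here can be modelled in a few lines of C.  This is a
simplified sketch of how an "enabled" check might treat the new fastmath value,
based on the rs6000.md hunk in this patch.  The enum, struct, and function
names are hypothetical, and only three isa values are modelled.

```c
#include <stdbool.h>

enum isa_attr { ISA_ANY, ISA_P10, ISA_FASTMATH };

struct target_flags
{
  bool power10;                    /* Models TARGET_POWER10.  */
  bool unsafe_math_optimizations;  /* Models flag_unsafe_math_optimizations.  */
};

/* Return true if an insn alternative tagged with ISA may be used.  */
bool
alternative_enabled (enum isa_attr isa, const struct target_flags *flags)
{
  switch (isa)
    {
    case ISA_ANY:
      return true;
    case ISA_P10:
      return flags->power10;
    case ISA_FASTMATH:
      return flags->unsafe_math_optimizations;
    }
  return false;
}
```

In this model the vmaddfp and vnmsubfp alternatives would carry ISA_FASTMATH,
so they are available only when unsafe math optimizations are enabled.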

I reworked the examples and split them into two tests to test both the normal
case when -Ofast is not used and when it is used.

I also fixed the instructions mentioned in the comments to be the actual
instructions (vmaddfp and vnmsubfp) instead of fmaddfp and fnmsubdp.  Sorry
about that.

The AltiVec (VMX) instructions vmaddfp and vnmsubfp have different rounding
behaviors than the VSX xvmadd{a,m}sp and xvnmsub{a,m}sp instructions.  In
particular, generating these instructions seems to break Eigen.

The bug is that GCC has generated the VMX vmaddfp and vnmsubfp instructions on
VSX systems as an alternative to the xvmadd{a,m}sp and xvnmsub{a,m}sp
instructions.  The advantage of the VMX instructions is that they are 4 operand
instructions (i.e. the target register does not have to overlap with one of the
input registers).  This can mean that the compiler can eliminate an extra move
instruction.  The disadvantage of generating these instructions is that they do
not round the same way as the VSX instructions.

This patch will only generate the VMX vmaddfp and vnmsubfp instructions as
alternatives in the VSX instruction insn support if -Ofast (-ffast-math) is
used.  I also added 2 tests to the regression suite.

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).  I have also done the builds and test on a power8
big endian system (testing both 32-bit and 64-bit code generation).  Chip has
verified that it fixes the problem that Eigen encountered.  Can I check this
into the master GCC branch?  After a burn-in period, can I check this patch
into the active GCC branches?

Thanks in advance.

2023-04-07   Michael Meissner  

gcc/

PR target/70243
* config/rs6000/rs6000.md (isa attribute): Add fastmath.
(enabled attribute): Add support for fastmath.
* config/rs6000/vsx.md (vsx_fmav4sf4): Set the isa attribute to
fastmath to disable Altivec instruction generation normally.
(vsx_nfmsv4sf4): Likewise.

gcc/testsuite/

PR target/70243
* gcc.target/powerpc/pr70243.c: New test.
* gcc.target/powerpc/pr70243-2.c: New test.
---
 gcc/config/rs6000/rs6000.md  |  6 ++-
 gcc/config/rs6000/vsx.md | 17 
 gcc/testsuite/gcc.target/powerpc/pr70243-2.c | 41 
 gcc/testsuite/gcc.target/powerpc/pr70243.c   | 41 
 4 files changed, 97 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..7fea6a40e0c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,fastmath"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,10 @@ (define_attr "enabled" ""
  (and (eq_attr "isa" "p10")
  (match_test "TARGET_POWER10"))
  (const_int 1)
+
+ (and (eq_attr "isa" "fastmath")
+ (match_test "flag_unsafe_math_optimizations"))
+ (const_int 1)
 ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..7f64a2dd356 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,11 +2009,12 @@ (define_insn "*vsx_tsqrt2_internal"
   "xtsqrtp %0,%x1"
   [(set_attr "type" "")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
-
+;; Fused vector multiply/add instructions. Under VSX, the target must be either
+;; the addend or the first multiply.  If the user used -Ofast, also support the
+;; classical VMX versions of fma (vmaddfp and vnmsubfp), which allows the
+;; target to be a separate register from the 3 inputs.  This restriction is due

Re: PR target/70243: Do not generate fmaddfp and fnmsubfp

2023-04-07 Thread Michael Meissner via Gcc-patches
On Thu, Apr 06, 2023 at 03:37:59PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote:
> > The Altivec instructions fmaddfp and fnmsubfp have different rounding 
> > behaviors
> 
> Those are not existing instructions.  You mean "vmaddfp" etc.

Yes, sorry about that.  I guess I was thinking about the scalar instructions.

> > than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
> > these instructions seems to break Eigen.
> 
> Those instructions use round-to-nearest-tiea-to-even, like all other
> VMX FP insns.  A proper patch has to deal with all VMX FP insns.  But,
> almost all programs expect that rounding mode anyway, so this is not a
> problem in practice.  What happened on Eigen is that the Linux kernel
> starts every new process with VSCR[NJ]=1, breaking pretty much
> everything that wants floating point for non-toy purposes.  (There
> currently is a bug on LE that sets the wrong bit, hiding the problem in
> that configuration, but it is intended there as well).
> 
> > GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX 
> > systems
> > as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
> > advantage  of the Altivec instructions is that they are 4 operand 
> > instructions
> > (i.e. the target register does not have to overlap with one of the input
> > registers).  The advantage is it can eliminate an extra move instruction.  
> > The
> > disadvantage is it does round the same was as the VSX instructions.
> 
> And it gets the VSCR[NJ] setting applied.  Yup.
> 
> > This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> > instructions as alternatives in the VSX instruction insn support, and in the
> > Altivec insns it adds a test to prevent the insn from being used if VSX is
> > available.  I also added a test to the regression test suite.
> 
> Please leave the latter out, it does not belong in this patch.  If you
> want a patch to do that deal with *all* VMX FP insns?  There also are
> add, sub, mul, etc.  Well I think those (as well as madd and nmsub) are
> the only ones that use the NJ bit or the RN bits, but please check.

After I posted the patch I refreshed my memory of the VECTOR_UNIT_ALTIVEC_P
macro and it is not true if VSX code generation is enabled.  So I dropped the
changes to altivec.md.

In addition, as far as I know, the only AltiVec (VMX) floating point
instructions generated when VSX is used are the vmaddfp and vnmsubfp
instructions.  In the case of add and subtract, xvaddsp and xvsubsp are more
general than vaddfp or vsubfp since it can access all VSX registers.  VMX does
not have a stand-alone multiply (it generates FMA with a zero register) and it
does not have a division operation.  And VMX does not have xvmsub{a,m}sp nor
xvnadd{a,m}sp variations of the FMA instructions.

> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -750,12 +750,15 @@ (define_insn "altivec_vsel4"
> >  
> >  ;; Fused multiply add.
> >  
> > +;; If we are using VSX instructions, do not generate the vmaddfp 
> > instruction
> > +;; since is has different rounding behavior than the xvmaddsp instruction.
> > +
> 
> No blank lines please.

Ok.

> >  (define_insn "*altivec_fmav4sf4"
> >[(set (match_operand:V4SF 0 "register_operand" "=v")
> > (fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
> >   (match_operand:V4SF 2 "register_operand" "v")
> >   (match_operand:V4SF 3 "register_operand" "v")))]
> > -  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> > +  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
> 
> This is very error-prone.  Maybe add a test to the VECTOR_UNIT_ALTIVEC
> macro instead?

As I said that part of the code is not in the next patch.

> > -;; Fused vector multiply/add instructions. Support the classical Altivec
> > -;; versions of fma, which allows the target to be a separate register from 
> > the
> > -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> > -;; multiply.
> > +;; Fused vector multiply/add instructions. Do not use the classical Altivec

> (Two spaces after dot, and AltiVec is spelled with a capital V.  I don't
> like it either, VMX is a much nicer and more regular name).

While the name might be more regular, in terms of the instruction set, it
does have the holes that I mentioned above (no multiply that is not an FMA, and
two of the four FMA variants are not provided).

> > +;; versions of fma.  Those instructions allows the target to be a separate
> > +;; register from the 3 inputs, but they have different rounding behaviors.
> >  
> >  (define_insn "*vsx_fmav4sf4"
> > -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
> > (fma:V4SF
> > - (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> > - (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> > - 

PR target/70243: Do not generate fmaddfp and fnmsubfp

2023-04-06 Thread Michael Meissner via Gcc-patches
The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
these instructions seems to break Eigen.

GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
as an alternative to the xvmadd{a,m}sp and xvnmsub{a,m}sp instructions.  The
advantage of the Altivec instructions is that they are 4 operand instructions
(i.e. the target register does not have to overlap with one of the input
registers), which can eliminate an extra move instruction.  The disadvantage
is that they do not round the same way as the VSX instructions.
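
To make the operand-overlap point concrete, here is a rough C model of the
three operand shapes involved.  It is only a sketch: the function names are
made up for illustration, plain a * b + c stands in for the hardware
operation, and it models which operand gets overwritten, not how the result is
rounded.

```c
#include <assert.h>

/* Four-operand shape (vmaddfp in the insn template): the destination is
   independent of the three inputs.  */
float
fma_4op (float a, float b, float c)
{
  return a * b + c;
}

/* Destructive "a" shape (xvmaddasp): the destination register also supplies
   the addend, so the addend is overwritten by the result.  */
void
fma_aform (float *t, float a, float b)
{
  *t = a * b + *t;
}

/* Destructive "m" shape (xvmaddmsp): the destination register also supplies
   one of the multiplicands.  */
void
fma_mform (float *t, float a, float b)
{
  *t = a * *t + b;
}

/* If the addend must stay live, the destructive shape needs a copy first.
   That copy is the extra move the four-operand shape avoids.  */
float
fma_keep_addend (float a, float b, float c)
{
  float t = c;			/* the extra move */
  fma_aform (&t, a, b);
  assert (t == a * b + c);
  return t;
}
```

With small integer inputs every shape gives the same value; the difference is
only in which input register survives the operation.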

This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
instructions as alternatives in the VSX instruction insn support, and in the
Altivec insns it adds a test to prevent the insn from being used if VSX is
available.  I also added a test to the regression test suite.
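
For readers who have not seen this kind of test, the following is a
hypothetical sketch of what such a regression test could look like.  It is not
the pr70243.c from the patch (which is not reproduced in full here); the
option list, the function name and the scan pattern are all assumptions.

```c
/* Hypothetical sketch only; not the actual pr70243.c.  The options and the
   scan pattern below are assumptions.  */
/* { dg-do compile } */
/* { dg-options "-O2 -mvsx -ffast-math" } */

typedef float v4sf __attribute__ ((vector_size (16)));

/* With contraction enabled, a * b + c is a candidate for a V4SF fma
   pattern.  */
v4sf
vector_fma (v4sf a, v4sf b, v4sf c)
{
  return a * b + c;
}

/* On a VSX target the Altivec mnemonic should no longer appear.  */
/* { dg-final { scan-assembler-not {\mvmaddfp\M} } } */
```

The dg- comments are ordinary C comments, so the file also compiles outside
the test harness.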

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).  I have also done the builds and tests on a power8
big endian system (testing both 32-bit and 64-bit code generation).  Chip has
verified that it fixes the problem that Eigen encountered.  Can I check this
into the master GCC branch?  After a burn-in period, can I check this patch
into the active GCC branches?

Thanks in advance.

2023-04-06   Michael Meissner  

gcc/

PR target/70243
* config/rs6000/altivec.md (altivec_fmav4sf4): Add a test to prevent
fmaddfp and fnmsubfp from being generated on VSX systems.
(altivec_vnmsubfp): Likewise.
* config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate fmaddfp or
fnmsubfp.
(vsx_nfmsv4sf4): Likewise.

gcc/testsuite/

PR target/70243
* gcc.target/powerpc/pr70243.c: New test.
---
 gcc/config/rs6000/altivec.md   |  9 +++--
 gcc/config/rs6000/vsx.md   | 29 +++
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++
 3 files changed, 61 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..63eab228d0d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -750,12 +750,15 @@ (define_insn "altivec_vsel4"
 
 ;; Fused multiply add.
 
+;; If we are using VSX instructions, do not generate the vmaddfp instruction
+;; since is has different rounding behavior than the xvmaddsp instruction.
+
 (define_insn "*altivec_fmav4sf4"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
  (match_operand:V4SF 2 "register_operand" "v")
  (match_operand:V4SF 3 "register_operand" "v")))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vmaddfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
@@ -984,6 +987,8 @@ (define_insn "vstril_p_direct_"
   [(set_attr "type" "vecsimple")])
 
 ;; Fused multiply subtract 
+;; If we are using VSX instructions, do not generate the vnmsubfp instruction
+;; since is has different rounding behavior than the xvnmsubsp instruction.
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(neg:V4SF
@@ -991,7 +996,7 @@ (define_insn "*altivec_vnmsubfp"
   (match_operand:V4SF 2 "register_operand" "v")
   (neg:V4SF
(match_operand:V4SF 3 "register_operand" "v")]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vnmsubfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..03c1d787b6c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@ (define_insn "*vsx_tsqrt2_internal"
   "xtsqrtp %0,%x1"
   [(set_attr "type" "")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
+;; Fused vector multiply/add instructions. Do not use the classical Altivec
+;; versions of fma.  Those instructions allows the target to be a separate
+;; register from the 3 inputs, but they have different rounding behaviors.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
(fma:V4SF
- (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
- (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
- (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+ (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+ (match_operand:V4SF 2 

Ping: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-04-05 Thread Michael Meissner via Gcc-patches
Ping patch:

| Date: Mon, 27 Mar 2023 23:19:55 -0400
| From: Michael Meissner 
| Subject: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH, V2] PR target/105325, Make load/cmp fusion know about prefixed load

2023-03-27 Thread Michael Meissner via Gcc-patches
On Mon, Mar 27, 2023 at 03:03:17PM +0800, Kewen.Lin wrote:
> ... instead I suggested moving these three lines to below else arm for CCUNS,
> since the arm for CC already has those variables redefined, so it's something
> like:

I did those changes in the 3rd version of the patch.

| Date: Mon, 27 Mar 2023 23:19:55 -0400
| From: Michael Meissner 
| Subject: [PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads
| Message-ID: 

...

> In the previous review, I put a comment that "lp64 seems not necessary.".
> Did you try to test without it? (if yes, any fallouts?)

Yes, I tried it without the lp64, and I removed it from V3 of the patch.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH, V3] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-03-27 Thread Michael Meissner via Gcc-patches
I posted a version of the patch on March 21st and a second version on March
24th.  This patch makes some code changes suggested in the genfusion.pl code
from the last 2 patch submissions.  The fusion.md that is produced by
genfusion.pl is the same in all 3 versions.

I changed the genfusion.pl to match the suggestion for code layout.  I also
used the correct comment for each of the instructions (in the 2nd patch, when I
rewrote the comments about ld and lwa being DS format instructions, I had put
the ld comment in the section handling lwa, and vice versa).

I also removed lp64 from the new test.  When I first added the prefixed code,
it was only done for 64-bit, but now it is allowed for 32-bit.  However, the
case that shows up (lwa) would not hit in 32-bit, since it only generates lwz
and not lwa.  It also would not generate ld.  But the test does pass when it is
built with -m32.

The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is that the constraints for the individual loads in the fusion
did not match the machine.  In particular, LWA is a ds format instruction when
it is unprefixed.  The code also did not set the prefixed attribute correctly.
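
To spell out what "DS format" means for the offset, here is a small C sketch.
The helper names are hypothetical and this is not code from GCC; it only
encodes the displacement rules: a D-form load takes any signed 16-bit
displacement, a DS-form load (unprefixed ld and lwa) needs the low two bits of
the displacement to be zero, and a prefixed load takes a signed 34-bit
displacement with no alignment rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* D-form: signed 16-bit displacement, any alignment.  */
bool
fits_d_form (int64_t offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* DS-form: signed 16-bit displacement whose low two bits are zero.  */
bool
fits_ds_form (int64_t offset)
{
  return fits_d_form (offset) && (offset & 3) == 0;
}

/* Prefixed form: signed 34-bit displacement, no alignment rule.  */
bool
fits_prefixed (int64_t offset)
{
  const int64_t limit = (int64_t) 1 << 33;
  return offset >= -limit && offset < limit;
}

/* An offset of 2 is fine for a D-form load but not for a DS-form one,
   which is the kind of mismatch a plain "m" constraint lets through.  */
void
check_offset_forms (void)
{
  assert (fits_d_form (2) && !fits_ds_form (2));
  assert (fits_ds_form (8));
  assert (!fits_d_form (100002) && fits_prefixed (100002));
}
```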

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it considers load + compare
immediate patterns to be like the normal load insns in checking whether
operand[1] is a prefixed instruction.
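
The selection that genfusion.pl makes can be summarized in C as follows.  This
is only a restatement of the Perl logic from the first version of the patch,
with a made-up struct and function name; it covers the non-prefixed form, the
memory predicate and the constraint, and leaves out the other variables the
script sets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary type; the strings are the values used in the
   genfusion.pl diff.  */
struct load_cmpi_choice
{
  const char *non_prefixed;	/* NON_PREFIXED_D or NON_PREFIXED_DS */
  const char *mem_predicate;
  const char *constraint;
};

/* LMODE is "DI", "SI", "HI" or "QI"; CCMODE is "CC" or "CCUNS".  The
   QI/CC combination is skipped by the script and is not modeled here.  */
struct load_cmpi_choice
choose_load_cmpi (const char *lmode, const char *ccmode)
{
  struct load_cmpi_choice c
    = { "NON_PREFIXED_D", "non_update_memory_operand", "m" };

  if (strcmp (lmode, "DI") == 0)
    {
      /* ld is DS-form for both compare modes.  */
      c.non_prefixed = "NON_PREFIXED_DS";
      c.mem_predicate = "ds_form_mem_operand";
      c.constraint = "YZ";
    }
  else if (strcmp (lmode, "SI") == 0 && strcmp (ccmode, "CC") == 0)
    {
      /* Signed SI uses lwa, which is DS-form; unsigned SI uses lwz,
	 which is not.  */
      c.non_prefixed = "NON_PREFIXED_DS";
      c.mem_predicate = "lwa_operand";
      c.constraint = "YZ";
    }

  return c;
}

void
check_choices (void)
{
  assert (strcmp (choose_load_cmpi ("SI", "CC").constraint, "YZ") == 0);
  assert (strcmp (choose_load_cmpi ("SI", "CCUNS").constraint, "m") == 0);
}
```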

I have tested this code on a power9 little endian system (with long double
being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
power8 big endian system, testing both 32-bit and 64-bit code generation.  Can
I put this code into the master branch, and after a waiting period, apply it to
the GCC 12 and GCC 11 branches (the bug does show up in those branches, and the
patch applies without change)?

2023-03-27   Michael Meissner  

gcc/

PR target/105325
* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
of the ld and lwa instructions which use the DS encoding instead of D.
Use the YZ constraint for these loads.  Handle prefixed loads better.
Set the sign_extend attribute as appropriate.
* gcc/config/rs6000/fusion.md: Regenerate.
* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
instructions to the list of instructions that might have a prefixed load
instruction.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.

---
 gcc/config/rs6000/fusion.md   | 17 +
 gcc/config/rs6000/genfusion.pl| 36 ++-
 gcc/config/rs6000/rs6000.md   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 23 
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 5 files changed, 64 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
(match_operand:DI 3 "const_0_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 

Re: [PATCH] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-03-24 Thread Michael Meissner via Gcc-patches
On Thu, Mar 23, 2023 at 04:10:22PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> Thanks for fixing this, some minor comments are inlined below.
> 
> on 2023/3/22 07:53, Michael Meissner wrote:
> > The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
> > optimization generates illegal assembler code.
> > 
> > Ultimately the code was dying because the fusion load + compare -1/0/1 
> > patterns
> > did not handle the possibility that the load might be prefixed.
> > 
> > The main cause is the constraints for the individual loads in the fusion 
> > did not
> > match the machine.  In particular, LWA is a ds format instruction when it is
> > unprefixed.  The code did not also set the prefixed attribute correctly.
> > 
> > This patch rewrites the genfusion.pl script so that it will have more 
> > accurate
> > constraints for the LWA and LD instructions (which are DS instructions).  
> > The
> > updated genfusion.pl was then run to update fusion.md.  Finally, the code 
> > for
> > the "prefixed" attribute is modified so that it considers load + compare
> > immediate patterns to be like the normal load insns in checking whether
> > operand[1] is a prefixed instruction.
> > 
> > I have tested this patch on a little endian power10 system, on a little 
> > endian
> > power9 system, and a big endian power8 system (both -m32 and -m64 tested on
> > BE).  There were no regressions, can I check this into the trunk?
> > 
> > The same patch applies to the gcc-12 and gcc-11 branches.  Can I check this
> > patch into those branches also after a burn-in period?
> > 
> > 2023-03-21   Michael Meissner  
> >  Aaron Sawdey  
> > 
> > gcc/
> > 
> > PR target/105325
> > * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
> > of the ld and lwa instructions which use the DS encoding instead of D.
> > Use the YZ constraint for these loads.  Handle prefixed loads better.
> > Set the sign_extend attribute as appropriate.
> > * gcc/config/rs6000/fusion.md: Regenerate.
> > * gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
> > instructions to the list of instructions that might have a prefixed load
> > instruction.
> > 
> > gcc/testsuite/
> > 
> > PR target/105325
> > * g++.target/powerpc/pr105325.C: New test.
> > * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
> > ---
> >  gcc/config/rs6000/genfusion.pl| 26 ---
> >  gcc/config/rs6000/fusion.md   | 17 +++-
> >  gcc/config/rs6000/rs6000.md   |  2 +-
> >  gcc/testsuite/g++.target/powerpc/pr105325.C   | 24 +
> >  .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
> >  5 files changed, 59 insertions(+), 14 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C
> > 
> > diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
> > index e4db352e0ce..4f367cadc52 100755
> > --- a/gcc/config/rs6000/genfusion.pl
> > +++ b/gcc/config/rs6000/genfusion.pl
> > @@ -56,7 +56,7 @@ sub mode_to_ldst_char
> >  sub gen_ld_cmpi_p10
> >  {
> >  my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
> > -   $mempred, $ccmode, $np, $extend, $resultmode);
> > +   $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
> >LMODE: foreach $lmode ('DI','SI','HI','QI') {
> >$ldst = mode_to_ldst_char($lmode);
> >$clobbermode = $lmode;
> > @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
> >CCMODE: foreach $ccmode ('CC','CCUNS') {
> >   $np = "NON_PREFIXED_D";
> >   $mempred = "non_update_memory_operand";
> > + $constraint = "m";
> 
> The three assignments on $np $mempred $constraint can be moved
> to place (a) (see below) and add one explicit assignment for
> $constraint at place (b), since for the condition ccmode eq 'CC',
> HI/SI/DI have their own settings (btw QI is skipped), these
> assignments for default value can be moved to else arm (for CCUNS).

...

> we have broken it into two different arms for SI and DI, this
> comment can be removed?

...

> 
> ... and this comment.
> 

I have fixed these issues and reposted the patch as:

| Date: Fri, 24 Mar 2023 19:06:35 -0400
| From: Michael Meissner 
| Subject: [PATCH, V2] PR target/105325, Make load/cmp fusion know about prefixed load
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH, V2] PR target/105325, Make load/cmp fusion know about prefixed load

2023-03-24 Thread Michael Meissner via Gcc-patches
I posted a version of the patch on March 21st.  This patch makes some code
changes suggested in the genfusion.pl code.  The only change is in genfusion.pl.
The fusion.md that it makes is the same.

The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is that the constraints for the individual loads in the fusion
did not match the machine.  In particular, LWA is a ds format instruction when
it is unprefixed.  The code also did not set the prefixed attribute correctly.

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it considers load + compare
immediate patterns to be like the normal load insns in checking whether
operand[1] is a prefixed instruction.

I am re-running the tests right now, but they should have the same results
since fusion.md is the same, and only code in genfusion.pl that makes fusion.md
was modified.  Assuming these runs pass, can I check this into the master
branch?

I will also need to check these same patches into GCC 11 and GCC 12 after a
waiting period (the patch applied to those branches as well).

2023-03-21   Michael Meissner  

gcc/

PR target/105325
* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
of the ld and lwa instructions which use the DS encoding instead of D.
Use the YZ constraint for these loads.  Handle prefixed loads better.
Set the sign_extend attribute as appropriate.
* gcc/config/rs6000/fusion.md: Regenerate.
* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
instructions to the list of instructions that might have a prefixed load
instruction.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/fusion.md   | 17 ++
 gcc/config/rs6000/genfusion.pl| 32 +++
 gcc/config/rs6000/rs6000.md   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 24 ++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 5 files changed, 62 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
(match_operand:DI 3 "const_0_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
(match_operand:DI 3 "const_0_to_1_operand" "n")))
(set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   

[PATCH] PR target/105325, Make load/cmp fusion know about prefixed loads

2023-03-21 Thread Michael Meissner via Gcc-patches
The issue with the bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.
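
As an illustration of the kind of source that can lead to a prefixed load
feeding a compare against -1, 0 or 1, here is a hypothetical C fragment.  It
is not the pr105325.C test from the patch; the struct, its size and the
function name are made up, and whether the compiler actually fuses the pair
depends on the target options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the padding pushes 'counter' past the range of a
   signed 16-bit displacement, so an unprefixed ld cannot reach it from the
   base pointer.  */
struct big_record
{
  char padding[100000];
  long counter;
};

static_assert (offsetof (struct big_record, counter) > 32767,
	       "counter must be out of reach of a 16-bit displacement");

/* A load followed by a compare against -1: the shape that the load +
   compare immediate fusion patterns are meant to match.  */
int
counter_is_minus_one (const struct big_record *r)
{
  return r->counter == -1;
}
```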

The main cause is that the constraints for the individual loads in the fusion
did not match the machine.  In particular, LWA is a ds format instruction when
it is unprefixed.  The code also did not set the prefixed attribute correctly.

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it considers load + compare
immediate patterns to be like the normal load insns in checking whether
operand[1] is a prefixed instruction.

I have tested this patch on a little endian power10 system, on a little endian
power9 system, and a big endian power8 system (both -m32 and -m64 tested on
BE).  There were no regressions, can I check this into the trunk?

The same patch applies to the gcc-12 and gcc-11 branches.  Can I check this
patch into those branches also after a burn-in period?

2023-03-21   Michael Meissner  
 Aaron Sawdey  

gcc/

PR target/105325
* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
of the ld and lwa instructions which use the DS encoding instead of D.
Use the YZ constraint for these loads.  Handle prefixed loads better.
Set the sign_extend attribute as appropriate.
* gcc/config/rs6000/fusion.md: Regenerate.
* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
instructions to the list of instructions that might have a prefixed load
instruction.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/genfusion.pl| 26 ---
 gcc/config/rs6000/fusion.md   | 17 +++-
 gcc/config/rs6000/rs6000.md   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 24 +
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 5 files changed, 59 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352e0ce..4f367cadc52 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -56,7 +56,7 @@ sub mode_to_ldst_char
 sub gen_ld_cmpi_p10
 {
 my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-   $mempred, $ccmode, $np, $extend, $resultmode);
+   $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
   LMODE: foreach $lmode ('DI','SI','HI','QI') {
   $ldst = mode_to_ldst_char($lmode);
   $clobbermode = $lmode;
@@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
   CCMODE: foreach $ccmode ('CC','CCUNS') {
  $np = "NON_PREFIXED_D";
  $mempred = "non_update_memory_operand";
+ $constraint = "m";
  if ( $ccmode eq 'CC' ) {
  next CCMODE if $lmode eq 'QI';
- if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ if ( $lmode eq 'HI' ) {
+ $np = "NON_PREFIXED_D";
+ $mempred = "non_update_memory_operand";
+ $echr = "a";
+ } elsif ( $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ $mempred = "lwa_operand";
+ $echr = "a";
+ $constraint = "YZ";
+ } elsif ( $lmode eq 'DI' ) {
  # ld and lwa are both DS-FORM.
  $np = "NON_PREFIXED_DS";
  $mempred = "ds_form_mem_operand";
+ $echr = "";
+ $constraint = "YZ";
  }
  $cmpl = "";
- $echr = "a";
  $constpred = "const_m1_to_1_operand";
  } else {
  if ( $lmode eq 'DI' ) {
  # ld is DS-form, but lwz is not.
  $np = "NON_PREFIXED_DS";
  $mempred = "ds_form_mem_operand";
+ $constraint = "YZ";
  }
  $cmpl = "l";
  $echr = "z";
@@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
 
  print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
- print "(compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
+ print "(compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
  if ($ccmode eq 'CCUNS') { print "   "; }

Re: [PATCH V4] Rework 128-bit complex multiply and divide.

2023-03-20 Thread Michael Meissner via Gcc-patches
On Mon, Mar 20, 2023 at 01:43:41PM -0400, Michael Meissner wrote:
> I think we will need backports for GCC 12.  The issue exists in GCC 11, but I
> don't think that GCC 11 can really work on systems with IEEE long double, 
> since
> a lot of the stuff to really finish up the support was not in GCC 11.  I think
> I tried dropping the patch into GCC 12, and it looks like something else may 
> be
> needed.  I will look into it.

The current patch applies to GCC 12 without changes, and it does fix the
problem.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH V4] Rework 128-bit complex multiply and divide.

2023-03-20 Thread Michael Meissner via Gcc-patches
On Fri, Mar 17, 2023 at 02:35:16PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Mar 09, 2023 at 08:40:36PM -0500, Michael Meissner wrote:
> > PR target/109067
> > * config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
> > (init_float128_ieee): Delete code to switch complex multiply and divide
> > for long double.
> > (complex_multiply_builtin_code): New helper function.
> > (complex_divide_builtin_code): Likewise.
> > (rs6000_mangle_decl_assembler_name): Add support for mangling the name
> > of complex 128-bit multiply and divide built-in functions.
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
> > +/* { dg-final { scan-assembler "__divtc3" } } */
> 
> /* { dg-final { scan-assembler {\m__divtc3\M} } } */
> 
> It might well be that we can use a sloppier regexp here, but why would
> we do that?  It is a good thing to use the \m and \M constraint escapes
> pretty much always.

The last time I posted the patch, you said:

| > +/* { dg-final { scan-assembler "bl __divtc3" } } */
|
| This name depends on what object format and ABI is in use (some have an
| extra leading underscore, or a dot, or whatever).

So the patch was an attempt to match the other cases.
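
To show the difference between the two styles of pattern, here is a small C
sketch with made-up helper names.  The first function behaves like a plain
substring scan; the second behaves like wrapping the symbol in \m and \M,
where a word character is a letter, a digit or an underscore.  The symbol
__divtc3_helper in the comments is invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Like scan-assembler "__divtc3": any occurrence counts, so this also
   matches a longer symbol such as __divtc3_helper.  */
bool
matches_substring (const char *line, const char *sym)
{
  return strstr (line, sym) != NULL;
}

/* Like {\m__divtc3\M}: the occurrence must not be preceded or followed by
   a word character.  SYM must be non-empty and must begin and end with
   word characters.  */
bool
matches_whole_word (const char *line, const char *sym)
{
  size_t n = strlen (sym);
  assert (n > 0);

  for (const char *p = strstr (line, sym); p != NULL;
       p = strstr (p + 1, sym))
    {
      bool starts = p == line || !is_word_char (p[-1]);
      bool ends = !is_word_char (p[n]);
      if (starts && ends)
	return true;
    }
  return false;
}
```

A leading dot, as in "bl .__divtc3", still matches the whole-word form since
the dot is not a word character.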

> Similar for the other three testcases of course.
> 
> This patch is okay for trunk, if you have tested it on all
> configurations (powerpc-linux, powerpc64-linux, powerpc64le-linux with
> and without default IEEE128 long double at least).  Thank you!
> 
> Does this need backports?

I think we will need backports for GCC 12.  The issue exists in GCC 11, but I
don't think that GCC 11 can really work on systems with IEEE long double, since
a lot of the stuff to really finish up the support was not in GCC 11.  I think
I tried dropping the patch into GCC 12, and it looks like something else may be
needed.  I will look into it.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH V4] Rework 128-bit complex multiply and divide.

2023-03-09 Thread Michael Meissner via Gcc-patches
This patch reworks how the complex multiply and divide built-in functions are
done.  Previously GCC created built-in declarations for doing long double
complex multiply and divide when long double is IEEE 128-bit.  However, it did
not support __ibm128 complex multiply and divide if long double is IEEE
128-bit.

This code does not create the built-in declaration with the changed name.
Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
before it is written out to the assembler file like it now does for all of the
other long double built-in functions.
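
The idea of picking the name at output time can be sketched in C like this.
It is a simplified model, not the code in rs6000_mangle_decl_assembler_name:
the enum and function names are invented, and it assumes that the IEEE 128-bit
complex routines are __mulkc3 and __divkc3, the IBM ones are __multc3 and
__divtc3, and that long double complex follows whichever format long double
currently uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the three 128-bit complex modes: explicit
   IEEE, explicit IBM, and long double.  */
enum complex128_mode
{
  MODE_IEEE_COMPLEX,
  MODE_IBM_COMPLEX,
  MODE_LONG_DOUBLE_COMPLEX
};

static bool
uses_ieee_routines (enum complex128_mode mode, bool long_double_is_ieee)
{
  return (mode == MODE_IEEE_COMPLEX
	  || (mode == MODE_LONG_DOUBLE_COMPLEX && long_double_is_ieee));
}

/* Pick the assembler name for a complex multiply.  */
const char *
complex_multiply_asm_name (enum complex128_mode mode, bool long_double_is_ieee)
{
  return uses_ieee_routines (mode, long_double_is_ieee)
	 ? "__mulkc3" : "__multc3";
}

/* Pick the assembler name for a complex divide.  */
const char *
complex_divide_asm_name (enum complex128_mode mode, bool long_double_is_ieee)
{
  return uses_ieee_routines (mode, long_double_is_ieee)
	 ? "__divkc3" : "__divtc3";
}

void
check_asm_names (void)
{
  /* The explicit IBM type keeps the IBM routine even when long double is
     IEEE 128-bit.  */
  assert (strcmp (complex_divide_asm_name (MODE_IBM_COMPLEX, true),
		  "__divtc3") == 0);
}
```

Because the choice depends only on the mode and the current long double
format, no separately named built-in declaration is needed.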

Originally, the patch was part of a larger patch set and the comments reflected
this.  I have removed the comments referring to the other patches.  While this
patch was originally developed as part of those other patches, it is a stand
alone patch.

I have tried to address the comments from the last patch review in this patch.
Note, I will be away from the computer from March 10 through the 13th.  So I
would not be checking in the patches until I get back.  But I thought I would
share the results of the changes that were asked for.

I fixed the complex_multiply_builtin_code and complex_divide_builtin_code
functions to have an assert that the mode is within the proper modes.  I have
tried to make the code a little bit clearer.
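
The shape of such a helper, with the range check and a temporary for the
offset, looks roughly like this.  The enums here are toy stand-ins with
invented values, not GCC's machine_mode or built_in_function, and the standard
assert is used in place of gcc_assert.

```c
#include <assert.h>

/* Toy stand-ins; the names and numeric values are invented.  */
enum toy_mode
{
  TOY_MODE_SC = 10,
  TOY_MODE_DC,
  TOY_MODE_TC,
  TOY_MIN_MODE_COMPLEX_FLOAT = TOY_MODE_SC,
  TOY_MAX_MODE_COMPLEX_FLOAT = TOY_MODE_TC
};

enum toy_builtin
{
  TOY_COMPLEX_MUL_MIN = 100
};

#define IN_RANGE(value, lower, upper) \
  ((value) >= (lower) && (value) <= (upper))

/* Map a complex floating point mode to its multiply built-in code.  */
enum toy_builtin
toy_complex_multiply_builtin_code (enum toy_mode mode)
{
  assert (IN_RANGE (mode, TOY_MIN_MODE_COMPLEX_FLOAT,
		    TOY_MAX_MODE_COMPLEX_FLOAT));

  /* The temporary keeps the lines short; the cast back to the enum type is
     still written explicitly.  */
  int offset = mode - TOY_MIN_MODE_COMPLEX_FLOAT;
  return (enum toy_builtin) (TOY_COMPLEX_MUL_MIN + offset);
}
```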

I have cleaned up the tests to eliminate the target powerpc in the tests.  I
have eliminated the -mpower8-vector option.  I have changed the scan assembler
lines just to look for __divtc3 or __multc3, and not depend on the format of the
'bl' call to those functions.  I have kept the -Wno-psabi option, because this
is needed to prevent spurious errors on systems with older libraries (like big
endian) that don't have IEEE 128-bit support.

2023-03-09   Michael Meissner  

gcc/

PR target/109067
* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
(init_float128_ieee): Delete code to switch complex multiply and divide
for long double.
(complex_multiply_builtin_code): New helper function.
(complex_divide_builtin_code): Likewise.
(rs6000_mangle_decl_assembler_name): Add support for mangling the name
of complex 128-bit multiply and divide built-in functions.

gcc/testsuite/

PR target/109067
* gcc.target/powerpc/divic3-1.c: New test.
* gcc.target/powerpc/divic3-2.c: Likewise.
* gcc.target/powerpc/mulic3-1.c: Likewise.
* gcc.target/powerpc/mulic3-2.c: Likewise.
---
 gcc/config/rs6000/rs6000.cc | 111 +++-
 gcc/testsuite/gcc.target/powerpc/divic3-1.c |  21 
 gcc/testsuite/gcc.target/powerpc/divic3-2.c |  25 +
 gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  21 
 gcc/testsuite/gcc.target/powerpc/mulic3-2.c |  25 +
 5 files changed, 156 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 8e0b0d022db..fa5f93a874f 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11154,26 +11154,6 @@ init_float128_ibm (machine_mode mode)
 }
 }
 
-/* Create a decl for either complex long double multiply or complex long double
-   divide when long double is IEEE 128-bit floating point.  We can't use
-   __multc3 and __divtc3 because the original long double using IBM extended
-   double used those names.  The complex multiply/divide functions are encoded
-   as builtin functions with a complex result and 4 scalar inputs.  */
-
-static void
-create_complex_muldiv (const char *name, built_in_function fncode, tree fntype)
-{
-  tree fndecl = add_builtin_function (name, fntype, fncode, BUILT_IN_NORMAL,
- name, NULL_TREE);
-
-  set_builtin_decl (fncode, fndecl, true);
-
-  if (TARGET_DEBUG_BUILTIN)
-fprintf (stderr, "create complex %s, fncode: %d\n", name, (int) fncode);
-
-  return;
-}
-
 /* Set up IEEE 128-bit floating point routines.  Use different names if the
arguments can be passed in a vector register.  The historical PowerPC
implementation of IEEE 128-bit floating point used _q_ for the names, so
@@ -11185,32 +11165,6 @@ init_float128_ieee (machine_mode mode)
 {
   if (FLOAT128_VECTOR_P (mode))
 {
-  static bool complex_muldiv_init_p = false;
-
-  /* Set up to call __mulkc3 and __divkc3 under -mabi=ieeelongdouble.  If
-we have clone or target attributes, this will be called a second
-time.  We want to create the built-in function only once.  */
- if (mode == TFmode && TARGET_IEEEQUAD && !complex_muldiv_init_p)
-   {
-complex_muldiv_init_p = true;
-built_in_function fncode_mul =
-  (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + TCmode
-

Re: [PATCH 2/2] Rework 128-bit complex multiply and divide.

2023-03-09 Thread Michael Meissner via Gcc-patches
On Thu, Mar 09, 2023 at 04:16:21PM -0600, Segher Boessenkool wrote:
> On Thu, Mar 09, 2023 at 11:11:34AM -0500, Michael Meissner wrote:
> > On Fri, Mar 03, 2023 at 03:35:44PM -0600, Segher Boessenkool wrote:
> > > > +/* { dg-final { scan-assembler "bl __divtc3" } } */
> > > 
> > > This name depends on what object format and ABI is in use (some have an
> > > extra leading underscore, or a dot, or whatever).
> > 
> > Yes it is needed if GCC is configured against an older GLIBC before the full
> > IEEE 128-bit support was added.  For example, on my big endian test system,
> > you get warnings if you switch the floating point format.  I would imagine
> > it would also fail on little endian system with older libraries.
> 
> The regexp is not good enough, that is all.  Maybe
>   {bl .?__divtc3}
> or similar?  We have many examples in the tests already.

I forgot to mention the regexp.  I think just doing:

/* { dg-final { scan-assembler "__multc3" } } */

is sufficient.
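
As an illustration only (not part of the patch), the difference between the
patterns discussed in this thread can be sketched in plain C.  The DejaGnu
scan-assembler directive uses Tcl regular expressions; the sketch below
substitutes POSIX extended regular expressions, and the helper name
pattern_matches and the sample assembler lines are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (a POSIX extended regular expression) matches
   anywhere in LINE, and 0 otherwise.  This is a stand-in for what
   scan-assembler does with a Tcl regexp.  */
static int
pattern_matches (const char *pattern, const char *line)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);

  /* The patterns used in this sketch are known to be valid.  */
  assert (rc == 0);

  rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With a hypothetical call line such as "bl .__divtc3" (symbol with a leading
dot), the pattern "bl __divtc3" does not match, while both "bl .?__divtc3"
and the bare symbol name "__divtc3" do.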

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 2/2] Rework 128-bit complex multiply and divide.

2023-03-09 Thread Michael Meissner via Gcc-patches
On Fri, Mar 03, 2023 at 03:35:44PM -0600, Segher Boessenkool wrote:
> > +complex_multiply_builtin_code (machine_mode mode)
> > +{
> > +  return (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + mode
> > + - MIN_MODE_COMPLEX_FLOAT);
> > +}
> 
> There should be an assert that the mode is as expected
>   gcc_assert (IN_RANGE (mode, MIN_MODE_COMPLEX_FLOAT,
>                         MAX_MODE_COMPLEX_FLOAT));
> or such.
> 
> Using more temporaries should make this simpler as well, obviate the
> need for explicit casts, and make everything fit on short lines.

While I can use a temporary to shorten the line, I can't eliminate the cast, or
I'll get a warning about implicit conversion from int to the enum
built_in_function.  Here is what I will use:

static inline built_in_function
complex_multiply_builtin_code (machine_mode mode)
{
  gcc_assert (IN_RANGE (mode, MIN_MODE_COMPLEX_FLOAT, MAX_MODE_COMPLEX_FLOAT));
  int func = BUILT_IN_COMPLEX_MUL_MIN + mode - MIN_MODE_COMPLEX_FLOAT;
  return (built_in_function) func;
}
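
For readers without a GCC tree at hand, the same range-check-then-offset idea
can be sketched as standalone C.  Everything named toy_* or TOY_* below is a
hypothetical stand-in for machine_mode, built_in_function and the real
constants; only the shape of the computation follows the function above.
Note that plain C converts int to an enum implicitly, so the cast is kept
here only to mirror the C++ code, where it is required.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's machine_mode.  */
enum toy_mode
{
  TOY_SImode,
  TOY_SCmode,			/* First complex float mode.  */
  TOY_DCmode,
  TOY_TCmode,			/* Last complex float mode.  */
  TOY_MIN_COMPLEX_FLOAT = TOY_SCmode,
  TOY_MAX_COMPLEX_FLOAT = TOY_TCmode
};

/* Hypothetical stand-in for GCC's built_in_function.  The multiply codes
   are laid out in the same order as the complex float modes.  */
enum toy_builtin
{
  TOY_BUILT_IN_NONE,
  TOY_COMPLEX_MUL_MIN,		/* Multiply for TOY_SCmode.  */
  TOY_COMPLEX_MUL_DC,
  TOY_COMPLEX_MUL_TC
};

/* Map a complex float mode to its multiply built-in code.  */
static inline enum toy_builtin
toy_complex_multiply_code (enum toy_mode mode)
{
  /* Reject modes outside the complex float range before doing arithmetic.  */
  assert (mode >= TOY_MIN_COMPLEX_FLOAT && mode <= TOY_MAX_COMPLEX_FLOAT);

  int func = TOY_COMPLEX_MUL_MIN + mode - TOY_MIN_COMPLEX_FLOAT;
  return (enum toy_builtin) func;
}
```

The temporary keeps the arithmetic on one short line, and the assertion
documents the precondition that the mode lies in the complex float range.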

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 2/2] Rework 128-bit complex multiply and divide.

2023-03-09 Thread Michael Meissner via Gcc-patches
On Fri, Mar 03, 2023 at 03:35:44PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Feb 03, 2023 at 12:53:05AM -0500, Michael Meissner wrote:
> > This patch reworks how the complex multiply and divide built-in functions
> > are done.
> 
> > I tested all 3 patchs for PR target/107299 on:
> 
> Is this part of the proposed commit message?  As Ke Wen pointed out, it
> is wrong.  Most of your mail does not belong in a commit message at all,
> but some probably does?  Please do this clearer with future patches.
> 
> > * config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
> > (init_float128_ieee): Delete code to switch complex multiply and divide
> > for long double.
> 
> I like this kind of patch :-)
> 
> > +/* Internal function to return the built-in function id for the complex
> > +   multiply operation for a given mode.  */
> > +
> > +static inline built_in_function
> > +complex_multiply_builtin_code (machine_mode mode)
> > +{
> > +  return (built_in_function) (BUILT_IN_COMPLEX_MUL_MIN + mode
> > + - MIN_MODE_COMPLEX_FLOAT);
> > +}
> 
> There should be an assert that the mode is as expected
>   gcc_assert (IN_RANGE (mode, MIN_MODE_COMPLEX_FLOAT,
>                         MAX_MODE_COMPLEX_FLOAT));
> or such.

Ok.

> Using more temporaries should make this simpler as well, obviate the
> need for explicit casts, and make everything fit on short lines.
> 
> > +static inline built_in_function
> > +complex_divide_builtin_code (machine_mode mode)
> > +{
> > +  return (built_in_function) (BUILT_IN_COMPLEX_DIV_MIN + mode
> > + - MIN_MODE_COMPLEX_FLOAT);
> > +}
> 
> Ditto ofc.
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> 
> Leave the target clause out.

Ok.

> > +/* { dg-require-effective-target powerpc_p8vector_ok } */
> > +/* { dg-require-effective-target longdouble128 } */
> > +/* { dg-require-effective-target ppc_float128_sw } */
> > +/* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
> 
> It would be nice if you did not try to add -mpower8-vector in more
> testcases :-(

Yep.

> Is -Wno-psabi needed here?  What is the error you get without it / on
> which configurations?  Cargo-culting hiding the warnings makes you see
> fewer warnings, but that is the opposite of a good idea.
> 
> > +/* { dg-final { scan-assembler "bl __divtc3" } } */
> 
> This name depends on what object format and ABI is in use (some have an
> extra leading underscore, or a dot, or whatever).

Yes it is needed if GCC is configured against an older GLIBC before the full
IEEE 128-bit support was added.  For example, on my big endian test system, you
get warnings if you switch the floating point format.  I would imagine it would
also fail on little endian system with older libraries.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 2/2] Rework 128-bit complex multiply and divide.

2023-03-02 Thread Michael Meissner via Gcc-patches
This patch is second in importance after the first patch in the series.  It is
needed to allow complex IBM 128-bit multiply/divide when long double is IEEE
128-bit.

| Date: Fri, 3 Feb 2023 00:53:05 -0500
| From: Michael Meissner 
| Subject: [PATCH 2/2] Rework 128-bit complex multiply and divide.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 1/2] PR target/107299: Fix build issue when long double is IEEE 128-bit

2023-03-02 Thread Michael Meissner via Gcc-patches
This is the most important patch.  It is needed to allow the bootstrap to work
again when long double is IEEE 128-bit.

| Date: Fri, 3 Feb 2023 00:49:12 -0500
| From: Michael Meissner 
| Subject: [PATCH 1/2] PR target/107299: Fix build issue when long double is IEEE 128-bit
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit

2023-02-27 Thread Michael Meissner via Gcc-patches
This is the most important patch to look at:

| Date: Wed, 14 Dec 2022 15:29:02 -0500
| From: Michael Meissner 
| Subject: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 1/2] PR target/107299: Fix build issue when long double is IEEE 128-bit

2023-02-22 Thread Michael Meissner via Gcc-patches
On Wed, Feb 22, 2023 at 06:37:39PM +0800, Kewen.Lin wrote:
> Thanks for working on this!  If updating libgcc source to workaround this
> issue is the best option we can have at this moment, it's fine.

Thanks.  Yes, I agree that it does not fix the root issue.

> Comparing to one
> previous proposal which removes the workaround in build_common_tree_nodes for
> rs6000 KFmode, a bit concern on this one is that users can still meet the ICE
> with a simple case like:
> 
> typedef float TFtype __attribute__((mode (TF)));
> 
> TFtype
> test (TFtype t)
> {
>   return __builtin_copysignf128 (1.0q, t);
> }
> 
> but I guess they would write this kind of code very rarely?

I tend to think that it is better to consistently use __float128/_Float128
types with the 'f128' functions, and use long double with the 'l' functions.  It would be
nice to fix the root cause (of __float128 and _Float128 not being the same type
within the compiler).

It is complicated by the fact that until C++2x, you can't use the _Float128
type.  You can use the __float128 and __ibm128 extensions, but you can't use
those extensions with _Complex.  This means for C++, you have to use the
__attribute__((mode)) to get to the complex type.  And due to the way I
initially implemented it, whether you use T{C,F}, K{C,F}, or I{C,F} depends on
the switches.

But without fixing that (which is fairly complex), I really want the master
branch fixed so you can build GCC with long double defaulting to IEEE 128-bit.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 2/2] Rework 128-bit complex multiply and divide.

2023-02-22 Thread Michael Meissner via Gcc-patches
On Wed, Feb 22, 2023 at 06:13:07PM +0800, Kewen.Lin wrote:
> These two above paragraphs look a bit out of date (two patches now). :)

Thanks.

> IIUC this patch actually fixes a latent issue, so it is independent of the one
> fixing the bootstrapping issue, right?  This updated version of patch looks
> good to me, but I'd leave the approval to Segher/David.  Thanks!

Yes, I've been waiting for Segher or David's approval for this for a while.

The history is it is indeed a latent issue (not supporting __ibm128 complex
multiply and divide when long double is IEEE 128-bit).  However, at the time I
wrote it, the other changes had broken the complex multiply and divide, and I
wrote this patch as part of the series.  I separated the patch from the other 2
to make it simpler to go in.  But it seems to be in limbo.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH 8/8] Add saturating subtract built-ins.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch adds support for a saturating subtract built-in function that may be
added to a future PowerPC processor.  Note, if it is added, the name of the
built-in function may change before GCC 13 is released.  If the name changes,
we will submit a patch changing the name.

I also added support for providing dense math built-in functions, even though
at present, we have not added any new built-in functions for dense math.  It is
likely we will want to add new dense math built-in functions as the dense math
support is fleshed out.

I tested this patch on a little endian power10 system with long double using
the traditional IBM double double format.  Assuming the other 6 patches for
-mcpu=future are checked in (or at least the first patch), can I check this
patch into the master branch for GCC 13?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
for flagging invalid use of future built-in functions.
(rs6000_builtin_is_supported): Add support for future built-in
functions.
* config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
built-in function for -mcpu=future.
(__builtin_saturate_subtract64): Likewise.
* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
for -mcpu=future built-ins.
(stanza_map): Likewise.
(enable_string): Likewise.
(struct attrinfo): Likewise.
(parse_bif_attrs): Likewise.
(write_decls): Likewise.
* config/rs6000/rs6000.md (sat_sub3): Add saturating subtract
built-in insn declarations.
(sat_sub3_dot): Likewise.
(sat_sub3_dot2): Likewise.
* doc/extend.texi (Future PowerPC built-ins): New section.

gcc/testsuite/

* gcc.target/powerpc/subfus-1.c: New test.
* gcc.target/powerpc/subfus-2.c: Likewise.
---
 gcc/config/rs6000/rs6000-builtin.cc | 17 ++
 gcc/config/rs6000/rs6000-builtins.def   | 11 
 gcc/config/rs6000/rs6000-gen-builtins.cc| 35 ++--
 gcc/config/rs6000/rs6000.md | 60 +
 gcc/doc/extend.texi | 24 +
 gcc/testsuite/gcc.target/powerpc/subfus-1.c | 32 +++
 gcc/testsuite/gcc.target/powerpc/subfus-2.c | 32 +++
 gcc/testsuite/lib/target-supports.exp   | 16 +-
 8 files changed, 220 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/subfus-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/subfus-2.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index d971cf90e51..b9b0b2d52d0 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -139,6 +139,17 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins fncode)
 case ENB_MMA:
   error ("%qs requires the %qs option", name, "-mmma");
   break;
+case ENB_FUTURE:
+  error ("%qs requires the %qs option", name, "-mcpu=future");
+  break;
+case ENB_FUTURE_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=future", "-m64", "-mpowerpc64");
+  break;
+case ENB_DM:
+  error ("%qs requires the %qs or %qs options", name, "-mcpu=future",
+"-mdense-math");
+  break;
 default:
 case ENB_ALWAYS:
   gcc_unreachable ();
@@ -194,6 +205,12 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
   return TARGET_HTM;
 case ENB_MMA:
   return TARGET_MMA;
+case ENB_FUTURE:
+  return TARGET_FUTURE;
+case ENB_FUTURE_64:
+  return TARGET_FUTURE && TARGET_POWERPC64;
+case ENB_DM:
+  return TARGET_DENSE_MATH;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index e0d9f5adc97..8b73e994558 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -139,6 +139,8 @@
 ;   endian   Needs special handling for endianness
 ;   ibmld    Restrict usage to the case when TFmode is IBM-128
 ;   ibm128   Restrict usage to the case where __ibm128 is supported or if ibmld
+;   future   Restrict usage to future instructions
+;   dm       Restrict usage to dense math
 ;
 ; Each attribute corresponds to extra processing required when
 ; the built-in is expanded.  All such special processing should
@@ -4108,3 +4110,12 @@
 
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
 STXVP nothing {mma,pair}
+
+[future]
+  const signed int __builtin_saturate_subtract32 (signed int, signed int);
+  SAT_SUBSI sat_subsi3 {}
+
+[future-64]
+  const signed long __builtin_saturate_subtract64 (signed long, signed long);
+  SAT_SUBDI sat_subdi3 {}
+
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.cc 

[PATCH 7/8] Support load/store vector with right length.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch adds support for new instructions that may be added to the PowerPC
architecture in the future to enhance the load and store vector with length
instructions.

The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to use
since the count for the number of bytes must be in the top 8 bits of the GPR
register, instead of the bottom 8 bits.  This meant that code generating these
instructions typically had to do a shift left by 56 bits to get the count into
the right position.  In a future version of the PowerPC architecture, new
variants of these instructions might be added that expect the count to be in
the bottom 8 bits of the GPR register.  These patches add this support to GCC
if the user uses the -mcpu=future option.
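
To make the difference concrete, here is a small standalone C sketch of how
the length operand would be prepared in each case.  It only models the
arithmetic described above; the function names are hypothetical and nothing
here emits or depends on the actual instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Length operand for the existing forms: the byte count has to sit in
   the top 8 bits of the 64-bit GPR, hence the shift left by 56.  */
static uint64_t
length_in_top_byte (unsigned int nbytes)
{
  assert (nbytes <= 0xff);	/* Only 8 bits are available for the count.  */
  return (uint64_t) nbytes << 56;
}

/* Length operand for the possible future forms: the byte count is used
   as-is from the bottom 8 bits, so no shift is needed.  */
static uint64_t
length_in_bottom_byte (unsigned int nbytes)
{
  assert (nbytes <= 0xff);
  return (uint64_t) nbytes;
}
```

For a 16-byte count, the first helper produces 0x1000000000000000 and the
second simply produces 16, which is the extra shift the new variants avoid.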

I discovered that the code in rs6000-string.cc to generate ISA 3.1 lxvl/stxvl
future lxvll/stxvll instructions would generate these instructions on 32-bit.
However, the patterns for these instructions are only enabled on 64-bit systems.  So
I added a check for 64-bit support before generating the instructions.

I tested this patch on a little endian power10 system with long double using
the traditional IBM double double format.  Assuming the other 6 patches for
-mcpu=future are checked in (or at least the first patch), can I check this
patch into the master branch for GCC 13?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/rs6000-string.cc (expand_block_move): Do not generate
lxvl and stxvl on 32-bit.
* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with
the shift count automatically used in the insn.
(lxvrl): New insn for -mcpu=future.
(lxvrll): Likewise.
(stxvl): If -mcpu=future, generate the stxvl with the shift count
automatically used in the insn.
(stxvrl): New insn for -mcpu=future.
(stxvrll): Likewise.

gcc/testsuite/

* gcc.target/powerpc/lxvrl.c: New test.
* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
New effective target.
---
 gcc/config/rs6000/rs6000-string.cc   |   1 +
 gcc/config/rs6000/vsx.md | 122 +++
 gcc/testsuite/gcc.target/powerpc/lxvrl.c |  32 ++
 gcc/testsuite/lib/target-supports.exp|  16 ++-
 4 files changed, 148 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/lxvrl.c

diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index 75e6f8803a5..9b2f1b83b22 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -2811,6 +2811,7 @@ expand_block_move (rtx operands[], bool might_overlap)
  gen_func.mov = gen_vsx_movv2di_64bit;
}
   else if (TARGET_BLOCK_OPS_UNALIGNED_VSX
+  && TARGET_POWERPC64
   && TARGET_POWER10 && bytes < 16
   && orig_bytes > 16
   && !(bytes == 1 || bytes == 2
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..1ab8dc373c0 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5582,20 +5582,32 @@ (define_expand "first_mismatch_or_eos_index_"
   DONE;
 })
 
-;; Load VSX Vector with Length
+;; Load VSX Vector with Length.  If we have lxvrl, we don't have to do an
+;; explicit shift left into a pseudo.
 (define_expand "lxvl"
-  [(set (match_dup 3)
-(ashift:DI (match_operand:DI 2 "register_operand")
-   (const_int 56)))
-   (set (match_operand:V16QI 0 "vsx_register_operand")
-   (unspec:V16QI
-[(match_operand:DI 1 "gpc_reg_operand")
-  (mem:V16QI (match_dup 1))
- (match_dup 3)]
-UNSPEC_LXVL))]
+  [(use (match_operand:V16QI 0 "vsx_register_operand"))
+   (use (match_operand:DI 1 "gpc_reg_operand"))
+   (use (match_operand:DI 2 "gpc_reg_operand"))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
 {
-  operands[3] = gen_reg_rtx (DImode);
+  rtx shift_len = gen_rtx_ASHIFT (DImode, operands[2], GEN_INT (56));
+  rtx len;
+
+  if (TARGET_FUTURE)
+len = shift_len;
+  else
+{
+  len = gen_reg_rtx (DImode);
+  emit_insn (gen_rtx_SET (len, shift_len));
+}
+
+  rtx dest = operands[0];
+  rtx addr = operands[1];
+  rtx mem = gen_rtx_MEM (V16QImode, addr);
+  rtvec rv = gen_rtvec (3, addr, mem, len);
+  rtx lxvl = gen_rtx_UNSPEC (V16QImode, rv, UNSPEC_LXVL);
+  emit_insn (gen_rtx_SET (dest, lxvl));
+  DONE;
 })
 
 (define_insn "*lxvl"
@@ -5619,6 +5631,34 @@ (define_insn "lxvll"
   "lxvll %x0,%1,%2"
   [(set_attr "type" "vecload")])
 
+;; For lxvrl and lxvrll, use the combiner to eliminate the shift.  The
+;; define_expand for lxvl will already incorporate the shift in generating the
+;; insn.  The lxvll built-in function required the user to have already done
+;; the shift.  Defining lxvrll this way, will optimize cases where the user has
+;; done the shift immediately before the 

[PATCH 6/8] PowerPC: Add support for 1,024 bit DMR registers.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch is a preliminary patch to add the full 1,024 bit dense math registers
(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
DMR register.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
registers.  The 'wD' constraint added in previous patches is used for these
registers.  I added support to do load and store of DMRs via the VSX registers,
since there are no load/store dense math instructions.  I added the new keyword
'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

Note this patch requires the patch posted on February 2nd, 2023 to bump up the
precision size to 16 bits.  To get this into GCC 13, I will have to revise this
patch.

| Date: Thu, 2 Feb 2023 12:38:30 -0500
| Subject: [PATCH] Bump up precision size to 16 bits.
| Message-ID: 
| https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611198.html

There were no regressions with doing bootstrap builds and running the regression
tests, providing the above patch for the precision size has been installed:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.
---
 gcc/config/rs6000/mma.md  | 152 ++
 gcc/config/rs6000/rs6000-builtin.cc   |  13 ++
 gcc/config/rs6000/rs6000-call.cc  |  13 +-
 gcc/config/rs6000/rs6000-modes.def|   4 +
 gcc/config/rs6000/rs6000.cc   | 125 ++
 gcc/config/rs6000/rs6000.h|   7 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 
 7 files changed, 345 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 411e2345291..0233c7b304a 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -92,6 +92,11 @@ (define_c_enum "unspec"
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC

[PATCH 4/8] PowerPC: Switch to dense math names for all MMA operations

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.
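
The renaming rule visible in the diff below (xvf64gerpp becomes dmxvf64gerpp,
pmxvi4ger8 becomes pmdmxvi4ger8) can be sketched as a small standalone C
helper.  This is only an illustration of the spelling rule; the function name
dense_math_name is hypothetical, and the patch itself uses separate int
attributes in mma.md rather than string manipulation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the dense math spelling of the MMA mnemonic NAME into OUT, which
   holds OUT_SIZE bytes.  "dm" is inserted at the front, or right after a
   leading "pm" if there is one.  Return 0 on success, or -1 if OUT is too
   small to hold the result.  */
static int
dense_math_name (const char *name, char *out, size_t out_size)
{
  assert (name != NULL && out != NULL);

  const char *prefix = "";
  if (strncmp (name, "pm", 2) == 0)
    {
      prefix = "pm";
      name += 2;
    }

  int n = snprintf (out, out_size, "%sdm%s", prefix, name);
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

Given a sufficiently large buffer, "xvf64gerpp" maps to "dmxvf64gerpp" and
"pmxvi16ger2s" maps to "pmdmxvi16ger2s", matching the attribute values added
by the patch.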

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
(avvi4i4i8_dm): Likewise.
(vvi4i4i2_dm): Likewise.
(avvi4i4i2_dm): Likewise.
(vvi4i4_dm): Likewise.
(avvi4i4_dm): Likewise.
(pvi4i2_dm): Likewise.
(apvi4i2_dm): Likewise.
(vvi4i4i4_dm): Likewise.
(avvi4i4i4_dm): Likewise.
(mma_): Add support for running on DMF systems, generating the dense
math instruction and using the dense math accumulators.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.
---
 gcc/config/rs6000/mma.md  |  98 +++--
 .../gcc.target/powerpc/dm-double-test.c   | 194 ++
 gcc/testsuite/lib/target-supports.exp |  19 ++
 3 files changed, 299 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 9e3feb3ea54..411e2345291 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -227,13 +227,22 @@ (define_int_attr apv  [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp")
 
 (define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
 
+(define_int_attr vvi4i4i8_dm   [(UNSPEC_MMA_PMXVI4GER8 "pmdmxvi4ger8")])
+
 (define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "pmxvi4ger8pp")])
 
+(define_int_attr avvi4i4i8_dm  [(UNSPEC_MMA_PMXVI4GER8PP   "pmdmxvi4ger8pp")])
+
 (define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2    "pmxvi16ger2")
                             (UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
                             (UNSPEC_MMA_PMXVF16GER2    "pmxvf16ger2")
                             (UNSPEC_MMA_PMXVBF16GER2   "pmxvbf16ger2")])
 
+(define_int_attr vvi4i4i2_dm   [(UNSPEC_MMA_PMXVI16GER2    "pmdmxvi16ger2")
+                                (UNSPEC_MMA_PMXVI16GER2S   "pmdmxvi16ger2s")
+                                (UNSPEC_MMA_PMXVF16GER2    "pmdmxvf16ger2")
+                                (UNSPEC_MMA_PMXVBF16GER2   "pmdmxvbf16ger2")])
+
 (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
                             (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp")
                             (UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
@@ -245,25 +254,54 @@ (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
                             (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np")
                             (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")])
 
+(define_int_attr avvi4i4i2_dm  [(UNSPEC_MMA_PMXVI16GER2PP  "pmdmxvi16ger2pp")
+                                (UNSPEC_MMA_PMXVI16GER2SPP "pmdmxvi16ger2spp")
+                                (UNSPEC_MMA_PMXVF16GER2PP  "pmdmxvf16ger2pp")
+                                (UNSPEC_MMA_PMXVF16GER2PN  "pmdmxvf16ger2pn")
+                                (UNSPEC_MMA_PMXVF16GER2NP  "pmdmxvf16ger2np")
+                                (UNSPEC_MMA_PMXVF16GER2NN  "pmdmxvf16ger2nn")
+                                (UNSPEC_MMA_PMXVBF16GER2PP "pmdmxvbf16ger2pp")
+                                (UNSPEC_MMA_PMXVBF16GER2PN "pmdmxvbf16ger2pn")
+                                (UNSPEC_MMA_PMXVBF16GER2NP "pmdmxvbf16ger2np")
+

[PATCH 3/8] PowerPC: Make MMA insns support DMR registers.

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch changes the MMA instructions to use either FPR registers
(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
instruction names are used.

A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
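
As a sketch of how user code might test for this, the following standalone C
fragment checks the macro at compile time.  Only __PPC_DMR__ comes from the
patch; the helper names are hypothetical, and on a compiler that does not
define the macro the code simply reports the non-DMR case.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the compiler says MMA instructions use the dense math
   registers (i.e. __PPC_DMR__ is defined), and 0 otherwise.  */
static int
mma_uses_dmr (void)
{
#ifdef __PPC_DMR__
  return 1;
#else
  return 0;
#endif
}

/* Return a short description of where the accumulators live, for use in
   diagnostics or test logs.  */
static const char *
accumulator_description (void)
{
  const char *kind = (mma_uses_dmr ()
		      ? "dense math registers"
		      : "FPR-overlapping accumulators");
  assert (kind != NULL);
  return kind;
}
```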

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/mma.md (mma_): New define_expand to handle
mma_ for dense math and non dense math.
(mma_ insn): Restrict to non dense math.
(mma_xxsetaccz): Convert to define_expand to handle non dense math and
dense math.
(mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
dense math.
(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
(mma_): Add support for dense math.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__PPC_DMR__ if we have dense math instructions.
* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
dense math and only FPRs if not dense math.
(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
prime the DMR registers or the xxmfacc instruction to de-prime
instructions if we have dense math register support.
---
 gcc/config/rs6000/mma.md  | 247 +-
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 ++---
 3 files changed, 176 insertions(+), 109 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 59ca6835f7c..9e3feb3ea54 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -552,190 +552,249 @@ (define_insn "*mma_disassemble_acc_dm"
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_"
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; these instructions be NOPs.
+
+(define_expand "mma_"
+  [(set (match_operand:XO 0 "register_operand")
+   (unspec:XO [(match_operand:XO 1 "register_operand")]
+  MMA_ACC))]
+  "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
MMA_ACC))]
-  "TARGET_MMA"
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   " %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
(unspec_volatile:XO [(const_int 0)]
UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+  DONE;
+}
+})
+
+(define_insn "*mma_xxsetaccz_vsx"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+   (unspec_volatile:XO [(const_int 0)]
+   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+   (unspec:XO [(const_int 0)]
+  UNSPECV_MMA_XXSETACCZ))]
+  

[PATCH 2/8] PowerPC: Add support for accumulators in DMR registers.

2023-02-03 Thread Michael Meissner via Gcc-patches
The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1.  In ISA 3.1, these accumulators overlapped with the VSX
vector registers 0..31, but logically the accumulator registers were separate
from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
the accumulator registers may not overlap with the FPR registers.  This patch
adds the support for dense math registers as separate registers.

These changes are preliminary.  They are expected to change over time.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with the VSX vector registers 0..31.  If both MMA and
dense math are selected (i.e. -mcpu=future), the wD constraint will only allow
dense math registers.

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then %A output modifier converts the VSX register
number to the accumulator number, by dividing it by 4.  If both MMA and dense
math are selected, then %A will map the separate DMR registers into 0..7.
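
As a rough illustration of the non-dense-math case described above, the mapping
from a VSX register number to an accumulator number can be sketched in C.  The
function name is made up for this note and this is not the code in GCC's
print_operand; it only shows the divide-by-4 relationship.

```c
#include <assert.h>

/* Hypothetical helper (not GCC's implementation): map a VSX register number
   in the range 0..31 to the number of the accumulator that overlaps it when
   MMA is used without dense math.  Each accumulator covers 4 adjacent
   registers, so the accumulator number is the register number divided by 4.  */
int
acc_from_vsx_regno (int vsx_regno)
{
  assert (vsx_regno >= 0 && vsx_regno <= 31);
  return vsx_regno / 4;
}
```

With this sketch, registers 0..3 map to accumulator 0 and registers 28..31 map
to accumulator 7.  In the dense math case no division is involved, since the
DMRs are numbered 0..7 directly.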

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1)  If possible, don't use extended asm, but instead use the MMA built-in
functions;

2)  If you do need to write extended asm, the d constraints targeting
accumulators should be changed to wD;

3)  Only use the built-in zero, assemble and disassemble functions to
move data between vector quad types and dense math accumulators.
I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in
the extended asm code.  The reason is these instructions assume there is a
1-to-1 correspondence between 4 adjacent FPR registers and an
accumulator that overlaps with those registers.  With accumulators
now being separate registers, there no longer is a 1-to-1
correspondence.

It is possible that the mangling for DMRs and the GDB register numbers may
change in the future.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/constraints.md (wD constraint): New constraint.
* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
(movxo): Convert into define_expand.
(movxo_vsx): Version of movxo where accumulators overlap with VSX vector
registers 0..31.
(movxo_dm): Version of movxo that supports separate dense math
accumulators.
(mma_assemble_acc): Add dense math support to define_expand.
(mma_assemble_acc_vsx): Rename from mma_assemble_acc, and restrict it to
non dense math systems.
(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
(mma_disassemble_acc): Add dense math support to define_expand.
(mma_disassemble_acc_vsx): Rename from mma_disassemble_acc, and restrict
it to non dense math systems.
(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Likewise.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.

[PATCH 1/8] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair

2023-02-03 Thread Michael Meissner via Gcc-patches
This patch enables generating load and store vector pair instructions when
doing certain memory copy operations when -mcpu=future is used.  In doing tests
on power10, it was determined that using these instructions was problematic
in a few cases, so we disabled generating them by default.  This patch
re-enables generating these instructions if -mcpu=future is used.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

Can I check this patch into the GCC 13 master branch?

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
-mblock-ops-vector-pair.
(POWERPC_MASKS): Likewise.
---
 gcc/config/rs6000/rs6000-cpus.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index deb4ea1c980..b9a4d9ad76e 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -88,6 +88,7 @@
 
 /* Flags for a potential future processor that may or may not be delivered.  */
 #define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER   \
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_FUTURE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
@@ -125,6 +126,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \
-- 
2.39.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH 1/8] PowerPC: Add -mcpu=future.

2023-02-03 Thread Michael Meissner via Gcc-patches
These patches implement support for potential future PowerPC cpus.  At this
time, features enabled with -mcpu=future may or may not be in actual PowerPCs
that will be delivered in the future.

This patch adds support for the -mcpu=future and -mtune=future options.
If you use -mcpu=future, the macro __ARCH_PWR_FUTURE__ is defined, and the
assembler .machine directive "future" is used.  Future patches in this
series will add support for new instructions that may be present in future
PowerPC processors.

At the moment, we do not have any differences in tuning between power10 and
future.  It is anticipated that we may change the tuning characteristics for
-mtune=future at a later time.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
* https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

Note, I will be on vacation from Tuesday February 7th through Tuesday February
14th.

2023-02-03   Michael Meissner  

gcc/

* config/rs6000/power10.md (power10-load): Temporarily treat
-mcpu=future the same as -mcpu=power10.
(power10-fused-load): Likewise.
(power10-prefixed-load): Likewise.
(power10-prefixed-load): Likewise.
(power10-load-update): Likewise.
(power10-fpload-double): Likewise.
(power10-fpload-double): Likewise.
(power10-prefixed-fpload-double): Likewise.
(power10-prefixed-fpload-double): Likewise.
(power10-fpload-update-double): Likewise.
(power10-fpload-single): Likewise.
(power10-fpload-update-single): Likewise.
(power10-vecload): Likewise.
(power10-vecload-pair): Likewise.
(power10-store): Likewise.
(power10-fused-store): Likewise.
(power10-prefixed-store): Likewise.
(power10-prefixed-store): Likewise.
(power10-store-update): Likewise.
(power10-vecstore-pair): Likewise.
(power10-larx): Likewise.
(power10-lq): Likewise.
(power10-stcx): Likewise.
(power10-stq): Likewise.
(power10-sync): Likewise.
(power10-sync): Likewise.
(power10-alu): Likewise.
(power10-fused_alu): Likewise.
(power10-paddi): Likewise.
(power10-rot): Likewise.
(power10-rot-compare): Likewise.
(power10-alu2): Likewise.
(power10-cmp): Likewise.
(power10-two): Likewise.
(power10-three): Likewise.
(power10-mul): Likewise.
(power10-mul-compare): Likewise.
(power10-div): Likewise.
(power10-div-compare): Likewise.
(power10-crlogical): Likewise.
(power10-mfcrf): Likewise.
(power10-mfcr): Likewise.
(power10-mtcr): Likewise.
(power10-mtjmpr): Likewise.
(power10-mfjmpr): Likewise.
(power10-mfjmpr): Likewise.
(power10-fpsimple): Likewise.
(power10-fp): Likewise.
(power10-fpcompare): Likewise.
(power10-sdiv): Likewise.
(power10-ddiv): Likewise.
(power10-sqrt): Likewise.
(power10-dsqrt): Likewise.
(power10-vec-2cyc): Likewise.
(power10-fused-vec): Likewise.
(power10-veccmp): Likewise.
(power10-vecsimple): Likewise.
(power10-vecnormal): Likewise.
(power10-qp): Likewise.
(power10-vecperm): Likewise.
(power10-vecperm-compare): Likewise.
(power10-prefixed-vecperm): Likewise.
(power10-veccomplex): Likewise.
(power10-vecfdiv): Likewise.
(power10-vecdiv): Likewise.
(power10-qpdiv): Likewise.
(power10-qpmul): Likewise.
(power10-mtvsr): Likewise.
(power10-mfvsr): Likewise.
(power10-mfvsr): Likewise.
(power10-branch): Likewise.
(power10-fused-branch): Likewise.
(power10-crypto): Likewise.
(power10-htm): Likewise.
(power10-htm): Likewise.
(power10-dfp): Likewise.
(power10-dfpq): Likewise.
(power10-mma): Likewise.
(power10-prefixed-mma): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__ARCH_PWR_FUTURE__ if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
(POWERPC_MASKS): Add -mcpu=future.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: 

[PATCH 0/8] PowerPC future support for Dense Math

2023-02-03 Thread Michael Meissner via Gcc-patches
These patches were originally posted on November 10th.  Segher has asked that I
repost them.  These patches are somewhat changed since the original posting to
address some of the comments.

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

In the first patch (adding -mcpu=future), I have taken out the code that made
-mtune=future act as -mtune=power10.  Instead I went through all of the places
that look at the tuning (mostly in power10.md and rs6000.cc), and added future
as an option.  Obviously at a later time, we will provide a separate tuning
file for future (or whatever the new name will be if the instructions are added
officially).  But for now, it will suffice.

In patch #3, I fixed the opcode for clearing a dense math register that Peter
had noticed.  I was using the name based on the existing clear instruction,
instead of the new instruction.

In patch #6, I fixed the code, relying on the changes for setting the precision
field to 16 bits.  Since that patch will not be able to go into GCC 13 at
present, we might skip that support for now.  The important thing for existing
users of the MMA code is that the support for accumulators being in the
separate dense math registers rather than overlapping the VSX registers does
need to go in, and we can probably delay the 1,024 bit register support, or
implement it in a different fashion.

In the insn names, I tried to switch to using _vsx instead of _fpr for the
existing MMA support instructions.  I also tried to clear up the comments to
specify ISA 3.1 instead of power10 when talking about the existing MMA
support.

The following is from the original posting (slightly modified):

This patch is very preliminary support for a potential new feature to the
PowerPC that extends the current power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR register.

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be done to support any dense math features other than
doing data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 8 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators stored in DMRs.  This patch
enables the register allocation, but it does not move the existing MMA to use
these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, and move a DMR
register to a VSX register.  [As I mentioned above, at the moment, this patch
is problematic as is.]

7) The seventh patch is not DMR related.  It adds support for variants of the
load/store vector with length instruction that may be added in future PowerPC
processors.  These variants eliminate having to shift the byte length left by
56 bits.

8) The eighth patch is also not DMR related.  It adds support for a saturating
subtract operation that may be added to future PowerPC processors.
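
To make item 7 above a little more concrete: the existing load/store vector
with length instructions read the byte count from the most significant byte of
a 64-bit register, which is why the length has to be shifted left by 56 bits
today.  A minimal C sketch of building that operand follows; the helper name is
hypothetical and this is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the length operand for the existing load/store
   vector with length instructions.  The byte count must sit in the most
   significant byte of the 64-bit register, hence the shift by 56.  */
uint64_t
vector_length_operand (unsigned int nbytes)
{
  assert (nbytes <= 255);	/* Only one byte is available for the count.  */
  return (uint64_t) nbytes << 56;
}
```

The variants that may be added in future processors would take the byte count
directly, so a helper like this would not be needed for them.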

In terms of changes, we now use the wD constraint for accumulators.  If you
compile with -mcpu=power10, the wD constraint will match the equivalent VSX
register (0..31) that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

This patch also modifies the print_operand %A 

[PATCH 2/2] Rework 128-bit complex multiply and divide.

2023-02-02 Thread Michael Meissner via Gcc-patches
This patch reworks how the complex multiply and divide built-in functions are
done.  Previously we created built-in declarations for doing long double complex
multiply and divide when long double is IEEE 128-bit.  The old code also did not
support __ibm128 complex multiply and divide if long double is IEEE 128-bit.

This patch was originally posted on December 13th, 2022:

| Date: Tue, 13 Dec 2022 01:21:06 -0500
| Subject: [PATCH V2] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: 

In terms of history, I wrote the original code just as I was starting to test
GCC on systems where IEEE 128-bit long double was the default.  At the time, we
had not yet started mangling the built-in function names as a way to bridge
going from a system with 128-bit IBM long double to 128-bit IEEE long double.

The original code depends on there only being two 128-bit types involved.  With
the next patch in this series, this assumption will no longer be true.  When
long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
explicit __float128/_Float128 type and one for long double).

The problem is we cannot create two separate built-in functions that resolve to
the same name.  This is a requirement of add_builtin_function and the C front
end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
only use 2 of them.

This code does not create the built-in declaration with the changed name.
Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
before it is written out to the assembler file like it now does for all of the
other long double built-in functions.

When I wrote these patches, I discovered that __ibm128 complex multiply and
divide had originally not been supported if long double is IEEE 128-bit as it
would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
case.
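
The name selection described above can be sketched roughly as follows.  This is
only an illustration under my own assumptions: the enum and function names are
invented for this note, and the real work is done on the assembler name in
rs6000_mangle_decl_assembler_name.  The sketch just records which libgcc
routine each 128-bit mode should end up calling for complex multiply.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the three 128-bit floating point modes.  */
enum fp128_mode { MODE_IF, MODE_KF, MODE_TF };

/* Sketch: pick the libgcc complex multiply routine for a mode.  IFmode is
   always IBM extended double, KFmode is always IEEE 128-bit, and TFmode
   follows whichever format long double uses.  */
const char *
complex_mul_libcall (enum fp128_mode mode, bool long_double_is_ieee)
{
  switch (mode)
    {
    case MODE_IF:
      return "__multc3";
    case MODE_KF:
      return "__mulkc3";
    case MODE_TF:
      return long_double_is_ieee ? "__mulkc3" : "__multc3";
    }
  assert (!"unknown 128-bit mode");
  return NULL;
}
```

The divide case would follow the same pattern with __divtc3 and __divkc3.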

I had previously sent this patch out on November 1st.  Compared to that version,
this version no longer disables the special mapping when you are building
libgcc, as it turns out we don't need it.

I tested all 3 patches for PR target/107299 on:

1)  LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
2)  LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
3)  LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
4)  BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

Note, it is Friday February 3rd, 2023.  I will be on vacation Tuesday February
7th through February 14th.

2023-02-02   Michael Meissner  

gcc/

PR target/107299
* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
(init_float128_ieee): Delete code to switch complex multiply and divide
for long double.
(complex_multiply_builtin_code): New helper function.
(complex_divide_builtin_code): Likewise.
(rs6000_mangle_decl_assembler_name): Add support for mangling the name
of complex 128-bit multiply and divide built-in functions.

gcc/testsuite/

PR target/107299
* gcc.target/powerpc/divic3-1.c: New test.
* gcc.target/powerpc/divic3-2.c: Likewise.
* gcc.target/powerpc/mulic3-1.c: Likewise.
* gcc.target/powerpc/mulic3-2.c: Likewise.
---
 gcc/config/rs6000/rs6000.cc | 109 +++-
 gcc/testsuite/gcc.target/powerpc/divic3-1.c |  18 
 gcc/testsuite/gcc.target/powerpc/divic3-2.c |  17 +++
 gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  18 
 gcc/testsuite/gcc.target/powerpc/mulic3-2.c |  17 +++
 5 files changed, 132 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 16ca3a31757..7e76c37fdab 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11151,26 +11151,6 @@ init_float128_ibm (machine_mode mode)
 }
 }
 
-/* Create a decl for either complex long double multiply or complex long double
-   divide when long double is IEEE 128-bit floating point.  We can't use
-   __multc3 and __divtc3 because the original long double using IBM extended
-   double used those names.  The complex multiply/divide functions are encoded
-   as builtin functions with a complex result and 4 scalar inputs.  */
-
-static void
-create_complex_muldiv (const char *name, built_in_function fncode, tree fntype)
-{
-  tree fndecl = add_builtin_function (name, fntype, fncode, BUILT_IN_NORMAL,
-  

[PATCH 1/2] PR target/107299: Fix build issue when long double is IEEE 128-bit

2023-02-02 Thread Michael Meissner via Gcc-patches
This patch is a repost of a patch:

| Date: Thu, 19 Jan 2023 11:37:27 -0500
| Subject: [PATCH] PR target/107299: Fix build issue when long double is IEEE 128-bit
| Message-ID: 

This patch updates the IEEE 128-bit types used in libgcc.

At the moment, we cannot build GCC when the target uses IEEE 128-bit long
doubles, such as building the compiler for a native Fedora 36 system.  The
build dies when it is trying to build the _mulkc3.c and _divkc3.c modules.

This patch changes libgcc to use long double for the IEEE 128-bit base type if
long double is IEEE 128-bit, and it uses _Float128 otherwise.  The built-in
functions are adjusted to be the correct version based on the IEEE 128-bit base
type used.

While it is desirable to ultimately have __float128 and _Float128 use the same
internal type and mode within GCC, at present if you use the option
-mabi=ieeelongdouble, the __float128 type will use the long double type and not
the _Float128 type.  We get an internal compiler error if we combine the
signbitf128 built-in with a long double type.

I've gone through several iterations of trying to fix this within GCC, and
there are various problems that have come up.  I developed this alternative
patch that changes libgcc so that it does not tickle the issue.  I hope we can
fix the compiler at some point, but right now, this is preventing people on
Fedora 36 systems from building compilers where the default long double is IEEE
128-bit.

I have built a GCC compiler tool chain on the following platforms and there
were no regressions caused by these patches.

*   Power10 little endian, IBM long double, --with-cpu=power10

*   Power9 little endian, IBM long double, --with-cpu=power9

*   Power8 big endian, IBM long double, --with-cpu=power8, both
32-bit/64-bit tests.

In addition, I have built a GCC compiler tool chain on the following systems
with IEEE 128-bit long double as the default.  Comparing the test suite runs to
the runs for the toolchain with IBM long double as the default, I only get the
expected differences (C++ modules tests fail on IEEE long double, 3 Fortran
tests pass on IEEE long double that fail on IBM long double, C test pr105334.c
fails, and C test fp128_conversions.c fails on power10):

*   Power10 little endian, IEEE long double, --with-cpu=power10

*   Power9 little endian, IEEE long double, --with-cpu=power9

Note, it is Friday February 3rd, and I will be on vacation from Tuesday
February 7th through Tuesday February 14th.

Can I check this change into the master branch?

2023-02-02   Michael Meissner  

PR target/107299
* config/rs6000/_divkc3.c (COPYSIGN): Use the correct built-in based on
whether long double is IBM or IEEE.
(INFINITY): Likewise.
(FABS): Likewise.
* config/rs6000/_mulkc3.c (COPYSIGN): Likewise.
(INFINITY): Likewise.
* config/rs6000/quad-float128.h (TF): Remove definition.
(TFtype): Define to be long double or _Float128.
(TCtype): Define to be _Complex long double or _Complex _Float128.
* libgcc2.h (TFtype): Allow machine config files to override this.
(TCtype): Likewise.
* soft-fp/quad.h (TFtype): Likewise.
---
 libgcc/config/rs6000/_divkc3.c   |  8 
 libgcc/config/rs6000/_mulkc3.c   |  7 +++
 libgcc/config/rs6000/quad-float128.h | 19 ++-
 libgcc/libgcc2.h |  4 
 libgcc/soft-fp/quad.h|  2 ++
 5 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
index 9f52428cfa0..e3bb97c9cb7 100644
--- a/libgcc/config/rs6000/_divkc3.c
+++ b/libgcc/config/rs6000/_divkc3.c
@@ -26,9 +26,17 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "soft-fp.h"
 #include "quad-float128.h"
 
+#ifndef __LONG_DOUBLE_IEEE128__
 #define COPYSIGN(x,y) __builtin_copysignf128 (x, y)
 #define INFINITY __builtin_inff128 ()
 #define FABS __builtin_fabsf128
+
+#else
+#define COPYSIGN(x,y) __builtin_copysignl (x, y)
+#define INFINITY __builtin_infl ()
+#define FABS __builtin_fabsl
+#endif
+
 #define isnan __builtin_isnan
 #define isinf __builtin_isinf
 #define isfinite __builtin_isfinite
diff --git a/libgcc/config/rs6000/_mulkc3.c b/libgcc/config/rs6000/_mulkc3.c
index 299d8d147b0..3d98436d1d4 100644
--- a/libgcc/config/rs6000/_mulkc3.c
+++ b/libgcc/config/rs6000/_mulkc3.c
@@ -26,8 +26,15 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "soft-fp.h"
 #include "quad-float128.h"
 
+#ifndef __LONG_DOUBLE_IEEE128__
 #define COPYSIGN(x,y) __builtin_copysignf128 (x, y)
 #define INFINITY __builtin_inff128 ()
+
+#else
+#define COPYSIGN(x,y) __builtin_copysignl (x, y)
+#define INFINITY __builtin_infl ()
+#endif
+
 #define isnan __builtin_isnan
 #define isinf __builtin_isinf
 
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index 

[PATCH 0/2] Repost of patches for solving the build on Fedora 36 problem

2023-02-02 Thread Michael Meissner via Gcc-patches
I'm reposting these two patches that allow GCC to build on Fedora 36 just to be
clear which patches I'm talking about.  The issue is that if GCC is configured
with long double using the IEEE 128-bit representation, it currently cannot
build _mulkc3 and _divkc3 in libgcc.

Note, these patches do not solve the underlying problem of mixing _Float128 and
long double types and using built-in functions (i.e. calling a _Float128
built-in function with long double arguments when long double is IEEE 128-bit,
or vice versa calling a long double built-in function with _Float128
arguments).  But they do allow the compiler to build.

Note, it is the morning of February 3rd, and I will be off on vacation from
February 7th through February 14th.

The first patch changes libgcc so that it uses either _Float128 or long double
as the base IEEE 128-bit type, depending on whether long double uses the IBM
double-double representation, or the IEEE 128-bit representation.  And for the
complex type it uses _Complex _Float128 or _Complex long double.  The _mulkc3
and _divkc3 functions are adjusted to use the f128 built-in functions or the
long double built-in functions, based on the long double type.

The second patch improves how the compiler generates the call to _mulkc3 and
_divkc3.  As I have tried to fix the underlying problem with the IEEE 128-bit
floating point types, I've discovered that doing so breaks the calls for IEEE
128-bit complex multiply and divide.  This patch uses a cleaner approach to
generate these
calls, and it will work with the current setup, and with the various fixes that
I've attempted to do to fix the underlying problem.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] Bump up precision size to 16 bits.

2023-02-02 Thread Michael Meissner via Gcc-patches
The new __dmr type that is being added as a possible future PowerPC instruction
set bumps into a structure field size issue.  The size of the __dmr type is
1,024 bits.  The precision field in tree_type_common is currently 10 bits, so
if you store 1,024 into the field, you get a 0 back.  When you get 0 in the
precision field, the ccp pass passes this 0 to sext_hwi in hwint.h.  That
function in turn generates a shift that is equal to the host wide int bit size,
which is undefined behavior in C/C++, so the result is machine dependent.

  int shift = HOST_BITS_PER_WIDE_INT - prec;
  return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift;

It turns out that the x86_64 system where I first did my tests returns the original input
before the two shifts, while the PowerPC always returns 0.  In the ccp pass, the
original input is -1, and so it worked.  When I did the runs on the PowerPC, the
result was 0, which ultimately led to the failure.
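
Both halves of the problem can be reproduced in a small standalone C sketch.
The struct and function names below are made up for illustration and are not
the GCC definitions; the sign extension mirrors the two-shift idiom quoted
above, with a check that rejects a precision of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 10-bit precision bit-field.  */
struct type_bits
{
  unsigned int precision : 10;
};

/* Store a precision and read it back.  Values are reduced modulo 1024, so
   storing 1024 reads back as 0.  */
unsigned int
stored_precision (unsigned int prec)
{
  struct type_bits t;
  t.precision = prec;
  return t.precision;
}

/* Sign extend the low PREC bits of SRC.  A precision of 0 is rejected so the
   shift count can never equal the width of the type.  This relies on an
   arithmetic right shift for negative values, as the quoted code does.  */
int64_t
sign_extend (int64_t src, unsigned int prec)
{
  assert (prec != 0 && prec <= 64);
  if (prec == 64)
    return src;
  int shift = 64 - (int) prec;
  return (int64_t) ((uint64_t) src << shift) >> shift;
}
```

In this sketch stored_precision (1024) yields 0, and passing that 0 on to
sign_extend trips the assertion instead of performing a shift by 64.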

In addition, once the precision field is larger, it will help PR C/102989 (C2x
_BitInt) as well as the implementation of the SET_TYPE_VECTOR_SUBPARTS macro.

I bootstrapped various PowerPC compilers (power10 LE, power9 LE, power8 BE)
along with an x86_64 build.  There were no regressions.  My proposed patches
for the __dmr type now run fine.  Can I install this into the master branch for
GCC 13?

2023-02-02   Richard Biener  
 Michael Meissner  

gcc/

PR middle-end/108623
* hwint.h (sext_hwi): Add assertion against precision 0.
* tree-core.h (tree_type_common): Bump up precision field to 16 bits.
Align bit fields > 1 bit to at least an 8-bit boundary.
---
 gcc/hwint.h |  1 +
 gcc/tree-core.h | 24 
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/gcc/hwint.h b/gcc/hwint.h
index e31aa006fa4..ba92efbfc25 100644
--- a/gcc/hwint.h
+++ b/gcc/hwint.h
@@ -277,6 +277,7 @@ ctz_or_zero (unsigned HOST_WIDE_INT x)
 static inline HOST_WIDE_INT
 sext_hwi (HOST_WIDE_INT src, unsigned int prec)
 {
+  gcc_checking_assert (prec != 0);
   if (prec == HOST_BITS_PER_WIDE_INT)
 return src;
   else
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 8124a1328d4..b71748c6c02 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1686,18 +1686,8 @@ struct GTY(()) tree_type_common {
   tree attributes;
   unsigned int uid;
 
-  unsigned int precision : 10;
-  unsigned no_force_blk_flag : 1;
-  unsigned needs_constructing_flag : 1;
-  unsigned transparent_aggr_flag : 1;
-  unsigned restrict_flag : 1;
-  unsigned contains_placeholder_bits : 2;
-
+  unsigned int precision : 16;
   ENUM_BITFIELD(machine_mode) mode : 8;
-
-  /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
- TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
-  unsigned string_flag : 1;
   unsigned lang_flag_0 : 1;
   unsigned lang_flag_1 : 1;
   unsigned lang_flag_2 : 1;
@@ -1713,12 +1703,22 @@ struct GTY(()) tree_type_common {
  so we need to store the value 32 (not 31, as we need the zero
  as well), hence six bits.  */
   unsigned align : 6;
+  /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
+ TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
+  unsigned string_flag : 1;
+  unsigned no_force_blk_flag : 1;
+
   unsigned warn_if_not_align : 6;
+  unsigned needs_constructing_flag : 1;
+  unsigned transparent_aggr_flag : 1;
+
+  unsigned contains_placeholder_bits : 2;
+  unsigned restrict_flag : 1;
   unsigned typeless_storage : 1;
   unsigned empty_flag : 1;
   unsigned indivisible_p : 1;
   unsigned no_named_args_stdarg_p : 1;
-  unsigned spare : 15;
+  unsigned spare : 9;
 
   alias_set_type alias_set;
   tree pointer_to;
-- 
2.39.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future)

2023-01-31 Thread Michael Meissner via Gcc-patches
On Sun, Jan 29, 2023 at 09:52:38PM -0500, Michael Meissner wrote:
> On Sat, Jan 28, 2023 at 02:29:04AM -0500, Michael Meissner wrote:
> > On Fri, Jan 27, 2023 at 01:59:00PM -0600, Segher Boessenkool wrote:
> > > > There is one bug that I noticed.  When you use the full DMR instruction 
> > > > the
> > > > constant copy propagation patch issues internal errors.  I believe this 
> > > > is due
> > > > to the CCP pass not handling opaque types cleanly enough, and it only 
> > > > shows up
> > > > in larger types.  I would like to get these patches committed, and then 
> > > > work
> > > > the maintainers of the CCP to fix the problem.
> > > 
> > > Erm.  If the compiler ICEs, we can not include this code.  But hopefully
> > > you mean something else?
> > 
> > I realize we can't include the code for final release.  But as a temporary
> > measure I was hoping we would put in the code, we could allow somebody more
> > familar with ccp to debug it.  Then if there were changes needed in the 
> > PowerPC
> > back end, we could make them, once ccp was fixed.
> > 
> > But that is a moot point, ccp no longer dies with the code, so I have 
> > removed
> > the comment and the no tree ccp option in the next set of patches.
> 
> Unfortunately, while it worked on my x86 as a cross compiler, when I did the
> builds for real, it is a problem, so I will need to look into it.

Ok, I tracked down the source of the bug.  The CCP pass is depending on the
precision field.  Unfortunately in tree-core.h, the precision is a 10-bit
integer bit field, so 1,024 will become 0.

Having a 0 precision meant that the hwint function for sign extending a value
would generate:

(HOST_WIDE_INT)(((unsigned HOST_WIDE_INT)value << 64) >> 64)

which is undefined behavior in C and C++.  On the x86_64 doing the shift left
and then right gives you the initial value (which was -1), while on the PowerPC
it always gives you 0.  The CCP code was assuming if it wasn't -1, that it was
an integer, but the TDO type is opaque, not integer.

The solution was to grow precision by 1 bit and decrease the extra bits in the
placeholder entry by 1 bit.  I'm testing it now.
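
To make the failure mode concrete, here is a minimal sketch in C.  It is not
GCC code: the struct and function names are made up for illustration.  It shows
a 10-bit field silently turning 1,024 into 0, and a sign-extension helper that
rejects a zero precision instead of shifting by the full word width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a 10-bit and a 16-bit precision field.  */
struct narrow_type { unsigned int precision : 10; };
struct wide_type { unsigned int precision : 16; };

/* Store PREC in a 10-bit field and read it back.  The value is reduced
   modulo 1024, so 1024 comes back as 0.  */
static unsigned int
store_narrow (unsigned int prec)
{
  struct narrow_type t;
  t.precision = prec;
  return t.precision;
}

/* Same, but with a 16-bit field, which is wide enough to hold 1024.  */
static unsigned int
store_wide (unsigned int prec)
{
  struct wide_type t;
  t.precision = prec;
  return t.precision;
}

/* Sign extend the low PREC bits of SRC.  A precision of 0 would need a
   shift by all 64 bits, which is undefined, so it is rejected up front.  */
static int64_t
sign_extend (int64_t src, unsigned int prec)
{
  assert (prec != 0 && prec <= 64);
  if (prec == 64)
    return src;
  uint64_t sign = (uint64_t) 1 << (prec - 1);
  uint64_t low = (uint64_t) src & ((sign << 1) - 1);
  /* Flip the sign bit, then subtract it to propagate it upwards.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

With these helpers, store_narrow (1024) yields 0 while store_wide (1024)
yields 1024, and sign_extend (x, 0) trips the assertion rather than producing
a host-dependent result.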

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future)

2023-01-29 Thread Michael Meissner via Gcc-patches
On Sat, Jan 28, 2023 at 02:29:04AM -0500, Michael Meissner wrote:
> On Fri, Jan 27, 2023 at 01:59:00PM -0600, Segher Boessenkool wrote:
> > > There is one bug that I noticed.  When you use the full DMR instruction 
> > > the
> > > constant copy propagation patch issues internal errors.  I believe this 
> > > is due
> > > to the CCP pass not handling opaque types cleanly enough, and it only 
> > > shows up
> > > in larger types.  I would like to get these patches committed, and then 
> > > work
> > > the maintainers of the CCP to fix the problem.
> > 
> > Erm.  If the compiler ICEs, we can not include this code.  But hopefully
> > you mean something else?
> 
> I realize we can't include the code for final release.  But as a temporary
> measure I was hoping we would put in the code, we could allow somebody more
> familar with ccp to debug it.  Then if there were changes needed in the 
> PowerPC
> back end, we could make them, once ccp was fixed.
> 
> But that is a moot point, ccp no longer dies with the code, so I have removed
> the comment and the no tree ccp option in the next set of patches.

Unfortunately, while it worked on my x86 as a cross compiler, when I did the
builds for real, it is a problem, so I will need to look into it.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future)

2023-01-27 Thread Michael Meissner via Gcc-patches
On Fri, Jan 27, 2023 at 01:59:00PM -0600, Segher Boessenkool wrote:
> > There is one bug that I noticed.  When you use the full DMR instruction the
> > constant copy propagation patch issues internal errors.  I believe this is 
> > due
> > to the CCP pass not handling opaque types cleanly enough, and it only shows 
> > up
> > in larger types.  I would like to get these patches committed, and then work
> > the maintainers of the CCP to fix the problem.
> 
> Erm.  If the compiler ICEs, we can not include this code.  But hopefully
> you mean something else?

I realize we can't include the code for final release.  But as a temporary
measure I was hoping we would put in the code, we could allow somebody more
familar with ccp to debug it.  Then if there were changes needed in the PowerPC
back end, we could make them, once ccp was fixed.

But that is a moot point, ccp no longer dies with the code, so I have removed
the comment and the no tree ccp option in the next set of patches.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Patch: [PATCH 8] PowerPC: Support load/store vector with right length.

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get these possible future PowerPC insn
changes into GCC 13.

| Date: Sat, 12 Nov 2022 00:10:59 -0500
| Subject: [PATCH 8] PowerPC: Support load/store vector with right length.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 7] PowerPC: Add -mcpu=future saturating subtract built-ins.

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get these possibly future PowerPC insns
into GCC 13.

| Date: Sat, 12 Nov 2022 00:07:55 -0500
| Subject: [PATCH 7] PowerPC: Add -mcpu=future saturating subtract built-ins.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get these possibly future PowerPC insns
into GCC 13.

| Date: Wed, 9 Nov 2022 21:52:49 -0500
| Subject: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get these possibly future PowerPC insns
into GCC 13.

| Date: Wed, 9 Nov 2022 21:51:48 -0500
| Subject: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get these possibly future PowerPC insns
into GCC 13.

| Date: Wed, 9 Nov 2022 21:50:24 -0500
| Subject: [PATCH 4/6] PowerPC: Make MMA insns support DMR registers
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get these possibly future PowerPC patches
into GCC 13.

| Date: Wed, 9 Nov 2022 21:46:36 -0500
| Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get these possible future PowerPC patches
into GCC 13.

| Date: Wed, 9 Nov 2022 21:45:39 -0500
| Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping: [PATCH 1/6] PowerPC: Add -mcpu=future

2023-01-20 Thread Michael Meissner via Gcc-patches
Ping patch.  We really would like to get the patches that enable the possible
future MMA+ instructions into GCC 13.

| Date: Wed, 9 Nov 2022 21:44:39 -0500
| Subject: [PATCH 1/6] PowerPC: Add -mcpu=future
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] PR target/107299: Fix build issue when long double is IEEE 128-bit

2023-01-19 Thread Michael Meissner via Gcc-patches
This patch updates the IEEE 128-bit types used in libgcc.

At the moment, we cannot build GCC when the target uses IEEE 128-bit long
doubles, such as building the compiler for a native Fedora 36 system.  The
build dies when it is trying to build the _mulkc3.c and _divkc3.c modules.

This patch changes libgcc to use long double for the IEEE 128-bit base type if
long double is IEEE 128-bit, and it uses _Float128 otherwise.  The built-in
functions are adjusted to be the correct version based on the IEEE 128-bit base
type used.
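
As a rough sketch of the selection described above (not the actual libgcc
source): the base type and the matching built-ins are picked together, keyed
on __LONG_DOUBLE_IEEE128__.  The names ieee128_t, IEEE128_COPYSIGN,
IEEE128_INFINITY, IEEE128_FABS, with_sign_of and is_infinite are made up for
this example.  The extra __powerpc__ and __FLOAT128__ tests are an assumption
added only so that the sketch still compiles on other hosts, where it simply
falls back to long double.

```c
#include <assert.h>

#if defined (__powerpc__) && defined (__FLOAT128__) \
    && !defined (__LONG_DOUBLE_IEEE128__)
/* Long double is IBM extended double: use _Float128 and the f128 built-ins.  */
typedef _Float128 ieee128_t;
#define IEEE128_COPYSIGN(x, y) __builtin_copysignf128 (x, y)
#define IEEE128_INFINITY __builtin_inff128 ()
#define IEEE128_FABS(x) __builtin_fabsf128 (x)
#else
/* Long double is already IEEE 128-bit (or this is not a PowerPC host):
   use long double and the long double built-ins.  */
typedef long double ieee128_t;
#define IEEE128_COPYSIGN(x, y) __builtin_copysignl (x, y)
#define IEEE128_INFINITY __builtin_infl ()
#define IEEE128_FABS(x) __builtin_fabsl (x)
#endif

/* Return MAGNITUDE with the sign of SIGN, whichever base type is in use.  */
static ieee128_t
with_sign_of (ieee128_t magnitude, ieee128_t sign)
{
  return IEEE128_COPYSIGN (IEEE128_FABS (magnitude), sign);
}

/* Return nonzero if X is plus or minus infinity.  */
static int
is_infinite (ieee128_t x)
{
  return IEEE128_FABS (x) == IEEE128_INFINITY;
}

/* Small self-check that the type and the built-ins agree.  */
static void
check_ieee128_helpers (void)
{
  assert (with_sign_of ((ieee128_t) 3, (ieee128_t) -1) == (ieee128_t) -3);
  assert (is_infinite (IEEE128_INFINITY));
}
```

The point of keeping the typedef and the macros in the same conditional is
that the type and the built-in suffix can never disagree, which is what the
compiler currently trips over.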

While it is desirable to ultimately have __float128 and _Float128 use the same
internal type and mode within GCC, at present if you use the option
-mabi=ieeelongdouble, the __float128 type will use the long double type and not
the _Float128 type.  We get an internal compiler error if we combine the
signbitf128 built-in with a long double type.

I've gone through several iterations of trying to fix this within GCC, and
there are various problems that have come up.  I developed this alternative
patch that changes libgcc so that it does not tickle the issue.  I hope we can
fix the compiler at some point, but right now, this is preventing people on
Fedora 36 systems from building compilers where the default long double is IEEE
128-bit.

I have built a GCC compiler tool chain on the following platforms and there
were no regressions caused by these patches.

*   Power10 little endian, IBM long double, --with-cpu=power10

*   Power9 little endian, IBM long double, --with-cpu=power9

*   Power8 big endian, IBM long double, --with-cpu=power8, both
32-bit/64-bit tests.

In addition, I have built a GCC compiler tool chain on the following systems
with IEEE 128-bit long double as the default.  Comparing the test suite runs to
the runs for the toolchain with IBM long double as the default, I only get the
expected differences (C++ modules tests fail on IEEE long double, 3 Fortran
tests pass on IEEE long double that fail on IBM long double, C test pr105334.c
fails, and C test fp128_conversions.c fails on power10):

*   Power10 little endian, IEEE long double, --with-cpu=power10

*   Power9 little endian, IEEE long double, --with-cpu=power9

Can I check this change into the master branch?

2023-01-19   Michael Meissner  

PR target/107299
* config/rs6000/_divkc3.c (COPYSIGN): Use the correct built-in based on
whether long double is IBM or IEEE.
(INFINITY): Likewise.
(FABS): Likewise.
* config/rs6000/_mulkc3.c (COPYSIGN): Likewise.
(INFINITY): Likewise.
* config/rs6000/quad-float128.h (TF): Remove definition.
(TFtype): Define to be long double or _Float128.
(TCtype): Define to be _Complex long double or _Complex _Float128.
* libgcc2.h (TFtype): Allow machine config files to override this.
(TCtype): Likewise.
* soft-fp/quad.h (TFtype): Likewise.
---
 libgcc/config/rs6000/_divkc3.c   |  8 
 libgcc/config/rs6000/_mulkc3.c   |  7 +++
 libgcc/config/rs6000/quad-float128.h | 19 ++-
 libgcc/libgcc2.h |  4 
 libgcc/soft-fp/quad.h|  2 ++
 5 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
index 59ab2137d1d..8eeb0f76ba4 100644
--- a/libgcc/config/rs6000/_divkc3.c
+++ b/libgcc/config/rs6000/_divkc3.c
@@ -26,9 +26,17 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "soft-fp.h"
 #include "quad-float128.h"
 
+#ifndef __LONG_DOUBLE_IEEE128__
 #define COPYSIGN(x,y) __builtin_copysignf128 (x, y)
 #define INFINITY __builtin_inff128 ()
 #define FABS __builtin_fabsf128
+
+#else
+#define COPYSIGN(x,y) __builtin_copysignl (x, y)
+#define INFINITY __builtin_infl ()
+#define FABS __builtin_fabsl
+#endif
+
 #define isnan __builtin_isnan
 #define isinf __builtin_isinf
 #define isfinite __builtin_isfinite
diff --git a/libgcc/config/rs6000/_mulkc3.c b/libgcc/config/rs6000/_mulkc3.c
index cfae81f8b5f..290dc89bbc1 100644
--- a/libgcc/config/rs6000/_mulkc3.c
+++ b/libgcc/config/rs6000/_mulkc3.c
@@ -26,8 +26,15 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "soft-fp.h"
 #include "quad-float128.h"
 
+#ifndef __LONG_DOUBLE_IEEE128__
 #define COPYSIGN(x,y) __builtin_copysignf128 (x, y)
 #define INFINITY __builtin_inff128 ()
+
+#else
+#define COPYSIGN(x,y) __builtin_copysignl (x, y)
+#define INFINITY __builtin_infl ()
+#endif
+
 #define isnan __builtin_isnan
 #define isinf __builtin_isinf
 
diff --git a/libgcc/config/rs6000/quad-float128.h b/libgcc/config/rs6000/quad-float128.h
index ae0622c744c..8332184348a 100644
--- a/libgcc/config/rs6000/quad-float128.h
+++ b/libgcc/config/rs6000/quad-float128.h
@@ -27,21 +27,14 @@
License along with the GNU C Library; if not, see
.  */
 
-/* quad.h defines the TFtype type by:
-   typedef float TFtype 

Re: [PATCH/RFC] rs6000: Remove optimize_for_speed check for implicit TARGET_SAVE_TOC_INDIRECT [PR108184]

2023-01-17 Thread Michael Meissner via Gcc-patches
On Tue, Jan 17, 2023 at 03:57:24PM -0500, Michael Meissner wrote:
> So I have objection to the change.  I suspect it may be better with a check 
> for
> just optimize either for speed or size, and not for speed.

Sigh.  I meant I have NO objection to the change.  Sorry about that.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH/RFC] rs6000: Remove optimize_for_speed check for implicit TARGET_SAVE_TOC_INDIRECT [PR108184]

2023-01-17 Thread Michael Meissner via Gcc-patches
On Mon, Jan 16, 2023 at 05:39:04PM +0800, Kewen.Lin wrote:
> Hi,
> 
> Now we will check optimize_function_for_speed_p (cfun) for
> TARGET_SAVE_TOC_INDIRECT if it's implicitly enabled.  But
> the effect of -msave-toc-indirect is actually to save the
> TOC in the prologue for indirect calls rather than inline,
> it's also good for optimize_function_for_size?  So this
> patch is to remove the check of optimize_function_for_speed
> and make it work for both optimizing for size and speed.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8,
> powerpc64le-linux-gnu P{9,10} and powerpc-ibm-aix.
> 
> Any thoughts?
> 
> Thanks in advance!

Well in terms of size, it is only a savings if we have 2 or more indirect calls
within a module, and we are not compiling for power10.

On power9, if we have just one indirect call, then it is the same size.

On power10, the -msave-toc-indirect switch does nothing, because we don't need
TOCs when we have prefixed addressing.

So I have objection to the change.  I suspect it may be better with a check for
just optimize either for speed or size, and not for speed.

The option however, can slow things down if there is an early exit to the
function since the store would always be done, even if the function exits
early.
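
The size argument can be written down as a small calculation.  This is only a
back-of-the-envelope sketch under stated assumptions: each TOC save is a
single non-prefixed 4-byte instruction, the count is per function, and the
prologue save is only emitted when the function has at least one indirect
call.  The function names are made up for the example.

```c
#include <assert.h>

/* Size in bytes of one non-prefixed PowerPC instruction.  */
#define INSN_BYTES 4

/* Bytes spent on TOC saves when every indirect call saves the TOC inline.  */
static unsigned int
inline_toc_save_bytes (unsigned int n_indirect_calls)
{
  return n_indirect_calls * INSN_BYTES;
}

/* Bytes spent when the TOC is saved once in the prologue (assumed to be
   emitted only if the function has at least one indirect call).  */
static unsigned int
prologue_toc_save_bytes (unsigned int n_indirect_calls)
{
  return n_indirect_calls > 0 ? INSN_BYTES : 0;
}

/* Bytes saved by moving the save into the prologue.  */
static unsigned int
toc_save_bytes_saved (unsigned int n_indirect_calls)
{
  unsigned int inline_bytes = inline_toc_save_bytes (n_indirect_calls);
  unsigned int prologue_bytes = prologue_toc_save_bytes (n_indirect_calls);
  assert (inline_bytes >= prologue_bytes);
  return inline_bytes - prologue_bytes;
}
```

Under these assumptions one indirect call saves nothing and two indirect
calls save one instruction, which matches the break-even point described
above.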

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [RFC/PATCH] Remove the workaround for _Float128 precision [PR107299]

2023-01-11 Thread Michael Meissner via Gcc-patches
On Tue, Jan 10, 2023 at 07:23:23PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 09, 2023 at 10:21:52PM -0500, Michael Meissner wrote:
> > I had the patches to change the precision to 128, and I just ran them.  C 
> > and
> > C++ do not seem to be bothered by changing the precision to 128 (once I got 
> > it
> > to build, etc.).  But Fortran on the other hand does actually use the 
> > precision
> > to differentiate between IBM extended double and IEEE 128-bit.  In 
> > particular,
> > the following 3 tests fail when long double is IBM extended double:
> > 
> > gfortran.dg/PR100914.f90
> > gfortran.dg/c-interop/typecodes-array-float128.f90
> > gfortran.dg/c-interop/typecodes-scalar-float128.f90
> > 
> > I tried adding code to use the old precisions for Fortran, but not for 
> > C/C++,
> > but it didn't seem to work.
> > 
> > So while it might be possible to use a single 128 for the precision, it 
> > needs
> > more work and attention, particularly on the Fortran side.
> 
> Can't be more than a few lines changed in the fortran FE.
> Yes, the FE needs to know if it is IBM extended double or IEEE 128-bit so
> that it can decide on the mangling - where to use the artificial kind 17 and
> where to use 16.  But as long as it can figure that out, it doesn't need to
> rely on a particular precision.

I agree that in theory it should be simple to fix.  Unfortunately the patches
that I was working on cause some other failures that I need to investigate.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2023-01-11 Thread Michael Meissner via Gcc-patches
On Tue, Nov 01, 2022 at 10:42:30PM -0400, Michael Meissner wrote:
> This patch fixes the issue that GCC cannot build when the default long double
> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> during the evrp pass.  Ultimately it is failing because the code declared the
> type to use TFmode but it used F128 functions (i.e. KFmode).

Unfortunately, this patch no longer works against the trunk.  I have a simpler
patch to libgcc that uses the _Complex _Float128 and _Float128 types for
building the IEEE 128-bit support in libgcc.  It doesn't fix the problem in the
compiler, but it will allow us to go forward and build GCC on targets that have
IEEE 128-bit floating point support (i.e. Fedora 36).

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [RFC/PATCH] Remove the workaround for _Float128 precision [PR107299]

2023-01-09 Thread Michael Meissner via Gcc-patches
On Fri, Jan 06, 2023 at 07:41:07PM -0500, Michael Meissner wrote:
> On Wed, Dec 21, 2022 at 09:40:24PM +, Joseph Myers wrote:
> > On Wed, 21 Dec 2022, Segher Boessenkool wrote:
> > 
> > > > --- a/gcc/tree.cc
> > > > +++ b/gcc/tree.cc
> > > > @@ -9442,15 +9442,6 @@ build_common_tree_nodes (bool signed_char)
> > > >if (!targetm.floatn_mode (n, extended).exists ())
> > > > continue;
> > > >int precision = GET_MODE_PRECISION (mode);
> > > > -  /* Work around the rs6000 KFmode having precision 113 not
> > > > -128.  */
> > > 
> > > It has precision 126 now fwiw.
> > > 
> > > Joseph: what do you think about this patch?  Is the workaround it
> > > removes still useful in any way, do we need to do that some other way if
> > > we remove this?
> > 
> > I think it's best for the TYPE_PRECISION, for any type with the binary128 
> > format, to be 128 (not 126).
> > 
> > It's necessary that _Float128, _Float64x and long double all have the same 
> > TYPE_PRECISION when they have the same (binary128) format, or at least 
> > that TYPE_PRECISION for _Float128 >= that for long double >= that for 
> > _Float64x, so that the rules in c_common_type apply properly.
> > 
> > How the TYPE_PRECISION compares to that of __ibm128, or of long double 
> > when that's double-double, is less important.
> 
> I spent a few days working on this.  I have patches to make the three 128-bit
> types all have TYPE_PRECISION of 128.  To do this, I added a new mode macro
> (FRACTIONAL_FLOAT_MODE_NO_WIDEN) that takes the same arguments as
> FRACTIONAL_FLOAT_MODE.

...

I had the patches to change the precision to 128, and I just ran them.  C and
C++ do not seem to be bothered by changing the precision to 128 (once I got it
to build, etc.).  But Fortran on the other hand does actually use the precision
to differentiate between IBM extended double and IEEE 128-bit.  In particular,
the following 3 tests fail when long double is IBM extended double:

gfortran.dg/PR100914.f90
gfortran.dg/c-interop/typecodes-array-float128.f90
gfortran.dg/c-interop/typecodes-scalar-float128.f90

I tried adding code to use the old precisions for Fortran, but not for C/C++,
but it didn't seem to work.

So while it might be possible to use a single 128 for the precision, it needs
more work and attention, particularly on the Fortran side.

I'm not sure it is worth it to try and change things.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [RFC/PATCH] Remove the workaround for _Float128 precision [PR107299]

2023-01-06 Thread Michael Meissner via Gcc-patches
On Wed, Dec 21, 2022 at 09:40:24PM +, Joseph Myers wrote:
> On Wed, 21 Dec 2022, Segher Boessenkool wrote:
> 
> > > --- a/gcc/tree.cc
> > > +++ b/gcc/tree.cc
> > > @@ -9442,15 +9442,6 @@ build_common_tree_nodes (bool signed_char)
> > >if (!targetm.floatn_mode (n, extended).exists ())
> > >   continue;
> > >int precision = GET_MODE_PRECISION (mode);
> > > -  /* Work around the rs6000 KFmode having precision 113 not
> > > -  128.  */
> > 
> > It has precision 126 now fwiw.
> > 
> > Joseph: what do you think about this patch?  Is the workaround it
> > removes still useful in any way, do we need to do that some other way if
> > we remove this?
> 
> I think it's best for the TYPE_PRECISION, for any type with the binary128 
> format, to be 128 (not 126).
> 
> It's necessary that _Float128, _Float64x and long double all have the same 
> TYPE_PRECISION when they have the same (binary128) format, or at least 
> that TYPE_PRECISION for _Float128 >= that for long double >= that for 
> _Float64x, so that the rules in c_common_type apply properly.
> 
> How the TYPE_PRECISION compares to that of __ibm128, or of long double 
> when that's double-double, is less important.

I spent a few days working on this.  I have patches to make the three 128-bit
types all have TYPE_PRECISION of 128.  To do this, I added a new mode macro
(FRACTIONAL_FLOAT_MODE_NO_WIDEN) that takes the same arguments as
FRACTIONAL_FLOAT_MODE.

This will create a floating point mode that is a normal floating point mode,
but the GET_MODE_WIDER and GET_MODE_2XWIDER macros will never return it.
Declaring both IFmode and KFmode as modes that are never widened to, while
leaving normal TFmode as a widening target, eliminates the problems where an
IBM expression got converted to IEEE, which can mostly (but not always) contain
the value.  In addition, on power8, it means it won't call the KF mode emulator
functions, just the IF functions.

We need to have one 128-bit mode (TFmode) that is not declared as NO_WIDEN, or
long double won't be created, since float_mode_for_size (128) will not be able
to find the correct type.
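
A toy model of the idea, for illustration only.  This is not how the GCC mode
tables are implemented; the table, the no_widen flag and the function names
are all hypothetical.  It shows a "wider mode" search and a size lookup that
both skip modes flagged as never to be widened to, so IF and KF stay reachable
only by explicit request while TF remains the 128-bit answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one floating point mode.  */
struct fp_mode
{
  const char *name;
  unsigned int bits;
  int no_widen;   /* Nonzero: never chosen as a widening target.  */
};

static const struct fp_mode fp_modes[] = {
  { "SF", 32, 0 },
  { "DF", 64, 0 },
  { "IF", 128, 1 },
  { "KF", 128, 1 },
  { "TF", 128, 0 },
};

#define N_FP_MODES (sizeof fp_modes / sizeof fp_modes[0])

/* Return the name of the next wider mode than NAME, skipping modes that are
   flagged no_widen, or NULL if there is none (or NAME is unknown).  */
static const char *
wider_mode_name (const char *name)
{
  unsigned int bits = 0;
  size_t i;

  assert (name != NULL);
  for (i = 0; i < N_FP_MODES; i++)
    if (strcmp (fp_modes[i].name, name) == 0)
      bits = fp_modes[i].bits;
  if (bits == 0)
    return NULL;

  for (i = 0; i < N_FP_MODES; i++)
    if (fp_modes[i].bits > bits && !fp_modes[i].no_widen)
      return fp_modes[i].name;
  return NULL;
}

/* Return the name of the default mode of exactly BITS bits, again skipping
   no_widen modes, or NULL if every mode of that size is flagged.  */
static const char *
mode_name_for_size (unsigned int bits)
{
  size_t i;

  for (i = 0; i < N_FP_MODES; i++)
    if (fp_modes[i].bits == bits && !fp_modes[i].no_widen)
      return fp_modes[i].name;
  return NULL;
}
```

In this model DF widens to TF and never to IF or KF, and if TF were also
flagged the 128-bit size lookup would return NULL, which is the "long double
won't be created" situation.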

I did have to patch convert_mode_scalar so that it would not abort if it was
doing a conversion between two floating point types with the same precision.

I tested this with the first patch from the previous set of patches (that
rewrites complex multiply/divide built-in setup).  I think that patch is useful
as a stand alone patch.

I also used Kewen Lin's patch from December 27th in build_common_tree_nodes to
do the test.  I haven't tested if this particular patch fixes this problem, or
it fixes something else.

Finally, I used the third patch in my series of patches that straightens out
128<->128 FP conversions.  That patch needed to be tweaked slightly, as one of
the conversions became FLOAT_EXTEND instead of FLOAT_TRUNCATE.

We don't have an RTL operation that converts from one floating point type to
another where both types are the same size.  Whether FLOAT_EXTEND or
FLOAT_TRUNCATE was used depended on whether the TYPE_PRECISION was greater or
lesser.  Now that they are the same, it arbitrarily picks FLOAT_EXTEND.
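
The selection can be pictured with a small sketch.  It is an illustration
under my own naming, not the code in the compiler: the enum and the function
are hypothetical, and only the tie-breaking rule is taken from the paragraph
above.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two RTL conversion codes.  */
enum conv_code
{
  CONV_FLOAT_TRUNCATE,
  CONV_FLOAT_EXTEND
};

/* Pick a conversion code from the two precisions.  There is no code for a
   same-size conversion, so equal precisions fall through to FLOAT_EXTEND.  */
static enum conv_code
pick_conv_code (unsigned int from_precision, unsigned int to_precision)
{
  assert (from_precision != 0 && to_precision != 0);
  if (to_precision < from_precision)
    return CONV_FLOAT_TRUNCATE;
  return CONV_FLOAT_EXTEND;
}
```

With distinct precisions the direction of a 128<->128 conversion was fixed by
whichever type had the larger number; with both at 128 every such conversion
lands in the FLOAT_EXTEND case.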

While I still think the 2nd patch is important, it isn't needed with the above
patches.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [RFC/PATCH] Remove the workaround for _Float128 precision [PR107299]

2023-01-03 Thread Michael Meissner via Gcc-patches
On Wed, Dec 21, 2022 at 09:40:24PM +, Joseph Myers wrote:
> On Wed, 21 Dec 2022, Segher Boessenkool wrote:
> 
> > > --- a/gcc/tree.cc
> > > +++ b/gcc/tree.cc
> > > @@ -9442,15 +9442,6 @@ build_common_tree_nodes (bool signed_char)
> > >if (!targetm.floatn_mode (n, extended).exists ())
> > >   continue;
> > >int precision = GET_MODE_PRECISION (mode);
> > > -  /* Work around the rs6000 KFmode having precision 113 not
> > > -  128.  */
> > 
> > It has precision 126 now fwiw.
> > 
> > Joseph: what do you think about this patch?  Is the workaround it
> > removes still useful in any way, do we need to do that some other way if
> > we remove this?
> 
> I think it's best for the TYPE_PRECISION, for any type with the binary128 
> format, to be 128 (not 126).
> 
> It's necessary that _Float128, _Float64x and long double all have the same 
> TYPE_PRECISION when they have the same (binary128) format, or at least 
> that TYPE_PRECISION for _Float128 >= that for long double >= that for 
> _Float64x, so that the rules in c_common_type apply properly.
> 
> How the TYPE_PRECISION compares to that of __ibm128, or of long double 
> when that's double-double, is less important.

When I did the original implementation years ago, there were various implicit
assumptions that for any one precision, there must be only one floating point
type.

I tend to agree that logically the precision should be 128, but until we go
through and fix all of these assumptions, it may be problematical.  This shows
up in the whole infrastructure of looking for a FP type with larger precision
than a given precision.  There just isn't an ordering that works and preserves
all values.

I'm coming to think that we may want 2 types of FP, one is a standard FP type
where you can convert to a larger type, and the other for various FP types
where there is no default widening conversion.

And logically there is the issue with 16-bit floats, given that we have
different versions of 16-bit float.

And if an implementation ever wanted to support both BID and DPD decimal types
at the same time, they would have similar issues.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2022-12-16 Thread Michael Meissner via Gcc-patches
On Fri, Dec 16, 2022 at 11:55:27AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Dec 15, 2022 at 07:09:38PM -0500, Michael Meissner wrote:
> > On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> > > On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > > > The hacks with different precisions of powerpc 128-bit floating types 
> > > > are
> > > > very unfortunate, it is I assume because the middle-end asserted that 
> > > > scalar
> > > > floating point types with different modes have different precision.
> > > 
> > > IEEE QP and double-double cannot be ordered, neither represents a subset
> > > of the other.  But the current middle end does require a total ordering
> > > for all floating point types (that can be converted to each other).
> > > 
> > > Mike's precision hack was supposed to give us some time until an actual
> > > fix was made.  But no one has worked on that, and there have been
> > > failures found with the precision hack as well, it worked remarkably
> > > well but it isn't perfect.
> > > 
> > > We cannot move forward in a meaningful way until these problems are
> > > fixed.  We can move around like headless chickens some more of course.
> > 
> > In general I tend to think most of these automatic widenings are
> > problematical.  But there are cases where it makes sense.
> 
> These things are *not* widening at all, that is the problem.  For some
> values it is lossy, in either direction.

Ummm yes and no.

Going from SFmode to DFmode is a widening, as is SDmode to DDmode.  Since all
values within SFmode or SDmode can be represented in DFmode/DDmode.  That is
needed, since not all machines have full support for arithmetic.
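
A quick way to see what "widening" means here, using C float and double as
stand-ins for SFmode and DFmode (the decimal modes are not shown).  The
function names are made up for this sketch.

```c
#include <assert.h>

/* Return nonzero if F survives a round trip through double unchanged.
   NaNs are rejected because they never compare equal to themselves.  */
static int
widens_losslessly (float f)
{
  assert (f == f);
  double d = f;
  return (float) d == f;
}

/* Return nonzero if D survives a round trip through float unchanged.  */
static int
narrows_losslessly (double d)
{
  assert (d == d);
  float f = (float) d;
  return (double) f == d;
}
```

Every non-NaN float passes the first test, while a double such as 0.1 fails
the second; that asymmetry is what makes SFmode to DFmode a true widening.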

Going from DFmode to KFmode is still a widening, but in general we may want to
prevent it from happening due to the speed of KFmode operations compared to
DFmode.  Likewise going from DFmode to IFmode is a widening since all values in
DFmode can be represented in IFmode.

Obviously going from IFmode to KFmode or the reverse is where the issue is.
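
To illustrate why neither direction is a widening, consider values of the
form 2^hi + 2^lo with hi > lo.  The sketch below is a simplified model, not
compiler code: it ignores subnormals, and the function names are made up.  It
uses only the well known format limits (binary64 exponents from -1022 to 1023;
binary128 exponents from -16382 to 16383 with a 113-bit significand).

```c
#include <assert.h>
#include <stdbool.h>

/* Can IBM double-double hold 2^hi_exp + 2^lo_exp exactly?  Each term only
   has to fit in one of the two doubles, so just the exponent range matters.  */
static bool
ibm_double_double_holds (int hi_exp, int lo_exp)
{
  assert (hi_exp > lo_exp);
  return hi_exp <= 1023 && lo_exp >= -1022;
}

/* Can IEEE binary128 hold 2^hi_exp + 2^lo_exp exactly?  Both bits must fit
   in a single 113-bit significand, and the exponent range is much larger.  */
static bool
ieee_binary128_holds (int hi_exp, int lo_exp)
{
  assert (hi_exp > lo_exp);
  return hi_exp <= 16383 && lo_exp >= -16382 && hi_exp - lo_exp < 113;
}
```

For example 1 + 2^-200 fits in double-double but not in binary128, while
2^2000 + 2^1999 fits in binary128 but not in double-double, so a conversion
in either direction can lose information.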

> > Lets see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
> > There you definitely want the compiler to automatically promote SDmode to
> > DDmode to do the arithmetic and then possibly convert it back.  Similarly for
> > the limited 16-bit floating point modes, where you have operations to pack
> > and unpack the object, but you have no arithmetic.
> 
> And those things *are* widening, non-lossy in all cases.  Well-defined
> etc.

Yes, but we just need to improve the hooks to prevent cases where it is not
defined.

> > But I would argue that you NEVER want to automatically promote DFmode to
> > either KFmode, TFmode, or IFmode, since those modes are (almost) always going
> > to be slower than doing the emulation.  This is particularly true if we only
> > support a subset of operations, where some things can be done inline, but a
> > lot of operations would need to be done via emulation (such as on power8
> > where we don't have IEEE 128-bit support in hardware).
> 
> TFmode is either the same actual mode as either KFmode or IFmode, let's
> reduce confusion by not talking about it at all anymore.
> 
> The middle end should never convert to another mode without a good
> reason for it.  But OTOH, both IFmode and KFmode can represent all
> values in DFmode, you just have to be careful about semantics when
> eventually converting back.

It doesn't.  Where the issue is is when you call a built-in function and that
built-in function uses different types than what you pass.  Then conversions
have to be inserted.

> > If the machine independent part of the compiler decides oh we can do this
> > operation because some operations are present (such as move, negate, 
> > absolute
> > value, and compare), then you likely will wind up promoting the 64-bit 
> > type(s)
> > to 128-bit, doing a call to a slower 128-bit function, and then truncating 
> > the
> > value back to 64-bit is faster than calling a 64-bit emulation function.
> 
> This does not in general have the correct semantics though (without
> -ffast-math), so the compiler will not do things like it.

It would happen if we didn't set the hooks to prevent it, but we did.  Maybe
there are places that need more hooks.

> > While for the PowerPC, we want to control what is the logical wider type for
> > floating point types, I can imagine we don't want all backends to have to
> > implement these hooks if they just have the standard 2-3 floating point modes.
> 
> KFmode is not wider than IFmode.  IFmode is not wider than KFmode.
> KFmode can represent values IFmode cannot.  IFmode can represent values
> KFmode cannot.

There the issue is historical: GCC believes there is only one number
(precision) that says whether one type is wider than another.  And to some
extent, precision is the wrong measure: do you want precision to mean the
number of bytes in a value, the mantissa size, the exponent 

Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2022-12-15 Thread Michael Meissner via Gcc-patches
On Thu, Dec 15, 2022 at 11:59:49AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 14, 2022 at 10:36:03AM +0100, Jakub Jelinek wrote:
> > On Wed, Dec 14, 2022 at 04:46:07PM +0800, Kewen.Lin via Gcc-patches wrote:
> > > Since function useless_type_conversion_p considers two float types to be
> > > compatible if they have the same mode, it doesn't require explicit
> > > conversions between these two types.  I think it's exactly what we want.
> > > And to me, it looks unexpected to have two types with the same mode but
> > > different precision.
> > > 
> > > So could we consider disabling the above workaround to make _Float128
> > > have the same precision as __float128 (long double) (the underlying
> > > TFmode)?  I tried the below change:
> > 
> > The hacks with different precisions of powerpc 128-bit floating types are
> > very unfortunate, it is I assume because the middle-end asserted that scalar
> > floating point types with different modes have different precision.
> 
> IEEE QP and double-double cannot be ordered, neither represents a subset
> of the other.  But the current middle end does require a total ordering
> for all floating point types (that can be converted to each other).
> 
> Mike's precision hack was supposed to give us some time until an actual
> fix was made.  But no one has worked on that, and there have been
> failures found with the precision hack as well, it worked remarkably
> well but it isn't perfect.
> 
> We cannot move forward in a meaningful way until these problems are
> fixed.  We can move around like headless chickens some more of course.

In general I tend to think most of these automatic widenings are
problematical.  But there are cases where it makes sense.

Let's see.  On the PowerPC, there is no support for 32-bit decimal arithmetic.
There you definitely want the compiler to automatically promote SDmode to DDmode
to do the arithmetic and then possibly convert it back.  Similarly for the
limited 16-bit floating point modes, where you have operations to pack and
unpack the object, but you have no arithmetic.

But I would argue that you NEVER want to automatically promote DFmode to either
KFmode, TFmode, or IFmode, since those modes are (almost) always going to be
slower than doing the emulation.  This is particularly true if we only support
a subset of operations, where some things can be done inline, but a lot of
operations would need to be done via emulation (such as on power8 where we
don't have IEEE 128-bit support in hardware).

If the machine independent part of the compiler decides oh we can do this
operation because some operations are present (such as move, negate, absolute
value, and compare), then you likely will wind up promoting the 64-bit type(s)
to 128-bit, doing a call to a slower 128-bit function, and then truncating the
value back to 64-bit is faster than calling a 64-bit emulation function.  And
even if the operation does have a named insn to do it, that doesn't mean that
you want to use that insn in general.

I recall in the past that for some x86 boxes, the 80-bit XFmode floating point
stack insns were really slow compared to the current SFmode and DFmode SSE
operations.  But for some of the older machines, they may have been faster.
And choosing a different -march= would change whether or not you want to do the
optimization.  Having these tables built statically can be a recipe for
disaster.  For floating point at least, I would prefer if a target had an
option to dispense with the statically built get_wider tables, and did
everything via target hooks.

While for the PowerPC, we want to control what is the logical wider type for
floating point types, I can imagine we don't want all backends to have to
implement these hooks if they just have the standard 2-3 floating point modes.

I purposely haven't been looking into 16-bit floating point support, but I
have to imagine there you have the problem that there are at least 2-3
different 16-bit formats roaming around.  This is essentially the same issue in
the PowerPC where you have 2 128-bit floating point types, neither of which is
a pure subset of the other.
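
To make the "neither is a subset of the other" point concrete, here is a small
sketch in plain C.  It is only an illustration under stated assumptions: the
struct dd model of double-double and all of the function names are
hypothetical, and it assumes doubles are IEEE binary64.  It measures how many
significand bits a single binary format would need to hold a double-double
value exactly, and compares the exponent ranges of the two 128-bit formats.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an IBM double-double: the value is hi + lo.  */
struct dd { double hi, lo; };

/* Unbiased exponent of the leading bit of a normal, nonzero double.  */
static int
highest_bit_exp (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) ((bits >> 52) & 0x7ff) - 1023;
}

/* Exponent of the lowest set bit of a normal, nonzero double.  */
static int
lowest_bit_exp (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* 52 stored fraction bits plus the implicit leading 1.  */
  uint64_t frac = (bits & 0xfffffffffffffULL) | (1ULL << 52);
  int e = (int) ((bits >> 52) & 0x7ff) - 1023 - 52;
  while ((frac & 1) == 0)
    {
      frac >>= 1;
      e++;
    }
  return e;
}

/* Significand bits a single binary format would need to hold hi + lo
   exactly.  Assumes both parts are normal, nonzero and of the same sign.  */
int
dd_span_bits (struct dd v)
{
  return highest_bit_exp (v.hi) - lowest_bit_exp (v.lo) + 1;
}

/* IEEE binary128 has a 113-bit significand.  */
int
fits_ieee128_significand (struct dd v)
{
  return dd_span_bits (v) <= 113;
}

/* Can 2**e be held?  Double-double inherits the exponent range of double
   (down to the smallest subnormal, 2**-1074).  */
int
dd_holds_pow2 (int e)
{
  return e >= -1074 && e <= 1023;
}

/* binary128 reaches from its smallest subnormal, 2**-16494, to 2**16383.  */
int
ieee128_holds_pow2 (int e)
{
  return e >= -16494 && e <= 16383;
}
```

With this model, the double-double value {1.0, 0x1p-200} needs 201 significand
bits, so IEEE 128-bit cannot hold it exactly, while 2**-2000 is fine for IEEE
128-bit but is below the smallest value a double-double can hold.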

To my way of thinking, it is a many branching tree.  On the PowerPC, you want
SDmode to promote to DDmode, and possibly to TDmode.  And SFmode would
promote to DFmode, but DFmode would not generally promote automatically to
IFmode, TFmode, or KFmode.  We don't have any machines that support it, but
let's say some machine wanted to support both decimal types (DFP and BID).  You
would logically not want any DFP type to promote to a BID type or vice versa.

Sure, explicit conversions would be allowed, but not the invisible conversions
done to promote the type.
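
The branching tree described above could be sketched as a small per-target
function.  This is a hypothetical illustration only: the enum, the function
names and the NO_MODE marker are made up for the example and are not GCC's
machine_mode type or any existing target hook.

```c
/* Hypothetical mode names modelled on the ones discussed above; this is
   not GCC's machine_mode enum.  */
enum fmode { SF, DF, IF, KF, TF, SD, DD, TD, NO_MODE };

/* Sketch of a per-target "next wider mode for implicit promotion" hook.
   NO_MODE means: do not promote automatically.  */
enum fmode
implicit_wider_mode (enum fmode m)
{
  switch (m)
    {
    case SD:
      return DD;        /* no 32-bit decimal arithmetic in hardware */
    case DD:
      return TD;        /* possibly */
    case SF:
      return DF;
    case DF:            /* never widen to a 128-bit mode implicitly */
    case IF:
    case KF:
    case TF:
    case TD:
    default:
      return NO_MODE;
    }
}

/* Is there a chain of implicit promotions leading from FROM to TO?
   Explicit conversions are a separate matter and are always allowed.  */
int
implicitly_promotes (enum fmode from, enum fmode to)
{
  for (enum fmode m = implicit_wider_mode (from); m != NO_MODE;
       m = implicit_wider_mode (m))
    if (m == to)
      return 1;
  return 0;
}
```

Here SDmode reaches TDmode through DDmode, but nothing ever reaches IFmode,
KFmode or TFmode implicitly, and the binary and decimal branches never cross.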

In terms of these machine dependent types, there are some issues that show up
when a port creates these special types.

   1)   It would be nice if _Complex worked with MD types.  It is tiresome to
have 

[PATCH 3/3, V3] PR 107299, Update float 128-bit conversion

2022-12-14 Thread Michael Meissner via Gcc-patches
This patch fixes two tests that are still failing when long double is IEEE
128-bit after the previous 2 patches for PR target/107299 have been applied.
The tests are:

gcc.target/powerpc/convert-fp-128.c
gcc.target/powerpc/pr85657-3.c

This patch is a rewrite of the patch submitted on August 18th:

| https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599988.html

This patch reworks the conversions between 128-bit binary floating point types.
Previously, we would call rs6000_expand_float128_convert to do all conversions.
Now, we only define the conversions between the same representation that turn
into a NOP.  The appropriate extend or truncate insn is generated, and after
register allocation, it is converted to a move.

This patch also fixes two places where we want to override the external name
for the conversion function, and the wrong optab was used.  Previously,
rs6000_expand_float128_convert would handle the move or generate the call as
needed.  Now, it lets the machine independent code generate the call.  But if
we use the machine independent code to generate the call, we need to update the
name for two optabs where a truncate would be used in terms of converting
between the modes.  This patch updates those two optabs.
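
As a rough sketch of the decision described above (not the actual rs6000
code): conversions between two modes with the same representation become a
move, and conversions between the IBM and IEEE representations become a
library call.  The enum and function names below are hypothetical; the two
library function names are the ones that appear in the init_float128_ieee
hunk of this patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the three 128-bit binary modes.  */
enum f128_mode { M_IF, M_KF, M_TF };

enum f128_rep { REP_IBM, REP_IEEE };

/* IFmode is always IBM double-double, KFmode is always IEEE 128-bit, and
   TFmode is whichever of the two long double currently uses.  */
static enum f128_rep
representation (enum f128_mode m, int tf_is_ieee)
{
  if (m == M_IF)
    return REP_IBM;
  if (m == M_KF)
    return REP_IEEE;
  return tf_is_ieee ? REP_IEEE : REP_IBM;
}

/* Return NULL when the conversion is just a move (same representation),
   otherwise the name of the library function that does the conversion.  */
const char *
f128_convert_libfunc (enum f128_mode from, enum f128_mode to, int tf_is_ieee)
{
  enum f128_rep rf = representation (from, tf_is_ieee);
  enum f128_rep rt = representation (to, tf_is_ieee);

  if (rf == rt)
    return NULL;

  /* IBM to IEEE is treated as a truncate, IEEE to IBM as an extend.  */
  return rf == REP_IBM ? "__trunctfkf2" : "__extendkftf2";
}
```

For example, TFmode to KFmode is a move when long double is IEEE 128-bit, but
a call to __trunctfkf2 when long double is IBM 128-bit.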

I tested this patch on:

1)  LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
2)  LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
3)  LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
4)  BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

In the past I have also tested this exact patch on the following systems:

1)  LE Power10 using --with-cpu=power9  --with-long-double-format=ibm
2)  LE Power10 using --with-cpu=power8  --with-long-double-format=ibm
3)  LE Power10 using --with-cpu=power10 --with-long-double-format=ibm

There were no regressions in the bootstrap process or running the tests (after
applying all 3 patches for PR target/107299).  Can I check this patch into the
trunk?

2022-12-14   Michael Meissner  

gcc/

PR target/107299
* config/rs6000/rs6000.cc (init_float128_ieee): Use the correct
float_extend or float_truncate optab based on how the machine converts
between IEEE 128-bit and IBM 128-bit.
* config/rs6000/rs6000.md (IFKF): Delete.
(IFKF_reg): Delete.
(extendiftf2): Rewrite to be a move if IFmode and TFmode are both IBM
128-bit.  Do not run if TFmode is IEEE 128-bit.
(extendifkf2): Delete.
(extendtfkf2): Delete.
(extendtfif2): Delete.
(trunciftf2): Delete.
(truncifkf2): Delete.
(trunckftf2): Delete.
(extendkftf2): Implement conversion of IEEE 128-bit types as a move.
(trunctfif2): Delete.
(trunctfkf2): Implement conversion of IEEE 128-bit types as a move.
(extendtf2_internal): Delete.
(extendtf2_internal): Delete.
---
 gcc/config/rs6000/rs6000.cc |   4 +-
 gcc/config/rs6000/rs6000.md | 177 ++--
 2 files changed, 50 insertions(+), 131 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 604f6a9ce33..0a20bfc8421 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11134,11 +11134,11 @@ init_float128_ieee (machine_mode mode)
   set_conv_libfunc (trunc_optab, SFmode, mode, "__trunckfsf2");
   set_conv_libfunc (trunc_optab, DFmode, mode, "__trunckfdf2");
 
-  set_conv_libfunc (sext_optab, mode, IFmode, "__trunctfkf2");
+  set_conv_libfunc (trunc_optab, mode, IFmode, "__trunctfkf2");
   if (mode != TFmode && FLOAT128_IBM_P (TFmode))
set_conv_libfunc (sext_optab, mode, TFmode, "__trunctfkf2");
 
-  set_conv_libfunc (trunc_optab, IFmode, mode, "__extendkftf2");
+  set_conv_libfunc (sext_optab, IFmode, mode, "__extendkftf2");
   if (mode != TFmode && FLOAT128_IBM_P (TFmode))
set_conv_libfunc (trunc_optab, TFmode, mode, "__extendkftf2");
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6011f5bf76a..799af3c3ebe 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -543,12 +543,6 @@ (define_mode_iterator FMOVE128_GPR [TI
 ; Iterator for 128-bit VSX types for pack/unpack
 (define_mode_iterator FMOVE128_VSX [V1TI KF])
 
-; Iterators for converting to/from TFmode
-(define_mode_iterator IFKF [IF KF])
-
-; Constraints for moving IF/KFmode.
-(define_mode_attr IFKF_reg [(IF "d") (KF "wa")])
-
 ; Whether a floating point move is ok, don't allow SD without hardware FP
 (define_mode_attr fmove_ok [(SF "")
(DF "")
@@ -9096,106 +9090,65 @@ (define_insn "*ieee_128bit_vsx_nabs2_internal"
   "xxlor %x0,%x1,%x2"
   [(set_attr "type" "veclogical")])
 
-;; Float128 conversion functions.  These expand to library function calls.
-;; We use expand to convert from IBM double double to IEEE 128-bit
-;; and trunc for 

[PATCH 2/3, V3] PR 107299, Make __float128 use the _Float128 type

2022-12-14 Thread Michael Meissner via Gcc-patches
This patch fixes the issue that GCC cannot build when the default long double
is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
during the evrp pass.  Ultimately it is failing because the code declared the
internal type for one IEEE 128-bit floating point type, and NaN functions use a
different IEEE 128-bit floating point type.

Gimple-range-fold uses the internal types, but there are similar problems when
the code is converted to RTL and the two different modes (KFmode, TFmode) are
used.

typedef float TFtype __attribute__((mode (TF)));
typedef __complex float TCtype __attribute__((mode (TC)));

TCtype
__mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
{
  TFtype ac, bd, ad, bc, x, y;
  TCtype res;

  ac = a * c;
  bd = b * d;
  ad = a * d;
  bc = b * c;

  x = ac - bd;
  y = ad + bc;

  if (__builtin_isnan (x) && __builtin_isnan (y))
    {
      _Bool recalc = 0;
      if (__builtin_isinf (a) || __builtin_isinf (b))
        {
          a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
          b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
          if (__builtin_isnan (c))
            c = __builtin_copysignf128 (0, c);
          if (__builtin_isnan (d))
            d = __builtin_copysignf128 (0, d);
          recalc = 1;
        }
      if (__builtin_isinf (c) || __builtin_isinf (d))
        {
          c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
          d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
          if (__builtin_isnan (a))
            a = __builtin_copysignf128 (0, a);
          if (__builtin_isnan (b))
            b = __builtin_copysignf128 (0, b);
          recalc = 1;
        }
      if (!recalc
          && (__builtin_isinf (ac) || __builtin_isinf (bd)
              || __builtin_isinf (ad) || __builtin_isinf (bc)))
        {
          if (__builtin_isnan (a))
            a = __builtin_copysignf128 (0, a);
          if (__builtin_isnan (b))
            b = __builtin_copysignf128 (0, b);
          if (__builtin_isnan (c))
            c = __builtin_copysignf128 (0, c);
          if (__builtin_isnan (d))
            d = __builtin_copysignf128 (0, d);
          recalc = 1;
        }
      if (recalc)
        {
          x = __builtin_inff128 () * (a * c - b * d);
          y = __builtin_inff128 () * (a * d + b * c);
        }
    }

  __real__ res = x;
  __imag__ res = y;
  return res;
}

Currently GCC uses the long double type node for __float128 if long double is
IEEE 128-bit.  It did not use the node for _Float128.

Originally this was noticed if you call the nansq function to make a signaling
NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
different from __float128, the machine independent code converts signaling NaNs
to quiet NaNs if the types are not compatible.  The following tests used to
fail when run on a system where long double is IEEE 128-bit:

gcc.dg/torture/float128-nan.c
gcc.target/powerpc/nan128-1.c

This patch makes both __float128 and _Float128 use the same type node.

One side effect of not using the long double type node for __float128 is that we
must only use KFmode for _Float128/__float128.  The libstdc++ library won't
build if we use TFmode for _Float128 and __float128 when long double is IEEE
128-bit.

Another minor side effect is that the f128 round to odd fused multiply-add
function will not merge negation with the FMA operation when the type is long
double.  If the type is __float128 or _Float128, then it will continue to do the
optimization.  The round to odd functions are defined in terms of __float128
arguments.  For example:

long double
do_fms (long double a, long double b, long double c)
{
  return __builtin_fmaf128_round_to_odd (a, b, -c);
}

will generate (assuming -mabi=ieeelongdouble):

xsnegqp 4,4
xsmaddqpo 4,2,3
xxlor 34,36,36

while:

__float128
do_fms (__float128 a, __float128 b, __float128 c)
{
  return __builtin_fmaf128_round_to_odd (a, b, -c);
}

will generate:

xsmsubqpo 4,2,3
xxlor 34,36,36

Assuming this patch goes in, we can open a bug about the above optimizations not
working.  However, given that the functions are explicitly documented to use
__float128 types, and the code in the test is using long double, I don't think
it is a high 

[PATCH 1/3, V3] PR 107299, Rework 128-bit complex multiply and divide

2022-12-14 Thread Michael Meissner via Gcc-patches
This patch reworks how the complex multiply and divide built-in functions are
done.  Previously we created built-in declarations for doing long double complex
multiply and divide when long double is IEEE 128-bit.  The old code also did not
support __ibm128 complex multiply and divide if long double is IEEE 128-bit.

In terms of history, I wrote the original code just as I was starting to test
GCC on systems where IEEE 128-bit long double was the default.  At the time, we
had not yet started mangling the built-in function names as a way to bridge
going from a system with 128-bit IBM long double to 128-bit IEEE long double.

The original code depends on there only being two 128-bit types involved.  With
the next patch in this series, this assumption will no longer be true.  When
long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
explicit __float128/_Float128 type and one for long double).

The problem is we cannot create two separate built-in functions that resolve to
the same name.  This is a requirement of add_builtin_function and the C front
end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
only use 2 of them.

This code does not create the built-in declaration with the changed name.
Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
before it is written out to the assembler file like it now does for all of the
other long double built-in functions.

When I wrote these patches, I discovered that __ibm128 complex multiply and
divide had originally not been supported if long double is IEEE 128-bit as it
would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
case.
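
The renaming can be pictured with a small standalone sketch.  This is not the
code in rs6000_mangle_decl_assembler_name; the enum and function names are
hypothetical, and the divide mapping is assumed to mirror the multiply one.
The idea is that the external name depends only on the representation of the
complex mode, so __ibm128 (ICmode) gets the "tc" names rather than "ic".

```c
/* Hypothetical stand-ins for the three complex 128-bit modes.  */
enum cplx_mode { M_KC, M_IC, M_TC };

/* External name for complex multiply.  IEEEQUAD is nonzero when long
   double (and therefore TCmode) is IEEE 128-bit.  */
const char *
complex_mul_name (enum cplx_mode m, int ieeequad)
{
  switch (m)
    {
    case M_KC:
      return "__mulkc3";        /* always IEEE 128-bit */
    case M_IC:
      return "__multc3";        /* always IBM 128-bit, not "__mulic3" */
    default:
      return ieeequad ? "__mulkc3" : "__multc3";
    }
}

/* External name for complex divide, assumed to follow the same pattern.  */
const char *
complex_div_name (enum cplx_mode m, int ieeequad)
{
  switch (m)
    {
    case M_KC:
      return "__divkc3";
    case M_IC:
      return "__divtc3";
    default:
      return ieeequad ? "__divkc3" : "__divtc3";
    }
}
```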

I had previously sent this patch out on November 1st.  Compared to that version,
this version no longer disables the special mapping when you are building
libgcc, as it turns out we don't need it.

I tested all 3 patches for PR target/107299 on:

1)  LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
2)  LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
3)  LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
4)  BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-12-14   Michael Meissner  

gcc/

PR target/107299
* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
(init_float128_ieee): Delete code to switch complex multiply and divide
for long double.
(complex_multiply_builtin_code): New helper function.
(complex_divide_builtin_code): Likewise.
(rs6000_mangle_decl_assembler_name): Add support for mangling the name
of complex 128-bit multiply and divide built-in functions.

gcc/testsuite/

PR target/107299
* gcc.target/powerpc/divic3-1.c: New test.
* gcc.target/powerpc/divic3-2.c: Likewise.
* gcc.target/powerpc/mulic3-1.c: Likewise.
* gcc.target/powerpc/mulic3-2.c: Likewise.
---
 gcc/config/rs6000/rs6000.cc | 109 +++-
 gcc/testsuite/gcc.target/powerpc/divic3-1.c |  18 
 gcc/testsuite/gcc.target/powerpc/divic3-2.c |  17 +++
 gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  18 
 gcc/testsuite/gcc.target/powerpc/mulic3-2.c |  17 +++
 5 files changed, 132 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b3a609f3aa3..6d08f6ed1fb 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11101,26 +11101,6 @@ init_float128_ibm (machine_mode mode)
 }
 }
 
-/* Create a decl for either complex long double multiply or complex long double
-   divide when long double is IEEE 128-bit floating point.  We can't use
-   __multc3 and __divtc3 because the original long double using IBM extended
-   double used those names.  The complex multiply/divide functions are encoded
-   as builtin functions with a complex result and 4 scalar inputs.  */
-
-static void
-create_complex_muldiv (const char *name, built_in_function fncode, tree fntype)
-{
-  tree fndecl = add_builtin_function (name, fntype, fncode, BUILT_IN_NORMAL,
- name, NULL_TREE);
-
-  set_builtin_decl (fncode, fndecl, true);
-
-  if (TARGET_DEBUG_BUILTIN)
-fprintf (stderr, "create complex %s, fncode: %d\n", name, (int) fncode);
-
-  return;
-}
-
 /* Set up IEEE 128-bit floating point routines.  Use different names if the
arguments can 

[PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit

2022-12-14 Thread Michael Meissner via Gcc-patches
This set of patches was first submitted on November 1st.  Kewen.Lin asked for
some changes to the first set of patches.  I also tried to clean up the
comments in the second patch about types that Segher Boessenkool mentioned.

I had just re-submitted the first patch yesterday, but Segher asked that I
repost all three patches.  Here is the original commentary for all three
patches, tweaked a little bit:

These 3 patches fix the problems with building GCC on PowerPC systems when long
double is configured to use the IEEE 128-bit format.

There are 3 patches in this patch set.  The first two patches are required to
fix the basic problem.  The third patch fixes some issues that were noticed
along the way.

The basic issue is internally within GCC there are several types for 128-bit
floating point.  The types are:

1) The long double type (which uses either TFmode for 128-bit long doubles
or possibly DFmode for 64-bit long doubles).  In the normal case, long
double is 128 bits (TFmode) and depending on the configuration options
and the switches passed by the user at compilation time, long double is
either the 128-bit IBM double-double type or IEEE 128-bit.

2)  The type for __ibm128.  If long double is IBM 128-bit double-double,
internally within the compiler, this type is the same as the long
double type.  If long double is either IEEE 128-bit or is 64-bit, then
this type is a separate type.  If long double is not double-double,
this type will use IFmode during RTL.

3)  The type for _Float128.  This type is always IEEE 128-bit if it exists.
While it is a separate internal type, currently if long double is IEEE
128-bit, this type uses TFmode once it gets to RTL, but within Gimple
it is a separate type.  If long double is not IEEE 128-bit, then this
type uses KFmode.  All of the f128 math functions defined by the
compiler use this type.  In the past, _Float128 was a C extended type
and not available in C++.  Now it is a part of the C/C++ 2x standards.

4)  The type for __float128.  The history is I implemented __float128
first, and several releases later, we added _Float128 as a standard C
type.  Unfortunately, I didn't think things through enough when
_Float128 came out.  Like __ibm128, it uses the long double type if
long double is IEEE 128-bit, and now it uses the _Float128 type if long
double is not IEEE 128-bit.  IMHO, this is the major problem.  The two
IEEE 128-bit types should use the same type internally (or at least one
should be a qualified type of the other).  Before we started adding
more support for _Float128, it mostly worked, but now, with more
optimizations being done, it doesn't.

5)  The error occurs in building __mulkc3 in libgcc, when the TFmode type in
the code is defined to use __attribute__((mode(TF))), but the functions
that are called all have _Float128 arguments.  These are separate
types, and ultimately one of the consistency checks fails because they
are different types.
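
As a compact restatement of the list above, the following sketch maps each
type to the RTL mode it ends up using.  It is only a summary of the
description in this message, not compiler code: the enum, the function names
and the with_patch2 flag (modelling the behavior once the second patch of
this series is applied) are hypothetical.

```c
/* Hypothetical description of how long double is configured.  */
enum ld_format { LD_IBM128, LD_IEEE128, LD_64BIT };

/* long double: TFmode when it is 128 bits, DFmode when it is 64 bits.  */
const char *
mode_long_double (enum ld_format fmt)
{
  return fmt == LD_64BIT ? "DFmode" : "TFmode";
}

/* __ibm128: shares the long double type (TFmode) when long double is IBM
   double-double, otherwise it is a separate type using IFmode.  */
const char *
mode_ibm128 (enum ld_format fmt)
{
  return fmt == LD_IBM128 ? "TFmode" : "IFmode";
}

/* _Float128 and __float128: before the second patch they end up in TFmode
   when long double is IEEE 128-bit and in KFmode otherwise; with the
   second patch applied they always use KFmode.  */
const char *
mode_float128 (enum ld_format fmt, int with_patch2)
{
  if (with_patch2)
    return "KFmode";
  return fmt == LD_IEEE128 ? "TFmode" : "KFmode";
}
```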

There are 3 patches in this set:

1)  The first patch rewrites how the complex 128-bit multiply and divide
functions are done in the compiler.  In the old scheme, essentially
there were only two types ever being used, the long double type, and
the not long double type.  The original code would name the called
functions either __multc3/__divtc3 or __mulkc3/__divkc3.  This worked
because there were only two types.
With straightening out the types, so __float128/_Float128 is never the
long double type, there are potentially 3-4 types.  However, the C
front end and the middle end code will not let us create two built-in
functions that have the same name.

Patch #1 rips out this code, and rewrites it to be cleaner.

In the original version of the patches, I disabled doing the mapping
when building libgcc because it caused problems when building __mulkc3
and __divkc3.  I have removed this check, since the second patch will
allow these functions to be built without disabling the mapping.

2)  The second patch fixes the problem of __float128 and _Float128 not
being the same if long double is IEEE 128-bit.  After this patch, both
_Float128 and __float128 types will always use the same type.  When we
get to RTL, it will always use KFmode type (and not use TFmode).  The
libstdc++ library will not build if we use TFmode for these types due to
the other changes.

There is a minor codegen issue that if you explicitly use long double
and call the F128 FMA (fused multiply-add) round to odd functions that
are defined to use __float128/_Float128 arguments.  While we might be
able to optimize these 

Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-12 Thread Michael Meissner via Gcc-patches
I have submitted a new replacement patch for this patch:

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608368.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH V2] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-12 Thread Michael Meissner via Gcc-patches
In the patch I previously submitted:

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: 

Kewen.Lin questioned whether we needed to disable the special handling of the
IEEE 128-bit complex multiply/divide if we are building libgcc.  I looked at
it, and we don't need to disable doing the special handling when building
libgcc.  But in order for it to work, patches #2 and #3 need to be applied.

This patch is a replacement patch for that previous patch.

This patch reworks how the complex multiply and divide built-in functions are
done.  Previously we created built-in declarations for doing long double complex
multiply and divide when long double is IEEE 128-bit.  The old code also did not
support __ibm128 complex multiply and divide if long double is IEEE 128-bit.

In terms of history, I wrote the original code just as I was starting to test
GCC on systems where IEEE 128-bit long double was the default.  At the time, we
had not yet started mangling the built-in function names as a way to bridge
going from a system with 128-bit IBM long double to 128-bit IEEE long double.

The original code depends on there only being two 128-bit types involved.  With
the next patch in this series, this assumption will no longer be true.  When
long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
explicit __float128/_Float128 type and one for long double).

The problem is we cannot create two separate built-in functions that resolve to
the same name.  This is a requirement of add_builtin_function and the C front
end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
only use 2 of them.

This code does not create the built-in declaration with the changed name.
Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
before it is written out to the assembler file like it now does for all of the
other long double built-in functions.

When I wrote these patches, I discovered that __ibm128 complex multiply and
divide had originally not been supported if long double is IEEE 128-bit as it
would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
case.

I tested all 3 patches for PR target/107299 on:

1)  LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
2)  LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
3)  LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
4)  BE Power8  using --with-cpu=power8  --with-long-double-format=ibm

Once all 3 patches have been applied, we can once again build GCC when long
double is IEEE 128-bit.  There were no other regressions with these patches.
Can I check these patches into the trunk?

2022-12-13   Michael Meissner  

gcc/

PR target/107299
* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
(init_float128_ieee): Delete code to switch complex multiply and divide
for long double.
(complex_multiply_builtin_code): New helper function.
(complex_divide_builtin_code): Likewise.
(rs6000_mangle_decl_assembler_name): Add support for mangling the name
of complex 128-bit multiply and divide built-in functions.

gcc/testsuite/

PR target/107299
* gcc.target/powerpc/divic3-1.c: New test.
* gcc.target/powerpc/divic3-2.c: Likewise.
* gcc.target/powerpc/mulic3-1.c: Likewise.
* gcc.target/powerpc/mulic3-2.c: Likewise.
---
 gcc/config/rs6000/rs6000.cc | 109 +++-
 gcc/testsuite/gcc.target/powerpc/divic3-1.c |  18 
 gcc/testsuite/gcc.target/powerpc/divic3-2.c |  17 +++
 gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  18 
 gcc/testsuite/gcc.target/powerpc/mulic3-2.c |  17 +++
 5 files changed, 132 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/divic3-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/mulic3-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 70a3ca801fe..b5a5ecbf51a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11120,26 +11120,6 @@ init_float128_ibm (machine_mode mode)
 }
 }
 
-/* Create a decl for either complex long double multiply or complex long double
-   divide when long double is IEEE 128-bit floating point.  We can't use
-   __multc3 and __divtc3 because the original long double using IBM extended
-   double used those names.  The complex multiply/divide functions are encoded
-   as builtin functions with a complex result and 4 scalar inputs.  */
-
-static void
-create_complex_muldiv (const char *name, built_in_function fncode, tree fntype)
-{
-  tree fndecl = add_builtin_function (name, fntype, 

Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-12 Thread Michael Meissner via Gcc-patches
On Mon, Dec 12, 2022 at 06:20:14PM +0800, Kewen.Lin wrote:
> Without or with patch #1, the below ICE in libgcc exists, the ICE should have
> nothing to do with the special handling for building_libgcc in patch #1.  I
> think patch #2 which makes _Float128 and __float128 use the same internal
> type fixes that ICE.
> 
> I still don't get the point why we need the special handling for
> building_libgcc.  I also tested on top of patch #1 and #2 w/ and w/o the
> special handling for building_libgcc, both bootstrapped and regress-tested.
> 
> Could you have a double check?

As long as patch #2 and #3 are installed, we don't need the special handling
for building_libgcc.  Good catch.

I will send out a replacement patch for it.

> Since your patch #2 (and #3) fixes ICE and some exposed problems, and
> _Float128 is to use the same internal type as __float128, types with
> __attribute__((mode(TF))) and __attribute__((mode(TC))) should be correct.
> I assume that this patch is just to make the types explicitly use _Float128
> (for better readability and maintenance), but not for any correctness issues.

Yes, the patch is mainly for clarity.  The history is the libgcc support went
in before _Float128 went in, and I never went back to use those types when we
could use them.

With _Float128, we can just use _Complex _Float128 and not
bother with trying to get the right KC/TC for the attribute mode stuff.

However, if patches 1-3 aren't put in, just applying the patch to use _Float128
and _Complex _Float128 would fix the immediate problem (of not building GCC on
systems with IEEE 128-bit long double).  However, it is a band-aid that just
works around the problem of building __mulkc3 and __divkc3.  It doesn't fix the
other problems between __float128 and _Float128 that show up in some places
that I would like to get fixed.

So I haven't submitted the patch, because I think it is more important to get
the other issues fixed.

> > Now, this patch fixes the specific problem of not being able to build libgcc
> > (along with patch #1 of the series).  But other things show the differences
> > from time to time because we are using different internal types and the
> > middle end doesn't know that these types are really the same bits.
> > 
> > It is better long term (IMHO) if we have the two types (__float128 and
> > _Float128) use the same internal type (which is what is done in patches #2
> > and #3).  This fixes the other issues that show up, such as creating a
> > signaling NaN for one internal type and converting it to the other internal
> > type, which loses the fact that the NaN is signaling.
> > 
> 
> I see, nice!
> 
> BR,
> Kewen

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-08 Thread Michael Meissner via Gcc-patches
On Wed, Dec 07, 2022 at 03:55:41PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> on 2022/12/7 14:44, Michael Meissner wrote:
> > On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
> >> Hi Mike,
> >>
> >> Thanks for fixing this!
> >>
> >> Could you help to elaborate why we need to disable it during libgcc
> >> building?
> > 
> > When you are building libgcc, you are building the __mulkc3, __divkc3
> > functions.  The mapping in the compiler interferes with those functions,
> > because at the moment, libgcc uses an alternate IEEE 128-bit type.
> > 
> 
> But I'm still confused.  For __mulkc3 (__divkc3 is similar),
> 
> 1) with -mabi=ieeelongdouble (TARGET_IEEEQUAD true, __LONG_DOUBLE_IEEE128__
>    defined), the used types are:
> 
>    typedef float TFtype __attribute__ ((mode (TF)));
>    typedef __complex float TCtype __attribute__ ((mode (TC)));
> 
> 2) with -mabi=ibmlongdouble (TARGET_IEEEQUAD false, __LONG_DOUBLE_IEEE128__
>    not defined), the used types are:
> 
>    typedef float TFtype __attribute__ ((mode (KF)));
>    typedef __complex float TCtype __attribute__ ((mode (KC)));
> 
> The proposed mapping in the current patch is:
> 
> +
> +  if (id == complex_multiply_builtin_code (KCmode))
> + newname = "__mulkc3";
> +
> +  else if (id == complex_multiply_builtin_code (ICmode))
> + newname = "__multc3";
> +
> +  else if (id == complex_multiply_builtin_code (TCmode))
> + newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";
> 
> for 1), TCmode && TARGET_IEEEQUAD => "__mulkc3"
> for 2), KCmode => "__mulkc3"
> 
> Both should be still with name "__mulkc3", do I miss anything?
> 
> BR,
> Kewen

The reason is that, because of the different internal types, the value range
propagation pass throws an internal compiler error when we are trying to build
libgcc.  This is due to the underlying problem of different IEEE 128-bit types
within the compiler.
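
For reference, the name selection in the hunk Kewen quotes can be modelled in
plain C as below.  This is only a sketch: the enum and function names are
hypothetical stand-ins, and the ieeequad argument plays the role of
TARGET_IEEEQUAD.  It shows that both configurations do end up with __mulkc3,
so the names agree; the failure comes from the internal types, not from the
mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the complex modes named in the patch.  */
enum complex_mode { MODE_KC, MODE_IC, MODE_TC };

/* Model of the quoted selection logic for complex multiply.  IEEEQUAD
   stands in for TARGET_IEEEQUAD.  */
const char *
complex_multiply_name (enum complex_mode mode, int ieeequad)
{
  switch (mode)
    {
    case MODE_KC:
      return "__mulkc3";
    case MODE_IC:
      return "__multc3";
    case MODE_TC:
      return ieeequad ? "__mulkc3" : "__multc3";
    }

  /* Not reached for the three modes above.  */
  assert (0 && "unknown complex mode");
  return NULL;
}
```

With -mabi=ieeelongdouble the libgcc type is TCmode and ieeequad is true; with
-mabi=ibmlongdouble it is KCmode.  Both calls return "__mulkc3".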

The 128-bit IEEE support in libgcc was written before _Float128 was added to
GCC.  One consequence is that you can't get to the complex variant of
__float128.  So libgcc needs to use the attribute mode to get to that type.

But with the support for IEEE 128-bit long double changing things, it makes the
libgcc code use the wrong code.

/home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c: In function ‘__mulkc3_sw’:
/home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c:97:1: internal compiler error: in fold_stmt, at gimple-range-fold.cc:522
   97 | }
  | ^
0x122784f3 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
/home/meissner/fsf-src/work102/gcc/gimple-range-fold.cc:522
0x1226477f gimple_ranger::fold_range_internal(vrange&, gimple*, tree_node*)
/home/meissner/fsf-src/work102/gcc/gimple-range.cc:257
0x12264b1f gimple_ranger::range_of_stmt(vrange&, gimple*, tree_node*)
/home/meissner/fsf-src/work102/gcc/gimple-range.cc:318
0x113bdd8b range_query::value_of_stmt(gimple*, tree_node*)
/home/meissner/fsf-src/work102/gcc/value-query.cc:134
0x1134838f rvrp_folder::value_of_stmt(gimple*, tree_node*)
/home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1023
0x111344cf substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
/home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:819
0x121ecbd3 dom_walker::walk(basic_block_def*)
/home/meissner/fsf-src/work102/gcc/domwalk.cc:311
0x11134ee7 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
/home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:998
0x11346bb7 execute_ranger_vrp(function*, bool, bool)
/home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1084
0x11347063 execute
/home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1165
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
make[1]: *** [/home/meissner/fsf-src/work102/libgcc/shared-object.mk:14: _mulkc3.o] Error 1
make[1]: Leaving directory '/home/meissner/fsf-build-ppc64le/work102/powerpc64le-unknown-linux-gnu/libgcc'
make: *** [Makefile:20623: all-target-libgcc] Error 2

> > I have a patch for making libgcc use the 'right' type that I haven't
> > submitted yet.  This is because the more general fix that these 3 patches do
> > impacts other functions (due to __float128 and _Float128 being different in
> > the current compiler when -mabi=ieeelongdouble).
> > 

The patch is to use _Float128 and _Complex _Float128 in libgcc.h instead of
trying to use attribute((mode(TF))) and attribute((mode(TC))) in libgcc.

Now, this patch fixes the specific problem of not being able to build libgcc
(along with patch #1 of the series).  But other things show the differences
from time to time because we are using different internal types and the middle
end doesn't know that these types are really the same bits.

It is better long term (IMHO) if we have the two types (__float128 and
_Float128) 

Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-06 Thread Michael Meissner via Gcc-patches
On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> Thanks for fixing this!
> 
> Could you help to elaborate why we need to disable it during libgcc building?

When you are building libgcc, you are building the __mulkc3, __divkc3
functions.  The mapping in the compiler interferes with those functions,
because at the moment, libgcc uses an alternate IEEE 128-bit type.

I have a patch for making libgcc use the 'right' type that I haven't submitted
yet.  This is because the more general fix that these 3 patches do impacts other
functions (due to __float128 and _Float128 being different in the current
compiler when -mabi=ieeelongdouble).

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #3: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.

2022-12-02 Thread Michael Meissner via Gcc-patches
Ping for patches submitted on November 1st.  These 3 patches are needed to be
able to build GCC for PowerPC target systems where long double is configured to
be IEEE 128-bit, such as Fedora 36.

This is the 3rd of 3 patches.

| Date: Tue, 1 Nov 2022 22:44:01 -0400
| Subject: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
| Message-ID: 
| https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604837.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #3: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2022-12-02 Thread Michael Meissner via Gcc-patches
Ping for patches submitted on November 1st.  These 3 patches are needed to be
able to build GCC for PowerPC target systems where long double is configured to
be IEEE 128-bit, such as Fedora 36.

This is for the 2nd of 3 patches.

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: 
| https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604836.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #3: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-02 Thread Michael Meissner via Gcc-patches
Ping for patches submitted on November 1st.  These 3 patches are needed to be
able to build GCC for PowerPC target systems where long double is configured to
be IEEE 128-bit, such as Fedora 36.

This is the first patch of 3 patches.

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: 
| https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604835.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #3: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.

2022-11-29 Thread Michael Meissner via Gcc-patches
Can we get the three patches in this patch set reviewed?  Without them, GCC 13
can't be built on Fedora 37 which defaults to IEEE 128-bit long double.

| Date: Tue, 1 Nov 2022 22:44:01 -0400
| Subject: [PATCH 3/3] Update float 128-bit conversions, PR target/107299.
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2022-11-29 Thread Michael Meissner via Gcc-patches
Can we get the three patches in this patch set reviewed?  Without the patches
applied, GCC 13 will not build on Fedora 37, which defaults long double to
IEEE 128-bit.

| Date: Tue, 1 Nov 2022 22:42:30 -0400
| Subject: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #2: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-11-29 Thread Michael Meissner via Gcc-patches
Can we please get this patch reviewed?  GCC 13 won't build on Fedora 37 (which
defaults to long double being IEEE 128-bit) without the 3 patches in this set.

| Date: Tue, 1 Nov 2022 22:40:43 -0400
| Subject: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH 8] PowerPC: Support load/store vector with right length.

2022-11-11 Thread Michael Meissner via Gcc-patches
This patch adds support for new instructions that may be added to the PowerPC
architecture in the future to enhance the load and store vector with length
instructions.

The current instructions (lxvl, lxvll, stxvl, and stxvll) are inconvenient to
use since the count for the number of bytes must be in the top 8 bits of the
GPR register, instead of the bottom 8 bits.  This means that code generating
these instructions typically has to do a shift left by 56 bits to get the count
into the right position.  In a future version of the PowerPC architecture, new
variants of these instructions might be added that expect the count to be in
the bottom 8 bits of the GPR register.  These patches add this support to GCC
if the user uses the -mcpu=future option.
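
As a small illustration of the two layouts, the sketch below models the GPR
value in plain integer arithmetic.  It does not emit any instructions, and the
function names are hypothetical; only the shift by 56 and the top/bottom 8-bit
placement come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* GPR value expected by the current lxvl/stxvl forms: the byte count sits
   in the top 8 bits, so the caller shifts left by 56.  */
uint64_t
length_for_lxvl (uint64_t nbytes)
{
  assert (nbytes <= 0xff);
  return nbytes << 56;
}

/* GPR value the possible lxvrl/stxvrl forms would expect: the byte count
   stays in the bottom 8 bits, so no shift is needed.  */
uint64_t
length_for_lxvrl (uint64_t nbytes)
{
  assert (nbytes <= 0xff);
  return nbytes;
}

/* Recover the byte count from a GPR value in the lxvl layout.  */
unsigned int
count_from_lxvl_reg (uint64_t reg)
{
  return (unsigned int) (reg >> 56);
}
```

For a 16-byte load, length_for_lxvl (16) is 0x1000000000000000, while
length_for_lxvrl (16) is simply 16.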

I tested this patch on a little endian power10 system with long double using
the traditional IBM double double format.  Assuming the other 6 patches for
-mcpu=future are checked in (or at least the first two patches), can I check
this patch into the master branch for GCC 13?

2022-11-11   Michael Meissner  

gcc/

* config/rs6000/vsx.md (lxvl): If -mcpu=future, generate the lxvl with
the shift count automatically used in the insn.
(lxvrl): New insn for -mcpu=future.
(lxvrll): Likewise.
(stxvl): If -mcpu=future, generate the stxvl with the shift count
automatically used in the insn.
(stxvrl): New insn for -mcpu=future.
(stxvrll): Likewise.

gcc/testsuite/

* gcc.target/powerpc/lxvrl.c: New test.
---
 gcc/config/rs6000/vsx.md | 122 +++
 gcc/testsuite/gcc.target/powerpc/lxvrl.c |  31 ++
 2 files changed, 132 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/lxvrl.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index fb5cf04147e..e4e73db9bb8 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5582,20 +5582,32 @@ (define_expand "first_mismatch_or_eos_index_"
   DONE;
 })
 
-;; Load VSX Vector with Length
+;; Load VSX Vector with Length.  If we have lxvrl, we don't have to do an
+;; explicit shift left into a pseudo.
 (define_expand "lxvl"
-  [(set (match_dup 3)
-(ashift:DI (match_operand:DI 2 "register_operand")
-   (const_int 56)))
-   (set (match_operand:V16QI 0 "vsx_register_operand")
-   (unspec:V16QI
-[(match_operand:DI 1 "gpc_reg_operand")
-  (mem:V16QI (match_dup 1))
- (match_dup 3)]
-UNSPEC_LXVL))]
+  [(use (match_operand:V16QI 0 "vsx_register_operand"))
+   (use (match_operand:DI 1 "gpc_reg_operand"))
+   (use (match_operand:DI 2 "gpc_reg_operand"))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
 {
-  operands[3] = gen_reg_rtx (DImode);
+  rtx shift_len = gen_rtx_ASHIFT (DImode, operands[2], GEN_INT (56));
+  rtx len;
+
+  if (TARGET_FUTURE)
+len = shift_len;
+  else
+{
+  len = gen_reg_rtx (DImode);
+  emit_insn (gen_rtx_SET (len, shift_len));
+}
+
+  rtx dest = operands[0];
+  rtx addr = operands[1];
+  rtx mem = gen_rtx_MEM (V16QImode, addr);
+  rtvec rv = gen_rtvec (3, addr, mem, len);
+  rtx lxvl = gen_rtx_UNSPEC (V16QImode, rv, UNSPEC_LXVL);
+  emit_insn (gen_rtx_SET (dest, lxvl));
+  DONE;
 })
 
 (define_insn "*lxvl"
@@ -5619,6 +5631,34 @@ (define_insn "lxvll"
   "lxvll %x0,%1,%2"
   [(set_attr "type" "vecload")])
 
+;; For lxvrl and lxvrll, use the combiner to eliminate the shift.  The
+;; define_expand for lxvl will already incorporate the shift in generating the
+;; insn.  The lxvll built-in function required the user to have already done
+;; the shift.  Defining lxvrll this way, will optimize cases where the user has
+;; done the shift immediately before the built-in.
+(define_insn "*lxvrl"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+   (unspec:V16QI
+[(match_operand:DI 1 "gpc_reg_operand" "b")
+ (mem:V16QI (match_dup 1))
+ (ashift:DI (match_operand:DI 2 "register_operand" "r")
+(const_int 56))]
+UNSPEC_LXVL))]
+  "TARGET_FUTURE && TARGET_64BIT"
+  "lxvrl %x0,%1,%2"
+  [(set_attr "type" "vecload")])
+
+(define_insn "*lxvrll"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+   (unspec:V16QI [(match_operand:DI 1 "gpc_reg_operand" "b")
+   (mem:V16QI (match_dup 1))
+  (ashift:DI (match_operand:DI 2 "register_operand" "r")
+ (const_int 56))]
+ UNSPEC_LXVLL))]
+  "TARGET_FUTURE"
+  "lxvrll %x0,%1,%2"
+  [(set_attr "type" "vecload")])
+
 ;; Expand for builtin xl_len_r
 (define_expand "xl_len_r"
   [(match_operand:V16QI 0 "vsx_register_operand")
@@ -5650,18 +5690,29 @@ (define_insn "stxvll"
 
 ;; Store VSX Vector with Length
 (define_expand "stxvl"
-  [(set (match_dup 3)
-   (ashift:DI (match_operand:DI 2 "register_operand")
-  (const_int 56)))
-   (set (mem:V16QI (match_operand:DI 1 

[PATCH 7] PowerPC: Add -mcpu=future saturating subtract built-ins.

2022-11-11 Thread Michael Meissner via Gcc-patches
This patch adds support for a saturating subtract built-in function that may be
added to a future PowerPC processor.  Note, if it is added, the name of the
built-in function may change before GCC 13 is released.  If the name changes,
we will submit a patch changing the name.

I also added support for providing dense math built-in functions, even though
at present, we have not added any new built-in functions for dense math.  It is
likely we will want to add new dense math built-in functions as the dense math
support is fleshed out.

I tested this patch on a little endian power10 system with long double using
the traditional IBM double double format.  Assuming the other 6 patches for
-mcpu=future are checked in (or at least the first patch), can I check this
patch into the master branch for GCC 13?

2022-11-11   Michael Meissner  

gcc/

* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Add support
for flagging invalid use of future built-in functions.
(rs6000_builtin_is_supported): Add support for future built-in
functions.
* config/rs6000/rs6000-builtins.def (__builtin_saturate_subtract32): New
built-in function for -mcpu=future.
(__builtin_saturate_subtract64): Likewise.
* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add stanzas
for -mcpu=future built-ins.
(stanza_map): Likewise.
(enable_string): Likewise.
(struct attrinfo): Likewise.
(parse_bif_attrs): Likewise.
(write_decls): Likewise.
* config/rs6000/rs6000.md (sat_sub3): Add saturating subtract
built-in insn declarations.
(sat_sub3_dot): Likewise.
(sat_sub3_dot2): Likewise.
* doc/extend.texi (Future PowerPC built-ins): New section.

gcc/testsuite/

* gcc.target/powerpc/subfus-1.c: New test.
* gcc.target/powerpc/subfus-2.c: Likewise.
* lib/target-supports.exp (check_effective_target_powerpc_future_ok):
New effective target.
---
 gcc/config/rs6000/rs6000-builtin.cc | 17 ++
 gcc/config/rs6000/rs6000-builtins.def   | 11 
 gcc/config/rs6000/rs6000-gen-builtins.cc| 35 ++--
 gcc/config/rs6000/rs6000.md | 60 +
 gcc/doc/extend.texi | 24 +
 gcc/testsuite/gcc.target/powerpc/subfus-1.c | 32 +++
 gcc/testsuite/gcc.target/powerpc/subfus-2.c | 32 +++
 gcc/testsuite/lib/target-supports.exp   | 16 +-
 8 files changed, 220 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/subfus-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/subfus-2.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index f4eba184db8..1ac00e4b26c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -139,6 +139,17 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins fncode)
 case ENB_MMA:
   error ("%qs requires the %qs option", name, "-mmma");
   break;
+case ENB_FUTURE:
+  error ("%qs requires the %qs option", name, "-mcpu=future");
+  break;
+case ENB_FUTURE_64:
+  error ("%qs requires the %qs option and either the %qs or %qs option",
+name, "-mcpu=future", "-m64", "-mpowerpc64");
+  break;
+case ENB_DM:
+  error ("%qs requires the %qs or %qs options", name, "-mcpu=future",
+"-mdense-math");
+  break;
 default:
 case ENB_ALWAYS:
   gcc_unreachable ();
@@ -194,6 +205,12 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins fncode)
   return TARGET_HTM;
 case ENB_MMA:
   return TARGET_MMA;
+case ENB_FUTURE:
+  return TARGET_FUTURE;
+case ENB_FUTURE_64:
+  return TARGET_FUTURE && TARGET_POWERPC64;
+case ENB_DM:
+  return TARGET_DENSE_MATH;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f76f54793d7..ee141c1d99e 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -139,6 +139,8 @@
 ;   endian   Needs special handling for endianness
 ;   ibmldRestrict usage to the case when TFmode is IBM-128
 ;   ibm128   Restrict usage to the case where __ibm128 is supported or if ibmld
+;   future   Restrict usage to future instructions
+;   dm   Restrict usage to dense math
 ;
 ; Each attribute corresponds to extra processing required when
 ; the built-in is expanded.  All such special processing should
@@ -4108,3 +4110,12 @@
 
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
 STXVP nothing {mma,pair}
+
+[future]
+  const signed int __builtin_saturate_subtract32 (signed int, signed int);
+  SAT_SUBSI sat_subsi3 {}
+
+[future-64]
+  const signed long __builtin_saturate_subtract64 (signed long, signed long);
+  SAT_SUBDI sat_subdi3 {}
+
diff --git 

[PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2022-11-09 Thread Michael Meissner via Gcc-patches
This patch is a preliminary patch to add the full 1,024 bit dense math
registers (DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the
top of the DMR register.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
registers.  The 'wD' constraint added in previous patches is used for these
registers.  I added support to do load and store of DMRs via the VSX registers,
since there are no load/store dense math instructions.  I added the new keyword
'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
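
The size arithmetic behind going through the VSX registers can be sketched as
follows.  This is a plain C model only: the struct and function names are
hypothetical, and it assumes 128-bit VSX registers, so one 1,024 bit DMR
corresponds to eight VSX-sized pieces.  It does not reflect the actual move
sequence the patch generates.

```c
#include <assert.h>
#include <string.h>

enum
{
  DMR_BYTES = 1024 / 8,			/* one 1,024 bit DMR */
  VSX_BYTES = 128 / 8,			/* assumed VSX register size */
  VSX_PER_DMR = DMR_BYTES / VSX_BYTES	/* pieces needed per DMR */
};

/* Memory image of one DMR (hypothetical type for this sketch).  */
struct dmr_image
{
  unsigned char bytes[DMR_BYTES];
};

/* Copy a DMR image one VSX-sized chunk at a time, the way a move through
   VSX registers has to proceed.  Returns the number of chunks copied.  */
int
dmr_copy_via_vsx_chunks (struct dmr_image *dst, const struct dmr_image *src)
{
  int chunks = 0;

  for (int off = 0; off < DMR_BYTES; off += VSX_BYTES)
    {
      memcpy (dst->bytes + off, src->bytes + off, VSX_BYTES);
      chunks++;
    }

  assert (chunks == VSX_PER_DMR);
  return chunks;
}
```

Under these assumptions a full DMR move takes eight chunks, and a 512-bit
accumulator would take four.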

At present, the tree constant propagation pass does not work with 1,024 bit
DMRs.  I believe this is due to the CCP pass not skipping opaque modes.  I hope
once this patch is committed, we can work on the machine independent changes to
allow the CCP pass not to issue an internal error when a DMR is used.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

2022-11-09   Michael Meissner  

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.
---
 gcc/config/rs6000/mma.md  | 152 ++
 gcc/config/rs6000/rs6000-builtin.cc   |  13 ++
 gcc/config/rs6000/rs6000-call.cc  |  13 +-
 gcc/config/rs6000/rs6000-modes.def|   4 +
 gcc/config/rs6000/rs6000.cc   | 125 ++
 gcc/config/rs6000/rs6000.h|   7 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  68 
 7 files changed, 350 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index cca1fa71f75..2c08ad7619a 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -92,6 +92,11 @@ (define_c_enum "unspec"
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC
UNSPEC_DM_ASSEMBLE_ACC
+   UNSPEC_DM_INSERT512_UPPER
+   UNSPEC_DM_INSERT512_LOWER
+   UNSPEC_DM_EXTRACT512
+   UNSPEC_DMR_RELOAD_FROM_MEMORY
+   UNSPEC_DMR_RELOAD_TO_MEMORY
   ])
 
 (define_c_enum 

[PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2022-11-09 Thread Michael Meissner via Gcc-patches
This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.
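
The renaming rule can be sketched in C as below.  The rule is inferred from
the names in this patch (xvf64gerpp becomes dmxvf64gerpp, and pmxvi4ger8
becomes pmdmxvi4ger8): "dm" goes after a leading "pm" prefix, otherwise at the
front.  The function name is hypothetical, and the sketch only covers the
ger-style names shown here, not every MMA instruction.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the dense math spelling of the MMA mnemonic MMA_NAME into BUF,
   which has room for SIZE bytes.  Returns BUF.  */
char *
dense_math_name (const char *mma_name, char *buf, size_t size)
{
  int n;

  if (strncmp (mma_name, "pm", 2) == 0)
    /* Prefixed (masked) forms keep "pm" first: pmxv... -> pmdmxv...  */
    n = snprintf (buf, size, "pmdm%s", mma_name + 2);
  else
    /* Plain forms just gain a "dm" prefix: xv... -> dmxv...  */
    n = snprintf (buf, size, "dm%s", mma_name);

  /* The caller must supply a buffer large enough for the new name.  */
  assert (n >= 0 && (size_t) n < size);
  return buf;
}
```

Only the spelling changes; as noted above, the assembler emits the same bits
for either name.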

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

2022-11-09   Michael Meissner  

gcc/

* config/rs6000/mma.md (vvi4i4i8_dm): New int attribute.
(avvi4i4i8_dm): Likewise.
(vvi4i4i2_dm): Likewise.
(avvi4i4i2_dm): Likewise.
(vvi4i4_dm): Likewise.
(avvi4i4_dm): Likewise.
(pvi4i2_dm): Likewise.
(apvi4i2_dm): Likewise.
(vvi4i4i4_dm): Likewise.
(avvi4i4i4_dm): Likewise.
(mma_): Add support for running on DMF systems, generating the dense
math instruction and using the dense math accumulators.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.
---
 gcc/config/rs6000/mma.md  |  98 +++--
 .../gcc.target/powerpc/dm-double-test.c   | 194 ++
 gcc/testsuite/lib/target-supports.exp |  19 ++
 3 files changed, 299 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dm-double-test.c

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 835f34e8e00..cca1fa71f75 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -227,13 +227,22 @@ (define_int_attr apv  [(UNSPEC_MMA_XVF64GERPP "xvf64gerpp")
 
 (define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
 
+(define_int_attr vvi4i4i8_dm   [(UNSPEC_MMA_PMXVI4GER8 "pmdmxvi4ger8")])
+
 (define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "pmxvi4ger8pp")])
 
+(define_int_attr avvi4i4i8_dm  [(UNSPEC_MMA_PMXVI4GER8PP   "pmdmxvi4ger8pp")])
+
 (define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2    "pmxvi16ger2")
                             (UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
                             (UNSPEC_MMA_PMXVF16GER2    "pmxvf16ger2")
                             (UNSPEC_MMA_PMXVBF16GER2   "pmxvbf16ger2")])
 
+(define_int_attr vvi4i4i2_dm   [(UNSPEC_MMA_PMXVI16GER2    "pmdmxvi16ger2")
+                                (UNSPEC_MMA_PMXVI16GER2S   "pmdmxvi16ger2s")
+                                (UNSPEC_MMA_PMXVF16GER2    "pmdmxvf16ger2")
+                                (UNSPEC_MMA_PMXVBF16GER2   "pmdmxvbf16ger2")])
+
 (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
                             (UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp")
                             (UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
@@ -245,25 +254,54 @@ (define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
                             (UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np")
                             (UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")])
 
+(define_int_attr avvi4i4i2_dm  [(UNSPEC_MMA_PMXVI16GER2PP  "pmdmxvi16ger2pp")
+                                (UNSPEC_MMA_PMXVI16GER2SPP "pmdmxvi16ger2spp")
+                                (UNSPEC_MMA_PMXVF16GER2PP  "pmdmxvf16ger2pp")
+                                (UNSPEC_MMA_PMXVF16GER2PN  "pmdmxvf16ger2pn")
+                                (UNSPEC_MMA_PMXVF16GER2NP  "pmdmxvf16ger2np")
+                                (UNSPEC_MMA_PMXVF16GER2NN  "pmdmxvf16ger2nn")
+                                (UNSPEC_MMA_PMXVBF16GER2PP "pmdmxvbf16ger2pp")
+                                (UNSPEC_MMA_PMXVBF16GER2PN "pmdmxvbf16ger2pn")
+                                (UNSPEC_MMA_PMXVBF16GER2NP "pmdmxvbf16ger2np")
+                                (UNSPEC_MMA_PMXVBF16GER2NN "pmdmxvbf16ger2nn")])
+
 (define_int_attr vvi4i4

[PATCH 4/6] PowerPC: Make MMA insns support DMR registers

2022-11-09 Thread Michael Meissner via Gcc-patches
This patch changes the MMA instructions to use either FPR registers
(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
instruction names are used.

A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
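
User code could test the macro along these lines.  This is a minimal sketch;
the function name is made up for illustration, and only the __PPC_DMR__ macro
itself comes from this patch.

```c
/* Return 1 if the compiler reports that the MMA instructions use the
   dense math registers (DMRs), and 0 if they use the FPR-based
   accumulators or MMA is not available at all.  */
int
mma_uses_dmr (void)
{
#ifdef __PPC_DMR__
  return 1;
#else
  return 0;
#endif
}
```

Code that still needs explicit accumulator prime/de-prime handling on power10
could use the same #ifdef to skip it when DMRs are in use.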

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

2022-11-09   Michael Meissner  

gcc/

* config/rs6000/mma.md (mma_): New define_expand to handle
mma_ for dense math and non dense math.
(mma_ insn): Restrict to non dense math.
(mma_xxsetaccz): Convert to define_expand to handle non dense math and
dense math.
(mma_xxsetaccz_p10): Rename from mma_xxsetaccz and restrict usage to non
dense math.
(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
(mma_): Add support for dense math.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__PPC_DMR__ if we have dense math instructions.
* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
dense math and only FPRs if not dense math.
(rs6000_split_multireg_move): Do not generate accumulator prime or
de-prime instructions if dense math.
---
 gcc/config/rs6000/mma.md  | 247 +-
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 ++---
 3 files changed, 176 insertions(+), 109 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index fe2e9c9e63e..835f34e8e00 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -545,190 +545,249 @@ (define_insn "*mma_disassemble_acc_dm"
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_"
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; these instructions be NOPs.
+
+(define_expand "mma_"
+  [(set (match_operand:XO 0 "register_operand")
+   (unspec:XO [(match_operand:XO 1 "register_operand")]
+  MMA_ACC))]
+  "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
MMA_ACC))]
-  "TARGET_MMA"
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   " %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
(unspec_volatile:XO [(const_int 0)]
UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+{
+  emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+  DONE;
+}
+})
+
+(define_insn "*mma_xxsetaccz_p10"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+   (unspec_volatile:XO [(const_int 0)]
+   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+   (unspec:XO [(const_int 0)]
+  UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_DENSE_MATH"
+  "dmsetaccz %0"
+  [(set_attr "type" "mma")])
+
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=,")
-   (unspec:XO [(match_operand:V16QI 1 

[PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2022-11-09 Thread Michael Meissner via Gcc-patches
The MMA system added the notion of accumulator registers.  In power10, these
accumulators overlapped with the FPR registers, but logically the accumulators
were separate from the FPR registers.  It is anticipated that in future
systems, we may have a separate dense math unit and the accumulators will be
mapped onto the new dense math registers (DMRs).  This patch adds the support
for dense math registers.

These changes are preliminary.  They are expected to change over time.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will match accumulators
that overlap with the FPRs.  If both MMA and dense math are selected
(i.e. -mcpu=future), the wD constraint will only match DMRs.

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then %A converts the FPR register number to the
accumulator number.  If both MMA and dense math are selected, then %A will only
work if the register is an accumulator mapped onto a DMR.
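
As a rough illustration of the wD and %A behaviour described above, here is a
small C model.  It is not the code from rs6000.cc.  The enum, struct and
function names are hypothetical, and the register counts (32 FPRs, 8
accumulators) and the divide-by-4 mapping are assumptions based on each power10
accumulator overlapping 4 adjacent FPRs.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed sizes: 32 FPRs, and 8 accumulators in either register file.  */
#define NUM_FPRS 32
#define NUM_ACCUMULATORS 8

/* Hypothetical model of a register operand.  */
enum reg_file { REG_FPR, REG_DMR };

struct reg
{
  enum reg_file file;
  int number;
};

/* Model of the wD constraint.  Without dense math it accepts an FPR that
   starts a group of 4 (an accumulator overlapping the FPRs).  With dense
   math it accepts only DMRs.  */
static bool
wD_matches (struct reg r, bool dense_math)
{
  if (dense_math)
    return (r.file == REG_DMR
            && r.number >= 0 && r.number < NUM_ACCUMULATORS);

  return (r.file == REG_FPR
          && r.number >= 0 && r.number < NUM_FPRS
          && r.number % 4 == 0);
}

/* Model of the %A output modifier.  Returns the accumulator number that
   would be printed, or -1 if the operand is not a valid accumulator for
   the selected target.  */
static int
percent_A_value (struct reg r, bool dense_math)
{
  if (!wD_matches (r, dense_math))
    return -1;

  int acc = dense_math ? r.number : r.number / 4;
  assert (acc >= 0 && acc < NUM_ACCUMULATORS);
  return acc;
}
```

In this model, FPR 8 without dense math maps to accumulator 2, while the same
FPR is rejected once dense math is enabled.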

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1)  If possible, don't use extended asm, but instead use the MMA built-in
functions;

2)  If you do need to write extended asm, change the d constraints
targeting accumulators to wD;

3)  Only use the built-in zero, assemble and disassemble functions to create
and move data between vector quad types and dense math accumulators.
I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in
the extended asm code.  The reason is these instructions assume there is a
1-to-1 correspondence between 4 adjacent FPR registers and an
accumulator that overlaps with those registers.  With accumulators
now being separate registers, there no longer is a 1-to-1
correspondence.
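
The first and third points can be sketched with the existing GCC MMA built-ins.
This is only an illustration, not code from the patch.  The function name
outer_product_sum is hypothetical, and the scalar fallback exists only so the
example also compiles on targets without MMA.  That the same source then works
on a dense math target is the stated intention of this series, not something
this sketch verifies.

```c
/* Sum of all 16 elements of the 4x4 outer product of two vectors whose
   elements are all X and all Y respectively.  The accumulator is zeroed,
   updated and read back only through built-ins, never through xxsetaccz,
   xxmtacc or xxmfacc in extended asm.  */
float
outer_product_sum (float x, float y)
{
  float sum = 0.0f;

#if defined (__MMA__)
  typedef __vector unsigned char vec_t;

  __vector_quad acc;
  __vector float a = { x, x, x, x };
  __vector float b = { y, y, y, y };
  __vector float rows[4];

  /* Zero the accumulator, then accumulate one outer product into it.  */
  __builtin_mma_xxsetaccz (&acc);
  __builtin_mma_xvf32gerpp (&acc, (vec_t) a, (vec_t) b);

  /* Move the accumulator contents back into four vectors.  */
  __builtin_mma_disassemble_acc (rows, &acc);

  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      sum += rows[i][j];
#else
  /* Scalar fallback for targets without MMA support.  */
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      sum += x * y;
#endif

  return sum;
}
```

Because every element of both inputs is the same, the result does not depend on
the order in which the disassembled rows are returned.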

It is possible that the mangling for DMRs and the GDB register numbers may
change in the future.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

2022-11-09   Michael Meissner  

gcc/

* config/rs6000/constraints.md (wD constraint): New constraint.
* config/rs6000/mma.md (UNSPEC_DM_ASSEMBLE_ACC): New unspec.
(movxo): Convert into define_expand.
(movxo_fpr): Version of movxo where accumulators overlap with FPRs.
(movxo_dm): Dense math version of movxo.
(mma_assemble_acc): Add dense math support to define_expand.
(mma_assemble_acc_fpr): Rename from mma_assemble_acc, and restrict it to
non dense math.
(mma_assemble_acc_dm): Dense math version of mma_assemble_acc.
(mma_disassemble_acc): Add dense math support to define_expand.
(mma_disassemble_acc_fpr): Rename from mma_disassemble_acc, and restrict
it to non dense math.
(mma_disassemble_acc_dm): Dense math version of mma_disassemble_acc.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Likewise.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add -mdense-math.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_option_override_internal): Add checking for -mdense-math.
(rs6000_secondary_reload_memory): Add support for DMR registers.

[PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2022-11-09 Thread Michael Meissner via Gcc-patches
This patch enables generating load and store vector pair instructions when
doing certain memory copy operations when -mcpu=future is used.  In tests on
power10, using these instructions was found to be problematic in a few cases,
so we disabled generating them by default.  This patch re-enables generating
these instructions if -mcpu=future is used.
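
The effect of adding the option to ISA_FUTURE_MASKS can be sketched as below.
This is a simplified model, not the real definitions.  The bit positions are
hypothetical (the real OPTION_MASK_* values are generated from rs6000.opt),
ISA_3_1_MASKS_SERVER is reduced to a single stand-in bit, and the enum and
helper functions are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define OPTION_MASK_POWER10               (UINT64_C (1) << 0)
#define OPTION_MASK_BLOCK_OPS_VECTOR_PAIR (UINT64_C (1) << 1)
#define OPTION_MASK_FUTURE                (UINT64_C (1) << 2)

/* Stand-in for the much larger power10 server mask.  */
#define ISA_3_1_MASKS_SERVER OPTION_MASK_POWER10

/* Shape of the macro after this patch: power10 plus the two new bits.  */
#define ISA_FUTURE_MASKS (ISA_3_1_MASKS_SERVER               \
                          | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR \
                          | OPTION_MASK_FUTURE)

/* Hypothetical cpu selector.  */
enum cpu { CPU_POWER10, CPU_FUTURE };

/* Default ISA flags implied by -mcpu= in this model.  */
static uint64_t
default_isa_flags (enum cpu cpu)
{
  return cpu == CPU_FUTURE ? ISA_FUTURE_MASKS : ISA_3_1_MASKS_SERVER;
}

/* True if block memory operations may use vector pair instructions.  */
static bool
block_ops_vector_pair_p (uint64_t flags)
{
  return (flags & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR) != 0;
}
```

In this model the option stays off by default for power10 and is on by default
only when the future cpu is selected.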

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

2022-11-09   Michael Meissner  

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): Add
-mblock-ops-vector-pair.
(POWERPC_MASKS): Likewise.
---
 gcc/config/rs6000/rs6000-cpus.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 5eac7d97e65..e8df7927055 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -89,6 +89,7 @@
 
 /* Flags for a potential future processor that may or may not be delivered.  */
 #define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER   \
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_FUTURE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
@@ -126,6 +127,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \
-- 
2.38.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH 1/6] PowerPC: Add -mcpu=future

2022-11-09 Thread Michael Meissner via Gcc-patches
This patch adds support for the -mcpu=future and -mtune=future options.
Besides defining _ARCH_PWR_FUTURE, this particular patch does not enable any
new features.
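
User code could test for the new macro in the same way as the existing
_ARCH_PWR* macros.  The following is a minimal sketch.  The macro spelling
_ARCH_PWR_FUTURE is taken from the rs6000-c.cc hunk in this patch, and the
function name is hypothetical.

```c
/* Return a short name for the newest PowerPC architecture level that the
   compiler was told to target, based on the predefined _ARCH_PWR* macros.  */
static const char *
powerpc_arch_name (void)
{
#if defined (_ARCH_PWR_FUTURE)
  return "future";
#elif defined (_ARCH_PWR10)
  return "power10";
#elif defined (_ARCH_PWR9)
  return "power9";
#else
  return "other";
#endif
}
```

The future check comes first because -mcpu=future also enables the power10
flags, so _ARCH_PWR10 is expected to be defined as well.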

These patches implement support for potential future PowerPC cpus.  At this
time, features enabled with -mcpu=future may or may not be in actual PowerPCs
that will be delivered in the future.

At present, we do not have any specific differences in terms of cpu tuning for
future machines, so we make -mtune=future act the same as -mtune=power10.  It
is anticipated that we may add support for changing the tuning characteristics
for -mtune=future at a later time.

The patches have been tested on the following platforms.  I added the patches
for PR target/107299 that I submitted on November 2nd before doing the builds so
that GCC would build on systems using IEEE 128-bit long double.
*   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604834.html

There were no regressions with doing bootstrap builds and running the regression
tests:

1)  Power10 LE using --with-cpu=power10 --with-long-double-format=ieee;
2)  Power10 LE using --with-cpu=power10 --with-long-double-format=ibm;
3)  Power9 LE using --with-cpu=power9 --with-long-double-format=ibm; and
4)  Power8 BE using --with-cpu=power8 (both 32-bit & 64-bit tested).

Can I check this patch into the GCC 13 master branch?

2022-11-09   Michael Meissner  

gcc/

* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_PWR_FUTURE if -mcpu=future.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
(POWERPC_MASKS): Add -mfuture.
* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_FUTURE.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Add
-mcpu=future support.  Make -mtune=future act like -mtune=power10 for
now.
(rs6000_machine_from_flags): Likewise.
(rs6000_reassociation_width): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(rs6000_sched_reorder): Likewise.
(rs6000_sched_reorder2): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Add -mfuture.
* config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
* config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
* config/rs6000/rs6000.md (cpu attribute): Add -mcpu=future support.
* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document 
-mcpu=future.
---
 gcc/config/rs6000/rs6000-c.cc   |  2 ++
 gcc/config/rs6000/rs6000-cpus.def   |  6 ++
 gcc/config/rs6000/rs6000-opts.h |  4 +++-
 gcc/config/rs6000/rs6000-tables.opt |  3 +++
 gcc/config/rs6000/rs6000.cc | 27 +++
 gcc/config/rs6000/rs6000.h  |  1 +
 gcc/config/rs6000/rs6000.md |  2 +-
 gcc/config/rs6000/rs6000.opt|  4 
 gcc/doc/invoke.texi |  2 +-
 9 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 5c2f3bcee9f..0d7b43f8edb 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
+  if ((flags & OPTION_MASK_FUTURE) != 0)
+rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
 rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index c3825bcccd8..5eac7d97e65 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -87,6 +87,10 @@
 | OTHER_POWER10_MASKS  \
 | OPTION_MASK_P10_FUSION)
 
+/* Flags for a potential future processor that may or may not be delivered.  */
+#define ISA_FUTURE_MASKS   (ISA_3_1_MASKS_SERVER   \
+| OPTION_MASK_FUTURE)
+
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
 | OPTION_MASK_P9_MINMAX)
@@ -133,6 +137,7 @@
 | OPTION_MASK_FPRND\
 | OPTION_MASK_POWER10  \
 | OPTION_MASK_P10_FUSION   \
+| OPTION_MASK_FUTURE   \
 | OPTION_MASK_HTM  \
 
