PING^4: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-09-08 Thread Pat Haugen via Gcc-patches

Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:

Updated from prior version to address latest review comment (simplify
umod3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod3, *mod3): Disable.
     (define_expand umod3): New.
     (define_insn umod3): Rename to *umod3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.




PING^3: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-08-09 Thread Pat Haugen via Gcc-patches

Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:

Updated from prior version to address latest review comment (simplify
umod3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod3, *mod3): Disable.
     (define_expand umod3): New.
     (define_insn umod3): Rename to *umod3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.




PING ^2: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-08-01 Thread Pat Haugen via Gcc-patches

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:

Updated from prior version to address latest review comment (simplify
umod3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod3, *mod3): Disable.
     (define_expand umod3): New.
     (define_insn umod3): Rename to *umod3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.




Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-07-13 Thread Pat Haugen via Gcc-patches

Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:

Updated from prior version to address latest review comment (simplify
umod3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod3, *mod3): Disable.
     (define_expand umod3): New.
     (define_insn umod3): Rename to *umod3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.




[PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-06-30 Thread Pat Haugen via Gcc-patches

Updated from prior version to address latest review comment (simplify
umod3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
scalar modulo.
* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
* config/rs6000/rs6000.md (mod3, *mod3): Disable.
(define_expand umod3): New.
(define_insn umod3): Rename to *umod3 and disable.
(umodti3, modti3): Disable.

gcc/testsuite/
* gcc.target/powerpc/clone1.c: Add xfails.
* gcc.target/powerpc/clone3.c: Likewise.
* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
* gcc.target/powerpc/mod-2.c: Likewise.
* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 07c3a3d15ac..72abf285301 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22157,7 +22157,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
*total = rs6000_cost->divsi;
}
   /* Add in shift and subtract for MOD unless we have a mod instruction. */
-  if (!TARGET_MODULO && (code == MOD || code == UMOD))
+  if ((!TARGET_MODULO
+  || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+&& (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..555c8525333 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3422,6 +3422,17 @@ (define_expand "mod3"
FAIL;
 
   operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+   {
+ temp1 = gen_reg_rtx (mode);
+ temp2 = gen_reg_rtx (mode);
+
+ emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+ emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+ emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+ DONE;
+   }
 }
   else
 {
@@ -3441,17 +3452,36 @@ (define_insn "*mod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
 
+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:GPR 2 "gpc_reg_operand")))]
+  "TARGET_MODULO"
+{
+  if (RS6000_DISABLE_SCALAR_MODULO)
+{
+  rtx temp1 = gen_reg_rtx (mode);
+  rtx temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+}
+})
 
-(define_insn "umod3"
+(define_insn "*umod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
@@ -3508,7 +3538,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
   [(set_attr "type" "vecdiv")
(set_attr "size" "128")])
@@ -3517,7 +3547,7 @@ (define_insn "modti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(mod:TI 

Re: [PATCH V3, rs6000] Disable generation of scalar modulo instructions

2023-06-27 Thread Pat Haugen via Gcc-patches

On 6/27/23 1:52 PM, Pat Haugen via Gcc-patches wrote:
Updated from prior version to address review comments (update 
rs6000_rtx_cost,

update scan strings of mod-1.c/mod-2.c)l.

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-27  Pat Haugen  

gcc/
 * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
 scalar modulo.
 * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
 * config/rs6000/rs6000.md (mod3, *mod3): Disable.
 (define_expand umod3): New.
 (define_insn umod3): Rename to *umod3 and disable.
 (umodti3, modti3): Disable.

gcc/testsuite/
 * gcc.target/powerpc/clone1.c: Add xfails.
 * gcc.target/powerpc/clone3.c: Likewise.
 * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
 * gcc.target/powerpc/mod-2.c: Likewise.
 * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.



Attaching patch since my mailer apparently messed up some formatting again.

-Pat
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
*total = rs6000_cost->divsi;
}
   /* Add in shift and subtract for MOD unless we have a mod instruction. */
-  if (!TARGET_MODULO && (code == MOD || code == UMOD))
+  if ((!TARGET_MODULO
+  || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+&& (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
FAIL;
 
   operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+   {
+ temp1 = gen_reg_rtx (mode);
+ temp2 = gen_reg_rtx (mode);
+
+ emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+ emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+ emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+ DONE;
+   }
 }
   else
 {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
 
+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+   FAIL;
 
-(define_insn "umod3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+{
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+}
+})
+
+(define_insn "*umod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu %0,%1,%2"
   [(set

[PATCH V3, rs6000] Disable generation of scalar modulo instructions

2023-06-27 Thread Pat Haugen via Gcc-patches
Updated from prior version to address review comments (update 
rs6000_rtx_cost,

update scan strings of mod-1.c/mod-2.c)l.

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-27  Pat Haugen  

gcc/
* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
scalar modulo.
* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
* config/rs6000/rs6000.md (mod3, *mod3): Disable.
(define_expand umod3): New.
(define_insn umod3): Rename to *umod3 and disable.
(umodti3, modti3): Disable.

gcc/testsuite/
* gcc.target/powerpc/clone1.c: Add xfails.
* gcc.target/powerpc/clone3.c: Likewise.
* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
* gcc.target/powerpc/mod-2.c: Likewise.
* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.


diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,

*total = rs6000_cost->divsi;
}
   /* Add in shift and subtract for MOD unless we have a mod 
instruction. */

-  if (!TARGET_MODULO && (code == MOD || code == UMOD))
+  if ((!TARGET_MODULO
+  || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+&& (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance 
issues

+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
FAIL;

   operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+   {
+ temp1 = gen_reg_rtx (mode);
+ temp2 = gen_reg_rtx (mode);
+
+ emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+ emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+ emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+ DONE;
+   }
 }
   else
 {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+   FAIL;

-(define_insn "umod3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+{
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+}
+})
+
+(define_insn "*umod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
   [(set_attr "type" "vecdiv")

Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions

2023-06-02 Thread Pat Haugen via Gcc-patches

Ping ^3

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:

Updated from prior patch to also disable for int128.


Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-04-18  Pat Haugen  

gcc/
 * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
 * config/rs6000/rs6000.md (mod3, *mod3): Disable.
 (define_expand umod3): New.
 (define_insn umod3): Rename to *umod3 and disable.
 (umodti3, modti3): Disable.

gcc/testsuite/
 * gcc.target/powerpc/clone1.c: Add xfails.
 * gcc.target/powerpc/clone3.c: Likewise.
 * gcc.target/powerpc/mod-1.c: Likewise.
 * gcc.target/powerpc/mod-2.c: Likewise.
 * gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.


diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..1cf0a0013c0 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
     rs6000_asm_output_opcode (STREAM);    \
  }    \
    while (0)
+
+/* Disable generation of scalar modulo instructions due to performance 
issues

+   with certain input values. This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..4f397bc9179 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
  FAIL;

    operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
  }
    else
  {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
  (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
   (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "mods %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+    (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
+})
+
+(define_insn "*umod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
  (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
    (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "modu %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
   (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
    "vmoduq %0,%1,%2"
    [(set_attr "type" "vecdiv")
     (set_attr "size" "128")])
@@ -3516,7 +3552,7 @@ (define_insn "modti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
  (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGE

Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions

2023-05-18 Thread Pat Haugen via Gcc-patches

Ping.

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:

Updated from prior patch to also disable for int128.


Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-04-18  Pat Haugen  

gcc/
 * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
 * config/rs6000/rs6000.md (mod3, *mod3): Disable.
 (define_expand umod3): New.
 (define_insn umod3): Rename to *umod3 and disable.
 (umodti3, modti3): Disable.

gcc/testsuite/
 * gcc.target/powerpc/clone1.c: Add xfails.
 * gcc.target/powerpc/clone3.c: Likewise.
 * gcc.target/powerpc/mod-1.c: Likewise.
 * gcc.target/powerpc/mod-2.c: Likewise.
 * gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.


diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..1cf0a0013c0 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
     rs6000_asm_output_opcode (STREAM);    \
  }    \
    while (0)
+
+/* Disable generation of scalar modulo instructions due to performance 
issues

+   with certain input values. This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..4f397bc9179 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
  FAIL;

    operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
  }
    else
  {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
  (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
   (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "mods %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+    (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
+})
+
+(define_insn "*umod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
  (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
    (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "modu %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
   (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
    "vmoduq %0,%1,%2"
    [(set_attr "type" "vecdiv")
     (set_attr "size" "128")])
@@ -3516,7 +3552,7 @@ (define_insn "modti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
  (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGE

Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions

2023-05-04 Thread Pat Haugen via Gcc-patches

Ping.

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:

Updated from prior patch to also disable for int128.


Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-04-18  Pat Haugen  

gcc/
 * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
 * config/rs6000/rs6000.md (mod3, *mod3): Disable.
 (define_expand umod3): New.
 (define_insn umod3): Rename to *umod3 and disable.
 (umodti3, modti3): Disable.

gcc/testsuite/
 * gcc.target/powerpc/clone1.c: Add xfails.
 * gcc.target/powerpc/clone3.c: Likewise.
 * gcc.target/powerpc/mod-1.c: Likewise.
 * gcc.target/powerpc/mod-2.c: Likewise.
 * gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.


diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..1cf0a0013c0 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
     rs6000_asm_output_opcode (STREAM);    \
  }    \
    while (0)
+
+/* Disable generation of scalar modulo instructions due to performance 
issues

+   with certain input values. This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..4f397bc9179 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
  FAIL;

    operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
  }
    else
  {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
  (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
   (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "mods %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+    (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
+})
+
+(define_insn "*umod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
  (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
    (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "modu %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
   (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
    "vmoduq %0,%1,%2"
    [(set_attr "type" "vecdiv")
     (set_attr "size" "128")])
@@ -3516,7 +3552,7 @@ (define_insn "modti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
  (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGE

[PATCH V2, rs6000] Disable generation of scalar modulo instructions

2023-04-18 Thread Pat Haugen via Gcc-patches

Updated from prior patch to also disable for int128.


Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-04-18  Pat Haugen  

gcc/
* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
* config/rs6000/rs6000.md (mod3, *mod3): Disable.
(define_expand umod3): New.
(define_insn umod3): Rename to *umod3 and disable.
(umodti3, modti3): Disable.

gcc/testsuite/
* gcc.target/powerpc/clone1.c: Add xfails.
* gcc.target/powerpc/clone3.c: Likewise.
* gcc.target/powerpc/mod-1.c: Likewise.
* gcc.target/powerpc/mod-2.c: Likewise.
* gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.


diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..1cf0a0013c0 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance 
issues

+   with certain input values. This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..4f397bc9179 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
FAIL;

   operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+   {
+ temp1 = gen_reg_rtx (mode);
+ temp2 = gen_reg_rtx (mode);
+
+ emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+ emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+ emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+ DONE;
+   }
 }
   else
 {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+   FAIL;

-(define_insn "umod3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+{
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+}
+})
+
+(define_insn "*umod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
   [(set_attr "type" "vecdiv")
(set_attr "size" "128")])
@@ -3516,7 +3552,7 @@ (define_insn "modti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
(match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmodsq %0,%1,%2"
   [(set_attr "type" "vecdiv")
(set_attr "size" "128")])
diff --git a/gcc/testsuite/gcc.target/powerpc/clone1.c 
b/gcc/testsuite/gcc.target/powerpc/clone1.c

index c69fd2aa1b8..74323ca0e8c 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone1.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone1.c
@@ -21,6 +21,7 @@ long mod_func_or (long a, long b, long c)
   return mod_func (a, b) | c;
 }

-/* { dg-final { scan-assembler-times {\mdivd\M}  1 } } */
-/* { 

[PATCH, rs6000] Disable generation of scalar modulo instructions

2023-04-07 Thread Pat Haugen via Gcc-patches

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64le (Power10).
Ok for master and backports after burn in?

-Pat


2023-04-07  Pat Haugen  

gcc/
* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
* config/rs6000/rs6000.md (mod3, *mod3): Use it.
(define_expand umod3): New.
(define_insn umod3): Rename to *umod3 and disable.

gcc/testsuite/
* gcc.target/powerpc/clone1.c: Add xfails.
* gcc.target/powerpc/clone3.c: Likewise.
* gcc.target/powerpc/mod-1.c: Likewise.
* gcc.target/powerpc/mod-2.c: Likewise.



diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..e4d9f357622 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions for word/dword due to
+   performance issues with certain input values. This can be removed in the
+   future when the issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..a267b7ee2d0 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
FAIL;

   operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+   {
+ temp1 = gen_reg_rtx (mode);
+ temp2 = gen_reg_rtx (mode);
+
+ emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+ emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+ emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+ DONE;
+   }
 }
   else
 {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+   FAIL;
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+{
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+}
+})

-(define_insn "umod3"
+(define_insn "*umod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
 (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
diff --git a/gcc/testsuite/gcc.target/powerpc/clone1.c 
b/gcc/testsuite/gcc.target/powerpc/clone1.c

index c69fd2aa1b8..74323ca0e8c 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone1.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone1.c
@@ -21,6 +21,7 @@ long mod_func_or (long a, long b, long c)
   return mod_func (a, b) | c;
 }

-/* { dg-final { scan-assembler-times {\mdivd\M}  1 } } */
-/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */
+/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
+/* { dg-final { scan-assembler-times {\mdivd\M}  1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 1 { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/clone3.c 
b/gcc/testsuite/gcc.target/powerpc/clone3.c

index 911b88b781d..d3eb4dd2378 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone3.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone3.c
@@ -27,7 +27,8 @@ long mod_func_or (long a, long b, long c)
   return mod_func (a, b) | c;
 }

-/* { dg-final { scan-assembler-times {\mdivd\M}  1 } } */
-/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mmodsd\M} 2 } } */
+/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
+/* { dg-final { scan-assembler-times {\mdivd\M}  1 { xfail *-*-* } } } 

[PATCH V2, rs6000] Tweak modulo define_insns to eliminate register copy

2023-03-21 Thread Pat Haugen via Gcc-patches

Updated patch with review comments addressed: fixed up testcase and added
another testcase to verify peephole is functional.

Don't force target of modulo into a distinct register.

The define_insns for the modulo operation currently force the target 
register

to a distinct reg in preparation for a possible future peephole combining
div/mod. But this can lead to cases of a needless copy being inserted. Fixed
with the following patch.

Bootstrapped and regression tested on powerpc64le.
Ok for master?

-Pat


2023-03-21  Pat Haugen  

gcc/
* config/rs6000/rs6000.md (*mod3, umod3): Add
non-earlyclobber alternative.

gcc/testsuite/
* gcc.target/powerpc/mod-no_copy.c: New.
* gcc.target/powerpc/mod-peephole.c: New.


diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 81bffb04ceb..44f7dd509cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3437,9 +3437,9 @@ (define_expand "mod3"
 ;; In order to enable using a peephole2 for combining div/mod to 
eliminate the

 ;; mod, prefer putting the result of mod into a different register
 (define_insn "*mod3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=")
-(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
-(match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
+(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
+(match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
   "TARGET_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
@@ -3447,9 +3447,9 @@ (define_insn "*mod3"


 (define_insn "umod3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=")
-(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
- (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
+(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
+ (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
   "TARGET_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
diff --git a/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c 
b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c

new file mode 100644
index 000..c55e486ee9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+
+/* Verify r3 is used as source and target, no copy inserted. */
+
+long foo (long a, long b)
+{
+  return (a % b);
+}
+
+unsigned long foo2 (unsigned long a, unsigned long b)
+{
+  return (a % b);
+}
+
+/* { dg-final { scan-assembler-not {\mmr\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mod-peephole.c 
b/gcc/testsuite/gcc.target/powerpc/mod-peephole.c

new file mode 100644
index 000..7517fbc397c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mod-peephole.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+
+/* Verify peephole fires to combine div/mod using same opnds. */
+
+long foo (long a, long b)
+{
+  long x, y;
+
+  x = a / b;
+  y = a % b;
+  return (x + y);
+}
+
+unsigned long foo2 (unsigned long a, unsigned long b)
+{
+  unsigned long x, y;
+
+  x = a / b;
+  y = a % b;
+  return (x + y);
+}
+
+/* { dg-final { scan-assembler-not {\mmodsd\M} } } */
+/* { dg-final { scan-assembler-not {\mmodud\M} } } */


Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Pat Haugen via Gcc-patches

On 2/27/23 2:53 PM, Segher Boessenkool wrote:

Hi!

On Mon, Feb 27, 2023 at 02:12:23PM -0600, Pat Haugen wrote:

On 2/27/23 11:08 AM, Segher Boessenkool wrote:

On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:

The define_insns for the modulo operation currently force the target
register
to a distinct reg in preparation for a possible future peephole combining
div/mod. But this can lead to cases of a needless copy being inserted.
Fixed
with the following patch.


Have you verified those peepholes still match?


Yes, I verified the peepholes still match and transform the sequence.


Please add the testcases for that then?  Or do we have tests for it
already :-)


I don't see one, but can add one.


Do those peepholes actually improve performance?  On new CPUs?  The code
here says
;; On machines with modulo support, do a combined div/mod the old fashioned
;; method, since the multiply/subtract is faster than doing the mod
instruction
;; after a divide.
but that really should not be true: we can do the div and mod in
parallel (except in SMT4 perhaps, which we never schedule for anyway),
so that should always be strictly faster.


Since the modulo insns were introduced in Power9, we're just talking
Power9/Power10. On paper, I would agree that separate div/mod could be
slightly faster to get the mod result,


"Slightly".  It takes 12 cycles for the two in parallel (64-bit, p9),
but 17 cycles for the "cheaper" sequence (divd+mulld+subf, 12+5+2).  It
is all worse if the units are busy of course, or if there are other
problems.


but if you throw in another
independent div or mod in the insn stream then doing the peephole should
be a clear win since that 3rd insn can execute in parallel with the
initial divide as opposed to waiting for the one of the first div/mod to
clear the exclusive stage of the pipe.


That is the SMT4 case, the one we do not optimise for.  SMT2 and ST can
do four in parallel.  This means you can start a div or mod every 2nd
cycle on average, so it is very unlikely you will ever be limited by
this on real code.


Power9/Power10 only have 2 fixed-point divide units, and are able to 
issue 2 divides every 9/11 cycles (they aren't fully pipelined), with 
latencies of 12-24/12-25. Not saying that changes the "best case" 
scenario, just pointing out a lot of variables in play.


-Pat




Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Pat Haugen via Gcc-patches

On 2/27/23 11:08 AM, Segher Boessenkool wrote:

Hi!

On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:

The define_insns for the modulo operation currently force the target
register
to a distinct reg in preparation for a possible future peephole combining
div/mod. But this can lead to cases of a needless copy being inserted. Fixed
with the following patch.


Have you verified those peepholes still match?


Yes, I verified the peepholes still match and transform the sequence.



Do those peepholes actually improve performance?  On new CPUs?  The code
here says
;; On machines with modulo support, do a combined div/mod the old fashioned
;; method, since the multiply/subtract is faster than doing the mod instruction
;; after a divide.
but that really should not be true: we can do the div and mod in
parallel (except in SMT4 perhaps, which we never schedule for anyway),
so that should always be strictly faster.

Since the modulo insns were introduced in Power9, we're just talking 
Power9/Power10. On paper, I would agree that separate div/mod could be 
slightly faster to get the mod result, but if you throw in another 
independent div or mod in the insn stream then doing the peephole should 
be a clear win since that 3rd insn can execute in parallel with the 
initial divide as opposed to waiting for the one of the first div/mod to 
clear the exclusive stage of the pipe.



--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-*  } } } */


All files in gcc.target/powerpc/ test for this already.  Just leave off
the target clause here?


+/* { dg-require-effective-target powerpc_p9modulo_ok } */


Leave out this line, because ...


+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */


... the -mcpu= forces it to true always.


Will update.

-Pat




+/* Verify r3 is used as source and target, no copy inserted. */



+/* { dg-final { scan-assembler-not {\mmr\M} } } */


That is probably good enough, yeah, since the test results in only a
handful of insns.


Segher




[PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Pat Haugen via Gcc-patches

Don't force target of modulo into a distinct register.

The define_insns for the modulo operation currently force the target 
register

to a distinct reg in preparation for a possible future peephole combining
div/mod. But this can lead to cases of a needless copy being inserted. Fixed
with the following patch.

Bootstrapped and regression tested on powerpc64le.
Ok for master?

-Pat


2023-02-27  Pat Haugen  

gcc/
* config/rs6000/rs6000.md (*mod3, umod3): Add
non-earlyclobber alternative.

gcc/testsuite/
* gcc.target/powerpc/mod-no_copy.c: New.


diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 81bffb04ceb..44f7dd509cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3437,9 +3437,9 @@ (define_expand "mod3"
 ;; In order to enable using a peephole2 for combining div/mod to 
eliminate the

 ;; mod, prefer putting the result of mod into a different register
 (define_insn "*mod3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=")
-(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
-(match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
+(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
+(match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
   "TARGET_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
@@ -3447,9 +3447,9 @@ (define_insn "*mod3"


 (define_insn "umod3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=")
-(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
- (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
+(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
+ (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
   "TARGET_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
diff --git a/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c 
b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c

new file mode 100644
index 000..91e3003b3fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-*  } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+
+/* Verify r3 is used as source and target, no copy inserted. */
+
+long foo (long a, long b)
+{
+  return (a % b);
+}
+
+unsigned long foo2 (unsigned long a, unsigned long b)
+{
+  return (a % b);
+}
+
+/* { dg-final { scan-assembler-not {\mmr\M} } } */


Re: [PATCH] rs6000: Make P10_FUSION honour tuning setting

2023-01-05 Thread Pat Haugen via Gcc-patches

On 1/4/23 3:20 AM, Kewen.Lin via Gcc-patches wrote:

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 88c865b6b4b..6fa084c0807 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -4378,9 +4378,15 @@ rs6000_option_override_internal (bool global_init_p)
rs6000_isa_flags &= ~OPTION_MASK_MMA;
  }

-  if (TARGET_POWER10
-  && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
-rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+  /* Enable power10 fusion if we are tuning for power10, even if we aren't
+ generating power10 instructions.  */
+  if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION))
+{
+  if (processor_target_table[tune_index].processor == PROCESSOR_POWER10)


You can use (rs6000_tune == PROCESSOR_POWER10) at this point.
-Pat


+   rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+  else
+   rs6000_isa_flags &= ~OPTION_MASK_P10_FUSION;
+}





Ping: [PATCH, rs6000] Fix passing of Coomplex IEEE 128-bit [PR99685]

2022-05-10 Thread Pat Haugen via Gcc-patches

Ping.

On 4/26/22 3:06 PM, Pat Haugen via Gcc-patches wrote:

Fix register count when not splitting Complex IEEE 128-bit args.

For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only 
advancing the

function arg counter by one VSX register. Fixed with this patch.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat


2022-04-26  Pat Haugen  

 PR testsuite/99685

gcc/
 * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
 register count when not splitting IEEE 128-bit Complex.


diff --git a/gcc/config/rs6000/rs6000-call.cc 
b/gcc/config/rs6000/rs6000-call.cc

index f06c692..9d18607 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -,6 +,12 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS 
*cum, machine_mode mode,

  {
    cum->vregno += n_elts;

+  /* If we are not splitting Complex IEEE 128-bit args then account 
for

+ the fact that they are passed in 2 VSX regs. */
+  if (! targetm.calls.split_complex_arg && type
+  && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode)
+    cum->vregno++;
+
    if (!TARGET_ALTIVEC)
  error ("cannot pass argument in vector register because"
     " altivec instructions are disabled, use %qs"




[PATCH, rs6000] Fix passing of Coomplex IEEE 128-bit [PR99685]

2022-04-26 Thread Pat Haugen via Gcc-patches

Fix register count when not splitting Complex IEEE 128-bit args.

For ABI_V4, we do not split complex args. This created a problem because
even though an arg would be passed in two VSX regs, we were only 
advancing the

function arg counter by one VSX register. Fixed with this patch.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat


2022-04-26  Pat Haugen  

PR testsuite/99685

gcc/
* config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump
register count when not splitting IEEE 128-bit Complex.


diff --git a/gcc/config/rs6000/rs6000-call.cc 
b/gcc/config/rs6000/rs6000-call.cc

index f06c692..9d18607 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -,6 +,12 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS 
*cum, machine_mode mode,

{
  cum->vregno += n_elts;

+ /* If we are not splitting Complex IEEE 128-bit args then account for
+the fact that they are passed in 2 VSX regs. */
+ if (! targetm.calls.split_complex_arg && type
+ && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode)
+   cum->vregno++;
+
  if (!TARGET_ALTIVEC)
error ("cannot pass argument in vector register because"
   " altivec instructions are disabled, use %qs"


PING: [PATCH, testsuite] Fix attr-retain-*.c testcases on 32-bit PowerPC [PR100407]

2022-02-24 Thread Pat Haugen via Gcc-patches

Ping.

On 2/10/22 4:17 PM, Pat Haugen via Gcc-patches wrote:

Per Alan's comment in the bugzilla, fix attr-retain-* tescases for 32-bit 
PowerPC.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat


2022-02-10  Pat Haugen  

PR testsuite/100407

gcc/testsuite/
* gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC.
* gcc.c-torture/compile/attr-retain-2.c: Likewise.



diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c 
b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
index 6cab155..4a366eb 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
@@ -1,4 +1,5 @@
  /* { dg-do compile { target R_flag_in_section } } */
+/* { dg-options "-G0" { target { powerpc*-*-* && ilp32 } } } */
  /* { dg-final { scan-assembler ".text.*,\"axR\"" } } */
  /* { dg-final { scan-assembler ".bss.*,\"awR\"" } } */
  /* { dg-final { scan-assembler ".data.*,\"awR\"" } } */
diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c 
b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
index 0208ffe..d9fc150 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
@@ -11,5 +11,6 @@
  /* { dg-final { scan-assembler ".bss.used_lcomm2,\"awR\"" { target arm-*-* } 
} } */
  /* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
  /* { dg-options "-ffunction-sections -fdata-sections" } */
+/* { dg-options "-ffunction-sections -fdata-sections -G0" { target { powerpc*-*-* 
&& ilp32 } } } */
  
  #include "attr-retain-1.c"




PING: [PATCH, rs6000] Clean up Power10 fusion options

2022-02-17 Thread Pat Haugen via Gcc-patches

Ping.

On 1/28/22 12:03 PM, Pat Haugen via Gcc-patches wrote:

Mark Power10 fusion option undocumented and remove sub-options.

Bootstrapped and regression tested on powerpc64le(Power10).
Ok for master?

-Pat


2022-01-28  Pat Haugen  

gcc/
* config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented.
(mpower10-fusion-ld-cmpi, mpower10-fusion-2logical,
mpower10-fusion-logical-add, mpower10-fusion-add-logical,
mpower10-fusion-2add, mpower10-fusion-2store): Remove.
* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER,
OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options.
* config/rs6000/rs6000.cc (rs6000_option_override_internal,
power10_sched_reorder): Likewise.
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf,
gen_addadd): Likewise
* config/rs6000/fusion.md: Regenerate.


[PATCH, testsuite] Fix attr-retain-*.c testcases on 32-bit PowerPC [PR100407]

2022-02-10 Thread Pat Haugen via Gcc-patches
Per Alan's comment in the bugzilla, fix attr-retain-* tescases for 32-bit 
PowerPC.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat


2022-02-10  Pat Haugen  

PR testsuite/100407

gcc/testsuite/
* gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC.
* gcc.c-torture/compile/attr-retain-2.c: Likewise.



diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c 
b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
index 6cab155..4a366eb 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target R_flag_in_section } } */
+/* { dg-options "-G0" { target { powerpc*-*-* && ilp32 } } } */
 /* { dg-final { scan-assembler ".text.*,\"axR\"" } } */
 /* { dg-final { scan-assembler ".bss.*,\"awR\"" } } */
 /* { dg-final { scan-assembler ".data.*,\"awR\"" } } */
diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c 
b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
index 0208ffe..d9fc150 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
@@ -11,5 +11,6 @@
 /* { dg-final { scan-assembler ".bss.used_lcomm2,\"awR\"" { target arm-*-* } } 
} */
 /* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
 /* { dg-options "-ffunction-sections -fdata-sections" } */
+/* { dg-options "-ffunction-sections -fdata-sections -G0" { target { 
powerpc*-*-* && ilp32 } } } */
 
 #include "attr-retain-1.c"


[PATCH, rs6000] Clean up Power10 fusion options

2022-01-28 Thread Pat Haugen via Gcc-patches
Mark Power10 fusion option undocumented and remove sub-options.

Bootstrapped and regression tested on powerpc64le(Power10).
Ok for master?

-Pat


2022-01-28  Pat Haugen  

gcc/
* config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented.
(mpower10-fusion-ld-cmpi, mpower10-fusion-2logical,
mpower10-fusion-logical-add, mpower10-fusion-add-logical,
mpower10-fusion-2add, mpower10-fusion-2store): Remove.
* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER,
OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options.
* config/rs6000/rs6000.cc (rs6000_option_override_internal,
power10_sched_reorder): Likewise.
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf,
gen_addadd): Likewise
* config/rs6000/fusion.md: Regenerate.
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c2a77182a9e..b4e69e9fefd 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -491,33 +491,9 @@ Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
 Use vector and scalar instructions added in ISA 2.07.
 
 mpower10-fusion
-Target Mask(P10_FUSION) Var(rs6000_isa_flags)
+Target Undocumented Mask(P10_FUSION) Var(rs6000_isa_flags)
 Fuse certain integer operations together for better performance on power10.
 
-mpower10-fusion-ld-cmpi
-Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
-Fuse certain integer operations together for better performance on power10.
-
-mpower10-fusion-2logical
-Target Undocumented Mask(P10_FUSION_2LOGICAL) Var(rs6000_isa_flags)
-Fuse pairs of scalar or vector logical operations together for better 
performance on power10.
-
-mpower10-fusion-logical-add
-Target Undocumented Mask(P10_FUSION_LOGADD) Var(rs6000_isa_flags)
-Fuse scalar logical op with add/subf for better performance on power10.
-
-mpower10-fusion-add-logical
-Target Undocumented Mask(P10_FUSION_ADDLOG) Var(rs6000_isa_flags)
-Fuse scalar add/subf with logical op for better performance on power10.
-
-mpower10-fusion-2add
-Target Undocumented Mask(P10_FUSION_2ADD) Var(rs6000_isa_flags)
-Fuse dependent pairs of add or vaddudm instructions for better performance on 
power10.
-
-mpower10-fusion-2store
-Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags)
-Fuse certain store operations together for better performance on power10.
-
 mcrypto
 Target Mask(CRYPTO) Var(rs6000_isa_flags)
 Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index 325b21967c4..963947f6939 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -85,13 +85,7 @@
 #define ISA_3_1_MASKS_SERVER   (ISA_3_0_MASKS_SERVER   \
 | OPTION_MASK_POWER10  \
 | OTHER_POWER10_MASKS  \
-| OPTION_MASK_P10_FUSION   \
-| OPTION_MASK_P10_FUSION_LD_CMPI   \
-| OPTION_MASK_P10_FUSION_2LOGICAL  \
-| OPTION_MASK_P10_FUSION_LOGADD\
-| OPTION_MASK_P10_FUSION_ADDLOG\
-| OPTION_MASK_P10_FUSION_2ADD  \
-| OPTION_MASK_P10_FUSION_2STORE)
+| OPTION_MASK_P10_FUSION)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
@@ -139,12 +133,6 @@
 | OPTION_MASK_FPRND\
 | OPTION_MASK_POWER10  \
 | OPTION_MASK_P10_FUSION   \
-| OPTION_MASK_P10_FUSION_LD_CMPI   \
-| OPTION_MASK_P10_FUSION_2LOGICAL  \
-| OPTION_MASK_P10_FUSION_LOGADD\
-| OPTION_MASK_P10_FUSION_ADDLOG\
-| OPTION_MASK_P10_FUSION_2ADD  \
-| OPTION_MASK_P10_FUSION_2STORE\
 | OPTION_MASK_HTM  \
 | OPTION_MASK_ISEL \
 | OPTION_MASK_MFCRF\
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a5fd36b72d9..548366abada 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -4458,30 +4458,6 @@ rs6000_option_override_internal (bool global_init_p)
   && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
 rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
 
-  if (TARGET_POWER10 &&
-  

Ping^5: [PATCH, rs6000 V2] rotate and mask constants [PR94393]

2022-01-19 Thread Pat Haugen via Gcc-patches
Ping. I'll note that I recently discovered that this patch also fixes PR93176 
and PR97042.

-Pat

On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote:
> Updated version of the patch. Changes made from original are updated 
> commentary to hopefully aid readability, no functional changes.
> 
> 
> Implement more two insn constants.  rotate_and_mask_constant covers
> 64-bit constants that can be formed by rotating a 16-bit signed
> constant, rotating a 16-bit signed constant masked on left or right
> (rldicl and rldicr), rotating a 16-bit signed constant masked by
> rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns.  All the
> values possible for DImode rs6000_is_valid_and_mask are covered.
> 
> Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
> Ok for master?
> 
> -Pat
> 
> 
> 2021-11-22  Alan Modra  
>   Pat Haugen  
> 
>   PR 94393
> gcc/
>   * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant,
>   is_rotate_negative_constant, rotate_and_mask_constant): New functions.
>   (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.
>   * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete.
> gcc/testsuite/
>   * gcc.target/powerpc/rot_cst.h,
>   * gcc.target/powerpc/rot_cst1.c,
>   * gcc.target/powerpc/rot_cst2.c: New tests.



Ping^4: [PATCH, rs6000 V2] rotate and mask constants [PR94393]

2022-01-13 Thread Pat Haugen via Gcc-patches
Ping.

On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote:
> Updated version of the patch. Changes made from original are updated 
> commentary to hopefully aid readability, no functional changes.
> 
> 
> Implement more two insn constants.  rotate_and_mask_constant covers
> 64-bit constants that can be formed by rotating a 16-bit signed
> constant, rotating a 16-bit signed constant masked on left or right
> (rldicl and rldicr), rotating a 16-bit signed constant masked by
> rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns.  All the
> values possible for DImode rs6000_is_valid_and_mask are covered.
> 
> Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
> Ok for master?
> 
> -Pat
> 
> 
> 2021-11-22  Alan Modra  
>   Pat Haugen  
> 
>   PR 94393
> gcc/
>   * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant,
>   is_rotate_negative_constant, rotate_and_mask_constant): New functions.
>   (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.
>   * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete.
> gcc/testsuite/
>   * gcc.target/powerpc/rot_cst.h,
>   * gcc.target/powerpc/rot_cst1.c,
>   * gcc.target/powerpc/rot_cst2.c: New tests.



Ping^3: [PATCH, rs6000 V2] rotate and mask constants [PR94393]

2021-12-30 Thread Pat Haugen via Gcc-patches
Ping.

On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote:
> Updated version of the patch. Changes made from original are updated 
> commentary to hopefully aid readability, no functional changes.
> 
> 
> Implement more two insn constants.  rotate_and_mask_constant covers
> 64-bit constants that can be formed by rotating a 16-bit signed
> constant, rotating a 16-bit signed constant masked on left or right
> (rldicl and rldicr), rotating a 16-bit signed constant masked by
> rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns.  All the
> values possible for DImode rs6000_is_valid_and_mask are covered.
> 
> Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
> Ok for master?
> 
> -Pat
> 
> 
> 2021-11-22  Alan Modra  
>   Pat Haugen  
> 
>   PR 94393
> gcc/
>   * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant,
>   is_rotate_negative_constant, rotate_and_mask_constant): New functions.
>   (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.
>   * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete.
> gcc/testsuite/
>   * gcc.target/powerpc/rot_cst.h,
>   * gcc.target/powerpc/rot_cst1.c,
>   * gcc.target/powerpc/rot_cst2.c: New tests.
> 



Ping: [PATCH, rs6000 V2] rotate and mask constants [PR94393]

2021-12-07 Thread Pat Haugen via Gcc-patches
Ping.

On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote:
> Updated version of the patch. Changes made from original are updated 
> commentary to hopefully aid readability, no functional changes.
> 
> 
> Implement more two insn constants.  rotate_and_mask_constant covers
> 64-bit constants that can be formed by rotating a 16-bit signed
> constant, rotating a 16-bit signed constant masked on left or right
> (rldicl and rldicr), rotating a 16-bit signed constant masked by
> rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns.  All the
> values possible for DImode rs6000_is_valid_and_mask are covered.
> 
> Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
> Ok for master?
> 
> -Pat
> 
> 
> 2021-11-22  Alan Modra  
>   Pat Haugen  
> 
>   PR 94393
> gcc/
>   * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant,
>   is_rotate_negative_constant, rotate_and_mask_constant): New functions.
>   (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.
>   * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete.
> gcc/testsuite/
>   * gcc.target/powerpc/rot_cst.h,
>   * gcc.target/powerpc/rot_cst1.c,
>   * gcc.target/powerpc/rot_cst2.c: New tests.
> 



[PATCH, rs6000 GCC-11 committed] Fix fusion testcase counts

2021-10-04 Thread Pat Haugen via Gcc-patches
Somehow there was an issue when a larger patch was backported, and this chunk 
did not make it. Tested and committed as obvious.

-Pat


2021-10-04  Pat Haugen  

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/fusion-p10-ldcmpi.c: Update counts.


diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c 
b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index ea1d5d01950..526a026d874 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -53,16 +53,16 @@ TEST(int16_t)
 TEST(uint8_t)
 TEST(int8_t)
 
-/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   
2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   
4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none" 
4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"
4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none" 
1 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"
1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"   
8 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   
2 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 
3 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"  
16 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   
4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 
0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"   
4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 
2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 
0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   
2 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   
2 { target ilp32 } } } */


Ping: [PATCH, rs6000 V2] Add store fusion support for Power10

2021-08-26 Thread Pat Haugen via Gcc-patches
Ping.

On 8/11/21 11:02 AM, Pat Haugen via Gcc-patches wrote:
> Enable store fusion on Power10.
> 
> Use the SCHED_REORDER hook to implement Power10 specific ready list 
> reordering.
> As of now this is just store fusion.
> 
> Things changed in this version of the patch
> - Separate patch for additional load/store checks
> - Move option check from is_fusable_store() to caller
> - Misc coding style changes pointed out in review (parens/braces)
> - Add testcases
> 
> Bootstrap/regtest on powerpc64(32/64) and powerpc64le(Power10) with no new 
> regressions.
> Ok for master?
> 
> -Pat
> 
> 
> 2021-08-11  Pat Haugen  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag.
>   (POWERPC_MASKS): Likewise.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
>   store fusion for Power10.
>   (is_fusable_store): New.
>   (power10_sched_reorder): Likewise.
>   (rs6000_sched_reorder): Do Power10 specific reordering.
>   (rs6000_sched_reorder2): Likewise.
>   * config/rs6000/rs6000.opt: Add new option.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/fusion-p10-stst.c: New test.
>   * gcc.target/powerpc/fusion-p10-stst2.c: New test.
> 
> 
> 
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 6758296c0fd..f5812da0184 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -90,7 +90,8 @@
>| OPTION_MASK_P10_FUSION_2LOGICAL  \
>| OPTION_MASK_P10_FUSION_LOGADD\
>| OPTION_MASK_P10_FUSION_ADDLOG\
> -  | OPTION_MASK_P10_FUSION_2ADD)
> +  | OPTION_MASK_P10_FUSION_2ADD  \
> +  | OPTION_MASK_P10_FUSION_2STORE)
>  
>  /* Flags that need to be turned off if -mno-power9-vector.  */
>  #define OTHER_P9_VECTOR_MASKS(OPTION_MASK_FLOAT128_HW
> \
> @@ -143,6 +144,7 @@
>| OPTION_MASK_P10_FUSION_LOGADD\
>| OPTION_MASK_P10_FUSION_ADDLOG\
>| OPTION_MASK_P10_FUSION_2ADD  \
> +  | OPTION_MASK_P10_FUSION_2STORE\
>| OPTION_MASK_HTM  \
>| OPTION_MASK_ISEL \
>| OPTION_MASK_MFCRF\
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 60f406a4ff6..402cc924e3f 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4495,6 +4495,10 @@ rs6000_option_override_internal (bool global_init_p)
>&& (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0)
>  rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD;
>  
> +  if (TARGET_POWER10
> +  && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
> +rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
> +
>/* Turn off vector pair/mma options on non-power10 systems.  */
>else if (!TARGET_POWER10 && TARGET_MMA)
>  {
> @@ -18874,6 +18878,91 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos)
>return cached_can_issue_more;
>  }
>  
> +/* Determine if INSN is a store to memory that can be fused with a similar
> +   adjacent store.  */
> +
> +static bool
> +is_fusable_store (rtx_insn *insn, rtx *str_mem)
> +{
> +  /* Insn must be a non-prefixed base+disp form store.  */
> +  if (is_store_insn (insn, str_mem)
> +  && get_attr_prefixed (insn) == PREFIXED_NO
> +  && get_attr_update (insn) == UPDATE_NO
> +  && get_attr_indexed (insn) == INDEXED_NO)
> +{
> +  /* Further restictions by mode and size.  */
> +  machine_mode mode = GET_MODE (*str_mem);
> +  HOST_WIDE_INT size;
> +  if (MEM_SIZE_KNOWN_P (*str_mem))
> + size = MEM_SIZE (*str_mem);
> +  else
> + return false;
> +
> +  if (INTEGRAL_MODE_P (mode))
> + /* Must be word or dword size.  */
> + return (size == 4 || size == 8);
> +  else if (FLOAT_MODE_P (mode))
> + /* Must be dword size.  */
> + return (size == 8);
> +}
> +
> +  return false;
> +}
> +
> +/* Do Power10 specific reordering of the ready list.  */
> +
> +static int
> +power10_sched_reorder (rtx_insn **ready, int lastpos)
> +{
> +  rtx mem1;
> +
> +  /* Do store fusion during sched2

[PATCH, rs6000 V2] Add store fusion support for Power10

2021-08-11 Thread Pat Haugen via Gcc-patches
Enable store fusion on Power10.

Use the SCHED_REORDER hook to implement Power10 specific ready list reordering.
As of now this is just store fusion.

Things changed in this version of the patch
- Separate patch for additional load/store checks
- Move option check from is_fusable_store() to caller
- Misc coding style changes pointed out in review (parens/braces)
- Add testcases

Bootstrap/regtest on powerpc64(32/64) and powerpc64le(Power10) with no new 
regressions.
Ok for master?

-Pat


2021-08-11  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
store fusion for Power10.
(is_fusable_store): New.
(power10_sched_reorder): Likewise.
(rs6000_sched_reorder): Do Power10 specific reordering.
(rs6000_sched_reorder2): Likewise.
* config/rs6000/rs6000.opt: Add new option.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/fusion-p10-stst.c: New test.
* gcc.target/powerpc/fusion-p10-stst2.c: New test.



diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index 6758296c0fd..f5812da0184 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -90,7 +90,8 @@
 | OPTION_MASK_P10_FUSION_2LOGICAL  \
 | OPTION_MASK_P10_FUSION_LOGADD\
 | OPTION_MASK_P10_FUSION_ADDLOG\
-| OPTION_MASK_P10_FUSION_2ADD)
+| OPTION_MASK_P10_FUSION_2ADD  \
+| OPTION_MASK_P10_FUSION_2STORE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
@@ -143,6 +144,7 @@
 | OPTION_MASK_P10_FUSION_LOGADD\
 | OPTION_MASK_P10_FUSION_ADDLOG\
 | OPTION_MASK_P10_FUSION_2ADD  \
+| OPTION_MASK_P10_FUSION_2STORE\
 | OPTION_MASK_HTM  \
 | OPTION_MASK_ISEL \
 | OPTION_MASK_MFCRF\
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 60f406a4ff6..402cc924e3f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4495,6 +4495,10 @@ rs6000_option_override_internal (bool global_init_p)
   && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0)
 rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD;
 
+  if (TARGET_POWER10
+  && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
+rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
+
   /* Turn off vector pair/mma options on non-power10 systems.  */
   else if (!TARGET_POWER10 && TARGET_MMA)
 {
@@ -18874,6 +18878,91 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos)
   return cached_can_issue_more;
 }
 
+/* Determine if INSN is a store to memory that can be fused with a similar
+   adjacent store.  */
+
+static bool
+is_fusable_store (rtx_insn *insn, rtx *str_mem)
+{
+  /* Insn must be a non-prefixed base+disp form store.  */
+  if (is_store_insn (insn, str_mem)
+  && get_attr_prefixed (insn) == PREFIXED_NO
+  && get_attr_update (insn) == UPDATE_NO
+  && get_attr_indexed (insn) == INDEXED_NO)
+{
+  /* Further restictions by mode and size.  */
+  machine_mode mode = GET_MODE (*str_mem);
+  HOST_WIDE_INT size;
+  if (MEM_SIZE_KNOWN_P (*str_mem))
+   size = MEM_SIZE (*str_mem);
+  else
+   return false;
+
+  if (INTEGRAL_MODE_P (mode))
+   /* Must be word or dword size.  */
+   return (size == 4 || size == 8);
+  else if (FLOAT_MODE_P (mode))
+   /* Must be dword size.  */
+   return (size == 8);
+}
+
+  return false;
+}
+
+/* Do Power10 specific reordering of the ready list.  */
+
+static int
+power10_sched_reorder (rtx_insn **ready, int lastpos)
+{
+  rtx mem1;
+
+  /* Do store fusion during sched2 only.  */
+  if (!reload_completed)
+return cached_can_issue_more;
+
+  /* If the prior insn finished off a store fusion pair then simply
+ reset the counter and return, nothing more to do.  */
+  if (load_store_pendulum != 0)
+{
+  load_store_pendulum = 0;
+  return cached_can_issue_more;
+}
+
+  /* Try to pair certain store insns to adjacent memory locations
+ so that the hardware will fuse them to a single operation.  */
+  if (TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE
+  && is_fusable_store (last_scheduled_insn, ))
+{
+  int pos;
+  rtx mem2;
+
+  /* A fusable store was just scheduled.  Scan the ready list for 

[PATCH, rs6000 V2] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
Add additional checks to verify destination[source] of a load[store]
instruction is a register.

Modified code based on review comment to not change general logic of the flow. 
Braces needed on inner if-else to prevent error during bootstrap for ambiguous 
'else'.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for master?

-Pat


2021-08-07  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000.c (is_load_insn1): Verify destination is a
register.
(is_store_insn1): Verify source is a register.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..c8a146a7d18 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18357,8 +18357,13 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_SRC (pat), load_mem);
+  if (GET_CODE (pat)== SET)
+{
+  if (REG_P (SET_DEST (pat)))
+   return find_mem_ref (SET_SRC (pat), load_mem);
+  else
+   return false;
+}
 
   if (GET_CODE (pat) == PARALLEL)
 {
@@ -18395,7 +18400,12 @@ is_store_insn1 (rtx pat, rtx *str_mem)
 return false;
 
   if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_DEST (pat), str_mem);
+{
+  if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat)))
+   return find_mem_ref (SET_DEST (pat), str_mem);
+  else
+   return false;
+}
 
   if (GET_CODE (pat) == PARALLEL)
 {


Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
On 8/6/21 11:05 AM, Segher Boessenkool wrote:
> On Fri, Aug 06, 2021 at 10:29:40AM -0500, Pat Haugen wrote:
>> On 8/6/21 10:02 AM, Segher Boessenkool wrote:
 -  if (GET_CODE (pat) == SET)
 +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
  return find_mem_ref (SET_SRC (pat), load_mem);
>>>
>>> So this now falls through if it is a SET of something else than a reg.
>>> Is that intentional?  If so, this should be in the changelog.
>>>
>>
>> Falling through eventually resulted in returning false, but no, wasn't 
>> really intentional to change the logic that way. I'll fix up both places to 
>> look like the following and resubmit.
>>
>>   if (GET_CODE (pat)== SET) 
>> if (REG_P (SET_DEST (pat)))
>>   return find_mem_ref (SET_SRC (pat), load_mem);
>> else
>>   return false;
> 
> Well, that isn't quiet it either, no?  It returns false for all stores
> then?

Yeah, I didn't word that clearly, meant I'd fix up both appropriately with the 
inner if-else logic.

Here's what it looks like now.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..ad856254f52 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18357,8 +18357,11 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_SRC (pat), load_mem);
+  if (GET_CODE (pat)== SET)
+if (REG_P (SET_DEST (pat)))
+  return find_mem_ref (SET_SRC (pat), load_mem);
+else
+  return false;
 
   if (GET_CODE (pat) == PARALLEL)
 {
@@ -18395,7 +18398,10 @@ is_store_insn1 (rtx pat, rtx *str_mem)
 return false;
 
   if (GET_CODE (pat) == SET)
-return find_mem_ref (SET_DEST (pat), str_mem);
+if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat)))
+  return find_mem_ref (SET_DEST (pat), str_mem);
+else
+  return false;
 
   if (GET_CODE (pat) == PARALLEL)
 {

Unless I'm totally not understanding the point you're trying to get across to 
me.

-Pat


Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
On 8/6/21 10:02 AM, Segher Boessenkool wrote:
> On Fri, Aug 06, 2021 at 09:47:40AM -0500, Pat Haugen wrote:
>> Add additional checks to verify destination[source] of a load[store]
>> instruction is a register.
> 
>>  * config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a
>>  register.
> 
> No colon before " (".
> 
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
>>if (!pat || pat == NULL_RTX)
>>  return false;
>>  
>> -  if (GET_CODE (pat) == SET)
>> +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
>>  return find_mem_ref (SET_SRC (pat), load_mem);
> 
> So this now falls through if it is a SET of something else than a reg.
> Is that intentional?  If so, this should be in the changelog.
> 

Falling through eventually resulted in returning false, but no, wasn't really 
intentional to change the logic that way. I'll fix up both places to look like 
the following and resubmit.

  if (GET_CODE (pat)== SET) 
if (REG_P (SET_DEST (pat)))
  return find_mem_ref (SET_SRC (pat), load_mem);
else
  return false;

-Pat


[PATCH, rs6000] Add additional checks when identifying load/store instructions

2021-08-06 Thread Pat Haugen via Gcc-patches
Add additional checks to verify destination[source] of a load[store]
instruction is a register.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for master?

-Pat


2021-08-06  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a
register.
(is_store_insn1): Verify source is a register.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..1460a0d7c5c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
 return find_mem_ref (SET_SRC (pat), load_mem);
 
   if (GET_CODE (pat) == PARALLEL)
@@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET
+  && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat
 return find_mem_ref (SET_DEST (pat), str_mem);
 
   if (GET_CODE (pat) == PARALLEL)


Re: [PATCH, rs6000] Add store fusion support for Power10

2021-08-04 Thread Pat Haugen via Gcc-patches
On 8/4/21 9:23 AM, Bill Schmidt wrote:
> Hi Pat,
> 
> Good stuff!  Comments below.
> 
> On 8/2/21 3:19 PM, Pat Haugen via Gcc-patches wrote:
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 279f00cc648..1460a0d7c5c 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p)
>>     && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0)
>>   rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD;
>>
>> +  if (TARGET_POWER10
>> +  && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
>> +    rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
>> +
>>     /* Turn off vector pair/mma options on non-power10 systems.  */
>>     else if (!TARGET_POWER10 && TARGET_MMA)
>>   {
>> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
>>     if (!pat || pat == NULL_RTX)
>>   return false;
>>
>> -  if (GET_CODE (pat) == SET)
>> +  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
>>   return find_mem_ref (SET_SRC (pat), load_mem);
> Looks like this is just an optimization to quickly discard stores, right?

Additional verification check to make sure destination is REG, yes. This will 
become a separate patch.

>>     if (GET_CODE (pat) == PARALLEL)
>> @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem)
>>     if (!pat || pat == NULL_RTX)
>>   return false;
>>
>> -  if (GET_CODE (pat) == SET)
>> +  if (GET_CODE (pat) == SET
>> +  && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat
>>   return find_mem_ref (SET_DEST (pat), str_mem);
> 
> 
> Similar question.

Similar answer. :)
> 
>>
>>     if (GET_CODE (pat) == PARALLEL)
>> @@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int 
>> lastpos)
>>     return cached_can_issue_more;
>>   }
>>
>> +/* Determine if INSN is a store to memory that can be fused with a similar
>> +   adjacent store.  */
>> +
>> +static bool
>> +is_fusable_store (rtx_insn *insn, rtx *str_mem)
>> +{
>> +  /* Exit early if not doing store fusion.  */
>> +  if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE))
>> +    return false;
>> +
>> +  /* Insn must be a non-prefixed base+disp form store.  */
>> +  if (is_store_insn (insn, str_mem)
>> +  && get_attr_prefixed (insn) == PREFIXED_NO
>> +  && get_attr_update (insn) == UPDATE_NO
>> +  && get_attr_indexed (insn) == INDEXED_NO)
>> +    {
>> +  /* Further restictions by mode and size.  */
>> +  machine_mode mode = GET_MODE (*str_mem);
>> +  HOST_WIDE_INT size;
>> +  if MEM_SIZE_KNOWN_P (*str_mem)
>> +    size = MEM_SIZE (*str_mem);
>> +  else
>> +    return false;
>> +
>> +  if INTEGRAL_MODE_P (mode)
>> +    {
>> +  /* Must be word or dword size.  */
>> +  return (size == 4 || size == 8);
>> +    }
>> +  else if FLOAT_MODE_P (mode)
>> +    {
>> +  /* Must be dword size.  */
>> +  return (size == 8);
>> +    }
>> +    }
>> +
>> +  return false;
>> +}
>> +
>> +/* Do Power10 specific reordering of the ready list.  */
>> +
>> +static int
>> +power10_sched_reorder (rtx_insn **ready, int lastpos)
>> +{
>> +  int pos;
>> +  rtx mem1, mem2;
>> +
>> +  /* Do store fusion during sched2 only.  */
>> +  if (!reload_completed)
>> +    return cached_can_issue_more;
>> +
>> +  /* If the prior insn finished off a store fusion pair then simply
>> + reset the counter and return, nothing more to do.  */
> 
> 
> Good comments throughout, thanks!
> 
>> +  if (load_store_pendulum != 0)
>> +    {
>> +  load_store_pendulum = 0;
>> +  return cached_can_issue_more;
>> +    }
>> +
>> +  /* Try to pair certain store insns to adjacent memory locations
>> + so that the hardware will fuse them to a single operation.  */
>> +  if (is_fusable_store (last_scheduled_insn, ))
>> +    {
>> +  /* A fusable store was just scheduled.  Scan the ready list for 
>> another
>> + store that it can fuse with.  */
>> +  pos = lastpos;
>> +  while (pos >= 0)
>> +    {
>> +  /* GPR stores can be ascending or descending offsets, FPR/VSR stores
> VSR?  I don't see how that applies here.

Scalar floating point store from VSX re

[PATCH, rs6000] Add store fusion support for Power10

2021-08-02 Thread Pat Haugen via Gcc-patches
Enable store fusion on Power10.

Use the SCHED_REORDER hook to implement Power10 specific ready list reordering.
As of now, pairing stores for store fusion is the only function being
performed.

Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for
master?

-Pat


2021-08-02  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
store fusion for Power10.
(is_load_insn1): Verify destination is a register.
(is_store_insn1): Verify source is a register.
(is_fusable_store): New.
(power10_sched_reorder): Likewise.
(rs6000_sched_reorder): Do Power10 specific reordering.
(rs6000_sched_reorder2): Likewise.
* config/rs6000/rs6000.opt: Add new option.



diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index 6758296c0fd..f5812da0184 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -90,7 +90,8 @@
 | OPTION_MASK_P10_FUSION_2LOGICAL  \
 | OPTION_MASK_P10_FUSION_LOGADD\
 | OPTION_MASK_P10_FUSION_ADDLOG\
-| OPTION_MASK_P10_FUSION_2ADD)
+| OPTION_MASK_P10_FUSION_2ADD  \
+| OPTION_MASK_P10_FUSION_2STORE)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
@@ -143,6 +144,7 @@
 | OPTION_MASK_P10_FUSION_LOGADD\
 | OPTION_MASK_P10_FUSION_ADDLOG\
 | OPTION_MASK_P10_FUSION_2ADD  \
+| OPTION_MASK_P10_FUSION_2STORE\
 | OPTION_MASK_HTM  \
 | OPTION_MASK_ISEL \
 | OPTION_MASK_MFCRF\
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..1460a0d7c5c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p)
   && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0)
 rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD;
 
+  if (TARGET_POWER10
+  && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0)
+rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE;
+
   /* Turn off vector pair/mma options on non-power10 systems.  */
   else if (!TARGET_POWER10 && TARGET_MMA)
 {
@@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat)))
 return find_mem_ref (SET_SRC (pat), load_mem);
 
   if (GET_CODE (pat) == PARALLEL)
@@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem)
   if (!pat || pat == NULL_RTX)
 return false;
 
-  if (GET_CODE (pat) == SET)
+  if (GET_CODE (pat) == SET
+  && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat
 return find_mem_ref (SET_DEST (pat), str_mem);
 
   if (GET_CODE (pat) == PARALLEL)
@@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos)
   return cached_can_issue_more;
 }
 
+/* Determine if INSN is a store to memory that can be fused with a similar
+   adjacent store.  */
+
+static bool
+is_fusable_store (rtx_insn *insn, rtx *str_mem)
+{
+  /* Exit early if not doing store fusion.  */
+  if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE))
+return false;
+
+  /* Insn must be a non-prefixed base+disp form store.  */
+  if (is_store_insn (insn, str_mem)
+  && get_attr_prefixed (insn) == PREFIXED_NO
+  && get_attr_update (insn) == UPDATE_NO
+  && get_attr_indexed (insn) == INDEXED_NO)
+{
+  /* Further restictions by mode and size.  */
+  machine_mode mode = GET_MODE (*str_mem);
+  HOST_WIDE_INT size;
+  if MEM_SIZE_KNOWN_P (*str_mem)
+   size = MEM_SIZE (*str_mem);
+  else
+   return false;
+
+  if INTEGRAL_MODE_P (mode)
+   {
+ /* Must be word or dword size.  */
+ return (size == 4 || size == 8);
+   }
+  else if FLOAT_MODE_P (mode)
+   {
+ /* Must be dword size.  */
+ return (size == 8);
+   }
+}
+
+  return false;
+}
+
+/* Do Power10 specific reordering of the ready list.  */
+
+static int
+power10_sched_reorder (rtx_insn **ready, int lastpos)
+{
+  int pos;
+  rtx mem1, mem2;
+
+  /* Do store fusion during sched2 only.  */
+  if (!reload_completed)
+return cached_can_issue_more;
+
+  /* If the prior insn finished off a 

PING: [RS6000] rotate and mask constants [PR94393]

2021-07-23 Thread Pat Haugen via Gcc-patches
Ping https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html

I've done a current bootstrap/regtest on powerpc64/powerpc64le with no 
regressions.

-Pat


Re: [PATCH, rs6000] Update Power10 scheduling description for fused instruction types

2021-06-28 Thread Pat Haugen via Gcc-patches
On 6/7/21 3:41 PM, Pat Haugen via Gcc-patches wrote:
> Update Power10 scheduling description for new fused instruction types.
> 
> Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for
> trunk?
> 
> -Pat
> 
> 
> 2021-06-07  Pat Haugen  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/power10.md (power10-fused-load, power10-fused-store,
>   power10-fused_alu, power10-fused-vec, power10-fused-branch): New.
> 
> 

Forgot to ask if ok to backport to GCC 11 since the new instruction types were 
backported.

-Pat



[PATCH, rs6000] Update Power10 scheduling description for fused instruction types

2021-06-07 Thread Pat Haugen via Gcc-patches
Update Power10 scheduling description for new fused instruction types.

Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for
trunk?

-Pat


2021-06-07  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/power10.md (power10-fused-load, power10-fused-store,
power10-fused_alu, power10-fused-vec, power10-fused-branch): New.



diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 665f0f22c62..0186ae95896 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -100,6 +100,11 @@ (define_insn_reservation "power10-load" 4
(eq_attr "cpu" "power10"))
   "DU_any_power10,LU_power10")
 
+(define_insn_reservation "power10-fused-load" 4
+  (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,LU_power10")
+
 (define_insn_reservation "power10-prefixed-load" 4
   (and (eq_attr "type" "load")
(eq_attr "update" "no")
@@ -176,6 +181,11 @@ (define_insn_reservation "power10-store" 0
(eq_attr "cpu" "power10"))
   "DU_any_power10,STU_power10")
 
+(define_insn_reservation "power10-fused-store" 0
+  (and (eq_attr "type" "fused_store_store")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,STU_power10")
+
 (define_insn_reservation "power10-prefixed-store" 0
   (and (eq_attr "type" "store,fpstore,vecstore")
(eq_attr "prefixed" "yes")
@@ -244,6 +254,11 @@ (define_insn_reservation "power10-alu" 2
 (define_bypass 4 "power10-alu"
 "power10-crlogical,power10-mfcr,power10-mfcrf")
 
+(define_insn_reservation "power10-fused_alu" 2
+  (and (eq_attr "type" "fused_arith_logical,fused_cmp_isel,fused_carry")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,EXU_power10")
+
 ; paddi
 (define_insn_reservation "power10-paddi" 2
   (and (eq_attr "type" "add")
@@ -403,6 +418,11 @@ (define_insn_reservation "power10-vec-2cyc" 2
(eq_attr "cpu" "power10"))
   "DU_any_power10,EXU_power10")
 
+(define_insn_reservation "power10-fused-vec" 2
+  (and (eq_attr "type" "fused_vector")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,EXU_power10")
+
 (define_insn_reservation "power10-veccmp" 3
   (and (eq_attr "type" "veccmp")
(eq_attr "cpu" "power10"))
@@ -490,6 +510,11 @@ (define_insn_reservation "power10-branch" 2
(eq_attr "cpu" "power10"))
   "DU_any_power10,STU_power10")
 
+(define_insn_reservation "power10-fused-branch" 3
+  (and (eq_attr "type" "fused_mtbc")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,STU_power10")
+
 
 ; Crypto
 (define_insn_reservation "power10-crypto" 4



Re: [PATCH, rs6000] Fix alias set of link reg save MEM

2021-06-02 Thread Pat Haugen via Gcc-patches
On 6/2/21 9:19 AM, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 08:23:48AM -0500, Pat Haugen wrote:
>> On 6/2/21 7:01 AM, Richard Biener wrote:
>>> So did you check the RTL (and alias-sets) produced by
>>> __builtin_return_address?  Test coverage might
>>> be low here and w/o scheduling opportunities to break things.
>>
>> __builtin_return_address creates it's own copy of the link reg to a pseudo 
>> upon function entry. It doesn't appear to try and "reuse" any LR copy/save 
>> location that might be generated via the prolog code. References to 
>> __builtin_return_address will then refer to that pseudo. So I don't see any 
>> connection between the prolog save code and __builtin_return_address.
> 
> That is for __builtin_return_address(0), the simple (and always working)
> one.  It is harder for non-zero arguments (although I don't see why
> those would not work, even with inlining).


Right, I realized after I sent the reply I was being a little too specific to 
__builtin_return_address(0). For non-zero args __builtin_return_address 
generates code to walk the appropriate number of stack frames back before 
loading the LR from the designated save area. All those mem references are 
using either frame-alias-set or 0 as their alias set, so we still should be 
fine. So regardless of the argument to the builtin, there is no connection 
between the current function's prologue LR save code and the code 
__builtin_return_address() would generate in the function.

-Pat


Re: [PATCH, rs6000] Fix alias set of link reg save MEM

2021-06-02 Thread Pat Haugen via Gcc-patches
On 6/2/21 7:01 AM, Richard Biener wrote:
> On Wed, Jun 2, 2021 at 1:15 PM Pat Haugen  wrote:
>>
>> On 6/2/21 1:51 AM, Richard Biener wrote:
>>> On Tue, Jun 1, 2021 at 10:37 PM Pat Haugen via Gcc-patches
>>>  wrote:
>>>>
>>>> Make sure link reg save MEM has frame alias set, to match other link reg
>>>> save/restore code.
>>>>
>>>> Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for
>>>> trunk?
>>>>
>>>> -Pat
>>>>
>>>>
>>>> 2021-06-01  Pat Haugen  
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use
>>>> gen_frame_store.
>>>>
>>>>
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-logue.c 
>>>> b/gcc/config/rs6000/rs6000-logue.c
>>>> index 13c00e740d6..07337c4836a 100644
>>>> --- a/gcc/config/rs6000/rs6000-logue.c
>>>> +++ b/gcc/config/rs6000/rs6000-logue.c
>>>> @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void)
>>>>if (!WORLD_SAVE_P (info) && info->lr_save_p
>>>>&& !cfun->machine->lr_is_wrapped_separately)
>>>>  {
>>>> -  rtx addr, reg, mem;
>>>> +  rtx reg;
>>>>
>>>>reg = gen_rtx_REG (Pmode, 0);
>>>>START_USE (0);
>>>> @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void)
>>>>if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR
>>>> | SAVE_NOINLINE_FPRS_SAVES_LR)))
>>>> {
>>>> - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx,
>>>> -  GEN_INT (info->lr_save_offset + frame_off));
>>>> - mem = gen_rtx_MEM (Pmode, addr);
>>>> - /* This should not be of rs6000_sr_alias_set, because of
>>>> -__builtin_return_address.  */
>>>
>>> I can't figure what this comment meant - did you?  Note the old code
>>> looks like it would end up with alias-set zero and thus more conservative
>>> than with using frame-alias-set so this is an optimization?
>>
>> No, I couldn't figure out the reasoning for the comment/code either. It's 
>> been that way since it was introduced in March 2000 as part of the “Merge in 
>> changes from newppc-branch.” patch. All other places where we save/restore 
>> the link reg use a MEM with frame-alias-set. This change is an optimization 
>> as you suspect in that it allows us to schedule non-aliased loads above the 
>> link reg store (which couldn't happen before due to use of alias-set zero).
> 
> So did you check the RTL (and alias-sets) produced by
> __builtin_return_address?  Test coverage might
> be low here and w/o scheduling opportunities to break things.

__builtin_return_address creates it's own copy of the link reg to a pseudo upon 
function entry. It doesn't appear to try and "reuse" any LR copy/save location 
that might be generated via the prolog code. References to 
__builtin_return_address will then refer to that pseudo. So I don't see any 
connection between the prolog save code and __builtin_return_address.

-Pat


Re: [PATCH, rs6000] Fix alias set of link reg save MEM

2021-06-02 Thread Pat Haugen via Gcc-patches
On 6/2/21 1:51 AM, Richard Biener wrote:
> On Tue, Jun 1, 2021 at 10:37 PM Pat Haugen via Gcc-patches
>  wrote:
>>
>> Make sure link reg save MEM has frame alias set, to match other link reg
>> save/restore code.
>>
>> Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for
>> trunk?
>>
>> -Pat
>>
>>
>> 2021-06-01  Pat Haugen  
>>
>> gcc/ChangeLog:
>>
>> * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use
>> gen_frame_store.
>>
>>
>>
>> diff --git a/gcc/config/rs6000/rs6000-logue.c 
>> b/gcc/config/rs6000/rs6000-logue.c
>> index 13c00e740d6..07337c4836a 100644
>> --- a/gcc/config/rs6000/rs6000-logue.c
>> +++ b/gcc/config/rs6000/rs6000-logue.c
>> @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void)
>>if (!WORLD_SAVE_P (info) && info->lr_save_p
>>&& !cfun->machine->lr_is_wrapped_separately)
>>  {
>> -  rtx addr, reg, mem;
>> +  rtx reg;
>>
>>reg = gen_rtx_REG (Pmode, 0);
>>START_USE (0);
>> @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void)
>>if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR
>> | SAVE_NOINLINE_FPRS_SAVES_LR)))
>> {
>> - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx,
>> -  GEN_INT (info->lr_save_offset + frame_off));
>> - mem = gen_rtx_MEM (Pmode, addr);
>> - /* This should not be of rs6000_sr_alias_set, because of
>> -__builtin_return_address.  */
> 
> I can't figure what this comment meant - did you?  Note the old code
> looks like it would end up with alias-set zero and thus more conservative
> than with using frame-alias-set so this is an optimization?

No, I couldn't figure out the reasoning for the comment/code either. It's been 
that way since it was introduced in March 2000 as part of the “Merge in changes 
from newppc-branch.” patch. All other places where we save/restore the link reg 
use a MEM with frame-alias-set. This change is an optimization as you suspect 
in that it allows us to schedule non-aliased loads above the link reg store 
(which couldn't happen before due to use of alias-set zero).

-Pat


[PATCH, rs6000] Fix alias set of link reg save MEM

2021-06-01 Thread Pat Haugen via Gcc-patches
Make sure link reg save MEM has frame alias set, to match other link reg
save/restore code.

Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for
trunk?

-Pat


2021-06-01  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use
gen_frame_store.



diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index 13c00e740d6..07337c4836a 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void)
   if (!WORLD_SAVE_P (info) && info->lr_save_p
   && !cfun->machine->lr_is_wrapped_separately)
 {
-  rtx addr, reg, mem;
+  rtx reg;
 
   reg = gen_rtx_REG (Pmode, 0);
   START_USE (0);
@@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void)
   if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR
| SAVE_NOINLINE_FPRS_SAVES_LR)))
{
- addr = gen_rtx_PLUS (Pmode, frame_reg_rtx,
-  GEN_INT (info->lr_save_offset + frame_off));
- mem = gen_rtx_MEM (Pmode, addr);
- /* This should not be of rs6000_sr_alias_set, because of
-__builtin_return_address.  */
-
- insn = emit_move_insn (mem, reg);
+ insn = emit_insn (gen_frame_store (reg, frame_reg_rtx,
+info->lr_save_offset + frame_off));
  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
NULL_RTX, NULL_RTX);
  END_USE (0);


Re: [PATCH, rs6000] Add ALTIVEC_REGS as pressure class

2021-05-10 Thread Pat Haugen via Gcc-patches
On 5/7/21 6:00 PM, Segher Boessenkool wrote:
>> --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
>> @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned 
>> long long y,
>>  /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
>>  /* { dg-final { scan-assembler-times "vslw" 1 } } */
>>  /* { dg-final { scan-assembler-times "vsld" 1 } } */
>> -/* { dg-final { scan-assembler-times "xxlor" 3 } } */
>> +/* { dg-final { scan-assembler-times "xxlor" 2 } } */
>>  /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */
>>  /* { dg-final { scan-assembler-times "vrldnm" 2 } } */
> So what is this replaced with?  Was it an "xxlmr" and it is just
> unnecessary now?

Different RA choice made the reg copy unnecessary.

<   xxspltib 0,8
<   xxlor 32,0,0
---
>   xxspltib 32,8

-Pat


[PATCH, rs6000] Add ALTIVEC_REGS as pressure class

2021-05-07 Thread Pat Haugen via Gcc-patches
Add ALTIVEC_REGS as pressure class.

Code that has heavy register pressure on Altivec registers can suffer from
over-aggressive scheduling during sched1, which then leads to increased
register spill. This is due to the fact that registers that prefer
ALTIVEC_REGS are currently assigned an allocno class of VSX_REGS. This then
misleads the scheduler to think there are 64 regs available, when in reality
there are only 32 Altivec regs. This patch fixes the problem by assigning an
allocno class of ALTIVEC_REGS and adding ALTIVEC_REGS as a pressure class.

Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Testing
on CPU2017 showed no significant differences. Ok for trunk?

-Pat


2021-05-07  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_ira_change_pseudo_allocno_class):
Return ALTIVEC_REGS if that is best_class.
(rs6000_compute_pressure_classes): Add ALTIVEC_REGS.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/fold-vec-insert-float-p9.c: Adjust instruction 
counts.
* gcc.target/powerpc/vec-rlmi-rlnm.c: Likewise.



diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 844fee8..fee4eef 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -22487,11 +22487,14 @@ rs6000_ira_change_pseudo_allocno_class (int regno 
ATTRIBUTE_UNUSED,
 of allocno class.  */
   if (best_class == BASE_REGS)
return GENERAL_REGS;
-  if (TARGET_VSX
- && (best_class == FLOAT_REGS || best_class == ALTIVEC_REGS))
+  if (TARGET_VSX && best_class == FLOAT_REGS)
return VSX_REGS;
   return best_class;
 
+case VSX_REGS:
+  if (best_class == ALTIVEC_REGS)
+   return ALTIVEC_REGS;
+
 default:
   break;
 }
@@ -23609,12 +23612,12 @@ rs6000_compute_pressure_classes (enum reg_class 
*pressure_classes)
 
   n = 0;
   pressure_classes[n++] = GENERAL_REGS;
+  if (TARGET_ALTIVEC)
+pressure_classes[n++] = ALTIVEC_REGS;
   if (TARGET_VSX)
 pressure_classes[n++] = VSX_REGS;
   else
 {
-  if (TARGET_ALTIVEC)
-   pressure_classes[n++] = ALTIVEC_REGS;
   if (TARGET_HARD_FLOAT)
pressure_classes[n++] = FLOAT_REGS;
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c
index 1c57672..4541768 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c
@@ -31,5 +31,5 @@ testf_cst (float f, vector float vf)
 /* { dg-final { scan-assembler-times {\mstfs\M} 2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlxv\M} 2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlvewx\M} 1 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\mvperm\M} 1 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\mxxperm\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mxxperm\M} 1 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c 
b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
index 1e7d739..5512c0f 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
@@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned 
long long y,
 /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
 /* { dg-final { scan-assembler-times "vslw" 1 } } */
 /* { dg-final { scan-assembler-times "vsld" 1 } } */
-/* { dg-final { scan-assembler-times "xxlor" 3 } } */
+/* { dg-final { scan-assembler-times "xxlor" 2 } } */
 /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */
 /* { dg-final { scan-assembler-times "vrldnm" 2 } } */


[PATCH, rs6000 V3] Update "prefix" attribute for Power10 [PR99133]

2021-03-30 Thread Pat Haugen via Gcc-patches
Update prefixed attribute for Power10.

This patch creates a new attribute, "maybe_prefixed", which is used to mark
those instructions that may have a prefixed form. The existing "prefixed"
attribute is now used to mark all instructions that are prefixed form.

This patch differs from the prior version in that it doesn't modify the
existing settings of the "prefixed" attribute but just adds the new attribute
and sets/tests it appropriately.

Bootstrap/regtest on powerpc64le (Power10) and powerpc64 (Power8 32/64) with no
new regressions. Ok for trunk?

-Pat


2021-03-30  Pat Haugen  

gcc/
PR target/99133
* config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst,
xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst,
xxblend_, xxpermx_inst, xxeval): Mark prefixed.
* config/rs6000/mma.md (mma_, mma_,
mma_, mma_, mma_, mma_,
mma_, mma_, mma_, mma_):
Likewise.
* config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test. 
* config/rs6000/rs6000.md (define_attr "maybe_prefixed"): New.
(define_attr "prefixed"): Update initializer.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 27a269b9e72..21f1cc6f15b 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -826,7 +826,8 @@ (define_insn "xxspltiw_v4si"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "yes")])
 
 (define_expand "xxspltiw_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -845,7 +846,8 @@ (define_insn "xxspltiw_v4sf_inst"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "yes")])
 
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -864,7 +866,8 @@ (define_insn "xxspltidp_v2df_inst"
 UNSPEC_XXSPLTID))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -893,7 +896,8 @@ (define_insn "xxsplti32dx_v4si_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "xxsplti32dx_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -921,7 +925,8 @@ (define_insn "xxsplti32dx_v4sf_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_insn "xxblend_"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
@@ -931,7 +936,8 @@ (define_insn "xxblend_"
UNSPEC_XXBLEND))]
   "TARGET_POWER10"
   "xxblendv %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "xxpermx"
   [(set (match_operand:V2DI 0 "register_operand" "+wa")
@@ -975,7 +981,8 @@ (define_insn "xxpermx_inst"
 UNSPEC_XXPERMX))]
   "TARGET_POWER10"
   "xxpermx %x0,%x1,%x2,%x3,%4"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "vstrir_"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
@@ -3623,7 +3630,8 @@ (define_insn "xxeval"
 UNSPEC_XXEVAL))]
"TARGET_POWER10"
"xxeval %0,%1,%2,%3,%4"
-   [(set_attr "type" "vecsimple")])
+   [(set_attr "type" "vecsimple")
+(set_attr "prefixed" "yes")])
 
 (define_expand "vec_unpacku_hi_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index a00d3a3de26..1f6fc03d2ac 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -540,7 +540,8 @@ (define_insn "mma_"
MMA_VVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -553,7 +554,8 @@ (define_insn "mma_"
MMA_AVVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -565,7 +567,8 @@ (define_insn "mma_"
MMA_VVI4I4I2))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")

Re: [PATCH, rs6000 V2] Update "prefix" attribute for Power10 [PR99133]

2021-03-18 Thread Pat Haugen via Gcc-patches
On 3/18/21 11:33 AM, will schmidt wrote:
> Per this change:
> 
> +;; Whether an insn is a prefixed insn.  A prefixed instruction has a prefix
> +;; instruction word that conveys additional information such as a larger
> +;; immediate, additional operands, etc., in addition to the normal 
> instruction
> +;; word.  The default "length" attribute will also be adjusted by default to
> +;; be 12 bytes.
> +(define_attr "prefixed" "no,yes"
> +  (if_then_else (eq_attr "prepend_prefixed_insn" "yes")
> + (const_string "yes")
> + (const_string "no")))
> 
> 
> .. it looks like at least most of the users of the "prefixed" attribute have
> been switched over to use "prepend_prefixed_insn" instead.   Are there still
> users of the "prefixed" attribute remaining ?  I'm guessing so, given context,
> but can't tell for certain.
> 
> (Just a question, not a specific request for a change)

Yes, there are still a couple uses of get_attr_prefixed() in rs6000.c, plus the 
Power10 scheduling description makes use of it.

-Pat


[PATCH, rs6000 V2] Update "prefix" attribute for Power10 [PR99133]

2021-03-17 Thread Pat Haugen via Gcc-patches
Update prefixed attribute for Power10.

This patch creates a new attribute, prepend_prefixed_insn, which is used to mark
those instructions that are prefixed and need to have a 'p' prepended to their
mnemonic at asm emit time. The existing "prefix" attribute is now used to mark
all instructions that are prefixed form.

Bootstrap/regtest on powerpc64le (Power10) and powerpc64 (Power8 32/64) with no
new regressions. Ok for trunk?

-Pat


2021-03-17  Pat Haugen  

gcc/
PR target/99133
* config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst,
xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst,
xxblend_, xxpermx_inst, xxeval): Mark prefixed.
* config/rs6000/mma.md (mma_, mma_,
mma_, mma_, mma_, mma_,
mma_, mma_, mma_, mma_):
Likewise.
* config/rs6000/pcrel-opt.md: Adjust attribute name.
* config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test. 
* config/rs6000/rs6000.md (define_attr "prepend_prefixed_insn"): New.
(define_attr "prefixed"): Update initializer.
(*tls_gd_pcrel, *tls_ld_pcrel, tls_dtprel_,
tls_tprel_, *tls_got_tprel_pcrel_, *pcrel_local_addr,
*pcrel_extern_addr, stack_protect_setdi, stack_protect_testdi):
Adjust attribute name.
* config/rs6000/sync.md (load_quadpti, store_quadpti): Likewise.


diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 27a269b9e72..21f1cc6f15b 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -826,7 +826,8 @@ (define_insn "xxspltiw_v4si"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "yes")])
 
 (define_expand "xxspltiw_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -845,7 +846,8 @@ (define_insn "xxspltiw_v4sf_inst"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "yes")])
 
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -864,7 +866,8 @@ (define_insn "xxspltidp_v2df_inst"
 UNSPEC_XXSPLTID))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -893,7 +896,8 @@ (define_insn "xxsplti32dx_v4si_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "xxsplti32dx_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -921,7 +925,8 @@ (define_insn "xxsplti32dx_v4sf_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_insn "xxblend_"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
@@ -931,7 +936,8 @@ (define_insn "xxblend_"
UNSPEC_XXBLEND))]
   "TARGET_POWER10"
   "xxblendv %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "xxpermx"
   [(set (match_operand:V2DI 0 "register_operand" "+wa")
@@ -975,7 +981,8 @@ (define_insn "xxpermx_inst"
 UNSPEC_XXPERMX))]
   "TARGET_POWER10"
   "xxpermx %x0,%x1,%x2,%x3,%4"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
 
 (define_expand "vstrir_"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
@@ -3623,7 +3630,8 @@ (define_insn "xxeval"
 UNSPEC_XXEVAL))]
"TARGET_POWER10"
"xxeval %0,%1,%2,%3,%4"
-   [(set_attr "type" "vecsimple")])
+   [(set_attr "type" "vecsimple")
+(set_attr "prefixed" "yes")])
 
 (define_expand "vec_unpacku_hi_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index a00d3a3de26..1f6fc03d2ac 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -540,7 +540,8 @@ (define_insn "mma_"
MMA_VVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -553,7 +554,8 @@ (define_insn "mma_"
MMA_AVVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -565,7 +567,8 @@ (define_insn "mma_"
   

Re: [PATCH, rs6000] Update "size" attribute for Power10

2021-03-02 Thread Pat Haugen via Gcc-patches
Ping3

On 2/18/21 2:30 PM, Pat Haugen via Gcc-patches wrote:
> Ping2
> 
> On 1/26/21 11:27 AM, Pat Haugen via Gcc-patches wrote:
>> Ping
>>
>> On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote:
>>> Update size attribute for Power10.
>>>
>>>
>>> This patch was broken out from my larger patch to update various attributes 
>>> for
>>> Power10, in order to make the review process hopefully easier. This patch 
>>> only
>>> updates the size attribute for various new instructions. There were no 
>>> changes
>>> requested to this portion of the original patch, so nothing is new here.
>>>
>>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. 
>>> Ok for trunk?
>>>
>>> -Pat
>>>
>>>
>>> 2020-11-08  Pat Haugen  
>>>
>>> gcc/
>>> * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1,
>>> floatditd2, ftrunc2, fixdi2, dfp_ddedpd_,
>>> dfp_denbcd_, dfp_dxex_, dfp_diex_,
>>> *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size
>>> attribute for Power10.
>>> * config/rs6000/mma.md (*movoo): Likewise.
>>> * config/rs6000/rs6000.md (define_attr "size"): Add 256.
>>> (define_mode_attr bits): Add DD/TD modes.
>>> * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti,
>>> store_conditionalpti): Update size attribute for Power10.
>>>
>>
> 



Re: [PATCH, rs6000] Add Power10 scheduling description

2021-03-02 Thread Pat Haugen via Gcc-patches
Ping3

On 2/18/21 2:31 PM, Pat Haugen via Gcc-patches wrote:
> Ping2.
> 
> On 1/26/21 11:30 AM, Pat Haugen via Gcc-patches wrote:
>> Ping.
>>
>> On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote:
>>> Add Power10 scheduling description.
>>>
>>> This patch adds the Power10 scheduling description. Since power10.md was 
>>> pretty much a complete rewrite (existing version of power10.md is mostly 
>>> just a copy of power9.md), I diffed power10.md with /dev/null so that the 
>>> full contents of the file are shown as opposed to a diff. This should make 
>>> it easier to read. This patch will not apply on current trunk do to that 
>>> reason.
>>>  
>>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. 
>>> Ok for trunk?
>>>
>>> -Pat
>>>
>>>
>>> 2020-11-13  Pat Haugen  
>>>
>>> gcc/
>>> * config/rs6000/rs6000.c (struct processor_costs): New.
>>> (rs6000_option_override_internal): Set Power10 costs.
>>> (rs6000_issue_rate): Set Power10 issue rate.
>>> * config/rs6000/power10.md: Rewrite for Power10.
>>>
>>
> 



[PATCH, rs6000] Rename variable for clarity

2021-02-23 Thread Pat Haugen via Gcc-patches
Rename next_insn_prefixed_p for improved clarity.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

-Pat


2021-02-22  Pat Haugen  

gcc/
* config/rs6000/rs6000.c (next_insn_prefixed_p): Rename.
(rs6000_final_prescan_insn): Adjust.
(rs6000_asm_output_opcode): Likewise.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ec068c58aa5..4e608073358 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26191,7 +26191,7 @@ prefixed_paddi_p (rtx_insn *insn)
 
 /* Whether the next instruction needs a 'p' prefix issued before the
instruction is printed out.  */
-static bool next_insn_prefixed_p;
+static bool prepend_p_to_next_insn;
 
 /* Define FINAL_PRESCAN_INSN if some processing needs to be done before
outputting the assembler code.  On the PowerPC, we remember if the current
@@ -26202,7 +26202,7 @@ static bool next_insn_prefixed_p;
 void
 rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int)
 {
-  next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO);
+  prepend_p_to_next_insn = (get_attr_prefixed (insn) != PREFIXED_NO);
   return;
 }
 
@@ -26212,7 +26212,7 @@ rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int)
 void
 rs6000_asm_output_opcode (FILE *stream)
 {
-  if (next_insn_prefixed_p)
+  if (prepend_p_to_next_insn)
 fprintf (stream, "p");
 
   return;


Re: [PATCH, rs6000] Add Power10 scheduling description

2021-02-18 Thread Pat Haugen via Gcc-patches
Ping2.

On 1/26/21 11:30 AM, Pat Haugen via Gcc-patches wrote:
> Ping.
> 
> On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote:
>> Add Power10 scheduling description.
>>
>> This patch adds the Power10 scheduling description. Since power10.md was 
>> pretty much a complete rewrite (existing version of power10.md is mostly 
>> just a copy of power9.md), I diffed power10.md with /dev/null so that the 
>> full contents of the file are shown as opposed to a diff. This should make 
>> it easier to read. This patch will not apply on current trunk do to that 
>> reason.
>>  
>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. 
>> Ok for trunk?
>>
>> -Pat
>>
>>
>> 2020-11-13  Pat Haugen  
>>
>> gcc/
>>  * config/rs6000/rs6000.c (struct processor_costs): New.
>>  (rs6000_option_override_internal): Set Power10 costs.
>>  (rs6000_issue_rate): Set Power10 issue rate.
>>  * config/rs6000/power10.md: Rewrite for Power10.
>>
> 



Re: [PATCH, rs6000] Update "prefix" attribute for Power10

2021-02-18 Thread Pat Haugen via Gcc-patches
Ping2.

On 1/26/21 11:28 AM, Pat Haugen via Gcc-patches wrote:
> Ping.
> 
> On 12/10/20 3:32 PM, Pat Haugen via Gcc-patches wrote:
>> Update prefixed attribute for Power10.
>>
>>
>> This patch was broken out from my larger patch to update various attributes 
>> for
>> Power10, in order to make the review process hopefully easier. This patch 
>> only
>> updates the prefix attribute for various new instructions. Changes in this
>> version include missed updates to rs6000_insn_cost and
>> rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added
>> additional commentary so hopefully is more clear.
>>
>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. 
>> Ok for trunk?
>>
>> -Pat
>>
>>
>> 2020-11-10  Pat Haugen  
>>
>> gcc/
>>  * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst,
>>  xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst,
>>  xxblend_, xxpermx_inst, xxeval): Mark prefixed "always".
>>  * config/rs6000/mma.md (mma_, mma_,
>>  mma_, mma_, mma_, mma_,
>>  mma_, mma_, mma_, mma_):
>>  Likewise.
>>  * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed
>>  insn.
>>  (next_insn_prefixed_p): Rename to prefix_next_insn_p.
>>  (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES.
>>  (rs6000_asm_output_opcode): Adjust.
>>  (rs6000_adjust_insn_length): Update test for prefixed insns.
>>  * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always'
>>  and update commentary.
>>
> 



Re: [PATCH, rs6000] Update "size" attribute for Power10

2021-02-18 Thread Pat Haugen via Gcc-patches
Ping2

On 1/26/21 11:27 AM, Pat Haugen via Gcc-patches wrote:
> Ping
> 
> On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote:
>> Update size attribute for Power10.
>>
>>
>> This patch was broken out from my larger patch to update various attributes 
>> for
>> Power10, in order to make the review process hopefully easier. This patch 
>> only
>> updates the size attribute for various new instructions. There were no 
>> changes
>> requested to this portion of the original patch, so nothing is new here.
>>
>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. 
>> Ok for trunk?
>>
>> -Pat
>>
>>
>> 2020-11-08  Pat Haugen  
>>
>> gcc/
>>  * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1,
>>  floatditd2, ftrunc2, fixdi2, dfp_ddedpd_,
>>  dfp_denbcd_, dfp_dxex_, dfp_diex_,
>>  *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size
>>  attribute for Power10.
>>  * config/rs6000/mma.md (*movoo): Likewise.
>>  * config/rs6000/rs6000.md (define_attr "size"): Add 256.
>>  (define_mode_attr bits): Add DD/TD modes.
>>  * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti,
>>  store_conditionalpti): Update size attribute for Power10.
>>
> 



Re: [PATCH, rs6000] Add Power10 scheduling description

2021-01-26 Thread Pat Haugen via Gcc-patches
Ping.

On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote:
> Add Power10 scheduling description.
> 
> This patch adds the Power10 scheduling description. Since power10.md was 
> pretty much a complete rewrite (existing version of power10.md is mostly just 
> a copy of power9.md), I diffed power10.md with /dev/null so that the full 
> contents of the file are shown as opposed to a diff. This should make it 
> easier to read. This patch will not apply on current trunk do to that reason.
>  
> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
> for trunk?
> 
> -Pat
> 
> 
> 2020-11-13  Pat Haugen  
> 
> gcc/
>   * config/rs6000/rs6000.c (struct processor_costs): New.
>   (rs6000_option_override_internal): Set Power10 costs.
>   (rs6000_issue_rate): Set Power10 issue rate.
>   * config/rs6000/power10.md: Rewrite for Power10.
> 



Re: [PATCH, rs6000] Update "prefix" attribute for Power10

2021-01-26 Thread Pat Haugen via Gcc-patches
Ping.

On 12/10/20 3:32 PM, Pat Haugen via Gcc-patches wrote:
> Update prefixed attribute for Power10.
> 
> 
> This patch was broken out from my larger patch to update various attributes 
> for
> Power10, in order to make the review process hopefully easier. This patch only
> updates the prefix attribute for various new instructions. Changes in this
> version include missed updates to rs6000_insn_cost and
> rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added
> additional commentary so hopefully is more clear.
> 
> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
> for trunk?
> 
> -Pat
> 
> 
> 2020-11-10  Pat Haugen  
> 
> gcc/
>   * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst,
>   xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst,
>   xxblend_, xxpermx_inst, xxeval): Mark prefixed "always".
>   * config/rs6000/mma.md (mma_, mma_,
>   mma_, mma_, mma_, mma_,
>   mma_, mma_, mma_, mma_):
>   Likewise.
>   * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed
>   insn.
>   (next_insn_prefixed_p): Rename to prefix_next_insn_p.
>   (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES.
>   (rs6000_asm_output_opcode): Adjust.
>   (rs6000_adjust_insn_length): Update test for prefixed insns.
>   * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always'
>   and update commentary.
> 



Re: [PATCH, rs6000] Update "size" attribute for Power10

2021-01-26 Thread Pat Haugen via Gcc-patches
Ping

On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote:
> Update size attribute for Power10.
> 
> 
> This patch was broken out from my larger patch to update various attributes 
> for
> Power10, in order to make the review process hopefully easier. This patch only
> updates the size attribute for various new instructions. There were no changes
> requested to this portion of the original patch, so nothing is new here.
> 
> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
> for trunk?
> 
> -Pat
> 
> 
> 2020-11-08  Pat Haugen  
> 
> gcc/
>   * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1,
>   floatditd2, ftrunc2, fixdi2, dfp_ddedpd_,
>   dfp_denbcd_, dfp_dxex_, dfp_diex_,
>   *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size
>   attribute for Power10.
>   * config/rs6000/mma.md (*movoo): Likewise.
>   * config/rs6000/rs6000.md (define_attr "size"): Add 256.
>   (define_mode_attr bits): Add DD/TD modes.
>   * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti,
>   store_conditionalpti): Update size attribute for Power10.
> 



Re: [PATCH,rs6000] Fusion patterns for logical-logical

2020-12-21 Thread Pat Haugen via Gcc-patches
On 12/10/20 8:41 PM, acsawdey--- via Gcc-patches wrote:
> +  [(set_attr "type" "logical")

Similar to load-cmp fusion pairs, we need a new insn type here.

-Pat


Re: [PATCH,rs6000] Combine patterns for p10 load-cmpi fusion

2020-12-21 Thread Pat Haugen via Gcc-patches
On 12/4/20 1:19 PM, acsawdey--- via Gcc-patches wrote:
> +   print "  [(set_attr \"type\" \"load\")\n";

We need to tag these with a new instruction type, such as 'fused-load-cmp', so 
the scheduler can distinguish them from normal loads.

-Pat


[PATCH, rs6000] Don't set MMA prefixed instruction length to 8

2020-12-14 Thread Pat Haugen via Gcc-patches
Fix instruction length for MMA insns.


Prefixed instructions should not have their length explicitly set to '8'. The 
function get_attr_length() will adjust the length appropriately based on the 
value of the "prefixed" attribute.

Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
for trunk?

-Pat


2020-12-14  Pat Haugen  

gcc/
* config/rs6000/mma.md (*movxo, mma_, mma_,
mma_, mma_, mma_, mma_,
mma_, mma_, mma_, mma_):
Remove explicit setting of length attribute.
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 4d291c42f7a..ef4394416b6 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -318,7 +318,7 @@ (define_insn_and_split "*movxo"
   DONE;
 }
   [(set_attr "type" "vecload,vecstore,veclogical")
-   (set_attr "length" "8,8,16")
+   (set_attr "length" "*,*,16")
(set_attr "max_prefixed_insns" "2,2,*")])
 
 (define_expand "mma_assemble_pair"
@@ -539,8 +539,7 @@ (define_insn "mma_"
MMA_VVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -553,8 +552,7 @@ (define_insn "mma_"
MMA_AVVI4I4I8))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -566,8 +564,7 @@ (define_insn "mma_"
MMA_VVI4I4I2))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -580,8 +577,7 @@ (define_insn "mma_"
MMA_AVVI4I4I2))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -592,8 +588,7 @@ (define_insn "mma_"
MMA_VVI4I4))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -605,8 +600,7 @@ (define_insn "mma_"
MMA_AVVI4I4))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -617,8 +611,7 @@ (define_insn "mma_"
MMA_PVI4I2))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -630,8 +623,7 @@ (define_insn "mma_"
MMA_APVI4I2))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -643,8 +635,7 @@ (define_insn "mma_"
MMA_VVI4I4I4))]
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])
 
 (define_insn "mma_"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=")
@@ -657,5 +648,4 @@ (define_insn "mma_"
MMA_AVVI4I4I4))]
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
-  [(set_attr "type" "mma")
-   (set_attr "length" "8")])
+  [(set_attr "type" "mma")])


[PATCH, rs6000] Update "prefix" attribute for Power10

2020-12-10 Thread Pat Haugen via Gcc-patches
Update prefixed attribute for Power10.


This patch was broken out from my larger patch to update various attributes for
Power10, in order to make the review process hopefully easier. This patch only
updates the prefix attribute for various new instructions. Changes in this
version include missed updates to rs6000_insn_cost and
rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added
additional commentary so hopefully is more clear.

Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
for trunk?

-Pat


2020-11-10  Pat Haugen  

gcc/
* config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst,
xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst,
xxblend_, xxpermx_inst, xxeval): Mark prefixed "always".
* config/rs6000/mma.md (mma_, mma_,
mma_, mma_, mma_, mma_,
mma_, mma_, mma_, mma_):
Likewise.
* config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed
insn.
(next_insn_prefixed_p): Rename to prefix_next_insn_p.
(rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES.
(rs6000_asm_output_opcode): Adjust.
(rs6000_adjust_insn_length): Update test for prefixed insns.
* config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always'
and update commentary.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..fc926f7a7aa 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -828,7 +828,8 @@ (define_insn "xxspltiw_v4si"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "always")])
 
 (define_expand "xxspltiw_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -847,7 +848,8 @@ (define_insn "xxspltiw_v4sf_inst"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "always")])
 
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -866,7 +868,8 @@ (define_insn "xxspltidp_v2df_inst"
 UNSPEC_XXSPLTID))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -895,7 +898,8 @@ (define_insn "xxsplti32dx_v4si_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxsplti32dx_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -923,7 +927,8 @@ (define_insn "xxsplti32dx_v4sf_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "always")])
 
 (define_insn "xxblend_"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
@@ -933,7 +938,8 @@ (define_insn "xxblend_"
UNSPEC_XXBLEND))]
   "TARGET_POWER10"
   "xxblendv %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxpermx"
   [(set (match_operand:V2DI 0 "register_operand" "+wa")
@@ -977,7 +983,8 @@ (define_insn "xxpermx_inst"
 UNSPEC_XXPERMX))]
   "TARGET_POWER10"
   "xxpermx %x0,%x1,%x2,%x3,%4"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "always")])
 
 (define_expand "vstrir_"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
@@ -3625,7 +3632,8 @@ (define_insn "xxeval"
 UNSPEC_XXEVAL))]
"TARGET_POWER10"
"xxeval %0,%1,%2,%3,%4"
-   [(set_attr "type" "vecsimple")])
+   [(set_attr "type" "vecsimple")
+(set_attr "prefixed" "always")])
 
 (define_expand "vec_unpacku_hi_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 4d291c42f7a..a87a1eb132c 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -540,6 +540,7 @@ (define_insn "mma_"
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
+   (set_attr "prefixed" "always")
(set_attr "length" "8")])
 
 (define_insn "mma_"
@@ -554,6 +555,7 @@ (define_insn "mma_"
   "TARGET_MMA"
   " %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
+   (set_attr "prefixed" "always")
(set_attr "length" "8")])
 
 (define_insn "mma_"
@@ -567,6 +569,7 @@ (define_insn "mma_"
   "TARGET_MMA"
   " %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
+   (set_attr "prefixed" "always")
(set_attr "length" "8")])
 
 (define_insn "mma_"
@@ -581,6 +584,7 @@ 

[PATCH, rs6000] Update "size" attribute for Power10

2020-12-08 Thread Pat Haugen via Gcc-patches
Update size attribute for Power10.


This patch was broken out from my larger patch to update various attributes for
Power10, in order to make the review process hopefully easier. This patch only
updates the size attribute for various new instructions. There were no changes
requested to this portion of the original patch, so nothing is new here.

Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
for trunk?

-Pat


2020-11-08  Pat Haugen  

gcc/
* config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1,
floatditd2, ftrunc2, fixdi2, dfp_ddedpd_,
dfp_denbcd_, dfp_dxex_, dfp_diex_,
*dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size
attribute for Power10.
* config/rs6000/mma.md (*movoo): Likewise.
* config/rs6000/rs6000.md (define_attr "size"): Add 256.
(define_mode_attr bits): Add DD/TD modes.
* config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti,
store_conditionalpti): Update size attribute for Power10.

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 9a952300cd6..7562e63a919 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -139,7 +139,8 @@ (define_insn "extendddtd2"
(float_extend:TD (match_operand:DD 1 "gpc_reg_operand" "d")))]
   "TARGET_DFP"
   "dctqpq %0,%1"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "128")])
 
 ;; The result of drdpq is an even/odd register pair with the converted
 ;; value in the even register and zero in the odd register.
@@ -153,6 +154,7 @@ (define_insn "trunctddd2"
   "TARGET_DFP"
   "drdpq %2,%1\;fmr %0,%2"
   [(set_attr "type" "dfp")
+   (set_attr "size" "128")
(set_attr "length" "8")])
 
 (define_insn "trunctdsd2"
@@ -206,7 +208,8 @@ (define_insn "*cmp_internal1"
  (match_operand:DDTD 2 "gpc_reg_operand" "d")))]
   "TARGET_DFP"
   "dcmpu %0,%1,%2"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 (define_insn "floatdidd2"
   [(set (match_operand:DD 0 "gpc_reg_operand" "=d")
@@ -220,7 +223,8 @@ (define_insn "floatditd2"
(float:TD (match_operand:DI 1 "gpc_reg_operand" "d")))]
   "TARGET_DFP"
   "dcffixq %0,%1"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "128")])
 
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
@@ -230,7 +234,8 @@ (define_insn "ftrunc2"
(fix:DDTD (match_operand:DDTD 1 "gpc_reg_operand" "d")))]
   "TARGET_DFP"
   "drintn. 0,%0,%1,1"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 ;; Convert a decimal64/128 whose value is an integer to an actual integer.
 ;; This is the second stage of converting decimal float to integer type.
@@ -240,7 +245,8 @@ (define_insn "fixdi2"
(fix:DI (match_operand:DDTD 1 "gpc_reg_operand" "d")))]
   "TARGET_DFP"
   "dctfix %0,%1"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 ;; Decimal builtin support
 
@@ -262,7 +268,8 @@ (define_insn "dfp_ddedpd_"
 UNSPEC_DDEDPD))]
   "TARGET_DFP"
   "ddedpd %1,%0,%2"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 (define_insn "dfp_denbcd_"
   [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
@@ -271,7 +278,8 @@ (define_insn "dfp_denbcd_"
 UNSPEC_DENBCD))]
   "TARGET_DFP"
   "denbcd %1,%0,%2"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 (define_insn "dfp_denbcd_v16qi_inst"
   [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
@@ -301,7 +309,8 @@ (define_insn "dfp_dxex_"
   UNSPEC_DXEX))]
   "TARGET_DFP"
   "dxex %0,%1"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 (define_insn "dfp_diex_"
   [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
@@ -310,7 +319,8 @@ (define_insn "dfp_diex_"
 UNSPEC_DXEX))]
   "TARGET_DFP"
   "diex %0,%1,%2"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 (define_expand "dfptstsfi__"
   [(set (match_dup 3)
@@ -349,7 +359,8 @@ (define_insn "*dfp_sgnfcnc_"
 operands[1] = GEN_INT (63);
   return "dtstsfi %0,%1,%2";
 }
-  [(set_attr "type" "fp")])
+  [(set_attr "type" "fp")
+   (set_attr "size" "")])
 
 (define_insn "dfp_dscli_"
   [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
@@ -358,7 +369,8 @@ (define_insn "dfp_dscli_"
 UNSPEC_DSCLI))]
   "TARGET_DFP"
   "dscli %0,%1,%2"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
 
 (define_insn "dfp_dscri_"
   [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
@@ -367,4 +379,5 @@ (define_insn "dfp_dscri_"
 UNSPEC_DSCRI))]
   "TARGET_DFP"
   "dscri %0,%1,%2"
-  [(set_attr "type" "dfp")])
+  [(set_attr "type" 

Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions

2020-11-24 Thread Pat Haugen via Gcc-patches
On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
>> +
>> +(define_insn "dives_"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
>> +(match_operand:VIlong 2 "vsx_register_operand" "v")]
>> +   UNSPEC_VDIVES))]
>> +  "TARGET_POWER10"
>> +  "vdives %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "diveu_"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
>> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
>> +UNSPEC_VDIVEU))]
>> +  "TARGET_POWER10"
>> +  "vdiveu %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "div3"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> +(match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vdivs %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "udiv3"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> +(match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vdivu %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "mods_"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> +(match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vmods %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
>> +
>> +(define_insn "modu_"
>> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
>> +(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
>> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
>> +  "TARGET_POWER10"
>> +  "vmodu %0,%1,%2"
>> +  [(set_attr "type" "vecdiv")
>> +   (set_attr "size" "128")])
> 
> We should only be setting "size" "128" for instructions that operate on 
> scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are 
> either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size 
> 128. If you want to set the size based on mode, (set_attr "size" "") 
> should do the trick I believe.

Well, after you update "(define_mode_attr bits" in rs6000.md for V2DI/V4SI.



Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions

2020-11-24 Thread Pat Haugen via Gcc-patches
On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> +
> +(define_insn "dives_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "diveu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")]
> + UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "div3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "udiv3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "mods_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "modu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])

We should only be setting "size" "128" for instructions that operate on scalar 
128-bit data items (i.e. 'vdivesq' etc). Since the above insns are either 
V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size 128. If 
you want to set the size based on mode, (set_attr "size" "") should do 
the trick I believe.

-Pat


Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-19 Thread Pat Haugen via Gcc-patches
On 11/4/20 10:44 AM, Carl Love via Gcc-patches wrote:
> +
> +(define_insn "vdives_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VDIVES))]
> +  "TARGET_POWER10"
> +  "vdives %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vdiveu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")]
> + UNSPEC_VDIVEU))]
> +  "TARGET_POWER10"
> +  "vdiveu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "div3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivs %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "udiv3"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vdivu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmods_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmods %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmodu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
> +  (match_operand:VIlong 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmodu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Since the vdiv.../vmod... instructions execute in the fixed point divide unit, 
all the above instructions should have a type of "div" instead of "vecsimple".


> +
> +(define_insn "vmulhs_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHS))]
> +  "TARGET_POWER10"
> +  "vmulhs %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])
> +
> +(define_insn "vmulhu_"
> +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v")
> + (match_operand:VIlong 2 "vsx_register_operand" "v")]
> +UNSPEC_VMULHU))]
> +  "TARGET_POWER10"
> +  "vmulhu %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])> +
> +;; Vector multiply low double word
> +(define_insn "mulv2di3"
> +  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
> + (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
> +(match_operand:V2DI 2 "vsx_register_operand" "v")))]
> +  "TARGET_POWER10"
> +  "vmulld %0,%1,%2"
> +  [(set_attr "type" "vecsimple")])

Similarly, the above 3 insns should have a "mul" instruction type.

-Pat


Re: [PATCH, rs6000] Add Power10 scheduling description

2020-11-18 Thread Pat Haugen via Gcc-patches
On 11/17/20 10:31 PM, will schmidt wrote:
> On Fri, 2020-11-13 at 16:04 -0600, Pat Haugen via Gcc-patches wrote:
>> +(define_automaton "power10dsp,power10issue,power10div")
>> +
>> +; Decode/dispatch slots
>> +(define_cpu_unit "du0_power10,du1_power10,du2_power10,du3_power10,
>> +  du4_power10,du5_power10,du6_power10,du7_power10" "power10dsp")
>> +
>> +; Four execution units
>> +(define_cpu_unit "exu0_power10,exu1_power10,exu2_power10,exu3_power10"
>> + "power10issue")
>> +; Two load units and two store units
>> +(define_cpu_unit "lu0_power10,lu1_power10" "power10issue")
>> +(define_cpu_unit "stu0_power10,stu1_power10" "power10issue")
>> +; Create false units for use by non-pipelined div/sqrt
>> +(define_cpu_unit "fx_div0_power10,fx_div1_power10" "power10div")
>> +(define_cpu_unit "fp_div0_power10,fp_div1_power10,fp_div2_power10,
>> +  fp_div3_power10" "power10div")
> 
> The spacing catches my eye, I'd want to add spaces around those commas,
> etc.   But.. this appears to be consistent with behavior
> as seen in the
> existing power9.md ; power9.md ; etc. 
> So it's either this way per necessity, or this way per history.
> Either way, no change requested here given that precedence.
> (If this and
> the older stuff also needs to be cosmetically tweaked, that can be
> handled later on..)

Yeah, just historical.


>> +; Load Unit
>> +(define_insn_reservation "power10-load" 4
>> +  (and (eq_attr "type" "load")
>> +   (eq_attr "update" "no")
>> +   (eq_attr "size" "!128")
>> +   (eq_attr "prefixed" "no")
>> +   (eq_attr "cpu" "power10"))
>> +  "DU_any_power10,LU_power10")
>> +
>> +(define_insn_reservation "power10-prefixed-load" 4
>> +  (and (eq_attr "type" "load")
>> +   (eq_attr "update" "no")
>> +   (eq_attr "size" "!128")
>> +   (eq_attr "prefixed" "!no")
> 
> I'm sure there is reason, but remind me..  "!no" versus "yes" ?

>From my prior patch, the prefixed attribute can now have values no/yes/always, 
>so '!no' means 'yes' || 'always'.


>> +
>> +; DFP
>> +; Use the minimum 12 cycle latency for all insns, even though some are more
> 
> ".. for all DFP insns"
> Since you specify this is a minimum, can prob drop "even though some
> are more".

ok



[PATCH, rs6000] Add Power10 scheduling description

2020-11-13 Thread Pat Haugen via Gcc-patches
Add Power10 scheduling description.

This patch adds the Power10 scheduling description. Since power10.md was pretty 
much a complete rewrite (existing version of power10.md is mostly just a copy 
of power9.md), I diffed power10.md with /dev/null so that the full contents of 
the file are shown as opposed to a diff. This should make it easier to read. 
This patch will not apply on current trunk do to that reason.
 
Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
for trunk?

-Pat


2020-11-13  Pat Haugen  

gcc/
* config/rs6000/rs6000.c (struct processor_costs): New.
(rs6000_option_override_internal): Set Power10 costs.
(rs6000_issue_rate): Set Power10 issue rate.
* config/rs6000/power10.md: Rewrite for Power10.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4d528a39a37..85bb42d6dce 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1080,6 +1080,26 @@ struct processor_costs power9_cost = {
   COSTS_N_INSNS (3),   /* SF->DF convert */
 };
 
+/* Instruction costs on POWER10 processors.  */
+static const
+struct processor_costs power10_cost = {
+  COSTS_N_INSNS (1),   /* mulsi */
+  COSTS_N_INSNS (1),   /* mulsi_const */
+  COSTS_N_INSNS (1),   /* mulsi_const9 */
+  COSTS_N_INSNS (1),   /* muldi */
+  COSTS_N_INSNS (4),   /* divsi */
+  COSTS_N_INSNS (4),   /* divdi */
+  COSTS_N_INSNS (2),   /* fp */
+  COSTS_N_INSNS (2),   /* dmul */
+  COSTS_N_INSNS (7),   /* sdiv */
+  COSTS_N_INSNS (9),   /* ddiv */
+  128, /* cache line size */
+  32,  /* l1 cache */
+  512, /* l2 cache */
+  16,  /* prefetch streams */
+  COSTS_N_INSNS (2),   /* SF->DF convert */
+};
+
 /* Instruction costs on POWER A2 processors.  */
 static const
 struct processor_costs ppca2_cost = {
@@ -4734,10 +4754,13 @@ rs6000_option_override_internal (bool global_init_p)
break;
 
   case PROCESSOR_POWER9:
-  case PROCESSOR_POWER10:
rs6000_cost = _cost;
break;
 
+  case PROCESSOR_POWER10:
+   rs6000_cost = _cost;
+   break;
+
   case PROCESSOR_PPCA2:
rs6000_cost = _cost;
break;
@@ -18001,8 +18024,9 @@ rs6000_issue_rate (void)
   case PROCESSOR_POWER8:
 return 7;
   case PROCESSOR_POWER9:
-  case PROCESSOR_POWER10:
 return 6;
+  case PROCESSOR_POWER10:
+return 8;
   default:
 return 1;
   }
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
new file mode 100644
index 000..f9ca4cbf10e
--- /dev/null
+++ b/gcc/config/rs6000/power10.md
@@ -0,0 +1,553 @@
+;; Scheduling description for the IBM POWER10 processor.
+;; Copyright (C) 2020-2020 Free Software Foundation, Inc.
+;;
+;; Contributed by Pat Haugen (pthau...@us.ibm.com).
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+; For Power10 we model (and try to pack) the in-order decode/dispatch groups
+; which consist of 8 instructions max.  We do not try to model the details of
+; the out-of-order issue queues and how insns flow to the various execution
+; units except for the simple representation of the issue limitation of at
+; most 4 insns to the execution units/2 insns to the load units/2 insns to
+; the store units.
+(define_automaton "power10dsp,power10issue,power10div")
+
+; Decode/dispatch slots
+(define_cpu_unit "du0_power10,du1_power10,du2_power10,du3_power10,
+ du4_power10,du5_power10,du6_power10,du7_power10" "power10dsp")
+
+; Four execution units
+(define_cpu_unit "exu0_power10,exu1_power10,exu2_power10,exu3_power10"
+"power10issue")
+; Two load units and two store units
+(define_cpu_unit "lu0_power10,lu1_power10" "power10issue")
+(define_cpu_unit "stu0_power10,stu1_power10" "power10issue")
+; Create false units for use by non-pipelined div/sqrt
+(define_cpu_unit "fx_div0_power10,fx_div1_power10" "power10div")
+(define_cpu_unit "fp_div0_power10,fp_div1_power10,fp_div2_power10,
+ fp_div3_power10" "power10div")
+
+
+; Dispatch slots are allocated in order conforming to program order.
+(absence_set "du0_power10" "du1_power10,du2_power10,du3_power10,du4_power10,\
+  du5_power10,du6_power10,du7_power10")
+(absence_set "du1_power10" "du2_power10,du3_power10,du4_power10,du5_power10,\
+  du6_power10,du7_power10")

Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions

2020-11-13 Thread Pat Haugen via Gcc-patches
On 11/12/20 10:05 AM, Jeff Law wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns if
>> they are a fusion candidate. It is called while removing the forward
>> dependencies of the just scheduled insn. If a dependent insn becomes
>> available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It looks fairly reasonable to me.   Do you plan on trying to take this
> forward at all?

Due to other requirements where this approach did not work, we are pursuing a 
different approach of creating combine patterns for the target.

-Pat



Re: [PATCH, rs6000] Update instruction attributes for Power10

2020-11-06 Thread Pat Haugen via Gcc-patches
On 11/5/20 4:32 PM, will schmidt wrote:
> On Wed, 2020-11-04 at 14:42 -0600, Pat Haugen via Gcc-patches wrote:
>>  * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for
>>  PREFIXED_YES.
> 
> The code change reads as roughly 
> - next_insn_prefixed_p != PREFIXED_NO
> 
> + next_insn_prefixed_p == PREFIXED_YES"
> 
> So just an inversion of the logic? I don't obviously see the 'p' impact
> there.
> 
It's no longer an inversion of the logic since I added a PREFIXED_ALWAYS value. 
'next_insn_prefixed' is used by rs6000_final_prescan_insn() to determine 
whether an insn mnemonic needs a 'p' prefix. We want it set for PREFIXED_YES, 
but not for PREFIXED_NO or PREFIXED_ALWAYS.

> 
>>  * config/rs6000/rs6000.md (define_attr "size"): Add 256.
>>  (define_attr "prefixed"): Add 'always'.
>>  (define_mode_attr bits): Add DD/TD modes.
>>  (cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg,
>>  bswapdi2_brd, setbc_signed_,
>>  *setbcr_signed_, *setnbc_signed_,
>>  *setnbcr_signed_): Update instruction attributes for
>>  Power10.
> 
> ok.  (assuming the assorted 'integer' -> 'crypto' changes are correct,
> of course).  
> 
Yes, crypto represents the correct pipe the insns are executed on.

Thanks for the review,
Pat



[PATCH, rs6000] Update instruction attributes for Power10

2020-11-04 Thread Pat Haugen via Gcc-patches
Update instruction attributes for Power10.


This patch updates the type/prefixed/dot/size attributes for various new 
instructions (and a couple existing that were incorrect) in preparation for the 
Power10 scheduling patch that will be following.

Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok 
for trunk?

-Pat


2020-11-04  Pat Haugen  

gcc/
* config/rs6000/altivec.md (vsdb_, xxspltiw_v4si,
xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst,
xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst,
vstrir_code_, vstrir_p_code_, vstril_code_,
vstril_p_code_, altivec_lvsl_reg, altivec_lvsl_direct,
altivec_lvsr_reg, altivec_lvsr_direct, xxeval, vcfuged, vclzdm,
vctzdm, vpdepd, vpextd, vgnb, vclrlb, vclrrb): Update instruction
attributes for Power10.
* config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1,
floatditd2, ftrunc2, fixdi2, dfp_ddedpd_,
dfp_denbcd_, dfp_dxex_, dfp_diex_,
*dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Likewise.
* config/rs6000/mma.md (*movpoi, mma_, mma_,
mma_, mma_, mma_, mma_,
mma_, mma_, mma_, mma_):
Likewise.
* config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for
PREFIXED_YES.
* config/rs6000/rs6000.md (define_attr "size"): Add 256.
(define_attr "prefixed"): Add 'always'.
(define_mode_attr bits): Add DD/TD modes.
(cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg,
bswapdi2_brd, setbc_signed_,
*setbcr_signed_, *setnbc_signed_,
*setnbcr_signed_): Update instruction attributes for
Power10.
* config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti,
store_conditionalpti): Update instruction attributes for Power10.
* config/rs6000/vsx.md (*xvtlsbb_internal, xxgenpcvm__internal,
vextractl_internal, vextractr_internal,
vinsertvl_internal_, vinsertvr_internal_,
vinsertgl_internal_, vinsertgr_internal_,
vreplace_elt__inst): Likewise.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 0a2e634d6b0..76191ba4107 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -819,7 +819,7 @@ (define_insn "vsdb_"
  VSHIFT_DBL_LR))]
   "TARGET_POWER10"
   "vsdbi %0,%1,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")])
 
 (define_insn "xxspltiw_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -827,7 +827,8 @@ (define_insn "xxspltiw_v4si"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "always")])
 
 (define_expand "xxspltiw_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -846,7 +847,8 @@ (define_insn "xxspltiw_v4sf_inst"
 UNSPEC_XXSPLTIW))]
  "TARGET_POWER10"
  "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")])
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "always")])
 
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -865,7 +867,8 @@ (define_insn "xxspltidp_v2df_inst"
 UNSPEC_XXSPLTID))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -894,7 +897,8 @@ (define_insn "xxsplti32dx_v4si_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxsplti32dx_v4sf"
   [(set (match_operand:V4SF 0 "register_operand" "=wa")
@@ -922,7 +926,8 @@ (define_insn "xxsplti32dx_v4sf_inst"
 UNSPEC_XXSPLTI32DX))]
   "TARGET_POWER10"
   "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_insn "xxblend_"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
@@ -932,7 +937,8 @@ (define_insn "xxblend_"
UNSPEC_XXBLEND))]
   "TARGET_POWER10"
   "xxblendv %x0,%x1,%x2,%x3"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "xxpermx"
   [(set (match_operand:V2DI 0 "register_operand" "+wa")
@@ -976,7 +982,8 @@ (define_insn "xxpermx_inst"
 UNSPEC_XXPERMX))]
   "TARGET_POWER10"
   "xxpermx %x0,%x1,%x2,%x3,%4"
-  [(set_attr "type" "vecsimple")])
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "always")])
 
 (define_expand "vstrir_"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
@@ -998,7 +1005,7 @@ (define_insn "vstrir_code_"
   UNSPEC_VSTRIR))]
   "TARGET_POWER10"
   

Re: [PATCH 1/2, rs6000] int128 sign extention instructions (partial prereq)

2020-09-25 Thread Pat Haugen via Gcc-patches
On 9/24/20 10:59 AM, will schmidt via Gcc-patches wrote:
> +;; Move DI value from GPR to TI mode in VSX register, word 1.
> +(define_insn "mtvsrdd_diti_w1"
> +  [(set (match_operand:TI 0 "register_operand" "=wa")
> + (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
> +UNSPEC_MTVSRD_DITI_W1))]
> +  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> +  "mtvsrdd %x0,0,%1"
> +  [(set_attr "type" "vecsimple")])

"vecmove" (since I just updated the other uses).

> +
> +;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
> +(define_insn "extendditi2_vector"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
> +(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
> + UNSPEC_EXTENDDITI2))]
> +  "TARGET_POWER10"
> +  "vextsd2q %0,%1"
> +  [(set_attr "type" "exts")])

"vecexts".

> +
> +(define_expand "extendditi2"
> +  [(set (match_operand:TI 0 "gpc_reg_operand")
> +(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
> +  "TARGET_POWER10"
> +  {
> +/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits */
> +rtx temp = gen_reg_rtx (TImode);
> +emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
> +emit_insn (gen_extendditi2_vector (operands[0], temp));
> +DONE;
> +  }
> +  [(set_attr "type" "exts")])

Don't need "type" attr on define_expand since the type will come from the 2 
individual insns emitted.

Thanks,
Pat


[PATCH] rs6000: Rename mffgpr/mftgpr instruction types

2020-09-11 Thread Pat Haugen via Gcc-patches
The following is mostly a mechanical change to rename the mffgpr/mftgpr
insn types to mtvsr/mfvsr to be more clear. It also removes Power6
references to those insn types since we no longer generate those
instructions.

Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

-Pat


2020-09-11  Pat Haugen  

gcc/
* config/rs6000/power10.md (power10-mffgpr, power10-mftgpr): Rename to
mtvsr/mfvsr.
* config/rs6000/power6.md (X2F_power6, power6-mftgpr, power6-mffgpr):
Remove.
* config/rs6000/power8.md (power8-mffgpr, power8-mftgpr): Rename to
mtvsr/mfvsr.
* config/rs6000/power9.md (power8-mffgpr, power8-mftgpr): Likewise.
* config/rs6000/rs6000.c (rs6000_adjust_cost): Remove Power6
TYPE_MFFGPR cases.
* config/rs6000/rs6000.md (mffgpr, mftgpr, zero_extendsi2,
extendsi2, @signbit2_dm, lfiwax, lfiwzx, *movsi_internal1,
movsi_from_sf, *movdi_from_sf_zero_ext, *mov_internal,
movsd_hardfloat, movsf_from_si, *mov_hardfloat64, p8_mtvsrwz,
p8_mtvsrd_df, p8_mtvsrd_sf, p8_mfvsrd_3_, *movdi_internal64,
unpack_dm): Rename mffgpr/mftgpr to mtvsr/mfvsr.
* config/rs6000/vsx.md (vsx_mov_64bit, vsx_extract_,
vsx_extract_si, *vsx_extract__p8): Likewise.



diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 9f8a5825744..2b4d882e8df 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -468,13 +468,13 @@ (define_insn_reservation "power10-qpmul" 24
(eq_attr "cpu" "power10"))
   "DU_super_power10,dfu_power10*8")

-(define_insn_reservation "power10-mffgpr" 2
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power10-mtvsr" 2
+  (and (eq_attr "type" "mtvsr")
(eq_attr "cpu" "power10"))
   "DU_slice_3_power10,VSU_power10")

-(define_insn_reservation "power10-mftgpr" 2
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power10-mfvsr" 2
+  (and (eq_attr "type" "mfvsr")
(eq_attr "cpu" "power10"))
   "DU_slice_3_power10,VSU_power10")

diff --git a/gcc/config/rs6000/power6.md b/gcc/config/rs6000/power6.md
index a94ce52425f..e2e7582e67c 100644
--- a/gcc/config/rs6000/power6.md
+++ b/gcc/config/rs6000/power6.md
@@ -56,10 +56,6 @@ (define_reservation "LX2_power6"
 (define_reservation "FX2_power6"
 "iu1_power6+iu2_power6")

-(define_reservation "X2F_power6"
-"(iu1_power6+iu2_power6+fpu1_power6)\
-|(iu1_power6+iu2_power6+fpu2_power6)")
-
 (define_reservation "BX2_power6"
 "iu1_power6+iu2_power6+bpu_power6")

@@ -605,20 +601,3 @@ (define_bypass 6 "power6-vecperm" "power6-veccomplex" )

 (define_bypass 5 "power6-vecperm" "power6-vecstore" )

-(define_insn_reservation "power6-mftgpr" 8
-  (and (eq_attr "type" "mftgpr")
-   (eq_attr "cpu" "power6"))
-  "X2F_power6")
-
-(define_insn_reservation "power6-mffgpr" 14
-  (and (eq_attr "type" "mffgpr")
-   (eq_attr "cpu" "power6"))
-  "LX2_power6")
-
-(define_bypass 4 "power6-mftgpr" "power6-imul,\
-  power6-lmul,\
-  power6-imul-cmp,\
-  power6-lmul-cmp,\
-  power6-imul3,\
-  power6-idiv,\
-  power6-ldiv" )
diff --git a/gcc/config/rs6000/power8.md b/gcc/config/rs6000/power8.md
index fae2ad8bf57..a3f46c62be2 100644
--- a/gcc/config/rs6000/power8.md
+++ b/gcc/config/rs6000/power8.md
@@ -379,13 +379,13 @@ (define_insn_reservation "power8-vecdiv" 31
(eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")

-(define_insn_reservation "power8-mffgpr" 5
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power8-mtvsr" 5
+  (and (eq_attr "type" "mtvsr")
(eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")

-(define_insn_reservation "power8-mftgpr" 6
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power8-mfvsr" 6
+  (and (eq_attr "type" "mfvsr")
(eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")

diff --git a/gcc/config/rs6000/power9.md b/gcc/config/rs6000/power9.md
index 2277b145308..c86d643a7d3 100644
--- a/gcc/config/rs6000/power9.md
+++ b/gcc/config/rs6000/power9.md
@@ -466,13 +466,13 @@ (define_insn_reservation "power9-qpmul" 24
(eq_attr "cpu" "power9"))
   "DU_super_power9,dfu_power9*8")

-(define_insn_reservation "power9-mffgpr" 2
-  (and (eq_attr "type" "mffgpr")
+(define_insn_reservation "power9-mtvsr" 2
+  (and (eq_attr "type" "mtvsr")
(eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")

-(define_insn_reservation "power9-mftgpr" 2
-  (and (eq_attr "type" "mftgpr")
+(define_insn_reservation "power9-mfvsr" 2
+  (and (eq_attr "type" "mfvsr")
(eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ca5b71ecdd3..5b0162dcccd 

Re: [PATCH] rs6000: Fix instruction type

2020-09-09 Thread Pat Haugen via Gcc-patches
On 9/9/20 4:41 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Sep 09, 2020 at 04:14:37PM -0500, Pat Haugen wrote:
>> I noticed that some of the VSR<->GPR move instructions are not typed
>> correctly. This patch fixes those instructions so that the scheduler
>> treats them with the correct latency.
> 
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -5483,7 +5483,7 @@ (define_insn "lfiwzx"
>> lxsiwzx %x0,%y1
>> mtvsrwz %x0,%1
>> xxextractuw %x0,%x1,4"
>> -  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
>> +  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
>> (set_attr "isa" "*,p8v,p8v,p9v")])
> 
> Can we rename mftgpr/mffgpr globally?  Maybe even as mfvsr and mtvsr,
> because that is what is actually modeled here?  Such names will make it
> much harder to get confused and use the wrong type, too :-)
> 

Those types were originally created for the mffgpr/mftgpr Power6
instructions. But since it appears we no longer generate those insns I
totally agree with doing a global change as you suggest to make things
clearer. Would you like that as a separate patch or is it fine to
include in this one?


>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index 54da54c43dc..3a5cf896da8 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_"
>>else
>>  gcc_unreachable ();
>>  }
>> -  [(set_attr "type" "vecperm")])
>> +  [(set_attr "type" "vecperm,vecmove")])
> 
> mtvsrdd is a mtvsr, sorry, mffgpr just the same?  It isn't vecmove?
> 
>> @@ -4440,7 +4440,7 @@ (define_insn "vsx_splat__reg"
>>"@
>> xxpermdi %x0,%x1,%x1,0
>> mtvsrdd %x0,%1,%1"
>> -  [(set_attr "type" "vecperm")])
>> +  [(set_attr "type" "vecperm,vecmove")])
> 
> Same here.

mtvsrdd dispatches as a vector op, so requires a super-slice. As opposed
to the others which just require a single execution slice for Power9.

-Pat


[PATCH] rs6000: Fix instruction type

2020-09-09 Thread Pat Haugen via Gcc-patches
I noticed that some of the VSR<->GPR move instructions are not typed
correctly. This patch fixes those instructions so that the scheduler
treats them with the correct latency.

Bootstrap/regtest on powerpc64le with no new regressions. Also ran a
CPU2017 benchmark comparison on Power9 with no major differences (a
couple minor
improvements and no degradations). Ok for trunk?

-Pat


2020-09-09  Pat Haugen  

gcc/
* gcc/config/rs6000/rs6000.md
(lfiwzx, floatunssi2_lfiwzx, p8_mtvsrwz, p8_mtvsrd_sf): Fix insn
type.
* gcc/config/rs6000/vsx.md
(vsx_concat_, vsx_splat__reg, vsx_splat_v4sf): Likewise.



diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620ae1c0..f902c864c26 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5483,7 +5483,7 @@ (define_insn "lfiwzx"
lxsiwzx %x0,%y1
mtvsrwz %x0,%1
xxextractuw %x0,%x1,4"
-  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
+  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
(set_attr "isa" "*,p8v,p8v,p9v")])

 (define_insn_and_split "floatunssi2_lfiwzx"
@@ -7634,7 +7634,7 @@ (define_insn_and_split "movsf_from_si"
 *,  12,*, *")
(set_attr "type"
"load,   fpload,fpload,fpload,store, fpstore,
-fpstore,vecfloat,  mffgpr,*")
+fpstore,vecfloat,  mftgpr,*")
(set_attr "isa"
"*,  *, p9v,   p8v,   *, *,
 p8v,p8v,   p8v,   *")])
@@ -8711,7 +8711,7 @@ (define_insn "p8_mtvsrwz"
   UNSPEC_P8V_MTVSRWZ))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrwz %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_fpr_from_gpr"
   [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
@@ -8810,7 +8810,7 @@ (define_insn "p8_mtvsrd_sf"
   UNSPEC_P8V_MTVSRD))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrd %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_vsx_from_gprsf"
   [(set (match_operand:SF 0 "register_operand" "=wa")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 54da54c43dc..3a5cf896da8 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_"
   else
 gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
@@ -4440,7 +4440,7 @@ (define_insn "vsx_splat__reg"
   "@
xxpermdi %x0,%x1,%x1,0
mtvsrdd %x0,%1,%1"
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 (define_insn "vsx_splat__mem"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
@@ -4493,7 +4493,7 @@ (define_insn_and_split "vsx_splat_v4sf"
(unspec:V4SF [(match_dup 0)
  (const_int 0)] UNSPEC_VSX_XXSPLTW))]
   ""
-  [(set_attr "type" "vecload,vecperm,mftgpr")
+  [(set_attr "type" "vecload,vecperm,vecperm")
(set_attr "length" "*,8,*")
(set_attr "isa" "*,p8v,*")])


[PATCH] rs6000: Disable -fcaller-saves by default

2020-07-23 Thread Pat Haugen via Gcc-patches
Disable -fcaller-saves by default.

This patch turns off -fcaller-saves by default so that IRA doesn't try
to use volatile regs for pseudos that are live across a call, which
would then require LRA to save/restore the reg around the call.

Bootstrap/regtest on powerpc64le with no new regressions. Also ran a
CPU2017 benchmark comparison with no major differences (a few minor
improvements and one minor degradation). Ok for trunk?

-Pat


2020-07-23  Pat Haugen  

gcc/
* common/config/rs6000/rs6000-common.c
(rs6000_option_optimization_table): Turn off -fcaller-saves.

gcc/testsuite/
* gcc.target/powerpc/caller-saves.c: New.



diff --git a/gcc/common/config/rs6000/rs6000-common.c
b/gcc/common/config/rs6000/rs6000-common.c
index ee37b9dc90b..6d68834aed2 100644
--- a/gcc/common/config/rs6000/rs6000-common.c
+++ b/gcc/common/config/rs6000/rs6000-common.c
@@ -43,8 +43,13 @@ static const struct default_options
rs6000_option_optimization_table[] =
on rs6000, turn it off by default.  */
 { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

+/* Don't allow IRA to use volatile regs across function calls.  */
+{ OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
+
 /* Double growth factor to counter reduced min jump length.  */
 { OPT_LEVELS_ALL, OPT__param_max_grow_copy_bb_insns_, NULL, 16 },
+
+/* Following marks the end of the list and needs to remain last.  */
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };

diff --git a/gcc/testsuite/gcc.target/powerpc/caller-saves.c
b/gcc/testsuite/gcc.target/powerpc/caller-saves.c
new file mode 100644
index 000..846356f16f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/caller-saves.c
@@ -0,0 +1,9 @@
+/* { dg-do compile */
+/* { dg-options "-O2 -fverbose-asm" } */
+
+void
+foo ()
+{
+}
+
+/* { dg-final { scan-assembler-not "fcaller-saves" } } */


Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions

2020-04-08 Thread Pat Haugen via Gcc-patches
On 4/8/20 2:46 AM, Richard Biener wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns if
>> they are a fusion candidate. It is called while removing the forward
>> dependencies of the just scheduled insn. If a dependent insn becomes
>> available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It would be nice test if the i386 use of TARGET_SCHED_MACRO_FUSION hooks
> for compare + branch fusion can be transitioned to that scheme.  I think your
> proposal doesn't improve things there since we can never schedule
> branches earlier
> but we could schedule the compare later.  But your proposal only
> affects scheduling
> of later insns, not the order of insns getting onto the ready list?

Yes, your understanding is correct, my approach only affects later
dependent insns. The first insn of the fusion pair will be scheduled
according to existing priority based scheduling.

-Pat


[RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions

2020-04-07 Thread Pat Haugen via Gcc-patches
The Power processor has the ability to fuse certain pairs of dependent
instructions to improve their performance if they appear back-to-back in
the instruction stream. In looking at the current support for
instruction fusion in GCC I saw the following 2 options.

1) TARGET_SCHED_MACRO_FUSION target hooks: Only looks at existing
back-to-back instructions and will ensure the scheduler keeps them together.

2) -fsched-fusion/TARGET_SCHED_FUSION_PRIORITY: Runs as a separate
scheduling pass before peephole2. Operates independently on a single
insn. Used by ARM backend to assign higher priorities to base/disp loads
and stores so that the scheduling pass will schedule loads/stores to
adjacent memory back-to-back. Later these insns will be transformed into
load/store pair insns.

Neither of these work for Power's purpose because they don't deal with
fusion of dependent insns that may not already be back-to-back. The
TARGET_SCHED_REORDER[2] hooks also don't work since the dependent insn
more than likely gets queued for N cycles so wouldn't be on the ready
list for the reorder hooks to process. We want the ability for the
scheduler to schedule dependent insn pairs back-to-back when possible
(i.e. other dependencies of both insns have been satisfied).

I have coded up a proof of concept that implements our needs via a new
target hook. The hook is passed a pair of dependent insns and returns if
they are a fusion candidate. It is called while removing the forward
dependencies of the just scheduled insn. If a dependent insn becomes
available to schedule and it's a fusion candidate with the just
scheduled insn, then the new code moves it to the ready list (if
necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
candidate will be scheduled next. Following is the scheduling part of
the diff. Does this sound like a feasible approach? I welcome any
comments/discussion.

Thanks,
Pat


diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 80687fb5359..7a62136d497 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -4152,6 +4152,39 @@ schedule_insn (rtx_insn *insn)
  && SCHED_GROUP_P (next)
  && advance < effective_cost)
advance = effective_cost;
+
+ /* If all dependencies of this insn have now been resolved and the
+just scheduled insn was not part of a SCHED_GROUP, check if
+this dependent insn can be fused with the just scheduled insn.  */
+ else if (effective_cost >= 0 && !SCHED_GROUP_P (insn)
+  && targetm.sched.dep_fusion
+  && targetm.sched.dep_fusion (insn, next))
+   {
+ /* Move to ready list if necessary.  */
+ if (effective_cost > 0)
+   {
+ queue_remove (next);
+ ready_add (, next, true);
+   }
+
+ /* Mark as sched_group.  */
+ SCHED_GROUP_P (next) = 1;
+
+ /* Fix insn_tick.  */
+ INSN_TICK (next) = INSN_TICK (insn);
+
+ /* Dump some debug output for success.  */
+ if (sched_verbose >= 5)
+   {
+ fprintf (sched_dump, ";;\t\tFusing dependent insns: ");
+ fprintf (sched_dump, "%4d %-30s --> ", INSN_UID (insn),
+  str_pattern_slim (PATTERN (insn)));
+ fprintf (sched_dump, "%4d %-30s\n", INSN_UID (next),
+  str_pattern_slim (PATTERN (next)));
+   }
+   }
}
   else
/* Check always has only one forward dependence (to the first insn in