Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
On 9/13/23 3:48 PM, Segher Boessenkool wrote:
>>>> -  "TARGET_POWER10 && TARGET_POWERPC64"
>>>> +  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
>>>>    "vmoduq %0,%1,%2"
>>>
>>> Did we ever test if this insn in fact is slower as well?  I don't mean
>>> either way, orthogonality is good, but just for my enlightenment.
>>
>> Yes, I tested quadword too and saw the same result, div/mul/sub is the
>> better option.
>
> With improved changelog: okay for trunk.  Okay for all backports as well
> (after some soak time).

Thanks, updated changelog and committed to trunk. Will backport after
burn-in.

-Pat
PING^4: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
PING^3: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
PING ^2: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
[PATCH V4, rs6000] Disable generation of scalar modulo instructions
Updated from prior version to address latest review comment (simplify
umod<mode>3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen

gcc/
	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
	scalar modulo.
	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
	(define_expand umod<mode>3): New.
	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
	(umodti3, modti3): Disable.

gcc/testsuite/
	* gcc.target/powerpc/clone1.c: Add xfails.
	* gcc.target/powerpc/clone3.c: Likewise.
	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
	* gcc.target/powerpc/mod-2.c: Likewise.
	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 07c3a3d15ac..72abf285301 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22157,7 +22157,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    *total = rs6000_cost->divsi;
 	}
       /* Add in shift and subtract for MOD unless we have a mod instruction.  */
-      if (!TARGET_MODULO && (code == MOD || code == UMOD))
+      if ((!TARGET_MODULO
+	   || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+	  && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
       rs6000_asm_output_opcode (STREAM);	\
     }						\
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..555c8525333 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3422,6 +3422,17 @@ (define_expand "mod<mode>3"
 	FAIL;

       operands[2] = force_reg (<MODE>mode, operands[2]);
+
+      if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
     }
   else
     {
@@ -3441,17 +3452,36 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  "TARGET_MODULO"
+{
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      rtx temp1 = gen_reg_rtx (<MODE>mode);
+      rtx temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})

-(define_insn "umod<mode>3"
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

@@ -3508,7 +3538,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
	(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
		 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
Re: [PATCH V3, rs6000] Disable generation of scalar modulo instructions
On 6/27/23 1:52 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address review comments (update
> rs6000_rtx_cost, update scan strings of mod-1.c/mod-2.c).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-06-27  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.

Attaching patch since my mailer apparently messed up some formatting again.

-Pat

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    *total = rs6000_cost->divsi;
 	}
       /* Add in shift and subtract for MOD unless we have a mod instruction.  */
-      if (!TARGET_MODULO && (code == MOD || code == UMOD))
+      if ((!TARGET_MODULO
+	   || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+	  && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
       rs6000_asm_output_opcode (STREAM);	\
     }						\
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3"
 	FAIL;

       operands[2] = force_reg (<MODE>mode, operands[2]);
+
+      if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
     }
   else
     {
@@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod<mode>3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})
+
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])
[PATCH V3, rs6000] Disable generation of scalar modulo instructions
Updated from prior version to address review comments (update
rs6000_rtx_cost, update scan strings of mod-1.c/mod-2.c).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-27  Pat Haugen

gcc/
	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
	scalar modulo.
	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
	(define_expand umod<mode>3): New.
	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
	(umodti3, modti3): Disable.

gcc/testsuite/
	* gcc.target/powerpc/clone1.c: Add xfails.
	* gcc.target/powerpc/clone3.c: Likewise.
	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
	* gcc.target/powerpc/mod-2.c: Likewise.
	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    *total = rs6000_cost->divsi;
 	}
       /* Add in shift and subtract for MOD unless we have a mod instruction.  */
-      if (!TARGET_MODULO && (code == MOD || code == UMOD))
+      if ((!TARGET_MODULO
+	   || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+	  && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
       rs6000_asm_output_opcode (STREAM);	\
     }						\
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3"
 	FAIL;

       operands[2] = force_reg (<MODE>mode, operands[2]);
+
+      if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
     }
   else
     {
@@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod<mode>3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})
+
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
	(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
		 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions
Ping ^3

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:
> Updated from prior patch to also disable for int128.
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-04-18  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Likewise.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.
Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions
Ping.

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:
> Updated from prior patch to also disable for int128.
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-04-18  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Likewise.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.
Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions
Ping.

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:
> Updated from prior patch to also disable for int128.
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-04-18  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Likewise.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.
[PATCH V2, rs6000] Disable generation of scalar modulo instructions
Updated from prior patch to also disable for int128. Disable generation of scalar modulo instructions. It was recently discovered that the scalar modulo instructions can suffer noticeable performance issues for certain input values. This patch disables their generation since the equivalent div/mul/sub sequence does not suffer the same problem. Bootstrapped and regression tested on powerpc64/powerpc64le. Ok for master and backports after burn in? -Pat 2023-04-18 Pat Haugen gcc/ * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New. * config/rs6000/rs6000.md (mod3, *mod3): Disable. (define_expand umod3): New. (define_insn umod3): Rename to *umod3 and disable. (umodti3, modti3): Disable. gcc/testsuite/ * gcc.target/powerpc/clone1.c: Add xfails. * gcc.target/powerpc/clone3.c: Likewise. * gcc.target/powerpc/mod-1.c: Likewise. * gcc.target/powerpc/mod-2.c: Likewise. * gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise. diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 3503614efbd..1cf0a0013c0 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -2492,3 +2492,9 @@ while (0) rs6000_asm_output_opcode (STREAM); \ } \ while (0) + +/* Disable generation of scalar modulo instructions due to performance issues + with certain input values. This can be removed in the future when the + issues have been resolved. 
*/ +#define RS6000_DISABLE_SCALAR_MODULO 1 + diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 44f7dd509cb..4f397bc9179 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3421,6 +3421,17 @@ (define_expand "mod3" FAIL; operands[2] = force_reg (mode, operands[2]); + + if (RS6000_DISABLE_SCALAR_MODULO) + { + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_div3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; + } } else { @@ -3440,17 +3451,42 @@ (define_insn "*mod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) +;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is +;; removed. 
+(define_expand "umod3" + [(set (match_operand:GPR 0 "gpc_reg_operand") + (umod:GPR (match_operand:GPR 1 "gpc_reg_operand") + (match_operand:GPR 2 "gpc_reg_operand")))] + "" +{ + rtx temp1; + rtx temp2; + + if (!TARGET_MODULO) + FAIL; -(define_insn "umod3" + if (RS6000_DISABLE_SCALAR_MODULO) +{ + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_udiv3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; +} +}) + +(define_insn "*umod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) @@ -3507,7 +3543,7 @@ (define_insn "umodti3" [(set (match_operand:TI 0 "altivec_register_operand" "=v") (umod:TI (match_operand:TI 1 "altivec_register_operand" "v") (match_operand:TI 2 "altivec_register_operand" "v")))] - "TARGET_POWER10 && TARGET_POWERPC64" + "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO" "vmoduq %0,%1,%2" [(set_attr "type" "vecdiv") (set_attr "size" "128")]) @@ -3516,7 +3552,7 @@ (define_insn "modti3" [(set (match_operand:TI 0 "altivec_register_operand" "=v") (mod:TI (match_operand:TI 1 "altivec_register_operand" "v") (match_operand:TI 2 "altivec_register_operand&
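The transformation the new define_expand performs can be sketched in plain C (a minimal model of the emitted sequence; the real code builds RTL via gen_div<mode>3 / gen_mul<mode>3 / gen_sub<mode>3, not C arithmetic):

```c
#include <assert.h>

/* Model of the div/mul/sub sequence emitted in place of a mods/modu
   instruction: for any nonzero b, a - (a / b) * b == a % b under C's
   truncating-division semantics.  */
long
mod_via_div (long a, long b)
{
  long temp1 = a / b;      /* gen_div<mode>3 -> divd  */
  long temp2 = temp1 * b;  /* gen_mul<mode>3 -> mulld */
  return a - temp2;        /* gen_sub<mode>3 -> subf  */
}
```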
[PATCH, rs6000] Disable generation of scalar modulo instructions
Disable generation of scalar modulo instructions. It was recently discovered that the scalar modulo instructions can suffer noticeable performance issues for certain input values. This patch disables their generation since the equivalent div/mul/sub sequence does not suffer the same problem. Bootstrapped and regression tested on powerpc64le (Power10). Ok for master and backports after burn in? -Pat 2023-04-07 Pat Haugen gcc/ * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New. * config/rs6000/rs6000.md (mod3, *mod3): Use it. (define_expand umod3): New. (define_insn umod3): Rename to *umod3 and disable. gcc/testsuite/ * gcc.target/powerpc/clone1.c: Add xfails. * gcc.target/powerpc/clone3.c: Likewise. * gcc.target/powerpc/mod-1.c: Likewise. * gcc.target/powerpc/mod-2.c: Likewise. diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 3503614efbd..e4d9f357622 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -2492,3 +2492,9 @@ while (0) rs6000_asm_output_opcode (STREAM); \ } \ while (0) + +/* Disable generation of scalar modulo instructions for word/dword due to + performance issues with certain input values. This can be removed in the + future when the issues have been resolved. 
*/ +#define RS6000_DISABLE_SCALAR_MODULO 1 + diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 44f7dd509cb..a267b7ee2d0 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3421,6 +3421,17 @@ (define_expand "mod3" FAIL; operands[2] = force_reg (mode, operands[2]); + + if (RS6000_DISABLE_SCALAR_MODULO) + { + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_div3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; + } } else { @@ -3440,17 +3451,42 @@ (define_insn "*mod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) +;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is +;; removed. 
+(define_expand "umod3" + [(set (match_operand:GPR 0 "gpc_reg_operand") + (umod:GPR (match_operand:GPR 1 "gpc_reg_operand") + (match_operand:GPR 2 "gpc_reg_operand")))] + "" +{ + rtx temp1; + rtx temp2; + + if (!TARGET_MODULO) + FAIL; + + if (RS6000_DISABLE_SCALAR_MODULO) +{ + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_udiv3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; +} +}) -(define_insn "umod3" +(define_insn "*umod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) diff --git a/gcc/testsuite/gcc.target/powerpc/clone1.c b/gcc/testsuite/gcc.target/powerpc/clone1.c index c69fd2aa1b8..74323ca0e8c 100644 --- a/gcc/testsuite/gcc.target/powerpc/clone1.c +++ b/gcc/testsuite/gcc.target/powerpc/clone1.c @@ -21,6 +21,7 @@ long mod_func_or (long a, long b, long c) return mod_func (a, b) | c; } -/* { dg-final { scan-assembler-times {\mdivd\M} 1 } } */ -/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */ -/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */ +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */ +/* { dg-final { scan-assembler-times {\mdivd\M} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\mmulld\M} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\mmodsd\M} 1 { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/clone3.c b/gcc/testsuite/gcc.target/powerpc/clone3.c index 911b88b781d..d3eb4dd2378 100644 --- a/gcc/testsuite/gcc.target/powerpc/clone3.c +++ b/gcc/testsuite/gcc.target/powerpc/clone3.c @@ -27,7 +27,8 @
[PATCH V2, rs6000] Tweak modulo define_insns to eliminate register copy
Updated patch with review comments addressed: fixed up testcase and added another testcase to verify peephole is functional. Don't force target of modulo into a distinct register. The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Bootstrapped and regression tested on powerpc64le. Ok for master? -Pat 2023-03-21 Pat Haugen gcc/ * config/rs6000/rs6000.md (*mod3, umod3): Add non-earlyclobber alternative. gcc/testsuite/ * gcc.target/powerpc/mod-no_copy.c: New. * gcc.target/powerpc/mod-peephole.c: New. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 81bffb04ceb..44f7dd509cb 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3437,9 +3437,9 @@ (define_expand "mod3" ;; In order to enable using a peephole2 for combining div/mod to eliminate the ;; mod, prefer putting the result of mod into a different register (define_insn "*mod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") -(match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") +(match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") @@ -3447,9 +3447,9 @@ (define_insn "*mod3" (define_insn "umod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") - (match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") + (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") diff --git a/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c 
b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c new file mode 100644 index 000..c55e486ee9b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ + +/* Verify r3 is used as source and target, no copy inserted. */ + +long foo (long a, long b) +{ + return (a % b); +} + +unsigned long foo2 (unsigned long a, unsigned long b) +{ + return (a % b); +} + +/* { dg-final { scan-assembler-not {\mmr\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/mod-peephole.c b/gcc/testsuite/gcc.target/powerpc/mod-peephole.c new file mode 100644 index 000..7517fbc397c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-peephole.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ + +/* Verify peephole fires to combine div/mod using same opnds. */ + +long foo (long a, long b) +{ + long x, y; + + x = a / b; + y = a % b; + return (x + y); +} + +unsigned long foo2 (unsigned long a, unsigned long b) +{ + unsigned long x, y; + + x = a / b; + y = a % b; + return (x + y); +} + +/* { dg-final { scan-assembler-not {\mmodsd\M} } } */ +/* { dg-final { scan-assembler-not {\mmodud\M} } } */
Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
On 2/27/23 2:53 PM, Segher Boessenkool wrote: Hi! On Mon, Feb 27, 2023 at 02:12:23PM -0600, Pat Haugen wrote: On 2/27/23 11:08 AM, Segher Boessenkool wrote: On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote: The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Have you verified those peepholes still match? Yes, I verified the peepholes still match and transform the sequence. Please add the testcases for that then? Or do we have tests for it already :-) I don't see one, but can add one. Do those peepholes actually improve performance? On new CPUs? The code here says ;; On machines with modulo support, do a combined div/mod the old fashioned ;; method, since the multiply/subtract is faster than doing the mod instruction ;; after a divide. but that really should not be true: we can do the div and mod in parallel (except in SMT4 perhaps, which we never schedule for anyway), so that should always be strictly faster. Since the modulo insns were introduced in Power9, we're just talking Power9/Power10. On paper, I would agree that separate div/mod could be slightly faster to get the mod result, "Slightly". It takes 12 cycles for the two in parallel (64-bit, p9), but 17 cycles for the "cheaper" sequence (divd+mulld+subf, 12+5+2). It is all worse if the units are busy of course, or if there are other problems. but if you throw in another independent div or mod in the insn stream then doing the peephole should be a clear win since that 3rd insn can execute in parallel with the initial divide as opposed to waiting for the one of the first div/mod to clear the exclusive stage of the pipe. That is the SMT4 case, the one we do not optimise for. SMT2 and ST can do four in parallel. 
This means you can start a div or mod every 2nd cycle on average, so it is very unlikely you will ever be limited by this on real code. Power9/Power10 only have 2 fixed-point divide units, and are able to issue 2 divides every 9/11 cycles (they aren't fully pipelined), with latencies of 12-24/12-25. Not saying that changes the "best case" scenario, just pointing out a lot of variables in play. -Pat
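The pattern being argued over — a divide and a modulo of the same operands, which the peephole2 combines so only one divide executes — can be sketched in C (illustrative source, mirroring the mod-peephole testcase in this thread):

```c
#include <assert.h>

/* Without the peephole: divd and modsd can issue in parallel.
   With the peephole: one divd, then r = a - q * b (mulld + subf),
   leaving a divide unit free for other independent divides.  */
long
divmod_sum (long a, long b)
{
  long q = a / b;
  long r = a % b;  /* peephole rewrites this as a - q * b, reusing q */
  return q + r;
}
```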
Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
On 2/27/23 11:08 AM, Segher Boessenkool wrote: Hi! On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote: The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Have you verified those peepholes still match? Yes, I verified the peepholes still match and transform the sequence. Do those peepholes actually improve performance? On new CPUs? The code here says ;; On machines with modulo support, do a combined div/mod the old fashioned ;; method, since the multiply/subtract is faster than doing the mod instruction ;; after a divide. but that really should not be true: we can do the div and mod in parallel (except in SMT4 perhaps, which we never schedule for anyway), so that should always be strictly faster. Since the modulo insns were introduced in Power9, we're just talking Power9/Power10. On paper, I would agree that separate div/mod could be slightly faster to get the mod result, but if you throw in another independent div or mod in the insn stream then doing the peephole should be a clear win since that 3rd insn can execute in parallel with the initial divide as opposed to waiting for the one of the first div/mod to clear the exclusive stage of the pipe. --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ All files in gcc.target/powerpc/ test for this already. Just leave off the target clause here? +/* { dg-require-effective-target powerpc_p9modulo_ok } */ Leave out this line, because ... +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ ... the -mcpu= forces it to true always. Will update. -Pat +/* Verify r3 is used as source and target, no copy inserted. 
*/ +/* { dg-final { scan-assembler-not {\mmr\M} } } */ That is probably good enough, yeah, since the test results in only a handful of insns. Segher
[PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
Don't force target of modulo into a distinct register. The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Bootstrapped and regression tested on powerpc64le. Ok for master? -Pat 2023-02-27 Pat Haugen gcc/ * config/rs6000/rs6000.md (*mod3, umod3): Add non-earlyclobber alternative. gcc/testsuite/ * gcc.target/powerpc/mod-no_copy.c: New. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 81bffb04ceb..44f7dd509cb 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3437,9 +3437,9 @@ (define_expand "mod3" ;; In order to enable using a peephole2 for combining div/mod to eliminate the ;; mod, prefer putting the result of mod into a different register (define_insn "*mod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") -(match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") +(match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") @@ -3447,9 +3447,9 @@ (define_insn "*mod3" (define_insn "umod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") - (match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") + (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") diff --git a/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c new file mode 100644 index 000..91e3003b3fc --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c @@ -0,0 +1,17 @@ +/* { dg-do compile { 
target { powerpc*-*-* } } } */ +/* { dg-require-effective-target powerpc_p9modulo_ok } */ +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ + +/* Verify r3 is used as source and target, no copy inserted. */ + +long foo (long a, long b) +{ + return (a % b); +} + +unsigned long foo2 (unsigned long a, unsigned long b) +{ + return (a % b); +} + +/* { dg-final { scan-assembler-not {\mmr\M} } } */
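Note the archive appears to have eaten the `&` earlyclobber marker in the quoted constraint strings; the change presumably reads as below (a sketch only — mode-iterator suffixes such as `<wd>` are likewise elided in the quoted mail):

```lisp
;; before: a single earlyclobber alternative -- the result must land in
;; a register distinct from the inputs (keeps the div/mod peephole viable)
;;   (match_operand:GPR 0 "gpc_reg_operand" "=&r")
;; after: a second, non-earlyclobber alternative lets register allocation
;; reuse an input register for the result, avoiding a needless mr copy
;;   (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
```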
Re: [PATCH] rs6000: Make P10_FUSION honour tuning setting
On 1/4/23 3:20 AM, Kewen.Lin via Gcc-patches wrote: diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 88c865b6b4b..6fa084c0807 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4378,9 +4378,15 @@ rs6000_option_override_internal (bool global_init_p) rs6000_isa_flags &= ~OPTION_MASK_MMA; } - if (TARGET_POWER10 - && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0) -rs6000_isa_flags |= OPTION_MASK_P10_FUSION; + /* Enable power10 fusion if we are tuning for power10, even if we aren't + generating power10 instructions. */ + if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION)) +{ + if (processor_target_table[tune_index].processor == PROCESSOR_POWER10) You can use (rs6000_tune == PROCESSOR_POWER10) at this point. -Pat + rs6000_isa_flags |= OPTION_MASK_P10_FUSION; + else + rs6000_isa_flags &= ~OPTION_MASK_P10_FUSION; +}
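The decision logic of the quoted hunk, with the reviewer's `rs6000_tune` suggestion applied, can be modeled as a toy function (hypothetical names, not GCC code):

```c
#include <assert.h>

/* Returns whether P10 fusion ends up enabled: an explicit
   -m(no-)power10-fusion on the command line wins; otherwise the flag
   follows the tuning target (rs6000_tune == PROCESSOR_POWER10), not
   the instruction-set target.  */
int
p10_fusion_enabled (int explicit_p, int explicit_value, int tune_is_power10)
{
  if (explicit_p)
    return explicit_value;
  return tune_is_power10;
}
```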
Ping: [PATCH, rs6000] Fix passing of Complex IEEE 128-bit [PR99685]

Ping. On 4/26/22 3:06 PM, Pat Haugen via Gcc-patches wrote: Fix register count when not splitting Complex IEEE 128-bit args. For ABI_V4, we do not split complex args. This created a problem because even though an arg would be passed in two VSX regs, we were only advancing the function arg counter by one VSX register. Fixed with this patch. Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. Ok for master? -Pat 2022-04-26 Pat Haugen PR testsuite/99685 gcc/ * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump register count when not splitting IEEE 128-bit Complex. diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc index f06c692..9d18607 100644 --- a/gcc/config/rs6000/rs6000-call.cc +++ b/gcc/config/rs6000/rs6000-call.cc @@ -,6 +,12 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS *cum, machine_mode mode, { cum->vregno += n_elts; + /* If we are not splitting Complex IEEE 128-bit args then account for + the fact that they are passed in 2 VSX regs. */ + if (! targetm.calls.split_complex_arg && type + && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode) + cum->vregno++; + if (!TARGET_ALTIVEC) error ("cannot pass argument in vector register because" " altivec instructions are disabled, use %qs"
[PATCH, rs6000] Fix passing of Complex IEEE 128-bit [PR99685]
Fix register count when not splitting Complex IEEE 128-bit args. For ABI_V4, we do not split complex args. This created a problem because even though an arg would be passed in two VSX regs, we were only advancing the function arg counter by one VSX register. Fixed with this patch. Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. Ok for master? -Pat 2022-04-26 Pat Haugen PR testsuite/99685 gcc/ * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump register count when not splitting IEEE 128-bit Complex. diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc index f06c692..9d18607 100644 --- a/gcc/config/rs6000/rs6000-call.cc +++ b/gcc/config/rs6000/rs6000-call.cc @@ -,6 +,12 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS *cum, machine_mode mode, { cum->vregno += n_elts; + /* If we are not splitting Complex IEEE 128-bit args then account for +the fact that they are passed in 2 VSX regs. */ + if (! targetm.calls.split_complex_arg && type + && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode) + cum->vregno++; + if (!TARGET_ALTIVEC) error ("cannot pass argument in vector register because" " altivec instructions are disabled, use %qs"
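The register-count fix in rs6000_function_arg_advance_1 can be modeled as a toy function (not GCC code; names and the simplified n_elts handling are illustrative):

```c
#include <assert.h>

/* Toy model of the VSX-register counter advance: a _Complex IEEE
   128-bit (KCmode) argument that is NOT split is passed in two VSX
   registers, so the counter must advance by one extra register.  */
int
advance_vregno (int vregno, int n_elts,
                int split_complex_arg, int is_complex_kc)
{
  vregno += n_elts;
  if (!split_complex_arg && is_complex_kc)
    vregno++;  /* the fix: account for the second VSX register */
  return vregno;
}
```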
PING: [PATCH, testsuite] Fix attr-retain-*.c testcases on 32-bit PowerPC [PR100407]
Ping.

On 2/10/22 4:17 PM, Pat Haugen via Gcc-patches wrote:
Per Alan's comment in the bugzilla, fix attr-retain-* testcases for 32-bit PowerPC.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat

2022-02-10  Pat Haugen

	PR testsuite/100407

gcc/testsuite/
	* gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC.
	* gcc.c-torture/compile/attr-retain-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
index 6cab155..4a366eb 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target R_flag_in_section } } */
+/* { dg-options "-G0" { target { powerpc*-*-* && ilp32 } } } */
 /* { dg-final { scan-assembler ".text.*,\"axR\"" } } */
 /* { dg-final { scan-assembler ".bss.*,\"awR\"" } } */
 /* { dg-final { scan-assembler ".data.*,\"awR\"" } } */
diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
index 0208ffe..d9fc150 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
@@ -11,5 +11,6 @@
 /* { dg-final { scan-assembler ".bss.used_lcomm2,\"awR\"" { target arm-*-* } } } */
 /* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
 /* { dg-options "-ffunction-sections -fdata-sections" } */
+/* { dg-options "-ffunction-sections -fdata-sections -G0" { target { powerpc*-*-* && ilp32 } } } */
 #include "attr-retain-1.c"
PING: [PATCH, rs6000] Clean up Power10 fusion options
Ping. On 1/28/22 12:03 PM, Pat Haugen via Gcc-patches wrote: Mark Power10 fusion option undocumented and remove sub-options. Bootstrapped and regression tested on powerpc64le(Power10). Ok for master? -Pat 2022-01-28 Pat Haugen gcc/ * config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented. (mpower10-fusion-ld-cmpi, mpower10-fusion-2logical, mpower10-fusion-logical-add, mpower10-fusion-add-logical, mpower10-fusion-2add, mpower10-fusion-2store): Remove. * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER, OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options. * config/rs6000/rs6000.cc (rs6000_option_override_internal, power10_sched_reorder): Likewise. * config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf, gen_addadd): Likewise * config/rs6000/fusion.md: Regenerate.
[PATCH, testsuite] Fix attr-retain-*.c testcases on 32-bit PowerPC [PR100407]
Per Alan's comment in the bugzilla, fix attr-retain-* testcases for 32-bit PowerPC.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat

2022-02-10  Pat Haugen

	PR testsuite/100407

gcc/testsuite/
	* gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC.
	* gcc.c-torture/compile/attr-retain-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
index 6cab155..4a366eb 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target R_flag_in_section } } */
+/* { dg-options "-G0" { target { powerpc*-*-* && ilp32 } } } */
 /* { dg-final { scan-assembler ".text.*,\"axR\"" } } */
 /* { dg-final { scan-assembler ".bss.*,\"awR\"" } } */
 /* { dg-final { scan-assembler ".data.*,\"awR\"" } } */
diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
index 0208ffe..d9fc150 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
@@ -11,5 +11,6 @@
 /* { dg-final { scan-assembler ".bss.used_lcomm2,\"awR\"" { target arm-*-* } } } */
 /* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
 /* { dg-options "-ffunction-sections -fdata-sections" } */
+/* { dg-options "-ffunction-sections -fdata-sections -G0" { target { powerpc*-*-* && ilp32 } } } */
 #include "attr-retain-1.c"
[PATCH, rs6000] Clean up Power10 fusion options
Mark Power10 fusion option undocumented and remove sub-options. Bootstrapped and regression tested on powerpc64le(Power10). Ok for master? -Pat 2022-01-28 Pat Haugen gcc/ * config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented. (mpower10-fusion-ld-cmpi, mpower10-fusion-2logical, mpower10-fusion-logical-add, mpower10-fusion-add-logical, mpower10-fusion-2add, mpower10-fusion-2store): Remove. * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER, OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options. * config/rs6000/rs6000.cc (rs6000_option_override_internal, power10_sched_reorder): Likewise. * config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf, gen_addadd): Likewise * config/rs6000/fusion.md: Regenerate. diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index c2a77182a9e..b4e69e9fefd 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -491,33 +491,9 @@ Target Mask(P8_VECTOR) Var(rs6000_isa_flags) Use vector and scalar instructions added in ISA 2.07. mpower10-fusion -Target Mask(P10_FUSION) Var(rs6000_isa_flags) +Target Undocumented Mask(P10_FUSION) Var(rs6000_isa_flags) Fuse certain integer operations together for better performance on power10. -mpower10-fusion-ld-cmpi -Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags) -Fuse certain integer operations together for better performance on power10. - -mpower10-fusion-2logical -Target Undocumented Mask(P10_FUSION_2LOGICAL) Var(rs6000_isa_flags) -Fuse pairs of scalar or vector logical operations together for better performance on power10. - -mpower10-fusion-logical-add -Target Undocumented Mask(P10_FUSION_LOGADD) Var(rs6000_isa_flags) -Fuse scalar logical op with add/subf for better performance on power10. - -mpower10-fusion-add-logical -Target Undocumented Mask(P10_FUSION_ADDLOG) Var(rs6000_isa_flags) -Fuse scalar add/subf with logical op for better performance on power10. 
- -mpower10-fusion-2add -Target Undocumented Mask(P10_FUSION_2ADD) Var(rs6000_isa_flags) -Fuse dependent pairs of add or vaddudm instructions for better performance on power10. - -mpower10-fusion-2store -Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags) -Fuse certain store operations together for better performance on power10. - mcrypto Target Mask(CRYPTO) Var(rs6000_isa_flags) Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions. diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 325b21967c4..963947f6939 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -85,13 +85,7 @@ #define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \ | OPTION_MASK_POWER10 \ | OTHER_POWER10_MASKS \ -| OPTION_MASK_P10_FUSION \ -| OPTION_MASK_P10_FUSION_LD_CMPI \ -| OPTION_MASK_P10_FUSION_2LOGICAL \ -| OPTION_MASK_P10_FUSION_LOGADD\ -| OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD \ -| OPTION_MASK_P10_FUSION_2STORE) +| OPTION_MASK_P10_FUSION) /* Flags that need to be turned off if -mno-power9-vector. */ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ @@ -139,12 +133,6 @@ | OPTION_MASK_FPRND\ | OPTION_MASK_POWER10 \ | OPTION_MASK_P10_FUSION \ -| OPTION_MASK_P10_FUSION_LD_CMPI \ -| OPTION_MASK_P10_FUSION_2LOGICAL \ -| OPTION_MASK_P10_FUSION_LOGADD\ -| OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD \ -| OPTION_MASK_P10_FUSION_2STORE\ | OPTION_MASK_HTM \ | OPTION_MASK_ISEL \ | OPTION_MASK_MFCRF\ diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index a5fd36b72d9..548366abada 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4458,30 +4458,6 @@ rs6000_option_override_internal (bool global_init_p) && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0) rs6000_isa_flags |= OPTION_MASK_P10_FUSION; - if (TA
Ping^5: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. I'll note that I recently discovered that this patch also fixes PR93176 and PR97042. -Pat On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests.
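The building block behind rotate_and_mask_constant — rotating a 64-bit value so a 16-bit signed immediate can form many more constants — can be sketched with an illustrative helper (an assumption for exposition, not the GCC implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n (0 <= n < 64).  Models how a constant loaded with a
   single li/lis can then be turned into a 64-bit constant by one
   rldicl-style rotate, possibly combined with a mask.  */
uint64_t
rotl64 (uint64_t x, unsigned n)
{
  return n ? (x << n) | (x >> (64 - n)) : x;
}
```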
Ping^4: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests.
Ping^3: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests. >
Ping: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests. >
[PATCH, rs6000 GCC-11 committed] Fix fusion testcase counts
Somehow there was an issue when a larger patch was backported, and this chunk did not make it. Tested and committed as obvious. -Pat 2021-10-04 Pat Haugen gcc/testsuite/ChangeLog: * gcc.target/powerpc/fusion-p10-ldcmpi.c: Update counts. diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c index ea1d5d01950..526a026d874 100644 --- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c +++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c @@ -53,16 +53,16 @@ TEST(int16_t) TEST(uint8_t) TEST(int8_t) -/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero" 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero" 4 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none" 4 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none" 4 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none" 1 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none" 1 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign" 8 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero" 2 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 3 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign" 16 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero" 4 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 0 { target lp64 } } } */ /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none" 4 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times 
"lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 0 { target lp64 } } } */ /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none" 2 { target lp64 } } } */ /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero" 2 { target ilp32 } } } */
Ping: [PATCH, rs6000 V2] Add store fusion support for Power10
Ping. On 8/11/21 11:02 AM, Pat Haugen via Gcc-patches wrote: > Enable store fusion on Power10. > > Use the SCHED_REORDER hook to implement Power10 specific ready list > reordering. > As of now this is just store fusion. > > Things changed in this version of the patch > - Separate patch for additional load/store checks > - Move option check from is_fusable_store() to caller > - Misc coding style changes pointed out in review (parens/braces) > - Add testcases > > Bootstrap/regtest on powerpc64(32/64) and powerpc64le(Power10) with no new > regressions. > Ok for master? > > -Pat > > > 2021-08-11 Pat Haugen > > gcc/ChangeLog: > > * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag. > (POWERPC_MASKS): Likewise. > * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable > store fusion for Power10. > (is_fusable_store): New. > (power10_sched_reorder): Likewise. > (rs6000_sched_reorder): Do Power10 specific reordering. > (rs6000_sched_reorder2): Likewise. > * config/rs6000/rs6000.opt: Add new option. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/fusion-p10-stst.c: New test. > * gcc.target/powerpc/fusion-p10-stst2.c: New test. > > > > diff --git a/gcc/config/rs6000/rs6000-cpus.def > b/gcc/config/rs6000/rs6000-cpus.def > index 6758296c0fd..f5812da0184 100644 > --- a/gcc/config/rs6000/rs6000-cpus.def > +++ b/gcc/config/rs6000/rs6000-cpus.def > @@ -90,7 +90,8 @@ >| OPTION_MASK_P10_FUSION_2LOGICAL \ >| OPTION_MASK_P10_FUSION_LOGADD\ >| OPTION_MASK_P10_FUSION_ADDLOG\ > - | OPTION_MASK_P10_FUSION_2ADD) > + | OPTION_MASK_P10_FUSION_2ADD \ > + | OPTION_MASK_P10_FUSION_2STORE) > > /* Flags that need to be turned off if -mno-power9-vector. 
*/ > #define OTHER_P9_VECTOR_MASKS(OPTION_MASK_FLOAT128_HW > \ > @@ -143,6 +144,7 @@ >| OPTION_MASK_P10_FUSION_LOGADD\ >| OPTION_MASK_P10_FUSION_ADDLOG\ >| OPTION_MASK_P10_FUSION_2ADD \ > + | OPTION_MASK_P10_FUSION_2STORE\ >| OPTION_MASK_HTM \ >| OPTION_MASK_ISEL \ >| OPTION_MASK_MFCRF\ > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 60f406a4ff6..402cc924e3f 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -4495,6 +4495,10 @@ rs6000_option_override_internal (bool global_init_p) >&& (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) > rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; > > + if (TARGET_POWER10 > + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) > +rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; > + >/* Turn off vector pair/mma options on non-power10 systems. */ >else if (!TARGET_POWER10 && TARGET_MMA) > { > @@ -18874,6 +18878,91 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos) >return cached_can_issue_more; > } > > +/* Determine if INSN is a store to memory that can be fused with a similar > + adjacent store. */ > + > +static bool > +is_fusable_store (rtx_insn *insn, rtx *str_mem) > +{ > + /* Insn must be a non-prefixed base+disp form store. */ > + if (is_store_insn (insn, str_mem) > + && get_attr_prefixed (insn) == PREFIXED_NO > + && get_attr_update (insn) == UPDATE_NO > + && get_attr_indexed (insn) == INDEXED_NO) > +{ > + /* Further restictions by mode and size. */ > + machine_mode mode = GET_MODE (*str_mem); > + HOST_WIDE_INT size; > + if (MEM_SIZE_KNOWN_P (*str_mem)) > + size = MEM_SIZE (*str_mem); > + else > + return false; > + > + if (INTEGRAL_MODE_P (mode)) > + /* Must be word or dword size. */ > + return (size == 4 || size == 8); > + else if (FLOAT_MODE_P (mode)) > + /* Must be dword size. */ > + return (size == 8); > +} > + > + return false; > +} > + > +/* Do Power10 specific reordering of the ready list. 
*/ > + > +static int > +power10_sched_reorder (rtx_insn **ready, int lastpos) > +{ > + rtx mem1; > + > + /* Do store fusion during sched2
[PATCH, rs6000 V2] Add store fusion support for Power10
Enable store fusion on Power10. Use the SCHED_REORDER hook to implement Power10 specific ready list reordering. As of now this is just store fusion. Things changed in this version of the patch - Separate patch for additional load/store checks - Move option check from is_fusable_store() to caller - Misc coding style changes pointed out in review (parens/braces) - Add testcases Bootstrap/regtest on powerpc64(32/64) and powerpc64le(Power10) with no new regressions. Ok for master? -Pat 2021-08-11 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag. (POWERPC_MASKS): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable store fusion for Power10. (is_fusable_store): New. (power10_sched_reorder): Likewise. (rs6000_sched_reorder): Do Power10 specific reordering. (rs6000_sched_reorder2): Likewise. * config/rs6000/rs6000.opt: Add new option. gcc/testsuite/ChangeLog: * gcc.target/powerpc/fusion-p10-stst.c: New test. * gcc.target/powerpc/fusion-p10-stst2.c: New test. diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 6758296c0fd..f5812da0184 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -90,7 +90,8 @@ | OPTION_MASK_P10_FUSION_2LOGICAL \ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD) +| OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE) /* Flags that need to be turned off if -mno-power9-vector. 
*/ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ @@ -143,6 +144,7 @@ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ | OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE\ | OPTION_MASK_HTM \ | OPTION_MASK_ISEL \ | OPTION_MASK_MFCRF\ diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 60f406a4ff6..402cc924e3f 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4495,6 +4495,10 @@ rs6000_option_override_internal (bool global_init_p) && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; + if (TARGET_POWER10 + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) +rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; + /* Turn off vector pair/mma options on non-power10 systems. */ else if (!TARGET_POWER10 && TARGET_MMA) { @@ -18874,6 +18878,91 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos) return cached_can_issue_more; } +/* Determine if INSN is a store to memory that can be fused with a similar + adjacent store. */ + +static bool +is_fusable_store (rtx_insn *insn, rtx *str_mem) +{ + /* Insn must be a non-prefixed base+disp form store. */ + if (is_store_insn (insn, str_mem) + && get_attr_prefixed (insn) == PREFIXED_NO + && get_attr_update (insn) == UPDATE_NO + && get_attr_indexed (insn) == INDEXED_NO) +{ + /* Further restictions by mode and size. */ + machine_mode mode = GET_MODE (*str_mem); + HOST_WIDE_INT size; + if (MEM_SIZE_KNOWN_P (*str_mem)) + size = MEM_SIZE (*str_mem); + else + return false; + + if (INTEGRAL_MODE_P (mode)) + /* Must be word or dword size. */ + return (size == 4 || size == 8); + else if (FLOAT_MODE_P (mode)) + /* Must be dword size. */ + return (size == 8); +} + + return false; +} + +/* Do Power10 specific reordering of the ready list. */ + +static int +power10_sched_reorder (rtx_insn **ready, int lastpos) +{ + rtx mem1; + + /* Do store fusion during sched2 only. 
*/ + if (!reload_completed) +return cached_can_issue_more; + + /* If the prior insn finished off a store fusion pair then simply + reset the counter and return, nothing more to do. */ + if (load_store_pendulum != 0) +{ + load_store_pendulum = 0; + return cached_can_issue_more; +} + + /* Try to pair certain store insns to adjacent memory locations + so that the hardware will fuse them to a single operation. */ + if (TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE + && is_fusable_store (last_scheduled_insn, )) +{ + int pos; + rtx me
[PATCH, rs6000 V2] Add additional checks when identifying load/store instructions
Add additional checks to verify destination[source] of a load[store] instruction is a register. Modified code based on review comment to not change general logic of the flow. Braces needed on inner if-else to prevent error during bootstrap for ambiguous 'else'. Bootstrap/regtest on powerpc64le with no new regressions. Ok for master? -Pat 2021-08-07 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000.c (is_load_insn1): Verify destination is a register. (is_store_insn1): Verify source is a register. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..c8a146a7d18 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -18357,8 +18357,13 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) -return find_mem_ref (SET_SRC (pat), load_mem); + if (GET_CODE (pat)== SET) +{ + if (REG_P (SET_DEST (pat))) + return find_mem_ref (SET_SRC (pat), load_mem); + else + return false; +} if (GET_CODE (pat) == PARALLEL) { @@ -18395,7 +18400,12 @@ is_store_insn1 (rtx pat, rtx *str_mem) return false; if (GET_CODE (pat) == SET) -return find_mem_ref (SET_DEST (pat), str_mem); +{ + if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat))) + return find_mem_ref (SET_DEST (pat), str_mem); + else + return false; +} if (GET_CODE (pat) == PARALLEL) {
Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions
On 8/6/21 11:05 AM, Segher Boessenkool wrote: > On Fri, Aug 06, 2021 at 10:29:40AM -0500, Pat Haugen wrote: >> On 8/6/21 10:02 AM, Segher Boessenkool wrote: >>>> - if (GET_CODE (pat) == SET) >>>> + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) >>>> return find_mem_ref (SET_SRC (pat), load_mem); >>> >>> So this now falls through if it is a SET of something else than a reg. >>> Is that intentional? If so, this should be in the changelog. >>> >> >> Falling through eventually resulted in returning false, but no, wasn't >> really intentional to change the logic that way. I'll fix up both places to >> look like the following and resubmit. >> >> if (GET_CODE (pat)== SET) >> if (REG_P (SET_DEST (pat))) >> return find_mem_ref (SET_SRC (pat), load_mem); >> else >> return false; > > Well, that isn't quiet it either, no? It returns false for all stores > then? Yeah, I didn't word that clearly, meant I'd fix up both appropriately with the inner if-else logic. Here's what it looks like now. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..ad856254f52 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -18357,8 +18357,11 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) -return find_mem_ref (SET_SRC (pat), load_mem); + if (GET_CODE (pat)== SET) +if (REG_P (SET_DEST (pat))) + return find_mem_ref (SET_SRC (pat), load_mem); +else + return false; if (GET_CODE (pat) == PARALLEL) { @@ -18395,7 +18398,10 @@ is_store_insn1 (rtx pat, rtx *str_mem) return false; if (GET_CODE (pat) == SET) -return find_mem_ref (SET_DEST (pat), str_mem); +if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat))) + return find_mem_ref (SET_DEST (pat), str_mem); +else + return false; if (GET_CODE (pat) == PARALLEL) { Unless I'm totally not understanding the point you're trying to get across to me. -Pat
Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions
On 8/6/21 10:02 AM, Segher Boessenkool wrote: > On Fri, Aug 06, 2021 at 09:47:40AM -0500, Pat Haugen wrote: >> Add additional checks to verify destination[source] of a load[store] >> instruction is a register. > >> * config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a >> register. > > No colon before " (". > >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) >>if (!pat || pat == NULL_RTX) >> return false; >> >> - if (GET_CODE (pat) == SET) >> + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) >> return find_mem_ref (SET_SRC (pat), load_mem); > > So this now falls through if it is a SET of something else than a reg. > Is that intentional? If so, this should be in the changelog. > Falling through eventually resulted in returning false, but no, wasn't really intentional to change the logic that way. I'll fix up both places to look like the following and resubmit. if (GET_CODE (pat)== SET) if (REG_P (SET_DEST (pat))) return find_mem_ref (SET_SRC (pat), load_mem); else return false; -Pat
[PATCH, rs6000] Add additional checks when identifying load/store instructions
Add additional checks to verify destination[source] of a load[store] instruction is a register. Bootstrap/regtest on powerpc64le with no new regressions. Ok for master? -Pat 2021-08-06 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a register. (is_store_insn1): Verify source is a register. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..1460a0d7c5c 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) return find_mem_ref (SET_SRC (pat), load_mem); if (GET_CODE (pat) == PARALLEL) @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET + && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat)))) return find_mem_ref (SET_DEST (pat), str_mem); if (GET_CODE (pat) == PARALLEL)
Re: [PATCH, rs6000] Add store fusion support for Power10
On 8/4/21 9:23 AM, Bill Schmidt wrote: > Hi Pat, > > Good stuff! Comments below. > > On 8/2/21 3:19 PM, Pat Haugen via Gcc-patches wrote: >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 279f00cc648..1460a0d7c5c 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p) >> && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) >> rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; >> >> + if (TARGET_POWER10 >> + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) >> + rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; >> + >> /* Turn off vector pair/mma options on non-power10 systems. */ >> else if (!TARGET_POWER10 && TARGET_MMA) >> { >> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) >> if (!pat || pat == NULL_RTX) >> return false; >> >> - if (GET_CODE (pat) == SET) >> + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) >> return find_mem_ref (SET_SRC (pat), load_mem); > Looks like this is just an optimization to quickly discard stores, right? Additional verification check to make sure destination is REG, yes. This will become a separate patch. >> if (GET_CODE (pat) == PARALLEL) >> @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem) >> if (!pat || pat == NULL_RTX) >> return false; >> >> - if (GET_CODE (pat) == SET) >> + if (GET_CODE (pat) == SET >> + && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat >> return find_mem_ref (SET_DEST (pat), str_mem); > > > Similar question. Similar answer. :) > >> >> if (GET_CODE (pat) == PARALLEL) >> @@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int >> lastpos) >> return cached_can_issue_more; >> } >> >> +/* Determine if INSN is a store to memory that can be fused with a similar >> + adjacent store. */ >> + >> +static bool >> +is_fusable_store (rtx_insn *insn, rtx *str_mem) >> +{ >> + /* Exit early if not doing store fusion. 
*/ >> + if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE)) >> + return false; >> + >> + /* Insn must be a non-prefixed base+disp form store. */ >> + if (is_store_insn (insn, str_mem) >> + && get_attr_prefixed (insn) == PREFIXED_NO >> + && get_attr_update (insn) == UPDATE_NO >> + && get_attr_indexed (insn) == INDEXED_NO) >> + { >> + /* Further restictions by mode and size. */ >> + machine_mode mode = GET_MODE (*str_mem); >> + HOST_WIDE_INT size; >> + if MEM_SIZE_KNOWN_P (*str_mem) >> + size = MEM_SIZE (*str_mem); >> + else >> + return false; >> + >> + if INTEGRAL_MODE_P (mode) >> + { >> + /* Must be word or dword size. */ >> + return (size == 4 || size == 8); >> + } >> + else if FLOAT_MODE_P (mode) >> + { >> + /* Must be dword size. */ >> + return (size == 8); >> + } >> + } >> + >> + return false; >> +} >> + >> +/* Do Power10 specific reordering of the ready list. */ >> + >> +static int >> +power10_sched_reorder (rtx_insn **ready, int lastpos) >> +{ >> + int pos; >> + rtx mem1, mem2; >> + >> + /* Do store fusion during sched2 only. */ >> + if (!reload_completed) >> + return cached_can_issue_more; >> + >> + /* If the prior insn finished off a store fusion pair then simply >> + reset the counter and return, nothing more to do. */ > > > Good comments throughout, thanks! > >> + if (load_store_pendulum != 0) >> + { >> + load_store_pendulum = 0; >> + return cached_can_issue_more; >> + } >> + >> + /* Try to pair certain store insns to adjacent memory locations >> + so that the hardware will fuse them to a single operation. */ >> + if (is_fusable_store (last_scheduled_insn, )) >> + { >> + /* A fusable store was just scheduled. Scan the ready list for >> another >> + store that it can fuse with. */ >> + pos = lastpos; >> + while (pos >= 0) >> + { >> + /* GPR stores can be ascending or descending offsets, FPR/VSR stores > VSR? I don't see how that applies here. Scalar floating point store from VSX re
[PATCH, rs6000] Add store fusion support for Power10
Enable store fusion on Power10. Use the SCHED_REORDER hook to implement Power10 specific ready list reordering. As of now, pairing stores for store fusion is the only function being performed. Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for master? -Pat 2021-08-02 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag. (POWERPC_MASKS): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable store fusion for Power10. (is_load_insn1): Verify destination is a register. (is_store_insn1): Verify source is a register. (is_fusable_store): New. (power10_sched_reorder): Likewise. (rs6000_sched_reorder): Do Power10 specific reordering. (rs6000_sched_reorder2): Likewise. * config/rs6000/rs6000.opt: Add new option. diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 6758296c0fd..f5812da0184 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -90,7 +90,8 @@ | OPTION_MASK_P10_FUSION_2LOGICAL \ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD) +| OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE) /* Flags that need to be turned off if -mno-power9-vector. 
*/ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ @@ -143,6 +144,7 @@ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ | OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE\ | OPTION_MASK_HTM \ | OPTION_MASK_ISEL \ | OPTION_MASK_MFCRF\ diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..1460a0d7c5c 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p) && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; + if (TARGET_POWER10 + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) +rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; + /* Turn off vector pair/mma options on non-power10 systems. */ else if (!TARGET_POWER10 && TARGET_MMA) { @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) return find_mem_ref (SET_SRC (pat), load_mem); if (GET_CODE (pat) == PARALLEL) @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET + && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat return find_mem_ref (SET_DEST (pat), str_mem); if (GET_CODE (pat) == PARALLEL) @@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos) return cached_can_issue_more; } +/* Determine if INSN is a store to memory that can be fused with a similar + adjacent store. */ + +static bool +is_fusable_store (rtx_insn *insn, rtx *str_mem) +{ + /* Exit early if not doing store fusion. */ + if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE)) +return false; + + /* Insn must be a non-prefixed base+disp form store. 
*/ + if (is_store_insn (insn, str_mem) + && get_attr_prefixed (insn) == PREFIXED_NO + && get_attr_update (insn) == UPDATE_NO + && get_attr_indexed (insn) == INDEXED_NO) +{ + /* Further restictions by mode and size. */ + machine_mode mode = GET_MODE (*str_mem); + HOST_WIDE_INT size; + if MEM_SIZE_KNOWN_P (*str_mem) + size = MEM_SIZE (*str_mem); + else + return false; + + if INTEGRAL_MODE_P (mode) + { + /* Must be word or dword size. */ + return (size == 4 || size == 8); + } + else if FLOAT_MODE_P (mode) + { + /* Must be dword size. */ + return (size == 8); + } +} + + return false; +} + +/* Do Power10 specific reordering of the ready list. */ + +static int +power10_sched_reorder (rtx_insn **ready, int lastpos) +{ + int pos; + rtx mem1, mem2; + + /* Do store fusion during sched2 only. */ + if (!reload_comp
PING: [RS6000] rotate and mask constants [PR94393]
Ping https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html I've done a current bootstrap/regtest on powerpc64/powerpc64le with no regressions. -Pat
Re: [PATCH, rs6000] Update Power10 scheduling description for fused instruction types
On 6/7/21 3:41 PM, Pat Haugen via Gcc-patches wrote: > Update Power10 scheduling description for new fused instruction types. > > Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for > trunk? > > -Pat > > > 2021-06-07 Pat Haugen > > gcc/ChangeLog: > > * config/rs6000/power10.md (power10-fused-load, power10-fused-store, > power10-fused_alu, power10-fused-vec, power10-fused-branch): New. > > Forgot to ask if ok to backport to GCC 11 since the new instruction types were backported. -Pat
[PATCH, rs6000] Update Power10 scheduling description for fused instruction types
Update Power10 scheduling description for new fused instruction types. Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for trunk? -Pat 2021-06-07 Pat Haugen gcc/ChangeLog: * config/rs6000/power10.md (power10-fused-load, power10-fused-store, power10-fused_alu, power10-fused-vec, power10-fused-branch): New. diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md index 665f0f22c62..0186ae95896 100644 --- a/gcc/config/rs6000/power10.md +++ b/gcc/config/rs6000/power10.md @@ -100,6 +100,11 @@ (define_insn_reservation "power10-load" 4 (eq_attr "cpu" "power10")) "DU_any_power10,LU_power10") +(define_insn_reservation "power10-fused-load" 4 + (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load") + (eq_attr "cpu" "power10")) + "DU_even_power10,LU_power10") + (define_insn_reservation "power10-prefixed-load" 4 (and (eq_attr "type" "load") (eq_attr "update" "no") @@ -176,6 +181,11 @@ (define_insn_reservation "power10-store" 0 (eq_attr "cpu" "power10")) "DU_any_power10,STU_power10") +(define_insn_reservation "power10-fused-store" 0 + (and (eq_attr "type" "fused_store_store") + (eq_attr "cpu" "power10")) + "DU_even_power10,STU_power10") + (define_insn_reservation "power10-prefixed-store" 0 (and (eq_attr "type" "store,fpstore,vecstore") (eq_attr "prefixed" "yes") @@ -244,6 +254,11 @@ (define_insn_reservation "power10-alu" 2 (define_bypass 4 "power10-alu" "power10-crlogical,power10-mfcr,power10-mfcrf") +(define_insn_reservation "power10-fused_alu" 2 + (and (eq_attr "type" "fused_arith_logical,fused_cmp_isel,fused_carry") + (eq_attr "cpu" "power10")) + "DU_even_power10,EXU_power10") + ; paddi (define_insn_reservation "power10-paddi" 2 (and (eq_attr "type" "add") @@ -403,6 +418,11 @@ (define_insn_reservation "power10-vec-2cyc" 2 (eq_attr "cpu" "power10")) "DU_any_power10,EXU_power10") +(define_insn_reservation "power10-fused-vec" 2 + (and (eq_attr "type" "fused_vector") + (eq_attr "cpu" "power10")) + 
"DU_even_power10,EXU_power10") + (define_insn_reservation "power10-veccmp" 3 (and (eq_attr "type" "veccmp") (eq_attr "cpu" "power10")) @@ -490,6 +510,11 @@ (define_insn_reservation "power10-branch" 2 (eq_attr "cpu" "power10")) "DU_any_power10,STU_power10") +(define_insn_reservation "power10-fused-branch" 3 + (and (eq_attr "type" "fused_mtbc") + (eq_attr "cpu" "power10")) + "DU_even_power10,STU_power10") + ; Crypto (define_insn_reservation "power10-crypto" 4
Re: [PATCH, rs6000] Fix alias set of link reg save MEM
On 6/2/21 9:19 AM, Segher Boessenkool wrote: > On Wed, Jun 02, 2021 at 08:23:48AM -0500, Pat Haugen wrote: >> On 6/2/21 7:01 AM, Richard Biener wrote: >>> So did you check the RTL (and alias-sets) produced by >>> __builtin_return_address? Test coverage might >>> be low here and w/o scheduling opportunities to break things. >> >> __builtin_return_address creates it's own copy of the link reg to a pseudo >> upon function entry. It doesn't appear to try and "reuse" any LR copy/save >> location that might be generated via the prolog code. References to >> __builtin_return_address will then refer to that pseudo. So I don't see any >> connection between the prolog save code and __builtin_return_address. > > That is for __builtin_return_address(0), the simple (and always working) > one. It is harder for non-zero arguments (although I don't see why > those would not work, even with inlining). Right, I realized after I sent the reply I was being a little too specific to __builtin_return_address(0). For non-zero args __builtin_return_address generates code to walk the appropriate number of stack frames back before loading the LR from the designated save area. All those mem references are using either frame-alias-set or 0 as their alias set, so we still should be fine. So regardless of the argument to the builtin, there is no connection between the current function's prologue LR save code and the code __builtin_return_address() would generate in the function. -Pat
Re: [PATCH, rs6000] Fix alias set of link reg save MEM
On 6/2/21 7:01 AM, Richard Biener wrote: > On Wed, Jun 2, 2021 at 1:15 PM Pat Haugen wrote: >> >> On 6/2/21 1:51 AM, Richard Biener wrote: >>> On Tue, Jun 1, 2021 at 10:37 PM Pat Haugen via Gcc-patches >>> wrote: >>>> >>>> Make sure link reg save MEM has frame alias set, to match other link reg >>>> save/restore code. >>>> >>>> Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for >>>> trunk? >>>> >>>> -Pat >>>> >>>> >>>> 2021-06-01 Pat Haugen >>>> >>>> gcc/ChangeLog: >>>> >>>> * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use >>>> gen_frame_store. >>>> >>>> >>>> >>>> diff --git a/gcc/config/rs6000/rs6000-logue.c >>>> b/gcc/config/rs6000/rs6000-logue.c >>>> index 13c00e740d6..07337c4836a 100644 >>>> --- a/gcc/config/rs6000/rs6000-logue.c >>>> +++ b/gcc/config/rs6000/rs6000-logue.c >>>> @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void) >>>>if (!WORLD_SAVE_P (info) && info->lr_save_p >>>>&& !cfun->machine->lr_is_wrapped_separately) >>>> { >>>> - rtx addr, reg, mem; >>>> + rtx reg; >>>> >>>>reg = gen_rtx_REG (Pmode, 0); >>>>START_USE (0); >>>> @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void) >>>>if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR >>>> | SAVE_NOINLINE_FPRS_SAVES_LR))) >>>> { >>>> - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, >>>> - GEN_INT (info->lr_save_offset + frame_off)); >>>> - mem = gen_rtx_MEM (Pmode, addr); >>>> - /* This should not be of rs6000_sr_alias_set, because of >>>> -__builtin_return_address. */ >>> >>> I can't figure what this comment meant - did you? Note the old code >>> looks like it would end up with alias-set zero and thus more conservative >>> than with using frame-alias-set so this is an optimization? >> >> No, I couldn't figure out the reasoning for the comment/code either. It's >> been that way since it was introduced in March 2000 as part of the “Merge in >> changes from newppc-branch.” patch. All other places where we save/restore >> the link reg use a MEM with frame-alias-set. 
This change is an optimization >> as you suspect in that it allows us to schedule non-aliased loads above the >> link reg store (which couldn't happen before due to use of alias-set zero). > > So did you check the RTL (and alias-sets) produced by > __builtin_return_address? Test coverage might > be low here and w/o scheduling opportunities to break things. __builtin_return_address creates it's own copy of the link reg to a pseudo upon function entry. It doesn't appear to try and "reuse" any LR copy/save location that might be generated via the prolog code. References to __builtin_return_address will then refer to that pseudo. So I don't see any connection between the prolog save code and __builtin_return_address. -Pat
Re: [PATCH, rs6000] Fix alias set of link reg save MEM
On 6/2/21 1:51 AM, Richard Biener wrote: > On Tue, Jun 1, 2021 at 10:37 PM Pat Haugen via Gcc-patches > wrote: >> >> Make sure link reg save MEM has frame alias set, to match other link reg >> save/restore code. >> >> Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for >> trunk? >> >> -Pat >> >> >> 2021-06-01 Pat Haugen >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use >> gen_frame_store. >> >> >> >> diff --git a/gcc/config/rs6000/rs6000-logue.c >> b/gcc/config/rs6000/rs6000-logue.c >> index 13c00e740d6..07337c4836a 100644 >> --- a/gcc/config/rs6000/rs6000-logue.c >> +++ b/gcc/config/rs6000/rs6000-logue.c >> @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void) >>if (!WORLD_SAVE_P (info) && info->lr_save_p >>&& !cfun->machine->lr_is_wrapped_separately) >> { >> - rtx addr, reg, mem; >> + rtx reg; >> >>reg = gen_rtx_REG (Pmode, 0); >>START_USE (0); >> @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void) >>if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR >> | SAVE_NOINLINE_FPRS_SAVES_LR))) >> { >> - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, >> - GEN_INT (info->lr_save_offset + frame_off)); >> - mem = gen_rtx_MEM (Pmode, addr); >> - /* This should not be of rs6000_sr_alias_set, because of >> -__builtin_return_address. */ > > I can't figure what this comment meant - did you? Note the old code > looks like it would end up with alias-set zero and thus more conservative > than with using frame-alias-set so this is an optimization? No, I couldn't figure out the reasoning for the comment/code either. It's been that way since it was introduced in March 2000 as part of the “Merge in changes from newppc-branch.” patch. All other places where we save/restore the link reg use a MEM with frame-alias-set. This change is an optimization as you suspect in that it allows us to schedule non-aliased loads above the link reg store (which couldn't happen before due to use of alias-set zero). -Pat
[PATCH, rs6000] Fix alias set of link reg save MEM
Make sure link reg save MEM has frame alias set, to match other link reg save/restore code. Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for trunk? -Pat 2021-06-01 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use gen_frame_store. diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c index 13c00e740d6..07337c4836a 100644 --- a/gcc/config/rs6000/rs6000-logue.c +++ b/gcc/config/rs6000/rs6000-logue.c @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void) if (!WORLD_SAVE_P (info) && info->lr_save_p && !cfun->machine->lr_is_wrapped_separately) { - rtx addr, reg, mem; + rtx reg; reg = gen_rtx_REG (Pmode, 0); START_USE (0); @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void) if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR | SAVE_NOINLINE_FPRS_SAVES_LR))) { - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, - GEN_INT (info->lr_save_offset + frame_off)); - mem = gen_rtx_MEM (Pmode, addr); - /* This should not be of rs6000_sr_alias_set, because of -__builtin_return_address. */ - - insn = emit_move_insn (mem, reg); + insn = emit_insn (gen_frame_store (reg, frame_reg_rtx, +info->lr_save_offset + frame_off)); rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, NULL_RTX, NULL_RTX); END_USE (0);
Re: [PATCH, rs6000] Add ALTIVEC_REGS as pressure class
On 5/7/21 6:00 PM, Segher Boessenkool wrote: >> --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c >> +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c >> @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned >> long long y, >> /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */ >> /* { dg-final { scan-assembler-times "vslw" 1 } } */ >> /* { dg-final { scan-assembler-times "vsld" 1 } } */ >> -/* { dg-final { scan-assembler-times "xxlor" 3 } } */ >> +/* { dg-final { scan-assembler-times "xxlor" 2 } } */ >> /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */ >> /* { dg-final { scan-assembler-times "vrldnm" 2 } } */ > So what is this replaced with? Was it an "xxlmr" and it is just > unnecessary now? Different RA choice made the reg copy unnecessary. < xxspltib 0,8 < xxlor 32,0,0 --- > xxspltib 32,8 -Pat
[PATCH, rs6000] Add ALTIVEC_REGS as pressure class
Add ALTIVEC_REGS as pressure class. Code that has heavy register pressure on Altivec registers can suffer from over-aggressive scheduling during sched1, which then leads to increased register spill. This is due to the fact that registers that prefer ALTIVEC_REGS are currently assigned an allocno class of VSX_REGS. This then misleads the scheduler to think there are 64 regs available, when in reality there are only 32 Altivec regs. This patch fixes the problem by assigning an allocno class of ALTIVEC_REGS and adding ALTIVEC_REGS as a pressure class. Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Testing on CPU2017 showed no significant differences. Ok for trunk? -Pat 2021-05-07 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_ira_change_pseudo_allocno_class): Return ALTIVEC_REGS if that is best_class. (rs6000_compute_pressure_classes): Add ALTIVEC_REGS. gcc/testsuite/ChangeLog: * gcc.target/powerpc/fold-vec-insert-float-p9.c: Adjust instruction counts. * gcc.target/powerpc/vec-rlmi-rlnm.c: Likewise. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 844fee8..fee4eef 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -22487,11 +22487,14 @@ rs6000_ira_change_pseudo_allocno_class (int regno ATTRIBUTE_UNUSED, of allocno class. 
*/ if (best_class == BASE_REGS) return GENERAL_REGS; - if (TARGET_VSX - && (best_class == FLOAT_REGS || best_class == ALTIVEC_REGS)) + if (TARGET_VSX && best_class == FLOAT_REGS) return VSX_REGS; return best_class; +case VSX_REGS: + if (best_class == ALTIVEC_REGS) + return ALTIVEC_REGS; + default: break; } @@ -23609,12 +23612,12 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes) n = 0; pressure_classes[n++] = GENERAL_REGS; + if (TARGET_ALTIVEC) +pressure_classes[n++] = ALTIVEC_REGS; if (TARGET_VSX) pressure_classes[n++] = VSX_REGS; else { - if (TARGET_ALTIVEC) - pressure_classes[n++] = ALTIVEC_REGS; if (TARGET_HARD_FLOAT) pressure_classes[n++] = FLOAT_REGS; } diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c index 1c57672..4541768 100644 --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c @@ -31,5 +31,5 @@ testf_cst (float f, vector float vf) /* { dg-final { scan-assembler-times {\mstfs\M} 2 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {\mlxv\M} 2 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {\mlvewx\M} 1 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {\mvperm\M} 1 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {\mxxperm\M} 2 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mvperm\M} 2 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mxxperm\M} 1 { target ilp32 } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c index 1e7d739..5512c0f 100644 --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y, /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */ /* { dg-final { 
scan-assembler-times "vslw" 1 } } */ /* { dg-final { scan-assembler-times "vsld" 1 } } */ -/* { dg-final { scan-assembler-times "xxlor" 3 } } */ +/* { dg-final { scan-assembler-times "xxlor" 2 } } */ /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */ /* { dg-final { scan-assembler-times "vrldnm" 2 } } */
[PATCH, rs6000 V3] Update "prefix" attribute for Power10 [PR99133]
Update prefixed attribute for Power10. This patch creates a new attribute, "maybe_prefixed", which is used to mark those instructions that may have a prefixed form. The existing "prefixed" attribute is now used to mark all instructions that are in prefixed form. This patch differs from the prior version in that it doesn't modify the existing settings of the "prefixed" attribute but just adds the new attribute and sets/tests it appropriately. Bootstrap/regtest on powerpc64le (Power10) and powerpc64 (Power8 32/64) with no new regressions. Ok for trunk? -Pat 2021-03-30 Pat Haugen gcc/ PR target/99133 * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, xxeval): Mark prefixed. * config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test. * config/rs6000/rs6000.md (define_attr "maybe_prefixed"): New. (define_attr "prefixed"): Update initializer.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 27a269b9e72..21f1cc6f15b 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -826,7 +826,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -845,7 +846,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -864,7 +866,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -893,7 +896,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -921,7 +925,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -931,7 +936,8 @@ (define_insn "xxblend_" UNSPEC_XXBLEND))] "TARGET_POWER10" "xxblendv %x0,%x1,%x2,%x3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxpermx" [(set (match_operand:V2DI 0 "register_operand" "+wa") @@ -975,7 +981,8 @@ (define_insn "xxpermx_inst" 
UNSPEC_XXPERMX))] "TARGET_POWER10" "xxpermx %x0,%x1,%x2,%x3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") @@ -3623,7 +3630,8 @@ (define_insn "xxeval" UNSPEC_XXEVAL))] "TARGET_POWER10" "xxeval %0,%1,%2,%3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") +(set_attr "prefixed" "yes")]) (define_expand "vec_unpacku_hi_v16qi" [(set (match_operand:V8HI 0 "register_operand" "=v") diff --git a/gcc/config/rs6
Re: [PATCH, rs6000 V2] Update "prefix" attribute for Power10 [PR99133]
On 3/18/21 11:33 AM, will schmidt wrote: > Per this change: > > +;; Whether an insn is a prefixed insn. A prefixed instruction has a prefix > +;; instruction word that conveys additional information such as a larger > +;; immediate, additional operands, etc., in addition to the normal > instruction > +;; word. The default "length" attribute will also be adjusted by default to > +;; be 12 bytes. > +(define_attr "prefixed" "no,yes" > + (if_then_else (eq_attr "prepend_prefixed_insn" "yes") > + (const_string "yes") > + (const_string "no"))) > > > .. it looks like at least most of the users of the "prefixed" attribute have > been switched over to use "prepend_prefixed_insn" instead. Are there still > users of the "prefixed" attribute remaining ? I'm guessing so, given context, > but can't tell for certain. > > (Just a question, not a specific request for a change) Yes, there are still a couple uses of get_attr_prefixed() in rs6000.c, plus the Power10 scheduling description makes use of it. -Pat
[PATCH, rs6000 V2] Update "prefix" attribute for Power10 [PR99133]
Update prefixed attribute for Power10. This patch creates a new attribute, prepend_prefixed_insn, which is used to mark those instructions that are prefixed and need to have a 'p' prepended to their mnemonic at asm emit time. The existing "prefix" attribute is now used to mark all instructions that are in prefixed form. Bootstrap/regtest on powerpc64le (Power10) and powerpc64 (Power8 32/64) with no new regressions. Ok for trunk? -Pat 2021-03-17 Pat Haugen gcc/ PR target/99133 * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, xxeval): Mark prefixed. * config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/pcrel-opt.md: Adjust attribute name. * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test. * config/rs6000/rs6000.md (define_attr "prepend_prefixed_insn"): New. (define_attr "prefixed"): Update initializer. (*tls_gd_pcrel, *tls_ld_pcrel, tls_dtprel_, tls_tprel_, *tls_got_tprel_pcrel_, *pcrel_local_addr, *pcrel_extern_addr, stack_protect_setdi, stack_protect_testdi): Adjust attribute name. * config/rs6000/sync.md (load_quadpti, store_quadpti): Likewise.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 27a269b9e72..21f1cc6f15b 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -826,7 +826,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -845,7 +846,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -864,7 +866,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -893,7 +896,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -921,7 +925,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -931,7 +936,8 @@ (define_insn "xxblend_" UNSPEC_XXBLEND))] "TARGET_POWER10" "xxblendv %x0,%x1,%x2,%x3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxpermx" [(set (match_operand:V2DI 0 "register_operand" "+wa") @@ -975,7 +981,8 @@ (define_insn "xxpermx_inst" 
UNSPEC_XXPERMX))] "TARGET_POWER10" "xxpermx %x0,%x1,%x2,%x3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") @@ -3623,7 +3630,8 @@ (define_insn "xxeval" UNSPEC_XXEVAL))] "TARGET_POWER10" "xxeval %0,%1,%2,%3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsi
Re: [PATCH, rs6000] Update "size" attribute for Power10
Ping3 On 2/18/21 2:30 PM, Pat Haugen via Gcc-patches wrote: > Ping2 > > On 1/26/21 11:27 AM, Pat Haugen via Gcc-patches wrote: >> Ping >> >> On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote: >>> Update size attribute for Power10. >>> >>> >>> This patch was broken out from my larger patch to update various attributes >>> for >>> Power10, in order to make the review process hopefully easier. This patch >>> only >>> updates the size attribute for various new instructions. There were no >>> changes >>> requested to this portion of the original patch, so nothing is new here. >>> >>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >>> Ok for trunk? >>> >>> -Pat >>> >>> >>> 2020-11-08 Pat Haugen >>> >>> gcc/ >>> * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, >>> floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, >>> dfp_denbcd_, dfp_dxex_, dfp_diex_, >>> *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size >>> attribute for Power10. >>> * config/rs6000/mma.md (*movoo): Likewise. >>> * config/rs6000/rs6000.md (define_attr "size"): Add 256. >>> (define_mode_attr bits): Add DD/TD modes. >>> * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, >>> store_conditionalpti): Update size attribute for Power10. >>> >> >
Re: [PATCH, rs6000] Add Power10 scheduling description
Ping3 On 2/18/21 2:31 PM, Pat Haugen via Gcc-patches wrote: > Ping2. > > On 1/26/21 11:30 AM, Pat Haugen via Gcc-patches wrote: >> Ping. >> >> On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote: >>> Add Power10 scheduling description. >>> >>> This patch adds the Power10 scheduling description. Since power10.md was >>> pretty much a complete rewrite (existing version of power10.md is mostly >>> just a copy of power9.md), I diffed power10.md with /dev/null so that the >>> full contents of the file are shown as opposed to a diff. This should make >>> it easier to read. This patch will not apply on current trunk due to that >>> reason. >>> >>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >>> Ok for trunk? >>> >>> -Pat >>> >>> >>> 2020-11-13 Pat Haugen >>> >>> gcc/ >>> * config/rs6000/rs6000.c (struct processor_costs): New. >>> (rs6000_option_override_internal): Set Power10 costs. >>> (rs6000_issue_rate): Set Power10 issue rate. >>> * config/rs6000/power10.md: Rewrite for Power10. >>> >> >
[PATCH, rs6000] Rename variable for clarity
Rename next_insn_prefixed_p for improved clarity. Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk? -Pat 2021-02-22 Pat Haugen gcc/ * config/rs6000/rs6000.c (next_insn_prefixed_p): Rename. (rs6000_final_prescan_insn): Adjust. (rs6000_asm_output_opcode): Likewise. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index ec068c58aa5..4e608073358 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -26191,7 +26191,7 @@ prefixed_paddi_p (rtx_insn *insn) /* Whether the next instruction needs a 'p' prefix issued before the instruction is printed out. */ -static bool next_insn_prefixed_p; +static bool prepend_p_to_next_insn; /* Define FINAL_PRESCAN_INSN if some processing needs to be done before outputting the assembler code. On the PowerPC, we remember if the current @@ -26202,7 +26202,7 @@ static bool next_insn_prefixed_p; void rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int) { - next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO); + prepend_p_to_next_insn = (get_attr_prefixed (insn) != PREFIXED_NO); return; } @@ -26212,7 +26212,7 @@ rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int) void rs6000_asm_output_opcode (FILE *stream) { - if (next_insn_prefixed_p) + if (prepend_p_to_next_insn) fprintf (stream, "p"); return;
Re: [PATCH, rs6000] Add Power10 scheduling description
Ping2. On 1/26/21 11:30 AM, Pat Haugen via Gcc-patches wrote: > Ping. > > On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote: >> Add Power10 scheduling description. >> >> This patch adds the Power10 scheduling description. Since power10.md was >> pretty much a complete rewrite (existing version of power10.md is mostly >> just a copy of power9.md), I diffed power10.md with /dev/null so that the >> full contents of the file are shown as opposed to a diff. This should make >> it easier to read. This patch will not apply on current trunk due to that >> reason. >> >> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >> Ok for trunk? >> >> -Pat >> >> >> 2020-11-13 Pat Haugen >> >> gcc/ >> * config/rs6000/rs6000.c (struct processor_costs): New. >> (rs6000_option_override_internal): Set Power10 costs. >> (rs6000_issue_rate): Set Power10 issue rate. >> * config/rs6000/power10.md: Rewrite for Power10. >> >
Re: [PATCH, rs6000] Update "prefix" attribute for Power10
Ping2. On 1/26/21 11:28 AM, Pat Haugen via Gcc-patches wrote: > Ping. > > On 12/10/20 3:32 PM, Pat Haugen via Gcc-patches wrote: >> Update prefixed attribute for Power10. >> >> >> This patch was broken out from my larger patch to update various attributes >> for >> Power10, in order to make the review process hopefully easier. This patch >> only >> updates the prefix attribute for various new instructions. Changes in this >> version include missed updates to rs6000_insn_cost and >> rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added >> additional commentary so hopefully it is clearer. >> >> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >> Ok for trunk? >> >> -Pat >> >> >> 2020-11-10 Pat Haugen >> >> gcc/ >> * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, >> xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, >> xxblend_, xxpermx_inst, xxeval): Mark prefixed "always". >> * config/rs6000/mma.md (mma_, mma_, >> mma_, mma_, mma_, mma_, >> mma_, mma_, mma_, mma_): >> Likewise. >> * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed >> insn. >> (next_insn_prefixed_p): Rename to prefix_next_insn_p. >> (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. >> (rs6000_asm_output_opcode): Adjust. >> (rs6000_adjust_insn_length): Update test for prefixed insns. >> * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always' >> and update commentary. >> >
Re: [PATCH, rs6000] Update "size" attribute for Power10
Ping2 On 1/26/21 11:27 AM, Pat Haugen via Gcc-patches wrote: > Ping > > On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote: >> Update size attribute for Power10. >> >> >> This patch was broken out from my larger patch to update various attributes >> for >> Power10, in order to make the review process hopefully easier. This patch >> only >> updates the size attribute for various new instructions. There were no >> changes >> requested to this portion of the original patch, so nothing is new here. >> >> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >> Ok for trunk? >> >> -Pat >> >> >> 2020-11-08 Pat Haugen >> >> gcc/ >> * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, >> floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, >> dfp_denbcd_, dfp_dxex_, dfp_diex_, >> *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size >> attribute for Power10. >> * config/rs6000/mma.md (*movoo): Likewise. >> * config/rs6000/rs6000.md (define_attr "size"): Add 256. >> (define_mode_attr bits): Add DD/TD modes. >> * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, >> store_conditionalpti): Update size attribute for Power10. >> >
Re: [PATCH, rs6000] Add Power10 scheduling description
Ping. On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote: > Add Power10 scheduling description. > > This patch adds the Power10 scheduling description. Since power10.md was > pretty much a complete rewrite (existing version of power10.md is mostly just > a copy of power9.md), I diffed power10.md with /dev/null so that the full > contents of the file are shown as opposed to a diff. This should make it > easier to read. This patch will not apply on current trunk due to that reason. > > Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok > for trunk? > > -Pat > > > 2020-11-13 Pat Haugen > > gcc/ > * config/rs6000/rs6000.c (struct processor_costs): New. > (rs6000_option_override_internal): Set Power10 costs. > (rs6000_issue_rate): Set Power10 issue rate. > * config/rs6000/power10.md: Rewrite for Power10. >
Re: [PATCH, rs6000] Update "prefix" attribute for Power10
Ping. On 12/10/20 3:32 PM, Pat Haugen via Gcc-patches wrote: > Update prefixed attribute for Power10. > > > This patch was broken out from my larger patch to update various attributes > for > Power10, in order to make the review process hopefully easier. This patch only > updates the prefix attribute for various new instructions. Changes in this > version include missed updates to rs6000_insn_cost and > rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added > additional commentary so hopefully it is clearer. > > Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok > for trunk? > > -Pat > > > 2020-11-10 Pat Haugen > > gcc/ > * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, > xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, > xxblend_, xxpermx_inst, xxeval): Mark prefixed "always". > * config/rs6000/mma.md (mma_, mma_, > mma_, mma_, mma_, mma_, > mma_, mma_, mma_, mma_): > Likewise. > * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed > insn. > (next_insn_prefixed_p): Rename to prefix_next_insn_p. > (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. > (rs6000_asm_output_opcode): Adjust. > (rs6000_adjust_insn_length): Update test for prefixed insns. > * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always' > and update commentary. >
Re: [PATCH, rs6000] Update "size" attribute for Power10
Ping On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote: > Update size attribute for Power10. > > > This patch was broken out from my larger patch to update various attributes > for > Power10, in order to make the review process hopefully easier. This patch only > updates the size attribute for various new instructions. There were no changes > requested to this portion of the original patch, so nothing is new here. > > Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok > for trunk? > > -Pat > > > 2020-11-08 Pat Haugen > > gcc/ > * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, > floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, > dfp_denbcd_, dfp_dxex_, dfp_diex_, > *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size > attribute for Power10. > * config/rs6000/mma.md (*movoo): Likewise. > * config/rs6000/rs6000.md (define_attr "size"): Add 256. > (define_mode_attr bits): Add DD/TD modes. > * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, > store_conditionalpti): Update size attribute for Power10. >
Re: [PATCH,rs6000] Fusion patterns for logical-logical
On 12/10/20 8:41 PM, acsawdey--- via Gcc-patches wrote: > + [(set_attr "type" "logical") Similar to load-cmp fusion pairs, we need a new insn type here. -Pat
Re: [PATCH,rs6000] Combine patterns for p10 load-cmpi fusion
On 12/4/20 1:19 PM, acsawdey--- via Gcc-patches wrote: > + print " [(set_attr \"type\" \"load\")\n"; We need to tag these with a new instruction type, such as 'fused-load-cmp', so the scheduler can distinguish them from normal loads. -Pat
[PATCH, rs6000] Don't set MMA prefixed instruction length to 8
Fix instruction length for MMA insns. Prefixed instructions should not have their length explicitly set to '8'. The function get_attr_length() will adjust the length appropriately based on the value of the "prefixed" attribute. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-12-14 Pat Haugen gcc/ * config/rs6000/mma.md (*movxo, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Remove explicit setting of length attribute. diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 4d291c42f7a..ef4394416b6 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -318,7 +318,7 @@ (define_insn_and_split "*movxo" DONE; } [(set_attr "type" "vecload,vecstore,veclogical") - (set_attr "length" "8,8,16") + (set_attr "length" "*,*,16") (set_attr "max_prefixed_insns" "2,2,*")]) (define_expand "mma_assemble_pair" @@ -539,8 +539,7 @@ (define_insn "mma_" MMA_VVI4I4I8))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -553,8 +552,7 @@ (define_insn "mma_" MMA_AVVI4I4I8))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -566,8 +564,7 @@ (define_insn "mma_" MMA_VVI4I4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -580,8 +577,7 @@ (define_insn "mma_" MMA_AVVI4I4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -592,8 +588,7 @@ (define_insn "mma_" MMA_VVI4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" - [(set_attr "type" 
"mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -605,8 +600,7 @@ (define_insn "mma_" MMA_AVVI4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -617,8 +611,7 @@ (define_insn "mma_" MMA_PVI4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -630,8 +623,7 @@ (define_insn "mma_" MMA_APVI4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -643,8 +635,7 @@ (define_insn "mma_" MMA_VVI4I4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -657,5 +648,4 @@ (define_insn "mma_" MMA_AVVI4I4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")])
[PATCH, rs6000] Update "prefix" attribute for Power10
Update prefixed attribute for Power10. This patch was broken out from my larger patch to update various attributes for Power10, in order to make the review process hopefully easier. This patch only updates the prefix attribute for various new instructions. Changes in this version include missed updates to rs6000_insn_cost and rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added additional commentary so hopefully it is clearer. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-11-10 Pat Haugen gcc/ * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, xxeval): Mark prefixed "always". * config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed insn. (next_insn_prefixed_p): Rename to prefix_next_insn_p. (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. (rs6000_asm_output_opcode): Adjust. (rs6000_adjust_insn_length): Update test for prefixed insns. * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always' and update commentary.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 6a6ce0f84ed..fc926f7a7aa 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -828,7 +828,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -847,7 +848,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -866,7 +868,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -895,7 +898,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -923,7 +927,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -933,7 +938,8 @@ (define_insn "xxblend_" UNSPEC_XXBLEND))] "TARGET_POWER10" "xxblendv %x0,%x1,%x2,%x3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxpermx" [(set (match_operand:V2DI 0 "register_operand" "+wa") @@ -977,7 +983,8 @@ (define_insn 
"xxpermx_inst" UNSPEC_XXPERMX))] "TARGET_POWER10" "xxpermx %x0,%x1,%x2,%x3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") @@ -3625,7 +3632,8 @@ (define_insn "xxeval" UNSPEC_XXEVAL))] "TARGET_POWER10" "xxeval %0,%1,%2,%3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") +
[PATCH, rs6000] Update "size" attribute for Power10
Update size attribute for Power10. This patch was broken out from my larger patch to update various attributes for Power10, in order to make the review process hopefully easier. This patch only updates the size attribute for various new instructions. There were no changes requested to this portion of the original patch, so nothing is new here. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-11-08 Pat Haugen gcc/ * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, dfp_denbcd_, dfp_dxex_, dfp_diex_, *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size attribute for Power10. * config/rs6000/mma.md (*movoo): Likewise. * config/rs6000/rs6000.md (define_attr "size"): Add 256. (define_mode_attr bits): Add DD/TD modes. * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, store_conditionalpti): Update size attribute for Power10. diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md index 9a952300cd6..7562e63a919 100644 --- a/gcc/config/rs6000/dfp.md +++ b/gcc/config/rs6000/dfp.md @@ -139,7 +139,8 @@ (define_insn "extendddtd2" (float_extend:TD (match_operand:DD 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "dctqpq %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "128")]) ;; The result of drdpq is an even/odd register pair with the converted ;; value in the even register and zero in the odd register. 
@@ -153,6 +154,7 @@ (define_insn "trunctddd2" "TARGET_DFP" "drdpq %2,%1\;fmr %0,%2" [(set_attr "type" "dfp") + (set_attr "size" "128") (set_attr "length" "8")]) (define_insn "trunctdsd2" @@ -206,7 +208,8 @@ (define_insn "*cmp_internal1" (match_operand:DDTD 2 "gpc_reg_operand" "d")))] "TARGET_DFP" "dcmpu %0,%1,%2" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "floatdidd2" [(set (match_operand:DD 0 "gpc_reg_operand" "=d") @@ -220,7 +223,8 @@ (define_insn "floatditd2" (float:TD (match_operand:DI 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "dcffixq %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "128")]) ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer. ;; This is the first stage of converting it to an integer type. @@ -230,7 +234,8 @@ (define_insn "ftrunc2" (fix:DDTD (match_operand:DDTD 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "drintn. 0,%0,%1,1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) ;; Convert a decimal64/128 whose value is an integer to an actual integer. ;; This is the second stage of converting decimal float to integer type. 
@@ -240,7 +245,8 @@ (define_insn "fixdi2" (fix:DI (match_operand:DDTD 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "dctfix %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) ;; Decimal builtin support @@ -262,7 +268,8 @@ (define_insn "dfp_ddedpd_" UNSPEC_DDEDPD))] "TARGET_DFP" "ddedpd %1,%0,%2" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "dfp_denbcd_" [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d") @@ -271,7 +278,8 @@ (define_insn "dfp_denbcd_" UNSPEC_DENBCD))] "TARGET_DFP" "denbcd %1,%0,%2" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "dfp_denbcd_v16qi_inst" [(set (match_operand:TD 0 "gpc_reg_operand" "=d") @@ -301,7 +309,8 @@ (define_insn "dfp_dxex_" UNSPEC_DXEX))] "TARGET_DFP" "dxex %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "dfp_diex_" [(set (match_ope
Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote: > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote: >> + >> +(define_insn "dives_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")] >> + UNSPEC_VDIVES))] >> + "TARGET_POWER10" >> + "vdives %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "diveu_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") >> + (match_operand:VIlong 2 "vsx_register_operand" "v")] >> +UNSPEC_VDIVEU))] >> + "TARGET_POWER10" >> + "vdiveu %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "div3" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vdivs %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "udiv3" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vdivu %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "mods_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vmods %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "modu_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> + (match_operand:VIlong 
2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vmodu %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) > > We should only be setting "size" "128" for instructions that operate on > scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are > either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size > 128. If you want to set the size based on mode, (set_attr "size" "") > should do the trick I believe. Well, after you update "(define_mode_attr bits" in rs6000.md for V2DI/V4SI.
Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote: > + > +(define_insn "dives_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VDIVES))] > + "TARGET_POWER10" > + "vdives %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "diveu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > + UNSPEC_VDIVEU))] > + "TARGET_POWER10" > + "vdiveu %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "div3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivs %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "udiv3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivu %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "mods_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmods %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "modu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmodu %0,%1,%2" > + [(set_attr "type" 
"vecdiv") > + (set_attr "size" "128")]) We should only be setting "size" "128" for instructions that operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size 128. If you want to set the size based on mode, (set_attr "size" "") should do the trick I believe. -Pat
Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
On 11/4/20 10:44 AM, Carl Love via Gcc-patches wrote: > + > +(define_insn "vdives_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VDIVES))] > + "TARGET_POWER10" > + "vdives %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vdiveu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > + UNSPEC_VDIVEU))] > + "TARGET_POWER10" > + "vdiveu %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "div3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivs %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "udiv3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivu %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vmods_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmods %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vmodu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmodu %0,%1,%2" > + [(set_attr "type" "vecsimple")]) Since the vdiv.../vmod... 
instructions execute in the fixed point divide unit, all the above instructions should have a type of "div" instead of "vecsimple". > + > +(define_insn "vmulhs_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VMULHS))] > + "TARGET_POWER10" > + "vmulhs %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vmulhu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VMULHU))] > + "TARGET_POWER10" > + "vmulhu %0,%1,%2" > + [(set_attr "type" "vecsimple")])> + > +;; Vector multiply low double word > +(define_insn "mulv2di3" > + [(set (match_operand:V2DI 0 "vsx_register_operand" "=v") > + (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v") > +(match_operand:V2DI 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmulld %0,%1,%2" > + [(set_attr "type" "vecsimple")]) Similarly, the above 3 insns should have a "mul" instruction type. -Pat
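A minimal sketch of the requested change (the mode iterator suffixes were stripped by the archive and are restored here as assumptions):

```
; Divide/modulo insns execute in the fixed-point divide unit, so they
; should carry the divide type for scheduling purposes.
(define_insn "udiv<mode>3"
  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
        (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
                     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
  "TARGET_POWER10"
  "vdivu<wd> %0,%1,%2"
  [(set_attr "type" "div")])        ; was "vecsimple"

; Multiplies likewise get a multiply type rather than "vecsimple".
(define_insn "mulv2di3"
  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
        (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
                   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
  "TARGET_POWER10"
  "vmulld %0,%1,%2"
  [(set_attr "type" "mul")])        ; was "vecsimple"
```

The point of the type change is that the scheduling descriptions (power10.md etc.) key their latency/pipeline reservations off these type attributes, so a mis-typed insn gets the wrong latency.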
Re: [PATCH, rs6000] Add Power10 scheduling description
On 11/17/20 10:31 PM, will schmidt wrote:
> On Fri, 2020-11-13 at 16:04 -0600, Pat Haugen via Gcc-patches wrote:
>> +(define_automaton "power10dsp,power10issue,power10div")
>> +
>> +; Decode/dispatch slots
>> +(define_cpu_unit "du0_power10,du1_power10,du2_power10,du3_power10,
>> +                  du4_power10,du5_power10,du6_power10,du7_power10" "power10dsp")
>> +
>> +; Four execution units
>> +(define_cpu_unit "exu0_power10,exu1_power10,exu2_power10,exu3_power10"
>> +                 "power10issue")
>> +; Two load units and two store units
>> +(define_cpu_unit "lu0_power10,lu1_power10" "power10issue")
>> +(define_cpu_unit "stu0_power10,stu1_power10" "power10issue")
>> +; Create false units for use by non-pipelined div/sqrt
>> +(define_cpu_unit "fx_div0_power10,fx_div1_power10" "power10div")
>> +(define_cpu_unit "fp_div0_power10,fp_div1_power10,fp_div2_power10,
>> +                  fp_div3_power10" "power10div")
>
> The spacing catches my eye, I'd want to add spaces around those commas,
> etc. But.. this appears to be consistent with behavior as seen in the
> existing power9.md ; power9.md ; etc.
> So it's either this way per necessity, or this way per history.
> Either way, no change requested here given that precedence.
> (If this and the older stuff also needs to be cosmetically tweaked,
> that can be handled later on..)

Yeah, just historical.

>> +; Load Unit
>> +(define_insn_reservation "power10-load" 4
>> +  (and (eq_attr "type" "load")
>> +       (eq_attr "update" "no")
>> +       (eq_attr "size" "!128")
>> +       (eq_attr "prefixed" "no")
>> +       (eq_attr "cpu" "power10"))
>> +  "DU_any_power10,LU_power10")
>> +
>> +(define_insn_reservation "power10-prefixed-load" 4
>> +  (and (eq_attr "type" "load")
>> +       (eq_attr "update" "no")
>> +       (eq_attr "size" "!128")
>> +       (eq_attr "prefixed" "!no")
>
> I'm sure there is reason, but remind me.. "!no" versus "yes" ?

From my prior patch, the prefixed attribute can now have values no/yes/always, so '!no' means 'yes' || 'always'.

>> +
>> +; DFP
>> +; Use the minimum 12 cycle latency for all insns, even though some are more
>
> ".. for all DFP insns"
> Since you specify this is a minimum, can prob drop "even though some
> are more".

ok
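As a concrete illustration of the '!no' answer above: with the three-valued attribute, one reservation can cover every prefixed form. This is a sketch only; the reservation string is copied from the non-prefixed load case for illustration and may not match the committed power10.md:

```
; "prefixed" is now no/yes/always, so "!no" matches both prefixed forms,
; i.e. insns that are prefixed because of a large displacement ("yes")
; as well as insns that only exist in prefixed form ("always").
(define_insn_reservation "power10-prefixed-load" 4
  (and (eq_attr "type" "load")
       (eq_attr "update" "no")
       (eq_attr "size" "!128")
       (eq_attr "prefixed" "!no")       ; equivalent to "yes,always"
       (eq_attr "cpu" "power10"))
  "DU_any_power10,LU_power10")          ; reservation string illustrative
```

Writing the test as "!no" keeps the reservation correct even if further attribute values are never needed: any insn that is prefixed in any way takes this path.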
[PATCH, rs6000] Add Power10 scheduling description
Add Power10 scheduling description.

This patch adds the Power10 scheduling description. Since power10.md was pretty much a complete rewrite (the existing version of power10.md is mostly just a copy of power9.md), I diffed power10.md with /dev/null so that the full contents of the file are shown as opposed to a diff. This should make it easier to read. This patch will not apply on current trunk due to that reason.

Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk?

-Pat

2020-11-13  Pat Haugen

gcc/
	* config/rs6000/rs6000.c (struct processor_costs): New.
	(rs6000_option_override_internal): Set Power10 costs.
	(rs6000_issue_rate): Set Power10 issue rate.
	* config/rs6000/power10.md: Rewrite for Power10.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4d528a39a37..85bb42d6dce 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1080,6 +1080,26 @@ struct processor_costs power9_cost = {
   COSTS_N_INSNS (3),	/* SF->DF convert */
 };
 
+/* Instruction costs on POWER10 processors.  */
+static const
+struct processor_costs power10_cost = {
+  COSTS_N_INSNS (1),	/* mulsi */
+  COSTS_N_INSNS (1),	/* mulsi_const */
+  COSTS_N_INSNS (1),	/* mulsi_const9 */
+  COSTS_N_INSNS (1),	/* muldi */
+  COSTS_N_INSNS (4),	/* divsi */
+  COSTS_N_INSNS (4),	/* divdi */
+  COSTS_N_INSNS (2),	/* fp */
+  COSTS_N_INSNS (2),	/* dmul */
+  COSTS_N_INSNS (7),	/* sdiv */
+  COSTS_N_INSNS (9),	/* ddiv */
+  128,			/* cache line size */
+  32,			/* l1 cache */
+  512,			/* l2 cache */
+  16,			/* prefetch streams */
+  COSTS_N_INSNS (2),	/* SF->DF convert */
+};
+
 /* Instruction costs on POWER A2 processors.
   */
 static const
 struct processor_costs ppca2_cost = {
@@ -4734,10 +4754,13 @@ rs6000_option_override_internal (bool global_init_p)
       break;
 
     case PROCESSOR_POWER9:
-    case PROCESSOR_POWER10:
       rs6000_cost = &power9_cost;
       break;
 
+    case PROCESSOR_POWER10:
+      rs6000_cost = &power10_cost;
+      break;
+
     case PROCESSOR_PPCA2:
       rs6000_cost = &ppca2_cost;
       break;
@@ -18001,8 +18024,9 @@ rs6000_issue_rate (void)
     case PROCESSOR_POWER8:
       return 7;
     case PROCESSOR_POWER9:
-    case PROCESSOR_POWER10:
       return 6;
+    case PROCESSOR_POWER10:
+      return 8;
     default:
       return 1;
     }
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
new file mode 100644
index 000..f9ca4cbf10e
--- /dev/null
+++ b/gcc/config/rs6000/power10.md
@@ -0,0 +1,553 @@
+;; Scheduling description for the IBM POWER10 processor.
+;; Copyright (C) 2020-2020 Free Software Foundation, Inc.
+;;
+;; Contributed by Pat Haugen (pthau...@us.ibm.com).
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+; For Power10 we model (and try to pack) the in-order decode/dispatch groups
+; which consist of 8 instructions max.  We do not try to model the details of
+; the out-of-order issue queues and how insns flow to the various execution
+; units except for the simple representation of the issue limitation of at
+; most 4 insns to the execution units/2 insns to the load units/2 insns to
+; the store units.
+(define_automaton "power10dsp,power10issue,power10div") + +; Decode/dispatch slots +(define_cpu_unit "du0_power10,du1_power10,du2_power10,du3_power10, + du4_power10,du5_power10,du6_power10,du7_power10" "power10dsp") + +; Four execution units +(define_cpu_unit "exu0_power10,exu1_power10,exu2_power10,exu3_power10" +"power10issue") +; Two load units and two store units +(define_cpu_unit "lu0_power10,lu1_power10" "power10issue") +(define_cpu_unit "stu0_power10,stu1_power10" "power10issue") +; Create false units for use by non-pipelined div/sqrt +(define_cpu_unit "fx_div0_power10,fx_div1_power10" "power10div") +(define_cpu_unit "fp_div0_power10,fp_div1_power10,fp_div2_power10, + fp_div3_power10" "power10div") + + +; Dispatch slots are allocated in order conforming to program order. +(absence_set "du0_power10" "du1_power10,du2_power10,du3_power10,du4_power10,\ +
Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions
On 11/12/20 10:05 AM, Jeff Law wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns
>> whether they are a fusion candidate. It is called while removing the
>> forward dependencies of the just scheduled insn. If a dependent insn
>> becomes available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It looks fairly reasonable to me. Do you plan on trying to take this
> forward at all?

Due to other requirements for which this approach did not work, we are pursuing a different approach of creating combine patterns for the target.

-Pat
Re: [PATCH, rs6000] Update instruction attributes for Power10
On 11/5/20 4:32 PM, will schmidt wrote: > On Wed, 2020-11-04 at 14:42 -0600, Pat Haugen via Gcc-patches wrote: >> * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for >> PREFIXED_YES. > > The code change reads as roughly > - next_insn_prefixed_p != PREFIXED_NO > > + next_insn_prefixed_p == PREFIXED_YES" > > So just an inversion of the logic? I don't obviously see the 'p' impact > there. > It's no longer an inversion of the logic since I added a PREFIXED_ALWAYS value. 'next_insn_prefixed' is used by rs6000_final_prescan_insn() to determine whether an insn mnemonic needs a 'p' prefix. We want it set for PREFIXED_YES, but not for PREFIXED_NO or PREFIXED_ALWAYS. > >> * config/rs6000/rs6000.md (define_attr "size"): Add 256. >> (define_attr "prefixed"): Add 'always'. >> (define_mode_attr bits): Add DD/TD modes. >> (cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg, >> bswapdi2_brd, setbc_signed_, >> *setbcr_signed_, *setnbc_signed_, >> *setnbcr_signed_): Update instruction attributes for >> Power10. > > ok. (assuming the assorted 'integer' -> 'crypto' changes are correct, > of course). > Yes, crypto represents the correct pipe the insns are executed on. Thanks for the review, Pat
[PATCH, rs6000] Update instruction attributes for Power10
Update instruction attributes for Power10. This patch updates the type/prefixed/dot/size attributes for various new instructions (and a couple existing that were incorrect) in preparation for the Power10 scheduling patch that will be following. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-11-04 Pat Haugen gcc/ * config/rs6000/altivec.md (vsdb_, xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, vstrir_code_, vstrir_p_code_, vstril_code_, vstril_p_code_, altivec_lvsl_reg, altivec_lvsl_direct, altivec_lvsr_reg, altivec_lvsr_direct, xxeval, vcfuged, vclzdm, vctzdm, vpdepd, vpextd, vgnb, vclrlb, vclrrb): Update instruction attributes for Power10. * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, dfp_denbcd_, dfp_dxex_, dfp_diex_, *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Likewise. * config/rs6000/mma.md (*movpoi, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. * config/rs6000/rs6000.md (define_attr "size"): Add 256. (define_attr "prefixed"): Add 'always'. (define_mode_attr bits): Add DD/TD modes. (cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg, bswapdi2_brd, setbc_signed_, *setbcr_signed_, *setnbc_signed_, *setnbcr_signed_): Update instruction attributes for Power10. * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, store_conditionalpti): Update instruction attributes for Power10. * config/rs6000/vsx.md (*xvtlsbb_internal, xxgenpcvm__internal, vextractl_internal, vextractr_internal, vinsertvl_internal_, vinsertvr_internal_, vinsertgl_internal_, vinsertgr_internal_, vreplace_elt__inst): Likewise. 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 0a2e634d6b0..76191ba4107 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -819,7 +819,7 @@ (define_insn "vsdb_" VSHIFT_DBL_LR))] "TARGET_POWER10" "vsdbi %0,%1,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm")]) (define_insn "xxspltiw_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -827,7 +827,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -846,7 +847,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -865,7 +867,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -894,7 +897,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -922,7 +926,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -932,7 +937,8 @@ (define_insn "xx
Re: [PATCH 1/2, rs6000] int128 sign extention instructions (partial prereq)
On 9/24/20 10:59 AM, will schmidt via Gcc-patches wrote: > +;; Move DI value from GPR to TI mode in VSX register, word 1. > +(define_insn "mtvsrdd_diti_w1" > + [(set (match_operand:TI 0 "register_operand" "=wa") > + (unspec:TI [(match_operand:DI 1 "register_operand" "r")] > +UNSPEC_MTVSRD_DITI_W1))] > + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" > + "mtvsrdd %x0,0,%1" > + [(set_attr "type" "vecsimple")]) "vecmove" (since I just updated the other uses). > + > +;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg > +(define_insn "extendditi2_vector" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=v") > +(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")] > + UNSPEC_EXTENDDITI2))] > + "TARGET_POWER10" > + "vextsd2q %0,%1" > + [(set_attr "type" "exts")]) "vecexts". > + > +(define_expand "extendditi2" > + [(set (match_operand:TI 0 "gpc_reg_operand") > +(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))] > + "TARGET_POWER10" > + { > +/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits */ > +rtx temp = gen_reg_rtx (TImode); > +emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1])); > +emit_insn (gen_extendditi2_vector (operands[0], temp)); > +DONE; > + } > + [(set_attr "type" "exts")]) Don't need "type" attr on define_expand since the type will come from the 2 individual insns emitted. Thanks, Pat
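Putting the three review comments together, the revised patterns would look roughly like the sketch below. This is a hedged reconstruction, not the committed version; the operand patterns are taken from the quoted patch as-is:

```
;; Direct moves between GPRs and VSRs are "vecmove", not "vecsimple".
(define_insn "mtvsrdd_diti_w1"
  [(set (match_operand:TI 0 "register_operand" "=wa")
        (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
                   UNSPEC_MTVSRD_DITI_W1))]
  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
  "mtvsrdd %x0,0,%1"
  [(set_attr "type" "vecmove")])

;; Vector sign extension is "vecexts", not the GPR "exts" type.
(define_insn "extendditi2_vector"
  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
        (unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
                   UNSPEC_EXTENDDITI2))]
  "TARGET_POWER10"
  "vextsd2q %0,%1"
  [(set_attr "type" "vecexts")])

;; The expander needs no "type" attribute; each emitted insn carries
;; its own type for scheduling.
(define_expand "extendditi2"
  [(set (match_operand:TI 0 "gpc_reg_operand")
        (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
  "TARGET_POWER10"
{
  /* Move 64-bit src from GPR to vector reg and sign extend to 128 bits.  */
  rtx temp = gen_reg_rtx (TImode);
  emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
  emit_insn (gen_extendditi2_vector (operands[0], temp));
  DONE;
})
```

The underlying rule is the one repeated throughout this thread: "type" attributes drive the per-processor scheduling descriptions, so they must name the pipeline the insn actually executes in, and define_expands that only emit other insns should not carry one at all.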
[PATCH] rs6000: Rename mffgpr/mftgpr instruction types
The following is mostly a mechanical change to rename the mffgpr/mftgpr insn types to mtvsr/mfvsr to be more clear. It also removes Power6 references to those insn types since we no longer generate those instructions. Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk? -Pat 2020-09-11 Pat Haugen gcc/ * config/rs6000/power10.md (power10-mffgpr, power10-mftgpr): Rename to mtvsr/mfvsr. * config/rs6000/power6.md (X2F_power6, power6-mftgpr, power6-mffgpr): Remove. * config/rs6000/power8.md (power8-mffgpr, power8-mftgpr): Rename to mtvsr/mfvsr. * config/rs6000/power9.md (power8-mffgpr, power8-mftgpr): Likewise. * config/rs6000/rs6000.c (rs6000_adjust_cost): Remove Power6 TYPE_MFFGPR cases. * config/rs6000/rs6000.md (mffgpr, mftgpr, zero_extendsi2, extendsi2, @signbit2_dm, lfiwax, lfiwzx, *movsi_internal1, movsi_from_sf, *movdi_from_sf_zero_ext, *mov_internal, movsd_hardfloat, movsf_from_si, *mov_hardfloat64, p8_mtvsrwz, p8_mtvsrd_df, p8_mtvsrd_sf, p8_mfvsrd_3_, *movdi_internal64, unpack_dm): Rename mffgpr/mftgpr to mtvsr/mfvsr. * config/rs6000/vsx.md (vsx_mov_64bit, vsx_extract_, vsx_extract_si, *vsx_extract__p8): Likewise. 
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md index 9f8a5825744..2b4d882e8df 100644 --- a/gcc/config/rs6000/power10.md +++ b/gcc/config/rs6000/power10.md @@ -468,13 +468,13 @@ (define_insn_reservation "power10-qpmul" 24 (eq_attr "cpu" "power10")) "DU_super_power10,dfu_power10*8") -(define_insn_reservation "power10-mffgpr" 2 - (and (eq_attr "type" "mffgpr") +(define_insn_reservation "power10-mtvsr" 2 + (and (eq_attr "type" "mtvsr") (eq_attr "cpu" "power10")) "DU_slice_3_power10,VSU_power10") -(define_insn_reservation "power10-mftgpr" 2 - (and (eq_attr "type" "mftgpr") +(define_insn_reservation "power10-mfvsr" 2 + (and (eq_attr "type" "mfvsr") (eq_attr "cpu" "power10")) "DU_slice_3_power10,VSU_power10") diff --git a/gcc/config/rs6000/power6.md b/gcc/config/rs6000/power6.md index a94ce52425f..e2e7582e67c 100644 --- a/gcc/config/rs6000/power6.md +++ b/gcc/config/rs6000/power6.md @@ -56,10 +56,6 @@ (define_reservation "LX2_power6" (define_reservation "FX2_power6" "iu1_power6+iu2_power6") -(define_reservation "X2F_power6" -"(iu1_power6+iu2_power6+fpu1_power6)\ -|(iu1_power6+iu2_power6+fpu2_power6)") - (define_reservation "BX2_power6" "iu1_power6+iu2_power6+bpu_power6") @@ -605,20 +601,3 @@ (define_bypass 6 "power6-vecperm" "power6-veccomplex" ) (define_bypass 5 "power6-vecperm" "power6-vecstore" ) -(define_insn_reservation "power6-mftgpr" 8 - (and (eq_attr "type" "mftgpr") - (eq_attr "cpu" "power6")) - "X2F_power6") - -(define_insn_reservation "power6-mffgpr" 14 - (and (eq_attr "type" "mffgpr") - (eq_attr "cpu" "power6")) - "LX2_power6") - -(define_bypass 4 "power6-mftgpr" "power6-imul,\ - power6-lmul,\ - power6-imul-cmp,\ - power6-lmul-cmp,\ - power6-imul3,\ - power6-idiv,\ - power6-ldiv" ) diff --git a/gcc/config/rs6000/power8.md b/gcc/config/rs6000/power8.md index fae2ad8bf57..a3f46c62be2 100644 --- a/gcc/config/rs6000/power8.md +++ b/gcc/config/rs6000/power8.md @@ -379,13 +379,13 @@ (define_insn_reservation "power8-vecdiv" 31 (eq_attr "cpu" 
"power8")) "DU_any_power8,VSU_power8") -(define_insn_reservation "power8-mffgpr" 5 - (and (eq_attr "type" "mffgpr") +(define_insn_reservation "power8-mtvsr" 5 + (and (eq_attr "type" "mtvsr") (eq_attr "cpu" "power8")) "DU_any_power8,VSU_power8") -(define_insn_reservation "power8-mftgpr" 6 - (and (eq_attr "type" "mftgpr") +(define_insn_reservation "power8-mfvsr" 6 + (and (eq_attr "type" "mfvsr") (eq_attr "cpu" "power8")) "DU_any_power8,VSU_power8") diff --git a/gcc/config/rs6000/power9.md b/gcc/config/rs6000/power9.md index 2277b145308..c86d643a7d3 100644 --- a/gcc/config/rs6000/power9.md +++ b/gcc/config/rs6000/po
Re: [PATCH] rs6000: Fix instruction type
On 9/9/20 4:41 PM, Segher Boessenkool wrote: > Hi! > > On Wed, Sep 09, 2020 at 04:14:37PM -0500, Pat Haugen wrote: >> I noticed that some of the VSR<->GPR move instructions are not typed >> correctly. This patch fixes those instructions so that the scheduler >> treats them with the correct latency. > >> --- a/gcc/config/rs6000/rs6000.md >> +++ b/gcc/config/rs6000/rs6000.md >> @@ -5483,7 +5483,7 @@ (define_insn "lfiwzx" >> lxsiwzx %x0,%y1 >> mtvsrwz %x0,%1 >> xxextractuw %x0,%x1,4" >> - [(set_attr "type" "fpload,fpload,mftgpr,vecexts") >> + [(set_attr "type" "fpload,fpload,mffgpr,vecexts") >> (set_attr "isa" "*,p8v,p8v,p9v")]) > > Can we rename mftgpr/mffgpr globally? Maybe even as mfvsr and mtvsr, > because that is what is actually modeled here? Such names will make it > much harder to get confused and use the wrong type, too :-) > Those types were originally created for the mffgpr/mftgpr Power6 instructions. But since it appears we no longer generate those insns I totally agree with doing a global change as you suggest to make things clearer. Would you like that as a separate patch or is it fine to include in this one? >> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >> index 54da54c43dc..3a5cf896da8 100644 >> --- a/gcc/config/rs6000/vsx.md >> +++ b/gcc/config/rs6000/vsx.md >> @@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_" >>else >> gcc_unreachable (); >> } >> - [(set_attr "type" "vecperm")]) >> + [(set_attr "type" "vecperm,vecmove")]) > > mtvsrdd is a mtvsr, sorry, mffgpr just the same? It isn't vecmove? > >> @@ -4440,7 +4440,7 @@ (define_insn "vsx_splat__reg" >>"@ >> xxpermdi %x0,%x1,%x1,0 >> mtvsrdd %x0,%1,%1" >> - [(set_attr "type" "vecperm")]) >> + [(set_attr "type" "vecperm,vecmove")]) > > Same here. mtvsrdd dispatches as a vector op, so requires a super-slice. As opposed to the others which just require a single execution slice for Power9. -Pat
[PATCH] rs6000: Fix instruction type
I noticed that some of the VSR<->GPR move instructions are not typed
correctly. This patch fixes those instructions so that the scheduler
treats them with the correct latency.

Bootstrap/regtest on powerpc64le with no new regressions. Also ran a
CPU2017 benchmark comparison on Power9 with no major differences (a
couple minor improvements and no degradations). Ok for trunk?

-Pat

2020-09-09  Pat Haugen

gcc/
	* gcc/config/rs6000/rs6000.md (lfiwzx, floatunssi2_lfiwzx,
	p8_mtvsrwz, p8_mtvsrd_sf): Fix insn type.
	* gcc/config/rs6000/vsx.md (vsx_concat_, vsx_splat__reg,
	vsx_splat_v4sf): Likewise.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620ae1c0..f902c864c26 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5483,7 +5483,7 @@ (define_insn "lfiwzx"
    lxsiwzx %x0,%y1
    mtvsrwz %x0,%1
    xxextractuw %x0,%x1,4"
-  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
+  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
   (set_attr "isa" "*,p8v,p8v,p9v")])

 (define_insn_and_split "floatunssi2_lfiwzx"

@@ -7634,7 +7634,7 @@ (define_insn_and_split "movsf_from_si"
	      *, 12,*, *")
   (set_attr "type" "load, fpload,fpload,fpload,store, fpstore,
-	      fpstore,vecfloat, mffgpr,*")
+	      fpstore,vecfloat, mftgpr,*")
   (set_attr "isa" "*, *, p9v, p8v, *, *,
	      p8v,p8v, p8v, *")])

@@ -8711,7 +8711,7 @@ (define_insn "p8_mtvsrwz"
	UNSPEC_P8V_MTVSRWZ))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrwz %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_fpr_from_gpr"
   [(set (match_operand:FMOVE64X 0 "register_operand" "=d")

@@ -8810,7 +8810,7 @@ (define_insn "p8_mtvsrd_sf"
	UNSPEC_P8V_MTVSRD))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrd %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_vsx_from_gprsf"
   [(set (match_operand:SF 0 "register_operand" "=wa")

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 54da54c43dc..3a5cf896da8 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_"
   else
     gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.

@@ -4440,7 +4440,7 @@ (define_insn "vsx_splat__reg"
   "@
    xxpermdi %x0,%x1,%x1,0
    mtvsrdd %x0,%1,%1"
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 (define_insn "vsx_splat__mem"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")

@@ -4493,7 +4493,7 @@ (define_insn_and_split "vsx_splat_v4sf"
   (unspec:V4SF [(match_dup 0) (const_int 0)]
	       UNSPEC_VSX_XXSPLTW))]
   ""
-  [(set_attr "type" "vecload,vecperm,mftgpr")
+  [(set_attr "type" "vecload,vecperm,vecperm")
   (set_attr "length" "*,8,*")
   (set_attr "isa" "*,p8v,*")])
[PATCH] rs6000: Disable -fcaller-saves by default
Disable -fcaller-saves by default.

This patch turns off -fcaller-saves by default so that IRA doesn't try
to use volatile regs for pseudos that are live across a call, which
would then require LRA to save/restore the reg around the call.

Bootstrap/regtest on powerpc64le with no new regressions. Also ran a
CPU2017 benchmark comparison with no major differences (a few minor
improvements and one minor degradation). Ok for trunk?

-Pat

2020-07-23  Pat Haugen

gcc/
	* common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table): Turn off -fcaller-saves.

gcc/testsuite/
	* gcc.target/powerpc/caller-saves.c: New.

diff --git a/gcc/common/config/rs6000/rs6000-common.c b/gcc/common/config/rs6000/rs6000-common.c
index ee37b9dc90b..6d68834aed2 100644
--- a/gcc/common/config/rs6000/rs6000-common.c
+++ b/gcc/common/config/rs6000/rs6000-common.c
@@ -43,8 +43,13 @@ static const struct default_options rs6000_option_optimization_table[] =
        on rs6000, turn it off by default.  */
     { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

+    /* Don't allow IRA to use volatile regs across function calls.  */
+    { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
+
     /* Double growth factor to counter reduced min jump length.  */
     { OPT_LEVELS_ALL, OPT__param_max_grow_copy_bb_insns_, NULL, 16 },
+
+    /* Following marks the end of the list and needs to remain last.  */
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };

diff --git a/gcc/testsuite/gcc.target/powerpc/caller-saves.c b/gcc/testsuite/gcc.target/powerpc/caller-saves.c
new file mode 100644
index 000..846356f16f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/caller-saves.c
@@ -0,0 +1,9 @@
+/* { dg-do compile */
+/* { dg-options "-O2 -fverbose-asm" } */
+
+void
+foo ()
+{
+}
+
+/* { dg-final { scan-assembler-not "fcaller-saves" } } */
Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions
On 4/8/20 2:46 AM, Richard Biener wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns if
>> they are a fusion candidate. It is called while removing the forward
>> dependencies of the just scheduled insn. If a dependent insn becomes
>> available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It would be nice test if the i386 use of TARGET_SCHED_MACRO_FUSION hooks
> for compare + branch fusion can be transitioned to that scheme. I think your
> proposal doesn't improve things there since we can never schedule
> branches earlier but we could schedule the compare later. But your
> proposal only affects scheduling of later insns, not the order of insns
> getting onto the ready list?

Yes, your understanding is correct, my approach only affects later
dependent insns. The first insn of the fusion pair will be scheduled
according to existing priority based scheduling.

-Pat
[RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions
The Power processor has the ability to fuse certain pairs of dependent
instructions to improve their performance if they appear back-to-back in
the instruction stream. In looking at the current support for
instruction fusion in GCC I saw the following 2 options.

1) TARGET_SCHED_MACRO_FUSION target hooks: Only looks at existing
back-to-back instructions and will ensure the scheduler keeps them
together.

2) -fsched-fusion/TARGET_SCHED_FUSION_PRIORITY: Runs as a separate
scheduling pass before peephole2. Operates independently on a single
insn. Used by the ARM backend to assign higher priorities to base/disp
loads and stores so that the scheduling pass will schedule loads/stores
to adjacent memory back-to-back. Later these insns will be transformed
into load/store pair insns.

Neither of these work for Power's purpose because they don't deal with
fusion of dependent insns that may not already be back-to-back. The
TARGET_SCHED_REORDER[2] hooks also don't work since the dependent insn
more than likely gets queued for N cycles so wouldn't be on the ready
list for the reorder hooks to process. We want the ability for the
scheduler to schedule dependent insn pairs back-to-back when possible
(i.e. other dependencies of both insns have been satisfied).

I have coded up a proof of concept that implements our needs via a new
target hook. The hook is passed a pair of dependent insns and returns if
they are a fusion candidate. It is called while removing the forward
dependencies of the just scheduled insn. If a dependent insn becomes
available to schedule and it's a fusion candidate with the just
scheduled insn, then the new code moves it to the ready list (if
necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
candidate will be scheduled next. Following is the scheduling part of
the diff. Does this sound like a feasible approach? I welcome any
comments/discussion.
Thanks,
Pat

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 80687fb5359..7a62136d497 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -4152,6 +4152,39 @@ schedule_insn (rtx_insn *insn)
	  && SCHED_GROUP_P (next)
	  && advance < effective_cost)
	advance = effective_cost;
+
+      /* If all dependencies of this insn have now been resolved and the
+	 just scheduled insn was not part of a SCHED_GROUP, check if
+	 this dependent insn can be fused with the just scheduled insn.  */
+      else if (effective_cost >= 0 && !SCHED_GROUP_P (insn)
+	       && targetm.sched.dep_fusion
+	       && targetm.sched.dep_fusion (insn, next))
+	{
+	  /* Move to ready list if necessary.  */
+	  if (effective_cost > 0)
+	    {
+	      queue_remove (next);
+	      ready_add (&ready, next, true);
+	    }
+
+	  /* Mark as sched_group.  */
+	  SCHED_GROUP_P (next) = 1;
+
+	  /* Fix insn_tick.  */
+	  INSN_TICK (next) = INSN_TICK (insn);
+
+	  /* Dump some debug output for success.  */
+	  if (sched_verbose >= 5)
+	    {
+	      fprintf (sched_dump, ";;\t\tFusing dependent insns: ");
+	      fprintf (sched_dump, "%4d %-30s --> ", INSN_UID (insn),
+		       str_pattern_slim (PATTERN (insn)));
+	      fprintf (sched_dump, "%4d %-30s\n", INSN_UID (next),
+		       str_pattern_slim (PATTERN (next)));
+	    }
+	}
     }
   else
     /* Check always has only one forward dependence (to the first insn in
Re: [PATCH] correct COUNT and PROB for unrolled loop
On 2/3/20 2:17 AM, Jiufu Guo wrote:
> +/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 937042044" 1
> +"loop2_unroll"} } */

Sorry I didn't catch this addition to the original testcase earlier, but
I wonder how stable this test is going to be. If there are future
changes to default count/probability, or changes in their
representation, this may fail and need to be updated. The fact that the
loop is still getting aligned is the main concern.

-Pat
[PATCH] rs6000: Refactor scheduling hook code
The following patch factors out some common code to its own function and
also moves the Power6 specific code to a new function.

Bootstrap/regtest on powerpc64le with no regressions. Ok for trunk?

-Pat

2019-11-18  Pat Haugen

	* config/rs6000/rs6000.c (move_to_end_of_ready): New, factored out
	from common code.
	(power6_sched_reorder2): Factored out from rs6000_sched_reorder2,
	call new function.
	(power9_sched_reorder2): Call new function.
	(rs6000_sched_reorder2): Likewise.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 278306)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -17711,14 +17711,216 @@ get_next_active_insn (rtx_insn *insn, rt
   return insn;
 }

+/* Move instruction at POS to the end of the READY list.  */
+
+static void
+move_to_end_of_ready (rtx_insn **ready, int pos, int lastpos)
+{
+  rtx_insn *tmp;
+  int i;
+
+  tmp = ready[pos];
+  for (i = pos; i < lastpos; i++)
+    ready[i] = ready[i + 1];
+  ready[lastpos] = tmp;
+}
+
+/* Do Power6 specific sched_reorder2 reordering of ready list.  */
+
+static int
+power6_sched_reorder2 (rtx_insn **ready, int lastpos)
+{
+  /* For Power6, we need to handle some special cases to try and keep the
+     store queue from overflowing and triggering expensive flushes.
+
+     This code monitors how load and store instructions are being issued
+     and skews the ready list one way or the other to increase the likelihood
+     that a desired instruction is issued at the proper time.
+
+     A couple of things are done.  First, we maintain a "load_store_pendulum"
+     to track the current state of load/store issue.
+
+       - If the pendulum is at zero, then no loads or stores have been
+	 issued in the current cycle so we do nothing.
+
+       - If the pendulum is 1, then a single load has been issued in this
+	 cycle and we attempt to locate another load in the ready list to
+	 issue with it.
+
+       - If the pendulum is -2, then two stores have already been
+	 issued in this cycle, so we increase the priority of the first load
+	 in the ready list to increase its likelihood of being chosen first
+	 in the next cycle.
+
+       - If the pendulum is -1, then a single store has been issued in this
+	 cycle and we attempt to locate another store in the ready list to
+	 issue with it, preferring a store to an adjacent memory location to
+	 facilitate store pairing in the store queue.
+
+       - If the pendulum is 2, then two loads have already been
+	 issued in this cycle, so we increase the priority of the first store
+	 in the ready list to increase its likelihood of being chosen first
+	 in the next cycle.
+
+       - If the pendulum < -2 or > 2, then do nothing.
+
+     Note: This code covers the most common scenarios.  There exist non
+	   load/store instructions which make use of the LSU and which
+	   would need to be accounted for to strictly model the behavior
+	   of the machine.  Those instructions are currently unaccounted
+	   for to help minimize compile time overhead of this code.  */
+  int pos;
+  rtx load_mem, str_mem;
+
+  if (is_store_insn (last_scheduled_insn, &str_mem))
+    /* Issuing a store, swing the load_store_pendulum to the left */
+    load_store_pendulum--;
+  else if (is_load_insn (last_scheduled_insn, &load_mem))
+    /* Issuing a load, swing the load_store_pendulum to the right */
+    load_store_pendulum++;
+  else
+    return cached_can_issue_more;
+
+  /* If the pendulum is balanced, or there is only one instruction on
+     the ready list, then all is well, so return.  */
+  if ((load_store_pendulum == 0) || (lastpos <= 0))
+    return cached_can_issue_more;
+
+  if (load_store_pendulum == 1)
+    {
+      /* A load has been issued in this cycle.  Scan the ready list
+	 for another load to issue with it */
+      pos = lastpos;
+
+      while (pos >= 0)
+	{
+	  if (is_load_insn (ready[pos], &load_mem))
+	    {
+	      /* Found a load.  Move it to the head of the ready list,
+		 and adjust its priority so that it is more likely to
+		 stay there */
+	      move_to_end_of_ready (ready, pos, lastpos);
+
+	      if (!sel_sched_p ()
+		  && INSN_PRIORITY_KNOWN (ready[lastpos]))
+		INSN_PRIORITY (ready[lastpos])++;
+	      break;
+	    }
+	  pos--;
+	}
+    }
+  else if (load_store_pendulum == -2)
+    {
+      /* Two stores have been issued in this cycle.  Increase the
+	 priority of the first load in the ready list to favor it for
+	 issuing in the next cycle.  */
+      pos = lastpos;
+
+      while (pos >= 0)
+	{
+	  if (is_load_insn (ready[pos], &load_mem)
+	      && !sel_sched_p ()
+
[PATCH rs6000] Fix PR target/84369: gcc.dg/sms-10.c fails on Power9
As pointed out in the PR, the test is failing because a store->load
dependency is reporting zero cost. Fixed by leaving existing costs as is
(i.e. cost for update forms), and just adding a simple bypass for
store->load dependencies.

Bootstrap/regtest on powerpc64le (Power9) with no new regressions and
testcase now passing. Also ran cpu2006/cpu2017 benchmark comparisons
with no notable differences. Ok for trunk?

-Pat

2019-04-15  Pat Haugen

	PR target/84369
	* config/rs6000/power9.md: Add store forwarding bypass.

Index: gcc/config/rs6000/power9.md
===
--- gcc/config/rs6000/power9.md	(revision 270261)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -236,6 +236,9 @@ (define_insn_reservation "power9-vecstor
	(eq_attr "cpu" "power9"))
  "DU_super_power9,LSU_pair_power9")

+; Store forwarding latency is 6
+(define_bypass 6 "power9-*store*" "power9-*load*")
+
 (define_insn_reservation "power9-larx" 4
   (and (eq_attr "type" "load_l")
	(eq_attr "cpu" "power9"))
[PATCH, testsuite] Enable vect_usad_char effective target for PowerPC and fix up SAD_EXPR testcases
Power9 added support for V16QImode SAD operations. While making the
check_effective_target change I noticed that the tests will also pass on
Power7/Power8 even though they don't have the optab support. The reason
is the tests are only checking that the source pattern is recognized,
not that a SAD_EXPR was actually generated. So I've updated the tests
also.

Ok for trunk?

-Pat

testsuite/ChangeLog:
2019-02-19  Pat Haugen

	* lib/target-supports.exp (check_effective_target_vect_usad_char):
	Add PowerPC support.
	* gcc.dg/vect/slp-reduc-sad.c: Update scan string.
	* gcc.dg/vect/vect-reduc-sad.c: Likewise.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp	(revision 268784)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -5982,7 +5982,9 @@ proc check_effective_target_vect_usad_ch
	expr { [istarget i?86-*-*] || [istarget x86_64-*-*]
	       || ([istarget aarch64*-*-*]
-		   && ![check_effective_target_aarch64_sve])}}]
+		   && ![check_effective_target_aarch64_sve])
+	       || ([istarget powerpc*-*-*]
+		   && [check_p9vector_hw_available])}}]
 }

 # Return 1 if the target plus current options supports both signed

Index: gcc/testsuite/gcc.dg/vect/slp-reduc-sad.c
===
--- gcc/testsuite/gcc.dg/vect/slp-reduc-sad.c	(revision 268784)
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-sad.c	(working copy)
@@ -58,6 +58,6 @@ main ()
   return 0;
 }

-/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "sad pattern recognized" "vect" } } */
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

Index: gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
===
--- gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c	(revision 268784)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c	(working copy)
@@ -49,6 +49,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "sad pattern recognized" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
[PATCH doc] Remove documentation for PowerPC -maltivec={be,le}
The options were removed in May 2018 (r260109), but documentation was
not updated.

Bootstrap on powerpc64le. Ok for trunk?

-Pat

2019-02-12  Pat Haugen

	* doc/invoke.texi (RS/6000 and PowerPC Options): Remove duplicate
	-maltivec. Delete -maltivec=be and -maltivec=le documentation.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 268784)
+++ gcc/doc/invoke.texi	(working copy)
@@ -1085,7 +1085,7 @@ See RS/6000 and PowerPC Options.
 -mstrict-align -mno-strict-align -mrelocatable @gol
 -mno-relocatable -mrelocatable-lib -mno-relocatable-lib @gol
 -mtoc -mno-toc -mlittle -mlittle-endian -mbig -mbig-endian @gol
--mdynamic-no-pic -maltivec -mswdiv -msingle-pic-base @gol
+-mdynamic-no-pic -mswdiv -msingle-pic-base @gol
 -mprioritize-restricted-insns=@var{priority} @gol
 -msched-costly-dep=@var{dependence_type} @gol
 -minsert-sched-nops=@var{scheme} @gol
@@ -24088,40 +24088,14 @@ the AltiVec instruction set.  You may al
 @option{-mabi=altivec} to adjust the current ABI with AltiVec ABI
 enhancements.

-When @option{-maltivec} is used, rather than @option{-maltivec=le} or
-@option{-maltivec=be}, the element order for AltiVec intrinsics such
-as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert}
+When @option{-maltivec} is used, the element order for AltiVec intrinsics
+such as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert}
 match array element order corresponding to the endianness of the
 target.  That is, element zero identifies the leftmost element in a
 vector register when targeting a big-endian platform, and identifies
 the rightmost element in a vector register when targeting a
 little-endian platform.

-@item -maltivec=be
-@opindex maltivec=be
-Generate AltiVec instructions using big-endian element order,
-regardless of whether the target is big- or little-endian.  This is
-the default when targeting a big-endian platform.  Using this option
-is currently deprecated.  Support for this feature will be removed in
-GCC 9.
-
-The element order is used to interpret element numbers in AltiVec
-intrinsics such as @code{vec_splat}, @code{vec_extract}, and
-@code{vec_insert}.  By default, these match array element order
-corresponding to the endianness for the target.
-
-@item -maltivec=le
-@opindex maltivec=le
-Generate AltiVec instructions using little-endian element order,
-regardless of whether the target is big- or little-endian.  This is
-the default when targeting a little-endian platform.  This option is
-currently ignored when targeting a big-endian platform.
-
-The element order is used to interpret element numbers in AltiVec
-intrinsics such as @code{vec_splat}, @code{vec_extract}, and
-@code{vec_insert}.  By default, these match array element order
-corresponding to the endianness for the target.
-
 @item -mvrsave
 @itemx -mno-vrsave
 @opindex mvrsave
Re: Add new --param knobs for inliner
On 1/5/19 11:58 AM, Jan Hubicka wrote:
> @@ -791,7 +791,7 @@ want_inline_small_function_p (struct cgr
>    ipa_hints hints = estimate_edge_hints (e);
>    int big_speedup = -1; /* compute this lazily */
>
> -  if (growth <= 0)
> +  if (growth <= PARAM_VALUE (PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE)))

Extra PARAM_VALUE here.

-Pat
[PATCH] Fix PR68212, unrolled loop no longer aligned
The following patch fixes the case where unrolling in the absence of
profile information can cause a loop to no longer look hot and therefore
not get aligned. In this case, instead of dividing by the unroll factor
we now just scale by profile_probability::likely (). The diff looks
worse than what really changed, just the addition of an if test and
putting the existing 'scale_main' setting code into the else leg.

Bootstrap/regtest on powerpc64le with no new regressions. I also ran a
CPU2006 comparison, there were 4 benchmarks that improved 2-3% with the
others in the noise range. Ok for trunk?

-Pat

2018-11-28  Pat Haugen

	PR rtl-optimization/68212
	* cfgloopmanip.c (duplicate_loop_to_header_edge): Adjust scale
	factor.

testsuite/ChangeLog:
2018-11-28  Pat Haugen

	PR rtl-optimization/68212
	* gcc.dg/pr68212.c: New test.

Index: gcc/cfgloopmanip.c
===
--- gcc/cfgloopmanip.c	(revision 266522)
+++ gcc/cfgloopmanip.c	(working copy)
@@ -1242,16 +1242,25 @@ duplicate_loop_to_header_edge (struct lo
	  profile_probability prob_pass_main = bitmap_bit_p (wont_exit, 0)
					? prob_pass_wont_exit
					: prob_pass_thru;
-	  profile_probability p = prob_pass_main;
-	  profile_count scale_main_den = count_in;
-	  for (i = 0; i < ndupl; i++)
+
+	  /* If not using real profile data then don't scale the loop by ndupl.
+	     This can lead to the loop no longer looking hot wrt surrounding
+	     code.  */
+	  if (profile_status_for_fn (cfun) == PROFILE_GUESSED)
+	    scale_main = profile_probability::likely ();
+	  else
	    {
-	      scale_main_den += count_in.apply_probability (p);
-	      p = p * scale_step[i];
+	      profile_probability p = prob_pass_main;
+	      profile_count scale_main_den = count_in;
+	      for (i = 0; i < ndupl; i++)
+		{
+		  scale_main_den += count_in.apply_probability (p);
+		  p = p * scale_step[i];
+		}
+	      /* If original loop is executed COUNT_IN times, the unrolled
+		 loop will account SCALE_MAIN_DEN times.  */
+	      scale_main = count_in.probability_in (scale_main_den);
	    }
-	  /* If original loop is executed COUNT_IN times, the unrolled
-	     loop will account SCALE_MAIN_DEN times.  */
-	  scale_main = count_in.probability_in (scale_main_den);
	  scale_act = scale_main * prob_pass_main;
	}
      else

Index: gcc/testsuite/gcc.dg/pr68212.c
===
--- gcc/testsuite/gcc.dg/pr68212.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/pr68212.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+    a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/20/18 10:53 AM, Kyrill Tkachov wrote:
> On 20/11/18 16:48, Pat Haugen wrote:
>> On 11/19/18 2:30 PM, Pat Haugen wrote:
>>>> This is a follow-up from
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
>>>> This version introduces an "artificial" property of the dependencies
>>>> produced in sched-deps.c that is recorded when they are created due to
>>>> MAX_PENDING_LIST_LENGTH and they are thus ignored in the
>>>> model_analyze_insns ALAP calculation.
>>>>
>>>> This approach gives most of the benefits of the original patch [1] on
>>>> aarch64. I tried it on the cactusADM hot function
>>>> (bench_staggeredleapfrog2_) on powerpc64le-unknown-linux-gnu
>>>> with -O3 and found that the initial version proposed did indeed increase
>>>> the instruction count and stack space. This version gives a small
>>>> improvement on powerpc in terms of instruction count
>>>> (number of st* instructions stays the same), so I'm hoping this version
>>>> addresses Pat's concerns.
>>>> Pat, could you please try this version out if you've got the chance?
>>>>
>>> I tried the new version on cactusADM, it's showing a 2% degradation. I've
>>> kicked off a full CPU2006 run just to see if any others are affected.
>> The other benchmarks were neutral. So the only benchmark showing a change is
>> the 2% degradation on cactusADM. Comparing the generated .s files for
>> bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns
>> and still the 1% increase in store insns.
>
> Sigh :(
> What options are you compiling with? I tried a powerpc64le compiler with
> plain -O3 and got a slight improvement (by manual inspection)

I was using the following: -O3 -mcpu=power8 -fpeel-loops -funroll-loops
-ffast-math -mpopcntd -mrecip=all. When I run with just -O3 -mcpu=power8
I see just under a 1% degradation.

-Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/19/18 2:30 PM, Pat Haugen wrote:
>> This is a follow-up from
>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
>> This version introduces an "artificial" property of the dependencies
>> produced in sched-deps.c that is recorded when they are created due to
>> MAX_PENDING_LIST_LENGTH and they are thus ignored in the
>> model_analyze_insns ALAP calculation.
>>
>> This approach gives most of the benefits of the original patch [1] on
>> aarch64. I tried it on the cactusADM hot function
>> (bench_staggeredleapfrog2_) on powerpc64le-unknown-linux-gnu
>> with -O3 and found that the initial version proposed did indeed increase the
>> instruction count and stack space. This version gives a small improvement
>> on powerpc in terms of instruction count
>> (number of st* instructions stays the same), so I'm hoping this version
>> addresses Pat's concerns.
>> Pat, could you please try this version out if you've got the chance?
>>
> I tried the new version on cactusADM, it's showing a 2% degradation. I've
> kicked off a full CPU2006 run just to see if any others are affected.

The other benchmarks were neutral. So the only benchmark showing a
change is the 2% degradation on cactusADM. Comparing the generated .s
files for bench_staggeredleapfrog2_(), there is about a 0.7% increase in
load insns and still the 1% increase in store insns.

-Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/19/18 11:54 AM, Kyrill Tkachov wrote:
> On 16/11/18 18:19, Pat Haugen wrote:
>> On 11/8/18 6:10 AM, Kyrill Tkachov wrote:
>>> The attached patch avoids that by making the alap calculation only
>>> look at true dependencies. This shouldn't be too bad, since we use
>>> INSN_PRIORITY as the final tie-breaker than that does take
>>> anti-dependencies into account.
>>>
>>> This reduces the number of spills in the hot function from 436.cactusADM
>>> by 14% on aarch64 at -O3 (and the number of instructions in general).
>>> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
>>> Thanks to Wilco for the benchmarking.
>> I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL
>> algorithm. For CPU2006 only cactusADM had a noticeable difference, but I'm
>> seeing a 5% degradation. Looking at the generated asm for function
>> bench_staggeredleapfrog2_(), I see about a 1% increase in number of loads
>> and stores generated and an extra 100 bytes allocated on the stack.
>>
>> -Pat
>>
>
> This is a follow-up from
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
> This version introduces an "artificial" property of the dependencies produced
> in sched-deps.c that is recorded when they are created due to
> MAX_PENDING_LIST_LENGTH and they are thus ignored in the
> model_analyze_insns ALAP calculation.
>
> This approach gives most of the benefits of the original patch [1] on aarch64.
> I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on
> powerpc64le-unknown-linux-gnu
> with -O3 and found that the initial version proposed did indeed increase the
> instruction count and stack space. This version gives a small improvement
> on powerpc in terms of instruction count
> (number of st* instructions stays the same), so I'm hoping this version
> addresses Pat's concerns.
> Pat, could you please try this version out if you've got the chance?
>

I tried the new version on cactusADM, it's showing a 2% degradation.
I've kicked off a full CPU2006 run just to see if any others are
affected.

-Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/8/18 6:10 AM, Kyrill Tkachov wrote:
> The attached patch avoids that by making the alap calculation only
> look at true dependencies. This shouldn't be too bad, since we use
> INSN_PRIORITY as the final tie-breaker than that does take
> anti-dependencies into account.
>
> This reduces the number of spills in the hot function from 436.cactusADM
> by 14% on aarch64 at -O3 (and the number of instructions in general).
> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
> Thanks to Wilco for the benchmarking.

I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL
algorithm. For CPU2006 only cactusADM had a noticeable difference, but
I'm seeing a 5% degradation. Looking at the generated asm for function
bench_staggeredleapfrog2_(), I see about a 1% increase in number of
loads and stores generated and an extra 100 bytes allocated on the
stack.

-Pat
[PATCH] Fix typo in try_peel_loop
I observed the odd looking code while looking into something else and
confirmed prior behavior (before r248863) was to sum the counts of the
non-latch predecessors of the loop (in order to scale the loop such that
it no longer appears to iterate).

Bootstrap/regtest on powerpc64le with no regressions. Committed as
obvious.

-Pat

2018-10-31  Pat Haugen

	* tree-ssa-loop-ivcanon.c (try_peel_loop): Fix typo and remove
	dead initialization.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c	(revision 264821)
+++ gcc/tree-ssa-loop-ivcanon.c	(working copy)
@@ -1139,10 +1139,10 @@ try_peel_loop (struct loop *loop,
     if (e->src != loop->latch)
       {
	if (e->src->count.initialized_p ())
-	  entry_count = e->src->count + e->src->count;
+	  entry_count += e->src->count;
	gcc_assert (!flow_bb_inside_loop_p (loop, e->src));
       }
-  profile_probability p = profile_probability::very_unlikely ();
+  profile_probability p;
   p = entry_count.probability_in (loop->header->count);
   scale_loop_profile (loop, p, 0);
   bitmap_set_bit (peeled_loops, loop->num);
[PATCH rs6000] Fix PR86612
Probably an obvious patch but... The testcase fails because it looks
like recent glibc headers (2.27 at least) no longer contain a
declaration for __strdup, which results in warning messages being
generated and failure for excess errors. Fixed by calling the standard
name. Verified the testcase now passes, ok for trunk?

-Pat

testsuite/ChangeLog:
2018-07-26  Pat Haugen

	PR target/86612
	* gcc.target/powerpc/pr58673-2.c: Call strdup.

Index: testsuite/gcc.target/powerpc/pr58673-2.c
===
--- testsuite/gcc.target/powerpc/pr58673-2.c	(revision 262974)
+++ testsuite/gcc.target/powerpc/pr58673-2.c	(working copy)
@@ -140,7 +140,7 @@ pr_ff (t_coupl_rec * tcr, real time, t_i
		   malloc (__len);
		   __retval;}
-		  )): __strdup (eoNames[i])));
+		  )): strdup (eoNames[i])));
	  raleg[j++] =
	    (__extension__ (__builtin_constant_p (buf)
@@ -165,7 +165,7 @@ pr_ff (t_coupl_rec * tcr, real time, t_i
		   malloc (__len);
		   __retval;}
-		  )): __strdup (buf)));
+		  )): strdup (buf)));
	}
     }
   if (tcr->nLJ)
Re: [PATCH 1/4] Clean up of new format of -falign-FOO.
testsuite/gcc.target/powerpc/loop_align.c fails with this patch. It just needs a simple tweak to the scan-assembler line since we're no longer generating the ",,31" portion on the .p2align. -Pat
Re: [PATCH rs6000] Fix PR85698
On 5/17/18 1:57 PM, Segher Boessenkool wrote:
> On Thu, May 17, 2018 at 07:58:20PM +0200, Richard Biener wrote:
>> On May 17, 2018 6:04:36 PM GMT+02:00, Segher Boessenkool
>> <seg...@kernel.crashing.org> wrote:
>>> On Thu, May 17, 2018 at 10:42:46AM -0500, Pat Haugen wrote:
>>>> The following patch fixes a problem that resulted in incorrect code
>>>> generation for the CPU2017 benchmark 525.x264_r. The fix correctly
>>>> checks the "dest" operand, which is the memory operand.
>>>>
>>>> Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
>>>> regressions. Ok for trunk?
>>>
>>> Okay. Thanks!
>>
>> Don't forget the branch.
>
> It's okay for both 7 and 8, too.

Fix has been backported to both 7 and 8 branches, along with a fix to the
GCC 8 branch testcase gcc.target/powerpc/vec-setup-be-long.c to remove the
XFAIL that was pre-approved off list.

-Pat
[PATCH rs6000] Fix PR85698
The following patch fixes a problem that resulted in incorrect code
generation for the CPU2017 benchmark 525.x264_r. The fix correctly checks
the "dest" operand, which is the memory operand.

Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
regressions. Ok for trunk?

-Pat

2018-05-17  Pat Haugen  <pthau...@us.ibm.com>
	    Segher Boessenkool  <seg...@kernel.crashing.org>

	PR target/85698
	* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest
	operand.

testsuite/ChangeLog:
2018-05-17  Pat Haugen  <pthau...@us.ibm.com>

	PR target/85698
	* gcc.target/powerpc/pr85698.c: New test.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 260267)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -20234,7 +20234,7 @@ rs6000_output_move_128bit (rtx operands[
     }
   else if (TARGET_ALTIVEC && src_vmx_p
-	   && altivec_indexed_or_indirect_operand (src, mode))
+	   && altivec_indexed_or_indirect_operand (dest, mode))
     return "stvx %1,%y0";

   else if (TARGET_VSX && src_vsx_p)
Index: gcc/testsuite/gcc.target/powerpc/pr85698.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr85698.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr85698.c	(working copy)
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-O3 -mcpu=power7" } */
+
+/* PR85698: Incorrect code generated on LE due to use of stxvw4x.  */
+
+typedef unsigned char uint8_t;
+typedef short int16_t;
+extern void abort (void);
+extern int memcmp(const void *, const void *, __SIZE_TYPE__);
+
+uint8_t expected[128] =
+{14, 0, 4, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
+ 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 28, 35, 33, 35, 36, 37, 38, 39, 40,
+ 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 66, 63, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
+ 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 97, 96,
+ 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
+ 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127};
+
+static uint8_t x264_clip_uint8( int x )
+{
+  return x&(~255) ? (-x)>>31 : x;
+}
+
+void add4x4_idct( uint8_t *p_dst, int16_t dct[16])
+{
+  int16_t d[16];
+  int16_t tmp[16];
+  int i, y, x;
+
+  for( i = 0; i < 4; i++ )
+    {
+      int s02 = dct[0*4+i] + dct[2*4+i];
+      int d02 = dct[0*4+i] - dct[2*4+i];
+      int s13 = dct[1*4+i] + (dct[3*4+i]>>1);
+      int d13 = (dct[1*4+i]>>1) - dct[3*4+i];
+      tmp[i*4+0] = s02 + s13;
+      tmp[i*4+1] = d02 + d13;
+      tmp[i*4+2] = d02 - d13;
+      tmp[i*4+3] = s02 - s13;
+    }
+
+  for( i = 0; i < 4; i++ )
+    {
+      int s02 = tmp[0*4+i] + tmp[2*4+i];
+      int d02 = tmp[0*4+i] - tmp[2*4+i];
+      int s13 = tmp[1*4+i] + (tmp[3*4+i]>>1);
+      int d13 = (tmp[1*4+i]>>1) - tmp[3*4+i];
+      d[0*4+i] = ( s02 + s13 + 32 ) >> 6;
+      d[1*4+i] = ( d02 + d13 + 32 ) >> 6;
+      d[2*4+i] = ( d02 - d13 + 32 ) >> 6;
+      d[3*4+i] = ( s02 - s13 + 32 ) >> 6;
+    }
+
+  for( y = 0; y < 4; y++ )
+    {
+      for( x = 0; x < 4; x++ )
+	p_dst[x] = x264_clip_uint8( p_dst[x] + d[y*4+x] );
+      p_dst += 32;
+    }
+}
+
+int main()
+{
+  uint8_t dst[128];
+  int16_t dct[16];
+  int i;
+
+  for (i = 0; i < 16; i++)
+    dct[i] = i*10 + i;
+  for (i = 0; i < 128; i++)
+    dst[i] = i;
+
+  add4x4_idct(dst, dct);
+
+  if (memcmp (dst, expected, 128))
+    abort();
+
+  return 0;
+}
[Patch, rs6000] Fix register values in ppc-asm.h
The following patch fixes a couple typos in ppc-asm.h. Committed as obvious.
Will also backport to GCC 6/7 branches.

-Pat

2017-11-21  Pat Haugen  <pthau...@us.ibm.com>

	* config/rs6000/ppc-asm.h (f50, vs50): Fix values.

Index: gcc/config/rs6000/ppc-asm.h
===================================================================
--- gcc/config/rs6000/ppc-asm.h	(revision 255022)
+++ gcc/config/rs6000/ppc-asm.h	(working copy)
@@ -120,7 +120,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define f47	47
 #define f48	48
 #define f49	49
-#define f50	30
+#define f50	50
 #define f51	51
 #define f52	52
 #define f53	53
@@ -222,7 +222,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define vs47	47
 #define vs48	48
 #define vs49	49
-#define vs50	30
+#define vs50	50
 #define vs51	51
 #define vs52	52
 #define vs53	53
Re: [PATCH, rs6000] Correct some Power9 scheduling info
On 09/27/2017 12:56 PM, Pat Haugen wrote:
> The following patch corrects some Power9 resource requirements and
> instruction latencies. Bootstrap/regtest on powerpc64le-linux with no
> new regressions. Ok for trunk?

Updated patch follows. Bootstrap/regtest on powerpc64le-linux (Power9) with
no regressions. Ok for trunk?

-Pat

2017-11-15  Pat Haugen  <pthau...@us.ibm.com>

	* rs6000/power9.md (power9fpdiv): New automaton and cpu_unit
	defined for it.
	(DU_C2_3_power9): Correct reservation combinations.
	(FP_DIV_power9, VEC_DIV_power9): New.
	(power9-alu): Split out rotate/shift...
	(power9-rot): ...to here, correct dispatch resource.
	(power9-cracked-alu, power9-mul, power9-mul-compare): Correct
	dispatch resource.
	(power9-fp): Correct latency.
	(power9-sdiv): Add div/sqrt resource.
	(power9-ddiv): Correct latency, add div/sqrt resource.
	(power9-sqrt, power9-dsqrt): Add div/sqrt resource.
	(power9-vecfdiv, power9-vecdiv): Correct latency, add div/sqrt
	resource.
	(power9-qpdiv, power9-qpmul): Adjust resource usage.

Index: gcc/config/rs6000/power9.md
===================================================================
--- gcc/config/rs6000/power9.md	(revision 254708)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -19,7 +19,7 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.

-(define_automaton "power9dsp,power9lsu,power9vsu,power9misc")
+(define_automaton "power9dsp,power9lsu,power9vsu,power9fpdiv,power9misc")

 (define_cpu_unit "lsu0_power9,lsu1_power9,lsu2_power9,lsu3_power9" "power9lsu")
 (define_cpu_unit "vsu0_power9,vsu1_power9,vsu2_power9,vsu3_power9" "power9vsu")
@@ -28,7 +28,11 @@
 ; Two fixed point divide units, not pipelined
 (define_cpu_unit "fx_div0_power9,fx_div1_power9" "power9misc")
 (define_cpu_unit "bru_power9,cryptu_power9,dfu_power9" "power9misc")

+; Create a false unit for use by non-pipelined FP div/sqrt
+(define_cpu_unit "fp_div0_power9,fp_div1_power9,fp_div2_power9,fp_div3_power9"
+		 "power9fpdiv")
+
 (define_cpu_unit "x0_power9,x1_power9,xa0_power9,xa1_power9,
		  x2_power9,x3_power9,xb0_power9,xb1_power9,
		  br0_power9,br1_power9" "power9dsp")
@@ -79,8 +83,7 @@
 ; 2-way cracked plus 3rd slot
 (define_reservation "DU_C2_3_power9"
		    "x0_power9+x1_power9+xa0_power9|
-		     x1_power9+x2_power9+xa0_power9|
-		     x1_power9+x2_power9+xb0_power9|
+		     x1_power9+x2_power9+xa1_power9|
		     x2_power9+x3_power9+xb0_power9")

 ; 3-way cracked (consumes whole decode/dispatch cycle)
@@ -108,7 +111,19 @@
 (define_reservation "VSU_PRM_power9" "prm0_power9|prm1_power9")

+; Define the reservation to be used by FP div/sqrt which allows other insns
+; to be issued to the VSU, but blocks other div/sqrt for a number of cycles.
+; Note that the number of cycles blocked varies depending on insn, but we
+; just use the same number for all in order to keep the number of DFA states
+; reasonable.
+(define_reservation "FP_DIV_power9"
+		    "fp_div0_power9*8|fp_div1_power9*8|fp_div2_power9*8|
+		     fp_div3_power9*8")
+(define_reservation "VEC_DIV_power9"
+		    "fp_div0_power9*8+fp_div1_power9*8|
+		     fp_div2_power9*8+fp_div3_power9*8")
+
 ; LS Unit
 (define_insn_reservation "power9-load" 4
   (and (eq_attr "type" "load")
@@ -243,9 +258,7 @@
 ; Most ALU insns are simple 2 cycle, including record form
 (define_insn_reservation "power9-alu" 2
-  (and (ior (eq_attr "type" "add,exts,integer,logical,isel")
-	    (and (eq_attr "type" "insert,shift")
-		 (eq_attr "dot" "no")))
+  (and (eq_attr "type" "add,exts,integer,logical,isel")
       (eq_attr "cpu" "power9"))
   "DU_any_power9,VSU_power9")
 ; 5 cycle CR latency
@@ -252,12 +265,19 @@
 (define_bypass 5 "power9-alu"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")

+; Rotate/shift prevent use of third slot
+(define_insn_reservation "power9-rot" 2
+  (and (eq_attr "type" "insert,shift")
+       (eq_attr "dot" "no")
+       (eq_attr "cpu" "power9"))
+  "DU_slice_3_power9,VSU_power9")
+
 ; Record form rotate/shift are cracked
 (define_insn_reservation "power9-cracked-alu" 2
   (and (eq_attr "type" "insert,shift")
       (eq_attr "dot" "yes")
       (eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 7 cycle CR latency
 (define_bypass 7 "p
Re: [PATCH, rs6000] Fix scheduling description for quad-precision multiply instructions
On 11/08/2017 11:30 AM, Segher Boessenkool wrote:
>> --- gcc/config/rs6000/power9.md	(revision 254377)
>> +++ gcc/config/rs6000/power9.md	(working copy)
>> @@ -436,6 +436,12 @@ (define_insn_reservation "power9-qpdiv"
>>    (eq_attr "cpu" "power9"))
>>    "DU_super_power9,dfu_power9")
>>
>> +(define_insn_reservation "power9-qpmul" 24
>> +  (and (eq_attr "type" "qmul")
>> +       (eq_attr "size" "128")
>> +       (eq_attr "cpu" "power9"))
>> +  "DU_super_power9,dfu_power9*12")
>
> All other p9 reservations (other than integer div) do not reserve a
> unit more than one cycle. Will this blow up size of the automaton?
> If not, should the other qp ops not reserve for more cycles, too?

You are observant. :) I should have just included the change for qp divide
in this patch since it's so simple, but didn't. The qp divide also blocks
the pipe for a number of cycles, so yes, it should be changed. The other qp
insns (add/sub/etc.) do not, so are correct as is.

> Rest looks fine; okay for trunk with the reservation thing taken care
> of one way or the other.

Thanks! Following is the additional change I'll include, along with an
updated ChangeLog. Testing was fine.

@@ -434,8 +434,14 @@
   (and (eq_attr "type" "vecdiv")
       (eq_attr "size" "128")
       (eq_attr "cpu" "power9"))
-  "DU_super_power9,dfu_power9")
+  "DU_super_power9,dfu_power9*44")
[PATCH, rs6000] Fix scheduling description for quad-precision multiply instructions
The following patch creates a new insn type to annotate quad-precision
multiply instructions, updates the appropriate insns to use the new type,
and creates an entry in the Power9 machine description which describes the
correct latency/resources. Bootstrap/regtest on powerpc64le-linux with no
new regressions. Ok for trunk?

-Pat

2017-11-08  Pat Haugen  <pthau...@us.ibm.com>

	* rs6000/power9.md (power9-qpmul): New.
	* rs6000/rs6000.md ("type" attr): Add qmul.
	(mul<mode>3, fma<mode>4_hw, *fms<mode>4_hw, *nfma<mode>4_hw,
	*nfms<mode>4_hw, mul<mode>3_odd, fma<mode>4_odd, *fms<mode>4_odd,
	*nfma<mode>4_odd, *nfms<mode>4_odd): Change type to qmul.

Index: gcc/config/rs6000/power9.md
===================================================================
--- gcc/config/rs6000/power9.md	(revision 254377)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -436,6 +436,12 @@ (define_insn_reservation "power9-qpdiv"
   (eq_attr "cpu" "power9"))
   "DU_super_power9,dfu_power9")

+(define_insn_reservation "power9-qpmul" 24
+  (and (eq_attr "type" "qmul")
+       (eq_attr "size" "128")
+       (eq_attr "cpu" "power9"))
+  "DU_super_power9,dfu_power9*12")
+
 (define_insn_reservation "power9-mffgpr" 2
   (and (eq_attr "type" "mffgpr")
       (eq_attr "cpu" "power9"))
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 254377)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -182,7 +182,7 @@ (define_attr "type"
   cmp,
   branch,jmpreg,mfjmpr,mtjmpr,trap,isync,sync,load_l,store_c,
   cr_logical,delayed_cr,mfcr,mfcrf,mtcr,
-   fpcompare,fp,fpsimple,dmul,sdiv,ddiv,ssqrt,dsqrt,
+   fpcompare,fp,fpsimple,dmul,qmul,sdiv,ddiv,ssqrt,dsqrt,
   vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,
   vecfloat,vecfdiv,vecdouble,mffgpr,mftgpr,crypto,
   veclogical,veccmpfx,vecexts,vecmove,
@@ -14230,7 +14230,7 @@ (define_insn "mul<mode>3"
	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmulqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "div<mode>3"
@@ -14332,7 +14332,7 @@ (define_insn "fma<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmaddqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*fms<mode>4_hw"
@@ -14344,7 +14344,7 @@ (define_insn "*fms<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmsubqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfma<mode>4_hw"
@@ -14356,7 +14356,7 @@ (define_insn "*nfma<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmaddqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfms<mode>4_hw"
@@ -14369,7 +14369,7 @@ (define_insn "*nfms<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmsubqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "extend<SFDF:mode><IEEE128:mode>2_hw"
@@ -14644,7 +14644,7 @@ (define_insn "mul<mode>3_odd"
	  UNSPEC_MUL_ROUND_TO_ODD))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmulqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "div<mode>3_odd"
@@ -14677,7 +14677,7 @@ (define_insn "fma<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmaddqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*fms<mode>4_odd"
@@ -14690,7 +14690,7 @@ (define_insn "*fms<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmsubqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfma<mode>4_odd"
@@ -14703,7 +14703,7 @@ (define_insn "*nfma<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD)))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmaddqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfms<mode>4_odd"
@@ -14717,7 +14717,7 @@ (define_insn "*nfms<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD)))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmsubqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "trunc<mode>df2_odd"
[PATCH, rs6000] Correct unaligned_load vector cost for Power9
Power9 has efficient unaligned load insns. The following patch fixes the
cost to reflect that. There was no similar code for the unaligned_store
case. Bootstrap/regtest on powerpc64le-linux with no new regressions. Ok
for trunk?

-Pat

2017-10-09  Pat Haugen  <pthau...@us.ibm.com>

	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
	TARGET_P9_VECTOR code for unaligned_load case.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 253547)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5438,9 +5438,6 @@ rs6000_builtin_vectorization_cost (enum
	return 3;

       case unaligned_load:
-	if (TARGET_P9_VECTOR)
-	  return 3;
-
	if (TARGET_EFFICIENT_UNALIGNED_VSX)
	  return 1;
[PATCH, rs6000] Correct some Power9 scheduling info
The following patch corrects some Power9 resource requirements and
instruction latencies. Bootstrap/regtest on powerpc64le-linux with no new
regressions. Ok for trunk?

-Pat

2017-09-27  Pat Haugen  <pthau...@us.ibm.com>

	* config/rs6000/power9.md (DU_C2_3_power9): Remove an incorrect
	combination.
	(power9-alu): Split out insert/shift types...
	(power9-rot): ... to here. Correct dispatch resources.
	(power9-cracked-alu): Correct dispatch resources.
	(power9-mul): Likewise.
	(power9-mul-compare): Likewise.
	(power9-fp): Correct latency.
	(power9-ddiv): Likewise.
	(power9-vecfdiv): Likewise.
	(power9-vecdiv): Likewise.

Index: gcc/config/rs6000/power9.md
===================================================================
--- gcc/config/rs6000/power9.md	(revision 252029)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -80,7 +80,6 @@ (define_reservation "DU_C2_power9" "x0_p
 ; 2-way cracked plus 3rd slot
 (define_reservation "DU_C2_3_power9" "x0_power9+x1_power9+xa0_power9|
				      x1_power9+x2_power9+xa0_power9|
-				      x1_power9+x2_power9+xb0_power9|
				      x2_power9+x3_power9+xb0_power9")

 ; 3-way cracked (consumes whole decode/dispatch cycle)
@@ -243,21 +242,29 @@ (define_insn_reservation "power9-sync" 4
 ; Most ALU insns are simple 2 cycle, including record form
 (define_insn_reservation "power9-alu" 2
-  (and (ior (eq_attr "type" "add,exts,integer,logical,isel")
-	    (and (eq_attr "type" "insert,shift")
-		 (eq_attr "dot" "no")))
+  (and (eq_attr "type" "add,exts,integer,logical,isel")
       (eq_attr "cpu" "power9"))
   "DU_any_power9,VSU_power9")
 ; 5 cycle CR latency
 (define_bypass 5 "power9-alu"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")

+; Rotate/shift prevent use of third slot
+(define_insn_reservation "power9-rot" 2
+  (and (eq_attr "type" "insert,shift")
+       (eq_attr "dot" "no")
+       (eq_attr "cpu" "power9"))
+  "DU_slice_3_power9,VSU_power9")
+; 5 cycle CR latency
+(define_bypass 5 "power9-rot"
+	       "power9-crlogical,power9-mfcr,power9-mfcrf")
+
 ; Record form rotate/shift are cracked
 (define_insn_reservation "power9-cracked-alu" 2
   (and (eq_attr "type" "insert,shift")
       (eq_attr "dot" "yes")
       (eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 7 cycle CR latency
 (define_bypass 7 "power9-cracked-alu"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")
@@ -291,13 +298,13 @@ (define_insn_reservation "power9-mul" 5
   (and (eq_attr "type" "mul")
       (eq_attr "dot" "no")
       (eq_attr "cpu" "power9"))
-  "DU_any_power9,VSU_power9")
+  "DU_slice_3_power9,VSU_power9")

 (define_insn_reservation "power9-mul-compare" 5
   (and (eq_attr "type" "mul")
       (eq_attr "dot" "yes")
       (eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 10 cycle CR latency
 (define_bypass 10 "power9-mul-compare"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")
@@ -349,7 +356,7 @@ (define_insn_reservation "power9-fpsimpl
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")

-(define_insn_reservation "power9-fp" 7
+(define_insn_reservation "power9-fp" 5
   (and (eq_attr "type" "fp,dmul")
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
@@ -366,7 +373,7 @@ (define_insn_reservation "power9-sdiv" 2
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")

-(define_insn_reservation "power9-ddiv" 33
+(define_insn_reservation "power9-ddiv" 27
   (and (eq_attr "type" "ddiv")
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
@@ -419,12 +426,12 @@ (define_insn_reservation "power9-veccomp
       (eq_attr "cpu" "power9"))
   "DU_super_power9,VSU_super_power9")

-(define_insn_reservation "power9-vecfdiv" 28
+(define_insn_reservation "power9-vecfdiv" 24
   (and (eq_attr "type" "vecfdiv")
       (eq_attr "cpu" "power9"))
   "DU_super_power9,VSU_super_power9")

-(define_insn_reservation "power9-vecdiv" 32
+(define_insn_reservation "power9-vecdiv" 27
   (and (eq_attr "type" "vecdiv")
       (eq_attr "size" "!128")
       (eq_attr "cpu" "power9"))