Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
On 9/13/23 3:48 PM, Segher Boessenkool wrote:
>>>> -  "TARGET_POWER10 && TARGET_POWERPC64"
>>>> +  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
>>>>    "vmoduq %0,%1,%2"
>>>
>>> Did we ever test if this insn in fact is slower as well?  I don't mean
>>> either way, orthogonality is good, but just for my enlightenment.
>>
>> Yes, I tested quadword too and saw the same result, div/mul/sub is the
>> better option.
>
> With improved changelog: okay for trunk.  Okay for all backports as well
> (after some soak time).

Thanks, updated changelog and committed to trunk. Will backport after
burn-in.

-Pat
PING^4: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
PING^3: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
PING ^2: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
>
> 2023-06-30  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
[PATCH V4, rs6000] Disable generation of scalar modulo instructions
Updated from prior version to address latest review comment (simplify
umod<mode>3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen

gcc/
	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
	scalar modulo.
	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
	(define_expand umod<mode>3): New.
	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
	(umodti3, modti3): Disable.

gcc/testsuite/
	* gcc.target/powerpc/clone1.c: Add xfails.
	* gcc.target/powerpc/clone3.c: Likewise.
	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
	* gcc.target/powerpc/mod-2.c: Likewise.
	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 07c3a3d15ac..72abf285301 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22157,7 +22157,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    *total = rs6000_cost->divsi;
 	}
       /* Add in shift and subtract for MOD unless we have a mod instruction.  */
-      if (!TARGET_MODULO && (code == MOD || code == UMOD))
+      if ((!TARGET_MODULO
+	   || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+	  && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
       rs6000_asm_output_opcode (STREAM);	\
     }						\
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..555c8525333 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3422,6 +3422,17 @@ (define_expand "mod<mode>3"
 	FAIL;

       operands[2] = force_reg (<MODE>mode, operands[2]);
+
+      if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
     }
   else
     {
@@ -3441,17 +3452,36 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  "TARGET_MODULO"
+{
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      rtx temp1 = gen_reg_rtx (<MODE>mode);
+      rtx temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})

-(define_insn "umod<mode>3"
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

@@ -3508,7 +3538,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
	(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
		 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
Re: [PATCH V3, rs6000] Disable generation of scalar modulo instructions
On 6/27/23 1:52 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address review comments (update
> rs6000_rtx_cost, update scan strings of mod-1.c/mod-2.c).
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-06-27  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> 	scalar modulo.
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.

Attaching patch since my mailer apparently messed up some formatting again.

-Pat

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    *total = rs6000_cost->divsi;
 	}
       /* Add in shift and subtract for MOD unless we have a mod instruction.  */
-      if (!TARGET_MODULO && (code == MOD || code == UMOD))
+      if ((!TARGET_MODULO
+	   || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+	  && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
       rs6000_asm_output_opcode (STREAM);	\
     }						\
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3"
 	FAIL;

       operands[2] = force_reg (<MODE>mode, operands[2]);
+
+      if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
     }
   else
     {
@@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod<mode>3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})
+
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])
[PATCH V3, rs6000] Disable generation of scalar modulo instructions
Updated from prior version to address review comments (update
rs6000_rtx_cost, update scan strings of mod-1.c/mod-2.c).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-27  Pat Haugen

gcc/
	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
	scalar modulo.
	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
	(define_expand umod<mode>3): New.
	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
	(umodti3, modti3): Disable.

gcc/testsuite/
	* gcc.target/powerpc/clone1.c: Add xfails.
	* gcc.target/powerpc/clone3.c: Likewise.
	* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
	* gcc.target/powerpc/mod-2.c: Likewise.
	* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    *total = rs6000_cost->divsi;
 	}
       /* Add in shift and subtract for MOD unless we have a mod instruction.  */
-      if (!TARGET_MODULO && (code == MOD || code == UMOD))
+      if ((!TARGET_MODULO
+	   || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+	  && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
       rs6000_asm_output_opcode (STREAM);	\
     }						\
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3"
 	FAIL;

       operands[2] = force_reg (<MODE>mode, operands[2]);
+
+      if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
     }
   else
     {
@@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod<mode>3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})
+
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])

@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
	(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
		 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions
Ping ^3

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:
> Updated from prior patch to also disable for int128.
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-04-18  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Likewise.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.
Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions
Ping.

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:
> Updated from prior patch to also disable for int128.
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-04-18  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Likewise.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.
Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions
Ping.

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:
> Updated from prior patch to also disable for int128.
>
> Disable generation of scalar modulo instructions.
>
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
>
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
>
> -Pat
>
> 2023-04-18  Pat Haugen
>
> gcc/
> 	* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
> 	* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> 	(define_expand umod<mode>3): New.
> 	(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> 	(umodti3, modti3): Disable.
>
> gcc/testsuite/
> 	* gcc.target/powerpc/clone1.c: Add xfails.
> 	* gcc.target/powerpc/clone3.c: Likewise.
> 	* gcc.target/powerpc/mod-1.c: Likewise.
> 	* gcc.target/powerpc/mod-2.c: Likewise.
> 	* gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.
[PATCH V2, rs6000] Disable generation of scalar modulo instructions
Updated from prior patch to also disable for int128. Disable generation of scalar modulo instructions. It was recently discovered that the scalar modulo instructions can suffer noticeable performance issues for certain input values. This patch disables their generation since the equivalent div/mul/sub sequence does not suffer the same problem. Bootstrapped and regression tested on powerpc64/powerpc64le. Ok for master and backports after burn in? -Pat 2023-04-18 Pat Haugen gcc/ * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New. * config/rs6000/rs6000.md (mod3, *mod3): Disable. (define_expand umod3): New. (define_insn umod3): Rename to *umod3 and disable. (umodti3, modti3): Disable. gcc/testsuite/ * gcc.target/powerpc/clone1.c: Add xfails. * gcc.target/powerpc/clone3.c: Likewise. * gcc.target/powerpc/mod-1.c: Likewise. * gcc.target/powerpc/mod-2.c: Likewise. * gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise. diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 3503614efbd..1cf0a0013c0 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -2492,3 +2492,9 @@ while (0) rs6000_asm_output_opcode (STREAM); \ } \ while (0) + +/* Disable generation of scalar modulo instructions due to performance issues + with certain input values. This can be removed in the future when the + issues have been resolved. 
*/ +#define RS6000_DISABLE_SCALAR_MODULO 1 + diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 44f7dd509cb..4f397bc9179 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3421,6 +3421,17 @@ (define_expand "mod3" FAIL; operands[2] = force_reg (mode, operands[2]); + + if (RS6000_DISABLE_SCALAR_MODULO) + { + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_div3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; + } } else { @@ -3440,17 +3451,42 @@ (define_insn "*mod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) +;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is +;; removed. 
+(define_expand "umod3" + [(set (match_operand:GPR 0 "gpc_reg_operand") + (umod:GPR (match_operand:GPR 1 "gpc_reg_operand") + (match_operand:GPR 2 "gpc_reg_operand")))] + "" +{ + rtx temp1; + rtx temp2; + + if (!TARGET_MODULO) + FAIL; -(define_insn "umod3" + if (RS6000_DISABLE_SCALAR_MODULO) +{ + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_udiv3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; +} +}) + +(define_insn "*umod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) @@ -3507,7 +3543,7 @@ (define_insn "umodti3" [(set (match_operand:TI 0 "altivec_register_operand" "=v") (umod:TI (match_operand:TI 1 "altivec_register_operand" "v") (match_operand:TI 2 "altivec_register_operand" "v")))] - "TARGET_POWER10 && TARGET_POWERPC64" + "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO" "vmoduq %0,%1,%2" [(set_attr "type" "vecdiv") (set_attr "size" "128")]) @@ -3516,7 +3552,7 @@ (define_insn "modti3" [(set (match_operand:TI 0 "altivec_register_operand" "=v") (mod:TI (match_operand:TI 1 "altivec_register_operand" "v") (match_operand:TI 2 "altivec_register_operand&
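The transformation the new define_expand performs can be sketched in plain C (a minimal model of the emitted sequence; the real code builds RTL via gen_div<mode>3 / gen_mul<mode>3 / gen_sub<mode>3, not C arithmetic):

```c
#include <assert.h>

/* Model of the div/mul/sub sequence emitted in place of a mods/modu
   instruction: for any nonzero b, a - (a / b) * b == a % b under C's
   truncating-division semantics.  */
long
mod_via_div (long a, long b)
{
  long temp1 = a / b;      /* gen_div<mode>3 -> divd  */
  long temp2 = temp1 * b;  /* gen_mul<mode>3 -> mulld */
  return a - temp2;        /* gen_sub<mode>3 -> subf  */
}
```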
[PATCH, rs6000] Disable generation of scalar modulo instructions
Disable generation of scalar modulo instructions. It was recently discovered that the scalar modulo instructions can suffer noticeable performance issues for certain input values. This patch disables their generation since the equivalent div/mul/sub sequence does not suffer the same problem. Bootstrapped and regression tested on powerpc64le (Power10). Ok for master and backports after burn in? -Pat 2023-04-07 Pat Haugen gcc/ * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New. * config/rs6000/rs6000.md (mod3, *mod3): Use it. (define_expand umod3): New. (define_insn umod3): Rename to *umod3 and disable. gcc/testsuite/ * gcc.target/powerpc/clone1.c: Add xfails. * gcc.target/powerpc/clone3.c: Likewise. * gcc.target/powerpc/mod-1.c: Likewise. * gcc.target/powerpc/mod-2.c: Likewise. diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 3503614efbd..e4d9f357622 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -2492,3 +2492,9 @@ while (0) rs6000_asm_output_opcode (STREAM); \ } \ while (0) + +/* Disable generation of scalar modulo instructions for word/dword due to + performance issues with certain input values. This can be removed in the + future when the issues have been resolved. 
*/ +#define RS6000_DISABLE_SCALAR_MODULO 1 + diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 44f7dd509cb..a267b7ee2d0 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3421,6 +3421,17 @@ (define_expand "mod3" FAIL; operands[2] = force_reg (mode, operands[2]); + + if (RS6000_DISABLE_SCALAR_MODULO) + { + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_div3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; + } } else { @@ -3440,17 +3451,42 @@ (define_insn "*mod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) +;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is +;; removed. 
+(define_expand "umod3" + [(set (match_operand:GPR 0 "gpc_reg_operand") + (umod:GPR (match_operand:GPR 1 "gpc_reg_operand") + (match_operand:GPR 2 "gpc_reg_operand")))] + "" +{ + rtx temp1; + rtx temp2; + + if (!TARGET_MODULO) + FAIL; + + if (RS6000_DISABLE_SCALAR_MODULO) +{ + temp1 = gen_reg_rtx (mode); + temp2 = gen_reg_rtx (mode); + + emit_insn (gen_udiv3 (temp1, operands[1], operands[2])); + emit_insn (gen_mul3 (temp2, temp1, operands[2])); + emit_insn (gen_sub3 (operands[0], operands[1], temp2)); + DONE; +} +}) -(define_insn "umod3" +(define_insn "*umod3" [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] - "TARGET_MODULO" + "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") (set_attr "size" "")]) diff --git a/gcc/testsuite/gcc.target/powerpc/clone1.c b/gcc/testsuite/gcc.target/powerpc/clone1.c index c69fd2aa1b8..74323ca0e8c 100644 --- a/gcc/testsuite/gcc.target/powerpc/clone1.c +++ b/gcc/testsuite/gcc.target/powerpc/clone1.c @@ -21,6 +21,7 @@ long mod_func_or (long a, long b, long c) return mod_func (a, b) | c; } -/* { dg-final { scan-assembler-times {\mdivd\M} 1 } } */ -/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */ -/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */ +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */ +/* { dg-final { scan-assembler-times {\mdivd\M} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\mmulld\M} 1 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\mmodsd\M} 1 { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/clone3.c b/gcc/testsuite/gcc.target/powerpc/clone3.c index 911b88b781d..d3eb4dd2378 100644 --- a/gcc/testsuite/gcc.target/powerpc/clone3.c +++ b/gcc/testsuite/gcc.target/powerpc/clone3.c @@ -27,7 +27,8 @
[PATCH V2, rs6000] Tweak modulo define_insns to eliminate register copy
Updated patch with review comments addressed: fixed up testcase and added another testcase to verify peephole is functional. Don't force target of modulo into a distinct register. The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Bootstrapped and regression tested on powerpc64le. Ok for master? -Pat 2023-03-21 Pat Haugen gcc/ * config/rs6000/rs6000.md (*mod3, umod3): Add non-earlyclobber alternative. gcc/testsuite/ * gcc.target/powerpc/mod-no_copy.c: New. * gcc.target/powerpc/mod-peephole.c: New. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 81bffb04ceb..44f7dd509cb 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3437,9 +3437,9 @@ (define_expand "mod3" ;; In order to enable using a peephole2 for combining div/mod to eliminate the ;; mod, prefer putting the result of mod into a different register (define_insn "*mod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") -(match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") +(match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") @@ -3447,9 +3447,9 @@ (define_insn "*mod3" (define_insn "umod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") - (match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") + (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") diff --git a/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c 
b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c new file mode 100644 index 000..c55e486ee9b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ + +/* Verify r3 is used as source and target, no copy inserted. */ + +long foo (long a, long b) +{ + return (a % b); +} + +unsigned long foo2 (unsigned long a, unsigned long b) +{ + return (a % b); +} + +/* { dg-final { scan-assembler-not {\mmr\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/mod-peephole.c b/gcc/testsuite/gcc.target/powerpc/mod-peephole.c new file mode 100644 index 000..7517fbc397c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-peephole.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ + +/* Verify peephole fires to combine div/mod using same opnds. */ + +long foo (long a, long b) +{ + long x, y; + + x = a / b; + y = a % b; + return (x + y); +} + +unsigned long foo2 (unsigned long a, unsigned long b) +{ + unsigned long x, y; + + x = a / b; + y = a % b; + return (x + y); +} + +/* { dg-final { scan-assembler-not {\mmodsd\M} } } */ +/* { dg-final { scan-assembler-not {\mmodud\M} } } */
Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
On 2/27/23 2:53 PM, Segher Boessenkool wrote: Hi! On Mon, Feb 27, 2023 at 02:12:23PM -0600, Pat Haugen wrote: On 2/27/23 11:08 AM, Segher Boessenkool wrote: On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote: The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Have you verified those peepholes still match? Yes, I verified the peepholes still match and transform the sequence. Please add the testcases for that then? Or do we have tests for it already :-) I don't see one, but can add one. Do those peepholes actually improve performance? On new CPUs? The code here says ;; On machines with modulo support, do a combined div/mod the old fashioned ;; method, since the multiply/subtract is faster than doing the mod instruction ;; after a divide. but that really should not be true: we can do the div and mod in parallel (except in SMT4 perhaps, which we never schedule for anyway), so that should always be strictly faster. Since the modulo insns were introduced in Power9, we're just talking Power9/Power10. On paper, I would agree that separate div/mod could be slightly faster to get the mod result, "Slightly". It takes 12 cycles for the two in parallel (64-bit, p9), but 17 cycles for the "cheaper" sequence (divd+mulld+subf, 12+5+2). It is all worse if the units are busy of course, or if there are other problems. but if you throw in another independent div or mod in the insn stream then doing the peephole should be a clear win since that 3rd insn can execute in parallel with the initial divide as opposed to waiting for the one of the first div/mod to clear the exclusive stage of the pipe. That is the SMT4 case, the one we do not optimise for. SMT2 and ST can do four in parallel. 
This means you can start a div or mod every 2nd cycle on average, so it is very unlikely you will ever be limited by this on real code. Power9/Power10 only have 2 fixed-point divide units, and are able to issue 2 divides every 9/11 cycles (they aren't fully pipelined), with latencies of 12-24/12-25. Not saying that changes the "best case" scenario, just pointing out a lot of variables in play. -Pat
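The pattern being argued over — a divide and a modulo of the same operands, which the peephole2 combines so only one divide executes — can be sketched in C (illustrative source, mirroring the mod-peephole testcase in this thread):

```c
#include <assert.h>

/* Without the peephole: divd and modsd can issue in parallel.
   With the peephole: one divd, then r = a - q * b (mulld + subf),
   leaving a divide unit free for other independent divides.  */
long
divmod_sum (long a, long b)
{
  long q = a / b;
  long r = a % b;  /* peephole rewrites this as a - q * b, reusing q */
  return q + r;
}
```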
Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
On 2/27/23 11:08 AM, Segher Boessenkool wrote: Hi! On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote: The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Have you verified those peepholes still match? Yes, I verified the peepholes still match and transform the sequence. Do those peepholes actually improve performance? On new CPUs? The code here says ;; On machines with modulo support, do a combined div/mod the old fashioned ;; method, since the multiply/subtract is faster than doing the mod instruction ;; after a divide. but that really should not be true: we can do the div and mod in parallel (except in SMT4 perhaps, which we never schedule for anyway), so that should always be strictly faster. Since the modulo insns were introduced in Power9, we're just talking Power9/Power10. On paper, I would agree that separate div/mod could be slightly faster to get the mod result, but if you throw in another independent div or mod in the insn stream then doing the peephole should be a clear win since that 3rd insn can execute in parallel with the initial divide as opposed to waiting for the one of the first div/mod to clear the exclusive stage of the pipe. --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ All files in gcc.target/powerpc/ test for this already. Just leave off the target clause here? +/* { dg-require-effective-target powerpc_p9modulo_ok } */ Leave out this line, because ... +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ ... the -mcpu= forces it to true always. Will update. -Pat +/* Verify r3 is used as source and target, no copy inserted. 
*/ +/* { dg-final { scan-assembler-not {\mmr\M} } } */ That is probably good enough, yeah, since the test results in only a handful of insns. Segher
[PATCH, rs6000] Tweak modulo define_insns to eliminate register copy
Don't force target of modulo into a distinct register. The define_insns for the modulo operation currently force the target register to a distinct reg in preparation for a possible future peephole combining div/mod. But this can lead to cases of a needless copy being inserted. Fixed with the following patch. Bootstrapped and regression tested on powerpc64le. Ok for master? -Pat 2023-02-27 Pat Haugen gcc/ * config/rs6000/rs6000.md (*mod3, umod3): Add non-earlyclobber alternative. gcc/testsuite/ * gcc.target/powerpc/mod-no_copy.c: New. diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 81bffb04ceb..44f7dd509cb 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -3437,9 +3437,9 @@ (define_expand "mod3" ;; In order to enable using a peephole2 for combining div/mod to eliminate the ;; mod, prefer putting the result of mod into a different register (define_insn "*mod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") -(match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") +(match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "mods %0,%1,%2" [(set_attr "type" "div") @@ -3447,9 +3447,9 @@ (define_insn "*mod3" (define_insn "umod3" - [(set (match_operand:GPR 0 "gpc_reg_operand" "=") -(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r") - (match_operand:GPR 2 "gpc_reg_operand" "r")))] + [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r") +(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r") + (match_operand:GPR 2 "gpc_reg_operand" "r,r")))] "TARGET_MODULO" "modu %0,%1,%2" [(set_attr "type" "div") diff --git a/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c new file mode 100644 index 000..91e3003b3fc --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c @@ -0,0 +1,17 @@ +/* { dg-do compile { 
target { powerpc*-*-* } } } */ +/* { dg-require-effective-target powerpc_p9modulo_ok } */ +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */ + +/* Verify r3 is used as source and target, no copy inserted. */ + +long foo (long a, long b) +{ + return (a % b); +} + +unsigned long foo2 (unsigned long a, unsigned long b) +{ + return (a % b); +} + +/* { dg-final { scan-assembler-not {\mmr\M} } } */
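Note the archive appears to have eaten the `&` earlyclobber marker in the quoted constraint strings; the change presumably reads as below (a sketch only — mode-iterator suffixes such as `<wd>` are likewise elided in the quoted mail):

```lisp
;; before: a single earlyclobber alternative -- the result must land in
;; a register distinct from the inputs (keeps the div/mod peephole viable)
;;   (match_operand:GPR 0 "gpc_reg_operand" "=&r")
;; after: a second, non-earlyclobber alternative lets register allocation
;; reuse an input register for the result, avoiding a needless mr copy
;;   (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
```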
Re: [PATCH] rs6000: Make P10_FUSION honour tuning setting
On 1/4/23 3:20 AM, Kewen.Lin via Gcc-patches wrote: diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 88c865b6b4b..6fa084c0807 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4378,9 +4378,15 @@ rs6000_option_override_internal (bool global_init_p) rs6000_isa_flags &= ~OPTION_MASK_MMA; } - if (TARGET_POWER10 - && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0) -rs6000_isa_flags |= OPTION_MASK_P10_FUSION; + /* Enable power10 fusion if we are tuning for power10, even if we aren't + generating power10 instructions. */ + if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION)) +{ + if (processor_target_table[tune_index].processor == PROCESSOR_POWER10) You can use (rs6000_tune == PROCESSOR_POWER10) at this point. -Pat + rs6000_isa_flags |= OPTION_MASK_P10_FUSION; + else + rs6000_isa_flags &= ~OPTION_MASK_P10_FUSION; +}
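The decision logic of the quoted hunk, with the reviewer's `rs6000_tune` suggestion applied, can be modeled as a toy function (hypothetical names, not GCC code):

```c
#include <assert.h>

/* Returns whether P10 fusion ends up enabled: an explicit
   -m(no-)power10-fusion on the command line wins; otherwise the flag
   follows the tuning target (rs6000_tune == PROCESSOR_POWER10), not
   the instruction-set target.  */
int
p10_fusion_enabled (int explicit_p, int explicit_value, int tune_is_power10)
{
  if (explicit_p)
    return explicit_value;
  return tune_is_power10;
}
```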
Ping: [PATCH, rs6000] Fix passing of Complex IEEE 128-bit [PR99685]

Ping. On 4/26/22 3:06 PM, Pat Haugen via Gcc-patches wrote: Fix register count when not splitting Complex IEEE 128-bit args. For ABI_V4, we do not split complex args. This created a problem because even though an arg would be passed in two VSX regs, we were only advancing the function arg counter by one VSX register. Fixed with this patch. Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. Ok for master? -Pat 2022-04-26 Pat Haugen PR testsuite/99685 gcc/ * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump register count when not splitting IEEE 128-bit Complex. diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc index f06c692..9d18607 100644 --- a/gcc/config/rs6000/rs6000-call.cc +++ b/gcc/config/rs6000/rs6000-call.cc @@ -,6 +,12 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS *cum, machine_mode mode, { cum->vregno += n_elts; + /* If we are not splitting Complex IEEE 128-bit args then account for + the fact that they are passed in 2 VSX regs. */ + if (! targetm.calls.split_complex_arg && type + && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode) + cum->vregno++; + if (!TARGET_ALTIVEC) error ("cannot pass argument in vector register because" " altivec instructions are disabled, use %qs"
[PATCH, rs6000] Fix passing of Complex IEEE 128-bit [PR99685]
Fix register count when not splitting Complex IEEE 128-bit args. For ABI_V4, we do not split complex args. This created a problem because even though an arg would be passed in two VSX regs, we were only advancing the function arg counter by one VSX register. Fixed with this patch. Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. Ok for master? -Pat 2022-04-26 Pat Haugen PR testsuite/99685 gcc/ * config/rs6000/rs6000-call.cc (rs6000_function_arg_advance_1): Bump register count when not splitting IEEE 128-bit Complex. diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc index f06c692..9d18607 100644 --- a/gcc/config/rs6000/rs6000-call.cc +++ b/gcc/config/rs6000/rs6000-call.cc @@ -,6 +,12 @@ rs6000_function_arg_advance_1 (CUMULATIVE_ARGS *cum, machine_mode mode, { cum->vregno += n_elts; + /* If we are not splitting Complex IEEE 128-bit args then account for +the fact that they are passed in 2 VSX regs. */ + if (! targetm.calls.split_complex_arg && type + && TREE_CODE (type) == COMPLEX_TYPE && elt_mode == KCmode) + cum->vregno++; + if (!TARGET_ALTIVEC) error ("cannot pass argument in vector register because" " altivec instructions are disabled, use %qs"
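The register-count fix in rs6000_function_arg_advance_1 can be modeled as a toy function (not GCC code; names and the simplified n_elts handling are illustrative):

```c
#include <assert.h>

/* Toy model of the VSX-register counter advance: a _Complex IEEE
   128-bit (KCmode) argument that is NOT split is passed in two VSX
   registers, so the counter must advance by one extra register.  */
int
advance_vregno (int vregno, int n_elts,
                int split_complex_arg, int is_complex_kc)
{
  vregno += n_elts;
  if (!split_complex_arg && is_complex_kc)
    vregno++;  /* the fix: account for the second VSX register */
  return vregno;
}
```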
PING: [PATCH, testsuite] Fix attr-retain-*.c testcases on 32-bit PowerPC [PR100407]
Ping.

On 2/10/22 4:17 PM, Pat Haugen via Gcc-patches wrote:
Per Alan's comment in the bugzilla, fix attr-retain-* testcases for 32-bit PowerPC.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat

2022-02-10  Pat Haugen

	PR testsuite/100407

gcc/testsuite/
	* gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC.
	* gcc.c-torture/compile/attr-retain-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
index 6cab155..4a366eb 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target R_flag_in_section } } */
+/* { dg-options "-G0" { target { powerpc*-*-* && ilp32 } } } */
 /* { dg-final { scan-assembler ".text.*,\"axR\"" } } */
 /* { dg-final { scan-assembler ".bss.*,\"awR\"" } } */
 /* { dg-final { scan-assembler ".data.*,\"awR\"" } } */
diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
index 0208ffe..d9fc150 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
@@ -11,5 +11,6 @@
 /* { dg-final { scan-assembler ".bss.used_lcomm2,\"awR\"" { target arm-*-* } } } */
 /* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
 /* { dg-options "-ffunction-sections -fdata-sections" } */
+/* { dg-options "-ffunction-sections -fdata-sections -G0" { target { powerpc*-*-* && ilp32 } } } */
 #include "attr-retain-1.c"
PING: [PATCH, rs6000] Clean up Power10 fusion options
Ping. On 1/28/22 12:03 PM, Pat Haugen via Gcc-patches wrote: Mark Power10 fusion option undocumented and remove sub-options. Bootstrapped and regression tested on powerpc64le(Power10). Ok for master? -Pat 2022-01-28 Pat Haugen gcc/ * config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented. (mpower10-fusion-ld-cmpi, mpower10-fusion-2logical, mpower10-fusion-logical-add, mpower10-fusion-add-logical, mpower10-fusion-2add, mpower10-fusion-2store): Remove. * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER, OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options. * config/rs6000/rs6000.cc (rs6000_option_override_internal, power10_sched_reorder): Likewise. * config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf, gen_addadd): Likewise * config/rs6000/fusion.md: Regenerate.
[PATCH, testsuite] Fix attr-retain-*.c testcases on 32-bit PowerPC [PR100407]
Per Alan's comment in the bugzilla, fix attr-retain-* testcases for 32-bit PowerPC.

Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
Ok for master?

-Pat

2022-02-10  Pat Haugen

	PR testsuite/100407

gcc/testsuite/
	* gcc.c-torture/compile/attr-retain-1.c: Add -G0 for 32-bit PowerPC.
	* gcc.c-torture/compile/attr-retain-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
index 6cab155..4a366eb 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-1.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target R_flag_in_section } } */
+/* { dg-options "-G0" { target { powerpc*-*-* && ilp32 } } } */
 /* { dg-final { scan-assembler ".text.*,\"axR\"" } } */
 /* { dg-final { scan-assembler ".bss.*,\"awR\"" } } */
 /* { dg-final { scan-assembler ".data.*,\"awR\"" } } */
diff --git a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
index 0208ffe..d9fc150 100644
--- a/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/attr-retain-2.c
@@ -11,5 +11,6 @@
 /* { dg-final { scan-assembler ".bss.used_lcomm2,\"awR\"" { target arm-*-* } } } */
 /* { dg-final { scan-assembler ".data.used_foo_sec,\"awR\"" } } */
 /* { dg-options "-ffunction-sections -fdata-sections" } */
+/* { dg-options "-ffunction-sections -fdata-sections -G0" { target { powerpc*-*-* && ilp32 } } } */
 #include "attr-retain-1.c"
[PATCH, rs6000] Clean up Power10 fusion options
Mark Power10 fusion option undocumented and remove sub-options. Bootstrapped and regression tested on powerpc64le(Power10). Ok for master? -Pat 2022-01-28 Pat Haugen gcc/ * config/rs6000/rs6000.opt (mpower10-fusion): Mark Undocumented. (mpower10-fusion-ld-cmpi, mpower10-fusion-2logical, mpower10-fusion-logical-add, mpower10-fusion-add-logical, mpower10-fusion-2add, mpower10-fusion-2store): Remove. * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER, OTHER_P9_VECTOR_MASKS): Remove Power10 fusion sub-options. * config/rs6000/rs6000.cc (rs6000_option_override_internal, power10_sched_reorder): Likewise. * config/rs6000/genfusion.pl (gen_ld_cmpi_p10, gen_logical_addsubf, gen_addadd): Likewise * config/rs6000/fusion.md: Regenerate. diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index c2a77182a9e..b4e69e9fefd 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -491,33 +491,9 @@ Target Mask(P8_VECTOR) Var(rs6000_isa_flags) Use vector and scalar instructions added in ISA 2.07. mpower10-fusion -Target Mask(P10_FUSION) Var(rs6000_isa_flags) +Target Undocumented Mask(P10_FUSION) Var(rs6000_isa_flags) Fuse certain integer operations together for better performance on power10. -mpower10-fusion-ld-cmpi -Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags) -Fuse certain integer operations together for better performance on power10. - -mpower10-fusion-2logical -Target Undocumented Mask(P10_FUSION_2LOGICAL) Var(rs6000_isa_flags) -Fuse pairs of scalar or vector logical operations together for better performance on power10. - -mpower10-fusion-logical-add -Target Undocumented Mask(P10_FUSION_LOGADD) Var(rs6000_isa_flags) -Fuse scalar logical op with add/subf for better performance on power10. - -mpower10-fusion-add-logical -Target Undocumented Mask(P10_FUSION_ADDLOG) Var(rs6000_isa_flags) -Fuse scalar add/subf with logical op for better performance on power10. 
- -mpower10-fusion-2add -Target Undocumented Mask(P10_FUSION_2ADD) Var(rs6000_isa_flags) -Fuse dependent pairs of add or vaddudm instructions for better performance on power10. - -mpower10-fusion-2store -Target Undocumented Mask(P10_FUSION_2STORE) Var(rs6000_isa_flags) -Fuse certain store operations together for better performance on power10. - mcrypto Target Mask(CRYPTO) Var(rs6000_isa_flags) Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions. diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 325b21967c4..963947f6939 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -85,13 +85,7 @@ #define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \ | OPTION_MASK_POWER10 \ | OTHER_POWER10_MASKS \ -| OPTION_MASK_P10_FUSION \ -| OPTION_MASK_P10_FUSION_LD_CMPI \ -| OPTION_MASK_P10_FUSION_2LOGICAL \ -| OPTION_MASK_P10_FUSION_LOGADD\ -| OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD \ -| OPTION_MASK_P10_FUSION_2STORE) +| OPTION_MASK_P10_FUSION) /* Flags that need to be turned off if -mno-power9-vector. */ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ @@ -139,12 +133,6 @@ | OPTION_MASK_FPRND\ | OPTION_MASK_POWER10 \ | OPTION_MASK_P10_FUSION \ -| OPTION_MASK_P10_FUSION_LD_CMPI \ -| OPTION_MASK_P10_FUSION_2LOGICAL \ -| OPTION_MASK_P10_FUSION_LOGADD\ -| OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD \ -| OPTION_MASK_P10_FUSION_2STORE\ | OPTION_MASK_HTM \ | OPTION_MASK_ISEL \ | OPTION_MASK_MFCRF\ diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index a5fd36b72d9..548366abada 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4458,30 +4458,6 @@ rs6000_option_override_internal (bool global_init_p) && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0) rs6000_isa_flags |= OPTION_MASK_P10_FUSION; - if (TA
Ping^5: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. I'll note that I recently discovered that this patch also fixes PR93176 and PR97042. -Pat On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests.
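The building block behind rotate_and_mask_constant — rotating a 64-bit value so a 16-bit signed immediate can form many more constants — can be sketched with an illustrative helper (an assumption for exposition, not the GCC implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n (0 <= n < 64).  Models how a constant loaded with a
   single li/lis can then be turned into a 64-bit constant by one
   rldicl-style rotate, possibly combined with a mask.  */
uint64_t
rotl64 (uint64_t x, unsigned n)
{
  return n ? (x << n) | (x >> (64 - n)) : x;
}
```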
Ping^4: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests.
Ping^3: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests. >
Ping: [PATCH, rs6000 V2] rotate and mask constants [PR94393]
Ping. On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote: > Updated version of the patch. Changes made from original are updated > commentary to hopefully aid readability, no functional changes. > > > Implement more two insn constants. rotate_and_mask_constant covers > 64-bit constants that can be formed by rotating a 16-bit signed > constant, rotating a 16-bit signed constant masked on left or right > (rldicl and rldicr), rotating a 16-bit signed constant masked by > rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns. All the > values possible for DImode rs6000_is_valid_and_mask are covered. > > Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le. > Ok for master? > > -Pat > > > 2021-11-22 Alan Modra > Pat Haugen > > PR 94393 > gcc/ > * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant, > is_rotate_negative_constant, rotate_and_mask_constant): New functions. > (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here. > * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete. > gcc/testsuite/ > * gcc.target/powerpc/rot_cst.h, > * gcc.target/powerpc/rot_cst1.c, > * gcc.target/powerpc/rot_cst2.c: New tests. >
[PATCH, rs6000 GCC-11 committed] Fix fusion testcase counts
Somehow there was an issue when a larger patch was backported, and this chunk did not make it. Tested and committed as obvious. -Pat 2021-10-04 Pat Haugen gcc/testsuite/ChangeLog: * gcc.target/powerpc/fusion-p10-ldcmpi.c: Update counts. diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c index ea1d5d01950..526a026d874 100644 --- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c +++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c @@ -53,16 +53,16 @@ TEST(int16_t) TEST(uint8_t) TEST(int8_t) -/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero" 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero" 4 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none" 4 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none" 4 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none" 1 { target lp64 } } } */ /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none" 1 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign" 8 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero" 2 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 3 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign" 16 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero" 4 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign" 0 { target lp64 } } } */ /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none" 4 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 2 { target lp64 } } } */ +/* { dg-final { scan-assembler-times 
"lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero" 0 { target lp64 } } } */ /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none" 2 { target lp64 } } } */ /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero" 2 { target ilp32 } } } */
Ping: [PATCH, rs6000 V2] Add store fusion support for Power10
Ping. On 8/11/21 11:02 AM, Pat Haugen via Gcc-patches wrote: > Enable store fusion on Power10. > > Use the SCHED_REORDER hook to implement Power10 specific ready list > reordering. > As of now this is just store fusion. > > Things changed in this version of the patch > - Separate patch for additional load/store checks > - Move option check from is_fusable_store() to caller > - Misc coding style changes pointed out in review (parens/braces) > - Add testcases > > Bootstrap/regtest on powerpc64(32/64) and powerpc64le(Power10) with no new > regressions. > Ok for master? > > -Pat > > > 2021-08-11 Pat Haugen > > gcc/ChangeLog: > > * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag. > (POWERPC_MASKS): Likewise. > * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable > store fusion for Power10. > (is_fusable_store): New. > (power10_sched_reorder): Likewise. > (rs6000_sched_reorder): Do Power10 specific reordering. > (rs6000_sched_reorder2): Likewise. > * config/rs6000/rs6000.opt: Add new option. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/fusion-p10-stst.c: New test. > * gcc.target/powerpc/fusion-p10-stst2.c: New test. > > > > diff --git a/gcc/config/rs6000/rs6000-cpus.def > b/gcc/config/rs6000/rs6000-cpus.def > index 6758296c0fd..f5812da0184 100644 > --- a/gcc/config/rs6000/rs6000-cpus.def > +++ b/gcc/config/rs6000/rs6000-cpus.def > @@ -90,7 +90,8 @@ >| OPTION_MASK_P10_FUSION_2LOGICAL \ >| OPTION_MASK_P10_FUSION_LOGADD\ >| OPTION_MASK_P10_FUSION_ADDLOG\ > - | OPTION_MASK_P10_FUSION_2ADD) > + | OPTION_MASK_P10_FUSION_2ADD \ > + | OPTION_MASK_P10_FUSION_2STORE) > > /* Flags that need to be turned off if -mno-power9-vector. 
*/ > #define OTHER_P9_VECTOR_MASKS(OPTION_MASK_FLOAT128_HW > \ > @@ -143,6 +144,7 @@ >| OPTION_MASK_P10_FUSION_LOGADD\ >| OPTION_MASK_P10_FUSION_ADDLOG\ >| OPTION_MASK_P10_FUSION_2ADD \ > + | OPTION_MASK_P10_FUSION_2STORE\ >| OPTION_MASK_HTM \ >| OPTION_MASK_ISEL \ >| OPTION_MASK_MFCRF\ > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 60f406a4ff6..402cc924e3f 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -4495,6 +4495,10 @@ rs6000_option_override_internal (bool global_init_p) >&& (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) > rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; > > + if (TARGET_POWER10 > + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) > +rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; > + >/* Turn off vector pair/mma options on non-power10 systems. */ >else if (!TARGET_POWER10 && TARGET_MMA) > { > @@ -18874,6 +18878,91 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos) >return cached_can_issue_more; > } > > +/* Determine if INSN is a store to memory that can be fused with a similar > + adjacent store. */ > + > +static bool > +is_fusable_store (rtx_insn *insn, rtx *str_mem) > +{ > + /* Insn must be a non-prefixed base+disp form store. */ > + if (is_store_insn (insn, str_mem) > + && get_attr_prefixed (insn) == PREFIXED_NO > + && get_attr_update (insn) == UPDATE_NO > + && get_attr_indexed (insn) == INDEXED_NO) > +{ > + /* Further restictions by mode and size. */ > + machine_mode mode = GET_MODE (*str_mem); > + HOST_WIDE_INT size; > + if (MEM_SIZE_KNOWN_P (*str_mem)) > + size = MEM_SIZE (*str_mem); > + else > + return false; > + > + if (INTEGRAL_MODE_P (mode)) > + /* Must be word or dword size. */ > + return (size == 4 || size == 8); > + else if (FLOAT_MODE_P (mode)) > + /* Must be dword size. */ > + return (size == 8); > +} > + > + return false; > +} > + > +/* Do Power10 specific reordering of the ready list. 
*/ > + > +static int > +power10_sched_reorder (rtx_insn **ready, int lastpos) > +{ > + rtx mem1; > + > + /* Do store fusion during sched2
[PATCH, rs6000 V2] Add store fusion support for Power10
Enable store fusion on Power10. Use the SCHED_REORDER hook to implement Power10 specific ready list reordering. As of now this is just store fusion. Things changed in this version of the patch - Separate patch for additional load/store checks - Move option check from is_fusable_store() to caller - Misc coding style changes pointed out in review (parens/braces) - Add testcases Bootstrap/regtest on powerpc64(32/64) and powerpc64le(Power10) with no new regressions. Ok for master? -Pat 2021-08-11 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag. (POWERPC_MASKS): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable store fusion for Power10. (is_fusable_store): New. (power10_sched_reorder): Likewise. (rs6000_sched_reorder): Do Power10 specific reordering. (rs6000_sched_reorder2): Likewise. * config/rs6000/rs6000.opt: Add new option. gcc/testsuite/ChangeLog: * gcc.target/powerpc/fusion-p10-stst.c: New test. * gcc.target/powerpc/fusion-p10-stst2.c: New test. diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 6758296c0fd..f5812da0184 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -90,7 +90,8 @@ | OPTION_MASK_P10_FUSION_2LOGICAL \ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD) +| OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE) /* Flags that need to be turned off if -mno-power9-vector. 
*/ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ @@ -143,6 +144,7 @@ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ | OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE\ | OPTION_MASK_HTM \ | OPTION_MASK_ISEL \ | OPTION_MASK_MFCRF\ diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 60f406a4ff6..402cc924e3f 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4495,6 +4495,10 @@ rs6000_option_override_internal (bool global_init_p) && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; + if (TARGET_POWER10 + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) +rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; + /* Turn off vector pair/mma options on non-power10 systems. */ else if (!TARGET_POWER10 && TARGET_MMA) { @@ -18874,6 +18878,91 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos) return cached_can_issue_more; } +/* Determine if INSN is a store to memory that can be fused with a similar + adjacent store. */ + +static bool +is_fusable_store (rtx_insn *insn, rtx *str_mem) +{ + /* Insn must be a non-prefixed base+disp form store. */ + if (is_store_insn (insn, str_mem) + && get_attr_prefixed (insn) == PREFIXED_NO + && get_attr_update (insn) == UPDATE_NO + && get_attr_indexed (insn) == INDEXED_NO) +{ + /* Further restictions by mode and size. */ + machine_mode mode = GET_MODE (*str_mem); + HOST_WIDE_INT size; + if (MEM_SIZE_KNOWN_P (*str_mem)) + size = MEM_SIZE (*str_mem); + else + return false; + + if (INTEGRAL_MODE_P (mode)) + /* Must be word or dword size. */ + return (size == 4 || size == 8); + else if (FLOAT_MODE_P (mode)) + /* Must be dword size. */ + return (size == 8); +} + + return false; +} + +/* Do Power10 specific reordering of the ready list. */ + +static int +power10_sched_reorder (rtx_insn **ready, int lastpos) +{ + rtx mem1; + + /* Do store fusion during sched2 only. 
*/ + if (!reload_completed) +return cached_can_issue_more; + + /* If the prior insn finished off a store fusion pair then simply + reset the counter and return, nothing more to do. */ + if (load_store_pendulum != 0) +{ + load_store_pendulum = 0; + return cached_can_issue_more; +} + + /* Try to pair certain store insns to adjacent memory locations + so that the hardware will fuse them to a single operation. */ + if (TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE + && is_fusable_store (last_scheduled_insn, )) +{ + int pos; + rtx me
[PATCH, rs6000 V2] Add additional checks when identifying load/store instructions
Add additional checks to verify destination[source] of a load[store] instruction is a register. Modified code based on review comment to not change general logic of the flow. Braces needed on inner if-else to prevent error during bootstrap for ambiguous 'else'. Bootstrap/regtest on powerpc64le with no new regressions. Ok for master? -Pat 2021-08-07 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000.c (is_load_insn1): Verify destination is a register. (is_store_insn1): Verify source is a register. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..c8a146a7d18 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -18357,8 +18357,13 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) -return find_mem_ref (SET_SRC (pat), load_mem); + if (GET_CODE (pat)== SET) +{ + if (REG_P (SET_DEST (pat))) + return find_mem_ref (SET_SRC (pat), load_mem); + else + return false; +} if (GET_CODE (pat) == PARALLEL) { @@ -18395,7 +18400,12 @@ is_store_insn1 (rtx pat, rtx *str_mem) return false; if (GET_CODE (pat) == SET) -return find_mem_ref (SET_DEST (pat), str_mem); +{ + if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat))) + return find_mem_ref (SET_DEST (pat), str_mem); + else + return false; +} if (GET_CODE (pat) == PARALLEL) {
Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions
On 8/6/21 11:05 AM, Segher Boessenkool wrote: > On Fri, Aug 06, 2021 at 10:29:40AM -0500, Pat Haugen wrote: >> On 8/6/21 10:02 AM, Segher Boessenkool wrote: >>>> - if (GET_CODE (pat) == SET) >>>> + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) >>>> return find_mem_ref (SET_SRC (pat), load_mem); >>> >>> So this now falls through if it is a SET of something else than a reg. >>> Is that intentional? If so, this should be in the changelog. >>> >> >> Falling through eventually resulted in returning false, but no, wasn't >> really intentional to change the logic that way. I'll fix up both places to >> look like the following and resubmit. >> >> if (GET_CODE (pat)== SET) >> if (REG_P (SET_DEST (pat))) >> return find_mem_ref (SET_SRC (pat), load_mem); >> else >> return false; > > Well, that isn't quiet it either, no? It returns false for all stores > then? Yeah, I didn't word that clearly, meant I'd fix up both appropriately with the inner if-else logic. Here's what it looks like now. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..ad856254f52 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -18357,8 +18357,11 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) -return find_mem_ref (SET_SRC (pat), load_mem); + if (GET_CODE (pat)== SET) +if (REG_P (SET_DEST (pat))) + return find_mem_ref (SET_SRC (pat), load_mem); +else + return false; if (GET_CODE (pat) == PARALLEL) { @@ -18395,7 +18398,10 @@ is_store_insn1 (rtx pat, rtx *str_mem) return false; if (GET_CODE (pat) == SET) -return find_mem_ref (SET_DEST (pat), str_mem); +if (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat))) + return find_mem_ref (SET_DEST (pat), str_mem); +else + return false; if (GET_CODE (pat) == PARALLEL) { Unless I'm totally not understanding the point you're trying to get across to me. -Pat
Re: [PATCH, rs6000] Add additional checks when identifying load/store instructions
On 8/6/21 10:02 AM, Segher Boessenkool wrote: > On Fri, Aug 06, 2021 at 09:47:40AM -0500, Pat Haugen wrote: >> Add additional checks to verify destination[source] of a load[store] >> instruction is a register. > >> * config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a >> register. > > No colon before " (". > >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) >>if (!pat || pat == NULL_RTX) >> return false; >> >> - if (GET_CODE (pat) == SET) >> + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) >> return find_mem_ref (SET_SRC (pat), load_mem); > > So this now falls through if it is a SET of something else than a reg. > Is that intentional? If so, this should be in the changelog. > Falling through eventually resulted in returning false, but no, wasn't really intentional to change the logic that way. I'll fix up both places to look like the following and resubmit. if (GET_CODE (pat)== SET) if (REG_P (SET_DEST (pat))) return find_mem_ref (SET_SRC (pat), load_mem); else return false; -Pat
[PATCH, rs6000] Add additional checks when identifying load/store instructions
Add additional checks to verify destination[source] of a load[store] instruction is a register. Bootstrap/regtest on powerpc64le with no new regressions. Ok for master? -Pat 2021-08-06 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000.c: (is_load_insn1): Verify destination is a register. (is_store_insn1): Verify source is a register. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..1460a0d7c5c 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) return find_mem_ref (SET_SRC (pat), load_mem); if (GET_CODE (pat) == PARALLEL) @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET + && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat)))) return find_mem_ref (SET_DEST (pat), str_mem); if (GET_CODE (pat) == PARALLEL)
Re: [PATCH, rs6000] Add store fusion support for Power10
On 8/4/21 9:23 AM, Bill Schmidt wrote: > Hi Pat, > > Good stuff! Comments below. > > On 8/2/21 3:19 PM, Pat Haugen via Gcc-patches wrote: >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> index 279f00cc648..1460a0d7c5c 100644 >> --- a/gcc/config/rs6000/rs6000.c >> +++ b/gcc/config/rs6000/rs6000.c >> @@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p) >> && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) >> rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; >> >> + if (TARGET_POWER10 >> + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) >> + rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; >> + >> /* Turn off vector pair/mma options on non-power10 systems. */ >> else if (!TARGET_POWER10 && TARGET_MMA) >> { >> @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) >> if (!pat || pat == NULL_RTX) >> return false; >> >> - if (GET_CODE (pat) == SET) >> + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) >> return find_mem_ref (SET_SRC (pat), load_mem); > Looks like this is just an optimization to quickly discard stores, right? Additional verification check to make sure destination is REG, yes. This will become a separate patch. >> if (GET_CODE (pat) == PARALLEL) >> @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem) >> if (!pat || pat == NULL_RTX) >> return false; >> >> - if (GET_CODE (pat) == SET) >> + if (GET_CODE (pat) == SET >> + && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat >> return find_mem_ref (SET_DEST (pat), str_mem); > > > Similar question. Similar answer. :) > >> >> if (GET_CODE (pat) == PARALLEL) >> @@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int >> lastpos) >> return cached_can_issue_more; >> } >> >> +/* Determine if INSN is a store to memory that can be fused with a similar >> + adjacent store. */ >> + >> +static bool >> +is_fusable_store (rtx_insn *insn, rtx *str_mem) >> +{ >> + /* Exit early if not doing store fusion. 
*/ >> + if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE)) >> + return false; >> + >> + /* Insn must be a non-prefixed base+disp form store. */ >> + if (is_store_insn (insn, str_mem) >> + && get_attr_prefixed (insn) == PREFIXED_NO >> + && get_attr_update (insn) == UPDATE_NO >> + && get_attr_indexed (insn) == INDEXED_NO) >> + { >> + /* Further restictions by mode and size. */ >> + machine_mode mode = GET_MODE (*str_mem); >> + HOST_WIDE_INT size; >> + if MEM_SIZE_KNOWN_P (*str_mem) >> + size = MEM_SIZE (*str_mem); >> + else >> + return false; >> + >> + if INTEGRAL_MODE_P (mode) >> + { >> + /* Must be word or dword size. */ >> + return (size == 4 || size == 8); >> + } >> + else if FLOAT_MODE_P (mode) >> + { >> + /* Must be dword size. */ >> + return (size == 8); >> + } >> + } >> + >> + return false; >> +} >> + >> +/* Do Power10 specific reordering of the ready list. */ >> + >> +static int >> +power10_sched_reorder (rtx_insn **ready, int lastpos) >> +{ >> + int pos; >> + rtx mem1, mem2; >> + >> + /* Do store fusion during sched2 only. */ >> + if (!reload_completed) >> + return cached_can_issue_more; >> + >> + /* If the prior insn finished off a store fusion pair then simply >> + reset the counter and return, nothing more to do. */ > > > Good comments throughout, thanks! > >> + if (load_store_pendulum != 0) >> + { >> + load_store_pendulum = 0; >> + return cached_can_issue_more; >> + } >> + >> + /* Try to pair certain store insns to adjacent memory locations >> + so that the hardware will fuse them to a single operation. */ >> + if (is_fusable_store (last_scheduled_insn, )) >> + { >> + /* A fusable store was just scheduled. Scan the ready list for >> another >> + store that it can fuse with. */ >> + pos = lastpos; >> + while (pos >= 0) >> + { >> + /* GPR stores can be ascending or descending offsets, FPR/VSR stores > VSR? I don't see how that applies here. Scalar floating point store from VSX re
[PATCH, rs6000] Add store fusion support for Power10
Enable store fusion on Power10. Use the SCHED_REORDER hook to implement Power10 specific ready list reordering. As of now, pairing stores for store fusion is the only function being performed. Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for master? -Pat 2021-08-02 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add new flag. (POWERPC_MASKS): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable store fusion for Power10. (is_load_insn1): Verify destination is a register. (is_store_insn1): Verify source is a register. (is_fusable_store): New. (power10_sched_reorder): Likewise. (rs6000_sched_reorder): Do Power10 specific reordering. (rs6000_sched_reorder2): Likewise. * config/rs6000/rs6000.opt: Add new option. diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 6758296c0fd..f5812da0184 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -90,7 +90,8 @@ | OPTION_MASK_P10_FUSION_2LOGICAL \ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ -| OPTION_MASK_P10_FUSION_2ADD) +| OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE) /* Flags that need to be turned off if -mno-power9-vector. 
*/ #define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW\ @@ -143,6 +144,7 @@ | OPTION_MASK_P10_FUSION_LOGADD\ | OPTION_MASK_P10_FUSION_ADDLOG\ | OPTION_MASK_P10_FUSION_2ADD \ +| OPTION_MASK_P10_FUSION_2STORE\ | OPTION_MASK_HTM \ | OPTION_MASK_ISEL \ | OPTION_MASK_MFCRF\ diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 279f00cc648..1460a0d7c5c 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4490,6 +4490,10 @@ rs6000_option_override_internal (bool global_init_p) && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2ADD) == 0) rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2ADD; + if (TARGET_POWER10 + && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_2STORE) == 0) +rs6000_isa_flags |= OPTION_MASK_P10_FUSION_2STORE; + /* Turn off vector pair/mma options on non-power10 systems. */ else if (!TARGET_POWER10 && TARGET_MMA) { @@ -18357,7 +18361,7 @@ is_load_insn1 (rtx pat, rtx *load_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET && REG_P (SET_DEST (pat))) return find_mem_ref (SET_SRC (pat), load_mem); if (GET_CODE (pat) == PARALLEL) @@ -18394,7 +18398,8 @@ is_store_insn1 (rtx pat, rtx *str_mem) if (!pat || pat == NULL_RTX) return false; - if (GET_CODE (pat) == SET) + if (GET_CODE (pat) == SET + && (REG_P (SET_SRC (pat)) || SUBREG_P (SET_SRC (pat return find_mem_ref (SET_DEST (pat), str_mem); if (GET_CODE (pat) == PARALLEL) @@ -18859,6 +18864,96 @@ power9_sched_reorder2 (rtx_insn **ready, int lastpos) return cached_can_issue_more; } +/* Determine if INSN is a store to memory that can be fused with a similar + adjacent store. */ + +static bool +is_fusable_store (rtx_insn *insn, rtx *str_mem) +{ + /* Exit early if not doing store fusion. */ + if (!(TARGET_P10_FUSION && TARGET_P10_FUSION_2STORE)) +return false; + + /* Insn must be a non-prefixed base+disp form store. 
*/ + if (is_store_insn (insn, str_mem) + && get_attr_prefixed (insn) == PREFIXED_NO + && get_attr_update (insn) == UPDATE_NO + && get_attr_indexed (insn) == INDEXED_NO) +{ + /* Further restictions by mode and size. */ + machine_mode mode = GET_MODE (*str_mem); + HOST_WIDE_INT size; + if MEM_SIZE_KNOWN_P (*str_mem) + size = MEM_SIZE (*str_mem); + else + return false; + + if INTEGRAL_MODE_P (mode) + { + /* Must be word or dword size. */ + return (size == 4 || size == 8); + } + else if FLOAT_MODE_P (mode) + { + /* Must be dword size. */ + return (size == 8); + } +} + + return false; +} + +/* Do Power10 specific reordering of the ready list. */ + +static int +power10_sched_reorder (rtx_insn **ready, int lastpos) +{ + int pos; + rtx mem1, mem2; + + /* Do store fusion during sched2 only. */ + if (!reload_comp
PING: [RS6000] rotate and mask constants [PR94393]
Ping https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html I've done a current bootstrap/regtest on powerpc64/powerpc64le with no regressions. -Pat
Re: [PATCH, rs6000] Update Power10 scheduling description for fused instruction types
On 6/7/21 3:41 PM, Pat Haugen via Gcc-patches wrote: > Update Power10 scheduling description for new fused instruction types. > > Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for > trunk? > > -Pat > > > 2021-06-07 Pat Haugen > > gcc/ChangeLog: > > * config/rs6000/power10.md (power10-fused-load, power10-fused-store, > power10-fused_alu, power10-fused-vec, power10-fused-branch): New. > > Forgot to ask if ok to backport to GCC 11 since the new instruction types were backported. -Pat
[PATCH, rs6000] Update Power10 scheduling description for fused instruction types
Update Power10 scheduling description for new fused instruction types. Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for trunk? -Pat 2021-06-07 Pat Haugen gcc/ChangeLog: * config/rs6000/power10.md (power10-fused-load, power10-fused-store, power10-fused_alu, power10-fused-vec, power10-fused-branch): New. diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md index 665f0f22c62..0186ae95896 100644 --- a/gcc/config/rs6000/power10.md +++ b/gcc/config/rs6000/power10.md @@ -100,6 +100,11 @@ (define_insn_reservation "power10-load" 4 (eq_attr "cpu" "power10")) "DU_any_power10,LU_power10") +(define_insn_reservation "power10-fused-load" 4 + (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load") + (eq_attr "cpu" "power10")) + "DU_even_power10,LU_power10") + (define_insn_reservation "power10-prefixed-load" 4 (and (eq_attr "type" "load") (eq_attr "update" "no") @@ -176,6 +181,11 @@ (define_insn_reservation "power10-store" 0 (eq_attr "cpu" "power10")) "DU_any_power10,STU_power10") +(define_insn_reservation "power10-fused-store" 0 + (and (eq_attr "type" "fused_store_store") + (eq_attr "cpu" "power10")) + "DU_even_power10,STU_power10") + (define_insn_reservation "power10-prefixed-store" 0 (and (eq_attr "type" "store,fpstore,vecstore") (eq_attr "prefixed" "yes") @@ -244,6 +254,11 @@ (define_insn_reservation "power10-alu" 2 (define_bypass 4 "power10-alu" "power10-crlogical,power10-mfcr,power10-mfcrf") +(define_insn_reservation "power10-fused_alu" 2 + (and (eq_attr "type" "fused_arith_logical,fused_cmp_isel,fused_carry") + (eq_attr "cpu" "power10")) + "DU_even_power10,EXU_power10") + ; paddi (define_insn_reservation "power10-paddi" 2 (and (eq_attr "type" "add") @@ -403,6 +418,11 @@ (define_insn_reservation "power10-vec-2cyc" 2 (eq_attr "cpu" "power10")) "DU_any_power10,EXU_power10") +(define_insn_reservation "power10-fused-vec" 2 + (and (eq_attr "type" "fused_vector") + (eq_attr "cpu" "power10")) + 
"DU_even_power10,EXU_power10") + (define_insn_reservation "power10-veccmp" 3 (and (eq_attr "type" "veccmp") (eq_attr "cpu" "power10")) @@ -490,6 +510,11 @@ (define_insn_reservation "power10-branch" 2 (eq_attr "cpu" "power10")) "DU_any_power10,STU_power10") +(define_insn_reservation "power10-fused-branch" 3 + (and (eq_attr "type" "fused_mtbc") + (eq_attr "cpu" "power10")) + "DU_even_power10,STU_power10") + ; Crypto (define_insn_reservation "power10-crypto" 4
Re: [PATCH, rs6000] Fix alias set of link reg save MEM
On 6/2/21 9:19 AM, Segher Boessenkool wrote: > On Wed, Jun 02, 2021 at 08:23:48AM -0500, Pat Haugen wrote: >> On 6/2/21 7:01 AM, Richard Biener wrote: >>> So did you check the RTL (and alias-sets) produced by >>> __builtin_return_address? Test coverage might >>> be low here and w/o scheduling opportunities to break things. >> >> __builtin_return_address creates it's own copy of the link reg to a pseudo >> upon function entry. It doesn't appear to try and "reuse" any LR copy/save >> location that might be generated via the prolog code. References to >> __builtin_return_address will then refer to that pseudo. So I don't see any >> connection between the prolog save code and __builtin_return_address. > > That is for __builtin_return_address(0), the simple (and always working) > one. It is harder for non-zero arguments (although I don't see why > those would not work, even with inlining). Right, I realized after I sent the reply I was being a little too specific to __builtin_return_address(0). For non-zero args __builtin_return_address generates code to walk the appropriate number of stack frames back before loading the LR from the designated save area. All those mem references are using either frame-alias-set or 0 as their alias set, so we still should be fine. So regardless of the argument to the builtin, there is no connection between the current function's prologue LR save code and the code __builtin_return_address() would generate in the function. -Pat
Re: [PATCH, rs6000] Fix alias set of link reg save MEM
On 6/2/21 7:01 AM, Richard Biener wrote: > On Wed, Jun 2, 2021 at 1:15 PM Pat Haugen wrote: >> >> On 6/2/21 1:51 AM, Richard Biener wrote: >>> On Tue, Jun 1, 2021 at 10:37 PM Pat Haugen via Gcc-patches >>> wrote: >>>> >>>> Make sure link reg save MEM has frame alias set, to match other link reg >>>> save/restore code. >>>> >>>> Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for >>>> trunk? >>>> >>>> -Pat >>>> >>>> >>>> 2021-06-01 Pat Haugen >>>> >>>> gcc/ChangeLog: >>>> >>>> * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use >>>> gen_frame_store. >>>> >>>> >>>> >>>> diff --git a/gcc/config/rs6000/rs6000-logue.c >>>> b/gcc/config/rs6000/rs6000-logue.c >>>> index 13c00e740d6..07337c4836a 100644 >>>> --- a/gcc/config/rs6000/rs6000-logue.c >>>> +++ b/gcc/config/rs6000/rs6000-logue.c >>>> @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void) >>>>if (!WORLD_SAVE_P (info) && info->lr_save_p >>>>&& !cfun->machine->lr_is_wrapped_separately) >>>> { >>>> - rtx addr, reg, mem; >>>> + rtx reg; >>>> >>>>reg = gen_rtx_REG (Pmode, 0); >>>>START_USE (0); >>>> @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void) >>>>if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR >>>> | SAVE_NOINLINE_FPRS_SAVES_LR))) >>>> { >>>> - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, >>>> - GEN_INT (info->lr_save_offset + frame_off)); >>>> - mem = gen_rtx_MEM (Pmode, addr); >>>> - /* This should not be of rs6000_sr_alias_set, because of >>>> -__builtin_return_address. */ >>> >>> I can't figure what this comment meant - did you? Note the old code >>> looks like it would end up with alias-set zero and thus more conservative >>> than with using frame-alias-set so this is an optimization? >> >> No, I couldn't figure out the reasoning for the comment/code either. It's >> been that way since it was introduced in March 2000 as part of the “Merge in >> changes from newppc-branch.” patch. All other places where we save/restore >> the link reg use a MEM with frame-alias-set. 
This change is an optimization >> as you suspect in that it allows us to schedule non-aliased loads above the >> link reg store (which couldn't happen before due to use of alias-set zero). > > So did you check the RTL (and alias-sets) produced by > __builtin_return_address? Test coverage might > be low here and w/o scheduling opportunities to break things. __builtin_return_address creates it's own copy of the link reg to a pseudo upon function entry. It doesn't appear to try and "reuse" any LR copy/save location that might be generated via the prolog code. References to __builtin_return_address will then refer to that pseudo. So I don't see any connection between the prolog save code and __builtin_return_address. -Pat
Re: [PATCH, rs6000] Fix alias set of link reg save MEM
On 6/2/21 1:51 AM, Richard Biener wrote: > On Tue, Jun 1, 2021 at 10:37 PM Pat Haugen via Gcc-patches > wrote: >> >> Make sure link reg save MEM has frame alias set, to match other link reg >> save/restore code. >> >> Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for >> trunk? >> >> -Pat >> >> >> 2021-06-01 Pat Haugen >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use >> gen_frame_store. >> >> >> >> diff --git a/gcc/config/rs6000/rs6000-logue.c >> b/gcc/config/rs6000/rs6000-logue.c >> index 13c00e740d6..07337c4836a 100644 >> --- a/gcc/config/rs6000/rs6000-logue.c >> +++ b/gcc/config/rs6000/rs6000-logue.c >> @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void) >>if (!WORLD_SAVE_P (info) && info->lr_save_p >>&& !cfun->machine->lr_is_wrapped_separately) >> { >> - rtx addr, reg, mem; >> + rtx reg; >> >>reg = gen_rtx_REG (Pmode, 0); >>START_USE (0); >> @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void) >>if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR >> | SAVE_NOINLINE_FPRS_SAVES_LR))) >> { >> - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, >> - GEN_INT (info->lr_save_offset + frame_off)); >> - mem = gen_rtx_MEM (Pmode, addr); >> - /* This should not be of rs6000_sr_alias_set, because of >> -__builtin_return_address. */ > > I can't figure what this comment meant - did you? Note the old code > looks like it would end up with alias-set zero and thus more conservative > than with using frame-alias-set so this is an optimization? No, I couldn't figure out the reasoning for the comment/code either. It's been that way since it was introduced in March 2000 as part of the “Merge in changes from newppc-branch.” patch. All other places where we save/restore the link reg use a MEM with frame-alias-set. This change is an optimization as you suspect in that it allows us to schedule non-aliased loads above the link reg store (which couldn't happen before due to use of alias-set zero). -Pat
[PATCH, rs6000] Fix alias set of link reg save MEM
Make sure link reg save MEM has frame alias set, to match other link reg save/restore code. Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Ok for trunk? -Pat 2021-06-01 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Use gen_frame_store. diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c index 13c00e740d6..07337c4836a 100644 --- a/gcc/config/rs6000/rs6000-logue.c +++ b/gcc/config/rs6000/rs6000-logue.c @@ -3257,7 +3257,7 @@ rs6000_emit_prologue (void) if (!WORLD_SAVE_P (info) && info->lr_save_p && !cfun->machine->lr_is_wrapped_separately) { - rtx addr, reg, mem; + rtx reg; reg = gen_rtx_REG (Pmode, 0); START_USE (0); @@ -3267,13 +3267,8 @@ rs6000_emit_prologue (void) if (!(strategy & (SAVE_NOINLINE_GPRS_SAVES_LR | SAVE_NOINLINE_FPRS_SAVES_LR))) { - addr = gen_rtx_PLUS (Pmode, frame_reg_rtx, - GEN_INT (info->lr_save_offset + frame_off)); - mem = gen_rtx_MEM (Pmode, addr); - /* This should not be of rs6000_sr_alias_set, because of -__builtin_return_address. */ - - insn = emit_move_insn (mem, reg); + insn = emit_insn (gen_frame_store (reg, frame_reg_rtx, +info->lr_save_offset + frame_off)); rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off, NULL_RTX, NULL_RTX); END_USE (0);
Re: [PATCH, rs6000] Add ALTIVEC_REGS as pressure class
On 5/7/21 6:00 PM, Segher Boessenkool wrote: >> --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c >> +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c >> @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned >> long long y, >> /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */ >> /* { dg-final { scan-assembler-times "vslw" 1 } } */ >> /* { dg-final { scan-assembler-times "vsld" 1 } } */ >> -/* { dg-final { scan-assembler-times "xxlor" 3 } } */ >> +/* { dg-final { scan-assembler-times "xxlor" 2 } } */ >> /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */ >> /* { dg-final { scan-assembler-times "vrldnm" 2 } } */ > So what is this replaced with? Was it an "xxlmr" and it is just > unnecessary now? Different RA choice made the reg copy unnecessary. < xxspltib 0,8 < xxlor 32,0,0 --- > xxspltib 32,8 -Pat
[PATCH, rs6000] Add ALTIVEC_REGS as pressure class
Add ALTIVEC_REGS as pressure class. Code that has heavy register pressure on Altivec registers can suffer from over-aggressive scheduling during sched1, which then leads to increased register spill. This is due to the fact that registers that prefer ALTIVEC_REGS are currently assigned an allocno class of VSX_REGS. This then misleads the scheduler to think there are 64 regs available, when in reality there are only 32 Altivec regs. This patch fixes the problem by assigning an allocno class of ALTIVEC_REGS and adding ALTIVEC_REGS as a pressure class. Bootstrap/regtest on powerpc64/powerpc64le with no new regressions. Testing on CPU2017 showed no significant differences. Ok for trunk? -Pat 2021-05-07 Pat Haugen gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_ira_change_pseudo_allocno_class): Return ALTIVEC_REGS if that is best_class. (rs6000_compute_pressure_classes): Add ALTIVEC_REGS. gcc/testsuite/ChangeLog: * gcc.target/powerpc/fold-vec-insert-float-p9.c: Adjust instruction counts. * gcc.target/powerpc/vec-rlmi-rlnm.c: Likewise. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 844fee8..fee4eef 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -22487,11 +22487,14 @@ rs6000_ira_change_pseudo_allocno_class (int regno ATTRIBUTE_UNUSED, of allocno class. 
*/ if (best_class == BASE_REGS) return GENERAL_REGS; - if (TARGET_VSX - && (best_class == FLOAT_REGS || best_class == ALTIVEC_REGS)) + if (TARGET_VSX && best_class == FLOAT_REGS) return VSX_REGS; return best_class; +case VSX_REGS: + if (best_class == ALTIVEC_REGS) + return ALTIVEC_REGS; + default: break; } @@ -23609,12 +23612,12 @@ rs6000_compute_pressure_classes (enum reg_class *pressure_classes) n = 0; pressure_classes[n++] = GENERAL_REGS; + if (TARGET_ALTIVEC) +pressure_classes[n++] = ALTIVEC_REGS; if (TARGET_VSX) pressure_classes[n++] = VSX_REGS; else { - if (TARGET_ALTIVEC) - pressure_classes[n++] = ALTIVEC_REGS; if (TARGET_HARD_FLOAT) pressure_classes[n++] = FLOAT_REGS; } diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c index 1c57672..4541768 100644 --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c @@ -31,5 +31,5 @@ testf_cst (float f, vector float vf) /* { dg-final { scan-assembler-times {\mstfs\M} 2 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {\mlxv\M} 2 { target ilp32 } } } */ /* { dg-final { scan-assembler-times {\mlvewx\M} 1 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {\mvperm\M} 1 { target ilp32 } } } */ -/* { dg-final { scan-assembler-times {\mxxperm\M} 2 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mvperm\M} 2 { target ilp32 } } } */ +/* { dg-final { scan-assembler-times {\mxxperm\M} 1 { target ilp32 } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c index 1e7d739..5512c0f 100644 --- a/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c +++ b/gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c @@ -62,6 +62,6 @@ rlnm_test_2 (vector unsigned long long x, vector unsigned long long y, /* { dg-final { scan-assembler-times "vextsb2d" 1 } } */ /* { dg-final { 
scan-assembler-times "vslw" 1 } } */ /* { dg-final { scan-assembler-times "vsld" 1 } } */ -/* { dg-final { scan-assembler-times "xxlor" 3 } } */ +/* { dg-final { scan-assembler-times "xxlor" 2 } } */ /* { dg-final { scan-assembler-times "vrlwnm" 2 } } */ /* { dg-final { scan-assembler-times "vrldnm" 2 } } */
[PATCH, rs6000 V3] Update "prefix" attribute for Power10 [PR99133]
Update prefixed attribute for Power10. This patch creates a new attribute, "maybe_prefixed", which is used to mark those instructions that may have a prefixed form. The existing "prefixed" attribute is now used to mark all instructions that are in prefixed form. This patch differs from the prior version in that it doesn't modify the existing settings of the "prefixed" attribute but just adds the new attribute and sets/tests it appropriately. Bootstrap/regtest on powerpc64le (Power10) and powerpc64 (Power8 32/64) with no new regressions. Ok for trunk? -Pat 2021-03-30 Pat Haugen gcc/ PR target/99133 * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, xxeval): Mark prefixed. * config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test. * config/rs6000/rs6000.md (define_attr "maybe_prefixed"): New. (define_attr "prefixed"): Update initializer.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 27a269b9e72..21f1cc6f15b 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -826,7 +826,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -845,7 +846,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -864,7 +866,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -893,7 +896,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -921,7 +925,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -931,7 +936,8 @@ (define_insn "xxblend_" UNSPEC_XXBLEND))] "TARGET_POWER10" "xxblendv %x0,%x1,%x2,%x3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxpermx" [(set (match_operand:V2DI 0 "register_operand" "+wa") @@ -975,7 +981,8 @@ (define_insn "xxpermx_inst" 
UNSPEC_XXPERMX))] "TARGET_POWER10" "xxpermx %x0,%x1,%x2,%x3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") @@ -3623,7 +3630,8 @@ (define_insn "xxeval" UNSPEC_XXEVAL))] "TARGET_POWER10" "xxeval %0,%1,%2,%3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") +(set_attr "prefixed" "yes")]) (define_expand "vec_unpacku_hi_v16qi" [(set (match_operand:V8HI 0 "register_operand" "=v") diff --git a/gcc/config/rs6
Re: [PATCH, rs6000 V2] Update "prefix" attribute for Power10 [PR99133]
On 3/18/21 11:33 AM, will schmidt wrote: > Per this change: > > +;; Whether an insn is a prefixed insn. A prefixed instruction has a prefix > +;; instruction word that conveys additional information such as a larger > +;; immediate, additional operands, etc., in addition to the normal > instruction > +;; word. The default "length" attribute will also be adjusted by default to > +;; be 12 bytes. > +(define_attr "prefixed" "no,yes" > + (if_then_else (eq_attr "prepend_prefixed_insn" "yes") > + (const_string "yes") > + (const_string "no"))) > > > .. it looks like at least most of the users of the "prefixed" attribute have > been switched over to use "prepend_prefixed_insn" instead. Are there still > users of the "prefixed" attribute remaining ? I'm guessing so, given context, > but can't tell for certain. > > (Just a question, not a specific request for a change) Yes, there are still a couple uses of get_attr_prefixed() in rs6000.c, plus the Power10 scheduling description makes use of it. -Pat
[PATCH, rs6000 V2] Update "prefix" attribute for Power10 [PR99133]
Update prefixed attribute for Power10. This patch creates a new attribute, prepend_prefixed_insn, which is used to mark those instructions that are prefixed and need to have a 'p' prepended to their mnemonic at asm emit time. The existing "prefix" attribute is now used to mark all instructions that are in prefixed form. Bootstrap/regtest on powerpc64le (Power10) and powerpc64 (Power8 32/64) with no new regressions. Ok for trunk? -Pat 2021-03-17 Pat Haugen gcc/ PR target/99133 * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, xxeval): Mark prefixed. * config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/pcrel-opt.md: Adjust attribute name. * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Adjust test. * config/rs6000/rs6000.md (define_attr "prepend_prefixed_insn"): New. (define_attr "prefixed"): Update initializer. (*tls_gd_pcrel, *tls_ld_pcrel, tls_dtprel_, tls_tprel_, *tls_got_tprel_pcrel_, *pcrel_local_addr, *pcrel_extern_addr, stack_protect_setdi, stack_protect_testdi): Adjust attribute name. * config/rs6000/sync.md (load_quadpti, store_quadpti): Likewise.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 27a269b9e72..21f1cc6f15b 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -826,7 +826,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -845,7 +846,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -864,7 +866,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -893,7 +896,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -921,7 +925,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -931,7 +936,8 @@ (define_insn "xxblend_" UNSPEC_XXBLEND))] "TARGET_POWER10" "xxblendv %x0,%x1,%x2,%x3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "xxpermx" [(set (match_operand:V2DI 0 "register_operand" "+wa") @@ -975,7 +981,8 @@ (define_insn "xxpermx_inst" 
UNSPEC_XXPERMX))] "TARGET_POWER10" "xxpermx %x0,%x1,%x2,%x3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "yes")]) (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") @@ -3623,7 +3630,8 @@ (define_insn "xxeval" UNSPEC_XXEVAL))] "TARGET_POWER10" "xxeval %0,%1,%2,%3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsi
Re: [PATCH, rs6000] Update "size" attribute for Power10
Ping3 On 2/18/21 2:30 PM, Pat Haugen via Gcc-patches wrote: > Ping2 > > On 1/26/21 11:27 AM, Pat Haugen via Gcc-patches wrote: >> Ping >> >> On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote: >>> Update size attribute for Power10. >>> >>> >>> This patch was broken out from my larger patch to update various attributes >>> for >>> Power10, in order to make the review process hopefully easier. This patch >>> only >>> updates the size attribute for various new instructions. There were no >>> changes >>> requested to this portion of the original patch, so nothing is new here. >>> >>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >>> Ok for trunk? >>> >>> -Pat >>> >>> >>> 2020-11-08 Pat Haugen >>> >>> gcc/ >>> * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, >>> floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, >>> dfp_denbcd_, dfp_dxex_, dfp_diex_, >>> *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size >>> attribute for Power10. >>> * config/rs6000/mma.md (*movoo): Likewise. >>> * config/rs6000/rs6000.md (define_attr "size"): Add 256. >>> (define_mode_attr bits): Add DD/TD modes. >>> * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, >>> store_conditionalpti): Update size attribute for Power10. >>> >> >
Re: [PATCH, rs6000] Add Power10 scheduling description
Ping3 On 2/18/21 2:31 PM, Pat Haugen via Gcc-patches wrote: > Ping2. > > On 1/26/21 11:30 AM, Pat Haugen via Gcc-patches wrote: >> Ping. >> >> On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote: >>> Add Power10 scheduling description. >>> >>> This patch adds the Power10 scheduling description. Since power10.md was >>> pretty much a complete rewrite (existing version of power10.md is mostly >>> just a copy of power9.md), I diffed power10.md with /dev/null so that the >>> full contents of the file are shown as opposed to a diff. This should make >>> it easier to read. This patch will not apply on current trunk due to that >>> reason. >>> >>> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >>> Ok for trunk? >>> >>> -Pat >>> >>> >>> 2020-11-13 Pat Haugen >>> >>> gcc/ >>> * config/rs6000/rs6000.c (struct processor_costs): New. >>> (rs6000_option_override_internal): Set Power10 costs. >>> (rs6000_issue_rate): Set Power10 issue rate. >>> * config/rs6000/power10.md: Rewrite for Power10. >>> >> >
[PATCH, rs6000] Rename variable for clarity
Rename next_insn_prefixed_p for improved clarity. Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk? -Pat 2021-02-22 Pat Haugen gcc/ * config/rs6000/rs6000.c (next_insn_prefixed_p): Rename. (rs6000_final_prescan_insn): Adjust. (rs6000_asm_output_opcode): Likewise. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index ec068c58aa5..4e608073358 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -26191,7 +26191,7 @@ prefixed_paddi_p (rtx_insn *insn) /* Whether the next instruction needs a 'p' prefix issued before the instruction is printed out. */ -static bool next_insn_prefixed_p; +static bool prepend_p_to_next_insn; /* Define FINAL_PRESCAN_INSN if some processing needs to be done before outputting the assembler code. On the PowerPC, we remember if the current @@ -26202,7 +26202,7 @@ static bool next_insn_prefixed_p; void rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int) { - next_insn_prefixed_p = (get_attr_prefixed (insn) != PREFIXED_NO); + prepend_p_to_next_insn = (get_attr_prefixed (insn) != PREFIXED_NO); return; } @@ -26212,7 +26212,7 @@ rs6000_final_prescan_insn (rtx_insn *insn, rtx [], int) void rs6000_asm_output_opcode (FILE *stream) { - if (next_insn_prefixed_p) + if (prepend_p_to_next_insn) fprintf (stream, "p"); return;
Re: [PATCH, rs6000] Add Power10 scheduling description
Ping2. On 1/26/21 11:30 AM, Pat Haugen via Gcc-patches wrote: > Ping. > > On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote: >> Add Power10 scheduling description. >> >> This patch adds the Power10 scheduling description. Since power10.md was >> pretty much a complete rewrite (existing version of power10.md is mostly >> just a copy of power9.md), I diffed power10.md with /dev/null so that the >> full contents of the file are shown as opposed to a diff. This should make >> it easier to read. This patch will not apply on current trunk due to that >> reason. >> >> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >> Ok for trunk? >> >> -Pat >> >> >> 2020-11-13 Pat Haugen >> >> gcc/ >> * config/rs6000/rs6000.c (struct processor_costs): New. >> (rs6000_option_override_internal): Set Power10 costs. >> (rs6000_issue_rate): Set Power10 issue rate. >> * config/rs6000/power10.md: Rewrite for Power10. >> >
Re: [PATCH, rs6000] Update "prefix" attribute for Power10
Ping2. On 1/26/21 11:28 AM, Pat Haugen via Gcc-patches wrote: > Ping. > > On 12/10/20 3:32 PM, Pat Haugen via Gcc-patches wrote: >> Update prefixed attribute for Power10. >> >> >> This patch was broken out from my larger patch to update various attributes >> for >> Power10, in order to make the review process hopefully easier. This patch >> only >> updates the prefix attribute for various new instructions. Changes in this >> version include missed updates to rs6000_insn_cost and >> rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added >> additional commentary so hopefully it is clearer. >> >> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >> Ok for trunk? >> >> -Pat >> >> >> 2020-11-10 Pat Haugen >> >> gcc/ >> * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, >> xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, >> xxblend_, xxpermx_inst, xxeval): Mark prefixed "always". >> * config/rs6000/mma.md (mma_, mma_, >> mma_, mma_, mma_, mma_, >> mma_, mma_, mma_, mma_): >> Likewise. >> * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed >> insn. >> (next_insn_prefixed_p): Rename to prefix_next_insn_p. >> (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. >> (rs6000_asm_output_opcode): Adjust. >> (rs6000_adjust_insn_length): Update test for prefixed insns. >> * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always' >> and update commentary. >> >
Re: [PATCH, rs6000] Update "size" attribute for Power10
Ping2 On 1/26/21 11:27 AM, Pat Haugen via Gcc-patches wrote: > Ping > > On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote: >> Update size attribute for Power10. >> >> >> This patch was broken out from my larger patch to update various attributes >> for >> Power10, in order to make the review process hopefully easier. This patch >> only >> updates the size attribute for various new instructions. There were no >> changes >> requested to this portion of the original patch, so nothing is new here. >> >> Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. >> Ok for trunk? >> >> -Pat >> >> >> 2020-11-08 Pat Haugen >> >> gcc/ >> * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, >> floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, >> dfp_denbcd_, dfp_dxex_, dfp_diex_, >> *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size >> attribute for Power10. >> * config/rs6000/mma.md (*movoo): Likewise. >> * config/rs6000/rs6000.md (define_attr "size"): Add 256. >> (define_mode_attr bits): Add DD/TD modes. >> * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, >> store_conditionalpti): Update size attribute for Power10. >> >
Re: [PATCH, rs6000] Add Power10 scheduling description
Ping. On 11/13/20 4:04 PM, Pat Haugen via Gcc-patches wrote: > Add Power10 scheduling description. > > This patch adds the Power10 scheduling description. Since power10.md was > pretty much a complete rewrite (existing version of power10.md is mostly just > a copy of power9.md), I diffed power10.md with /dev/null so that the full > contents of the file are shown as opposed to a diff. This should make it > easier to read. This patch will not apply on current trunk due to that reason. > > Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok > for trunk? > > -Pat > > > 2020-11-13 Pat Haugen > > gcc/ > * config/rs6000/rs6000.c (struct processor_costs): New. > (rs6000_option_override_internal): Set Power10 costs. > (rs6000_issue_rate): Set Power10 issue rate. > * config/rs6000/power10.md: Rewrite for Power10. >
Re: [PATCH, rs6000] Update "prefix" attribute for Power10
Ping. On 12/10/20 3:32 PM, Pat Haugen via Gcc-patches wrote: > Update prefixed attribute for Power10. > > > This patch was broken out from my larger patch to update various attributes > for > Power10, in order to make the review process hopefully easier. This patch only > updates the prefix attribute for various new instructions. Changes in this > version include missed updates to rs6000_insn_cost and > rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added > additional commentary so hopefully it is clearer. > > Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok > for trunk? > > -Pat > > > 2020-11-10 Pat Haugen > > gcc/ > * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, > xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, > xxblend_, xxpermx_inst, xxeval): Mark prefixed "always". > * config/rs6000/mma.md (mma_, mma_, > mma_, mma_, mma_, mma_, > mma_, mma_, mma_, mma_): > Likewise. > * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed > insn. > (next_insn_prefixed_p): Rename to prefix_next_insn_p. > (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. > (rs6000_asm_output_opcode): Adjust. > (rs6000_adjust_insn_length): Update test for prefixed insns. > * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always' > and update commentary. >
Re: [PATCH, rs6000] Update "size" attribute for Power10
Ping On 12/8/20 3:46 PM, Pat Haugen via Gcc-patches wrote: > Update size attribute for Power10. > > > This patch was broken out from my larger patch to update various attributes > for > Power10, in order to make the review process hopefully easier. This patch only > updates the size attribute for various new instructions. There were no changes > requested to this portion of the original patch, so nothing is new here. > > Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok > for trunk? > > -Pat > > > 2020-11-08 Pat Haugen > > gcc/ > * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, > floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, > dfp_denbcd_, dfp_dxex_, dfp_diex_, > *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size > attribute for Power10. > * config/rs6000/mma.md (*movoo): Likewise. > * config/rs6000/rs6000.md (define_attr "size"): Add 256. > (define_mode_attr bits): Add DD/TD modes. > * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, > store_conditionalpti): Update size attribute for Power10. >
Re: [PATCH,rs6000] Fusion patterns for logical-logical
On 12/10/20 8:41 PM, acsawdey--- via Gcc-patches wrote: > + [(set_attr "type" "logical") Similar to load-cmp fusion pairs, we need a new insn type here. -Pat
Re: [PATCH,rs6000] Combine patterns for p10 load-cmpi fusion
On 12/4/20 1:19 PM, acsawdey--- via Gcc-patches wrote: > + print " [(set_attr \"type\" \"load\")\n"; We need to tag these with a new instruction type, such as 'fused-load-cmp', so the scheduler can distinguish them from normal loads. -Pat
[PATCH, rs6000] Don't set MMA prefixed instruction length to 8
Fix instruction length for MMA insns. Prefixed instructions should not have their length explicitly set to '8'. The function get_attr_length() will adjust the length appropriately based on the value of the "prefixed" attribute. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-12-14 Pat Haugen gcc/ * config/rs6000/mma.md (*movxo, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Remove explicit setting of length attribute. diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 4d291c42f7a..ef4394416b6 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -318,7 +318,7 @@ (define_insn_and_split "*movxo" DONE; } [(set_attr "type" "vecload,vecstore,veclogical") - (set_attr "length" "8,8,16") + (set_attr "length" "*,*,16") (set_attr "max_prefixed_insns" "2,2,*")]) (define_expand "mma_assemble_pair" @@ -539,8 +539,7 @@ (define_insn "mma_" MMA_VVI4I4I8))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -553,8 +552,7 @@ (define_insn "mma_" MMA_AVVI4I4I8))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -566,8 +564,7 @@ (define_insn "mma_" MMA_VVI4I4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -580,8 +577,7 @@ (define_insn "mma_" MMA_AVVI4I4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -592,8 +588,7 @@ (define_insn "mma_" MMA_VVI4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" - [(set_attr "type" 
"mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -605,8 +600,7 @@ (define_insn "mma_" MMA_AVVI4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -617,8 +611,7 @@ (define_insn "mma_" MMA_PVI4I2))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -630,8 +623,7 @@ (define_insn "mma_" MMA_APVI4I2))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -643,8 +635,7 @@ (define_insn "mma_" MMA_VVI4I4I4))] "TARGET_MMA" " %A0,%x1,%x2,%3,%4,%5" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")]) (define_insn "mma_" [(set (match_operand:XO 0 "fpr_reg_operand" "=") @@ -657,5 +648,4 @@ (define_insn "mma_" MMA_AVVI4I4I4))] "TARGET_MMA" " %A0,%x2,%x3,%4,%5,%6" - [(set_attr "type" "mma") - (set_attr "length" "8")]) + [(set_attr "type" "mma")])
[PATCH, rs6000] Update "prefix" attribute for Power10
Update prefixed attribute for Power10. This patch was broken out from my larger patch to update various attributes for Power10, in order to make the review process hopefully easier. This patch only updates the prefix attribute for various new instructions. Changes in this version include missed updates to rs6000_insn_cost and rs6000_adjust_insn_length. I stayed with the new 'always' keyword but added additional commentary so hopefully it is clearer. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-11-10 Pat Haugen gcc/ * config/rs6000/altivec.md (xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, xxeval): Mark prefixed "always". * config/rs6000/mma.md (mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/rs6000.c (rs6000_insn_cost): Update test for prefixed insn. (next_insn_prefixed_p): Rename to prefix_next_insn_p. (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. (rs6000_asm_output_opcode): Adjust. (rs6000_adjust_insn_length): Update test for prefixed insns. * config/rs6000/rs6000.md (define_attr "prefixed"): Add 'always' and update commentary.
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 6a6ce0f84ed..fc926f7a7aa 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -828,7 +828,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -847,7 +848,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -866,7 +868,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -895,7 +898,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -923,7 +927,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -933,7 +938,8 @@ (define_insn "xxblend_" UNSPEC_XXBLEND))] "TARGET_POWER10" "xxblendv %x0,%x1,%x2,%x3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "xxpermx" [(set (match_operand:V2DI 0 "register_operand" "+wa") @@ -977,7 +983,8 @@ (define_insn 
"xxpermx_inst" UNSPEC_XXPERMX))] "TARGET_POWER10" "xxpermx %x0,%x1,%x2,%x3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") + (set_attr "prefixed" "always")]) (define_expand "vstrir_" [(set (match_operand:VIshort 0 "altivec_register_operand") @@ -3625,7 +3632,8 @@ (define_insn "xxeval" UNSPEC_XXEVAL))] "TARGET_POWER10" "xxeval %0,%1,%2,%3,%4" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecsimple") +
[PATCH, rs6000] Update "size" attribute for Power10
Update size attribute for Power10. This patch was broken out from my larger patch to update various attributes for Power10, in order to make the review process hopefully easier. This patch only updates the size attribute for various new instructions. There were no changes requested to this portion of the original patch, so nothing is new here. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-11-08 Pat Haugen gcc/ * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, dfp_denbcd_, dfp_dxex_, dfp_diex_, *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Update size attribute for Power10. * config/rs6000/mma.md (*movoo): Likewise. * config/rs6000/rs6000.md (define_attr "size"): Add 256. (define_mode_attr bits): Add DD/TD modes. * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, store_conditionalpti): Update size attribute for Power10. diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md index 9a952300cd6..7562e63a919 100644 --- a/gcc/config/rs6000/dfp.md +++ b/gcc/config/rs6000/dfp.md @@ -139,7 +139,8 @@ (define_insn "extendddtd2" (float_extend:TD (match_operand:DD 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "dctqpq %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "128")]) ;; The result of drdpq is an even/odd register pair with the converted ;; value in the even register and zero in the odd register. 
@@ -153,6 +154,7 @@ (define_insn "trunctddd2" "TARGET_DFP" "drdpq %2,%1\;fmr %0,%2" [(set_attr "type" "dfp") + (set_attr "size" "128") (set_attr "length" "8")]) (define_insn "trunctdsd2" @@ -206,7 +208,8 @@ (define_insn "*cmp_internal1" (match_operand:DDTD 2 "gpc_reg_operand" "d")))] "TARGET_DFP" "dcmpu %0,%1,%2" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "floatdidd2" [(set (match_operand:DD 0 "gpc_reg_operand" "=d") @@ -220,7 +223,8 @@ (define_insn "floatditd2" (float:TD (match_operand:DI 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "dcffixq %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "128")]) ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer. ;; This is the first stage of converting it to an integer type. @@ -230,7 +234,8 @@ (define_insn "ftrunc2" (fix:DDTD (match_operand:DDTD 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "drintn. 0,%0,%1,1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) ;; Convert a decimal64/128 whose value is an integer to an actual integer. ;; This is the second stage of converting decimal float to integer type. 
@@ -240,7 +245,8 @@ (define_insn "fixdi2" (fix:DI (match_operand:DDTD 1 "gpc_reg_operand" "d")))] "TARGET_DFP" "dctfix %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) ;; Decimal builtin support @@ -262,7 +268,8 @@ (define_insn "dfp_ddedpd_" UNSPEC_DDEDPD))] "TARGET_DFP" "ddedpd %1,%0,%2" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "dfp_denbcd_" [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d") @@ -271,7 +278,8 @@ (define_insn "dfp_denbcd_" UNSPEC_DENBCD))] "TARGET_DFP" "denbcd %1,%0,%2" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "dfp_denbcd_v16qi_inst" [(set (match_operand:TD 0 "gpc_reg_operand" "=d") @@ -301,7 +309,8 @@ (define_insn "dfp_dxex_" UNSPEC_DXEX))] "TARGET_DFP" "dxex %0,%1" - [(set_attr "type" "dfp")]) + [(set_attr "type" "dfp") + (set_attr "size" "")]) (define_insn "dfp_diex_" [(set (match_ope
Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote: > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote: >> + >> +(define_insn "dives_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")] >> + UNSPEC_VDIVES))] >> + "TARGET_POWER10" >> + "vdives %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "diveu_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") >> + (match_operand:VIlong 2 "vsx_register_operand" "v")] >> +UNSPEC_VDIVEU))] >> + "TARGET_POWER10" >> + "vdiveu %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "div3" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vdivs %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "udiv3" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vdivu %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "mods_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> +(match_operand:VIlong 2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vmods %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) >> + >> +(define_insn "modu_" >> + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") >> +(umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") >> + (match_operand:VIlong 
2 "vsx_register_operand" "v")))] >> + "TARGET_POWER10" >> + "vmodu %0,%1,%2" >> + [(set_attr "type" "vecdiv") >> + (set_attr "size" "128")]) > > We should only be setting "size" "128" for instructions that operate on > scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are > either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size > 128. If you want to set the size based on mode, (set_attr "size" "") > should do the trick I believe. Well, after you update "(define_mode_attr bits" in rs6000.md for V2DI/V4SI.
Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions
On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote: > + > +(define_insn "dives_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VDIVES))] > + "TARGET_POWER10" > + "vdives %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "diveu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > + UNSPEC_VDIVEU))] > + "TARGET_POWER10" > + "vdiveu %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "div3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivs %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "udiv3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivu %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "mods_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmods %0,%1,%2" > + [(set_attr "type" "vecdiv") > + (set_attr "size" "128")]) > + > +(define_insn "modu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmodu %0,%1,%2" > + [(set_attr "type" 
"vecdiv") > + (set_attr "size" "128")]) We should only be setting "size" "128" for instructions that operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since the above insns are either V2DI/V4SI (ala VIlong mode_iterator), they shouldn't be marked as size 128. If you want to set the size based on mode, (set_attr "size" "") should do the trick I believe. -Pat
Re: [PATCH] rs6000, vector integer multiply/divide/modulo instructions
On 11/4/20 10:44 AM, Carl Love via Gcc-patches wrote: > + > +(define_insn "vdives_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VDIVES))] > + "TARGET_POWER10" > + "vdives %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vdiveu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > +(unspec: VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > + UNSPEC_VDIVEU))] > + "TARGET_POWER10" > + "vdiveu %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "div3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (div:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivs %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "udiv3" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vdivu %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vmods_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (mod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmods %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vmodu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (umod:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmodu %0,%1,%2" > + [(set_attr "type" "vecsimple")]) Since the vdiv.../vmod... 
instructions execute in the fixed point divide unit, all the above instructions should have a type of "div" instead of "vecsimple". > + > +(define_insn "vmulhs_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VMULHS))] > + "TARGET_POWER10" > + "vmulhs %0,%1,%2" > + [(set_attr "type" "vecsimple")]) > + > +(define_insn "vmulhu_" > + [(set (match_operand:VIlong 0 "vsx_register_operand" "=v") > + (unspec:VIlong [(match_operand:VIlong 1 "vsx_register_operand" "v") > + (match_operand:VIlong 2 "vsx_register_operand" "v")] > +UNSPEC_VMULHU))] > + "TARGET_POWER10" > + "vmulhu %0,%1,%2" > + [(set_attr "type" "vecsimple")])> + > +;; Vector multiply low double word > +(define_insn "mulv2di3" > + [(set (match_operand:V2DI 0 "vsx_register_operand" "=v") > + (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v") > +(match_operand:V2DI 2 "vsx_register_operand" "v")))] > + "TARGET_POWER10" > + "vmulld %0,%1,%2" > + [(set_attr "type" "vecsimple")]) Similarly, the above 3 insns should have a "mul" instruction type. -Pat
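A minimal sketch of the requested change (the mode iterator suffixes were stripped by the archive and are restored here as assumptions):

```
; Divide/modulo insns execute in the fixed-point divide unit, so they
; should carry the divide type for scheduling purposes.
(define_insn "udiv<mode>3"
  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
        (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand" "v")
                     (match_operand:VIlong 2 "vsx_register_operand" "v")))]
  "TARGET_POWER10"
  "vdivu<wd> %0,%1,%2"
  [(set_attr "type" "div")])        ; was "vecsimple"

; Multiplies likewise get a multiply type rather than "vecsimple".
(define_insn "mulv2di3"
  [(set (match_operand:V2DI 0 "vsx_register_operand" "=v")
        (mult:V2DI (match_operand:V2DI 1 "vsx_register_operand" "v")
                   (match_operand:V2DI 2 "vsx_register_operand" "v")))]
  "TARGET_POWER10"
  "vmulld %0,%1,%2"
  [(set_attr "type" "mul")])        ; was "vecsimple"
```

The point of the type change is that the scheduling descriptions (power10.md etc.) key their latency/pipeline reservations off these type attributes, so a mis-typed insn gets the wrong latency.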
Re: [PATCH, rs6000] Add Power10 scheduling description
On 11/17/20 10:31 PM, will schmidt wrote:
> On Fri, 2020-11-13 at 16:04 -0600, Pat Haugen via Gcc-patches wrote:
>> +(define_automaton "power10dsp,power10issue,power10div")
>> +
>> +; Decode/dispatch slots
>> +(define_cpu_unit "du0_power10,du1_power10,du2_power10,du3_power10,
>> +                  du4_power10,du5_power10,du6_power10,du7_power10" "power10dsp")
>> +
>> +; Four execution units
>> +(define_cpu_unit "exu0_power10,exu1_power10,exu2_power10,exu3_power10"
>> +                 "power10issue")
>> +; Two load units and two store units
>> +(define_cpu_unit "lu0_power10,lu1_power10" "power10issue")
>> +(define_cpu_unit "stu0_power10,stu1_power10" "power10issue")
>> +; Create false units for use by non-pipelined div/sqrt
>> +(define_cpu_unit "fx_div0_power10,fx_div1_power10" "power10div")
>> +(define_cpu_unit "fp_div0_power10,fp_div1_power10,fp_div2_power10,
>> +                  fp_div3_power10" "power10div")
>
> The spacing catches my eye, I'd want to add spaces around those commas,
> etc. But.. this appears to be consistent with behavior as seen in the
> existing power9.md ; power9.md ; etc.
> So it's either this way per necessity, or this way per history.
> Either way, no change requested here given that precedence.
> (If this and the older stuff also needs to be cosmetically tweaked,
> that can be handled later on..)

Yeah, just historical.

>> +; Load Unit
>> +(define_insn_reservation "power10-load" 4
>> +  (and (eq_attr "type" "load")
>> +       (eq_attr "update" "no")
>> +       (eq_attr "size" "!128")
>> +       (eq_attr "prefixed" "no")
>> +       (eq_attr "cpu" "power10"))
>> +  "DU_any_power10,LU_power10")
>> +
>> +(define_insn_reservation "power10-prefixed-load" 4
>> +  (and (eq_attr "type" "load")
>> +       (eq_attr "update" "no")
>> +       (eq_attr "size" "!128")
>> +       (eq_attr "prefixed" "!no")
>
> I'm sure there is reason, but remind me.. "!no" versus "yes" ?

From my prior patch, the prefixed attribute can now have values no/yes/always, so '!no' means 'yes' || 'always'.

>> +
>> +; DFP
>> +; Use the minimum 12 cycle latency for all insns, even though some are more
>
> ".. for all DFP insns"
> Since you specify this is a minimum, can prob drop "even though some
> are more".

ok
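As a concrete illustration of the '!no' answer above: with the three-valued attribute, one reservation can cover every prefixed form. This is a sketch only; the reservation string is copied from the non-prefixed load case for illustration and may not match the committed power10.md:

```
; "prefixed" is now no/yes/always, so "!no" matches both prefixed forms,
; i.e. insns that are prefixed because of a large displacement ("yes")
; as well as insns that only exist in prefixed form ("always").
(define_insn_reservation "power10-prefixed-load" 4
  (and (eq_attr "type" "load")
       (eq_attr "update" "no")
       (eq_attr "size" "!128")
       (eq_attr "prefixed" "!no")       ; equivalent to "yes,always"
       (eq_attr "cpu" "power10"))
  "DU_any_power10,LU_power10")          ; reservation string illustrative
```

Writing the test as "!no" keeps the reservation correct even if further attribute values are never needed: any insn that is prefixed in any way takes this path.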
[PATCH, rs6000] Add Power10 scheduling description
Add Power10 scheduling description.

This patch adds the Power10 scheduling description. Since power10.md was pretty much a complete rewrite (the existing version of power10.md is mostly just a copy of power9.md), I diffed power10.md with /dev/null so that the full contents of the file are shown as opposed to a diff. This should make it easier to read. This patch will not apply on current trunk due to that reason.

Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk?

-Pat

2020-11-13  Pat Haugen

gcc/
	* config/rs6000/rs6000.c (struct processor_costs): New.
	(rs6000_option_override_internal): Set Power10 costs.
	(rs6000_issue_rate): Set Power10 issue rate.
	* config/rs6000/power10.md: Rewrite for Power10.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4d528a39a37..85bb42d6dce 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1080,6 +1080,26 @@ struct processor_costs power9_cost = {
   COSTS_N_INSNS (3),	/* SF->DF convert */
 };
 
+/* Instruction costs on POWER10 processors.  */
+static const
+struct processor_costs power10_cost = {
+  COSTS_N_INSNS (1),	/* mulsi */
+  COSTS_N_INSNS (1),	/* mulsi_const */
+  COSTS_N_INSNS (1),	/* mulsi_const9 */
+  COSTS_N_INSNS (1),	/* muldi */
+  COSTS_N_INSNS (4),	/* divsi */
+  COSTS_N_INSNS (4),	/* divdi */
+  COSTS_N_INSNS (2),	/* fp */
+  COSTS_N_INSNS (2),	/* dmul */
+  COSTS_N_INSNS (7),	/* sdiv */
+  COSTS_N_INSNS (9),	/* ddiv */
+  128,			/* cache line size */
+  32,			/* l1 cache */
+  512,			/* l2 cache */
+  16,			/* prefetch streams */
+  COSTS_N_INSNS (2),	/* SF->DF convert */
+};
+
 /* Instruction costs on POWER A2 processors.
   */
 static const
 struct processor_costs ppca2_cost = {
@@ -4734,10 +4754,13 @@ rs6000_option_override_internal (bool global_init_p)
       break;
 
     case PROCESSOR_POWER9:
-    case PROCESSOR_POWER10:
       rs6000_cost = &power9_cost;
       break;
 
+    case PROCESSOR_POWER10:
+      rs6000_cost = &power10_cost;
+      break;
+
     case PROCESSOR_PPCA2:
       rs6000_cost = &ppca2_cost;
       break;
@@ -18001,8 +18024,9 @@ rs6000_issue_rate (void)
     case PROCESSOR_POWER8:
       return 7;
     case PROCESSOR_POWER9:
-    case PROCESSOR_POWER10:
       return 6;
+    case PROCESSOR_POWER10:
+      return 8;
     default:
       return 1;
     }
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
new file mode 100644
index 000..f9ca4cbf10e
--- /dev/null
+++ b/gcc/config/rs6000/power10.md
@@ -0,0 +1,553 @@
+;; Scheduling description for the IBM POWER10 processor.
+;; Copyright (C) 2020-2020 Free Software Foundation, Inc.
+;;
+;; Contributed by Pat Haugen (pthau...@us.ibm.com).
+
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+; For Power10 we model (and try to pack) the in-order decode/dispatch groups
+; which consist of 8 instructions max.  We do not try to model the details of
+; the out-of-order issue queues and how insns flow to the various execution
+; units except for the simple representation of the issue limitation of at
+; most 4 insns to the execution units/2 insns to the load units/2 insns to
+; the store units.
+(define_automaton "power10dsp,power10issue,power10div") + +; Decode/dispatch slots +(define_cpu_unit "du0_power10,du1_power10,du2_power10,du3_power10, + du4_power10,du5_power10,du6_power10,du7_power10" "power10dsp") + +; Four execution units +(define_cpu_unit "exu0_power10,exu1_power10,exu2_power10,exu3_power10" +"power10issue") +; Two load units and two store units +(define_cpu_unit "lu0_power10,lu1_power10" "power10issue") +(define_cpu_unit "stu0_power10,stu1_power10" "power10issue") +; Create false units for use by non-pipelined div/sqrt +(define_cpu_unit "fx_div0_power10,fx_div1_power10" "power10div") +(define_cpu_unit "fp_div0_power10,fp_div1_power10,fp_div2_power10, + fp_div3_power10" "power10div") + + +; Dispatch slots are allocated in order conforming to program order. +(absence_set "du0_power10" "du1_power10,du2_power10,du3_power10,du4_power10,\ +
Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions
On 11/12/20 10:05 AM, Jeff Law wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns
>> whether they are a fusion candidate. It is called while removing the
>> forward dependencies of the just scheduled insn. If a dependent insn
>> becomes available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It looks fairly reasonable to me. Do you plan on trying to take this
> forward at all?

Due to other requirements for which this approach did not work, we are pursuing a different approach of creating combine patterns for the target.

-Pat
Re: [PATCH, rs6000] Update instruction attributes for Power10
On 11/5/20 4:32 PM, will schmidt wrote: > On Wed, 2020-11-04 at 14:42 -0600, Pat Haugen via Gcc-patches wrote: >> * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for >> PREFIXED_YES. > > The code change reads as roughly > - next_insn_prefixed_p != PREFIXED_NO > > + next_insn_prefixed_p == PREFIXED_YES" > > So just an inversion of the logic? I don't obviously see the 'p' impact > there. > It's no longer an inversion of the logic since I added a PREFIXED_ALWAYS value. 'next_insn_prefixed' is used by rs6000_final_prescan_insn() to determine whether an insn mnemonic needs a 'p' prefix. We want it set for PREFIXED_YES, but not for PREFIXED_NO or PREFIXED_ALWAYS. > >> * config/rs6000/rs6000.md (define_attr "size"): Add 256. >> (define_attr "prefixed"): Add 'always'. >> (define_mode_attr bits): Add DD/TD modes. >> (cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg, >> bswapdi2_brd, setbc_signed_, >> *setbcr_signed_, *setnbc_signed_, >> *setnbcr_signed_): Update instruction attributes for >> Power10. > > ok. (assuming the assorted 'integer' -> 'crypto' changes are correct, > of course). > Yes, crypto represents the correct pipe the insns are executed on. Thanks for the review, Pat
[PATCH, rs6000] Update instruction attributes for Power10
Update instruction attributes for Power10. This patch updates the type/prefixed/dot/size attributes for various new instructions (and a couple existing that were incorrect) in preparation for the Power10 scheduling patch that will be following. Bootstrap/regtest on powerpc64le (Power8/Power10) with no new regressions. Ok for trunk? -Pat 2020-11-04 Pat Haugen gcc/ * config/rs6000/altivec.md (vsdb_, xxspltiw_v4si, xxspltiw_v4sf_inst, xxspltidp_v2df_inst, xxsplti32dx_v4si_inst, xxsplti32dx_v4sf_inst, xxblend_, xxpermx_inst, vstrir_code_, vstrir_p_code_, vstril_code_, vstril_p_code_, altivec_lvsl_reg, altivec_lvsl_direct, altivec_lvsr_reg, altivec_lvsr_direct, xxeval, vcfuged, vclzdm, vctzdm, vpdepd, vpextd, vgnb, vclrlb, vclrrb): Update instruction attributes for Power10. * config/rs6000/dfp.md (extendddtd2, trunctddd2, *cmp_internal1, floatditd2, ftrunc2, fixdi2, dfp_ddedpd_, dfp_denbcd_, dfp_dxex_, dfp_diex_, *dfp_sgnfcnc_, dfp_dscli_, dfp_dscri_): Likewise. * config/rs6000/mma.md (*movpoi, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_, mma_): Likewise. * config/rs6000/rs6000.c (rs6000_final_prescan_insn): Only add 'p' for PREFIXED_YES. * config/rs6000/rs6000.md (define_attr "size"): Add 256. (define_attr "prefixed"): Add 'always'. (define_mode_attr bits): Add DD/TD modes. (cfuged, cntlzdm, cnttzdm, pdepd, pextd, bswaphi2_reg, bswapsi2_reg, bswapdi2_brd, setbc_signed_, *setbcr_signed_, *setnbc_signed_, *setnbcr_signed_): Update instruction attributes for Power10. * config/rs6000/sync.md (load_quadpti, store_quadpti, load_lockedpti, store_conditionalpti): Update instruction attributes for Power10. * config/rs6000/vsx.md (*xvtlsbb_internal, xxgenpcvm__internal, vextractl_internal, vextractr_internal, vinsertvl_internal_, vinsertvr_internal_, vinsertgl_internal_, vinsertgr_internal_, vreplace_elt__inst): Likewise. 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 0a2e634d6b0..76191ba4107 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -819,7 +819,7 @@ (define_insn "vsdb_" VSHIFT_DBL_LR))] "TARGET_POWER10" "vsdbi %0,%1,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm")]) (define_insn "xxspltiw_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -827,7 +827,8 @@ (define_insn "xxspltiw_v4si" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxspltiw_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -846,7 +847,8 @@ (define_insn "xxspltiw_v4sf_inst" UNSPEC_XXSPLTIW))] "TARGET_POWER10" "xxspltiw %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxspltidp_v2df" [(set (match_operand:V2DF 0 "register_operand" ) @@ -865,7 +867,8 @@ (define_insn "xxspltidp_v2df_inst" UNSPEC_XXSPLTID))] "TARGET_POWER10" "xxspltidp %x0,%1" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") @@ -894,7 +897,8 @@ (define_insn "xxsplti32dx_v4si_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_expand "xxsplti32dx_v4sf" [(set (match_operand:V4SF 0 "register_operand" "=wa") @@ -922,7 +926,8 @@ (define_insn "xxsplti32dx_v4sf_inst" UNSPEC_XXSPLTI32DX))] "TARGET_POWER10" "xxsplti32dx %x0,%2,%3" - [(set_attr "type" "vecsimple")]) + [(set_attr "type" "vecperm") + (set_attr "prefixed" "always")]) (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") @@ -932,7 +937,8 @@ (define_insn "xx
Re: [PATCH 1/2, rs6000] int128 sign extention instructions (partial prereq)
On 9/24/20 10:59 AM, will schmidt via Gcc-patches wrote: > +;; Move DI value from GPR to TI mode in VSX register, word 1. > +(define_insn "mtvsrdd_diti_w1" > + [(set (match_operand:TI 0 "register_operand" "=wa") > + (unspec:TI [(match_operand:DI 1 "register_operand" "r")] > +UNSPEC_MTVSRD_DITI_W1))] > + "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" > + "mtvsrdd %x0,0,%1" > + [(set_attr "type" "vecsimple")]) "vecmove" (since I just updated the other uses). > + > +;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg > +(define_insn "extendditi2_vector" > + [(set (match_operand:TI 0 "gpc_reg_operand" "=v") > +(unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")] > + UNSPEC_EXTENDDITI2))] > + "TARGET_POWER10" > + "vextsd2q %0,%1" > + [(set_attr "type" "exts")]) "vecexts". > + > +(define_expand "extendditi2" > + [(set (match_operand:TI 0 "gpc_reg_operand") > +(sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))] > + "TARGET_POWER10" > + { > +/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits */ > +rtx temp = gen_reg_rtx (TImode); > +emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1])); > +emit_insn (gen_extendditi2_vector (operands[0], temp)); > +DONE; > + } > + [(set_attr "type" "exts")]) Don't need "type" attr on define_expand since the type will come from the 2 individual insns emitted. Thanks, Pat
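Putting the three review comments together, the revised patterns would look roughly like the sketch below. This is a hedged reconstruction, not the committed version; the operand patterns are taken from the quoted patch as-is:

```
;; Direct moves between GPRs and VSRs are "vecmove", not "vecsimple".
(define_insn "mtvsrdd_diti_w1"
  [(set (match_operand:TI 0 "register_operand" "=wa")
        (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
                   UNSPEC_MTVSRD_DITI_W1))]
  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
  "mtvsrdd %x0,0,%1"
  [(set_attr "type" "vecmove")])

;; Vector sign extension is "vecexts", not the GPR "exts" type.
(define_insn "extendditi2_vector"
  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
        (unspec:TI [(match_operand:TI 1 "gpc_reg_operand" "v")]
                   UNSPEC_EXTENDDITI2))]
  "TARGET_POWER10"
  "vextsd2q %0,%1"
  [(set_attr "type" "vecexts")])

;; The expander needs no "type" attribute; each emitted insn carries
;; its own type for scheduling.
(define_expand "extendditi2"
  [(set (match_operand:TI 0 "gpc_reg_operand")
        (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
  "TARGET_POWER10"
{
  /* Move 64-bit src from GPR to vector reg and sign extend to 128 bits.  */
  rtx temp = gen_reg_rtx (TImode);
  emit_insn (gen_mtvsrdd_diti_w1 (temp, operands[1]));
  emit_insn (gen_extendditi2_vector (operands[0], temp));
  DONE;
})
```

The underlying rule is the one repeated throughout this thread: "type" attributes drive the per-processor scheduling descriptions, so they must name the pipeline the insn actually executes in, and define_expands that only emit other insns should not carry one at all.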
[PATCH] rs6000: Rename mffgpr/mftgpr instruction types
The following is mostly a mechanical change to rename the mffgpr/mftgpr insn types to mtvsr/mfvsr to be more clear. It also removes Power6 references to those insn types since we no longer generate those instructions. Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk? -Pat 2020-09-11 Pat Haugen gcc/ * config/rs6000/power10.md (power10-mffgpr, power10-mftgpr): Rename to mtvsr/mfvsr. * config/rs6000/power6.md (X2F_power6, power6-mftgpr, power6-mffgpr): Remove. * config/rs6000/power8.md (power8-mffgpr, power8-mftgpr): Rename to mtvsr/mfvsr. * config/rs6000/power9.md (power8-mffgpr, power8-mftgpr): Likewise. * config/rs6000/rs6000.c (rs6000_adjust_cost): Remove Power6 TYPE_MFFGPR cases. * config/rs6000/rs6000.md (mffgpr, mftgpr, zero_extendsi2, extendsi2, @signbit2_dm, lfiwax, lfiwzx, *movsi_internal1, movsi_from_sf, *movdi_from_sf_zero_ext, *mov_internal, movsd_hardfloat, movsf_from_si, *mov_hardfloat64, p8_mtvsrwz, p8_mtvsrd_df, p8_mtvsrd_sf, p8_mfvsrd_3_, *movdi_internal64, unpack_dm): Rename mffgpr/mftgpr to mtvsr/mfvsr. * config/rs6000/vsx.md (vsx_mov_64bit, vsx_extract_, vsx_extract_si, *vsx_extract__p8): Likewise. 
diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md index 9f8a5825744..2b4d882e8df 100644 --- a/gcc/config/rs6000/power10.md +++ b/gcc/config/rs6000/power10.md @@ -468,13 +468,13 @@ (define_insn_reservation "power10-qpmul" 24 (eq_attr "cpu" "power10")) "DU_super_power10,dfu_power10*8") -(define_insn_reservation "power10-mffgpr" 2 - (and (eq_attr "type" "mffgpr") +(define_insn_reservation "power10-mtvsr" 2 + (and (eq_attr "type" "mtvsr") (eq_attr "cpu" "power10")) "DU_slice_3_power10,VSU_power10") -(define_insn_reservation "power10-mftgpr" 2 - (and (eq_attr "type" "mftgpr") +(define_insn_reservation "power10-mfvsr" 2 + (and (eq_attr "type" "mfvsr") (eq_attr "cpu" "power10")) "DU_slice_3_power10,VSU_power10") diff --git a/gcc/config/rs6000/power6.md b/gcc/config/rs6000/power6.md index a94ce52425f..e2e7582e67c 100644 --- a/gcc/config/rs6000/power6.md +++ b/gcc/config/rs6000/power6.md @@ -56,10 +56,6 @@ (define_reservation "LX2_power6" (define_reservation "FX2_power6" "iu1_power6+iu2_power6") -(define_reservation "X2F_power6" -"(iu1_power6+iu2_power6+fpu1_power6)\ -|(iu1_power6+iu2_power6+fpu2_power6)") - (define_reservation "BX2_power6" "iu1_power6+iu2_power6+bpu_power6") @@ -605,20 +601,3 @@ (define_bypass 6 "power6-vecperm" "power6-veccomplex" ) (define_bypass 5 "power6-vecperm" "power6-vecstore" ) -(define_insn_reservation "power6-mftgpr" 8 - (and (eq_attr "type" "mftgpr") - (eq_attr "cpu" "power6")) - "X2F_power6") - -(define_insn_reservation "power6-mffgpr" 14 - (and (eq_attr "type" "mffgpr") - (eq_attr "cpu" "power6")) - "LX2_power6") - -(define_bypass 4 "power6-mftgpr" "power6-imul,\ - power6-lmul,\ - power6-imul-cmp,\ - power6-lmul-cmp,\ - power6-imul3,\ - power6-idiv,\ - power6-ldiv" ) diff --git a/gcc/config/rs6000/power8.md b/gcc/config/rs6000/power8.md index fae2ad8bf57..a3f46c62be2 100644 --- a/gcc/config/rs6000/power8.md +++ b/gcc/config/rs6000/power8.md @@ -379,13 +379,13 @@ (define_insn_reservation "power8-vecdiv" 31 (eq_attr "cpu" 
"power8")) "DU_any_power8,VSU_power8") -(define_insn_reservation "power8-mffgpr" 5 - (and (eq_attr "type" "mffgpr") +(define_insn_reservation "power8-mtvsr" 5 + (and (eq_attr "type" "mtvsr") (eq_attr "cpu" "power8")) "DU_any_power8,VSU_power8") -(define_insn_reservation "power8-mftgpr" 6 - (and (eq_attr "type" "mftgpr") +(define_insn_reservation "power8-mfvsr" 6 + (and (eq_attr "type" "mfvsr") (eq_attr "cpu" "power8")) "DU_any_power8,VSU_power8") diff --git a/gcc/config/rs6000/power9.md b/gcc/config/rs6000/power9.md index 2277b145308..c86d643a7d3 100644 --- a/gcc/config/rs6000/power9.md +++ b/gcc/config/rs6000/po
Re: [PATCH] rs6000: Fix instruction type
On 9/9/20 4:41 PM, Segher Boessenkool wrote: > Hi! > > On Wed, Sep 09, 2020 at 04:14:37PM -0500, Pat Haugen wrote: >> I noticed that some of the VSR<->GPR move instructions are not typed >> correctly. This patch fixes those instructions so that the scheduler >> treats them with the correct latency. > >> --- a/gcc/config/rs6000/rs6000.md >> +++ b/gcc/config/rs6000/rs6000.md >> @@ -5483,7 +5483,7 @@ (define_insn "lfiwzx" >> lxsiwzx %x0,%y1 >> mtvsrwz %x0,%1 >> xxextractuw %x0,%x1,4" >> - [(set_attr "type" "fpload,fpload,mftgpr,vecexts") >> + [(set_attr "type" "fpload,fpload,mffgpr,vecexts") >> (set_attr "isa" "*,p8v,p8v,p9v")]) > > Can we rename mftgpr/mffgpr globally? Maybe even as mfvsr and mtvsr, > because that is what is actually modeled here? Such names will make it > much harder to get confused and use the wrong type, too :-) > Those types were originally created for the mffgpr/mftgpr Power6 instructions. But since it appears we no longer generate those insns I totally agree with doing a global change as you suggest to make things clearer. Would you like that as a separate patch or is it fine to include in this one? >> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >> index 54da54c43dc..3a5cf896da8 100644 >> --- a/gcc/config/rs6000/vsx.md >> +++ b/gcc/config/rs6000/vsx.md >> @@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_" >>else >> gcc_unreachable (); >> } >> - [(set_attr "type" "vecperm")]) >> + [(set_attr "type" "vecperm,vecmove")]) > > mtvsrdd is a mtvsr, sorry, mffgpr just the same? It isn't vecmove? > >> @@ -4440,7 +4440,7 @@ (define_insn "vsx_splat__reg" >>"@ >> xxpermdi %x0,%x1,%x1,0 >> mtvsrdd %x0,%1,%1" >> - [(set_attr "type" "vecperm")]) >> + [(set_attr "type" "vecperm,vecmove")]) > > Same here. mtvsrdd dispatches as a vector op, so requires a super-slice. As opposed to the others which just require a single execution slice for Power9. -Pat
[PATCH] rs6000: Fix instruction type
I noticed that some of the VSR<->GPR move instructions are not typed
correctly. This patch fixes those instructions so that the scheduler
treats them with the correct latency.

Bootstrap/regtest on powerpc64le with no new regressions. Also ran a
CPU2017 benchmark comparison on Power9 with no major differences (a
couple minor improvements and no degradations). Ok for trunk?

-Pat

2020-09-09  Pat Haugen

gcc/
	* gcc/config/rs6000/rs6000.md (lfiwzx, floatunssi2_lfiwzx,
	p8_mtvsrwz, p8_mtvsrd_sf): Fix insn type.
	* gcc/config/rs6000/vsx.md (vsx_concat_, vsx_splat__reg,
	vsx_splat_v4sf): Likewise.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620ae1c0..f902c864c26 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5483,7 +5483,7 @@ (define_insn "lfiwzx"
    lxsiwzx %x0,%y1
    mtvsrwz %x0,%1
    xxextractuw %x0,%x1,4"
-  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")
+  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")
   (set_attr "isa" "*,p8v,p8v,p9v")])

 (define_insn_and_split "floatunssi2_lfiwzx"

@@ -7634,7 +7634,7 @@ (define_insn_and_split "movsf_from_si"
	      *, 12,*, *")
   (set_attr "type" "load, fpload,fpload,fpload,store, fpstore,
-	      fpstore,vecfloat, mffgpr,*")
+	      fpstore,vecfloat, mftgpr,*")
   (set_attr "isa" "*, *, p9v, p8v, *, *,
	      p8v,p8v, p8v, *")])

@@ -8711,7 +8711,7 @@ (define_insn "p8_mtvsrwz"
	UNSPEC_P8V_MTVSRWZ))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrwz %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_fpr_from_gpr"
   [(set (match_operand:FMOVE64X 0 "register_operand" "=d")

@@ -8810,7 +8810,7 @@ (define_insn "p8_mtvsrd_sf"
	UNSPEC_P8V_MTVSRD))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrd %x0,%1"
-  [(set_attr "type" "mftgpr")])
+  [(set_attr "type" "mffgpr")])

 (define_insn_and_split "reload_vsx_from_gprsf"
   [(set (match_operand:SF 0 "register_operand" "=wa")

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 54da54c43dc..3a5cf896da8 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2885,7 +2885,7 @@ (define_insn "vsx_concat_"
   else
     gcc_unreachable ();
 }
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.

@@ -4440,7 +4440,7 @@ (define_insn "vsx_splat__reg"
   "@
    xxpermdi %x0,%x1,%x1,0
    mtvsrdd %x0,%1,%1"
-  [(set_attr "type" "vecperm")])
+  [(set_attr "type" "vecperm,vecmove")])

 (define_insn "vsx_splat__mem"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")

@@ -4493,7 +4493,7 @@ (define_insn_and_split "vsx_splat_v4sf"
   (unspec:V4SF [(match_dup 0) (const_int 0)]
	       UNSPEC_VSX_XXSPLTW))]
   ""
-  [(set_attr "type" "vecload,vecperm,mftgpr")
+  [(set_attr "type" "vecload,vecperm,vecperm")
   (set_attr "length" "*,8,*")
   (set_attr "isa" "*,p8v,*")])
[PATCH] rs6000: Disable -fcaller-saves by default
Disable -fcaller-saves by default.

This patch turns off -fcaller-saves by default so that IRA doesn't try
to use volatile regs for pseudos that are live across a call, which
would then require LRA to save/restore the reg around the call.

Bootstrap/regtest on powerpc64le with no new regressions. Also ran a
CPU2017 benchmark comparison with no major differences (a few minor
improvements and one minor degradation). Ok for trunk?

-Pat

2020-07-23  Pat Haugen

gcc/
	* common/config/rs6000/rs6000-common.c
	(rs6000_option_optimization_table): Turn off -fcaller-saves.

gcc/testsuite/
	* gcc.target/powerpc/caller-saves.c: New.

diff --git a/gcc/common/config/rs6000/rs6000-common.c b/gcc/common/config/rs6000/rs6000-common.c
index ee37b9dc90b..6d68834aed2 100644
--- a/gcc/common/config/rs6000/rs6000-common.c
+++ b/gcc/common/config/rs6000/rs6000-common.c
@@ -43,8 +43,13 @@ static const struct default_options rs6000_option_optimization_table[] =
        on rs6000, turn it off by default.  */
     { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

+    /* Don't allow IRA to use volatile regs across function calls.  */
+    { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
+
     /* Double growth factor to counter reduced min jump length.  */
     { OPT_LEVELS_ALL, OPT__param_max_grow_copy_bb_insns_, NULL, 16 },
+
+    /* Following marks the end of the list and needs to remain last.  */
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };

diff --git a/gcc/testsuite/gcc.target/powerpc/caller-saves.c b/gcc/testsuite/gcc.target/powerpc/caller-saves.c
new file mode 100644
index 000..846356f16f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/caller-saves.c
@@ -0,0 +1,9 @@
+/* { dg-do compile */
+/* { dg-options "-O2 -fverbose-asm" } */
+
+void
+foo ()
+{
+}
+
+/* { dg-final { scan-assembler-not "fcaller-saves" } } */
Re: [RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions
On 4/8/20 2:46 AM, Richard Biener wrote:
>> I have coded up a proof of concept that implements our needs via a new
>> target hook. The hook is passed a pair of dependent insns and returns if
>> they are a fusion candidate. It is called while removing the forward
>> dependencies of the just scheduled insn. If a dependent insn becomes
>> available to schedule and it's a fusion candidate with the just
>> scheduled insn, then the new code moves it to the ready list (if
>> necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
>> code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
>> candidate will be scheduled next. Following is the scheduling part of
>> the diff. Does this sound like a feasible approach? I welcome any
>> comments/discussion.
> It would be nice test if the i386 use of TARGET_SCHED_MACRO_FUSION hooks
> for compare + branch fusion can be transitioned to that scheme. I think your
> proposal doesn't improve things there since we can never schedule
> branches earlier but we could schedule the compare later. But your
> proposal only affects scheduling of later insns, not the order of insns
> getting onto the ready list?

Yes, your understanding is correct, my approach only affects later
dependent insns. The first insn of the fusion pair will be scheduled
according to existing priority based scheduling.

-Pat
[RFC, Instruction Scheduler, Stage1] New hook/code to perform fusion of dependent instructions
The Power processor has the ability to fuse certain pairs of dependent
instructions to improve their performance if they appear back-to-back in
the instruction stream. In looking at the current support for
instruction fusion in GCC I saw the following 2 options.

1) TARGET_SCHED_MACRO_FUSION target hooks: Only looks at existing
back-to-back instructions and will ensure the scheduler keeps them
together.

2) -fsched-fusion/TARGET_SCHED_FUSION_PRIORITY: Runs as a separate
scheduling pass before peephole2. Operates independently on a single
insn. Used by the ARM backend to assign higher priorities to base/disp
loads and stores so that the scheduling pass will schedule loads/stores
to adjacent memory back-to-back. Later these insns will be transformed
into load/store pair insns.

Neither of these work for Power's purpose because they don't deal with
fusion of dependent insns that may not already be back-to-back. The
TARGET_SCHED_REORDER[2] hooks also don't work since the dependent insn
more than likely gets queued for N cycles so wouldn't be on the ready
list for the reorder hooks to process. We want the ability for the
scheduler to schedule dependent insn pairs back-to-back when possible
(i.e. other dependencies of both insns have been satisfied).

I have coded up a proof of concept that implements our needs via a new
target hook. The hook is passed a pair of dependent insns and returns if
they are a fusion candidate. It is called while removing the forward
dependencies of the just scheduled insn. If a dependent insn becomes
available to schedule and it's a fusion candidate with the just
scheduled insn, then the new code moves it to the ready list (if
necessary) and marks it as SCHED_GROUP (piggy-backing on the existing
code used by TARGET_SCHED_MACRO_FUSION) to make sure the fusion
candidate will be scheduled next. Following is the scheduling part of
the diff. Does this sound like a feasible approach? I welcome any
comments/discussion.
Thanks,
Pat

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 80687fb5359..7a62136d497 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -4152,6 +4152,39 @@ schedule_insn (rtx_insn *insn)
	  && SCHED_GROUP_P (next)
	  && advance < effective_cost)
	advance = effective_cost;
+
+      /* If all dependencies of this insn have now been resolved and the
+	 just scheduled insn was not part of a SCHED_GROUP, check if
+	 this dependent insn can be fused with the just scheduled insn.  */
+      else if (effective_cost >= 0 && !SCHED_GROUP_P (insn)
+	       && targetm.sched.dep_fusion
+	       && targetm.sched.dep_fusion (insn, next))
+	{
+	  /* Move to ready list if necessary.  */
+	  if (effective_cost > 0)
+	    {
+	      queue_remove (next);
+	      ready_add (&ready, next, true);
+	    }
+
+	  /* Mark as sched_group.  */
+	  SCHED_GROUP_P (next) = 1;
+
+	  /* Fix insn_tick.  */
+	  INSN_TICK (next) = INSN_TICK (insn);
+
+	  /* Dump some debug output for success.  */
+	  if (sched_verbose >= 5)
+	    {
+	      fprintf (sched_dump, ";;\t\tFusing dependent insns: ");
+	      fprintf (sched_dump, "%4d %-30s --> ", INSN_UID (insn),
+		       str_pattern_slim (PATTERN (insn)));
+	      fprintf (sched_dump, "%4d %-30s\n", INSN_UID (next),
+		       str_pattern_slim (PATTERN (next)));
+	    }
+	}
     }
   else
     /* Check always has only one forward dependence (to the first insn in
Re: [PATCH] correct COUNT and PROB for unrolled loop
On 2/3/20 2:17 AM, Jiufu Guo wrote:
> +/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 937042044" 1
> +"loop2_unroll"} } */

Sorry I didn't catch this addition to the original testcase earlier, but
I wonder how stable this test is going to be. If there are future
changes to default count/probability, or changes in their
representation, this may fail and need to be updated. The fact that the
loop is still getting aligned is the main concern.

-Pat
[PATCH] rs6000: Refactor scheduling hook code
The following patch factors out some common code to its own function and
also moves the Power6 specific code to a new function.

Bootstrap/regtest on powerpc64le with no regressions. Ok for trunk?

-Pat

2019-11-18  Pat Haugen

	* config/rs6000/rs6000.c (move_to_end_of_ready): New, factored out
	from common code.
	(power6_sched_reorder2): Factored out from rs6000_sched_reorder2,
	call new function.
	(power9_sched_reorder2): Call new function.
	(rs6000_sched_reorder2): Likewise.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 278306)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -17711,14 +17711,216 @@ get_next_active_insn (rtx_insn *insn, rt
   return insn;
 }

+/* Move instruction at POS to the end of the READY list.  */
+
+static void
+move_to_end_of_ready (rtx_insn **ready, int pos, int lastpos)
+{
+  rtx_insn *tmp;
+  int i;
+
+  tmp = ready[pos];
+  for (i = pos; i < lastpos; i++)
+    ready[i] = ready[i + 1];
+  ready[lastpos] = tmp;
+}
+
+/* Do Power6 specific sched_reorder2 reordering of ready list.  */
+
+static int
+power6_sched_reorder2 (rtx_insn **ready, int lastpos)
+{
+  /* For Power6, we need to handle some special cases to try and keep the
+     store queue from overflowing and triggering expensive flushes.
+
+     This code monitors how load and store instructions are being issued
+     and skews the ready list one way or the other to increase the likelihood
+     that a desired instruction is issued at the proper time.
+
+     A couple of things are done.  First, we maintain a "load_store_pendulum"
+     to track the current state of load/store issue.
+
+       - If the pendulum is at zero, then no loads or stores have been
+	 issued in the current cycle so we do nothing.
+
+       - If the pendulum is 1, then a single load has been issued in this
+	 cycle and we attempt to locate another load in the ready list to
+	 issue with it.
+
+       - If the pendulum is -2, then two stores have already been
+	 issued in this cycle, so we increase the priority of the first load
+	 in the ready list to increase its likelihood of being chosen first
+	 in the next cycle.
+
+       - If the pendulum is -1, then a single store has been issued in this
+	 cycle and we attempt to locate another store in the ready list to
+	 issue with it, preferring a store to an adjacent memory location to
+	 facilitate store pairing in the store queue.
+
+       - If the pendulum is 2, then two loads have already been
+	 issued in this cycle, so we increase the priority of the first store
+	 in the ready list to increase its likelihood of being chosen first
+	 in the next cycle.
+
+       - If the pendulum < -2 or > 2, then do nothing.
+
+     Note: This code covers the most common scenarios.  There exist non
+	   load/store instructions which make use of the LSU and which
+	   would need to be accounted for to strictly model the behavior
+	   of the machine.  Those instructions are currently unaccounted
+	   for to help minimize compile time overhead of this code.  */
+  int pos;
+  rtx load_mem, str_mem;
+
+  if (is_store_insn (last_scheduled_insn, &str_mem))
+    /* Issuing a store, swing the load_store_pendulum to the left */
+    load_store_pendulum--;
+  else if (is_load_insn (last_scheduled_insn, &load_mem))
+    /* Issuing a load, swing the load_store_pendulum to the right */
+    load_store_pendulum++;
+  else
+    return cached_can_issue_more;
+
+  /* If the pendulum is balanced, or there is only one instruction on
+     the ready list, then all is well, so return.  */
+  if ((load_store_pendulum == 0) || (lastpos <= 0))
+    return cached_can_issue_more;
+
+  if (load_store_pendulum == 1)
+    {
+      /* A load has been issued in this cycle.  Scan the ready list
+	 for another load to issue with it */
+      pos = lastpos;
+
+      while (pos >= 0)
+	{
+	  if (is_load_insn (ready[pos], &load_mem))
+	    {
+	      /* Found a load.  Move it to the head of the ready list,
+		 and adjust its priority so that it is more likely to
+		 stay there */
+	      move_to_end_of_ready (ready, pos, lastpos);
+
+	      if (!sel_sched_p ()
+		  && INSN_PRIORITY_KNOWN (ready[lastpos]))
+		INSN_PRIORITY (ready[lastpos])++;
+	      break;
+	    }
+	  pos--;
+	}
+    }
+  else if (load_store_pendulum == -2)
+    {
+      /* Two stores have been issued in this cycle.  Increase the
+	 priority of the first load in the ready list to favor it for
+	 issuing in the next cycle.  */
+      pos = lastpos;
+
+      while (pos >= 0)
+	{
+	  if (is_load_insn (ready[pos], &load_mem)
+	      && !sel_sched_p ()
+
[PATCH rs6000] Fix PR target/84369: gcc.dg/sms-10.c fails on Power9
As pointed out in the PR, the test is failing because a store->load
dependency is reporting zero cost. Fixed by leaving existing costs as is
(i.e. cost for update forms), and just adding a simple bypass for
store->load dependencies.

Bootstrap/regtest on powerpc64le (Power9) with no new regressions and
testcase now passing. Also ran cpu2006/cpu2017 benchmark comparisons
with no notable differences. Ok for trunk?

-Pat

2019-04-15  Pat Haugen

	PR target/84369
	* config/rs6000/power9.md: Add store forwarding bypass.

Index: gcc/config/rs6000/power9.md
===
--- gcc/config/rs6000/power9.md	(revision 270261)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -236,6 +236,9 @@ (define_insn_reservation "power9-vecstor
	(eq_attr "cpu" "power9"))
  "DU_super_power9,LSU_pair_power9")

+; Store forwarding latency is 6
+(define_bypass 6 "power9-*store*" "power9-*load*")
+
 (define_insn_reservation "power9-larx" 4
   (and (eq_attr "type" "load_l")
	(eq_attr "cpu" "power9"))
[PATCH, testsuite] Enable vect_usad_char effective target for PowerPC and fix up SAD_EXPR testcases
Power9 added support for V16QImode SAD operations. While making the
check_effective_target change I noticed that the tests will also pass on
Power7/Power8 even though they don't have the optab support. The reason
is the tests are only checking that the source pattern is recognized,
not that a SAD_EXPR was actually generated. So I've updated the tests
also.

Ok for trunk?

-Pat

testsuite/ChangeLog:
2019-02-19  Pat Haugen

	* lib/target-supports.exp (check_effective_target_vect_usad_char):
	Add PowerPC support.
	* gcc.dg/vect/slp-reduc-sad.c: Update scan string.
	* gcc.dg/vect/vect-reduc-sad.c: Likewise.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp	(revision 268784)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -5982,7 +5982,9 @@ proc check_effective_target_vect_usad_ch
	expr { [istarget i?86-*-*] || [istarget x86_64-*-*]
	       || ([istarget aarch64*-*-*]
-		   && ![check_effective_target_aarch64_sve])}}]
+		   && ![check_effective_target_aarch64_sve])
+	       || ([istarget powerpc*-*-*]
+		   && [check_p9vector_hw_available])}}]
 }

 # Return 1 if the target plus current options supports both signed

Index: gcc/testsuite/gcc.dg/vect/slp-reduc-sad.c
===
--- gcc/testsuite/gcc.dg/vect/slp-reduc-sad.c	(revision 268784)
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-sad.c	(working copy)
@@ -58,6 +58,6 @@ main ()
   return 0;
 }

-/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump "sad pattern recognized" "vect" } } */
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

Index: gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c
===
--- gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c	(revision 268784)
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c	(working copy)
@@ -49,6 +49,6 @@ main (void)
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "sad pattern recognized" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
[PATCH doc] Remove documentation for PowerPC -maltivec={be,le}
The options were removed in May 2018 (r260109), but documentation was
not updated.

Bootstrap on powerpc64le. Ok for trunk?

-Pat

2019-02-12  Pat Haugen

	* doc/invoke.texi (RS/6000 and PowerPC Options): Remove duplicate
	-maltivec. Delete -maltivec=be and -maltivec=le documentation.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 268784)
+++ gcc/doc/invoke.texi	(working copy)
@@ -1085,7 +1085,7 @@ See RS/6000 and PowerPC Options.
 -mstrict-align -mno-strict-align -mrelocatable @gol
 -mno-relocatable -mrelocatable-lib -mno-relocatable-lib @gol
 -mtoc -mno-toc -mlittle -mlittle-endian -mbig -mbig-endian @gol
--mdynamic-no-pic -maltivec -mswdiv -msingle-pic-base @gol
+-mdynamic-no-pic -mswdiv -msingle-pic-base @gol
 -mprioritize-restricted-insns=@var{priority} @gol
 -msched-costly-dep=@var{dependence_type} @gol
 -minsert-sched-nops=@var{scheme} @gol
@@ -24088,40 +24088,14 @@ the AltiVec instruction set.  You may al
 @option{-mabi=altivec} to adjust the current ABI with AltiVec ABI
 enhancements.

-When @option{-maltivec} is used, rather than @option{-maltivec=le} or
-@option{-maltivec=be}, the element order for AltiVec intrinsics such
-as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert}
+When @option{-maltivec} is used, the element order for AltiVec intrinsics
+such as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert}
 match array element order corresponding to the endianness of the
 target.  That is, element zero identifies the leftmost element in a
 vector register when targeting a big-endian platform, and identifies
 the rightmost element in a vector register when targeting a
 little-endian platform.

-@item -maltivec=be
-@opindex maltivec=be
-Generate AltiVec instructions using big-endian element order,
-regardless of whether the target is big- or little-endian.  This is
-the default when targeting a big-endian platform.  Using this option
-is currently deprecated.  Support for this feature will be removed in
-GCC 9.
-
-The element order is used to interpret element numbers in AltiVec
-intrinsics such as @code{vec_splat}, @code{vec_extract}, and
-@code{vec_insert}.  By default, these match array element order
-corresponding to the endianness for the target.
-
-@item -maltivec=le
-@opindex maltivec=le
-Generate AltiVec instructions using little-endian element order,
-regardless of whether the target is big- or little-endian.  This is
-the default when targeting a little-endian platform.  This option is
-currently ignored when targeting a big-endian platform.
-
-The element order is used to interpret element numbers in AltiVec
-intrinsics such as @code{vec_splat}, @code{vec_extract}, and
-@code{vec_insert}.  By default, these match array element order
-corresponding to the endianness for the target.
-
 @item -mvrsave
 @itemx -mno-vrsave
 @opindex mvrsave
Re: Add new --param knobs for inliner
On 1/5/19 11:58 AM, Jan Hubicka wrote:
> @@ -791,7 +791,7 @@ want_inline_small_function_p (struct cgr
>    ipa_hints hints = estimate_edge_hints (e);
>    int big_speedup = -1; /* compute this lazily */
>
> -  if (growth <= 0)
> +  if (growth <= PARAM_VALUE (PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE)))

Extra PARAM_VALUE here.

-Pat
[PATCH] Fix PR68212, unrolled loop no longer aligned
The following patch fixes the case where unrolling in the absence of
profile information can cause a loop to no longer look hot and therefore
not get aligned. In this case, instead of dividing by the unroll factor
we now just scale by profile_probability::likely (). The diff looks
worse than what really changed, just the addition of an if test and
putting the existing 'scale_main' setting code into the else leg.

Bootstrap/regtest on powerpc64le with no new regressions. I also ran a
CPU2006 comparison, there were 4 benchmarks that improved 2-3% with the
others in the noise range. Ok for trunk?

-Pat

2018-11-28  Pat Haugen

	PR rtl-optimization/68212
	* cfgloopmanip.c (duplicate_loop_to_header_edge): Adjust scale
	factor.

testsuite/ChangeLog:
2018-11-28  Pat Haugen

	PR rtl-optimization/68212
	* gcc.dg/pr68212.c: New test.

Index: gcc/cfgloopmanip.c
===
--- gcc/cfgloopmanip.c	(revision 266522)
+++ gcc/cfgloopmanip.c	(working copy)
@@ -1242,16 +1242,25 @@ duplicate_loop_to_header_edge (struct lo
	  profile_probability prob_pass_main = bitmap_bit_p (wont_exit, 0)
					? prob_pass_wont_exit
					: prob_pass_thru;
-	  profile_probability p = prob_pass_main;
-	  profile_count scale_main_den = count_in;
-	  for (i = 0; i < ndupl; i++)
+
+	  /* If not using real profile data then don't scale the loop by ndupl.
+	     This can lead to the loop no longer looking hot wrt surrounding
+	     code.  */
+	  if (profile_status_for_fn (cfun) == PROFILE_GUESSED)
+	    scale_main = profile_probability::likely ();
+	  else
	    {
-	      scale_main_den += count_in.apply_probability (p);
-	      p = p * scale_step[i];
+	      profile_probability p = prob_pass_main;
+	      profile_count scale_main_den = count_in;
+	      for (i = 0; i < ndupl; i++)
+		{
+		  scale_main_den += count_in.apply_probability (p);
+		  p = p * scale_step[i];
+		}
+	      /* If original loop is executed COUNT_IN times, the unrolled
+		 loop will account SCALE_MAIN_DEN times.  */
+	      scale_main = count_in.probability_in (scale_main_den);
	    }
-	  /* If original loop is executed COUNT_IN times, the unrolled
-	     loop will account SCALE_MAIN_DEN times.  */
-	  scale_main = count_in.probability_in (scale_main_den);
	  scale_act = scale_main * prob_pass_main;
	}
      else

Index: gcc/testsuite/gcc.dg/pr68212.c
===
--- gcc/testsuite/gcc.dg/pr68212.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/pr68212.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+    a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/20/18 10:53 AM, Kyrill Tkachov wrote:
> On 20/11/18 16:48, Pat Haugen wrote:
>> On 11/19/18 2:30 PM, Pat Haugen wrote:
>>>> This is a follow-up from
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
>>>> This version introduces an "artificial" property of the dependencies
>>>> produced in sched-deps.c that is recorded when they are created due to
>>>> MAX_PENDING_LIST_LENGTH and they are thus ignored in the
>>>> model_analyze_insns ALAP calculation.
>>>>
>>>> This approach gives most of the benefits of the original patch [1] on
>>>> aarch64. I tried it on the cactusADM hot function
>>>> (bench_staggeredleapfrog2_) on powerpc64le-unknown-linux-gnu
>>>> with -O3 and found that the initial version proposed did indeed increase
>>>> the instruction count and stack space. This version gives a small
>>>> improvement on powerpc in terms of instruction count
>>>> (number of st* instructions stays the same), so I'm hoping this version
>>>> addresses Pat's concerns.
>>>> Pat, could you please try this version out if you've got the chance?
>>>>
>>> I tried the new version on cactusADM, it's showing a 2% degradation. I've
>>> kicked off a full CPU2006 run just to see if any others are affected.
>> The other benchmarks were neutral. So the only benchmark showing a change is
>> the 2% degradation on cactusADM. Comparing the generated .s files for
>> bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns
>> and still the 1% increase in store insns.
>
> Sigh :(
> What options are you compiling with? I tried a powerpc64le compiler with
> plain -O3 and got a slight improvement (by manual inspection)

I was using the following: -O3 -mcpu=power8 -fpeel-loops -funroll-loops
-ffast-math -mpopcntd -mrecip=all. When I run with just -O3 -mcpu=power8
I see just under a 1% degradation.

-Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/19/18 2:30 PM, Pat Haugen wrote:
>> This is a follow-up from
>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
>> This version introduces an "artificial" property of the dependencies
>> produced in sched-deps.c that is recorded when they are created due to
>> MAX_PENDING_LIST_LENGTH and they are thus ignored in the
>> model_analyze_insns ALAP calculation.
>>
>> This approach gives most of the benefits of the original patch [1] on
>> aarch64. I tried it on the cactusADM hot function
>> (bench_staggeredleapfrog2_) on powerpc64le-unknown-linux-gnu
>> with -O3 and found that the initial version proposed did indeed increase the
>> instruction count and stack space. This version gives a small improvement
>> on powerpc in terms of instruction count
>> (number of st* instructions stays the same), so I'm hoping this version
>> addresses Pat's concerns.
>> Pat, could you please try this version out if you've got the chance?
>>
> I tried the new version on cactusADM, it's showing a 2% degradation. I've
> kicked off a full CPU2006 run just to see if any others are affected.

The other benchmarks were neutral. So the only benchmark showing a
change is the 2% degradation on cactusADM. Comparing the generated .s
files for bench_staggeredleapfrog2_(), there is about a 0.7% increase in
load insns and still the 1% increase in store insns.

-Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/19/18 11:54 AM, Kyrill Tkachov wrote:
> On 16/11/18 18:19, Pat Haugen wrote:
>> On 11/8/18 6:10 AM, Kyrill Tkachov wrote:
>>> The attached patch avoids that by making the alap calculation only
>>> look at true dependencies. This shouldn't be too bad, since we use
>>> INSN_PRIORITY as the final tie-breaker than that does take
>>> anti-dependencies into account.
>>>
>>> This reduces the number of spills in the hot function from 436.cactusADM
>>> by 14% on aarch64 at -O3 (and the number of instructions in general).
>>> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
>>> Thanks to Wilco for the benchmarking.
>> I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL
>> algorithm. For CPU2006 only cactusADM had a noticeable difference, but I'm
>> seeing a 5% degradation. Looking at the generated asm for function
>> bench_staggeredleapfrog2_(), I see about a 1% increase in number of loads
>> and stores generated and an extra 100 bytes allocated on the stack.
>>
>> -Pat
>>
>
> This is a follow-up from
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
> This version introduces an "artificial" property of the dependencies produced
> in sched-deps.c that is recorded when they are created due to
> MAX_PENDING_LIST_LENGTH and they are thus ignored in the
> model_analyze_insns ALAP calculation.
>
> This approach gives most of the benefits of the original patch [1] on aarch64.
> I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on
> powerpc64le-unknown-linux-gnu
> with -O3 and found that the initial version proposed did indeed increase the
> instruction count and stack space. This version gives a small improvement
> on powerpc in terms of instruction count
> (number of st* instructions stays the same), so I'm hoping this version
> addresses Pat's concerns.
> Pat, could you please try this version out if you've got the chance?
>

I tried the new version on cactusADM, it's showing a 2% degradation.
I've kicked off a full CPU2006 run just to see if any others are
affected.

-Pat
Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL
On 11/8/18 6:10 AM, Kyrill Tkachov wrote:
> The attached patch avoids that by making the alap calculation only
> look at true dependencies. This shouldn't be too bad, since we use
> INSN_PRIORITY as the final tie-breaker than that does take
> anti-dependencies into account.
>
> This reduces the number of spills in the hot function from 436.cactusADM
> by 14% on aarch64 at -O3 (and the number of instructions in general).
> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
> Thanks to Wilco for the benchmarking.

I tried the patch on PowerPC since it also uses SCHED_PRESSURE_MODEL
algorithm. For CPU2006 only cactusADM had a noticeable difference, but
I'm seeing a 5% degradation. Looking at the generated asm for function
bench_staggeredleapfrog2_(), I see about a 1% increase in number of
loads and stores generated and an extra 100 bytes allocated on the
stack.

-Pat
[PATCH] Fix typo in try_peel_loop
I observed the odd looking code while looking into something else and
confirmed prior behavior (before r248863) was to sum the counts of the
non-latch predecessors of the loop (in order to scale the loop such that
it no longer appears to iterate).

Bootstrap/regtest on powerpc64le with no regressions. Committed as
obvious.

-Pat

2018-10-31  Pat Haugen

	* tree-ssa-loop-ivcanon.c (try_peel_loop): Fix typo and remove
	dead initialization.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c	(revision 264821)
+++ gcc/tree-ssa-loop-ivcanon.c	(working copy)
@@ -1139,10 +1139,10 @@ try_peel_loop (struct loop *loop,
     if (e->src != loop->latch)
       {
	if (e->src->count.initialized_p ())
-	  entry_count = e->src->count + e->src->count;
+	  entry_count += e->src->count;
	gcc_assert (!flow_bb_inside_loop_p (loop, e->src));
       }
-  profile_probability p = profile_probability::very_unlikely ();
+  profile_probability p;
   p = entry_count.probability_in (loop->header->count);
   scale_loop_profile (loop, p, 0);
   bitmap_set_bit (peeled_loops, loop->num);
[PATCH rs6000] Fix PR86612
Probably an obvious patch but... The testcase fails because it looks
like recent glibc headers (2.27 at least) no longer contain a
declaration for __strdup, which results in warning messages being
generated and failure for excess errors. Fixed by calling the standard
name. Verified the testcase now passes, ok for trunk?

-Pat

testsuite/ChangeLog:
2018-07-26  Pat Haugen

	PR target/86612
	* gcc.target/powerpc/pr58673-2.c: Call strdup.

Index: testsuite/gcc.target/powerpc/pr58673-2.c
===
--- testsuite/gcc.target/powerpc/pr58673-2.c	(revision 262974)
+++ testsuite/gcc.target/powerpc/pr58673-2.c	(working copy)
@@ -140,7 +140,7 @@ pr_ff (t_coupl_rec * tcr, real time, t_i
		   malloc (__len);
		   __retval;}
-		  )): __strdup (eoNames[i])));
+		  )): strdup (eoNames[i])));
	  raleg[j++] =
	    (__extension__ (__builtin_constant_p (buf)
@@ -165,7 +165,7 @@ pr_ff (t_coupl_rec * tcr, real time, t_i
		   malloc (__len);
		   __retval;}
-		  )): __strdup (buf)));
+		  )): strdup (buf)));
	}
     }
   if (tcr->nLJ)
Re: [PATCH 1/4] Clean up of new format of -falign-FOO.
testsuite/gcc.target/powerpc/loop_align.c fails with this patch. It just needs a simple tweak to the scan-assembler line since we're no longer generating the ",,31" portion on the .p2align. -Pat
Re: [PATCH rs6000] Fix PR85698
On 5/17/18 1:57 PM, Segher Boessenkool wrote:
> On Thu, May 17, 2018 at 07:58:20PM +0200, Richard Biener wrote:
>> On May 17, 2018 6:04:36 PM GMT+02:00, Segher Boessenkool
>> <seg...@kernel.crashing.org> wrote:
>>> On Thu, May 17, 2018 at 10:42:46AM -0500, Pat Haugen wrote:
>>>> The following patch fixes a problem that resulted in incorrect code
>>>> generation for the CPU2017 benchmark 525.x264_r. The fix correctly
>>>> checks the "dest" operand, which is the memory operand.
>>>>
>>>> Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
>>>> regressions. Ok for trunk?
>>>
>>> Okay. Thanks!
>>
>> Don't forget the branch.
>
> It's okay for both 7 and 8, too.

Fix has been backported to both 7 and 8 branches, along with a fix to the
GCC 8 branch testcase gcc.target/powerpc/vec-setup-be-long.c to remove the
XFAIL that was pre-approved off list.

-Pat
[PATCH rs6000] Fix PR85698
The following patch fixes a problem that resulted in incorrect code
generation for the CPU2017 benchmark 525.x264_r. The fix correctly checks
the "dest" operand, which is the memory operand.

Bootstrap/regtest on powerpc64le and powerpc64 (-m32/-m64) with no new
regressions. Ok for trunk?

-Pat

2018-05-17  Pat Haugen  <pthau...@us.ibm.com>
	    Segher Boessenkool  <seg...@kernel.crashing.org>

	PR target/85698
	* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest
	operand.

testsuite/ChangeLog:
2018-05-17  Pat Haugen  <pthau...@us.ibm.com>

	PR target/85698
	* gcc.target/powerpc/pr85698.c: New test.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 260267)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -20234,7 +20234,7 @@ rs6000_output_move_128bit (rtx operands[
     }
   else if (TARGET_ALTIVEC && src_vmx_p
-	   && altivec_indexed_or_indirect_operand (src, mode))
+	   && altivec_indexed_or_indirect_operand (dest, mode))
     return "stvx %1,%y0";

   else if (TARGET_VSX && src_vsx_p)
Index: gcc/testsuite/gcc.target/powerpc/pr85698.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr85698.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr85698.c	(working copy)
@@ -0,0 +1,79 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-O3 -mcpu=power7" } */
+
+/* PR85698: Incorrect code generated on LE due to use of stxvw4x.  */
+
+typedef unsigned char uint8_t;
+typedef short int16_t;
+extern void abort (void);
+extern int memcmp(const void *, const void *, __SIZE_TYPE__);
+
+uint8_t expected[128] =
+{14, 0, 4, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
+ 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 28, 35, 33, 35, 36, 37, 38, 39, 40,
+ 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 66, 63, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
+ 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 97, 96,
+ 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
+ 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127};
+
+static uint8_t x264_clip_uint8( int x )
+{
+  return x&(~255) ? (-x)>>31 : x;
+}
+
+void add4x4_idct( uint8_t *p_dst, int16_t dct[16])
+{
+  int16_t d[16];
+  int16_t tmp[16];
+  int i, y, x;
+
+  for( i = 0; i < 4; i++ )
+    {
+      int s02 = dct[0*4+i] + dct[2*4+i];
+      int d02 = dct[0*4+i] - dct[2*4+i];
+      int s13 = dct[1*4+i] + (dct[3*4+i]>>1);
+      int d13 = (dct[1*4+i]>>1) - dct[3*4+i];
+      tmp[i*4+0] = s02 + s13;
+      tmp[i*4+1] = d02 + d13;
+      tmp[i*4+2] = d02 - d13;
+      tmp[i*4+3] = s02 - s13;
+    }
+
+  for( i = 0; i < 4; i++ )
+    {
+      int s02 = tmp[0*4+i] + tmp[2*4+i];
+      int d02 = tmp[0*4+i] - tmp[2*4+i];
+      int s13 = tmp[1*4+i] + (tmp[3*4+i]>>1);
+      int d13 = (tmp[1*4+i]>>1) - tmp[3*4+i];
+      d[0*4+i] = ( s02 + s13 + 32 ) >> 6;
+      d[1*4+i] = ( d02 + d13 + 32 ) >> 6;
+      d[2*4+i] = ( d02 - d13 + 32 ) >> 6;
+      d[3*4+i] = ( s02 - s13 + 32 ) >> 6;
+    }
+
+  for( y = 0; y < 4; y++ )
+    {
+      for( x = 0; x < 4; x++ )
+	p_dst[x] = x264_clip_uint8( p_dst[x] + d[y*4+x] );
+      p_dst += 32;
+    }
+}
+
+int main()
+{
+  uint8_t dst[128];
+  int16_t dct[16];
+  int i;
+
+  for (i = 0; i < 16; i++)
+    dct[i] = i*10 + i;
+  for (i = 0; i < 128; i++)
+    dst[i] = i;
+
+  add4x4_idct(dst, dct);
+
+  if (memcmp (dst, expected, 128))
+    abort();
+
+  return 0;
+}
[Patch, rs6000] Fix register values in ppc-asm.h
The following patch fixes a couple typos in ppc-asm.h. Committed as obvious.
Will also backport to GCC 6/7 branches.

-Pat

2017-11-21  Pat Haugen  <pthau...@us.ibm.com>

	* config/rs6000/ppc-asm.h (f50, vs50): Fix values.

Index: gcc/config/rs6000/ppc-asm.h
===================================================================
--- gcc/config/rs6000/ppc-asm.h	(revision 255022)
+++ gcc/config/rs6000/ppc-asm.h	(working copy)
@@ -120,7 +120,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define f47	47
 #define f48	48
 #define f49	49
-#define f50	30
+#define f50	50
 #define f51	51
 #define f52	52
 #define f53	53
@@ -222,7 +222,7 @@ see the files COPYING3 and COPYING.RUNTI
 #define vs47	47
 #define vs48	48
 #define vs49	49
-#define vs50	30
+#define vs50	50
 #define vs51	51
 #define vs52	52
 #define vs53	53
Re: [PATCH, rs6000] Correct some Power9 scheduling info
On 09/27/2017 12:56 PM, Pat Haugen wrote:
> The following patch corrects some Power9 resource requirements and
> instruction latencies. Bootstrap/regtest on powerpc64le-linux with no
> new regressions. Ok for trunk?

Updated patch follows. Bootstrap/regtest on powerpc64le-linux (Power9) with
no regressions. Ok for trunk?

-Pat

2017-11-15  Pat Haugen  <pthau...@us.ibm.com>

	* rs6000/power9.md (power9fpdiv): New automaton and cpu_unit
	defined for it.
	(DU_C2_3_power9): Correct reservation combinations.
	(FP_DIV_power9, VEC_DIV_power9): New.
	(power9-alu): Split out rotate/shift...
	(power9-rot): ...to here, correct dispatch resource.
	(power9-cracked-alu, power9-mul, power9-mul-compare): Correct
	dispatch resource.
	(power9-fp): Correct latency.
	(power9-sdiv): Add div/sqrt resource.
	(power9-ddiv): Correct latency, add div/sqrt resource.
	(power9-sqrt, power9-dsqrt): Add div/sqrt resource.
	(power9-vecfdiv, power9-vecdiv): Correct latency, add div/sqrt
	resource.
	(power9-qpdiv, power9-qpmul): Adjust resource usage.

Index: gcc/config/rs6000/power9.md
===================================================================
--- gcc/config/rs6000/power9.md	(revision 254708)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -19,7 +19,7 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.

-(define_automaton "power9dsp,power9lsu,power9vsu,power9misc")
+(define_automaton "power9dsp,power9lsu,power9vsu,power9fpdiv,power9misc")

 (define_cpu_unit "lsu0_power9,lsu1_power9,lsu2_power9,lsu3_power9" "power9lsu")
 (define_cpu_unit "vsu0_power9,vsu1_power9,vsu2_power9,vsu3_power9" "power9vsu")
@@ -28,7 +28,11 @@
 ; Two fixed point divide units, not pipelined
 (define_cpu_unit "fx_div0_power9,fx_div1_power9" "power9misc")
 (define_cpu_unit "bru_power9,cryptu_power9,dfu_power9" "power9misc")

+; Create a false unit for use by non-pipelined FP div/sqrt
+(define_cpu_unit "fp_div0_power9,fp_div1_power9,fp_div2_power9,fp_div3_power9"
+		 "power9fpdiv")
+
 (define_cpu_unit "x0_power9,x1_power9,xa0_power9,xa1_power9,
		  x2_power9,x3_power9,xb0_power9,xb1_power9,
		  br0_power9,br1_power9" "power9dsp")
@@ -79,8 +83,7 @@
 ; 2-way cracked plus 3rd slot
 (define_reservation "DU_C2_3_power9"
		    "x0_power9+x1_power9+xa0_power9|
-		     x1_power9+x2_power9+xa0_power9|
-		     x1_power9+x2_power9+xb0_power9|
+		     x1_power9+x2_power9+xa1_power9|
		     x2_power9+x3_power9+xb0_power9")

 ; 3-way cracked (consumes whole decode/dispatch cycle)
@@ -108,7 +111,19 @@
 (define_reservation "VSU_PRM_power9" "prm0_power9|prm1_power9")

+; Define the reservation to be used by FP div/sqrt which allows other insns
+; to be issued to the VSU, but blocks other div/sqrt for a number of cycles.
+; Note that the number of cycles blocked varies depending on insn, but we
+; just use the same number for all in order to keep the number of DFA states
+; reasonable.
+(define_reservation "FP_DIV_power9"
+		    "fp_div0_power9*8|fp_div1_power9*8|fp_div2_power9*8|
+		     fp_div3_power9*8")
+(define_reservation "VEC_DIV_power9"
+		    "fp_div0_power9*8+fp_div1_power9*8|
+		     fp_div2_power9*8+fp_div3_power9*8")
+
 ; LS Unit
 (define_insn_reservation "power9-load" 4
   (and (eq_attr "type" "load")
@@ -243,9 +258,7 @@
 ; Most ALU insns are simple 2 cycle, including record form
 (define_insn_reservation "power9-alu" 2
-  (and (ior (eq_attr "type" "add,exts,integer,logical,isel")
-	    (and (eq_attr "type" "insert,shift")
-		 (eq_attr "dot" "no")))
+  (and (eq_attr "type" "add,exts,integer,logical,isel")
       (eq_attr "cpu" "power9"))
   "DU_any_power9,VSU_power9")
 ; 5 cycle CR latency
@@ -252,12 +265,19 @@
 (define_bypass 5 "power9-alu"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")

+; Rotate/shift prevent use of third slot
+(define_insn_reservation "power9-rot" 2
+  (and (eq_attr "type" "insert,shift")
+       (eq_attr "dot" "no")
+       (eq_attr "cpu" "power9"))
+  "DU_slice_3_power9,VSU_power9")
+
 ; Record form rotate/shift are cracked
 (define_insn_reservation "power9-cracked-alu" 2
   (and (eq_attr "type" "insert,shift")
       (eq_attr "dot" "yes")
       (eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 7 cycle CR latency
 (define_bypass 7 "p
Re: [PATCH, rs6000] Fix scheduling description for quad-precision multiply instructions
On 11/08/2017 11:30 AM, Segher Boessenkool wrote:
>> --- gcc/config/rs6000/power9.md	(revision 254377)
>> +++ gcc/config/rs6000/power9.md	(working copy)
>> @@ -436,6 +436,12 @@ (define_insn_reservation "power9-qpdiv"
>>    (eq_attr "cpu" "power9"))
>>    "DU_super_power9,dfu_power9")
>>
>> +(define_insn_reservation "power9-qpmul" 24
>> +  (and (eq_attr "type" "qmul")
>> +       (eq_attr "size" "128")
>> +       (eq_attr "cpu" "power9"))
>> +  "DU_super_power9,dfu_power9*12")
>
> All other p9 reservations (other than integer div) do not reserve a
> unit more than one cycle. Will this blow up size of the automaton?
> If not, should the other qp ops not reserve for more cycles, too?

You are observant. :) I should have just included the change for qp divide
in this patch since it's so simple, but didn't. The qp divide also blocks
the pipe for a number of cycles, so yes, it should be changed. The other qp
insns (add/sub/etc.) do not, so are correct as is.

> Rest looks fine; okay for trunk with the reservation thing taken care
> of one way or the other.

Thanks! Following is the additional change I'll include, along with an
updated ChangeLog. Testing was fine.

@@ -434,8 +434,14 @@
   (and (eq_attr "type" "vecdiv")
       (eq_attr "size" "128")
       (eq_attr "cpu" "power9"))
-  "DU_super_power9,dfu_power9")
+  "DU_super_power9,dfu_power9*44")
[PATCH, rs6000] Fix scheduling description for quad-precision multiply instructions
The following patch creates a new insn type to annotate quad-precision
multiply instructions, updates the appropriate insns to use the new type,
and creates an entry in the Power9 machine description which describes the
correct latency/resources. Bootstrap/regtest on powerpc64le-linux with no
new regressions. Ok for trunk?

-Pat

2017-11-08  Pat Haugen  <pthau...@us.ibm.com>

	* rs6000/power9.md (power9-qpmul): New.
	* rs6000/rs6000.md ("type" attr): Add qmul.
	(mul<mode>3, fma<mode>4_hw, *fms<mode>4_hw, *nfma<mode>4_hw,
	*nfms<mode>4_hw, mul<mode>3_odd, fma<mode>4_odd, *fms<mode>4_odd,
	*nfma<mode>4_odd, *nfms<mode>4_odd): Change type to qmul.

Index: gcc/config/rs6000/power9.md
===================================================================
--- gcc/config/rs6000/power9.md	(revision 254377)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -436,6 +436,12 @@ (define_insn_reservation "power9-qpdiv"
   (eq_attr "cpu" "power9"))
   "DU_super_power9,dfu_power9")

+(define_insn_reservation "power9-qpmul" 24
+  (and (eq_attr "type" "qmul")
+       (eq_attr "size" "128")
+       (eq_attr "cpu" "power9"))
+  "DU_super_power9,dfu_power9*12")
+
 (define_insn_reservation "power9-mffgpr" 2
   (and (eq_attr "type" "mffgpr")
       (eq_attr "cpu" "power9"))
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 254377)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -182,7 +182,7 @@ (define_attr "type"
   cmp,
   branch,jmpreg,mfjmpr,mtjmpr,trap,isync,sync,load_l,store_c,
   cr_logical,delayed_cr,mfcr,mfcrf,mtcr,
-   fpcompare,fp,fpsimple,dmul,sdiv,ddiv,ssqrt,dsqrt,
+   fpcompare,fp,fpsimple,dmul,qmul,sdiv,ddiv,ssqrt,dsqrt,
   vecsimple,veccomplex,vecdiv,veccmp,veccmpsimple,vecperm,
   vecfloat,vecfdiv,vecdouble,mffgpr,mftgpr,crypto,
   veclogical,veccmpfx,vecexts,vecmove,
@@ -14230,7 +14230,7 @@ (define_insn "mul<mode>3"
	 (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmulqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "div<mode>3"
@@ -14332,7 +14332,7 @@ (define_insn "fma<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmaddqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*fms<mode>4_hw"
@@ -14344,7 +14344,7 @@ (define_insn "*fms<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmsubqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfma<mode>4_hw"
@@ -14356,7 +14356,7 @@ (define_insn "*nfma<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmaddqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfms<mode>4_hw"
@@ -14369,7 +14369,7 @@ (define_insn "*nfms<mode>4_hw"
	  (match_operand:IEEE128 3 "altivec_register_operand" "0")))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmsubqp %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "extend<SFDF:mode><IEEE128:mode>2_hw"
@@ -14644,7 +14644,7 @@ (define_insn "mul<mode>3_odd"
	  UNSPEC_MUL_ROUND_TO_ODD))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmulqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "div<mode>3_odd"
@@ -14677,7 +14677,7 @@ (define_insn "fma<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmaddqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*fms<mode>4_odd"
@@ -14690,7 +14690,7 @@ (define_insn "*fms<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsmsubqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfma<mode>4_odd"
@@ -14703,7 +14703,7 @@ (define_insn "*nfma<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD)))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmaddqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "*nfms<mode>4_odd"
@@ -14717,7 +14717,7 @@ (define_insn "*nfms<mode>4_odd"
	  UNSPEC_FMA_ROUND_TO_ODD)))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xsnmsubqpo %0,%1,%2"
-  [(set_attr "type" "vecfloat")
+  [(set_attr "type" "qmul")
    (set_attr "size" "128")])

 (define_insn "trunc<mode>df2_odd"
[PATCH, rs6000] Correct unaligned_load vector cost for Power9
Power9 has efficient unaligned load insns. The following patch fixes the
cost to reflect that. There was no similar code for the unaligned_store
case. Bootstrap/regtest on powerpc64le-linux with no new regressions. Ok
for trunk?

-Pat

2017-10-09  Pat Haugen  <pthau...@us.ibm.com>

	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
	TARGET_P9_VECTOR code for unaligned_load case.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 253547)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5438,9 +5438,6 @@ rs6000_builtin_vectorization_cost (enum
	return 3;

       case unaligned_load:
-	if (TARGET_P9_VECTOR)
-	  return 3;
-
	if (TARGET_EFFICIENT_UNALIGNED_VSX)
	  return 1;
[PATCH, rs6000] Correct some Power9 scheduling info
The following patch corrects some Power9 resource requirements and
instruction latencies. Bootstrap/regtest on powerpc64le-linux with no new
regressions. Ok for trunk?

-Pat

2017-09-27  Pat Haugen  <pthau...@us.ibm.com>

	* config/rs6000/power9.md (DU_C2_3_power9): Remove an incorrect
	combination.
	(power9-alu): Split out insert/shift types...
	(power9-rot): ... to here. Correct dispatch resources.
	(power9-cracked-alu): Correct dispatch resources.
	(power9-mul): Likewise.
	(power9-mul-compare): Likewise.
	(power9-fp): Correct latency.
	(power9-ddiv): Likewise.
	(power9-vecfdiv): Likewise.
	(power9-vecdiv): Likewise.

Index: gcc/config/rs6000/power9.md
===================================================================
--- gcc/config/rs6000/power9.md	(revision 252029)
+++ gcc/config/rs6000/power9.md	(working copy)
@@ -80,7 +80,6 @@ (define_reservation "DU_C2_power9" "x0_p
 ; 2-way cracked plus 3rd slot
 (define_reservation "DU_C2_3_power9" "x0_power9+x1_power9+xa0_power9|
				      x1_power9+x2_power9+xa0_power9|
-				      x1_power9+x2_power9+xb0_power9|
				      x2_power9+x3_power9+xb0_power9")

 ; 3-way cracked (consumes whole decode/dispatch cycle)
@@ -243,21 +242,29 @@ (define_insn_reservation "power9-sync" 4
 ; Most ALU insns are simple 2 cycle, including record form
 (define_insn_reservation "power9-alu" 2
-  (and (ior (eq_attr "type" "add,exts,integer,logical,isel")
-	    (and (eq_attr "type" "insert,shift")
-		 (eq_attr "dot" "no")))
+  (and (eq_attr "type" "add,exts,integer,logical,isel")
       (eq_attr "cpu" "power9"))
   "DU_any_power9,VSU_power9")
 ; 5 cycle CR latency
 (define_bypass 5 "power9-alu"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")

+; Rotate/shift prevent use of third slot
+(define_insn_reservation "power9-rot" 2
+  (and (eq_attr "type" "insert,shift")
+       (eq_attr "dot" "no")
+       (eq_attr "cpu" "power9"))
+  "DU_slice_3_power9,VSU_power9")
+; 5 cycle CR latency
+(define_bypass 5 "power9-rot"
+	       "power9-crlogical,power9-mfcr,power9-mfcrf")
+
 ; Record form rotate/shift are cracked
 (define_insn_reservation "power9-cracked-alu" 2
   (and (eq_attr "type" "insert,shift")
       (eq_attr "dot" "yes")
       (eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 7 cycle CR latency
 (define_bypass 7 "power9-cracked-alu"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")
@@ -291,13 +298,13 @@ (define_insn_reservation "power9-mul" 5
   (and (eq_attr "type" "mul")
       (eq_attr "dot" "no")
       (eq_attr "cpu" "power9"))
-  "DU_any_power9,VSU_power9")
+  "DU_slice_3_power9,VSU_power9")

 (define_insn_reservation "power9-mul-compare" 5
   (and (eq_attr "type" "mul")
       (eq_attr "dot" "yes")
       (eq_attr "cpu" "power9"))
-  "DU_C2_power9,VSU_power9")
+  "DU_C2_3_power9,VSU_power9")
 ; 10 cycle CR latency
 (define_bypass 10 "power9-mul-compare"
	       "power9-crlogical,power9-mfcr,power9-mfcrf")
@@ -349,7 +356,7 @@ (define_insn_reservation "power9-fpsimpl
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")

-(define_insn_reservation "power9-fp" 7
+(define_insn_reservation "power9-fp" 5
   (and (eq_attr "type" "fp,dmul")
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
@@ -366,7 +373,7 @@ (define_insn_reservation "power9-sdiv" 2
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")

-(define_insn_reservation "power9-ddiv" 33
+(define_insn_reservation "power9-ddiv" 27
   (and (eq_attr "type" "ddiv")
       (eq_attr "cpu" "power9"))
   "DU_slice_3_power9,VSU_power9")
@@ -419,12 +426,12 @@ (define_insn_reservation "power9-veccomp
       (eq_attr "cpu" "power9"))
   "DU_super_power9,VSU_super_power9")

-(define_insn_reservation "power9-vecfdiv" 28
+(define_insn_reservation "power9-vecfdiv" 24
   (and (eq_attr "type" "vecfdiv")
       (eq_attr "cpu" "power9"))
   "DU_super_power9,VSU_super_power9")

-(define_insn_reservation "power9-vecdiv" 32
+(define_insn_reservation "power9-vecdiv" 27
   (and (eq_attr "type" "vecdiv")
       (eq_attr "size" "!128")
       (eq_attr "cpu" "power9"))