Re: [PATCHv2, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]

2023-07-30 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/7/25 10:10, HAO CHEN GUI wrote:
> Hi,
>   This patch modifies vsx extract expand and generates mfvsrwz/stxsiwx
> for all subtargets when the mode is V4SI and the index of extracted element
> is 1 for BE and 2 for LE. Also this patch adds an insn pattern for mfvsrwz
> which helps eliminate the redundant zero extend.
> 
>   Compared to last version, the main change is to move "vsx_extract_v4si_w1"
> and "*mfvsrwz" to the front of "*vsx_extract__di_p9". Also some insn
> conditions are changed to assertions.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625128.html

Since the previous one is v2, this is actually v3. ;-)

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Generate mfvsrwz for all platform and remove redundant zero extend
> 
> mfvsrwz has lower latency than xxextractuw or vextuw[lr]x.  So it should be
> generated even with p9 vector enabled.  Also the instruction is already
> zero extended.  A combine pattern is needed to eliminate redundant zero
> extend instructions.
> 
> gcc/
>   PR target/106769
>   * config/rs6000/vsx.md (expand vsx_extract_): Set it only
>   for V8HI and V16QI.
>   (vsx_extract_v4si): New expand for V4SI extraction.
>   (vsx_extract_v4si_w1): New insn pattern for V4SI extraction
>   when the index of extracted element is 1 with BE and 2 with LE.

Nit: Maybe better to match the name with " ... for V4SI extraction on
word 1 from BE order."

>   (*mfvsrwz): New insn pattern.
>   (*vsx_extract__di_p9): Not generate the insn when the index
>   of extracted element is 1 with BE and 2 with LE.
>   (*vsx_extract_si): Removed.

Nit: s/Removed/Remove/

>   (*vsx_extract_v4si_not_w1): New insn and split pattern which deals
>   with the cases not handled by vsx_extract_v4si_w1.
> 
> gcc/testsuite/
>   PR target/106769
>   * gcc.target/powerpc/pr106769.h: New.
>   * gcc.target/powerpc/pr106769-p8.c: New.
>   * gcc.target/powerpc/pr106769-p9.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0a34ceebeb5..0065b76fef8 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3722,9 +3722,9 @@ (define_insn "vsx_xxpermdi2__1"
>  (define_expand  "vsx_extract_"
>[(parallel [(set (match_operand: 0 "gpc_reg_operand")
>  (vec_select:
> - (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand")
> + (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand")
>   (parallel [(match_operand:QI 2 "const_int_operand")])))
> -   (clobber (match_scratch:VSX_EXTRACT_I 3))])]
> +   (clobber (match_scratch:VSX_EXTRACT_I2 3))])]
>"VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
>  {
>/* If we have ISA 3.0, we can do a xxextractuw/vextractu{b,h}.  */
> @@ -3736,6 +3736,63 @@ (define_expand  "vsx_extract_"
>  }
>  })
> 
> +(define_expand  "vsx_extract_v4si"
> +  [(parallel [(set (match_operand:SI 0 "gpc_reg_operand")
> +(vec_select:SI
> + (match_operand:V4SI 1 "gpc_reg_operand")
> + (parallel [(match_operand:QI 2 "const_0_to_3_operand")])))
> +   (clobber (match_scratch:V4SI 3))])]
> +  "TARGET_DIRECT_MOVE_64BIT"
> +{
> +  /* The word 1 (BE order) can be extracted by mfvsrwz/stxsiwx.  So just
> + fall through to vsx_extract_v4si_w1.  */
> +  if (TARGET_P9_VECTOR
> +  && INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))
> +{
> +  emit_insn (gen_vsx_extract_v4si_p9 (operands[0], operands[1],
> +   operands[2]));
> +  DONE;
> +}
> +})
> +
> +/* Extract from word 1 (BE order).  */

Nit: Use semicolon ";" for comments to keep consistent with the others
and what the doc says.

> +(define_insn "vsx_extract_v4si_w1"
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,wa,Z,wa")
> + (vec_select:SI
> +  (match_operand:V4SI 1 "gpc_reg_operand" "v,v,v,0")
> +  (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
> +   (clobber (match_scratch:V4SI 3 "=v,v,v,v"))]
> +  "TARGET_DIRECT_MOVE_64BIT
> +   && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2)"
> +{
> +   if (which_alternative == 0)
> + return "mfvsrwz %0,%x1";
> +
> +   if (which_alternative == 1)
> + return "xxlor %x0,%x1,%x1";
> +
> +   if (which_alternative == 2)
> + return "stxsiwx %x1,%y0";
> +
> +   return ASM_COMMENT_START " vec_extract to same register";
> +}
> +  [(set_attr "type" "mfvsr,veclogical,fpstore,*")
> +   (set_attr "length" "4,4,4,0")
> +   (set_attr "isa" "p8v,*,p8v,*")])
> +
> +(define_insn "*mfvsrwz"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI
> +   (vec_select:SI
> + (match_operand:V4SI 1 "vsx_register_operand" "wa")
> + (parallel [(match_operand:QI 2 "const_int_operand" "n")]
> +   (clobber 

[PATCH 2/2] MATCH: Add `a == b | a cmp b` and `a != b & a cmp b` simplifications

2023-07-30 Thread Andrew Pinski via Gcc-patches
Even though these are done by combine_comparisons, we can add them to match
to allow simplifications during match rather than just during reassoc/ifcombine.
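
As a rough illustration (not part of the patch), the four source forms from
the new test and the single comparisons they are expected to fold to can be
checked over a small grid of values. The function names and the self-check
helper below are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* The four source forms, written with non-short-circuit operators so each
   is a bitwise AND/IOR of two comparisons on the same operands.  */
bool eq_or_gt (int a, int b)  { return (a == b) | (a > b); }   /* same as a >= b */
bool ne_and_ge (int a, int b) { return (a != b) & (a >= b); }  /* same as a > b */
bool eq_or_lt (int a, int b)  { return (a == b) | (a < b); }   /* same as a <= b */
bool ne_and_le (int a, int b) { return (a != b) & (a <= b); }  /* same as a < b */

/* Hypothetical self-check: compare each form with the single comparison
   it should be equivalent to.  Returns the number of pairs checked.  */
int check_cmpbit2_identities (void)
{
  int checked = 0;
  for (int a = -3; a <= 3; a++)
    for (int b = -3; b <= 3; b++)
      {
        assert (eq_or_gt (a, b) == (a >= b));
        assert (ne_and_ge (a, b) == (a > b));
        assert (eq_or_lt (a, b) == (a <= b));
        assert (ne_and_le (a, b) == (a < b));
        checked++;
      }
  return checked;
}
```

This only demonstrates that the rewrites preserve the value for integers; whether the fold actually fires is what the dump scans in the new test check.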

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/106164
* match.pd (`a != b & a <= b`, `a != b & a >= b`,
`a == b | a < b`, `a == b | a > b`): Handle these cases
too.

gcc/testsuite/ChangeLog:

PR tree-optimization/106164
* gcc.dg/tree-ssa/cmpbit-2.c: New test.
---
 gcc/match.pd | 32 +--
 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c | 39 
 2 files changed, 69 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 00af5d99119..cf8057701ea 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2832,7 +2832,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (switch
   (if (code1 == EQ_EXPR && val) @3)
   (if (code1 == EQ_EXPR && !val) { constant_boolean_node (false, type); })
-  (if (code1 == NE_EXPR && !val) @4)))
+  (if (code1 == NE_EXPR && !val) @4)
+  (if (code1 == NE_EXPR
+   && code2 == GE_EXPR
+  && cmp == 0)
+   (gt @0 @1))
+  (if (code1 == NE_EXPR
+   && code2 == LE_EXPR
+  && cmp == 0)
+   (lt @0 @1))
+ )
+)
+   )
+  )
+ )
+)
 
 /* Convert (X OP1 CST1) && (X OP2 CST2).
Convert (X OP1 Y) && (X OP2 Y).  */
@@ -2917,7 +2931,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (switch
   (if (code1 == EQ_EXPR && val) @4)
   (if (code1 == NE_EXPR && val) { constant_boolean_node (true, type); })
-  (if (code1 == NE_EXPR && !val) @3)))
+  (if (code1 == NE_EXPR && !val) @3)
+  (if (code1 == EQ_EXPR
+   && code2 == GT_EXPR
+  && cmp == 0)
+   (ge @0 @1))
+  (if (code1 == EQ_EXPR
+   && code2 == LT_EXPR
+  && cmp == 0)
+   (le @0 @1))
+ )
+)
+   )
+  )
+ )
+)
 
 /* Convert (X OP1 CST1) || (X OP2 CST2).
Convert (X OP1 Y)|| (X OP2 Y).  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
new file mode 100644
index 000..c4226ef01af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-2.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-reassoc -fdump-tree-optimized-raw" } */
+
+_Bool f(int a, int b)
+{
+  _Bool c = a == b;
+  _Bool d = a > b;
+  return c | d;
+}
+
+_Bool f1(int a, int b)
+{
+  _Bool c = a != b;
+  _Bool d = a >= b;
+  return c & d;
+}
+
+_Bool g(int a, int b)
+{
+  _Bool c = a == b;
+  _Bool d = a < b;
+  return c | d;
+}
+
+_Bool g1(int a, int b)
+{
+  _Bool c = a != b;
+  _Bool d = a <= b;
+  return c & d;
+}
+
+
+/* We should be able to optimize these without reassociation too. */
+/* { dg-final { scan-tree-dump-not "bit_and_expr," "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_ior_expr," "optimized" } } */
+/* { dg-final { scan-tree-dump-times "gt_expr," 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "ge_expr," 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "lt_expr," 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "le_expr," 1 "optimized" } } */
-- 
2.31.1



[PATCH 1/2] MATCH: PR 106164 : Optimize `(X CMP1 Y) AND/IOR (X CMP2 Y)`

2023-07-30 Thread Andrew Pinski via Gcc-patches
I noticed that there are patterns that optimize
`(X CMP1 CST1) AND/IOR (X CMP2 CST2)`, and we can easily extend
them to support `(X CMP1 Y) AND/IOR (X CMP2 Y)` by treating the two
second operands as comparing equal (cmp == 0). This enables the same kind of
optimization for integral and pointer types (which have the same semantics).
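
A small sketch (not part of the patch) of identities of the kind these
patterns rely on once both comparisons share the same second operand. The
function names and the self-check helper are made up for illustration, and
the sketch only covers `int` operands.

```c
#include <assert.h>
#include <stdbool.h>

/* (X CMP1 Y) AND/IOR (X CMP2 Y) with the same Y on both sides.  */
bool eq_and_lt (int x, int y) { return (x == y) & (x < y); }   /* always false */
bool eq_and_le (int x, int y) { return (x == y) & (x <= y); }  /* same as x == y */
bool ne_and_lt (int x, int y) { return (x != y) & (x < y); }   /* same as x < y */
bool ne_or_le (int x, int y)  { return (x != y) | (x <= y); }  /* always true */
bool lt_and_le (int x, int y) { return (x < y) & (x <= y); }   /* same as x < y */
bool gt_and_lt (int x, int y) { return (x > y) & (x < y); }    /* always false */

/* Hypothetical self-check over a small grid of values.
   Returns the number of pairs checked.  */
int check_same_operand_identities (void)
{
  int checked = 0;
  for (int x = -3; x <= 3; x++)
    for (int y = -3; y <= 3; y++)
      {
        assert (eq_and_lt (x, y) == false);
        assert (eq_and_le (x, y) == (x == y));
        assert (ne_and_lt (x, y) == (x < y));
        assert (ne_or_le (x, y) == true);
        assert (lt_and_le (x, y) == (x < y));
        assert (gt_and_lt (x, y) == false);
        checked++;
      }
  return checked;
}
```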

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/106164
* match.pd: Extend the `(X CMP1 CST1) AND/IOR (X CMP2 CST2)`
patterns to support `(X CMP1 Y) AND/IOR (X CMP2 Y)`.

gcc/testsuite/ChangeLog:

PR tree-optimization/106164
* gcc.dg/tree-ssa/cmpbit-1.c: New test.
---
 gcc/match.pd | 66 +++-
 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-1.c | 38 ++
 2 files changed, 90 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 73eb249f704..00af5d99119 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2799,14 +2799,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Convert (X == CST1) && (X OP2 CST2) to a known value
based on CST1 OP2 CST2.  Similarly for (X != CST1).  */
+/* Convert (X == Y) && (X OP2 Y) to a known value if X is an integral type.
+   Similarly for (X != Y).  */
 
 (for code1 (eq ne)
  (for code2 (eq ne lt gt le ge)
   (simplify
-   (bit_and:c (code1@3 @0 INTEGER_CST@1) (code2@4 @0 INTEGER_CST@2))
+   (bit_and:c (code1@3 @0 @1) (code2@4 @0 @2))
+   (if ((TREE_CODE (@1) == INTEGER_CST
+&& TREE_CODE (@2) == INTEGER_CST)
+   || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+|| POINTER_TYPE_P (TREE_TYPE (@1)))
+   && operand_equal_p (@1, @2)))
 (with
  {
-  int cmp = tree_int_cst_compare (@1, @2);
+  int cmp = 0;
+  if (TREE_CODE (@1) == INTEGER_CST
+ && TREE_CODE (@2) == INTEGER_CST)
+   cmp = tree_int_cst_compare (@1, @2);
   bool val;
   switch (code2)
 {
@@ -2822,17 +2832,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (switch
   (if (code1 == EQ_EXPR && val) @3)
   (if (code1 == EQ_EXPR && !val) { constant_boolean_node (false, type); })
-  (if (code1 == NE_EXPR && !val) @4))
+  (if (code1 == NE_EXPR && !val) @4)))
 
-/* Convert (X OP1 CST1) && (X OP2 CST2).  */
+/* Convert (X OP1 CST1) && (X OP2 CST2).
+   Convert (X OP1 Y) && (X OP2 Y).  */
 
 (for code1 (lt le gt ge)
  (for code2 (lt le gt ge)
   (simplify
-  (bit_and (code1:c@3 @0 INTEGER_CST@1) (code2:c@4 @0 INTEGER_CST@2))
+  (bit_and (code1:c@3 @0 @1) (code2:c@4 @0 @2))
+  (if ((TREE_CODE (@1) == INTEGER_CST
+   && TREE_CODE (@2) == INTEGER_CST)
+   || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || POINTER_TYPE_P (TREE_TYPE (@1)))
+  && operand_equal_p (@1, @2)))
(with
 {
- int cmp = tree_int_cst_compare (@1, @2);
+ int cmp = 0;
+ if (TREE_CODE (@1) == INTEGER_CST
+&& TREE_CODE (@2) == INTEGER_CST)
+   cmp = tree_int_cst_compare (@1, @2);
 }
 (switch
  /* Choose the more restrictive of two < or <= comparisons.  */
@@ -2861,18 +2880,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  && (code1 == GT_EXPR || code1 == GE_EXPR)
  && (code2 == LT_EXPR || code2 == LE_EXPR))
   { constant_boolean_node (false, type); })
- )
+ ))
 
 /* Convert (X == CST1) || (X OP2 CST2) to a known value
based on CST1 OP2 CST2.  Similarly for (X != CST1).  */
+/* Convert (X == Y) || (X OP2 Y) to a known value if X is an integral type.
+   Similarly for (X != Y).  */
 
 (for code1 (eq ne)
  (for code2 (eq ne lt gt le ge)
   (simplify
-   (bit_ior:c (code1@3 @0 INTEGER_CST@1) (code2@4 @0 INTEGER_CST@2))
+   (bit_ior:c (code1@3 @0 @1) (code2@4 @0 @2))
+   (if ((TREE_CODE (@1) == INTEGER_CST
+&& TREE_CODE (@2) == INTEGER_CST)
+   || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || POINTER_TYPE_P (TREE_TYPE (@1)))
+   && operand_equal_p (@1, @2)))
 (with
  {
-  int cmp = tree_int_cst_compare (@1, @2);
+  int cmp = 0;
+  if (TREE_CODE (@1) == INTEGER_CST
+ && TREE_CODE (@2) == INTEGER_CST)
+   cmp = tree_int_cst_compare (@1, @2);
   bool val;
   switch (code2)
{
@@ -2888,17 +2917,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (switch
   (if (code1 == EQ_EXPR && val) @4)
   (if (code1 == NE_EXPR && val) { constant_boolean_node (true, type); })
-  (if (code1 == NE_EXPR && !val) @3))
+  (if (code1 == NE_EXPR && !val) @3)))
 
-/* Convert (X OP1 CST1) || (X OP2 CST2).  */
+/* Convert (X OP1 CST1) || (X OP2 CST2).
+   Convert (X OP1 Y)|| (X OP2 Y).  */
 
 (for code1 (lt le gt ge)
  (for code2 (lt le gt ge)
   (simplify
-  (bit_ior (code1@3 @0 INTEGER_CST@1) (code2@4 @0 INTEGER_CST@2))
+  (bit_ior (code1@3 @0 @1) (code2@4 @0 @2))
+  (if ((TREE_CODE (@1) == INTEGER_CST
+&& TREE_CODE (@2) == INTEGER_CST)
+   || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+ 

RE: [PATCH v1] RISC-V: Bugfix for RVV floating-point rm suffix sequence

2023-07-30 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Monday, July 31, 2023 10:58 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Bugfix for RVV floating-point rm suffix sequence

lgtm

On Mon, Jul 31, 2023 at 10:56 AM  wrote:
>
> From: Pan Li 
>
> According to the RVV intrinsic doc below, the RVV floating-point intrinsic name
> with rounding mode should be:
>
> _rm_m
>
> instead of:
>
> _m_rm
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
> This patch fixes the suffix order and adjusts the test cases.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-shapes.cc (struct alu_frm_def):
> Move rm suffix before mask.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: Adjust
> test cases.
> * gcc.target/riscv/rvv/base/float-point-frm.c: Ditto.
> ---
>  gcc/config/riscv/riscv-vector-builtins-shapes.cc | 10 +-
>  .../riscv/rvv/base/float-point-frm-insert-1.c| 14 +++---
>  .../gcc.target/riscv/rvv/base/float-point-frm.c  | 16 
>  3 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc 
> b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> index 22b5fe256df..6af57c22bfb 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> @@ -261,6 +261,11 @@ struct alu_frm_def : public build_base
> b.append_name (type_suffixes[instance.type.index].vector);
>}
>
> +/* According to rvv-intrinsic-doc, it does not add "_rm" suffix
> +   for vop_rm C++ overloaded API.  */
> +if (!overloaded_p)
> +  b.append_name ("_rm");
> +
>  /* According to rvv-intrinsic-doc, it does not add "_m" suffix
> for vop_m C++ overloaded API.  */
>  if (overloaded_p && instance.pred == PRED_TYPE_m)
> @@ -268,11 +273,6 @@ struct alu_frm_def : public build_base
>
>  b.append_name (predication_suffixes[instance.pred]);
>
> -/* According to rvv-intrinsic-doc, it does not add "_rm" suffix
> -   for vop_rm C++ overloaded API.  */
> -if (!overloaded_p)
> -  b.append_name ("_rm");
> -
>  return b.finish_name ();
>}
>
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> index 608b3883dd0..d6c5e1bddd6 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> @@ -11,20 +11,20 @@ test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, 
> vfloat32m1_t op2, size_t vl) {
>  }
>
>  vfloat32m1_t
> -test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
> +test_vfadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
>  size_t vl) {
> -  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 1, vl);
> +  return __riscv_vfadd_vv_f32m1_rm_m (mask, op1, op2, 1, vl);
>  }
>
>  vfloat32m1_t
> -test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
> -  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 2, vl);
> +test_vfadd_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
> +  return __riscv_vfadd_vf_f32m1_rm (op1, op2, 2, vl);
>  }
>
>  vfloat32m1_t
> -test_vfadd_vf_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, float32_t op2,
> -size_t vl) {
> -  return __riscv_vfadd_vf_f32m1_m_rm(mask, op1, op2, 3, vl);
> +test_vfadd_vf_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, float32_t op2,
> + size_t vl) {
> +  return __riscv_vfadd_vf_f32m1_rm_m (mask, op1, op2, 3, vl);
>  }
>
>  /* { dg-final { scan-assembler-times 
> {vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
> index 95271b2c822..1f142605cc3 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
> @@ -11,20 +11,20 @@ test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, 
> vfloat32m1_t op2, size_t vl) {
>  }
>
>  vfloat32m1_t
> -test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
> -size_t vl) {
> -  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 0, vl);
> +test_vfadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
> + size_t vl) {
> +  return __riscv_vfadd_vv_f32m1_rm_m (mask, op1, op2, 0, vl);
>  }
>
>  vfloat32m1_t
> -test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
> -  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 0, vl);
> +test_vfadd_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
> +  return 

Re: [PATCH v1] RISC-V: Bugfix for RVV floating-point rm suffix sequence

2023-07-30 Thread Kito Cheng via Gcc-patches
lgtm

On Mon, Jul 31, 2023 at 10:56 AM  wrote:
>
> From: Pan Li 
>
> According to the RVV intrinsic doc below, the RVV floating-point intrinsic name
> with rounding mode should be:
>
> _rm_m
>
> instead of:
>
> _m_rm
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
>
> This patch fixes the suffix order and adjusts the test cases.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-shapes.cc (struct alu_frm_def):
> Move rm suffix before mask.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: Adjust
> test cases.
> * gcc.target/riscv/rvv/base/float-point-frm.c: Ditto.
> ---
>  gcc/config/riscv/riscv-vector-builtins-shapes.cc | 10 +-
>  .../riscv/rvv/base/float-point-frm-insert-1.c| 14 +++---
>  .../gcc.target/riscv/rvv/base/float-point-frm.c  | 16 
>  3 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc 
> b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> index 22b5fe256df..6af57c22bfb 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> @@ -261,6 +261,11 @@ struct alu_frm_def : public build_base
> b.append_name (type_suffixes[instance.type.index].vector);
>}
>
> +/* According to rvv-intrinsic-doc, it does not add "_rm" suffix
> +   for vop_rm C++ overloaded API.  */
> +if (!overloaded_p)
> +  b.append_name ("_rm");
> +
>  /* According to rvv-intrinsic-doc, it does not add "_m" suffix
> for vop_m C++ overloaded API.  */
>  if (overloaded_p && instance.pred == PRED_TYPE_m)
> @@ -268,11 +273,6 @@ struct alu_frm_def : public build_base
>
>  b.append_name (predication_suffixes[instance.pred]);
>
> -/* According to rvv-intrinsic-doc, it does not add "_rm" suffix
> -   for vop_rm C++ overloaded API.  */
> -if (!overloaded_p)
> -  b.append_name ("_rm");
> -
>  return b.finish_name ();
>}
>
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> index 608b3883dd0..d6c5e1bddd6 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
> @@ -11,20 +11,20 @@ test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, 
> vfloat32m1_t op2, size_t vl) {
>  }
>
>  vfloat32m1_t
> -test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
> +test_vfadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
>  size_t vl) {
> -  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 1, vl);
> +  return __riscv_vfadd_vv_f32m1_rm_m (mask, op1, op2, 1, vl);
>  }
>
>  vfloat32m1_t
> -test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
> -  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 2, vl);
> +test_vfadd_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
> +  return __riscv_vfadd_vf_f32m1_rm (op1, op2, 2, vl);
>  }
>
>  vfloat32m1_t
> -test_vfadd_vf_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, float32_t op2,
> -size_t vl) {
> -  return __riscv_vfadd_vf_f32m1_m_rm(mask, op1, op2, 3, vl);
> +test_vfadd_vf_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, float32_t op2,
> + size_t vl) {
> +  return __riscv_vfadd_vf_f32m1_rm_m (mask, op1, op2, 3, vl);
>  }
>
>  /* { dg-final { scan-assembler-times 
> {vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
> index 95271b2c822..1f142605cc3 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
> @@ -11,20 +11,20 @@ test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, 
> vfloat32m1_t op2, size_t vl) {
>  }
>
>  vfloat32m1_t
> -test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
> -size_t vl) {
> -  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 0, vl);
> +test_vfadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
> + size_t vl) {
> +  return __riscv_vfadd_vv_f32m1_rm_m (mask, op1, op2, 0, vl);
>  }
>
>  vfloat32m1_t
> -test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
> -  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 0, vl);
> +test_vfadd_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
> +  return __riscv_vfadd_vf_f32m1_rm (op1, op2, 0, vl);
>  }
>
>  vfloat32m1_t
> -test_vfadd_vf_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, float32_t op2,
> -size_t vl) {
> -  return __riscv_vfadd_vf_f32m1_m_rm(mask, op1, op2, 0, vl);
> +test_vfadd_vf_f32m1_rm_m 

[PATCH v1] RISC-V: Bugfix for RVV floating-point rm suffix sequence

2023-07-30 Thread Pan Li via Gcc-patches
From: Pan Li 

According to the RVV intrinsic doc below, the RVV floating-point intrinsic name
with rounding mode should be:

_rm_m

instead of:

_m_rm

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

This patch fixes the suffix order and adjusts the test cases.
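
To make the intended ordering concrete, here is a minimal sketch in plain C
(not the GCC implementation) that composes a non-overloaded intrinsic name as
base, type suffix, "_rm", then the predication suffix. The helper names
`build_frm_name` and `check_frm_names` are hypothetical; the expected strings
are the names used in the adjusted tests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC code: the "_rm" suffix is placed before
   the predication suffix (for example "_m"), as the doc requires.  */
int build_frm_name (char *buf, size_t size, const char *base,
                    const char *type_suffix, const char *pred_suffix)
{
  return snprintf (buf, size, "__riscv_%s%s_rm%s",
                   base, type_suffix, pred_suffix);
}

/* Hypothetical self-check against names that appear in the test cases.  */
int check_frm_names (void)
{
  char buf[64];

  /* Masked form: "_rm" comes before "_m".  */
  build_frm_name (buf, sizeof buf, "vfadd_vv", "_f32m1", "_m");
  assert (strcmp (buf, "__riscv_vfadd_vv_f32m1_rm_m") == 0);

  /* Unmasked form: no predication suffix.  */
  build_frm_name (buf, sizeof buf, "vfadd_vf", "_f32m1", "");
  assert (strcmp (buf, "__riscv_vfadd_vf_f32m1_rm") == 0);

  return 1;
}
```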

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-shapes.cc (struct alu_frm_def):
Move rm suffix before mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-1.c: Adjust
test cases.
* gcc.target/riscv/rvv/base/float-point-frm.c: Ditto.
---
 gcc/config/riscv/riscv-vector-builtins-shapes.cc | 10 +-
 .../riscv/rvv/base/float-point-frm-insert-1.c| 14 +++---
 .../gcc.target/riscv/rvv/base/float-point-frm.c  | 16 
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 22b5fe256df..6af57c22bfb 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -261,6 +261,11 @@ struct alu_frm_def : public build_base
b.append_name (type_suffixes[instance.type.index].vector);
   }
 
+/* According to rvv-intrinsic-doc, it does not add "_rm" suffix
+   for vop_rm C++ overloaded API.  */
+if (!overloaded_p)
+  b.append_name ("_rm");
+
 /* According to rvv-intrinsic-doc, it does not add "_m" suffix
for vop_m C++ overloaded API.  */
 if (overloaded_p && instance.pred == PRED_TYPE_m)
@@ -268,11 +273,6 @@ struct alu_frm_def : public build_base
 
 b.append_name (predication_suffixes[instance.pred]);
 
-/* According to rvv-intrinsic-doc, it does not add "_rm" suffix
-   for vop_rm C++ overloaded API.  */
-if (!overloaded_p)
-  b.append_name ("_rm");
-
 return b.finish_name ();
   }
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
index 608b3883dd0..d6c5e1bddd6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-1.c
@@ -11,20 +11,20 @@ test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
 }
 
 vfloat32m1_t
-test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+test_vfadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
 size_t vl) {
-  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 1, vl);
+  return __riscv_vfadd_vv_f32m1_rm_m (mask, op1, op2, 1, vl);
 }
 
 vfloat32m1_t
-test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
-  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 2, vl);
+test_vfadd_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
+  return __riscv_vfadd_vf_f32m1_rm (op1, op2, 2, vl);
 }
 
 vfloat32m1_t
-test_vfadd_vf_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, float32_t op2,
-size_t vl) {
-  return __riscv_vfadd_vf_f32m1_m_rm(mask, op1, op2, 3, vl);
+test_vfadd_vf_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, float32_t op2,
+ size_t vl) {
+  return __riscv_vfadd_vf_f32m1_rm_m (mask, op1, op2, 3, vl);
 }
 
 /* { dg-final { scan-assembler-times {vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
index 95271b2c822..1f142605cc3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c
@@ -11,20 +11,20 @@ test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
 }
 
 vfloat32m1_t
-test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
-size_t vl) {
-  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 0, vl);
+test_vfadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+ size_t vl) {
+  return __riscv_vfadd_vv_f32m1_rm_m (mask, op1, op2, 0, vl);
 }
 
 vfloat32m1_t
-test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
-  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 0, vl);
+test_vfadd_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
+  return __riscv_vfadd_vf_f32m1_rm (op1, op2, 0, vl);
 }
 
 vfloat32m1_t
-test_vfadd_vf_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, float32_t op2,
-size_t vl) {
-  return __riscv_vfadd_vf_f32m1_m_rm(mask, op1, op2, 0, vl);
+test_vfadd_vf_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, float32_t op2,
+ size_t vl) {
+  return __riscv_vfadd_vf_f32m1_rm_m (mask, op1, op2, 0, vl);
 }
 
 /* { dg-final { scan-assembler-times {vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
-- 
2.34.1



RE: [PATCH V2] RISC-V: Enable basic VLS auto-vectorization

2023-07-30 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf Of Kito Cheng via Gcc-patches
Sent: Monday, July 31, 2023 10:42 AM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; kito.ch...@sifive.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH V2] RISC-V: Enable basic VLS auto-vectorization

LGTM, thanks :)

On Mon, Jul 31, 2023 at 10:14 AM Juzhe-Zhong  wrote:
>
> Consider the following case:
> void
> foo (int8_t *in, int8_t *out, int8_t x)
> {
>   for (int i = 0; i < 16; i++)
>     in[i] = x;
> }
>
> Compile option: --param=riscv-autovec-preference=scalable -fno-builtin
>
> Before this patch:
>
> foo:
>         li      a5,16
>         csrr    a4,vlenb
>         vsetvli a3,zero,e8,m1,ta,ma
>         vmv.v.x v1,a2
>         bleu    a5,a4,.L2
>         mv      a5,a4
> .L2:
>         vsetvli zero,a5,e8,m1,ta,ma
>         vse8.v  v1,0(a0)
>         ret
>
> After this patch:
>
> foo:
>         vsetivli        zero,16,e8,mf8,ta,ma
>         vmv.v.x v1,a2
>         vse8.v  v1,0(a0)
>         ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-vls.md (@vec_duplicate): New pattern.
> * config/riscv/riscv-v.cc (autovectorize_vector_modes): Add VLS 
> autovec support.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/v-1.c: Adapt test.
> * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/dup-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-6.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-7.c: New test.
>
> ---
>  gcc/config/riscv/autovec-vls.md   |  19 ++
>  gcc/config/riscv/riscv-v.cc   |  21 ++-
>  .../gcc.target/riscv/rvv/autovec/v-1.c|   2 +-
>  .../gcc.target/riscv/rvv/autovec/vls/dup-1.c  | 168 ++
>  .../gcc.target/riscv/rvv/autovec/vls/dup-2.c  | 153 
>  .../gcc.target/riscv/rvv/autovec/vls/dup-3.c  | 153 
>  .../gcc.target/riscv/rvv/autovec/vls/dup-4.c  | 137 ++
>  .../gcc.target/riscv/rvv/autovec/vls/dup-5.c  | 137 ++
>  .../gcc.target/riscv/rvv/autovec/vls/dup-6.c  | 122 +
>  .../gcc.target/riscv/rvv/autovec/vls/dup-7.c  | 122 +
>  .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   2 +-
>  .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   2 +-
>  .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   2 +-
>  13 files changed, 1034 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-7.c
>
> diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
> index 9ece317ca4e..1a64dfdd91e 100644
> --- a/gcc/config/riscv/autovec-vls.md
> +++ b/gcc/config/riscv/autovec-vls.md
> @@ -139,3 +139,22 @@
>"vmv%m1r.v\t%0,%1"
>[(set_attr "type" "vmov")
> (set_attr "mode" "")])
> +
> +;; -
> +;;  Duplicate Operations
> +;; -
> +
> +(define_insn_and_split "@vec_duplicate"
> +  [(set (match_operand:VLS 0 "register_operand")
> +(vec_duplicate:VLS
> +  (match_operand: 1 "reg_or_int_operand")))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
> +   riscv_vector::RVV_UNOP, operands);
> +DONE;
> +  }
> +)
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 9e89f970a4c..c10e51b362e 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -2533,7 +2533,6 @@ autovectorize_vector_modes (vector_modes *modes, bool)
>  {
>if (autovec_use_vlmax_p ())
>  {
> -  /* TODO: We will support RVV VLS auto-vectorization mode in the 
> future. */
>poly_uint64 full_size
> = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
>
> @@ -2561,7 +2560,25 @@ autovectorize_vector_modes (vector_modes *modes, bool)
> modes->safe_push (mode);
> }
>  

Re: [PATCH V2] RISC-V: Enable basic VLS auto-vectorization

2023-07-30 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Mon, Jul 31, 2023 at 10:14 AM Juzhe-Zhong  wrote:
>
> Consider this following case:
> void
> foo (int8_t *in, int8_t *out, int8_t x)
> {
>   for (int i = 0; i < 16; i++)
> in[i] = x;
> }
>
> Compile option: --param=riscv-autovec-preference=scalable -fno-builtin
>
> Before this patch:
>
> foo:
> li  a5,16
> csrr a4,vlenb
> vsetvli a3,zero,e8,m1,ta,ma
> vmv.v.x v1,a2
> bleu a5,a4,.L2
> mv  a5,a4
> .L2:
> vsetvli zero,a5,e8,m1,ta,ma
> vse8.v  v1,0(a0)
> ret
>
> After this patch:
>
> foo:
> vsetivli zero,16,e8,mf8,ta,ma
> vmv.v.x v1,a2
> vse8.v  v1,0(a0)
> ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-vls.md (@vec_duplicate): New pattern.
> * config/riscv/riscv-v.cc (autovectorize_vector_modes): Add VLS 
> autovec support.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/v-1.c: Adapt test.
> * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/dup-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-6.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/dup-7.c: New test.
>
> ---
>  gcc/config/riscv/autovec-vls.md   |  19 ++
>  gcc/config/riscv/riscv-v.cc   |  21 ++-
>  .../gcc.target/riscv/rvv/autovec/v-1.c|   2 +-
>  .../gcc.target/riscv/rvv/autovec/vls/dup-1.c  | 168 ++
>  .../gcc.target/riscv/rvv/autovec/vls/dup-2.c  | 153 
>  .../gcc.target/riscv/rvv/autovec/vls/dup-3.c  | 153 
>  .../gcc.target/riscv/rvv/autovec/vls/dup-4.c  | 137 ++
>  .../gcc.target/riscv/rvv/autovec/vls/dup-5.c  | 137 ++
>  .../gcc.target/riscv/rvv/autovec/vls/dup-6.c  | 122 +
>  .../gcc.target/riscv/rvv/autovec/vls/dup-7.c  | 122 +
>  .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   2 +-
>  .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   2 +-
>  .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   2 +-
>  13 files changed, 1034 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-7.c
>
> diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
> index 9ece317ca4e..1a64dfdd91e 100644
> --- a/gcc/config/riscv/autovec-vls.md
> +++ b/gcc/config/riscv/autovec-vls.md
> @@ -139,3 +139,22 @@
>"vmv%m1r.v\t%0,%1"
>[(set_attr "type" "vmov")
> (set_attr "mode" "")])
> +
> +;; -
> +;;  Duplicate Operations
> +;; -
> +
> +(define_insn_and_split "@vec_duplicate"
> +  [(set (match_operand:VLS 0 "register_operand")
> +(vec_duplicate:VLS
> +  (match_operand: 1 "reg_or_int_operand")))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
> +   riscv_vector::RVV_UNOP, operands);
> +DONE;
> +  }
> +)
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 9e89f970a4c..c10e51b362e 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -2533,7 +2533,6 @@ autovectorize_vector_modes (vector_modes *modes, bool)
>  {
>if (autovec_use_vlmax_p ())
>  {
> -  /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
>poly_uint64 full_size
> = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
>
> @@ -2561,7 +2560,25 @@ autovectorize_vector_modes (vector_modes *modes, bool)
> modes->safe_push (mode);
> }
>  }
> -  return 0;
> +  unsigned int flag = 0;
> +  if (TARGET_VECTOR_VLS)
> +{
> +  /* Enable VECT_COMPARE_COSTS between VLA modes VLS modes for scalable
> +auto-vectorization.  */
> +  flag |= VECT_COMPARE_COSTS;
> +  /* Push all VLSmodes according to TARGET_MIN_VLEN.  */
> +  unsigned int i = 0;
> +  

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-30 Thread Hao Liu OS via Gcc-patches
> Which test case do you see this for?  The two tests in the patch still
> seem to report correct latencies for me if I make the change above.

Not the newly added tests.  It is still the existing case that caused the
previous ICE (i.e. the assertion failure): gcc.target/aarch64/sve/cost_model_13.c.

The test case itself does not fail, but the vect dump reports a
"reduction latency" of 0:
Before the change:
cost_model_13.c:7:21: note:  Original vector body cost = 6
cost_model_13.c:7:21: note:  Scalar issue estimate:
cost_model_13.c:7:21: note:    load operations = 1
cost_model_13.c:7:21: note:    store operations = 0
cost_model_13.c:7:21: note:    general operations = 1
cost_model_13.c:7:21: note:    reduction latency = 1
cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.00
cost_model_13.c:7:21: note:    estimated cycles per vector iteration (for VF 8) = 8.00
cost_model_13.c:7:21: note:  Vector issue estimate:
cost_model_13.c:7:21: note:    load operations = 1
cost_model_13.c:7:21: note:    store operations = 0
cost_model_13.c:7:21: note:    general operations = 1
cost_model_13.c:7:21: note:    reduction latency = 2
cost_model_13.c:7:21: note:    estimated min cycles per iteration = 2.00

After the change:
cost_model_13.c:7:21: note:  Original vector body cost = 6
cost_model_13.c:7:21: note:  Scalar issue estimate:
cost_model_13.c:7:21: note:    load operations = 1
cost_model_13.c:7:21: note:    store operations = 0
cost_model_13.c:7:21: note:    general operations = 1
cost_model_13.c:7:21: note:    reduction latency = 0 <--- seems not consistent with above result
cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.00
cost_model_13.c:7:21: note:    estimated cycles per vector iteration (for VF 8) = 8.00
cost_model_13.c:7:21: note:  Vector issue estimate:
cost_model_13.c:7:21: note:    load operations = 1
cost_model_13.c:7:21: note:    store operations = 0
cost_model_13.c:7:21: note:    general operations = 1
cost_model_13.c:7:21: note:    reduction latency = 0 <--- seems not consistent with above result
cost_model_13.c:7:21: note:    estimated min cycles per iteration = 1.00 <--- seems not consistent with above result

BTW, this is probably because the reduction stmt is not live.  The live flag
indicates whether a stmt is part of a computation whose result is used outside
the loop (tree-vectorizer.h:1204):
  :
  # res_18 = PHI 
  # i_20 = PHI 
  _1 = (long unsigned int) i_20;
  _2 = _1 * 2;
  _3 = x_14(D) + _2;
  _4 = *_3;
  _5 = (unsigned short) _4;
  res.0_6 = (unsigned short) res_18;
  _7 = _5 + res.0_6; <-- This is not live, may be caused by the below type cast stmt.
  res_15 = (short int) _7;
  i_16 = i_20 + 1;
  if (n_11(D) > i_16)
goto ;
  else
goto ;

  :
  goto ;

Thanks,
-Hao


From: Richard Sandiford 
Sent: Saturday, July 29, 2023 1:35
To: Hao Liu OS
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Sorry for the slow response.

Hao Liu OS  writes:
>> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>   && vect_is_reduction (stmt_info))
>>
>> to:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>   && STMT_VINFO_LIVE_P (stmt_info)
>>   && vect_is_reduction (stmt_info))
>
> I tried this and it does avoid the ICE.  But it seems the
> reduction_latency calculation is also skipped: after this modification,
> reduction_latency is 0 for this case.  Previously, it was 1 and 2 for scalar
> and vector respectively.

Which test case do you see this for?  The two tests in the patch still
seem to report correct latencies for me if I make the change above.

Thanks,
Richard

> IMHO, to stay consistent with the previous result, should we move the
> STMT_VINFO_LIVE_P check down, inside the if?  Such as:
>
>   /* Calculate the minimum cycles per iteration imposed by a reduction
>  operation.  */
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && vect_is_reduction (stmt_info))
> {
>   unsigned int base
> = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
>   if (STMT_VINFO_LIVE_P (stmt_info) && STMT_VINFO_FORCE_SINGLE_CYCLE (
> info_for_reduction (m_vinfo, stmt_info)))
> /* ??? Ideally we'd use a tree to reduce the copies down to 1 vector,
>and then accumulate that, but at the moment the loop-carried
>dependency includes all copies.  */
> ops->reduction_latency = MAX (ops->reduction_latency, base * count);
>   else
> ops->reduction_latency = MAX (ops->reduction_latency, base);
>
> Thanks,
> Hao
>
> 
> 
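
To make the difference between the two placements of the STMT_VINFO_LIVE_P
check concrete, here is a minimal standalone sketch.  It is not GCC code:
StmtInfo, latency_gate_outer and latency_gate_inner are hypothetical names,
and the three booleans merely model vect_is_reduction, STMT_VINFO_LIVE_P and
STMT_VINFO_FORCE_SINGLE_CYCLE from the snippets quoted above.

```cpp
#include <algorithm>

// Hypothetical stand-in for the per-statement state discussed in the thread.
struct StmtInfo {
  bool is_reduction;        // models vect_is_reduction (stmt_info)
  bool live;                // models STMT_VINFO_LIVE_P (stmt_info)
  bool force_single_cycle;  // models STMT_VINFO_FORCE_SINGLE_CYCLE (...)
};

// Variant A: liveness gates the whole block, so a non-live reduction
// statement contributes no reduction latency at all.
unsigned latency_gate_outer(const StmtInfo &s, unsigned base, unsigned count,
                            unsigned current) {
  if (s.is_reduction && s.live) {
    unsigned scaled = s.force_single_cycle ? base * count : base;
    return std::max(current, scaled);
  }
  return current;
}

// Variant B: liveness only gates the multiplication by count.  The
// single-cycle flag is consulted only for live statements (short-circuit),
// and a non-live reduction statement still contributes its base latency.
unsigned latency_gate_inner(const StmtInfo &s, unsigned base, unsigned count,
                            unsigned current) {
  if (s.is_reduction) {
    unsigned scaled = (s.live && s.force_single_cycle) ? base * count : base;
    return std::max(current, scaled);
  }
  return current;
}
```

For a non-live reduction statement with base latency 2, variant A leaves the
latency at 0 while variant B reports 2, which mirrors the 0 versus 1/2
discrepancy in the dumps above.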

Re: Re: [PATCH] RISC-V: Enable basic VLS auto-vectorization

2023-07-30 Thread juzhe.zh...@rivai.ai
V2, which addresses the comment:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625799.html 




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-31 09:55
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enable basic VLS auto-vectorization
Hi Juzhe:
 
> * config/riscv/riscv.cc (riscv_estimated_poly_value): Fix incorrect 
> poly estimation.
 
Is it a necessary change for the VLS autovectorization, or could it be a
separate change?
 


[PATCH V2] RISC-V: Enable basic VLS auto-vectorization

2023-07-30 Thread Juzhe-Zhong
Consider the following case:
void
foo (int8_t *in, int8_t *out, int8_t x)
{
  for (int i = 0; i < 16; i++)
in[i] = x;
}

Compile option: --param=riscv-autovec-preference=scalable -fno-builtin

Before this patch:

foo:
li  a5,16
csrr a4,vlenb
vsetvli a3,zero,e8,m1,ta,ma
vmv.v.x v1,a2
bleu a5,a4,.L2
mv  a5,a4
.L2:
vsetvli zero,a5,e8,m1,ta,ma
vse8.v  v1,0(a0)
ret

After this patch:

foo:
vsetivli zero,16,e8,mf8,ta,ma
vmv.v.x v1,a2
vse8.v  v1,0(a0)
ret

gcc/ChangeLog:

* config/riscv/autovec-vls.md (@vec_duplicate): New pattern.
* config/riscv/riscv-v.cc (autovectorize_vector_modes): Add VLS autovec 
support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/v-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/dup-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/dup-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/dup-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/dup-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/dup-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/dup-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/dup-7.c: New test.

---
 gcc/config/riscv/autovec-vls.md   |  19 ++
 gcc/config/riscv/riscv-v.cc   |  21 ++-
 .../gcc.target/riscv/rvv/autovec/v-1.c|   2 +-
 .../gcc.target/riscv/rvv/autovec/vls/dup-1.c  | 168 ++
 .../gcc.target/riscv/rvv/autovec/vls/dup-2.c  | 153 
 .../gcc.target/riscv/rvv/autovec/vls/dup-3.c  | 153 
 .../gcc.target/riscv/rvv/autovec/vls/dup-4.c  | 137 ++
 .../gcc.target/riscv/rvv/autovec/vls/dup-5.c  | 137 ++
 .../gcc.target/riscv/rvv/autovec/vls/dup-6.c  | 122 +
 .../gcc.target/riscv/rvv/autovec/vls/dup-7.c  | 122 +
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   2 +-
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   2 +-
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   2 +-
 13 files changed, 1034 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-7.c

diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
index 9ece317ca4e..1a64dfdd91e 100644
--- a/gcc/config/riscv/autovec-vls.md
+++ b/gcc/config/riscv/autovec-vls.md
@@ -139,3 +139,22 @@
   "vmv%m1r.v\t%0,%1"
   [(set_attr "type" "vmov")
(set_attr "mode" "")])
+
+;; -
+;;  Duplicate Operations
+;; -
+
+(define_insn_and_split "@vec_duplicate"
+  [(set (match_operand:VLS 0 "register_operand")
+(vec_duplicate:VLS
+  (match_operand: 1 "reg_or_int_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
+   riscv_vector::RVV_UNOP, operands);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9e89f970a4c..c10e51b362e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2533,7 +2533,6 @@ autovectorize_vector_modes (vector_modes *modes, bool)
 {
   if (autovec_use_vlmax_p ())
 {
-  /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
   poly_uint64 full_size
= BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
 
@@ -2561,7 +2560,25 @@ autovectorize_vector_modes (vector_modes *modes, bool)
modes->safe_push (mode);
}
 }
-  return 0;
+  unsigned int flag = 0;
+  if (TARGET_VECTOR_VLS)
+{
+  /* Enable VECT_COMPARE_COSTS between VLA modes VLS modes for scalable
+auto-vectorization.  */
+  flag |= VECT_COMPARE_COSTS;
+  /* Push all VLSmodes according to TARGET_MIN_VLEN.  */
+  unsigned int i = 0;
+  unsigned int base_size = TARGET_MIN_VLEN * riscv_autovec_lmul / 8;
+  unsigned int size = base_size;
+  machine_mode mode;
+  while (size > 0 && get_vector_mode (QImode, size).exists (&mode))
+   {
+ modes->safe_push (mode);
+ i++;
+ size = base_size / (1U << i);
+   }
+}
+  
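
The size sequence produced by the loop in this hunk can be sketched in
isolation.  This is only an illustration under stated assumptions: vls_sizes
is a hypothetical helper, min_vlen_bits and lmul model TARGET_MIN_VLEN and
riscv_autovec_lmul, and the mode_exists callback stands in for the
get_vector_mode (QImode, size) existence check, which depends on the target.

```cpp
#include <functional>
#include <vector>

// Collect candidate VLS vector sizes in bytes, largest first, by repeatedly
// halving the base size until it reaches 0 or no vector mode of that size
// exists.  mode_exists is a hypothetical predicate supplied by the caller.
std::vector<unsigned> vls_sizes(
    unsigned min_vlen_bits, unsigned lmul,
    const std::function<bool(unsigned)> &mode_exists) {
  std::vector<unsigned> sizes;
  unsigned base_size = min_vlen_bits * lmul / 8;  // bits to bytes
  unsigned size = base_size;
  unsigned i = 0;
  while (size > 0 && mode_exists(size)) {
    sizes.push_back(size);
    ++i;
    size = base_size / (1U << i);  // base_size / 2^i
  }
  return sizes;
}
```

With min_vlen_bits = 128, lmul = 1 and a predicate that accepts every size,
the loop visits 16, 8, 4, 2 and 1 bytes and stops when the division yields 0.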

Re: [PATCH] RISC-V: Enable basic VLS auto-vectorization

2023-07-30 Thread Kito Cheng via Gcc-patches
Hi Juzhe:

> * config/riscv/riscv.cc (riscv_estimated_poly_value): Fix incorrect 
> poly estimation.

Is it a necessary change for the VLS autovectorization, or could it be a
separate change?


[committed] MAINTAINERS: Add myself to write after approval

2023-07-30 Thread Li Xu
From: xuli 

Signed-off-by: Li Xu 

ChangeLog:

* MAINTAINERS: Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e9b11b43a0f..49aa6bae73b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -712,6 +712,7 @@ Jonathan Wright 

 Ruoyao Xi  
 Mingjie Xing   
 Chenghua Xu
+Li Xu  
 Canqun Yang
 Fei Yang   
 Jeffrey Yasskin
-- 
2.17.1



[pushed] wwwdocs: gcc-4.5: Update link to GNU MPC

2023-07-30 Thread Gerald Pfeifer


---
 htdocs/gcc-4.5/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-4.5/changes.html b/htdocs/gcc-4.5/changes.html
index 2e8f56a7..3d645bb3 100644
--- a/htdocs/gcc-4.5/changes.html
+++ b/htdocs/gcc-4.5/changes.html
@@ -18,7 +18,7 @@
 
   
 GCC now requires the https://www.multiprecision.org/mpc/;>MPC library in order to
+href="https://www.multiprecision.org;>MPC library in order to
 build.  See the https://gcc.gnu.org/install/prerequisites.html;>prerequisites
 page for version requirements.
-- 
2.41.0


[RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-07-30 Thread Uros Bizjak via Gcc-patches
Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
named patterns in order to avoid generation of partial vector V4SFmode
trapping instructions.

The new option is enabled by default, because even with sanitization,
a small but consistent speed up of 2 to 3% with Polyhedron capacita
benchmark can be achieved vs. scalar code.

Using -fno-trapping-math improves Polyhedron capacita runtime by 8 to 9%
vs. scalar code.  This is what clang does by default, as it defaults
to -fno-trapping-math.
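
As a rough illustration of why the upper part matters when trapping math is
in effect, the sketch below emulates a 2-lane add carried out as a 4-lane
operation and reports the floating-point exception flags it raises.  It is
not the GCC implementation: flags_from_padded_add is a hypothetical helper,
and the scalar loop only models what a partial-vector V4SF instruction would
compute on all four lanes.

```cpp
#include <cfenv>
#include <cfloat>

// Add two 2-lane vectors using a 4-lane operation.  upper_a and upper_b model
// whatever happens to sit in lanes 2 and 3 of the wider register.
// Returns the floating-point exception flags raised by the 4-lane add.
int flags_from_padded_add(const float lo_a[2], const float lo_b[2],
                          float upper_a, float upper_b) {
  // volatile keeps the compiler from folding the additions at compile time.
  volatile float a[4] = {lo_a[0], lo_a[1], upper_a, upper_a};
  volatile float b[4] = {lo_b[0], lo_b[1], upper_b, upper_b};
  volatile float r[4];
  std::feclearexcept(FE_ALL_EXCEPT);
  for (int i = 0; i < 4; ++i)
    r[i] = a[i] + b[i];  // lanes 2 and 3 are computed even though unused
  return std::fetestexcept(FE_ALL_EXCEPT);
}
```

With the upper lanes zeroed, adding {1, 2} and {3, 4} raises no flags.  With
FLT_MAX left in the upper lanes, the same add raises FE_OVERFLOW for lanes
that are not part of the result, which is the kind of spurious exception the
sanitization guards against and which is not observable once trapping math is
disabled.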

PR target/110832

gcc/ChangeLog:

* config/i386/i386.h (TARGET_MMXFP_WITH_SSE): New macro.
* config/i386/i386.opt (mmmxfp-with-sse): New option.
* config/i386/mmx.md (movq__to_sse): Do not sanitize
upper part of V2SFmode register with -fno-trapping-math.
(v2sf3): Enable for TARGET_MMXFP_WITH_SSE.
(divv2sf3): Ditto.
(v2sf3): Ditto.
(sqrtv2sf2): Ditto.
(*mmx_haddv2sf3_low): Ditto.
(*mmx_hsubv2sf3_low): Ditto.
(vec_addsubv2sf3): Ditto.
(vec_cmpv2sfv2si): Ditto.
(vcondv2sf): Ditto.
(fmav2sf4): Ditto.
(fmsv2sf4): Ditto.
(fnmav2sf4): Ditto.
(fnmsv2sf4): Ditto.
(fix_truncv2sfv2si2): Ditto.
(fixuns_truncv2sfv2si2): Ditto.
(floatv2siv2sf2): Ditto.
(floatunsv2siv2sf2): Ditto.
(nearbyintv2sf2): Ditto.
(rintv2sf2): Ditto.
(lrintv2sfv2si2): Ditto.
(ceilv2sf2): Ditto.
(lceilv2sfv2si2): Ditto.
(floorv2sf2): Ditto.
(lfloorv2sfv2si2): Ditto.
(btruncv2sf2): Ditto.
(roundv2sf2): Ditto.
(lroundv2sfv2si2): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..af72b6c48a9 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -50,6 +50,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_16BIT_P(x)  TARGET_CODE16_P(x)
 
 #define TARGET_MMX_WITH_SSE    (TARGET_64BIT && TARGET_SSE2)
+#define TARGET_MMXFP_WITH_SSE  (TARGET_MMX_WITH_SSE && ix86_mmxfp_with_sse)
 
 #include "config/vxworks-dummy.h"
 
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1cc8563477a..1b65fed5daf 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -670,6 +670,10 @@ m3dnowa
 Target Mask(ISA_3DNOW_A) Var(ix86_isa_flags) Save
 Support Athlon 3Dnow! built-in functions.
 
+mmmxfp-with-sse
+Target Var(ix86_mmxfp_with_sse) Init(1)
+Enable MMX floating point vectors in SSE registers
+
 msse
 Target Mask(ISA_SSE) Var(ix86_isa_flags) Save
 Support MMX and SSE built-in functions and code generation.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 896af76a33f..0555da9022b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -597,7 +597,18 @@ (define_expand "movq__to_sse"
  (match_operand:V2FI 1 "nonimmediate_operand")
  (match_dup 2)))]
   "TARGET_SSE2"
-  "operands[2] = CONST0_RTX (mode);")
+{
+  if (mode == V2SFmode
+  && !flag_trapping_math)
+{
+  rtx op1 = force_reg (mode, operands[1]);
+  emit_move_insn (operands[0], lowpart_subreg (mode,
+  op1, mode));
+  DONE;
+}
+
+  operands[2] = CONST0_RTX (mode);
+})
 
 ;
 ;;
@@ -650,7 +661,7 @@ (define_expand "v2sf3"
(plusminusmult:V2SF
  (match_operand:V2SF 1 "nonimmediate_operand")
  (match_operand:V2SF 2 "nonimmediate_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMXFP_WITH_SSE"
 {
   rtx op2 = gen_reg_rtx (V4SFmode);
   rtx op1 = gen_reg_rtx (V4SFmode);
@@ -728,7 +739,7 @@ (define_expand "divv2sf3"
   [(set (match_operand:V2SF 0 "register_operand")
(div:V2SF (match_operand:V2SF 1 "register_operand")
  (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMXFP_WITH_SSE"
 {
   rtx op2 = gen_reg_rtx (V4SFmode);
   rtx op1 = gen_reg_rtx (V4SFmode);
@@ -750,7 +761,7 @@ (define_expand "v2sf3"
 (smaxmin:V2SF
  (match_operand:V2SF 1 "register_operand")
  (match_operand:V2SF 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMXFP_WITH_SSE"
 {
   rtx op2 = gen_reg_rtx (V4SFmode);
   rtx op1 = gen_reg_rtx (V4SFmode);
@@ -852,7 +863,7 @@ (define_insn "mmx_rcpit2v2sf3"
 (define_expand "sqrtv2sf2"
   [(set (match_operand:V2SF 0 "register_operand")
(sqrt:V2SF (match_operand:V2SF 1 "nonimmediate_operand")))]
-  "TARGET_MMX_WITH_SSE"
+  "TARGET_MMXFP_WITH_SSE"
 {
   rtx op1 = gen_reg_rtx (V4SFmode);
   rtx op0 = gen_reg_rtx (V4SFmode);
@@ -933,7 +944,7 @@ (define_insn_and_split "*mmx_haddv2sf3_low"
  (vec_select:SF
(match_dup 1)
(parallel [(match_operand:SI 3 "const_0_to_1_operand")]]
-  "TARGET_SSE3 && TARGET_MMX_WITH_SSE
+  "TARGET_SSE3 && TARGET_MMXFP_WITH_SSE
&& INTVAL (operands[2]) != INTVAL (operands[3])
&& ix86_pre_reload_split ()"
   "#"
@@ -979,7 +990,7 

[committed] Fix several preprocessor directives

2023-07-30 Thread François Dumont via Gcc-patches

Committed as obvious.

    libstdc++: Fix several preprocessor directives

    A wrong usage of #define in place of a #error seems to have been replicated
    at different places in source files.

    libstdc++-v3/ChangeLog:

            * src/c++11/compatibility-ldbl-facets-aliases.h: Replace #define with
            proper #error.
            * src/c++11/locale-inst-monetary.h: Likewise.
            * src/c++11/locale-inst-numeric.h: Likewise.

François
diff --git a/libstdc++-v3/src/c++11/compatibility-ldbl-facets-aliases.h b/libstdc++-v3/src/c++11/compatibility-ldbl-facets-aliases.h
index 70c9342d88a..faf8221b273 100644
--- a/libstdc++-v3/src/c++11/compatibility-ldbl-facets-aliases.h
+++ b/libstdc++-v3/src/c++11/compatibility-ldbl-facets-aliases.h
@@ -23,11 +23,11 @@
 // .
 
 #ifndef C
-#define "This file should not be compiled directly, only included"
+# error "This file should not be compiled directly, only included"
 #endif
 
 #ifndef _GLIBCXX_LONG_DOUBLE_COMPAT
-#define "This file should only be used for _GLIBCXX_LONG_DOUBLE_COMPAT builds"
+# error "This file should only be used for _GLIBCXX_LONG_DOUBLE_COMPAT builds"
 #endif
 
 // XXX GLIBCXX_ABI Deprecated
diff --git a/libstdc++-v3/src/c++11/locale-inst-monetary.h b/libstdc++-v3/src/c++11/locale-inst-monetary.h
index d8fecf26596..954de1f52cf 100644
--- a/libstdc++-v3/src/c++11/locale-inst-monetary.h
+++ b/libstdc++-v3/src/c++11/locale-inst-monetary.h
@@ -23,7 +23,7 @@
 // .
 
 #ifndef C
-#define "This file should not be compiled directly, only included"
+# error "This file should not be compiled directly, only included"
 #endif
 
 #include "facet_inst_macros.h"
diff --git a/libstdc++-v3/src/c++11/locale-inst-numeric.h b/libstdc++-v3/src/c++11/locale-inst-numeric.h
index c77ee9e8d38..b917fe5802e 100644
--- a/libstdc++-v3/src/c++11/locale-inst-numeric.h
+++ b/libstdc++-v3/src/c++11/locale-inst-numeric.h
@@ -23,7 +23,7 @@
 // .
 
 #ifndef C
-#define "This file should not be compiled directly, only included"
+# error "This file should not be compiled directly, only included"
 #endif
 
 #include "facet_inst_macros.h"