Re: [PATCH v2] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Kito Cheng via Gcc-patches
LGTM

 wrote on Wed, Aug 16, 2023 at 13:17:

> From: Pan Li 
>
> This patch supports the rounding mode API for VFCVT.X.F.V, as in the
> samples below.
>
> * __riscv_vfcvt_x_f_v_i32m1_rm
> * __riscv_vfcvt_x_f_v_i32m1_rm_m
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc
> (enum frm_op_type): New type for frm.
> (BASE): New declaration.
> * config/riscv/riscv-vector-builtins-bases.h: Ditto.
> * config/riscv/riscv-vector-builtins-functions.def
> (vfcvt_x_frm): New intrinsic function def.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-cvt-x.c: New test.
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  | 15 +-
>  .../riscv/riscv-vector-builtins-bases.h   |  1 +
>  .../riscv/riscv-vector-builtins-functions.def |  2 ++
>  .../riscv/rvv/base/float-point-cvt-x.c| 29 +++
>  4 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index f2124080ef9..817d2ed016a 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -58,6 +58,12 @@ enum lst_type
>LST_INDEXED,
>  };
>
> +enum frm_op_type
> +{
> +  NO_FRM,
> +  HAS_FRM,
> +};
> +
>  /* Helper function to fold vleff and vlsegff.  */
>  static gimple *
>  fold_fault_load (gimple_folder )
> @@ -1662,10 +1668,15 @@ public:
>  };
>
>  /* Implements vfcvt.x.  */
> -template
> +template
>  class vfcvt_x : public function_base
>  {
>  public:
> +  bool has_rounding_mode_operand_p () const override
> +  {
> +return FRM_OP == HAS_FRM;
> +  }
> +
>rtx expand (function_expander ) const override
>{
>  return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode
> (0)));
> @@ -2465,6 +2476,7 @@ static CONSTEXPR const vfclass vfclass_obj;
>  static CONSTEXPR const vmerge vfmerge_obj;
>  static CONSTEXPR const vmv_v vfmv_v_obj;
>  static CONSTEXPR const vfcvt_x vfcvt_x_obj;
> +static CONSTEXPR const vfcvt_x vfcvt_x_frm_obj;
>  static CONSTEXPR const vfcvt_x vfcvt_xu_obj;
>  static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_x_obj;
>  static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_xu_obj;
> @@ -2714,6 +2726,7 @@ BASE (vfclass)
>  BASE (vfmerge)
>  BASE (vfmv_v)
>  BASE (vfcvt_x)
> +BASE (vfcvt_x_frm)
>  BASE (vfcvt_xu)
>  BASE (vfcvt_rtz_x)
>  BASE (vfcvt_rtz_xu)
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h
> b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index 2a9381eec5e..50a7d7ffb6f 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -205,6 +205,7 @@ extern const function_base *const vfclass;
>  extern const function_base *const vfmerge;
>  extern const function_base *const vfmv_v;
>  extern const function_base *const vfcvt_x;
> +extern const function_base *const vfcvt_x_frm;
>  extern const function_base *const vfcvt_xu;
>  extern const function_base *const vfcvt_rtz_x;
>  extern const function_base *const vfcvt_rtz_xu;
> diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def
> b/gcc/config/riscv/riscv-vector-builtins-functions.def
> index 34def6bb82f..8b6a7cc49f3 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-functions.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
> @@ -445,6 +445,8 @@ DEF_RVV_FUNCTION (vfcvt_rtz_xu, alu, full_preds,
> f_to_u_f_v_ops)
>  DEF_RVV_FUNCTION (vfcvt_f, alu, full_preds, i_to_f_x_v_ops)
>  DEF_RVV_FUNCTION (vfcvt_f, alu, full_preds, u_to_f_xu_v_ops)
>
> +DEF_RVV_FUNCTION (vfcvt_x_frm, alu_frm, full_preds, f_to_i_f_v_ops)
> +
>  // 13.18. Widening Floating-Point/Integer Type-Convert Instructions
>  DEF_RVV_FUNCTION (vfwcvt_x, alu, full_preds, f_to_wi_f_v_ops)
>  DEF_RVV_FUNCTION (vfwcvt_xu, alu, full_preds, f_to_wu_f_v_ops)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c
> new file mode 100644
> index 000..e090f0f97e9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +vint32m1_t
> +test_riscv_vfcvt_x_f_vv_i32m1_rm (vfloat32m1_t op1, size_t vl) {
> +  return __riscv_vfcvt_x_f_v_i32m1_rm (op1, 0, vl);
> +}
> +
> +vint32m1_t
> +test_vfcvt_x_f_vv_i32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, size_t
> vl) {
> +  return __riscv_vfcvt_x_f_v_i32m1_rm_m (mask, op1, 1, vl);
> +}
> +
> +vint32m1_t
> +test_riscv_vfcvt_x_f_vv_i32m1 (vfloat32m1_t op1, size_t vl) {
> +  return __riscv_vfcvt_x_f_v_i32m1 (op1, vl);
> +}
> +
> +vint32m1_t
> +test_vfcvt_x_f_vv_i32m1_m (vbool32_t mask, vfloat32m1_t op1, size_t vl) {
> +  

Re: [PATCH] Makefile.in: Add variable TM_P_H2 for TM_P_H dependency [PR111021]

2023-08-15 Thread Kewen.Lin via Gcc-patches
on 2023/8/16 10:31, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR111021 shows, the below ${port}-protos.h include tree.h
> for code_helper and tree_code:
> 
>   arm/arm-protos.h:#include "tree.h"
>   cris/cris-protos.h:#include "tree.h"  (H-P removed this in r14-3218)
>   microblaze/microblaze-protos.h:#include "tree.h"
>   rl78/rl78-protos.h:#include "tree.h"
>   stormy16/stormy16-protos.h:#include "tree.h"
> 
> , when compiling build/gencondmd.cc, the include hierarchy
> makes it depend on tm_p.h -> ${port}-protos.h -> tree.h,
> which further includes (depends on) some files that are
> generated during the building, such as: all-tree.def,
> tree-check.h and so on.  The previous commit r14-3215
> should already force build/gencondmd.cc to depend on
> ${TREE_H}, so the reported build failure should be gone.
> 
> But for long-term maintenance, especially if one day some
> build/xxx.cc requires tm_p.h but not recog.h, the ${TREE_H}
> dependence could be missed and a build failure would show
> up.  So this patch adds one variable under the section
> "# Shorthand variables for dependency lists." to explicitly
> indicate that tm_p.h, which includes ${port}-protos.h, should
> depend on ${TREE_H}.  Then any new build/xxx.cc depending
> on tm_p.h will be able to consider ${TREE_H}.
> 
> Note that the existing ${TM_P_H} variable is also used for
> "generated_files" and isn't dedicated to dependencies, so
> a variable named ${TM_P_H2} is proposed and put under
> "# Shorthand variables for dependency lists."; the only
> use as a dependence is updated accordingly.

I did some more checking and found that not all files in
$(generated_files) are **generated**; some of them actually
sit in the source directory.  I misinterpreted the variable
from its name.  I think we can just update the existing
${TM_P_H} instead of adding a new variable.

I'll post a new patch after some testing, sorry for the noise!

BR,
Kewen


[PATCH v2] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch supports the rounding mode API for VFCVT.X.F.V, as in the
samples below.

* __riscv_vfcvt_x_f_v_i32m1_rm
* __riscv_vfcvt_x_f_v_i32m1_rm_m
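
For readers following along, here is a rough host-side sketch of what the
extra rounding mode operand means.  The tests in this patch pass 0 and 1,
which in the RISC-V frm encoding are RNE and RTZ.  Everything named below
(frm_value, convert_with_frm) is hypothetical and only models a single
element on the host with <cfenv>; it is not how the intrinsic is
implemented, and NaN or out-of-range inputs are not modelled.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Static rounding-mode encodings of the RISC-V frm field.
enum frm_value { FRM_RNE = 0, FRM_RTZ = 1, FRM_RDN = 2, FRM_RUP = 3, FRM_RMM = 4 };

// Hypothetical model of converting one float element to an integer under
// the given frm value.  This is an illustration, not GCC or hardware code.
long convert_with_frm (float value, frm_value frm)
{
  // volatile keeps the compiler from folding the conversion at compile time.
  volatile float input = value;

  // RMM (ties away from zero) has no <cfenv> mode; lround has that behaviour.
  if (frm == FRM_RMM)
    return std::lround (input);

  int host_mode = FE_TONEAREST;		// FRM_RNE
  if (frm == FRM_RTZ)
    host_mode = FE_TOWARDZERO;
  else if (frm == FRM_RDN)
    host_mode = FE_DOWNWARD;
  else if (frm == FRM_RUP)
    host_mode = FE_UPWARD;

  // Switch the host rounding mode, convert, then restore the old mode.
  const int saved_mode = std::fegetround ();
  std::fesetround (host_mode);
  volatile long result = std::lrint (input);
  std::fesetround (saved_mode);
  return result;
}
```

With this model, 2.5f converts to 2 under FRM_RNE and FRM_RTZ, and to 3
under FRM_RUP and FRM_RMM.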

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(enum frm_op_type): New type for frm.
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_x_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-cvt-x.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 15 +-
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  2 ++
 .../riscv/rvv/base/float-point-cvt-x.c| 29 +++
 4 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index f2124080ef9..817d2ed016a 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -58,6 +58,12 @@ enum lst_type
   LST_INDEXED,
 };
 
+enum frm_op_type
+{
+  NO_FRM,
+  HAS_FRM,
+};
+
 /* Helper function to fold vleff and vlsegff.  */
 static gimple *
 fold_fault_load (gimple_folder )
@@ -1662,10 +1668,15 @@ public:
 };
 
 /* Implements vfcvt.x.  */
-template
+template
 class vfcvt_x : public function_base
 {
 public:
+  bool has_rounding_mode_operand_p () const override
+  {
+return FRM_OP == HAS_FRM;
+  }
+
   rtx expand (function_expander ) const override
   {
 return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
@@ -2465,6 +2476,7 @@ static CONSTEXPR const vfclass vfclass_obj;
 static CONSTEXPR const vmerge vfmerge_obj;
 static CONSTEXPR const vmv_v vfmv_v_obj;
 static CONSTEXPR const vfcvt_x vfcvt_x_obj;
+static CONSTEXPR const vfcvt_x vfcvt_x_frm_obj;
 static CONSTEXPR const vfcvt_x vfcvt_xu_obj;
 static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_x_obj;
 static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_xu_obj;
@@ -2714,6 +2726,7 @@ BASE (vfclass)
 BASE (vfmerge)
 BASE (vfmv_v)
 BASE (vfcvt_x)
+BASE (vfcvt_x_frm)
 BASE (vfcvt_xu)
 BASE (vfcvt_rtz_x)
 BASE (vfcvt_rtz_xu)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 2a9381eec5e..50a7d7ffb6f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -205,6 +205,7 @@ extern const function_base *const vfclass;
 extern const function_base *const vfmerge;
 extern const function_base *const vfmv_v;
 extern const function_base *const vfcvt_x;
+extern const function_base *const vfcvt_x_frm;
 extern const function_base *const vfcvt_xu;
 extern const function_base *const vfcvt_rtz_x;
 extern const function_base *const vfcvt_rtz_xu;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 34def6bb82f..8b6a7cc49f3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -445,6 +445,8 @@ DEF_RVV_FUNCTION (vfcvt_rtz_xu, alu, full_preds, 
f_to_u_f_v_ops)
 DEF_RVV_FUNCTION (vfcvt_f, alu, full_preds, i_to_f_x_v_ops)
 DEF_RVV_FUNCTION (vfcvt_f, alu, full_preds, u_to_f_xu_v_ops)
 
+DEF_RVV_FUNCTION (vfcvt_x_frm, alu_frm, full_preds, f_to_i_f_v_ops)
+
 // 13.18. Widening Floating-Point/Integer Type-Convert Instructions
 DEF_RVV_FUNCTION (vfwcvt_x, alu, full_preds, f_to_wi_f_v_ops)
 DEF_RVV_FUNCTION (vfwcvt_xu, alu, full_preds, f_to_wu_f_v_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c
new file mode 100644
index 000..e090f0f97e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-cvt-x.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+vint32m1_t
+test_riscv_vfcvt_x_f_vv_i32m1_rm (vfloat32m1_t op1, size_t vl) {
+  return __riscv_vfcvt_x_f_v_i32m1_rm (op1, 0, vl);
+}
+
+vint32m1_t
+test_vfcvt_x_f_vv_i32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, size_t vl) {
+  return __riscv_vfcvt_x_f_v_i32m1_rm_m (mask, op1, 1, vl);
+}
+
+vint32m1_t
+test_riscv_vfcvt_x_f_vv_i32m1 (vfloat32m1_t op1, size_t vl) {
+  return __riscv_vfcvt_x_f_v_i32m1 (op1, vl);
+}
+
+vint32m1_t
+test_vfcvt_x_f_vv_i32m1_m (vbool32_t mask, vfloat32m1_t op1, size_t vl) {
+  return __riscv_vfcvt_x_f_v_i32m1_m (mask, op1, vl);
+}
+
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+} 4 } 
} */
+/* { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { 

Re: [PATCH] mklog: fix bugs of --append option

2023-08-15 Thread Lehua Ding
Hi Jeff,


Can you take a look at this little patch?
It's a bugfix that only affects the --append
option that I added earlier and nothing else.
Please also let me know if there is a more
suitable reviewer. Thank you so much.


Best,
Lehua







Re: RISC-V: Added support for CRC.

2023-08-15 Thread Jeff Law via Gcc-patches



On 8/3/23 13:37, Mariam Harutyunyan via Gcc-patches wrote:

> This patch adds CRC support for the RISC-V architecture. It adds internal
> functions and built-ins specifically designed to handle CRC computations
> efficiently.
>
> If the target is ZBC, the clmul instruction is used for the CRC code
> generation; otherwise, table-based CRC is generated.  A table with 256
> elements is used to store precomputed CRCs.
>
> These CRC calculation algorithms have higher performance than the naive CRC
> calculation algorithm.
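
As a point of reference for the description above, here is a minimal
standalone sketch contrasting the naive bit-at-a-time loop with a 256-entry
table lookup.  It assumes the common reflected CRC-32 parameters (polynomial
0xEDB88320, initial value and final xor of 0xFFFFFFFF); the function names
are made up for illustration and this is not the code the patch generates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reflected CRC-32 polynomial; an assumption for this sketch only.
constexpr std::uint32_t kPoly = 0xEDB88320u;

// Naive variant: one conditional shift/xor per input bit.
std::uint32_t crc32_bitwise (const unsigned char *data, std::size_t len)
{
  assert (data != nullptr || len == 0);
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; ++bit)
	crc = (crc & 1u) ? (crc >> 1) ^ kPoly : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFu;
}

// Precompute the 8-step result for every possible byte value: 256 entries.
std::array<std::uint32_t, 256> make_crc_table ()
{
  std::array<std::uint32_t, 256> table{};
  for (std::uint32_t n = 0; n < 256; ++n)
    {
      std::uint32_t c = n;
      for (int bit = 0; bit < 8; ++bit)
	c = (c & 1u) ? (c >> 1) ^ kPoly : c >> 1;
      table[n] = c;
    }
  return table;
}

// Table-based variant: one lookup per input byte instead of eight steps.
std::uint32_t crc32_table (const unsigned char *data, std::size_t len)
{
  assert (data != nullptr || len == 0);
  static const std::array<std::uint32_t, 256> table = make_crc_table ();
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
  return crc ^ 0xFFFFFFFFu;
}
```

Both functions return the standard check value 0xCBF43926 for the nine
ASCII bytes "123456789".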

[ ... ]
Various comments attached.
From 9d2e9023c222501a1d9519bea3d5cdbd32b5a91e Mon Sep 17 00:00:00 2001
From: Mariam Arutunian 
Date: Thu, 3 Aug 2023 15:59:57 +0400
Subject: [PATCH] RISC-V: Added support for CRC.

  If the target is ZBC, then the clmul instruction is used for the CRC code generation;
  otherwise, table-based CRC is generated. A table with 256 elements is used to store precomputed CRCs.

  gcc/ChangeLog:
	*builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define.
	(BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise.
	(BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
	(BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise.
	(BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise.
	(BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
	(BT_FN_UINT64_UINT64_UINT64_CONST_SIZE): Likewise.
	* builtins.cc (associated_internal_fn): Handle BUILT_IN_CRC8_DATA8,
	BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
	BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, BUILT_IN_CRC32_DATA32,
	BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, BUILT_IN_CRC64_DATA32,
	BUILT_IN_CRC64_DATA64.
	* builtins.def (BUILT_IN_CRC8_DATA8): New builtin.
	(BUILT_IN_CRC16_DATA8): Likewise.
	(BUILT_IN_CRC16_DATA16): Likewise.
	(BUILT_IN_CRC32_DATA8): Likewise.
	(BUILT_IN_CRC32_DATA16): Likewise.
	(BUILT_IN_CRC32_DATA32): Likewise.
	(BUILT_IN_CRC64_DATA8): Likewise.
	(BUILT_IN_CRC64_DATA16): Likewise.
	(BUILT_IN_CRC64_DATA32): Likewise.
	(BUILT_IN_CRC64_DATA64): Likewise.
	* config/riscv/bitmanip.md (crc4): New expander.
	* config/riscv/riscv-protos.h (expand_crc_table_based): Declare.
	(expand_crc_using_clmul): Likewise.
	* config/riscv/riscv.cc (gf2n_poly_long_div_quotient): New function.
	(generate_crc): Likewise.
	(generate_crc_table): Likewise.
	(expand_crc_table_based): Likewise.
	(expand_crc_using_clmul): Likewise.
	* config/riscv/riscv.md (UNSPEC_CRC): New unspec for CRC.
	* internal-fn.cc (crc_direct): Define.
	(expand_crc_optab_fn): New function.
	(direct_crc_optab_supported_p): Define.
	* internal-fn.def (CRC): New internal optab function.
	* optabs.def (crc_optab): New optab.

  gcc/testsuite/ChangeLog:
	* gcc.target/riscv/crc-builtin-table-target32.c: New test.
	* gcc.target/riscv/crc-builtin-table-target64.c: New test.
	* gcc.target/riscv/crc-builtin-zbc32.c: New test.
	* gcc.target/riscv/crc-builtin-zbc64.c: New test.
---
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2eab466a9f8..748d8be384b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
In general, the ChangeLog entry is just sent like you've done above and
not included as a diff.


diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 43381bc8949..e33837c27d0 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -829,6 +829,26 @@ DEF_FUNCTION_TYPE_3 (BT_FN_PTR_SIZE_SIZE_PTRMODE,
 		 BT_PTR, BT_SIZE, BT_SIZE, BT_PTRMODE)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_PTRMODE, BT_VOID, BT_PTR, BT_UINT8,
 		 BT_PTRMODE)
+DEF_FUNCTION_TYPE_3 (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE, BT_UINT8, BT_UINT8,
+		 BT_UINT8, BT_CONST_SIZE)
[ ... ]
Presumably the reason we need so many variants is due to the desire to
support various types for the inputs and outputs?


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index c42e7b890db..4c896303242 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -856,3 +856,38 @@
   "TARGET_ZBC"
   "clmulr\t%0,%1,%2"
   [(set_attr "type" "clmul")])
+
+;; Iterator for hardware-supported integer modes, same as ANYI
+(define_mode_iterator ANYI2 [QI HI SI (DI "TARGET_64BIT")])
+
+;; CRC 8, 16, 32, (64 for TARGET_64)
+(define_expand "crc4"
+	;; return value (calculated CRC)
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+		  ;; initial CRC
+	(unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r")
+		  ;; data
+		  (match_operand:ANYI2 2 "register_operand" "r")
+		  ;; polynomial
+		  (match_operand:ANYI 3)]
+		  UNSPEC_CRC))]
I'm not real comfortable with directly supporting sub-word sizes for the
output operand.  The vast majority of insns on the risc-v port have
outputs that use either the X iterator (which maps to either SI or DI
mode for rv32 and rv64 respectively) or they use the GPR iterator.  I
don't think many use ANYI.

Which ultimately makes sense since operations actually write X mode

Re: [PATCH v4] Introduce attribute sym

2023-08-15 Thread Alexandre Oliva via Gcc-patches
On Jul 22, 2023, Fangrui Song  wrote:

> I wonder whether this attribute can be named "alias" without arguments.

Erhm...  Maybe I'm missing something about your suggestion, but without
arguments, how would we tell the compiler the symbol name of the
additional alias we want for the definition?

Maybe, instead of no arguments, we could use something like:

  attribute (alias (..., "alt_sym_name"))

but I don't find that clearer, and I have a hunch that the
implementation would be significantly more convoluted.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice but
very few check the facts.  Think Assange & Stallman.  The empires strike back


Re: RISC-V: Added support for CRC.

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/9/23 07:02, Paul Koning wrote:
>
>> On Aug 9, 2023, at 2:32 AM, Alexander Monakov  wrote:
>>
>> On Tue, 8 Aug 2023, Jeff Law wrote:
>>
>>> If the compiler can identify a CRC and collapse it down to a table or clmul,
>>> that's a major win and such code does exist in the real world. That was the
>>> whole point behind the Fedora experiment -- to determine if these things are
>>> showing up in the real world or if this is just a benchmarking exercise.
>>
>> Can you share the results of the experiment and give your estimate of what
>> sort of real-world improvement is expected? I already listed the popular
>> FOSS projects where CRC performance is important: the Linux kernel and
>> a few compression libraries. Those projects do not use a bitwise CRC loop,
>> except sometimes for table generation on startup (which needs less time
>> than a page fault that may be necessary to bring in a hardcoded table).
>>
>> For those projects that need a better CRC, why is the chosen solution
>> to optimize it in the compiler instead of offering them a library they
>> could use with any compiler?
>>
>> Was there any thought given to embedded projects that use bitwise CRC
>> exactly because they have little space for a hardcoded table to spare?
>
> Or those that use smaller tables -- for example, the classic VAX microcode
> approach with a 16-entry table, doing CRC 4 bits at a time.

Yup.  I think we settled on 8 bits at a time for the table variant.  It
seemed like a good tradeoff between size of the tables and speed.
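
To make the size/speed tradeoff concrete, here is a small sketch of the
16-entry, 4-bits-at-a-time approach mentioned above.  It assumes the
reflected CRC-32 parameters (polynomial 0xEDB88320, initial value and final
xor of 0xFFFFFFFF); the names are hypothetical and this is neither the VAX
microcode nor what the patch emits.  The table costs 64 bytes instead of
1 KiB, at the price of two lookups per byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reflected CRC-32 polynomial; an assumption for this sketch only.
constexpr std::uint32_t kCrc32Poly = 0xEDB88320u;

struct nibble_table
{
  std::uint32_t entry[16];
};

// Precompute the 4-step result for every possible nibble value: 16 entries.
nibble_table make_nibble_table ()
{
  nibble_table t{};
  for (std::uint32_t n = 0; n < 16; ++n)
    {
      std::uint32_t c = n;
      for (int bit = 0; bit < 4; ++bit)
	c = (c & 1u) ? (c >> 1) ^ kCrc32Poly : c >> 1;
      t.entry[n] = c;
    }
  return t;
}

// Two table lookups per input byte, low nibble first (reflected CRC).
std::uint32_t crc32_nibble (const unsigned char *data, std::size_t len)
{
  assert (data != nullptr || len == 0);
  static const nibble_table t = make_nibble_table ();
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    {
      crc ^= data[i];
      crc = t.entry[crc & 0xFu] ^ (crc >> 4);	// consume the low nibble
      crc = t.entry[crc & 0xFu] ^ (crc >> 4);	// consume the high nibble
    }
  return crc ^ 0xFFFFFFFFu;
}
```

For the nine ASCII bytes "123456789" this returns the standard CRC-32
check value 0xCBF43926, the same as a 256-entry implementation would.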




> I agree that this seems an odd thing to optimize.  CRC is a well known CPU hog
> with well established efficient solutions, and it's hard to see why anyone who
> needs good performance would fail to understand and apply that knowledge.

As I've said, what started us down this path was Coremark. But what
convinced me that this was useful beyond juicing benchmark data was
finding the various implementations in the wild.


Jeff


Re: RISC-V: Added support for CRC.

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/9/23 00:32, Alexander Monakov wrote:
>
> On Tue, 8 Aug 2023, Jeff Law wrote:
>
>> If the compiler can identify a CRC and collapse it down to a table or clmul,
>> that's a major win and such code does exist in the real world. That was the
>> whole point behind the Fedora experiment -- to determine if these things are
>> showing up in the real world or if this is just a benchmarking exercise.
>
> Can you share the results of the experiment and give your estimate of what
> sort of real-world improvement is expected? I already listed the popular
> FOSS projects where CRC performance is important: the Linux kernel and
> a few compression libraries. Those projects do not use a bitwise CRC loop,
> except sometimes for table generation on startup (which needs less time
> than a page fault that may be necessary to bring in a hardcoded table).

That experiment was ~7 months ago.  I don't think any of the data is
still around except for some extracted testcases.




> For those projects that need a better CRC, why is the chosen solution
> to optimize it in the compiler instead of offering them a library they
> could use with any compiler?

Because if the compiler can optimize it automatically, then the projects
have to do literally nothing to take advantage of it.  They just compile
normally and their bitwise CRC gets optimized down to either a table
lookup or a clmul variant.  That's the real goal here.

If a step where we provide the backend bits hooked up to a builtin isn't
useful, then we won't pursue it.  The thinking was it would provide
value for those willing to make a slight change to their sources and at
the same time we get real world exposure for the backend work of the CRC
optimization effort while we polish the gimple detection bits.

> Was there any thought given to embedded projects that use bitwise CRC
> exactly because they have little space for a hardcoded table to spare?

It wasn't an explicit goal, but the ability to select between a table
implementation and a clmul implementation in the backend seemed useful,
so we wired up both.





> No, not if the compiler is not GCC, or its version is less than 14. And
> those projects are not going to sacrifice their portability just for
> __builtin_crc.

You may be right.  I don't think it's so clear-cut, though.

>>> I think offering a conventional library for CRC has substantial advantages.
>>
>> That's not what I asked.  If you think there's room for improvement to a
>> builtin API, I'd love to hear it.
>>
>> But it seems you don't think this is worth the effort at all.  That's
>> unfortunate, but if that's the consensus, then so be it.
>
> I think it's a strange application of development effort. You'd get more
> done coding a library.

Not if the end goal is to detect the CRC and optimize it into a table or
clmul without the user having to do anything special.

Again, what we've proposed in this patch is a piece of that larger body
of work, specifically the backend bits that we thought would have value
independently.  If the community doesn't see that carved out chunk as
helpful we'll table it until the whole end-to-end path is ready for
submission.






>> I'll note LLVM is likely going forward with CRC detection and optimization at
>> some point in the next ~6 months (effectively moving the implementation from
>> the hexagon port into the generic parts of their loop optimizer).
>
> I don't see CRC detection in the Hexagon port. There is a recognizer for
> polynomial multiplication (CRC is division, not multiplication).

Yes, you need the recognizer so that you can detect a CRC loop, then
with a bit of math you turn that into a carryless multiply sequence.  I
find the math here mindbending, but the Hexagon bits are precisely to
optimize CRC loops.  Sadly the Hexagon bits are fairly specific to the
CRC implementation inside coremark.  The GCC bits we've been working on
are much more general.

One final note.  Elsewhere in this thread you described performance
concerns.  Right now clmuls can be implemented in 4 cycles, fully
pipelined.  I fully expect that latency to drop within the next 12-18
months.  In that world, there's not going to be much benefit to using
hand-coded libraries vs just letting the compiler do it.
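
For anyone not familiar with the primitive under discussion, a carry-less
multiply is an ordinary shift-and-add multiply with the adds replaced by
xor, i.e. multiplication of polynomials over GF(2).  The sketch below is a
plain software model with a made-up name (clmul_sketch) that keeps only the
low 64 bits of the product; it does not show the reduction math that turns
a CRC into a clmul sequence.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a carry-less multiply, low 64 bits of the product.
// Each set bit i of b contributes a copy of a shifted left by i, and the
// partial products are combined with xor instead of addition.
std::uint64_t clmul_sketch (std::uint64_t a, std::uint64_t b)
{
  std::uint64_t product = 0;
  for (int i = 0; i < 64; ++i)
    if ((b >> i) & 1u)
      product ^= a << i;
  return product;
}
```

As a quick sanity check, clmul_sketch (3, 3) is 5 rather than 9, since
(x + 1) * (x + 1) = x^2 + 1 when coefficients are taken modulo 2.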


Jeff


RE: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Li, Pan2 via Gcc-patches
Got it, thanks!

I will start with CVT and the rest of the frm instructions first, and then refactor.

Pan

-Original Message-
From: Kito Cheng  
Sent: Wednesday, August 16, 2023 11:44 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; gcc-patches ; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic 
API

I would prefer to introduce an enum template argument and refactor
existing code later :)

On Wed, Aug 16, 2023 at 11:40 AM Li, Pan2 via Gcc-patches
 wrote:
>
> That should work as well, but may require some changes to existing code,
> such as declarations.
> I am OK with either the enum or the inheritance approach, and will start
> with the CVT parts, then refactor the existing frm classes.
>
> Do you have a preference between the two?
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Wednesday, August 16, 2023 11:30 AM
> To: Li, Pan2 
> Cc: juzhe.zh...@rivai.ai; gcc-patches ; Wang, 
> Yanzhang 
> Subject: Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode 
> intrinsic API
>
> Or using an enum value rather than bool?
>
> I am thinking we could also simplify/remove most other frm classes,
> some practical example:
>
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 2074dac0f16..ace63e963a5 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -58,6 +58,11 @@ enum lst_type
>   LST_INDEXED,
> };
>
> +enum frm_op_type
> +{
> +  NO_FRM,
> +  HAS_FRM
> +};
> /* Helper function to fold vleff and vlsegff.  */
> static gimple *
> fold_fault_load (gimple_folder )
> @@ -256,41 +261,22 @@ public:
>vremu/vsadd/vsaddu/vssub/vssubu
>vfadd/vfsub/
> */
> -template
> +template
> class binop : public function_base
> {
> public:
> -  rtx expand (function_expander ) const override
> +  bool has_rounding_mode_operand_p () const override
>   {
> -switch (e.op_info->op)
> -  {
> -  case OP_TYPE_vx:
> -  case OP_TYPE_vf:
> -   return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode 
> ()));
> -  case OP_TYPE_vv:
> -   return e.use_exact_insn (code_for_pred (CODE, e.vector_mode ()));
> -  default:
> -   gcc_unreachable ();
> -  }
> +return FRM_OP == HAS_FRM;
>   }
> -};
> -
> -/* Implements below instructions for now.
> -   - vfadd
> -   - vfsub
> -   - vfmul
> -   - vfdiv
> -*/
> -template
> -class binop_frm : public function_base
> -{
> -public:
> -  bool has_rounding_mode_operand_p () const override { return true; }
>
>   rtx expand (function_expander ) const override
>   {
> switch (e.op_info->op)
>   {
> +  case OP_TYPE_vx:
> +   gcc_assert (FRM_OP == NO_FRM);
> +   gcc_fallthrough ();
>   case OP_TYPE_vf:
>return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode 
> ()));
>   case OP_TYPE_vv:
> @@ -1648,10 +1634,15 @@ public:
> };
>
> /* Implements vfcvt.x.  */
> -template
> +template
> class vfcvt_x : public function_base
> {
> public:
> +  bool has_rounding_mode_operand_p () const override
> +  {
> +return FRM_OP == HAS_FRM;
> +  }
> +
>   rtx expand (function_expander ) const override
>   {
> return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
> @@ -2389,8 +2380,8 @@ static CONSTEXPR const viota viota_obj;
> static CONSTEXPR const vid vid_obj;
> static CONSTEXPR const binop vfadd_obj;
> static CONSTEXPR const binop vfsub_obj;
> -static CONSTEXPR const binop_frm vfadd_frm_obj;
> -static CONSTEXPR const binop_frm vfsub_frm_obj;
> +static CONSTEXPR const binop vfadd_frm_obj;
> +static CONSTEXPR const binop vfsub_frm_obj;
> static CONSTEXPR const reverse_binop vfrsub_obj;
> static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
> static CONSTEXPR const widen_binop vfwadd_obj;
> @@ -2398,9 +2389,9 @@ static CONSTEXPR const widen_binop_frm
> vfwadd_frm_obj;
> static CONSTEXPR const widen_binop vfwsub_obj;
> static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
> static CONSTEXPR const binop vfmul_obj;
> -static CONSTEXPR const binop_frm vfmul_frm_obj;
> +static CONSTEXPR const binop vfmul_frm_obj;
> static CONSTEXPR const binop vfdiv_obj;
> -static CONSTEXPR const binop_frm vfdiv_frm_obj;
> +static CONSTEXPR const binop vfdiv_frm_obj;
> static CONSTEXPR const reverse_binop vfrdiv_obj;
> static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
> static CONSTEXPR const widen_binop vfwmul_obj;


Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Kito Cheng via Gcc-patches
I would prefer to introduce an enum template argument and refactor
existing code later :)
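
A minimal standalone sketch of the enum template argument idea, for
illustration only.  The template parameter lists did not survive in the
archived diff below, so the exact signature here (including the NO_FRM
default) is an assumption, and function_base_sketch, vfcvt_x_sketch and
kUnspecVfcvtSketch are stand-ins rather than GCC's real declarations.

```cpp
#include <cassert>

enum frm_op_type
{
  NO_FRM,
  HAS_FRM
};

// Stand-in for GCC's function_base; only the hook relevant here is modelled.
struct function_base_sketch
{
  virtual ~function_base_sketch () = default;
  virtual bool has_rounding_mode_operand_p () const { return false; }
};

// One class template serves both variants; the enum argument selects
// whether the intrinsic takes a rounding mode operand.  The default
// argument (an assumption) keeps existing instantiations unchanged.
template<int UNSPEC, frm_op_type FRM_OP = NO_FRM>
struct vfcvt_x_sketch : function_base_sketch
{
  bool has_rounding_mode_operand_p () const override
  {
    return FRM_OP == HAS_FRM;
  }

  int unspec () const { return UNSPEC; }
};

// Placeholder unspec number, not GCC's real UNSPEC_VFCVT value.
constexpr int kUnspecVfcvtSketch = 1;

const vfcvt_x_sketch<kUnspecVfcvtSketch> vfcvt_x_obj_sketch{};
const vfcvt_x_sketch<kUnspecVfcvtSketch, HAS_FRM> vfcvt_x_frm_obj_sketch{};
```

Compared with a separate *_frm class per operation, this keeps a single
expand body and makes the frm/no-frm choice visible at the point where
each object is declared.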

On Wed, Aug 16, 2023 at 11:40 AM Li, Pan2 via Gcc-patches
 wrote:
>
> That should work as well, but may require some changes to existing code,
> such as declarations.
> I am OK with either the enum or the inheritance approach, and will start
> with the CVT parts, then refactor the existing frm classes.
>
> Do you have a preference between the two?
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Wednesday, August 16, 2023 11:30 AM
> To: Li, Pan2 
> Cc: juzhe.zh...@rivai.ai; gcc-patches ; Wang, 
> Yanzhang 
> Subject: Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode 
> intrinsic API
>
> Or using an enum value rather than bool?
>
> I am thinking we could also simplify/remove most other frm classes,
> some practical example:
>
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 2074dac0f16..ace63e963a5 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -58,6 +58,11 @@ enum lst_type
>   LST_INDEXED,
> };
>
> +enum frm_op_type
> +{
> +  NO_FRM,
> +  HAS_FRM
> +};
> /* Helper function to fold vleff and vlsegff.  */
> static gimple *
> fold_fault_load (gimple_folder )
> @@ -256,41 +261,22 @@ public:
>vremu/vsadd/vsaddu/vssub/vssubu
>vfadd/vfsub/
> */
> -template
> +template
> class binop : public function_base
> {
> public:
> -  rtx expand (function_expander ) const override
> +  bool has_rounding_mode_operand_p () const override
>   {
> -switch (e.op_info->op)
> -  {
> -  case OP_TYPE_vx:
> -  case OP_TYPE_vf:
> -   return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode 
> ()));
> -  case OP_TYPE_vv:
> -   return e.use_exact_insn (code_for_pred (CODE, e.vector_mode ()));
> -  default:
> -   gcc_unreachable ();
> -  }
> +return FRM_OP == HAS_FRM;
>   }
> -};
> -
> -/* Implements below instructions for now.
> -   - vfadd
> -   - vfsub
> -   - vfmul
> -   - vfdiv
> -*/
> -template
> -class binop_frm : public function_base
> -{
> -public:
> -  bool has_rounding_mode_operand_p () const override { return true; }
>
>   rtx expand (function_expander ) const override
>   {
> switch (e.op_info->op)
>   {
> +  case OP_TYPE_vx:
> +   gcc_assert (FRM_OP == NO_FRM);
> +   gcc_fallthrough ();
>   case OP_TYPE_vf:
>return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode 
> ()));
>   case OP_TYPE_vv:
> @@ -1648,10 +1634,15 @@ public:
> };
>
> /* Implements vfcvt.x.  */
> -template
> +template
> class vfcvt_x : public function_base
> {
> public:
> +  bool has_rounding_mode_operand_p () const override
> +  {
> +return FRM_OP == HAS_FRM;
> +  }
> +
>   rtx expand (function_expander ) const override
>   {
> return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
> @@ -2389,8 +2380,8 @@ static CONSTEXPR const viota viota_obj;
> static CONSTEXPR const vid vid_obj;
> static CONSTEXPR const binop vfadd_obj;
> static CONSTEXPR const binop vfsub_obj;
> -static CONSTEXPR const binop_frm vfadd_frm_obj;
> -static CONSTEXPR const binop_frm vfsub_frm_obj;
> +static CONSTEXPR const binop vfadd_frm_obj;
> +static CONSTEXPR const binop vfsub_frm_obj;
> static CONSTEXPR const reverse_binop vfrsub_obj;
> static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
> static CONSTEXPR const widen_binop vfwadd_obj;
> @@ -2398,9 +2389,9 @@ static CONSTEXPR const widen_binop_frm
> vfwadd_frm_obj;
> static CONSTEXPR const widen_binop vfwsub_obj;
> static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
> static CONSTEXPR const binop vfmul_obj;
> -static CONSTEXPR const binop_frm vfmul_frm_obj;
> +static CONSTEXPR const binop vfmul_frm_obj;
> static CONSTEXPR const binop vfdiv_obj;
> -static CONSTEXPR const binop_frm vfdiv_frm_obj;
> +static CONSTEXPR const binop vfdiv_frm_obj;
> static CONSTEXPR const reverse_binop vfrdiv_obj;
> static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
> static CONSTEXPR const widen_binop vfwmul_obj;


RE: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Li, Pan2 via Gcc-patches
That should work as well, but it may require some changes to existing code, 
such as the declarations.
I am OK with either the enum or the inheritance approach. I will start with the 
CVT parts and then refactor the existing frm classes.

Do you have any suggestions for making the decision?

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Wednesday, August 16, 2023 11:30 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; gcc-patches ; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic 
API

Or using an enum value rather than bool?

I am thinking we could also simplify/remove most other frm classes,
some practical example:


diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 2074dac0f16..ace63e963a5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -58,6 +58,11 @@ enum lst_type
  LST_INDEXED,
};

+enum frm_op_type
+{
+  NO_FRM,
+  HAS_FRM
+};
/* Helper function to fold vleff and vlsegff.  */
static gimple *
fold_fault_load (gimple_folder )
@@ -256,41 +261,22 @@ public:
   vremu/vsadd/vsaddu/vssub/vssubu
   vfadd/vfsub/
*/
-template
+template
class binop : public function_base
{
public:
-  rtx expand (function_expander &e) const override
+  bool has_rounding_mode_operand_p () const override
  {
-switch (e.op_info->op)
-  {
-  case OP_TYPE_vx:
-  case OP_TYPE_vf:
-   return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode ()));
-  case OP_TYPE_vv:
-   return e.use_exact_insn (code_for_pred (CODE, e.vector_mode ()));
-  default:
-   gcc_unreachable ();
-  }
+return FRM_OP == HAS_FRM;
  }
-};
-
-/* Implements below instructions for now.
-   - vfadd
-   - vfsub
-   - vfmul
-   - vfdiv
-*/
-template
-class binop_frm : public function_base
-{
-public:
-  bool has_rounding_mode_operand_p () const override { return true; }

  rtx expand (function_expander &e) const override
  {
switch (e.op_info->op)
  {
+  case OP_TYPE_vx:
+   gcc_assert (FRM_OP == NO_FRM);
+   gcc_fallthrough ();
  case OP_TYPE_vf:
   return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode ()));
  case OP_TYPE_vv:
@@ -1648,10 +1634,15 @@ public:
};

/* Implements vfcvt.x.  */
-template
+template
class vfcvt_x : public function_base
{
public:
+  bool has_rounding_mode_operand_p () const override
+  {
+return FRM_OP == HAS_FRM;
+  }
+
  rtx expand (function_expander &e) const override
  {
return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
@@ -2389,8 +2380,8 @@ static CONSTEXPR const viota viota_obj;
static CONSTEXPR const vid vid_obj;
static CONSTEXPR const binop vfadd_obj;
static CONSTEXPR const binop vfsub_obj;
-static CONSTEXPR const binop_frm vfadd_frm_obj;
-static CONSTEXPR const binop_frm vfsub_frm_obj;
+static CONSTEXPR const binop vfadd_frm_obj;
+static CONSTEXPR const binop vfsub_frm_obj;
static CONSTEXPR const reverse_binop vfrsub_obj;
static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
static CONSTEXPR const widen_binop vfwadd_obj;
@@ -2398,9 +2389,9 @@ static CONSTEXPR const widen_binop_frm
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
-static CONSTEXPR const binop_frm vfmul_frm_obj;
+static CONSTEXPR const binop vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
-static CONSTEXPR const binop_frm vfdiv_frm_obj;
+static CONSTEXPR const binop vfdiv_frm_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
static CONSTEXPR const widen_binop vfwmul_obj;


Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Kito Cheng via Gcc-patches
Or using an enum value rather than bool?

I am thinking we could also simplify/remove most other frm classes,
some practical example:


diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 2074dac0f16..ace63e963a5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -58,6 +58,11 @@ enum lst_type
  LST_INDEXED,
};

+enum frm_op_type
+{
+  NO_FRM,
+  HAS_FRM
+};
/* Helper function to fold vleff and vlsegff.  */
static gimple *
fold_fault_load (gimple_folder )
@@ -256,41 +261,22 @@ public:
   vremu/vsadd/vsaddu/vssub/vssubu
   vfadd/vfsub/
*/
-template
+template
class binop : public function_base
{
public:
-  rtx expand (function_expander &e) const override
+  bool has_rounding_mode_operand_p () const override
  {
-switch (e.op_info->op)
-  {
-  case OP_TYPE_vx:
-  case OP_TYPE_vf:
-   return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode ()));
-  case OP_TYPE_vv:
-   return e.use_exact_insn (code_for_pred (CODE, e.vector_mode ()));
-  default:
-   gcc_unreachable ();
-  }
+return FRM_OP == HAS_FRM;
  }
-};
-
-/* Implements below instructions for now.
-   - vfadd
-   - vfsub
-   - vfmul
-   - vfdiv
-*/
-template
-class binop_frm : public function_base
-{
-public:
-  bool has_rounding_mode_operand_p () const override { return true; }

  rtx expand (function_expander &e) const override
  {
switch (e.op_info->op)
  {
+  case OP_TYPE_vx:
+   gcc_assert (FRM_OP == NO_FRM);
+   gcc_fallthrough ();
  case OP_TYPE_vf:
   return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode ()));
  case OP_TYPE_vv:
@@ -1648,10 +1634,15 @@ public:
};

/* Implements vfcvt.x.  */
-template
+template
class vfcvt_x : public function_base
{
public:
+  bool has_rounding_mode_operand_p () const override
+  {
+return FRM_OP == HAS_FRM;
+  }
+
  rtx expand (function_expander &e) const override
  {
return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
@@ -2389,8 +2380,8 @@ static CONSTEXPR const viota viota_obj;
static CONSTEXPR const vid vid_obj;
static CONSTEXPR const binop vfadd_obj;
static CONSTEXPR const binop vfsub_obj;
-static CONSTEXPR const binop_frm vfadd_frm_obj;
-static CONSTEXPR const binop_frm vfsub_frm_obj;
+static CONSTEXPR const binop vfadd_frm_obj;
+static CONSTEXPR const binop vfsub_frm_obj;
static CONSTEXPR const reverse_binop vfrsub_obj;
static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
static CONSTEXPR const widen_binop vfwadd_obj;
@@ -2398,9 +2389,9 @@ static CONSTEXPR const widen_binop_frm
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
-static CONSTEXPR const binop_frm vfmul_frm_obj;
+static CONSTEXPR const binop vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
-static CONSTEXPR const binop_frm vfdiv_frm_obj;
+static CONSTEXPR const binop vfdiv_frm_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const reverse_binop_frm vfrdiv_frm_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
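
The enum-based idea above can be sketched outside of GCC as follows. This is
only an illustration under stated assumptions, not the actual patch:
function_base is a stub, CODE is a plain int rather than an rtx_code, and the
accepts () helper and the two type aliases are hypothetical stand-ins for the
gcc_assert in the OP_TYPE_vx case and for the *_obj declarations.

```cpp
// Stand-ins for GCC internals; these are not the real declarations.
enum frm_op_type { NO_FRM, HAS_FRM };
enum op_type { OP_TYPE_vx, OP_TYPE_vf, OP_TYPE_vv };

struct function_base
{
  virtual ~function_base () = default;
  virtual bool has_rounding_mode_operand_p () const { return false; }
};

// One template covers both the plain and the rounding mode intrinsic.
// CODE is an int here; in GCC it would be an rtx_code.
template<int CODE, frm_op_type FRM_OP = NO_FRM>
struct binop : function_base
{
  bool has_rounding_mode_operand_p () const override
  {
    return FRM_OP == HAS_FRM;
  }

  // Hypothetical helper mirroring the gcc_assert in the diff: the
  // integer-scalar (vx) form is only valid without a rounding mode.
  bool accepts (op_type op) const
  {
    return op != OP_TYPE_vx || FRM_OP == NO_FRM;
  }
};

// Mirror the *_obj declarations, with 0 as a placeholder CODE.
using vfadd_t = binop<0>;
using vfadd_frm_t = binop<0, HAS_FRM>;
```

Compared with a bool parameter, a declaration such as binop<0, HAS_FRM> states
its intent at the use site, whereas binop<0, true> would leave the reader to
look up what the second argument means.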


Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-15 Thread Xi Ruoyao via Gcc-patches
The implementation fails to handle this test case properly:

typedef double __attribute__((vector_size(32))) v4df;

void use1(double);

__attribute__((noipa)) double use(double)
{
  register double x asm("f24") = 114.514;
  __asm__("" : "+f" (x));
  return x;
}

void test(void)
{
  register v4df x asm("f24") = {1, 2, 3, 4};
  __asm__("" : "+f" (x));
  use(x[1]);
  use1(x[3]);
}

Here use() attempts to save and restore f24, but it uses fst.d/fld.d,
clobbering the high 192 bits of xr24.  Now test() passes a wrong value
of x[3] to use1().

Note that saving and restoring f24 with xvst/xvld in use() won't really
fix the issue because in real life use() can be in another translation
unit (or even a shared library) compiled with -mno-lsx.  So it seems we
need to tell the compiler "a function call may clobber the high bits of
a vector register even if the corresponding floating-point register is
saved".  I'm not sure how to accomplish this...
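
The failure mode can be illustrated with a toy model in plain C++. This is a
sketch under assumptions, not LoongArch code: the 256-bit register is modeled
as four double lanes with lane 0 standing in for f24, all names are
hypothetical, and the upper lanes are simply zeroed to represent "not
preserved" (what real hardware leaves there is not modeled).

```cpp
#include <array>
#include <cstddef>

// Toy model of a 256-bit vector register as four 64-bit lanes; lane 0
// aliases the scalar floating-point register.
using xreg = std::array<double, 4>;

// Models a callee built without vector support: it saves and restores
// only lane 0 (the fst.d/fld.d pair), so the upper 192 bits are lost.
inline void callee_saving_low_lane_only (xreg &r)
{
  const double saved = r[0];              // fst.d: spill the low 64 bits
  r = xreg{{114.514, 0.0, 0.0, 0.0}};     // body overwrites the register
  r[0] = saved;                           // fld.d: reload the low 64 bits
}

// Counts how many lanes hold the same value after the call as before.
inline std::size_t lanes_preserved (const xreg &before)
{
  xreg r = before;
  callee_saving_low_lane_only (r);
  std::size_t n = 0;
  for (std::size_t i = 0; i < r.size (); ++i)
    if (r[i] == before[i])
      ++n;
  return n;
}
```

In this model only one of the four lanes survives the call, which is why the
caller would need to treat the upper part of the register as clobbered even
though the scalar register counts as callee-saved.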

On Tue, 2023-08-15 at 09:05 +0800, Chenghui Pan wrote:
> This is an update of:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626194.html
> 
> This version of the patch set only introduces some small simplifications
> of the implementation. Because I overlooked the mailing list's message
> size limit, the huge testsuite patches of v2 and v3 did not appear on the
> list. So the testsuite patches are split from this patch set again and
> will be submitted independently in the future.
> 
> Binutils-gdb introduced LSX/LASX support since 2.41 release:
> https://lists.gnu.org/archive/html/info-gnu/2023-07/msg9.html
> 
> Brief history of patch set version:
> v1 -> v2:
> - Reduce usage of "unspec" in RTL templates.
> - Add support for ADDR_REG_REG in LSX and LASX.
> - Add constraint docs to gcc/doc/md.texi and the comment block.
> - Remove code related to vecarg.
> - Add the LSX and LASX testsuite in v2. (Because of the mailing list size
>   limit, these patches are not shown.)
> - Adjust the loongarch_expand_vector_init() function to reduce the number
>   of instructions output.
> - Some minor implementation changes to RTL templates.
> 
> v2 -> v3:
> - Revert vabsd/xvabsd RTL templates to unspec impl.
> - Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping
>   with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
> - Remove redundant definitions in lasxintrin.h.
> - Refine commit info.
> 
> Lulu Cheng (6):
>   LoongArch: Add Loongson SX vector directive compilation framework.
>   LoongArch: Add Loongson SX base instruction support.
>   LoongArch: Add Loongson SX directive builtin function support.
>   LoongArch: Add Loongson ASX vector directive compilation framework.
>   LoongArch: Add Loongson ASX base instruction support.
>   LoongArch: Add Loongson ASX directive builtin function support.
> 
>  gcc/config.gcc    |    2 +-
>  gcc/config/loongarch/constraints.md   |  131 +-
>  .../loongarch/genopts/loongarch-strings   |    4 +
>  gcc/config/loongarch/genopts/loongarch.opt.in |   12 +-
>  gcc/config/loongarch/lasx.md  | 5122 
>  gcc/config/loongarch/lasxintrin.h | 5338
> +
>  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
>  gcc/config/loongarch/loongarch-c.cc   |   18 +
>  gcc/config/loongarch/loongarch-def.c  |    6 +
>  gcc/config/loongarch/loongarch-def.h  |    9 +-
>  gcc/config/loongarch/loongarch-driver.cc  |   10 +
>  gcc/config/loongarch/loongarch-driver.h   |    2 +
>  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
>  gcc/config/loongarch/loongarch-modes.def  |   39 +
>  gcc/config/loongarch/loongarch-opts.cc    |   89 +-
>  gcc/config/loongarch/loongarch-opts.h |    3 +
>  gcc/config/loongarch/loongarch-protos.h   |   35 +
>  gcc/config/loongarch/loongarch-str.h  |    3 +
>  gcc/config/loongarch/loongarch.cc | 4586 +-
>  gcc/config/loongarch/loongarch.h  |  117 +-
>  gcc/config/loongarch/loongarch.md |   56 +-
>  gcc/config/loongarch/loongarch.opt    |   12 +-
>  gcc/config/loongarch/lsx.md   | 4481 ++
>  gcc/config/loongarch/lsxintrin.h  | 5181 
>  gcc/config/loongarch/predicates.md    |  333 +-
>  gcc/doc/md.texi   |   11 +
>  26 files changed, 28668 insertions(+), 284 deletions(-)
>  create mode 100644 gcc/config/loongarch/lasx.md
>  create mode 100644 gcc/config/loongarch/lasxintrin.h
>  create mode 100644 gcc/config/loongarch/lsx.md
>  create mode 100644 gcc/config/loongarch/lsxintrin.h
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


RE: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Li, Pan2 via Gcc-patches
Thanks Kito for the comments. How about leveraging inheritance instead of a 
template? AFAIK, bool arguments are discouraged to some extent. 
For example, something like the below would reuse the expand part.

class vfcvt_x : public function_base
 {
 public:
+  virtual bool has_rounding_mode_operand_p () const { return false; }
+
   rtx expand (function_expander &e) const override
   {
 return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
   }
 };

+/* Implements below instructions for frm
+   - vfcvt_x
+*/
+template
+class vfcvt_x_frm : public vfcvt_x
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+};
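
To make the inheritance variant concrete, here is a minimal self-contained
sketch. It is not the GCC code and rests on assumptions: function_base is
reduced to two hooks, expand () returns an int instead of an rtx, the UNSPEC
template parameter on both classes is assumed, and needs_frm is a hypothetical
helper standing in for whatever code queries the predicate.

```cpp
// Stand-in for GCC's function_base; the real class has many more hooks.
struct function_base
{
  virtual ~function_base () = default;
  virtual bool has_rounding_mode_operand_p () const { return false; }
  virtual int expand () const = 0;
};

// Plain conversion: keeps the default "no rounding mode operand".
template<int UNSPEC>
struct vfcvt_x : function_base
{
  // Placeholder for the shared expand logic; it just echoes UNSPEC.
  int expand () const override { return UNSPEC; }
};

// Rounding mode variant: reuses expand () and flips only the predicate.
template<int UNSPEC>
struct vfcvt_x_frm : vfcvt_x<UNSPEC>
{
  bool has_rounding_mode_operand_p () const override { return true; }
};

// Queries an object through the base interface, as a caller would.
inline bool needs_frm (const function_base &fn)
{
  return fn.has_rounding_mode_operand_p ();
}
```

The derived class adds no expand logic of its own, so the cost of this
approach is one extra small class per intrinsic family rather than one extra
template argument.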

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Tuesday, August 15, 2023 11:34 PM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 ; gcc-patches ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic 
API

Just a random idea that came to mind: maybe we could introduce one more
template argument to reduce the code for the rounding mode intrinsics?

example:

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 2074dac0f16..9cc60842a5b 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1648,10 +1648,11 @@ public:
};

/* Implements vfcvt.x.  */
-template
+template
class vfcvt_x : public function_base
{
public:
+  bool has_rounding_mode_operand_p () const override { return HAS_FRM; }
  rtx expand (function_expander &e) const override
  {
return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
@@ -2451,6 +2452,7 @@ static CONSTEXPR const vmerge vfmerge_obj;
static CONSTEXPR const vmv_v vfmv_v_obj;
static CONSTEXPR const vfcvt_x vfcvt_x_obj;
static CONSTEXPR const vfcvt_x vfcvt_xu_obj;
+static CONSTEXPR const vfcvt_x vfcvt_x_frm_obj;
static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_x_obj;
static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_xu_obj;
static CONSTEXPR const vfcvt_f vfcvt_f_obj;


[r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches
From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-2946 Regression] FAIL: gcc.target/i386/pr87007-5.c 
scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0 on Linux/x86_64

On Linux/x86_64,

46c8c225455273ce7f7da7cc5707aed54f23e78d is the first bad commit
commit 46c8c225455273ce7f7da7cc5707aed54f23e78d
Author: Richard Biener 
Date:   Wed Jul 26 15:23:45 2023 +0200

Improve sinking with unrelated defs

caused

FAIL: gcc.target/i386/pr87007-5.c scan-assembler-times vxorps[^\n\r]*xmm[0-9] 0

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2946/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr87007-5.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

2023-08-15 Thread Jiang, Haochen via Gcc-patches
From: haochen.jiang  
Sent: Tuesday, August 15, 2023 5:26 PM
To: rguent...@suse.de; gcc-regress...@gcc.gnu.org; gcc-patches@gcc.gnu.org; 
Jiang, Haochen 
Subject: [r14-3148 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-2.c 
scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

On Linux/x86_64,

3a13884b23ae32b43d56d68a9c6bd4ce53d60017 is the first bad commit
commit 3a13884b23ae32b43d56d68a9c6bd4ce53d60017
Author: Richard Biener 
Date:   Fri Aug 11 12:08:10 2023 +0200

Improve BB vectorization opt-info

caused

FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  
scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: 
basic block" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-3148/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me 
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command 
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] Makefile.in: Add variable TM_P_H2 for TM_P_H dependency [PR111021]

2023-08-15 Thread Kewen.Lin via Gcc-patches
Hi,

As PR111021 shows, the ${port}-protos.h files below include tree.h
for code_helper and tree_code:

  arm/arm-protos.h:#include "tree.h"
  cris/cris-protos.h:#include "tree.h"  (H-P removed this in r14-3218)
  microblaze/microblaze-protos.h:#include "tree.h"
  rl78/rl78-protos.h:#include "tree.h"
  stormy16/stormy16-protos.h:#include "tree.h"

When compiling build/gencondmd.cc, the include hierarchy
makes it depend on tm_p.h -> ${port}-protos.h -> tree.h,
which in turn includes (depends on) some files that are
generated during the build, such as all-tree.def and
tree-check.h.  The previous commit r14-3215
should already force build/gencondmd.cc to depend on
${TREE_H}, so the reported build failure should be gone.

But for long-term maintenance, especially if one day some
build/xxx.cc requires tm_p.h but not recog.h, the ${TREE_H}
dependency could be missed and a build failure would show
up.  So this patch adds one variable under the section
"# Shorthand variables for dependency lists." to explicitly
indicate that tm_p.h, which includes ${port}-protos.h, should
depend on ${TREE_H}.  Then any new build/xxx.cc depending
on tm_p.h will be able to account for ${TREE_H}.

Note that the existing ${TM_P_H} variable is also used for
"generated_files" and isn't dedicated to dependencies, so
a variable named ${TM_P_H2} is proposed and put under
"# Shorthand variables for dependency lists.".  The only
use of ${TM_P_H} as a dependency is updated accordingly.

It's tested with cross-builds for the affected ports with
steps:

  1) dropped the fix r14-3215;
  2) reproduced the build failure with serial build;
  3) applied this patch, serial built and verified all passed;
  4) added back r14-3215, serial built and verified all passed;

Is it ok for trunk?

BR,
Kewen
-
PR bootstrap/111021

gcc/ChangeLog:

* Makefile.in (TM_P_H2): New variable for tm_p.h dependence.
---
 gcc/Makefile.in | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9dddb65b45d..192dc76f294 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1062,6 +1062,7 @@ RTL_SSA_H = $(PRETTY_PRINT_H) insn-config.h 
splay-tree-utils.h \
rtl-ssa/changes.h rtl-ssa/functions.h rtl-ssa/is-a.inl \
rtl-ssa/access-utils.h rtl-ssa/insn-utils.h rtl-ssa/movement.h \
rtl-ssa/change-utils.h rtl-ssa/member-fns.inl
+TM_P_H2 = $(TM_P_H) $(TREE_H)

 #

 # Now figure out from those variables how to compile and link.
@@ -2905,7 +2906,7 @@ build/inchash.o : inchash.cc $(BCONFIG_H) $(SYSTEM_H) 
$(CORETYPES_H)  \
   $(HASHTAB_H) inchash.h
 build/gencondmd.o : build/gencondmd.cc $(BCONFIG_H) $(SYSTEM_H)
\
   $(CORETYPES_H) $(GTM_H) insn-constants.h \
-  $(filter-out insn-flags.h, $(RTL_H) $(TM_P_H) $(FUNCTION_H) $(REGS_H) \
+  $(filter-out insn-flags.h, $(RTL_H) $(TM_P_H2) $(FUNCTION_H) $(REGS_H) \
   $(RECOG_H) output.h $(FLAGS_H) $(RESOURCE_H) toplev.h $(DIAGNOSTIC_CORE_H) 
reload.h  \
   $(EXCEPT_H) tm-constrs.h)
 # This pulls in tm-pred.h which contains inline functions wrapping up
--
2.39.1


Re: [PATCH] RISC-V: Fix reduc_strict_run-1 test case.

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/15/23 19:21, juzhe.zh...@rivai.ai wrote:
> For float/double, the in-order fold-left reduction produces the same
> result as the scalar code.
>
> But for _Float16 it does not. I think the issue is not the reduction
> but float16 precision.

But if it's a float16 precision issue then I would have expected both 
the computations for the lhs and rhs values to have suffered similarly.

But if you're confident it's OK, then I won't object.
jeff


Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-15 Thread Kewen.Lin via Gcc-patches
on 2023/8/15 17:13, Richard Sandiford wrote:
> Richard Biener  writes:
>>> OK, fair enough.  So the idea is: see where we end up and then try to
>>> improve/factor the APIs in a less peephole way?
>>
>> Yeah, I think that's the only good way forward.
> 
> OK, no objection from me.  Sorry for holding the patch up.

This hasn't been approved yet (although the patch on VMAT_LOAD_STORE_LANES
was), so it wasn't held up.  Thanks for sharing your thoughts and drawing
attention to it. :)

From the discussions, it seems this looks good to both of you.  But I could
be wrong, so may I ask if it's ok for trunk?

BR,
Kewen


Re: [PATCH 6/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins

2023-08-15 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 8, 2023 at 3:23 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx10_1-vextractf64x2-1.c: New test.
> * gcc.target/i386/avx10_1-vextracti64x2-1.c: Ditto.
> * gcc.target/i386/avx10_1-vfpclasspd-1.c: Ditto.
> * gcc.target/i386/avx10_1-vfpclassps-1.c: Ditto.
> * gcc.target/i386/avx10_1-vinsertf64x2-1.c: Ditto.
> * gcc.target/i386/avx10_1-vinserti64x2-1.c: Ditto.
> * gcc.target/i386/avx10_1-vrangepd-1.c: Ditto.
> * gcc.target/i386/avx10_1-vrangeps-1.c: Ditto.
> * gcc.target/i386/avx10_1-vreducepd-1.c: Ditto.
> * gcc.target/i386/avx10_1-vreduceps-1.c: Ditto.
Ok for all 6 patches (please wait an extra 24 hours before committing, in
case there are objections).
> ---
>  .../gcc.target/i386/avx10_1-vextractf64x2-1.c | 18 
>  .../gcc.target/i386/avx10_1-vextracti64x2-1.c | 19 
>  .../gcc.target/i386/avx10_1-vfpclasspd-1.c| 21 ++
>  .../gcc.target/i386/avx10_1-vfpclassps-1.c| 21 ++
>  .../gcc.target/i386/avx10_1-vinsertf64x2-1.c  | 18 
>  .../gcc.target/i386/avx10_1-vinserti64x2-1.c  | 18 
>  .../gcc.target/i386/avx10_1-vrangepd-1.c  | 27 +
>  .../gcc.target/i386/avx10_1-vrangeps-1.c  | 27 +
>  .../gcc.target/i386/avx10_1-vreducepd-1.c | 29 +++
>  .../gcc.target/i386/avx10_1-vreduceps-1.c | 29 +++
>  10 files changed, 227 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vfpclassps-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vinsertf64x2-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vinserti64x2-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vrangepd-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vrangeps-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vreducepd-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-vreduceps-1.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c 
> b/gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c
> new file mode 100644
> index 000..4c7e54dc198
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-vextractf64x2-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx10.1 -O2" } */
> +/* { dg-final { scan-assembler-times "vextractf64x2\[ 
> \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}(?:\n|\[ \\t\]+#)"  1 } } */
> +/* { dg-final { scan-assembler-times "vextractf64x2\[ 
> \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
> +/* { dg-final { scan-assembler-times "vextractf64x2\[ 
> \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
> +
> +#include 
> +
> +volatile __m256d x;
> +volatile __m128d y;
> +
> +void extern
> +avx10_1_test (void)
> +{
> +  y = _mm256_extractf64x2_pd (x, 1);
> +  y = _mm256_mask_extractf64x2_pd (y, 2, x, 1);
> +  y = _mm256_maskz_extractf64x2_pd (2, x, 1);
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c 
> b/gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c
> new file mode 100644
> index 000..c0bd7700d52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-vextracti64x2-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx10.1 -O2" } */
> +/* { dg-final { scan-assembler-times "vextracti64x2\[ 
> \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}(?:\n|\[ \\t\]+#)"  1 } } */
> +/* { dg-final { scan-assembler-times "vextracti64x2\[ 
> \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)"  1 } } */
> +/* { dg-final { scan-assembler-times "vextracti64x2\[ 
> \\t\]+\[^\{\n\]*%ymm\[0-9\]+.{7}\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)"  1 } } */
> +
> +#include 
> +
> +volatile __m256i x;
> +volatile __m128i y;
> +
> +void extern
> +avx10_1_test (void)
> +{
> +  y = _mm256_extracti64x2_epi64 (x, 1);
> +  y = _mm256_mask_extracti64x2_epi64 (y, 2, x, 1);
> +  y = _mm256_maskz_extracti64x2_epi64 (2, x, 1);
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c 
> b/gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c
> new file mode 100644
> index 000..806ba800023
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx10_1-vfpclasspd-1.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx10.1 -O2" } */
> +/* { dg-final { scan-assembler-times "vfpclasspdy\[ 
> \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
> +/* { dg-final { scan-assembler-times "vfpclasspdx\[ 
> \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n^k\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
> +/* { dg-final { scan-assembler-times "vfpclasspdy\[ 
> 

Re: [PATCH 3/3] Emit a warning when AVX10 options conflict in vector width

2023-08-15 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 8, 2023 at 3:13 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * config/i386/driver-i386.cc (host_detect_local_cpu):
> Do not append -mno-avx10-max-512bit for -march=native.
> * common/config/i386/i386-common.cc
> (ix86_check_avx10_vector_width): New function to check isa_flags
> to emit a warning when there is a conflict in AVX10 options for
> vector width.
> (ix86_handle_option): Add check for avx10.1-256 and avx10.1-512.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx10_1-15.c: New test.
> * gcc.target/i386/avx10_1-16.c: Ditto.
> * gcc.target/i386/avx10_1-17.c: Ditto.
> * gcc.target/i386/avx10_1-18.c: Ditto.
> ---
Ok (please wait an extra 24 hours before committing, in case there are objections).
>  gcc/common/config/i386/i386-common.cc  | 20 
>  gcc/config/i386/driver-i386.cc |  3 ++-
>  gcc/config/i386/i386-options.cc|  2 +-
>  gcc/testsuite/gcc.target/i386/avx10_1-15.c |  5 +
>  gcc/testsuite/gcc.target/i386/avx10_1-16.c |  5 +
>  gcc/testsuite/gcc.target/i386/avx10_1-17.c | 13 +
>  gcc/testsuite/gcc.target/i386/avx10_1-18.c | 13 +
>  7 files changed, 59 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-15.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-17.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-18.c
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index ec94251dd4c..db88befc9b8 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -428,6 +428,24 @@ ix86_check_avx512 (struct gcc_options *opts)
>return true;
>  }
>
> +/* Emit a warning when there is a conflict vector width in AVX10 options.  */
> +static void
> +ix86_check_avx10_vector_width (struct gcc_options *opts, bool avx10_max_512)
> +{
> +  if (avx10_max_512)
> +{
> +  if (((opts->x_ix86_isa_flags2 | ~OPTION_MASK_ISA2_AVX10_512BIT)
> +  == ~OPTION_MASK_ISA2_AVX10_512BIT)
> + && (opts->x_ix86_isa_flags2_explicit & 
> OPTION_MASK_ISA2_AVX10_512BIT))
> +   warning (0, "The options used for AVX10 have conflict vector width, "
> +"using the latter 512 as vector width");
> +}
> +  else if (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
> +  & OPTION_MASK_ISA2_AVX10_512BIT)
> +warning (0, "The options used for AVX10 have conflict vector width, "
> +"using the latter 256 as vector width");
> +}
> +
>  /* Implement TARGET_HANDLE_OPTION.  */
>
>  bool
> @@ -1415,6 +1433,7 @@ ix86_handle_option (struct gcc_options *opts,
>return true;
>
>  case OPT_mavx10_1_256:
> +  ix86_check_avx10_vector_width (opts, false);
>opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
>opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
>opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVX10_512BIT_SET;
> @@ -1424,6 +1443,7 @@ ix86_handle_option (struct gcc_options *opts,
>return true;
>
>  case OPT_mavx10_1_512:
> +  ix86_check_avx10_vector_width (opts, true);
>opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_1_SET;
>opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVX10_1_SET;
>opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVX10_512BIT_SET;
> diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
> index 227ace6ff83..f4551a74e3a 100644
> --- a/gcc/config/i386/driver-i386.cc
> +++ b/gcc/config/i386/driver-i386.cc
> @@ -854,7 +854,8 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
>   options = concat (options, " ",
> isa_names_table[i].option, NULL);
>   }
> -   else if (isa_names_table[i].feature != FEATURE_AVX10_1)
> +   else if ((isa_names_table[i].feature != FEATURE_AVX10_1)
> +&& (isa_names_table[i].feature != FEATURE_AVX10_512BIT))
>   options = concat (options, neg_option,
> isa_names_table[i].option + 2, NULL);
>   }
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index b2281fbd4b5..8f9b825b527 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -985,7 +985,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
> args, char *p_strings[],
>  ix86_opt_ix86_no,
>  ix86_opt_str,
>  ix86_opt_enum,
> -ix86_opt_isa,
> +ix86_opt_isa
>};
>
>static const struct
> diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-15.c 
> b/gcc/testsuite/gcc.target/i386/avx10_1-15.c
> new file mode 100644
> index 000..fd873c9694c
> --- /dev/null
> +++ 

Re: [PATCH 2/3] Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled

2023-08-15 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 8, 2023 at 3:15 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * config/i386/driver-i386.cc (host_detect_local_cpu):
> Do not append -mno-avx10.1 for -march=native.
> * config/i386/i386-options.cc
> (ix86_check_avx10): New function to check isa_flags and
> isa_flags_explicit to emit warning when AVX10 is enabled
> by "-m" option.
> (ix86_check_avx512):  New function to check isa_flags and
> isa_flags_explicit to emit warning when AVX512 is enabled
> by "-m" option.
> (ix86_handle_option): Do not change the flags when warning
> is emitted.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx10_1-11.c: New test.
> * gcc.target/i386/avx10_1-12.c: Ditto.
> * gcc.target/i386/avx10_1-13.c: Ditto.
> * gcc.target/i386/avx10_1-14.c: Ditto.
Ok(please wait for extra 24 hours to commit, if there's no objection)
> ---
>  gcc/common/config/i386/i386-common.cc  | 68 +-
>  gcc/config/i386/driver-i386.cc |  2 +-
>  gcc/testsuite/gcc.target/i386/avx10_1-11.c |  5 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-12.c | 13 +
>  gcc/testsuite/gcc.target/i386/avx10_1-13.c |  5 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-14.c | 13 +
>  6 files changed, 91 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-11.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-12.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-13.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-14.c
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 6c3bebb1846..ec94251dd4c 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -388,6 +388,46 @@ set_malign_value (const char **flag, unsigned value)
>*flag = r;
>  }
>
> +/* Emit a warning when using -mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,
> +   vnni,ifma,bitalg,vpopcntdq} with -mavx10.1 and above.  */
> +static bool
> +ix86_check_avx10 (struct gcc_options *opts)
> +{
> +  if (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
> +  & OPTION_MASK_ISA2_AVX10_1)
> +{
> +  warning (0, 
> "%<-mno-avx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,"
> +  "bitalg,vpopcntdq}%> are ignored with %<-mavx10.1%> and 
> above");
> +  return false;
> +}
> +
> +  return true;
> +}
> +
> +/* Emit a warning when using -mno-avx10.1 with -mavx512{f,vl,bw,dq,cd,bf16,
> +   fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq}.  */
> +static bool
> +ix86_check_avx512 (struct gcc_options *opts)
> +{
> +  if ((opts->x_ix86_isa_flags & opts->x_ix86_isa_flags_explicit
> +   & (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD
> + | OPTION_MASK_ISA_AVX512DQ | OPTION_MASK_ISA_AVX512BW
> + | OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512IFMA
> + | OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VBMI2
> + | OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VPOPCNTDQ
> + | OPTION_MASK_ISA_AVX512BITALG))
> +  || (opts->x_ix86_isa_flags2 & opts->x_ix86_isa_flags2_explicit
> + & (OPTION_MASK_ISA2_AVX512FP16 | OPTION_MASK_ISA2_AVX512BF16)))
> +{
> +  warning (0, "%<-mno-avx10.1%> is ignored when using with "
> +  "%<-mavx512{f,vl,bw,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,"
> +  "ifma,bitalg,vpopcntdq}%>");
> +  return false;
> +}
> +
> +  return true;
> +}
> +
>  /* Implement TARGET_HANDLE_OPTION.  */
>
>  bool
> @@ -609,7 +649,7 @@ ix86_handle_option (struct gcc_options *opts,
>   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
>   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
> }
> -  else
> +  else if (ix86_check_avx10 (opts))
> {
>   opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
>   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
> @@ -624,7 +664,7 @@ ix86_handle_option (struct gcc_options *opts,
>   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512CD_SET;
>   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512CD_SET;
> }
> -  else
> +  else if (ix86_check_avx10 (opts))
> {
>   opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512CD_UNSET;
>   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512CD_UNSET;
> @@ -898,7 +938,7 @@ ix86_handle_option (struct gcc_options *opts,
>   opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512VBMI2_SET;
>   opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512VBMI2_SET;
> }
> -  else
> +  else if (ix86_check_avx10 (opts))
> {
>   opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512VBMI2_UNSET;
>   opts->x_ix86_isa_flags_explicit |= 
> OPTION_MASK_ISA_AVX512VBMI2_UNSET;
> @@ -913,7 +953,7 

Re: [PATCH 1/3] Initial support for AVX10.1

2023-08-15 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 8, 2023 at 3:16 PM Haochen Jiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features):
> Add avx10_set and version and detect avx10.1.
> (cpu_indicator_init): Handle avx10.1-512.
> * common/config/i386/i386-common.cc
> (OPTION_MASK_ISA2_AVX10_512BIT_SET): New.
> (OPTION_MASK_ISA2_AVX10_1_SET): Ditto.
> (OPTION_MASK_ISA2_AVX10_512BIT_UNSET): Ditto.
> (OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto.
> (OPTION_MASK_ISA2_AVX2_UNSET): Modify for AVX10_1.
> (ix86_handle_option): Handle -mavx10.1, -mavx10.1-256 and
> -mavx10.1-512.
> * common/config/i386/i386-cpuinfo.h (enum processor_features):
> Add FEATURE_AVX10_512BIT, FEATURE_AVX10_1 and
> FEATURE_AVX10_512BIT.
> * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
> AVX10_512BIT, AVX10_1 and AVX10_1_512.
> * config/i386/constraints.md (Yk): Add AVX10_1.
> (Yv): Ditto.
> (k): Ditto.
> * config/i386/cpuid.h (bit_AVX10): New.
> (bit_AVX10_256): Ditto.
> (bit_AVX10_512): Ditto.
> * config/i386/i386-c.cc (ix86_target_macros_internal):
> Define AVX10_512BIT and AVX10_1.
> * config/i386/i386-isa.def
> (AVX10_512BIT): Add DEF_PTA(AVX10_512BIT).
> (AVX10_1): Add DEF_PTA(AVX10_1).
> * config/i386/i386-options.cc (isa2_opts): Add -mavx10.1.
> (ix86_valid_target_attribute_inner_p): Handle avx10-512bit, avx10.1
> and avx10.1-512.
> (ix86_option_override_internal): Enable AVX512{F,VL,BW,DQ,CD,BF16,
> FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ} features for avx10.1-512.
> (ix86_valid_target_attribute_inner_p): Handle AVX10_1.
> * config/i386/i386.cc (ix86_get_ssemov): Add AVX10_1.
> (ix86_conditional_register_usage): Ditto.
> (ix86_hard_regno_mode_ok): Ditto.
> (ix86_rtx_costs): Ditto.
> * config/i386/i386.h (VALID_MASK_AVX10_MODE): New macro.
> * config/i386/i386.opt: Add option -mavx10.1, -mavx10.1-256 and
> -mavx10.1-512.
> * doc/extend.texi: Document avx10.1, avx10.1-256 and avx10.1-512.
> * doc/invoke.texi: Document -mavx10.1, -mavx10.1-256 and 
> -mavx10.1-512.
> * doc/sourcebuild.texi: Document target avx10.1, avx10.1-256
> and avx10.1-512.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/mv33.C: New test.
> * gcc.target/i386/avx10_1-1.c: Ditto.
> * gcc.target/i386/avx10_1-2.c: Ditto.
> * gcc.target/i386/avx10_1-3.c: Ditto.
> * gcc.target/i386/avx10_1-4.c: Ditto.
> * gcc.target/i386/avx10_1-5.c: Ditto.
> * gcc.target/i386/avx10_1-6.c: Ditto.
> * gcc.target/i386/avx10_1-7.c: Ditto.
> * gcc.target/i386/avx10_1-8.c: Ditto.
> * gcc.target/i386/avx10_1-9.c: Ditto.
> * gcc.target/i386/avx10_1-10.c: Ditto.
Ok(please wait for extra 24 hours to commit, if there's no objection)
> ---
>  gcc/common/config/i386/cpuinfo.h   | 36 +++
>  gcc/common/config/i386/i386-common.cc  | 53 +-
>  gcc/common/config/i386/i386-cpuinfo.h  |  3 ++
>  gcc/common/config/i386/i386-isas.h |  5 ++
>  gcc/config/i386/constraints.md |  6 +--
>  gcc/config/i386/cpuid.h|  6 +++
>  gcc/config/i386/i386-c.cc  |  4 ++
>  gcc/config/i386/i386-isa.def   |  2 +
>  gcc/config/i386/i386-options.cc| 26 ++-
>  gcc/config/i386/i386.cc| 18 ++--
>  gcc/config/i386/i386.h |  3 ++
>  gcc/config/i386/i386.opt   | 19 
>  gcc/doc/extend.texi| 13 ++
>  gcc/doc/invoke.texi| 16 +--
>  gcc/doc/sourcebuild.texi   |  9 
>  gcc/testsuite/g++.target/i386/mv33.C   | 30 
>  gcc/testsuite/gcc.target/i386/avx10_1-1.c  | 22 +
>  gcc/testsuite/gcc.target/i386/avx10_1-10.c | 13 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-2.c  | 13 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-3.c  | 13 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-4.c  | 13 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-5.c  | 13 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-6.c  | 13 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-7.c  | 13 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-8.c  |  4 ++
>  gcc/testsuite/gcc.target/i386/avx10_1-9.c  | 13 ++
>  26 files changed, 366 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/mv33.C
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-10.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-3.c
>  create mode 100644 

[PATCH] Loongarch: Fix plugin header missing install.

2023-08-15 Thread Guo Jie
gcc/ChangeLog:

* config/loongarch/t-loongarch: Add loongarch-driver.h into
TM_H. Add loongarch-def.h and loongarch-tune.h into
OPTIONS_H_EXTRA.

Co-authored-by: Lulu Cheng 
---
 gcc/config/loongarch/t-loongarch | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/loongarch/t-loongarch b/gcc/config/loongarch/t-loongarch
index 6d6e3435d59..e73f4f437ef 100644
--- a/gcc/config/loongarch/t-loongarch
+++ b/gcc/config/loongarch/t-loongarch
@@ -16,6 +16,10 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
+TM_H += $(srcdir)/config/loongarch/loongarch-driver.h
+OPTIONS_H_EXTRA += $(srcdir)/config/loongarch/loongarch-def.h \
+  $(srcdir)/config/loongarch/loongarch-tune.h
+
 # Canonical target triplet from config.gcc
 LA_MULTIARCH_TRIPLET = $(patsubst LA_MULTIARCH_TRIPLET=%,%,$\
 $(filter LA_MULTIARCH_TRIPLET=%,$(tm_defines)))
-- 
2.20.1



[PATCH V2] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-15 Thread Juzhe-Zhong
This patch allows us to auto-vectorize the following case:

#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)                      \
  void __attribute__ ((noinline, noclone))                              \
  NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,           \
            MASKTYPE *__restrict cond, intptr_t n)                      \
  {                                                                     \
    for (intptr_t i = 0; i < n; ++i)                                    \
      if (cond[i])                                                      \
        dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]         \
                   + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]   \
                   + src[i * 8 + 6] + src[i * 8 + 7]);                  \
  }

#define TEST2(NAME, OUTTYPE, INTYPE)                                    \
  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)                      \

#define TEST1(NAME, OUTTYPE)                                            \
  TEST2 (NAME##_i32, OUTTYPE, int32_t)                                  \

#define TEST(NAME)                                                      \
  TEST1 (NAME##_i32, int32_t)                                           \

TEST (test)

ASM:

test_i32_i32_f32_8:
ble a3,zero,.L5
.L3:
vsetvli a4,a3,e8,mf4,ta,ma
vle32.v v0,0(a2)
vsetvli a5,zero,e32,m1,ta,ma
vmsne.vi v0,v0,0
vsetvli zero,a4,e32,m1,ta,ma
vlseg8e32.v v8,(a1),v0.t
vsetvli a5,zero,e32,m1,ta,ma
slli a6,a4,2
vadd.vv v1,v9,v8
slli a7,a4,5
vadd.vv v1,v1,v10
sub a3,a3,a4
vadd.vv v1,v1,v11
vadd.vv v1,v1,v12
vadd.vv v1,v1,v13
vadd.vv v1,v1,v14
vadd.vv v1,v1,v15
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
add a2,a2,a6
add a1,a1,a7
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
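
For readers who do not want to expand the macros by hand, the sketch below shows what TEST (test) should expand to, assuming only the macro definitions quoted above: one function named test_i32_i32_f32_8 (the same symbol as the label in the assembly) with OUTTYPE, INTYPE and MASKTYPE all equal to int32_t. It is an illustration of the loop shape, not a copy of the testsuite file.

```c
#include <stdint.h>

/* Hand expansion of TEST (test):
   TEST -> TEST1 (test_i32, int32_t)
        -> TEST2 (test_i32_i32, int32_t, int32_t)
        -> TEST_LOOP (test_i32_i32_f32, int32_t, int32_t, int32_t).  */
void __attribute__ ((noinline, noclone))
test_i32_i32_f32_8 (int32_t *__restrict dest, int32_t *__restrict src,
                    int32_t *__restrict cond, intptr_t n)
{
  for (intptr_t i = 0; i < n; ++i)
    if (cond[i])
      /* Eight interleaved fields per element: the access pattern that a
         masked 8-field segment load (vlseg8e32.v ..., v0.t) can cover.  */
      dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]
                 + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]
                 + src[i * 8 + 6] + src[i * 8 + 7]);
}
```

Elements whose cond[i] is zero are left untouched, which is why both the segment load and the store in the assembly are masked with v0.t.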

gcc/ChangeLog:

* config/riscv/autovec.md (vec_mask_len_load_lanes): New pattern.
(vec_mask_len_store_lanes): Ditto.
* config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
* config/riscv/riscv-v.cc (get_mask_mode): Add tuple mask mode.
(expand_lanes_load_store): New function.
* config/riscv/vector-iterators.md: New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add lanes tests.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-7.c: New test.
  

[PATCH] IFN: Fix vector extraction into promoted subreg.

2023-08-15 Thread juzhe.zh...@rivai.ai
Hi, Robin, Richard and Richi.

I am wondering whether we could simply switch the VEC_EXTRACT internal function
from its dedicated expander to the generic binary one.

Something like this:

DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW,
-  vec_extract, vec_extract)
+  vec_extract, binary)

to fix the sign-extension issue,

and then remove the explicit vec_extract expander in internal-fn.cc?

Thanks.


juzhe.zh...@rivai.ai


Re: [RFC PATCH v2 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/9/23 20:25, Tsukasa OI wrote:

From: Tsukasa OI 

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
(in the assembler).  However, GCC emits "pause" unconditionally, which causes
an assembler error when compiling code that uses __builtin_riscv_pause while
the 'Zihintpause' extension is disabled.

However, the "pause" instruction encoding (0x0100000f) is a HINT, so emitting
that encoding is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x0100000f" instead of "pause" only if the extension is disabled
(making the diagnostics better).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Implement the 'Zihintpause' extension,
version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause" and "hint_pause_pseudo", corresponding
to the existence of the 'Zihintpause' extension.
(riscv_builtins) Split builtin implementation depending on the
existence of the 'Zihintpause' extension.
* config/riscv/riscv-opts.h
(MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Make it only available when
the 'Zihintpause' extension is enabled.  (riscv_pause_insn) New
"pause" implementation when the 'Zihintpause' extension is disabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c:
New test when the 'Zihintpause' extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c:
New test when the 'Zihintpause' extension is disabled.
So the conclusion from today's meeting was to make this available 
irrespective of the extension set.  So I've dropped the alternate patch 
from patchwork.




diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 79681d759628..554fb7f69bb0 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -122,7 +122,8 @@ AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || TARGET_ZBC) 
&& !TARGET_64BIT)
  AVAIL (clmul_zbkc64_or_zbc64, (TARGET_ZBKC || TARGET_ZBC) && TARGET_64BIT)
  AVAIL (clmulr_zbc32, TARGET_ZBC && !TARGET_64BIT)
  AVAIL (clmulr_zbc64, TARGET_ZBC && TARGET_64BIT)
-AVAIL (always, (!0))
+AVAIL (hint_pause, TARGET_ZIHINTPAUSE)
+AVAIL (hint_pause_pseudo, !TARGET_ZIHINTPAUSE)
  
  /* Construct a riscv_builtin_description from the given arguments.
  
@@ -179,7 +180,8 @@ static const struct riscv_builtin_description riscv_builtins[] = {
  
DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),

DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float),
-  DIRECT_NO_TARGET_BUILTIN (pause, RISCV_VOID_FTYPE, always),
+  RISCV_BUILTIN (pause, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE, hint_pause),
+  RISCV_BUILTIN (pause_insn, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE, hint_pause_pseudo),
  };
  
  /* Index I is the function declaration for riscv_builtins[I], or null if the

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 28d9b81bd800..a6c3e0c9098f 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -102,10 +102,12 @@ enum riscv_entity
  #define MASK_ZICSR(1 << 0)
  #define MASK_ZIFENCEI (1 << 1)
  #define MASK_ZIHINTNTL (1 << 2)
+#define MASK_ZIHINTPAUSE (1 << 3)
  
  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)

  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
  #define TARGET_ZIHINTNTL ((riscv_zi_subext & MASK_ZIHINTNTL) != 0)
+#define TARGET_ZIHINTPAUSE ((riscv_zi_subext & MASK_ZIHINTPAUSE) != 0)
  
  #define MASK_ZAWRS   (1 << 0)

  #define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688fd697255b..a6cdb32e9408 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2192,9 +2192,14 @@
  
  (define_insn "riscv_pause"
[(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
-  ""
+  "TARGET_ZIHINTPAUSE"
"pause")
  
+(define_insn "riscv_pause_insn"
+  [(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
+  ""
+  ".insn\t0x0100000f")
+
So I was wondering if we'd be better off always emitting the .insn form 
with a comment on the line indicating it's a pause, i.e. something like


.insn\t0x0100000f ;; pause

That would allow the implementation to simplify down to a single 
unconditional pattern as well as simplifications elsewhere.


Alternately we could do:

TARGET_ZIHINTPAUSE ? pause : .insn\t0x0100000f

in a single pattern that is always available if you feel strongly that 
we should emit different code based on TARGET_ZIHINTPAUSE.


I think both simplify the riscv-builtins.cc change a bit.

Thoughts?
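
As background for why emitting the raw encoding is safe everywhere, here is a minimal C sketch (not part of the patch or of this reply) that rebuilds the "pause" word from the FENCE instruction fields. It assumes the layout given in the RISC-V unprivileged ISA: fm in bits 31:28, pred in 27:24, succ in 23:20, rs1/funct3/rd all zero, and the MISC-MEM opcode 0001111 in bits 6:0. The names encode_fence, encode_pause and FENCE_W are hypothetical helpers made up for this illustration.

```c
#include <stdint.h>

/* pred/succ sets use the bit order I, O, R, W from high to low.  */
#define FENCE_W 0x1u

/* Assemble a FENCE word with rd = rs1 = x0 and funct3 = 0.
   Hypothetical helper for illustration only.  */
static uint32_t
encode_fence (uint32_t fm, uint32_t pred, uint32_t succ)
{
  return ((fm & 0xfu) << 28)      /* fm,   bits 31:28 */
         | ((pred & 0xfu) << 24)  /* pred, bits 27:24 */
         | ((succ & 0xfu) << 20)  /* succ, bits 23:20 */
         | 0x0fu;                 /* MISC-MEM opcode 0001111 */
}

/* Zihintpause defines "pause" as FENCE with pred = W, succ = 0, fm = 0.  */
static uint32_t
encode_pause (void)
{
  return encode_fence (0, FENCE_W, 0);
}
```

Under these assumptions encode_pause () yields the full 32-bit word 0x0100000f. Since that word lies in the FENCE encoding space, a core without 'Zihintpause' is expected to treat it as an ordinary fence hint rather than an illegal instruction, which is the property the .insn form relies on.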


Re: Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-15 Thread juzhe.zh...@rivai.ai
Thanks Jeff.
I realize the quad_trunc/oct_trunc change is not necessary. I will remove that.

The middle-end support has been approved and is being tested on both x86 and
ARM; it will be committed soon.

I will commit this patch after the middle-end patch is committed.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-15 22:18
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}
 
 
On 8/14/23 06:15, Juzhe-Zhong wrote:
> This patch is depending on middle-end support:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627305.html
> 
> This patch allow us auto-vectorize this following case:
> 
> #define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)                      \
>   void __attribute__ ((noinline, noclone))                              \
>   NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,           \
>             MASKTYPE *__restrict cond, intptr_t n)                      \
>   {                                                                     \
>     for (intptr_t i = 0; i < n; ++i)                                    \
>       if (cond[i])                                                      \
>         dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]         \
>                    + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]   \
>                    + src[i * 8 + 6] + src[i * 8 + 7]);                  \
>   }
> 
> #define TEST2(NAME, OUTTYPE, INTYPE)                                    \
>   TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)                      \
> 
> #define TEST1(NAME, OUTTYPE)                                            \
>   TEST2 (NAME##_i32, OUTTYPE, int32_t)                                  \
> 
> #define TEST(NAME)                                                      \
>   TEST1 (NAME##_i32, int32_t)                                           \
> 
> TEST (test)
> 
> ASM:
> 
> test_i32_i32_f32_8:
> ble a3,zero,.L5
> .L3:
> vsetvli a4,a3,e8,mf4,ta,ma
> vle32.v v0,0(a2)
> vsetvli a5,zero,e32,m1,ta,ma
> vmsne.vi v0,v0,0
> vsetvli zero,a4,e32,m1,ta,ma
> vlseg8e32.v v8,(a1),v0.t
> vsetvli a5,zero,e32,m1,ta,ma
> slli a6,a4,2
> vadd.vv v1,v9,v8
> slli a7,a4,5
> vadd.vv v1,v1,v10
> sub a3,a3,a4
> vadd.vv v1,v1,v11
> vadd.vv v1,v1,v12
> vadd.vv v1,v1,v13
> vadd.vv v1,v1,v14
> vadd.vv v1,v1,v15
> vsetvli zero,a4,e32,m1,ta,ma
> vse32.v v1,0(a0),v0.t
> add a2,a2,a6
> add a1,a1,a7
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec.md (vec_mask_len_load_lanes): New pattern.
>  (vec_mask_len_store_lanes): Ditto.
>  (2): Fix pattern for ICE.
>  (2): Ditto.
>  * config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
>  * config/riscv/riscv-v.cc (get_mask_mode): Add tuple mode mask mode.
>  (expand_lanes_load_store): New function.
>  * config/riscv/vector-iterators.md: New iterator.
I would generally recommend sending independent fixes separately.  In 
particular the quad_trunc, oct_trunc changes seem like they should have 
been a separate patch.  But no need to resend this time.  Just try to 
break out distinct changes like those into their own patch.
 
OK, but obviously hold off committing until the generic support is 
approved and committed.
 
Thanks,
jeff
 
 


Re: Re: [PATCH] RISC-V: Fix reduc_strict_run-1 test case.

2023-08-15 Thread juzhe.zh...@rivai.ai
For float/double, the in-order fold-left reduction produces the same result as
the scalar code.

For _Float16 it does not, but I think that is a _Float16 precision issue rather
than a reduction issue.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-16 09:13
To: Robin Dapp; gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai
Subject: Re: [PATCH] RISC-V: Fix reduc_strict_run-1 test case.
 
 
On 8/15/23 09:49, Robin Dapp wrote:
> Hi,
> 
> this patch changes the equality check for the reduc_strict_run-1
> testcase from == to fabs () < EPS.  The FAIL only occurs with
> _Float16 but I'd argue approximate equality is preferable for all
> float modes.
> 
> Regards
>   Robin
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
> Check float equality with fabs < EPS.
Generally agree with using an EPS test.
 
The question is shouldn't a fold-left reduction be done in-order and 
produce the same result as a scalar equivalent?
 
Jeff
 
 


Re: [PATCH] RISC-V: Fix reduc_strict_run-1 test case.

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/15/23 09:49, Robin Dapp wrote:

Hi,

this patch changes the equality check for the reduc_strict_run-1
testcase from == to fabs () < EPS.  The FAIL only occurs with
_Float16 but I'd argue approximate equality is preferable for all
float modes.

Regards
  Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
Check float equality with fabs < EPS.

Generally agree with using an EPS test.

The question is shouldn't a fold-left reduction be done in-order and 
produce the same result as a scalar equivalent?


Jeff
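
To make the question concrete, here is a small C sketch (not taken from the testcase) of the two things under discussion: a strict in-order (fold-left) sum, which a vectorized in-order reduction is expected to match bit for bit, and a tolerance-based comparison of the kind the patch switches to. A pairwise sum is included only as a contrast, because it reassociates the additions and may therefore round differently. The tolerance EPS and all function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define EPS 1e-6f /* hypothetical tolerance, not the testcase's value */

/* Strict left-to-right accumulation: ((a0 + a1) + a2) + ...  */
float
fold_left_sum (const float *a, size_t n)
{
  float acc = 0.0f;
  for (size_t i = 0; i < n; ++i)
    acc += a[i];
  return acc;
}

/* Tree-shaped accumulation: same mathematical sum, different
   association order, so the rounding can differ from fold_left_sum.  */
float
pairwise_sum (const float *a, size_t n)
{
  assert (n > 0);
  if (n == 1)
    return a[0];
  size_t half = n / 2;
  return pairwise_sum (a, half) + pairwise_sum (a + half, n - half);
}

/* Same idea as fabsf (x - y) < eps, written out so that no libm
   link is needed.  */
int
approx_equal (float x, float y, float eps)
{
  float d = x - y;
  if (d < 0.0f)
    d = -d;
  return d < eps;
}
```

If the vector code really performs the additions in the same order as fold_left_sum, an exact == comparison should hold; approx_equal is the looser check that also tolerates a different rounding sequence.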



[PATCH] Remove XFAIL from gcc/testsuite/gcc.dg/unroll-7.c

2023-08-15 Thread Thiago Jung Bauermann via Gcc-patches
This test passes since commit e41103081bfa "Fix undefined behaviour in
profile_count::differs_from_p", so remove the xfail annotation.

Tested on aarch64-linux-gnu, armv8l-linux-gnueabihf and x86_64-linux-gnu.

gcc/testsuite/ChangeLog:
* gcc.dg/unroll-7.c: Remove xfail.
---
 gcc/testsuite/gcc.dg/unroll-7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/unroll-7.c b/gcc/testsuite/gcc.dg/unroll-7.c
index 650448df5db1..17c5e533c2cb 100644
--- a/gcc/testsuite/gcc.dg/unroll-7.c
+++ b/gcc/testsuite/gcc.dg/unroll-7.c
@@ -15,4 +15,4 @@ int t(void)
 /* { dg-final { scan-rtl-dump "upper bound: 99" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "realistic bound: 99" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "considering unrolling loop with constant number 
of iterations" "loop2_unroll" } } */
-/* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" {xfail *-*-* } 
} } */
+/* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" } } */

base-commit: 5da4c0b85a97727e6802eaf3a0d47bcdb8da5f51


Re: [RFC] GCC Security policy

2023-08-15 Thread Paul Koning via Gcc-patches



> On Aug 15, 2023, at 8:37 PM, Alexander Monakov  wrote:
> 
>> ...
>> At some point the system tools need to respect the programmer or operator.
>> There is a difference between writing "Hello, World" and writing
>> performance critical or safety critical code.  That is the responsibility
>> of the programmer and the development team to choose the right software
>> engineers and right tools.  And to have the development environment and
>> checks in place to ensure that the results are meeting the requirements.
>> 
>> It is not the role of GCC or its security policy to tell people how to do
>> their job or hobby.  This isn't a safety tag required to be attached to a
>> new mattress.
> 
> Yes (though I'm afraid the analogy with the mattress is a bit lost on me).
> Those examples were meant to illustrate the point I tried to make earlier,
> not as additions proposed for the Security Policy. Specific examples
> where we can tell people in advance that compiler output needs to be
> verified, because the compiler is not engineered to preserve those
> security-relevant properties from the source code (and we would not
> accept such accidents as security bugs).

Now I'm confused.  I thought the whole point of what GCC is trying to do, and 
wants to document, is that it DOES preserve security properties.  If the source 
code is standards-compliant and contains algorithms free of security holes, 
then the compiler is supposed to deliver output code that is likewise free of 
holes -- in other words, the transformation performed by GCC does not introduce 
holes in a hole-free input.

> Granted, it is a bit of a stretch since the notion of timing-safety is
> not really well-defined for C source code, but I didn't come up with
> better examples.

Is "timing-safety" a security property?  Not the way I understand that term.  
It sounds like another way to say that the code meets real time constraints or 
requirements.  No, compilers don't help with that (at least C doesn't -- Ada 
might be better here but I don't know enough).  For sufficiently strict 
requirements you'd have to examine both the generated machine code and 
understand, in gruesome detail, what the timing behaviors of the executing 
hardware are.  Good luck if it's a modern billion-transistor machine.

Again, I don't see that as a security property.  If it's considered desirable 
to say something about this, fine, but the words Siddhesh crafted don't fit for 
that kind of property.

paul



Re: [RFC] GCC Security policy

2023-08-15 Thread Alexander Monakov


On Tue, 15 Aug 2023, David Edelsohn wrote:

> > Making users responsible for verifying that sources are "safe" is not okay
> > (we cannot teach them how to do that since there's no general method).
> > Making users responsible for sandboxing the compiler is fine (there's
> > a range of sandboxing solutions, from which they can choose according
> > to their requirements and threat model). Sorry about the ambiguity.
> >
> 
> Alex.
> 
> The compiler should faithfully implement the algorithms described by the
> programmer.  The compiler is responsible if it generates incorrect code for
> a well-defined, language-conforming program.  The compiler cannot be
> responsible for security issues inherent in the user code, whether that
> causes the compiler to function in a manner that adversely affects the
> system or generates code that behaves in a manner that adversely affects
> the system.
> 
> If "safe" is the wrong word, what word would you suggest?

I think "safe" is the right word here. We also used "trusted" in a similar
sense. I believe we were on the same page about that.

> > For both 1) and 2), GCC is not engineered to respect such properties
> > during optimization and code generation, so it's not appropriate for such
> > tasks (a possible solution is to isolate such sensitive functions to
> > separate files, compile to assembly, inspect the assembly to check that it
> > still has the required properties, and use the inspected asm in subsequent
> > builds instead of the original high-level source).
> >
> 
> At some point the system tools need to respect the programmer or operator.
> There is a difference between writing "Hello, World" and writing
> performance critical or safety critical code.  That is the responsibility
> of the programmer and the development team to choose the right software
> engineers and right tools.  And to have the development environment and
> checks in place to ensure that the results are meeting the requirements.
> 
> It is not the role of GCC or its security policy to tell people how to do
> their job or hobby.  This isn't a safety tag required to be attached to a
> new mattress.

Yes (though I'm afraid the analogy with the mattress is a bit lost on me).
Those examples were meant to illustrate the point I tried to make earlier,
not as additions proposed for the Security Policy. Specific examples
where we can tell people in advance that compiler output needs to be
verified, because the compiler is not engineered to preserve those
security-relevant properties from the source code (and we would not
accept such accidents as security bugs).

Granted, it is a bit of a stretch since the notion of timing-safety is
not really well-defined for C source code, but I didn't come up with
better examples.

Alexander
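
As one illustration of what a source-level "timing-safe" property can look like (this is not one of the examples from the thread), below is a minimal C sketch of a comparison routine written so that its running time does not depend on where the inputs first differ. The function name ct_equal is hypothetical. Nothing in the C standard obliges a compiler to keep this shape, for instance it could in principle introduce an early exit, which is why output of this kind is said to need inspection.

```c
#include <stddef.h>

/* Returns 1 if the two buffers are equal and 0 otherwise.  At the source
   level every byte is always visited and there is no data-dependent
   branch inside the loop.  */
int
ct_equal (const void *a, const void *b, size_t len)
{
  const unsigned char *pa = a;
  const unsigned char *pb = b;
  unsigned char diff = 0;

  for (size_t i = 0; i < len; ++i)
    diff |= (unsigned char) (pa[i] ^ pb[i]); /* accumulate any mismatch */

  return diff == 0;
}
```

The function is functionally correct under any conforming compiler; only the timing behaviour is the part that the language, and therefore the compiler, makes no promise about.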


Re: [RFC] GCC Security policy

2023-08-15 Thread David Edelsohn via Gcc-patches
On Tue, Aug 15, 2023 at 7:07 PM Alexander Monakov 
wrote:

>
> On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:
>
> > > Thanks, this is nicer (see notes below). My main concern is that we
> > > shouldn't pretend there's some method of verifying that arbitrary source
> > > code is "safe" to pass to an unsandboxed compiler, nor should we push
> > > the responsibility of doing that on users.
> > But responsibility would be pushed to users, wouldn't it?
>
> Making users responsible for verifying that sources are "safe" is not okay
> (we cannot teach them how to do that since there's no general method).
> Making users responsible for sandboxing the compiler is fine (there's
> a range of sandboxing solutions, from which they can choose according
> to their requirements and threat model). Sorry about the ambiguity.
>

Alex.

The compiler should faithfully implement the algorithms described by the
programmer.  The compiler is responsible if it generates incorrect code for
a well-defined, language-conforming program.  The compiler cannot be
responsible for security issues inherent in the user code, whether that
causes the compiler to function in a manner that adversely affects the
system or generates code that behaves in a manner that adversely affects
the system.

If "safe" is the wrong word, what word would you suggest?


> > So:
> >
> > The compiler driver processes source code, invokes other programs such
> as the
> > assembler and linker and generates the output result, which may be
> assembly
> > code or machine code.  Compiling untrusted sources can result in
> arbitrary
> > code execution and unconstrained resource consumption in the compiler.
> As a
> > result, compilation of such code should be done inside a sandboxed
> environment
> > to ensure that it does not compromise the development environment.
>
> I'm happy with this, thanks for bearing with me.
>
> > >> inside a sandboxed environment to ensure that it does not compromise
> the
> > >> development environment.  Note that this still does not guarantee
> safety of
> > >> the produced output programs and that such programs should still
> either be
> > >> analyzed thoroughly for safety or run only inside a sandbox or an
> isolated
> > >> system to avoid compromising the execution environment.
> > >
> > > The last statement seems to be a new addition. It is too broad and
> again
> > > makes a reference to analysis that appears quite theoretical. It might
> be
> > > better to drop this (and instead talk in more specific terms about any
> > > guarantees that produced binary code matches security properties
> intended
> > > by the sources; I believe Richard Sandiford raised this previously).
> >
> > OK, so I actually cover this at the end of the section; Richard's point
> AFAICT
> > was about hardening, which I added another note for to make it explicit
> that
> > missed hardening does not constitute a CVE-worthy threat:
>
> Thanks for the reminder. To illustrate what I was talking about, let me
> give
> two examples:
>
> 1) safety w.r.t timing attacks: even if the source code is written in
> a manner that looks timing-safe, it might be transformed in a way that
> mounting a timing attack on the resulting machine code is possible;
>
> 2) safety w.r.t information leaks: even if the source code attempts
> to discard sensitive data (such as passwords and keys) immediately
> after use, (partial) copies of that data may be left on stack and
> in registers, to be leaked later via a different vulnerability.
>
> For both 1) and 2), GCC is not engineered to respect such properties
> during optimization and code generation, so it's not appropriate for such
> tasks (a possible solution is to isolate such sensitive functions to
> separate files, compile to assembly, inspect the assembly to check that it
> still has the required properties, and use the inspected asm in subsequent
> builds instead of the original high-level source).
>

At some point the system tools need to respect the programmer or operator.
There is a difference between writing "Hello, World" and writing
performance-critical or safety-critical code.  It is the responsibility
of the programmer and the development team to choose the right software
engineers and the right tools, and to have the development environment and
checks in place to ensure that the results meet the requirements.

It is not the role of GCC or its security policy to tell people how to do
their job or hobby.  This isn't a safety tag required to be attached to a
new mattress.

Thanks, David


>
> Cheers.
> Alexander
>


Re: [RFC] GCC Security policy

2023-08-15 Thread David Malcolm via Gcc-patches
On Mon, 2023-08-14 at 09:26 -0400, Siddhesh Poyarekar wrote:
> Hi,
> 
> Here's the updated draft of the top part of the security policy with all 
> of the recommendations incorporated.
> 
> Thanks,
> Sid
> 
> 
> What is a GCC security bug?
> ===
> 
>  A security bug is one that threatens the security of a system or
>  network, or might compromise the security of data stored on it.
>  In the context of GCC there are multiple ways in which this might
>  happen and they're detailed below.
> 
> Compiler drivers, programs, libgccjit and support libraries
> ---
> 
>  The compiler driver processes source code, invokes other programs
>  such as the assembler and linker and generates the output result,
>  which may be assembly code or machine code.  It is necessary that
>  all source code inputs to the compiler are trusted, since it is
>  impossible for the driver to validate input source code beyond
>  conformance to a programming language standard.
> 
>  The GCC JIT implementation, libgccjit, is intended to be plugged
>  into applications to translate input source code in the application
>  context.  Limitations that apply to the compiler
>  driver, apply here too in terms of sanitizing inputs, so it is
>  recommended that inputs are either sanitized by an external program
>  to allow only trusted, safe execution in the context of the
>  application or the JIT execution context is appropriately sandboxed
>  to contain the effects of any bugs in the JIT or its generated code
>  to the sandboxed environment.

I'd prefer to reword this, as libgccjit was a poor choice of name for
the library (sorry!), to make it clearer it can be used for both ahead-
of-time and just-in-time compilation, and that as used for compilation,
the host considerations apply, not just those of the generated target
code.

How about:

 The libgccjit library can, despite the name, be used both for
 ahead-of-time compilation and for just-in-time compilation.  In both
 cases it can be used to translate input representations (such as
 source code) in the application context; in the latter case the
 generated code is also run in the application context.
 Limitations that apply to the compiler driver, apply here too in
 terms of sanitizing inputs, so it is recommended that inputs are
 either sanitized by an external program to allow only trusted,
 safe compilation and execution in the context of the application,
 or that both the compilation *and* execution context of the code
 are appropriately sandboxed to contain the effects of any bugs in
 libgccjit, the application code using it, or its generated code to
 the sandboxed environment.

...or similar.

[...snip...]

Thanks
Dave



Re: [RFC] GCC Security policy

2023-08-15 Thread Alexander Monakov


On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:

> > Thanks, this is nicer (see notes below). My main concern is that we
> > shouldn't pretend there's some method of verifying that arbitrary source
> > code is "safe" to pass to an unsandboxed compiler, nor should we push
> > the responsibility of doing that on users.
> 
> But responsibility would be pushed to users, wouldn't it?

Making users responsible for verifying that sources are "safe" is not okay
(we cannot teach them how to do that since there's no general method).
Making users responsible for sandboxing the compiler is fine (there's
a range of sandboxing solutions, from which they can choose according
to their requirements and threat model). Sorry about the ambiguity.

> So:
> 
> The compiler driver processes source code, invokes other programs such as the
> assembler and linker and generates the output result, which may be assembly
> code or machine code.  Compiling untrusted sources can result in arbitrary
> code execution and unconstrained resource consumption in the compiler. As a
> result, compilation of such code should be done inside a sandboxed environment
> to ensure that it does not compromise the development environment.

I'm happy with this, thanks for bearing with me.

> >> inside a sandboxed environment to ensure that it does not compromise the
> >> development environment.  Note that this still does not guarantee safety of
> >> the produced output programs and that such programs should still either be
> >> analyzed thoroughly for safety or run only inside a sandbox or an isolated
> >> system to avoid compromising the execution environment.
> > 
> > The last statement seems to be a new addition. It is too broad and again
> > makes a reference to analysis that appears quite theoretical. It might be
> > better to drop this (and instead talk in more specific terms about any
> > guarantees that produced binary code matches security properties intended
> > by the sources; I believe Richard Sandiford raised this previously).
> 
> OK, so I actually cover this at the end of the section; Richard's point AFAICT
> was about hardening, which I added another note for to make it explicit that
> missed hardening does not constitute a CVE-worthy threat:

Thanks for the reminder. To illustrate what I was talking about, let me give
two examples:

1) safety w.r.t timing attacks: even if the source code is written in
a manner that looks timing-safe, it might be transformed in a way that
mounting a timing attack on the resulting machine code is possible;

2) safety w.r.t information leaks: even if the source code attempts
to discard sensitive data (such as passwords and keys) immediately
after use, (partial) copies of that data may be left on stack and
in registers, to be leaked later via a different vulnerability.

For both 1) and 2), GCC is not engineered to respect such properties
during optimization and code generation, so it's not appropriate for such
tasks (a possible solution is to isolate such sensitive functions to
separate files, compile to assembly, inspect the assembly to check that it
still has the required properties, and use the inspected asm in subsequent
builds instead of the original high-level source).
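
As a rough illustration of point 2), the sketch below contrasts a plain
memset, which an optimizer may remove as a dead store when the buffer is
not read again, with a wipe through a volatile-qualified pointer, a common
idiom for discouraging that removal.  The names wipe_volatile and
use_secret are made up for this example.  The idiom only covers the buffer
itself; it does nothing about copies left in registers or spilled to the
stack, which is the limitation described above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: overwrite a buffer through a volatile pointer so
// that each store is an access the compiler is expected to keep.
void wipe_volatile(void *buf, std::size_t len)
{
  volatile unsigned char *p = static_cast<volatile unsigned char *>(buf);
  for (std::size_t i = 0; i < len; ++i)
    p[i] = 0;
}

// Computes a checksum of a "secret" and then tries to discard the secret.
unsigned use_secret(bool careful)
{
  char key[16];
  std::memcpy(key, "0123456789abcde", sizeof key);  // 15 chars plus NUL

  unsigned sum = 0;
  for (char c : key)
    sum += static_cast<unsigned char>(c);

  if (careful)
    wipe_volatile(key, sizeof key);
  else
    std::memset(key, 0, sizeof key);  // dead store: may be optimized away
  return sum;
}
```

Both paths return the same value; the difference, if any, only shows up in
the generated assembly, which is why inspecting the assembly is suggested
above.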

Cheers.
Alexander


Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-15 Thread Bradley Lucier via Gcc-patches
First, if this is no longer the appropriate group for this discussion, 
please tell me where to send it.


I've been working to understand all the comments here.  From them, I think:

1.  It's OK to have gcc report back to the user whether each particular 
call in tail position is optimized when -foptimize-sibling-calls is set 
as a compiler option; or, to report only those calls that have not been 
optimized.


2.  Given (1), the question is what form that information should take, 
and which gcc option should cause it to be expressed.


From comments in this thread and the documentation for today's mainline 
gcc, I configured and built Gambit Scheme with


./configure CC="/pkgs/gcc-mainline/bin/gcc -fopt-info-missed" --enable-single-host


thinking that info about missed optimizations would be a good place to 
export information about non-optimized sibling calls.


This may not have been a good idea, however, as I ended up with 93367 
lines about missed optimizations.


Is this the right direction to proceed in?  The documentation says the
following about -fopt-info-missed:


 One or more of the following option keywords can be used to
 describe a group of optimizations:

 'ipa'
  Enable dumps from all interprocedural optimizations.
 'loop'
  Enable dumps from all loop optimizations.
 'inline'
  Enable dumps from all inlining optimizations.
 'omp'
  Enable dumps from all OMP (Offloading and Multi Processing)
  optimizations.
 'vec'
  Enable dumps from all vectorization optimizations.
 'optall'
  Enable dumps from all optimizations.  This is a superset of
  the optimization groups listed above.

I'd like to limit the number of missed optimization warnings, but I 
don't know where sibling call optimization would fit into these categories.
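
For readers following along, here is a minimal sketch of what "a call in
tail position" means.  The function names are invented for illustration.
In is_even/is_odd the call is the last thing the function does, so it is a
candidate for sibling-call optimization.  In count_with_local the address
of a local is passed to the callee, which is one situation where a compiler
generally cannot reuse the caller's frame.  Whether GCC optimizes any given
call depends on the target, ABI and options, so the comments are
expectations rather than guarantees.

```cpp
// Mutually recursive functions whose recursive calls are in tail position.
bool is_odd(unsigned n);

bool is_even(unsigned n)
{
  if (n == 0)
    return true;
  return is_odd(n - 1);   // tail position: nothing left to do afterwards
}

bool is_odd(unsigned n)
{
  if (n == 0)
    return false;
  return is_even(n - 1);  // tail position
}

// Hypothetical callee that reads through a pointer supplied by its caller.
unsigned add_through(const unsigned *p, unsigned n)
{
  return *p + n;
}

unsigned count_with_local(unsigned n)
{
  unsigned local = n;
  // Syntactically a tail call, but the callee needs 'local' to stay alive,
  // so the caller's frame cannot simply be replaced.
  return add_through(&local, 1);
}
```

A report limited to calls like the last one is the kind of output the
proposed warning is after.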


Brad


Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 3:46 PM David Malcolm  wrote:
>
> On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > This patch enhances location_get_source_line(), which is the
> > > > primary
> > > > interface provided by the diagnostics infrastructure to obtain
> > > > the line of
> > > > source code corresponding to a given location, so that it
> > > > understands
> > > > generated data locations in addition to normal file-based
> > > > locations. This
> > > > involves changing the argument to location_get_source_line() from
> > > > a plain
> > > > file name, to a source_id object that can represent either type
> > > > of location.
> > > >
>
> [...]
>
> > > >
> > > >
> > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > index 9377020b460..790279d4273 100644
> > > > --- a/gcc/input.cc
> > > > +++ b/gcc/input.cc
> > > > @@ -207,6 +207,28 @@ private:
> > > >void maybe_grow ();
> > > >  };
> > > >
> > > > +/* This is the implementation of cache_data_source for generated
> > > > +   data that is already in memory.  */
> > > > +class data_cache_slot final : public cache_data_source
> > >
> > > It occurred to me: why are we caching accessing a buffer that's
> > > already
> > > in memory - but we're also caching the line-splitting information,
> > > and
> > > providing the line-splitting algorithm with a consistent interface
> > > to
> > > the data, right?
> > >
> >
> > Yeah, for the current _Pragma use case, multi-line buffers are not
> > going to
> > be common, but they can occur. I was mainly motivated by the
> > consistent
> > interface, and by the assumption that the overhead is not critical
> > given a
> > diagnostic is being issued.
>
> (nods)
>
> >
> > > [...snip...]
> > >
> > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > (const char *file_path)
> > > >global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > >  }
> > > >
> > > > +void
> > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > +   unsigned int
> > > > data_len)
> > > > +{
> > > > +  if (!global_dc->m_file_cache)
> > > > +return;
> > > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > >
> > > Maybe we should rename diagnostic_context's m_file_cache to
> > > m_source_cache?  (and class file_cache for that matter?)  But if
> > > so,
> > > that can/should be a followup/separate patch.
> > >
> >
> > Yes, we should. Believe it or not, I was trying to minimize the size
> > of the
> > patch :)
>
> :)
>
> Thanks for splitting it up, BTW.
>
> [...]
>
>
> > >
> > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > line_num,
> > > > If the function fails, a NULL char_span is returned.  */
> > > >
> > > >  char_span
> > > > -location_get_source_line (const char *file_path, int line)
> > > > +location_get_source_line (source_id src, int line)
> > > >  {
> > > > -  const char *buffer = NULL;
> > > > -  ssize_t len;
> > > > -
> > > > -  if (line == 0)
> > > > -return char_span (NULL, 0);
> > > > -
> > > > -  if (file_path == NULL)
> > > > -return char_span (NULL, 0);
> > > > +  const char_span fail (nullptr, 0);
> > > > +  if (!src || line <= 0)
> > > > +return fail;
> > >
> > > Looking at source_id's operator bool, are there effectively three
> > > kinds
> > > of source_id?
> > >
> > > (a) file names
> > > (b) generated buffer
> > > (c) NULL == m_filename_or_buffer
> > >
> > > What does (c) mean?  Is it a "something's gone wrong/error" state?
> > > Or
> > > is this more a special-case of (a)? (in that the m_len for such a
> > > case
> > > would be zero)
> > >
> > > Should source_id's 2-param ctor have an assert that the ptr is non-
> > > NULL?
> > >
> > > [...snip...]
> > >
> > > The patch is OK for trunk as-is, but note the question about the
> > > source_id ctor above.
> > >
> >
> > Thanks. (c) has the same meaning as a NULL file name currently does,
> > so a
> > default-constructed source_id is not an in-memory buffer, but is
> > rather a
> > NULL filename. linemap_add() for instance, will interpret a NULL
> > filename
> > for an LC_LEAVE map, as a request to copy it from the natural values
> > being
> > returned to. I think the source_id constructor needs to accept a NULL
> > filename to remain backwards compatible. With the current design of
> > source_id, it is safe always to change a 'const char*' file name
> > argument to
> > a source_id argument instead; it will work just how it did before
> > because it
> > has an implicit constructor. But if the constructor would assert on a
> > non-NULL pointer, that would necessitate changing all call sites that
> > currently expect they can pass a NULL pointer there. (For example,
> > there are
> > several calls to _cpp_do_file_change() within libcpp that take
> > advantage of
> > being able to pass a 

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-15 Thread Joseph Myers
On Tue, 15 Aug 2023, chenxiaolong wrote:

>   In the implementation process, the "q" suffix function is
> Re-register and associate the "__float128" type with the
> "long double" type so that the compiler can handle the
> corresponding function correctly. The functions implemented
> include __builtin_{huge_valq infq, fabsq, copysignq, nanq,nansq}.
> On the LoongArch architecture, __builtin_{fabsq,copysignq} can
> be implemented with the instruction "bstrins.d", so that its
> optimization effect reaches the optimal value.

Why?  If long double has binary128 format, you shouldn't need any of these 
functions at all; if it doesn't, just the C23 _Float128 type name and f128 
constant suffix, and associated built-in functions defined in 
builtins.def, should suffice (and since we now have _FloatN support for 
C++, C++ no longer provides a reason for adding __float128 either).  
__float128 is a legacy type name and feature and shouldn't be needed on 
any new architectures, which can just use the standard type name from the 
start.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] config-list.mk Darwin: Use --with-gnu-as

2023-08-15 Thread Iain Sandoe
Hi Jan-Benedict,

> On 15 Aug 2023, at 20:36, Jan-Benedict Glaw  wrote:

> config-list.mk Darwin: Use --with-gnu-as for mass-building tests
> 
> As `config-list.mk` is probably mostly used on Linux system, where
> Apple's tools aren't around. Let's use --with-gnu-as instead to have
> an useable assembler.

* Actually I’m somewhat surprised that an “out of the tin” binutils assembler
would work for any arch other than probably x86_64 (which is what Adacore
folks supported for a while).  Otherwise, I’d kind of expect that the
mach-o-specific asm directives would trip things up quite quickly.  GAS is
not going to produce any sensible relocations for powerpc (I have a BFD
patch, but it needs some polish).

* Does this prevent it working properly, in the event that a Linux user has
 an installation of mach-o tools (e.g. cctools or llvm)?
 (there are several Linux-buildable branches floating around - and I am
  working on making something more ‘official’)

* Other than those questions, no objection from me (i.e. OK for trunk).

Iain




> 
> contrib/ChangeLog:
> 
>   * config-list.mk (i686-apple-darwin): Use --with-gnu-as.
>   (i686-apple-darwin9): Ditto.
>   (i686-apple-darwin10): Ditto.
>   (powerpc-darwin8): Ditto.
>   (powerpc-darwin7): Ditto.
>   (powerpc64-darwin): Ditto.
>   (x86_64-apple-darwin): Ditto.
> 
> diff --git a/contrib/config-list.mk b/contrib/config-list.mk
> index e570b13c71b..02d1a4fe6d2 100644
> --- a/contrib/config-list.mk
> +++ b/contrib/config-list.mk
> @@ -47,7 +47,9 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
> aarch64-rtems \
>   hppa-linux-gnuOPT-enable-sjlj-exceptions=yes hppa64-linux-gnu \
>   hppa64-hpux11.3 \
>   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes \
> -  i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 
> \
> +  i686-pc-linux-gnu \
> +  i686-apple-darwinOPT-with-gnu-as i686-apple-darwin9OPT-with-gnu-as \
> +  i686-apple-darwin10OPT-with-gnu-as \
>   i686-freebsd13 i686-kfreebsd-gnu \
>   i686-netbsdelf9 \
>   i686-openbsd i686-elf i686-kopensolaris-gnu i686-gnu \
> @@ -75,8 +77,8 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
> aarch64-rtems \
>   nvptx-none \
>   or1k-elf or1k-linux-uclibc or1k-linux-musl or1k-rtems \
>   pdp11-aout \
> -  powerpc-darwin8 \
> -  powerpc-darwin7 powerpc64-darwin powerpc-freebsd13 powerpc-netbsd \
> +  powerpc-darwin8OPT-with-gnu-as \
> +  powerpc-darwin7OPT-with-gnu-as powerpc64-darwinOPT-with-gnu-as 
> powerpc-freebsd13 powerpc-netbsd \
>   powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
>   powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
>   powerpc-rtems \
> @@ -96,7 +98,7 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
> aarch64-rtems \
>   sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux \
>   sparc64-netbsd sparc64-openbsd \
>   v850e1-elf v850e-elf v850-elf v850-rtems vax-linux-gnu \
> -  vax-netbsdelf visium-elf x86_64-apple-darwin x86_64-gnu \
> +  vax-netbsdelf visium-elf x86_64-apple-darwinOPT-with-gnu-as x86_64-gnu \
>   x86_64-pc-linux-gnuOPT-with-fpmath=avx \
>   x86_64-elfOPT-with-fpmath=sse x86_64-freebsd13 x86_64-netbsd \
>   x86_64-w64-mingw32 \
> 
> 
> Okay for trunk?
> 
> Thanks,
>  Jan-Benedict
> 
> 
> -- 



Re: [PATCH] config-list.mk Darwin: Use --with-gnu-as

2023-08-15 Thread Rainer Orth
Hi Jan-Benedict,

> config-list.mk Darwin: Use --with-gnu-as for mass-building tests
>
> As `config-list.mk` is probably mostly used on Linux system, where
> Apple's tools aren't around. Let's use --with-gnu-as instead to have
> an useable assembler.
>
> contrib/ChangeLog:
>
>   * config-list.mk (i686-apple-darwin): Use --with-gnu-as.
>   (i686-apple-darwin9): Ditto.
>   (i686-apple-darwin10): Ditto.
>   (powerpc-darwin8): Ditto.
>   (powerpc-darwin7): Ditto.
>   (powerpc64-darwin): Ditto.
>   (x86_64-apple-darwin): Ditto.

this doesn't seem right: binutils toplevel configure.ac has gas in
noconfigdirs for all but i?86-*-darwin*.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-15 Thread David Malcolm via Gcc-patches
On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > This patch enhances location_get_source_line(), which is the
> > > primary
> > > interface provided by the diagnostics infrastructure to obtain
> > > the line of
> > > source code corresponding to a given location, so that it
> > > understands
> > > generated data locations in addition to normal file-based
> > > locations. This
> > > involves changing the argument to location_get_source_line() from
> > > a plain
> > > file name, to a source_id object that can represent either type
> > > of location.
> > > 

[...]

> > > 
> > > 
> > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > index 9377020b460..790279d4273 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> > > @@ -207,6 +207,28 @@ private:
> > >    void maybe_grow ();
> > >  };
> > >  
> > > +/* This is the implementation of cache_data_source for generated
> > > +   data that is already in memory.  */
> > > +class data_cache_slot final : public cache_data_source
> > 
> > It occurred to me: why are we caching accessing a buffer that's
> > already
> > in memory - but we're also caching the line-splitting information,
> > and
> > providing the line-splitting algorithm with a consistent interface
> > to
> > the data, right?
> > 
> 
> Yeah, for the current _Pragma use case, multi-line buffers are not
> going to
> be common, but they can occur. I was mainly motivated by the
> consistent
> interface, and by the assumption that the overhead is not critical
> given a
> diagnostic is being issued.

(nods)

> 
> > [...snip...]
> > 
> > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > (const char *file_path)
> > >    global_dc->m_file_cache->forcibly_evict_file (file_path);
> > >  }
> > >  
> > > +void
> > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > +   unsigned int
> > > data_len)
> > > +{
> > > +  if (!global_dc->m_file_cache)
> > > +    return;
> > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > 
> > Maybe we should rename diagnostic_context's m_file_cache to
> > m_source_cache?  (and class file_cache for that matter?)  But if
> > so,
> > that can/should be a followup/separate patch.
> > 
> 
> Yes, we should. Believe it or not, I was trying to minimize the size
> of the
> patch :) 

:)

Thanks for splitting it up, BTW.

[...]


> > 
> > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > line_num,
> > >     If the function fails, a NULL char_span is returned.  */
> > >  
> > >  char_span
> > > -location_get_source_line (const char *file_path, int line)
> > > +location_get_source_line (source_id src, int line)
> > >  {
> > > -  const char *buffer = NULL;
> > > -  ssize_t len;
> > > -
> > > -  if (line == 0)
> > > -    return char_span (NULL, 0);
> > > -
> > > -  if (file_path == NULL)
> > > -    return char_span (NULL, 0);
> > > +  const char_span fail (nullptr, 0);
> > > +  if (!src || line <= 0)
> > > +    return fail;
> > 
> > Looking at source_id's operator bool, are there effectively three
> > kinds
> > of source_id?
> > 
> > (a) file names
> > (b) generated buffer
> > (c) NULL == m_filename_or_buffer
> > 
> > What does (c) mean?  Is it a "something's gone wrong/error" state? 
> > Or
> > is this more a special-case of (a)? (in that the m_len for such a
> > case
> > would be zero)
> > 
> > Should source_id's 2-param ctor have an assert that the ptr is non-
> > NULL?
> > 
> > [...snip...]
> > 
> > The patch is OK for trunk as-is, but note the question about the
> > source_id ctor above.
> > 
> 
> Thanks. (c) has the same meaning as a NULL file name currently does,
> so a
> default-constructed source_id is not an in-memory buffer, but is
> rather a
> NULL filename. linemap_add() for instance, will interpret a NULL
> filename
> for an LC_LEAVE map, as a request to copy it from the natural values
> being
> returned to. I think the source_id constructor needs to accept a NULL
> filename to remain backwards compatible. With the current design of
> source_id, it is safe always to change a 'const char*' file name
> argument to
> a source_id argument instead; it will work just how it did before
> because it
> has an implicit constructor. But if the constructor would assert on a
> non-NULL pointer, that would necessitate changing all call sites that
> currently expect they can pass a NULL pointer there. (For example,
> there are
> several calls to _cpp_do_file_change() within libcpp that take
> advantage of
> being able to pass a NULL filename to linemap_add.)

Yes, it's OK for this ctor to accept NULL;
   source_id (const char *filename = nullptr)
and I see you added the default arg.
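
To make the three states (a), (b) and (c) concrete, here is a minimal
standalone sketch.  It is not the class from the patch: toy_source_id,
is_buffer and can_look_up are hypothetical names, and the assumption that a
file name is stored with a zero length is taken from the question above,
not from the actual code.

```cpp
// Simplified, hypothetical stand-in for the source_id discussed above; the
// member names follow the thread, everything else is an assumption.
class toy_source_id
{
public:
  // Implicit on purpose: a plain 'const char *' file name, including NULL,
  // converts without any change at the call site.
  toy_source_id (const char *filename = nullptr)
    : m_filename_or_buffer (filename), m_len (0) {}

  // Generated in-memory buffer of buffer_len bytes.
  toy_source_id (const char *buffer, unsigned buffer_len)
    : m_filename_or_buffer (buffer), m_len (buffer_len) {}

  // False only for case (c), the NULL file name.
  explicit operator bool () const { return m_filename_or_buffer != nullptr; }

  // Assumed convention: file names carry a zero length.
  bool is_buffer () const { return m_len != 0; }

private:
  const char *m_filename_or_buffer;
  unsigned m_len;
};

// Mirrors the early-out in the patched location_get_source_line.
bool can_look_up (toy_source_id src, int line)
{
  return src && line > 0;
}
```

With this shape, can_look_up ("input.cc", 1) succeeds, while a
default-constructed toy_source_id behaves like the old NULL file name and
is rejected.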

I was referring to this ctor:
   source_id (const char *buffer, unsigned buffer_len)
Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we
assert 

Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-15 Thread David Malcolm via Gcc-patches
On Tue, 2023-08-15 at 13:58 -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > Class file_cache_slot in input.cc is used to query specific lines
> > > of source
> > > code from a file when needed by diagnostics infrastructure. This
> > > will be
> > > extended in a subsequent patch to support obtaining the source
> > > code from
> > > in-memory generated buffers rather than from a file. The present
> > > patch
> > > refactors class file_cache_slot, putting most of the logic into a
> > > new base
> > > class cache_data_source, in preparation for reusing that code in
> > > the next
> > > patch. There is no change in functionality yet.
> > > 

[...snip...]

> > 
> > I confess I had to reread both this and patch 4/8 to make sense of
> > this; this is probably one of those cases where it's harder to read
> > in
> > patch form than as source, but I think I now understand the new
> > implementation.
> 
> Yes, sorry about that. I hope at least splitting into two patches
> here made it
> a little easier.
> 
> > 
> > Did you try testing this with valgrind (e.g. "make selftest-
> > valgrind")?
> > 
> 
> Oh interesting, was not aware of this. I think it shows that new
> leaks were
> not introduced with the patch series.
> 

[...snip...]

> 
> 
> > I don't think we have any selftest coverage for "\r" in the line-
> > break
> > handling; that would be good to add.
> > 
> > This patch is OK for trunk once the rest of the kit is approved.
> 
> Thank you. To be clear, were you suggesting to add selftest coverage
> for \r
> endings now, or in a follow up?

The former, please, so that we can be sure that the patch doesn't
introduce any buffer overreads etc.
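
A selftest of that kind would feed the line reader buffers with mixed line
endings and check each returned line.  The sketch below is neither GCC's
implementation nor its selftest framework; split_lines is a made-up helper
that treats '\n' as the terminator and drops one trailing '\r'.  That is
enough to show the cases worth covering: a "\r\n" ending, a bare "\n", and
a final line with no newline at all.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a buffer into lines without their terminators.
std::vector<std::string> split_lines(const std::string &buf)
{
  std::vector<std::string> lines;
  std::size_t start = 0;
  while (start < buf.size())
    {
      std::size_t end = buf.find('\n', start);
      if (end == std::string::npos)
        end = buf.size();            // last line has no terminator
      std::size_t len = end - start;
      if (len > 0 && buf[end - 1] == '\r')
        --len;                       // drop a trailing CR
      lines.push_back(buf.substr(start, len));
      start = end + 1;
    }
  return lines;
}
```

Because the index before the terminator is only read when the line is
non-empty, an input that starts with "\n" does not read before the start
of the buffer, which is the sort of boundary a real selftest should pin
down.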

Thanks
Dave



[PATCH] config-list.mk Darwin: Use --with-gnu-as

2023-08-15 Thread Jan-Benedict Glaw
Hi!

config-list.mk Darwin: Use --with-gnu-as for mass-building tests

As `config-list.mk` is probably mostly used on Linux systems, where
Apple's tools aren't around, let's use --with-gnu-as instead to have
a usable assembler.

contrib/ChangeLog:

* config-list.mk (i686-apple-darwin): Use --with-gnu-as.
(i686-apple-darwin9): Ditto.
(i686-apple-darwin10): Ditto.
(powerpc-darwin8): Ditto.
(powerpc-darwin7): Ditto.
(powerpc64-darwin): Ditto.
(x86_64-apple-darwin): Ditto.

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index e570b13c71b..02d1a4fe6d2 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -47,7 +47,9 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
aarch64-rtems \
   hppa-linux-gnuOPT-enable-sjlj-exceptions=yes hppa64-linux-gnu \
   hppa64-hpux11.3 \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes \
-  i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 \
+  i686-pc-linux-gnu \
+  i686-apple-darwinOPT-with-gnu-as i686-apple-darwin9OPT-with-gnu-as \
+  i686-apple-darwin10OPT-with-gnu-as \
   i686-freebsd13 i686-kfreebsd-gnu \
   i686-netbsdelf9 \
   i686-openbsd i686-elf i686-kopensolaris-gnu i686-gnu \
@@ -75,8 +77,8 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
aarch64-rtems \
   nvptx-none \
   or1k-elf or1k-linux-uclibc or1k-linux-musl or1k-rtems \
   pdp11-aout \
-  powerpc-darwin8 \
-  powerpc-darwin7 powerpc64-darwin powerpc-freebsd13 powerpc-netbsd \
+  powerpc-darwin8OPT-with-gnu-as \
+  powerpc-darwin7OPT-with-gnu-as powerpc64-darwinOPT-with-gnu-as 
powerpc-freebsd13 powerpc-netbsd \
   powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
   powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
   powerpc-rtems \
@@ -96,7 +98,7 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
aarch64-rtems \
   sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux \
   sparc64-netbsd sparc64-openbsd \
   v850e1-elf v850e-elf v850-elf v850-rtems vax-linux-gnu \
-  vax-netbsdelf visium-elf x86_64-apple-darwin x86_64-gnu \
+  vax-netbsdelf visium-elf x86_64-apple-darwinOPT-with-gnu-as x86_64-gnu \
   x86_64-pc-linux-gnuOPT-with-fpmath=avx \
   x86_64-elfOPT-with-fpmath=sse x86_64-freebsd13 x86_64-netbsd \
   x86_64-w64-mingw32 \


Okay for trunk?

Thanks,
  Jan-Benedict


-- 




[PATCH] config-list.mk i686-solaris2.11: Use --with-gnu-as

2023-08-15 Thread Jan-Benedict Glaw
Hi!

i686-solaris2.11: Use --with-gnu-as for mass-building tests

As `config-list.mk` is probably mostly used on Linux systems, where
Solaris's `as` isn't available, let's use GNU `as` as the default.

contrib/ChangeLog:

* config-list.mk (i686-solaris2.11): Use --with-gnu-as.

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index e570b13c71b..cb158a6c71e 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -52,7 +52,7 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
aarch64-rtems \
   i686-netbsdelf9 \
   i686-openbsd i686-elf i686-kopensolaris-gnu i686-gnu \
   i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
-  i686-rtems i686-solaris2.11 i686-wrs-vxworks \
+  i686-rtems i686-solaris2.11OPT-with-gnu-as i686-wrs-vxworks \
   i686-wrs-vxworksae \
   i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elf \
   ia64-linux ia64-hpux ia64-hp-vms iq2000-elf lm32-elf \

Okay for trunk?

Thanks,
  Jan-Benedict



Re: [RFC] GCC Security policy

2023-08-15 Thread Siddhesh Poyarekar

On 2023-08-15 10:07, Alexander Monakov wrote:


On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:


Does this as the first paragraph address your concerns:


Thanks, this is nicer (see notes below). My main concern is that we shouldn't
pretend there's some method of verifying that arbitrary source code is "safe"
to pass to an unsandboxed compiler, nor should we push the responsibility of
doing that on users.


But responsibility would be pushed to users, wouldn't it?


The compiler driver processes source code, invokes other programs such as the
assembler and linker and generates the output result, which may be assembly
code or machine code.  It is necessary that all source code inputs to the
compiler are trusted, since it is impossible for the driver to validate input
source code for safety.


The statement begins with "It is necessary", but the next statement offers
an alternative in case the code is untrusted. This is a contradiction.
Is it necessary or not in the end?

I'd suggest to drop this statement and instead make a brief note that
compiling crafted/untrusted sources can result in arbitrary code execution
and unconstrained resource consumption in the compiler.


So:

The compiler driver processes source code, invokes other programs such 
as the assembler and linker and generates the output result, which may 
be assembly code or machine code.  Compiling untrusted sources can 
result in arbitrary code execution and unconstrained resource 
consumption in the compiler. As a result, compilation of such code 
should be done inside a sandboxed environment to ensure that it does not 
compromise the development environment.



For untrusted code should compilation should be done

  ^^
 typo (spurious 'should')


Ack, thanks.




inside a sandboxed environment to ensure that it does not compromise the
development environment.  Note that this still does not guarantee safety of
the produced output programs and that such programs should still either be
analyzed thoroughly for safety or run only inside a sandbox or an isolated
system to avoid compromising the execution environment.


The last statement seems to be a new addition. It is too broad and again
makes a reference to analysis that appears quite theoretical. It might be
better to drop this (and instead talk in more specific terms about any
guarantees that produced binary code matches security properties intended
by the sources; I believe Richard Sandiford raised this previously).


OK, so I actually cover this at the end of the section; Richard's point 
AFAICT was about hardening, which I added another note for to make it 
explicit that missed hardening does not constitute a CVE-worthy threat:


As a result, the only case for a potential security issue in the
compiler is when it generates vulnerable application code for
trusted input source code that is conforming to the relevant
programming standard or extensions documented as supported by GCC
and the algorithm expressed in the source code does not have the
vulnerability.  The output application code could be considered
vulnerable if it produces an actual vulnerability in the target
application, specifically in the following cases:

- The application dereferences an invalid memory location despite
  the application sources being valid.
- The application reads from or writes to a valid but incorrect
  memory location, resulting in an information integrity issue or an
  information leak.
- The application ends up running in an infinite loop or with
  severe degradation in performance despite the input sources having
  no such issue, resulting in a Denial of Service.  Note that
  correct but non-performant code is not a security issue candidate,
  this only applies to incorrect code that may result in performance
  degradation severe enough to amount to a denial of service.
- The application crashes due to the generated incorrect code,
  resulting in a Denial of Service.


Re: [PATCH] bpf: remove useless define_insn for extendsisi2

2023-08-15 Thread Jose E. Marchesi via Gcc-patches


OK.
Thanks!

> This define_insn is never used, since a sign-extend to the same mode is
> just a move, so delete it.
>
> Tested on x86_64-linux-gnu host for bpf-unknown-none target.
>
> gcc/
>
>   * config/bpf/bpf.md (extendsisi2): Delete useless define_insn.
> ---
>  gcc/config/bpf/bpf.md | 7 ---
>  1 file changed, 7 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index e0a42b9f939..a64de1095ed 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -350,13 +350,6 @@ (define_insn "extendqidi2"
> {ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
>[(set_attr "type" "alu,ldx")])
>  
> -(define_insn "extendsisi2"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> -(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
> -  "bpf_has_smov"
> -  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
> -  [(set_attr "type" "alu")])
> -
>  (define_insn "extendhisi2"
>[(set (match_operand:SI 0 "register_operand" "=r")
>  (sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]


Re: [PATCH] bpf: fix pseudoc w regs for small modes [PR111029]

2023-08-15 Thread Jose E. Marchesi via Gcc-patches


Hello David.
Thanks for the patch.

OK.

> In the BPF pseudo-c assembly dialect, registers treated as 32-bits
> rather than the full 64 in various instructions ought to be printed as
> "wN" rather than "rN".  But bpf_print_register () was only doing this
> for specifically SImode registers, meaning smaller modes were printed
> incorrectly.
>
> This caused assembler errors like:
>
>   Error: unrecognized instruction `w2 =(s8)r1'
>
> for a 32-bit sign-extending register move instruction, where the source
> register is used in QImode.
>
> Fix bpf_print_register () to print the "w" version of register when
> specified by the template for any mode 32-bits or smaller.
>
> Tested on bpf-unknown-none.
>
>   PR target/111029
>
> gcc/
>   * config/bpf/bpf.cc (bpf_print_register): Print 'w' registers
>   for any mode 32-bits or smaller, not just SImode.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/smov-2.c: New test.
>   * gcc.target/bpf/smov-pseudoc-2.c: New test.
> ---
>  gcc/config/bpf/bpf.cc |  2 +-
>  gcc/testsuite/gcc.target/bpf/smov-2.c | 15 +++
>  gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c | 15 +++
>  3 files changed, 31 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 3516b79bce4..1d0abd7fbb3 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -753,7 +753,7 @@ bpf_print_register (FILE *file, rtx op, int code)
>  fprintf (file, "%s", reg_names[REGNO (op)]);
>else
>  {
> -  if (code == 'w' && GET_MODE (op) == SImode)
> +  if (code == 'w' && GET_MODE_SIZE (GET_MODE (op)) <= 4)
>   {
> if (REGNO (op) == BPF_FP)
>   fprintf (file, "w10");
> diff --git a/gcc/testsuite/gcc.target/bpf/smov-2.c 
> b/gcc/testsuite/gcc.target/bpf/smov-2.c
> new file mode 100644
> index 000..6f3516d2385
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/smov-2.c
> @@ -0,0 +1,15 @@
> +/* Check signed 32-bit mov instructions.  */
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=v4 -O2" } */
> +
> +int
> +foo (unsigned char a, unsigned short b)
> +{
> +  int x = (char) a;
> +  int y = (short) b;
> +
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler {movs32\t%r.,%r.,8\n} } } */
> +/* { dg-final { scan-assembler {movs32\t%r.,%r.,16\n} } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c 
> b/gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c
> new file mode 100644
> index 000..6af6cadf8df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c
> @@ -0,0 +1,15 @@
> +/* Check signed 32-bit mov instructions (pseudo-C asm dialect).  */
> +/* { dg-do compile } */
> +/* { dg-options "-mcpu=v4 -O2 -masm=pseudoc" } */
> +
> +int
> +foo (unsigned char a, unsigned short b)
> +{
> +  int x = (char) a;
> +  int y = (short) b;
> +
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler {w. = \(s8\) w.\n} } } */
> +/* { dg-final { scan-assembler {w. = \(s16\) w.\n} } } */


[PATCH] bpf: remove useless define_insn for extendsisi2

2023-08-15 Thread David Faust via Gcc-patches
This define_insn is never used, since a sign-extend to the same mode is
just a move, so delete it.

Tested on x86_64-linux-gnu host for bpf-unknown-none target.

gcc/

* config/bpf/bpf.md (extendsisi2): Delete useless define_insn.
---
 gcc/config/bpf/bpf.md | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index e0a42b9f939..a64de1095ed 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -350,13 +350,6 @@ (define_insn "extendqidi2"
{ldxsb\t%0,%1|%0 = *(s8 *) (%1)}"
   [(set_attr "type" "alu,ldx")])
 
-(define_insn "extendsisi2"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-(sign_extend:SI (match_operand:SI 1 "register_operand" "r")))]
-  "bpf_has_smov"
-  "{movs32\t%0,%1,32|%w0 = (s32) %w1}"
-  [(set_attr "type" "alu")])
-
 (define_insn "extendhisi2"
   [(set (match_operand:SI 0 "register_operand" "=r")
 (sign_extend:SI (match_operand:HI 1 "register_operand" "r")))]
-- 
2.40.1



[PATCH] bpf: fix pseudoc w regs for small modes [PR111029]

2023-08-15 Thread David Faust via Gcc-patches
In the BPF pseudo-c assembly dialect, registers treated as 32-bits
rather than the full 64 in various instructions ought to be printed as
"wN" rather than "rN".  But bpf_print_register () was only doing this
for specifically SImode registers, meaning smaller modes were printed
incorrectly.

This caused assembler errors like:

  Error: unrecognized instruction `w2 =(s8)r1'

for a 32-bit sign-extending register move instruction, where the source
register is used in QImode.

Fix bpf_print_register () to print the "w" version of register when
specified by the template for any mode 32-bits or smaller.
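
The rule described above can be modelled outside of GCC with a small C
sketch. The `toy_mode` enum and `toy_print_register` are hypothetical
stand-ins for GCC's machine modes and for bpf_print_register (); only the
size comparison mirrors the actual change, and special cases such as the
frame pointer are left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for GCC's machine modes; each value is the
   mode size in bytes.  */
enum toy_mode { TOY_QI = 1, TOY_HI = 2, TOY_SI = 4, TOY_DI = 8 };

/* Format register REGNO into BUF.  CODE is the operand modifier from
   the output template: 'w' requests the 32-bit view, 0 means none.  */
void
toy_print_register (char *buf, size_t size, int regno,
                    enum toy_mode mode, int code)
{
  assert (buf && size > 0);
  /* The old check was the equivalent of mode == TOY_SI, which missed
     the QI and HI cases.  */
  int use_w = (code == 'w' && (int) mode <= 4);
  snprintf (buf, size, "%c%d", use_w ? 'w' : 'r', regno);
}
```

In this model a QImode operand printed with the 'w' modifier comes out
as "w1", whereas a DImode operand keeps the "r" prefix even when the
template asks for 'w'.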

Tested on bpf-unknown-none.

PR target/111029

gcc/
* config/bpf/bpf.cc (bpf_print_register): Print 'w' registers
for any mode 32-bits or smaller, not just SImode.

gcc/testsuite/

* gcc.target/bpf/smov-2.c: New test.
* gcc.target/bpf/smov-pseudoc-2.c: New test.
---
 gcc/config/bpf/bpf.cc |  2 +-
 gcc/testsuite/gcc.target/bpf/smov-2.c | 15 +++
 gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c | 15 +++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/smov-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 3516b79bce4..1d0abd7fbb3 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -753,7 +753,7 @@ bpf_print_register (FILE *file, rtx op, int code)
 fprintf (file, "%s", reg_names[REGNO (op)]);
   else
 {
-  if (code == 'w' && GET_MODE (op) == SImode)
+  if (code == 'w' && GET_MODE_SIZE (GET_MODE (op)) <= 4)
{
  if (REGNO (op) == BPF_FP)
fprintf (file, "w10");
diff --git a/gcc/testsuite/gcc.target/bpf/smov-2.c 
b/gcc/testsuite/gcc.target/bpf/smov-2.c
new file mode 100644
index 000..6f3516d2385
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/smov-2.c
@@ -0,0 +1,15 @@
+/* Check signed 32-bit mov instructions.  */
+/* { dg-do compile } */
+/* { dg-options "-mcpu=v4 -O2" } */
+
+int
+foo (unsigned char a, unsigned short b)
+{
+  int x = (char) a;
+  int y = (short) b;
+
+  return x + y;
+}
+
+/* { dg-final { scan-assembler {movs32\t%r.,%r.,8\n} } } */
+/* { dg-final { scan-assembler {movs32\t%r.,%r.,16\n} } } */
diff --git a/gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c 
b/gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c
new file mode 100644
index 000..6af6cadf8df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/smov-pseudoc-2.c
@@ -0,0 +1,15 @@
+/* Check signed 32-bit mov instructions (pseudo-C asm dialect).  */
+/* { dg-do compile } */
+/* { dg-options "-mcpu=v4 -O2 -masm=pseudoc" } */
+
+int
+foo (unsigned char a, unsigned short b)
+{
+  int x = (char) a;
+  int y = (short) b;
+
+  return x + y;
+}
+
+/* { dg-final { scan-assembler {w. = \(s8\) w.\n} } } */
+/* { dg-final { scan-assembler {w. = \(s16\) w.\n} } } */
-- 
2.40.1



[PATCH] testsuite: Remove unused dg-line in ce8cdf5bcf96a2db6d7b9f656fc9ba58d7942a83

2023-08-15 Thread Benjamin Priour via Gcc-patches
From: benjamin priour 

Yet another blunder.

Successfully regstrapped against ce8cdf5bcf96a2db6d7b9f656fc9ba58d7942a83
on x86_64-linux-gnu.

OK to push to trunk?
Sorry,
Benjamin.

Fixup below.
---

Test case g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C
introduced by patch ce8cdf5bcf96a2db6d7b9f656fc9ba58d7942a83
emitted a warning for an unused dg-line variable.
This fixes up the blunder.

Signed-off-by: benjamin priour 

gcc/testsuite/ChangeLog:

* g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C:
Remove dg-line var declare_a.
---
 .../g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/gcc/testsuite/g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C 
b/gcc/testsuite/g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C
index 4cc93d129f0..aa964f93563 100644
--- a/gcc/testsuite/g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C
+++ b/gcc/testsuite/g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C
@@ -6,7 +6,7 @@
 struct A {int x; int y;};
 
 int main () { /* { dg-message "\\(1\\) entry to 'main'" "telltale event that 
we are going within a deeper frame than 'main'" } */
-  std::shared_ptr a; /* { dg-line declare_a } */ 
+  std::shared_ptr a;
   a->x = 4; /* { dg-line deref_a } */ 
   /* { dg-warning "dereference of NULL" "" { target *-*-* } deref_a } */
 
-- 
2.34.1



[PATCH V3] riscv: generate builtin macro for compilation with strict alignment:

2023-08-15 Thread Edwin Lu
This patch is a modification of 
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html
following the discussion on
https://github.com/riscv-non-isa/riscv-c-api-doc/issues/32

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false. 
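
As an illustration of how user code might consume these macros, here is
a minimal C sketch. The helper name `load_u32` is hypothetical, the
byte-wise path assumes a little-endian layout, and on targets that
define none of the three macros the code falls back to that byte-wise
path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 32-bit little-endian value from a possibly misaligned
   address.  */
uint32_t
load_u32 (const unsigned char *p)
{
  assert (p != NULL);
#if defined (__riscv_unaligned_fast)
  /* Misaligned accesses are cheap: do one word-sized copy, which the
     compiler is free to turn into a single load.  This assumes a
     little-endian RISC-V target.  */
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
#else
  /* __riscv_unaligned_slow, __riscv_unaligned_avoid, or a target that
     defines none of the macros: assemble the value from byte loads.  */
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
#endif
}
```

Whether a project wants to treat "slow" and "avoid" the same way, as
this sketch does, is a design choice; code that must never issue a
misaligned access would key only on __riscv_unaligned_avoid.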

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
  Generate __riscv_unaligned_avoid with value 1 or
 __riscv_unaligned_slow with value 1 or
 __riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override):
Define riscv_user_wants_strict_align. Set
riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
__riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.

Signed-off-by: Edwin Lu 
Co-authored-by: Vineet Gupta 
---
Changes in V3:
- Clean up tests to be less verbose
- Fix style, comments, and consistency

Changes in V2:
- Updated naming conventions
  - Updated tests when -m[no-]strict-align is not explicitly added
---
 gcc/config/riscv/riscv-c.cc |  7 +++
 gcc/config/riscv/riscv.cc   |  9 +
 gcc/config/riscv/riscv.h|  1 +
 gcc/testsuite/gcc.target/riscv/attribute-1.c| 12 
 gcc/testsuite/gcc.target/riscv/attribute-4.c| 10 ++
 gcc/testsuite/gcc.target/riscv/attribute-5.c| 11 +++
 gcc/testsuite/gcc.target/riscv/predef-align-1.c | 16 
 gcc/testsuite/gcc.target/riscv/predef-align-2.c | 15 +++
 gcc/testsuite/gcc.target/riscv/predef-align-3.c | 16 
 gcc/testsuite/gcc.target/riscv/predef-align-4.c | 16 
 gcc/testsuite/gcc.target/riscv/predef-align-5.c | 15 +++
 gcc/testsuite/gcc.target/riscv/predef-align-6.c | 16 
 12 files changed, 144 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-6.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 2937c160071..283052ae313 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -108,6 +108,13 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 
 }
 
+  if (riscv_user_wants_strict_align)
+builtin_define_with_int_value ("__riscv_unaligned_avoid", 1);
+  else if (riscv_slow_unaligned_access_p)
+builtin_define_with_int_value ("__riscv_unaligned_slow", 1);
+  else
+builtin_define_with_int_value ("__riscv_unaligned_fast", 1);
+
   if (TARGET_MIN_VLEN != 0)
 builtin_define_with_int_value ("__riscv_v_min_vlen", TARGET_MIN_VLEN);
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 49062bef9fc..705b750aaad 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -247,6 +247,9 @@ struct riscv_tune_info {
 /* Whether unaligned accesses execute very slowly.  */
 bool riscv_slow_unaligned_access_p;
 
+/* Whether user explicitly passed -mstrict-align.  */
+bool riscv_user_wants_strict_align;
+
 /* Stack alignment to assume/maintain.  */
 unsigned riscv_stack_boundary;
 
@@ -6962,6 +6965,12 @@ riscv_option_override (void)
  -m[no-]strict-align is left unspecified, heed -mtune's advice.  */
   riscv_slow_unaligned_access_p = (cpu->tune_param->slow_unaligned_access
   || TARGET_STRICT_ALIGN);
+
+  /* Make a note if user explicity passed -mstrict-align for later
+ builtin macro generation.  Can't use target_flags_explicitly since
+ it is set even for -mno-strict-align.  */
+  riscv_user_wants_strict_align = TARGET_STRICT_ALIGN;
+
   if ((target_flags_explicit & MASK_STRICT_ALIGN) == 0
   && cpu->tune_param->slow_unaligned_access)
 target_flags |= MASK_STRICT_ALIGN;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index e18a0081297..e093db09d31 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1036,6 +1036,7 @@ while (0)
 

Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > This patch enhances location_get_source_line(), which is the primary
> > interface provided by the diagnostics infrastructure to obtain the line of
> > source code corresponding to a given location, so that it understands
> > generated data locations in addition to normal file-based locations. This
> > involves changing the argument to location_get_source_line() from a plain
> > file name, to a source_id object that can represent either type of location.
> > 
> > gcc/ChangeLog:
> > 
> > * input.cc (class data_cache_slot): New class.
> > (file_cache::lookup_data): New function.
> > (diagnostics_file_cache_forcibly_evict_data): New function.
> > (file_cache::forcibly_evict_data): New function.
> > (file_cache::evicted_cache_tab_entry): Generalize (via a template)
> > to work for both file_cache_slot and data_cache_slot.
> > (file_cache::add_file): Adapt for new interface to
> > evicted_cache_tab_entry.
> > (file_cache::add_data): New function.
> > (data_cache_slot::create): New function.
> > (file_cache::file_cache): Support the new m_data_slots member.
> > (file_cache::~file_cache): Likewise.
> > (file_cache::lookup_or_add_data): New function.
> > (file_cache::lookup_or_add): New function that calls either
> > lookup_or_add_data or lookup_or_add_file as appropriate.
> > (location_get_source_line): Change the FILE_PATH argument to a
> > source_id SRC, and use it to support obtaining source lines from
> > generated data as well as from files.
> > (location_compute_display_column): Support generated data using the
> > new features of location_get_source_line.
> > (dump_location_info): Likewise.
> > * input.h (location_get_source_line): Adjust prototype. Add a new
> > convenience overload taking an expanded_location.
> > (class cache_data_source): Declare.
> > (class data_cache_slot): Declare.
> > (class file_cache): Declare new members.
> > (diagnostics_file_cache_forcibly_evict_data): Declare.
> > ---
> >  gcc/input.cc | 171 ---
> >  gcc/input.h  |  23 +--
> >  2 files changed, 153 insertions(+), 41 deletions(-)
> > 
> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index 9377020b460..790279d4273 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> > @@ -207,6 +207,28 @@ private:
> >void maybe_grow ();
> >  };
> >  
> > +/* This is the implementation of cache_data_source for generated
> > +   data that is already in memory.  */
> > +class data_cache_slot final : public cache_data_source
> 
> It occurred to me: why are we caching accessing a buffer that's already
> in memory - but we're also caching the line-splitting information, and
> providing the line-splitting algorithm with a consistent interface to
> the data, right?
>

Yeah, for the current _Pragma use case, multi-line buffers are not going to
be common, but they can occur. I was mainly motivated by the consistent
interface, and by the assumption that the overhead is not critical given a
diagnostic is being issued.
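
To make the line-splitting point concrete, here is a minimal C sketch of
looking up a line in an in-memory buffer described by a pair of
pointers. `buffer_get_line` is a hypothetical helper, not code from the
patch; unlike the cache slots discussed here it rescans from the start
on every call and remembers nothing between calls.

```c
#include <assert.h>
#include <string.h>

/* Find 1-based line LINE_NUM in the buffer [BEGIN, END).  On success,
   store the start of the line in *LINE and its length (without the
   newline) in *LINE_LEN and return 1; return 0 if there is no such
   line.  */
int
buffer_get_line (const char *begin, const char *end, unsigned line_num,
                 const char **line, size_t *line_len)
{
  assert (begin && end && begin <= end && line && line_len);
  if (line_num == 0)
    return 0;
  const char *p = begin;
  for (unsigned cur = 1; p < end; cur++)
    {
      const char *nl = memchr (p, '\n', (size_t) (end - p));
      const char *line_end = nl ? nl : end;
      if (cur == line_num)
        {
          *line = p;
          *line_len = (size_t) (line_end - p);
          return 1;
        }
      /* Step past the newline, or stop if this was the last line.  */
      p = nl ? nl + 1 : end;
    }
  return 0;
}
```

A cache in front of such a scan pays off mainly when the same buffer is
queried for several lines, which is the multi-line case mentioned above.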

> [...snip...]
> 
> > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char 
> > *file_path)
> >global_dc->m_file_cache->forcibly_evict_file (file_path);
> >  }
> >  
> > +void
> > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > +   unsigned int data_len)
> > +{
> > +  if (!global_dc->m_file_cache)
> > +return;
> > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> 
> Maybe we should rename diagnostic_context's m_file_cache to
> m_source_cache?  (and class file_cache for that matter?)  But if so,
> that can/should be a followup/separate patch.
>

Yes, we should. Believe it or not, I was trying to minimize the size of the
patch :) So I didn't make such changes, but they will make things more
clear.

> [...snip...]
>  
> > @@ -525,10 +582,22 @@ file_cache_slot::create (const 
> > file_cache::input_context _context,
> >return true;
> >  }
> >  
> > +void
> > +data_cache_slot::create (const char *data, unsigned int data_len,
> > +unsigned int highest_use_count)
> > +{
> > +  reset ();
> > +  on_create (highest_use_count + 1,
> > +total_lines_num (source_id {data, data_len}));
> > +  m_data_begin = data;
> > +  m_data_end = data + data_len;
> > +}
> > +
> >  /* file_cache's ctor.  */
> >  
> >  file_cache::file_cache ()
> > -: m_file_slots (new file_cache_slot[num_file_slots])
> > +  : m_file_slots (new file_cache_slot[num_file_slots]),
> > +m_data_slots (new data_cache_slot[num_file_slots])
> 
> Should "num_file_slots" be renamed to "num_slots"?
> 
> I assume you're using the same value for both kinds of 

Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > Class file_cache_slot in input.cc is used to query specific lines of source
> > code from a file when needed by diagnostics infrastructure. This will be
> > extended in a subsequent patch to support obtaining the source code from
> > in-memory generated buffers rather than from a file. The present patch
> > refactors class file_cache_slot, putting most of the logic into a new base
> > class cache_data_source, in preparation for reusing that code in the next
> > patch. There is no change in functionality yet.
> > 
> > gcc/ChangeLog:
> > 
> > * input.cc (class file_cache_slot): Refactor functionality into a
> > new base class...
> > (class cache_data_source): ...here.
> > (file_cache::forcibly_evict_file): Adapt for refactoring.
> > (file_cache_slot::evict): Renamed to...
> > (file_cache_slot::reset): ...this, and partially refactored into
> > base class...
> > (cache_data_source::reset): ...here.
> > (file_cache_slot::get_full_file_content): Moved into base class...
> > (cache_data_source::get_full_file_content): ...here.
> > (file_cache_slot::create): Adapt for refactoring.
> > (file_cache_slot::file_cache_slot): Refactor partially into...
> > (cache_data_source::cache_data_source): ...here.
> > (file_cache_slot::~file_cache_slot): Refactor partially into...
> > (cache_data_source::~cache_data_source): ...here.
> > (file_cache_slot::needs_read_p): Remove.
> > (file_cache_slot::needs_grow_p): Remove.
> > (file_cache_slot::maybe_grow): Adapt for refactoring.
> > (file_cache_slot::read_data): Refactored, along with...
> > (file_cache_slot::maybe_read_data): this, into...
> > (file_cache_slot::get_more_data): ...here.
> > (find_end_of_line): Change interface to take a pair of pointers,
> > rather than a pointer + length.
> > (file_cache_slot::get_next_line): Refactored into...
> > (cache_data_source::get_next_line): ...here.
> > (file_cache_slot::goto_next_line): Refactored into...
> > (cache_data_source::goto_next_line): ...here.
> > (file_cache_slot::read_line_num): Refactored into...
> > (cache_data_source::read_line_num): ...here.
> > (location_get_source_line): Fix const-correctness as necessitated by
> > new interface.
> > ---
> >  gcc/input.cc | 513 +++
> >  1 file changed, 235 insertions(+), 278 deletions(-)
> > 
> 
> I confess I had to reread both this and patch 4/8 to make sense of
> this; this is probably one of those cases where it's harder to read in
> patch form than as source, but I think I now understand the new
> implementation.

Yes, sorry about that. I hope at least splitting into two patches here made it
a little easier.

> 
> Did you try testing this with valgrind (e.g. "make selftest-valgrind")?
>

Oh interesting, was not aware of this. I think it shows that new leaks were
not introduced with the patch series.

BEFORE patch series:
==1572278==
-fself-test: 7634593 pass(es) in 22.799240 seconds
==1572278==
==1572278== HEAP SUMMARY:
==1572278== in use at exit: 1,083,255 bytes in 2,394 blocks
==1572278==   total heap usage: 2,704,869 allocs, 2,702,475 frees, 
1,257,334,536 bytes allocated
==1572278==
==1572278== 8,032 bytes in 1 blocks are possibly lost in loss record 639 of 657
==1572278==at 0x4848899: malloc (in 
/usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1572278==by 0x21FE1CB: xmalloc (xmalloc.c:149)
==1572278==by 0x21B02E0: new_buff (lex.cc:4767)
==1572278==by 0x21B02E0: _cpp_get_buff (lex.cc:4800)
==1572278==by 0x21ACC80: cpp_create_reader(c_lang, ht*, line_maps*) 
(init.cc:289)
==1572278==by 0xA64282: c_common_init_options(unsigned int, 
cl_decoded_option*) (c-opts.cc:237)
==1572278==by 0x95E479: toplev::main(int, char**) (toplev.cc:2241)
==1572278==by 0x960B2D: main (main.cc:39)
==1572278==
==1572278== LEAK SUMMARY:
==1572278==definitely lost: 0 bytes in 0 blocks
==1572278==indirectly lost: 0 bytes in 0 blocks
==1572278==  possibly lost: 8,032 bytes in 1 blocks
==1572278==still reachable: 1,075,223 bytes in 2,393 blocks
==1572278== suppressed: 0 bytes in 0 blocks
==1572278== Reachable blocks (those to which a pointer was found) are not shown.
==1572278== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1572278==
==1572278== For lists of detected and suppressed errors, rerun with: -s
==1572278== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

AFTER patch series:
==1594840==
-fself-test: 7638403 pass(es) in 23.671784 seconds
==1594840==
==1594840== HEAP SUMMARY:
==1594840== in use at exit: 1,081,759 bytes in 2,367 blocks
==1594840==   total heap usage: 2,728,561 

Re: [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 01:04:04PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > The diagnostics routines for SARIF output need to read the source code back
> > in, so that they can generate "snippet" and "content" records, so they need 
> > to
> > be able to cope with generated data locations.  Add support for that in
> > diagnostic-format-sarif.cc.
> > 
> > gcc/ChangeLog:
> > 
> > * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
> > to support generated data locations.
> > (sarif_builder::maybe_make_physical_location_object): Change the
> > m_filenames hash_set to support generated data.
> > (sarif_builder::make_artifact_location_object): Use a source_id 
> > rather
> > than a plain file name.
> > (sarif_builder::maybe_make_region_object): Adapt to
> > expanded_location interface changes.
> > (sarif_builder::maybe_make_region_object_for_context): Likewise.
> > (sarif_builder::make_artifact_object): Likewise.
> > (sarif_builder::make_run_object): Handle generated data.
> > (sarif_builder::maybe_make_artifact_content_object): Likewise.
> > (get_source_lines): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * c-c++-common/diagnostic-format-sarif-file-5.c: New test.
> 
> I'm not sure if generated data is allowed as part of a SARIF artefact,
> or if there's a more standard-compliant way of representing this; SARIF
> says an artefact is a "sequence of bytes addressable via a URI".
> 
> Can you post a simple example of the generated .sarif JSON please? 
> e.g. from the new test, so that we can see what it looks like.
> 
> You could run it through:
> 
>   python -m json.tool 
> 
> to format it for easier reading.

For a simple example like:

_Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")

for which the normal output is:

=
In buffer generated from t.cpp:1:
:1:24: warning: unknown option after ‘#pragma GCC diagnostic’ kind 
[-Wpragmas]
1 | GCC diagnostic ignored "-Wnot-an-option"
  |^
t.cpp:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")
  | ^~~
=

The SARIF output does not end up referencing any generated data locations,
because those are logically part of the "expansion" of the _Pragma
directive, and it doesn't output macro expansions.  In order for SARIF to
currently do something with generated data, it needs to see a generated data
location in a non-macro context. The only way to get GCC to do that, right
now, is with -fdump-internal-locations, which is what the new test case
does. That just unfortunately generates a larger amount of output. I attached
it, in case that's still helpful, for the following program:

=
_Pragma("GCC diagnostic push")
=

I guess there's potentially already a problem here because 'python -m
json.tool' is unhappy with this output and refuses to process it:

=
Invalid \escape: line 1 column 3436 (char 3435)
=

The related text is:
=
{"location": {"uri": "", "uriBaseId": "PWD"},
"contents":{"text": "GCC diagnostic push\n\0"}
=

And the \0 is not allowed it seems?
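
For reference, JSON strings accept only a fixed set of escapes, and \0
is not among them; a NUL byte has to be written as \u0000. Here is a
minimal C sketch of an escaper that does so. `json_escape` is a
hypothetical helper and not the routine GCC's JSON writer uses; it
passes bytes at or above 0x80 through unchanged, on the assumption that
the input is already valid UTF-8.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the LEN bytes at SRC to DST as the body of a JSON string,
   escaping quotes, backslashes and control characters.  NUL bytes
   become \u0000 because JSON has no \0 escape.  Returns the number of
   characters written (excluding the terminating NUL), or -1 if DST is
   too small.  */
int
json_escape (char *dst, size_t dst_size, const char *src, size_t len)
{
  assert (dst && src && dst_size > 0);
  size_t used = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) src[i];
      char tmp[7];		/* Longest escape is "\uXXXX" plus NUL.  */
      int n;
      if (c == '"' || c == '\\')
        n = snprintf (tmp, sizeof tmp, "\\%c", c);
      else if (c == '\n')
        n = snprintf (tmp, sizeof tmp, "\\n");
      else if (c < 0x20)
        n = snprintf (tmp, sizeof tmp, "\\u%04x", (unsigned) c);
      else
        n = snprintf (tmp, sizeof tmp, "%c", c);
      if (used + (size_t) n + 1 > dst_size)
        return -1;
      memcpy (dst + used, tmp, (size_t) n);
      used += (size_t) n;
    }
  dst[used] = '\0';
  return (int) used;
}
```

Because the length is passed explicitly, an embedded or trailing NUL in
the generated buffer is escaped rather than silently ending the string.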

I also attached the output of 'python -m json.tool' anyway, after manually
removing the \0.

Is it better to just skip these locations for now?

-Lewis
{"$schema": 
"https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
 "version": "2.1.0", "runs": [{"tool": {"driver": {"name": "GNU C++17", 
"fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) 
(x86_64-pc-linux-gnu)", "version": "14.0.0 20230811 (experimental)", 
"informationUri": "https://gcc.gnu.org/gcc-14/", "rules": []}}, "invocations": 
[{"executionSuccessful": true, "toolExecutionNotifications": []}], 
"originalUriBaseIds": {"PWD": {"uri": "file:///home/lewis/"}}, "artifacts": 
[{"location": {"uri": "t.cpp", "uriBaseId": "PWD"}, "contents": {"text": 
"_Pragma(\"GCC diagnostic push\")\n"}, "sourceLanguage": "cplusplus"}, 
{"location": {"uri": "/usr/include/stdc-predef.h"}, "contents": {"text": "/* 
Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of 
the GNU C Library.\n\n   The GNU C Library is free software; you can 
redistribute it and/or\n   modify it under the terms of the GNU Lesser General 
Public\n 
   License as published by the Free Software Foundation; either\n   version 2.1 
of the License, or (at your option) any later version.\n\n   The GNU C Library 
is distributed in the hope that it will be useful,\n   but WITHOUT ANY 
WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS 
FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for 
more details.\n\n   You should have received a copy of the GNU Lesser General 
Public\n   License along with the GNU C Library; if not, see\n   
.  

[gcc 11 backport] Support ld.mold linker.

2023-08-15 Thread Romain Geissler via Gcc-patches
Hi,

Is it OK to backport small, low-risk features to the old gcc 11 branch?
Here is a proposal to merge the ld.mold linker support which Martin has
pushed in gcc >= 12. It's a cherry-pick of commit
ad964f7eaef9c03ce68a01cfdd7fde9d56524868. Note that it doesn't backport
the gcc build machinery to be able to link gcc itself with mold.

Note: it also doesn't backport Martin's change in the LTO machinery to
support LTO plugin with mold.

Cheers,
Romain


gcc/ChangeLog:

* collect2.c (main): Add ld.mold.
* common.opt: Add -fuse-ld=mold.
* doc/invoke.texi: Document it.
* gcc.c (driver_handle_option): Handle -fuse-ld=mold.
* opts.c (common_handle_option): Likewise.
---
 gcc/collect2.c  | 10 +++---
 gcc/common.opt  |  4 
 gcc/doc/invoke.texi |  4 
 gcc/gcc.c   |  4 
 gcc/opts.c  |  1 +
 5 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index 3e212fc75f3..558016af486 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -785,6 +785,7 @@ main (int argc, char **argv)
   USE_GOLD_LD,
   USE_BFD_LD,
   USE_LLD_LD,
+  USE_MOLD_LD,
   USE_LD_MAX
 } selected_linker = USE_DEFAULT_LD;
   static const char *const ld_suffixes[USE_LD_MAX] =
@@ -793,7 +794,8 @@ main (int argc, char **argv)
   PLUGIN_LD_SUFFIX,
   "ld.gold",
   "ld.bfd",
-  "ld.lld"
+  "ld.lld",
+  "ld.mold"
 };
   static const char *const real_ld_suffix = "real-ld";
   static const char *const collect_ld_suffix = "collect-ld";
@@ -970,6 +972,8 @@ main (int argc, char **argv)
  selected_linker = USE_GOLD_LD;
else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
  selected_linker = USE_LLD_LD;
+   else if (strcmp (argv[i], "-fuse-ld=mold") == 0)
+ selected_linker = USE_MOLD_LD;
else if (strncmp (argv[i], "-o", 2) == 0)
  {
/* Parse the output filename if it's given so that we can make
@@ -1082,7 +1086,7 @@ main (int argc, char **argv)
   ld_file_name = 0;
 #ifdef DEFAULT_LINKER
   if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD ||
-  selected_linker == USE_LLD_LD)
+  selected_linker == USE_LLD_LD || selected_linker == USE_MOLD_LD)
 {
   char *linker_name;
 # ifdef HOST_EXECUTABLE_SUFFIX
@@ -1317,7 +1321,7 @@ main (int argc, char **argv)
  else if (!use_collect_ld
   && strncmp (arg, "-fuse-ld=", 9) == 0)
{
- /* Do not pass -fuse-ld={bfd|gold|lld} to the linker. */
+ /* Do not pass -fuse-ld={bfd|gold|lld|mold} to the linker. */
  ld1--;
  ld2--;
}
diff --git a/gcc/common.opt b/gcc/common.opt
index 4a3f09d9e1f..b7f0a52348c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2967,6 +2967,10 @@ fuse-ld=lld
 Common Driver Negative(fuse-ld=lld)
 Use the lld LLVM linker instead of the default linker.
 
+fuse-ld=mold
+Common Driver Negative(fuse-ld=mold)
+Use the Modern linker (MOLD) linker instead of the default linker.
+
 fuse-linker-plugin
 Common Undocumented Var(flag_use_linker_plugin)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fffa899585e..3ecdb85 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15606,6 +15606,10 @@ Use the @command{gold} linker instead of the default 
linker.
 @opindex fuse-ld=lld
 Use the LLVM @command{lld} linker instead of the default linker.
 
+@item -fuse-ld=mold
+@opindex fuse-ld=mold
+Use the Modern Linker (@command{mold}) instead of the default linker.
+
 @cindex Libraries
 @item -l@var{library}
 @itemx -l @var{library}
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 20a649ea08e..ecda8cbba02 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -4196,6 +4196,10 @@ driver_handle_option (struct gcc_options *opts,
use_ld = ".gold";
break;
 
+case OPT_fuse_ld_mold:
+   use_ld = ".mold";
+   break;
+
 case OPT_fcompare_debug_second:
   compare_debug_second = 1;
   break;
diff --git a/gcc/opts.c b/gcc/opts.c
index 24bb64198c8..9192ca75743 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2875,6 +2875,7 @@ common_handle_option (struct gcc_options *opts,
 case OPT_fuse_ld_bfd:
 case OPT_fuse_ld_gold:
 case OPT_fuse_ld_lld:
+case OPT_fuse_ld_mold:
 case OPT_fuse_linker_plugin:
   /* No-op. Used by the driver and passed to us because it starts with f.*/
   break;
-- 
2.39.3
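
For readers skimming the collect2.c hunks above, here is a minimal standalone
sketch of the option scan they extend. The enumerator names mirror the patch,
but select_linker, linker_kind and the table are hypothetical simplifications
and not the real collect2 code, which also handles the plugin linker and many
other options.

```cpp
#include <cstring>

enum linker_kind
{
  USE_DEFAULT_LD,
  USE_BFD_LD,
  USE_GOLD_LD,
  USE_LLD_LD,
  USE_MOLD_LD
};

// Simplified stand-in for the option scan in collect2's main.  In this
// sketch the last -fuse-ld= option on the command line wins.
linker_kind select_linker (int argc, const char *const *argv)
{
  static const struct
  {
    const char *option;
    linker_kind kind;
  } table[] = {
    { "-fuse-ld=bfd", USE_BFD_LD },
    { "-fuse-ld=gold", USE_GOLD_LD },
    { "-fuse-ld=lld", USE_LLD_LD },
    { "-fuse-ld=mold", USE_MOLD_LD },
  };

  linker_kind selected = USE_DEFAULT_LD;
  for (int i = 1; i < argc; ++i)
    for (const auto &entry : table)
      if (std::strcmp (argv[i], entry.option) == 0)
        selected = entry.kind;
  return selected;
}
```

Adding a linker then amounts to one new enumerator and one new table row,
which is roughly the shape of the change in the patch.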



Re: [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output

2023-08-15 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> The diagnostics routines for SARIF output need to read the source code back
> in, so that they can generate "snippet" and "content" records, so they need to
> be able to cope with generated data locations.  Add support for that in
> diagnostic-format-sarif.cc.
> 
> gcc/ChangeLog:
> 
> * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
> to support generated data locations.
> (sarif_builder::maybe_make_physical_location_object): Change the
> m_filenames hash_set to support generated data.
> (sarif_builder::make_artifact_location_object): Use a source_id rather
> than a plain file name.
> (sarif_builder::maybe_make_region_object): Adapt to
> expanded_location interface changes.
> (sarif_builder::maybe_make_region_object_for_context): Likewise.
> (sarif_builder::make_artifact_object): Likewise.
> (sarif_builder::make_run_object): Handle generated data.
> (sarif_builder::maybe_make_artifact_content_object): Likewise.
> (get_source_lines): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> * c-c++-common/diagnostic-format-sarif-file-5.c: New test.

I'm not sure if generated data is allowed as part of a SARIF artefact,
or if there's a more standard-compliant way of representing this; SARIF
says an artefact is a "sequence of bytes addressable via a URI".

Can you post a simple example of the generated .sarif JSON please? 
e.g. from the new test, so that we can see what it looks like.

You could run it through:

  python -m json.tool 

to format it for easier reading.


Thanks
Dave



Re: [PATCH v4 6/8] diagnostics: Full support for generated data locations

2023-08-15 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> Previous patches in this series have laid the groundwork for supporting
> source code locations in memory ("generated data") rather than ordinary
> files. This patch completes the support by adding awareness of such
> locations to all places that need to support them. The main changes are to
> diagnostic-show-locus.cc; the others are primarily small tweaks such as
> changing from the FILE to the SRC member when inspecting an
> expanded_location.
> 
> gcc/c-family/ChangeLog:
> 
> * c-format.cc (get_corrected_substring): Use the new overload of
> location_get_source_line() to support generated data.
> * c-indentation.cc (get_visual_column): Likewise.
> (get_first_nws_vis_column): Change argument from a plain file name
> to a source_id.
> (detect_intervening_unindent): Likewise.
> (should_warn_for_misleading_indentation): Pass
> detect_intervening_unindent() the SRC field rather than the FILE
> field from the expanded_location.
> 
> gcc/ChangeLog:
> 
> * gcc-rich-location.cc (blank_line_before_p): Use the new overload
> of location_get_source_line() to support generated data.
> * input.cc (get_source_text_between): Likewise.
> (get_substring_ranges_for_loc): Likewise.
> (get_source_file_content): Change the argument from a plain filename
> to a source_id.
> (location_missing_trailing_newline): Likewise.
> * input.h (get_source_file_content): Adjust prototype.
> (location_missing_trailing_newline): Likewise.
> * diagnostic-show-locus.cc (layout::calculate_x_offset_display): Use
> the new overload of location_get_source_line() to support generated
> data.
> (layout::print_line): Likewise.
> (class line_corrections): Change m_filename from a plain filename to
> a source_id.
> (source_line::source_line): Change argument from a plain filename to
> a source_id.
> (line_corrections::add_hint): Adapt to source_line change.
> (layout::print_trailing_fixits): Adapt to line_corrections change.
> (test_layout_x_offset_display_utf8): Test generated data too.
> (test_layout_x_offset_display_tab): Likewise.
> (test_diagnostic_show_locus_one_liner): Likewise.
> (test_diagnostic_show_locus_one_liner_utf8): Likewise.
> (test_add_location_if_nearby): Likewise.
> (test_diagnostic_show_locus_fixit_lines): Likewise.
> (test_fixit_consolidation): Likewise.
> (test_overlapped_fixit_printing): Likewise.
> (test_overlapped_fixit_printing_utf8): Likewise.
> (test_overlapped_fixit_printing_2): Likewise.
> (test_fixit_insert_containing_newline): Likewise.
> (test_fixit_insert_containing_newline_2): Likewise.
> (test_fixit_replace_containing_newline): Likewise.
> (test_fixit_deletion_affecting_newline): Likewise.
> (test_tab_expansion): Likewise.
> (test_escaping_bytes_1): Likewise.
> (test_escaping_bytes_2): Likewise.
> (test_line_numbers_multiline_range): Likewise.
> (diagnostic_show_locus_cc_tests): Likewise.
> ---
>  gcc/c-family/c-format.cc  |   2 +-
>  gcc/c-family/c-indentation.cc |   8 +-
>  gcc/diagnostic-show-locus.cc  | 227 ++
>  gcc/gcc-rich-location.cc  |   2 +-
>  gcc/input.cc  |  21 ++--
>  gcc/input.h   |   6 +-
>  6 files changed, 136 insertions(+), 130 deletions(-)
> 

Looks OK for trunk as-is (assuming prerequisites, of course), but as I
think you noted elsewhere this probably needs revising if we're going
to reject applying fix-it-hints to locations in generated data buffers.

Thanks
Dave


Re: [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests

2023-08-15 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> Add selftests for the new capabilities in input.cc related to source code
> locations that are stored in memory rather than ordinary files.
> 
> gcc/ChangeLog:
> 
> * input.cc (temp_source_file::do_linemap_add): New function.
> (line_table_case::line_table_case): Add GENERATED_DATA argument.
> (line_table_test::line_table_test): Implement new M_GENERATED_DATA
> argument.
> (for_each_line_table_case): Optionally include generated data
> locations in the set of cases.
> (test_accessing_ordinary_linemaps): Test generated data locations.
> (test_make_location_nonpure_range_endpoints): Likewise.
> (test_line_offset_overflow): Likewise.
> (input_cc_tests): Likewise.
> * selftest.cc (named_temp_file::named_temp_file): Interpret a null
> SUFFIX argument as a request to use in-memory data.
> (named_temp_file::~named_temp_file): Support in-memory data.
> (temp_source_file::temp_source_file): Likewise.
> (temp_source_file::~temp_source_file): Likewise.
> * selftest.h (struct line_map_ordinary): Forward declare.
> (class named_temp_file): Add missing explicit to the constructor.
> (class temp_source_file): Add new members to support in-memory data.
> (class line_table_test): Likewise.
> (for_each_line_table_case): Adjust prototype.
> ---
>  gcc/input.cc    | 81 +
>  gcc/selftest.cc | 53 +---
>  gcc/selftest.h  | 19 ++--
>  3 files changed, 113 insertions(+), 40 deletions(-)
> 

Thanks; looks good to me.

Dave



Re: [PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

2023-08-15 Thread Thiago Jung Bauermann via Gcc-patches


Hello,

Thiago Jung Bauermann  writes:

> Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
> changed the compiler error message in this testcase from
>
> : In instantiation of 'void foo() [with T = int]':
> :14:11:   required from here
> :8:22: error: 'int' is not a class, struct, or union type
> :8:22: error: 'int' is not a class, struct, or union type
> :8:22: error: 'int' is not a class, struct, or union type
> :8:3: error: expected iteration declaration or initialization
> compiler exited with status 1
>
> to:
>
> : In instantiation of 'void foo() [with T = int]':
> :14:11:   required from here
> :8:22: error: 'int' is not a class, struct, or union type
> :8:3: error: invalid type for iteration variable 'i'
> compiler exited with status 1
> Excess errors:
> :8:3: error: invalid type for iteration variable 'i'
>
> Andrew Pinski analysed the issue in PR 110756 and considered that it was a
> testsuite issue in that the error message changed slightly.  Also, it's a
> better error message.
>
> Therefore, we only need to adjust the testcase to expect the new message.
>
> gcc/testsuite/ChangeLog:
>   PR testsuite/110756
>   g++.dg/gomp/pr58567.C: Adjust to new compiler error message.
> ---
>  gcc/testsuite/g++.dg/gomp/pr58567.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/gomp/pr58567.C 
> b/gcc/testsuite/g++.dg/gomp/pr58567.C
> index 35a5bb027ffe..866d831c65e4 100644
> --- a/gcc/testsuite/g++.dg/gomp/pr58567.C
> +++ b/gcc/testsuite/g++.dg/gomp/pr58567.C
> @@ -5,7 +5,7 @@
>  template void foo()
>  {
>#pragma omp parallel for
> -  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
> class, struct, or union type|expected iteration declaration or 
> initialization" } */
> +  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
> class, struct, or union type|invalid type for iteration variable 'i'" } */
>  ;
>  }
>  

Ping? I just tested trunk. It still fails this test, and this patch
still fixes the failures.

-- 
Thiago
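
The dg-error pattern in the patch above relies on regular expression
alternation: the directive is satisfied by either message. A minimal sketch
of that behaviour follows. DejaGnu uses Tcl regular expressions, so
std::regex is only a stand-in here, and matches_dg_error and new_pattern are
hypothetical names.

```cpp
#include <regex>
#include <string>

// Stand-in for the dg-error check: true if MESSAGE matches PATTERN
// anywhere.  std::regex is used only to illustrate the alternation.
bool matches_dg_error (const std::string &pattern, const std::string &message)
{
  return std::regex_search (message, std::regex (pattern));
}

// The adjusted pattern from the patch, joined back onto one line.
const char *const new_pattern
  = "'int' is not a class, struct, or union type"
    "|invalid type for iteration variable 'i'";
```

With this pattern, the old "expected iteration declaration or
initialization" message no longer matches, while both messages that trunk
now emits do.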


Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-15 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> This patch enhances location_get_source_line(), which is the primary
> interface provided by the diagnostics infrastructure to obtain the line of
> source code corresponding to a given location, so that it understands
> generated data locations in addition to normal file-based locations. This
> involves changing the argument to location_get_source_line() from a plain
> file name, to a source_id object that can represent either type of location.
> 
> gcc/ChangeLog:
> 
> * input.cc (class data_cache_slot): New class.
> (file_cache::lookup_data): New function.
> (diagnostics_file_cache_forcibly_evict_data): New function.
> (file_cache::forcibly_evict_data): New function.
> (file_cache::evicted_cache_tab_entry): Generalize (via a template)
> to work for both file_cache_slot and data_cache_slot.
> (file_cache::add_file): Adapt for new interface to
> evicted_cache_tab_entry.
> (file_cache::add_data): New function.
> (data_cache_slot::create): New function.
> (file_cache::file_cache): Support the new m_data_slots member.
> (file_cache::~file_cache): Likewise.
> (file_cache::lookup_or_add_data): New function.
> (file_cache::lookup_or_add): New function that calls either
> lookup_or_add_data or lookup_or_add_file as appropriate.
> (location_get_source_line): Change the FILE_PATH argument to a
> source_id SRC, and use it to support obtaining source lines from
> generated data as well as from files.
> (location_compute_display_column): Support generated data using the
> new features of location_get_source_line.
> (dump_location_info): Likewise.
> * input.h (location_get_source_line): Adjust prototype. Add a new
> convenience overload taking an expanded_location.
> (class cache_data_source): Declare.
> (class data_cache_slot): Declare.
> (class file_cache): Declare new members.
> (diagnostics_file_cache_forcibly_evict_data): Declare.
> ---
>  gcc/input.cc | 171 ---
>  gcc/input.h  |  23 +--
>  2 files changed, 153 insertions(+), 41 deletions(-)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index 9377020b460..790279d4273 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -207,6 +207,28 @@ private:
>    void maybe_grow ();
>  };
>  
> +/* This is the implementation of cache_data_source for generated
> +   data that is already in memory.  */
> +class data_cache_slot final : public cache_data_source

It occurred to me to ask why we are caching access to a buffer that's
already in memory, but we're also caching the line-splitting information,
and providing the line-splitting algorithm with a consistent interface to
the data, right?
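
To illustrate what caching the line-splitting information buys for a buffer
that is already in memory, here is a minimal sketch. The class line_index is
hypothetical and is not the data_cache_slot from the patch; it indexes
eagerly and only splits on '\n', whereas the real cache may work
differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical illustration: remember where each line of an in-memory
// buffer starts, so repeated line lookups do not rescan the buffer from
// the beginning.
class line_index
{
public:
  explicit line_index (std::string_view data) : m_data (data)
  {
    m_starts.push_back (0);
    for (std::size_t i = 0; i < data.size (); ++i)
      if (data[i] == '\n' && i + 1 < data.size ())
        m_starts.push_back (i + 1);
  }

  std::size_t num_lines () const
  {
    return m_data.empty () ? 0 : m_starts.size ();
  }

  // Return line LINE (1-based) without its trailing newline.
  std::string_view get_line (std::size_t line) const
  {
    assert (line >= 1 && line <= num_lines ());
    std::size_t begin = m_starts[line - 1];
    std::size_t end
      = (line < m_starts.size ()) ? m_starts[line] : m_data.size ();
    std::string_view text = m_data.substr (begin, end - begin);
    if (!text.empty () && text.back () == '\n')
      text.remove_suffix (1);
    return text;
  }

private:
  std::string_view m_data;
  std::vector<std::size_t> m_starts;
};
```

The buffer itself is never copied; only the line offsets are stored, which
is the part worth caching.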

[...snip...]

> @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char 
> *file_path)
>    global_dc->m_file_cache->forcibly_evict_file (file_path);
>  }
>  
> +void
> +diagnostics_file_cache_forcibly_evict_data (const char *data,
> +   unsigned int data_len)
> +{
> +  if (!global_dc->m_file_cache)
> +    return;
> +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);

Maybe we should rename diagnostic_context's m_file_cache to
m_source_cache?  (and class file_cache for that matter?)  But if so,
that can/should be a followup/separate patch.

[...snip...]
 
> @@ -525,10 +582,22 @@ file_cache_slot::create (const 
> file_cache::input_context _context,
>    return true;
>  }
>  
> +void
> +data_cache_slot::create (const char *data, unsigned int data_len,
> +    unsigned int highest_use_count)
> +{
> +  reset ();
> +  on_create (highest_use_count + 1,
> +    total_lines_num (source_id {data, data_len}));
> +  m_data_begin = data;
> +  m_data_end = data + data_len;
> +}
> +
>  /* file_cache's ctor.  */
>  
>  file_cache::file_cache ()
> -: m_file_slots (new file_cache_slot[num_file_slots])
> +  : m_file_slots (new file_cache_slot[num_file_slots]),
> +    m_data_slots (new data_cache_slot[num_file_slots])

Should "num_file_slots" be renamed to "num_slots"?

I assume you're using the same value for both kinds of slot since the
file_cache::evicted_cache_tab_entry template uses this.  I suppose the
number could be passed in as an argument to that function if we wanted
to have different sizes for the two kinds, but I don't think it
matters.

[...snip...]

> @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
>     If the function fails, a NULL char_span is returned.  */
>  
>  char_span
> -location_get_source_line (const char *file_path, int line)
> +location_get_source_line (source_id src, int line)
>  {
> -  const char *buffer = NULL;
> -  ssize_t len;
> -
> -  if (line == 0)
> -    return char_span (NULL, 0);
> -
> -  if (file_path == NULL)
> -    return 

[PATCH v2][GCC] aarch64: Add support for Cortex-A720 CPU

2023-08-15 Thread Richard Ball via Gcc-patches

v2: Add missing PROFILE feature flag.

This patch adds support for the Cortex-A720 CPU to GCC.

No regressions on aarch64-none-elf.

Ok for master?

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-
A720 CPU.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document Cortex-A720 CPU.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
dbac497ef3aab410eb81db185b2e9532186888bb..73976e9a4c5e4f0b5c04bc7974e2006ddfd02fff
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -176,6 +176,8 @@ AARCH64_CORE("cortex-a710",  cortexa710, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG,
 
 AARCH64_CORE("cortex-a715",  cortexa715, cortexa57, V9A,  (SVE2_BITPERM, 
MEMTAG, I8MM, BF16), neoversen2, 0x41, 0xd4d, -1)
 
+AARCH64_CORE("cortex-a720",  cortexa720, cortexa57, V9_2A,  (SVE2_BITPERM, 
MEMTAG, PROFILE), neoversen2, 0x41, 0xd81, -1)
+
 AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd48, -1)
 
 AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd4e, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
2170980dddb0d5d410a49631ad26ff2e346b39dd..12d610f0f6580096eed9cf3de8ad3239efde5e4b
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
2c870d3c34b587ffc721b1f18f99ecd66d4217be..62537d9d09e25f864c27534b7ac2ec467ea24789
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20517,7 +20517,8 @@ performance of the code.  Permissible values for this 
option are:
 @samp{cortex-a75.cortex-a55}, @samp{cortex-a76.cortex-a55},
 @samp{cortex-r82}, @samp{cortex-x1}, @samp{cortex-x1c}, @samp{cortex-x2},
 @samp{cortex-x3}, @samp{cortex-a510}, @samp{cortex-a520}, @samp{cortex-a710},
-@samp{cortex-a715}, @samp{ampere1}, @samp{ampere1a}, and @samp{native}.
+@samp{cortex-a715}, @samp{cortex-a720}, @samp{ampere1}, @samp{ampere1a},
+and @samp{native}.
 
 The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
 @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},


[PATCH] RISC-V: Fix reduc_strict_run-1 test case.

2023-08-15 Thread Robin Dapp via Gcc-patches
Hi,

this patch changes the equality check for the reduc_strict_run-1
testcase from == to fabs () < EPS.  The FAIL only occurs with
_Float16 but I'd argue approximate equality is preferable for all
float modes.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c:
Check float equality with fabs < EPS.
---
 .../riscv/rvv/autovec/reduc/reduc_strict_run-1.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
index 516be97e9eb..93efe2c4333 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
@@ -2,6 +2,9 @@
 /* { dg-additional-options "--param=riscv-autovec-preference=scalable 
-fno-vect-cost-model" } */
 
 #include "reduc_strict-1.c"
+#include 
+
+#define EPS 1e-2
 
 #define TEST_REDUC_PLUS(TYPE)  \
   {\
@@ -10,14 +13,14 @@
 TYPE r = 0, q = 3; \
 for (int i = 0; i < NUM_ELEMS (TYPE); i++) \
   {\
-   a[i] = (i * 0.1) * (i & 1 ? 1 : -1);\
-   b[i] = (i * 0.3) * (i & 1 ? 1 : -1);\
+   a[i] = (i * 0.01) * (i & 1 ? 1 : -1);   \
+   b[i] = (i * 0.03) * (i & 1 ? 1 : -1);   \
r += a[i];  \
q -= b[i];  \
asm volatile ("" ::: "memory"); \
   }\
 TYPE res = reduc_plus_##TYPE (a, b);   \
-if (res != r * q)  \
+if (fabs (res - r * q) > EPS)  \
   __builtin_abort ();  \
   }
 
-- 
2.41.0
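
The comparison used in the patch above can be written as a small helper. This
is a minimal sketch, assuming an absolute tolerance as in the testcase;
approx_equal is a hypothetical name and is not part of the test.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: treat two values as equal when they differ by at
// most EPS in absolute terms.
bool approx_equal (double a, double b, double eps)
{
  assert (eps >= 0.0);
  return std::fabs (a - b) <= eps;
}
```

An absolute tolerance is only meaningful when the expected magnitudes are
known, which may be why the patch also scales the inputs down from 0.1/0.3
to 0.01/0.03. For values of widely varying magnitude a relative tolerance
would be the more usual choice.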


Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-15 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> Class file_cache_slot in input.cc is used to query specific lines of source
> code from a file when needed by diagnostics infrastructure. This will be
> extended in a subsequent patch to support obtaining the source code from
> in-memory generated buffers rather than from a file. The present patch
> refactors class file_cache_slot, putting most of the logic into a new base
> class cache_data_source, in preparation for reusing that code in the next
> patch. There is no change in functionality yet.
> 
> gcc/ChangeLog:
> 
> * input.cc (class file_cache_slot): Refactor functionality into a
> new base class...
> (class cache_data_source): ...here.
> (file_cache::forcibly_evict_file): Adapt for refactoring.
> (file_cache_slot::evict): Renamed to...
> (file_cache_slot::reset): ...this, and partially refactored into
> base class...
> (cache_data_source::reset): ...here.
> (file_cache_slot::get_full_file_content): Moved into base class...
> (cache_data_source::get_full_file_content): ...here.
> (file_cache_slot::create): Adapt for refactoring.
> (file_cache_slot::file_cache_slot): Refactor partially into...
> (cache_data_source::cache_data_source): ...here.
> (file_cache_slot::~file_cache_slot): Refactor partially into...
> (cache_data_source::~cache_data_source): ...here.
> (file_cache_slot::needs_read_p): Remove.
> (file_cache_slot::needs_grow_p): Remove.
> (file_cache_slot::maybe_grow): Adapt for refactoring.
> (file_cache_slot::read_data): Refactored, along with...
> (file_cache_slot::maybe_read_data): this, into...
> (file_cache_slot::get_more_data): ...here.
> (find_end_of_line): Change interface to take a pair of pointers,
> rather than a pointer + length.
> (file_cache_slot::get_next_line): Refactored into...
> (cache_data_source::get_next_line): ...here.
> (file_cache_slot::goto_next_line): Refactored into...
> (cache_data_source::goto_next_line): ...here.
> (file_cache_slot::read_line_num): Refactored into...
> (cache_data_source::read_line_num): ...here.
> (location_get_source_line): Fix const-correctness as necessitated by
> new interface.
> ---
>  gcc/input.cc | 513 +++
>  1 file changed, 235 insertions(+), 278 deletions(-)
> 

I confess I had to reread both this and patch 4/8 to make sense of
this; this is probably one of those cases where it's harder to read in
patch form than as source, but I think I now understand the new
implementation.

Did you try testing this with valgrind (e.g. "make selftest-valgrind")?

I don't think we have any selftest coverage for "\r" in the line-break
handling; that would be good to add.

This patch is OK for trunk once the rest of the kit is approved.

Thanks
Dave



Re: [PATCH v1] RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

2023-08-15 Thread Kito Cheng via Gcc-patches
Just a random idea that came to mind: maybe we could introduce one more
template argument to reduce the code needed for the rounding mode
intrinsics?

example:

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 2074dac0f16..9cc60842a5b 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1648,10 +1648,11 @@ public:
};

/* Implements vfcvt.x.  */
-template
+template
class vfcvt_x : public function_base
{
public:
+  bool has_rounding_mode_operand_p () const override { return HAS_FRM; }
  rtx expand (function_expander ) const override
  {
return e.use_exact_insn (code_for_pred_fcvt_x_f (UNSPEC, e.arg_mode (0)));
@@ -2451,6 +2452,7 @@ static CONSTEXPR const vmerge vfmerge_obj;
static CONSTEXPR const vmv_v vfmv_v_obj;
static CONSTEXPR const vfcvt_x vfcvt_x_obj;
static CONSTEXPR const vfcvt_x vfcvt_xu_obj;
+static CONSTEXPR const vfcvt_x vfcvt_x_frm_obj;
static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_x_obj;
static CONSTEXPR const vfcvt_rtz_x vfcvt_rtz_xu_obj;
static CONSTEXPR const vfcvt_f vfcvt_f_obj;
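
The template parameter lists in the diff above did not survive the archive,
so here is a standalone sketch of the idea under stated assumptions: HAS_FRM
is taken to be a bool template parameter defaulting to false, function_base
is reduced to a single virtual function, and UNSPEC_EXAMPLE is a placeholder
value. None of this is the actual riscv-vector-builtins code.

```cpp
// Simplified stand-in; the real function_base has many more members.
struct function_base
{
  virtual ~function_base () = default;
  virtual bool has_rounding_mode_operand_p () const { return false; }
};

// One class template covers both the plain and the rounding-mode
// variant.  HAS_FRM defaults to false so existing uses stay unchanged.
template <int UNSPEC, bool HAS_FRM = false>
struct vfcvt_x : function_base
{
  bool has_rounding_mode_operand_p () const override { return HAS_FRM; }
};

// Placeholder value, assumed for illustration only.
constexpr int UNSPEC_EXAMPLE = 1;

const vfcvt_x<UNSPEC_EXAMPLE> plain_obj{};
const vfcvt_x<UNSPEC_EXAMPLE, true> frm_obj{};
```

The benefit is that the _rm variants need only a different template argument
at the point where the object is declared, rather than a separate class per
intrinsic.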


Re: [PATCH] Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)

2023-08-15 Thread Martin Jambor
Hello,

On Mon, Aug 14 2023, Harald Anlauf via Gcc-patches wrote:
> Hi Martin,
>
> Am 14.08.23 um 19:39 schrieb Martin Jambor:
>> Hello,
>> 
>> this patch addresses an issue uncovered by the undefined behavior
>> sanitizer.  In function resolve_structure_cons in resolve.cc there is
>> a test starting with:
>> 
>>if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
>>&& comp->ts.u.cl->length
>>&& comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
>> 
>> and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
>> integer value 1818451807 which is outside of the value range of the expr_t
>> enum.  If I understand the code correctly, the entire load was
>> unwanted because comp->ts.type in those cases is BT_CLASS and not
>> BT_CHARACTER.  This patch simply adds a check to make sure it is only
>> accessed in those cases.
>> 
>> I have verified that the UBSAN failure goes away with this patch; it
>> also passes bootstrap and testing on x86_64-linux.  OK for master?
>
> this looks good to me.
>
> Looking at that code block, there is a potential other UB a few lines
> below, where (hopefully integer) string lengths are to be passed to
> mpz_cmp.
>
> If the string length is ill-defined (e.g. non-integer), value.integer
> is undefined.  We've seen this elsewhere, where on BE platforms that
> undefined value was interpreted as some large integer and giving
> failures on those platforms.  One could similarly add the following
> checks here (on top of your patch):

Thank you very much for the approval and the improvement.  I have
committed the following (after another round of testing).

Martin



Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 
110677)

This patch addresses an issue uncovered by the undefined behavior
sanitizer.  In function resolve_structure_cons in resolve.cc there is
a test starting with:

  if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
  && comp->ts.u.cl->length
  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT

and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
integer value 1818451807 which is outside of the value range of the expr_t
enum.  If I understand the code correctly, the entire load was
unwanted because comp->ts.type in those cases is BT_CLASS and not
BT_CHARACTER.  This patch simply adds a check to make sure it is only
accessed in those cases.

During review, Harald Anlauf noticed that the length types also need to be
checked, so I also added the checks he suggested to the condition.

Co-authored-by: Harald Anlauf 

gcc/fortran/ChangeLog:

2023-08-14  Martin Jambor  

PR fortran/110677
* resolve.cc (resolve_structure_cons): Check comp->ts is character
type before accessing stuff through comp->ts.u.cl.
---
 gcc/fortran/resolve.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index e7c8d919bef..f51674f7faa 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -1396,11 +1396,14 @@ resolve_structure_cons (gfc_expr *expr, int init)
 the one of the structure, ensure this if the lengths are known at
 compile time and when we are dealing with PARAMETER or structure
 constructors.  */
-  if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
- && comp->ts.u.cl->length
+  if (cons->expr->ts.type == BT_CHARACTER
+ && comp->ts.type == BT_CHARACTER
+ && comp->ts.u.cl && comp->ts.u.cl->length
  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT
  && cons->expr->ts.u.cl && cons->expr->ts.u.cl->length
  && cons->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT
+ && cons->expr->ts.u.cl->length->ts.type == BT_INTEGER
+ && comp->ts.u.cl->length->ts.type == BT_INTEGER
  && mpz_cmp (cons->expr->ts.u.cl->length->value.integer,
  comp->ts.u.cl->length->value.integer) != 0)
{
-- 
2.41.0



[committed][GCC 12] d: Fix internal compiler error: in layout_aggregate_type, at d/types.cc:574

2023-08-15 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes an ICE that is specific to the D front-end language
version in GDC 12.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to releases/gcc-12.

The pr110959.d test case has also been committed to mainline to catch
the unlikely event of a regression.

Regards,
Iain.

---
PR d/110959

gcc/d/ChangeLog:

* dmd/canthrow.d (Dsymbol_canThrow): Use foreachVar.
* dmd/declaration.d (TupleDeclaration::needThis): Likewise.
(TupleDeclaration::foreachVar): New function.
(VarDeclaration::setFieldOffset): Use foreachVar.
* dmd/dinterpret.d (Interpreter::visit (DeclarationExp)): Likewise.
* dmd/dsymbolsem.d (DsymbolSemanticVisitor::visit (VarDeclaration)):
Don't push tuple field members to the scope symbol table.
(determineFields): Handle pushing tuple field members here instead.
* dmd/dtoh.d (ToCppBuffer::visit (VarDeclaration)): Visit all tuple
fields.
(ToCppBuffer::visit (TupleDeclaration)): New function.
* dmd/expression.d (expandAliasThisTuples): Use foreachVar.
* dmd/foreachvar.d (VarWalker::visit (DeclarationExp)): Likewise.
* dmd/ob.d (genKill): Likewise.
(checkObErrors): Likewise.
* dmd/semantic2.d (Semantic2Visitor::visit (TupleDeclaration)): Visit
all tuple fields.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110959.d: New test.
* gdc.test/runnable/test23010.d: New test.
---
 gcc/d/dmd/canthrow.d| 13 +
 gcc/d/dmd/declaration.d | 63 +
 gcc/d/dmd/dinterpret.d  | 17 +++---
 gcc/d/dmd/dsymbolsem.d  | 17 +++---
 gcc/d/dmd/dtoh.d| 11 
 gcc/d/dmd/expression.d  |  8 ++-
 gcc/d/dmd/foreachvar.d  | 14 +
 gcc/d/dmd/ob.d  | 22 +--
 gcc/d/dmd/semantic2.d   |  5 ++
 gcc/testsuite/gdc.dg/pr110959.d | 32 +++
 gcc/testsuite/gdc.test/runnable/test23010.d | 43 ++
 11 files changed, 153 insertions(+), 92 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr110959.d
 create mode 100644 gcc/testsuite/gdc.test/runnable/test23010.d

diff --git a/gcc/d/dmd/canthrow.d b/gcc/d/dmd/canthrow.d
index a38cbb1610b..fe6e1e344b9 100644
--- a/gcc/d/dmd/canthrow.d
+++ b/gcc/d/dmd/canthrow.d
@@ -270,18 +270,7 @@ private CT Dsymbol_canThrow(Dsymbol s, FuncDeclaration func, bool mustNotThrow)
 }
 else if (auto td = s.isTupleDeclaration())
 {
-for (size_t i = 0; i < td.objects.dim; i++)
-{
-RootObject o = (*td.objects)[i];
-if (o.dyncast() == DYNCAST.expression)
-{
-Expression eo = cast(Expression)o;
-if (auto se = eo.isDsymbolExp())
-{
-result |= Dsymbol_canThrow(se.s, func, mustNotThrow);
-}
-}
-}
+td.foreachVar();
 }
 return result;
 }
diff --git a/gcc/d/dmd/declaration.d b/gcc/d/dmd/declaration.d
index 7b50c050487..6c83c196f72 100644
--- a/gcc/d/dmd/declaration.d
+++ b/gcc/d/dmd/declaration.d
@@ -656,23 +656,46 @@ extern (C++) final class TupleDeclaration : Declaration
 override bool needThis()
 {
 //printf("TupleDeclaration::needThis(%s)\n", toChars());
-for (size_t i = 0; i < objects.dim; i++)
+return isexp ? foreachVar((s) { return s.needThis(); }) != 0 : false;
+}
+
+/***
+ * Calls dg(Dsymbol) for each Dsymbol, which should be a VarDeclaration
+ * inside DsymbolExp (isexp == true).
+ * Params:
+ *dg = delegate to call for each Dsymbol
+ */
+extern (D) void foreachVar(scope void delegate(Dsymbol) dg)
+{
+assert(isexp);
+foreach (o; *objects)
 {
-RootObject o = (*objects)[i];
-if (o.dyncast() == DYNCAST.expression)
-{
-Expression e = cast(Expression)o;
-if (DsymbolExp ve = e.isDsymbolExp())
-{
-Declaration d = ve.s.isDeclaration();
-if (d && d.needThis())
-{
-return true;
-}
-}
-}
+if (auto e = o.isExpression())
+if (auto se = e.isDsymbolExp())
+dg(se.s);
 }
-return false;
+}
+
+/***
+ * Calls dg(Dsymbol) for each Dsymbol, which should be a VarDeclaration
+ * inside DsymbolExp (isexp == true).
+ * If dg returns !=0, stops and returns that value else returns 0.
+ * Params:
+ *dg = delegate to call for each Dsymbol
+ * Returns:
+ *last value returned by dg()
+ */
+extern (D) int 

Re: [PATCH] Handle TYPE_OVERFLOW_UNDEFINED vectorized BB reductions

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The following changes the gate to perform vectorization of BB reductions
> to use needs_fold_left_reduction_p which in turn requires handling
> TYPE_OVERFLOW_UNDEFINED types in the epilogue code generation by
> promoting any operations generated there to use unsigned arithmetic.
>
> The following does this, there's currently only v16qi where x86
> supports a .REDUC_PLUS reduction for integral modes so I had to
> add a x86 specific testcase using GIMPLE IL.
>
> Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

LGTM FWIW.
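
For readers following along, the promotion to unsigned arithmetic mentioned in the quoted description can be illustrated in plain C.  This is only a sketch of the idea, not the code the vectorizer emits: reassociating a signed sum can overflow in an intermediate step even when the in-order sum does not, whereas unsigned arithmetic wraps and still yields the same final value.

```c
#include <limits.h>

/* Sum four ints in a reassociated (pairwise) order.  The partial sums
   are computed in unsigned arithmetic, which wraps instead of
   overflowing, and only the final value is converted back.  The final
   conversion is implementation-defined for values above INT_MAX; GCC
   defines it as reduction modulo 2^N.  */
int
sum4_reassoc (const int *x)
{
  unsigned int lo = (unsigned int) x[0] + (unsigned int) x[2];
  unsigned int hi = (unsigned int) x[1] + (unsigned int) x[3];
  return (int) (lo + hi);
}
```

With the input { INT_MAX, -1, 1, -INT_MAX } the in-order signed sum never overflows, but x[0] + x[2] would if it were computed as a signed addition.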

> The next plan is to remove the restriction to .REDUC_PLUS, factoring
> out some of the general non-ifn way of doing a reduction epilog
> from loop reduction handling.  I had a stab at doing in-order
> reductions already but then those are really too similar to
> having general SLP discovery from N scalar defs (and then replacing
> those with extracts), at least since there's no
> fold_left_plus that doesn't add to an existing scalar I can't
> seem to easily just handle that case, possibly discovering
> { x_0, x_1, ..., x_n-1 }, extracting x_0, shifting the vector
> to { x_1, ..., x_n-1,  } and using mask_fold_left_plus
> with accumulating to x_0 and the  element masked would do.
> But I'm not sure that's worth the trouble?

Yeah, I doubt it.  I don't think SVE's FADDA is expected to be an
optimisation in its own right.  It's more of an enabler.

Another reason to use it in loops is that it's VLA-friendly.
But that wouldn't be an issue here.
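
As background on why in-order (fold-left) reductions are treated separately at all: floating-point addition is not associative, so a reassociated reduction can give a different result.  A small C sketch, assuming IEEE double arithmetic and no -ffast-math:

```c
/* Strict left-to-right sum, the order a fold-left reduction keeps.  */
double
sum_in_order (const double *v, int n)
{
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc += v[i];
  return acc;
}

/* Reassociated sum of four elements: (v0 + v2) + (v1 + v3).  */
double
sum4_pairwise (const double *v)
{
  return (v[0] + v[2]) + (v[1] + v[3]);
}
```

For { 1e30, 1.0, -1e30, 1.0 } the in-order sum absorbs the first 1.0 into 1e30 and ends at 1.0, while the pairwise order cancels the large terms first and ends at 2.0.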

Thanks,
Richard

> In principle with generic N scalar defs we could do a forward
> discovery from grouped loads and see where that goes (and of
> course handle in-order reductions that way).
>
>   * tree-vect-slp.cc (vect_slp_check_for_roots): Use
>   !needs_fold_left_reduction_p to decide whether we can
>   handle the reduction with association.
>   (vectorize_slp_instance_root_stmt): For TYPE_OVERFLOW_UNDEFINED
>   reductions perform all arithmetic in an unsigned type.
>
>   * gcc.target/i386/vect-reduc-2.c: New testcase.
> ---
>  gcc/testsuite/gcc.target/i386/vect-reduc-2.c | 77 
>  gcc/tree-vect-slp.cc | 44 +++
>  2 files changed, 107 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-2.c 
> b/gcc/testsuite/gcc.target/i386/vect-reduc-2.c
> new file mode 100644
> index 000..62559ef8e7b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-reduc-2.c
> @@ -0,0 +1,77 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -O2 -msse2 -fdump-tree-slp2-optimized" } */
> +
> +signed char x[16];
> +
> +signed char __GIMPLE (ssa,guessed_local(1073741824))
> +foo ()
> +{
> +  signed char _1;
> +  signed char _3;
> +  signed char _5;
> +  signed char _6;
> +  signed char _8;
> +  signed char _9;
> +  signed char _11;
> +  signed char _12;
> +  signed char _14;
> +  signed char _15;
> +  signed char _17;
> +  signed char _18;
> +  signed char _20;
> +  signed char _21;
> +  signed char _23;
> +  signed char _24;
> +  signed char _26;
> +  signed char _27;
> +  signed char _29;
> +  signed char _30;
> +  signed char _32;
> +  signed char _33;
> +  signed char _35;
> +  signed char _36;
> +  signed char _38;
> +  signed char _39;
> +  signed char _41;
> +  signed char _42;
> +  signed char _44;
> +  signed char _45;
> +  signed char _47;
> +
> +  __BB(2,guessed_local(1073741824)):
> +  _1 = x[15];
> +  _3 = x[1];
> +  _5 = _1 + _3;
> +  _6 = x[2];
> +  _8 = _5 + _6;
> +  _9 = x[3];
> +  _11 = _8 + _9;
> +  _12 = x[4];
> +  _14 = _11 + _12;
> +  _15 = x[5];
> +  _17 = _14 + _15;
> +  _18 = x[6];
> +  _20 = _17 + _18;
> +  _21 = x[7];
> +  _23 = _20 + _21;
> +  _24 = x[8];
> +  _26 = _23 + _24;
> +  _27 = x[9];
> +  _29 = _26 + _27;
> +  _30 = x[10];
> +  _32 = _29 + _30;
> +  _33 = x[11];
> +  _35 = _32 + _33;
> +  _36 = x[12];
> +  _38 = _35 + _36;
> +  _39 = x[13];
> +  _41 = _38 + _39;
> +  _42 = x[14];
> +  _44 = _41 + _42;
> +  _45 = x[0];
> +  _47 = _44 + _45;
> +  return _47;
> +
> +}
> +
> +/* { dg-final { scan-tree-dump "optimized: basic block part vectorized" "slp2" } } */
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7020bd9fa0e..07d68f2052b 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -7217,13 +7217,10 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
>   }
>else if (!VECTOR_TYPE_P (TREE_TYPE (rhs))
>  && (associative_tree_code (code) || code == MINUS_EXPR)
> -/* ???  The flag_associative_math and TYPE_OVERFLOW_WRAPS
> -   checks pessimize a two-element reduction.  PR54400.
> +/* ???  This pessimizes a two-element reduction.  PR54400.
> ???  In-order reduction could be handled if we only
> traverse one operand chain in vect_slp_linearize_chain.  */
> -&& ((FLOAT_TYPE_P 

Re: [RFC] GCC Security policy

2023-08-15 Thread Paul Koning via Gcc-patches



> On Aug 15, 2023, at 10:07 AM, Alexander Monakov  wrote:
> 
> 
> On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:
> 
>> Does this as the first paragraph address your concerns:
> 
> Thanks, this is nicer (see notes below). My main concern is that we shouldn't
> pretend there's some method of verifying that arbitrary source code is "safe"
> to pass to an unsandboxed compiler, nor should we push the responsibility of
> doing that on users.

Perhaps, but clearly the compiler can't do it ("Halting problem"...) so it has 
to be clear that the solution must be outside the compiler.  

paul



[v3] OpenACC 2.7: default clause support for data constructs (was: [PATCH, OpenACC 2.7, v2] Implement default clause support for data constructs)

2023-08-15 Thread Thomas Schwinge
Hi!

On 2023-08-01T23:35:16+0800, Chung-Lin Tang  wrote:
> this is v2 of the patch for implementing the OpenACC 2.7 addition of
> default(none|present) support for data constructs.

Thanks!

> Instead of propagating an additional 'oacc_default_kind' for OpenACC,
> this patch does it in a more complete way: it directly propagates the
> gimplify_omp_ctx* pointer of the inner most context where we found
> a default-clause.

Right -- but reviewing this, it came upon me that we don't need any such
new code at all, and instead may in 'gcc/gimplify.cc:oacc_default_clause'
simply look through the 'ctx's to find the 'default' clause information.
This centralizes the logic in the one place where it's relevant.
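
The lookup described above amounts to walking the chain of enclosing contexts.  The sketch below models that in C with a heavily reduced, hypothetical context struct.  The field and function names are assumptions for illustration and do not match the real 'struct gimplify_omp_ctx'.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced model of a gimplify context chain.  */
enum default_kind { DEFAULT_UNSPECIFIED, DEFAULT_NONE, DEFAULT_PRESENT };

struct ctx
{
  struct ctx *outer;            /* lexically enclosing construct, or NULL */
  enum default_kind default_kind;
  int is_data_construct;        /* nonzero for an OpenACC 'data' construct */
};

/* Starting at the compute construct CTX, return the innermost context
   that supplies a 'default' clause: CTX itself if it has one, otherwise
   the nearest enclosing 'data' construct that does.  Returns NULL if
   no such clause is in effect.  */
const struct ctx *
find_default_clause_ctx (const struct ctx *ctx)
{
  if (ctx->default_kind != DEFAULT_UNSPECIFIED)
    return ctx;
  for (const struct ctx *c = ctx->outer; c != NULL; c = c->outer)
    if (c->is_data_construct && c->default_kind != DEFAULT_UNSPECIFIED)
      return c;
  return NULL;
}
```

Because the returned context is the one carrying the clause, a diagnostic can report both the compute construct's location and that context's location.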

> This supports displaying the location/type of OpenACC
> construct where the default-clause is in the error messages.

This is preserved...

> The testcases also have the multiple nested data construct testing added,
> where we can now have messages referring precisely to the exact innermost
> default clause that was active at that program point.

..., but we should also still 'inform' about the compute construct, where
the user is expected to add explicit data clauses (if not adding to the
'data' construct where the 'default(none)' clause appears):

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc

> @@ -7785,16 +7809,20 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, 
> tree decl, unsigned flags)

> -  else if (ctx->default_kind == OMP_CLAUSE_DEFAULT_NONE)
> +  else if (default_kind == OMP_CLAUSE_DEFAULT_NONE)
>  {
>error ("%qE not specified in enclosing OpenACC %qs construct",
> -  DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind);
> -  inform (ctx->location, "enclosing OpenACC %qs construct", rkind);
> -}
> -  else if (ctx->default_kind == OMP_CLAUSE_DEFAULT_PRESENT)
> +  DECL_NAME (lang_hooks.decls.omp_report_decl (decl)),
> +  oacc_region_type_name (ctx->region_type));
> +  inform (ctx->oacc_default_clause_ctx->location,
> +   "enclosing OpenACC %qs construct",
> +   oacc_region_type_name
> +   (ctx->oacc_default_clause_ctx->region_type));
> +}

That is, we should keep here the original 'inform' for 'ctx->location',
and *add another* 'inform' for 'ctx->oacc_default_clause_ctx->location'.
Otherwise that's confusing to users.

Instead of requiring another iteration through you, I've now implemented
that, and with test cases enhanced some more, pushed to master branch
commit bed993884b149851fe930b43cf11cbcdf05f1578
"OpenACC 2.7: default clause support for data constructs", see attached.


Regards
 Thomas


>From bed993884b149851fe930b43cf11cbcdf05f1578 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 6 Jun 2023 03:46:29 -0700
Subject: [PATCH] OpenACC 2.7: default clause support for data constructs

This patch implements the OpenACC 2.7 addition of default(none|present) support
for data constructs.

Now, specifying "default(none|present)" on a data construct turns on the same
default clause behavior for all lexically enclosed compute constructs (which
don't already themselves have a default clause).
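
A small usage sketch of the behavior described above, in C.  The function name 'scale' and the choice of data clauses are illustrative assumptions, and this has not been checked against the adjusted test cases.  Without -fopenacc the pragmas are ignored and the loop runs serially.

```c
/* Scale N floats in place.  With 'default(none)' on the data construct,
   the nested 'parallel' construct has no 'default' clause of its own and
   so inherits it; the variables it references are therefore listed in
   data clauses on the enclosing 'data' construct.  */
void
scale (float *a, int n, float k)
{
#pragma acc data default(none) copy(a[0:n]) copyin(n, k)
  {
#pragma acc parallel loop
    for (int i = 0; i < n; i++)
      a[i] *= k;
  }
}
```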

gcc/c/ChangeLog:
	* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:
	* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:
	* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:
	* gimplify.cc (oacc_region_type_name): New function.
	(oacc_default_clause): If no 'default' clause appears on this
	compute construct, see if one appears on a lexically containing
	'data' construct.
	(gimplify_scan_omp_clauses): Upon OMP_CLAUSE_DEFAULT case, set
	ctx->oacc_default_clause_ctx to current context.

gcc/testsuite/ChangeLog:
	* c-c++-common/goacc/default-3.c: Adjust testcase.
	* c-c++-common/goacc/default-4.c: Adjust testcase.
	* c-c++-common/goacc/default-5.c: Adjust testcase.
	* gfortran.dg/goacc/default-3.f95: Adjust testcase.
	* gfortran.dg/goacc/default-4.f: Adjust testcase.
	* gfortran.dg/goacc/default-5.f: Adjust testcase.

Co-authored-by: Thomas Schwinge 
---
 gcc/c/c-parser.cc |  1 +
 gcc/cp/parser.cc  |  1 +
 gcc/fortran/openmp.cc |  3 +-
 gcc/gimplify.cc   | 64 +++
 gcc/testsuite/c-c++-common/goacc/default-3.c  | 59 +-
 gcc/testsuite/c-c++-common/goacc/default-4.c  | 42 ++
 gcc/testsuite/c-c++-common/goacc/default-5.c  | 19 -
 gcc/testsuite/gfortran.dg/goacc/default-3.f95 | 77 ++-
 gcc/testsuite/gfortran.dg/goacc/default-4.f   | 36 +
 gcc/testsuite/gfortran.dg/goacc/default-5.f   | 19 -
 10 

Re: [PATCH] RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/14/23 06:15, Juzhe-Zhong wrote:

This patch depends on middle-end support:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627305.html

This patch allows us to auto-vectorize the following case:

#define TEST_LOOP(NAME, OUTTYPE, INTYPE, MASKTYPE)                    \
  void __attribute__ ((noinline, noclone))                            \
  NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,         \
            MASKTYPE *__restrict cond, intptr_t n)                    \
  {                                                                   \
    for (intptr_t i = 0; i < n; ++i)                                  \
      if (cond[i])                                                    \
        dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]       \
                   + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5] \
                   + src[i * 8 + 6] + src[i * 8 + 7]);                \
  }

#define TEST2(NAME, OUTTYPE, INTYPE) \
  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)

#define TEST1(NAME, OUTTYPE) \
  TEST2 (NAME##_i32, OUTTYPE, int32_t)

#define TEST(NAME) \
  TEST1 (NAME##_i32, int32_t)

TEST (test)

ASM:

test_i32_i32_f32_8:
ble a3,zero,.L5
.L3:
vsetvli a4,a3,e8,mf4,ta,ma
vle32.v v0,0(a2)
vsetvli a5,zero,e32,m1,ta,ma
vmsne.vi v0,v0,0
vsetvli zero,a4,e32,m1,ta,ma
vlseg8e32.v v8,(a1),v0.t
vsetvli a5,zero,e32,m1,ta,ma
slli a6,a4,2
vadd.vv v1,v9,v8
slli a7,a4,5
vadd.vv v1,v1,v10
sub a3,a3,a4
vadd.vv v1,v1,v11
vadd.vv v1,v1,v12
vadd.vv v1,v1,v13
vadd.vv v1,v1,v14
vadd.vv v1,v1,v15
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
add a2,a2,a6
add a1,a1,a7
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
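
To make the assembly above easier to follow, here is a scalar C model of what the loop does, under simplifying assumptions.  VLMAX is an arbitrary model value and the helper names are invented.  The comments map each step to the instruction that plays the corresponding role.

```c
#include <stdint.h>

enum { NF = 8, VLMAX = 4 };  /* 8 fields; VLMAX is an arbitrary model value */

/* Model of a masked segment load: for each active element i < vl,
   field f of the i-th 8-word group lands in regs[f][i].  */
static void
seg8_load_masked (int32_t regs[NF][VLMAX], const int32_t *src,
                  const int32_t *mask, int vl)
{
  for (int i = 0; i < vl; ++i)
    if (mask[i])
      for (int f = 0; f < NF; ++f)
        regs[f][i] = src[i * NF + f];
}

void
sum8_masked (int32_t *dest, const int32_t *src, const int32_t *cond,
             intptr_t n)
{
  while (n > 0)
    {
      int vl = n < VLMAX ? (int) n : VLMAX;     /* role of vsetvli */
      int32_t regs[NF][VLMAX] = { { 0 } };
      seg8_load_masked (regs, src, cond, vl);   /* role of vlseg8e32.v */
      for (int i = 0; i < vl; ++i)
        if (cond[i])                            /* role of the masked vse32.v */
          {
            int32_t acc = 0;
            for (int f = 0; f < NF; ++f)
              acc += regs[f][i];                /* role of the vadd.vv chain */
            dest[i] = acc;
          }
      dest += vl;                               /* pointer bumps at loop end */
      cond += vl;
      src += (intptr_t) vl * NF;
      n -= vl;
    }
}
```

The point of the lanes pattern is that the eight fields are de-interleaved by the load itself, so the additions operate on whole vectors rather than on strided scalar accesses.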

gcc/ChangeLog:

 * config/riscv/autovec.md (vec_mask_len_load_lanes): 
New pattern.
 (vec_mask_len_store_lanes): Ditto.
 (2): Fix pattern for ICE.
 (2): Ditto.
 * config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
 * config/riscv/riscv-v.cc (get_mask_mode): Add tuple mode mask mode.
 (expand_lanes_load_store): New function.
 * config/riscv/vector-iterators.md: New iterator.
I would generally recommend sending independent fixes separately.  In 
particular the quad_trunc, oct_trunc changes seem like they should have 
been a separate patch.  But no need to resend this time.  Just try to 
break out distinct changes like those into their own patch.


OK, but obviously hold off committing until the generic support is 
approved and committed.


Thanks,
jeff



Re: Is this a bug for __builtin_dynamic_object_size?

2023-08-15 Thread Qing Zhao via Gcc-patches
Thanks.

I just filed a PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111030 to record 
this issue and added you to the CC list.

Qing
> On Aug 15, 2023, at 6:57 AM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-14 19:12, Qing Zhao wrote:
>> Hi, Sid,
>> For the following testing case:
>> #include <stdio.h>
>> #define noinline __attribute__((__noinline__))
>> static void noinline alloc_buf_more (int index)
>> {
>>   struct annotated {
>> long foo;
>> char b;
>> char array[index];
>> long c;
>>   } q, *p;
>>   p = &q;
>>   printf("the__bdos of p->array whole max is %d \n", 
>> __builtin_dynamic_object_size(p->array, 0));
>>   printf("the__bdos of p->array sub max is %d \n", 
>> __builtin_dynamic_object_size(p->array, 1));
>>   printf("the__bdos of p->array whole min is %d \n", 
>> __builtin_dynamic_object_size(p->array, 2));
>>   printf("the__bdos of p->array sub min is %d \n", 
>> __builtin_dynamic_object_size(p->array, 3));
>>   return;
>> }
>> int main ()
>> {
>>   alloc_buf_more (10);
>>   return 0;
>> }
>> If I compile it with the latest upstream gcc and run it:
>> /home/opc/Install/latest-d/bin/gcc -O t.c
>> the__bdos of p->array whole max is 23
>> the__bdos of p->array sub max is 23
>> the__bdos of p->array whole min is 23
>> the__bdos of p->array sub min is 23
>> Here __builtin_dynamic_object_size(p->array, 0) and 
>> __builtin_dynamic_object_size(p->array, 1) return the same size, which seems 
>> wrong to me.
>> There is one condition in tree-object-size.cc that might relate to this bug (in the 
>> routine “addr_object_size”):
>>  603   if (! TYPE_SIZE_UNIT (TREE_TYPE (var))
>>  604   || ! tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (var)))
>>  605   || (pt_var_size && TREE_CODE (pt_var_size) == INTEGER_CST
>>  606   && tree_int_cst_lt (pt_var_size,
>>  607   TYPE_SIZE_UNIT (TREE_TYPE (var)))))
>>  608 var = pt_var;
>> I suspect that the condition on line 604 above, “! tree_fits_uhwi_p (TYPE_SIZE_UNIT 
>> (TREE_TYPE (var)))”, relates to this bug: the TYPESIZE of the VLA 
>> “array” is not an unsigned HOST_WIDE_INT, but can't we still use its TYPESIZE 
>> for dynamic_object_size?
>> What do you think?
> 
> Thanks, yes that doesn't work.  I'm trying to revive the patch I had 
> submitted earlier in the year[1] and fix this issue too in that process.  In 
> general the subobject size computation doesn't handle variable sizes at all; 
> it depends on whole object+offset to get size information, which ends up 
> working only for flex arrays at the end of objects.
> 
> Sid
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608914.html
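
For reference, the sizes one would expect for this test case can be worked out with plain sizeof/offsetof.  The sketch below uses a fixed-size stand-in for the VLA struct, with array length 10 as in the call above.  The helper names are invented, and the concrete numbers in the note after the code assume a typical LP64 target.

```c
#include <stddef.h>

/* Fixed-size stand-in for the VLA struct in the test case, with the
   array length the test passes (10).  */
struct annotated
{
  long foo;
  char b;
  char array[10];
  long c;
};

/* Bytes from the start of 'array' to the end of the whole object:
   what modes 0 and 2 are expected to report.  */
size_t
whole_object_remaining (void)
{
  return sizeof (struct annotated) - offsetof (struct annotated, array);
}

/* Bytes in the 'array' member alone: what modes 1 and 3 are expected
   to report.  */
size_t
subobject_size (void)
{
  return sizeof (((struct annotated *) 0)->array);
}
```

On LP64 the struct is 32 bytes with 'array' at offset 9, so the whole-object value is 23, matching the output quoted above, while the subobject value should be 10.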



Re: [RFC] GCC Security policy

2023-08-15 Thread Alexander Monakov


On Tue, 15 Aug 2023, Siddhesh Poyarekar wrote:

> Does this as the first paragraph address your concerns:

Thanks, this is nicer (see notes below). My main concern is that we shouldn't
pretend there's some method of verifying that arbitrary source code is "safe"
to pass to an unsandboxed compiler, nor should we push the responsibility of
doing that on users.

> The compiler driver processes source code, invokes other programs such as the
> assembler and linker and generates the output result, which may be assembly
> code or machine code.  It is necessary that all source code inputs to the
> compiler are trusted, since it is impossible for the driver to validate input
> source code for safety.

The statement begins with "It is necessary", but the next statement offers
an alternative in case the code is untrusted. This is a contradiction.
Is it necessary or not in the end?

I'd suggest to drop this statement and instead make a brief note that
compiling crafted/untrusted sources can result in arbitrary code execution
and unconstrained resource consumption in the compiler.

> For untrusted code should compilation should be done
 ^^
 typo (spurious 'should')
 
> inside a sandboxed environment to ensure that it does not compromise the
> development environment.  Note that this still does not guarantee safety of
> the produced output programs and that such programs should still either be
> analyzed thoroughly for safety or run only inside a sandbox or an isolated
> system to avoid compromising the execution environment.

The last statement seems to be a new addition. It is too broad and again
makes a reference to analysis that appears quite theoretical. It might be
better to drop this (and instead talk in more specific terms about any
guarantees that produced binary code matches security properties intended
by the sources; I believe Richard Sandiford raised this previously).

Thanks.
Alexander


Re: cpymem for RISCV with v extension

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/15/23 03:16, juzhe.zh...@rivai.ai wrote:

The new patch looks reasonable to me now. Thanks for fixing it.

Could you add a testcase once the test infrastructure is finished?
I would prefer this patch to go in with a testcase, after the infrastructure.
So let's call this an ACK, but ask that Joern not commit until the 
testsuite bits are in place.



jeff


Re: cpymem for RISCV with v extension

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/15/23 02:12, Joern Rennecke wrote:


It lacks the strength reduction of the opaque pattern version for -O3,
though.  Would people also like to see that expanded into RTL?  Or
should I just drop in the opaque pattern for that?  Or not at all,
because everyone uses Superscalar Out-Of-Order execution?
I doubt it's going to matter all that much.  Your decision IMHO.  I'd 
like to think everyone implementing V will be OOO superscalar, but I'm 
not naive enough to believe that will hold in practice (even with the P 
extension on the way).


jeff


[PATCH] IFN: Fix vector extraction into promoted subreg.

2023-08-15 Thread Robin Dapp via Gcc-patches
Hi,

this patch fixes the case where vec_extract gets passed a promoted
subreg (e.g. from a return value).  When such a subreg is the
destination of a vector extraction we create a separate pseudo
register and ensure that the necessary promotion is performed
afterwards.

Before this patch a sign-extended subreg would erroneously not
be zero-extended e.g. when used as return value.  I added missing
test cases for unsigned vec_extract on RISC-V that check the
proper behavior.
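
A scalar C model of the problem may help.  It assumes the extraction always sign-extends the element into a full 64-bit register, as the comment in the patch says some backends such as riscv do.  The function names are invented for illustration.

```c
#include <stdint.h>

/* Model of an extract instruction that always sign-extends the element
   into a full 64-bit register.  The uint8_t -> int8_t conversion is
   implementation-defined; GCC wraps modulo 2^8.  */
static int64_t
extract_sign_extended (const uint8_t *vec, int idx)
{
  return (int64_t) (int8_t) vec[idx];
}

/* Returning the register as-is is wrong for an unsigned element whose
   top bit is set...  */
uint64_t
extract_u8_buggy (const uint8_t *vec, int idx)
{
  return (uint64_t) extract_sign_extended (vec, idx);
}

/* ... so the value is first narrowed to the element mode and then
   zero-extended, which is the extra promotion step the fix adds.  */
uint64_t
extract_u8_fixed (const uint8_t *vec, int idx)
{
  uint8_t narrow = (uint8_t) extract_sign_extended (vec, idx);
  return (uint64_t) narrow;
}
```

For an element value of 0xF0 the unpromoted result has all upper bits set, whereas the promoted one is 0xF0 as an unsigned return value requires.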

Testsuite and bootstrap done on x86, aarch64 and power10.

Regards
 Robin

gcc/ChangeLog:

* internal-fn.cc (expand_vec_extract_optab_fn): Handle
SUBREG_PROMOTED_VAR_P.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.
---
 gcc/internal-fn.cc|  25 +++-
 .../rvv/autovec/vls-vlmax/vec_extract-1u.c|  63 
 .../rvv/autovec/vls-vlmax/vec_extract-2u.c|  69 +
 .../rvv/autovec/vls-vlmax/vec_extract-3u.c|  69 +
 .../rvv/autovec/vls-vlmax/vec_extract-4u.c|  70 +
 .../rvv/autovec/vls-vlmax/vec_extract-runu.c  | 137 ++
 6 files changed, 430 insertions(+), 3 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 4f2b20a79e5..b1b12cc8369 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3150,14 +3150,33 @@ expand_vec_extract_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
 
   if (icode != CODE_FOR_nothing)
 {
-  create_output_operand (&ops[0], target, extract_mode);
+  /* Some backends like riscv sign-extend the extraction result to a full
+Pmode register.  If we are passed a promoted subreg as target make
+sure not to use it as target directly.  Instead, use a new pseudo
+and perform the necessary extension afterwards. */
+  rtx dest = target;
+  if (target && SUBREG_P (target) && SUBREG_PROMOTED_VAR_P (target))
+   dest = gen_reg_rtx (extract_mode);
+
+  create_output_operand (&ops[0], dest, extract_mode);
+
    create_input_operand (&ops[1], src, outermode);
    create_convert_operand_from (&ops[2], pos,
   TYPE_MODE (TREE_TYPE (op1)), true);
   if (maybe_expand_insn (icode, 3, ops))
{
- if (!rtx_equal_p (target, ops[0].value))
-   emit_move_insn (target, ops[0].value);
+ if (!rtx_equal_p (dest, target))
+   {
+ if (SUBREG_P (target) && SUBREG_PROMOTED_VAR_P (target))
+   {
+ /* Have convert_move perform the subreg promotion.  */
+ rtx tmp = convert_to_mode (extract_mode, ops[0].value, 0);
+ convert_move (SUBREG_REG (target), tmp,
+   SUBREG_PROMOTED_SIGN (target));
+   }
+ else
+   emit_move_insn (target, dest);
+   }
  return;
}
 }
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
new file mode 100644
index 000..a35988ff55d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d -Wno-pedantic 
-Wno-psabi" } */
+
+#include 
+
+typedef uint64_t vnx2di __attribute__((vector_size (16)));
+typedef uint32_t vnx4si __attribute__((vector_size (16)));
+typedef uint16_t vnx8hi __attribute__((vector_size (16)));
+typedef uint8_t vnx16qi __attribute__((vector_size (16)));
+
+#define VEC_EXTRACT(S,V,IDX)   \
+  S\
+  __attribute__((noipa))   \
+  vec_extract_##V##_##IDX (V v)\
+  {\
+return v[IDX]; \
+  }
+
+#define VEC_EXTRACT_VAR1(S,V)  \
+  S\
+  __attribute__((noipa))   \
+  vec_extract_var_##V (V v, int8_t idx)\
+  {\
+return v[idx]; 

RE: [PATCH] RISC-V: Fix autovec_length_operand predicate[PR110989]

2023-08-15 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Tuesday, August 15, 2023 6:43 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; kito.ch...@sifive.com; kito.ch...@gmail.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Fix autovec_length_operand predicate[PR110989]

> Currently, an incorrect configuration of the autovec_length_operand
> predicate was discovered in PR110989 in the following situation:

In case you haven't committed it yet: This is OK.

Regards
 Robin


Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/13/23 13:52, Philipp Tomsich wrote:

On Sat, 12 Aug 2023 at 01:31, Jeff Law via Gcc-patches
 wrote:




On 8/9/23 16:39, Tsukasa OI wrote:

On 2023/08/10 5:05, Jeff Law wrote:



I'd tend to think we do not want to expose the intrinsic unless the
right extensions are enabled -- even though the encoding is a no-op and
we could emit it as a .insn.


I think that makes sense.  The only reason I implemented the
no-'Zihintpause' version is because GCC 13 implemented the built-in
unconditionally.  If the compatibility breakage is considered minimal (I
don't know, though), I'm ready to submit a 'Zihintpause'-only version of
this patch set.

While it's a compatibility break I don't think we have a need to
preserve this kind of compatibility.  I suspect anyone using
__builtin_riscv_pause was probably already turning on Zihintpause and if
they weren't they should have been :-0


I'm sure we'll kick this around in the Tuesday meeting and hopefully
make a decision about the desired direction.  You're obviously welcome
to join if you're inclined.  Let me know if you need an invite.


The original discussion (and I believe that Andrew was the decisive
voice in the end) came to the conclusion that—given that pause is a
true hint—it could always be enabled.
We had originally expected to enable it only if Zihintpause was part
of the target architecture, but viewing it as "just a name for an
already existing pure hint" also made sense.
Note that on systems that don't implement Zihintpause, the hint is
guaranteed to not have an architectural effect.

That said, I don't really have a strong leaning one way or another.
Philipp.
I don't have a strong opinion either way and I certainly see both sides 
of the argument.


It sounds like the current situation is by design; knowing that now I 
would tend to lean towards keeping status quo, which would mean going 
with Tsukasa's first patch or something similar.


We'll certainly discuss on the call in a half-hour or so.

jeff
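
To make the two options under discussion concrete, here is a minimal sketch of a wrapper that builds whichever way the decision goes. The names `cpu_relax` and `spin_until_set` are hypothetical, the `__riscv_zihintpause` test macro is assumed to follow the usual RISC-V extension-macro naming, and the raw encoding 0x0100000f (a FENCE hint with pred=W, succ=0) is the PAUSE encoding as I understand the Zihintpause specification. Off RISC-V the helper degrades to a plain compiler barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hint to the core that we are in a spin-wait loop.  */
static inline void cpu_relax(void)
{
#if defined(__riscv) && defined(__riscv_zihintpause)
    /* Zihintpause is enabled: use the built-in directly.  */
    __builtin_riscv_pause();
#elif defined(__riscv)
    /* Assumption: PAUSE is a FENCE hint, so the raw word is harmless on
       cores without Zihintpause.  */
    __asm__ volatile (".4byte 0x0100000f" ::: "memory");
#else
    /* Not RISC-V: only a compiler barrier.  */
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Poll *flag up to max_spins times; return whether it was seen set.  */
static bool spin_until_set(atomic_bool *flag, unsigned max_spins)
{
    assert(flag != NULL);
    for (unsigned i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return true;
        cpu_relax();
    }
    return atomic_load_explicit(flag, memory_order_acquire);
}
```

With the Zihintpause-only option, code like the middle branch is what users without the extension would have to write by hand; with the unconditional option the first branch alone would suffice on any RISC-V target.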


Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/11/23 18:20, Tsukasa OI wrote:




I won't be able to attend that meeting due to Japanese religious events
around Aug 13-16 (attending may not be impossible, but it is at least difficult),
but I look forward to seeing a conclusion reached.
No problem.  We hold that meeting weekly to work through any outstanding 
patches related to RISC-V.  You're always welcome to attend if you want.




I am leaving two patch sets, one for each option, so that in either case we
can apply a fix once a conclusion is reached.

(1) __builtin_riscv_pause for 'Zihintpause'-only

(2) __builtin_riscv_pause for all

Thanks.   It's not clear what direction we'll take on this, but having a 
patchkit for both potential outcomes is definitely helpful.


jeff


Re: cpymem for RISCV with v extension

2023-08-15 Thread Jeff Law via Gcc-patches




On 8/14/23 19:46, Joern Rennecke wrote:

On Fri, 4 Aug 2023 at 21:52, Jeff Law  wrote:


diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b4884a30872..e61110fa3ad 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
   #include "tm-constrs.h"
   #include "rtx-vector-builder.h"
   #include "targhooks.h"
+#include "predict.h"

Not sure this is needed, but I didn't scan for it explicitly.  If it's
not needed, then remove it.


It is needed to declare optimize_function_for_size_p .

Obviously a trivial nit.  Thanks for tracking it down.

jeff


[PATCH v4] c++: extend cold, hot attributes to classes

2023-08-15 Thread Javier Martinez via Gcc-patches
On Mon, Aug 14, 2023 at 8:32 PM Jason Merrill  wrote:
> I think you also want to check for ATTR_FLAG_TYPE_IN_PLACE.
> [...]
> > +  propagate_class_warmth_attribute (t);
> Maybe call this in check_bases_and_members instead?

Yes, that is sensible. Done.

Thanks,
Javier

Signed-off-by: Javier Martinez 

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_hot_attribute): Remove warning on RECORD_TYPE
and UNION_TYPE when in c_dialect_cxx.
(handle_cold_attribute): Likewise.

gcc/cp/ChangeLog:

* class.cc (propagate_class_warmth_attribute): New function.
(check_bases_and_members): propagate hot and cold attributes
to all FUNCTION_DECL when the record is marked hot or cold.
* cp-tree.h (maybe_propagate_warmth_attributes): New function.
* decl2.cc (maybe_propagate_warmth_attributes): New function.
* method.cc (lazily_declare_fn): propagate hot and cold
attributes to lazily declared functions when the record is
marked hot or cold.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-hotness.C: New test.
---
 gcc/c-family/c-attribs.cc   | 50 -
 gcc/cp/class.cc | 31 +++
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/decl2.cc | 37 ++
 gcc/cp/method.cc|  6 +++
 gcc/testsuite/g++.dg/ext/attr-hotness.C | 16 
 6 files changed, 139 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-hotness.C

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898..25083d597c0 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -452,10 +452,10 @@ const struct attribute_spec c_common_attribute_table[] =
   { "alloc_size",	  1, 2, false, true, true, false,
 			  handle_alloc_size_attribute,
 	  attr_alloc_exclusions },
-  { "cold",   0, 0, true,  false, false, false,
+  { "cold",		  0, 0, false,  false, false, false,
 			  handle_cold_attribute,
 	  attr_cold_hot_exclusions },
-  { "hot",0, 0, true,  false, false, false,
+  { "hot",		  0, 0, false,  false, false, false,
 			  handle_hot_attribute,
 	  attr_cold_hot_exclusions },
   { "no_address_safety_analysis",
@@ -1110,6 +1110,29 @@ handle_hot_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   /* Attribute hot processing is done later with lookup_attribute.  */
 }
+  else if ((TREE_CODE (*node) == RECORD_TYPE
+	|| TREE_CODE (*node) == UNION_TYPE)
+	   && c_dialect_cxx ()
+	   && (flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+{
+  /* Check conflict here as decl_attributes will otherwise only catch
+	 it late at the function when the attribute is used on a class.  */
+  tree cold_attr = lookup_attribute ("cold", TYPE_ATTRIBUTES (*node));
+  if (cold_attr)
+	{
+	  warning (OPT_Wattributes, "ignoring attribute %qE because it "
+			"conflicts with attribute %qs", name, "cold");
+	  *no_add_attrs = true;
+	}
+}
+  else if (flags & ((int) ATTR_FLAG_FUNCTION_NEXT
+		| (int) ATTR_FLAG_DECL_NEXT))
+{
+	/* Avoid applying the attribute to a function return type when
+	   used as:  void __attribute ((hot)) foo (void).  It will be
+	   passed to the function.  */
+	*no_add_attrs = true;
+}
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1131,6 +1154,29 @@ handle_cold_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   /* Attribute cold processing is done later with lookup_attribute.  */
 }
+  else if ((TREE_CODE (*node) == RECORD_TYPE
+	|| TREE_CODE (*node) == UNION_TYPE)
+	   && c_dialect_cxx ()
+	   && (flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+{
+  /* Check conflict here as decl_attributes will otherwise only catch
+	 it late at the function when the attribute is used on a class.  */
+  tree hot_attr = lookup_attribute ("hot", TYPE_ATTRIBUTES (*node));
+  if (hot_attr)
+	{
+	  warning (OPT_Wattributes, "ignoring attribute %qE because it "
+			"conflicts with attribute %qs", name, "hot");
+	  *no_add_attrs = true;
+	}
+}
+  else if (flags & ((int) ATTR_FLAG_FUNCTION_NEXT
+		| (int) ATTR_FLAG_DECL_NEXT))
+{
+	/* Avoid applying the attribute to a function return type when
+	   used as:  void __attribute ((cold)) foo (void).  It will be
+	   passed to the function.  */
+	*no_add_attrs = true;
+}
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 778759237dc..bf0b558967f 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -205,6 +205,7 @@ static tree get_vcall_index (tree, tree);
 static bool type_maybe_constexpr_default_constructor (tree);
 static bool type_maybe_constexpr_destructor (tree);
 static bool 

[PATCH] Handle TYPE_OVERFLOW_UNDEFINED vectorized BB reductions

2023-08-15 Thread Richard Biener via Gcc-patches
The following changes the gate to perform vectorization of BB reductions
to use needs_fold_left_reduction_p which in turn requires handling
TYPE_OVERFLOW_UNDEFINED types in the epilogue code generation by
promoting any operations generated there to use unsigned arithmetic.

The following does this.  Currently v16qi is the only integral mode for
which x86 supports a .REDUC_PLUS reduction, so I had to
add an x86-specific testcase using GIMPLE IL.
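
As an illustration of why the epilogue has to switch to unsigned arithmetic, here is a small scalar sketch (not the vectorizer code; the function names are hypothetical).  Reassociating a signed sum can create an intermediate overflow that the in-order sum never hits, while the same reassociated sum carried out in the unsigned type simply wraps and still produces the right value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* In-order signed sum: defined only if no partial sum overflows.  */
static int sum_in_order(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Reassociated sum (even lanes and odd lanes accumulated separately),
   done in unsigned arithmetic so intermediate wraparound is defined.  */
static int sum_reassociated_unsigned(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    unsigned even = 0, odd = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0)
            even += (unsigned) v[i];
        else
            odd += (unsigned) v[i];
    }
    /* Converting back is value-preserving whenever the in-order signed
       sum itself is representable.  */
    return (int) (even + odd);
}
```

For the input { INT_MAX, -1, 1, -1 } the in-order sum never overflows, but the even-lane partial sum INT_MAX + 1 would overflow if it were computed in the signed type.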

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

The next plan is to remove the restriction to .REDUC_PLUS, factoring
out some of the general non-ifn way of doing a reduction epilog
from loop reduction handling.  I had a stab at doing in-order
reductions already but then those are really too similar to
having general SLP discovery from N scalar defs (and then replacing
those with extracts), at least since there's no
fold_left_plus that doesn't add to an existing scalar I can't
seem to easily just handle that case, possibly discovering
{ x_0, x_1, ..., x_n-1 }, extracting x_0, shifting the vector
to { x_1, ..., x_n-1,  } and using mask_fold_left_plus
with accumulating to x_0 and the  element masked would do.
But I'm not sure that's worth the trouble?

In principle with generic N scalar defs we could do a forward
discovery from grouped loads and see where that goes (and of
course handle in-order reductions that way).

* tree-vect-slp.cc (vect_slp_check_for_roots): Use
!needs_fold_left_reduction_p to decide whether we can
handle the reduction with association.
(vectorize_slp_instance_root_stmt): For TYPE_OVERFLOW_UNDEFINED
reductions perform all arithmetic in an unsigned type.

* gcc.target/i386/vect-reduc-2.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/vect-reduc-2.c | 77 
 gcc/tree-vect-slp.cc | 44 +++
 2 files changed, 107 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-reduc-2.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-2.c b/gcc/testsuite/gcc.target/i386/vect-reduc-2.c
new file mode 100644
index 000..62559ef8e7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-reduc-2.c
@@ -0,0 +1,77 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -O2 -msse2 -fdump-tree-slp2-optimized" } */
+
+signed char x[16];
+
+signed char __GIMPLE (ssa,guessed_local(1073741824))
+foo ()
+{
+  signed char _1;
+  signed char _3;
+  signed char _5;
+  signed char _6;
+  signed char _8;
+  signed char _9;
+  signed char _11;
+  signed char _12;
+  signed char _14;
+  signed char _15;
+  signed char _17;
+  signed char _18;
+  signed char _20;
+  signed char _21;
+  signed char _23;
+  signed char _24;
+  signed char _26;
+  signed char _27;
+  signed char _29;
+  signed char _30;
+  signed char _32;
+  signed char _33;
+  signed char _35;
+  signed char _36;
+  signed char _38;
+  signed char _39;
+  signed char _41;
+  signed char _42;
+  signed char _44;
+  signed char _45;
+  signed char _47;
+
+  __BB(2,guessed_local(1073741824)):
+  _1 = x[15];
+  _3 = x[1];
+  _5 = _1 + _3;
+  _6 = x[2];
+  _8 = _5 + _6;
+  _9 = x[3];
+  _11 = _8 + _9;
+  _12 = x[4];
+  _14 = _11 + _12;
+  _15 = x[5];
+  _17 = _14 + _15;
+  _18 = x[6];
+  _20 = _17 + _18;
+  _21 = x[7];
+  _23 = _20 + _21;
+  _24 = x[8];
+  _26 = _23 + _24;
+  _27 = x[9];
+  _29 = _26 + _27;
+  _30 = x[10];
+  _32 = _29 + _30;
+  _33 = x[11];
+  _35 = _32 + _33;
+  _36 = x[12];
+  _38 = _35 + _36;
+  _39 = x[13];
+  _41 = _38 + _39;
+  _42 = x[14];
+  _44 = _41 + _42;
+  _45 = x[0];
+  _47 = _44 + _45;
+  return _47;
+
+}
+
+/* { dg-final { scan-tree-dump "optimized: basic block part vectorized" "slp2" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7020bd9fa0e..07d68f2052b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7217,13 +7217,10 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
}
   else if (!VECTOR_TYPE_P (TREE_TYPE (rhs))
   && (associative_tree_code (code) || code == MINUS_EXPR)
-  /* ???  The flag_associative_math and TYPE_OVERFLOW_WRAPS
- checks pessimize a two-element reduction.  PR54400.
+  /* ???  This pessimizes a two-element reduction.  PR54400.
  ???  In-order reduction could be handled if we only
  traverse one operand chain in vect_slp_linearize_chain.  */
-  && ((FLOAT_TYPE_P (TREE_TYPE (rhs)) && flag_associative_math)
-  || (INTEGRAL_TYPE_P (TREE_TYPE (rhs))
-  && TYPE_OVERFLOW_WRAPS (TREE_TYPE (rhs))))
+  && !needs_fold_left_reduction_p (TREE_TYPE (rhs), code)
   /* Ops with constants at the tail can be stripped here.  */
   && TREE_CODE (rhs) == SSA_NAME
   && TREE_CODE (gimple_assign_rhs2 (assign)) == SSA_NAME
@@ -9161,9 +9158,23 @@ vectorize_slp_instance_root_stmt (slp_tree node, 
slp_instance 

[PATCH] Cleanup BB vectorization roots handling

2023-08-15 Thread Richard Biener via Gcc-patches
The following moves CONSTRUCTOR handling into the generic BB
vectorization roots handling, removing a special case and finally
renaming the function now consisting of more than just constructor
detection.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_analyze_slp_instance): Remove
slp_inst_kind_ctor handling.
(vect_analyze_slp): Simplify.
(vect_build_slp_instance): Dump when we analyze a CTOR.
(vect_slp_check_for_constructors): Rename to ...
(vect_slp_check_for_roots): ... this.  Register a
slp_root for CONSTRUCTORs instead of shoving them to
the set of grouped stores.
(vect_slp_analyze_bb_1): Adjust.
---
 gcc/tree-vect-slp.cc | 56 ++--
 1 file changed, 23 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 40514e4758d..0563684928e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3123,6 +3123,14 @@ vect_build_slp_instance (vec_info *vinfo,
 /* ???  We need stmt_info for group splitting.  */
 stmt_vec_info stmt_info_)
 {
+  if (kind == slp_inst_kind_ctor)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"Analyzing vectorizable constructor: %G\n",
+root_stmt_infos[0]->stmt);
+}
+
   if (dump_enabled_p ())
 {
   dump_printf_loc (MSG_NOTE, vect_location,
@@ -3429,22 +3437,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
   STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
= STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
 }
-  else if (kind == slp_inst_kind_ctor)
-{
-  tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
-  tree val;
-  scalar_stmts.create (CONSTRUCTOR_NELTS (rhs));
-  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
-   {
- stmt_vec_info def_info = vinfo->lookup_def (val);
- def_info = vect_stmt_to_vectorize (def_info);
- scalar_stmts.quick_push (def_info);
-   }
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"Analyzing vectorizable constructor: %G\n",
-stmt_info->stmt);
-}
   else if (kind == slp_inst_kind_reduc_group)
 {
   /* Collect reduction statements.  */
@@ -3469,19 +3461,12 @@ vect_analyze_slp_instance (vec_info *vinfo,
 
   vec<stmt_vec_info> roots = vNULL;
   vec<tree> remain = vNULL;
-  if (kind == slp_inst_kind_ctor)
-{
-  roots.create (1);
-  roots.quick_push (stmt_info);
-}
   /* Build the tree for the SLP instance.  */
   bool res = vect_build_slp_instance (vinfo, kind, scalar_stmts,
  roots, remain,
  max_tree_size, limit, bst_map,
  kind == slp_inst_kind_store
  ? stmt_info : NULL);
-  if (!res)
-roots.release ();
 
   /* ???  If this is slp_inst_kind_store and the above succeeded here's
  where we should do store group splitting.  */
@@ -3509,9 +3494,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
   /* Find SLP sequences starting from groups of grouped stores.  */
   FOR_EACH_VEC_ELT (vinfo->grouped_stores, i, first_element)
 vect_analyze_slp_instance (vinfo, bst_map, first_element,
-  STMT_VINFO_GROUPED_ACCESS (first_element)
-  ? slp_inst_kind_store : slp_inst_kind_ctor,
-  max_tree_size, &limit);
+  slp_inst_kind_store, max_tree_size, &limit);
 
   if (bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo))
 {
@@ -7106,7 +7089,7 @@ vect_slp_is_lane_insert (gimple *use_stmt, tree vec, unsigned *this_lane)
array.  */
 
 static void
-vect_slp_check_for_constructors (bb_vec_info bb_vinfo)
+vect_slp_check_for_roots (bb_vec_info bb_vinfo)
 {
   for (unsigned i = 0; i < bb_vinfo->bbs.length (); ++i)
 for (gimple_stmt_iterator gsi = gsi_start_bb (bb_vinfo->bbs[i]);
@@ -7132,14 +7115,21 @@ vect_slp_check_for_constructors (bb_vec_info bb_vinfo)
  unsigned j;
  tree val;
  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, val)
- if (TREE_CODE (val) != SSA_NAME
- || !bb_vinfo->lookup_def (val))
-   break;
+   if (TREE_CODE (val) != SSA_NAME
+   || !bb_vinfo->lookup_def (val))
+ break;
  if (j != CONSTRUCTOR_NELTS (rhs))
continue;
 
- stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (assign);
- BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
+ vec<stmt_vec_info> roots = vNULL;
+ roots.safe_push (bb_vinfo->lookup_stmt (assign));
+ vec<stmt_vec_info> stmts;
+ stmts.create (CONSTRUCTOR_NELTS (rhs));
+ FOR_EACH_CONSTRUCTOR_VALUE 

Re: [PATCH V2] VECT: Apply MASK_LEN_{LOAD_LANES, STORE_LANES} into vectorizer

2023-08-15 Thread Richard Biener via Gcc-patches
On Tue, 15 Aug 2023, Juzhe-Zhong wrote:

> Hi, Richard and Richi.
> 
> This patch adds MASK_LEN_{LOAD_LANES,STORE_LANES} support to the
> vectorizer.
> 
> Consider this simple case:
> 
> void __attribute__ ((noinline, noclone))
> foo (int *__restrict a, int *__restrict b, int *__restrict c,
> int *__restrict d, int *__restrict e, int *__restrict f,
> int *__restrict g, int *__restrict h, int *__restrict j, int n)
> {
>   for (int i = 0; i < n; ++i)
> {
>   a[i] = j[i * 8];
>   b[i] = j[i * 8 + 1];
>   c[i] = j[i * 8 + 2];
>   d[i] = j[i * 8 + 3];
>   e[i] = j[i * 8 + 4];
>   f[i] = j[i * 8 + 5];
>   g[i] = j[i * 8 + 6];
>   h[i] = j[i * 8 + 7];
> }
> }
> 
> RVV Gimple IR:
> 
>   _79 = .SELECT_VL (ivtmp_81, POLY_INT_CST [4, 4]);
>   ivtmp_125 = _79 * 32;
>   vect_array.8 = .MASK_LEN_LOAD_LANES (vectp_j.6_124, 32B, { -1, ... }, _79, 0);
>   vect__8.9_122 = vect_array.8[0];
>   vect__8.10_121 = vect_array.8[1];
>   vect__8.11_120 = vect_array.8[2];
>   vect__8.12_119 = vect_array.8[3];
>   vect__8.13_118 = vect_array.8[4];
>   vect__8.14_117 = vect_array.8[5];
>   vect__8.15_116 = vect_array.8[6];
>   vect__8.16_115 = vect_array.8[7];
>   vect_array.8 ={v} {CLOBBER};
>   ivtmp_114 = _79 * 4;
>   .MASK_LEN_STORE (vectp_a.17_113, 32B, { -1, ... }, _79, 0, vect__8.9_122);
>   .MASK_LEN_STORE (vectp_b.19_109, 32B, { -1, ... }, _79, 0, vect__8.10_121);
>   .MASK_LEN_STORE (vectp_c.21_105, 32B, { -1, ... }, _79, 0, vect__8.11_120);
>   .MASK_LEN_STORE (vectp_d.23_101, 32B, { -1, ... }, _79, 0, vect__8.12_119);
>   .MASK_LEN_STORE (vectp_e.25_97, 32B, { -1, ... }, _79, 0, vect__8.13_118);
>   .MASK_LEN_STORE (vectp_f.27_93, 32B, { -1, ... }, _79, 0, vect__8.14_117);
>   .MASK_LEN_STORE (vectp_g.29_89, 32B, { -1, ... }, _79, 0, vect__8.15_116);
>   .MASK_LEN_STORE (vectp_h.31_85, 32B, { -1, ... }, _79, 0, vect__8.16_115);
> 
> ASM:
> 
> foo:
>   lw  t4,8(sp)
>   ld  t5,0(sp)
>   ble t4,zero,.L5
> .L3:
>   vsetvli t1,t4,e8,mf4,ta,ma
>   vlseg8e32.v v8,(t5)
>   slli t3,t1,2
>   slli t6,t1,5
>   vse32.v v8,0(a0)
>   vse32.v v9,0(a1)
>   vse32.v v10,0(a2)
>   vse32.v v11,0(a3)
>   vse32.v v12,0(a4)
>   vse32.v v13,0(a5)
>   vse32.v v14,0(a6)
>   vse32.v v15,0(a7)
>   sub t4,t4,t1
>   add t5,t5,t6
>   add a0,a0,t3
>   add a1,a1,t3
>   add a2,a2,t3
>   add a3,a3,t3
>   add a4,a4,t3
>   add a5,a5,t3
>   add a6,a6,t3
>   add a7,a7,t3
>   bne t4,zero,.L3
> .L5:
>   ret
> 
> The details of the approach:
> 
> Step 1 - Modify the LANES LOAD/STORE support function 
> (vect_load_lanes_supported/vect_store_lanes_supported):
> 
> +/* Return FN if vec_{masked_,mask_len,}load_lanes is available for COUNT
> +   vectors of type VECTYPE.  MASKED_P says whether the masked form is needed.  */
>  
> -bool
> +internal_fn
>  vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count,
>  bool masked_p)
>  {
> -  if (masked_p)
> -return vect_lanes_optab_supported_p ("vec_mask_load_lanes",
> -  vec_mask_load_lanes_optab,
> -  vectype, count);
> +  if (vect_lanes_optab_supported_p ("vec_mask_len_load_lanes",
> + vec_mask_len_load_lanes_optab,
> + vectype, count))
> +return IFN_MASK_LEN_LOAD_LANES;
> +  else if (masked_p)
> +{
> +  if (vect_lanes_optab_supported_p ("vec_mask_load_lanes",
> + vec_mask_load_lanes_optab,
> + vectype, count))
> + return IFN_MASK_LOAD_LANES;
> +}
>else
> -return vect_lanes_optab_supported_p ("vec_load_lanes",
> -  vec_load_lanes_optab,
> -  vectype, count);
> +{
> +  if (vect_lanes_optab_supported_p ("vec_load_lanes",
> + vec_load_lanes_optab,
> + vectype, count))
> + return IFN_LOAD_LANES;
> +}
> +  return IFN_LAST;
>  }
>  
> Instead of returning TRUE or FALSE to say whether the target supports LANES
> LOAD/STORE, the function now returns the internal_fn of the LANES LOAD/STORE
> variant that the target supports.
> If the target supports none of the LANES LOAD/STORE optabs, it returns
> IFN_LAST.
> 
> Step 2 - Compute IFN for LANES LOAD/STORE (Only compute once).
> 
>   if (!STMT_VINFO_STRIDED_P (first_stmt_info)
> && (can_overrun_p || !would_overrun_p)
> && compare_step_with_zero (vinfo, stmt_info) > 0)
>   {
> /* First cope with the degenerate case of a single-element
>vector.  */
> if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
>   ;
> 
> else
>   {
> /* Otherwise try using LOAD/STORE_LANES.  */

[PATCH V2] VECT: Apply MASK_LEN_{LOAD_LANES, STORE_LANES} into vectorizer

2023-08-15 Thread Juzhe-Zhong
Hi, Richard and Richi.

This patch adds MASK_LEN_{LOAD_LANES,STORE_LANES} support to the vectorizer.

Consider this simple case:

void __attribute__ ((noinline, noclone))
foo (int *__restrict a, int *__restrict b, int *__restrict c,
  int *__restrict d, int *__restrict e, int *__restrict f,
  int *__restrict g, int *__restrict h, int *__restrict j, int n)
{
  for (int i = 0; i < n; ++i)
{
  a[i] = j[i * 8];
  b[i] = j[i * 8 + 1];
  c[i] = j[i * 8 + 2];
  d[i] = j[i * 8 + 3];
  e[i] = j[i * 8 + 4];
  f[i] = j[i * 8 + 5];
  g[i] = j[i * 8 + 6];
  h[i] = j[i * 8 + 7];
}
}

RVV Gimple IR:

  _79 = .SELECT_VL (ivtmp_81, POLY_INT_CST [4, 4]);
  ivtmp_125 = _79 * 32;
  vect_array.8 = .MASK_LEN_LOAD_LANES (vectp_j.6_124, 32B, { -1, ... }, _79, 0);
  vect__8.9_122 = vect_array.8[0];
  vect__8.10_121 = vect_array.8[1];
  vect__8.11_120 = vect_array.8[2];
  vect__8.12_119 = vect_array.8[3];
  vect__8.13_118 = vect_array.8[4];
  vect__8.14_117 = vect_array.8[5];
  vect__8.15_116 = vect_array.8[6];
  vect__8.16_115 = vect_array.8[7];
  vect_array.8 ={v} {CLOBBER};
  ivtmp_114 = _79 * 4;
  .MASK_LEN_STORE (vectp_a.17_113, 32B, { -1, ... }, _79, 0, vect__8.9_122);
  .MASK_LEN_STORE (vectp_b.19_109, 32B, { -1, ... }, _79, 0, vect__8.10_121);
  .MASK_LEN_STORE (vectp_c.21_105, 32B, { -1, ... }, _79, 0, vect__8.11_120);
  .MASK_LEN_STORE (vectp_d.23_101, 32B, { -1, ... }, _79, 0, vect__8.12_119);
  .MASK_LEN_STORE (vectp_e.25_97, 32B, { -1, ... }, _79, 0, vect__8.13_118);
  .MASK_LEN_STORE (vectp_f.27_93, 32B, { -1, ... }, _79, 0, vect__8.14_117);
  .MASK_LEN_STORE (vectp_g.29_89, 32B, { -1, ... }, _79, 0, vect__8.15_116);
  .MASK_LEN_STORE (vectp_h.31_85, 32B, { -1, ... }, _79, 0, vect__8.16_115);

ASM:

foo:
lw  t4,8(sp)
ld  t5,0(sp)
ble t4,zero,.L5
.L3:
vsetvli t1,t4,e8,mf4,ta,ma
vlseg8e32.v v8,(t5)
slli t3,t1,2
slli t6,t1,5
vse32.v v8,0(a0)
vse32.v v9,0(a1)
vse32.v v10,0(a2)
vse32.v v11,0(a3)
vse32.v v12,0(a4)
vse32.v v13,0(a5)
vse32.v v14,0(a6)
vse32.v v15,0(a7)
sub t4,t4,t1
add t5,t5,t6
add a0,a0,t3
add a1,a1,t3
add a2,a2,t3
add a3,a3,t3
add a4,a4,t3
add a5,a5,t3
add a6,a6,t3
add a7,a7,t3
bne t4,zero,.L3
.L5:
ret
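
For readers less familiar with the IFN, a scalar reference model of what the .MASK_LEN_LOAD_LANES call above computes may help.  This is only a sketch under stated assumptions: the name `mask_len_load_lanes_model`, the flat `out` layout and the fixed `NLANES` are hypothetical, and it assumes that elements at index `len + bias` or above, and elements whose mask bit is clear, are simply left untouched (the real IFN leaves them undefined).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLANES 8   /* group size of the interleaved access in foo  */

/* De-interleave SRC into NLANES vectors of VLMAX elements each, stored
   back to back in OUT: element i of vector k is OUT[k * VLMAX + i].
   Only elements i < LEN + BIAS whose MASK[i] is set are loaded.  */
static void mask_len_load_lanes_model(int *out, size_t vlmax,
                                      const int *src, const bool *mask,
                                      size_t len, ptrdiff_t bias)
{
    assert(out != NULL && src != NULL && mask != NULL);
    size_t active = (size_t) ((ptrdiff_t) len + bias);
    assert(active <= vlmax);
    for (size_t i = 0; i < active; i++) {
        if (!mask[i])
            continue;
        for (size_t k = 0; k < NLANES; k++)
            out[k * vlmax + i] = src[i * NLANES + k];
    }
}
```

In the loop above the mask is all-ones and the length comes from .SELECT_VL, so each iteration fills vector k with j[i * 8 + k] for the active elements, which is what the single vlseg8e32.v in the assembly does.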

The details of the approach:

Step 1 - Modify the LANES LOAD/STORE support function 
(vect_load_lanes_supported/vect_store_lanes_supported):

+/* Return FN if vec_{masked_,mask_len,}load_lanes is available for COUNT
+   vectors of type VECTYPE.  MASKED_P says whether the masked form is needed.  */
 
-bool
+internal_fn
 vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count,
   bool masked_p)
 {
-  if (masked_p)
-return vect_lanes_optab_supported_p ("vec_mask_load_lanes",
-vec_mask_load_lanes_optab,
-vectype, count);
+  if (vect_lanes_optab_supported_p ("vec_mask_len_load_lanes",
+   vec_mask_len_load_lanes_optab,
+   vectype, count))
+return IFN_MASK_LEN_LOAD_LANES;
+  else if (masked_p)
+{
+  if (vect_lanes_optab_supported_p ("vec_mask_load_lanes",
+   vec_mask_load_lanes_optab,
+   vectype, count))
+   return IFN_MASK_LOAD_LANES;
+}
   else
-return vect_lanes_optab_supported_p ("vec_load_lanes",
-vec_load_lanes_optab,
-vectype, count);
+{
+  if (vect_lanes_optab_supported_p ("vec_load_lanes",
+   vec_load_lanes_optab,
+   vectype, count))
+   return IFN_LOAD_LANES;
+}
+  return IFN_LAST;
 }
 
Instead of returning TRUE or FALSE to say whether the target supports LANES
LOAD/STORE, the function now returns the internal_fn of the LANES LOAD/STORE
variant that the target supports.  If the target supports none of the LANES
LOAD/STORE optabs, it returns IFN_LAST.
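
The shape of that interface change can be sketched outside of GCC as follows.  Everything here is a stand-in: `enum lanes_fn`, `struct target_caps` and `load_lanes_supported` are hypothetical names that only mirror the control flow of the diff above, with FN_NONE playing the role of IFN_LAST.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the internal functions; FN_NONE mirrors IFN_LAST.  */
enum lanes_fn {
    FN_LOAD_LANES,
    FN_MASK_LOAD_LANES,
    FN_MASK_LEN_LOAD_LANES,
    FN_NONE
};

/* Hypothetical description of which optabs a target implements.  */
struct target_caps {
    bool load_lanes;
    bool mask_load_lanes;
    bool mask_len_load_lanes;
};

/* Return which variant to use instead of a plain yes/no answer.  */
static enum lanes_fn load_lanes_supported(const struct target_caps *t,
                                          bool masked_p)
{
    assert(t != NULL);
    /* The mask+len form is preferred whenever the target has it.  */
    if (t->mask_len_load_lanes)
        return FN_MASK_LEN_LOAD_LANES;
    if (masked_p) {
        if (t->mask_load_lanes)
            return FN_MASK_LOAD_LANES;
    } else if (t->load_lanes) {
        return FN_LOAD_LANES;
    }
    return FN_NONE;
}
```

A caller can then compare the result against the sentinel once and keep the returned value around, which is what Step 2 does with *lanes_ifn.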

Step 2 - Compute IFN for LANES LOAD/STORE (Only compute once).

  if (!STMT_VINFO_STRIDED_P (first_stmt_info)
  && (can_overrun_p || !would_overrun_p)
  && compare_step_with_zero (vinfo, stmt_info) > 0)
{
  /* First cope with the degenerate case of a single-element
 vector.  */
  if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
;

  else
{
  /* Otherwise try using LOAD/STORE_LANES.  */
  *lanes_ifn
= vls_type == VLS_LOAD
? vect_load_lanes_supported (vectype, group_size, masked_p)
: vect_store_lanes_supported (vectype, group_size,

Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-15 Thread Kewen.Lin via Gcc-patches
on 2023/8/15 20:07, Richard Biener wrote:
> On Tue, Aug 15, 2023 at 1:47 PM Kewen.Lin  wrote:
>>
>> on 2023/8/15 15:53, Richard Biener wrote:
>>> On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin  wrote:

 on 2023/8/14 22:16, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> on 2023/8/14 20:20, Richard Sandiford wrote:
>>> Thanks for the clean-ups.  But...
>>>
>>> "Kewen.Lin"  writes:
 Hi,

 Following Richi's suggestion [1], this patch is to move the
 handlings on VMAT_GATHER_SCATTER in the final loop nest
 of function vectorizable_load to its own loop.  Basically
 it duplicates the final loop nest, clean up some useless
 set up code for the case of VMAT_GATHER_SCATTER, remove some
 unreachable code.  Also remove the corresponding handlings
 in the final loop nest.

 Bootstrapped and regtested on x86_64-redhat-linux,
 aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

 [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html

 Is it ok for trunk?

 BR,
 Kewen
 -

 gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Move the handlings on
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.
 ---
  gcc/tree-vect-stmts.cc | 361 +
  1 file changed, 219 insertions(+), 142 deletions(-)
>>>
>>> ...that seems like quite a lot of +s.  Is there nothing we can do to
>>> avoid the cut-&-paste?
>>
>> Thanks for the comments!  I'm not sure if I get your question, if we
>> want to move out the handlings of VMAT_GATHER_SCATTER, the new +s seem
>> inevitable?  Your concern is mainly about git blame history?
>
> No, it was more that 219-142=77, so it seems like a lot of lines
> are being duplicated rather than simply being moved.  (Unlike for
> VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
> was a clear improvement.)
>
> So I was just wondering if there was any obvious factoring-out that
> could be done to reduce the duplication.

 ah, thanks for the clarification!

 I think the main duplication are on the loop body beginning and end,
 let's take a look at them in details:

 +  if (memory_access_type == VMAT_GATHER_SCATTER)
 +{
 +  gcc_assert (alignment_support_scheme == dr_aligned
 + || alignment_support_scheme == dr_unaligned_supported);
 +  gcc_assert (!grouped_load && !slp_perm);
 +
 +  unsigned int inside_cost = 0, prologue_cost = 0;

 // These above are newly added.

 +  for (j = 0; j < ncopies; j++)
 +   {
 + /* 1. Create the vector or array pointer update chain.  */
 + if (j == 0 && !costing_p)
 +   {
 + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 +   vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
 +slp_node, &gs_info, &dataref_ptr,
 +&vec_offsets);
 + else
 +   dataref_ptr
 + = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
 + at_loop, offset, &dummy, gsi,
 + &ptr_incr, false, bump);
 +   }
 + else if (!costing_p)
 +   {
 + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
 + if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
 +   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, 
 ptr_incr,
 +  gsi, stmt_info, bump);
 +   }

 // These are for dataref_ptr.  In the final loop nest we deal with more cases
 on simd_lane_access_p and diff_first_stmt_info, but no longer handle
 STMT_VINFO_GATHER_SCATTER_P.  Very little (one case) can be shared between
 them, so IMHO factoring it out seems like overkill.

 +
 + if (mask && !costing_p)
 +   vec_mask = vec_masks[j];

 // It's merged out from j == 0 and j != 0

 +
 + gimple *new_stmt = NULL;
 + for (i = 0; i < vec_num; i++)
 +   {
 + tree final_mask = NULL_TREE;
 + tree final_len = NULL_TREE;
 + tree bias = NULL_TREE;
 + if (!costing_p)
 +   {
 + if (loop_masks)
 +   final_mask
 + = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
 +   

[PATCH] Support constants and externals in BB reduction vectorization

2023-08-15 Thread Richard Biener via Gcc-patches
The following supports vectorizing BB reductions involving a
constant or an invariant.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vectorizer.h (_slp_instance::remain_stmts): Change
to ...
(_slp_instance::remain_defs): ... this.
(SLP_INSTANCE_REMAIN_STMTS): Rename to ...
(SLP_INSTANCE_REMAIN_DEFS): ... this.
(slp_root::remain): New.
(slp_root::slp_root): Adjust.
* tree-vect-slp.cc (vect_free_slp_instance): Adjust.
(vect_build_slp_instance): Get extra remain parameter,
adjust former handling of a cut off stmt.
(vect_analyze_slp_instance): Adjust.
(vect_analyze_slp): Likewise.
(_bb_vec_info::~_bb_vec_info): Likewise.
(vectorizable_bb_reduc_epilogue): Dump something if we fail.
(vect_slp_check_for_constructors): Handle non-internal
defs as remain defs of a reduction.
(vectorize_slp_instance_root_stmt): Adjust.

* gcc.dg/vect/bb-slp-75.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-75.c | 25 +++
 gcc/tree-vect-slp.cc  | 60 ++-
 gcc/tree-vectorizer.h |  9 ++--
 3 files changed, 71 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-75.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-75.c b/gcc/testsuite/gcc.dg/vect/bb-slp-75.c
new file mode 100644
index 000..1abac136f72
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-75.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "-ffast-math" } */
+/* { dg-additional-options "-msse2 -mfpmath=sse" { target { x86_64-*-* i?86-*-* } } } */
+
+float x[4];
+
+float test1 (float a)
+{
+  return x[0] + x[2] + x[1] + x[3] + a;
+}
+
+float test2 (void)
+{
+  return x[3] + x[2] + x[1] + 1.f + x[0];
+}
+
+float test3 (float a)
+{
+  return x[0] + a + x[2] + x[1] + x[3] + 1.f;
+}
+
+/* We currently require a .REDUC_PLUS direct internal function but do not
+   have a dejagnu target for this.  */
+/* { dg-final { scan-tree-dump-times "Basic block will be vectorized using SLP" 3 "slp2" { target { x86_64-*-* i?86-*-* } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 41997d5a546..cf91b21cf7d 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -209,7 +209,7 @@ vect_free_slp_instance (slp_instance instance)
   vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
   SLP_INSTANCE_LOADS (instance).release ();
   SLP_INSTANCE_ROOT_STMTS (instance).release ();
-  SLP_INSTANCE_REMAIN_STMTS (instance).release ();
+  SLP_INSTANCE_REMAIN_DEFS (instance).release ();
   instance->subgraph_entries.release ();
   instance->cost_vec.release ();
   free (instance);
@@ -3115,6 +3115,7 @@ vect_build_slp_instance (vec_info *vinfo,
 slp_instance_kind kind,
 vec<stmt_vec_info> &scalar_stmts,
 vec<stmt_vec_info> &root_stmt_infos,
+vec<tree> &remain,
 unsigned max_tree_size, unsigned *limit,
 scalar_stmts_to_slp_tree_map_t *bst_map,
 /* ???  We need stmt_info for group splitting.  */
@@ -3134,10 +3135,9 @@ vect_build_slp_instance (vec_info *vinfo,
  ???  Selecting the optimal set of lanes to vectorize would be nice
  but SLP build for all lanes will fail quickly because we think
  we're going to need unrolling.  */
-  auto_vec<stmt_vec_info> remain;
   if (kind == slp_inst_kind_bb_reduc
   && (scalar_stmts.length () & 1))
-remain.safe_push (scalar_stmts.pop ());
+remain.safe_insert (0, gimple_get_lhs (scalar_stmts.pop ()->stmt));
 
   /* Build the tree for the SLP instance.  */
   unsigned int group_size = scalar_stmts.length ();
@@ -3186,10 +3186,7 @@ vect_build_slp_instance (vec_info *vinfo,
  SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
  SLP_INSTANCE_LOADS (new_instance) = vNULL;
  SLP_INSTANCE_ROOT_STMTS (new_instance) = root_stmt_infos;
- if (!remain.is_empty ())
-   SLP_INSTANCE_REMAIN_STMTS (new_instance) = remain.copy ();
- else
-   SLP_INSTANCE_REMAIN_STMTS (new_instance) = vNULL;
+ SLP_INSTANCE_REMAIN_DEFS (new_instance) = remain;
  SLP_INSTANCE_KIND (new_instance) = kind;
  new_instance->reduc_phis = NULL;
  new_instance->cost_vec = vNULL;
@@ -3469,6 +3466,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
 gcc_unreachable ();
 
   vec roots = vNULL;
+  vec remain = vNULL;
   if (kind == slp_inst_kind_ctor)
 {
   roots.create (1);
@@ -3476,7 +3474,7 @@ vect_analyze_slp_instance (vec_info *vinfo,
 }
   /* Build the tree for the SLP instance.  */
   bool res = vect_build_slp_instance (vinfo, kind, scalar_stmts,
- roots,
+ roots, remain,
  max_tree_size, limit, 

Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-15 Thread Richard Biener via Gcc-patches
On Tue, Aug 15, 2023 at 1:47 PM Kewen.Lin  wrote:
>
> on 2023/8/15 15:53, Richard Biener wrote:
> > On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin  wrote:
> >>
> >> on 2023/8/14 22:16, Richard Sandiford wrote:
> >>> "Kewen.Lin"  writes:
>  Hi Richard,
> 
>  on 2023/8/14 20:20, Richard Sandiford wrote:
> > Thanks for the clean-ups.  But...
> >
> > "Kewen.Lin"  writes:
> >> Hi,
> >>
> >> Following Richi's suggestion [1], this patch is to move the
> >> handlings on VMAT_GATHER_SCATTER in the final loop nest
> >> of function vectorizable_load to its own loop.  Basically
> >> it duplicates the final loop nest, clean up some useless
> >> set up code for the case of VMAT_GATHER_SCATTER, remove some
> >> unreachable code.  Also remove the corresponding handlings
> >> in the final loop nest.
> >>
> >> Bootstrapped and regtested on x86_64-redhat-linux,
> >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
> >>
> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html
> >>
> >> Is it ok for trunk?
> >>
> >> BR,
> >> Kewen
> >> -
> >>
> >> gcc/ChangeLog:
> >>
> >>* tree-vect-stmts.cc (vectorizable_load): Move the handlings on
> >>VMAT_GATHER_SCATTER in the final loop nest to its own loop,
> >>and update the final nest accordingly.
> >> ---
> >>  gcc/tree-vect-stmts.cc | 361 +
> >>  1 file changed, 219 insertions(+), 142 deletions(-)
> >
> > ...that seems like quite a lot of +s.  Is there nothing we can do to
> > avoid the cut-&-paste?
> 
>  Thanks for the comments!  I'm not sure if I get your question, if we
>  want to move out the handlings of VMAT_GATHER_SCATTER, the new +s seem
>  inevitable?  Your concern is mainly about git blame history?
> >>>
> >>> No, it was more that 219-142=77, so it seems like a lot of lines
> >>> are being duplicated rather than simply being moved.  (Unlike for
> >>> VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
> >>> was a clear improvement.)
> >>>
> >>> So I was just wondering if there was any obvious factoring-out that
> >>> could be done to reduce the duplication.
> >>
> >> ah, thanks for the clarification!
> >>
> >> I think the main duplication are on the loop body beginning and end,
> >> let's take a look at them in details:
> >>
> >> +  if (memory_access_type == VMAT_GATHER_SCATTER)
> >> +{
> >> +  gcc_assert (alignment_support_scheme == dr_aligned
> >> + || alignment_support_scheme == dr_unaligned_supported);
> >> +  gcc_assert (!grouped_load && !slp_perm);
> >> +
> >> +  unsigned int inside_cost = 0, prologue_cost = 0;
> >>
> >> // These above are newly added.
> >>
> >> +  for (j = 0; j < ncopies; j++)
> >> +   {
> >> + /* 1. Create the vector or array pointer update chain.  */
> >> + if (j == 0 && !costing_p)
> >> +   {
> >> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +   vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
> >> +slp_node, _info, 
> >> _ptr,
> >> +_offsets);
> >> + else
> >> +   dataref_ptr
> >> + = vect_create_data_ref_ptr (vinfo, first_stmt_info, 
> >> aggr_type,
> >> + at_loop, offset, , gsi,
> >> + _incr, false, bump);
> >> +   }
> >> + else if (!costing_p)
> >> +   {
> >> + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> >> + if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, 
> >> ptr_incr,
> >> +  gsi, stmt_info, bump);
> >> +   }
> >>
> >> // These are for dataref_ptr.  In the final loop nest we deal with more
> >> cases on simd_lane_access_p and diff_first_stmt_info, but don't handle
> >> STMT_VINFO_GATHER_SCATTER_P any more.  Very little (one case) can be
> >> shared between them, so IMHO factoring it out seems like overkill.
> >>
> >> +
> >> + if (mask && !costing_p)
> >> +   vec_mask = vec_masks[j];
> >>
> >> // It's merged out from j == 0 and j != 0
> >>
> >> +
> >> + gimple *new_stmt = NULL;
> >> + for (i = 0; i < vec_num; i++)
> >> +   {
> >> + tree final_mask = NULL_TREE;
> >> + tree final_len = NULL_TREE;
> >> + tree bias = NULL_TREE;
> >> + if (!costing_p)
> >> +   {
> >> + if (loop_masks)
> >> +   final_mask
> >> + = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> >> +   vec_num * ncopies, vectype,
> >> + 

Re: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES,STORE_LANES} into vectorizer

2023-08-15 Thread Richard Biener via Gcc-patches
On Tue, 15 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> I realize this code perform analysis for load/store
> 
> +  internal_fn lanes_ifn;
>if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, 
> vls_type,
> ncopies, _access_type, ,
> -   _support_scheme, , 
> _info))
> +   _support_scheme, , 
> _info,
> +   _ifn))
> 
> This function generate gather/scatter info "gs_info", using same approach.
> 
> add "_ifn" here which compute IFN for lanes load/store.
> 
> Does it reasonable ?

Ah, OK.  I guess re-computing it is OK then (once).

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-15 19:19
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford
> Subject: Re: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES,STORE_LANES} into 
> vectorizer
> On Tue, 15 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > > +   if (vect_store_lanes_supported (vectype, group_size, false)
> > > +   == IFN_MASK_LEN_STORE_LANES)
> > 
> > >> can you use the previously computed 'ifn' here please?
> > 
> > Do you mean rewrite the codes as follows :?
> > 
> > internal_fn lanes_ifn = vect_store_lanes_supported (vectype, group_size, 
> > false);
> > 
> > if (lanes_ifn == IFN_MASK_LEN_STORE_LANES).
>  
> The vect_store_lanes_supported is performed during analysis already
> and ideally we'd not re-do such check, so please save it in a
> variable at that point.
> > >> I think the patch needs refreshing after r14-3214-ga74d0d36a3f337.
> > 
> > Yeah, working on it and I will test on both X86 and ARM.
> > 
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-08-15 17:40
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES,STORE_LANES} into 
> > vectorizer
> > On Mon, 14 Aug 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > From: Ju-Zhe Zhong 
> > > 
> > > Hi, Richard and Richi.
> > > 
> > > This patch is adding MASK_LEN_{LOAD_LANES,STORE_LANES} support into 
> > > vectorizer.
> > > 
> > > Consider this simple case:
> > > 
> > > void __attribute__ ((noinline, noclone))
> > > foo (int *__restrict a, int *__restrict b, int *__restrict c,
> > >   int *__restrict d, int *__restrict e, int *__restrict f,
> > >   int *__restrict g, int *__restrict h, int *__restrict j, int n)
> > > {
> > >   for (int i = 0; i < n; ++i)
> > > {
> > >   a[i] = j[i * 8];
> > >   b[i] = j[i * 8 + 1];
> > >   c[i] = j[i * 8 + 2];
> > >   d[i] = j[i * 8 + 3];
> > >   e[i] = j[i * 8 + 4];
> > >   f[i] = j[i * 8 + 5];
> > >   g[i] = j[i * 8 + 6];
> > >   h[i] = j[i * 8 + 7];
> > > }
> > > }
> > > 
> > > RVV Gimple IR:
> > > 
> > >   _79 = .SELECT_VL (ivtmp_81, POLY_INT_CST [4, 4]);
> > >   ivtmp_125 = _79 * 32;
> > >   vect_array.8 = .MASK_LEN_LOAD_LANES (vectp_j.6_124, 32B, { -1, ... }, 
> > > _79, 0);
> > >   vect__8.9_122 = vect_array.8[0];
> > >   vect__8.10_121 = vect_array.8[1];
> > >   vect__8.11_120 = vect_array.8[2];
> > >   vect__8.12_119 = vect_array.8[3];
> > >   vect__8.13_118 = vect_array.8[4];
> > >   vect__8.14_117 = vect_array.8[5];
> > >   vect__8.15_116 = vect_array.8[6];
> > >   vect__8.16_115 = vect_array.8[7];
> > >   vect_array.8 ={v} {CLOBBER};
> > >   ivtmp_114 = _79 * 4;
> > >   .MASK_LEN_STORE (vectp_a.17_113, 32B, { -1, ... }, _79, 0, 
> > > vect__8.9_122);
> > >   .MASK_LEN_STORE (vectp_b.19_109, 32B, { -1, ... }, _79, 0, 
> > > vect__8.10_121);
> > >   .MASK_LEN_STORE (vectp_c.21_105, 32B, { -1, ... }, _79, 0, 
> > > vect__8.11_120);
> > >   .MASK_LEN_STORE (vectp_d.23_101, 32B, { -1, ... }, _79, 0, 
> > > vect__8.12_119);
> > >   .MASK_LEN_STORE (vectp_e.25_97, 32B, { -1, ... }, _79, 0, 
> > > vect__8.13_118);
> > >   .MASK_LEN_STORE (vectp_f.27_93, 32B, { -1, ... }, _79, 0, 
> > > vect__8.14_117);
> > >   .MASK_LEN_STORE (vectp_g.29_89, 32B, { -1, ... }, _79, 0, 
> > > vect__8.15_116);
> > >   .MASK_LEN_STORE (vectp_h.31_85, 32B, { -1, ... }, _79, 0, 
> > > vect__8.16_115);
> > > 
> > > ASM:
> > > 
> > > foo:
> > > lw t4,8(sp)
> > > ld t5,0(sp)
> > > ble t4,zero,.L5
> > > .L3:
> > > vsetvli t1,t4,e8,mf4,ta,ma
> > > vlseg8e32.v v8,(t5)
> > > slli t3,t1,2
> > > slli t6,t1,5
> > > vse32.v v8,0(a0)
> > > vse32.v v9,0(a1)
> > > vse32.v v10,0(a2)
> > > vse32.v v11,0(a3)
> > > vse32.v v12,0(a4)
> > > vse32.v v13,0(a5)
> > > vse32.v v14,0(a6)
> > > vse32.v v15,0(a7)
> > > sub t4,t4,t1
> > > add t5,t5,t6
> > > add a0,a0,t3
> > > add a1,a1,t3
> > > add a2,a2,t3
> > > add a3,a3,t3
> > > add a4,a4,t3
> > > add a5,a5,t3
> > > add a6,a6,t3
> > > add a7,a7,t3
> > > bne t4,zero,.L3
> > > .L5:
> > > ret
> > > 
> > > The details of the approach:
> > > 
> > > Step 1 - Modify the LANES LOAD/STORE support function 
> > > (vect_load_lanes_supported/vect_store_lanes_supported):
> > > 
> > > +/* Return FN if 

Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-15 Thread Kewen.Lin via Gcc-patches
on 2023/8/15 15:53, Richard Biener wrote:
> On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin  wrote:
>>
>> on 2023/8/14 22:16, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
 Hi Richard,

 on 2023/8/14 20:20, Richard Sandiford wrote:
> Thanks for the clean-ups.  But...
>
> "Kewen.Lin"  writes:
>> Hi,
>>
>> Following Richi's suggestion [1], this patch is to move the
>> handlings on VMAT_GATHER_SCATTER in the final loop nest
>> of function vectorizable_load to its own loop.  Basically
>> it duplicates the final loop nest, clean up some useless
>> set up code for the case of VMAT_GATHER_SCATTER, remove some
>> unreachable code.  Also remove the corresponding handlings
>> in the final loop nest.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>>
>> gcc/ChangeLog:
>>
>>* tree-vect-stmts.cc (vectorizable_load): Move the handlings on
>>VMAT_GATHER_SCATTER in the final loop nest to its own loop,
>>and update the final nest accordingly.
>> ---
>>  gcc/tree-vect-stmts.cc | 361 +
>>  1 file changed, 219 insertions(+), 142 deletions(-)
>
> ...that seems like quite a lot of +s.  Is there nothing we can do to
> avoid the cut-&-paste?

 Thanks for the comments!  I'm not sure if I get your question, if we
 want to move out the handlings of VMAT_GATHER_SCATTER, the new +s seem
 inevitable?  Your concern is mainly about git blame history?
>>>
>>> No, it was more that 219-142=77, so it seems like a lot of lines
>>> are being duplicated rather than simply being moved.  (Unlike for
>>> VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
>>> was a clear improvement.)
>>>
>>> So I was just wondering if there was any obvious factoring-out that
>>> could be done to reduce the duplication.
>>
>> ah, thanks for the clarification!
>>
>> I think the main duplication are on the loop body beginning and end,
>> let's take a look at them in details:
>>
>> +  if (memory_access_type == VMAT_GATHER_SCATTER)
>> +{
>> +  gcc_assert (alignment_support_scheme == dr_aligned
>> + || alignment_support_scheme == dr_unaligned_supported);
>> +  gcc_assert (!grouped_load && !slp_perm);
>> +
>> +  unsigned int inside_cost = 0, prologue_cost = 0;
>>
>> // These above are newly added.
>>
>> +  for (j = 0; j < ncopies; j++)
>> +   {
>> + /* 1. Create the vector or array pointer update chain.  */
>> + if (j == 0 && !costing_p)
>> +   {
>> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
>> +   vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
>> +slp_node, _info, 
>> _ptr,
>> +_offsets);
>> + else
>> +   dataref_ptr
>> + = vect_create_data_ref_ptr (vinfo, first_stmt_info, 
>> aggr_type,
>> + at_loop, offset, , gsi,
>> + _incr, false, bump);
>> +   }
>> + else if (!costing_p)
>> +   {
>> + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
>> + if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
>> +   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
>> +  gsi, stmt_info, bump);
>> +   }
>>
>> // These are for dataref_ptr.  In the final loop nest we deal with more
>> cases on simd_lane_access_p and diff_first_stmt_info, but don't handle
>> STMT_VINFO_GATHER_SCATTER_P any more.  Very little (one case) can be
>> shared between them, so IMHO factoring it out seems like overkill.
>>
>> +
>> + if (mask && !costing_p)
>> +   vec_mask = vec_masks[j];
>>
>> // It's merged out from j == 0 and j != 0
>>
>> +
>> + gimple *new_stmt = NULL;
>> + for (i = 0; i < vec_num; i++)
>> +   {
>> + tree final_mask = NULL_TREE;
>> + tree final_len = NULL_TREE;
>> + tree bias = NULL_TREE;
>> + if (!costing_p)
>> +   {
>> + if (loop_masks)
>> +   final_mask
>> + = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
>> +   vec_num * ncopies, vectype,
>> +   vec_num * j + i);
>> + if (vec_mask)
>> +   final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
>> +  final_mask, vec_mask, 
>> gsi);
>> +
>> + if (i > 0 && 

Re: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES, STORE_LANES} into vectorizer

2023-08-15 Thread juzhe.zh...@rivai.ai
Hi, Richi.

I realize this code performs the analysis for load/store:

+  internal_fn lanes_ifn;
   if (!get_load_store_type (vinfo, stmt_info, vectype, slp_node, mask, 
vls_type,
ncopies, _access_type, ,
-   _support_scheme, , _info))
+   _support_scheme, , _info,
+   _ifn))

This function generates the gather/scatter info "gs_info".  Using the same
approach, I add "_ifn" here, which computes the IFN for lanes load/store.

Is that reasonable?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-15 19:19
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford
Subject: Re: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES,STORE_LANES} into 
vectorizer
On Tue, 15 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> > +   if (vect_store_lanes_supported (vectype, group_size, false)
> > +   == IFN_MASK_LEN_STORE_LANES)
> 
> >> can you use the previously computed 'ifn' here please?
> 
> Do you mean rewrite the codes as follows :?
> 
> internal_fn lanes_ifn = vect_store_lanes_supported (vectype, group_size, 
> false);
> 
> if (lanes_ifn == IFN_MASK_LEN_STORE_LANES).
 
The vect_store_lanes_supported is performed during analysis already
and ideally we'd not re-do such check, so please save it in a
variable at that point.
> >> I think the patch needs refreshing after r14-3214-ga74d0d36a3f337.
> 
> Yeah, working on it and I will test on both X86 and ARM.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-15 17:40
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES,STORE_LANES} into 
> vectorizer
> On Mon, 14 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, Richard and Richi.
> > 
> > This patch is adding MASK_LEN_{LOAD_LANES,STORE_LANES} support into 
> > vectorizer.
> > 
> > Consider this simple case:
> > 
> > void __attribute__ ((noinline, noclone))
> > foo (int *__restrict a, int *__restrict b, int *__restrict c,
> >   int *__restrict d, int *__restrict e, int *__restrict f,
> >   int *__restrict g, int *__restrict h, int *__restrict j, int n)
> > {
> >   for (int i = 0; i < n; ++i)
> > {
> >   a[i] = j[i * 8];
> >   b[i] = j[i * 8 + 1];
> >   c[i] = j[i * 8 + 2];
> >   d[i] = j[i * 8 + 3];
> >   e[i] = j[i * 8 + 4];
> >   f[i] = j[i * 8 + 5];
> >   g[i] = j[i * 8 + 6];
> >   h[i] = j[i * 8 + 7];
> > }
> > }
> > 
> > RVV Gimple IR:
> > 
> >   _79 = .SELECT_VL (ivtmp_81, POLY_INT_CST [4, 4]);
> >   ivtmp_125 = _79 * 32;
> >   vect_array.8 = .MASK_LEN_LOAD_LANES (vectp_j.6_124, 32B, { -1, ... }, 
> > _79, 0);
> >   vect__8.9_122 = vect_array.8[0];
> >   vect__8.10_121 = vect_array.8[1];
> >   vect__8.11_120 = vect_array.8[2];
> >   vect__8.12_119 = vect_array.8[3];
> >   vect__8.13_118 = vect_array.8[4];
> >   vect__8.14_117 = vect_array.8[5];
> >   vect__8.15_116 = vect_array.8[6];
> >   vect__8.16_115 = vect_array.8[7];
> >   vect_array.8 ={v} {CLOBBER};
> >   ivtmp_114 = _79 * 4;
> >   .MASK_LEN_STORE (vectp_a.17_113, 32B, { -1, ... }, _79, 0, vect__8.9_122);
> >   .MASK_LEN_STORE (vectp_b.19_109, 32B, { -1, ... }, _79, 0, 
> > vect__8.10_121);
> >   .MASK_LEN_STORE (vectp_c.21_105, 32B, { -1, ... }, _79, 0, 
> > vect__8.11_120);
> >   .MASK_LEN_STORE (vectp_d.23_101, 32B, { -1, ... }, _79, 0, 
> > vect__8.12_119);
> >   .MASK_LEN_STORE (vectp_e.25_97, 32B, { -1, ... }, _79, 0, vect__8.13_118);
> >   .MASK_LEN_STORE (vectp_f.27_93, 32B, { -1, ... }, _79, 0, vect__8.14_117);
> >   .MASK_LEN_STORE (vectp_g.29_89, 32B, { -1, ... }, _79, 0, vect__8.15_116);
> >   .MASK_LEN_STORE (vectp_h.31_85, 32B, { -1, ... }, _79, 0, vect__8.16_115);
> > 
> > ASM:
> > 
> > foo:
> > lw t4,8(sp)
> > ld t5,0(sp)
> > ble t4,zero,.L5
> > .L3:
> > vsetvli t1,t4,e8,mf4,ta,ma
> > vlseg8e32.v v8,(t5)
> > slli t3,t1,2
> > slli t6,t1,5
> > vse32.v v8,0(a0)
> > vse32.v v9,0(a1)
> > vse32.v v10,0(a2)
> > vse32.v v11,0(a3)
> > vse32.v v12,0(a4)
> > vse32.v v13,0(a5)
> > vse32.v v14,0(a6)
> > vse32.v v15,0(a7)
> > sub t4,t4,t1
> > add t5,t5,t6
> > add a0,a0,t3
> > add a1,a1,t3
> > add a2,a2,t3
> > add a3,a3,t3
> > add a4,a4,t3
> > add a5,a5,t3
> > add a6,a6,t3
> > add a7,a7,t3
> > bne t4,zero,.L3
> > .L5:
> > ret
> > 
> > The details of the approach:
> > 
> > Step 1 - Modify the LANES LOAD/STORE support function 
> > (vect_load_lanes_supported/vect_store_lanes_supported):
> > 
> > +/* Return FN if vec_{masked_,mask_len,}load_lanes is available for COUNT
> > +   vectors of type VECTYPE.  MASKED_P says whether the masked form is 
> > needed. */
> >  
> > -bool
> > +internal_fn
> >  vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count,
> > bool masked_p)
> >  {
> > -  if (masked_p)
> > -return vect_lanes_optab_supported_p ("vec_mask_load_lanes",
> > - vec_mask_load_lanes_optab,
> > - vectype, count);
> > +  if (vect_lanes_optab_supported_p 

Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-15 Thread Robin Dapp via Gcc-patches
> Plz put your testcases into:
> 
> # widening operation only test on LMUL < 8
> set AUTOVEC_TEST_OPTS [list \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m1} \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} ]
> foreach op $AUTOVEC_TEST_OPTS {
>   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] \
>     "" "$op"
> }
> 
> You could either simply put them into the "widen" directory or create a new
> directory.
> Anyway, make sure you have fully tested it with LMUL = 1/2/4.

Ah, almost forgot this.  I moved the tests to the widen directory
and will push it after testing.

Regards
 Robin


Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-15 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 14 Aug 2023 at 18:23, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 10 Aug 2023 at 21:27, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> >> static bool
> >> >> is_simple_vla_size (poly_uint64 size)
> >> >> {
> >> >>   if (size.is_constant ())
> >> >> return false;
> >> >>   for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
> >> >> if (size[i] != (i <= 1 ? size[0] : 0))
> >> > Just wondering is this should be (i == 1 ? size[0] : 0) since i is
> >> > initialized to 1 ?
> >>
> >> Both work.  I prefer <= 1 because it doesn't depend on the micro
> >> optimisation to start at coefficient 1.  In a theoretical 3-indeterminate
> >> poly_int, we want the first 2 coefficients to be nonzero and the rest to
> >> be zero.
> >>
> >> > IIUC, is_simple_vla_size should return true for polynomials of first
> >> > degree and having same coeff like 4 + 4x ?
> >>
> >> FWIW, poly_int only supports first-degree polynomials at the moment.
> >> coeffs>2 means there is more than one indeterminate, rather than a
> >> higher power.
> > Oh OK, thanks for the clarification.
> >>
> >> >>   return false;
> >> >>   return true;
> >> >> }
> >> >>
> >> >>
> >> >>   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
> >> >> {
> >> >>   auto nunits = GET_MODE_NUNITS (mode);
> >> >>   if (!is_simple_vla_size (nunits))
> >> >> continue;
> >> >>   if (nunits[0] ...)
> >> >> test_... (mode);
> >> >>   ...
> >> >>
> >> >> }
> >> >>
> >> >> test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the
> >> >> loop structure above, I think we can apply the test_vnx4si and
> >> >> test_vnx16qi to more cases.  So the classification isn't the
> >> >> exact number of elements, but instead a limit.
> >> >>
> >> >> I think the nunits[0] conditions for test_vnx4si are as follows
> >> >> (inspection only, so could be wrong):
> >> >>
> >> >> > +/* Test cases where result and input vectors are VNx4SI  */
> >> >> > +
> >> >> > +static void
> >> >> > +test_vnx4si (machine_mode vmode)
> >> >> > +{
> >> >> > +  /* Case 1: mask = {0, ...} */
> >> >> > +  {
> >> >> > +tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> >> > +tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> >> > +poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> >> > +
> >> >> > +vec_perm_builder builder (len, 1, 1);
> >> >> > +builder.quick_push (0);
> >> >> > +vec_perm_indices sel (builder, 2, len);
> >> >> > +tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> >> >> > +
> >> >> > +tree expected_res[] = { vector_cst_elt (res, 0) };
> >> > This should be { vector_cst_elt (arg0, 0) }; will fix in next patch.
> >> >> > +validate_res (1, 1, res, expected_res);
> >> >> > +  }
> >> >>
> >> >> nunits[0] >= 2 (could be all nunits if the inputs had 
> >> >> nelts_per_pattern==1,
> >> >> which I think would be better)
> >> > IIUC, the vectors that can be used for a particular test should have
> >> > nunits[0] >= res_npatterns,
> >> > where res_npatterns is as computed in fold_vec_perm_cst without the
> >> > canonicalization ?
> >> > For above test -- res_npatterns = max(2, max (2, 1)) == 2, so we
> >> > require nunits[0] >= 2 ?
> >> > Which implies we can use above test for vectors with length 2 + 2x, 4 + 
> >> > 4x, etc.
> >>
> >> Right, that's what I meant.  With the inputs as they stand it has to be
> >> nunits[0] >= 2.  We need that form the inputs correctly.  But if the
> >> inputs instead had nelts_per_pattern == 1, the test would work for all
> >> nunits.
> > In the attached patch, I have reordered the tests based on min or max limit.
> > For tests where sel_npatterns < 3 (ie dup sequence), I have kept input
> > npatterns = 1,
> > so we can test more vector modes, and also input npatterns matter only
> > for stepped sequence in sel
> > (Since for a dup pattern we don't enforce the constraint of selecting
> > elements from same input pattern).
> > Does it look OK ?
> >
> > For the following tests with input vectors having shape (1, 3)
> > sel = {0, 1, 2, ...}  // (1, 3)
> > res = { arg0[0], arg0[1], arg0[2], ... } // (1, 3)
> >
> > and sel = {len, len + 1, len + 2, ... }  // (1, 3)
> > res = { arg1[0], arg1[1], arg1[2], ... } // (1, 3)
> >
> > Altho res_npatterns = 1, I suppose these will need to be tested with
> > vectors with length >= 4 + 4x,
> > since index 2 can be ambiguous for length 2 + 2x  ?
> > (In the patch, these are cases 2 and 3 in test_nunits_min_4)
>
> Ah, yeah, fair point.  I guess that means:
>
> +  /* Case 3: mask = {len, 0, 1, ...} // (1, 3)
> +Test that stepped sequence of the pattern selects from arg0.
> +res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> +  {
> +   tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +   tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +   poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +   
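
As an aside to the is_simple_vla_size discussion quoted above, here is a
minimal C sketch of why "i <= 1" and "i == 1" behave the same when the loop
starts at coefficient 1, and why only the "<= 1" form keeps working if the
loop starts at 0.  struct poly_u64, NUM_COEFFS and the extra "start"
parameter are hypothetical stand-ins for illustration, not GCC's poly_int API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a polynomial size:
   coeffs[0] + coeffs[1] * x1 + coeffs[2] * x2, one runtime indeterminate
   per extra coefficient.  Three coefficients are used only so that the
   i > 1 case is exercised.  */
#define NUM_COEFFS 3

struct poly_u64
{
  uint64_t coeffs[NUM_COEFFS];
};

/* True if every coefficient after the first is zero.  */
static bool
poly_is_constant (const struct poly_u64 *size)
{
  assert (size);
  for (size_t i = 1; i < NUM_COEFFS; ++i)
    if (size->coeffs[i] != 0)
      return false;
  return true;
}

/* Accept only sizes of the form N + N * x1.  START is the first
   coefficient the loop inspects; the quoted code uses 1.  */
static bool
is_simple_vla_size (const struct poly_u64 *size, size_t start)
{
  if (poly_is_constant (size))
    return false;
  for (size_t i = start; i < NUM_COEFFS; ++i)
    /* For i == 0 this compares coeffs[0] with itself, so the check is
       harmless; an "i == 1" test would instead demand coeffs[0] == 0.  */
    if (size->coeffs[i] != (i <= 1 ? size->coeffs[0] : 0))
      return false;
  return true;
}
```

With this model, a size such as 4 + 4x is accepted whether the loop starts at
0 or at 1, while 4, 4 + 8x and 4 + 4x + 4y are all rejected.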

Re: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES,STORE_LANES} into vectorizer

2023-08-15 Thread Richard Biener via Gcc-patches
On Tue, 15 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> > + if (vect_store_lanes_supported (vectype, group_size, false)
> > + == IFN_MASK_LEN_STORE_LANES)
> 
> >> can you use the previously computed 'ifn' here please?
> 
> Do you mean rewrite the codes as follows :?
> 
> internal_fn lanes_ifn = vect_store_lanes_supported (vectype, group_size, 
> false);
> 
> if (lanes_ifn == IFN_MASK_LEN_STORE_LANES).

The vect_store_lanes_supported is performed during analysis already
and ideally we'd not re-do such check, so please save it in a
variable at that point.
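
A rough sketch of what "save it in a variable" could look like, written as
plain C rather than vectorizer code.  Every name here (enum lanes_fn,
query_store_lanes, struct store_info, query_count) is a hypothetical stand-in
and not the GCC API; the point is only that the target query runs once during
analysis and the transform step reads the saved answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the internal function chosen for a lanes
   store.  */
enum lanes_fn
{
  FN_NONE,
  FN_STORE_LANES,
  FN_MASK_STORE_LANES,
  FN_MASK_LEN_STORE_LANES
};

/* Counts how often the (pretend) target query runs.  */
static int query_count;

/* Hypothetical target query, modelled on the three-way choice in the
   patch: prefer the mask+len form, else masked, else plain.  */
static enum lanes_fn
query_store_lanes (bool has_mask_len, bool masked_p)
{
  ++query_count;
  if (has_mask_len)
    return FN_MASK_LEN_STORE_LANES;
  return masked_p ? FN_MASK_STORE_LANES : FN_STORE_LANES;
}

/* Per-statement state shared between analysis and transform.  */
struct store_info
{
  enum lanes_fn lanes_ifn;
};

/* Analysis: run the query once and remember the result.  */
static bool
analyze_store (struct store_info *info, bool has_mask_len, bool masked_p)
{
  assert (info);
  info->lanes_ifn = query_store_lanes (has_mask_len, masked_p);
  return info->lanes_ifn != FN_NONE;
}

/* Transform: consult the saved value instead of querying again.  */
static bool
transform_uses_len (const struct store_info *info)
{
  assert (info);
  return info->lanes_ifn == FN_MASK_LEN_STORE_LANES;
}
```

Calling analyze_store once and transform_uses_len any number of times leaves
query_count at 1, which is the property being asked for.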
 
> >> I think the patch needs refreshing after r14-3214-ga74d0d36a3f337.
> 
> Yeah, working on it and I will test on both X86 and ARM.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-15 17:40
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Apply MASK_LEN_{LOAD_LANES,STORE_LANES} into 
> vectorizer
> On Mon, 14 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, Richard and Richi.
> > 
> > This patch is adding MASK_LEN_{LOAD_LANES,STORE_LANES} support into 
> > vectorizer.
> > 
> > Consider this simple case:
> > 
> > void __attribute__ ((noinline, noclone))
> > foo (int *__restrict a, int *__restrict b, int *__restrict c,
> >   int *__restrict d, int *__restrict e, int *__restrict f,
> >   int *__restrict g, int *__restrict h, int *__restrict j, int n)
> > {
> >   for (int i = 0; i < n; ++i)
> > {
> >   a[i] = j[i * 8];
> >   b[i] = j[i * 8 + 1];
> >   c[i] = j[i * 8 + 2];
> >   d[i] = j[i * 8 + 3];
> >   e[i] = j[i * 8 + 4];
> >   f[i] = j[i * 8 + 5];
> >   g[i] = j[i * 8 + 6];
> >   h[i] = j[i * 8 + 7];
> > }
> > }
> > 
> > RVV Gimple IR:
> > 
> >   _79 = .SELECT_VL (ivtmp_81, POLY_INT_CST [4, 4]);
> >   ivtmp_125 = _79 * 32;
> >   vect_array.8 = .MASK_LEN_LOAD_LANES (vectp_j.6_124, 32B, { -1, ... }, 
> > _79, 0);
> >   vect__8.9_122 = vect_array.8[0];
> >   vect__8.10_121 = vect_array.8[1];
> >   vect__8.11_120 = vect_array.8[2];
> >   vect__8.12_119 = vect_array.8[3];
> >   vect__8.13_118 = vect_array.8[4];
> >   vect__8.14_117 = vect_array.8[5];
> >   vect__8.15_116 = vect_array.8[6];
> >   vect__8.16_115 = vect_array.8[7];
> >   vect_array.8 ={v} {CLOBBER};
> >   ivtmp_114 = _79 * 4;
> >   .MASK_LEN_STORE (vectp_a.17_113, 32B, { -1, ... }, _79, 0, vect__8.9_122);
> >   .MASK_LEN_STORE (vectp_b.19_109, 32B, { -1, ... }, _79, 0, 
> > vect__8.10_121);
> >   .MASK_LEN_STORE (vectp_c.21_105, 32B, { -1, ... }, _79, 0, 
> > vect__8.11_120);
> >   .MASK_LEN_STORE (vectp_d.23_101, 32B, { -1, ... }, _79, 0, 
> > vect__8.12_119);
> >   .MASK_LEN_STORE (vectp_e.25_97, 32B, { -1, ... }, _79, 0, vect__8.13_118);
> >   .MASK_LEN_STORE (vectp_f.27_93, 32B, { -1, ... }, _79, 0, vect__8.14_117);
> >   .MASK_LEN_STORE (vectp_g.29_89, 32B, { -1, ... }, _79, 0, vect__8.15_116);
> >   .MASK_LEN_STORE (vectp_h.31_85, 32B, { -1, ... }, _79, 0, vect__8.16_115);
> > 
> > ASM:
> > 
> > foo:
> > lw t4,8(sp)
> > ld t5,0(sp)
> > ble t4,zero,.L5
> > .L3:
> > vsetvli t1,t4,e8,mf4,ta,ma
> > vlseg8e32.v v8,(t5)
> > slli t3,t1,2
> > slli t6,t1,5
> > vse32.v v8,0(a0)
> > vse32.v v9,0(a1)
> > vse32.v v10,0(a2)
> > vse32.v v11,0(a3)
> > vse32.v v12,0(a4)
> > vse32.v v13,0(a5)
> > vse32.v v14,0(a6)
> > vse32.v v15,0(a7)
> > sub t4,t4,t1
> > add t5,t5,t6
> > add a0,a0,t3
> > add a1,a1,t3
> > add a2,a2,t3
> > add a3,a3,t3
> > add a4,a4,t3
> > add a5,a5,t3
> > add a6,a6,t3
> > add a7,a7,t3
> > bne t4,zero,.L3
> > .L5:
> > ret
> > 
> > The details of the approach:
> > 
> > Step 1 - Modify the LANES LOAD/STORE support function 
> > (vect_load_lanes_supported/vect_store_lanes_supported):
> > 
> > +/* Return FN if vec_{masked_,mask_len,}load_lanes is available for COUNT
> > +   vectors of type VECTYPE.  MASKED_P says whether the masked form is 
> > needed. */
> >  
> > -bool
> > +internal_fn
> >  vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count,
> > bool masked_p)
> >  {
> > -  if (masked_p)
> > -return vect_lanes_optab_supported_p ("vec_mask_load_lanes",
> > - vec_mask_load_lanes_optab,
> > - vectype, count);
> > +  if (vect_lanes_optab_supported_p ("vec_mask_len_load_lanes",
> > + vec_mask_len_load_lanes_optab,
> > + vectype, count))
> > +return IFN_MASK_LEN_LOAD_LANES;
> > +  else if (masked_p)
> > +{
> > +  if (vect_lanes_optab_supported_p ("vec_mask_load_lanes",
> > + vec_mask_load_lanes_optab,
> > + vectype, count))
> > + return IFN_MASK_LOAD_LANES;
> > +}
> >else
> > -return vect_lanes_optab_supported_p ("vec_load_lanes",
> > - vec_load_lanes_optab,
> > - vectype, count);
> > +{
> > +  if (vect_lanes_optab_supported_p ("vec_load_lanes",
> > + vec_load_lanes_optab,
> > + vectype, count))
> > + return IFN_LOAD_LANES;
> > +}
> > +  return IFN_LAST;
> >  }
> >  
> > Instead of returning TRUE or FALSE whether target support the LANES 
> > LOAD/STORE.
> > I 

Re: Is this a bug for __builtin_dynamic_object_size?

2023-08-15 Thread Siddhesh Poyarekar

On 2023-08-14 19:12, Qing Zhao wrote:

Hi, Sid,

For the following testing case:

#include <stdio.h>

#define noinline __attribute__((__noinline__))

static void noinline alloc_buf_more (int index)
{
   struct annotated {
 long foo;
 char b;
 char array[index];
 long c;
   } q, *p;

   p = &q;

   printf("the__bdos of p->array whole max is %d \n", 
__builtin_dynamic_object_size(p->array, 0));
   printf("the__bdos of p->array sub max is %d \n", 
__builtin_dynamic_object_size(p->array, 1));
   printf("the__bdos of p->array whole min is %d \n", 
__builtin_dynamic_object_size(p->array, 2));
   printf("the__bdos of p->array sub min is %d \n", 
__builtin_dynamic_object_size(p->array, 3));

   return;
}

int main ()
{
   alloc_buf_more (10);
   return 0;
}

If I compile it with the latest upstream gcc and run it:

/home/opc/Install/latest-d/bin/gcc -O t.c
the__bdos of p->array whole max is 23
the__bdos of p->array sub max is 23
the__bdos of p->array whole min is 23
the__bdos of p->array sub min is 23

Here __builtin_dynamic_object_size(p->array, 0) and
__builtin_dynamic_object_size(p->array, 1) return the same size, which seems
wrong to me.

There is one line in tree-object-size.cc that might relate to this bug (in the
routine “addr_object_size”):

  603   if (! TYPE_SIZE_UNIT (TREE_TYPE (var))
  604   || ! tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (var)))
  605   || (pt_var_size && TREE_CODE (pt_var_size) == INTEGER_CST
  606   && tree_int_cst_lt (pt_var_size,
  607   TYPE_SIZE_UNIT (TREE_TYPE (var)))))
  608 var = pt_var;

I suspect that line 604 above, “! tree_fits_uhwi_p (TYPE_SIZE_UNIT
(TREE_TYPE (var)))”, relates to this bug: the TYPESIZE of the VLA “array”
is not an unsigned HOST_WIDE_INT, but could we still use its TYPESIZE for
dynamic_object_size?

What do you think?


Thanks, yes that doesn't work.  I'm trying to revive the patch I had 
submitted earlier[1] in the year and fix this issue too in that process. 
 In general the subobject size computation doesn't handle variable 
sizes at all; it depends on whole object+offset to get size information, 
which ends up working only for flex arrays at the end of objects.


Sid

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608914.html


Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-15 Thread Xi Ruoyao via Gcc-patches
Please fix the code style (this is the third time I've said it, and I'm
really frustrated now).  GCC is a project, not student homework, so style
matters.  And it's not difficult to fix the style: for a new file you
can use "clang-format --style GNU -i filename.c" to do the work
automatically.

On Tue, 2023-08-15 at 18:39 +0800, chenxiaolong wrote:
> In the implementation process, the "q" suffix function is
>     Re-register and associate the "__float128" type with the
>     "long double" type so that the compiler can handle the
>     corresponding function correctly. The functions implemented
>     include __builtin_{huge_valq infq, fabsq, copysignq, nanq,nansq}.
>     On the LoongArch architecture, __builtin_{fabsq,copysignq} can
>     be implemented with the instruction "bstrins.d", so that its
>     optimization effect reaches the optimal value.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch-builtins.cc (DEF_LARCH_FTYPE):
> (enum loongarch_builtin_type):Increases the type of the function.
> (FLOAT_BUILTIN_HIQ):__builtin_{huge_valq,infq}.
> (FLOAT_BUILTIN_FCQ):__builtin_{fabsq,copysignq}.
> (FLOAT_BUILTIN_NNQ):__builtin_{nanq,nansq}.
> (loongarch_init_builtins):
> (loongarch_fold_builtin):
> (loongarch_expand_builtin):
> * config/loongarch/loongarch-protos.h (loongarch_fold_builtin):
> (loongarch_c_mode_for_suffix):Add the declaration of the function.
> * config/loongarch/loongarch.cc (loongarch_c_mode_for_suffix):Add
>     the definition of the function.
> (TARGET_FOLD_BUILTIN):
> (TARGET_C_MODE_FOR_SUFFIX):
> * config/loongarch/loongarch.md (infq):Add an instruction template
>     to the machine description file to generate information such as
>     the icode used by the function and the constructor.
> ():
> (fabsq):
> (copysignq):
> 
> libgcc/ChangeLog:
> 
> * config/loongarch/t-softfp-tf:
> * config/loongarch/tf-signs.c: New file.
> ---
>  gcc/config/loongarch/loongarch-builtins.cc | 168 -
>  gcc/config/loongarch/loongarch-protos.h    |   2 +
>  gcc/config/loongarch/loongarch.cc  |  14 ++
>  gcc/config/loongarch/loongarch.md  |  69 +
>  libgcc/config/loongarch/t-softfp-tf    |   3 +
>  libgcc/config/loongarch/tf-signs.c |  59 
>  6 files changed, 313 insertions(+), 2 deletions(-)
>  create mode 100644 libgcc/config/loongarch/tf-signs.c
> 
> diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
> b/gcc/config/loongarch/loongarch-builtins.cc
> index b929f224dfa..2fb0fde0e3f 100644
> --- a/gcc/config/loongarch/loongarch-builtins.cc
> +++ b/gcc/config/loongarch/loongarch-builtins.cc
> @@ -36,6 +36,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "fold-const.h"
>  #include "expr.h"
>  #include "langhooks.h"
> +#include "calls.h"
> +#include "explow.h"
>  
>  /* Macros to create an enumeration identifier for a function prototype.  */
>  #define LARCH_FTYPE_NAME1(A, B) LARCH_##A##_FTYPE_##B
> @@ -48,9 +50,18 @@ enum loongarch_function_type
>  #define DEF_LARCH_FTYPE(NARGS, LIST) LARCH_FTYPE_NAME##NARGS LIST,
>  #include "config/loongarch/loongarch-ftypes.def"
>  #undef DEF_LARCH_FTYPE
> +  LARCH_BUILTIN_HUGE_VALQ,
> +  LARCH_BUILTIN_INFQ,
> +  LARCH_BUILTIN_FABSQ,
> +  LARCH_BUILTIN_COPYSIGNQ,
> +  LARCH_BUILTIN_NANQ,
> +  LARCH_BUILTIN_NANSQ,
>    LARCH_MAX_FTYPE_MAX
>  };
>  
> +/* Count the number of functions with "q" as the suffix.  */
> +const int MATHQ_NUMS = (int)LARCH_MAX_FTYPE_MAX - 
> (int)LARCH_BUILTIN_HUGE_VALQ;
> +
>  /* Specifies how a built-in function should be converted into rtl.  */
>  enum loongarch_builtin_type
>  {
> @@ -63,6 +74,15 @@ enum loongarch_builtin_type
>   value and the arguments are mapped to operands 0 and above.  */
>    LARCH_BUILTIN_DIRECT_NO_TARGET,
>  
> + /* The function corresponds to  __builtin_{huge_valq,infq}.  */
> +  LARCH_BUILTIN_HIQ_DIRECT,
> +
> + /* The function corresponds to  __builtin_{fabsq,copysignq}.  */
> +  LARCH_BUILTIN_FCQ_DIRECT,
> +
> +  /* Define the type of the __builtin_{nanq,nansq} function.  */
> +  LARCH_BUILTIN_NNQ_DIRECT
> +
>  };
>  
>  /* Declare an availability predicate for built-in functions that require
> @@ -136,6 +156,24 @@ AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI)
>    LARCH_BUILTIN (INSN, #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \
>  FUNCTION_TYPE, AVAIL)
>  
> +/* Define a float builtin for the functions {huge_valq,infq}.  */
> +#define FLOAT_BUILTIN_HIQ(INSN, FUNCTION_TYPE)  \
> +    { CODE_FOR_ ## INSN, \
> +    "__builtin_" #INSN,  LARCH_BUILTIN_HIQ_DIRECT,    \
> +    FUNCTION_TYPE, loongarch_builtin_avail_default }
> +
> +/* Define a float builtin for the functions {fabsq,copysignq}.  */
> +#define FLOAT_BUILTIN_FCQ(INSN, FUNCTION_TYPE)  \
> +    { CODE_FOR_ ## INSN,

Re: [PATCH] RISC-V: Fix autovec_length_operand predicate[PR110989]

2023-08-15 Thread Robin Dapp via Gcc-patches
> Currently, autovec_length_operand predicate incorrect configuration is
> discovered in PR110989 since this following situation:

In case you haven't committed it yet: This is OK.

Regards
 Robin


[PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-15 Thread chenxiaolong
In this implementation, the "q"-suffixed functions are re-registered
and the "__float128" type is associated with the "long double" type
so that the compiler can handle the corresponding functions
correctly.  The functions implemented include
__builtin_{huge_valq,infq,fabsq,copysignq,nanq,nansq}.
On the LoongArch architecture, __builtin_{fabsq,copysignq} can
be implemented with the instruction "bstrins.d", so that the
generated code is as efficient as possible.
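
To make the bstrins.d remark concrete, here is a small C model of the bit manipulation involved.  It is a sketch under assumptions, not the code the patch emits: "f128_bits", "bstrins_d", "fabsq_bits" and "copysignq_bits" are hypothetical names, and the binary128 value is assumed to be held as two 64-bit halves with the sign in bit 63 of the high half.

```c
#include <stdint.h>

/* Hypothetical stand-in for a binary128 value held in two 64-bit
   registers; the sign bit is assumed to be bit 63 of HI.  */
typedef struct
{
  uint64_t lo;
  uint64_t hi;
} f128_bits;

/* Model of "bstrins.d rd, rj, msb, lsb": replace bits [msb:lsb] of RD
   with the low (msb - lsb + 1) bits of RJ.  */
uint64_t
bstrins_d (uint64_t rd, uint64_t rj, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  uint64_t field
    = width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  return (rd & ~(field << lsb)) | ((rj & field) << lsb);
}

/* fabsq: insert a zero into the sign bit.  */
f128_bits
fabsq_bits (f128_bits x)
{
  x.hi = bstrins_d (x.hi, 0, 63, 63);
  return x;
}

/* copysignq: insert the sign bit of Y into X.  */
f128_bits
copysignq_bits (f128_bits x, f128_bits y)
{
  x.hi = bstrins_d (x.hi, y.hi >> 63, 63, 63);
  return x;
}
```

In this model both operations touch only the high half, so each needs a single bit-field insert (plus one shift for copysignq) and no floating-point arithmetic.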

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc (DEF_LARCH_FTYPE):
(enum loongarch_builtin_type):Increases the type of the function.
(FLOAT_BUILTIN_HIQ):__builtin_{huge_valq,infq}.
(FLOAT_BUILTIN_FCQ):__builtin_{fabsq,copysignq}.
(FLOAT_BUILTIN_NNQ):__builtin_{nanq,nansq}.
(loongarch_init_builtins):
(loongarch_fold_builtin):
(loongarch_expand_builtin):
* config/loongarch/loongarch-protos.h (loongarch_fold_builtin):
(loongarch_c_mode_for_suffix):Add the declaration of the function.
* config/loongarch/loongarch.cc (loongarch_c_mode_for_suffix):Add
the definition of the function.
(TARGET_FOLD_BUILTIN):
(TARGET_C_MODE_FOR_SUFFIX):
* config/loongarch/loongarch.md (infq):Add an instruction template
to the machine description file to generate information such as
the icode used by the function and the constructor.
():
(fabsq):
(copysignq):

libgcc/ChangeLog:

* config/loongarch/t-softfp-tf:
* config/loongarch/tf-signs.c: New file.
---
 gcc/config/loongarch/loongarch-builtins.cc | 168 -
 gcc/config/loongarch/loongarch-protos.h|   2 +
 gcc/config/loongarch/loongarch.cc  |  14 ++
 gcc/config/loongarch/loongarch.md  |  69 +
 libgcc/config/loongarch/t-softfp-tf|   3 +
 libgcc/config/loongarch/tf-signs.c |  59 
 6 files changed, 313 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/loongarch/tf-signs.c

diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index b929f224dfa..2fb0fde0e3f 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -36,6 +36,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "expr.h"
 #include "langhooks.h"
+#include "calls.h"
+#include "explow.h"
 
 /* Macros to create an enumeration identifier for a function prototype.  */
 #define LARCH_FTYPE_NAME1(A, B) LARCH_##A##_FTYPE_##B
@@ -48,9 +50,18 @@ enum loongarch_function_type
 #define DEF_LARCH_FTYPE(NARGS, LIST) LARCH_FTYPE_NAME##NARGS LIST,
 #include "config/loongarch/loongarch-ftypes.def"
 #undef DEF_LARCH_FTYPE
+  LARCH_BUILTIN_HUGE_VALQ,
+  LARCH_BUILTIN_INFQ,
+  LARCH_BUILTIN_FABSQ,
+  LARCH_BUILTIN_COPYSIGNQ,
+  LARCH_BUILTIN_NANQ,
+  LARCH_BUILTIN_NANSQ,
   LARCH_MAX_FTYPE_MAX
 };
 
+/* Count the number of functions with "q" as the suffix.  */
+const int MATHQ_NUMS = (int)LARCH_MAX_FTYPE_MAX - (int)LARCH_BUILTIN_HUGE_VALQ;
+
 /* Specifies how a built-in function should be converted into rtl.  */
 enum loongarch_builtin_type
 {
@@ -63,6 +74,15 @@ enum loongarch_builtin_type
  value and the arguments are mapped to operands 0 and above.  */
   LARCH_BUILTIN_DIRECT_NO_TARGET,
 
+ /* The function corresponds to  __builtin_{huge_valq,infq}.  */
+  LARCH_BUILTIN_HIQ_DIRECT,
+
+ /* The function corresponds to  __builtin_{fabsq,copysignq}.  */
+  LARCH_BUILTIN_FCQ_DIRECT,
+
+  /* Define the type of the __builtin_{nanq,nansq} function.  */
+  LARCH_BUILTIN_NNQ_DIRECT
+
 };
 
 /* Declare an availability predicate for built-in functions that require
@@ -136,6 +156,24 @@ AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI)
   LARCH_BUILTIN (INSN, #INSN, LARCH_BUILTIN_DIRECT_NO_TARGET, \
 FUNCTION_TYPE, AVAIL)
 
+/* Define a float builtin for the functions {huge_valq,infq}.  */
+#define FLOAT_BUILTIN_HIQ(INSN, FUNCTION_TYPE)  \
+{ CODE_FOR_ ## INSN, \
+"__builtin_" #INSN,  LARCH_BUILTIN_HIQ_DIRECT,\
+FUNCTION_TYPE, loongarch_builtin_avail_default }
+
+/* Define a float builtin for the functions {fabsq,copysignq}.  */
+#define FLOAT_BUILTIN_FCQ(INSN, FUNCTION_TYPE)  \
+{ CODE_FOR_ ## INSN, \
+"__builtin_" #INSN,  LARCH_BUILTIN_FCQ_DIRECT,\
+FUNCTION_TYPE, loongarch_builtin_avail_default }
+
+/* Define a float builtin for the functions {nanq,nansq}.  */
+#define FLOAT_BUILTIN_NNQ(INSN, FUNCTION_TYPE)  \
+{ CODE_FOR_ ## INSN,   \
+"__builtin_" #INSN,  LARCH_BUILTIN_NNQ_DIRECT,   \
+FUNCTION_TYPE, loongarch_builtin_avail_default }
+
 static const struct loongarch_builtin_description loongarch_builtins[] = {
 #define LARCH_MOVFCSR2GR 0
   DIRECT_BUILTIN (movfcsr2gr, LARCH_USI_FTYPE_UQI, hard_float),

Re: [RFC] GCC Security policy

2023-08-15 Thread Siddhesh Poyarekar

On 2023-08-15 01:59, Alexander Monakov wrote:


On Mon, 14 Aug 2023, Siddhesh Poyarekar wrote:


There's no practical (programmatic) way to do such validation; it has to be a
manual audit, which is why source code passed to the compiler has to be
*trusted*.


No, I do not think that is a logical conclusion. What is the problem with
passing untrusted code to a sandboxed compiler?


Right, that's what we're essentially trying to convey in the security policy
text.  It doesn't go into mechanisms for securing execution (because that's
really beyond the scope of the *project's* policy IMO) but it states
unambiguously that input to the compiler must be trusted:

"""
   ... It is necessary that
 all source code inputs to the compiler are trusted, since it is
 impossible for the driver to validate input source code beyond
 conformance to a programming language standard...
"""


I see two issues with this. First, it reads as if people wishing to build
not-entirely-trusted sources need to seek some other compiler, as somehow
we seem to imply that sandboxing GCC is out of the question.

Second, I take issue with the last part of the quoted text (language
conformance): verifying standards conformance is also impossible
(consider UB that manifests only during linking or dynamic loading)
so GCC is only doing that on a best-effort basis with no guarantees.


Does this as the first paragraph address your concerns:

The compiler driver processes source code, invokes other programs such 
as the assembler and linker and generates the output result, which may 
be assembly code or machine code.  It is necessary that all source code 
inputs to the compiler are trusted, since it is impossible for the 
driver to validate input source code for safety.  For untrusted code, 
compilation should be done inside a sandboxed environment to 
ensure that it does not compromise the development environment.  Note 
that this still does not guarantee safety of the produced output 
programs and that such programs should still either be analyzed 
thoroughly for safety or run only inside a sandbox or an isolated system 
to avoid compromising the execution environment.


Thanks,
Sid

