Re: [PATCH] Optimize vlddqu + inserti128 to vbroadcasti128

2023-08-01 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 2, 2023 at 3:33 AM liuhongt  wrote:
>
> In [1], I propose a patch to generate vmovdqu for all vlddqu intrinsics
> after AVX2, it's rejected as
> > The instruction is reachable only as __builtin_ia32_lddqu* (aka
> > _mm_lddqu_si*), so it was chosen by the programmer for a reason. I
> > think that in this case, the compiler should not be too smart and
> > change the instruction behind the programmer's back. The caveats are
> > also explained at length in the ISA manual.
>
> So the patch is more conservative and only optimizes vlddqu + vinserti128
> to vbroadcasti128.
> vlddqu + vinserti128 uses a shuffle port in addition to the load port
> compared to vbroadcasti128.  From a latency perspective, vbroadcasti128
> is no worse than vlddqu + vinserti128.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625122.html
>
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (*avx2_lddqu_inserti_to_bcasti): New
> pre_reload define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/vlddqu_vinserti128.c: New test.

OK with a small change below.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md | 18 ++
>  .../gcc.target/i386/vlddqu_vinserti128.c   | 11 +++
>  2 files changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 2d81347c7b6..4bdd2b43ba7 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -26600,6 +26600,24 @@ (define_insn "avx2_vbroadcasti128_<mode>"
> (set_attr "prefix" "vex,evex,evex")
> (set_attr "mode" "OI")])
>
> +;; Optimize vlddqu + vinserti128 to vbroadcasti128; the former uses an
> +;; extra shuffle port in addition to the load port compared to the latter.
> +;; From a latency perspective, vbroadcasti128 is no worse.
> +(define_insn_and_split "avx2_lddqu_inserti_to_bcasti"
> +  [(set (match_operand:V4DI 0 "register_operand" "=x,v,v")
> +   (vec_concat:V4DI
> + (subreg:V2DI
> +   (unspec:V16QI [(match_operand:V16QI 1 "memory_operand")]
> + UNSPEC_LDDQU) 0)
> + (subreg:V2DI (unspec:V16QI [(match_dup 1)]
> + UNSPEC_LDDQU) 0)))]
> +  "TARGET_AVX2 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +   (vec_concat:V4DI (match_dup 1) (match_dup 1)))]
> +  "operands[1] = adjust_address (operands[1], V2DImode, 0);")

No need to validate address before reload, adjust_address_nv can be used.
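
I.e. the split preparation statement would then read (a one-line sketch
of the suggested change):

  "operands[1] = adjust_address_nv (operands[1], V2DImode, 0);")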

> +
>  ;; Modes handled by AVX vec_dup patterns.
>  (define_mode_iterator AVX_VEC_DUP_MODE
>[V8SI V8SF V4DI V4DF])
> diff --git a/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c 
> b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
> new file mode 100644
> index 000..29699a5fa7f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2" } */
> +/* { dg-final { scan-assembler-times "vbroadcasti128" 1 } } */
> +/* { dg-final { scan-assembler-not {(?n)vlddqu.*xmm} } } */
> +
> +#include <immintrin.h>
> +__m256i foo(void *data) {
> +__m128i X1 = _mm_lddqu_si128((__m128i*)data);
> +__m256i V1 = _mm256_broadcastsi128_si256 (X1);
> +return V1;
> +}
> --
> 2.39.1.388.g2fc9e9ca3c
>


[committed] [RISC-V] Avoid sub-word mode comparisons with Zicond

2023-08-01 Thread Jeff Law


c-torture/execute/pr59014-2.c fails with the Zicond work on rv64.  We 
miscompile the "foo" routine because we have eliminated a required sign 
extension.


The key routine looks like this:

foo (long long int x, long long int y)
{
  if (((int) x | (int) y) != 0)
return 6;
  return x + y;
}

So we kind of do the expected thing at expansion time.  We IOR X and Y,
sign extend the result from 32 to 64 bits (note how the values in the
source are cast from long long to int), then emit a suitable
conditional branch, i.e.:



(insn 10 4 12 2 (set (reg:DI 142)
(ior:DI (reg/v:DI 138 [ x ])
(reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
 (nil))
(insn 12 10 13 2 (set (reg:DI 144)
(sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
 (nil))
(jump_insn 13 12 14 2 (set (pc)
(if_then_else (ne (reg:DI 144)
(const_int 0 [0]))
(label_ref:DI 27)
(pc))) "j.c":6:6 243 {*branchdi}
 (expr_list:REG_DEAD (reg:DI 144)
(int_list:REG_BR_PROB 233216732 (nil)))


When we if-convert that we generate this sequence:


(insn 10 4 12 2 (set (reg:DI 142)
(ior:DI (reg/v:DI 138 [ x ])
(reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
 (nil))
(insn 12 10 30 2 (set (reg:DI 144)
(sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 {extendsidi2}
 (nil))
(insn 30 12 31 2 (set (reg:DI 147)
(const_int 6 [0x6])) "j.c":8:12 179 {*movdi_64bit}
 (nil)) 
(insn 31 30 33 2 (set (reg:DI 146)
(plus:DI (reg/v:DI 138 [ x ])
(reg/v:DI 139 [ y ]))) "j.c":8:12 5 {adddi3}
 (nil))
(insn 33 31 34 2 (set (reg:DI 149)
(if_then_else:DI (ne:DI (reg:DI 144)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 146))) "j.c":8:12 11368 {*czero.nez.didi}
 (nil))
(insn 34 33 35 2 (set (reg:DI 148)
(if_then_else:DI (eq:DI (reg:DI 144)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 147))) "j.c":8:12 11367 {*czero.eqz.didi}
 (nil))
(insn 35 34 21 2 (set (reg:DI 137 [  ])
(ior:DI (reg:DI 148)
(reg:DI 149))) "j.c":8:12 99 {iordi3}
 (nil))


Which looks basically OK.  The sign extended subreg is a bit worrisome 
though.  And sure enough when we get into combine:



Failed to match this instruction:
(parallel [
(set (reg:DI 149)
(if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
(const_int 0 [0]))
(reg:DI 146)
(const_int 0 [0])))
(set (reg:DI 144)
(sign_extend:DI (subreg:SI (reg:DI 142) 0)))
])
Successfully matched this instruction:
(set (reg:DI 144)
(sign_extend:DI (subreg:SI (reg:DI 142) 0)))
Successfully matched this instruction:
(set (reg:DI 149)
(if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
(const_int 0 [0]))
(reg:DI 146)
(const_int 0 [0])))
allowing combination of insns 12 and 33


Since we need the side effect we first try the PARALLEL with two sets. 
That, as expected, fails.  Generic combine code then tries to pull apart 
the two sets as distinct insns resulting in this conditional move:



(insn 33 31 34 2 (set (reg:DI 149)
(if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
(const_int 0 [0]))
(reg:DI 146)
(const_int 0 [0]))) "j.c":8:12 11347 {*czero.nez.disi}
 (expr_list:REG_DEAD (reg:DI 146)
(nil)))


Bzzt.  We can't actually implement this RTL in the hardware.  Basically
it's asking to do a 32-bit comparison on rv64, ignoring the upper 32 bits
of the input register.  That's not actually how Zicond works.


The operands to the comparison need to be in DImode for rv64 and SImode
for rv32.  That's the X iterator.  Note the mode of the comparison
operands may be different from the mode of the destination; i.e., we
might have a 64-bit comparison and produce a 32-bit sign-extended result,
much like the setcc insns support.


This patch changes the 6 Zicond patterns to use the X iterator on the
comparison inputs.  That at least makes the patterns correct and fixes
this particular testcase.  There are a few other lurking problems that
I'll address in additional patches.
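
For reference, a sketch of the corrected shape (modeled on the
czero.eqz/czero.nez patterns quoted elsewhere in this digest, not
necessarily the committed form; X is DImode on rv64 and SImode on rv32,
while the destination uses the GPR iterator):

(define_insn "*czero.eqz.<GPR:mode><X:mode>"
  [(set (match_operand:GPR 0 "register_operand" "=r")
        (if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
                              (const_int 0))
                          (const_int 0)
                          (match_operand:GPR 2 "register_operand" "r")))]
  "TARGET_ZICOND"
  "czero.eqz\t%0,%2,%1")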


Committed to the trunk,
Jeff




commit 2d73f2eb80caf328bc4dd1324d475e7bf6b56837
Author: Jeff Law 
Date:   Tue Aug 1 23:12:16 2023 -0600

[committed] [RISC-V] Avoid sub-word mode comparisons with Zicond

c-torture/execute/pr59014-2.c fails with the Zicond work on rv64.  We
miscompile the "foo" routine because we have eliminated a required sign
extension.

The key routine looks like this:

foo (long long int x, long long int y)
{
  if (((int) x | (int) y) != 0)
return 6;
  return x + y;
}

So we kind of do the expected thing.  We IOR X and Y, sign extend the result
from 32 to 64 bits, then emit a suitable conditional 

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-01 Thread Hao Liu OS via Gcc-patches
Hi Richard,

I've updated the patch with a simple test case (see the case and comments
below).  It shows that a live stmt may not have a reduction def, which
introduces the ICE.

Is it OK for trunk?


Fix the assertion failure on an empty reduction def in info_for_reduction.
Even if a stmt is live, it may still have an empty reduction def.  Check the
reduction def instead of the live info before calling info_for_reduction.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (aarch64_force_single_cycle): Check
STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110625_3.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc |  2 +-
 gcc/testsuite/gcc.target/aarch64/pr110625_3.c | 34 +++
 2 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_3.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d76025545..5b8d8fa8e2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
 static bool
 aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
 {
-  if (!STMT_VINFO_LIVE_P (stmt_info))
+  if (!STMT_VINFO_REDUC_DEF (stmt_info))
 return false;

   auto reduc_info = info_for_reduction (vinfo, stmt_info);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_3.c 
b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
new file mode 100644
index 000..35a50290cb0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625_3.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=neoverse-n2" } */
+
+/* Avoid ICE on empty reduction def in single_defuse_cycle.
+
+   E.g.
+  [local count: 858993456]:
+ # sum_18 = PHI 
+ sum.0_5 = (unsigned int) sum_18;
+ _6 = _4 + sum.0_5; <-- it is "live" but doesn't have reduction def
+ sum_15 = (int) _6;
+ ...
+ if (ivtmp_29 != 0)
+   goto ; [75.00%]
+ else
+   goto ; [25.00%]
+
+  [local count: 644245086]:
+ goto ; [100.00%]
+
+  [local count: 214748368]:
+ # _31 = PHI <_6(3)>
+ _8 = _31 >> 1;
+*/
+
+int
+f (unsigned int *tmp)
+{
+  int sum = 0;
+  for (int i = 0; i < 4; i++)
+sum += tmp[i];
+
+  return (unsigned int) sum >> 1;
+}
--
2.34.1


From: Hao Liu OS 
Sent: Tuesday, August 1, 2023 17:43
To: Richard Sandiford
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Hi Richard,

This is a quick fix to the several ICEs.  It seems that even when
STMT_VINFO_LIVE_P is true, some reduction stmts still don't have a
REDUC_DEF.  So I changed the check to STMT_VINFO_REDUC_DEF.

Is it OK for trunk?

---
Fix the ICEs on an empty reduction def.  Even when STMT_VINFO_LIVE_P is
true, some reduction stmts still don't have a definition.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (aarch64_force_single_cycle): Check
STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d76025545..5b8d8fa8e2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
 static bool
 aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
 {
-  if (!STMT_VINFO_LIVE_P (stmt_info))
+  if (!STMT_VINFO_REDUC_DEF (stmt_info))
 return false;

   auto reduc_info = info_for_reduction (vinfo, stmt_info);
--
2.40.0



From: Richard Sandiford 
Sent: Monday, July 31, 2023 17:11
To: Hao Liu OS
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Hao Liu OS  writes:
>> Which test case do you see this for?  The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests.  It is still the existing case causing the 
> previous ICE (i.e. assertion problem): gcc.target/aarch64/sve/cost_model_13.c.
>
> It's not the test case itself failed, but the dump message of vect says the 
> "reduction latency" is 0:
>
> Before the change:
> cost_model_13.c:7:21: note:  Original vector body cost = 6
> cost_model_13.c:7:21: note:  Scalar issue estimate:
> cost_model_13.c:7:21: note:load operations = 1
> cost_model_13.c:7:21: note:store operations = 0
> cost_model_13.c:7:21: note:general operations = 1
> cost_model_13.c:7:21: note:reduction latency = 1
> cost_model_13.c:7:21: note:estimated min cycles per iteration = 1.00
> 

[PATCH v1] RISC-V: Support RVV VFWADD rounding mode intrinsic API

2023-08-01 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to support the rounding mode API for the VFWADD
intrinsics, as in the samples below (a usage sketch follows the list).

* __riscv_vfwadd_vv_f64m2_rm
* __riscv_vfwadd_vv_f64m2_rm_m
* __riscv_vfwadd_vf_f64m2_rm
* __riscv_vfwadd_vf_f64m2_rm_m
* __riscv_vfwadd_wv_f64m2_rm
* __riscv_vfwadd_wv_f64m2_rm_m
* __riscv_vfwadd_wf_f64m2_rm
* __riscv_vfwadd_wf_f64m2_rm_m
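
A hypothetical usage sketch (the prototype shape below is an assumption
based on the intrinsic naming convention; the frm operand is passed
explicitly, and 0 is assumed to encode round-to-nearest-even):

#include <riscv_vector.h>

/* Assumed prototype:
   vfloat64m2_t __riscv_vfwadd_vv_f64m2_rm (vfloat32m1_t, vfloat32m1_t,
                                            unsigned int frm, size_t vl);  */
vfloat64m2_t
widening_add_rne (vfloat32m1_t a, vfloat32m1_t b, size_t vl)
{
  return __riscv_vfwadd_vv_f64m2_rm (a, b, 0 /* assumed RNE encoding */, vl);
}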

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class widen_binop_frm): New class for binop frm.
(BASE): Add vfwadd_frm.
* config/riscv/riscv-vector-builtins-bases.h: New declaration.
* config/riscv/riscv-vector-builtins-functions.def
(vfwadd_frm): New function definition.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): New macro.
(struct alu_frm_def): Leverage new base class.
(struct build_frm_base): New build base for frm.
(struct widen_alu_frm_def): New struct for widen alu frm.
(SHAPE): Add widen_alu_frm shape.
* config/riscv/riscv-vector-builtins-shapes.h: New declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-widening-add.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 37 +++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  4 ++
 .../riscv/riscv-vector-builtins-shapes.cc | 66 +++
 .../riscv/riscv-vector-builtins-shapes.h  |  1 +
 .../riscv/rvv/base/float-point-widening-add.c | 52 +++
 6 files changed, 149 insertions(+), 12 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-add.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 035cafc43b3..981a4a7ede8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -315,6 +315,41 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfwadd
+*/
template <rtx_code CODE>
+class widen_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+   return e.use_exact_insn (
+ code_for_pred_dual_widen (CODE, e.vector_mode ()));
+  case OP_TYPE_vf:
+   return e.use_exact_insn (
+ code_for_pred_dual_widen_scalar (CODE, e.vector_mode ()));
+  case OP_TYPE_wv:
+   if (CODE == PLUS)
+ return e.use_exact_insn (
+   code_for_pred_single_widen_add (e.vector_mode ()));
+   else
+ return e.use_exact_insn (
+   code_for_pred_single_widen_sub (e.vector_mode ()));
+  case OP_TYPE_wf:
+   return e.use_exact_insn (
+ code_for_pred_single_widen_scalar (CODE, e.vector_mode ()));
+  default:
+   gcc_unreachable ();
+  }
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2063,6 +2098,7 @@ static CONSTEXPR const binop_frm<MINUS> vfsub_frm_obj;
 static CONSTEXPR const reverse_binop<MINUS> vfrsub_obj;
 static CONSTEXPR const reverse_binop_frm<MINUS> vfrsub_frm_obj;
 static CONSTEXPR const widen_binop<PLUS> vfwadd_obj;
+static CONSTEXPR const widen_binop_frm<PLUS> vfwadd_frm_obj;
 static CONSTEXPR const widen_binop<MINUS> vfwsub_obj;
 static CONSTEXPR const binop<MULT> vfmul_obj;
 static CONSTEXPR const binop<DIV> vfdiv_obj;
@@ -2292,6 +2328,7 @@ BASE (vfsub_frm)
 BASE (vfrsub)
 BASE (vfrsub_frm)
 BASE (vfwadd)
+BASE (vfwadd_frm)
 BASE (vfwsub)
 BASE (vfmul)
 BASE (vfdiv)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5c6b239c274..f9e1df5fe75 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -148,6 +148,7 @@ extern const function_base *const vfsub_frm;
 extern const function_base *const vfrsub;
 extern const function_base *const vfrsub_frm;
 extern const function_base *const vfwadd;
+extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index fa1c2cef970..743205a9b97 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -304,6 +304,10 @@ DEF_RVV_FUNCTION (vfwadd, widen_alu, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwadd, widen_alu, full_preds, f_wwf_ops)
 DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwf_ops)
+DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvv_ops)
+DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvf_ops)
+DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, 

RE: [PATCH v3] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2023-08-01 Thread Wang, Yanzhang via Gcc-patches
Hi Jeff,

Do you have any further comments about this patch?

Thanks,
Yanzhang

> -Original Message-
> From: Jeff Law 
> Sent: Friday, July 21, 2023 12:11 PM
> To: Kito Cheng ; Wang, Yanzhang
> 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com;
> Li, Pan2 
> Subject: Re: [PATCH v3] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
> 
> 
> 
> On 7/20/23 21:49, Kito Cheng wrote:
> > LGTM, I think long jump is another issue and making ra become a fixed
> > register will escalate to an ABI issue, so that should not be a
> > blocker for this patch.
> I'll take a look tomorrow, but I'm supportive of what Yanzhang is trying to
> do in principle.  I've got a few hot items to deal with tonight though.
> 
> WRT making $ra fixed.  In practice fixing a register just takes it out of
> the pool of things available to the allocator.  Furthermore $ra is always
> considered clobbered at call sites.  So while one could view it as an ABI
> change, it's not one that's actually observable in practice.
> I suspect that's one of the reasons why $ra is used by the assembler in
> this manner -- it minimizes both the ABI and performance impacts.
> 
> jeff



Re: Re: [PATCH V2] [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-01 Thread Xiao Zeng
On Tue, Aug 01, 2023 at 02:06:00 PM Jeff Law  wrote:
>
>
>
>On 7/29/23 03:13, Xiao Zeng wrote:
>> This patch recognizes Zicond patterns when the select pattern's
>> condition is eq or ne to 0 (using eq as an example), namely:
>>
>> 1 rd = (rs2 == 0) ? non-imm : 0
>> 2 rd = (rs2 == 0) ? non-imm : non-imm
>> 3 rd = (rs2 == 0) ? reg : non-imm
>> 4 rd = (rs2 == 0) ? reg : reg
>>
>> gcc/ChangeLog:
>>
>>  * config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize
>>  Zicond patterns.
>>  * config/riscv/riscv.md: Recognize Zicond patterns through movcc.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: New 
>>test.
>>  * gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: New 
>>test.
>>  * gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: New 
>>test.
>>  * gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: New 
>>test.
>> ---
>>   gcc/config/riscv/riscv.cc | 144 ++
>>   gcc/config/riscv/riscv.md |   4 +-
>>   .../zicond-primitiveSemantics_return_0_imm.c  |  65 
>>   ...zicond-primitiveSemantics_return_imm_imm.c |  73 +
>>   ...zicond-primitiveSemantics_return_imm_reg.c |  65 
>>   ...zicond-primitiveSemantics_return_reg_reg.c |  65 
>>   6 files changed, 414 insertions(+), 2 deletions(-)
>>   create mode 100644 
>>gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c
>>   create mode 100644 
>>gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c
>>   create mode 100644 
>>gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c
>>   create mode 100644 
>>gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 941ea25e1f2..6ac39f63dd7 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -3516,6 +3516,150 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
>> cons, rtx alt)
>>     cond, cons, alt)));
>> return true;
>>   }
>> +  else if (TARGET_ZICOND
>> +   && (code == EQ || code == NE)
>> +   && GET_MODE_CLASS (mode) == MODE_INT)
>> +    {
>> +  need_eq_ne_p = true;
>> +  /* 0 + imm  */
>> +  if (GET_CODE (cons) == CONST_INT && cons == const0_rtx
>> +  && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
>A couple nits.  Rather than test the GET_CODE (object) == CONST_INT,
>instead use CONST_INT_P (object). 
fixed

>
>Rather than using const0_rtx, use CONST0_RTX (mode).  That makes it more
>general. 
fixed

>
>
>
>> +    {
>> +  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
>Might as well use "true" rather than "need_eq_ne_p" here and for the
>other calls in your new code.
> 
fixed

>> +  /* imm + imm  */
>> +  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
>> +   && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
>So same comments on using CONST_INT_P and CONST0_RTX 
fixed

>> +    {
>> +  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
>> +  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
>> +  rtx reg = gen_reg_rtx (mode);
>> +  rtx temp = GEN_INT (INTVAL (alt) - INTVAL (cons));
>> +  emit_insn (gen_rtx_SET (reg, temp));
>Use force_reg here rather than directly emitting the insn to initialize
>"reg".  What you're doing works when the difference is small but will
>not work when the difference does not fit into a signed 12bit value. 
fixed

>
>> +  /* imm + reg  */
>> +  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
>> +   && GET_CODE (alt) == REG)
>Same comments about CONST_INT_P and CONST0_RTX.  And instead of using
>GET_CODE (object) == REG, use REG_P (object).
>
>
>> +    {
>> +  /* Optimize for register value of 0.  */
>> +  if (op0 == alt && op1 == const0_rtx)
>> +    {
>> +  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
>> +  cons = force_reg (mode, cons);
>> +  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode, 
>> cond,
>> +  cons, 
>> alt)));
>> +  return true;
>> +    }
>Isn't this only valid for NE?
Here is what I didn't express clearly, please see the following patterns in 
zicond.md:

(define_insn "*czero.eqz..opt2"
  [(set (match_operand:GPR 0 "register_operand"                   "=r")
        (if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
                              (const_int 0))
                          (match_operand:GPR 2 "register_operand" "r")
                          (match_operand:GPR 3 "register_operand" "1")))]
  "TARGET_ZICOND && rtx_equal_p (operands[1],  operands[3])"
  "czero.nez\t%0,%2,%1"
)

[PATCH v3] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-01 Thread Xiao Zeng
This patch recognizes Zicond patterns when the select pattern's
condition is eq or ne to 0 (using eq as an example; a C sketch follows
the list), namely:

1 rd = (rs2 == 0) ? non-imm : 0
2 rd = (rs2 == 0) ? non-imm : non-imm
3 rd = (rs2 == 0) ? reg : non-imm
4 rd = (rs2 == 0) ? reg : reg
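
A C-level sketch of case 4 (rd = (rs2 == 0) ? reg : reg); the expected
expansion is a czero.nez/czero.eqz pair combined with an or:

long
sel (long c, long a, long b)
{
  /* czero.nez t0,a,c   -> t0 = (c != 0) ? 0 : a
     czero.eqz t1,b,c   -> t1 = (c == 0) ? 0 : b
     or        rd,t0,t1 */
  return c == 0 ? a : b;
}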

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize
Zicond patterns.
* config/riscv/riscv.md: Recognize Zicond patterns through movcc.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: New test.
---
 gcc/config/riscv/riscv.cc | 137 ++
 gcc/config/riscv/riscv.md |   4 +-
 .../zicond-primitiveSemantics_return_0_imm.c  |  65 +
 ...zicond-primitiveSemantics_return_imm_imm.c |  73 ++
 ...zicond-primitiveSemantics_return_imm_reg.c |  65 +
 ...zicond-primitiveSemantics_return_reg_reg.c |  65 +
 6 files changed, 407 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b6a57d0306d..6353d08ba9d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3557,6 +3557,143 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
  cond, cons, alt)));
   return true;
 }
+  else if (TARGET_ZICOND
+   && (code == EQ || code == NE)
+   && GET_MODE_CLASS (mode) == MODE_INT)
+{
+  /* 0 + imm  */
+  if (CONST_INT_P (cons) && cons == CONST0_RTX (GET_MODE (cons))
+  && CONST_INT_P (alt) && alt != CONST0_RTX (GET_MODE (alt)))
+{
+  riscv_emit_int_compare (&code, &op0, &op1, true);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  alt = force_reg (mode, alt);
+  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode, cond,
+  cons, alt)));
+  return true;
+}
+  /* imm + imm  */
+  else if (CONST_INT_P (cons) && cons != CONST0_RTX (GET_MODE (cons))
+   && CONST_INT_P (alt) && alt != CONST0_RTX (GET_MODE (alt)))
+{
+  riscv_emit_int_compare (&code, &op0, &op1, true);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  alt = force_reg (mode, GEN_INT (INTVAL (alt) - INTVAL (cons)));
+  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode, cond,
+  CONST0_RTX (mode),
+  alt)));
+  riscv_emit_binary (PLUS, dest, dest, cons);
+  return true;
+}
+  /* imm + reg  */
+  else if (CONST_INT_P (cons) && cons != CONST0_RTX (GET_MODE (cons))
+   && REG_P (alt))
+{
+  if (op0 == alt && op1 == CONST0_RTX (GET_MODE (op1)))
+{
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  cons = force_reg (mode, cons);
+  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode, cond,
+  cons, alt)));
+  return true;
+}
+  /* Handle the special situation of: -2048 == INTVAL (cons)
+ to avoid failure due to an unrecognized insn. Let the costing
+ model determine if the conditional move sequence is better
+ than the branching sequence.  */
+  if (-2048 == INTVAL (cons))
+{
+  rtx reg = gen_reg_rtx (mode);
+  emit_insn (gen_rtx_SET (reg, cons));
+  return riscv_expand_conditional_move (dest, op, reg, alt);
+}
+  riscv_emit_int_compare (&code, &op0, &op1, true);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  rtx temp = GEN_INT (-1 * INTVAL (cons));
+  riscv_emit_binary (PLUS, alt, alt, temp);
+  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode, cond,
+  CONST0_RTX (mode),
+  alt)));
+  riscv_emit_binary (PLUS, dest, dest, cons);
+  return true;
+}
+  /* imm + 0  */
+  else if (CONST_INT_P (cons) && cons != 

[PATCH] Optimize vlddqu + inserti128 to vbroadcasti128

2023-08-01 Thread liuhongt via Gcc-patches
In [1], I propose a patch to generate vmovdqu for all vlddqu intrinsics
after AVX2, it's rejected as
> The instruction is reachable only as __builtin_ia32_lddqu* (aka
> _mm_lddqu_si*), so it was chosen by the programmer for a reason. I
> think that in this case, the compiler should not be too smart and
> change the instruction behind the programmer's back. The caveats are
> also explained at length in the ISA manual.

So the patch is more conservative and only optimizes vlddqu + vinserti128
to vbroadcasti128.
vlddqu + vinserti128 uses a shuffle port in addition to the load port
compared to vbroadcasti128.  From a latency perspective, vbroadcasti128
is no worse than vlddqu + vinserti128.
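
In instruction terms (an illustrative AT&T-syntax sketch, not compiler
output):

    # before: load + cross-lane shuffle      # after: broadcast load only
    vlddqu      (%rdi), %xmm0                vbroadcasti128 (%rdi), %ymm0
    vinserti128 $1, %xmm0, %ymm0, %ymm0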

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625122.html

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

* config/i386/sse.md (*avx2_lddqu_inserti_to_bcasti): New
pre_reload define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vlddqu_vinserti128.c: New test.
---
 gcc/config/i386/sse.md | 18 ++
 .../gcc.target/i386/vlddqu_vinserti128.c   | 11 +++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2d81347c7b6..4bdd2b43ba7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -26600,6 +26600,24 @@ (define_insn "avx2_vbroadcasti128_<mode>"
(set_attr "prefix" "vex,evex,evex")
(set_attr "mode" "OI")])
 
+;; Optimize vlddqu + vinserti128 to vbroadcasti128; the former uses an
+;; extra shuffle port in addition to the load port compared to the latter.
+;; From a latency perspective, vbroadcasti128 is no worse.
+(define_insn_and_split "avx2_lddqu_inserti_to_bcasti"
+  [(set (match_operand:V4DI 0 "register_operand" "=x,v,v")
+   (vec_concat:V4DI
+ (subreg:V2DI
+   (unspec:V16QI [(match_operand:V16QI 1 "memory_operand")]
+ UNSPEC_LDDQU) 0)
+ (subreg:V2DI (unspec:V16QI [(match_dup 1)]
+ UNSPEC_LDDQU) 0)))]
+  "TARGET_AVX2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (vec_concat:V4DI (match_dup 1) (match_dup 1)))]
+  "operands[1] = adjust_address (operands[1], V2DImode, 0);")
+
 ;; Modes handled by AVX vec_dup patterns.
 (define_mode_iterator AVX_VEC_DUP_MODE
   [V8SI V8SF V4DI V4DF])
diff --git a/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c 
b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
new file mode 100644
index 000..29699a5fa7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vlddqu_vinserti128.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times "vbroadcasti128" 1 } } */
+/* { dg-final { scan-assembler-not {(?n)vlddqu.*xmm} } } */
+
+#include <immintrin.h>
+__m256i foo(void *data) {
+__m128i X1 = _mm_lddqu_si128((__m128i*)data);
+__m256i V1 = _mm256_broadcastsi128_si256 (X1);
+return V1;
+}
-- 
2.39.1.388.g2fc9e9ca3c



[PATCH] Support vec_fmaddsub/vec_fmsubadd for vector HFmode.

2023-08-01 Thread liuhongt via Gcc-patches
AVX512FP16 supports vfmaddsubXXXph and vfmsubaddXXXph.
Also remove the scalar modes from the fmaddsub/fmsubadd patterns since
there's no scalar instruction for them.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

PR target/81904
* config/i386/sse.md (vec_fmaddsub<mode>4): Extend to vector
HFmode, use mode iterator VFH instead.
(vec_fmsubadd<mode>4): Ditto.
(<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>):
Remove scalar mode from iterator, use VFH_AVX512VL instead.
(<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr81904.c: New test.
---
 gcc/config/i386/sse.md  | 44 -
 gcc/testsuite/gcc.target/i386/pr81904.c | 22 +
 2 files changed, 44 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr81904.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 51961bbfc0b..4e75c9addaa 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5803,21 +5803,21 @@ (define_insn "<avx512>_fnmsub_<mode>_mask3<round_name>"
 ;; But this doesn't seem useful in practice.
 
 (define_expand "vec_fmaddsub4"
-  [(set (match_operand:VF 0 "register_operand")
-   (unspec:VF
- [(match_operand:VF 1 "nonimmediate_operand")
-  (match_operand:VF 2 "nonimmediate_operand")
-  (match_operand:VF 3 "nonimmediate_operand")]
+  [(set (match_operand:VFH 0 "register_operand")
+   (unspec:VFH
+ [(match_operand:VFH 1 "nonimmediate_operand")
+  (match_operand:VFH 2 "nonimmediate_operand")
+  (match_operand:VFH 3 "nonimmediate_operand")]
  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || ( == 64 || TARGET_AVX512VL)")
 
 (define_expand "vec_fmsubadd4"
-  [(set (match_operand:VF 0 "register_operand")
-   (unspec:VF
- [(match_operand:VF 1 "nonimmediate_operand")
-  (match_operand:VF 2 "nonimmediate_operand")
-  (neg:VF
-(match_operand:VF 3 "nonimmediate_operand"))]
+  [(set (match_operand:VFH 0 "register_operand")
+   (unspec:VFH
+ [(match_operand:VFH 1 "nonimmediate_operand")
+  (match_operand:VFH 2 "nonimmediate_operand")
+  (neg:VFH
+(match_operand:VFH 3 "nonimmediate_operand"))]
  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || ( == 64 || TARGET_AVX512VL)")
 
@@ -5877,11 +5877,11 @@ (define_insn "*fma_fmaddsub_<mode>"
(set_attr "mode" "<MODE>")])
 
 (define_insn "fma_fmaddsub_"
-  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
-   (unspec:VFH_SF_AVX512VL
- [(match_operand:VFH_SF_AVX512VL 1 "" "%0,0,v")
-  (match_operand:VFH_SF_AVX512VL 2 "" 
",v,")
-  (match_operand:VFH_SF_AVX512VL 3 "" 
"v,,0")]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v,v")
+   (unspec:VFH_AVX512VL
+ [(match_operand:VFH_AVX512VL 1 "" "%0,0,v")
+  (match_operand:VFH_AVX512VL 2 "" 
",v,")
+  (match_operand:VFH_AVX512VL 3 "" 
"v,,0")]
  UNSPEC_FMADDSUB))]
   "TARGET_AVX512F &&  && 
"
   "@
@@ -5943,12 +5943,12 @@ (define_insn "*fma_fmsubadd_"
(set_attr "mode" "")])
 
 (define_insn "fma_fmsubadd_"
-  [(set (match_operand:VFH_SF_AVX512VL 0 "register_operand" "=v,v,v")
-   (unspec:VFH_SF_AVX512VL
- [(match_operand:VFH_SF_AVX512VL   1 "" "%0,0,v")
-  (match_operand:VFH_SF_AVX512VL   2 "" 
",v,")
-  (neg:VFH_SF_AVX512VL
-(match_operand:VFH_SF_AVX512VL 3 "" 
"v,,0"))]
+  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v,v")
+   (unspec:VFH_AVX512VL
+ [(match_operand:VFH_AVX512VL   1 "" "%0,0,v")
+  (match_operand:VFH_AVX512VL   2 "" 
",v,")
+  (neg:VFH_AVX512VL
+(match_operand:VFH_AVX512VL 3 "" 
"v,,0"))]
  UNSPEC_FMADDSUB))]
   "TARGET_AVX512F &&  && 
"
   "@
diff --git a/gcc/testsuite/gcc.target/i386/pr81904.c 
b/gcc/testsuite/gcc.target/i386/pr81904.c
new file mode 100644
index 000..9f5ad0bd952
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr81904.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2 -mprefer-vector-width=512" } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ 
\t\]+\[^\n\]*%zmm\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ 
\t\]+\[^\n\]*%zmm\[0-9\]" 1 } } */
+
+void vec_fmaddsub_fp16(int n, _Float16 da_r, _Float16 *x, _Float16* y, 
_Float16* __restrict z)
+{
+  for (int i = 0; i < 32; i += 2)
+{
+  z[i] =  da_r * x[i] - y[i];
+  z[i+1]  =  da_r * x[i+1] + y[i+1];
+}
+}
+
+void vec_fmasubadd_fp16(int n, _Float16 da_r, _Float16 *x, _Float16* y, 
_Float16* __restrict z)
+{
+  for (int i = 0; i < 32; i += 2)
+{
+  z[i] =  da_r * x[i] + y[i];
+  z[i+1]  =  da_r * x[i+1] - y[i+1];
+}
+}
-- 
2.39.1.388.g2fc9e9ca3c



Re: One question on the source code of tree-object-size.cc

2023-08-01 Thread Siddhesh Poyarekar

On 2023-08-01 18:57, Kees Cook wrote:


   return p;
}

/* in the following function, malloc allocated less space than size of the
struct fix.  Then what's the correct behavior we expect
the __builtin_object_size should have for the following?
  */

static struct fix * noinline alloc_buf_less ()
{
   struct fix *p;
   p = malloc(sizeof (struct fix) - SIZE_BUMP * sizeof (int));

   /*when checking the observed access p->array, we have info on both
 observered allocation and observed access,
 A. from observed allocation (alloc_size): (LENGTH - SIZE_BUMP) * sizeof 
(int)
 B. from observed access (TYPE): LENGTH * sizeof (int)
*/

   /* for MAXIMUM size in the whole object: currently, GCC always used the A.  */

   expect(__builtin_object_size(p->array, 0), (LENGTH - SIZE_BUMP) * 
sizeof(int));


ok:  __builtin_object_size(p->array, 0) == 20

My brain just melted a little, as this is now an under-sized instance of
"p", so we have an incomplete allocation. (I would expect -Warray-bounds
to yell very loudly for this.) But, technically, yes, this looks like
the right calculation.


AFAIK, -Warray-bounds will only yell in case of a dereference that the 
compiler may potentially see as being beyond that 20 byte bound; it 
won't actually see the undersized allocation.  An analyzer warning would 
be useful for just the undersized allocation regardless of whether the 
code actually ends up accessing the object beyond the allocation bounds.
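
A minimal sketch of the distinction (hypothetical fragment, reusing the
struct fix, LENGTH and SIZE_BUMP definitions from the quoted program):

  /* The undersized allocation itself produces no warning.  */
  struct fix *p = malloc (sizeof (struct fix) - SIZE_BUMP * sizeof (int));
  /* Only a dereference the compiler can see crossing the 20 bytes of
     array actually allocated may trigger -Warray-bounds or a related
     diagnostic.  */
  p->array[LENGTH - 1] = 0;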


Thanks,
Sid


Re: One question on the source code of tree-object-size.cc

2023-08-01 Thread Siddhesh Poyarekar

On 2023-08-01 17:35, Qing Zhao wrote:

typedef struct
{
   int a;
} A;
size_t f()
{
   A *p = malloc (1);
   return __builtin_object_size (p, 0);


Correction, that should be __builtin_object_size (p->a, 0).


Actually, it should be __builtin_object_size(p->a, 1).
For __builtin_object_size(p->a,0), gcc always uses the allocation size for the 
whole object.


Right, sorry, I mistyped, twice in fact; it should have been 
__bos (&p->a, 1) :)




GCC’s current behavior is:

For the size of the whole object, GCC currently always uses the allocation size.
And for the size in the sub-object, GCC chose the smaller one among the 
allocation size and the TYPE_SIZE.

Is this correct behavior?


Yes, it's deliberate; it specifically checks on var != pt_var, which can 
only be true for subobjects.


Thanks,
Sid


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Jeff Law via Gcc-patches




On 8/1/23 17:38, Vineet Gupta wrote:


Also note that getting FP out of the shift-add sequences is the other 
key goal of Jivan's work.  FP elimination always results in a 
spill/reload if we have a shift-add insn where one operand is FP. 


Hmm, are you saying it should NOT be generating shift-add with SP as
src, because currently that's exactly what fold FP offset *is* doing and
is the reason it has 5 fewer insns.
We should not have shift-add with FP as a source prior to register 
allocation because it will almost always generate spill code.



jeff


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta




On 8/1/23 16:27, Jeff Law wrote:



On 8/1/23 17:13, Vineet Gupta wrote:


On 8/1/23 16:06, Philipp Tomsich wrote:

Very helpful! Looks as if regprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.


FWIW, all 5 of them involve an SH2ADD that has SP as a source in the
fold-FP case, which f-m-o seems to be generating a MV for.
To clarify f-m-o isn't generating the mv.  It's simplifying a sequence 
by pushing the constant in an addi instruction into the memory 
reference.  As a result the addi simplifies into a sp->reg copy that 
is supposed to then be propagated away.


Yep, that's clear.



Also note that getting FP out of the shift-add sequences is the other 
key goal of Jivan's work.  FP elimination always results in a 
spill/reload if we have a shift-add insn where one operand is FP. 


Hmm, are you saying it should NOT be generating shift-add with SP as
src, because currently that's exactly what fold FP offset *is* doing and
is the reason it has 5 fewer insns.


-Vineet




Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta




On 8/1/23 16:22, Jeff Law wrote:
They must not be working as expected or folks are using old trees. 
Manolis's work for regcprop has been on the trunk for about 5-6 weeks 
at this point:


I have bleeding edge trunk from 2-3 days back. I think we are looking 
for the following which the tree does have.


2023-05-25 6a2e8dcbbd4b cprop_hardreg: Enable propagation of the stack 
pointer if possible


-Vineet


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Jeff Law via Gcc-patches




On 8/1/23 17:13, Vineet Gupta wrote:


On 8/1/23 16:06, Philipp Tomsich wrote:

Very helpful! Looks as if regprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.


FWIW, all 5 of them involve an SH2ADD that has SP as a source in the
fold-FP case, which f-m-o seems to be generating a MV for.
To clarify f-m-o isn't generating the mv.  It's simplifying a sequence 
by pushing the constant in an addi instruction into the memory 
reference.  As a result the addi simplifies into a sp->reg copy that is 
supposed to then be propagated away.
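
A sketch of that transform (illustrative offsets, not from the leela
binary):

   # before f-m-o                   # after f-m-o
   addi  a4,sp,16                   mv    a4,sp     # copy, should be propagated
   lw    a5,0(a4)                   lw    a5,16(a4)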


Also note that getting FP out of the shift-add sequences is the other 
key goal of Jivan's work.  FP elimination always results in a 
spill/reload if we have a shift-add insn where one operand is FP.




jeff


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Jeff Law via Gcc-patches




On 8/1/23 17:06, Philipp Tomsich wrote:

Very helpful! Looks as if regprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.
They must not be working as expected or folks are using old trees. 
Manolis's work for regcprop has been on the trunk for about 5-6 weeks at
this point:




commit 893883f2f8f56984209c6ed210ee992ff71a14b0
Author: Manolis Tsamis 
Date:   Tue Jun 20 16:23:52 2023 +0200

cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

Fixes: 6a2e8dcbbd4bab3

Propagation for the stack pointer in regcprop was enabled in

6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places

where maybe_mode_change is called. This also adds an check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR debug/110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.

(maybe_copy_reg_attrs): New function.
(find_oldest_value_reg): Use maybe_copy_reg_attrs.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis 

Signed-off-by: Philipp Tomsich 

commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
Author: Manolis Tsamis 
Date:   Thu May 25 13:44:41 2023 +0200

cprop_hardreg: Enable propagation of the stack pointer if possible

Propagation of the stack pointer in cprop_hardreg is currenty

forbidden in all cases, due to maybe_mode_change returning NULL.
Relax this restriction and allow propagation when no mode change is
requested.

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Enable stack pointer

propagation.


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Jeff Law via Gcc-patches




On 8/1/23 17:03, Vineet Gupta wrote:



On 8/1/23 15:07, Philipp Tomsich wrote:

+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:



On 8/1/23 13:14, Vineet Gupta wrote:


I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
avoid the Thunderbird mangling the test formatting)

Thanks.  Of particular importance is the leela change.  My recollection
was that the f-m-o work also picked up that case.  But if my memory is
faulty (always a possibility), then that shows a clear case where
Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela-case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accommodate this... looks like we need to
revisit against trunk.

Philipp.


And on the other direction we can see that deepsjeng isn't helped by
Jivan's work, but is helped by Manolis's new pass.

I'd always hoped/expected we'd have cases where one patch clearly helped
over the other.  While the .25% to .37% improvements for the three most
impacted benchmarks doesn't move the needle much across the whole suite
they do add up over time.

Jeff


I took a quick look at Leela, the significant difference is from 
additional insns with SP not getting propagated.


e.g.

    231b6:    mv    a4,sp
    231b8:    sh2add    a5,a5,a4

vs.

    1e824:    sh2add    a5,a5,sp

There are 5 such instances which more or less make up for the delta.
ACK.  Jivan and I have seen similar things with some other work in this 
space.  What's weird is the bits of Manolis's work that went in a month 
or so ago are supposed to address this exact issue in the post-reload 
const/copy propagation pass.


Jeff



Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta



On 8/1/23 16:06, Philipp Tomsich wrote:

Very helpful! Looks as if regprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.


FWIW, all 5 of them involve an SH2ADD that has SP as a source in the
fold-FP case, which f-m-o seems to be generating a MV for.




On Wed, 2 Aug 2023 at 01:03, Vineet Gupta  wrote:



On 8/1/23 15:07, Philipp Tomsich wrote:

+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:


On 8/1/23 13:14, Vineet Gupta wrote:


I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
avoid the Thunderbird mangling the test formatting)

Thanks.  Of particular importance is the leela change.  My recollection
was that the f-m-o work also picked up that case.  But if my memory is
faulty (always a possibility), then that shows a clear case where
Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela-case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accommodate this... looks like we need to
revisit against trunk.

Philipp.


And on the other direction we can see that deepsjeng isn't helped by
Jivan's work, but is helped by Manolis's new pass.

I'd always hoped/expected we'd have cases where one patch clearly helped
over the other.  While the .25% to .37% improvements for the three most
impacted benchmarks doesn't move the needle much across the whole suite
they do add up over time.

Jeff

I took a quick look at Leela, the significant difference is from
additional insns with SP not getting propagated.

e.g.

  231b6:   mv      a4,sp
  231b8:   sh2add  a5,a5,a4

vs.

  1e824:   sh2add  a5,a5,sp

There are 5 such instances which more or less make up for the delta.

-Vineet





Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Philipp Tomsich
Very helpful! Looks as if regprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.

On Wed, 2 Aug 2023 at 01:03, Vineet Gupta  wrote:
>
>
>
> On 8/1/23 15:07, Philipp Tomsich wrote:
> > +Manolis Tsamis
> >
> > On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >> On 8/1/23 13:14, Vineet Gupta wrote:
> >>
> >>> I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
> >>> avoid the Thunderbird mangling the test formatting)
> >> Thanks.  Of particular importance is the leela change.  My recollection
> >> was that the f-m-o work also picked up that case.  But if my memory is
> >> faulty (always a possibility), then that shows a clear case where
> >> Jivan's work picks up a case not handled by Manolis's work.
> > f-m-o originally targeted (and benefited) the leela-case.  I wonder if
> > other optimizations/changes over the last year interfere with this and
> > what needs to be changed to accommodate this... looks like we need to
> > revisit against trunk.
> >
> > Philipp.
> >
> >> And on the other direction we can see that deepsjeng isn't helped by
> >> Jivan's work, but is helped by Manolis's new pass.
> >>
> >> I'd always hoped/expected we'd have cases where one patch clearly helped
> >> over the other.  While the .25% to .37% improvements for the three most
> >> impacted benchmarks doesn't move the needle much across the whole suite
> >> they do add up over time.
> >>
> >> Jeff
>
> I took a quick look at Leela, the significant difference is from
> additional insns with SP not getting propagated.
>
> e.g.
>
> 231b6:   mv      a4,sp
> 231b8:   sh2add  a5,a5,a4
>
> vs.
>
> 1e824:   sh2add  a5,a5,sp
>
> There are 5 such instances which more or less make up for the delta.
>
> -Vineet
>


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta




On 8/1/23 15:07, Philipp Tomsich wrote:

+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:



On 8/1/23 13:14, Vineet Gupta wrote:


I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
avoid the Thunderbird mangling the test formatting)

Thanks.  Of particular importance is the leela change.  My recollection
was that the f-m-o work also picked up that case.  But if my memory is
faulty (always a possibility), then that shows a clear case where
Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela-case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accommodate this... looks like we need to
revisit against trunk.

Philipp.


And on the other direction we can see that deepsjeng isn't helped by
Jivan's work, but is helped by Manolis's new pass.

I'd always hoped/expected we'd have cases where one patch clearly helped
over the other.  While the .25% to .37% improvements for the three most
impacted benchmarks doesn't move the needle much across the whole suite
they do add up over time.

Jeff


I took a quick look at Leela, the significant difference is from 
additional insns with SP not getting propagated.


e.g.

   231b6:    mv    a4,sp
   231b8:    sh2add    a5,a5,a4

vs.

   1e824:    sh2add    a5,a5,sp

There are 5 such instances which more or less make up for the delta.

-Vineet



Re: One question on the source code of tree-object-size.cc

2023-08-01 Thread Kees Cook via Gcc-patches
On Tue, Aug 01, 2023 at 09:35:30PM +, Qing Zhao wrote:
> 
> 
> > On Jul 31, 2023, at 1:07 PM, Siddhesh Poyarekar  wrote:
> > 
> > On 2023-07-31 13:03, Siddhesh Poyarekar wrote:
> >> On 2023-07-31 12:47, Qing Zhao wrote:
> >>> Hi, Sid and Jakub,
> >>> 
> >>> I have a question in the following source portion of the routine 
> >>> “addr_object_size” of gcc/tree-object-size.cc:
> >>> 
> >>>   743   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
> >>>   744   if (bytes != error_mark_node)
> >>>   745 {
> >>>   746   bytes = size_for_offset (var_size, bytes);
> >>>   747   if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == 
> >>> MEM_REF)
> >>>   748 {
> >>>   749   tree bytes2 = compute_object_offset (TREE_OPERAND 
> >>> (ptr, 0),
> >>>   750pt_var);
> >>>   751   if (bytes2 != error_mark_node)
> >>>   752 {
> >>>   753   bytes2 = size_for_offset (pt_var_size, bytes2);
> >>>   754   bytes = size_binop (MIN_EXPR, bytes, bytes2);
> >>>   755 }
> >>>   756 }
> >>>   757 }
> >>> 
> >>> At line 754, why do we always use “MIN_EXPR” whether it’s for OST_MINIMUM
> >>> or not?
> >>> Shall we use
> >>> 
> >>> (object_size_type & OST_MINIMUM
> >>>  ? MIN_EXPR : MAX_EXPR)
> >>> 
> >> That MIN_EXPR is not for OST_MINIMUM.  It is to cater for allocations like 
> >> this:
> >> typedef struct
> >> {
> >>   int a;
> >> } A;
> >> size_t f()
> >> {
> >>   A *p = malloc (1);
> >>   return __builtin_object_size (p, 0);
> > 
> > Correction, that should be __builtin_object_size (p->a, 0).
> 
> Actually, it should be __builtin_object_size(p->a, 1).
> For __builtin_object_size(p->a,0), gcc always uses the allocation size for 
> the whole object.
> 
> GCC’s current behavior is:
> 
> For the size of the whole object, GCC currently always uses the allocation 
> size. 
> And for the size in the sub-object, GCC chose the smaller one among the 
> allocation size and the TYPE_SIZE. 
> 
> Is this correct behavior?
> 
> thanks.
> 
> Qing
> 
> Please see the following small example to show the above behavior:
> 
> =
> 
> #include <stdlib.h>
> #include <stdio.h>
> 
> #define LENGTH 10
> #define SIZE_BUMP 5 
> #define noinline __attribute__((__noinline__))
> 
> struct fix {
>   size_t foo;
>   int array[LENGTH]; 
> };
> 
> #define expect(p, _v) do { \
> size_t v = _v; \
> if (p == v) \
> __builtin_printf ("ok:  %s == %zd\n", #p, p); \
> else \
> {  \
>   __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
> } \
> } while (0);
> 
> 
> /* in the following function, malloc allocated more space than size of the 
>struct fix.  Then what's the correct behavior we expect
>the __builtin_object_size should have for the following?
>  */
> 
> static struct fix * noinline alloc_buf_more ()
> {
>   struct fix *p;
>   p = malloc(sizeof (struct fix) + SIZE_BUMP * sizeof (int)); 
> 
>   /*when checking the observed access p->array, we have info on both
> observered allocation and observed access, 
> A. from observed allocation (alloc_size): (LENGTH + SIZE_BUMP) * sizeof 
> (int)
> B. from observed access (TYPE): LENGTH * sizeof (int)
>*/
>
>   /* for MAXIMUM size in the whole object: currently, GCC always used the A.  
> */
>   expect(__builtin_object_size(p->array, 0), (LENGTH + SIZE_BUMP) * 
> sizeof(int));

ok:  __builtin_object_size(p->array, 0) == 60

This is what I'd expect, yes: all memory from "array" to end of
allocation, and that matches here: (LENGTH + SIZE_BUMP) * sizeof(int)

> 
>   /* for MAXIMUM size in the sub-object: currently, GCC chose the smaller
>  one among these two: B.  */
>   expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int));

ok:  __builtin_object_size(p->array, 1) == 40

Also as I'd expect: just LENGTH * sizeof(int), the remaining bytes
starting at "array", based on type info, regardless of rest of allocation.

> 
>   return p;
> }
> 
> /* in the following function, malloc allocated less space than size of the 
>struct fix.  Then what's the correct behavior we expect
>the __builtin_object_size should have for the following?
>  */
> 
> static struct fix * noinline alloc_buf_less ()
> {
>   struct fix *p;
>   p = malloc(sizeof (struct fix) - SIZE_BUMP * sizeof (int)); 
> 
>   /*when checking the observed access p->array, we have info on both
> observered allocation and observed access, 
> A. from observed allocation (alloc_size): (LENGTH - SIZE_BUMP) * sizeof 
> (int)
> B. from observed access (TYPE): LENGTH * sizeof (int)
>*/
>
>   /* for MAXIMUM size in the whole object: currently, GCC always used the A.  
> */
>   expect(__builtin_object_size(p->array, 0), (LENGTH - SIZE_BUMP) * 
> sizeof(int));

ok:  __builtin_object_size(p->array, 0) == 20

My brain just melted a 

Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-08-01 Thread Kees Cook via Gcc-patches
On Mon, Jul 31, 2023 at 08:14:42PM +, Qing Zhao wrote:
> /* In general, Due to type casting, the type for the pointee of a pointer
>does not say anything about the object it points to,
>So, __builtin_object_size can not directly use the type of the pointee
>to decide the size of the object the pointer points to.
> 
>there are only two reliable ways:
>A. observed allocations  (call to the allocation functions in the routine)
>B. observed accesses (read or write access to the location the
>  pointer points to)
> 
>that provide information about the type/existence of an object at
>the corresponding address.
> 
>for A, we use the "alloc_size" attribute for the corresponding allocation
>functions to determine the object size;
> 
>For B, we use the SIZE info of the TYPE attached to the corresponding 
> access.
>(We treat counted_by attribute as a complement to the SIZE info of the TYPE
> for FAM)
> 
>The only other way in C which ensures that a pointer actually points
>to an object of the correct type is 'static':
> 
>void foo(struct P *p[static 1]);   
> 
>See https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624814.html
>for more details.  */

This is a great explanation; thank you!

In the future I might want to have a new builtin that will allow
a program to query a pointer when neither A nor B have happened. But
for the first version of the __counted_by infrastructure, the above
limitations seen fine.

For example, maybe __builtin_counted_size(p) (which returns sizeof(*p) +
sizeof(*p->flex_array_member) * p->counted_by_member). Though since
there might be multiple flex array members, maybe this can't work. :)
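
A hypothetical sketch of that computation (__builtin_counted_size is a
proposed name, not an existing builtin; the attribute spelling follows
the patch series under discussion):

  struct s {
    size_t count;
    int flex[] __attribute__((counted_by (count)));  /* proposed attribute */
  };

  /* Proposed semantics, for struct s *p:
     __builtin_counted_size (p) == sizeof (*p)
                                   + sizeof (p->flex[0]) * p->count  */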

-Kees

-- 
Kees Cook


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Philipp Tomsich
+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/1/23 13:14, Vineet Gupta wrote:
>
> >
> > I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
> > avoid the Thunderbird mangling the test formatting)
> Thanks.  Of particular importance is the leela change.  My recollection
> was that the f-m-o work also picked up that case.  But if my memory is
> faulty (always a possibility), then that shows a clear case where
> Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela-case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accommodate this... looks like we need to
revisit against trunk.

Philipp.

> And on the other direction we can see that deepsjeng isn't helped by
> Jivan's work, but is helped by Manolis's new pass.
>
> I'd always hoped/expected we'd have cases where one patch clearly helped
> over the other.  While the .25% to .37% improvements for the three most
> impacted benchmarks doesn't move the needle much across the whole suite
> they do add up over time.
>
> Jeff


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Jeff Law via Gcc-patches




On 8/1/23 13:14, Vineet Gupta wrote:



I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to 
avoid the Thunderbird mangling the test formatting)
Thanks.  Of particular importance is the leela change.  My recollection 
was that the f-m-o work also picked up that case.  But if my memory is 
faulty (always a possibility), then that shows a clear case where 
Jivan's work picks up a case not handled by Manolis's work.


And on the other direction we can see that deepsjeng isn't helped by 
Jivan's work, but is helped by Manolis's new pass.


I'd always hoped/expected we'd have cases where one patch clearly helped 
over the other.  While the .25% to .37% improvements for the three most 
impacted benchmarks doesn't move the needle much across the whole suite 
they do add up over time.


Jeff


Re: [PATCH] match.pd: Canonicalize (signed x << c) >> c [PR101955]

2023-08-01 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 01, 2023 at 03:20:33PM -0400, Drew Ross via Gcc-patches wrote:
> Canonicalizes (signed x << c) >> c into the lowest
> precision(type) - c bits of x IF those bits have a mode precision or a
> precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
> ((unsigned)-1 >> c) to prevent duplicate pattern. Tested successfully on
> x86_64 and x86 targets.
> 
>   PR middle-end/101955
> 
> gcc/ChangeLog:
> 
>   * match.pd ((signed x << c) >> c): New canonicalization.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd| 20 +++
>  gcc/testsuite/gcc.dg/pr101955.c | 63 +
>  2 files changed, 77 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..62f7c84f565 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   - TYPE_PRECISION (TREE_TYPE (@2)
>(bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1
>  
> -/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
> -   types.  */
> +/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
> +   unsigned x OR truncate into the precision(type) - c lowest bits
> +   of signed x (if they have mode precision or a precision of 1)  */

There should be . between ) and "  */" above.

>  (simplify
> - (rshift (lshift @0 INTEGER_CST@1) @1)
> - (if (TYPE_UNSIGNED (type)
> -  && (wi::ltu_p (wi::to_wide (@1), element_precision (type
> -  (bit_and @0 (rshift { build_minus_one_cst (type); } @1
> + (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> + (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
> +  (if (TYPE_UNSIGNED (type))
> +   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))

This needs to be (convert @0) instead of @0, because now that there is
the nop_convert? in between, @0 could have different type than type.
I certainly see regressions on
gcc.c-torture/compile/950612-1.c
on i686-linux because of this:
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: error: 
type mismatch in binary expression
long long unsigned int

long long int

long long unsigned int

_346 = _3 & 4294967295;
during GIMPLE pass: forwprop
/home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/950612-1.c:17:1: 
internal compiler error: verify_gimple failed
0x9018a4e verify_gimple_in_cfg(function*, bool, bool)
../../gcc/tree-cfg.cc:5646
0x8e81eb5 execute_function_todo
../../gcc/passes.cc:2088
0x8e8234c do_per_function
../../gcc/passes.cc:1687
0x8e82431 execute_todo
../../gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

> +   (if (INTEGRAL_TYPE_P (type))
> +(with {
> +  int width = element_precision (type) - tree_to_uhwi (@1);
> +  tree stype = build_nonstandard_integer_type (width, 0);
> + }
> + (if (width  == 1 || type_has_mode_precision_p (stype))
> +  (convert (convert:stype @0

just one space before == instead of two
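
Putting those two fixes together, the simplification would read roughly as
follows (a sketch only, not the committed pattern):

 (simplify
  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
  (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
   (if (TYPE_UNSIGNED (type))
    (bit_and (convert @0) (rshift { build_minus_one_cst (type); } @1))
    (if (INTEGRAL_TYPE_P (type))
     (with {
        int width = element_precision (type) - tree_to_uhwi (@1);
        tree stype = build_nonstandard_integer_type (width, 0);
      }
      (if (width == 1 || type_has_mode_precision_p (stype))
       (convert (convert:stype @0))))))))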

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,63 @@
> +/* { dg-do compile } */

The above line should be
/* { dg-do compile { target int32 } } */
because the test relies on 32-bit int, some targets have just
16-bit int.
Of course, unless you want to make the testcase more portable, by
using say
#define CHAR_BITS __CHAR_BIT__
#define INT_BITS (__SIZEOF_INT__ * __CHAR_BIT__)
#define LLONG_BITS (__SIZEOF_LONGLONG__ * __CHAR_BIT__)
and replacing all the 31, 24, 56 etc. constants with (INT_BITS - 1),
(INT_BITS - CHAR_BITS), (LLONG_BITS - CHAR_BITS) etc.
Though, it would still fail on some AVR configurations which have
(invalid for C) just 8-bit int, and the question is what to do with
that 16, because (INT_BITS - 2 * CHAR_BITS) is 0 on 16-bit ints, so
it would need to be (INT_BITS / 2) instead.  C requires that
long long is at least 64-bit, so that is less problematic (no known
target to have > 64-bit long long, though theoretically possible).
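
A minimal sketch of that portable style for one of the functions (using the
macro names above; not the final testcase):

#define CHAR_BITS __CHAR_BIT__
#define INT_BITS (__SIZEOF_INT__ * __CHAR_BIT__)

__attribute__((noipa)) int
t4 (int x)
{
  /* Was (x << 24) >> 24 under the 32-bit-int assumption.  */
  return (x << (INT_BITS - CHAR_BITS)) >> (INT_BITS - CHAR_BITS);
}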

> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +

Jakub



Re: One question on the source code of tree-object-size.cc

2023-08-01 Thread Qing Zhao via Gcc-patches


> On Jul 31, 2023, at 1:07 PM, Siddhesh Poyarekar  wrote:
> 
> On 2023-07-31 13:03, Siddhesh Poyarekar wrote:
>> On 2023-07-31 12:47, Qing Zhao wrote:
>>> Hi, Sid and Jakub,
>>> 
>>> I have a question in the following source portion of the routine 
>>> “addr_object_size” of gcc/tree-object-size.cc:
>>> 
>>>   743   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
>>>   744   if (bytes != error_mark_node)
>>>   745 {
>>>   746   bytes = size_for_offset (var_size, bytes);
>>>   747   if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == 
>>> MEM_REF)
>>>   748 {
>>>   749   tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 
>>> 0),
>>>   750pt_var);
>>>   751   if (bytes2 != error_mark_node)
>>>   752 {
>>>   753   bytes2 = size_for_offset (pt_var_size, bytes2);
>>>   754   bytes = size_binop (MIN_EXPR, bytes, bytes2);
>>>   755 }
>>>   756 }
>>>   757 }
>>> 
>>> At line 754, why do we always use “MIN_EXPR” whether it’s for OST_MINIMUM or 
>>> not?
>>> Shall we use
>>> 
>>> (object_size_type & OST_MINIMUM
>>>  ? MIN_EXPR : MAX_EXPR)
>>> 
>> That MIN_EXPR is not for OST_MINIMUM.  It is to cater for allocations like 
>> this:
>> typedef struct
>> {
>>   int a;
>> } A;
>> size_t f()
>> {
>>   A *p = malloc (1);
>>   return __builtin_object_size (p, 0);
> 
> Correction, that should be __builtin_object_size (p->a, 0).

Actually, it should be __builtin_object_size(p->a, 1).
For __builtin_object_size(p->a,0), gcc always uses the allocation size for the 
whole object.

GCC’s current behavior is:

For the size of the whole object, GCC currently always uses the allocation 
size. 
And for the size in the sub-object, GCC chose the smaller one among the 
allocation size and the TYPE_SIZE. 

Is this correct behavior?

thanks.

Qing

Please see the following small example to show the above behavior:

=

#include 
#include 

#define LENGTH 10
#define SIZE_BUMP 5 
#define noinline __attribute__((__noinline__))

struct fix {
  size_t foo;
  int array[LENGTH]; 
};

#define expect(p, _v) do { \
size_t v = _v; \
if (p == v) \
__builtin_printf ("ok:  %s == %zd\n", #p, p); \
else \
{  \
  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
} \
} while (0);


/* in the following function, malloc allocated more space than the size of
   the struct fix.  Then what behavior do we expect
   __builtin_object_size to have for the following?
 */

static struct fix * noinline alloc_buf_more ()
{
  struct fix *p;
  p = malloc(sizeof (struct fix) + SIZE_BUMP * sizeof (int)); 

  /*when checking the observed access p->array, we have info on both
observed allocation and observed access, 
A. from observed allocation (alloc_size): (LENGTH + SIZE_BUMP) * sizeof 
(int)
B. from observed access (TYPE): LENGTH * sizeof (int)
   */
   
  /* for MAXIMUM size in the whole object: currently, GCC always used the A.  */
  expect(__builtin_object_size(p->array, 0), (LENGTH + SIZE_BUMP) * 
sizeof(int));

  /* for MAXIMUM size in the sub-object: currently, GCC chose the smaller
 one among these two: B.  */
  expect(__builtin_object_size(p->array, 1), LENGTH * sizeof(int));

  return p;
}

/* in the following function, malloc allocated less space than the size of
   the struct fix.  Then what behavior do we expect
   __builtin_object_size to have for the following?
 */

static struct fix * noinline alloc_buf_less ()
{
  struct fix *p;
  p = malloc(sizeof (struct fix) - SIZE_BUMP * sizeof (int)); 

  /*when checking the observed access p->array, we have info on both
observed allocation and observed access, 
A. from observed allocation (alloc_size): (LENGTH - SIZE_BUMP) * sizeof 
(int)
B. from observed access (TYPE): LENGTH * sizeof (int)
   */
   
  /* for MAXIMUM size in the whole object: currently, GCC always used the A.  */
  expect(__builtin_object_size(p->array, 0), (LENGTH - SIZE_BUMP) * 
sizeof(int));

  /* for MAXIMUM size in the sub-object: currently, GCC chose the smaller
 one among these two: B.  */
  expect(__builtin_object_size(p->array, 1), (LENGTH - SIZE_BUMP) * 
sizeof(int));

  return p;
}


int main ()
{
  struct fix *p, *q; 
  p = alloc_buf_more ();
  q = alloc_buf_less ();

  return 0;
}


When compile the above small testing case with upstream gcc with 
-fstrict-flex-array=1:

/home/opc/Install/latest/bin/gcc -O -fstrict-flex-arrays=1 t28.c
ok:  __builtin_object_size(p->array, 0) == 60
ok:  __builtin_object_size(p->array, 1) == 40
ok:  __builtin_object_size(p->array, 0) == 20
ok:  __builtin_object_size(p->array, 1) == 20


> 
>> }
>> where the returned size should be 1 and not sizeof (int).  The mode doesn't 
>> really matter in this case.
>> HTH.
>> Sid



Re: [PATCH 0/5] GCC _BitInt support [PR102989]

2023-08-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 28, 2023 at 06:03:33PM +0000, Joseph Myers wrote:
> You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N 
> etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied 
> by a 34-digit integer significand), so that only one multiplication is 
> needed to get the power of 10 and then a second multiplication by the 
> significand.  (Or split into three parts at the cost of an extra 
> multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000
> as a multiplication within 128 bits and so only need to compute 10^k for k 
> a multiple of 5, or any number of variations on those themes.)

So, I've done some quick counting, if we want at most one multiplication
to get 10^X for X in 0..6111 (plus another to multiply mantissa by that),
having one table with 10^1..10^(N-1) and another with 10^YN for Y 1..6111/N,
I get for 64-bit limbs
S1 - size of 10^1..10^(N-1) table in bytes
S2 - size of 10^YN table
N    S1     S2      S
20   152    388792  388944
32   344    241848  242192
64   1104   121560  122664
128  3896   60144   64040
255  14472  29320   43792
256  14584  29440   44024
266  15704  28032   43736
384  32072  19192   51264
512  56384  14080   70464
where 266 seems to be the minimum, though the difference from 256 is minimal
and having N a power of 2 seems cheaper.  Though, the above is just counting
the bytes of the 64-bit limb arrays concatenated together, I think it will
be helpful to have also an unsigned short table with the indexes into the
limb array (so another 256*2 + 24*2 bytes).
For something not in libgcc_s.so but in libgcc.a I guess 43.5KiB of .rodata
might be acceptable to make it fast.
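
For illustration, the index arithmetic for the two-table variant with
N = 256 (table and helper names here are assumptions, not libgcc code):

#define POW10_N 256

/* 10^x for x in 0..6111 then needs a single multiplication:
   10^x = pow10_small[x % POW10_N] * pow10_large[x / POW10_N],
   where pow10_small holds 10^0..10^255 and pow10_large holds
   10^(256*y) for y in 0..23.  */
static inline void
pow10_indices (unsigned x, unsigned *small_idx, unsigned *large_idx)
{
  *small_idx = x % POW10_N;
  *large_idx = x / POW10_N;
}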

Jakub



Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Jivan Hakobyan via Gcc-patches
Thank you for your effort.
I had evaluated only in intrate tests.
I am glad to see the same result on Leela.

On Tue, Aug 1, 2023 at 11:14 PM Vineet Gupta  wrote:

>
>
> On 7/25/23 20:31, Jeff Law via Gcc-patches wrote:
> >
> >
> > On 7/25/23 05:24, Jivan Hakobyan wrote:
> >> Hi.
> >>
> >> I re-run the benchmarks and hopefully got the same profit.
> >> I also compared the leela's code and figured out the reason.
> >>
> >> Actually, my and Manolis's patches do the same thing. The difference
> >> is only execution order.
> > But shouldn't your patch also allow for at least the potential
> > to pull the fp+offset computation out of a loop?  I'm pretty sure
> > Manolis's patch can't do that.
> >
> >> Because f-m-o is run after register allocation, it cannot
> >> eliminate the redundant move of 'sp' to another register.
> > Actually that's supposed to be handled by a different patch that
> > should already be upstream.  Specifically;
> >
> >> commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
> >> Author: Manolis Tsamis 
> >> Date:   Thu May 25 13:44:41 2023 +0200
> >>
> >> cprop_hardreg: Enable propagation of the stack pointer if possible
> >> Propagation of the stack pointer in cprop_hardreg is currenty
> >> forbidden in all cases, due to maybe_mode_change returning NULL.
> >> Relax this restriction and allow propagation when no mode change is
> >> requested.
> >> gcc/ChangeLog:
> >> * regcprop.cc (maybe_mode_change): Enable stack pointer
> >> propagation.
> > I think there were a couple-follow-ups.  But that's the key change
> > that should allow propagation of copies from the stack pointer and
> > thus eliminate the mov gpr,sp instructions.  If that's not happening,
> > then it's worth investigating why.
> >
> >>
> >> Besides that, I have checked the build failure on x264_r. It is
> >> already fixed on the third version.
> > Yea, this was a problem with re-recognition.  I think it was fixed by:
> >
> >> commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
> >> Author: Vineet Gupta 
> >> Date:   Thu Jul 20 11:15:37 2023 -0700
> >>
> >> RISC-V: optim const DF +0.0 store to mem [PR/110748]
> >> Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
> >> DF +0.0 is bitwise all zeros so int x0 store to mem can be
> >> used to optimize it.
> > [ ... ]
> >
> >
> > So I think the big question WRT your patch is does it still help the
> > case where we weren't pulling the fp+offset computation out of a loop.
>
> I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
> avoid the Thunderbird mangling the test formatting)
>


-- 
With the best regards
Jivan Hakobyan


[PATCH v4] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-08-01 Thread Fangrui Song via Gcc-patches
When using -mcmodel=medium, large data objects larger than the
-mlarge-data-threshold threshold are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections and drops an
unneeded documentation restriction that the value must be the same.
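
As a sketch of the user-visible effect (object names invented here):

/* Compiled with -mcmodel=large -mlarge-data-threshold=65536:  */
static char big_buf[1 << 20];   /* above the threshold: placed in .lbss */
static char small_buf[64];      /* below the threshold: stays in .bss   */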

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song 

---
Changes from v1 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
* Clarify commit message. Add link to 
https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU

Changes from v2
* Drop an unneeded limitation in the documentation.

Changes from v3
* Change scan-assembler directives to use \. to match literal .
---
 gcc/config/i386/i386.cc| 15 +--
 gcc/config/i386/i386.opt   |  2 +-
 gcc/doc/invoke.texi|  6 +++---
 gcc/testsuite/gcc.target/i386/large-data.c | 13 +
 4 files changed, 26 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index eabc70011ea..37e810cc741 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -647,7 +647,8 @@ ix86_can_inline_p (tree caller, tree callee)
 static bool
 ix86_in_large_data_p (tree exp)
 {
-  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
+  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
+  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
 return false;
 
   if (exp == NULL_TREE)
@@ -858,8 +859,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
const char *name, unsigned HOST_WIDE_INT size,
unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+ size > (unsigned int)ix86_section_threshold)
 {
   switch_to_section (get_named_section (decl, ".lbss", 0));
   fputs (LARGECOMM_SECTION_ASM_OP, file);
@@ -879,9 +881,10 @@ void
 x86_output_aligned_bss (FILE *file, tree decl, const char *name,
unsigned HOST_WIDE_INT size, unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
-switch_to_section (get_named_section (decl, ".lbss", 0));
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+  size > (unsigned int)ix86_section_threshold)
+switch_to_section(get_named_section(decl, ".lbss", 0));
   else
 switch_to_section (bss_section);
   ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1cc8563477a..52fad492353 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -282,7 +282,7 @@ Branches are this expensive (arbitrary units).
 
 mlarge-data-threshold=
 Target RejectNegative Joined UInteger Var(ix86_section_threshold) 
Init(DEFAULT_LARGE_SECTION_THRESHOLD)
--mlarge-data-threshold=Data greater than given threshold will 
go into .ldata section in x86-64 medium model.
+-mlarge-data-threshold=Data greater than given threshold will 
go into a large data section in x86-64 medium and large code models.
 
 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 104766f446d..bf6fe3e1a20 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33207,9 +33207,9 @@ the cache line size.  @samp{compat} is the default.
 
 @opindex mlarge-data-threshold
 @item -mlarge-data-threshold=@var{threshold}
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section.  This value must be the
-same across all objects linked into the binary, and defaults to 65535.
+When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
+objects larger than @var{threshold} are placed in large data sections. The
+default is 65535.
 
 @opindex mrtd
 @item -mrtd
diff --git a/gcc/testsuite/gcc.target/i386/large-data.c 
b/gcc/testsuite/gcc.target/i386/large-data.c
new file mode 100644
index 

Re: arm: Remove unsigned variant of vcaddq_m

2023-08-01 Thread Christophe Lyon via Gcc-patches
Hi Stam,


On Tue, 1 Aug 2023 at 19:22, Stamatis Markianos-Wright via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi all,
>
> The unsigned variants of the vcaddq_m operation are not needed within the
> compiler, as the assembly output of the signed and unsigned versions of the
> ops is identical: with a `.i` suffix (as opposed to separate `.s` and `.u`
> suffixes).
>
> Tested with baremetal arm-none-eabi on Arm's fastmodels.
>
> Ok for trunk?
>

LGTM, with the very minor nit that you forgot to mention the typo fix in
mve.md in the ChangeLog part ;-)

 I think similar changes can be performed for all the other builtins that
use .i for both signed and unsigned versions, but we can do that later.

Thanks,

Christophe


> Thanks,
> Stamatis Markianos-Wright
>
> gcc/ChangeLog:
>
>  * config/arm/arm-mve-builtins-base.cc (vcaddq_rot90, vcaddq_rot270):
>Use common insn for signed and unsigned front-end definitions.
>  * config/arm/arm_mve_builtins.def
>(vcaddq_rot90_m_u, vcaddq_rot270_m_u): Make common.
>(vcaddq_rot90_m_s, vcaddq_rot270_m_s): Remove.
>  * config/arm/iterators.md (mve_insn): Merge signed and unsigned defs.
>(isu): Likewise.
>(rot): Likewise.
>(mve_rot): Likewise.
>(supf): Likewise.
>(VxCADDQ_M): Likewise.
>  * config/arm/unspecs.md (unspec): Likewise.
> ---
>   gcc/config/arm/arm-mve-builtins-base.cc |  4 ++--
>   gcc/config/arm/arm_mve_builtins.def |  6 ++---
>   gcc/config/arm/iterators.md | 30 +++--
>   gcc/config/arm/mve.md   |  4 ++--
>   gcc/config/arm/unspecs.md   |  6 ++---
>   5 files changed, 21 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-base.cc
> b/gcc/config/arm/arm-mve-builtins-base.cc
> index e31095ae112..426a87e9852 100644
> --- a/gcc/config/arm/arm-mve-builtins-base.cc
> +++ b/gcc/config/arm/arm-mve-builtins-base.cc
> @@ -260,8 +260,8 @@ FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
>   FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
>   FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
>   FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
> -FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot,
> (UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S,
> VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F))
> -FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot,
> (UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S,
> VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F))
> +FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot,
> (UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M,
> VCADDQ_ROT90_M, VCADDQ_ROT90_M_F))
> +FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot,
> (UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M,
> VCADDQ_ROT270_M, VCADDQ_ROT270_M_F))
>   FUNCTION (vcmlaq, unspec_mve_function_exact_insn_rot, (-1, -1,
> UNSPEC_VCMLA, -1, -1, VCMLAQ_M_F))
>   FUNCTION (vcmlaq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1,
> UNSPEC_VCMLA90, -1, -1, VCMLAQ_ROT90_M_F))
>   FUNCTION (vcmlaq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1,
> UNSPEC_VCMLA180, -1, -1, VCMLAQ_ROT180_M_F))
> diff --git a/gcc/config/arm/arm_mve_builtins.def
> b/gcc/config/arm/arm_mve_builtins.def
> index 43dacc3dda1..6ac1812c697 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -523,8 +523,8 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED,
> vhsubq_m_n_u, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_u, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_n_u, v16qi, v8hi,
> v4si)
>   VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, veorq_m_u, v16qi, v8hi, v4si)
> -VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_u, v16qi,
> v8hi, v4si)
> -VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_u, v16qi,
> v8hi, v4si)
> +VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_, v16qi,
> v8hi, v4si)
> +VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_, v16qi,
> v8hi, v4si)
>   VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vbicq_m_u, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vandq_m_u, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vaddq_m_u, v16qi, v8hi, v4si)
> @@ -587,8 +587,6 @@ VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED,
> vhcaddq_rot270_m_s, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_s, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_n_s, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, veorq_m_s, v16qi, v8hi, v4si)
> -VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot90_m_s, v16qi, v8hi,
> v4si)
> -VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot270_m_s, v16qi, v8hi,
> v4si)
>   VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_s, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbicq_m_s, v16qi, v8hi, v4si)
>   VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_s, v16qi, 

[PATCH] match.pd: Canonicalize (signed x << c) >> c [PR101955]

2023-08-01 Thread Drew Ross via Gcc-patches
Canonicalizes (signed x << c) >> c into the lowest
precision(type) - c bits of x IF those bits have a mode precision or a
precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
((unsigned)-1 >> c) to prevent duplicate pattern. Tested successfully on
x86_64 and x86 targets.

  PR middle-end/101955

gcc/ChangeLog:

  * match.pd ((signed x << c) >> c): New canonicalization.

gcc/testsuite/ChangeLog:

  * gcc.dg/pr101955.c: New test.
---
 gcc/match.pd| 20 +++
 gcc/testsuite/gcc.dg/pr101955.c | 63 +
 2 files changed, 77 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..62f7c84f565 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
- TYPE_PRECISION (TREE_TYPE (@2)
   (bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1
 
-/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
-   types.  */
+/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
+   unsigned x OR truncate into the precision(type) - c lowest bits
+   of signed x (if they have mode precision or a precision of 1)  */
 (simplify
- (rshift (lshift @0 INTEGER_CST@1) @1)
- (if (TYPE_UNSIGNED (type)
-  && (wi::ltu_p (wi::to_wide (@1), element_precision (type
-  (bit_and @0 (rshift { build_minus_one_cst (type); } @1
+ (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
+ (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
+  (if (TYPE_UNSIGNED (type))
+   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
+   (if (INTEGRAL_TYPE_P (type))
+(with {
+  int width = element_precision (type) - tree_to_uhwi (@1);
+  tree stype = build_nonstandard_integer_type (width, 0);
+ }
+ (if (width  == 1 || type_has_mode_precision_p (stype))
+  (convert (convert:stype @0
 
 /* Optimize x >> x into 0 */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 000..8619661b291
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t2 (unsigned int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t4 (int x)
+{
+  return (x << 24) >> 24;
+}
+
+__attribute__((noipa)) int
+t5 (int x)
+{
+  return (x << 16) >> 16;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t7 (long long x)
+{
+  return (x << 56) >> 56;
+}
+
+__attribute__((noipa)) long long
+t8 (long long x)
+{
+  return (x << 48) >> 48;
+}
+
+__attribute__((noipa)) long long
+t9 (long long x)
+{
+  return (x << 32) >> 32;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " << " "optimized" } } */
-- 
2.39.3



PING ^2: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-08-01 Thread Pat Haugen via Gcc-patches

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:

Updated from prior version to address latest review comment (simplify
umod<mode>3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.
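
For reference, the replacement sequence is just the standard identity
(sketch; the compiler performs this at the RTL level, not via source):

long
smod (long n, long d)
{
  /* n % d computed as div + mul + sub instead of a modulo insn.  */
  return n - (n / d) * d;
}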

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
     (define_expand umod<mode>3): New.
     (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.






ICE for interim fix for PR/110748

2023-08-01 Thread Vineet Gupta

Hi Jeff,

As discussed this morning, I'm sending over dumps for the optim of DF 
const -0.0 (PR/110748)  [1]
For rv64gc_zbs build, IRA is undoing the split which eventually leads to 
ICE in final pass.


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) {  *d = -0.0;   }


*split1*

(insn 10 3 11 2 (set (reg:DI 136)
    (const_int [0x8000000000000000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190 
{*movdf_hardfloat_rv64}

 (expr_list:REG_DEAD (reg:DI 135)


For the working case, the large const is not involved and not subject to 
IRA playing foul.


Attached are split1 and IRA dumps for OK (rv64gc) and NOK (rv64gc_zbs) 
cases.


Thx,
-Vineet
;; Function znd (znd, funcdef_no=0, decl_uid=2278, cgraph_uid=1, symbol_order=0)

Starting decreasing number of live ranges...
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
rescanning insn with uid = 11.
deleting insn with uid = 10.
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (1)
Reg 135 uninteresting
;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called

Pass 0 for finding pseudo/allocno costs

a0 (r135,l0) best GR_REGS, allocno GR_REGS

  a0(r135,l0) costs: SIBCALL_REGS:2000,2000 JALR_REGS:2000,2000 
GR_REGS:2000,2000 MEM:1,1


Pass 1 for finding pseudo/allocno costs

r135: preferred GR_REGS, alternative NO_REGS, allocno GR_REGS

  a0(r135,l0) costs: GR_REGS:2000,2000 MEM:1,1

   Insn 11(l0): point = 0
   Insn 9(l0): point = 2
 a0(r135): [1..2]
Compressing live ranges: from 5 to 2 - 40%
Ranges after the compression:
 a0(r135): [0..1]
+++Allocating 0 bytes for conflict table (uncompressed size 8)
;; a0(r135,l0) conflicts:
;; total conflict hard regs:
;; conflict hard regs:


  pref0:a0(r135)<-hr10@2000
  regions=1, blocks=3, points=2
allocnos=1 (big 0), copies=0, conflicts=0, ranges=1

 Allocnos coloring:


  Loop 0 (parent -1, header bb2, depth 0)
bbs: 2
all: 0r135
modified regnos: 135
border:
Pressure: GR_REGS=2
Hard reg set forest:
  0:( 1 5-63)@0
1:( 5-31)@24000
  Allocno a0r135 of GR_REGS(28) has 27 avail. regs  5-31, node:  5-31 
(confl regs =  0-4 32-127)
  Forming thread from colorable bucket:
  Pushing a0(r135,l0)(cost 0)
  Popping a0(r135,l0)  -- assign reg 10
Disposition:
0:r135 l010
New iteration of spill/restore move
+++Costs: overall -2000, reg -2000, mem 0, ld 0, st 0, move 0
+++   move loops 0, new jumps 0


znd

Dataflow summary:
;;  fully invalidated by EH  0 [zero] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 
[a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 
[t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 
[ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 
60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 
[N/A] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 
79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 
[N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 
96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 
105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 
113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 
121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31]
;;  hardware regs used   2 [sp] 64 [arg] 65 [frame]
;;  regular block artificial uses2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  eh block artificial uses 2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  entry block defs 1 [ra] 2 [sp] 8 [s0] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 
14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 
[fa5] 48 [fa6] 49 [fa7] 64 [arg] 65 [frame]
;;  exit block uses  1 [ra] 2 [sp] 8 [s0] 65 [frame]
;;  regs ever live   10 [a0]
;;  ref usage   r1={1d,1u} r2={1d,2u} r8={1d,2u} r10={1d,1u} r11={1d} r12={1d} 
r13={1d} r14={1d} r15={1d} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d} 
r45={1d} r46={1d} r47={1d} r48={1d} r49={1d} r64={1d,1u} r65={1d,2u} 
r135={1d,1u} 
;;total ref usage 32{22d,10u,0e} in 2{2 regular + 0 call} insns.
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 3 2 

Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Vineet Gupta



On 7/25/23 20:31, Jeff Law via Gcc-patches wrote:



On 7/25/23 05:24, Jivan Hakobyan wrote:

Hi.

I re-run the benchmarks and hopefully got the same profit.
I also compared the leela's code and figured out the reason.

Actually, my and Manolis's patches do the same thing. The difference 
is only execution order.
But shouldn't your patch also allow for at least the potential 
to pull the fp+offset computation out of a loop?  I'm pretty sure 
Manolis's patch can't do that.


Because f-m-o is run after register allocation, it cannot 
eliminate the redundant move of 'sp' to another register.
Actually that's supposed to be handled by a different patch that 
should already be upstream.  Specifically;



commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
Author: Manolis Tsamis 
Date:   Thu May 25 13:44:41 2023 +0200

    cprop_hardreg: Enable propagation of the stack pointer if possible
        Propagation of the stack pointer in cprop_hardreg is currenty
    forbidden in all cases, due to maybe_mode_change returning NULL.
    Relax this restriction and allow propagation when no mode change is
    requested.
        gcc/ChangeLog:
        * regcprop.cc (maybe_mode_change): Enable stack pointer
    propagation.
I think there were a couple-follow-ups.  But that's the key change 
that should allow propagation of copies from the stack pointer and 
thus eliminate the mov gpr,sp instructions.  If that's not happening, 
then it's worth investigating why.




Besides that, I have checked the build failure on x264_r. It is 
already fixed on the third version.

Yea, this was a problem with re-recognition.  I think it was fixed by:


commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
Author: Vineet Gupta 
Date:   Thu Jul 20 11:15:37 2023 -0700

    RISC-V: optim const DF +0.0 store to mem [PR/110748]
        Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
        DF +0.0 is bitwise all zeros so int x0 store to mem can be 
used to optimize it.

[ ... ]


So I think the big question WRT your patch is does it still help the 
case where we weren't pulling the fp+offset computation out of a loop.


I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to 
avoid the Thunderbird mangling the test formatting)
benchmark        wkld#  upstream         upstream +       %       upstream +       %
                        (g54e54f77c1)    f-m-o                    fold-fp-off
500.perlbench_r  0      1217932817476    1217884553366    0.004%  1217928953834    0.000%
                 1       743757241201     743655528133    0.014%   743695820426    0.008%
                 2       703455646090     703423559298    0.005%   703455296251    0.000%
502.gcc_r        0       195004369104     194973478945    0.016%   194984188400    0.010%
                 1       232719938778     232688491113    0.014%   232692379085    0.012%
                 2       223443280459     223413616368    0.013%   223424151848    0.009%
                 3       186233704624     186206516421    0.015%   186231137616    0.001%
                 4       287406394232     287378870279    0.010%   287403707466    0.001%
503.bwaves_r     0       316194043679     316194043679    0.000%   316194043662    0.000%
                 1       499293490380     499293490380    0.000%   499293490363    0.000%
                 2       389365401615     389365401615    0.000%   389365401598    0.000%
                 3       473514310679     473514310679    0.000%   473514310662    0.000%
505.mcf_r        0       689258694902     689254740344    0.001%   689258694887    0.000%
507.cactuBSSN_r  0      3966612364613    3966498234698    0.003%  3966612365068    0.000%
508.namd_r       0      1903766272166    1903766271701    0.000%  1903765987301    0.000%
510.parest_r     0      3512678127316    3512676752062    0.000%  3512677505662    0.008%
511.povray_r     0      3036725558618    3036722265149    0.000%  3036725556997    0.000%
519.lbm_r        0      1134454304533    1134454304533    0.000%  1134454304518    0.000%
520.omnetpp_r    0      1001937885126    1001937884542    0.000%  1001937883931    0.000%
521.wrf_r        0      3959642601629    3959541912013    0.003%  3959642615086    0.000%
523.xalancbmk_r  0      1065004269065    1064981413043    0.002%  1065004132070    0.000%
525.x264_r       0       496492857533     496459367582    0.007%   496477988435    0.003%
                 1      1891248078083    1891222197535    0.001%  1890990911614    0.014%
                 2      1815609267498    1815561397105    0.003%  1815341248007    0.015%
526.blender_r    0      1672203767444    1671549923427    0.039%  1672224626743   -0.001%
527.cam4_r       0      2326424925038    2320567166886    0.252%  2326333566227    0.004%  <-
531.deepsjeng_r  0      1668993359340

[Committed] IBM Z: Handle unaligned symbols

2023-08-01 Thread Andreas Krebbel via Gcc-patches
The IBM Z ELF ABI mandates every symbol to reside on a 2 byte boundary
in order to be able to use the larl instruction. However, in some
situations it is difficult to enforce this, e.g. for common linker
scripts as used in the Linux kernel. This patch introduces the
-munaligned-symbols option. When that option is used, external symbols
without an explicit alignment are considered unaligned and their
addresses are pushed into the GOT or the literal pool.

If the symbol in the final linker step turns out to end up on a 2 byte
boundary the linker is able to take this back and replace the indirect
reference with larl again. This should minimize the effect to symbols
which are actually unaligned in the end.

Bootstrapped and regression tested on s390x. Committed to mainline.

Backports to stable branches will follow.
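
A short sketch of the user-visible effect (see also the new tests below):

extern unsigned char maybe_unaligned;   /* with -munaligned-symbols:
                                           accessed via GOT/literal pool */
extern unsigned char surely_aligned __attribute__ ((aligned (2)));
                                        /* still accessed with larl */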

gcc/ChangeLog:

* config/s390/s390.cc (s390_encode_section_info): Assume external
symbols without explicit alignment to be unaligned if
-munaligned-symbols has been specified.
* config/s390/s390.opt (-munaligned-symbols): New option.

gcc/testsuite/ChangeLog:

* gcc.target/s390/aligned-1.c: New test.
* gcc.target/s390/unaligned-1.c: New test.
---
 gcc/config/s390/s390.cc |  9 +++--
 gcc/config/s390/s390.opt|  7 +++
 gcc/testsuite/gcc.target/s390/aligned-1.c   | 20 
 gcc/testsuite/gcc.target/s390/unaligned-1.c | 20 
 4 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/aligned-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/unaligned-1.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 13970edcb5e..89474fd487a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -13709,8 +13709,13 @@ s390_encode_section_info (tree decl, rtx rtl, int 
first)
 a larl/load-relative instruction.  We only handle the cases
 that can go wrong (i.e. no FUNC_DECLs).
 All symbols without an explicit alignment are assumed to be 2
-byte aligned as mandated by our ABI.  */
-  if (DECL_USER_ALIGN (decl) && DECL_ALIGN (decl) % 16)
+byte aligned as mandated by our ABI.  This behavior can be
+overridden for external symbols with the -munaligned-symbols
+switch.  */
+  if (DECL_ALIGN (decl) % 16
+ && (DECL_USER_ALIGN (decl)
+ || (!SYMBOL_REF_LOCAL_P (XEXP (rtl, 0))
+ && s390_unaligned_symbols_p)))
SYMBOL_FLAG_SET_NOTALIGN2 (XEXP (rtl, 0));
   else if (DECL_ALIGN (decl) % 32)
SYMBOL_FLAG_SET_NOTALIGN4 (XEXP (rtl, 0));
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index 344aa551f44..496572046f7 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -329,3 +329,10 @@ Target Undocumented Var(unroll_only_small_loops) Init(0) 
Save
 mpreserve-args
 Target Var(s390_preserve_args_p) Init(0)
 Store all argument registers on the stack.
+
+munaligned-symbols
+Target Var(s390_unaligned_symbols_p) Init(0)
+Assume external symbols to be potentially unaligned.  By default all
+symbols without explicit alignment are assumed to reside on a 2 byte
+boundary as mandated by the IBM Z ABI.
+
diff --git a/gcc/testsuite/gcc.target/s390/aligned-1.c 
b/gcc/testsuite/gcc.target/s390/aligned-1.c
new file mode 100644
index 000..2dc99cf66bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/aligned-1.c
@@ -0,0 +1,20 @@
+/* Even symbols without explicit alignment are assumed to reside on a
+   2 byte boundary, as mandated by the IBM Z ELF ABI, and therefore
+   can be accessed using the larl instruction.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z900 -fno-section-anchors" } */
+
+extern unsigned char extern_implicitly_aligned;
+extern unsigned char extern_explicitly_aligned __attribute__((aligned(2)));
+unsigned char aligned;
+
+unsigned char
+foo ()
+{
+  return extern_implicitly_aligned + extern_explicitly_aligned + aligned;
+}
+
+/* { dg-final { scan-assembler-times 
"larl\t%r\[0-9\]*,extern_implicitly_aligned\n" 1 } } */
+/* { dg-final { scan-assembler-times 
"larl\t%r\[0-9\]*,extern_explicitly_aligned\n" 1 } } */
+/* { dg-final { scan-assembler-times "larl\t%r\[0-9\]*,aligned\n" 1 } } */
diff --git a/gcc/testsuite/gcc.target/s390/unaligned-1.c 
b/gcc/testsuite/gcc.target/s390/unaligned-1.c
new file mode 100644
index 000..421330aded1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/unaligned-1.c
@@ -0,0 +1,20 @@
+/* With the -munaligned-symbols option all external symbols without
+   explicit alignment are assumed to be potentially unaligned and
+   therefore cannot be accessed with larl.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z900 -fno-section-anchors -munaligned-symbols" } */
+
+extern unsigned char extern_unaligned;
+extern unsigned char extern_explicitly_aligned __attribute__((aligned(2)));
+unsigned char aligned;
+
+unsigned 

Re: [PATCH RESEND] libatomic: drop redundant all-multi command

2023-08-01 Thread Nathanael Nerode via Gcc-patches
I'm afraid I don't understand this part of the code well, and I've really been 
away from GCC work for years, and I'm not sure what tests should be run to 
verify that this is working, so I don't feel comfortable approving it by 
myself.  It looks right though.

On Tue, Aug 1, 2023, at 1:55 AM, Jan Beulich wrote:
> ./multilib.am already specifies this same command, and make warns about
> the earlier one being ignored when seeing the later one. All that needs
> retaining to still satisfy the preceding comment is the extra
> dependency.
>
> libatomic/
>
>   * Makefile.am (all-multi): Drop commands.
>   * Makefile.in: Update accordingly.
> ---
> While originally sent over a year ago and pinged subsequently, I can't
> quite view changes like this as "trivial" ...
>
> --- a/libatomic/Makefile.am
> +++ b/libatomic/Makefile.am
> @@ -149,12 +149,11 @@ endif
>  libatomic_convenience_la_SOURCES = $(libatomic_la_SOURCES)
>  libatomic_convenience_la_LIBADD = $(libatomic_la_LIBADD)
> 
> -# Override the automake generated all-multi rule to guarantee that all-multi
> +# Amend the automake generated all-multi rule to guarantee that all-multi
>  # is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
>  # makefile fragments to avoid broken *.Ppo getting included into the Makefile
>  # when it is reloaded during the build of all-multi.
>  all-multi: $(libatomic_la_LIBADD)
> - $(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
> 
>  # target overrides
>  -include $(tmake_file)
> --- a/libatomic/Makefile.in
> +++ b/libatomic/Makefile.in
> @@ -892,12 +892,11 @@ vpath % $(strip $(search_path))
>  %_.lo: Makefile
>   $(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_IFUNC) -c -o $@ $(M_SRC)
> 
> -# Override the automake generated all-multi rule to guarantee that all-multi
> +# Amend the automake generated all-multi rule to guarantee that all-multi
>  # is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
>  # makefile fragments to avoid broken *.Ppo getting included into the Makefile
>  # when it is reloaded during the build of all-multi.
>  all-multi: $(libatomic_la_LIBADD)
> - $(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
> 
>  # target overrides
>  -include $(tmake_file)


Re: _BitInt vs. _Atomic

2023-08-01 Thread Martin Uecker
On Tuesday, 2023-08-01 at 15:54 +0000, Michael Matz wrote:
> Hello,
> 
> On Mon, 31 Jul 2023, Martin Uecker wrote:
> 
> > >  Say you have a loop like so:
> > > 
> > > _Atomic T obj;
> > > ...
> > > T expected1, expected2, newval;
> > > newval = ...;
> > > expected1 = ...;
> > > do {
> > >   expected2 = expected1;
> > >   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> > > break;
> > >   expected1 = expected2;
> > > } while (1);
> > > 
> > > As written this looks of course stupid, and you may say "don't do that", 
> > > but internally the copies might result from temporaries (compiler 
> > > generated or wrapper function arguments, or suchlike). 
> > >  Now, while 
> > > expected2 will contain the copied padding bits after the cmpxchg the 
> > > copies to and from expected1 will possibly destroy them.  Either way I 
> > > don't see why the above loop should be out-of-spec, so I can write it and 
> > > expect it to proceed eventually (certainly when the _strong variant is 
> > > used).  Any argument that would declare the above loop out-of-spec I 
> > > would 
> > > consider a defect in the spec.
> > 
> > It is "out-of-spec" for C in the sense that it can not be
> > expected to work with the semantics as specified in the C standard.
> 
> (I call that a defect.  See below)

This was extensively discussed in WG14 (before my time). In fact,
there was a defect report about the previous version defined in
terms of values and the wording was changed to memcmp / memcpy
operating on padding bytes (also to align with C++ at that time):

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_431
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1906.htm


> > In practice, what the semantics specified using memcpy/memcmp
> > allow one to do is to also apply atomic operations on non-atomic 
> > types.  This is not guaranteed to work by the C standard, but
> > in practice  people often have to do this.  For example, nobody
> > is going to copy a 256 GB numerical array with non-atomic types
> > into another data structure with atomic versions of the same
> > type just so that you can apply atomic operations on it.
> > So one simply does an unsafe cast and hopes the compiler does
> > not break this.
> > 
> > If the non-atomic struct now has non-zero values in the padding, 
> > and the compiler would clear those automatically for "expected", 
> > you would create the problem of an infinite loop (this time 
> > for real).
> 
> Only because cmpxchg is defined in terms of memcpy/memcmp. 

Yes, but this is intentional.

>  If it were 
> defined in terms of the == operator (obviously applied recursively 
> member-wise for structs) and simple-assignment that wouldn't be a problem. 

C has no == operator or any concept of struct equality. 

It would also cause implementation overhead and I guess
could cause severe performance issues when there are several
padding bytes distributed over an object and you need to
jump over those when doing copying or doing comparisons.
(how to do vectorization?)

> In addition that would get rid of all discussion of what happens or 
> doesn't happen with padding.  Introducing reliance on padding bits (which 
> IMHO goes against some fundamental ideas of the C standard) has 
> far-reaching consequences, see below. 

Working with representation bytes of objects is a rather 
fundamental property of C. That you can do this using
character pointers or that you can copy objects with
memcpy and that the results can be compared with memcmp is
something I expect to work in C. 

>  The current definition of the 
> atomic_cmpxchg is also inconsistent with the rest of the standard:
> 
> We have:
> 
>   ... (C is non-atomic variant of A) ...
>   _Bool atomic_compare_exchange_strong(volatile A *object,
>C *expected, C desired);
>   ... (is equivalent to atomic variant of:) 
>   if (memcmp(object, expected, sizeof (*object)) == 0)
> { memcpy(object, &desired, sizeof (*object)); return true; }
>   else
> { memcpy(expected, object, sizeof (*object)); return false; }
> 
> But we also have:
> 
>   The size, representation, and alignment of an atomic type need not be 
>   the same as those of the corresponding unqualified type.
> 
>   (with later text only suggesting that at least for atomic integer 
>   types these please be the same.  But here we aren't talking about
>   integer types even.)

Reading the old meeting minutes, it seems WG14 considered
the case that an atomic type could have a content part and
possibly a lock and you would compare only the content
part (with padding) and not the lock. But I agree, the
wording should be improved.


> 
> So, already the 'memcmp(object, expected, sizeof (*object)' may be 
> undefined.  sizeof(*object) need not be the same as sizeof(*expected).
> In particular the memcpy in the else branch might clobber memory outside 
> *expected.
> 
> That alone should be sufficient to show that defining this all in terms of 
> memcpy/memcmp is 
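
To make the overhead point above concrete: an ==-based definition would
have to compare member-wise and skip padding, instead of one flat memcmp
over the representation. A sketch (member names invented):

struct S { char c; /* padding */ int i; };

static _Bool
s_equal (const struct S *a, const struct S *b)
{
  /* Each member compared separately; padding bytes never inspected.
     With many members and padding gaps this defeats block compares and
     vectorization, unlike memcmp (a, b, sizeof (struct S)).  */
  return a->c == b->c && a->i == b->i;
}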

[PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-01 Thread Cupertino Miranda via Gcc-patches
This patch updates the support for the BPF CO-RE builtins
__builtin_preserve_access_index and __builtin_preserve_field_info,
and adds support for the CO-RE builtins __builtin_btf_type_id,
__builtin_preserve_type_info and __builtin_preserve_enum_value.

These CO-RE relocations are now converted to __builtin_core_reloc which
abstracts all of the original builtins in a polymorphic relocation
specific builtin.

The builtin processing is now split in 2 stages, the first (pack) is
executed right after the front-end and the second (process) right before
the asm output.

In expand pass the __builtin_core_reloc is converted to a
unspec:UNSPEC_CORE_RELOC rtx entry.

The data required to process the builtin is now collected in the packing
stage (after front-end), not allowing the compiler to optimize any of
the relevant information required to compose the relocation when
necessary.
At expansion, that information is recovered and CTF/BTF is queried to
construct the information that will be used in the relocation.
At this point the relocation is added to specific section and the
builtin is expanded to the expected default value for the builtin.

In order to process __builtin_preserve_enum_value, it was necessary to
hook the front-end to collect the original enum value reference.
This is needed since the parser folds all the enum values to its
integer_cst representation.

More details can be found within the core-builtins.cc.

Regtested in host x86_64-linux-gnu and target bpf-unknown-none.
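
As a usage sketch for one of the builtins covered here (following the
clang-compatible convention in which a second argument of 0 requests
FIELD_BYTE_OFFSET; illustrative only):

struct task { int pid; };

unsigned
pid_offset (struct task *t)
{
  /* Emits a CO-RE relocation for the field offset instead of a
     hard-coded constant; the BPF loader patches it at load time.  */
  return __builtin_preserve_field_info (t->pid, 0);
}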
---
 gcc/config.gcc|4 +-
 gcc/config/bpf/bpf-passes.def |   20 -
 gcc/config/bpf/bpf-protos.h   |4 +-
 gcc/config/bpf/bpf.cc |  817 +-
 gcc/config/bpf/bpf.md |   17 +
 gcc/config/bpf/core-builtins.cc   | 1397 +
 gcc/config/bpf/core-builtins.h|   36 +
 gcc/config/bpf/coreout.cc |   50 +-
 gcc/config/bpf/coreout.h  |   13 +-
 gcc/config/bpf/t-bpf  |6 +-
 gcc/doc/extend.texi   |   51 +
 ...core-builtin-fieldinfo-const-elimination.c |   29 +
 12 files changed, 1639 insertions(+), 805 deletions(-)
 delete mode 100644 gcc/config/bpf/bpf-passes.def
 create mode 100644 gcc/config/bpf/core-builtins.cc
 create mode 100644 gcc/config/bpf/core-builtins.h
 create mode 100644 
gcc/testsuite/gcc.target/bpf/core-builtin-fieldinfo-const-elimination.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index eba69a463be0..c521669e78b1 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1597,8 +1597,8 @@ bpf-*-*)
 use_collect2=no
 extra_headers="bpf-helpers.h"
 use_gcc_stdint=provide
-extra_objs="coreout.o"
-target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc"
+extra_objs="coreout.o core-builtins.o"
+target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.cc 
\$(srcdir)/config/bpf/core-builtins.cc"
 ;;
 cris-*-elf | cris-*-none)
tm_file="elfos.h newlib-stdint.h ${tm_file}"
diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
deleted file mode 100644
index deeaee988a01..
--- a/gcc/config/bpf/bpf-passes.def
+++ /dev/null
@@ -1,20 +0,0 @@
-/* Declaration of target-specific passes for eBPF.
-   Copyright (C) 2021-2023 Free Software Foundation, Inc.
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   GCC is distributed in the hope that it will be useful, but
-   WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with GCC; see the file COPYING3.  If not see
-   <http://www.gnu.org/licenses/>.  */
-
-INSERT_PASS_AFTER (pass_df_initialize_opt, 1, pass_bpf_core_attr);
diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
index b484310e8cbf..fbcf5111eb21 100644
--- a/gcc/config/bpf/bpf-protos.h
+++ b/gcc/config/bpf/bpf-protos.h
@@ -30,7 +30,7 @@ extern void bpf_print_operand_address (FILE *, rtx);
 extern void bpf_expand_prologue (void);
 extern void bpf_expand_epilogue (void);
 extern void bpf_expand_cbranch (machine_mode, rtx *);
-
-rtl_opt_pass * make_pass_bpf_core_attr (gcc::context *);
+const char *bpf_add_core_reloc (rtx *operands, const char *templ);
+void bpf_process_move_operands (rtx *operands);
 
 #endif /* ! GCC_BPF_PROTOS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index b5b5674edbb5..101e994905d2 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -69,10 +69,7 @@ along with GCC; see the file COPYING3.  If 

[PATCH 2/2] bpf: CO-RE builtins support tests.

2023-08-01 Thread Cupertino Miranda via Gcc-patches
This patch adds tests for the following builtins:
  __builtin_preserve_enum_value
  __builtin_btf_type_id
  __builtin_preserve_type_info
---
 .../gcc.target/bpf/core-builtin-enumvalue.c   |  52 +
 .../bpf/core-builtin-enumvalue_errors.c   |  22 
 .../bpf/core-builtin-enumvalue_opt.c  |  35 ++
 .../bpf/core-builtin-fieldinfo-errors-1.c |   2 +-
 .../bpf/core-builtin-fieldinfo-errors-2.c |   2 +-
 .../gcc.target/bpf/core-builtin-type-based.c  |  58 ++
 .../gcc.target/bpf/core-builtin-type-id.c |  40 +++
 gcc/testsuite/gcc.target/bpf/core-support.h   | 109 ++
 8 files changed, 318 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_opt.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-type-based.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-type-id.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-support.h

diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c 
b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
new file mode 100644
index ..3e3334dc089a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+#include "core-support.h"
+
+extern int *v;
+
+int foo(void *data)
+{
+ int i = 0;
+ enum named_ue64 named_unsigned64 = 0;
+ enum named_se64 named_signed64 = 0;
+ enum named_ue named_unsigned = 0;
+ enum named_se named_signed = 0;
+
+ v[i++] = bpf_core_enum_value_exists (named_unsigned64, UE64_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue64, UE64_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue64, UE64_VAL3);
+ v[i++] = bpf_core_enum_value_exists (named_signed64, SE64_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_se64, SE64_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_se64, SE64_VAL3);
+
+ v[i++] = bpf_core_enum_value (named_unsigned64, UE64_VAL1);
+ v[i++] = bpf_core_enum_value (named_unsigned64, UE64_VAL2);
+ v[i++] = bpf_core_enum_value (named_signed64, SE64_VAL1);
+ v[i++] = bpf_core_enum_value (named_signed64, SE64_VAL2);
+
+ v[i++] = bpf_core_enum_value_exists (named_unsigned, UE_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue, UE_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_ue, UE_VAL3);
+ v[i++] = bpf_core_enum_value_exists (named_signed, SE_VAL1);
+ v[i++] = bpf_core_enum_value_exists (enum named_se, SE_VAL2);
+ v[i++] = bpf_core_enum_value_exists (enum named_se, SE_VAL3);
+
+ v[i++] = bpf_core_enum_value (named_unsigned, UE_VAL1);
+ v[i++] = bpf_core_enum_value (named_unsigned, UE_VAL2);
+ v[i++] = bpf_core_enum_value (named_signed, SE_VAL1);
+ v[i++] = bpf_core_enum_value (named_signed, SE_VAL2);
+
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times "\t.4byte\t0x8\t; bpfcr_type 
\\(named_ue64\\)" 5 } } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0x9\t; bpfcr_type 
\\(named_se64\\)" 5} } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xb\t; bpfcr_type 
\\(named_ue\\)" 5 } } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xc\t; bpfcr_type 
\\(named_se\\)" 5} } */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xa\t; bpfcr_kind" 12 } } 
BPF_ENUMVAL_EXISTS */
+/* { dg-final { scan-assembler-times "\t.4byte\t0xb\t; bpfcr_kind" 8 } } 
BPF_ENUMVAL_VALUE */
+
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0\"\\)" 8 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"1\"\\)" 8 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"2\"\\)" 4 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c 
b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
new file mode 100644
index ..138e99895160
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-builtin-enumvalue_errors.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+#include "core-support.h"
+
+extern int *v;
+
+unsigned long foo(void *data)
+{
+  int i = 0;
+  enum named_ue64 named_unsigned = 0;
+  enum named_se64 named_signed = 0;
+  typeof(enum named_ue64) a = 0;
+
+  v[i++] = __builtin_preserve_enum_value (({ extern typeof(named_unsigned) 
*_type0; _type0; }), 0, BPF_ENUMVAL_EXISTS); /* { dg-error "invalid 
enum value argument for enum value builtin" } */
+  v[i++] = __builtin_preserve_enum_value (({ extern typeof(enum named_ue64) 
*_type0; _type0; }), v,BPF_ENUMVAL_EXISTS); /* { dg-error "invalid enum 
value argument for enum value builtin" } */
+  v[i++] = __builtin_preserve_enum_value (a,   
 UE64_VAL3, BPF_ENUMVAL_EXISTS); /* { dg-error "invalid 
type argument format for enum value builtin" } */
+  v[i++] = 

[PATCH] CO-RE BPF builtins support

2023-08-01 Thread Cupertino Miranda via Gcc-patches
Hi everyone,

This patch series implements all the BPF CO-RE builtins.
It improves the support for __builtin_preserve_access_index and
__builtin_preserve_field_info, and also introduces support for
__builtin_btf_type_id, __builtin_preserve_type_info and
__builtin_preserve_enum_value.

Regtested in host x86_64-linux-gnu and target bpf-unknown-none.

Looking forward to your comments.

Best regards,
Cupertino




[PATCH v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-01 Thread Carl Love via Gcc-patches


GCC maintainers:

Ver 2:  Re-worked the test vec-cmpne.c to create a compile-only test to
verify the instruction generation and a runnable test to verify the
built-in functionality.  Retested the patch on Power 8 LE/BE, Power 9 LE/BE
and Power 10 LE with no regressions.

The following patch cleans up the definition for the
__builtin_altivec_vcmpne{b,h,w}.  The current implementation implies
that the built-in is only supported on Power 9 since it is defined
under the Power 9 stanza.  However, the built-in has no ISA restrictions
as stated in the Power Vector Intrinsic Programming Reference document.
The current built-in works because the built-in gets replaced during
GIMPLE folding by a simple not-equal operator so it doesn't get
expanded and checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 


rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.  This implies they are only
supported on Power 9 and above when in fact they are defined and work with
Altivec as well with the appropriate Altivec instruction generation.

The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
processors.
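
For instance (an illustrative sketch, not one of the new tests):

  #include <altivec.h>

  vector bool int
  cmpne (vector signed int a, vector signed int b)
  {
    /* Power 9: vcmpnew; plain Altivec: vcmpequw plus a complement.  */
    return vec_cmpne (a, b);
  }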

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
enables the vcmpequ{b,h,w} instruction to be generated on Altivec and
the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.

Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
instructions for Altivec.  A new test vec-cmpne-runnable.c is added to
verify the built-ins work as expected.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
* gcc.target/powerpc/vec-cmpne.c (define_test_functions,
execute_test_functions) moved to vec-cmpne.h.  Added
scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
* gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
and vec-cmpne-runnable.c. Split define_test_functions definition
into define_test_functions and define_init_verify_functions.
---
 gcc/config/rs6000/altivec.md  |  12 ++
 gcc/config/rs6000/rs6000-builtins.def |  18 +--
 .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 110 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  86 ++
 5 files changed, 151 insertions(+), 111 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_<mode>"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 
"altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 
"altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..6b06fa8b34d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -641,6 +641,15 @@
   const int 

Re: [PATCH] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-01 Thread Carl Love via Gcc-patches
Kewen:

On Mon, 2023-07-31 at 14:53 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/28 23:00, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch cleans up the definition for the
> > __builtin_altivec_vcmpnet.  The current implementation implies that
> > the
> 
> s/__builtin_altivec_vcmpnet/__builtin_altivec_vcmpne[bhw]/

OK, updated in email for version 2. 

> 
> > built-in is only supported on Power 9 since it is defined under the
> > Power 9 stanza.  However the built-in has no ISA restrictions as
> > stated
> > in the Power Vector Intrinsic Programming Reference document. The
> > current built-in works because the built-in gets replaced during
> > GIMPLE
> > folding by a simple not-equal operator so it doesn't get expanded
> > and
> > checked for Power 9 code generation.
> > 
> > This patch moves the definition to the Altivec stanza in the built-
> > in
> > definition file to make it clear the built-ins are valid for Power
> > 8,
> > Power 9 and beyond.  
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10
> > LE with no regressions.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> >   Carl 
> > 
> > --
> > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> > 
> > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are
> > defined
> > under the Power 9 section of r66000-builtins.  This implies they
> > are only
> > supported on Power 9 and above when in fact they are defined and
> > work on
> > Power 8 as well with the appropriate Power 8 instruction
> > generation.
> 
> Nit: It's confusing to say Power8 only, it's actually supported once
> altivec
> is enabled, so I think it's more clear to replace Power8 with altivec
> here.

OK, replaced Power 8 with Altivec here and for additional instances of
Power 8 below.

> 
> > The vec_cmpne builtin should generate the vcmpequ{b,h,w}
> > instruction on
> > Power 8 and generate the vcmpne{b,h,w} on Power 9 an newer
> > processors.
> 
> 
> Ditto for Power8 and "an" -> "and"?

Fixed, fixed.

> 
> > This patch moves the definitions to the Altivec stanza to make it
> > clear
> > the built-ins are supported for all Altivec processors.  The patch
> > enables the vcmpequ{b,h,w} instruction to be generated on Power 8
> > and
> > the vcmpne{b,h,w} instruction to be generated on Power 9 and
> > beyond.
> 
> Ditto for Power8.

fixed

> 
> > There is existing test coverage for the vec_cmpne built-in for
> > vector bool char, vector bool short, vector bool int,
> > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> > Coverage for vector signed int, vector unsigned int is in
> > p8vector-builtin-2.c.
> 
> So there is no coverage with the basic altivec support.  I noticed
> we have one test case "gcc/testsuite/gcc.target/powerpc/vec-cmpne.c"
> which is a test case for running but with vsx_ok, I think we can
> rewrite it with altivec (vmx), either separating to compiling and
> running case, or adding -save-temp and check expected insns.

I looked at just adding -save-temps and scan-assembler-times for the
instructions.  I noticed that vcmpequw occurs 30 times in the functions
to initialize and test the results.  So, I opted to create a separate
compile/check instructions test and a runnable test to verify the
functionality.  This way any changes in the code to calculate and
verify the results will not break the instruction generation checks.

> 
> Coverage for unsigned long long int and long long int
> > for Power 10 in int_128bit-runnable.c.

Removed comment about Power 10, long long int testing.

> > 
> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE
> > with no regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew.
> > vcmpnet): Move definitions to Altivec stanza.
> 
> vcmpnet which isn't handled in this patch should be removed.

Removed.
 
 Carl 



Re: [PATCH v3 2/2] libstdc++: Use _GLIBCXX_HAS_BUILTIN_TRAIT

2023-08-01 Thread Patrick Palka via Gcc-patches
On Thu, 27 Jul 2023, Ken Matsui via Gcc-patches wrote:

> This patch uses _GLIBCXX_HAS_BUILTIN_TRAIT macro instead of
> __has_builtin in the type_traits header. This macro supports toggling
> the use of built-in traits in the type_traits header through
> _GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the
> source code.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/std/type_traits (__has_builtin): Replace with ...
>   (_GLIBCXX_HAS_BUILTIN): ... this.
> 
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 9f086992ebc..12423361b6e 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -1411,7 +1411,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  : public __bool_constant<__is_base_of(_Base, _Derived)>
>  { };
>  
> -#if __has_builtin(__is_convertible)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_convertible)
>template
>  struct is_convertible
>  : public __bool_constant<__is_convertible(_From, _To)>
> @@ -1462,7 +1462,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #if __cplusplus >= 202002L
>  #define __cpp_lib_is_nothrow_convertible 201806L
>  
> -#if __has_builtin(__is_nothrow_convertible)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_nothrow_convertible)
>/// is_nothrow_convertible_v
>template
>  inline constexpr bool is_nothrow_convertible_v
> @@ -1537,7 +1537,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  { using type = _Tp; };
>  
>/// remove_cv
> -#if __has_builtin(__remove_cv)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cv)
>template
>  struct remove_cv
>  { using type = __remove_cv(_Tp); };
> @@ -1606,7 +1606,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// Reference transformations.
>  
>/// remove_reference
> -#if __has_builtin(__remove_reference)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_reference)
>template
>  struct remove_reference
>  { using type = __remove_reference(_Tp); };
> @@ -2963,7 +2963,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template  bool _Nothrow = noexcept(_S_conv<_Tp>(_S_get())),
>  typename = decltype(_S_conv<_Tp>(_S_get())),
> -#if __has_builtin(__reference_converts_from_temporary)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__reference_converts_from_temporary)
>  bool _Dangle = __reference_converts_from_temporary(_Tp, _Res_t)
>  #else
>  bool _Dangle = false
> @@ -3420,7 +3420,7 @@ template
> */
>  #define __cpp_lib_remove_cvref 201711L
>  
> -#if __has_builtin(__remove_cvref)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__remove_cvref)
>template
>  struct remove_cvref
>  { using type = __remove_cvref(_Tp); };
> @@ -3515,7 +3515,7 @@ template
>  : public bool_constant>
>  { };
>  
> -#if __has_builtin(__is_layout_compatible)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_layout_compatible)

Hmm, I was thinking we'd use this macro only for traits that have a
fallback non-built-in implementation so that we could easily use/test
their fallback implementation.  For traits that don't have such a
fallback, using this macro would mean that trait would no longer get
defined at all, which doesn't seem as useful.  Perhaps let's initially
adjust only the traits that have a fallback implementation?

We could then verify that using the fallback implementation for all such
traits works as expected by running the testsuite with:

  make check RUNTESTFLAGS="conformance.exp 
--target_board=unix/-D_GLIBCXX_NO_BUILTIN_TRAITS"
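
That is, roughly (an illustrative sketch, not actual patch text):

  // Trait with a library fallback: toggling the macro off still
  // leaves a working implementation for the testsuite to exercise.
  #if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_convertible)
    template<typename _From, typename _To>
      struct is_convertible
      : public __bool_constant<__is_convertible(_From, _To)>
      { };
  #else
    // ... the existing non-built-in implementation ...
  #endif

  // Trait without a fallback: turning the macro off would remove the
  // trait entirely, so __has_builtin seems the better guard here.
  #if __has_builtin(__builtin_is_corresponding_member)
    // ... declared only when the built-in exists ...
  #endif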

>  
>/// @since C++20
>template
> @@ -3529,7 +3529,7 @@ template
>  constexpr bool is_layout_compatible_v
>= __is_layout_compatible(_Tp, _Up);
>  
> -#if __has_builtin(__builtin_is_corresponding_member)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_corresponding_member)
>  #define __cpp_lib_is_layout_compatible 201907L
>  
>/// @since C++20
> @@ -3540,7 +3540,7 @@ template
>  #endif
>  #endif
>  
> -#if __has_builtin(__is_pointer_interconvertible_base_of)
> +#if _GLIBCXX_HAS_BUILTIN_TRAIT(__is_pointer_interconvertible_base_of)
>/// True if `_Derived` is standard-layout and has a base class of type 
> `_Base`
>/// @since C++20
>template
> @@ -3554,7 +3554,7 @@ template
>  constexpr bool is_pointer_interconvertible_base_of_v
>= __is_pointer_interconvertible_base_of(_Base, _Derived);
>  
> -#if __has_builtin(__builtin_is_pointer_interconvertible_with_class)
> +#if 
> _GLIBCXX_HAS_BUILTIN_TRAIT(__builtin_is_pointer_interconvertible_with_class)
>  #define __cpp_lib_is_pointer_interconvertible 201907L
>  
>/// True if `__mp` points to the first member of a standard-layout type
> @@ -3590,8 +3590,8 @@ template
>template
>  inline constexpr bool is_scoped_enum_v = is_scoped_enum<_Tp>::value;
>  
> -#if __has_builtin(__reference_constructs_from_temporary) \
> -  

arm: Remove unsigned variant of vcaddq_m

2023-08-01 Thread Stamatis Markianos-Wright via Gcc-patches

Hi all,

The unsigned variants of the vcaddq_m operation are not needed within the
compiler, as the assembly output of the signed and unsigned versions of the
ops is identical: with a `.i` suffix (as opposed to separate `.s` and `.u`
suffixes).
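
For example (an illustrative sketch, not part of the patch):

  #include <arm_mve.h>

  /* Both functions emit the same predicated instruction,
     e.g. vcaddt.i32 q0, q1, q2, #90.  */
  int32x4_t
  f (int32x4_t inactive, int32x4_t a, int32x4_t b, mve_pred16_t p)
  {
    return vcaddq_rot90_m (inactive, a, b, p);
  }

  uint32x4_t
  g (uint32x4_t inactive, uint32x4_t a, uint32x4_t b, mve_pred16_t p)
  {
    return vcaddq_rot90_m (inactive, a, b, p);
  }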

Tested with baremetal arm-none-eabi on Arm's fastmodels.

Ok for trunk?

Thanks,
Stamatis Markianos-Wright

gcc/ChangeLog:

    * config/arm/arm-mve-builtins-base.cc (vcaddq_rot90, vcaddq_rot270):
      Use common insn for signed and unsigned front-end definitions.
    * config/arm/arm_mve_builtins.def
      (vcaddq_rot90_m_u, vcaddq_rot270_m_u): Make common.
      (vcaddq_rot90_m_s, vcaddq_rot270_m_s): Remove.
    * config/arm/iterators.md (mve_insn): Merge signed and unsigned defs.
      (isu): Likewise.
      (rot): Likewise.
      (mve_rot): Likewise.
      (supf): Likewise.
      (VxCADDQ_M): Likewise.
    * config/arm/unspecs.md (unspec): Likewise.
---
 gcc/config/arm/arm-mve-builtins-base.cc |  4 ++--
 gcc/config/arm/arm_mve_builtins.def |  6 ++---
 gcc/config/arm/iterators.md | 30 +++--
 gcc/config/arm/mve.md   |  4 ++--
 gcc/config/arm/unspecs.md   |  6 ++---
 5 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc

index e31095ae112..426a87e9852 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -260,8 +260,8 @@ FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
 FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
-FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S, 
VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F))
-FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S, 
VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F))
+FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M, 
VCADDQ_ROT90_M, VCADDQ_ROT90_M_F))
+FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M, 
VCADDQ_ROT270_M, VCADDQ_ROT270_M_F))
 FUNCTION (vcmlaq, unspec_mve_function_exact_insn_rot, (-1, -1, 
UNSPEC_VCMLA, -1, -1, VCMLAQ_M_F))
 FUNCTION (vcmlaq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1, 
UNSPEC_VCMLA90, -1, -1, VCMLAQ_ROT90_M_F))
 FUNCTION (vcmlaq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1, 
UNSPEC_VCMLA180, -1, -1, VCMLAQ_ROT180_M_F))
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def

index 43dacc3dda1..6ac1812c697 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -523,8 +523,8 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, 
vhsubq_m_n_u, v16qi, v8hi, v4si)

 VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_u, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_n_u, v16qi, v8hi, 
v4si)

 VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, veorq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_u, v16qi, 
v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_u, v16qi, 
v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_, v16qi, 
v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_, v16qi, 
v8hi, v4si)

 VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vbicq_m_u, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vandq_m_u, v16qi, v8hi, v4si)
 VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vaddq_m_u, v16qi, v8hi, v4si)
@@ -587,8 +587,6 @@ VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, 
vhcaddq_rot270_m_s, v16qi, v8hi, v4si)

 VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_s, v16qi, v8hi, v4si)
 VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_n_s, v16qi, v8hi, v4si)
 VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, veorq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot90_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot270_m_s, v16qi, v8hi, 
v4si)

 VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_s, v16qi, v8hi, v4si)
 VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbicq_m_s, v16qi, v8hi, v4si)
 VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_s, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index b13ff53d36f..2edd0b06370 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -941,8 +941,8 @@
      (VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
      (VBRSRQ_M_N_S "vbrsr") (VBRSRQ_M_N_U "vbrsr") (VBRSRQ_M_N_F 
"vbrsr")

      (VBRSRQ_N_S "vbrsr") (VBRSRQ_N_U "vbrsr") (VBRSRQ_N_F "vbrsr")
-         (VCADDQ_ROT270_M_U "vcadd") (VCADDQ_ROT270_M_S "vcadd") 
(VCADDQ_ROT270_M_F "vcadd")
-         (VCADDQ_ROT90_M_U "vcadd") (VCADDQ_ROT90_M_S "vcadd") 
(VCADDQ_ROT90_M_F 

Re: [PATCH] analyzer: stash values for CPython plugin [PR107646]

2023-08-01 Thread David Malcolm via Gcc-patches
On Tue, 2023-08-01 at 09:52 -0400, Eric Feng wrote:
> Hi all,
> 
> This patch adds a hook to the end of ana::on_finish_translation_unit
> which calls relevant stashing-related callbacks registered during
> plugin
> initialization. This feature is used to stash named types and global
> variables for a CPython analyzer plugin [PR107646].
> 
> Bootstrapped and tested on aarch64-unknown-linux-gnu. Does it look
> okay?

Hi Eric, thanks for the patch.

The patch touches the C frontend, so those parts would need approval
from the C FE maintainers/reviewers; I've CCed them.

Overall, I like the patch, but it's not ready for trunk yet; various
comments inline below...

> 
> ---
> 
> gcc/analyzer/ChangeLog:

You could add: PR analyzer/107646 to these ChangeLog entries; have a
look at how other ChangeLog entries refer to such bugzilla entries.

> 
>     * analyzer-language.cc (run_callbacks): New function.
>     (on_finish_translation_unit): New function.
>     * analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
>     (class translation_unit): New vfuncs.
> 
> gcc/c/ChangeLog:
> 
>     * c-parser.cc: New functions.

I think this ChangeLog entry needs more detail.
> 
> gcc/testsuite/ChangeLog:
> 
>     * gcc.dg/plugin/analyzer_cpython_plugin.c: New test.
> 
> Signed-off-by: Eric Feng 
> ---
>  gcc/analyzer/analyzer-language.cc |  22 ++
>  gcc/analyzer/analyzer-language.h  |   9 +
>  gcc/c/c-parser.cc |  26 ++
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 224
> ++
>  4 files changed, 281 insertions(+)
>  create mode 100644
> gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> 
> diff --git a/gcc/analyzer/analyzer-language.cc
> b/gcc/analyzer/analyzer-language.cc
> index 2c8910906ee..fc41b9c17b8 100644
> --- a/gcc/analyzer/analyzer-language.cc
> +++ b/gcc/analyzer/analyzer-language.cc
> @@ -35,6 +35,26 @@ static GTY (()) hash_map<tree, tree> *analyzer_stashed_constants;
>  #if ENABLE_ANALYZER
> 
>  namespace ana {
> +static vec<finish_translation_unit_callback>
> +    *finish_translation_unit_callbacks;
> +
> +void
> +register_finish_translation_unit_callback (
> +    finish_translation_unit_callback callback)
> +{
> +  if (!finish_translation_unit_callbacks)
> +    vec_alloc (finish_translation_unit_callbacks, 1);
> +  finish_translation_unit_callbacks->safe_push (callback);
> +}
> +
> +void
> +run_callbacks (logger *logger, const translation_unit &tu)

This function could be "static" since it's not needed outside of
analyzer-language.cc

> +{
> +  for (auto const &cb : finish_translation_unit_callbacks)
> +    {
> +  cb (logger, tu);
> +    }
> +}
> 
>  /* Call into TU to try to find a value for NAME.
>     If found, stash its value within analyzer_stashed_constants.  */
> @@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit &tu)
>  the_logger.set_logger (new logger (logfile, 0, 0,
>  *global_dc->printer));
>    stash_named_constants (the_logger.get_logger (), tu);
> +
> +  run_callbacks (the_logger.get_logger (), tu);
>  }
> 
>  /* Lookup NAME in the named constants stashed when the frontend TU
> finished.
> diff --git a/gcc/analyzer/analyzer-language.h
> b/gcc/analyzer/analyzer-language.h
> index 00f85aba041..8deea52d627 100644
> --- a/gcc/analyzer/analyzer-language.h
> +++ b/gcc/analyzer/analyzer-language.h
> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_ANALYZER_LANGUAGE_H
>  #define GCC_ANALYZER_LANGUAGE_H
> 
> +#include "analyzer/analyzer-logging.h"
> +
>  #if ENABLE_ANALYZER
> 
>  namespace ana {
> @@ -35,8 +37,15 @@ class translation_unit
>   have been seen).  If it is defined and an integer (e.g. either
> as a
>   macro or enum), return the INTEGER_CST value, otherwise return
> NULL.  */
>    virtual tree lookup_constant_by_id (tree id) const = 0;
> +  virtual tree lookup_type_by_id (tree id) const = 0;
> +  virtual tree lookup_global_var_by_id (tree id) const = 0;
>  };
> 
> +typedef void (*finish_translation_unit_callback)
> +   (logger *, const translation_unit &);
> +void register_finish_translation_unit_callback (
> +    finish_translation_unit_callback callback);
> +
>  /* Analyzer hook for frontends to call at the end of the TU.  */
> 
> void on_finish_translation_unit (const translation_unit &tu);
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 80920b31f83..f0ee55e416b 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -1695,6 +1695,32 @@ public:
>  return NULL_TREE;
>    }
> 
> +  tree
> +  lookup_type_by_id (tree id) const final override
> +  {
> +    if (tree type_decl = lookup_name (id))
> +  {
> + if (TREE_CODE (type_decl) == TYPE_DECL)
> + {
> + tree record_type = TREE_TYPE (type_decl);
> + if (TREE_CODE (record_type) == RECORD_TYPE)
> + return record_type;
> + }

It looks like something's wrong with the indentation here, but the idea
seems OK to me (but needs C FE reviewer approval).

> +  }
> +
> +    return NULL_TREE;

Re: _BitInt vs. _Atomic

2023-08-01 Thread Joseph Myers
On Tue, 1 Aug 2023, Michael Matz via Gcc-patches wrote:

> Only because cmpxchg is defined in terms of memcpy/memcmp.  If it were 
> defined in terms of the == operator (obviously applied recursively 
> member-wise for structs) and simple-assignment that wouldn't be a problem.  

It also wouldn't work for floating point, where I think clearly the atomic 
operations should consider positive and negative zero as different, and 
should consider different DFP quantum exponents for the same real number 
as different - but should also consider the same NaN (same payload, same 
choice of quiet / signaling) as being the same.
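
Concretely (a sketch, with memcmp standing in for the
representation-based comparison):

  double pz = 0.0, nz = -0.0;
  pz == nz;                      /* true, yet the sign bits differ */
  memcmp (&pz, &nz, sizeof pz);  /* nonzero */

  double qn = __builtin_nan ("");
  qn == qn;                      /* false, although it is the same NaN */
  memcmp (&qn, &qn, sizeof qn);  /* zero */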

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add POLY_INT_CST support to fold_ctor_reference in gimple-fold.cc

2023-08-01 Thread Richard Ball via Gcc-patches

Thanks Richard,

I've gone through the write access process and committed this.

On 7/31/2023 10:56 AM, Richard Sandiford wrote:

Richard Ball  writes:

Add POLY_INT_CST support to code within
fold_ctor_reference. This code previously
only supported INTEGER_CST which caused a
bug when using VEC_PERM_EXPR with SVE vectors.


Just to add for others: this is a prerequisite for a follow-on patch,
so the change will be tested there.


gcc/ChangeLog:

  * gimple-fold.cc (fold_ctor_reference):
  Add support for Poly_int.


Nit: s/Poly_int/poly_int/

OK with that change, thanks.  Please follow https://gcc.gnu.org/gitwrite.html
to get write access (I'll sponsor).

Richard


#

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 4027ff71e10337fe49c600fcd5a80026b260d54d..91e80b9aaa3b4797ce3a94129ca42c98d974cbd9 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8162,8 +8162,8 @@ fold_ctor_reference (tree type, tree ctor, const
poly_uint64 _offset,
result.  */
 if (!AGGREGATE_TYPE_P (TREE_TYPE (ctor)) && !offset
 /* VIEW_CONVERT_EXPR is defined only for matching sizes.  */
-  && !compare_tree_int (TYPE_SIZE (type), size)
-  && !compare_tree_int (TYPE_SIZE (TREE_TYPE (ctor)), size))
+  && known_eq (wi::to_poly_widest (TYPE_SIZE (type)), size)
+  && known_eq (wi::to_poly_widest (TYPE_SIZE (TREE_TYPE (ctor))),
size))
   {
 ret = canonicalize_constructor_val (unshare_expr (ctor), from_decl);
 if (ret)


Re: [PATCH] rtl-optimization/110587 - remove quadratic regno_in_use_p

2023-08-01 Thread Vladimir Makarov via Gcc-patches



On 7/25/23 09:40, Richard Biener wrote:

The following removes the code checking whether a noop copy
is between something involved in the return sequence composed
of a SET and USE.  Instead of checking for this special-case
the following makes us only ever remove noop copies between
pseudos - which is the case that is necessary for IRA/LRA
interfacing to function according to the comment.  That makes
looking for the return reg special case unnecessary, reducing
the compile-time in LRA non-specific to zero for the testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu with
all languages and {,-m32}.

OK?


Richard, sorry for the delay with the answer.  I was on vacation.

There is a long history of changes to this code.  I believe your change 
is right.  I don't think that RTL will ever contain a noop return move 
insn involving the return hard register, especially after hard reg 
propagation was removed a couple of years ago; at least IRA/LRA do not 
generate such insns during their work.


So the patch is OK for me.  I especially like that a big part of the code 
is removed.  No code, no problems (including performance ones).  Thank you 
for the patch.



PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Remove.
(regno_in_use_p): Likewise.
(lra_final_code_change): Do not remove noop moves
between hard registers.
---
  gcc/lra-spills.cc | 69 +--
  1 file changed, 1 insertion(+), 68 deletions(-)

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e8cd9..fe58f162d05 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,72 +705,6 @@ alter_subregs (rtx *loc, bool final_p)
return res;
  }




Re: _BitInt vs. _Atomic

2023-08-01 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 31 Jul 2023, Martin Uecker wrote:

> >  Say you have a loop like so:
> > 
> > _Atomic T obj;
> > ...
> > T expected1, expected2, newval;
> > newval = ...;
> > expected1 = ...;
> > do {
> >   expected2 = expected1;
> >   if (atomic_compare_exchange_weak(&obj, &expected2, newval))
> > break;
> >   expected1 = expected2;
> > } while (1);
> > 
> > As written this looks of course stupid, and you may say "don't do that", 
> > but internally the copies might result from temporaries (compiler 
> > generated or wrapper function arguments, or suchlike). 
> >  Now, while 
> > expected2 will contain the copied padding bits after the cmpxchg the 
> > copies to and from expected1 will possibly destroy them.  Either way I 
> > don't see why the above loop should be out-of-spec, so I can write it and 
> > expect it to proceed eventually (certainly when the _strong variant is 
> > used).  Any argument that would declare the above loop out-of-spec I would 
> > consider a defect in the spec.
> 
> It is "out-of-spec" for C in the sense that it can not be
> expected work with the semantics as specified in the C standard.

(I call that a defect.  See below)

> In practice, what the semantics specified using memcpy/memcmp
> allow one to do is to also apply atomic operations on non-atomic 
> types.  This is not guaranteed to work by the C standard, but
> in practice  people often have to do this.  For example, nobody
> is going to copy a 256 GB numerical array with non-atomic types
> into another data structure with atomic versions of the same
> type just so that you can apply atomic operations on it.
> So one simply does an unsafe cast and hopes the compiler does
> not break this.
> 
> If the non-atomic struct now has non-zero values in the padding, 
> and the compiler would clear those automatically for "expected", 
> you would create the problem of an infinite loop (this time 
> for real).

Only because cmpxchg is defined in terms of memcpy/memcmp.  If it were 
defined in terms of the == operator (obviously applied recursively 
member-wise for structs) and simple-assignment that wouldn't be a problem.  
In addition that would get rid of all discussion of what happens or 
doesn't happen with padding.  Introducing reliance on padding bits (which 
IMHO goes against some fundamental ideas of the C standard) has 
far-reaching consequences, see below.  The current definition of the 
atomic_cmpxchg is also inconsistent with the rest of the standard:

We have:

  ... (C is non-atomic variant of A) ...
  _Bool atomic_compare_exchange_strong(volatile A *object,
   C *expected, C desired);
  ... (is equivalent to atomic variant of:) 
  if (memcmp(object, expected, sizeof (*object)) == 0)
    { memcpy(object, &desired, sizeof (*object)); return true; }
  else
    { memcpy(expected, object, sizeof (*object)); return false; }

But we also have:

  The size, representation, and alignment of an atomic type need not be 
  the same as those of the corresponding unqualified type.

  (with later text only suggesting that at least for atomic integer 
  types these please be the same.  But here we aren't talking about
  integer types even.)

So, already the 'memcmp(object, expected, sizeof (*object)' may be 
undefined.  sizeof(*object) need not be the same as sizeof(*expected).
In particular the memcpy in the else branch might clobber memory outside 
*expected.

That alone should be sufficient to show that defining this all in terms of 
memcpy/memcmp is a bad idea.  But it also has other 
consequences: you can't copy (simple-assign) or compare (== operator) 
atomic values anymore reliably and expect the atomic_cmpxchg to work.  My 
example from earlier shows that you can't copy them, a similar one can be 
constructed for breaking ==.

But it goes further: you can also construct an example that shows an 
internal inconsistency just with using atomic_cmpxchg (of course, assume 
all this to run without concurrent accesses to the respective objects):

  _Atomic T obj;
  ...
  T expected, newval;
  expected = ...;
  newval = expected + 1; // just to make it different
  atomic_store (&obj, expected);
  if (atomic_cmpxchg_strong (&obj, &expected, newval)) {
    /* Now we have: obj == newval.
       Do we also have memcmp(&obj,&newval)==0? */
    if (!atomic_cmpxchg_strong (&obj, &newval, expected)) {
      /* No, we can't rely on that!  */
      error("what's going on?");
    }
  } else {
    /* May happen, padding of expected may not be the same
       as in obj, even after atomic_store.  */
    error("WTH? a compare after a store doesn't even work?");
  }

So, even though cmpxchg is defined in terms of memcpy/memcmp, we still 
can't rely on anything after it succeeded (or failed).  Simply because the 
by-value passing of the 'desired' argument will have unknown padding 
(within the implementation of cmpxchg) that isn't necessarily the same as 
the newval object.

Now, about your suggestion of clearing or ignoring the padding bits at 
specific 

[PATCH, OpenACC 2.7, v2] Implement default clause support for data constructs

2023-08-01 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,
this is v2 of the patch for implementing the OpenACC 2.7 addition of
default(none|present) support for data constructs.

Instead of propagating an additional 'oacc_default_kind' for OpenACC,
this patch does it in a more complete way: it directly propagates the
gimplify_omp_ctx* pointer of the innermost context where we found
a default-clause.  This allows the error messages to show the location
and type of the OpenACC construct holding that default clause.

The testcases also add testing of multiple nested data constructs,
where the messages now refer precisely to the innermost default clause
that was active at that program point.
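
Roughly like this (an illustrative sketch, not one of the testcases):

  int x = 1;
  #pragma acc data copy(x)
  {
    #pragma acc data default(none)
    {
      #pragma acc parallel
      x = 0;  /* error: 'x' not specified in enclosing 'data';
                 the inform now points at the inner default(none) */
    }
  }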

Note, I got rid of the dummy OMP_CLAUSE_DEFAULT creation in this version,
since it seemed not really needed.

Re-tested on master on powerpc64le-linux/nvptx. Okay to commit?

Thanks,
Chung-Lin

2023-08-01  Chung-Lin Tang  

gcc/c/ChangeLog:
* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:
* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:
* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:
* gimplify.cc (struct gimplify_omp_ctx): Add oacc_default_clause_ctx
field.
(new_omp_context): Initialize oacc_default_clause_ctx field.
(oacc_region_type_name): New function.
(oacc_default_clause): Lookup current default_kind value from
ctx->oacc_default_clause_ctx, adjust default(none) error and inform
message dumping.
(gimplify_scan_omp_clauses): Upon OMP_CLAUSE_DEFAULT case, set
ctx->oacc_default_clause_ctx to current context.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/default-3.c: Adjust testcase.
* c-c++-common/goacc/default-5.c: Adjust testcase.
* gfortran.dg/goacc/default-3.f95: Adjust testcase.
* gfortran.dg/goacc/default-5.f: Adjust testcase.

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..974f0132787 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -18196,6 +18196,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NO_CREATE)   \
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d7ef5b34d42..bc59fbeac20 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -45860,6 +45860,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token 
*pragma_tok)
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)  \
+   | (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT) \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DETACH)  \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)   \
| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)  \
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 2952cd300ac..c37f843ec3b 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -3802,7 +3802,8 @@ error:
 #define OACC_DATA_CLAUSES \
   (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_DEVICEPTR  | OMP_CLAUSE_COPY\
| OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT | OMP_CLAUSE_CREATE
  \
-   | OMP_CLAUSE_NO_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_ATTACH)
+   | OMP_CLAUSE_NO_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_ATTACH   \
+   | OMP_CLAUSE_DEFAULT)
 #define OACC_LOOP_CLAUSES \
   (omp_mask (OMP_CLAUSE_COLLAPSE) | OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER
  \
| OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ | OMP_CLAUSE_INDEPENDENT \
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 320920ed74c..ec0ccc67da8 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -225,6 +225,7 @@ struct gimplify_omp_ctx
   vec loop_iter_var;
   location_t location;
   enum omp_clause_default_kind default_kind;
+  struct gimplify_omp_ctx *oacc_default_clause_ctx;
   enum omp_region_type region_type;
   enum tree_code code;
   bool combined_loop;
@@ -459,6 +460,10 @@ new_omp_context (enum omp_region_type region_type)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
 c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
+  if (gimplify_omp_ctxp)
+c->oacc_default_clause_ctx = gimplify_omp_ctxp->oacc_default_clause_ctx;
+  else
+c->oacc_default_clause_ctx = c;
   c->defaultmap[GDMK_SCALAR] = 

Re: [PATCH] preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

2023-08-01 Thread Joseph Myers
On Mon, 31 Jul 2023, Lewis Hyatt via Gcc-patches wrote:

> I added some additional testcases from the PR for x86. The other targets
> that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
> already had tests verifying that the pragma sets macros as expected; here I
> have added -save-temps to some of them, to test that it now works in
> preprocess-only mode as well.

It would seem better to have copies of the tests with and without 
-save-temps, to test in both modes, rather than changing what's tested by 
an existing test here.  Or a test variant that #includes the original test 
but uses different options, if the original test isn't doing anything that 
would fail to work with that approach.
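
For instance, the variant file could be as small as (a sketch with a
hypothetical file name):

  /* pragma-target-save-temps.c */
  /* { dg-do compile } */
  /* { dg-options "-save-temps" } */
  #include "pragma-target.c"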

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-08-01 Thread Martin Uecker via Gcc-patches
Am Dienstag, dem 01.08.2023 um 13:27 + schrieb Qing Zhao:
> 
> > On Aug 1, 2023, at 3:51 AM, Martin Uecker via Gcc-patches 
> >  wrote:
> > 


> > > Hi Martin,
> > > Just wondering if it'd be a good idea perhaps to warn if alloc size is
> > > not a multiple of TYPE_SIZE_UNIT instead of just less-than ?
> > > So it can catch cases like:
> > > int *p = malloc (sizeof (int) + 2); // probably intended malloc
> > > (sizeof (int) * 2)
> > > 
> > > FWIW, this is caught using -fanalyzer:
> > > f.c: In function 'f':
> > > f.c:3:12: warning: allocated buffer size is not a multiple of the
> > > pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> > >    3 |   int *p = __builtin_malloc (sizeof(int) + 2);
> > >  |^~
> > > 
> > > Thanks,
> > > Prathamesh
> > 
> > Yes, this is probably a good idea.  It might need special
> > logic for flexible array members then...
> 
> Why special logic for FAM on such warning? (Not a multiple of TYPE_SIZE_UNIT 
> for the element).
> 

For

struct { int n; char buf[]; } *p = malloc(sizeof *p + n);
p->n = n;

the size would not be a multiple.

Martin






[PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-01 Thread Robin Dapp via Gcc-patches
Hi,

this patch adds vector average patterns

 op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
 op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;

If there is no direct support, the vectorizer can synthesize the patterns
but, presumably due to lack of narrowing operation support, won't try
a narrowing shift.  Therefore, this patch implements the expanders instead.

A synthesized pattern results in e.g.:
vsrl.vi v2,v1,1
vsrl.vi v4,v3,1
vand.vv v1,v1,v3
vadd.vv v2,v2,v4
vand.vi v1,v1,1
vadd.vv v1,v2,v1

With this patch we generate:
vwadd.vvv2,v4,v1
vadd.vi v2,1
vnsrl.wiv2,v2,1

We manage to recover (i.e. create the latter sequence) for signed types
but not for unsigned.  I figured that offering both patterns might be the
safe thing to do, but I'm open to leaving the signed one out.  In the long
term we'd want full vectorizer support for this, I suppose.
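
A loop of roughly this shape should now map onto the expanders
(illustrative, not taken from the testsuite):

  #include <stdint.h>

  void
  avg_floor (int8_t *restrict c, int8_t *restrict a,
             int8_t *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      c[i] = ((int16_t) a[i] + (int16_t) b[i]) >> 1;
  }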

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (avg3_floor):
Implement expander.
(avg3_ceil): Ditto.
* config/riscv/vector-iterators.md (ashiftrt): New iterator.
(ASHIFTRT): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vec-avg-run.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-template.h: New test.
---
 gcc/config/riscv/autovec.md   | 66 ++
 gcc/config/riscv/vector-iterators.md  |  5 ++
 .../riscv/rvv/autovec/vec-avg-run.c   | 85 +++
 .../riscv/rvv/autovec/vec-avg-rv32gcv.c   | 10 +++
 .../riscv/rvv/autovec/vec-avg-rv64gcv.c   | 10 +++
 .../riscv/rvv/autovec/vec-avg-template.h  | 33 +++
 6 files changed, 209 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7b784437c7e..23d3c2feaff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1752,3 +1752,69 @@ (define_expand "mask_len_fold_left_plus_"

riscv_vector::reduction_type::MASK_LEN_FOLD_LEFT);
   DONE;
 })
+
+;; -
+;;  [INT] Average.
+;; -
+;; Implements the following "average" patterns:
+;; floor:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
+;; ceil:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1)) >> 1;
+;; -
+
+(define_expand "avg3_floor"
+ [(set (match_operand: 0 "register_operand")
+   (truncate:
+(:VWEXTI
+ (plus:VWEXTI
+  (any_extend:VWEXTI
+   (match_operand: 1 "register_operand"))
+  (any_extend:VWEXTI
+   (match_operand: 2 "register_operand"))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, , mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then a narrowing shift.  */
+  rtx ops2[] = {operands[0], tmp1, const1_rtx};
+  icode = code_for_pred_narrow_scalar (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+  DONE;
+})
+
+(define_expand "avg3_ceil"
+ [(set (match_operand: 0 "register_operand")
+   (truncate:
+(:VWEXTI
+ (plus:VWEXTI
+  (plus:VWEXTI
+   (any_extend:VWEXTI
+   (match_operand: 1 "register_operand"))
+   (any_extend:VWEXTI
+   (match_operand: 2 "register_operand")))
+  (const_int 1)]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, , mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then add 1.  */
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx ops2[] = {tmp2, tmp1, const1_rtx};
+  icode = code_for_pred_scalar (PLUS, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+
+  /* Finally, a narrowing shift.  */
+  rtx ops3[] = {operands[0], tmp2, const1_rtx};
+  icode = code_for_pred_narrow_scalar (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops3);
+  DONE;
+})
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 37c6337f1a3..409f63332c9 100644
--- 

[committed] MAINTAINERS: Add myself to write after approval

2023-08-01 Thread Richard Ball via Gcc-patches

Sponsored by Richard Sandiford 

ChangeLog:

* MAINTAINERS: Add myself.

###

diff --git a/MAINTAINERS b/MAINTAINERS
index 49aa6bae73b..a8bb43f50c3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -332,6 +332,7 @@ David Ayers 


 Prakhar Bahuguna   
 Giovanni Bajo  
 Simon Baldwin  
+Richard Ball   
 Scott Bambrough 


 Wolfgang Bangerth  
 Gergö Barany   


[PATCH] analyzer: stash values for CPython plugin [PR107646]

2023-08-01 Thread Eric Feng via Gcc-patches
Hi all,

This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].
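
For instance, a plugin could use it along these lines (a hypothetical
fragment; only the two new entry points come from this patch):

  static tree stashed_pyobject_type;

  static void
  stash_cpython_entities (logger *, const translation_unit &tu)
  {
    /* Record the PyObject record type for later use.  */
    stashed_pyobject_type
      = tu.lookup_type_by_id (get_identifier ("PyObject"));
  }

  /* ... in plugin initialization ...  */
  ana::register_finish_translation_unit_callback (stash_cpython_entities);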

Bootstrapped and tested on aarch64-unknown-linux-gnu. Does it look okay?

---

gcc/analyzer/ChangeLog:

* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:

* c-parser.cc: New functions.

gcc/testsuite/ChangeLog:

* gcc.dg/plugin/analyzer_cpython_plugin.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  26 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 224 ++
 4 files changed, 281 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

diff --git a/gcc/analyzer/analyzer-language.cc
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..fc41b9c17b8 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map<tree, tree> *analyzer_stashed_constants;
 #if ENABLE_ANALYZER

 namespace ana {
+static vec<finish_translation_unit_callback>
+*finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+void
+run_callbacks (logger *logger, const translation_unit &tu)
+{
+  for (auto const &cb : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}

 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit &tu)
 the_logger.set_logger (new logger (logfile, 0, 0,
 *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }

 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H

+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER

 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };

+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */

void on_finish_translation_unit (const translation_unit &tu);
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 80920b31f83..f0ee55e416b 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,32 @@ public:
 return NULL_TREE;
   }

+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+  {
+ if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+ tree record_type = TREE_TYPE (type_decl);
+ if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+  }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+ return var_decl;
+
+return NULL_TREE;
+  }
+
 private:
   /* Attempt to get an INTEGER_CST from MACRO.
  Only handle the simplest cases: where MACRO's definition is a single
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
new file mode 100644
index 000..285da102edb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -0,0 +1,224 @@
+/* -fanalyzer plugin for CPython extension modules  */
+/* { dg-options "-g" } */
+
+#define INCLUDE_MEMORY
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "diagnostic-core.h"
+#include 

Re: RISCV test infrastructure for d / v / zfh extensions

2023-08-01 Thread Robin Dapp via Gcc-patches
Hi Joern,

thanks, I believe this will help with testing.

> +proc check_effective_target_riscv_v { } {
> +return [check_no_compiler_messages riscv_ext_v assembly {
> +   #ifndef __riscv_v
> +   #error "Not __riscv_v"
> +   #endif
> +}]
> +}
This can be replaced by riscv_vector or vice versa.

> +# Return 1 if we can execute code when using dg-add-options riscv_v
> +
> +proc check_effective_target_riscv_v_ok { } {
> +# If the target already supports v without any added options,
> +# we may assume we can execute just fine.
> +if { [check_effective_target_riscv_v] } {
> + return 1
> +}
> +
> +# check if we can execute vector insns with the given hardware or
> +# simulator
> +set gcc_march [regsub {[[:alnum:]]*} [riscv_get_arch] ]
> +if { [check_runtime ${gcc_march}_exec {
> +   int main() {  asm("vsetivli t0, 9, e8, m1, tu, ma"); return 0; } } 
> "-march=${gcc_march}"] } {
> + return 1
> +}
> +
> +# Possible future extensions: If the target is a simulator, 
> dg-add-options
> +# might change its config to make it allow vector insns, or we might use
> +# options to set special elf flags / sections to effect that.
> +
> +return 0
> +}
So in general we would add {dg-add-options riscv_v} for every
test that requires compile-time vector support?

For a run test we would check {dg-require-effective-target riscv_v_ok}
before?

Would it make sense to skip the first check here
(check_effective_target_riscv_v) so we have a proper runtime check?
Right now we assume the runtime can execute vector instructions if
the compiler can emit them.  You could replace riscv_vector_hw and
riscv_zvfh_hw by your versions then and we'd have a clear separation
between runtime and compile time.
We would just need to make sure not to add "v" twice if it's already
in the march string.

> +if { [string equal $gcc_march "imafd"] } {
> + set gcc_march "g"
> +}
Wouldn't we want to always replace "imafd" with "g" for
simplicity/consistency and not just the exact string? 

> +proc add_options_for_riscv_v { flags } {
> +if { [lsearch $flags -march=*] >= 0 } {
> + # If there are multiple -march flags, we have to adjust all of them.
> + # ??? Is there a way to make the match specific to a full list element?
> + # as it is, we might match something inside a string.
> + return [regsub -all -- {(-march=rv[[:digit:]]*[a-rt-uwy]*)v*} $flags 
> \\1v ]

Is iterating over the list elements and returning a new list
not an option?  Or would that break something else?

Regards
 Robin



[Patch,avr,committed] Fix PR target/110220: Set JUMP_LABEL as required.

2023-08-01 Thread Georg-Johann Lay
Committed as obvious.  An insn emitted by the avr-specific RTL
optimization pass missed setting its JUMP_LABEL, which later jump
optimization passes rely on.


Johann

target/110220: Set JUMP_LABEL and LABEL_NUSES of new branch insn 
generated by

target-specific RTL optimization pass .avr-casesi.

gcc/
PR target/110220
* config/avr/avr.cc (avr_optimize_casesi): Set JUMP_LABEL and
LABEL_NUSES of new conditional branch instruction.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 0447641a8e9..25f3f4c22e0 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -644,9 +644,11 @@ avr_optimize_casesi (rtx_insn *insns[5], rtx *xop)
   emit_insn (gen_add (reg, reg, gen_int_mode (-low_idx, mode)));
   rtx op0 = reg; rtx op1 = gen_int_mode (num_idx, mode);
   rtx labelref = copy_rtx (xop[4]);
-  emit_jump_insn (gen_cbranch (gen_rtx_fmt_ee (GTU, VOIDmode, op0, op1),
-   op0, op1,
-   labelref));
+  rtx xbranch = gen_cbranch (gen_rtx_fmt_ee (GTU, VOIDmode, op0, op1),
+op0, op1, labelref);
+  rtx_insn *cbranch = emit_jump_insn (xbranch);
+  JUMP_LABEL (cbranch) = xop[4];
+  ++LABEL_NUSES (xop[4]);

   seq1 = get_insns();
   last1 = get_last_insn();


Re: [RFC] light expander sra for parameters and returns

2023-08-01 Thread Jiufu Guo via Gcc-patches


Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Mon, 24 Jul 2023, Jiufu Guo wrote:
>>
>>> 
>>> Hi Martin,
>>> 
>>> Not sure about your current option about re-using the ipa-sra code
>>> in the light-expander-sra. And if anything I could input please
>>> let me know.
>>>
...
>>
>> What I was hoping for is shared stmt-level analysis and a shared
>> data structure for the "access"(es) a stmt performs.  Because that
>> can come up handy in multiple places.  The existing SRA data
>> structures could easily embed that subset for example if sharing
>> the whole data structure of [IPA] SRA seems too unwieldly.
>
> Understand.
> The stmt-level analysis and "access" data structure are similar
> between ipa-sra/tree-sra and the expander-sra.
>
> I just updated the patch; this version does not change the behavior
> of the previous version.  It only cleans up and merges some functions.
> The patch is attached.
>
> This version (and tree-sra/ipa-sra) still uses similar
> "stmt analyze" and "access struct" code.  This could be extracted as
> shared code.
> I'm thinking to update the code to use the same "base_access" and
> "walk function".

I'm drafting code for the shared stmt-analyze and access-structure.
The code may like below.

BR,
Jeff (Jiufu Guo)

---
struct base_access
{
  /* Values returned by get_ref_base_and_extent, indicates the
 OFFSET, SIZE and BASE of the access.  */
  HOST_WIDE_INT offset;
  HOST_WIDE_INT size;
  tree base;

  /* The context expression of this access.  */
  tree expr;

  /* Indicates this is a write access.  */
  bool write : 1;

  /* Indicates if this access is made in reverse storage order.  */
  bool reverse : 1;
};

/* Default template for sra_scan_function.  */

struct default_analyzer
{
  /* Template analyze functions.  */
  void analyze_phi (gphi *){};
  void pre_analyze_stmt (gimple *){};
  void analyze_return (greturn *){};
  void analyze_assign (gassign *){};
  void analyze_call (gcall *){};
  void analyze_asm (gasm *){};
  void analyze_default_stmt (gimple *){};
};

/* Scan function and look for interesting expressions.  */

template <typename analyzer>
void
sra_scan_function (struct function *fun, analyzer &a)
{
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
{
  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
   gsi_next (&gsi))
a.analyze_phi (gsi.phi ());

  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
   gsi_next (&gsi))
{
  gimple *stmt = gsi_stmt (gsi);
  a.pre_analyze_stmt (stmt);

  switch (gimple_code (stmt))
{
case GIMPLE_RETURN:
  a.analyze_return (as_a <greturn *> (stmt));
  break;

case GIMPLE_ASSIGN:
  a.analyze_assign (as_a <gassign *> (stmt));
  break;

case GIMPLE_CALL:
  a.analyze_call (as_a <gcall *> (stmt));
  break;

case GIMPLE_ASM:
  a.analyze_asm (as_a <gasm *> (stmt));
  break;

default:
  a.analyze_default_stmt (stmt);
  break;
}
}
}
}


struct access : public base_access
{
  /* The rtx for the access: link to incoming/returning register(s).  */
  rtx rtx_val;
};

struct expand_access_analyzer : public default_analyzer
{
  /* Now use default APIs, no actions for
 pre_analyze_stmt, analyze_return.  */

  /* overwrite analyze_default_stmt.  */
  void analyze_default_stmt (gimple *);

  /* overwrite analyze phi,call,asm .  */
  void analyze_phi (gphi *stmt) { analyze_default_stmt (stmt); };
  void analyze_call (gcall *stmt) { analyze_default_stmt (stmt); };
  void analyze_asm (gasm *stmt) { analyze_default_stmt (stmt); };  

  /* overwrite analyze_assign.  */
  void analyze_assign (gassign *);
};
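
The intended use from the expander would then be roughly (a sketch
based on the draft above):

  /* Collect parameter/return accesses of the current function.  */
  expand_access_analyzer analyzer;
  sra_scan_function (cfun, analyzer);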


>
>>
>> With a stmt-leve API using FOR_EACH_IMM_USE_STMT would still be
>> possible (though RTL expansion pre-walks all stmts anyway).
>
> Yeap, I also notice that "FOR_EACH_IMM_USE_STMT" is not enough.
> For struct parameters, walking stmt is needed.
>
>
> BR,
> Jeff (Jiufu Guo)
>
> -
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index edf292cfbe9..8c36ad5df79 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -97,6 +97,502 @@ static bool defer_stack_allocation (tree, bool);
>  
>  static void record_alignment_for_reg_var (unsigned int);
>  
> +extern rtx
> +expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
> +
> +/* For light SRA in the expander for parameters and returns.  */
> +namespace
> +{
> +
> +struct access
> +{
> +  /* Each access on the aggregate is described by OFFSET/SIZE.  */
> +  HOST_WIDE_INT offset;
> +  HOST_WIDE_INT size;
> +
> +  bool writing;
> +
> +  /* The context expression of this access.  */
> +  tree expr;
> +
> +  /* The rtx for the access: link to incoming/returning register(s).  */
> +  rtx rtx_val;
> +};
> +
> +typedef struct access *access_p;
> +
> 

Re: [PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

2023-08-01 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 7/19/23 04:11, Xiao Zeng wrote:
>> This patch completes the recognition of the basic semantics
>> defined in the spec, namely:
>> 
>> Conditional zero, if condition is equal to zero
>>rd = (rs2 == 0) ? 0 : rs1
>> Conditional zero, if condition is non zero
>>rd = (rs2 != 0) ? 0 : rs1
>> 
>> gcc/ChangeLog:
>> 
>>  * config/riscv/riscv.md: Include zicond.md
>>  * config/riscv/zicond.md: New file.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/riscv/zicond-primitiveSemantics.c: New test.
> So I played with this a bit today.  I originally thought that using 
> match_dup was the right way to go for those 4 secondary patterns.  But 
> after further pondering it's not ideal.
>
> match_dup will require pointer equality within the RTL structure.  That 
> could inhibit detection in two cases.  First, SUBREGs.   SUBREGs are not 
> shared.  So we'd never match if we had a SUBREG expression.
>
> Second, post register allocation we can have the same looking RTX, but 
> it may not be pointer equal.

Where were you seeing the requirement for pointer equality?  genrecog.cc
at least uses rtx_equal_p, and I think it has to.  E.g. some patterns
use (match_dup ...) to match output and input mems, and mem rtxes
shouldn't be shared.

I'd always understood using matching constraints against other inputs
to be a no-no, since the RA doesn't (and can't reasonably be expected to)
make two non-identical inputs match.  So AIUI, using "1" won't lead to
different code generation compared to "r".  Both are relying on the RA
happening to do the right thing.  But "1" would presumably trigger an
ICE if something goes wrong.

Thanks,
Richard

> The SUBREG issue also means that we don't want to use a REGNO (x) == 
> REGNO (y) style check because those macros are only valid on REG 
> expressions.  We could strip the SUBREG, but that's usually awkward to 
> do in a pattern's condition.
>
> The net result is we probably should use rtx_equal_p which I was hoping 
> to avoid.  I'm testing with that change to the 4 secondary patterns 
> right now.  Assuming that passes (and I have no reason to think it 
> won't) then I'll go ahead and commit #1 and #2 from this series which is 
> all I have time for today.
>
>
>
> Jeff
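
For reference, the Zicond primitives quoted at the top of this thread
compute, in C terms (a sketch of the spec semantics, not GCC code):

#include <stdint.h>

uint64_t czero_eqz (uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? rs1 : 0; }
uint64_t czero_nez (uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? 0 : rs1; }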


Fix profile update after vectorizer peeling

2023-08-01 Thread Jan Hubicka via Gcc-patches
Hi,
This patch fixes the profile update after constant peeling in the
prologue.  We now reach 0 profile update bugs on tramp3d vectorization
and also on quite a few testcases, so I am enabling the testsuite
checks so we do not regress again.

Bootstrapped/regtested x86_64, committed.

Honza

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_do_peeling): Fix profile update after
constant prologue peeling.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-1-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-1.c: Check profile consistency.
* gcc.dg/vect/vect-10-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-10.c: Check profile consistency.
* gcc.dg/vect/vect-100.c: Check profile consistency.
* gcc.dg/vect/vect-103.c: Check profile consistency.
* gcc.dg/vect/vect-104.c: Check profile consistency.
* gcc.dg/vect/vect-105-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-105.c: Check profile consistency.
* gcc.dg/vect/vect-106.c: Check profile consistency.
* gcc.dg/vect/vect-107.c: Check profile consistency.
* gcc.dg/vect/vect-108.c: Check profile consistency.
* gcc.dg/vect/vect-109.c: Check profile consistency.
* gcc.dg/vect/vect-11.c: Check profile consistency.
* gcc.dg/vect/vect-110.c: Check profile consistency.
* gcc.dg/vect/vect-112-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-112.c: Check profile consistency.
* gcc.dg/vect/vect-113.c: Check profile consistency.
* gcc.dg/vect/vect-114.c: Check profile consistency.
* gcc.dg/vect/vect-115.c: Check profile consistency.
* gcc.dg/vect/vect-116.c: Check profile consistency.
* gcc.dg/vect/vect-117.c: Check profile consistency.
* gcc.dg/vect/vect-118.c: Check profile consistency.
* gcc.dg/vect/vect-119.c: Check profile consistency.
* gcc.dg/vect/vect-11a.c: Check profile consistency.
* gcc.dg/vect/vect-12.c: Check profile consistency.
* gcc.dg/vect/vect-120.c: Check profile consistency.
* gcc.dg/vect/vect-121.c: Check profile consistency.
* gcc.dg/vect/vect-122.c: Check profile consistency.
* gcc.dg/vect/vect-123.c: Check profile consistency.
* gcc.dg/vect/vect-124.c: Check profile consistency.
* gcc.dg/vect/vect-126.c: Check profile consistency.
* gcc.dg/vect/vect-13.c: Check profile consistency.
* gcc.dg/vect/vect-14.c: Check profile consistency.
* gcc.dg/vect/vect-15-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-15.c: Check profile consistency.
* gcc.dg/vect/vect-17.c: Check profile consistency.
* gcc.dg/vect/vect-18.c: Check profile consistency.
* gcc.dg/vect/vect-19.c: Check profile consistency.
* gcc.dg/vect/vect-2-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-2.c: Check profile consistency.
* gcc.dg/vect/vect-20.c: Check profile consistency.
* gcc.dg/vect/vect-21.c: Check profile consistency.
* gcc.dg/vect/vect-22.c: Check profile consistency.
* gcc.dg/vect/vect-23.c: Check profile consistency.
* gcc.dg/vect/vect-24.c: Check profile consistency.
* gcc.dg/vect/vect-25.c: Check profile consistency.
* gcc.dg/vect/vect-26.c: Check profile consistency.
* gcc.dg/vect/vect-27.c: Check profile consistency.
* gcc.dg/vect/vect-28.c: Check profile consistency.
* gcc.dg/vect/vect-29.c: Check profile consistency.
* gcc.dg/vect/vect-3.c: Check profile consistency.
* gcc.dg/vect/vect-30.c: Check profile consistency.
* gcc.dg/vect/vect-31-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-31.c: Check profile consistency.
* gcc.dg/vect/vect-32-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-32-chars.c: Check profile consistency.
* gcc.dg/vect/vect-32.c: Check profile consistency.
* gcc.dg/vect/vect-33-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-33.c: Check profile consistency.
* gcc.dg/vect/vect-34-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-34.c: Check profile consistency.
* gcc.dg/vect/vect-35-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-35.c: Check profile consistency.
* gcc.dg/vect/vect-36-big-array.c: Check profile consistency.
* gcc.dg/vect/vect-36.c: Check profile consistency.
* gcc.dg/vect/vect-38.c: Check profile consistency.
* gcc.dg/vect/vect-4.c: Check profile consistency.
* gcc.dg/vect/vect-40.c: Check profile consistency.
* gcc.dg/vect/vect-42.c: Check profile consistency.
* gcc.dg/vect/vect-44.c: Check profile consistency.
* gcc.dg/vect/vect-46.c: Check profile consistency.
* gcc.dg/vect/vect-48.c: Check profile consistency.
* gcc.dg/vect/vect-5.c: 

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-01 Thread Hao Liu OS via Gcc-patches
Hi Richard,

This is a quick fix for the several ICEs.  It seems that even when
STMT_VINFO_LIVE_P is true, some reduction stmts still don't have a
REDUC_DEF.  So I changed the check to STMT_VINFO_REDUC_DEF.

Is it OK for trunk?

---
Fix the ICEs on an empty reduction definition.  Even when
STMT_VINFO_LIVE_P is true, some reduction stmts still don't have a
definition.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (aarch64_force_single_cycle): Check
STMT_VINFO_REDUC_DEF to avoid failures in info_for_reduction.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d76025545..5b8d8fa8e2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16776,7 +16776,7 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
 static bool
 aarch64_force_single_cycle (vec_info *vinfo, stmt_vec_info stmt_info)
 {
-  if (!STMT_VINFO_LIVE_P (stmt_info))
+  if (!STMT_VINFO_REDUC_DEF (stmt_info))
 return false;

   auto reduc_info = info_for_reduction (vinfo, stmt_info);
--
2.40.0



From: Richard Sandiford 
Sent: Monday, July 31, 2023 17:11
To: Hao Liu OS
Cc: Richard Biener; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Hao Liu OS  writes:
>> Which test case do you see this for?  The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests.  It is still the existing case causing the 
> previous ICE (i.e. assertion problem): gcc.target/aarch64/sve/cost_model_13.c.
>
> It's not the test case itself failed, but the dump message of vect says the 
> "reduction latency" is 0:
>
> Before the change:
> cost_model_13.c:7:21: note:  Original vector body cost = 6
> cost_model_13.c:7:21: note:  Scalar issue estimate:
> cost_model_13.c:7:21: note:load operations = 1
> cost_model_13.c:7:21: note:store operations = 0
> cost_model_13.c:7:21: note:general operations = 1
> cost_model_13.c:7:21: note:reduction latency = 1
> cost_model_13.c:7:21: note:estimated min cycles per iteration = 1.00
> cost_model_13.c:7:21: note:estimated cycles per vector iteration (for VF 
> 8) = 8.00
> cost_model_13.c:7:21: note:  Vector issue estimate:
> cost_model_13.c:7:21: note:load operations = 1
> cost_model_13.c:7:21: note:store operations = 0
> cost_model_13.c:7:21: note:general operations = 1
> cost_model_13.c:7:21: note:reduction latency = 2
> cost_model_13.c:7:21: note:estimated min cycles per iteration = 2.00
>
> After the change:
> cost_model_13.c:7:21: note:  Original vector body cost = 6
> cost_model_13.c:7:21: note:  Scalar issue estimate:
> cost_model_13.c:7:21: note:load operations = 1
> cost_model_13.c:7:21: note:store operations = 0
> cost_model_13.c:7:21: note:general operations = 1
> cost_model_13.c:7:21: note:reduction latency = 0 <--- seems not 
> consistent with above result
> cost_model_13.c:7:21: note:estimated min cycles per iteration = 1.00
> cost_model_13.c:7:21: note:estimated cycles per vector iteration (for VF 
> 8) = 8.00
> cost_model_13.c:7:21: note:  Vector issue estimate:
> cost_model_13.c:7:21: note:load operations = 1
> cost_model_13.c:7:21: note:store operations = 0
> cost_model_13.c:7:21: note:general operations = 1
> cost_model_13.c:7:21: note:reduction latency = 0 <--- seems not 
> consistent with above result
> cost_model_13.c:7:21: note:estimated min cycles per iteration = 1.00  
><--- seems not consistent with above result
>
> BTW. this should be caused by the reduction stmt is not live, which indicates 
> whether this stmts is part of a computation whose result is used outside the 
> loop (tree-vectorized.h:1204):
>   :
>   # res_18 = PHI 
>   # i_20 = PHI 
>   _1 = (long unsigned int) i_20;
>   _2 = _1 * 2;
>   _3 = x_14(D) + _2;
>   _4 = *_3;
>   _5 = (unsigned short) _4;
>   res.0_6 = (unsigned short) res_18;
>   _7 = _5 + res.0_6; <-- This is not live, may be 
> caused by the below type cast stmt.
>   res_15 = (short int) _7;
>   i_16 = i_20 + 1;
>   if (n_11(D) > i_16)
> goto ;
>   else
> goto ;
>
>   :
>   goto ;

Ah, I see, thanks.  My concern was: if requiring !STMT_VINFO_LIVE_P stmts
can cause "normal" reductions to have a latency of 0, could the same thing
happen for single-cycle reductions?  But I suppose the answer is "no".
Introducing a cast like the above would cause reduc_chain_length > 1,
and so:

  if (ncopies > 1
  && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
  && reduc_chain_length == 1
  && loop_vinfo->suggested_unroll_factor == 1)
single_defuse_cycle = true;

wouldn't trigger.  Which makes the single-cycle thing a bit 

Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-01 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
> Richard Biener via Gcc-patches  writes:
>> The following makes sure to limit the shift operand when vectorizing
>> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
>> operand otherwise invokes undefined behavior.  When we determine
>> whether we can demote the operand we know we at most shift in the
>> sign bit so we can adjust the shift amount.
>>
>> Note this has the possibility of un-CSEing common shift operands
>> as there's no good way to share pattern stmts between patterns.
>> We'd have to separately pattern recognize the definition.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>>
>> Not sure about LSHIFT_EXPR, it probably has the same issue but
>> the fallback optimistic zero for out-of-range shifts is at least
>> "corrrect".  Not sure we ever try to demote rotates (probably not).
>
> I guess you mean "correct" for x86?  But that's just a quirk of x86.
> IMO the behaviour is equally wrong for LSHIFT_EXPR.

Sorry for the multiple messages.  Wanted to get something out quickly
because I wasn't sure how long it would take me to write this...

On rotates, for:

void
foo (unsigned short *restrict ptr)
{
  for (int i = 0; i < 200; ++i)
{
  unsigned int x = ptr[i] & 0xff0;
  ptr[i] = (x << 1) | (x >> 31);
}
}

we do get:

can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;

although aarch64 doesn't provide rrotate patterns, so nothing actually
comes of it.

I think the handling of variable shifts is flawed for other reasons.  Given:

void
uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
ptr1[i] = ptr1[i] >> ptr2[i];
}

void
us (unsigned short *restrict ptr1, short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
ptr1[i] = ptr1[i] >> ptr2[i];
}

void
su (short *restrict ptr1, unsigned short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
ptr1[i] = ptr1[i] >> ptr2[i];
}

void
ss (short *restrict ptr1, short *restrict ptr2)
{
  for (int i = 0; i < 200; ++i)
ptr1[i] = ptr1[i] >> ptr2[i];
}

we only narrow uu and ss, due to:

/* Ignore codes that don't take uniform arguments.  */
if (!types_compatible_p (TREE_TYPE (op), type))
  return;

in vect_determine_precisions_from_range.  Maybe we should drop
the shift handling from there and instead rely on
vect_determine_precisions_from_users, extending:

if (TREE_CODE (shift) != INTEGER_CST
|| !wi::ltu_p (wi::to_widest (shift), precision))
  return;

to handle ranges where the max is known to be < precision.

There again, if masking is enough for right shifts and right rotates,
maybe we should keep the current handling for then (with your fix)
and skip the types_compatible_p check for those cases.

So:

- restrict shift handling in vect_determine_precisions_from_range to
  RSHIFT_EXPR and RROTATE_EXPR

- remove types_compatible_p restriction for those cases

- extend vect_determine_precisions_from_users shift handling to check
  for ranges on the shift amount, as sketched below
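
Concretely, the extended check might look something like this (a
sketch; the wi:: calls and variable names are assumptions based on the
code quoted above):

  wide_int min_value, max_value;
  if (TREE_CODE (shift) == INTEGER_CST
      ? !wi::ltu_p (wi::to_widest (shift), precision)
      : (!vect_get_range_info (shift, &min_value, &max_value)
         || wi::geu_p (max_value, precision)))
    return;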

Does that sound right?

Thanks,
Richard


Re: [PATCH v2] combine: Narrow comparison of memory and constant

2023-08-01 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Tue, Aug 01, 2023 at 01:52:16PM +0530, Prathamesh Kulkarni wrote:
> On Tue, 1 Aug 2023 at 05:20, Jeff Law  wrote:
> >
> >
> >
> > On 7/31/23 15:43, Prathamesh Kulkarni via Gcc-patches wrote:
> > > On Mon, 19 Jun 2023 at 19:59, Stefan Schulze Frielinghaus via
> > > Gcc-patches  wrote:
> > >>
> > >> Comparisons between memory and constants might be done in a smaller mode
> > >> resulting in smaller constants which might finally end up as immediates
> > >> instead of in the literal pool.
> > >>
> > >> For example, on s390x a non-symmetric comparison like
> > >>x <= 0x3fff
> > >> results in the constant being spilled to the literal pool and an 8 byte
> > >> memory comparison is emitted.  Ideally, an equivalent comparison
> > >>x0 <= 0x3f
> > >> where x0 is the most significant byte of x, is emitted where the
> > >> constant is smaller and more likely to materialize as an immediate.
> > >>
> > >> Similarly, comparisons of the form
> > >>x >= 0x4000
> > >> can be shortened into x0 >= 0x40.
> > >>
> > >> Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
> > >> Note, the new tests show that for the mentioned little-endian targets
> > >> the optimization does not materialize since either the costs of the new
> > >> instructions are higher or they do not match.  Still ok for mainline?
> > > Hi Stefan,
> > > Unfortunately this patch (committed in 
> > > 7cdd0860949c6c3232e6cff1d7ca37bb5234074c)
> > > caused the following ICE on armv8l-unknown-linux-gnu:
> > > during RTL pass: combine
> > > ../../../gcc/libgcc/fixed-bit.c: In function ‘__gnu_saturate1sq’:
> > > ../../../gcc/libgcc/fixed-bit.c:210:1: internal compiler error: in
> > > decompose, at rtl.h:2297
> > >210 | }
> > >| ^
> > > 0xaa23e3 wi::int_traits<std::pair<rtx_def*, machine_mode>
> > > >::decompose(long long*, unsigned int, std::pair<rtx_def*,
> > > machine_mode> const&)
> > >  ../../gcc/gcc/rtl.h:2297
> > [ ... ]
> > Yea, we're seeing something very similar on nios2-linux-gnu building the
> > kernel.
> >
> > Prathamesh, can you extract the .i file for fixed-bit on armv8 and open
> > a bug for this issue, attaching the .i file as well as the right command
> > line options necessary to reproduce the failure.  That way Stefan can
> > tackle it with a cross compiler.
> Hi Jeff,
> Filed the issue in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110867

Hi Prathamesh,

Sorry for the inconvenience.  I will have a look at this and thanks for
the small reproducer.  I already started to come up with a cross
compiler.

Thanks,
Stefan

> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > jeff
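
To illustrate the equivalence the patch exploits, a standalone sketch
(not the combine code itself):

#include <stdbool.h>
#include <stdint.h>

/* x <= 0x3fffffffffffffff iff the most significant byte of x is
   <= 0x3f, because the remaining constant bytes are all 0xff.  */
static bool le_wide (uint64_t x)   { return x <= 0x3fffffffffffffffULL; }
static bool le_narrow (uint64_t x) { return (uint8_t) (x >> 56) <= 0x3f; }
/* le_wide (x) == le_narrow (x) for all x.  */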


[PATCH] MAINTAINERS: correct my email address

2023-08-01 Thread Jan Beulich via Gcc-patches
The @novell.com one has been out of use for quite some time.

ChangeLog:

* MAINTAINERS: Correct my email address.

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -344,7 +344,7 @@ Andrew Bennett  

 Daniel Berlin  
 Pat Bernardi   
-Jan Beulich
+Jan Beulich
 David Billinghurst 

 Tomas Bily 
 Laurynas Biveinis  


Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-01 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> The following makes sure to limit the shift operand when vectorizing
> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
> operand otherwise invokes undefined behavior.  When we determine
> whether we can demote the operand we know we at most shift in the
> sign bit so we can adjust the shift amount.
>
> Note this has the possibility of un-CSEing common shift operands
> as there's no good way to share pattern stmts between patterns.
> We'd have to separately pattern recognize the definition.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Not sure about LSHIFT_EXPR, it probably has the same issue but
> the fallback optimistic zero for out-of-range shifts is at least
> "corrrect".  Not sure we ever try to demote rotates (probably not).

I guess you mean "correct" for x86?  But that's just a quirk of x86.
IMO the behaviour is equally wrong for LSHIFT_EXPR.

Richard

> OK?
>
> Thanks,
> Richard.
>
>   PR tree-optimization/110838
>   * tree-vect-patterns.cc (vect_recog_over_widening_pattern):
>   Adjust the shift operand of RSHIFT_EXPRs.
>
>   * gcc.dg/torture/pr110838.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr110838.c | 43 +
>  gcc/tree-vect-patterns.cc   | 24 ++
>  2 files changed, 67 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr110838.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr110838.c 
> b/gcc/testsuite/gcc.dg/torture/pr110838.c
> new file mode 100644
> index 000..f039bd6c8ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr110838.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +
> +typedef __UINT32_TYPE__ uint32_t;
> +typedef __UINT8_TYPE__ uint8_t;
> +typedef __INT8_TYPE__ int8_t;
> +typedef uint8_t pixel;
> +
> +/* get the sign of input variable (TODO: this is a dup, make common) */
> +static inline int8_t signOf(int x)
> +{
> +  return (x >> 31) | ((int)(((uint32_t)-x) >> 31));
> +}
> +
> +__attribute__((noipa))
> +static void calSign_bug(int8_t *dst, const pixel *src1, const pixel *src2, 
> const int endX)
> +{
> +  for (int x = 0; x < endX; x++)
> +dst[x] = signOf(src1[x] - src2[x]);
> +}
> +
> +__attribute__((noipa, optimize(0)))
> +static void calSign_ok(int8_t *dst, const pixel *src1, const pixel *src2, 
> const int endX)
> +{
> +  for (int x = 0; x < endX; x++)
> +dst[x] = signOf(src1[x] - src2[x]);
> +}
> +
> +__attribute__((noipa, optimize(0)))
> +int main()
> +{
> +  const pixel s1[9] = { 0xcd, 0x33, 0xd4, 0x3e, 0xb0, 0xfb, 0x95, 0x64, 
> 0x70, };
> +  const pixel s2[9] = { 0xba, 0x9f, 0xab, 0xa1, 0x3b, 0x29, 0xb1, 0xbd, 
> 0x64, };
> +  int endX = 9;
> +  int8_t dst[9];
> +  int8_t dst_ok[9];
> +
> +  calSign_bug(dst, s1, s2, endX);
> +  calSign_ok(dst_ok, s1, s2, endX);
> +
> +  if (__builtin_memcmp(dst, dst_ok, endX) != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index ef806e2346e..e4ab8c2d65b 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -3099,9 +3099,33 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
>tree ops[3] = {};
>for (unsigned int i = 1; i < first_op; ++i)
>  ops[i - 1] = gimple_op (last_stmt, i);
> +  /* For right shifts limit the shift operand.  */
>vect_convert_inputs (vinfo, last_stmt_info, nops, &ops[first_op - 1],
>  op_type, &unprom[0], op_vectype);
>  
> +  /* Limit shift operands.  */
> +  if (code == RSHIFT_EXPR)
> +{
> +  wide_int min_value, max_value;
> +  if (TREE_CODE (ops[1]) == INTEGER_CST)
> + ops[1] = wide_int_to_tree (op_type,
> +wi::bit_and (wi::to_wide (ops[1]),
> + new_precision - 1));
> +  else if (!vect_get_range_info (ops[1], &min_value, &max_value)
> +|| wi::ge_p (max_value, new_precision, TYPE_SIGN (op_type)))
> + {
> +   /* ???  Note the following bad for SLP as that only supports
> +  same argument widened shifts and it un-CSEs same arguments.  */
> +   tree new_var = vect_recog_temp_ssa_var (op_type, NULL);
> +   gimple *pattern_stmt
> + = gimple_build_assign (new_var, BIT_AND_EXPR, ops[1],
> +build_int_cst (op_type, new_precision - 1));
> +   ops[1] = new_var;
> +   gimple_set_location (pattern_stmt, gimple_location (last_stmt));
> +   append_pattern_def_seq (vinfo, last_stmt_info, pattern_stmt);
> + }
> +}
> +
>/* Use the operation to produce a result of type OP_TYPE.  */
>tree new_var = vect_recog_temp_ssa_var (op_type, NULL);
>gimple *pattern_stmt = gimple_build_assign (new_var, code,


[PATCH][COMMITTED] doc: Fix spelling in arm_v8_1m_main_cde_mve_fp

2023-08-01 Thread Christophe Lyon via Gcc-patches
Fix spelling mistakes introduced by my previous patch in this area.

Committed as obvious.

2023-08-01  Christophe Lyon  

gcc/
* doc/sourcebuild.texi (arm_v8_1m_main_cde_mve_fp): Fix spelling.
---
 gcc/doc/sourcebuild.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index e5d15d67253..1a78b3c1abb 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2186,12 +2186,12 @@ the Custom Datapath Extension (CDE) and floating-point 
(VFP).
 Some multilibs may be incompatible with these options.
 
 @item arm_v8_1m_main_cde_mve
-Arm target supports options to generate instructions from Arm.1-M with
+Arm target supports options to generate instructions from Armv8.1-M with
 the Custom Datapath Extension (CDE) and M-Profile Vector Extension (MVE).
 Some multilibs may be incompatible with these options.
 
 @item arm_v8_1m_main_cde_mve_fp
-ARM target supports options to generate instructions from ARMv8.1-M
+Arm target supports options to generate instructions from Armv8.1-M
 with the Custom Datapath Extension (CDE) and M-Profile Vector
 Extension (MVE) with floating-point support.  Some multilibs may be
 incompatible with these options.
-- 
2.34.1



Re: [PATCH v2] combine: Narrow comparison of memory and constant

2023-08-01 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 1 Aug 2023 at 05:20, Jeff Law  wrote:
>
>
>
> On 7/31/23 15:43, Prathamesh Kulkarni via Gcc-patches wrote:
> > On Mon, 19 Jun 2023 at 19:59, Stefan Schulze Frielinghaus via
> > Gcc-patches  wrote:
> >>
> >> Comparisons between memory and constants might be done in a smaller mode
> >> resulting in smaller constants which might finally end up as immediates
> >> instead of in the literal pool.
> >>
> >> For example, on s390x a non-symmetric comparison like
> >>x <= 0x3fff
> >> results in the constant being spilled to the literal pool and an 8 byte
> >> memory comparison is emitted.  Ideally, an equivalent comparison
> >>x0 <= 0x3f
> >> where x0 is the most significant byte of x, is emitted where the
> >> constant is smaller and more likely to materialize as an immediate.
> >>
> >> Similarly, comparisons of the form
> >>x >= 0x4000
> >> can be shortened into x0 >= 0x40.
> >>
> >> Bootstrapped and regtested on s390x, x64, aarch64, and powerpc64le.
> >> Note, the new tests show that for the mentioned little-endian targets
> >> the optimization does not materialize since either the costs of the new
> >> instructions are higher or they do not match.  Still ok for mainline?
> > Hi Stefan,
> > Unfortunately this patch (committed in 
> > 7cdd0860949c6c3232e6cff1d7ca37bb5234074c)
> > caused the following ICE on armv8l-unknown-linux-gnu:
> > during RTL pass: combine
> > ../../../gcc/libgcc/fixed-bit.c: In function ‘__gnu_saturate1sq’:
> > ../../../gcc/libgcc/fixed-bit.c:210:1: internal compiler error: in
> > decompose, at rtl.h:2297
> >210 | }
> >| ^
> > 0xaa23e3 wi::int_traits<std::pair<rtx_def*, machine_mode>
> > >::decompose(long long*, unsigned int, std::pair<rtx_def*,
> > machine_mode> const&)
> >  ../../gcc/gcc/rtl.h:2297
> [ ... ]
> Yea, we're seeing something very similar on nios2-linux-gnu building the
> kernel.
>
> Prathamesh, can you extract the .i file for fixed-bit on armv8 and open
> a bug for this issue, attaching the .i file as well as the right command
> line options necessary to reproduce the failure.  That way Stefan can
> tackle it with a cross compiler.
Hi Jeff,
Filed the issue in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110867

Thanks,
Prathamesh
>
> Thanks,
> jeff


[PING^3] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 target.

2023-08-01 Thread Ajit Agarwal via Gcc-patches


Ping!

 Forwarded Message 
Subject: [PING^2] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 
target.
Date: Tue, 18 Jul 2023 13:31:27 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Richard Biener 
, Segher Boessenkool , 
Peter Bergner 

Ping^2.

Please review.

Thanks & Regards
Ajit


This patch provides functionality to improve the ree pass for the
rs6000 target, eliminating redundant sign_extend/zero_extend/AND with
varying constants.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

ree: Improve ree pass for rs6000 target

For the rs6000 target we see redundant zero and sign extensions, so the
ree pass is improved to eliminate them.  Zero_extend, sign_extend and
AND are supported, including AND combined with an extension using
constants other than 1.
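
As an illustration of the AND case (a sketch of the idea, not code
from the patch):

/* Both insns below produce a fully zero-extended value, so a later
   explicit zero_extend of d is redundant:
     (set (reg:DI d) (zero_extend:DI (reg:QI s)))
     (set (reg:DI d) (and:DI (reg:DI s) (const_int 255)))  */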

2023-06-07  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
elimination across and within basic blocks.
(def_arith_p): New function to check definition has arithmetic
operation.
(combine_set_extension): Modification to incorporate AND
and current zero_extend and sign_extend instruction.
(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
zero_extend sign_extend and AND instruction.
(rtx_is_zext_p): New function.
(feasible_cfg): New function.
* rtl.h (reg_used_set_between_p): Add prototype.
* rtlanal.cc (reg_used_set_between_p): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim.C: New testcase.
* g++.target/powerpc/zext-elim-1.C: New testcase.
* g++.target/powerpc/zext-elim-2.C: New testcase.
* g++.target/powerpc/sext-elim.C: New testcase.
---
 gcc/ree.cc| 476 --
 gcc/rtl.h |   1 +
 gcc/rtlanal.cc|  15 +
 gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
 .../g++.target/powerpc/zext-elim-1.C  |  19 +
 .../g++.target/powerpc/zext-elim-2.C  |  11 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
 7 files changed, 524 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..dc6da21ec16 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,66 @@ struct ext_cand
 
 static int max_insn_uid;
 
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx insn)
+{
+  if (GET_CODE (insn) == AND)
+{
+  rtx set = XEXP (insn, 0);
+  if (REG_P (set))
+   {
+ rtx src = XEXP (insn, 1);
+
+ if (CONST_INT_P (src)
+ && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
+   return true;
+   }
+  else
+   return false;
+}
+
+  return false;
+}
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx_insn *insn)
+{
+  rtx body = single_set (insn);
+
+  if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) == AND)
+   {
+ rtx set = XEXP (SET_SRC (body), 0);
+
+ if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
+   {
+ rtx src = XEXP (SET_SRC (body), 1);
+
+ if (CONST_INT_P (src)
+ && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
+   return true;
+   }
+ else
+  return false;
+   }
+
+   return false;
+}
+
 /* Update or remove REG_EQUAL or REG_EQUIV notes for INSN.  */
 
 static bool
@@ -319,7 +379,7 @@ combine_set_extension (ext_cand *cand, rtx_insn *curr_insn, 
rtx *orig_set)
 {
   rtx orig_src = SET_SRC (*orig_set);
   machine_mode orig_mode = GET_MODE (SET_DEST (*orig_set));
-  rtx new_set;
+  rtx new_set = NULL_RTX;
   rtx cand_pat = single_set (cand->insn);
 
   /* If the extension's source/destination registers are not the same
@@ -359,27 +419,41 @@ combine_set_extension (ext_cand *cand, rtx_insn 
*curr_insn, rtx *orig_set)
   else if (GET_CODE (orig_src) == cand->code)
 {
   /* Here is a sequence of two extensions.  Try to merge them.  */
-  rtx temp_extension
-   = gen_rtx_fmt_e (cand->code, cand->mode, XEXP (orig_src, 0));
+  rtx temp_extension = 

[PING^3] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-08-01 Thread Ajit Agarwal via Gcc-patches
Ping!


 Forwarded Message 
Subject: [PING^2] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using 
defined ABI interfaces.
Date: Tue, 18 Jul 2023 13:28:08 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Richard Biener 
, Segher Boessenkool , 
Peter Bergner 


Ping^2.

Please review.

Thanks & Regards
Ajit


This new version of patch 4 improves the ree pass for the rs6000
target using the defined ABI interfaces.
Bootstrapped and regtested on powerpc64-linux-gnu.

Review comments incorporated.

Thanks & Regards
Ajit

Improve ree pass for rs6000 target using defined abi interfaces

For the rs6000 target we see redundant zero and sign extensions, so
the ree pass is improved to eliminate them using the defined ABI
interfaces.
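
As an illustration (a sketch; the exact behavior follows the target's
TARGET_PROMOTE_FUNCTION_MODE hook):

/* On powerpc64 the ABI already extends small integer arguments to a
   full register, so in

     unsigned int foo (unsigned char c) { return c; }

   the explicit zero extension of the incoming argument register
   inside foo is redundant and can be eliminated.  */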

2023-06-01  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Use the defined ABI interfaces
for zero_extend and sign_extend.
(add_removable_extension): Use the defined ABI interfaces when
there are no reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs_without_defs_p): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C
---
 gcc/ree.cc| 199 +++---
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 183 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..2025a7c43da 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode =
+targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if the candidate insn is a zero extend and regno is
+   a return register.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the source operand of the zero_extend is an argument
+   register and not a return register, the source and destination
+   operands are the same register, and their modes differ.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+= (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is a zero extend and regno is
+   an argument register.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn's register has no defs and its uses
+   are not of RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx classes.  */
+
+static bool
+abi_handle_regs_without_defs_p (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses
+= get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+{
+  if (!use->ref)
+   return false;
+
+  if (BLOCK_FOR_INSN (insn)
+ != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
+   return false;
+
+  rtx_insn *use_insn = DF_REF_INSN (use->ref);
+
+  if (GET_CODE (PATTERN (use_insn)) == SET)
+   {
+ rtx_code code = GET_CODE (SET_SRC (PATTERN (use_insn)));
+
+ if (GET_RTX_CLASS (code) == RTX_BIN_ARITH
+ || GET_RTX_CLASS (code) == RTX_COMM_ARITH
+ || GET_RTX_CLASS (code) == RTX_UNARY)
+   return false;
+   }
+ }
+  return true;
+}
+
 /* This function goes through all reaching defs of the source

[PING^1] [PATCH v8] tree-ssa-sink: Improve code sinking pass.

2023-08-01 Thread Ajit Agarwal via Gcc-patches
Ping! 


 Forwarded Message 
Subject: [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Tue, 18 Jul 2023 19:03:37 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Richard Biener , Jeff Law 
, Segher Boessenkool , Peter 
Bergner 

Hello All:

This patch improves the code sinking pass to sink statements before
calls to reduce register pressure.
Review comments are incorporated.

For example:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  
  if (a != 5)
{
  l = a + b + c + d +e + f; 
  bar();
  j = l;
}
}

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.

2023-07-18  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 +++
 gcc/tree-ssa-sink.cc| 59 -
 3 files changed, 67 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index b1ba7a2ad6c..e7190323abe 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -173,7 +173,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool 
*debug_stmts)
 
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements.  If the use stmt is after a call, the best basic block
+   should be the immediate dominator of the use block.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -190,11 +191,22 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+  gimple *use)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+ profitable to avoid, clamping at 100%.  */
+  threshold = param_sink_frequency_threshold;
+  if (gimple_vuse (stmt) || gimple_vdef (stmt))
+{
+  threshold += 7;
+  if (threshold > 100)
+   threshold = 100;
+}
 
   while (temp_bb != early_bb)
 {
@@ -203,34 +215,31 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
 
+  /* Placing a statement before a setjmp-like function would be invalid
+(it cannot be reevaluated when execution follows an 

[COMMITTED] ada: Disable inlining of subprograms with Skip(_Flow_And)_Proof in GNATprove

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Yannick Moy 

Subprograms with these Skip(_Flow_And)_Proof annotations should not be
inlined in GNATprove, as we want to skip part of the analysis for their
body.
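
For reference, the annotation detected here is a three-argument pragma
Annotate whose first two arguments are GNATprove and Skip_Proof (or
Skip_Flow_And_Proof).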

gcc/ada/

* inline.adb (Can_Be_Inlined_In_GNATprove_Mode): Check for
Skip_Proof and Skip_Flow_And_Proof annotations for deciding
whether a subprogram can be inlined.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/inline.adb | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/gcc/ada/inline.adb b/gcc/ada/inline.adb
index edb90a9fe20..db8b4164e87 100644
--- a/gcc/ada/inline.adb
+++ b/gcc/ada/inline.adb
@@ -1503,6 +1503,10 @@ package body Inline is
   --  an unconstrained record type with per-object constraints on component
   --  types.
 
+  function Has_Skip_Proof_Annotation (Id : Entity_Id) return Boolean;
+  --  Returns True if subprogram Id has an annotation Skip_Proof or
+  --  Skip_Flow_And_Proof.
+
   function Has_Some_Contract (Id : Entity_Id) return Boolean;
   --  Return True if subprogram Id has any contract. The presence of
   --  Extensions_Visible or Volatile_Function is also considered as a
@@ -1701,6 +1705,45 @@ package body Inline is
  return False;
   end Has_Formal_With_Discriminant_Dependent_Fields;
 
+  ---
+  -- Has_Skip_Proof_Annotation --
+  ---
+
+  function Has_Skip_Proof_Annotation (Id : Entity_Id) return Boolean is
+ Decl : Node_Id := Unit_Declaration_Node (Id);
+
+  begin
+ Next (Decl);
+
+ while Present (Decl)
+   and then Nkind (Decl) = N_Pragma
+ loop
+if Get_Pragma_Id (Decl) = Pragma_Annotate
+  and then List_Length (Pragma_Argument_Associations (Decl)) = 3
+then
+   declare
+  Arg1  : constant Node_Id :=
+First (Pragma_Argument_Associations (Decl));
+  Arg2  : constant Node_Id := Next (Arg1);
+  Arg1_Name : constant String :=
+Get_Name_String (Chars (Get_Pragma_Arg (Arg1)));
+  Arg2_Name : constant String :=
+Get_Name_String (Chars (Get_Pragma_Arg (Arg2)));
+   begin
+  if Arg1_Name = "gnatprove"
+and then Arg2_Name in "skip_proof" | "skip_flow_and_proof"
+  then
+ return True;
+  end if;
+   end;
+end if;
+
+Next (Decl);
+ end loop;
+
+ return False;
+  end Has_Skip_Proof_Annotation;
+
   ---
   -- Has_Some_Contract --
   ---
@@ -1903,6 +1946,12 @@ package body Inline is
   elsif Maybe_Traversal_Function (Id) then
  return False;
 
+  --  Do not inline subprograms with the Skip_Proof or Skip_Flow_And_Proof
+  --  annotation, which should be handled separately.
+
+  elsif Has_Skip_Proof_Annotation (Id) then
+ return False;
+
   --  Otherwise, this is a subprogram declared inside the private part of a
   --  package, or inside a package body, or locally in a subprogram, and it
   --  does not have any contract. Inline it.
-- 
2.40.0



[COMMITTED] ada: Bugbox compiling Constrained_Protected_Object'Image

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Steve Baird 

In some cases, a bugbox is generated when compiling an example
that references X'Image, where X is a constrained object of a
discriminated protected type.

gcc/ada/

* sem_ch3.adb (Constrain_Corresponding_Record): When copying
information from the unconstrained record type to a newly
constructed constrained record subtype, the
Direct_Primitive_Operations attribute must be copied.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index ed337f5408e..042ace01724 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -14325,6 +14325,8 @@ package body Sem_Ch3 is
   Set_Is_Constrained(T_Sub, True);
   Set_First_Entity  (T_Sub, First_Entity (Corr_Rec));
   Set_Last_Entity   (T_Sub, Last_Entity  (Corr_Rec));
+  Set_Direct_Primitive_Operations
+(T_Sub, Direct_Primitive_Operations (Corr_Rec));
 
   if Has_Discriminants (Prot_Subt) then -- False only if errors.
  Set_Discriminant_Constraint
-- 
2.40.0



[COMMITTED] ada: Fix printing of numbers in JSON output for data representation

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Yannick Moy 

When calling GNAT with -gnatRj to generate JSON output for the
data representation of types and objects, it could happen that
numbers are printed in the Ada syntax for hexadecimal numbers, which
leads to an invalid JSON file being generated. Now fixed both for
the JSON output and the Ada-like output.

gcc/ada/

* repinfo.adb (Compute_Max_Length): Set parameter to print number
in decimal notation.
(List_Component_Layout): Same.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/repinfo.adb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/repinfo.adb b/gcc/ada/repinfo.adb
index ba4b32b7027..ecd35e94e14 100644
--- a/gcc/ada/repinfo.adb
+++ b/gcc/ada/repinfo.adb
@@ -1100,7 +1100,7 @@ package body Repinfo is
  goto Continue;
   end if;
 
-  UI_Image (Spos);
+  UI_Image (Spos, Format => Decimal);
else
   --  If the record is not packed, then we know that all fields
   --  whose position is not specified have starting normalized
@@ -1176,7 +1176,7 @@ package body Repinfo is
Spos := Spos + 1;
 end if;
 
-UI_Image (Spos);
+UI_Image (Spos, Format => Decimal);
 Spaces (Max_Spos_Length - UI_Image_Length);
 Write_Str (UI_Image_Buffer (1 .. UI_Image_Length));
 
-- 
2.40.0



[COMMITTED] ada: Incorrect optimization for unconstrained limited record component type

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Steve Baird 

If the discriminants of an immutably limited record type have defaults, then
it is safe to assume that a discriminant of an object of this type will never
change once it is initialized. In some cases, this means that the default
discriminant values can be treated like a constraint for purposes of
determining the amount of storage needed for an unconstrained object.
However, it is not safe to perform this optimization when determining
the size needed for an unconstrained component of an enclosing type. This
optimization was sometimes being incorrectly performed in this case. This could
save storage in some cases, but in other cases a constraint check could
incorrectly fail when initializing a component of an aggregate if the
discriminant values of the component differ from the default values.

gcc/ada/

* sem_ch3.adb (Analyze_Component_Declaration): Remove
Build_Default_Subtype_OK call and code that could only executed in
the case where the removed call would have returned True. Other
calls to Build_Default_Subtype_Ok are unaffected by this change.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb | 18 --
 1 file changed, 18 deletions(-)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index 85019dfffa5..ed337f5408e 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -1868,7 +1868,6 @@ package body Sem_Ch3 is
---
 
procedure Analyze_Component_Declaration (N : Node_Id) is
-  Loc : constant Source_Ptr := Sloc (Component_Definition (N));
   Id  : constant Entity_Id  := Defining_Identifier (N);
   E   : constant Node_Id:= Expression (N);
   Typ : constant Node_Id:=
@@ -2205,23 +2204,6 @@ package body Sem_Ch3 is
  end if;
   end if;
 
-  --  When possible, build the default subtype
-
-  if Build_Default_Subtype_OK (T) then
- declare
-Act_T : constant Entity_Id := Build_Default_Subtype (T, N);
-
- begin
-Set_Etype (Id, Act_T);
-
---  Rewrite component definition to use the constrained subtype
-
-Rewrite (Component_Definition (N),
-  Make_Component_Definition (Loc,
-Subtype_Indication => New_Occurrence_Of (Act_T, Loc)));
- end;
-  end if;
-
   Set_Original_Record_Component (Id, Id);
 
   if Has_Aspects (N) then
-- 
2.40.0



[COMMITTED] ada: Emit SCOs for nested decisions in quantified expressions

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Léo Creuse 

The tree traversal for decision SCO emission did not recurse in the
iterator specification or loop parameter specification of quantified
expressions, resulting in missing coverage obligations for nested
decisions. This change fixes this by traversing all the attributes
of quantified expressions nodes.

gcc/ada/

* par_sco.adb (Process_Decisions): Traverse all attributes of
quantified expressions nodes.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/par_sco.adb | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/par_sco.adb b/gcc/ada/par_sco.adb
index ce7de7f3d79..5e65fa25de1 100644
--- a/gcc/ada/par_sco.adb
+++ b/gcc/ada/par_sco.adb
@@ -829,8 +829,15 @@ package body Par_SCO is
 
 when N_Quantified_Expression =>
declare
-  Cond : constant Node_Id := Condition (N);
+  Cond   : constant Node_Id := Condition (N);
+  I_Spec : Node_Id := Empty;
begin
+  if Present (Iterator_Specification (N)) then
+ I_Spec := Iterator_Specification (N);
+  else
+ I_Spec := Loop_Parameter_Specification (N);
+  end if;
+  Process_Decisions (I_Spec, 'X', Pragma_Sloc);
   Process_Decisions (Cond, 'W', Pragma_Sloc);
   return Skip;
end;
-- 
2.40.0



[COMMITTED] ada: Fix generation of JSON output for data representation

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Yannick Moy 

Using -gnatRj to generate data representation in JSON format could
lead to an ill-formed output or an assertion failure. Now fixed.

gcc/ada/

* repinfo.adb (List_Common_Type_Info): Fix output when alignment
is not statically known, and fix assertion when expansion is not
enabled.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/repinfo.adb | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/repinfo.adb b/gcc/ada/repinfo.adb
index 6a30bc7898b..ba4b32b7027 100644
--- a/gcc/ada/repinfo.adb
+++ b/gcc/ada/repinfo.adb
@@ -428,12 +428,21 @@ package body Repinfo is
  end if;
 
   --  Alignment is not always set for task, protected, and class-wide
-  --  types. Representation aspects are not computed for types in a
-  --  generic unit.
+  --  types, or when doing semantic analysis only. Representation aspects
+  --  are not computed for types in a generic unit.
 
   else
+ --  Add unknown alignment entry in JSON format to ensure the format is
+ --  valid, as a comma is added by the caller before another field.
+
+ if List_Representation_Info_To_JSON then
+Write_Str ("  ""Alignment"": ");
+Write_Unknown_Val;
+ end if;
+
  pragma Assert
-   (Is_Concurrent_Type (Ent) or else
+   (not Expander_Active or else
+  Is_Concurrent_Type (Ent) or else
   Is_Class_Wide_Type (Ent) or else
   Sem_Util.In_Generic_Scope (Ent));
   end if;
-- 
2.40.0



[COMMITTED] ada: Default Put_Image for composite derived types is missing information

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Pascal Obry 

The output generated by a call to Some_Derived_Composite_Type'Put_Image
(in Ada2022 code) is incomplete in some cases, notably for a type derived
from a container type (i.e., from the Set/Map/List/Vector type declared in
an instance of one of Ada's predefined container generics) with no
user-specified Put_Image procedure.

gcc/ada/

* aspects.ads (Find_Aspect): Add Boolean parameter Or_Rep_Item
(defaulted to False).
* aspects.adb (Find_Aspect): If new Boolean parameter Or_Rep_Item
is True, then instead of returning an empty result if no
appropriate N_Aspect_Specification node is found, return an
appropriate N_Attribute_Definition_Clause if one is found.
* exp_put_image.ads: Change name of Enable_Put_Image function to
Put_Image_Enabled.
* exp_put_image.adb (Build_Record_Put_Image_Procedure): Detect the
case where a call to the Put_Image procedure of a derived type can
be transformed into a call to the parent type's Put_Image
procedure (with a type conversion to the parent type as the actual
parameter).
(Put_Image_Enabled): Change name of function (previously
Enable_Put_Image). Return True in more cases. In particular,
return True for a type with an explicitly specified Put_Image
aspect even if the type is declared in a predefined unit (or in an
instance of a predefined generic unit).
* exp_attr.adb: Changes due to Put_Image_Enabled function name
change.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/aspects.adb   | 30 +---
 gcc/ada/aspects.ads   | 12 --
 gcc/ada/exp_attr.adb  |  4 ++--
 gcc/ada/exp_put_image.adb | 48 +--
 gcc/ada/exp_put_image.ads |  2 +-
 5 files changed, 76 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/aspects.adb b/gcc/ada/aspects.adb
index c14769c640c..86dbd183565 100644
--- a/gcc/ada/aspects.adb
+++ b/gcc/ada/aspects.adb
@@ -193,13 +193,14 @@ package body Aspects is
function Find_Aspect
  (Id: Entity_Id;
   A : Aspect_Id;
-  Class_Present : Boolean := False) return Node_Id
+  Class_Present : Boolean := False;
+  Or_Rep_Item   : Boolean := False) return Node_Id
is
-  Decl  : Node_Id;
-  Item  : Node_Id;
-  Owner : Entity_Id;
-  Spec  : Node_Id;
-
+  Decl : Node_Id;
+  Item : Node_Id;
+  Owner: Entity_Id;
+  Spec : Node_Id;
+  Alternative_Rep_Item : Node_Id := Empty;
begin
   Owner := Id;
 
@@ -231,6 +232,18 @@ package body Aspects is
and then Class_Present = Sinfo.Nodes.Class_Present (Item)
  then
 return Item;
+
+ --  We could do something similar here for an N_Pragma node
+ --  when Get_Aspect_Id (Pragma_Name (Item)) = A, but let's
+ --  wait for a demonstrated need.
+
+ elsif Or_Rep_Item
+   and then not Class_Present
+   and then Nkind (Item) = N_Attribute_Definition_Clause
+   and then Get_Aspect_Id (Chars (Item)) = A
+ then
+--  Remember this candidate in case we don't find anything better
+Alternative_Rep_Item := Item;
  end if;
 
  Next_Rep_Item (Item);
@@ -266,9 +279,10 @@ package body Aspects is
   end if;
 
   --  The entity does not carry any aspects or the desired aspect was not
-  --  found.
+  --  found. We have no N_Aspect_Specification node to return, but
+  --  Alternative_Rep_Item may have been set (if Or_Rep_Item is True).
 
-  return Empty;
+  return Alternative_Rep_Item;
end Find_Aspect;
 
--
diff --git a/gcc/ada/aspects.ads b/gcc/ada/aspects.ads
index 05677978037..f718227a7af 100644
--- a/gcc/ada/aspects.ads
+++ b/gcc/ada/aspects.ads
@@ -1156,10 +1156,18 @@ package Aspects is
 
function Find_Aspect (Id: Entity_Id;
  A : Aspect_Id;
- Class_Present : Boolean := False) return Node_Id;
+ Class_Present : Boolean := False;
+ Or_Rep_Item   : Boolean := False) return Node_Id;
--  Find the aspect specification of aspect A (or A'Class if Class_Present)
--  associated with entity I.
-   --  Return Empty if Id does not have the requested aspect.
+   --  If found, then return the aspect specification.
+   --  If not found and Or_Rep_Item is true, then look for a representation
+   --  item (as opposed to an N_Aspect_Specification node) which specifies
+   --  the given aspect; if found, then return the representation item.
+   --  [Currently only N_Attribute_Definition_Clause representation items
+   --  are checked for, but support for detecting N_Pragma representation
+   --  items could easily be added in the future.]

[COMMITTED] ada: check Atree.Get/Set_Field_Value

2023-08-01 Thread Marc Poulhiès via Gcc-patches
From: Bob Duff 

Get_Field_Value and Set_Field_Value now check that the Nkind or Ekind is
correct. However, the checks are partially disabled, because they
sometimes fail.

gcc/ada/

* atree.adb (Field_Present): New function to detect whether or not
a given field is present in a given node, based on either the node
kind or the entity kind as appropriate.
(Get_Field_Value): Check that the field being fetched exists.
However, disable the check in the case of Scope_Depth_Value,
because we have failures in that case. Those failures need to be
fixed, and then the check can be enabled for all fields.
(Set_Field_Value): Check that the field being set exists.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/atree.adb | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/ada/atree.adb b/gcc/ada/atree.adb
index f1e4e2ca8bb..5597d166cdb 100644
--- a/gcc/ada/atree.adb
+++ b/gcc/ada/atree.adb
@@ -265,6 +265,10 @@ package body Atree is
   --  True if a node/entity of the given Kind has the given Field.
   --  Always True if assertions are disabled.
 
+  function Field_Present
+(N : Node_Id; Field : Node_Or_Entity_Field) return Boolean;
+  --  Same for a node, which could be an entity
+
end Field_Checking;
 
package body Field_Checking is
@@ -366,6 +370,17 @@ package body Atree is
  return Entity_Fields_Present (Kind) (Field);
   end Field_Present;
 
+  function Field_Present
+(N : Node_Id; Field : Node_Or_Entity_Field) return Boolean is
+  begin
+ case Field is
+when Node_Field =>
+   return Field_Present (Nkind (N), Field);
+when Entity_Field =>
+   return Field_Present (Ekind (N), Field);
+ end case;
+  end Field_Present;
+
end Field_Checking;
 

@@ -885,6 +900,10 @@ package body Atree is
function Get_Field_Value
  (N : Node_Id; Field : Node_Or_Entity_Field) return Field_Size_32_Bit
is
+  pragma Assert
+(if Field /= F_Scope_Depth_Value then -- ???Temporarily disable check
+   Field_Checking.Field_Present (N, Field));
+  --  Assert partially disabled because it fails in rare cases
   Desc : Field_Descriptor renames Field_Descriptors (Field);
   NN : constant Node_Or_Entity_Id := Node_To_Fetch_From (N, Field);
 
@@ -905,6 +924,7 @@ package body Atree is
procedure Set_Field_Value
  (N : Node_Id; Field : Node_Or_Entity_Field; Val : Field_Size_32_Bit)
is
+  pragma Assert (Field_Checking.Field_Present (N, Field));
   Desc : Field_Descriptor renames Field_Descriptors (Field);
 
begin
-- 
2.40.0



RE: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-08-01 Thread Li, Pan2 via Gcc-patches
Committed, thanks a lot, Robin and Kito; I am very appreciative of the explanation
and comments from the experts' perspective.

Pan

-Original Message-
From: Kito Cheng  
Sent: Tuesday, August 1, 2023 3:51 PM
To: Li, Pan2 
Cc: Robin Dapp ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 

Subject: Re: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Hi Pan:


Thanks for your effort on this, this is LGTM and OK for trunk.


Hi Robin:


Thanks for your review on this stuff, this set of intrinsic functions
is complicated and might be controversial since the whole floating
point rounding mode is…complicated, and people might have different
tastes on that.


So what we (the RVV intrinsic TG) are trying to do is add another set of
intrinsics *and* keep the existing floating point intrinsics, so people
can still use the fesetround style to play around with the floating
point stuff; but I do not intend to convince you that it is necessary
and a 100% right design - I admit it's kind of an experimental design,
like LLVM's constrained floating-point intrinsics.

Anyway, let's move forward, and see how useful it is for the RISC-V ecosystem :)

On Fri, Jul 28, 2023 at 8:35 PM Li, Pan2 via Gcc-patches
 wrote:
>
> Great! Thanks Robin for so many useful comments, as well as the 
> thought-provoking discussion with different insights.
> I believe this kind of interactive discussion will empower all of us and 
> lead us to do the right things.
>
> Back to this PATCH, I try to do only one thing at a time, and I totally agree 
> that there are some things we need to try.
> Thanks again and let's wait for kito's comments.
>
> Pan
>
> -Original Message-
> From: Robin Dapp 
> Sent: Friday, July 28, 2023 6:05 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
> Yanzhang 
> Subject: Re: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic 
> rounding
>
> Hi Pan,
>
> thanks for your patience and your work.  Apart from my general doubt
> whether mode-changing intrinsics are a good idea, I don't have other
> remarks that need fixing.  What I mentioned before:
>
>  - Handling of asms wouldn't be a huge change.  It can be done
>  in a follow-up patch of course but should be done eventually.
>
>  - The code is still rather difficult to follow because we diverge
>  from the usual mode-switching semantics e.g. in that we emit insns
>  in mode_needed as well as in mode_set.  I would have preferred
>  to stay close to the regular usage, document where and why we need
>  to do something different and suggest future middle-end improvements
>  to solve this more elegantly.
>
>  - I hope non-local control flow like setjmp/longjmp, sibcall
>  optimization and maybe others work fine.  I didn't see a reason
>  why not but I haven't checked very closely either.
>
>  - We can probably get away with not annotating every call with
>  an FRM clobber because there isn't any pass that would make use
>  of that anyway?
>
>
> As to my general qualm, independent of this patch, quickly
> summarized again one last time (the problem was latent before this
> specific patch anyway):
>
> I would prefer not to have mode-changing intrinsics at all but
> have users call fesetround explicitly.  That way the exact point
> where the rounding mode is changed would be obvious and not
> subject to optimization as well as caching/backing up.
> If at all necessary I would have preferred the LLVM way of
> backing up, setting new mode, performing the instruction
> and restoring directly after.
> If the initial intent of mode-changing intrinsics was to give
> users more control, I don't believe we achieve this by the "lazy"
> restore mechanism which is rather an obfuscation.
>
> Pardon my frankness but the whole mode-changing thing feels to me
> like just getting a feature out of the door to solve "something"
> /appease users rather than a well thought-out feature.  It doesn't even
> seem clear if this optimization is worthwhile when changing the
> rounding mode is prohibitively slow anyway.
>
> That said, if the current status is what the majority of
> contributors can live with, I'm not going to stand in the way,
> but I'd ask Kito or somebody else to give the final OK.
>
> Regards
>  Robin
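
For context, a minimal hedged C sketch of the explicit fesetround style
referred to above (standard C99 fenv.h; strictly this also needs
"#pragma STDC FENV_ACCESS ON", omitted here for brevity):

#include <fenv.h>

/* Back up, set, compute, restore - the "LLVM way" described above.  */
double
add_upward (double a, double b)
{
  int old = fegetround ();
  fesetround (FE_UPWARD);
  double r = a + b;
  fesetround (old);
  return r;
}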


RE: [PATCH v1] RISC-V: Support RVV VFSUB and VFRSUB rounding mode intrinsic API

2023-08-01 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Tuesday, August 1, 2023 3:22 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFSUB and VFRSUB rounding mode 
intrinsic API

LGTM


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-01 14:48
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFSUB and VFRSUB rounding mode 
intrinsic API
From: Pan Li <pan2...@intel.com>

This patch would like to support the rounding mode API for both the
VFSUB and VFRSUB as below samples.

* __riscv_vfsub_vv_f32m1_rm
* __riscv_vfsub_vv_f32m1_rm_m
* __riscv_vfsub_vf_f32m1_rm
* __riscv_vfsub_vf_f32m1_rm_m
* __riscv_vfrsub_vf_f32m1_rm
* __riscv_vfrsub_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class reverse_binop_frm): Add new template for reversed frm.
(vfsub_frm_obj): New obj.
(vfrsub_frm_obj): Likewise.
* config/riscv/riscv-vector-builtins-bases.h:
(vfsub_frm): New declaration.
(vfrsub_frm): Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfsub_frm): New function define.
(vfrsub_frm): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-rsub.c: New test.
* gcc.target/riscv/rvv/base/float-point-single-sub.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 21 +
.../riscv/riscv-vector-builtins-bases.h   |  2 ++
.../riscv/riscv-vector-builtins-functions.def |  3 ++
.../riscv/rvv/base/float-point-single-rsub.c  | 19 
.../riscv/rvv/base/float-point-single-sub.c   | 30 +++
5 files changed, 75 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-sub.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 316b35b57c8..035cafc43b3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -298,6 +298,23 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfrsub
+*/
+template <rtx_code CODE>
+class reverse_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+public:
+  rtx expand (function_expander &e) const override
+  {
+return e.use_exact_insn (
+  code_for_pred_reverse_scalar (CODE, e.vector_mode ()));
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2042,7 +2059,9 @@ static CONSTEXPR const vid vid_obj;
static CONSTEXPR const binop vfadd_obj;
static CONSTEXPR const binop vfsub_obj;
static CONSTEXPR const binop_frm vfadd_frm_obj;
+static CONSTEXPR const binop_frm vfsub_frm_obj;
static CONSTEXPR const reverse_binop vfrsub_obj;
+static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
static CONSTEXPR const widen_binop vfwadd_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const binop vfmul_obj;
@@ -2269,7 +2288,9 @@ BASE (vid)
BASE (vfadd)
BASE (vfadd_frm)
BASE (vfsub)
+BASE (vfsub_frm)
BASE (vfrsub)
+BASE (vfrsub_frm)
BASE (vfwadd)
BASE (vfwsub)
BASE (vfmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index e771a36adc8..5c6b239c274 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -144,7 +144,9 @@ extern const function_base *const vid;
extern const function_base *const vfadd;
extern const function_base *const vfadd_frm;
extern const function_base *const vfsub;
+extern const function_base *const vfsub_frm;
extern const function_base *const vfrsub;
+extern const function_base *const vfrsub_frm;
extern const function_base *const vfwadd;
extern const function_base *const vfwsub;
extern const function_base *const vfmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 035c9e4252f..fa1c2cef970 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -291,6 +291,9 @@ DEF_RVV_FUNCTION (vfsub, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrsub, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfsub_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfsub_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfrsub_frm, alu_frm, full_preds, f_vvf_ops)
// 13.3. Vector Widening Floating-Point Add/Subtract 

Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-08-01 Thread Martin Uecker via Gcc-patches
Am Dienstag, dem 01.08.2023 um 02:11 +0530 schrieb Prathamesh Kulkarni:
> On Fri, 21 Jul 2023 at 16:52, Martin Uecker via Gcc-patches
>  wrote:
> > 
> > 
> > 
> > This patch adds a warning for allocations with insufficient size
> > based on the "alloc_size" attribute and the type of the pointer
> > the result is assigned to. While it is theoretically legal to
> > assign to the wrong pointer type and cast it to the right type
> > later, this almost always indicates an error. Since this catches
> > common mistakes and is simple to diagnose, it is suggested to
> > add this warning.
> > 
> > 
> > Bootstrapped and regression tested on x86.
> > 
> > 
> > Martin
> > 
> > 
> > 
> > Add option Walloc-type that warns about allocations that have
> > insufficient storage for the target type of the pointer the
> > storage is assigned to.
> > 
> > gcc:
> > * doc/invoke.texi: Document -Walloc-type option.
> > 
> > gcc/c-family:
> > 
> > * c.opt (Walloc-type): New option.
> > 
> > gcc/c:
> > * c-typeck.cc (convert_for_assignment): Add Walloc-type warning.
> > 
> > gcc/testsuite:
> > 
> > * gcc.dg/Walloc-type-1.c: New test.
> > 
> > 
> > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> > index 4abdc8d0e77..8b9d148582b 100644
> > --- a/gcc/c-family/c.opt
> > +++ b/gcc/c-family/c.opt
> > @@ -319,6 +319,10 @@ Walloca
> >  C ObjC C++ ObjC++ Var(warn_alloca) Warning
> >  Warn on any use of alloca.
> > 
> > +Walloc-type
> > +C ObjC Var(warn_alloc_type) Warning
> > +Warn when allocating insufficient storage for the target type of the
> > assigned pointer.
> > +
> >  Walloc-size-larger-than=
> >  C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int
> > ByteSize Warning Init(HOST_WIDE_INT_MAX)
> >  -Walloc-size-larger-than=   Warn for calls to allocation
> > functions that
> > diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> > index 7cf411155c6..2e392f9c952 100644
> > --- a/gcc/c/c-typeck.cc
> > +++ b/gcc/c/c-typeck.cc
> > @@ -7343,6 +7343,32 @@ convert_for_assignment (location_t location,
> > location_t expr_loc, tree type,
> > "request for implicit conversion "
> > "from %qT to %qT not permitted in C++", rhstype,
> > type);
> > 
> > +  /* Warn if new allocations are not big enough for the target
> > type.  */
> > +  tree fndecl;
> > +  if (warn_alloc_type
> > + && TREE_CODE (rhs) == CALL_EXPR
> > + && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
> > + && DECL_IS_MALLOC (fndecl))
> > +   {
> > + tree fntype = TREE_TYPE (fndecl);
> > + tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
> > + tree alloc_size = lookup_attribute ("alloc_size",
> > fntypeattrs);
> > + if (alloc_size)
> > +   {
> > + tree args = TREE_VALUE (alloc_size);
> > + int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
> > + /* For calloc only use the second argument.  */
> > + if (TREE_CHAIN (args))
> > +   idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN
> > (args))) - 1;
> > + tree arg = CALL_EXPR_ARG (rhs, idx);
> > + if (TREE_CODE (arg) == INTEGER_CST
> > + && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
> Hi Martin,
> Just wondering if it'd be a good idea perhaps to warn if alloc size is
> not a multiple of TYPE_SIZE_UNIT instead of just less-than ?
> So it can catch cases like:
> int *p = malloc (sizeof (int) + 2); // probably intended malloc
> (sizeof (int) * 2)
> 
> FWIW, this is caught using -fanalyzer:
> f.c: In function 'f':
> f.c:3:12: warning: allocated buffer size is not a multiple of the
> pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> 3 |   int *p = __builtin_malloc (sizeof(int) + 2);
>   |^~
> 
> Thanks,
> Prathamesh

Yes, this is probably a good idea.  It might need special
logic for flexible array members then...


Martin
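
For illustration, a hedged C sketch of the cases under discussion (the
second comment describes the proposed multiple-of extension, which the
patch above does not implement):

#include <stdlib.h>

struct s { long a; long b; };

void f (void)
{
  /* Smaller than *p: -Walloc-type would warn here.  */
  struct s *p = malloc (sizeof (int));
  /* Not a multiple of sizeof (int): the suggested extension would warn.  */
  int *q = malloc (sizeof (int) + 2);
  free (p);
  free (q);
}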


> > +warning_at (location, OPT_Walloc_type, "allocation of
> > "
> > +"insufficient size %qE for type %qT with
> > "
> > +"size %qE", arg, ttl, TYPE_SIZE_UNIT
> > (ttl));
> > +   }
> > +   }
> > +
> >/* See if the pointers point to incompatible address spaces.  */
> >asl = TYPE_ADDR_SPACE (ttl);
> >asr = TYPE_ADDR_SPACE (ttr);
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 88e3c625030..6869bed64c3 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -8076,6 +8076,15 @@ always leads to a call to another @code{cold}
> > function such as wrappers of
> >  C++ @code{throw} or fatal error reporting functions leading to
> > @code{abort}.
> >  @end table
> > 
> > +@opindex Wno-alloc-type
> > +@opindex Walloc-type
> > +@item -Walloc-type
> > +Warn about calls to allocation functions decorated with attribute
> > +@code{alloc_size} that specify 

Re: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-08-01 Thread Kito Cheng via Gcc-patches
Hi Pan:


Thanks for your effort on this, this is LGTM and OK for trunk.


Hi Robin:


Thanks for your review on this stuff, this set of intrinsic functions
is complicated and might be controversial since the whole floating
point rounding mode is…complicated, and people might have different
tastes on that.


So what we (the RVV intrinsic TG) are trying to do is add another set of
intrinsics *and* keep the existing floating point intrinsics, so people
can still use the fesetround style to play around with the floating
point stuff; but I do not intend to convince you that it is necessary
and a 100% right design - I admit it's kind of an experimental design,
like LLVM's constrained floating-point intrinsics.

Anyway, let's move forward, and see how useful it is for the RISC-V ecosystem :)

On Fri, Jul 28, 2023 at 8:35 PM Li, Pan2 via Gcc-patches
 wrote:
>
> Great! Thanks Robin for so many useful comments, as well as the 
> thought-provoking discussion with different insights.
> I believe this kind of interactive discussion will empower all of us and 
> lead us to do the right things.
>
> Back to this PATCH, I try to do only one thing at a time, and I totally agree 
> that there are some things we need to try.
> Thanks again and let's wait for kito's comments.
>
> Pan
>
> -Original Message-
> From: Robin Dapp 
> Sent: Friday, July 28, 2023 6:05 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
> Yanzhang 
> Subject: Re: [PATCH v8] RISC-V: Support CALL for RVV floating-point dynamic 
> rounding
>
> Hi Pan,
>
> thanks for your patience and your work.  Apart from my general doubt
> whether mode-changing intrinsics are a good idea, I don't have other
> remarks that need fixing.  What I mentioned before:
>
>  - Handling of asms wouldn't be a huge change.  It can be done
>  in a follow-up patch of course but should be done eventually.
>
>  - The code is still rather difficult to follow because we diverge
>  from the usual mode-switching semantics e.g. in that we emit insns
>  in mode_needed as well as in mode_set.  I would have preferred
>  to stay close to the regular usage, document where and why we need
>  to do something different and suggest future middle-end improvements
>  to solve this more elegantly.
>
>  - I hope non-local control flow like setjmp/longjmp, sibcall
>  optimization and maybe others work fine.  I didn't see a reason
>  why not but I haven't checked very closely either.
>
>  - We can probably get away with not annotating every call with
>  an FRM clobber because there isn't any pass that would make use
>  of that anyway?
>
>
> As to my general qualm, independent of this patch, quickly
> summarized again one last time (the problem was latent before this
> specific patch anyway):
>
> I would prefer not to have mode-changing intrinsics at all but
> have users call fesetround explicitly.  That way the exact point
> where the rounding mode is changed would be obvious and not
> subject to optimization as well as caching/backing up.
> If at all necessary I would have preferred the LLVM way of
> backing up, setting new mode, performing the instruction
> and restoring directly after.
> If the initial intent of mode-changing intrinsics was to give
> users more control, I don't believe we achieve this by the "lazy"
> restore mechanism which is rather an obfuscation.
>
> Pardon my frankness but the whole mode-changing thing feels to me
> like just getting a feature out of the door to solve "something"
> /appease users rather than a well thought-out feature.  It doesn't even
> seem clear if this optimization is worthwhile when changing the
> rounding mode is prohibitively slow anyway.
>
> That said, if the current status is what the majority of
> contributors can live with, I'm not going to stand in the way,
> but I'd ask Kito or somebody else to give the final OK.
>
> Regards
>  Robin


Re: [PATCH v1] RISC-V: Support RVV VFSUB and VFRSUB rounding mode intrinsic API

2023-08-01 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-01 14:48
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFSUB and VFRSUB rounding mode 
intrinsic API
From: Pan Li 
 
This patch would like to support the rounding mode API for both the
VFSUB and VFRSUB as below samples.
 
* __riscv_vfsub_vv_f32m1_rm
* __riscv_vfsub_vv_f32m1_rm_m
* __riscv_vfsub_vf_f32m1_rm
* __riscv_vfsub_vf_f32m1_rm_m
* __riscv_vfrsub_vf_f32m1_rm
* __riscv_vfrsub_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class reverse_binop_frm): Add new template for reversed frm.
(vfsub_frm_obj): New obj.
(vfrsub_frm_obj): Likewise.
* config/riscv/riscv-vector-builtins-bases.h:
(vfsub_frm): New declaration.
(vfrsub_frm): Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfsub_frm): New function define.
(vfrsub_frm): Likewise.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-single-rsub.c: New test.
* gcc.target/riscv/rvv/base/float-point-single-sub.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 21 +
.../riscv/riscv-vector-builtins-bases.h   |  2 ++
.../riscv/riscv-vector-builtins-functions.def |  3 ++
.../riscv/rvv/base/float-point-single-rsub.c  | 19 
.../riscv/rvv/base/float-point-single-sub.c   | 30 +++
5 files changed, 75 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-sub.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 316b35b57c8..035cafc43b3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -298,6 +298,23 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfrsub
+*/
+template <rtx_code CODE>
+class reverse_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+public:
+  rtx expand (function_expander &e) const override
+  {
+return e.use_exact_insn (
+  code_for_pred_reverse_scalar (CODE, e.vector_mode ()));
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2042,7 +2059,9 @@ static CONSTEXPR const vid vid_obj;
static CONSTEXPR const binop vfadd_obj;
static CONSTEXPR const binop vfsub_obj;
static CONSTEXPR const binop_frm vfadd_frm_obj;
+static CONSTEXPR const binop_frm vfsub_frm_obj;
static CONSTEXPR const reverse_binop vfrsub_obj;
+static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
static CONSTEXPR const widen_binop vfwadd_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const binop vfmul_obj;
@@ -2269,7 +2288,9 @@ BASE (vid)
BASE (vfadd)
BASE (vfadd_frm)
BASE (vfsub)
+BASE (vfsub_frm)
BASE (vfrsub)
+BASE (vfrsub_frm)
BASE (vfwadd)
BASE (vfwsub)
BASE (vfmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index e771a36adc8..5c6b239c274 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -144,7 +144,9 @@ extern const function_base *const vid;
extern const function_base *const vfadd;
extern const function_base *const vfadd_frm;
extern const function_base *const vfsub;
+extern const function_base *const vfsub_frm;
extern const function_base *const vfrsub;
+extern const function_base *const vfrsub_frm;
extern const function_base *const vfwadd;
extern const function_base *const vfwsub;
extern const function_base *const vfmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 035c9e4252f..fa1c2cef970 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -291,6 +291,9 @@ DEF_RVV_FUNCTION (vfsub, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrsub, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfsub_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfsub_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfrsub_frm, alu_frm, full_preds, f_vvf_ops)
// 13.3. Vector Widening Floating-Point Add/Subtract Instructions
DEF_RVV_FUNCTION (vfwadd, widen_alu, full_preds, f_wvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
new file mode 100644
index 000..1d770adc32c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t

Re: [RFC] light expander sra for parameters and returns

2023-08-01 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Mon, 24 Jul 2023, Jiufu Guo wrote:
>
>> 
>> Hi Martin,
>> 
>> Not sure about your current opinion on re-using the ipa-sra code
>> in the light-expander-sra. If there is anything I can help with,
>> please let me know.
>> 
>> And I'm thinking about the differences between the expander-sra, ipa-sra
>> and tree-sra. 1. For stmt walking, expander-sra has special behavior
>> for return-stmts, and is also a little special on assign-stmts. And phi
>> stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
>> I'm also wondering if we need a tree structure; it would be useful when
>> checking overlaps, but it is not used now in the expander-sra.
>> 
>> For ipa-sra and tree-sra, I notice that there is some similar code,
>> but of course there are differences. It seems the differences
>> are 'intended', for example: 1. when creating and accessing,
>> 'size != max_size' is acceptable in tree-sra but not for ipa-sra.
>> 2. 'AGGREGATE_TYPE_P' is accepted for some cases in ipa-sra, but
>> is not OK for tree-sra.
>> I'm wondering if those slight differences block re-using the code
>> between ipa-sra and tree-sra.
>> 
>> The expander-sra may be lighter; for example, maybe we can use
>> FOR_EACH_IMM_USE_STMT to check the usage of each parameter, without
>> needing to walk all the stmts.
>
> What I was hoping for is shared stmt-level analysis and a shared
> data structure for the "access"(es) a stmt performs.  Because that
> can come up handy in multiple places.  The existing SRA data
> structures could easily embed that subset, for example, if sharing
> the whole data structure of [IPA] SRA seems too unwieldy.

Understand.
The stmt-level analysis and "access" data structure are similar
between ipa-sra/tree-sra and the expander-sra.

I just updated the patch; this version does not change the behavior of
the previous version.  It only cleans up/merges some functions.
The patch is attached.

This version (and tree-sra/ipa-sra) still uses a similar
"stmt analyze" and "access struct".  This could be extracted as
shared code.
I'm planning to update the code to use the same "base_access" and
"walk function".

>
> With a stmt-leve API using FOR_EACH_IMM_USE_STMT would still be
> possible (though RTL expansion pre-walks all stmts anyway).

Yeah, I also noticed that "FOR_EACH_IMM_USE_STMT" is not enough.
For struct parameters, walking the stmts is needed.
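
To illustrate (hypothetical code), the kind of case the light expander
SRA targets is a small aggregate parameter whose field reads could be
scalarized straight out of the incoming argument registers:

/* p.x and p.y are two "accesses" on the base PARM_DECL p; they could
   map directly to the two incoming registers, avoiding a stack
   temporary.  */
struct pt { long x; long y; };

long
sum (struct pt p)
{
  return p.x + p.y;
}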


BR,
Jeff (Jiufu Guo)

-
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index edf292cfbe9..8c36ad5df79 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -97,6 +97,502 @@ static bool defer_stack_allocation (tree, bool);
 
 static void record_alignment_for_reg_var (unsigned int);
 
+extern rtx
+expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
+
+/* For light SRA in the expander, for parameters and returns.  */
+namespace
+{
+
+struct access
+{
+  /* Each access on the aggregate is about OFFSET/SIZE.  */
+  HOST_WIDE_INT offset;
+  HOST_WIDE_INT size;
+
+  bool writing;
+
+  /* The context expression of this access.  */
+  tree expr;
+
+  /* The rtx for the access: link to incoming/returning register(s).  */
+  rtx rtx_val;
+};
+
+typedef struct access *access_p;
+
+/* Expr (tree) -> Scalarized value (rtx) map.  */
+static hash_map<tree, rtx> *expr_rtx_vec;
+
+/* Base (tree) -> Vector (vec<access_p> *) map.  */
+static hash_map<tree, auto_vec<access_p> > *base_access_vec;
+
+/* If EXPR is an interesting access to an SRA candidate, create and
+   return an access structure for it; return NULL otherwise.  */
+
+static struct access *
+build_access (tree expr, bool write)
+{
+  enum tree_code code = TREE_CODE (expr);
+  if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF
+  && code != ARRAY_REF && code != ARRAY_RANGE_REF)
+return NULL;
+
+  HOST_WIDE_INT offset, size;
+  bool reverse;
+  tree base = get_ref_base_and_extent_hwi (expr, , , );
+  if (!base || !DECL_P (base))
+return NULL;
+
+  vec<access_p> *access_vec = base_access_vec->get (base);
+  if (!access_vec)
+return NULL;
+
+  /* TODO: support reverse. */
+  if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base)))
+{
+  base_access_vec->remove (base);
+  return NULL;
+}
+
+  struct access *access = XNEWVEC (struct access, 1);
+
+  memset (access, 0, sizeof (struct access));
+  access->offset = offset;
+  access->size = size;
+  access->expr = expr;
+  access->writing = write;
+  access->rtx_val = NULL_RTX;
+
+  access_vec->safe_push (access);
+
+  return access;
+}
+
+/* Callback of walk_stmt_load_store_addr_ops visit_base used to remove
+   operands with address taken.  */
+
+static bool
+visit_base (gimple *, tree op, tree, void *)
+{
+  op = get_base_address (op);
+  if (op && DECL_P (op))
+base_access_vec->remove (op);
+
+  return false;
+}
+
+/* Scan function and look for interesting expressions and create access
+   structures for them.  */
+
+static void
+collect_acccesses (void)
+{
+  basic_block bb;
+
+  

[PATCH v1] RISC-V: Support RVV VFSUB and VFRSUB rounding mode intrinsic API

2023-08-01 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to support the rounding mode API for both the
VFSUB and VFRSUB as below samples.

* __riscv_vfsub_vv_f32m1_rm
* __riscv_vfsub_vv_f32m1_rm_m
* __riscv_vfsub_vf_f32m1_rm
* __riscv_vfsub_vf_f32m1_rm_m
* __riscv_vfrsub_vf_f32m1_rm
* __riscv_vfrsub_vf_f32m1_rm_m
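
As a usage illustration, a minimal hedged sketch (the __RISCV_FRM_RNE
enumerator and exact argument order are assumptions based on the
intrinsic naming above, not taken from this patch):

#include "riscv_vector.h"

/* Hypothetical usage: subtract with an explicit round-to-nearest-even
   rounding mode instead of relying on the dynamic frm CSR value.  */
vfloat32m1_t
sub_rne (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
{
  return __riscv_vfsub_vv_f32m1_rm (op1, op2, __RISCV_FRM_RNE, vl);
}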

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class reverse_binop_frm): Add new template for reversed frm.
(vfsub_frm_obj): New obj.
(vfrsub_frm_obj): Likewise.
* config/riscv/riscv-vector-builtins-bases.h:
(vfsub_frm): New declaration.
(vfrsub_frm): Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfsub_frm): New function define.
(vfrsub_frm): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-rsub.c: New test.
* gcc.target/riscv/rvv/base/float-point-single-sub.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 21 +
 .../riscv/riscv-vector-builtins-bases.h   |  2 ++
 .../riscv/riscv-vector-builtins-functions.def |  3 ++
 .../riscv/rvv/base/float-point-single-rsub.c  | 19 
 .../riscv/rvv/base/float-point-single-sub.c   | 30 +++
 5 files changed, 75 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-sub.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 316b35b57c8..035cafc43b3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -298,6 +298,23 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfrsub
+*/
+template <rtx_code CODE>
+class reverse_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+public:
+  rtx expand (function_expander &e) const override
+  {
+return e.use_exact_insn (
+  code_for_pred_reverse_scalar (CODE, e.vector_mode ()));
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2042,7 +2059,9 @@ static CONSTEXPR const vid vid_obj;
 static CONSTEXPR const binop vfadd_obj;
 static CONSTEXPR const binop vfsub_obj;
 static CONSTEXPR const binop_frm vfadd_frm_obj;
+static CONSTEXPR const binop_frm vfsub_frm_obj;
 static CONSTEXPR const reverse_binop vfrsub_obj;
+static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
 static CONSTEXPR const widen_binop vfwadd_obj;
 static CONSTEXPR const widen_binop vfwsub_obj;
 static CONSTEXPR const binop vfmul_obj;
@@ -2269,7 +2288,9 @@ BASE (vid)
 BASE (vfadd)
 BASE (vfadd_frm)
 BASE (vfsub)
+BASE (vfsub_frm)
 BASE (vfrsub)
+BASE (vfrsub_frm)
 BASE (vfwadd)
 BASE (vfwsub)
 BASE (vfmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index e771a36adc8..5c6b239c274 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -144,7 +144,9 @@ extern const function_base *const vid;
 extern const function_base *const vfadd;
 extern const function_base *const vfadd_frm;
 extern const function_base *const vfsub;
+extern const function_base *const vfsub_frm;
 extern const function_base *const vfrsub;
+extern const function_base *const vfrsub_frm;
 extern const function_base *const vfwadd;
 extern const function_base *const vfwsub;
 extern const function_base *const vfmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 035c9e4252f..fa1c2cef970 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -291,6 +291,9 @@ DEF_RVV_FUNCTION (vfsub, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfrsub, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvv_ops)
 DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfsub_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfsub_frm, alu_frm, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfrsub_frm, alu_frm, full_preds, f_vvf_ops)
 
 // 13.3. Vector Widening Floating-Point Add/Subtract Instructions
 DEF_RVV_FUNCTION (vfwadd, widen_alu, full_preds, f_wvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
new file mode 100644
index 000..1d770adc32c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_vfrsub_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
+  return 

Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization

2023-08-01 Thread Robin Dapp via Gcc-patches
>>> I'm not against continuing with the more well-known approach for now
>>> but we should keep in mind that there might still be potential for improvement.
> 
> No. I don't think it's faster.

I did a quick check on my x86 laptop and it's roughly 25% faster there.
That's consistent with the literature.  RISC-V qemu only shows 5-10%
improvement, though.

> I have no idea. I saw ARM SVE generate:
> POP_COUNT
> POP_COUNT
> VEC_PACK_TRUNC.

I'd strongly suspect this happens because it's converting to int.
If you change dst to uint64_t there won't be any vec_pack_trunc.
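
A hedged sketch of that suggestion (the test body is hypothetical):

#include <stdint.h>

/* With a uint64_t destination there is no narrowing back to int, so no
   VEC_PACK_TRUNC should be needed after the popcount.  */
void
popcounts (uint64_t *__restrict dst, uint64_t *__restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}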

> I am gonna drop this patch since it's meaningless.

But why?  It can still help even if we can improve on the sequence.
IMHO you can go ahead with it and just change int -> uint64_t in the
tests.

Regards
 Robin


[PATCH V3] VECT: Support CALL vectorization for COND_LEN_*

2023-08-01 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

Based on the suggestions from Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html

This patch chooses approach (1) that Richard provided, meaning:

RVV implements cond_* optabs as expanders.  RVV therefore supports
both IFN_COND_ADD and IFN_COND_LEN_ADD.  No dummy length arguments
are needed at the gimple level.

Such an approach makes the code much cleaner and more reasonable.

Consider this following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond, 
int n)
{
  for (int i = 0; i < n; i++)
if (cond[i])
  a[i] = b[i] + a[i];
}


Output of RISC-V (32-bits) gcc (trunk) (Compiler #3)
:5:21: missed: couldn't vectorize loop
:5:21: missed: not vectorized: control flow in loop.

ARM SVE:

...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vec_mask_and_55 = loop_mask_49 & mask__27.10_51;
...
vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
vect__6.13_56);

For RVV, we want IR as follows:

...
_68 = .SELECT_VL (ivtmp_66, POLY_INT_CST [4, 4]);
...
mask__27.10_51 = vect__4.9_49 != { 0, ... };
...
vect__9.17_60 = .COND_LEN_ADD (mask__27.10_51, vect__6.13_55, vect__8.16_59, 
vect__6.13_55, _68, 0);
...

Both len and mask of COND_LEN_ADD are real not dummy.
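
As an illustrative sketch of the call case this patch enables (hedged:
the mapping of fmaxf to a COND_LEN_* internal function is an assumption
for illustration, not taken from the patch):

/* Hypothetical example: on a length-controlled target the conditional
   call below could now be vectorized via a COND_LEN_* internal function
   with a real length and mask.  */
void
bar (float *__restrict a, float *__restrict b, int *__restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = __builtin_fmaxf (a[i], b[i]);
}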

This patch has been fully tested on the RISC-V port, which supports both COND_*
and COND_LEN_*.

Also, bootstrap and regression testing on x86 passed.

OK for trunk?

gcc/ChangeLog:

* internal-fn.cc (get_len_internal_fn): New function.
(DEF_INTERNAL_COND_FN): Ditto.
(DEF_INTERNAL_SIGNED_COND_FN): Ditto.
* internal-fn.h (get_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_call): Add CALL auto-vectorization.

---
 gcc/internal-fn.cc | 24 +++
 gcc/internal-fn.h  |  1 +
 gcc/tree-vect-stmts.cc | 90 +-
 3 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8e294286388..7f5ede00c02 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4443,6 +4443,30 @@ get_conditional_internal_fn (internal_fn fn)
 }
 }
 
+/* If there exists an internal function like IFN that operates on vectors,
+   but with additional length and bias parameters, return the internal_fn
+   for that function, otherwise return IFN_LAST.  */
+internal_fn
+get_len_internal_fn (internal_fn fn)
+{
+  switch (fn)
+{
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+#define DEF_INTERNAL_COND_FN(NAME, ...)                                \
+  case IFN_COND_##NAME:                                                \
+    return IFN_COND_LEN_##NAME;
+#define DEF_INTERNAL_SIGNED_COND_FN(NAME, ...)                         \
+  case IFN_COND_##NAME:                                                \
+    return IFN_COND_LEN_##NAME;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_COND_FN
+#undef DEF_INTERNAL_SIGNED_COND_FN
+default:
+  return IFN_LAST;
+}
+}
+
 /* If IFN implements the conditional form of an unconditional internal
function, return that unconditional function, otherwise return IFN_LAST.  */
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index a5c3f4765ff..410c1b623d6 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -224,6 +224,7 @@ extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
 extern internal_fn get_conditional_internal_fn (internal_fn);
+extern internal_fn get_len_internal_fn (internal_fn);
 extern internal_fn get_conditional_len_internal_fn (tree_code);
 extern tree_code conditional_internal_fn_code (internal_fn);
 extern internal_fn get_unconditional_internal_fn (internal_fn);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6a4e8fce126..97106b8c475 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3540,7 +3540,10 @@ vectorizable_call (vec_info *vinfo,
 
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   internal_fn cond_fn = get_conditional_internal_fn (ifn);
+  internal_fn cond_len_fn = get_len_internal_fn (ifn);
+  int len_opno = internal_fn_len_index (cond_len_fn);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
+  vec_loop_lens *lens = (loop_vinfo ? &LOOP_VINFO_LENS (loop_vinfo) : NULL);
   if (!vec_stmt) /* transformation not required.  */
 {
   if (slp_node)
@@ -3569,6 +3572,9 @@ vectorizable_call (vec_info *vinfo,
  if (reduc_idx >= 0
  && (cond_fn == IFN_LAST
  || !direct_internal_fn_supported_p (cond_fn, vectype_out,
+ OPTIMIZE_FOR_SPEED))
+ && (cond_len_fn == IFN_LAST
+ || !direct_internal_fn_supported_p (cond_len_fn, vectype_out,
  OPTIMIZE_FOR_SPEED)))
{
  if (dump_enabled_p ())
@@ -3586,8 +3592,14 

Re: [PATCH V2] [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-01 Thread Jeff Law via Gcc-patches




On 7/29/23 03:13, Xiao Zeng wrote:

This patch recognizes Zicond patterns for a select pattern whose
condition is eq or neq to 0 (using eq as an example), namely:

1 rd = (rs2 == 0) ? non-imm : 0
2 rd = (rs2 == 0) ? non-imm : non-imm
3 rd = (rs2 == 0) ? reg : non-imm
4 rd = (rs2 == 0) ? reg : reg
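
For reference, a hedged C sketch of sources that roughly map onto the
four cases above (function names and the constants 5/7 are hypothetical):

long f1 (long c)                   { return c == 0 ? 5 : 0; }   /* case 1 */
long f2 (long c)                   { return c == 0 ? 5 : 7; }   /* case 2 */
long f3 (long c, long r)           { return c == 0 ? r : 7; }   /* case 3 */
long f4 (long c, long r1, long r2) { return c == 0 ? r1 : r2; } /* case 4 */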

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize
 Zicond patterns.
 * config/riscv/riscv.md: Recognize Zicond patterns through movcc.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: New test.
 * gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: New 
test.
 * gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: New 
test.
 * gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: New 
test.
---
  gcc/config/riscv/riscv.cc | 144 ++
  gcc/config/riscv/riscv.md |   4 +-
  .../zicond-primitiveSemantics_return_0_imm.c  |  65 
  ...zicond-primitiveSemantics_return_imm_imm.c |  73 +
  ...zicond-primitiveSemantics_return_imm_reg.c |  65 
  ...zicond-primitiveSemantics_return_reg_reg.c |  65 
  6 files changed, 414 insertions(+), 2 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 941ea25e1f2..6ac39f63dd7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3516,6 +3516,150 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
  cond, cons, alt)));
return true;
  }
+  else if (TARGET_ZICOND
+   && (code == EQ || code == NE)
+   && GET_MODE_CLASS (mode) == MODE_INT)
+{
+  need_eq_ne_p = true;
+  /* 0 + imm  */
+  if (GET_CODE (cons) == CONST_INT && cons == const0_rtx
+  && GET_CODE (alt) == CONST_INT && alt != const0_rtx)
A couple nits.  Rather than test the GET_CODE (object) == CONST_INT, 
instead use CONST_INT_P (object).


Rather than using const0_rtx, use CONST0_RTX (mode).  That makes it more 
general.





+{
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
Might as well use "true" rather than "need_eq_ne_p" here and for the 
other calls in your new code.



+  /* imm + imm  */
+  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
+   && GET_CODE (alt) == CONST_INT && alt != const0_rtx)

So same comments on using CONST_INT_P and CONST0_RTX

+{
+  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  rtx reg = gen_reg_rtx (mode);
+  rtx temp = GEN_INT (INTVAL (alt) - INTVAL (cons));
+  emit_insn (gen_rtx_SET (reg, temp));
Use force_reg here rather than directly emitting the insn to initialize 
"reg".  What you're doing works when the difference is small but will 
not work when the difference does not fit into a signed 12bit value.



+  /* imm + reg  */
+  else if (GET_CODE (cons) == CONST_INT && cons != const0_rtx
+   && GET_CODE (alt) == REG)
Same comments about CONST_INT_P and CONST0_RTX.  And instead of using 
GET_CODE (object) == REG, use REG_P (object).




+{
+  /* Optimize for register value of 0.  */
+  if (op0 == alt && op1 == const0_rtx)
+{
+  rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
+  cons = force_reg (mode, cons);
+  emit_insn (gen_rtx_SET (dest, gen_rtx_IF_THEN_ELSE (mode, cond,
+  cons, alt)));
+  return true;
+}

Isn't this only valid for NE?  Also use CONST0_RTX in that test.



+  /* Handle the special situation of: -2048 == INTVAL (alt)
+ to avoid failure due to an unrecognized insn. Let the costing
+ model determine if the conditional move sequence is better
+ than the branching sequence.  */
+  if (-2048 == INTVAL (cons))
So instead of checking for explicit values, we have SMALL_OPERAND to do 
it for us.  Also note that for !SMALL_OPERAND we can just force the 
value into a register using "force_reg" and all the right things will 
happen.


So just add something like this to your original code:

  if (!SMALL_OPERAND (INTVAL (cons)))
    cons = force_reg (mode, cons);

That will result in CONS being either a simple integer constant (when it 
is suitable for addi) or a register (all other cases).  At that point 

RE: [PATCH] x86: fold two of vec_dupv2df's alternatives

2023-08-01 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, August 1, 2023 1:49 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Kirill Yukhin
> 
> Subject: [PATCH] x86: fold two of vec_dupv2df's alternatives
> 
> By using Yvm in the source, both can be expressed in one.
> 
> gcc/
> 
>   * sse.md (vec_dupv2df): Fold the middle two of the
>   alternatives.
Ok, thanks.
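
As a hedged illustration (hypothetical source) of what vec_dupv2df
matches - a scalar duplicated into both lanes of a V2DF:

typedef double v2df __attribute__ ((vector_size (16)));

/* Both lanes take the same scalar; with SSE3+ this can expand to
   movddup, matching the vec_duplicate:V2DF pattern in sse.md.  */
v2df
dup (double x)
{
  return (v2df) { x, x };
}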
> 
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -13784,21 +13784,20 @@
> (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
> 
>  (define_insn "vec_dupv2df"
> -  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,v")
> +  [(set (match_operand:V2DF 0 "register_operand" "=x,v,v")
>   (vec_duplicate:V2DF
> -   (match_operand:DF 1 "nonimmediate_operand" "0,xm,vm,vm")))]
> +   (match_operand:DF 1 "nonimmediate_operand" "0,Yvm,vm")))]
>"TARGET_SSE2"
>"@
> unpcklpd\t%0, %0
> %vmovddup\t{%1, %0|%0, %1}
> -   vmovddup\t{%1, %0|%0, %1}
> vbroadcastsd\t{%1, }%g0{|, %1}"
> -  [(set_attr "isa" "noavx,sse3,avx512vl,*")
> -   (set_attr "type" "sselog1,ssemov,ssemov,ssemov")
> -   (set_attr "prefix" "orig,maybe_vex,evex,evex")
> -   (set_attr "mode" "V2DF,DF,DF,V8DF")
> +  [(set_attr "isa" "noavx,sse3,*")
> +   (set_attr "type" "sselog1,ssemov,ssemov")
> +   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "mode" "V2DF,DF,V8DF")
> (set (attr "enabled")
> - (cond [(eq_attr "alternative" "3")
> + (cond [(eq_attr "alternative" "2")
>(symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
> && !TARGET_PREFER_AVX256")
>  (match_test "")