Re: [committed] i386: Fix grammar typo in diagnostic

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 7:28 AM Hongtao Liu  wrote:
>
> On Tue, Aug 8, 2023 at 5:22 AM Marek Polacek via Libstdc++
>  wrote:
> >
> > On Mon, Aug 07, 2023 at 10:12:35PM +0100, Jonathan Wakely via Gcc-patches 
> > wrote:
> > > Committed as obvious.
> > >
> > > Less obvious (to me) is whether it's correct to say "GCC V13" here. I
> > > don't think we refer to a version that way anywhere else, do we?
> > >
> > > Would "since GCC 13.1.0" be better?
> >
> > x86_field_alignment uses
> >
> >   inform (input_location, "the alignment of %<_Atomic %T%> "
> >   "fields changed in %{GCC 11.1%}",
> >
> > so maybe the below should use %{GCC 13.1%}.  "GCC V13" looks unusual
> > to me.
>  %{GCC 13.1%} sounds reasonable.
looks like %{ can't be used in const char *, so use %<GCC 13.1%> instead.

How about:

Author: liuhongt 
Date:   Wed Aug 23 07:31:13 2023 +0800

Adjust GCC V13 to GCC 13.1 in diagnostic.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_invalid_conversion): Adjust GCC
V13 to GCC 13.1.

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index e7822ef6500..88d9d7d537f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22899,7 +22899,7 @@ ix86_invalid_conversion (const_tree fromtype,
const_tree totype)
  || (TYPE_MODE (totype) == BFmode
  && TYPE_MODE (fromtype) == HImode))
warning (0, "%<__bfloat16%> is redefined from typedef % "
-   "to real %<__bf16%> since GCC V13, be careful of "
+   "to real %<__bf16%> since %<GCC 13.1%>, be careful of "
 "implicit conversion between %<__bf16%> and %; "
 "an explicit bitcast may be needed here");
 }


> >
> > > -- >8 --
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/i386/i386.cc (ix86_invalid_conversion): Fix grammar.
> > > ---
> > >  gcc/config/i386/i386.cc | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 50860050049..5d57726e22c 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -22890,7 +22890,7 @@ ix86_invalid_conversion (const_tree fromtype, 
> > > const_tree totype)
> > >   warning (0, "%<__bfloat16%> is redefined from typedef % "
> > >   "to real %<__bf16%> since GCC V13, be careful of "
> > >"implicit conversion between %<__bf16%> and %; "
> > > -  "a explicit bitcast may be needed here");
> > > +  "an explicit bitcast may be needed here");
> > >  }
> > >
> > >/* Conversion allowed.  */
> > > --
> > > 2.41.0
> > >
> >
> > Marek
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH 1/3] vect: Remove some manual release in vectorizable_store

2023-08-22 Thread Kewen.Lin via Gcc-patches
on 2023/8/22 20:32, Richard Biener wrote:
> On Tue, Aug 22, 2023 at 10:45 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> To avoid some duplicates in some follow-up patches on
>> function vectorizable_store, this patch is to adjust some
>> existing vec with auto_vec and remove some manual release
>> invocation.  Also refactor a bit and remove some useless
>> codes.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
> 
> OK.

Thanks Richi, pushed this as r14-3402, the other two as r14-3403
and r14-3404.

BR,
Kewen



[PATCH V1 1/2] light expander sra v0

2023-08-22 Thread Jiufu Guo via Gcc-patches


Hi,

I just updated the patch.  We could review this one.

Compared with the previous patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627287.html
This version:
* Supports bitfield access from one register.
* Allows returning scalar registers cleaned via constructor.

Bootstrapped and regtested on x86_64-redhat-linux, and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?


PR target/65421
PR target/69143

gcc/ChangeLog:

* cfgexpand.cc (extract_bit_field): Extern declare.
(struct access): New class.
(struct expand_sra): New class.
(expand_sra::build_access): New member function.
(expand_sra::visit_base): Likewise.
(expand_sra::analyze_default_stmt): Likewise.
(expand_sra::analyze_assign): Likewise.
(expand_sra::add_sra_candidate): Likewise.
(expand_sra::collect_sra_candidates): Likewise.
(expand_sra::valid_scalariable_accesses): Likewise.
(expand_sra::prepare_expander_sra): Likewise.
(expand_sra::expand_sra): Class constructor.
(expand_sra::~expand_sra): Class destructor.
(expand_sra::get_scalarized_rtx): New member function.
(extract_one_reg): New function.
(extract_bitfield): New function.
(expand_sra::scalarize_access): New member function.
(expand_sra::scalarize_accesses): New member function.
(get_scalar_rtx_for_aggregate_expr): New function.
(set_scalar_rtx_for_aggregate_access): New function.
(set_scalar_rtx_for_returns): New function.
(expand_return): Call get_scalar_rtx_for_aggregate_expr.
(expand_debug_expr): Call get_scalar_rtx_for_aggregate_expr.
(pass_expand::execute): Update to use the expand_sra.
* expr.cc (get_scalar_rtx_for_aggregate_expr): Extern declare.
(expand_assignment): Call get_scalar_rtx_for_aggregate_expr.
(expand_expr_real): Call get_scalar_rtx_for_aggregate_expr.
* function.cc (set_scalar_rtx_for_aggregate_access):  Extern declare.
(set_scalar_rtx_for_returns): Extern declare.
(assign_parm_setup_block): Call set_scalar_rtx_for_aggregate_access.
(assign_parms): Call set_scalar_rtx_for_aggregate_access. 
(expand_function_start): Call set_scalar_rtx_for_returns.
* tree-sra.h (struct base_access): New class.
(struct default_analyzer): New class.
(scan_function): New function template.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr102024.C: Updated.
* gcc.target/powerpc/pr108073.c: New test.
* gcc.target/powerpc/pr65421-1.c: New test.
* gcc.target/powerpc/pr65421-2.c: New test.

---
 gcc/cfgexpand.cc | 474 ++-
 gcc/expr.cc  |  29 +-
 gcc/function.cc  |  28 +-
 gcc/tree-sra.h   |  77 +++
 gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
 8 files changed, 668 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 
edf292cfbe95ac2711faee7769e839cb4edb0dd3..385b6c781aa2805e7ca40293a0ae84f87e23e0b6
 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "builtins.h"
 #include "opts.h"
+#include "tree-sra.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these
cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -97,6 +98,468 @@ static bool defer_stack_allocation (tree, bool);
 
 static void record_alignment_for_reg_var (unsigned int);
 
+extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
+ machine_mode, machine_mode, bool, rtx *);
+
+/* For light SRA in the expander for parameters and returns.  */
+struct access : public base_access
+{
+  /* The rtx for the access: link to incoming/returning register(s).  */
+  rtx rtx_val;
+};
+
+typedef struct access *access_p;
+
+struct expand_sra : public default_analyzer
+{
+  expand_sra ();
+  ~expand_sra ();
+
+  /* Now use default APIs, no actions for
+ pre_analyze_stmt, analyze_return.  */
+
+  /* Override analyze_default_stmt.  */
+  void analyze_default_stmt (gimple *);
+
+  /* Override analyze_phi, analyze_call and analyze_asm.  */
+  void analyze_phi (gphi *stmt) { analyze_default_stmt (stmt); };
+  void analyze_call (gcall *stmt) { analyze_default_stmt (stmt); };
+  void analyze_asm (gasm *stmt) { analyze_default_stmt (stmt); };
+  /* Override analyze_assign.  */
+  void analyze_assign (gassign *);
+
+  /* 

[PATCH] Fix target_clone ("arch=graniterapids-d") and target_clone ("arch=arrowlake-s")

2023-08-22 Thread liuhongt via Gcc-patches
Both "graniterapids-d" and "graniterapids" are attached to
PROCESSOR_GRANITERAPIDS in processor_alias_table but mapped to
different __cpu_subtype in get_intel_cpu.

And get_builtin_code_for_version will try to match the first
PROCESSOR_GRANITERAPIDS in processor_alias_table, which maps to
"graniterapids" here.

1861  else if (new_target->arch_specified && new_target->arch > 0)
1862for (i = 0; i < pta_size; i++)
1863  if (processor_alias_table[i].processor == new_target->arch)
1864{
1866  const pta *arch_info = &processor_alias_table[i];
1866  switch (arch_info->priority)
1867{
1868default:
1869  arg_str = arch_info->name;

This mismatch makes dispatch_function_versions check the predicate
of __builtin_cpu_is ("graniterapids") for "graniterapids-d" and causes
the issue.
The patch explicitly adds PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D to make a distinction.

For "alderlake","raptorlake", "meteorlake" they share same isa, cost,
tuning, and mapped to the same __cpu_type/__cpu_subtype in
get_intel_cpu, so no need to add PROCESSOR_RAPTORLAKE and others.


Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk (and backport the graniterapids-d part to GCC 13)?

gcc/ChangeLog:

* common/config/i386/i386-common.cc (processor_names): Add new
members graniterapids-d and arrowlake-s.
* config/i386/i386-options.cc (processor_alias_table): Update
table with PROCESSOR_ARROWLAKE_S and
PROCESSOR_GRANITERAPIDS_D.
(m_GRANITERAPIDS_D): New macro.
(m_ARROWLAKE_S): Ditto.
(m_CORE_AVX512): Add m_GRANITERAPIDS_D.
(processor_cost_table): Add icelake_cost for
PROCESSOR_GRANITERAPIDS_D and alderlake_cost for
PROCESSOR_ARROWLAKE_S.
* config/i386/x86-tune.def: Handle m_ARROWLAKE_S the same as
m_ARROWLAKE.
* config/i386/i386.h (enum processor_type): Add new member
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
* config/i386/i386-c.cc (ix86_target_macros_internal): Handle
PROCESSOR_GRANITERAPIDS_D and PROCESSOR_ARROWLAKE_S.
---
 gcc/common/config/i386/i386-common.cc | 11 +++--
 gcc/config/i386/i386-c.cc | 15 +++
 gcc/config/i386/i386-options.cc   |  6 ++-
 gcc/config/i386/i386.h|  4 +-
 gcc/config/i386/x86-tune.def  | 63 ++-
 5 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 12a01704a73..1e11163004b 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2155,7 +2155,9 @@ const char *const processor_names[] =
   "alderlake",
   "rocketlake",
   "graniterapids",
+  "graniterapids-d",
   "arrowlake",
+  "arrowlake-s",
   "intel",
   "lujiazui",
   "geode",
@@ -2279,13 +2281,14 @@ const pta processor_alias_table[] =
 M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
 M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
-  {"graniterapids-d", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, 
PTA_GRANITERAPIDS_D,
-M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D), P_PROC_AVX512F},
+  {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL,
+PTA_GRANITERAPIDS_D, M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
+P_PROC_AVX512F},
   {"arrowlake", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE,
 M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE), P_PROC_AVX2},
-  {"arrowlake-s", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE_S,
+  {"arrowlake-s", PROCESSOR_ARROWLAKE_S, CPU_HASWELL, PTA_ARROWLAKE_S,
 M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE_S), P_PROC_AVX2},
-  {"lunarlake", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE_S,
+  {"lunarlake", PROCESSOR_ARROWLAKE_S, CPU_HASWELL, PTA_ARROWLAKE_S,
 M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE_S), P_PROC_AVX2},
   {"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
 M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index caef5531593..0e11709ebc5 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -258,6 +258,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
   def_or_undef (parse_in, "__graniterapids");
   def_or_undef (parse_in, "__graniterapids__");
   break;
+case PROCESSOR_GRANITERAPIDS_D:
+  def_or_undef (parse_in, "__graniterapids_d");
+  def_or_undef (parse_in, "__graniterapids_d__");
+  break;
 case PROCESSOR_ALDERLAKE:
   def_or_undef (parse_in, "__alderlake");
   def_or_undef (parse_in, "__alderlake__");
@@ -270,6 +274,11 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
   def_or_undef (parse_in, "__arrowlake");
   def_or_undef (parse_in, "__arrowlake__");
   break;
+case PROCESSOR_ARROWLAKE_S:
+  

Re: [PATCH] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-22 Thread Lehua Ding

Hi Robin,

Thanks for these nice comments!


-  emit_insn (gen_vcond_mask (vmode, vmode, d->target, d->op0, d->op1, mask));
+  /* Swap op0 and op1 since the order is opposite to pred_merge.  */
+  rtx ops2[] = {d->target, d->op1, d->op0, mask};
+  emit_vlmax_merge_insn (code_for_pred_merge (vmode), 
riscv_vector::RVV_MERGE_OP, ops2);
return true;
  }


This seems a separate, general fix that just surfaced in the course of
this patch?  Would be nice to have this factored out but as we already have
it, no need I guess.


Yes, since I changed @vcond_mask_ from define_expand to
define_insn_and_split. If I didn't change it, then I would need to manually
make sure that d->target, d->op1 and d->op0 satisfy the predicates of the
@vcond_mask pattern (the vregs pass will check it, so a mem operand must be
forbidden). If I use emit_vlmax_merge_insn directly, it uses expand_insn
internally, which automatically converts the operands for me to make them
satisfy the predicate conditions. This is one difference between gen_xxx and
expand_insn. And I think calling emit_vlmax_merge_insn to generate
pred_merge is the most appropriate and uniform way.



+  if (is_dummy_mask)
+{
+  /* Use TU, MASK ANY policy.  */
+  if (needs_fp_rounding (code, mode))
+   emit_nonvlmax_fp_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
+  else
+   emit_nonvlmax_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
+}


We have quite a bit of code duplication across the expand_cond_len functions
now (binop, ternop, unop).  Not particular to your patch but I'd suggest to
unify this later.


Indeed, leave it to me and I'll send another patch later to reduce this 
duplicate code.





+TEST_ALL (DEF_LOOP)
+
+/* NOTE: int abs operator is converted to vmslt + vneg.v */
+/* { dg-final { scan-assembler-times {\tvneg\.v\tv[0-9]+,v[0-9]+,v0\.t} 12 { xfail { 
any-opts "--param riscv-autovec-lmul=m2" } } } } */


Why does this fail with LMUL == 2 (also in the following tests)?  A comment
would be nice here.


This is because the iteration count of 5 in the testcase causes GCC to
remove the loop and turn it into two basic blocks, resulting in a
doubling of the number of vnegs. I'm going to modify the iteration count
(it should be big enough that this doesn't happen even with LMUL=m8) so
that it doesn't trigger that optimization.


V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628210.html

--
Best,
Lehua



[PATCH V2] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-22 Thread Lehua Ding
V2 changes:

1. Removed the xfail.
2. Testcase file naming harmonized with existing tests.

---

Hi,

This patch adds conditional unary neg/abs/not autovec patterns to the RISC-V
backend.  For this C code:

void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
  for (int i = 0; i < n; i += 1)
{
  a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
}
}

Before this patch:
...
vsetvli a7,zero,e32,m1,ta,ma
vfabs.v v2,v2
vmerge.vvm  v1,v1,v2,v0
...

After this patch:
...
vsetvli a7,zero,e32,m1,ta,mu
vfabs.v v1,v2,v0.t
...

For the int neg/not and FP neg patterns, defining the corresponding cond_xxx
patterns is enough.
For the FP abs pattern, we need to change the definition of the `abs2` and
`@vcond_mask_` patterns from define_expand to define_insn_and_split
in order to fuse them into a new pattern `*cond_abs` in the combine pass.
A fusion process similar to the one below:

(insn 30 29 31 4 (set (reg:RVVM1SF 152 [ vect_iftmp.15 ])
(abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))) "float.c":15:56 discrim 
1 12799 {absrvvm1sf2}
 (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
(nil)))

(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
(if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
(reg:RVVM1SF 152 [ vect_iftmp.15 ])
(reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 12707 
{vcond_mask_rvvm1sfrvvmf32bi}
 (expr_list:REG_DEAD (reg:RVVM1SF 152 [ vect_iftmp.15 ])
(expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
(expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
(nil)
==>

(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
(if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
(abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))
(reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 13444 {*cond_absrvvm1sf}
 (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
(expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
(expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
(nil)

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_abs): New combine pattern.
(*copysign_neg): Ditto.
* config/riscv/autovec.md (@vcond_mask_): Adjust.
(2): Ditto.
(cond_): New.
(cond_len_): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New.
(expand_cond_len_unop): New helper func.
* config/riscv/riscv-v.cc (shuffle_merge_patterns): Adjust.
(expand_cond_len_unop): New helper func.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.
---
 gcc/config/riscv/autovec-opt.md   | 39 
 gcc/config/riscv/autovec.md   | 97 +--
 gcc/config/riscv/riscv-protos.h   |  7 +-
 gcc/config/riscv/riscv-v.cc   | 56 ++-
 .../riscv/rvv/autovec/cond/cond_unary-1.c | 43 
 .../riscv/rvv/autovec/cond/cond_unary-2.c | 46 +
 .../riscv/rvv/autovec/cond/cond_unary-3.c | 43 
 .../riscv/rvv/autovec/cond/cond_unary-4.c | 43 
 .../riscv/rvv/autovec/cond/cond_unary-5.c | 36 +++
 .../riscv/rvv/autovec/cond/cond_unary-6.c | 39 
 .../riscv/rvv/autovec/cond/cond_unary-7.c | 36 +++
 .../riscv/rvv/autovec/cond/cond_unary-8.c | 36 +++
 .../riscv/rvv/autovec/cond/cond_unary_run-1.c | 27 ++
 .../riscv/rvv/autovec/cond/cond_unary_run-2.c | 28 ++
 .../riscv/rvv/autovec/cond/cond_unary_run-3.c | 27 ++
 .../riscv/rvv/autovec/cond/cond_unary_run-4.c | 27 ++
 .../riscv/rvv/autovec/cond/cond_unary_run-5.c | 26 +
 .../riscv/rvv/autovec/cond/cond_unary_run-6.c | 27 ++
 .../riscv/rvv/autovec/cond/cond_unary_run-7.c | 26 +
 

[PATCH v2] libffi: Backport of LoongArch support for libffi.

2023-08-22 Thread Lulu Cheng
v1 -> v2:
  Modify the changelog information and add PR libffi/108682.
  

This is a backport of ,
and contains modifications to commit 5a4774cd4d, as well as the LoongArch
schema portion of commit ee22ecbd11. This is needed for libgo.

libffi/ChangeLog:

PR libffi/108682
* configure.host: Add LoongArch support.
* Makefile.am: Likewise.
* Makefile.in: Regenerate.
* src/loongarch64/ffi.c: New file.
* src/loongarch64/ffitarget.h: New file.
* src/loongarch64/sysv.S: New file.
---
 libffi/Makefile.am |   4 +-
 libffi/Makefile.in |  25 +-
 libffi/configure.host  |   5 +
 libffi/src/loongarch64/ffi.c   | 621 +
 libffi/src/loongarch64/ffitarget.h |  82 
 libffi/src/loongarch64/sysv.S  | 327 +++
 6 files changed, 1058 insertions(+), 6 deletions(-)
 create mode 100644 libffi/src/loongarch64/ffi.c
 create mode 100644 libffi/src/loongarch64/ffitarget.h
 create mode 100644 libffi/src/loongarch64/sysv.S

diff --git a/libffi/Makefile.am b/libffi/Makefile.am
index c6d6f849c53..2259ddb75f9 100644
--- a/libffi/Makefile.am
+++ b/libffi/Makefile.am
@@ -139,7 +139,7 @@ noinst_HEADERS = src/aarch64/ffitarget.h 
src/aarch64/internal.h \
src/sparc/internal.h src/tile/ffitarget.h src/vax/ffitarget.h   \
src/x86/ffitarget.h src/x86/internal.h src/x86/internal64.h \
src/x86/asmnames.h src/xtensa/ffitarget.h src/dlmalloc.c\
-   src/kvx/ffitarget.h
+   src/kvx/ffitarget.h src/loongarch64/ffitarget.h
 
 EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c src/aarch64/sysv.S \
src/aarch64/win64_armasm.S src/alpha/ffi.c src/alpha/osf.S  \
@@ -169,7 +169,7 @@ EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c 
src/aarch64/sysv.S  \
src/x86/ffiw64.c src/x86/win64.S src/x86/ffi64.c\
src/x86/unix64.S src/x86/sysv_intel.S src/x86/win64_intel.S \
src/xtensa/ffi.c src/xtensa/sysv.S src/kvx/ffi.c\
-   src/kvx/sysv.S
+   src/kvx/sysv.S src/loongarch64/ffi.c src/loongarch64/sysv.S
 
 TARGET_OBJ = @TARGET_OBJ@
 libffi_la_LIBADD = $(TARGET_OBJ)
diff --git a/libffi/Makefile.in b/libffi/Makefile.in
index 5524a6a571e..1d936b5c8a5 100644
--- a/libffi/Makefile.in
+++ b/libffi/Makefile.in
@@ -550,7 +550,7 @@ noinst_HEADERS = src/aarch64/ffitarget.h 
src/aarch64/internal.h \
src/sparc/internal.h src/tile/ffitarget.h src/vax/ffitarget.h   \
src/x86/ffitarget.h src/x86/internal.h src/x86/internal64.h \
src/x86/asmnames.h src/xtensa/ffitarget.h src/dlmalloc.c\
-   src/kvx/ffitarget.h
+   src/kvx/ffitarget.h src/loongarch64/ffitarget.h
 
 EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c src/aarch64/sysv.S \
src/aarch64/win64_armasm.S src/alpha/ffi.c src/alpha/osf.S  \
@@ -580,7 +580,7 @@ EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c 
src/aarch64/sysv.S  \
src/x86/ffiw64.c src/x86/win64.S src/x86/ffi64.c\
src/x86/unix64.S src/x86/sysv_intel.S src/x86/win64_intel.S \
src/xtensa/ffi.c src/xtensa/sysv.S src/kvx/ffi.c\
-   src/kvx/sysv.S
+   src/kvx/sysv.S src/loongarch64/ffi.c src/loongarch64/sysv.S
 
 libffi_la_LIBADD = $(TARGET_OBJ)
 libffi_convenience_la_SOURCES = $(libffi_la_SOURCES)
@@ -1074,6 +1074,16 @@ src/kvx/ffi.lo: src/kvx/$(am__dirstamp) \
src/kvx/$(DEPDIR)/$(am__dirstamp)
 src/kvx/sysv.lo: src/kvx/$(am__dirstamp) \
src/kvx/$(DEPDIR)/$(am__dirstamp)
+src/loongarch64/$(am__dirstamp):
+   @$(MKDIR_P) src/loongarch64
+   @: > src/loongarch64/$(am__dirstamp)
+src/loongarch64/$(DEPDIR)/$(am__dirstamp):
+   @$(MKDIR_P) src/loongarch64/$(DEPDIR)
+   @: > src/loongarch64/$(DEPDIR)/$(am__dirstamp)
+src/loongarch64/ffi.lo: src/loongarch64/$(am__dirstamp) \
+   src/loongarch64/$(DEPDIR)/$(am__dirstamp)
+src/loongarch64/sysv.lo: src/loongarch64/$(am__dirstamp) \
+   src/loongarch64/$(DEPDIR)/$(am__dirstamp)
 
 libffi.la: $(libffi_la_OBJECTS) $(libffi_la_DEPENDENCIES) 
$(EXTRA_libffi_la_DEPENDENCIES) 
$(AM_V_CCLD)$(libffi_la_LINK) -rpath $(toolexeclibdir) 
$(libffi_la_OBJECTS) $(libffi_la_LIBADD) $(LIBS)
@@ -1107,6 +1117,8 @@ mostlyclean-compile:
-rm -f src/ia64/*.lo
-rm -f src/kvx/*.$(OBJEXT)
-rm -f src/kvx/*.lo
+   -rm -f src/loongarch64/*.$(OBJEXT)
+   -rm -f src/loongarch64/*.lo
-rm -f src/m32r/*.$(OBJEXT)
-rm -f src/m32r/*.lo
-rm -f src/m68k/*.$(OBJEXT)
@@ -1182,6 +1194,8 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@src/ia64/$(DEPDIR)/unix.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@src/kvx/$(DEPDIR)/ffi.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@src/kvx/$(DEPDIR)/sysv.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ 

[PATCH] RISC-V: Fix potential ICE of global vsetvl elimination

2023-08-22 Thread Juzhe-Zhong
Committed ahead of the following VSETVL refactor patch to make the V2 patch easier to review.
gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc 
(pass_vsetvl::global_eliminate_vsetvl_insn): Fix potential ICE.

---
 gcc/config/riscv/riscv-vsetvl.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index ec1aaa4b442..f7558cad2e2 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -4383,7 +4383,7 @@ pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info 
*bb) const
 
   unsigned int bb_index;
   sbitmap_iterator sbi;
-  rtx avl = get_avl (dem.get_insn ()->rtl ());
+  rtx avl = dem.get_avl ();
   hash_set sets
 = get_all_sets (dem.get_avl_source (), true, false, false);
   /* Condition 2: All VL/VTYPE available in are all compatible.  */
@@ -4407,7 +4407,10 @@ pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info 
*bb) const
 {
   sbitmap avout = m_vector_manager->vector_avout[e->src->index];
   if (e->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
- || e->src == EXIT_BLOCK_PTR_FOR_FN (cfun) || bitmap_empty_p (avout))
+ || e->src == EXIT_BLOCK_PTR_FOR_FN (cfun)
+ || (unsigned int) e->src->index
+  >= m_vector_manager->vector_block_infos.length ()
+ || bitmap_empty_p (avout))
return false;
 
   EXECUTE_IF_SET_IN_BITMAP (avout, 0, bb_index, sbi)
-- 
2.36.3



[PATCH] RISC-V: Fix VTYPE fuse rule bug

2023-08-22 Thread Juzhe-Zhong
This bug is exposed by the refactor patch.
Separated it out and committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (ge_sew_ratio_unavailable_p): Fix fuse 
rule bug.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 10 --
 gcc/config/riscv/riscv-vsetvl.def |  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 819a3918b3e..ec1aaa4b442 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1423,8 +1423,14 @@ static bool
 ge_sew_ratio_unavailable_p (const vector_insn_info ,
const vector_insn_info )
 {
-  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
-return info1.get_sew () < info2.get_sew ();
+  if (!info2.demand_p (DEMAND_LMUL))
+{
+  if (info2.demand_p (DEMAND_GE_SEW))
+   return info1.get_sew () < info2.get_sew ();
+  /* Demand GE_SEW should be available for non-demand SEW.  */
+  else if (!info2.demand_p (DEMAND_SEW))
+   return false;
+}
   return true;
 }
 
diff --git a/gcc/config/riscv/riscv-vsetvl.def 
b/gcc/config/riscv/riscv-vsetvl.def
index 7a73149f1da..7289c01efcf 100644
--- a/gcc/config/riscv/riscv-vsetvl.def
+++ b/gcc/config/riscv/riscv-vsetvl.def
@@ -319,7 +319,7 @@ DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ 
DEMAND_FALSE,
/*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE,
/*NEW_DEMAND_SEW*/ true,
/*NEW_DEMAND_LMUL*/ false,
-   /*NEW_DEMAND_RATIO*/ false,
+   /*NEW_DEMAND_RATIO*/ true,
/*NEW_DEMAND_GE_SEW*/ true, first_sew,
vlmul_for_first_sew_second_ratio, second_ratio)
 DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE,
-- 
2.36.3



[PATCH] RISC-V: Fix gather_load_run-12.c test

2023-08-22 Thread Juzhe-Zhong
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c: Add 
vsetvli asm.

---
 .../riscv/rvv/autovec/gather-scatter/gather_load_run-12.c   | 6 ++
 1 file changed, 6 insertions(+)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
index b4e2ead8ca9..2fb525d8ffc 100644
--- 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
+++ 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c
@@ -7,6 +7,12 @@
 int
 main (void)
 {
+  /* FIXME: The purpose of this assembly is to ensure that the vtype register
+ is initialized before instructions such as vmv1r.v are executed. Otherwise
+ you will get illegal instruction errors when running with spike+pk. This is
+ an interim solution to reduce unnecessary failures; a unified solution
+ will come later. */
+  asm volatile("vsetivli x0, 0, e8, m1, ta, ma");
 #define RUN_LOOP(DATA_TYPE, INDEX_TYPE)
\
   DATA_TYPE dest_##DATA_TYPE##_##INDEX_TYPE[202] = {0};
\
   DATA_TYPE src_##DATA_TYPE##_##INDEX_TYPE[202] = {0}; 
\
-- 
2.36.3



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 23, 2023 at 9:58 AM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: Jakub Jelinek 
> > Sent: Tuesday, August 22, 2023 11:02 PM
> > To: Hongtao Liu 
> > Cc: Richard Biener ; Jiang, Haochen
> > ; ZiNgA BuRgA ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > > Let's assume there's no detla now, AVX10.1-512 is equal to
> > > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > > other stuff.
> > > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* 
> > > > would be
> > > > like now, except that the current AVX512* sets imply also 
> > > > EVEX512/whatever
> > > > it will be called, that option itself enables nothing (or 
> > > > TARGET_AVX512F),
> > > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > > EVEX512)
> > > If this is your assumption, yes, there's no need for TARGET_AVX10_1.
>
> I think we still need that since currently, w/o AVX512VL, we will not only
> enable 512-bit vector instructions but also scalar instructions, which
> means when it comes to -mavx512bw -mno-evex512, we should still enable
> the scalar functions.
>
> And scalar functions will also be enabled in AVX10.1-256, so we need
> something to distinguish them from the ISA set w/o AVX512VL.
Why do we need to distinguish scalar EVEX instructions?
As long as -mavx512XXX -mno-evex512 does not generate zmm/64-bit kmask,
it should be ok.

Assuming there's no delta in AVX10.1, it sounds to me like the design should be:

avx512*  <== mno-evex512==  avx512* + mevex512
(no-evex512)(original AVX512 stuff)
   /\  /\
   ||(equal)   ||(equal)
   \/  \/
avx10.1-256   avx10.1-512
/\  /\
||  ||
||  ||
impliedimplied
||  ||
||  ||
avx10.2-256 <== implied ==  avx10.2-512
/\ /\
|| ||
|| ||
impliedImplied
|| ||
|| ||
avx10.3-256 <== implied ==   avx10.3-512

1. The new instructions in avx10.x should be put in either avx10.x-256
or avx10.x-512 according to vector/kmask size
2. -mno-evex512 should disable -mavx10.x-512.
3. -mavx512* will enable -mevex512 by default, but -mavx10.1-256 will
just enable -mavx512* and not -mevex512

>
> Thx,
> Haochen
>
> >
> > I think that would be my expectation.  -mavx512bw currently implies
> > 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> > also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> > AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> > vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> > which weren't enabled before, but unless there is some existing or planned
> > CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> > only support 128/256-bit vectors in those
> > dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> > is no need to differentiate further; the only CPUs which will support both
> > what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> > either CPUs with 128/256/512-bit vector support of those
> > f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> > -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> > disable all 512-bit vector instructions and in the end just mean the
> > same as -mavx10.1-256.
> > For just
> > -mavx512bw -mno-evex512 -mavx10.1-256
> > the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> > avx512vl isn't enabled at that point during processing, or if we do that
> > only at the end as a special case.  Of course, in this exact case there is
> > no difference, because -mavx10.1-256 turns that back on.
> > But it would make a difference on
> > -mavx512bw -mno-evex512 -mavx512vl
> > (when processed right away would disable AVX512BW (because VL isn't on)
> > and in the end enable VL,F including EVEX512, or be equivalent to just
> > -mavx512bw -mavx512vl if processed at the end, because -mavx512vl 

[PATCH] libgcc/m68k: Fixes for soft float

2023-08-22 Thread Keith Packard via Gcc-patches
Check for a non-zero denorm in __adddf3: both the upper and lower
32-bit chunks of a 64-bit float need to be checked for a non-zero value
when testing whether the value is -0.

Fix __addsf3 when the sum exponent is exactly 0xff to ensure that it
produces infinity and not NaN.

Handle converting NaN/inf values between formats.

Handle underflow and overflow when truncating.

Write a replacement for __fixxfsi so that it does not raise extra
exceptions during an extra conversion from long double to double.

Signed-off-by: Keith Packard 
---
 libgcc/config/m68k/fpgnulib.c | 161 +++---
 libgcc/config/m68k/lb1sf68.S  |   7 +-
 2 files changed, 134 insertions(+), 34 deletions(-)

diff --git a/libgcc/config/m68k/fpgnulib.c b/libgcc/config/m68k/fpgnulib.c
index fe41edf26aa..5b53778e986 100644
--- a/libgcc/config/m68k/fpgnulib.c
+++ b/libgcc/config/m68k/fpgnulib.c
@@ -54,6 +54,7 @@
 #define SIGNBIT0x8000L
 #define HIDDEN (1L << 23L)
 #define SIGN(fp)   ((fp) & SIGNBIT)
+#define EXPMASK0xFFL
 #define EXP(fp)(((fp) >> 23L) & 0xFF)
 #define MANT(fp)   (((fp) & 0x7FL) | HIDDEN)
 #define PACK(s,e,m)((s) | ((e) << 23L) | (m))
@@ -262,6 +263,9 @@ __extendsfdf2 (float a1)
   mant &= ~HIDDEN;
 }
   exp = exp - EXCESS + EXCESSD;
+  /* Handle inf and NaN */
+  if (exp == EXPMASK - EXCESS + EXCESSD)
+exp = EXPDMASK;
   dl.l.upper |= exp << 20;
   dl.l.upper |= mant >> 3;
   dl.l.lower = mant << 29;
@@ -295,40 +299,52 @@ __truncdfsf2 (double a1)
   /* shift double mantissa 6 bits so we can round */
   sticky |= mant & ((1 << 6) - 1);
   mant >>= 6;
-
-  /* Check for underflow and denormals.  */
-  if (exp <= 0)
+  if (exp == EXPDMASK - EXCESSD + EXCESS)
+{
+  exp = EXPMASK;
+  mant = mant >> 1 | (mant & 1) | !!sticky;
+}
+  else
 {
-  if (exp < -24)
+  /* Check for underflow and denormals.  */
+  if (exp <= 0)
{
- sticky |= mant;
- mant = 0;
+ if (exp < -24)
+   {
+ sticky |= mant;
+ mant = 0;
+   }
+ else
+   {
+ sticky |= mant & ((1 << (1 - exp)) - 1);
+ mant >>= 1 - exp;
+   }
+ exp = 0;
}
-  else
+
+  /* now round */
+  shift = 1;
+  if ((mant & 1) && (sticky || (mant & 2)))
{
- sticky |= mant & ((1 << (1 - exp)) - 1);
- mant >>= 1 - exp;
-   }
-  exp = 0;
-}
-  
-  /* now round */
-  shift = 1;
-  if ((mant & 1) && (sticky || (mant & 2)))
-{
-  int rounding = exp ? 2 : 1;
+ int rounding = exp ? 2 : 1;
 
-  mant += 1;
+ mant += 1;
 
-  /* did the round overflow? */
-  if (mant >= (HIDDEN << rounding))
+ /* did the round overflow? */
+ if (mant >= (HIDDEN << rounding))
+   {
+ exp++;
+ shift = rounding;
+   }
+   }
+  /* shift down */
+  mant >>= shift;
+  if (exp >= EXPMASK)
{
- exp++;
- shift = rounding;
+ exp = EXPMASK;
+ mant = 0;
}
 }
-  /* shift down */
-  mant >>= shift;
 
   mant &= ~HIDDEN;
 
@@ -432,6 +448,30 @@ __extenddfxf2 (double d)
 }
 
   exp = EXPD (dl) - EXCESSD + EXCESSX;
+  /* Check for underflow and denormals. */
+  if (exp < 0)
+{
+  if (exp < -53)
+{
+ ldl.l.middle = 0;
+ ldl.l.lower = 0;
+   }
+  else if (exp < -30)
+{
+ ldl.l.lower = (ldl.l.middle & MANTXMASK) >> ((1 - exp) - 32);
+ ldl.l.middle &= ~MANTXMASK;
+   }
+  else
+{
+ ldl.l.lower >>= 1 - exp;
+ ldl.l.lower |= (ldl.l.middle & MANTXMASK) << (32 - (1 - exp));
+ ldl.l.middle = (ldl.l.middle & ~MANTXMASK) | (ldl.l.middle & 
MANTXMASK >> (1 - exp));
+   }
+  exp = 0;
+}
+  /* Handle inf and NaN */
+  if (exp == EXPDMASK - EXCESSD + EXCESSX)
+exp = EXPXMASK;
   ldl.l.upper |= exp << 16;
   ldl.l.middle = HIDDENX;
   /* 31-20: # mantissa bits in ldl.l.middle - # mantissa bits in dl.l.upper */
@@ -464,9 +504,38 @@ __truncxfdf2 (long double ld)
 }
 
   exp = EXPX (ldl) - EXCESSX + EXCESSD;
-  /* ??? quick and dirty: keep `exp' sane */
-  if (exp >= EXPDMASK)
-exp = EXPDMASK - 1;
+  /* Check for underflow and denormals. */
+  if (exp <= 0)
+{
+  if (exp < -53)
+{
+ ldl.l.middle = 0;
+ ldl.l.lower = 0;
+   }
+  else if (exp < -30)
+{
+ ldl.l.lower = (ldl.l.middle & MANTXMASK) >> ((1 - exp) - 32);
+ ldl.l.middle &= ~MANTXMASK;
+   }
+  else
+{
+ ldl.l.lower >>= 1 - exp;
+ ldl.l.lower |= (ldl.l.middle & MANTXMASK) << (32 - (1 - exp));
+ ldl.l.middle = (ldl.l.middle & ~MANTXMASK) | (ldl.l.middle & 
MANTXMASK >> (1 - exp));
+   }
+  exp = 0;
+}
+  else if (exp == EXPXMASK - EXCESSX + EXCESSD)
+{
+  exp = EXPDMASK;
+  

[PATCH] RISC-V: Add attribute to vtype change only vsetvl

2023-08-22 Thread Juzhe-Zhong
This patch is a preparatory patch for the VSETVL PASS.

Committed.

gcc/ChangeLog:

* config/riscv/vector.md: Add attribute.

---
 gcc/config/riscv/vector.md | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index e772e79057d..6ceae25dbed 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1363,7 +1363,11 @@
   "TARGET_VECTOR"
   "vsetvli\tzero,zero,e%0,%m1,t%p2,m%p3"
   [(set_attr "type" "vsetvl")
-   (set_attr "mode" "SI")])
+   (set_attr "mode" "SI")
+   (set (attr "sew") (symbol_ref "INTVAL (operands[0])"))
+   (set (attr "vlmul") (symbol_ref "INTVAL (operands[1])"))
+   (set (attr "ta") (symbol_ref "INTVAL (operands[2])"))
+   (set (attr "ma") (symbol_ref "INTVAL (operands[3])"))])
 
 ;; vsetvl zero,rs1,vtype instruction.
 ;; The reason we need this pattern since we should avoid setting X0 register
-- 
2.36.3



Re: [PATCH] vect: Replace DR_GROUP_STORE_COUNT with DR_GROUP_LAST_ELEMENT

2023-08-22 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2023/8/22 20:17, Richard Biener wrote:
> On Tue, Aug 22, 2023 at 10:44 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> Now we use DR_GROUP_STORE_COUNT to record how many stores
>> in a group have been transformed and only do the actual
>> transform when encountering the last one.  I'm making
>> patches to move costing next to the transform code, it's
>> awkward to use this DR_GROUP_STORE_COUNT for both costing
>> and transforming.  This patch is to introduce last_element
>> to record the last element to be transformed in the group
>> rather than to sum up the store number we have seen, then
>> we can only check the given stmt is the last or not.  It
>> can make it work simply for both costing and transforming.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
> 
> This is all (existing) gross, so ... can't we do sth like the following
> instead?  Going to test this further besides the quick single
> testcase I verified.

I just realized that dealing with this in vect_transform_stmt is super
neat as you questioned and posted, thanks a lot for pushing commit
r14-3383-g2c27600fa79431 for this!

BR,
Kewen


> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 33f62b77710..67de19d9ce5 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -8437,16 +8437,6 @@ vectorizable_store (vec_info *vinfo,
>/* FORNOW */
>gcc_assert (!loop || !nested_in_vect_loop_p (loop, stmt_info));
> 
> -  /* We vectorize all the stmts of the interleaving group when we
> -reach the last stmt in the group.  */
> -  if (DR_GROUP_STORE_COUNT (first_stmt_info)
> - < DR_GROUP_SIZE (first_stmt_info)
> - && !slp)
> -   {
> - *vec_stmt = NULL;
> - return true;
> -   }
> -
>if (slp)
>  {
>grouped_store = false;
> @@ -12487,21 +12477,21 @@ vect_transform_stmt (vec_info *vinfo,
>break;
> 
>  case store_vec_info_type:
> -  done = vectorizable_store (vinfo, stmt_info,
> -gsi, _stmt, slp_node, NULL);
> -  gcc_assert (done);
> -  if (STMT_VINFO_GROUPED_ACCESS (stmt_info) && !slp_node)
> +  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> + && !slp_node
> + && DR_GROUP_NEXT_ELEMENT (stmt_info))
> +   /* In case of interleaving, the whole chain is vectorized when the
> +  last store in the chain is reached.  Store stmts before the last
> +  one are skipped, and there vec_stmt_info shouldn't be freed
> +  meanwhile.  */
> +   ;
> +  else
> {
> - /* In case of interleaving, the whole chain is vectorized when the
> -last store in the chain is reached.  Store stmts before the last
> -one are skipped, and there vec_stmt_info shouldn't be freed
> -meanwhile.  */
> - stmt_vec_info group_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
> - if (DR_GROUP_STORE_COUNT (group_info) == DR_GROUP_SIZE (group_info))
> -   is_store = true;
> + done = vectorizable_store (vinfo, stmt_info,
> +gsi, _stmt, slp_node, NULL);
> + gcc_assert (done);
> + is_store = true;
> }
> -  else
> -   is_store = true;
>break;
> 
>  case condition_vec_info_type:
> 
> 
>> BR,
>> Kewen
>> -
>>
>> gcc/ChangeLog:
>>
>> * tree-vect-data-refs.cc (vect_set_group_last_element): New function.
>> (vect_analyze_group_access): Call new function
>> vect_set_group_last_element.
>> * tree-vect-stmts.cc (vectorizable_store): Replace 
>> DR_GROUP_STORE_COUNT
>> uses with DR_GROUP_LAST_ELEMENT.
>> (vect_transform_stmt): Likewise.
>> * tree-vect-slp.cc (vect_split_slp_store_group): Likewise.
>> (vect_build_slp_instance): Likewise.
>> * tree-vectorizer.h (DR_GROUP_LAST_ELEMENT): New macro.
>> (DR_GROUP_STORE_COUNT): Remove.
>> (class _stmt_vec_info::store_count): Remove.
>> (class _stmt_vec_info::last_element): New class member.
>> (vect_set_group_last_element): New function declaration.
>> ---
>>  gcc/tree-vect-data-refs.cc | 30 ++
>>  gcc/tree-vect-slp.cc   | 13 +
>>  gcc/tree-vect-stmts.cc |  9 +++--
>>  gcc/tree-vectorizer.h  | 12 +++-
>>  4 files changed, 49 insertions(+), 15 deletions(-)
>>
>> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
>> index 3e9a284666c..c4a495431d5 100644
>> --- a/gcc/tree-vect-data-refs.cc
>> +++ b/gcc/tree-vect-data-refs.cc
>> @@ -2832,6 +2832,33 @@ vect_analyze_group_access_1 (vec_info *vinfo, 
>> dr_vec_info *dr_info)
>>return true;
>>  }
>>
>> +/* Given vectorization information VINFO, set the last element in the
>> +   group led by FIRST_STMT_INFO.  For now, it's only used for loop
>> +   

Ping^^ [PATCH V5 2/2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-08-22 Thread guojiufu via Gcc-patches

Hi,

I would like to have a gentle ping...

BR,
Jeff (Jiufu Guo)

On 2023-08-07 10:45, guojiufu via Gcc-patches wrote:

Hi,

Gentle ping...

On 2023-07-18 22:05, Jiufu Guo wrote:

Hi,

Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
if there is no wrap/overflow/underflow and "X - N * M" has the same
sign as "X".

Compare the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
- APIs: overflow, nonnegative_p and nonpositive_p are moved close
  to value range.
- Use above APIs in match.pd.

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)

PR tree-optimization/108757

gcc/ChangeLog:

* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/match.pd  |  85 +++
 gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
 gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
 gcc/testsuite/gcc.dg/pr108757.h   | 233 
++

 4 files changed, 355 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..39dbb0567dc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -942,6 +942,91 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif


+#if GIMPLE
+(for div (trunc_div exact_div)
+ /* Simplify (t + M*N) / N -> t / N + M.  */
+ (simplify
+  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
+  (with {value_range vr0, vr1, vr2, vr3, vr4;}
+  (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (vr1, @1)
+   && get_range_query (cfun)->range_of_expr (vr2, @2)
+   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
+   && get_range_query (cfun)->range_of_expr (vr0, @0)
+   && get_range_query (cfun)->range_of_expr (vr3, @3)
+   && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr3)
+   && get_range_query (cfun)->range_of_expr (vr4, @4)
+   && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr4.nonpositive_p (
+  (plus (div @0 @2) @1
+
+ /* Simplify (t - M*N) / N -> t / N - M.  */
+ (simplify
+  (div (minus@4 @0 (mult:c@3 @1 @2)) @2)
+  (with {value_range vr0, vr1, vr2, vr3, vr4;}
+  (if (INTEGRAL_TYPE_P (type)
+   && get_range_query (cfun)->range_of_expr (vr1, @1)
+   && get_range_query (cfun)->range_of_expr (vr2, @2)
+   && range_op_handler (MULT_EXPR).overflow_free_p (vr1, vr2)
+   && get_range_query (cfun)->range_of_expr (vr0, @0)
+   && get_range_query (cfun)->range_of_expr (vr3, @3)
+   && range_op_handler (MINUS_EXPR).overflow_free_p (vr0, vr3)
+   && get_range_query (cfun)->range_of_expr (vr4, @4)
+   && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr4.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr4.nonpositive_p (
+  (minus (div @0 @2) @1)
+
+/* Simplify
+   (t + C) / N -> t / N + C / N where C is multiple of N.
+   (t + C) >> N -> t >> N + C>>N if low N bits of C is 0.  */
+(for op (trunc_div exact_div rshift)
+ (simplify
+  (op (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)
+   (with
+{
+  wide_int c = wi::to_wide (@1);
+  wide_int n = wi::to_wide (@2);
+  bool is_rshift = op == RSHIFT_EXPR;
+  bool neg_c = false;
+  bool ok = false;
+  value_range vr0;
+  if (INTEGRAL_TYPE_P (type)
+ && get_range_query (cfun)->range_of_expr (vr0, @0))
+{
+ ok = is_rshift ? wi::ctz (c) >= n.to_shwi ()
+: wi::multiple_of_p (c, n, TYPE_SIGN (type));
+ value_range vr1, vr3;
+ ok = ok && get_range_query (cfun)->range_of_expr (vr1, @1)
+  && range_op_handler (PLUS_EXPR).overflow_free_p (vr0, vr1)
+  && get_range_query (cfun)->range_of_expr (vr3, @3)
+  && (TYPE_UNSIGNED (type)
+  || (vr0.nonnegative_p () && vr3.nonnegative_p ())
+  || (vr0.nonpositive_p () && vr3.nonpositive_p ()));
+
+ /* Try check 'X + C' as 'X - -C' for unsigned.  */
+ if (!ok && TYPE_UNSIGNED (type) && c.sign_mask () < 0)
+   {
+ neg_c = true;
+ c = -c;
+ ok = is_rshift ? wi::ctz (c) >= n.to_shwi ()
+: wi::multiple_of_p (c, n, UNSIGNED);
+ ok = ok && wi::geu_p (vr0.lower_bound (), c);
+   }
+   }
+}
+   (if (ok)
+   (with
+{
+  wide_int m;
+  m = is_rshift ? wi::rshift (c, n, TYPE_SIGN (type))
+   : wi::div_trunc (c, n, TYPE_SIGN (type));
+  m = neg_c ? -m : m;
+}
+   (plus (op @0 @2) { wide_int_to_tree(type, m); 

RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, August 22, 2023 11:02 PM
> To: Hongtao Liu 
> Cc: Richard Biener ; Jiang, Haochen
> ; ZiNgA BuRgA ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> > Let's assume there's no detla now, AVX10.1-512 is equal to
> > AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ}
> > > other stuff.
> > > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would 
> > > be
> > > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > > and unsetting it doesn't disable all the TARGET_AVX512*.
> > > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> > So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> > -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> > then the combination basically is equal to AVX10.1-512(AVX512* sets +
> > EVEX512)
> > If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think we still need that, since currently w/o AVX512VL we not only
enable 512-bit vector instructions but also scalar instructions, which
means that for -mavx512bw -mno-evex512 we should still enable the
scalar functions.

And since scalar functions will also be enabled in AVX10.1-256, we need
something to distinguish them from the ISA set w/o AVX512VL.

Thx,
Haochen

> 
> I think that would be my expectation.  -mavx512bw currently implies
> 512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
> also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
> AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
> vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
> which weren't enabled before, but unless there is some existing or planned
> CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
> only support 128/256-bit vectors in those
> dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
> is no need to differentiate further; the only CPUs which will support both
> what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
> either CPUs with 128/256/512-bit vector support of those
> f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
> -mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
> disable all 512-bit vector instructions and in the end just mean the
> same as -mavx10.1-256.
> For just
> -mavx512bw -mno-evex512 -mavx10.1-256
> the question is if that -mno-evex512 turns off also avx512bw/avx512f because
> avx512vl isn't enabled at that point during processing, or if we do that
> only at the end as a special case.  Of course, in this exact case there is
> no difference, because -mavx10.1-256 turns that back on.
> But it would make a difference on
> -mavx512bw -mno-evex512 -mavx512vl
> (when processed right away would disable AVX512BW (because VL isn't on)
> and in the end enable VL,F including EVEX512, or be equivalent to just
> -mavx512bw -mavx512vl if processed at the end, because -mavx512vl implied
> -mevex512 again.
> 
>   Jakub



Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-22 Thread juzhe.zh...@rivai.ai
>> This seems relax the compatiblitly check to allow optimize more case,
>> if so this should be a sperated patch.
This is not an optimization fix, it's a bug fix.

Since we fuse these 2 demands:
1. demand SEW and GE_SEW (meaning demand a SEW at least as large as a specific SEW).
2. demand SEW and GE_SEW (meaning demand a SEW at least as large as a specific SEW) and
demand RATIO.

The fused demand should include the RATIO demand, but it didn't before; that's a
bug.
It's lucky that previous tests didn't expose this bug before the refactor,
but it is exposed after the refactor.

I committed it with a separate patch.
Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's a really great improvement: it drops some states like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL, which makes this algorithm easier to
understand! It also fundamentally improves phase 3, although one
concern is that the time complexity might become higher order
(and it's already high enough, in fact). But mostly vectorized code
only appears within the innermost loop, so that is generally acceptable.
 
So I will try my best to review this closely to make it as close to
perfect as possible :)
 
I saw you have updated several testcases; why update them instead of adding
new testcases? Could you say more about why some testcases added
__riscv_vadd_vv_i8mf8 or added more dependencies on the vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info ,
> const vector_insn_info )
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}
 
This seems to relax the compatibility check to allow optimizing more cases;
if so, this should be a separate patch.
 
>return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change; could you explain a bit more about it? It was
an early exit for non-vsetvl insns, but now it's allowed?
 
>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info ,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info _info,
> -enum merge_type type) const
> +enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
 
Why do we need this exception?
 
>  gcc_assert (this->compatible_p (merge_info)
> && "Can't merge incompatible demanded infos");
 
> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) 
> const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>  {
> -  const auto _block_info = vector_block_infos[pred_cfg_bb->index];
> -  if (!pred_block_info.local_dem.valid_or_dirty_p ()
> - && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +  if (prob == profile_probability::uninitialized ())
> +   prob = vector_block_infos[e->dest->index].probability;
> +  else if (prob == vector_block_infos[e->dest->index].probability)
> continue;
> -  return false;
> +  else
> +   return true;
 
Let me make sure I understand this correctly: it's worthwhile if those
edges have different probabilities?
 
>  }
> -  return true;
> +  return false;
 
If all probabilities are the same, then it's not worthwhile?
 
Please add a few comments no matter whether my understanding is right or not :)
 
>  }
>
>  bool
 
> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap 
> bitdata) const
>sbitmap_iterator sbi;
>
>EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -if (ratio == -1)
> -  ratio = vector_exprs[bb_index]->get_ratio ();
> -else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -  return false;
> -  }
> +{
> +  if (ratio == -1)
> +   ratio = vector_exprs[bb_index]->get_ratio ();
> +  else if 

[PATCH] RISC-V: Adapt live-1.c testcase

2023-08-22 Thread Juzhe-Zhong
Committed.

Fix failures:

FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c scan-tree-dump-times 
optimized ".VEC_EXTRACT" 10
FAIL: gcc.target/riscv/rvv/autovec/partial/live-1.c scan-tree-dump-times 
optimized ".VEC_EXTRACT" 10

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-1.c: Adapt test.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
index 75fa2eba8cc..15ce74a0c4c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param 
riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d 
-fno-vect-cost-model --param riscv-autovec-preference=scalable 
-fdump-tree-optimized-details" } */
 
 #include 
 
@@ -31,4 +31,4 @@
 
 TEST_ALL (EXTRACT_LAST)
 
-/* { dg-final { scan-tree-dump-times "\.VEC_EXTRACT" 10 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.VEC_EXTRACT" 11 "optimized" } } */
-- 
2.36.3



Re: [committed] i386: Fix grammar typo in diagnostic

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 8, 2023 at 5:22 AM Marek Polacek via Libstdc++
 wrote:
>
> On Mon, Aug 07, 2023 at 10:12:35PM +0100, Jonathan Wakely via Gcc-patches 
> wrote:
> > Committed as obvious.
> >
> > Less obvious (to me) is whether it's correct to say "GCC V13" here. I
> > don't think we refer to a version that way anywhere else, do we?
> >
> > Would "since GCC 13.1.0" be better?
>
> x86_field_alignment uses
>
>   inform (input_location, "the alignment of %<_Atomic %T%> "
>   "fields changed in %{GCC 11.1%}",
>
> so maybe the below should use %{GCC 13.1%}.  "GCC V13" looks unusual
> to me.
 %{GCC 13.1%} sounds reasonable.
>
> > -- >8 --
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/i386.cc (ix86_invalid_conversion): Fix grammar.
> > ---
> >  gcc/config/i386/i386.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 50860050049..5d57726e22c 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22890,7 +22890,7 @@ ix86_invalid_conversion (const_tree fromtype, 
> > const_tree totype)
> >   warning (0, "%<__bfloat16%> is redefined from typedef % "
> >   "to real %<__bf16%> since GCC V13, be careful of "
> >"implicit conversion between %<__bf16%> and %; "
> > -  "a explicit bitcast may be needed here");
> > +  "an explicit bitcast may be needed here");
> >  }
> >
> >/* Conversion allowed.  */
> > --
> > 2.41.0
> >
>
> Marek
>


-- 
BR,
Hongtao


[PATCH] RISC-V: Clang format riscv-vsetvl.cc[NFC]

2023-08-22 Thread Juzhe-Zhong
Committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (change_insn): Clang format.
(vector_infos_manager::all_same_ratio_p): Ditto.
(vector_infos_manager::all_same_avl_p): Ditto.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::commit_vsetvls): Ditto.
(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::compute_probabilities): Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc | 65 ++--
 1 file changed, 29 insertions(+), 36 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 2d8fa754ea0..819a3918b3e 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -907,8 +907,8 @@ change_insn (function_info *ssa, insn_change change, 
insn_info *insn,
] UNSPEC_VPREDICATE)
(plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
(sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
-[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) 
"rvv.c":8:12
-2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
+[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
+"rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} 
(expr_list:REG_EQUIV
 (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 +0 S[64, 64] A128])
(expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
@@ -2428,12 +2428,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap 
bitdata) const
   sbitmap_iterator sbi;
 
   EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
-  {
-if (ratio == -1)
-  ratio = vector_exprs[bb_index]->get_ratio ();
-else if (vector_exprs[bb_index]->get_ratio () != ratio)
-  return false;
-  }
+{
+  if (ratio == -1)
+   ratio = vector_exprs[bb_index]->get_ratio ();
+  else if (vector_exprs[bb_index]->get_ratio () != ratio)
+   return false;
+}
   return true;
 }
 
@@ -2473,10 +2473,10 @@ vector_infos_manager::all_same_avl_p (const basic_block 
cfg_bb,
   sbitmap_iterator sbi;
 
   EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
-  {
-if (vector_exprs[bb_index]->get_avl_info () != avl)
-  return false;
-  }
+{
+  if (vector_exprs[bb_index]->get_avl_info () != avl)
+   return false;
+}
   return true;
 }
 
@@ -3892,7 +3892,7 @@ pass_vsetvl::refine_vsetvls (void) const
   basic_block cfg_bb;
   FOR_EACH_BB_FN (cfg_bb, cfun)
 {
-  auto info = get_block_info(cfg_bb).local_dem;
+  auto info = get_block_info (cfg_bb).local_dem;
   insn_info *insn = info.get_insn ();
   if (!info.valid_p ())
continue;
@@ -3938,8 +3938,7 @@ pass_vsetvl::cleanup_vsetvls ()
   basic_block cfg_bb;
   FOR_EACH_BB_FN (cfg_bb, cfun)
 {
-  auto 
-   = get_block_info(cfg_bb).reaching_out;
+  auto  = get_block_info (cfg_bb).reaching_out;
   gcc_assert (m_vector_manager->expr_set_num (
m_vector_manager->vector_del[cfg_bb->index])
  <= 1);
@@ -3951,9 +3950,7 @@ pass_vsetvl::cleanup_vsetvls ()
info.set_unknown ();
  else
{
- const auto dem
-   = get_block_info(cfg_bb)
-   .local_dem;
+ const auto dem = get_block_info (cfg_bb).local_dem;
  gcc_assert (dem == *m_vector_manager->vector_exprs[i]);
  insn_info *insn = dem.get_insn ();
  gcc_assert (insn && insn->rtl ());
@@ -4020,8 +4017,7 @@ pass_vsetvl::commit_vsetvls (void)
   for (const bb_info *bb : crtl->ssa->bbs ())
 {
   basic_block cfg_bb = bb->cfg_bb ();
-  const auto reaching_out
-   = get_block_info(cfg_bb).reaching_out;
+  const auto reaching_out = get_block_info (cfg_bb).reaching_out;
   if (!reaching_out.dirty_p ())
continue;
 
@@ -4035,14 +4031,14 @@ pass_vsetvl::commit_vsetvls (void)
  sbitmap avin = m_vector_manager->vector_avin[cfg_bb->index];
  bool available_p = false;
  EXECUTE_IF_SET_IN_BITMAP (avin, 0, bb_index, sbi)
- {
-   if (m_vector_manager->vector_exprs[bb_index]->available_p (
- reaching_out))
- {
-   available_p = true;
-   break;
- }
- }
+   {
+ if (m_vector_manager->vector_exprs[bb_index]->available_p (
+   reaching_out))
+   {
+ available_p = true;
+ break;
+   }
+   }
  if (available_p)
continue;
}
@@ -4263,7 +4259,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info 
*bb) const
 
   /* Local AVL compatibility checking is simpler than global, we only
 need to 

[PATCH] RISC-V: Add riscv-vsetvl.def to t-riscv

2023-08-22 Thread Juzhe-Zhong
This patch will be backported to GCC 13 and committed to trunk.
gcc/ChangeLog:

* config/riscv/t-riscv: Add riscv-vsetvl.def.

---
 gcc/config/riscv/t-riscv | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 1252d6f851a..f3ce66ccdd4 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -62,7 +62,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
   $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
   insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
-  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h
+  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
+  $(srcdir)/config/riscv/riscv-vsetvl.def
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
 
-- 
2.36.3



Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-22 Thread 钟居哲
>> I saw you have updated several testcases; why update them instead of
>> adding new ones?
Because the original testcases failed after this patch.

>> Could you say more about why some testcases added __riscv_vadd_vv_i8mf8
>> or added more dependencies on the vl variable?
These are two separate questions.

1. Why some testcases added __riscv_vadd_vv_i8mf8.
Because the original testcases were too fragile and failed easily.
Consider the following case:

  for (...)
 if (cond)
   vsetvl e8mf8
   load
   store
 else
   vsetvl e16mf4
   load
   store
In this example, we know that "e8mf8" and "e16mf4" are compatible, so we can
put either a vsetvli e8mf8 or a vsetvli e16mf4 before the for loop and elide
all vsetvlis inside the loop.
Before this patch, the codegen result was vsetvli e8mf8; after this patch, it
is vsetvli e16mf4.
Both are legal and optimal codegen.

To avoid potential spurious test failures in the future, I added a "vadd",
which demands both SEW and LMUL and allows only e8mf8.
Such a testcase doesn't change our testing goal, which is to test the LCM's
ability to fuse VSETVLs and to compute the optimal location of each vsetvl.

2. Why add more dependencies on the vl variable?
As I told you previously,
HARD_EMPTY and DIRTY_WITH_KILLED_AVL were supposed to optimize the following
case:

li a6,101
vsetvli e8mf8
for ...
li a5,101
vsetvli e16mf4
for ...

This case happens because we set the "li" cost so low that a previous pass
failed to optimize it.
I don't think we should optimize such a corner case in the VSETVL pass; it
seriously complicates the implementation and messes up the code quality.

So after removing them, the codegen for such a case generates one more
"vsetvli" (only one more instruction in the dynamic instruction count).
I note that if we put all the "li" instructions inside a loop, the issue goes
away and the VSETVL pass achieves optimal codegen.

To fix the failures in such testcases, I changed "vl = 101" to "vl = a + 101";
the assembly checks then remain valid and pass.

Thanks.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's a really great improvement: it drops some states like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL, which makes this algorithm much easier to
understand!
It also fundamentally improves phase 3, although one concern is that
the time complexity might become higher order (and it's already high
enough, in fact); but vectorized code mostly appears only within the
innermost loop, so that is generally acceptable.

So I will try my best to review this closely to bring it as close to
perfect as possible :)
 
I saw you have updated several testcases; why update them instead of adding new ones?
Could you say more about why some testcases added __riscv_vadd_vv_i8mf8
or added more dependencies on the vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
> const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}
 
This seems to relax the compatibility check to allow optimizing more cases;
if so, this should be a separate patch.
 
>return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change; could you explain a bit more about it?  It was an
early exit for non-vsetvl insns, but now it allows them?
 
>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info ,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -enum merge_type type) const
> +enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
 
Why need this exception?
 
>  gcc_assert (this->compatible_p (merge_info)
> && "Can't merge incompatible demanded infos");
 
> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  

Re: Patch ping Re: [PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-22 Thread Andrew Pinski via Gcc-patches
On Mon, Aug 21, 2023 at 8:25 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> On Wed, Aug 09, 2023 at 08:14:14PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Jakub Jelinek (12):
> >   expr: Small optimization [PR102989]
> >   lto-streamer-in: Adjust assert [PR102989]
> >   phiopt: Fix phiopt ICE on vops [PR102989]
> >   Middle-end _BitInt support [PR102989]
> >   _BitInt lowering support [PR102989]
> >   i386: Enable _BitInt on x86-64 [PR102989]
> >   ubsan: _BitInt -fsanitize=undefined support [PR102989]
> >   libgcc: Generated tables for _BitInt <-> _Decimal* conversions [PR102989]
> >   libgcc _BitInt support [PR102989]
> >   C _BitInt support [PR102989]
> >   testsuite part 1 for _BitInt support [PR102989]
> >   testsuite part 2 for _BitInt support [PR102989]
>
> +   C _BitInt incremental fixes [PR102989]
>
> I'd like to ping this patch series.
> First 3 patches are committed, the rest awaits patch review.
>
> Joseph, could I ask now at least for an overall design review of the
> C patches (8-10,13) whether its interfaces with middle-end are ok,
> so that Richi can review the middle-end parts?

On a related note, does it make sense to add this to the C++ front end
as an extension too?
I noticed Clang supports it for C++.

Thanks,
Andrew

>
> Thanks.
>
> Jakub
>


[pushed] analyzer: reimplement kf_strlen [PR105899]

2023-08-22 Thread David Malcolm via Gcc-patches
Reimplement kf_strlen in terms of the new string scanning
implementation, sharing strlen's implementation with
__analyzer_get_strlen.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3391-g3242fb533d48ab.

gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf-analyzer.cc (class kf_analyzer_get_strlen): Move to kf.cc.
(register_known_analyzer_functions): Use make_kf_strlen.
* kf.cc (class kf_strlen::impl_call_pre): Replace with
implementation of kf_analyzer_get_strlen from kf-analyzer.cc.
Handle "UNKNOWN" return from check_for_null_terminated_string_arg
by falling back to a conjured svalue.
(make_kf_strlen): New.
(register_known_functions): Use make_kf_strlen.
* known-function-manager.h (make_kf_strlen): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/null-terminated-strings-1.c: Update expected
results on symbolic values.
* gcc.dg/analyzer/strlen-1.c: New test.
---
 gcc/analyzer/kf-analyzer.cc   | 30 +-
 gcc/analyzer/kf.cc| 56 +--
 gcc/analyzer/known-function-manager.h |  2 +
 .../analyzer/null-terminated-strings-1.c  |  4 +-
 gcc/testsuite/gcc.dg/analyzer/strlen-1.c  | 54 ++
 5 files changed, 85 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strlen-1.c

diff --git a/gcc/analyzer/kf-analyzer.cc b/gcc/analyzer/kf-analyzer.cc
index c767ebcb6615..7ae598a89123 100644
--- a/gcc/analyzer/kf-analyzer.cc
+++ b/gcc/analyzer/kf-analyzer.cc
@@ -358,33 +358,6 @@ public:
   }
 };
 
-/* Handler for "__analyzer_get_strlen".  */
-
-class kf_analyzer_get_strlen : public known_function
-{
-public:
-  bool matches_call_types_p (const call_details &cd) const final override
-  {
-return cd.num_args () == 1 && cd.arg_is_pointer_p (0);
-  }
-  void impl_call_pre (const call_details &cd) const final override
-  {
-if (const svalue *bytes_read = cd.check_for_null_terminated_string_arg (0))
-  {
-   region_model_manager *mgr = cd.get_manager ();
-   /* strlen is (bytes_read - 1).  */
-   const svalue *strlen_sval
- = mgr->get_or_create_binop (size_type_node,
- MINUS_EXPR,
- bytes_read,
- mgr->get_or_create_int_cst (size_type_node, 1));
-   cd.maybe_set_lhs (strlen_sval);
-  }
-else
-  cd.set_any_lhs_with_defaults ();
-  }
-};
-
 /* Populate KFM with instances of known functions used for debugging the
analyzer and for writing DejaGnu tests, all with a "__analyzer_" prefix.  */
 
@@ -406,8 +379,7 @@ register_known_analyzer_functions (known_function_manager &kfm)
   kfm.add ("__analyzer_eval", make_unique<kf_analyzer_eval> ());
   kfm.add ("__analyzer_get_unknown_ptr",
   make_unique<kf_analyzer_get_unknown_ptr> ());
-  kfm.add ("__analyzer_get_strlen",
-  make_unique ());
+  kfm.add ("__analyzer_get_strlen", make_kf_strlen ());
 }
 
 } // namespace ana
diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 1601cf15c685..59f46bab581c 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -1187,7 +1187,7 @@ public:
   }
 };
 
-/* Handle the on_call_pre part of "strlen".  */
+/* Handler for "strlen" and for "__analyzer_get_strlen".  */
 
 class kf_strlen : public known_function
 {
@@ -1196,37 +1196,33 @@ public:
   {
 return (cd.num_args () == 1 && cd.arg_is_pointer_p (0));
   }
-  void impl_call_pre (const call_details &cd) const final override;
-};
-
-void
-kf_strlen::impl_call_pre (const call_details &cd) const
-{
-  region_model_context *ctxt = cd.get_ctxt ();
-  region_model *model = cd.get_model ();
-  region_model_manager *mgr = cd.get_manager ();
-
-  const svalue *arg_sval = cd.get_arg_svalue (0);
-  const region *buf_reg
-= model->deref_rvalue (arg_sval, cd.get_arg_tree (0), ctxt);
-  if (const string_region *str_reg
-  = buf_reg->dyn_cast_string_region ())
-{
-  tree str_cst = str_reg->get_string_cst ();
-  /* TREE_STRING_LENGTH is sizeof, not strlen.  */
-  int sizeof_cst = TREE_STRING_LENGTH (str_cst);
-  int strlen_cst = sizeof_cst - 1;
-  if (cd.get_lhs_type ())
+  void impl_call_pre (const call_details &cd) const final override
+  {
+if (const svalue *bytes_read = cd.check_for_null_terminated_string_arg (0))
+  if (bytes_read->get_kind () != SK_UNKNOWN)
{
- tree t_cst = build_int_cst (cd.get_lhs_type (), strlen_cst);
- const svalue *result_sval
-   = mgr->get_or_create_constant_svalue (t_cst);
- cd.maybe_set_lhs (result_sval);
+ region_model_manager *mgr = cd.get_manager ();
+ /* strlen is (bytes_read - 1).  */
+ const svalue *one = mgr->get_or_create_int_cst (size_type_node, 1);
+ const svalue *strlen_sval = mgr->get_or_create_binop (size_type_node,
+

Re: [PATCH] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-22 Thread Robin Dapp via Gcc-patches
Hi Lehua,

no concerns here, just tiny remarks but in general LGTM as is.

> +(define_insn_and_split "*copysign<mode>_neg"
> +  [(set (match_operand:VF 0 "register_operand")
> +(neg:VF
> +  (unspec:VF [
> +(match_operand:VF 1 "register_operand")
> +(match_operand:VF 2 "register_operand")
> +  ] UNSPEC_VCOPYSIGN)))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  riscv_vector::emit_vlmax_insn (code_for_pred_ncopysign (<MODE>mode),
> + riscv_vector::RVV_BINOP, operands);
> +  DONE;
> +})

It's a bit unfortunate that we need this now but well, no way around it.

> -  emit_insn (gen_vcond_mask (vmode, vmode, d->target, d->op0, d->op1, mask));
> +  /* swap op0 and op1 since the order is opposite to pred_merge.  */
> +  rtx ops2[] = {d->target, d->op1, d->op0, mask};
> +  emit_vlmax_merge_insn (code_for_pred_merge (vmode), riscv_vector::RVV_MERGE_OP, ops2);
>return true;
>  }

This seems like a separate, general fix that just surfaced in the course of
this patch?  It would be nice to have it factored out, but as we already have
it, no need I guess.

> +  if (is_dummy_mask)
> +{
> +  /* Use TU, MASK ANY policy.  */
> +  if (needs_fp_rounding (code, mode))
> + emit_nonvlmax_fp_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
> +  else
> + emit_nonvlmax_tu_insn (icode, RVV_UNOP_TU, cond_ops, len);
> +}

We have quite a bit of code duplication across the expand_cond_len functions
now (binop, ternop, unop).  Not particular to your patch, but I'd suggest
unifying this later.

> +TEST_ALL (DEF_LOOP)
> +
> +/* NOTE: int abs operator is converted to vmslt + vneg.v */
> +/* { dg-final { scan-assembler-times {\tvneg\.v\tv[0-9]+,v[0-9]+,v0\.t} 12 { xfail { any-opts "--param riscv-autovec-lmul=m2" } } } } */

Why does this fail with LMUL == 2 (also in the following tests)?  A comment
would be nice here.

Regards
 Robin



Re: [PATCH v10 3/5] c++: Implement __is_function built-in trait

2023-08-22 Thread Patrick Palka via Gcc-patches
On Wed, 12 Jul 2023, Ken Matsui via Libstdc++ wrote:

> This patch implements built-in trait for std::is_function.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-trait.def: Define __is_function.
>   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_FUNCTION.
>   * semantics.cc (trait_expr_value): Likewise.
>   (finish_trait_expr): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/has-builtin-1.C: Test existence of __is_function.
>   * g++.dg/ext/is_function.C: New test.

LGTM!

> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc |  3 ++
>  gcc/cp/cp-trait.def  |  1 +
>  gcc/cp/semantics.cc  |  4 ++
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
>  gcc/testsuite/g++.dg/ext/is_function.C   | 58 
>  5 files changed, 69 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_function.C
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index f6951ee2670..927605c6cb7 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3754,6 +3754,9 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_UNION:
>inform (loc, "  %qT is not a union", t1);
>break;
> +case CPTK_IS_FUNCTION:
> +  inform (loc, "  %qT is not a function", t1);
> +  break;
>  case CPTK_IS_AGGREGATE:
>inform (loc, "  %qT is not an aggregate", t1);
>break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index 1e3310cd682..3cd3babc242 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -83,6 +83,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
>  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
> +DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
>  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_temporary", 2)
>  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2)
>  /* FIXME Added space to avoid direct usage in GCC 13.  */
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 2f37bc353a1..b976633645a 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12072,6 +12072,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
>  case CPTK_IS_ENUM:
>return type_code1 == ENUMERAL_TYPE;
>  
> +case CPTK_IS_FUNCTION:
> +  return type_code1 == FUNCTION_TYPE;
> +
>  case CPTK_IS_FINAL:
>return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1);
>  
> @@ -12293,6 +12296,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
>  case CPTK_IS_UNION:
>  case CPTK_IS_SAME:
>  case CPTK_IS_REFERENCE:
> +case CPTK_IS_FUNCTION:
>break;
>  
>  case CPTK_IS_LAYOUT_COMPATIBLE:
> diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> index b697673790c..90eb00ebf2d 100644
> --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> @@ -149,3 +149,6 @@
>  #if !__has_builtin (__is_reference)
>  # error "__has_builtin (__is_reference) failed"
>  #endif
> +#if !__has_builtin (__is_function)
> +# error "__has_builtin (__is_function) failed"
> +#endif
> diff --git a/gcc/testsuite/g++.dg/ext/is_function.C b/gcc/testsuite/g++.dg/ext/is_function.C
> new file mode 100644
> index 000..2e1594b12ad
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/is_function.C
> @@ -0,0 +1,58 @@
> +// { dg-do compile { target c++11 } }
> +
> +#include <testsuite_tr1.h>
> +
> +using namespace __gnu_test;
> +
> +#define SA(X) static_assert((X),#X)
> +#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)\
> +  SA(TRAIT(TYPE) == EXPECT); \
> +  SA(TRAIT(const TYPE) == EXPECT);   \
> +  SA(TRAIT(volatile TYPE) == EXPECT);\
> +  SA(TRAIT(const volatile TYPE) == EXPECT)
> +
> +struct A
> +{ void fn(); };
> +
> +template
> +struct AHolder { };
> +
> +template
> +struct AHolder
> +{ using type = U; };
> +
> +// Positive tests.
> +SA(__is_function(int (int)));
> +SA(__is_function(ClassType (ClassType)));
> +SA(__is_function(float (int, float, int[], int&)));
> +SA(__is_function(int (int, ...)));
> +SA(__is_function(bool (ClassType) const));
> +SA(__is_function(AHolder::type));
> +
> +void fn();
> +SA(__is_function(decltype(fn)));
> +
> +// Negative tests.
> +SA_TEST_CATEGORY(__is_function, int, false);
> +SA_TEST_CATEGORY(__is_function, int*, false);
> +SA_TEST_CATEGORY(__is_function, int&, false);
> +SA_TEST_CATEGORY(__is_function, void, false);
> +SA_TEST_CATEGORY(__is_function, void*, false);
> +SA_TEST_CATEGORY(__is_function, void**, false);
> +SA_TEST_CATEGORY(__is_function, std::nullptr_t, false);
> +
> +SA_TEST_CATEGORY(__is_function, 

Re: [PATCH][committed] RISC-V: Add multiarch support on riscv-linux-gnu

2023-08-22 Thread Jeff Law




On 8/22/23 12:03, Palmer Dabbelt wrote:

On Tue, 22 Aug 2023 10:39:38 PDT (-0700), Jeff Law wrote:




The docs seem to suggest that we should have a multiarch-compatible 
MULTILIB_OSDIRNAMES as we support both multilib and multiarch:


    @code{MULTIARCH_DIRNAME} is not used for configurations that support
    both multilib and multiarch.  In that case, multiarch names are encoded
    in @code{MULTILIB_OSDIRNAMES} instead.

It's not clear if "supports" there actually means "enabled", as IIUC 
none of the distros actually ship multilib.  So maybe this can't manifest and we 
should fix the docs?
Debian explicitly disables multilibs for RISC-V, Ubuntu almost certainly 
just follows that.   Fedora and its relatives don't use multiarch.





Or maybe something like this just does it?

Maybe, but we'd really want/need to test it :-)



    diff --git a/gcc/config/riscv/t-linux b/gcc/config/riscv/t-linux
    index a6f64f88d25..00e382db0f8 100644
    --- a/gcc/config/riscv/t-linux
    +++ b/gcc/config/riscv/t-linux
    @@ -1,5 +1,5 @@
     # Only XLEN and ABI affect Linux multilib dir names, e.g. /lib32/ilp32d/
     MULTILIB_DIRNAMES := $(patsubst rv32%,lib32,$(patsubst rv64%,lib64,$(MULTILIB_DIRNAMES)))

    -MULTILIB_OSDIRNAMES := $(patsubst lib%,../lib%,$(MULTILIB_DIRNAMES))
    +MULTILIB_OSDIRNAMES := $(patsubst lib64%:rv64%-linux-gnu,$(patsubst lib32%:rv32%-linux-gnu,../lib32,$(MULTILIB_DIRNAMES)))
     MULTIARCH_DIRNAME := $(call if_multiarch,$(firstword $(subst -, ,$(target)))-linux-gnu)


I have no idea how to test multiarch+multilib, though.  Is there a way 
to just autoconf error that out as unsupported until someone wants it?
I'm not sure either.  It might be as simple as bootstrapping on a Debian 
native with a small set of rv64 multilibs.

Jeff


Re: [PATCH V2 2/5] OpenMP: C front end support for imperfectly-nested loops

2023-08-22 Thread Sandra Loosemore via Gcc-patches

On 8/22/23 07:23, Jakub Jelinek wrote:



diff --git a/gcc/testsuite/c-c++-common/goacc/collapse-1.c b/gcc/testsuite/c-c++-common/goacc/collapse-1.c
index 11b14383983..0feac8f8ddb 100644
--- a/gcc/testsuite/c-c++-common/goacc/collapse-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/collapse-1.c
@@ -8,8 +8,8 @@ f1 (void)
  {
#pragma acc parallel
#pragma acc loop collapse (2)
-  for (i = 0; i < 5; i++)
-;  /* { dg-error "not enough perfectly nested" } */
+  for (i = 0; i < 5; i++)   /* { dg-error "not enough nested loops" } */
+;
{
  for (j = 0; j < 5; j++)
;


All these c-c++-common testsuite changes will now FAIL after the C patch but
before the C++.  It is nice to have the new c-c++-common tests in a separate
patch, but these tweaks which can't be just avoided need the temporary
{ target c } vs. { target c++} hacks undone later in the C++ patch.


In spite of being in the c-c++-common subdirectory, this particular testcase is 
presently run only for C:


/* { dg-skip-if "not yet" { c++ } } */

I did previously do incremental testing between applying the C and C++ 
parts of the series to confirm that there were no regressions.


BTW, thanks for your previous detailed review of the original version of this 
patch, and pointing out many things I'd overlooked; I feel that V2 is much more 
robust and correct as a result.  :-)


-Sandra


Re: [PATCH] Fortran: implement vector sections in DATA statements [PR49588]

2023-08-22 Thread Harald Anlauf via Gcc-patches

Hi Paul,

Am 22.08.23 um 08:32 schrieb Paul Richard Thomas via Gcc-patches:

Hi Harald,

It all looks good to me and does indeed make the code clearer. OK for trunk.

Thanks for the patch.


thanks for the review!


I was shocked to find that there are 217 bugs older than 49588.  Does
anybody test older bugs to check if any of them have been fixed?


I am not aware of this being done systematically.

At the same time, we have over 100 PRs marked as regression,
with a few being fixed on mainline but not backported (or
undecided whether to backport).  Fixing and/or closing them
might be low-hanging fruits.

There are also far more than 100 TODOs in gcc/fortran/*.cc ...

And with the usual PRs, there's enough work left for all kinds
of contributions.

Cheers,
Harald


Paul

On Mon, 21 Aug 2023 at 20:48, Harald Anlauf via Fortran
 wrote:


Dear all,

the attached patch implements vector sections in DATA statements.

The implementation is simpler than the size of the patch suggests,
as part of changes try to clean up the existing code to make it
easier to understand, as ordinary sections (start:end:stride)
and vector sections may actually share some common code.

The basic idea of the implementation is that one needs a
temporary vector that keeps track of the offsets into the
array constructors for the indices in the array reference
that are vectors.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald







[pushed 2/2] c++: maybe_substitute_reqs_for fix

2023-08-22 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

While working on PR109751 I found that maybe_substitute_reqs_for was doing
the wrong thing for a non-template friend, substituting in the template args
of the scope's original template rather than those of the instantiation.
This didn't end up being necessary to fix the PR, but it's still an
improvement.

gcc/cp/ChangeLog:

* pt.cc (outer_template_args): Handle non-template argument.
* constraint.cc (maybe_substitute_reqs_for): Pass decl to it.
* cp-tree.h (outer_template_args): Adjust.
---
 gcc/cp/cp-tree.h |  2 +-
 gcc/cp/constraint.cc |  2 +-
 gcc/cp/pt.cc | 12 +++-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 356d7ffb6d6..eb901683b6d 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7083,7 +7083,7 @@ extern tree maybe_set_retval_sentinel (void);
 extern tree template_parms_to_args (tree);
 extern tree template_parms_level_to_args   (tree);
 extern tree generic_targs_for  (tree);
-extern tree outer_template_args(tree);
+extern tree outer_template_args(const_tree);
 
 /* in expr.cc */
 extern tree cplus_expand_constant  (tree);
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 8cf0f2d0974..c9e4e7043cd 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1339,7 +1339,7 @@ maybe_substitute_reqs_for (tree reqs, const_tree decl)
   if (DECL_UNIQUE_FRIEND_P (decl) && DECL_TEMPLATE_INFO (decl))
 {
   tree tmpl = DECL_TI_TEMPLATE (decl);
-  tree outer_args = outer_template_args (tmpl);
+  tree outer_args = outer_template_args (decl);
   processing_template_decl_sentinel s;
   if (PRIMARY_TEMPLATE_P (tmpl)
  || uses_template_parms (outer_args))
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f4e77d172b9..c017591f235 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4966,19 +4966,21 @@ generic_targs_for (tree tmpl)
 }
 
 /* Return the template arguments corresponding to the template parameters of
-   TMPL's enclosing scope.  When TMPL is a member of a partial specialization,
+   DECL's enclosing scope.  When DECL is a member of a partial specialization,
   this returns the arguments for the partial specialization as opposed to those
   for the primary template, which is the main difference between this function
-   and simply using e.g. the TYPE_TI_ARGS of TMPL's DECL_CONTEXT.  */
+   and simply using e.g. the TYPE_TI_ARGS of DECL's DECL_CONTEXT.  */
 
 tree
-outer_template_args (tree tmpl)
+outer_template_args (const_tree decl)
 {
-  tree ti = get_template_info (DECL_TEMPLATE_RESULT (tmpl));
+  if (TREE_CODE (decl) == TEMPLATE_DECL)
+decl = DECL_TEMPLATE_RESULT (decl);
+  tree ti = get_template_info (decl);
   if (!ti)
 return NULL_TREE;
   tree args = TI_ARGS (ti);
-  if (!PRIMARY_TEMPLATE_P (tmpl))
+  if (!PRIMARY_TEMPLATE_P (TI_TEMPLATE (ti)))
 return args;
   if (TMPL_ARGS_DEPTH (args) == 1)
 return NULL_TREE;
-- 
2.39.3



[pushed 1/2] c++: constrained hidden friends [PR109751]

2023-08-22 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

r13-4035 avoided a problem with overloading of constrained hidden friends by
checking satisfaction, but checking satisfaction early is inconsistent with
the usual late checking and can lead to hard errors, so let's not do that
after all.

We were wrongly treating the different instantiations of the same friend
template as the same function because maybe_substitute_reqs_for was failing
to actually substitute in the case of a non-template friend.  But we don't
actually need to do the substitution anyway, because [temp.friend] says that
such a friend can't be the same as any other declaration.

After fixing that, instead of a redefinition error we got an ambiguous
overload error, fixed by allowing constrained hidden friends to coexist
until overload resolution, at which point they probably won't be in the same
ADL overload set anyway.

And we avoid mangling collisions by following the proposed mangling for
these friends as a member function with an extra 'F' before the name.  I
demangle this by just adding [friend] to the name of the function because
it's not feasible to reconstruct the actual scope of the function since the
mangling ABI doesn't distinguish between class and namespace scopes.

PR c++/109751

gcc/cp/ChangeLog:

* cp-tree.h (member_like_constrained_friend_p): Declare.
* decl.cc (member_like_constrained_friend_p): New.
(function_requirements_equivalent_p): Check it.
(duplicate_decls): Check it.
(grokfndecl): Check friend template constraints.
* mangle.cc (decl_mangling_context): Check it.
(write_unqualified_name): Check it.
* pt.cc (uses_outer_template_parms_in_constraints): Fix for friends.
(tsubst_friend_function): Don't check satisfaction.

include/ChangeLog:

* demangle.h (enum demangle_component_type): Add
DEMANGLE_COMPONENT_FRIEND.

libiberty/ChangeLog:

* cp-demangle.c (d_make_comp): Handle DEMANGLE_COMPONENT_FRIEND.
(d_count_templates_scopes): Likewise.
(d_print_comp_inner): Likewise.
(d_unqualified_name): Handle member-like friend mangling.
* testsuite/demangle-expected: Add test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-friend11.C: Now works.  Add template.
* g++.dg/cpp2a/concepts-friend15.C: New test.
---
 gcc/cp/cp-tree.h  |  3 +-
 include/demangle.h|  2 +
 gcc/cp/decl.cc| 49 ++-
 gcc/cp/mangle.cc  | 10 
 gcc/cp/pt.cc  | 14 --
 .../g++.dg/cpp2a/concepts-friend11.C  | 26 ++
 .../g++.dg/cpp2a/concepts-friend11a.C | 15 ++
 .../g++.dg/cpp2a/concepts-friend15.C  | 22 +
 libiberty/cp-demangle.c   | 17 +++
 libiberty/testsuite/demangle-expected |  3 ++
 10 files changed, 145 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend11a.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend15.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index d051ee85f70..356d7ffb6d6 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6859,6 +6859,7 @@ extern void note_break_stmt   (void);
 extern bool note_iteration_stmt_body_start (void);
 extern void note_iteration_stmt_body_end   (bool);
 extern void determine_local_discriminator  (tree);
+extern bool member_like_constrained_friend_p   (tree);
 extern bool fns_correspond (tree, tree);
 extern int decls_match (tree, tree, bool = true);
 extern bool maybe_version_functions(tree, tree, bool);
@@ -7385,7 +7386,7 @@ extern tree lookup_template_function  (tree, 
tree);
 extern tree lookup_template_variable   (tree, tree, tsubst_flags_t);
 extern bool uses_template_parms(tree);
 extern bool uses_template_parms_level  (tree, int);
-extern bool uses_outer_template_parms_in_constraints (tree);
+extern bool uses_outer_template_parms_in_constraints (tree, tree = NULL_TREE);
 extern bool need_generic_capture   (void);
 extern tree instantiate_class_template (tree);
 extern tree instantiate_template   (tree, tree, tsubst_flags_t);
diff --git a/include/demangle.h b/include/demangle.h
index 769137e03e5..f062d7731c6 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -448,6 +448,8 @@ enum demangle_component_type
   DEMANGLE_COMPONENT_TRANSACTION_SAFE,
   /* A cloned function.  */
   DEMANGLE_COMPONENT_CLONE,
+  /* A member-like friend function.  */
+  DEMANGLE_COMPONENT_FRIEND,
   DEMANGLE_COMPONENT_NOEXCEPT,
   DEMANGLE_COMPONENT_THROW_SPEC,
 
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 62c34bf9abe..bea0ee92106 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -951,6 

Re: [PATCH] RISC-V: output Autovec params explicitly in --help ...

2023-08-22 Thread Vineet Gupta




On 8/22/23 11:07, Palmer Dabbelt wrote:
We should probably put them in invoke.texi as well (and anything else 
we're missing that's been added recently). 


Looks like I'd pushed the patch already.
A whole bunch of them are missing, so I guess that can happen separately.

-Vineet


[Committed] RISC-V: output Autovec params explicitly in --help ...

2023-08-22 Thread Vineet Gupta
... otherwise user has no clue what -param to actually change

gcc/ChangeLog:
* config/riscv/riscv.opt: Add --param names
riscv-autovec-preference and riscv-autovec-lmul

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.opt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 6304efebfd50..a962ea8f9d41 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -277,7 +277,7 @@ Always inline subword atomic operations.
 
 Enum
 Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
-The RISC-V auto-vectorization preference:
+Valid arguments to -param=riscv-autovec-preference=:
 
 EnumValue
 Enum(riscv_autovec_preference) String(none) Value(NO_AUTOVEC)
@@ -294,7 +294,7 @@ Target RejectNegative Joined Enum(riscv_autovec_preference) 
Var(riscv_autovec_pr
 
 Enum
 Name(riscv_autovec_lmul) Type(enum riscv_autovec_lmul_enum)
-The RVV possible LMUL:
+The RVV possible LMUL (-param=riscv-autovec-lmul=):
 
 EnumValue
 Enum(riscv_autovec_lmul) String(m1) Value(RVV_M1)
-- 
2.34.1



Re: [PATCH] RISC-V: output Autovec params explicitly in --help ...

2023-08-22 Thread Palmer Dabbelt

On Tue, 22 Aug 2023 10:59:35 PDT (-0700), gcc-patches@gcc.gnu.org wrote:



On 8/22/23 11:40, Vineet Gupta wrote:

... otherwise user has no clue what -param to actually change

gcc/ChangeLog:
* config/riscv/riscv.opt: Add --param names
  riscv-autovec-preference and riscv-autovec-lmul

OK


We should probably put them in invoke.texi as well (and anything else 
we're missing that's been added recently).


Re: [PATCH][committed] RISC-V: Add multiarch support on riscv-linux-gnu

2023-08-22 Thread Palmer Dabbelt

On Tue, 22 Aug 2023 10:39:38 PDT (-0700), Jeff Law wrote:


This adds multiarch support to the RISC-V port so that bootstraps work
with Debian out-of-the-box.  Without this patch the stage1 compiler is
unable to find headers/libraries when building the stage1 runtime.

This is functionally (and possibly textually) equivalent to Debian's fix
for the same problem.

gcc/
* config/riscv/t-linux: Add MULTIARCH_DIRNAME.

Pushed to the trunk on Raphael's behalf.

Jeff
commit 47f95bc4be4eb14730ab3eaaaf8f6e71fda47690
Author: Raphael Moreira Zinsly 
Date:   Tue Aug 22 11:37:04 2023 -0600

RISC-V: Add multiarch support on riscv-linux-gnu

This adds multiarch support to the RISC-V port so that bootstraps work with
Debian out-of-the-box.  Without this patch the stage1 compiler is unable to
find headers/libraries when building the stage1 runtime.

This is functionally (and possibly textually) equivalent to Debian's fix for
the same problem.

gcc/
* config/riscv/t-linux: Add MULTIARCH_DIRNAME.

diff --git a/gcc/config/riscv/t-linux b/gcc/config/riscv/t-linux
index 216d2776a18..a6f64f88d25 100644
--- a/gcc/config/riscv/t-linux
+++ b/gcc/config/riscv/t-linux
@@ -1,3 +1,5 @@
 # Only XLEN and ABI affect Linux multilib dir names, e.g. /lib32/ilp32d/
 MULTILIB_DIRNAMES := $(patsubst rv32%,lib32,$(patsubst 
rv64%,lib64,$(MULTILIB_DIRNAMES)))
 MULTILIB_OSDIRNAMES := $(patsubst lib%,../lib%,$(MULTILIB_DIRNAMES))
+
+MULTIARCH_DIRNAME := $(call if_multiarch,$(firstword $(subst -, 
,$(target)))-linux-gnu)


The docs seem to suggest that we should have a multiarch-compatible 
MULTILIB_OSDIRNAMES as we support both multilib and multiarch:


   @code{MULTIARCH_DIRNAME} is not used for configurations that support
   both multilib and multiarch.  In that case, multiarch names are encoded
   in @code{MULTILIB_OSDIRNAMES} instead.

It's not clear if "supports" there actually means "enabled", as IIUC none of
the distros actually ship multilib.  So maybe this can't manifest and we should
fix the docs?

Or maybe something like this just does it?

   diff --git a/gcc/config/riscv/t-linux b/gcc/config/riscv/t-linux
   index a6f64f88d25..00e382db0f8 100644
   --- a/gcc/config/riscv/t-linux
   +++ b/gcc/config/riscv/t-linux
   @@ -1,5 +1,5 @@
# Only XLEN and ABI affect Linux multilib dir names, e.g. /lib32/ilp32d/
MULTILIB_DIRNAMES := $(patsubst rv32%,lib32,$(patsubst 
rv64%,lib64,$(MULTILIB_DIRNAMES)))
   -MULTILIB_OSDIRNAMES := $(patsubst lib%,../lib%,$(MULTILIB_DIRNAMES))
   +MULTILIB_OSDIRNAMES := $(patsubst lib64%:rv64%-linux-gnu,$(patsubst 
lib32%:rv32%-linux-gnu,../lib32,$(MULTILIB_DIRNAMES)))

MULTIARCH_DIRNAME := $(call if_multiarch,$(firstword $(subst -, ,$(target)))-linux-gnu)


I have no idea how to test multiarch+multilib, though.  Is there a way to just
autoconf error that out as unsupported until someone wants it?


Re: [PATCH] RISC-V: output Autovec params explicitly in --help ...

2023-08-22 Thread Jeff Law via Gcc-patches




On 8/22/23 11:40, Vineet Gupta wrote:

... otherwise user has no clue what -param to actually change

gcc/ChangeLog:
* config/riscv/riscv.opt: Add --param names
  riscv-autovec-preference and riscv-autovec-lmul

OK
jeff


Re: [PATCH v4] c++: extend cold, hot attributes to classes

2023-08-22 Thread Jason Merrill via Gcc-patches

On 8/15/23 09:41, Javier Martinez wrote:
On Mon, Aug 14, 2023 at 8:32 PM Jason Merrill wrote:

 > I think you also want to check for ATTR_FLAG_TYPE_IN_PLACE.
 > [...]
 > > +  propagate_class_warmth_attribute (t);
 > Maybe call this in check_bases_and_members instead?

Yes, that is sensible. Done.


You still need an update to doc/extend.texi for this additional use of 
the attribute.  Sorry I didn't think of that before.



+ warning (OPT_Wattributes, "ignoring attribute %qE because it "
+   "conflicts with attribute %qs", name, "cold");

...

+ warning (OPT_Wattributes, "ignoring attribute %qE because it "
+   "conflicts with attribute %qs", name, "hot");


Function arguments continuing on the next line should line up with the '('.


+  tree class_has_cold_attr = lookup_attribute ("cold",
+   TYPE_ATTRIBUTES (t));
+  tree class_has_hot_attr = lookup_attribute ("hot",
+   TYPE_ATTRIBUTES (t));


...so I'd suggest reformatting these lines as:

   tree class_has_cold_attr
 = lookup_attribute ("cold", TYPE_ATTRIBUTES (t));
   tree class_has_hot_attr
 = lookup_attribute ("hot", TYPE_ATTRIBUTES (t));


+   decl_attributes (,
+   tree_cons (get_identifier ("cold"), NULL, NULL), 0);


...and maybe use a local variable for the result of tree_cons.


+  if (has_cold_attr || has_hot_attr)
+{
+
+  /* Transparently ignore the new warmth attribute if it


Unnecessary blank line.

Jason



[PATCH] RISC-V: output Autovec params explicitly in --help ...

2023-08-22 Thread Vineet Gupta
... otherwise user has no clue what -param to actually change

gcc/ChangeLog:
* config/riscv/riscv.opt: Add --param names
  riscv-autovec-preference and riscv-autovec-lmul

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.opt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 6304efebfd50..a962ea8f9d41 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -277,7 +277,7 @@ Always inline subword atomic operations.
 
 Enum
 Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
-The RISC-V auto-vectorization preference:
+Valid arguments to -param=riscv-autovec-preference=:
 
 EnumValue
 Enum(riscv_autovec_preference) String(none) Value(NO_AUTOVEC)
@@ -294,7 +294,7 @@ Target RejectNegative Joined Enum(riscv_autovec_preference) 
Var(riscv_autovec_pr
 
 Enum
 Name(riscv_autovec_lmul) Type(enum riscv_autovec_lmul_enum)
-The RVV possible LMUL:
+The RVV possible LMUL (-param=riscv-autovec-lmul=):
 
 EnumValue
 Enum(riscv_autovec_lmul) String(m1) Value(RVV_M1)
-- 
2.34.1



[PATCH][committed] RISC-V: Add multiarch support on riscv-linux-gnu

2023-08-22 Thread Jeff Law


This adds multiarch support to the RISC-V port so that bootstraps work 
with Debian out-of-the-box.  Without this patch the stage1 compiler is 
unable to find headers/libraries when building the stage1 runtime.


This is functionally (and possibly textually) equivalent to Debian's fix 
for the same problem.


gcc/
* config/riscv/t-linux: Add MULTIARCH_DIRNAME.

Pushed to the trunk on Raphael's behalf.

Jeff
commit 47f95bc4be4eb14730ab3eaaaf8f6e71fda47690
Author: Raphael Moreira Zinsly 
Date:   Tue Aug 22 11:37:04 2023 -0600

RISC-V: Add multiarch support on riscv-linux-gnu

This adds multiarch support to the RISC-V port so that bootstraps work with
Debian out-of-the-box.  Without this patch the stage1 compiler is unable to
find headers/libraries when building the stage1 runtime.

This is functionally (and possibly textually) equivalent to Debian's fix for
the same problem.

gcc/
* config/riscv/t-linux: Add MULTIARCH_DIRNAME.

diff --git a/gcc/config/riscv/t-linux b/gcc/config/riscv/t-linux
index 216d2776a18..a6f64f88d25 100644
--- a/gcc/config/riscv/t-linux
+++ b/gcc/config/riscv/t-linux
@@ -1,3 +1,5 @@
 # Only XLEN and ABI affect Linux multilib dir names, e.g. /lib32/ilp32d/
 MULTILIB_DIRNAMES := $(patsubst rv32%,lib32,$(patsubst 
rv64%,lib64,$(MULTILIB_DIRNAMES)))
 MULTILIB_OSDIRNAMES := $(patsubst lib%,../lib%,$(MULTILIB_DIRNAMES))
+
+MULTIARCH_DIRNAME := $(call if_multiarch,$(firstword $(subst -, 
,$(target)))-linux-gnu)


Re: [PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 10:25:51AM +0200, FX Coudert wrote:
> > Revised patch. It does the job on darwin, can you check that it still tests 
> > the functions on Linux?
> > And if so, OK to commit?
> 
> With the correct file, sorry.

Seems to work for me, I see
... -DNONSTDFUNC=1 ...
on the test's command line on linux and the test passes.
So ok.

Jakub



Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-22 Thread Kito Cheng via Gcc-patches
It's a really great improvement; it drops some states like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL, which makes this algorithm much easier to
understand!
It also fundamentally improves phase 3. One concern is that the time
complexity might become a higher order (and it's already high enough
in fact), but mostly the vectorized code only appears within the
innermost loop, so that is acceptable in general.

So I will try my best to review this closely to make it as close to
perfect as possible :)

I saw you have updated several testcases; why update them instead of adding new ones?
Could you say more about why some testcases added __riscv_vadd_vv_i8mf8
or added more dependencies on the vl variable?



> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info ,
> const vector_insn_info )
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}

This seems to relax the compatibility check to allow optimizing more cases;
if so, this should be a separate patch.

>return true;
>  }


> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))

I didn't get this change; could you explain a bit more about it? It was
an early exit for non-vsetvl insns, but now it allows them?

>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);

> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info ,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info _info,
> -enum merge_type type) const
> +enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)

Why need this exception?

>  gcc_assert (this->compatible_p (merge_info)
> && "Can't merge incompatible demanded infos");

> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) 
> const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>  {
> -  const auto _block_info = vector_block_infos[pred_cfg_bb->index];
> -  if (!pred_block_info.local_dem.valid_or_dirty_p ()
> - && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +  if (prob == profile_probability::uninitialized ())
> +   prob = vector_block_infos[e->dest->index].probability;
> +  else if (prob == vector_block_infos[e->dest->index].probability)
> continue;
> -  return false;
> +  else
> +   return true;

Make sure I understand this correctly: it's worthwhile if those edges have
different probabilities?

>  }
> -  return true;
> +  return false;

If all the probabilities are the same, then it's not worthwhile?

Please add a few comments, whether or not my understanding is right :)

>  }
>
>  bool

> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap 
> bitdata) const
>sbitmap_iterator sbi;
>
>EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -if (ratio == -1)
> -  ratio = vector_exprs[bb_index]->get_ratio ();
> -else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -  return false;
> -  }
> +{
> +  if (ratio == -1)
> +   ratio = vector_exprs[bb_index]->get_ratio ();
> +  else if (vector_exprs[bb_index]->get_ratio () != ratio)
> +   return false;
> +}
>return true;
>  }

Split this into an NFC patch; you can commit that without asking for review.

> @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, 
> insn_info *insn,
> ] UNSPEC_VPREDICATE)
> (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
> (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
> -[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) 
> "rvv.c":8:12
> -2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
> +[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
> +"rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} 
> (expr_list:REG_EQUIV
>  (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 +0 S[64, 64] A128])
> 

Re: [PATCH] libgccjit: Add support for `restrict` attribute on function parameters

2023-08-22 Thread Antoni Boucher via Gcc-patches
Since the tests in the PR for rustc_codegen_gcc
(https://github.com/rust-lang/rustc_codegen_gcc/pull/312) currently
fail, let's wait a bit before merging the patch, in case it needs
some fixes.

On Thu, 2023-08-17 at 20:09 +0200, Guillaume Gomez via Jit wrote:
> Quick question: do you plan to make the merge or should I ask Antoni?
> 
> Le jeu. 17 août 2023 à 17:59, Guillaume Gomez
> 
> a écrit :
> 
> > Thanks for the review!
> > 
> > Le jeu. 17 août 2023 à 17:50, David Malcolm  a
> > écrit
> > :
> > > 
> > > On Thu, 2023-08-17 at 17:41 +0200, Guillaume Gomez wrote:
> > > > And now I just discovered that a lot of commits from Antoni's
> > > > fork
> > > > haven't been sent upstream which is why the ABI count is so
> > > > high in
> > > > his repository. Fixed that as well.
> > > 
> > > Thanks for the updated patch; I was about to comment on that.
> > > 
> > > This version is good for gcc trunk.
> > > 
> > > Dave
> > > 
> > > > 
> > > > Le jeu. 17 août 2023 à 17:26, Guillaume Gomez
> > > >  a écrit :
> > > > > 
> > > > > Antoni spot a typo I made:
> > > > > 
> > > > > I added `LIBGCCJIT_HAVE_gcc_jit_type_get_size` instead of
> > > > > `LIBGCCJIT_HAVE_gcc_jit_type_get_restrict`. Fixed in this
> > > > > patch,
> > > > > sorry
> > > > > for the noise.
> > > > > 
> > > > > Le jeu. 17 août 2023 à 11:30, Guillaume Gomez
> > > > >  a écrit :
> > > > > > 
> > > > > > Hi Dave,
> > > > > > 
> > > > > > > What kind of testing has the patch had? (e.g. did you run
> > > > > > > "make
> > > > > > > check-
> > > > > > > jit" ?  Has this been in use on real Rust code?)
> > > > > > 
> > > > > > I tested it as Rust backend directly on this code:
> > > > > > 
> > > > > > ```
> > > > > > pub fn foo(a: &mut i32, b: &mut i32, c: &i32) {
> > > > > >     *a += *c;
> > > > > >     *b += *c;
> > > > > > }
> > > > > > ```
> > > > > > 
> > > > > > I ran it with `rustc` (and the GCC backend) with the
> > > > > > following
> > > > > > flags:
> > > > > > `-C link-args=-lc --emit=asm -O --crate-type=lib` which
> > > > > > gave the
> > > > > > diff
> > > > > > you can see in the attached file. Explanations: the diff on
> > > > > > the
> > > > > > right
> > > > > > has the `__restrict__` attribute used whereas on the left
> > > > > > it is
> > > > > > the
> > > > > > current version where we don't handle it.
> > > > > > 
> > > > > > As for C testing, I used this code:
> > > > > > 
> > > > > > ```
> > > > > > void t(int *__restrict__ a, int *__restrict__ b, char
> > > > > > *__restrict__ c) {
> > > > > >     *a += *c;
> > > > > >     *b += *c;
> > > > > > }
> > > > > > ```
> > > > > > 
> > > > > > (without the `__restrict__` of course when I need to have a
> > > > > > witness
> > > > > > ASM). I attached the diff as well, this time the file with
> > > > > > the
> > > > > > use of
> > > > > > `__restrict__` in on the left. I compiled with the
> > > > > > following
> > > > > > flags:
> > > > > > `-S -O3`.
> > > > > > 
> > > > > > > Please add a feature macro:
> > > > > > > #define LIBGCCJIT_HAVE_gcc_jit_type_get_restrict
> > > > > > > (see the similar ones in the header).
> > > > > > 
> > > > > > I added `LIBGCCJIT_HAVE_gcc_jit_type_get_size` and extended
> > > > > > the
> > > > > > documentation as well to mention the ABI change.
> > > > > > 
> > > > > > > Please add a new ABI tag (LIBGCCJIT_ABI_25 ?), rather
> > > > > > > than
> > > > > > > adding this
> > > > > > > to ABI_0.
> > > > > > 
> > > > > > I added `LIBGCCJIT_ABI_34` as `LIBGCCJIT_ABI_33` was the
> > > > > > last
> > > > > > one.
> > > > > > 
> > > > > > > This refers to a "cold attribute"; is this a vestige of a
> > > > > > > copy-
> > > > > > > and-
> > > > > > > paste from a different test case?
> > > > > > 
> > > > > > It is a vestige indeed... Missed this one.
> > > > > > 
> > > > > > > I see that the test scans the generated assembler.  Does
> > > > > > > the
> > > > > > > test
> > > > > > > actually verify that restrict has an effect, or was that
> > > > > > > another
> > > > > > > vestige from a different test case?
> > > > > > 
> > > > > > No, this time it's what I wanted. Please see the C diff I
> > > > > > provided
> > > > > > above to see that the ASM has a small diff that allowed me
> > > > > > to
> > > > > > confirm
> > > > > > that the `__restrict__` attribute was correctly set.
> > > > > > 
> > > > > > > If this test is meant to run at -O3 and thus can't be
> > > > > > > part of
> > > > > > > test-
> > > > > > > combination.c, please add a comment about it to
> > > > > > > gcc/testsuite/jit.dg/all-non-failing-tests.h (in the
> > > > > > > alphabetical
> > > > > > > place).
> > > > > > 
> > > > > > Below `-O3`, this ASM difference doesn't appear
> > > > > > unfortunately.
> > > > > > 
> > > > > > > The patch also needs to add documentation for the new
> > > > > > > entrypoint (in
> > > > > > > topics/types.rst), and for the new ABI tag (in
> > > > > > > topics/compatibility.rst).
> > > > > > 
> > > > > > Added!
> > > > > > 
> > > > > > > Thanks again for the patch; hope 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 10:35:55PM +0800, Hongtao Liu wrote:
> Let's assume there's no delta now, AVX10.1-512 is equal to
> AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> > other stuff.
> > The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> > like now, except that the current AVX512* sets imply also EVEX512/whatever
> > it will be called, that option itself enables nothing (or TARGET_AVX512F),
> > and unsetting it doesn't disable all the TARGET_AVX512*.
> > -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
> So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
> -mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
> then the combination basically is equal to AVX10.1-512(AVX512* sets +
> EVEX512)
> If this is your assumption, yes, there's no need for TARGET_AVX10_1.

I think that would be my expectation.  -mavx512bw currently implies
512-bit vector support of avx512f and avx512bw, and with -mavx512{bw,vl}
also 128-bit/256-bit vector support.  All pre-AVX10 chips which do support
AVX512BW support 512-bit vectors.  Now, -mavx10.1 will bring in also
vl,dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq as you wrote
which weren't enabled before, but unless there is some existing or planned
CPU which would support 512-bit vectors in avx512f and avx512bw ISAs and
only support 128/256-bit vectors in those
dq,cd,bf16,fp16,vbmi,vbmi2,vnni,ifma,bitalg,vpopcntdq isas, I think there
is no need to differentiate further; the only CPUs which will support both
what -mavx512bw and -mavx10.1 requires will be (if there is no delta)
either CPUs with 128/256/512-bit vector support of those
f,vl,bw,dq,cd,...vpopcntdq ISAs, or AVX10.1-512 ISAs.
-mavx512vl -mavx512bw -mno-evex512 -mavx10.1-256 would on the other side
disable all 512-bit vector instructions and in the end just mean the
same as -mavx10.1-256.
For just
-mavx512bw -mno-evex512 -mavx10.1-256
the question is if that -mno-evex512 turns off also avx512bw/avx512f because
avx512vl isn't enabled at that point during processing, or if we do that
only at the end as a special case.  Of course, in this exact case there is
no difference, because -mavx10.1-256 turns that back on.
But it would make a difference on
-mavx512bw -mno-evex512 -mavx512vl
(when processed right away, it would disable AVX512BW (because VL isn't on)
and in the end enable VL and F including EVEX512; or it would be equivalent
to just -mavx512bw -mavx512vl if processed at the end, because -mavx512vl
implied -mevex512 again).

Jakub



Re: [PATCH] RISC-V: Add Types to Un-Typed Sync Instructions:

2023-08-22 Thread Edwin Lu

On 8/21/2023 2:41 PM, Jeff Law via Gcc-patches wrote:



On 8/21/23 10:51, Edwin Lu wrote:

@@ -77,4 +78,4 @@ (define_insn "atomic_store_ztso"
    return "s\t%z1,%0";
    }
    [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
\ No newline at end of file
+   (set (attr "length") (const_int 8))])
This raises a question.  We're likely better off using "multi" for a 
define_insn which generates multiple instructions.


That makes sense to me.



Can you respin changing atomic to multi for those cases where we're 
generating more than one instruction out of a define_insn?



Thanks for the feedback! I'll update those instructions with "multi".

Edwin Lu



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 9:35 PM Hongtao Liu  wrote:
>
> On Tue, Aug 22, 2023 at 9:24 PM Richard Biener
>  wrote:
> >
> > On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best 
> > > > > option
> > > > > name to represent whether the effective ISA set allows 512-bit 
> > > > > vectors or
> > > > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, 
> > > > > -mavx10.1-256
> > > > > option IMHO should be in the same spirit to all the others a positive 
> > > > > enablement,
> > > > > not both positive (enable avx512{f,cd,bw,dq,...} and negative 
> > > > > (disallow
> > > > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because 
> > > > > the
> > > > > former would allow 512-bit vectors, the latter shouldn't disable 
> > > > > those again
> > > > > because it isn't a -mno-* option.  Sure, instructions which are 
> > > > > specific to
> > > > But there's implicit negative (disallow 512-bit vector), I think
> > >
> > > That is wrong.
> > >
> > > > -mav512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > > > 512-bit vector.
> > >
> > > Because then the -mavx10.1-256 option behaves completely differently from
> > > all the other isa options.
> > >
> > > We have the -march= options which are processed separately, but the normal
> > > ISA options either only enable something (when -mwhatever), or only 
> > > disable something
> > > (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> > > ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> > > harder to understand.
> > >
> > > > Further, we should disallow a mix of exex512 and non-evex512 (e.g.
> > > > -mavx10.1-512 -mavx10.2-256),they should be a unified separate switch
> > > > that either disallows both or allows both. Instead of some isa
> > > > allowing it and some isa disallowing it.
> > >
> > > No, it will be really terrible user experience if the new options behave
> > > completely differently from everything else.  Because then we'll need to
> Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> evex instruction patterns.
> > > document it in detail how it behaves and users will have hard time to 
> > > figure
> > > it out, and specify what it does not just on the command line, but also 
> > > when
> > > mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 
> > > should
> > > be a union of those two ISAs.  Either internally there is an ISA flag 
> > > whether
> > > the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> > > 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> > > enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> > > instructions from the 10.1 to 10.2 delta, or if there is no such 
> > > separation
> > > internally, it will just enable full AVX10.2-512.  User has asked for it.
> >
> > I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
> > confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 
> > isn't
> > good propose something else.  -mavx512f will enable 512bits, -mavx10.1
> > will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
> > 512bits.
> >
> > So scrap -mavx10.1-256 and -mavx10.1-512 please.
The related issue is what -mno-avx10.1-256/-mno-avx10.1-512 should mean.
For -mno-avx10.1-256, maybe it just disables the whole of avx10.1.
But should -mno-avx10.1-512 disable the whole of avx10.1 or just EVEX512?
Or maybe we just don't provide -mno-avx10.1-512, only -mno-avx10.1-256,
and use -mno-evex512 to disable 512-bit vectors.
>
> It sounds to me we would have something like
> avx512XXX
>^
>|
> "independent": TARGET_AVX512VL || TARGET_AVX10_1 will enable
> 128/256-bit instruction.
>|
> avx10.1-256  ^  ^
> |   |
> |   |
> implied   implied
> |   |
> |   |
> avx10.2-256  ^  ^
> |   |
> |   |
> impliedImplied
> |   |
> |   |
> avx10.3-256 <---implied---avx10.3-512
>   .
>
> And put every existing and new instruction under those flags
>
> >
> > Richard.
> >
> > > Jakub
> > >
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 9:54 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> > Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> > evex instruction patterns.
>
> Why?
> Internally for md etc. purposes, we should have the current
> TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
> (TARGET_EVEX512 even if it is not completely descriptive because of kandq
> etc., or some other name) which says if 512-bit vector modes can be used,
> if g modifier can be used, if the 64-bit mask operations can be used etc.
> Plus, if AVX10.1 contains any instructions not covered in the preexisting
> TARGET_AVX512* sets, TARGET_AVX10_1 which covers that delta, otherwise
> keep -mavx10.1 just as an command line option which enables/disables
Let's assume there's no delta now, AVX10.1-512 is equal to
AVX512{F,VL,BW,DQ,CD,BF16,FP16,VBMI,VBMI2,VNNI,IFMA,BITALG, VPOPCNTDQ}
> other stuff.
> The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
> like now, except that the current AVX512* sets imply also EVEX512/whatever
> it will be called, that option itself enables nothing (or TARGET_AVX512F),
> and unsetting it doesn't disable all the TARGET_AVX512*.
> -mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
So for -mavx512bw -mavx10.1-256, -mavx512bw will set EVEX512, but
-mavx10.1-256 doesn't clear EVEX512 but just enable all AVX512* sets?.
then the combination basically is equal to AVX10.1-512(AVX512* sets +
EVEX512)
If this is your assumption, yes, there's no need for TARGET_AVX10_1.
(My former understanding was that you wanted -mavx512bw -mavx10.1-256
to enable all 128-bit/256-bit/scalar variants but only the avx512bw
512-bit variants; this can't be done without TARGET_AVX10_1.)
So the whole point is that -mavx10.x-256 should neither clear nor set EVEX512,
and -mavx10.x-512 should set EVEX512.
> At the end of the option processing, if EVEX512/whatever is set but
> TARGET_AVX512VL is not, disable TARGET_AVX512F with all its dependencies,
> because VL is a precondition of 128/256-bit EVEX and if 512-bit EVEX is not
> enabled, there is nothing left.
There are scalar EVEX instructions under TARGET_AVX512F (and other
non-AVX512VL features) without EVEX512, so it's not the case that nothing is left.
>
> Jakub
>


-- 
BR,
Hongtao


Re: Re: [PATCH] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-22 Thread 钟居哲
>> It's certainly got the potential to get out of hand.  And it's not just
>> the vectorizer operations.  I know of an architecture that can execute
>> most of its ALU and loads/stores conditionally (not predication, but
>> actual conditional ops) like target  = (x COND Y) ? a << b ; a)

Do you mean we need to add cond_abs, cond_sqrt, cond_sign_extend,
cond_zero_extend, cond_float_extend, ... etc., over 100+ optabs/fns for
vectorization optimization, and support them in gimple IR (middle-end match.pd)?

Or is it OK for now to support those conditional operations in the RISC-V
backend via the combine pass?

I personally prefer the latter, and I have assigned Lehua to work on it.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-22 22:05
To: juzhe.zh...@rivai.ai; Robin Dapp; pinskia
CC: 丁乐华; gcc-patches; kito.cheng; palmer; richard.sandiford; Richard Biener
Subject: Re: [PATCH] RISC-V: Add conditional unary neg/abs/not autovec patterns
 
 
On 8/22/23 02:08, juzhe.zh...@rivai.ai wrote:
> Yes, I agree long-term we want every-thing be optimized as early as 
> possible.
> 
> However, IMHO, it's impossible we can support every conditional patterns 
> in the middle-end (match.pd).
> It's a really big number.
> 
> For example, for sign_extend conversion, we have vsext.vf2 (vector SI -> 
> vector DI),... vsext.vf4 (vector HI -> vector DI), vsext.vf8 (vector QI 
> -> vector DI)..
> Not only the conversion, every auto-vectorization patterns can have 
> conditional format.
> For example, abs,..rotate, sqrt, floor, ceil,etc.
> I bet it could be over 100+ conditional optabs/internal FNs. It's huge 
> number.
> I don't see necessity that we should support them in middle-end 
> (match.pd) since we known RTL back-end combine PASS can do the good job 
> here.
> 
> Besides, LLVM doesn't such many conditional pattern. LLVM just has "add" 
> and "select" separate IR then do the combine in the back-end:
> https://godbolt.org/z/rYcMMG1eT 
> 
> You can see LLVM didn't do the op + select optimization in generic IR, 
> they do the optimization in combine PASS.
> 
> So I prefer this patch solution and apply such solution for the future 
> more support : sign extend, zero extend, float extend, abs, sqrt, ceil, 
> floor, etc.
It's certainly got the potential to get out of hand.  And it's not just 
the vectorizer operations.  I know of an architecture that can execute 
most of its ALU and loads/stores conditionally (not predication, but 
actual conditional ops) like target  = (x COND Y) ? a << b ; a)
 
I'd tend to lean towards synthesizing these conditional ops around a 
conditional move/select primitive in gimple through the RTL expanders. 
That would in turn set things up so that if the target had various 
conditional operations like conditional shift it could be trivially 
discovered by the combiner.
 
We still get most of the benefit of eliminating control flow early, a 
sensible gimple representation, relatively easy translation into RTL and 
  easy combination for targets with actual conditional operations.
 
It turns out that model is something we may want to work towards anyway. 
  We were looking at this exact problem in the context of zicond for 
riscv.  The biggest problem we've seen so far is that the generic 
conditional move expansion generates fairly poor code when the target 
doesn't actually have a conditional move primitive.
 
jeff
 


RE: [PATCH] VECT: Add LEN_FOLD_EXTRACT_LAST pattern

2023-08-22 Thread Li, Pan2 via Gcc-patches
Committed, as it passed both the regression and bootstrap tests on x86.
Thanks, Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Tuesday, August 22, 2023 7:08 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH] VECT: Add LEN_FOLD_EXTRACT_LAST pattern

On Tue, 22 Aug 2023, Juzhe-Zhong wrote:

> Hi, Richard and Richi.
> 
> This is the last autovec pattern I want to add for RVV (length loop control).
> 
> This patch is supposed to handle the following case:
> 
> int __attribute__ ((noinline, noclone))
> condition_reduction (int *a, int min_v, int n)
> {
>   int last = 66; /* High start value.  */
> 
>   for (int i = 0; i < n; i++)
> if (a[i] < min_v)
>   last = i;
> 
>   return last;
> }
> 
> ARM SVE IR:
> 
>   ...
>   mask__7.11_39 = vect__4.10_37 < vect_cst__38;
>   _40 = loop_mask_36 & mask__7.11_39;
>   last_5 = .FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32);
>   ...
> 
> RVV IR, we want to see:
>  ...
>  loop_len = SELECT_VL
>  mask__7.11_39 = vect__4.10_37 < vect_cst__38;
>  last_5 = .LEN_FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32, loop_len, 
> bias);
>  ...

OK.

Richard.

> gcc/ChangeLog:
> 
>   * doc/md.texi: Add LEN_FOLD_EXTRACT_LAST pattern.
>   * internal-fn.cc (fold_len_extract_direct): Ditto.
>   (expand_fold_len_extract_optab_fn): Ditto.
>   (direct_fold_len_extract_optab_supported_p): Ditto.
>   * internal-fn.def (LEN_FOLD_EXTRACT_LAST): Ditto.
> 
> ---
>  gcc/doc/md.texi | 6 ++
>  gcc/internal-fn.cc  | 5 +
>  gcc/internal-fn.def | 3 +++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 89562fdb43c..24453693d89 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5636,6 +5636,12 @@ has mode @var{m} and operands 0 and 1 have the mode 
> appropriate for
>  one element of @var{m}.  Operand 2 has the usual mask mode for vectors
>  of mode @var{m}; see @code{TARGET_VECTORIZE_GET_MASK_MODE}.
>  
> +@cindex @code{len_fold_extract_last_@var{m}} instruction pattern
> +@item @code{len_fold_extract_last_@var{m}}
> +Like @samp{fold_extract_last_@var{m}}, but takes an extra length operand as
> +operand 4 and an extra bias operand as operand 5.  The last associated
> +element to be extracted should have index i < len (operand 4) + bias
> +(operand 5).
> +
>  @cindex @code{fold_left_plus_@var{m}} instruction pattern
>  @item @code{fold_left_plus_@var{m}}
>  Take scalar operand 1 and successively add each element from vector
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 314f63b614b..4138cc31d7e 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -188,6 +188,7 @@ init_internal_fns ()
>  #define cond_len_ternary_direct { 1, 1, true }
>  #define while_direct { 0, 2, false }
>  #define fold_extract_direct { 2, 2, false }
> +#define fold_len_extract_direct { 2, 2, false }
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
>  #define mask_len_fold_left_direct { 1, 1, false }
> @@ -3863,6 +3864,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, 
> convert_optab optab,
>  #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 3)
>  
> +#define expand_fold_len_extract_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
> +
>  #define expand_fold_left_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 2)
>  
> @@ -3980,6 +3984,7 @@ multi_vector_optab_supported_p (convert_optab optab, 
> tree_pair types,
>  #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
>  #define direct_while_optab_supported_p convert_optab_supported_p
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
> +#define direct_fold_len_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 594f7881511..d09403c0a91 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -312,6 +312,9 @@ DEF_INTERNAL_OPTAB_FN (EXTRACT_LAST, ECF_CONST | 
> ECF_NOTHROW,
>  DEF_INTERNAL_OPTAB_FN (FOLD_EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
>  fold_extract_last, fold_extract)
>  
> +DEF_INTERNAL_OPTAB_FN (LEN_FOLD_EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
> +len_fold_extract_last, fold_len_extract)
> +
>  DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>  fold_left_plus, fold_left)
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [committed] i386: Fix grammar typo in diagnostic

2023-08-22 Thread Gerald Pfeifer
On Mon, 7 Aug 2023, Marek Polacek via Gcc-patches wrote:
>> Less obvious (to me) is whether it's correct to say "GCC V13" here. I
>> don't think we refer to a version that way anywhere else, do we?
>> 
>> Would "since GCC 13.1.0" be better?
> x86_field_alignment uses
> 
>   inform (input_location, "the alignment of %<_Atomic %T%> "
>   "fields changed in %{GCC 11.1%}",
> 
> so maybe the below should use %{GCC 13.1%}.  "GCC V13" looks unusual
> to me.

I usually say "GCC 13" when referring to a major release.

("GCC V13" definitely is very unusual.)

Gerald


Re: [PATCH] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-22 Thread Jeff Law via Gcc-patches




On 8/22/23 02:08, juzhe.zh...@rivai.ai wrote:
Yes, I agree that long-term we want everything to be optimized as early 
as possible.


However, IMHO, it's impossible to support every conditional pattern in 
the middle-end (match.pd).

It's a really big number.

For example, for the sign_extend conversion alone we have vsext.vf2 
(vector SI -> vector DI), vsext.vf4 (vector HI -> vector DI), and 
vsext.vf8 (vector QI -> vector DI).
And it's not only conversions: every auto-vectorization pattern can 
have a conditional form.

For example, abs, rotate, sqrt, floor, ceil, etc.
I bet it could be over 100 conditional optabs/internal FNs, which is a 
huge number.
I don't see the necessity of supporting them in the middle-end 
(match.pd), since we know the RTL back-end combine pass can do a good 
job here.


Besides, LLVM doesn't have that many conditional patterns. LLVM just 
has separate "add" and "select" IR and then does the combining in the 
back-end:

https://godbolt.org/z/rYcMMG1eT

You can see that LLVM doesn't do the op + select optimization in 
generic IR; it does the optimization in the combine pass.


So I prefer this patch's solution, and to apply the same solution to 
future support: sign extend, zero extend, float extend, abs, sqrt, 
ceil, floor, etc.
It's certainly got the potential to get out of hand.  And it's not just 
the vectorizer operations.  I know of an architecture that can execute 
most of its ALU and loads/stores conditionally (not predication, but 
actual conditional ops), like target = (x COND y) ? a << b : a.


I'd tend to lean towards synthesizing these conditional ops around a 
conditional move/select primitive in gimple through the RTL expanders. 
That would in turn set things up so that if the target had various 
conditional operations like conditional shift it could be trivially 
discovered by the combiner.


We still get most of the benefit of eliminating control flow early, a 
sensible gimple representation, relatively easy translation into RTL, 
and easy combination for targets with actual conditional operations.


It turns out that model is something we may want to work towards anyway. 
 We were looking at this exact problem in the context of zicond for 
riscv.  The biggest problem we've seen so far is that the generic 
conditional move expansion generates fairly poor code when the target 
doesn't actually have a conditional move primitive.


jeff


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 09:35:44PM +0800, Hongtao Liu wrote:
> Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
> evex instruction patterns.

Why?
Internally for md etc. purposes, we should have the current
TARGET_AVX512* etc. ISA flags, plus one new one, whatever we call it
(TARGET_EVEX512 even if it is not completely descriptive because of kandq
etc., or some other name) which says if 512-bit vector modes can be used,
if g modifier can be used, if the 64-bit mask operations can be used etc.
Plus, if AVX10.1 contains any instructions not covered by the preexisting
TARGET_AVX512* sets, add TARGET_AVX10_1, which covers that delta; otherwise
keep -mavx10.1 just as a command-line option which enables/disables
other stuff.
The current common/config/i386/i386-common.cc OPTION_MASK_ISA*SET* would be
like now, except that the current AVX512* sets imply also EVEX512/whatever
it will be called, that option itself enables nothing (or TARGET_AVX512F),
and unsetting it doesn't disable all the TARGET_AVX512*.
-mavx10.1 would enable the AVX512* sets without EVEX512/whatever.
At the end of the option processing, if EVEX512/whatever is not set and
TARGET_AVX512VL is not either, disable TARGET_AVX512F with all its
dependencies, because VL is a precondition of 128/256-bit EVEX, and if
512-bit EVEX is not enabled either, there is nothing left.

Jakub



Re: [PATCH] doc: Remove obsolete sentence about _Float* not being supported in C++ [PR106652]

2023-08-22 Thread Jeff Law via Gcc-patches




On 8/22/23 02:15, Jakub Jelinek via Gcc-patches wrote:

Hi!

As mentioned in the PR, these types are supported in C++ since GCC 13,
so we shouldn't confuse users.

Ok for trunk?

2023-08-22  Jakub Jelinek  

PR c++/106652
* doc/extend.texi (_Float): Drop obsolete sentence that the
types aren't supported in C++.

OK
jeff


Re: [PATCH V2 5/5] OpenMP: Fortran support for imperfectly-nested loops

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Sun, Jul 23, 2023 at 04:15:21PM -0600, Sandra Loosemore wrote:
> OpenMP 5.0 removed the restriction that multiple collapsed loops must
> be perfectly nested, allowing "intervening code" (including nested
> BLOCKs) before or after each nested loop.  In GCC this code is moved
> into the inner loop body by the respective front ends.
> 
> In the Fortran front end, most of the semantic processing happens during
> the translation phase, so the parse phase just collects the intervening
> statements, checks them for errors, and splices them around the loop body.
> 
> gcc/fortran/ChangeLog
>   * gfortran.h (struct gfc_namespace): Add omp_structured_block bit.
>   * openmp.cc: Include omp-api.h.
>   (resolve_omp_clauses): Consolidate inscan reduction clause conflict
>   checking here.
>   (find_nested_loop_in_chain): New.
>   (find_nested_loop_in_block): New.
>   (gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
>   Handle imperfectly-nested loops when looking for nested omp scan.
>   Refactor to move inscan reduction clause conflict checking to
>   resolve_omp_clauses.
>   (gfc_resolve_do_iterator): Handle imperfectly-nested loops.
>   (struct icode_error_state): New.
>   (icode_code_error_callback): New.
>   (icode_expr_error_callback): New.
>   (diagnose_intervening_code_errors_1): New.
>   (diagnose_intervening_code_errors): New.
>   (make_structured_block): New.
>   (restructure_intervening_code): New.
>   (is_outer_iteration_variable): Do not assume loops are perfectly
>   nested.
>   (check_nested_loop_in_chain): New.
>   (check_nested_loop_in_block_state): New.
>   (check_nested_loop_in_block_symbol): New.
>   (check_nested_loop_in_block): New.
>   (expr_uses_intervening_var): New.
>   (is_intervening_var): New.
>   (expr_is_invariant): Do not assume loops are perfectly nested.
>   (resolve_omp_do): Handle imperfectly-nested loops.
>   * trans-stmt.cc (gfc_trans_block_construct): Generate
>   OMP_STRUCTURED_BLOCK if magic bit is set on block namespace.
> 
> gcc/testsuite/ChangeLog
>   * gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
>   * gfortran.dg/gomp/collapse2.f90: Likewise.
>   * gfortran.dg/gomp/imperfect-gotos.f90: New.
>   * gfortran.dg/gomp/imperfect-invalid-scope.f90: New.
>   * gfortran.dg/gomp/imperfect1.f90: New.
>   * gfortran.dg/gomp/imperfect2.f90: New.
>   * gfortran.dg/gomp/imperfect3.f90: New.
>   * gfortran.dg/gomp/imperfect4.f90: New.
>   * gfortran.dg/gomp/imperfect5.f90: New.
> 
> libgomp/ChangeLog
>   * testsuite/libgomp.fortran/imperfect-destructor.f90: New.
>   * testsuite/libgomp.fortran/imperfect1.f90: New.
>   * testsuite/libgomp.fortran/imperfect2.f90: New.
>   * testsuite/libgomp.fortran/imperfect3.f90: New.
>   * testsuite/libgomp.fortran/imperfect4.f90: New.
>   * testsuite/libgomp.fortran/target-imperfect1.f90: New.
>   * testsuite/libgomp.fortran/target-imperfect2.f90: New.
>   * testsuite/libgomp.fortran/target-imperfect3.f90: New.
>   * testsuite/libgomp.fortran/target-imperfect4.f90: New.

LGTM, but please let Tobias have a second look unless he has done so
already.

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 9:24 PM Richard Biener
 wrote:
>
> On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > > name to represent whether the effective ISA set allows 512-bit vectors 
> > > > or
> > > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, 
> > > > -mavx10.1-256
> > > > option IMHO should be in the same spirit to all the others a positive 
> > > > enablement,
> > > > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > > former would allow 512-bit vectors, the latter shouldn't disable those 
> > > > again
> > > > because it isn't a -mno-* option.  Sure, instructions which are 
> > > > specific to
> > > But there's an implicit negative (disallowing 512-bit vectors), I think
> >
> > That is wrong.
> >
> > > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > > 512-bit vectors.
> >
> > Because then the -mavx10.1-256 option behaves completely differently from
> > all the other isa options.
> >
> > We have the -march= options which are processed separately, but the normal
> > ISA options either only enable something (when -mwhatever), or only disable 
> > something
> > (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> > ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> > harder to understand.
> >
> > > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > > -mavx10.1-512 -mavx10.2-256); they should be a unified separate switch
> > > that either disallows both or allows both, instead of some ISAs
> > > allowing it and some ISAs disallowing it.
> >
> > No, it will be really terrible user experience if the new options behave
> > completely differently from everything else.  Because then we'll need to
Ok, then we can't avoid TARGET_AVX10_1 in those existing 256/128-bit
evex instruction patterns.
> > document it in detail how it behaves and users will have hard time to figure
> > it out, and specify what it does not just on the command line, but also when
> > mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
> > be a union of those two ISAs.  Either internally there is an ISA flag 
> > whether
> > the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> > 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> > enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> > instructions from the 10.1 to 10.2 delta, or if there is no such separation
> > internally, it will just enable full AVX10.2-512.  User has asked for it.
>
> I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
> confusing.  Please separate ISA (avx10.1) from size.  If -m[no-]evex512 isn't
> good propose something else.  -mavx512f will enable 512bits, -mavx10.1
> will not unless -mevex512.  -mavx512f -mavx512vl -mno-evex512 will disable
> 512bits.
>
> So scrap -mavx10.1-256 and -mavx10.1-512 please.

It sounds to me like we would have something like:
avx512XXX
   ^
   |
"independent": TARGET_AVX512VL || TARGET_AVX10_1 will enable
128/256-bit instructions.
   |
avx10.1-256 
> Richard.
>
> > Jakub
> >



-- 
BR,
Hongtao


Re: [PATCH V2 4/5] OpenMP: New C/C++ testcases for imperfectly nested loops.

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Sun, Jul 23, 2023 at 04:15:20PM -0600, Sandra Loosemore wrote:
> gcc/testsuite/ChangeLog
>   * c-c++-common/gomp/imperfect-attributes.c: New.
>   * c-c++-common/gomp/imperfect-badloops.c: New.
>   * c-c++-common/gomp/imperfect-blocks.c: New.
>   * c-c++-common/gomp/imperfect-extension.c: New.
>   * c-c++-common/gomp/imperfect-gotos.c: New.
>   * c-c++-common/gomp/imperfect-invalid-scope.c: New.
>   * c-c++-common/gomp/imperfect-labels.c: New.
>   * c-c++-common/gomp/imperfect-legacy-syntax.c: New.
>   * c-c++-common/gomp/imperfect-pragmas.c: New.
>   * c-c++-common/gomp/imperfect1.c: New.
>   * c-c++-common/gomp/imperfect2.c: New.
>   * c-c++-common/gomp/imperfect3.c: New.
>   * c-c++-common/gomp/imperfect4.c: New.
>   * c-c++-common/gomp/imperfect5.c: New.
> 
> libgomp/ChangeLog
>   * testsuite/libgomp.c-c++-common/imperfect1.c: New.
>   * testsuite/libgomp.c-c++-common/imperfect2.c: New.
>   * testsuite/libgomp.c-c++-common/imperfect3.c: New.
>   * testsuite/libgomp.c-c++-common/imperfect4.c: New.
>   * testsuite/libgomp.c-c++-common/imperfect5.c: New.
>   * testsuite/libgomp.c-c++-common/imperfect6.c: New.
>   * testsuite/libgomp.c-c++-common/target-imperfect1.c: New.
>   * testsuite/libgomp.c-c++-common/target-imperfect2.c: New.
>   * testsuite/libgomp.c-c++-common/target-imperfect3.c: New.
>   * testsuite/libgomp.c-c++-common/target-imperfect4.c: New.

As I wrote in reply to the cover letter, I'd prefer the
ordered(2)/ordered(3) nests to have #pragma omp ordered doacross(source:)
and #pragma omp ordered doacross(sink: ...) directives and
use the libgomp scan-1.c as basis for the scan tests.
Otherwise LGTM.

Jakub



Re: [PATCH V2 3/5] OpenMP: C++ support for imperfectly-nested loops

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Sun, Jul 23, 2023 at 04:15:19PM -0600, Sandra Loosemore wrote:
> OpenMP 5.0 removed the restriction that multiple collapsed loops must
> be perfectly nested, allowing "intervening code" (including nested
> BLOCKs) before or after each nested loop.  In GCC this code is moved
> into the inner loop body by the respective front ends.
> 
> This patch changes the C++ front end to use recursive descent parsing
> on nested loops within an "omp for" construct, rather than an
> iterative approach, in order to preserve proper nesting of compound
> statements.  Preserving cleanups (destructors) for class objects
> declared in intervening code and loop initializers complicates moving
> the former into the body of the loop; this is handled by parsing the
> entire construct before reassembling any of it.
> 
> gcc/cp/ChangeLog
>   * cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
>   * parser.cc (struct omp_for_parse_data): New.
>   (cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
>   in intervening code.
>   (check_omp_intervening_code): New.
>   (cp_parser_statement_seq_opt): Special-case nested loops, blocks,
>   and other constructs for OpenMP loops.
>   (cp_parser_iteration_statement): Reject loops in intervening code.
>   (cp_parser_omp_for_loop_init): Expand comments and tweak the
>   interface slightly to better distinguish input/output parameters.
>   (cp_convert_omp_range_for): Likewise.
>   (cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
>   and largely rewritten.  Add more comments.
>   (insert_structured_blocks): New.
>   (find_structured_blocks): New.
>   (struct sit_data, substitute_in_tree_walker, substitute_in_tree):
>   New.
>   (fixup_blocks_walker): New.
>   (cp_parser_omp_for_loop): Rewrite to use recursive descent instead
>   of a loop.  Add logic to reshuffle the bits of code collected
>   during parsing so intervening code gets moved to the loop body.
>   (cp_parser_omp_loop): Remove call to finish_omp_for_block, which
>   is now redundant.
>   (cp_parser_omp_simd): Likewise.
>   (cp_parser_omp_for): Likewise.
>   (cp_parser_omp_distribute): Likewise.
>   (cp_parser_oacc_loop): Likewise.
>   (cp_parser_omp_taskloop): Likewise.
>   (cp_parser_pragma): Reject OpenMP pragmas in intervening code.
>   * parser.h (struct cp_parser): Add omp_for_parse_state field.
>   * pt.cc (tsubst_omp_for_iterator): Adjust call to
>   cp_convert_omp_range_for.
>   * semantics.cc (finish_omp_for): Try harder to preserve location
>   of loop variable init expression for use in diagnostics.
>   (struct fofb_data, finish_omp_for_block_walker): New.
>   (finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
>   nested inside BIND instead of directly in BIND itself.
> 
> gcc/testsuite/ChangeLog
>   * c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
>   * g++.dg/gomp/attrs-imperfect1.C: New test.
>   * g++.dg/gomp/attrs-imperfect2.C: New test.
>   * g++.dg/gomp/attrs-imperfect3.C: New test.
>   * g++.dg/gomp/attrs-imperfect4.C: New test.
>   * g++.dg/gomp/attrs-imperfect5.C: New test.
>   * g++.dg/gomp/pr41967.C: Adjust expected error patterns.
>   * g++.dg/gomp/tpl-imperfect-gotos.C: New test.
>   * g++.dg/gomp/tpl-imperfect-invalid-scope.C: New test.
> 
> libgomp/ChangeLog
>   * testsuite/libgomp.c++/attrs-imperfect1.C: New test.
>   * testsuite/libgomp.c++/attrs-imperfect2.C: New test.
>   * testsuite/libgomp.c++/attrs-imperfect3.C: New test.
>   * testsuite/libgomp.c++/attrs-imperfect4.C: New test.
>   * testsuite/libgomp.c++/attrs-imperfect5.C: New test.
>   * testsuite/libgomp.c++/attrs-imperfect6.C: New test.
>   * testsuite/libgomp.c++/imperfect-class-1.C: New test.
>   * testsuite/libgomp.c++/imperfect-class-2.C: New test.
>   * testsuite/libgomp.c++/imperfect-class-3.C: New test.
>   * testsuite/libgomp.c++/imperfect-destructor.C: New test.
>   * testsuite/libgomp.c++/imperfect-template-1.C: New test.
>   * testsuite/libgomp.c++/imperfect-template-2.C: New test.
>   * testsuite/libgomp.c++/imperfect-template-3.C: New test.

Ok (though, if the c-c++-common tests are tweaked in the C patch,
this patch needs to undo that).

Jakub



Re: [PATCH 3/3] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 10:52 AM Kewen.Lin  wrote:
>
> Hi,
>
> Like r14-3317 which moves the handlings on memory access
> type VMAT_GATHER_SCATTER in vectorizable_load final loop
> nest, this one is to deal with vectorizable_store side.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK.

> BR,
> Kewen
> -
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_store): Move the handlings on
> VMAT_GATHER_SCATTER in the final loop nest to its own loop,
> and update the final nest accordingly.
> ---
>  gcc/tree-vect-stmts.cc | 258 +
>  1 file changed, 159 insertions(+), 99 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 18f5ebcc09c..b959c1861ad 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -8930,44 +8930,23 @@ vectorizable_store (vec_info *vinfo,
>return true;
>  }
>
> -  auto_vec<tree> result_chain (group_size);
> -  auto_vec<tree> vec_offsets;
> -  auto_vec<tree> vec_oprnds;
> -  for (j = 0; j < ncopies; j++)
> +  if (memory_access_type == VMAT_GATHER_SCATTER)
>  {
> -  gimple *new_stmt;
> -  if (j == 0)
> +  gcc_assert (!slp && !grouped_store);
> +  auto_vec<tree> vec_offsets;
> +  for (j = 0; j < ncopies; j++)
> {
> - if (slp)
> -   {
> - /* Get vectorized arguments for SLP_NODE.  */
> - vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
> -&vec_oprnds);
> - vec_oprnd = vec_oprnds[0];
> -   }
> - else
> + gimple *new_stmt;
> + if (j == 0)
> {
> - /* For interleaved stores we collect vectorized defs for all the
> -stores in the group in DR_CHAIN. DR_CHAIN is then used as an
> -input to vect_permute_store_chain().
> -
> -If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN
> -is of size 1.  */
> - stmt_vec_info next_stmt_info = first_stmt_info;
> - for (i = 0; i < group_size; i++)
> -   {
> - /* Since gaps are not supported for interleaved stores,
> -DR_GROUP_SIZE is the exact number of stmts in the chain.
> -Therefore, NEXT_STMT_INFO can't be NULL_TREE.  In case
> -that there is no interleaving, DR_GROUP_SIZE is 1,
> -and only one iteration of the loop will be executed.  */
> - op = vect_get_store_rhs (next_stmt_info);
> - vect_get_vec_defs_for_operand (vinfo, next_stmt_info, 
> ncopies,
> -op, gvec_oprnds[i]);
> - vec_oprnd = (*gvec_oprnds[i])[0];
> - dr_chain.quick_push (vec_oprnd);
> - next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
> -   }
> + /* Since the store is not grouped, DR_GROUP_SIZE is 1, and
> +DR_CHAIN is of size 1.  */
> + gcc_assert (group_size == 1);
> + op = vect_get_store_rhs (first_stmt_info);
> + vect_get_vec_defs_for_operand (vinfo, first_stmt_info, ncopies,
> +op, gvec_oprnds[0]);
> + vec_oprnd = (*gvec_oprnds[0])[0];
> + dr_chain.quick_push (vec_oprnd);
>   if (mask)
> {
>   vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
> @@ -8975,91 +8954,55 @@ vectorizable_store (vec_info *vinfo,
>  mask_vectype);
>   vec_mask = vec_masks[0];
> }
> -   }
>
> - /* We should have catched mismatched types earlier.  */
> - gcc_assert (useless_type_conversion_p (vectype,
> -TREE_TYPE (vec_oprnd)));
> - bool simd_lane_access_p
> -   = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
> - if (simd_lane_access_p
> - && !loop_masks
> - && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> - && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> - && integer_zerop (get_dr_vinfo_offset (vinfo, first_dr_info))
> - && integer_zerop (DR_INIT (first_dr_info->dr))
> - && alias_sets_conflict_p (get_alias_set (aggr_type),
> -   get_alias_set (TREE_TYPE (ref_type
> -   {
> - dataref_ptr = unshare_expr (DR_BASE_ADDRESS 
> (first_dr_info->dr));
> - dataref_offset = build_int_cst (ref_type, 0);
> + /* We should have catched mismatched types earlier.  */
> + gcc_assert (useless_type_conversion_p (vectype,
> +  

Re: [PATCH 2/3] vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 10:49 AM Kewen.Lin  wrote:
>
> Hi,
>
> Like commit r14-3214 which moves the handlings on memory
> access type VMAT_LOAD_STORE_LANES in vectorizable_load
> final loop nest, this one is to deal with the function
> vectorizable_store.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK.

> BR,
> Kewen
> -
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_store): Move the handlings on
> VMAT_LOAD_STORE_LANES in the final loop nest to its own loop,
> and update the final nest accordingly.
> ---
>  gcc/tree-vect-stmts.cc | 732 ++---
>  1 file changed, 387 insertions(+), 345 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index fcaa4127e52..18f5ebcc09c 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -8779,42 +8779,29 @@ vectorizable_store (vec_info *vinfo,
>*/
>
>auto_vec<tree> dr_chain (group_size);
> -  auto_vec<tree> result_chain (group_size);
>auto_vec<tree> vec_masks;
>tree vec_mask = NULL;
> -  auto_vec<tree> vec_offsets;
>auto_delete_vec<auto_vec<tree>> gvec_oprnds (group_size);
>for (i = 0; i < group_size; i++)
>  gvec_oprnds.quick_push (new auto_vec<tree> (ncopies));
> -  auto_vec<tree> vec_oprnds;
> -  for (j = 0; j < ncopies; j++)
> +
> +  if (memory_access_type == VMAT_LOAD_STORE_LANES)
>  {
> -  gimple *new_stmt;
> -  if (j == 0)
> +  gcc_assert (!slp && grouped_store);
> +  for (j = 0; j < ncopies; j++)
> {
> -  if (slp)
> -{
> - /* Get vectorized arguments for SLP_NODE.  */
> - vect_get_vec_defs (vinfo, stmt_info, slp_node, 1,
> -op, &vec_oprnds);
> -  vec_oprnd = vec_oprnds[0];
> -}
> -  else
> -{
> - /* For interleaved stores we collect vectorized defs for all the
> -stores in the group in DR_CHAIN. DR_CHAIN is then used as an
> -input to vect_permute_store_chain().
> -
> -If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN
> -is of size 1.  */
> + gimple *new_stmt;
> + if (j == 0)
> +   {
> + /* For interleaved stores we collect vectorized defs for all
> +the stores in the group in DR_CHAIN. DR_CHAIN is then used
> +as an input to vect_permute_store_chain().  */
>   stmt_vec_info next_stmt_info = first_stmt_info;
>   for (i = 0; i < group_size; i++)
> {
>   /* Since gaps are not supported for interleaved stores,
> -DR_GROUP_SIZE is the exact number of stmts in the chain.
> -Therefore, NEXT_STMT_INFO can't be NULL_TREE.  In case
> -that there is no interleaving, DR_GROUP_SIZE is 1,
> -and only one iteration of the loop will be executed.  */
> +DR_GROUP_SIZE is the exact number of stmts in the
> +chain. Therefore, NEXT_STMT_INFO can't be NULL_TREE.  */
>   op = vect_get_store_rhs (next_stmt_info);
>   vect_get_vec_defs_for_operand (vinfo, next_stmt_info, 
> ncopies,
>  op, gvec_oprnds[i]);
> @@ -8825,66 +8812,37 @@ vectorizable_store (vec_info *vinfo,
>   if (mask)
> {
>   vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
> -mask, &vec_masks, 
> mask_vectype);
> +mask, &vec_masks,
> +mask_vectype);
>   vec_mask = vec_masks[0];
> }
> -   }
>
> - /* We should have catched mismatched types earlier.  */
> - gcc_assert (useless_type_conversion_p (vectype,
> -TREE_TYPE (vec_oprnd)));
> - bool simd_lane_access_p
> -   = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
> - if (simd_lane_access_p
> - && !loop_masks
> - && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> - && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> - && integer_zerop (get_dr_vinfo_offset (vinfo, first_dr_info))
> - && integer_zerop (DR_INIT (first_dr_info->dr))
> - && alias_sets_conflict_p (get_alias_set (aggr_type),
> -   get_alias_set (TREE_TYPE (ref_type
> -   {
> - dataref_ptr = unshare_expr (DR_BASE_ADDRESS 
> (first_dr_info->dr));
> - dataref_offset = build_int_cst (ref_type, 0);
> + /* We should have catched mismatched types earlier.  */
> + gcc_assert (
> +  

Re: [PATCH v1] libffi: Backport of LoongArch support for libffi.

2023-08-22 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-08-22 at 20:42 +0800, Lulu Cheng wrote:
> This is a backport of ,
> and contains modifications to commit 5a4774cd4d, as well as the LoongArch
> schema portion of commit ee22ecbd11. This is needed for libgo.
> 
> 
> libffi/ChangeLog:

Mention PR libffi/108682 in the ChangeLog here (if it's not pushed yet).

> * configure.host: Add LoongArch support.
> * Makefile.am: Likewise.
> * Makefile.in: Regenerate.
> * src/loongarch64/ffi.c: New file.
> * src/loongarch64/ffitarget.h: New file.
> * src/loongarch64/sysv.S: New file.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 3:16 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > option IMHO should be in the same spirit to all the others a positive 
> > > enablement,
> > > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > former would allow 512-bit vectors, the latter shouldn't disable those 
> > > again
> > > because it isn't a -mno-* option.  Sure, instructions which are specific 
> > > to
> > But there's an implicit negative (disallowing 512-bit vectors), I think
>
> That is wrong.
>
> > -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> > 512-bit vector.
>
> Because then the -mavx10.1-256 option behaves completely differently from
> all the other isa options.
>
> We have the -march= options which are processed separately, but the normal
> ISA options either only enable something (when -mwhatever), or only disable 
> something
> (when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
> ISAs, like say -mavx2 -mbmi is, not an intersection or something even
> harder to understand.
>
> > Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> > -mavx10.1-512 -mavx10.2-256), they should be a unified separate switch
> > that either disallows both or allows both. Instead of some isa
> > allowing it and some isa disallowing it.
>
> No, it will be really terrible user experience if the new options behave
> completely differently from everything else.  Because then we'll need to
> document it in detail how it behaves and users will have hard time to figure
> it out, and specify what it does not just on the command line, but also when
> mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
> be a union of those two ISAs.  Either internally there is an ISA flag whether
> the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
> 512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
> enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
> instructions from the 10.1 to 10.2 delta, or if there is no such separation
> internally, it will just enable full AVX10.2-512.  User has asked for it.

I think having all three -mavx10.1, -mavx10.1-256 and -mavx10.1-512 is just
confusing.  Please separate the ISA (avx10.1) from the vector size.  If
-m[no-]evex512 isn't a good name, propose something else.  -mavx512f will
enable 512-bit vectors; -mavx10.1 will not unless -mevex512 is also given.
-mavx512f -mavx512vl -mno-evex512 will disable 512-bit vectors.

So scrap -mavx10.1-256 and -mavx10.1-512 please.

Richard.

> Jakub
>


Re: [PATCH V2 2/5] OpenMP: C front end support for imperfectly-nested loops

2023-08-22 Thread Jakub Jelinek via Gcc-patches
> New common C/C++ testcases are in a separate patch.
> 
> gcc/c-family/ChangeLog
>   * c-common.h (c_omp_check_loop_binding_exprs): Declare.
>   * c-omp.cc: Include tree-iterator.h.
>   (find_binding_in_body): New.
>   (check_loop_binding_expr_r): New.
>   (LOCATION_OR): New.
>   (check_loop_binding_expr): New.
>   (c_omp_check_loop_binding_exprs): New.
> 
> gcc/c/ChangeLog
>   * c-parser.cc (struct c_parser): Add omp_for_parse_state field.
>   (struct omp_for_parse_data): New.
>   (check_omp_intervening_code): New.
>   (add_structured_block_stmt): New.
>   (c_parser_compound_statement_nostart): Recognize intervening code,
>   nested loops, and other things that need special handling in
>   OpenMP loop constructs.
>   (c_parser_while_statement): Error on loop in intervening code.
>   (c_parser_do_statement): Likewise.
>   (c_parser_for_statement): Likewise.
>   (c_parser_postfix_expression_after_primary): Error on calls to
>   the OpenMP runtime in intervening code.
>   (c_parser_pragma): Error on OpenMP pragmas in intervening code.
>   (c_parser_omp_loop_nest): New.
>   (c_parser_omp_for_loop): Rewrite to use recursive descent, calling
>   c_parser_omp_loop_nest to do the heavy lifting.
> 
> gcc/ChangeLog
>   * omp-api.h: New.
>   * omp-general.cc (omp_runtime_api_procname): New.
>   (omp_runtime_api_call): Moved here from omp-low.cc, and make
>   non-static.
>   * omp-general.h: Include omp-api.h.
>   * omp-low.cc (omp_runtime_api_call): Delete this copy.
> 
> gcc/testsuite/ChangeLog
>   * c-c++-common/goacc/collapse-1.c: Update for new C error behavior.
>   * c-c++-common/goacc/tile-2.c: Likewise.
>   * gcc.dg/gomp/collapse-1.c: Likewise.

> diff --git a/gcc/testsuite/c-c++-common/goacc/collapse-1.c 
> b/gcc/testsuite/c-c++-common/goacc/collapse-1.c
> index 11b14383983..0feac8f8ddb 100644
> --- a/gcc/testsuite/c-c++-common/goacc/collapse-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc/collapse-1.c
> @@ -8,8 +8,8 @@ f1 (void)
>  {
>#pragma acc parallel
>#pragma acc loop collapse (2)
> -  for (i = 0; i < 5; i++)
> -;/* { dg-error "not enough 
> perfectly nested" } */
> +  for (i = 0; i < 5; i++)/* { dg-error "not enough nested loops" } */
> +;
>{
>  for (j = 0; j < 5; j++)
>;

All these c-c++-common testsuite changes will now FAIL after the C patch but
before the C++.  It is nice to have the new c-c++-common tests in a separate
patch, but these tweaks which can't be just avoided need the temporary
{ target c } vs. { target c++ } hacks undone later in the C++ patch.

> --- a/gcc/testsuite/c-c++-common/goacc/tile-2.c
> +++ b/gcc/testsuite/c-c++-common/goacc/tile-2.c
> @@ -3,8 +3,8 @@ int main ()
>  #pragma acc parallel
>{
>  #pragma acc loop tile (*,*)
> -for (int ix = 0; ix < 30; ix++)
> -  ; /* { dg-error "not enough" } */
> +for (int ix = 0; ix < 30; ix++) /* { dg-error "not enough" "" { target c 
> } } */
> +  ; /* { dg-error "not enough" "" { target c++ } } */

E.g. like you do here.

Otherwise LGTM.

Jakub



Re: [PATCH v1] libffi: Backport of LoongArch support for libffi.

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, 22 Aug 2023, Lulu Cheng wrote:

> This is a backport of ,
> and contains modifications to commit 5a4774cd4d, as well as the LoongArch
> schema portion of commit ee22ecbd11. This is needed for libgo.

OK.

> 
> libffi/ChangeLog:
> 
>   * configure.host: Add LoongArch support.
>   * Makefile.am: Likewise.
>   * Makefile.in: Regenerate.
>   * src/loongarch64/ffi.c: New file.
>   * src/loongarch64/ffitarget.h: New file.
>   * src/loongarch64/sysv.S: New file.
> ---
>  libffi/Makefile.am |   4 +-
>  libffi/Makefile.in |  25 +-
>  libffi/configure.host  |   5 +
>  libffi/src/loongarch64/ffi.c   | 621 +
>  libffi/src/loongarch64/ffitarget.h |  82 
>  libffi/src/loongarch64/sysv.S  | 327 +++
>  6 files changed, 1058 insertions(+), 6 deletions(-)
>  create mode 100644 libffi/src/loongarch64/ffi.c
>  create mode 100644 libffi/src/loongarch64/ffitarget.h
>  create mode 100644 libffi/src/loongarch64/sysv.S
> 
> diff --git a/libffi/Makefile.am b/libffi/Makefile.am
> index c6d6f849c53..2259ddb75f9 100644
> --- a/libffi/Makefile.am
> +++ b/libffi/Makefile.am
> @@ -139,7 +139,7 @@ noinst_HEADERS = src/aarch64/ffitarget.h 
> src/aarch64/internal.h   \
>   src/sparc/internal.h src/tile/ffitarget.h src/vax/ffitarget.h   \
>   src/x86/ffitarget.h src/x86/internal.h src/x86/internal64.h \
>   src/x86/asmnames.h src/xtensa/ffitarget.h src/dlmalloc.c\
> - src/kvx/ffitarget.h
> + src/kvx/ffitarget.h src/loongarch64/ffitarget.h
>  
>  EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c src/aarch64/sysv.S   
> \
>   src/aarch64/win64_armasm.S src/alpha/ffi.c src/alpha/osf.S  \
> @@ -169,7 +169,7 @@ EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c 
> src/aarch64/sysv.S\
>   src/x86/ffiw64.c src/x86/win64.S src/x86/ffi64.c\
>   src/x86/unix64.S src/x86/sysv_intel.S src/x86/win64_intel.S \
>   src/xtensa/ffi.c src/xtensa/sysv.S src/kvx/ffi.c\
> - src/kvx/sysv.S
> + src/kvx/sysv.S src/loongarch64/ffi.c src/loongarch64/sysv.S
>  
>  TARGET_OBJ = @TARGET_OBJ@
>  libffi_la_LIBADD = $(TARGET_OBJ)
> diff --git a/libffi/Makefile.in b/libffi/Makefile.in
> index 5524a6a571e..1d936b5c8a5 100644
> --- a/libffi/Makefile.in
> +++ b/libffi/Makefile.in
> @@ -550,7 +550,7 @@ noinst_HEADERS = src/aarch64/ffitarget.h 
> src/aarch64/internal.h   \
>   src/sparc/internal.h src/tile/ffitarget.h src/vax/ffitarget.h   \
>   src/x86/ffitarget.h src/x86/internal.h src/x86/internal64.h \
>   src/x86/asmnames.h src/xtensa/ffitarget.h src/dlmalloc.c\
> - src/kvx/ffitarget.h
> + src/kvx/ffitarget.h src/loongarch64/ffitarget.h
>  
>  EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c src/aarch64/sysv.S   
> \
>   src/aarch64/win64_armasm.S src/alpha/ffi.c src/alpha/osf.S  \
> @@ -580,7 +580,7 @@ EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c 
> src/aarch64/sysv.S\
>   src/x86/ffiw64.c src/x86/win64.S src/x86/ffi64.c\
>   src/x86/unix64.S src/x86/sysv_intel.S src/x86/win64_intel.S \
>   src/xtensa/ffi.c src/xtensa/sysv.S src/kvx/ffi.c\
> - src/kvx/sysv.S
> + src/kvx/sysv.S src/loongarch64/ffi.c src/loongarch64/sysv.S
>  
>  libffi_la_LIBADD = $(TARGET_OBJ)
>  libffi_convenience_la_SOURCES = $(libffi_la_SOURCES)
> @@ -1074,6 +1074,16 @@ src/kvx/ffi.lo: src/kvx/$(am__dirstamp) \
>   src/kvx/$(DEPDIR)/$(am__dirstamp)
>  src/kvx/sysv.lo: src/kvx/$(am__dirstamp) \
>   src/kvx/$(DEPDIR)/$(am__dirstamp)
> +src/loongarch64/$(am__dirstamp):
> + @$(MKDIR_P) src/loongarch64
> + @: > src/loongarch64/$(am__dirstamp)
> +src/loongarch64/$(DEPDIR)/$(am__dirstamp):
> + @$(MKDIR_P) src/loongarch64/$(DEPDIR)
> + @: > src/loongarch64/$(DEPDIR)/$(am__dirstamp)
> +src/loongarch64/ffi.lo: src/loongarch64/$(am__dirstamp) \
> + src/loongarch64/$(DEPDIR)/$(am__dirstamp)
> +src/loongarch64/sysv.lo: src/loongarch64/$(am__dirstamp) \
> + src/loongarch64/$(DEPDIR)/$(am__dirstamp)
>  
>  libffi.la: $(libffi_la_OBJECTS) $(libffi_la_DEPENDENCIES) 
> $(EXTRA_libffi_la_DEPENDENCIES) 
>   $(AM_V_CCLD)$(libffi_la_LINK) -rpath $(toolexeclibdir) 
> $(libffi_la_OBJECTS) $(libffi_la_LIBADD) $(LIBS)
> @@ -1107,6 +1117,8 @@ mostlyclean-compile:
>   -rm -f src/ia64/*.lo
>   -rm -f src/kvx/*.$(OBJEXT)
>   -rm -f src/kvx/*.lo
> + -rm -f src/loongarch64/*.$(OBJEXT)
> + -rm -f src/loongarch64/*.lo
>   -rm -f src/m32r/*.$(OBJEXT)
>   -rm -f src/m32r/*.lo
>   -rm -f src/m68k/*.$(OBJEXT)
> @@ -1182,6 +1194,8 @@ distclean-compile:
>  @AMDEP_TRUE@@am__include@ @am__quote@src/ia64/$(DEPDIR)/unix.Plo@am__quote@
>  @AMDEP_TRUE@@am__include@ @am__quote@src/kvx/$(DEPDIR)/ffi.Plo@am__quote@
>  @AMDEP_TRUE@@am__include@ 

[PATCH] Simplify interleaved store vectorization processing

2023-08-22 Thread Richard Biener via Gcc-patches
When doing interleaving we perform code generation when visiting the
last store of a chain.  We keep track of this via DR_GROUP_STORE_COUNT;
the following localizes that counting to the caller of vectorizable_store,
also avoiding the redundant non-processing of the other stores.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-stmts.cc (vectorizable_store): Do not bump
DR_GROUP_STORE_COUNT here.  Remove early out.
(vect_transform_stmt): Only call vectorizable_store on
the last element of an interleaving chain.
---
 gcc/tree-vect-stmts.cc | 40 ++--
 1 file changed, 14 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 33f62b77710..43502dc169f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8429,24 +8429,11 @@ vectorizable_store (vec_info *vinfo,
   else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
 return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies);
 
-  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
-DR_GROUP_STORE_COUNT (DR_GROUP_FIRST_ELEMENT (stmt_info))++;
-
   if (grouped_store)
 {
   /* FORNOW */
   gcc_assert (!loop || !nested_in_vect_loop_p (loop, stmt_info));
 
-  /* We vectorize all the stmts of the interleaving group when we
-reach the last stmt in the group.  */
-  if (DR_GROUP_STORE_COUNT (first_stmt_info)
- < DR_GROUP_SIZE (first_stmt_info)
- && !slp)
-   {
- *vec_stmt = NULL;
- return true;
-   }
-
   if (slp)
 {
   grouped_store = false;
@@ -12487,21 +12474,22 @@ vect_transform_stmt (vec_info *vinfo,
   break;
 
 case store_vec_info_type:
-  done = vectorizable_store (vinfo, stmt_info,
-gsi, &vec_stmt, slp_node, NULL);
-  gcc_assert (done);
-  if (STMT_VINFO_GROUPED_ACCESS (stmt_info) && !slp_node)
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ && !slp_node
+ && (++DR_GROUP_STORE_COUNT (DR_GROUP_FIRST_ELEMENT (stmt_info))
+ < DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info))))
+   /* In case of interleaving, the whole chain is vectorized when the
+  last store in the chain is reached.  Store stmts before the last
+  one are skipped, and there vec_stmt_info shouldn't be freed
+  meanwhile.  */
+   ;
+  else
{
- /* In case of interleaving, the whole chain is vectorized when the
-last store in the chain is reached.  Store stmts before the last
-one are skipped, and there vec_stmt_info shouldn't be freed
-meanwhile.  */
- stmt_vec_info group_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
- if (DR_GROUP_STORE_COUNT (group_info) == DR_GROUP_SIZE (group_info))
-   is_store = true;
+ done = vectorizable_store (vinfo, stmt_info,
+gsi, &vec_stmt, slp_node, NULL);
+ gcc_assert (done);
+ is_store = true;
}
-  else
-   is_store = true;
   break;
 
 case condition_vec_info_type:
-- 
2.35.3


Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 09:02:29PM +0800, Hongtao Liu wrote:
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive 
> > enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> But there's implicit negative (disallow 512-bit vector), I think

That is wrong.

> -mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
> 512-bit vector.

Because then the -mavx10.1-256 option behaves completely differently from
all the other isa options.

We have the -march= options which are processed separately, but the normal
ISA options either only enable something (when -mwhatever), or only disable 
something
(when -mno-whatever). -mavx512f -mavx10.1-256 should be a union of those
ISAs, like say -mavx2 -mbmi is, not an intersection or something even
harder to understand.

> Further, we should disallow a mix of evex512 and non-evex512 (e.g.
> -mavx10.1-512 -mavx10.2-256), they should be a unified separate switch
> that either disallows both or allows both. Instead of some isa
> allowing it and some isa disallowing it.

No, it will be really terrible user experience if the new options behave
completely differently from everything else.  Because then we'll need to
document it in detail how it behaves and users will have hard time to figure
it out, and specify what it does not just on the command line, but also when
mixing with target attribute or pragmas.  -mavx10.1-512 -mavx10.2-256 should
be a union of those two ISAs.  Either internally there is an ISA flag whether
the instructions in the avx10.2 ISA but not avx10.1 ISA can operate on
512-bit vectors or not, in that case -mavx10.1-512 -mavx10.2-256 should
enable the AVX10.1 set including 512-bit vectors + just the < 512-bit
instructions from the 10.1 to 10.2 delta, or if there is no such separation
internally, it will just enable full AVX10.2-512.  User has asked for it.

Jakub



Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 4:34 PM Jakub Jelinek  wrote:
>
> On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > I think internally we should have conditional 512bit support work across
> > AVX512 and AVX10.
> >
> > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > enable the respective AVX512 features.  AVX10.2 would then internally
> > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > redundancy and possibly make providing inter-operation between
> > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > just as "re-branding" latest AVX512, so we should treat it that way
> > (making it an alias to the AVX512 features).
> >
> > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > is an entirely separate
> > question.  But I think to not wreck the core idea (more interoperability,
> > here between small/big cores) we absolutely have to
> > provide a subset of avx10.1 but with disabled 512bit vectors which
> > effectively means AVX512 with disabled 512bit support.
>
> Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> name to represent whether the effective ISA set allows 512-bit vectors or
> not.  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> option IMHO should be in the same spirit to all the others a positive 
> enablement,
> not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> former would allow 512-bit vectors, the latter shouldn't disable those again
> because it isn't a -mno-* option.  Sure, instructions which are specific to
But there's implicit negative (disallow 512-bit vector), I think
-mavx512f -mavx10.1-256 or -mavx10.1-256 -mavx512f shouldn't enable
512-bit vector.
Further, we should disallow a mix of evex512 and non-evex512 (e.g.
-mavx10.1-512 -mavx10.2-256), they should be a unified separate switch
that either disallows both or allows both. Instead of some isa
allowing it and some isa disallowing it.
> AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> enabled only in 128/256 bit variants if we differentiate that level.
> But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
>
> Jakub
>


-- 
BR,
Hongtao


Re: [PATCH V2 1/5] OpenMP: Add OMP_STRUCTURED_BLOCK and GIMPLE_OMP_STRUCTURED_BLOCK.

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Sun, Jul 23, 2023 at 04:15:17PM -0600, Sandra Loosemore wrote:
> In order to detect invalid jumps in and out of intervening code in
> imperfectly-nested loops, the front ends need to insert some sort of
> marker to identify the structured block sequences that they push into
> the inner body of the loop.  The error checking happens in the
> diagnose_omp_blocks pass, between gimplification and OMP lowering, so
> we need both GENERIC and GIMPLE representations of these markers.
> They are removed in OMP lowering so no subsequent passes need to know
> about them.
> 
> This patch doesn't include any front-end changes to generate the new
> data structures.
> 
> gcc/cp/ChangeLog
>   * constexpr.cc (cxx_eval_constant_expression): Handle
>   OMP_STRUCTURED_BLOCK.
>   * pt.cc (tsubst_expr): Likewise.
> 
> gcc/ChangeLog
>   * doc/generic.texi (OpenMP): Document OMP_STRUCTURED_BLOCK.
>   * doc/gimple.texi (GIMPLE instruction set): Add
>   GIMPLE_OMP_STRUCTURED_BLOCK.
>   (GIMPLE_OMP_STRUCTURED_BLOCK): New subsection.
>   * gimple-low.cc (lower_stmt): Error on GIMPLE_OMP_STRUCTURED_BLOCK.
>   * gimple-pretty-print.cc (dump_gimple_omp_block): Handle
>   GIMPLE_OMP_STRUCTURED_BLOCK.
>   (pp_gimple_stmt_1): Likewise.
>   * gimple-walk.cc (walk_gimple_stmt): Likewise.
>   * gimple.cc (gimple_build_omp_structured_block): New.
>   * gimple.def (GIMPLE_OMP_STRUCTURED_BLOCK): New.
>   * gimple.h (gimple_build_omp_structured_block): Declare.
>   (gimple_has_substatements): Handle GIMPLE_OMP_STRUCTURED_BLOCK.
>   (CASE_GIMPLE_OMP): Likewise.
>   * gimplify.cc (is_gimple_stmt): Handle OMP_STRUCTURED_BLOCK.
>   (gimplify_expr): Likewise.
>   * omp-expand.cc (GIMPLE_OMP_STRUCTURED_BLOCK): Error on
>   GIMPLE_OMP_STRUCTURED_BLOCK.
>   * omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_STRUCTURED_BLOCK.
>   (lower_omp_1): Likewise.
>   (diagnose_sb_1): Likewise.
>   (diagnose_sb_2): Likewise.
>   * tree-inline.cc (remap_gimple_stmt): Handle
>   GIMPLE_OMP_STRUCTURED_BLOCK.
>   (estimate_num_insns): Likewise.
>   * tree-nested.cc (convert_nonlocal_reference_stmt): Likewise.
>   (convert_local_reference_stmt): Likewise.
>   (convert_gimple_call): Likewise.
>   * tree-pretty-print.cc (dump_generic_node): Handle
>   OMP_STRUCTURED_BLOCK.
>   * tree.def (OMP_STRUCTURED_BLOCK): New.
>   * tree.h (OMP_STRUCTURED_BLOCK_BODY): New.
> --- a/gcc/gimple-low.cc
> +++ b/gcc/gimple-low.cc
> @@ -717,6 +717,11 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data 
> *data)
>   gsi_next (gsi);
>return;
>  
> +case GIMPLE_OMP_STRUCTURED_BLOCK:
> +  /* These are supposed to be removed already in OMP lowering.  */
> +  gcc_unreachable ();
> +  break;

Please don't add break; after gcc_unreachable ();

> --- a/gcc/omp-expand.cc
> +++ b/gcc/omp-expand.cc
> @@ -10592,6 +10592,11 @@ expand_omp (struct omp_region *region)
>parent GIMPLE_OMP_SECTIONS region.  */
> break;
>  
> + case GIMPLE_OMP_STRUCTURED_BLOCK:
> +   /* We should have gotten rid of these in gimple lowering.  */
> +   gcc_unreachable ();
> +   break;

And here neither.

Otherwise LGTM.

Jakub



Re: [PATCH V2 0/5] OpenMP: support for imperfectly-nested loops

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Sun, Jul 23, 2023 at 04:15:16PM -0600, Sandra Loosemore wrote:
> Here is the latest version of my imperfectly-nested loops patches.
> Compared to the initial version I'd posted in April
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617103.html
> 
> this version includes many minor cosmetic fixes suggested by Jakub in
> his initial review (also present in the version I committed to the
> OG13 branch last month), many new test cases to cover various corner
> cases, and code fixes so that C and C++ at least behave consistently
> even if the spec is unclear.  The most intrusive of those fixes is
> that I couldn't figure out how to make jumping between different
> structured blocks of intervening code in the same OMP loop construct
> produce errors without introducing new GENERIC and GIMPLE data
> structures to represent a structured block without any other
> associated OpenMP semantics; that's now part 1 of the patch series.
> 
> There are a few things from the review comments I haven't done anything
> about:
> 
> * I left omp-api.h alone because the Fortran front end needs those
>   declarations without everything else in omp-general.h.

Ok.

> * I didn't think I ought to be speculatively implementing extensions
>   like allowing "do { ... } while (0);" in intervening code.  If it's
>   really important for supporting macros, I suppose it will make it
>   into a future version of the OpenMP spec.

Ack.

> * I didn't understand the comment about needing to add "#pragma omp
>   ordered doacross(source) and sink" to the testcase for errors with
>   the "ordered" clause.  Isn't that only for cross-iteration
>   data dependencies?  There aren't any in that loop.  Also note that some
>   of my new corner-case tests use the "ordered" clause to trigger an
>   error to check that things are being correctly parsed as intervening
>   code, so if there is something really bogus there that must be fixed,
>   it now affects other test cases as well.

ordered(N) clause is meant to be used with doacross loops, where one uses
#pragma omp ordered depend/doacross in the body.
So, when one is testing the rejection of imperfectly nested loops with it,
it is better to actually test it on something properly formed except for the
extra code making the loop imperfectly nested, rather than test it on
something which doesn't have the ordered directives in the body at all.

> * Likewise I didn't know what to do with coming up with a better
>   testcase for "scan".  I could not find an existing testcase with nested
>   loops that I could just add intervening code to, and when I made

What about libgomp.c-c++-common/scan-1.c ?
Obviously, you can cut the initialization and checking, because that is a
runtime testcase and all you need is a compile time test; perhaps put each
of the 2 loop nests into a separate function and just add some code in
between the loops + dg-error.

Jakub



[PATCH v1] libffi: Backport of LoongArch support for libffi.

2023-08-22 Thread Lulu Cheng
This is a backport of ,
and contains modifications to commit 5a4774cd4d, as well as the LoongArch
schema portion of commit ee22ecbd11. This is needed for libgo.


libffi/ChangeLog:

* configure.host: Add LoongArch support.
* Makefile.am: Likewise.
* Makefile.in: Regenerate.
* src/loongarch64/ffi.c: New file.
* src/loongarch64/ffitarget.h: New file.
* src/loongarch64/sysv.S: New file.
---
 libffi/Makefile.am |   4 +-
 libffi/Makefile.in |  25 +-
 libffi/configure.host  |   5 +
 libffi/src/loongarch64/ffi.c   | 621 +
 libffi/src/loongarch64/ffitarget.h |  82 
 libffi/src/loongarch64/sysv.S  | 327 +++
 6 files changed, 1058 insertions(+), 6 deletions(-)
 create mode 100644 libffi/src/loongarch64/ffi.c
 create mode 100644 libffi/src/loongarch64/ffitarget.h
 create mode 100644 libffi/src/loongarch64/sysv.S

diff --git a/libffi/Makefile.am b/libffi/Makefile.am
index c6d6f849c53..2259ddb75f9 100644
--- a/libffi/Makefile.am
+++ b/libffi/Makefile.am
@@ -139,7 +139,7 @@ noinst_HEADERS = src/aarch64/ffitarget.h 
src/aarch64/internal.h \
src/sparc/internal.h src/tile/ffitarget.h src/vax/ffitarget.h   \
src/x86/ffitarget.h src/x86/internal.h src/x86/internal64.h \
src/x86/asmnames.h src/xtensa/ffitarget.h src/dlmalloc.c\
-   src/kvx/ffitarget.h
+   src/kvx/ffitarget.h src/loongarch64/ffitarget.h
 
 EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c src/aarch64/sysv.S \
src/aarch64/win64_armasm.S src/alpha/ffi.c src/alpha/osf.S  \
@@ -169,7 +169,7 @@ EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c 
src/aarch64/sysv.S  \
src/x86/ffiw64.c src/x86/win64.S src/x86/ffi64.c\
src/x86/unix64.S src/x86/sysv_intel.S src/x86/win64_intel.S \
src/xtensa/ffi.c src/xtensa/sysv.S src/kvx/ffi.c\
-   src/kvx/sysv.S
+   src/kvx/sysv.S src/loongarch64/ffi.c src/loongarch64/sysv.S
 
 TARGET_OBJ = @TARGET_OBJ@
 libffi_la_LIBADD = $(TARGET_OBJ)
diff --git a/libffi/Makefile.in b/libffi/Makefile.in
index 5524a6a571e..1d936b5c8a5 100644
--- a/libffi/Makefile.in
+++ b/libffi/Makefile.in
@@ -550,7 +550,7 @@ noinst_HEADERS = src/aarch64/ffitarget.h 
src/aarch64/internal.h \
src/sparc/internal.h src/tile/ffitarget.h src/vax/ffitarget.h   \
src/x86/ffitarget.h src/x86/internal.h src/x86/internal64.h \
src/x86/asmnames.h src/xtensa/ffitarget.h src/dlmalloc.c\
-   src/kvx/ffitarget.h
+   src/kvx/ffitarget.h src/loongarch64/ffitarget.h
 
 EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c src/aarch64/sysv.S \
src/aarch64/win64_armasm.S src/alpha/ffi.c src/alpha/osf.S  \
@@ -580,7 +580,7 @@ EXTRA_libffi_la_SOURCES = src/aarch64/ffi.c 
src/aarch64/sysv.S  \
src/x86/ffiw64.c src/x86/win64.S src/x86/ffi64.c\
src/x86/unix64.S src/x86/sysv_intel.S src/x86/win64_intel.S \
src/xtensa/ffi.c src/xtensa/sysv.S src/kvx/ffi.c\
-   src/kvx/sysv.S
+   src/kvx/sysv.S src/loongarch64/ffi.c src/loongarch64/sysv.S
 
 libffi_la_LIBADD = $(TARGET_OBJ)
 libffi_convenience_la_SOURCES = $(libffi_la_SOURCES)
@@ -1074,6 +1074,16 @@ src/kvx/ffi.lo: src/kvx/$(am__dirstamp) \
src/kvx/$(DEPDIR)/$(am__dirstamp)
 src/kvx/sysv.lo: src/kvx/$(am__dirstamp) \
src/kvx/$(DEPDIR)/$(am__dirstamp)
+src/loongarch64/$(am__dirstamp):
+   @$(MKDIR_P) src/loongarch64
+   @: > src/loongarch64/$(am__dirstamp)
+src/loongarch64/$(DEPDIR)/$(am__dirstamp):
+   @$(MKDIR_P) src/loongarch64/$(DEPDIR)
+   @: > src/loongarch64/$(DEPDIR)/$(am__dirstamp)
+src/loongarch64/ffi.lo: src/loongarch64/$(am__dirstamp) \
+   src/loongarch64/$(DEPDIR)/$(am__dirstamp)
+src/loongarch64/sysv.lo: src/loongarch64/$(am__dirstamp) \
+   src/loongarch64/$(DEPDIR)/$(am__dirstamp)
 
 libffi.la: $(libffi_la_OBJECTS) $(libffi_la_DEPENDENCIES) 
$(EXTRA_libffi_la_DEPENDENCIES) 
$(AM_V_CCLD)$(libffi_la_LINK) -rpath $(toolexeclibdir) 
$(libffi_la_OBJECTS) $(libffi_la_LIBADD) $(LIBS)
@@ -1107,6 +1117,8 @@ mostlyclean-compile:
-rm -f src/ia64/*.lo
-rm -f src/kvx/*.$(OBJEXT)
-rm -f src/kvx/*.lo
+   -rm -f src/loongarch64/*.$(OBJEXT)
+   -rm -f src/loongarch64/*.lo
-rm -f src/m32r/*.$(OBJEXT)
-rm -f src/m32r/*.lo
-rm -f src/m68k/*.$(OBJEXT)
@@ -1182,6 +1194,8 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@src/ia64/$(DEPDIR)/unix.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@src/kvx/$(DEPDIR)/ffi.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@src/kvx/$(DEPDIR)/sysv.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ 
@am__quote@src/loongarch64/$(DEPDIR)/ffi.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ 

Re: [PATCH 1/3] vect: Remove some manual release in vectorizable_store

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 10:45 AM Kewen.Lin  wrote:
>
> Hi,
>
> To avoid some duplication in some follow-up patches on
> function vectorizable_store, this patch adjusts some
> existing vecs to use auto_vec and removes some manual
> release invocations.  It also refactors a bit and removes
> some useless code.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_store): Remove vec oprnds,
> adjust vec result_chain, vec_oprnd with auto_vec, and adjust
> gvec_oprnds with auto_delete_vec.
> ---
>  gcc/tree-vect-stmts.cc | 64 +++---
>  1 file changed, 23 insertions(+), 41 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 1580a396301..fcaa4127e52 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -8200,9 +8200,6 @@ vectorizable_store (vec_info *vinfo,
>stmt_vec_info first_stmt_info;
>bool grouped_store;
>unsigned int group_size, i;
> -  vec<tree> oprnds = vNULL;
> -  vec<tree> result_chain = vNULL;
> -  vec<tree> vec_oprnds = vNULL;
>bool slp = (slp_node != NULL);
>unsigned int vec_num;
>bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
> @@ -8601,6 +8598,7 @@ vectorizable_store (vec_info *vinfo,
>
>alias_off = build_int_cst (ref_type, 0);
>stmt_vec_info next_stmt_info = first_stmt_info;
> +  auto_vec<tree> vec_oprnds (ncopies);
>for (g = 0; g < group_size; g++)
> {
>   running_off = offvar;
> @@ -8682,7 +8680,7 @@ vectorizable_store (vec_info *vinfo,
> }
> }
>   next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
> - vec_oprnds.release ();
> + vec_oprnds.truncate (0);
>   if (slp)
> break;
> }
> @@ -8690,9 +8688,6 @@ vectorizable_store (vec_info *vinfo,
>return true;
>  }
>
> -  auto_vec<tree> dr_chain (group_size);
> -  oprnds.create (group_size);
> -
>gcc_assert (alignment_support_scheme);
>vec_loop_masks *loop_masks
>  = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> @@ -8783,11 +8778,15 @@ vectorizable_store (vec_info *vinfo,
>   STMT_VINFO_RELATED_STMT for the next copies.
>*/
>
> +  auto_vec<tree> dr_chain (group_size);
> +  auto_vec<tree> result_chain (group_size);
>    auto_vec<tree> vec_masks;
>    tree vec_mask = NULL;
>    auto_vec<tree> vec_offsets;
> -  auto_vec<vec<tree> > gvec_oprnds;
> -  gvec_oprnds.safe_grow_cleared (group_size, true);
> +  auto_delete_vec<auto_vec<tree>> gvec_oprnds (group_size);
> +  for (i = 0; i < group_size; i++)
> +    gvec_oprnds.quick_push (new auto_vec<tree> (ncopies));
> +  auto_vec<tree> vec_oprnds;
>for (j = 0; j < ncopies; j++)
>  {
>gimple *new_stmt;
> @@ -8803,11 +8802,11 @@ vectorizable_store (vec_info *vinfo,
>else
>  {
>   /* For interleaved stores we collect vectorized defs for all the
> -stores in the group in DR_CHAIN and OPRNDS. DR_CHAIN is then
> -used as an input to vect_permute_store_chain().
> +stores in the group in DR_CHAIN. DR_CHAIN is then used as an
> +input to vect_permute_store_chain().
>
>  If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN
> -and OPRNDS are of size 1.  */
> +is of size 1.  */
>   stmt_vec_info next_stmt_info = first_stmt_info;
>   for (i = 0; i < group_size; i++)
> {
> @@ -8817,11 +8816,10 @@ vectorizable_store (vec_info *vinfo,
>  that there is no interleaving, DR_GROUP_SIZE is 1,
>  and only one iteration of the loop will be executed.  */
>   op = vect_get_store_rhs (next_stmt_info);
> - vect_get_vec_defs_for_operand (vinfo, next_stmt_info,
> -ncopies, op, &gvec_oprnds[i]);
> - vec_oprnd = gvec_oprnds[i][0];
> - dr_chain.quick_push (gvec_oprnds[i][0]);
> - oprnds.quick_push (gvec_oprnds[i][0]);
> + vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
> +op, gvec_oprnds[i]);
> + vec_oprnd = (*gvec_oprnds[i])[0];
> + dr_chain.quick_push (vec_oprnd);
>   next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
> }
>   if (mask)
> @@ -8863,16 +8861,13 @@ vectorizable_store (vec_info *vinfo,
>else
> {
>   gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> - /* For interleaved stores we created vectorized defs for all the
> -defs stored in OPRNDS in the previous iteration (previous copy).
> -DR_CHAIN is then used as an input to vect_permute_store_chain().
> - 

Re: [PATCH] vect: Replace DR_GROUP_STORE_COUNT with DR_GROUP_LAST_ELEMENT

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 10:44 AM Kewen.Lin  wrote:
>
> Hi,
>
> Now we use DR_GROUP_STORE_COUNT to record how many stores
> in a group have been transformed and only do the actual
> transform when encountering the last one.  I'm making
> patches to move costing next to the transform code, and it's
> awkward to use this DR_GROUP_STORE_COUNT for both costing
> and transforming.  This patch introduces last_element to
> record the last element to be transformed in the group
> rather than summing up the number of stores we have seen;
> then we only need to check whether the given stmt is the
> last one.  That makes it work simply for both costing and
> transforming.
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?

This is all (existing) gross, so ... can't we do sth like the following
instead?  Going to test this further besides the quick single
testcase I verified.

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 33f62b77710..67de19d9ce5 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8437,16 +8437,6 @@ vectorizable_store (vec_info *vinfo,
   /* FORNOW */
   gcc_assert (!loop || !nested_in_vect_loop_p (loop, stmt_info));

-  /* We vectorize all the stmts of the interleaving group when we
-reach the last stmt in the group.  */
-  if (DR_GROUP_STORE_COUNT (first_stmt_info)
- < DR_GROUP_SIZE (first_stmt_info)
- && !slp)
-   {
- *vec_stmt = NULL;
- return true;
-   }
-
   if (slp)
 {
   grouped_store = false;
@@ -12487,21 +12477,21 @@ vect_transform_stmt (vec_info *vinfo,
   break;

 case store_vec_info_type:
-  done = vectorizable_store (vinfo, stmt_info,
-gsi, &vec_stmt, slp_node, NULL);
-  gcc_assert (done);
-  if (STMT_VINFO_GROUPED_ACCESS (stmt_info) && !slp_node)
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ && !slp_node
+ && DR_GROUP_NEXT_ELEMENT (stmt_info))
+   /* In case of interleaving, the whole chain is vectorized when the
+  last store in the chain is reached.  Store stmts before the last
+  one are skipped, and there vec_stmt_info shouldn't be freed
+  meanwhile.  */
+   ;
+  else
{
- /* In case of interleaving, the whole chain is vectorized when the
-last store in the chain is reached.  Store stmts before the last
-one are skipped, and there vec_stmt_info shouldn't be freed
-meanwhile.  */
- stmt_vec_info group_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
- if (DR_GROUP_STORE_COUNT (group_info) == DR_GROUP_SIZE (group_info))
-   is_store = true;
+ done = vectorizable_store (vinfo, stmt_info,
+gsi, &vec_stmt, slp_node, NULL);
+ gcc_assert (done);
+ is_store = true;
}
-  else
-   is_store = true;
   break;

 case condition_vec_info_type:


> BR,
> Kewen
> -
>
> gcc/ChangeLog:
>
> * tree-vect-data-refs.cc (vect_set_group_last_element): New function.
> (vect_analyze_group_access): Call new function
> vect_set_group_last_element.
> * tree-vect-stmts.cc (vectorizable_store): Replace 
> DR_GROUP_STORE_COUNT
> uses with DR_GROUP_LAST_ELEMENT.
> (vect_transform_stmt): Likewise.
> * tree-vect-slp.cc (vect_split_slp_store_group): Likewise.
> (vect_build_slp_instance): Likewise.
> * tree-vectorizer.h (DR_GROUP_LAST_ELEMENT): New macro.
> (DR_GROUP_STORE_COUNT): Remove.
> (class _stmt_vec_info::store_count): Remove.
> (class _stmt_vec_info::last_element): New class member.
> (vect_set_group_last_element): New function declaration.
> ---
>  gcc/tree-vect-data-refs.cc | 30 ++
>  gcc/tree-vect-slp.cc   | 13 +
>  gcc/tree-vect-stmts.cc |  9 +++--
>  gcc/tree-vectorizer.h  | 12 +++-
>  4 files changed, 49 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 3e9a284666c..c4a495431d5 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -2832,6 +2832,33 @@ vect_analyze_group_access_1 (vec_info *vinfo, 
> dr_vec_info *dr_info)
>return true;
>  }
>
> +/* Given vectorization information VINFO, set the last element in the
> +   group led by FIRST_STMT_INFO.  For now, it's only used for loop
> +   vectorization and stores, since for loop-vect the grouped stores
> +   are only transformed when encountering the last one.  */
> +
> +void
> +vect_set_group_last_element (vec_info *vinfo, stmt_vec_info first_stmt_info)
> +{
> +  if (first_stmt_info
> +  && is_a <loop_vec_info> (vinfo)
> +  && DR_IS_WRITE (STMT_VINFO_DATA_REF (first_stmt_info)))
> +{
> +  stmt_vec_info stmt_info = DR_GROUP_NEXT_ELEMENT (first_stmt_info);
> +  

Re: Loop-ch improvements, part 3

2023-08-22 Thread Jan Hubicka via Gcc-patches
> 
> We seem to peel one iteration for no good reason.  The loop is
> a do-while loop already.  The key is we see the first iteration
> exit condition is known not taken and then:
> 
>  Registering value_relation (path_oracle) (iter.24_6 > iter.24_5) (root: 
> bb2)
> Stmt is static (constant in the first iteration)
>   Analyzing: if (iter.24_6 != 16)
>  Registering killing_def (path_oracle) iter.24_6
>  Registering value_relation (path_oracle) (iter.24_6 > iter.24_5) (root: 
> bb2)
> Will eliminate peeled conditional in bb 3.
> Duplicating bb 3 is a win; it has zero cost
>   Not duplicating bb 5: it is single succ.
> Copying headers of loop 1
> Will duplicate bb 3
> Duplicating header of the loop 1 up to edge 3->4
> Loop 1 is do-while loop
> Loop 1 is now do-while loop.
> Exit count: 0 (estimated locally)
> Entry count: 10631108 (estimated locally)
> Peeled all exits: decreased number of iterations of loop 1 by 1.
> 
> and that's because of
> 
>   /* If the static exit fully optimize out, it is win to "duplicate"
>  it.
> 
>  TODO: Even if duplication costs some size we may opt to do so in case
>  exit probability is significant enough (do partial peeling).  */
>   if (static_exit)
> return code_size_cost ? ch_possible_zero_cost : ch_win;
> 
> IMHO we're over aggressively apply early peeling here.  That holds
> generally, not only for OMP simd loops (which we could identify).
> 
> Why are we doing this game for single-block do-while loops?

It seems I just wrongly updated the old conditional. Sorry for that.
It should be:
diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
index 6cdb87a762f..8142add4bec 100644
--- a/gcc/tree-ssa-loop-ch.cc
+++ b/gcc/tree-ssa-loop-ch.cc
@@ -464,7 +464,7 @@ should_duplicate_loop_header_p (basic_block header, class 
loop *loop,
  TODO: Even if duplication costs some size we may opt to do so in case
  exit probability is significant enough (do partial peeling).  */
   if (static_exit)
-return code_size_cost ? ch_possible_zero_cost : ch_win;
+return !code_size_cost ? ch_possible_zero_cost : ch_possible;
 
   /* We was not able to prove that conditional will be eliminated.  */
   int insns = estimate_num_insns (last, &eni_size_weights);

So the heuristic knows that if no code is produced, "peeling" is a
good idea since it eliminates one conditional for free.  Otherwise it
should know that peeling is possible but should only be done if it
produces a do-while loop.

As the TODO says, it would also make sense to duplicate if the exit
likely avoids entering the loop (which would be cheaper than peeling
the full first iteration), but that can be done incrementally.

I am testing the fix.

Honza


[PATCH] rtl: Forward declare rtx_code

2023-08-22 Thread Richard Earnshaw via Gcc-patches

Now that we require C++11, we can safely forward declare rtx_code
so that we can use it in target hooks.

gcc/ChangeLog
* coretypes.h (rtx_code): Add forward declaration.
* rtl.h (rtx_code): Make compatible with forward declaration.
---
 gcc/coretypes.h | 4 
 gcc/rtl.h   | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..51e9ce0 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -100,6 +100,10 @@ struct gimple;
 typedef gimple *gimple_seq;
 struct gimple_stmt_iterator;
 
+/* Forward declare rtx_code, so that we can use it in target hooks without
+   needing to pull in rtl.h.  */
+enum rtx_code : unsigned;
+
 /* Forward decls for leaf gimple subclasses (for individual gimple codes).
Keep this in the same order as the corresponding codes in gimple.def.  */
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e1c51156f90..0e9491b89b4 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -45,7 +45,7 @@ class predefined_function_abi;
 /* Register Transfer Language EXPRESSIONS CODES */
 
 #define RTX_CODE	enum rtx_code
-enum rtx_code  {
+enum rtx_code : unsigned {
 
 #define DEF_RTL_EXPR(ENUM, NAME, FORMAT, CLASS)   ENUM ,
 #include "rtl.def"		/* rtl expressions are documented here */


[PATCH 14/12] libgcc _BitInt helper documentation [PR102989]

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 05:32:04PM +, Joseph Myers wrote:
> I think the libgcc functions (i.e. those exported by libgcc, to which 
> references are generated by the compiler) need documenting in libgcc.texi.  
> Internal functions or macros in the libgcc patch need appropriate comments 
> specifying their semantics; especially FP_TO_BITINT and FP_FROM_BITINT 
> which have a lot of arguments and no comments saying what the semantics of 
the macros and their arguments are supposed to be.

Here is an incremental patch which does that.

2023-08-22  Jakub Jelinek  

PR c/102989
gcc/
* doc/libgcc.texi (Bit-precise integer arithmetic functions):
Document general rules for _BitInt support library functions
and document __mulbitint3 and __divmodbitint4.
(Conversion functions): Document __fix{s,d,x,t}fbitint,
__floatbitint{s,d,x,t,h,b}f, __bid_fix{s,d,t}dbitint and
__bid_floatbitint{s,d,t}d.
libgcc/
* libgcc2.c (bitint_negate): Add function comment.
* soft-fp/bitint.h (bitint_negate): Add function comment.
(FP_TO_BITINT, FP_FROM_BITINT): Add comment explaining the macros.

--- gcc/doc/libgcc.texi.jj  2023-01-16 11:52:16.115733593 +0100
+++ gcc/doc/libgcc.texi 2023-08-22 12:35:08.561348126 +0200
@@ -218,6 +218,51 @@ These functions return the number of bit
 These functions return the @var{a} byteswapped.
 @end deftypefn
 
+@subsection Bit-precise integer arithmetic functions
+
+@code{_BitInt(@var{N})} library functions operate on arrays of limbs, where
+each limb has @code{__LIBGCC_BITINT_LIMB_WIDTH__} bits and the limbs are
+ordered according to @code{__LIBGCC_BITINT_ORDER__} ordering.  If @var{N}
+is not divisible by @code{__LIBGCC_BITINT_LIMB_WIDTH__}, the most
+significant limb contains padding bits, which should be ignored on read
+(sign or zero extended), but extended on write.  For the
+library functions, all bit-precise integers regardless of @var{N} are
+represented like that, even when the target ABI says that for some small
+@var{N} they should be represented differently in memory.  A pointer
+to the array of limbs argument is always accompanied with a bit size
+argument.  If that argument is positive, it is the number of bits and the
+number is assumed to be zero-extended to infinite precision; if that
+argument is negative, it is the negated number of bits, above which all bits
+are assumed to be sign-extended to infinite precision.  These bit-size
+arguments don't need to match the actual @var{N} of the operation used in
+the source; they could be lowered because of sign or zero extensions on the
+input, or because value-range optimization figures the value will need a
+certain lower number of bits.  For big-endian ordering of limbs, when lowering
+the bit size argument the pointer argument needs to be adjusted as well.
+Negative bit size argument should be always smaller or equal to @code{-2},
+because @code{signed _BitInt(1)} is not valid.
+For output arguments, either the corresponding bit size argument should
+be always positive (for multiplication and division), or is negative when
+the output of conversion from floating-point value is signed and positive
+when unsigned.  The arrays of limbs output arguments point to should not
+overlap any inputs, while input arrays of limbs can overlap.
+@code{UBILtype} below stands for unsigned integer type with
+@code{__LIBGCC_BITINT_LIMB_WIDTH__} bit precision.
+
+@deftypefn {Runtime Function} void __mulbitint3 (@code{UBILtype} *@var{ret}, 
int32_t @var{retprec}, const @code{UBILtype} *u, int32_t @var{uprec}, const 
@code{UBILtype} *v, int32_t @var{vprec})
+This function multiplies bit-precise integer operands @var{u} and @var{v} and 
stores
+result into @var{retprec} precision bit-precise integer result @var{ret}.
+@end deftypefn
+
+@deftypefn {Runtime Function} void __divmodbitint4 (@code{UBILtype} *@var{q}, 
int32_t @var{qprec}, @code{UBILtype} *@var{r}, int32_t @var{rprec},  const 
@code{UBILtype} *u, int32_t @var{uprec}, const @code{UBILtype} *v, int32_t 
@var{vprec})
+This function divides bit-precise integer operands @var{u} and @var{v} and 
stores
+quotient into @var{qprec} precision bit-precise integer result @var{q}
+(unless @var{q} is @code{NULL} and @var{qprec} is 0, in that case quotient
+is not stored anywhere) and remainder into @var{rprec} precision bit-precise
+integer result @var{r} (similarly, unless @var{r} is @code{NULL} and 
@var{rprec}
+is 0).
+@end deftypefn
+
 @node Soft float library routines
 @section Routines for floating point emulation
 @cindex soft float library
@@ -384,6 +429,27 @@ These functions convert @var{i}, an unsi
 These functions convert @var{i}, an unsigned long long, to floating point.
 @end deftypefn
 
+@deftypefn {Runtime Function} void __fixsfbitint (@code{UBILtype} *@var{r}, 
int32_t @var{rprec}, float @var{a})
+@deftypefnx {Runtime Function} void __fixdfbitint (@code{UBILtype} *@var{r}, 
int32_t @var{rprec}, double @var{a})
+@deftypefnx 

Re: [PATCH 7/12] ubsan: _BitInt -fsanitize=undefined support [PR102989]

2023-08-22 Thread Richard Biener via Gcc-patches
On Wed, 9 Aug 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch introduces some -fsanitize=undefined support for _BitInt,
> but some of the diagnostics is limited by lack of proper support in the
> library.
> I've filed https://github.com/llvm/llvm-project/issues/64100 to request
> proper support, for now some of the diagnostics might have less or more
> confusing or inaccurate wording but UB should still be diagnosed when it
> happens.

OK, you're the expert here.

Richard.

> 2023-08-09  Jakub Jelinek  
> 
>   PR c/102989
> gcc/
>   * internal-fn.cc (expand_ubsan_result_store): Add LHS, MODE and
>   DO_ERROR arguments.  For non-mode precision BITINT_TYPE results
>   check if all padding bits up to mode precision are zeros or sign
>   bit copies and if not, jump to DO_ERROR.
>   (expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
>   Adjust expand_ubsan_result_store callers.
>   * ubsan.cc: Include target.h and langhooks.h.
>   (ubsan_encode_value): Pass BITINT_TYPE values which fit into pointer
>   size converted to pointer sized integer, pass BITINT_TYPE values
>   which fit into TImode (if supported) or DImode as those integer types
>   or otherwise for now punt (pass 0).
>   (ubsan_type_descriptor): Handle BITINT_TYPE.  For pstyle of
>   UBSAN_PRINT_FORCE_INT use TK_Integer (0x) mode with a
>   TImode/DImode precision rather than TK_Unknown used otherwise for
>   large/huge BITINT_TYPEs.
>   (instrument_si_overflow): Instrument BITINT_TYPE operations even when
>   they don't have mode precision.
>   * ubsan.h (enum ubsan_print_style): New enumerator.
> gcc/c-family/
>   * c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT
>   for type0 type descriptor.
> 
> --- gcc/ubsan.cc.jj   2023-08-08 15:54:35.443599459 +0200
> +++ gcc/ubsan.cc  2023-08-08 16:12:02.329939798 +0200
> @@ -50,6 +50,8 @@ along with GCC; see the file COPYING3.
>  #include "gimple-fold.h"
>  #include "varasm.h"
>  #include "realmpfr.h"
> +#include "target.h"
> +#include "langhooks.h"
>  
>  /* Map from a tree to a VAR_DECL tree.  */
>  
> @@ -125,6 +127,25 @@ tree
>  ubsan_encode_value (tree t, enum ubsan_encode_value_phase phase)
>  {
>tree type = TREE_TYPE (t);
> +  if (TREE_CODE (type) == BITINT_TYPE)
> +{
> +  if (TYPE_PRECISION (type) <= POINTER_SIZE)
> + {
> +   type = pointer_sized_int_node;
> +   t = fold_build1 (NOP_EXPR, type, t);
> + }
> +  else
> + {
> +   scalar_int_mode arith_mode
> + = (targetm.scalar_mode_supported_p (TImode) ? TImode : DImode);
> +   if (TYPE_PRECISION (type) > GET_MODE_PRECISION (arith_mode))
> + return build_zero_cst (pointer_sized_int_node);
> +   type
> + = build_nonstandard_integer_type (GET_MODE_PRECISION (arith_mode),
> +   TYPE_UNSIGNED (type));
> +   t = fold_build1 (NOP_EXPR, type, t);
> + }
> +}
>scalar_mode mode = SCALAR_TYPE_MODE (type);
>const unsigned int bitsize = GET_MODE_BITSIZE (mode);
>if (bitsize <= POINTER_SIZE)
> @@ -355,14 +376,32 @@ ubsan_type_descriptor (tree type, enum u
>  {
>/* See through any typedefs.  */
>type = TYPE_MAIN_VARIANT (type);
> +  tree type3 = type;
> +  if (pstyle == UBSAN_PRINT_FORCE_INT)
> +{
> +  /* Temporary hack for -fsanitize=shift with _BitInt(129) and more.
> +  libubsan crashes if it is not TK_Integer type.  */
> +  if (TREE_CODE (type) == BITINT_TYPE)
> + {
> +   scalar_int_mode arith_mode
> + = (targetm.scalar_mode_supported_p (TImode)
> +? TImode : DImode);
> +   if (TYPE_PRECISION (type) > GET_MODE_PRECISION (arith_mode))
> + type3 = build_qualified_type (type, TYPE_QUAL_CONST);
> + }
> +  if (type3 == type)
> + pstyle = UBSAN_PRINT_NORMAL;
> +}
>  
> -  tree decl = decl_for_type_lookup (type);
> +  tree decl = decl_for_type_lookup (type3);
>/* It is possible that some of the earlier created DECLs were found
>   unused, in that case they weren't emitted and varpool_node::get
>   returns NULL node on them.  But now we really need them.  Thus,
>   renew them here.  */
>if (decl != NULL_TREE && varpool_node::get (decl))
> -return build_fold_addr_expr (decl);
> +{
> +  return build_fold_addr_expr (decl);
> +}
>  
>tree dtype = ubsan_get_type_descriptor_type ();
>tree type2 = type;
> @@ -370,6 +409,7 @@ ubsan_type_descriptor (tree type, enum u
>pretty_printer pretty_name;
>unsigned char deref_depth = 0;
>unsigned short tkind, tinfo;
> +  char tname_bitint[sizeof ("unsigned _BitInt(2147483647)")];
>  
>/* Get the name of the type, or the name of the pointer type.  */
>if (pstyle == UBSAN_PRINT_POINTER)
> @@ -403,8 +443,18 @@ ubsan_type_descriptor (tree type, enum u
>  }
>  
>if (tname == NULL)
> -/* We weren't able to determine the type 

Re: [PATCH] VECT: Add LEN_FOLD_EXTRACT_LAST pattern

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, 22 Aug 2023, Juzhe-Zhong wrote:

> Hi, Richard and Richi.
> 
> This is the last autovec pattern I want to add for RVV (length loop control).
> 
> This patch is supposed to handled this following case:
> 
> int __attribute__ ((noinline, noclone))
> condition_reduction (int *a, int min_v, int n)
> {
>   int last = 66; /* High start value.  */
> 
>   for (int i = 0; i < n; i++)
> if (a[i] < min_v)
>   last = i;
> 
>   return last;
> }
> 
> ARM SVE IR:
> 
>   ...
>   mask__7.11_39 = vect__4.10_37 < vect_cst__38;
>   _40 = loop_mask_36 & mask__7.11_39;
>   last_5 = .FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32);
>   ...
> 
> RVV IR, we want to see:
>  ...
>  loop_len = SELECT_VL
>  mask__7.11_39 = vect__4.10_37 < vect_cst__38;
>  last_5 = .LEN_FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32, loop_len, 
> bias);
>  ...

OK.

Richard.

> gcc/ChangeLog:
> 
>   * doc/md.texi: Add LEN_FOLD_EXTRACT_LAST pattern.
>   * internal-fn.cc (fold_len_extract_direct): Ditto.
>   (expand_fold_len_extract_optab_fn): Ditto.
>   (direct_fold_len_extract_optab_supported_p): Ditto.
>   * internal-fn.def (LEN_FOLD_EXTRACT_LAST): Ditto.
> 
> ---
>  gcc/doc/md.texi | 6 ++
>  gcc/internal-fn.cc  | 5 +
>  gcc/internal-fn.def | 3 +++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 89562fdb43c..24453693d89 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5636,6 +5636,12 @@ has mode @var{m} and operands 0 and 1 have the mode 
> appropriate for
>  one element of @var{m}.  Operand 2 has the usual mask mode for vectors
>  of mode @var{m}; see @code{TARGET_VECTORIZE_GET_MASK_MODE}.
>  
> +@cindex @code{len_fold_extract_last_@var{m}} instruction pattern
> +@item @code{len_fold_extract_last_@var{m}}
> +Like @samp{fold_extract_last_@var{m}}, but takes an extra length operand as
> +operand 4 and an extra bias operand as operand 5.  The last associated
> +element to be extracted should have index i < len (operand 4) + bias
> +(operand 5).
> +
>  @cindex @code{fold_left_plus_@var{m}} instruction pattern
>  @item @code{fold_left_plus_@var{m}}
>  Take scalar operand 1 and successively add each element from vector
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 314f63b614b..4138cc31d7e 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -188,6 +188,7 @@ init_internal_fns ()
>  #define cond_len_ternary_direct { 1, 1, true }
>  #define while_direct { 0, 2, false }
>  #define fold_extract_direct { 2, 2, false }
> +#define fold_len_extract_direct { 2, 2, false }
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
>  #define mask_len_fold_left_direct { 1, 1, false }
> @@ -3863,6 +3864,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, 
> convert_optab optab,
>  #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 3)
>  
> +#define expand_fold_len_extract_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
> +
>  #define expand_fold_left_optab_fn(FN, STMT, OPTAB) \
>expand_direct_optab_fn (FN, STMT, OPTAB, 2)
>  
> @@ -3980,6 +3984,7 @@ multi_vector_optab_supported_p (convert_optab optab, 
> tree_pair types,
>  #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
>  #define direct_while_optab_supported_p convert_optab_supported_p
>  #define direct_fold_extract_optab_supported_p direct_optab_supported_p
> +#define direct_fold_len_extract_optab_supported_p direct_optab_supported_p
>  #define direct_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
>  #define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 594f7881511..d09403c0a91 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -312,6 +312,9 @@ DEF_INTERNAL_OPTAB_FN (EXTRACT_LAST, ECF_CONST | 
> ECF_NOTHROW,
>  DEF_INTERNAL_OPTAB_FN (FOLD_EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
>  fold_extract_last, fold_extract)
>  
> +DEF_INTERNAL_OPTAB_FN (LEN_FOLD_EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
> +len_fold_extract_last, fold_len_extract)
> +
>  DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
>  fold_left_plus, fold_left)
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 02/11] Handle epilogues that contain jumps

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 12:42 PM Szabolcs Nagy via Gcc-patches
 wrote:
>
> From: Richard Sandiford 
>
> The prologue/epilogue pass allows the prologue sequence
> to contain jumps.  The sequence is then partitioned into
> basic blocks using find_many_sub_basic_blocks.
>
> This patch treats epilogues in the same way.  It's needed for
> a follow-on aarch64 patch that adds conditional code to both
> the prologue and the epilogue.
>
> Tested on aarch64-linux-gnu (including with a follow-on patch)
> and x86_64-linux-gnu.  OK to install?
>
> Richard
>
> gcc/
> * function.cc (thread_prologue_and_epilogue_insns): Handle
> epilogues that contain jumps.
> ---
>
> This is a previously approved patch that was not committed
> because it was not needed at the time, but i'd like to commit
> it as it is needed for the followup aarch64 eh_return changes:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605769.html
>
> ---
>  gcc/function.cc | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/function.cc b/gcc/function.cc
> index dd2c1136e07..70d1cd65303 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -6120,6 +6120,11 @@ thread_prologue_and_epilogue_insns (void)
>   && returnjump_p (BB_END (e->src)))
> e->flags &= ~EDGE_FALLTHRU;
> }
> +
> + auto_sbitmap blocks (last_basic_block_for_fn (cfun));
> + bitmap_clear (blocks);
> + bitmap_set_bit (blocks, BLOCK_FOR_INSN (epilogue_seq)->index);
> + find_many_sub_basic_blocks (blocks);
> }
>else if (next_active_insn (BB_END (exit_fallthru_edge->src)))
> {
> @@ -6218,6 +6223,11 @@ thread_prologue_and_epilogue_insns (void)
>   set_insn_locations (seq, epilogue_location);
>
>   emit_insn_before (seq, insn);
> +
> + auto_sbitmap blocks (last_basic_block_for_fn (cfun));
> + bitmap_clear (blocks);
> + bitmap_set_bit (blocks, BLOCK_FOR_INSN (insn)->index);
> + find_many_sub_basic_blocks (blocks);

I'll note that clearing a full sbitmap to pass down a single basic block
to find_many_sub_basic_blocks is a quite expensive operation.  May I suggest
to add an overload operating on a single basic block?  It's only

  FOR_EACH_BB_FN (bb, cfun)
    SET_STATE (bb,
               bitmap_bit_p (blocks, bb->index)
               ? BLOCK_TO_SPLIT : BLOCK_ORIGINAL);

using the bitmap, so factoring the rest of the function and customizing this
walk would do the trick.  Note that the whole function could be refactored to
handle single blocks more efficiently.

> }
>  }
>
> --
> 2.25.1
>


[PATCH] VECT: Add LEN_FOLD_EXTRACT_LAST pattern

2023-08-22 Thread Juzhe-Zhong
Hi, Richard and Richi.

This is the last autovec pattern I want to add for RVV (length loop control).

This patch is supposed to handled this following case:

int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v, int n)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < n; i++)
if (a[i] < min_v)
  last = i;

  return last;
}

ARM SVE IR:

  ...
  mask__7.11_39 = vect__4.10_37 < vect_cst__38;
  _40 = loop_mask_36 & mask__7.11_39;
  last_5 = .FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32);
  ...

RVV IR, we want to see:
 ...
 loop_len = SELECT_VL
 mask__7.11_39 = vect__4.10_37 < vect_cst__38;
 last_5 = .LEN_FOLD_EXTRACT_LAST (last_15, _40, vect_vec_iv_.7_32, loop_len, 
bias);
 ...

gcc/ChangeLog:

* doc/md.texi: Add LEN_FOLD_EXTRACT_LAST pattern.
* internal-fn.cc (fold_len_extract_direct): Ditto.
(expand_fold_len_extract_optab_fn): Ditto.
(direct_fold_len_extract_optab_supported_p): Ditto.
* internal-fn.def (LEN_FOLD_EXTRACT_LAST): Ditto.

---
 gcc/doc/md.texi | 6 ++
 gcc/internal-fn.cc  | 5 +
 gcc/internal-fn.def | 3 +++
 3 files changed, 14 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 89562fdb43c..24453693d89 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5636,6 +5636,12 @@ has mode @var{m} and operands 0 and 1 have the mode 
appropriate for
 one element of @var{m}.  Operand 2 has the usual mask mode for vectors
 of mode @var{m}; see @code{TARGET_VECTORIZE_GET_MASK_MODE}.
 
+@cindex @code{len_fold_extract_last_@var{m}} instruction pattern
+@item @code{len_fold_extract_last_@var{m}}
+Like @samp{fold_extract_last_@var{m}}, but takes an extra length operand as
+operand 4 and an extra bias operand as operand 5.  The last associated element
+to be extracted should have index i < len (operand 4) + bias (operand 5).
+
 @cindex @code{fold_left_plus_@var{m}} instruction pattern
 @item @code{fold_left_plus_@var{m}}
 Take scalar operand 1 and successively add each element from vector
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 314f63b614b..4138cc31d7e 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -188,6 +188,7 @@ init_internal_fns ()
 #define cond_len_ternary_direct { 1, 1, true }
 #define while_direct { 0, 2, false }
 #define fold_extract_direct { 2, 2, false }
+#define fold_len_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
 #define mask_len_fold_left_direct { 1, 1, false }
@@ -3863,6 +3864,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, 
convert_optab optab,
 #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 3)
 
+#define expand_fold_len_extract_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
+
 #define expand_fold_left_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 2)
 
@@ -3980,6 +3984,7 @@ multi_vector_optab_supported_p (convert_optab optab, 
tree_pair types,
 #define direct_mask_len_store_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
+#define direct_fold_len_extract_optab_supported_p direct_optab_supported_p
 #define direct_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 594f7881511..d09403c0a91 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -312,6 +312,9 @@ DEF_INTERNAL_OPTAB_FN (EXTRACT_LAST, ECF_CONST | 
ECF_NOTHROW,
 DEF_INTERNAL_OPTAB_FN (FOLD_EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
   fold_extract_last, fold_extract)
 
+DEF_INTERNAL_OPTAB_FN (LEN_FOLD_EXTRACT_LAST, ECF_CONST | ECF_NOTHROW,
+  len_fold_extract_last, fold_len_extract)
+
 DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
   fold_left_plus, fold_left)
 
-- 
2.36.3



[PATCH 11/11] aarch64,arm: Move branch-protection data to targets

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
The branch-protection types are target specific, not the same on arm
and aarch64.  This currently affects pac-ret+b-key, but there will be
a new type on aarch64 that is not relevant for arm.

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h (enum aarch64_key_type): Rename to ...
(enum aarch_key_type): ... this.
* config/aarch64/aarch64.cc (aarch_handle_no_branch_protection): Copy.
(aarch_handle_standard_branch_protection): Copy.
(aarch_handle_pac_ret_protection): Copy.
(aarch_handle_pac_ret_leaf): Copy.
(aarch_handle_pac_ret_b_key): Copy.
(aarch_handle_bti_protection): Copy.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Remove.
(aarch_handle_standard_branch_protection): Remove.
(aarch_handle_pac_ret_protection): Remove.
(aarch_handle_pac_ret_leaf): Remove.
(aarch_handle_pac_ret_b_key): Remove.
(aarch_handle_bti_protection): Remove.
* config/arm/aarch-common.h (enum aarch_key_type): Remove.
(struct aarch_branch_protect_type): Declare.
* config/arm/arm-c.cc (arm_cpu_builtins): Remove aarch_ra_sign_key.
* config/arm/arm.cc (aarch_handle_no_branch_protection): Copy.
(aarch_handle_standard_branch_protection): Copy.
(aarch_handle_pac_ret_protection): Copy.
(aarch_handle_pac_ret_leaf): Copy.
(aarch_handle_bti_protection): Copy.
(arm_configure_build_target): Copy.
* config/arm/arm.opt: Remove aarch_ra_sign_key.
---
 gcc/config/aarch64/aarch64-opts.h |  6 ++--
 gcc/config/aarch64/aarch64.cc | 55 +++
 gcc/config/arm/aarch-common.cc| 55 ---
 gcc/config/arm/aarch-common.h | 11 +++
 gcc/config/arm/arm-c.cc   |  2 --
 gcc/config/arm/arm.cc | 52 +
 gcc/config/arm/arm.opt|  3 --
 7 files changed, 109 insertions(+), 75 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 7e8f1babed8..75ef00b60d4 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -103,9 +103,9 @@ enum stack_protector_guard {
 };
 
 /* The key type that -msign-return-address should use.  */
-enum aarch64_key_type {
-  AARCH64_KEY_A,
-  AARCH64_KEY_B
+enum aarch_key_type {
+  AARCH_KEY_A,
+  AARCH_KEY_B
 };
 
 #endif
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 661ac12cacc..734980f78ec 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18517,6 +18517,61 @@ aarch64_set_asm_isa_flags (aarch64_feature_flags flags)
   aarch64_set_asm_isa_flags (&global_options, flags);
 }
 
+static void
+aarch_handle_no_branch_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
+  aarch_enable_bti = 0;
+}
+
+static void
+aarch_handle_standard_branch_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
+  aarch_ra_sign_key = AARCH_KEY_A;
+  aarch_enable_bti = 1;
+}
+
+static void
+aarch_handle_pac_ret_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
+  aarch_ra_sign_key = AARCH_KEY_A;
+}
+
+static void
+aarch_handle_pac_ret_leaf (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_ALL;
+}
+
+static void
+aarch_handle_pac_ret_b_key (void)
+{
+  aarch_ra_sign_key = AARCH_KEY_B;
+}
+
+static void
+aarch_handle_bti_protection (void)
+{
+  aarch_enable_bti = 1;
+}
+
+static const struct aarch_branch_protect_type aarch_pac_ret_subtypes[] = {
+  { "leaf", false, aarch_handle_pac_ret_leaf, NULL, 0 },
+  { "b-key", false, aarch_handle_pac_ret_b_key, NULL, 0 },
+  { NULL, false, NULL, NULL, 0 }
+};
+
+const struct aarch_branch_protect_type aarch_branch_protect_types[] = {
+  { "none", true, aarch_handle_no_branch_protection, NULL, 0 },
+  { "standard", true, aarch_handle_standard_branch_protection, NULL, 0 },
+  { "pac-ret", false, aarch_handle_pac_ret_protection, aarch_pac_ret_subtypes,
+ARRAY_SIZE (aarch_pac_ret_subtypes) },
+  { "bti", false, aarch_handle_bti_protection, NULL, 0 },
+  { NULL, false, NULL, NULL, 0 }
+};
+
 /* Implement TARGET_OPTION_OVERRIDE.  This is called once in the beginning
and is used to parse the -m{cpu,tune,arch} strings and setup the initial
tuning structs.  In particular it must set selected_tune and
diff --git a/gcc/config/arm/aarch-common.cc b/gcc/config/arm/aarch-common.cc
index 159c61b786c..92e1248f83f 100644
--- a/gcc/config/arm/aarch-common.cc
+++ b/gcc/config/arm/aarch-common.cc
@@ -659,61 +659,6 @@ arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
   return saw_asm_flag ? seq : NULL;
 }
 
-static void
-aarch_handle_no_branch_protection (void)
-{
-  aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
-  aarch_enable_bti = 0;
-}
-
-static void
-aarch_handle_standard_branch_protection (void)
-{
-  aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
-  aarch_ra_sign_key = AARCH_KEY_A;
-  aarch_enable_bti = 1;
-}
-

[PATCH 05/11] aarch64: Add eh_return compile tests

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/aarch64/eh_return-2.c: New test.
* gcc.target/aarch64/eh_return-3.c: New test.
---
 gcc/testsuite/gcc.target/aarch64/eh_return-2.c |  9 +
 gcc/testsuite/gcc.target/aarch64/eh_return-3.c | 14 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-3.c

diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return-2.c 
b/gcc/testsuite/gcc.target/aarch64/eh_return-2.c
new file mode 100644
index 000..4a9d124e891
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/eh_return-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler "add\tsp, sp, x5" } } */
+/* { dg-final { scan-assembler "br\tx6" } } */
+
+void
+foo (unsigned long off, void *handler)
+{
+  __builtin_eh_return (off, handler);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return-3.c 
b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
new file mode 100644
index 000..35989eee806
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=pac-ret+leaf" } */
+/* { dg-final { scan-assembler "add\tsp, sp, x5" } } */
+/* { dg-final { scan-assembler "br\tx6" } } */
+/* { dg-final { scan-assembler "hint\t25 // paciasp" } } */
+/* { dg-final { scan-assembler "hint\t29 // autiasp" } } */
+
+void
+foo (unsigned long off, void *handler, int c)
+{
+  if (c)
+    return;
+  __builtin_eh_return (off, handler);
+}
-- 
2.25.1



[PATCH 10/11] aarch64: Fix branch-protection error message tests

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
Update tests for the new branch-protection parser errors.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/branch-protection-attr.c: Update.
* gcc.target/aarch64/branch-protection-option.c: Update.
---
 gcc/testsuite/gcc.target/aarch64/branch-protection-attr.c   | 6 +++---
 gcc/testsuite/gcc.target/aarch64/branch-protection-option.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/branch-protection-attr.c 
b/gcc/testsuite/gcc.target/aarch64/branch-protection-attr.c
index 272000c2747..dae2a758a56 100644
--- a/gcc/testsuite/gcc.target/aarch64/branch-protection-attr.c
+++ b/gcc/testsuite/gcc.target/aarch64/branch-protection-attr.c
@@ -4,19 +4,19 @@ void __attribute__ ((target("branch-protection=leaf")))
 foo1 ()
 {
 }
-/* { dg-error {invalid protection type 'leaf' in 
'target\("branch-protection="\)' pragma or attribute} "" { target *-*-* } 5 } */
+/* { dg-error {invalid argument 'leaf' for 'target\("branch-protection="\)'} 
"" { target *-*-* } 5 } */
 /* { dg-error {pragma or attribute 'target\("branch-protection=leaf"\)' is not 
valid} "" { target *-*-* } 5 } */
 
 void __attribute__ ((target("branch-protection=none+pac-ret")))
 foo2 ()
 {
 }
-/* { dg-error "unexpected 'pac-ret' after 'none'" "" { target *-*-* } 12 } */
+/* { dg-error {argument 'none' can only appear alone in 
'target\("branch-protection="\)'} "" { target *-*-* } 12 } */
 /* { dg-error {pragma or attribute 
'target\("branch-protection=none\+pac-ret"\)' is not valid} "" { target *-*-* } 
12 } */
 
 void __attribute__ ((target("branch-protection=")))
 foo3 ()
 {
 }
-/* { dg-error {missing argument to 'target\("branch-protection="\)' pragma or 
attribute} "" { target *-*-* } 19 } */
+/* { dg-error {invalid argument '' for 'target\("branch-protection="\)'} "" { 
target *-*-* } 19 } */
 /* { dg-error {pragma or attribute 'target\("branch-protection="\)' is not 
valid} "" { target *-*-* } 19 } */
diff --git a/gcc/testsuite/gcc.target/aarch64/branch-protection-option.c 
b/gcc/testsuite/gcc.target/aarch64/branch-protection-option.c
index 1b3bf4ee2b8..e2f847a31c4 100644
--- a/gcc/testsuite/gcc.target/aarch64/branch-protection-option.c
+++ b/gcc/testsuite/gcc.target/aarch64/branch-protection-option.c
@@ -1,4 +1,4 @@
 /* { dg-do "compile" } */
 /* { dg-options "-mbranch-protection=leaf -mbranch-protection=none+pac-ret" } 
*/
 
-/* { dg-error "unexpected 'pac-ret' after 'none'"  "" { target *-*-* } 0 } */
+/* { dg-error "argument 'none' can only appear alone in 
'-mbranch-protection='" "" { target *-*-* } 0 } */
-- 
2.25.1



[PATCH 04/11] aarch64: Do not force a stack frame for EH returns

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
EH returns no longer rely on clobbering the return address on the
stack, so forcing a stack frame is not necessary.

This does not actually change the code gen for the unwinder since there
are calls before the EH return.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_needs_frame_chain): Do not
force frame chain for eh_return.
---
 gcc/config/aarch64/aarch64.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 36cd172d182..afdbf4213c1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8417,8 +8417,6 @@ aarch64_output_probe_sve_stack_clash (rtx base, rtx adjustment,
 static bool
 aarch64_needs_frame_chain (void)
 {
-  /* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed)
 return true;
 
   /* A leaf function cannot have calls or write LR.  */
-- 
2.25.1



[PATCH 09/11] aarch64,arm: Fix branch-protection= parsing

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
Refactor the parsing to have a single API and fix a few parsing issues:

- Different handling of "bti+none" and "none+bti": these should be
  rejected because "none" can only appear alone.

- Empty strings such as "bti++pac-ret" or "bti+" were accepted; this
  bug was caused by using strtok_r.

- Memory was leaked (str_root was never freed), and two buffers were
  allocated when one is enough.

The callbacks now have no failure mode; only parsing can fail, and
all failures are handled locally.  The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.
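The tokenization rules above (no empty tokens, "none" only alone) can
be sketched without strtok_r.  This is a hypothetical standalone model,
not the actual next_tok from the patch; token names themselves are not
validated here, since the real parser checks them against per-target
tables:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the token at *STR and advance *STR past the
   following '+', setting *MORE when a '+' was seen.  Empty tokens
   ("bti+" or "bti++pac-ret") stay visible to the caller instead of
   being silently skipped, which is what strtok_r did.  */
static size_t
next_tok (const char **str, int *more)
{
  const char *plus = strchr (*str, '+');
  size_t len = plus ? (size_t) (plus - *str) : strlen (*str);
  *more = (plus != NULL);
  *str = plus ? plus + 1 : *str + len;
  return len;
}

/* Check two of the rules above: reject empty tokens, and accept
   "none" only when it appears alone.  */
static int
valid_branch_protection (const char *str)
{
  int ntoks = 0, saw_none = 0, more = 1;
  while (more)
    {
      const char *tok = str;
      size_t len = next_tok (&str, &more);
      if (len == 0)
	return 0;		/* "bti+", "+bti", "bti++pac-ret", "" */
      if (len == 4 && strncmp (tok, "none", 4) == 0)
	saw_none = 1;
      ntoks++;
    }
  return !saw_none || ntoks == 1;
}
```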

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Update.
(aarch64_handle_attr_branch_protection): Update.
* config/arm/aarch-common-protos.h (aarch_parse_branch_protection):
Remove.
(aarch_validate_mbranch_protection): Add new argument.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Update.
(aarch_handle_standard_branch_protection): Update.
(aarch_handle_pac_ret_protection): Update.
(aarch_handle_pac_ret_leaf): Update.
(aarch_handle_pac_ret_b_key): Update.
(aarch_handle_bti_protection): Update.
(aarch_parse_branch_protection): Remove.
(next_tok): New.
(aarch_validate_mbranch_protection): Rewrite.
* config/arm/aarch-common.h (struct aarch_branch_protect_type):
Add field "alone".
* config/arm/arm.cc (arm_configure_build_target): Update.
---
 gcc/config/aarch64/aarch64.cc|  37 +
 gcc/config/arm/aarch-common-protos.h |   5 +-
 gcc/config/arm/aarch-common.cc   | 214 ---
 gcc/config/arm/aarch-common.h|  14 +-
 gcc/config/arm/arm.cc|   3 +-
 5 files changed, 109 insertions(+), 164 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 7f0a22fae9c..661ac12cacc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18539,7 +18539,8 @@ aarch64_override_options (void)
 aarch64_validate_sls_mitigation (aarch64_harden_sls_string);
 
   if (aarch64_branch_protection_string)
-aarch_validate_mbranch_protection (aarch64_branch_protection_string);
+aarch_validate_mbranch_protection (aarch64_branch_protection_string,
+  "-mbranch-protection=");
 
   /* -mcpu=CPU is shorthand for -march=ARCH_FOR_CPU, -mtune=CPU.
  If either of -march or -mtune is given, they override their
@@ -18913,34 +18914,12 @@ aarch64_handle_attr_cpu (const char *str)
 
 /* Handle the argument STR to the branch-protection= attribute.  */
 
- static bool
- aarch64_handle_attr_branch_protection (const char* str)
- {
-  char *err_str = (char *) xmalloc (strlen (str) + 1);
-  enum aarch_parse_opt_result res = aarch_parse_branch_protection (str,
-  &err_str);
-  bool success = false;
-  switch (res)
-{
- case AARCH_PARSE_MISSING_ARG:
-   error ("missing argument to %<target(\"branch-protection=\")%> pragma or"
- " attribute");
-   break;
- case AARCH_PARSE_INVALID_ARG:
-   error ("invalid protection type %qs in %<target(\"branch-protection"
- "=\")%> pragma or attribute", err_str);
-   break;
- case AARCH_PARSE_OK:
-   success = true;
-  /* Fall through.  */
- case AARCH_PARSE_INVALID_FEATURE:
-   break;
- default:
-   gcc_unreachable ();
-}
-  free (err_str);
-  return success;
- }
+static bool
+aarch64_handle_attr_branch_protection (const char* str)
+{
+  return aarch_validate_mbranch_protection (str,
+   "target(\"branch-protection=\")");
+}
 
 /* Handle the argument STR to the tune= target attribute.  */
 
diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index f8cb6562096..75ffdfbb050 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -159,10 +159,7 @@ rtx_insn *arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
 vec<rtx> &clobbers, HARD_REG_SET &clobbered_regs,
 location_t loc);
 
-/* Parsing routine for branch-protection common to AArch64 and Arm.  */
-enum aarch_parse_opt_result aarch_parse_branch_protection (const char*, 
char**);
-
 /* Validation routine for branch-protection common to AArch64 and Arm.  */
-bool aarch_validate_mbranch_protection (const char *);
+bool aarch_validate_mbranch_protection (const char *, const char *);
 
 #endif /* GCC_AARCH_COMMON_PROTOS_H */
diff --git a/gcc/config/arm/aarch-common.cc b/gcc/config/arm/aarch-common.cc
index cbc7f68a8bf..159c61b786c 100644
--- a/gcc/config/arm/aarch-common.cc
+++ b/gcc/config/arm/aarch-common.cc
@@ -659,169 +659,143 @@ arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
   return saw_asm_flag ? seq : NULL;
 }
 
-static enum aarch_parse_opt_result

[PATCH 02/11] Handle epilogues that contain jumps

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
From: Richard Sandiford 

The prologue/epilogue pass allows the prologue sequence
to contain jumps.  The sequence is then partitioned into
basic blocks using find_many_sub_basic_blocks.

This patch treats epilogues in the same way.  It's needed for
a follow-on aarch64 patch that adds conditional code to both
the prologue and the epilogue.

Tested on aarch64-linux-gnu (including with a follow-on patch)
and x86_64-linux-gnu.  OK to install?

Richard

gcc/
* function.cc (thread_prologue_and_epilogue_insns): Handle
epilogues that contain jumps.
---

This is a previously approved patch that was not committed
because it was not needed at the time, but I'd like to commit
it as it is needed for the follow-up aarch64 eh_return changes:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605769.html

---
 gcc/function.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/function.cc b/gcc/function.cc
index dd2c1136e07..70d1cd65303 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6120,6 +6120,11 @@ thread_prologue_and_epilogue_insns (void)
  && returnjump_p (BB_END (e->src)))
e->flags &= ~EDGE_FALLTHRU;
}
+
+ auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+ bitmap_clear (blocks);
+   bitmap_set_bit (blocks, BLOCK_FOR_INSN (epilogue_seq)->index);
+ find_many_sub_basic_blocks (blocks);
}
   else if (next_active_insn (BB_END (exit_fallthru_edge->src)))
{
@@ -6218,6 +6223,11 @@ thread_prologue_and_epilogue_insns (void)
  set_insn_locations (seq, epilogue_location);
 
  emit_insn_before (seq, insn);
+
+ auto_sbitmap blocks (last_basic_block_for_fn (cfun));
+ bitmap_clear (blocks);
+ bitmap_set_bit (blocks, BLOCK_FOR_INSN (insn)->index);
+ find_many_sub_basic_blocks (blocks);
}
 }
 
-- 
2.25.1



[PATCH 03/11] aarch64: Use br instead of ret for eh_return

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
The expected way to handle eh_return is to pass the stack adjustment
offset and landing pad address via

  EH_RETURN_STACKADJ_RTX
  EH_RETURN_HANDLER_RTX

to the epilogue that is shared between normal return paths and the
eh_return paths.  EH_RETURN_HANDLER_RTX is the stack slot of the
return address that is overwritten with the landing pad in the
eh_return case; EH_RETURN_STACKADJ_RTX is a register added to sp
right before return, and it is set to 0 in the normal return case.

The issue with this design is that eh_return and normal return may
require different return sequences, but there is no way to distinguish
the two cases in the epilogue (the stack adjustment may be 0 in the
eh_return case too).

The reason eh_return and normal return require different return
sequences is that control-flow integrity hardening may need to treat
eh_return as a forward-edge transfer (it is not returning to the
previous stack frame) and normal return as a backward-edge one.
On AArch64, a forward edge is protected by BTI and requires a br
instruction, while a backward edge is protected by PAuth or GCS and
requires a ret (or authenticated ret) instruction.

This patch resolves the issue by using the EH_RETURN_STACKADJ_RTX
register only as a flag that is set to 1 in the eh_return paths
(it is 0 in normal return paths) and introduces

  AARCH64_EH_RETURN_STACKADJ_RTX
  AARCH64_EH_RETURN_HANDLER_RTX

to pass the actual stack adjustment and landing pad address to the
epilogue in the eh_return case. Then the epilogue can use the right
return sequence based on the EH_RETURN_STACKADJ_RTX flag.

The handler could be passed the old way via clobbering the return
address, but since now the eh_return case can be distinguished, the
handler can be in a different register than x30 and no stack frame
is needed for eh_return.

The new code generation for functions with eh_return is not amazing,
since x5 and x6 are assumed to be used by the epilogue even in the
normal return path, not just for eh_return.  But only the unwinder
is expected to use eh_return, so this is fine.

This patch fixes a return-to-anywhere gadget in the unwinder under the
existing standard branch protection, and it makes EH return
compatible with the Guarded Control Stack (GCS) extension.
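The flag-based scheme described above can be modelled roughly in plain
C, with variables standing in for the fixed registers (the x4/x5/x6
choice is illustrative) and strings standing in for the two emitted
return sequences:

```c
#include <assert.h>
#include <string.h>

static long eh_flag;	    /* EH_RETURN_STACKADJ_RTX, now just 0/1 */
static long eh_stackadj;    /* AARCH64_EH_RETURN_STACKADJ_RTX */
static void *eh_handler;    /* AARCH64_EH_RETURN_HANDLER_RTX */

/* What the eh_return expander does: set the flag and pass the real
   stack adjustment and landing pad in dedicated registers.  */
static void
expand_eh_return (long offset, void *handler)
{
  eh_flag = 1;
  eh_stackadj = offset;
  eh_handler = handler;
}

/* The shared epilogue distinguishes the two paths by the flag alone:
   a forward-edge indirect branch for eh_return, a plain (or
   authenticated) ret for a normal return.  */
static const char *
epilogue_return_seq (void)
{
  return eh_flag ? "add sp, sp, x5; br x6" : "ret";
}
```

This is why the stack adjustment no longer needs to be distinguishable
from zero: the separate flag carries the eh_return/normal decision.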

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_eh_return_handler_rtx):
Remove.
(aarch64_eh_return): New.
* config/aarch64/aarch64.cc (aarch64_return_address_signing_enabled):
Sign return address even in functions with eh_return.
(aarch64_epilogue_uses): Mark two registers as used.
(aarch64_expand_epilogue): Conditionally return with br or ret.
(aarch64_eh_return_handler_rtx): Remove.
(aarch64_eh_return): New.
* config/aarch64/aarch64.h (EH_RETURN_HANDLER_RTX): Remove.
(AARCH64_EH_RETURN_STACKADJ_REGNUM): Define.
(AARCH64_EH_RETURN_STACKADJ_RTX): Define.
(AARCH64_EH_RETURN_HANDLER_REGNUM): Define.
(AARCH64_EH_RETURN_HANDLER_RTX): Define.
* config/aarch64/aarch64.md (eh_return): New.
---
 gcc/config/aarch64/aarch64-protos.h |   2 +-
 gcc/config/aarch64/aarch64.cc   | 106 +++-
 gcc/config/aarch64/aarch64.h|  11 ++-
 gcc/config/aarch64/aarch64.md   |   8 +++
 4 files changed, 73 insertions(+), 54 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 70303d6fd95..5d1834162a4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -855,7 +855,7 @@ machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
-rtx aarch64_eh_return_handler_rtx (void);
+void aarch64_eh_return (rtx);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr_rtx (void);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index eba5d4a7e04..36cd172d182 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8972,17 +8972,6 @@ aarch64_return_address_signing_enabled (void)
   /* This function should only be called after frame laid out.   */
   gcc_assert (cfun->machine->frame.laid_out);
 
-  /* Turn return address signing off in any function that uses
- __builtin_eh_return.  The address passed to __builtin_eh_return
- is not signed so either it has to be signed (with original sp)
- or the code path that uses it has to avoid authenticating it.
- Currently eh return introduces a return to anywhere gadget, no
- matter what we do here since it uses ret with user provided
- address. An ideal fix for that is to use indirect branch which
- can be protected with BTI j (to some extent).  */
-  if (crtl->calls_eh_return)
-return false;
-
   /* If signing scope 

[PATCH 08/11] aarch64,arm: Remove accepted_branch_protection_string

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
On aarch64 this caused an ICE with pragma push_options since

  commit ae54c1b09963779c5c3914782324ff48af32e2f1
  Author: Wilco Dijkstra 
  CommitDate: 2022-06-01 18:13:57 +0100

  AArch64: Cleanup option processing code

The failure is at pop_options:

internal compiler error: ‘global_options’ are modified in local context

On arm the variable was unused.
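The failing pattern is the usual push/pop_options bracket, shown here
with a portable optimize pragma; the actual trigger was
target("branch-protection=standard") on aarch64, which pop_options
could no longer undo because the accepted string was cached outside
global_options:

```c
#include <assert.h>

/* push_options snapshots global_options; pop_options must restore it
   exactly.  Caching the accepted branch-protection string outside
   global_options broke that round trip and tripped the ICE check.  */
#pragma GCC push_options
#pragma GCC optimize ("O2")

static int
bracketed_fn (int x)
{
  return x * 2;
}

#pragma GCC pop_options

static int
plain_fn (int x)
{
  return x + 1;
}
```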

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Do not override branch_protection options.
(aarch64_override_options): Remove accepted_branch_protection_string.
* config/arm/aarch-common.cc (BRANCH_PROTECT_STR_MAX): Remove.
(aarch_parse_branch_protection): Remove
accepted_branch_protection_string.
* config/arm/arm.cc: Likewise.
---
 gcc/config/aarch64/aarch64.cc  | 10 +-
 gcc/config/arm/aarch-common.cc | 16 
 gcc/config/arm/arm.cc  |  2 --
 3 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index afdbf4213c1..7f0a22fae9c 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -322,8 +322,6 @@ bool aarch64_pcrelative_literal_loads;
 /* Global flag for whether frame pointer is enabled.  */
 bool aarch64_use_frame_pointer;
 
-char *accepted_branch_protection_string = NULL;
-
 /* Support for command line parsing of boolean flags in the tuning
structures.  */
 struct aarch64_flag_desc
@@ -18004,12 +18002,6 @@ aarch64_adjust_generic_arch_tuning (struct tune_params &current_tune)
 static void
 aarch64_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (accepted_branch_protection_string)
-{
-  opts->x_aarch64_branch_protection_string
-   = xstrdup (accepted_branch_protection_string);
-}
-
   /* PR 70044: We have to be careful about being called multiple times for the
  same function.  This means all changes should be repeatable.  */
 
@@ -18612,7 +18604,7 @@ aarch64_override_options (void)
   /* Return address signing is currently not supported for ILP32 targets.  For
  LP64 targets use the configured option in the absence of a command-line
  option for -mbranch-protection.  */
-  if (!TARGET_ILP32 && accepted_branch_protection_string == NULL)
+  if (!TARGET_ILP32 && aarch64_branch_protection_string == NULL)
 {
 #ifdef TARGET_ENABLE_PAC_RET
   aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
diff --git a/gcc/config/arm/aarch-common.cc b/gcc/config/arm/aarch-common.cc
index 5b96ff4c2e8..cbc7f68a8bf 100644
--- a/gcc/config/arm/aarch-common.cc
+++ b/gcc/config/arm/aarch-common.cc
@@ -659,9 +659,6 @@ arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
   return saw_asm_flag ? seq : NULL;
 }
 
-#define BRANCH_PROTECT_STR_MAX 255
-extern char *accepted_branch_protection_string;
-
 static enum aarch_parse_opt_result
 aarch_handle_no_branch_protection (char* str, char* rest)
 {
@@ -812,19 +809,6 @@ aarch_parse_branch_protection (const char *const_str, char** last_str)
   else
*last_str = NULL;
 }
-
-  if (res == AARCH_PARSE_OK)
-{
-  /* If needed, alloc the accepted string then copy in const_str.
-   Used by override_option_after_change_1.  */
-  if (!accepted_branch_protection_string)
-   accepted_branch_protection_string
- = (char *) xmalloc (BRANCH_PROTECT_STR_MAX + 1);
-  strncpy (accepted_branch_protection_string, const_str,
-  BRANCH_PROTECT_STR_MAX + 1);
-  /* Forcibly null-terminate.  */
-  accepted_branch_protection_string[BRANCH_PROTECT_STR_MAX] = '\0';
-}
   return res;
 }
 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 6e933c80183..f49312cace0 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -2424,8 +2424,6 @@ const struct tune_params arm_fa726te_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
-char *accepted_branch_protection_string = NULL;
-
 /* Auto-generated CPU, FPU and architecture tables.  */
 #include "arm-cpu-data.h"
 
-- 
2.25.1



[PATCH 07/11] aarch64: Disable branch-protection for pcs tests

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
The tests manipulate the return address in abitest-2.h and are thus
not compatible with -mbranch-protection=pac-ret+leaf or
-mbranch-protection=gcs.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/aapcs64/func-ret-1.c: Disable branch-protection.
* gcc.target/aarch64/aapcs64/func-ret-2.c: Likewise.
* gcc.target/aarch64/aapcs64/func-ret-3.c: Likewise.
* gcc.target/aarch64/aapcs64/func-ret-4.c: Likewise.
* gcc.target/aarch64/aapcs64/func-ret-64x1_1.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c | 1 +
 5 files changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c
index 5405e1e4920..7bd7757efe6 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c
@@ -4,6 +4,7 @@
AAPCS64 \S 4.1.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 
 #ifndef IN_FRAMEWORK
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c
index 6b171c46fbb..85a822ace4a 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c
@@ -4,6 +4,7 @@
Homogeneous floating-point aggregate types are covered in func-ret-3.c.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 
 #ifndef IN_FRAMEWORK
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
index ad312b675b9..1d35ebf14b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
@@ -4,6 +4,7 @@
in AAPCS64 \S 4.3.5.  */
 
 /* { dg-do run { target aarch64-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 /* { dg-require-effective-target aarch64_big_endian } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
index af05fbe9fdf..15e1408c62d 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
@@ -5,6 +5,7 @@
are treated as general composite types.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 /* { dg-require-effective-target aarch64_big_endian } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c
index 05957e2dcae..fe7bbb6a835 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c
@@ -3,6 +3,7 @@
   Test 64-bit singleton vector types which should be in FP/SIMD registers.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 
 #ifndef IN_FRAMEWORK
-- 
2.25.1



[PATCH 06/11] aarch64: Fix pac-ret eh_return tests

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
This is needed since eh_return no longer prevents pac-ret in the
normal return path.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/return_address_sign_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_2.c: ... here and fix the
scan asm check.
* gcc.target/aarch64/return_address_sign_b_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_b_2.c: ... here and fix the
scan asm check.
---
 .../gcc.target/aarch64/return_address_sign_1.c  | 13 +
 .../gcc.target/aarch64/return_address_sign_2.c  | 17 +++--
 .../aarch64/return_address_sign_b_1.c   | 11 ---
 .../aarch64/return_address_sign_b_2.c   | 17 +++--
 4 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c 
b/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c
index 232ba67ade0..114a9dacb3f 100644
--- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_1.c
@@ -37,16 +37,5 @@ func3 (int a, int b, int c)
   /* autiasp */
 }
 
-/* eh_return.  */
-void __attribute__ ((target ("arch=armv8.3-a")))
-func4 (long offset, void *handler, int *ptr, int imm1, int imm2)
-{
-  /* no paciasp */
-  *ptr = imm1 + foo (imm1) + imm2;
-  __builtin_eh_return (offset, handler);
-  /* no autiasp */
-  return;
-}
-
-/* { dg-final { scan-assembler-times "autiasp" 3 } } */
 /* { dg-final { scan-assembler-times "paciasp" 3 } } */
+/* { dg-final { scan-assembler-times "autiasp" 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c 
b/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c
index a4bc5b45333..d93492c3c43 100644
--- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_2.c
@@ -14,5 +14,18 @@ func1 (int a, int b, int c)
   /* retaa */
 }
 
-/* { dg-final { scan-assembler-times "paciasp" 1 } } */
-/* { dg-final { scan-assembler-times "retaa" 1 } } */
+/* eh_return.  */
+void __attribute__ ((target ("arch=armv8.3-a")))
+func4 (long offset, void *handler, int *ptr, int imm1, int imm2)
+{
+  /* paciasp */
+  *ptr = imm1 + foo (imm1) + imm2;
+  if (handler)
+    /* br */
+    __builtin_eh_return (offset, handler);
+  /* retaa */
+  return;
+}
+
+/* { dg-final { scan-assembler-times "paciasp" 2 } } */
+/* { dg-final { scan-assembler-times "retaa" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_1.c 
b/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_1.c
index 43e32ab6cb7..697fa30dc5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_1.c
@@ -37,16 +37,5 @@ func3 (int a, int b, int c)
   /* autibsp */
 }
 
-/* eh_return.  */
-void __attribute__ ((target ("arch=armv8.3-a")))
-func4 (long offset, void *handler, int *ptr, int imm1, int imm2)
-{
-  /* no pacibsp */
-  *ptr = imm1 + foo (imm1) + imm2;
-  __builtin_eh_return (offset, handler);
-  /* no autibsp */
-  return;
-}
-
 /* { dg-final { scan-assembler-times "pacibsp" 3 } } */
 /* { dg-final { scan-assembler-times "autibsp" 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_2.c 
b/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_2.c
index 9ed64ce0591..748924c72f3 100644
--- a/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/return_address_sign_b_2.c
@@ -14,5 +14,18 @@ func1 (int a, int b, int c)
   /* retab */
 }
 
-/* { dg-final { scan-assembler-times "pacibsp" 1 } } */
-/* { dg-final { scan-assembler-times "retab" 1 } } */
+/* eh_return.  */
+void __attribute__ ((target ("arch=armv8.3-a")))
+func4 (long offset, void *handler, int *ptr, int imm1, int imm2)
+{
+  /* pacibsp */
+  *ptr = imm1 + foo (imm1) + imm2;
+  if (handler)
+    /* br */
+    __builtin_eh_return (offset, handler);
+  /* retab */
+  return;
+}
+
+/* { dg-final { scan-assembler-times "pacibsp" 2 } } */
+/* { dg-final { scan-assembler-times "retab" 2 } } */
-- 
2.25.1



[PATCH 01/11] aarch64: AARCH64_ISA_RCPC was defined twice

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
gcc/ChangeLog:

* config/aarch64/aarch64.h (AARCH64_ISA_RCPC): Remove dup.
---
 gcc/config/aarch64/aarch64.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2b0fc97bb71..c783cb96c48 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -222,7 +222,6 @@ enum class aarch64_feature : unsigned char {
 #define AARCH64_ISA_MOPS  (aarch64_isa_flags & AARCH64_FL_MOPS)
 #define AARCH64_ISA_LS64  (aarch64_isa_flags & AARCH64_FL_LS64)
 #define AARCH64_ISA_CSSC  (aarch64_isa_flags & AARCH64_FL_CSSC)
-#define AARCH64_ISA_RCPC   (aarch64_isa_flags & AARCH64_FL_RCPC)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (AARCH64_ISA_CRYPTO)
-- 
2.25.1



[PATCH 00/11] aarch64 GCS preliminary patches

2023-08-22 Thread Szabolcs Nagy via Gcc-patches
I'm working on Guarded Control Stack support for aarch64 and have a
set of patches that are needed for GCS but seem useful without it, so
it makes sense to review them separately from the rest of the GCS work.

GCS support will depend on the linux ABI that is under discussion at
https://lore.kernel.org/lkml/20230807-arm64-gcs-v4-0-68cfa37f9...@kernel.org/
so it will come later.

Richard Sandiford (1):
  Handle epilogues that contain jumps

Szabolcs Nagy (10):
  aarch64: AARCH64_ISA_RCPC was defined twice
  aarch64: Use br instead of ret for eh_return
  aarch64: Do not force a stack frame for EH returns
  aarch64: Add eh_return compile tests
  aarch64: Fix pac-ret eh_return tests
  aarch64: Disable branch-protection for pcs tests
  aarch64,arm: Remove accepted_branch_protection_string
  aarch64,arm: Fix branch-protection= parsing
  aarch64: Fix branch-protection error message tests
  aarch64,arm: Move branch-protection data to targets

 gcc/config/aarch64/aarch64-opts.h |   6 +-
 gcc/config/aarch64/aarch64-protos.h   |   2 +-
 gcc/config/aarch64/aarch64.cc | 211 +---
 gcc/config/aarch64/aarch64.h  |  12 +-
 gcc/config/aarch64/aarch64.md |   8 +
 gcc/config/arm/aarch-common-protos.h  |   5 +-
 gcc/config/arm/aarch-common.cc| 229 +-
 gcc/config/arm/aarch-common.h |  25 +-
 gcc/config/arm/arm-c.cc   |   2 -
 gcc/config/arm/arm.cc |  57 -
 gcc/config/arm/arm.opt|   3 -
 gcc/function.cc   |  10 +
 .../gcc.target/aarch64/aapcs64/func-ret-1.c   |   1 +
 .../gcc.target/aarch64/aapcs64/func-ret-2.c   |   1 +
 .../gcc.target/aarch64/aapcs64/func-ret-3.c   |   1 +
 .../gcc.target/aarch64/aapcs64/func-ret-4.c   |   1 +
 .../aarch64/aapcs64/func-ret-64x1_1.c |   1 +
 .../aarch64/branch-protection-attr.c  |   6 +-
 .../aarch64/branch-protection-option.c|   2 +-
 .../gcc.target/aarch64/eh_return-2.c  |   9 +
 .../gcc.target/aarch64/eh_return-3.c  |  14 ++
 .../aarch64/return_address_sign_1.c   |  13 +-
 .../aarch64/return_address_sign_2.c   |  17 +-
 .../aarch64/return_address_sign_b_1.c |  11 -
 .../aarch64/return_address_sign_b_2.c |  17 +-
 25 files changed, 338 insertions(+), 326 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-3.c

-- 
2.25.1



[patch] libgomp.c/simd-math-1.c: Test scalb{,l}n{,f} and un-XFAIL for non-nvptx/amdgcn

2023-08-22 Thread Tobias Burnus

As mentioned in the 'libgomp, testsuite: Do not call nonstandard functions on 
darwin' thread:

* scalb was deprecated and then deleted in POSIX in favor of scalbn{,f} and
  scalbln{,f}, which take an int or long, respectively, instead of a double
  for the 'exp' argument of 'x * FLT_RADIX ** exp'.  It makes sense to test
  the standard versions alongside the deprecated one, especially on systems
  which don't have the nonstandard function.

* The testcase unconditionally used an XFAIL version of the tgamma{,f} test,
  but the comment indicated that it is only needed for newlib; hence, the
  XFAIL macro variant is now only used for nvptx and amdgcn.

Jakub: Do those changes look good to you?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp.c/simd-math-1.c: Test scalb{,l}n{,f} and un-XFAIL for non-nvptx/amdgcn

libgomp/ChangeLog:

	* testsuite/libgomp.c/simd-math-1.c (TEST_FUN2INT): New.
	(main): Also test __builtin_scalb{,l}n{,f}. Only TEST_FUN_XFAIL
	tgamma{,f} for nvptx and amdgcn and use TEST_FUN for all other targets.

 libgomp/testsuite/libgomp.c/simd-math-1.c | 55 +--
 1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/simd-math-1.c b/libgomp/testsuite/libgomp.c/simd-math-1.c
index dd2077cc597..151698cc7a1 100644
--- a/libgomp/testsuite/libgomp.c/simd-math-1.c
+++ b/libgomp/testsuite/libgomp.c/simd-math-1.c
@@ -5,6 +5,9 @@
 /* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
 /* { dg-additional-options -foffload-options=amdgcn-amdhsa=-mstack-size=300 { target offload_target_amdgcn } } */
 
+/* Newlib's version of tgammaf is known to have poor accuracy.  */
+/* { dg-additional-options "-DXFAIL_TGAMMA=1" { target { nvptx*-*-* amdgcn*-*-* } } } */
+
 #undef PRINT_RESULT
 #define VERBOSE 0
 #define EARLY_EXIT 1
@@ -144,6 +147,42 @@ void test_##FUN (void) \
 }\
 test_##FUN ();
 
+#define TEST_FUN2INT(TFLOAT, LOW1, HIGH1, TINT, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TINT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+TFLOAT expected = FUN (a[i], b[i]); \
+TFLOAT diff = __builtin_fabs (expected - res[i]); \
+int deviation = deviation_##TFLOAT (expected, res[i]); \
+int fail = isnan (res[i]) != isnan (expected) \
+	   || isinf (res[i]) != isinf (expected) \
+	   || (diff > EPSILON_##TFLOAT && deviation > 10); \
+failed |= fail; \
+if (VERBOSE || fail) \
+  PRINTF (#FUN "(%f,%ld) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+	  a[i], (long) b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+if (EARLY_EXIT && fail) \
+  exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  TINT b[N]; \
+  for (int i = 0; i < N; i++) { \
+a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+b[i] = LOW2 + (i * (HIGH2 - LOW2)) / N; \
+  } \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+for (int i = 0; i < N; i++) \
+  res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
 int main (void)
 {
   TEST_FUN (float, -1.1, 1.1, acosf);
@@ -169,6 +208,8 @@ int main (void)
   TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf);
   TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf);
   TEST_FUN (float, -50.0, 50.0, rintf);
+  TEST_FUN2INT (float, -50.0, 50.0, int, -10, 32, __builtin_scalbnf);
+  TEST_FUN2INT (float, -50.0, 50.0, long, -10L, 32L, __builtin_scalblnf);
   TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf);
   TEST_FUN (float, -10.0, 10.0, __builtin_significandf);
   TEST_FUN (float, -3.14159265359, 3.14159265359, sinf);
@@ -176,8 +217,12 @@ int main (void)
   TEST_FUN (float, -0.1, 1.0, sqrtf);
   TEST_FUN (float, -5.0, 5.0, tanf);
   TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf);
-  /* Newlib's version of tgammaf is known to have poor accuracy.  */
+
+#ifdef XFAIL_TGAMMA
   TEST_FUN_XFAIL (float, -10.0, 10.0, tgammaf);
+#else
+  TEST_FUN (float, -10.0, 10.0, tgammaf);
+#endif
 
   TEST_FUN (double, -1.1, 1.1, acos);
   TEST_FUN (double, -10, 10, acosh);
@@ -202,6 +247,8 @@ int main (void)
   TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow);
   TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder);
   TEST_FUN (double, -50.0, 50.0, rint);
+  TEST_FUN2INT (double, -50.0, 50.0, int, -10, 32, __builtin_scalbn);
+  TEST_FUN2INT (double, -50.0, 50.0, long, -10, 32, __builtin_scalbln);
   TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb);
   TEST_FUN (double, -10.0, 10.0, __builtin_significand);
   TEST_FUN (double, -3.14159265359, 3.14159265359, sin);
@@ -209,8 

RE: [PATCH 1/9] arm: [MVE intrinsics] factorize vmullbq vmulltq

2023-08-22 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: Monday, August 14, 2023 7:34 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 1/9] arm: [MVE intrinsics] factorize vmullbq vmulltq
> 
> Factorize vmullbq, vmulltq so that they use the same parameterized
> names.
> 
> 2023-08-14  Christophe Lyon  
> 
>   gcc/
>   * config/arm/iterators.md (mve_insn): Add vmullb, vmullt.
>   (isu): Add VMULLBQ_INT_S, VMULLBQ_INT_U, VMULLTQ_INT_S,
>   VMULLTQ_INT_U.
>   (supf): Add VMULLBQ_POLY_P, VMULLTQ_POLY_P,
> VMULLBQ_POLY_M_P,
>   VMULLTQ_POLY_M_P.
>   (VMULLBQ_INT, VMULLTQ_INT, VMULLBQ_INT_M, VMULLTQ_INT_M):
> Delete.
>   (VMULLxQ_INT, VMULLxQ_POLY, VMULLxQ_INT_M,
> VMULLxQ_POLY_M): New.
>   * config/arm/mve.md (mve_vmullbq_int_)
>   (mve_vmulltq_int_): Merge into ...
>   (@mve_q_int_) ... this.
>   (mve_vmulltq_poly_p, mve_vmullbq_poly_p): Merge
> into ...
>   (@mve_q_poly_): ... this.
>   (mve_vmullbq_int_m_,
> mve_vmulltq_int_m_): Merge into ...
>   (@mve_q_int_m_): ... this.
>   (mve_vmullbq_poly_m_p, mve_vmulltq_poly_m_p):
> Merge into ...
>   (@mve_q_poly_m_): ... this.

The series is okay and similar in design to your previous series in this area.
Thanks again for doing this rework.
Kyrill

> ---
>  gcc/config/arm/iterators.md |  23 +++--
>  gcc/config/arm/mve.md   | 100 
>  2 files changed, 38 insertions(+), 85 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index b13ff53d36f..fb003bcd67b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -917,6 +917,7 @@
> 
>  (define_int_attr mve_insn [
>(UNSPEC_VCADD90 "vcadd") (UNSPEC_VCADD270 "vcadd")
> +  (UNSPEC_VCMLA "vcmla") (UNSPEC_VCMLA90 "vcmla")
> (UNSPEC_VCMLA180 "vcmla") (UNSPEC_VCMLA270 "vcmla")
>(UNSPEC_VCMUL "vcmul") (UNSPEC_VCMUL90 "vcmul")
> (UNSPEC_VCMUL180 "vcmul") (UNSPEC_VCMUL270 "vcmul")
>(VABAVQ_P_S "vabav") (VABAVQ_P_U "vabav")
>(VABAVQ_S "vabav") (VABAVQ_U "vabav")
> @@ -1044,6 +1045,13 @@
>(VMOVNTQ_S "vmovnt") (VMOVNTQ_U "vmovnt")
>(VMULHQ_M_S "vmulh") (VMULHQ_M_U "vmulh")
>(VMULHQ_S "vmulh") (VMULHQ_U "vmulh")
> +  (VMULLBQ_INT_M_S "vmullb") (VMULLBQ_INT_M_U
> "vmullb")
> +  (VMULLBQ_INT_S "vmullb") (VMULLBQ_INT_U "vmullb")
> +  (VMULLBQ_POLY_M_P "vmullb") (VMULLTQ_POLY_M_P
> "vmullt")
> +  (VMULLBQ_POLY_P "vmullb")
> +  (VMULLTQ_INT_M_S "vmullt") (VMULLTQ_INT_M_U
> "vmullt")
> +  (VMULLTQ_INT_S "vmullt") (VMULLTQ_INT_U "vmullt")
> +  (VMULLTQ_POLY_P "vmullt")
>(VMULQ_M_N_S "vmul") (VMULQ_M_N_U "vmul")
> (VMULQ_M_N_F "vmul")
>(VMULQ_M_S "vmul") (VMULQ_M_U "vmul") (VMULQ_M_F
> "vmul")
>(VMULQ_N_S "vmul") (VMULQ_N_U "vmul") (VMULQ_N_F
> "vmul")
> @@ -1209,7 +1217,6 @@
>(VSUBQ_M_N_S "vsub") (VSUBQ_M_N_U "vsub")
> (VSUBQ_M_N_F "vsub")
>(VSUBQ_M_S "vsub") (VSUBQ_M_U "vsub") (VSUBQ_M_F
> "vsub")
>(VSUBQ_N_S "vsub") (VSUBQ_N_U "vsub") (VSUBQ_N_F
> "vsub")
> -  (UNSPEC_VCMLA "vcmla") (UNSPEC_VCMLA90 "vcmla")
> (UNSPEC_VCMLA180 "vcmla") (UNSPEC_VCMLA270 "vcmla")
>])
> 
>  (define_int_attr isu[
> @@ -1246,6 +1253,8 @@
>(VMOVNBQ_S "i") (VMOVNBQ_U "i")
>(VMOVNTQ_M_S "i") (VMOVNTQ_M_U "i")
>(VMOVNTQ_S "i") (VMOVNTQ_U "i")
> +  (VMULLBQ_INT_S "s") (VMULLBQ_INT_U "u")
> +  (VMULLTQ_INT_S "s") (VMULLTQ_INT_U "u")
>(VNEGQ_M_S "s")
>(VQABSQ_M_S "s")
>(VQMOVNBQ_M_S "s") (VQMOVNBQ_M_U "u")
> @@ -2330,6 +2339,10 @@
>  (VMLADAVQ_U "u") (VMULHQ_S "s") (VMULHQ_U "u")
>  (VMULLBQ_INT_S "s") (VMULLBQ_INT_U "u") (VQADDQ_S
> "s")
>  (VMULLTQ_INT_S "s") (VMULLTQ_INT_U "u") (VQADDQ_U
> "u")
> +(VMULLBQ_POLY_P "p")
> +(VMULLTQ_POLY_P "p")
> +(VMULLBQ_POLY_M_P "p")
> +(VMULLTQ_POLY_M_P "p")
>  (VMULQ_N_S "s") (VMULQ_N_U "u") (VMULQ_S "s")
>  (VMULQ_U "u")
>  (VQADDQ_N_S "s") (VQADDQ_N_U "u")
> @@ -2713,8 +2726,8 @@
>  (define_int_iterator VMINVQ [VMINVQ_U VMINVQ_S])
>  (define_int_iterator VMLADAVQ [VMLADAVQ_U VMLADAVQ_S])
>  (define_int_iterator VMULHQ [VMULHQ_S VMULHQ_U])
> -(define_int_iterator VMULLBQ_INT [VMULLBQ_INT_U VMULLBQ_INT_S])
> -(define_int_iterator VMULLTQ_INT [VMULLTQ_INT_U VMULLTQ_INT_S])
> +(define_int_iterator VMULLxQ_INT [VMULLBQ_INT_U VMULLBQ_INT_S
> VMULLTQ_INT_U VMULLTQ_INT_S])
> +(define_int_iterator VMULLxQ_POLY 

RE: [PATCH] arm: [MVE intrinsics] Remove dead check for float type in parse_element_type

2023-08-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Monday, August 14, 2023 7:10 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH] arm: [MVE intrinsics] Remove dead check for float type in
> parse_element_type
> 
> Fix a likely copy/paste error, where we check if ch == 'f' after we
> checked it's either 's' or 'u'.

Ok.
Thanks,
Kyrill

> 
> 2023-08-14  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (parse_element_type):
>   Remove dead check.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 1633084608e..23eb9d0e69b 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -80,8 +80,7 @@ parse_element_type (const function_instance
> , const char *)
> 
>if (ch == 's' || ch == 'u')
>  {
> -  type_class_index tclass = (ch == 'f' ? TYPE_float
> -  : ch == 's' ? TYPE_signed
> +  type_class_index tclass = (ch == 's' ? TYPE_signed
>: TYPE_unsigned);
>char *end;
>unsigned int bits = strtol (format, , 10);
> --
> 2.34.1



RE: [PATCH] arm: [MVE intrinsics] fix binary_acca_int32 and binary_acca_int64 shapes

2023-08-22 Thread Kyrylo Tkachov via Gcc-patches
Hi Christophe,

> -Original Message-
> From: Christophe Lyon 
> Sent: Monday, August 14, 2023 7:01 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH] arm: [MVE intrinsics] fix binary_acca_int32 and
> binary_acca_int64 shapes
> 
> Fix these two shapes, where we were failing to check the last
> non-predicate parameter.

Ok.
Thanks,
Kyrill

> 
> 2023-08-14  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (binary_acca_int32): Fix
> loop bound.
>   (binary_acca_int64): Likewise.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-
> mve-builtins-shapes.cc
> index 6d477a84330..1633084608e 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -455,7 +455,7 @@ struct binary_acca_int32_def : public
> overloaded_base<0>
>   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
>return error_mark_node;
> 
> -unsigned int last_arg = i;
> +unsigned int last_arg = i + 1;
>  for (i = 1; i < last_arg; i++)
>if (!r.require_matching_vector_type (i, type))
>   return error_mark_node;
> @@ -492,7 +492,7 @@ struct binary_acca_int64_def : public
> overloaded_base<0>
>   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
>return error_mark_node;
> 
> -unsigned int last_arg = i;
> +unsigned int last_arg = i + 1;
>  for (i = 1; i < last_arg; i++)
>if (!r.require_matching_vector_type (i, type))
>   return error_mark_node;
> --
> 2.34.1



[patch] OpenMP: Handle 'all' as category in defaultmap

2023-08-22 Thread Tobias Burnus

I stumbled over this when compiling the defaultmap files of the OpenMP example
document (upcoming version). Seemingly, an 'all' was snuck in when the syntax
representation was changed (as an alias for not specifying a category). I wonder
which intended or unintended changes are still hiding and, hence, still need to
be implemented and/or fixed in the spec.

Comments, suggestions, remarks to the attached patch?

Tobias
OpenMP: Handle 'all' as category in defaultmap

Both specifying no category and specifying 'all' imply
that the implicit-behavior applies to all categories.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_defaultmap): Parse
	'all' as category.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_defaultmap): Parse
	'all' as category.

gcc/fortran/ChangeLog:

	* gfortran.h (enum gfc_omp_defaultmap_category):
	Add OMP_DEFAULTMAP_CAT_ALL.
	* openmp.cc (gfc_match_omp_clauses): Parse
	'all' as category.
	* trans-openmp.cc (gfc_trans_omp_clauses): Handle it.

gcc/ChangeLog:

	* tree-core.h (enum omp_clause_defaultmap_kind): Add
	OMP_CLAUSE_DEFAULTMAP_CATEGORY_ALL.
	* gimplify.cc (gimplify_scan_omp_clauses): Handle it.
	* tree-pretty-print.cc (dump_omp_clause): Likewise.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.2 status): Add depobj with
	destroy-var argument as 'N'. Mark defaultmap with
	'all' category as 'Y'.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/defaultmap-1.f90: Update dg-error.
	* c-c++-common/gomp/defaultmap-5.c: New test.
	* c-c++-common/gomp/defaultmap-6.c: New test.
	* gfortran.dg/gomp/defaultmap-10.f90: New test.
	* gfortran.dg/gomp/defaultmap-9.f90: New test.

 gcc/c/c-parser.cc|  19 +++-
 gcc/cp/parser.cc |  19 +++-
 gcc/fortran/gfortran.h   |   1 +
 gcc/fortran/openmp.cc|  12 ++-
 gcc/fortran/trans-openmp.cc  |   3 +
 gcc/gimplify.cc  |   1 +
 gcc/testsuite/c-c++-common/gomp/defaultmap-5.c   |  47 +
 gcc/testsuite/c-c++-common/gomp/defaultmap-6.c   |  48 ++
 gcc/testsuite/gfortran.dg/gomp/defaultmap-1.f90  |   2 +-
 gcc/testsuite/gfortran.dg/gomp/defaultmap-10.f90 | 116 +++
 gcc/testsuite/gfortran.dg/gomp/defaultmap-9.f90  |  71 ++
 gcc/tree-core.h  |   1 +
 gcc/tree-pretty-print.cc |   3 +
 libgomp/libgomp.texi |   4 +-
 14 files changed, 334 insertions(+), 13 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 33fe7b115ff..4a4820d792c 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -15067,8 +15067,8 @@ c_parser_omp_clause_defaultmap (c_parser *parser, tree list)
   if (!c_parser_next_token_is (parser, CPP_NAME))
 	{
 	invalid_category:
-	  c_parser_error (parser, "expected %<scalar%>, %<aggregate%> or "
-				  "%<pointer%>");
+	  c_parser_error (parser, "expected %<scalar%>, %<aggregate%>, "
+				  "%<pointer%> or %<all%>");
 	  goto out_err;
 	}
   p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
@@ -15077,6 +15077,8 @@ c_parser_omp_clause_defaultmap (c_parser *parser, tree list)
 	case 'a':
 	  if (strcmp ("aggregate", p) == 0)
 	category = OMP_CLAUSE_DEFAULTMAP_CATEGORY_AGGREGATE;
+	  else if (strcmp ("all", p) == 0)
+	category = OMP_CLAUSE_DEFAULTMAP_CATEGORY_ALL;
 	  else
 	goto invalid_category;
 	  break;
@@ -15106,13 +15108,19 @@ c_parser_omp_clause_defaultmap (c_parser *parser, tree list)
   for (c = list; c ; c = OMP_CLAUSE_CHAIN (c))
 if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEFAULTMAP
 	&& (category == OMP_CLAUSE_DEFAULTMAP_CATEGORY_UNSPECIFIED
+	|| category == OMP_CLAUSE_DEFAULTMAP_CATEGORY_ALL
 	|| OMP_CLAUSE_DEFAULTMAP_CATEGORY (c) == category
 	|| (OMP_CLAUSE_DEFAULTMAP_CATEGORY (c)
-		== OMP_CLAUSE_DEFAULTMAP_CATEGORY_UNSPECIFIED)))
+		== OMP_CLAUSE_DEFAULTMAP_CATEGORY_UNSPECIFIED)
+	|| (OMP_CLAUSE_DEFAULTMAP_CATEGORY (c)
+		== OMP_CLAUSE_DEFAULTMAP_CATEGORY_ALL)))
   {
 	enum omp_clause_defaultmap_kind cat = category;
 	location_t loc = OMP_CLAUSE_LOCATION (c);
-	if (cat == OMP_CLAUSE_DEFAULTMAP_CATEGORY_UNSPECIFIED)
+	if (cat == OMP_CLAUSE_DEFAULTMAP_CATEGORY_UNSPECIFIED
+	|| (cat == OMP_CLAUSE_DEFAULTMAP_CATEGORY_ALL
+		&& (OMP_CLAUSE_DEFAULTMAP_CATEGORY (c)
+		!= OMP_CLAUSE_DEFAULTMAP_CATEGORY_UNSPECIFIED)))
 	  cat = OMP_CLAUSE_DEFAULTMAP_CATEGORY (c);
 	p = NULL;
 	switch (cat)
@@ -15120,6 +15128,9 @@ c_parser_omp_clause_defaultmap (c_parser *parser, tree list)
 	  case OMP_CLAUSE_DEFAULTMAP_CATEGORY_UNSPECIFIED:
 	p = NULL;
 	break;
+	  case OMP_CLAUSE_DEFAULTMAP_CATEGORY_ALL:
+	p = "all";
+	break;
 	  case 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Richard Biener via Gcc-patches
On Tue, Aug 22, 2023 at 10:53 AM Jiang, Haochen  wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, August 22, 2023 4:36 PM
> > To: Jakub Jelinek 
> > Cc: Jiang, Haochen ; ZiNgA BuRgA
> > ; Hongtao Liu ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: Intel AVX10.1 Compiler Design and Support
> >
> > On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek  wrote:
> > >
> > > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches
> > wrote:
> > > > I think internally we should have conditional 512bit support work across
> > > > AVX512 and AVX10.
> > > >
> > > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > > redundancy and possibly make providing inter-operation between
> > > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > > just as "re-branding" latest AVX512, so we should treat it that way
> > > > (making it an alias to the AVX512 features).
> > > >
> > > > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > > is an entirely separate
> > > > question.  But I think to not wreck the core idea (more 
> > > > interoperability,
> > > > here between small/big cores) we absolutely have to
> > > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > > effectively means AVX512 with disabled 512bit support.
> > >
> > > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > > name to represent whether the effective ISA set allows 512-bit vectors or
> > > not.
> >
> > Works for me.  Note it also implies mask regs are SImode, not DImode,
> > not sure if that relates to evex more than mask reg encodings are all evex 
> > ...
> >
>
> Just in case we are not on the same page.
>
> So we are looking forward to an "extended" -m[no-]avx10-max-512bit option,
> which can also be used on AVX512. The other basic logic will not change.

Yes, I think that fulfills the main complaints.

Internally I'd also like to avoid having TARGET_AVX10.1 guards in the md file
and instead alias -mavx10.1 to the set of AVX512 sub-ISAs it covers, with only
TARGET_AVX10.2 covering the ISA extensions introduced with 10.2.

> BTW, -mevex512 is not a good name since there will be 64 bit mask operations
> promoted to EVEX128 in APX, which might cause confusion.
>
> Thx,
> Haochen
>
> > >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > > option IMHO should be in the same spirit to all the others a positive
> > enablement,
> > > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > > former would allow 512-bit vectors, the latter shouldn't disable those 
> > > again
> > > because it isn't a -mno-* option.  Sure, instructions which are specific 
> > > to
> > > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might 
> > > be
> > > enabled only in 128/256 bit variants if we differentiate that level.
> > > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> > >
> > > Jakub
> > >


Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-08-22 Thread Hongtao Liu via Gcc-patches
On Tue, Aug 22, 2023 at 5:05 PM Richard Biener via Gcc-patches
 wrote:
>
> The PRs ask for optimizing of
>
>   _1 = BIT_FIELD_REF ;
>   result_4 = BIT_INSERT_EXPR ;
>
> to a vector permutation.  The following implements this as
> match.pd pattern, improving code generation on x86_64.
>
> On the RTL level we face the issue that backend patterns inconsistently
> use vec_merge and vec_select of vec_concat to represent permutes.
>
> I think using a (supported) permute is almost always better
> than an extract plus insert, maybe excluding the case we extract
> element zero and that's aliased to a register that can be used
> directly for insertion (not sure how to query that).
>
> The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
> where we now expand from
>
>  __A_28 = VEC_PERM_EXPR ;
>
> instead of
>
>  _28 = BIT_FIELD_REF ;
>  __A_29 = BIT_INSERT_EXPR ;
>
> producing a vpblendw instruction instead of the expected vmovsh.  That's
> either a missed vec_perm_const expansion optimization or even better,
> an improvement - Zen4 for example has 4 ports to execute vpblendw
> but only 3 for executing vmovsh and both instructions have the same size.
Looks like Sapphire Rapids only has 2 ports for executing vpblendw
but 3 for vmovsh, so I guess we may need a micro-architecture tuning for
this specific permutation.
For vmovss/vpblendd they're equivalent on SPR: both have 3 ports.
The change for the testcase is ok; I'll handle it with an incremental patch.
>
> The patch XFAILs the sub-testcase - is that OK or should I update
> the expected instruction to a vpblend?
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Thanks,
> Richard.
>
> PR tree-optimization/94864
> PR tree-optimization/94865
> PR tree-optimization/93080
> * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
> for vector insertion from vector extraction.
>
> * gcc.target/i386/pr94864.c: New testcase.
> * gcc.target/i386/pr94865.c: Likewise.
> * gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
> * gcc.dg/tree-ssa/forwprop-40.c: Likewise.
> * gcc.dg/tree-ssa/forwprop-41.c: Likewise.
> ---
>  gcc/match.pd  | 25 +++
>  gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c   | 14 +++
>  gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c   | 16 
>  .../gcc.target/i386/avx512fp16-vmovsh-1a.c|  2 +-
>  gcc/testsuite/gcc.target/i386/pr94864.c   | 13 ++
>  gcc/testsuite/gcc.target/i386/pr94865.c   | 13 ++
>  6 files changed, 82 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 86fdc606a79..6e083021b27 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -8006,6 +8006,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   wi::to_wide (@ipos) + isize))
>  (BIT_FIELD_REF @0 @rsize @rpos)
>
> +/* Simplify vector inserts of other vector extracts to a permute.  */
> +(simplify
> + (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
> + (if (VECTOR_TYPE_P (type)
> +  && types_match (@0, @1)
> +  && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
> +  && TYPE_VECTOR_SUBPARTS (type).is_constant ())
> +  (with
> +   {
> + unsigned HOST_WIDE_INT elsz
> +   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))));
> + poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
> + poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
> + unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> + vec_perm_builder builder;
> + builder.new_vector (nunits, nunits, 1);
> + for (unsigned i = 0; i < nunits; ++i)
> +   builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
> + vec_perm_indices sel (builder, 2, nunits);
> +   }
> +   (if (!VECTOR_MODE_P (TYPE_MODE (type))
> +   || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, 
> false))
> +(vec_perm @0 @1 { vec_perm_indices_to_tree
> +(build_vector_type (ssizetype, nunits), sel); })
> +
>  (if (canonicalize_math_after_vectorization_p ())
>   (for fmas (FMA)
>(simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
> new file mode 100644
> index 000..7513497f552
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized -Wno-psabi -w" } */
> +
> +#define vector __attribute__((__vector_size__(16) ))
> +
> +vector int g(vector int a)
> +{
> +  int b = a[0];
> +  a[0] = b;
> +  return a;
> +}
> +
> +/* { dg-final { 

Re: [OpenMP/offloading][RFC] How to handle target/device-specifics with C pre-processor (in general, inside 'omp declare variant')

2023-08-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Aug 22, 2023 at 10:43:54AM +0200, Tobias Burnus wrote:
> On 22.08.23 09:25, Richard Biener wrote:
> > On Mon, Aug 21, 2023 at 6:23 PM Tobias Burnus  
> > wrote:
> > > ...
> > Err, so the OMP standard doesn't put any constraints on what to allow 
> > inside the
> > variants?  Is declare variant always at the toplevel?
> 
> Actually, the OpenMP specification only states the following – which is less 
> than I claimed:
> 
> "If the context selector of a begin declare variant directive contains traits 
> in the device
> or implementation set that are known never to be compatible with an OpenMP 
> context during
> the current compilation, the preprocessed code that follows the begin declare 
> variant
> directive up to its paired end directive is elided."

The reason GCC implements offloading the way it does is to make sure the
layout of types/variables/functions is the same, so that the host and
offloading side can actually interoperate.  I think it is a much cleaner
design.
The unfortunate thing is that LLVM decided to do it differently, with separate
parsing/compilation for the host and device cases.
That allows the various preprocessor games and the like, but on the other
hand it allows the user to make host and offloading code non-interoperable,
say by #ifdefing out some members of a struct, or by using different
attributes which cause different alignment and the like.  If the source comes
from a pipe, what do you do so that you can preprocess it multiple times?
The offloading compilation still needs to be some weird hybrid of the
offloading target and the host target, because e.g. the structure/variable
layout/alignment decisions need to be made according to the host target.
The worst thing is that the way LLVM chose to implement this later
leaks into the standard, where some people who propose new features just
don't think that it could be implemented differently, and that results in
cases like the begin declare variant eliding what is in between.  It takes
time to adjust the wording so that it is acceptable even for the GCC way
of doing offloading, and sometimes we aren't successful at it.
So, the long-term question is whether we shouldn't give up and do it with
separate parsing as well.  But that would be a lot of work...

Jakub



[PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-08-22 Thread Richard Biener via Gcc-patches
The PRs ask for optimizing of

  _1 = BIT_FIELD_REF ;
  result_4 = BIT_INSERT_EXPR ;

to a vector permutation.  The following implements this as
match.pd pattern, improving code generation on x86_64.

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case we extract
element zero and that's aliased to a register that can be used
directly for insertion (not sure how to query that).

The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
where we now expand from

 __A_28 = VEC_PERM_EXPR ;

instead of

 _28 = BIT_FIELD_REF ;
 __A_29 = BIT_INSERT_EXPR ;

producing a vpblendw instruction instead of the expected vmovsh.  That's
either a missed vec_perm_const expansion optimization or even better,
an improvement - Zen4 for example has 4 ports to execute vpblendw
but only 3 for executing vmovsh and both instructions have the same size.

The patch XFAILs the sub-testcase - is that OK or should I update
the expected instruction to a vpblend?

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Thanks,
Richard.

PR tree-optimization/94864
PR tree-optimization/94865
PR tree-optimization/93080
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
for vector insertion from vector extraction.

* gcc.target/i386/pr94864.c: New testcase.
* gcc.target/i386/pr94865.c: Likewise.
* gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
* gcc.dg/tree-ssa/forwprop-40.c: Likewise.
* gcc.dg/tree-ssa/forwprop-41.c: Likewise.
---
 gcc/match.pd  | 25 +++
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c   | 14 +++
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c   | 16 
 .../gcc.target/i386/avx512fp16-vmovsh-1a.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr94864.c   | 13 ++
 gcc/testsuite/gcc.target/i386/pr94865.c   | 13 ++
 6 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 86fdc606a79..6e083021b27 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8006,6 +8006,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  wi::to_wide (@ipos) + isize))
 (BIT_FIELD_REF @0 @rsize @rpos)
 
+/* Simplify vector inserts of other vector extracts to a permute.  */
+(simplify
+ (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
+ (if (VECTOR_TYPE_P (type)
+  && types_match (@0, @1)
+  && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
+  && TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+ unsigned HOST_WIDE_INT elsz
+   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))));
+ poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
+ poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
+ unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ vec_perm_builder builder;
+ builder.new_vector (nunits, nunits, 1);
+ for (unsigned i = 0; i < nunits; ++i)
+   builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
+ vec_perm_indices sel (builder, 2, nunits);
+   }
+   (if (!VECTOR_MODE_P (TYPE_MODE (type))
+   || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, false))
+(vec_perm @0 @1 { vec_perm_indices_to_tree
+(build_vector_type (ssizetype, nunits), sel); })
+
 (if (canonicalize_math_after_vectorization_p ())
  (for fmas (FMA)
   (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
new file mode 100644
index 000..7513497f552
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -Wno-psabi -w" } */
+
+#define vector __attribute__((__vector_size__(16) ))
+
+vector int g(vector int a)
+{
+  int b = a[0];
+  a[0] = b;
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_INSERT_EXPR" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
new file mode 100644
index 000..b1e75797a90
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -Wno-psabi -w" } */
+
+#define vector __attribute__((__vector_size__(16) ))
+
+vector int g(vector int a, int c)
+{
+  int b = a[2];
+  a[2] = b;
+  a[1] = c;
+  return a;
+}
+

RE: Intel AVX10.1 Compiler Design and Support

2023-08-22 Thread Jiang, Haochen via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 22, 2023 4:36 PM
> To: Jakub Jelinek 
> Cc: Jiang, Haochen ; ZiNgA BuRgA
> ; Hongtao Liu ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Intel AVX10.1 Compiler Design and Support
> 
> On Tue, Aug 22, 2023 at 10:34 AM Jakub Jelinek  wrote:
> >
> > On Tue, Aug 22, 2023 at 09:36:15AM +0200, Richard Biener via Gcc-patches
> wrote:
> > > I think internally we should have conditional 512bit support work across
> > > AVX512 and AVX10.
> > >
> > > I also think it makes sense to _internally_ have AVX10.1 (10.1!) just
> > > enable the respective AVX512 features.  AVX10.2 would then internally
> > > cover the ISA extensions added in 10.2 only.  Both would reduce the
> > > redundancy and possibly make providing inter-operation between
> > > AVX10.1 (10.1!) and AVX512 to the user easier.  I see AVX 10.1 (10.1!)
> > > just as "re-branding" latest AVX512, so we should treat it that way
> > > (making it an alias to the AVX512 features).
> > >
> > > Whether we want allow -mavx10.1 -mno-avx512cd or whether
> > > we only allow the "positive" -mavx512f -mavx512... (omitting avx512cd)
> > > is an entirely separate
> > > question.  But I think to not wreck the core idea (more interoperability,
> > > here between small/big cores) we absolutely have to
> > > provide a subset of avx10.1 but with disabled 512bit vectors which
> > > effectively means AVX512 with disabled 512bit support.
> >
> > Agreed.  And I still think -mevex512 vs. -mno-evex512 is the best option
> > name to represent whether the effective ISA set allows 512-bit vectors or
> > not.
> 
> Works for me.  Note it also implies mask regs are SImode, not DImode,
> not sure if that relates to evex more than mask reg encodings are all evex ...
> 

Just in case we are not on the same page.

So we are looking forward to an "extended" -m[no-]avx10-max-512bit option,
which can also be used with AVX512. The other basic logic will not change.

BTW, -mevex512 is not a good name, since there will be 64-bit mask operations
promoted to EVEX128 in APX, which might cause confusion.

Thx,
Haochen

> >  I think -mavx10.1 -mno-avx512cd should be fine.  And, -mavx10.1-256
> > option IMHO should be in the same spirit to all the others a positive
> enablement,
> > not both positive (enable avx512{f,cd,bw,dq,...} and negative (disallow
> > 512-bit vectors).  So, if one uses -mavx512f -mavx10.1-256, because the
> > former would allow 512-bit vectors, the latter shouldn't disable those again
> > because it isn't a -mno-* option.  Sure, instructions which are specific to
> > AVX10.1 (aren't present in any currently existing AVX512* ISA set) might be
> > enabled only in 128/256 bit variants if we differentiate that level.
> > But, if one uses -mavx2 -mavx10.1-256, because no AVX512* has been enabled
> > it can enable all the AVX10.1 implied AVX512* parts without EVEX.512.
> >
> > Jakub
> >


[PATCH 3/3] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-22 Thread Kewen.Lin via Gcc-patches
Hi,

Like r14-3317, which moved the handling of memory access type
VMAT_GATHER_SCATTER out of the final loop nest in vectorizable_load,
this patch does the same on the vectorizable_store side.
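For context, VMAT_GATHER_SCATTER on the store side denotes a scatter store: each active lane writes its data element through its own offset. A scalar model of the access pattern being vectorized (an illustrative sketch, not the GCC implementation):

```c
#include <stddef.h>

/* Scalar model of a masked scatter store: every active lane I writes
   DATA[I] to BASE[OFFSET[I]].  vectorizable_store emits the vector
   form of exactly this access pattern.  */
void scatter_store (int *base, const long *offset, const int *data,
                    const unsigned char *mask, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (mask[i])
      base[offset[i]] = data[i];
}
```

Because this pattern shares almost nothing with contiguous or interleaved stores, splitting it into its own loop (as the patch does) simplifies both code paths.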

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Move the handling of
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.
---
 gcc/tree-vect-stmts.cc | 258 +
 1 file changed, 159 insertions(+), 99 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 18f5ebcc09c..b959c1861ad 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8930,44 +8930,23 @@ vectorizable_store (vec_info *vinfo,
   return true;
 }

-  auto_vec result_chain (group_size);
-  auto_vec vec_offsets;
-  auto_vec vec_oprnds;
-  for (j = 0; j < ncopies; j++)
+  if (memory_access_type == VMAT_GATHER_SCATTER)
 {
-  gimple *new_stmt;
-  if (j == 0)
+  gcc_assert (!slp && !grouped_store);
+  auto_vec vec_offsets;
+  for (j = 0; j < ncopies; j++)
{
- if (slp)
-   {
- /* Get vectorized arguments for SLP_NODE.  */
- vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
- &vec_oprnds);
- vec_oprnd = vec_oprnds[0];
-   }
- else
+ gimple *new_stmt;
+ if (j == 0)
{
- /* For interleaved stores we collect vectorized defs for all the
-stores in the group in DR_CHAIN. DR_CHAIN is then used as an
-input to vect_permute_store_chain().
-
-If the store is not grouped, DR_GROUP_SIZE is 1, and DR_CHAIN
-is of size 1.  */
- stmt_vec_info next_stmt_info = first_stmt_info;
- for (i = 0; i < group_size; i++)
-   {
- /* Since gaps are not supported for interleaved stores,
-DR_GROUP_SIZE is the exact number of stmts in the chain.
-Therefore, NEXT_STMT_INFO can't be NULL_TREE.  In case
-that there is no interleaving, DR_GROUP_SIZE is 1,
-and only one iteration of the loop will be executed.  */
- op = vect_get_store_rhs (next_stmt_info);
- vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
-op, gvec_oprnds[i]);
- vec_oprnd = (*gvec_oprnds[i])[0];
- dr_chain.quick_push (vec_oprnd);
- next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
-   }
+ /* Since the store is not grouped, DR_GROUP_SIZE is 1, and
+DR_CHAIN is of size 1.  */
+ gcc_assert (group_size == 1);
+ op = vect_get_store_rhs (first_stmt_info);
+ vect_get_vec_defs_for_operand (vinfo, first_stmt_info, ncopies,
+op, gvec_oprnds[0]);
+ vec_oprnd = (*gvec_oprnds[0])[0];
+ dr_chain.quick_push (vec_oprnd);
  if (mask)
{
  vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies,
@@ -8975,91 +8954,55 @@ vectorizable_store (vec_info *vinfo,
 mask_vectype);
  vec_mask = vec_masks[0];
}
-   }

- /* We should have catched mismatched types earlier.  */
- gcc_assert (useless_type_conversion_p (vectype,
-TREE_TYPE (vec_oprnd)));
- bool simd_lane_access_p
-   = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
- if (simd_lane_access_p
- && !loop_masks
- && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
- && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
- && integer_zerop (get_dr_vinfo_offset (vinfo, first_dr_info))
- && integer_zerop (DR_INIT (first_dr_info->dr))
- && alias_sets_conflict_p (get_alias_set (aggr_type),
-   get_alias_set (TREE_TYPE (ref_type
-   {
- dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr_info->dr));
- dataref_offset = build_int_cst (ref_type, 0);
+ /* We should have catched mismatched types earlier.  */
+ gcc_assert (useless_type_conversion_p (vectype,
+TREE_TYPE (vec_oprnd)));
+ if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+   vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
+slp_node, _info, _ptr,
+
